The Intricate Workings of Hash Tables

Notation:
O(x) = Worst Case Running Time
W(x) = Best Case Running Time
Q(x) = Best and Worst case are the same.

Page numbers refer to the Preiss text book Data Structures and Algorithms with Object-Orientated Design Patterns in Java.

Hash Algorithm Page Implementation, Conflict Resolution Advantages Disadvantages Asymptotic Complexities
Chained 211 Implemented as an Array of linked lists. Inserted elements are appended to the linked list that corresponds to their particular hash value. Potentially infinite size. Code friendly implementation. Find and withdraw have the same complexities. Can be wasteful on memory, as many of array positions might contain empty linked lists. Must have dynamically available memory to satisfy the demands of the linked lists.

Insert: Q(1)
Find/Withdraw: O(n)1

Chained Scatter 219 Implemented as a fixed array of data pairs. The first is the element, and the second is a pointer (possibly null) to the next element in the "chain" (with the same hash value). When a collision occurs, the chain is followed until an empty space is found, and the new element is inserted there. All information is stored within a fixed size array, therefore no more memory is required. Memory reserved for that array is made use of. Not infinite capacity, as when the load factor increases, the chains have a tendency to coalesce (combine to form larger chains), making removal very complicated, and the other operations slower.

Insert: O(M)2
Find: O(M)2
Withdraw: O(M2)

Open Scatter 227 Similar to the Chained Scatter implementation, however instead of a pointer system, a probing function is used to determine the location of the next element in the event of a collision. A probing function can something as simple as a linear equation, or it can be much more complex. Has the same advantages as the Chained scatter table, however there is no need for the pointer, and if the probing function is very good, (especially not linear), then the chance of the elements clustering together (similar effect to search times as coalescing, however the withdrawal is the same as the search time for any given element) is less. Like the Chained Scatter table, the array can fill up, and at worst (if it is full) slow the find and withdraw operations to O(M) time.

Insert: O(n)3
Find: O(M)4
Withdraw: O(M)4

Each of the Insert, Find and Withdraw algorithms have best case W(1) running times, which occurs when each of the keys maps to an unique table entry. i.e. there have been no collisions. This normally means that the hashing function is very good at generating unique hash values, or there are very few elements.

1 Where n is the total number of elements in the Table. This will only occur if all of those keys map to the same hash value
2 Where M is the number of elements in the Hash table array. This will only occur if there is only 1 unused entry, and the item to be inserted hashes to the the head of the chain, or in the case of find, when the element doesn't exist, and the table is full.
3 As it has to search for the next available space, if all of the elements (n) hash to the same value it is an O(n) search
4 Unlike the insert method, this has to search for the next available empty space, not empty or deleted (see below), so it is possible it would have to search though the entire table (M).

 

Chained Hash Tables

More points:

 

Chained Scatter Hash Table

More points:

 

Open Scatter Hash Table

More points:

 

Summary:

Chained Hash tables are probably the most versatile due to their relative simplicity, and potentially infinite storage (limited of course by the hardware). Chained Scatter tables are very mucky, and shouldn't be touched with a 40ft pole. Open Scatter tables however are a bit more useful, especially in limited memory environments, as they embody all of the advantages of a chained scatter table without some of the nasty site-effects.

These study notes are proudly brought to you by William Denniss.
Date created: 20 Nov 2001
OmegaDelta.net