The Intricate Workings of Hash Tables
Notation:
O(x) = Worst Case Running Time
W(x) = Best Case Running Time
Q(x) =
Best and Worst case are the same.
Page numbers refer to the Preiss text book Data Structures and Algorithms
with Object-Orientated Design Patterns in Java.
| Hash Algorithm |
Page |
Implementation, Conflict Resolution |
Advantages |
Disadvantages |
Asymptotic Complexities |
| Chained |
211 |
Implemented as an Array of linked
lists. Inserted elements are appended to the linked list that corresponds
to their particular hash value. |
Potentially infinite size. Code friendly
implementation. Find and withdraw have the same complexities. |
Can be wasteful on memory, as many
of array positions might contain empty linked lists. Must have dynamically
available memory to satisfy the demands of the linked lists. |
Insert: Q(1)
Find/Withdraw: O(n)1
|
| Chained
Scatter |
219 |
Implemented as a fixed array of
data pairs. The first is the element, and the second is a pointer (possibly
null) to the next element in the "chain" (with the same hash value).
When a collision occurs, the chain is followed until an empty space is found,
and the new element is inserted there. |
All information is stored within
a fixed size array, therefore no more memory is required. Memory reserved
for that array is made use of. |
Not infinite capacity, as when the
load factor increases, the chains have a tendency to coalesce (combine to
form larger chains), making removal very complicated, and the other operations
slower. |
Insert: O(M)2
Find: O(M)2
Withdraw: O(M2)
|
| Open
Scatter |
227 |
Similar to the Chained Scatter implementation,
however instead of a pointer system, a probing function is used to determine
the location of the next element in the event of a collision. A probing
function can something as simple as a linear equation, or it can be much
more complex. |
Has the same advantages as the Chained
scatter table, however there is no need for the pointer, and if the probing
function is very good, (especially not linear), then the chance of the elements
clustering together (similar effect to search times as coalescing, however
the withdrawal is the same as the search time for any given element) is
less. |
Like the Chained Scatter table,
the array can fill up, and at worst (if it is full) slow the find and withdraw
operations to O(M) time. |
Insert: O(n)3
Find: O(M)4
Withdraw: O(M)4
|
Each of the Insert, Find and Withdraw algorithms have best case W(1)
running times, which occurs when each of the keys maps to an unique table entry.
i.e. there have been no collisions. This normally means that the hashing function
is very good at generating unique hash values, or there are very few elements.
1 Where n is the total number of elements in the Table. This will
only occur if all of those keys map to the same hash value
2 Where M is the number of elements in the Hash table array. This
will only occur if there is only 1 unused entry, and the item to be inserted
hashes to the the head of the chain, or in the case of find, when the element
doesn't exist, and the table is full.
3 As it has to search for the next available space, if all of the
elements (n) hash to the same value it is an O(n) search
4 Unlike the insert method, this has to search for the next available
empty space, not empty or deleted (see below),
so it is possible it would have to search though the entire table (M).
Chained Hash Tables
More points:
- Insertion is always O(1) as you are simply appending the element onto the
linked list that corresponds to its hash value, which is an O(1) operation.
- When searching for an item, you retrieve its position using the hash function,
and then do a potentially O(n) search on that linked list to find the element
you are looking for.
- The biggest disadvantage of this is the wasted space, and the need for lots
of dynamic memory to handle the linked lists.
- This however has some considerable advantages, where each of the insert,
find and delete are at their minimum of the three implementations, and the
quality doesn't degrade as much when the load factor increases. There is also
capacity, for potentially unlimited storage, restricted only by the hardware.

Chained Scatter Hash Table
More points:
- When searching for an element, you visit its position using the hash function,
and if it is not there, you follow its pointer to the next possible location,
and the search continues, until either the item is found, or a null pointer
is encountered.
- Insertion in the worst case as slow as finding an element, as you have to
travel to the end of the chain that corresponds to that hash value (the first
free value).
- This essentially uses linked lists as well, with the data being stored in
the table, as opposed to it being external. The big difference is fact that
the lists (chains) can coalesce. This is where a collision occurs with an
item not in its originally hashed value, and thus, the chain of that element,
and of the newly inserted one combine.
- This causes many headaches, especially when the load on the table becomes
high, and there is little space left in the table. The greatest disadvantage
is the withdrawal (deletion) method, which must ensure no chains are broken,
causing the operation to be O(M2). It is however fairly efficient
when the load is light.

Open Scatter Hash Table
More points:
- When searching for an element, firstly the hash value position is checked,
followed by the first value in the probing function sequence, and so on until
either an empty element is found, or the desired one. The reason it must keep
going even if it finds a deleted one is that that element could have been
deleted from the middle of a "chain".
- Insertion is similar to searching, however when you come across the first
empty or deleted cell, you insert the new element there, making it O(n) worst
case (which happens when all of the elements are hashed to the same value).
- Withdraw is exactly the same running time as find. When an element is deleted
that cell is marked as "deleted" as opposed to "empty".
This is so elements further down the chain are still linked to the start of
the chain.
- The advantage of this is that when searching for a free cell, as opposed
to the Chained Scatter Table, it stops when it finds an "empty"
or "deleted" cell, even if it is in the middle of the chain
- The disadvantage is that at no time, except when the entire table is purged
or created, is a cell ever marked "empty", and the find operation
(as opposed to the insert operation) keeps searching until it finds either
a cell marked "empty" or the correct element, which is of course
ONLY a problem when that element can't be found, and if at some stage in the
life of the table, all cells have been used, finding an element that does
not exist has a minimum running time of W(M).
- The effectiveness of this hashing method relies heavily on a good probing
function. There are 3 commonly used probing types:
- Linear Probing. This is the simplest, and the benefits aren't as good
because as the table fills, values cluster together which increases search
times.
- Quadratic Probing. So as all array positions are accessible from a given
hash value using a quadratic probing function, the array size needs to
be a prime number (so that the probing function is modulo M, where M is
a prime). The other restriction is that it won't work unless the table
is less than half full. As only the first M/2 probes are distinct.
- Double Hashing fixes both of these problems. It uses two separate hash
functions, and a probing sequence which incorporates both. Basically how
it works is that most of the time, a quadratic search is used, however
when the second hashing function hashes a value to 1, a linear search
is used. In a good search algorithm, this should happen only very rarely.
Because it is still using a form of quadratic probing, M must be prime,
or else not all values would be reached before the probing algorithm started
repeating.

Summary:
Chained Hash tables are probably the most versatile due to their relative simplicity,
and potentially infinite storage (limited of course by the hardware). Chained
Scatter tables are very mucky, and shouldn't be touched with a 40ft pole. Open
Scatter tables however are a bit more useful, especially in limited memory environments,
as they embody all of the advantages of a chained scatter table without some
of the nasty site-effects.
These study notes are proudly brought to you by William Denniss.
Date created: 20 Nov 2001
OmegaDelta.net