A hash table is a data structure that maps a unique key to a value, usually with an associative array and a hash function.

The hash functions that hash table use are usually imperfect, meaning they would always result in eventual collisions.

As such hash tables need to deal with collisions as they occur.

Some methods to reduce the amount of collisions in the first place, is to have a secondary hash function that is triggered when an element would collide with just one hash function. Adding the second hash function would jump the insertion value to a space denoted by the result of the second hash function.

Removal

One problem that occurs is when removing an element in the middle of a cluster, that could cause a linear probing algorithm to fail, as it would end prematurely upon finding an empty space in the middle of a cluster. One method to deal with such an issue is to tombstone that space, basically just indicating that a value was there.

Collision resolution methods

Separate chaining

Separate chaining, or otherwise known as closed addressing is where each table slot stores a linked list or other data structure, where all keys that hash to that index are stored. In this case, the load factor can exceed 1 (unlike open addressing). The worst case lookup becomes O(n) if all keys collide at one index. The buckets are generally implemented as a linked list, a dynamic array, or a balanced BST.

Open addressing

In open addressing, all keys are stored directly into one layer of the table. Collisions are resolved through a differing set of strategies as below.

Linear probing

In linear probing, if the current slot is occupied according to the hash function, try the next slot. This is very simple and cache-friendly.

Quadratic probing

Similarly to the previous one, except it moves in a quadratic pattern by exponentially increasing the amount added to the initial hash value by some constant^2.

Double hashing

Double hashing involves adding another function so that if the current slot is occupied, the secondary hash will add that result to the result of the first. If that key is also occupied, then add the result of the secondary hash again, until a spot is found.

Advanced methods

There are other methods that have not been seen/taught in CPSC 221, including robin hood hashing and cuckoo hashing.

Load factor

The load factor of a hash table is equivalent to the amount of elements over the amount of slots.

Qs

When performing a lookup on a hash table, how can we tell if the value we just looked up using the hash function is the intended value or in the middle of a cluster?