Hash Table

A hash table, sometimes called a map or dictionary, is a data structure that maps a unique key to a value, usually with an associative array and a hash function. In the case of a dictionary (the actual book) for example, it associates a word with its definition.

The hash functions that hash table use are usually imperfect, meaning they would always result in eventual collisions.

As such hash tables need to deal with collisions as they occur.

Some methods to reduce the amount of collisions in the first place, is to have a secondary hash function that is triggered when an element would collide with just one hash function. Adding the second hash function would jump the insertion value to a space denoted by the result of the second hash function.

Collision Resolution Methods

There are different methods of avoiding collisions to improve the efficiency of our hash table.

Separate Chaining

Separate chaining, or otherwise known as closed addressing is where each table slot stores a linked list or other data structure, where all keys that hash to that index are stored. In this case, the load factor can exceed 1 (unlike open addressing). The worst case lookup becomes O(n) if all keys collide at one index. The buckets are generally implemented as a linked list, a dynamic array, or a balanced BST.

Open Addressing

In open addressing, all keys are stored directly into one layer of the table. Collisions are resolved through a differing set of strategies as below.

Linear Probing

In linear probing, if the current slot is occupied according to the hash function, try the next slot. This is very simple and cache-friendly. However, it can encounter some problems when dealing with removal.

Quadratic Probing

Similarly to the previous one, except it moves in a quadratic pattern by exponentially increasing the amount added to the initial hash value by some constant^2.

Double Hashing

Double hashing involves adding another function so that if the current slot is occupied, the secondary hash will add that result to the result of the first. If that key is also occupied, then add the result of the secondary hash again, until a spot is found.

Advanced Methods

There are other methods that have not been seen/taught in CPSC 221, including robin hood hashing and cuckoo hashing.

Load Factor

The load factor of a hash table is equivalent to the amount of elements over the amount of slots or capacity.

Removal

One problem that can occur when removing an element from a hash table is when removing an element in the middle of a cluster of elements, that could cause a linear probing algorithm to fail, as it would end prematurely upon finding an empty space in the middle of a cluster.

One method to deal with such an issue is to tombstone that space, basically just indicating that a value was there. This however degrades performance on the long run, as if your table is full of tombstones, you are slowly creeping to linear time performance and large memory usage.

CPSC Notes

Explorer