1.3. Hash tables

A hash table is a data structure useful for storing a set of items in which the order is not important. It is useful if you want to be able to build a set up an element at a time, and at every stage being able to know whether a particular element is in the table (e.g. for avoiding duplicates).


How does it work?

The basis of a hash table is a hash function. A hash function is a function that takes an element of whatever data type you are storing (integer, string, etc) and outputs an integer in a certain range.

A good hash function is one that gets a fairly even distribution of the numbers in the output range, even if the input values are very poorly distributed (for example, English words are poorly distributed since they only use 26 different letters and some letters and some combinations of letter occur far more frequently than others.

Many hash functions work by computing a large value modulo the table size. In this case it is generally a good idea to make the table size a prime number.

The hash table itself consists of an array whose indexes are the range of the hash function. You should always try to make the table at least 50% bigger than the actual number of items that will ever need to be stored in it, and the larger the better. The elements of the array are just elements of the set you are using (you can also store associated values with each key). You also need a way to specify whether any given array entry is "in use" or not. Elements can occur anywhere in the hash table, with the condition that there must be no unused entries between an element's hash value and the element's address.


Insertion

To add an element to the hash table, first apply the hash function to it to get a hash value. Then go to that element in the array, and search from there until you find a table entry that isn't occupied (possibly wrapping at the end of the array). Then place the element in the empty table entry.


Searching

Searching for an element is described on the searching page.


[Prev] [Next] [Up]

Last updated Sun Nov 28 22:04:38.0000000000 2004. Copyright Bruce Merry (bmerry '@' gmail dot. com).