*** Hash Table *** ------------------ Files: hashtbl.c, hashtbl.h A hash table stores items so that they can be looked up efficiently. Each item in a hash table has a key, which is used for indexing the item. The hash table is made up of an array of `buckets', where each bucket can hold several items. The hash table determines which bucket an item belongs in, by hashing the value of the items key using a hash function. The same hash function is used for looking up the appropriate bucket, for each of the hash table operations; insert, find, and delete. The hash table implementation provided by the source files hashtbl.c and hashtbl.h use a linked list of pointers to items to implement the buckets. The array of `buckets' is implemented as an array of pointers to the head items of the linked lists. An empty bucket is represented by having the corresponding array entry assigned a NULL pointer. This hash table implementation assumes and enforces the uniqueness of keys of items in the hash table. The Hash Function ----------------- Suppose the hash table has n buckets, numbered 0 to n-1. It is possible that the range of possible keys for items is much greater than the number of buckets. The hash function takes the key of an item, and maps it to a bucket number in the range 0 to n-1. We can use the notation x = hash(k), where k is the key and x is the hashed value of k (or bucket number). It is possible for several different keys to be hashed to the same bucket number, thus allowing for the large range of possible keys. Because of this it is possible to have several items in a single bucket. Hash tables are effective, when only a small subset of the range of possible keys are actually used by items in the hash table. It is important that the hash function maps keys to buckets as randomly as possible, so that items are distributed as uniformly as possible over all buckets. This minimises the time required to search through a bucket to locate a particular item by its key. If the mapping is not random enough, cases can arise with only a small number of buckets containing items, and all other buckets empty. In practice, only small groups of keys may be used for items in the table. Consider the following example: 1. Keys range from 1 to 1000000. 2. The hash table has 20 buckets. 3. 20 items are inserted into the hash table, but using only keys in the ranges 9001 to 9010 and 900001 to 900010. A poor hash function might map all keys from 9001 to 9010 to the same bucket, and all keys from 900001 to 900010 to the same bucket, leaving only two occupied buckets, with 10 items each. Up to 10 items may need to be checked when searching for a particular item. A good hash function, assigns keys to buckets almost randomly, so that cases like this can be avoided. If the hash function is good we would expect most buckets to have 1 item, some with more than 1 item, and some with empty buckets. In this example, the best case possible is for all buckets to have 1 item, but this is unlikely to arise for an almost random hash function. The Insert Operation -------------------- To insert an item with key k, add the item to bucket x, where x = hash(k). If the hash table enforces the uniqueness of keys, each item in bucket x is checked to ensure there is not already an item with key k. The Find Operation ------------------ To find an item with key k, calculate x = hash(k), then search items in the bucket until an item with key k is found. Either the item with key k will be located, or it is not in the hash table. The Delete Operation -------------------- To delete an item with key k calculate x = hash(k), then search items in the bucket until an item with key k is found. If the item exist, it is removed from bucket x.