
Hashing and Hash Tables

EECS 215

Hash Tables
- Another kind of Table
- O(1) on average for insert, lookup, and remove
- Uses an array named T of size `capacity`
- Define a hash function that returns an integer:
    int hash(string key, int N)
  - must return an integer between 0 and N-1
  - must always return the same integer for a given key
- Store the key and info at T[hash(key, capacity)]

Table Size
- Table size is usually prime to avoid bias
- An overly large table size means wasted space
- An overly small table size means more collisions
- What happens as table size approaches 1?
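To make the bias point concrete, here is a small sketch that counts how many slots a regularly spaced set of keys actually reaches. The helper name `slotsUsed` and the choice of `key % tableSize` as the hash are assumptions made for illustration, not something the slides specify.

```cpp
#include <cassert>
#include <set>

// Count how many distinct slots the keys 0, step, 2*step, ... occupy
// when each key is hashed with key % tableSize.
int slotsUsed(int step, int count, int tableSize) {
    assert(step > 0 && count > 0 && tableSize > 0);
    std::set<int> used;
    for (int i = 0; i < count; i++)
        used.insert((i * step) % tableSize);
    return static_cast<int>(used.size());
}
```

With keys that are multiples of 4, `slotsUsed(4, 100, 8)` reaches only 2 of the 8 slots, while `slotsUsed(4, 100, 7)` reaches all 7, because 4 shares no factor with the prime 7. A table of size 1 is the limiting case: every key lands in the same slot.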

Hash Functions
A good hash function has the following characteristics:
- avoids collisions
- spreads keys evenly in the array
- is inexpensive to compute: must be O(1)

Hash Functions for Signed Integers


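- Remainder after division by table length
- If keys are never negative, you can eliminate the abs

    int hash(int key, int N) {
        // Taking % first keeps abs away from INT_MIN, whose absolute
        // value does not fit in an int. Needs <cstdlib> for abs.
        return abs(key % N);
    }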

Hash Functions for Strings


- Must be careful to cover the range from 0 through capacity-1
- Some poor choices
  - summing all the ASCII codes
  - multiplying the ASCII codes
- Important insight
  - letters fall in the range 0101 to 0172 octal (digits are 060 to 071)
  - so nearly all useful information is in the lowest 6 bits
- Key length has a constant limit, so hash(k,N) is O(1)
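The following sketch shows why summing the ASCII codes is a poor choice. The function name `sumHash` is hypothetical; it simply implements the "sum all the codes" idea so its weaknesses can be observed.

```cpp
#include <cassert>
#include <string>

// Poor choice: add up the character codes, then reduce modulo N.
int sumHash(const std::string& key, int N) {
    assert(N > 0);
    unsigned sum = 0;
    for (unsigned char c : key)
        sum += c;
    return static_cast<int>(sum % N);
}
```

Two problems follow directly. Anagrams such as "stop" and "pots" always collide, since addition ignores character order. And short keys cannot reach most of a large table: a four-letter lowercase key sums to at most 4 * 122 = 488, so `sumHash("zzzz", 10007)` is 488 and the slots above that stay unused.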

Hash Functions for Strings


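    int hash(string key, int N) {
        const int shift = 6;
        // ~0u is unsigned, so the right shift fills with zeros and
        // leaves only the lower 6 bits on (assumes 32-bit unsigned).
        const unsigned mask = ~0u >> (32 - shift);
        // Unsigned arithmetic wraps on overflow, so no abs is needed.
        unsigned result = 0;
        for (size_t i = 0; i < key.length(); i++)
            result = (result << shift) | (key[i] & mask);
        return result % N;
    }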

Dealing with Collisions


- Open addressing
  - key/value pairs are stored in array slots
  - but what about deletions?
- Linear probing
  - hash(k, i) = (hash1(k) + i) mod N
  - increment the hash value by a constant, 1, until a free slot is found
  - simplest to implement
  - leads to `primary clustering`
- Quadratic probing
  - hash(k, i) = (hash1(k) + c1*i + c2*i*i) mod N
  - leads to `secondary clustering`
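One common answer to the deletion question is to mark a removed slot as "deleted" rather than "empty", so that later searches keep probing past it. The sketch below illustrates that idea with linear probing. The class name `ProbingTable`, the `int` info type, and the stand-in `hash1` are all assumptions for illustration and are not taken from the slides.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical open-addressing table using linear probing.
class ProbingTable {
    enum class State { Empty, Full, Deleted };
    struct Slot { State state = State::Empty; std::string key; int info = 0; };
    std::vector<Slot> T;

    // Stand-in for hash1(k); any function returning 0..N-1 would do.
    int hash1(const std::string& k) const {
        unsigned h = 0;
        for (unsigned char c : k) h = h * 31 + c;
        return static_cast<int>(h % T.size());
    }
    // Linear probing: hash(k, i) = (hash1(k) + i) mod N
    int probe(const std::string& k, int i) const {
        return (hash1(k) + i) % static_cast<int>(T.size());
    }
    // Index of the slot holding k, or -1 if k is absent.
    int findSlot(const std::string& k) const {
        for (int i = 0; i < static_cast<int>(T.size()); i++) {
            int s = probe(k, i);
            if (T[s].state == State::Empty) return -1;  // never used: stop
            if (T[s].state == State::Full && T[s].key == k) return s;
            // Deleted slot or a different key: keep probing.
        }
        return -1;
    }
public:
    explicit ProbingTable(int capacity) : T(capacity) { assert(capacity > 0); }

    // Returns false only when the table has no reusable slot left.
    bool insert(const std::string& k, int info) {
        int s = findSlot(k);
        if (s >= 0) { T[s].info = info; return true; }  // update in place
        for (int i = 0; i < static_cast<int>(T.size()); i++) {
            s = probe(k, i);
            if (T[s].state != State::Full) {  // Empty or Deleted is reusable
                T[s].state = State::Full;
                T[s].key = k;
                T[s].info = info;
                return true;
            }
        }
        return false;
    }
    bool lookup(const std::string& k, int& out) const {
        int s = findSlot(k);
        if (s < 0) return false;
        out = T[s].info;
        return true;
    }
    bool remove(const std::string& k) {
        int s = findSlot(k);
        if (s < 0) return false;
        T[s].state = State::Deleted;  // tombstone, not Empty
        return true;
    }
};
```

With a capacity of 4, the single-character keys "a", "e", and "i" all start probing at slot 1 under this stand-in hash, so they occupy slots 1, 2, and 3. Removing "e" leaves a tombstone in slot 2, and a lookup of "i" still succeeds because the search does not stop there.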

Dealing with Collisions


- Double hashing
  - hash(k, i) = (hash1(k) + i*hash2(k)) mod N
  - avoids clustering
- Separate chaining
  - each array slot is a SearchList
  - avoids clustering
  - never gets 'full'
  - deletions are not a problem, as they are for the other methods
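As a rough illustration of the double hashing formula, the sketch below lists the full probe sequence for an integer key. The particular `hash1` and `hash2` are assumptions chosen for simplicity; the slides do not say which functions to use. The only requirement relied on here is that `hash2` never returns 0.

```cpp
#include <cassert>
#include <vector>

// Illustrative primary hash: slot where probing starts.
int hash1(int key, int N) { return key % N; }

// Illustrative secondary hash: the step size, always in 1..N-1.
// A step of 0 would probe the same slot forever.
int hash2(int key, int N) { return 1 + key % (N - 1); }

// Slots visited for i = 0..N-1 using
// hash(k, i) = (hash1(k) + i*hash2(k)) mod N.
std::vector<int> probeSequence(int key, int N) {
    assert(key >= 0 && N > 2);
    std::vector<int> seq;
    for (int i = 0; i < N; i++)
        seq.push_back((hash1(key, N) + i * hash2(key, N)) % N);
    return seq;
}
```

For N = 7, keys 10 and 17 both start at slot 3, but key 10 then visits 1, 6, 4, 2, 0, 5 while key 17 visits 2, 1, 0, 6, 5, 4. Keys that collide on the first probe usually diverge afterwards, which is why clustering is reduced. When N is prime, every step size from 1 to N-1 is coprime to N, so each sequence visits every slot exactly once.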

Dealing with A Full Table


- Allocate a larger hash table
- Rehash each entry from the smaller table into the larger
- Delete the smaller table
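The three steps above might look like the following for a linear-probing table without deletions. The names `Slot`, `hashKey`, `place`, `findSlot`, and `insertGrowing`, as well as the growth rule of 2*N + 1, are assumptions for this sketch. Entries must be rehashed rather than copied slot by slot, because the hash value depends on the table size.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Slot { bool used = false; std::string key; int info = 0; };

// Stand-in hash function returning 0..N-1.
int hashKey(const std::string& key, int N) {
    assert(N > 0);
    unsigned h = 0;
    for (unsigned char c : key) h = h * 31 + c;
    return static_cast<int>(h % N);
}

// Linear-probing insert; returns false when every slot is taken.
bool place(std::vector<Slot>& T, const std::string& key, int info) {
    int N = static_cast<int>(T.size());
    for (int i = 0; i < N; i++) {
        Slot& s = T[(hashKey(key, N) + i) % N];
        if (!s.used || s.key == key) {
            s.used = true; s.key = key; s.info = info;
            return true;
        }
    }
    return false;
}

// Returns the slot holding key, or nullptr if it is absent.
const Slot* findSlot(const std::vector<Slot>& T, const std::string& key) {
    int N = static_cast<int>(T.size());
    for (int i = 0; i < N; i++) {
        const Slot& s = T[(hashKey(key, N) + i) % N];
        if (!s.used) return nullptr;
        if (s.key == key) return &s;
    }
    return nullptr;
}

// Insert, growing the table first if it is full.
void insertGrowing(std::vector<Slot>& T, const std::string& key, int info) {
    if (place(T, key, info)) return;
    std::vector<Slot> bigger(2 * T.size() + 1);   // allocate a larger table
    for (const Slot& s : T)                       // rehash each entry
        if (s.used) place(bigger, s.key, s.info);
    place(bigger, key, info);
    T.swap(bigger);  // the smaller array is freed when `bigger` is destroyed
}
```

Practical tables usually grow before they are completely full, since probe sequences get long as the table fills; waiting for a full table here just mirrors the slide title.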

Linked List Implementation


    struct ListNode {
        string key;
        Object info;
        ListNode * next;
        ListNode(string k, Object ob, ListNode * n) {
            key = k; info = ob; next = n;
        }
        static ListNode * find(string k, ListNode * l);
        static ListNode * remove(string k, ListNode * l);
        static void destruct(ListNode * l);
    };
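The slides declare the three static helpers but do not show their bodies. One plausible set of definitions is sketched below. Two assumptions are made: `Object` is taken to be `int` because the slides never define it, and `remove` is assumed to return the new head of the list after deleting the first matching node.

```cpp
#include <cassert>
#include <string>
using std::string;

typedef int Object;  // assumption: the slides never define Object

struct ListNode {
    string key;
    Object info;
    ListNode* next;
    ListNode(string k, Object ob, ListNode* n) : key(k), info(ob), next(n) {}
    static ListNode* find(string k, ListNode* l);
    static ListNode* remove(string k, ListNode* l);
    static void destruct(ListNode* l);
};

// Returns the first node whose key equals k, or nullptr if there is none.
ListNode* ListNode::find(string k, ListNode* l) {
    for (; l != nullptr; l = l->next)
        if (l->key == k) return l;
    return nullptr;
}

// Deletes the first node with key k (if any) and returns the new head.
ListNode* ListNode::remove(string k, ListNode* l) {
    if (l == nullptr) return nullptr;
    if (l->key == k) {
        ListNode* rest = l->next;
        delete l;
        return rest;
    }
    l->next = remove(k, l->next);
    return l;
}

// Deletes every node in the list.
void ListNode::destruct(ListNode* l) {
    while (l != nullptr) {
        ListNode* next = l->next;
        delete l;
        l = next;
    }
}
```

Because `remove` returns the head, a caller would write `head = ListNode::remove(key, head);` so that removing the first node is handled the same way as removing any other.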

Chained Hash Table Implementation


    class ChainedHashTable {
        int hash(string value, int N);   // as given above
        ListNode * * T;                  // T is an array of linked lists
        int capacity;
    public:
        ChainedHashTable(int numberOfChains);
        void insert(string key, Object info);
        Object lookup(string key);
        void remove(string key);
        ~ChainedHashTable();
    };
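A minimal sketch of how the member functions declared above could be filled in follows. Several details are assumptions rather than the course's implementation: `Object` is `int`, `lookup` returns a default-constructed `Object` for a missing key, `insert` overwrites the info of an existing key, the hash is a simple stand-in, and the private helper `find` is added for convenience.

```cpp
#include <cassert>
#include <string>
using std::string;

typedef int Object;  // assumption: the slides never define Object

struct ListNode {
    string key;
    Object info;
    ListNode* next;
    ListNode(string k, Object ob, ListNode* n) : key(k), info(ob), next(n) {}
};

class ChainedHashTable {
    ListNode** T;   // T is an array of linked lists
    int capacity;

    // Stand-in hash; the string hash from the earlier slide would also work.
    int hash(string key, int N) {
        unsigned h = 0;
        for (unsigned char c : key) h = h * 31 + c;
        return static_cast<int>(h % N);
    }
    // Hypothetical helper: node holding key, or nullptr.
    ListNode* find(const string& key) {
        for (ListNode* p = T[hash(key, capacity)]; p != nullptr; p = p->next)
            if (p->key == key) return p;
        return nullptr;
    }
public:
    ChainedHashTable(int numberOfChains) {
        assert(numberOfChains > 0);
        capacity = numberOfChains;
        T = new ListNode*[capacity]();  // () zero-fills: every chain starts empty
    }
    void insert(string key, Object info) {
        if (ListNode* p = find(key)) { p->info = info; return; }
        ListNode*& head = T[hash(key, capacity)];
        head = new ListNode(key, info, head);  // push onto the front of the chain
    }
    Object lookup(string key) {
        ListNode* p = find(key);
        return p != nullptr ? p->info : Object();
    }
    void remove(string key) {
        // Walk the chain through a pointer to the link being followed,
        // so the head and interior nodes are unlinked the same way.
        ListNode** link = &T[hash(key, capacity)];
        while (*link != nullptr && (*link)->key != key)
            link = &(*link)->next;
        if (*link != nullptr) {
            ListNode* dead = *link;
            *link = dead->next;
            delete dead;
        }
    }
    ~ChainedHashTable() {
        for (int i = 0; i < capacity; i++)
            while (T[i] != nullptr) {
                ListNode* next = T[i]->next;
                delete T[i];
                T[i] = next;
            }
        delete[] T;
    }
    // Copying is disabled in this sketch to avoid double deletion.
    ChainedHashTable(const ChainedHashTable&) = delete;
    ChainedHashTable& operator=(const ChainedHashTable&) = delete;
};
```

Note that the table keeps working with more keys than chains: four keys in a table of three chains simply means at least one chain holds two nodes.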

Direct Address Tables


- A hash table with separate chaining
- When the set of keys is a reasonable set of integral values, say m .. M
  - e.g., 0..127 for ASCII characters
- we can use the key itself as the hash code (index key - m)
- table size is M - m + 1
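A direct address table for keys in m..M could be sketched as below. The class name `DirectAddressTable`, the `int` info type, and the `present` flags used to tell stored keys from missing ones are assumptions for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical direct address table for integer keys in the range m..M.
class DirectAddressTable {
    int m;                       // smallest legal key
    std::vector<bool> present;   // present[key - m] is true if key is stored
    std::vector<int> info;       // info[key - m] is the value for key

    // The key itself, shifted by m, is the array index.
    int index(int key) const {
        assert(key >= m && key < m + static_cast<int>(present.size()));
        return key - m;
    }
public:
    DirectAddressTable(int lo, int hi) : m(lo) {
        assert(lo <= hi);
        present.assign(hi - lo + 1, false);   // table size is M - m + 1
        info.assign(hi - lo + 1, 0);
    }
    int size() const { return static_cast<int>(present.size()); }
    void insert(int key, int value) {
        present[index(key)] = true;
        info[index(key)] = value;
    }
    bool lookup(int key, int& out) const {
        if (!present[index(key)]) return false;
        out = info[index(key)];
        return true;
    }
    void remove(int key) { present[index(key)] = false; }
};
```

Since each key has its own slot, no two keys can collide and every operation is a single array access. The cost is space proportional to the key range rather than to the number of keys stored.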

Perfect Hashing
- When the set of keys is known beforehand
  - e.g., a set of reserved words for a programming language
- we can construct a function that guarantees no collisions
  - called `Perfect Hashing`
  - but it may require more than N slots in the table, where N is the number of keys
- A `Minimal Perfect Hash Function` uses exactly N slots
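One naive way to construct such a function is to try candidate hash functions until one produces no collisions on the known key set. The sketch below does this by brute force over the multiplier of a simple string hash. The names `paramHash` and `findPerfectMultiplier` are hypothetical, and this is only an illustration of the idea; dedicated generators such as GNU gperf use more sophisticated methods.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// String hash parameterized by a multiplier a; returns 0..N-1.
int paramHash(const std::string& key, unsigned a, int N) {
    assert(N > 0);
    unsigned h = 0;
    for (unsigned char c : key) h = h * a + c;
    return static_cast<int>(h % N);
}

// Tries multipliers 1..limit and returns the first one that sends every
// key to a different slot of an N-slot table, or 0 if none is found.
unsigned findPerfectMultiplier(const std::vector<std::string>& keys,
                               int N, unsigned limit) {
    assert(N >= static_cast<int>(keys.size()));
    for (unsigned a = 1; a <= limit; a++) {
        std::set<int> used;
        for (const std::string& k : keys)
            used.insert(paramHash(k, a, N));
        if (used.size() == keys.size()) return a;  // no two keys share a slot
    }
    return 0;
}
```

Calling it with N equal to the number of keys searches for a minimal perfect hash function; allowing a larger N makes a collision-free multiplier easier to find at the price of unused slots. The search can fail, in which case the caller would raise `limit`, raise N, or switch to a different family of hash functions.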
