Documente Academic
Documente Profesional
Documente Cultură
• Basic concepts
• Hash functions
• Collision resolution
• Open addressing
• Linked list resolution
• Bucket hashing
1
Basic Concepts
• Sequential search: O(n) Requiring several
key comparisons
• Binary search: O(log2n) before the target is
found
2
Basic Concepts
• Search complexity:
Size Binary Sequential Sequential
(Average) (Worst Case)
16 4 8 16
50 6 25 50
256 8 128 256
1,000 10 500 1,000
10,000 14 5,000 10,000
100,000 17 50,000 100,000
1,000,000 20 500,000 1,000,000
3
Basic Concepts
• Is there a search algorithm whose
complexity is O(1)?
4
Basic Concepts
• Is there a search algorithm whose
complexity is O(1)?
YES.
5
Basic Concepts
memory
keys addresses
hashing
005 Vu Nguyen
005
Vu Nguyen
102002 Hash 100 007 Ray Black
7
Basic Concepts
• Home address: address produced by a hash
function.
8
Basic Concepts
• Synonyms: a set of keys that hash to the
same location.
9
Basic Concepts
• Ideal hashing:
– No location collision
– Compact address space
10
Basic Concepts
Insert A, B, C
hash(A) = 9
hash(B) = 9
hash(C) = 17
A
[1] [5] [9] [17]
11
Basic Concepts
Insert A, B, C
hash(A) = 9
hash(B) = 9 B and A
hash(C) = 17 collide at 9
A B
[1] [5] [9] [17]
Collision
Resolution
12
Basic Concepts
Insert A, B, C
hash(A) = 9
hash(B) = 9 B and A C and B
hash(C) = 17 collide at 9 collide at 17
C A B
[1] [5] [9] [17]
Collision
Resolution
13
Basic Concepts
Searh for B
hash(A) = 9
hash(B) = 9
hash(C) = 17
C A B
[1] [5] [9] [17]
Probing
14
Hash Functions
• Direct hashing
• Modulo division
• Digit extraction
• Mid-square
• Folding
• Rotation
• Pseudo-random
15
Direct Hashing
• The address is the key itself:
hash(Key) = Key
16
Direct Hashing
• Advantage: there is no collision.
17
Modulo Division
• Example:
Numbering system to handle 1,000,000 employees
Data space to store up to 300 employees
• Example:
379452 → 394
121267 → 112
378845 → 388
160252 → 102
045128 → 051
19
Mid-square
• Example:
9452 * 9452 = 89340304 → 3403
20
Mid-square
• Disadvantage: the size of the Key2 is too
large
21
Folding
• The key is divided into parts whose size
matches the address size
Key = 123|456|789
fold shift
123 + 456 + 789 = 1368
⇒ 368
22
Folding
• The key is divided into parts whose size
matches the address size
Key = 123|456|789
23
Rotation
• Hashing keys that are identical except for
the last character may create synonyms.
24
Rotation
• Used in combination with fold shift
25
Pseudorandom
y = ax +
c
For maximum efficiency, a and c should be prime
numbers
26
Pseudorandom
• Example:
27
Collision Resolution
• Except for the direct hashing, none of the
others are one-to-one mapping
⇒ Requiring collision resolution methods
28
Collision Resolution
• A rule of thumb: a hashed list should not be
allowed to become more than 75% full.
Load factor:
α = (k/n) x 100
n = list size
k = number of filled elements
29
Collision Resolution
• As data are added and collisions are
resolved, hashing tends to cause data to
group within the list
⇒ Clustering: data are unevenly distributed across
the list
30
Collision Resolution
• Primary clustering: data become clustered
around a home address.
A B C D E
[1] [9] [10] [11] [12] [13]
31
Collision Resolution
• Secondary clustering: data become
grouped along a collision path throughout a
list.
32
Open Addressing
• When a collision occurs, an unoccupied element is
searched for placing the new element in.
33
Linear Probe
• When a home address is occupied, go to the next
address (the current address + 1).
34
Linear Probe
001 Mary Dodd (379452)
002 Sarah Trapp (070918)
003 Bryan Devaux (121267)
002
Harry Eagle Hash
166702
008 John Carver (378845)
35
Linear Probe
001 Mary Dodd (379452)
002 Sarah Trapp (070918)
003 Bryan Devaux (121267)
004 Harry Eagle (166702)
002
Harry Eagle Hash
166702
008 John Carver (378845)
36
Linear Probe
• Advantages:
quite simple to implement
data tend to remain near their home address
• Disadvantages:
produces primary clustering
37
Quadratic Probe
• The address increment is the collision probe
number squared (12, 22, 32, ...).
38
Quadratic Probe
• Disadvantage: time required to square the probe
number.
(n + 1)2 = n2 + 2n + 1
Increment
Factor
39
Quadratic Probe
1 1 12 = 1 02 1 1
2 2 22 = 4 06 3 4
3 6 32 = 9 15 5 9
4 15 42 = 16 31 7 16
5 31 52 = 25 56 9 25
6 56 62 = 36 92 11 36
7 92 72 = 49 41 13 49
40
Quadratic Probe
• Limitation: it is not possible to generate a new
address for every element in the list.
41
Pseudorandom Probe
42
Pseudorandom Probe
43
Secondary Clustering
A C B D
[1] [9] [11] [13] [15]
Linear probes
Search/find location for D
44
Key Offset
45
Key Offset
46
Linked List Resolution
• Major disadvantage of Open Addressing: each
collision resolution increases the probability for
future collisions.
⇒ use linked lists to store synonyms
47
Linked List Resolution
001 Mary Dodd (379452)
002 Sarah Trapp (070918) Harry Eagle (166702)
003 Bryan Devaux (121267)
Chris Walljasper (572556)
overflow
008 John Carver (378845)
area
prime
area
306 Tuan Ngo (160252)
307 Shouli Feldman (045128)
48
Bucket Hashing
• Hashing data to buckets that can hold multiple
pieces of data.
49
Bucket Hashing
Mary Dodd (379452)
001
50