
Hashing

Static Hashing
A bucket is a unit of storage containing one or more records (a
bucket is typically a disk block).
In a hash file organization we obtain the bucket of a record directly
from its search-key value using a hash function.
Hash function h is a function from the set of all search-key values K
to the set of all bucket addresses B.
The hash function is used to locate records for access, insertion, and
deletion.
Records with different search-key values may be mapped to the
same bucket; thus the entire bucket has to be searched sequentially to
locate a record.
Example of Hash File Organization
There are 10 buckets.
The binary representation of the ith letter of the alphabet is assumed
to be the integer i.
The hash function returns the sum of the binary representations of
the characters modulo 10.
E.g. h(Perryridge) = 5, h(Round Hill) = 3, h(Brighton) = 3
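The example hash function can be sketched in a few lines of Python. This is only an illustration: the function name h follows the slide, while the n_buckets parameter and the choice to ignore case and non-letter characters (such as the space in "Round Hill") are assumptions made so that the slide's values come out.

```python
def h(branch_name: str, n_buckets: int = 10) -> int:
    """Sum the alphabet positions of the letters (a = 1, ..., z = 26), modulo n_buckets."""
    total = 0
    for ch in branch_name.lower():
        # Assumption: characters that are not letters (e.g. spaces) contribute 0.
        if "a" <= ch <= "z":
            total += ord(ch) - ord("a") + 1
    return total % n_buckets


for name in ("Perryridge", "Round Hill", "Brighton"):
    print(name, "->", h(name))
```

Round Hill and Brighton land in the same bucket, which is why the whole bucket has to be searched to find a particular record.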

[Figure: hash file organization of account file, using branch_name as key]
Hash Functions
The worst hash function maps all search-key values to the same bucket;
this makes access time proportional to the number of search-key
values in the file.
An ideal hash function is uniform, i.e., each bucket is assigned the
same number of search-key values from the set of all possible values.
An ideal hash function is also random, so each bucket will have about
the same number of records assigned to it irrespective of the actual
distribution of search-key values in the file.
Typical hash functions perform computation on the internal binary
representation of the search key.
For example, for a string search key, the binary representations of
all the characters in the string could be added and the sum modulo
the number of buckets could be returned.
Handling of Bucket Overflows
Bucket overflow can occur because of
Insufficient buckets
Skew in the distribution of records. This can occur for two
reasons:
multiple records have the same search-key value
the chosen hash function produces a non-uniform distribution of key
values
Although the probability of bucket overflow can be reduced, it cannot
be eliminated; it is handled by using overflow buckets.
Overflow chaining: the overflow buckets of a given bucket are chained
together in a linked list.
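A minimal sketch of overflow chaining, assuming fixed-capacity in-memory buckets that stand in for disk blocks. The class and method names (Bucket, StaticHashFile, chain_length) are hypothetical and exist only for this illustration.

```python
class Bucket:
    """Fixed-capacity bucket with a link to the next overflow bucket."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []      # (search_key, record) pairs
        self.overflow = None   # next bucket in the overflow chain, if any


class StaticHashFile:
    def __init__(self, n_buckets, capacity, hash_fn):
        self.capacity = capacity
        self.hash_fn = hash_fn
        self.buckets = [Bucket(capacity) for _ in range(n_buckets)]

    def _primary(self, key):
        return self.buckets[self.hash_fn(key) % len(self.buckets)]

    def insert(self, key, record):
        bucket = self._primary(key)
        # Walk the chain to the first bucket with free space, extending it if needed.
        while len(bucket.records) >= bucket.capacity:
            if bucket.overflow is None:
                bucket.overflow = Bucket(self.capacity)
            bucket = bucket.overflow
        bucket.records.append((key, record))

    def lookup(self, key):
        # The primary bucket and every overflow bucket are searched sequentially.
        bucket, found = self._primary(key), []
        while bucket is not None:
            found.extend(rec for k, rec in bucket.records if k == key)
            bucket = bucket.overflow
        return found

    def chain_length(self, key):
        """Number of buckets (primary plus overflow) searched for this key."""
        bucket, n = self._primary(key), 0
        while bucket is not None:
            n += 1
            bucket = bucket.overflow
        return n


hf = StaticHashFile(n_buckets=2, capacity=2, hash_fn=lambda k: k)
for k in (0, 2, 4, 6, 8, 1):
    hf.insert(k, f"rec{k}")
print(hf.lookup(8), hf.chain_length(8), hf.chain_length(1))
```

With only two buckets, the even keys all collide, so a lookup for key 8 has to read three buckets while a lookup for key 1 reads one.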


Hash Indices
Hashing can be used not only for file organization, but also for
index-structure creation.
A hash index organizes the search keys, with their associated record
pointers, into a hash file structure.
Strictly speaking, hash indices are always secondary indices:
if the file itself is organized using hashing, a separate primary
hash index on it using the same search key is unnecessary.
Example of Hash Index
Deficiencies of Static Hashing
In static hashing, function h maps search-key values to a fixed set B
of bucket addresses, but databases grow or shrink with time.
If the initial number of buckets is too small and the file grows,
performance will degrade due to too many overflows.
If space is allocated for anticipated growth, a significant amount of
space will be wasted initially (and buckets will be underfull).
If the database shrinks, space will again be wasted.
One solution: periodic reorganization of the file with a new hash
function.
This is expensive and disrupts normal operations.
Better solution: allow the number of buckets to be modified dynamically.
Dynamic Hashing
Good for databases that grow and shrink in size
Allows the hash function to be modified dynamically
Extendable hashing: one form of dynamic hashing
The hash function generates values over a large range, typically b-bit
integers, with b = 32.
At any time, use only a prefix of the hash value to index into a
table of bucket addresses.
Let the length of the prefix be i bits, 0 ≤ i ≤ 32.
Bucket address table size = 2^i. Initially i = 0.
The value of i grows and shrinks as the size of the database grows
and shrinks.
Multiple entries in the bucket address table may point to the same
bucket.
The number of buckets also changes dynamically due to coalescing
and splitting of buckets.
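The prefix lookup can be illustrated with a short Python sketch. It assumes the hash value is already a b-bit unsigned integer with b = 32; the names B and table_index are hypothetical.

```python
B = 32  # number of bits produced by the hash function, as in the text


def table_index(hash_value: int, i: int) -> int:
    """Return the i-bit prefix (the i most significant bits) of a B-bit hash value."""
    if not 0 <= i <= B:
        raise ValueError("prefix length must satisfy 0 <= i <= B")
    # With i = 0 the shift removes every bit, giving index 0: the single table entry.
    return hash_value >> (B - i)


hv = 0b1011 << 28  # a 32-bit value whose leading bits are 1011
for i in range(4):
    print(f"i={i}: table size {2 ** i}, index {table_index(hv, i)}")
```

Growing i by one bit doubles the table, and every old entry j is replaced by entries 2j and 2j + 1, which initially point to the same bucket.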

General Extendable Hash Structure
Key values to be indexed, in insertion order: 4, 1, 5, 12, 32, 16, 10, 21, 13, 15, 7, 19; then 20 and 29.
Each bucket (data page) holds at most 4 entries.
The directory has 2^d entries, where d is the global depth, and is indexed by the last d bits of the key's binary representation.
Each bucket stores a local depth: the number of trailing bits shared by all keys in that bucket.
Binary representations of the keys:
4 : 100
12: 1100
32: 100000
16: 10000
20: 10100
10: 1010
1 : 01
5 : 101
21: 10101
13: 1101
15: 1111
7 : 111
19: 10011
29: 11101

Global depth 0: the single directory entry points to Bucket A (local depth 0), which holds 4, 1, 5, 12.
Insert 32: Bucket A is full and its local depth equals the global depth, so the directory doubles and the bucket splits on the last bit.
Global depth 1: entry 0 points to Bucket A (local depth 1) with 4, 12, 32; entry 1 points to Bucket B (local depth 1) with 1, 5.
Insert 16: it goes to Bucket A, which now holds 4, 12, 32, 16.
Insert 10: Bucket A is full and its local depth equals the global depth, so the directory doubles again.
Global depth 2: entry 00 points to Bucket A (local depth 2) with 4, 12, 32, 16; entry 10 points to Bucket C (local depth 2) with 10; entries 01 and 11 both point to Bucket B (local depth 1) with 1, 5.
Insert 21 and 13: both go to Bucket B, which now holds 1, 5, 21, 13.
Insert 15: Bucket B is full, but its local depth (1) is less than the global depth (2), so it splits without doubling the directory.
Entry 01 points to Bucket B (local depth 2) with 1, 5, 21, 13; entry 11 points to Bucket D (local depth 2) with 15.
Insert 7 and 19: both go to Bucket D.

Structure after these insertions (global depth 2, every local depth 2):
00: Bucket A: 4, 12, 32, 16
01: Bucket B: 1, 5, 21, 13
10: Bucket C: 10
11: Bucket D: 15, 7, 19

Insert h(r) = 20
20 ends in 00, so it belongs in Bucket A, which is full with local depth equal to the global depth. The directory doubles to global depth 3 and Bucket A splits on the third bit from the right.
000: Bucket A (local depth 3): 32, 16
100: Bucket A2, the split image of Bucket A (local depth 3): 4, 12, 20
001 and 101: Bucket B (local depth 2): 1, 5, 21, 13
010 and 110: Bucket C (local depth 2): 10
011 and 111: Bucket D (local depth 2): 15, 7, 19

Insert h(r) = 29
29 ends in 101, so it belongs in Bucket B, which is full. Its local depth (2) is less than the global depth (3), so it splits without doubling the directory.
001: Bucket B (local depth 3): 1
101: Bucket B2, the split image of Bucket B (local depth 3): 5, 21, 13, 29
All other buckets are unchanged.

Key values to be indexed are 21, 13, 15, 7, 19, 4, 1, 5, 12, 32, 16, 10, 5 (binary representations as listed above).
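The insertion walk-through can be replayed with a small Python sketch. It is an illustration under stated assumptions, not a production structure: the hash function is the identity, the directory is indexed by the last global_depth bits of each key's true binary value (for instance 21 = 10101), keys are assumed distinct, and the class names are hypothetical.

```python
class Bucket:
    def __init__(self, local_depth, capacity):
        self.local_depth = local_depth
        self.capacity = capacity
        self.keys = []


class ExtendableHash:
    """Directory of 2**global_depth bucket references, indexed by trailing key bits."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.global_depth = 0
        self.directory = [Bucket(0, capacity)]

    def _index(self, key):
        return key & ((1 << self.global_depth) - 1)

    def insert(self, key):
        # Assumption: keys are distinct; more than `capacity` equal keys would split forever.
        while True:
            bucket = self.directory[self._index(key)]
            if len(bucket.keys) < bucket.capacity:
                bucket.keys.append(key)
                return
            if bucket.local_depth == self.global_depth:
                # Double the directory: entry j + 2**d points where entry j did.
                self.directory = self.directory + self.directory
                self.global_depth += 1
            self._split(bucket)

    def _split(self, bucket):
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth, self.capacity)  # the "split image"
        high_bit = 1 << (bucket.local_depth - 1)
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            (image if k & high_bit else bucket).keys.append(k)
        # Redirect the directory entries whose newly examined bit is 1.
        for j, b in enumerate(self.directory):
            if b is bucket and j & high_bit:
                self.directory[j] = image


table = ExtendableHash(capacity=4)
for key in [4, 1, 5, 12, 32, 16, 10, 21, 13, 15, 7, 19, 20, 29]:
    table.insert(key)
for j, bucket in enumerate(table.directory):
    print(format(j, f"0{table.global_depth}b"), bucket.local_depth, bucket.keys)
```

Directory entries that print the same key list with a local depth below the global depth are sharing one bucket.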

Deletion in Extendable Hash Structure
To delete a key value,
locate it in its bucket and remove it.
The bucket itself can be removed if it becomes empty (with
appropriate updates to the bucket address table).
Buckets can be coalesced (a bucket j can coalesce only with a
buddy bucket that has the same value of i_j and the same
(i_j - 1)-bit prefix, if such a bucket is present; i_j is the number
of hash bits that bucket j uses).
Decreasing the bucket address table size is also possible.
Note: decreasing the bucket address table size is an expensive
operation and should be done only if the number of buckets becomes
much smaller than the size of the table.
Delete 10
Before: global depth 2. Entry 00 points to Bucket A (local depth 2) with 4, 12, 32, 16; entry 10 points to a bucket (local depth 2) with 10; entries 01 and 11 both point to a bucket (local depth 1) with 1, 5, 21, 13.
Removing 10 leaves its bucket empty, so that bucket is coalesced with its buddy, Bucket A, whose local depth drops to 1. Entries 00 and 10 now both point to Bucket A.
Every bucket now has local depth 1, which is less than the global depth, so the directory can be halved.
After: global depth 1. Entry 0 points to Bucket A (local depth 1) with 4, 12, 32, 16; entry 1 points to the bucket (local depth 1) with 1, 5, 21, 13.
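Deleting a key, coalescing an emptied bucket with its buddy, and halving the directory can be sketched as follows. The representation is an assumption made for brevity: each bucket is a dict with "depth" (local depth) and "keys", the directory is a list of bucket references indexed by the trailing bits of the key, and the function name delete is hypothetical.

```python
def delete(directory, global_depth, key):
    """Remove key, coalesce an emptied bucket with its buddy, then halve if possible.

    Returns the (possibly smaller) directory and the new global depth.
    Raises ValueError if the key is not present.
    """
    j = key & ((1 << global_depth) - 1)
    bucket = directory[j]
    bucket["keys"].remove(key)

    if not bucket["keys"] and bucket["depth"] > 0:
        # The buddy differs from this bucket only in the most recently used bit.
        buddy = directory[j ^ (1 << (bucket["depth"] - 1))]
        if buddy["depth"] == bucket["depth"]:
            buddy["depth"] -= 1
            directory = [buddy if b is bucket else b for b in directory]

    # If no bucket uses all global_depth bits, the upper half mirrors the lower half.
    while global_depth > 0 and all(b["depth"] < global_depth for b in directory):
        global_depth -= 1
        directory = directory[: 1 << global_depth]
    return directory, global_depth


a = {"depth": 2, "keys": [4, 12, 32, 16]}
b = {"depth": 1, "keys": [1, 5, 21, 13]}
c = {"depth": 2, "keys": [10]}
directory, depth = delete([a, b, c, b], 2, 10)
print(depth, [(bk["depth"], bk["keys"]) for bk in directory])
```

Halving is done eagerly here to keep the sketch short; as noted above, a real system would shrink the table only when the number of buckets is much smaller than the table.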

Use of Extendable Hash Structure: Example
Initial hash structure, bucket size = 2
[Figure: hash structure after insertion of one Brighton and two Downtown records; bucket address table entries 0, 1]
[Figure: hash structure after insertion of Mianus record; bucket address table entries 00, 01, 10, 11]
[Figure: hash structure after insertion of three Perryridge records; bucket address table entries 000 to 111]
[Figure: hash structure after insertion of Redwood and Round Hill records; bucket address table entries 000 to 111]
Index Definition in SQL
Create an index
create index <index-name> on <relation-name>
(<attribute-list>)
E.g.:
Simple index: create index b_index on branch(branch_name)
Composite index: create index Tindex on Transac(tranc_no, acct_no)
Use create unique index to indirectly specify and enforce the condition
that the search key is a candidate key.
Not really required if the SQL unique integrity constraint is supported
create unique index Uindex on branch(branch_name)
Reverse-order index (Oracle syntax):
create index b_index on branch(branch_name) reverse
alter index b_index rebuild noreverse
To drop an index
drop index <index-name>
Most database systems allow specification of the type of index, and clustering.
End of Module

Comparison: B-Tree and Hash Indexing
B+ Tree
Supports equality and range searches, multiple attribute
keys and partial key searches
A B-tree index can be used for column comparisons
in expressions that use the =, >, >=, <, <=, or
BETWEEN operators. The index can also be used for
LIKE comparisons if the argument to LIKE is a
constant string that does not start with a wildcard
character, e.g.:
SELECT * FROM tbl_name WHERE key_col LIKE 'Patrick%';
SELECT * FROM tbl_name WHERE key_col LIKE 'Pat%_ck%';
Either a separate index or the basis for a storage
structure
Responds to dynamic changes in the table
Hash Indices - Problems
Does not support range search
Since adjacent elements in a range might hash to different buckets,
there is no way to scan buckets to locate all search-key values v
between v_1 and v_2
Although it supports multiple attribute keys, it
does not support partial key search
The entire value of v must be provided to h
Dynamically growing files produce overflow
chains, which negate the efficiency of the
algorithm

Extendable Hashing
Eliminates overflow chains by splitting a bucket when it overflows
The range of the hash function has to be extended to accommodate
additional buckets
Example family of hash functions based on h:
h_k(v) = h(v) mod 2^k (use the last k bits of h(v))
At any given time a unique hash function, h_k, is used, depending on
the number of times buckets have been split
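The family h_k can be written directly in Python. The base function h below is a stand-in (the identity on non-negative integers) chosen only so the bit patterns are easy to read; it is an assumption, not part of the slides.

```python
def h(v: int) -> int:
    """Stand-in base hash function (assumption: identity on non-negative integers)."""
    return v


def h_k(v: int, k: int) -> int:
    """h_k(v) = h(v) mod 2**k, i.e. the last k bits of h(v)."""
    return h(v) % (1 << k)


for k in range(4):
    print(k, [h_k(v, k) for v in (4, 12, 20, 29)])
```

Moving from h_k to h_(k+1) looks at one more bit, so the keys of a bucket either stay where they are or move to exactly one new bucket.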
Extendable Hashing
Deficiencies:
Extra space for directory
Cost of added level of indirection:
If directory cannot be accommodated in main memory, an
additional page transfer is necessary.
Linear Hashing
Choosing An Index
An index should support the query of the application that has the most
impact on performance
Choice based on frequency of invocation, execution time, acquired
locks, table size

Ex 1 SELECT E.Id
FROM Employee E
WHERE E.Salary < 100000 AND E.Salary > 50000

Choose B+ tree with search key = Salary
(this is a range query, which a hash index cannot support)

Ex 2 SELECT T.CrsCode, T.Grade
FROM Transcript T
WHERE T.StudId = :id

Choose B+ tree or hash with search key StudId

Ex 3 SELECT T.CrsCode, T.Grade
FROM Transcript T
WHERE T.StudId = :id AND T.Semester = 'F2000'

Choose B+ tree or hash with search key StudId,
since Semester is not as selective as StudId

End of Chapter
