Documente Academic
Documente Profesional
Documente Cultură
Ordered Indices
B+-Tree Index Files
B-Tree Index Files
Static Hashing
Dynamic Hashing
Comparison of Ordered Indexing and Hashing
Index Definition in SQL
Multiple-Key Access
11.2
Basic Concepts
Indexing mechanisms used to speed up access to desired data.
file
An index file consists of records (called index entries) of the form
search-key
pointer
Index files are typically much smaller than the original file
11.3
Access time
Insertion time
Deletion time
Space overhead
11.4
Ordered Indices
In an ordered index, index entries are stored sorted on the search key
The search key of a primary index is usually but not necessarily the
primary key
11.5
11.6
dept_name
11.7
values
11.8
Good tradeoff: sparse index with an index entry for every block in file,
11.9
11.10
Multilevel Index
If primary index does not fit in memory, access becomes
expensive
Solution: treat primary index kept on disk as a sequential file
11.11
11.12
Sparse indices
If
11.13
11.14
Secondary Indices
Frequently, one wants to find all the records whose values in
11.15
11.16
11.17
Example of B+-Tree
11.18
n children.
A leaf node has between (n1)/2 and n1 values
Special cases:
11.19
11.20
11.21
11.22
Example of B+-tree
11.23
indices.
The B+-tree contains a relatively small number of levels
Level
Next
..
etc.
11.24
Queries on B+-Trees
Find record with search-key value V.
1.
2.
3.
4.
5.
C=root
While C is not a leaf node {
1. Let i be least value s.t. V Ki.
2. If no such exists, set C = last non-null pointer in C
3. Else { if (V= Ki ) Set C = Pi +1 else set C = Pi}
}
Let i be least value s.t. Ki = V
If there is such a value i, follow pointer Pi to the desired record.
Else no record with search-key value k exists.
11.25
Handling Duplicates
With duplicate search keys
11.26
Handling Duplicates
We modify find procedure as follows
traverse Pi even if V = Ki
Procedure printAll
** Errata note: modified find procedure missing in first printing of 6th edition
Database System Concepts - 6th Edition
11.27
kilobytes
Contrast this with a balanced binary tree with 1 million search key
11.28
2.
2.
3.
11.29
let the new node be p, and let k be the least key value in p. Insert
(k,p) in the parent of the node being split.
If the parent is full, split it and propagate the split further up.
Splitting of nodes proceeds upwards till a node that is not full is found.
In the worst case the root node may be split increasing the height
of the tree by 1.
Result of splitting node containing Brandt, Califieri and Crick on inserting Adams
Next step: insert entry with (Califieri,pointer-to-new-node) into parent
Database System Concepts - 6th Edition
11.30
B+-Tree Insertion
11.31
B+-Tree Insertion
11.32
internal node N
Adams Brandt
11.33
Crick
11.34
11.35
Value separating two nodes (at the parent) is pulled down when merging
Root node then has only one child, and is deleted
Database System Concepts - 6th Edition
11.36
the node and a sibling fit into a single node, then merge siblings:
Insert all the search-key values in the two nodes into a single node
(the one on the left), and delete the other node.
Delete the pair (Ki1, Pi), where Pi is the pointer to the deleted
node, from its parent, recursively using the above procedure.
11.37
entries in the node and a sibling do not fit into a single node, then
redistribute pointers:
Redistribute the pointers between the node and a sibling such that
both have more than the minimum number of entries.
The node deletions may cascade upwards till a node which has n/2
11.38
Deletion
Low
Simpler
Widely
used
11.39
Organization.
The leaf nodes in a B+-tree file organization store records, instead of
pointers.
Leaf nodes are still required to be half full
Insertion and deletion are handled in the same way as insertion and
11.40
pointers.
To improve space utilization, involve more sibling nodes in redistribution
11.41
11.42
Indexing Strings
Variable length strings as keys
Variable fanout
Prefix compression
11.43
Efficient alternative 1:
details as an exercise
11.44
pointers.
11.45
11.46
11.47
Multiple-Key Access
Use multiple indices for certain types of queries.
Example:
select ID
from instructor
where dept_name = Finance and salary = 80000
Possible strategies for processing query using indices on
single attributes:
1. Use index on dept_name to find instructors with
department name Finance; test salary = 80000
2. Use index on salary to find instructors with a salary of
$80000; test dept_name = Finance.
3. Use dept_name index to find pointers to all records
pertaining to the Finance department. Similarly use index
on salary. Take intersection of both sets of pointers
obtained.
Database System Concepts - 6th Edition
11.48
a1 < b1, or
11.49
May fetch many records that satisfy the first but not the second
condition
11.50
Other Features
Covering indices
11.51
Hashing
Static Hashing
A bucket is a unit of storage containing one or more records (a
as deletion
Records with different search-key values may be mapped to the
11.53
integer i
The hash function returns the sum of the binary representations of
E.g. h(Music) = 1
h(History) = 2
h(Physics) = 3 h(Elec. Eng.) = 3
11.54
11.55
Hash Functions
Worst hash function maps all search-key values to the same bucket;
same number of search-key values from the set of all possible values
Ideal hash function is random, so each bucket will have the same
11.56
Insufficient buckets
11.57
11.58
Hash Indices
Hashing can be used not only for file organization, but also for index-
structure creation
A hash index organizes the search keys, with their associated record
11.59
11.60
function
11.61
Dynamic Hashing
Good for database that grows and shrinks in size
Allows the hash function to be modified dynamically
Extendable hashing one form of dynamic hashing
11.62
11.63
All the entries that point to the same bucket have the same values on
the first ij bits
1. Compute h(Kj) = X
2. Use the first i high order bits of X as a displacement into bucket
address table, and follow the pointer to appropriate bucket
To insert a record with search-key value Kj
Else the bucket must be split and insertion re-attempted (next slide)
11.64
11.65
11.66
11.67
Example (Cont.)
Initial Hash structure; bucket size = 2
11.68
Example (Cont.)
11.69
Example (Cont.)
Hash structure after insertion of Einstein record
11.70
Example (Cont.)
Hash structure after insertion of Gold and El Said records
11.71
Example (Cont.)
11.72
Example (Cont.)
And after insertion of
eleven records
11.73
Example (Cont.)
And after insertion of
Kim record in previous
hash structure
11.74
11.75
In practice:
11.76
Bitmap Indices
Bitmap indices are a special type of index designed for efficient
from, say, 0
of distinct values
11.77
In a bitmap for value v, the bit for a record is 1 if the record has the
value v for the attribute, and is 0 otherwise
11.78
Intersection (and)
Union (or)
Complementation (not)
Each operation takes two bitmaps of the same size and applies the
operation on corresponding bits to get the result bitmap
11.79
E.g. if record is 100 bytes, space for a single bitmap is 1/800 of space
used by relation.
not(A=v):
11.80
11.81
To drop an index
clustering.
11.82
End of Chapter
Figure 11.01
11.84
Figure 11.15
11.85
Partitioned Hashing
Hash values are split into segments that depend on each
(customer-street, customer-city)
search-key value
(Main, Harrison)
(Main, Brooklyn)
(Park, Palo Alto)
(Spring, Brooklyn)
(Alma, Palo Alto)
hash value
101 111
101 001
010 010
001 001
110 010
11.86
Grid Files
Structure used to speed the processing of general multiple search-
11.87
11.88
(a1 A a2)
(b1 B b2)
(a1 A a2 b1 B b2),.
corresponding candidate grid array cells, and look up all the buckets
pointed to from those cells.
11.89
cells.
11.90