B+-Tree Indexes
Overview
Index classification
– Primary/Secondary, Clustered/Non-clustered
– Sparse/Dense, Single-attribute/Composite
B+-Tree
– Structure, Search etc.
– Insert and Delete algorithms
– Duplicate keys, variable-length keys, bulk loading
Basics
Index Classification
[Figure: a data file of records (e.g. Cass, 50, 5004; Smith, 44, 3000; Tracy, 44, 5004) with a sparse index on Name and a dense index on Age.]
– Can a non-clustered index be dense or sparse?
Benefits and Costs
The cost of retrieving data records through an index varies greatly with the
type of index.
– (E.g.) For an exact-match query with a unique/non-unique, clustered/non-
clustered, dense/sparse index, how many pages need to be retrieved from the
index and the base relation?
– (E.g.) How about a range query?
Some indexes may be more costly to build or maintain than others
– To build or maintain a clustered index, the records in the base relation must
be sorted and kept in sorted order.
– For dynamic data, it is costly to maintain the sorted order.
– To reduce the cost of dynamic insertions,
Keep some free space on each page of the base relation for future insertions.
Use overflow pages (with links) for further insertions. (Thus, the order of data
records is 'close to', but not identical to, the sort order.)
Ex) Cost of an indexed scan: clustered vs. non-clustered. Which is better?
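One rough way to approach the last question is a back-of-the-envelope page-I/O count. The sketch below is a simplified model that is not taken from the slides: it assumes no buffering, that matching records are stored contiguously under a clustered index, and that in the worst case every match costs a separate data-page read under a non-clustered index. All function and parameter names are illustrative.

```python
from math import ceil

def clustered_scan_cost(index_levels, matches, records_per_page):
    # Descend the index once, then read the contiguous run of data pages.
    return index_levels + ceil(matches / records_per_page)

def nonclustered_scan_cost(index_levels, matches, entries_per_leaf):
    # Descend the index, scan the leaf pages holding the matching entries,
    # then (worst case) fetch one data page per matching record.
    return index_levels + ceil(matches / entries_per_leaf) + matches

# Hypothetical numbers: 3 index levels, 1,000 matching records,
# 100 records per data page, 200 index entries per leaf page.
print(clustered_scan_cost(3, 1000, 100))      # 13
print(nonclustered_scan_cost(3, 1000, 200))   # 1008
```

Under these assumptions the gap grows with the number of matching records, which is why a non-clustered index tends to pay off only for selective queries.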
B+-Tree: Introduction
Data Entries
("Sequence set")
B+-Tree: Node Structures
Internal nodes
N P1 K1 P2 K2 ... Pd-1 Kd-1 Pd
Root: 13 17 24 30
Leaves: 2* 3* 5* 7* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*
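The node layout and the example tree above can be sketched in a few lines. This is a minimal illustration, assuming that a key equal to a separator K_i is found under the pointer to its right (P_i+1); the `Node` class and `search` function are hypothetical names, and leaves hold bare keys instead of full data entries.

```python
from bisect import bisect_right

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None marks a leaf node

def search(node, key):
    # At each internal node, follow the pointer whose key range covers `key`.
    while node.children is not None:
        node = node.children[bisect_right(node.keys, key)]
    return key in node.keys

# The example tree from the slide: one root and five leaves.
leaves = [Node([2, 3, 5, 7]), Node([14, 16]), Node([19, 20, 22]),
          Node([24, 27, 29]), Node([33, 34, 38, 39])]
root = Node([13, 17, 24, 30], leaves)

print(search(root, 24), search(root, 15))  # True False
```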
Inserting a Key into a B+-Tree
Find the correct leaf node L.
Put the key entry into L.
– If L has enough space, done!
– Else, must split L (into L and a new node L2)
Redistribute entries evenly, copy up the middle key.
Insert a new index entry pointing to L2 into the parent of L.
Node splitting can happen recursively
– To split a non-leaf node, redistribute entries evenly, but
push up the middle key. (Contrast with leaf splits.)
Splits grow the tree wider or taller
– Splitting the root node makes the tree one level taller.
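The steps above can be sketched as a short recursive routine. This is a minimal illustration under stated assumptions, not a production implementation: `MAX_KEYS = 4` is an assumed node capacity chosen to match the example trees, leaves hold bare keys, sibling links between leaves are omitted, and `Node`, `insert` and `_insert` are hypothetical names.

```python
from bisect import bisect_right, insort

MAX_KEYS = 4  # assumed capacity per node

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None marks a leaf node

def _insert(node, key):
    """Insert key below node; return (separator, new_right_node) on a split."""
    if node.children is None:                     # leaf
        insort(node.keys, key)
        if len(node.keys) <= MAX_KEYS:
            return None
        mid = len(node.keys) // 2
        right = Node(node.keys[mid:])
        node.keys = node.keys[:mid]
        return right.keys[0], right               # copy up the middle key
    i = bisect_right(node.keys, key)
    split = _insert(node.children[i], key)
    if split is None:
        return None
    sep, right = split
    node.keys.insert(i, sep)                      # new index entry for the new node
    node.children.insert(i + 1, right)
    if len(node.keys) <= MAX_KEYS:
        return None
    mid = len(node.keys) // 2
    up = node.keys[mid]
    new_right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return up, new_right                          # push up the middle key

def insert(root, key):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key)
    if split is None:
        return root
    sep, right = split
    return Node([sep], [root, right])             # root split: one level taller
```

Note the asymmetry: after a leaf split the separator still appears in the right leaf (copy up), whereas after an internal split the middle key moves to the parent and disappears from both halves (push up).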
Database Systems Implementation, Bongki Moon 13
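[Figure: inserting 8* into the tree below. The leaf 2* 3* 5* 7* splits and 5 is copied up; the root then splits and 17 is pushed up, leaving 5 13 and 24 30 in the two internal nodes.]
Root: 13 17 24 30
Leaves: 2* 3* 5* 7* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*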
Example B+-Tree After Inserting 8*
Root: 17
Internal nodes: 5 13 | 24 30
Leaves: 2* 3* | 5* 7* 8* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*
Example: Deleting 19* and 20*
Before:
Root: 17
Internal nodes: 5 13 | 24 30
Leaves: 2* 3* | 5* 7* 8* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*
After (index entries only):
Root: 17
Internal nodes: 5 13 | 27 30
Example B+-Tree After Deleting 24*
Before the internal nodes merge:
Root: 17
Internal nodes: 5 13 | 30
After:
Root: 5 13 17 30
Two internal nodes merge: key 17 is pulled down from the root.
Example of Non-leaf Re-distribution
Before:
Root: 22
Internal nodes: 5 13 17 20 | 30
After:
Root: 17
Internal nodes: 5 13 | 20 22 30
Entries are re-distributed by 'pushing through' the entries in the parent node.
– Pull down 22 from the root, and push up 20 to the root. We can stop here.
– It is OK to do it once more: pull down 20 from the root, and push up 17 to the root.
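A single 'push through' step can be written down directly. The sketch below is a simplified illustration that tracks keys only; in a real node the last child pointer of the left sibling would move along and become the first child pointer of the right sibling. The function name `push_through_once` is hypothetical.

```python
def push_through_once(left, separator, right):
    """Move one entry from the fuller left sibling to the right sibling.

    The parent's separator is pulled down to the front of the right node,
    and the last key of the left node is pushed up as the new separator.
    """
    return left[:-1], left[-1], [separator] + right

# The example from the slide: root 22, children 5 13 17 20 and 30.
step1 = push_through_once([5, 13, 17, 20], 22, [30])
step2 = push_through_once(*step1)
print(step1)  # ([5, 13, 17], 20, [22, 30])
print(step2)  # ([5, 13], 17, [20, 22, 30])
```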
Insert
– Split a leaf node; copy up the middle key.
– Split an internal node; push up the middle key.
Delete
– Merge leaf nodes; remove the middle key from the parent.
– Merge internal nodes; pull down the middle key from the parent.
– Redistribute leaf nodes; remove the middle key from the parent, and
copy up a new middle key.
– Redistribute internal nodes; pull down the middle key from the
parent to one child node, and push up a new middle key from the
other child node.
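The two leaf-level delete cases can be illustrated on plain key lists. This is a sketch under assumptions that are not stated in the slides: `MIN_KEYS = 2` is an assumed minimum occupancy that fits the example trees, only the right sibling is considered, and the name `fix_leaf_underflow` is hypothetical.

```python
MIN_KEYS = 2  # assumed minimum number of entries per leaf

def fix_leaf_underflow(leaf, right):
    """Repair an underfull leaf using its right sibling.

    Returns (leaf, right, separator). After a merge, right and separator
    are None, meaning the parent must remove the old middle key.
    """
    if len(right) > MIN_KEYS:
        # Redistribute: borrow the sibling's first entry, then copy up
        # the sibling's new first key as the new middle key.
        leaf = leaf + [right[0]]
        right = right[1:]
        return leaf, right, right[0]
    # Merge: the sibling has nothing to spare, so combine both leaves.
    return leaf + right, None, None

# Deleting 19* and 20* leaves [22] next to [24, 27, 29]: redistribute.
print(fix_leaf_underflow([22], [24, 27, 29]))  # ([22, 24], [27, 29], 27)
# Deleting 24* then leaves [22] next to [27, 29]: merge.
print(fix_leaf_underflow([22], [27, 29]))      # ([22, 27, 29], None, None)
```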
Other Issues of B+-tree
Variable-Length Keys
Longer keys may reduce the fan-out and make the index
taller. (The taller the tree, the more disk accesses.)
Prefix-key compression is done to increase fan-out.
Key values in index entries only 'direct traffic', so they can often
be compressed.
– (E.g.) If we have adjacent index entries with search key values
Dannon Yogurt, David Smith and Devarakonda Murthy, we can
abbreviate them to Dan, Dav and Dev.
What if there is a data entry Davey Jones? (Then, we can only
compress David Smith to Davi.)
Insert/delete must be suitably modified.
What if the keys (or strings) are longer than a page?
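The Davey Jones case suggests one way to pick a compressed key: take the shortest prefix of the full key that is still greater than every key to its left. The sketch below is an illustrative assumption, not the method prescribed by the slides; `shortest_separator` is a hypothetical name and plain string comparison stands in for the index's collation. It can return prefixes shorter than the three-letter abbreviations used in the example.

```python
def shortest_separator(left_max, right_min):
    """Shortest prefix p of right_min with left_max < p <= right_min.

    left_max is the largest key in the subtree to the left of the entry,
    right_min is the full key being compressed.
    """
    for length in range(1, len(right_min) + 1):
        prefix = right_min[:length]
        if prefix > left_max:
            return prefix
    raise ValueError("right_min must be greater than left_max")

print(shortest_separator("Dannon Yogurt", "David Smith"))  # Dav
print(shortest_separator("Davey Jones", "David Smith"))    # Davi
```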
[Figure: bulk loading of a B+ tree. A root node and the sorted pages of data entries, not yet in the B+ tree: 3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*]
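Starting from sorted data entries like these, a tree can be built bottom-up instead of by repeated inserts. The sketch below uses a simple level-by-level build, which is one possible variant and not necessarily the procedure the lecture has in mind; the page capacities (`leaf_capacity`, `fanout`) and all names are assumptions. As a simplification, the last node on a level may end up underfull.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []  # empty list marks a leaf node

def min_key(node):
    # Smallest key stored in the subtree rooted at node.
    while node.children:
        node = node.children[0]
    return node.keys[0]

def bulk_load(sorted_keys, leaf_capacity=2, fanout=3):
    if not sorted_keys:
        return Node([])
    # Pack the sorted entries into leaf pages.
    level = [Node(sorted_keys[i:i + leaf_capacity])
             for i in range(0, len(sorted_keys), leaf_capacity)]
    # Repeatedly group nodes under parents until one root remains.
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            separators = [min_key(child) for child in group[1:]]
            parents.append(Node(separators, group))
        level = parents
    return level[0]

entries = [3, 4, 6, 9, 10, 11, 12, 13, 20, 22, 23, 31, 35, 36, 38, 41, 44]
root = bulk_load(entries)
print(root.keys)                            # [12, 35]
print([c.keys for c in root.children])      # [[6, 10], [20, 23], [38, 44]]
```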
B+-Trees in Practice
Summary
B+-tree is a dynamic structure.
– Inserts/deletes leave the tree height-balanced; log_F N cost.
– Adjusts to growth gracefully.
– High fanout (F) means depth is rarely more than 3 or 4.
– Typically, 67% occupancy on average.
– We will discuss B-tree locking soon.
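The log_F N bound and the 'rarely more than 3 or 4' claim can be checked with a few lines of arithmetic. The sketch below assumes N counts leaf pages and that every internal node has exactly F children; the fan-out of 100 is a hypothetical figure, and `index_levels` is an illustrative name. It uses integer arithmetic to avoid floating-point rounding in the logarithm.

```python
def index_levels(leaf_pages, fanout):
    """Smallest number of index levels h such that fanout**h >= leaf_pages."""
    levels, reachable = 0, 1
    while reachable < leaf_pages:
        reachable *= fanout
        levels += 1
    return levels

print(index_levels(10**6, 100))  # 3
print(index_levels(10**8, 100))  # 4
```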
Most widely used index in database management systems
because of its versatility. One of the most optimized
components of a DBMS.
Discussions
– B-Tree vs. B+-Tree
– BST vs. B+-Tree