Database Design and Implementation 05.btree

Introduction to Indexing
B+-Tree Indexes
Ramakrishnan & Gehrke: Chap. 8.2, 10.3-10.8
Database Systems Implementation, Bongki Moon 1
Overview
Index classification
– Primary/Secondary, Clustered/Non-clustered
– Sparse/Dense, Single-attribute/Composite
B+-Tree
– Structure, Search etc.
– Insert and Delete algorithms
– Duplicate keys, variable-length keys, bulk loading
Basics
An index speeds up selections on the search key fields.
– Any subset of the fields of a relation can be the search key for an index on the relation.
An index is typically stored as a separate file (or relation).
It is typically much smaller than the base relation.
Index Classification
Primary vs. secondary: If the search key contains the primary key, then the index is called a primary index.
– Primary index value is unique (e.g., SSN).
– Secondary index allows duplicate key values (e.g., names, GPA).
– Unique index: Search key is or contains a candidate key.
Clustered vs. Non-clustered: If the order of data records is the same as, or 'close to', the order of keys, then the index is called a clustered index.
How many clustered and non-clustered indexes can a relation have?
[Figure: CLUSTERED vs. UNCLUSTERED index. In both, index entries (index file) direct the search for data entries, and the data entries point to data records (data file).]
Dense vs. Sparse: If there is a 1-to-1 mapping between key values (in the index) and data records, then the index is dense.
– Can a clustered index be dense or sparse?
– Can a non-clustered index be dense or sparse?
[Figure: a data file with a sparse index on Name and a dense index on Age]
Data file (name, age, sal): Ashby, 25, 3000 | Basu, 33, 4003 | Bristow, 30, 2007 | Cass, 50, 5004 | Daniels, 22, 6003 | Jones, 40, 6003 | Smith, 44, 3000 | Tracy, 44, 5004
Sparse index on Name: Ashby | Cass | Smith
Dense index on Age: 22 | 25 | 30 | 33 | 40 | 44 | 44 | 50
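The dense/sparse distinction can be sketched in a few lines of Python. This is only an illustration built on the eight records of the figure; the page size of three records and the helper names (`lookup_by_name`, `lookup_by_age`) are assumptions made for the sketch, not part of the slides.

```python
from bisect import bisect_right

# (name, age, sal) records of the data file, kept sorted by name.
RECORDS = [
    ("Ashby", 25, 3000), ("Basu", 33, 4003), ("Bristow", 30, 2007),
    ("Cass", 50, 5004), ("Daniels", 22, 6003), ("Jones", 40, 6003),
    ("Smith", 44, 3000), ("Tracy", 44, 5004),
]
PAGE_SIZE = 3  # records per page: an assumption chosen to match the figure

pages = [RECORDS[i:i + PAGE_SIZE] for i in range(0, len(RECORDS), PAGE_SIZE)]

# Sparse index on name: one entry per data page (its first name).
sparse_name = [(page[0][0], page_no) for page_no, page in enumerate(pages)]

# Dense index on age: one entry per record, sorted by age.
dense_age = sorted(
    (rec[1], (page_no, slot))
    for page_no, page in enumerate(pages)
    for slot, rec in enumerate(page)
)

def lookup_by_name(name):
    """Use the sparse index to pick one page, then scan that page."""
    keys = [key for key, _ in sparse_name]
    pos = bisect_right(keys, name) - 1
    if pos < 0:
        return None  # name sorts before the first page
    for rec in pages[sparse_name[pos][1]]:
        if rec[0] == name:
            return rec
    return None

def lookup_by_age(age):
    """Use the dense index: every matching record has its own entry."""
    return [pages[p][s] for a, (p, s) in dense_age if a == age]
```

The sparse index works only because the data file is sorted on the same key; the dense index on age needs one entry per record because the file is not sorted by age.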
Benefits and Costs
Cost of retrieving data records through index varies greatly based on
types of indexes.
– (E.g.) For an exact-match query with a unique/non-unique, clustered/non-
clustered, dense/sparse index, how many pages need to be retrieved from the
index and the base relation?
– (E.g.) How about a range query?
Some may be more costly than others
– To build or maintain a clustered index, the records in the base relation must be sorted and kept in sorted order.
– For dynamic data, it will be costly to maintain the sorted order.
– To reduce the cost of dynamic insertions,
  – Keep some free space on each page of the base relation for future insertions.
  – Use overflow pages (with links) for more future insertions. (Thus, the order of data records is 'close to', but not identical to, the sort order.)
Ex) Cost of indexed scan: clustered vs. non-clustered. Which is better?
Composite Search Keys
Search on a combination of columns.
– Equality query: age=20 and sal=75
– Range query: age=20; or age=20 and sal > 10
Keys in index should be in sorted order by search key to support range queries.
– But, how for multiple columns?
– Certain queries may benefit from a particular order.
– (E.g.) age-then-sal, sal-then-age
Asymmetric vs. Symmetric
[Figure: examples of composite key indexes using lexicographic order]
Data records sorted by name (name, age, sal): bob, 12, 10 | cal, 11, 80 | joe, 12, 20 | sue, 13, 75
Data entries in index sorted by <age, sal>: 11,80 | 12,10 | 12,20 | 13,75
Data entries sorted by <age>: 11 | 12 | 12 | 13
Data entries in index sorted by <sal, age>: 10,12 | 20,12 | 75,13 | 80,11
Data entries sorted by <sal>: 10 | 20 | 75 | 80
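A short Python sketch of why the column order of a composite key matters. It uses the four example records (bob, cal, joe, sue) and Python's built-in lexicographic tuple ordering; the function names are hypothetical, and the query values (age = 12, sal > 10) are picked to fit the example data rather than taken from the slide.

```python
from bisect import bisect_right

# (name, age, sal) records from the example.
records = [("bob", 12, 10), ("cal", 11, 80), ("joe", 12, 20), ("sue", 13, 75)]

# Data entries sorted lexicographically on each composite key.
age_sal = sorted((age, sal) for _, age, sal in records)
sal_age = sorted((sal, age) for _, age, sal in records)

def on_age_sal(entries, age, min_sal):
    """age = `age` AND sal > `min_sal` on an <age, sal> index.

    The qualifying entries form one contiguous run, found by two searches.
    """
    lo = bisect_right(entries, (age, min_sal))
    hi = bisect_right(entries, (age, float("inf")))
    return entries[lo:hi]

def on_sal_age(entries, age, min_sal):
    """Same query on a <sal, age> index.

    Only the sal > min_sal part narrows the scan; the age condition has to
    be checked entry by entry because matching ages are not adjacent.
    """
    lo = bisect_right(entries, (min_sal, float("inf")))
    scanned = entries[lo:]
    return [(a, s) for s, a in scanned if a == age], len(scanned)
```

For age = 12 and sal > 10, the <age, sal> index touches only the single matching entry, while the <sal, age> index scans three entries to return the same one.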
B+-Tree: Introduction
Most widely used.
Supports both range and equality searches efficiently.
Dynamic index
– Adjusts gracefully under insertions and deletions.
– Keeps tree height-balanced.
– The cost of exact-match/insertion/deletion is log_F N, where F is the fanout and N is the number of leaf nodes.
B+-Tree: Structural Characteristics
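Root node has at least two children.
Minimum 50% occupancy (except for root). For a B+-tree of order d,
– An internal node has d/2 ≤ m ≤ d children.
– A leaf node has d/2 ≤ m ≤ d-1 pointers to data pages.
– When d is odd, d/2 is rounded: the order-5 examples in these slides require at least 3 children per internal node and at least 2 entries per leaf.
[Figure: index entries (direct search) on top, with < and ≥ branches, leading down to the data entries (the "sequence set").]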
B+-Tree: Node Structures
Internal nodes
N P_1 K_1 P_2 K_2 ... P_{d-1} K_{d-1} P_d
– N : the number of valid entries.
– K_i : key values (K_1 < K_2 < … < K_{d-1}).
– P_i : tree pointers to internal or leaf nodes.
Leaf nodes
N P_pred K_1 P_1 K_2 P_2 ... K_{d-1} P_{d-1} P_next
– P_pred, P_next : pointers to neighboring leaf nodes.
– P_i : data pointers to a record or a block.
Example B+-Tree (of order 5)
Search begins at root, and key comparisons direct it to a leaf.
(E.g.) Search for 5, 15, or all data entries > 24 ...
Root: [13 17 24 30]
Leaves: [2* 3* 5* 7*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]
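The search procedure can be sketched in Python on the order-5 example tree (root keys 13, 17, 24, 30). The class and function names below are assumptions for illustration; the leaf keeps only the P_next pointer, and data pointers are omitted so that a leaf is just a sorted list of keys.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Leaf:
    keys: List[int]
    next: Optional["Leaf"] = None   # P_next; P_pred is omitted in this sketch

@dataclass
class Internal:
    keys: List[int]                 # K_1 < K_2 < ...
    children: list                  # always one more child than keys

def find_leaf(node, key):
    """Descend from the root; child i covers K_i <= key < K_{i+1}."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node

def search(root, key):
    """Exact-match search: one root-to-leaf path."""
    return key in find_leaf(root, key).keys

def range_greater_than(root, low):
    """All data entries with key > low, following the leaf chain."""
    leaf = find_leaf(root, low)
    result = []
    while leaf is not None:
        result.extend(k for k in leaf.keys if k > low)
        leaf = leaf.next
    return result

# The example tree of order 5.
leaves = [Leaf([2, 3, 5, 7]), Leaf([14, 16]), Leaf([19, 20, 22]),
          Leaf([24, 27, 29]), Leaf([33, 34, 38, 39])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Internal([13, 17, 24, 30], leaves)
```

Here `search(root, 5)` finds its key in the first leaf, `search(root, 15)` ends in the leaf [14, 16] and fails, and `range_greater_than(root, 24)` starts in the leaf holding 24 and walks right along the sequence set.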
Inserting a Key into a B+-Tree
Find the correct leaf node L.
Put a key entry into L.
– If L has enough space, done!
– Else, must split L (into L and a new node L2)
  – Redistribute entries evenly, copy up the middle key.
  – Insert a new index entry pointing to L2 into the parent of L.
Node splitting can happen recursively
– To split a non-leaf node, redistribute entries evenly, but push up the middle key. (Contrast with leaf splits.)
Splits grow the tree wider or taller
– Splitting the root node makes the tree one level taller.
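The insertion steps above can be sketched as a small recursive routine. This is a minimal illustration, not the course's reference implementation: the `Node` class and function names are hypothetical, leaves hold bare keys without data pointers, the leaf chain is not maintained, and the split point (the lower half keeps `len // 2` keys) is one of several reasonable choices.

```python
from bisect import bisect_right, insort

ORDER = 5  # d: at most d children per internal node, d-1 entries per leaf

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children    # None marks a leaf

def insert(root, key):
    """Insert `key` and return the (possibly new) root."""
    split = _insert(root, key)
    if split is None:
        return root
    separator, right = split
    return Node([separator], [root, right])   # root split: one level taller

def _insert(node, key):
    """Return None, or (separator, new_right_node) if `node` was split."""
    if node.children is None:                  # leaf
        insort(node.keys, key)
        if len(node.keys) <= ORDER - 1:
            return None
        mid = len(node.keys) // 2
        right = Node(node.keys[mid:])
        node.keys = node.keys[:mid]
        return right.keys[0], right            # copy up: key stays in leaf
    i = bisect_right(node.keys, key)
    split = _insert(node.children[i], key)
    if split is None:
        return None
    separator, right = split
    node.keys.insert(i, separator)
    node.children.insert(i + 1, right)
    if len(node.children) <= ORDER:
        return None
    mid = len(node.keys) // 2
    up = node.keys[mid]                        # push up: key leaves this node
    new_right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return up, new_right

# Insert 8 into the order-5 example tree.
root = Node([13, 17, 24, 30],
            [Node([2, 3, 5, 7]), Node([14, 16]), Node([19, 20, 22]),
             Node([24, 27, 29]), Node([33, 34, 38, 39])])
root = insert(root, 8)
```

After the call, the leaf [2, 3, 5, 7] has split into [2, 3] and [5, 7, 8] with 5 copied up, the full index node has split into [5, 13] and [24, 30], and 17 has been pushed up into a new root.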
Inserting 8* into Example B+-Tree
Root: [13 17 24 30]
Leaves: [2* 3* 5* 7*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]
Leaf split: [2* 3* 5* 7*] plus 8* becomes [2* 3*] and [5* 7* 8*].
– Note that 5 is copied up and continues to appear in the leaf.
Index node split: [5 13 17 24 30] becomes [5 13] and [24 30].
– Note that 17 is pushed up and appears only once in the index.
Example B+-Tree After Inserting 8*
Root: [17]
Internal nodes: [5 13] [24 30]
Leaves: [2* 3*] [5* 7* 8*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]
Notice that the root was split, leading to an increase in height.
In this example, we can avoid the split by re-distributing entries; however, this is usually not done in practice.
Deleting a Key from a B+-Tree
Start at the root, find the leaf L where the entry belongs.
Remove the entry.
– If L is at least half-full, done!
– If L has fewer than d/2 entries (pointers),
  – Try to re-distribute, borrowing from a sibling (adjacent node with the same parent as L).
  – If re-distribution fails, merge L and a sibling.
If a merge occurred, must delete an entry (pointing to L or the sibling) from the parent of L.
Merge could propagate to the root, decreasing the height.
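The leaf-level part of deletion can be sketched as follows. This is a simplified illustration under stated assumptions: the names are hypothetical, the minimum of 2 entries per leaf is the one used by the order-5 examples, the right sibling is preferred when it exists (the slides do not say which sibling to try first), and underflow of the parent after a merge is left to the caller.

```python
MIN_LEAF = 2   # minimum entries per leaf in the order-5 examples (assumption)

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children    # None marks a leaf

def delete_in_leaf(parent, i, key):
    """Delete `key` from leaf parent.children[i] and repair leaf underflow.

    Returns "done", "redistributed" or "merged". After a merge the caller
    still has to check `parent` itself for underflow (not handled here).
    """
    leaf = parent.children[i]
    leaf.keys.remove(key)
    if len(leaf.keys) >= MIN_LEAF:
        return "done"
    # Pair the leaf with its right sibling if it has one, else the left.
    left_i = i if i + 1 < len(parent.children) else i - 1
    left, right = parent.children[left_i], parent.children[left_i + 1]
    if len(left.keys) + len(right.keys) >= 2 * MIN_LEAF:
        # Redistribute evenly, then copy up the new middle key.
        combined = left.keys + right.keys
        mid = len(combined) // 2
        left.keys, right.keys = combined[:mid], combined[mid:]
        parent.keys[left_i] = right.keys[0]
        return "redistributed"
    # Merge: move everything into the left leaf, drop separator and pointer.
    left.keys = left.keys + right.keys
    del parent.keys[left_i]
    del parent.children[left_i + 1]
    return "merged"

# Right-hand index node of the example tree after 8* was inserted.
parent = Node([24, 30],
              [Node([19, 20, 22]), Node([24, 27, 29]), Node([33, 34, 38, 39])])
steps = [delete_in_leaf(parent, 0, k) for k in (19, 20, 24)]
```

Deleting 19 needs no repair, deleting 20 redistributes with the sibling and replaces separator 24 by 27, and deleting 24 merges two leaves and removes 27 from the parent, which is then left with a single key and underflows.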
Example: Deleting 19* and 20*
Before:
Root: [17]
Internal nodes: [5 13] [24 30]
Leaves: [2* 3*] [5* 7* 8*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]
After:
Root: [17]
Internal nodes: [5 13] [27 30]
Leaves: [2* 3*] [5* 7* 8*] [14* 16*] [22* 24*] [27* 29*] [33* 34* 38* 39*]
Deleting 20* is done with re-distribution (by moving key 24).
– The key 24 is removed from the parent. The new middle key 27 is copied up.
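... And Then Deleting 24*
Before:
Root: [17]
Internal nodes: [5 13] [27 30]
Leaves: [2* 3*] [5* 7* 8*] [14* 16*] [22* 24*] [27* 29*] [33* 34* 38* 39*]
After the leaf merge:
Root: [17]
Internal nodes: [5 13] [30]
Leaves: [2* 3*] [5* 7* 8*] [14* 16*] [22* 27* 29*] [33* 34* 38* 39*]
Two leaf nodes merge: Key 27 is removed from the parent.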

Example B+-Tree After Deleting 24*
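Before the internal-node merge:
Root: [17]
Internal nodes: [5 13] [30]
Leaves: [2* 3*] [5* 7* 8*] [14* 16*] [22* 27* 29*] [33* 34* 38* 39*]
After:
Root: [5 13 17 30]
Leaves: [2* 3*] [5* 7* 8*] [14* 16*] [22* 27* 29*] [33* 34* 38* 39*]
Two internal nodes merge: Key 17 is pulled down from the root.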
Example: Deleting 24*
Before:
Root: [22]
Internal nodes: [5 13 17 20] [27 30]
Leaves: [2* 3*] [5* 7* 8*] [14* 16*] [17* 18*] [20* 21*] [22* 24*] [27* 29*] [33* 34* 38* 39*]
After the leaf merge:
Root: [22]
Internal nodes: [5 13 17 20] [30]
Leaves: [2* 3*] [5* 7* 8*] [14* 16*] [17* 18*] [20* 21*] [22* 27* 29*] [33* 34* 38* 39*]
A different scenario: Deleting 24* causes two leaf nodes to merge.
– Then, this causes an underflow in the internal node.
Example of Non-leaf Re-distribution
Before:
Root: [22]
Internal nodes: [5 13 17 20] [30]
Leaves: [2* 3*] [5* 7* 8*] [14* 16*] [17* 18*] [20* 21*] [22* 27* 29*] [33* 34* 38* 39*]
After:
Root: [17]
Internal nodes: [5 13] [20 22 30]
Leaves: [2* 3*] [5* 7* 8*] [14* 16*] [17* 18*] [20* 21*] [22* 27* 29*] [33* 34* 38* 39*]
Entries are re-distributed by 'pushing through' the entries in the parent node.
– Pull down 22 from the root, and push up 20 to the root. We can stop here.
– It's ok to do once more. Pull down 20 from the root, and push up 17 to the root.
Summary of B+-tree Operations
Insert
– Split a leaf node; copy up the middle key.
– Split an internal node; push up the middle key.
Delete
– Merge leaf nodes; remove the middle key from the parent.
– Merge internal nodes; pull down the middle key from the parent.
– Redistribute leaf nodes; remove the middle key from the parent, and
copy up a new middle key.
– Redistribute internal nodes; pull down the middle key from the
parent to one child node, and push up a new middle key from the
other child node.
Other Issues of B+-tree
1. Duplicate Key Values
2. Prefix Key Compression
– Variable-length keys such as strings
3. Bulk-Loading
4. Choosing an Optimal Node Size
Duplicate Key Values
Three alternatives for leaf page layout:
1. Multiple <key,pointer> pairs
2. One key and multiple pointers
– variable-length records, variable fanout.
3. Another level of indirection
– additional overhead for dereferencing and disk accesses.
Page overflows: several leaf pages may contain entries with the same key value,
– if the 1st or 2nd layout option is selected.
– The 3rd layout option may be a better choice for this.
How about AM Layer of MiniRel project?
Variable-Length Keys
Longer keys may reduce the fan-out and grow the index
taller. (The taller the tree, the more disk accesses.)
Prefix-key compression is done to increase fan-out.
Key values in index entries only 'direct traffic'; can often compress them.
– (E.g.) If we have adjacent index entries with search key values
Dannon Yogurt, David Smith and Devarakonda Murthy, we can
abbreviate them to Dan, Dav and Dev.
What if there is a data entry Davey Jones? (Then, we can only
compress David Smith to Davi)
Insert/delete must be suitably modified.
What if the keys (or strings) are longer than a page?
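One possible compression rule, sketched in Python: keep the shortest prefix of an index key that is still strictly greater than the largest key in the subtree to its left. The function name and this particular rule are assumptions for illustration; the three-letter abbreviations on the slide are illustrative and need not be the shortest prefixes this rule produces.

```python
def shortest_separator(left_max: str, key: str) -> str:
    """Shortest prefix p of `key` such that left_max < p <= key.

    `left_max` is the largest key stored to the left of the separator, so
    any search key <= left_max still goes left and `key` itself goes right.
    """
    if not left_max < key:
        raise ValueError("keys must be in strictly increasing order")
    for length in range(1, len(key) + 1):
        prefix = key[:length]
        if prefix > left_max:
            return prefix
    return key  # not reached: the full key is always greater than left_max
```

With "Dannon Yogurt" as the largest key on the left, "David Smith" compresses to "Dav"; once a data entry "Davey Jones" exists on the left, the same key can only be compressed to "Davi".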
Bulk Loading of a B+-Tree

For a large collection of records, building a B+-tree by
repeatedly inserting records is very slow.
– Does not give sequential storage of leaves.
Bulk-Loading:
– Start with an empty root node and sorted entries
– Insert a leaf node at a time into the right-most slot of the root or
parent node.
[Figure: an empty root node above the sorted pages of data entries, which are not yet in the B+ tree]
Sorted data entries: 3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*
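A minimal bulk-loading sketch in Python. Note that it builds the tree level by level (pack the sorted entries into leaves, then build each index level over the one below) rather than inserting one leaf at a time into the right-most parent as described above; both approaches rely on the input being sorted. The dictionary node layout, the function names, and the default capacities (4 entries per leaf, 5 children per index node, i.e. order 5) are assumptions for illustration, and the last node on each level may end up under-full.

```python
def smallest_key(node):
    """Smallest key stored under `node` (follow the leftmost children)."""
    while node["children"] is not None:
        node = node["children"][0]
    return node["keys"][0]

def bulk_load(sorted_keys, leaf_capacity=4, fanout=5):
    """Build a B+-tree bottom-up from already sorted keys; return the root."""
    if not sorted_keys:
        return {"keys": [], "children": None}
    # Pack the sorted entries into leaves, filling each leaf completely.
    level = [
        {"keys": sorted_keys[i:i + leaf_capacity], "children": None}
        for i in range(0, len(sorted_keys), leaf_capacity)
    ]
    # Build index levels until a single root remains.
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            # The separator in front of each child (except the first) is
            # the smallest key found under that child.
            keys = [smallest_key(child) for child in group[1:]]
            parents.append({"keys": keys, "children": group})
        level = parents
    return level[0]

ENTRIES = [3, 4, 6, 9, 10, 11, 12, 13, 20, 22, 23, 31, 35, 36, 38, 41, 44]
tree = bulk_load(ENTRIES)
```

For the 17 sorted entries this produces five leaves under a single root with separators 10, 20, 35 and 44. A production loader would also rebalance the last nodes so that none falls below minimum occupancy, and would typically leave free space in each page according to the fill factor.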
B+-Trees in Practice
Typical order: 200. Typical fill-factor: 67%.
– Block: 8KB, key: 36 Bytes, Pointer: 4 Bytes.
– average fanout = 133
Typical capacities:
– Height 3: 133^3 = 2,352,637 records
– Height 4: 133^4 = 312,900,721 records
Can often hold top levels in buffer pool:
– Level 1 = 1 page = 8 Kbytes
– Level 2 = 133 pages = 1 Mbytes
– Level 3 = 17,689 pages = about 138 MBytes
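The arithmetic behind these figures can be checked with a few lines of Python. The constants come from the slide (8 KB blocks, 36-byte keys, 4-byte pointers, average fanout 133); the function names are hypothetical, and a megabyte is taken as 2^20 bytes.

```python
PAGE_BYTES = 8 * 1024
KEY_BYTES, POINTER_BYTES = 36, 4
FANOUT = 133                     # average fanout quoted on the slide

# Entries that fit in one page: close to the "typical order" of 200.
max_entries = PAGE_BYTES // (KEY_BYTES + POINTER_BYTES)

def records_at_height(height, fanout=FANOUT):
    """Number of records reachable by a tree of the given height."""
    return fanout ** height

def pages_at_level(level, fanout=FANOUT):
    """Number of pages on a level, with the root as level 1."""
    return fanout ** (level - 1)

def level_megabytes(level):
    """Buffer space needed to cache one whole level, in MB (2**20 bytes)."""
    return pages_at_level(level) * PAGE_BYTES / 2**20
```

A full page holds 204 entries, and two thirds of the typical order of 200 gives the quoted fanout of 133. Level 2 needs about 1 MB of buffer space, and the 17,689 pages of level 3 come to roughly 138 MB with 8 KB pages.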
Optimal Size of B-tree Nodes

Problem
– Small pages make B-tree taller and inefficient.
– Large pages increase transfer time.
Idea: Find a page size that maximizes the benefit-to-cost ratio.
– Benefit = log_2 F, because F (fanout) ∝ page size and height ∝ 1/log_2 F.
– AccessCost = disk_latency + PageSize/TransferRate (ignoring cache effects).
From Table 6 in the Gray & Graefe "5 Minute Rule" paper,
– 8/16/32 Kbytes are near optimal, when one entry is 20 bytes long.
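The benefit-to-cost ratio can be tabulated with a short Python sketch. The 20-byte entry size is from the slide; the disk latency (10 ms) and transfer rate (10 MB/s) are illustrative assumptions, not the figures used in the Gray & Graefe table, and the function names are hypothetical.

```python
from math import log2

ENTRY_BYTES = 20                  # entry size from the slide
DISK_LATENCY_S = 0.010            # assumed: 10 ms per random access
TRANSFER_RATE = 10 * 2**20        # assumed: 10 MB/s sequential transfer

def benefit(page_bytes):
    """log2 of the fanout, with fanout proportional to page size."""
    return log2(page_bytes / ENTRY_BYTES)

def access_cost(page_bytes):
    """Seconds to read one page: latency plus transfer time."""
    return DISK_LATENCY_S + page_bytes / TRANSFER_RATE

def ratio(page_bytes):
    """Benefit-to-cost ratio for one page size."""
    return benefit(page_bytes) / access_cost(page_bytes)

def best_page_size(candidates):
    """Candidate page size with the highest benefit-to-cost ratio."""
    return max(candidates, key=ratio)

CANDIDATES = [2**k * 1024 for k in range(1, 8)]   # 2 KB .. 128 KB
best = best_page_size(CANDIDATES)
```

Under these assumed disk parameters the ratio peaks at 16 KB, with 8 KB and 32 KB close behind and both smaller and larger pages clearly worse. Faster transfer rates or higher latencies push the optimum toward larger pages.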
Summary
B+-tree is a dynamic structure.
– Inserts/deletes leave tree height-balanced; log_F N cost.
– Adjusts to growth gracefully.
– High fanout (F) means depth rarely more than 3 or 4.
– Typically, 67% occupancy on average.
– We will discuss B-tree locking soon.
Most widely used index in database management systems
because of its versatility. One of the most optimized
components of a DBMS.
Discussions
– B-Tree vs. B+-Tree
– BST vs. B+-Tree
