B Trees Advanced)

B-Trees & Their Variants
Advanced Data Structure
Spring 2007
Zareen Alamgir
Motivation
Yet another tree!
Why do we need another tree structure?
Data Retrieval from External Storage
• In database programs, the data is too large to fit in memory; therefore, it is stored on secondary storage (disks or tapes).
• Disk access is very expensive: a disk I/O operation takes milliseconds, while the CPU processes data on the order of nanoseconds, about one million times faster.
• When dealing with external storage, disk accesses dominate the running time.
Balanced Binary Search Trees
• Balanced binary search trees (AVL and red-black trees) perform well if all the data fits in main memory.
• These trees are not optimized for external storage: they require many disk accesses and therefore perform poorly on very large data sets.
Reduce Disk Accesses
• Data is transferred to and from the disk in blocks (typical block sizes are 512, 2048, 4096, or 8192 bytes).
• To reduce disk accesses:
  - Store multiple records in one disk block.
  - Reduce the tree height by increasing the number of children of a node.
• To achieve these goals we use a multiway (m-way) search tree, which is a generalization of the binary search tree (BST).
(Figure: example multiway search tree. Root [25 62]; second level [12 19] [32 39] [73 84]; leaves [3 5] [15 17] [21 23] [30 31] [34 37] [45 51] [69 71] [75 79] [90 94].)
Multiway (m-way) Search Trees
• In an m-way tree, all nodes have degree ≤ m.
• Each node has the following structure:
  [P1 | K1 | P2 | ... | Ki-1 | Pi | Ki | ... | Kq-1 | Pq]
  The subtree P1 holds keys < K1, the subtree Pi holds keys with Ki-1 < key < Ki, and the subtree Pq holds keys > Kq-1.
• Ki are the keys, where 1 ≤ i ≤ q-1 and q ≤ m.
• Pi are pointers to subtrees, where 1 ≤ i ≤ q and q ≤ m.
• The keys in each node are in ascending order: K1 < K2 < ... < Kq-1.
• The key Ki is larger than all keys in the subtree pointed to by Pi and smaller than all keys in the subtree pointed to by Pi+1.
• The subtrees are themselves m-way search trees.
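The separator rule above decides which subtree a search must follow. A minimal Python sketch, assuming a node's keys are kept in a sorted Python list and its subtree pointers in a list where index 0 plays the role of P1 (the function name `child_index` is hypothetical, not from the slides):

```python
from bisect import bisect_left
from typing import List, Optional

def child_index(keys: List[int], k: int) -> Optional[int]:
    """Return the 0-based index of the subtree that may contain k
    (0 stands for P1), or None if k is one of this node's keys."""
    i = bisect_left(keys, k)       # number of keys smaller than k
    if i < len(keys) and keys[i] == k:
        return None                # k is stored in this node itself
    return i                       # keys[i-1] < k < keys[i]: follow P(i+1)

print(child_index([25, 62], 3), child_index([25, 62], 30), child_index([25, 62], 71))
# prints: 0 1 2
```

Binary search inside the node is a design choice for the sketch; with only a few hundred keys per node, the cost is negligible next to the disk read that fetched the node.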
Multiway (m-way) Search Trees
• An m-way tree is a generalization of a BST; its operation, benefits, and issues are the same.
• Benefits:
  - Fast information retrieval.
  - Fast update.
• Problems:
  - The tree is not balanced.
  - Leaf nodes are on different levels.
  - Bad space usage; the tree can become skewed.
(Figure: an m-way tree with root [25 62] whose leaves lie on different levels; for example, the node [17 19] sits one level below the other leaves.)
B-Trees
• A B-tree is a balanced m-way tree that is tuned to minimize disk accesses.
• The node size of a B-tree is usually equal to the disk block size, and the number of keys in a node depends on:
  - Key size
  - Disk block size
  - Data organization (whether only the key or the entire data record is stored in a node)
• Access paths from the root to the leaf nodes are short.

(Figure: B-tree with root [25 62]; second level [12 19] [32 39] [73 84]; leaves [3 5] [15 17] [21 23] [30 31] [34 37] [45 51] [69 71] [75 79] [90 94].)
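To make the sizing rule concrete, here is a small Python sketch that computes the largest order m whose node fits in one disk block. The byte sizes are illustrative assumptions, the sketch ignores any per-node header such as a key-count field, and `max_order` is a hypothetical helper name:

```python
def max_order(block: int, key: int, child_ptr: int, data_ptr: int = 0) -> int:
    """Largest m such that m child pointers and m - 1 keys fit in one block.
    If data_ptr > 0, every key also carries a data pointer of that size."""
    # Solve m*child_ptr + (m - 1)*(key + data_ptr) <= block for the largest integer m.
    return (block + key + data_ptr) // (child_ptr + key + data_ptr)

# Assumed sizes: 4096-byte block, 8-byte keys, 8-byte pointers.
print(max_order(4096, 8, 8))      # keys only: 256
print(max_order(4096, 8, 8, 8))   # keys plus data pointers: 171
```

Under these assumptions, storing a data pointer next to every key lowers the order from 256 to 171, which is the trade-off the B+-tree later avoids in its internal nodes.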
B-Tree: Definition
• A B-tree of order m has the following properties:
  - The root has at least two subtrees unless it is a leaf.
  - Each non-root internal node holds q-1 keys and q pointers to subtrees, where ⌈m/2⌉ ≤ q ≤ m.
  - Each non-root leaf node holds q-1 keys, where ⌈m/2⌉ ≤ q ≤ m.
  - All leaves are on the same level.
• Consequently, a B-tree is always at least half full, has few levels, and is perfectly balanced.
Node layout: [P1 | K1 | P2 | ... | Ki-1 | Pi | Ki | ... | Kq-1 | Pq]
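These properties can be checked mechanically. A minimal Python sketch, assuming distinct keys and in-memory nodes (`Node` and `is_btree` are illustrative names, not part of the slides); it also checks the search-tree ordering that a B-tree inherits from m-way trees:

```python
from dataclasses import dataclass, field
from math import ceil

@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)   # empty list => leaf

def is_btree(root: Node, m: int) -> bool:
    """Check the order-m B-tree properties (keys assumed distinct)."""
    leaf_levels = set()

    def ok(node, lo, hi, level, is_root):
        k = node.keys
        min_keys = 1 if is_root else ceil(m / 2) - 1
        if not (min_keys <= len(k) <= m - 1):
            return False                            # too few or too many keys
        if any(a >= b for a, b in zip(k, k[1:])):
            return False                            # keys not in ascending order
        if (lo is not None and k[0] <= lo) or (hi is not None and k[-1] >= hi):
            return False                            # violates the parent's separators
        if not node.children:
            leaf_levels.add(level)                  # remember every leaf's level
            return True
        if len(node.children) != len(k) + 1:        # q - 1 keys need q subtrees
            return False
        bounds = [lo] + k + [hi]
        return all(ok(c, bounds[j], bounds[j + 1], level + 1, False)
                   for j, c in enumerate(node.children))

    # all leaves must sit on one level
    return ok(root, None, None, 1, True) and len(leaf_levels) == 1
```

The example tree shown earlier (root [25 62]) passes for m = 3, while a tree whose leaves sit on two different levels fails the last check.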

B-Tree
• A B-tree node can have a field KeyTally (KT) that indicates the number of keys currently stored in the node.
• A B-tree node usually contains (key, data pointer) pairs. The data pointer points to the data record, which is not stored in the node; with this scheme we can pack more keys and pointers into a B-tree node.
Node layout: [KT | P1 | K1 | D1 | ... | Ki-1 | Di-1 | Pi | Ki | Di | ... | Kq-1 | Dq-1 | Pq], where each Di is a data pointer.

Height of B-Tree
• The height of a B-tree with n keys is important because it bounds the number of disk accesses.
• The height of the tree is maximum when each node has the minimum number of subtree pointers, q = ⌈m/2⌉.
Height of B-Tree
• The height of a B-tree is maximum if all nodes have the minimum number of keys: 1 key in the root + 2(q-1) keys on the second level + ... + 2q^(h-2)(q-1) keys in the leaves (level h).

$$n \ge 1 + 2(q-1) + 2q(q-1) + \cdots + 2q^{h-2}(q-1) = 1 + 2(q-1)\sum_{i=0}^{h-2} q^{i}$$

Applying the formula for the sum of a geometric progression:

$$n \ge 1 + 2(q-1)\,\frac{q^{h-1}-1}{q-1} = 2q^{h-1} - 1$$

Thus, the number of keys in a B-tree of height h satisfies:

$$n \ge 2q^{h-1} - 1 \quad\Longrightarrow\quad h \le \log_q \frac{n+1}{2} + 1$$
Height of B-Tree
• The height of a B-tree is minimum if all nodes are full: m-1 keys in the root + m(m-1) keys on the second level + ... + m^(h-1)(m-1) keys in the leaf nodes.

$$n \le (m-1) + m(m-1) + m^{2}(m-1) + \cdots + m^{h-1}(m-1) = (m-1)\sum_{i=0}^{h-1} m^{i}$$

Applying the formula for the sum of a geometric progression:

$$n \le (m-1)\,\frac{m^{h}-1}{m-1} = m^{h} - 1 \quad\Longrightarrow\quad h \ge \log_m (n+1)$$

Combining both bounds:

$$\log_m (n+1) \;\le\; h \;\le\; \log_q \frac{n+1}{2} + 1, \qquad q = \lceil m/2 \rceil$$
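Height of B-Tree
• If a B-tree holds n = 2,000,000 (2 million) keys and m = 200, then q = 100 and the bound above gives h ≤ log_100(1,000,000.5) + 1 ≈ 4: at most 4 levels, i.e., a root-to-leaf path of at most 3 edges. A balanced binary search tree with the same keys would have a height of about 20 (21 levels).
• Note: the order m is chosen so that the B-tree node size is nearly equal to the disk block size.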
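The two bounds can be evaluated exactly with integer arithmetic, which avoids floating-point rounding when n + 1 is close to a power of m or q. A minimal Python sketch (the function names are hypothetical; "levels" counts the root as level 1, as in the derivation):

```python
def min_levels(n: int, m: int) -> int:
    """Fewest levels that can hold n keys: every node full, so n <= m**h - 1."""
    h = 1
    while m ** h - 1 < n:
        h += 1
    return h

def max_levels(n: int, m: int) -> int:
    """Most levels for n >= 1 keys: every node minimal, so n >= 2*q**(h-1) - 1."""
    q = (m + 1) // 2                # q = ceil(m/2), minimum number of subtrees
    h = 1
    while 2 * q ** h - 1 <= n:      # a tree with h + 1 levels needs 2*q**h - 1 keys
        h += 1
    return h

n, m = 2_000_000, 200
print(min_levels(n, m), max_levels(n, m))   # 3 4
print(min_levels(n, 2))                     # balanced binary tree: 21
```

For n = 2,000,000 and m = 200 the B-tree needs between 3 and 4 levels, against 21 levels for a perfectly balanced binary tree.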
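Search in a B-Tree
• Search in a B-tree is similar to search in a BST, except that in a B-tree we make a multiway branching decision instead of the binary branching decision of a BST.
(Figure: searching for key 71 in the B-tree with root [25 62]; second level [12 19] [32 39] [73 84]; leaves [3 5] [15 17] [21 23] [30 31] [34 37] [45 51] [69 71] [75 79] [90 94]. The search visits [25 62], then [73 84], then the leaf [69 71].)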
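A minimal Python sketch of this search, assuming in-memory nodes with sorted key lists (`Node` and `btree_search` are illustrative names). It also counts visited nodes, since in a disk-based B-tree each visited node costs roughly one block read:

```python
from bisect import bisect_left
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)   # empty list => leaf

def btree_search(node: Node, k):
    """Return (found, nodes_visited); each visited node costs about one disk read."""
    visited = 0
    while True:
        visited += 1
        i = bisect_left(node.keys, k)             # multiway branching decision
        if i < len(node.keys) and node.keys[i] == k:
            return True, visited
        if not node.children:                     # reached a leaf: k is absent
            return False, visited
        node = node.children[i]                   # keys[i-1] < k < keys[i]
```

On the example tree, searching for 71 returns (True, 3) after visiting [25 62], [73 84], and [69 71]; searching for 25 stops at the root.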
B-Tree Insert Operation
• Insertion in a B-tree is more complicated than in a BST.
• In a BST, keys are added top-down (each new key becomes a new leaf), which can result in an unbalanced tree.
• A B-tree is built bottom-up: keys are added to a leaf node; if the leaf node is full, another node is created, the keys are evenly distributed between the two nodes, and the middle key is promoted to the parent. If the parent is full, the process is repeated.
• A B-tree can also be built top-down using the pre-splitting technique.
Basic Idea
• Find the position for the key in the appropriate leaf node.
• Is the node full?
  - No: insert the key in order and adjust the pointers.
  - Yes: split the node:
    - Create a new node.
    - Move half of the keys from the full node to the new node and adjust the pointers.
    - Promote the median key (before the split) to the parent.
    - If the parent is full, split it in the same way.
• Splitting guarantees that each node has ≥ ⌈m/2⌉ − 1 keys.
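The procedure above translates into a short recursive routine. This Python sketch assumes in-memory nodes with sorted key lists (in a real system each node would be one disk block); `Node`, `BTree`, and `inorder` are illustrative names, not from the slides. A node is split as soon as it holds m keys, and its median is promoted:

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Optional, Tuple

@dataclass
class Node:
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)   # empty list => leaf

class BTree:
    """Order-m B-tree built bottom-up: each node holds at most m - 1 keys."""

    def __init__(self, m: int):
        self.m = m
        self.root = Node()

    def insert(self, k) -> None:
        split = self._insert(self.root, k)
        if split is not None:            # the root itself split:
            median, right = split        # a new root is created, height grows by 1
            self.root = Node([median], [self.root, right])

    def _insert(self, node: Node, k) -> Optional[Tuple[object, Node]]:
        """Insert k below node; return (median, new right node) if node split."""
        i = bisect_left(node.keys, k)
        if node.children:                # internal node: descend to a leaf first
            split = self._insert(node.children[i], k)
            if split is None:
                return None
            median, right = split
            node.keys.insert(i, median)          # promoted key goes in order
            node.children.insert(i + 1, right)
        else:
            node.keys.insert(i, k)               # leaf: insert the key in order
        if len(node.keys) <= self.m - 1:         # still fits: no split needed
            return None
        mid = len(node.keys) // 2                # median of the m keys
        median = node.keys[mid]
        right = Node(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
        return median, right

def inorder(node: Node) -> list:
    """Keys of the subtree rooted at node, in ascending order."""
    if not node.children:
        return list(node.keys)
    out = []
    for i, key in enumerate(node.keys):
        out += inorder(node.children[i]) + [key]
    return out + inorder(node.children[-1])
```

With m = 5, a parent [10 25] and leaves [3 5 8], [14 19 20 23], [32 38], inserting 16 splits the full leaf into [14 16] and [20 23] and promotes 19, giving the parent [10 19 25].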
Cases in the B-Tree Insert Operation
• In B-tree insertion we have the following cases:
  - Case 1: The leaf node has room for the new key.
  - Case 2: The leaf in which the key is to be placed is full. This case can lead to an increase in tree height.
• We now explain these cases in detail.

• Case 1: The leaf node has room for the new key.
Example (insert 3): parent [10 25]; leaves [5 8] [14 19 20 23] [32 38]. Find the appropriate leaf node for key 3 and insert 3 in order, giving the leaf [3 5 8].
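• Case 2: The leaf in which the key is to be placed is full.
Example (insert 16): parent [10 25]; leaves [3 5 8] [14 19 20 23] [32 38].
  - There is no room for key 16 in the leaf node [14 19 20 23].
  - Split the node: create a new node, move the median key 19 up, and move the keys after it to the new node, giving [14 16] and [20 23].
  - Insert key 19 in the parent node in order, giving [10 19 25].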
• Case 2: The leaf in which the key is to be placed is full, and this leads to an increase in tree height.
Example (insert 16, order 5): the root [45 55 67 81] is full; its children are [13 27 33 38] [48 52] [57 61] [72 77] [86 92], and the leaves under [13 27 33 38] are [9 12] [14 19 20 23] [29 31] [35 36] [41 42].
  - No room for key 16 in the leaf: move the median key 19 up and split the node into [14 16] and [20 23].
  - No room for 19 in the parent: insert 19 in order, move the median key 27 up, and split the parent into [13 19] and [33 38].
  - No room for 27 in the root: insert 27 in order, move the median key 55 up into a new root, and split the old root into [27 45] and [67 81].
• Case 2: The height of the tree increases.
B-Tree Delete Operation
• Deletion is analogous to insertion, but a little more complicated.
• Two major cases:
  - Case 1: Deletion from a leaf node.
  - Case 2: Deletion from a non-leaf node. Apply the delete-by-copy technique used in BSTs, which reduces this case to Case 1. In delete by copy, the key to be deleted is replaced by the largest key in its left subtree or the smallest key in its right subtree (which is always in a leaf).
• Leaf node deletion cases:
  - After deletion, the node is at least half full.
  - After deletion, underflow occurs:
    - Redistribute if a sibling has more than ⌈m/2⌉ − 1 keys.
    - Merge nodes if the siblings have exactly ⌈m/2⌉ − 1 keys.
    - Merging can lead to a decrease in tree height.
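The delete-by-copy step and the two underflow remedies can be sketched in Python as follows. This assumes in-memory nodes with sorted key lists (`Node`, `predecessor`, and `fix_underflow` are hypothetical names); when both siblings could lend keys, it tries the left one first, which is an arbitrary choice for the sketch:

```python
from dataclasses import dataclass, field
from math import ceil

@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)   # empty list => leaf

def predecessor(node: Node, i: int):
    """Delete by copy: largest key in the left subtree of node.keys[i] (always in a leaf)."""
    cur = node.children[i]
    while cur.children:
        cur = cur.children[-1]
    return cur.keys[-1]

def fix_underflow(parent: Node, i: int, m: int) -> None:
    """Repair parent.children[i] after a deletion left it with too few keys."""
    min_keys = ceil(m / 2) - 1
    node = parent.children[i]
    left = parent.children[i - 1] if i > 0 else None
    right = parent.children[i + 1] if i + 1 < len(parent.children) else None
    # Redistribute: pool sibling + separator + node, split evenly, and send
    # the middle key back up as the new separator.
    for sib, j in ((left, i - 1), (right, i)):       # j = index of the separator
        if sib is not None and len(sib.keys) > min_keys:
            a, b = (sib, node) if sib is left else (node, sib)
            keys = a.keys + [parent.keys[j]] + b.keys
            kids = a.children + b.children
            mid = (len(keys) - 1) // 2
            a.keys, parent.keys[j], b.keys = keys[:mid], keys[mid], keys[mid + 1:]
            a.children, b.children = kids[:mid + 1], kids[mid + 1:]
            return
    # Merge: both siblings are minimal, so move the separator down and
    # append the right node's keys (and subtrees) to the left node.
    j = i - 1 if left is not None else i
    a, b = parent.children[j], parent.children[j + 1]
    a.keys = a.keys + [parent.keys.pop(j)] + b.keys
    a.children = a.children + b.children
    del parent.children[j + 1]
    # The parent may now underflow in turn; if it is the root and has no keys
    # left, the caller makes `a` the new root (the tree loses one level).
```

With m = 5, parent [10 25] and leaves [5 8], [19], [32 38 40 45], the call redistributes to [19 25], separator 32, and [38 40 45]. With leaves [5 8], [19], [38 40] under [10 32], it merges the underflow node with its left sibling into [5 8 10 19].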
• After deletion, the node is at least half full (the inverse of insertion Case 1).
Example (delete 3): parent [10 25]; leaves [3 5 8] [14 19] [32 38 40 45]. Search key 3; the key is found, so delete key 3 and move the other keys in the node to eliminate the gap, giving [5 8].
• Underflow occurs: evenly redistribute the keys if the left or right sibling has more than ⌈m/2⌉ − 1 keys.
Example (delete 14): parent [10 25]; leaves [5 8] [14 19] [32 38 40 45]. Search key 14 and delete it; the node [19] underflows, so evenly redistribute the keys in the underflow node, its sibling, and the separator key.
• Underflow occurs and the left and right siblings each have exactly ⌈m/2⌉ − 1 keys: merge the underflow node with a sibling.
Example (delete 25): parent [10 32]; leaves [5 8] [19 25] [38 40]. Deleting 25 leaves [19], which underflows, so merge nodes: move the separator key down, move the sibling's keys into the underflow node, and discard the sibling.

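• Underflow occurs, and the height decreases after merging.
Example (delete 21): root [70]; second level [8 32] [79 85]; leaves [3 5] [21 27] [47 66] [73 75 78] [81 83] [88 90 92].
  - Deleting 21 leaves [27], which underflows; merge nodes by moving the separator key and the keys in the sibling node to the underflow node.
  - The parent then underflows as well; merging it with its sibling [79 85] pulls the root key 70 down, the empty root is discarded, and the height decreases by one.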
Issues in B-Tree
• In a B-tree, accessing data in sequential order is not efficient: an in-order traversal of the B-tree requires many disk accesses.
• In a B-tree, data pointers are stored in every node, which leaves room for fewer subtree pointers per node and results in more tree levels.
Node layout: [KT | P1 | K1 | D1 | ... | Ki-1 | Di-1 | Pi | Ki | Di | ... | Kq-1 | Dq-1 | Pq | free space], where each Di is a data pointer.

B+-Trees
B+-Tree: Variant of B-Tree
• Resolves the issues of the B-tree.
• In a B+-tree:
  - Pointers to data are stored only in leaf nodes.
  - Internal nodes contain only keys and subtree pointers, so:
    - They can accommodate more keys.
    - Fewer disk accesses are needed because the tree has fewer levels.
  - The B+-tree provides faster sequential access to data.

B+-Tree Structure
• A B+-tree consists of two parts:
  - Index set
    - Provides indexes for fast access to data.
    - Consists of internal nodes that store only keys and subtree pointers.
  - Sequence set
    - Consists of leaf nodes that contain data pointers.
    - Provides efficient sequential access to data (using a doubly linked list).
(Figure: the index set sits above the sequence set; sequential search runs along the sequence set.)
B+-Tree: Index Node Structure
• The basic structure of a B+-tree index node of order m is the same as that of a B-tree node of order m:
  - The root has at least two subtrees unless it is a leaf.
  - Each non-root index node holds q-1 keys and q subtree pointers, where ⌈m/2⌉ ≤ q ≤ m.
• The only difference is that index nodes do not contain data pointers.
Node layout: [P1 | K1 | P2 | ... | Ki-1 | Pi | Ki | ... | Kq-1 | Pq]

B+-Tree: Sequence Node Structure
• The structure of a B+-tree sequence node is as follows:
Node layout: [K1 | D1 | ... | Ki-1 | Di-1 | Ki | Di | ... | Kq-1 | Dq-1], where each Di is a data pointer, plus pointers to the previous and next leaf nodes in the tree.

Search in a B+-Tree
• Search in a B+-tree works like search in a B-tree, but it always ends at a leaf node because:
  - The pointer to the data is stored in the leaf node.
  - The existence of a key in the index set does not guarantee that the corresponding record is present in the tree.
  - A key can occur multiple times in the index set (this does not create a problem because keys in index set nodes act only as separators).
(Figure: B+-tree of order 5. Root [30 62]; index nodes [15 21] [34 45] [75 90]; sequence set [3 5] [15 17] [21 23] [30 31] [34 37] [45 51] [69 71] [75 79] [90 94]. Note that 62 is not present in the sequence set.)
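Because the search must end at a leaf, the loop below never stops at an index node, even when the key equals a separator. A minimal Python sketch, assuming every key left of a separator is smaller than it and every key to its right is greater than or equal to it (as produced by copying a key up on a leaf split); `Node` and `bplus_search` are illustrative names:

```python
from bisect import bisect_right
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)   # non-empty => index set node

def bplus_search(root: Node, k) -> bool:
    """Descend through the index set; only the sequence set decides membership."""
    node = root
    while node.children:
        # a separator is a copy of its right subtree's smallest key,
        # so k == separator must go right
        node = node.children[bisect_right(node.keys, k)]
    return k in node.keys
```

On the order-5 example, searching for 62 descends [30 62] → [75 90] → [69 71] and reports the key as absent, even though 62 appears in the root.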
B+-Tree Insert Operation
• Case: The leaf node has room for the key to be inserted.
Example (insert 3): parent [14 32]; leaves [5 8] [14 19 20 23] [32 38]. Find the appropriate leaf node for key 3 and insert it in order, giving [3 5 8].
B+-Tree Insert Operation
• Case 2: The leaf in which the key is to be placed is full.
Example (insert 16): parent [10 25]; leaves [3 5 8] [14 19 20 23] [32 38].
  - There is no room for key 16, so split the node: create a new node and move ⌈m/2⌉ keys to the new node, giving [14 16] and [19 20 23].
  - Insert a copy of the first key of the new node (19) in the parent node in order, giving [10 19 25].
  - Modify the next-node links of the sequence set.
B+-Tree Insert Operation
• Case: Only the root node exists, and it is full.
Example (insert 18): the root is the sequence set node [10 17 20 23]. Find the appropriate position for key 18.
  - There is no room for key 18, so split the node: create a new sequence set node and move ⌈m/2⌉ keys to the new node, giving [10 17] and [18 20 23].
  - Create a new index set node and make it the root node.
  - Insert the first key of the new sequence set node (18) in the new root.
B+-Tree Delete Operation
• B+-tree deletion follows the same rules as B-tree deletion, but the separator in an index set node is not removed when a key is deleted.
• Deletion cases:
  - After deletion, the node is at least half full.
  - After deletion, underflow occurs:
    - Redistribute if a sibling has more than ⌈m/2⌉ − 1 keys.
    - Merge nodes if the siblings have exactly ⌈m/2⌉ − 1 keys.
    - Merging can lead to a decrease in tree height.
B+-Tree Delete Operation
• Case: After deletion, the node is at least half full.
Example (delete 14): parent [14 32]; leaves [3 5 8] [14 19 21] [32 38 40]. Search key 14; the key is found, so delete it and move the other keys in the node to eliminate the gap, giving [19 21].
Note: key 14 in the parent node is still a valid separator key; it is not modified.
B+-Tree Delete Operation
• Underflow occurs: evenly redistribute the keys if the left or right sibling has more than ⌈m/2⌉ − 1 keys.
Example (delete 14): parent [14 32]; leaves [5 8] [14 19] [32 38 40]. Search key 14 and delete it; the node [19] underflows.
  - Evenly redistribute the keys in the underflow node and in the sibling node, giving [19 32] and [38 40].
  - The separator key 32 in the parent is no longer valid; insert a copy of the first key of the sibling node (38) in the parent in order, giving [14 38].
Note: unlike in a B-tree, the separator key in the parent node is not included in the redistribution. Why?
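A sketch of the two leaf-level remedies in Python, assuming sequence set nodes are plain sorted lists (the helper names are hypothetical). Compared with the B-tree version, the old separator does not take part, because every key already lives in the sequence set and the separator is only a copy:

```python
def bplus_redistribute(left: list, right: list):
    """Evenly share keys between two adjacent sequence set nodes.
    Returns (new_left, new_right, new_separator)."""
    keys = left + right            # the old separator is NOT pooled: it is only a copy
    mid = len(keys) // 2
    return keys[:mid], keys[mid:], keys[mid]   # parent gets a copy of right's first key

def bplus_merge(left: list, right: list) -> list:
    """Merge two adjacent leaves; the parent simply deletes their separator."""
    return left + right
```

For the example above, redistributing [19] with its sibling [32 38 40] gives [19 32] and [38 40], and the parent's separator becomes a copy of 38.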
B+-Tree Delete Operation
• Underflow occurs and the left and right siblings each have exactly ⌈m/2⌉ − 1 keys: merge the underflow node and a sibling.
Example (delete 32): parent [14 38]; leaves [5 8] [19 32] [38 40]. Search key 32 and delete it; the node [19] underflows, so merge nodes: move the keys in the sibling to the underflow node and discard the sibling node.
Efficient Sequential Access in a B+-Tree
• For efficient sequential access, start from the beginning of the sequence set and traverse it using the next pointers in the sequence set nodes.
(Figure: traversal begins at the leftmost sequence set node, labeled "Sequence Set Start", and follows the next pointers.)
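A minimal Python sketch of this traversal, assuming each sequence set node is an in-memory object with a `next` reference (the `prev` link is omitted here; `Leaf` and `scan` are illustrative names). A real range query would first search down the index set to the leaf containing the lower bound instead of starting at the first leaf:

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional

@dataclass(eq=False)
class Leaf:
    keys: list
    next: Optional["Leaf"] = field(default=None, repr=False)

def scan(first: Leaf, lo=None, hi=None) -> Iterator:
    """Yield keys in ascending order by following next pointers only,
    keeping those with lo <= key <= hi when bounds are given."""
    leaf = first
    while leaf is not None:
        for k in leaf.keys:
            if hi is not None and k > hi:
                return                      # sorted order: no later key can qualify
            if lo is None or k >= lo:
                yield k
        leaf = leaf.next
```

No index set node is read during the scan, which is why sequential access in a B+-tree costs about one disk access per sequence set node.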

Comparison: B-Tree & B+-Tree
B-Tree | B+-Tree
Data pointers are stored in all nodes | Data pointers are stored only in leaf nodes (sequence set)
No redundant keys | Redundant keys may exist
Search can end at any node | Search always ends at a leaf node
Slow sequential access | Efficient sequential access
Higher trees | Flatter trees (no data pointers in index set nodes)
B*-Trees
B*-Tree: Variant of B-Tree
• Each node of a B-tree represents a block of secondary memory; therefore, accessing a node is an expensive operation. Thus, the fewer nodes that are created, the better.
• The B*-tree is a variant of the B-tree introduced by Donald Knuth and named by Douglas Comer.
• In a B*-tree, all nodes except the root are required to be at least two-thirds full, not just half full as in a B-tree.
• The number of keys k in every non-root node of a B*-tree of order m satisfies ⌊(2m − 1)/3⌋ ≤ k ≤ m − 1.
• The average utilization of a B*-tree is 81%.

B*-Tree Insert Operation
• In a B*-tree, the frequency of node splitting is decreased by delaying a split.
• A split in a B*-tree is delayed by attempting to redistribute the keys between a node and its sibling when the node overflows.
• In a B*-tree split operation, two nodes are split into three, instead of one into two as in a B-tree.
• All three nodes participating in the split are guaranteed to be two-thirds full after the split.
B*-Tree Insert Operation
• Overflow occurs: evenly redistribute the keys between the node and its sibling.
(Figure: key 22 is inserted into a full node whose sibling has room; the parent separator is 32; keys shown: 10 12 15 16 21 24 25 29 35 42 47 51 53.)
Overflow occurs, so evenly redistribute the keys in the overflow node and its sibling, including the separator key in the parent and the new key.
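The redistribution step can be sketched in Python as below. The sketch assumes order m = 9 (at most 8 keys per node) and reads the figure as a full node [10 12 15 16 21 24 25 29] next to a sibling [35 42 47 51 53] with separator 32; both the order and that grouping are our assumptions, and `bstar_redistribute` is a hypothetical name:

```python
def bstar_redistribute(left: list, separator, right: list, k, m: int):
    """Absorb new key k by spreading the keys of the overflowing node, its sibling,
    and the parent separator evenly over the two nodes (order-m B*-tree)."""
    keys = sorted(left + [separator] + right + [k])
    if len(keys) - 1 > 2 * (m - 1):       # one key must remain as the separator
        raise ValueError("sibling is full: a 2-to-3 split is needed")
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]   # (left, new separator, right)
```

Under those assumptions, inserting 22 produces [10 12 15 16 21 22 24], separator 25, and [29 32 35 42 47 51 53], so no new node is created.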
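B*-Tree Insert Operation
• Overflow occurs and the sibling is full: split the nodes.
(Figure: key 72 is inserted; the parent separator is 32; keys shown: 10 12 15 16 21 24 25 29 35 42 47 51 53 55 57 59.)
Overflow occurs and the sibling is full, so split the node: the two full nodes become three.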
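A sketch of the two-into-three split in Python, under the same assumption of order m = 9, reading the figure as two full nodes [10 12 15 16 21 24 25 29] and [35 42 47 51 53 55 57 59] with separator 32 while 72 is inserted (the grouping is our reading, and `bstar_split` is a hypothetical name):

```python
def bstar_split(left: list, separator, right: list, k):
    """Two full nodes + parent separator + new key k (2m keys in total) become
    three nodes and two separators for the parent."""
    keys = sorted(left + [separator] + right + [k])
    n = len(keys) - 2                   # keys left after picking two separators
    a = n // 3                          # sizes a, b, n - a - b differ by at most one
    b = (n - a) // 2
    first, s1 = keys[:a], keys[a]
    second, s2 = keys[a + 1:a + 1 + b], keys[a + 1 + b]
    third = keys[a + 2 + b:]
    return first, s1, second, s2, third
```

Under those assumptions the result is [10 12 15 16 21], separator 24, [25 29 32 35 42], separator 47, and [51 53 55 57 59 72]: 5, 5, and 6 keys, each at least ⌊(2·9 − 1)/3⌋ = 5. The parent gains one extra separator, which may in turn trigger redistribution or a split one level up.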

B*-Tree Delete Operation
• B*-tree deletion follows the same rules as B-tree deletion.
• Deletion cases:
  - After deletion, the node is at least two-thirds full.
  - After deletion, underflow occurs:
    - Redistribute if a sibling has more than ⌊(2m − 1)/3⌋ keys.
    - Merge nodes if the siblings have exactly ⌊(2m − 1)/3⌋ keys.
• Examples of deletion are omitted because they are similar to B-tree deletion.
