
B+ Trees

A simple prefix B+ tree is a variant of the B+ tree. A plain B+ tree does not use prefixes as separators; instead, the separators in the index set blocks are copies of actual keys. The operations performed on a B+ tree are the same as those on a simple prefix B+ tree. A B+ tree consists of a set of records arranged in key order in a sequence set, coupled with an index set that provides rapid access to the block containing a given record.

A simple prefix B+ tree is often a good solution, because shorter separators let each index set block hold more separators, which makes the index set shallower. But two factors favor the plain B+ tree structure, which uses full copies of keys as separators in the index set.
- For some applications, the extra overhead of maintaining and using a variable-length structure outweighs the benefit of shorter separators.
- Some keys do not compress much when the simple prefix method is used to produce separators, e.g. 34C18K756, 34C18K757, etc.

Compressing such keys into shorter separators requires expensive and complicated techniques. If the height of the tree remains acceptable with full copies of keys as separators, it is better to choose the structure that does not involve compression.
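To see why keys such as 34C18K756 and 34C18K757 gain nothing from the simple prefix method, here is a minimal sketch of one way to compute a shortest separator. The function name shortest_separator is an assumption made for illustration, and the rule used (the shortest prefix of the right-hand key that is still greater than the left-hand key) is one common convention, not necessarily the only one.

```python
def shortest_separator(last_key: str, first_key: str) -> str:
    """Return the shortest prefix of first_key that sorts after last_key.

    last_key  : largest key in the left sequence set block
    first_key : smallest key in the right sequence set block
    (Hypothetical helper; names are illustrative.)
    """
    if not last_key < first_key:
        raise ValueError("keys must be in strictly ascending order")
    for length in range(1, len(first_key) + 1):
        prefix = first_key[:length]
        if prefix > last_key:
            return prefix
    return first_key  # not reached: the full key always qualifies


# Keys that differ early compress well.
print(shortest_separator("also", "always"))
# Keys that differ only in the last character do not compress at all.
print(shortest_separator("34C18K756", "34C18K757"))
```

In the second call the separator is as long as the key itself, so the variable-length machinery buys nothing.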

All of these tools are used to structure files. Because they share similar properties and common features, we need a way to tell them apart so that we can choose the most appropriate one for a given job. They are not the only tools used in file structures; others were discussed previously. For example, simple index structures are a simpler, neater solution when the index can be held wholly in memory.

B-trees, B+ trees, and simple prefix B+ trees in perspective.

These indexes can be coupled with a sequence set to provide effective indexed sequential access. Only when the index grows too large to be held in memory do we turn to paged index structures: B-trees, B+ trees, and simple prefix B+ trees. So B-trees, B+ trees, and simple prefix B+ trees are not a panacea. But they have broad applicability, particularly in situations where a very large file must be accessed both sequentially and by index.

Common characteristics.
All are paged index structures, so the trees tend to be broad and shallow. All maintain height-balanced trees; the trees do not grow unevenly. In all cases the tree grows from the bottom up. Balance is maintained through block splitting, merging, and redistribution. Good storage efficiency can be obtained with all three types. All three approaches can be adapted for use with variable-length records.

Indexed sequential file access and simple prefix B+ Trees

Index set Block Size.


Up to this point, index set nodes have been treated as fixed-order B-tree nodes, even though the separators are variable in length. The physical size of an index set block is the same as the physical size of a sequence set block. For this reason we speak of index set blocks rather than nodes.

Reasons for using a common block size for the index set and the sequence set
The block size for the sequence set is chosen because there is a good fit between this block size, the characteristics of the disk drive, and the amount of memory available. A common block size makes it easier to implement a buffering scheme that creates a virtual simple prefix B+ tree. If the index set and the sequence set are kept in one file, a common block size also makes seeking simpler.

Internal structure of Index set blocks


In the index set, using the shortest separators lets us pack more separators per node. But if the index set is a fixed-order B-tree with a fixed number of separators per node, the motivation for shorter separators disappears. So index set blocks should contain a variable number of variable-length separators.
But how do we search through these separators to find the one that leads to the block containing the required record?

Because index set nodes are blocks, and blocks are generally large, a single block can hold a large number of separators. We need to structure the block so that binary search can be applied even though the separators are variable in length. A separate index to the separators provides this: it holds fixed-length references to the variable-length separators, and binary search is performed on the index.

Ex: Suppose we insert the following separators into an index set block:
As, Ba, Bro, C, Ch, Cra, Dele, Edi, Err, Fa, Fle.

The separators are concatenated, and an index to them is built.

Concatenated separators:
AsBaBroCChCraDeleEdiErrFaFle

Index to the separators (starting position of each separator):
00 02 04 07 08 10 13 17 20 23 25
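The construction of the concatenated separators and their index can be sketched in a few lines. The function names build_separator_index and separator_at are assumptions made for illustration; only the separator list and the resulting offsets come from the example.

```python
def build_separator_index(separators):
    """Concatenate the separators and record where each one starts."""
    offsets = []
    position = 0
    for separator in separators:
        offsets.append(position)
        position += len(separator)
    return "".join(separators), offsets


def separator_at(concatenated, offsets, i):
    """Recover the i-th separator; it ends where the next one starts."""
    start = offsets[i]
    end = offsets[i + 1] if i + 1 < len(offsets) else len(concatenated)
    return concatenated[start:end]


separators = ["As", "Ba", "Bro", "C", "Ch", "Cra",
              "Dele", "Edi", "Err", "Fa", "Fle"]
concatenated, offsets = build_separator_index(separators)
print(concatenated)                            # the merged separator string
print(offsets)                                 # the fixed-length index entries
print(separator_at(concatenated, offsets, 5))  # the sixth separator
```

Because every index entry has the same size, the i-th separator can be located without scanning the ones before it.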

Purpose of the index set: to guide us downward through the levels of the simple prefix B+ tree to the sequence set block we want to retrieve.

So an index set block needs some way to store references to its child blocks. References are stored as relative block numbers (RBNs). If there are N separators within a block, the block has N+1 children and therefore has to store N+1 RBNs. The index set block needs space for these N+1 RBNs, along with the separators and the index to the separators.

Index set blocks need to be structured to accommodate all of this information. The block also records:
- Separator count: helps in finding the middle element of the index to the separators when binary search is applied.
- Total length of separators: helps in finding where the index to the separators starts within the block.

Separator count: 11
Total length of separators: 28
Separators: AsBaBroCChCraDeleEdiErrFaFle
Index to separators: 00 02 04 07 08 10 13 17 20 23 25
RBNs: B00 B01 B02 B03 B04 B05 B06 B07 B08 B09 B10 B11

Conceptual Structure of the Index set Block
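One possible byte layout for such a block is sketched below using Python's standard struct module. The field widths (2-byte separator count, 2-byte total length, 2-byte offsets, 4-byte RBNs), the little-endian byte order, the 512-byte block size, and the function names are all assumptions for illustration; the slides give only the conceptual order of the fields.

```python
import struct


def pack_index_block(separators, rbns, block_size=512):
    """Pack count, total length, separators, offsets and RBNs into one block."""
    if len(rbns) != len(separators) + 1:
        raise ValueError("a block with N separators needs N+1 RBNs")
    text = "".join(separators).encode("ascii")
    offsets, position = [], 0
    for separator in separators:
        offsets.append(position)
        position += len(separator)
    n = len(separators)
    # "<" means little-endian with no padding between fields.
    body = struct.pack(f"<HH{len(text)}s{n}H{n + 1}I",
                       n, len(text), text, *offsets, *rbns)
    if len(body) > block_size:
        raise ValueError("contents do not fit in one block")
    return body.ljust(block_size, b"\x00")


def read_offsets(block):
    """Use the two header fields to locate and read the index to separators."""
    count, total_length = struct.unpack_from("<HH", block, 0)
    index_start = 4 + total_length  # 4 header bytes, then the separator text
    offsets = struct.unpack_from(f"<{count}H", block, index_start)
    return count, total_length, list(offsets)


separators = ["As", "Ba", "Bro", "C", "Ch", "Cra",
              "Dele", "Edi", "Err", "Fa", "Fle"]
block = pack_index_block(separators, list(range(12)))
print(read_offsets(block))
```

The sketch shows why the two header fields matter: the total length tells the reader where the index begins, and the count tells it how many entries to read.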


Ex: To search for the record Beck, perform a binary search on the index to the separators. The total length of separators and the separator count let us find the beginning, the end, and therefore the middle of the index to the separators.
The first middle separator is Cra, which starts at position 10. Continuing the binary search, we find that Beck falls between the separators Ba and Bro, so RBN B02 gives the position of the next block. This next block could be another index set block, or it could be a sequence set block.
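The search for Beck can be sketched as a binary search over the index entries. The function name find_child is an assumption, and so is the tie-breaking rule (a key equal to a separator goes to the right-hand child); the data is taken from the example block.

```python
def find_child(key, concatenated, offsets, rbns):
    """Binary-search the index to the separators and return the child RBN."""

    def separator_at(i):
        start = offsets[i]
        end = offsets[i + 1] if i + 1 < len(offsets) else len(concatenated)
        return concatenated[start:end]

    low, high = 0, len(offsets)
    while low < high:
        mid = (low + high) // 2
        if key < separator_at(mid):
            high = mid       # key belongs to the left of this separator
        else:
            low = mid + 1    # key is >= separator: look to the right
    # low is now the number of separators that are <= key.
    return rbns[low]


concatenated = "AsBaBroCChCraDeleEdiErrFaFle"
offsets = [0, 2, 4, 7, 8, 10, 13, 17, 20, 23, 25]
rbns = [f"B{i:02d}" for i in range(12)]

print(find_child("Beck", concatenated, offsets, rbns))
```

With 11 separators the first probe is entry 5, the separator Cra at position 10, as in the example.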

A simple prefix B+ tree is of variable order.


This implies that:
1. The number of separators in a block is limited directly by the block size rather than by a predetermined order.
2. Because the tree is of variable order, detecting overflow and underflow of a block is not simple, and decisions about splitting and merging blocks become more complicated.
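A small sketch makes the first point concrete: whether one more separator fits depends on its length, not on how many separators the block already holds. The field sizes (4-byte header, 2-byte index entries, 4-byte RBNs), the 112-byte block size, and the function names are assumptions chosen only to keep the arithmetic visible.

```python
def bytes_used(separators, header=4, offset_size=2, rbn_size=4):
    """Bytes needed for N separators, N index entries and N+1 RBNs."""
    n = len(separators)
    return (header
            + sum(len(s) for s in separators)
            + n * offset_size
            + (n + 1) * rbn_size)


def fits(separators, new_separator, block_size):
    """Overflow test by size: would the block still fit after the insert?"""
    return bytes_used(separators + [new_separator]) <= block_size


separators = ["As", "Ba", "Bro", "C", "Ch", "Cra",
              "Dele", "Edi", "Err", "Fa", "Fle"]
print(bytes_used(separators))
print(fits(separators, "Gr", block_size=112))           # short separator
print(fits(separators, "Grandfather", block_size=112))  # long separator
```

The same block accepts a twelfth separator in one case and overflows in the other, which is why splitting and merging decisions cannot rely on a simple count.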

Loading a simple Prefix B+ tree


One way of building the tree is through a series of successive insertions. The procedure is the same as for maintaining a simple prefix B+ tree (splitting or redistributing blocks in the sequence set, and in the index set, as new blocks are added to the sequence set). But this approach is best suited to maintenance and random-order insertions. When we have a whole set of records to load at once, there is no need to follow this expensive process.

Instead, we begin by sorting the records to be loaded; this guarantees that the next record we encounter is the next record we need to load. Working from the sorted file, we place the records into sequence set blocks one by one, starting a new block whenever the current one fills up. As we make the transition between two sequence set blocks, we determine the shortest separator for them and collect it into an index set block, which is held in memory until it is full.
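The loading procedure can be sketched as follows. This is a simplified illustration under stated assumptions: each sequence set block holds a fixed number of keys (keys_per_block) rather than being filled by byte size, the records are reduced to their keys, keys are distinct, and the function names are hypothetical. The separator rule is the shortest prefix of the next block's first key that sorts after the previous block's last key.

```python
def shortest_separator(last_key, first_key):
    """Shortest prefix of first_key that sorts after last_key."""
    for length in range(1, len(first_key) + 1):
        if first_key[:length] > last_key:
            return first_key[:length]
    raise ValueError("keys must be in strictly ascending order")


def load_sequence_set(keys, keys_per_block):
    """Sort the keys, fill blocks in order, and collect one separator
    at each transition between two adjacent blocks."""
    ordered = sorted(keys)
    blocks = [ordered[i:i + keys_per_block]
              for i in range(0, len(ordered), keys_per_block)]
    separators = [shortest_separator(left[-1], right[0])
                  for left, right in zip(blocks, blocks[1:])]
    return blocks, separators


keys = ["better", "ask", "access", "best", "case",
        "always", "aspect", "also", "check", "catch"]
blocks, separators = load_sequence_set(keys, keys_per_block=2)
for block in blocks:
    print(block)
print(separators)
```

Each block is written once and never revisited, and the separators emerge in ascending order, ready to be appended to the index set block held in memory.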

Ex: index set block


AlwAspBet  00 03 06

Access-also

Always-ask

Aspect-best

Better-case

Sequence set blocks. The new block to be added is Catch-check, so a new separator, CAT, has to be inserted into the index set block. Assume the index set block is full and we write it to disk. Now, how do we insert CAT?

This shows that we need to create a new index set block, but it cannot be placed at the same level as the existing index set block without a parent block. Instead, we promote the CAT separator to a parent block. The parent cannot point directly to the sequence set block, so an intermediate index set block is created, filled with default values of -1. The resulting index set is shown below.


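Parent index set block: CAT | 00 | -1 -1
Index set block written to disk: AlwAspBet | 00 03 06
Intermediate index set block (default values): -1 -1 -1
Sequence set blocks: Access-also | Always-ask | Aspect-best | Better-case | Catch-check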


If more records are added, the index set block that was empty fills with separators. But if the Catch-check block is the last one in the sequence set, the intermediate block in the index set remains empty. This creates a severe out-of-balance problem if the tree grows to higher levels. Such empty nodes violate the B-tree rules that apply to the index set. The problem can be corrected after loading, during regular use, by applying B-tree maintenance operations.

Advantages of loading simple prefix B+ tree


1. Output can be written sequentially.
2. We make one pass over the data rather than many passes.
3. No blocks need to be reorganized as we proceed.
Random insertion produces blocks that are 60-80% full, but sequential loading of a B+ tree can produce blocks that are 100% full, with no wasted space. Sequential loading gives us control over the amount of empty space in the newly loaded tree.

This type of loading creates a degree of spatial locality within the file. This locality can minimize seeking as we search down through the tree.

