Sunteți pe pagina 1din 41

FILE ORGANISATION

FILE ORGANIZATION
The database is stored as a collection of files. Each file is a sequence of records. A record is a sequence of fields. An approach, assume: record size is fixed

each file has records of one particular type only


different files are used for different relations

FIXED-LENGTH RECORDS
Simple approach:

Store record i starting from byte n (i 1), where n is the size of each record.
Deletion of record i: move records i + 1, . . ., n to i, . . . , n 1 OR move record n to I OR do not move records, but link all free records on a free list

VARIABLE-LENGTH RECORDS
Variable-length records arise in database systems in several ways: Storage of multiple record types in a file. Record types that allow variable lengths for one or more fields. Record types that allow repeating fields (used in some older data models).

TYPES OF FILE ORGANIZATION


There are mainly three types of file organizations:
Sequential Relative

Indexed

SEQUENTIAL FILE ORGANIZATION


The records in the file are ordered by a search key or primary key.

Deletion use pointer chains

Insertion locate the position where the record is to be inserted


if there is free space insert there if no free space, insert the record in an overflow block In either case, pointer chain must be updated Need to reorganize the file from time to time to restore sequential order

RELATIVE FILE ORGANIZATION


Within a relative file are numbered positions, called cells. These cells are of fixed equal length and are consecutively numbered from 1 to n, where 1 is the first cell, and n is the last available cell in the file. Records in a relative file are accessed according to cell number. A cell number is a record's relative record number; its location relative to the beginning of the file. By specifying relative record numbers, you can directly retrieve, add, or delete records regardless of their locations.

INDEXED FILE ORGANIZATION


An indexed file contains records ordered by a record key. Each record contains a field that contains the record key. The record key uniquely identifies the record. Indexed organization is similar to relative organization. In place of relative position of the block, a unique key of the block is used to find the block.

INDEXING

INDEXING
Indexing mechanisms are used to optimize access to data (records) managed in tables. For example, the author catalogue in a library is a type of index. An Index File consists of records (called index entries) of the form
search key value pointer to block in data table

DENSE INDEX FILES


Index record appears for every search-key value in the file.

SPARSE INDEX FILES


Contains index records for only some search-key values.

Applicable when records are sequentially ordered on search-key


To locate a record with search-key value K we: Find index record with largest search-key value < K Search file sequentially starting at the record to which the index record points

SINGLE LEVEL INDEXING


A single level index is an auxiliary file that makes it more efficient to search for a record in the data file. The index is called an access path to the file. These are of two types:

Primary Index also clustering index


Secondary Index also non-clustering index

PRIMARY INDEX
A primary index is an ordered file whose entries are of fixed length with two fields:

<value of primary key; address of data block>


The data file is ordered on the primary key field and requires primary key for each record to be unique. The index file includes one entry for each block.

PROBLEM WITH PRIMARY INDEXES


Insertion or deletion of record in ordered data file involves: Making space or deleting space in the data file

And changing the index entries to reflect the new situation.

SECONDARY INDEX
A secondary index is an ordered file whose entries are of fixed length with two fields: <value of key; address of data block or record pointer> A secondary index provides a secondary means of accessing a file for which some primary access already exists. The secondary index may be on a field which is a candidate key and has a unique value in every record, or a non-key with duplicate values.

This is used to find records which satisfy the given value for some specific column.

Often one wants to find all records whose values in a certain field (which is not the search key of the primary index) satisfy some condition. One can specify a secondary index with an index entry for each search key value; index entry points to a bucket, which contains pointers to all the actual records with that particular search key.

MULTI-LEVEL INDEXING
If primary index is too big to fit in memory, access to records becomes expensive. In this case multi-level indexing can be used.
Multi-level Indexing consists of different levels of indices. An outer index table points to inner index tables. The inner tables point to the record files.

Multilevel Index structure.

INDEX UPDATING Record Deletion


Single-level index deletion:

Dense indices deletion of search-key: similar to file record deletion.


Sparse indices

If deleted key value exists in the index, the value is replaced by the next search-key value in the file (in search-key order).
If the next search-key value already has an index entry, the entry is deleted instead of being replaced .

INDEX UPDATING Record Insertion


Single-level index insertion:

Perform a lookup using the key value from inserted record


Dense indices if the search-key value does not appear in the index, insert it.

Sparse indices if index stores an entry for each block of the file, no change needs to be made to the index unless a new block is created.
If a new block is created, the first search-key value appearing in the new block is inserted into the index.

Multilevel insertion (as well as deletion) algorithms are simple extensions of the single-level algorithms.

B TREE
B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. It is a generalization of a binary search tree in that a node can have more than two children. In B-trees, internal nodes can have a variable number of child nodes within some pre-defined range. When data are inserted or removed from a node, its number of child nodes changes. In order to maintain the pre-defined range, internal nodes may be joined or split.

Each internal node of a B-tree will contain a number of keys which divide its subtrees. For example, if an internal node has 3 child nodes then it must have 2 keys: a1 and a2. All values in the leftmost subtree will be less than a1, all values in the middle subtree will be between a1 and a2, and all values in the rightmost subtree will be greater than a2.

B+ TREE
A B+-tree is a tree satisfying the following properties:

All paths from root to leaf are of the same length


Each node that is not a root or a leaf has between n/2 and n children.

A leaf node has between (n1)/2 and n1 values


Special cases: If the root is not a leaf, it has at least 2 children.

If the root is a leaf (that is, there are no other nodes in the tree), it can have between 0 and ( n1) values.

Typical node: Ki are the search-key values Pi are pointers to children (for non-leaf nodes) or pointers to records or buckets of records (for leaf nodes). The search-keys in a node are ordered K1 < K2 < K3 < . . . < Kn1

LEAF NODES
Properties of a leaf node:

For i = 1, 2, . . ., n1, pointer Pi either points to a file record with search-key value Ki, or to a bucket of pointers to file records, each record having search-key value Ki. If Li, Lj are leaf nodes and i < j, Lis search-key values are less than Ljs search-key values
Pn points to next leaf node in search-key order

NON LEAF NODES


Non leaf nodes form a multi-level sparse index on the leaf nodes. For a non-leaf node with m pointers: All the search-keys in the subtree to which P1 points are less than K1

For 2 i n 1, all the search-keys in the subtree to which Pi points have values greater than or equal to Ki1 and less than Ki
All the search-keys in the subtree to which Pn points have values greater than or equal to Kn1

QUERIES ON B+-TREES
Find all records with a search-key value of k. N=root Repeat Examine N for the smallest search-key value > k. If such a value exists, assume it is Ki. Then set N = Pi Otherwise k Kn1. Set N = Pn Until N is a leaf node If for some i, key Ki = k follow pointer Pi to the desired record or bucket. Else no record with search-key value k exists.

B+-TREES UPDATING Insertion


Find the leaf node in which the search-key value would appear

If the search-key value is already present in the leaf node


Add record to the file If the search-key value is not present, then

Add the record to the main file (and create a bucket if necessary)
If there is room in the leaf node, insert (key-value, pointer) pair in the leaf node

Otherwise, split the node (along with the new (key-value, pointer) entry).

Splitting a Leaf node: Take the n (search-key value, pointer) pairs (including the one being inserted) in sorted order. Place the first n/2 in the original node, and the rest in a new node. Let the new node be p, and let k be the least key value in p. Insert (k,p) in the parent of the node being split. If the parent is full, split it and propagate the split further up.

Splitting a non leaf node: when inserting (k,p) into an


already full internal node N Copy N to an in-memory area M with space for n+1 pointers and n keys Insert (k,p) into M Copy P1,K1, , K n/2-1,P n/2 from M back into node N Copy Pn/2+1,K n/2+1,,Kn,Pn+1 from M into newly allocated node N Insert (K n/2,N) into parent N

B+-TREES UPDATING Deletion


Find the record to be deleted, and remove it from the main file and from the bucket (if present) Remove (search-key value, pointer) from the leaf node if there is no bucket or if the bucket has become empty

If the node has too few entries due to the removal, and the entries in the node and a sibling fit into a single node, then merge the siblings .

Otherwise, if the node has too few entries due to the removal, but the entries in the node and a sibling do not fit into a single node, then redistribute the pointers:
Redistribute the pointers between the node and a sibling such that both have more than the minimum number of entries. Update the corresponding search-key value in the parent of the node.

The node deletions may cascade upwards till a node which has n/2 or more pointers is found.
If the root node has only one pointer after deletion, it is deleted and the sole child becomes the root.

HASHING

STATIC HASHING
A bucket is a unit of storage containing one or more records.
In a hash file organization we obtain the bucket of a record directly from its search key value using a hash function. Hash function h is a function from the set of all search key values K to the set of all bucket addresses B.

HASH FUNCTIONS
Worst hash function maps all search key values to the same bucket; this makes access time proportional to the number of search key values in the file. An ideal hash function is uniform, i.e., each bucket is assigned the same number of search key values from the set of all possible values. Ideal hash function is random, so each bucket will have the same number of records assigned to it irrespective of the actual distribution of search key values in the file.

DEFICIENCIES OF STATIC HASHING


In static hashing, function h maps search key values to a fixed set of B of bucket addresses. Databases grow or shrink with time. If initial number of buckets is too small, and file grows, performance will degrade.
If space is allocated for anticipated growth, a significant amount of space will be wasted initially. If database shrinks, again space will be wasted.

DYNAMIC HASHING
Allows the hash function to be modified dynamically.

Extendable hashing is one form of dynamic hashing. It splits and coalesces buckets as database size changes.
We choose a hash function that generates values over a relatively large range. Buckets are created on demand, and all bits of the hash are not used initially, the no. of bits used changes with change in database size.

S-ar putea să vă placă și