Documente Academic
Documente Profesional
Documente Cultură
FILE ORGANIZATION
The database is stored as a collection of files. Each file is a sequence of records. A record is a sequence of fields. An approach, assume: record size is fixed
FIXED-LENGTH RECORDS
Simple approach:
Store record i starting from byte n (i 1), where n is the size of each record.
Deletion of record i: move records i + 1, . . ., n to i, . . . , n 1 OR move record n to I OR do not move records, but link all free records on a free list
VARIABLE-LENGTH RECORDS
Variable-length records arise in database systems in several ways: Storage of multiple record types in a file. Record types that allow variable lengths for one or more fields. Record types that allow repeating fields (used in some older data models).
Indexed
INDEXING
INDEXING
Indexing mechanisms are used to optimize access to data (records) managed in tables. For example, the author catalogue in a library is a type of index. An Index File consists of records (called index entries) of the form
search key value pointer to block in data table
PRIMARY INDEX
A primary index is an ordered file whose entries are of fixed length with two fields:
SECONDARY INDEX
A secondary index is an ordered file whose entries are of fixed length with two fields: <value of key; address of data block or record pointer> A secondary index provides a secondary means of accessing a file for which some primary access already exists. The secondary index may be on a field which is a candidate key and has a unique value in every record, or a non-key with duplicate values.
This is used to find records which satisfy the given value for some specific column.
Often one wants to find all records whose values in a certain field (which is not the search key of the primary index) satisfy some condition. One can specify a secondary index with an index entry for each search key value; index entry points to a bucket, which contains pointers to all the actual records with that particular search key.
MULTI-LEVEL INDEXING
If primary index is too big to fit in memory, access to records becomes expensive. In this case multi-level indexing can be used.
Multi-level Indexing consists of different levels of indices. An outer index table points to inner index tables. The inner tables point to the record files.
If deleted key value exists in the index, the value is replaced by the next search-key value in the file (in search-key order).
If the next search-key value already has an index entry, the entry is deleted instead of being replaced .
Sparse indices if index stores an entry for each block of the file, no change needs to be made to the index unless a new block is created.
If a new block is created, the first search-key value appearing in the new block is inserted into the index.
Multilevel insertion (as well as deletion) algorithms are simple extensions of the single-level algorithms.
B TREE
B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. It is a generalization of a binary search tree in that a node can have more than two children. In B-trees, internal nodes can have a variable number of child nodes within some pre-defined range. When data are inserted or removed from a node, its number of child nodes changes. In order to maintain the pre-defined range, internal nodes may be joined or split.
Each internal node of a B-tree will contain a number of keys which divide its subtrees. For example, if an internal node has 3 child nodes then it must have 2 keys: a1 and a2. All values in the leftmost subtree will be less than a1, all values in the middle subtree will be between a1 and a2, and all values in the rightmost subtree will be greater than a2.
B+ TREE
A B+-tree is a tree satisfying the following properties:
If the root is a leaf (that is, there are no other nodes in the tree), it can have between 0 and ( n1) values.
Typical node: Ki are the search-key values Pi are pointers to children (for non-leaf nodes) or pointers to records or buckets of records (for leaf nodes). The search-keys in a node are ordered K1 < K2 < K3 < . . . < Kn1
LEAF NODES
Properties of a leaf node:
For i = 1, 2, . . ., n1, pointer Pi either points to a file record with search-key value Ki, or to a bucket of pointers to file records, each record having search-key value Ki. If Li, Lj are leaf nodes and i < j, Lis search-key values are less than Ljs search-key values
Pn points to next leaf node in search-key order
For 2 i n 1, all the search-keys in the subtree to which Pi points have values greater than or equal to Ki1 and less than Ki
All the search-keys in the subtree to which Pn points have values greater than or equal to Kn1
QUERIES ON B+-TREES
Find all records with a search-key value of k. N=root Repeat Examine N for the smallest search-key value > k. If such a value exists, assume it is Ki. Then set N = Pi Otherwise k Kn1. Set N = Pn Until N is a leaf node If for some i, key Ki = k follow pointer Pi to the desired record or bucket. Else no record with search-key value k exists.
Add the record to the main file (and create a bucket if necessary)
If there is room in the leaf node, insert (key-value, pointer) pair in the leaf node
Otherwise, split the node (along with the new (key-value, pointer) entry).
Splitting a Leaf node: Take the n (search-key value, pointer) pairs (including the one being inserted) in sorted order. Place the first n/2 in the original node, and the rest in a new node. Let the new node be p, and let k be the least key value in p. Insert (k,p) in the parent of the node being split. If the parent is full, split it and propagate the split further up.
If the node has too few entries due to the removal, and the entries in the node and a sibling fit into a single node, then merge the siblings .
Otherwise, if the node has too few entries due to the removal, but the entries in the node and a sibling do not fit into a single node, then redistribute the pointers:
Redistribute the pointers between the node and a sibling such that both have more than the minimum number of entries. Update the corresponding search-key value in the parent of the node.
The node deletions may cascade upwards till a node which has n/2 or more pointers is found.
If the root node has only one pointer after deletion, it is deleted and the sole child becomes the root.
HASHING
STATIC HASHING
A bucket is a unit of storage containing one or more records.
In a hash file organization we obtain the bucket of a record directly from its search key value using a hash function. Hash function h is a function from the set of all search key values K to the set of all bucket addresses B.
HASH FUNCTIONS
Worst hash function maps all search key values to the same bucket; this makes access time proportional to the number of search key values in the file. An ideal hash function is uniform, i.e., each bucket is assigned the same number of search key values from the set of all possible values. Ideal hash function is random, so each bucket will have the same number of records assigned to it irrespective of the actual distribution of search key values in the file.
DYNAMIC HASHING
Allows the hash function to be modified dynamically.
Extendable hashing is one form of dynamic hashing. It splits and coalesces buckets as database size changes.
We choose a hash function that generates values over a relatively large range. Buckets are created on demand, and all bits of the hash are not used initially, the no. of bits used changes with change in database size.