File Organization and Indexing

File organization and Indexing
Cost estimation
Basic Concepts
Ordered Indices
CIS552
Indexing and Hashing
Estimating Costs
For simplicity we estimate the cost of an
operation by counting the number of blocks
that are read or written to disk.
We ignore the possibility of blocked access
which could significantly lower the cost of I/O.
We assume that each relation is stored in a
separate file with B blocks and R records per
block.
CIS552
Basic Concepts
Indexing is used to speed up access to desired data.
E.g. author catalog in library
A search key is an attribute or set of attributes used to

look up records in a file. Unrelated to keys in the db
schema.
An index file consists of records called index entries.
An index entry for key k may consist of
An actual data record (with search key value k)
A pair (k, rid) where rid is a pointer to the actual data record
A pair (k, bid) where bid is a pointer to a bucket of record pointers
Index files are typically much smaller than the original file
if the actual data records are in a separate file.
If the index contains the data records, there is a single file
with a special organization.
CIS552
Types of Indices
The records in a file may be unordered or ordered
sequentially by some search key.
A file whose records are unordered is called a heap file.
If an index contains the actual data records or the records
are sorted by search key in a separate file, the index is
called clustering (otherwise non-clustering).
In an ordered index, index entries are sorted on the search
key value. Other index structures include trees and hash
tables.
A primary index is an index on a set of fields that includes
the primary key. Any other index is a secondary index.
CIS552
Dense Index Files

Dense index index record appears for every search-key
value in the file.
Brighton
Downtown
Miami
Perryridge
Redwood
Round Hill
CIS552
Brighton
A-217
750
Downtown
A-101
500
Downtown
A-110
600
Miami
A-215
700
Perryridge
A-102
400
Perryridge
A-201
900
Perryridge
A-218
700
Redwood
A-222
700
Round Hill
A-305
350
Sparse Index Files

A clustering index may be sparse.
Index records for only some search-key values.
To locate a record with search-key value k we:
Find index record with largest search-key value < k
Search file sequentially starting at the record to which the index
record points
Less space and less maintenance overhead for insertions

and deletions.
Generally slower than dense index for locating records.
Good tradeoff: sparse index with an index entry for every
block in file, corresponding to least search-key value in the
block.
CIS552
Example of Sparse Index Files

Brighton
Miami
Redwood
CIS552
Brighton
A-217
750
Downtown
A-101
500
Downtown
A-110
600
Miami
A-215
700
Perryridge
A-102
400
Perryridge
A-201
900
Perryridge
A-218
700
Redwood
A-222
700
Round Hill
A-305
350
Multilevel Index
If an index does not fit in memory, access becomes
expensive.
To reduce number of disk accesses to index records, treat
the index kept on disk as a sequential file and construct a
sparse index on it.
outer index a sparse index on main index
inner index the main index file
If even outer index is too large to fit in main memory, yet
another level of index can be created, and so on.
Indices at all levels must be updated on insertion or
deletion from the file.
CIS552
Multilevel Index (Cont.)

inner index
outer index
Index
Block 0
Data
Block 0
Data
Block 1
M
Index
Block 1
M
M
CIS552
M
9
Non-clustering Indices
Frequently, one wants to find all the records whose values
in a certain field satisfy some condition, and the file is not
ordered on the field.
Example 1: In the account database stored sequentially by account
number, we may want to find all accounts in a particular branch.
Example 2: As above, but where we want to find all accounts with a
specified balance or range of balances.
We can have a non-clustering index with an index record for

each search-key value. The index record points to a bucket
that contains pointers to all the actual records with that
particular search-key value.
CIS552
10
Secondary Index on balance field of account

350
400
500
600
700
750
900
CIS552
Brighton
A-217
750
Downtown
A-101
500
Downtown
A-110
600
Miami
A-215
700
Perryridge
A-102
400
Perryridge
A-201
900
Perryridge
A-218
700
Redwood
A-222
700
Round Hill
A-305
350
11
Clustering and Non-clustering

Non-clustering indices have to be dense.
Indices offer substantial benefits when searching for records.
When a file is modified, every index on the file must be
updated. Updating indices imposes overhead on database
modification.
Sequential scan using clustering index is efficient, but a
sequential scan using a non-clustering index is expensive
each record access may fetch a new block from disk.
CIS552
12
File organization in detail

http://ecomputernotes.com/databasesystem/rdbms/types-of-file-organization
Note : read till Indexes Sequential Access

Method (ISAM) only.
Refer book for Hashing and collision as I marked in
the class.
CIS552
13

File Organization and Indexing

Încărcat de

Informații document

Drepturi de autor

Formate disponibile

Partajați acest document

Partajați sau inserați document

Opțiuni de partajare

Vi se pare util acest document?

Este necorespunzător acest conținut?

Drepturi de autor:

Formate disponibile

File Organization and Indexing

Încărcat de

Drepturi de autor:

Formate disponibile

File organization and Indexing

Indexing and Hashing

Indexing and Hashing

A search key is an attribute or set of attributes used to

Indexing and Hashing

Indexing and Hashing

Dense Index Files

Indexing and Hashing

Sparse Index Files

Less space and less maintenance overhead for insertions

Indexing and Hashing

Example of Sparse Index Files

Indexing and Hashing

Indexing and Hashing

Multilevel Index (Cont.)

Indexing and Hashing

We can have a non-clustering index with an index record for

Indexing and Hashing

Secondary Index on balance field of account

Indexing and Hashing

Clustering and Non-clustering

Indexing and Hashing

File organization in detail

Note : read till Indexes Sequential Access

Indexing and Hashing

S-ar putea să vă placă și