A file is a collection of related records. The file size is limited by the size
of memory and of the storage medium. A file contains records, and records contain fields;
fields contain data items, and data items contain characters (letters, digits, special
characters, etc.). In a single-byte encoding such as ASCII, each character occupies one byte of storage.
File organization ensures that records are available for processing. The technique
used to represent and store the records in a file is known as file organization.
For example, if we want to retrieve employee records in alphabetical order of name,
sorting the file by employee name is a good file organization. However, if we want
to retrieve all employees whose marks are in a certain range, a file ordered by
employee name would not be a good file organization.
Direct access file organization is also known as random access or relative file organization.
In a direct access file, all records are stored on a direct access storage device
(DASD), such as a hard disk. The records are placed throughout the file
in no particular order.
The records do not need to be in sequence because they are updated directly
and rewritten back to the same location.
This file organization is useful for immediate access to large amounts of
information. It is used in accessing large databases.
Because a record's address is typically computed from its key by a hash function,
it is also called hashed file organization.
Direct access files are useful in online transaction processing (OLTP) systems, such as
an online railway reservation system.
In a direct access file, sorting of the records is not required.
It accesses the desired records immediately.
It updates several files quickly.
It has better control over record allocation.
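The direct access idea described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the record layout, the table size `NUM_SLOTS`, the modulo hash, and the linear probing used to resolve collisions are all assumptions made for the example, and an in-memory buffer stands in for a disk file.

```python
import io
import struct

# Hypothetical fixed-length record: 4-byte key, 20-byte name, 1-byte "used" flag.
RECORD = struct.Struct("<i20sB")
NUM_SLOTS = 11  # assumed number of record slots in the file


def slot_for(key: int) -> int:
    """Hash the key to a relative record number (assumed: simple modulo)."""
    return key % NUM_SLOTS


def new_file() -> io.BytesIO:
    """Create an in-memory 'file' of NUM_SLOTS empty fixed-length slots."""
    return io.BytesIO(bytes(RECORD.size * NUM_SLOTS))


def write_record(f, key: int, name: str) -> int:
    """Store or update a record in place; probe linearly on collision."""
    slot = slot_for(key)
    for _ in range(NUM_SLOTS):
        f.seek(slot * RECORD.size)
        rec_key, _, used = RECORD.unpack(f.read(RECORD.size))
        if not used or rec_key == key:
            f.seek(slot * RECORD.size)  # rewrite at the same location
            f.write(RECORD.pack(key, name.encode("ascii"), 1))
            return slot
        slot = (slot + 1) % NUM_SLOTS
    raise RuntimeError("file is full")


def read_record(f, key: int):
    """Jump straight to the hashed slot; no sequential scan of the file."""
    slot = slot_for(key)
    for _ in range(NUM_SLOTS):
        f.seek(slot * RECORD.size)
        rec_key, raw_name, used = RECORD.unpack(f.read(RECORD.size))
        if not used:
            return None
        if rec_key == key:
            return raw_name.rstrip(b"\x00").decode("ascii")
        slot = (slot + 1) % NUM_SLOTS
    return None
```

Because every record has the same length, the byte position of a slot is just the slot number times the record size, which is what makes the immediate seek possible.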
In an indexed sequential access file, both sequential and random access are
possible.
It accesses the records very fast if the index table is properly organized.
The records can be inserted in the middle of the file.
It provides quick access for sequential and direct processing.
It reduces the degree of the sequential search.
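As a rough sketch of how an index shortens the sequential search, the example below keeps records sorted by key in blocks and holds one index entry per block. The block size, the sample records, and the function names are assumptions made for illustration only.

```python
from bisect import bisect_right

# Data "file": records sorted by key and grouped into blocks (assumed size 3).
blocks = [
    [(10, "Asha"), (20, "Binu"), (30, "Chen")],
    [(40, "Dara"), (50, "Eli"), (60, "Fay")],
    [(70, "Gita"), (80, "Hugo")],
]

# Index table: the first key stored in each block.
index_keys = [block[0][0] for block in blocks]


def lookup(key):
    """Direct processing: use the index to pick a block, then scan only it."""
    pos = bisect_right(index_keys, key) - 1
    if pos < 0:
        return None  # key is smaller than every key in the file
    for k, value in blocks[pos]:
        if k == key:
            return value
        if k > key:
            break  # records are sorted, so the key cannot appear later
    return None


def scan():
    """Sequential processing: yield every record in key order."""
    for block in blocks:
        yield from block
```

A lookup touches at most one block instead of the whole file, while `scan()` still reads the records in key order.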
Multilist organization is a basic approach to providing the linkage between an
index and the file of data records. A multilist file maintains an index
for each secondary key.
The index entry for a secondary key value contains only one primary key value,
that of the first record with that secondary key value. That record is linked to
the other records in the data file that contain the same secondary key value.
The index entry in a multilist organization therefore points to the first data record in the
list. The records are of fixed length.
The multilist organization differs from the inverted file as follows: the entry in
the inverted file index for a key value has a pointer to each data record with that
key value, whereas the entry in the multilist index for a key value has just one pointer, to
the first data record with that key value.
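The difference between the two organizations can be sketched with a small in-memory example. The record addresses, the `dept` secondary key, and the `next` link field are hypothetical names chosen for illustration.

```python
# Data file: record address -> record. "next" is the link field that chains
# records sharing the same secondary key value (here, "dept").
records = {
    0: {"id": 101, "dept": "HR", "next": 2},
    1: {"id": 102, "dept": "IT", "next": 3},
    2: {"id": 103, "dept": "HR", "next": 4},
    3: {"id": 104, "dept": "IT", "next": None},
    4: {"id": 105, "dept": "HR", "next": None},
}

# Multilist index: one pointer per key value, to the first record in the list.
multilist_index = {"HR": 0, "IT": 1}

# Inverted file index: one pointer per matching record.
inverted_index = {"HR": [0, 2, 4], "IT": [1, 3]}


def multilist_lookup(dept):
    """Follow the chain of links stored inside the data records."""
    ids = []
    addr = multilist_index.get(dept)
    while addr is not None:
        rec = records[addr]
        ids.append(rec["id"])
        addr = rec["next"]
    return ids


def inverted_lookup(dept):
    """Read all record addresses directly from the index entry."""
    return [records[addr]["id"] for addr in inverted_index.get(dept, [])]
```

Both lookups return the same records. The multilist index stays small and fixed in size per key value, but each record must be read to find the next one, whereas the inverted index lists every address up front.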
Types of Indexes
Each record, when it is stored, is given a particular location, and this location
number is called its address. The record can be accessed through this address. An
index is a table that stores the key values and the corresponding addresses of the
records in a file. Given a key value, its address is located in the index and the
corresponding record can be accessed using this address. The idea behind an index
structure is similar to the one commonly used in textbooks. In a textbook index,
important terms are listed at the end of the book in alphabetical order, each with
the page numbers on which it appears.
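A minimal sketch of such an index table follows, assuming a text file of `key,value` lines in which a record's address is its byte offset. The file contents and function names are made up for the example, and an in-memory buffer stands in for a disk file.

```python
import io


def build_index(f):
    """Scan the file once and map each key to the byte offset of its record."""
    index = {}
    f.seek(0)
    while True:
        offset = f.tell()
        line = f.readline()
        if not line:
            break
        key = line.split(b",", 1)[0].decode("ascii")
        index[key] = offset
    return index


def fetch(f, index, key):
    """Look the key up in the index, then jump to the stored address."""
    offset = index.get(key)
    if offset is None:
        return None
    f.seek(offset)
    return f.readline().decode("ascii").rstrip("\n")


data = io.BytesIO(b"E01,Asha\nE02,Binu\nE03,Chen\n")
index = build_index(data)
```

With this data, `fetch(data, index, "E02")` seeks directly to byte 9 and returns that one record without reading the ones before it.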
An index is usually defined on a single field of a file, called the indexing field. The
index typically stores each value of the indexing field along with a list of pointers to all
disk blocks that contain a record with that field value. The values in the index are
ordered, and the index file is much smaller than the data file. There are several types
of indexes. These include the primary index, the clustering index, and the secondary index.
A primary index is an index specified on the primary key of a data file. The primary key
is a field that contains unique values and so uniquely identifies each record.
A clustering index stores data much like a phone directory, where all people with the
same last name are grouped together. A clustering index is specified on a field that
does not have a distinct value for each record. The records are stored in
ascending or descending order of the values in this field, so records with the same
value sit next to each other. Because the records can be physically ordered in only
one way, a table can have only one clustering index.
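The phone directory analogy can be sketched as below, assuming records already sorted by a non-unique `last_name` field and one index entry per distinct value. The sample names and function names are illustrative only.

```python
# Data file physically ordered by last name (the clustering field).
records = [
    ("Adams", "Ann"),
    ("Adams", "Bob"),
    ("Baker", "Cy"),
    ("Clark", "Di"),
    ("Clark", "Ed"),
    ("Clark", "Flo"),
]


def build_clustering_index(recs):
    """One entry per distinct value: the position of its first record."""
    index = {}
    for pos, (last_name, _) in enumerate(recs):
        index.setdefault(last_name, pos)
    return index


def lookup(recs, index, last_name):
    """Jump to the first match, then read neighbours until the value changes."""
    pos = index.get(last_name)
    found = []
    if pos is None:
        return found
    while pos < len(recs) and recs[pos][0] == last_name:
        found.append(recs[pos][1])
        pos += 1
    return found


clustering_index = build_clustering_index(records)
```

The index needs only three entries for six records, because all records sharing a last name are stored contiguously.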
A third type of index, called a secondary index, can be specified on any field other
than the primary key of the file. A secondary key is any field other than the primary
key that is used to look up records in a table. Its values may be unique, or several
records may share the same value.
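As an illustration, the sketch below builds a secondary index on a `marks` field over a file that is ordered by employee id, and uses it for a range query. The sample data, field names, and the choice of a sorted list of `(marks, address)` pairs are assumptions for the example.

```python
from bisect import bisect_left, bisect_right

# Data file ordered by primary key (emp_id); list position = record address.
employees = [
    (1, "Asha", 72),
    (2, "Binu", 55),
    (3, "Chen", 91),
    (4, "Dara", 72),
    (5, "Eli", 64),
]

# Secondary index on marks: sorted (marks, address) pairs, one per record.
secondary = sorted((marks, addr) for addr, (_, _, marks) in enumerate(employees))


def in_range(lo, hi):
    """Return names of employees with lo <= marks <= hi, in order of marks."""
    start = bisect_left(secondary, (lo, -1))
    end = bisect_right(secondary, (hi, len(employees)))
    return [employees[addr][1] for _, addr in secondary[start:end]]
```

The data file itself stays in primary key order; only the index is ordered by marks, so the same file can carry several secondary indexes on different fields.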