FILE ORGANIZATION

A file is a collection of related records. The size of a file is limited by the capacity
of memory and of the storage medium. A file contains records, records contain fields,
fields contain data items, and data items contain characters (letters, digits, special
characters, etc.). Each character occupies one byte of storage in a single-byte encoding
such as ASCII.

File organization ensures that records are available for processing. The technique
used to represent and store the records in a file is known as file organization.
For example, if we want to retrieve employee records in alphabetical order of name,
sorting the file by employee name is a good file organization. However, if we want
to retrieve all employees whose salary is in a certain range, a file ordered by
employee name would not be a good file organization.

There are three types of file organization:

1. Sequential access file organization
2. Direct access file organization
3. Indexed sequential access file organization

1. Sequential access file organization

• Sequential file organization is the simplest file organization technique. In a
sequentially organized file, records are written one after another in one long list,
in the same sequence in which they were originally entered into the file.
• Storing records in contiguous blocks within a file on tape or disk is called
sequential access file organization.
• The records may also be kept sorted in ascending or descending order of a key field.
• A search of a sequential file starts from the beginning of the file, and new records
can be added only at the end of the file.
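
The behaviour described above can be illustrated with a minimal sketch. It assumes a simple text layout of one comma-separated record per line and uses an in-memory file; the names append_record and sequential_search, and the sample employee data, are hypothetical and chosen only for illustration.

```python
import io


def append_record(f, key, name):
    """Add a record at the end of the file, the only place a sequential file grows."""
    f.seek(0, io.SEEK_END)
    f.write(f"{key},{name}\n")


def sequential_search(f, key):
    """Scan from the first record; return (name, number of records read)."""
    f.seek(0)
    reads = 0
    for line in f:
        reads += 1
        record_key, name = line.rstrip("\n").split(",", 1)
        if record_key == key:
            return name, reads
    return None, reads


# An in-memory file stands in for a file on tape or disk (assumption).
employees = io.StringIO()
for key, name in [("101", "Asha"), ("102", "Bilal"), ("103", "Chen")]:
    append_record(employees, key, name)

print(sequential_search(employees, "103"))  # the last record is found only after reading all three
```

The number of records read grows with the position of the record in the file, which is why random searching is impractical with this organization.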

Advantages of sequential file

• It is simple to program and easy to design.
• It gives easy access to the next record.
• The organization is simple and needs no auxiliary data structures.

Disadvantages of sequential file

• Processing a sequential file is time consuming, because reaching a record requires
reading the records before it.
• It can have high data redundancy.
• Random searching is not possible.
• Once a sequential file is created, records can be added only at the end of the file.
It is not possible to insert a record in the middle of the file, or to modify an
existing record, without rewriting the file. To delete a record, it must first be
located by searching from the beginning.

2. Direct access file organization

• Direct access file organization is also known as random access or relative file
organization.
• In a direct access file, all records are stored on a direct access storage device
(DASD), such as a hard disk. The records are placed throughout the file in no
particular sequence.
• The records do not need to be in sequence because they are updated directly
and rewritten back in the same location.
• This file organization is useful for immediate access to large amounts of
information. It is used in accessing large databases.
• The location of a record is typically computed from its key by a hash function, so
this organization is also called hashing.
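
As a rough sketch of how hashing gives direct access, the example below keeps fixed-length records in an in-memory file and computes each record's slot from its key. The record layout, the slot count, the modulo hash function and the linear probing used for collisions are all assumptions made for illustration; the notes do not prescribe any of them. Keys are assumed to be non-negative integers.

```python
import io
import struct

RECORD = struct.Struct("<i20s")  # 4-byte integer key + 20-byte name (assumed layout)
NUM_SLOTS = 7                    # hypothetical file capacity
EMPTY_KEY = -1                   # marks an unused slot


def slot_for(key):
    """Hash function: map a key to a relative record number."""
    return key % NUM_SLOTS


def create_file():
    """Pre-allocate every slot so any record can be reached with one seek."""
    f = io.BytesIO()
    for _ in range(NUM_SLOTS):
        f.write(RECORD.pack(EMPTY_KEY, b""))
    return f


def read_slot(f, slot):
    f.seek(slot * RECORD.size)
    key, name = RECORD.unpack(f.read(RECORD.size))
    return key, name.rstrip(b"\x00").decode()


def store(f, key, name):
    """Write the record in its hashed slot, probing forward on a collision."""
    slot = slot_for(key)
    for _ in range(NUM_SLOTS):
        existing_key = read_slot(f, slot)[0]
        if existing_key in (EMPTY_KEY, key):
            f.seek(slot * RECORD.size)
            f.write(RECORD.pack(key, name.encode()))
            return slot
        slot = (slot + 1) % NUM_SLOTS
    raise RuntimeError("file is full")


def fetch(f, key):
    """Jump straight to the hashed slot; no scan from the start of the file."""
    slot = slot_for(key)
    for _ in range(NUM_SLOTS):
        existing_key, name = read_slot(f, slot)
        if existing_key == key:
            return name
        if existing_key == EMPTY_KEY:
            return None
        slot = (slot + 1) % NUM_SLOTS
    return None


f = create_file()
store(f, 15, "Asha")   # 15 % 7 == 1
store(f, 22, "Bilal")  # 22 % 7 == 1 as well, so it is placed in the next free slot
```

Because every slot is allocated up front, some slots stay empty, which is one reason this organization uses storage less efficiently than a sequential file.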

Advantages of direct access file organization

• Direct access files support online transaction processing (OLTP) systems, such as
an online railway reservation system.
• Sorting of the records is not required.
• The desired records are accessed immediately.
• Several files can be updated quickly.
• It gives better control over record allocation.

Disadvantages of direct access file organization

• A direct access file does not provide a backup facility: records are updated in
place, so no earlier copy of the file is kept.
• It is expensive.
• It uses storage space less efficiently than a sequential file.

3. Indexed sequential access file organization

• A sequential file (that is, one sorted on its primary key) that is indexed is called
an indexed sequential file. The index provides random access to records, while
the sequential nature of the file provides easy access to the subsequent records
as well as sequential processing.
• An indexed sequential access file combines sequential and direct access file
organization.
• In an indexed sequential access file, records are stored in primary key order on a
direct access device such as a magnetic disk.
• The file can have multiple keys, and the keys can be alphanumeric. The key by
which the records are ordered is called the primary key.
• The indexed sequential file is designed to overcome the limitations of the
sequential file. The file is sequenced on a particular field, and an index for that
file is built on that field. The index provides a mechanism for faster search.
• The data can be accessed either sequentially or randomly using the index. The
index is stored in a file and read into memory when the file is opened.
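
The combination of an index with a sorted file can be sketched as follows. This is an illustration under stated assumptions, not a description of any particular system: records are grouped into small blocks, and the index holds only the first key of each block (a sparse index). The block size, the sample records and the names lookup and scan_from are hypothetical.

```python
from bisect import bisect_right

BLOCK_SIZE = 3  # records per block (hypothetical)

# The data file is kept in primary key order.
records = sorted([(105, "Eli"), (101, "Asha"), (109, "Ines"), (103, "Chen"),
                  (107, "Gita"), (102, "Bilal"), (108, "Hugo")])
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# The index stores the first key of each block and is small enough to keep in memory.
index = [block[0][0] for block in blocks]


def lookup(key):
    """Random access: search the index, then scan only the one block it points to."""
    pos = bisect_right(index, key) - 1
    if pos < 0:
        return None
    for record_key, name in blocks[pos]:
        if record_key == key:
            return name
    return None


def scan_from(key):
    """Sequential processing: yield records with primary key >= key, in key order."""
    pos = max(bisect_right(index, key) - 1, 0)
    for block in blocks[pos:]:
        for record_key, name in block:
            if record_key >= key:
                yield record_key, name


print(lookup(107))
print(list(scan_from(107)))
```

A lookup reads the index and one block instead of the whole file, while scan_from shows that the records can still be processed in key order.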

Advantages of Indexed sequential access file organization

• Both sequential and random access are possible.
• Records are accessed very fast if the index table is properly organized.
• Records can be inserted in the middle of the file.
• It provides quick access for sequential and direct processing.
• It reduces the amount of sequential searching.

Disadvantages of Indexed sequential access file organization

• It requires unique keys and periodic reorganization.
• Every access by key must first search the index, which adds time to data access
and retrieval.
• It requires more storage space, because the index is stored in addition to the
data, and so uses storage less efficiently than other file organizations.
• It is expensive because it requires special software.

MULTIKEY FILE ORGANIZATION

1. Inverted list file organization

• An inverted list is an index data structure that maps content to its location in a
document, a set of documents or a database file.
• In inverted file organization, a linkage is provided between an index and the
file of data records. Its index entries are variable-length records.
• Inverted files may also save space compared with other file structures.

2. Multi-list file organization

• Another basic approach to providing the linkage between an index and the file of
data records is called multilist organization. A multilist file maintains an index
for each secondary key.
• The index entry for a secondary key value contains only one primary key value,
that of the first record with that secondary key value. That record is linked to the
other records containing the same secondary key value in the data file.
• The index entry in a multi-list organization points to the first data record in the
list. Its index entries are fixed-length records.

The multi-list organization differs from the inverted file as follows: the entry in
the inverted file index for a key value has a pointer to each data record with that
key value, whereas the entry in the multi-list index for a key value has just one
pointer, to the first data record with that key value.
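
The difference can be made concrete with a small sketch. It assumes that a record's address is simply its position in a list and that the secondary key is a department field; the sample records and the function names build_inverted, build_multilist and follow_chain are hypothetical.

```python
records = [
    {"id": 1, "dept": "Sales"},
    {"id": 2, "dept": "IT"},
    {"id": 3, "dept": "Sales"},
    {"id": 4, "dept": "IT"},
    {"id": 5, "dept": "Sales"},
]


def build_inverted(records, field):
    """Inverted file: each index entry lists a pointer to every matching record."""
    index = {}
    for addr, rec in enumerate(records):
        index.setdefault(rec[field], []).append(addr)
    return index


def build_multilist(records, field):
    """Multilist: each index entry holds one pointer; records are chained by next pointers."""
    heads = {}                        # key value -> address of the first record
    next_ptr = [None] * len(records)  # per-record link to the next record with the same value
    last = {}                         # key value -> address of the most recent record seen
    for addr, rec in enumerate(records):
        value = rec[field]
        if value in last:
            next_ptr[last[value]] = addr
        else:
            heads[value] = addr
        last[value] = addr
    return heads, next_ptr


def follow_chain(heads, next_ptr, value):
    """Collect all record addresses for a value by walking the linked list."""
    chain = []
    addr = heads.get(value)
    while addr is not None:
        chain.append(addr)
        addr = next_ptr[addr]
    return chain


inverted = build_inverted(records, "dept")
heads, next_ptr = build_multilist(records, "dept")
```

In this sketch the inverted index entries grow with the number of matching records, while every multilist index entry is a single pointer and the remaining links live in the data records themselves.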

Both inverted files and multilist files have:

• An index for each secondary key.
• An index entry for each distinct value of the secondary key.
• An index that may be tabular or tree-structured.
• Index entries that may or may not be sorted.
• Pointers to data records that may be direct or indirect.

Types of Indexes

Each record, whenever it is stored, is given a particular location, and this location
number is called its address. Through this address the record can be accessed. An
index is a table which stores the key values and the corresponding addresses of the
records in a file. Given a key value, its address is located in the index and the
corresponding record can be accessed using this address. The idea behind an index
structure is similar to the one used commonly in textbooks. In a textbook index,
important terms are listed at the end of the book in alphabetic order, along with the
pages on which they appear.
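
A minimal sketch of this idea is shown below. It assumes that the address of a record is its byte offset in the data file and uses an in-memory file; the record format and the names add_record and get_record are hypothetical.

```python
import io

data_file = io.BytesIO()  # stands in for the data file on disk (assumption)
index = {}                # key value -> address (byte offset of the record)


def add_record(key, text):
    """Store the record at the end of the file and note its address in the index."""
    data_file.seek(0, io.SEEK_END)
    index[key] = data_file.tell()
    data_file.write(f"{key}|{text}\n".encode())


def get_record(key):
    """Locate the address in the index, then read the record at that address."""
    address = index.get(key)
    if address is None:
        return None
    data_file.seek(address)
    return data_file.readline().decode().rstrip("\n")


add_record("B12", "Bilal")
add_record("A07", "Asha")
print(get_record("A07"))
```

The index holds only keys and addresses, so it stays much smaller than the data it points to.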

An index is usually defined on a single field of a file, called an Indexing Field. The
index typically stores each value of the index field along with a list of pointers to all
disk blocks that contain a record with that field value. The values in the index are
ordered and the index file is much smaller than the data file. There are several types
of indexes, including the primary index, the clustering index and the secondary index.

A primary index is an index specified on the primary key of a data file. The primary
key is a field which contains unique values and uniquely identifies each record.

A clustering index organizes data like a phone directory, where all people with the
same last name are grouped together. A clustering index is specified on a field that
does not have a distinct value for each record. The records are physically stored in
ascending or descending order of the values in this field. A file (table) can have
only one clustering index, because its records can be physically ordered in only one
way.
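
The phone directory analogy can be sketched as follows, assuming the records are held in a list sorted by last name and that the index keeps one entry per distinct last name, pointing to the first record with that name. The sample names and the functions build_clustering_index and find_all are hypothetical.

```python
# Records are physically ordered by the clustering field (last name).
records = sorted([("Smith", "Ann"), ("Jones", "Bo"), ("Smith", "Cy"),
                  ("Adams", "Di"), ("Jones", "Ed")])


def build_clustering_index(records):
    """One index entry per distinct value, pointing to its first record."""
    index = {}
    for pos, (last_name, _first_name) in enumerate(records):
        index.setdefault(last_name, pos)
    return index


def find_all(records, index, last_name):
    """Jump to the first match, then read forward while the value repeats."""
    pos = index.get(last_name)
    if pos is None:
        return []
    matches = []
    while pos < len(records) and records[pos][0] == last_name:
        matches.append(records[pos])
        pos += 1
    return matches


clustering_index = build_clustering_index(records)
print(find_all(records, clustering_index, "Jones"))
```

Because records with the same value sit next to each other, one index entry per distinct value is enough to reach all of them.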

A third type of index, called a secondary index, can be specified on any field other
than the primary key of the file. A secondary key is any field other than the primary
key that is used to search for records; unlike the primary key, its values need not
uniquely identify a record.
