
File-System Interface

File Concept
Access Methods
Directory Structure
File-System Management
Allocation Methods
Free-Space Management

File Concept
Logically, a file is a sequence of logical records, i.e., a sequence of bits and bytes.
A file is a named collection of related information that is recorded on secondary storage.
Contiguous logical address space
Types:
Data: numeric, character, binary
Program

File Attributes
Name: the only information kept in human-readable form.
Type: needed for systems that support different file types.
Location: pointer to the file's location on the device.
Size: current file size in bytes, words, or blocks.
Protection: controls who can read, write, or execute the file.
Owner: the user who created the file.
Time, date, and user identification
Information about files is kept in the directory structure, which is maintained on the disk.

File Operations
Typical operations include the following:
Create: a new file is defined and positioned within the structure of files. To create a file, space must be found for it in storage and an entry for it must be made in the directory structure.
Delete: the file's entry is removed from the directory structure, and the space it occupied is added to the free-space list.
Open: an existing file is declared to be opened by a process, allowing the process to perform read/write/lseek/close functions on the file.
Close: the file is closed with respect to a process, so that the process may no longer perform functions on the file until it opens the file again.

File Operations (Contd)


Read: a process reads all or a portion of the data in a file, starting from the current file position pointer.
Write: a process updates a file, either by adding new data that expands the size of the file or by changing the values of existing data items, depending on the current file position pointer.
Repositioning within a file: the directory is searched for the appropriate entry, and the current file position pointer is set to a given value. This need not involve any actual I/O; the operation is also known as a file seek (lseek in Unix).
Truncating a file: the user may want to erase the contents of a file but keep its attributes, rather than deleting the file and recreating it. Truncation leaves the attributes unchanged except for the file size, which is reset to zero, and releases the file's space.

File Operations (Contd)


Append: this call is a restricted form of write; it can only add data to the end of the file.
Get Attributes: using this operation, the attributes of a file can be obtained, which certain system operations require; e.g., the modification time is used by the make command in Unix.
Set Attributes: some attributes are user-settable and can be changed after the file has been created; e.g., the file protection attributes.
Rename: users frequently need to change the name of an existing file.
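
The operations listed on these slides map fairly directly onto system calls. The following is a minimal sketch using Python's os module on a Unix-like system; the scratch directory and the file names notes.txt and renamed.txt are hypothetical and exist only for the illustration.

```python
import os
import tempfile

# Hypothetical scratch directory and file name, used only for this sketch.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "notes.txt")

# Create + open: O_CREAT adds a directory entry, O_RDWR allows read and write.
fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)

# Write: data goes at the current file position pointer (offset 0 here).
os.write(fd, b"hello world")

# Reposition: lseek only moves the pointer; no file data is transferred.
os.lseek(fd, 6, os.SEEK_SET)

# Read: returns bytes starting at the current position.
tail = os.read(fd, 5)

# Truncate: the length drops to zero, the file and its attributes remain.
os.ftruncate(fd, 0)
size_after_truncate = os.fstat(fd).st_size

# Close: the descriptor may not be used again until the file is reopened.
os.close(fd)

# Append: with O_APPEND every write lands at the end of the file.
fd = os.open(path, os.O_WRONLY | os.O_APPEND)
os.write(fd, b"abc")
os.write(fd, b"def")
os.close(fd)

# Get attributes: size, modification time, protection bits, and so on.
info = os.stat(path)

# Set attributes: change the protection bits.
os.chmod(path, 0o600)

# Rename, then delete (remove the directory entry and free the space).
new_path = os.path.join(workdir, "renamed.txt")
os.rename(path, new_path)
with open(new_path, "rb") as f:
    final_contents = f.read()
os.remove(new_path)
os.rmdir(workdir)
```

Note that the reposition step changes only the position pointer; the data transfer happens in the read that follows it.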

File Types: Name, Extension

Table columns: file type, usual extension, function

File Table
The operating system keeps a small table containing information about all open files.
When a file operation is requested, the file is specified via an index into this table, so no searching is required.
The open-file table also has an open count associated with each file, indicating the number of processes that have the file open.
When the last process that has the file open closes it, the OS removes the entry from the file table.
Several pieces of information are associated with each entry in the file table:
File pointer
File open count
Disk location of the file
Access rights
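
As a rough illustration of how an open count governs the lifetime of a table entry, here is a toy model. It is a sketch only: the class names OpenFileEntry and OpenFileTable are hypothetical, and it collapses the per-process and system-wide tables of a real operating system into a single table.

```python
from dataclasses import dataclass

@dataclass
class OpenFileEntry:
    name: str
    disk_location: int      # hypothetical starting block number
    access_rights: str      # e.g. "r" or "rw"
    file_pointer: int = 0
    open_count: int = 0

class OpenFileTable:
    def __init__(self):
        self.entries = {}    # index -> OpenFileEntry
        self.by_name = {}    # name -> index, so a second open reuses the entry
        self.next_index = 0

    def open(self, name, disk_location, access_rights):
        """Return the table index for the file, creating an entry if needed."""
        if name in self.by_name:
            index = self.by_name[name]
        else:
            index = self.next_index
            self.next_index += 1
            self.entries[index] = OpenFileEntry(name, disk_location, access_rights)
            self.by_name[name] = index
        self.entries[index].open_count += 1
        return index

    def close(self, index):
        """Decrement the open count; drop the entry when it reaches zero."""
        entry = self.entries[index]
        entry.open_count -= 1
        if entry.open_count == 0:
            del self.by_name[entry.name]
            del self.entries[index]

table = OpenFileTable()
a = table.open("report.txt", disk_location=120, access_rights="rw")
b = table.open("report.txt", disk_location=120, access_rights="rw")
count_while_shared = table.entries[a].open_count
table.close(a)
still_open = a in table.entries
table.close(b)
removed = a not in table.entries
```

Later operations use the returned index rather than the file name, which is why no directory search is needed after the open.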

Layered File System

Access Methods
The information in a file can be accessed in several ways.
Early operating systems supported only one type of file access: sequential.
With the introduction of disk technology came random access to files, which is essential for most modern applications.

Access Method

Sequential: one record after another, from beginning to end
Random (direct): access one specific record without having to retrieve all records before it

Sequential Access
In this method, data records are retrieved in the same order in which they were stored on the disk.
This mode of access is the most common; for example, editors and compilers usually access files in this fashion.
A read operation (read next) reads the next portion of the file and automatically advances the file pointer, which tracks the I/O location.
Similarly, a write operation (write next) appends to the end of the file and advances the pointer to the end of the newly written data (the new end of file).
For example, magnetic tapes store information sequentially, and retrieval is performed in the same manner.
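
The tape-like behaviour of read next, write next, and reset can be modelled in a few lines. This is a sketch under simplifying assumptions: the class name SequentialFile is hypothetical, and records are held in a Python list rather than on a device.

```python
class SequentialFile:
    """Toy model of a sequential file: a list of records plus one position."""

    def __init__(self):
        self.records = []
        self.position = 0    # index of the next record to read

    def write_next(self, record):
        # Appends at the end of the file and leaves the pointer at the new end.
        self.records.append(record)
        self.position = len(self.records)

    def read_next(self):
        # Returns the record at the pointer and advances it; None at end of file.
        if self.position >= len(self.records):
            return None
        record = self.records[self.position]
        self.position += 1
        return record

    def reset(self):
        # Rewinds to the beginning, like rewinding a tape.
        self.position = 0

f = SequentialFile()
f.write_next("rec-A")
f.write_next("rec-B")
after_write = f.read_next()      # pointer is at end of file, nothing to read
f.reset()
first = f.read_next()
second = f.read_next()
third = f.read_next()
```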

Sequential Access (Contd)


Records in a sequential file can only be accessed one after another, from beginning to end.
Sequential files suit applications that need to access all records from beginning to end, for example processing personal information.
Because such applications have to process every record, sequential access is more efficient and easier for them than random access.
A sequential file is not efficient for random access.

Direct Access
A file is made up of fixed-length logical records that allow programs to read and write records rapidly in no particular order.
It is based on the disk model of a file, and it allows random access to any block.
For direct access, the file is treated as a numbered sequence of blocks or records.
Hence we may read block 12, then read block 57, and then write block 6.
There is no restriction on the order of reading or writing for a direct-access file.
Direct-access files are of great use for immediate access to large amounts of information.
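
With fixed-length records, record n starts at byte offset n times the record size, so any record can be reached with a single seek. The sketch below replays the read 12, read 57, write 6 sequence from the slide; the record size of 16 bytes and the helper names are assumptions, and an in-memory buffer stands in for the disk file.

```python
import io

RECORD_SIZE = 16   # assumed fixed record length in bytes

def write_record(f, n, data):
    """Write record number n (0-based) without touching the others."""
    f.seek(n * RECORD_SIZE)
    f.write(data.ljust(RECORD_SIZE, b" ")[:RECORD_SIZE])

def read_record(f, n):
    """Read record number n directly by computing its byte offset."""
    f.seek(n * RECORD_SIZE)
    return f.read(RECORD_SIZE).rstrip(b" ")

# An in-memory file of 64 blank records stands in for a disk file.
f = io.BytesIO(b" " * (RECORD_SIZE * 64))

write_record(f, 12, b"twelve")
write_record(f, 57, b"fifty-seven")
r12 = read_record(f, 12)
r57 = read_record(f, 57)
write_record(f, 6, b"six")
r6 = read_record(f, 6)
```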

Indexed file
The indexed-file approach is helpful for files with multiple attribute fields, such as database files.
Here, every indexed field is associated with an index key.
While querying data, the index keys are kept in memory and the related records are fetched from the disk.
The index keys help distinguish one record from another.
Indexes can be organized in various ways, such as a single-level index or a multilevel index.
These are sometimes loosely described as dense and sparse indexes, respectively; strictly, a dense index has an entry for every record, while a sparse index has entries for only some records.
In a single-level index, every record has an index key associated with it.
These index values can be associated with a primary or a secondary index key.

Indexed file (Contd)


In a multilevel index, a group of entries shares one index key, and the entries within that group are indexed further on another key.
Indexes need only limited memory, since only the index key values have to be in memory at any moment.
The indexed access method lies between sequential and random access and is beneficial for large record files.

Indexed file (Contd)


An indexed file is made of a data file, which is a sequential file, and an index.
Index: a small file with only two fields:
The key of the sequential file
The address of the corresponding record on the disk
To access a record in the file:
1. Load the entire index file into main memory
2. Search the index file to find the desired key
3. Retrieve the address of the record
4. Retrieve the data record (using the address)
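
The four access steps can be sketched as follows. This is an illustration only: the sample keys and names are made up, the index is a Python dict already held in memory (standing in for step 1), and an in-memory buffer plays the role of the sequential data file.

```python
import io

# Data file: sequential records, one per line (hypothetical sample data).
data_file = io.BytesIO()
index = {}   # key -> byte address of the record in the data file

for key, name in [(101, b"Asha"), (205, b"Bruno"), (310, b"Chen")]:
    index[key] = data_file.tell()            # remember where the record starts
    data_file.write(b"%d,%s\n" % (key, name))

def fetch(key):
    """Look the key up in the in-memory index, then read the record directly."""
    address = index.get(key)                  # steps 2 and 3: search, get address
    if address is None:
        return None
    data_file.seek(address)                   # step 4: go straight to the record
    return data_file.readline().rstrip(b"\n")

found = fetch(205)
missing = fetch(999)
```

Only the small index is searched; the data file is touched once, at the address the index supplies.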

Logical view of an indexed file

Another view of indexed files

Mapping in a hashed file

A hashed file uses a hash function to map the key to the address.
This eliminates the need for an extra file (the index) and all of the overhead associated with it.

Direct hashing
In direct hashing, the key is the address.
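
A minimal sketch of direct hashing, assuming keys are small non-negative integers and the file has one slot per possible key. The slot count of 100 and the function names are hypothetical.

```python
NUM_SLOTS = 100                  # assumed: one slot for every possible key
slots = [None] * NUM_SLOTS       # stands in for the record slots of the file

def direct_hash(key):
    """Direct hashing: the key itself is the address, no computation needed."""
    if not 0 <= key < NUM_SLOTS:
        raise ValueError("key outside the address range of the file")
    return key

def store(key, record):
    slots[direct_hash(key)] = record

def fetch(key):
    return slots[direct_hash(key)]

store(42, "record for key 42")
store(7, "record for key 7")
r42 = fetch(42)
empty = fetch(8)
```

The price of this simplicity is that the file must reserve a slot for every possible key, whether or not a record with that key exists.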

Access Methods
Sequential Access
read next
write next
reset
no read after last write (rewrite)

Direct Access (n = relative block number)
read n
write n
position to n
read next
write next
rewrite n

Index method:
To find a record in the file, we first search the index, then use the pointer to access the file directly and find the desired record (index file and relative file).

Directory Structure
A collection of nodes containing information about all files.

Figure: a directory whose entries point to the files F1, F2, F3, F4, ..., Fn

Both the directory structure and the files reside on disk.
Backups of these two structures are kept on tapes.

Information in a Device Directory
Name
Type
Address
Current length
Maximum length
Date last accessed
Date last updated
Owner ID
Protection information

Operations Performed on Directory
Search for a file
Create a file
Delete a file
List a directory
Rename a file
Traverse the file system

Single-Level Directory
A single directory for all users.

Naming problem
Grouping problem

Two-Level Directory
Separate directory for each user.

Path name
Different users can have files with the same name
Efficient searching

Tree-Structured Directories

Efficient searching
Grouping Capability
Current directory (working directory)
cd /spell/mail/prog
type list

Tree-Structured Directories (Contd)
Absolute or relative path name
Creating a new file is done in the current directory.
Deleting a file:
rm <file-name>
Creating a new subdirectory is done in the current directory:
mkdir <dir-name>

Example: if the current directory is /spell/mail, then
mkdir count
adds count to the entries of mail: prog, copy, prt, exp, count

Deleting mail deletes the entire subtree rooted at mail.
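
The mkdir example and the subtree deletion can be mimicked with a small in-memory tree. This is a sketch, not how a real file system stores directories: the Directory class and the resolve, mkdir, and rm helpers are hypothetical names, and only directories (no files) are modelled.

```python
class Directory:
    def __init__(self):
        self.children = {}   # name -> Directory

root = Directory()

def resolve(path, cwd="/"):
    """Return the Directory named by an absolute or relative path."""
    full = path if path.startswith("/") else cwd.rstrip("/") + "/" + path
    node = root
    for part in full.split("/"):
        if part:
            node = node.children[part]
    return node

def mkdir(name, cwd):
    """Create a subdirectory in the current directory."""
    resolve(cwd).children[name] = Directory()

def rm(name, cwd):
    """Remove an entry; everything below it goes with it."""
    del resolve(cwd).children[name]

mkdir("spell", "/")
mkdir("mail", "/spell")
for existing in ["prog", "copy", "prt", "exp"]:
    mkdir(existing, "/spell/mail")

mkdir("count", "/spell/mail")                  # current directory is /spell/mail
mail_entries = sorted(resolve("/spell/mail").children)
same_node = resolve("count", cwd="/spell/mail") is resolve("/spell/mail/count")

rm("mail", "/spell")                           # deletes the subtree rooted at mail
spell_entries = list(resolve("/spell").children)
```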

File Management System


A file management system is the set of system software that provides services to users and applications in the use of files.
Typically, the only way that a user or application may access files is through the file management system.
This relieves the user or programmer of the necessity of developing special-purpose software for each application.
It provides the system with a consistent, well-defined means of controlling its most important asset.

Objectives for a File Management System
Objectives include:
To meet the data management needs and requirements of the user
To guarantee, to the extent possible, that the data in the file are valid
To optimize performance, both from the system point of view in terms of overall throughput and from the user's point of view in terms of response time
To provide I/O support for a variety of storage device types
To minimize or eliminate the potential for lost or destroyed data
To provide a standardized set of I/O interface routines to user processes
To provide I/O support for multiple users, in the case of multiple-user systems

Requirements for a general-purpose system
1. Each user should be able to create, delete, read, write, and modify files.
2. Each user may have controlled access to other users' files.
3. Each user may control what types of accesses are allowed to the user's files.
4. Each user should be able to restructure the user's files in a form appropriate to the problem.
5. Each user should be able to move data between files.
6. Each user should be able to back up and recover the user's files in case of damage.
7. Each user should be able to access his or her files by name rather than by numeric identifier.

Allocation Methods
The main problem is how to allocate space to files so that disk space is utilized effectively and files can be accessed quickly.
An allocation method refers to how disk blocks are allocated for files.
Three major methods of allocating disk space are in wide use:
Contiguous allocation
Linked allocation
Indexed allocation

Each method has its own advantages and disadvantages.

Contiguous Allocation
Each file occupies a set of contiguous blocks on the disk.
Disk addresses define a linear ordering on the disk.
With this ordering, assuming that only one job is accessing the disk, accessing block b + 1 after block b normally requires no head movement; when head movement is needed, it is only to the next track.
Hence the disk seek time is minimal.
Simple: only the starting location (block #) and the length (number of blocks) are required.
Random access: both sequential and direct access are efficient.

Contiguous Allocation of Disk Space (Contd)
One main problem is finding space for a new file, which depends on how free space is managed.
Wasteful of space (dynamic storage-allocation problem): how to satisfy a request of size n from a list of free holes.
First fit and best fit are the most common strategies for selecting a free hole from the set of available holes.
They are efficient but not optimal in terms of storage utilization.
All these algorithms suffer from external fragmentation: as files are created and deleted, the free disk space is broken into small pieces.
Files cannot grow easily: the maximum size must be known at creation time, and overestimating it wastes disk space.
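
To make the difference between the two strategies concrete, here is a small sketch that selects a hole for a request of n blocks. The hole list (pairs of start block and length) is invented sample data, and the function names are hypothetical.

```python
def first_fit(holes, n):
    """Index of the first hole with at least n blocks, or None."""
    for i, (start, length) in enumerate(holes):
        if length >= n:
            return i
    return None

def best_fit(holes, n):
    """Index of the smallest hole with at least n blocks, or None."""
    candidates = [(length, i) for i, (start, length) in enumerate(holes)
                  if length >= n]
    return min(candidates)[1] if candidates else None

# Free holes as (start block, length in blocks); sample values only.
holes = [(0, 5), (20, 12), (40, 7), (60, 3)]

ff = first_fit(holes, 6)     # first hole that is big enough: (20, 12)
bf = best_fit(holes, 6)      # tightest hole that is big enough: (40, 7)
none_fit = first_fit(holes, 20)
```

In this sample the free holes total 27 blocks, yet a request for 20 contiguous blocks fails, which is external fragmentation in miniature.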

Contiguous Allocation
Each file occupies a set of contiguous blocks on the disk
Simple: only starting location (block #) and length (number of blocks) are required
Random access
Wasteful of space (dynamic storage-allocation problem)
First fit/best fit
Mapping from logical to physical
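
The logical-to-physical mapping for contiguous allocation is a single division: the quotient selects the block relative to the start of the file, and the remainder is the offset inside that block. The sketch assumes a block size of 512 bytes and a hypothetical file that starts at block 14 and is 3 blocks long.

```python
BLOCK_SIZE = 512   # assumed block size in bytes

def contiguous_map(start_block, length_blocks, logical_address):
    """Map a byte offset within the file to (physical block, offset in block)."""
    q, r = divmod(logical_address, BLOCK_SIZE)
    if not 0 <= q < length_blocks:
        raise ValueError("logical address is outside the file")
    return start_block + q, r

# Hypothetical file: starts at block 14, occupies 3 blocks (14, 15, 16).
block, offset = contiguous_map(14, 3, 1030)   # 1030 = 2 * 512 + 6
```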

Linked Allocation
Linked allocation: each file is a linked list of blocks; blocks may be scattered anywhere on the disk
Each block contains a pointer to the next block
File ends at nil pointer
No compaction, no external fragmentation
Free-space management system is called when a new block is needed
Efficiency can be improved by clustering blocks into groups, but this increases internal fragmentation
Reliability can be a problem
Locating a block can take many I/Os and disk seeks

Linked Allocation
Allocate as needed, link together; e.g., file starts at block 9

File-Allocation Table

FAT (File Allocation Table) variation
Beginning of volume has a table, indexed by block number
Much like a linked list, but faster on disk and cacheable
New block allocation is simple
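
A FAT keeps the next-block pointers in one table instead of inside the data blocks, so a chain can be followed without reading the blocks themselves. The sketch below uses a dict as the table and an invented chain for a file starting at block 9; the end-of-file marker of -1 and the function names are assumptions for the illustration.

```python
END_OF_FILE = -1   # assumed marker for the last block of a file

# Table indexed by block number; each entry names the next block of the file.
# Sample chain only: 9 -> 16 -> 1 -> 10 -> 25 -> end.
fat = {9: 16, 16: 1, 1: 10, 10: 25, 25: END_OF_FILE}

def file_blocks(fat, start_block):
    """Follow the chain from the start block and list every block of the file."""
    blocks = []
    block = start_block
    while block != END_OF_FILE:
        blocks.append(block)
        block = fat[block]
    return blocks

def nth_block(fat, start_block, n):
    """Physical block holding logical block n (0-based) of the file."""
    block = start_block
    for _ in range(n):
        block = fat[block]
        if block == END_OF_FILE:
            raise IndexError("logical block is past the end of the file")
    return block

chain = file_blocks(fat, 9)
third = nth_block(fat, 9, 3)
```

Reaching logical block n still takes n table lookups, but they are lookups in a table that can be cached in memory rather than n disk reads.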

Indexed Allocation
Each file has its own index block(s) of pointers to its data blocks
Brings all pointers together into the index block
Logical view: the index table
Need index table
Random access
Dynamic access without external fragmentation, but has the overhead of the index block
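
With an index block, random access needs one table lookup: the quotient of the logical address by the block size selects the entry, and the remainder is the offset within the data block. The block size of 512 bytes and the contents of the index block are assumptions made for this sketch.

```python
BLOCK_SIZE = 512   # assumed block size in bytes

# Hypothetical index block: entry i holds the disk block of logical block i.
index_block = [9, 16, 1, 10, 25]

def indexed_map(index_block, logical_address):
    """Map a byte offset within the file to (physical block, offset in block)."""
    q, r = divmod(logical_address, BLOCK_SIZE)
    if not 0 <= q < len(index_block):
        raise ValueError("logical address is outside the file")
    return index_block[q], r

block, offset = indexed_map(index_block, 1030)   # 1030 = 2 * 512 + 6
```

The data blocks need not be adjacent, which is why external fragmentation does not arise; the cost is the extra block that holds the pointers.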

Example of Indexed Allocation

Free-Space Management
To keep track of free disk space, the system maintains a free-space list.
The free-space list records all free disk blocks, that is, those not allocated to some file or directory.

Free-Space Management
Bit vector (n blocks)
Each block is represented by 1 bit, for i = 0, 1, ..., n-1:
bit[i] = 1 if block[i] is free
bit[i] = 0 if block[i] is occupied

Advantage:
It is relatively simple and efficient to find the first free block or n consecutive free blocks on the disk.
Easy to get contiguous files
Disadvantage:
Efficient searching may require special hardware support (bit-manipulation instructions).
The bit map requires extra space, which becomes significant if the disk is large.
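
Both searches named under "Advantage" are simple scans over the bit vector. The sketch keeps the convention above (1 means free, 0 means occupied) and uses a Python list of ints in place of packed bits; the sample vector and function names are made up.

```python
def first_free(bits):
    """Index of the first free block (bit == 1), or None if the disk is full."""
    for i, bit in enumerate(bits):
        if bit == 1:
            return i
    return None

def first_free_run(bits, n):
    """Start index of the first run of n consecutive free blocks, or None."""
    run_start, run_len = None, 0
    for i, bit in enumerate(bits):
        if bit == 1:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == n:
                return run_start
        else:
            run_len = 0
    return None

# Sample bit vector: blocks 2, 4, 5, 6, 8 and 9 are free.
bits = [0, 0, 1, 0, 1, 1, 1, 0, 1, 1]

one = first_free(bits)
three = first_free_run(bits, 3)
four = first_free_run(bits, 4)
```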

Free-Space Management
Linked list (free list)
This approach maintains a linked list of all the free disk blocks.
The first free block contains a pointer to the next free disk block, and so on.
Cannot get contiguous space easily
No waste of space

Grouping
This stores the addresses of n free blocks in the first free block.
The last of these blocks contains the addresses of another n free blocks, and so on.

Counting
We can keep the address of the first free block and the number n of free contiguous blocks that follow the first block.
Each entry in the free-space list consists of a disk address and a count.
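
The counting representation can be derived from a bit vector by collapsing each run of free blocks into one entry. In this sketch the count is taken to be the total number of blocks in the run, including the first one, which is an assumption about how the count is defined; the function name and the sample vector are also made up.

```python
def counting_list(bits):
    """Collapse runs of free blocks (bit == 1) into (address, count) entries."""
    entries = []
    i = 0
    while i < len(bits):
        if bits[i] == 1:
            start = i
            while i < len(bits) and bits[i] == 1:
                i += 1
            entries.append((start, i - start))   # count includes the first block
        else:
            i += 1
    return entries

# Sample bit vector (1 = free): blocks 2, 4, 5, 6, 8 and 9 are free.
bits = [0, 0, 1, 0, 1, 1, 1, 0, 1, 1]
free_list = counting_list(bits)
```

Six free blocks are described by three entries here; the saving grows when free space comes in long contiguous runs.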

Linked Free Space List on Disk
