
The Design and Implementation of a Log-Structured File System

Mendel Rosenblum
John K. Ousterhout
University of California, Berkeley

Presented by
Sunil Chowdary Vejandla
Introduction
• CPU speeds have increased rapidly, while disk access times have improved only slightly.
• Application performance is increasingly bound by disk access times.
• LFS was introduced to lessen the impact of disk accesses on applications.
• Disk traffic is dominated by writes, since large file caches absorb most reads.
• The log increases write performance and speeds up crash recovery.
• During recovery, current file systems must scan the entire disk, but an LFS examines only the most recent portion of the log.
• LFS uses the log as the permanent on-disk representation of data.
• A segment cleaner reclaims free space from the log.
Design for file systems of the 1990s
• Two forces: technology and workload.
• Technology: processors, disks, and main memory.
• Processor speed is increasing rapidly, which puts pressure on the other components to speed up.
• Disks are also improving, but in cost and capacity rather than in access time.
• Main memory is increasing in size; modern systems tend to have large main memories.
• Large file caches alter the workload presented to the disk and can act as write buffers.
Problems with existing file systems
• Two general problems:
1. Information is spread around the disk in a way that causes too many small accesses.
-- e.g., Unix FFS
2. Existing file systems tend to write synchronously.
-- This makes it hard for applications to benefit from faster CPUs.
Log-structured file systems
• Idea:
Improve write performance by buffering file system changes in the file cache and then writing all the changes to disk in a single large sequential write (a minimal sketch follows).
• Two key issues with the log approach:
1. How to retrieve information from the log?
2. How to keep large extents of free space available for new writes?
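A minimal sketch of this write path in C, assuming a hypothetical disk_write_sequential() helper and an illustrative flush threshold; this is not Sprite's actual code, only the buffering idea:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096
#define WRITE_BUFFER_BLOCKS 128          /* illustrative flush threshold */

extern void disk_write_sequential(uint64_t addr, const void *buf, size_t len);

static uint8_t  buffer[WRITE_BUFFER_BLOCKS * BLOCK_SIZE];
static size_t   buffered;                /* blocks currently buffered */
static uint64_t log_tail;                /* next free address at the log's end */

void lfs_write_block(const void *block)
{
    /* accumulate the change in memory instead of seeking to a
       per-file location on disk */
    memcpy(&buffer[buffered * BLOCK_SIZE], block, BLOCK_SIZE);
    if (++buffered == WRITE_BUFFER_BLOCKS) {
        /* one large sequential write carries all buffered changes */
        disk_write_sequential(log_tail, buffer, sizeof buffer);
        log_tail += sizeof buffer;
        buffered = 0;
    }
}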
File locating and reading
• The basic structures used by Unix FFS and Sprite LFS are the same.
• An inode contains the file's attributes, the addresses of its first ten blocks, and, for large files, the addresses of one or more indirect blocks.
• Unix FFS writes each inode to a fixed location on the disk, whereas in Sprite LFS an inode has no fixed position.
• Sprite LFS writes inodes to the log.
• The inode map maintains the current location of each inode.
• The inode map is divided into blocks.
• The checkpoint region identifies the locations of all the inode map blocks (a lookup sketch follows).
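A sketch of the two-step lookup, with hypothetical structure names and sizes rather than Sprite's actual on-disk layout:

#include <stdint.h>

#define INODES_PER_MAP_BLOCK 512         /* illustrative */

struct checkpoint_region {
    uint64_t imap_block_addr[1024];      /* where each inode map block lives */
};

/* Find the current disk address of inode `ino`:
   1. the checkpoint region locates the right inode map block;
   2. that block's entry gives the inode's latest position in the log. */
uint64_t locate_inode(const struct checkpoint_region *cr, uint32_t ino,
                      uint64_t (*read_imap_entry)(uint64_t blk, uint32_t idx))
{
    uint64_t map_blk = cr->imap_block_addr[ino / INODES_PER_MAP_BLOCK];
    return read_imap_entry(map_blk, ino % INODES_PER_MAP_BLOCK);
}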
Comparison between Sprite LFS and Unix FFS
Free space management
• Two choices: threading and copying.
• Threading: thread the log through the free areas left on disk.
disadvantage: free space becomes heavily fragmented, which defeats the large sequential writes that make an LFS fast.
• Copying: copy live data out of the log to compact free space.
disadvantage: the cost of copying.
• Sprite LFS uses a combination of threading and copying.
• The disk is divided into segments, and the log is threaded on a segment-by-segment basis (see the sketch below).
• Long-lived data is collected into segments of its own so that the cleaner can skip it when copying.
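An illustrative sketch of segment-level threading; the segment size matches the paper, but the data structures are assumptions:

#include <stdint.h>

#define SEG_SIZE (512 * 1024)   /* the paper uses 512 KB or 1 MB segments */
#define NSEGS    1024           /* illustrative disk size */

static uint8_t seg_clean[NSEGS];   /* 1 = segment contains no live data */

/* Pick any clean segment to append to the log. Threading happens at
   segment granularity, so each segment is still written sequentially
   from start to end, preserving large writes. */
int next_log_segment(void)
{
    for (int i = 0; i < NSEGS; i++)
        if (seg_clean[i])
            return i;
    return -1;   /* no clean segment: the cleaner must run first */
}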
Segment cleaning
• “The process of copying live data out of a segment is called segment cleaning.”
• Segment cleaning involves three steps:
1. Read a number of segments into memory.
2. Identify the live data in them.
3. Write the live data back to a smaller number of clean segments.
• A segment summary block identifies every piece of information written in the segment.
• Segment summary blocks incur little overhead during writing.
• A version number makes liveness checks cheap (next slide).
Segment cleaning (contd..)
• Version number + inode number -> a unique identifier (uid); if a block's uid no longer matches the inode map's current uid, the block is dead (see the sketch below).
• No free-block list or bitmap in Sprite -> saves memory and disk space, and simplifies crash recovery.
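A sketch of the liveness check during cleaning, assuming hypothetical inode map helpers: a block is live only if the inode map still points at it, and a stale version number rules out a whole file's blocks without examining the file itself:

#include <stdbool.h>
#include <stdint.h>

struct summary_entry {
    uint32_t ino;       /* owning inode number, from the summary block */
    uint32_t version;   /* file version when the block was written */
    uint64_t addr;      /* the block's address in this segment */
};

extern uint32_t imap_current_version(uint32_t ino);
extern uint64_t imap_block_addr(uint32_t ino, uint64_t file_offset);

bool block_is_live(const struct summary_entry *e, uint64_t file_offset)
{
    if (imap_current_version(e->ino) != e->version)
        return false;   /* file was deleted or truncated: block is dead */
    return imap_block_addr(e->ino, file_offset) == e->addr;
}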
Segment cleaning policies
Four policy issues:
1. When should the segment cleaner execute?
2. How many segments should it clean at a time?
3. Which segments should be cleaned?
4. How should the live blocks be grouped when written back?
-- grouped so as to enhance locality,
-- e.g., sorted by age.
• Issues 1 and 2 do not have much effect on performance, but 3 and 4 are important.
• Write cost: the average amount of time the disk is busy per byte of new data written, including cleaning overhead. A write cost of 1.0 is perfect.
Segment cleaning policies (contd..)
• For a log-structured file system with large segments,

write cost = (total bytes read and written) / (new data written)
• The overall utilization of disk space has a strong effect on the performance of an LFS.
• LFS provides a cost-performance tradeoff: the less of the disk kept in use, the lower the write cost.
• The key to achieving high performance at low cost in LFS is to force the disk into a bimodal segment distribution: most segments nearly full, a few nearly empty, with the cleaner almost always working on the nearly empty ones.
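A worked form of the write-cost formula, assuming (as the paper does for large segments) that seek and rotation costs are negligible; cleaning a segment of utilization u reads one segment and writes back u of it, leaving 1 - u of the bandwidth for new data:

#include <stdio.h>

/* write cost = (1 read + u rewritten + (1-u) new) / (1-u) = 2 / (1-u).
   Edge case from the paper: an entirely empty segment (u = 0) need not
   be read at all, so its cost is 1.0 rather than 2.0. */
double write_cost(double u)   /* u = fraction of live data, 0 <= u < 1 */
{
    return 2.0 / (1.0 - u);
}

int main(void)
{
    /* At u = 0.8 the write cost is 10: only a fifth of the disk
       bandwidth reaches new data, so keeping cleaned segments at low
       utilization is essential. */
    for (double u = 0.0; u < 0.95; u += 0.1)
        printf("u = %.1f  write cost = %.1f\n", u, write_cost(u));
    return 0;
}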
Simulation results
• A simulator was built to analyze different cleaning policies.
• The simulator overwrites files using one of two pseudo-random access patterns: Uniform and Hot-and-cold.
• The greedy policy always selects the least-utilized segments to clean.
Simulation results (contd..)
• Reason: the greedy policy does not clean a segment until it becomes the least utilized of all segments, so cold segments linger just above the cleaning threshold.
• Free space in a cold segment is more valuable than free space in a hot segment, because once reclaimed it stays free longer.
Simulation with cost-benefit policy
• Free space in a cold segment is more valuable, so cleaning should weigh a segment's age as well as its utilization.
• The cleaner chooses the segments with the highest benefit-to-cost ratio:

benefit / cost = (free space generated * age of data) / cost

• This policy produced the desired bimodal distribution of segments.
• Cold segments were cleaned at a much higher segment utilization than hot ones.
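In the paper this ratio works out to (1 - u) * age / (1 + u), since cleaning reads the whole segment (cost 1) and writes back the live fraction u. A small sketch of segment selection under this policy:

double cost_benefit(double u, double age)
{
    return (1.0 - u) * age / (1.0 + u);
}

/* Pick the segment with the highest benefit-to-cost ratio; a greedy
   cleaner would instead simply pick the segment with the lowest u. */
int pick_segment(const double *u, const double *age, int nsegs)
{
    int best = 0;
    for (int i = 1; i < nsegs; i++)
        if (cost_benefit(u[i], age[i]) > cost_benefit(u[best], age[best]))
            best = i;
    return best;
}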
Segment usage table
• The table records the number of live bytes in each segment and the most recent modified time of any block in the segment.
• The live-byte count is decremented as blocks are deleted or overwritten; once it reaches zero, the segment can be reused without cleaning.
• The blocks of the segment usage table are themselves written to the log; the checkpoint region records their addresses.
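A hypothetical in-memory form of one usage table entry, recording exactly the two quantities the slide names:

#include <stdint.h>
#include <time.h>

struct seg_usage {
    uint32_t live_bytes;     /* bytes of live data left in the segment */
    time_t   newest_mtime;   /* most recent modified time of any block */
};

/* Called when a block of `block_bytes` bytes in segment `s` is deleted
   or overwritten elsewhere in the log. */
void block_died(struct seg_usage *s, uint32_t block_bytes)
{
    s->live_bytes -= block_bytes;
    /* live_bytes == 0: the segment can be reused without cleaning */
}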
Crash recovery
• A crash can leave the most recently written data inconsistent with the rest of the file system.
• While recovering, the system must review the operations performed just before the crash.
• A traditional Unix file system has no log, so it must scan all of the metadata structures on the disk.
• This makes the cost of such scans high in large storage systems.
• In an LFS, the last operations are at the end of the log, making them easy to find after a crash.
• Two approaches: checkpoints and roll-forward.
Checkpoints
• “A checkpoint is a position in the log at which all of the file system structures are consistent and complete.”
• Creating a checkpoint is a two-phase process (a sketch follows):
- First, write all modified information (file data blocks, indirect blocks, inodes, the inode map, and the segment usage table) to the log.
- Second, write a checkpoint region to a special fixed position on disk.
• Sprite LFS uses the checkpoint region to initialize its in-memory data structures during a reboot.
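A minimal sketch of the two phases, assuming hypothetical flush_log() and write_block() helpers; the field list is illustrative, not Sprite's exact format:

#include <stddef.h>
#include <stdint.h>
#include <time.h>

extern void flush_log(void);
extern void write_block(uint64_t addr, const void *buf, size_t len);

struct checkpoint {
    uint64_t imap_block_addrs[1024];       /* where the inode map lives */
    uint64_t seg_usage_block_addrs[64];    /* where the usage table lives */
    uint64_t log_tail;                     /* last segment written */
    time_t   timestamp;                    /* marks the checkpoint valid */
};

void make_checkpoint(struct checkpoint *cp, uint64_t fixed_region_addr)
{
    /* phase 1: force every modified block, the inode map, and the
       segment usage table out to the log */
    flush_log();
    /* phase 2: record their positions in the fixed checkpoint region */
    cp->timestamp = time(NULL);
    write_block(fixed_region_addr, cp, sizeof *cp);
}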
Checkpoints (Contd..)
• Two checkpoint regions are present in LFS, written alternately, so that recovery is possible even if a crash occurs during a checkpoint operation.
• Sprite LFS performs periodic checkpointing.
• A long checkpoint interval lowers checkpointing overhead but lengthens recovery; a short interval does the reverse.
• Another approach: perform a checkpoint after a given amount of new data has been written to the log.
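At reboot, the region with the newer timestamp wins; a crash that interrupts a checkpoint write leaves that region's timestamp stale, so the other region's older but consistent state is used. A sketch of the selection:

#include <time.h>

/* Return 0 to use checkpoint region A, 1 to use region B. */
int pick_checkpoint(time_t ts_region_a, time_t ts_region_b)
{
    return (ts_region_a >= ts_region_b) ? 0 : 1;
}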
Roll-forward
• Goal: recover as much data as possible from the portion of the log written after the last checkpoint (a sketch of the pass follows this list).
• Segment summary blocks help in recovering newly written data.
• If a new copy of an inode is found, LFS updates the inode map to point at it.
• If new data blocks exist without a new copy of their inode, the roll-forward code ignores them.
• The directory operation log is used to restore consistency between directories and inodes.
• If a directory operation log entry exists without the corresponding inode or directory block having been written, roll-forward updates the directory and/or inode to match.
• The directory operation log is also what makes rename atomic.
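A hedged sketch of the roll-forward pass, with hypothetical helper functions and entry kinds; the real on-disk formats differ:

#include <stdbool.h>
#include <stdint.h>

enum entry_kind { ENTRY_INODE, ENTRY_DATA_BLOCK, ENTRY_DIROP };

struct summary_entry { enum entry_kind kind; uint32_t ino; uint64_t addr; };

extern bool next_summary_entry(struct summary_entry *e); /* walks the log
                                    segments written after the checkpoint */
extern void imap_update(uint32_t ino, uint64_t addr);
extern void redo_directory_op(const struct summary_entry *e);

void roll_forward(void)
{
    struct summary_entry e;
    while (next_summary_entry(&e)) {
        switch (e.kind) {
        case ENTRY_INODE:        /* new inode: adopt its new address */
            imap_update(e.ino, e.addr);
            break;
        case ENTRY_DATA_BLOCK:   /* data without its inode: ignore, the
                                    new file version is incomplete */
            break;
        case ENTRY_DIROP:        /* replay the logged directory change so
                                    directories and inodes agree */
            redo_directory_op(&e);
            break;
        }
    }
}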
Authors' experiences with LFS
• Implementation of Sprite LFS was started in 1989 and completed by mid-1990.
• The implementation of Sprite LFS is no more complicated than that of Unix FFS.
• To users, Sprite LFS does not feel much different from Unix FFS.
Micro-benchmarks
• A collection of small benchmark programs was used to measure the best-case performance of Sprite LFS and compare it to SunOS 4.0.3, which uses Unix FFS.
• The machine used was a Sun-4/260 with 32 MB of memory and a disk with 300 MB of usable storage.
• For Sprite LFS no cleaning occurred, so the results represent best-case performance.
Micro-benchmarks with large files
• One anomaly in the graph: reading a file sequentially after it has been written randomly is slow.
• Reason: the blocks lie in the log in write order, so a sequential read requires many seeks.
Cleaning overheads
• Results were collected over a period of several
months.
• Five systems were measured: /user6, /pcs,
/src/kernel, /swap2, /tmp.
Cleaning overheads (Contd..)
• Two reasons why cleaning costs are lower than in the simulations:
1. In the simulations every file is a single block long, whereas the measured systems contain some substantially larger files.
2. In the simulations, accesses were divided evenly between hot and cold groups; in reality, a large number of files are almost never written, producing very cold segments.
Crash recovery
• The crash recovery code was not installed on the production system.
• The time to recover depends on the checkpoint interval and on the rate and type of operations being performed.
• Crash configurations were generated deliberately to measure recovery times.
• Recovery time varied with the sizes of files and the number of files written between the last checkpoint and the crash.
Questions

Thank you
