Systems Programming
Chapter 3: File Input/Output

2 Introduction
 The file is the most basic and fundamental abstraction in Linux. Linux follows
the everything-is-a-file philosophy.
 Consequently, much interaction occurs via reading of and writing to files,
even when the object in question is not what you would consider a normal
file.
 What most of us call “files” are what Linux labels regular files. A regular file
contains bytes of data, organized into a linear array called a byte stream.
Harvey Sama - Department of Computer Engineering 11/8/2016

3 The Universal I/O Model: Streams and C
 All C input/output is done with streams, no matter where input is coming
from or where output is going to.
 A C program keeps data in random access memory (RAM) while executing.
 Data can come from some location external to the program. Data moved
from an external location into RAM, where the program can access it, is
called input. The keyboard and disk files are the most common sources of
program input.
 Data can also be sent to a location external to the program; this is called
output. The most common destinations for output are the screen, a printer,
and disk files.

4 What is a stream?
 A stream is a sequence of characters. More exactly, it is a sequence of
bytes of data.
 A sequence of bytes flowing into a program is an input stream; a sequence
of bytes flowing out of a program is an output stream.
 The major advantage of streams is that input/output programming is
device independent.
 C streams fall into two modes: text and binary.
 A text stream consists only of characters, such as text data being sent to
the screen.
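 Text streams are organized into lines, each terminated by an end-of-line, or
newline, character. The C standard only requires implementations to support
lines of at least 254 characters, including the newline, so portable code
should not assume that longer lines will work.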

5 Predefined Streams
 A binary stream can handle any sort of data, including, but not limited to,
text data. Bytes of data in a binary stream aren’t translated or interpreted
in any special way; they are read and written exactly as-is.
 The ANSI standard for C has three predefined streams, also referred to as
the standard input/output files.
 These streams are automatically opened when a C program starts
executing and are closed when the program terminates. These three
streams are:
Name     Stream            Device
stdin    Standard input    Keyboard
stdout   Standard output   Screen
stderr   Standard error    Screen
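
As a minimal sketch of device independence, the function below copies text from one stream to another and reports on a third. The name copy_lines and the 256-byte chunk size are illustrative assumptions, not part of the slides; the same code works whether the streams are the predefined ones or files on disk.

```c
#include <stdio.h>
#include <string.h>

/* Copies text from `in` to `out` and prints a summary on `err`.
 * Returns the number of lines seen. A line longer than the chunk
 * buffer is copied in several pieces but counted only once. */
long copy_lines(FILE *in, FILE *out, FILE *err)
{
    char chunk[256];
    long lines = 0;
    int at_line_start = 1;

    while (fgets(chunk, sizeof chunk, in) != NULL) {
        size_t n = strlen(chunk);
        if (at_line_start)
            lines++;
        /* The next chunk starts a new line only if this one ended in '\n'. */
        at_line_start = (n > 0 && chunk[n - 1] == '\n');
        fputs(chunk, out);
    }
    fprintf(err, "copied %ld line(s)\n", lines);
    return lines;
}
```

Calling copy_lines(stdin, stdout, stderr) echoes the keyboard to the screen; passing streams returned by fopen() instead works without changing the function.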
6 File Descriptors
 The kernel maintains, for every process, a list of its open files called the
file table.
 This table is indexed by non-negative integers called file descriptors. Each
entry describes one open file, and each descriptor is unique within its
process.
 Opening a file returns a file descriptor, and reading, writing, or any other
file operation takes a file descriptor as a parameter.
 File descriptors are represented by the C int type.
 Unless the process explicitly closes them, every process by convention has
at least three file descriptors open: 0, 1, and 2.
 File descriptor 0 is standard in (stdin), file descriptor 1 is standard out
(stdout), and file descriptor 2 is standard error (stderr).
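
A small sketch of the convention above: the helper names say and std_fds_are_conventional are hypothetical, while fileno(), write(), and the STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO constants come from POSIX.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Writes msg to the given descriptor with a single write() call.
 * Returns the number of bytes written, or -1 on error. */
ssize_t say(int fd, const char *msg)
{
    return write(fd, msg, strlen(msg));
}

/* Returns 1 if the three stdio streams sit on descriptors 0, 1 and 2. */
int std_fds_are_conventional(void)
{
    return fileno(stdin) == STDIN_FILENO       /* 0 */
        && fileno(stdout) == STDOUT_FILENO     /* 1 */
        && fileno(stderr) == STDERR_FILENO;    /* 2 */
}
```

For instance, say(STDERR_FILENO, "oops\n") sends a message to standard error without going through a C stream.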

7 Opening Files
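 A file is opened and a file descriptor is obtained with the open() system call,
which takes the pathname of the file, a flags argument, and, when the file may
be created, a mode argument giving its permissions.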
 The flags argument is the bitwise-OR of one or more flags. It must contain
an access mode, which is one of O_RDONLY, O_WRONLY, or O_RDWR
which respectively request that the file be opened only for reading, only for
writing, or for both reading and writing.
 Refer to page 27 of Linux System Programming for all the values that can
be bitwise-ORed with the access mode.
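
A minimal sketch of two common flag combinations. The wrapper names and the 0644 permission value are assumptions chosen for illustration; O_CREAT and O_TRUNC are two of the flags that can be ORed with the access mode.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Opens `path` for writing only, creating it if it does not exist and
 * truncating it to zero length if it does. Returns the new file
 * descriptor, or -1 on error. */
int open_for_writing(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        perror("open");
    return fd;
}

/* Opens an existing file for reading only. No mode argument is needed
 * because O_CREAT is not given. Returns the descriptor, or -1 on error. */
int open_for_reading(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        perror("open");
    return fd;
}
```

The mode argument only matters when O_CREAT is present; without it, open() fails if the file does not exist.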

8 Example: Getting File Information

9 Reading From Files
 The most basic and common mechanism used for reading is the read() system
call, defined in POSIX.1. It takes a file descriptor fd, a buffer buf, and a
byte count len.
 Each call reads up to len bytes into the memory pointed at by buf from the
current file offset of the file referenced by fd. On success, the number of
bytes written into buf is returned.
 On error, the call returns −1 and sets errno.

 A call to read() can result in many possibilities:

1. The call returns a value equal to len. All len read bytes are stored in buf. The
results are as intended.
2. The call returns a value less than len, but greater than zero (a condition known
as a partial read). The read bytes are stored in buf. This can occur because a
signal interrupted the read midway; an error occurred in the middle of the read;
more than zero, but less than len bytes’ worth of data was available; or EOF was
reached before len bytes were read. Reissuing the read (with correspondingly
updated buf and len values) will read the remaining bytes into the rest of the
buffer or indicate the cause of the problem.
3. The call returns 0. This indicates EOF. There is nothing to read.
4. The call blocks because no data is currently available. This won’t happen in
nonblocking mode.
5. The call returns −1, and errno is set to EINTR. This indicates that a signal was
received before any bytes were read. The call can be reissued.
6. The call returns −1, and errno is set to EAGAIN. This indicates that the read would
block because no data is currently available, and that the request should be
reissued later. This happens only in nonblocking mode.
7. The call returns −1, and errno is set to a value other than EINTR or EAGAIN. This
indicates a more serious error. Simply reissuing the read is unlikely to succeed.
 Don’t try to memorize these points! They are there just for your
understanding and for future reference. You need to know how to handle
the errors though.

12 Reading From Files and Writing to Files - Handling read errors
 A read loop that reissues the call after a partial read or EINTR, stops at
EOF, and gives up on any other error handles conditions 1, 2, 3, 5 and 7 above.
 We are not going to cover nonblocking reads.
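
The snippet from the original slide is not reproduced in this text. A loop of the kind it describes might look like the following sketch; the function name read_all is an assumption, and nonblocking mode (EAGAIN) is deliberately not handled.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Tries to read exactly `len` bytes from `fd` into `buf`.
 * Returns the number of bytes actually read (less than `len` only if
 * EOF was reached first), or -1 on a serious error. */
ssize_t read_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t left = len;

    while (left > 0) {
        ssize_t ret = read(fd, p, left);
        if (ret == 0)               /* condition 3: EOF */
            break;
        if (ret == -1) {
            if (errno == EINTR)     /* condition 5: interrupted, reissue */
                continue;
            return -1;              /* condition 7: serious error */
        }
        p += ret;                   /* conditions 1 and 2: advance past */
        left -= (size_t)ret;        /* the bytes we got and go around   */
    }
    return (ssize_t)(len - left);
}
```

Because the buffer pointer and the remaining count are updated after every call, a partial read simply leads to another iteration that fills the rest of the buffer.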

13 Writing to Files
 The most basic and common system call used for writing is write(), also
defined in POSIX.1. It takes a file descriptor fd, a buffer buf, and a byte
count named count.
 A call to write() writes up to count bytes starting at buf to the current
position of the file referenced by the file descriptor fd.
 Files backed by objects that do not support seeking (for example,
character devices) always write starting at the “head.”
 For regular files, write() performs the entire requested write unless an
error occurs, so a loop is not necessary. Other file types, such as pipes and
sockets, may accept only part of the request.
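
For descriptors that are not regular files, a loop is the safe choice. The sketch below is one way to write it; the name write_all is an assumption, and for a regular file the loop body normally runs just once.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes all `count` bytes from `buf` to `fd`, reissuing write() after
 * a partial write or an interrupting signal.
 * Returns `count` on success, or -1 on error. */
ssize_t write_all(int fd, const void *buf, size_t count)
{
    const char *p = buf;
    size_t left = count;

    while (left > 0) {
        ssize_t ret = write(fd, p, left);
        if (ret == -1) {
            if (errno == EINTR)     /* interrupted before writing: retry */
                continue;
            return -1;
        }
        p += ret;                   /* skip what was accepted */
        left -= (size_t)ret;
    }
    return (ssize_t)count;
}
```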

14 Closing Files
 After a program has finished working with a file descriptor, it can unmap
the file descriptor from the associated file via the close() system call:
 A call to close() unmaps the open file descriptor fd and disassociates the
file from the process. The given file descriptor is then no longer valid, and
the kernel is free to reuse it as the return value to a subsequent open() or
creat() call.
 A call to close() returns 0 on success. On error, it returns −1 and sets errno
appropriately. Usage is simple: call close(fd) and check the result.

15 Closing Files
• It’s a good idea to always check the return value of close(), as errors
that occurred earlier may not manifest until later, and close() can
report them.

Example: Bank Account Information
 Creating a file
 Writing data to the file
 Reading the data we saved and displaying it on the screen
16 Example: Bank Account Information
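
The original example is not reproduced in this text. The sketch below follows the three steps listed for it (create a file, write data, read it back and display it) under stated assumptions: the struct account layout, the function names, and the 0600 permissions are all hypothetical. Writing a raw struct is only suitable when the same program on the same machine reads it back.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical record layout, chosen only for this example. */
struct account {
    char owner[32];
    long balance_cents;
};

/* Creates (or truncates) `path` and writes one record. 0 or -1. */
int save_account(const char *path, const struct account *acct)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd == -1) {
        perror("open");
        return -1;
    }
    if (write(fd, acct, sizeof *acct) != (ssize_t)sizeof *acct) {
        close(fd);
        return -1;
    }
    if (close(fd) == -1) {      /* close() can report deferred errors */
        perror("close");
        return -1;
    }
    return 0;
}

/* Reads one record back from `path`. 0 or -1. */
int load_account(const char *path, struct account *acct)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        perror("open");
        return -1;
    }
    ssize_t nr = read(fd, acct, sizeof *acct);
    if (close(fd) == -1 || nr != (ssize_t)sizeof *acct)
        return -1;
    acct->owner[sizeof acct->owner - 1] = '\0';   /* force termination */
    return 0;
}

/* Displays the record; assumes a non-negative balance. */
void print_account(const struct account *acct)
{
    printf("%s: %ld.%02ld\n", acct->owner,
           acct->balance_cents / 100, acct->balance_cents % 100);
}
```

A caller would fill in a struct account, pass it to save_account(), then call load_account() on the same path and hand the result to print_account().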

17 Other System Calls: Seeking Position
 I/O is usually done linearly, from the beginning of the file to the end.
However, some applications want to jump around in the file, providing
random access.
 You can change the read/write position of an open file descriptor by
using the lseek() system call, which takes the file descriptor, an offset pos,
and an origin.
 Origin is the position from which to start seeking.

 It could take any of the values:

 SEEK_CUR – start seeking from current position and move pos distance into the
file (or beyond)
 SEEK_END – start seeking from the end of the file and move pos distance into the
file (or beyond)
 SEEK_SET – go to position pos into the file without asking questions
 pos can be a negative number (seek to the left), a positive number (seek
to the right), or zero (not moving, we’re comfortable where we are).
 Seeking past the end of the file is legal, though it does nothing on its own.
 Issuing a read request in such a case returns EOF. However, issuing a write
request after such a seek creates a gap between the old length of the file
and the current position, and that gap reads back as zeros.
 This zero-filled zone is called a hole. On filesystems that support sparse
files, a hole occupies no physical disk space (they’re zeros after all).
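
As a sketch of seeking and of how a hole comes about, the two hypothetical helpers below write one byte, seek past the end, write another, and then read single bytes back at chosen offsets. The function names and file contents are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates `path` holding 'A' at offset 0 and 'B' at offset `gap`,
 * with a hole in between. Returns the resulting file length, or -1. */
off_t make_file_with_hole(const char *path, off_t gap)
{
    off_t size = -1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    if (write(fd, "A", 1) == 1                  /* position is now 1    */
        && lseek(fd, gap, SEEK_SET) == gap      /* jump past the end    */
        && write(fd, "B", 1) == 1)              /* this creates the gap */
        size = lseek(fd, 0, SEEK_END);          /* end offset = length  */

    if (close(fd) == -1)
        size = -1;
    return size;
}

/* Returns the byte stored at offset `pos`, or -1 at EOF or on error. */
int byte_at(const char *path, off_t pos)
{
    unsigned char c;
    int result = -1;
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    if (lseek(fd, pos, SEEK_SET) == pos && read(fd, &c, 1) == 1)
        result = c;
    close(fd);
    return result;
}
```

With a gap of 4096, the file length is 4097, any offset inside the gap reads as zero, and reading at an offset beyond the end reports EOF.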
 Using lseek() with threads (to be seen later) that share the same file
descriptor could be dangerous: you could mess up the file. This is because
threads share file tables (in which the file position is stored).
 Imagine the following scenario: thread A makes a call to lseek() to position p,
then before it has the time to read/write, thread B changes the file position
to q. Now thread A starts reading from the file “thinking” it is at p while in
fact it is reading from q, a completely different data set (imagine that
number was your bank account balance divided by 10…oops).
 Linux provides alternatives to this call that have no race condition.

20 Other System Calls: Positional Reads and Writes
• pread() reads up to count bytes from the file offset pos into buf,
without changing the current file position.
• Likewise, pwrite() writes at a given offset without changing the
current file position.
• Intermixed with ordinary read() and write() calls on the same
descriptor, these could still mess up your file, because read() and
write() continue from a position the positional calls never updated.
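
A minimal sketch of both calls. The helper names patch_at and peek_at are hypothetical; pread() and pwrite() are the real POSIX calls, and patch_at() returns the file position after the write so a caller can confirm that it did not move.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Overwrites `len` bytes at offset `pos` using pwrite().
 * Returns the current file position afterwards (which pwrite() leaves
 * untouched), or -1 on error. */
off_t patch_at(int fd, const void *data, size_t len, off_t pos)
{
    ssize_t nw = pwrite(fd, data, len, pos);
    if (nw == -1) {
        perror("pwrite");
        return (off_t)-1;
    }
    if ((size_t)nw != len)      /* short write: treated as failure here */
        return (off_t)-1;
    return lseek(fd, 0, SEEK_CUR);
}

/* Returns the byte at offset `pos` using pread(), or -1 at EOF or on
 * error. The file position is not changed. */
int peek_at(int fd, off_t pos)
{
    unsigned char c;
    if (pread(fd, &c, 1, pos) != 1)
        return -1;
    return c;
}
```

Since the offset travels with each call instead of living in the shared file table, two threads using these helpers on one descriptor cannot disturb each other's position.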

21 Other System Calls: Truncating Files
 Linux provides two system calls for truncating files, truncate() and
ftruncate().
 A file can be truncated to make it smaller or larger, though the most
common use is to make it smaller.
 ftruncate() takes a file descriptor, which must be open for writing, and
truncate() takes the name of a file, which must be writable. Both also take
the new length. Truncating to a larger size pads the additional space with
zeros.
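
A short sketch of ftruncate() in both directions. The wrapper name resize_file is an assumption; it uses fstat() to report the size the file ends up with.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Sets the length of the file open on `fd` (which must be open for
 * writing) to `len` bytes. Returns the resulting size, or -1 on error. */
off_t resize_file(int fd, off_t len)
{
    struct stat st;

    if (ftruncate(fd, len) == -1) {
        perror("ftruncate");
        return (off_t)-1;
    }
    if (fstat(fd, &st) == -1) {
        perror("fstat");
        return (off_t)-1;
    }
    return st.st_size;
}
```

Shrinking a 10-byte file to 4 bytes discards the tail; growing it again to 8 bytes does not bring the old data back, and the 4 new bytes read as zeros.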

22 Kernel Internals: Filesystems and Namespace
 A filesystem is just a collection of files and directories arranged in a
formal hierarchy.
 Filesystems usually exist physically on block storage devices such as hard
drives and CDs. Some of these are partitionable, which means that they can
be divided into multiple filesystems (think Windows and Linux on the same
drive in different partitions).
 If a filesystem is an arrangement of files and directories, a namespace is an
arrangement of filesystems.
 Filesystems can be added to or removed from a namespace, processes
called mounting and unmounting respectively.
 Unlike some other systems, Linux provides a single, unified namespace of
files and directories.

23 Kernel Internals: Filesystems and Namespace
 For example, on Windows, floppy disks (the good old square disks) get a
namespace at A:\ while the primary hard drive partition has a
namespace of its own: C:\
 In Linux, the first filesystem is mounted at the root of the namespace, /, and
is called the root filesystem.
 Removable block devices are commonly mounted under /media by default,
but you can manually mount a filesystem at any other point.
 Linux supports a wide range of filesystems, including media-specific
filesystems, network filesystems (NFS), filesystems from other Unix systems
(XFS), and even filesystems from non-Unix systems (FAT).

24 Kernel Internals: The Virtual Filesystem
 The virtual filesystem (VFS), occasionally also called a virtual file switch, is a
mechanism of abstraction that allows the Linux kernel to call filesystem
functions and manipulate filesystem data without knowing, or even caring
about, the specific type of filesystem being used.
 However, the kernel must be built with support for each filesystem it will
use.
 The VFS provides a standard interface for the kernel to communicate with
any type of filesystem.
 The following steps summarize the path that a user-space read() system
call takes to get data from a filesystem:

25 Kernel Internals: The Virtual Filesystem - System call journey in the kernel
1. A user-space application issues a read() call.
2. The call is trapped in the kernel.
3. The kernel passes it to the system call handler.
4. The system call handler passes it to the read() system call.
5. The kernel figures out what object backs the file descriptor.
6. The VFS of the kernel invokes the appropriate read() function of the given
filesystem.
7. The function does its thing and returns the data to the read() system call.
8. The read() system call returns the data to the system call handler.
9. The system call handler copies the data to the user space where the call
was made, and the read() call returns.

26 Kernel Internals: The Virtual Filesystem
 To system programmers, the ramifications of the VFS are important.
Programmers need not worry about the type of filesystem or media on
which a file resides.
 Generic system calls – read(), write(), and so on – can manipulate files on
any supported filesystem and on any supported media.
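
The idea behind this abstraction can be illustrated with a user-space toy. This is not kernel code: every name in it (toy_file, toy_file_ops, plainfs, upperfs, toy_read) is invented for the example, and the two "filesystems" are plain strings in memory. The point is only that the caller uses one generic function while a table of function pointers selects the implementation.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct toy_file;

/* Per-filesystem operations table (toy stand-in for the real thing). */
struct toy_file_ops {
    long (*read)(struct toy_file *f, char *buf, size_t len);
};

struct toy_file {
    const struct toy_file_ops *ops;  /* which "filesystem" backs it */
    const char *data;                /* backing bytes               */
    size_t pos;                      /* current file position       */
};

/* "plainfs": returns the stored bytes unchanged. */
static long plainfs_read(struct toy_file *f, char *buf, size_t len)
{
    size_t size = strlen(f->data);
    size_t avail = f->pos < size ? size - f->pos : 0;
    size_t n = len < avail ? len : avail;

    memcpy(buf, f->data + f->pos, n);
    f->pos += n;
    return (long)n;
}

/* "upperfs": same storage, but presents the bytes in upper case. */
static long upperfs_read(struct toy_file *f, char *buf, size_t len)
{
    long n = plainfs_read(f, buf, len);
    for (long i = 0; i < n; i++)
        buf[i] = (char)toupper((unsigned char)buf[i]);
    return n;
}

const struct toy_file_ops plainfs_ops = { plainfs_read };
const struct toy_file_ops upperfs_ops = { upperfs_read };

/* The generic entry point: callers never name a filesystem. */
long toy_read(struct toy_file *f, char *buf, size_t len)
{
    return f->ops->read(f, buf, len);
}
```

Adding a third "filesystem" means writing one more read function and one more operations table; toy_read() and its callers stay unchanged.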

27 Kernel Internals: The Page Cache
 The page cache is an in-memory store of recently accessed data from an
on-disk filesystem.
 Disk access is painfully slow, particularly relative to today’s processor
speeds. Storing requested data in memory allows the kernel to fulfill
subsequent requests for the same data from memory, avoiding repeated
disk access.
 The page cache exploits two forms of locality of reference: temporal
locality and sequential locality.
 Temporal locality says that a resource accessed at one point has a high
probability of being accessed again in the near future.

 The page cache is the first place that the kernel looks for filesystem data.
 The kernel invokes the memory subsystem to read data from the disk only
when it isn’t found in the cache.
 Thus, the first time any item of data is read, it is transferred from the disk into
the page cache, and is returned to the application from the cache. If that
data is then read again, it is simply returned from the cache.
 The Linux page cache is dynamic in size. As I/O operations bring more and
more data into memory, the page cache grows larger and larger,
consuming any free memory.

 If the page cache eventually does consume all free memory and an
allocation is committed that requests additional memory, the page cache
is pruned, releasing its least-used pages to make room for “real” memory
usage.
 Pruning occurs seamlessly and automatically. A dynamically sized cache
allows Linux to use all of the memory in the system and cache as much
data as possible.
 Often, however, it would make more sense to swap to disk a seldom-used
page of process memory than it would to prune an oft-used piece of the
page cache that could well be reread into memory on the next read
request (swapping allows the kernel to store data on the disk to allow a
larger memory footprint than the machine has RAM).

 The Linux kernel implements heuristics to balance the swapping of data
versus the pruning of the page cache (and other in-memory reserves).
 These heuristics might decide to swap data out to disk in lieu of pruning the
page cache, particularly if the data being swapped out is not in use.

 Sequential locality says that data is often referenced sequentially. To take
advantage of this principle, the kernel also implements page cache
readahead.
 Readahead is the act of reading extra data off the disk and into the page
cache following each read request—in effect, reading a little bit ahead.
 When the kernel reads a chunk of data from the disk, it also reads the
following chunk or two. Reading large sequential chunks of data at once is
efficient, as the disk usually need not seek.
 In addition, the kernel can fulfill the readahead request while the process is
manipulating the first chunk of read data.

 If, as often happens, the process goes on to submit a new read request for
the subsequent chunk, the kernel can hand over the data from the initial
readahead without having to issue a disk I/O request.
 System programmers generally cannot optimize their code to better take
advantage of the fact that a page cache exists – other than, perhaps, not
implementing such a cache in user space themselves.
 Utilizing readahead, on the other hand, is possible.
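
One way a program can cooperate with readahead is to read sequentially and say so. The sketch below uses posix_fadvise() with POSIX_FADV_SEQUENTIAL, which is only a hint that the kernel may act on or ignore; the slides do not name this call, so treat its use here as an assumption about what "utilizing readahead" could look like. The function name and the 4096-byte buffer are also illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Reads `path` from start to end after hinting that access will be
 * sequential. Returns the number of bytes read, or -1 on error. */
long long read_sequentially(const char *path)
{
    char buf[4096];
    long long total = 0;
    ssize_t nr;
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    /* Offset 0 with length 0 covers the whole file. The call returns
     * an error number directly instead of setting errno. */
    int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    if (rc != 0)
        fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));
    /* The advice is optional, so a failure here is not fatal. */

    while ((nr = read(fd, buf, sizeof buf)) != 0) {
        if (nr == -1) {
            if (errno == EINTR)
                continue;
            total = -1;
            break;
        }
        total += nr;
    }
    close(fd);
    return total;
}
```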

33 Kernel Internals: Page Writeback
 As discussed earlier, the kernel defers writes via buffers.

 When a process issues a write request, the data is copied into a buffer, and
the buffer is marked dirty, denoting that the in-memory copy is newer than
the on-disk copy. The write request then simply returns.
 If another write request is made to the same chunk of a file, the buffer is
updated with the new data. Write requests elsewhere in the same file
generate new buffers.

 Eventually the dirty buffers need to be committed to disk, synchronizing the
on-disk files with the data in memory. This is known as writeback. It occurs in
two situations:
 When free memory shrinks below a configurable threshold, dirty buffers are
written back to disk so that the now-clean buffers may be removed, freeing
memory.
 When a dirty buffer ages beyond a configurable threshold, the buffer is written
back to disk. This prevents data from remaining dirty indefinitely.
 Writebacks are carried out by a gang of kernel threads named flusher
threads. When one of the previous two conditions is met, the flusher threads
wake up and begin committing dirty buffers to disk until neither condition is
true.

 There may be multiple flusher threads instantiating writebacks at the same
time. This is done to capitalize on the benefits of parallelism and to
implement congestion avoidance.
 Congestion avoidance attempts to keep writes from getting backed up
while waiting to be written to any one block device. If dirty buffers from
different block devices exist, the various flusher threads will work to fully use
each block device.
 Deferred writes and the buffer subsystem in Linux enable fast writes at the
expense of the risk of data loss on power failure. To avoid this risk, paranoid
and critical applications can use synchronized I/O.

36 References
 Linux System Programming - Robert Love
 Sams Teach Yourself C in 24 Hours - Bradley L. Jones and Peter Aitken
 Linux Command Line and Shell Scripting Bible - Richard Blum
