Distributed File System

Introduction
Definition:
— Implement a common file system that can be shared by all
autonomous computers in a distributed system
DFS: multiple users, multiple sites, and (possibly)
distributed storage of files.
— Benefits
— File sharing
— Uniform view of system from different clients
— Centralized administration
Goals
Network (Access) Transparency
— Users should be able to access files over a network as
easily as if the files were stored locally.
— Users should not have to know the physical location of a
file to access it.
— Transparency can be addressed through naming and file
mounting mechanisms.
Components of Access Transparency
— Location Transparency: file name doesn’t specify physical
location.
— Location Independence: files can be moved to a new physical
location with no need to change references to them. (A name is
independent of its addresses.)
— Location independence → location transparency, but the
reverse is not necessarily true.
— Most DFSs today:
— Support location transparent systems.
— Do NOT support migration (automatic movement of a
file from machine to machine).
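The difference between the two properties can be illustrated with a minimal sketch, assuming a hypothetical two-level scheme in which each name maps to a location-free file identifier and a separate table maps identifiers to hosts. The names `name_table`, `location_table`, `locate`, and `migrate` are illustrative only, not part of any particular DFS.

```python
# Hypothetical two-level naming: name -> location-free file ID -> host.
# Names never mention a host (location transparency), and the ID-to-host
# table can change without touching any name (location independence).

name_table = {"/home/ann/notes.txt": 42}  # name -> file ID
location_table = {42: "server-a"}         # file ID -> host currently storing it


def locate(name: str) -> str:
    """Resolve a name to the host that currently stores the file."""
    return location_table[name_table[name]]


def migrate(file_id: int, new_host: str) -> None:
    """Move a file: only the location table changes, names stay valid."""
    location_table[file_id] = new_host


print(locate("/home/ann/notes.txt"))  # server-a
migrate(42, "server-b")
print(locate("/home/ann/notes.txt"))  # server-b
```

A scheme that embeds the host in the name (for example `server-a:/home/ann/notes.txt`) would be neither location transparent nor location independent, because the name itself would have to change after the move.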
Naming and Transparency
— The ANDREW DFS AS AN EXAMPLE:
— Is location independent.
— Supports file mobility.
— Separation of FS and OS allows for disk-less systems. These
have lower cost and convenient system upgrades. The
performance is not as good.
— NAMING SCHEMES:
1. Files are named with a combination of host and local name.
— This guarantees a unique name. NOT location transparent
NOR location independent.
— Same naming works on local and remote files. The DFS is a
loose collection of independent file systems.
2. Remote directories are mounted to local directories.
— So a local system seems to have a coherent directory
structure.
— The remote directories must be explicitly mounted. The
files are then location transparent, but not location independent.
— SUN NFS is a good example of this technique.
3. A single global name structure spans all the files in the
system.
— The DFS is built the same way as a local filesystem.
— Location independent.
Goals
Availability:
Files should be easily and quickly accessible.
— The number of users, system failures, or other
consequences of distribution shouldn’t compromise the
availability.
— Addressed mainly through replication.
Introduction
Architectural options:
— Fully distributed: files distributed to all sites
— based on peer-to-peer technology
— Issues: performance, implementation complexity
— Client-server Model:
— Fileserver: dedicated sites storing files perform storage
and retrieval operations
— Client: rest of the sites use servers to access files
— e.g. Sun Microsystems Network File System (NFS)
Client-Server Architecture
— One or more machines (file servers) manage the file system.
— Files are stored on disks at the servers.
— Requests for file operations are made from clients to the
servers.
— Client-server systems centralize storage and management;
P2P systems decentralize it.
Figure: Architecture of a distributed file system: client-server model (clients with local caches connected through a communication network to servers with caches and disks)
Figure: Distributed File Systems: Client-Server Architecture
Figure: Typical Data Access in a Client/File Server Architecture
Distributed File Systems Services
Services provided by the distributed file system:
(1) Name Server: Provides the mapping (name resolution) of names
supplied by clients to objects (files and directories)
— Takes place when a process first attempts to access a file or
directory.
(2) Cache manager: Improves performance through file caching
— Caching at the client: when a client references a file at the server:
• A copy of the data is brought from the server to the client machine
• Subsequent accesses are done locally at the client
— Caching at the server:
• The file is saved in memory to reduce subsequent access time
* Issue: different cached copies can become inconsistent. Cache managers
(at server and clients) have to provide coordination.
Mechanisms used in distributed file systems
(1) Mounting
• The mount mechanism binds together several filename
spaces (collection of files and directories) into a single
hierarchically structured name space (Example: UNIX and
its derivatives)
• A name space ‘A’ can be mounted (bound) at an internal
node (mount point) of a name space ‘B’
• Implementation: kernel maintains the mount table, mapping
mount points to storage devices
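Resolving a path through the mount table can be sketched in a few lines. This minimal illustration assumes a hypothetical table mapping mount points to (server, exported directory) pairs and picks the longest matching mount point; a real kernel crosses mount points while walking the path component by component rather than comparing strings.

```python
# Hypothetical mount table: mount point -> (server, exported directory).
MOUNT_TABLE: dict[str, tuple[str, str]] = {
    "/": ("local", "/"),
    "/usr/share": ("server1", "/export/share"),
    "/home": ("server2", "/export/home"),
}


def resolve_path(path: str) -> tuple[str, str]:
    """Map a path in the single name space to (server, path on that server)."""
    best = "/"
    for mount_point in MOUNT_TABLE:
        # Match whole components only: "/home" covers "/home/x" but not "/homer".
        covers = path == mount_point or path.startswith(mount_point.rstrip("/") + "/")
        if covers and len(mount_point) > len(best):
            best = mount_point  # keep the longest (innermost) mount point
    server, exported = MOUNT_TABLE[best]
    rest = path[len(best):].lstrip("/")
    if not rest:
        return server, exported
    return server, exported.rstrip("/") + "/" + rest


print(resolve_path("/home/ann/a.txt"))  # ('server2', '/export/home/ann/a.txt')
print(resolve_path("/etc/passwd"))      # ('local', '/etc/passwd')
```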
(1) Mounting (cont.)
• Location of mount information
a. Mount information maintained at clients
— Each client mounts every file system
— Different clients may not see the same filename space
— If files move to another server, every client needs to update its
mount table
— Example: SUN NFS
b. Mount information maintained at servers
— Every client sees the same filename space
— If files move to another server, only the mount information at the
server needs to change
— Example: Sprite File System
(2) Caching
— Improves file system performance by exploiting the locality
of reference.
— When client references a remote file, the file is cached in the
main memory of the server (server cache) and at the client
(client cache).
— When multiple clients modify shared (cached) data, cache
consistency becomes a problem.
— It is very difficult to implement a solution that guarantees
consistency.
Simple Distributed File System
Figure: clients send Read (RPC) requests to the server, which returns the data from its server cache
— Remote Disk: Reads and writes forwarded to server
— Use RPC to translate file system calls
— No local caching; caching can be done at the server side
— Advantage: Server provides completely consistent view of file system
to multiple clients
— Problems? Performance!
— Going over network is slower than going to local memory
— Lots of network traffic; requests are not well pipelined
— Server can be a bottleneck
Use of caching to reduce network load
Figure: clients cache file f1. Repeated read(f1) → V1 calls are served from a client cache holding F1:V1; a write(f1) → OK from another client updates the server's copy to F1:V2, and that client's next read(f1) returns V2.
— Idea: Use caching to reduce network load

— In practice: use buffer cache at source and destination
— Advantage: if open/read/write/close can be done locally, no network
traffic is needed, which makes them fast
— Problems:
— Failure:
— Client caches may hold data not yet committed at the server
— Cache consistency!
— Client caches not consistent with server/each other
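The read/write sequence on this slide can be replayed with a toy model. This sketch assumes hypothetical `Server` and `CachingClient` classes, in-process method calls standing in for RPC, and write-through from the writing client; it shows both the benefit (repeated reads need no server request) and the consistency problem (another client's cache keeps V1).

```python
# Toy model: a server holding file contents and clients with private caches.
# Reads are served from the local cache when possible (no network round trip),
# which is exactly what lets a client keep returning a stale value.


class Server:
    def __init__(self) -> None:
        self.files: dict[str, str] = {}

    def read(self, name: str) -> str:
        return self.files[name]

    def write(self, name: str, data: str) -> None:
        self.files[name] = data


class CachingClient:
    def __init__(self, server: Server) -> None:
        self.server = server
        self.cache: dict[str, str] = {}
        self.rpcs = 0  # requests that actually went to the server

    def read(self, name: str) -> str:
        if name not in self.cache:  # miss: fetch over the "network"
            self.rpcs += 1
            self.cache[name] = self.server.read(name)
        return self.cache[name]

    def write(self, name: str, data: str) -> None:
        self.cache[name] = data
        self.rpcs += 1
        self.server.write(name, data)  # write-through, for simplicity


server = Server()
server.write("f1", "V1")
c1, c2 = CachingClient(server), CachingClient(server)

for _ in range(3):
    c1.read("f1")      # only the first read costs a server request
c2.write("f1", "V2")
print(c2.read("f1"))   # V2
print(c1.read("f1"))   # V1: stale, c1 never learned about the write
print(c1.rpcs)         # 1
```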
(3) Hints
— Treat the cached data as hints, i.e. cached data may not be
completely accurate.
— Can be used by applications that can discover that the
cached data is invalid and can recover
— Example:
— After the name of a file is mapped to an address, that
address is stored as a hint in the cache.
— If the address later fails, it is purged from the cache
— The name server is consulted to provide the actual
location of the file and the cache is updated
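The name-to-address example can be sketched as follows, assuming hypothetical dictionaries `name_server`, `hint_cache`, and `servers` standing in for the name server, the client's cache, and the file servers. A wrong hint costs only an extra lookup, never a wrong answer, because the hint is checked when it is used.

```python
# Hints: cached name -> address mappings that may be wrong.
# On failure the hint is purged, the name server is consulted, and the
# cache is refreshed with the actual location.

name_server = {"report.txt": "server-b"}            # authoritative mapping
hint_cache = {"report.txt": "server-a"}             # stale hint: the file moved
servers = {"server-b": {"report.txt": "contents"}}  # where files really live


def fetch(name: str) -> str:
    host = hint_cache.get(name)
    if host is not None and name in servers.get(host, {}):
        return servers[host][name]  # hint was correct: fast path
    hint_cache.pop(name, None)      # hint failed: purge it
    host = name_server[name]        # consult the name server
    hint_cache[name] = host         # update the cache
    return servers[host][name]


print(fetch("report.txt"))  # contents
print(hint_cache)           # {'report.txt': 'server-b'}
```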
(4) Bulk data transfer
— Observations:
— Overhead introduced by protocols does not depend on the
amount of data transferred in one transaction.
— Most files are accessed in their entirety.
— Common practice: when client requests one block of data,
multiple consecutive blocks are transferred
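The effect of transferring consecutive blocks can be illustrated with a small sketch. It assumes a hypothetical file of ten blocks and a client that fetches a fixed `batch` of blocks per request; counting requests for a whole-file sequential read shows how the fixed per-request overhead is paid fewer times.

```python
# Read-ahead: a request for one block fetches `batch` consecutive blocks,
# so the fixed per-request protocol overhead is paid once per batch.

BLOCK_COUNT = 10  # hypothetical file size in blocks


class ReadAheadClient:
    def __init__(self, batch: int) -> None:
        self.batch = batch
        self.cache: dict[int, bytes] = {}
        self.requests = 0

    def _server_fetch(self, first: int, count: int) -> dict[int, bytes]:
        # Hypothetical server call: one request returns several blocks.
        self.requests += 1
        last = min(first + count, BLOCK_COUNT)
        return {b: f"block{b}".encode() for b in range(first, last)}

    def read_block(self, n: int) -> bytes:
        if n not in self.cache:
            self.cache.update(self._server_fetch(n, self.batch))
        return self.cache[n]


for batch in (1, 4):
    client = ReadAheadClient(batch)
    for n in range(BLOCK_COUNT):  # the whole file, in order
        client.read_block(n)
    print(batch, client.requests)
# 1 10
# 4 3
```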
(5) Encryption
— Encryption is needed to provide security in distributed systems.
— Entities that need to communicate send a request to an authentication
server.
— Authentication server provides key for conversation.
Design Issues
1. Naming and name resolution
— Terminology
— Name: each object in a file system (file, directory) has a unique
name
— Name resolution: mapping a name to an object or multiple
objects (replication)
— Name space: collection of names with or without same
resolution mechanism
— Approaches to naming files in a distributed system
(a) Concatenate name of host to names of files on that host
— Advantage: unique filenames, simple resolution
— Disadvantages:
o Conflicts with network transparency
o Moving file to another host requires changing its name
and the applications using it
(b) Mount remote directories onto local directories
— Requires that the host of the remote directory is known
— After mounting, files are referenced in a location-transparent way
(i.e., the file name does not reveal its location)
(c) Have a single global directory
— All files belong to a single name space
— Limitation: having unique system-wide filenames requires a
single computing facility or cooperating facilities
1. Naming and Name Resolution (cont.)
— Contexts
— Solve the problem of system-wide unique names, by
partitioning a name space into contexts (geographical,
organizational, etc.)
— Name resolution is done within that context.
— Interpretation may lead to another context.
— File Name = Context + Name local to context
— Nameserver
— Process that maps file names to objects (files, directories)
— Implementation options
— Single name Server
o Simple implementation,
o reliability and performance issues
— Several Name Servers (on different hosts)
o Each server responsible for a domain
o Example:
Client requests access to file ‘A/B/C’
Local name server looks up a table (in kernel)
Local name server points to a remote server for the ‘/B/C’ mapping
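The referral in this example can be sketched as resolution across name servers. This is an illustration only: the server names, tables, and `resolve_name` function are hypothetical, and the sketch lets each server hand the rest of the path to the next one (recursive resolution), whereas a real system might instead return a referral for the client to follow (iterative resolution).

```python
# Hypothetical name-server tables. Each server either maps a name it owns
# directly to an object or refers the remainder of the path to another server.
NAME_SERVERS: dict[str, dict[str, tuple[str, str]]] = {
    "local": {"A": ("refer", "remote1")},              # 'A' is handled elsewhere
    "remote1": {"B/C": ("object", "file-object-17")},  # remote server owns '/B/C'
}


def resolve_name(path: str, server: str = "local") -> tuple[str, list[str]]:
    """Return the object named by `path` and the name servers consulted."""
    table = NAME_SERVERS[server]
    path = path.strip("/")
    entry = table.get(path)
    if entry is not None and entry[0] == "object":
        return entry[1], [server]
    head, _, rest = path.partition("/")
    kind, target = table[head]  # KeyError: this server does not know the name
    if kind != "refer" or not rest:
        raise LookupError(f"{server} cannot resolve {path!r}")
    obj, consulted = resolve_name(rest, target)  # hand off the remaining name
    return obj, [server] + consulted


obj, consulted = resolve_name("A/B/C")
print(obj)        # file-object-17
print(consulted)  # ['local', 'remote1']
```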
2. Caching
— Caching at the client: Main memory vs. Disk
— Main memory: (+) Fast, (+) Works for diskless clients,
(-) Expensive memory, (-) Complex Virtual Memory
Management.
— Disk: (+) Large files, (+) Simpler Virtual Memory Management
(-) Requires local disk.
— Cache consistency
— Server initiated
— Server informs cache managers when data in client caches is
stale.
— Client cache managers invalidate stale data or retrieve new data.
— Disadvantage: extensive communication
— Client initiated
— Cache managers at the clients validate data with server
before returning it to clients
— Disadvantage: extensive communication
— Prohibit file caching under concurrent-write sharing
— Several clients open a file, at least one of them for
writing
— Server informs all clients to purge that cached file
— Lock files when concurrent-write sharing (at least one
client opens for write)
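The server-initiated scheme can be sketched with callbacks. This is a minimal illustration, assuming hypothetical `Server` and `Client` classes and direct method calls in place of network messages: the server records which clients cache each file, and a write triggers an invalidation message to every other caching client. The per-file client sets are the server state this approach requires, and the `messages` counter is the communication cost noted above.

```python
class Server:
    def __init__(self) -> None:
        self.files: dict[str, str] = {}
        self.cached_by: dict[str, set["Client"]] = {}  # file -> caching clients
        self.messages = 0  # invalidation messages sent

    def read(self, client: "Client", name: str) -> str:
        self.cached_by.setdefault(name, set()).add(client)
        return self.files[name]

    def write(self, writer: "Client", name: str, data: str) -> None:
        self.files[name] = data
        for other in self.cached_by.get(name, set()) - {writer}:
            other.invalidate(name)  # tell the client its copy is stale
            self.messages += 1
        self.cached_by[name] = {writer}


class Client:
    def __init__(self, server: Server) -> None:
        self.server = server
        self.cache: dict[str, str] = {}

    def read(self, name: str) -> str:
        if name not in self.cache:
            self.cache[name] = self.server.read(self, name)
        return self.cache[name]

    def write(self, name: str, data: str) -> None:
        self.cache[name] = data
        self.server.write(self, name, data)

    def invalidate(self, name: str) -> None:
        self.cache.pop(name, None)  # next read fetches a fresh copy


server = Server()
server.files["f1"] = "V1"
a, b = Client(server), Client(server)
a.read("f1")
b.read("f1")
b.write("f1", "V2")     # server invalidates a's cached copy
print(a.read("f1"))     # V2
print(server.messages)  # 1
```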
3. Writing policy
— Question: Once a client writes into a file (and the local
cache), when should the modified cache be sent to the server?
— Options:
— Write-through: all writes at the clients are immediately
transferred to the servers
— Advantage: reliability
— Disadvantage: performance; it does not take advantage of
the cache.
— Delayed writing: delay transfer to servers
— Advantages:
o Many writes take place (including intermediate results)
before a transfer
o Some data may be deleted
— Disadvantage: reliability
— Delayed writing until file is closed at client
— For short open intervals, same as delayed writing
— For long intervals, reliability problems
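The trade-off can be made concrete by counting transfers. This sketch assumes a hypothetical `ClientFile` cache in which a periodic `tick` stands in for the delayed-write timer; repeated writes to the same block coalesce in the dirty set, which is why the delayed policies send fewer blocks, and anything still in `dirty` would be lost if the client crashed.

```python
class ClientFile:
    """Hypothetical client-side cache for one file under a writing policy."""

    def __init__(self, policy: str) -> None:
        self.policy = policy             # "write-through", "delayed", or "on-close"
        self.dirty: dict[int, str] = {}  # block -> data not yet sent (lost on crash)
        self.server_writes = 0           # blocks transferred to the server

    def write(self, block: int, data: str) -> None:
        if self.policy == "write-through":
            self.server_writes += 1      # sent immediately: reliable but slow
        else:
            self.dirty[block] = data     # rewrites of the same block coalesce

    def tick(self) -> None:
        """Periodic timer: only delayed writing flushes here."""
        if self.policy == "delayed":
            self.flush()

    def close(self) -> None:
        self.flush()

    def flush(self) -> None:
        self.server_writes += len(self.dirty)
        self.dirty.clear()


def run(policy: str) -> int:
    f = ClientFile(policy)
    for step in range(6):
        f.write(0, f"draft {step}")      # the same block rewritten repeatedly
        if step % 3 == 2:
            f.tick()
    f.close()
    return f.server_writes


for policy in ("write-through", "delayed", "on-close"):
    print(policy, run(policy))
# write-through 6
# delayed 2
# on-close 1
```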
4. Availability
— Issue: what is the level of availability of files in a distributed
file system?
— Resolution: use replication to increase availability, i.e. many
copies (replicas) of files are maintained at different
sites/servers
— Replication issues:
— How to keep replicas consistent
— How to detect inconsistency among replicas
— Unit of replication
— File
— Group of files
a) Volume: group of all files of a user or group or all files
in a server
o Advantage: ease of implementation
o Disadvantage: wasteful, user may need only a subset
replicated
b) Primary pack vs. pack
o Primary pack: all files of a user
o Pack: subset of a primary pack. Each pack can receive a different
degree of replication
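The replication issues listed above, keeping replicas consistent and detecting inconsistency, can be illustrated with version numbers. This is a minimal sketch under assumptions: a hypothetical `replicas` table in which every update increments a single version counter, so a replica with a lower version has missed updates. It detects missed updates only; detecting conflicting concurrent updates at different sites needs richer metadata such as version vectors.

```python
# Hypothetical replica set: each site stores (version number, data).
replicas: dict[str, tuple[int, str]] = {
    "site1": (3, "v3 data"),
    "site2": (3, "v3 data"),
    "site3": (2, "v2 data"),  # missed the last update, e.g. it was down
}


def stale_replicas(reps: dict[str, tuple[int, str]]) -> list[str]:
    """Detect inconsistency: sites whose version is behind the newest one."""
    newest = max(version for version, _ in reps.values())
    return sorted(site for site, (version, _) in reps.items() if version < newest)


def repair(reps: dict[str, tuple[int, str]]) -> None:
    """Bring stale sites up to date by copying the newest replica."""
    newest = max(reps.values(), key=lambda entry: entry[0])
    for site in stale_replicas(reps):
        reps[site] = newest


print(stale_replicas(replicas))  # ['site3']
repair(replicas)
print(stale_replicas(replicas))  # []
```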
5. Scalability
— Issue: Can the design support a growing system?
— Example: server-initiated cache invalidation complexity and
load grow with size of system.
— Possible solutions:
— Do not provide cache invalidation service for read-only files.
— Provide design to allow users to share cached data.
— Design file servers for scalability: threads, SMPs, clusters
6. Semantics
— Expected semantics: a read will return data stored by the
latest write.
— Possible options:
— All reads and writes go through the server.
— Disadvantage: communication overhead
— Use of lock mechanism
— Disadvantage: file not always available
STATEFUL VS. STATELESS SERVICE:
Stateful: A server keeps track of information about client
requests.
— It maintains what files are opened by a client;
connection identifiers; server caches.
— Memory must be reclaimed when client closes file or
when client dies.
Stateless: Each client request provides the complete information
needed by the server (e.g., filename, file offset).
— The server can maintain information on behalf of the
client, but it's not required.
— Useful things to keep include file info for the last N files
touched.
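The difference shows up in the request formats. In this minimal sketch, `StatefulServer`, `stateless_read`, and the in-memory `FILES` table are hypothetical: the stateful server must remember each open file's offset (and reclaim that memory on close), while the stateless call carries the filename and offset itself, so a server that restarted between requests could still answer it.

```python
FILES = {"f": b"hello distributed world"}  # hypothetical stored file


class StatefulServer:
    def __init__(self) -> None:
        self.open_files: dict[int, list] = {}  # handle -> [name, offset]
        self.next_handle = 1

    def open(self, name: str) -> int:
        handle = self.next_handle
        self.next_handle += 1
        self.open_files[handle] = [name, 0]
        return handle

    def read(self, handle: int, count: int) -> bytes:
        name, offset = self.open_files[handle]  # lost if the server reboots
        data = FILES[name][offset:offset + count]
        self.open_files[handle][1] = offset + len(data)
        return data

    def close(self, handle: int) -> None:
        del self.open_files[handle]  # memory reclaimed when the file is closed


def stateless_read(name: str, offset: int, count: int) -> bytes:
    # Everything needed is in the request itself.
    return FILES[name][offset:offset + count]


s = StatefulServer()
h = s.open("f")
print(s.read(h, 5), s.read(h, 12))  # b'hello' b' distributed'
print(stateless_read("f", 5, 12))   # b' distributed'
```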
Case Studies:
The Sun Network File System (NFS)
— Developed by Sun Microsystems to provide a distributed file system
independent of the hardware and operating system.
— Architecture
— Virtual File System (VFS):
File system interface that allows NFS to support different file systems.
— Requests for operation on remote files are routed by VFS to NFS
— Requests are sent to the VFS on the remote server using
— The remote procedure call (RPC), and
— The external data representation (XDR)
— VFS on the remote server initiates the file system operation locally.
— Vnode (Virtual Node):
— There is a network-wide vnode for every object in the file system (file or
directory), the equivalent of a UNIX inode.
— A vnode has a mount table, allowing any node to be a mount node.
Figure: Case Studies: NFS Architecture
NFS (Cont.)
— Naming and location:
— Workstations are designated as clients or file servers.
— A client defines its own private file system by mounting a subdirectory of a
remote file system on its local file system.
— Each client maintains a table which maps the remote file directories to
servers.
— Mapping a filename to an object is done the first time a client references the
file. Example:
Filename: /A/B/C
— Assume ‘A’ corresponds to ‘vnode1’
— Look up on ‘vnode1/B’ returns ‘vnode2’ for ‘B’, where ‘vnode2’ indicates that the
object is on server ‘X’.
— Client asks server ‘X’ to lookup ‘vnode2/C’.
— A ‘file handle’ is returned to the client by the server storing that file.
— The client uses the ‘file handle’ for all subsequent operations on that file.
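The per-component lookup in this example can be sketched as follows. It is a simplified model in the spirit of NFS's LOOKUP operation, not the NFS protocol itself: the `DIRECTORIES` and `ROOT` tables, the handle strings, and `lookup_path` are hypothetical, and a real client would send each lookup to the server as an RPC carrying an opaque file handle.

```python
# (server, directory handle) -> {component name: (server, handle)}
DIRECTORIES: dict[tuple[str, str], dict[str, tuple[str, str]]] = {
    ("client", "vnode1"): {"B": ("X", "vnode2")},  # 'B' lives on server X
    ("X", "vnode2"): {"C": ("X", "fh-C")},         # X returns the file handle
}
ROOT = {"A": ("client", "vnode1")}  # the client's own mapping for 'A'


def lookup_path(path: str) -> tuple[tuple[str, str], list[str]]:
    """Resolve one component at a time; return the final handle and the steps."""
    parts = path.strip("/").split("/")
    current = ROOT[parts[0]]
    steps = []
    for name in parts[1:]:
        server, handle = current
        steps.append(f"{server}:LOOKUP({handle}, {name})")
        current = DIRECTORIES[(server, handle)][name]
    return current, steps


handle, trace = lookup_path("/A/B/C")
print(handle)  # ('X', 'fh-C'), kept for all later operations on the file
for step in trace:
    print(step)
# client:LOOKUP(vnode1, B)
# X:LOOKUP(vnode2, C)
```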
NFS (Cont.)
— Caching:
— Caching done in main memory of clients.
— Caching done for: file blocks, translation of filenames to vnodes, and attributes of files and
directories.
(1) Caching of file blocks
— Cached on demand with time stamp of the file (when last modified on the server)
— Entire file cached, if under certain size, with timestamp when last modified
— After certain age, blocks have to be validated with server
— Delayed writing policy: Modified blocks flushed to the server after certain delay
(2) Caching of filenames to vnodes for remote directory names
— Speeds up the lookup procedure.
(3) Caching of file and directory attributes
— Updated when new attributes received from the server, discarded after certain time.
— Stateless Server
— Servers are stateless
— File access requests from clients contain all needed information (pointer position, etc.)
— Servers have no record of past requests.
— Simple recovery from crashes.
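Age-based validation of cached file blocks can be sketched like this. The sketch assumes hypothetical `FileServer` and `ValidatingClient` classes, an explicit `now` argument instead of a real clock, and a fixed `max_age`; the fixed interval is a simplification of the timeouts real NFS clients use. It also shows the cost of the approach: within the interval, a client can return stale data.

```python
class FileServer:
    def __init__(self) -> None:
        self.data = {"f": "old"}
        self.mtime = {"f": 100.0}  # last-modified time per file
        self.getattr_calls = 0     # validation requests received

    def getattr(self, name: str) -> float:
        self.getattr_calls += 1
        return self.mtime[name]

    def read(self, name: str) -> tuple[str, float]:
        return self.data[name], self.mtime[name]


class ValidatingClient:
    def __init__(self, server: FileServer, max_age: float) -> None:
        self.server = server
        self.max_age = max_age
        # name -> (data, server mtime when cached, time of last validation)
        self.cache: dict[str, tuple[str, float, float]] = {}

    def read(self, name: str, now: float) -> str:
        if name in self.cache:
            data, mtime, checked = self.cache[name]
            if now - checked < self.max_age:
                return data  # recently validated: no server contact
            if self.server.getattr(name) == mtime:
                self.cache[name] = (data, mtime, now)  # still current
                return data
        data, mtime = self.server.read(name)  # miss or changed: refetch
        self.cache[name] = (data, mtime, now)
        return data


srv = FileServer()
cl = ValidatingClient(srv, max_age=3.0)
print(cl.read("f", now=0.0))                  # old (fetched from the server)
srv.data["f"], srv.mtime["f"] = "new", 101.0  # another client updates f
print(cl.read("f", now=1.0))                  # old (stale, still within max_age)
print(cl.read("f", now=5.0))                  # new (validation saw a newer mtime)
```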