
HydraFS

C. Ungureanu, B. Atkin, A. Aranya, et al.

Slides: Joe Buck, CMPS 229, Spring 2010

April 27, 2010




Introduction

✤ What is HydraFS?

✤ Why is it necessary?



HYDRAstor

✤ What is HYDRAstor?

✤ Immutable data

✤ High Latency

✤ Jitter

✤ Put / Get API

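A minimal sketch of the put/get idea in Python (not the real HYDRAstor API; all names are invented). The store chooses the address and hands it back to the client; blocks are immutable once written, and duplicate data is stored only once:

import hashlib

class ToyBlockStore:
    """Toy stand-in for Hydra's put/get block interface (illustrative only)."""

    def __init__(self):
        self._blocks = {}                     # address -> immutable bytes

    def put(self, data: bytes) -> str:
        # The store derives the address and returns it; the client treats it
        # as opaque. (Real Hydra addresses are not a plain content hash.)
        addr = hashlib.sha256(data).hexdigest()
        self._blocks.setdefault(addr, data)   # duplicate data is stored once
        return addr

    def get(self, addr: str) -> bytes:
        return self._blocks[addr]

store = ToyBlockStore()
addr = store.put(b"some backup data")
assert store.get(addr) == b"some backup data"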


Hydra Diagram

✤ HydraFS acts as a front end for Hydra, a distributed, content-addressable block store

[Figure 1: an Access Node runs the HydraFS File Server and Commit Server on top of the HYDRAstor Block Access Library; underneath, Hydra spans multiple Storage Nodes and presents a single-system content-addressable store]

CAS

[Diagram: a client issues a stream of 4 KB writes to the CAS]



CAS - continued

[Diagram: the client's 4 KB writes now pass through a chunker before reaching the CAS]



CAS - continued

[Diagram: the chunker has cut a variable-sized 10 KB chunk, stored in the CAS as cas1; 14 KB of client data remains buffered]



CAS - continued

Client
1 KB 4 KB

CAS

cas1: 10KB cas2: 9KB

Tuesday, April 27, 2010


A little more on CAS addresses

✤ Same data doesn’t mean the same address

✤ Impossible to calculate prior to write

✤ Foreground processing writes shallow trees

✤ Root cannot be updated until all child nodes are set

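Since an address is only known once the store confirms the write, file trees have to be built bottom-up: leaves first, then interior nodes that embed the returned child addresses, and the root last. A hypothetical sketch of that order:

import hashlib, json

blocks = {}                                   # toy store: address -> bytes

def put(data: bytes) -> str:                  # stand-in for a Hydra write
    addr = hashlib.sha256(data).hexdigest()
    blocks[addr] = data
    return addr

# 1. Write the leaf data blocks; their addresses exist only after the writes return.
leaf_addrs = [put(chunk) for chunk in (b"chunk-0", b"chunk-1", b"chunk-2")]

# 2. Only now can an interior node be written, because it embeds child addresses.
interior_addr = put(json.dumps(leaf_addrs).encode())

# 3. The root is written last, once everything beneath it is durable.
root_addr = put(json.dumps({"file": interior_addr}).encode())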


Issues for a CAS FS

✤ Updates are more expensive

✤ Metadata cache misses cause significant performance issues

✤ The combination of high latency and high throughput means lots of buffering



Design Decisions

✤ Decouple data and metadata processing

✤ Fixed size caches with admission control

✤ Second-order cache for metadata



Issues - continued

✤ Immutable Blocks

✤ FS can only reference blocks already written

✤ Forms DAGs

✤ Height of DAGs needs to be minimized



Issues - continued

✤ High latency

✤ Instead of milliseconds to tens of milliseconds of latency, Hydra has latencies of hundreds of milliseconds to 1 s

✤ Stream hints

✤ Delay writes to batch streams together

✤ High degree of parallelism needed to mask high latencies

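With per-request latencies in the hundreds of milliseconds, throughput can only come from keeping many requests in flight at once. A rough sketch of the idea with a thread pool (the 500 ms latency and the concurrency level are made-up numbers):

import time
from concurrent.futures import ThreadPoolExecutor

def put(block: bytes) -> str:
    time.sleep(0.5)                          # pretend each write takes ~500 ms
    return "addr-%02x" % block[0]            # fake opaque address

blocks = [bytes([i]) * 4096 for i in range(64)]

# Issued one at a time this would take ~32 s; with 32 writes in flight it
# finishes in roughly 1 s, hiding the per-request latency.
with ThreadPoolExecutor(max_workers=32) as pool:
    addrs = list(pool.map(put, blocks))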


Issues - continued

✤ Variable sized blocks

✤ Avoids the “shifting window” problem

✤ Use a balanced tree structure

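A simplified content-defined chunker, to show why variable-sized blocks avoid the shifting-window problem: boundaries are cut where the data itself says so, so inserting a few bytes only disturbs nearby chunks. This is not the paper's algorithm; the hash and the cut-point parameters are invented:

def chunk(data: bytes, mask: int = 0x3FF, min_size: int = 512,
          max_size: int = 64 * 1024):
    """Cut a chunk whenever a crude rolling hash hits a fixed bit pattern."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + b) & 0xFFFFFFFF      # older bytes shift out over time
        size = i - start + 1
        if (size >= min_size and (h & mask) == mask) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])          # trailing partial chunk
    return chunks

parts = chunk(b"example data " * 1000)
assert b"".join(parts) == b"example data " * 1000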


FS design

✤ High Throughput

✤ Minimize the number of dependent I/O operations

✤ Availability guarantees no worse than standard Unix FS

✤ Efficiently support both local and remote access



File System Layout
[Figure 3: HydraFS software layout. Super blocks point to an imap handle; the imap (a B-tree over a segmented array of content addresses) maps inode numbers to inodes. A directory inode's B-tree points to directory blocks, whose entries map filenames to inode numbers and types (e.g. Filename1 → 321 R, Filename3 → 442 D); a regular file inode's B-tree points to the file's data blocks.]

✤ The imap translates inode numbers to content addresses and also allocates and frees inode numbers

✤ A regular file inode indexes its blocks with a B-tree so it can accommodate very large files

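A dictionary-backed sketch of the imap's role, leaving out the segmented array and B-tree: translate an inode number to the content address of that inode's current root block, and hand out / reclaim inode numbers. Names are illustrative:

class ToyImap:
    """Maps inode numbers to content addresses (illustrative only)."""

    def __init__(self):
        self._roots = {}                     # inode number -> content address
        self._next_ino = 1

    def allocate(self) -> int:               # hand out a fresh inode number
        ino, self._next_ino = self._next_ino, self._next_ino + 1
        return ino

    def set_root(self, ino: int, addr: str) -> None:
        self._roots[ino] = addr              # updated when the inode's tree is rewritten

    def lookup(self, ino: int) -> str:
        return self._roots[ino]

    def free(self, ino: int) -> None:
        del self._roots[ino]

imap = ToyImap()
ino = imap.allocate()
imap.set_root(ino, "addr-of-inode-block")
assert imap.lookup(ino) == "addr-of-inode-block"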
HydraFS Software Stack

✤ Uses FUSE

✤ Split into file server and commit server

✤ Simplifies metadata locking

✤ Amortizes the cost of metadata updates via batching

✤ Each server has its own caching strategy



Writing Data

✤ Data stored in inode specific buffer

✤ Chunked, marked dirty and written to Hydra

✤ After write confirmation, the buffer is freed and the block's address is entered in the uncommitted block table

✤ Needed until metadata is flushed to storage

✤ Designed for append-style writing; in-place updates are expensive

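A rough sketch of that write path (fixed-size chunks for simplicity, and all names invented): buffer incoming data per inode, cut chunks, write them to the store, and keep the returned addresses in an uncommitted-block table until the metadata flush makes them reachable from the file system tree:

import hashlib

class ToyInodeWriter:
    def __init__(self, chunk_size: int = 64 * 1024):
        self.chunk_size = chunk_size
        self.buf = bytearray()               # per-inode write buffer
        self.offset = 0
        self.uncommitted = []                # (offset, length, address)
        self.store = {}                      # stand-in for Hydra

    def _put(self, data: bytes) -> str:
        addr = hashlib.sha256(data).hexdigest()
        self.store[addr] = data
        return addr

    def append(self, data: bytes) -> None:
        self.buf += data
        while len(self.buf) >= self.chunk_size:
            chunk = bytes(self.buf[:self.chunk_size])
            del self.buf[:self.chunk_size]
            addr = self._put(chunk)          # buffer memory can be released now...
            self.uncommitted.append((self.offset, len(chunk), addr))
            self.offset += len(chunk)        # ...but the mapping stays until commit

    def flush_metadata(self):
        done, self.uncommitted = self.uncommitted, []
        return done                          # would be folded into the inode B-tree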


Metadata Cleaning

✤ Dirty data kept until the commit server applies changes

✤ New versions of the file system are created periodically

✤ Metadata in separate structures, tagged by time

✤ Always clean (in Hydra), can be dropped from cache at any time

✤ Cleaning lets file servers drop dirty changes once they appear in the new FS version

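A sketch of the periodic commit, under the simplifying assumption that a file-system version is just a root record pointing at an imap snapshot: once the new version is durable in the store, the file server can drop the dirty metadata that it covers. Everything here is illustrative:

import hashlib, json, time

store = {}                                   # toy block store

def put(data: bytes) -> str:
    addr = hashlib.sha256(data).hexdigest()
    store[addr] = data
    return addr

def commit_version(version: int, imap_snapshot: dict) -> str:
    """Persist a new FS version; the returned address is the new root."""
    root = {"version": version,
            "created": time.time(),          # versions are tagged by time
            "imap": imap_snapshot}           # inode number -> content address
    return put(json.dumps(root, sort_keys=True).encode())

root_v1 = commit_version(1, {"1": "addr-of-root-directory-inode"})
root_v2 = commit_version(2, {"1": "addr-of-root-directory-inode",
                             "2": "addr-of-new-file-inode"})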


Admission Control

✤ Events assume worst-case memory usage

✤ If insufficient resources are available, the event blocks

✤ Limits the number of active events

✤ Memory usage is tuned to the amount of physical memory

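A sketch of admission control with a condition variable: the budget is sized to physical memory, each event reserves its worst-case memory before running, and it blocks until enough is free. The budget and reservation sizes below are made up:

import threading

class AdmissionControl:
    def __init__(self, budget_bytes: int):
        self.free = budget_bytes             # tuned to physical memory
        self.cond = threading.Condition()

    def admit(self, worst_case_bytes: int) -> None:
        with self.cond:
            while self.free < worst_case_bytes:
                self.cond.wait()             # block until memory is released
            self.free -= worst_case_bytes

    def release(self, worst_case_bytes: int) -> None:
        with self.cond:
            self.free += worst_case_bytes
            self.cond.notify_all()

ac = AdmissionControl(budget_bytes=256 * 1024 * 1024)
ac.admit(8 * 1024 * 1024)                    # a write event reserves its worst case
# ... handle the event ...
ac.release(8 * 1024 * 1024)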


Read Processing

✤ Aggressive read-ahead

✤ Multiple fetches to get metadata

✤ Weighted caching to favor metadata over data

✤ Fast range map

✤ Metadata read-ahead

✤ Primes the fast range map (FRM) and the cache

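A sketch of a fast range map as a sorted extent list: reads translate a file offset straight to a content address, and a miss falls back to walking the inode B-tree (not shown); metadata read-ahead would prime this structure. Purely illustrative:

import bisect

class FastRangeMap:
    def __init__(self):
        self.starts = []                     # sorted extent start offsets
        self.extents = []                    # parallel (start, length, address)

    def insert(self, start: int, length: int, addr: str) -> None:
        i = bisect.bisect_left(self.starts, start)
        self.starts.insert(i, start)
        self.extents.insert(i, (start, length, addr))

    def lookup(self, offset: int):
        """Return the address covering offset, or None on a miss."""
        i = bisect.bisect_right(self.starts, offset) - 1
        if i >= 0:
            start, length, addr = self.extents[i]
            if start <= offset < start + length:
                return addr
        return None

frm = FastRangeMap()
frm.insert(0, 65536, "cas1")
frm.insert(65536, 65536, "cas2")
assert frm.lookup(70000) == "cas2" and frm.lookup(200000) is None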


Deletion

✤ File deletion removes the entry from the current FS

✤ Data remains until there are no pointers to it

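A toy mark-and-sweep over a block store, just to illustrate the point above: after a delete, the data block is no longer reachable from the current FS root, and only then can it be reclaimed (Hydra's real space reclamation is a distributed process, not this):

import json

store = {
    "root":    json.dumps(["inode-a"]).encode(),   # current FS version
    "inode-a": json.dumps(["data-1"]).encode(),    # file that still exists
    "data-1":  b"live data",
    "data-2":  b"deleted file's data",             # nothing points here anymore
}

def sweep(root_addr: str) -> None:
    live, stack = set(), [root_addr]
    while stack:                                   # mark everything reachable
        addr = stack.pop()
        if addr in live or addr not in store:
            continue
        live.add(addr)
        if store[addr].startswith(b"["):           # toy convention: JSON list = child addresses
            stack.extend(json.loads(store[addr]))
    for addr in list(store):                       # reclaim the rest
        if addr not in live:
            del store[addr]

sweep("root")
assert "data-1" in store and "data-2" not in store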


Performance

[Figure: normalized throughput of the raw block device and of the file system built on it, for Read (iSCSI), Read (Hydra), Write (iSCSI), and Write (Hydra)]

Metadata Intensive

✤ Postmark

✤ Generates files, then issues transactions.

✤ File size: 512 B - 16 KB

            Create           Delete
         Alone     Tx     Alone     Tx    Overall
ext3     1,851     68     1,787     68      136
HydraFS     61     28       676     28       57

Table 1: Postmark comparing HydraFS with ext3 on similar hardware

✤ HydraFS limits how many uncommitted blocks can accumulate, since a large backlog increases commit-server turnaround and makes user-operation latency unpredictable

✤ ext3 has no such limitation and writes all metadata updates to its journal


Write Performance vs Dedup

✤ Write throughput rises with the duplicate ratio, since duplicate data needs correspondingly fewer disk I/Os

✤ In all cases, HydraFS throughput is within 12% of Hydra throughput, so HydraFS meets its goal of maintaining high throughput

[Figure: write throughput (MB/s) versus duplicate ratio (%) for Hydra and HydraFS]

Write Behind

[Figure 7: write completion order, plotting file offset (GB) against time (s); many block writes are in flight in parallel]

Hydra Latency

[Figure: cumulative distribution of Hydra request latency, Pr(t <= x) versus time (ms), over roughly 0-70 ms]



Future Work

✤ Allow multiple nodes to manage same FS

✤ Makes failover transparent and automatic

✤ Exposing snapshots to users

✤ Incorporating SSD storage to lower latencies, making HydraFS usable as primary storage



Thank you

✤ Questions?

✤ Comments?

✤ email: buck@soe.ucsc.edu

✤ Paper: http://www.usenix.org/events/fast10/tech/full_papers/ungureanu.pdf



Sample Operations

✤ Block Write

✤ Block Read

✤ Searchable Block Write

✤ Searchable Block Read

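A toy illustration of the four operations above (not the real Hydra API): regular blocks are written and read by the opaque address the store returns, while searchable blocks are written under a client-chosen search key and read back by that key, which lets a client find a well-known record such as the latest superblock without having to remember any address:

import hashlib

class ToyHydra:
    def __init__(self):
        self.blocks = {}          # content address -> data
        self.searchable = {}      # client-chosen search key -> content address

    # Regular blocks: the address is assigned by the store and returned.
    def block_write(self, data: bytes) -> str:
        addr = hashlib.sha256(data).hexdigest()
        self.blocks[addr] = data
        return addr

    def block_read(self, addr: str) -> bytes:
        return self.blocks[addr]

    # Searchable blocks: retrievable later by key alone.
    def searchable_block_write(self, key: str, data: bytes) -> str:
        addr = self.block_write(data)
        self.searchable[key] = addr
        return addr

    def searchable_block_read(self, key: str) -> bytes:
        return self.block_read(self.searchable[key])

h = ToyHydra()
h.searchable_block_write("fs-root:v42", b"superblock contents")
assert h.searchable_block_read("fs-root:v42") == b"superblock contents"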
