✤ What is HydraFS?
✤ Why is it necessary?
✤ What is HYDRAstor?
✤ Immutable data
✤ High Latency
✤ Jitter
[Figure 1: HYDRAstor Architecture. Storage nodes form a single-system content-addressable store, which clients access through the Block Access Library.]
Tuesday, April 27, 2010
CAS
[Diagram: a client's data stream is split by the Chunker into blocks (shown as 4 KB each), which are written to the CAS.]
The chunker uses a content-based heuristic, combined with hard size limits, to split the stream into variable-sized chunks.
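The idea can be sketched as a rolling-hash chunker: cut where the hash of recent bytes matches a boundary pattern, subject to hard minimum and maximum sizes. The hash, mask, and size limits below are illustrative assumptions, not HYDRAstor's actual parameters:

```python
# Illustrative content-defined chunker (not HYDRAstor's real algorithm).
MIN_SIZE, MAX_SIZE = 2048, 8192   # hard limits on chunk size (invented)
MASK = 0x1FFF                     # boundary pattern: roughly 8 KB average

def chunks(data: bytes):
    start, h = 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + b) & 0xFFFFFFFF   # toy rolling hash over the stream
        size = i - start + 1
        at_boundary = (h & MASK) == 0 and size >= MIN_SIZE
        if at_boundary or size >= MAX_SIZE:
            yield data[start:i + 1]       # emit a chunk at the cut point
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]                # final partial chunk
```

Because cut points depend on content, inserting bytes early in a stream only shifts nearby chunk boundaries, so most downstream chunks (and their content addresses) are unchanged.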
CAS - continued
[Diagram: clients write variable-sized blocks (e.g. 4 KB, 2 KB, 1 KB) to the CAS; the returned content addresses cover variable extents, e.g. cas1: 10 KB and cas2: 9 KB.]
A little more on CAS addresses
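The core property of a CAS address is that it is derived from the block's contents, so identical blocks get identical addresses and collapse to one stored copy. A minimal sketch, assuming a SHA-256-style hash (the slide does not name the actual hash function):

```python
import hashlib

def content_address(block: bytes) -> str:
    # The address is a hash of the contents: same bytes, same address.
    return hashlib.sha256(block).hexdigest()

store = {}

def put(block: bytes) -> str:
    addr = content_address(block)
    store[addr] = block          # storing a duplicate is effectively a no-op
    return addr
```

Writing the same block twice returns the same address and keeps a single copy, which is what makes deduplication fall out of the addressing scheme.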
✤ Immutable Blocks
✤ Forms DAGs
Because blocks are immutable, updating a block requires rewriting every block that points to it, up to the root, which makes updates quite expensive.
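This update cost can be sketched with a tiny hash DAG: parents address children by content hash, so "modifying" a leaf really writes a new leaf and a new root, while untouched children are reused. The `addr` helper is hypothetical, for illustration only:

```python
import hashlib

def addr(payload: bytes, children=()) -> str:
    # A block's address covers its payload plus the addresses it points to.
    h = hashlib.sha256(payload)
    for c in children:
        h.update(c.encode())
    return h.hexdigest()

# Two-level tree: root -> [leaf0, leaf1]
leaf0 = addr(b"block 0")
leaf1 = addr(b"block 1")
root = addr(b"", (leaf0, leaf1))

# "Updating" leaf1 creates a new leaf and a new root; leaf0 is reused as-is.
new_leaf1 = addr(b"block 1 (modified)")
new_root = addr(b"", (leaf0, new_leaf1))
```

The old root and old leaf remain valid, which is why old FS versions stay readable until they are garbage collected.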
Issues - continued
✤ High latency
✤ Stream hints
✤ High Throughput
Imap B-Tree
✤ Uses FUSE
The file server manages the interface to the client, records file modifications in a transaction log stored in Hydra, and keeps an in-memory cache of recent file modifications.
The commit server reads the transaction log, updates the file-system metadata, and generates new file-system versions.
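The split can be sketched as log shipping: the file server appends modification records and returns immediately, and the commit server later replays the log into a new immutable FS version. The record format and dict-per-version model here are invented stand-ins, not HydraFS's actual structures:

```python
import queue

log = queue.Queue()      # stands in for the transaction log stored in Hydra
fs_versions = [{}]       # each entry is an immutable file-system snapshot

def file_server_write(path, data):
    # No metadata update happens in the client's path; just log the change.
    log.put((path, data))

def commit_server_run():
    # Batch all pending records into one new file-system version.
    new_version = dict(fs_versions[-1])
    while not log.empty():
        path, data = log.get()
        new_version[path] = data
    fs_versions.append(new_version)
```

Keeping metadata updates out of the client's write path is what lets the file server sustain high throughput despite Hydra's high latency.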
Writing Data
✤ Always clean (in Hydra), can be dropped from cache at any time
Not all memory is freed when an operation completes. The cache, for example, can be flushed if the system finds it needs to reclaim memory.
Avoiding swapping is key to keeping latencies low and performance up.
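Because cached blocks are already persistent in Hydra ("always clean"), reclaiming memory never requires writeback; entries can simply be discarded. A minimal sketch (the `CleanCache` class is hypothetical):

```python
class CleanCache:
    """Cache of blocks that are already safe in Hydra: eviction is free."""

    def __init__(self):
        self._entries = {}

    def put(self, addr, block):
        self._entries[addr] = block

    def get(self, addr):
        # A miss just means re-reading from Hydra; correctness is unaffected.
        return self._entries.get(addr)

    def reclaim(self):
        # Drop everything immediately: no dirty data, so no writeback needed.
        freed = len(self._entries)
        self._entries.clear()
        return freed
```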
Read Processing
✤ Aggressive read-ahead
✤ Metadata read-ahead
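Aggressive read-ahead can be sketched as: on each read, also prefetch the next several blocks so later sequential reads hit the cache instead of high-latency storage. The window size and `fetch` callback are illustrative assumptions:

```python
READAHEAD = 4   # number of blocks to prefetch past the requested one (invented)

def read_block(fetch, cache, i):
    # Serve block i, prefetching the next READAHEAD blocks. In a real
    # system the prefetches would be issued asynchronously and in parallel.
    for j in range(i, i + READAHEAD + 1):
        if j not in cache:
            cache[j] = fetch(j)
    return cache[i]
```

With high-latency storage, overlapping many outstanding prefetches like this is what keeps sequential read throughput high.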
The data remains in storage until all FS versions that reference it are garbage collected, and a block may be pointed to by other files as well.
The FS only marks roots for deletion; Hydra handles reference counting and storage reclamation.
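The effect can be sketched as reachability from live roots: any block reachable from a surviving root stays; everything else is reclaimed. This mark-and-sweep `collect` is a simplified stand-in for Hydra's actual reference counting:

```python
def collect(roots, edges):
    # 'edges' maps each block address to the addresses it points to.
    # Hydra keeps any block reachable from a live root.
    live = set()
    stack = list(roots)
    while stack:
        b = stack.pop()
        if b not in live:
            live.add(b)
            stack.extend(edges.get(b, ()))
    return live
```

For example, with two FS versions sharing a block, deleting one version's root reclaims only the blocks unique to it; the shared block survives.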
Performance
[Figure 5: Comparison of raw device and file system sequential throughput (normalized), for reads and writes over iSCSI and Hydra.]
iSCSI setup: 6 disks per node in software RAID-5 (likely the source of the write penalty iSCSI takes). Block size is 64 KB for both iSCSI and Hydra.
HydraFS achieves 82% of raw throughput on reads and 88% on writes.
Metadata Intensive
✤ Postmark
              Create        Delete      Overall
              Alone    Tx   Alone   Tx
  ext3        1,851    68   1,787   68      136
  HydraFS        61    28     676   28       57
HydraFS
[Figure 6: Hydra and HydraFS write throughput with varying duplicate ratio.]
Throughput increases for duplicate data, as expected, since the number of disk I/Os is correspondingly reduced. In all cases, HydraFS throughput is within 12% of Hydra throughput.
Write Behind
[Figure 7: Write completion order. Block write offset (GB) vs. time (s); blocks are written in parallel and complete out of order.]
Write behind helps with buffering: there is no I/O in the write "critical path". There is a lot of jitter around 6 seconds; the biggest gap is 1.5 GB.
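Write behind can be sketched with a queue and a background flusher thread: the client's write returns as soon as the block is buffered, and the slow store to Hydra happens off the critical path. The queue-based structure is an assumption for illustration, not HydraFS's actual internals:

```python
import queue
import threading

pending = queue.Queue()   # write-behind buffer
flushed = []              # stands in for blocks persisted to Hydra

def write(data):
    # Returns as soon as the block is queued: no I/O in the critical path.
    pending.put(data)

def flusher():
    # Background thread performing the actual (high-latency) block stores.
    while True:
        block = pending.get()
        if block is None:     # sentinel: shut down the flusher
            break
        flushed.append(block)
```

Because the flusher runs independently, blocks may complete out of order, which is the jitter visible in the completion-order figure.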
[Figure: cumulative distribution of event lifetimes (time in ms); the accompanying text discusses admission control, which HydraFS uses to keep the system from wasting buffer space.]
✤ Questions?
✤ Comments?
✤ email: buck@soe.ucsc.edu
✤ Paper: http://www.usenix.org/events/fast10/tech/full_papers/ungureanu.pdf
✤ Block Write
✤ Block Read