Filesystem Backup to the Cloud
Paper: Michael Vrable, Stefan Savage & Geoffrey M. Voelker
Slides: Joe Buck, CMPS 229 - Spring 2010
Thursday, May 27, 2010
Introduction
Backup
• Simple interface
• All logic is in the client
• Minimize resource usage & cost
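The "simple interface" above can be pictured as a storage service that only stores and retrieves whole objects, with all backup logic living in the client. The following is a minimal sketch, assuming a four-operation interface (put, get, list, delete). The class `InMemoryStore`, the function `upload_backup`, and the `segments/` and `snapshots/` name prefixes are hypothetical and only stand in for a real storage service.

```python
class InMemoryStore:
    """Hypothetical stand-in for a thin storage service (whole objects only)."""

    def __init__(self):
        self._objects = {}

    def put(self, name, data):
        # Store a complete object under a name, replacing any previous copy.
        self._objects[name] = bytes(data)

    def get(self, name):
        # Retrieve a complete object; no partial reads are assumed.
        return self._objects[name]

    def list_objects(self, prefix=""):
        # Enumerate stored object names that start with the given prefix.
        return sorted(n for n in self._objects if n.startswith(prefix))

    def delete(self, name):
        del self._objects[name]


def upload_backup(store, snapshot_name, segments):
    """Client-side logic: upload each segment, then a root that names them."""
    for seg_name, payload in segments.items():
        store.put("segments/" + seg_name, payload)
    root = "\n".join(sorted(segments)).encode()
    store.put("snapshots/" + snapshot_name, root)
```

Because the store never interprets what it holds, any service offering these four operations could be substituted without changing the client.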
Example Backup
Cumulus Backup Format
[Figure: snapshot roots; labeled: Monday]
Example Backup - cont.
Cumulus Backup Format
Example Backup - cont.
Aggregation: Minimizing Per-Block Costs
[Figure: segments and snapshot roots; labeled: Monday, Tuesday]
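Aggregation groups many small blocks into larger segments, so that per-object costs are paid once per segment rather than once per block. The following is a rough sketch, assuming a simple fill-until-full packing policy. The function `pack_blocks` and its `segment_size` parameter are illustrative and are not taken from the Cumulus code.

```python
def pack_blocks(blocks, segment_size):
    """Greedily pack (block_id, data) pairs into segments.

    Each segment holds at most segment_size bytes, except that a single
    block larger than segment_size is placed in a segment of its own.
    Returns a list of segments, each a list of (block_id, data) pairs.
    """
    segments, current, used = [], [], 0
    for block_id, data in blocks:
        # Close the current segment if this block would overflow it.
        if current and used + len(data) > segment_size:
            segments.append(current)
            current, used = [], 0
        current.append((block_id, data))
        used += len(data)
    if current:
        segments.append(current)
    return segments
```

With a 1000-byte limit, three 400-byte blocks followed by one 2000-byte block end up in three segments: the first two blocks together, the third alone, and the oversized block alone.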
Implementation Notes
Analysis
Evaluation Traces
Trace Data
                      Fileserver    User
Duration (days)       157           223
Entries               26673083      122007
Files                 24344167      116426
File Sizes
  Median              0.996 KB      4.4 KB
  Average             153 KB        21.4 KB
  Maximum             54.1 GB       169 MB
  Total               3.47 TB       2.37 GB
Update Rates
  New data/day        9.50 GB       10.3 MB
  Changed data/day    805 MB        29.9 MB
  Total data/day      10.3 GB       40.2 MB
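The Total data/day row is the sum of the New data/day and Changed data/day rows. A quick check of that arithmetic, assuming 1 GB = 1000 MB for the conversion (the slide does not say which convention it uses); the function name is illustrative.

```python
def total_per_day_mb(new_mb, changed_mb):
    """Daily update volume in MB: newly created data plus modified data."""
    return new_mb + changed_mb


# Fileserver trace: 9.50 GB new plus 805 MB changed, reported in GB.
fileserver_total_gb = total_per_day_mb(9.50 * 1000, 805) / 1000
# User trace: 10.3 MB new plus 29.9 MB changed.
user_total_mb = total_per_day_mb(10.3, 29.9)

print(round(fileserver_total_gb, 1), round(user_total_mb, 1))
```

Rounded to one decimal place, both results agree with the table's totals.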
Evaluation
Benefit of Cleaning
[Figure: Storage Utilization (0.5 to 1) vs. Time (days, 0 to 200); series: With Cleaning, No Cleaning]
How Much Data is Transferred?
[Figure: data transferred vs. Cleaning Threshold (0 to 1); series: 1 MB Segments]
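The cleaning threshold can be read as a utilization cutoff: a segment whose fraction of still-referenced bytes falls below the threshold becomes a candidate for cleaning. The following is a minimal sketch under that assumption. The function names, the input layout, and the strict less-than comparison are illustrative choices, not details taken from the slides.

```python
def utilization(live_bytes, total_bytes):
    """Fraction of a segment's bytes still referenced by some snapshot."""
    return live_bytes / total_bytes if total_bytes else 0.0


def segments_to_clean(segments, threshold):
    """Select segments whose utilization is below the cleaning threshold.

    segments maps a segment name to a (live_bytes, total_bytes) pair.
    Returns the selected names in sorted order.
    """
    return sorted(
        name
        for name, (live, total) in segments.items()
        if utilization(live, total) < threshold
    )
```

In this sketch a threshold of 0 disables cleaning entirely, while a threshold near 1 rewrites almost every segment that has lost any data. That trade-off between wasted storage and extra transfer is what the threshold sweep explores.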
What is the Storage Overhead?
Storage Overhead
[Figure: Overhead vs. Optimal (%); series: 16 MB Segments, 4 MB Segments]
What Settings Minimize Total Cost?
Cost
[Figure: Cost Increase vs. Optimal (%) vs. Cleaning Threshold (0 to 1); series: 16 MB, 4 MB, 1 MB, 512 kB, and 128 kB Segments]
Simulation Results
Prototype Results
Summary
• Cumulus is a cost-effective tool for network backup
• Tunable metrics evaluated
• Low-overhead backup is feasible on top of a simple interface
• Limited Deduplication
My Thoughts
• Client-side cost?
• Segmentation...
More Material
• Code available
• http://sysnet.ucsd.edu/projects/cumulus/
• FAST ’09 Presentation
• http://www.usenix.org/media/events/fast09/tech/videos/vrable.mov