Filesystem Backup to
the Cloud
Paper: Michael Vrable, Stefan Savage &
Geoffrey M. Voelker
Slides: Joe Buck, CMPS 229 - Spring 2010
1
Thursday, May 27, 2010
Introduction
The cloud is new and shiny, so we need to rethink solutions in light of its characteristics.
The spectrum runs from thin cloud (S3) to thick cloud (Salesforce.com, Google Docs).
Interesting systems work exists in the asymmetries.
Backup
• Simple interface
• All logic is in the client
• Minimize resource usage & cost
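A minimal sketch of what "simple interface, all logic in the client" could look like. The class name `ThinStore` and its four whole-object operations are illustrative assumptions, not the paper's code; an in-memory dictionary stands in for a thin-cloud service such as S3.

```python
from typing import Dict, List


class ThinStore:
    """Hypothetical in-memory stand-in for a thin-cloud object store."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, name: str, data: bytes) -> None:
        # Whole-object write only; the server offers no partial updates.
        self._objects[name] = bytes(data)

    def get(self, name: str) -> bytes:
        # Whole-object read; raises KeyError if the object is missing.
        return self._objects[name]

    def list_names(self, prefix: str = "") -> List[str]:
        # Return the stored object names that start with the prefix.
        return sorted(n for n in self._objects if n.startswith(prefix))

    def delete(self, name: str) -> None:
        # Deleting a missing object is treated as a no-op.
        self._objects.pop(name, None)
```

With a server this thin, everything else (segmenting, compression, deciding what to delete) has to live in the backup client.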
Segments are the units of operation. A segment can hold parts of files or multiple files.
Compression, etc. is applied to segments at the client.
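The segment idea can be sketched roughly as follows. This is a simplified illustration, not the Cumulus implementation: the helper names `pack_segments` and `compress_segment`, the greedy packing rule, and the use of zlib are all assumptions.

```python
import zlib
from typing import Iterable, List, Tuple

Block = Tuple[str, bytes]  # (hypothetical block name, block data)


def pack_segments(blocks: Iterable[Block], target_size: int) -> List[List[Block]]:
    """Greedily group blocks into segments of roughly target_size bytes."""
    segments: List[List[Block]] = []
    current: List[Block] = []
    current_size = 0
    for name, data in blocks:
        # Start a new segment when adding this block would overflow the current one.
        if current and current_size + len(data) > target_size:
            segments.append(current)
            current, current_size = [], 0
        current.append((name, data))
        current_size += len(data)
    if current:
        segments.append(current)
    return segments


def compress_segment(segment: List[Block]) -> bytes:
    """Compress a whole segment on the client before it is uploaded."""
    raw = b"".join(data for _, data in segment)
    return zlib.compress(raw)
```

Blocks from several small files can share one segment, and a large file can be spread over several segments if the caller splits it into blocks first.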
Analysis
Vrable, Savage, Voelker (UCSD) Cumulus: Filesystem Backup to the Cloud February 26, 2009 9 / 19
Analysis is based on traces.
Most of the numbers are based on the User trace
Evaluation
Benefit of Cleaning
[Figure: storage utilization versus time (days), with cleaning and with no cleaning]
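The cleaning idea behind this evaluation can be sketched as a threshold test on per-segment utilization. The function name `segments_to_clean` and the byte-count dictionaries are hypothetical, and the rule shown (clean any segment whose live fraction falls below the threshold) is an assumed reading of the cleaning threshold, not code from the paper.

```python
from typing import Dict, List


def segments_to_clean(
    live_bytes: Dict[str, int],
    total_bytes: Dict[str, int],
    threshold: float,
) -> List[str]:
    """Return segments whose utilization (live / total) is below threshold."""
    selected = []
    for seg, total in total_bytes.items():
        if total == 0:
            continue  # An empty segment has no defined utilization; skip it.
        utilization = live_bytes.get(seg, 0) / total
        if utilization < threshold:
            selected.append(seg)
    return sorted(selected)
```

A cleaner would then repack the remaining live data from the selected segments into new segments and delete the old ones, trading extra upload traffic for better storage utilization.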
[Figure: 1 MB segments, plotted against cleaning threshold]
What Settings Minimize Total Cost?
[Figure: cost increase vs. optimal (%) against cleaning threshold, for 16 MB, 4 MB, 1 MB, 512 kB, and 128 kB segments]
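One way to read "cost increase vs. optimal" is as a percentage over the cheapest setting tried. The sketch below assumes a simple monthly cost of storage plus upload; the function names, the cost formula, and every number in the test data are placeholders for illustration, not values from the paper or from any vendor's price list.

```python
from typing import Dict


def monthly_cost(stored_gb: float, uploaded_gb: float,
                 storage_price: float, upload_price: float) -> float:
    """Assumed cost model: pay per GB stored plus per GB uploaded."""
    return stored_gb * storage_price + uploaded_gb * upload_price


def increase_vs_optimal(costs: Dict[str, float]) -> Dict[str, float]:
    """Percentage by which each setting's cost exceeds the cheapest one."""
    best = min(costs.values())
    return {name: 100.0 * (cost - best) / best for name, cost in costs.items()}
```

Under this model, a higher cleaning threshold tends to lower the stored volume but raise the uploaded volume, so the cheapest setting depends on how the two prices compare.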
The paper tries to call out integrated solutions, but I think that is an apples-to-oranges
comparison, since all their limitations painted them into a corner.
Prototype Results
• Client-side cost?
• Segmentation...
They never seem to quantify the client-side cost of storing the metadata and block-hash
maps.
Segmentation seems like just chunking a tar file. Simply automatic network tuning? Per vendor?
More Material
• Code available
• http://sysnet.ucsd.edu/projects/cumulus/
• FAST ’09 Presentation
• http://www.usenix.org/media/events/fast09/tech/videos/vrable.mov