
Archiving for Indies

Creating Long-Term Archives for Independent Digital Cinema

Torrey Loomis
Silverado Systems, Inc.

June 2010
Silverado Studios
In association with Silverado Systems, Inc.

Intro

I read a quote a while back from a document called “The Digital Dilemma:
Strategic Issues in Archiving and Accessing Digital Motion Picture Materials”:

“...the annual cost of preserving a 4K digital master is $12,514...”

What?

I thought that statement was an error. How was it possible that archiving a 4K
digital cinema project could cost $12,514 just one time--let alone on an annual
basis?

Slow down--I figured I had better read the paper first and then opine later. I am
glad I did. Produced by the Science and Technology Council of the Academy of
Motion Picture Arts and Sciences, “The Digital Dilemma” is one of the most
thoroughly researched and well-written documents on the topic of archiving
digital cinema content. It is de rigueur reading for anyone involved in archiving
assets in any digital cinema production.

Subjects include:

• What is the difference between a digital library and a digital archive?
• How much content should be archived?
• What are the costs involved in a proper archive operation?
• Who are the other players in media archiving, and what can we learn
from them?
• Are there differences between archiving film and digital content?
• Are standards developed to provide interoperability between studios?
• Is there a storage medium preferred over others?

These topics and more are covered in this astounding reference document. It's
not the purpose of our paper to reiterate the complete findings of “The Digital
Dilemma.” Instead, we update a few of the assumptions used in the document
since it was first published in 2007 (research began in 2005) and distill the
content into some salient workflows that independent digital cinema productions
can use today at significantly lower cost than first projected.

Before you continue with this paper, I recommend you register and download a
free PDF copy of “The Digital Dilemma” from the AMPAS website:

http://bit.ly/cTDsJ1

Finally, the name of this paper was inspired by Mike Curtisʼ great independent
cinema blog, HD for Indies: http://www.hdforindies.com

This article is targeted towards independent digital content creators and
producers, but there is no reason why this knowledge cannot also be scaled to
larger facilities and productions.

First...a teaser

Photo above depicts the complete contents of the feature film “Rogue River.”

http://www.roguerivermovie.com

“Rogue River” is a feature produced by KeJo Productions of Roseville, CA:

http://www.kejopro.com

Shot on the RED ONE digital camera platform, the total contents of principal
photography fit neatly onto seven 800 GB-capacity LTO-4 tape cartridges. These
tapes act as “digital negatives” which are set aside for safekeeping while a
duplicate of the film's content is actively being used in editorial.

When editorial is finished, additional LTO-4 tapes will be produced that contain
other content generated during post-production such as visual effects shots,
audio and music files, QuickTime trailers, and other production assets such as
PDF files of scripts, notes, and other documents.

Background and Assumptions

Let's go back to the quote that inspired me to research and write this
document:

“...the annual cost of preserving a 4K digital master is $12,514...”

Why did this surprise me so much? First--I'm not a film guy. My day job includes
designing workflows for all-digital projects, so I am used to SD, HD, and 4K
productions. When I saw the words “4K digital master” I initially assumed it
was aimed at the most widely used 4K acquisition format today: the RED ONE.

However, it was not. When “The Digital Dilemma” was written, the term 4K
generally referred to a frame size (4096 x 2160) achieved by scanning 35mm film
rather than to an acquisition format. Remember that “The Digital Dilemma” was
first released in 2007, the same year that RED Digital Cinema released its first
batch of RED ONE cameras to owners. Until that point, 4K acquisition was not
possible for the average independent shooter--especially at the ultra-low data
rates that RED ONE allowed (about 28 MB/s).

The authors of “The Digital Dilemma” assumed 4K frames of approximately 40 MB
each. At that size, one second of uncompressed film scanned at 4K could easily
soak up 1 GB of storage. The authors projected the cost of archiving an entire
4K feature this way:

“Based on an annual cost of $500 per terabyte of fully managed storage of 3
copies of an 8.3 terabyte 4K digital master.”
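
To make the arithmetic behind these figures concrete, here is a minimal sketch. The 40 MB frame size, the 28 MB/s RED ONE rate, and the $500 / 3 copies / 8.3 TB inputs come from the text above; the 24 frames-per-second rate is my assumption, not a number from “The Digital Dilemma.”

```python
# Inputs quoted in the text above.
FRAME_MB = 40              # approximate size of one uncompressed 4K frame
RED_ONE_MB_PER_S = 28      # approximate RED ONE data rate
COST_PER_TB_YEAR = 500     # fully managed storage, dollars per TB per year
COPIES = 3
MASTER_TB = 8.3

# Assumption (not from the source): standard 24 fps cinema frame rate.
FPS = 24

# Data generated by one second of uncompressed 4K frames.
uncompressed_mb_per_s = FRAME_MB * FPS

# Annual cost of keeping three managed copies of the master.
annual_cost = COST_PER_TB_YEAR * COPIES * MASTER_TB

print(f"Uncompressed 4K: {uncompressed_mb_per_s} MB per second")
print(f"Ratio to RED ONE: {uncompressed_mb_per_s / RED_ONE_MB_PER_S:.0f}x")
print(f"Annual archive cost: ${annual_cost:,.0f}")
```

Under these assumptions one second of footage comes to 960 MB, which is where the "1 GB per second" rule of thumb comes from. The annual cost works out to roughly $12,450, in line with the quoted $12,514; the small gap presumably reflects rounding in the published 8.3 TB figure.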

A calculation like this was perfectly reasonable in 2007, but the numbers would
change radically over the next few years as RED and other manufacturers began to
ramp up competition in 4K acquisition.

For the purposes of “Archiving for Indies,” the following assumptions apply to
independent cinema producers:

• All assets are “born digital”
• Any asset not digital is scanned or converted to a digital asset
• Keep everything--discard nothing
• Videotape and film are not part of production--using fully digital tapeless
workflows
• Assume that the 4K acquired will have some level of compression (e.g.
REDCode)
• Insurance requirements mandate that productions make multiple “digital
negatives” of their tapeless files
• Everything is standardized on LTO (Linear Tape Open) cartridges
• Long-term archive is NOT to hard drive
• You WILL have to migrate your data to new formats in the future, and
that cost should be factored into your overall TCO (total cost of ownership)
• Your TCO is affected not just by your media costs, but also by other costs
such as labor, utilities, system hardware and software, upgrades, personnel
training, and storage costs.

Defining the Archive

What exactly is an archive? And what is a library? The authors of “The Digital
Dilemma” define them as the following:

“Working Library” is a broad term for elements that are generally kept on
hand for distribution purposes.

“Archival” is defined as storage of the master elements from which all
downstream distribution materials can be created over a 100-year timeframe.

If you have successfully moved your tapeless media files onto your computer or
RAID-protected SAN, you have a working library. However, a RAID-protected
volume is completely inadequate as a bona fide “archive” in the strictest sense of
the definition.

At this point, we should highlight some very important things. NO technology
that exists today can fulfill the definition of an “archive” that can be
trusted to last and be read 100 years from now. Technology is moving so fast that
new formats are being developed that will take the place of the standards in use
today. Given this, the practice of data migration is not an option but a
requirement. You will need to recognize when it's time to migrate your data from
one format to another.

To be fair, this process will become easier over time. Using the example of
“Rogue River” described above, let's assume the production winds up with a total
of ten LTO-4 tapes that need migration. The natural migration path for this data
is to LTO-5.

Since LTO-5 will accommodate 1.5 TB of data per tape, it's safe to assume that
migrating ten 800 GB LTO-4 tapes to the new format will require approximately
five or six LTO-5 tapes. When LTO-6 is released in a few years, the storage
density is assumed to increase to 3.0 TB per tape. Hence, “Rogue River” on LTO-6
will only need two or three cartridges.
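
The migration math can be sketched in a few lines. This is only an illustration: the helper name `cartridges_needed` is hypothetical, the capacities are the ones quoted above (including the assumed 3.0 TB for LTO-6), and the sketch takes the worst case in which all ten LTO-4 tapes are completely full.

```python
import math

def cartridges_needed(total_gb: int, cartridge_gb: int) -> int:
    """Smallest whole number of cartridges that can hold total_gb."""
    return math.ceil(total_gb / cartridge_gb)

# Worst case: ten LTO-4 tapes, each filled to its 800 GB capacity.
rogue_river_gb = 10 * 800

lto5_tapes = cartridges_needed(rogue_river_gb, 1500)  # LTO-5: 1.5 TB per tape
lto6_tapes = cartridges_needed(rogue_river_gb, 3000)  # LTO-6: assumed 3.0 TB

print(f"LTO-5 cartridges: {lto5_tapes}")
print(f"LTO-6 cartridges: {lto6_tapes}")
```

With full tapes this gives six LTO-5 or three LTO-6 cartridges. If the original tapes are only partly filled, the lower counts mentioned above (five and two) become possible.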

IBM and other partners are working on new tape technology that can extend
capacities up to 35 TB per tape, so tape is very likely to remain a durable,
long-lived technology.

One distinction here--we are not talking about videotape.

LTO (Linear Tape Open) is a data cartridge that can hold any type of digital file.
You can think of an LTO tape the way you would think of a hard drive: if a file
can be stored on a hard drive, it can be archived to LTO.

Why not optical disc? Simple--the largest capacities of optical disc today are
dual-layer Blu-ray discs at approximately 50 GB per disc. It would take 16 Blu-ray
discs to hold the same amount of data as a single 800 GB LTO-4 cartridge.

What about hard drives? Magnetic hard drives have serious shelf stability issues
that make them unsuitable for long-term storage. The authors of “The Digital
Dilemma” explain further:

“It should be noted that magnetic hard drives are designed to be “powered
on and spinning,” and cannot just be stored on a shelf for long periods of
time. The drivesʼ internal lubrication must be occasionally redistributed
across the data recording surface through normal operation of the drive,
otherwise they can develop “stiction” problems where internal components
mechanically lock up.”

The pre-eminent Final Cut Pro expert Larry Jordan has done a significant amount
of research on the stability of hard drives that sit unused on a shelf:

“Magnetic signals recorded on a hard disk are designed to be refreshed
periodically. If your hard disks stay on, this happens automatically. However,
if you store your projects to a removable hard drive, then store that hard drive
on a shelf, unattached to a computer, those magnetic signals will fade over
time... essentially, evaporating.”

“...the life-span of a magnetic signal on a hard disk is between a year and a
year and a half. The issue is complex, as you'll see, but this is a MUCH shorter
shelf-life than I was expecting.”

“The way to keep the files on your hard disks safe is to connect the hard
drive to your computer every six months or so and, ideally, copy all the
files from one drive to another.”

You can read more from Larryʼs article here: http://bit.ly/cac4kj

Essentially, if you want the data on your hard drives to remain intact, you are
required to do a bit-level data migration or surface scan every six months.
Rotating dozens or hundreds of hard drives every six months is hardly a recipe
for a stable archive.

What about tapeless media such as CF (Compact Flash) cards and SSDs (solid-state
drives)? Memory-based storage technologies such as these lack moving parts,
which increases their overall durability, but their high cost and low storage
densities currently make them unsuitable for long-term archive.

The RAID question


Why shouldn't you just leave your media on a RAID volume? Lots of people assume
that keeping data on a RAID 5 volume is a perfectly legitimate way to keep things
around a long time. Let's back up and explain what RAID is, and why this isn't a
viable option for long-term archive.

RAID stands for redundant array of inexpensive (or independent) disks. RAID
systems are sophisticated systems that combine multiple hard drives with a
specialized controller that syncs them together for speed, redundancy, or both.

There are different RAID schemas. Some can be managed on computer systems
with software only. Others are hardware-based because the compute resources
needed to drive certain RAID arrays are quite high.

Software RAID

RAID 0 is called a stripe. It's a software-based array that takes each drive in
the array and “stripes” them together as one large volume. The volume gains
speed as each additional drive is added. However, there is no redundancy. If you
lose one drive, your data is totally compromised.

RAID 1 is called a mirror. You can take two drives and arrange them in a RAID 1
volume and you essentially are writing data to both drives at the same time. The
data on each is identical, so if you lose one drive you can swap it for a brand new
one and the volume is rebuilt from the data on the remaining good drive.
However, there is no speed boost with this arrangement.

Hardware RAID

What if you could have the best of both worlds? Combine the security of RAID 1
with the speed of RAID 0? That is the benefit of a hardware-based RAID format.
They arenʼt inexpensive, but hardware RAID controllers do some very heavy
computational lifting.

The most common hardware-based RAID formats are 3, 5, and 6.

RAID 3 is an array that takes data and spreads it across multiple drives like a
RAID 0. The difference is that a single drive is set aside as parity storage. If
you lose a drive, then your data can be rebuilt from the remaining data drives
plus the parity drive. RAID 3 can stand the loss of any single drive.
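
For readers curious how a parity drive makes a rebuild possible, here is a toy sketch. XOR is the standard parity operation for RAID 3 and RAID 5, although this paper does not go into that detail. The byte strings stand in for drive contents and are purely illustrative; a real controller works on stripes in hardware, not on Python bytes.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three illustrative "data drives" holding one equal-sized block each.
data_drives = [b"REEL", b"ONE!", b"4KRD"]

# The dedicated parity drive stores the XOR of all data blocks.
parity_drive = xor_blocks(data_drives)

# Simulate losing the second data drive, then rebuild its contents
# from the surviving data drives plus the parity drive.
survivors = [data_drives[0], data_drives[2], parity_drive]
rebuilt = xor_blocks(survivors)

print(rebuilt == data_drives[1])
```

The rebuild succeeds because XOR-ing the parity with the surviving blocks cancels them out and leaves only the missing block. With two drives missing there is not enough information left, which is why single-parity arrays tolerate only one failure.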

RAID 5 is similar to RAID 3; however, parity data is evenly distributed across
all drives rather than allocated to a single drive as in RAID 3. The array is not
destroyed by a single drive failure--and because the parity is distributed, the
RAID 5 volume is faster at random reads and writes. RAID 5 isn't bottlenecked by
writing parity data to a single drive.

RAID 6 is similar to RAID 5. However, RAID 6 provides fault tolerance against
failures of two drives, not just one. As drive capacities get larger and RAID
rebuilds take longer, fault tolerance becomes a critical factor.

RAID volumes can also be assembled into larger RAID 50 and RAID 60 arrays.
Adding a “0” to the end of RAID 5 or RAID 6 simply means you take two distinct
hardware RAID volumes and combine them into a RAID 0 using software like
Disk Utility.

A RAID 50 or 60 has the benefit of RAID 0 speed with the fault tolerance of RAID
5 or 6. If a drive dies inside a RAID 50 or 60, you can pull it out and replace it and
the hardware RAID controller inside the affected RAID unit tackles the task of
rebuilding the data on that drive. Under a RAID 50, you can lose a drive on each
volume and your data remains intact. A RAID 60 provides fault tolerance against
loss of four drives (two on each side).

Now that you have an understanding of what RAID is, here is why it's a very poor
choice for a permanent archive:

1. RAID systems require power--otherwise, a RAID on the shelf is subject to the
same time stresses that afflict standalone hard drives when sitting on a shelf:
magnetic deterioration and stiction issues when not spun up for long periods of
time.

2. RAID systems are very costly in terms of $/GB. A 32.0 TB RAID from major
manufacturers can cost $15,000. When you add in a Mac Pro as the controller,
plus fibre card and all accessories needed, your system can easily exceed
$25,000.

Using 32 TB of online storage for archive would cost $.78 per GB, and that
doesn't include utility costs for keeping the system powered on. A set of LTO-5
backups for 32 TB would cost $.32 per GB--and that includes the entire LTO-5
backup system. If you were just accounting for media costs, the LTO-5 price
would be about $.08 to $.12 per GB. And LTO media does not require any
electrical power for storage, so your utility charges are lower for this type of
storage.
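
The per-GB comparison can be checked with a short sketch. The $25,000 RAID figure comes from the text above; the $7075 LTO-5 system cost and the $150 tape price are the figures used in Appendices A and C. Decimal units (1 TB = 1000 GB) and whole, fully used tapes are my assumptions.

```python
import math

GB_PER_TB = 1000          # assumption: decimal units
ARCHIVE_TB = 32

RAID_SYSTEM_COST = 25_000  # RAID plus Mac Pro, fibre card, and accessories
LTO_SYSTEM_COST = 7075     # Appendix A total
TAPE_PRICE = 150           # Appendix C average per LTO-5 cartridge
TAPE_GB = 1500             # LTO-5 capacity

archive_gb = ARCHIVE_TB * GB_PER_TB

# Online RAID storage, hardware only (no utilities).
raid_per_gb = RAID_SYSTEM_COST / archive_gb

# LTO-5: the drive system plus enough whole tapes to hold the archive.
tapes = math.ceil(archive_gb / TAPE_GB)
lto_per_gb = (LTO_SYSTEM_COST + tapes * TAPE_PRICE) / archive_gb

# LTO-5 media alone.
media_per_gb = TAPE_PRICE / TAPE_GB

print(f"RAID:       ${raid_per_gb:.2f} per GB")
print(f"LTO system: ${lto_per_gb:.2f} per GB ({tapes} tapes)")
print(f"LTO media:  ${media_per_gb:.2f} per GB")
```

Under these assumptions the sketch lands on the same figures as the text: about $.78 per GB for RAID, about $.32 per GB for the complete LTO-5 setup, and about $.10 per GB for media alone.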

3. If you filled up an entire RAID system with archived media, you need to
account for the loss of that storage for other projects. If your RAID is filled
up, you can't load new media and can't work on new projects.

4. One of the biggest concerns with newer-generation RAID systems is the sheer
size of the drives. With drive capacities reaching 2.0 TB each (and 3.0 TB coming
later this year), it takes an extraordinary amount of time to build RAID volumes.

When a drive dies, you need to rebuild the RAID volume after the dead drive is
removed and a new drive is inserted. With smaller volumes, this doesn't take too
long. However, with larger capacity drives this can take an eternity. The
dangerous thing about a RAID rebuild is that it is very disk-intensive. Every
drive is taxed to move data around to rebuild the volume. If your dead drive was
part of a bad batch from the manufacturer, then the other drives in your system
likely came from that batch, too, and they are being stressed at exactly the
moment you can least afford another failure.

Some users have reported 100+ hours to rebuild their RAID volumes after a drive
has died. Under RAID 5, if you lose another drive while rebuilding your RAID
volume--all your data is gone.

That is one of the primary reasons for RAID 6: it can sustain a dual-drive failure
and your data is still intact.

RAID systems have an outstanding place in the postproduction chain: immediate
access to large pools of online material with extremely high data rates. Because
these systems require periodic review and service, they are not suited for
low-maintenance operations like long-term archive. Further, their cost structure
makes long-term archive on RAID systems cost prohibitive. Finally, the potential
for data loss during a RAID rebuild with larger capacity drives means that RAID is
not a “sure thing” in terms of bulletproof archive solutions.

That leaves LTO tape as the best option for creating durable long-term archives
with a defined forward migration path and reasonable cost structure. In the pages
that follow, we'll define a workflow for long-term LTO archives and nail down hard
costs for initial build-out as well as costs spread over the number of tapes
generated.

For more information on LTO-5 systems, you can refer to the following page at
Silveradoʼs website:

http://silverado.cc/shop/product.php?productid=1474

Appendix A: LTO Workflow

Mac Pro-based LTO-5 Archive System using TOLIS Group BRU Producer's Edition

The following solution details the basic cost of implementing a complete single
tape drive archiving solution using an Apple Mac Pro, TOLIS Group BRU Producer's
Edition software, an HP Ultrium LTO-5 drive (read compatible with LTO-3 and
read/write compatible with LTO-4 and LTO-5) and an ATTO H680 card.

Please note that manufacturer MSRP is listed for pricing on items. A monitor is
not included here since those are generally readily available.

Item: Apple Mac Pro
Quantity: 1
MSRP: $2499
Notes:
• One 2.66GHz Quad-Core Intel Xeon
• 3GB (3x1GB)
• 640GB 7200-rpm Serial ATA 3Gb/s
• NVIDIA GeForce GT 120 512MB
• One 18x SuperDrive
• Apple Mouse
• Apple Keyboard with Numeric Keypad (English) and User's Guide

Item: ATTO H680 Host Bus Adapter
Quantity: 1
MSRP: $495
Notes:
• The ExpressSAS H680 provides high-speed 6Gb/s performance at 600MB/s per
port by utilizing a serial, point-to-point architecture in addition to PCI
Express 2.0 bus technology.
• ExpressSAS 6Gb/s HBAs are engineered for demanding IT and digital media
applications which require more performance than 3Gb/s SAS/SATA can provide.
• The ExpressSAS H680 features eight external ports and allows connections to
256 end-point devices.

Item: HP Ultrium 3000 LTO-5 Drive
Quantity: 1
MSRP: $3383
Notes:
• HP StorageWorks LTO-5 Ultrium 3000 SAS External Tape Drive
• Sustained transfer rate: 1TB/hr (compressed)
• Buffer size: 256MB
• 6 Gb/sec SAS host interface
• 5.25 inch half-height
• AES 256-bit encryption

Item: TOLIS Group BRU Producer's Edition Software
Quantity: 1
MSRP: $499
Notes:
• The “iTunes” of backup--BRU Producer's Edition™ from TOLIS Group creates
easy, drag-and-drop session archives.
• Engineers and users have a reliable, flexible, and easy-to-use solution to
protect key creative digital assets regardless of computer technical knowledge.

Item: TOLIS Group BRU Producer's Edition Support Plan
Quantity: 1
MSRP: $199
Notes:
• Post warranty technical support.
• Provides continuing technical support beyond the initial 30-day period from
date of product purchase.
• Provides unlimited access to the support group via telephone, email, and fax.
• Free product updates, as may become available, are also included in the
service.
• TOLIS support team is staffed by product engineers.

Total: $7075

Appendix B: Media Costs

Average LTO-4/LTO-5 Prices per manufacturer

The following chart lists the average MSRP of LTO-4 and LTO-5 media costs per
manufacturer.

Manufacturer            LTO-4 Costs    LTO-5 Costs

Sony                    $45.43         $140.32
Maxell                  $45.15         $220.00
Imation                 $48.93         $157.59
HP                      $44.59         $161.70
Fujifilm                $46.20         $227.11
TDK                     $44.66         $180.00
Quantum                 $48.58         $149.00

Average Price           $46.22         $176.53
Average Price per GB    $0.06          $0.12
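
The two summary rows can be recomputed from the per-manufacturer prices. This is a small sketch rather than part of the original chart: it assumes the native capacities quoted elsewhere in this paper (800 GB for LTO-4 and 1.5 TB for LTO-5) and rounds to the nearest cent.

```python
from statistics import mean

# (LTO-4 price, LTO-5 price) per manufacturer, copied from the chart above.
prices = {
    "Sony": (45.43, 140.32),
    "Maxell": (45.15, 220.00),
    "Imation": (48.93, 157.59),
    "HP": (44.59, 161.70),
    "Fujifilm": (46.20, 227.11),
    "TDK": (44.66, 180.00),
    "Quantum": (48.58, 149.00),
}

LTO4_GB = 800    # native LTO-4 capacity
LTO5_GB = 1500   # native LTO-5 capacity

lto4_avg = mean(lto4 for lto4, _ in prices.values())
lto5_avg = mean(lto5 for _, lto5 in prices.values())

print(f"Average price:        ${lto4_avg:.2f}  ${lto5_avg:.2f}")
print(f"Average price per GB: ${lto4_avg / LTO4_GB:.2f}  ${lto5_avg / LTO5_GB:.2f}")
```

Run as written, this agrees with the chart: $46.22 and $176.53 on average, or about $0.06 and $0.12 per GB.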



Appendix C: Cost Deltas

Your backup system's overall cost goes down the more you back up.

Here is an overview of how much the system costs based on number of backups
made.

[Chart: Price Per GB in $ USD (vertical axis, 0 to 0.6) versus amount of data
archived (horizontal axis, 16.0 TB to 160.0 TB)]

Initial costs for the LTO-5 system are about $7075. Tape costs are averaged
using $150 per LTO-5 cartridge.

Over the life of the system, costs would be $.54 per GB if you backed up 16.0 TB
(approximately 11 tapes).

On the other end of the spectrum--if you archive 160.0 TB (about 107 tapes) then
prices fall to $.14 per GB.

Not shown on the chart above:

320.0 TB = $.12/GB (approximately 214 tapes)
640.0 TB = $.11/GB (approximately 427 tapes)
1.0 PB = $.10/GB (approximately 667 tapes)

The above costs do not include labor or training costs. They are equipment and
media only.
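
For anyone who wants to rerun these numbers with their own prices, the cost curve can be sketched as follows. The system cost is the Appendix A total and the tape price is the $150 average used above. Decimal units (1 TB = 1000 GB, 1 PB = 1000 TB), whole tapes, and truncating the result to whole cents are my assumptions; they happen to reproduce the figures quoted in this appendix, but they are not stated in the original calculation.

```python
import math

# Appendix A line items: Mac Pro, ATTO H680, HP LTO-5 drive, BRU, support plan.
SYSTEM_COST = 2499 + 495 + 3383 + 499 + 199   # 7075 dollars
TAPE_PRICE = 150      # dollars per LTO-5 cartridge
TAPE_GB = 1500        # LTO-5 capacity
GB_PER_TB = 1000      # assumption: decimal units

def tapes_needed(total_tb: int) -> int:
    """Whole LTO-5 cartridges required to hold total_tb of data."""
    return math.ceil(total_tb * GB_PER_TB / TAPE_GB)

def cents_per_gb(total_tb: int) -> int:
    """Equipment plus media cost per GB, truncated to whole cents."""
    total_gb = total_tb * GB_PER_TB
    total_cost = SYSTEM_COST + tapes_needed(total_tb) * TAPE_PRICE
    return (total_cost * 100) // total_gb

for tb in (16, 32, 64, 96, 128, 160, 320, 640, 1000):
    print(f"{tb:>5} TB  {tapes_needed(tb):>4} tapes  ${cents_per_gb(tb) / 100:.2f}/GB")
```

Changing `TAPE_PRICE` or `SYSTEM_COST` shows how sensitive the curve is to each input: the fixed system cost dominates small archives, while the per-tape price sets the floor for large ones.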