
Storage Basics

An introduction to the fundamentals of storage technology


Storage Basics
January 2009
Copyright Fujitsu Siemens Computers 2009
Text, editing, production: ZAZAmedia / Hartmut Wiehr
Printed in Germany.
Published by Fujitsu Siemens Computers GmbH, Mies-van-der-Rohe-Strasse 8, 80807 Munich, Germany
Contact: www.fujitsu-siemens.com/contact
All rights reserved. Subject to delivery and technical changes. The names reproduced in this document may be trademarks whose use by third parties for their own purposes could violate the rights of their owners.

Contents
Preface
Section 1  The information society - saving data and knowledge at new levels
Section 2  Tiered storage: intelligent information management in the company
Section 3  Online storage: disks and reliability
Section 4  Storage networks - spoilt for choice
Section 5  Backup & Restore: an unloved compulsory exercise
Section 6  Storage management - making complex storage networks manageable
Section 7  Virtualization - some catching up is necessary regarding storage topics
Section 8  The storage strategy of Fujitsu Siemens Computers and its partners
Forecast: Future storage trends
Remarks
Glossary
Partners

Preface
Dear Reader,
Coming to grips with the ever-growing flood of data is still the greatest challenge as far as storage is concerned. This statement no longer applies only to major customers and their data centers; it also affects SME businesses and even smaller companies. Coming to grips means secure data storage and accessibility with due regard to agreed quality standards (service levels) and at a reasonable cost (CAPEX and OPEX). Many of the new technologies such as ILM, deduplication, SSD, virtualization and thin provisioning (regardless of whether they are already established or still just hype) are eager to help here.

We advise our customers to develop a storage solution that suits them from this variety of established and new technologies. We have a comprehensive portfolio of best-in-class products in this regard, but it is neither our intention nor within our means to do everything ourselves. The way in which we monitor technology helps us to make the right choices here. Together with technology leaders in storage we have developed strategic partnerships; we integrate their products into solutions and provide the appropriate services. If we see that our customers have problems for which there is no suitable solution on the market, we develop our own products and solutions. The CentricStor solutions are prime examples of this.

In addition, we have begun not only to develop and implement storage solutions at our customers, but also to operate them ourselves on our customers' behalf. Fujitsu Siemens Computers is set to increase its investment in this managed storage business. Our managed storage customers receive an invoice and the pertinent report every month. The advantages are quite obvious: transparency as regards costs and performance, improved efficiency and return on investment. This also helps us to become better at selecting and developing products that enable storage solutions to be run on a cost-optimized basis and at defined quality standards. Thanks to this strategy, Fujitsu Siemens Computers has developed into one of the most successful providers of storage solutions in Europe.


The aim of this book is to provide you with an introduction to storage technologies and storage networks, and to highlight the value they add for your company. We also introduce some of our partners in the storage sector and provide a comprehensive overview of the current storage product portfolio of Fujitsu Siemens Computers. Our products and the associated services, delivered by us or our certified partners, are the basis for storage solutions that help you contend with the growing flood of data!

Yours,
Helmut Beck
Vice President Storage, Fujitsu Siemens Computers

Section 1

The information society - saving data and knowledge at new levels


There has never been a culture without information. Human development depends greatly on our ability to acquire information and to make it available to subsequent generations. Whenever there have been major inventions, for example the invention of script and printing as well as computers and the Internet, society has undergone a major change as well. Since the invention of the computer by Konrad Zuse, automated information processing and the associated electronic storage of information have affected an increasing number of areas of our lives. Although generating information has become much easier, it has also become easier to lose it. Saving, archiving and managing information is more important than ever before.

In 2008 the World Wide Web celebrated its 15th birthday and Google, the Internet search engine company, turned 10. Such milestones represent a new element in the history of computer technology: nowadays, anyone can simply click a mouse and browse through a vast, almost indescribable amount of information and entertainment data, anytime and anywhere. Search engines such as Google bring a certain amount of order to this knowledge chaos, and it was this particular invention by two Stanford students that revolutionized access to global knowledge within a very short time. [1] Accessing data quickly assumes that the data can be read and is available somewhere in the World Wide Web. Its readability is based on file formats, classifications and index or metadata which must be defined before the search engine can access it. Its existence is based on the fact that it is stored somewhere on electromagnetic storage systems. That is only possible using new, state-of-the-art technology which has been developed over the last 60 to 70 years, i.e. information technology (IT).


Original forms of information technology

Storing information and knowledge is not exactly a modern invention. Irrespective of the social forms of human coexistence, history has revealed certain constants in handling such knowledge and in its significance: even in early times, human groups passed on their experiences from generation to generation, and not just by word of mouth. They accumulated knowledge and traditions, first of all verbally and then using images and systematic symbols, such as Sumerian pictograms and Egyptian hieroglyphs. The alphabet used in European languages today arose only about 3,300 years ago, building on the writing of the Sumerians and later the Egyptians. That is a very short time when compared with the overall history of man, whose ancestors emerged in East Africa about 4 million years ago [2]. Human life organized itself around such information, some of it written: not just daily life but also state institutions and not least religion itself. A wide range of cultures thus existed around the world long before the Europeans discovered the rest of the world and subjected it to occidental standards.

In August 2008 the German weekly news magazine Der Spiegel looked closely at the effects of this flood of data on mankind and briefly reviewed the very beginnings of information technology. The first stages: the Sumerians' matchstick-like characters turned language into a standard; fixed in clay, they outlived their creators and became independent of time and space. Around 1300 B.C. the first wooden and bamboo strips appeared in China, and with them the first books; comprehensive information could now be moved. The Phoenicians developed the first syllabic script, and about 300 years later the Greeks turned that principle into the original alphabet. [3]

The progress of knowledge through technology took another leap some 3,000 years later, when Johannes Gutenberg invented copper matrices as the basis for the mass production of individual letters in about 1450. The first weekly papers appeared after 1600 in Strasbourg and Wolfenbüttel, and the first daily paper was published in Leipzig in 1659. Then things moved fast: the invention of the telegraph in the mid 19th century, later the telephone, then the typewriter, the LP, electromagnetic waves (Hertz) and then radio, film and television, until we finally reach the first computing machine developed by Konrad Zuse in 1938. The rest is history: Internet, World Wide Web and mobile phones.


Digitizing knowledge

The means and technologies used by man to document experience and knowledge, either for a short time or indeed long-term, have developed greatly throughout history, whereby the term knowledge is used here without any value attached to it, i.e. without an assessment of the content. However, the reason why humans and social groups pass on information is still the same. It is the advance of computers and the World Wide Web that has now moved society from an industrial one to one based on information. The production of goods in the classical sense has receded into the background, while services and information have established themselves as products in their own right. Distributing information and entertainment has become a separate, profitable business and has constantly changed the way in which society interacts [4]. More and more digital information is entered, saved and provided via networks. On the one hand, this growth is based on the amount of accrued company data. Accounting was always at the center of automation developments: some of the oldest clay tablets found in Persia contained book-keeping records, and the first computers were predominantly used for book-keeping purposes. Today IT has spread throughout companies like wildfire. More and more operational areas are now electronic, and each new sector produces an increasing amount of digital data. Depending on the business model, some companies today exist only virtually within computers, and that includes many Internet businesses. Classic companies started with solutions for ERP (Enterprise Resource Planning) and moved via CRM (Customer Relationship Management), SCM (Supply Chain Management), Data Warehousing and Business Intelligence to new areas in Web 2.0 and social media. It is frequently production and commercial data that is entered and processed in applications from SAP, Oracle or Microsoft. This structured data, which is systematically stored in database fields, can be easily accessed via queries. Its evaluation and interpretation, however, has become very complex due to the sheer quantity and constant expansion. This is why data warehousing exists, where the collected data is sorted and prepared according to business intelligence criteria. For example, airlines want to know, for marketing and flight planning purposes, how often their customers fly, where they fly to and which class they choose, i.e. business or economy. Databases alone do not provide such interpretations.


Changing information forms within companies

The Enterprise Strategy Group defines three phases. Phase one saw the automation of core processes with database-based applications, for example SAP R/3. Information became structured and transaction-oriented, hence the name transaction data. Phase two saw the structure of IT within a company change. The PC was invented and introduced to companies on a decentralized basis. PCs were then merged into workgroups which exchanged data via the first servers. That was the birth of Novell and also the starting point for developing Windows as a server operating system. And suddenly there were vast quantities of non-structured office files. The age of distributed files had started, and the required storage capacity was soon greater than that required for transaction data. Today we are at the beginning of the Internet data age: Web 2.0 applications such as social networks, wikis or blogs are also being used in companies. Each person is now not only a consumer of information but also an author of information, and so the amount of data that has to be stored has multiplied considerably. The digital footprint in the network is already enormous, and the end of this growth is not in sight. It can be expected that Internet data will soon leave all other sorts of data far behind.

The growth of non-structured data (Web 2.0)


Most of today's data is not structured. It comes from Office or e-mail programmes. Non-structured data mixes relevant and irrelevant content, and it is not easy to sort. That is the challenge: it is non-structured data that has to be managed correctly, so that important data can be archived on a long-term basis without any loss and unimportant data can be saved with a minimum of resources. The term compliance is used to describe all the data storage requirements defined by organizations and the state. The mountain of digital data is also growing because the old analog storage media in the consumer world are being replaced: voice, music, photos, TV and film are now digitally recorded and stored, resulting in gigantic quantities of data. Added to that comes the continuing conversion of existing analog media. Improvements in Internet streaming technology and in bandwidth will presumably enable the Internet to replace media such as CD-ROMs or DVDs for music and film. This will generate increased demand for hard disks and arrays.


How big is an Exabyte?


Kilobyte (KB): 1,000 bytes or 10^3 bytes. 2 Kilobytes: a typewritten page. 100 Kilobytes: a low-resolution photograph.

Megabyte (MB): 1,000,000 bytes or 10^6 bytes. 1 Megabyte: a small novel or a 3.5 inch floppy disk. 2-10 Megabytes: a high-resolution photograph taken by a digital camera. 3-4 Megabytes: a song compressed with MP3. 5 Megabytes: the complete works of Shakespeare. 10 Megabytes: a minute of high-fidelity sound. 10-20 Megabytes: a digital chest X-ray. 100 Megabytes: 1 meter of shelved books or two hours of compressed radio. 650 Megabytes: a CD-ROM.

Gigabyte (GB): 1,000,000,000 bytes or 10^9 bytes. 1 Gigabyte: a pickup truck filled with books. 3.2 Gigabytes: one hour of HDTV. 5 Gigabytes: the size of a typical movie stored on a single DVD. 20 Gigabytes: a good collection of Beethoven's works. 50 Gigabytes: the capacity of a Blu-ray disc. 100 Gigabytes: a library floor of academic journals or about 1,200 hours of downloaded MP3 music. 500 Gigabytes: the native capacity of the largest tape cartridge in 2005.

Terabyte (TB): 1,000,000,000,000 bytes or 10^12 bytes. 1 Terabyte: 50,000 trees made into paper and printed. 2 Terabytes: an academic research library. 10 Terabytes: the print collections of the U.S. Library of Congress, or the projected capacity of a magnetic tape cartridge in 2015. 600 Terabytes: the National Climatic Data Center (NOAA) database.

Petabyte (PB): 1,000,000,000,000,000 bytes or 10^15 bytes. 1 Petabyte: 3 years of EOS (Earth Observing System) data. 2 Petabytes: all U.S. academic research libraries. 20 Petabytes: the production of hard disk drives in 1995. 200 Petabytes: all printed material.

Exabyte (EB): 1,000,000,000,000,000,000 bytes or 10^18 bytes. 2 Exabytes: the total volume of information generated in 1999. 5 Exabytes: all words ever spoken by human beings. 9.25 Exabytes: the capacity needed to hold all U.S. phone calls in one year. 90 Exabytes: the estimated worldwide available digital storage capacity in 2010 for all media (disk, tape, optical).

Source: Horison Information Strategies, UC Berkeley study "How Much Information?", IDC
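The decimal steps in the table can be expressed in a few lines of code. The following is a minimal sketch, not taken from the book, that stores the powers of ten and prints an arbitrary byte count in the largest fitting unit; the unit names and example values simply mirror the table above.

# Decimal byte units (powers of 10), as used in the table above
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB"]

def human_readable(num_bytes):
    """Format a byte count in the largest decimal unit, e.g. 5e18 -> '5.00 EB'."""
    value, unit = float(num_bytes), UNITS[0]
    for next_unit in UNITS[1:]:
        if value < 1000:
            break
        value /= 1000.0
        unit = next_unit
    return "%.2f %s" % (value, unit)

if __name__ == "__main__":
    print(human_readable(650 * 10**6))   # a CD-ROM              -> 650.00 MB
    print(human_readable(5 * 10**18))    # all words ever spoken -> 5.00 EB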

The so-called Web 2.0, with its new interaction options for network participants, for example YouTube, MySpace, LinkedIn or Xing, will also result in huge quantities of data being stored by the providers responsible. Many of today's blade servers and storage arrays are being shipped to such companies. This development will intensify as new technologies expand, such as leased software (Software as a Service, SaaS) or cloud computing, where the user accesses programmes and data stored in giant data centers somewhere in the Internet cloud. Medium-sized companies and start-ups will enjoy low-priced options that enable them to use such a sophisticated infrastructure.


Amazon, with its large data centers, rents out computing and storage capacity to external customers. Of course, the appropriate network bandwidth must exist and the provider must be 100% reliable. It is clear that new technologies which are first used in the consumer environment will expand into the world of business IT. Risk analyses for particular situations are essential, especially when weighing security concerns against cost savings. The options of our information society have simply not yet been exhausted. New processes for transferring knowledge and storing information have joined the existing procedures [5]. Information technology has enormous potential but, just like all technical progress before it, it is simply a tool for specific purposes. Everything depends on how it is used and for which objectives. First of all, it has to be ensured that data storage itself becomes more reliable. The example of storage shows both the technological opportunities and the restrictions, and that is why this environment in particular sees a whole range of new fundamental inventions and gradual improvements.

Storage in such a world is thus gaining in significance:


1) The ever-increasing mountains of data first of all have to be saved, and where possible with optimized resources.
2) As data has to be available anytime and anywhere, the storage systems used must provide the data worldwide via the Internet around the clock. Data has to be saved several times on a redundant basis in order to ensure that it is not lost due to unforeseen circumstances.
3) An increasing amount of data with ever less structure will dominate everyday data storage: data management must therefore become more intelligent, so that data is saved according to its value (Information Lifecycle Management). This is the only way to meet compliance requirements.

Section 2

Tiered storage: intelligent information management in the company


Companies have looked at the topic of saving data, but with varying degrees of intensity. If something goes wrong, i.e. data disappears, is stolen, or the backup media cannot be recovered when hardware fails, people tend to keep quiet about it. It is rare for this type of situation to become public knowledge, as the company will try to protect its reputation or brand name. Yet almost all IT departments follow some sort of storage strategy, even if it is not well documented and does not follow specific regulations: they move their data from A to B or C in order to save costs, or because they do not need the data for a while. Most companies thus have both a basis and a need for a clear strategy when it comes to saving their digital treasures.

Almost all the forecasts and estimates about rapid data and storage growth have so far proven to be true, whatever doubts analysts and market observers may have voiced. Such estimates have often even proven to be too conservative. In particular, recent years have shown that, in addition to company IT with its storage requirements, other groups in society now digitize their analog information. These include film and video, voice and music recordings, medical X-rays (medical imaging), TV cameras in major cities and at security-conscious locations (security & surveillance), as well as the conversion from analog to digital radio for police, fire brigade and rescue services. An additional factor is the so-called social networks (social communities), such as YouTube or Facebook, with their enormous amounts of photos and videos. Digital information is being saved everywhere on a grand scale, including in data centers which are expanding daily. But is all this data really worth it? Do all those holiday, birthday and family snapshots, which used to bore us all at family slideshow evenings, really have to be saved on state-of-the-art technology for ever and ever? Does a company really have to save everything electronically, regardless of legal specifications? According to IDC, the growth of non-structured data (file-based), which is increasingly collected outside the company, will exceed that of structured data (block-based) for the first time in 2008. The balance within the company between structured data entered in databases or ERP applications and non-structured data resulting from e-mails, including attachments, Office files, presentations, videos and so on, has also shifted.


Worldwide file and block disk storage systems, 2005-2008 (chart: capacity in petabytes per year, block-based versus file-based)

Source: IDC 2007

According to IDC, in 2008 file-based data will for the first time experience stronger growth than block-based data.

The problem of retrieving structured data via database queries and interpreting it via data warehousing or business intelligence has today basically been solved, although there is no comparable solution for non-structured data [1]. However, the same cannot be said for the actual storage of such data quantities: the primary type of storage, i.e. fast access on expensive hard disks, is limited and assigned only to truly business-critical applications.

Moving data
Moving data records from servers and applications to less performant storage areas is necessary for several reasons. New data is materializing every day and every hour in business-critical applications. Booking, purchasing and sales data must all remain directly accessible, yet after a certain period it becomes unimportant or even out of date. It then has to be moved for space reasons, i.e. from primary online storage to slower and lower-priced disk systems (nearline or secondary storage). Other files, such as presentations, are not necessarily business-critical yet have to be stored nearby (nearline), as such files are often modified and used again and again. Such data is a typical example, as it is saved several times and used or read by numerous employees.


The analysts at the Enterprise Strategy Group (ESG) have developed the iceberg model. There are two main types of data: dynamic data which is continually changed (the visible part of the data iceberg) and permanent data that is static or fixed and will not be changed any more (the invisible part) [2]. Legal regulations, or simply careful commercial thinking, require long-term data storage without the content having to be constantly available during daily business. Such data can be moved to an archive in one way or another. In earlier days it was stored to tape; nowadays so-called virtual tape libraries simulate tape storage in disk systems. This is a third type of storage, and the data is a long way from its original starting point, such as servers and primary disk storage units. Last but not least: data loss must be avoided, depending on the value of the data, by making immediate or delayed copies which can be recovered as required (backup, restore, snapshot, CDP/Continuous Data Protection). These procedures are increasingly based on deduplication concepts which filter out duplicate copies during the backup process, but only on the backup media and not on the primary or secondary disk storage units. More information about this complex topic is in section 5. Even if a company uses only one aspect of these procedures, it is still using a tiered data backup system, since data is being moved, even if only from the server to a backup medium, and even though this alone would be regarded as insufficient from a professional storage viewpoint.

HSM and ILM: classic concepts with long-term effects

Two classical concepts prove, if nothing else, that a tiered storage process has always somehow been operated within a company's IT. Both address a fundamental question: what must be saved, how, for how long and on which medium? In the world of mainframes the Hierarchical Storage Management (HSM) method was used, in which data is automatically stored on the lowest-priced storage devices that still deliver the performance required by each application, to use the description offered by Fred Moore [3]. This process is not visible to users, who can access all the data they want without restriction, irrespective of the storage hierarchy level currently involved. Special HSM software looks after the various storage levels. IBM introduced this procedure for mainframes for the first time in 1975. Once the Internet bubble burst, StorageTek and other providers built their strategy on Information Lifecycle Management (ILM) so as to guide the user towards a content-based view of his stored data. A hardware manufacturer had a wide range of different devices on offer; there was also more cooperation with those manufacturers who concentrated on using software to classify data.


What is Tiered Storage?


Tiering means establishing a hierarchy of storage systems based on service requirements (performance, business continuity, security, protection, retention, compliance, etc.) and cost. Tiered storage requires some mechanism to place data:
- Static: applications assigned to specific tiers
- Staged: batched data movement (e.g. archive)
- Dynamic: some active data mover (e.g. HSM or ILM policy service)
Source: SNIA

On account of the various levels involved, stored data moves further and further away from direct access: from servers and fast primary storage via slower storage to backup mechanisms and archiving.

EMC bought Documentum, a manufacturer of document management solutions (DMS), in order to put together packages for both its old and new customers. ILM can be seen as a continuation of HSM in the world of Unix and Windows. ILM manages data from its origin to its archiving or deletion and stores it on storage media of varying speed and power, depending on its individual value. This type of storage management is based on HSM technology and uses company policies to establish an optimal match between data values and the respective storage subsystems. Even if companies are often not aware of it, they all practice some form of ILM. Even those who keep their data for a year or longer on high-performance and expensive online storage have made a decision regarding the assumed value of their data, although whether such a decision can be justified is questionable, as in the meantime the data could have been moved to cheaper media. A similar approach is tiered storage, which is basically the same as HSM and ILM but looks more at the combination of IT infrastructure (the hardware basis) and data content. The stored data moves step by step away from direct access: from servers and fast primary storage (online storage for data that requires immediate business access) via slower storage (nearline storage for data that is only required occasionally) to backup mechanisms and archiving. Such a structure based on the value of the data exists in every company, whatever they may call it.


ILM Implementation Roadmap


The roadmap proceeds in stages: identify information assets and infrastructure resources/services; standardize information, data and security services; pilot ILM-based solution stacks; automate ILM-based policies and services; deploy ILM practices across the enterprise; and finally instrument and manage service to ILM practices across sites. Accompanying steps range from using SRM tools to identify and track information, identifying the value, lifecycle and classification of information for each business unit and collaborating on requirements, through deploying configuration management tools, tiering storage and protection into standard service levels and setting data and information policies across domains, to operating policy-based services, refining ILM practices, automating with ILM management tools and capturing the overall savings and benefits.

Source: SNIA

The recommendation of the SNIA Data Management Forum is to add intelligence to the tiering efforts by integrating them into a broader ILM-based practice. More information at www.snia.org/dmf.

HSM and ILM can be regarded as a high-level strategy which establishes the data backup stages and criteria in a well-founded sequence. Many manufacturers selling HSM or ILM promised their customers that they would, above all, reduce their storage costs. This refers to the classic storage processes such as data saving, backup, restore and disaster recovery, which must be classified so that data can be saved, according to its value for the company, on a storage medium with the appropriate price and performance. That is more easily said than done: how can someone decide which data should be stored at which stage, on which medium and for how long? [4] The traditional data hierarchy was split in two: data saved on hard disks with direct and fast access, and backup or archive data saved on cheap tape, which is not directly accessible and is thus partially bunkered away somewhere. Those who select HSM or ILM as their strategy want to move away from this old concept and instead save data according to usage, i.e. its direct significance for the business process. Those who plan such a step with specific criteria can save money immediately [5]. Even if this approach is not universally accepted [6], HSM and ILM have had an effect: tiered storage today is seen by companies as quite normal. A real hierarchy now almost completely dominates the world of storage. The two tiers have now become four or five.


In an ideal situation, these tiers reflect exactly the value of the data on each tier and the corresponding costs. In other words, from expensive primary storage (Fibre Channel and SAS disks) down to less expensive secondary storage (disk-to-disk, D2D) on SATA disks, which still has to be accessible to servers and applications, and then down to different forms of backup: either as backup storage on cheap SATA disks which take over the function of the older tape backup (virtual tape libraries, VTL), or as classic backup and archive on magnetic tape.

Tiered storage and data classes


The classic version had only two storage levels: the first step was to save data on server-related hard disks (Direct Attached Storage, DAS) and keep it there for a while for fast access; the second step was to move the data to a tape library. If the data had to remain accessible, enterprises could only use powerful tape libraries from StorageTek or IBM, with high-performance robotics that loaded and unloaded the cartridges quickly. Backup and archive data was also saved on tape and kept in a safe place, usually away from the company premises; at least that was the theory. If the data was required again in a recovery situation, it first had to be transported back and then loaded into the productive system, a process which could often take hours or even days. Further developments in disk and array technology have today resulted in a tiered storage model which comprises at least three or four classes:

Tier 0: Very fast data storage (flash memory or enterprise-class Solid State Disks) is used where data must be accessed extremely quickly. Solid State Disks (SSD) as very expensive cache storage, for example, have been on offer for years from specialist companies such as Texas Memory Systems. The main customers are state organizations in the USA, banks, and companies actively involved in crude-oil exploration, whose enormous amounts of data would be too far away from the applications even on ordinary online storage.

Tier 1: Mission-critical data (such as revenue data), about 15% of all data; very fast response time; FC or SAS disk; FC-SAN; data mirroring, local and remote replication, automatic failover; 99.999% availability; recovery time objective: immediate; retention period: hours.

Tier 2: Vital data, approx. 20% of all data; less critical but still requiring fast response times; FC or SAS disk; FC-SAN or IP-SAN (iSCSI); point-in-time copies; 99.99% availability; recovery time objective: seconds; retention period: days.


Tier 3: Sensitive data, about 25% of all data; moderate response times; SATA disk; IP-SAN (iSCSI); virtual tape libraries, MAID; periodic disk-to-disk-to-tape backups; 99.9% availability; recovery time objective: minutes; retention period: years.

Tier 4: Non-critical data, about 40% of all data; tape; FC-SAN or IP-SAN (iSCSI); 99.0% availability; recovery time objective: hours or days; retention period: unlimited.

HSM, ILM or tiered storage require clear-cut and continuous data classification. Especially with non-structured data, this can only be handled manually, which in turn is much too expensive. The price of the equivalent software on the market, such as data movers or policy engines, is also such that the investment has to be calculated carefully beforehand.

Tiered Storage and the Data Lifecycle


Tier 1 - operational data; applications: mission-critical, OLTP; availability 99.999%; very high I/O and throughput; no scheduled downtime; recovery technology: disk. Storage: primary storage on enterprise-class disk with mirroring and replication, CDP, synchronous and asynchronous (remote) copies. Typical age: 0-30 days since creation; recovery time objective: milliseconds.

Tier 2 - application data; applications: vital, sensitive; availability 99.99%; high I/O; scheduled downtime below 5 hours/year; recovery technology: disk and tape libraries. Storage: secondary storage on SATA disk and virtual tape (VTLs, SATA/JBOD, MAID) for fixed content, backup/recovery and reference data, using point-in-time copies, snapshots and deduplication. Typical age: 30+ days; recovery time objective: seconds to minutes.

Tier 3 - reference and archive data, including fixed content such as video and medical images with long-term retention driven by government regulations; availability 99.0-99.9%; moderate to low I/O; scheduled downtime of 10 or more hours/year; recovery technology: tape. Storage: tape libraries, deep archive and offsite vaults. Typical age: one year or more, up to forever; recovery time objective: hours to days.

Key ILM components: policy engine, data mover and the tiered storage hierarchy. The probability of reuse and the value of the data decline with the average number of days since creation, while the amount of data grows.

Source: Horison Information Strategies

It is increasingly important to understand that the value of data changes throughout its lifetime. Therefore, where data should optimally reside and how it should be managed changes during its lifespan.


The result is that many customers regard ILM or tiered storage as a good idea but shy away from the corresponding investment. In practice, ILM has only been accepted as a comprehensive concept when it could be obtained as an integrated product (such as CentricStor from Fujitsu Siemens Computers). It would be a serious mistake to believe that the importance of data, once stored, does not change; it shifts with the business process, with the moment of data entry and with other criteria. For example, immediate access to customer flight booking data is essential before, during and after the flight, but a week later it is only of interest for running statistics or evaluating customer behavior. Decisions about storage and archive locations and periods must therefore be taken. Even when nearline tape libraries were introduced at the beginning of the Nineties, it was still thought that archiving data was the last phase before deletion. A lifespan of more than one or two years for most data was pretty inconceivable in those days. That has certainly changed. On the one hand, state regulations throughout the world now decree that IT data must be stored for longer periods; on the other hand, new hard disk technologies have resulted in additional disk storage phases. Nowadays, the proportion of data that must still be retained towards the end of its lifecycle is increasing, not decreasing as in earlier days. There are more transit stations than before, and more computing power and server performance must be used when moving data from one storage tier to the next. Many start-up companies have specialized in classifying data and automating the processes involved, and the giants of the branch have since followed. Fujitsu Siemens Computers has been working together with Kazeon in this respect.
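As a rough illustration of what such a policy engine or data mover does, the following sketch assigns files to tiers purely by the time since their last access. This is a simplified assumption made for this example; a real ILM product of the kind mentioned above would classify by business value, compliance rules and retention periods, and would then actually migrate the data.

import os
import time

# Illustrative age thresholds only (assumed values, not vendor defaults)
TIER_RULES = [
    ("tier 1 (FC/SAS, mission-critical)", 7),     # accessed within a week
    ("tier 2 (FC/SAS or iSCSI, vital)", 30),      # accessed within a month
    ("tier 3 (SATA/VTL, sensitive)", 365),        # accessed within a year
]
DEFAULT_TIER = "tier 4 (tape archive, non-critical)"

def classify(path, now=None):
    """Assign a file to a storage tier based on days since its last access."""
    now = now or time.time()
    age_days = (now - os.path.getatime(path)) / 86400
    for tier, max_days in TIER_RULES:
        if age_days <= max_days:
            return tier
    return DEFAULT_TIER

if __name__ == "__main__":
    for name in sorted(os.listdir(".")):
        if os.path.isfile(name):
            print("%-40s %s" % (name, classify(name)))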

Section 3

Online storage: disks and reliability


This section looks at the extremely sensitive area of direct data storage during or immediately after its creation within the business process. Nothing must get lost! The additional safety net in the form of backup on other media comes at a later stage. In this situation, costs play a less important role. What are the features of the expensive Fibre Channel and SAS disks used in online storage? Why are RAID systems used? And how fast must direct data access actually be? These are just some of the questions to be looked at more closely.

In the early days of electronic data processing, which took place only on mainframes, punch cards were originally used to save data. They were followed in 1952 by magnetic tape. Both methods were based on binary data (consisting of two digits: 1 or 0), either by punching or not punching a hole in paper cards or by magnetizing or not magnetizing spots on the tape. This form of storage is still necessary today, as computers can only handle information which has been reduced or converted to such a binary system. In other words, this numbering system consists of just a 1 and a 0 because the heart of the computer, the processor, can only operate on that basis. This in turn means that data storage is fundamentally rather precarious. Magnetic tapes were fast and could save what was in those days a large amount of data, namely five megabytes (= 5 million characters, corresponding to several books or the complete works of Shakespeare; see the overview in section 1). But as early as 1956 an alternative data medium started to appear: the magnetic disk, the forerunner of the modern hard disk. The first data medium of this type consisted of a stack of 51 disks with a diameter of 60 cm (IBM RAMAC). Hard disks as known today have several rotating platters arranged above each other in an airtight housing. In contrast to magnetic tape, the data is no longer written and read sequentially, which slows down access and retrieval, but on coated disks over which a motor-driven read/write head can skip to any position required. In contrast to sequential storage, this type is known as random access and is much faster.


Lifespan of hard disks


Hard disks consist of mechanical parts which are constantly moving and require energy, even when there are no read/write accesses. The advantage of magnetic tape is that it does not move constantly and retains the data even without power. The lifespan of a hard disk is fundamentally limited by its mechanics and the resulting sources of error; disks should be replaced every three to four years on average, even if some disk manufacturers claim a longer lifespan. Professionally used tapes are said to last for thirty years or more. In the tiered storage concept, the use of tape today is moving from backup to archiving (see section 5), while powerful hard disks have taken over the main job of data storage and even some of the backup activity. Disks can be accessed directly by servers; for business-critical data, very fast yet expensive FC and SAS disks are used. Lower-priced and slower disks are used for backup-to-disk systems or as an intermediate stage in the form of virtual tape libraries (VTL), on which data is retained for application access (nearline storage) before it is finally moved to tape according to specific regulations or after defined periods of time. It was 1980 before Seagate (a company founded in 1979 and now the world's largest disk manufacturer) launched a hard disk suitable for IBM PCs, with a capacity of 5 MB. These disks and their immediate successors did not have any intelligence of their own and were completely managed by an external controller.

The disk market in terms of interfaces

In 2007 the market was still divided between ATA/SATA (roughly a third), Parallel SCSI and Fibre Channel (roughly a quarter each) and SAS (about 16%); by 2009 SAS grows to about 45%, while Parallel SCSI shrinks to little more than 1% (chart: market shares in percent for 2007 and 2009).

Source: Gartner Dataquest

According to Gartner Dataquest, the SAS interface is evolving into the number one disk technology. The previously dominant Parallel SCSI is sinking into insignificance.


Today's standard hard disks are based on standards such as IDE/EIDE (Integrated Drive Electronics and Enhanced Integrated Drive Electronics) and ATA (Advanced Technology Attachment), which come from the consumer sector, or on SCSI (Small Computer Systems Interface), which was specially developed for enterprises. Many different devices could be connected to a SCSI controller, from hard disks to scanners. The parallel data transfer rate was much higher than with the previous sequential transport methods. Since 2001 the ATA successor SATA (Serial Advanced Technology Attachment) has become more widespread; the data is no longer transferred in parallel but serially. SATA hard disks now provide competition for Fibre Channel hard disks, as today they offer a higher degree of reliability and have fallen in price. Fibre Channel technology as a whole has been regarded as particularly powerful for enterprises since the introduction of storage area networks (SANs); FC disks read reliably and quickly. SAS disks (Serial Attached SCSI) today play a significant part in this professional sector, as they are gradually replacing SCSI disks. Because they are compatible with SATA, both can be installed together in a joint array, which means that tier 1 and tier 2 can be combined within one single device.

The performance of various disk types


Fibre Channel: online storage and transaction data; rotation speed 10k/15k rpm; seek time 3-4.5 ms; typical average access time 5.5-7.5 ms; power-on time 24 x 7 (hours x days); I/O duty cycle high; MTBF > 1.4 million hours; maximum bus speed 4 Gbit/s; interactive error management: yes.

SAS: online storage and transaction data; rotation speed 10k/15k rpm; seek time 3-4.5 ms; typical average access time 5.5-7.5 ms; power-on time 24 x 7; I/O duty cycle high; MTBF > 1.4 million hours; maximum bus speed 3 Gbit/s; interactive error management: yes.

SCSI: online storage and transaction data; rotation speed 10k/15k rpm; seek time 3-4 ms; typical average access time 5.5-7.5 ms; power-on time 24 x 7; I/O duty cycle high; MTBF > 1.4 million hours; maximum bus speed 3.2 Gbit/s; interactive error management: yes.

SATA: low-end file storage; rotation speed 7,200 rpm; seek time 8-10 ms; typical average access time 13-15 ms; power-on time 10 x 5; I/O duty cycle low; MTBF 600,000 hours; maximum bus speed 1.5 or 3.0 Gbit/s; interactive error management: no.

Source: Horison Information Strategies
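The MTBF figures in the table can be translated into an approximate annualized failure rate with a common rule of thumb (AFR is roughly hours per year divided by MTBF, assuming a constant failure rate). The short sketch below is not part of the original text and does only this calculation; note that field studies such as the Google analysis cited later in this section report noticeably higher real-world rates.

HOURS_PER_YEAR = 24 * 365  # 8,760

def afr_from_mtbf(mtbf_hours):
    """Approximate annualized failure rate (as a fraction) from an MTBF figure."""
    return HOURS_PER_YEAR / float(mtbf_hours)

def expected_failures_per_year(drive_count, mtbf_hours):
    """Expected number of failed drives per year in a population of identical drives."""
    return drive_count * afr_from_mtbf(mtbf_hours)

if __name__ == "__main__":
    print("FC/SAS/SCSI: AFR ~ %.2f %%" % (100 * afr_from_mtbf(1_400_000)))  # ~0.63 %
    print("SATA:        AFR ~ %.2f %%" % (100 * afr_from_mtbf(600_000)))    # ~1.46 %
    print("1,000 SATA disks: ~%.0f failures per year"
          % expected_failures_per_year(1000, 600_000))                      # ~15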


The technical possibilities of the various hard disk types have not yet been fully exploited; SATA will probably expand further in professional storage, but will in turn be displaced by SAS in nearline storage. The advantage of Fibre Channel is that, in addition to better equipment with internal microprocessors for mechanical and error control, drives can be positioned far away from other devices in the storage network (up to 10 kilometers, whereas SCSI only allows 25 meters). This was decisive in setting up storage area networks (SANs) from the end of the Nineties onwards, as decentralized locations, such as buildings on extensive factory premises or within a town, could be connected to each other via the storage network. The use of IP protocols for storage networks has since extended the range of FC and SCSI/SAS hard disks to cover very large, global distances. Only powerful FC, SAS and SCSI disks are used in online storage as part of the data and storage hierarchy [1]. Solid State Disks (SSD), already installed by some manufacturers in their storage systems, are significant as a kind of second cache (RAM) due to their high access rates. As they have no mechanical parts, their lifespan is longer than that of classic hard disks. But they too reach their end: according to manufacturer specifications, the SSD lifecycle comes to a conclusion after 10,000 to 1,000,000 write accesses per cell [2].
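How quickly the quoted write-cycle limits are reached depends on capacity, the daily write volume and how evenly the controller spreads the writes (wear levelling). The following back-of-the-envelope sketch uses assumed example values, not manufacturer data, to estimate a drive's lifetime under ideal wear levelling.

def ssd_lifetime_years(capacity_gb, pe_cycles, writes_gb_per_day, write_amplification=2.0):
    """Rough lifetime estimate assuming writes are spread evenly over all cells."""
    total_writable_gb = capacity_gb * pe_cycles
    effective_daily_writes_gb = writes_gb_per_day * write_amplification
    return total_writable_gb / effective_daily_writes_gb / 365

if __name__ == "__main__":
    # 64 GB drive, 10,000 cycles per cell (the lower figure quoted above),
    # 50 GB written per day, write amplification factor of 2 (all assumed values)
    print("%.1f years" % ssd_lifetime_years(64, 10_000, 50))  # ~17.5 years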

Online storage customer requirement: reliability, availability


What is most difficult to replace if a company loses everything, e.g. as a result of a disaster: its buildings, its computers, or its data? 50% of all companies who lose their data as a result of a fire or flood are bankrupt within one year.

Possible solutions:
- More reliable disks (migration from SATA to SAS)
- Redundant disks (RAID, mirroring)
- Regular backups
Source: Fujitsu Siemens Computers


Hard disks: a contradiction in terms regarding long-term data backup

Even if you decide to use high-value hard disks in online storage, one problem still remains: the lifespan of the hard disks, and thus of your data, is extremely limited. The manufacturer's warranty has usually expired after three to four years, and new investments are then made, often for accounting reasons as well. The actual reason for this situation is the strange contradiction of our so-called digital age: an increasing amount of information is saved on magnetic media, but that is anything but long-term, as a power failure, head crash, material damage, theft or the sudden death of the system hardware is enough to send data that was saved so carefully and at considerable cost off into nirvana. The correct moment for replacing hard disks also depends on conditions such as temperature and the number of hours the disks have been running. The lifecycle (or duty cycle) is also reduced by frequent power-ups and switch-offs, more so than if a disk runs around the clock; this can be compared to the stress that arises when a plane takes off and lands [3].

Annualized failure rates (AFR) broken down by age groups
(Chart: annualized failure rate in percent, on a scale from 0 to 10, for drive populations aged 3 months, 6 months, 1 year, 2 years, 3 years, 4 years and 5 years.)

Source: E. Pinheiro, W.-D. Weber, L. A. Barroso, Failure Trends in a Large Disk Drive Population, February 2007 (Google)

The annual failure rates reflect not only the age of the disks but also the different disk types involved.


Even if the disks last somewhat longer than a three-year cycle in particular situations, it is advisable to make the change and recopy the data in good time. What is the price of new disks (which is constantly falling) and of the administration hours involved, compared with a self-inflicted data catastrophe whose costs could damage a company beyond belief?

Reliability with RAID

So-called RAID solutions exist to protect data on fundamentally unreliable hard disks, and they are used as standard in most of today's disk systems. RAID, Redundant Array of Independent (originally Inexpensive) Disks, was developed at the University of California, Berkeley, starting in the late Eighties. A RAID has two important functions. First of all, many small and inexpensive disks are combined into one group, a LUN (Logical Unit Number), so that together they provide more performance for an application. RAID 0 combines several disks purely in order to increase performance. However, this increases the risk involved (in contrast to the general belief that RAID always provides greater data security), as the statistical probability rises that the whole system will no longer function when one of the disks in the array fails. The operating system sees a RAID as one single logical hard disk (consisting of many physical disks). The RAID function known as striping means that the capacity of each disk in the array can be split into partitions which in turn can be addressed by one or more LUNs: application data can thus be distributed across several disks, an early form of storage virtualization. Today, RAID is mainly about providing protection against disk failure. A RAID controller can do more: it takes on disk administration and modifies the configuration and the size of the system cache according to the respective application requirements. The common factor in all RAID levels (apart from RAID 0) is that they strike a balance between performance and redundancy according to the system or application requirements. The redundancy variants require additional investment in disks which then stand by in case a disk fails (so-called hot spares). Disk systems can be restored to their original state after a disk or LUN failure by using parity calculations, which take up a lot of computing time and are nowadays implemented in chips. RAID 1 saves everything twice (which requires double the disk capacity), while RAID 5 sets aside roughly one disk's worth of additional capacity in the group for parity in order to ensure redundancy: this parity covers several disks and jumps in should there be an emergency. The now widespread RAID 6 was developed in order to eliminate a possible RAID 5 weakness: specialists from different manufacturers and research groups wondered what would happen if, during the processor-intensive reconstruction of the original parity state after a disk failure, the spare disk or another disk in the array were also to fail.


Milestones in the hard disk drive industry


5 MB: IBM 350 RAMAC, 1956 (formatted capacity 4.4 MB)
10 MB: IBM 1301, 1962 (formatted capacity 21.6 MB)
100 MB: IBM 2302-3, 1965 (formatted capacity 112 MB)
500 MB: STC 8800 Super Disk, 1975 (formatted capacity 880 MB)
1 GB: IBM 3380, 1981 (formatted capacity 1.26 GB)
100 GB: Seagate Barracuda 180, 2001 (formatted capacity 181.6 GB)
500 GB: HGST 7K500, 2005 (formatted capacity 500 GB)
1 TB: HGST 7K1000, 2007 (formatted capacity 1 TB)
1.5 TB: Seagate Barracuda 7200.11, 2008 (formatted capacity 1.5 TB)

Source: Storage Newsletter, July 2008 edition

If such a second failure occurs, RAID 6 has a second, independent parity calculation ready which takes over. In such a situation the performance of the controller drops by more than 30% compared with a simple RAID 5 rebuild. Most manufacturers recommend a specific RAID configuration for their systems or applications. For example, Oracle recommends a combination of RAID 1 and RAID 5 for its database in order to increase performance. RAID 3 is more suitable for video streaming, and NetApp has selected RAID 4 for its NAS filers because very fast read/write operations can be performed on the disks [4].
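The parity mechanism behind the RAID 5 and RAID 6 discussion above can be illustrated in a few lines. The sketch below is an illustration rather than how any particular controller is implemented: it computes a parity block as the XOR of the data blocks and rebuilds a "failed" block from the survivors. Real controllers rotate the parity across the disks and work on much larger blocks.

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

# One stripe spread over three "data disks", plus its parity block
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing disk 1 and rebuilding its block from the survivors plus parity
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
print("Rebuilt block:", rebuilt)  # -> b'BBBB'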

How much data protection do you need?

IT manufacturers have always been fairly inventive when introducing additional protective functions for data storage around disk arrays. All these procedures are based on the idea of redundancy: where possible, keep everything twice or more. At the hardware level this includes clusters and grids, which means that specific hardware is available several times, so that a second device with an identical configuration and identical data can take over should there be any problems. The boundary between clusters and grids is a question of scale and is also the subject of heated expert debate. International companies have also networked their data storage and have to go to great lengths to protect their storage media, networks and backups against any misuse.


Today's SAN and NAS infrastructures unfortunately offer only weak security mechanisms, both at the Fibre Channel level and on an iSCSI basis. They therefore frequently fail to meet the security policy requirements of company IT. Zoning in FC-SAN switches ensures that access control can be enforced for each storage system. Zoning can be implemented in hardware or software. Soft zoning means that devices are merely told about the systems with which they are allowed to exchange data; hard zoning means that the hardware checks every packet and forwards it only to the permitted addresses. LUN masking is the FC-SAN function which makes visible to an application only those storage areas it needs to carry out its tasks. With IP-SANs on an iSCSI basis, IPsec is used for authentication and for securing data streams, for example via encryption [5]. Mirroring (the RAID 1 principle, where one disk is an exact mirror of another) can also be applied to an entire storage system. An identical server can be positioned at a second location, possibly several kilometers away, and a storage landscape can be set up as a contingency data center. All data is constantly transferred from location A to location B so that the same data exists at both locations. In the event of a catastrophe, productive IT, including the stored data, is switched from A to B; as everything is redundant and mirrored, IT operations can continue. The software elements of data protection are logically based on processes for backup and restore, Continuous Data Protection (CDP) and snapshots (see section 5 for more details). All these processes have their origin in the basic problem of storing data on electromagnetic media such as hard disk or tape (and likewise DVD or Blu-ray): despite all the advantages of such technology, the stored data can suddenly disappear into thin air. An old medium such as paper can last longer, and proven methods exist against dangers such as fire or natural catastrophes. Protecting electronic data media permanently against downtime or damage, however, remains a tricky and never-ending story to which IT managers must give their full attention. There is no patent recipe.
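LUN masking, as described above, is essentially an access table maintained by the array (or, in the case of zoning, by the switch): an initiator only sees the storage areas assigned to it. The sketch below illustrates the idea only; the WWPN values and LUN numbers are hypothetical, and real arrays enforce this in firmware, not in application code.

# Hypothetical masking table: initiator WWPN -> set of LUNs it may see
MASKING_TABLE = {
    "50:06:0b:00:00:c2:62:00": {0, 1},  # e.g. database server
    "50:06:0b:00:00:c2:62:02": {2},     # e.g. backup server
}

def visible_luns(initiator_wwpn):
    """LUNs an initiator may address; unknown initiators see nothing."""
    return MASKING_TABLE.get(initiator_wwpn, set())

def may_access(initiator_wwpn, lun):
    return lun in visible_luns(initiator_wwpn)

if __name__ == "__main__":
    print(may_access("50:06:0b:00:00:c2:62:00", 1))  # True
    print(may_access("50:06:0b:00:00:c2:62:02", 1))  # False
    print(may_access("unknown-initiator", 0))        # False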

Section 4

Storage networks - spoilt for choice


Today, storage networks are regarded as state of the art; at least, all large companies use this technology. However, it exists in various versions, which makes it complicated to understand, particularly for beginners. This section therefore describes some fundamental architectural features of the various approaches and explains important terms. This is especially necessary since small and medium-sized companies in particular are now also able to set up their own storage networks for their purposes. But which one should you choose in practice?

Hard disks that are installed in servers and PCs, or that sit in storage arrays directly connected to the servers, are still the most widespread structure in small and medium-sized companies; this is known as Direct Attached Storage (DAS). Small and medium-sized businesses have discovered the productive power of IT and use it for their business processes, but at the same time their financial resources limit their investments in their own IT infrastructure. Furthermore, fewer experts are available.
EMEA market share for DAS, NAS and SAN in 2007: SAN 60%, DAS 23%, NAS 17%.

Source: IDC, 2008

Even though SAN determines the topology of storage systems as a whole, the share of DAS is still large, particularly in small and medium-sized businesses.


In addition, these experts cannot specialize in the same way as their colleagues in large-scale companies, who each only have to support a small part of the IT. Consequently, small and medium-sized businesses do not follow every trend and concentrate on the basics. Although keeping a DAS structure does not reflect the state of the art of current storage technology, it can be used as the starting point for a gradual transformation. But what does DAS really mean? Users who connect one or more storage arrays per server have a dedicated, exclusive storage array for precisely the application that is installed on that server. This silo structure may become somewhat complex with time and use a great deal of space and energy in the server room or data center, but it is ultimately easy to monitor and manage. The disadvantage is obvious: if not split into several partitions, each directly attached storage unit, just like the individual server, is not used to full capacity. The superfluous capacity and computing power in the server is reserved for peaks on special occasions, such as internal monthly billing or external accesses for web-based purchase orders in the pre-Christmas period; in storage it is reserved for the corresponding write and read operations. In other words, there is a gap between the investment made and the benefit achieved. According to analysts, individual servers run at only about 15 to 20% capacity, and for storage the average value is about 40%. In other companies, where a client/server infrastructure was implemented via the internal network, shared storage structures emerged in which several servers had access to the same storage arrays, although separately for mainframes and open systems (Unix computers, and later also Windows servers). However, the amounts of data to be moved in the local area network (LAN) became increasingly large, which was to the detriment of the transfer speed and caused data losses. Networks based on the Internet Protocol (IP), which had originally been developed only for the transport of messages [1], thus reached the limits of their capacity.

Why storage networks make sense

The need for a separate network dedicated to storage was reflected toward the end of the nineties in a separate technology for Storage Area Networks (SANs). The new infrastructure consisted of its own cabling and a further development of the SCSI protocol, which was already used for connecting devices such as storage arrays or printers to a server; it bears the name Fibre Channel (FC). The Fibre Channel protocol was specially developed for the transport of storage data. It is considered reliable and, most recently at 8 Gbit/s, achieved a transport speed that even outperformed Ethernet.


[Figure: PC clients communicate with servers over an Ethernet LAN, while the servers access disk storage and tape storage over a separate Fibre Channel SAN. Source: Fujitsu Siemens Computers]

A storage area network (SAN) constitutes a separate storage infrastructure which is only intended for data transport.

In an FC network, special switches are given the task of connecting storage arrays with servers and also with each other. A switch works as a kind of multiple socket to which various devices can be connected. [2] In contrast to the widespread image of Fibre Channel as difficult to set up and manage, specialists describe it as easy to handle. This is, for example, the opinion of Mario Vosschmidt, Technical Consultant with the American IT manufacturer LSI.


In its early days this Fibre Channel architecture was particularly associated with the Californian company Brocade, which was founded in 1995. Today this company is the market leader in FC switches, which act as the nerve center of a SAN and were equipped with more intelligence in the course of their development. This means that such switches can take on tasks within the network, such as zoning or virtualization. With their help it is possible to set up a fabric, the structure that forms the core of a SAN. One particular aspect is the configuration of different storage zones (zoning). The administrator can define which devices and data may and may not be connected with each other. This serves to protect against unauthorized access both inside and outside a company. If a SAN is extended, additional switches and zones can be set up, depending on the available ports (connections for cables). The name director has become widely accepted for larger FC switches with at least 128 ports. Brocade has taken over several providers (McData, CNT and Inrange) that were heavily involved with directors [3]. With these purchases the manufacturer intended to strengthen its market position vis-à-vis Cisco. Cisco, the worldwide leader in Ethernet switches, has also had Fibre Channel solutions in its portfolio for several years and has thus positioned itself as a competitor to Brocade in Fibre Channel. Not for the first time in the history of information technology have the cards been re-shuffled between the companies involved, a recurring development that is reinforced by a forthcoming new technology: currently, Fibre Channel over Ethernet (FCoE) is an attempt to bring the separate networks for message transport (Ethernet or TCP/IP) and data storage (Fibre Channel and iSCSI) back together to form a common network. IP-SANs on an iSCSI basis would already be in a position to do this, but the communication and storage transport networks are mostly kept separate for performance reasons. A new FCoE network calls for standards and agreement between the various providers. Before this is finally the case, however, hard-fought conflicts rage to decide market positioning. Every manufacturer wants to be involved in FCoE, even if it has to switch over almost completely to new products. Some providers have obviously still not forgotten that a previous rival technology to Ethernet, named Token Ring, lost the race because the manufacturers behind it concentrated too much on their core product and thus ultimately did not keep up with the competition [4]. The historical achievement of FC-SANs, which are the prevailing storage infrastructure in large companies and institutions today, consists in providing efficient, fast transport services that are less susceptible to errors. Although the technology is simple in comparison with a classic network, problems frequently still occur in practice because server and network administrators are too unfamiliar with data storage. Compared with Ethernet, Fibre Channel has ultimately remained a niche technology in which standards are still lacking in many places.


In addition, false expectations have in part been raised over the last few years because the obstacles (and prices) for FC training courses were set too high. The storage arrays attached to the SAN are mostly managed via the tools supplied by the manufacturers, which requires only a short familiarization period and is directly supported by the suppliers [5].

Consolidate file services


Approximately at the same time as the FC-SANs, an alternative network structure for storing data within the company network came into being, which is particularly associated with the name Network Appliance (today NetApp). Network Attached Storage (NAS) denotes an integrated overall solution that combines servers, operating system, storage units, file system and network services. For this purpose NetApp offers so-called filers, which support the file services NFS (Network File System, originally developed by Sun) and CIFS (Common Internet File System under Windows) and are especially suited for unstructured data. Whereas data in an FC-SAN is saved in blocks, i.e. in small data blocks of about four to 128 KB, file services deal with complete, related files. This makes handling easier for the administrator, and direct access to the file contents is also possible. When storing in blocks, which derives from the physical splitting of hard disks into sectors and blocks, the data cannot initially be addressed via the start and end of the files, and their contents and structures cannot be accessed directly either.

[Figure: The three topologies DAS (Direct Attached Storage), NAS (Network Attached Storage) and SAN (Storage Area Network). In each case the application works with a file system and disk storage: directly attached, via a network file system over the LAN, or via a dedicated storage network. Source: Fujitsu Siemens Computers]

Every topology pursues a different concept but the goal is the same: protecting application data.


The difference between SAN and NAS


Network: SAN uses Fibre Channel (complex, expensive, a closed but secure system); NAS uses IP (simple implementation, cost-effective, an open system in which security must be borne in mind).
Protocol: SAN uses FC, fast at up to 200 MB/s; NAS uses TCP/IP, fast but with very high overhead (up to 40 % net).
Optimized for: SAN for rapid data transport; NAS for simple implementation and open, fast communication over long distances.
Types of data: SAN all data; NAS only files.
Split: SAN drives and resources; NAS files and stored content.
Storage for: SAN servers (data center); NAS clients (workgroups).
Drives: SAN all; NAS only disks.

Source: Fujitsu Siemens Computers

All PC users know that their data is archived in certain files and folders and thus has a logical structure. They also know that their data is ultimately scattered over the hard disk; after lengthier use it becomes fragmented (distributed, torn apart), because each storage process occupies free blocks first, regardless of the context of the file content. Consequently, the operating system needs an increasingly long time to open files: first the various blocks have to be located on the physical level and consolidated into one entity that is visible to the user. By using the Defragment command the suffering Windows user puts things in order again on the hard disk, at least for a while. In a NAS the focus is placed on the network functions [6] and less on the performance of the hard disks used. Many users consider it a lower-cost alternative to a SAN. Which version a company decides in favor of depends on a great many factors, some of which may be very individual. David Hitz, one of the founders and now Executive Vice President of Engineering at NetApp, expressed a frank opinion in an interview: NAS and SAN are like two flavors of the same ice cream. NAS is chocolate-flavored and SAN is strawberry-flavored. And everything the customer needs to know about the two technologies is only that both systems can be used at any time for data storage. What intelligent person would be disturbed by the fact that someone does not like chocolate-flavored ice cream, but prefers strawberry-flavored ice cream? [7] This somewhat flippant statement can also be interpreted to mean that with SAN and NAS companies have two storage architectures to choose from, which can be individually adapted depending on their requirements. No-one needs to have any reservations.
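To make the block handling described above more concrete, here is a small illustrative sketch (the block size and the invented block numbers are only examples): a file is cut into fixed-size blocks that may end up scattered across the disk, and the file system needs a block map to reassemble it.

```python
# Minimal sketch of block-level storage: a file is split into fixed-size blocks
# that may land anywhere on the medium; a block map is needed to reassemble it.
# Block size and layout here are illustrative only.

BLOCK_SIZE = 4 * 1024  # 4 KB, the lower end of the 4-128 KB range mentioned above

def split_into_blocks(data: bytes) -> list[bytes]:
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

def reassemble(block_map: list[int], disk: dict[int, bytes]) -> bytes:
    # The file system has to fetch every (possibly scattered) block in order.
    return b"".join(disk[block_no] for block_no in block_map)

if __name__ == "__main__":
    payload = b"x" * 10_000
    blocks = split_into_blocks(payload)
    # Pretend the blocks were written to non-contiguous locations (fragmentation).
    disk = {42: blocks[0], 7: blocks[1], 113: blocks[2]}
    assert reassemble([42, 7, 113], disk) == payload
```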


Comparing the three topologies


Based on network technology: DAS no; NAS yes; SAN yes.
Maximum number of supported appliances / HBA: DAS 15; NAS not specified; SAN 126.
Vulnerability to downtime with external influences: DAS yes (copper); NAS yes (copper); SAN no (glass).
Price level: DAS low; NAS high; SAN very high.
Scalability: DAS bad; NAS relative; SAN very good.
Maximum distance to server: DAS 25 m; NAS not specified; SAN 10 km.
Base protocol: DAS SCSI; NAS Ethernet; SAN FCP.

Source: Fujitsu Siemens Computers

A third version has been under discussion for some years now: iSCSI networks for storage (also known as IP-SANs) evidently overcame their lengthy introductory phase about a year ago and have since achieved significant sales figures. The attraction of this architecture is its ability to use the existing TCP/IP infrastructure for data storage. This makes the installation and maintenance of a second infrastructure (set up only for storage) superfluous, and administrators can fall back on their existing IP know-how. In practice, however, there have been greater obstacles in integrating the various tasks of the LAN and the iSCSI storage network. Nevertheless, new prospects result from the new transfer speed of 10 Gbit/s for Ethernet, because this technology is currently faster than Fibre Channel, which at present offers only 8 Gbit/s. However, customers incur additional costs due to the new cabling that becomes necessary. In the meantime it is generally assumed that an iSCSI infrastructure is mainly suited to small to medium-sized companies and has found its true position there.

Section 5

Backup & Restore: an unloved compulsory exercise


Data backup is like many necessary tasks that are ultimately performed only half-heartedly or not at all because they require additional time or money. A car or a bicycle is often simply used until it no longer works, based on the motto "nothing will go wrong". And if something does happen, for example the brakes or the steering suddenly fail or a tire bursts, it is usually too late to avert disaster. Modern IT is similar: in an ideal world, copies would be made of every piece of stored information at all times and kept ready for emergencies. However, companies that behave according to this ideal are the exception, since everything that does not appear directly necessary for day-to-day business, or cannot be immediately converted into hard cash, is not given the attention the matter actually deserves. Corporate IT and the need for data backup have now been around for about 40 years, but surveys and analyses still reveal glaring errors in this basic discipline. It ought not to be so: creating an efficient and automated remedy is no great conjuring trick.

The penetration of society and the economy by IT has only just begun. More and more parts of daily life, from communication and information procurement right through to healthcare, are dominated by IT systems, and economic processes today depend on electronic support in almost all branches of industry and in companies of all sizes. This interdependence of business processes and electronically generated and processed information makes secure data storage absolutely necessary for companies of every size.


Fundamental backup terms


Backup: Regular data backup to enable access to data in the event of data loss.
Restore: Recovery of data from backups (irrespective of the medium) to the time when the backup was originally created. The result is a physically intact volume / file system. It does not necessarily provide data with which applications can be started (particularly with databases). It is the basis for recovery.
Recovery: Restartable restore of data and systems. For databases, a retrace with the help of redo logs to the most current version possible; reset to the last possible point of consistency.
Distinction by backup volume: A full backup (complete data backup) is, on account of the time required, only performed at greater intervals and constitutes the basis for the following backups, which only save the additions and changes to the original dataset that have taken place in the meantime. An incremental backup saves the data that has been newly created or changed since the last backup (regardless of whether that was an incremental or a full backup). A differential backup always covers all changes since the last full backup. A differential backup therefore needs more storage space than an incremental one. Further variants exist depending on the backup software used [1].
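As a simple illustration of the three backup types (the file names and timestamps are invented; real backup software keeps far more state, such as catalogs or archive bits):

```python
# Minimal sketch of how full, incremental and differential backups select files.
files = {          # file -> last-modified time (hours, on a simple clock)
    "orders.db": 10, "mail.pst": 30, "report.doc": 55,
}

def backup(kind: str, last_full: int, last_any: int) -> list[str]:
    if kind == "full":
        return list(files)                                       # everything
    if kind == "differential":
        return [f for f, t in files.items() if t > last_full]    # changed since last full
    if kind == "incremental":
        return [f for f, t in files.items() if t > last_any]     # changed since last backup of any kind
    raise ValueError(kind)

# Full backup at t=0, incremental at t=40, so at t=60:
print(backup("incremental", last_full=0, last_any=40))   # ['report.doc']
print(backup("differential", last_full=0, last_any=40))  # ['orders.db', 'mail.pst', 'report.doc']
```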


Backup and recovery strategies


Gartner analysts assume that a fundamental change in the significance of data backup is currently taking place [2]. The most important factors triggering a realignment of backup and recovery strategies can be summarized as follows:
Local threats such as fire, extreme weather conditions and failures in hardware, software, networks and media endanger business continuity. Terrorist attacks and threats have featured highly on the agenda in many countries since September 11, 2001. Power failures of a larger scope, such as those that have occurred in many countries over the last few years, can also have an impact on data backup.
In the age of globalization and the Internet most companies cannot afford any interruptions and downtimes in the infrastructure. At the same time, the time slots for backup and recovery are becoming increasingly tight, because work is performed on a 24/7 basis at various locations.
Unstructured data (e-mails, files) grows in a particularly disproportionate manner, while classically structured data (databases, business applications) shows only comparably modest growth. However, according to surveys performed by BITKOM, the German Association for Information Technology, about 50 % of the data saved is not used. This calls for measures to move this unused data to less expensive levels of the storage hierarchy or to remove it from the productive context and archive it earlier.
In many branches of industry, such as banking or healthcare, retail trading and mechanical or automotive engineering, efficiently organized data backup processes have always been decisive for a company's market image. Data loss in banks, insurance companies or airlines, for example, will have a relatively fast impact on the business success of these companies. This is why data backup always has to be performed with state-of-the-art methods and continuously improved so as to remain competitive.
And last but not least, the IT infrastructure has to be equipped to meet rapidly changing business and technical requirements. Backup and recovery processes should therefore be planned on a long-term and flexible basis and not at the expense of the productive processes of IT.


How does data loss come about?

[Figure: Causes of data loss: hardware or system malfunction 44 %, human error 32 %, software corruption or program malfunction 14 %, computer viruses 7 %, natural disasters 3 %. Source: Fujitsu Siemens Computers]

Software errors and viruses are relatively rarely the cause of data loss. Most faults are caused by the hardware or system, followed by human error.

In addition to the aforementioned external influences, the direct reasons for data loss are software errors, which impair data integrity or can even cause entire systems to fail, as well as various hardware errors, which can range from power supply units and processors [3] via hard disks [4] to other components, and even to redundant (duplicated or multiple) assemblies such as hard disk arrays. Added to these are user and operating errors, which even proficient administrators can make, although a great deal of this disappears under a dense veil of silence. Which IT department and which company gladly admits to having done something wrong? Regardless of imminent disasters, it is frequently these technical errors of everyday IT, or simply the end of the life-span of the components and media used, whose sudden expiry can also mean the demise of the data stored on them. A separate branch of industry, which has consolidated considerably over the course of the last few years, looks after the recovery of data from storage systems of various types [5]. To specify the durability of hard disks, their expected life-span is expressed by the term MTBF (Mean Time Between Failures), an extrapolated value for the probable operating time between failures of a drive. Many manufacturers specify values of one million hours and more for high-end drives.


One million hours would mean a biblical life expectancy of about 114 years. The underlying tests assume a very high number of disks operating in parallel, on the basis of which possible failure rates are extrapolated. However, the implied optimal conditions are the exception in practice, so real failure rates can be considerably higher. Redundancy through disk arrays (see section 3) and sophisticated backup mechanisms have to guard against this. Since the term MTBF has increasingly come under criticism on account of its inaccuracy, other units of measurement, for example AFR (Annualized Failure Rate), are used today. It is established in the same way as the MTBF, but specifies the anticipated annual failure rate as a percentage of the installed number of disks. If 8.7 out of 1,000 disks fail in a year, the annual failure rate or AFR is 0.87 % [6]. The average life-span of hard disks is at present three to five years, in individual cases even longer. Companies should only rely on longer periods if they use automatic error monitoring with direct notification of the manufacturer's service organization, which, depending on the stipulated Service Level Agreement (SLA), ensures a replacement before the disk finally fails. As capacity increases, disks are becoming more and more inexpensive (in summer 2008 Seagate announced a 1.5 TB disk), and the magnetic tapes still used in storage are becoming more efficient. For example, their throughput has in the meantime risen to more than 500 MB/s, while capacities are also clearly on the increase and now lie at about 1 TB (LTO-4). The life-span of magnetic tapes is specified as up to 30 years for DLT/SDLT and LTO and is thus clearly beyond that of hard disks and solid state disks [7].
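The relationship between MTBF and AFR mentioned above can be illustrated with a simplified calculation (a rough approximation that ignores the statistical subtleties of real failure data):

```python
# Simplified relationship between MTBF and AFR: each disk accumulates 8,760
# operating hours per year, so the expected annual failure rate is roughly
# 8760 / MTBF. This is an approximation, not a statement about any drive model.
HOURS_PER_YEAR = 24 * 365          # 8,760 hours

def afr_percent(mtbf_hours: float) -> float:
    return HOURS_PER_YEAR / mtbf_hours * 100

print(round(afr_percent(1_000_000), 2))   # -> 0.88, i.e. roughly the 0.87 % quoted above
print(round(1_000_000 / HOURS_PER_YEAR))  # -> 114 years of continuous operation
```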

Backup architectures / Storage locations for backup data

The consequence of this development in the price/performance ratio for the backup architecture is that tape continues to move further toward archiving at the end of the tiered-storage chain. An ultimately futile dispute has even developed as to which is the better solution for backup: disk or tape. This dispute is futile because it is the particular requirements of each company that matter. If the proportion of data that has to remain permanently accessible for a longer period of time is very high, it is advisable to use special forms of disk backup before the data is finally moved to tape. There are various forms in which the physical data backup can take place:
Backup to Disk / Disk Libraries: Storage on disks is faster than write access to tapes. The data is usually saved on low-price SATA disks and not on cost-intensive Fibre Channel disks.
Backup to Tape / Tape Libraries: Data backup on tapes is a low-cost way of storing data. Compared with disks, tapes have the advantage of a longer life-span. However, access is slower. [8]


Disk to Disk to Tape (D2D2T): This version takes various requirements into account by adding data backup on magnetic tape as the next stage after the short-term backup on disk. If two disk tiers are used within one storage array, combining more expensive hard disks such as Fibre Channel or SAS on the primary tier with SATA on the secondary tier, price advantages can be achieved because only one array needs to be procured. Merely the time of the backup to tape is delayed.
Virtual Tape Libraries (VTL): VTLs combine the fast access times of disks with low-priced tapes. A virtual tape library is a storage system based on a disk array which outwardly emulates a tape library. As a result, it is possible to integrate backup-to-disk concepts in existing data backup environments, which are usually based on tape drives. To the connected computers a VTL presents itself like one or more tape libraries. First the data is temporarily stored on disk in order to keep the backup window to a minimum. The data is subsequently backed up to the lower-cost tapes.
To define Service Level Agreements (SLAs) for backups, the following terms are often used today:
RPO (Recovery Point Objective): The term RPO describes how much data loss a company can afford at most. The interval between the individual backups takes its bearings from this. It follows, for example, that banks have zero tolerance toward data loss, whereas other branches of industry can cope with it better.

[Figure: Timeline around a disaster. The RPO (Recovery Point Objective) is measured backwards from the event and the RTO (Recovery Time Objective) forwards, each on a scale from weeks down to seconds; the gap between the two is narrowing. RPO = data loss: the amount of data which has to be recovered after an outage in order to resume business as usual (measured in time). RTO = downtime: the maximum recovery time which can be tolerated before business must be resumed. Source: Fujitsu Siemens Computers]


RTO (Recovery Time Objective): RTO concerns the question of how long it takes to recover lost data or restore it in the system. The scope may be restricted by statutory or institutional specifications. Healthcare is one example: access to electronic patient data has to be fully guaranteed, particularly in emergencies.
Continuous Data Protection (CDP): CDP means that every change in the data is simultaneously made in the backup. With CDP the RPO (Recovery Point Objective) is set to zero, because each change immediately triggers a storage process. In this respect CDP is the realization of the backup ideal: everything is saved immediately and in its entirety.
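How an RPO target translates into a backup schedule can be sketched as follows (the schedule values are invented for illustration; real planning also has to consider backup duration and transfer windows):

```python
# Minimal sketch: check whether a backup schedule can meet a given RPO.
# In the worst case, the data lost in a disaster is everything written since the
# last completed backup, so the backup interval must not exceed the RPO.
from datetime import timedelta

def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    return backup_interval <= rpo

schedule = {
    "mission critical":  timedelta(minutes=15),
    "business critical": timedelta(hours=4),
}
print(meets_rpo(schedule["mission critical"], rpo=timedelta(hours=1)))   # True
print(meets_rpo(schedule["business critical"], rpo=timedelta(hours=1)))  # False: back up more often
```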

Extended backup options


Since the backup windows are becoming increasingly small, there are various technical approaches to meeting these requirements. Using point-in-time copies or snapshots enables a backup to take place in a relatively small time slot. Here the data is prepared as a copy at certain times during ongoing operation in order to run the backup from this copy, irrespective of productive operations. However, a snapshot is also stored on the primary storage system, and if this fails, this form of data backup is no longer effective either. The backed-up data volume can be reduced through compression, which saves space and costs. More recent procedures, such as single instancing and data deduplication, search the data volume to be stored directly and sort out duplicate or multiple datasets during the backup process. Such duplicates are replaced by pointers, which can yield savings of far more than 50 %. A pointer is only a reference to the originally saved file, so that it does not have to be saved twice or repeatedly.

Single Instance Storage: Single instance storage describes how duplicate files are detected and saved only once. Data deduplication is already classed as one of the great inventions of storage technology [9]. Here redundancies are not only detected at file level; deduplication also takes place at segment level. This can happen in two ways [10]:


On the target: Deduplication takes place on the storage medium itself, which helps keep the stored data volume to a minimum, but the entire storage process takes longer as a result.
At the source: In addition to the lower data volume, the other advantage here is that the reduced quantity of data can also be transferred more quickly. This is relevant for branch offices, because in some cases only analog data links are available there.
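The principle of segment-level deduplication with pointers can be sketched in a few lines (segment size and the use of SHA-256 fingerprints are illustrative choices, not a statement about any particular product):

```python
# Minimal sketch of segment-level deduplication: data is cut into fixed-size
# segments, each segment is fingerprinted, and repeated segments are stored once
# and referenced by pointers.
import hashlib

SEGMENT = 4096
store: dict[str, bytes] = {}          # fingerprint -> unique segment data

def dedup_write(data: bytes) -> list[str]:
    """Store data and return the list of pointers (fingerprints) describing it."""
    pointers = []
    for i in range(0, len(data), SEGMENT):
        seg = data[i:i + SEGMENT]
        fp = hashlib.sha256(seg).hexdigest()
        store.setdefault(fp, seg)     # identical segments are only kept once
        pointers.append(fp)
    return pointers

def dedup_read(pointers: list[str]) -> bytes:
    return b"".join(store[fp] for fp in pointers)

backup = dedup_write(b"A" * 8192 + b"B" * 4096)   # two identical 'A' segments
print(len(store))                                  # -> 2 unique segments kept, not 3
```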

Backup versus archiving


Compared with backup, the distinction from archiving is often reduced to the parameter of time: permanent versus temporary. That may be applicable in many cases, because weekly backup tapes possibly end up in the data bunker marked with a label saying Archive. Nevertheless, such a distinction, and in particular the associated practice, is inadequate, because the two cases could hardly be more different. Backup is about protection against data loss, whereas archiving deals with long-term storage in order to meet, for example, statutory requirements. Backup is normally used in a 1:1 ratio to restore.

FSC Business Continuity Model Requirements


Data class 1 (Mission Critical): availability > 99.99 %, unscheduled downtime < 1 h/year, planned downtime < 1 h/month, RTO (downtime) < 15 min., RPO (data loss) < 1 hour, archive access time in seconds, backup success rate 97 %, offline import response time < 30 min.
Data class 2 (Business Critical): availability 99.9 %, unscheduled downtime < 10 h/year, planned downtime < 2 h/month, RTO 1 hour, RPO 12 hours, archive access time in seconds, backup success rate 95 %, offline import response time < 45 min.
Data class 3 (Business Important): availability 99 %, unscheduled downtime < 100 h/year, planned downtime < 8 h/month, RTO 8 hours, RPO 48 hours, archive access time < 4 hours, backup success rate 90 %, offline import response time < 2 hours.
Data class 4 (Non Critical): availability 97 %, unscheduled downtime: no commitment, planned downtime: intermittent, RTO 24 hours, RPO 96 hours, archive access time 24 to 48 hours, backup success rate 90 %, offline import response time not specified.

Source: Fujitsu Siemens Computers


In an archive, by contrast, you should be able to find data relatively easily and quickly without having to perform a restore. The copies of the original data mirrored on the backup media are usually only stored for a shorter period of time and are continually replaced by more up-to-date copies. As they are not actually intended for use, only specific requirements arise, such as currency or completeness. In the event of a disaster, all information from before the oldest existing backup and after the most recent one is irretrievably lost. However, these data backups do not have to be kept for a long time, because the copies are only needed in the event of a disaster; you will seldom have to restore a data backup that is a few months old. Archiving is an altogether different matter. This storage procedure, applied when information leaves the productive process, is about preparing data on a separate medium and making it available for later use. Archiving media are not put to one side like backup tapes, whose use is only considered in extreme cases. Whoever archives has reuse at some later point in time as the objective. Archiving can be done of one's own free will (the bandwidth ranges from precautionary storage for as yet unforeseeable purposes right through to acquisitiveness) or because legislation, the banks or other institutions have issued mandatory regulations or at least recommend archiving. Whoever no longer wants to use the archived data later, and also does not have to comply with regulations concerning any possible resubmission, should also consider deletion, which saves resources, space and costs.

Section 6

Storage Management making complex storage networks manageable


One would think that after over 40 years of server and storage technology the daily handling of such infrastructure and its specific applications would be easy. That is the theory. The fact is, however, that data quantities are constantly increasing and the storage landscape has become more complex and comprehensive. Numerous company takeovers have resulted in many users operating various IT islands with different operating systems, device parks and applications. By now there are at least four major storage architectures which require different levels of know-how: DAS (Direct Attached Storage), SAN (Storage Area Network), NAS (Network Attached Storage) and IP-SAN (iSCSI), whereby the last two are based on the TCP/IP protocol. The next architecture, one that is only partially compatible with the above installations, is already on the horizon: Fibre Channel over Ethernet (FCoE). This could return the world of storage to a more standardized environment. But management does not become any easier as a result, quite the contrary. So far each manufacturer has provided its own management tools for its products, generally without any standards except for SMI-S.

The confusion in storage management software can possibly be overcome by keeping to the tools supplied by the manufacturers. Integrated configuration tools, web tools or component managers help at the start but do not cover the entire planning of the overall architecture and its elements. The large storage management suites for administering the entire IT infrastructure, as offered by some manufacturers, require a great deal of experience and, on account of their complexity, are thus only feasible for large-scale companies. The planning and monitoring of storage installations should be set up systematically right from the beginning and include a continuously maintained record of all phases, stages and modifications. This is necessary because otherwise chaos would ensue whenever staff change jobs. The IT Infrastructure Library (ITIL), with its standard requirements and procedures, is an important aid in documenting such processes. Such standards also help in discussions between manufacturers and customers or during company mergers, when different IT worlds have to be united.


ITIL has issued a series of publications aimed at helping a company structure its IT and business processes. For example, Change Management describes the procedure for the continuous management of modifications, which replaces simple but irregularly updated Excel tables and which is based on ITIL and the management tools supplied by the respective manufacturer. It is thus possible to avoid from the start any storage wilderness of all kinds of products that are only loosely interconnected. Such a lack of clarity results in errors and failures, and thus extra overnight work for employees. However, if standard solutions or storage out of the box are used, the overall management is much simpler due to the use of software that is already suitable. Medium-sized companies have fewer financial resources, which can result in them using quickly implemented yet not fully tested installations for long periods, longer than is financially viable. It is no coincidence that it is these companies which choose technologies that they know, or at least seem to know: DAS is well known today among SMEs, and iSCSI is widespread here thanks to its close relationship with LAN architecture. As long as these are well-tested and proven solutions, such cautious behavior is certainly not wrong; it just makes the company less flexible than its larger competitors. The latter can afford well-trained employees and longer planning phases, which enables them to test new and more effective technologies over a longer period and then apply them productively. This particularly applies to information management, which requires both investment and know-how. Medium-sized customers usually cannot afford most of the tools used in this sector. Their purchase would also hardly be sensible in relation to the data quantities that have to be managed. Data management is thus frequently without good planning and basically chaotic. As a range of Windows licenses already exists together with underused servers, unstructured data such as Office documents tends to be saved in a non-systematic manner, with all the ugly consequences regarding classification, indexing and search options for such documents. Larger companies are one step further and provide dedicated servers in a SAN. As a SAN has already been configured, it probably also holds data which could be saved more cheaply on other storage tiers with somewhat lower performance; that would be chaos at a higher level. When the IT department realizes the mix-up, it usually adds its own negative element, i.e. additional NAS filers are now created so that the unstructured data can be saved suitably. At some point the question then arises about merging or integrating the various stand-alone storage solutions, in which sometimes the block level (SAN) and sometimes the file level (NAS) is the dominating element. These, too, have their own solutions, which in turn require more investment and basically add yet another layer of complexity to the overall storage architecture.


Storage Management
(1) Storage Resource Management: SRM (Storage Resource Management) initiatives began in earnest in the late 1990s. This was a Unix, Windows and later a Linux market which held great promise 2-4 years ago but has faded in recent years. There were several reasons why the 20+ SRM companies faded and lost momentum:
1. SRM products had a hard time moving from a reactive, reporting tool to a proactive tool that could make decisions and take actions based on user-defined policies,
2. SRM products were mainly homogeneous, thus failing to provide support for heterogeneous environments, and
3. SRM products only dealt with disk space allocation and lacked any insight into disk performance issues.
SRM users were worn down with all the alerts and decisions that they had to perform manually. Today's reality is that organizations will need to integrate a variety of vendor and homegrown tools. Storage organizations must accept that the structure of storage is going to be split up by vendor and type of array and that organizationally, minimizing the number of vendors and storage pools is one way to reduce storage administration overheads. (Fred Moore, Horison Information Strategies)
(2) Information Management (Information Lifecycle Management): The discipline and function of oversight and control of information resources. Information management services: the processes associated with managing information as it progresses through various lifecycle states associated with a business process. These services exploit information about data content and relationships in making decisions. Examples include records management and content management applications. (SNIA Dictionary)

The consequence is that, due to such structures, storage management becomes difficult because problems, failures and hardware and software faults cannot be identified immediately. Counter-measures are then usually taken too late. However, monitoring, reporting and constant error analysis could be implemented with software support, and such tools are usually supplied by the manufacturers. If many different components are used, storage management is often faced with the problem of controlling this heterogeneity.


IDC's Storage Taxonomy

[Figure: IDC divides storage into three main groups. Storage hardware comprises storage systems (disk systems, tape libraries, optical jukeboxes, storage infrastructure) and storage mechanisms (HDDs, tape drives, optical drives, removable media). Storage software comprises data protection and recovery, storage management, storage replication, file systems, archive and HSM, storage infrastructure and device management, and other storage software. Storage services comprise consulting, implementation, management and support. Source: IDC]

According to IDC, storage can be divided into three main groups, to which the various fields are assigned. Storage management is only one of many areas.

Management tasks include providing a sufficient supply of storage for users, departments and applications, as well as provisioning. The latter is understood as providing, and if necessary procuring in time, the storage required: sufficient storage arrays and hard disks must be available to meet all these requirements. If, for example, seasonal peaks such as the Christmas business have to be taken into account, this means low system usage during the other periods, which can lie 20 to 30 % below actual capacity. The technology known as thin provisioning has been developed to counteract this problem. It comprises procedures for the flexible, changing assignment of storage to various applications: capacities are assigned and withdrawn depending on demand [1]. Stranded storage is storage which was originally provided for specific applications or users but then not used after all. The objective of thin provisioning is essentially to make stranded storage capacity usable again. Thin provisioning also uses virtualization [2], via which different physical hard disks and disk arrays are combined into logical or virtual units. This enables a very exact assignment of storage space, and the maximum and minimum limits of physical disks no longer have to be considered.
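The core idea of thin provisioning can be sketched as follows (pool size, volume names and numbers are invented; real implementations work at block granularity and add reclamation, alerting and more):

```python
# Minimal sketch of thin provisioning: applications are promised (virtual) capacity,
# but physical space is only drawn from the shared pool when data is actually written.
class ThinPool:
    def __init__(self, physical_gb: int):
        self.physical_gb = physical_gb
        self.used_gb = 0
        self.volumes: dict[str, int] = {}      # volume -> promised (virtual) size

    def create_volume(self, name: str, virtual_gb: int) -> None:
        # Over-commitment is allowed: promises may exceed physical capacity.
        self.volumes[name] = virtual_gb

    def write(self, name: str, gb: int) -> None:
        if self.used_gb + gb > self.physical_gb:
            raise RuntimeError("pool exhausted: add disks or reclaim space")
        self.used_gb += gb                      # capacity is consumed only on write

pool = ThinPool(physical_gb=10_000)
pool.create_volume("sap", 8_000)
pool.create_volume("mail", 8_000)               # 16 TB promised from a 10 TB pool
pool.write("sap", 2_000)
print(pool.used_gb, "GB physically used of", pool.physical_gb)   # 2000 of 10000
```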


The IT department should also remain independent, so that it can free itself of any blame and point to the manufacturer that forced the user into a specific SAN, NAS or iSCSI infrastructure. One Fujitsu Siemens customer has gone its own way and deliberately selected and configured separate stand-alone solutions for different architectures: a SAN for mainframes, including mirroring to a second (backup) data center; a virtualization solution for SQL and mail servers based on Windows and Linux; and finally a NAS for FlexFrame for SAP with Fujitsu Siemens servers and NetApp storage. All three areas are managed separately, and the extra effort required guarantees that the other areas are not automatically affected in an error situation. In other words: the often-vilified silo solutions can certainly make sense in certain customer situations, just like DAS systems. This approach can also be seen in the use of software: it is often the case that too many tools from various sources are used instead of relying on a single provider. Whereas storage systems such as SAN, NAS or IP-SAN can be configured with the supplied software, a central tool should be used for the subsequent administration, thus enabling end-to-end management. Almost all storage manufacturers today offer appropriate programs.

Heterogeneous storage networks: standardized management

In a SAN (Storage Area Network) on a Fibre Channel (FC) basis, the main components in the network are FC switches, whose protocol services are easy to configure and have only a few error sources. The second part of the SAN is the storage systems, which are usually monitored and controlled via the supplied administration tools. However, the problem in many SAN environments is the know-how and experience of the network administrators, who come from the classic Local Area Network (LAN) sector and do not know enough about data storage topics. The alternative network, NAS (Network Attached Storage), was developed with the aim of offering classic file servers and storage systems in one package. Management software is also integrated, which above all handles the transport of files from the servers to the storage systems. In such an environment less value is frequently placed on the performance of data access and the hard disks involved. If the NAS servers are connected to powerful FC storage systems in a SAN, the administration effort rises. The third version is IP-SANs, which today are usually implemented as iSCSI storage networks. Such networks, which are based on familiar LAN technology, have over time become more popular, especially in medium-sized companies, but also in international companies with more than 1,000 employees. Of course, IT departments have a wealth of experience from their many years of work with the TCP/IP networks that are in use everywhere.


But it would be a mistake to believe that storage data can simply be moved along the existing network paths as well. A separate IP infrastructure with the attached storage systems must be set up in order to have a working IP-SAN [3]. Storage management itself sits above these transfer techniques and consists, first of all, of the tools supplied by the manufacturer, also known as element managers. A web server is installed for easier operation so that administrators can use a browser for access. More complex networks are monitored and controlled via monitoring services which connect many storage components, such as HBAs (Host Bus Adapters), switches and storage systems, via interfaces. Some examples are ControlCenter from EMC, BrightStor SRM from CA or SANPoint Control from Symantec. In the past it was frequently the case that storage equipment from different manufacturers did not understand each other in the network, as they either did not have suitable interfaces (APIs = Application Programming Interfaces) or these were not compatible with those of other manufacturers. In order to make storage networks user-friendly for companies, SNIA (the Storage Networking Industry Association, which includes almost all storage manufacturers) organized a management initiative to create a standard for all devices. Many years of work were spent by the SNIA boards before they submitted a standardization proposal known as the Storage Management Initiative Specification (SMI-S). The manufacturers now work together much more closely, exchange information about their interfaces and ensure mutual licensing. The communication interfaces covered by SMI-S now often have priority over the proprietary APIs. According to Frank Bunn, SNIA Europe and storage specialist at Symantec, SMI-S has an important long-term influence on storage management: SMI-S is not an overnight solution. It is a constant process which began as early as 2002. SMI-S creates better interoperability. It enables the standardized management of different products and provides a consistent view of the SAN and NAS environment. Users are often very enthusiastic as they can finally see their entire storage environment reflected in SMI-S. Customers often do not even know which storage equipment they have, let alone the quantity. But that is just the first step. The second step can greatly facilitate the management of storage systems. [4] As far as Bunn is concerned, the subject of SAN was previously predominantly controlled by the larger companies. Small and medium-sized companies were more skeptical, according to the motto: too complex, too expensive and doesn't work anyway. Bunn: And they were right more often than not. However, SMI-S does make SAN management much easier. Partners who are not complete SAN specialists can thus implement and support storage networks. SMI-S versions 1.1 and 1.2 take Fibre Channel SANs as well as NAS and iSCSI into consideration, which greatly expands the environment for integrators.
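Since SMI-S builds on the DMTF's CIM/WBEM standards, a management client typically queries an SMI-S provider over WBEM. The following rough sketch uses the open-source pywbem library; the provider address, credentials and namespace are placeholders, and which CIM classes are actually available depends on the vendor's SMI-S implementation.

```python
# Rough sketch of querying an SMI-S provider via CIM/WBEM with pywbem.
# Host, credentials and namespace are placeholders; the classes an array
# exposes depend on the vendor's SMI-S implementation.
import pywbem

conn = pywbem.WBEMConnection("https://smis-provider.example.com:5989",
                             ("admin", "secret"),
                             default_namespace="interop")

# List the management profiles the provider claims to support ...
for profile in conn.EnumerateInstances("CIM_RegisteredProfile"):
    print(profile["RegisteredName"], profile["RegisteredVersion"])

# ... and enumerate storage volumes (LUNs) as seen through the standard model.
for vol in conn.EnumerateInstances("CIM_StorageVolume"):
    print(vol["ElementName"], vol["BlockSize"], vol["NumberOfBlocks"])
```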


However, the standardization process has not been completed despite the many years of effort. Not every manufacturer implements the adapted interfaces in its devices in a way that is compatible with other providers. The mutual licensing process often takes much longer than the topic itself would actually require. Furthermore, various SMI-S versions are in use. These circumstances have resulted in user acceptance not being particularly high.

Section 7

Virtualization some catching up is necessary regarding storage topics


Everyone worldwide is talking about virtualization. This technology is meanwhile said to be truly fantastic, and the corresponding product announcements in this sector are numerous. However, many users are still wary of using virtualization in their own company. IDC market researchers determined in a study carried out in 2008 that only 33 % of companies in Western Europe have introduced server virtualization from VMware or Citrix XenSource, and 67 % have done nothing yet in this sector. These percentages are even lower in storage: only 15 to 20 % have already looked at virtualization. The utilization of storage systems is normally very low. The degree of utilization can be increased very quickly with virtual systems, in which several devices or hundreds of hard disks are combined into logical units. The potential is enormous and a great deal of money could be saved.

According to IDC, companies are holding back from using virtualization in their storage environment because they do not regard it as absolutely essential. Storage is often regarded merely as unavoidable for data storage, backup or archiving, something that does not contribute in any way towards increasing productivity or improving operational procedures. Yet the experience gathered from successful server virtualization can also be applied to storage. Many users follow exactly this procedure in practice: they expand virtualization to other sectors on a step-by-step basis. Virtual storage space can help companies to use existing storage resources efficiently as well as to centralize and simplify administration. It was the x86 sector which drove virtualization forward, as the average system utilization there was very poor. To Dr. Joseph Reger, CTO (Chief Technology Officer) at Fujitsu Siemens Computers, this is all very obvious: Approximately 10 % utilization is simply a bad value compared to existing systems such as mainframes or Unix platforms. [1] Of all the options available to improve the situation, hardware virtualization is the best: The reason is that the superior layers no longer need to take care of this, says Reger.


By superior layers he means everything above the hardware layer, i.e. operating systems and applications, which also benefit from virtualization. Reger continues his explanation: By simulating various machines, pieces of hardware that do not even exist, the result is a peculiar layer of transparency. With this technology, operating systems and applications need not even know that they are running in a virtualized environment. Consequently, the average degree of utilization was dramatically increased and a great deal of money was saved. [2] According to Reger there are in principle three large areas in which virtualization technologies can be applied today: hardware virtualization, operating system virtualization and application virtualization. With the first group it is a matter of pretending that more hardware is available than actually exists. This applies to servers and storage in the same way. You simulate virtual levels of hardware that do not even exist. Physically, the hardware exists only once; virtually, however, more is made of it. This means that the operating system does not know it is running on virtual instances; from the view of the operating system there really are ten different servers or storage arrays. If the operating system is virtualized, this means that the application thinks there are several instances of the operating system, whereas in actual fact only one is running. [3] By means of thin provisioning it is, for example, possible to have more logical address space available in a storage array than is physically present. Storage virtualization is in part misused as a term by the IT industry. It originally meant the mapping of storage resources for the servers or applications involved, i.e. the consolidation or regrouping of physical storage units to form logical ones. Today it is usually used for the allocation of arbitrary storage resources, including data replication mechanisms. This covers terms and technologies such as volume management, virtual disk volumes, file systems, virtual tape, virtual ports or virtual SANs. What they all have in common is the approach of separating the physical view from the logical one, i.e. splitting physical storage into partitions or consolidating several physical hard disks into one or more logical units. When dealing with virtualization within the storage network itself (SAN device virtualization), i.e. virtualization at the switch infrastructure level, a distinction should be made between three different approaches:
1) With so-called in-band virtualization, the control entity for the data connection, the metadata and the data transport itself are on the same appliance. The scaling of these solutions is determined by the transport performance of this appliance. The providers FalconStor and DataCore were among the first manufacturers to offer such solutions.
2) With out-of-band virtualization, a single appliance takes care only of the metadata and of controlling the data path, while the host or server organizes the transport of the storage data to and from the storage devices.


3) The third approach consists in separating the control entity and the data path, which is done by an intelligent network device. This technology is known as Split Path Architecture for Intelligent Devices (SPAID). Switch manufacturers like Brocade and Cisco provide suitable devices for this purpose. The separation of the instances performed here increases data transport speed and enables the concept to scale.
Such virtualization solutions normally have two goals. The first is to remove the constraints of a particular storage array and/or manufacturer. The second is to provide manufacturer-independent data services, such as pool building (consolidation of storage capacities of the same service quality) and data replication, for example data snapshots, remote mirroring or disaster recovery. Something all these virtualization solutions have in common is that they permit a coordinated selection of the optimal storage arrays for a certain task. In this way, storage resources can be made available at will and changed dynamically, independent of the storage array. Only the basic configuration of the elements to be virtualized is still performed by the proprietary management applications. Therefore, users have to use this element manager together with the selected virtualization software. The majority of users of virtualization solutions use them in the sense of improved storage management. Virtualization increases the users' freedom of choice: solutions from several storage array manufacturers can be used together in a pool. Storage systems are frequently partitioned completely during the first installation and afterwards are only managed via virtualization. The approach has proved successful for years, particularly in very dynamic environments, for example with service providers or users with a large number of small application islands, such as in public administration. Wherever high scaling is required, for thousands of LUNs and hundreds of servers, the use of split-path technology is absolutely necessary. [4] In contrast to server virtualization, storage virtualization has not yet made the breakthrough, nor is a market standard becoming apparent. This is certainly also due to the fact that with the LUN concept every SAN storage array already offers rudimentary hard disk virtualization. However, increasing importance is being attached to online storage virtualization in virtualized server environments [5]. It is a different story entirely with file-based storage systems, the NAS systems. There are a number of very promising approaches to file virtualization, but a market standard has not established itself here either [6]. The most progress can be seen in storage virtualization with magnetic tapes. When it comes to backup on tape, Virtual Tape Libraries (VTL) are at present best practice. CentricStor VT is currently the leading virtual tape product in data centers.
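The pool-building idea mentioned above, i.e. carving logical volumes out of capacity contributed by several arrays of the same service class, can be sketched as follows (array names and sizes are invented for illustration):

```python
# Minimal sketch of pool building in storage virtualization: capacity from several
# physical arrays is gathered into one pool, and logical volumes are carved out of
# it regardless of which array the extents come from.
class VirtualPool:
    def __init__(self):
        self.arrays: dict[str, int] = {}       # array name -> free GB
        self.volumes: dict[str, list[tuple[str, int]]] = {}

    def add_array(self, name: str, free_gb: int) -> None:
        self.arrays[name] = free_gb

    def create_volume(self, name: str, size_gb: int) -> None:
        """Allocate a logical volume from whichever arrays still have free capacity."""
        extents, remaining = [], size_gb
        for array, free in self.arrays.items():
            take = min(free, remaining)
            if take:
                extents.append((array, take))
                self.arrays[array] -= take
                remaining -= take
        if remaining:
            raise RuntimeError("pool has too little free capacity")
        self.volumes[name] = extents

pool = VirtualPool()
pool.add_array("array_a", 500)
pool.add_array("array_b", 800)
pool.create_volume("sap_data", 900)             # spans both arrays transparently
print(pool.volumes["sap_data"])                 # [('array_a', 500), ('array_b', 400)]
```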

Section 8

The storage strategy of Fujitsu Siemens Computers and its partners


Fujitsu Siemens Computers is Europe's largest storage provider and offers its customers everything they need for their storage infrastructure from a single source. In addition to a wide range of products from leading manufacturers such as EMC, Brocade, NetApp, CA, Sun and Symantec, Fujitsu Siemens Computers has developed its own range of storage products designed to meet the particular requirements of its users. The offering also includes a range of sophisticated services, from maintenance to Managed Storage.

Until recently every application had its own fixed, allocated infrastructure: a collection of servers, network components and storage systems. Now, however, the aim is to allocate only those infrastructure resources that the applications actually require. This objective is achieved via virtualization: large resource pools exist which can be used dynamically depending on the requirements involved. Fujitsu Siemens Computers is moving in this direction as part of its Dynamic Infrastructure strategy, of which storage is an integral part.

CentricStor VT: intelligent tape storage virtualization

One example of this approach is CentricStor VT. CentricStor VT (Virtual Tape) enables intelligent backup of all company data. The dual-target storage solution integrates the advantages of hard disk and magnetic tape, as the backup data can be stored on a wide range of storage media, autonomously and based on rules. Working with tape thus becomes fit for ILM (Information Lifecycle Management), and corporate data backup can be aligned flexibly with different Service Level Agreements (SLAs).


True Tape Virtualization (TTV) from CentricStor VT offers above-average connectivity and also enables a clear reduction in costs, as it consolidates all the backup storage media used. The CentricStor Grid Architecture (CGA) has excellent scaling features and offers far-reaching reliability for every data center. The report The Forrester Wave: Enterprise Open Systems Virtual Tape Libraries, published on 31st January 2008 by Forrester Research, examined 12 providers against 58 criteria. The result: Fujitsu Siemens Computers was found to be one of the market leaders for Virtual Tape Libraries. The Forrester report awards the company top marks for its strategy in the virtual tape appliance sector: Fujitsu Siemens Computers is the leader in host support, architecture and tape integration. And it continues: As the leading system provider in Europe the company has a solid basis of more than 300 customers and over 500 installed VTLs, whereby most of the installations are with major companies. Fujitsu Siemens Computers has solid growth regarding new customers and has one of the highest growth values for existing customers. Version 4 of CentricStor was specifically developed to solve the following problems for those responsible for data backup: data quantities are continuously increasing; archiving periods for data are becoming longer; and a second data center must be provided in case of a possible catastrophe. The positive assessment by Forrester is also confirmed by a study carried out by the former Gartner analyst Josh Krischer: In 1999 Fujitsu Siemens Computers launched its CentricStor virtual tape solution, which became the most open, flexible virtual tape library supporting most existing mainframe and major Unix and Windows operating systems and major tape libraries. CentricStor was conceptually designed as an appliance to be used with all types of tape software, servers, tape libraries and tape technology as well. Initially it was released by Fujitsu Siemens Computers to support the BS2000/OSD, MVS mainframe platforms and ADIC tape libraries; however, over the years with constant certifications for major operating systems, backup applications, tape library systems and tape drive technologies it evolved into the most versatile VTL appliance, which can be deployed in almost any environment while protecting previous investments in tape automation. [1]


CentricStor FS: consolidated File Services


CentricStor FS is an innovative scale-out storage system which offers file services at high speed and with high reliability. It offers practically unlimited scaling in terms of storage capacity, data throughput and access times. The administration of CentricStor FS is simple: only one single file system has to be managed, irrespective of its size. Storage resources can thus be used more efficiently. Additional storage capacity can be provided easily, and CentricStor FS itself can also be installed very easily. If a component fails, automatic failover and failback ensure that the system keeps running.

Terri McClure, an analyst working for the Enterprise Strategy Group (ESG), concluded the following in a report: "The recently launched CentricStor FS from Fujitsu Siemens Computers is a high-end, file-based storage system that can be scaled at a very fine level, thus fulfilling the capacity, availability and performance requirements defined by file server consolidation initiatives and Web 2.0 applications. The use of standard components and the excellent cluster features make CentricStor FS a scalable, easy-to-manage file storage solution with a low starting price that has been specially designed for the real world of growing file quantities." [2]

FibreCAT SX series: fast storage systems for the data center

The FibreCAT SX40, SX60, SX80, SX80 iSCSI, SX88 and SX100 storage systems offer the quality and reliability required in data centers. FibreCAT SX systems are very fast storage systems; in particular, the FibreCAT SX88 is up to 50% faster than the FibreCAT SX80, and the FibreCAT SX100 offers twice the speed and performance of the FibreCAT SX80. This is due to Fibre Channel technology with 4 Gigabit per second, FibreCache and state-of-the-art RAID controller technology. Snapshots are integrated as standard. The revolutionary FibreCap technology protects the system in case of power failure.


FibreCAT SX systems are user-friendly and easy to put into operation. Administration is uncomplicated thanks to the intuitive web interface, and the systems are suitable for a wide range of applications. The analyst Hamish Macarthur from Macarthur Stroud International said the following about the FibreCAT SX series: "Managing and protecting the information assets of an organization is critical in today's markets. The systems in which the data is resident must be secure, reliable and easy to manage. The FibreCAT SX range supports reliable primary storage as well as the need for faster backup and recovery. The new arrays, with management tools included, will be a sound investment to meet the business and compliance requirements of small, medium and large organizations."

FibreCAT NX40 S4: reliable file and print server and SAN/NAS gateway for medium-sized customers

The FibreCAT NX40 S4 is a file and print server and a SAN/NAS gateway for medium-sized customers. The system's quality and reliability make it suitable for a wide range of applications in departments and branch offices. The FibreCAT NX40 S4 is a rack model with a height of only two rack units (2U). The system can be equipped with SATA-II or powerful SAS hard disks and provides, when fully configured, an internal storage capacity of 4.5 terabytes. Additional interfaces (SAS, Fibre Channel) extend the system capacity via external storage systems, in which case it operates as a SAN/NAS gateway. Expansion with the FibreCAT SX/CX family is particularly simple as the required software is already pre-installed. The FibreCAT NX40 S4 is also available as an option with iSCSI functionality. The system uses the Windows Storage Server 2003 R2 operating system with additional functions for the file and print server sector. FibreCAT NX40 S4 systems are pre-tested, pre-installed and pre-configured and thus ready for operation within minutes.


FibreCAT TX series: LTO for medium-sized customers

The FibreCAT TX series consists of the entry model TX08 and the two models TX24 S2 and TX48 S2 designed for medium-sized customers. The FibreCAT TX08 is a compact and reliable tape automation unit and enables a low-priced entry into the FibreCAT TX family. The FibreCAT TX08 is equipped with LTO technology, which is synonymous with large capacity, high speed and very low media costs. It can be equipped with an LTO-2, LTO-3 or LTO-4 half-height drive and provides a storage capacity of 12.8 terabytes (LTO-4, compressed). This corresponds to an automated data backup for approximately two weeks. The offer price includes a full version of the ARCserve Backup software as well as one year's on-site service. With the second generation of the magnetic tape systems, the FibreCAT TX24 S2 and FibreCAT TX48 S2, backup to tape is usually more energy-efficient than backup to hard disk, as power is only consumed when data is read from or written to tape. Both systems provide SAS and Fibre Channel interfaces to the server.
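
The quoted 12.8 terabytes can be checked with a quick back-of-the-envelope calculation. The sketch below is purely illustrative and assumes eight cartridge slots in the TX08, the native LTO-4 capacity of 800 GB per cartridge and the commonly quoted 2:1 compression ratio:

# Rough check of the quoted capacity figure; all values are assumptions,
# not official product specifications.
slots = 8                        # assumed number of cartridge slots in the TX08
native_tb_per_cartridge = 0.8    # LTO-4 native capacity: 800 GB
compression_ratio = 2.0          # typical 2:1 hardware compression for LTO-4

total_tb = slots * native_tb_per_cartridge * compression_ratio
print(f"Compressed capacity: {total_tb:.1f} TB")   # prints 12.8 TB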

FibreCAT Solution Lab

Fujitsu Siemens Computers not only offers its own products but also maintains a great deal of storage expertise within Europe. In addition to the numerous storage experts supporting projects on-site, the FibreCAT Solution Lab in Paderborn acts as a center of expertise. The main task of the FibreCAT Solution Lab is to constantly monitor quality with regard to reliable system and component availability. This also includes coordinating the seamless integration of the various FibreCAT systems into the comprehensive solution portfolio offered by Fujitsu Siemens Computers. Other tasks include adhering to all legal standards regarding environmental protection and to DIN ISO standards, and implementing them on time. On request the FibreCAT Solution Lab can also carry out customer-specific tests and create realistic application scenarios. Fujitsu Siemens Computers customers can thus see the FibreCAT systems operating in live demonstrations.


Managed Storage
Growth in storage means above all an immense increase in data quantities, which must all be managed, saved, provided and stored. The demand for online storage capacity increases, as does the demand for backup storage volumes, and no limits to these growth rates are appearing on the horizon. Against this backdrop SAP asked itself whether it wanted to continue managing the required storage volumes itself or to practice what its own hosting experts recommend to their customers: outsource the work that is not part of the core competence and concentrate on the elements essential for the core business. SAP managers would thus place the operation and support of their process-supporting storage infrastructure into the skilled hands of external specialists. SAP found such a competent partner in Fujitsu Siemens Computers. Fujitsu Siemens Computers took on the role of general contractor for SAP and the entire responsibility for providing online storage capacity for data backup: 4 petabytes monthly at the start, later rising to more than 200 terabytes daily. Furthermore, appropriate reserve capacities were provided in order to meet any additional requirements in time. Fujitsu Siemens Computers thus supports one of the largest Managed Storage projects in Europe and also manages the cooperation with the strategic partners involved, namely EMC and NetApp, which provide products in the SAN and NAS environment and whose specialists are involved in the corresponding services.

Storage: one part of the whole picture

It is not enough in the storage world to simply move a few levers to control this vast flow of data. A paradigm change is required instead of individual steps. The job at hand is to connect the various technologies into end-to-end solutions that are easy to manage and to operate. Storage is not a solitary topic, but it is certainly an important element in the overall dynamic data center. Fujitsu Siemens Computers has therefore developed a strategy known as Dynamic Infrastructure. The storage solutions from Fujitsu Siemens Computers are a fixed part of this strategy and are components used in implementing the objective of providing end-to-end IT solutions that generate a maximum business contribution at minimum total cost. FlexFrame for mySAP is one of the most successful and best-known examples of such solutions for a virtualized infrastructure consisting of servers and storage systems.


Customers benefit even more from the fact that Fujitsu Siemens Computers works very closely with many partners on storage matters. One example of this excellent cooperation can be seen in the quotes from the CEOs of our partners EMC and NetApp. Joe Tucci, EMC President & CEO: "The combination of EMC's networked storage solutions and the server-based solutions from Fujitsu Siemens Computers creates a wide-ranging offer of end-to-end infrastructure solutions that meet the requirements of our customers. EMC solutions play a central role in the Fujitsu Siemens Computers vision of a Dynamic Data Center, and we will continue to concentrate our joint operations on offering our customers the most comprehensive solution portfolio available on the market." Dan Warmenhoven, CEO at NetApp: "The strategic partnership with Fujitsu Siemens Computers has contributed a lot to our success and is still growing in the EMEA region. Our jointly developed solution FlexFrame for mySAP Business Suite, the implementation of a fast backup solution for Oracle and the Center of Excellence which has been set up with Oracle are just some of the excellent examples that have resulted from our cooperation so far."

Forecast

Future storage trends

Companies are facing many challenges regarding storage systems. In particular, the continuous increase in data creates demand for new ideas on how to make storage more efficient. In contrast to PCs and servers, where the main market players define the standards, the storage market also has innovative newcomers joining the fray with unusual new ideas that move the market in new directions. Having looked at the main storage technologies in the preceding sections, let us look at current storage trends and see which ones have the potential to make a major breakthrough in the way that a company stores its data. We have selected five relevant trends:
1. Fibre Channel over Ethernet (FCoE)
2. Massive Array of Idle Disks (MAID)
3. Flash-based Solid State Disks (SSDs)
4. Security (Encryption of Data at Rest)
5. New Storage Architectures (Grid Storage)

1. Fibre Channel over Ethernet (FCoE)


Although two of the three leading storage technologies (NAS, iSCSI) are based on Ethernet, it is the third, the Fibre Channel SAN, which by far dominates the market. SANs are typically implemented on a Fibre Channel basis. Fibre Channel networks require dedicated HBAs (host bus adapters, i.e. server plug-in cards) and switches. These network components are clearly more expensive than Ethernet cards and switches, and they also require specific Fibre Channel know-how. The question thus arises whether SAN storage networks can also be set up on an Ethernet basis, so that only one type of network is needed, as this would greatly reduce costs. Instead of four cables (2x Ethernet, 2x Fibre Channel) the servers would only need two FCoE cables. iSCSI indeed started with this promise, but Fibre Channel SAN purists were never really convinced, for the following reasons: in contrast to FCoE, where the existing Fibre Channel tools can still be used, iSCSI replaces these protocol layers, and classic Ethernet is not deterministic. Leading network providers are now competing to merge storage networks and server networks in new ways; the Ethernet standard is to be extended so that it can provide the same quality of service as a Fibre Channel network.


The result is the so-called DCE (Data Center Ethernet), via which the previous TCP/IP data traffic can be handled in exactly the same way as Fibre Channel traffic (FCoE). In short: 10 Gb Ethernet is available, and 40 Gb as well as 100 Gb is on the horizon. At the same time, however, there are plans to extend Fibre Channel beyond 8 Gb/s, even up to 16 Gb/s. A conversion to FCoE requires a considerable amount of infrastructure investment, which in due course will be balanced by the savings from no longer operating two parallel networks.
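
The core idea of FCoE can be illustrated in a few lines of code: a complete Fibre Channel frame travels unchanged inside an Ethernet frame that carries the FCoE EtherType. The sketch below is a deliberately simplified illustration; the real FCoE header defined in FC-BB-5 also carries version bits, reserved fields and start- and end-of-frame delimiters, which are omitted here:

import struct

FCOE_ETHERTYPE = 0x8906   # EtherType value assigned to FCoE traffic

def encapsulate_fcoe(fc_frame, src_mac, dst_mac):
    """Wrap a complete, unmodified Fibre Channel frame in an Ethernet frame.

    Deliberately simplified: the real FCoE header also carries version bits,
    reserved fields and start-/end-of-frame delimiters, omitted here.
    """
    eth_header = struct.pack("!6s6sH", dst_mac, src_mac, FCOE_ETHERTYPE)
    # The FC frame itself is not translated (unlike iSCSI); it is only wrapped,
    # so existing Fibre Channel zoning and management tools keep working.
    return eth_header + fc_frame

frame = encapsulate_fcoe(b"...24-byte FC header plus payload...",
                         src_mac=b"\xaa" * 6, dst_mac=b"\xbb" * 6)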

2. Massive Array of Idle Disks (MAID)

The enterprise hard disks used today run without interruption on a 24x7 basis. This not only wears out the mechanical parts but also continually consumes energy. In an alternative storage array on a MAID basis, only a small part of the hard disks is active at any given time. Hard disks that have not been accessed for some period of time are switched off. This is only sensible if the data really is not accessed for a longer period, as frequent switching on and off would use even more power and negatively affect the disk lifespan. MAID is thus mainly used in the archive and backup sector, where the price per stored terabyte is important; SATA disks are therefore typically used in MAID systems. MAID is a major step towards significant energy savings and Green IT. However, there are a few obstacles to be surmounted. The disks should be periodically powered up and checked to make sure that the data can still be read. It is also possible to reduce their rotational speed instead of switching them off completely, which still saves a significant amount of power and reduces the danger of undetected disk errors.
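
The spin-down policy described above can be sketched in a few lines. The timeouts and the spin_up()/spin_down()/verify_readable() calls are hypothetical stand-ins for array firmware functions, not a real MAID controller:

import time

IDLE_TIMEOUT_S = 30 * 60           # assumed policy: spin down after 30 minutes without I/O
SCRUB_INTERVAL_S = 7 * 24 * 3600   # assumed policy: wake idle disks weekly to verify data

class MaidDisk:
    """Illustrative per-disk state machine; spin_up/spin_down/verify_readable
    stand in for calls into the array firmware."""

    def __init__(self, disk_id):
        self.disk_id = disk_id
        self.spinning = True
        self.last_io = time.time()
        self.last_scrub = time.time()

    def on_io(self):
        # Any read or write wakes the disk and resets the idle timer.
        if not self.spinning:
            self.spin_up()
        self.last_io = time.time()

    def housekeeping(self, now):
        # Spin down only after a long idle period: frequent start/stop cycles
        # cost more energy and shorten disk life, as noted above.
        if self.spinning and now - self.last_io > IDLE_TIMEOUT_S:
            self.spin_down()
        # Periodically wake idle disks and check that their data is still readable.
        if not self.spinning and now - self.last_scrub > SCRUB_INTERVAL_S:
            self.spin_up()
            self.verify_readable()
            self.last_scrub = now
            self.spin_down()

    def spin_up(self):
        self.spinning = True

    def spin_down(self):
        self.spinning = False

    def verify_readable(self):
        pass  # placeholder for a surface scan or checksum verification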

3. Flash-based Solid State Disks (SSDs)

Flash-memory-based SSDs are, in terms of price per capacity, more expensive than conventional hard disks. But thanks to their much faster read and access speeds, SSDs that are optimized for enterprise use are already better than today's hard disks with regard to price per IOPS. Furthermore, they need less energy, hardly generate any heat and have no problems with rotational vibration, as there are no moving parts. As the price of flash chips has fallen by an annual average of 50% in recent years, such disks are of great interest for a variety of situations. Today a large number of hard disks is often used purely to achieve comparable speeds, because the performance comes from parallel access to many spindles. In such scenarios, only two SSDs (two in order to ensure data redundancy via mirroring) could be used to replace a dozen hard disks in RAID 5 or RAID 6 arrays. This results in considerable energy savings.
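
A back-of-the-envelope comparison makes the argument concrete. The IOPS and power figures below are rough assumptions for illustration only; real values vary considerably by model:

# All figures are rough assumptions for illustration; real values vary by model.
HDD_IOPS = 180        # random IOPS of one 15k rpm enterprise hard disk (assumed)
SSD_IOPS = 10_000     # random read IOPS of one enterprise flash SSD (assumed)
HDD_WATTS = 15        # active power draw per enterprise hard disk (assumed)
SSD_WATTS = 3         # active power draw per SSD (assumed)

hdd_spindles = 12                      # a dozen disks striped for performance (RAID 5/6)
array_iops = hdd_spindles * HDD_IOPS   # about 2,160 IOPS from parallel spindles

ssds = 2                               # a mirrored SSD pair for redundancy
print(ssds * SSD_IOPS >= array_iops)                                 # True
print(hdd_spindles * HDD_WATTS, "W versus", ssds * SSD_WATTS, "W")   # 180 W versus 6 W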


4. Security (Encryption of Data at Rest)

Data management and data storage have always gone hand-in-hand with security: there would be serious consequences if sensitive data fell into the wrong hands. Encryption helps to avoid misuse: the data can neither be read nor used by anyone who is not authorized to do so. This also applies if data media (e.g. hard disks) are faulty or supposedly faulty and have to be replaced: all data on such media must be made unreadable. Unfortunately, there are many ways to approach encryption. It can be carried out at various levels: at application, file system, network and switch level, as well as in devices such as dedicated encryption appliances, tape libraries and disk drives. Key management for an entire data center is currently a problem, as most manufacturers offer their own key management and standardization does not yet exist. And without such a standard there is no guarantee that authorized users can read the data and exchange it as required without any problems.
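
At the application level, encrypting data at rest typically means authenticated encryption such as AES-GCM. The minimal sketch below uses the Python cryptography package; as the text stresses, the hard part in practice is not this call but the management and exchange of the key, which is exactly what is not yet standardized:

import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_block(plaintext, key):
    # A fresh 96-bit nonce per object, stored alongside the ciphertext.
    nonce = os.urandom(12)
    return nonce + AESGCM(key).encrypt(nonce, plaintext, None)

def decrypt_block(blob, key):
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, None)

# The hard part in a data center is not this call, but where the key lives,
# how it is rotated, and whether another vendor's system can use it later.
key = AESGCM.generate_key(bit_length=256)
stored = encrypt_block(b"customer record 4711", key)
assert decrypt_block(stored, key) == b"customer record 4711"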

5. New Storage Architectures (Grid Storage)

Today's storage systems are usually built in a monolithic manner and are specially designed as storage systems. As fewer storage systems are produced than PCs or standard servers, their production is expensive. And as with all monolithic systems, the maximum configuration is limited for design reasons. A paradigm change will revolutionize the storage industry: take standard servers and the large number of mass-produced, simple standard RAID systems and turn them into one storage system by means of an intelligent software layer. Such scale-out systems can be extended almost arbitrarily by just adding more standard components. Growth has no practical limits, and the systems offer more capacity and more access speed as they increase in size. This architecture gives systems the scaling and flexibility urgently required in today's file-driven world and is also the backbone of Internet infrastructure (Cloud Computing). CentricStor FS is one of the first incarnations of this new storage generation.
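
A common building block for such an intelligent software layer is consistent hashing, which spreads objects across nodes and lets capacity grow by simply adding nodes, while only a small fraction of the data has to move. The sketch below illustrates the principle only; it is not the mechanism used by CentricStor FS or any specific product:

import bisect
import hashlib

class HashRing:
    """Map object names to storage nodes; adding a node moves only a fraction of the data."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._ring = []          # sorted list of (hash, node) pairs
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        # Virtual nodes smooth out the distribution across physical nodes.
        for i in range(self.vnodes):
            self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()

    def node_for(self, object_name):
        h = self._hash(object_name)
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["node1", "node2", "node3"])
print(ring.node_for("/exports/projects/report.pdf"))
ring.add_node("node4")   # scale out: most objects keep their old placement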

Remarks
Section 1
[1] The triumph of large numbers; 10 years of Google; Neue Zürcher Zeitung, April 25, 2008.
[2] Details about typeface development from Charles Panati, The Browser's Book of Beginnings: Origins of Everything Under, and Including, the Sun, New York 1998, page 67 ff.
[3] Der Spiegel, edition dated 11. 8. 2008, cover feature Addicted to data, page 88.
[4] Many authors have written about the positive and negative effects of this development, for example Neil Postman, Joseph Weizenbaum, Nicholas Negroponte or Nicholas Carr. See the interview in Hendrik Leitner / Hartmut Wiehr, Die andere Seite der IT: Business-Transformation durch Services und dynamische Infrastruktur, Munich 2006, page 215 ff.
[5] Cf. the following passage from an interview in the German weekly Die Zeit. Question: "Which development has changed the way we handle knowledge most in recent years?" Answer: "Two. Firstly: hard disks now hardly cost anything. It is no longer a utopian idea to have man's entire published works on disk. Secondly, nobody in the world is more than a day's walk from an Internet café. We now have the communication infrastructure to provide the world's great libraries to youngsters in Uganda or the poorer areas of the USA or Germany." (Interview with Brewster Kahle, Director of the Internet Archive in San Francisco; Die Zeit, 17. 1. 2008).

Section 2
[1] The fact that major storage manufacturers have been buying up companies such as Cognos, Business Objects, Hyperion, Documentum or FileNet, which developed software for document management (DMS) or business intelligence (BI), proves that storage hardware and the criteria for stored data are merging. It can also be seen as an attempt to integrate classic storage equipment with ILM or HSM.
[2] Steve Duplessie, File Infrastructure Requirements in the Internet Computing Era, Enterprise Strategy Group (ESG), July 2008, page 5.
[3] Fred Moore / Horison Information Strategies, Storage Spectrum (2007), page 76.
[4] "Every fourth user has implemented ILM at a certain point or in a certain area, but only 3.5% use such solutions throughout the company. Users who are seriously looking at this topic must realize that ILM can neither simply be bought as a single product nor implemented via a one-off project." Wolfram Funk, Experton Group (quoted in the German magazine Computerwoche 46/2006, page 40).
[5] See also the interview with Michael Peterson at http://www.searchstorage.de/topics/archiving/e-mail/articles/107136/.
[6] Cf. for example Dan Warmenhoven, NetApp CEO: "My view (regarding ILM) is that a work-intensive data management process should not be set up if there is an automated one.


And ILM is very work-intensive. (...) The user wants to migrate his data from expensive storage forms to cheaper ones. And he requires an online archive in order to retrieve the data quickly and also for compliance reasons. NetApp talks of data archiving and migration. Calling all that ILM is confusing." (Interview in Computerwoche, 9. 11. 2007).

Section 3
[1] Cf. Hartmut Wiehr, Disk systems: technology and products, in iX extra, edition 12/2007 (free download: http://www.heise.de/ix/extra/2007/ie0712.pdf).
[2] Cf. the forecast in this book, page 69.
[3] There is an early warning system in professional disk systems. During running operation, disks report whether they are about to fail. Drives which have not been in use for a while are addressed periodically to make sure that they still work: do they react immediately to a signal, or only after retries? The latter would suggest that the magnetic surface is no longer 100%. All the corresponding information is collected, and if certain threshold values are exceeded, one of the spare disks is triggered by the system or a message is sent to the service team.
[4] ICP Vortex has provided a white paper about the various RAID levels. The document is in German and can be downloaded from: http://vortex.de/NR/rdonlyres/82BA6504-885D444E-AC71-7AC570CF56A3/0/raid_d.pdf.
[5] See Günter Thome / Wolfgang Sollbach, Fundamentals of Information Lifecycle Management, Berlin Heidelberg 2007, page 214 ff.

Section 4
[1] For the history of the Ethernet protocol and the Internet, which were developed on behalf of the US Army and were intended to ensure the transportation of messages in the event of war, see http://www.informatik.uni-bremen.de/grp/unitel/referat/timeline/timeline-2.html.
[2] For more information about SANs and Fibre Channel see http://www.snia.org/education/tutorials/ and http://www.fibrechannel.org/.
[3] Seen from a historical viewpoint, CNT first took over its competitor Inrange, but was then itself purchased by McData shortly afterwards. Brocade now unites all the former FC switch providers against Cisco, the company with the largest market force in the network sector. Seen globally, Brocade is still the market leader for FC switches, whereas Cisco has continuously extended its market share, particularly in the USA.
[4] Good interpretations of the development of the IT industry are to be found in Paul E. Ceruzzi's A History of Modern Computing, Cambridge and London 2003, and in Clayton M. Christensen's The Innovator's Dilemma, New York 2000.
[5] Cf. Mario Vosschmidt / Hartmut Wiehr, Storage networks and their instruments of administration; in: iX, vol. 8/2008, page 122 ff.
[6] A good overview is to be found in the brochure NetApp Products & Solutions.


[7] Interview with David Hitz in project 57 Journal for Business Computing and Technology, Special 01/05, page 39 ff. Cf. also the interview with NetApp CEO Dan Warmenhoven in Computerzeitung, issue 38/2008, page 10.

Section 5
[1] Many storage manufacturers supply their own backup programs together with their hardware. EMC, a partner of Fujitsu Siemens Computers, has, after the takeover of Legato in 2003, a complete solution (NetWorker) in its portfolio. Of the independent providers, CA with ARCserve and Symantec with NetBackup are particularly worth mentioning. CommVault and BakBone can frequently be found in the Windows environment.
[2] See in this respect Dave Russell / Carolyn DiCenzo, MarketScope for Enterprise Backup/Recovery Software 2008, April 2008.
[3] An example of error correction is ECC (Error Correction Code). The correction is used to detect errors during the storage and transfer of data. The errors are then automatically remedied in the second step.
[4] Hard disks with S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) have a function that constantly monitors them, with technical values such as temperature, start-up time or track reliability of the read and write heads being controlled.
[5] Specialist service providers, such as Kroll Ontrack or Seagate Services, can often recover hard disks damaged by fire and water and thus save the lost data.
[6] See Hartmut Wiehr, Begrenzte Lebensdauer (= Limited life-span): Disks, Status Quo and Trends; in iX extra, issue 12/2007; download: http://www.heise.de/ix/extra/2007/ie0712.pdf. Other terms and contexts relating to this topic are explained there.
[7] For more details see www.speicherguide.de: Focus on tape drives and tapes (http://www.speicherguide.de/magazin/bandlaufwerke.asp?mtyp=&lv=200). The durability of DVDs and Blu-ray Discs (BD) is subject to very large fluctuations. The market is wide-spread, particularly in the consumer segment. A good overview of the durability of these optical media with due regard to their suitability for archiving purposes can be found in c't Magazin für Computertechnik, issue 16/2008, page 116 ff.
[8] As early as 1987 StorageTek presented the 4400 Automated Cartridge System (ACS), which was the basis for the 4400 Tape Libraries (Nearline) introduced in 1988. The large PowderHorn libraries introduced in 1993 were a further development of this technology, with which very fast access to tape cartridges is possible. A great many PowderHorns are still in use today, and after StorageTek was taken over by Sun in 2005 the new owner had to repeatedly extend the maintenance cycles on account of the pressure exerted by major customers. These customers saw no reason to phase out their tried-and-trusted tape libraries and replace them with follow-up models.
[9] Cf. The Top 10 Storage Inventions of All Time, in Byteandswitch, June 16, 2008.
[10] Hartmut Wiehr, Dedup turns backup inside out, in Computerzeitung, issue 28/2008, page 10.


Section 6
[1] Fujitsu Siemens Computers has developed two dynamic infrastructure solutions, FlexFrame for SAP and FlexFrame for Oracle, which combine storage, server and network resources on a single platform. Resources can thus be assigned and moved in run mode in a virtual environment, depending on current requirements.
[2] See section 7 for more about virtualization.
[3] See Mario Vosschmidt / Hartmut Wiehr, Gut eingebunden: Speichernetze und ihre Verwaltungsinstrumente (Well incorporated: storage networks and their administration tools), in: iX Magazin für professionelle Informationstechnik, issue 8/2008, page 123.
[4] Published in tecchannel, 26th January 2006. For more information, see the article SMI-S is holding the storage networking industry together (Mit SMI-S hat sich die Storage-Networking-Industrie ihren inneren Zusammenhalt gegeben), www.searchstorage.de (September 5, 2007).

Section 7
[1] Virtualization drives IT industrialization forward: interview by Hartmut Wiehr with Dr. Joseph Reger, at www.searchstorage.de, April 10, 2004: http://www.searchstorage.de/topics/rz-techniken/virtuelle-systeme/articles/117015/.
[2] Ibid.
[3] Ibid.
[4] Mario Vosschmidt / Hartmut Wiehr, Gut eingebunden: Speichernetze und ihre Verwaltungsinstrumente (Well incorporated: storage networks and their administration tools), in: iX Magazin für professionelle Informationstechnik, issue 8/2008, page 124.
[5] LUN masking (also LUN mapping) means that only the storage area an application needs to perform its work is allocated and made visible to it. As a result of this segmentation, general access to certain storage areas is prohibited, which at the same time increases the security of all applications. With SAN zoning the same principle is applied to the division of a network into virtual subnetworks, so that servers of one zone cannot access storage systems of another zone.
[6] Steve Duplessie, File Infrastructure Requirements in the Internet Computing Era, Enterprise Strategy Group (ESG), July 2008.

Section 8
[1] Josh Krischer, Krischer & Associates, CentricStor Virtual Tape: the Swiss Army Knife for data protection, September 2008.
[2] Terri McClure, Enterprise Strategy Group (ESG), CentricStor FS von Fujitsu Siemens Computers, July 2008.

Glossary
A
Array (Storage Array) A subsystem which houses a group of disks (or tapes), together controlled by software usually housed within the subsystem. Asynchronous Replication After data has been written to the primary storage site, new writes to that site can be accepted, without having to wait for the secondary (remote) storage site to also finish its writes. Asynchronous Replication does not have the latency impact that synchronous replication does, but has the disadvantage of incurring data loss, should the primary site fail before the data has been written to the secondary site.

C
Compliance In data storage terminology, the word compliance is used to refer to industry-wide government regulations and rules that cite how data is managed and the need for organizations to be in compliance with those regulations. Compliance has become a major concern for organizations and businesses, due in large part to increasing regulatory requirements, which often require organizations to invest in new technologies in order to address compliance issues. Continuous Data Protection (CDP) Refers to backing up computer data by automatically saving a copy every time changes are made to that data. This event-driven backup works during productive computing time and could slow down the performance of the IT infrastructure. Cluster A group of servers that together act as a single system, enabling load balancing and high availability. Clusters can be housed in the same physical location (basic cluster) or can be distributed across multiple sites for disaster recovery.

B
Backup/Restore A two step process. Information is first copied to non-volatile disk or tape media. In the event of computer problems (such as disk drive failures, power outages, or virus infection) resulting in data loss or damage to the original data, the copy is subsequently retrieved and restored to a functional system. Block Data Raw data which does not have a file structure imposed on it. Database applications such as Microsoft SQL Server and Microsoft Exchange Server transfer data in blocks. Block transfer is the most efficient way to write to disk. Business Continuity The ability of an organization to continue to function even after a disastrous event, accomplished through the deployment of redundant hardware and software, the use of fault tolerant systems, as well as a solid backup and recovery strategy.

D
Data Deduplication Deduplication technology segments the incoming data stream, uniquely identifies these data segments, and then compares them to segments previously stored. If an incoming data segment is a duplicate of what has already been stored, the segment is not stored again but a reference is created for it (pointer). This process operates at a very low level of granularity or atomic level to identify as much redundancy as possible.


The trade-offs in this filtering process pertain to efficiency, speed and data size. DAS (Direct Attached Storage) DAS is storage that is directly connected to a server by connectivity media such as parallel SCSI cables. This direct connection provides fast access to the data; however, the storage is only accessible from that server. DAS includes internally attached local disk drives as well as externally attached RAID (redundant array of independent disks) or JBOD (just a bunch of disks) enclosures. Although Fibre Channel can be used for direct attachment, it is more commonly used in storage area networks. Disaster Recovery The ability to recover from the loss of a complete site, whether due to natural disaster or malicious intent. Disaster recovery strategies include replication and backup/restore.

Fault Tolerance Fault tolerance is the ability of computer hardware or software to ensure data integrity when hardware failures occur. Fault-tolerant features appear in many server operating systems and include mirrored volumes, RAID volumes, and server clusters. File Data Data which has an associated file system. Fibre Channel (FC) A high-speed interconnect used in storage area networks (SANs) to connect servers to shared storage. Fibre Channel components include HBAs, hubs, switches, and cabling. The term Fibre Channel also refers to the storage protocol. Fibre Channel over Ethernet (FCoE) A technology that encapsulates Fibre Channel frames in Ethernet frames, allowing FC traffic to be transported over Ethernet networks. Standardization work is in progress in different committees. Products are announced for 2009 or 2010. FCoE could be an alternative to classical Fibre Channel technology.

E
Ethernet Local area network (LAN) topology commonly operating at 10 megabits per second (mbps) over various physical media such as coaxial cable, shielded or unshielded twisted pair, and fiber optics. Future plans call for 1, 10 and 100 gigabit Ethernet versions. Ethernet standards are maintained by the IEEE 802.3 committee.

G
Global File System In some configurations, as with clusters or multiple NAS boxes, it is useful to have a means to make the file systems on multiple servers or devices look like a single file system. A global or dispersed file system would enable storage administrators to globally build or make changes to file systems. To date this remains an emerging technology.

F
Failover In the event of a physical disruption to a network component, data is immediately rerouted to an alternate path so that services remain uninterrupted. Failover applies both to clustering and to multiple paths to storage. In the case of clustering, one or more services (such as Exchange) is moved over to a standby server in the event of a failure. In the case of multiple paths to storage, a path failure results in data being rerouted to a different physical connection to the storage.

H
High Availability A continuously available computer system is characterized as having essentially no downtime in any given year. A system with 99.999% availability experiences only about five minutes of downtime.


In contrast, a high availability system is defined as having 99.9% uptime, which translates into a few hours of planned or unplanned downtime per year. HBA (Host Bus Adapter) The HBA is the intelligent hardware residing on the host server which controls the transfer of data between the host and the target storage device. Hierarchical Storage Management (HSM) A data storage system that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices, such as enterprise disk drives (Fibre Channel, SAS), are more expensive (per byte stored) than slower devices, such as low-speed disks (SATA, desktop disks) or magnetic tape drives. While it would be ideal to have all data available on high-speed devices all the time, this is prohibitively expensive for many organizations. Instead, HSM systems store the bulk of the enterprise's data on slower devices, and then copy data to faster disk drives when needed.


J
JBOD (Just a Bunch of Disks) As the name suggests, a group of disks housed in its own box; JBOD differs from RAID in not having any storage controller intelligence or data redundancy capabilities.

L
LAN Local Area Network. Hardware and software involved in connecting personal computers and peripherals within close geographic confines, usually within a building or adjacent buildings. Load Balancing Referring to the ability to redistribute load (read/write requests) to an alternate path between server and storage device, load balancing helps to maintain high performance networking. LTO Linear Tape Open. A half-inch open tape technology; the LTO Ultrium format is a cartridge format targeted at ultra-high capacity requirements. LUN (Logical Unit Number) A logical unit is a conceptual division (a subunit) of a storage disk or a set of disks. Logical units can directly correspond to a volume drive (for example, C: can be a logical unit). Each logical unit has an address, known as the logical unit number (LUN), which allows it to be uniquely identified.

I
ILM (Information Lifecycle Management) The process of managing information growth, storage, and retrieval over time, based on its value to the organization. Sometimes referred to as data lifecycle management. iSCSI (Internet SCSI) A protocol that enables transport of block data over IP networks, without the need for a specialized network infrastructure, such as Fibre Channel. ITIL (Information Technology Infrastructure Library) ITIL refers to a documentation of best practice for IT Service Management. Used by many hundreds of organisations around the world, a whole ITIL philosophy has grown up around the guidance contained within the ITIL books and the supporting professional qualification scheme. ITIL consists of a series of books giving guidance on the provision of quality IT services, and on the accommodation and environmental facilities needed to support IT. ITIL has been developed in recognition of organisations' growing dependency on IT and embodies best practices for IT Service Management. ITIL is often implemented when different enterprises work together, and can also facilitate mergers and acquisitions.


LUN Masking A method to restrict server access to storage not specifically allocated to that server. LUN masking is similar to zoning, but is implemented in the storage array, not the switch.

M
MAN Metropolitan Area Network. A network capable of high-speed communications over distances up to about 80 kilometers. Metadata The information associated with a file but separate from the data in the file; required to identify data in the file and its physical location on a disk. Mirroring A disk data redundancy technique in which data is recorded identically, and either synchronously or asynchronously, on multiple separate disks to protect data from disk failures. When the primary disk is off-line, the alternate takes over, providing continuous access to data. Normally used for mission-critical data, mirroring is classified as a RAID 1 configuration and doubles disk costs.

Port The physical connection point on computers, switches, storage arrays, etc, which is used to connect to other devices on a network. Ports on a Fibre Channel network are identified by their Worldwide Port Name (WWPN) IDs; on iSCSI networks, ports are commonly given an iSCSI name. Not to be confused with TCP/IP ports, which are used as virtual addresses assigned to each IP address.

R
RAID (Redundant Array of Independent Disks) A way of storing the same data over multiple physical disks to ensure that if a hard disk fails a redundant copy of the data can be accessed instead. Example schemes include mirroring and RAID 5. Redundancy The duplication of information or hardware equipment components to ensure that should a primary resource fail, a secondary resource can take over its function. Replication Replication is the process of duplicating mission critical data from one highly available site to another. The replication process can be synchronous or asynchronous; duplicates are known as clones, point-in-time copies, or snapshots, depending on the type of copy being made.

N
NAS (Network Attached Storage) A NAS device is a server that runs an operating system specifically designed for handling files (rather than block data). Network-attached storage is accessible directly on the local area network (LAN) through LAN protocols such as TCP/IP. Compare to DAS and SAN.

S
SAN (Storage Area Network) A storage area network (SAN) is a specialized network that provides access to high performance and highly available storage subsystems using block storage protocols. The SAN is made up of specific devices, such as host bus adapters (HBAs) in the host servers, switches that help route storage traffic, and disk storage subsystems. The main characteristic of a SAN is that the storage subsystems are generally available to multiple hosts at the same time, which makes them scalable and flexible. Compare with NAS and DAS.

P
Partition A partition is the portion of a physical disk or LUN that functions as though it were a physically separate disk. Once the partition is created, it must be formatted and assigned a drive letter before data can be stored on it.


SAS/SATA SAS: Serial Attached SCSI. While SATA (Serial ATA) is designed for desktops, making it a good choice in storage environments requiring configuration simplicity or optimal cost/capacity, SAS delivers the high performance, scalability and reliability required for mainstream servers and enterprise storage. SCSI (Small Computer System Interface) A set of standards allowing computers to communicate with attached devices, such as storage devices (disk drives, tape libraries etc.) and printers. SCSI also refers to a parallel interconnect technology which implements the SCSI protocol. SCSI is available in two flavours: Parallel SCSI and Serial Attached SCSI. Parallel SCSI has been the standard in connectivity for more than 20 years, and is known for its stability and reliability. Serial Attached SCSI (SAS) is the newest generation of SCSI and supports both Serial ATA (SATA) and SAS drives. Snapshot A virtual copy of a device or filesystem. Snapshots imitate the way a file or device looked at the precise time the snapshot was taken. It is not a copy of the data, only a picture in time of how the data was organized. Snapshots can be taken according to a scheduled time and provide a consistent view of a filesystem or device for a backup and recovery program to work from. Solid State Disk (SSD) A solid state disk is a high-performance plug-and-play storage device that contains no moving parts. SSD components include either DRAM or EEPROM memory boards, a memory bus board, a CPU, and a battery card. Because they contain their own CPUs to manage data storage, they are a lot faster than conventional rotating hard disks; therefore, they deliver the highest possible I/O rates. SSDs are most effective for server applications and server systems, where I/O response time is crucial.

Data stored on SSDs should include anything that creates bottlenecks, such as databases, swap files, library and index files, and authorization and login information. Storage Controller Providing such functionality as disk aggregation (RAID), I/O routing, and error detection and recovery, the controller provides the intelligence for the storage subsystem. Each storage subsystem contains one or more storage controllers. Storage Resource Management (SRM) Refers to software that manages storage from a capacity, utilization, policy and event-management perspective. This includes bill-back, monitoring, reporting and analytic capabilities that allow you to drill down for performance and availability. Key elements of SRM include asset management, charge back, capacity management, configuration management, data and media migration, event management, performance and availability management, policy management, quota management, and media management. Switch An intelligent device residing on the network responsible for directing data from the source (such as a server) or sources directly to a specific target device (such as a specific storage device) with minimum delay. Switches differ in their capabilities; a director-class switch, for example, is a high-end switch that provides advanced management and availability features. Synchronous Replication In synchronous replication, each write to the primary disk and the secondary (remote) disk must be complete before the next write can begin. The advantage of this approach is that the two sets of data are always synchronized. The disadvantage is that if the distance between the two storage disks is substantial, the replication process can take a long time and slows down the application writing the data. See also asynchronous replication.


T
Tape Library In data storage, a tape library is a collection of magnetic tape cartridges and tape drives. An automated tape library is a hardware device that contains multiple tape drives for reading and writing data, access ports for entering and removing tapes and a robotic device for mounting and dismounting the tape cartridges without human intervention. To mount means to make a group of files in a file system structure accessible to a user or user group. Target A target is the device to which the initiator sends data. Most commonly the target is the storage array, but the term also applies to bridges, tape libraries, tape drives or other devices. TCP/IP Transmission Control Protocol/Internet Protocol. A set of transport and network layer protocols developed under the auspices of the U.S. Department of Defense. Has emerged as the de-facto standard for communications among Unix systems, particularly over Ethernet. Thin Provisioning Thin provisioning is most commonly used in centralized large storage systems such as SANs and also in storage virtualization environments where administrators plan for both current and future storage requirements and often over-purchase capacity, which can result in wasted storage. Since thin provisioning is designed to allocate exactly what is needed, exactly when it is needed, it removes the element of paid-for but wasted storage capacity. Additionally, as more storage is needed, additional volumes can be attached to the existing consolidated storage system. Tiered Storage Data is stored according to its intended use. For instance, data intended for restoration in the event of data loss or corruption is stored locally, for fast recovery. Data required to be kept for regulatory purposes is archived to lower-cost disks.

V
VTL (Virtual Tape Library) Refers to an intelligent disk-based library that emulates traditional tape devices and tape formats. Acting like a tape library with the performance of modern disk drives, data is deposited onto disk drives just as it would be onto a tape library, only faster. Virtual tape backup solutions can be used as a secondary backup stage on the way to tape, or as a standalone tape library solution. A VTL generally consists of a virtual tape appliance or server, and software which emulates traditional tape devices and formats. Virtualization In storage, virtualization is a means by which multiple physical storage devices are viewed as a single logical unit. Virtualization can be accomplished in-band (in the data path) or out-of-band. Out-of-band virtualization does not compete for host resources, and can virtualize storage resources irrespective of whether they are DAS, NAS or SAN. Volume A volume is an area of storage on a hard disk. A volume is formatted by using a file system, and typically has a drive letter assigned to it. A single hard disk can have multiple volumes, and volumes can also span multiple disks.

Z
Zoning A method used to restrict server access to storage resources that are not allocated to that server. Zoning is similar to LUN masking, but is implemented in the switch and operates on the basis of port identification (either port numbers on the switch or by WWPN of the attached initiators and targets). (Sources: Adaptec, Fujitsu Siemens Computers, Horison Information Strategies, Microsoft, SNIA, Webopedia, ZAZAmedia)

Information infrastructures in enterprises

The principle of profitable business increasingly demands efficient handling of information in the enterprise, especially when you consider that the volume of information is growing by an average of 60 percent per year. With its solutions, EMC strives to make optimal use of this capital as well as to protect, manage, store and archive it. EMC thereby rings in a paradigm change by moving the information itself, rather than the applications, into the center of the infrastructure. As a result, the demands on the infrastructure must be focused on the paths the information takes in the enterprise: from creation, capture and utilization through to archiving and deletion. The optimum strategy for setting up an information infrastructure includes intelligent data storage, protection against data loss or misuse, optimizing the infrastructure of IT management and services, and utilizing the value-added potential of information. Alongside the top priority of cost reduction, enterprises mainly want to improve their compliance with all legal requirements and enhance support of their business processes. Business demands on IT are therefore given noticeably higher priority than technological goals such as better data security or better structured data.

EMC hardware and software to build up your information infrastructure


Your enterprise is changing continuously, and EMC will grow and change with you. From hard disk libraries to content management systems and SANs, our products have been the choice of many enterprises for years. EMC offers the software, systems, security and services that you need to develop a more intelligent storage concept that is effective, economical and user-friendly, and that also provides you with the necessary flexibility to expand together with your enterprise. But more intelligent storage is only half the solution: a better concept is needed for information management. This can be achieved by implementing multiple key initiatives. With its years of experience, EMC has identified six important areas that can help you to store information more intelligently.


Six approaches to storing more intelligently.


1. Classify data and deploy tiered storage

Not all data is created equal. In fact, each byte of data used in your organization varies in terms of how often it's accessed and how quickly it must be recovered in the event of downtime. Therefore, it's vital to begin the journey toward storing more intelligently by classifying data based on access requirements. As soon as you begin assessing your data, you'll find that a "one size fits all" approach to storing data no longer makes sense. Rather, deploying tiered storage provides the needed framework and processes to maintain information on the right storage type at the right time, thereby aligning capabilities to requirements and ultimately lowering total cost of ownership (TCO). EMC offers a range of scalable, easy-to-use storage platforms, software, and services to help organizations like yours classify data and establish a tiered storage infrastructure. Our portfolio includes EMC Symmetrix, EMC CLARiiON, and EMC Celerra. All EMC storage platforms can be deployed with a combination of high-performance and low-cost/high-capacity disk drives within the same array, and offer feature-rich software functionality to simplify movement, management, and protection of valuable data. So, whether your organization requires a small-scale solution with tiered storage deployed within one physical system, or has a large-scale environment where multiple tiers of storage systems are needed, the ultimate benefit is the same: lower TCO.

[Figure: production data on "one size fits all" storage (before) versus data distributed across Tier 1, Tier 2 and Tier 3 (after).]

By classifying data and deploying tiered storage, organizations typically see a 25 percent improvement in TCO.
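
The classification step itself can be approximated with a very simple policy engine. The sketch below is illustrative only; the tier names, the age thresholds and the idea of walking a directory tree are assumptions, not an EMC tool or recommendation:

import os
import time

# Assumed, illustrative thresholds: data accessed within the last 30 days stays
# on Tier 1, data untouched for up to a year goes to Tier 2, the rest to Tier 3.
TIER_RULES = [
    (30 * 86400, "tier1"),
    (365 * 86400, "tier2"),
    (float("inf"), "tier3"),
]

def classify(path, now=None):
    """Assign a file to a storage tier based on its last access time."""
    now = now or time.time()
    age = now - os.stat(path).st_atime
    for max_age, tier in TIER_RULES:
        if age <= max_age:
            return tier

def plan_migration(root):
    """Walk a directory tree and group files by target tier (a dry run only)."""
    plan = {"tier1": [], "tier2": [], "tier3": []}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            plan[classify(full)].append(full)
    return plan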


EMC is the market leader in the area of IP storage solutions

The wide proliferation of IP storage technologies such as iSCSI and NAS for accessing storage systems offers enterprises of all sizes the option of economically setting up and operating efficient and reliable storage networks. Enterprises can now use their existing IP technologies and IP-trained personnel to provide and operate their storage networks. All in all, the methods offered by today's mature, IP-based storage technology make it significantly simpler and more cost-effective to achieve the desired storage consolidation results. With the storage families Celerra NS, CLARiiON CX (FibreCAT CX) and Symmetrix DMX, EMC is the market leader in the field of IP storage solutions. According to IDC research, no other supplier has provided more IP-based storage solutions than EMC with its wide network of EMC Velocity2 partners. The reason for this: EMC offers a wide range of flexible options to allow enterprises and organizations of all sizes to profit from cost-effective and user-friendly IP storage solutions, regardless of whether you are setting up an entirely new network storage system or consolidating servers and applications that are not yet in an existing SAN.

2. Create an active archive

When information is needed, it must be online and readily accessible, whether it's required daily or to meet longer-term audit cycles. Active archiving gives you the ability to move data that is used infrequently from primary storage into a more cost-effective storage infrastructure while still allowing quick and easy access. EMC Centera is the first active archive solution in the market that is purpose-built for storage and retrieval of fixed content. By migrating static or infrequently accessed information from the production storage environment to an EMC Centera active archive, valuable production storage space is freed up, and backup media requirements lessen as static content is removed from daily or weekly backup processes. For compliance purposes, EMC Centera also offers guaranteed content authenticity to meet the emerging set of regulatory requirements.


[Figure: Tier 1, Tier 2 and Tier 3 storage before and after moving static content into an active archive.]

In many e-mail or file system environments, more than 75 percent of the data is not modified, which makes it an ideal candidate for active archiving.

3. Reduce / eliminate redundant data

One of the key drivers impacting storage costs is the amount of production data that needs to be stored and backed up. Reducing or eliminating duplicate data in backup environments is a key step to storing more intelligently. Consider a simple example: an e-mail attachment is distributed and then saved by 20 different users. The impact? Storage capacity, network bandwidth, and backup media are unnecessarily consumed by a factor of 20. With data de-duplication, storage requirements lessen, operations are streamlined, backup times are shortened, and overall TCO is improved. EMC technology proactively helps to reduce or eliminate the amount of redundant data that an organization creates, stores, and ultimately backs up. EMC Centera computes a content address every time an object is requested to be stored. For example, when two or more people try to store the same e-mail attachment, EMC Centera computes the same content address each time. With this intelligence, EMC Centera easily determines that multiple people are trying to store the same file and gives each user a pointer to the file instead of storing multiple copies. This minimizes information redundancy and the amount of storage required. EMC Avamar backup and recovery software can achieve the same results for your backup data. EMC Avamar leverages global de-duplication technology to eliminate the unnecessary transmission of redundant backup data over the network and the amount saved to secondary storage. By de-duplicating across sites and servers, you can dramatically shrink the amount of time required for backups, network utilization, and the growth of secondary storage.


[Figure: backup data volume before and after de-duplication.]

EMC Avamar realizes up to a 300 times reduction in daily backup data and up to 10 times faster backups.
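
The content-addressing principle described above, where identical content always produces the same address and is therefore stored only once, can be illustrated in a few lines. The hash choice and the in-memory store are assumptions made for the sketch, not the Centera or Avamar implementation:

import hashlib

class ContentAddressedStore:
    """Store each unique object once; identical content yields the same address."""

    def __init__(self):
        self.objects = {}   # content address -> data
        self.refs = {}      # content address -> reference count

    def put(self, data):
        address = hashlib.sha256(data).hexdigest()   # the content address
        if address not in self.objects:              # store only the first copy
            self.objects[address] = data
        self.refs[address] = self.refs.get(address, 0) + 1
        return address                               # callers keep a pointer, not a copy

    def get(self, address):
        return self.objects[address]

store = ContentAddressedStore()
attachment = b"%PDF-1.4 ... quarterly report ..."
addr_user_a = store.put(attachment)   # first save: the data is written
addr_user_b = store.put(attachment)   # every further save only adds a pointer
assert addr_user_a == addr_user_b and len(store.objects) == 1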

4. Speed backup and recovery

By extending the tiered storage methodology to backup/recovery operations, organizations can dramatically improve the speed and reliability of their backup and recovery processes. For example, disk-based solutions deliver a five-fold improvement in backup and recovery times compared with tape-based solutions. EMC offers a complete solution for backup-to-disk needs, including assessment services, backup software applications, and a wide range of LAN-based, SAN-based, and virtual tape library (VTL) solutions, so you can realize better performance and reliability in your backup and recovery environment, no matter how you manage your backups today. All EMC storage platforms can be deployed with low-cost/high-capacity disk drive technologies that are ideal for backup-to-disk applications.


[Figure: backup data before and after, with backup data moved to low-cost Tier 3 disk.]

Disk-based solutions provide a five-fold improvement in backup and recovery times.

5. Use space-saving snapshots to reduce capacity requirements

Creating full copies of production data has been a common practice to assist with backup, recovery, and testing of application environments. As information continues to grow, the storage requirements and costs associated with this practice have escalated, as have the capacity requirements to support local replication activities. Today, the use of space-saving snapshots provides a much smarter, more affordable alternative for protecting and re-purposing production data. Specifically, the space-saving characteristics of logical snapshot copies allow organizations to more quickly and effectively leverage replication capabilities while minimizing the traditional costs and capacity requirements that go along with them. All EMC storage platforms offer the flexibility to deploy array-based full-copy clones and local snapshot replication; you can choose which to leverage based on your application requirements and use cases. In addition, both types are supported and integrated with key applications, such as Microsoft Exchange, SQL Server, Oracle, and SAP.


[Figure: local replication using full clones and snapshots (before) versus space-saving snapshots only (after).]

Snapshots provide up to a 10 times reduction in the capacity required for local replication.
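
The space saving comes from the fact that a snapshot initially shares all blocks with the production volume, and only blocks that change afterwards consume additional capacity. The copy-on-write sketch below illustrates this principle with assumed block sizes and structures; it is not how any particular EMC array implements snapshots:

class Volume:
    """A toy block device: block number -> data."""

    def __init__(self, nblocks):
        self.blocks = {i: b"\x00" * 4096 for i in range(nblocks)}
        self.snapshots = []

    def snapshot(self):
        snap = {}                    # starts empty: almost no capacity consumed
        self.snapshots.append(snap)
        return snap

    def write(self, blockno, data):
        # Copy-on-write: preserve the old block in every snapshot that has not
        # captured it yet, then overwrite the live block.
        for snap in self.snapshots:
            snap.setdefault(blockno, self.blocks[blockno])
        self.blocks[blockno] = data

    def read_snapshot(self, snap, blockno):
        # A snapshot serves its own preserved blocks and falls through to the
        # live volume for everything that has not changed.
        return snap.get(blockno, self.blocks[blockno])

vol = Volume(nblocks=1024)
snap = vol.snapshot()                                  # taken instantly
vol.write(7, b"new data".ljust(4096, b"\x00"))         # only now is extra space used
assert vol.read_snapshot(snap, 7) == b"\x00" * 4096    # snapshot still sees old content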

6. Deploy server and file virtualization
IT departments today face a number of challenges in their physical server infrastructures, beginning with underutilization and server sprawl. According to IDC, a typical x86 server deployment uses only 10 to 15 percent of its total capacity, and yet organizations spend considerable resources provisioning, maintaining, and securing growing numbers of physical servers. For every $1 spent on new servers, again according to IDC, $8 is spent on ongoing management. Organizations can overcome these challenges and implement significant efficiencies by using virtualization to optimize the infrastructure at the server level. Virtualization with VMware offers a fundamentally simpler and more efficient way to manage server infrastructures. In fact, the time to provision a new server can be decreased by up to 70 percent, new applications can be provisioned in minutes, and time previously spent on manual, time-consuming tasks can be redirected toward strategic initiatives. What's more, data center capacity can be scaled non-disruptively and availability can be ensured for all applications at a reasonable cost. When you combine VMware server virtualization with EMC Rainfinity Global File Virtualization, you get the added benefits of virtualizing your unstructured data environments. The Global File Virtualization capabilities allow you to simplify management, move data non-disruptively, address performance bottlenecks, and lower TCO by maximizing utilization in heterogeneous NAS, CAS, and file server environments.
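The economics behind the consolidation argument can be made concrete with a back-of-the-envelope calculation. The sketch assumes the roughly 10 to 15 percent average utilization cited above and a conservative target utilization on the virtualization hosts; both figures are illustrative assumptions, not vendor data.

```python
import math

def hosts_after_consolidation(physical_servers: int,
                              avg_utilization: float,
                              target_utilization: float) -> int:
    """Estimate how many virtualization hosts can absorb the same workload."""
    total_load = physical_servers * avg_utilization   # work actually being done today
    return math.ceil(total_load / target_utilization)

before = 100   # existing, mostly idle x86 servers (hypothetical)
after = hosts_after_consolidation(before,
                                  avg_utilization=0.12,
                                  target_utilization=0.60)
print(f"{before} servers -> about {after} virtualization hosts")
```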


Addressing your key priorities
At EMC, we're focused on the things that are most important to you, such as making sure your information is easy to manage, taking the right steps to conserve energy, and protecting your valuable information from security threats and breaches.
EMC solutions simplify storage management
EMC has invested heavily in developing new and improved tools that meet user requirements to streamline storage management and improve ease of use across all platforms. EMC solutions can be deployed quickly, and are easy to use and manage. For example, Symmetrix Management Console, Celerra Manager, and CLARiiON Navisphere Manager all allow users to configure and provision a terabyte of storage in less than one minute. And EMC Centera includes self-configuring, self-healing, and self-managing capabilities that enable an administrator to manage up to 50 times more content.

Energy-efficient solutions improve cost management


Today's data centers face critical energy issues: power and cooling limitations, high energy demands and costs, and even outages from overburdened power grids. Advanced tools and services from EMC help optimize energy efficiency in your data center. For example, EMC storage platforms consume less energy per terabyte than alternative solutions, while the EMC Power Calculator enables EMC experts to advise on ways to reduce power and cooling costs through more efficient data management. Powerful EMC virtualization solutions, such as VMware, can help you consolidate servers to increase capacity utilization and eliminate unnecessary infrastructure. And finally, EMC Energy Efficiency Services can help you make your data center as power efficient as possible. Continue to look to EMC for further innovations resulting from our investment in research and development programs aimed at reducing your energy and cooling costs.

Storage solutions with security built-in


Most organizations readily agree that information is their most important asset. In fact, in 2006 organizations spent $45 billion on security products; however, studies show that only one in five feels protected.


At EMC, we've taken an information-centric approach by building security into our products, not bolting it on. This strategic approach provides our customers with improved access control, a more secure infrastructure, and expanded compliance and auditability, so that their information infrastructure is guarded against security threats.
Take the Next Step
With its flexible and cost-effective storage consolidation solutions, EMC provides organizations with the information management framework they need to store more intelligently. But storage is only part of what we do. Our focus is on improving your business from the ground up. As such, we help organizations like yours with solutions focused on:
Establishing backup, recovery, and archive operations
Ensuring effective and affordable information protection
Virtualizing information infrastructure
Automating data center operations
Securing critical assets
Leveraging content for competitive advantage
Accelerating business value for business-critical applications, such as Microsoft Exchange, SQL Server, Oracle and SAP.

Partnership between EMC and Fujitsu Siemens Computers


"For nearly a decade, EMC and Fujitsu Siemens Computers have built a strong relationship based on mutual trust and technological innovation. A key to our success is the combined expertise of both our sales and technical teams, who have helped our mutual customers meet some of their biggest challenges. The combination of EMC networked storage systems and software with servers from Fujitsu Siemens Computers helps to provide a broad offering of end-to-end solutions. EMC technology plays a key role in the Fujitsu Siemens Computers vision of the Dynamic Data Center, and we continue to focus our combined efforts on providing customers with the industry's most comprehensive set of solutions." (Joe Tucci, EMC President & CEO, 2007)
Common strengths of Fujitsu Siemens Computers and EMC:
EMC Storage integrated with the Fujitsu Siemens Computers FlexFrame infrastructure
Fujitsu Siemens Computers CentricStor is available in the EMC Select reseller program
Fujitsu Siemens Computers Infrastructure Services is an accredited member of the EMC Authorized Service Network (ASN) program


EMC Storage integration with Microsoft is implemented via Fujitsu Siemens Computers BladeFrame technology
Common solution for grid computing based on Fujitsu Siemens Computers PRIMERGY server systems
OEM and reseller agreements for EMC NetWorker
For more information on EMC solutions, visit www.emc.com

DATA CENTER FABRIC

Leveraging 8 Gbit/sec Fibre Channel End to End in the Data Center

Brocade is a leading provider of networked storage solutions that help organizations connect, share, and manage their information. Organizations that use Brocade products and services are better able to optimize their IT infrastructures and ensure compliant data management. All of Brocade's products are tested and certified by Fujitsu Siemens Computers, which ensures seamless integration into the comprehensive and leading-edge infrastructure solutions of Fujitsu Siemens Computers. The continuing expansion of data centers and the introduction of new technologies such as server and fabric virtualization are driving the need for higher storage networking performance and greater capabilities from the data center fabric. The new Brocade Fibre Channel (FC) switch family meets current and near-future storage networking needs by doubling the current standard FC speed of 4 Gbit/sec to 8 Gbit/sec. New Brocade 8 Gbit/sec Host Bus Adapters work in concert with Brocade 8 Gbit/sec fabrics to deliver high performance and advanced functionality end to end.

Introduction

In January 2008, Brocade introduced 8 Gbit/sec capabilities for the Brocade 48000 Director and the new Brocade DCX Backbone platform. Brocade is expanding this leadership position with the introduction of an entire family of 8 Gbit/sec switch products targeting a range of data center environments, from the enterprise to Small and Medium Business (SMB). In addition, Brocade is launching 8 Gbit/sec Host Bus Adapters (HBAs), providing the industry's first end-to-end 8 Gbit/sec solution for SMB to enterprise customers. These high-performance solutions are driven by a new family of Brocade 8 Gbit/sec ASICs, which process and route data with much higher levels of efficiency. In addition to doubling performance throughput, these new ASICs offer new capabilities that align with growing data center requirements for IT process automation, energy efficiency, and reduced Operating Expenses (OpEx). Steady increases in performance and functionality have been the hallmark of Fibre Channel evolution over the past decade. With the periodic doubling of transport speed from 1 to 2 Gbit/sec and from 2 to 4 Gbit/sec, storage administrators have quickly exploited the new


performance capabilities and advanced features to build more optimized storage networks. With the introduction of Brocade 8 Gbit/sec switches and HBAs, it is now possible to fully integrate advanced functionality that extends from the fabric all the way to the server platform. In trying to decide where enhanced performance and capabilities can be applied in your own environment, consider the following:
Storage Growth. Storage Area Network (SAN) storage capacity has dramatically increased year over year in almost all data centers. As SAN storage grows, so do the fabrics that interconnect storage with servers.
Large Fabrics. As fabrics grow, more Inter-Switch Links (ISLs) are used to keep pace with storage and server scaling.
Higher Levels of Performance. In large-scale data centers, moving SAN bandwidth-intensive hosts to 8 Gbit/sec connectivity enables the servers to achieve higher levels of performance using fewer HBAs and a smaller cabling infrastructure.
Server Virtualization. Hosting multiple operating system instances on a single host platform dramatically increases storage I/O demands, which in turn drives up host SAN throughput.
Tiered Services. In a shared environment, in which IT may be using chargeback to serve internal customers, a tiered services model requires the ability to specify service levels for hosted applications and to monitor and manage these services end to end, all of which are capabilities of Brocade 8 Gbit/sec solutions.
Backup. Large amounts of traffic to tape or disk during backups require the fastest SAN speeds possible to fit within backup windows.
Operational Flexibility. While not all hosts, storage, and ISLs currently require maximum speed capability, it is much easier to architect data center fabrics when high-speed ports are available.
Investment Protection. Existing SANs can be significantly enhanced with new capabilities enabled by 8 Gbit/sec port speed. Integrated Routing and Adaptive Networking services are compatible with legacy SAN equipment, extending their Return on Investment (ROI) as data center fabrics scale.
Data centers may have some or all of these needs today. Although meeting these needs may not require an immediate upgrade to 8 Gbit/sec for all storage applications, future plans for expansion, virtualization, and fabric scaling make acquiring 8 Gbit/sec capabilities today a safe and well-founded decision. As fabrics scale, for example, only half as many ISLs are required with 8 Gbit/sec links as with 4 Gbit/sec links. Likewise, the ISL oversubscription ratio is halved by upgrading ISLs from 4 to 8 Gbit/sec while using the same number of links. At long distances, 8 Gbit/sec can earn a very fast ROI compared to 4 Gbit/sec, due to the high cost of dark fiber or WDM links. Almost all of these native FC extension links support 8 Gbit/sec speeds, so utilization can be doubled on links that usually cost


thousands, if not tens of thousands, of dollars per month. This can quickly justify the equipment cost for the increased speed capability. Building a high-performance foundation that provides the flexibility to selectively deploy 8 Gbit/sec as needed simplifies data center fabric management and accommodates the inevitable growth in applications and data over time.
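A simple way to see the ISL effect is to compute the oversubscription ratio, i.e. the aggregate host bandwidth divided by the aggregate ISL bandwidth. The sketch below uses made-up port counts to show why doubling the link speed halves the ratio.

```python
def isl_oversubscription(host_ports: int, host_gbit: float,
                         isl_count: int, isl_gbit: float) -> float:
    """Aggregate host bandwidth divided by aggregate ISL bandwidth."""
    return (host_ports * host_gbit) / (isl_count * isl_gbit)

# Hypothetical edge switch: 32 hosts at 4 Gbit/sec, 4 ISLs to the core
print(isl_oversubscription(32, 4, 4, 4))   # 8.0 with 4 Gbit/sec ISLs
print(isl_oversubscription(32, 4, 4, 8))   # 4.0 after upgrading the ISLs to 8 Gbit/sec
```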

Evolving Data Center Virtualization


Virtualization of server platforms can dramatically increase the need for higher-speed capability in the SAN. Some virtualized hosts have 10, 20, or even 30 operating systems, which can exceed the capacity of a 4 Gbit/sec HBA. Brocade's 8 Gbit/sec end-to-end solutions can prevent this saturation and increase the ROI on server hardware and virtualization software investments. Brocade recognizes three phases for evolving virtualization in the data center:
Phase 1. The primary business driver for this phase is the reduction of Capital Expenditures (CapEx) as a result of server consolidation and flexible test and development.
Phase 2. This phase is characterized by growth and the deployment of Disaster Recovery (DR) solutions, together with the need for high availability and automated server provisioning. The primary business drivers are the reduction of OpEx and the requirement for Business Continuity (BC). Typical use cases include automated server provisioning and application deployment using pre-built Virtual Machine (VM) templates, data center architectures and products that provide High Availability (HA) and no interruption of service during server maintenance or failure, and storage replication with automated restoration of service to support DR goals.
Phase 3. We are now moving into this phase, in which the business drivers are flexible IT, variable cost, and further OpEx reductions. Phase 3 will provide data centers with policy-driven utility computing, service-level management, and end-to-end service provisioning.

Virtualization has fundamentally transformed the traditional relationship between servers, storage, and the fabric interconnect. Running many VMs on a single server hardware platform can dramatically increase the requisite Input/Output (I/O) load and mandate offloading as much I/O processing as possible, so that CPU cycles can be devoted more productively to application processing. Fully leveraging server virtualization therefore requires more powerful, high-performance storage adapters.


Brocade's new family of 8 Gbit/sec switches supports the rapidly growing data center by delivering 8 Gbit/sec performance on every port with no oversubscription. A completely non-oversubscribed switching architecture enhances server scalability by enabling the rapid growth of virtual servers without compromising data center performance. The Integrated Routing (IR) fabric service is a new option on the Brocade DCX Backbone and the Brocade 5300 and 5100 Switches with the release of Fabric OS (FOS) 6.1. As of FOS 6.1, IR can be activated on FC8 port blades with up to 128 IR ports per Brocade DCX chassis. (When two Brocade DCX chassis are connected via Inter-Chassis Links, a total of 256 IR ports are available.) No additional hardware is required to enable per-port Fibre Channel Routing; only an optional IR software license is required. IR can be enabled on the maximum number of ports on the Brocade 5300 (80 ports) and Brocade 5100 (40 ports) via user configuration. Brocade 8 Gbit/sec HBA ASICs support a maximum of 500,000 I/O operations per second (IOPS) per port (more than 1 million IOPS on a dual-port HBA) to free up the host processors and meet virtualization productivity goals. In the future, two 8 Gbit/sec HBA ports will be able to be combined into a single, ultra-high-speed 16 Gbit/sec connection using Brocade ISL Trunking technology, which balances traffic flows at the frame level. Currently, the benefits of Brocade 8 Gbit/sec switching technology are extended directly to VMs via N_Port ID Virtualization (NPIV), so that special Brocade features, such as Top Talkers and QoS Traffic Prioritization, can be applied to individual VMs. This end-to-end fabric and host integration is unique to Brocade and offers the industry's highest I/O performance for virtualized environments. Brocade 8 Gbit/sec HBAs complement industry-leading performance with advanced storage functionality to further streamline virtualized server operations. To meet regulatory compliance requirements, for example, Brocade 8 Gbit/sec HBAs implement the industry-standard Fibre Channel Security Protocol (FC-SP) and will support in-flight data encryption for secure network transactions. In addition, the new Brocade fabric service, Adaptive Networking, provides configurable Quality of Service (QoS) for each VM. With the increasing use of VM mobility to shift application workloads from one hardware platform to another, conventional networking methods are no longer sufficient. Brocade meets the needs of more dynamic virtualized environments by providing an integrated fabric and HBA solution that can selectively deploy security and QoS to VM-hosted applications as required.

Brocade 8 Gbit/sec Products

The Brocade DCX Backbone, Brocade's first 8 Gbit/sec platform, with 16-, 32-, and 48-port blades, was released in early 2008. In May 2008, Brocade completed the


transition to 8 Gbit/sec with the release of Fabric OS 6.1 and a full family of new switches and HBAs for end-to-end connectivity in the data center:
Brocade 815 (single-port) and 825 (dual-port) HBAs
Brocade 300 Switch with 8, 16, and 24 ports
Brocade 5100 Switch with 24, 32, and 40 ports
Brocade 5300 Switch with 48, 64, and 80 ports
FC8-16, FC8-32, and FC8-48 port blades for the Brocade 48000 Director
Brocade 8 Gbit/sec switches comply with industry standards, and fabrics with 4 and 8 Gbit/sec devices interoperate seamlessly. Visit the Brocade Web site for data sheets describing these products: www.brocade.com

Conclusion

The speed increase in Brocade switching platforms is one of many advantages of Brocade's next-generation ASIC family. Higher speed in the data center brings the immediate benefit of higher-performing ISLs and increased scalability; since ISL performance is doubled, more ports can be used for servers and storage. In addition, 8 Gbit/sec is needed for server virtualization, scaling of fabrics, backups, and high-performance computing requirements. New capabilities, such as Adaptive Networking and Integrated Routing, plus the enhanced power efficiency of the new switch platforms are also important drivers for the adoption of 8 Gbit/sec technology. Every data center user has or will have these needs in the future, and as data center plans are developed, Brocade's integrated end-to-end 8 Gbit/sec solution provides the broadest choice of capabilities with the highest performance and efficiency. Further information may be found on our website at: www.brocade.com
Contact: Brocade Communications GmbH, Ralf Salzmann, Altkönigstrasse 6, 64546 Mörfelden-Walldorf, ralf.salzmann@brocade.com

NetApp Innovative solutions for storage and data management


NetApp embodies innovative storage and data management with excellent cost efficiency. The commitment to simplicity, innovation and the success of its customers has enabled the company to become one of the fastest-growing storage and data management manufacturers. The wide-ranging solution portfolio for server-to-storage virtualization, business applications, data security and much more has persuaded customers worldwide to opt for NetApp. NetApp ensures that your business-critical data is constantly available and can also simplify your business processes. Based on the motto "Go further, faster", NetApp helps companies to be successful. The storage requirement for company data will continue to grow fast in the coming years. This presents IT managers with the challenge of purchasing an ever-increasing quantity of storage equipment yet also having to manage all these devices. With the help of its powerful unified storage architecture, NetApp helps companies overcome these challenges efficiently: extremely low operating costs (TCO), very fast backup and restore processes, high availability levels, consolidation and virtualization options as well as simplified, easy management of the entire storage environment are behind the NetApp slogan "Go further, faster".

Solutions
Microsoft, VMware, Oracle and SAP are important strategic NetApp partners. NetApp has developed a wide range of tools for these partners' database and application software.

The NetApp concept


Data ONTAP
A small, very effective operating system kernel, optimized to meet storage process requirements, provides top performance for storage applications and has a wide range of uses: Data ONTAP permits universal data access via block-level and file-level protocols. File protocols such as NFS, CIFS, HTTP and FTP permit access at file level, while iSCSI and FC permit access at block level.


Extremely flexible storage allocation
With Data ONTAP, all the hard disks available in a system can be used for all applications. A specific storage area no longer needs to be assigned to certain applications. Each application can use all the installed drives, regardless of the overall storage space the application requires.
Snapshot
Using the Snapshot technology developed by NetApp, up to 255 snapshots can be created per volume without any loss in performance. Snapshots are read-only copies of the data on the respective volume (i.e. the data in the snapshot is stored on the same disks as the production data).
FlexVol & FlexClone
NetApp FlexClone™ technology enables real cloning, an immediate replication of data volumes and data sets without any additional storage space. NetApp FlexClone provides notable savings in storage space and has low overheads.
MetroCluster
If the two storage controllers of a FAS cluster are at different locations, MetroCluster synchronously mirrors the data (see SyncMirror) between the two data centers. An automatic failover is performed if one FAS system is not available.

The NetApp unified storage concept


NetApp has designed a completely standardized storage architecture which seamlessly integrates all the storage products and which can be managed with one standardized operating system (Data ONTAP). This results in a number of business benefits:

Less administration effort
Fewer employees can manage more storage and more applications.
Better use of existing storage resources
As all the devices are connected in a network, data and applications can be distributed as required, irrespective of the physical storage space (virtualization). Storage reductions of up to 30% can thus be obtained for the same amount of data and operational applications.


Long-term investment protection
When requirements change, the storage systems can be extended at any time to optimally meet new requirements. Investment in other dedicated systems is avoided.
Flexible scalability
When storage requirements drastically increase due to more data or new business applications, additional storage equipment can be integrated instantly.
No migration costs
The end-to-end NetApp product range enables fast and smooth data transfer when other systems are added.

NetApp FAS systems


The powerful, scalable and reliable NetApp FAS systems (fabric-attached storage) for simplified data management meet company requirements whatever their size, from Global 1000 firms down to medium-sized companies and small departments. The installation, configuration and administration processes for these systems are extremely simple. The stable and flexible microkernel operating system Data ONTAP enables simultaneous and combined usage in FC SAN, iSCSI and NAS environments. In addition to Windows, Unix and Linux data, web data is also supported. The FAS product range comprises the high-end FAS6000 series for data consolidation in large and high-performance applications, the midrange FAS3100 series with an exceptional price/performance ratio for SAN and NAS, and the FAS2000 series for data backup in branch offices and medium-sized companies. The FAS systems were designed to consolidate and provide data for a wide range of applications, including business applications, e-mail, enterprise content management, technical applications, file shares, home directories and web content.

Maximize storage utilization and performance via virtualization
Server virtualization in the data center improves server usage and simplifies server administration for the customer. The demands on storage systems increase, as they must support higher I/O rates, more capacity and the fast, non-interrupted provision of storage space. If storage optimization is not taken into consideration during server virtualization, then you have probably only moved the costs from the servers to the storage systems.
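One common answer to this cost shift is thin provisioning, which also appears in the VMware-related list below: capacity is drawn from a shared pool only when data is actually written, instead of reserving the full logical size of every virtual disk up front. The sketch below is a generic illustration of the idea and is not modelled on Data ONTAP itself.

```python
class ThinPool:
    """Generic thin-provisioning illustration: physical space is consumed
    only when a volume actually writes data."""

    def __init__(self, physical_gb: float):
        self.physical_gb = physical_gb
        self.logical_committed_gb = 0.0   # sum of provisioned volume sizes
        self.used_gb = 0.0                # space actually written

    def provision(self, logical_gb: float):
        self.logical_committed_gb += logical_gb   # no physical space reserved yet

    def write(self, gb: float):
        if self.used_gb + gb > self.physical_gb:
            raise RuntimeError("pool exhausted - grow the pool before this happens")
        self.used_gb += gb

pool = ThinPool(physical_gb=1000)
for _ in range(20):                # twenty 100 GB virtual machine disks
    pool.provision(100)
pool.write(250)                    # the VMs have written only 250 GB so far
print(pool.logical_committed_gb, "GB committed,", pool.used_gb, "GB physically used")
```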


Together with providers such as VMware, NetApp offers solutions and best practices for developing a virtualized infrastructure from servers to storage that provide a number of advantages:
Scalable and consistent I/O performance for all ESX protocols (NFS, iSCSI and FC)
Flexible, fast, simple and low-priced provisioning and data management solutions
First-class virtualized storage solution for thin provisioning in heterogeneous storage environments
NetApp deduplication in the ESX environment
NetApp deduplication is one of the fundamental components of the Data ONTAP operating system. The elimination of redundant data objects and exclusive referencing of the original object permits more efficient use of the existing storage.
SnapManager for virtualized infrastructures (VI)
SnapManager for VI provides customers with an automated solution for backing up and restoring virtual machines in a VMware ESX environment. The two main advantages of this solution are that the backups created using NetApp Snapshot technology use only a fraction of the storage space that traditional systems would require, and that the system performance of the ESX environment, and thus of the applications, is hardly impaired by the SnapManager backup and restore processes. More than 5,000 customers worldwide (as of March 2008) already benefit from the advantages of a VMware solution with NetApp storage.
Support for a virtual desktop infrastructure
In addition to server virtualization, the VMware Virtual Desktop Infrastructure (VDI) offers additional resource-saving virtualization technology. Application environments no longer run on the user's desktop processor but in virtual machines in the data center. NetApp FlexClone can configure thousands of such virtual machines within minutes. Data deduplication enables storage capacity savings of approx. 90%.
NetApp solutions for SAP
As a worldwide technology partner for SAP, NetApp has a successful history in developing solutions which significantly simplify SAP data management. As one of the founding members of the Adaptive Computing Initiative for SAP, NetApp has been awarded numerous certificates for the compatibility of its storage solutions and is on the SAP compliance list for the SAP Adaptive Computing Services for Unix, Linux and Windows platforms. NetApp won the SAP Pinnacle Award for Technical Innovation and Cooperation in the Adaptive Computing NetWeaver Innovation category for the FlexFrame™ for


mySAP Business Suite™ joint development with Fujitsu Siemens Computers. Integrative components are NetApp system cloning and backup/recovery scenarios. The NetApp unified storage model provides SAN / IP SAN and NAS connections with block and file access methods within a single storage architecture. Data management solutions such as FlexClone™ are used to clone SAP production systems within a few minutes, without affecting performance and without any additional initial storage requirement. This significantly simplifies the addition and management of systems for QA, test, development, reporting, interfaces and training. The combination of NetApp Snapshot™ and SnapRestore provides SAP customers with fast, simple backup and restore processes for several TB of SAP data as well as efficient, simple upgrades and migrations of SAP systems. NetApp Adaptive Computing solutions enable SAP customers to react dynamically, flexibly and economically to business requirements. NetApp also offers the following for companies using SAP:
A comprehensive range of products for Windows, Unix and Linux environments with unified NAS/SAN storage solutions
ILM solutions: storage consolidation, backup and recovery, archiving and compliance via ArchiveLink™ and/or WebDAV
High-availability and disaster recovery solutions as well as data encryption
SnapManager for SAP: this SAP-certified solution simplifies the creation of application-consistent snapshot copies, automates error-free data restores and permits application-specific disaster recovery. Clones of the SAP database can also be created automatically.
There is a worldwide support agreement between NetApp and SAP which ensures that the customer has 7x24 SAP infrastructure support. Further information about this and other NetApp solutions can be found at www.netapp.com

CA RECOVERY MANAGEMENT

Data protection and availability


Solution
CA Recovery Management offers you the functions and services which you really need: comprehensive protection and reliable recovery in a simple, easy-to-use solution. It includes application-aware, powerful backup with replication, interruption-free data protection, automated failover and interruption-free disaster recovery tests, so that you can provide the protection which you need for specific company data. A simplified interface and new central management functions enable you to organize, monitor and configure your entire backup environment. This results in lower operating costs and improved administration. Thanks to the modular approach, this solution is suitable for every company, whatever its size. It links CA ARCserve Backup with CA XOsoft Replication and CA XOsoft High Availability so that you can define a multi-level data protection concept to meet your corporate objectives.

Benefits

The solution can be seamlessly integrated into existing IT management solutions, so that Enterprise IT Management is simplified and extended.

CA Recovery Management: a complete data protection solution


CA Recovery Management offers the comprehensive and integrated data protection and recovery functions which your company requires. Robust and proven technologies are used, connected via one simplified interface. These technologies provide multi-level data protection which can be aligned with your company's targets, requirements and guidelines and covers numerous hardware and software platforms.


CA Recovery Management combines proven, powerful solutions: CA ARCserve Backup, CA XOsoft Replication and CA XOsoft High Availability.

CA ARCserve Backup Release 12


CA ARCserve Backup provides the most comprehensive data protection currently available on the market. The solution enables central management and offers enhanced functions to meet your company's changing requirements. Functions such as FIPS-certified 256-bit encryption improve security. Reports provide useful information and help optimize backup procedures, while the catalog database based on SQL Express offers improved indexing for fast recovery. A combination of these functions gives companies more control, higher operational efficiency, more protection and reduced costs. This solution enables reliable enterprise-class data protection for a wide range of operational environments. The fact that CA ARCserve Backup also contains integrated virus protection and encryption tools makes it the most secure backup solution currently on the market and one that can be implemented immediately.

CA XOsoft High Availability & CA XOsoft Replication Release 12

This solution for maximum business continuity is based on asynchronous real-time data replication with automatic failover and automatic failback. It also enables integrated continuous data protection (CDP) against data corruption and automated disaster recovery tests which ensure that critical company data and applications can be recovered. CA XOsoft High Availability provides low-priced, interruption-free availability of servers which run Microsoft Exchange, Microsoft SQL Server or Oracle databases, Microsoft Internet Information Services (IIS) web servers, file servers, BlackBerry support and other applications (on both 32- and 64-bit Windows servers). Contact: cainfo.germany@ca.com, www.ca.com/us

Sun Microsystems / StorageTek

More than 35 years' experience in the classic field of data security


As a result of the takeover of StorageTek, Sun Microsystems has advanced to become one of the world's leading providers of tape libraries and drives. Customer-oriented solutions with special service offerings are a hallmark of the company. Sun Microsystems (JAVA) develops information technology for the global economy. With its vision "The Network is the Computer", Sun is promoting the distribution of the Internet and is itself focusing on open innovation, the development of communities as well as market leadership in Open Source. Sun is represented in more than 100 countries.

Markets

Sun Microsystems is a system provider that develops hardware and software. Since software development concentrates on resolving system-related tasks or setting strategically important milestones in line with Sun's vision, Sun is not in competition with the developers of application programs. On the contrary, firm partnerships exist with numerous renowned software manufacturers in order to develop offerings together. In this way, customers retain their freedom, because they can decide in favor of the best solution on the market. To ensure that systems are integrated at an early stage, Sun Microsystems has set up a number of partner programs, which have pioneered both industry-related and task-related methods. There is a solution portfolio of almost 13,000 commercial and technical applications for Sun systems on the SPARC/Solaris platform. In addition to the partnerships with independent software manufacturers, Sun is very much committed to long-term sales partnerships with innovative distributors and resellers. These partnerships have enabled fast and competent solutions for end customers on a widespread basis.


The partnership between Fujitsu Siemens Computers and Sun Microsystems


A partnership of many years exists between Fujitsu Siemens Computers and Sun Microsystems/StorageTek, which has been extended to include the Nearline (tape) sector and, since 2008, SPARC-based systems. As a result of OEM contracts as well as reseller agreements with Sun for Nearline products, drive types and SPARC-based products, Fujitsu Siemens Computers has won over a strong partner for disaster recovery and backup. Sun Nearline products, which have become quasi-standards, are convincing with their high performance levels and above-average fault tolerance. On account of their open design and varied connection options, they can be optimally integrated in backup environments with the systems sold by Fujitsu Siemens Computers, e.g. the CentricStor Virtual Tape Appliance. Cooperation between Sun Microsystems and Fujitsu Siemens Computers has already proved to be extremely effective in large-scale, complex projects. And this applies to companies of all kinds, particularly large enterprises and small to medium-sized customers. The numerous joint projects, which you can read about in the case studies, are proof of this.

Sun StorageTek Open Storage solutions

Electronically stored data is currently one of the most important assets for many companies. Storing this data securely, managing it intelligently and having it available in a controlled way at all times is of vital interest for every company. Unrelenting data growth and the increasing number of statutory regulations with regard to archiving periods and access controls call for new data storage and administration methods. Sun StorageTek Open Storage solutions allow you to face these challenges successfully. They combine our more than 35 years of experience in classic fields, such as data security, data availability and failover concepts, with new technologies like access protection through identity management and storage virtualization. This in turn enables trend-setting information lifecycle management infrastructures, which not only store data securely and in compliance with regulations, but also make it available to everyone who needs it.


Tape Library StreamLine SL500

The Sun StorageTek StreamLine SL500 tape library is the entry-level solution for small to medium-sized businesses. Tried-and-trusted LTO technology ensures trouble-free operation. The modular expandability within a standard rack permits customers to adapt capacities to meet their operating needs at any time.

Tape Library StorageTek SL3000

The Sun StorageTek SL3000 modular library system is an innovative, environmentally aware storage concept for the midrange market. The Any Cartridge Any Slot technology enables you to use the best possible drives depending on access and storage requirements.
Highlights
Flexible scaling, for which only the capacity actually used is charged.
Cost savings through less floor space and lower power consumption.
Simpler storage management through partitioning and sharing.
Innovative technologies, risk reduction and the development of new opportunities.
Support and services for successful installation, optimization and maintenance.

Tape Library StreamLine SL8500

The modular Sun StorageTek StreamLine SL8500 tape library enables organizations to set up their operational processes more efficiently, maximize availability and create optimal prerequisites for audit reliability, while keeping the cost burden and the impact on their processes firmly within bounds.
Highlights
Data consolidation: excellent scaling as well as support for mixed media.
Joint resources: designed for use with mainframes, Unix, Linux and Windows.
Higher availability: upgrades without downtime; redundancy of operation.
Higher throughput: high-performance throughput and capacity.
Simple scaling: growth according to your requirements, without any downtime.
Low space requirements: high density of media slots, optimal utilization.
For more information on Sun solutions, visit www.sun.com

Symantec: Confidence in a connected world

Symantec is a global leader in providing security, storage and systems management solutions to help businesses and consumers secure and manage their information. Headquartered in Cupertino, California, Symantec has operations in more than 40 countries. Market categories: Consumer Products; Security and Compliance; Storage and Availability Management; Symantec Global Services. Symantec's leaders bring decades of diverse experience and a history of success. Combining business acumen with technical savvy, these executives guide more than 17,500 talented employees to create innovative products and solutions that enable customers around the world to have confidence in their infrastructure, information and interactions.

Partnership between Fujitsu Siemens Computers and Symantec


Nearly all Symantec products (and those formerly of VERITAS) are certified for Fujitsu Siemens Computers products. Above all, Fujitsu Siemens Computers offers the VERITAS Storage Foundation as the strategic file system and volume management solution for its PRIMEPOWER servers. NetBackup and Enterprise Vault are strategic products and integral components of the Fujitsu Siemens Computers storage software portfolio. The Fujitsu Siemens Computers CentricStor Virtual Tape Appliance has already been optimized for operation with Symantec products through additional functions. The combination of Symantec's software solutions and Fujitsu Siemens Computers platforms has already proved to be very effective in large and complex projects. This applies to all kinds of businesses, especially large enterprises and SME customers. Fujitsu Siemens Computers uses Altiris technology for building its DeskView and ServerView solutions.


Veritas NetBackup High performance data protection


Veritas NetBackup Enterprise Server delivers high-performance data protection that scales to protect the largest Unix, Windows, Linux and NetWare environments. Offering complete protection from desktop to data center to vault, NetBackup software offers a single management tool to consolidate all backup and recovery operations. NetBackup helps organizations take advantage of both tape and disk storage with its advances in disk- and snapshot-based protection, off-site media management, and automated disaster recovery. For the ultimate in data protection, NetBackup offers data encryption that transmits and stores data using the latest encryption technologies on the market today. To reduce the impact on business-critical systems, NetBackup software provides online, database- and application-aware backup and recovery solutions for all leading databases and applications to deliver data availability for utility computing. Veritas NetBackup offers granular file-level and image-level recovery from a single backup and automatic discovery of virtual machines (winner of the 2007 Best of VMworld award for Data Protection).

Symantec Enterprise Vault E-Mail Archiving

Symantec Enterprise Vault provides a software-based intelligent archiving platform that stores, manages, and enables discovery of corporate data from e-mail systems, file server environments, instant messaging platforms, and content management and collaboration systems. Because not all data is created equal, Enterprise Vault utilizes intelligent classification and retention technologies to capture, categorize, index, and store target data in order to enforce policies and protect corporate assets while reducing storage costs and simplifying management. Enterprise Vault also provides specialized applications, such as Discovery Accelerator and Compliance Accelerator, that mine archived data to support legal discovery, content compliance, knowledge management, and information security initiatives. It also eliminates PST headaches and removes the problems of backup, security, stability and storage consumption inherent to PST files. Archiving for legal and compliance purposes enables you to retrieve e-mail content and corporate documents to meet legal and regulatory compliance requirements. Further information may be found on our website at: www.symantec.com
Contact: Harald Derbsch: harald_derbsch@symantec.com, Rene Khnen-Wiesemes: rene_wiesemes@symantec.com

