
CLARiiON's Data Integrity Difference

The CLARiiON Advanced RAID Storage System has emerged as the leader in open systems RAID technology, due in large part to its high availability and ease-of-maintenance features. The true differentiation, however, can be found in its bulletproof data integrity handling.

Why does data integrity become increasingly difficult in RAID systems? With RAID, user data that was located on a failed drive must be re-created from the other drives. If this process is not managed properly, incorrect data can be reconstructed and given to the user.

CLARiiON full Fibre Channel arrays are the next-generation RAID product. CLARiiON has shipped over 6 petabytes (6,000,000 gigabytes) in over 80,000 arrays. As a result, CLARiiON has millions of hours of experience handling data integrity scenarios in the field.

Mission-critical storage devices that provide continuous data access 24 hours a day, seven days a week must anticipate the failure of disk drives, storage processors, and other array components. Potentially data-corrupting RAID scenarios are numerous and complex. Most of these scenarios involve the loss of a drive along with a power failure or a storage processor (SP) failure. CLARiiON can handle such failures without fear of undetected data loss.

The focus of this paper is on handling typical RAID 5 error conditions, although CLARiiON supports other RAID types with an equal amount of data protection. The paper also examines typical RAID 5 error conditions when two active SPs are present via a CLARiiON system's dual-porting feature. Other products that are limited to one SP or a "hot-standby" must solve the same problems. The reader should remember that the error conditions discussed in this paper represent only a few examples of what can go wrong in the real world.

Failures While Updating Parity

In Figure 1, a CLARiiON SP is controlling a RAID 5 group of disks. A cross section of the same physical block on each drive shows user data on four of the drives plus parity ("10") on the fifth drive. The parity data "10" is calculated by summing the data on the data drives. If one of the drives that contain data fails, the lost data can be reconstructed on a Read request by subtracting the remaining values from the parity. (Note that in actuality, parity is calculated, then reconstructed in the array using exclusive OR logic; we are using addition and subtraction here to simplify the example.)
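As the note above says, the array really uses exclusive OR rather than addition. A minimal sketch of that calculation follows; the four block values and the function names are illustrative assumptions, not taken from the figures.

```python
from functools import reduce
from operator import xor

def parity(blocks):
    """XOR all data blocks together to produce the parity block."""
    return reduce(xor, blocks, 0)

def reconstruct(surviving_blocks, parity_block):
    """Recover the one missing block by XOR-ing parity with every surviving block."""
    return reduce(xor, surviving_blocks, parity_block)

# Hypothetical contents of the same physical block on four data drives.
data = [0b0001, 0b0010, 0b0011, 0b0100]
p = parity(data)

# Simulate losing the third drive and rebuilding its contents on a read.
survivors = data[:2] + data[3:]
recovered = reconstruct(survivors, p)
```

Because XOR is its own inverse, the same operation serves for both calculating parity and reconstructing a lost block, which is why the text can describe it with addition and subtraction.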

Power Loss Scenario


CLARiiON Data Integrity Page 1

A problem arises when a write update occurs to a RAID 5 group that has already suffered a drive failure (e.g., the third drive from the left in Figures 1 and 2). If the file is updated and a new block of data is written to the second drive from the left, changing the value from "2" to "3," then a parity update ("10" to "11") should be written to the drive holding the parity block. If "3" is written to the second disk and the SP loses power before writing "11" to the parity drive, the array stripe is left in an incoherent state. If the user attempts to read the third drive, the data is reconstructed as a "2" instead of the correct value of "3." When the failed drive is subsequently replaced, the same incorrect value is rebuilt and written to the replacement drive.

The traditional approach to dealing with this problem has been to use an uninterruptible power supply (UPS). A UPS keeps the system running at least long enough for a pending parity update to complete, so that no data is lost. However, this strategy is not sufficient for RAID subsystems, because SP failures and drive failures are not necessarily power failures. Also, due to cost considerations, not every system is equipped with a UPS. A UPS is therefore not a foolproof solution to the problem.
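The arithmetic of this scenario can be traced with a short sketch that uses the paper's simplified addition/subtraction model. The text gives only the second drive ("2"), the third drive ("3"), and the parity ("10"); the values 1 and 4 for the first and fourth drives are assumptions chosen so that the stripe sums to 10.

```python
def reconstruct(drives, parity, failed):
    """Rebuild the failed drive's value by subtracting the survivors from parity."""
    return parity - sum(v for i, v in enumerate(drives) if i != failed)

drives = [1, 2, 3, 4]   # first and fourth values are assumed, see lead-in
parity = sum(drives)    # 10, as in the text
FAILED = 2              # third drive from the left is already down

before = reconstruct(drives, parity, FAILED)          # coherent stripe

drives[1] = 3           # the data write lands on the second drive ...
# ... and power is lost here, so parity stays at 10 instead of becoming 11.
after_crash = reconstruct(drives, parity, FAILED)     # incoherent stripe

with_parity_update = reconstruct(drives, 11, FAILED)  # what should have happened
```

The stale parity silently yields 2 for the third drive, and nothing in the stripe itself signals that the value is wrong.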

Another approach has been to use a background verify process. This is a process that runs through the entire address space of an array, fixing data/parity mismatches as it encounters them. This method has been discarded because it is useless if a drive has failed or fails during the background verify procedure. Consider Figure 2. A background verify procedure has no knowledge that the value "3" exists on the third drive. So it is impossible to recalculate the proper parity value of "11" without having all of the drives present. There are no safeguards in place should the user request the data from the failed drive.

Proactive Data Integrity


CLARiiON uses a unique combination of three levels of data protection to provide a foolproof solution to the problem of returning incorrect data:

1. Standard RAID 5 parity: allows for reconstruction of data on a read request from a failed drive.
2. Patented error-handling algorithms: ensure that data and parity are always coherent, even when multiple failures occur.
3. Data stream checksumming: a mechanism that ensures data was written to and read from an individual disk correctly.

CLARiiON has employed the standard RAID definitions in its designs. Therefore, the RAID definitions are not discussed in this paper. What sets CLARiiON apart in terms of data integrity focus are the patented prevention and detection algorithms and the data stream checksumming techniques, which recognize potentially dangerous data integrity scenarios and prevent them from occurring.

First, CLARiiON formats the disk sectors at 520 bytes instead of the normal 512 bytes to provide eight bytes per sector for error detection handling. These additional bytes include linear checksums and status bits for the stripe. Second, non-volatile memory (NOVRAM) on the SP is used to ensure that, in the event of an AC power loss, the consistency of the data and corresponding parity is maintained.

To see how CLARiiON deals with the earlier example (which, in Figure 2, resulted in undetected data corruption), let us step through CLARiiON's patented "Parity Shedding" algorithm, shown in Figure 3. When CLARiiON determines that a drive in a stripe has failed, the stripe operates in a degraded mode where the focus is on avoiding the catastrophe of data/parity incoherence. While in this mode, a read request to the stripe returns the data as read from the target drive if that drive is running, or the exclusive-OR reconstruction if the target drive is the failed disk.

Now assume that a write operation is requested that would change the value of the second drive from "2" to "4." Before the write request is executed, Parity Shedding causes the data from the failed drive to be reconstructed by exclusive-ORing the surviving drives. This calculated value is then written OVER the original parity value of "10," and a flag is set in one of the additional eight bytes indicating that the parity for this stripe is now data, i.e., the parity has been shed. Only then is the data value of "2" replaced with the requested update of "4." Note that the stripe now has no parity and that there is NO point at which an AC power loss could leave CLARiiON with data/parity incoherence, thus avoiding any chance of undetected data corruption. Furthermore, any request to read the data from the failed drive is directed to the former parity drive to get the correct data in a single disk access.
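The ordering of steps in Parity Shedding can be sketched as follows. This is an illustrative model only, not CLARiiON firmware: the `Stripe` class, its method names, and the block values are assumptions, the "parity has been shed" flag is modeled as a plain boolean rather than a bit in the extra eight bytes, and writes aimed at the failed drive itself are left out.

```python
from functools import reduce
from operator import xor

class Stripe:
    def __init__(self, data):
        self.data = list(data)
        self.parity = reduce(xor, data, 0)
        self.failed = None         # index of the failed data drive, if any
        self.parity_shed = False   # models the flag kept in the extra sector bytes

    def fail_drive(self, idx):
        self.failed = idx
        self.data[idx] = None      # contents are no longer readable

    def _rebuild_failed(self):
        survivors = [v for v in self.data if v is not None]
        return reduce(xor, survivors, self.parity)

    def read(self, idx):
        if idx != self.failed:
            return self.data[idx]
        # After shedding, the parity position holds the failed drive's data.
        return self.parity if self.parity_shed else self._rebuild_failed()

    def write(self, idx, value):
        if self.failed is None:
            self.parity ^= self.data[idx] ^ value
            self.data[idx] = value
            return
        if idx == self.failed:
            raise NotImplementedError("outside the scope of this sketch")
        if not self.parity_shed:
            # Step 1: overwrite parity with the reconstructed data and set the flag.
            self.parity = self._rebuild_failed()
            self.parity_shed = True
        # Step 2: only now update the data drive; no parity update is pending.
        self.data[idx] = value

stripe = Stripe([1, 2, 3, 4])   # hypothetical block values
stripe.fail_drive(2)            # third drive fails
stripe.write(1, 4)              # second drive changes from 2 to 4
```

After the write, a read of the failed drive is served from the former parity position in one access, and an interruption between step 1 and step 2 leaves nothing stale to reconstruct from.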
When the failed disk drive is replaced, the stripe will rebuild complete with parity to regain "Redundant Mode" operation.

The use of NOVRAM, also a CLARiiON patent, keeps track of the state consistency of all stripes in the array. If the power is lost after a data block has been written, but before the associated calculated parity update is written, this state will be reflected in the NOVRAM entries for the stripe. When power returns, the stripe states are checked and any stripes with inconsistent states have the parity recalculated and then written to the parity sector. In the worst case, if a power failure was accompanied by a failure of a storage processor (i.e., the NOVRAM is lost), a complete Background Verify is executed to ensure that all data and parity are consistent. The Parity Shedding, NOVRAM, and background verify algorithms have patents approved (refer to U.S. Patents 5,305,326 and 5,452,444), and have demonstrated success in over six years of field experience.
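The NOVRAM bookkeeping described above behaves like a small intent log. The sketch below is a simplified model under stated assumptions: the `Array` class, the `crash_before_parity` switch, and the use of a Python set to stand in for NOVRAM entries are all hypothetical, and every drive is assumed to be present at recovery time.

```python
from functools import reduce
from operator import xor

class Array:
    def __init__(self, stripes):
        self.data = [list(s) for s in stripes]
        self.parity = [reduce(xor, s, 0) for s in stripes]
        self.novram = set()   # stripes with a parity update in flight; survives power loss

    def write(self, stripe, idx, value, crash_before_parity=False):
        self.novram.add(stripe)          # record the intent before touching the disks
        old = self.data[stripe][idx]
        self.data[stripe][idx] = value
        if crash_before_parity:
            return                       # simulated power loss: parity is stale, entry remains
        self.parity[stripe] ^= old ^ value
        self.novram.discard(stripe)      # stripe is consistent again

    def recover(self):
        """On power-up, recalculate parity for every stripe flagged in NOVRAM."""
        repaired = sorted(self.novram)
        for s in repaired:
            self.parity[s] = reduce(xor, self.data[s], 0)
        self.novram.clear()
        return repaired

array = Array([[1, 2, 3, 4], [5, 6, 7, 8]])
array.write(0, 1, 3, crash_before_parity=True)
stale_parity = array.parity[0]
repaired = array.recover()
```

Only the flagged stripe is touched on recovery, which is what lets the array avoid a full background verify unless the NOVRAM itself is lost.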

CLARiiON also employs Longitudinal Redundancy Checking (LRC) to assure the integrity of data being written from a storage processor to a disk and being read back from the disk on a host read request. See Figure 4. A portion of the eight additional bytes per sector is used to store the calculated LRC code for each data sector within the stripe. In this way, as multiple streams of data move back and forth at up to 100 megabytes per second across the two "back-end" Fibre Channel Arbitrated Loops (FC-AL), one more level of protection is provided.
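A longitudinal redundancy check is commonly computed by XOR-ing fixed-size words of the data together. The sketch below shows that idea on a 520-byte sector. The exact CLARiiON layout is not given in this paper, so the split of the eight extra bytes into a 4-byte LRC and a 4-byte status field, the little-endian encoding, and the function names are all assumptions.

```python
import struct

DATA_BYTES = 512
SECTOR_BYTES = 520   # 512 data bytes plus the eight extra bytes described in the text

def lrc32(data: bytes) -> int:
    """XOR all 32-bit little-endian words of the data together."""
    code = 0
    for (word,) in struct.iter_unpack("<I", data):
        code ^= word
    return code

def format_sector(data: bytes, status: int = 0) -> bytes:
    """Append a 4-byte LRC and a 4-byte status field (assumed layout) to the data."""
    if len(data) != DATA_BYTES:
        raise ValueError("expected exactly 512 data bytes")
    return data + struct.pack("<II", lrc32(data), status)

def check_sector(sector: bytes) -> bytes:
    """Return the data if the stored LRC matches; raise OSError otherwise."""
    if len(sector) != SECTOR_BYTES:
        raise ValueError("expected exactly 520 sector bytes")
    data, trailer = sector[:DATA_BYTES], sector[DATA_BYTES:]
    stored_lrc, _status = struct.unpack("<II", trailer)
    if lrc32(data) != stored_lrc:
        raise OSError("LRC mismatch: sector data is corrupt")
    return data

payload = bytes(range(256)) * 2
sector = format_sector(payload)

corrupted = bytearray(sector)
corrupted[10] ^= 0xFF            # flip bits in one data byte
```

Calling `check_sector(sector)` returns the payload, while `check_sector(bytes(corrupted))` raises, so a sector damaged in flight is rejected rather than returned to the host.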

Dual Active Storage Processors


The CLARiiON architecture features two active storage processors, with each SP able to access all the drives independently. One benefit of this implementation is the continuous access to data that SP redundancy offers to the user. With two SPs, the user can sustain a single failure at the SP, host, or front-end Fibre Channel level and still have uninterrupted access to the data. A second benefit of dual active SPs is increased throughput (I/O operations per second) as a result of load-sharing the disks between the two SPs. A third benefit arises from the fact that two SPs permit the use of mirrored write cache. This enables applications to write to protected cache memory instead of executing four disk transactions to complete a RAID 5 write operation. CLARiiON offers these benefits by handling the increased data integrity complexity that comes with two SPs.
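The four disk transactions mentioned above correspond to the conventional RAID 5 read-modify-write sequence: read the old data, read the old parity, write the new data, write the new parity. The paper does not spell this sequence out, so the sketch below is an assumption about what is meant; the function name and the block values are illustrative.

```python
def raid5_small_write(disk, data_idx, parity_idx, new_value):
    """Update one data block and its parity; return the number of disk I/Os used."""
    ios = 0
    old_data = disk[data_idx]          # 1: read old data
    ios += 1
    old_parity = disk[parity_idx]      # 2: read old parity
    ios += 1
    disk[data_idx] = new_value         # 3: write new data
    ios += 1
    # 4: write new parity; XOR removes the old data and folds in the new data
    disk[parity_idx] = old_parity ^ old_data ^ new_value
    ios += 1
    return ios

disk = [1, 2, 3, 4, 4]   # four hypothetical data blocks followed by their XOR parity
ios = raid5_small_write(disk, data_idx=1, parity_idx=4, new_value=3)
```

A write that is acknowledged from mirrored cache lets the host continue without waiting for these four transactions to finish.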

Figure 5 shows a CLARiiON system's dual-porting feature. Both SPs can actively access different disks. In this diagram, there are two SPs, with SP-A having performed a write of "3" to a RAID 5 stripe that it owns exclusively. SP-A has calculated a parity value of "12" but has not yet updated the parity value on disk. In Figure 6, SP-A has failed. The user then chooses to transfer ownership of the RAID 5 group to SP-B using CLARiiON's "trespass" feature. This feature allows a smart host and device driver to reroute failed commands to the surviving SP without user intervention (via Application Transparent Failover, or ATF, software). At this point, because the parity value on disk was not updated to "12," there exists an incoherent data/parity cross-section that could prove disastrous should a drive failure occur. Parity no longer reflects the data on the other four drives! The CLARiiON array, however, detects this situation and invokes a Background Verify to ensure that SP-B makes the parity on disk coherent. CLARiiON arrays can do this because of their patented technique of maintaining status information in the additional eight bytes per sector.
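The trespass-then-verify sequence can be sketched as below. This is a simplified model, not the CLARiiON implementation: the dictionary layout, the function names, and the block values are assumptions, the verify simply recomputes parity rather than consulting status bits in the extra sector bytes, and all drives are assumed to be present (a verify cannot repair parity when a data drive is missing).

```python
from functools import reduce
from operator import xor

def background_verify(stripes):
    """Scan every stripe and rewrite parity wherever it disagrees with the data."""
    repaired = []
    for number, stripe in enumerate(stripes):
        expected = reduce(xor, stripe["data"], 0)
        if stripe["parity"] != expected:
            stripe["parity"] = expected
            repaired.append(number)
    return repaired

def trespass(group, surviving_sp):
    """Transfer ownership of a RAID group, then make its parity coherent."""
    group["owner"] = surviving_sp
    return background_verify(group["stripes"])

group = {
    "owner": "SP-A",
    "stripes": [
        {"data": [1, 2, 3, 4], "parity": 4},   # coherent
        {"data": [1, 3, 3, 4], "parity": 4},   # data written, parity update lost with SP-A
    ],
}
repaired = trespass(group, "SP-B")
```

Once the verify completes, a later drive failure in the repaired stripe reconstructs from parity that matches the data actually on disk.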

Of course, without a second SP, the user loses much of the value of a RAID system by becoming susceptible to a single point of failure. The permutations that exist for RAID 5 failure scenarios are numerous. With the addition of a second SP that can access the same disks, the possibilities double. Not only has CLARiiON addressed the complicated failure scenarios inherent in any RAID 5/dual-SP implementation, but it also supports ACTIVE dual SPs, where both storage processors can simultaneously access different RAID groups connected via the alternate back-end FC-AL to the dual-ported drives.

Stripe Access Management


An essential feature of CLARiiON's dual active SP implementation is the exclusive ownership of stripes by each storage processor. That is, for normal read/write access, only one of the two SPs may access a given stripe. Only when the owning SP is lost does the stripe become accessible to the surviving SP via the CLARiiON trespass facility (either automatically or manually). Thus, CLARiiON precludes the occurrence of a wide range of possible data integrity problems.

Let's examine the difficult data integrity problems that might occur if CLARiiON permitted dynamic access by SP-A and SP-B to a given stripe. Consider Figure 7, where the array contains four different files, A, B, C, and D. Parity (P) is contained on the fifth drive. Imagine that both SP-A and SP-B can access any file, with locking being done on those files at the operating system level. SP-A is writing A* as a replacement for file A. This will eventually result in a parity update (P*) to the fifth drive. SP-B is reading file C. In normal mode this sequence of events works properly.

But what happens when drive C is missing, as in Figure 8? The write of A* has already occurred to the first drive. Parity (P*) is being calculated on SP-A but has not yet been written to disk. SP-B, sensing that the data C cannot be directly read from disk, reads the other four drives in order to recalculate C. Once SP-B has read all four drives, it subtracts the three data files from the parity file in order to recalculate C. Note that the parity does not yet reflect the updated data A*. In this case, incorrect data will be returned to the host instead of the contents of file C. All of this can take place even with operating system file locking. If the operating system locks file A, that does not prevent an access to file C.

One possible approach to the problem in Figure 8 might be to use the reserve/release commands found in the SCSI specification (with Fibre Channel, SCSI commands are actually "tunneled" onto the Fibre Channel protocol). These commands could be used at the drive level (after the SCSI commands are stripped off the Fibre Channel link) on every I/O to remove the possibility of reconstructing incorrect data. But this is undesirable because reserve/release commands significantly degrade system performance, and deadlocks would frequently occur that would need to be untangled.
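The interleaving in Figure 8 can be traced with the paper's simplified addition/subtraction model. The numeric values for A, B, C, D, and A* are assumptions made for illustration, as are the function names; the `interleaved` switch selects whether SP-B reads the stripe before or after P* reaches the disk.

```python
def sp_b_reconstruct_c(disk):
    """SP-B rebuilds the missing C by subtracting the three data values from parity."""
    return disk["P"] - disk["A"] - disk["B"] - disk["D"]

def run(interleaved):
    # Hypothetical values; C (value 3) lives on the missing drive.
    disk = {"A": 1, "B": 2, "D": 4, "P": 10}
    new_a = 5
    new_parity = disk["P"] - disk["A"] + new_a   # P* held in SP-A memory
    disk["A"] = new_a                            # SP-A writes A* to the first drive
    if interleaved:
        c = sp_b_reconstruct_c(disk)             # SP-B reads before P* reaches disk
        disk["P"] = new_parity
    else:
        disk["P"] = new_parity                   # the update completes first
        c = sp_b_reconstruct_c(disk)
    return c

shared_access = run(interleaved=True)
single_owner = run(interleaved=False)
```

With shared access the host receives a value that was never stored in file C, while the serialized ordering returns the original contents.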

Another possible scheme is for the operating system to implement stripe locking. Stripe locking would prevent multiple I/Os from occurring simultaneously to the same data/parity stripe. This would require that the operating system know the sector layout and parity rotation algorithms used by the SP firmware. However, a problem still exists when rebuilding a failed drive. If SP-A is rebuilding drive C, for example, SP-B could be writing to the same stripe and incorrect data could be rebuilt. Stripe locking also negates one of the main advantages of RAID 5: multiple independent accesses on the same stripe! CLARiiON's non-sharing SP architecture avoids these issues while still providing operating system independence (and therefore portability), high performance, and data integrity.

Conclusion

One thing is obvious: data integrity is no simple problem. Traditional solutions such as using a UPS or background verify fall short of customers' expectations that their data will always be correct. CLARiiON has earned its level of customer confidence through years of actual usage. The lessons learned from this field experience have given birth to several patented procedures that focus on data integrity. The demand for high availability in the RAID marketplace goes hand in hand with the demand for data integrity. RAID vendors must take care to ensure that incorrect data will never be returned to the user. The response from the marketplace over the key issue of data integrity has been nothing short of outstanding. Over 80,000 CLARiiON systems are running in the field. Millions of hours of operation have been accumulated and hundreds of thousands of real-world problems have occurred. CLARiiON has provided protection for them all.

Copyright 2001 EMC Corporation. All rights reserved. No part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written consent of EMC Corporation. The information contained in this document is subject to change without notice. EMC Corporation assumes no responsibility for any errors that may appear. All computer software programs, including but not limited to microcode, described in this document are furnished under a license, and may be used or copied only in accordance with the terms of such license. EMC either owns or has the right to license the computer software programs described in this document. EMC Corporation retains all rights, title and interest in the computer software programs.

EMC Corporation makes no warranties, expressed or implied, by operation of law or otherwise, relating to this document, the products or the computer software programs described herein. EMC CORPORATION DISCLAIMS ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. In no event shall EMC Corporation be liable for (a) incidental, indirect, special, or consequential damages or (b) any damages whatsoever resulting from the loss of use, data or profits, arising out of this document, even if advised of the possibility of such damages.

EMC2 (the EMC logo), EMC, CLARiiON, CLARalert, and Navisphere are registered trademarks and Access Logix, Application Transparent Failover, MirrorView, SnapView, and where information lives are trademarks of EMC Corporation. All other brands or products may be trademarks or registered trademarks of their respective holders.
