Recommended Best Practices Considerations for High Availability on IBM® System Storage™ DS8000 and DS6000 and IBM TotalStorage® ESS

Prepared by:
Cam-Thuy Do and John Sing
IBM High Availability Center of Competency
October 2007

© Copyright 2007 IBM Corporation. All rights reserved.


IBM Systems and Technology Group

Disclaimers

Copyright © 2007 by International Business Machines Corporation.


No part of this document may be reproduced or transmitted in any form without written permission from IBM
Corporation.
Product data has been reviewed for accuracy as of the date of initial publication. Product data is subject to change
without notice. This information could include technical inaccuracies or typographical errors. IBM may make
improvements and/or changes in the product(s) and/or program(s) at any time without notice.
Any statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and
represent goals and objectives only.
References in this document to IBM products, programs, or services do not imply that IBM intends to make such
products, programs or services available in all countries in which IBM operates or does business. Any reference to
an IBM Program Product in this document is not intended to state or imply that only that program product may be
used. Any functionally equivalent program that does not infringe IBM’s intellectual property rights may be used
instead. It is the user’s responsibility to evaluate and verify the operation of any non-IBM product, program or service.
THE INFORMATION PROVIDED IN THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER
EXPRESS OR IMPLIED. IBM EXPRESSLY DISCLAIMS ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
PARTICULAR PURPOSE OR NONINFRINGEMENT. IBM shall have no responsibility to update this information. IBM
products are warranted according to the terms and conditions of the agreements (e.g., IBM Customer Agreement,
Statement of Limited Warranty, International Program License Agreement, etc.) under which they are provided. IBM
is not responsible for the performance or interoperability of any non-IBM products discussed herein.
The provision of the information contained herein is not intended to, and does not, grant any right or license under any
IBM patents or copyrights. Inquiries regarding patent or copyright licenses should be made, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.

Version 1.8 Copyright © IBM Corporation 2007. All rights reserved.



Trademarks
IBM, IBM eServer, IBM logo, e-business logo, CICS, DB2, MQ, ESCON, Enterprise Storage Server, GDPS, IMS, MVS,
OS/390, Parallel Sysplex, Redbook, Resource Link, S/390, System z9, iSeries, pSeries, xSeries, OS/400, i5/OS, System
Storage, TotalStorage, VM/ESA, VSE/ESA, WebSphere, z/OS, z/VM, z/VSE, and zSeries are trademarks or registered
trademarks of International Business Machines Corp. in the United States, other countries, or both.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Microsoft is a registered trademark of Microsoft Corporation in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States, other countries, or both.


IBM System Storage Enterprise Disk

[Figure: IBM System Storage enterprise disk family (DS6000, DS8000, ESS 750 / 800), with the captions "New Standard in Pricing and Packaging" and "New Standard in Functionality, Performance, TCO"]

ƒ This document provides a summary of recommended High Availability best practice considerations for the DS8000, DS6000, and Enterprise Storage Server disk subsystems
ƒ The reader is assumed to have a baseline understanding of the concepts and facilities of these products


System Storage Enterprise Disk Practices

ƒ Configuration

• RAID 5 spreads data and parity (P) across multiple disk drives and keeps spares (S), thus providing redundancy (e.g. a 6+P+S array consists of six drives' worth of data, one drive's worth of distributed parity, and one spare)
¾ Use RAID-5 when the goal is to use less raw storage, at the expense of a longer rebuild time if a drive fails

• RAID 10 stripes data across half of the drives in the array and mirrors it onto the other half
¾ Use RAID-10 when the goal is the highest performance and/or a shorter rebuild time
¾ At the expense of requiring a larger amount of raw storage

ƒ Exploit available hardware options


• Server and storage failover/failback in a Metro Mirror environment
• Concurrent maintenance
• Minimize single-frame DS8300 purchases, because the upgrade that adds the first expansion frame is disruptive
• Distribute host connections across multiple physical adapters on the DS8000
• Verify that all host paths are available before upgrading software
• Logical Partitioning (LPAR) capability to distribute workloads
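The capacity trade-off between the two RAID levels above can be made concrete with a little arithmetic. The sketch below is a minimal Python model assuming 8-drive arrays; the 6+P+S layout comes from the text, while the RAID-10 3x2+2S layout, the `ArrayFormat` class and the 146 GB drive size are illustrative assumptions rather than a configuration rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayFormat:
    """Drive counts for one array (hypothetical model, whole drives only)."""
    name: str
    data: int        # drives' worth of user-visible capacity
    redundancy: int  # parity (RAID-5) or mirror copies (RAID-10)
    spares: int

    @property
    def total(self) -> int:
        return self.data + self.redundancy + self.spares

    def usable_tb(self, drive_tb: float) -> float:
        """Usable capacity in TB for drives of size drive_tb."""
        return self.data * drive_tb

    def efficiency(self) -> float:
        """Fraction of installed drives that holds user data."""
        return self.data / self.total


# 6+P+S is the example from the text; 3x2+2S is an assumed RAID-10 layout.
RAID5_6PS = ArrayFormat("RAID-5 6+P+S", data=6, redundancy=1, spares=1)
RAID10_3X2_2S = ArrayFormat("RAID-10 3x2+2S", data=3, redundancy=3, spares=2)

for fmt in (RAID5_6PS, RAID10_3X2_2S):
    print(f"{fmt.name}: {fmt.total} drives, "
          f"{fmt.usable_tb(0.146):.3f} TB usable with 146 GB drives, "
          f"{fmt.efficiency():.1%} of drives hold data")
```

Under these assumptions the same eight drives give RAID-5 twice the usable capacity of RAID-10. RAID-10 buys back performance and faster rebuilds because a failed drive is rebuilt from its mirror rather than by reading all surviving members of the array.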


System Storage Enterprise Disk Practices

ƒ Multiple Redundant Management Control Consoles


ƒ Uninterruptible Power Supply
ƒ Earthquake Resistant Kit (where applicable)
ƒ Consider IBM Standby Capacity on Demand (Standby CoD) offering for capacity planning
ƒ Enable Call Home and Remote Support
ƒ Monitor the storage subsystem status
• E-mail notification of serviceable events
• Simple Network Management Protocol (SNMP) notification
• Service Information Message (SIM) notification – zSeries
• Review of the DS8000 event log


System Storage Enterprise Disk Practices

ƒ Maintain Currency
• Create a regular maintenance window for storage and SAN
• Install firmware updates as recommended
• Understand which fixes and upgrades a firmware update contains
• Integrate updates into Change Control Management
• Consider installing updates first on less critical systems, before production
• Maintain supported combinations of host adapter drivers
• Subscribe to MySupport
http://www.ibm.com/support/mySupport

ƒ Concurrent Maintenance
• Perform concurrent maintenance operations on the storage subsystem during periods of low activity
• Microcode upgrades are performed by IBM support personnel


System Storage Enterprise Disk Practices

ƒ Host Based Monitors and Alerts
• GDPS/PPRC HyperSwap Manager
• GDPS/PPRC HyperSwap Monitors & Alerts
• TPC-R
ƒ Host Based Collection Facilities
• z/OS LOGREC
ƒ Host Based High Availability Options for Data
• DFSMS Dataset Name separation
ƒ Host Connections – provide multiple paths from each host to the storage
• MPIO or Subsystem Device Driver (SDD) for Open Systems
• Dynamic Path Selection (DPS) and Dynamic Path Reconnect (DPR) for z/OS
• Distribute paths across multiple physical adapters on the DS8000
ƒ System i
• DSCLI commands executed through i5/OS interface
• ‘Copy Services for System i’ Toolkit
• Combination of iSeries Navigator and 5250 interface
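Before a software or microcode upgrade, the host-connection practices (multiple paths, spread across physical adapters) can be checked mechanically. The Python sketch below assumes you have already collected path state from your multipath driver into simple records; the `Path` record, its field values (such as the adapter IDs) and the `path_problems` helper are hypothetical and do not parse any real tool's output.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Path:
    host: str
    host_port: str        # host-side HBA port (hypothetical naming)
    storage_adapter: str  # physical host adapter port on the storage subsystem
    state: str            # "online" or anything else (hypothetical values)


def path_problems(paths: Iterable[Path], min_adapters: int = 2) -> Dict[str, List[str]]:
    """Return {host: [problem, ...]} for every host that is not ready for an upgrade."""
    by_host: Dict[str, List[Path]] = defaultdict(list)
    for p in paths:
        by_host[p.host].append(p)

    problems: Dict[str, List[str]] = {}
    for host, host_paths in sorted(by_host.items()):
        issues = []
        offline = [p for p in host_paths if p.state != "online"]
        if offline:
            issues.append(f"{len(offline)} path(s) not online")
        # Count distinct storage-side adapters reachable over online paths only.
        adapters = {p.storage_adapter for p in host_paths if p.state == "online"}
        if len(adapters) < min_adapters:
            issues.append(f"online paths use only {len(adapters)} storage adapter(s)")
        if issues:
            problems[host] = issues
    return problems


paths = [
    Path("hostA", "fcs0", "I0001", "online"),
    Path("hostA", "fcs1", "I0131", "online"),
    Path("hostB", "fcs0", "I0001", "online"),
    Path("hostB", "fcs1", "I0001", "offline"),
]
for host, issues in path_problems(paths).items():
    print(host, "->", "; ".join(issues))
```

A host whose paths all land on one storage adapter passes a simple "all paths online" test but still loses access if that adapter is serviced, which is why the sketch counts distinct adapters as well as offline paths.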


System Storage Enterprise Disk Practices

ƒ Duplicate Storage Subsystems in Campus or Same Data Center Floor


• Can use Metro Mirror for data redundancy, to enable a quick re-IPL
¾ Requires automation software such as TPC-R or GDPS
• Use IBM Softek TDMF to move data in real time
• Can perform a local site switch before maintenance actions to reduce the impact of human error on production

ƒ Know the following IBM System Storage web sites


• IBM System Storage support web site
¾ Starting point for IBM System Storage hardware and software support
¾ Includes links to subscription services to sign up for email alerts
¾ Includes links to product docs, contact information, fix search engine
¾ IBM System Storage Interoperation Center
http://www-01.ibm.com/servers/storage/support/config/ess/index.jsp

• Fibre Channel host bus adapter firmware and driver level matrix site


System Storage Enterprise Disk Practices – Advanced Copy Functions Overview for Availability

ƒ Point in Time Copy (FlashCopy)


• Minimize application / database downtime required to make local point in time copies for:
- Production backup, data cloning, data warehouse, test and development
- Disk subsystem microcode creates an internal copy of the data (FlashCopy)
¾ Copy initialization of many terabytes of data can be accomplished in seconds

ƒ Remote Mirroring (Metro Mirror, Global Mirror, zOS Global Mirror)


• Create real-time, continuously updated remote copies of disk subsystem data
- Campus, metropolitan, or geographically distant site
• Data suitable for High Availability fast failover and failback
• Supports large amounts of data, at the terabyte level
• Disk subsystem microcode mirrors volumes/LUNs to remote disk subsystem
- Synchronous capability (Metro Mirror)
- Asynchronous capability (Global Mirror)


System Storage Enterprise Disk Practices – Point in Time Internal Data Replication

ƒ Fast Time-Zero internal data replication capability (FlashCopy)


• Create internal copies of data for backup, cloning, data mining, etc.
ƒ Physical configuration
• Ensure that sufficient target disk space is allocated
ƒ Usage practices:
• Plan for databases/applications to be in hot backup mode, or quiesced, to maintain data integrity
• Back up the internal volumes/LUNs required for:
¾ Operating System catalogs, etc.
¾ Database/application metadata


System Storage Enterprise Disk Practices – Metro Mirror - Synchronous Data Replication
ƒ Applicability:
• General: provides synchronous data replication of the disk subsystem at the volume / LUN level
• System z: in combination with GDPS HyperSwap, provides the foundation for removing the disk subsystem as a single point of failure in a Parallel Sysplex
ƒ Physical configuration, link and infrastructure planning
• Perform initial and ongoing analysis of the write workload to determine the required SAN/WAN/telecom infrastructure bandwidth

ƒ Automation
• Plan for highly automated operational control of mirroring to mask complexity and
support reliability, repeatability, testability

ƒ Testing and testing resource expectations


• Plan to provide Tertiary Copy storage at the remote site
- For every production TB to be mirrored, ideally 2x that TB at the remote site
- Provides additional storage for an ongoing testing environment, resync protection, golden copy, problem determination, and validation
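The remote-site capacity guidance on the replication slides reduces to a multiplier per mirroring technique. A minimal Python sketch follows; it reads "ideally 2x that TB" as the total remote capacity (mirror secondary plus tertiary copy) rather than an amount added on top of the secondary, and that reading, the dictionary keys and the helper name are assumptions. The 2x (Metro Mirror), 3x (Global Mirror) and 2x (z/OS Global Mirror) factors are the ones given in the text.

```python
# Ideal remote capacity per production TB, as given on the replication slides.
REMOTE_MULTIPLIER = {
    "metro_mirror": 2,       # assumed breakdown: Metro Mirror secondary + tertiary copy
    "global_mirror": 3,      # assumed breakdown: Global Copy secondary + FlashCopy + tertiary copy
    "zos_global_mirror": 2,  # assumed breakdown: XRC secondary + tertiary copy
}


def remote_capacity_tb(production_tb: float, technique: str) -> float:
    """Ideal remote-site capacity (TB) for production_tb of mirrored data."""
    if production_tb < 0:
        raise ValueError("production_tb must be non-negative")
    try:
        factor = REMOTE_MULTIPLIER[technique]
    except KeyError:
        raise ValueError(f"unknown technique: {technique!r}") from None
    return production_tb * factor


for name in REMOTE_MULTIPLIER:
    print(f"{name}: {remote_capacity_tb(40, name)} TB remote for 40 TB production")
```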


System Storage Enterprise Disk Practices – Global Mirror - Asynchronous Data Replication
ƒ Applicability: IBM Global Mirror is usually chosen when
• Asynchronous replication of Open Systems volumes/LUNs, or of a mix of z/OS and Open Systems volumes/LUNs, is desired, and when reduced bandwidth is a necessity
ƒ Link and infrastructure planning
• Perform initial and ongoing analysis of the write workload to determine the required SAN/WAN/telecom infrastructure bandwidth
• Similar speed and throughput characteristics on source and target volumes can provide
optimum performance
ƒ Automation
• Plan for highly automated operational control of mirroring to mask complexity and
support reliability, repeatability, testability
ƒ Availability and Testing
• Plan to provide sufficient Tertiary Copy storage at the remote site
- For every production TB to be mirrored, ideally 3x that TB at the remote site
- Provides storage for an ongoing testing environment, resync protection, golden copy, problem determination, and validation
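The write-workload analysis called for on the Metro Mirror, Global Mirror and z/OS Global Mirror slides ultimately produces a link-bandwidth figure. The Python sketch below shows one simple way to do the arithmetic, assuming write-rate samples in MB/s taken over short measurement intervals; the 20% protocol overhead, the nearest-rank percentile and the idea of sizing an asynchronous link below the peak are illustrative assumptions, not IBM sizing rules.

```python
import math
from typing import Sequence


def nearest_rank_percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of samples, with pct in (0, 100]."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]


def link_mbps(write_mb_per_s: Sequence[float], pct: float = 100.0,
              protocol_overhead: float = 0.2) -> float:
    """Estimated link bandwidth in megabits/s for write-rate samples in MB/s.

    pct=100 sizes to the peak (synchronous); a lower pct accepts some lag (asynchronous).
    protocol_overhead is an assumed allowance for framing and protocol traffic.
    """
    if protocol_overhead < 0:
        raise ValueError("protocol_overhead must be non-negative")
    rate = nearest_rank_percentile(write_mb_per_s, pct)
    return rate * 8 * (1 + protocol_overhead)  # MB/s -> Mb/s, plus overhead


samples = [12, 30, 45, 18, 60, 25, 40, 22, 35, 50]  # illustrative write MB/s
print(f"synchronous, sized to peak: {link_mbps(samples):.0f} Mb/s")
print(f"asynchronous, 90th percentile: {link_mbps(samples, pct=90):.0f} Mb/s")
```

Synchronous mirroring adds replication delay to every write, so its link is normally sized to the peak write rate; an asynchronous link can run below the peak and let the secondary fall behind briefly, which is where Global Mirror's reduced-bandwidth advantage comes from. Averages over long intervals hide bursts, so the samples should be as fine-grained as your measurement tools allow.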


System Storage Enterprise Disk Practices – z/OS Global Mirror (XRC) - Asynchronous Data Replication
ƒ Applicability of IBM z/OS Global Mirror (XRC):
• General: z/OS Global Mirror is usually chosen when:
- Only z/OS data requires asynchronous data replication, or when z/OS disk subsystems from heterogeneous vendors must be supported
ƒ Physical configuration, link and infrastructure planning
• Perform initial and ongoing analysis of the write workload to determine the required SAN/WAN/telecom infrastructure bandwidth
• Similar speed and throughput characteristics on source and target volumes can provide optimum performance
• Plan to provide sufficient System z cycles at the remote site for the System Data Mover (SDM)
ƒ Automation
• Plan for highly automated operational control of mirroring to mask complexity and
support reliability, repeatability, testability
ƒ Availability and Testing
• Plan to provide sufficient Tertiary Copy storage at the remote site
- For every production TB to be mirrored, ideally 2x that TB at the remote site
- Provides an ongoing testing environment for setup, validation, and problem determination


System Storage Enterprise Disk Practices – Three site replication

ƒ When to use three-site replication
• Three-site replication is used when the requirement is to combine a zero-data-loss RPO, using local Metro Mirror, with out-of-region (asynchronous) recovery.

ƒ Prerequisites: three-site replication is affordable and justifiable to the business when:
• The data center strategy and implementation are already well under way towards Active-Active or Planned Workload Rotation for two sites

ƒ Prerequisite: the two-site configuration already includes ongoing:


• Automated failover/failback
• Full Tertiary Copy capability for testing, problem determination, validation, automation
• Ongoing WAN / bandwidth / workload Capacity Planning


System Storage Enterprise Disk Practices – Management of Replication

ƒ Plan for a highly automated disk mirroring environment

ƒ Automation provides the foundation for reliability, repeatability, scalability, and testability

ƒ Recommendations for automation software:

• System z environment: GDPS


• Mixed open platform: GDOC
• General disk mirroring mgmt: TPC for Replication


System Storage Enterprise Disk Practices – Resources


System Storage Business Continuity Solutions website
ƒ http://www-03.ibm.com/servers/storage/solutions/business_continuity/index.html
System Storage Technology Center
ƒ http://www-03.ibm.com/system/storage/
Storage Education
ƒ http://www-03.ibm.com/systems/education/cust/crossprod/custcp.html
System Storage Interoperation Center
ƒ http://www-01.ibm.com/systems/support/storage/config/ssic/index.jsp

System Storage Services


ƒ http://www-03.ibm.com/systems/storage/services/index.html

Redbooks/Redpapers
ƒ http://www.redbooks.ibm.com/redbooks.nsf/portals/Storage
ƒ The IBM TotalStorage DS8000 Series: Concepts and Architecture (SG24-6452-00)
ƒ IBM System Storage Business Continuity Solutions Overview (SG24-6684-01)
ƒ IBM System Storage DS8000 Series: Copy Services with IBM System z (SG24-6787-02)
ƒ IBM System Storage DS8000 Series: Copy Services in Open Environments (SG24-6788-02)
ƒ IBM System Storage Solutions Handbook (SG24-5250-06)
White papers
ƒ IBM Storage Infrastructure for Business Continuity Solution
ƒ Global Mirror Technical Whitepaper



Data Corruption Solutions


System Storage Enterprise Disk Practices – Data Corruption

ƒ Logical data corruption protection must be designed at the operational and application level

ƒ Best practice procedures are:
• Sufficient point in time disk copies of data
• To provide adequate known restart points
• Supplemented by operational procedures at the database/application level

ƒ Tools include (but are not limited to):


• FlashCopy Point in Time Copy
• Software:
¾ zCDP for DB2 (z/OS 1.8 + DB2 9)
- Eliminates the need for DB2 backup windows via the DB2 BACKUP utility
- No interruption to DB2 processing to take backups
- DFSMShsm maintains up to 50 backup versions across disk and tape
- DB2 RESTORE utility granularity: system, volume, DB table

¾ Future: zCDP for Storage – IBM statement of direction (SOD) on providing CDP function for all z/OS data



Supplemental Information


FlashCopy: Local Point in Time Data Replication to improve data availability

[Figure: FlashCopy timeline from source to target. Copy data command issued: copy is immediately available. Read and write to both source and copy possible. Optional background copy. When copy is complete, the data relationship between source and target ends.]

FlashCopy Use Cases
- Production backup
¾ Regain information from an older level of data
¾ Re-establish production in case of any server errors
- Data backup
¾ Create backups with the shortest possible application outage
- Data Mining
¾ Avoid performance impacts on the production system
- Test system
¾ Allow testing of new applications with real production data
- Moving and migrating data
¾ Move a consistent data set from one host to another with a minimum of downtime for the host application
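The FlashCopy timeline (copy immediately available, reads and writes to both sides, optional background copy, relationship ends when the copy completes) can be modeled with a per-track bitmap. This is a toy Python model of copy-on-write semantics, assuming whole-track writes; the class and method names are hypothetical and it does not describe the DS8000 microcode.

```python
class FlashCopyRelation:
    """Toy point-in-time copy of a volume modeled as a list of tracks."""

    def __init__(self, source: list, background_copy: bool = True):
        self.source = source                  # live source volume (updated in place)
        self.target = [None] * len(source)    # target volume, initially empty
        self.copied = [False] * len(source)   # bitmap: is this track already on the target?
        self.background_copy = background_copy

    def _copy_track(self, i: int) -> None:
        if not self.copied[i]:
            self.target[i] = self.source[i]
            self.copied[i] = True

    def read_target(self, i: int):
        # Tracks not yet copied are read from the source, which is unchanged there.
        return self.target[i] if self.copied[i] else self.source[i]

    def write_source(self, i: int, data) -> None:
        # Copy-on-write: preserve the time-zero image before updating the source.
        self._copy_track(i)
        self.source[i] = data

    def write_target(self, i: int, data) -> None:
        # A whole-track write makes this target track independent of the source.
        self.target[i] = data
        self.copied[i] = True

    def background_step(self, max_tracks: int = 1) -> None:
        """Copy up to max_tracks remaining tracks if background copy is enabled."""
        if not self.background_copy:
            return
        for i, done in enumerate(self.copied):
            if max_tracks == 0:
                break
            if not done:
                self._copy_track(i)
                max_tracks -= 1

    @property
    def complete(self) -> bool:
        # Once every track is on the target, the relationship can end.
        return all(self.copied)


vol = ["t0", "t1", "t2", "t3"]
fc = FlashCopyRelation(vol)
fc.write_source(0, "new0")         # source changes after time zero
print(fc.read_target(0))           # target still shows the time-zero data
fc.background_step(max_tracks=10)
print(fc.complete)                 # True once every track has been copied
```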


Metro Mirror: synchronous replication of data between two storage subsystems to improve data availability

[Figure: server cluster connected over a network to two storage subsystems with Metro Mirror mirroring between them; labels "Scalable" and "Data Integrity"]

ƒ Synchronous data mirroring (up to 300 km)
ƒ Superior performance
- Low internal Metro Mirror overhead (at zero distance, the additional DS8000 overhead is 0.38 ms)
- Optimized protocol exchange
- Each 100 km adds 1 ms
- Plus switch/channel extender overheads
- Generally fewer links required than competing solutions
ƒ Platform environment
• System z: GDPS/PPRC, GDPS/PPRC HyperSwap Manager
• System p: AIX HACMP/XD + Metro Mirror
• System i: High Availability Business Partner software; ASR Toolkit
• Geographically Dispersed Open Clusters (GDOC) for Unix, Linux and Windows
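The Metro Mirror performance figures combine into a simple per-write estimate: a fixed DS8000 overhead of 0.38 ms at zero distance, plus 1 ms per 100 km, plus whatever switches or channel extenders add. The 1 ms per 100 km figure is consistent with light travelling at roughly 200,000 km/s in fiber, about 0.5 ms each way per 100 km for the round trip a synchronous write waits on. The Python sketch below, including its function name and the treatment of the 300 km figure as a hard limit, is an illustrative assumption.

```python
BASE_OVERHEAD_MS = 0.38  # DS8000 Metro Mirror overhead at zero distance (from the slide)
MS_PER_100_KM = 1.0      # added per 100 km of distance (from the slide)
MAX_DISTANCE_KM = 300    # Metro Mirror distance quoted on the slide


def metro_mirror_write_overhead_ms(distance_km: float, extender_ms: float = 0.0) -> float:
    """Estimated extra response time per write, in ms, for a given site distance."""
    if not 0 <= distance_km <= MAX_DISTANCE_KM:
        raise ValueError(f"distance must be between 0 and {MAX_DISTANCE_KM} km")
    if extender_ms < 0:
        raise ValueError("extender_ms must be non-negative")
    return BASE_OVERHEAD_MS + MS_PER_100_KM * distance_km / 100 + extender_ms


for km in (0, 10, 50, 100, 300):
    print(f"{km:>3} km: +{metro_mirror_write_overhead_ms(km):.2f} ms per write")
```

As a rough guide, multiplying the estimate by the number of dependent (serialized) writes in a transaction shows how site distance feeds into transaction response time.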


GDPS/PPRC HyperSwap Manager and Metro Mirror


ƒ Extends Parallel Sysplex availability to z/OS DS8000, DS6000, ESS disk subsystems
• Eliminates the disk subsystem as a single point of failure in a z/OS Parallel Sysplex

ƒ Masks primary disk subsystem failures by transparently switching applications to use secondary disks (Unplanned HyperSwap)

ƒ Provides the ability to perform disk maintenance without requiring applications to be quiesced (Planned HyperSwap)

ƒ Delivered as an IGS Services offering

ƒ Technical concept:
• Planned or unplanned HyperSwap dynamically substitutes the DS8000, DS6000, or ESS Metro Mirror secondary device for the primary device
• No operator interaction - GDPS-managed
• Can swap a large number of volumes - fast
• Includes volumes with Sysres, page data sets, catalogs
• Non-disruptive - applications keep using the same device addresses

[Figure: UCBs switched from the primary (P) to the secondary (S) device of a Metro Mirror pair]


Global Mirror: Asynchronous data replication between two storage subsystems to improve data availability at global distance

ƒ Two site, unlimited global distance
ƒ Complete and consistent data mirroring
ƒ Consistency groups
¾ Across z/OS and Open Systems data
¾ Across up to 16 subsystems
ƒ Currency can be configured to as little as 3 to 5 seconds behind host I/O
ƒ Native application performance
ƒ Platform environment
• System z: GDPS/GM
• Geographically Dispersed Open Clusters (GDOC) for Unix, Linux and Windows

[Figure: primary hosts and remote hosts; primary 'A' volumes mirrored across the SAN by Global Copy to secondary 'B' volumes, with FlashCopy at the remote site; labels "Native Performance", "Transmission Performance", "Consistent Data"]


z/OS Global Mirror (XRC): Asynchronous data replication between two storage subsystems to improve data availability at global distance, using System z MIPS

ƒ Productivity tool that integrates management of XRC and FlashCopy
ƒ Premium performance & scalability
¾ Data moved by System Data Mover (SDM) address space(s) running on System z
¾ Supports heterogeneous disk subsystems
ƒ GDPS/XRC runs in the SDM location
¾ Manages availability of the SDM Sysplex
¾ Performs fully automated site failover
ƒ Single point of control for multiple / coupled System Data Movers
ƒ Supports zSeries and zSeries Linux data
ƒ Over 200 installations worldwide

[Figure: production systems writing to primary disk subsystems; SDM systems running GDPS/XRC, with journals, apply the data to secondary disk subsystems]

XRC manages secondary consistency
• Across any number of primary subsystems
• All writes are time-stamped and sorted before being committed to secondary devices
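The last two points of the z/OS Global Mirror slide (consistency across any number of primary subsystems, and time-stamped writes sorted before they are committed to the secondary) describe a merge-by-timestamp step. The Python sketch below is a simplified model of that idea, not the System Data Mover's actual algorithm. It assumes each primary subsystem reports its writes in timestamp order together with a high-water timestamp up to which its data is known to be complete, and it applies only writes at or before the smallest high-water mark, carrying later writes into the next cycle. `Write`, `form_consistency_group`, `apply_to_secondary` and the high-water mark are hypothetical names and concepts introduced for illustration.

```python
import heapq
from typing import Dict, List, NamedTuple, Tuple


class Write(NamedTuple):
    timestamp: float  # host-assigned time stamp of the write
    volume: str
    data: str


def form_consistency_group(
    reads: Dict[str, Tuple[List[Write], float]],
) -> Tuple[float, List[Write], List[Write]]:
    """reads maps subsystem -> (writes sorted by timestamp, high-water timestamp).

    Returns (consistency_time, writes_to_apply, writes_to_carry_over).
    """
    if not reads:
        raise ValueError("need at least one primary subsystem")
    # No subsystem can later deliver a write older than its own high-water mark,
    # so everything up to the smallest mark can be ordered safely.
    consistency_time = min(high_water for _, high_water in reads.values())
    ready: List[Write] = []
    carry: List[Write] = []
    # heapq.merge interleaves the already-sorted per-subsystem streams by timestamp.
    for w in heapq.merge(*(writes for writes, _ in reads.values())):
        (ready if w.timestamp <= consistency_time else carry).append(w)
    return consistency_time, ready, carry


def apply_to_secondary(secondary: Dict[str, str], writes: List[Write]) -> None:
    """Apply writes in the given (timestamp) order, so later updates win."""
    for w in writes:
        secondary[w.volume] = w.data


reads = {
    "primary-1": ([Write(1.0, "VOL001", "a1"), Write(4.0, "VOL001", "a2")], 4.0),
    "primary-2": ([Write(2.0, "VOL002", "b1"), Write(3.0, "VOL001", "b2")], 3.0),
}
ct, ready, carry = form_consistency_group(reads)
secondary: Dict[str, str] = {}
apply_to_secondary(secondary, ready)
print(ct, secondary, [w.data for w in carry])
```

Here the write stamped 4.0 is held back because primary-2 has only vouched for its data up to 3.0; applying it early could leave the secondary in a state that never existed on the primaries.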

Metro/Global Mirror: IBM three site recovery

ƒ Ability to switch production to any site
¾ Planned/Unplanned Outage
¾ Minimal Data Movement
ƒ Protection from local site disaster
¾ Metro Mirror (Sync PPRC)
¾ GDPS/MGM with HyperSwap locally
ƒ Protection from regional disaster
¾ Global Mirror (Async PPRC)
¾ Minimal Data Loss (3-5 seconds)
ƒ Resynchronize any site with incremental changes only
ƒ Managed by GDPS/MGM or TPC-R

[Figure: three-site Metro/Global Mirror topology; labels include LH, RH, IH, RJ, C, "Metro Mirror", "Global Mirror", "Backup" and "Regional"]
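Resynchronizing "with incremental changes only" relies on recording which tracks changed while a pair was suspended, so that resynchronization copies only those tracks instead of the whole volume. A minimal Python sketch, assuming a volume modeled as a list of tracks; `ChangeRecorder` is a hypothetical name, and a Python set stands in for the change-recording bitmap a storage subsystem would keep.

```python
class ChangeRecorder:
    """Records which tracks changed on the active side while mirroring was suspended."""

    def __init__(self) -> None:
        self.changed = set()  # stands in for a per-track change bitmap

    def record_write(self, volume: list, track: int, data) -> None:
        volume[track] = data
        self.changed.add(track)

    def resync(self, source: list, target: list) -> list:
        """Copy only the changed tracks from source to target; return them."""
        tracks = sorted(self.changed)
        for t in tracks:
            target[t] = source[t]
        self.changed.clear()
        return tracks


site_a = [f"t{i}" for i in range(8)]
site_b = list(site_a)                    # in sync before the outage
recorder = ChangeRecorder()
recorder.record_write(site_a, 2, "t2'")
recorder.record_write(site_a, 5, "t5'")
recorder.record_write(site_a, 2, "t2''")  # a repeated write to a track costs nothing extra
copied = recorder.resync(site_a, site_b)
print(f"copied {len(copied)} of {len(site_a)} tracks: {copied}")
```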


IBM TotalStorage Productivity Center for Replication (TPC-R)

ƒ Enable the configuration of complex replication environments, provide feedback on the state of their operations, and make changes easy to accomplish
ƒ Provide a common interface
• Single point of control
• Single set of commands and session states
ƒ Build on copy services functions to provide our customers a DR solution
ƒ Dynamically monitor Metro Mirror and maintain write order data consistency
ƒ Hide differing hardware technologies and unique Copy Services function implementations
ƒ Automate the Metro/Global Mirror Incremental Resync function

[Figure: TPC-R architecture. GUI / CLI / API over TPC-R V3.1 and TPC-R Two-Site BC V3.1 (basic function plus High Availability DR Management); FlashCopy, Metro Mirror, Global Mirror, Session Management, Consistency Groups, Replication Monitor, 3rd Party Storage; Copy Device Interface to ESS 800, DS8K, DS6K, SVC]
