Storage - DS8K HA Best Practices - v1.9

Recommended Best Practices Considerations for High
Availability on IBM® System Storage™ DS8000 and
DS6000 and IBM TotalStorage® ESS
Prepared by:
Cam-Thuy Do and John Sing
IBM High Availability Center of Competency
October 2007
© Copyright 2007 IBM Corporation. All rights reserved.

IBM Systems and Technology Group
Disclaimers
Copyright © 2007 by International Business Machines Corporation.

No part of this document may be reproduced or transmitted in any form without written permission from IBM
Corporation.
Product data has been reviewed for accuracy as of the date of initial publication. Product data is subject to change
without notice. This information could include technical inaccuracies or typographical errors. IBM may make
improvements and/or changes in the product(s) and/or programs(s) at any time without notice.
Any statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and
represent goals and objectives only.
References in this document to IBM products, programs, or services do not imply that IBM intends to make such
products, programs or services available in all countries in which IBM operates or does business. Any reference to
an IBM Program Product in this document is not intended to state or imply that only that program product may be
used. Any functionally equivalent program, that does not infringe IBM’s intellectual property rights, may be used
instead. It is the user’s responsibility to evaluate and verify the operation of any non-IBM product, program or service.
THE INFORMATION PROVIDED IN THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER
EXPRESS OR IMPLIED. IBM EXPRESSLY DISCLAIMS ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
PARTICULAR PURPOSE OR NONINFRINGEMENT. IBM shall have no responsibility to update this information. IBM
products are warranted according to the terms and conditions of the agreements (e.g., IBM Customer Agreement,
Statement of Limited Warranty, International Program License Agreement, etc.) under which they are provided. IBM
is not responsible for the performance or interoperability of any non-IBM products discussed herein.
The provision of the information contained herein is not intended to, and does not, grant any right or license under any
IBM patents or copyrights. Inquiries regarding patent or copyright licenses should be made, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.
Version 1.8 Copyright © IBM Corporation 2007. All rights reserved.

Trademarks
IBM, IBM eServer, IBM logo, e-business logo, CICS, DB2, MQ, ESCON, Enterprise Storage Server, GDPS, IMS, MVS,
OS/390, Parallel Sysplex, Redbook, Resource Link, S/390, System z9, iSeries, pSeries, xSeries, OS/400, i5/OS, System
Storage, TotalStorage, VM/ESA, VSE/ESA, WebSphere, z/OS, z/VM, z/VSE, and zSeries are trademarks or registered
trademarks of International Business Machines Corp. in the United States, other countries, or both.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft is a registered trademark of Microsoft Corporation in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States, other countries, or both.

IBM System Storage Enterprise Disk
[Figure: enterprise disk family. DS6000: New Standard in Pricing and Packaging. ESS 750 / 800. DS8000: New Standard in Functionality, Performance, TCO.]
This document provides a summary of recommended High Availability best practice
considerations for the DS8000, DS6000, and Enterprise Storage Server disk subsystems.
The reader is assumed to have a baseline understanding of the concepts and facilities of
these products.

System Storage Enterprise Disk Practices
Configuration
• RAID 5 - spreads data and parity (P) across multiple disk drives, with spares (S), thus
providing redundancy (e.g. a 6+P+S array consists of six data drives, one parity drive and
one spare)
➢ Use RAID 5 when the goal is to use less raw storage, at the expense of a longer rebuild time if a drive fails
• RAID 10 - stripes data across half the disk drives in the array, while the other half mirrors the first
set of disk drives
➢ Use RAID 10 when the goal is highest performance and/or a shorter rebuild time
➢ At the expense of requiring a larger amount of raw storage
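The raw-storage trade-off between the two RAID levels can be made concrete with a little arithmetic. The sketch below is illustrative only: the 6+P+S layout comes from the text above, while the 3+3+2S RAID 10 layout on the same eight drives and the 146 GB drive size are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayLayout:
    """Drive counts for one array; all figures are illustrative."""
    name: str
    data_drives: int     # drives' worth of usable data capacity
    parity_drives: int   # drives' worth of parity (RAID 5)
    mirror_drives: int   # drives holding mirror copies (RAID 10)
    spare_drives: int

    @property
    def total_drives(self) -> int:
        return (self.data_drives + self.parity_drives
                + self.mirror_drives + self.spare_drives)

    def usable_gb(self, drive_gb: float) -> float:
        # Only the data drives contribute usable capacity.
        return self.data_drives * drive_gb

    def efficiency(self) -> float:
        # Fraction of raw capacity that is usable.
        return self.data_drives / self.total_drives


# 6+P+S is taken from the text; 3+3+2S is an assumed RAID 10 layout.
RAID5_6PS = ArrayLayout("RAID 5 (6+P+S)", 6, 1, 0, 1)
RAID10_3X2_2S = ArrayLayout("RAID 10 (3+3+2S)", 3, 0, 3, 2)

if __name__ == "__main__":
    drive_gb = 146.0  # assumed drive size
    for layout in (RAID5_6PS, RAID10_3X2_2S):
        print(layout.name, layout.usable_gb(drive_gb), layout.efficiency())
```

Under these assumptions RAID 10 yields half the usable capacity of RAID 5 from the same eight drives, which is the "larger amount of raw storage" cost noted above.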
Exploit available hardware options

• Server & Storage fail-over/fall-back in Metro Mirror Environment
• Concurrent Maintenance
• Minimize Single Frame DS8300 purchases as 1st expansion frame upgrade is disruptive.
• Distribute Host connections across multiple physical adapters on the DS8000
• Verify all host paths are available before upgrading software
• Logical Partitioning (LPAR) capability to distribute workloads

Multiple Redundant Management Control Consoles

Uninterruptible Power Supply
Earthquake Resistant Kit (where applicable)
Consider IBM Standby Capacity on Demand (Standby CoD) offering for capacity planning
Enable Call Home and Remote Support
Monitor the storage subsystem status
• e-mail notification for a serviceable event
• Simple Network Management Protocol (SNMP) notification
• Service Information Message (SIM) notification – zSeries
• Reviewing the event log of the DS8000

Maintain Currency
• Create a regular maintenance window for storage and SAN
• Install firmware updates as recommended
• Understand what fixes/upgrades are in a Firmware update
• Integrate into Change Control Management
• May install first on less critical systems, prior to production
• Maintain supported combinations of Host Adapter Driver
• Subscribe to MySupport
http://www.ibm.com/support/mySupport
Concurrent Maintenance
• Perform Concurrent Maintenance operations on the storage subsystem during
periods of low activity
• Microcode upgrades are performed by IBM support personnel

Host Based Monitors and Alerts
• GDPS/PPRC HyperSwap Manager
• GDPS/PPRC HyperSwap Monitors & Alerts
• TPC-R
Host Based Collection Facilities
• z/OS LOGREC
Host Based High Availability Options for Data
• DFSMS Dataset Name separation
Host Connections – provide multiple paths from each host to the storage
• MPIO or Subsystem Device Driver (SDD) for Open Systems
• Dynamic Path Selection (DPS) and Dynamic Path Reconnect (DPR) for z/OS
• Distribute paths across multiple physical adapters on the DS8000
System i
• DSCLI commands executed through i5/OS interface
• ‘Copy Services for System i’ Toolkit
• Combination of iSeries Navigator and 5250 interface
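The host connection guidance above (multiple paths per host, spread across multiple physical adapters) lends itself to a simple audit. The following is a minimal sketch, assuming the path inventory has already been exported as (host, adapter, port) tuples; the function name, the tuple format, and the sample host and adapter names are all hypothetical and do not correspond to any IBM tool output.

```python
from collections import defaultdict


def hosts_lacking_adapter_redundancy(paths, min_adapters=2):
    """Return hosts whose paths use fewer than min_adapters distinct adapters.

    paths is an iterable of (host, adapter, port) tuples (hypothetical format).
    """
    adapters_by_host = defaultdict(set)
    for host, adapter, _port in paths:
        adapters_by_host[host].add(adapter)
    return sorted(host for host, adapters in adapters_by_host.items()
                  if len(adapters) < min_adapters)


# Hypothetical inventory: hostB has two paths, but both land on adapter HA0.
example_paths = [
    ("hostA", "HA0", "P0"),
    ("hostA", "HA1", "P0"),
    ("hostB", "HA0", "P0"),
    ("hostB", "HA0", "P1"),
]

if __name__ == "__main__":
    print(hosts_lacking_adapter_redundancy(example_paths))
```

In this example hostB is flagged: it has two paths, but a single adapter failure would remove both.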

Duplicate Storage Subsystems in Campus or Same Data Center Floor

• Can use Metro Mirror for data redundancy, to enable quick Re-IPL
➢ Requires automation software such as TPC-R or GDPS
• IBM Softek TDMF to move data around in real time
• Can perform a local site switch before maintenance actions to reduce the impact
of human errors and the impact to production
Know the following IBM System Storage web sites

• IBM System Storage support web site
➢ Starting point for IBM System Storage hardware and software support
➢ Includes links to subscription services to sign up for email alerts
➢ Includes links to product docs, contact information, fix search engine
• IBM System Storage Interoperation Center
http://www-01.ibm.com/servers/storage/support/config/ess/index.jsp
• Fibre Channel host bus adapter firmware and driver level matrix site

System Storage Enterprise Disk Practices –
Advanced Copy Functions Overview for Availability
Point in Time Copy (FlashCopy)

• Minimize application / database downtime required to make local point in time copies for:
- Production backup, data cloning, data warehouse, test and development
- Disk subsystem microcode creates internal copy of data (FlashCopy)
➢ Copy initialization for multiple terabytes of data can be accomplished in seconds
Remote Mirroring (Metro Mirror, Global Mirror, zOS Global Mirror)

• Create real-time, continuously updated remote copies of disk subsystem data
- Campus, metropolitan, or geographically distant site
• Data suitable for High Availability fast failover and failback
• Supports large amounts of data, at the terabyte level
• Disk subsystem microcode mirrors volumes/LUNs to remote disk subsystem
- Synchronous capability (Metro Mirror)
- Asynchronous capability (Global Mirror)


Point in Time internal Data Replication
Fast Time-Zero internal data replication capability (FlashCopy)

• Create internal copies of data for backup, cloning, data mining, etc.
Physical configuration
• Assure sufficient target disk space allocated
Usage practices:
• Plan databases/applications to be in hot backup mode or quiesce to maintain data
integrity
• Back up internal volume/LUN required for:
➢ Operating System catalogs, etc.
➢ Database/application metadata


Metro Mirror - Synchronous Data Replication
Applicability:
• General: provide synchronous data replication of disk subsystem at volume / LUN level
• System z: In combination with GDPS HyperSwap, provides foundation for removal of
Parallel Sysplex disk subsystem single point of failure
Physical configuration, link and infrastructure planning
• Must perform initial and ongoing analysis of write workload to determine sufficient
SAN/WAN/telecom infrastructure bandwidth
Automation
• Plan for highly automated operational control of mirroring to mask complexity and
support reliability, repeatability, testability
Testing and testing resource expectations

• Plan to provide Tertiary Copy storage at remote site
- For every production TB to be mirrored, ideally 2x that TB at remote site
- To provide additional storage for ongoing testing environment, resync protection,
and golden copy, problem determination, validation
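The two planning rules above (size the link from the measured write workload, and plan roughly 2x the production capacity at the remote site) reduce to simple arithmetic. The sketch below is an assumption-laden illustration: the function names and the 25% headroom default are hypothetical, and it ignores protocol overhead, compression, and resynchronization traffic.

```python
def required_link_mbps(peak_write_mb_per_s: float, headroom: float = 0.25) -> float:
    """Convert a peak write rate in MB/s to a link size in Mbit/s.

    headroom is an assumed safety margin (0.25 means 25% above peak).
    """
    if peak_write_mb_per_s < 0 or headroom < 0:
        raise ValueError("write rate and headroom must be non-negative")
    return peak_write_mb_per_s * 8 * (1 + headroom)


def remote_capacity_tb(production_tb: float, multiplier: float = 2.0) -> float:
    """Remote-site capacity: mirror target plus tertiary copy.

    The 2x multiplier is the "ideally 2x" figure from the text.
    """
    if production_tb < 0:
        raise ValueError("production capacity must be non-negative")
    return production_tb * multiplier


if __name__ == "__main__":
    # Example: 100 MB/s peak writes and 10 TB of mirrored production data.
    print(required_link_mbps(100.0))   # link size in Mbit/s
    print(remote_capacity_tb(10.0))    # remote capacity in TB
```

Because write workloads change, the calculation is worth repeating as part of the ongoing analysis the text calls for, using fresh peak-write measurements each time.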


Global Mirror - Asynchronous Data Replication
Applicability of IBM Global Mirror:
• Global Mirror is usually chosen when asynchronous replication of Open Systems
volumes/LUNs, or of a mix of z/OS and Open Systems volumes/LUNs, is desired, and when
reduced bandwidth is a necessity
Link and infrastructure planning
• Similar speed and throughput characteristics on source and target volumes can provide
optimum performance
Automation
Availability and Testing
• Plan to provide sufficient Tertiary Copy storage at remote site
- To provide storage for ongoing testing environment, resync protection, golden copy, problem
determination, validation


z/OS Global Mirror (XRC) - Asynchronous Data Replication
Applicability of IBM z/OS Global Mirror (XRC):
• General: z/OS Global Mirror is usually chosen when:
- Only z/OS data requires asynchronous data replication, or when heterogeneous
z/OS disk vendors are required.
Physical configuration, link and infrastructure planning
• Similar speed and throughput characteristics on source and target volumes can provide
optimum performance
• Plan to provide sufficient System z cycles at remote site for System Data Mover
Automation
Availability and Testing
• Plan to provide sufficient Tertiary Copy storage at remote site
- To provide ongoing testing environment for setup, validation, and problem
determination

System Storage Enterprise Disk Practices – Three site replication
When to use 3 site

• Three site replication is used when the requirement is to combine a zero data loss RPO, using local
Metro Mirror, with out of region recovery (asynchronous).
Pre-requisites: Three site replication is affordable and justifiable to the business when:
• Data Center strategy and implementation is already well under way towards Active-Active or Planned
Workload Rotation for two site
Pre-requisite: Two site configuration already includes ongoing:

• Automated failover/failback
• Full Tertiary Copy capability for testing, problem determination, validation, automation
• Ongoing WAN / bandwidth / workload Capacity Planning
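The applicability statements in the preceding replication sections can be summarized as a rough decision helper. This is a deliberately simplified sketch, not a sizing or selection tool: the function name and its boolean inputs are hypothetical, and real selection also depends on distance, bandwidth, cost, and the two site prerequisites listed above.

```python
def suggest_replication(zero_data_loss: bool,
                        out_of_region: bool,
                        zos_only: bool,
                        heterogeneous_disk_vendors: bool = False) -> str:
    """Map high-level requirements to a replication option named in this document."""
    if zero_data_loss and out_of_region:
        # Zero data loss locally combined with out of region recovery.
        return "Metro/Global Mirror (three site)"
    if zero_data_loss:
        # Synchronous replication at volume / LUN level.
        return "Metro Mirror"
    if zos_only or heterogeneous_disk_vendors:
        # Only z/OS data, or heterogeneous z/OS disk vendors.
        return "z/OS Global Mirror (XRC)"
    # Open Systems or mixed z/OS and Open asynchronous replication.
    return "Global Mirror"


if __name__ == "__main__":
    print(suggest_replication(zero_data_loss=True, out_of_region=True, zos_only=False))
    print(suggest_replication(zero_data_loss=False, out_of_region=True, zos_only=True))
```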


Management of Replication
Plan for highly automated disk mirroring environment
Provides foundation for Reliability, Repeatability, Scalability, Testability
Recommendations for automation software:
• System z environment: GDPS

• Mixed open platform: GDOC
• General disk mirroring mgmt: TPC for Replication

System Storage Enterprise Disk Practices Resources

System Storage Business Continuity Solutions website
http://www-03.ibm.com/servers/storage/solutions/business_continuity/index.html
System Storage Technology Center
http://www-03.ibm.com/system/storage/
Storage Education http://www-03.ibm.com/systems/education/cust/crossprod/custcp.html
System Storage Interoperation Center
http://www-01.ibm.com/systems/support/storage/config/ssic/index.jsp
System Storage Services

http://www-03.ibm.com/systems/storage/services/index.html
Redbooks/Redpapers
http://www.redbooks.ibm.com/redbooks.nsf/portals/Storage
The IBM TotalStorage DS8000 Series: Concepts and Architecture (SG24-6452-00)
IBM System Storage Business Continuity Solutions Overview (SG24-6684-01)
IBM System Storage DS8000 Series: Copy Services with IBM System z (SG24-6787-02)
IBM System Storage DS8000 Series: Copy Services in Open Environments (SG24-6788-02)
IBM System Storage Solutions Handbook (SG24-5250-06)
White papers
IBM Storage Infrastructure for Business Continuity Solution
Global Mirror Technical Whitepaper

Data Corruption Solutions

System Storage Enterprise Disk Practices – Data Corruption
Logical data corruption protection must be designed at the operational and
application level
Best practice procedures are:
• Sufficient point in time disk copies of data
• To provide adequate known restart points
• Supplemented by operational procedures at the database/application level
Tools include (but are not limited to):
• FlashCopy Point in Time Copy
• Software:
➢ zCDP for DB2 (z/OS 1.8 + DB2 9)
- Eliminates need for DB2 backup windows via DB2 BACKUP Utility
- No interruption to DB2 processing to take backups
- DFSMShsm maintains up to 50 backup versions across disk and tape
- DB2 RESTORE Utility granularity: system, volume, DB table
➢ Future: zCDP for Storage, an IBM Statement of Direction (SOD) on providing CDP function for all z/OS data

Supplemental Information

FlashCopy: Local Point in Time Data Replication to improve
data availability
[Figure: FlashCopy timeline between Source and Target. Copy data command issued; copy is immediately available; read and write to both source and copy possible; optional background copy; when copy is complete, the relationship between source and target ends.]
FlashCopy Use Cases
- Production backup
➢ Regain information from an older level of data
➢ Re-establish production in case of any server errors
- Data backup
➢ Create backups with the shortest possible application outage
- Data Mining
➢ Avoid performance impacts on the production system
- Test system
➢ Allow testing of a new application with real production data
- Moving and migrating data
➢ Move a consistent data set from one host to another
with a minimum of downtime for the host application
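The point in time behaviour described above (the copy is usable immediately, before any data has physically moved) can be illustrated with a toy copy-on-write model. This is a conceptual sketch only and is not how the DS8000 microcode implements FlashCopy; the class and method names are hypothetical, and the model omits writes to the target and the optional background copy.

```python
class PointInTimeCopy:
    """Toy copy-on-write view of a source volume, frozen at creation time."""

    def __init__(self, source: list):
        self.source = source   # live source volume, still writable
        self.preserved = {}    # track index -> original data saved before overwrite

    def write_source(self, track: int, data) -> None:
        # Save the point-in-time content the first time a track is overwritten.
        if track not in self.preserved:
            self.preserved[track] = self.source[track]
        self.source[track] = data

    def read_target(self, track: int):
        # Unchanged tracks are read from the source; changed tracks from the saved copy.
        return self.preserved.get(track, self.source[track])


if __name__ == "__main__":
    volume = ["a", "b", "c"]
    copy = PointInTimeCopy(volume)       # "copy" exists immediately
    copy.write_source(1, "B")            # production keeps writing
    print(volume[1], copy.read_target(1))
```

The copy is established in constant time regardless of volume size, which is the property that lets a backup or test copy be taken with a very short application outage.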

Metro Mirror: synchronous replication of data between two
storage subsystems to improve data availability
Synchronous data mirroring (up to 300 km)
Superior performance
- Low internal MM overhead (at zero distance, DS8000
additional overhead is 0.38 ms)
- Optimized protocol exchange
- Each 100 km adds 1 ms
- Plus switch/channel extender overheads
- Generally fewer links required than the competition
Platform environment
- System z: GDPS/PPRC, GDPS/PPRC HyperSwap Manager
- System p: AIX HACMP/XD + Metro Mirror
- System i: High Availability Business Partner software; ASR Toolkit
- Geographically Dispersed Open Clusters (GDOC) for Unix, Linux and
Windows
[Figure: server cluster attached to two storage subsystems linked by a mirroring network; caption "Scalable Data Integrity".]
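The overhead figures quoted for Metro Mirror (0.38 ms at zero distance on the DS8000, plus 1 ms per 100 km, plus switch or channel extender overheads) combine into a simple linear estimate. The sketch below only restates that arithmetic; the function name is hypothetical, the extender overhead must be supplied by the caller because the slide gives no value for it, and actual response times depend on workload and infrastructure.

```python
MAX_DISTANCE_KM = 300.0  # synchronous mirroring range stated on the slide


def metro_mirror_write_overhead_ms(distance_km: float,
                                   base_ms: float = 0.38,
                                   ms_per_100km: float = 1.0,
                                   extender_ms: float = 0.0) -> float:
    """Estimate the added write response time for a synchronous mirror."""
    if not 0 <= distance_km <= MAX_DISTANCE_KM:
        raise ValueError("distance must be between 0 and 300 km")
    return base_ms + (distance_km / 100.0) * ms_per_100km + extender_ms


if __name__ == "__main__":
    for km in (0, 50, 100, 300):
        print(km, "km:", round(metro_mirror_write_overhead_ms(km), 2), "ms")
```

Because every write waits for the remote acknowledgement, this overhead applies to each write I/O, which is why distance matters far more for synchronous than for asynchronous replication.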

GDPS/PPRC HyperSwap Manager and Metro Mirror
Extends Parallel Sysplex Availability to z/OS DS8000,
DS6000, ESS disk subsystems
• Eliminates disk subsystem as single point of failure
in a z/OS Parallel Sysplex
Masks primary disk subsystem failures by transparently
switching applications to use secondary disks (Unplanned
HyperSwap)
Provides ability to perform disk maintenance without
requiring applications to be quiesced (Planned
HyperSwap)
Delivered as IGS Services offering
Technical concept:
• Planned or unplanned HyperSwap will dynamically
substitute DS8000, DS6000, or ESS Metro Mirror
secondary for primary device
• No operator interaction - GDPS-managed
• Can swap large number of volumes - fast
• Includes volumes with Sysres, page DS, catalogs
• Non-disruptive - applications keep using same
device addresses
[Figure: host UCBs pointing to Metro Mirror primary (P) and secondary (S) devices.]

Global Mirror: Asynchronous data replication between two storage
subsystems to improve data availability at global distance
Two site, unlimited global distance
Complete and consistent data mirroring
Consistency groups
➢ Across z/OS and Open Systems data
➢ Across up to 16 subsystems
Currency can be configured to as little as 3 to 5
seconds behind host I/O
Native application performance
Platform environment
- System z: GDPS/GM
- Geographically Dispersed Open Clusters (GDOC)
for Unix, Linux and Windows
[Figure: primary hosts and remote hosts; primary 'A' (native performance) connected over SAN by Global Copy (transmission performance) to secondary 'B'; FlashCopy at the remote site provides consistent data.]

z/OS Global Mirror (XRC): Asynchronous data replication between two
storage subsystems to improve data availability at global distance, using
System z MIPS
GDPS/XRC
➢ Productivity tool that integrates management of
XRC and FlashCopy
Premium performance & scalability
➢ Data moved by System Data Mover (SDM)
address space(s) running on System z
➢ Supports heterogeneous disk subsystems
GDPS/XRC runs in the SDM location
➢ Manages availability of SDM Sysplex
➢ Performs fully automated site failover
Single point of control for multiple / coupled
System Data Movers
Supports zSeries and zSeries Linux data
Over 200 installations worldwide
XRC manages secondary consistency
➢ Across any number of primary subsystems
➢ All writes time-stamped and sorted before committed to secondary devices
[Figure: production systems and SDM systems; primary disk subsystems, secondary disk subsystems, and journals.]
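The statement that all writes are time-stamped and sorted before being committed to the secondary devices can be illustrated with a small merge. The sketch below is a conceptual model only, not the System Data Mover's actual algorithm: it assumes each primary subsystem delivers its writes already in timestamp order, and the record format (timestamp, volume, data) is hypothetical.

```python
import heapq


def merge_writes_by_timestamp(*per_subsystem_writes):
    """Merge per-subsystem write streams into one stream in timestamp order.

    Each argument is a list of (timestamp, volume, data) tuples that is
    already sorted by timestamp for one primary subsystem.
    """
    return list(heapq.merge(*per_subsystem_writes, key=lambda write: write[0]))


if __name__ == "__main__":
    subsystem_a = [(1, "vol1", "x"), (4, "vol1", "y")]
    subsystem_b = [(2, "vol2", "p"), (3, "vol2", "q")]
    for write in merge_writes_by_timestamp(subsystem_a, subsystem_b):
        print(write)
```

Applying writes in a single global time order is what keeps dependent writes, such as a database log record and the data page it describes, consistent on the secondary even when they land on different primary subsystems.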
Metro/Global Mirror: IBM three site recovery
Ability to switch production to any site
➢ Planned/Unplanned Outage
➢ Minimal Data Movement
Protection from local site disaster
➢ Metro Mirror (Sync PPRC)
➢ GDPS/MGM with HyperSwap locally
Protection from regional disaster
➢ Global Mirror (Async PPRC) → Regional
➢ Minimal Data Loss (3-5 seconds)
Resynchronize any site with incremental
changes only
Managed by GDPS/MGM or TPC-R
[Figure: three sites linked by Metro Mirror, Global Mirror, and Backup Global Mirror.]

IBM TotalStorage Productivity Center for Replication (TPC-R)
Enable the configuration of complex
replication environments, provide
feedback on the state of their operations,
and make changes easy to accomplish
Provide Common Interface
➢ Single point of control
➢ Single set of commands and session states
Build on copy services functions to provide
our customers a DR solution
Dynamically monitor Metro Mirror and
maintain write order data consistency
Hide differing hardware technologies and
unique Copy Service function
implementations
Automate Metro/Global Mirror Incremental
Resync function
[Figure: TPC-R architecture. GUI / CLI / API on top; TPC-R V3.1 (FlashCopy; Metro Mirror, Global Mirror; Session Management; Consistency Groups; Replication Monitor); TPC-R V3.1 Two-Site BC (basic function plus High Availability, DR Management, 3rd Party Storage); Copy Device Interface; ESS 800, DS8K, DS6K, SVC.]
