
Data Migration Considerations: A Customer Engineering Residency
Best Practices Planning

Abstract
This white paper outlines the results of a recent JPMorgan Chase EMC Engineering Residency focused on
research and development of JPMorgan Chase data migration options and strategies. The paper describes the
overall EMC Engineering Residency project and the experience JPMorgan Chase had using EMC data
migration technologies in EMC Engineering labs. Data migration methodologies and their applicability for
migrating off legacy storage arrays onto multi-tiered Symmetrix DMX systems were evaluated, and decision
criteria were outlined based on JPMorgan Chase's business objectives and availability requirements.

May 2007




Copyright © 2007 EMC Corporation. All rights reserved.
EMC believes the information in this publication is accurate as of its publication date. The information is
subject to change without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED "AS IS." EMC CORPORATION
MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE
INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Use, copying, and distribution of any EMC software described in this publication requires an applicable
software license.
For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com
All other trademarks used herein are the property of their respective owners.
Part Number H2799



Table of Contents

Executive summary
Introduction
  Audience
Data migration process
  Phase 0: Assessment
  Phase 1: Planning and design
  Phase 2: Change control
  Phase 3: Migration execution
  Phase 4: Post-migration review
  Data migration approaches
    Host-based data migrations
    Array-based data migrations
EMC data migration technologies
  EMC Open Migrator/LM
  EMC Open Replicator
  EMC PowerPath Migration Enabler
Data migration technology comparison
  EMC Open Migrator/LM considerations
  EMC Open Replicator considerations
    Push operations
    Pull operations
  EMC PowerPath Migration Enabler considerations
Decision criteria
JPMorgan Chase technology decision tree
Conclusion
References




Executive summary
Enterprise data migration projects are often complex, time-consuming engagements that require detailed
planning to mitigate the risk of incurring business disruptions. Migration projects are often considered
one-offs because the technologies and methodologies must be reviewed on a migration-by-migration
basis. However, the overall procedures and best practices are similar for the majority of data migration
projects.
Within the multiple JPMorgan Chase data centers worldwide, host-based and disk-based data migrations
are regularly required to consolidate storage, eliminate capacity limitations, address resource performance
constraints, and periodically refresh technology to reflect current standards. A team of JPMorgan Chase
system architects, implementers, and storage administrators, working jointly with EMC Engineering,
engaged in a project to streamline the host-based data migration techniques currently deployed. This project
was referred to as the Commando Residency and was held at EMC Engineering headquarters in
Hopkinton, Mass., in June 2006.
The Symmetrix® Engineering Residency Program offers select Symmetrix customers a forum to discuss
planning, deployment, and management considerations for a high-impact solution. The residency provides
technical knowledge transfer on new or existing products and features, which enables customers to
maximize their return on investment in EMC technologies and increases the speed to deployment of
solutions supporting specific business initiatives.
The JPMorgan Chase process for performing data relocations is consistent with many mission-critical
environments. Applications in those environments have various service-level agreements (SLAs) that
define availability requirements and appropriate application outage windows.
This white paper describes the EMC open systems data migration offerings evaluated during the JPMorgan
Chase Engineering Residency and the resulting decision criteria identified as applicable for JPMorgan
Chase data centers. The decision criteria ensure that the method chosen for the data relocation will simplify
the migration process, increase application availability, and guarantee success for relocating production
data.
During the Commando Residency, it was agreed that the most important requirement was to ensure the
integrity of the data being relocated. It is also important that all systems involved in the migration be
compliant with the hardware and software revision standards and interoperability requirements documented
in the EMC Support Matrix. Before proceeding with any migration project, all exceptions to these standards
must be documented, analyzed, and approved by the parties involved in the migration.
Introduction
This white paper discusses the five phases of the data migration process used by JPMorgan Chase and the
three EMC technologies that were evaluated during the Commando Residency. Considerations for each
product are described, and the decision criteria for when to use each product are provided. A JPMorgan
Chase decision tree, which presents a flowchart view of the selection criteria, is also included.
Audience
This paper is intended for storage professionals who currently participate, or plan to participate, in data
migration projects in an open systems environment.
Data migration process
The process for performing a data migration is similar to other IT projects in that extensive planning
follows a structured methodology. This section describes the five phases of the data migration project:
• Phase 0: Assessment
• Phase 1: Planning and design
• Phase 2: Change control
• Phase 3: Migration execution
• Phase 4: Post-migration review
Phase 0: Assessment
The first phase of a migration project involves defining scope and requirements and performing a
preliminary analysis of the current environment to determine its level of conformance to supported
standards. The appropriate migration methodology is also selected during this phase, based on the defined
requirements. Effort spent during the assessment phase results in the following benefits:
• Safe, structured migration solutions that minimize risk
• Repeatable and predictable results
• A process tailored to the specific requirements of the lines of business
• Efficient execution of the project by the shared storage management teams
Numerous data migration technologies and methodologies are available to JPMorgan Chase, the majority
of which are either array-based or host-based. Business requirements dictate the most appropriate
methodology to deploy for a specific project. To determine the appropriate approach, it is important to
perform a preliminary analysis that involves gathering the following information:
• Quantity of information to be migrated
  Gather the specific information regarding the number and size of the LUNs.
• Application availability
  The amount of downtime permitted for an application often determines the appropriate migration
  method.
  Best practice: Online migration introduces additional risk that must be weighed against the
  impact on application availability. A JPMorgan Chase best practice is to perform offline
  migrations whenever possible.
• Capacity allocation requirements
  Migrations often provide an opportunity to rearchitect storage allocations with regard to LUN sizes
  and protection schemes.
• Performance impact
  Migration involves moving large amounts of data, which can affect the performance of the host,
  array, network, and SAN fabric. This impact must be considered; implementing features such as
  throttling is recommended to reduce it.
• Scheduling requirements
  Depending on the criticality of current operations, maintenance or cutover windows may be small.
  Some techniques allow the migration to occur over an extended period and defer the actual cutover
  to the next scheduled maintenance window.
• Backout plan
  To reduce the risk to the business, a detailed outline of a working backout plan is required before
  beginning the migration process.
There are a number of techniques and methodologies that may be deployed. Database-centric methods,
such as standby database and extract/load approaches, were discussed but are beyond the scope of this
white paper.
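
The assessment inputs listed above lend themselves to a simple structured record. The following is a minimal Python sketch, not part of any EMC tool: the field names, the 80 MB/s rate, and the sizing figures are illustrative assumptions. It estimates whether a planned copy fits the available outage window, which in turn drives the offline-versus-online decision.

```python
from dataclasses import dataclass

@dataclass
class MigrationAssessment:
    """Assessment inputs gathered during Phase 0 (illustrative fields)."""
    lun_count: int
    total_gb: float              # quantity of information to be migrated
    outage_window_hours: float   # application availability constraint
    est_throughput_mb_s: float   # benchmarked or assumed transfer rate

    def copy_hours(self) -> float:
        """Estimated wall-clock time to copy all data at the assumed rate."""
        return (self.total_gb * 1024) / self.est_throughput_mb_s / 3600

    def fits_offline_window(self) -> bool:
        """True if an offline migration fits the allowed outage window."""
        return self.copy_hours() <= self.outage_window_hours

# Example: 120 LUNs, 6 TB, an 8-hour weekend window, 80 MB/s assumed rate.
# The estimate comes out near 22 hours, so an online or staged method
# would be indicated for this workload.
plan = MigrationAssessment(lun_count=120, total_gb=6144,
                           outage_window_hours=8, est_throughput_mb_s=80)
print(f"Estimated copy time: {plan.copy_hours():.1f} h; "
      f"offline feasible: {plan.fits_offline_window()}")
```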



Phase 1: Planning and design
During this phase of a migration project, a detailed design and timeline are developed based on
requirements identified during the analysis. Completion of this planning and design phase is required for
any well-developed change management process.
This phase consists of developing an implementation plan that includes not only the specific details of the
project but also an analysis of future requirements based on possible growth of the application. A solution
design must document the migration method as well as an environmental assessment to ensure that the
solution meets defined business requirements.
Some of the information identified and documented during this phase includes:
• Specific source and target volumes (see the validation sketch after this list)
• Front-end port connections
• Device masking requirements
• Fabric connections
• Zoning
• Physical cabling requirements
• Software versions and licensing
• Cache requirements
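
One planning check that recurs throughout this paper is that each target volume must be equal to or greater than its source in capacity. The following is a minimal sketch of that validation, assuming the source/target volume list has been exported to a simple in-memory structure; the field names and device numbers are hypothetical.

```python
# Each pair maps a source volume to its intended target (sizes in MB).
# The structure and names are illustrative; real inputs might come from
# array configuration reports or a planning spreadsheet.
pairs = [
    {"src": "0123", "src_mb": 8632, "tgt": "0456", "tgt_mb": 8632},
    {"src": "0124", "src_mb": 8632, "tgt": "0457", "tgt_mb": 4316},  # too small
]

def validate_pairs(pairs):
    """Flag any target smaller than its source before the change window."""
    errors = []
    for p in pairs:
        if p["tgt_mb"] < p["src_mb"]:
            errors.append(f"target {p['tgt']} ({p['tgt_mb']} MB) is smaller "
                          f"than source {p['src']} ({p['src_mb']} MB)")
    return errors

for e in validate_pairs(pairs):
    print("ERROR:", e)
```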
To define and plan for future capacity requirements, it is important to consider storage utilization trends
and any knowledge of future initiatives that could impact capacity. Analysis of performance and protection
requirements ensures optimal data layout and physical configuration of the target array.
A macro view takes into account the complex nature of data migrations and the impact on shared resources,
including server load, network bandwidth and capacity, source and target storage performance, SAN traffic,
and the impact to operational schedules. All have an impact on the amount of data that can be replicated
from one platform to another during a specific timeframe. Proper solution design allows proper sizing of
resource requirements and sets the expected project duration. This is critical to the project's success.
Criteria for success are defined during the planning phase and validated with the stakeholders. These
criteria fall into two categories: data integrity and performance. Obviously, data integrity is paramount for
any data migration project. A team that includes the application stakeholders must first define a test plan.
This plan could be as simple as performing MD5 checksums or successfully starting the application;
however, more comprehensive testing to validate consistency across interrelated application datasets is
more likely to be required. In addition, current performance benchmarks should be established during the
planning phase and revalidated during the post-migration phase. Capturing the output of iostat and/or
vmstat for at least one day prior to migration is recommended so that the results can be compared with
post-migration performance.
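
As a sketch of the simpler end of such a test plan, the following Python fragment computes MD5 checksums over source and target copies of a file and captures an iostat baseline. It assumes a UNIX host with iostat installed and read access to both file systems; the paths, file name, and sampling interval are illustrative, not prescribed.

```python
import hashlib
import subprocess

def md5_of(path: str, chunk: int = 1 << 20) -> str:
    """MD5 of a file (or raw device), read in 1 MB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

# Compare a migrated file between source and target mounts (paths illustrative).
src, tgt = "/mnt/source/datafile.dbf", "/mnt/target/datafile.dbf"
assert md5_of(src) == md5_of(tgt), "checksum mismatch: investigate before cutover"

# Capture a pre-migration I/O baseline: 60 extended-stat samples at
# 60-second intervals, saved for post-migration comparison.
with open("iostat_baseline.txt", "w") as out:
    subprocess.run(["iostat", "-x", "60", "60"], stdout=out, check=True)
```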
Best practice: If the proposed solution includes a new technology or technique that has not
been deployed previously in the JPMorgan Chase production environment, then validating the
approach in a proof-of-concept exercise in a non-production environment is considered best
practice. This minimizes risk, and benchmarked transfer rates can be used to set appropriate
expectations.



When a migration project is driven by a need to refresh technology, additional attention must be given to
the following areas:
• Supported revision levels and interoperability issues with regard to:
  – Host bus adapters (HBAs)
  – Fabrics
  – Storage subsystems
  – Host operating system levels
  – Application software revision levels
Best practice: Identify and resolve all of the above issues before proceeding with the data
migration. Performing multiple infrastructure changes as part of the migration introduces
complexity that increases risk.
Phase 2: Change control
Data migration projects must conform to the change management process. The following is a summary of
the change management components:
• Risk assessment
  A critical component of change management is a full risk assessment: the analysis of what can go
  wrong, how to prevent it, and how to mitigate the impact of a failed change. This analysis should be
  documented and include a tested backout plan.
• Migration plan
  This is a review of the migration implementation plan created during the planning and design phase.
  It outlines the current and target environments and includes a step-by-step process.
• Prerequisites and dependencies
  Any prerequisites, such as site support engagement, vendor availability, and parts availability, should
  be documented and planned for.
• Timeline and schedule
  If current operations are critical, the timeline and schedule must be well defined in order to meet
  limited maintenance or cutover timeframes. Ensure that backout plans are prepared in the event that
  an implementation needs to be backed out.
• Resource plan
  The plan should include human resource availability, skills, and ownership.
Phase 3: Migration execution
In this phase, the plan developed during the design phase is implemented. General migration
implementation best practices include:
• As a first step, perform a full backup of all data involved.
• Review detailed procedures. A scripted methodology is the preferred approach.
• Ensure that support needs are defined and available, and that backout plans are as well understood as
  the data migration plans.
• Conduct a pilot migration or execute advanced testing whenever possible. A test before a scheduled
  event can be extremely valuable.
• Observe data throughput rates early in the project. If the migration timing is based on certain
  throughput estimates and a significant difference between estimated and actual throughput develops
  early, there may still be time to adjust schedules, performance, or methodology before the project
  exceeds the planned timeframe (see the monitoring sketch after this list).



• Follow the test plan and document everything! The highest risk in moving production data online is
  introducing a data integrity problem without a means of determining how it was introduced. If the
  plan is documented and explicitly followed, this risk can usually be mitigated. It is critical to have the
  customer sign off on each migrated and tested data source before moving to the next one.
• Escalate issues promptly. It may seem easier to wait before asking for help, in the hope that the
  problem can be resolved. However, if behavior outside the plan is observed, it is important to involve
  the support system so that help is available when needed, especially if the staff involved has little
  experience performing data migrations.
• As procedures or best practices are developed, add them to this process.
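
The following is a minimal monitoring sketch for the throughput-observation practice above. How the copied-byte count is obtained is environment-specific (a tool's session query, target file system usage, and so on), so the `bytes_copied()` helper here is a hypothetical stand-in, and the planned rate and thresholds are illustrative.

```python
import time

PLANNED_MB_S = 80.0  # throughput assumed when the schedule was built

def bytes_copied() -> int:
    """Hypothetical hook: return bytes copied so far, e.g. parsed from a
    migration tool's session query or from target file system usage."""
    raise NotImplementedError

def watch(interval_s: int = 300, threshold: float = 0.8) -> None:
    """Warn if observed throughput falls below 80% of plan, early enough
    to adjust the schedule, performance, or methodology."""
    start, start_bytes = time.time(), bytes_copied()
    while True:
        time.sleep(interval_s)
        rate = (bytes_copied() - start_bytes) / (time.time() - start) / 1e6
        if rate < threshold * PLANNED_MB_S:
            print(f"WARNING: {rate:.1f} MB/s vs planned {PLANNED_MB_S} MB/s")
```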
Phase 4: Post-migration review
Confirm that there is no quality-of-service impact during the post-migration phase. Complete the following
actions:
• Capture performance statistics and compare them with pre-migration benchmarks.
• Perform project cleanup by removing the tools utilized during the migration.
• Decommission the original source SAN/disk.
• Review issues that occurred during the migration, the manner in which they were handled, and the
  results.
• Provide a functional overview of the end state of the migrated data environment, including changes
  to addresses, sizes, access speeds, and data source names that can impact operations.
• Update zoning and other documentation.
Data migration approaches
There are three common approaches for physically moving data during the data migration process:
host-based, array-based, and network-based. Before choosing the appropriate approach, it is important to
first understand the context under which the data relocation will be performed. The decision process should
also include evaluating the existing data migration strategy, including any current limitations and
restrictions. This white paper focuses on the two core approaches of interest to JPMorgan Chase:
host-based and array-based.
Host-based data migrations
Enterprise-class data centers, such as those at JPMorgan Chase, often deploy host-based replication as a
method for performing data migrations. Each of the major open systems operating environments includes
native tools that implement host-based data migration. For most UNIX migrations, JPMorgan Chase
extends the logical volume mirrors within the Logical Volume Manager. Windows environments that run
the native Logical Disk Manager can also take advantage of these capabilities. Environments running in a
basic disk configuration must rely on copy tools to perform the data relocation.
Host-based data migration is usually performed by the server administrator rather than the storage
administrator because it requires root/administrator-level server access. The benefits of this technique are
that it does not require additional technologies to be installed or licensed, and it works within the current
skill set of the server administrator.
Performing host-based migration within UNIX and Windows environments presents different problems:
• UNIX data relocation deployments use native tools that do not always support logical volume layout
  configurations.
• Windows basic disk configurations require an advanced copy utility to move the data. This must be
  performed offline and often uses IP network resources to perform the operation.
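
As an illustration of the mirror-extension approach described above, the following sketch drives a Linux LVM2 relocation through pvmove, which migrates extents online via a temporary mirror. The volume group, the device names, and the choice of pvmove rather than explicit mirror add/remove steps are assumptions for the example; HP-UX, AIX, and VxVM each have their own equivalents.

```python
import subprocess

def run(cmd):
    """Run a command, echoing it first so the session is self-documenting."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Illustrative names: an application volume group moving between two LUNs.
VG, OLD_PV, NEW_PV = "appvg", "/dev/emcpowera", "/dev/emcpowerb"

run(["pvcreate", NEW_PV])            # label the new LUN for LVM
run(["vgextend", VG, NEW_PV])        # add it to the volume group
run(["pvmove", OLD_PV, NEW_PV])      # relocate extents online
run(["vgreduce", VG, OLD_PV])        # detach the old LUN after the move
run(["pvremove", OLD_PV])            # clear the LVM label on the old LUN
```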



Array-based data migrations
Disk-based data migrations use storage system resources to physically move data. The benefit of this
technique is that it can preserve host-based resources for other purposes, which may be especially useful
during an online migration. Of course, disk-based migrations will consume resources on the storage system.
In addition, disk-based migrations may require skills and knowledge that span heterogeneous storage
environments.
EMC data migration technologies
During the Commando Residency, JPMorgan Chase reviewed all of the EMC host-, array-, and
network-based migration offerings and narrowed them down to three EMC data migration technologies for
evaluation and possible integration with its current migration methodologies. The following technologies
were evaluated.
EMC Open Migrator/LM
EMC® Open Migrator/LM provides online data migration in Microsoft Windows and UNIX environments.
Open Migrator/LM allows volumes to remain online during a migration, increasing application availability
during a process that traditionally requires extensive downtime. The application outage occurs when
switching from the production to the target devices.
Open Migrator/LM for UNIX provides an online data migration technology that utilizes host system
resources to migrate data from the source to the target storage arrays.
Open Migrator/LM for Windows operates at the filter-driver level to manage and move Windows data from
a source to a target volume with minimal disruption to the server or applications.
EMC Open Replicator
EMC Open Replicator provides a method for copying data between a Symmetrix DMX storage system
and qualified storage arrays. It requires no host resources, as it leverages the storage area network (SAN)
infrastructure to provide deployment flexibility and massive scalability. Open Replicator can also be used
to create point-in-time copies for high-speed data mobility, remote vaulting, migration, and distribution.
Data can be copied from a Symmetrix DMX array to devices on remote storage arrays either fully or
incrementally.
EMC PowerPath Migration Enabler
PowerPath® Migration Enabler (PPME) is a two-part host- and array-based migration tool that allows data
migration between storage systems while providing a nondisruptive (to the application) cutover to the new
system. PPME does not require preconfigured PowerPath multipathing, or any independent multipathing in
general. PowerPath Migration Enabler works in conjunction with an underlying EMC replication
technology such as Open Replicator.
When the data is relocated with PowerPath Migration Enabler, the data on the source device continues to
be accessible to host applications while the migration takes place. This minimizes (or potentially
eliminates) application disruption. The amount of disruption depends on whether data is migrating from
pseudo or native devices and also whether PowerPath is already installed on the system.
Data migration technology comparison
Data migration projects conducted at an enterprise data center often move data between different types of
storage systems. There are many reasons this may be necessary: equipment lease periods expire, equipment
may be decommissioned to install newer technology, or new tiered storage requirements may be
implemented. Data classification has enabled JPMorgan Chase to implement a tiered storage approach,
allowing them to place more critical application data on their faster, more reliable storage hardware.
The following considerations should be reviewed when determining which data migration approach and
tools are appropriate.
EMC Open Migrator/LM considerations
Open Migrator/LM uses host system resources to perform the migration. The source can be any type of
qualified storage array. The migration can be performed online, though an application outage will be
necessary at the end of the data copy.
Because host resources are required and the actual data transfer can be performed with the application
running, taking action to throttle I/O during the migration will lessen any performance impact, especially
during peak periods. The UNIX version is optimized to minimize impact on system I/O performance. It has
a user-tunable migration rate and I/O copy size.
Because the devices being migrated often contain mission-critical application data, they are usually
configured in a clustered environment. Therefore, data migration technologies and methodologies need to
support clusters. Check the release of Open Migrator/LM, as it may not support VERITAS clusters and
dynamic disks. Automatic failover in a cluster environment should be disabled, because a failover between
servers would move disk resources that could be part of the migration process.
Windows environments using Open Migrator/LM may require a reboot to attach the filter-driver. This is
dependent on the specific version of the Windows operating system and the specific version of Open
Migrator/LM. For all supported UNIX environments, Open Migrator/LM can be installed, operated, and
uninstalled without performing a system reboot.
In general, data migrations transfer data in one direction, from the source to the target storage. There is no
business continuity requirement to move data back to the original source after the successful relocation of
data. If a problem occurs during the data transfer that requires a system reboot, Open Migrator/LM allows
for migrations to persist across system reboots. Because there are generally no requirements to move data
back to the original source, except to back out of the migration, migrating between volumes of different
sizes is permitted. An Open Migrator/LM migration target must be equal to or greater than the source
volume capacity.
Verifying data integrity is critical in determining the success of the data relocation. Open Migrator/LM
supports a compare action for verification of source and target volume synchronization. This increases the
data migration duration but validates the integrity of the data once copied.
EMC Open Replicator considerations
Open Replicator can be used to move data onto or off of a Symmetrix DMX or CLARiiON® storage
system. However, a copy session, which is required for copying data, cannot be created with control and
remote devices on the same Symmetrix system. Open Replicator can also be used to pull data off qualified
third-party arrays. The control DMX system initiates the pull or push copy operation between the control
devices and the remote devices. The remote devices can be of different protection schemes and even
different metaconfigurations. For the purposes of the following discussion, the target device receives the
copied data and the source device supplies the data to be copied. The target device capacity must be equal
to or greater than that of the source for the copy operation.
Because Open Replicator supports the remote copy process to non-Symmetrix storage, it should be noted
that the Symmetrix API (SYMAPI) does not recognize these subsystems and must use a World Wide Name
as the device identifier. Because it uses front-end director resources to perform the data propagation, it is
important to assess the bandwidth required to determine whether the appropriate throttling parameter (for
example, the pace option) is properly set.
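
The following sketch strings together a hot pull session using the Solutions Enabler symrcopy CLI, with the pace option set to throttle front-end director load. This is a sketch of the typical create/activate/query/verify/terminate flow only: the exact flags, the pace scale, and the device-pair file syntax should be confirmed against the Open Replicator CLI product guide rather than taken from this example.

```python
import subprocess

PAIR_FILE = "pairs.txt"  # lines pairing control devices with remote WWNs,
                         # per the Open Replicator CLI guide's file format

def symrcopy(*args):
    """Invoke the Solutions Enabler symrcopy CLI (verify flags against
    the product guide; shown here as recalled, not authoritative)."""
    cmd = ["symrcopy", "-file", PAIR_FILE, *args]
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

symrcopy("create", "-copy", "-pull", "-hot", "-pace", "5")  # throttled session
symrcopy("activate")           # start the data copy
symrcopy("query")              # monitor progress (repeat until fully copied)
symrcopy("verify", "-copied")  # confirm the session reached the Copied state
symrcopy("terminate")          # remove the session after cutover
```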



Push operations
During a push copy operation, the data will be copied from the control DMX system and devices to the
remote devices on another system. During both push and pull operations the flow of data is controlled by
the DMX control system. On push copy operations, remote hosts should not access the remote devices until
copying is complete.
Data corruption may occur during a copy operation if another host on the SAN has write access to the
remote device. To guarantee that the device cannot change while copying is in progress, unmount the
remote device or mark it not ready to any other hosts on the SAN.
Accumulated I/O errors between the control device and remote device will cause a copy session to fail if
the copy operation is an online push. A copy session can stall and restart when errors are encountered
during offline push, online pull, and offline pull copy operations.
Pull operations
During a pull copy operation, the data is copied from the remote devices onto the DMX control system.
On pull operations, the remote devices should be inaccessible to the remote hosts for the duration of the
copy process; to enforce this, the devices should be write-disabled.
Online pull operations can potentially result in the loss of application host updates made during the copy
operation. This can happen because the devices being copied to, the target devices, continue to be updated
by the application hosts (this is the implication of an online operation); however, these host updates are
not tracked by the copy operation. There are certain error scenarios from which the pull operation cannot
recover, at which point it must be restarted. Any application host updates made during the failed pull
operation will be lost.
EMC PowerPath Migration Enabler considerations
PowerPath Migration Enabler uses Open Replicator as the data propagation mechanism. Therefore, review
the considerations for Open Replicator when using PPME for data migration. At the time of testing, PPME
was supported only in Solaris environments.



Decision criteria
Based on the information gathered, the appropriate methodology and technology can be determined.
Table 1 lists the decision criteria for the various migration methods.
Table 1. Decision criteria for choosing a migration method
Migration method: Traditional backup and restore
  When to use:
  • Any supported host type
  • Minimal amount of data to be migrated
  • Extended migration window (hours) allowed
  • Application can be offline during migration
  Decision criteria and considerations:
  • Independent of storage system
  • Application must be offline during migration
  • Target may be of different size than source

Migration method: Host Logical Volume Manager tools (VxVM, AIX LVM, and others)
  When to use:
  • Smaller projects (single application/host)
  • Source and target volumes can be of different sizes
  Decision criteria and considerations:
  • Potential performance impact
  • Complex methodology
  • For Windows hosts, consider Open Migrator

Migration method: Operating system copy tools (such as dd)
  When to use:
  • Application can be offline during the entire migration
  Decision criteria and considerations:
  • No recovery in the event of a failure during copy

Migration method: Symmetrix Remote Data Facility (SRDF®)
  When to use:
  • Source and target are both Symmetrix
  • Full volume replication
  • Data center move
  • Need to move entire frame
  Decision criteria and considerations:
  • SRDF typically requires that the source and target array be no more than one generation apart
  • Bin file change required

Migration method: Open Replicator
  When to use:
  • Symmetrix to/from non-Symmetrix
  • Need to replicate full volumes
  Decision criteria and considerations:
  • Independent of host operating system
  • Target LUNs must be equal to or larger than the source
  • Copy workload may impact the overall SAN and must be minimized
  • Only the Symmetrix control volume can be active during migration
  • Throttle level can be set for replication
  • No bin file change required
  • Methodology can be scripted

Migration method: Open Migrator
  When to use:
  • Need to move to logical volumes of different structures (that is, raw devices to VERITAS volumes)
  • Need to move data to or from non-EMC storage
  Decision criteria and considerations:
  • Host-based replication technology
  • Operates at the logical volume level or drive level
  • No CLI for Windows

Migration method: PowerPath Migration Enabler (PPME)
  When to use:
  • Application must be continuously available and/or switchover time is small (that is, the time it
    takes to perform a reboot)
  Decision criteria and considerations:
  • Limited host support
  • Currently uses Open Replicator as the underlying replication technology
  • No dependency on PowerPath multipathing




JPMorgan Chase technology decision tree
Figure 1 details the JPMorgan Chase decision tree used in choosing a migration technology.

[Figure 1 is a flowchart. Its first decision points are host type (Novell and HP-UX hosts route directly to
SRDF) and whether the migration is long distance, which mandates Open Replicator or SRDF. For the
remaining hosts (UNIX other than HP-UX, and W2K), the tree then asks: Is SAN impact a factor? (if yes,
utilize Open Replicator throttling); Can the application tolerate downtime during cutover? (if no, utilize
PPME); Is host impact a concern?; Are volume changes required? (if yes, use Open Migrator); and Is a
large number of hosts involved? (choosing between SRDF and Open Replicator).]
Figure 1. Technology selection decision tree
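
Read as code, the tree in Figure 1 reduces to a chain of conditionals. The following is a simplified Python rendering that collapses a few branches; it is meant only to mirror the figure's logic for illustration, not to replace the decision tree itself.

```python
def choose_technology(host_os: str, long_distance: bool, san_impact: bool,
                      cutover_downtime_ok: bool, host_impact: bool,
                      volume_changes: bool, many_hosts: bool) -> str:
    """Simplified rendering of the Figure 1 decision tree."""
    if host_os in ("Novell", "HP-UX"):
        return "SRDF"                      # these hosts route directly to SRDF
    if long_distance:
        return "SRDF or Open Replicator"   # long distance mandates array tools
    if san_impact:
        return "Open Replicator with throttling"
    if not cutover_downtime_ok:
        return "PPME"                      # nondisruptive cutover required
    if host_impact:
        return "SRDF" if many_hosts else "Open Replicator"
    if volume_changes:
        return "Open Migrator"             # restructuring volumes on the way
    return "SRDF" if many_hosts else "Open Replicator"

print(choose_technology("Solaris", long_distance=False, san_impact=False,
                        cutover_downtime_ok=True, host_impact=False,
                        volume_changes=True, many_hosts=False))
# -> Open Migrator
```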
Conclusion
The processes described in this white paper are generic, allowing them to be applied to many different data
migration projects. These processes benefit enterprise data centers by increasing reliability, reducing
administration costs, and maximizing application availability by minimizing the disruption normally
inherent in data migration. The overall procedures and best practices described are applicable to the
majority of data migrations within JPMorgan Chase.
References
The following list includes reference materials used during the JPMorgan Chase Commando Residency:
• EMC Open Migrator/LM for UNIX and Linux CLI Product Guide
• EMC Open Migrator/LM for Windows Product Guide
• EMC Solutions Enabler Symmetrix Open Replicator CLI Product Guide
