
SRDF Topology Discussion Document for

Deutsche Bank

James Ridley / Jan Jedynak


EMC Corporation
VERSION 1.2
Page 1 of 21

Introduction
Executive Overview
SRDF Overview
  What is SRDF?
  Why SRDF?
  How does SRDF work?
  SRDF Storage Protocol used by Deutsche Bank
  Where SRDF is used at Deutsche Bank
DWDM Technology
  Overview
  Nortel Networks OPTera Metro
Latency
  SRDF induced delays
  Example
  Recommendations for Handling High Activity Data
Types of applications not suited to synchronous replication
SRDF best practices in use at Deutsche Bank
Alternative strategies
  1) SRDF Semi-Synchronous mode
  2) SRDF Adaptive Copy mode
  3) SRDF Multi-Hop mode
  4) Oracle8i Automated Standby Database


Introduction
Recent engineering work by EMC has validated the Nortel OPTera DWDM
(Dense Wave Division Multiplexer) for use with EMC SRDF at distances of up to
200 km. This document explains which applications can benefit from this
extended distance mirroring, and which cannot. It also offers alternative
solutions for protecting data when the application cannot support
extended distance mirroring.


Executive Overview
EMC Engineering has recently validated the Nortel OPTera DWDM for use with
EMC SRDF at distances of up to 200 km.
DWDM technology allows multiple SRDF links to be carried over a smaller
number of physical telecoms fibre cables, reducing the number of cables
required without reducing bandwidth or efficiency. This greatly reduces
costs.
The Nortel OPTera DWDM is not the DWDM currently selected by Deutsche
Bank New World.
It is envisaged that this 200 km distance will be increased very shortly after
further validation by EMC engineering.
EMC SRDF works by forwarding each write IO that a host sends to a Symmetrix
on to a second, remote Symmetrix. This is done transparently to the host,
which only sees a slightly slower write IO.
During normal non-BCP operation, only write IO is sent to the remote
Symmetrix. As a general rule of thumb, an application does 90-95% reads and
only 5-10% writes.
To calculate the additional latency, add the fixed overhead of writing to 2
Symmetrix units, as opposed to 1, to a variable value that depends on the
distance to be replicated. The variable value is proportional to distance and
governed by the speed of light, so it is not envisaged that EMC engineering
will be able to improve on it in the near future. The DWDM units and
Connectrix switch units required add negligible overhead. The latency
overhead is incurred per write IO.
Applications with heavy write IO, or applications during batch runs, may
experience IO Queuing as write IOs queue to be sent across the SRDF link to
the second Symmetrix.
Various host best practices can greatly reduce the potential for IO Queuing,
and these are in use by Deutsche Bank.
SRDF can operate in a number of modes, and these modes can be
switched interactively on a very granular basis. These different modes can
also greatly reduce or even eliminate IO Queuing.
In addition, SRDF in conjunction with EMC TimeFinder can be architected into
various multi-hop and time sync enabled solutions.
In summary, DWDM technology allows SRDF to be extended across greater
distances than previously possible, but this extra distance is not without cost
to the efficiency of the applications using it. The service levels required and
the IO profile of the application must be examined in detail to see if it is
practical to use SRDF over extended distances in synchronous mode. If
the application is not suitable for synchronous mode SRDF over the distance
then there are other architected solutions available which may provide the
required level of protection.


SRDF Overview
What is SRDF?
SRDF generates a mirror image of the data at the logical volume level in one
or more remote Symmetrix systems. These remote volumes can be made
addressable to remote hosts via software commands. SRDF Synchronous
mode (which is the default mode of operation at Deutsche Bank) was first
developed for Disaster Recovery within the customer's campus. SRDF
Adaptive Copy modes were later developed to support long distance bulk data
transfers for data center relocations and content replication. Technology has
evolved to support Wide Area Networking (WAN) and multiple transports, thus
increasing distance and throughput for a wider variety of applications of
SRDF. Additional customer uses for SRDF include remote data warehousing,
remote test beds, remote report generation, remote backup and workload
sharing between hosts at the same or geographically remote sites.

Why SRDF?
SRDF allows companies to maintain access to data, so that revenue-producing
or supporting applications continue to serve business functions.
SRDF can be used in several key areas including, but not limited to:
- Business continuance: business applications continue running despite
  possible disk failures.
- Disaster recovery: data recovery at the disaster recovery site in minutes
  rather than days.
- Data centre migrations: application outage reduced to minutes instead of
  hours.
- Workload migrations: similar to data centre migrations; especially useful
  for minimizing outages during preventative maintenance of hardware or
  software, or even data centre power-downs.
- Shortening or eliminating backup windows: eliminate the backup window
  by utilizing SRDF's second data copy.

How does SRDF work?


SRDF works in 3 different modes: synchronous, semi-synchronous, and
adaptive copy.
- Synchronous. Data on the source (R1) and target (R2) volumes are
  always fully synchronized at the completion of an I/O sequence.
- Semi-synchronous. Data on remotely mirrored volumes are always
  synchronized between the source (R1) and the target (R2) prior to
  initiating the next write operation to these volumes.
- Adaptive copy. Adaptive Copy modes transfer data from the source
  (R1) volume to the target (R2) volume and do not wait for receipt
  acknowledgment and synchronization to occur.

SRDF writes are from cache to cache: when data is written from local
Symmetrix cache to remote Symmetrix cache over the SRDF link, the
production Symmetrix waits for an acknowledgement from the remote
Symmetrix before data is written to local disk.

SRDF Storage Protocol used by Deutsche Bank


SRDF at Deutsche Bank uses a storage protocol based upon either the
ESCON or Fibre Channel FC-4 specifications to remotely mirror data
between Symmetrix units. The host attachment, I/O protocol, and disk data
structures required by each host are independent of the SRDF operation
between Symmetrix units. All existing production implementations at Deutsche
Bank use ESCON, though all future implementations, including the new
datacentre at Hayes, will use Fibre Channel.
The benefits of SRDF over Fibre Channel Point-to-Point include increased
SRDF throughput for all host types and increased connectivity options for
Open Systems. In addition, Fibre Channel maintains a peer-to-peer
relationship as opposed to the ESCON channel and control unit relationship
used at the ESCON RA director level. This increases the flexibility of SRDF in
cases where it is desired to have primary and secondary volumes located at
each side of the SRDF link.

Where SRDF is used at Deutsche Bank


SRDF is deployed between all the major MERs in the London campus, in
point-to-point configurations.


DWDM Technology
Overview
Dense Wavelength Division Multiplexing (DWDM) is a process in which
multiple individual channels of data are carried at different
wavelengths over one pair of fiber links. This contrasts with conventional fiber
optic systems in which just one channel is carried over a fiber pair.
For EMC customers this means that multiple SRDF channels and server
channels can be transferred over one pair of fiber links along with traditional
network traffic. This is especially important in locations where fiber links are at
a premium. For example, a customer may be leasing fiber, so the more traffic
they can run over a single link, the more cost effective the solution. With
today's technology, the capacity of a single pair of fiber strands is virtually
unlimited. The limitation comes from the DWDM itself: optical to electrical
conversions are required for switching and channel protection, and these limit
the input traffic per channel.
SRDF over Fibre Channel does not currently support direct connections
between RF directors using WDM or DWDM unit port connections, due to
performance limitations and the relatively variable latencies of such links over
long distances.
DWDM units, however, are supported for SRDF traffic via ISL connections
using Fibre Channel switches such as the Connectrix family of Fibre Channel
switches.

Nortel Networks OPTera Metro


High capacity is inherent in the Nortel Networks OPTera Metro DWDM (Dense
Wave Division Multiplex) solution. Each wavelength can support up to 2.5Gb/s,
while 32 or more such wavelengths can be multiplexed onto a single fiber. The
resulting aggregate supports capacities of 80Gb/s to provide high capacity
trunks between network elements.


Nortel Networks OPTera Metro provides the ability to route wavelengths, and
therefore has the same survivability capabilities as current TDM rings when
deployed in a ring topology. OPTera Metro provides a reliable DWDM platform
for enterprises with large-scale connectivity requirements. OPTera's
transparent capabilities enable these enterprises to control the cost and
management requirements of connectivity, ensure network integrity, increase
network robustness, and easily accommodate emerging communications
protocols.

[Figure: Dense wavelength division multiplexing. The DWDM acts as an optical
funnel, carrying 8 to 64 wavelengths as multiple protocol-independent streams
on a single fiber-optic cable pair. Each wavelength represents a unique stream
of data, which may have a different data rate.]
Features and Benefits
- Support of SONET/SDH and non-SONET/SDH interfaces
- Protocol and bit-rate independence
- 32 protected wavelengths, 64 unprotected wavelengths
- Per-wavelength flexible protection switching
- Scalable from 16 Mbps to 2.5 Gbps per wavelength
- Point-to-point and survivable ring up to 120km
- In-band, per wavelength Optical Service Channel
- Point and click GUI management system
- Open systems management platform
- NEBS and ETSI compliant


Latency
SRDF induced delays
Synchronous or even semi-synchronous mirroring of data can affect
customer workloads. The impact on any given workload will vary according
to:
- The blocksize of the data being remote mirrored
- The distance over which the remote mirroring is being done
- The remote mirroring mode used (e.g. synchronous, semi-synchronous,
  adaptive copy)
- The type of connection between the source and target Symmetrix units
- The arrival rate of the write IOs at the source Symmetrix
The degree to which a customer workload is impacted by delays induced by
SRDF mirroring will vary not only according to the amount of the delay, but
also according to the nature of the workload. Some workloads will not be
impacted by extended response times on workload components that are
critical for recovery. Other workloads could be severely impacted if the
affected component is on the critical path for end user transaction response
time (e.g. an increase in response time to the online Redo logs in an Oracle
environment will invariably cause end user transaction response time to
degrade).
In order to approximate the amount of delay likely to be introduced by
applying SRDF to the data for any given workload, one should:
- Determine the type of SRDF implementation that is likely to be installed
- Calculate the propagation delay induced by the link. Multiply the round
  trip link distance in kilometres by 0.005 msec/km, and then by 3 if
  campus ESCON is to be used, by 1 if a telco link (e.g. T3, ATM, etc.)
  is to be used, or by 2 for SRDF over Fibre Channel. To this it will be
  necessary to add an allowance for protocol time within both the source
  and target Symmetrix, as well as allowances for delays induced by
  protocol converters, network equipment, etc.
- Add the approximated SRDF link delay times to the current or
  anticipated non-SRDF IO response times.
- Determine the likely impact on the customer workload, remembering
  that the impact will inevitably follow Little's Law (see the note below).
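
The propagation delay step in the list above can be sketched in a few lines of Python. This is an illustrative estimate only: the function and dictionary names are invented for this example, the multipliers and the 0.005 msec/km figure are the ones quoted above, and the allowance for protocol time and network equipment is left as a caller-supplied value because the text gives no figures for it.

```python
# Multipliers quoted in the text for each type of SRDF link.
# The dictionary keys are labels made up for this sketch.
LINK_MULTIPLIER = {
    "campus_escon": 3,
    "telco": 1,
    "fibre_channel": 2,
}

MSEC_PER_KM = 0.005  # propagation delay per kilometre, as quoted in the text


def propagation_delay_ms(one_way_km, link_type):
    """Estimate the propagation delay added to each write IO, in msec."""
    round_trip_km = 2 * one_way_km
    return round_trip_km * MSEC_PER_KM * LINK_MULTIPLIER[link_type]


def estimated_response_ms(current_ms, one_way_km, link_type, allowance_ms=0.0):
    """Add the link delay and any extra allowance to the non-SRDF response time.

    allowance_ms stands in for protocol time and equipment delays; its value
    is an assumption the caller must supply.
    """
    return current_ms + propagation_delay_ms(one_way_km, link_type) + allowance_ms


if __name__ == "__main__":
    for link in LINK_MULTIPLIER:
        delay = propagation_delay_ms(50, link)
        print(f"{link}: {delay:.2f} msec added per write IO at 50 km")
```

Keeping the multiplier in a lookup table makes it easy to compare link types for the same distance before adding site-specific allowances.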

Note: Little's Law is the basis upon which a lot of queuing theory is built. In general terms,
Little's Law relates the average queue length (Q) to the arrival rate of transactions (a) and the
average response time (R). Specifically, Little's Law states:
Q = a * R.
Consequently, it can be seen that any increase in IO response time may well cause a
significant blowout in the queue length within the application, which may or may not be
supportable from a customer business perspective.
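
As a small illustration of the formula, the following Python sketch evaluates Q = a * R for one arrival rate at two response times. The function name and the sample figures are chosen for this example only and are not measurements.

```python
def average_queue_length(arrival_rate_per_sec, response_time_ms):
    """Little's Law: Q = a * R, with R converted from msec to seconds."""
    return arrival_rate_per_sec * (response_time_ms / 1000.0)


if __name__ == "__main__":
    rate = 100  # write IOs per second (illustrative)
    for response_ms in (2.0, 4.0):
        q = average_queue_length(rate, response_ms)
        print(f"{rate} IO/s at {response_ms} msec -> average queue length {q:.2f}")
```

For a fixed arrival rate, doubling the response time doubles the average number of IOs outstanding.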

Example
This document concentrates on SRDF over Fibre Channel. Write IO is
transmitted using SCSI over Fibre Channel, and under the SCSI
protocol every IO to be transmitted requires 2 round trips. The first carries
the SCSI command (for SRDF this will be WRITE), which the remote
Symmetrix acknowledges. The second carries the
actual data, followed by the acknowledgement from the remote Symmetrix
that the data has been written to cache and confirmed. This leads to the x2
propagation delay multiplier described above.

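[Figure: host response time without SRDF (Baseline) and with SRDF over zero
distance (Campus). Values shown: 2.1MS and 3.9MS.]

The figure illustrates the host response time without SRDF (Baseline),
and the overhead of running SRDF over zero distance (Campus) for 4K and
27K blocksizes.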
Working through a 4K blocksize example, we have a 2.0 ms host response
time for zero distance. Adding a 100 km link (approximately the distance
from London to Milton Keynes) contributes two round trips:
(100 km + 100 km + 100 km + 100 km) * 0.005 ms/km = 2.0 ms.
This gives a total response time of 4.0 ms per write IO.
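
The arithmetic in this example can be checked with a short Python sketch. The function name and the leg labels are illustrative; the 2.0 ms zero-distance figure and the 0.005 ms/km rate are taken from the text, and the four legs correspond to the two SCSI round trips described earlier.

```python
# The four one-way trips across the link for a single SRDF write over
# Fibre Channel, following the two round trips described in the text.
LEGS = (
    "WRITE command to remote",
    "command acknowledgement to source",
    "data to remote",
    "data acknowledgement to source",
)


def fc_write_response_ms(one_way_km, zero_distance_ms=2.0, ms_per_km=0.005):
    """Estimate per-write response time: zero-distance time plus link delay."""
    link_ms = sum(one_way_km * ms_per_km for _ in LEGS)
    return zero_distance_ms + link_ms


if __name__ == "__main__":
    for km in (0, 50, 100, 200):
        print(f"{km:>3} km: {fc_write_response_ms(km):.1f} ms per write IO")
```

Listing the legs explicitly makes it clear why the link distance is counted four times for each write.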
Heavy write activity on 1 volume may mean that IOs are queued waiting for
the previous IO to be acknowledged from the remote Symmetrix, and so you
may get IO elongation, with IOs waiting on IOs on IOs (see Little's Law
above).
Note: There is no significant latency through switches or DWDMs.


Recommendations for Handling High Activity Data


As a general rule of thumb, and depending on the nature of the application
being supported, the distance over which the data is to be remote mirrored, etc.,
in order to ensure acceptable overall IO response times it is desirable that no
single logical volume involved in a remote mirroring relationship be required to
handle more than 100 write IOs/sec at 200 km. This figure is derived from the
maximum number of IOs that a logical volume can sustain at that distance (4K
blocksize: max 175 write IOs per second; 27K blocksize: max 125 write IOs
per second). It must be remembered that only 1 IO for a given volume can be
in the SRDF pipe at a time, though IOs for different volumes can be in the
pipe at the same time.
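
Because only 1 IO per volume can be in the SRDF pipe at a time, a rough upper bound on the write rate of a single volume is the reciprocal of the per-write response time. The Python sketch below applies that simplification, taking the 2.0 ms zero-distance response time and the Fibre Channel delay formula from the Latency section as assumed inputs. It is not expected to reproduce the exact ceilings quoted above, which also vary with blocksize; it only shows why the ceiling falls as distance grows.

```python
def max_writes_per_sec(one_way_km, zero_distance_ms=2.0, ms_per_km=0.005):
    """Rough per-volume write ceiling when writes to a volume are serialized.

    Assumes each write takes the zero-distance time plus four one-way
    link traversals (two round trips), and that the next write to the
    same volume cannot start until the previous one is acknowledged.
    """
    response_ms = zero_distance_ms + 4 * one_way_km * ms_per_km
    return 1000.0 / response_ms


if __name__ == "__main__":
    for km in (0, 100, 200):
        print(f"{km:>3} km: about {max_writes_per_sec(km):.0f} write IOs/sec per volume")
```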
In order to reduce the IO rate to any given logical volume to this sort of level, it
may be necessary to implement some of the following.
- Wherever possible high activity data should be spread over as many
logical volumes as possible, so as to reduce the overall IO rate per
volume, i.e. host level striping.
- If possible, increase host level buffering and blocksizes so as to reduce
the number of IOs done by the application.
- When dealing with high activity IO caused by large, single address
space tasks (e.g. database control regions, etc), it may be necessary to
break the tasks into multiple smaller tasks, so as to reduce the amount
of data generated on a per region basis to more manageable levels.
This is a non-trivial task, as it may have significant impact on the
customer's application architecture, and will require significant
involvement from customer personnel such as Data Base Analysts, etc.
- If necessary, re-design the application so as to achieve the desired IO
rate on a per volume basis.


Types of applications that may not be suited to synchronous replication
1) Database applications which exhibit very high transaction throughput and
therefore a high number of log writes.
2) Database applications that have a high transaction rate and perform an excessive
number of consistency point operations (perhaps as a result of frequent log
switch operations).
3) Applications which exhibit high volumes of I/O writes.
4) Applications that are highly sensitive to synchronous write I/O performance
(non-buffered synchronous writes).
5) Any highly time-bound, write-intensive application process where any elongation
of write I/O would impact application performance.


SRDF best practices in use at Deutsche Bank


Various best practices can reduce the impact of IO Queuing and IO
elongation.
The simplest is to make sure that all filesystems are built on host level striped
volumes. The reason for this is that the SRDF 'pipe' or queue can only have 1
IO for a given Symmetrix volume going across it at any time. The pipe can contain
more than 1 IO, but not for the same Symmetrix volume. Creating a striped
volume set at the host level gives 2 immediate effects when the host writes
an IO. If we were to write IOs to a striped filesystem spread over 4 Symmetrix
volumes then the 2 benefits would be:
1) the host knows it is writing to a striped set and issues more IOs to
the disk subsystem, as it knows it is actually writing to 4 volumes
2) more IOs can go across the SRDF 'pipe' to the remote Symmetrix, as
the IOs are to 4 Symmetrix volumes rather than just 1. This reduces queuing for
the pipe.
Host level LVM striping is being used as a best practice by nearly all projects
based on EMC Symmetrix.
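
The effect of striping on the per-volume IO count can be illustrated with a minimal Python sketch. It assumes a simple round-robin stripe layout; the function names and the 64 KiB stripe size are assumptions made for this example and do not describe any particular LVM product or the layout used at Deutsche Bank.

```python
from collections import Counter


def stripe_volume(offset_bytes, stripe_size_bytes, volume_count):
    """Return the index of the volume holding this offset (round-robin striping)."""
    return (offset_bytes // stripe_size_bytes) % volume_count


def writes_per_volume(offsets, stripe_size_bytes, volume_count):
    """Count how many writes land on each volume of the striped set."""
    return Counter(
        stripe_volume(offset, stripe_size_bytes, volume_count) for offset in offsets
    )


if __name__ == "__main__":
    stripe = 64 * 1024  # assumed stripe size for the illustration
    offsets = [i * stripe for i in range(8)]  # 8 sequential stripe-sized writes
    print("1 volume :", dict(writes_per_volume(offsets, stripe, 1)))
    print("4 volumes:", dict(writes_per_volume(offsets, stripe, 4)))
```

With 4 volumes, the 8 writes are spread so that each volume receives 2, which means up to 4 of them can be in the SRDF pipe at once instead of 1.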


Alternative strategies
The latency overhead can also be masked from the user if an alternative
replication strategy is adopted, namely Semi-Synchronous or Multi-Hop
replication.
Another strategy would be combining the benefits of SRDF with an Oracle
automated standby database. This solution requires only that the online redo
logs be synchronously replicated, thus drastically reducing communication
needs.
The following strategies could help alleviate latency overhead with SRDF
deployed over extended distances.

1) SRDF Semi-Synchronous mode


This is used primarily in extended distance environments. In this mode of
operation, data on the remotely mirrored volumes are always synchronized
between the source (R1) volume and the target (R2) volume prior to initiating
the next write operation to these volumes.
The sequence of operations is:
1. An I/O write is received from the host/server into the cache of the source.
2. An ending status is presented to the host/server.
3. The I/O is transmitted to the cache of the target.
4. A receipt acknowledgment is provided by the target back to the cache of the
source.
Semi-Synchronous mode masks the impact of distance in the general case,
because it allows read operations while write operations are in transit.
SRDF uses a first-in, first-out queue.
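
The difference between the two modes, as seen by the host, can be sketched with a simplified single-volume model in Python. The function name, parameter names and timing figures are illustrative assumptions, not measurements; the model only encodes the rules stated in this document, namely that the host is acknowledged either after the remote acknowledgement (synchronous) or as soon as the write is in source cache (semi-synchronous), and that the next write to the same volume waits until the previous one has reached the target.

```python
def host_response_times(arrivals_ms, local_ms, link_ms, semi_sync):
    """Return the host-observed response time of each write to one volume.

    arrivals_ms: arrival times of the writes, in ascending order.
    local_ms:    time to accept a write into source cache.
    link_ms:     time to send the write to the target and receive the receipt.
    semi_sync:   True for semi-synchronous mode, False for synchronous mode.
    """
    responses = []
    volume_free_at = 0.0  # when the previous write has been acknowledged by the target
    for arrival in arrivals_ms:
        start = max(arrival, volume_free_at)
        remote_ack = start + local_ms + link_ms
        # Semi-synchronous: ending status is presented before transmission.
        host_ack = start + local_ms if semi_sync else remote_ack
        responses.append(host_ack - arrival)
        volume_free_at = remote_ack
    return responses


if __name__ == "__main__":
    widely_spaced = [0.0, 10.0, 20.0]
    back_to_back = [0.0, 1.0]
    for label, arrivals in (("spaced", widely_spaced), ("burst", back_to_back)):
        print(label, "sync:", host_response_times(arrivals, 1.0, 2.0, False))
        print(label, "semi:", host_response_times(arrivals, 1.0, 2.0, True))
```

In this model, widely spaced writes see only the local time in semi-synchronous mode, while back-to-back writes to the same volume still wait for the previous write to reach the target.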

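[Figure: SRDF semi-synchronous mode. Source and target Symmetrix units
connected by SRDF links, with numbered steps from the sequence of operations.
The target is behind by at most one write operation per source logical volume.]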

2) SRDF Adaptive Copy mode


SRDF Adaptive Copy mode is used primarily for data migrations and data
centre moves. This operational mode is not recommended for use when
mirroring for disaster recovery.
SRDF Adaptive Copy mode allows the source (R1) volumes and target (R2)
volumes to be a few or many I/Os out of synchronization. The number of
tracks out of synchronization (skew) is user selectable.
There are two types of adaptive copy: Write Pending mode and Disk mode.
The sequence of operations is:
1. An I/O write is received from the host/server into the cache of the
source Symmetrix.
2. The I/O is acknowledged as completed to the host/server.
3. The I/O is placed in the SRDF queue.
4. The I/O is de-staged from cache to the source (R1) volume, and an
issue request is sent to the SRDF link.
5. The I/O is transmitted to the cache of the target.
6. A receipt acknowledgment is provided by the target back to the cache
of the source.
Adaptive Copy Write Pending mode allows the transmission to take place
before the data is de-staged from cache to the R1 disk volumes.
Adaptive Copy Disk mode de-stages the data from the cache to the R1
volume and then keeps track-level information as to what data is owed to the
remote side so that information can be subsequently sent a track at a time.
SRDF Adaptive Copy mode is used primarily for data migrations, data center
moves, and in conjunction with SRDF over Internet Protocol (IP) links. This
mode of operation also can be used in an SRDF Multi-Hop configuration to
mirror TimeFinder Business Continuance Volumes (BCVs)/R1 changed tracks
between the intermediate target site and the final (Multi-Hop) target site.
Note: the threshold for how far out of synchronization the volumes are allowed
to be is selectable by the user with the skew command.


3) SRDF Multi-Hop mode


TimeFinder software works by configuring multiple, independently
addressable online Business Continuance Volumes (BCVs) for information
storage. The BCV is a Symmetrix device with special attributes created when
the Symmetrix is configured. It can function either as an additional mirror to a
Symmetrix logical volume or as an independent, host-addressable volume.
Establishing BCV devices as mirror images of active production volumes
allows you to run multiple simultaneous business continuance tasks in
parallel. The principal device, known as the standard device, remains on line
for regular Symmetrix operation from the original production server. Each BCV
contains a unique host address, making it accessible to a separate
backup/recovery server. When you establish a BCV as a mirror of a standard
device, that relationship is known as a BCV pair. The BCV is temporarily
inaccessible to its host until you split the BCV pair.
The multi-hop restart solution is applicable when you want zero data loss in
the event of a disaster at the local site. Zero data loss means that the state of
the data at the Hop 2 restart site (after being propagated from the Hop 1
bunker site) is the same as it is at the local source site at the beginning of
a rolling disaster.
Automated replication with the BCVs at Hop 2 is applicable if you want a zero
data loss solution but cannot risk the loss of both the local source site and
Hop 1 bunker site at the same time. With this configuration, there are two
possible disaster restart outcomes:
- If only the local source site is lost, the result is zero data loss at the
Hop 2 restart site.
- If both the local source site and the Hop 1 bunker site are lost, the
result is a DBMS restartable copy at the Hop 2 restart site with
controlled data loss. The amount of data loss will be a function of the
replicate copy cycle time between the Hop 1 bunker site and the Hop 2
restart site.


[Figure: SRDF Multi-Hop configuration showing three EMC Symmetrix units
(Local, Hop 1 and Hop 2) with R1, R2 and BCV devices.]

SRDF Multi-Hop is another approach to the issues introduced by distance-based
latency. Here, TimeFinder is used to create a point-in-time BCV of the production
volume. SRDF Multi-Hop would then treat the BCV as an R1 or source device.
Its R2 target would be at the other end of the link.
In Multi-Hop scenarios, the links between the first location and the
intermediate location are run synchronously. Then the TimeFinder software
performs the splits described above. The links between the intermediate site
and the distant site are usually Adaptive Copy mode due to the issues of
latency.
Multi-Hop is the best of both worlds: fully synchronous for performance
between sites A and B but Adaptive Copy to keep line costs down between B
and C, the disaster recovery site.


4) Oracle8i Automated Standby Database


The automated standby database is one of the prime solutions to ensure
business continuity after a disaster. It achieves this with reduced amounts of
inter-site traffic by shipping only archived redo logs. In the event of a disaster,
a standby database can take over the processing and data serving
responsibility from the primary database, providing near continuous database
availability. The Oracle8i automated standby database and SRDF provide
the means to create, and automatically maintain, one or more copies of a
production database as protection against disasters.
A standby database is initially created by copying, or cloning, the production
database at a remote site. Archived redo logs are copied by SRDF to the
remote site. The standby database is able to begin managed recovery when
the next archived log generated by the primary database is applied in
managed recovery mode.

[Figure: Oracle automated standby database with SRDF. On-line redo logs at the
primary DB are archived; the archived redo logs are copied over the SRDF link
and applied at the failover DB.]


Conclusion
EMC Engineering has validated the Nortel OPTera DWDM for use with EMC SRDF
at distances of up to 200 km in a point-to-point configuration.
For Deutsche Bank to replicate data in a Synchronous copy mode between sites,
careful consideration must be given as to whether the nature and characteristics of the
application are suited to a Synchronous copy mode configuration, or whether the
application user response times will be adversely affected by the latency issues
described in this document.
If an application or its components exhibit high volumes of I/O writes, or high
transaction rates, then alternative SRDF replication modes should be considered to
avoid these latency issues.
