Deutsche Bank
Introduction
Executive Overview
SRDF Overview
  What is SRDF?
  Why SRDF?
  How does SRDF work?
  SRDF Storage Protocol used by Deutsche Bank
  Where SRDF is used at Deutsche Bank
DWDM Technology
  Overview
  Nortel Networks OPTera Metro
Latency
  SRDF induced delays
  Example
Recommendations for Handling High Activity Data
Page 2 of 21
Introduction
Recent engineering work by EMC has validated the Nortel OPTera DWDM
(Dense Wavelength Division Multiplexing) unit for use with EMC SRDF over
distances of up to 200 km. This document explains which applications can
benefit from this extended distance mirroring, and which cannot. It also offers
alternative solutions for protecting data when the application cannot support
extended distance mirroring.
Executive Overview
EMC Engineering has recently validated the Nortel OPTera DWDM for use with
EMC SRDF at distances of up to 200 km.
DWDM technology allows multiple SRDF links to be carried over a
smaller number of physical telecoms fibre cables, reducing the number
of cables required without reducing bandwidth or
efficiency. This leads to greatly reduced costs.
The Nortel OPTera DWDM is not the DWDM currently selected by Deutsche
Bank New World.
It is envisaged that this 200 km distance will be increased very shortly after
further validation by EMC engineering.
EMC SRDF works by forwarding each write IO that a host sends to a Symmetrix
on to a second, remote Symmetrix. This is done transparently to the host, which
only sees a slightly slower write IO.
During normal non-BCP operation, only write IO is sent to the remote
Symmetrix. As a general rule of thumb, an application's IO is 90-95% reads and
only 5-10% writes.
In order to calculate the additional latency we have to add the fixed overhead
of writing to 2 Symmetrix units, as opposed to 1, plus a variable value
that depends on the distance to be replicated. The variable value is proportional
to the distance because it is governed by the speed of light, and it is not
envisaged that EMC engineering will be able to improve on this in the near
future. The DWDM units and Connectrix switch units required add negligible
overhead. The total overhead is incurred on every write IO.
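As a rough illustration of the two points above (only writes are mirrored, and the overhead is paid per write IO), the following sketch averages the added write latency over a read/write mix. The function name and the 2.0 ms added latency are assumptions chosen for illustration, not measured values.

```python
def average_added_latency_ms(write_fraction, added_write_latency_ms):
    """Average extra latency per IO when only writes pay the SRDF overhead."""
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    # Reads are served by the local Symmetrix, so they add nothing.
    return write_fraction * added_write_latency_ms


# Rule-of-thumb mixes from the text: 5% and 10% writes.
# The 2.0 ms added write latency is a hypothetical figure.
for write_fraction in (0.05, 0.10):
    extra = average_added_latency_ms(write_fraction, 2.0)
    print(f"{write_fraction:.0%} writes: {extra:.2f} ms added per IO on average")
```

The average can look small, but each individual write still pays the full overhead, which is what matters for write-heavy components.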
Applications with heavy write IO, or any application during batch runs, may
experience IO queuing as write IOs wait to be sent across the SRDF link to the
second Symmetrix.
Various host best practices can greatly reduce the potential for IO queuing,
and these are in use by Deutsche Bank.
SRDF can operate in a number of modes, and these modes can be
switched interactively on a very granular basis. The different modes can
also greatly reduce or even eliminate IO queuing.
In addition, SRDF in conjunction with EMC TimeFinder can be architected into
various multi-hop and time sync enabled solutions.
In summary, DWDM technology allows SRDF to be extended across greater
distances than previously possible, but this extra distance is not without cost to
the efficiency of the applications using it. The service
levels required and the IO profile of the application must be examined in detail to see if
SRDF Overview
What is SRDF?
SRDF generates a mirror image of the data at the logical volume level in one
or more remote Symmetrix systems. These remote volumes can be made
addressable to remote hosts via software commands. SRDF Synchronous
mode (which is the default mode of operation at Deutsche Bank) was first
developed for Disaster Recovery within the customer's campus. SRDF
Adaptive Copy modes were later developed to support long distance bulk data
transfers for data center relocations and content replication. Technology has
evolved to support Wide Area Networking (WAN) and multiple transports, thus
increasing distance and throughput for a wider variety of applications of
SRDF. Additional customer uses for SRDF include remote data warehousing,
remote test beds, remote report generation, remote backup and workload
sharing between hosts at the same or geographically remote sites.
Why SRDF?
SRDF delivers real benefits to organizations by allowing them to maintain
access to data, so that revenue-producing or supporting applications continue
to serve business functions.
SRDF can be used in several key areas including, but not limited to:
- Business continuance: business applications continue running despite
  possible disk failures.
- Disaster recovery: data recovery at the disaster recovery site in minutes
  rather than days.
- Data centre migrations: application outage reduced to minutes instead of
  hours.
- Work load migrations: similar to data centre migrations; especially useful
  for minimizing outages during preventative maintenance of hardware or
  software, or even data centre power-downs.
- Shortening or eliminating backup windows: eliminate the backup window
  by utilizing SRDF's second data copy.
- Synchronous. Data on the source (R1) and target (R2) volumes are
  always fully synchronized at the completion of an I/O sequence.
- Semi-synchronous. Data on remotely mirrored volumes are always
  synchronized between the source (R1) and the target (R2) prior to
  initiating the next write operation to these volumes.
- Adaptive copy. Adaptive Copy modes transfer data from the source
  (R1) volume to the target (R2) volume and do not wait for receipt
  acknowledgment and synchronization to occur.
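To make the difference between the three modes concrete, here is a toy timing model of writes to a single volume. It is a sketch under stated assumptions, not a description of Symmetrix internals: it assumes the host is acknowledged after the remote copy in synchronous mode and after the local cache write in the other two modes, and that a semi-synchronous volume holds the next write until the previous one is synchronized. The function name, mode strings and timings are all hypothetical.

```python
def host_response_times(mode, arrivals_ms, local_ms, remote_ms):
    """Host-visible response time of each write to one volume (toy model).

    arrivals_ms: times at which the host issues writes, in ascending order.
    local_ms:    time to place a write in local cache (assumed constant).
    remote_ms:   extra time until the remote copy is confirmed (assumed constant).
    """
    responses = []
    volume_free_at = 0.0  # earliest time the volume will start the next write
    for arrival in arrivals_ms:
        start = max(arrival, volume_free_at)
        local_done = start + local_ms
        remote_done = local_done + remote_ms
        if mode == "synchronous":
            host_ack = remote_done        # host waits for the remote copy
            volume_free_at = remote_done
        elif mode == "semi-synchronous":
            host_ack = local_done         # host is released early (assumption)
            volume_free_at = remote_done  # next write waits for R1/R2 sync
        elif mode == "adaptive-copy":
            host_ack = local_done
            volume_free_at = local_done   # no wait for remote acknowledgement
        else:
            raise ValueError(f"unknown mode: {mode}")
        responses.append(host_ack - arrival)
    return responses


# Two well-spaced writes followed by one that arrives 0.5 ms after the second.
arrivals = [0.0, 10.0, 10.5]
for mode in ("synchronous", "semi-synchronous", "adaptive-copy"):
    print(mode, host_response_times(mode, arrivals, local_ms=1.0, remote_ms=2.0))
```

In this model, widely spaced writes see the remote delay only in synchronous mode, while a write that closely follows another on the same volume is delayed in semi-synchronous mode as well.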
SRDF writes are from cache to cache, hence when data is written from local
Symmetrix cache to remote Symmetrix cache over the SRDF link, the
DWDM Technology
Overview
Dense Wavelength Division Multiplexing (DWDM) is a process in which
multiple individual channels of data are carried at different
wavelengths over one pair of fiber links. This contrasts with conventional fiber
optic systems, in which just one channel is carried over a fiber pair.
For EMC customers this means that multiple SRDF channels and server
channels can be transferred over one pair of fiber links along with traditional
network traffic. This is especially important in locations where fiber links are at
a premium. For example, a customer may be leasing fiber, so the more traffic
they can run over a single link, the more cost effective the solution. With
today's technology, the capacity of a single pair of fiber strands is virtually
unlimited. The limitation comes from the DWDM itself: optical-to-electrical
conversions are required for switching and channel protection, and these limit
the input traffic per channel.
SRDF over Fibre Channel does not currently support direct connections
between RF directors using WDM or DWDM unit port connections, due to
performance limitations and the relatively variable latencies of such links over
long distances.
DWDM units, however, are supported for SRDF traffic via ISL connections
using Fibre Channel switches such as the Connectrix family of Fibre Channel
switches.
Nortel Networks OPTera Metro provides the ability to route wavelengths, and
therefore has the same survivability capabilities as current TDM rings when
deployed in a ring topology. OPTera Metro provides a reliable DWDM platform
for enterprises with large-scale connectivity requirements. OPTera's
transparent capabilities enable these enterprises to control the cost and
[Figure: DWDM units carrying 8 to 64 wavelengths over fiber]
Latency
SRDF induced delays
Synchronous or even semi-synchronous mirroring of data can cause impacts
to customer workloads. The impact to any given workload will vary according
to:
- The blocksize of the data being remote mirrored
- The distance over which the remote mirroring is being done
- The remote mirroring mode used (e.g. synchronous, semi-synchronous, adaptive copy)
- The type of connection between the source and target Symmetrix units
- The arrival rate of the write IOs at the source Symmetrix
The degree to which a customer workload is impacted by delays induced by
SRDF mirroring will vary not only according to the amount of the delay, but
also according to the nature of the workload. Some workloads will not be impacted
by extended response times on workload components that are critical for
recovery. Other workloads could be severely impacted if the affected
component is on the critical path for end user transaction response time. For
example, an increase in response time to the online Redo logs in an Oracle
environment will invariably cause end user transaction response time to
degrade.
In order to approximate the amount of delay likely to be introduced by
mirroring the data with SRDF for any given workload, one should:
- Determine the type of SRDF implementation that is likely to be installed.
- Calculate the propagation delay induced by the link: multiply the round
  trip link distance in kilometres by 0.005 msec/km, and then by 3 if campus
  ESCON is to be used, by 1 if a telco link (e.g. T3, ATM, etc.) is to be
  used, or by 2 for SRDF over Fibre Channel.
  To this it will be necessary to add an allowance for protocol time within
  both the source and target Symmetrix, as well as allowances for
  delays induced by protocol converters, network equipment, etc.
- Add the approximated SRDF link delay times to the current or
  anticipated non-SRDF IO response times.
- Determine the likely impact on the customer workload, remembering
  that the impact will inevitably follow Little's Law [1].
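The estimation steps above can be sketched as a small calculation. The 0.005 msec/km figure and the multipliers come from the text; the function names, dictionary keys and the `allowance_ms` parameter are illustrative assumptions, and the text gives no figure for the protocol and equipment allowance.

```python
# Propagation delay per kilometre of link, as quoted in the text.
MS_PER_KM = 0.005

# Multipliers quoted in the text; the dictionary keys are illustrative names.
LINK_MULTIPLIER = {
    "campus_escon": 3,
    "telco": 1,          # e.g. T3, ATM
    "fibre_channel": 2,
}


def propagation_delay_ms(one_way_km, link_type):
    """Propagation delay per write IO for a link of the given one-way length."""
    round_trip_km = 2 * one_way_km
    return round_trip_km * MS_PER_KM * LINK_MULTIPLIER[link_type]


def estimated_write_response_ms(non_srdf_response_ms, one_way_km, link_type,
                                allowance_ms=0.0):
    """Non-SRDF response time plus link delay plus a caller-supplied allowance.

    allowance_ms stands for protocol time in both Symmetrix units and delays
    in protocol converters or network equipment (hypothetical parameter).
    """
    return (non_srdf_response_ms
            + propagation_delay_ms(one_way_km, link_type)
            + allowance_ms)


# Compare the link types over a hypothetical 50 km link.
for link_type in LINK_MULTIPLIER:
    delay = propagation_delay_ms(50, link_type)
    print(f"{link_type}: {delay:.2f} ms propagation delay over 50 km")
```

Keeping the allowance as a separate argument makes it explicit that it must be measured or supplied by the vendor rather than derived from distance.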
[1] Little's Law is the basis upon which a lot of queuing theory is built. In general terms, Little's
Law relates the average queue length (Q) to the arrival rate of transactions (a) and the
average response time (R). Specifically, Little's Law states:
Q = a * R.
Consequently, it can be seen that any increase in IO response time may well cause a
significant blowout in the queue length within the application, which may or may not be
supportable from a customer business perspective.
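A minimal numeric sketch of Q = a * R follows. The arrival rate of 200 writes per second and the 2 ms and 4 ms response times are hypothetical inputs chosen for illustration; the function name is also an assumption.

```python
def average_queue_length(arrival_rate_per_s, response_time_ms):
    """Little's Law, Q = a * R, with R converted from milliseconds to seconds."""
    return arrival_rate_per_s * (response_time_ms / 1000.0)


# Same arrival rate, two response times: doubling R doubles the average
# number of IOs in flight or waiting.
for response_time_ms in (2.0, 4.0):
    q = average_queue_length(200, response_time_ms)
    print(f"R = {response_time_ms} ms: average queue length {q:.2f}")
```

At a fixed arrival rate the relationship is linear. Larger increases arise when longer queues themselves lengthen the response time, which the formula alone does not model.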
Example
This document concentrates on SRDF over Fibre Channel. Write IO is
transmitted using SCSI over Fibre Channel, and under the SCSI
protocol every IO to be transmitted requires 2 round trips. The first carries
the SCSI command (for SRDF this will be WRITE), which the remote
Symmetrix acknowledges. The second carries the
actual data, followed by the acknowledgement from the remote Symmetrix
that the data has been written to cache and confirmed. This leads to the x2
multiplier on the propagation delay described above.
[Figure: host response time chart; surviving data labels: 2.1 ms and 3.9 ms]
The picture above illustrates the host response time without SRDF (Baseline),
and the overhead of running SRDF over zero distance (Campus) for 4K and
27K blocksize.
Working through a 4K blocksize example, we have a 2.0 ms host response
time for zero distance. Add to this the propagation delay for a 100 km link
(approximately the distance from London to Milton Keynes). Two round trips
cover the 100 km four times, so the delay is
(100 km + 100 km + 100 km + 100 km) * 0.005 msec/km = 2.0 ms, giving a
total of 4 ms response time per write IO.
Heavy write activity on 1 volume may mean that IOs are queued waiting for
the previous IO to be acknowledged by the remote Symmetrix, and so you
may get IO elongation, with IOs waiting on IOs waiting on IOs (see Little's Law
above).
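The queuing effect can be bounded with simple arithmetic: if each write to a volume must wait for the previous one to be acknowledged, the volume cannot complete more than 1/R writes per second. The sketch below uses the 2 ms and 4 ms response times from the example; the 400 writes per second burst, the 10 second duration and the function names are hypothetical.

```python
def max_serial_write_rate_per_s(response_time_ms):
    """Upper bound on writes/s to one volume when writes are fully serialized."""
    if response_time_ms <= 0:
        raise ValueError("response_time_ms must be positive")
    return 1000.0 / response_time_ms


def backlog_after(seconds, arrival_rate_per_s, response_time_ms):
    """Writes still waiting after a sustained burst (zero if the volume keeps up)."""
    service_rate = max_serial_write_rate_per_s(response_time_ms)
    return max(0.0, (arrival_rate_per_s - service_rate) * seconds)


# A hypothetical 10 second burst of 400 writes/s to a single volume.
for response_time_ms in (2.0, 4.0):
    ceiling = max_serial_write_rate_per_s(response_time_ms)
    waiting = backlog_after(10, 400, response_time_ms)
    print(f"R = {response_time_ms} ms: ceiling {ceiling:.0f} writes/s, "
          f"backlog after 10 s: {waiting:.0f} writes")
```

Under these assumptions the volume absorbs the burst at 2 ms per write but falls steadily behind at 4 ms, which is the elongation described above.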
Note: There is no significant latency through switches or DWDMs.
Alternative strategies
The latency overhead can also be masked from the user if an alternative
replication strategy is adopted, namely Semi-Synchronous or Multi-Hop
replication.
Another strategy would be to combine the benefits of SRDF with an Oracle
automated standby database. This solution requires only that the online redo
logs be synchronously replicated, thus drastically reducing communication
needs.
The following strategies could help alleviate latency overhead with SRDF
deployed over extended distances.
[Figure: Source and Target connected by SRDF links, with numbered steps. Caption: Target behind at most one write operation per source logical volume]
[Figure: three EMC Symmetrix units labelled Local, Hop1 and Hop2, with R1, R2 and BCV devices]
SRDF Multi-Hop is
another approach to the issues introduced by distance-based latency.
Here, TimeFinder is used to create a point-in-time BCV of the production
volume. SRDF Multi-Hop would then treat the BCV as an R1 or source device.
Its R2 target would be at the other end of the link.
In Multi-Hop scenarios, the links between the first location and the
intermediate location are run synchronously. Then the TimeFinder software
performs the splits described above. The links between the intermediate site
and the distant site are usually Adaptive Copy mode due to the issues of
latency.
Multi-Hop is the best of both worlds: fully synchronous for performance
between sites A and B but Adaptive Copy to keep line costs down between B
and C, the disaster recovery site.
[Figure: Primary DB and Failover DB, with On-Line Redo Logs and Archived Redo Logs]
Conclusion
EMC Engineering has validated the Nortel OPTera DWDM for use with EMC SRDF
at distances of up to 200 km in a point-to-point configuration.
For Deutsche Bank to replicate data in Synchronous copy mode between sites,
careful consideration must be given to whether the nature and characteristics of the
application are suited to a Synchronous copy mode configuration, or whether
application user response times will be adversely affected by the latency issues
described in this document.
If an application or its components exhibit high write I/O or high transaction rates,
then alternative SRDF replication modes should be considered to avoid these latency
issues.