Without Transaction Log: no guarantee of recovery on data DSA, machine or OS failure, but provides optimal write/modify performance. This mode is suggested for environments that are expected to support very high-volume concurrent write/modify traffic of many thousands of updates per second (performance metrics will vary depending on disk speed) and that are closely monitored by system administrators for failure.

This document details how to implement a data DSA disaster recovery plan when there is a need to:

Manually recover a single data DSA from its online peer data DSA(s). This is needed when a) running a data DSA without a transaction log and the data DSA, machine or OS failed, or b) running a data DSA with a transaction log (no flushing) and the machine or OS failed.

Manually recover a single data DSA that has been unavailable for so long (e.g. weeks) that performing disaster recovery is a quicker way to get the data DSA online than relying on multiwrite-DISP recovery.

Manually recover a single data DSA because the in-memory multiwrite queue of one of its peer DSA(s) has overflowed (only applicable when multiwrite replication is being used).
2009 CA, Inc. All rights reserved. Confidential and Proprietary Information Version: 1.0 August 09
Table of Contents
Executive Summary
Product Version
Data DSA Replication
Data DSA Operational Modes
Data DSA Disaster Recovery
Online Data DSA Backup
How to Implement a Data DSA Disaster Recovery Plan
    When using multiwrite-DISP or DISP replication
    When using multiwrite replication
Executive Summary
This document provides specific advice on how to configure CA Directory for data replication and data recovery for optimal performance, reliability, resilience and scalability. Using two or more CA Directory servers allows for 24x7 service and data availability. For example, when a CA Directory server is brought down for system maintenance (e.g. applying OS patches or hardware upgrades), or when a server fails, there must always be at least one other server online. The key to providing 100% service and data availability for your LDAP client applications is to ensure that there is always at least one CA Directory server available, and that this server can accommodate any peak load the LDAP client applications might push at it.

At a single Data Center, the advice is to use at least two CA Directory servers, with all CA Directory servers replicating synchronously using a multiwrite-DISP configuration. Preferred-master and multi-master setups are both valid configurations, and the choice depends on customer preference and use case. In general, however, preferred master is best practice. Use the write-precedence setting in the data DSA settings file to define the preferred master server for a set of peer data DSAs. For example, if the data DSA is named UserStore and is hosted on three hosts named Host1, Host2 and Host3, you might define set write-precedence = Host1-UserStore; in each UserStore DSA settings file on Host1, Host2 and Host3.

If two or more Data Centers are being used, the advice is to use two or more CA Directory servers per Data Center, with all CA Directory servers within a Data Center replicating synchronously using a (preferred-master or multi-master) multiwrite-DISP configuration, and CA Directory servers between Data Centers replicating asynchronously using a multi-write group configuration.
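For illustration, a minimal sketch of the preferred-master setting described above, assuming a data DSA named UserStore hosted on Host1, Host2 and Host3; the settings file name is an assumed example:

```
# Place the same line in the UserStore DSA settings file on every host
# (e.g. an assumed file such as UserStore.dxc), so that all three peers
# agree that Host1's copy of the DSA is the preferred master:
set write-precedence = Host1-UserStore;
```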
It is CA Directory best-practice advice to use a null-prefix router DSA on each CA Directory server; all client LDAP requests should enter the CA Directory X.500 backbone via a router DSA. A router DSA will route the LDAP requests to the appropriate data DSA using a shortest-path routing algorithm. This document describes in detail how to configure replication between CA Directory servers and how to ensure they are always kept in sync with their peer data DSAs.
Product Version
CA Directory r12 SP1 Service Release 2 (Build 2266) and later.
b) If using more than one Data Center, add a specific string for the multi-write-group setting to logically group all DSAs and hosts within a specific Data Center (suggest using the Data Center name or location for the string):
set dsa dsaname =
{
    ...
    multi-write-group = <data center name>
    dsa-flags = ...
    ...
};
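For illustration, a hedged example of the fragment above with a concrete group string, assuming a Data Center named NewYork and a DSA named Host1-UserStore; both names, and the dsa-flags value shown, are placeholders for your own configuration:

```
# Illustrative only: group this DSA into the "NewYork" Data Center so that
# replication to peers in other groups is asynchronous.
set dsa "Host1-UserStore" =
{
    ...
    multi-write-group = NewYork
    dsa-flags = multi-write
    ...
};
```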
If using DXmanager XML configuration, define the following data DSA settings: a) Add the following replication settings for each data DSA:
b) Define hosts within a single Site if using one Data Center or define multiple Sites to logically group all DSAs and hosts within each specific Data Center (suggest using the Data Center name or location for the name of a Site):
The data files location should be defined when the data DSA is created: using the dxgrid-dblocation setting in the DSA's server (.dxi) file if using ASCII configuration, or defined explicitly in DXmanager if using XML configuration. The default is $DXHOME/data on UNIX/Linux and %DXHOME%\data on Windows.

There is a trade-off between DSA operational modes: write performance versus guaranteed automatic data recovery. Which mode is chosen depends on a number of factors but is ultimately a deployment-specific choice.

Using a transaction log will help automatic data recovery when a data DSA, machine or OS stops unexpectedly. However, performance of write/modify operations will not be optimal: ballpark around 100 updates per second when flushing and a few thousand updates per second when not flushing (performance metrics will vary depending on disk speed). Performance of the transaction log file can be increased by hosting the data DSA files on a high-performing disk system such as a SAN, or by using RAID (e.g. RAID 1+0 or RAID 5). The network performance between the server and the disk system would then be the likely performance barrier.

Flushing the transaction log file will guarantee that a data DSA is able to automatically restart and recover data even on machine or OS failure (without manual intervention by a system administrator). Not flushing the transaction log file will guarantee that a data DSA can always restart after a DSA failure, but not after a machine or OS failure. It is impossible for the data DSA to differentiate between a DSA, machine or OS failure when it is starting up. Thus, when not flushing the transaction log file, a system administrator needs to know whether to simply restart the data DSA (because they know there was a DSA failure) or to implement their data DSA disaster recovery plan (because they know there was a machine or OS failure).
Not using a transaction log will provide optimal write/modify performance: ballpark thousands of updates per second. However, if there is a DSA, machine or OS failure, a system administrator will need to implement their data DSA disaster recovery plan.

By default, the transaction log file and flushing are enabled when a data DSA is created. To disable the transaction log file, define set disable-transaction-log = true; in the data DSA settings file. To disable transaction log file flushing, define set disable-transaction-log-flush = true; in the data DSA settings file.

Which data DSA operational mode you choose depends on a number of factors and is customer-deployment specific. Some questions that would help determine which mode to use:

Which type of application is accessing CA Directory? Do those applications have a requirement to support a high volume of update/modify operations? Hundreds of update/modify operations per second? Thousands of update/modify operations per second?

Are the update/modify operations mission critical? For example, session/token data may be perceived as transient and not mission critical, as it is only relevant while a user is logged in. In this case, performance overrides any need to guarantee that the data will be available on DSA or machine failure.

How closely will the system administrators monitor the CA Directory deployment? When a data DSA, machine or OS fails, will the system administrators take action promptly?

What is the historical stability or expected availability of the hardware and/or OS that the CA Directory deployment will run on? For example, deploying to a closely controlled UNIX environment that has had 99.999% or better uptime for some years may provide enough confidence to deploy with a transaction log and no flushing, or with no transaction log at all. Deploying to a virtual environment without strict access or management control may mean you are more conservative and choose to deploy using a transaction log file with full flushing.
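Putting the settings above together, a sketch of the three operational modes as data DSA settings-file fragments; only one mode applies per DSA, and the defaults stated above apply when neither line is present:

```
# Mode 1 (default): transaction log with flushing.
# No settings needed; this is the behaviour when a data DSA is created.

# Mode 2: transaction log without flushing. The DSA restarts automatically
# after a DSA failure; a machine or OS failure requires the disaster
# recovery plan.
set disable-transaction-log-flush = true;

# Mode 3: no transaction log. Optimal write/modify performance; any DSA,
# machine or OS failure requires the disaster recovery plan.
set disable-transaction-log = true;
```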
Note that using a CA Directory transaction log file with flushing will likely provide equivalent, if not better, write performance than the disk I/O-bound database solutions provided by other LDAP vendors.
On a data DSA on server 1, you might define:

dump dxgrid-db period 0 7200;

On a peer data DSA on server 2, you might define:

dump dxgrid-db period 3600 7200;

Staggering the start offsets (0 and 3600 seconds) while keeping the same 7200-second period ensures the two peers do not take their backups at the same time.

Note that when using horizontally partitioned data DSAs on separate servers, or when data in a DSA has a logical relationship to data hosted in a separate distributed DSA, you should schedule the online backups for those data DSAs at the same time.

An online backup can be taken at any time by connecting to the data DSA via the DXconsole interface on the CA Directory server (i.e. telnet localhost <console-port>, where the console port and optional console connection details can be found in the data DSA knowledge file if using ASCII configuration, or via DXmanager if using XML configuration) and running the following command within the DXconsole interface:

dump dxgrid-db;
CA Directory r12 Data DSA Disaster Recovery plan when using multiwrite-DISP or DISP replication

The recovery timeline below is for DSA Host1-UserStore, with online peers Host2-UserStore and Host3-UserStore:

t0: DSA Host1-UserStore fails and requires DSA disaster recovery.

t1: Run dxdisp Host1-UserStore on Host1, Host2 and Host3.

t2: Perform an online backup of Host3-UserStore, creating Host3-UserStore.zdb, Host3-UserStore.zoc and Host3-UserStore.zat. Remove the Host1-UserStore data files, including the .dp and .tx files. Rename: Host3-UserStore.zdb -> Host1-UserStore.db, Host3-UserStore.zoc -> Host1-UserStore.oc, Host3-UserStore.zat -> Host1-UserStore.at.

t3: Copy the renamed data files to the Host1-UserStore data directory on Host1.

t4: Start DSA Host1-UserStore.
Notes

There is a need to dxdisp the recovering DSA on all machines (including the recovering machine) at step t1 BEFORE taking the online backup. This prevents the peer DSAs on those machines forwarding updates from before the time the backup was taken to the recovering machine when the DSA is started. The effect of running dxdisp Host1-UserStore on Host2, Host3, etc. is to update the date/time of the last communication that Host2-UserStore and Host3-UserStore had with Host1-UserStore to now in the .dp/.dx file.
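The file operations in steps t2 and t3 can be sketched as a shell fragment. This is a minimal sketch, assuming the data files live in a local ./data directory; in a real deployment the files are under $DXHOME/data on UNIX/Linux or %DXHOME%\data on Windows, the .z* files come from Host3's actual online backup, and the renamed files must then be copied across to Host1:

```shell
#!/bin/sh
set -e
DATA=./data   # stand-in for the real data DSA file location
mkdir -p "$DATA"

# Stand-ins for the files produced by the online backup of Host3-UserStore.
touch "$DATA/Host3-UserStore.zdb" "$DATA/Host3-UserStore.zoc" \
      "$DATA/Host3-UserStore.zat"

# Remove the failed DSA's old data files, including the .dp and .tx files.
rm -f "$DATA/Host1-UserStore.db" "$DATA/Host1-UserStore.oc" \
      "$DATA/Host1-UserStore.at" "$DATA/Host1-UserStore.dp" \
      "$DATA/Host1-UserStore.tx"

# Rename the backup files to the recovering DSA's file names.
mv "$DATA/Host3-UserStore.zdb" "$DATA/Host1-UserStore.db"
mv "$DATA/Host3-UserStore.zoc" "$DATA/Host1-UserStore.oc"
mv "$DATA/Host3-UserStore.zat" "$DATA/Host1-UserStore.at"
```

Running dxdisp on every host (step t1) must still happen before the backup is taken, as described in the notes above.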
CA Directory r12 Data DSA Disaster Recovery plan when using multiwrite replication

The recovery timeline below is for DSA Host1-UserStore, with online peers Host2-UserStore and Host3-UserStore:

t0: DSA Host1-UserStore fails and requires DSA disaster recovery.

t1: Clear the multiwrite queue for Host1-UserStore on Host2 and Host3.

t2: Perform an online dump of Host3-UserStore, creating Host3-UserStore.zdb, Host3-UserStore.zoc and Host3-UserStore.zat. Remove the Host1-UserStore data files, including the .dp and .tx files. Rename: Host3-UserStore.zdb -> Host1-UserStore.db, Host3-UserStore.zoc -> Host1-UserStore.oc, Host3-UserStore.zat -> Host1-UserStore.at.

t3: Copy the renamed data files to the Host1-UserStore data directory on Host1.

t4: Start DSA Host1-UserStore.
Notes

It is advised to clear the multiwrite queue for the recovering DSA on all machines (except the recovering machine) at step t1 BEFORE taking the online backup. This ensures that any updates applied much earlier than the online backup time in step t2 are not replicated when the recovering DSA is started in step t4.