
This presentation is for informational purposes only and may not be incorporated into a contract or agreement.

This document is for informational purposes. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described in this document remains at the sole discretion of Oracle. This document in any form, software or printed matter, contains proprietary information that is the exclusive property of Oracle. This document and information contained herein may not be disclosed, copied, reproduced or distributed to anyone outside Oracle without prior written consent of Oracle. This document is not part of your license agreement nor can it be incorporated into any contractual agreement with Oracle or its subsidiaries or affiliates.

Lawrence To & Joe Meeks


Oracle

Jeffrey McCormick
The Hartford

What They Didn't Print in the Doc


HA Best Practices by Gurus from Oracle's Maximum Availability Architecture Team

Agenda
Maximum Availability Architecture (MAA)
The Hartford and MAA
HA Best Practices, Tips and Results
  Turbocharged Data Guard
  Oracle Snapshots and Clones
  More Uptime for Planned Downtime
  Transparent Client Failover for Disaster Recovery

Maximum Availability Architecture - MAA


Oracle recommended architecture and best practices for High Availability
  Database, Application Server, Enterprise Manager, Collaboration Suite and Oracle Applications
Improved and validated with new Oracle versions, features and product suites
Focused on reducing unplanned and planned downtime
Focused on making customers successful

http://www.oracle.com/technology/deploy/availability/htdocs/maa.htm

Our Approach
Develop HA solutions and features
Work closely with different development teams
  Provide feedback early in the development cycles
  Integrate features and test before and after release
Deploy MAA on internal production systems
Design and influence future solutions and features
Partner with strategic infrastructure providers
Document in best practice books and white papers

35 Person Years of Effort & Growing

Strategic MAA Partners


Servers: Dell, HP
Network: F5, Qlogic, Foundry Networks, Emulex
Storage: Apple, Engenio, NetApp, HP, EMC

Our success measured by the response from customers like you . . .

Jeff McCormick
Senior Data Architect, The Hartford
$22.7 billion in revenue
Leading provider of investment products, life insurance, employee benefits, auto, homeowner & business insurance
Largest seller of individual annuities in U.S.
11,000 agencies, 100,000 broker/dealers
30,000 employees

Architecture Review
Focus on Business Continuity
Assess information technology architectures
Minimize/avoid planned & unplanned downtime
Rapid recovery/failover to remote location
Provide excellent service at lowest cost
Retain flexibility to incorporate new technology

The Hartford Future State


[Diagram of three sites: Primary Site, Secondary Site, Tertiary Site. Labels: Application Access; Real Application Cluster; Primary (x2); Database REDO (x2); Data Guard Standby (x4); Storage Array (x3); Media Server, Tape Drive, RMAN (x2).]

The Value of MAA to The Hartford


Simple . . .

Implement a High Availability solution that offers considerable savings in cost, resources, and time.

MAA Best Practices Lawrence To Oracle

Turbocharged Data Guard Disaster Recovery Solution for Oracle Databases

Data Guard Best Practices


Test results show significant out-of-the-box improvements with Data Guard Release 10.2
  Reduction of failover times, potential data loss and primary database impact
  More efficient redo transport
  Data Guard SYNC implementation has less impact than a remote mirroring implementation

New Data Guard Feature:


Fast-Start Failover
Automatic and fast
  Logical standby achieved < 20 seconds
  Physical standby achieved < 20 seconds
Old primary is reinstated automatically once connectivity is reestablished between the observer and the primary database
Attend Session 937, Best Practices for Automatic Failover Using Oracle Data Guard 10g Release 2

Data Guard Best Practices:


Switchover for Planned Maintenance
For fastest switchover (< 1 minute)
Prior to switchover
  A physical standby transitioning from read only back to Redo Apply should be restarted
  Disconnect all sessions and stop job processing
  Shutdown abort for all secondary RAC instances
  Enable real-time apply on the standby database and ensure the standby is synchronized or caught up with the primary database
For manual switchovers, open the new primary directly from the mount state
Or, simulate a Fast-Start Failover: complete transactions and shutdown abort all primary instances

Data Guard Best Practices:


Faster Redo Transport
Set SDU=32K
Tune network parameters that affect network buffer sizes and queue lengths
Ensure sufficient network bandwidth for maximum database redo rate + other activities
Note: Please refer to MAA paper, Oracle9i Data Guard: Primary Site and Network Configuration Best Practices
http://www.oracle.com/technology/deploy/availability/pdf/MAA_DG_NetBestPrac.pdf

Oracle 10g Release 2 paper coming soon

Data Guard Best Practices:


Tune Network Parameters
Send and receive buffer size = 3 x bandwidth delay product (BDP)
BDP = 1,000 Mbps * 25 ms (0.025 secs) = 1,000,000,000 bits/sec * 0.025 secs = 25,000,000 bits; 25,000,000 bits / 8 = 3,125,000 bytes
Buffer size = 3 x 3,125,000 bytes = 9,375,000 bytes

Tune network device queues to eliminate packet losses and waits. Set device queues to a minimum of 10,000 (default 100)
* BDP = the product of the estimated minimum bandwidth and the round trip time between the primary and standby server
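
The buffer sizing rule above can be checked with a few lines of Python. This is a minimal sketch, not an Oracle tool: the function names are hypothetical, and the 3x multiplier and the 1,000 Mbps / 25 ms inputs are simply the values quoted on the slide. Integer arithmetic is used so the result is exact.

```python
def bdp_bytes(bandwidth_bits_per_sec: int, rtt_ms: int) -> int:
    """Bandwidth delay product in bytes.

    bits/sec * (ms / 1000) gives bits in flight; dividing by 8 gives bytes.
    """
    return bandwidth_bits_per_sec * rtt_ms // (1000 * 8)


def buffer_bytes(bandwidth_bits_per_sec: int, rtt_ms: int, multiplier: int = 3) -> int:
    """Send/receive buffer size as a multiple of the BDP (3x per the slide)."""
    return multiplier * bdp_bytes(bandwidth_bits_per_sec, rtt_ms)


# Slide example: 1,000 Mbps link with a 25 ms round trip time.
bdp = bdp_bytes(1_000_000_000, 25)
buf = buffer_bytes(1_000_000_000, 25)
print(f"BDP = {bdp:,} bytes; buffer = {buf:,} bytes")
# prints "BDP = 3,125,000 bytes; buffer = 9,375,000 bytes"
```

Substituting the measured minimum bandwidth and round trip time between the primary and standby servers gives the value to configure for the send and receive buffers.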

Impact of Network Tuning

[Chart: network throughput in Mbits/sec. Default: 10.8. Tuned: 937.]

Oracle MAA Test Result

Data Guard Release 10.2


Redo Transport Improvements
Increased network write sizes to 10 MB to better utilize network capacity for both ARCH and LNS
Full decoupling of LGWR and LNS processes
  No more waits during log switches
  No more waits when LNS buffer is full
Intra-file parallelism support for ARCH
Up to 29 parallel remote archive processes
Dedicated local ARCH

Faster ASYNC Transport


[Chart: time in seconds to transfer 1 GB of redo at network latencies of 0 ms, 10 ms, 50 ms and 100 ms, comparing 10gR2 with previous versions. Bar labels as extracted: 52, 63, 54, 77, 74, 155, 102, 264.]

ARCH Performance Gains


ARCH Intrafile Parallelism
[Chart: effective transfer rate (MB/sec) by number of parallel ARCH processes. Bar labels as extracted: 27.8, 23.0, 17.1, 12.8, 24.4.]

Data Guard Best Practices:


Gap Resolution and Data Loss
For fastest gap resolution
  Leverage intra-file archive parallelism
  Follow tips for tuning redo transport to improve network utilization

To minimize data loss
  Use SYNC transport with a low-latency, high-bandwidth network
  For ASYNC transport, follow tips for tuning redo transport

Example: Less than 7 seconds of data loss exposure for high redo rates of 2-12 MB/sec with <=25 ms latency in our tests

Data Guard Best Practices:


Reduce Overhead on Primary
New Data Guard 10g Release 2 ASYNC Transport
  Less primary overhead across different latencies and throughput
  NEW: LNS reads directly from the Online Redo Logs

Best Practice
  Allocate additional I/O bandwidth for Online Redo Log Files

Performance Gains
  For redo rates less than 2 MB/sec, there is less than 5% impact on the primary database across different latencies
  For very high redo rates of 20 MB/sec, less than 10% impact on primary database even with latencies of 50 and 100 ms
  Overall, Oracle 10g Release 2 database throughput (redo rate) was 2-3 times faster than 10gR1 at high redo rates and latencies

Data Guard Best Practices:


Reduce Overhead on Primary
Offload Backups to Standby Database
  Eliminate backup overhead on primary database
  RMAN enables hot backups of the standby database

Best Practices
  Use Redo Apply (Physical Standby)
  For simplicity, use identical directory structures on the primary and standby databases
    Directory structures can be different; see best practice paper for details
  Use RMAN Recovery Catalog so that backups taken on one database server can be restored on another
  Use a catalog server physically separate from primary and standby sites
  Reference MAA RMAN/Data Guard best practices paper
    http://www.oracle.com/technology/deploy/availability/pdf/RMAN_DataGuard_10g_wp.pdf

Data Guard Sync Transport Less Overhead than Remote Mirroring


RTT      Data Guard       Remote Mirroring
0 ms     4% DB impact     3% DB impact
10 ms    4% DB impact     26% DB impact
15 ms    No data          39% DB impact
20 ms    10% DB impact    No data

Actual Customer Test Data

Data Guard Advantage Because


DG only transmits the redo in contrast to all the DB writes
  DBWR: database writer
  LGWR: log writer
  ARCH: archiver
  RVWR: flashback log writer

Higher wait times for DBWR (db file parallel writes) result in
  Contention for free buffers
  Increase in buffer busy waits

Oracle Snapshots and Clones


An alternative to third-party snapshots and database clones

Database Restore Points:


Database Snapshots
Business Needs
  Database snapshots for quick backups and restores
  Fast, instantaneous snapshots with little overhead

Oracle Solution
  Database [guaranteed] restore points
    Create restore point <snap1> [guarantee flashback database]

  A restore point captures only one before-image block for each changed block, regardless of how many times that block is changed
  Time to flash back to a restore point is proportional to copying the changed blocks back and applying a small amount of recovery
  Not appropriate as a replacement for full backups
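
The before-image behaviour described above can be illustrated with a toy model. This is only a sketch of the idea, not how Oracle implements flashback logging: the `ToyDatabase` class and its methods are hypothetical, and it assumes a fixed set of blocks held in memory.

```python
class ToyDatabase:
    """Toy block store illustrating restore-point before-images (not Oracle internals)."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)   # block id -> current contents
        self.before_images = None    # None means no restore point exists

    def create_restore_point(self):
        # Instantaneous: nothing is copied when the restore point is created.
        self.before_images = {}

    def write(self, block_id, value):
        # Save the before-image only the first time a block changes
        # after the restore point; later changes to it save nothing more.
        if self.before_images is not None and block_id not in self.before_images:
            self.before_images[block_id] = self.blocks[block_id]
        self.blocks[block_id] = value

    def flashback(self):
        # Work is proportional to the number of changed blocks, not to
        # the database size or to the number of writes.
        if self.before_images is None:
            raise RuntimeError("no restore point exists")
        restored = len(self.before_images)
        self.blocks.update(self.before_images)
        self.before_images = {}
        return restored


db = ToyDatabase({i: "v0" for i in range(100)})
db.create_restore_point()
for version in ("v1", "v2", "v3"):
    db.write(7, version)             # same block changed three times
db.write(9, "v1")                    # a second block changed once
restored = db.flashback()
print(restored)                      # prints 2
```

Four writes touched two blocks, so only two before-images were kept and only two blocks were copied back, which is why flashback time tracks the volume of changed blocks.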

Database Restore Points:


An alternative for snapshots
Database Restore Points
No additional cost from a different vendor
Creating restore point is instantaneous
No hot backup is required
Database consistent after flashback
Less system resources than a full backup if flashback is disabled
Leverage as a fallback or checkpoint mechanism to protect from logical failures or for quick restores in test environments

Best Practice: Monitor space and I/O performance


Monitor space utilization from v$restore_point and v$flashback_database_stat
Monitor for high flashback buffer free wait events
More benefits for larger databases

Database Restore Points: Use Cases


Fast fallback for database patches, upgrades, application changes or batch jobs
  Upgrade from 10.1.0.4 to 10.2.0.1 ==> 1+ hours
  Flashback to the state prior to the upgrade ==> 2 minutes, compared to hours to restore the database

Quick restore of test environments to original state


Change 1% (5 GB), Flashback ==> 10 minutes compared to 100 minutes to restore the 500 GB database

Data Guard, Flashback and RMAN:


Database Clones
Business need
  Users need copies or clones of their primary database for testing, development, reporting
  Typically clones are refreshed daily or weekly

Oracle has all the tools to create a clone without the need of third party products

Data Guard, Flashback and RMAN:


Creating and Resyncing a Clone
[Diagram: a physical standby database becomes a read/write clone of the primary and is later returned to a physical standby.]
1. Create restore point
2. Activate standby for testing (Standby >> Clone)
3. Flashback clone to restore point (Clone >> Standby)
4. Resync with incremental backup or archives from primary

Steps to Clone and Resync


Step 1: Activate Clone
  Create Restore Point Guarantee Flashback Database (instantaneous)
  Activate Standby Database (clone)

Step 2: Use Clone for Read-Write Testing

Step 3: Resync Clone
  Flashback to Restore Point
  Create Incremental Backup from the Primary containing all changes since the time of the restore point
  Apply Incremental to the clone
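
The three steps above lend themselves to scripting. The sketch below only assembles an ordered runbook of command text; it does not connect to a database. The function name, the restore point name, the SCN and the staging directory are all hypothetical, and the SQL and RMAN statement text is an assumption based on Oracle Database 10g Release 2 syntax that should be verified against the documentation for your release. Supporting steps such as mounting the database before flashback, finding the restore point SCN, and cataloging the backup pieces on the clone are omitted. The conversion back to a physical standby is not listed in the steps but is implied by "Clone >> Standby" in the diagram.

```python
def clone_resync_runbook(restore_point: str, restore_point_scn: int, stage_dir: str):
    """Return (where, tool, command) tuples for the clone and resync cycle.

    All command text is illustrative and must be checked before use.
    """
    return [
        # Step 1: activate clone
        ("standby", "SQL",
         f"CREATE RESTORE POINT {restore_point} GUARANTEE FLASHBACK DATABASE"),
        ("standby", "SQL", "ALTER DATABASE ACTIVATE STANDBY DATABASE"),
        # Step 2: use clone for read-write testing
        ("clone", "manual", "run read-write tests"),
        # Step 3: resync clone
        ("clone", "SQL", f"FLASHBACK DATABASE TO RESTORE POINT {restore_point}"),
        ("clone", "SQL", "ALTER DATABASE CONVERT TO PHYSICAL STANDBY"),
        ("primary", "RMAN",
         f"BACKUP INCREMENTAL FROM SCN {restore_point_scn} DATABASE "
         f"FORMAT '{stage_dir}/resync_%U'"),
        ("clone", "RMAN", "RECOVER DATABASE NOREDO"),
    ]


steps = clone_resync_runbook("before_test", 123456, "/tmp/stage")
for where, tool, command in steps:
    print(f"[{where:7}] {tool:6} {command}")
```

Keeping the sequence as data makes it easy to review, log, and later hand to whatever execution mechanism a site already uses.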

Clone Performance: Resync vs Recreate
[Chart: time in minutes. Resync Clone (Parallel): 18.47. Resync Clone (Serial): 23.68. Recreate from Primary: 97.]

Data Guard, Flashback and RMAN:


Database Clones
Oracle clones can be used as an alternative to third party database cloning solutions
  No additional cost from a different vendor
  All features are present in Oracle to create and resync a clone
  Steps need to be scripted and automated
  Targeting Enterprise Manager wizard for the future

Best Practices:
  Compare performance between Oracle and current approaches
  Sufficient IO bandwidth and storage implies faster flashback and resync performance
  Enable block change tracking on the primary
  Use RMAN parallelism

More Uptime During Planned Downtime

Reducing Planned Downtime


Best Practice:
  Pick the right strategy
  Test, test, test and automate

Reducing Planned Downtime


Best Practices
Dynamic resource provisioning, online redefinition and reorganization reduce planned downtime
  Detect new processors from an SMP server
  Dynamically grow, shrink and tune memory
  Table and index modifications
  How?: Automatic Shared Memory Management
  How?: Online physical and logical table changes
  How?: Online index operations

Reducing Planned Downtime


Best Practices
ASM eliminates downtime for
  Storage maintenance and storage migration
  How?: Automatic data rebalance

RAC rolling upgrade eliminates downtime for
  Patching and system maintenance
  How?: Rolling upgrade with qualified patches
  How?: Service relocation

Reducing Planned Downtime


Best Practices
Data Guard SQL Apply minimizes downtime:
  Node, system, cluster, and site maintenance
  Database upgrades
  How?: Fast switchover < 5 minutes and no additional downtime for upgrade steps
  How?: Rolling upgrades (starting w/ 10.1.0.3)

Best upgrade approach if RAC rolling upgrade is not possible and there are no data type restrictions

Reducing Planned Downtime


Best Practices
Streams approach eliminates or minimizes downtime for
  Database upgrades
  Platform migration (e.g. Windows to Linux)
  Character set migration
  How?: Support heterogeneous versions in active/active mode
  How?: Support heterogeneous platforms
  How?: Automatic conversion between character sets

Best upgrade approach for customers that are currently using Streams

Data Type Restrictions


Data Guard SQL Apply and Streams
Unsupported data types
  BFILE, ROWID, user-defined types
  Collections and VARRAYs
  XML types
  Multimedia types

With Streams, you can work around some data type restrictions:
  Use triggers to capture changes from an unsupported table to a shadow table that has supported data types
  Replicate the shadow table changes
  Use customized apply to apply the changes to the original tables on the target database

Reducing Planned Downtime


Best Practices
Transportable Tablespace reduces planned downtime
  Platform migration (e.g. Windows to Linux)
  Database upgrade
  How?: Cross-platform datafile conversion
  How?: Transport tablespaces to new version

Transportable Tablespace to Minimize Downtime for Upgrades


When to use
  Logical standby and Streams are not best-fit solutions
  Time to run the upgrade or migration scripts is greater than the time to export and import the metadata

Phase 1: Preparation
  1. Create shell of target database using new version
  2. Create schemas in target database
  3. Create physical standby if source and target hosts are different

Phase 2: Transport database
  Source database
    1. Remove transport violations, if any
    2. Make user tablespaces read-only
    3. Export tablespace metadata
  Target database
    1. Recover standby and shutdown
    2. Use datafiles for target database
    3. Import tablespace metadata
    4. Make user tablespaces read-write

Transportable Tablespace to Minimize Downtime for Upgrades


Customer Example
AMADEUS
  Upgrade electronic ticketing system from Oracle 9.2.0.3 on HP N Class to 10.1.0.4 on HP Superdome
  Total downtime: 8 minutes (compared to 25 minutes for a normal upgrade)
  http://www.oracle.com/technology/deploy/availability/pdf/AmadeusProfile_TTS.pdf
  http://www.oracle.com/technology/deploy/availability/pdf/AmadeusProfile.pdf

Transparent Client Failover for Disaster Recovery

Client Failover Oracle Database 10g Release 2


Fast Application Notification Prerequisites
Oracle 10g Release 2 OCI Clients
  Server Side TAF enabled with AQ_HA_NOTIFICATION=TRUE
  FAN OCI event using AQ notifies OCI mid-tier clients automatically

Oracle 10g JDBC clients
  Fast Connection Failover enabled

Client Failover with Data Guard 10g Release 2: Validated Solution


Data Guard failover can complete in seconds
DB_ROLE_CHANGE database trigger can be configured to automatically . . .
  1. Enable production database services using DBMS_SERVICE
  2. Change LDAP or DNS or some naming service to ensure that clients reconnect to the new available primary database
  3. Call any other application pre-failover steps
  4. Notify JDBC clients with external program to publish FAN ONS events

FAN OCI event using AQ notifies OCI mid-tier clients automatically (10gR2)

MAA Best Practice Home Page


http://www.oracle.com/technology/deploy/availability/htdocs/maa.htm
HA Best Practices for Oracle Database
  Oracle Database High Availability Overview 10g Release 2 - Documentation
  Oracle Database High Availability Architecture and Best Practices 10g Release 1 - Documentation
  Oracle Database 10g Best Practices: Data Guard Redo Apply and Media Recovery
  Oracle Database 10g Best Practices: Data Guard SQL Apply
  Oracle Database 10g Best Practices: Data Guard Role Transitions and Streams
  Using Recovery Manager with Oracle Data Guard in Oracle Database 10g
  Oracle Database 10g Best Practices: Migration to Automatic Storage Management (ASM)
  Best Practices for Creating a Low-Cost Storage Grid for Oracle Databases
  Oracle9i Data Guard: Primary Site and Network Configuration Best Practices
  Oracle9i Fast-Start Checkpointing Best Practices

HA Best Practices for Oracle Application Server


  OracleAS 10g Infrastructure Highly Available Architectures
  Highly Available Distributed Identity Management
  Highly Available Identity Management Deployment Example - Rack Mounted Identity Management
  Highly Available Identity Management Deployment Example - Cold Failover Cluster Identity Management
  Configuring Highly Available OracleAS Infrastructure With F5's BIG-IP Load Balancer
  Oracle9i Application Server Cold Failover Cluster Infrastructure Upgrade to Oracle Application Server 10g Cold Failover Cluster
  Transformation From A Single Host Oracle Application Server Infrastructure To An Oracle Application Server 10g Cold Failover Cluster

HA Best Practices for Oracle Applications & Oracle Collaboration Suite


  Configuring Oracle Applications Release 11i with 10g RAC and 10g ASM
  E-Business Suite 11i on RAC: Configuring Database Load balancing & Failover
  Oracle E-Business Suite Release 11i with 9i RAC: Installation and Configuration using AutoConfig
  Business Continuity for Oracle Applications Release 11i
  Oracle Collaboration Suite High Availability Configuration Release 2 (9.0.4) for UNIX and Linux

HA Best Practices for Oracle Grid Control


  Configuring Enterprise Manager for High Availability
  Enterprise Manager 10g Backup, Recovery and Disaster Recovery Considerations

High Availability Demos/Sessions From Oracle Development


Demogrounds - Monday, Sep 19 to Thursday, Sep 22
  Oracle Data Guard
  ILM and Storage
  Oracle Secure Backup
  RMAN, Flashback, and Online Redefinition

Sessions - Monday, Sep 19


  1:30-2:30 pm, Room 303 - Optimizing Linux I/O
  3:00-4:00 pm, Room 104 - The Future of Database Information Technology
  4:30-5:30 pm, Room 103 - What They Didn't Print in the Doc - HA Best Practices by Gurus from Oracle's Maximum Availability Architecture Team

Sessions - Tuesday, Sep 20


  3:00-4:00 pm, Room 104 - Logical Standby Unleashed
  4:30-5:30 pm, Room 104 - Best Practices for Oracle Database 10g Backup and Recovery

High Availability Sessions From Oracle Development


Sessions - Wednesday, Sep 21
  11:00 am-12:00 pm, Room 104 - Improve Your Tape Backup Results with Oracle Secure Backup
  3:00-4:00 pm, Room 304 - Implementing Information Lifecycle Management (ILM) using the Oracle Database

Sessions - Thursday, Sep 22


  1:00-2:00 pm, Room 104 - Minimizing Application Development Time Using Flashback: A Customer Case Study
  2:30-3:30 pm, Room 104 - Best Practices To Achieve Business Continuity Using Oracle Applications and Oracle Database Technology
  4:00-5:00 pm, Room 104 - Best Practices for Automatic Failover Using Oracle Data Guard 10g Release 2

QUESTIONS & ANSWERS
