SURENDER SARA
NCOAUG
Email :
SURENDER.SARA@ORABYTE.COM
SURENDER.SARA@SERACONSULTING.US
Two Node Architecture, Unprotected
Two Node Architecture, Protected Apps tier and unprotected DB tier
Two Node Architecture, Protected Apps tier and DB tier
Failover Cluster
• Detecting failure by monitoring the heartbeat and checking status of resources
• Reorganizing Cluster membership in the cluster manager
• Transferring Disk ownership from primary node to secondary node
• Mounting the FS on secondary node
• Starting DB instance
• Recovering the Database and rollback of uncommitted data
• Reestablishing the client connections to the failover node
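The failover steps above run strictly one after another, so a stall in any one of them delays everything behind it. A minimal Python sketch of that ordering follows; the step names come from the slide, while `run_failover` and the per-step callables are hypothetical placeholders rather than any cluster product's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Step names follow the slide, in the order they are listed there.
FAILOVER_STEPS = [
    "detect failure",
    "reorganize cluster membership",
    "transfer disk ownership",
    "mount file systems",
    "start DB instance",
    "recover database",
    "reconnect clients",
]


@dataclass
class FailoverRun:
    completed: List[str] = field(default_factory=list)
    failed_at: Optional[str] = None


def run_failover(actions: Dict[str, Callable[[], bool]]) -> FailoverRun:
    """Run each step in order and stop at the first step that reports failure.

    `actions` maps a step name to a hypothetical callable returning True on
    success; steps without an entry are treated as succeeding.
    """
    run = FailoverRun()
    for step in FAILOVER_STEPS:
        action = actions.get(step, lambda: True)
        if not action():
            run.failed_at = step
            break
        run.completed.append(step)
    return run


if __name__ == "__main__":
    # Simulate a failover in which mounting the file system fails.
    result = run_failover({"mount file systems": lambda: False})
    print(result.completed, result.failed_at)
```

Because each step depends on the one before it, client outage time in a failover cluster is roughly the sum of all step durations.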
FAILOVER CLUSTER OFFERINGS
• Veritas cluster server
• HP Service Guard
• Microsoft Cluster Service with Oracle Failsafe
• RedHat Linux Advanced Server 2.1
• Sun Cluster Oracle Agent
• Compaq, now HP, Segregated Cluster
• HACMP
RAC
Scalable RAC
Real Application Cluster
• Many instances of Oracle running on many nodes
• Multiple instances share a single physical database
• All instances have common data, control, and initialization files
• Each instance has its own log files and rollback segments or undo tablespaces, all kept on shared storage
• All instances can simultaneously execute transactions against the
single database
• Caches are synchronized using Oracle’s Global Cache Management
technology (Cache Fusion)
RAC Building Blocks
• Instance and Database files
• Shared storage with OCFS, CFS or raw devices
• Redundant HBA cards per HOST
• Redundant NIC cards per HOST, one for cluster interconnect and one
for LAN connectivity
• Local RAID protected drives for ORACLE_HOMEs (OCFS does not support ORACLE_HOME install)
CLUSTER INTERCONNECT FUNCTION
• Monitoring health, status and message synchronization
• Transporting Distributed Lock Manager messages
• Accessing remote file systems
• Moving application-specific traffic
• Providing cluster alias routing
Interconnect Requirements
• Low latency for short messages
• High speed and sustained data rates for large messages
• Low host CPU utilization
• Flow control, error control and heartbeat continuity monitoring
• Switched network that scales well
INTERCONNECT PRODUCTS
• Memory Channel
• SMP Bus
• Myrinet
• Sun SCI
• Gigabit Ethernet
• InfiniBand Interconnect
INTERCONNECT PROTOCOL
• TCP/IP
• UDP
• VIA
• RDG
• HMP
IO CHANNEL HBA Products
• Adaptec
• DPT
• LSI Logic
• Interphase
• Qlogic
• Emulex
• JNI
FABRIC SWITCHES
• McDATA
• EMC
• QLOGIC
• BROCADE
CLUSTER NODES
NUMA
SMP
• Shared system bus and IO
• Expensive, with scalability problems
• Adding more CPUs can require upgrading other architecture components
• DELL and HP-Compaq
BLADE Servers
• BladeFrame system from Egenera
• Egenera: 24 2-way and 4-way SMP processing resources
• Egenera: redundant central controllers, redundant high-speed interconnects, PAN Manager
• Egenera: PAN Manager handles external storage mapping and virtualization
• Egenera: PAN Manager handles IO and network traffic to and from individual servers
Oracle’s High Availability (HA) Solution Stack
• System Failure: Real Application Clusters (Continuous Availability for all Applications)
• examples :
• tar --o_direct -cvf /tmp/backup.tar *
11i Steps -2 ( install OS Patches)
• rpm -Uv fileutils-4.1-4.2.i386.rpm
• This provides an updated version of cp and dd
• Allows a user to copy files from a running database on
OCFS
• examples :
• cp --o_direct /ocfs/quorum.dbf /tmp/backup/quorum.dbf
• dd o_direct=yes if=/ocfs/quorum.dbf of=/tmp/backup/quorum.dbf
11i Steps -3 Install Oracle-provided RPMs
• ocfs-support-1.0.9-11.i686.rpm
• ocfs-tools-1.0.9-11.i686.rpm
• j2sdk-1_3_1_09-linux-i586.rpm.bin
• unzip-5.50-30.i386.rpm
• zip-2.3-10.i386.rpm
• wu-ftpd-2.6.1-21.i386.rpm
• hangcheck-timer-2.4.9-e.10-0.4.0-2.i686.rpm
• hangcheck-timer-2.4.9-e.10-enterprise-0.4.0-2.i686.rpm
11i steps -3 ( interconnect)
• ifconfig eth0:0 192.168.2.100
• route add -host 192.168.2.100 dev eth0:0
• Do this on each node
• Create watchdog file (Oracle installer checks for this to install the cluster option)
# touch /dev/watchdog
• Set up hangcheck-timer module
– # vi /etc/modules.conf
– options hangcheck-timer hangcheck_tick=30 hangcheck_margin=180
– # modprobe hangcheck-timer
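The two hangcheck-timer options are usually read together: as commonly documented, the module resets a node once a hang lasts longer than roughly hangcheck_tick plus hangcheck_margin seconds. The sketch below parses the options line from the slide and adds the two values. The helper names are hypothetical, and the reset rule is stated here as an assumption, not as a guarantee about any particular kernel build.

```python
def parse_module_options(line: str) -> dict:
    """Parse 'options <module> key=value ...' into a {key: int} mapping."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "options":
        raise ValueError("not a modules.conf options line")
    params = {}
    for token in tokens[2:]:
        key, _, value = token.partition("=")
        params[key] = int(value)
    return params


def hang_threshold_seconds(params: dict) -> int:
    # Assumption: the node is reset when a hang exceeds tick + margin seconds.
    return params["hangcheck_tick"] + params["hangcheck_margin"]


line = "options hangcheck-timer hangcheck_tick=30 hangcheck_margin=180"
params = parse_module_options(line)
print(hang_threshold_seconds(params))  # 30 + 180 = 210
```

With the slide's values the sum is 210 seconds, the same figure this deck uses for MissCount in cmcfg.ora.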
11i steps -5 OCFS.conf – 5
• # ocfstool ( from x windows)
• # ocfs config
• # Ensure this file exists in /etc
• #
• node_name = linux3.home.com
• node_number =
• ip_address = 192.168.1.100
• ip_port = 7000
• comm_voting = 1
• guid = 9D3B77AF2FF26E92E25D00E04CA44B58
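Since ocfs.conf must be present and consistent on every node, a quick check of its contents can help before mounting. The following is a minimal sketch that parses the key = value layout shown above and reports which keys are blank; `parse_conf` and `blank_keys` are hypothetical helpers, and the sample text is copied from the slide.

```python
def parse_conf(text: str) -> dict:
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key = value line
        settings[key.strip()] = value.strip()
    return settings


def blank_keys(settings: dict) -> list:
    """Return the keys that are present but have no value."""
    return [key for key, value in settings.items() if value == ""]


SAMPLE = """\
# ocfs config
# Ensure this file exists in /etc
#
node_name = linux3.home.com
node_number =
ip_address = 192.168.1.100
ip_port = 7000
comm_voting = 1
guid = 9D3B77AF2FF26E92E25D00E04CA44B58
"""

settings = parse_conf(SAMPLE)
print(settings["node_name"], blank_keys(settings))
```

The sketch only reports blank keys; it does not decide whether a blank value, such as node_number on the slide, is acceptable.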
11i Steps -6 install OCFS
• mkfs.ocfs -F -b 128 -L /s01 -m /s01 -u 500 -g 500 -p 0755 /dev/sda1
• srvconfig_loc=/s01/oragsd-config ( touch this file)
11i steps -7 OCM
• Configure and Start Cluster Manager
• $ cd $HOME/product/9.2/oracm/admin
• $ ls
• If cmcfg.ora exists:
• $ cp cmcfg.ora cmcfg.ora.original
• If cmcfg.ora does not exist:
• $ cp cmcfg.ora.tmp cmcfg.ora
• $ echo HostName=dc1node3inter >> cmcfg.ora
• $ vi cmcfg.ora
• [comment out WatchdogSafetyMargin and WatchdogTimerMargin]
• PrivateNodeNames=linux22 linux33
• PublicNodeNames=linux2 linux3
• MissCount=210
• KernelModuleName=hangcheck-timer
• CmDiskFile=/u02/oracm-qourum
• $ vi ocmargs.ora
• [comment out first line, which contains the word “watchdogd”]
• $ cd ../bin
• $ cp ocmstart.sh ocmstart.sh.original
• $ vi ocmstart.sh
• [remove words “watchdog and” from line containing “Sample startup script...”]
• [remove every line containing “watchdogd”, uppercase or lowercase. If it is inside an if/then/fi block, remove the whole if/then/fi.]
• $ su - root
• export ORACLE_HOME=/d02/oracle/proddb/9.2.0
• /d02/oracle/proddb/9.2.0/oracm/bin/ocmstart.sh
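The cmcfg.ora edits in this step (comment out the watchdog parameters, then set the hangcheck-timer values) can also be scripted so they are applied the same way on every node. The sketch below works on the file contents as a string. `update_cmcfg` is a hypothetical helper, the starting file contents are made up for illustration, and the use of `#` as the comment character is an assumption.

```python
def update_cmcfg(text: str, settings: dict, comment_out: set) -> str:
    """Comment out the named parameters and set or append the given ones."""
    out = []
    seen = set()
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key in comment_out:
            out.append("#" + line)  # assumption: '#' starts a comment
        elif key in settings:
            out.append(f"{key}={settings[key]}")
            seen.add(key)
        else:
            out.append(line)
    # Append any parameter that was not already present in the file.
    for key, value in settings.items():
        if key not in seen:
            out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


# Hypothetical starting contents; a real cmcfg.ora will differ.
original = "WatchdogSafetyMargin=5000\nWatchdogTimerMargin=60000\nMissCount=20\n"

result = update_cmcfg(
    original,
    settings={
        "MissCount": "210",
        "KernelModuleName": "hangcheck-timer",
        "CmDiskFile": "/u02/oracm-qourum",
    },
    comment_out={"WatchdogSafetyMargin", "WatchdogTimerMargin"},
)
print(result)
```

Keeping a copy of the original file, as the slide does with cmcfg.ora.original, remains a sensible safeguard whichever way the edit is made.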
11i steps -4 (cp/dd - DB files to shared storage)
• cp --o_direct /d03/oracle/proddata/* /s01/oracle/proddata/
• Recreate the controlfile
11i steps 8 – init.ora / spfile
• Create UNDO TBS for each instance
• Enable and disable thread for instance 2 from instance 1 and vice versa
11i steps 9 – instance 1
• # RAC-specific Parameters
• #
• #########
• cluster_database = true
• cluster_database_instances=2
• thread = 1
• instance_number = 1
• instance_name = PRODi1
• service_names = PROD
• local_listener = PRODi1
• remote_listener = PRODi2
11i steps 10 – instance 2
• cluster_database = true
• cluster_database_instances=2
• thread = 2
• instance_number = 2
• instance_name = PRODi2
• service_names = PROD
• local_listener = PRODi2
• remote_listener = PRODi1
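The two parameter sets differ only in the instance number and in which instance is local and which is remote. A small sketch that derives both from the database name makes that symmetry explicit. `two_node_rac_parameters` is a hypothetical helper that covers only the two-instance layout and the naming convention (PRODi1, PRODi2) used on these slides.

```python
def two_node_rac_parameters(db_name: str, instance_number: int) -> list:
    """Return the RAC-specific init.ora lines for one of two instances."""
    if instance_number not in (1, 2):
        raise ValueError("this sketch only covers the two-instance layout")
    local = f"{db_name}i{instance_number}"
    remote = f"{db_name}i{3 - instance_number}"  # the other instance
    return [
        "cluster_database = true",
        "cluster_database_instances=2",
        f"thread = {instance_number}",
        f"instance_number = {instance_number}",
        f"instance_name = {local}",
        f"service_names = {db_name}",
        f"local_listener = {local}",
        f"remote_listener = {remote}",
    ]


for number in (1, 2):
    print(f"# instance {number}")
    print("\n".join(two_node_rac_parameters("PROD", number)))
```

The local_listener and remote_listener values are aliases, so they need matching entries in tnsnames.ora on the database tier.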
11i Apps tier – 806/iAS tnsnames.ora
• PROD = (DESCRIPTION=
• (ADDRESS_LIST =
• (ADDRESS=(PROTOCOL=tcp)(HOST=linux1)(PORT=1521))
• (ADDRESS=(PROTOCOL=tcp)(HOST=linux2)(PORT=1521))
• )
• (CONNECT_DATA=(SERVICE_NAME=PROD)(SERVER=DEDICATED))
• )
• PRODi2 = (DESCRIPTION=
• (ADDRESS=(PROTOCOL=tcp)(HOST=linux2)(PORT=1521))
• (CONNECT_DATA=(INSTANCE_NAME=PRODi2)(SERVICE_NAME=PROD))
• )
• PRODi1 = (DESCRIPTION=
• (ADDRESS=(PROTOCOL=tcp)(HOST=linux1)(PORT=1521))
• (CONNECT_DATA=(INSTANCE_NAME=PRODi1)(SERVICE_NAME=PROD))
• )
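Hand-typed tnsnames.ora entries often fail because of an unbalanced parenthesis. As a hedged illustration, the sketch below builds the same two kinds of entry from host names: a service alias listing every node, and a per-instance alias. The function names are hypothetical, and the output is a single-line form of the entries shown above.

```python
def address(host: str, port: int = 1521) -> str:
    return f"(ADDRESS=(PROTOCOL=tcp)(HOST={host})(PORT={port}))"


def service_alias(service: str, hosts: list) -> str:
    """Alias that lists every node, so a connection can try each address."""
    addresses = "".join(address(h) for h in hosts)
    return (
        f"{service} = (DESCRIPTION=(ADDRESS_LIST={addresses})"
        f"(CONNECT_DATA=(SERVICE_NAME={service})(SERVER=DEDICATED)))"
    )


def instance_alias(instance: str, service: str, host: str) -> str:
    """Alias that pins the connection to one named instance."""
    return (
        f"{instance} = (DESCRIPTION={address(host)}"
        f"(CONNECT_DATA=(INSTANCE_NAME={instance})(SERVICE_NAME={service})))"
    )


print(service_alias("PROD", ["linux1", "linux2"]))
print(instance_alias("PRODi1", "PROD", "linux1"))
print(instance_alias("PRODi2", "PROD", "linux2"))
```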
Modify DBC file for Failover
• APPS_JDBC_DRIVER_TYPE=THIN
• FND_MAX_JDBC_CONNECTIONS=100
# Setup at Apps Tier
• APPS_JDBC_URL=jdbc:oracle:thin:@(DESCRIPTION=
(ADDRESS_LIST=(LOAD_BALANCE=ON)
(ADDRESS=(PROTOCOL=TCP)(HOST=linux1)(PORT=1521))
(ADDRESS=(PROTOCOL=TCP)(HOST=linux2)(PORT=1521)))
(CONNECT_DATA=(SERVICE_NAME=prod)))
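The APPS_JDBC_URL value is one long connect descriptor, which is easy to break when adding a node. A minimal sketch that assembles it from a host list follows. `jdbc_url` is a hypothetical helper, and the output matches the slide's value once its lines are joined.

```python
def jdbc_url(service: str, hosts: list, port: int = 1521,
             load_balance: bool = True) -> str:
    """Build a thin-driver JDBC URL with one ADDRESS per RAC node."""
    addresses = "".join(
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))" for host in hosts
    )
    balance = "(LOAD_BALANCE=ON)" if load_balance else ""
    return (
        "jdbc:oracle:thin:@(DESCRIPTION="
        f"(ADDRESS_LIST={balance}{addresses})"
        f"(CONNECT_DATA=(SERVICE_NAME={service})))"
    )


url = jdbc_url("prod", ["linux1", "linux2"])
print("APPS_JDBC_URL=" + url)
```

Adding a third node then means adding one host name to the list instead of editing the descriptor by hand.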
WHAT can & cannot failover
• SQL*Plus will fail over using TAF
• JDBC connections will fail over
• Forms runtime connections will not; users will have to reconnect
Questions And Answers
• Surender.sara@veritiesllc.com