Oracle RAC Administration

Chetan Gupte
BACKGROUND PROCESSES OF ORACLE 12C RAC
The Global Cache Service (GCS) and Global Enqueue Service (GES) processes, together with the Global Resource Directory (GRD), collaborate to enable Cache Fusion.
The Oracle RAC processes and their identifiers are as follows:
ACMS: Atomic Controlfile to Memory Service

In an Oracle RAC environment, the ACMS per-instance process is an agent that contributes
to ensuring a distributed SGA memory update is either globally committed on success or
globally aborted if a failure occurs.
GTX0-j: Global Transaction Process
The GTX0-j process provides transparent support for XA global transactions in an Oracle
RAC environment. The database autotunes the number of these processes based on the
workload of XA global transactions.
LMON: Global Enqueue Service Monitor
The LMON process monitors global enqueues and resources across the cluster and performs
global enqueue recovery operations.
LMD: Global Enqueue Service Daemon

The LMD process manages incoming remote resource requests within each instance.
LMS: Global Cache Service Process
The LMS process maintains records of the data file statuses and each cached block by
recording information in a Global Resource Directory (GRD).
The LMS process also controls the flow of messages to remote instances, manages global data block access, and transmits block images between the buffer caches of different instances.
This processing is part of the Cache Fusion feature.
LCK0: Instance Enqueue Process
The LCK0 process manages non-Cache Fusion resource requests such as library and row
cache requests.
RMSn: Oracle RAC Management Processes

The RMSn processes perform manageability tasks for Oracle RAC. Tasks accomplished by
an RMSn process include creation of resources related to Oracle RAC when new instances
are added to the cluster.
RSMN: Remote Slave Monitor
The RSMN process manages background slave process creation and communication on remote instances. These background slave processes perform tasks on behalf of a coordinating process running in another instance.
CRSD
CRS is installed and run from a different ORACLE_HOME known as ORA_CRS_HOME (called GRID_HOME starting with 11g Release 2), which is independent of the database ORACLE_HOME.
CRSD manages cluster resources: it starts and stops services and fails over cluster resources such as a virtual IP (VIP), database instance, listener, or database.
The CRS daemon can run in two modes, depending on how it was started. After a planned Clusterware start, it starts in 'reboot' mode. After an unplanned shutdown, it starts in 'restart' mode, in which it retains the previous state and returns resources to the states they were in before the shutdown.
OCSSD
It maintains membership in the cluster through a special file called a voting disk (also referred to as a quorum disk). This is the first process that is started in the Oracle Clusterware stack.
In stand-alone databases that use ASM, OCSS is used for communication between the database and ASM instances; in RAC environments, it identifies the configuration as clustered.
OCSS reads the OCR to locate the voting disk (VD) and then reads the voting disk to determine the number and names of the cluster members.
CSS verifies the number of nodes already registered as part of the cluster. After verification, if
no MASTER node has been established, CSS authorizes the verifying node to be the MASTER
node. This is the first node that attains the ACTIVE state. Cluster synchronization begins when
the MASTER node synchronizes with the other nodes.
OCSSD offers Node Membership (NM) and Group Membership (GM) services.
The NM checks the heartbeat across the various nodes in the cluster every second. If a node does not respond within the CSS misscount interval (30 seconds by default), the node that was started first among the surviving nodes (the master) starts evicting the unresponsive node(s) from the cluster.
All clients that perform I/O operations register with the GM (e.g., the LMON, DBWR).
Reconfiguration of instances (when an instance joins or leaves the cluster) happens through
the GM. When a node fails, the GM sends out messages to other instances regarding the
status.
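The consecutive-miss rule described above can be illustrated with a small simulation. This is a hypothetical sketch, not Oracle's implementation: the function name `evict_after` and the input format (one character per second, `1` for a received heartbeat, `0` for a missed one) are assumptions, and real CSS eviction also takes the voting disk heartbeat into account.

```shell
# Illustrative simulation only, NOT Oracle's implementation: a node is evicted
# once it misses MISSCOUNT consecutive one-second network heartbeats.
# Usage: evict_after MISSCOUNT BEATS
#   BEATS: one character per second, 1 = heartbeat received, 0 = missed.
# Prints the second at which eviction would be triggered, or "none".
evict_after() {
  misscount=$1
  beats=$2
  missed=0
  second=0
  while [ -n "$beats" ]; do
    b=${beats%"${beats#?}"}   # first character of the remaining string
    beats=${beats#?}          # drop that character
    second=$((second + 1))
    if [ "$b" = 1 ]; then
      missed=0                # a heartbeat resets the consecutive-miss counter
    else
      missed=$((missed + 1))
    fi
    if [ "$missed" -ge "$misscount" ]; then
      echo "$second"
      return 0
    fi
  done
  echo none
}
```

For example, `evict_after 3 110001` prints `5`: the counter resets on every received heartbeat, and eviction triggers on the third consecutive miss. With the default misscount, the first argument would be 30.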
EVMD
It receives the FAN events posted by clients and propagates the information to the other nodes.
A failure of EVMD does not require a node reboot; the daemon restarts automatically.
It is spawned by the init.evmd wrapper script. It starts the evmlogger child process, which scans the callout directory and starts the racgevt process to execute the callouts.
ONS:
It is a publish and subscribe service for communicating Fast Application Notification (FAN) events
to clients.
Whenever the state of a resource changes on a cluster node, CRS triggers a HA event and routes it to the ONS process, which propagates the information to the other cluster nodes.
OPROCD:
OPROCD serves as the I/O fencing solution for Oracle Clusterware in releases before 11g Release 2; from 11g Release 2 onward, this role is performed by the cssdagent process.
It is the process monitor for Oracle Clusterware, and it uses the hang check timer for cluster integrity so that hanging nodes cannot perform any I/O. Failure of the OPROCD process causes the node to restart.
CLUSTER SYNCHRONIZATION SERVICE (CSS):
Manages the cluster configuration by controlling which nodes are members of the cluster and by notifying members when a node joins or leaves the cluster.
If you are using certified third-party clusterware, then CSS interfaces with your clusterware to manage node membership information.
CSS has three separate processes:
• the CSS daemon (ocssd)
• the CSS Agent (cssdagent)
• the CSS Monitor (cssdmonitor)
The cssdagent process monitors the cluster and provides input/output fencing.
A cssdagent failure results in Oracle Clusterware restarting the node.
DISK MONITOR DAEMON (DISKMON):
Monitors and performs input/output fencing for Oracle Exadata Storage Server.
As Exadata storage can be added to any Oracle RAC node at any point in time, the diskmon
daemon is always started when ocssd is started.
MULTICAST DOMAIN NAME SERVICE (MDNS):

Allows DNS requests.
The mDNS process is a background process on Linux and UNIX, and a service on Windows.
ORACLE GRID NAMING SERVICE (GNS):

Is a gateway between the cluster mDNS and external DNS servers.
The GNS process performs name resolution within the cluster.
ORAAGENT:
 Extends clusterware to support Oracle-specific requirements and complex resources. It runs server callout
scripts when FAN events occur.
This process was known as RACG in Oracle Clusterware 11g Release 1 (11.1).
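A server callout is simply an executable that Clusterware runs when a FAN event occurs. The sketch below assumes the callout directory is `$GRID_HOME/racg/usrco` and that the event arrives as arguments: the event type first, followed by `name=value` pairs such as `host=` and `status=`. Verify both against your release; the log file location and the function names `fan_field` and `handle_fan_event` are hypothetical.

```shell
#!/bin/sh
# Hypothetical FAN server-side callout. Assumes Clusterware passes the event
# type as the first argument, followed by name=value pairs (host=..., status=...).
LOGFILE=${FAN_LOGFILE:-/tmp/fan_events.log}   # hypothetical log location

# Print the value of KEY from a list of name=value arguments; fail if absent.
fan_field() {
  key=$1
  shift
  for kv in "$@"; do
    case $kv in
      "$key"=*) echo "${kv#*=}"; return 0 ;;
    esac
  done
  return 1
}

# Append a one-line summary of the event to the log file.
handle_fan_event() {
  event_type=$1
  shift
  status=$(fan_field status "$@") || status=unknown
  host=$(fan_field host "$@") || host=unknown
  echo "$(date '+%Y-%m-%d %H:%M:%S') $event_type status=$status host=$host" >> "$LOGFILE"
}

# Only act when Clusterware actually passed an event.
if [ "$#" -gt 0 ]; then
  handle_fan_event "$@"
fi
```

Keeping the callout short and appending to a file avoids delaying event processing; heavier work (paging, ticketing) is better triggered from the log.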
ORACLE ROOT AGENT (ORAROOTAGENT):

Is a specialized oraagent process that helps CRSD manage resources owned by root, such as the network and the Grid virtual IP address.
CLUSTER KILL DAEMON (OCLSKD):

Handles instance/node eviction requests that have been escalated to CSS.
GRID IPC DAEMON (GIPCD):

Is a helper daemon for the communications infrastructure.
CONFIGURING INITIALIZATION PARAMETERS FOR RAC DATABASE
In Oracle RAC, each instance can have its own parameter file, or all instances can share a single parameter file.
Oracle RAC parameters can be classified into three categories:
• Parameters that Must Have Identical Settings on All Instances
• Parameters that Must Have Unique Settings on All Instances
• Parameters that Should Have Identical Settings on All Instances
PARAMETERS THAT MUST HAVE IDENTICAL SETTINGS ON ALL INSTANCES
• ACTIVE_INSTANCE_COUNT
• CLUSTER_DATABASE
• CLUSTER_DATABASE_INSTANCES
• COMPATIBLE
• CONTROL_FILES
• DB_BLOCK_SIZE
• DB_DOMAIN
• DB_FILES
• DB_NAME
• DB_RECOVERY_FILE_DEST
• DB_RECOVERY_FILE_DEST_SIZE
• DB_UNIQUE_NAME
• INSTANCE_TYPE (RDBMS or ASM)
• PARALLEL_MAX_SERVERS
• REMOTE_LOGIN_PASSWORDFILE
• RESULT_CACHE_MAX_SIZE
• UNDO_MANAGEMENT
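A text pfile makes it easy to spot instance-specific overrides of these parameters. The following sketch assumes a pfile created with `CREATE PFILE FROM SPFILE`, where each line has the form `*.parameter=value` (all instances) or `sid.parameter=value` (one instance); the helper name `check_identical_params` is hypothetical, and its parameter list is copied from the list above.

```shell
# Hypothetical helper: warn when a parameter that must be identical on all
# instances is set with an instance-specific SID prefix in a text pfile.
check_identical_params() {
  awk -F'=' '
    # Register a space-separated list of parameter names as "must be identical".
    function add(names,    n, i, a) {
      n = split(names, a, " ")
      for (i = 1; i <= n; i++) must[a[i]] = 1
    }
    BEGIN {
      bad = 0
      add("active_instance_count cluster_database cluster_database_instances")
      add("compatible control_files db_block_size db_domain db_files db_name")
      add("db_recovery_file_dest db_recovery_file_dest_size db_unique_name")
      add("instance_type parallel_max_servers remote_login_passwordfile")
      add("result_cache_max_size undo_management")
    }
    /^[ \t]*#/ { next }                   # skip comment lines
    NF >= 2 {
      key = $1
      gsub(/[ \t]/, "", key)              # strip stray whitespace
      dot = index(key, ".")
      if (dot == 0) next                  # no SID prefix, nothing to compare
      sid = substr(key, 1, dot - 1)
      param = tolower(substr(key, dot + 1))
      if (sid != "*" && (param in must)) {
        printf "WARN: %s is set for instance %s only\n", param, sid
        bad = 1
      }
    }
    END { exit bad }
  ' "$1"
}
```

It prints one warning per offending line and returns a non-zero status, so it can gate a change review, for example `check_identical_params /tmp/initPROD.ora`.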
PARAMETERS THAT MUST HAVE UNIQUE SETTINGS ON ALL INSTANCES
INSTANCE_NUMBER
THREAD
ROLLBACK_SEGMENTS
UNDO_TABLESPACE
INSTANCE_NAME
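Conversely, the parameters in this list are normally set per instance with a SID prefix. A minimal sketch, assuming the same `*.parameter=value` / `sid.parameter=value` pfile format and using the hypothetical helper name `check_unique_params`, that flags `INSTANCE_NUMBER`, `THREAD`, or `UNDO_TABLESPACE` values shared by two instances or set for all instances with the `*` prefix:

```shell
# Hypothetical helper: flag per-instance parameters whose values collide.
check_unique_params() {
  awk -F'=' '
    BEGIN { bad = 0 }
    /^[ \t]*#/ { next }                   # skip comment lines
    NF >= 2 {
      key = $1
      gsub(/[ \t]/, "", key)
      dot = index(key, ".")
      if (dot == 0) next
      sid = substr(key, 1, dot - 1)
      param = tolower(substr(key, dot + 1))
      if (param !~ /^(instance_number|thread|undo_tablespace)$/) next
      val = $2
      gsub(/[ \t]/, "", val)
      if (sid == "*") {                   # one value shared by every instance
        printf "WARN: %s is set for all instances\n", param
        bad = 1
        next
      }
      k = param "=" val
      if (k in owner) {                   # value already claimed by another SID
        printf "WARN: %s is used by both %s and %s\n", k, owner[k], sid
        bad = 1
      } else {
        owner[k] = sid
      }
    }
    END { exit bad }
  ' "$1"
}
```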
PARAMETERS THAT SHOULD HAVE IDENTICAL SETTINGS ON ALL INSTANCES
Oracle recommends that you set each of the following parameters to the same value on all instances. Although you can have different settings for these parameters on different instances, setting each parameter to the same value on all instances simplifies administration.
ARCHIVE_LAG_TARGET
CONTROL_MANAGEMENT_PACK_ACCESS
LICENSE_MAX_USERS
LOG_ARCHIVE_FORMAT
SPFILE
UNDO_RETENTION
FLASH RECOVERY AREA:
Oracle recommends that you enable a flash recovery area (renamed the fast recovery area in Oracle Database 11g Release 2) to simplify backup management.
Ideally, the flash recovery area should be large enough to contain all of the following files:
• A copy of all datafiles
• Incremental backups
• Online redo logs
• Archived redo log files that have not yet been backed up
• Control files and control file copies
• Autobackups of the control file and database initialization parameter file
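Sizing the area is simple addition over the file types listed above. The helper `fra_size_gb` below is hypothetical and expects your own estimates for each component, in whole gigabytes; it does not query the database.

```shell
# Hypothetical helper: sum estimated component sizes (whole GB) into a
# flash recovery area size. Arguments are the caller's own estimates.
fra_size_gb() {
  total=0
  for gb in "$@"; do
    case $gb in
      ''|*[!0-9]*|0?*)                 # reject non-integers and leading zeros
        echo "not a whole number of GB: '$gb'" >&2
        return 2 ;;
    esac
    total=$((total + gb))
  done
  echo "$total"
}
```

For example, with hypothetical estimates of 500 GB of datafile copies, 50 GB of incremental backups, 8 GB of online redo, 120 GB of archived logs not yet backed up, and 1 GB each for control file copies and autobackups, `fra_size_gb 500 50 8 120 1 1` prints `680`, which could then be applied with `ALTER SYSTEM SET DB_RECOVERY_FILE_DEST_SIZE=680G SCOPE=BOTH SID='*';`.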
TROUBLESHOOTING ORACLE RAC:
• Find status of Clusterware Stack:
./crsctl check crs
• Find OCR Locations:
ocrcheck
• Find Voting Disk Locations:
./crsctl query css votedisk
or
check the output of "ocrdump"
• Check status of all resources (Nodeapps, ASM, Database, RAC Services):
crs_stat -t
(crs_stat is deprecated starting with 11g Release 2; use "crsctl stat res -t" instead.)
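The checks above can be collected into one script. This is a sketch under assumptions: the default `GRID_HOME` path is hypothetical, the tool paths can be overridden through the `CRSCTL` and `OCRCHECK` variables (which also allows a dry run with `echo`), and the resource listing uses `crsctl stat res -t`, the replacement for `crs_stat -t` from 11g Release 2 onward.

```shell
# Hypothetical status script; the GRID_HOME default is an assumption.
GRID_HOME=${GRID_HOME:-/u01/app/12.1.0/grid}
CRSCTL=${CRSCTL:-$GRID_HOME/bin/crsctl}
OCRCHECK=${OCRCHECK:-$GRID_HOME/bin/ocrcheck}

# Run the basic Clusterware health checks, one labelled section each.
rac_status() {
  echo "== Clusterware stack =="
  "$CRSCTL" check crs
  echo "== OCR locations =="
  "$OCRCHECK"
  echo "== Voting disks =="
  "$CRSCTL" query css votedisk
  echo "== Resources =="
  "$CRSCTL" stat res -t
}
```

Assigning `CRSCTL=echo` and `OCRCHECK=echo` in the shell before calling `rac_status` prints the commands instead of running them.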
DEBUGGING RESOURCES:
A RAC DBA may face issues related to the Clusterware stack, resources, the OCR, the voting disks, and so on.
For example, an attempt to start a resource might fail with an error like the following:
CRS-0215: Could not start resource 'ora.prod2.vip'
• We can debug any resource with the crsctl command as follows:
./crsctl debug log res "ora.prod2.vip:2"
":2" denotes the debugging level, which can range from 1 to 5.
• Checking the log files:
$CRS_HOME/log/<hostname>
• Debugging components:
We can also debug Clusterware components such as CRS, EVM, and OCSS:
crsctl debug log crs "CRSD:1"
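Because only levels 1 to 5 are valid for resource debugging, a small wrapper can reject a bad level before calling crsctl. The function name `crs_debug_res` is hypothetical, and the `CRSCTL` variable (defaulting to `crsctl` on the `PATH`) is an assumption that also allows a dry run.

```shell
# Hypothetical wrapper around: crsctl debug log res "<resource>:<level>"
# CRSCTL defaults to crsctl on the PATH; set CRSCTL=echo for a dry run.
crs_debug_res() {
  res=$1
  level=$2
  case $level in
    [1-5]) ;;                                  # valid debugging level
    *) echo "debug level must be 1-5, got '$level'" >&2; return 2 ;;
  esac
  "${CRSCTL:-crsctl}" debug log res "$res:$level"
}
```

For example, with `CRSCTL=echo`, `crs_debug_res ora.prod2.vip 2` prints `debug log res ora.prod2.vip:2`.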
DIAGNOSTICS COLLECTION SCRIPT
Every time an Oracle Clusterware error occurs, you should run the diagcollection.pl script to collect diagnostic information from Oracle Clusterware in trace files.
The diagnostics provide additional information so Oracle Support can resolve problems.
Run this script from the following location:
CRS_home/bin/diagcollection.pl
ORACLE CLUSTERWARE ALERTS

Oracle Clusterware posts alert messages when important events occur. The alerts contain information about the entire Clusterware stack, for example, events related to EVM, CRS, or OCSS. The alert log is located at:
CRS_home/log/hostname/alerthostname.log
HANDLING NODE EVICTION ISSUES:
The Oracle Clusterware is designed to perform a node eviction by
removing one or more nodes from the cluster if some critical
problem is detected.
A critical problem could be a node not responding via a network heartbeat, a node not responding via a disk heartbeat, a hung or severely degraded machine, or a hung ocssd.bin process.
COMMON CAUSES FOR NODE EVICTION:
Network failure or latency between nodes. By default, it takes 30 consecutive missed check-ins (as determined by the CSS misscount) to cause a node eviction.
Problems writing to or reading from the CSS voting disk. If the node cannot perform a disk heartbeat to the majority of its voting files, then the node will be evicted.
A member kill escalation. For example, the database LMON process may request CSS to remove an instance from the cluster via the instance eviction mechanism. If this times out, it could escalate to a node kill.
An unexpected failure of the OCSSD process, which can be caused by any of the above issues or by something else.
An Oracle bug.
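The voting-disk rule (a node must reach a majority of its voting files) can be expressed as a small arithmetic check. The helper names below are hypothetical; the sketch only illustrates the majority arithmetic and does not read real voting files.

```shell
# Hypothetical helpers illustrating the voting-file majority rule.
# vd_majority_ok TOTAL ACCESSIBLE: succeeds if ACCESSIBLE is a strict majority of TOTAL.
vd_majority_ok() {
  total=$1
  accessible=$2
  [ $((accessible * 2)) -gt "$total" ]
}

# vd_majority_needed TOTAL: smallest number of voting files that is a majority.
vd_majority_needed() {
  echo $(( $1 / 2 + 1 ))
}
```

With 3 voting files a node needs 2 of them, so it tolerates one failure; with 4 it needs 3, so it still tolerates only one. This is why an odd number of voting files is usually configured.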
IMPORTANT LOG FILES:
Clusterware alert log in
$GRID_HOME/log/nodename
The cssdagent log(s) in
$GRID_HOME/log/nodename/agent/ohasd/oracssdagent_root
The cssdmonitor log(s) in
$GRID_HOME/log/nodename/agent/ohasd/oracssdmonitor_root
The ocssd log(s) in
$GRID_HOME/log//cssd
The lastgasp log(s) in
/etc/oracle/lastgasp or /var/opt/oracle/lastgasp
IPD/OS or OS Watcher data
‘opatch lsinventory -detail’ output for the GRID home
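Assembling these paths by hand for each node is error-prone, so a helper can print them. This sketch assumes the per-node layout `$GRID_HOME/log/<nodename>` used above (later releases may write Clusterware diagnostics to the ADR instead), covers only the alert, cssdagent, and cssdmonitor entries, and uses the hypothetical name `crs_log_paths`.

```shell
# Hypothetical helper: print Clusterware log locations for one node,
# assuming the $GRID_HOME/log/<nodename> layout.
# Usage: crs_log_paths GRID_HOME [NODENAME]
crs_log_paths() {
  grid_home=$1
  node=${2:-$(hostname -s)}   # default to the short host name
  base=$grid_home/log/$node
  echo "$base/alert$node.log"                      # Clusterware alert log
  echo "$base/agent/ohasd/oracssdagent_root"       # cssdagent log directory
  echo "$base/agent/ohasd/oracssdmonitor_root"     # cssdmonitor log directory
}
```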
CLUSTERWARE ADMINISTRATION
crsctl check crs - to check the viability of the CRS stack
crsctl check cssd - To check the viability of CSS
crsctl check crsd - To check the viability of CRS
crsctl check evmd - To check the viability of EVM
crsctl query css votedisk - To list the voting disks used by CSS
crsctl add css votedisk <path> - adds a new voting disk
crsctl delete css votedisk <path> - removes a voting disk
crsctl enable crs - enables startup for all CRS daemons
crsctl disable crs - disables startup for all CRS daemons
crsctl start crs - starts all CRS daemons
crsctl stop crs - stops all CRS daemons; in a cluster, it also stops the CRS resources
crsctl start resources - starts CRS resources
crsctl stop resources - stops CRS resources
crsctl debug statedump evm - dumps state info for evm objects
crsctl debug statedump crs - dumps state info for crs objects
crsctl debug statedump css - dumps state info for css objects
crsctl debug trace css - dumps CSS in-memory tracing cache
crsctl debug trace crs - dumps CRS in-memory tracing cache
crsctl debug trace evm - dumps EVM in-memory tracing cache
crsctl query crs softwareversion [<nodename>] - lists the version of CRS software installed
crsctl query crs activeversion - lists the CRS software operating version
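The software version reported for a node and the active version of the cluster can differ, for example while an upgrade has not yet been completed on every node. The hypothetical helper `version_lt` below compares two dotted version strings numerically; extracting the strings from the crsctl output is left out because that output format can vary between releases.

```shell
# Hypothetical helper: succeed (exit 0) if dotted version A is lower than B.
# Missing trailing components count as 0, so 12.1.0.2 equals 12.1.0.2.0.
version_lt() {
  a=$1
  b=$2
  while [ -n "$a$b" ]; do
    x=${a%%.*}; x=${x:-0}     # leading component of A
    y=${b%%.*}; y=${y:-0}     # leading component of B
    if [ "$x" -lt "$y" ]; then return 0; fi
    if [ "$x" -gt "$y" ]; then return 1; fi
    # drop the component just compared
    case $a in *.*) a=${a#*.} ;; *) a= ;; esac
    case $b in *.*) b=${b#*.} ;; *) b= ;; esac
  done
  return 1    # equal versions are not "less than"
}
```

For example, `version_lt "$active" "$software" && echo "active version is older than installed software"` reports the mismatch once both strings have been extracted.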
OS WATCHER
OS Watcher (OSW) is a collection of UNIX shell scripts intended to collect and archive
operating system and network metrics to aid support in diagnosing performance issues.
OSW operates as a set of background processes on the server and gathers OS data on a
regular basis, invoking such Unix utilities as vmstat, netstat and iostat.
OS Watcher can be downloaded from My Oracle Support (Doc ID 301137.1).
After extracting the downloaded tar file, run startOSWbb.sh from the extracted directory. For example, to take snapshots with the system utilities every 5 minutes (300 seconds) and keep the archived data for the last 24 hours, run:
nohup ./startOSWbb.sh 300 24 &
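The two arguments to startOSWbb.sh are the snapshot interval in seconds and the number of hours of archived data to keep, so the number of snapshots retained per utility is roughly a simple calculation. The helper `osw_snapshots` is hypothetical.

```shell
# Hypothetical helper: approximate snapshots kept per utility for a given
# startOSWbb.sh interval (seconds) and archive window (hours).
osw_snapshots() {
  interval=$1   # seconds between snapshots
  hours=$2      # hours of archived data to keep
  if [ "$interval" -le 0 ]; then
    echo "interval must be positive" >&2
    return 2
  fi
  echo $(( hours * 3600 / interval ))
}
```

`osw_snapshots 300 24` prints `288`, that is, 12 snapshots per hour for 24 hours.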
