
Remove a Node from an Existing Oracle RAC 10g R1

Cluster on Linux - (FireWire)


by Jeff Hunter, Sr. Database Administrator

Contents

1. Overview
2. Remove the Instance
3. Remove the Node from the Cluster

Overview

With any RAC configuration, it is common for the DBA to encounter a scenario where
he or she needs to remove a node from the RAC environment. It may be that a server is
being underutilized in the cluster and could be better used in another business unit.
Another scenario is a node failure. In this case, a node can be removed from the cluster
while the remaining nodes continue to service ongoing requests.

This document is an extension to two articles: "Building an Inexpensive Oracle10g
RAC Configuration on Linux - (WBEL 3.0)" and "Adding a Node to an Oracle10g RAC
Cluster - (WBEL 3.0)". Contained in this document are the steps to remove a single node
(the third node I added in the second article) from an already running and configured
Oracle10g RAC environment.

This article assumes the following:

1. Three-node Oracle10g Environment: As I noted previously,
this article assumes that the reader has already built and configured a
three-node Oracle10g RAC environment. This system would consist of a
three-node cluster (each with a single processor), all three running Linux
(White Box Enterprise Linux 3.0 Respin 1 or Red Hat Enterprise Linux
3) with shared disk storage based on IEEE1394 (FireWire) drive
technology.
2. Node to be Removed is Available: The node to be removed in
this example is available and running within the cluster. Of the three
nodes in current RAC configuration, I will be removing linux3.
3. FireWire Hub: The enclosure for the Maxtor One Touch 250GB
USB 2.0 / Firewire External Hard Drive has only two IEEE1394
(FireWire) ports on the back. To configure a three-node cluster, I needed
to purchase a FireWire hub. The one I used for this article is a BELKIN
F5U526-WHT White External 6-Port Firewire Hub with AC Adapter.

This document provides the steps for removing the node's metadata from the
cluster registry. The node being removed can easily be added back to the cluster at a
later time.

If a node needs to be removed from an Oracle10g RAC database, even if the node will
no longer be available to the environment, there is a certain amount of cleanup that
needs to be done. The remaining nodes need to be informed of the change of status of
the departing node.

The three most important steps, each of which will be discussed in this
article, are:

1. Remove the instance using DBCA (preferred) or the command line (using srvctl).
2. Remove the node from the cluster.
3. Reconfigure the OS and remaining hardware.

For the purpose of this example, I have a three-node Oracle10g cluster:

Oracle10g RAC Configuration


Node Name   IP Address      Instance Name   Using ASM   ASM Instance Name   Status
linux1      192.168.1.100   orcl1           Yes         +ASM1               Available
linux2      192.168.1.101   orcl2           Yes         +ASM2               Available
linux3      192.168.1.107   orcl3           Yes         +ASM3               To be removed

I will be removing node linux3, along with all metadata associated with it. Most of the
operations to remove the node from the cluster will need to be performed from a pre-
existing node that is available and will remain in the cluster. For this article, I will be
performing all of these actions from linux1 to remove linux3.
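
Before making any changes, it can help to confirm the current state of the cluster from
linux1. This is a minimal sketch using commands that appear later in this article; the
database name orcl matches the configuration above:

   $ $ORA_CRS_HOME/bin/olsnodes -n          # list node names and CRS node numbers
   $ srvctl status database -d orcl         # confirm all three instances are running
   $ srvctl config database -d orcl         # list the registered instances and homes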

Remove the Instance

When removing a node from an Oracle10g RAC cluster, the DBA will first need to
remove the instance that is (or was) accessing the clustered database. This includes the
ASM instance if the database is making use of Automatic Storage Management. Most
of the actions to remove the instance need to be performed on a pre-existing node in the
cluster that is available and will remain available after the removal.

For this section, I will be removing the instance(s) on linux3 and performing all of
these operations from linux1.

This section provides two ways to perform the removal of the instance(s): using
DBCA or the command line (srvctl). When possible, always attempt to use the DBCA
method.

Using DBCA

The following steps can be used to remove an Oracle10g instance from a clustered
database using DBCA - even if the instance on the node is not available.

1. First, verify that you have a good backup of the Oracle Cluster
Registry (OCR) using ocrconfig:

   $ ocrconfig -showbackup

   int-linux1  2005/05/25 10:01:46  /u01/app/oracle/product/10.1.0/crs/cdata/crs
   int-linux1  2005/05/25 06:01:45  /u01/app/oracle/product/10.1.0/crs/cdata/crs
   int-linux1  2005/05/25 02:01:45  /u01/app/oracle/product/10.1.0/crs/cdata/crs
   int-linux1  2005/05/24 00:02:48  /u01/app/oracle/product/10.1.0/crs/cdata/crs
   int-linux1  2005/05/23 20:02:47  /u01/app/oracle/product/10.1.0/crs/cdata/crs

2. Next, run the DBCA from one of the nodes you are going to keep. Leave
the database up, along with the departing instance if it is available.

   $ dbca &

Within the DBCA, perform the following steps:

1. Choose "Oracle Real Application Clusters


database" and click [Next].
2. Choose "Instance Management" and click [Next].
3. Choose "Delete an instance" and click [Next].
4. On the next screen, select the cluster database from
which you want to remove the instance. You will
need to supply a username with the SYSDBA system
privilege and its password, then click [Next].
5. On the next screen, a list of cluster database
instances will appear. Highlight the instance you would
like to delete (orcl3 on linux3 in my example) and click [Next].
6. If you have services configured, they will need to
be reassigned. Modify the services so that each service
can run on one of the remaining instances, and set "Not Used"
for the instance that is being deleted. Click [Finish].
7. Acknowledge the dialog box by clicking [Ok]
when asked to confirm you want to delete the selected
instance.
8. Acknowledge the second dialog by clicking [Ok]
when asked to confirm that the DBCA will remove the Oracle
instance and all associated OFA directory structures. All
information about this instance will be deleted.

If the database is in archive log mode, the DBA may receive the following errors:

ORA-00350 or ORA-00312

This may occur because the DBCA cannot drop the current log, as it still needs
to be archived. This issue is fixed in the 10.1.0.3 patch set. If the DBA encounters this
error, click the [Ignore] button and, when the DBCA completes, manually archive
the logs for the deleted instance and drop the log group:

SQL> alter system archive log all;


SQL> alter database drop logfile group 3;

9. After the DBCA has removed the instance, click [No]
when prompted to perform another operation. The
DBCA will exit.
3. Verify that the redo thread for the dropped instance has
been removed by querying v$log:

   SQL> select group#, thread#, status from v$log;

       GROUP#    THREAD# STATUS
   ---------- ---------- ----------------
            1          1 CURRENT
            2          1 INACTIVE
            3          2 CURRENT
            4          2 INACTIVE

If for any reason the redo thread is not disabled then disable the
thread:

SQL> alter database disable public thread 3;

4. Verify that the instance was removed from the Oracle
Cluster Registry (OCR) using the srvctl config
database -d <db_name> command. The following example
assumes the name of the clustered database is orcl:

   $ srvctl config database -d orcl
   linux1 orcl1 /u01/app/oracle/product/10.1.0/db_1
   linux2 orcl2 /u01/app/oracle/product/10.1.0/db_1

You should also run the crs_stat command:

$ $ORA_CRS_HOME/bin/crs_stat | grep ins


NAME=ora.orcl.orcl1.inst
NAME=ora.orcl.orcl2.inst

5. If the node had an ASM instance and the node will no
longer be a part of the cluster, the DBA should remove the ASM
instance using the following, assuming the node being removed is
linux3:

   $ srvctl stop asm -n linux3
   $ srvctl remove asm -n linux3

Verify that the ASM instance was removed using the following:

$ srvctl config asm -n linux3

If the removal of the ASM instance was successful, you should
simply get your prompt back with no output. If, however, you
receive a record back (e.g. +ASM3
/u01/app/oracle/product/10.1.0/db_1), then the removal of
the ASM instance failed.

Using SRVCTL

The following steps can be used to remove an Oracle10g instance from a clustered
database using the command-line utility srvctl - even if the instance on the node is not
available.

1. First, verify that you have a good backup of the Oracle Cluster
Registry (OCR) using ocrconfig:

   $ ocrconfig -showbackup

   int-linux1  2005/05/25 10:01:46  /u01/app/oracle/product/10.1.0/crs/cdata/crs
   int-linux1  2005/05/25 06:01:45  /u01/app/oracle/product/10.1.0/crs/cdata/crs
   int-linux1  2005/05/25 02:01:45  /u01/app/oracle/product/10.1.0/crs/cdata/crs
   int-linux1  2005/05/24 00:02:48  /u01/app/oracle/product/10.1.0/crs/cdata/crs
   int-linux1  2005/05/23 20:02:47  /u01/app/oracle/product/10.1.0/crs/cdata/crs

2. Use the srvctl command-line utility from a pre-existing
(available) node in the cluster to remove the instance (from the
node to be removed) from the cluster. This should be run as the
oracle UNIX user account as follows:

   $ srvctl remove instance -d orcl -i orcl3
   Remove instance orcl3 for the database orcl? (y/[n]) y
3. Verify that the redo thread for the dropped instance has
been removed by querying v$log. If for any reason the redo
thread is not disabled, then disable the thread:

SQL> alter database disable public thread 3;

4. Verify that the instance was removed from the Oracle
Cluster Registry (OCR) using the srvctl config
database -d <db_name> command:

   $ srvctl config database -d orcl
   linux1 orcl1 /u01/app/oracle/product/10.1.0/db_1
   linux2 orcl2 /u01/app/oracle/product/10.1.0/db_1

You should also run the crs_stat command:

$ $ORA_CRS_HOME/bin/crs_stat | grep ins


NAME=ora.orcl.orcl1.inst
NAME=ora.orcl.orcl2.inst

5. If the node had an ASM instance and the node will no
longer be a part of the cluster, the DBA should remove the ASM
instance using the following, assuming the name of the clustered
database is orcl and the node being removed is linux3:

   $ srvctl stop asm -n linux3
   $ srvctl remove asm -n linux3

Verify that the ASM instance was removed using the following:

$ srvctl config asm -n linux3

Remove the Node from the Cluster

Now that the instance has been removed (and the ASM instance, if applicable), we
need to remove the node from the cluster. This is a manual process performed using
scripts that need to be run on the deleted node (if available) to remove the CRS install as
well as scripts that should be run on any of the existing nodes (i.e. linux1).

Before proceeding to the steps for removing the node, we need to determine the node
name and the CRS-assigned node number for each node stored in the Oracle Cluster
Registry. This can be run from any of the existing nodes (linux1 for this example).

$ $ORA_CRS_HOME/bin/olsnodes -n
linux1 1
linux2 2
linux3 3
Now that we have the node name and node number, we can start the steps to remove the
node from the cluster. Here are the steps that should be executed from a pre-existing
(available) node in the cluster (i.e. linux1):

1. Run the NETCA utility to remove the network configuration:

   $ DISPLAY=<machine_or_ip_address>:0.0; export DISPLAY
   $ netca &

Perform the following steps within the NETCA:

1. Choose "Cluster Configuration" and click [Next].


2. Only select the node you are removing and click [Next].
3. Choose "Listener Configuration" and click [Next].
4. Choose "Delete" and delete any listeners configured on the
node you are removing. Acknowledge the dialog box to delete the
listener configuration.

NOTE: For some reason, I needed to log in to linux3 and
manually kill the listener process by its process ID.
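
The following is a minimal sketch of what I did; the process ID is simply
whatever ps reports for the leftover listener on your system:

   $ ps -ef | grep tnslsnr | grep -v grep
   $ kill <pid_of_listener>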

2. Run the crs_stat command to verify that all database resources
are running on nodes that are going to be kept:

   $ $ORA_CRS_HOME/bin/crs_stat

For example, verify that the node to be removed is not running any
database resources. Look for the record of type:

NAME=ora.<db_name>.db
TYPE=application
TARGET=ONLINE
STATE=ONLINE on <node>

Assuming the name of the clustered database is orcl, this is the record
that was returned from the crs_stat command on my system:

NAME=ora.orcl.db
TYPE=application
TARGET=ONLINE
STATE=ONLINE on linux1

I am safe here since the resource is running on linux1 and not linux3 -
the node I want to remove.

If, however, the database resource was running on linux3, we would
need to relocate it to a node that we are going to keep (i.e. linux1) using
the following:

$ $ORA_CRS_HOME/bin/crs_relocate ora.<db_name>.db

3. From a pre-existing node (i.e. linux1), remove the nodeapps
from the node you are removing as the root UNIX user account:

   $ su
   Password: xxxxx

   # srvctl stop nodeapps -n linux3
   CRS-0210: Could not find resource ora.linux3.LISTENER_LINUX3.lsnr.

   # srvctl remove nodeapps -n linux3
   Please confirm that you intend to remove the node-level
   applications on node linux3 (y/[n]) y
   #

4. The next step is to update the node list using the updateNodeList
option to the OUI as the oracle user. This procedure will remove the
node to be deleted from the list of node locations maintained by the OUI
by listing only those remaining nodes. The only file that I know of that
gets modified is
$ORACLE_BASE/oraInventory/ContentsXML/inventory.xml. Here is
the command I used for removing linux3 from the list. Notice that the
DISPLAY variable needs to be set even though the GUI does not run.

   $ DISPLAY=<machine_or_ip_address>:0.0; export DISPLAY

   $ $ORACLE_HOME/oui/bin/runInstaller -ignoreSysPrereqs -updateNodeList \
       ORACLE_HOME=/u01/app/oracle/product/10.1.0/db_1 \
       CLUSTER_NODES=linux1,linux2

Note that the command above will produce the following error, which can
safely be ignored:

   PRKC-1002 : All the submitted commands did not execute successfully

5. If the node to be removed is still available and running the CRS
stack, the DBA will need to stop the CRS stack and remove the ocr.loc
file. These tasks should be performed as the root user account and on
the node that is to be removed from the cluster. The nosharedvar option
assumes the ocr.loc file is not on a shared file system (which is the case
in my example). If the file does exist on a shared file system, then
specify sharedvar. From the node to be removed (i.e. linux3) and as
the root user, run the following:

   $ su
   Password: xxxx

   # cd $ORA_CRS_HOME/install
   # ./rootdelete.sh remote nosharedvar
   Running Oracle10 root.sh script...
   \nThe following environment variables are set as:
       ORACLE_OWNER= oracle
       ORACLE_HOME=  /u01/app/oracle/product/10.1.0/crs
   Finished running generic part of root.sh script.
   Now product-specific root actions will be performed.
   Shutting down Oracle Cluster Ready Services (CRS):
   /etc/init.d/init.crsd: line 188: 29017 Aborted    $ORA_CRS_HOME/bin/crsd -2

   Shutting down CRS daemon.
   Shutting down EVM daemon.
   Shutting down CSS daemon.
   Shutdown request successfully issued.
   Checking to see if Oracle CRS stack is down...
   Oracle CRS stack is not running.
   Oracle CRS stack is down now.
   Removing script for Oracle Cluster Ready services
   Removing OCR location file '/etc/oracle/ocr.loc'
   Cleaning up SCR settings in '/etc/oracle/scls_scr/linux3'

6. Next, using the node name and CRS-assigned node number for
the node to be deleted, run the rootdeletenode.sh command as
follows. Keep in mind that this command should be run from a pre-
existing / available node (i.e. linux1) in the cluster as the root UNIX
user account:

   $ su
   Password: xxxx

   # cd $ORA_CRS_HOME/install
   # ./rootdeletenode.sh linux3,3
   Running Oracle10 root.sh script...
   \nThe following environment variables are set as:
       ORACLE_OWNER= oracle
       ORACLE_HOME=  /u01/app/oracle/product/10.1.0/crs
   Finished running generic part of root.sh script.
   Now product-specific root actions will be performed.
   clscfg: EXISTING configuration version 2 detected.
   clscfg: version 2 is 10G Release 1.
   Successfully deleted 13 values from OCR.
   Key SYSTEM.css.interfaces.nodelinux3 marked for deletion is not there. Ignoring.
   Successfully deleted 5 keys from OCR.
   Node deletion operation successful.
   'linux3,3' deleted successfully

To verify that the node was successfully removed, use the following as
either the oracle or root user:

$ $ORA_CRS_HOME/bin/olsnodes -n
linux1 1
linux2 2

7. Now, switch back to the oracle UNIX user account on the same
pre-existing node (linux1) and run the runInstaller command to
update the OUI node list again, this time for the CRS installation
($ORA_CRS_HOME). This procedure will remove the node to be deleted
from the list of node locations maintained by the OUI by listing only
those remaining nodes. The only file that I know of that gets modified is
$ORACLE_BASE/oraInventory/ContentsXML/inventory.xml. Here is
the command I used for removing linux3 from the list. Notice that the
DISPLAY variable needs to be set even though the GUI does not run.

   $ DISPLAY=<machine_or_ip_address>:0.0; export DISPLAY

   $ $ORACLE_HOME/oui/bin/runInstaller -ignoreSysPrereqs -updateNodeList \
       ORACLE_HOME=/u01/app/oracle/product/10.1.0/crs \
       CLUSTER_NODES=linux1,linux2

Note that the command above will produce the following error, which can
safely be ignored:

   PRKC-1002 : All the submitted commands did not execute successfully

The OUI now contains the valid nodes that are part of the cluster!
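
To double-check, you can inspect the inventory file directly. This is just a sketch; the
grep pattern is arbitrary and the path follows the OFA layout used in this article:

   $ grep -i node /u01/app/oracle/oraInventory/ContentsXML/inventory.xml

Only linux1 and linux2 should appear in the node lists for the database and CRS homes.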

8. Now that the node has been removed from the cluster, the DBA
should manually remove all Oracle10g RAC installation files from the
deleted node. Obviously, this applies only if the removed node is still
accessible and only if the files are not on a shared file system that is still
being accessed by other nodes in the cluster!

From the deleted node (linux3) I performed the following tasks as the
root UNIX user account:

1. Remove ORACLE_HOME and ORA_CRS_HOME:

   # rm -rf /u01/app/oracle/product/10.1.0/db_1
   # rm -rf /u01/app/oracle/product/10.1.0/crs

2. Remove all init scripts and soft links (for Linux). For a
list of init scripts and soft links for other UNIX platforms, see
Metalink Note 269320.1:

   # rm -f /etc/init.d/init.cssd
   # rm -f /etc/init.d/init.crs
   # rm -f /etc/init.d/init.crsd
   # rm -f /etc/init.d/init.evmd
   # rm -f /etc/rc2.d/K96init.crs
   # rm -f /etc/rc2.d/S96init.crs
   # rm -f /etc/rc3.d/K96init.crs
   # rm -f /etc/rc3.d/S96init.crs
   # rm -f /etc/rc5.d/K96init.crs
   # rm -f /etc/rc5.d/S96init.crs
   # rm -Rf /etc/oracle/scls_scr

3. Remove all remaining files:

   # rm -rf /etc/oracle
   # rm -f /etc/oratab
   # rm -f /etc/oraInst.loc
   # rm -rf /etc/ORCLcluster
   # rm -rf /u01/app/oracle/oraInventory
   # rm -rf /u01/app/oracle/product
   # rm -rf /u01/app/oracle/admin
   # rm -f /usr/local/bin/coraenv
   # rm -f /usr/local/bin/dbhome
   # rm -f /usr/local/bin/oraenv

4. Remove all CRS/EVM entries from the file /etc/inittab:

   h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
   h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
   h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null
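
After deleting these entries, it is a good idea to have init re-read /etc/inittab so
the respawn entries are dropped. A minimal sketch, run as root on the deleted node:

   # init q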

It is not very easy to read this output if you have a large number of nodes with lots of
resources configured on them. You can use the "-t" option with crs_stat to see the
output in a tabular form:

crs_stat -t

However, this output is designed for a fixed terminal width of 60 characters. Hence the
resource names are truncated. This makes it even more difficult to see what resource is
in which state.

Thankfully, there are some scripts out there that parse the default output of crs_stat
and provide a tabular output in a wider form so you can see what you are looking for.

Being a one-liner junkie, I prefer my own version:

crs_stat | awk -F= '/NAME=/{n=$2}/TYPE=/{t=$2}/TARGET=/{g=$2}/STATE=/{s=$2; printf("%-45s%-15s%-10s%-30s\n", n,t,g,s)}'

I also have an alias my_crs_stat for this command so I don't have to type it all the time.
alias my_crs_stat='crs_stat | awk -F= '\''/NAME=/{n=$2}/TYPE=/
{t=$2}/TARGET=/{g=$2}/STATE=/{s=$2; printf("%-45s%-15s%-10s%-30s\n",
n,t,g,s)}'\'''

This will do the trick and provide a fancier output.

my_crs_stat

1. * crs_stat
* crs_register
* crs_unregister
* crs_start
* crs_stop
* crs_getperm
* crs_profile
* crs_relocate
* crs_setperm
* crsctl check crsd
* crsctl check cssd
* crsctl check evmd
* crsctl debug log
* crsctl set css votedisk
* crsctl start resources
* crsctl stop resources

2. 10g RAC administration


3. See OCFS Oracle Cluster Filesystem, ASM, TNSnames configuration,
Oracle Database 11g New Features, Raw devices,
4. Resource Manager, Dbca
5. See http://www.oracle.com/technology/support/metalink/index.html
to view certification matrix
This is just a draft of basic RAC 10g administration
6.

RAC benefit and characteristics


- does not protect from human errors
- increased availability from node/instance failure
- speeds up parallel DSS queries
- does not speed up parallel OLTP processes
- no availability increase on data failures
- no availability increase on network failures
- no availability increase on release upgrades
- no scalability increase for application workloads in all cases

RAC tuning - After migration to RAC test:


- Interconnect latency
- Instance recovery time
- Applications strongly relying on table truncates, full table scans,
sequence and non-sequence key generation, and
global context variables

7.

RAC specific background processes for the database instance


Cluster Synchronization Services (CSS)
ocssd daemon, manages the cluster configuration and node membership

Cluster Ready Services (CRS)
manages resources (listeners, VIPs, Global Service Daemon GSD, Oracle
Notification Service ONS)
the crsd daemon backs up the OCR every four hours; its configuration is stored
in the OCR

Event Manager (EVM)


evmd daemon, publishes events

LMSn coordinates block updates (Cache Fusion)
LMON monitors global enqueues and resources (Global Enqueue Service Monitor)
LMDn manages requests for global enqueues
LCK0 handles resources not requiring Cache Fusion
DIAG collects diagnostic info

GSD 9i is not compatible with 10g

8.

FAN Fast Application Notification


- Must connect using service

Logged to:
$ORA_CRS_HOME/racg/dump
$ORA_CRS_HOME/log/<nodename>/racg

<event_type> VERSION=<n.n>
service=<service_name.db_domain_name>
[database=<db_unique_name> [instance=<instance_name>]]
[host=<hostname>]
status=<event_status> reason=<event_reason> [card=<n>]
timestamp=<event_date> <event_time>

event_type Description
SERVICE Primary application service event
SRV_PRECONNECT Preconnect application service event (TAF)
SERVICEMEMBER Application service on a specific instance event
DATABASE Database event
INSTANCE Instance event
ASM ASM instance event
NODE Cluster node event
#FAN events can control the workload per instance for each service
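
As an illustration only (every value below is hypothetical), a service-member down
event published in the format above might look like:

   SERVICEMEMBER VERSION=1.0 service=SERVICE1.example.com database=RAC
   instance=RAC3 host=london3 status=down reason=failure
   timestamp=04-Aug-2005 11:15:29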

9.

Oracle Notification Service ONS


- Transmits FAN events
- For every FAN event status change, all executables in
$ORA_CRS_HOME/racg/usrco are launched (callout scripts)

The ONS process is $ORA_CRS_HOME/opmn/bin/ons


Arguments:
-d: Run in daemon mode
-a <command>: <command> can be [ping, shutdown, reload, or debug]

[$ORA_CRS_HOME/opmn/conf/ons.config]
localport=6100
remoteport=6200
loglevel=3
useocr=on

onsctl start/stop/ping/reconfig/debug/detailed
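
A callout script is simply an executable dropped into $ORA_CRS_HOME/racg/usrco that
receives the event text as its arguments. A minimal sketch (the script name and log
path are arbitrary choices):

   #!/bin/bash
   # $ORA_CRS_HOME/racg/usrco/fan_logger.sh - append every FAN event received to a log
   echo "`date` $*" >> /tmp/fan_callout.log

Remember to make it executable (chmod 755 fan_logger.sh).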

10.

FCF Fast Connection Failover


- A JDBC application configured to use FCF automatically subscribes to
FAN events
- A JDBC application must use service names to connect
- A JDBC application must use implicit connection cache
- $ORACLE_HOME/opmn/lib/ons.jar must be in classpath
- -Doracle.ons.oraclehome=<location of oracle home>
or
System.setProperty ("oracle.ons.oraclehome",
"/u01/app/oracle/product/10.2.0/db_l");

OracleDataSource ods = new OracleDataSource();


ods.setUser("USER1");
ods.setPassword("USER1");
ods.setConnectionCachingEnabled(true);
ods.setFastConnectionFailoverEnabled(true);
ods.setConnectionCacheName("MyCache");
ods.setConnectionCacheProperties(cp);
ods.setURL("jdbc:oracle:thin:@(DESCRIPTION=(LOAD_BALANCE=on)
    (ADDRESS=(PROTOCOL=TCP)(HOST=london1-vip)(PORT=1521))
    (ADDRESS=(PROTOCOL=TCP)(HOST=london2-vip)(PORT=1521))
    (CONNECT_DATA=(SERVICE_NAME=SERVICE1)))");

11.

Check for main Clusterware services up

#check Event Manager up


ps -ef | grep evmd
#check Cluster Synchronization Services up
ps -ef | grep ocssd
#check Cluster Ready Services up
ps -ef | grep crsd
#check Oracle Notification Service
ps -ef | grep ons

[/etc/inittab]
...
h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null

12.

crs_stat
#Tested, as root
#Lists the status of an application profile and resources
#crs_stat [resource_name [...]] [-v] [-l] [-q] [-c cluster_node]
$ORA_CRS_HOME/bin/crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora.e2.gsd application ONLINE ONLINE e2
ora.e2.ons application ONLINE ONLINE e2
ora.e2.vip application ONLINE ONLINE e2

VIP Normal
Name          Type         Target    State     Host
------------------------------------------------------------
ora.e2.vip    application  ONLINE    ONLINE    e2

VIP when Node 2 (e2) is down - its VIP fails over to a surviving node
Name          Type         Target    State     Host
------------------------------------------------------------
ora.e2.vip    application  ONLINE    ONLINE    e3

crs_stat -p ...
AUTO_START=2   # the resource will not be started automatically after a system boot

crs_stat
NAME=ora.RAC.RAC1.inst
TYPE=application
TARGET=ONLINE
STATE=ONLINE on london1

NAME=ora.RAC.SERVICE1.RAC1.srv
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE

#use -v for verbose resource use
#use -p for a lot of details
#use -ls to view resources and relative owners

13.
Voting disk
On Shared storage, Used by CSS, contains nodes that are currently
available within the cluster
If Voting disks are lost and no backup is available then Oracle
Clusterware must be reinstalled
3 way multiplexing is ideal

#backup a voting disk online
dd if=<fname> of=<out_fname>
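
To restore, reverse the copy. A minimal sketch, assuming Oracle Clusterware is down
and using the voting disk path shown further below; the backup file name is a placeholder:

   dd if=/backup/CSSFile1.bak of=/u02/oradata/RAC/CSSFile1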

crsctl
#Tested, as oracle
$ORA_CRS_HOME/bin/crsctl check crs
Cluster Synchronization Services appears healthy
Cluster Ready Services appears healthy
Event Manager appears healthy

#add a new voting disk online (10.2); use -force if Oracle Clusterware is not started
crsctl add css votedisk 'new votedisk path' -force

crsctl start/stop/enable/disable crs

#set/unset parameters on OCR
crsctl set/unset <parameter> <value>

You can list the currently configured voting disks:

crsctl query css votedisk
0.  0  /u02/oradata/RAC/CSSFile1
1.  1  /u03/oradata/RAC/CSSFile2
2.  2  /u04/oradata/RAC/CSSFile3

Dynamically add and remove voting disks in an existing Oracle Clusterware installation:
crsctl add/delete css votedisk <path> -force

CRS log and debug

#as root, enable extra debug for the running CRS daemons as well as
those started in the future
#enable to inspect system reboots
crsctl debug log crs

#Collect log and traces to upload to Oracle Support
diagcollection.pl

14.

OCR - Oracle Cluster Registry

[/etc/oracle/ocr.loc] (10g) or [/etc/oracle/srvConfig.loc] (9i, still
exists in 10g for compatibility)
ocrconfig_loc=/dev/raw/raw1
ocrmirrorconfig_loc=/dev/raw/raw2
local_only=FALSE

OCRCONFIG - Command-line tool for managing the Oracle Cluster Registry

#recover OCR logically, must be done on all nodes
ocrconfig -import exp.dmp
#export OCR content logically
ocrconfig -export
#recover OCR from an OCR backup
ocrconfig -restore bck.ocr
#show backup status
#the crsd daemon backs up the OCR every four hours; the most recent backup
#file is backup00.ocr
ocrconfig -showbackup
london1 2005/08/04 11:15:29
/u01/app/oracle/product/10.2.0/crs/cdata/crs
london1 2005/08/03 22:24:32
/u01/app/oracle/product/10.2.0/crs/cdata/crs
#change OCR autobackup location
ocrconfig -backuploc
#must be run on each affected node
ocrconfig -repair ocr <filename>
ocrconfig -repair ocrmirror <filename>
#force Oracle Clusterware to restart on a node, may lose recent OCR
updates
ocrconfig -overwrite

CVU - Cluster Verification Utility, used to get the status of CRS resources

dd : use it to safely back up voting disks when nodes are added/removed

#verify restore
cluvfy comp ocr -n all

ocrcheck
#OCR integrity check, validate the accessibility of the device and its
block integrity
log to current dir or to $OCR_HOME/log/<node>/client

ocrdump
#dump the OCR content to a text file; if it succeeds, then the integrity of
backups is verified
OCRDUMP - Identify the interconnect being used
$ORA_CRS_HOME/bin/ocrdump.bin -stdout -keyname SYSTEM.css.misscount
-xml

15.

Pre install, prerequisites

(./run)cluvfy : run from the install media or CRS_HOME, verifies
prerequisites on all nodes

Post installation
- Backup root.sh
- Set up other user accounts
- Verify Enterprise Manager / Cluster Registry by running srvctl config
database -d db_name

16.

SRVCTL
Stores its info in the OCR; manages:
Database, Instance, Service, Node applications, ASM, Listener

srvctl config database -d <db_name> : Verify Enterprise Manager /
Cluster Registry
set the SRVM_TRACE=TRUE environment variable to create a Java-based tool
trace/debug file for srvctl
#-v to check services
srvctl status database -d RAC -v SERVICE1
srvctl start database -d <name> [-o mount]
srvctl stop database -d <name> [-o stop_options]
#moves the parameter file
srvctl modify database -d name -p /u03/oradata/RAC/spfileRAC.ora
srvctl remove database -d TEST
#Verify the OCR configuration
srvctl config database -d TEST

srvctl start instance -d RACDB -i "RAC3,RAC4"
srvctl stop instance -d orcl -i "orcl3,orcl4" -o immediate
srvctl add instance -d RACDB -i RAC3 -n london3
#move the instance to node london4
srvctl modify instance -d RAC -i RAC3 -n london4
#set a dependency of instance RAC3 on +ASM3
srvctl modify instance -d RAC -i RAC3 -s +ASM3
#removes an ASM dependency
srvctl modify instance -d RAC -i RAC3 -r

#stop all applications on a node
srvctl stop nodeapps -n london1
#-a displays the VIP configuration
srvctl config nodeapps -n london1 -a
srvctl add nodeapps -n london3 -o $ORACLE_HOME -A london3-vip/255.255.0.0/eth0

17.

Services
Changes are recorded in OCR only! Must use DBMS_SERVICE to update the
dictionary
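
A minimal sketch of keeping the dictionary in sync with DBMS_SERVICE; the service name
matches the examples below, adjust as needed:

   BEGIN
     DBMS_SERVICE.CREATE_SERVICE(service_name => 'SERVICE2',
                                 network_name => 'SERVICE2');
   END;
   /
   -- and DBMS_SERVICE.DELETE_SERVICE('SERVICE2') when the service is removed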

srvctl start service -d RAC -s "SERVICE1,SERVICE2"


srvctl status service -d RAC -s "SERVICE1,SERVICE2"
srvctl stop service -d RAC -s "SERVICE1,SERVICE2" -f
srvctl disable service -d RAC -s "SERVICE2" -i RAC4
srvctl remove service -d RAC -s "SERVICE2"

#relocate from RAC2 to RAC4


srvctl relocate service -d RAC -s "SERVICE2" -i RAC2 -t RAC4

#preferred RAC1,RAC2 and available RAC3,RAC4


#-P PRECONNECT automatically creates an ERP and ERP_PRECONNECT service
to use as BACKUP in tnsnames
#See TNSnames configuration
#the service is NOT started; it must be started manually (DBCA does it
automatically)
srvctl add service -d ERP -s SERVICE2 -i "RAC1,RAC2" -a
"RAC3,RAC4" -P PRECONNECT

#show configuration, -a shows TAF conf


srvctl config service -d RAC -a

#modify an existing service


srvctl modify service -d RACDB -s "SERVICE1" -i "RAC1,RAC2" -a
"RAC3,RAC4"
srvctl stop service -d RACDB -s "SERVICE1"
srvctl start service -d RACDB -s "SERVICE1"

Views
GV$SERVICES
GV$ACTIVE_SERVICES
GV$SERVICEMETRIC
GV$SERVICEMETRIC_HISTORY
GV$SERVICE_WAIT_CLASS
GV$SERVICE_EVENT
GV$SERVICE_STATS
GV$SERV_MOD_ACT_STATS

18.

SQL for RAC


select * from V$ACTIVE_INSTANCES;

Cache Fusion - GRD Global Resource Directory


GES(Global Enqueue Service)
GCS(Global Cache Service)

Data Guard & RAC


- Configuration files at primary location can be stored in any shared
ASM diskgroup, on shared raw devices,
on any shared cluster file system. They simply have to be shared

19.

VIP virtual IP
- Both application VIPs and RAC VIPs fail over if the related application fails, and
accept new connections
- Sharing a RAC VIP among database instances is recommended, but not among
different applications, because the VIP fails over if the application fails over
- A failed-over application VIP accepts new connections
- Each VIP requires an unused and resolvable IP address
- VIP address should be registered in DNS
- VIP address should be on the same subnet of the public network
- VIPs are used to prevent connection request timeouts during client
connection attempts

Changing a VIP
1- Stop VIP dependent cluster components on one node
2- Make changes on DNS
3- Change VIP using SRVCTL
4- Restart VIP dependent components
5- Repeat above on remaining nodes
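
A minimal sketch of step 3 using srvctl; the node name, address, netmask and interface
are placeholders:

   # as root, with the nodeapps stopped
   srvctl modify nodeapps -n london1 -A 147.43.1.210/255.255.255.0/eth0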

20.

oifcfg
allocates and deallocates network interfaces; gets its values from the OCR

To display a list of networks


oifcfg getif
eth1 192.168.1.0 global cluster_interconnect
eth0 192.168.0.0 global public
To display a list of current subnets:
oifcfg iflist
eth0 147.43.1.0
eth1 192.168.1.0

To include a description of the subnet, specify the -p option:


oifcfg iflist -p
eth0 147.43.1.0 UNKNOWN
eth1 192.168.1.0 PRIVATE

In 10.2 public interfaces are UNKNOWN.


To include the subnet mask, append the -n option to the -p option:
oifcfg iflist -p -n
eth0 147.43.1.0 UNKNOWN 255.255.255.0
eth1 192.168.1.0 PRIVATE 255.255.255.0

21.

Db parameters with SAME VALUE across all instances


active_instance_count
archive_lag_target
compatible
cluster_database RAC param
cluster_database_instances RAC param
#Define network interfaces that will be used for interconnect
#it is not a failover mechanism but a redistribution: if one address does not
work, then all stop
#Overrides the OCR
cluster_interconnects RAC param = 192.168.0.10; 192.168.0.11; ...
control_files
db_block_size
db_domain
db_files
db_name
db_recovery_file_dest
db_recovery_file_dest_size
db_unique_name
dml_locks (when 0)
instance_type (rdbms or asm)
max_commit_propagation_delay RAC param
parallel_max_servers
remote_login_password_file
trace_enabled
#AUTO and MANUAL cannot be mixed in a RAC
undo_management

Db parameters with INSTANCE specific VALUE across all instances


instance_name
instance_number
thread
undo_tablespace #system param
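
A minimal sketch of setting an instance-specific value with the SID clause (names are
illustrative only):

   alter system set undo_tablespace='UNDOTBS3' scope=spfile sid='RAC3';
   alter system set instance_number=3 scope=spfile sid='RAC3';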

Listener parameters
local_listener='(ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST =
192.168.0.13) (PORT = 1521)))'
#allows PMON to register with the local listener when not using port 1521
remote_listener = '(ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST =
192.168.2.9) (PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST
=192.168.2.10)(PORT = 1521)))'
#registers the instance with the listeners on the other nodes for server-side load balancing

Important RAC Parameters

gc_files_to_locks    #any value other than the default disables Cache Fusion
recovery_parallelism #number of redo application server processes used in
instance or media recovery

RAC and Standby parameters


dg_broker_config_file1 #shared between primary and standby instances
dg_broker_config_file2 #different from dg_broker_config_file1, shared
between primary and standby instances


Shared contents
datafiles, controlfiles, spfiles, redo log

Shared or local?                                     RAW_Dev / File_Syst (NFS, OCFS) / ASM
- Datafiles               : shared, mandatory
- Control files           : shared, mandatory
- Redo log                : shared, mandatory
- SPfile                  : shared, mandatory
- OCR and vote            : shared, mandatory        Y Y N
- Archived log            : shared, not mandatory    N Y N Y
- Undo                    : local
- Flash Recovery          : shared                   Y Y Y
- Data Guard broker conf. : shared (prim. & stdby)   Y Y

24.

Adding logfile thread groups for a new instance


#To support a new instance on your RAC
1) alter database add logfile thread 3 group 7;
1) alter database add logfile thread 3 group 8;
#makes the thread available for use by any instance
2) alter database enable thread 3;
# if you want to change the thread used by an instance
2) alter system set thread=3 scope=spfile sid='RAC01'
3) srvctl stop instance -d RACDB -i RAC01

25.

Views and queries


select * from GV$CACHE_TRANSFER

26.

An instance failed to start, what do we do?


1) Check the instance alert.log
2) Check the Oracle Clusterware software alert.log
3) Check the resource state using CRS_STAT
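
A minimal sketch of where to look, assuming the OFA layout and 10.2 Clusterware log
locations used elsewhere in these notes (paths vary by release):

   tail -100 $ORACLE_BASE/admin/<db_name>/bdump/alert_<SID>.log
   tail -100 $ORA_CRS_HOME/log/<nodename>/alert<nodename>.log
   $ORA_CRS_HOME/bin/crs_stat -t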

27. Install
28. See official Note 239998.1 for removing crs installation
29. See http://startoracle.com/2007/09/30/so-you-want-to-play-with-
oracle-11gs-rac-heres-how/ to install 11g RAC on VMware
30. See http://www.oracle.com/technology/pub/articles/hunter_rac10gr2_iscsi.html
to install on Linux with iSCSI disks
31. See http://www.oracle-base.com/articles/10g/OracleDB10gR2RACInstallationOnCentos4UsingVMware.php
to install on VMware
32. See OCFS Oracle Cluster Filesystem
Prerequisites check
#check node connectivity and Clusterware integrity
./runcluvfy.sh stage -pre dbinst -n all
./runcluvfy.sh stage -post hwos -n "linuxes,linuxes1" -verbose
WARNING:
Package cvuqdisk not installed.

rpm -Uvh clusterware/rpm/cvuqdisk-1.0.1-1.rpm

WARNING:
Unable to determine the sharedness of /dev/sdf on nodes:
linuxes1,linuxes1,linuxes1,linuxes1,linuxes1,linuxes1,linuxes,l
inuxes,linuxes,linuxes,linuxes,linuxes

Safely ignore this error

./runcluvfy.sh comp peer -n "linuxes,linuxes1" -verbose


./runcluvfy.sh comp nodecon -n "linuxes,linuxes1" -verbose
./runcluvfy.sh comp sys -n "linuxes,linuxes1" -p crs -verbose
./runcluvfy.sh comp admprv -n "linuxes,linuxes1" -verbose -o user_equiv
./runcluvfy.sh stage -pre crsinst -n "linuxes,linuxes1" -r 10gR2
Restart installation - Remove from each node
su -c "$ORA_CRS_HOME/install/rootdelete.sh;
$ORA_CRS_HOME/install/rootdeinstall.sh"
#oracle user
export DISPLAY=192.168.0.1:0.0
/app/crs/oui/bin/runInstaller -removeHome -noClusterEnabled
ORACLE_HOME=/app/crs LOCAL_NODE=linuxes
rm -rf $ORA_CRS_HOME/*
#root
su -c "chown oracle:dba /dev/raw/*; chmod 660 /dev/raw/*; rm -rf
/var/tmp/.oracle; rm -rf /tmp/.oracle"
44. #Format rawdevices using
45. dd if=/dev/zero of=/dev/raw/raw6 bs=1M count=250
46.
47. #If related error message appears during installation, manually
launch on related node
48. /app/crs/oui/bin/runInstaller -attachHome -noClusterEnabled
ORACLE_HOME=/app/crs ORACLE_HOME_NAME=OraCrsHome
CLUSTER_NODES=linuxes,linuxes1 CRS=true
"INVENTORY_LOCATION=/app/oracle/oraInventory" LOCAL_NODE=linuxes
49.
50. runcluvfy.sh stage -pre crsinst -n linuxes -verbose
51.

52.
/etc/hosts example
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1    localhost
147.43.1.101 london1
147.43.1.102 london2
#VIPs are usable only after the VIPCA utility has been run;
#they should be created on the public interface. Remember that VIPCA is a
#GUI tool
147.43.1.201 london1-vip
147.43.1.202 london2-vip
192.168.1.1  london1-priv
192.168.1.2  london2-priv
Kernel Parameters(/etc/sysctl.conf) Recommended Values
kernel.sem (semmsl) 250
kernel.sem (semmns) 32000
kernel.sem (semopm) 100
kernel.sem (semmni) 128
kernel.shmall 2097152
kernel.shmmax Half the size of physical memory
kernel.shmmni 4096
fs.file-max 65536
net.core.rmem_default 262144
net.core.rmem_max 262144
net.core.wmem_default 262144
net.core.wmem_max 262144
net.ipv4.ip_local_port_range 1024 to 65000
RAC restrictions
- dbms_alert, both publisher and subscriber must be on same instance,
AQ is the workaround
- dbms_pipe, only works on the same instance, AQ is the workaround
- UTL_FILE, directories, external tables and BFILEs need to be on
shared storage
Implementing the HA High Availability Framework
Use srvctl to start/stop applications
#Manually create an action script that Oracle Clusterware will use to start/stop/check the application

#Create an application VIP.

#This command generates an application profile called hafdemovip.cap
in the $ORA_CRS_HOME/crs/public directory.
$ORA_CRS_HOME/bin/crs_profile -create hafdemovip -t application -a
$ORA_CRS_HOME/bin/usrvip -o oi=eth0,ov=147.43.1.200,on=255.255.0.0

#As the oracle user, register the VIP with Oracle Clusterware:
$ORA_CRS_HOME/bin/crs_register hafdemovip

#As the root user, set the owner of the application VIP to root:
$ORA_CRS_HOME/bin/crs_setperm hafdemovip -o root

#As the root user, grant the oracle user permission to run the script:
$ORA_CRS_HOME/bin/crs_setperm hafdemovip -u user:oracle:r-x

#As the oracle user, start the application VIP:


$ORA_CRS_HOME/bin/crs_start hafdemovip

2. Create an application profile.

$ORA_CRS_HOME/bin/crs_profile -create hafdemo -t application -d "HAF
Demo" -r hafdemovip -a /tmp/HAFDemoAction -o ci=5,ra=60

3. Register the application profile with Oracle Clusterware.

$ORA_CRS_HOME/bin/crs_register hafdemo

$ORA_CRS_HOME/bin/crs_start hafdemo
CRS commands
crs_profile
crs_register
crs_unregister
crs_getperm
crs_setperm
crs_start
crs_stop
crs_stat
crs_relocate
Server side callouts
Oracle instance up(/down?)
Service member down(/up?)
Shadow application service up(/down?)
Adding a new node
- Configure hardware and OS
- With NETCA reconfigure listeners and add the new one
- $ORA_CRS_HOME/oui/bin/addnode.sh from one of existing nodes to define
the new one to all existing nodes
- $ASM_HOME/oui/bin/addnode.sh from one of existing nodes (if using
ASM)
- $ORACLE_HOME/oui/bin/addnode.sh from one of existing nodes
- racgons -add_config to add ONS metadata to OCR from one of existing
nodes
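
A hedged sketch of the addnode/racgons calls (10gR2 silent syntax; node names and the
ONS remote port are placeholders):

   $ORA_CRS_HOME/oui/bin/addnode.sh -silent "CLUSTER_NEW_NODES={london3}" \
       "CLUSTER_NEW_VIRTUAL_HOSTNAMES={london3-vip}"
   $ORA_CRS_HOME/bin/racgons add_config london3:6200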

Removing a node from a cluster


- Remove node from clusterware
- Check that ONS configuration has been updated on other node
- Check that database and instances are terminated on node to remove
- Check that node has been removed from database and ASM repository
- Check that software has been removed from database and ASM homes on
node to remove
RAC contentions
- enq:HW - contention and gc current grant wait events
Use larger uniform extent size for objects

- enq: TX - index contention


Re-create the index as a global hash partitioned index.
Increase the sequence cache size if retaining the sequence.
Re-create the table using a natural key instead of a surrogate key.
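
A minimal sketch of the first two remedies; the object and sequence names are hypothetical:

   create index orders_pk_idx on orders(order_id)
     global partition by hash (order_id) partitions 8;

   alter sequence orders_seq cache 1000 noorder;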
