[Slide: a four-node cluster. Instances 1 and 2 run on Nodes A and B; Instances 3 and 4 run on Nodes C and D.]
[Slide: the software stack on each node: the operating system and the Oracle RDBMS, sharing the data files, online redo log files, and control files.]
[Slide: approximate block access times with cache fusion]
Block in local cache: 0.01 milliseconds
Block in remote cache: 1 millisecond
Block on disk: 20 milliseconds
• May be X or S mode
• Can serve a copy of blocks to other instances
• Can read blocks from disk
• In X mode
– No other instance has the lock in X mode
– All unwritten changes are in local cache
– Can write changed blocks to disk, asynchronously informing DLM of the write
• In S mode, block cannot be dirty so no disk writes allowed
Past Image
A Past Image (PI) is a copy of a globally dirty block image maintained in cache. It is saved
when a dirty block is served to another instance after setting the lock role to global (if it
was not already set). A PI must be maintained by an instance until it, or a later version of
the block, is written to disk. The DLM is responsible for informing an instance when a PI is
no longer needed because another instance wrote the block.
When an instance needs to write a block, to satisfy a checkpoint request, for example, the
instance checks the role of the lock covering the block. If the role is global, the instance
must inform the DLM of the write requirement. The DLM is responsible for finding the
most current block image and informing the instance holding that image to perform the
block write. The DLM then informs all holders of the global lock that they can release the
lock on any PI copies of the block. This allows instances to free buffers holding the PI.
When an instance is told that it can free a PI buffer, it places a block written record (BWR) in its redo log buffer. This record is used by recovery processes to indicate that redo for the block is not needed prior to this point. Although the BWR makes recovery more efficient, the instance does not force a flush of the log buffer after creating it because the record is not essential for accurate recovery.
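To make the BWR's use concrete, the following is a minimal sketch in Python (with invented record structures, not Oracle code) of how a recovery scan can use a BWR to discard redo that predates a known write of the block:

    # Sketch only: scan the log forward; a BWR for a block means all
    # earlier redo for that block is already on disk and can be dropped.
    def redo_needed(log_records):
        needed = {}  # block id -> redo records still required
        for rec in log_records:
            if rec["type"] == "BWR":
                needed.pop(rec["block"], None)  # earlier redo now redundant
            else:
                needed.setdefault(rec["block"], []).append(rec)
        return needed

    log = [
        {"type": "redo", "block": 7, "change": "a"},
        {"type": "BWR", "block": 7},
        {"type": "redo", "block": 7, "change": "b"},
    ]
    print(redo_needed(log))  # only change "b" remains for block 7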
Past Image
When an instance receives a current copy of a block for which it already has a PI copy, it must keep both copies. If the receiving instance then has to serve the block to another instance, the DLM indicates, along with the request, whether a write is in progress that would free the PI (that is, whether a later version of the block is being written).
If such a write is not occurring, the instance replaces the old PI with a new PI created from
the current image. This is called a merge on exit because it results in an apparent single
string of redo (from this instance) for the block, terminated by just one BWR when the
block is finally written to disk.
If such a write is occurring, the instance creates a new PI from the current image. It is
possible, given the asynchronous messaging protocol, that an instance may have more than
one PI of a single block. An instance maintains a maximum of two PIs associated with a
given block.
When the current image is served, a write-in-progress bit is set in the block if the block is covered by an exclusive mode PCM lock. This is required to synchronize block writes when the serving instance held the original local role lock.
Note that a clean block, regardless of its PCM lock status, does not need a PI generated
when it is served to another instance.
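The PI bookkeeping described above can be summarized in a short sketch, assuming (as the text states) at most one prior PI exists when a dirty block is served; this is illustrative Python, not Oracle code:

    # Sketch only: what the server of a dirty block keeps in its cache.
    def serve_dirty_block(past_images, current_image, write_in_progress):
        if write_in_progress:
            # A later version is being written; keep the old PI until the
            # DLM confirms the write, plus a new PI (at most two in total).
            return past_images[-1:] + [current_image]
        # Merge on exit: the new PI replaces the old one, so the block's
        # redo from this instance ends with a single BWR at write time.
        return [current_image]

    print(serve_dirty_block(["PI@SCN1009"], "PI@SCN1013", write_in_progress=True))
    # ['PI@SCN1009', 'PI@SCN1013'] - two PIs retained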
[Slide: Example 1, step 1. Instance C, holding the lock at NL0, sends the lock master (instance D) a request to obtain a shared lock. The block is on disk at SCN 1008.]
[Slide: Example 1, step 2. The request is granted and instance C converts the lock status from NL0 to SL0.]
Step 2
The DLM grants the lock in shared mode with the local role and the mastering instance
sends a message with the grant to instance C. Instance C converts the NULL status on the
lock to shared mode, local role, with no past images (NL0→ SL0).
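The three-character lock names used throughout these examples encode the lock mode, the lock role, and the number of past images. A minimal model in Python (illustrative only, not Oracle code) makes the notation explicit:

    from dataclasses import dataclass

    @dataclass
    class PcmLock:
        mode: str         # 'N' (null), 'S' (shared), or 'X' (exclusive)
        role: str         # 'L' (local) or 'G' (global)
        past_images: int  # number of PIs held for the block

        def name(self) -> str:
            return f"{self.mode}{self.role}{self.past_images}"

    # Instance C's conversion in step 2: NL0 -> SL0
    lock = PcmLock(mode="N", role="L", past_images=0)
    lock.mode = "S"
    print(lock.name())  # SL0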
[Slide: Example 1, step 3. Instance C issues a read request to disk for the block.]
Step 3
Instance C initiates the I/O by issuing a read request to disk for the block.
[Slide: Example 1, step 4. The block image at SCN 1008 is delivered to instance C.]
Step 4
The I/O completes in step 4 with the delivery of the block to instance C. Instance C now
holds the block with SCN 1008 using an SL0 lock.
[Slide: Example 2, step 1. Instance B, holding an NL0 lock, asks the lock master (instance D) for a shared lock on the block that instance C holds at SL0 with SCN 1008.]
[Slide: Example 2, step 2. The lock master instructs instance C to transfer the block to instance B for shared access.]
Step 2
The DLM discovers that the block is being held by instance C under an SL0 lock. The
DLM sends a request to instance C to transfer the block, for shared access, to instance B.
[Slide: Example 2, step 3. Instance C sends the block to instance B, indicating a shared mode lock on both B and C.]
Step 3
Instance C ships a copy of the block to instance B with headers indicating that it is retaining
its lock in SL0 mode and that instance B should take out the same type of lock.
Note: In earlier releases, the read request from instance B would have been granted by the
DLM issuing a shared lock, but instance B would have needed to read the block from disk.
[Slide: Example 2, step 4. Instance B converts its lock from NL0 to SL0 and sends a lock assumption and status message to the lock master.]
Step 4
Instance B converts the lock to shared mode, local role, with no PI and sends a message to
the DLM— that is, to instance D where the lock is mastered— to inform the DLM of the
newly-converted lock status. This message includes the lock status (SL0) on both of the
instances involved in the process, B and C.
The process would have been slightly different had the required block no longer been
available in the cache of the instance receiving the instruction to send the block. In this
case, the message sent in step 3 would simply have contained the lock information,
informing the receiving instance that it is free to obtain the required lock. After performing
the lock conversion, the receiving instance would have to read the block from disk.
In the example, this would result in instance C dropping the lock and in instance B
performing the disk I/O as shown in the earlier example (Example 1).
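A sketch of this serve-or-grant decision, in illustrative Python (the cache is just a dictionary here, not Oracle's buffer cache):

    # Sketch only: how the instructed holder responds to a transfer request.
    def respond_to_transfer(cache, block_id):
        if block_id in cache:
            # Ship the block with the lock information; no disk I/O needed.
            return ("block image plus lock information", cache[block_id])
        # Block aged out: send lock information only; requester reads disk.
        return ("lock information only", None)

    print(respond_to_transfer({"blk42": "image@SCN1008"}, "blk42"))
    print(respond_to_transfer({}, "blk42"))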
[Slide: Example 3, step 1. Instance B requests an exclusive lock on the block that instance C holds at SL0 with SCN 1008.]
[Slide: Example 3, step 2. The lock master instructs instance C to transfer the block to instance B for exclusive access.]
Step 2
The DLM discovers that the block is being held by instance C and sends a request to that
instance. The request asks instance C to transfer the block, for exclusive access, to instance
B.
In a more complex situation, more than one instance could be holding SL0 locks on the block for which an instance is requesting an exclusive lock. In such cases, the DLM sends, to all but one of the holding instances, a message to transfer the block to a null location. This effectively tells these instances to close their shared locks on the block and to release the buffers holding the block. Once this is done, the last remaining shared lock holder is equivalent to instance C in this example— it is the only instance holding an SL0 lock on the requested block. At this point, the actions performed by the DLM and the remaining lock holder are identical to the steps shown in this example, with instance C having the role of the last holder of the requested shared lock. The sketch below illustrates the routing.
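In sketch form (illustrative Python, not Oracle code):

    # Sketch only: the messages the DLM sends when `requester` wants
    # exclusive access and several instances hold SL0 locks on the block.
    def route_exclusive_request(shared_holders, requester):
        *others, last = shared_holders
        msgs = [(inst, "transfer block to null: close lock, free buffer")
                for inst in others]
        msgs.append((last, f"transfer block to {requester} for exclusive access"))
        return msgs

    print(route_exclusive_request(["A", "C", "D"], "B"))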
[Slide: Example 3, step 3. Instance C sends the block and lock status to instance B, including its plan to close its own lock (SL0 to NL0).]
Step 3
On receipt of the transfer message sent in step 2, instance C does the following:
1. Sends the block to B, as requested, along with an indicator that it is closing its own lock and supplying an exclusive lock for use by the receiving instance
2. Closes its own lock by converting it to NL0. This also marks the buffer holding the block image as Consistent Read (CR), identifying it as available for reuse.
Note: In earlier releases, the read request from instance B would have been granted by the DLM issuing an exclusive lock to instance B after forcing instance C to release its shared lock. However, at that point, instance B would have needed to read the block from disk, and the copy in instance C would have gone unused.
[Slide: Example 3, step 4. Instance B converts its lock from NL0 to XL0, reports the lock status on instances B and C to the lock master, and modifies the block; the SCN becomes 1009.]
Step 4
On receipt of the block message, instance B converts its lock and sends a message to the
DLM. The message includes information about the assumption of lock mode and role
(XL0) on instance B and the closure of the lock on instance C. Instance B can now modify
the block. In this example, the block SCN becomes 1009 following the changes.
The process would have been slightly different had the required block no longer been
available in the cache of the instance receiving the instruction to send the block. In this
case, the message sent in step 3 would simply have contained the lock information,
informing the receiving instance that it is free to obtain the required lock. After performing
the lock conversion, the receiving instance would have to read the block from disk.
In the example, this would result in instance C dropping the lock and in instance B
performing the disk I/O as shown in the earlier example (Example 1).
[Slide: Example 4, step 1. Instance A requests an exclusive lock on the block that instance B holds at XL0 with SCN 1009.]
[Slide: Example 4, step 2. The lock master instructs instance B to transfer the exclusive lock.]
Step 2
The DLM tells instance B to give up the block to satisfy the request from instance A for an
exclusive lock. This message will be sent immediately if the DLM has completed recording
the lock transactions from the previous example (Example 3). If these transactions are not
complete, the request from instance A will be queued until the DLM can process the
request.
[Slide: Example 4, step 3. Instance B sends an exclusive-keep copy of the buffer (SCN 1009) to instance A and converts its lock from XL0 to NG1.]
Step 3
Instance B completes its work on the block when it receives the message to transfer the block to instance A. This involves:
• Logging any changes to the block and forcing a log flush if this has not already occurred.
• Converting its lock to NG1, indicating that the buffer now contains a PI, that is, a history (level 1) copy of the block.
• Sending an exclusive-keep copy of the block buffer to instance A. This includes the block image at SCN 1009, information that instance B is holding a past image of the block, and notification that the exclusive lock is available in global mode.
If there had been no changes to the block’s contents when the message was received by
instance B, the instance would simply send the block image to instance A and close its lock
(XL0→ NL0). This would allow the receiving instance to assume the exclusive lock in local
mode, just as in the read to write transfer shown in the previous example (Example 3).
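Instance B's choice between the two paths can be sketched as follows (illustrative Python with invented names; the real decision is internal to the cache layer):

    # Sketch only: serving a block for exclusive access.
    def serve_for_exclusive(block_dirty, flush_log):
        if block_dirty:
            flush_log()  # force redo for the block's changes first
            # XL0 -> NG1: keep a PI; receiver assumes the lock as XG0.
            return "NG1", "exclusive-keep copy of buffer"
        # XL0 -> NL0: nothing to preserve; receiver assumes the lock as XL0.
        return "NL0", "block image only"

    new_lock, message = serve_for_exclusive(True, flush_log=lambda: None)
    print(new_lock, "-", message)  # NG1 - exclusive-keep copy of buffer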
[Slide: Example 4, step 4. Instance A converts its lock from NL0 to XG0 and sends the lock assumption information to the lock master; after modification, the block SCN becomes 1013.]
Step 4
After instance A has received the block with the lock dispositions, instance A sends a lock assumption message— including the lock information from instance B— to the DLM (in this case, the mastering instance D). This tells the DLM that instance A has the lock with an XG0 status and that instance B, the previous holder of the exclusive lock, is now a PI holder of version 1009. Instance A is able to obtain the block SCN from the copy sent by instance B because that copy contains all of the changes made by instance B.
Once this is done, instance A can modify the block. In the example, the modification
converts the block to SCN 1013. Note that because it no longer has an exclusive lock,
instance B cannot make any further changes to the block even though it is required to
maintain a PI copy in the buffer cache.
[Slide: Example 5, step 1. Instance C requests a shared lock on the block. Instance A holds it at XG0 with SCN 1013 and instance B holds a PI at NG1 with SCN 1009.]
[Slide: Example 5, step 2. The lock master instructs instance A to transfer a shared lock to instance C.]
Step 2
The DLM instructs instance A to transfer a shared lock to satisfy the request from instance
C. As before, this message will be sent immediately if the DLM has no lock transactions in
progress or else it will be queued.
[Slide: Example 5, step 3. Instance A converts its lock from XG0 to SG1 and sends a shared-keep copy of the buffer (SCN 1013) to instance C.]
Step 3
On receipt of the message to transfer the block, instance A completes its work on the block and sends a copy of the block image to instance C. As in the previous example (Example 4), this may involve logging changes and flushing the log buffer on instance A before sending the block. In this case, an exclusive lock is not needed by the receiving instance, so instance A downgrades its lock to a shared lock, but keeps its global role in order to preserve the past image of the block.
After this is done, the instance sends a shared-keep copy of the block to instance C. As well as the current block contents, the message identifies the type of locks at each end of the transfer: shared/global/PI in instance A and shared/global without PI in instance C.
[Slide: Example 5, step 4. Instance C converts its lock from NL0 to SG0 and sends the lock assumption information to the lock master.]
Step 4
Instance C extracts the SCN from the block it received from instance A and constructs a
lock assumption message for the DLM. This contains sufficient information for the DLM to
record the new status of the lock on each of the instances, along with the SCN of the PI on
instance A. Instance C sends the completed message to instance D, the instance mastering
the lock for the DLM.
Note that, at the end of the transfer, instance A has the most recent PI for the block and that the lock on instance C has the global role because of the dirty PI block image still held in instance A's buffer.
[Slide: Example 6, step 1. A request to write the block at SCN 1009 or greater is sent to the lock master (instance D). Instance A holds the current block at XG1 with SCN 1013; instance B holds a PI at NG1 with SCN 1009.]
[Slide: Example 6, step 2. The lock master forwards the write request (SCN 1009 or greater) to instance A.]
Step 2
The DLM mastering node, instance D, sends a write request to the instance selected in step
1, instance A. The master also remembers that a write at the requested SCN is outstanding,
and does not allow another write to be requested until the current one is satisfied.
[Slide: Example 6, steps 3 and 4. Instance A writes the block (SCN 1013) to disk and receives the write notification.]
Step 3
Instance A initiates the I/O with a write to disk.
Step 4
The I/O completes with a notify message back to instance A.
Having received the completion notification, instance A will log the completion and the
version written with a BWR and advance its checkpoint, but not force the log.
[Slide: Example 6, step 5. Instance A sends a write notification to the lock master and converts its lock from XG1 to XL0.]
Step 5
Instance A sends a notification to the DLM master node (instance D). The notification also includes an assertion that instance A is converting to the local role because it wrote the current block image.
Note: The order in which the two operations— writing the BWR (mentioned in step 4) and sending the notification message— are performed is not critical; they can be done in parallel and in any order.
[Slide: Example 6, step 6. The lock master sends a flush-PI instruction to instance B, which converts its lock from NG1 to NL0.]
Step 6
On receipt of the write notification, the DLM master (instance D) sends each instance
holding a PI, which it recorded earlier, an instruction to flush the PI. It also sends a
notification, potentially redundant, to the current holder of the X mode lock (which may
have moved), even if it has no PIs. If no PIs remain, the instance holding the current X lock
is told to go to local role, and the flush to this instance will set a go-local flag. This will be
redundant if the current X holder did the write.
In the example, B is the only instance, other than the writing instance (instance A), that holds a PI. When instance B receives the flush instruction from the DLM, the instance logs a BWR recording that the block has been written, without flushing the log buffer. Instance B also releases the block buffer and clears the record it was keeping of the write it initiated.
At the completion of this step, A holds the buffer in XL mode, and all other past images
have been purged.
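The write protocol of this example reduces to a small set of messages, sketched here in illustrative Python (not Oracle code):

    # Sketch only: the DLM master's handling of a block write request.
    def handle_write_request(writer, pi_holders, x_holder):
        msgs = [(writer, "write block to disk")]  # steps 2 and 3
        # After the writer's completion notification (steps 4 and 5):
        for inst in pi_holders:
            if inst != writer:
                msgs.append((inst, "flush PI: log a BWR, release buffer"))
        # Step 6: the X holder goes local once no PIs remain.
        msgs.append((x_holder, "go local if no PIs remain"))
        return msgs

    print(handle_write_request("A", ["B"], "A"))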
[Slide: Example 7, step 1. Instance C sends a shared lock request to the DLM; instance A holds the block at XL0 with SCN 1013.]
[Slide: Example 7, step 2. The lock master passes a shared transfer command to the lock holder, instance A.]
Step 2
The DLM forwards the request for the shared lock to the current holder, instance A, as a
command for a shared transfer.
[Slide: Example 7, step 3. Instance A converts its lock from XL0 to SL0 and sends a shared-keep copy of the buffer to instance C, which converts from NL0 to SL0.]
Step 3
The sending instance, A, has to convert its exclusive lock to shared. After converting its lock, instance A sends the block to the requesting instance, C, with a shared-keep lock status. Because the block is globally clean, the lock role can remain local on both instances.
[Slide: Example 7, step 4. Instance C sends a shared-keep message to the lock master on instance D; instances A and C both hold SL0 locks on the block at SCN 1013.]
Step 4
Instance C sends a shared-keep message to inform the DLM that the sender and recipient
instances (A and C) now both hold the lock as shared local. Once again, the DLM master
for the lock is on instance D.
• Cancellation
– Control-C from user
– Time out on lock request
– Process failure
• Transfer requested from an instance holding an XL lock while it is writing the block
• Write race
• Write and notification errors
• Messages out of order
There are other situations that can cause problems but which are anticipated by the cache fusion code. The main categories of these problems include:
• Cancellation of a pending request caused by
– A Control-C sent by a user
– A timeout expiring on a lock request
– The failure of a foreground process
• The DLM sends a transfer request to an instance with an exclusive, local lock while the instance is writing the requested block
• The DLM sends an instruction to write a block that is no longer available in the selected instance because of a change that was signaled but not received by the DLM in time
• Messages do not necessarily reach the DLM in the logical order in which they were sent
• Various failures that could prevent a write, or a notification of a write, from being properly completed or recorded
[Slide: a table of six resources (R1 through R6), their hash values (HV1 through HV6), the master node for each, and the open locks held by nodes N1, N2, and N3.]
Example
This slide shows an example of a Real Application Cluster environment consisting of three instances, one per node. There are six open resources, which could be PCM locks on data blocks, for example. These resources are hashed to six different hash values, and the values are then evenly mapped across the three instances.
The open locks column represents the locks that each instance has on each resource.
[Slide: the hash value and master mapping after N2 fails, showing the locks held by N2 being cleared.]
Example
If the instance on N2 crashes, the hash values HV2 and HV5 need to be remapped, and resources R2 and R5 need new master nodes. During reconfiguration, the DLM maps HV2 and HV5 to N1 and N3 respectively. Hence, R2 is mastered at N1 and R5 at N3.
The slide shows the hash values and master mapping after the reconfiguration. It also shows how the locks from the lost instance (N2) are cleared by the distributed lock manager. The sketch below illustrates the remapping.
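In sketch form (illustrative Python; the round-robin choice over survivors is an assumption of the sketch, but it reproduces the HV2 and HV5 assignments in the example):

    # Sketch only: remap the hash values owned by a failed node.
    hv_to_node = {f"HV{i}": n for i, n in enumerate(
        ["N1", "N2", "N3", "N1", "N2", "N3"], start=1)}

    def remap_after_failure(mapping, failed, survivors):
        orphaned = [hv for hv, n in mapping.items() if n == failed]
        for i, hv in enumerate(orphaned):
            mapping[hv] = survivors[i % len(survivors)]  # spread evenly
        return mapping

    print(remap_after_failure(hv_to_node, "N2", ["N1", "N3"]))
    # HV2 -> N1 and HV5 -> N3, matching the example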
[Slide: the recovery timeline, keyed to the steps below. 4: PCM reconfiguration, write thaw, PCM release thaw. 5: pass 1 recovery. 6: locks claimed for recovery. 7: claims done, PCM locks thaw. 8: partially available. 9: pass 2 recovery. 10: individual block availability. 11: recovery enqueue released.]
Recovery in Oracle9i
The recovery path in Oracle9i involves the following steps:
1. The instance, or instances, dies.
2. The failure is detected by the cluster manager or cluster group services.
3. Parallel Cache Management (PCM) locks are frozen for a time, as are write requests.
Enqueue locks are reconfigured quickly and become available.
4. The DLM commences its recovery and remastering of the PCM locks, which involves rebuilding, on surviving instances, the lock masters lost due to the instance failures. When this is complete, pending activities are processed, after which PCM lock releases and down converts are allowed.
5. At the same time, the recovery code grabs the enqueue lock and does its first-pass recovery read of the log. It identifies the locks of the blocks that need to be recovered.
6. On completion of pass 1 and the DLM reconfiguration, recovery continues by
• obtaining buffer space for the recovery set, possibly doing writes to make room;
• claiming locks identified by pass 1;
• obtaining a source buffer, either from an instance’s buffer cache or by a disk read.
7. After the necessary locks are obtained, and the recovering instance has all the resources
it needs to complete pass 2 with no further intervention, the PCM lock space is
unfrozen.
8. The system is partially available, as blocks not in recovery may be operated on as
before. Blocks being recovered will be blocked by the locks held in the recovering
instance.
9. The cache marches through the second phase of its recovery, taking care of all blocks
identified in pass 1, recovering and writing each block, then releasing recovery locks.
10. Blocks become individually available as they are recovered, not all at once.
11. When all the blocks have been recovered, written, and recovery locks released, the
system is completely available, and the recovery enqueue is released.
In the case of a multiple failure, when neither the latest PI copy nor any current copy has survived, the changes made to the block may be spread over the logs of several failed instances. To ensure complete recovery, the logs must be merged. Because only the logs of the failed instances are required, the potential performance penalty for the log merge is proportional to
Number of failed instances × Size of log per instance
The size of the logs can be controlled by checkpoint features. This calculation shows that
the multi-instance recovery performance penalty is similar to the price that pre-Oracle9i
multi-instance databases paid without cache fusion, which required the successive
application of all logs of failed instances. Therefore the total performance penalty of
recovery prior to cache fusion is also proportional to
Number of failed instances × Size of log per instance
The additional requirement of the Cache Fusion design compared to the pre-Cache Fusion
design is to merge the logs of the failed instances. The number of operations required for
that scales linearly with the size of the merged data sets.
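Because each instance's log is already ordered by SCN, the merge is a k-way merge that is linear in the total size of the merged logs, which is the scaling property described above. A sketch in illustrative Python:

    import heapq

    # Sketch only: each failed instance's redo, ordered by SCN.
    log_a = [(1009, "redo for block 7"), (1013, "redo for block 7")]
    log_b = [(1011, "redo for block 9")]

    merged = list(heapq.merge(log_a, log_b))  # single SCN-ordered stream
    print(merged)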
[Slide: a primary and a secondary node, each running Real Application Clusters, Oracle HA Packs, a cluster framework, and a clustered system, tied together by a system management infrastructure. The nodes share RAID/mirrored storage and each has a local disk.]
Init file on node 1 (/hdisk1/dbs/initorac1.ora): SPFILE=/dev/rdisk1/spfile
Init file on node 2 (/hdisk1/dbs/initorac2.ora): SPFILE=/dev/rdisk1/spfile
Shared server parameter file (/dev/rdisk1/spfile):
orac1.instance_name=orac1
orac2.instance_name=orac2
…
GC_FILES_TO_LOCKS Parameter
Fixed locks were the only types of PCM locks available in the early days of multi-instance
Oracle databases. They were allocated at instance start up and persisted for the life of the
instance. To reduce the overhead of the DLM, fixed locks were 1:N locks— each lock
covered multiple blocks rather than just a single block.
In later versions, releasable locks were introduced. These locks were acquired from a pool of locks when required and released back to the pool when they were no longer needed by the instance. For the past few releases, releasable locks have been the default PCM locking method. By default, releasable locks are 1:1, that is, one lock covers exactly one block.
For data that could benefit from having more blocks to a lock, 1:N locks are the preferred strategy. This includes data in files that are accessed by only one instance or are accessed by several instances for read-mostly activity. In these cases, 1:N locks reduce the locking overhead for these files and improve performance. The initialization parameter GC_FILES_TO_LOCKS is used to assign 1:N locks to files. In previous releases, 1:N locks assigned with the GC_FILES_TO_LOCKS parameter could be defined as releasable, but defaulted to fixed.
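For illustration only (the file numbers and lock counts below are invented; check the exact syntax against the parameter documentation for your release), a setting such as

    GC_FILES_TO_LOCKS = "1=100:2-4=200EACH"

would cover file 1 with 100 1:N locks and give each of files 2, 3, and 4 its own 200 locks.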
Releasable locks have a number of advantages over fixed locks:
• Instance start up times are faster because there are no fixed locks to open
• You have more flexibility in assigning hash locks— because all the 1:N locks are not
allocated at start-up but are created on demand, more hash locks can be specified.
For this reason, fixed locks are eliminated from Oracle9i— 1:1 and 1:N locks are all
releasable. The option to define 1:N as releasable is therefore no longer needed and has
been dropped from the GC_FILES_TO_LOCKS parameter syntax.
Note: To avoid performance problems caused by pre-Oracle9i pinging, you should only use
GC_FILES_TO_LOCKS to assign 1:N PCM locks on:
• Read-only or read-mostly files and tablespaces
• Files containing data that is modified only, or mainly, by just one instance
GC_DEFER_TIME Parameter
The GC_DEFER_TIME parameter defined a number of one-hundredths of a second that an
instance would wait before responding to a request to release or downgrade a PCM lock.
The intent was to give the instance an opportunity to finish any current activity on the block
(or blocks) covered by the lock before taking action on the lock request. The benefits of
setting this parameter were minimal because, in general, only a few blocks were in use
when a lock request was received and the delay impacted all of the lock requests for non-active blocks. Tuning this parameter was difficult because of the lack of good guidelines and barely-measurable performance improvements. For these reasons, GC_DEFER_TIME has been made obsolete.
Note: GC_DEFER_TIME is being retained as an underscore (hidden) parameter,
_GC_DEFER_TIME.
GC_RELEASABLE_LOCKS Parameter
In Oracle8i and earlier releases, if GC_FILES_TO_LOCKS was used to assign hash locks to all the files in the database, then GC_RELEASABLE_LOCKS was occasionally set to less than DB_BLOCK_BUFFERS. This was done to save memory because fewer DLM locks were needed.
However, in Oracle9i, hash locks are used only for read-mostly tablespaces and the DLM locks are much smaller. For these reasons, there is no requirement to reduce the number of releasable locks. The number of releasable locks is fixed at DB_BLOCK_BUFFERS and the GC_RELEASABLE_LOCKS parameter has been made obsolete.
Note: GC_RELEASABLE_LOCKS is being retained as an underscore (hidden) parameter,
_GC_RELEASABLE_LOCKS.
GC_ROLLBACK_LOCKS Parameter
This parameter was used to specify the lock mapping for rollback segments. If there was a lot of pinging of UNDO blocks from a rollback segment, it needed fine-grain locks; if there was not much pinging, it needed coarse-grain locks. The idea was to balance the cost of pinging with the cost of getting locks, to achieve maximum performance.
The value of this parameter was reduced considerably by the introduction of the Consistent Read (CR) Server in Oracle8i. The CR Server created read consistent block images on the instance holding the rollback blocks and sent them to the requesting instance through the interconnect. This eliminated the need to send rollback blocks to the requesting instance.
This parameter is obsolete in Oracle9i. Internally the rollback segments are protected by
locks with a grouping of 16. Since the UNDO blocks are created sequentially, this large
grouping should provide the best performance. The grouping is not too large, however,
because query requests may be sent to a node which has aged out the pertinent blocks. In
this case, any rollback block needed to build a read consistent image is read into the cache
of the querying instance under a shared PCM lock. Until the shared lock is released, the
rollback blocks covered by the lock cannot be modified by the instance to which the
rollback segment is assigned.
Instance Naming
Prior to the release of Oracle9i, instances were identified internally by number. Instance-
specific database objects, such as redo threads and free list groups, were also associated
with numeric initialization parameters, such as thread and instance_number. Instances in
those earlier releases had names, which were assigned using the ORACLE_SID
environment variable at the operating system level and the optional instance_name
initialization parameter. However, there were limitations with these naming techniques:
• INSTANCE_NAME values did not have to be unique in different instances of the
same database
• On some platforms, the ORACLE_SID could be the same for all instances of the
same database
This meant that instance names could not be used by management tools to identify
instances. Also, if the thread and instance_number parameters were not specified in an
instance’s initialization file, values were assigned based on startup order— the first instance
to start was assigned thread 1, the second was assigned thread 2, and so on. Thus there was
never a guaranteed assignment of these database objects based on instance names.
In Oracle9i, each instance of an Oracle Real Application Cluster database is required to have a unique name assigned with the SID. The use of unique instance names enables the system management tools to use instance names to identify instances to the user, with the assurance that these names are unique.
Unique names also allow the instances associated with the same database to share an initialization file through the use of the SID as a parameter prefix, as described earlier.
[Slide: where the cluster configuration is stored. Pre-Oracle9i: the Registry on a Windows cluster or a text file on a UNIX cluster. Oracle9i: a raw device on both Windows and UNIX clusters.]
Pre-Installation
The OPSM software is installed as part of the Real Application Cluster Option. It is not
listed as a separately installable item in the Oracle Universal Installer.
As part of installation, DBCA already creates the Real Application Cluster configuration
according to the database name, SID prefix and list of nodes entered by the user. Users also
have to provide a raw device on which to store the OPSM configuration. Creating such a
raw device will be a pre-installation step.
Extended OPSCTL Commands
Configuration information is shared amongst the nodes by being stored as a binary file on a
shared raw disk, like the database files. You will no longer be able to change this mapping
by using a text editor. This configuration will need to be alterable via a command line
interface as well as by GUI tools such as DBCA.
New OPSCTL sub-commands allow you to configure Real Application Clusters from the
command line.
Windows Pre-Installation Tool
The Windows preinstallation tool is a wizard called the Cluster Setup Wizard. This tool incorporates OLM functionality, and users can create the symbolic links before installing OLM and the OSDs. A help system is integrated with this wizard.
Reference: For information on the tools used to diagnose Oracle Real Application Clusters
refer to the course Improved Diagnosability Features.
MSCS Concepts
Oracle9i Real Application Clusters uses an active-active shared storage cluster. That is, all the nodes in the cluster can be online and capable of processing transactions at the same time.
Microsoft's Windows has functionality to support clustering through the Microsoft Cluster Server (MSCS), previously known as Wolfpack. MSCS is a generic clustering solution that can be used to cluster virtually any Windows application. It is based on an active/passive design that emphasizes high availability rather than scalability or fault tolerance. MSCS works by having an application active on only one node at any given time. MSCS monitors the availability of the application and restarts the application on a standby node in the case of failure.
MSCS requires that all of the nodes in the cluster share at least one disk, which is used as a quorum. The shared disk is visible only to the active node in the cluster and is failed over if the active node fails. Clustered applications can put data, such as currently open documents, on the shared disk. This allows the data to be recovered on the secondary node in the case of an active node failure. Clustered applications can also use their local disks as long as the data is not required to fail over. Currently, MSCS supports only two-node clustering, but Microsoft has stated its intention to release an n-node version in the future.
MSCS Configuration
MSCS provides a way to configure the extended resource types in the initial creation wizard or, after the resource is created, in the cluster administrator GUI. Configuration is done in dialog boxes that are implemented in a separate module called the cluster administrator extension DLL. The Oracle9i Real Application Clusters resource type uses a cluster administrator extension DLL to allow proper configuration.
The Oracle9i Real Application Clusters architecture is designed to allow hardware vendors
to provide system dependent clusterware in modules collectively called the Oracle System
Dependent modules (OSD). Currently, most Windows modules are shipped with the
reference implementation supplied by Oracle. The OSD cluster manager (CM) is a module
which monitors the health of the instances and dependent processes in the cluster. An
Oracle9i Real Application Clusters MSCS resource interfaces with the CM. Hardware
vendors who wish to supply MSCS-enabled OSDs must either ship the Oracle reference modules or integrate the changes into their CM modules.
Configuration Parameters
The cluster administrator extension DLL allows the user to configure the following
parameters:
• The name of the database. This is needed to distinguish between multiple databases
on the cluster.
• The instance designator, which uniquely and globally differentiates instances in the cluster.
• The behavior of the resource when the database is brought online: either to start the
Oracle9i Real Application Clusters instance and mount it when MSCS calls the
Online function or to override this functionality.
• The Oracle Net connect information for OCI queries used by the IsAlive function.
• The behavior of the Oracle9i Real Application Clusters resource when the database
is brought offline. The options are
– To stop the service
– To issue a SHUTDOWN command
– To do nothing.
The default behavior is to stop and start the database services when the database is brought
online and offline.
Online Function
The Online function either does nothing or else starts the database instances, depending on
the configuration specified by the user via the cluster administrator extension DLL.
If configured to bring up the database, the function first checks to see if the database service
is up. If it is not, the function starts the service and then mounts the database. It does not
return until confirming that the new database instance is functioning correctly or validating
the already-running instance.
Should the function not be able to bring up the instance, it returns an error. The amount of
time the function waits before concluding that there is a problem can be configured by the
cluster administrator extension DLL.
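The configurable behavior described above can be summarized in a sketch (illustrative Python; the real resource DLL is native code, and these function names are invented for the sketch):

    # Sketch only: the Online function's decision flow.
    def online(start_on_online, service_up, start_service, mount_db, healthy):
        if not start_on_online:
            return "OK"  # user configured Online to override startup
        if not service_up():
            start_service()
            mount_db()
        # Wait (up to the configured timeout) for a healthy instance.
        return "OK" if healthy() else "ERROR"

    print(online(True, lambda: False, lambda: None, lambda: None, lambda: True))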
Offline Function
The Offline function either does nothing or stops the database instance depending on the
configuration specified by the user in the cluster administrator extension DLL. When
shutting down the database it is possible to specify whether the service is “shutdown”
(process alive but not mounted) or completely stopped (process terminated).
LookAlive Function
The MSCS specifications require the LookAlive function to take less than 30 milliseconds to make a best-guess assessment of the database instance's health. This function makes use of the CM's knowledge of group membership.
To avoid split-brain problems, the serving nodes recognized by Oracle9i Real Application
Clusters and by MSCS are the same in case the communication between nodes is severed.
The LookAlive function relies on this integrity between the two products.
IsAlive Function
The IsAlive function can take more time than the LookAlive function to determine whether the instance is available.
In addition to the IsAlive and LookAlive functions, the MSCS resource type DLL registers
an event with MSCS which it uses to signal a failure more quickly. The Oracle9i Real
Application Clusters resource type DLL uses this functionality to reduce the latency of
error detection.
Terminate Function
The Terminate function attempts to stop the database, using a normal shutdown to avoid the
overhead of instance recovery. If it is unable to do this, it uses more drastic measures such
as killing the instance process.