
RAC Workload Management

Originally presented at Hotsos Symposium


March 2008
Alex Gorbachev
The Pythian Group
gorbachev@pythian.com
Table of Contents
Introduction
Why Workload Management?
Programming languages
Connection establishing basics
    Client side connection load balancing
Oracle Cluster Services
    Online orders processing example
    Preferred vs. available instances
    Node affinity
    Server-side FAN callouts
    Services and Resource Manager
    Services abstraction for applications
Server-side connection load balancing
    Server-side CLB without services
    Server-side CLB with services
    Bug 5593693 using db_domain
    Nodes with different capacity
Fast Application Notifications
    HA Events
    Load Balancing Advisory events
Run-time load balancing on the client-side
    RLB with JDBC Implicit Connection Cache
    Debugging JDBC run-time load balancing on the client
    Oracle Notifications Client - ONC
Make your own LBA
Final thoughts
References
Oracle RAC Workload Management Alex Gorbachev

Introduction
While I was working on one of my previous presentations, RAC Connection
Management, I realized that the topic is too broad for a single presentation and
probably requires a full-day class to cover all the details of connection management in
Oracle. I was only able to cover the basics in my 45-60 minute presentations and give a
brief overview of the more advanced topics.

One of the areas that attracted the most interest was workload balancing using new
Oracle 10g features such as Oracle Cluster Services, Fast Application Notifications
(FAN) and Load Balancing Advisory (LBA). This paper focuses on internal
implementation details and possible pitfalls rather than the how-to instructions found in
the manuals. This should assist in troubleshooting and help you understand the
technology better.

It's assumed that the reader is aware of the general RAC architecture and has a basic
understanding of connection management in RAC - how connections are established
and failed over, and what the roles of the client process, listener and Oracle instance are.

Why Workload Management?


The purpose of Oracle database workload management is to achieve the most efficient
distribution of load across the available RAC instances. The keyword here is
efficiency. Depending on your requirements, efficiency has different meanings. Here are
some of the common targets:

Average response time

Stability of response time or guaranteed response time

Average throughput

Different applications have different efficiency criteria, and a lot depends on whether
it's OLTP, data warehouse, batch or reporting functionality.

Programming languages
You might think that regardless of application development languages, database
workload management techniques would stay the same. Yes and no.

Oracle has made some effort to standardize connection management across
different client drivers and application languages, but there are still some differences in
configuration and implementation details. The principle, though, stays the same.

Copyright The Pythian Group 2008

In order to distinguish the differences, we often need to consider four types of client
drivers:

OCI - Oracle Call Interface. It's used in C and C++ applications as well as in some other
languages that are built on top of the OCI layer, such as Perl and PHP.

Thick JDBC - these are Java libraries built around the OCI libraries to implement
the JDBC API standard for the Java language.

Native JDBC or Thin JDBC - this is a pure Java implementation that doesn't need
the underlying OCI layer.

ODP.NET - Oracle Data Provider for .NET. These are Oracle-provided drivers for the
Microsoft .NET environment, typically used from C# and VB.NET. It's actually based on
OCI with some additional features and .NET integration.

When we get to run-time workload balancing, the examples of run-time load
balancing and the ONS API use the thin JDBC driver. OCI and ODP.NET specifics can be
found in [BLUNDHILD].

Connection establishing basics


I covered the basics of connection management in my presentation RAC
Connection Management, if you had a chance to see it. James Morle wrote an excellent
paper [JMORLE] in which he covered the connection establishment steps and provided
great detail on connection failover.

For the purpose of this presentation you will need to distinguish the steps of the
connection process and a few basic principles.

Client side connection load balancing

The client connection descriptor for a RAC database will be in the following form
regardless of client drivers:

(DESCRIPTION=
  (FAILOVER=ON)
  (ADDRESS_LIST=
    (LOAD_BALANCE=ON)
    (ADDRESS=(PROTOCOL=TCP)(HOST=lh1-vip.oracloid.com)(PORT=1521))
    (ADDRESS=(PROTOCOL=TCP)(HOST=lh2-vip.oracloid.com)(PORT=1521))
    (ADDRESS=(PROTOCOL=TCP)(HOST=lh3-vip.oracloid.com)(PORT=1521))
    (ADDRESS=(PROTOCOL=TCP)(HOST=lh4-vip.oracloid.com)(PORT=1521))
  )
  (CONNECT_DATA=(SERVICE_NAME=service10g.oracloid.com))
)


There are several key points you should know in order to make your application
connection descriptors RAC-aware:

Several connection points should be referenced - you want to include all listeners of
your RAC cluster, or enough of them that they are never all down at the same time.

You want to use VIP addresses. See [JMORLE] on why you should do so.

Specify LOAD_BALANCE=ON so that the client program picks a random listener. This is
the first stage of workload balancing, which is especially important for applications with
short-lived connections. It's often known as client-side connection load balancing and
distributes the workload generated by new connection requests amongst all listeners.

Specify FAILOVER=ON to make connection requests try another address on failure.
This provides transparency when some of the listeners are not available to establish
a connection.

Using SERVICE_NAME instead of SID is a crucial part of a RAC connection descriptor.
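
The key points above can be collected into a small helper that assembles a descriptor of the same shape from a list of listener hosts. This is only a sketch: the function name build_descriptor and its parameters are hypothetical and not part of any Oracle tool, and it covers only the settings discussed here.

```python
def build_descriptor(hosts, service_name, port=1521):
    """Return a connect descriptor with one ADDRESS per listener host.

    Hypothetical helper: it only assembles the text form shown in this paper.
    """
    if not hosts:
        raise ValueError("at least one listener host is required")
    addresses = "".join(
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))" for host in hosts
    )
    return (
        "(DESCRIPTION="
        "(FAILOVER=ON)"                                   # try another address on failure
        f"(ADDRESS_LIST=(LOAD_BALANCE=ON){addresses})"    # pick addresses randomly
        f"(CONNECT_DATA=(SERVICE_NAME={service_name}))"   # service, not SID
        ")"
    )
```

Calling build_descriptor with the four VIP host names and service10g.oracloid.com yields the descriptor shown earlier, written on a single line.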

Here is what happens when a client issues a connection request using the descriptor
above:

A random address is selected from the list.

If the connection attempt fails (no listener, network timeout, listener doesn't know of
the requested service_name), then another address is chosen randomly.

James Morle in [JMORLE] concluded that client-side connection load balancing seems
to provide a pretty much uniform distribution, and my own observations confirm it.
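
The selection logic is easy to model. The sketch below is a simulation rather than Oracle client code: it assumes that random selection with failover is equivalent to trying the addresses in a shuffled order, and the names connect and simulate are made up for illustration.

```python
import random
from collections import Counter


def connect(addresses, is_up, rng):
    """Try the addresses in random order; return the first that accepts."""
    candidates = list(addresses)
    rng.shuffle(candidates)        # LOAD_BALANCE=ON: random order
    for address in candidates:     # FAILOVER=ON: move on after a failure
        if is_up(address):
            return address
    return None                    # every listed address failed


def simulate(addresses, down, attempts, seed=0):
    """Count which address ends up serving each of `attempts` requests."""
    rng = random.Random(seed)
    hits = Counter()
    for _ in range(attempts):
        hits[connect(addresses, lambda a: a not in down, rng)] += 1
    return hits
```

With all listeners up, the counts per address come out close to equal. With one listener down, its share is spread over the remaining addresses.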

I want to reiterate that client-side connection load balancing is done by the Oracle
client, which randomly chooses an address from the address list of a connection
descriptor. This causes connection requests to be distributed across all referenced
listeners.

Now, depending on the listener configuration, the listener itself can forward a
connection request to any of the instances providing the requested service if remote
listener configuration is in place; more about that later when we discuss Oracle Cluster
Services.

Right now we need to distinguish two cases:


1. Each listener is only aware of the instances and, consequently, the services available
on the same node that the listener is running on. This is called local-only listener
registration. An Oracle instance reports only to the locally running listener. When
the listener receives a connection request, the connection can only be established with
the local database instance. If local instances don't provide the requested service, an
error is returned saying that the listener is not aware of any instance providing the
requested service. The connection refusal will cause the client to try another address
thanks to FAILOVER=ON. Using *only* local listener registration leaves only client-side
connection load balancing in place.

2. In addition to local listener registration, remote listeners are also configured
(remote_listener init.ora parameter). With properly configured remote listener
registration, each listener is aware of all instances providing each service, and a
connection request is forwarded to one of those instances. This is called server-side
connection load balancing and is discussed later. Remote listener registration lets the
client balance connection requests across available listeners while leaving
to the listener the decision about which instance should receive the new connection.
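
The difference between the two cases can be illustrated with a toy model. Everything in it is an assumption made for the example: the three-node layout, the service placement, and the random choice that stands in for the listener's real, load-based decision.

```python
import random

# Hypothetical cluster: the instance on each node and the services it offers.
INSTANCE_ON_NODE = {"lh1": "g41", "lh2": "g42", "lh3": "g43"}
SERVICES = {"g41": {"new_ord"}, "g42": {"content"}, "g43": {"batch"}}


def listener_view(node, remote_registration):
    """Instances that the listener on `node` knows about."""
    if remote_registration:
        return set(INSTANCE_ON_NODE.values())   # case 2: all instances
    return {INSTANCE_ON_NODE[node]}             # case 1: local instance only


def connect(service, remote_registration, rng):
    """Return (instance, listeners_tried) for one connection request."""
    nodes = list(INSTANCE_ON_NODE)
    rng.shuffle(nodes)                          # client-side load balancing
    for tried, node in enumerate(nodes, start=1):
        known = listener_view(node, remote_registration)
        offering = sorted(i for i in known if service in SERVICES[i])
        if offering:
            # Stand-in for the listener's choice, which is load-based in reality.
            return rng.choice(offering), tried
        # Listener knows no instance for the service: refusal, client fails over.
    return None, len(nodes)
```

With remote registration the first listener contacted can always place the request. With local-only registration the client may need up to three attempts before it reaches the one listener that knows the service.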

Oracle Cluster Services


Oracle Cluster Services should be considered the main concept of Oracle database
workload management. It's an integration layer that ties together several other
features. Those features are mostly applicable in RAC environments but can be useful
for single-instance configurations as well.

A service can be thought of as a subset of application functionality or, the other way
around, a group of applications, usually with unique workload requirements and traffic
patterns, that can be managed as a whole. Oracle database allows a DBA to specify how
workload is handled for each service to fit the availability and performance requirements
of certain business functions.

The best way to understand the Oracle Cluster Services concept and capabilities is to
use an example.

Online orders processing example

Let's take a simple example of an online order processing system. The most important
business function is taking customers' orders. It has direct end-user impact, and if the
system is not available to take new orders, it's a direct hit to the revenue stream.
Another function is web content display, which doesn't hit the database directly but
is served from an application cache refreshed asynchronously from the database. Visitors
can also leave feedback on orders and items, which is more critical than content refresh
because an outage has direct end-user impact. However, it's not as important for the
business as taking orders. There is a separate order processing application - a back-end
application that is


used by internal users for order handling and shipment. The last piece is a data
extraction process for a corporate financial application and a few other batches.

What we can do is create several services in the Oracle database that are mapped to the
application functions we identified. Based on the figures from capacity planning, we
know how much CPU capacity we need for each component, so it's possible to arrange
workload with affinity to certain nodes to minimize RAC overhead due to Cache Fusion.

Preferred vs. available instances

When services are created in an Oracle database, we can specify which instances provide
a particular service by default. Those are called preferred instances. We can have several
instances providing the same service as well as several services provided by the same
instance.

We can also define where the service can run in case one of the preferred instances is
not available. Those are potential (or backup) instances. In Oracle Cluster Services
terminology, these instances are available to run the service.

For our example, we create the NEW_ORD service on all instances and make DB1 and
DB2 preferred, while DB3 and DB4 are available to take over. The CONTENT
service requires most of the resources; it runs on 3 nodes and has the fourth node as
available. Order processing and batch data extraction services are concentrated on one
node to avoid impact on more sensitive business functions.

[Figure: Oracle Services for online order processing system. Instances DB1, DB2, DB3
and DB4; NEW_ORD, CONTENT and FEEDBACK are shown on all four instances, PROC_ORD and
BATCH on two instances each.]

Oracle Clusterware will automatically bring service up on one or more of the available
instances in case one or more preferred instances become unavailable. If instance DB1


becomes unavailable, Oracle will shift NEW_ORD to one of DB3 or DB4 and bring
CONTENT service up on DB4.
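
The failover behaviour described above can be sketched as a small function. This is a model under stated assumptions, not the Clusterware algorithm: it assumes that each lost preferred instance is replaced by the first surviving available instance in list order, and the name fail_over is hypothetical.

```python
def fail_over(running, available, up):
    """Return the instances that run a service after a failure.

    running   - instances that ran the service before the failure
    available - instances configured as available (backup) for the service
    up        - set of instances that are still up
    """
    survivors = [inst for inst in running if inst in up]
    lost = len(running) - len(survivors)
    # Assumption: spares are taken in the order they were configured.
    spares = [i for i in available if i in up and i not in survivors]
    return survivors + spares[:lost]
```

For the example, with DB1 down, fail_over(["DB1", "DB2"], ["DB3", "DB4"], {"DB2", "DB3", "DB4"}) places NEW_ORD on DB2 and DB3, and CONTENT ends up on DB2, DB3 and DB4.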

Which component of the Oracle RAC software stack controls service placement? Prior to
Oracle 10g, a DBA could manually control the services provided by each instance using
the SERVICE_NAMES init.ora parameter. However, there was no automation and no notion of
preferred and available instances. 10g still provides this possibility but, with the
introduction of Clusterware, a DBA can now create CRS resources for services. Preferred
and available instances are defined in Clusterware, and the CRS component is responsible
for monitoring service availability and failing services over as required.

The rules of service failover are very simplistic. For example, Oracle Clusterware doesn't
automatically shift services back when a preferred instance comes back to the cluster.
The server-side FAN callout mechanism allows us to implement more complex algorithms.
For example, we can implement a rule that stops the BATCH service in case the
CONTENT or NEW_ORD services are running on the DB4 instance.

Node affinity

As I mentioned already, we can arrange services in a way that minimizes Cache
Fusion impact in RAC. Let's say that the online order processing application wasn't
designed very well, just like in real life. Scaling an ill-designed application by moving it
to RAC is one of the most disastrous projects I can imagine.

Luckily, three functional areas (NEW_ORD, CONTENT, FEEDBACK) work mostly with
their own subsets of database tables with very light overlap. If each of the three services
can be satisfied with the capacity of a single node, we can distribute services in the
following way:


[Figure: Oracle Services for online order processing system - node affinity. Instances
DB1, DB2, DB3 and DB4; NEW_ORD, CONTENT, FEEDBACK and PROC_ORD are each shown on two
instances, BATCH on one.]

If instance DB1 becomes unavailable, Oracle Clusterware will move the NEW_ORD service
to the DB4 instance, which is available for the NEW_ORD service. However, the order
processing back-end application and the batches are running on that node, and that would
negatively impact the response time of new order placement. One workaround is manual
intervention by a DBA, who can stop the BATCH and PROC_ORD services on the DB4 instance
since they are not critical, but it might take a while for the DBA to react.

What if we had a mechanism to automate this manual DBA response? This is where
server-side FAN callouts come into play.

Server-side FAN callouts

We are going to cover FAN events in more detail soon, but for now it suffices to know
that Oracle Clusterware keeps track of certain events on the database server, including
instances failing, starting or stopping, and services coming up and down on each
node.

The server-side FAN callout mechanism provides a simple solution to a DBA - we can
place an executable shell script in the $ORA_CRS_HOME/racg/usrco directory of the CRS
Oracle home, and Oracle Clusterware will call this script on every HA event and pass
the event information on the command line.
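
A callout does not have to be a shell script to illustrate the idea. The sketch below shows how the command-line arguments could be turned into a dictionary. It assumes the arguments are an event type followed by name=value pairs, which is the shape of the events logged later in this paper. The function name parse_fan_event is made up for illustration.

```python
def parse_fan_event(argv):
    """Split callout arguments into (event_type, properties)."""
    if not argv:
        raise ValueError("no event arguments")
    event_type, props, last_key = argv[0], {}, None
    for token in argv[1:]:
        key, sep, value = token.partition("=")
        if sep:
            last_key = key.lower()          # VERSION and version are treated alike
            props[last_key] = value
        elif last_key is not None:
            # A value containing a space (such as a timestamp) arrives as
            # two arguments; glue the second part back on.
            props[last_key] += " " + token
    return event_type, props
```

A callout written this way would call parse_fan_event(sys.argv[1:]) and then branch on the event type and on properties such as status and reason.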

Here is an example. I will just use 3 nodes and skip the FEEDBACK service. First, let's
create 4 services - new_ord, content, proc_ord and batch:

srvctl add service -d g4 -s new_ord -r g41 -a g43
srvctl add service -d g4 -s content -r g42 -a g43


srvctl add service -d g4 -s proc_ord -r g43 -a g42
srvctl add service -d g4 -s batch -r g43

My database name is g4 and I have 3 instances - g41, g42 and g43. Let's start the
services and check the status:

[oracle@lh1 ~]$ srvctl config service -d g4
new_ord PREF: g41 AVAIL: g43
content PREF: g42 AVAIL: g43
proc_ord PREF: g43 AVAIL: g42
batch PREF: g43 AVAIL:
[oracle@lh1 ~]$ srvctl start service -d g4
[oracle@lh1 ~]$ srvctl status service -d g4
Service new_ord is running on instance(s) g41
Service content is running on instance(s) g42
Service proc_ord is running on instance(s) g43
Service batch is running on instance(s) g43

Now let's stop the g41 instance and see what happens:

[oracle@lh1 ~]$ srvctl stop instance -d g4 -i g41
[oracle@lh1 ~]$ srvctl status service -d g4
Service service10g is not running.
Service new_ord is running on instance(s) g43
Service content is running on instance(s) g42
Service proc_ord is running on instance(s) g43
Service batch is running on instance(s) g43

When I stopped the instance, CRS triggered 3 events:

Instance g41 down on node lh1; reason - user request

Service new_ord down for instance g41 on node lh1; reason - failure (instance is
gone)

Service new_ord up for instance g43 on node lh3; reason - failure (relocated from lh1)

The result is that instance g43 hosts the critical new_ord service as well as 2 other
heavyweights - proc_ord and batch. What we want for our automated response is to stop
the proc_ord and batch services in case either the new_ord or the content service is
started on instance g43 as the result of a failure and not of a direct user request.
Indeed, we still want to be able to manipulate services manually without automation
kicking in.
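
The rule itself can be written down as a pure function before any script is built around it. The following is a sketch under assumptions: the property names (service, database, instance, status, reason) follow the FAN events logged later in this paper, the function name commands_for is hypothetical, and the commands are only returned, not executed.

```python
CRITICAL = {"new_ord", "content"}     # services that must keep their capacity
SHEDDABLE = ["proc_ord", "batch"]     # services we are willing to stop


def commands_for(event_type, props, database="g4", instance="g43"):
    """Return the srvctl commands to run for one FAN event (possibly none)."""
    service = props.get("service", "").split(".")[0]   # drop the domain part
    matches = (
        event_type == "SERVICEMEMBER"
        and service in CRITICAL
        and props.get("database") == database
        and props.get("instance") == instance
        and props.get("status") == "up"
        and props.get("reason") == "failure"           # not a manual relocation
    )
    if not matches:
        return []
    return [
        ["srvctl", "stop", "service", "-d", database, "-i", instance, "-s", name]
        for name in SHEDDABLE
    ]
```

Keeping the decision separate from the execution makes the rule easy to check against recorded events before it is allowed to stop anything.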

How do I know what events are triggered? Besides the reference ([ORACADG] Chapter
4, section Fast Application Notification High Availability Events), we can create a simple
FAN callout script to log all events:


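#!/bin/sh
# Append every FAN event received from Oracle Clusterware to a log file,
# together with the local time at which it was reported.
FAN_LOGFILE=/nfs1/oracle/product/10.2.0/crs/log/`hostname`/racg/fan_ha_events.log
echo "$* 'reported='`date`" >> $FAN_LOGFILE &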

This is what we've got on node lh1.oracloid.com during instance shutdown:

INSTANCE VERSION=1.0 service=g4.oracloid.com database=g4
instance=g41 host=lh1 status=down reason=user
timestamp=08-Feb-2008 12:30:40 'reported='Fri Feb 8 12:30:40 EST 2008

SERVICEMEMBER VERSION=1.0 service=new_ord.oracloid.com
database=g4 instance=g41 host=lh1 status=down reason=failure
timestamp=08-Feb-2008 12:30:41 'reported='Fri Feb 8 12:30:41 EST 2008

and on lh3.oracloid.com:

SERVICEMEMBER VERSION=1.0 service=new_ord.oracloid.com
database=g4 instance=g43 host=lh3 status=up reason=failure
card=1 timestamp=08-Feb-2008 12:15:53 'reported='Fri Feb 8 12:15:54 EST 2008

So let's create a simple callout script:

#!/bin/sh

ORA_CRS_HOME=/nfs1/oracle/oracle/product/10.2.0/crs
SRVCTL=$ORA_CRS_HOME/bin/srvctl
LOG=$ORA_CRS_HOME/log/service_rebalance.log

# The first argument is the event type; the rest are name=value properties.
EVENTTYPE=$1

for ARGS in "$@" ; do
  PROPERTY=`echo $ARGS | awk -F"=" '{print $1}'`
  VALUE=`echo $ARGS | awk -F"=" '{print $2}'`
  case $PROPERTY in
    VERSION|version) VERSION=$VALUE ;;
    # The events above carry the service with its domain
    # (new_ord.oracloid.com); keep only the short name for the comparison.
    SERVICE|service) SERVICE=${VALUE%%.*} ;;
    DATABASE|database) DATABASE=$VALUE ;;
    INSTANCE|instance) INSTANCE=$VALUE ;;
    HOST|host) HOST=$VALUE ;;
    STATUS|status) STATUS=$VALUE ;;
    REASON|reason) REASON=$VALUE ;;
    CARD|card) CARDINALITY=$VALUE ;;
    TIMESTAMP|timestamp) LOGDATE=$VALUE ;;


    ??:??:??) LOGTIME=$PROPERTY ;;
  esac
done

# Act only when a critical service came up on g43 because of a failure.
if [ "$EVENTTYPE" = "SERVICEMEMBER" ] &&
   ([ "$SERVICE" = "new_ord" ] || [ "$SERVICE" = "content" ]) &&
   [ "$DATABASE" = "g4" ] &&
   [ "$INSTANCE" = "g43" ] &&
   [ "$STATUS" = "up" ] &&
   [ "$REASON" = "failure" ] ; then

  echo "`hostname` `date`: Service $SERVICE is up on instance $INSTANCE due to a failure. Stopping proc_ord and batch services..." >> $LOG

  # Log each command before running it, then capture its output.
  echo "$SRVCTL stop service -d $DATABASE -i $INSTANCE -s proc_ord" >> $LOG
  $SRVCTL stop service -d $DATABASE -i $INSTANCE -s proc_ord >> $LOG 2>&1

  echo "$SRVCTL stop service -d $DATABASE -i $INSTANCE -s batch" >> $LOG
  $SRVCTL stop service -d $DATABASE -i $INSTANCE -s batch >> $LOG 2>&1

  echo "$SRVCTL status service -d $DATABASE" >> $LOG
  $SRVCTL status service -d $DATABASE >> $LOG 2>&1

fi

The script parses the arguments and checks for matching events. If the event says that
one of the services new_ord or content is up on instance g43 as the result of a failure,
then it stops the proc_ord and batch services. Here is the result from the log (formatted
for readability):

lh1.oracloid.com Fri Feb 8 13:22:46 EST 2008: Service new_ord is up
on instance g43 due to a failure. Stopping proc_ord and batch services...
/nfs1/oracle/oracle/product/10.2.0/crs/bin/srvctl stop service -d g4 -i g43 -s proc_ord
/nfs1/oracle/oracle/product/10.2.0/crs/bin/srvctl stop service -d g4 -i g43 -s batch
/nfs1/oracle/oracle/product/10.2.0/crs/bin/srvctl status service -d g4
Service service10g is not running.
Service new_ord is running on instance(s) g43
Service content is running on instance(s) g42
Service proc_ord is not running.
Service batch is not running.

You can also use server-side FAN callouts to relocate services back to the preferred
instances as soon as they become available. You can find this and some other examples
in [OTNSAMPLE].


Is there a way to avoid such relatively complex manipulations? This example has only
three nodes and four services, but imagine a ten-node cluster with 42 services. That
gets more complex than a chess game, as one of my customers said. This is when
Resource Manager comes in handy.

Services and Resource Manager

Resource Manager is an Oracle feature that allows a DBA to prioritize workload within
an Oracle database instance. The purpose of Resource Manager is to favor more critical
business functions when resources are scarce.

Applied to our example of the online order processing system, we can consider
configuring Resource Manager to minimize the impact of the data extraction batches and
the order processing back-end application. This way we don't need to stop those lower
priority services but can rather rely on Resource Manager to prioritize workload in
accordance with a predefined resource plan.

The discussion of Resource Manager functionality is outside the scope of this paper.
Refer to [ODAG] for more details. However, Resource Manager is important to mention
here because a service can be used as a criterion to automatically assign a session to a
certain resource consumer group.

The Resource Manager PL/SQL API is provided by the DBMS_RESOURCE_MANAGER package. We
can use the DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING procedure to
map certain services to appropriate consumer groups. Assuming we have consumer
groups identified for each of our example services with the same name plus a _GROUP
suffix, we can configure the mapping as follows:

BEGIN
  -- Mapping changes are made in a pending area and take effect on submit.
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING
    (DBMS_RESOURCE_MANAGER.SERVICE_NAME, 'NEW_ORD', 'NEW_ORD_GROUP');
  DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING
    (DBMS_RESOURCE_MANAGER.SERVICE_NAME, 'CONTENT', 'CONTENT_GROUP');
  DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING
    (DBMS_RESOURCE_MANAGER.SERVICE_NAME, 'FEEDBACK', 'FEEDBACK_GROUP');
  DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING
    (DBMS_RESOURCE_MANAGER.SERVICE_NAME, 'PROC_ORD', 'PROC_ORD_GROUP');
  DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING
    (DBMS_RESOURCE_MANAGER.SERVICE_NAME, 'BATCH', 'BATCH_GROUP');
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/


Using the Resource Manager API, we should configure resource allocation for those groups
with priorities according to the business impact, i.e. NEW_ORD_GROUP and
FEEDBACK_GROUP would have the highest priority, followed by CONTENT_GROUP
and then by PROC_ORD_GROUP and BATCH_GROUP. For more details on consumer
group mapping see [ODAG], section Specifying Session-to-Consumer Group Mapping
Rules.

Please note that Resource Manager is not RAC-aware, meaning that it can only control
resource allocation between consumer groups on each instance in isolation and not
across the whole cluster. Another limitation is that Resource Manager works inside one
instance only, so if you have two or more instances on the node, it won't be able to take
that into account.

Now is a good time to cover the integration of services with AWR.

Services and AWR

As you probably know, Oracle 10g has a new set of features known under the
common name Automatic Workload Repository (AWR). I think I can safely say that,
in a nutshell, AWR is a hybrid of Statspack and 10046 trace. AWR captures a lot of
statistics at a very detailed granularity. AWR also provides several aggregated views on
this data, and one of them is the service-aggregated perspective.

Here are some of the Oracle views with service-aggregated performance data:

V$SERVICE_EVENT
V$SERVICE_STATS
V$SERVICE_WAIT_CLASS
V$SERVICEMETRIC
V$SERVICEMETRIC_HISTORY
DBA_HIST_SERVICE_%

Service-aggregated statistics collected by AWR are also used for server-side
connection load balancing as well as for run-time load balancing, both of which are
discussed later in this paper.

Services abstraction for applications

How does an application know where the service is running now? Well, the answer is - it
doesn't. Recall that client-side connection load balancing picks a listener randomly.
What happens next is that the listener directs the connection request to one of the
instances providing the requested service, so each listener should know which services
are available on each node of the cluster.


This awareness can be achieved by setting the remote_listener init.ora parameter
appropriately. It's discussed in more detail in the section about server-side connection
load balancing.

PMON will notify the listeners when services become available on its instance. Jumping
a little ahead, I should say that listeners also subscribe to HA events and can quickly
clean up services from failed instances so that new connection requests are not routed
there.

We can use lsnrctl status (or lsnrctl service for more details) to display
information about the services that the listener is aware of:

[oracle@lh3 ~]$ lsnrctl status
...
Service "batch.oracloid.com" has 1 instance(s).
  Instance "g43", status READY, has 2 handler(s) for this service...
Service "content.oracloid.com" has 1 instance(s).
  Instance "g42", status READY, has 1 handler(s) for this service...
Service "g4.oracloid.com" has 3 instance(s).
  Instance "g41", status READY, has 1 handler(s) for this service...
  Instance "g42", status READY, has 1 handler(s) for this service...
  Instance "g43", status READY, has 2 handler(s) for this service...
Service "new_ord.oracloid.com" has 1 instance(s).
  Instance "g41", status READY, has 1 handler(s) for this service...
Service "proc_ord.oracloid.com" has 1 instance(s).
  Instance "g43", status READY, has 2 handler(s) for this service...
The command completed successfully

[oracle@lh3 admin]$ lsnrctl service
...
Service "batch.oracloid.com" has 1 instance(s).
  Instance "g43", status READY, has 2 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0 state:ready
         LOCAL SERVER
      "DEDICATED" established:0 refused:0 state:ready
         REMOTE SERVER
         (ADDRESS=(PROTOCOL=TCP)(HOST=lh3-vip)(PORT=1521))


Service "content.oracloid.com" has 1 instance(s).
  Instance "g42", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0 state:ready
         REMOTE SERVER
         (ADDRESS=(PROTOCOL=TCP)(HOST=lh2-vip)(PORT=1521))
Service "g4.oracloid.com" has 3 instance(s).
  Instance "g41", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0 state:ready
         REMOTE SERVER
         (ADDRESS=(PROTOCOL=TCP)(HOST=lh1-vip)(PORT=1521))
  Instance "g42", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0 state:ready
         REMOTE SERVER
         (ADDRESS=(PROTOCOL=TCP)(HOST=lh2-vip)(PORT=1521))
  Instance "g43", status READY, has 2 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0 state:ready
         LOCAL SERVER
      "DEDICATED" established:0 refused:0 state:ready
         REMOTE SERVER
         (ADDRESS=(PROTOCOL=TCP)(HOST=lh3-vip)(PORT=1521))
Service "new_ord.oracloid.com" has 1 instance(s).
  Instance "g41", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0 state:ready
         REMOTE SERVER
         (ADDRESS=(PROTOCOL=TCP)(HOST=lh1-vip)(PORT=1521))
Service "proc_ord.oracloid.com" has 1 instance(s).
  Instance "g43", status READY, has 2 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0 state:ready
         LOCAL SERVER
      "DEDICATED" established:0 refused:0 state:ready
         REMOTE SERVER
         (ADDRESS=(PROTOCOL=TCP)(HOST=lh3-vip)(PORT=1521))
The command completed successfully

We can see from the output above that the listener on the lh3 node is aware of all
services, be they local (proc_ord and batch) or remote (new_ord and content).
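
When there are many services, it can help to condense this output into a service-to-instances map. The sketch below assumes the Service and Instance lines look like the ones in the listings above and that each Instance line appears after its Service line; the function name services_by_instance is made up for illustration.

```python
import re

SERVICE_RE = re.compile(r'^\s*Service "([^"]+)" has \d+ instance\(s\)')
INSTANCE_RE = re.compile(r'^\s*Instance "([^"]+)", status \w+')


def services_by_instance(lsnrctl_output):
    """Map each service name to the list of instances registered for it."""
    services = {}
    current = None
    for line in lsnrctl_output.splitlines():
        match = SERVICE_RE.match(line)
        if match:
            current = match.group(1)
            services.setdefault(current, [])
            continue
        match = INSTANCE_RE.match(line)
        if match and current is not None:
            services[current].append(match.group(1))
    return services
```

Feeding it the captured text of lsnrctl status gives, for example, g4.oracloid.com mapped to g41, g42 and g43, which makes a missing registration easy to spot.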


There is also a default service with the same name as DB_NAME, g4, that is up on all
nodes by default. We normally shouldn't touch or modify this service, so if
some of your workload must be distributed across all instances of the cluster, don't use
the default service but rather create a new one with all instances as preferred.

At this point it's probably appropriate to ask a very important question - how does the
listener decide which instance to assign to a new connection if more than one
is available for the requested service?

Server-side connection load balancing


When a listener has more than one instance providing the requested service, it needs to
decide which instance the new connection request should be routed to.

Before we start a more detailed discussion of the mechanisms of server-side connection
load balancing, let's quickly review how to configure it properly. I suggest you check
[JMORLE] for more details on virtual IPs and why they should be used.

For a three-node cluster we need the following entries in tnsnames.ora in
$ORACLE_HOME/network/admin:

LISTENER_LH1 =
  (ADDRESS = (PROTOCOL = TCP)(HOST = lh1-vip)(PORT = 1521))

LISTENER_LH2 =
  (ADDRESS = (PROTOCOL = TCP)(HOST = lh2-vip)(PORT = 1521))

LISTENER_LH3 =
  (ADDRESS = (PROTOCOL = TCP)(HOST = lh3-vip)(PORT = 1521))

LISTENERS_LH =
  (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL = TCP)(HOST = lh1-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = lh2-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = lh3-vip)(PORT = 1521))
  )

For database instances we should set the local_listener parameter to reference the
tnsnames.ora descriptor for the local listener, while the remote_listener parameter
should be the same for all instances - LISTENERS_LH.

ALTER SYSTEM SET local_listener='LISTENER_LH1' SID='g41';
ALTER SYSTEM SET local_listener='LISTENER_LH2' SID='g42';
ALTER SYSTEM SET local_listener='LISTENER_LH3' SID='g43';
ALTER SYSTEM SET remote_listener='LISTENERS_LH' SID='*';
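
For clusters with more nodes, the same entries and statements follow a regular pattern and can be generated. The sketch below is only an illustration: the function name listener_config, the alias naming scheme (LISTENER_<prefix><n> and LISTENERS_<prefix>) and the quoting of parameter values are assumptions of this example, not Oracle requirements.

```python
def listener_config(nodes, port=1521, prefix="LH"):
    """Build tnsnames.ora entries and ALTER SYSTEM statements.

    nodes maps instance name -> VIP host name, e.g. {"g41": "lh1-vip"}.
    Returns (tnsnames_text, sql_statements).
    """
    def address(host):
        return f"(ADDRESS = (PROTOCOL = TCP)(HOST = {host})(PORT = {port}))"

    entries, statements = [], []
    for number, (instance, host) in enumerate(nodes.items(), start=1):
        alias = f"LISTENER_{prefix}{number}"      # one alias per local listener
        entries.append(f"{alias} =\n  {address(host)}")
        statements.append(
            f"ALTER SYSTEM SET local_listener='{alias}' SID='{instance}';"
        )
    all_alias = f"LISTENERS_{prefix}"             # one alias listing every listener
    body = "\n".join(f"    {address(host)}" for host in nodes.values())
    entries.append(f"{all_alias} =\n  (ADDRESS_LIST =\n{body}\n  )")
    statements.append(
        f"ALTER SYSTEM SET remote_listener='{all_alias}' SID='*';"
    )
    return "\n\n".join(entries), statements
```

Calling it with {"g41": "lh1-vip", "g42": "lh2-vip", "g43": "lh3-vip"} reproduces the three-node configuration above.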

When everything is configured correctly, you should see the following as part of the
lsnrctl service command output:

Service "g4" has 3 instance(s).
  Instance "g41", status READY, has 2 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0 state:ready
         REMOTE SERVER
         (ADDRESS=(PROTOCOL=TCP)(HOST=lh1-vip)(PORT=1521))
      "DEDICATED" established:36 refused:0 state:ready
         LOCAL SERVER
  Instance "g42", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:36 refused:0 state:ready
         REMOTE SERVER
         (ADDRESS=(PROTOCOL=TCP)(HOST=lh2-vip)(PORT=1521))
  Instance "g43", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:36 refused:0 state:ready
         REMOTE SERVER
         (ADDRESS=(PROTOCOL=TCP)(HOST=lh3-vip)(PORT=1521))


Below is a schematic two-node example of remote listener registration.

[Figure: listeners LISTENER_LH41 and LISTENER_LH42 with instances G41 and G42 on nodes
lh1 and lh2.]

Client nodes must be able to resolve the IP aliases lh1-vip, lh2-vip and lh3-vip as they
are stated in the tnsnames.ora descriptors used for local_listener and remote_listener,
even if client connection strings use IP addresses or other aliases, perhaps including
a domain name. This is important because server-side connection load balancing will cause
connection requests to be redirected using those aliases.

Note that DBCA in Oracle 10g Release 2 doesn't set local_listener, and this causes local
host names to be used for connections that are redirected to remote instances.
Consequently, your virtual IPs are not used, which impacts connection failover
capabilities.

Server-side CLB without services

Prior to Oracle Database 10g Release 2, the listener could only make routing decisions
based on host load or instance load. In fact, this is still the case in 10g Release 2 and
11g whenever, for whatever reason, the listener doesn't have load information for a
service. This is why it still makes sense to discuss this mechanism.

[Figure: listener LISTENER_G44 choosing between instances G41, G42 and G43 - "Which
instance?"]

The PMON process of an Oracle instance is responsible for registration with local and
remote listeners. PMON also periodically notifies the listeners about workload conditions
for the instance and the host it's running on.

By enabling listener tracing at USER level, we can take a peek at the information the
listener receives. Here is an example:

[oracle@lh1 ~]$ lsnrctl trace user

LSNRCTL for Linux: Version 10.2.0.3.0 - Production on 09-FEB-2008 12:59:07

Copyright (c) 1991, 2006, Oracle. All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
Opened trace file:
/nfs1/oracle/oracle/product/10.2.0/db_1/network/trace/listener_lh1.trc
The command completed successfully

[oracle@lh1 ~]$ tail -f \
    /nfs1/oracle/oracle/product/10.2.0/db_1/network/trace/listener_lh1.trc | \
    grep "nsglgrDoRegister: inst loads:"
nsglgrDoRegister: inst loads: ld1:128 mld1:5120 ld2:6 mld2:170
nsglgrDoRegister: inst loads: ld1:135 mld1:5120 ld2:5 mld2:170
nsglgrDoRegister: inst loads: ld1:33 mld1:5120 ld2:6 mld2:170
nsglgrDoRegister: inst loads: ld1:122 mld1:5120 ld2:7 mld2:170
nsglgrDoRegister: inst loads: ld1:39 mld1:5120 ld2:4 mld2:170
nsglgrDoRegister: inst loads: ld1:89 mld1:5120 ld2:4 mld2:170
nsglgrDoRegister: inst loads: ld1:168 mld1:5120 ld2:6 mld2:170

Note that if there are no connects to or disconnects from the instances, you might need
to wait a while. Alternatively, simply connect and disconnect in another session; this
causes PMON to send updates to the listener.

What we see in the trace is two pairs of load data (ld) and max load data (mld) -
ld1/mld1 + ld2/mld2. One pair represents node load and another pair - instance load.
Which one is which? That depends on the listener configuration.

DBAs have often been advised to set the listener parameter
PREFER_LEAST_LOADED_NODE_<LISTENER>=OFF in the listener.ora file. This is suggested as
a solution to uneven connection distribution among RAC instances.

Standard listener behavior is to send connections to the least loaded node. Thus, spikes
in CPU consumption can badly skew connection distribution, which has a very negative
impact on applications with persistent (or relatively long-lived) connections.

Setting PREFER_LEAST_LOADED_NODE=OFF causes the listener to use the count of user
sessions as the criterion for choosing the least loaded instance, instead of choosing the
least loaded node. This way, the distribution of persistent connections is not affected by
CPU load differences at connection time.

It turned out that the internal implementation of this parameter is very simple. When
PREFER_LEAST_LOADED_NODE=ON or is not defined (ON is the default), the listener treats
the ld1/mld1 pair as node load and ld2/mld2 as instance load. When the parameter is set
to OFF, the listener swaps the load data: ld1/mld1 represents instance load and ld2/mld2
node load.

So what are those numbers exactly? Instance load, as you have probably figured out
already, is the number of user sessions. You can validate it by querying GV$SESSION:

SQL> select inst_id, count(*) from gv$session where type='USER'
     group by inst_id;

INST_ID COUNT(*)
---------- ----------
1 5
2 7
3 5

This is what ld1 or ld2 is set to. Note that querying GV$ views spawns a parallel slave
on each instance, so the query always returns a session count that is higher by one.

What about maximum instance load - mld? It's simply the maximum number of sessions
per instance - the sessions init.ora parameter:

SQL> select value from v$parameter where name='sessions';

VALUE
------------
170

What about node load? Maximum node load is based on the number of CPUs. On Linux
and Solaris SPARC, I observed that it's calculated as the number of CPUs * 5120. PMON
calculates it based on the cpu_count init.ora parameter. I would expect it to be the same
on all platforms.

Current node load seems to be calculated based on the run queue state. It correlates
very well with the 1-minute load average. I also traced system calls of the PMON process
on Linux: it reads the /proc/loadavg file, which exposes load averages and the current
run-queue state. The formula seems to be roughly the following, although there is a
slight discrepancy:

node_load = 1_min_load_average * 256
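
The two observations above (maximum node load of cpu_count * 5120 and current node load of roughly the 1-minute load average * 256) can be put into a small Java sketch. The class and method names are hypothetical, and the constants are only the empirical values quoted in the text, not documented Oracle behavior, so expect the same slight discrepancy against real traces.

```java
// Sketch of the load figures PMON appears to report, based on the observations in the text.
final class NodeLoadEstimate {
    // Assumption from the text: maximum node load = cpu_count * 5120.
    static final int MAX_LOAD_PER_CPU = 5120;
    // Assumption from the text: current node load is roughly load average * 256.
    static final int LOAD_AVERAGE_SCALE = 256;

    // Maximum node load (mld) for a host with the given cpu_count.
    static int maxNodeLoad(int cpuCount) {
        return cpuCount * MAX_LOAD_PER_CPU;
    }

    // Approximate current node load (ld) for a given 1-minute load average.
    static long approxNodeLoad(double oneMinuteLoadAverage) {
        return Math.round(oneMinuteLoadAverage * LOAD_AVERAGE_SCALE);
    }

    // Extracts the 1-minute load average, the first field of a /proc/loadavg line.
    static double oneMinuteLoadAverage(String procLoadavgLine) {
        return Double.parseDouble(procLoadavgLine.trim().split("\\s+")[0]);
    }
}
```

For a single-CPU host this gives a maximum of 5120, which matches the mld1 value in the listener trace shown earlier.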

Back to the listener now. As far as I could see, the algorithm the listener uses to pick
an instance is the same regardless of the PREFER_LEAST_LOADED_NODE parameter. It first
evaluates instances based on their ld1/mld1 pairs and, if two instances are equally
attractive, it then compares their ld2/mld2 pairs.
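
The two-step comparison described above can be sketched in Java as follows. This is an illustration only: the text does not say how the listener turns a load/max-load pair into "attractiveness", so the sketch assumes it is the ratio ld/mld (lower is better). The names ListenerChoiceSketch, LoadReport and pick are hypothetical.

```java
import java.util.List;

// Illustrative sketch of choosing an instance by ld1/mld1 first, then ld2/mld2 on a tie.
final class ListenerChoiceSketch {
    // One load report as it appears in the listener trace.
    static final class LoadReport {
        final String instance;
        final int ld1, mld1, ld2, mld2;

        LoadReport(String instance, int ld1, int mld1, int ld2, int mld2) {
            this.instance = instance;
            this.ld1 = ld1;
            this.mld1 = mld1;
            this.ld2 = ld2;
            this.mld2 = mld2;
        }
    }

    // Assumption: attractiveness is the load-to-maximum ratio, lower is better.
    // Compares ldA/mldA with ldB/mldB by cross-multiplication (maximums must be positive).
    static int compareRatio(long ldA, long mldA, long ldB, long mldB) {
        return Long.compare(ldA * mldB, ldB * mldA);
    }

    // Returns the most attractive report, or null for an empty list.
    static LoadReport pick(List<LoadReport> reports) {
        LoadReport best = null;
        for (LoadReport r : reports) {
            if (best == null) {
                best = r;
                continue;
            }
            int c = compareRatio(r.ld1, r.mld1, best.ld1, best.mld1);
            if (c == 0) {
                // Equal on the first pair: fall back to the second pair.
                c = compareRatio(r.ld2, r.mld2, best.ld2, best.mld2);
            }
            if (c < 0) {
                best = r;
            }
        }
        return best;
    }
}
```

Because PREFER_LEAST_LOADED_NODE only swaps which data lands in which pair, the same comparison covers both settings.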

Unfortunately, it's not possible to tell from the trace which instance reported a
particular set of load data. There is one useful trick we can employ: set the sessions
parameter on each node so that it differs by one session. This makes no noticeable
difference to the instances but lets us distinguish the load updates from each one. I set
the sessions parameter to 171, 172 and 173 for g41, g42 and g43 respectively, and can now
clearly attribute the load updates:

nsglgrDoRegister: inst loads: ld1:341 mld1:5120 ld2:6 mld2:172
nsglgrDoRegister: inst loads: ld1:110 mld1:5120 ld2:5 mld2:173
nsglgrDoRegister: inst loads: ld1:184 mld1:5120 ld2:5 mld2:171
nsglgrDoRegister: inst loads: ld1:99 mld1:5120 ld2:6 mld2:173
nsglgrDoRegister: inst loads: ld1:253 mld1:5120 ld2:7 mld2:172
nsglgrDoRegister: inst loads: ld1:256 mld1:5120 ld2:6 mld2:171
nsglgrDoRegister: inst loads: ld1:184 mld1:5120 ld2:7 mld2:171
nsglgrDoRegister: inst loads: ld1:115 mld1:5120 ld2:7 mld2:
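
When there are many trace lines, the attribution trick can be automated. The following Java sketch parses "inst loads" lines and maps the distinct sessions value to an instance name. It assumes the default PREFER_LEAST_LOADED_NODE setting, so the sessions value appears as mld2; the class name, method names and the mapping are hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: attribute listener trace "inst loads" lines to instances by their sessions value.
final class InstLoadsTraceParser {
    private static final Pattern INST_LOADS = Pattern.compile(
        "nsglgrDoRegister: inst loads: ld1:(\\d+) mld1:(\\d+) ld2:(\\d+) mld2:(\\d+)");

    // Returns {ld1, mld1, ld2, mld2}, or null if the line is not a complete "inst loads" line.
    static int[] parse(String line) {
        Matcher m = INST_LOADS.matcher(line);
        if (!m.find()) {
            return null;
        }
        int[] values = new int[4];
        for (int i = 0; i < 4; i++) {
            values[i] = Integer.parseInt(m.group(i + 1));
        }
        return values;
    }

    // Looks up the instance by mld2 (the sessions parameter with the default listener setting).
    // Returns null for lines that cannot be parsed or for unknown sessions values.
    static String attribute(String line, Map<Integer, String> instanceBySessions) {
        int[] values = parse(line);
        return values == null ? null : instanceBySessions.get(values[3]);
    }
}
```

With a map of 171, 172 and 173 to g41, g42 and g43, each complete trace line resolves to one instance, while truncated or unrelated lines are skipped.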

PMON is notified on every connect and disconnect because it tracks all sessions and
needs to be aware of them to perform its duties. Since opening and closing connections
changes the instance load, PMON, once woken up, notifies the listeners as well.

There is a way to trace the PMON process as well - event 10257 (10257 trace name
context forever, level 16).

Here is an example of the trace output:

*** 2007-09-19 05:52:24.797
kmmlrl: update for session drop delta:
111 106 10 10 82
kmmlrl: service11g.oracloid.com goodness 8
kmmlrl: update for service goodness
kmmlrl: 55 processes
kmmlrl: node load 358
kmmlrl: instance load 19
kmmlrl: nsgr update returned 0

The trace above is from 11g. For now, we need to note two lines: node load and instance
load. These are the values reported as ld1 and ld2 in the listener trace (which is which
depends on the PREFER_LEAST_LOADED_NODE parameter). This is another way to observe how
each instance reports node and instance load. We will see later that the PMON trace
contains other useful information.

To summarize, PMON reports node load (based on load average/run queue) and instance
load (count of USER sessions) to the listener, as well as maximum instance load (sessions
init.ora parameter) and maximum node load (cpu_count * 5120). Depending on
PREFER_LEAST_LOADED_NODE, the listener uses this load information to determine where to
route the connection - to the least loaded node or to the instance with the fewest USER
sessions.

In practice, short-lived connections should use PREFER_LEAST_LOADED_NODE=ON (the
default), whereas long connections should probably use PREFER_LEAST_LOADED_NODE=OFF set
explicitly. There is more about connection classification in the next section, which
discusses the latest connection load balancing approach, available when services are
configured properly.

Server-side CLB with services

With the introduction of services, the listener gained the ability to balance connections
based on service statistics. This is where AWR comes into play with service-aggregated
metrics. Note that this only appeared in Oracle Database 10g Release 2.

Oracle provides two distinct options for service-based connection balancing - for long
sessions and for short sessions.

In order to manage workload efficiently, it's very important to know your application's
connection life cycle. This greatly influences the approach to connection management.
Usually, connection types fall into one of the following categories:

Short-lived sessions, when the application connects for each transaction, does some
work and then disconnects. This is probably the most inefficient way of connecting to
the database. Unfortunately, some application stacks dictate this approach. PHP or
Perl web applications with Apache are the infamous example.

Persistent connections. The application binary establishes a connection, typically at
startup, and uses it until it exits or the connection fails.

Connection pooling is, strictly speaking, a variation of the persistent connection
approach, where the application tier establishes a number of connections that can be
shared by a set of application binaries or threads. Depending on the connection pool
characteristics, connections can be established and released dynamically.

Oracle provides standard connection pooling features in all of its drivers and
they are typically very flexible and powerful.

Custom connection pooling is another variation and is sometimes used by big teams
that integrate it within their own framework.

Application server connection pooling is yet another kind of connection pool.
Oracle Application Server builds on top of standard connection pooling using
JDBC, while other vendors may use custom implementations.

Long connections include those from a connection pool and, especially, persistent
connections. The required characteristic is that the connection stays idle for a
significant part of its lifetime. This fact is often overlooked, but it's very important
to take into account, as you will see later when we discuss run-time load balancing.

Short-lived connections are the ones that connect to the database, do some work and
disconnect. These connections are doing work for most of the time they are connected.

Note that long sessions that are active almost 100% of their connection time, such as
batches, are much closer to short-lived connections than to long connections (in terms
of workload management). A data warehouse environment that connects for each
request/report should typically be treated as short connections as well, even though
each connection might span several minutes.

The behavior of pooled connections varies depending on the pool characteristics. If
connections are largely over-allocated and not released quickly when idle, they should
be considered long connections. This is the preferred connection method with run-time
load balancing. If connection allocation is tight and idle connections are released
relatively quickly, then the connection pattern is geared mostly towards short
connections, and it's a bit more difficult to choose the right configuration.

Sometimes the choice between long and short connections is obvious, whereas in other
cases it might be tricky, so it's very useful to understand the key internals of
connection load balancing and to be able to track down what's going on under the hood.

Each service has the attribute CLB_GOAL (Connection Load Balancing goal), which can
take the value CLB_GOAL_LONG or CLB_GOAL_SHORT (constants in the DBMS_SERVICE
package). It can be specified when a service is created or modified using the
DBMS_SERVICE package. To see the current setting, query the DBA_SERVICES view, which has
the column CLB_GOAL.

Nothing needs to be done in the listener configuration to use server-side load
balancing (except remote registration). Service attractiveness for each instance is
passed in the same way as instance and node load - the PMON process broadcasts it to
local and remote listeners - and we can trace it in exactly the same way.

Here is what we can see in the listener trace at USER level:

[oracle@lh1 trace]$ tail -f listener_lh1.trc | \
    grep "nsglgrDoRegister: service:service10g"
nsglgrDoRegister: service:service10g what:4 value:1
nsglgrDoRegister: service:service10g what:2 value:47
nsglgrDoRegister: service:service10g what:4 value:1
nsglgrDoRegister: service:service10g what:2 value:48
nsglgrDoRegister: service:service10g what:4 value:1
nsglgrDoRegister: service:service10g what:2 value:49
nsglgrDoRegister: service:service10g what:4 value:1
nsglgrDoRegister: service:service10g what:2 value:49
nsglgrDoRegister: service:service10g what:4 value:1
nsglgrDoRegister: service:service10g what:2 value:50
nsglgrDoRegister: service:service10g what:4 value:1
nsglgrDoRegister: service:service10g what:2 value:50

Again, unfortunately, the trace gives no indication of the instance. Note that two
characteristics are reported per service. We can get a more meaningful result from PMON
by setting the already familiar event 10257 at level 16:

*** 2008-02-10 15:20:00.289
kmmlrl: service10g goodness 50
kmmlrl: update for service goodness
kmmlrl: 77 processes
kmmlrl: instance load 55
kmmlrl: nsgr update returned 0

In the PMON trace we see service goodness - it's the same value that is reported in the
listener trace on the what:2 line. What is service goodness then?

Goodness is a ratio describing the attractiveness of the instance for new connections to
a given service. The lower the value, the more attractive the instance and, vice versa,
the higher the value, the less attractive the instance. Oracle has misnamed the
characteristic as usual - since higher values mean worse service on the instance, the
ratio should have been called badness. I'm sure it was marketing that vetoed calculating
service badness in Oracle database.

OK. What about the second value - the one reported on the what:4 line in the listener
trace? Apparently it's the goodness delta. Delta is how much goodness is expected to
change when we add one more connection. Typically, it is a positive value, since adding
one more connection to the instance should only decrease its attractiveness, which
results in a higher goodness value.

Now that we know how to spy on the listener and PMON, it's a good time to answer the
question - how are goodness and delta calculated? And the answer is - it depends!

Goodness calculation depends on the CLB_GOAL attribute of a service. The example above
is for CLB_GOAL_LONG. The connection balancing principle for CLB_GOAL_LONG aims at an
equal distribution of a service's sessions across all instances. Goodness is calculated
as the current number of sessions connected to the service:

SQL> select inst_id, count(*)
     from gv$session
     where service_name='service10g'
     group by inst_id;

INST_ID COUNT(*)
---------- ----------
1 50
1 51

What about the delta value? Well, if goodness is represented as the number of sessions,
then adding one more session increases goodness by one, so delta is constant and equal
to 1 for CLB_GOAL_LONG. Delta never changes, so that's probably why PMON doesn't post it
in the trace. We can see it in the listener trace when the service is first registered
on service up:

nsglgrDoRegister: service:service10g flag:2 goodness:51 delta:1
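
To make the roles of goodness and delta concrete, here is a minimal Java sketch of CLB_GOAL_LONG balancing. It assumes that the listener routes to the instance with the lowest goodness and adds delta to its local copy until the next PMON update arrives. That bookkeeping is an assumption made for illustration, not something the traces confirm, and all names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of CLB_GOAL_LONG balancing: goodness is the session count, delta is always 1.
final class ClbGoalLongSketch {
    // Constant for CLB_GOAL_LONG according to the text.
    static final int DELTA = 1;

    private final Map<String, Integer> goodness = new LinkedHashMap<>();

    // Records the goodness an instance reported (number of sessions for the service).
    void report(String instance, int sessions) {
        goodness.put(instance, sessions);
    }

    // Picks the instance with the lowest goodness; the first one reported wins a tie.
    // Assumption: the local goodness copy is bumped by DELTA until the next report.
    String route() {
        String best = null;
        for (Map.Entry<String, Integer> e : goodness.entrySet()) {
            if (best == null || e.getValue() < goodness.get(best)) {
                best = e.getKey();
            }
        }
        if (best != null) {
            goodness.merge(best, DELTA, Integer::sum);
        }
        return best;
    }

    // Current local goodness value for an instance that has reported at least once.
    int goodnessOf(String instance) {
        return goodness.get(instance);
    }
}
```

Starting from 50 and 51 sessions, consecutive calls to route() alternate between the instances once their counts are level, which is the even session distribution CLB_GOAL_LONG aims for.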

Since service goodness is reported together with node/instance load, we can use the
same trick to distinguish which lines correspond to which instance - goodness and delta
seem to follow instance and node load:

nsglgrDoRegister: inst loads: ld1:125 mld1:5120 ld2:32 mld2:172
nsglgrDoRegister: service:service10g what:4 value:1
nsglgrDoRegister: service:service10g what:2 value:28
nsglgrDoRegister: service:g4 what:4 value:1
nsglgrDoRegister: inst loads: ld1:174 mld1:5120 ld2:34 mld2:171
nsglgrDoRegister: service:service10g what:4 value:1
nsglgrDoRegister: service:service10g what:2 value:27
nsglgrDoRegister: service:g4 what:4 value:1
nsglgrDoRegister: service:g4 what:2 value:1

The trick with different number of sessions seems to work for simple cases but I noticed
it can get out of sync when there are many services and service updates.

The CLB_GOAL_LONG mechanism is simple, but things get more interesting when
CLB_GOAL_SHORT is used. This is when AWR service metrics come in handy.

Service metrics are calculated by MMNL, a new background process in Oracle
Database 10g. The V$SERVICEMETRIC view exposes current metrics and
V$SERVICEMETRIC_HISTORY historical values. Interestingly, V$SERVICEMETRIC also
contains the goodness and delta values that PMON reports to the listeners, so it's very
handy for a quick check.

Service metrics are tracked over one-minute and five-second intervals, so for each
service we get 2 rows per instance in the V$SERVICEMETRIC view - one for the one-minute
interval and one for the five-second interval. Interval boundaries are in the columns
BEGIN_TIME and END_TIME, and the interval length in centiseconds is in INTSIZE_CSEC.
There are the self-explanatory columns GOODNESS and DELTA, which we know already. What's
more interesting is another 4 columns that represent the average number of calls per
second, the amount of CPU per call, and DB Time per second and per call:

CPUPERCALL
DBTIMEPERCALL
CALLSPERSEC
DBTIMEPERSEC

These are average values for the respective interval. Note that cumulative counters for
consumed CPU and DB Time, as well as other statistics, are available in
V$SERVICE_STATS.

It's now time to introduce the Load Balancing Advisory (LBA). Usually, the LBA goal is
mentioned in the context of run-time load balancing, and it's not widely known that LBA
is also used by server-side connection load balancing for short connections (with
CLB_GOAL_SHORT).

The LBA goal can be set to one of three values - GOAL_NONE, GOAL_SERVICE_TIME and
GOAL_THROUGHPUT (constants in the DBMS_SERVICE package). GOAL_NONE disables LBA
for run-time load balancing, while GOAL_SERVICE_TIME and GOAL_THROUGHPUT enable
LBA to optimize for service time or throughput respectively. However, all three are
active for CLB_GOAL_SHORT connection balancing.

The combination of CLB_GOAL_SHORT and GOAL_NONE works similarly to the default
pre-10gR2 listener connection balancing (PREFER_LEAST_LOADED_NODE=ON as default).
Goodness is calculated identically for all services with CLB_GOAL_SHORT and GOAL_NONE,
while delta is separate for each service on the instance. There is a very strong
correlation with CPU load average, and the documentation confirms that as well.

GOAL_THROUGHPUT instructs Oracle to use the rate of work in calculations, while
GOAL_SERVICE_TIME uses response time. Interestingly, the LBA advice for run-time load
balancing and for connection load balancing doesn't always match, but more about that
later.

Bug 5593693 using db_domain

There is a nasty bug that I couldn't identify for a while - see [MLN5593693.8]. It kept
screwing up my experiments and produced inconsistent results on some systems. The
bug is only noticeable when the db_domain init.ora parameter is not empty. Here is what
happens.

Let's assume that my database name is g4 and db_domain is set to oracloid.com.
If I create a service without a domain, the db_domain value is appended automatically
when it's registered with a listener. This means that if I add the service service10g,
it is visible as service10g.oracloid.com, and that's how it should be specified in the
application connection string. However, the default domain is not added during service
creation, and the service is stored internally without a domain.

What happens now is that somewhere between PMON and the listener (I suspect it's the
listener process) a mismatch occurs, and the listener doesn't account for updates sent
for that service without a domain. There is a bit of speculation here, so the root
problem may be somewhere else, but this assumption has explained the erroneous behavior
very well so far. Metalink Note 5593693.8 says that this happens when the service
network_name is not fully qualified and db_domain is non-null.

In this situation, the listener falls back to the old connection load balancing principle
based on instance and node load data. I think this is why I often see people still
messing with PREFER_LEAST_LOADED_NODE even in Oracle Database 10g Release 2. On the
other hand, I have seen many complaints that PREFER_LEAST_LOADED_NODE doesn't work
anymore. Both are wrong... or not.

It seems that there is a simple workaround - creating fully qualified services with a
domain. Indeed, it works very well when the service is created by the DBMS_SERVICE
package with the domain explicitly specified. However, the DBMS_SERVICE package doesn't
create CRS resources in a RAC environment. That's why srvctl must be used to create
cluster services for a RAC database; later they can be modified using
DBMS_SERVICE.MODIFY_SERVICE.

Unfortunately, srvctl refuses to create a new service with a domain matching the default
database domain - instead, srvctl requires creating a service with an empty domain so
that the default domain is added during listener registration. The problem is that CRS
now uses the plain service name (without domain) in the database, and that puts us back
to square one - a service name mismatch.

There is one exception I saw in this situation - if the database is created with
db_domain set, then the default service based on db_name and db_domain is created with a
fully qualified name. In my situation, the default service g4.oracloid.com was stored in
the database as a fully qualified name including the domain. However, this service is not
controlled by CRS and shouldn't be touched.

Nodes with different capacity

Both the old and the new connection load balancing algorithms have inherent capabilities
to balance connections across nodes of different capacity.

The maximum node load is based on the number of CPUs. Instance load can be adjusted by
setting the SESSIONS init.ora parameter according to node capacity, be it CPU or
memory.

Service-metrics-based load balancing is a bit more complex and, as usual, it's not
bug-free in the first few releases - see [MLN6613950.8]. Apparently, service goodness
calculations do not always account for different node capacity. Metalink Note 6613950.8
identifies the bug but doesn't provide much detail. It says that all current versions
are believed to be affected (<11.2). On the other hand, the fix is included in the
10.2.0.4 patchset, and one-off patches are available for 10.2.0.2 and 10.2.0.3.
However, it might be a dirty fix rather than a conceptual resolution.

From here we will move on to the details of the Load Balancing Advisory and how it works
with run-time workload balancing. Before we do that, we need to review the Oracle 10g
Fast Application Notifications mechanism.

Fast Application Notifications


The Oracle 10g Fast Application Notifications (FAN) feature was first introduced in
Oracle Database 10g Release 1. The core of FAN is a new daemon running from the Oracle
home - Oracle Notification Services, or the ONS server.

As far as I know, this program has its roots in Oracle Application Server. It's a simple
daemon that exchanges events using a publish and subscribe mechanism.

HA Events

Oracle Database 10g Release 1 uses the ONS daemon as a messenger for High
Availability (HA) events. These include events triggered by nodes coming up and going
down, instance failures, services going up and down, etc. The purpose of HA events is to
support Fast Connection Failover (FCF), which we do not cover here in detail. We already
had a chance to deal with FAN HA events earlier, in the example of server-side FAN
callouts. The listener, for example, also subscribes to some events to speed up service
availability updates on instance failure.

FAN HA events are passed to ONS (published), and ONS's duty is to forward them to all
subscribers that are interested in that particular event. In addition, the event is
passed to other ONS daemons registered as remote ONS servers. In a RAC environment, the
ONS daemons on each node are aware of each other. In addition, ONS servers running on
application tiers (particularly important for Oracle Application Servers) should be
registered as remote ONS as well.

See [BLUNDHILD] for more details about Fast Application Notifications, Fast
Connection Failover and ONS configuration.

Load Balancing Advisory events

Oracle Database 10g Release 2 extended the FAN framework by adding Load Balancing
Advisory events. Every 30 seconds, a CRS daemon, racgimon, gets LBA information from
the database instances and passes it to the ONS daemon, which pushes it on to
subscribers that are interested in LBA events (sometimes also referred to as Service
Metrics events).

LBA events are only generated for services with the LBA goal set to GOAL_SERVICE_TIME
or GOAL_THROUGHPUT. The default value, GOAL_NONE, disables service metrics events.

If we increase the log level of the ONS daemon to 9 (loglevel=9 in
$ORA_CRS_HOME/opmn/conf/ons.config), we can see all events in the ONS log file
(here re-formatted for readability):

[oracle@lh1 logs]$ tail -100f ons.log.lh1 | \
    grep -e ^VERSION= -e "eventType:"
eventType: database/event/servicemetrics/service10g
VERSION=1.0 database=g4 {
{instance=g42 percent=34 flag=GOOD}
{instance=g43 percent=33 flag=GOOD}
{instance=g41 percent=33 flag=GOOD}
} timestamp=2008-02-11 03:24:18

It's very handy to tail this log file to watch the LBA trend in a terminal. The percent
value is the recommended share of work for that service that should flow to a particular
instance. As you have noticed already, the percentages sum to 100%, as one would expect.
The flag denotes the service metrics status for the service on a particular instance. The
GOOD flag means the advisory is available and up to date, NO_DATA means data is missing
for that instance/service, and the UNKNOWN flag is usually posted when no connections
have been made and no workload processed on a given instance.

Another way to see LBA events is in the SYS.SYS$SERVICE_METRICS_TAB table. This is
the table behind the queue SYS$SERVICE_METRICS. LBA events are managed in the Oracle
database using the Oracle AQ mechanism. Here is how to see the last 10 events:

SQL> SELECT * FROM (SELECT user_data
       FROM sys.sys$service_metrics_tab
       ORDER BY enq_time DESC
     ) WHERE rownum <=10;

USER_DATA(SRV, PAYLOAD)
--------------------------------------------------------------------------------
SYS$RLBTYP('service10g', 'VERSION=1.0 database=g4 service=service10g { {instance
=g42 percent=34 flag=GOOD}{instance=g43 percent=33 flag=GOOD}{instance=g41 perce
nt=34 flag=GOOD} } timestamp=2008-02-11 03:43:52')
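
An application that consumes these events itself has to extract the per-instance percentages from the payload. The Java sketch below does that with a regular expression, assuming the payload layout shown above. Keeping only entries flagged GOOD is an assumption made here for illustration, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: extract instance -> percent pairs from an LBA (service metrics) event payload.
final class LbaPayloadParser {
    // Matches one "{instance=... percent=... flag=...}" entry of the payload.
    private static final Pattern ENTRY = Pattern.compile(
        "\\{instance=(\\S+) percent=(\\d+) flag=(\\w+)\\}");

    // Returns the percentages of all entries flagged GOOD, in payload order.
    // Assumption: entries with other flags (NO_DATA, UNKNOWN) are ignored.
    static Map<String, Integer> goodPercents(String payload) {
        Map<String, Integer> result = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(payload);
        while (m.find()) {
            if ("GOOD".equals(m.group(3))) {
                result.put(m.group(1), Integer.parseInt(m.group(2)));
            }
        }
        return result;
    }
}
```

Applied to the payload above, the result maps g42, g43 and g41 to 34, 33 and 34 respectively.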

Detailed case studies and benchmarks of different advisory settings can be found in the
paper presented via webcast at the IOUG RAC SIG by Naoko Kanemoto Kurinami - see
[NAOKO].

Run-time load balancing on the client-side


Compared to connection load balancing, which is done by the listener, run-time load
balancing has to be handled by the applications themselves. LBA in the Oracle database
server is nothing more than a recommendation of what proportion of the workload for a
service each instance should receive in order to best meet the LBA goal.

The bulk of the work has already been done by Oracle and, in many cases, it's a matter
of a few configuration changes and a few additional lines of code. Unfortunately, out of
the box, run-time load balancing is only available when using the standard Oracle
connection pooling mechanisms. Otherwise, much more effort and coding is required.

The idea is similar for all implementations. The application needs to subscribe to
Service Metrics events (LBA events) and route the workload between connections based on
these recommendations.

During the presentation we cover run-time load balancing using thin JDBC, and below
you can find a detailed explanation of the relevant Java code. My original idea was to
include OCI and ODP.NET examples as well. However, due to lack of time, I won't be able
to cover them during the presentation, and adding examples here just for the sake of it
doesn't make much sense. Instead, great examples and explanations are available in
[BLUNDHILD].

RLB with JDBC Implicit Connection Cache

In this example, we create an application that uses the Oracle JDBC implicit connection
cache with the Fast Connection Failover (FCF) option enabled. FCF in JDBC includes
automatic support for LBA events.

The application will subscribe remotely to the ONS daemons running on the RAC nodes,
so the ONS servers should be configured for remote subscription (remoteport parameter
in ons.config for 10g; 11g is able to pick up the remote port from the OCR-stored
configuration for remote ONS). The connection cache manager subscribes to HA events for
FCF functionality and to LBA events for run-time load balancing. The connection cache
will gravitate workload towards instances according to the percentage weights from LBA
events. There is some factor of randomness added as well.
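
The general idea of gravitating work by percentage weights with an element of randomness can be illustrated with a weighted random pick. The Java sketch below is not the algorithm used inside the Oracle connection cache, which is not described here. It only shows one simple way a custom pool could honor the advisory percentages, and all names in it are hypothetical.

```java
import java.util.Map;
import java.util.Random;

// Sketch: choose an instance with probability proportional to its advisory percent.
final class RlbWeightedPick {
    // Weights are assumed to be non-negative, with at least one positive value.
    static String pick(Map<String, Integer> percents, Random random) {
        int total = 0;
        for (int p : percents.values()) {
            total += p;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("no positive weights");
        }
        int roll = random.nextInt(total); // 0 <= roll < total
        for (Map.Entry<String, Integer> e : percents.entrySet()) {
            roll -= e.getValue();
            if (roll < 0) {
                return e.getKey();
            }
        }
        throw new IllegalStateException("unreachable for non-negative weights");
    }
}
```

With weights of 58 and 42, roughly 58% of a large number of picks go to the first instance, while any single request can still land on either one.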

Here is an excerpt from the full example. First of all, we create the data source as
usual. Note the format of the connection descriptor.

OracleDataSource ods = new OracleDataSource();
ods.setUser("oracloid");
ods.setPassword("***");
String dbURL="jdbc:oracle:thin:@(DESCRIPTION=(FAILOVER=ON)" +
"(LOAD_BALANCE=ON)" +
"(ADDRESS=(PROTOCOL=TCP)" +
"(HOST=lh1-vip)(PORT=1521))" +
"(ADDRESS=(PROTOCOL=TCP)" +
"(HOST=lh2-vip)(PORT=1521))" +
"(ADDRESS=(PROTOCOL=TCP)" +
"(HOST=lh3-vip)(PORT=1521))" +
"(CONNECT_DATA=(SERVICE_NAME=service10g)))";
ods.setURL(dbURL); // pass the connection descriptor to the data source

Next we enable implicit connection cache and set its properties:

ods.setConnectionCachingEnabled(true);
Properties prop = new Properties();
prop.setProperty("MinLimit", "5");
prop.setProperty("MaxLimit", "20");
prop.setProperty("InitialLimit", "10");
prop.put (oracle.net.ns.SQLnetDef.TCP_CONNTIMEOUT_STR,"" + (1000)); // 1 sec
ods.setConnectionCacheProperties(prop);
ods.setConnectionCacheName("MyCache");

Above, we set the TCP connect timeout property, which helps to shorten the delay during
VIP failover. It's needed due to the JDBC-specific retry implementation. See
[MLN453347.1] for more details.

Next, we need to enable FCF and subscribe to the remote ONS servers to receive FAN
events. The JDBC connection cache manager handles the subscription transparently - we
just need to tell it which hosts and ports the ONS servers are listening on:

ods.setFastConnectionFailoverEnabled(true);
ods.setONSConfiguration("nodes=lh1:6201,lh2:6201,lh3:6201");

Note that if not all ONS servers are available, there will be a significant delay due to
the TCP timeout for each unavailable ONS. I considered using VIPs for the remote ONS
port, but I'm not sure whether there are harmful side effects - it's not documented
anywhere that ONS can be configured on VIPs.

After setup is completed, the first call to ods.getConnection() will allocate the initial
connections.

The full program used in the demo during the presentation is available with the online
version of the paper. It's a threaded Java program that executes dummy transactions in
parallel.

Debugging JDBC run-time load balancing on the client

Oracle provides the ability to run JDBC drivers in debug mode. There is a special version
of the drivers compiled with additional debug options, with the suffix _g appended. If
you use JDK 1.5 and the normal thin JDBC driver is ojdbc5.jar, then the debug version is
ojdbc5_g.jar. You should use the debug version in your CLASSPATH.

You will also need to create a logging properties file (logging can be managed at
run-time, but we need a simple example here). Here are the minimal settings you need to
trace operations related to the implicit connection cache, including run-time load
balancing.

# default location - current directory
java.util.logging.FileHandler.pattern = jdbc.log
java.util.logging.FileHandler.count = 1
java.util.logging.FileHandler.level = ALL
java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter
handlers = java.util.logging.FileHandler

oracle.jdbc.pool.level = FINEST

Provided the properties file is named OracleLog.properties and the debug JDBC .jar file is
in the CLASSPATH, we just need to add a couple of options to the command line that starts the
program:

-Doracle.jdbc.Trace=true
-Djava.util.logging.config.file=OracleLog.properties
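
The logger level from the properties file can also be set at run-time through the standard java.util.logging API. The following is only a sketch under assumptions: JdbcTraceSketch and enablePoolTrace are hypothetical names, the debug driver is assumed to be in the CLASSPATH, and -Doracle.jdbc.Trace=true is assumed to be set as before.

```java
import java.io.IOException;
import java.util.logging.FileHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

// Hypothetical helper: run-time equivalent of the oracle.jdbc.pool settings above.
final class JdbcTraceSketch {

    /**
     * Raises the oracle.jdbc.pool logger to FINEST and attaches the handler.
     * The caller should keep the returned Logger referenced so that the
     * configuration is not lost to garbage collection.
     */
    static Logger enablePoolTrace(Handler handler) {
        handler.setLevel(Level.ALL);
        Logger pool = Logger.getLogger("oracle.jdbc.pool");
        pool.setLevel(Level.FINEST);
        pool.addHandler(handler);
        return pool;
    }

    public static void main(String[] args) throws IOException {
        // Mirrors the FileHandler settings of the properties file: jdbc.log, SimpleFormatter.
        FileHandler file = new FileHandler("jdbc.log");
        file.setFormatter(new SimpleFormatter());
        Logger pool = enablePoolTrace(file);
        pool.finest("oracle.jdbc.pool tracing enabled at run-time");
    }
}
```

Unlike the properties file, which registers the handler on the root logger, this sketch attaches it to oracle.jdbc.pool only, so other loggers are left untouched.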

The trace produces a huge number of lines, but for the purpose of RLB you would want to
search for parseRuntimeLoadBalancingEvent:
...
Mar 2, 2008 12:43:19 AM oracle.jdbc.pool.OracleConnectionCacheManager
parseRuntimeLoadBalancingEvent
TRACE_16: Enter: "s10gr2", [B@82eca8
...
Mar 2, 2008 12:43:19 AM oracle.jdbc.pool.OracleImplicitConnectionCache
updateDatabaseInstance
TRACE_16: Enter: "g5", "g52", 58, 1
...
Mar 2, 2008 12:43:19 AM oracle.jdbc.pool.OracleImplicitConnectionCache
updateDatabaseInstance
TRACE_16: Enter: "g5", "g51", 42, 1
...
TRACE_20: Debug: (RLB) OracleImplicitConnectionCache.processDatabaseInstances: <<<
ServiceName=s10gr2, Connections to g52 = 8, Attempted Connection Requests to this instance=0,
Total Connections=30 >>>
Mar 2, 2008 12:43:19 AM oracle.jdbc.pool.OracleImplicitConnectionCache
processDatabaseInstances
TRACE_20: Debug: (RLB) OracleImplicitConnectionCache.processDatabaseInstances: <<<
ServiceName=s10gr2, Connections to g51 = 22, Attempted Connection Requests to this
instance=0, Total Connections=30 >>>
...
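
Because the trace is so large, it can be handy to extract only the per-instance connection counts. The sketch below is a hypothetical helper that assumes the (RLB) messages look exactly like the excerpt above. It is not an Oracle tool, and the message format may differ between driver versions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls "Connections to <instance> = <n>" out of the JDBC trace.
final class RlbTraceSketch {

    // DOTALL lets one message span several physical lines, as in the excerpt.
    private static final Pattern RLB_MESSAGE = Pattern.compile(
        "ServiceName=(\\S+?),\\s+Connections to (\\S+) = (\\d+),.*?Total\\s+Connections=(\\d+)",
        Pattern.DOTALL);

    /** Returns instance name -> cached connections; a later message replaces an earlier one. */
    static Map<String, Integer> connectionsPerInstance(String trace) {
        Map<String, Integer> result = new LinkedHashMap<>();
        Matcher m = RLB_MESSAGE.matcher(trace);
        while (m.find()) {
            result.put(m.group(2), Integer.parseInt(m.group(3)));
        }
        return result;
    }

    public static void main(String[] args) {
        String trace =
            "ServiceName=s10gr2, Connections to g52 = 8, Attempted Connection Requests to this instance=0,\n"
            + "Total Connections=30 >>>\n"
            + "ServiceName=s10gr2, Connections to g51 = 22, Attempted Connection Requests to this\n"
            + "instance=0, Total Connections=30 >>>\n";
        System.out.println(connectionsPerInstance(trace));
    }
}
```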

If you browse through the trace a bit more you can see confirmation that there is some
factor of randomness introduced:
TRACE_20: Debug:
OracleImplicitConnectionCache.retrieveFromConnectionList()RandomPercent=21:
percentSum=58
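
The RandomPercent and percentSum values suggest a weighted random choice driven by the advisory percentages. The sketch below shows one common way to implement such a choice. It is my assumption about the general technique, not the actual code of the Oracle connection cache, and all names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration of percentage-weighted instance selection.
final class WeightedPickSketch {

    /**
     * Walks the instances in order, accumulating their percentages, and returns
     * the first instance whose cumulative sum exceeds the roll (expected 0..99).
     * Returns null for an empty map.
     */
    static String pick(Map<String, Integer> percentByInstance, int roll) {
        int percentSum = 0;
        String last = null;
        for (Map.Entry<String, Integer> e : percentByInstance.entrySet()) {
            percentSum += e.getValue();
            last = e.getKey();
            if (roll < percentSum) {
                return last;
            }
        }
        return last; // percentages added up to less than roll + 1: fall back to the last instance
    }

    public static void main(String[] args) {
        // Percentages taken from the updateDatabaseInstance trace lines above.
        Map<String, Integer> advisory = new LinkedHashMap<>();
        advisory.put("g52", 58);
        advisory.put("g51", 42);
        int roll = new Random().nextInt(100);
        System.out.println("roll=" + roll + " -> " + pick(advisory, roll));
    }
}
```

With these percentages, a roll of 21 falls below the first cumulative sum of 58 and selects g52, which is at least consistent with the trace line shown.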

This trace would also be useful to monitor how FCF is working in JDBC but you might
need to enable tracing for oracle.jdbc and oracle.jdbc.driver in addition to
oracle.jdbc.pool (in OracleLog.properties file).

Oracle Notifications Client - ONC

Sometimes it might not be possible to use the implicit connection cache. Other times, the
standard workload management of the implicit connection cache does not work well for a
particular application. In these cases, it's possible to use the Oracle Notification Client (ONC)
API, i.e. to write your own notification subscriber and implement custom
processing of LBA events.

Unfortunately, in the current implementation of the ONC API (both 10g and 11g), a local ONS
daemon must be running. Remote subscription has not yet been made available for
ONC. This defeats the purpose of thin JDBC, as you won't be able to deploy the application
without installing a full-blown Oracle client (I think it's possible to identify only the ONS-related
parts and distribute them, but that won't be officially supported by Oracle). The ONS
daemon is also included with Oracle Application Server.

Full demo of ONC Subscriber is available online (demo 8) and here is a short
walkthrough using Java ONC API.

First of all, we need to include ons.jar in the CLASSPATH, and the imports should include

import oracle.ons.*;

Note that you don't have to use the JDBC driver itself.

We have to define path to the Oracle home with running local ONS server:

System.setProperty("oracle.ons.oraclehome",
"/nfs1/oracle/oracle/product/10.2.0/client");

Alternatively, you can specify it on the command line using the option
-Doracle.ons.oraclehome=<Oracle home>. This way you don't have to hardcode
it in the application code.

Next we create a subscriber and subscribe to either all events

Subscriber s = new Subscriber("", "");

or selected events (like LBA events) of particular database (g4 database on lh cluster)

Subscriber s = new Subscriber("\"g4/lh\"",
"database/event/servicemetrics/*");

Then we can issue a blocking call waiting for the next event

Notification n = s.receive(true);

The call will return when an event is posted, and we can process it as needed. Below we
print the event headers, type, body length and the content of the body:

n.print(); // headers
System.out.println(n.type()); // event type
System.out.println(n.body().length); // body length
String event = new String(n.body()); // transform to String
System.out.println(event); // print content
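
Custom processing usually starts with turning the event body into something usable. The sketch below is a hypothetical parser that assumes the body follows the servicemetrics text format used for the LBA event later in this paper. The class and method names are mine and are not part of the ONC API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts instance -> percent pairs from an LBA event body.
final class LbaEventParserSketch {

    // Matches entries such as {instance=g51 percent=80 flag=GOOD aff=FALSE}.
    // Anything after the flag value is ignored.
    private static final Pattern ENTRY = Pattern.compile(
        "\\{instance=(\\S+) percent=(\\d+) flag=([^\\s}]+)");

    /** Returns the advised percentage per instance, in the order listed in the event. */
    static Map<String, Integer> percentByInstance(String eventBody) {
        Map<String, Integer> result = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(eventBody);
        while (m.find()) {
            result.put(m.group(1), Integer.parseInt(m.group(2)));
        }
        return result;
    }

    public static void main(String[] args) {
        String body = "VERSION=1.0 database=g5 { "
            + "{instance=g51 percent=80 flag=GOOD aff=FALSE}"
            + "{instance=g52 percent=20 flag=GOOD aff=FALSE}"
            + "} timestamp=2008-03-04 09:20:00";
        System.out.println(percentByInstance(body));
    }
}
```

In the subscriber, the string produced by new String(n.body()) would be passed to percentByInstance().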

We can even use ONC API to generate events for other subscribers. But... hold on!
Does it mean we can come up with our own Load Balancing Advisories using our own
mechanisms?

Make your own LBA


I couldn't find information about publishing your own events to ONS. It seems that ONS
daemons use plain HTTP for communication, so I considered publishing my own events
by producing fake HTTP POST requests.

In one of the examples, I saw a comment about a special event type javaAPI and an
example onc_publisher.java. Unfortunately, I wasn't able to locate this example on
either Metalink or Google, so I had to do some research and found the Publisher class
and the functionality related to publishing custom events.

Soon I was able to simulate my own Load Balancing Advisory. The demo example is
provided online as usual (demo 9) with some comments following.

Just like with subscribing using ONC API, a local ONS is required. You can use ONS
running from Clusterware home as well. As with subscriber example,
oracle.ons.oraclehome must be set either on command-line or at run-time.

LBA and HA events are simple text strings in the proper format. Let's say we want to send
an advisory suggesting 80% of the load to instance g51 and 20% to instance g52:

String event="VERSION=1.0 database=g5 { " +
"{instance=g51 percent=80 flag=GOOD aff=FALSE}" +
"{instance=g52 percent=20 flag=GOOD aff=FALSE}" +
"} timestamp=2008-03-04 09:20:00";

First of all, we instantiate a Publisher instance:

Publisher p = new Publisher("onc/mylba");

Then we need to create our notification (it accepts only byte array so we need to convert
string):

Notification n = new Notification(
"database/event/servicemetrics/s10gr2",
"","", event.getBytes());

Two empty arguments are affected components and affected nodes. Those are useful
for HA events but can be ignored for LBA events so I left them empty.

What's left is to publish that new notification:

p.publish(n);

It's that simple!

You can check ONS log file or use ONC subscriber example to verify it. If you trace your
JDBC connection cache, you would see that event was processed appropriately.
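
Hand-writing the event string is error-prone, so a small helper can assemble it from the desired distribution. This is a sketch only: the class is hypothetical, the layout simply mirrors the example string above with flag=GOOD and aff=FALSE hardcoded, and the requirement that percentages add up to 100 is my assumption rather than a documented rule.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds an LBA event body in the layout used above.
final class LbaPayloadSketch {

    private static final DateTimeFormatter TIMESTAMP =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    static String build(String database, Map<String, Integer> percentByInstance,
                        LocalDateTime timestamp) {
        int total = 0;
        StringBuilder body = new StringBuilder("VERSION=1.0 database=" + database + " { ");
        for (Map.Entry<String, Integer> e : percentByInstance.entrySet()) {
            total += e.getValue();
            body.append("{instance=").append(e.getKey())
                .append(" percent=").append(e.getValue())
                .append(" flag=GOOD aff=FALSE}");
        }
        // Assumption: an advisory should distribute exactly 100% of the load.
        if (total != 100) {
            throw new IllegalArgumentException("percentages must add up to 100, got " + total);
        }
        return body.append("} timestamp=").append(TIMESTAMP.format(timestamp)).toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> advisory = new LinkedHashMap<>();
        advisory.put("g51", 80);
        advisory.put("g52", 20);
        System.out.println(build("g5", advisory, LocalDateTime.of(2008, 3, 4, 9, 20, 0)));
    }
}
```

The returned string would be converted with getBytes() and wrapped in a Notification as shown earlier.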

The example used in the presentation is available online (demo 9).

Final thoughts
Oracle has done a nice job providing DBAs and application developers with a number
of tools that automate workload management in RAC to a great extent.
However, it still requires careful configuration, planning, an understanding of
application workload patterns and, most importantly, tight collaboration between
developers and database administrators to achieve full integration of applications and
database.

The more standard Oracle components are used, the easier it is to make them work
together and integrate database workload management into a full end-to-end enterprise
load balancing stack. On the other hand, Oracle left the door open for third-party
vendors and custom solutions, which require more effort but, at the same time, provide
more flexibility.

Even for applications that are not based on open standards, it's possible to implement
custom workload management scenarios with the ONC API or server-based FAN callouts
that can trigger external manipulation of application configuration in response to certain
events.

References
BLUNDHILD - Barb Lundhild - Automatic Workload Management with Oracle Real
Application Clusters (Oracle 11g)

JMORLE - James Morle - RAC Connection Management

ORACADG - Oracle Real Application Clusters Administration and Deployment Guide
11g Release 1, Chapter 4 Introduction to Automatic Workload Management

OTNSAMPLE - Oracle Real Application Clusters Sample Code on OTN

ODAG - Oracle Database Administrator's Guide 11g Release 1 (11.1), Chapter 25
Managing Resource Allocation with Oracle Database Resource Manager

NAOKO - Naoko Kanemoto Kurinami - NS Solutions: Real World Test Results for
Maximizing Load Balancing with Real Application Clusters

MLN5593693.8 - Metalink Note 5593693.8: Connection load balancing does not work
if db_domain is set

MLN6613950.8 - Metalink Note 6613950.8: RAC load balancing does not work when
one server is heavily loaded

MLN453347.1 - Metalink Note 453347.1: ORACLEDATASOURCE Freezes For 3 Mins
On Getting Connection When ConnectionCachingEnabled Is Set To True

Bug 6151350 - RESTRICTION ON SETONSCONFIGURATION

Bug 6315760 - SETONSCONFIGURATION ONLY REGISTERS A MAXIMUM OF
THREE NODES

+++

About Alex Gorbachev


Vice President, Pythian East Asia/Pacific

Few DBAs are as well equipped as Alex to handle any kind of database scenario. He
brings to The Pythian Group strong analytical skills and over 10 years of experience
with Oracle technologies. He has special expertise in high-availability, RAC, and
performance monitoring.

In his work before joining The Pythian Group, Alex developed and administered Oracle
databases and applications for LUKOIL, the Russian petrochemicals giant. He later
oversaw high-availability mission-critical applications for Amadeus, the leading Global
Distribution System and biggest processor of travel bookings in the world.

Alex is also a respected figure in the Oracle world, frequently presenting at international
conferences, and regularly publishing articles on The Pythian Group's weblog. Alex is a
member of OakTable Network and has been awarded Oracle Ace title by Oracle.

Alex holds a Bachelor of Engineering from Nizhegorodsky State Technical University.

About The Pythian Group

The Pythian Group is a leading provider of remote database administration services for
Oracle, SQL Server and MySQL. Providing top-to-bottom expertise -- from forensics and
architectural design to monitoring and tuning production environments -- is a hallmark of
our offering. With our expertise, flexible contracting, fractional resourcing, 24/7 service
and global reach we can achieve improved quality of service at a reduced cost.
www.pythian.com
