
Troubleshooting RV Daemon High Memory Consumption on Solaris

Introduction
This article explains how to troubleshoot, and how to avoid, one or more of the following
situations:

-- The RV daemon (rvd or rvrd) log file shows the entry "out of memory" or "Drainqueue: out of
memory".

-- RV daemon process consumes an unusually high amount of memory.

-- RV daemon is not responsive due to its high memory consumption.

Determining if RV Daemon is Consuming Excessive Amounts of Memory
You can use any of the following utilities:

a. top command

The top utility does not come standard with Solaris. You can obtain this utility from .

Example: run 'top -d 20 -s 15 > top.log' while the RV daemon is running. You can increase
the -d parameter to make sure that rvrd will be in the process list and use the -s parameter
to increase the update interval.

An example of a top result follows:

Generated by Clearspace on 2010-12-01-06:00



In the example above the rvrd64 process appears to be consuming a large amount of
memory.

b. prstat (example: prstat -c 1 5). Run man prstat for further information on the options
used.

You can also monitor the memory consumption pattern of the system's processes over time. Run
the following or a similar script continuously and save the output to a file.

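#!/bin/sh
# Log the rvd/rvrd memory footprint and a top snapshot once a minute.
while true
do
    now=`date "+%m/%d/%y:%H:%M:%S"`
    # vsz = virtual size in KB, pcpu = CPU share, args = full command line
    psStatus=`ps -e -o vsz,pcpu,args | grep rvd | grep -v grep`
    echo "$now $psStatus"
    # -b runs top in batch (non-interactive) mode
    top -b
    sleep 60
done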
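
Once the monitoring script has run for a while, its log can be summarized to see whether the
daemon's virtual size is actually growing. The following is a minimal sketch, not part of the
original procedure. It assumes a single rvd or rvrd process, a POSIX awk (nawk or
/usr/xpg4/bin/awk on Solaris), and the "timestamp vsz pcpu args" line layout that the script
above produces. The function name vsz_summary is hypothetical.

```shell
# Summarize the VSZ samples (in KB) found in the monitoring log.
# Lines that do not start with the script timestamp (for example the
# interleaved top output) are ignored.
vsz_summary() {
    awk '
        $1 ~ /^[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]:/ && $2 ~ /^[0-9]+$/ {
            v = $2 + 0
            if (n == 0) { first = v; min = v; max = v }
            if (v < min) min = v
            if (v > max) max = v
            last = v
            n++
        }
        END {
            if (n == 0) { print "no samples"; exit }
            printf "samples=%d first_kb=%d last_kb=%d min_kb=%d max_kb=%d growth_kb=%d\n",
                   n, first, last, min, max, last - first
        }
    ' "$@"
}
```

For example, vsz_summary rvd_monitor.log (the file name is only an illustration) prints one
summary line. A steadily positive growth_kb across several runs suggests sustained growth
rather than a short burst.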

Understanding how the RV Daemon uses Memory
a. rvd

The rvd daemon consumes memory to process incoming and outgoing messages. It also
uses memory to buffer data for retransmission and to maintain a buffer associated with each
client connection.

b. rvrd

In addition to the memory used by the rvd daemon mentioned above, the rvrd daemon
uses memory to process internal protocol messages and to communicate global routing
configuration changes. Furthermore, the rvrd daemon consumes memory for each created
router instance and connected neighbor.

Possible Causes for RV Daemon High Memory Consumption
In general, the reasons for high memory utilization by the RV daemon are:

a. Sending rate is too high; Slow Consumer/Fast Producer.

b. Message size is large.

c. Low CPU availability. This causes the RV daemon to become starved for CPU resources.
This may be seen more frequently with high message rates and a large number of client
connections.

d. Slow disk writes.

e. Slow network capability (slow WAN access for example).

The following questions help determine the cause of high memory consumption by eliminating
unrelated causes one at a time:

a. Is the memory leak caused at the OS layer?


RV calls OS system functions at various places in its code, and a memory leak could be caused
by a function call at this level. Make sure that the host's OS is up to date with patches.

You can use the system command smpatch (requires root privilege to execute) with the analyze
subcommand to examine your system and generate a list of patches appropriate for it. When
the analysis completes, you can use the update subcommand, or the download and then add
subcommands, to download and apply new patches to your system.

Please run man smpatch for more information regarding this command.

b. Are you running into known RV memory leak issues?


Previous releases of RV had a few known memory leak issues, either at the RV daemon or
RV API level. Please review the latest RV Release Notes regarding memory bug fixes and
consider upgrading to the latest RV release.

Below is a list of memory leak related defect corrections in various RV 7.X releases.

1-2KFXBP: 7.3

Fixed a defect in which the daemon logged spurious out-of-memory errors. The daemon
closes the affected connection as an associated symptom.

Out-of-memory conditions cause rvd to close the affected client connection.

Out-of-memory conditions cause rvrd to close the affected daemon-to-daemon connection.

1-201RG3: 7.3

Fixed a daemon defect in which a large number of simultaneous client transport connections
could cause large memory growth.

1-219VXE: 7.3

Fixed a daemon memory leak (a small amount per client transport connection).

1-1B2W3O: 7.3

Fixed a memory management bug in which the C client library made unneeded internal
copies of messages.

1-193ZWX: 7.3

Corrected a memory fragmentation issue in the client API libraries, which would occur when
an application held references to a large number of messages. The symptom observed
would be excessive memory usage, out of proportion to the number of messages held by the
application.

1-3OHTGD: 7.4

Fixed a memory leak associated with tibrvTransport_SetDescription.

1-6NGVB5: 7.4.1

Fixed a routing daemon memory issue in which direct client connections could cause the
routing daemon to abruptly exit, without any trace information.

1-6Y66N1: 7.5

Improved memory management within daemons reduces the number of message latency
spikes.

c. Is the RV daemon process's memory limited by ulimit?


Check to see that your RV daemon process has not exceeded the limit set by the ulimit
setting of the system. Confirm that the host where the affected RV daemon is running has
enough system and process memory.

Even if there is sufficient memory available to the system, the OS may not be able to
allocate additional memory to a process for the following reasons:

1. The OS is imposing memory restrictions for individual processes via ulimit. You can run
the ulimit -a command to see if there are memory quotas per process. The following is a
screen copy of a ulimit -a command result:

sol% ulimit -a
time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         unlimited
stack(kbytes)        8192
coredump(blocks)     unlimited
nofiles(descriptors) unlimited
vmemory(kbytes)      unlimited

If vmemory is set to a small value, set it to unlimited or, if unlimited is not allowed,
increase it as far as your system policy permits.

2. Even if ulimit is configured to provide unlimited memory allocation for a process, the
process can only grow up to the maximum size that its address space allows (a 32-bit process
cannot address more than 4 GB). Beyond that size the OS is unable to allocate more memory to
it. If this is the case, consider using the 64-bit version of the RV daemon instead.
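
The ulimit check above can be scripted so that it is easy to repeat on several hosts. This is
a rough sketch that assumes the output layout shown in the screen copy above (a label such as
vmemory(kbytes) followed by the value) and a POSIX awk. Other shells print ulimit -a in a
different layout, so the pattern may need adjusting. The function name check_mem_limits is
hypothetical.

```shell
# Read "ulimit -a" output on stdin (or from a file) and report whether the
# per-process data and virtual memory limits are restricted.
check_mem_limits() {
    awk '
        $1 ~ /^(data|vmemory)\(kbytes\)$/ {
            if ($2 == "unlimited") printf "%s ok\n", $1
            else                   printf "%s limited to %s KB\n", $1, $2
        }
    ' "$@"
}
```

A typical use is ulimit -a | check_mem_limits, run from the same account and shell that
starts the RV daemon, since limits are inherited from the starting shell.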

d. Are the receivers unable to keep up with the inbound messages?


RV daemons queue messages as they are received before forwarding them on to connected
clients. By default, rvd buffers sixty seconds' worth of messages. The more messages your
application sends per sixty seconds, and the larger those messages are, the larger the
resulting rvd memory footprint. If messages arrive faster than the RV daemon can forward them
to the clients, or faster than the clients can consume them, the memory used by the RV daemon
grows without bound. This situation can eventually cause the RV daemon to exhaust available
memory, at which point the "drainqueue" and "out of memory" messages are written to the log.
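
The sixty-second buffering described above allows a back-of-the-envelope estimate of the
buffer footprint: message rate times average message size times sixty. The sketch below only
performs that arithmetic. It counts message payload alone and ignores any per-message
overhead inside the daemon, so treat the result as a rough lower bound. The function name
estimate_buffer_mb is hypothetical.

```shell
# Estimate the memory (in MB) needed to hold sixty seconds of messages.
#   $1 = messages per second
#   $2 = average message size in bytes
estimate_buffer_mb() {
    awk -v rate="$1" -v size="$2" \
        'BEGIN { printf "%.1f\n", rate * size * 60 / (1024 * 1024) }'
}
```

For instance, estimate_buffer_mb 5000 1024 prints 293.0, meaning that 5000 messages per
second of 1 KB each would account for roughly 293 MB of buffered payload.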

Note that "out of memory" messages suggest that either the RV daemon or some other
application is consuming all available system memory. As a result, the RV daemon will
not be able to allocate memory to carry out certain tasks. While the RV daemon attempts to
continue in the presence of the memory allocation failure, it is quite likely that the OS will
terminate the process and release the allocated memory.

e. Is it possible that large amounts of message buffering between rvrd neighbors cause high
memory consumption?

First, let's understand why there is message buffering between rvrd neighbors. Consider the
following:

Every WAN connection has a maximum message throughput capacity. Routing daemons
cannot exceed this physical limitation. When the volume of routed data is greater than
the WAN capacity, rvrd buffers the outbound data which causes memory consumption to
increase.

Data backlog can occur for several reasons:

-- An unexpected burst of data exceeds WAN capacity.

-- A temporary problem with the WAN sharply decreases its bandwidth capacity.

-- WAN capacity is insufficient for the required volume of data.

-- WAN capacity is generally sufficient, but rvrd is misconfigured to route more data than
expected, so the total data volume exceeds WAN capacity.

An extremely large backlog can cause severe problems regarding memory growth for rvrd
and its host computer.

You can use the RV daemon's HTTP interface to see the peak backlog of each neighbor
connection and determine whether backlog is the reason for the memory growth. (For
information on how to launch the HTTP interface, refer to the TIBCO Rendezvous Administrator
Manual > Chapter 4, section "Browser Administration Interface-rvd".) If backlog is indeed the
cause, consider enabling the rvrd backlog protection feature, which keeps rvrd's memory from
growing without bound. When enabling this feature, specify the maximum permissible backlog in
kilobytes. The router applies this maximum to all of its neighbor connections.

In the rvrd HTTP interface, the Connected Neighbors page displays the peak backlog for each
neighbor. This information gives an idea of what to set as the maximum backlog value when
configuring the backlog protection feature.

When an outbound backlog exceeds the maximum backlog value for any neighbor
connection, rvrd automatically disconnects from that neighbor, clears the corresponding
outbound data buffer and attempts to reconnect to the neighbor.

Enabling this feature will result in data being discarded in certain extreme circumstances.
When this feature is disabled (the default), the routing daemon does not protect against
backlog. The decision to enable backlog protection is generally based on deployment criteria.

For more details on configuring the backlog feature, please refer to the TIBCO Rendezvous
Administrator Manual > Chapter 5 "Routing Daemon (rvrd)" > section "Routers".

It is incorrect to assume that there is a 1:1 ratio between the growth of the rvrd process's
memory and the size of the backlog. It is common to see the rvrd process size at
anywhere from 2.7 to 3.0 times the backlog size. The reason is that there are memory and
CPU tradeoffs when attempting to deliver messages as quickly and reliably as possible.
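
The 2.7 to 3.0 ratio quoted above can be turned into a quick estimate of how large the rvrd
process may become for a given backlog. The sketch below applies only that ratio, using
integer arithmetic, and assumes a POSIX shell. It is an illustration of the rule of thumb, not
a sizing formula from the product documentation, and the function name
backlog_memory_range_kb is hypothetical.

```shell
# Print the expected rvrd process size range (in KB) for a given backlog.
#   $1 = backlog in kilobytes (for example a peak backlog value or the
#        maximum backlog configured for backlog protection)
backlog_memory_range_kb() {
    low=$(( $1 * 27 / 10 ))   # 2.7 times the backlog
    high=$(( $1 * 3 ))        # 3.0 times the backlog
    echo "${low}-${high}"
}
```

For example, backlog_memory_range_kb 100000 prints 270000-300000, so a 100,000 KB backlog
could correspond to roughly 270 MB to 300 MB of process memory.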

f. Is there a high rate of data retransmission?


Information about retransmission rates can be found by using the RV daemon's HTTP interface.
Refer to the TIBCO Rendezvous Administrator Manual > Chapter 4, section "Browser
Administration Interface-rvd", for information on how to launch the HTTP interface.

Information about retransmission can also be determined if you have a raw data capture
during the problematic period. You can use rvtrace to obtain the data capture.

Analyzing rvtrace data can help determine the retransmission rate. As a general rule of
thumb, if the retransmission rate is above 2% it should be treated as an alert. A rate above
5% will require immediate attention.
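
The rule of thumb above can be applied mechanically once you have packet counts for the
problematic period. The following sketch assumes the retransmission rate is computed as
retransmitted packets divided by total packets sent. How you obtain the two counts (from the
HTTP interface or from an rvtrace capture) is left open, and the function name retrans_level
is hypothetical.

```shell
# Classify a retransmission rate against the 2% and 5% thresholds.
#   $1 = retransmitted packets
#   $2 = total packets sent
retrans_level() {
    awk -v r="$1" -v t="$2" 'BEGIN {
        if (t <= 0) { print "no traffic"; exit }
        pct = 100 * r / t
        if (pct > 5)      level = "critical"   # requires immediate attention
        else if (pct > 2) level = "alert"
        else              level = "ok"
        printf "%.2f%% %s\n", pct, level
    }'
}
```

For example, retrans_level 30 1000 prints "3.00% alert".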

Retransmission can originate at a lower network layer. Make sure that all machines on the
same network have NICs configured with the same speed setting (avoid the autospeed setting)
and that the duplex mode is set to the same value for all NICs.

You might want to consider reviewing the following RV Best Practice article to minimize
retransmission:

For further information on the rvtrace tool, please refer to the TIBCO Rendezvous
Administration manual > Chapter 12 "Protocol Monitor (rvtrace)".

g. How many applications are connected to the RV daemon?


This question arises because of a known defect (CR# 1-201RG3), which was fixed in the
RV 7.3 release. When many clients (600-1500) are connected to an RV daemon (rvd or
rvrd), the RV daemon's memory usage can increase substantially. With this defect, it is
common for the RV daemon to reach 200 MB to 900 MB with 600-1500 connections.

If you do run into this defect, the memory profiling data will show that the "IPQ pools"
consume most of the memory. Obtaining memory profiling data is discussed near the end of
this article.

h. Was there any increase in the message rate/size in the recent past?

You can obtain the message rate and size by logging into the HTTP interface of the RV
daemon. Refer to the TIBCO Rendezvous Administrator Manual > Chapter 4, section
"Browser Administration Interface-rvd", for information on how to launch the RV daemon's
HTTP interface.

The message rate and size help determine whether you are running into a slow consumer/fast
producer situation that eventually causes messages to build up in the RV daemon's buffer.
Avoid sending messages in a tight loop. If possible, introduce a small delay between the
sending of messages.

For long-term planning, if message rates are high, dedicate the machine that hosts the RV
daemon to it as far as possible, and run only applications with low memory consumption
alongside it on that machine.

i. Examining the daemon log file for further clues


It is advisable to run the RV daemon with the -logfile option. You can review the log file
to see if there are any additional error messages. You should also pay attention to those
messages that appear around the time the RV daemon logs the out of memory error
message.
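
To locate the relevant part of a large log file quickly, you can search it for the
out-of-memory entries and note their line numbers before reading the surrounding lines. The
sketch below is a simple case-insensitive search. It assumes only that the entries contain
the text "out of memory", as described in the introduction. The function name oom_entries is
hypothetical.

```shell
# Print every out-of-memory entry in the daemon log with its line number.
#   $1 = path to the file given to the daemon via -logfile
oom_entries() {
    grep -in 'out of memory' "$1"
}
```

Each output line starts with the line number in the log, which you can then use to inspect
the messages logged just before and after the error.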

j. Monitor Disk I/O performance


Run iostat -xn 30 during peak times to observe the I/O characteristics of your devices.
Please ignore the leading summary statistics and view the output every thirty seconds.

Here is a sample result of the iostat -xn 30 command:

In summary, the following points should be considered:

1. If you are seeing asvc_t (service time) values of more than 20 ms on disks that are in
use (say, those that are at least 10% busy as shown in the %b column), then the end user will
see noticeably sluggish performance.

2. If a disk is more than 60% busy (%b column) over sustained periods of time, this
can indicate overuse of that resource. The %b column of the iostat statistic provides a
reasonable measure for utilization of regular disk resources.

3. A high disk saturation (as measured by the iostat's %w column) always causes some level
of performance impact since I/Os are forced to queue up. If iostat consistently reports %w
being greater than 5, the disk subsystem is too busy.

Another indication of a saturated disk I/O subsystem is when the procs/b section of vmstat
persistently reports a number of blocked processes that is comparable to the run queue
(procs|kthr/r). (The run queue is roughly comparable to the load average.)
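
The three iostat thresholds listed above can be checked automatically against captured
output. The sketch below assumes the usual iostat -xn column order (r/s, w/s, kr/s, kw/s,
wait, actv, wsvc_t, asvc_t, %w, %b, device) and a POSIX awk. It skips the first report
because that one is the summary since boot. The function name flag_busy_disks is
hypothetical.

```shell
# Flag disks that cross the asvc_t, %b or %w thresholds in "iostat -xn" output.
flag_busy_disks() {
    awk '
        # Each report starts with this banner; the first report is the
        # since-boot summary and is ignored.
        /extended device statistics/ { block++; next }
        # Data rows have 11 fields and a numeric first column (r/s).
        block >= 2 && NF == 11 && $1 ~ /^[0-9.]+$/ {
            asvc = $8 + 0; w = $9 + 0; b = $10 + 0; dev = $11
            if (b >= 10 && asvc > 20)
                printf "%s: asvc_t %.1f ms on a disk %d%% busy\n", dev, asvc, b
            if (b > 60)
                printf "%s: %d%% busy\n", dev, b
            if (w > 5)
                printf "%s: %%w is %d\n", dev, w
        }
    ' "$@"
}
```

A typical use is iostat -xn 30 10 | flag_busy_disks during a peak period. The thresholds
apply to sustained values, so a disk that is flagged once is less significant than one that is
flagged in most intervals.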

Information to be sent to TIBCO Support


If after trying all of the above you still have not found a reason for the memory consumption
problem, collect the following data and submit it to TIBCO Support for further analysis:

1. Output of ulimit -a

2. Output of uname -a

3. RV daemon log file.

4. If the daemon is rvrd, please also include the store file.

The following commands should be run during the problematic period.

5. Resource monitoring (top or prstat) script. Run this script as discussed in the section
"Determining if RV Daemon is Consuming Excessive Amounts of Memory" above.

6. rvtrace -filter ip -w rvtrace.cap. Run this command for at least ten minutes on any machine
on the same network. You might need to run this command a bit differently if you have
multiple network cards on the machine where you run it. Please refer to
the TIBCO Rendezvous Administration manual > Chapter 12 "Protocol Monitor (rvtrace)", for
more details.

7. Turn on Memory Profiling.

Profiling is enabled by visiting the RV daemon's profiling page (<RV daemon's URL>/profiling).
Example: http://cbui-t60.na.tibco.com:7580/profiling

You will be presented with a page where you can select the Memory box, the Function
Invocations box, or both. After selecting the Memory Profiling box, click the submit button.
Function profiling reports how much time has been spent in each function, and memory
profiling provides details of the daemon's memory usage. When either is enabled, the daemon
prints the data to its log every five seconds until profiling is disabled (using the same
URL). Because profiling writes data to the daemon's log, be sure to start your RV daemon with
the -logfile option.

Word of caution: While profiling is enabled, you will see significant growth in CPU and
memory usage, because the daemon logs statistics on memory allocation and deallocation as
well as every function accessed and how often it was called. We recommend enabling profiling
for about ten minutes, and only during the problematic period (when you see an increase in
memory usage). Disable profiling shortly thereafter, because the log file grows quickly.

8. What was the affected daemon doing during the problematic period?

You can run pstack or truss against the RV daemon before restarting it. You can also generate
a core file using kill -3 <pid of daemon>, or control-\ if the process is running in the
foreground. You can later run dbx or gdb against the core file to generate a stack trace.

Note 1: Make sure the core file size is not limited by the system. If in doubt, run ulimit
-a and change the size accordingly (unlimited is preferred).

Note 2: If the affected RV daemon was running with the -foreground option, a core file will
be created in the directory from which the daemon was started. If the affected daemon
was running in the background (the default), the core file will normally be created in the
root directory. Be sure you have write privilege for the directory.

If you can't find the core file, talk to your system administrator, as your system may have
been configured to generate core files in a specific directory.
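
The static items in the list above (ulimit -a, uname -a, the daemon log and, optionally, a
pstack of the running daemon) can be gathered into one directory with a small helper. This is
a minimal sketch assuming a POSIX shell. It does not run rvtrace or enable profiling, and it
does not copy the rvrd store file, which you should add by hand. The function name
collect_rv_diag and its argument order are hypothetical.

```shell
# Gather basic diagnostic data for a support case into one directory.
#   $1 = output directory (created if missing)
#   $2 = path to the daemon log file (optional)
#   $3 = pid of the RV daemon (optional, used for pstack)
collect_rv_diag() {
    outdir=$1; logfile=$2; pid=$3
    mkdir -p "$outdir" || return 1
    ulimit -a > "$outdir/ulimit.txt" 2>&1
    uname -a  > "$outdir/uname.txt" 2>&1
    if [ -n "$logfile" ] && [ -f "$logfile" ]; then
        cp "$logfile" "$outdir/"
    fi
    # pstack is skipped quietly on systems where it is not installed.
    if [ -n "$pid" ] && command -v pstack >/dev/null 2>&1; then
        pstack "$pid" > "$outdir/pstack.txt" 2>&1
    fi
    echo "$outdir"
}
```

Run the helper from the account that starts the daemon so that the ulimit output reflects the
limits the daemon actually inherits.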
