RV Daemon Memory Consumption on Solaris
Introduction
This article details how to troubleshoot and avoid one or more of the following
situations:
-- The RV daemon's (rvd or rvrd) log file shows the entry "out of memory" or "Drainqueue: out of
memory".
a. top command
The top utility does not come standard with Solaris and must be obtained separately.
Example: run 'top -d 20 -s 15 > top.log' while the RV daemon is running. You can increase
the -d parameter (the number of displays) so that the capture spans the period when rvrd
appears in the process list, and use the -s parameter to lengthen the interval between updates.
In the resulting output, look for the RV daemon process (for example, rvrd64) consuming a
large amount of memory.
b. prstat (example: prstat -c 1 5). Please run 'man prstat' for further information on the options
used.
You can also monitor the memory consumption pattern of the system's processes. Run the
following (or a similar) script continuously and save its output to a file:
#!/bin/sh
while true
do
    now=`date "+%m/%d/%y:%H:%M:%S"`
    echo "$now"
    top -b
    sleep 60
done
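For example, if the script above is saved as rvmon.sh (the file name is illustrative), it can be left running across the problematic period with its output captured to a file:

```shell
nohup sh rvmon.sh > top.log 2>&1 &
```

The nohup and trailing & let the script survive logout and run in the background while you wait for the memory growth to recur.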
a. rvd
The rvd daemon consumes memory to process incoming and outgoing messages. It also
uses memory to buffer data for retransmission and to maintain a buffer associated with each
client connection.
b. rvrd
In addition to the memory used by the rvd daemon mentioned above, the rvrd daemon
uses memory to process internal protocol messages and to communicate global routing
configuration changes. Furthermore, the rvrd daemon consumes memory for each created
router instance and connected neighbor.
c. Low CPU availability. This causes the RV daemon to become starved for CPU resources.
This may be seen more frequently with high message rates and a large number of client
connections.
The following sections discuss the high memory consumption issue and how to determine the
cause of the problem by eliminating unrelated causes:
You can use a system command called smpatch (root privilege required) with the analyze
subcommand to examine your system and generate a list of applicable patches. When that
completes, you can use the update subcommand, or the download and then add subcommands,
to download and apply new patches to your system.
Please run 'man smpatch' for more information regarding this command.
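As a sketch, a patch-maintenance session might look like the following (run as root; the patch ID shown is purely illustrative):

```shell
smpatch analyze                  # list patches applicable to this system
smpatch update                   # download and apply them in one step
# or, as two explicit steps:
smpatch download -i 119254-34    # illustrative patch ID only
smpatch add -i 119254-34
```

Applying current kernel and libc patches first rules out already-fixed OS-level memory problems before you dig into the RV daemon itself.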
Below is a list of memory leak related defect corrections in various RV 7.X releases.
1-2KFXBP: 7.3
Fixed a defect in which the daemon logged spurious out-of-memory errors. The daemon
closes the affected connection as an associated symptom.
1-201RG3: 7.3
Fixed a daemon defect in which a large number of simultaneous client transport connections
could cause large memory growth.
1-219VXE: 7.3
Fixed a daemon memory leak (a small amount per client transport connection).
1-1B2W3O: 7.3
Fixed a memory management bug in which the C client library made unneeded internal
copies of messages.
1-193ZWX: 7.3
Corrected a memory fragmentation issue in the client API libraries, which would occur when
an application held references to a large number of messages. The symptom observed
would be excessive memory usage, out of proportion to the number of messages held by the
application.
1-3OHTGD: 7.4
1-6NGVB5: 7.4.1
Fixed a routing daemon memory issue in which direct client connections could cause the
routing daemon to abruptly exit, without any trace information.
1-6Y66N1: 7.5
Improved memory management within daemons reduces the number of message latency
spikes.
Even if there is sufficient memory available to the system, the OS may not be able to
allocate additional memory to a process for the following reasons:
1. The OS is imposing memory restrictions on individual processes via ulimit. You can run
the 'ulimit -a' command to see whether there are per-process memory quotas. The following is a
screen copy of a 'ulimit -a' command result:
sol% ulimit -a
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) 8192
coredump(blocks) unlimited
nofiles(descriptors) unlimited
vmemory(kbytes) unlimited
If vmemory is set to a small value, try increasing it; use unlimited if that value is allowed.
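For example (Bourne-shell syntax; the 4 GB value below is purely illustrative):

```shell
# Inspect the current per-process virtual memory quota.
ulimit -v
# If a small cap is shown, raise it in the daemon's startup script
# before launching rvd/rvrd, for example:
#   ulimit -v 4194304    # 4 GB, or "unlimited" where permitted
```

Note that raising a hard limit generally requires root privilege, and the change applies only to the shell that runs it and its children, so it belongs in the daemon's startup script.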
2. Even if ulimit is configured to provide unlimited memory allocation for a process, per-process
address space limitations mean the process can only grow to a maximum size, after which
the OS will be unable to allocate more memory. If this is the case, consider using the 64-bit
version of the RV daemon instead.
Note that "out of memory" messages suggest that either the RV daemon or some other
application is consuming all available system memory. As a result, the RV daemon will
not be able to allocate memory to carry out certain tasks. While the RV daemon attempts to
continue in the presence of the memory allocation failure, it is quite likely that the OS will
terminate the process and release the allocated memory.
Every WAN connection has a maximum message throughput capacity. Routing daemons
cannot exceed this physical limitation. When the volume of routed data is greater than
the WAN capacity, rvrd buffers the outbound data which causes memory consumption to
increase.
Typical causes include:
-- A temporary problem with the WAN sharply decreases its bandwidth capacity.
-- WAN capacity is generally sufficient, but rvrd is misconfigured to route more data than
expected, so the total data volume exceeds WAN capacity.
An extremely large backlog can cause severe problems regarding memory growth for rvrd
and its host computer.
You can use the RV daemon's HTTP interface (refer to the TIBCO Rendezvous Administrator's
Manual > Chapter 4, section "Browser Administration Interface - rvd", for information on how to
launch it) to see the peak backlog of each neighbor connection
and determine whether this is the reason for memory growth. If it is indeed the cause, you may
want to consider enabling the rvrd backlog protection feature for each neighbor connection.
This feature protects rvrd from growing without bound. When enabling this feature, specify
the maximum permissible backlog in kilobytes. The router applies this maximum to all the
neighbor connections.
Note that when using the rvrd daemon's HTTP interface you can navigate to the Connected
Neighbors page, which displays the peak backlog for each neighbor. This information
provides an idea of what to set as the maximum backlog value when configuring the backlog
feature.
When an outbound backlog exceeds the maximum backlog value for any neighbor
connection, rvrd automatically disconnects from that neighbor, clears the corresponding
outbound data buffer and attempts to reconnect to the neighbor.
Enabling this feature will result in the discarding of data in certain extreme circumstances.
When this feature is disabled (the default), the routing daemon does not protect against
backlog. The decision to enable the backlog is generally based upon deployment criteria.
For more details on configuring the backlog feature, please refer to the TIBCO Rendezvous
Administrator Manual > Chapter 5 "Routing Daemon (rvrd)" > section "Routers".
It is incorrect to assume that there is a 1:1 ratio between the rvrd process's memory
increment and the backlog size. It is common to see the rvrd process size at
anywhere from 2.7 to 3.0 times the backlog size. The reason is that there are memory and
CPU tradeoffs when attempting to deliver messages as quickly and reliably as possible.
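As a quick sizing sketch based on that 2.7x-3.0x ratio (the observed peak backlog value is illustrative):

```shell
backlog_kb=102400    # illustrative peak backlog: 100 MB
low=`awk "BEGIN  { printf \"%d\", $backlog_kb * 2.7 }"`
high=`awk "BEGIN { printf \"%d\", $backlog_kb * 3.0 }"`
echo "expected rvrd process growth: ${low} to ${high} KB"
```

A 100 MB backlog can therefore be expected to add roughly 270-300 MB to the rvrd process size, which is worth keeping in mind when provisioning the host.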
Information about retransmission can also be determined if you have a raw data capture
during the problematic period. You can use rvtrace to obtain the data capture.
Analyzing rvtrace data can help determine the retransmission rate. As a general rule of
thumb, if the retransmission rate is above 2% it should be treated as an alert. A rate above
5% will require immediate attention.
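As a sketch of how these thresholds can be applied to counters taken from an rvtrace report (the packet counts below are illustrative):

```shell
sent=120000      # illustrative: total packets sent
retrans=3000     # illustrative: packets retransmitted
rate=`awk "BEGIN { printf \"%.2f\", ($retrans / $sent) * 100 }"`
echo "retransmission rate: ${rate}%"
if awk "BEGIN { exit !($rate > 5) }"; then
    echo "requires immediate attention"
elif awk "BEGIN { exit !($rate > 2) }"; then
    echo "alert: investigate"
fi
```

With these sample values the rate works out to 2.50%, above the 2% alert level but below the 5% level that demands immediate attention.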
The cause of retransmission can be rooted in the lower network layers. Make sure all machines
on the same network are equipped with NICs set to the same speed (avoid the
auto-negotiation setting) and that the duplex mode is set to the same value for all NICs.
You might want to consider reviewing the RV Best Practices article on minimizing
retransmission.
For further information on the rvtrace tool, please refer to the TIBCO Rendezvous
Administration manual > Chapter 12 "Protocol Monitor (rvtrace)".
Also, if you run into one of these defects, the memory profiling data will reveal that the "IPQ
pools" are consuming most of the memory. Obtaining memory profiling data is discussed
near the end of this article.
Information on message rate and size will help determine whether you are running into a
slow consumer/fast producer situation that eventually causes messages to build up in the RV
daemon's buffer. Always try to avoid sending messages in a tight loop; if possible,
introduce a small delay between sends.
For long-term planning, if message rates are high, isolate the machine that hosts the RV
daemon and allow only applications with low memory consumption to run on that same
machine.
1. If you are seeing asvc_t (service time) values of more than 20 ms on disks that are in
use (those with, say, 10% busy as denoted in the %b column), the end user will see
noticeably sluggish performance.
2. If a disk is more than 60% busy (%b column) over sustained periods of time, this
can indicate overuse of that resource. The %b column of the iostat statistic provides a
reasonable measure for utilization of regular disk resources.
3. A high disk saturation (as measured by the iostat's %w column) always causes some level
of performance impact since I/Os are forced to queue up. If iostat consistently reports %w
being greater than 5, the disk subsystem is too busy.
Another indication of a saturated disk I/O subsystem is when the procs/b section of vmstat
persistently reports a number of blocked processes that is comparable to the run queue
(procs|kthr/r). (The run queue is roughly comparable to the load average.)
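The statistics discussed above can be gathered with the standard Solaris monitoring tools; a typical invocation (the 5-second interval is illustrative) is:

```shell
iostat -xn 5     # per-device view; watch the asvc_t, %b and %w columns
vmstat 5         # watch procs r (run queue) and b (blocked processes)
```

Let each command run through several intervals; the first line of output reports averages since boot and should be ignored.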
When collecting diagnostic information, include the following:
1. ulimit -a
2. uname -a
4. In case the daemon is rvrd, please also include the store file.
5. Resource monitoring (top or prstat) script output. Run this script as discussed in the
"Determining if the RV daemon is indeed consuming a large amount of memory" section above.
6. rvtrace -filter ip -w rvtrace.cap. Run this command for at least ten minutes on any machine
on the same network. You might need to run this command a bit differently if the machine
where you run it has multiple network cards. Please refer to
the TIBCO Rendezvous Administration manual > Chapter 12 "Protocol Monitor (rvtrace)", for
more details.
Profiling is enabled by visiting the RV daemon's profiling page (<RV daemon's URL>/profiling).
Example: http://cbui-t60.na.tibco.com:7580/profiling. You will be presented with a page where
you can check the Memory Profiling box, the Function Invocations box, or both; then click the
Submit button. Function profiling tells how much time
has been spent in each function, and memory profiling provides details of the daemon's
memory usage. When one or both of these is enabled, the daemon prints the data to its log
every five seconds until profiling is disabled (using the same URL mentioned above).
As Function/Memory Profiling writes data to the daemon's log, please be sure to start your
RV daemon with the -logfile option. Word of caution: while using the profiling feature, you
will see significant growth in CPU/memory usage, as the daemon logs statistics relating to
memory allocation/deallocation, along with information on all functions accessed and how
frequently. We recommend enabling profiling for only about ten minutes
during the problematic period (when you see an increase in memory usage). Disable
profiling shortly thereafter, as the log file grows quite significantly in size.
8. What was the affected daemon doing during the problematic period? You can run pstack or
truss against the RV daemon before restarting it. Also, you can generate a core file using
'kill -3 <pid of daemon>', or Ctrl-\ if the process is running in the foreground. You can later run
dbx or gdb against the core file to generate a stack trace.
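As a sketch, assuming the standard Solaris proc tools are available and the daemon binary is named rvrd (the output file names are illustrative):

```shell
pid=`pgrep rvrd`                  # pid of the affected daemon
pstack $pid  > rvrd.pstack.txt    # user-level stack of every thread
pmap -x $pid > rvrd.pmap.txt      # address-space map (memory breakdown)
gcore $pid                        # writes core.<pid> without killing the process
```

Unlike kill -3, gcore captures a core image while leaving the daemon running, so it can be collected before any disruptive restart.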
Note 1: Make sure the core file size is not limited by the system. If in doubt, run 'ulimit
-a' and change the coredump size accordingly (unlimited is preferred).
Note 2: If the affected RV daemon was running with the -foreground option, a core file will
be created in the directory from which the daemon was executed. If the affected daemon
was running in the background (the default), the core file will normally be created in the root
directory. Be sure you have write privilege for that directory.
If you can't find the core file, talk to your system administrator, as your system may have
been configured to generate core files in a specific directory.