
Batch Best Practices

Oracle Utilities Application Framework


ORACLE WHITE PAPER | APRIL 2015
Table of Contents

Introduction 1

Caveat 1

Conventions used in this whitepaper 1

Batch concepts 2

Batch Program 2

Batch Controls 2

Timed Batch 3

Level Of Service 3

Batch Overview 4

COBOL Versus Java Processes 4

Executing Batch Jobs 4

Scheduler Submission Overview 5

Configuration File Hierarchy 6

Execution Modes 7

Issues with DISTRIBUTED mode 9

CLUSTERED Mode internals 12

Worker Initialization 12

Submitter Initialization 13

Process 13

Member Validation 13

Scheduler Daemon 14



Socket issues in CLUSTERED mode 14

Clustering using Unicast or Multicast 15

Migrating from DISTRIBUTED to CLUSTERED mode 16

CLUSTERED Mode Operations 17

Testing for multi-cast 18

Demonstration Single machine setup 18

Timed Batch 18

CLUSTERED Mode recommendations 19

Threadpools and Database Recycling 20

EXTENDED Mode 20

Threading Overview 24

Operational Best Practices 25

Parameter Guidelines 25

Use Of Batch Control For Parameters 26

Multiple Batch Controls for a Batch Program 26

Take Defaults For Parameters 26

Commonly Used Configuration Settings 27

Setting The Error Tolerance 27

Multi-threading Guidelines 27

Altering Commit Interval and Cursor Reinitialization time 28

Online Daemon or Standalone Daemon 30

Online Submission Guidelines 31

Use of Spacenames 31

Maximum Execution Attempts 31



Number of Threads Per Threadpool 32

Designing Your Threadpools 33

Threadpool Cluster Considerations 33

COBOL Memory Optimization for Batch 34

Example Clustered Mode scenarios 34

Generic configuration process 35

Sample Setup 36

Setup threadpools 36

Setup Clustered Mode 37

Starting the threadpools 41

Monitoring Background processes 42

JMX Monitoring 42

Global Batch View 42

Database Connection information 43

Commit Strategies 43

Flushing the Batch Cache 45

Online Submission Alternative Threadpools 45

Restart Threadpools Regularly 45

Overriding Threadpool log file names 46

Submitter Nodes Per Job 47

Clustered Mode – Dedicated Storage Node Recommendation 47

Clustered Mode – Roles Recommendation 48

Clustered Mode – Private Network Recommendation 50



Clustered Mode – Multiple Clusters Recommendation 50

Batch Memory Management 50

Setting Batch Log Filename Prefix 51

Adding Custom JMXInfo for Monitoring 52

Managing configuration using Batch Edit 52

Enabling BatchEdit 55

BatchEdit Command Line 55

BatchEdit Configuration Process 56

Cluster Creation and Maintenance 57

Threadpool Creation and Maintenance 58

Submitter Creation and Maintenance 59

Using Cache Threadpools 60

BatchEdit Common Configurations 61

Converting from existing setup to use BatchEdit 62

Scheduling Best Practices 62

Optimize the schedule 62

Use a third party batch scheduler 63

Scheduler Implementation Guidelines 64

Common Errors 65

No Storage Nodes Exist 65

Communication Delays 65

Threadpool Work Abends 66

MAX-ERRORS Exceeded 66



Introduction
One of the major components of the Oracle Utilities Application Framework is the ability to execute background
processes.

Typically a foreground process (a.k.a. online or Web Service transaction) performs operations on a single instance
of an object within the product; maintaining a person's contact details or processing a single payment are two such
examples. Background processing, by contrast, performs operations on multiple instances of an object. This is the
origin of the term batch: the background process works on batches of objects. The term background processing
implies that the processing is performed in the background with little or no user interaction.

The product ships with a preset number of background processes that may be used and configured to perform the
necessary business functions for your site. These background processes can be extended (just like the rest of the
product functionality) and custom background processes can be added.

This white paper outlines the common and best practices used for the background processing (a.k.a. batch)
component of the Oracle Utilities Application Framework. The advice in this whitepaper is based upon Oracle
internal studies and customer feedback around the world. This information is provided to guide other sites in
implementing or maintaining the product in production.

This document is a companion document to the product documentation and the Performance Troubleshooting
Guidelines – Batch Troubleshooting (Doc Id: 560382.1) whitepaper available from My Oracle Support.

Note: For publishing purposes, the word product will be used to denote all Oracle Utilities Application Framework
based products.

Caveat
While all care has been taken in providing this information, implementation of the practices outlined in this document
may NOT guarantee the same level of (or any) improvement. Not all practices outlined in this document will be
appropriate for your site. It is recommended that each practice be examined in light of your particular organizational
policies and use of the product. If the practice is deemed beneficial to your site, then consider implementing it. If the
practice is not appropriate (e.g. for cost and other reasons), then it should not be considered.

Conventions used in this whitepaper


The advice in this document applies to any product based upon Oracle Utilities Application Framework versions 2.1
and above. Refer to the installation documentation to verify which version of the framework applies to your version
of the product. For publishing purposes the specific facilities and instructions for specific framework versions will be
indicated with icons:

Advice or instructions marked with this icon apply to Oracle Utilities Application Framework
V2.2 based products and above.

Advice or instructions marked with this icon apply to Oracle Utilities Application Framework
V4.0 based products and above.

Advice or instructions marked with this icon apply to Oracle Utilities Application Framework
V4.1 based products and above.

Advice or instructions marked with this icon apply to Oracle Utilities Application Framework
V4.2.0.0.0 based products and above.

Advice or instructions marked with this icon apply to Oracle Utilities Application Framework
V4.3.0.0.0 based products and above.

Note: Advice in this document is primarily applicable to the latest version of the Oracle Utilities Application
Framework at time of publication. Some of this advice may apply to other versions of the Oracle Utilities Application
Framework and may be applied at site discretion.

Note: In some sections of this document the environment variable $SPLEBASE (or %SPLEBASE%) is used. This
denotes the root location of the product install. Substitute the appropriate value for the environment used at your
site.

Note: This document is a companion to the Server Administration Guide (V4.3 and above), Batch Server
Administration Guide (V4.0 - V4.2 only) or Operations And Configuration Guide (V2.x)

Batch concepts
The following section outlines some basic concepts when configuring and executing the batch component of the
Oracle Utilities Application Framework.

Batch Program
The main component of the background process is the batch program. This program contains the logic to select the
batch of records that the process will perform actions upon. The batch of records is progressively passed to the
relevant objects to complete processing. In essence, the program acts as a driver to push individual records to the
relevant objects. Objects in the product are shared across all modes of access to maximize reuse.

Apart from the logic to decide the subset of records, the batch program contains the following additional information:

» The batch program contains the code necessary to interface to the framework to automatically manage its
individual execution and restart (if necessary).
» Some of the batch programs contain the code necessary to multi-thread the processing. In essence the program
determines which slice or subset of the overall data it needs to process. Not all programs support multi-threading
(most do, except the extract to file type processes).
The product ships with a preset number of batch programs associated with background processes that may be used
and configured to perform the necessary business functions for your site. These background processes can be
extended (just like the rest of the product functionality) and custom batch programs can be added.

Batch Controls
A batch program and its parameters must be defined in metadata prior to initial execution. This data is stored in a
Batch Control object within the meta-data component of the framework.

The Batch Control contains the definition of the batch program and the following additional information:

» A Batch Code used as an identifier. This is used by the framework to identify the job internally and used to denote
output.
» Basic execution statistics for the last execution, including last execution date and time and the latest run number
which is primarily used for extract processing.
» Process specific parameters identify whether a parameter is optional or required and may provide a site specific
default value.
The product ships with a preset number of batch controls associated with background processes that may be used
and configured to perform the necessary business functions for your site. Custom batch controls can be added as
needed.



Timed Batch
Note: This feature applies to Oracle Utilities Application Framework V4.2.x and above only.

One of the features of the Oracle Utilities Application Framework is the ability to create timed batch jobs. Typically
most batch jobs are executed as a single execution on a regular time period such as once a day, once an hour etc.
In some implementation scenarios, it is required to run a batch job continuously (such as a daemon) to process data
as it is found. The Oracle Utilities Application Framework introduced a feature to allow implementations to create
jobs that are run continuously.

The concept is as follows:

» The batch program must be designed to run continuously. Existing jobs in the product cannot be converted from
non-timed to timed just using configuration. The Oracle Utilities SDK contains information about writing
continuous batch programs.
» An instance of the timed batch program runs for a timer interval and then completes on next commit after the
interval is reached. After completion a new instance is automatically started. This simulates a continuous
execution whilst minimizing performance impact on the overall system 1.
» The batch control for timed jobs must be configured with additional information including:
» Timer Interval - The duration each instance of the batch job will execute.
» Timer Active - This controls whether the job is executing or not. The job will only execute when the timer
active is set to true/Yes.
» UserId/Batch Language - Userid used for authorization and language for batch messages
» Email Address - Optional, email address used for monitoring for error executions.
» The job must be started manually initially to start the process executing.
» To stop a timed job, set the Timer Active to false.

Level Of Service
Note: This feature is only available in Oracle Utilities Application Framework V4.2.0.2.0 and above.

Note: This feature has been designed for use with Timed Batch, and only algorithms optimized for that style are
provided. It is applicable to other styles but requires an algorithm to be developed and configured.

One of the newer features of Batch processing is the ability to configure a level of service check on the batch control
to check the latest execution of the job against some criteria to ascertain whether it meets service expectations. This
facility is designed to provide basic service level feedback.

By default, the facility is disabled with the message Disabled - Level of Service reporting is not enabled for this batch
job displayed on the Batch Control screen. This indicates that the level of service is not configured for this batch.

To use this facility the following process needs to be performed:

» An algorithm of type Batch Level of Service must be created using the Oracle Utilities SDK or via ConfigTools
Service Scripts. This algorithm must be defined as an algorithm of this type and contain the logic to check
the last execution against a target (the logic is up to your business criteria). A sample, F1-BAT-LSDEF, has been
provided as a guide.
» On the Algorithms tab of the Batch Control, configure the new algorithm with the appropriate service levels on the
algorithm definition.
Whenever the batch control is displayed, the level of service will be assessed using the configured algorithm and
the result displayed on the Batch Control screen.

1 As timed batch is continuous it may execute concurrently with peak online hours.



Batch Overview
The Oracle Utilities Application Framework is a Java-based framework for running background processes, online
processes and Web Services. The internals of the framework manage the input channels and direct calls to the
appropriate objects within the product to perform the business logic and database access necessary.

In terms of background processing, the framework wraps the batch program to execute it and manage it from the
operational point of view. This includes the following:

» Providing the interface to track the progress of the threads via the Batch Run Tree transaction.
» Providing the infrastructure for the recording of restart checkpoints.
» Providing the infrastructure to handle other components like algorithms, developed in either COBOL or Java.
Background processes are executed within a JVM which has loaded the Framework components necessary for
batch execution. Each submission method may have other unique elements, but this basic summary is true for
each.

COBOL Versus Java Processes


Note: Not all Oracle Utilities Application Framework based products support COBOL. Refer to the installation
documentation with your product for more details. Products not using COBOL should ignore this section.

Note: This section does not apply to Oracle Utilities Application Framework V4.3.x and above.

The Oracle Utilities Application Framework supports both Java and COBOL 2 based background processes. Since
the Oracle Utilities Application Framework is Java based, all processes are executed within a batch JVM (e.g.
threadpool). When the process needs to invoke a COBOL based module, the required module is loaded and
executed internally by the JVM. The results are then passed back to the main Java objects. This method differs
from the one used by the Online and Web Service Adapter in that they use separate child JVMs to isolate COBOL
calls. The reason for this difference is that in Online and Web Services there could be a large number of different
COBOL objects called. Batch is usually limited to a smaller number of COBOL objects.

Given that COBOL and Java live in the same batch JVM, there are a number of concepts which must be
understood:

» Any COBOL program that misbehaves (bad data/bad memory management etc) can cause the failure of the
executing batch JVM. This may affect other jobs using the same batch JVM (threadpool) at the same time (or
even future scheduled executions if the batch JVM is not restarted). Regularly checking the threadpool via JMX
and logs is advised to avoid issues.
» COBOL programs are attached to the batch JVM as executed. This can increase the memory footprint over time.
COBOL modules cannot be garbage collected using the default method provided with Java. Over time, a long-running
batch JVM that is reused for a lot of batch processes may need to be stopped and restarted to avoid memory issues.

Executing Batch Jobs


There are a number of different ways to initiate background processes within the Oracle Utilities Application
Framework. The different ways are available to support the different activities typically seen during an
implementation of the product.

2 COBOL is not supported as of Oracle Utilities Application Framework V4.3.x and above.



The list below summarizes the methods available with the framework:
» Command line mode – This involves running the submitjob[.sh] -e THIN command from the system
prompt for each thread of each background process. A JVM is created for each thread and shuts down after
execution of the thread. This mode has limitations in that you cannot use JMX to monitor the job and, per thread,
is the most resource intensive of all submission methods. This is primarily recommended for developer use only.
» Online Submission – An online transaction is provided to allow end users to register their intent to execute a
background process. A batch daemon (as part of the application server or standalone) polls for registrations of
background processes and executes them. This mode is primarily recommended for non-production use (for
example testing).

Note: In Oracle Utilities Application Framework V2.2 it is now possible, after implementing Patch 9364072, to use
environment variables when specifying parameters. For example, using ${SPLOUTPUT} in the FILE-PATH
parameter on the online submission. Environment variables must be surrounded by ${}.

» Batch Scheduler Submission – A set of utilities is provided to allow third party schedulers to execute
background processes from the product. These allow a scheduler to micro-manage the processes on behalf of
the product and also allow integration to non-product processes into your overall schedule.
Most implementations use the methods in the following ways:

» The Command Line method is not typically used by site personnel. Developers may consider using it but may also
use the other methods.
» The Online Submission Method is used by most implementations for non-production environments where there is
no scheduler present. This allows testing personnel to submit background processes as necessary. The
implementation has a choice in terms of how the execution via the daemon actually occurs. A discussion of the
various execution methods are discussed in Online Daemon Or Standalone Daemon.
» The Scheduler Submission Method is commonly used for production and a scheduler test environment (it is
expected that the IT group will become familiar with the scheduler before implementing it into
production). Guidelines for this method are outlined in Scheduler Implementation Guidelines.

Scheduler Submission Overview


One of the common practices at product implementations is to implement a batch scheduler to schedule and
execute batch jobs across an enterprise 3 (including product jobs). Therefore the product includes a set of scheduler
submission utilities that allow third party schedulers to execute background processes. These include a set of
command-line utilities and their associated configuration files. The intent is that site-specific configurations are
applied for the desired subset of background processes. The scheduler then executes those processes as
necessary across the hardware allocated for this purpose.

The basic concepts for these utilities can be summarized as follows:

» A worker JVM is started (termed a threadpool) using the threadpoolworker[.sh] utility. This starts a JVM
and loads the Oracle Utilities Application Framework ready to accept work. The threadpoolworker[.sh]
utility uses a set of configuration files to determine the characteristics of any threadpool it needs to manage. In
Oracle Utilities Application Framework V2.2, it is possible to manage the threadpools via JMX using a JMX
console or the provided jmxbatchclient[.sh] utility. The threadpool is given a name which is used by the
Oracle Utilities Application Framework to attribute work to it as directed.
» Each thread (or multiple threads) can be submitted to the named threadpool using the submitjob[.sh] utility.
During this process a small submitter JVM is created to initiate communication between the background process
and the threadpool. This JVM is also used by the submitjob[.sh] utility to communicate back to the scheduler
on the outcome of the execution of the background process.
» At the conclusion of the background process the submitjob[.sh] utility returns the outcome to the scheduler
for the appropriate action. This is where an external scheduler can decide the next action based upon the
outcome of previous executions of background processes. For example, in the case of failure it may stop
dependent background processes and page the appropriate personnel.
Guidelines on how to use this facility are contained in Scheduler Implementation Guidelines.

3 Whilst sites will use the implementation of the product to introduce the batch scheduler, it is apparent that the scheduler is best used for enterprise-
wide scheduling.

Note: Do not attempt to edit the properties files directly. These are rebuilt from the .template files each time
initialSetup[.sh] is executed (i.e. when patches are applied) and all changes WILL BE lost. All changes
should be made in the threadpoolworker.properties.template and
submitbatch.properties.template respectively, and initialSetup[.sh] executed after each change. Refer to
the Operations And Configuration Guide, Batch Server Administration Guide or Server Administration Guide for your
product for details of how to implement custom templates.

Note: Keep backups of your .template files. Patches and Service packs will overwrite these files with defaults.
When this happens, check the new .templates for any NEW settings which may be necessary for the newly patched
environment(s) prior to restoring your site-specific configuration.

Configuration File Hierarchy


A component of the Scheduler Submission Method is the set of configuration files that control the behavior for the
interface and the options that are used by the execution. For maximum flexibility there is an inbuilt hierarchy for the
configuration options:

» The threadpoolworker[.sh] and submitjob[.sh] have internal defaults that are used if no configuration
files exist. These internal defaults should not be relied upon, except in development testing, as they are usually
unsuitable for production use.
» The internal defaults may be overridden by a properties file associated with the utility. The
threadpoolworker[.sh] utility uses the file $SPLEBASE/etc/threadpoolworker.properties. The
submitjob[.sh] utility uses the file $SPLEBASE/etc/submitbatch.properties. Default templates for
these files are provided with the product.
» For the submitjob[.sh] utility, it is possible to create a job specific configuration file which will contain only
those characteristics unique to a particular background process. This configuration file is usually named
<batchcode>.properties or <batchcode>.properties.xml. The xml version is primarily provided for
character sets other than Western European 4.
» The threadpoolworker[.sh] and submitjob[.sh] support command line options to override previous
configuration settings.
The figure below illustrates the hierarchies:

4 Western European character set includes USA, Canada, Australia, Europe (except Eastern Europe) and New Zealand.



threadpoolworker hierarchy: internal defaults, overridden by threadpoolworker.properties, overridden by
command line options.

submitjob hierarchy: internal defaults, overridden by submitbatch.properties, overridden by job specific
properties (if they exist), overridden by command line options.

Figure 1 – Configuration hierarchy
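
As a simple illustration of the submitjob hierarchy, consider the execution mode setting. The sketch below assumes
the com.splwg.grid.executionMode property (referenced later in this document) is the setting involved, and MYJOB is a
hypothetical batch code; confirm the exact settings for your version in the Server Administration Guide.

# submitbatch.properties - environment-wide runtime default
com.splwg.grid.executionMode=DISTRIBUTED
# MYJOB.properties - hypothetical job-specific override for batch code MYJOB
com.splwg.grid.executionMode=THIN
# command line - overrides both of the above for this execution only
submitjob.sh -e CLUSTERED

In this sketch the MYJOB background process would normally run in THIN mode, but the command line option forces
CLUSTERED mode for that particular execution (the job's other command line options are supplied as usual for your
site).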

Execution Modes
Note: The CLUSTERED execution mode applies to Oracle Utilities Application Framework V2.2 SP7 and above only.

Note: A new mode has been introduced called EXTENDED mode that is available for Oracle Utilities Application
Framework V4.1 and above only after applying patch 1173516.

The execution method is specified in both the threadpoolworker.properties file and the
submitbatch.properties file as a runtime default and is established at configuration time with the
configureEnv.sh utility. This can be overridden at runtime using either submitjob[.sh] or
threadpoolworker[.sh] with the -e option.
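
For example, a threadpool can be started with the mode established by configureEnv.sh, or the mode can be forced
for a single instance. This is a minimal sketch only; any additional threadpool options used at your site are
omitted.

threadpoolworker.sh                (execution mode taken from threadpoolworker.properties)
threadpoolworker.sh -e CLUSTERED   (execution mode forced to CLUSTERED for this instance)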

The THIN execution mode executes a single thread of a single job in a single JVM. This is primarily designed to be
used by developers to test their batch processes during initial development and testing activities. It can be used for
other uses in an implementation (testing etc) but is seen as inefficient compared to other methods due to the fact
that a JVM is required per thread per job.

The DISTRIBUTED execution mode (also known as Classic mode) 5, allows numerous threads from numerous jobs
to be executed by one or more JVMs known as threadpools. A threadpool is made up of worker nodes that process
work as instructed from submitter nodes.

Each worker node offers n number of threads to the grid in a specific thread pool (where n is the number of
concurrent threads the worker can run), and creates and takes out a lease on a THREAD_OFFER entry on the
F1_TSPACE_ENTRY database table to register and stay alive as a participant in the grid. Similarly, a submitter node
leases a WORK_OWNER entry on the table to register its participation. By default, these leases are renewed every 20
seconds by updating a lease expiry timestamp on the respective row. There is therefore some overhead involved,
and it requires adequate database response times to function properly.

5 The DISTRIBUTED mode is the default mode from Oracle Utilities Application Framework 2.0, 2.1 and 2.2.



The workers and submitters poll for GRID_WORK and WORK_ENDED entries respectively by regularly scanning the
F1_TSPACE_ENTRY table on the database. The default interval setting for this is every 1-second, so if there are a
lot of active nodes in the grid, it can amount to a significant level of database activity (even though just the header
information is selected - not the CLOB column on the table).

A submitter node inserts a GRID_WORK entry into the table to notify the grid that there's work to be done. This entry
contains the batch code and batch parameters for the job, which enables the worker that picks up the request to
execute it. The worker takes out a lease on this GRID_WORK entry to signal its intent to process the work, and
continues to initiate the individual threads for the job. Once the threads have ended, the worker inserts a
WORK_ENDED entry into the table, which notifies the submitter that its work has been processed.

Note: This is a highly simplified overview of the process. In reality, the worker itself also creates GRID_WORK and
WORK_ENDED as well as PER_STATE entries for the individual threads when those are initiated. There is also a
HousekeepingDaemon process that has its own GRID_WORK entry and which periodically monitors the
F1_TSPACE_ENTRY for expired leases, etc., as well as an optional SchedulerDaemon which looks for submissions
from the online system.

The figure below summarizes the DISTRIBUTED processing mode:

[Figure: submitter nodes (submitter=true, batchCode=JOB1/JOB2, threadPool=DEFAULT) and threadpoolworker nodes
(submitter=false, threadPool=DEFAULT) exchanging THREAD_OFFER, WORK_OWNER, GRID_WORK and WORK_ENDED entries via
the F1_TSPACE_ENTRY table]

Figure 2 – DISTRIBUTED mode

For publishing purposes, the following facilities are not depicted in the above figure:

» The PER_STATE rows that contain the ThreadWorkUnit entries (i.e. the data) for each submitted thread.
» The HousekeepingDaemon. This singleton periodically monitors the F1_TSPACE_ENTRY table for expired
entries and deletes them. It also has a GRID_WORK entry on the F1_TSPACE_ENTRY table.
» The SchedulerDaemon. This singleton looks for submissions from the online system by polling the
CI_BATCH_JOB table used for online submission. If found, it submits the job to the grid. It also has a
GRID_WORK entry on the F1_TSPACE_ENTRY table.
The figure below illustrates the flow of data in DISTRIBUTED execution mode:



[Figure: DISTRIBUTED mode data flow between the SubmitBatch submitter node and the ThreadPoolWorker worker node
(StandaloneExecuter, DistributedGridNode, SpaceManager/SpaceJDBC, SpaceChangePoller, ThreadPool,
ThreadOfferManager, JobExecuterWork, BatchJobExecuter, DistributedJobExecuter and the JavaJob/CobolJob
Executers) via GRID_WORK and PER_STATE entries]

Figure 3 – DISTRIBUTED process mode

In DISTRIBUTED mode, the submitter node serializes the appropriate GridWork entry to the F1_TSPACE_ENTRY
table, and then waits for the job completion WORK_ENDED entry.

The worker performs the job portion of the execution by calling either JavaJobExecuter or CobolJobExecuter,
in which the initialization of the job takes place. In the case of JavaJobExecuter, the application’s getJobWork
method is invoked and the thread work units that it collects are divided into the thread chunks and stored in
F1_TSPACE_ENTRY PER_STATE rows. The worker then submits the threads by serializing the thread GridWork
entries to F1_TSPACE_ENTRY GRID_WORK rows so that a similar path is followed to execute the individual threads.

Issues with DISTRIBUTED mode


The DISTRIBUTED mode is currently the default mode and is in production at a number of sites, where it works well
for the most part, but there are a few limitations with this mode:

If a worker node drops off unexpectedly, as happens if a program crashes, the submitter node is not made aware of
it immediately. To the user it looks like the job is still in progress. When the worker is restarted, the submitter does
get notified at that point, which is not ideal.

Note: In this case the worker node logs the frequently misunderstood message, "Maximum number of grid work
failures was reached (1)".

If a submitter node drops off, the worker nodes are not aware of that and will continue to process the job to the end.
This is a lesser problem, but is also not ideal.

A worker or submitter node's health is highly dependent on good, consistent database response. The database, by
nature, is volatile and response times can vary, which causes intermittent problems that are difficult to
troubleshoot. The most common error indicates that a lease cannot be renewed; the Oracle Utilities Application
Framework can and does self-correct most of the time, but this very often leads to further problems.



The last execution mode, CLUSTERED 6, provides the same facilities as DISTRIBUTED mode but eliminates the
need for F1_TSPACE_ENTRY and uses the Named Cache and Work Manager facilities in Oracle Coherence to
address these issues.

Oracle Coherence provides a grid-enabled implementation of the IBM and BEA CommonJ Work Manager, which is
the basis for JSR-237. Using a Work Manager, the product can submit a collection of work (a job or set of threads)
that needs to be executed. The Work Manager distributes that work in such a way that it is executed in parallel,
typically across the grid. In other words, if there are ten work items submitted and ten servers in the grid, then each
server will likely process one work item. Further, the distribution of work items across the grid can be tailored, so that
certain servers (e.g. one that acts as a gateway to a particular mainframe service) will be the first choice to run
certain work items, for sake of efficiency and locality of data. The application can then wait for the work to be
completed, and can provide a timeout for how long it is willing to wait.

When a worker starts, the initialization is similar to DISTRIBUTED mode, except that a Coherence based cluster
node is started by creating a WorkManager for each thread in the thread pool. In Coherence these are known as
services, and they specify the number of threads that are offered in that service. This is exactly the same as the
existing concept of thread pools. The objects needed for classic DISTRIBUTED mode to manage thread pools, poll
for work, create and renew leases, and so on, are not initialized as that is redundant in a Coherence cluster.

An important difference to note here is that a CLUSTERED worker plays a far more passive role in the batch grid. A
DISTRIBUTED worker proactively polls for new work, whereas a CLUSTERED node waits for work to be handed to it
from a submitter.

A CLUSTERED submitter also goes through the same initialization as DISTRIBUTED mode and, like the worker,
starts a WorkManager instance to join the cluster. This WorkManager is created with a thread number
specification of zero, which indicates to Coherence that it is a client.

The job is then submitted by scheduling a serializable Work object. This object should be easily created from the
existing JobExecuterWork. This is a WorkManager schedule call, which serializes the object to its destination
worker node and waits for it to finish. This call will perform the "job" portion of the run by calling the standard
JavaJobExecuter or CobolJobExecuter appropriately.

Where the CLUSTERED implementation changes significantly from the DISTRIBUTED implementation is in the
submission of the threads. In DISTRIBUTED mode, a worker submits the threads, in effect becoming a submitter
(i.e. client). In contrast, the CLUSTERED implementation will make this the responsibility of the submitter node, so
that it is physically networked with its worker nodes. This will result in appropriate notifications in the event of nodes
dropping off. The figure below illustrates the concept:

6 CLUSTERED mode is only available with Oracle Utilities Application Framework 2.2 and above. For Oracle Utilities Application Framework 2.2
customers, refer to the Batch Operations and Configuration Guide for the appropriate patches to apply to enable CLUSTERED mode.



[Figure: a submitter node (submitter=true, batchCode=JOB1, threadPool=DEFAULT, threadCount=4) submitting work to
a Coherence cluster of machines running StandardExecuter worker nodes (submitter=false, threadPool=DEFAULT), with
the F1_TSPACE_ENTRY table shown alongside the cluster]

Figure 4 – CLUSTERED mode



CLUSTERED Mode internals
Note: Customers on Oracle Utilities Application Framework V2.2 based products considering the use of CLUSTERED
mode should install Oracle Utilities Application Framework Service Pack 14 to ensure all the patches required to
implement CLUSTERED mode are applied.

From the batch application’s perspective, the underlying technology is completely transparent. Whether the
execution mode is THIN, DISTRIBUTED or CLUSTERED has no effect on the application program’s logic.

The CLUSTERED mode utilizes the Oracle Coherence NamedCache feature as a way of sharing information with
members of the cluster. There are two NamedCaches, one for job submission and the other for service information.
The BatchClusterCache moderates all insertions and deletions to these caches.

The process flow for the CLUSTERED mode is illustrated in the figure below:

[Figure: CLUSTERED mode process flow. The SubmitBatch submitter node and the ThreadPoolWorker worker node each
start a StandaloneExecuter and a ClusteredNode, join the Coherence cluster for the thread pool(s), and register a
BatchWorkManager, MemberListener and MapListener with the BatchClusterCache, which orchestrates reads and writes
to the Service Information and Job Submission NamedCaches available to all members of the cluster; jobs are
submitted by inserting entries into the Job Submission cache and waiting for completion]

Figure 5 – CLUSTERED process flow

The details of the flow are outlined in the subsections below.

Worker Initialization


When a worker starts, it either joins an existing cluster or creates a new one if the cluster name with which it is
started does not exist. Refer to Member Validation below for more details.

The initialization is then similar to DISTRIBUTED mode, except that a ClusteredNode is created instead of a
DistributedGridNode. The ClusteredNode implements the MemberListener interface to listen for member
events, and the MapListener interface to listen for insertions and deletions in a NamedCache.



Note: An online application server may also host a batch worker, but it is generally only recommended for test,
development or demonstration purposes. This is done by specifying the property
com.splwg.grid.online.enabled=true in the spl.properties file. The configuration for this is usually
done via the configureEnv[.sh] utility.

During the initialization process, the ClusteredNode creates a BatchWorkManager for each thread pool
specified in the properties file or command-line argument. With each BatchWorkManager created, the ClusteredNode
is registered as the MemberListener so that it will be notified of any member events, and an entry is inserted into
the service information cache. The service information cache contains all information needed for a particular service
(thread pool), for example what each thread in that particular thread pool is currently running.

Submitter Initialization


When a submitter starts, it checks that there is a threadpool to service its work. If a cluster does not exist (i.e. no
workers have been started for the cluster) or the existing cluster does not offer threads in the desired threadpool, the
message There is no member in the cluster with that pool name is logged and the submitter is terminated with a
non-zero exit code. Refer to Member Validation below for more details.

The ClusteredNode then initiates the job submission by inserting an entry into the job submission cache. The
submitter waits for job completion by monitoring the job submission cache. The insertion of the job into the job
submission cache fires an entry-inserted event that gets processed by all ClusteredNodes that are servicing the
particular pool name specified for that job submission. A node ‘acquires’ the job by being the first to update the job
submission entry with its member id. The node executes the SubmitBatchRun session executable and schedules
each work entry using a BatchWorkManager client. Each work entry is only scheduled as a thread becomes
available.

Process
When the BatchWorkManager processes work, it updates the service info cache and the job submission cache to
specify the member id doing the processing and batchThreadId being processed. When the work is complete,
that information is removed. Once all the work is completed, the job status of the job entry in the submission cache
is changed to ended, signaling to the submitter that the job completed.

If a worker goes down, a member left event is generated and processed by all registered ClusteredNodes. The
submitter uses this event to get the pending work list of the member and, once the job is complete, the submitter
uses this pending work list to update the thread status to error and to report that a worker unexpectedly went down.

If a submitter goes down, a member left event is also generated for all registered ClusteredNodes. All the nodes
processing any work for that particular submitter immediately cancel the corresponding threads for the submitter and
update each thread’s status to error. The submitter itself then terminates with a non-zero exit code.

Member Validation
An important operation is to validate that any node that joins a grid is supposed to join that particular grid, and not
some other. This is critical to ensure that a submitter node is joined to its intended cluster before submitting a job to
run against that environment. It would be disastrous if, for example, an archiving job meant to run against a test
system were inadvertently submitted to the production system.

To prevent this:

» When joining a cluster, a basic handshake protocol is used to validate that the new member is connected to the
same database as the other members in the cluster. The new member inserts a unique MEMBER_VALID entry in
the F1_TSPACE_ENTRY table and then waits for confirmation from an existing member that it saw that same
entry.



» The cluster name (tangosol.coherence.cluster) is a mandatory property. By specifying a unique name for
each separate environment, accidental discovery between environments is avoided.

Note: This property is optional for Oracle Coherence, and is therefore not truly mandatory in that no error will be
reported if it is omitted, but the validation described above ensures against arbitrary unions in a cluster. A unique
value for it must be specified for multiple environments to separate the clusters and to avoid cluster validation errors.
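
A minimal sketch of this recommendation, assuming two environments on the same network and a release that still
holds these settings in the properties files (V4.1 and above use tangosol-coherence-override.xml); the cluster
names are hypothetical.

# threadpoolworker.properties / submitbatch.properties in the development environment
tangosol.coherence.cluster=OUAF_DEV01_BATCH

# threadpoolworker.properties / submitbatch.properties in the production environment
tangosol.coherence.cluster=OUAF_PROD_BATCH

With unique names, nodes from different environments cannot accidentally discover and join each other's clusters.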

Scheduler Daemon
The scheduler daemon is what enables online job submissions to be processed. It polls the CI_BATCH_JOB table
for Pending entries and submits them to the batch cluster by inserting appropriate entries into the job submission
cache and then polling for completion. This process is exactly as described above for job submitters, however a
scheduler daemon also updates the status to Ended on the CI_BATCH_JOB table when the job (i.e. last thread)
has finished.

Note: This does not indicate success or failure, but merely whether the threads for the job have ended. The job and
thread statuses are held on the CI_BATCH_RUN and CI_BATCH_THD (Batch Run Tree) tables.

The scheduler daemon is a singleton process (i.e. exactly one of them running on the cluster at any one point) with
appropriate failover in the event of the hosting worker dropping off. If this happens, another worker that is daemon-
enabled will become the host. It is therefore advisable to set the property
com.splwg.batch.scheduler.daemon to true (or use the -d Y command-line option) to enable this failover
capability.

An online application server may also host the scheduler daemon, even if it does not host a batch worker in the
cluster. In other words, these properties, in the spl.properties file for the online application server are perfectly
acceptable:

com.splwg.grid.online.enabled=false
com.splwg.batch.scheduler.daemon=true
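
To enable the failover capability described above on a batch worker, the same daemon property can be set for the
threadpoolworker, or the command line option mentioned earlier can be used at startup. This is a sketch only; other
startup options are unchanged.

# threadpoolworker.properties - allow this worker to host the scheduler daemon
com.splwg.batch.scheduler.daemon=true

# alternatively, enable it for a single threadpoolworker instance at startup
threadpoolworker.sh -d Y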

Socket issues in CLUSTERED mode


To help minimize packet loss, the operating system socket buffers need to be large enough to handle the incoming
network traffic while your Java application is paused during garbage collection. By default Oracle Coherence will
attempt to allocate a socket buffer of 2MB. If your operating system is not configured to allow for large buffers,
Oracle Coherence will utilize smaller buffers. Most versions of Unix use a very low default buffer limit, which
should be increased to at least 2MB.

Starting with Oracle Coherence 3.1 you will receive the following warning if the operating system fails to allocate
the full-size buffer:

UnicastUdpSocket failed to set receive buffer size to 1428 packets
(2096304 bytes); actual size is 89 packets (131071 bytes). Consult your
OS documentation regarding increasing the maximum socket buffer size.
Proceeding with the actual value may cause sub-optimal performance.

Though it is safe to operate with the smaller buffers, it is recommended that you configure your OS to allow for
larger buffers.



TABLE 1 – RECOMMENDED OS SETTINGS FOR NETWORKING

Operating System    Configuration Settings

Linux               sysctl -w net.core.rmem_max=2096304
                    sysctl -w net.core.wmem_max=2096304

Solaris             ndd -set /dev/udp udp_max_buf 2096304

AIX                 no -o rfc1323=1
                    no -o sb_max=4194304

Windows             Windows does not impose a buffer size restriction by default.

Note: AIX only supports specifying buffer sizes of 1MB, 4MB, and 8MB. Additionally there is an issue with IBM's
1.4.2 and 1.5 JVMs which may prevent them from allocating socket buffers larger than 64K. This issue has been
addressed in IBM's 1.4.2 SR7 SDK and 1.5 SR3 SDK.
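
On Linux, the settings in Table 1 can be checked and applied as shown below (run as root). Making them persistent
across reboots via /etc/sysctl.conf is an assumption about your site standards and should be agreed with your
system administrators.

# check the current socket buffer limits
sysctl net.core.rmem_max net.core.wmem_max

# apply the recommended values to the running system
sysctl -w net.core.rmem_max=2096304
sysctl -w net.core.wmem_max=2096304

# optionally persist them across reboots by adding these lines to /etc/sysctl.conf
# net.core.rmem_max=2096304
# net.core.wmem_max=2096304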

Clustering using Unicast or Multicast


Note: This facility applies to Oracle Utilities Application Framework V2.2 SP7 and above only.

Note: Network protocols are more effective when the cluster is across more than one machine. On a single machine
there is little difference between the network protocols.

One of the configuration questions that need to be considered with the CLUSTERED execution mode is whether you
will use multicast or unicast for the threadpools. By default, a multicast based configuration is supplied in the
configuration files shipped with the installed product.

When deciding whether to use multicast or unicast the following advantages and disadvantages should be
considered:

TABLE 2 – CLUSTERING MODE CONSIDERATIONS

Multicast (default)
  Advantages:
  » Only have to submit to one active node in the cluster.
  » Threadpools can be clustered and the Work Manager can load balance across them.
  » Threadpools communicate across the cluster.
  » Clusters can be shared or dedicated.
  » Cluster nodes can be added dynamically for load fluctuations.
  Disadvantages:
  » Network traffic between clusters is via a multicast address.
  » Some sites do not like the multicast protocol.

Unicast
  Advantages:
  » Can submit to specific nodes (micro management).
  » Clusters can be shared or dedicated.
  » Minimal interaction between nodes.
  Disadvantages:
  » Each node has to be defined to other nodes (no dynamic node support).
  » Increased configuration requirements (Well Known Address support).
  » Nodes should be on different machines (current limitation only).

When configuring multicast or unicast a number of settings in the threadpoolworker.properties and
submitbatch.properties should be considered:

TABLE 3 – CLUSTER CONFIGURATION SETTINGS

Clustering mode     Configuration Settings

Multicast           tangosol.coherence.clusteraddress
                    tangosol.coherence.clusterport

Unicast             tangosol.coherence.localhost
                    tangosol.coherence.localport
                    tangosol.coherence.wka
                    tangosol.coherence.wka.port

Please refer to the Batch Operations And Configuration Guide, Batch Server Administration Guide or Server
Administration Guide for your product for details of these settings.
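
The excerpts below sketch the two styles for releases that hold these settings in threadpoolworker.properties and
submitbatch.properties. The multicast address, port numbers and host names are placeholders only and must be
replaced with values agreed for your site.

# Multicast - all nodes share a cluster address and port
tangosol.coherence.clusteraddress=231.1.2.3
tangosol.coherence.clusterport=42000

# Unicast - each node declares itself and the Well Known Address list
tangosol.coherence.localhost=batchhost01
tangosol.coherence.localport=42010
tangosol.coherence.wka=batchhost01
tangosol.coherence.wka.port=42010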

For more information about multi-cast and uni-cast see the following sites:

» Discussion of protocols - http://docs.oracle.com/cd/E24290_01/coh.371/e22840/networkprotocols.htm


» Configuration of the Cluster - http://docs.oracle.com/cd/E18686_01/coh.37/e18677/cluster_setup.htm
Note: In Oracle Utilities Application Framework V4.1 and above, the tangosol parameters have been
moved to tangosol-coherence-override.xml. Refer to the Batch Server Administration Guide provided with
your product for more information.

Migrating from DISTRIBUTED to CLUSTERED mode


Note: This facility applies to Oracle Utilities Application Framework V2.2 SP7 and above.

With the advent of the CLUSTERED execution mode, existing customers of the Oracle Utilities Application
Framework using the DISTRIBUTED execution mode can migrate to the CLUSTERED execution mode. To migrate to
CLUSTERED execution mode the following must be performed:

» Execute the configureEnv[.sh] utility to alter the configuration files.


» Specify the following configuration values:

TABLE 4 – MIGRATION CONFIGURATION SETTINGS

Configuration Setting       Comments

Batch RMI Port              Default JMX port for the threadpool. Must be unique per environment.

Batch Mode                  Change to CLUSTERED.

Coherence Cluster Name      Name for the cluster.

Coherence Cluster Address   Multicast address for use by this cluster.

Coherence Cluster Port      Port number used for the multicast address.

Coherence Mode              Mode of execution of Coherence.

Refer to the Batch Operations And Configuration Guide, Batch Server Administration Guide or Server Administration
Guide with your product for suggested values for these parameters.

» Execute the initialSetup[.sh] utility to reflect the changes in the product. This may require additional steps
to implement the change for selected platforms. Refer to the Batch Operations And Configuration Guide, Batch
Server Administration Guide or Server Administration Guide with your product for additional advice.
» Very few changes have been made to threadpoolworker.properties and submitbatch.properties.
The tangosol.coherence.* settings should be set in these files.



Note: In Oracle Utilities Application Framework V4.1 and above, the tangosol parameters have been
moved to tangosol-coherence-override.xml. Refer to the Batch Server Administration Guide or Server
Administration Guide provided with your product for more information.

» Remove any custom com.splwg.grid.executionMode settings from any job specific configuration files (if
used); see the example after this list.
» If your site wishes to use unicast rather than multicast then alter the threadpoolworker.properties and
submitbatch.properties files manually as outlined in the Batch Operations And Configuration Guide, Batch
Server Administration Guide or Server Administration Guide provided for your product version.
» You are now migrated from DISTRIBUTED to CLUSTERED execution mode.
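
To verify the step above on removing custom com.splwg.grid.executionMode entries, a simple search can be run. The
location of the job specific configuration files ($SPLEBASE/etc in this sketch) varies by site and is an assumption;
adjust the path to wherever your site keeps them.

# list any job-specific configuration files that still force an execution mode
grep -l "com.splwg.grid.executionMode" $SPLEBASE/etc/*.properties

Any files reported should have the setting removed before relying on the CLUSTERED defaults.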

CLUSTERED Mode Operations


One of the advantages of using CLUSTERED mode is the simplification of operations for the threadpoolworker and
the submitter nodes. The following operations are applicable to CLUSTERED mode:

» If the threadpoolworker JVM is killed or crashes, each related submitter node that was running work in that
threadpoolworker is immediately terminated with a non-zero return code and the relevant batch run tree entries
are set to Error status. In DISTRIBUTED mode, a submitter node waiting for work to finish is not made aware of
an event such as a kill or JVM crash until the relevant threadpoolworker is restarted.
» The submitter node can be gracefully ended by killing the submitter node process on the operating system (see
the example after this list). The affected threads are systematically cancelled and the relevant batch run tree
entry statuses are updated with appropriate messages. This is not possible in DISTRIBUTED mode. JMX can be used in
DISTRIBUTED mode, as it also can in CLUSTERED mode, but to cancel an entire job requires using the JMX calls to
find the appropriate threadpoolworker(s) where the threads are running and cancelling each submitter thread
individually.
» Database access is minimized. In DISTRIBUTED mode the entire grid is controlled through a single database
table (F1_TSPACE_ENTRY). This table continually gets polled by listeners for newly submitted work and work that
ended, as well as lease renewal agents that manage the leases for the active nodes. This polling does result in a
significant number of additional database calls and grows as more nodes join the grid. In contrast, CLUSTERED
mode uses shared cache for clustering, which in turn controls membership, so the F1_TSPACE_ENTRY table is
only used in a very minimal capacity.
» Lease renewal issues are removed. In DISTRIBUTED mode the lease renewals relied upon good database
response. If the database experienced high demand these renewals would error, which could lead to incomplete
threads. In CLUSTERED mode the lease renewals are cache based and do not depend on the database.
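
A sketch of gracefully ending a submitter from the operating system, as referenced in the list above. How you
identify the submitter process varies by platform and by how the job was launched, so the search string and the
process id below are illustrative only.

# locate the submitter JVM for the job (the search string depends on how the job was started)
ps -ef | grep submitjob

# end it gracefully; the affected threads are cancelled and the batch run tree is updated
kill <pid>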



Testing for multi-cast
If your site is considering using the multicast option for CLUSTERED mode then one of the recommendations is that
you verify your environment is enabled for multicast. Oracle Coherence supplies a test utility that can be used to test
whether your machine is set up for multicast. Refer to Performing a Multicast Connectivity Test for more information
about testing for multicast.
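
The Coherence multicast test is run from the command line on each server that will participate in the cluster, at
the same time, so the servers can see each other's packets. The classpath below is a placeholder; point it at the
coherence.jar shipped with your product installation and consult the Coherence documentation for the full option
list.

java -cp <path to coherence.jar> com.tangosol.net.MulticastTest

Each node should report the packets published by the others; if they do not, multicast is not available between
those servers and unicast should be considered.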

Demonstration Single machine setup


If you are interested in demonstrating the CLUSTERED mode, or want to avoid supplying the cluster address and
cluster port, it is possible to limit the cluster to a single machine/environment by using the following settings:

TABLE 5 – DEMONSTRATION SETTINGS

Configuration Setting               Comments

tangosol.coherence.ttl              Set to 0

tangosol.coherence.localhost        Set to 127.0.0.1

tangosol.coherence.clusteraddress   Remove parameter as this is defaulted

tangosol.coherence.clusterport      Remove parameter as this is defaulted

Note: These settings only apply to a single copy of the product on a single machine. They are suggested for
demonstration or training purposes only and are not recommended for production.
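
A minimal threadpoolworker.properties/submitbatch.properties sketch for such a single-machine demonstration,
assuming a pre-V4.1 properties based configuration; the cluster name shown is a hypothetical value and, as noted
earlier, should still be unique.

tangosol.coherence.ttl=0
tangosol.coherence.localhost=127.0.0.1
tangosol.coherence.cluster=DEMO_CLUSTER
# tangosol.coherence.clusteraddress and tangosol.coherence.clusterport are removed so the defaults apply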

Timed Batch
Note: This facility applies to Oracle Utilities Application Framework V4.0 and above only.

One of the features of the Oracle Utilities Application Framework is the ability to support continuous or timed batch
processes. For example, there are monitor processes built into the product. These monitor processes track status
and business rules for specific objects and then process the data according to the status and object configuration.
This is typically a state transition where an object is moved from one state to another via this monitor process
according to the object specification. This monitor process will probably be more effective in this situation if it runs
continuously.

The facility consists of a Batch Control Type of Timed and a series of attributes to indicate the attributes of the batch
process at execution time. The figure below illustrates the additional attributes available when Batch Control Type is
set to Timed:

Figure 6 – Timed Batch attributes

The terms continuous and timed use the same facilities within the batch framework but have different processing
flows:



» Continuous Batch – This is a style of writing a batch process. The batch logic executes in a repeated timed loop,
with configurable delay interval and an (optional) maximum duration. The timings are specified as batch
parameters, with millisecond resolution. To the outside world this just looks like a potentially very long-lived batch
run. For this mode the batch control indicates the job is controlled under a timer control.
» Timed Batch – This type of batch control allows a job to be executed every timer interval. The timer creates
BatchJobQueue entries and the batch infrastructure creates the actual BatchRun instances to execute the
batch process. The site configures the desired interval between starts (with resolution given in seconds). The
system schedules new runs at each interval if the last instance of the job has completed. If you turn the timer off
any running job finishes normally, but a new one won't be auto-scheduled.
» Timed and continuous jobs are controlled by the default threadpool and not by a scheduler. When the DEFAULT
threadpoolworker starts, it will start executing any job with Batch Control Type set to Timed and the Timer
Active set to Yes. This happens whether the batch daemon or batch server is enabled or not.

Note: If you intend to run Timed or Continuous Batch for your any of your monitor jobs then it is recommended that
an instance of the DEFAULT threadpool be made available to execute the timed or continuous jobs.

» To stop a timed or continuous running batch job it is recommended to change the Timer Active value on the
Batch Control to No. The job will not restart at the next timer interval.
» As continuous or timed batches are run automatically then additional information must be provided on the batch
control:
» User – The user to use for execution of the batch process. This user must have security access to the
necessary objects accessed by the batch process.
» Batch Language – The language for any messages.
» Email Address (optional) – The email address to send the output on completion of execution.
» Thread Count – Number of threads to allocate to the job.
» Override Number Records to Commit (optional) – Commit interval for the job. This applies to the complete
execution of the job regardless of the time of day it is submitted.

CLUSTERED Mode recommendations


Note: This new facility applies to Oracle Utilities Application Framework V4.0 and above only

One of the new features of the CLUSTERED approach is the ability to optimize Coherence for the batch activities you
are performing on the product environment. A configuration setting, tangosol.coherence.mode, was added in
Oracle Utilities Application Framework V4.0.2 and above to control the mode in which Coherence operates. It is set
in the submitbatch.properties and threadpoolworker.properties files and has valid values of prod for
Production and dev for Development. These modes do not limit access to features, but instead alter some default
configuration settings. For instance, development mode allows for faster cluster startup to ease the development
process.
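
For example, a non-production environment could run Coherence in development mode with the following entry in
both files, switching the value to prod for production use:

tangosol.coherence.mode=dev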

The Oracle Coherence mode setting should be set using the following guidelines:

» During non-production activities not involving a cluster, the mode should be set to dev. It is recommended to use
the development mode for all pre-production activities, such as development and testing. This is an important
safety feature, because Coherence automatically prevents these nodes from joining a production cluster.
» Ensure that the CLUSTERED settings for each environment are unique across all servers in a network. If you
are sharing multiple batch servers as a single virtual environment then they can share CLUSTERED settings, but
each server will need to be uniquely identified in the configuration.
» Record the use of the ports used for CLUSTERED mode according to your site standards. For example, if your site
requires that all ports are listed in the /etc/services file then this file should be updated with the port numbers
used.



For production and environments where clustering is required (for cluster testing) the mode should be set to prod.
The production mode must be explicitly specified when using Coherence in a production environment.

Note: In Oracle Utilities Application Framework V4.1 and above, the tangosol parameters have been
moved to tangosol-coherence-override.xml. Refer to the Batch Server Administration Guide or Server
Administration Guide provided with your product for more information.

Threadpools and Database Recycling


Note: This advice applies to c3p0 only and therefore only applies to Oracle Utilities Application Framework V2.2.

If at any stage the database is shutdown and restarted (i.e. recycled) any active threadpools will become zombies.
This means the threadpools will not process any work and seem to be looping. This can be addressed by changing
the tolerances for the database connection to reconnect successfully after recycling.

In the spl.properties for the batch component the following parameters need to be changed to address
connectivity for database recycles:

» hibernate.c3p0.idle_test_period = 10 (a.k.a. c3p0.idleConnectionTestPeriod) – c3p0 will test all idle,
pooled but unchecked-out connections every 10 seconds. This test is done asynchronously.
» hibernate.c3p0.acquire_increment = 2 (a.k.a. c3p0.acquireIncrement) – Determines how many Connections a
c3p0 pool will attempt to acquire when the pool has run out of Connections. (Regardless of acquireIncrement,
the pool will never allow maxPoolSize to be exceeded.)
» hibernate.c3p0.timeout = 0 (a.k.a. c3p0.maxIdleTime) – This is optional. This parameter sets the number of
seconds a connection can remain pooled but unused before being discarded. Zero means idle connections never
expire. This is suggested for customers who are using Oracle Audit Vault or Oracle auditing facilities to remove
contention on audit tables when connections are idle.
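
Assuming the default values shown above suit your site, the corresponding batch spl.properties entries would
resemble the following sketch (the values are illustrative and should be adjusted to your site standards):

hibernate.c3p0.idle_test_period=10
hibernate.c3p0.acquire_increment=2
hibernate.c3p0.timeout=0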

EXTENDED Mode
Note: This mode is only supported for specific situations and is only available for Oracle Utilities Application
Framework V2.2 and above via Patch 1173571 (FW4.1) and Patch 11683404 (FW2.2).

Note: This mode is not recommended unless directed by Oracle Support.

Note: Refer to the Coherence*Extend documentation for a description of this mode.

In most cases the CLUSTERED mode is sufficient for background processing needs, but if you require a large
number of submitters then you might want to consider a mode called EXTENDED. This mode utilizes the
Coherence*Extend features to provide an alternative way for threadpoolworkers and submitters to be set up.

Note: EXTENDED mode should only be used if there are a large number of submitters used and the CLUSTERED
mode is reporting communication delays. Communication delays can occur when large numbers of submitters
complete.

The EXTENDED mode differs from CLUSTERED in a number of key ways:



» Submitters using EXTENDED mode do not participate in any cluster.
» EXTENDED mode has configuration overhead in that each threadpoolworker must be explicitly configured to open a
port to listen for submitters.
» Each submitter is tied to its configured threadpoolworker (though multiple threadpoolworkers can be configured as
failovers if necessary).

Note: The same threadpoolworker can be used for both CLUSTERED and EXTENDED mode if desired.

The idea is to define threadpoolworkers in your configuration as proxies and then define the submitters to point to
those proxies to implement the mode. The proxies are defined in the coherence-cache-config.xml
configuration files, and the local submitbatch.properties files on each node are then augmented (using an
external configuration file) to point to the proxy servers defined. The examples below illustrate a single proxy but it is
possible to define multiple proxies in the submitbatch.properties to provide failover and balancing across proxies.

To use EXTENDED mode the following must be performed:

» Change the following entries in the submitbatch.properties in the


$SPLEBASE/splapp/standalone/config subdirectory (or %SPLEBASE%\splapp\standalone\config
on Windows):
» Change the com.splwg.grid.executionMode to EXTENDED. For example:
com.splwg.grid.executionMode=EXTENDED
» Set the location of configuration file used for the mode using the tangosol.coherence.cacheconfig
configuration setting. The setting is in the format:
tangosol.coherence.cacheconfig=<fullpath>
where

<fullpath> Full path to configuration file on local machine including configuration file name (usually
extend-client-config.xml)

» Edit the $SPLEBASE/splapp/standalone/config/coherence-cache-config.xml (or


%SPLEBASE%\splapp\standalone\config\coherence-cache-config.xml on Windows) file to add the
proxy scheme section. Substitute x.x.x.x for the local host IP or hostname and yyyy for the port number to listen
upon. For example:
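
A minimal sketch of such a proxy-scheme section, added within the caching-schemes element, might look like the
following; the service name is illustrative only and the exact structure should be verified against the Coherence
documentation for your release:

<proxy-scheme>
  <service-name>ExtendTcpProxyService</service-name>
  <acceptor-config>
    <tcp-acceptor>
      <local-address>
        <address>x.x.x.x</address>
        <port>yyyy</port>
      </local-address>
    </tcp-acceptor>
  </acceptor-config>
  <autostart>true</autostart>
</proxy-scheme>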



Create a cache configuration file extend-client-config.xml to house the configuration settings. The main
change is to implement a Proxy Server replacing the x.x.x.x with the hostname and yyyy with the port number
specified in the coherence-cache-config.xml file. For example:
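
A minimal sketch of such an extend client configuration, assuming the standard Coherence*Extend
remote-cache-scheme elements with illustrative scheme and service names, might look like the following:

<cache-config>
  <caching-scheme-mapping>
    <cache-mapping>
      <cache-name>*</cache-name>
      <scheme-name>extend-dist</scheme-name>
    </cache-mapping>
  </caching-scheme-mapping>
  <caching-schemes>
    <remote-cache-scheme>
      <scheme-name>extend-dist</scheme-name>
      <service-name>ExtendTcpCacheService</service-name>
      <initiator-config>
        <tcp-initiator>
          <remote-addresses>
            <socket-address>
              <address>x.x.x.x</address>
              <port>yyyy</port>
            </socket-address>
          </remote-addresses>
        </tcp-initiator>
      </initiator-config>
    </remote-cache-scheme>
  </caching-schemes>
</cache-config>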



Note: For a full description of the configuration file see Cache Configuration Elements

If the com.splwg.batch.cluster.pollTimeInSeconds is set in both submitbatch.properties and
threadpoolworker.properties, then ensure it is set to an appropriate value for your site. The value must be the
same in both files to prevent miscommunication. The default value is 10 seconds if no setting is specified.
The internal housekeeping daemon uses this value (2x its value, to be precise) and the client will poll using this
interval.
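
For example, to make the default explicit, the same entry would appear in both submitbatch.properties and
threadpoolworker.properties:

com.splwg.batch.cluster.pollTimeInSeconds=10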

Note: This change alters the way that batch programs initialize context as well as ensuring SQL statements are closed
at appropriate times. If this causes issues for some custom programs then it is recommended to add
com.splwg.submitbatch.useOldExitCodeHandling=true to the submitbatch.properties file. This
will enforce the older exit code handling within the execution of the program.

Note: When executing in EXTENDED mode a message outlining that the cache configuration file has been loaded will
appear in the threadpoolworker log files.

Threading Overview
One of the major features of the batch framework is the ability to support multi-threading. The multi-threading
support allows a site to increase throughput on an individual batch job by splitting the total workload across multiple
individual threads. This means each thread has fine level control over a segment of the total data volume at any
time.

The idea behind the threading is based upon the notion that "many hands make light work". Each thread takes a
segment of data in parallel and operates on that smaller set. The object identifier allocation algorithm built into the
product randomly assigns keys to help ensure an even distribution of the numbers of records across the threads and
to minimize resource and lock contention.

The best way to visualize the concept of threading is to use a pie analogy. Imagine the total workset for a batch job
is a pie. If you split that pie into equal sized segments, each segment would represent an individual thread.

The concept of threading has advantages and disadvantages:


» Smaller elapsed runtimes - Jobs that are multi-threaded finish earlier than jobs that are single threaded, because
each thread has a smaller amount of work to do.

Note: The elapsed runtime of the threads is rarely proportional to the number of threads executed. Even though
contention is minimized, some contention does exist for resources which can adversely affect runtime.

» Threads can be managed individually – Each thread can be started individually and can also be restarted
individually in case of failure. If you need to rerun thread X then that is the only thread that needs to be
resubmitted.
» Threading can be somewhat dynamic – The number of threads that are run on any instance can be varied, as
the thread number and thread limit are parameters passed to the job at runtime. They can also be configured
using the configuration files outlined in this document and the relevant manuals (see the example after this list).

Note: Threading is not dynamic after the job has been submitted.

» Failure due to data issues with threading is reduced – As mentioned earlier individual threads can be
restarted in case of failure. This limits the risk to the total job if there is a data issue with a particular thread or a
group of threads.
» Number of threads is not infinite – As with any resource there is a theoretical limit. While the thread limit can be
up to 1000 threads, the number of threads you can physically execute will be limited by the CPU and IO
resources available to the job at execution time.
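
For example, the thread limit for a job can be supplied through the submitter configuration. The sketch below
assumes the standard submitter threadCount property documented in the Server Administration Guide, with an
illustrative value:

com.splwg.batch.submitter.threadCount=8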



Theoretically, with the object identifiers evenly spread across the threads, the elapsed runtimes for the threads should
all be the same. In other words, when executing in multiple threads, all the threads should theoretically finish at the
same time. Whilst this is possible, it is also possible that individual threads may take longer than other threads for
the following reasons:
» Workloads within the threads are not always the same - Whilst each thread is operating on roughly the
same number of objects, the amount of processing for each object is not always the same. For example, an
account may have a more complex rate which requires more processing or a meter has a complex amount of
configuration to process. If a thread has a higher proportion of objects with complex processing it will take longer
than a thread with simple processing. The amount of processing is dependent on the configuration of the
individual data for the job.
» Data may be skewed – Even though the object identifier generation algorithm attempts to spread the object
identifiers across threads there are some jobs that use additional factors to select records for processing. If any of
those factors exhibit any data skew then certain threads may finish later. For example, if more accounts are
allocated to a particular part of a schedule then threads in that schedule may finish later than other threads
executed.
Threading is important to the success of individual jobs. For more guidelines and techniques for optimizing threading
refer to Multi-Threading Guidelines.

Operational Best Practices


The following section outlines common operational best practices based upon customer experience and internal
studies.

Parameter Guidelines
Given the flexibility of the overriding configuration settings, there are a number of permutations and combinations of
configuration files and command line options to suit your site's needs. The following guidelines may assist in
deciding the optimal mix for your site:

» Internal defaults should not be relied upon for non-development use. They are provided for developers to unit test
their code for various testing techniques.
» The threadpoolworker.properties and submitbatch.properties files should represent your sites
global parameter settings. See Commonly Used Configuration Settings for guidelines of what commonly is set in
those files.
» For the submitjob[.sh] utility the provision of a job specific parameter file is not necessary in most cases. The
following are the only exceptions to this rule:
» If the background process requires additional parameters then a job specific parameter file is required.
» If any parameter on the job must be overridden on a regular basis then a job specific parameter file should be
created and only contain the parameters that are to be overridden.
» The command line options should only be used for reruns or off schedule (a.k.a. special runs). This avoids
updating the configuration files for one-off or special processing. The only exception to this rule is that the
Business Date parameter should be specified on the command line to avoid the past midnight issue. For more
details of this see Scheduler Implementation Guidelines.
» While it is possible to override most parameters on the command line, it is not desirable to do so as this is not
efficient. Configuration files are designed to minimize the need for command line overrides unless such overrides
are applicable to the particular execution of the background process.
» Command line options are the only option to be used if threads in a background process need different
parameters. For example, if the record range is available as a parameter then the command line options must be
used to specify that record range on the command line.



It is recommended to specify the threadpool on the threadpoolworker[.sh] utility to avoid confusion and allow
greater control to start a particular threadpool. This can apply to the submitjob[.sh] utility as well as it allows the
background process to be targeted to a specific named threadpool.

Use Of Batch Control For Parameters


One source of configuration settings is the Batch Control. Typically, most sites configure their Batch Controls with
the default values for their business practices. In particular, extra parameter values and defaults are specified on
batch controls to assist in testing.

While the values are stored in the Batch Control they are not used by the Scheduler Submission Method as the
configuration files are the source of the configuration parameters. This information needs to be extracted to the
configuration files from the batch control.

Note: The parameters used by the Batch Control are stored in the CI_BATCH_CTRL_P table which has Batch Code
(BATCH_CD) and Batch Parameter Name (BATCH_PARM_NAME) as keys. The parameter value is stored in column
BATCH_PARM_VAL. It is not recommended to use BATCH_PARM_NAME='MAX-ERRORS'. Refer to Setting The Error
Tolerance for details. Remember ANY parameter must be in job specific configuration files in the format:
com.splwg.batch.submitter.softParameter.<BATCH_PARM_NAME>=<BATCH_PARM_VAL> and blank
values must not be included.
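
For example, a job specific properties file for a hypothetical extract process with soft parameters FILE-PATH and
FILE-NAME (the parameter names and values are illustrative only) would contain:

com.splwg.batch.submitter.softParameter.FILE-PATH=/spl/splapp/extract
com.splwg.batch.submitter.softParameter.FILE-NAME=extract.dat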

The reason the Batch Control is not referenced is that the Batch Control is managed by the business, whereas the
configuration files are under IT control. Typically, Change Management principles separate the responsibility for
these elements. The primary reason is to isolate the physical system from unintentional changes to parameters by
the business.

Multiple Batch Controls for a Batch Program


One of the flexible features of the batch control is the fact that a batch program can be referenced on multiple batch
controls. The batch program is a parameter of a batch control. This is useful to use in a number of situations:

» You need to run an execution of a program where you do not want to alter the existing batch control. The batch
control holds the run number for the next execution and there may be a business situation where you need to run
a one-off execution and do not want it to affect the batch number.
» You want to execute a business process a number of times with different parameters. Some customers use a
common extract format (via a common extract program) but run it multiple times with different parameters to send
to individual interface targets. For example, you may decide a common format for sending collection information
to a number of collection agencies. In this case, a common program would be written and a batch control created
per collection agency that needed the information. This would allow tracking at an individual collection agency
level and a separation of the execution of the process.
» You have a new background based business process you want to trial before you replace an existing background
process. This allows parallel execution.

Take Defaults For Parameters


Each background process has a number of standard parameters that control its behavior. In most cases the default
values are sufficient for the needs of most sites.

Note: Environmental settings such userid, file location and file names are exceptions to this guideline.

The following values should not be altered unless otherwise recommended:

» Trace Program Start



» Trace Program Exit
» Trace SQL
» Trace Standard Output.

Commonly Used Configuration Settings


Typically not all configuration settings are configured for each site as defaults are sufficient or can be set at a global
level to cover all background processes. The following guidelines should be considered when setting values:

» It is not recommended to set MAX-ERRORS at a global level. This should only be specified at an individual thread
of a background process level. Refer to Setting The Error Tolerance for more details.
» The Userid used for background processes should be specified at a global level (in the submitbatch.properties
configuration file). This userid is used to mark records processed and for security purposes.
» It is not recommended that the default SYSUSER be used as the userid specified in background processes.
SYSUSER is the initial default user for the Oracle Utilities Application Framework and is only used to input
additional users in the initial stages of configuration. Most customers create a dedicated user record (such as
BATCH) to delineate background processes from online processes.
» The promptForValues parameter should be set to false in the submitbatch.properties configuration
file. Alternatives are only used for development purposes.
» The executionMode parameter should be set to CLUSTERED in the submitbatch.properties and
threadpoolworker.properties configuration files. For customers using versions of Oracle Utilities
Application Framework not supporting the CLUSTERED execution mode, then an executionMode of
DISTRIBUTED should be used. Refer to the Batch Operations And Configuration Guide, Batch Server
Administration Guide or Server Administration Guide supplied with your product for details of this setting.
» The distThreadPool parameter in the submitbatch.properties configuration file should be set to a
common site specific default threadpool used in your implementation. Refer to Designing Your Threadpools for
more details.
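
For example, a global submitbatch.properties covering these guidelines might resemble the following sketch. The
userid and threadpool name are examples only, and the exact property names (in particular for the userid and
prompt settings) should be confirmed against the Server Administration Guide for your version:

com.splwg.batch.submitter.userId=BATCH
com.splwg.batch.submitter.promptForValues=false
com.splwg.grid.executionMode=CLUSTERED
com.splwg.batch.submitter.distThreadPool=BATCH01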

Setting The Error Tolerance


By default, there are two circumstances where a background process will not complete successfully: there is some
technical issue with the environment or program, or all records processed fail some business rule. The latter
condition may not be appropriate for your site. For example, if all but one payment fails in the payment background
processes then the process is still considered complete.

To support a more realistic tolerance, it is possible to set a limit on the number of errors tolerated before the process
should be cancelled. At the thread level, the MAX-ERRORS parameter 7 can be used to specify a thread level error
tolerance where the thread will be cancelled when the tolerance is reached or exceeded.

The default value of MAX-ERRORS is 0 (zero) which turns off the facility. Any appropriate non-zero value will then
become the error tolerance limit for the thread.

Setting the appropriate value for your site will require business approval in line with your sites organization business
practices.

Note: The MAX-ERRORS parameter is applicable to business errors only. System errors (SEVERE errors) will
terminate the process immediately.
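
For example, using the soft parameter format described in Use Of Batch Control For Parameters, a job specific
configuration file could set a tolerance of 10 errors per thread (the value is illustrative only):

com.splwg.batch.submitter.softParameter.maxErrors=10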

Multi-threading Guidelines

7 The parameters MAX-ERRORS, MAX_ERRORS and maxErrors are all acceptable names for this parameter. For publishing purposes MAX-ERRORS
will be used.



Running a background process in multiple threads is almost always faster than running it in a single thread. The
trick is determining the number of threads that is optimal for each process.

A rule of thumb that may be used is to have three (3) threads per core available. For example if you have a quad
core processor, you can run twelve (12) threads to begin your testing.

This is a rule of thumb because the footprint of each process is different (heavy versus light) and is dependent on
the data in your database. Your hardware configuration (i.e., number of processors, speed of your disk drives,
speed of the network between the database server and the application server) also has an impact on the optimal
number of threads. Please follow these guidelines to determine the optimal number of threads for each background
process:

» Execute the background process using the number of threads dictated by the rule of thumb (described above).
During this execution, monitor the utilization percentage of your application server, database server and network
traffic.
» If you find that your database server has hit 80-100% utilization, but your application server has not, one of the
following is probably occurring:
» There may be a problematic SQL statement executing during the process. You must capture a database trace to
identify the problem SQL.
» It is also possible that your commit frequency may be too large. Commit frequency is a parameter supplied to
every background process. If it is too large, the database’s hold queues can start swapping. Refer to Parameters
Supplied to Background Processes for more information about this parameter.
» It is normal if you find that your application server has hit 80-100% utilization but your database server has not.
This is normal because, in general, all processes may become CPU bound and not IO bound. At this point, you
should decrease the number of threads until just under 90-100% of the application server utilization is achieved.
This will be the optimal number of threads required for this background process.
» If you find that your application server has NOT hit 80-100% utilization, you should increase the number of
threads until you achieve just under 90-100% utilization on the application server. And remember, the application
server should achieve 80-100% utilization before the database server reaches 100% utilization. If this proves not
to be true, something is probably wrong with an SQL statement and you must capture an SQL trace to determine
the culprit.

Note: For the Windows platform, the CPU should not exceed 70-80% to provide enough additional CPU for the
operating system to process.

» Another way to achieve similar results is to start out with a small number of threads and increase the number of
threads until you have maximized throughput. The definition of throughput may differ for each process but can be
generalized as a simple count of the records processed in the Batch Run Tree. For example, in the Billing
background process in product, throughput is the number of bills processed per minute. If you opt to use this
method, it is recommended that you graph a curve of throughput vs. number of threads. The graph should
display a curve that is steep at first but then flattens as more threads are added. Eventually adding more threads
will cause the throughput to decline. Through this type of analysis you can determine the optimum number of
threads to execute for any given process.

Altering Commit Interval and Cursor Reinitialization time


Note: Cursor Reinitialization Time does not apply to Java based processes and is not used for any batch process in
Oracle Utilities Application Framework V4.3 and above.

There are two parameters that can be altered to control the amount of resources a background process uses when it
executes. While most implementations reuse the default values supplied with the product, it is possible to alter the
values to tune the performance of background processes and allow background processes to be executed during
peak application usage periods.



The Commit Interval parameter controls the unit of work of the background process. It determines how many
primary objects are processed before a commit point is taken. It controls the amount of work the database has to do
and when a checkpoint is taken for restart purposes. At the commit point in a background process, all the
outstanding work since the last commit is reflected on the database and a checkpoint is written on the batch control
record for the job. You usually take the default for the job but it can be adjusted according to the circumstances.

The higher the commit frequency value specified, the fewer commit points are taken. Typically commit points can be
expensive transactions in a database so most implementations try to minimize them. The lower the value, the more
commit points are taken. The latter is desirable when you are running background processes during online hours as it
reduces the resource hit the background process has on online processes.

Specifying a large commit frequency can cause larger than normal rollbacks to be performed by the database. This
can cause a strain on the database and hardware. The longer the unit of work the more work the database has to do
to roll it back in case of a failure.

The commit interval can have advantages and disadvantages depending on the situation and the value:

TABLE 6 – COMMIT FREQUENCY CONSIDERATIONS

High value for Commit Frequency Low values for Commit Frequency

Less Commits in process (Less checkpoints). More Commits in process (More frequent checkpoints).
Larger Unit of Work Smaller Unit of work
Lower Concurrency (higher impact on other users) Higher Concurrency (lower impact on other users)
Longer rollback in case of failure Shorter rollbacks in case of failure
Can increase throughput on lightly loaded system Can allow background to work harmoniously with online.

The second parameter, Cursor Reinitialization Time (in minutes), controls how long a solution set is held by the
process. When the Oracle database processes a set of records it typically holds a snapshot of these records to save
processing time. If the set is held too long, the records may not reflect the state of the database, so it is a good idea
for the Oracle database to maintain the currency of this data regularly. Within the background process this is
controlled by setting this value to prevent the snapshot being discarded by the Oracle database and causing an abort. If
the records are held too long an ORA-1555 Snapshot too old error is generated and the process aborts.

The Cursor Reinitialization and Commit interval parameters are tunable parameters to affect the impact of the
background processes on the other processes running and prevent internal database errors. It is also important to
understand their impact to ascertain whether any change is required. The following rules of thumb apply to setting
the values:

» It is recommended that the Commit Interval should not be set to one (1) as this value may cause excessive
database I/O and therefore performance degradation.
» For light jobs (short duration, single threaded, small numbers of records etc), the default value for Commit Interval
may satisfy your site performance requirements.
» For heavy jobs (long duration, multi-threaded, large number of records etc), then a value for Commit Interval of
between 5 (five) to 20 (twenty) is recommended.
» The value of the Commit Interval directly affects the size of the redo logs allocated to the database. The higher
the commit interval, the larger the redo logs need to be to hold the in-process objects. Work with your site's DBA
group to come up with a compromise between redo logs and commit interval.
During processing of any background process a main object is used to drive the process. For example in BILLING
the main object is Account. The BILLING process loops through the accounts objects as it processes. For other
processes it is other objects that are considered the main object. This main object type is used to determine when a
transaction is complete.



For both Cursor Reinitialization and Commit interval this is important as:

» When a certain number of main objects have been processed then a database commit is issued to the database.
This number is the Commit Interval. The larger the commit interval the larger the amount of work that the
database has to keep track of between commit points.
» The Cursor Reinitialization parameter is used to minimize issues in the Oracle database where the unit of work is
so large it causes a "Snapshot too old" error. The Oracle database stores undo information in the Rollback Segment,
and if that storage is recycled the read consistent information for the current open cursor is no longer available. This
is primarily caused when the Oracle database recycles the Rollback Segment storage regularly. In Oracle Utilities
Application Framework based products this is prevented by reinitializing the cursor on a regular basis to prevent an
error. When this timeout, known as the Cursor Reinitialization, is exceeded then at the end of the current transaction a
commit will be issued.
» At any time in a process a commit for objects processed may be caused by reaching the Commit Interval or
the time limit set on Cursor Reinitialization, whichever comes first.
» The settings of Commit Frequency and Cursor Reinitialization have an impact on the amount of JVM memory
allocated to the individual threads. Higher values of both require more memory to hold the data.

Note: The Cursor Reinitialization parameter only applies to COBOL based batch processes.

Online Daemon or Standalone Daemon


In Oracle Utilities Application Framework V2.1 and above a background processing daemon was introduced to
support the Online Submission Method (and the internal scheduler). This facility replaced the cron based daemon
in previous versions. The daemon's responsibility is to execute any background processes that have been
registered by the online submission transaction.

Note: If the daemon is enabled in more than one JVM, then the grid will ensure that only one daemon is active at any
time. If the JVM running the daemon fails for any reason, other JVMs in the grid will assume the role of the
daemon. As long as at least one JVM is configured to accept the scheduling daemon, the daemon will run on exactly
one thread somewhere among the batch grid JVMs. For example, if there are 3 JVMs configured to accept the
scheduling daemon, the daemon may end up running in one thread in JVM #2 and not at all on JVMs #1 and #3. If
JVM #2 goes down, the scheduler daemon will start running on one thread either in JVM #1 or #3, but not both.
This ensures that the scheduler is always running, if at all possible, and duplicate submissions do not happen.

There are two options to use the daemon:

» The daemon can be executed with an existing J2EE Business Application Server shared with the online. This
reserves some capacity from the server to execute the background processes.
» The daemon can be run within a dedicated standalone batch server that is not shared with online. This allows the
execution of background processing to be processed on dedicated server hardware.
While the decision to use an online or standalone daemon is a site specific one there are a number of factors that
should be considered when making this decision:

TABLE 7 – DAEMON CONSIDERATIONS

Online
Advantages: Easiest to configure (Installation option). Does not require separate resources. Useful for non-production.
Disadvantages: If a background process misbehaves it can affect online. Cannot manage or trace the daemon directly or via JMX. Borrows Application Server capacity from online.

Standalone
Advantages: Direct management possible (JMX or command line based). Does not directly affect online processing.
Disadvantages: Additional configuration and management. Requires additional resources.

Online Submission Guidelines


The Oracle Utilities Application Framework includes a facility for allowing online users to submit batch processes
as part of their testing activities. Whilst this facility is handy for submitting jobs, there are a few guidelines
for its optimal use:
» The online submission facility is not recommended for production use. The facility is designed primarily for
training, demonstration and testing use only and has functionality limited to those uses.
» The online submission facility will run all processes in the DEFAULT or LOCAL 8 threadpools only by default. This
behavior can be overridden using the DIST-THD-POOL batch parameter. This will need to be added to the Batch
Control definition for individual jobs that require this functionality. This is equivalent to the
com.splwg.batch.submitter.distThreadPool submitter parameter.
» When navigating to the Batch Run tree from this transaction, all executions of a job are listed on the selection
dialog. By default, the submission records are sorted by Batch Number in descending order. The first entry in the
list is the latest execution of the job, by default.

Use of Spacenames
By default, all batch jobs submitted are run within the same housekeeping space within the batch grid 9. This
behavior is designed for production use to maximize the efficiency and resource usage of the batch grid. During
development of batch code, it may be desirable to execute each developer's workload in their own spaces to isolate
developers from affecting each other. There is a facility within the product to override the space used by the batch
grid to allow segregation.

To do this the developer must specify the following parameter for the job (in the properties file for the individual
job):

com.splwg.grid.spaceName=<spacename>
where <spacename> is the name of the desired space.

For example:

com.splwg.grid.spaceName=TEST1

Note: This parameter is intended as a development and testing aid only. It provides a hard partition between
workers. Each space name has its own HouseKeepingDaemon and is therefore totally separate from workers with
different space names. For production purposes, distributed thread pools are more flexible and should be used
instead.

Maximum Execution Attempts


By default the threadpoolworkers attempt execution of a batch process once. There is a parameter that specifies
how many times the worker(s) in the grid should attempt execution of the work submitted by this submitter. If an
application program crashes and brings down the worker JVM with it, this parameter is designed to prevent any
other worker nodes in the grid from picking up this same bad work request and thereby spreading the poison work
around the grid, crashing JVMs along the way and ultimately bringing the batch grid down completely.

8 The LOCAL threadpool applies to Oracle Utilities Application Framework V2.1 only.
9 It is known as the MAIN space.



The default for this is 1, but parameter maxExecutionAttempts can be used to override it as follows:

com.splwg.batch.submitter.maxExecutionAttempts=n
where n is a number greater than 0.

The default is set to 1 and it is highly recommended that it remain at that value unless instructed otherwise by Oracle
Support.

Number of Threads Per Threadpool


Each threadpool defined for the product contains a thread limit and is also subject to the java thread limits imposed
by the JVM vendor. To understand these thread limits a few concepts must be understood.

Each version of Java from each vendor has a limit on the number of active threads that can be supported. This value
varies from Java version to Java version and vendor to vendor. Typically this varies from 150-600 active
threads per JVM. This implies that you can run 150+ threads of background processing, but this is not the case.

Java threads tend to be short lived (typically associated with online or web services style work) and the limit in Java
seems to support that level. Conversely, background process threads tend to be long lived and therefore take more
resources than short lived threads. This can be explained by the fact that online transactions and web services tend
to operate on one object and background processes on multiple objects.

It is recommended that the maximum number of threads you consider per thread pool is 8 for heavy jobs and 10-15
for light jobs.

Customers using COBOL based objects should consider minimizing the number of threads in each pool to
reduce Access Violation Errors generated by the Micro Focus runtime. If your site finds that this occurs, reducing the
number of concurrent threads per threadpool can reduce the occurrence of the problem.

The thread limit is set at a threadpool level as part of the threadpoolworker[.sh] utility. This thread limit is set
to prevent the JVM from using more resources than required. Exceeding this limit will result in background process
execution delays as the process waits for an available thread.

Even though this limit is set, it represents the maximum number of potential threads in the threadpool. Not all
background processes have an equal footprint on the system. Some are heavier (use more resources) than others.
The footprint of a particular background process is not measured by the volume of data but by the throughput of the
background process. The heavier background processes tend to have lower throughput rates than lighter
background processes.

The importance of the footprint is related to the number of threads that can actually be executed at any time within
the threadpool. A threadpool can process fewer heavy background processes than lighter background
processes. For example, if the threadpool limit is set to ten (10) and you try to run ten (10) heavy threads then the
JVM may run the background processes slower due to the threadpool having capacity issues. If you sent ten (10)
lighter threads it may process them adequately.

To determine the optimal number of threads for your threadpools:

» During the testing phase, set all the threadpools used to the 10-15 thread limit. Refer to Designing Your
Threadpools for additional advice.
» Allocate threads to the threadpool up to the limit. Note the run times.
» Decrease the number of threads sent to the pool and note the run times.



This should assist in determining whether the threadpool can optimally handle fewer or more threads and the relative
footprint of the threads used.

Note: The online daemon DEFAULT threadpool should remain at five (5).

Designing Your Threadpools


The key to using the threadpoolworker[.sh] and submitjob[.sh] utilities is to design the number and
composition of threadpools. The following guidelines have been found from research with customer and internal
studies to assist in this design process:

» Separate java based processes from COBOL based processes. This will assist in micromanaging the threads in
case you need to stop threads of a background process. The jmxbatchclient[.sh] utility can only kill
individual java based background processes so to kill COBOL based background processes it is recommended to
use jmxbatchclient[.sh] to shutdown the threadpool. This will stop all threads in the threadpool but is the
only way to stop a COBOL based background process within the Oracle Utilities Application Framework.

Note: jmxbatchclient[.sh] can be used for any batch process regardless of the language used to write the
functionality.

Note: COBOL support is not available for Oracle Utilities Application Framework V4.3.x and above.

» The number of threads you are simultaneously executing at any point in the schedule will dictate the number of
threadpools to be used for your site. For example, if 80 threads are executing at any time then 8-10 threadpools
may be necessary (this is only a rough calculation).
» Group light footprint background processes into a small number of common threadpools. Typically
background processes that operate on the same type of data are ideal. Consult your product documentation, as
the background processes are usually grouped by functional area already for consideration.
» Consider splitting heavy footprint background processes into a number of threadpools. The best way to determine
this is through trial and error by determining the optimal number of threads per threadpool for that particular
background process.
» Name your threadpools appropriate for their function. The DEFAULT threadpool is reserved for the daemon use
only and should not be used if the daemon is active in that environment.
This may sound quite complicated but the process can be simplified in a number of ways:

» Java based background processes are the easiest to manage so can be grouped together in a small number of
threadpools.
» Increasing the number of threads used to limit the threads to smaller units of work is acceptable for heavy
processes, as subdividing the workload into small units can lead to an increase in throughput levels. Smaller units of
work reduce the memory and CPU of the JVM by reducing work queue length.
» The large heavy background processes that require multi-threading should be separated in their own dedicated
threadpools. This threadpool can be started prior to the first thread of the background process starting and
shutdown after the last thread has completed.

Note: In most product implementations, the number of multi-threaded background processes usually represents
fewer than 10% of the total number of background processes that need to be managed.

» Shut down any threadpools no longer used by your schedule. This may reduce overall resource usage.
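
For example, using the threadpool definition format shown later in this document, a dedicated pool for a heavy
multi-threaded process and a shared pool for lighter Java processes might be defined as follows (the pool names and
thread limits are illustrative only):

com.splwg.grid.distThreadPool.threads.BILLING=8
com.splwg.grid.distThreadPool.threads.LIGHT01=15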

Threadpool Cluster Considerations
The CLUSTERED execution mode allows for clustering of threadpools. There are a few guidelines when using this
facility that may assist in configuration and operations:



» One Cluster per environment – The current implementation of CLUSTERED execution mode only allows one
cluster per environment. This is sufficient for the majority of most needs. The implementation can run a number of
threadpools within the cluster (the threadpools can be clustered [same name] or non-clustered [different names]).
» More than one machine – Clustering of threadpools (same name) requires more than one machine to be
optimal. It is not possible to cluster a threadpool within a single machine.
» Name your cluster appropriately - The name can be up to thirty-two (32) characters to define the name of the
cluster. This is required and must be unique for each environment. With classic DISTRIBUTED mode, the batch
JVMs for an environment are naturally grouped because they register themselves through database table
F1_TSPACE_ENTRY, but in CLUSTERED mode the JVMs are joined through a Coherence cache. The cache may
be restricted to one environment or across all environments, so a unique cluster name, along with address and
port, is required to ensure that they are appropriately grouped per environment. Environments are typically
separated by database and/or database user, so a possible convention may be to use a combination of database
name and owner Id as the cluster name, for example "DEMO.SPLADM" or simply use the environment name, for
example, "TUGBUDEMO".
» Use Multicast – By default, multicast is enabled in the configuration files. It should be used unless multicast is
deemed inappropriate for your site (some network administrators do not like multicast) then unicast is supported.
Refer to Clustering using Unicast or Multicast for more information.

COBOL Memory Optimization for Batch


Note: COBOL support is not available in Oracle Utilities Application Framework V4.3.x and above.

In Oracle Utilities Application Framework V2.1 and V2.2 and above, a number of configuration settings were
introduced to optimize the memory used by the threadpool based JVMs. These settings control the behavior of the
threadpool based JVMs in terms of their memory usage.

To optimize the settings for background processing it is recommended to set these memory settings to the following:
» Releasing COBOL Memory – COBOL programs only release their thread-bound memory when the thread dies.
This thread-bound memory is primarily memory allocated by the COBOL runtime on the C heap. As threads
return to the thread pool and are used again to process calls to different COBOL programs, the memory footprint
may continue to grow as more and more different COBOL programs are called. In an online processing scenario,
this can cause memory faults in the long run as many COBOL modules are called during the availability of the
product. During background processing, this problem is somewhat reduced as the number of COBOL modules
called is much lower. Therefore the configuration settings that controls this behavior in spl.properties for
background processing should be set as follows (opposite to what is recommended for online):
spl.runtime.cobol.remote.releaseThreadMemoryAfterEachCall=false
» Minimizing Housekeeping – When using the DISTRIBUTED mode of execution, the threadpool worker polls the
F1_TSPACE* tables to check for new available jobs. By default this poll is performed every 1000ms (1 second).
This can be inefficient for the threadpool worker process, therefore it is recommended to change this tolerance to
5000ms (5 seconds) in production to reduce overheads. This can be implemented by changing the
etc\threadpoolworker.properties file and adding the configuration setting:
com.splwg.grid.polling.minMillisBetweenCycles=5000

Example Clustered Mode scenarios


When utilizing CLUSTERED mode there are a number of scenarios that are common in terms of architecture and
configuration. The following sections will outline each common scenario, give an example and illustrate how to
implement the scenario within the configuration files used for the batch component.

Note: Whilst there are numerous variations available for each scenario, the samples used in this section are generic
and simplified to cover the more pertinent aspects of the specific scenario.



Generic configuration process
Before discussing the scenarios it is important to understand the process of what configuration files to update to
setup the configuration. Primarily there are three steps to consider:

» Attach to the environment by issuing the splenviron[.sh] command from the relevant host. If the scenario
spans multiple hosts or environments, this step will be repeated for each host and environment to implement the change.
» The threadpool can be optionally defined in the threadpoolworker.properties file. Definition of the
threadpool in this file can be skipped if the threadpool is dynamically created using the options on the
threadpoolworker[.sh] utility. Refer to the Batch Operations And Configuration Guide, Batch Server
Administration Guide or Server Administration Guide for details of the options for this utility. The location of the
threadpoolworker.properties file varies with Oracle Utilities Application Framework versions (refer to the
Batch Operations And Configuration Guide, Batch Server Administration Guide or Server Administration Guide)
for the location of this file.
» In Oracle Utilities Application Framework V4.1 and above, the cluster configuration is defined in the tangosol-
coherence-override.xml file as opposed to the threadpoolworker.properties file used by Oracle
Utilities Application Framework V4.0.x and below versions of the framework.
The figure below summarizes the steps used in the configuration:

[Figure summary: Start, attach to the environment(s), set up the threadpool, set up clustered mode, end. For Oracle Utilities Application Framework V4.1 and above the threadpool is set up in threadpoolworker.properties and clustered mode in tangosol-coherence-override.xml; for V4.0.x and below both are set up in threadpoolworker.properties.]

Figure 7 – Setup of CLUSTERED mode

When making changes, please ensure the configuration files conform to the expected format as outlined in the
Batch Operations And Configuration Guide, Batch Server Administration Guide or Server Administration Guide for
the version of the Oracle Utilities Application Framework used.

Note: For publishing purposes both multicast and unicast examples will be shown.



Sample Setup
To illustrate the scenarios a sample setup has been devised to cover all the scenarios. The figure below illustrates
the sample used for the examples:

[Figure summary: two hosts, host1 (10.1.10.1) and host2 (10.1.10.2), each running submitters and threadpoolworkers for the threadpools SCEN1, SCEN2 and SCEN3 spread across the hosts. Scenario A: single threadpool / single host. Scenario B: multiple threadpools / single host. Scenario C: multiple threadpools / multiple hosts.]

Figure 8 – Example threadpool setup

The sample consists of two hosts (host1 and host2) which house an identical copy of the product. There are a
number of threadpools (SCEN1, SCEN2 and SCEN3) spread across the hosts.

Note: For publishing we will assume each threadpool has an arbitrary limit of 8 threads.

This setup will illustrate the following scenarios:

TABLE 8 – Clustered Scenarios

Scenario Attributes

A Single threadpool on a single host/environment. This is a common scenario for non-production environment such
as development and initial test environments.

B Multiple threadpools on a single host/environment. This is a scenario used for testing and for executing larger
numbers of jobs and threads simultaneous.

C Multiple threadpools across multiple machines but in a single "environment" (such as production). This is a scenario
where the site may have multiple servers for an environment (e.g. production) and want to run jobs across the
machines.

D Multiple different pool combinations. A combination of scenarios A, B and C.

Setup threadpools
The first part of the process is to set up the threadpoolworker.properties file with the definitions of the
threadpools required for your scenario. This step is optional, as the threadpools can be dynamically created using the
threadpoolworker[.sh] utility.
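
For example, a threadpool could be created dynamically at startup rather than being defined in the properties file.
The sketch below assumes the pool option of the threadpoolworker[.sh] utility; confirm the exact option syntax in
the Server Administration Guide for your version:

threadpoolworker.sh -p SCEN1=8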

For each threadpool on a host, a definition of the threadpool can exist in threadpoolworker.properties in the
form:

com.splwg.grid.distThreadPool.threads.<poolname>=<threads>



where

<threads> Maximum threads for the threadpoolworker

<poolname> Threadpool name

The following table shows the threadpool file parameter entries for each scenario.

TABLE 9 – CLUSTERED SCENARIOS

Scenario threadpoolworker.properties file entries

A On host1:
com.splwg.grid.distThreadPool.threads.SCEN1=8

B On host1:
com.splwg.grid.distThreadPool.threads.SCEN2=8

C On host1:
com.splwg.grid.distThreadPool.threads.SCEN3=8

D On host1:
com.splwg.grid.distThreadPool.threads.SCEN1=8
com.splwg.grid.distThreadPool.threads.SCEN2=8
com.splwg.grid.distThreadPool.threads.SCEN3=8
host2:
com.splwg.grid.distThreadPool.threads.SCEN3=8

Setup Clustered Mode
The next step is to define the clustered mode information for the host/environment. There are two approaches to
consider. The clustered mode can use multicast or unicast (see Clustering using Unicast or Multicast for more
information about the different modes).

This information is specified in the tangosol-coherence-override.xml file for Oracle Utilities Application
Framework V4.1.x and above products or the threadpoolworker.properties file for Oracle Utilities Application
Framework V4.0.x and below (including V2.x) customers.

This requires setting up the following parameters:

TABLE 10 – CLUSTERED PARAMETERS

Multicast:
» Set tangosol.coherence.cluster to the cluster name. Refer to the Server Administration Guide, Batch Server Administration Guide or Batch Operations and Configuration Guide for recommendations.
» Set tangosol.coherence.clusteraddress to a unique IP address for the cluster to use for multicast communications. Refer to the Server Administration Guide, Batch Server Administration Guide or Batch Operations and Configuration Guide for valid IP ranges.
» Set tangosol.coherence.clusterport to a unique port number for the cluster for the environment.
» Set tangosol.coherence.distributed.localstorage to false to minimize the heap size used by threadpoolworker JVMs. See Coherence Best Practices for more information.

Unicast:
» Set tangosol.coherence.cluster to the cluster name. Refer to the Server Administration Guide, Batch Server Administration Guide or Batch Operations and Configuration Guide for recommendations.
» Set tangosol.coherence.localport to the port number used by this node in the cluster. This must be unique across all the environments on the machine.
» Set tangosol.coherence.wka to the host name or host IP address (if DNS resolution is slow) of the hosts in the cluster.
» Set tangosol.coherence.wka.port to the same value as tangosol.coherence.localport if implementing a single host solution, else set the port number to the same unique number across all hosts in the cluster.



Therefore, if using Oracle Utilities Application Framework V4.0 and below, the threadpoolworker.properties
entries for the scenarios are as follows:

TABLE 11 – THREADPOOLWORKER.PROPERTIES CLUSTERED VALUES FOR SCENARIOS

Scenarios A, B – Multicast (host1):
tangosol.coherence.cluster=FWDEMO.SPLADM
tangosol.coherence.clusteraddress=239.128.0.10
tangosol.coherence.clusterport=7810
tangosol.coherence.distributed.localstorage=false

Scenarios A, B – Unicast (host1):
tangosol.coherence.cluster=FWDEMO.SPLADM
tangosol.coherence.localport=7810
tangosol.coherence.wkaport=7810
tangosol.coherence.wka=10.1.10.1

Scenarios C, D – Multicast (host1 and host2):
tangosol.coherence.cluster=FWDEMO.SPLADM
tangosol.coherence.clusteraddress=239.128.0.10
tangosol.coherence.clusterport=7810
tangosol.coherence.distributed.localstorage=false

Scenarios C, D – Unicast (host1):
tangosol.coherence.cluster=FWDEMO.SPLADM
tangosol.coherence.port1=7820
tangosol.coherence.port2=7830
tangosol.coherence.wkaport=7810
tangosol.coherence.wka1=10.1.10.1
tangosol.coherence.wka2=10.1.10.2

Scenarios C, D – Unicast (host2):
tangosol.coherence.cluster=FWDEMO.SPLADM
tangosol.coherence.port2=7820
tangosol.coherence.port1=7830
tangosol.coherence.wkaport=7810
tangosol.coherence.wka2=10.1.10.1
tangosol.coherence.wka1=10.1.10.2

Therefore, if using Oracle Utilities Application Framework V4.1 and above, the tangosol-coherence-
override.xml entries for the scenarios are as follows:

TABLE 12 – TANGOSOL-COHERENCE-OVERRIDE.XML CLUSTERED VALUES FOR SCENARIOS

Scenarios A,B Multicast

<coherence>

<cluster-config>

<services>

<service id="1">

<init-params>

<init-param id="4">



<param-name>local-storage</param-name>

<param-value system-
property="tangosol.coherence.distributed.localstorage">false</param-value>

</init-param>

</init-params>

</service>

</services>

<member-identity>

<cluster-name system-
property="tangosol.coherence.cluster">FWDEMO.SPLADM</cluster-name>

</member-identity>

<multicast-listener>

<address system-
property="tangosol.coherence.clusteraddress">239.128.0.10</address>

<port system-property="tangosol.coherence.clusterport">7810</port>

</multicast-listener>

</cluster-config>

</coherence>

Scenarios A,B Unicast

<coherence>

<cluster-config>

<member-identity>

<cluster-name system-
property="tangosol.coherence.cluster">FWDEMO.SPLADM</cluster-name>

</member-identity>

<unicast-listener>

<well-known-addresses>

<socket-address id="1">

<address system-property="tangosol.coherence.wka">host1</address>

<port system-property="tangosol.coherence.wka.port">7810</port>

</socket-address>

</well-known-addresses>

<address system-property="tangosol.coherence.localhost">localhost</address>

<port system-property="tangosol.coherence.localport">7810</port>

</unicast-listener>

</cluster-config>

</coherence>

Scenarios C,D Multicast

<coherence>

<cluster-config>

<services>

<service id="1">

<init-params>

<init-param id="4">

<param-name>local-storage</param-name>

<param-value system-
property="tangosol.coherence.distributed.localstorage">false</param-value>

</init-param>

</init-params>

</service>

</services>

<member-identity>

<cluster-name system-
property="tangosol.coherence.cluster">FWDEMO.SPLADM</cluster-name>

</member-identity>

<multicast-listener>

<address system-
property="tangosol.coherence.clusteraddress">239.128.0.10</address>

<port system-property="tangosol.coherence.clusterport">7810</port>

</multicast-listener>

</cluster-config>

</coherence>

Scenarios C,D Unicast



<coherence>

<cluster-config>

<member-identity>

<cluster-name system-
property="tangosol.coherence.cluster">FWDEMO.SPLADM</cluster-name>

</member-identity>

<unicast-listener>

<well-known-addresses>

<socket-address id="1">

<address system-property="tangosol.coherence.wka1">host1</address>

<port system-property="tangosol.coherence.wka1.port">7810</port>

</socket-address>

<socket-address id="2">

<address system-property="tangosol.coherence.wka2">host2</address>

<port system-property="tangosol.coherence.wka2.port">7820</port>

</socket-address>

</well-known-addresses>

<address system-property="tangosol.coherence.localhost">localhost</address>

<port system-property="tangosol.coherence.localport">7810</port>

</unicast-listener>

</cluster-config>

</coherence>

Starting the threadpools
The key now is to start the threadpools using the threadpoolworker[.sh] utility. Generally the command:

threadpoolworker[.sh]

is sufficient to start all the threadpools in the threadpoolworker.properties file, but if you are starting multiple instances
of the same pool (SCEN2) then you need to run additional explicit commands for each instance. For example, for
SCEN2:

threadpoolworker[.sh] -p SCEN2=<threads> -i <rmiport>


where:

<threads> Maximum threads for the threadpoolworker



<rmiport> RMI port used for JMX. To manage each instance of the pool, a unique port number should
be used.
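
For example, to start two instances of the SCEN2 pool on the same host (the thread count and RMI port numbers
below are illustrative assumptions only; choose values appropriate for your site):

threadpoolworker.sh -p SCEN2=8 -i 6510
threadpoolworker.sh -p SCEN2=8 -i 6520

Each instance then exposes its JMX interface on its own RMI port, so the instances can be monitored independently.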

Monitoring Background processes


Once the background process has started there are a number of ways that can be used to monitor the progress and
status of the process:
» Batch Run Tree – Every time a background process is executed a set of records are registered within the Oracle
Utilities Application Framework to track the restart and progress of the background process. This information is
updated after each commit point to show the progress of the job.
» JMX Monitoring – In Oracle Utilities Application Framework V2.2, JMX monitoring was made available which
allows a JMX client (such as jconsole etc) or the provided jmxbatchclient[.sh] utility to monitor the
progress of individual threads. This is ideal for live monitoring of the threads executing on a particular threadpool
and has more up to date information than the Batch Run Tree. Refer to the Batch Operations And Configuration
Guide, Batch Server Administration Guide or Server Administration Guide associated with your product for more
information about the JMX capabilities.
» Log based monitoring – Depending on the submission method a number of logs are written for each job that
can be used to find errors and statistics. Refer to the Batch Operations And Configuration Guide, Batch Server
Administration Guide or Server Administration Guide associated with your product for more information.
» Database query – As the information displayed in the Batch Run Tree is stored in the database, it is possible to
retrieve that information and provide historical analysis of the background process. This information is particularly
useful for tracking performance as well as determining the footprint of individual background processes. Refer to
the Performance Troubleshooting Guidelines – Batch Troubleshooting (Doc Id: 560382.1) from My Oracle
Support for details of this facility.

JMX Monitoring
Note: The JMX capability is only available for Oracle Utilities Application Framework V2.2 and above only.

With the implementation of DISTRIBUTED and CLUSTERED mode, the ability to actively monitor and manage
individual batch processes via Java Management Extensions (JMX) is available. This means a JMX
console, such as jconsole, can be used on an active threadpool and individual batch threads to monitor their
progress and manage them remotely.

Refer to the Batch Server Administration Guide and Batch Operations and Configuration Guide for your product for
more information on how to enable and monitor using JMX.
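
For example, jconsole can be attached to the JMX port specified when the threadpool was started. The host, port and
connector path shown below are assumptions for illustration only; take the actual JMX service URL for your version
from the Server Administration Guide:

jconsole service:jmx:rmi:///jndi/rmi://host1:6510/oracle/ouaf/batchConnector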

Global Batch View


Note: This facility is available in Oracle Utilities Application Framework V4.2.0.0.0 and above.

One of the features of the product is the ability to monitor active batch processes using JMX. This feature was
introduced in Oracle Utilities Application Framework V2.2 to provide monitoring capability via
jmxbatchclient or a JMX console.

The issue is that each instance of a threadpool opens its own JMX port so while it is possible to monitor jobs at a
threadpool level it is not possible to see the jobs across all threadpool instances. In Oracle Utilities Application
Framework V4.2.0.0.0, a global batch view is available. This facility allows for a site to connect to any threadpool
instance and see all other instances with active jobs.



For more information on this facility, refer to the Server Administration Guide for details of the JMX interface and call
structure to retrieve information.

Database Connection information


Note: Database connection tags are only supported for Oracle Utilities Application Framework V2.2 and above.
For Oracle Utilities Application Framework V2.2 customers, install patch #10215923 to implement this functionality.

By default, the database connection information via the column MODULE on the V$SESSION view for batch
processes is set to "JDBC Thin Client". This makes the batch sessions harder to differentiate from other sessions.

The product has been altered to now populate the MODULE column to display the Batch control id for the connection
and "TUGBU Idle" when the threadpool has an idle connection.

To implement this change the following setting must be added to the hibernate.properties file contained in
$SPLEBASE/splapp/standalone/config (or %SPLEBASE%\splapp\standalone\config on Windows):

hibernate.connection.release_mode=on_close
The MODULE will now display the batch control on active connections.

Note: For Oracle Utilities Application Framework V4.1 or above, the CLIENT_IDENTIFIER is also populated with
the product user configured to execute the process.

In Oracle Utilities Application Framework V4.2.0.0.0, the database connection information for batch and
online connections has been expanded and the following information is displayed for active batch threads accessing
the database:

TABLE 13 – V$SESSION VARIABLES

Session Variable Comments


CLIENT_IDENTIFIER Authorization User used for Batch job (com.splwg.batch.submitter.userId)

MODULE Batch Control Identifier

ACTION Thread Number

CLIENT_INFO Threadpoolworker Name
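
For example, a simple query over V$SESSION using the columns above can list the active batch connections. This
is a sketch only; the predicates are assumptions based on Table 13 and should be adjusted for your site:

SELECT client_identifier, module, action, client_info
  FROM v$session
 WHERE client_info IS NOT NULL
   AND module != 'TUGBU Idle';

This lists, per session, the batch user, the batch control, the thread number and the threadpoolworker name for
threads that are currently active against the database.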

Commit Strategies
Note: Commit Strategies were introduced in Oracle Utilities Application Framework V2.1. Versions of Oracle Utilities
Application Framework prior to V2.2 used the Standard Commit Strategy exclusively. To implement any of the
alternative commit strategies ensure that the Oracle Utilities Application Framework is patched to the latest service
pack to include all strategies.

By default, background processes typically commit records every configurable interval (see Commit Interval for more
information). The commit interval defines the size of the work unit in the database as well as defines the granularity
of restart, as the product rolls back to the last commit point on process error or failure. This is known as the
Standard Commit Strategy and is employed by the majority of background processes in the product.

Whilst the product generally uses the Standard Commit Strategy it is possible for custom background processes to
use alternative strategies. These are either coded within the custom background process themselves or are



configured using the com.splwg.batch.submitter.softParameter.executionStrategy setting specified
in the properties file used by the background process.

The table below outlines the valid strategies and their attributes:

TABLE 14 – COMMIT STRATEGIES

Commit Strategy Parameter Value Comments

Standard Commit (default) StandardCommit This strategy processes each unit of work as part of a group of work units in
one database transaction. The standard maximumCommitRecords
parameter defines the commit interval, defaulted to 200 if not supplied or
not a number > 0. In the event of an exception, the transaction group is
rolled back to the last committed work unit, the exception is logged and
committed, and the successful entries in this transaction group are
reprocessed (up to the current, failed work unit). Processing can resume,
upon restart, at the first work unit after the one that failed.
This strategy is appropriate for batch processes that can tolerate errors in
the execution.

Single Transaction SingleTransaction Process the entire workload in a single committed transaction. Any
exception will cause a rollback of processed work and will be considered
an unsuccessful thread execution.
This strategy is most appropriate to update processes or interfaces that
cannot tolerate any errors within a run. For example, an interface that
requires a complete set of results should consider this strategy.

Commit Every Unit CommitEveryUnit Each successful record processed has its own commit. There is no need
to "back up" and reprocess units rolled back because of an exception.
This strategy can continue to move forward after exceptions. This is
equivalent to a commit interval of one (1).
This strategy is most appropriate to update processes or interfaces that
can tolerate some errors within a run.

Thread Iteration ThreadIteration If there is a requirement for thread pool workers to select their data at
initialization time and to loop and process until the end of the selection
then using the ThreadIteration commit strategy is recommended.
The application data can come from a database table, one or more flat
files, or any other source that the thread worker requires. The opening,
fetching and closing of the data is left entirely up to the application
program. The batch framework's responsibility is to provide appropriate
context, commit frequency, error handling and restartability, as it does in
the case of the other strategies.
This strategy was introduced to reduce java heap space usage of the
background process and to provide alternatives for restart. When a thread
is restarted after a premature end (for example error or cancellation), its
initialization method will have the opportunity to refresh the selection of its
data for the run. This is in contrast to the existing model in which the data
will always be based on the original selection when the job was first
submitted.
The standard maximumCommitRecords parameter defines the commit
interval, defaulted to 200 if not supplied. Soft parameter maxErrors
controls the number of errors that the program can tolerate. Each error
that is thrown while within this limit causes all updates for that one work
unit to be rolled back. This strategy class uses a JDBC savepoint for each
work unit to avoid also rolling back the successfully processed units of
work when an error is found. If maxErrors is overrun, the thread is
aborted.

Continuous Execution ContinuousExecution Supports continuous batch processes that may run indefinitely. Similar to
Commit Every Unit Strategy. Introduced to support Timed Batch.



Note: Not all background processes support all the commit strategies listed above. If a program fails when using an
alternative commit strategy, using StandardCommit may resolve the issue.
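
For example, a custom background process could be switched to the Thread Iteration strategy by adding the
following entry to the properties file used to submit it (a sketch only; confirm that the process supports the strategy
before using it):

com.splwg.batch.submitter.softParameter.executionStrategy=ThreadIteration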

Flushing the Batch Cache


Note: This facility is available as patches 12539014 (FW4.1), 9866988 (FW4.0.2) and 9900007 (V2.2). In other
versions of Oracle Utilities Application Framework it is provided in the base install.

The online component of the Oracle Utilities Application Framework uses a cache of static data for performance
reasons. The batch component uses a similar cache mechanism (per threadpoolworker) using a Hibernate data
cache. Whilst this cache is automatically refreshed by the product on a regular basis, it can now be manually
refreshed by running the F1-FLUSH background process.

It is recommended to run the F1-FLUSH background process for long continuously running threadpoolworkers to
reflect data changes in configuration data.
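
For example, assuming the standard submitjob utility is used at your site, the cache flush could be run manually or
scheduled with a command of the form (shown as a sketch; confirm the options for your version in the Server
Administration Guide):

submitjob.sh -b F1-FLUSH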

Online Submission Alternative Threadpools


When using the online batch submission facility in the product, the batch jobs submitted will execute in the DEFAULT
threadpool by default (in V2.1 this is known as the LOCAL threadpool). Whilst this behavior is usually sufficient for
most sites during testing, it may be desirable to execute the online submission jobs on an externally started
threadpool. Ensure you have started the external threadpool prior to submitting any jobs.

To do this, you must add the DIST-THD-POOL parameter to all batch control records you want to run in the external
pool. To save time, you can also provide a default value for this parameter to save the submitter having to specify it.
An example of this setting on the Batch Control is shown below:

Figure 9 – Setup of alternative threadpool

Restart Threadpools Regularly


Threadpools can be used for varying amounts of time. As pointed out in Designing Your Threadpools it is
recommended to shut down threadpools when they are not used. Whilst this applies to threadpools for shorter
duration processes, some sites implement long running threadpools to cover monitor processes and processes that
are regularly executed through the business day.

It is recommended that long running threadpools be stopped and restarted on a regular basis to release resources
that may be held by those JVM's. This is particularly important for customers who are using Oracle Utilities
Application Framework based products that contain jobs written in technologies other than java, such as COBOL or
C, as those resources are not released as easily as java resources and a restart of the threadpool will ensure these
resources are released as well.

The frequency of the restart will vary with your site's volume and frequency of jobs but a few guidelines may be
helpful in deciding this frequency:

Restart a threadpool when no jobs are running – It is not a good idea to restart a threadpool whilst the threadpool is
active. Pick a time when jobs are not likely to be running to restart the threadpool.



A good rule of thumb is to restart long running threadpools at least once per day. This is essential for customers with
batch processes with components written in languages other than java. Customers where 100% of the code is java
can consider the rule of thumb of a longer frequency of once per 2-3 days 10.

Monitoring the memory footprint of the JVM can also be a good idea to see if too many resources are being
held by the JVM (the memory footprint will trend upwards over time). If the threadpool is very active with different
processes, especially processes not written in java, then those resources will be held and not released until the
threadpool is restarted.

The more active the threadpool, and the more different batch processes it runs, the more often it should be restarted. Batch jobs
load different classes and resources and, if those resources cannot be released, then they have to be released by
restarting the threadpool.

Overriding Threadpool log file names


Note: This functionality is available in Oracle Utilities Application Framework V4.1 Group Fix 4 and above
only.

By default the submitjob and threadpoolworker utilities will create logs in a specific location dictated by the
utility.

For example:

TABLE 15 – DEFAULT LOG FILE NAMES

Utility Platform Default Location and Name


submitjob.sh Linux/Unix $SPLOUTPUT/submitjob.{batchCode}.{sysDateTime}.log

submitjob.cmd Windows %SPLOUTPUT%\submitjob.{batchCode}.{sysDateTime}.log

threadpoolworker.cmd Windows %SPLOUTPUT%\threadpoolworker.{sysDateTime}.log

threadpoolworker.sh Linux/Unix $SPLOUTPUT/threadpoolworker.{sysDateTime}.{pid}.log

Where:
{batchCode} Batch Control used for job

{sysDateTime} System Date and Time in YYYYMMDDHHmmSSSSS format

{pid} Process Id of threadpool

If your implementation wishes to implement custom log file names then this may be achieved using user exits which
allow custom setting of the file name pattern. In the utilities an environment variable is set to the name and location
of the log file. The user exit may be used to set this environment variable to an alternative value. The user exit contains the
script code fragment 11 used to set the log file name environment variable.

The table below lists the user exit, environment variable name and the platform:

TABLE 16 – USER EXITS FOR LOG FILE NAME

Utility Platform User Exit Name Environment Variable


submitjob.sh Linux/Unix submitjob.sh.setvars.include SBJLOGID

10 This would vary if the threadpool is very active. If it is more active, then restart it more frequently.
11 The script code fragment must be valid for your operating system.



submitjob.cmd Windows submitjob.cmd.setvars.include SBJLOGID

threadpoolworker.cmd Windows threadpoolworker.cmd.setvars.include TPWLOGID

threadpoolworker.sh Linux/Unix threadpoolworker.sh.setvars.include TPWLOGID

Additionally, internal session variables are available for use in the user exit; availability varies between the
submitjob and threadpoolworker utilities.

TABLE 17 – ADDITIONAL RUNTIME VARIABLES

Variable Comments

RUNOPTS Runtime Options

batchCode Batch Code

sysDateTime Run Date and Time

Note: Other environment variables in the session can be used and determined in the user exit script code.

Note: When setting the log file name the location and file name MUST be valid for the security and operating system
used for the product. The directory should be writable by the OS user used to execute the job.
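
As an illustration only, a Linux/Unix threadpoolworker.sh.setvars.include user exit could set the TPWLOGID variable
to a custom name. The prefix and date pattern below are assumptions, not a supplied example:

# Hypothetical user exit fragment - override the threadpoolworker log file name
TPWLOGID=$SPLOUTPUT/mypool.threadpoolworker.$(date +%Y%m%d%H%M%S).log
export TPWLOGID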

Submitter Nodes Per Job


When submitting a batch process, there are two options regarding the number of submitter nodes to be employed.

» Single submitter for the entire process. Regardless of the number of threads (e.g. 100 threads of BILLING), there
will only be one submitter node for the process in the cluster.

» Multiple submitters per batch process, i.e. one per thread. For example, in the case of 100 threads of BILLING, there
would be 100 submitter nodes.

There are advantages/disadvantages for both of the approaches. The single submitter approach is less resource
intensive (each submitter JVM requires 180-256Mb) and results in a smaller cluster in terms of transient members.
With respect to this latter point, it should be noted that submitter nodes are continually entering/exiting the cluster
(hence the term transient), thus requiring acknowledgment from other members and thereby significantly increasing
the required cluster communication as the number of submitter nodes increases. This can be problematic for large
clusters for which there is the recommendation to employ multiple clusters if communication delays become an
issue.

Despite the aforementioned resource and potential communication disadvantages, employing multiple submitters
per batch process (one per thread) does have distinct advantages. Namely, it allows for immediate notification of a
failed thread (the associated submitter node terminates immediately) and the canceling of a specific thread by
terminating the associated submitter node. Real-time feedback of a terminated thread can be critical at some sites
such that the issue can be attended to immediately - as opposed to waiting until all other threads have
ended/abended to receive such feedback as in the case of a single submitter (which, depending on the process, can
be a significant amount of time). Note that the threadpoolworker JMX facilities can also be used to monitor and
cancel individual threads, however the site will need to create the mechanism to issue and interpret the JMX
requests.

Clustered Mode – Dedicated Storage Node Recommendation



As stated in the Coherence Best Practices, the use of dedicated cache servers can be advantageous both from a
heap consumption standpoint as well as CPU utilization (as only a subset of the JVMs in the batch cluster will need
to maintain the caches).

By default, all OUAF batch node instances (threadpoolworker and submitter) will maintain the application caches
which can be overridden via the property tangosol.coherence.distributed.localstorage=false. While
this property can be readily specified in the submitbatch.properties file thereby disabling the local storage for
the submitter nodes, it needs to be approached differently with respect to the threadpoolworker nodes (as at least
one threadpoolworker node must have local storage enabled). It is recommended that if Unicast is being utilized that
the cache node(s) be the WKA members. It is further recommended that the cache node(s) not perform any
application processing; therefore they should be assigned a threadpool which is not specified by any job.
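
For example, local storage can be disabled for the submitter nodes by adding the following entry to
submitbatch.properties (a sketch; remember that at least one threadpoolworker node in the cluster must keep local
storage enabled):

tangosol.coherence.distributed.localstorage=false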

Clustered Mode – Roles Recommendation
Coherence provides the property tangosol.coherence.role which can be used to identify the type of node for
added clarity when monitoring the application. By default, the application sets this value to
SplwgBaseApiThreadPoolWorker for the threadpoolworker instances and SplwgBaseApiSubmitBatch for
the submitter nodes. These values can be overridden to provide further specifics regarding the node, for example
the WKA/cache members can be identified, the job/thread associated with the submitter node, etc.

Below is sample output from a submitter node illustrating what can be achieved via specifying
tangosol.coherence.role for the different nodes. As can be seen, the WKA and storage enabled nodes are
denoted by WKACache_SplwgBaseApiTPW, the actual threadpoolworker instance is denoted by
SplwgBaseApiThreadPoolWorker (unchanged from the default), and the submitter nodes are denoted by the
Job/Thread, e.g., SubmitBatch_BILLING_1_OF_8.

- 2012-07-06 18:07:21,884 [Logger@9219105 3.6.0.0] INFO (Coherence)


2012-07-06 18:07:21.884/5.050 Oracle Coherence GE 3.6.0.0 <Info>
(thread=main, member=n/a): Started cluster Name=cluster.batch.prod

WellKnownAddressList(Size=2,
WKA{Address=1.1.1.10, Port=7020}
WKA{Address=1.1.1.10, Port=7010}
)

MasterMemberSet
(
ThisMember=Member(Id=11, Timestamp=2012-07-06 18:07:21.509,
Address=1.1.1.10:7524, MachineId=424,
Location=machine:HOST1,process:5636500,
Role=SubmitBatch_BILLING_8_OF_8)



OldestMember=Member(Id=1, Timestamp=2012-07-06 18:04:57.347,
Address=1.1.1.10:7010, MachineId=424,
Location=machine:HOST1,process:7930178, Role=WKACache_SplwgBaseApiTPW)
ActualMemberSet=MemberSet(Size=11, BitSetCount=2
Member(Id=1, Timestamp=2012-07-06 18:04:57.347,
Address=1.1.1.10:7010, MachineId=424,
Location=machine:HOST1,process:7930178, Role=WKACache_SplwgBaseApiTPW)
Member(Id=2, Timestamp=2012-07-06 18:05:28.164,
Address=1.1.1.10:7020, MachineId=424,
Location=machine:HOST1,process:8847606, Role=WKACache_SplwgBaseApiTPW)
Member(Id=3, Timestamp=2012-07-06 18:05:37.055,
Address=1.1.1.10:7110, MachineId=424,
Location=machine:HOST1,process:5046818,
Role=SplwgBaseApiThreadPoolWorker)

Member(Id=4, Timestamp=2012-07-06 18:07:14.202,


Address=1.1.1.10:7510, MachineId=424,
Location=machine:HOST1,process:7799264,
Role=SubmitBatch_BILLING_1_OF_8)
Member(Id=5, Timestamp=2012-07-06 18:07:15.081,
Address=1.1.1.10:7512, MachineId=424,
Location=machine:HOST1,process:5374478,
Role=SubmitBatch_BILLING_2_OF_8)
Member(Id=6, Timestamp=2012-07-06 18:07:16.01,
Address=1.1.1.10:7514, MachineId=424,
Location=machine:HOST1,process:10158366,
Role=SubmitBatch_BILLING_3_OF_8)

Member(Id=7, Timestamp=2012-07-06 18:07:17.577,


Address=1.1.1.10:7516, MachineId=424,
Location=machine:HOST1,process:3998010,
Role=SubmitBatch_BILLING_4_OF_8)
Member(Id=8, Timestamp=2012-07-06 18:07:18.232,
Address=1.1.1.10:7518, MachineId=424,
Location=machine:HOST1,process:4653674,
Role=SubmitBatch_BILLING_5_OF_8)
Member(Id=9, Timestamp=2012-07-06 18:07:19.466,
Address=1.1.1.10:7520, MachineId=424,
Location=machine:HOST1,process:6488678,
Role=SubmitBatch_BILLING_6_OF_8)



Member(Id=10, Timestamp=2012-07-06 18:07:20.217,
Address=1.1.1.10:7522, MachineId=424,
Location=machine:HOST1,process:5898986,
Role=SubmitBatch_BILLING_7_OF_8)
Member(Id=11, Timestamp=2012-07-06 18:07:21.509,
Address=1.1.1.10:7524, MachineId=424,
Location=machine:HOST1,process:5636500,
Role=SubmitBatch_BILLING_8_OF_8)

)
RecycleMillis=1200000
RecycleSet=MemberSet(Size=0, BitSetCount=0
)
)

TcpRing{Connections=[10]}
IpMonitor{AddressListSize=0}
Clustered Mode – Private Network Recommendation
If the batch cluster will be spread across multiple physical machines, ensure that the nodes are communicating via a
private network. This will preclude the possibility of the network becoming saturated by network activity outside the
cluster. Despite this measure, inter-machine communication can still be problematic from a communication
standpoint (indicated by the presence of communication delays), hence the subsequent recommendation of
establishing multiple clusters when more than one physical machine comprises the batch topology.

Clustered Mode – Multiple Clusters Recommendation


Given the communicative nature of Coherence between all of the nodes, if multiple physical machines will be utilized
as batch servers or the number of batch cluster members is substantial, it is recommended that multiple clusters be
established if communication delays are observed for which the sheer cluster size / inter-machine communication is
determined to be the issue. This topology reduces the size of each individual cluster and eliminates inter-machine
communication in the case of more than one physical batch server; in tandem these reduce the chances of
communication delays within the respective smaller clusters which will improve overall batch health and stability.

The above topology can be achieved by setting the property tangosol.coherence.cluster to a unique value in the
threadpoolworker.properties and submitbatch.properties files on each physical server. For example, if two batch
servers were being utilized, this property could be set to GBUPRODA and GBUPRODB respectively. In the case
where multiple clusters will reside on the same machine, these values can be overridden at
submitter/threadpoolworker startup (note that the associated Unicast/Multicast ports will also need to be overridden).

By having separate clusters, the job submission / scheduling mechanism must submit jobs to each cluster explicitly.
For example, in the case of two clusters and a 60 thread job, the job submission mechanism could submit 30
threads to one cluster and 30 to the other.
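
For example (the cluster names are taken from the example above and are illustrative only), the properties files on
each server could contain:

Server A (threadpoolworker.properties and submitbatch.properties):
tangosol.coherence.cluster=GBUPRODA

Server B (threadpoolworker.properties and submitbatch.properties):
tangosol.coherence.cluster=GBUPRODB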

Batch Memory Management



Note: Refer to the Tuning Java Virtual Machines documentation for additional advice and options.

Note: Refer to the JVM options from the documentation provided with the JVM vendor used for valid formats

By default the threadpools allocate enough memory to run most batch processes. In some cases though, such as
when out of memory conditions occur, you may need to tweak the settings to provide enough memory to the running
processes. There are a number of techniques available to address this:

» In Oracle Utilities Application Framework V4.x a number of parameters control the java memory settings of the
threadpoolworker, using the following settings available from option 51 of the configureEnv utility (using the
–a option):

TABLE 18 – BATCH MEMORY SETTINGS

Setting Usage
BATCH_MEMORY_ADDITIONAL_OPT Additional Java options to be passed to the threadpoolworker JVM. The format of
the options is as expected by the version and JVM vendor.
Note: Avoid duplicating the memory options outlined below.

BATCH_MEMORY_OPT_MAX Maximum heap memory to allocate to the threadpoolworker. Equivalent to the –Xmx
java option.

BATCH_MEMORY_OPT_MAXPERMSIZE Maximum permgen space memory to allocate to the threadpoolworker. Equivalent to


the -XX:MaxPermSize java option.

BATCH_MEMORY_OPT_MIN Minimum heap memory to allocate to the threadpoolworker. Equivalent to the –Xms
java option.

» In Oracle Utilities Application Framework V2.2, to make changes to the memory arguments requires manual
changes to the following base scripts in the bin directory:

TABLE 19 – BATCH MEMORY CHANGES IN V2.2

Setting Usage

threadpoolworker.sh (UNIX/Linux) Change MEM_ARGS="-Xms512m -Xmx1024m -XX:MaxPermSize=192m" to


desired settings

threadpoolworker.cmd (Windows) Change set MEM_ARGS=-Xms512m -Xmx1024m -XX:MaxPermSize=192m


to desired settings.

Note: Any changes should be backed up and noted as they may be overwritten due to upgrades or fixes, and
therefore may need reapplication.

Setting Batch Log Filename Prefix


Note: This facility is available in Oracle Utilities Application Framework V2.2 SP10 and above.

By default the batch logs are named in a standard fashion (refer to the Server Administration or
Configuration/Operations Guide for these standards). The name of the log file can include a custom prefix (for
example, a literal name). These can be set using the following method:

» As the product administrator user, create a file named batch_log_prefix.txt in the etc directory of your
installation. In that file create a single row entry that contains the prefix to use 12. For example:

12
Do not include the "."



$ cat $SPLEBASE/etc/batch_log_prefix.txt
fred
$
» Save the file to use the prefix.
» Any subsequent start of a threadpool will include this prefix. For example:
$SPLOUTPUT/fred.threadpoolworker.DEFAULT.20130806.1519890.log

Adding Custom JMXInfo for Monitoring


Note: This facility is only available in Oracle Utilities Application Framework V4.3.x and above.

Note: For the programmatic version of this facility refer to the Oracle Utilities SDK documentation.

In past releases of Oracle Utilities Application Framework it was possible to programmatically add additional tags to
the JMX interface to provide additional monitoring capabilities. In Oracle Utilities Application Framework V4.3.x and
above this facility is now available via configuration as well as programmatically.

To use this facility the following must be configured:

» Add a new configuration entry to the submitbatch.properties files for each additional tag in the following
format:
com.splwg.batch.submitter.softParameter.f1.jmxInfo.<parameter>=<value>
where

<parameter> Name of tag

<value> Value of tag

For example:
com.splwg.batch.submitter.softParameter.f1.jmxInfo.foo=bar
» This setting can be set globally or on specific properties files for particular jobs.

Note: It is possible to specify this parameter on the command line using the -x
f1.jmxInfo.<parameter>=<value> option.

Managing configuration using Batch Edit


Note: Batch Edit is part of Oracle Utilities Application Framework V4.2.0.2.0 and above only and must be enabled by
setting the Enable Batch Edit Functionality to true using the configureEnv[.sh] -a utility.

One of the most critical parts of the batch architecture is deciding and maintaining the configuration settings
appropriate for your requirements. In past releases this has involved maintaining the following configuration files 13:

TABLE 20 – COMMON BATCH CONFIGURATION FILES

Setting Usage

13 All batch configuration files are located in $SPLEBASE/splapp/standalone/config (or %SPLEBASE%\splapp\standalone\config on Windows)



submitbatch.properties Global parameters for batch submitter JVM's

tangosol-coherence-override.xml Oracle Coherence cluster configuration

threadpoolworker.properties Threadpoolworker configuration

In Oracle Utilities Application Framework V4.2.0.2.0 and above, a new utility bedit[.sh] has been added to
simplify the creation and maintenance of these files to promote stability and flexibility whilst minimizing maintenance
mistakes. The features of the facility are as follows:

» Command driven wizard with simplified interfaces. Customers familiar with Oracle's WebLogic WLST utility will
recognize the design pattern with the utility. For example:
$ bedit.sh -c
Editing file /oracle/demo/splapp/standalone/config/tangosol-coherence-override.xml
using template /oracle/demo/etc/tangosol-coherence-override.ss.be
Batch Configuration Editor 1.0 [tangosol-coherence-override.xml]
-------------------------------------------------------------
Current Settings
cluster (cluster1)
address (127.0.0.1)

port (42020)
loglevel (5)
mode (dev)

> set loglevel 4


Batch Configuration Editor 1.0 [tangosol-coherence-override.xml]
--------------------------------------------------------------
Current Settings
cluster (cluster1)
address (127.0.0.1)
port (42020)
loglevel (4)
mode (dev)



» Uses optimized templates for generation of configuration files based upon Production Configuration Guidelines
and feedback from customer implementations.
» Supports templates of all configuration parameters included in Oracle Utilities Application Framework V4.2.0.2.0
including JVM specific options for the threadpool. The latter avoids manually setting up JVM parameters in
custom scripts.
» Ability to create cache nodes to reduce network traffic and act as a batch administration node to use Global JMX
monitoring.
» Supports the various cluster modes for Oracle Coherence including multi-cast based clusters, uni-cast clusters
(a.k.a Well Known Addresses) and single server clusters. The latter is useful for non-production and
demonstration environments.
» Comprehensive online help for each configuration setting with additional advice. Use the extended help option (-h)
for more advice. For example:

> help

tangosol-coherence-override.ss

------------------------------

This template is used to configure a Coherence cluster. Multicast is the


default and also the simplest configuration from the perspective of Coherence,
but multicast may not always be desirable or possible. An alternative
configuration is well-known-addresses (WKA), which uses unicast instead of
multicast. WKA requires all the "well known addresses" to be explicitly
defined, so the configuration is more complex. To create/edit a WKA cluster,
use the "-c -t wka" options. For development, testing or demo, a "single
server" cluster can also be created. This type of cluster will confine all its
network communications to the local server. In other words, it will not
broadcast over the wider network to establish or join a cluster. For such a
cluster, use the "-c -t ss" options.

See "help usage", "help mc", "help wka", "help ss".

Topics

address Coherence single-server address

bedit BatchEdit overview

cluster Coherence cluster name

command List available commands

job Job specific configuration

loglevel Coherence cluster log level

mc Coherence multicast configuration

mode Coherence cluster mode

port Coherence single-server port

role Coherence member role



socket Coherence WKA socket definition

ss Coherence single-server configuration

storage Coherence storage-enabled switch

submitbatch Submitter configuration

tangosol-coherence-override Clustered configuration

threadpoolworker TPW configuration

topic List these topics

usage Show bedit command line usage

what Show what is being edited

wka Coherence well-known-addresses configuration

wkaaddress Coherence WKA node address

wkaport Coherence WKA node port

Commands

set Set a property value

add Add a group

del Delete a group

save Save the configuration

help Get help on a topic

exit Exit Batch Configuration Editor

Type "help [command|topic]"

Online help is context sensitive to the options used. If help is not available on a topic then it may not be appropriate
for the option used.

Enabling BatchEdit
For backward compatibility the BatchEdit facility is disabled. To enable the use of BatchEdit the following process
must be performed:

» Attach to the environment as a valid administrator user using the splenviron[.sh] command.
» Execute the configureEnv[.sh] -a option to invoke the configuration menu.
» Select option 50 and navigate to the Enable Batch Edit Functionality menu option.
» Specify true to enable the functionality.
» Navigate to the main menu of the configuration menu and use the P option to process the change.
BatchEdit is now enabled.

BatchEdit Command Line



To use BatchEdit, a command line utility has been provided to create and maintain the configuration files with the following
features:

» The first time a command option is used, the default configuration files are created using product supplied
templates.
» Product supplied templates exist in the etc subdirectory with the suffix be.

TABLE 21 – BATCHEDIT TEMPLATES

Template Usage

submitbatch.be Default submitbatch.properties template for all jobs

submitbatch.LOCAL.be Default submitbatch.properties template for multi-threaded jobs that do not require a
threadpool. This is useful for developers.

submitbatch.THIN.be Default submitbatch.properties template for single-threaded jobs that do not require a
threadpool. This is useful for developers.

tangosol-coherence-override.mc.be Default tangosol-coherence-override.xml template for multi-cast clusters

tangosol-coherence-override.ss.be Default tangosol-coherence-override.xml template for single server clusters

tangosol-coherence-override.wka.be Default tangosol-coherence-override.xml template for uni-cast clusters

threadpoolworker.be Default threadpoolworker.properties template for the DEFAULT threadpool. Use this
template for non-production environments using the online batch daemon that uses the DEFAULT threadpool.

threadpoolworker.cache.be Default threadpoolworker.properties template for cache threadpools. Use this template
for implementing cache threadpools.

threadpoolworker.job.be Default threadpoolworker.properties template for threadpools that execute jobs. Use this
template for implementing implementation-specific threadpools.

» It is possible to create custom templates by copying the base template and adding a cm. prefix. This technique
will be illustrated during the configuration process.
» The templates have been pre-optimized based upon customer experiences, performance engineering and partner
feedback.
BatchEdit will remember your preferences 14, so minimal options are needed when maintaining the existing cluster
and threadpool definitions. For example, when you specify the -t option on the cluster, it is set and not needed for
subsequent invocations of BatchEdit.

BatchEdit Configuration Process


To use BatchEdit effectively there are three stages of configuration to define a complete batch solution:
» Cluster Configuration - The cluster must be created and maintained. This defines how the cluster will operate on
the network and the scope of the cluster. This stage configures Oracle Coherence to cluster and manage batch
objects. The major decision here is the type of cluster to implement. Oracle Utilities Application Framework
supports a single server cluster, unicast based cluster and multicast based cluster. This stage is outlined in
Cluster Creation and Maintenance.
» Threadpool Configuration - Once a cluster is defined the threadpools that will execute across that cluster must
be defined with their name and attributes. The Oracle Utilities Application Framework supports threadpools that

14 The preferences are stored in be.prefs, be.properties or be.default.properties, in that order.



can execute batch processes and threadpools reserved as a cache for network traffic optimization. This stage is
outlined in Threadpool Creation and Maintenance.
» Submitter Configuration - Once threadpools have been defined the submitter's must be configured to create
global configurations for batch jobs or specific configurations for specific batch jobs. This stage is outlined in
Submitter Creation and Maintenance.
The process is summarized below (Figure 10 – BatchEdit Configuration Process):

» Create/Maintain Cluster – Single Server Cluster, Unicast Cluster or Multicast Cluster
» Create/Maintain Threadpools – Standard Threadpool or Cache Threadpool
» Create/Maintain Submitters – General Submitters or Specific Submitters

Note: Once the configuration is complete, it must be reflected/synchronized across your architecture. If more than
one server is included in your hardware, the configuration files created need to be synchronized.

Cluster Creation and Maintenance


The first step in the BatchEdit approach is to decide the type of cluster and configure the cluster for your
implementation. There are a number of decisions that have to be made:

Type of Cluster - This decision is basically whether you want to implement a single server cluster, a unicast based
cluster or a multicast based cluster. This will define the scope of the cluster and how the objects in the cluster will
communicate within the cluster. This decision can be made considering a number of factors including the scope of
the cluster and your preferred networking method.

TABLE 22 – CLUSTER TYPES SUPPORTED

Cluster type -t Option Usage

Single Server Cluster ss Cluster is restricted to a single host only. The networking is restricted to the
host using a local internal protocol. This type of cluster is useful for simple
environments such as development, demonstration and other non-production
environments you want to restrict to a single server.

Unicast Cluster wka Cluster is across one or more hosts using the unicast networking technique.
This requires each host to be explicitly defined in the cluster using a Well Known
Address format. This cluster is suitable for production or non-production
environments where more than one host is in the cluster and multicast is not
suitable. This configuration is less dynamic than multicast.

Multicast Cluster (default) mc Cluster is across one or more hosts using the multicast networking technique.
This is a dynamic configuration with each host in a cluster joining the cluster
using a common multicast setup in a network at startup time. This cluster is
suitable for production or non-production environments where more than one
host is in the cluster. This is the default option if no other is specified.

Once the type has been set the parameters for the cluster must be specified. The table below illustrates the
common parameters available for the cluster configuration:

Note: use the help <parameter> function in BatchEdit for a description of the field and more advice as well as a
full list of parameters.

TABLE 23 – CLUSTER PARAMETERS

Parameter ss wka mc Recommendations


cluster ■ ■ ■ Unique cluster name for environment. This must be unique for the cluster.
Typically use the environment name or database name.

address ■ ■ ■ IP address or host name for this node in the cluster. Use localhost if possible to
minimize maintenance across hosts.

port ■ ■ ■ Unique port used for the cluster. This must be unique per cluster per host. Its use will
vary from cluster type to cluster type. Refer to the online help for more
information.

loglevel ■ ■ ■ The logging level associated with cluster operations. Refer to the online help to
decide the amount of information you wish to log. The higher the value the more
that is logged. High values are used by developers typically.

mode ■ ■ ■ The Coherence mode that the cluster is to be scoped to. Refer to the online help
for more information.

socket ■ This is a section for each of the hosts in the Well Known Address format. Each
host is a separate socket entry. Refer to the online help for more information.

wkaaddress ■ The IP address or host name of the member of the cluster assigned to this
socket.

wkaport ■ The port number assigned of the member of the cluster assigned to this socket.
This value ideally is the same across all hosts in the cluster but can be
overridden to overcome port conflicts. The port number on each node must
match the number assigned to the port value.

Use save to save the changes.

Note: The Cluster type may be initially set by specifying the -t option on the command line. After the type has been
set the -t option is no longer needed unless you wish to change the cluster type.
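
As an illustrative sketch only (the cluster name, host names, port number and prompts are assumptions and may
differ between versions), a two host unicast cluster could be configured with a session similar to:

$ bedit.sh -c -t wka
> set cluster FWPROD
> add socket
> set socket.2 wkaaddress host2
> set socket.2 wkaport 42020
> save

Use help socket within BatchEdit to confirm the group and property names for your version.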

Threadpool Creation and Maintenance


Once the cluster is defined the next step is to design and configure the threadpools that will be implemented. Once
the threadpools have been designed as outlined in Designing Your Threadpools, the configuration of these
threadpools should be undertaken using this process:

» It is now possible to create different configurations per threadpool. The main differentiators for this are the role of
the threadpool and the JVM parameters. The different configurations are set using the -l option. For example:



bedit[.sh] -w -l <arg> where <arg> is the tag/role to use.

Note: Use of multiple configurations is optional. Omit the -l option to use the default configuration.

» For each threadpool the parameters should be set appropriately.

TABLE 24 – THREADP OOLWORKER P ARAMETERS

Parameter Recommendations
minheap Minimum JVM Heap size

maxheap Maximum JVM Heap size

maxperm Maximum JVM permgen size

daemon Whether the threadpool should run the online submission daemon. This value should be set to false
for production environments.

dkidisabled Key insert behavior

storage Whether local storage is enabled. Used for cache threadpools.

distthds Number of internal threads used for cache threadpools.

invocthds Number of internal threads used for invocation services.

role Information role for the threadpools used for monitoring

pool Section for each threadpool

poolname Name of threadpool

threads Maximum number of threads for the threadpool per instance

To minimize maintenance the recommendations for threadpools are:

» Use the default template for the vast majority of threadpools unless there is a need to implement different
parameters for individual threadpools.
» Create at least one cache threadpool per node in your architecture. Use the -l cache label option to achieve
this. For more information about cache threadpool refer to Using Cache Threadpools.
» Create custom templates or use labels to create custom configurations for specialist jobs where parameters differ
for the jobs run in that threadpool. Remember to use the -l <label> option on the threadpoolworker[.sh]
utility to use the label specific parameters.

Note: Once the configuration is completed, manually starting and stopping the threadpools using the
threadpoolworker[.sh] utility defines the thread capacity and threadpool availability.
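
For example (the pool name and thread count are illustrative assumptions), an additional job threadpool could be
added to the default threadpool configuration with a session similar to:

$ bedit.sh -w
> add pool
> set pool.2 poolname BILLPOOL
> set pool.2 threads 10
> save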

Submitter Creation and Maintenance


The last step in configuration is to configure the submitters. A submitter is a JVM that submits jobs or threads of jobs
to a threadpoolworker and waits for the thread to complete. It is responsible for interfacing to the threadpoolworker
and also provides an interface to the process that submits the jobs in the first place (e.g. a third party batch scheduler
or Oracle Scheduler).

A submitter needs one or more of the following pieces of information:

» It needs a configuration file that defines the parameters to be used for the individual batch process being
executed. These can be global configuration files or individual configuration files optimized for a particular batch
process.
» Command line options to set or override particular configuration parameters that define the execution parameters
for the individual process or thread.



» The single execution of a submitter can run a single threaded batch job, a single individual thread of a multi-
threaded batch job or all threads of a multi-threaded job 15.
To create a configuration file for a submitter use the bedit[.sh] -s -l <label> or bedit[.sh] -b
<batchcode> command with the following parameters set:

TABLE 25 – SUBMITTER PARAMETERS

Parameter Recommendations
poolname Name of threadpool to execute this submitter within

threads Thread limit of the submitter. The number of threads must be equal to or less than the number of threads allocated
to executing instances of the threadpool.

commit Default commit interval. This overrides the commit interval defined internally for the batch job.

user The userid, defined to the User record, to be used to determine execution permissions and is used for records
updated and created by this process. This MUST be a valid defined user.

lang The language pack used for the batch process (default is ENG)

storage This sets whether this node is a storage node. Used for submitters that use THIN mode (for developers). This
value is recommended to be set to false for all other submitter types.

role The role used for this submitter. This is used for the JMX monitoring interface as a filter. By default the batch
code is used for this value.

minheap Minimum JVM Heap size

maxheap Maximum JVM Heap size for submitter (optional)

maxperm Maximum JVM permgen size for submitter (optional)

soft Group section for soft parameters. One section per parameter

parm Parameter name as outlined on Batch Control. For example MAX-ERRORS

value Value assigned to the parameter

Note: Other parameters supported by the submitjob[.sh] utility are available as options rather than configuration
parameters.

To minimize maintenance the recommendations for submitters are:

» Use the generic template for the majority of the batch jobs unless the job requires special parameters.
» Use specific configurations using the -b <batchcode> option on the command line to generate and maintain
job specific configurations.
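
For example (the batch code, pool name, user and thread count are illustrative assumptions), a job specific submitter
configuration could be created with a session similar to:

$ bedit.sh -b BILLING
> set poolname BILLPOOL
> set threads 10
> set user SYSUSER
> save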

Using Cache Threadpools


One of the new facilities in the batch architecture is the ability to define cache or storage nodes in your architecture.
By default, a batch cluster communicates the state of each element in the cluster (such as the threadpools) across all
the other elements. While this has benefits, as any element in the cluster can be queried, via JMX, to find out the
global status of all active processes in that cluster, it can generate a large amount of network traffic and cause
instability as the cluster is performing a large amount of maintenance operations. To address this it is now possible
in Oracle Utilities Application Framework to create cache or storage threadpools. These act as conduits within the
architecture and greatly reduce the network traffic and overheads in managing the cluster. This translates to more
stable clusters.

15 This is achieved by specifying the thread number as 0 (zero) to spawn threads up to a thread limit.



The communications between elements are shown below:

Figure 11 – Cache Threadpools (without a cache threadpool, every threadpool instance in the cluster communicates
with every other instance; with a cache threadpool, the instances communicate via the cache threadpool)

The performance advantages of the cache increase with the number of elements the cluster has to manage, and
cache threadpools have the following implementation recommendations:

» Cache threadpools do not execute any threads of any jobs within them. They are exclusively used for
administration, a storage node for the cluster state and a conduit for cluster management.
» Cache threadpools act as Coherence local storage nodes to maintain the integrity of the cluster and allow cluster
management.
» Cache threadpools are ideally suited to allow JMX connections to monitor the performance of the cluster using the
Global JMX interface outlined in the Batch Server Administration Guide.
» At least one cache threadpool per cluster per host is recommended. Multiple cache threadpools can be
implemented where high availability is required or there are a large number of submitters, threads and/or
threadpools to manage.
» If a cache threadpool is shut down and no cache threadpools are active at any time, the cluster will not revert to
individual elements communicating across other elements.
To create cache threadpools, use the bedit[.sh] -l cache command. A prebuilt template is created for the cache
in which storage is enabled, distthds and invocthds are preset, and the number of threads is set to 0 (to prevent jobs
from actually running in the cache).
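
For example, a cache threadpool definition can be generated from the cache template and then started on each host
using the matching label (a sketch; the exact option order may vary between versions):

$ bedit.sh -w -l cache
> save
$ threadpoolworker.sh -l cache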

BatchEdit Common Configurations


The process for creating a configuration varies from scenario to scenario. The table below illustrates the common
scenarios and the commands that can be used to create and maintain a configuration:

TABLE 26 – COMMON SCENARIOS

Scenario Process

Single Server Create and configure the single server cluster using the bedit[.sh] -c -t ss
Create and configure the threadpool definitions using the bedit[.sh] -w command. In most cases, use the
default template and avoid cache threadpools unless the number of submitters/threads/threadpools is large.



Create and configure the submitter global definitions for the jobs to execute using the bedit[.sh] -s
command. Specify job-specific settings using the bedit[.sh] -b <batchcode> command.

Unicast Decide the hosts to be used for the cluster.
Create and configure the unicast cluster using the bedit[.sh] -c -t wka command. For each host in your cluster,
define the socket section with the relevant host and port numbers. To minimize maintenance, try to allocate
the same port number for cluster communication across each host.
Create and configure the threadpool definitions for job-executing threadpools using the bedit[.sh] -l job
command. Create at least one cache threadpool per host 16 using the bedit[.sh] -l cache command.
Create and configure the submitter global definitions for the jobs to execute using the bedit[.sh] -s
command. Specify job-specific settings using the bedit[.sh] -b <batchcode> command.
Copy all the threadpool and cluster configuration files generated to the hosts in the cluster. The submitter
configuration files only need to be copied to the hosts from which jobs will be submitted. An illustrative
command sequence is shown after this table.

Multicast Create and configure the multicast cluster using the bedit[.sh] -c -t mc command. Allocate an appropriate
multicast IP address and port number.
Create and configure the threadpool definitions for job-executing threadpools using the bedit[.sh] -l job
command. Create at least one cache threadpool per cluster host 17 using the bedit[.sh] -l cache
command.
Create and configure the submitter global definitions for the jobs to execute using the bedit[.sh] -s
command. Specify job-specific settings using the bedit[.sh] -b <batchcode> command.
Copy all the threadpool and cluster configuration files generated to the hosts in the cluster. The submitter
configuration files only need to be copied to the hosts from which jobs will be submitted.

16 This is done at runtime, not configuration time. Create one cache definition and copy it across nodes.
17 This is done at runtime, not configuration time. Create one cache definition and copy it across nodes.
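To illustrate the Unicast scenario above, the command sequence on each host might resemble the following sketch; the host list, port numbers and the batch code MYJOB are placeholders only.

bedit.sh -c -t wka    # define the cluster; add a socket entry (host and port) for each member host
bedit.sh -l job       # define the job-executing threadpool(s)
bedit.sh -l cache     # define at least one cache threadpool per host
bedit.sh -s           # define the global submitter settings
bedit.sh -b MYJOB     # optionally override submitter settings for a specific batch control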

Converting from existing setup to use BatchEdit


If you are an existing implementation that wishes to use BatchEdit-generated configuration files, the following
process should be performed to retain your settings:

» Create a backup of the following files, which are located in the splapp/standalone/config subdirectory (a sample backup sequence is shown after this list):

TABLE 27 – EXISTING CONFIGURATION FILES

File Contents
tangosol-coherence-override.xml Cluster configuration

threadpoolworker.properties Threadpool configuration

submitbatch.properties Global submitter configuration

» Reflect your current configuration and recreate it in the cluster, threadpool and submitter definitions as outlined in BatchEdit
Common Configurations.
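A minimal backup sketch, assuming a Linux/UNIX host and that the environment has been set with splenviron[.sh] so that $SPLEBASE points to the environment root, is shown below.

cd $SPLEBASE/splapp/standalone/config
cp tangosol-coherence-override.xml tangosol-coherence-override.xml.orig
cp threadpoolworker.properties threadpoolworker.properties.orig
cp submitbatch.properties submitbatch.properties.orig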

Scheduling Best Practices


The product contains a background-processing component to process multiple records over a selected period (daily,
nightly, weekly etc). Optimization of this component is critical to the overall success of the implementation.

Optimize the schedule




During the implementation of the product, the implementation team will work with the business to determine which
processes are applicable to the business and therefore the site's schedule. Remember that NO site implements
the schedule exactly as it is supplied; it is always a relevant subset of the provided schedule.

Business processes and business activity drive the schedule that the implementation will go live with. Some of the
background processes will be removed and some will be added. Individual background processes can be removed if
no business process requires them or, for any reason, they are not applicable to the business.
Custom processes (typically for interfaces) will be added by your implementation team.

Therefore the following goals are applicable to schedule optimization within the product:

» Maximize throughput by limiting process concurrency. Do not run too many processes simultaneously; the CPU
can become saturated if too much is run at the same time.
» Align the schedule with the Bill Cycle and Meter Read Schedules. The Bill Cycle and Meter Read cycles, if used, can
influence the schedule. Refer to the Business Process documentation for billing and meter reading for more
details.
The following process can be used to optimize the base schedule supplied with the product:

» Remove background processes that are not to be implemented. Check with the business if the process is
applicable or needed for a business process. If in doubt, leave it in.
» Add custom background processes to schedule. For custom interfaces or custom business processes, outside the
scope of the base product, you will be adding a few custom background processes. Your implementation team
will have details of these processes.
» Adjust dependencies for the added and removed background processes. When you take away and add
background processes, the dependencies will change and the overall flow of data will change.
» Run the schedule in test as initially documented – You need to run the new schedule at a basic level to get an idea of
how it hangs together.
» Gather elapsed times and throughput rates - You need to get some stats to determine which background
processes will need optimization and how much optimization is really needed.
» Determine "heavy" background processes – Background processes that take a long time (we will leave that
tolerance up to you) need to be determined, as they will greatly affect the overall schedule. These become
candidates for multiple threading.
» Now that we have the basic information we can start optimization.
» Heavy background processes can be run multi-threaded – Consider multi-threading the heavy background
processes, but avoid excessive threading as it can drive up contention. See Multi-threading guidelines for more
information.
» Move the scheduling of background processes to minimize the number of background processes running in parallel –
Reduce contention around heavy background processes by scheduling other background processes earlier or
later. Also try not to run a lot of light background processes at the same time; this has the same impact as heavy
background processes.
When altering the base schedule remember the following:

» If in doubt, do NOT leave it out. Keep processes in the schedule that you are not sure about.
» Only run multi-threaded if necessary – remember too much threading can increase contention and therefore
reduce throughput.
» If you need to run background processes during busy business days, reduce the Commit Interval to increase transaction
concurrency.

Use a third party batch scheduler



The product does not have the advanced batch-scheduling capabilities typically required by sites. Sites therefore tend to use a
third party scheduling tool to manage product background processes as well as related external background
processes (such as backups). In choosing a batch scheduling tool to use with the product, the following must
be supported:
» UNIX command line support – The batch scheduler should be able to run a UNIX command line. An added
bonus is the ability to parameterize the command line.
» UNIX return code support – The batch scheduler should be able to act upon the value of a standard UNIX return
code. For example, the process will return zero (0) for successful execution and non-zero for unsuccessful execution.
» Manage Job Dependencies – The main advantage of a batch scheduler for the product is to manage the
relationships between background processes and ensure that background processes are only started after
the background processes they depend on have completed successfully.
» (Optional) Limit the number of batch background processes executing at one time – The ability for a batch
scheduler to submit and control the number of active background processes is ideal to ensure the resources on a
machine are fully utilized but not exhausted. This is not a mandatory requirement but experience has shown that
throughput is maximized if the scheduler manages the number of active background processes competing for the
CPU.
Oracle Utilities Application Framework V2.x and above includes an internal scheduler 18 that can be used to submit
background processes, but it has the following limitations:

» It contains no failover facilities.
» It can only be run on one server.
» It can only run product background processes.

18 Not all products have included the scheduler as part of their product. Refer to the relevant product documentation for details.

Scheduler Implementation Guidelines


Note: Most third party scheduler products use the term job to denote a background process or an individual thread
of a background process.

Once a third party scheduler has been chosen, it must be configured for submission of background
processes. The following guidelines have been used by a number of sites to successfully implement the scheduler:

» Create a separate operating system account for the scheduler to use to submit background processes. Avoid
using administration accounts (e.g. root) or the product administration account. This account should have
access to the relevant product security groups.
» To run any utility provided with the product, the batch user must execute the splenviron[.sh] utility to set the
context of the executions. This can be achieved in two ways:
» The .profile (or autoexec.bat on Windows) for this account can call the
splenviron[.sh] utility automatically. This is the preferred method.
» The command line used in the scheduler for each background process (including threadpoolworker[.sh] as
well as submitjob[.sh]) must be prefixed with the full splenviron[.sh] utility with the –c option. Refer to
the Operations And Configuration Guide or Batch Server Administration Guide for your product for details of the
splenviron[.sh] utility.
» The background processes must be loaded into the scheduler. This can be done manually or using an import
facility provided with the scheduler. In some products the dependency information is stored in a set of tables.
Refer to the Framework Administration Guide or the Internal Scheduler online help for more information.

Note: The Internal Scheduler feature is not available in ALL products.




» Configure the threadpoolworker.properties and submitbatch.properties configuration files with
your global defaults.
» Each background process must use the submitjob[.sh] utility to execute the background process using the
following guidelines:
» The batch code must be provided on the command line in the scheduler using the –b option.
» The Business Date should be specified on the command line using the -d option. This is to avoid the Past
Midnight issue: if the date is not supplied, the product will use the current business date, and if any background
process, or the schedule itself, runs across midnight then the business dates used may
be inconsistent and data may not be processed as expected. Specifying the same business date for all
background processes avoids this issue. Most schedulers have a business date function (also known as an
operational date) that can be used for this purpose.
» Consider providing the threadpool using the -p option to explicitly allocate a threadpool to a background
process. An illustrative command line is shown at the end of this section.
» Start the threadpools referenced in the background processes using the threadpoolworker[.sh] utility.
Optionally they can also be shut down using the jmxbatchclient[.sh] utility when not in use to save resources.
» Ensure the threadpool referenced in the submitjob[.sh] command line (or configuration file) is running before
the process is executed and on the same physical machine. Remote invocation of worker or submitter threads is
not supported in the current release of the Oracle Utilities Application Framework.
For multi-threaded background processes there are a number of options:

» It is possible to submit individual threads of a background process as individual jobs within your scheduler. This
will allow micromanagement of the schedule for dependencies.
» It is possible to submit all threads in a single job using the -t 0 (zero) option. This will submit all threads
simultaneously in the same threadpool, which may not be desirable.
» Remember to add non-product jobs that are necessary as part of your schedule such as backups and interface
transfers.
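As an illustration only, a scheduler job definition might wrap the submitter in a command line similar to the sketch below. The environment name DEV01, batch code MYJOB, threadpool POOL1 and business date are placeholders, and the exact splenviron[.sh] and submitjob[.sh] options (including the business date format) should be confirmed against the Server Administration Guide for your product.

splenviron.sh -e DEV01 -c "submitjob.sh -b MYJOB -p POOL1 -d 2015-04-01 -t 0"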

Common Errors
There are a number of common errors that can occur from time to time in the threadpoolworker and/or submitter processes. This
section outlines some of the common errors and suggested remedies.

No Storage Nodes Exist


If a submitter cannot find an active threadpoolworker, because all the threadpool instances have abended, were not started
or were terminated, it cannot instantiate the instance of the job to be executed. This error will appear in the submitter
log in a similar fashion to the example below:

… [main] WARN (api.batch.StandaloneExecuter) The following exception was thrown, but the exit code will be determined from the Batch Run Tree status
com.tangosol.net.RequestPolicyException: No storage-enabled nodes exist for service DistributedCache
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$BinaryMap.onMissingStorage(PartitionedCache.CDB:23)
at …
To avoid this issue, ensure at least one instance of the threadpool used by the submitter is started prior to initiating
the submitter.
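For example, assuming the submitter is configured to use a threadpool named POOL1, the sketch below starts an instance of that threadpool before the job is submitted; the name=threadcount syntax should be verified against the Server Administration Guide for your product.

threadpoolworker.sh -p POOL1=5 &   # start the threadpool with five threads
submitjob.sh -b MYJOB -p POOL1     # submit the job once the threadpool is active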

Communication Delays



In CLUSTERED mode, Oracle Coherence uses the network to communicate between the various components, and if
work in the cluster is delayed due to latency, JVM workload, garbage collection or traffic congestion then the
communications between components may be delayed. This can manifest as batch delays and messages similar to
the one below indicating a network delay:

… [Logger@9242415 3.7.1.0] WARN (Coherence) 2011-09-14 18:08:15.007/27592.726 Oracle Coherence GE 3.7.1.0 <Warning> (thread=PacketPublisher, member=7): Experienced a 2191 ms communication delay (probable remote GC) with Member(Id=1, Timestamp=2011-09-14 10:19:35.724, Address=126.29.53.60:11088, MachineId=33852, Location=machine:someremotemachine.oracle.com,process:20984, Role=SplwgBaseApiThreadPoolWorker); 28 packets rescheduled, PauseRate=1.0E-4, Threshold=1878
To resolve this issue, it is recommended to adjust the settings in the configuration files to take the workload, latency
and traffic of your network into account, using the guidelines outlined in
http://docs.oracle.com/cd/E24290_01/coh.371/e22838/tune_perftune.htm.
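Because these warnings frequently indicate garbage collection pauses on a member JVM, it can also help to confirm the cause by enabling GC logging on the threadpoolworker JVMs. The JVM options below are standard for Java versions of this era, but the mechanism for passing additional options to the threadpoolworker (shown here as a hypothetical JAVA_OPTS variable) varies by product and should be checked in the Server Administration Guide.

export JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/threadpoolworker_gc.log"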

Threadpool Work Abends


In products using COBOL for extensions, the COBOL objects can cause the threadpool worker to fail if the COBOL
runtime encounters an error such as a subscript out of range, excessive heap consumption, process limits exceeded
etc.

To find these errors, look in the following locations:

» Examine the threadpoolworker and/or submitter logs (including the stdout logs) for an indication of an error
message from the COBOL object causing the error.
» If the logs do not contain any information, enable tracing on the batch control and rerun the job to assist in finding
the object.
» Alternatively, it is possible to use the Linux/UNIX pmap command against the process to track errors (a sketch is shown at the end of this section).
Once the error is isolated, then the COBOL object or data causing the error needs to be corrected to enable the job
to be successfully executed.
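For example, the following sketch locates the operating system process for a threadpoolworker and examines its memory map; the grep pattern is illustrative, and the -x (extended format) option of pmap is Linux-specific.

ps -ef | grep threadpoolworker   # identify the process id of the threadpoolworker
pmap -x <pid>                    # examine the memory map of that process for abnormal growth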

MAX-ERRORS Exceeded
One of the major features of the batch architecture is the ability to define the error tolerance (MAX-ERRORS). Whilst
the default setting, 0 (zero), disables this facility, it is useful to set it to detect mass data errors. Typically, batch
processes data in batches (hence the name), and if there is a data problem across the data being processed then MAX-
ERRORS can be used to detect data-set-wide issues and prevent large numbers of errors (and any associated To Do
entries) from being created.

The value of MAX-ERRORS will vary from job to job and will depend on the error tolerance and likelihood of errors
in that job as well as the sensitivity of the business to any errors. For example, jobs that load data from external
sources are ideal candidates for setting MAX-ERRORS, as it would catch cases where the external system sends
invalid data in the data set. It is also useful for jobs where the processing is heavily reliant on configuration
settings, such as calculations, where error trapping can detect incorrect administration data. For example, if you are
using a rate or calculation that will generate errors if misconfigured, it would be useful to set MAX-ERRORS to catch
such misconfigurations.



Jobs stopping due to MAX-ERRORS being exceeded is not strictly an error. It is a safety valve for mass data or
configuration errors.



Oracle Corporation, World Headquarters
500 Oracle Parkway
Redwood Shores, CA 94065, USA

Worldwide Inquiries
Phone: +1.650.506.7000
Fax: +1.650.506.7200

CONNECT WITH US
blogs.oracle.com/theshortenspot
facebook.com/oracle
twitter.com/theshortenspot
oracle.com

Copyright © 2007-2015, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only, and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document, and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group. 0415

