Best practices
Workload Management Server
Overview and Best Practice
Yong Li
Development Manager – InfoSphere High
Performance Engine
Xiaoyan Pu
Software Development Engineer
Hanson Lieu
Software Development Engineer
Ron Liu
InfoSphere Information Server Performance
Len Greenwood
IO
Memory
CPU
Network bandwidth
When the system is underutilized, resources are wasted. When the system is overloaded,
jobs run into a range of problems. Common symptoms include network timeouts,
slow job startup, job hangs, and even job crashes caused by running out of memory. Even
when jobs do run, they tend to run much slower than they would if properly scheduled,
because of excessive system swapping (context switching). IBM InfoSphere Information
Server Workload Management (WLM) solves this problem by regulating the workload
execution environment to maximize system throughput and maintain a stable and much
more predictable runtime environment.
WLM is a new server component available in the IBM Information Server 9.1 release. This
server component is installed on the Engine tier. Its main role is to monitor machine
resource usage such as CPU and memory, keep track of the workload (job) count, and
dynamically dispatch jobs for execution, or place jobs into a queue and release them for
execution when resources become available.
This article describes the WLM Server system architecture, installation and configuration
process, WLM user interface, workload dispatching rules and best practices to leverage
the power of WLM Server.
The WLM user interface is embedded as a tab in IBM InfoSphere Information Server
DataStage Operations Console user interface and can be accessed using the following
URL:
http://host:port/ibm/iis/ds/console
The WLM user interface communicates with DataStage services running inside IBM
WebSphere Application Server via HTTP/REST APIs. The DataStage services in turn
communicate with the WLM Server process via the ASBAgent proxy.
UNIX Platform
Log in as dsadm and edit the DataStage Operations Console configuration file:
/opt/IBM/InformationServer/Server/DSODB/DSODBConfig.cfg
Change:
DSODBON=0
to:
DSODBON=1
Change:
WLMON=0
to:
WLMON=1
There are many other parameters that you can configure to fine-tune the behavior of
DataStage Operations Console and WLM Server. To get started, you only need to turn
on DSODBON and WLMON.
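The two changes above can also be scripted. The following is a minimal sketch, assuming the configuration file consists of simple KEY=VALUE lines; the helper name enable_flags is hypothetical, and it works on the file text rather than on the file itself, so back up and review the real file before applying anything like it.

```python
def enable_flags(config_text, flags=("DSODBON", "WLMON")):
    """Return config_text with every listed flag set to 1.

    Assumes plain KEY=VALUE lines. Comment lines and all other
    settings are passed through unchanged.
    """
    out = []
    for line in config_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if "=" in line and key in flags:
            out.append(f"{key}=1")
        else:
            out.append(line)
    return "\n".join(out) + "\n"


sample = "# sample settings\nDSODBON=0\nWLMON=0\n"
print(enable_flags(sample), end="")
```

Lines that start with a comment character are not matched because their key would include the "#", so a commented-out setting stays commented out.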
Windows Platform
If your installation is on a Windows server machine, edit the
C:\IBM\InformationServer\Server\DSODB\DSODBConfig.cfg file and set:
DSODBON=1
WLMON=1
UNIX Platform
First, source the dsenv file:
cd /opt/IBM/InformationServer/Server/DSEngine
. ./dsenv
Windows Platform
The first option is to bring up DataStage control panel applet:
You can also run the following script to stop and start DataStage Engine:
Once WLM Server is enabled in the DSODBConfig.cfg file and the DataStage server is
started, WLM starts as well. When the DataStage server is stopped, WLM stops with it.
You do not need to start or stop WLM separately.
UNIX Platform
Run the following command to check the correctness of the configuration file:
bash-3.2$ pwd
Driver: com.ibm.db2.jcc.DB2Driver
Schema: DSODB
Test Successful.
bash-3.2$
AppWatcher:STARTED
EngMonApp:STARTING
ODBQueryApp:STARTING
ResMonApp:STARTING
bash-3.2$
C:\IBM\InformationServer\Server\DSODB>bin\DSAppWatcher.sh -test
Driver: com.ibm.db2.jcc.DB2Driver
Schema: DSODB
Test Successful.
C:\IBM\InformationServer\Server\DSODB>
In the Engine Status panel, wlmserver: OK indicates that WLM Server process is running
properly.
When you switch to the Workload Management tab, you see a screen similar to the
following screen:
System Policies
System policies define system wide settings for the WLM Server instance. When
determining whether to dispatch jobs for immediate execution, or put a job run request
into the queuing system, WLM first checks System Policies before it checks queue level
policies.
CPU Usage
When WLM detects that system CPU usage exceeds this configured threshold, it places
incoming job run requests into a queue. The default value for this setting is 80%.
Memory Usage
When WLM detects that memory usage exceeds this configured threshold, it places
incoming job run requests into a queue. The default value for this setting is 80%.
Depending on your specific requirements and use cases, you can use one, two, three, or
all four settings concurrently to throttle the system. For instance, if you set CPU usage to
100%, you effectively disable the CPU check. If you set CPU usage to 0%, you effectively
put all incoming jobs into queues.
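The threshold behavior for the two settings described above can be sketched as follows. This is an illustration of the stated rule only, not WLM's implementation; the function name and the use of percentages as plain numbers are assumptions.

```python
def passes_system_policies(cpu_pct, mem_pct, cpu_threshold=80.0, mem_threshold=80.0):
    """Return True if a job run request may proceed, False if it should be queued.

    A request is queued when usage exceeds a threshold, so a threshold of
    100 can never be exceeded and a threshold of 0 is exceeded by any load.
    """
    return cpu_pct <= cpu_threshold and mem_pct <= mem_threshold


# CPU at 85% with the default 80% threshold: the request is queued.
print(passes_system_policies(85.0, 40.0))
# CPU threshold raised to 100%: the CPU check no longer blocks anything.
print(passes_system_policies(100.0, 40.0, cpu_threshold=100.0))
```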
• MyQueue
• IA (Reserved queue)
• HighPriorityJobs
• LowPriorityJobs
• MediumPriorityJobs
Reserved queues (IA, ISD, and DataClick) are special queues for IBM InfoSphere
Information Analyzer, IBM InfoSphere Information Service Director, and IBM InfoSphere
DataClick. They cannot be deleted or modified.
For each non-empty queue, the tab also lists the number of jobs and, for each job, details
such as the job name, project name, and process ID. A user with the DataStage
Administrative role sees all jobs in the queue. A non-administrative user can see only
jobs that belong to projects that they have access to.
There is also a special link for active running jobs. Click this link to bring up a new page
that shows a list of currently running jobs.
If you are not an IBM Information Server administrator, you can remove a job from a
queue only if the job belongs to a project that you have access to.
The queue-level Job Count policy limits how many of the jobs sent to that queue can run
concurrently. If a job run request passes the system-level policies, WLM Server then
checks the queue-level policy when determining whether the job can run.
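The order of the two checks can be sketched as below. This is a simplified model under stated assumptions: the function name and its inputs are hypothetical, and the system-level result is passed in as a boolean rather than computed.

```python
def dispatch_decision(system_policies_ok, running_in_queue, queue_job_count):
    """Decide whether a job runs now or waits in its queue.

    System-level policies are consulted first; the queue-level Job Count
    limit is consulted only when the system-level check passes.
    """
    if not system_policies_ok:
        return "queue"
    if running_in_queue >= queue_job_count:
        return "queue"
    return "run"


# System has headroom, but the queue already runs its maximum of 5 jobs.
print(dispatch_decision(True, running_in_queue=5, queue_job_count=5))
```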
To create a new queue, click New Queue. In the Queue Management - New Queue
dialog box, enter a queue name, specify whether this queue is the default queue, set the
queue priority and the maximum number of running jobs for this queue, and give a
short description. Click Save to create the queue.
• Default
• Priority setting
• Queue description
After you have modified the settings, select Save to save the changes.
In addition, you can also use the following command to query information regarding
available queues:
C:\work>C:\IBM\InformationServer\Clients\Classic\dsjob.exe -domain NONE -user <user> -password <password> -server <Engine> -lqueues
LowPriorityJobs
MediumPriorityJobs
HighPriorityJobs
WarehouseIntegrationQueue
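If you want to call this command from a script, a minimal sketch follows. It uses only the options shown in the command above and assumes the output is one queue name per line, as in the sample; the helper names are hypothetical, and the command itself is not executed here.

```python
def lqueues_command(dsjob_path, user, password, engine):
    """Build the argument list for the queue listing command shown above."""
    return [dsjob_path, "-domain", "NONE", "-user", user,
            "-password", password, "-server", engine, "-lqueues"]


def parse_queue_names(output):
    """Split command output into queue names (assumes one name per line)."""
    return [line.strip() for line in output.splitlines() if line.strip()]


sample_output = "LowPriorityJobs\nMediumPriorityJobs\nHighPriorityJobs\n"
print(parse_queue_names(sample_output))
```

The argument list can be passed to subprocess.run with capture_output=True and text=True, and the resulting stdout handed to parse_queue_names.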
• Elapsed Time
The following sections discuss the semantics of each rule and the specific use case.
Priority Weight
The priority of a job is derived from the priority of the queue it was submitted to and
from the elapsed time since the job was submitted to the queue. This rule is the default
rule. The priority weight offset is roughly 15 minutes. If three jobs are submitted to high,
medium, and low priority queues at the same time, respectively, assuming enough
resources are available, the medium priority job will start 15 minutes later than the high
priority job; likewise, the low priority job will start 15 minutes later than the medium
priority job. A high priority job submitted within this 15-minute window will run before
the medium priority job.
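One way to picture this rule is to give each job an effective submission time: its real submission time plus one 15-minute offset per priority level below high. The sketch below is a model of the behavior described above, not WLM's actual algorithm; the names and the exact 15-minute value are assumptions.

```python
from datetime import datetime, timedelta

OFFSET = timedelta(minutes=15)  # the text says "roughly 15 minutes"
LEVEL = {"high": 0, "medium": 1, "low": 2}


def effective_time(submitted, queue_priority):
    """Shift the submission time by one offset per priority level below high."""
    return submitted + OFFSET * LEVEL[queue_priority]


def dispatch_order(jobs):
    """jobs: list of (name, submitted, queue_priority). Earliest effective time first."""
    ranked = sorted(jobs, key=lambda job: effective_time(job[1], job[2]))
    return [name for name, _, _ in ranked]


t0 = datetime(2013, 1, 1, 9, 0)
jobs = [
    ("medium", t0, "medium"),
    ("low", t0, "low"),
    ("high", t0, "high"),
    ("late_high", t0 + timedelta(minutes=10), "high"),
]
print(dispatch_order(jobs))
```

In this model the high priority job submitted 10 minutes late still sorts ahead of the medium priority job, which matches the 15-minute window described above.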
You should select this rule if priority is important. In terms of resource allocation, a high
priority queue should not take more resources than it actually needs. The rule here is to
ensure high priority jobs are run as soon as possible. Having fewer high priority
concurrent jobs helps achieve this goal as each job can get more physical resources such
as CPU and memory. This approach also makes sure that when there are no high priority
jobs, more resources can be utilized by medium priority jobs, or low priority jobs.
This rule, if applied to the queues with the same priority, falls back to the ElapsedTime
rule.
Although JobRunRatio is designed to support different priorities, you can set this rule
and assign the same priority to all queues. For example, the job run ratio 0:20:0 means
that 20 jobs from medium priority queues can run concurrently assuming there are no
high and low priority queues. This rule should be used if queued time and priority are not
a concern.
This rule should be considered if you want to maintain priorities and also balance jobs
across queues. The queued time is no longer a factor in this rule. If there are multiple
queues with the same priority, the job that calls back to WLM first gets the chance to run.
For example, the job run ratio 5:2:1 means 5 high priority jobs, 2 medium priority jobs,
and 1 low priority job. If there are 2 high priority queues, 3 medium priority queues, and
1 low priority queue, the 5 high priority jobs can come from one or both high priority
Elapsed Time
The priority of a job is derived from the elapsed time since the job was submitted to the
queue. This rule gives the highest priority to the job that was submitted first, irrespective
of the queue it was submitted to.
You can select ElapsedTime as the priority rule and assign the same default priority to all
the queues in the system. A simple way to allocate resources is to evenly apply JobCount
to all queues. You can determine JobCount per queue first then multiply it by the number
of queues to determine JobCount for the entire system. This approach makes it easy to
achieve fairness, but resources claimed by empty queues cannot be re-allocated even if
other queues have pending jobs.
The ElapsedTime rule is based on the queued time only regardless of the priority of a
queue. If you want to take priorities into consideration, you need to select
PriorityWeight.
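The ElapsedTime rule and the even JobCount allocation can be sketched as follows. This is an illustrative model only; the function names and the use of plain numbers for submission times are assumptions.

```python
def elapsed_time_order(queues):
    """queues: dict of queue name -> list of (job name, submission time).

    Returns job names ordered by submission time, regardless of queue.
    """
    pending = [(submitted, job)
               for jobs in queues.values()
               for job, submitted in jobs]
    return [job for _, job in sorted(pending)]


def system_job_count(job_count_per_queue, number_of_queues):
    """System-wide JobCount when every queue gets the same JobCount."""
    return job_count_per_queue * number_of_queues


queues = {"Q1": [("a", 0), ("c", 20)], "Q2": [("b", 10)]}
print(elapsed_time_order(queues))
print(system_job_count(5, 4))
```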
With the DataStage Administrator role, the user can perform the following operations:
• Create, edit, or delete any queues except system reserved queues for IA, ISD
and DataClick
• See jobs in the queue if the jobs are in projects that the current user has
access rights to; otherwise, the current user sees only the total number of jobs
in the queue, not specific job details
• Cannot move jobs to different queues or promote jobs to the top of the queue.
Advanced Configurations
C:\IBM\InformationServer\Server\DSODB\DSODBConfig.cfg
# The following allows a job to run outside of WLM if communication between the DataStage runtime and WLM failed.
# A setting of 0 will stop the job if communication with the WLM failed.
# A setting of 1 will not send the job to the WLM. It will run immediately.
WLM_CONTINUE_ON_COMMS_ERROR=0
# The following sends a job to the default queue if the queue specified is no longer valid.
# The following specifies the time a job will wait on the pending queue.
# If this time has been exceeded, the job will be stopped and removed from the queue.
WLM_QUEUE_WAIT_TIMEOUT=0
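The effect of WLM_CONTINUE_ON_COMMS_ERROR, as described in the comments above, can be sketched like this. The helper names are hypothetical, the parser assumes simple KEY=VALUE lines with "#" comments, and treating an absent setting like 0 is an assumption made for the sketch.

```python
def parse_settings(config_text):
    """Collect KEY=VALUE pairs, skipping blank lines and '#' comments."""
    settings = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def on_comms_error(settings):
    """Describe what happens to a job when the runtime cannot reach WLM."""
    # Assumption: an absent setting behaves like 0, the value shown above.
    if settings.get("WLM_CONTINUE_ON_COMMS_ERROR", "0") == "1":
        return "run immediately, outside WLM"
    return "stop the job"


text = "# sample\nWLM_CONTINUE_ON_COMMS_ERROR=0\nWLM_QUEUE_WAIT_TIMEOUT=0\n"
print(on_comms_error(parse_settings(text)))
```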
uvconfig Tuning
When jobs are in a running state or queuing state, they consume DSEngine lock resources.
Specifically, the DSD_RUN process acquires the following locks:
• RT_CONFIG lock to prevent others from compiling or deleting the current job
The default configuration in uvconfig can support up to 20 concurrently running jobs and
150 queued jobs. If you need to run or queue more jobs, you may need to adjust
uvconfig parameters.
The following table lists some of the tested configurations from an internal performance
study:
Troubleshooting
WLM generates a trace file in the /opt/IBM/InformationServer/Server/DSWLM/logs
folder. A new log file is started each day, and when a log file exceeds 100MB, WLM
switches to the next log file automatically.
Best Practice
WLM provides flexibility for you to add and update queues in terms of priority and
resource constraints in addition to system-level resource configuration capability.
However, flexibility can lead to complexity if it is not clear how priority rules and
resource policies can work together to better manage workloads.
This section describes some use cases that can help demonstrate WLM queue
management functionality. It starts with defining queues with the same priority but
different criteria, then moves on to cover various scenarios for different priority rules.
Job Characteristics
In this scenario, queues are defined by a job characteristic, which can be either a design
characteristic or a performance characteristic.
Queues
• IA job queue
• QS job queue
Parallel Configurations
You can balance system resources by defining queues based on parallel configuration.
• Large parallel job queue where jobs run on more than 4 nodes
JobCount is not distributed evenly; instead, more of it is allocated to 1-node jobs than to
4-node jobs, because a 1-node job creates fewer processes. An example JobCount
allocation could be: total JobCount is 10, with 5 for 1-node jobs, 3 for 2-node jobs,
and 2 for 4-node jobs.
Development/Testing/Production Environments
If a system is shared by development, testing, and production, you may want to create
queues based on this sharing characteristic.
Similar to scenario 9.1.4, total JobCount is 20 with 10 for production, 5 for testing, and 5
for development.
CPU and memory caps are also designed for systems where InfoSphere Information
Server needs to share physical resources with other application software. You can set a
limit on how much CPU (or memory) Information Server can utilize and make sure other
software applications also get enough physical resources to run on the same system.
Use Cases
This section describes some common ways you might want to restrict job runs on a
system, and how you could achieve that with the WLM controls currently defined. The
examples go from simple to more complex.
Scenario 9.4.3 attempts to cover why you would set up more than one queue at the
same priority. Scenario 9.4.3b shows when to use JobRunRatio, which does not take
queued time into consideration and does not emphasize the order of job submission.
9.4.1b. You don't want to overstress the system because you know that when it runs at
near capacity things start to fail.
- Set system CPU limit and/or a memory limit and/or a "no more than N jobs in X
seconds" policy.
You only want 20 jobs to run in the whole system, but some are
higher priority than others, and need to go first when there is a
spare slot.
o Set up 2 queues: High and Medium.
o Each queue has a job count max = 20
o Ensure System job count also = 20.
o Queue priority rule: leave the default, i.e. Priority Weight, which means that
anything that has to wait on the High queue will get to run before
anything on the Medium queue.
9.4.2b. You want to ensure that at least 2 high priority jobs can run at all times.
As above, but:
This means that even if there are lots of medium jobs, they will leave at least 2 system
slots for high jobs if they appear. On the other hand, up to 20 high jobs could run, if there
were that many, and medium jobs would have to wait.
9.4.2c. You don't want the Medium queue to be completely locked out when a lot of
high jobs get submitted
Add:
- Queue priority rule: use ratio of High:Medium = 5:1, which means a Medium job will
get a go every 5 High jobs, rather than get blocked.
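The effect of a 5:1 ratio on start order can be sketched as a weighted round-robin over the queues. This models only the order in which waiting jobs get a turn, not concurrency limits, and it is not WLM's actual algorithm; the function name and data shapes are assumptions.

```python
from collections import deque


def ratio_dispatch_order(queues, ratio):
    """queues: dict of priority -> job names in submission order.
    ratio: dict of priority -> turns per round, visited in dict order.

    Each round gives every priority up to its number of turns, so a
    lower priority queue is never locked out while it has a non-zero share.
    """
    pending = {prio: deque(jobs) for prio, jobs in queues.items()}
    order = []
    while True:
        before = len(order)
        for prio, turns in ratio.items():
            for _ in range(turns):
                if not pending.get(prio):
                    break  # nothing waiting at this priority
                order.append(pending[prio].popleft())
        if len(order) == before:
            break  # a full round dispatched nothing, so stop
    return order


queues = {
    "high": ["H1", "H2", "H3", "H4", "H5", "H6", "H7"],
    "medium": ["M1", "M2"],
}
print(ratio_dispatch_order(queues, {"high": 5, "medium": 1}))
```

With seven High jobs and two Medium jobs waiting, the first Medium job gets its turn after five High jobs rather than after all seven.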
9.4.3b If there is spare capacity on the system, a group can use it; but if another group
submits a job, they will get a slot as soon as possible - they will not be locked out just
because the first group got in early. And, each group should have equal access to
available slots,
That means that up to 20 jobs can be running from any queue. However when something
turns up on another queue, that other queue can get in rather than having to wait for all
older jobs to drain first. If you submit 400 jobs to Q1 at the start of the day, you will get to
run 20 jobs at a time; if you submit a job to Q2 later on, you don’t have to wait until 381
jobs off Q1 have run before Q2 gets a look in.
9.4.3c. In conjunction with the above, you need a High priority system so that some
jobs get preference over anything else that may be running.
Having a higher priority queue around means it will take precedence when there is a free
slot at system level. If the system limit has been reached, the High job will have to wait;
but as soon as a job finishes, the queued High job will start next.
Contributors
Yong Li
Development Manager – InfoSphere High
Performance Engine
Xiaoyan Pu
Software Development Engineer
Hanson Lieu
Software Development Engineer
Ron Liu
InfoSphere Information Server Performance
Len Greenwood
Software Architect
Ashley Holland
DataStage Software Developer
IBM may not offer the products, services, or features discussed in this document in other
countries. Consult your local IBM representative for information on the products and services
currently available in your area. Any reference to an IBM product, program, or service is not
intended to state or imply that only that IBM product, program, or service may be used. Any
functionally equivalent product, program, or service that does not infringe any IBM
intellectual property right may be used instead. However, it is the user's responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in
this document. The furnishing of this document does not grant you any license to these
patents. You can send license inquiries, in writing, to:
The following paragraph does not apply to the United Kingdom or any other country where
such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES
CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER
EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-
INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do
not allow disclaimer of express or implied warranties in certain transactions, therefore, this
statement may not apply to you.
This document and the information contained herein may be used solely in connection with
the IBM products discussed in this document.
This information could include technical inaccuracies or typographical errors. Changes are
periodically made to the information herein; these changes will be incorporated in new
editions of the publication. IBM may make improvements and/or changes in the product(s)
and/or the program(s) described in this publication at any time without notice.
Any references in this information to non-IBM websites are provided for convenience only
and do not in any manner serve as an endorsement of those websites. The materials at those
websites are not part of the materials for this IBM product and use of those websites is at your
own risk.
IBM may use or distribute any of the information you supply in any way it believes
appropriate without incurring any obligation to you.
All statements regarding IBM's future direction or intent are subject to change or withdrawal
without notice, and represent goals and objectives only.
This information contains examples of data and reports used in daily business operations. To
illustrate them as completely as possible, the examples include the names of individuals,
companies, brands, and products. All of these names are fictitious and any similarity to the
names and addresses used by an actual business enterprise is entirely coincidental.
COPYRIGHT LICENSE: © Copyright IBM Corporation 2012, 2013. All Rights Reserved.
This information contains sample application programs in source language, which illustrate
programming techniques on various operating platforms. You may copy, modify, and
distribute these sample programs in any form without payment to IBM, for the purposes of
developing, using, marketing or distributing application programs conforming to the
application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions.
IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these
programs.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International
Business Machines Corporation in the United States, other countries, or both. If these and
other IBM trademarked terms are marked on their first occurrence in this information with a
trademark symbol (® or ™), these symbols indicate U.S. registered or common law
trademarks owned by IBM at the time this information was published. Such trademarks may
also be registered or common law trademarks in other countries. A current list of IBM
trademarks is available on the Web at “Copyright and trademark information” at
www.ibm.com/legal/copytrade.shtml
UNIX is a registered trademark of The Open Group in the United States and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
Contacting IBM
To contact IBM in your country or region, check the IBM Directory of Worldwide
Contacts at http://www.ibm.com/planetwide