Better Analysis With Performance Data Investigator QUSR 2019 4 Part 1

Better Analysis with Performance Data Investigator: Part 1
Lora Powell
Advisory Software Engineer
lrpowell@us.ibm.com
2019
© 2018 IBM Corporation

Agenda
• Tips for Navigator and PDI
• General Health Indicators
• Analysis of Performance Problems

– Scenario #1
– Wait Accounting
– Scenario #2 - Job Watcher
– Demo with Database perspectives

Browser Support
• Supported Browsers for the latest Navigator enhancements, latest
version of:
– Mozilla Firefox
– Google Chrome
– Apple Safari
– Microsoft Edge (new)
– Note: Internet Explorer is no longer supported

– Unsupported browser warning
• Update your browser

Browser Tips
• Clear your browser cache after installing PTFs
– Then close and restart browser
– Or, always run in Incognito/Private mode!
– Chrome: Incognito mode; Firefox: Private mode
• Review your browser security settings
– Allow pop-ups
• For details see the following web page:
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/IBM%20i%20Technology%20Updates/page/Browser%20tips
• Close unneeded tabs in your Navigator session
– Tasks in tabs consume resources and may cause performance degradation
• Avoid using F5 to refresh a panel; instead, use the Refresh button found on Navigator panels
• Unexpected results could be browser-related. Example problems include:

• Hung charts
• Empty tables

Tips for Best Performance of Navigator
Note: Navigator will not run fast on a system that is already slow!
• Ensure there are no bad DNS entries on the system
– http://www-01.ibm.com/support/docview.wss?uid=nas8N1010614
• Use Application Runtime Expert to validate your environment
http://www.ibm.com/developerworks/ibmi/library/i-applicationruntime/index.html
– Network health checker (simple-to-use template for ARE, no charge to run) from QShell:
• /QIBM/ProdData/OS/OSGi/templates/bin/areVerify.sh -network
http://ibmsystemsmag.blogs.com/i_can/2013/09/application-runtime-expert-network-health-checker.html
• Close the Dashboard tab if you do not need it; it consumes system resources because it periodically pings the system status
• Managing System Performance on IBM i
• Use the Web Performance Advisor to validate your Web performance
http://pic.dhe.ibm.com/infocenter/iseries/v7r1m0/topic/rzaie/rzaieconwebperfadvisor.htm
• Ensure the QMAXACTLVL system value is set to *NOMAX
This value is the number of jobs/threads that can simultaneously compete for memory and CPU.
Navigator Search
Search for Navigator tasks by things you know

You can find tasks without having to know how to navigate to them
Navigator - Favorites
Throughout Navigator, save favorites to quickly get to the function you want
– Including favorite Performance Data Investigator perspectives
Collection Services
• https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/IBM%20i%20Technology%20Updates/page/Performance%20Data%20Collectors

Performance Instrumentation and Data Collection
The Advantage
• IBM develops the software stack, top to bottom
– Instruments the software with component-specific performance metrics
• IBM develops the performance data collectors that harvest those performance metrics
• IBM i has an integrated database – Db2
– This is a BIG DEAL
– Performance data is stored in the database automatically
• No “add on” application is necessary – it’s all in the operating system
• IBM provides the graphical analysis tools
– Analysis of the performance data in the Db2 files using SQL
IBM i has the best performance instrumentation and data collection capabilities in the industry!
Collection Services
• Designed to be Always On – with minimal overhead
– If something goes wrong, you have data that will help analyze the
problem, fix it, and prevent it from happening in the future
– If you can’t solve the problem, you have information that makes it easier
for IBM Support to solve the problem faster
– This provides a reliable baseline: understand the impact that a software, network, or environmental change had on the performance of your system
– It also provides historical information, enabling planning for future growth based on real trends, not guesses
What is Collection Services?
IBM i function that collects performance data at a system level AND at a job/thread/task level
Collects data from many system resources:
• Jobs
• Disk
• Buses
• Memory pools
• Communication lines
• …many others
Collection Services data is used by:
• Performance Data Investigator
• System Monitors
• Performance Tools for i
• PM for Power Systems
• iDoctor
Collection Services
• IBM recommends you always run Collection Services
• Collect data at regular intervals from 15 seconds to 1 hour

– Default is 15 minutes - suitable for trending/capacity planning
– Consider 5 minute intervals for problem determination and in-depth
performance analysis
• Data is initially stored in a management collection object

– Can hold large quantities of performance data with minimal overhead
• Includes valuable Wait Accounting information (more on this later…)
• Performance data is transferred into database files for each collection, upon request or automatically.
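The interval you choose determines how many data points a day of collection produces, which matters for both analysis detail and storage. A minimal Python sketch of that arithmetic (the function name is hypothetical, and the simple range check is an assumption: the collection configuration may accept only specific values within this range):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def intervals_per_day(interval_seconds: int) -> int:
    """Return how many collection intervals fit in one 24-hour day."""
    # Range check mirrors the slide: intervals from 15 seconds to 1 hour.
    # Assumption: the real configuration accepts only certain values in this range.
    if not 15 <= interval_seconds <= 3600:
        raise ValueError("interval must be between 15 and 3600 seconds")
    return SECONDS_PER_DAY // interval_seconds


for label, seconds in [("15 seconds", 15), ("5 minutes", 300),
                       ("15 minutes (default)", 900), ("1 hour", 3600)]:
    print(f"{label:>20}: {intervals_per_day(seconds):>5} intervals per day")
```

Moving from the 15-minute default (96 intervals a day) to 5-minute intervals triples the number of data points, which is why shorter intervals suit problem determination more than long-term trending.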

Configuring Collection Services
Check this box if you plan to use:
• Performance Data Investigator
• Performance Tools
• System Monitors
They all require data in the database files. The default is checked; leave it checked.
For Graph History data collection: Collection Services can collect monitor data without starting a system monitor.

Rebuild Collections Table
• Why do I need to rebuild the table?
– If you restore performance data without using the Restore Performance Collection interface (or the SAVPFRCOL and RSTPFRCOL commands), collections won’t display in the Manage Collections view.
• Use the “Rebuild Collection Table” action
– It rebuilds the metadata used by the Manage Collections task, and then your performance data will be visible.
Ever wonder why PDI doesn’t list your collection? This may be the reason!
Performance Data Investigator - PDI
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/IBM%20i%20Technology%20Updates/page/Performance%20on%20the%20web

Packaging
PDI packages are listed under Investigate Data in Navigator and come from:
• The base operating system (included)
• IBM Performance Tools – Job Watcher feature
• IBM Performance Tools – Manager feature
(Figure: Investigate Data package lists on 7.2 and 7.3)
Authority - Authorizing Users to PDI
• Users need to be authorized to use the investigate data and collection
manager performance tasks
• Include users on the QPMCCDATA and QPMCCFCN authorization lists
Edit Authorization List
Object . . . . . . . :   QPMCCDATA      Owner . . . . . . . :   QSYS
  Library  . . . . . :     QSYS         Primary group . . . :   *NONE
Type changes to current authorities, press Enter.
                 Object      List
User             Authority   Mgt
*PUBLIC          *EXCLUDE
QSYS             *ALL         X
PDI01            *USE
PDI02            *USE
PDI03            *USE
PDI04            *USE
PDI05            *USE
PDI06            *USE
PDI07            *USE
PDI08            *USE
PDI09            *USE
                                                          More...
Use PDI from a system other than where the data was collected
Store data centrally if you have multiple physical or logical partitions
• Easier to analyze and back up
• Resource-intensive analysis won’t impact production partitions
Two variations:
1. Go to the data: Use Set Target
- Use PDI on one system while analyzing data on another system
- Use PDI on one release to view data from another release
2. Bring the data to PDI: Transfer the collections
- Save and restore the right way
- From a different release level
- Know whether you should convert or not
Back up key performance data as you would business data
Viewing Data on another System
• Leave the data where it is!
– Use Set Target System to view data on one partition while running Navigator from another
1. Log in to IBM Navigator from your updated development partition
2. Set Target to your production system
• Look at your production system performance
Even view data on other releases!
Set Target System
(Figure: the Navigator system and the target system)
• The HTTP Server runs on the system you initially log into.
• You can manage a second system:
– No web server is required on the second system
– The Host Servers are used
Transferring Collections to another partition
• Move the data to your Navigator system to analyze it with PDI
• Save the *CSFILE object
– Use *CSMGTCOL for sending it in to IBM or to save with less space
• Restore from Navigator on the other partition (RSTPFRCOL)
– This works fine whether or not the customer saved it using SAVPFRCOL.
Now just refresh to view it in the table.
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/IBM%20i%20Technology%20Updates/page/Saving%20and%20Restoring%20from%20previous%20release
Collection File Level
• Collection Services data has different File Level for every release.
• Converting allows PDI to view the data without knowing about differences in
data from that (previous) release.
CVTPFRCOL
• After you convert a back-level collection to the current release, the file level will be
changed to match that for the release level.
Starting back in Fall 2016, PDI handles all Collection Services collections at whatever file level they are set to, on any release.
So we no longer need to convert CS collections. You may still need to convert Job Watcher, Disk Watcher, or PEX files.

To Convert or NOT Convert?
It Depends on the Collection Type
• If you CONVERT a file, then PDI looks at it as if it was created for a system
at that file level. For Collection Services, this then *loses* some of the
significant data for the system it was collected on.
– Wait bucket data, for example Bucket 20:
• 6.1: Classic JVM
• 7.1: not used, so set to Reserved
• 7.2+: reused for Journal Save While Active time
• When you look at this data, you want to view it with the right bucket
labels!
• PML Single Source provides forward-compatibility in viewing your CS data.
PDI will show release specific information for your collection no matter what
future release you choose to view your collection from.
Don’t convert Collection Services Collections

Viewing Prior Release Performance data
For Collection Services data
• View as a collection for the release it was generated on

– Use the Create Performance Data (CRTPFRDTA) command on the partition it was collected on to get the data into database files
» Uses the current release format
– SAVPFRCOL, transfer, and RSTPFRCOL on the desired partition
Note: the library into which the performance data is restored needs to contain only collections from that same release, so that it is in the same release format.
If created on 5.4, it will need to be converted.
There is no PDI package support for 5.4 collections.
Viewing Prior Release Performance data
• For Job Watcher, Disk Watcher, or Performance Explorer
collections
• Convert the performance data to the current release format via the GUI
– Save the performance collection - SAVPFRCOL
– FTP the save file to the desired partition
– Restore the collection via the Collection Manager - RSTPFRCOL
– Convert the collection to the current release format (CVTPFRCOL), which converts the prior-release database files to the current release
Cons of converting CS data files: we could no longer show you the data specific to the release it was collected on. Conversion could lose information.
PML Single Source
• No more maintaining PML for each release! All releases are handled
within the same PML file
– Release to release changes include:
• New fields, additional data (Database SQL, Memory data, etc)
• Wait Bucket name changes
– A 6.1 collection brought to 7.3 to view will correctly show you the
wait buckets for 6.1.
– Collections are forward-compatible with regard to PDI charting
– Release specific information for your collection no matter what release
you view the collection.
– The goal is that PDI will show you the most updated definition of
the chart possible for your given collection level on any release.
– Keep the collections at the same file level as they were generated on.
– Easier for us to maintain!

Release Specific Metrics
Database Perspectives
The perspectives available in the Database package and the views vary by release.
PDI will show you the most updated definition of the chart possible for your given collection and its content.
• The Database Health Indicators perspective is only available on 7.2 and later
– QAPMSQLPC is only available on 7.2 and later
Collection and Perspective Compatibility
There is variation in the data available in the regular Q* Collection Services collections and the System Monitor collections, which are designated with a name of R*.
System Monitor collections don’t have all the same data as a Collection Services collection, so the perspectives available may be different.
• View List shows what files are required
• Monitor files don’t include QAPMSQLPC data
• The Database Health Indicators perspective requires QAPMSQLPC, so it will not be available for the R* collection
PDI Search
• Search all the metrics displayed
• Search the entire SQL for all perspectives
Launch from the search results to that perspective to select your collection
Selectable Sort Perspectives
• Drilldown actions with integrated sort based on selected metric
• Select an interesting metric on the graph and then use Actions to select a
drilldown chart.
If you select a metric before drilling down into one of these perspectives, the
resulting chart will be sorted based on that selected metric.
If you do not select a metric before doing the drill down action, the chart will
be sorted based on a default metric.
The default is also used when the perspective is chosen from the initial Investigate Data Perspectives screen, where no selection is available.
Best-kept PDI secret! Great for drilling down into Waits (more on this later).
Selectable sort is only in the Collection Services package
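The selectable-sort rule (sort by the selected metric if there is one, otherwise by a default) can be sketched in a few lines of Python. The job rows are hypothetical, and treating "Dispatched CPU Time" as the default metric is an assumption based on the chart title shown next, not a documented PDI rule:

```python
from typing import Optional

# Hypothetical per-job wait metrics for one interval (seconds)
jobs = [
    {"job": "JOBA", "Dispatched CPU Time": 4.0, "Disk Page Faults Time": 610.0},
    {"job": "JOBB", "Dispatched CPU Time": 35.0, "Disk Page Faults Time": 12.0},
    {"job": "JOBC", "Dispatched CPU Time": 9.0, "Disk Page Faults Time": 240.0},
]

# Assumption: the default sort metric, used when nothing was selected
DEFAULT_METRIC = "Dispatched CPU Time"


def drilldown_order(rows, selected_metric: Optional[str] = None):
    """Sort rows in descending order of the selected metric, or of the default."""
    metric = selected_metric or DEFAULT_METRIC
    return sorted(rows, key=lambda row: row[metric], reverse=True)


print([r["job"] for r in drilldown_order(jobs)])                           # ['JOBB', 'JOBC', 'JOBA']
print([r["job"] for r in drilldown_order(jobs, "Disk Page Faults Time")])  # ['JOBA', 'JOBC', 'JOBB']
```

Selecting Disk Page Faults Time first moves JOBA, the job with the most fault time, to the top; without a selection it would sit at the bottom.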

Drilling Down – Waits by Job or Task
Sorted by Dispatch CPU Time

PDI - General Health Indicators
Start here to: Check the overall health of your system for today or the past few days
+ Summary of key metrics in one view
+ Look at a full day of multiple metrics
+ Drill down from here to a breakdown over time
Summarizes the full collection as the percent of intervals that exceed set thresholds
Go next to: Drill down to other PDI health indicator charts, or go to the Collection Services package to look at the breakdown over time
+ more metrics
+ interval data
+ more in-depth information: metrics by job, thread or task; wait bucket data
This feature comes with the base IBM i Operating System
General Health Indicators
Quick overview of key performance metrics
Database Health Indicators were introduced in 7.2

Health Indicators
System Resources Health Indicators
• Summarizes key metrics from 5 categories
Drill into areas that exceed thresholds

Database Health Indicators
This chart shows Database health indicators by
analyzing all collection time intervals according to the
defined thresholds for database. Use this chart to
determine the proportion of intervals where
Database health indicators exceeded the defined
thresholds.
Drilldowns
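The health indicator charts report the proportion of collection intervals whose value crossed a threshold. A minimal Python sketch of that calculation, assuming a two-level threshold (a warning level and a higher action level); the function name, the sample data, and the threshold values are hypothetical, and PDI's own threshold names and defaults may differ:

```python
def percent_exceeding(values, warning, action):
    """Return (% of intervals in the warning band, % at or above the action level)."""
    if not values:
        return 0.0, 0.0
    n = len(values)
    at_action = sum(1 for v in values if v >= action)
    in_warning = sum(1 for v in values if warning <= v < action)
    return 100.0 * in_warning / n, 100.0 * at_action / n


# Hypothetical CPU utilization (%) for eight collection intervals
cpu_by_interval = [45, 72, 88, 91, 60, 95, 30, 81]
warn_pct, action_pct = percent_exceeding(cpu_by_interval, warning=80, action=90)
print(f"warning: {warn_pct:.1f}% of intervals, action: {action_pct:.1f}% of intervals")
```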
Health Indicators
Customize Health Indicator Thresholds

Performance – When should I be concerned?
• Users complain
• Change in system performance

– Less throughput
– Longer batch runtimes
• When compared to baseline

– A change in amount of wait time
– A new type of wait time appears
• An unexpected drop in CPU utilization

Performance Analysis - Where do I start?
• Start by asking questions:
– What was the symptom of the problem?
– Who reported the problem?
– What time did it occur?
– How long did it last?
– Have there been any recent changes?

• New or changed workload?
• Any application changes?
• Any recent hardware configuration changes?
– What was the scope?

• Did it impact the entire system?
• Did it impact some subset of work?
– Specific users?
– Specific applications?
Performance Data Investigator (PDI)
Start here to: Graphically view and analyze metrics collected by IBM i collector tools
• Integrated and free
• Easy to use
• Simplified analysis
Go next to: Start a Job Watcher collection and view it with the PDI Job Watcher package or iDoctor Job Watcher.
This feature comes with the base IBM i Operating System

Basic Performance Analysis
• Become familiar with your Collection Services performance data & tools to
understand your performance characteristics
– Be proactive!
• Have a knowledge of your baseline performance data
• When a performance problem occurs, you often need to use performance analysis tools to identify the cause of the problem so you can correct it
Starting Point
Start with CPU Utilization and Waits Overview
• Shows CPU Utilization (red line)
• Shows Wait Information (stacked bars)
• Green bars are disk time
– Identify whether disk time went up when the CPU utilization dropped
• Do full zoom out first to find time span of interest
• Then zoom in and/or drill down for further analysis
– Type of disk operations
– Contributing jobs
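Spotting "CPU went down while disk time went up" is a simple comparison between consecutive intervals. The Python sketch below applies that rule to hypothetical interval data; the 20-point drop threshold and all names are illustrative assumptions, not PDI settings:

```python
# Hypothetical interval data: (interval number, CPU utilization %, disk wait seconds)
intervals = [
    (9, 78.0, 1200.0),
    (10, 75.0, 1350.0),
    (11, 31.0, 5400.0),
    (12, 70.0, 1500.0),
]


def cpu_drops_with_disk_rise(rows, min_cpu_drop=20.0):
    """Return interval numbers where CPU fell by at least min_cpu_drop
    percentage points while disk wait time rose versus the previous interval."""
    hits = []
    for (_, prev_cpu, prev_disk), (num, cpu, disk) in zip(rows, rows[1:]):
        if prev_cpu - cpu >= min_cpu_drop and disk > prev_disk:
            hits.append(num)
    return hits


print(cpu_drops_with_disk_rise(intervals))  # [11]
```

The flagged interval is the one to zoom in on and drill down from, as in the steps above.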
Understanding the basics
Each bar shows accumulated times from all active jobs on the system during that IBM i Collection Services sampling interval
Each bar is one time interval of collected data
You can interact with the graph: select the tooltips icon, then hover over a bar or line to see more details of that component
Use the data on one chart to modify the next chart, for example: date and time filtering, or sorting the next chart by a selected metric
Problem Analysis Scenario using PDI

CPU Utilization and Waits Overview
CPU Utilization and Waits Overview is an excellent starting place. Look for interesting points.
Next steps will depend upon the answer to the prior questions, along with what you see.

Example Analysis - Start
• Use tooltips to get specific data points
• Look for peaks or drops in CPU consumption
– Drop in CPU consumption in interval 11
– OS contention time: also note the operating system contention time just before that, in interval 10
– Select the low point
• Drill down for that interval to Waits Overview
You can select one point or a range by selecting the first and last intervals
Waits Overview – one interval
Most of the wait time is disk page faulting
By clicking in the Disk Page Faults Time bar before going to Waits by
Job or Task, that chart will be sorted by Disk Page Faults time
Waits by Job or Task
• By clicking on Disk Page Faults Time before going to Waits by Job or
Task, this chart is sorted by Disk Page Faults time
Click on the first job and select All Waits by Thread or Task
Hover over to see tooltips on the page faults time for the job
All Waits by Thread or Task
– See multiple threads spending nearly all their time waiting on page faults
Zoom Region
Interval size is 900 seconds

All Waits by Thread or Task - Zoom
• Zoom in to see if there is other data on the leftmost side of the chart
– Less than 1 second per interval is spent in CPU time
Waits Overview
What else can we find out?

Waits by server type - What server
Waits by job current user profile - Who is the client
Waits by Server Type
Waits by Job Current User Profile
- Go back to Waits Overview, then
- Select the Waits by Job Current User Profile chart
- You can see the user that this server is doing work for
Back “Home” – CPU Utilization & Waits
Next go to Contention Waits Overview for the same interval

Contention Waits Overview
Drill down into Waits by Job or Task to see
if we can figure out what jobs are
contributing to this contention
Machine Level Gate Serialization
What jobs are contributing to this contention?
Need to look at Job Watcher data to find out more:
• Holders
• Call stacks
Job Watcher - Object Lock Contention
You should see a single bar. Click on it and then drill down into “Interval Details for One Thread or Task”.
Note the following:
- Object waited on
- Holding job or task
- Call stack of the thread requesting the lock
Object Lock Contention
• The QSCLICEV job wanted to lock the WATCHEVENTSPACE, but
was unable to do so.
• Job QZRCSRVS/QUSER/014097 held the lock.
• It turns out this particular example was due to a code defect

– the lock was not released by the QZRCSRVS job.
– The fact that this job held the lock was sufficient information for
the developer to identify and correct the defect.
Collection Services vs Job Watcher
§ Collection Services and Job Watcher both collect wait
information
– Graphically view the data that show waits
– Collection Services runs by default
– Job Watcher data is generally collected when additional
information is necessary to analyze a problem
§ Job Watcher can also collect call stacks and SQL statements
– Provides additional information for detailed analysis
– More frequent intervals
– More detailed wait information
– Objects being waited on
– Holder of object
– Call stacks
Wait Accounting
Wait Accounting helps to determine if a wait condition is a problem

Why are we waiting?
Figure out why a job is waiting
Is it valid or can we shorten it? -> improve our run time
Performance Fact:
“All computers wait at the same speed”

What is Wait Accounting?
Wait Accounting = the ability to determine what a job is doing when it is
waiting (not “running”)
– i Exclusive!! Patented IBM i technology built into IBM i
But what is it waiting for? Waits may be normal, some waits are not normal
Wait Accounting
• Wait Accounting is used to understand what is happening when a job is not running.
– Wait information is tracked for each job, thread and task on system
• A job spends its time in one of three states

1. CPU
• Time spent dispatched to the processor, active/running
2. CPU queuing
• Ready to be processed, waiting for a processor to become available
3. Wait
• Waiting for something or someone, blocked or idle
Using Run-Wait Analysis to Improve your Job Performance

Basics of Waiting
• Two basic types of waits
– Idle: waiting for a work request
• Typically not indicative of a problem
Example: waiting for the “Enter” key to be pressed on a 5250 display session
• If a problem, usually external to the machine
Example: slow arrival of work requests due to a communications problem
• Possible, but not typical, in batch jobs
Example: waiting for an entry to be placed on a data queue
– Blocked: waits that occur while performing a work request
Blocked waits are the ones we want to take a closer look at
• Outside of CPU usage and CPU queuing time, blocked waits are the reason jobs/threads take as long as they do to complete their work
http://ibmsystemsmag.blogs.com/i_can/2009/11/i-can-tell-you-why-youre-waiting.html
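Separating idle waits from blocked waits is essentially a classification over wait buckets. A Python sketch with hypothetical bucket times; which buckets count as idle is an assumption here (only the two bucket names that say "idle" outright), since, as noted above, a wait such as a data queue receive may be idle or not depending on the application:

```python
# Hypothetical bucket times (seconds) for one job over one interval
bucket_seconds = {
    "Time dispatched on a CPU": 120.0,
    "CPU queuing": 15.0,
    "Disk page faults": 240.0,
    "Database record lock contention": 60.0,
    "Idle / waiting for work": 400.0,
}

# Assumption: only these buckets are treated as idle waits
IDLE_BUCKETS = {"Idle / waiting for work", "Socket accepts (idle)"}
# Not waits at all: running, or ready and queued for a processor
NON_WAIT_BUCKETS = {"Time dispatched on a CPU", "CPU queuing"}


def split_waits(buckets):
    """Return (idle wait seconds, blocked wait seconds)."""
    idle = sum(t for name, t in buckets.items() if name in IDLE_BUCKETS)
    blocked = sum(t for name, t in buckets.items()
                  if name not in IDLE_BUCKETS and name not in NON_WAIT_BUCKETS)
    return idle, blocked


idle, blocked = split_waits(bucket_seconds)
print(f"idle={idle} s, blocked={blocked} s")  # idle=400.0 s, blocked=300.0 s
```

The 300 seconds of blocked time (page faults plus record locks) is the part worth investigating; the 400 idle seconds most likely are not.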
Run/Wait Signature
Typical batch job run/wait signature: CPU | CPU queue | Wait (together spanning the elapsed time)
Interactive job run/wait signature: Idle | CPU | CPU queue | Wait (together spanning the elapsed time)
Basic example:
Batch job with total run time of 6 hours
Run/wait signature: CPU 120 min | CPU queue 70 min | Wait 170 min
Elapsed time: 6 hours (360 min)
Potential to run in 3 hrs 10 min if wait time could be eliminated
Wait analysis and reduction can be a powerful and cost-effective way to improve response time and throughput
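The arithmetic behind the basic example: elapsed time is the sum of CPU, CPU queue, and wait time, and removing all wait time leaves only CPU plus queuing. A small Python check (the function name is illustrative only):

```python
def run_wait_signature(cpu_min: int, queue_min: int, wait_min: int):
    """Return (elapsed minutes, elapsed minutes if all wait time were eliminated)."""
    elapsed = cpu_min + queue_min + wait_min
    without_waits = cpu_min + queue_min
    return elapsed, without_waits


elapsed, best_case = run_wait_signature(cpu_min=120, queue_min=70, wait_min=170)
hours, minutes = divmod(best_case, 60)
print(f"elapsed: {elapsed} min; without waits: {hours} h {minutes} min")
# elapsed: 360 min; without waits: 3 h 10 min
```

In practice some waits cannot be removed, so 3 hours 10 minutes is a lower bound, not a target.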
Detailing wait time
§ Determine the components of time spent waiting
Elapsed time: CPU | CPU queue | Wait
Wait: Disk reads | Disk writes | Record locks | Journal
Detailing wait time: metrics related to components of wait time

                     Disk reads   Disk writes   Record locks   Journal
Total count          3,523        17,772        355            5,741
Total time           42 sec       73 sec        45 sec         44 sec
Avg time per wait    0.012 sec    0.004 sec     0.126 sec      0.007 sec

This raises questions to ask:

● How many of the reads are page faults? Could memory/pool changes help?
● What programs are causing reads? Could they be reduced or made async?
● What programs are causing writes? Could they be reduced or made async?
● What Db2 files are involved with the record locks?
● What files are being journaled? Are journals needed and optimally configured?
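The average time per wait in the table is the total time divided by the count for each component. A Python sketch using the table's counts and totals (the function name is illustrative):

```python
# Counts and total times (seconds) from the table above
waits = {
    "Disk reads": (3523, 42.0),
    "Disk writes": (17772, 73.0),
    "Record locks": (355, 45.0),
    "Journal": (5741, 44.0),
}


def average_wait(count: int, total_seconds: float) -> float:
    """Average seconds per wait; 0.0 when there were no waits."""
    return total_seconds / count if count else 0.0


averages = {name: average_wait(c, t) for name, (c, t) in waits.items()}
for name, avg in averages.items():
    print(f"{name:<13} {avg:.4f} sec per wait")
```

The computed values agree with the table to within about a millisecond. Record locks have by far the fewest waits but the longest average wait, which is why they deserve a look even though their total time is similar to the others.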

Expected vs. Unexpected Waits
• Some waits are “expected” and others “unexpected”

– A record lock may be expected, but one that lasts for a very long
duration is unexpected
– Regardless of the type of wait, it is always better if wait time can be minimized or eliminated
– There are a few block points on the system that are almost always considered unexpected
Holders, Waiters, and Call Stacks
• IBM i keeps track of who is holding a resource, and if applicable, who is
waiting to access that resource
– A Holder is the job/thread/task that is holding the serialized resource
– A Waiter is the job/thread/task that wants to access the serialized
resource
• IBM i also maintains call stacks for every job/thread/task
• The combination of
– Who - holders and waiters
– What – the resource being waited on
– How - call stacks
provides a very powerful solution for analyzing wait conditions
• Job Watcher is the tool to accomplish this
– Collection Services can tell you about waits, but not about holders/waiters or call stacks
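The who/what/how combination can be pictured as a record per waiter that names the resource and its holder, plus a walk from waiter to holder (and on to that holder's holder, if it is itself waiting). A Python sketch; the record layout and function are hypothetical, not a Job Watcher file format. The first record reuses the object lock example from this deck, and the second is invented for illustration:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WaitRecord:
    waiter: str            # Who: job/thread/task that wants the resource
    resource: str          # What: the object or resource being waited on
    holder: Optional[str]  # Who: job/thread/task holding it (None if unknown)
    call_stack: List[str] = field(default_factory=list)  # How: waiter's call stack


def holder_chain(records: List[WaitRecord], start: str) -> List[str]:
    """Follow waiter -> holder links starting from one waiter."""
    by_waiter = {r.waiter: r for r in records}
    chain = [start]
    seen = {start}
    current = start
    while current in by_waiter and by_waiter[current].holder:
        nxt = by_waiter[current].holder
        chain.append(nxt)
        if nxt in seen:  # a cycle in the wait-for chain indicates a deadlock
            break
        seen.add(nxt)
        current = nxt
    return chain


records = [
    # From the object lock contention example earlier in this deck
    WaitRecord("QSCLICEV", "WATCHEVENTSPACE", "QZRCSRVS/QUSER/014097"),
    # Hypothetical job waiting on something held by QSCLICEV
    WaitRecord("JOBX", "HYPOTHETICAL_OBJECT", "QSCLICEV"),
]
print(" -> ".join(holder_chain(records, "JOBX")))
```

The end of the chain is the job whose behavior (here, a lock that was never released) explains everyone else's wait.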
Wait Accounting - “Wait Buckets”

§ Licensed Internal Code (LIC) has identified points where waits occur and
assigned a numeric identifier to each point, also known as a “block point”
§ This level of detail poses challenges to efficiently evaluate the wait time in a
job
– Difficult to fully understand hundreds of block points
– Huge amounts of storage would be needed to track counts and times
for every block point in every job
§ For this reason, block points are assigned to categories commonly referred
to as “wait buckets”
§ “Mapping” refers to how block points are assigned to buckets
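Mapping block points to buckets turns hundreds of fine-grained wait points into a small set of counters with a count and a total time each. A Python sketch of that aggregation; the block point IDs and the mapping table are hypothetical (the real LIC identifiers are not listed in this deck), and sending unmapped points to "Other waits" is an assumption:

```python
from collections import defaultdict

# Hypothetical block point IDs; real LIC block point numbers and their
# bucket assignments are not listed in this deck.
BLOCK_POINT_TO_BUCKET = {
    101: "Disk page faults",
    102: "Disk page faults",
    205: "Database record lock contention",
    310: "Journaling",
}


def bucketize(events):
    """events: iterable of (block_point_id, wait_seconds) -> {bucket: (count, seconds)}."""
    totals = defaultdict(lambda: [0, 0.0])
    for block_point, seconds in events:
        # Assumption: unmapped block points fall into "Other waits"
        bucket = BLOCK_POINT_TO_BUCKET.get(block_point, "Other waits")
        totals[bucket][0] += 1
        totals[bucket][1] += seconds
    return {bucket: (count, secs) for bucket, (count, secs) in totals.items()}


events = [(101, 0.25), (102, 0.5), (205, 2.0), (999, 1.0)]
for bucket, (count, secs) in bucketize(events).items():
    print(f"{bucket:<35} count={count} time={secs} s")
```

Only one (count, time) pair per bucket needs to be kept per job, instead of one per block point, which is the storage saving described above.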

32 Wait Buckets
• Time dispatched on a CPU
• CPU queuing
• Reserved
• Other waits
• Disk page faults
• Disk non fault reads
• Disk space usage contention
• Disk op-start contention
• Disk writes
• Disk other
• Journaling
• Semaphore contention
• Mutex contention
• Machine level gate serialization
• Seize contention
• Database record lock contention
• Object lock contention
• Ineligible waits
• Main storage pool overcommitment
• Journal save while active
• Reserved (7.3 ….)
• Reserved (7.3….)
• Socket accepts (idle)
• Socket transmits
• Socket receives
• Socket other
• IFS
• PASE
• Data queue receives
• Idle / waiting for work
• Synchronization Token contention
• Abnormal contention
http://www.ibm.com/developerworks/ibmi/library/i-ibmi-wait-accounting/
http://public.dhe.ibm.com/services/us/igsc/idoctor/Job_Waits_White_Paper_61_71.pdf
Wait Bucket Example

§ Database Record Lock Contention – Bucket 16
– Several different causes for waits in this bucket

● Read
● Update
● Weak
● Transfer
● Check
● Conflict Exit
Note: The Information Center also describes ‘buckets’ as ‘groups’ or ‘sets’

Understanding “Time Dispatched on a CPU”
Time dispatched on a CPU (Bucket 1)

• Thread or task has been assigned to a processor and is NOT waiting
• Complicated by certain features
• Hardware Multi Threading (HMT)
● Allows multiple threads/tasks to be assigned to a single physical processor
● Causes bucket 1 time to be greater than actual CPU time
• Background assisting tasks
● Promote their CPU usage back into the client job/thread
● Causes client thread’s bucket 1 time to be smaller than measured CPU time
• LPAR shared/partial processors
● Bucket 1 records time dispatched to the virtual processor
● Bucket 1 time may be greater than CPU time because it may include time the
thread/task is waiting for the physical processor behind the virtual processor
Bucket 1 – Time Dispatched on a CPU does NOT equal CPU time

Understanding “CPU Queuing”
• CPU Queuing (Bucket 2)
– Thread or task is ready to run and is waiting for a CPU to become available
• Too much work on the partition, causing threads to wait for the processors
• Spiky workloads: I/O completing in batches can cause this, but so can software design
• Workload Groups: a workload group can be over-committed even though the system is under-committed
• Shared processors: latency due to the hypervisor sharing the physical processors among multiple partitions
Waits that Applications use
• Disk waits
• Semaphores, Mutexes, Synchronization Tokens
• Journaling
• Database record locks
• Object locks
• Sockets

Waits Overview – Collection Services & Job Watcher
“Wait buckets” are collected and viewable by Collection Services and Job Watcher
Job Watcher data is more granular and has additional drill-down capabilities
The next question likely would be which job(s) are incurring this wait time. Drilling down further, we can see the list of jobs incurring this wait time.
This type of chart can also be used to understand a job’s “run-wait” signature

Analyze Job Watcher data using PDI
Job Watcher
Start here to: Job Watcher returns real-time information about a selected set of jobs, threads, or LIC tasks.
Keep track of the performance of a specific job or how it might be affecting system performance, or dig into a job that was seen to be causing problems when viewed in Collection Services.
• Data collected by Job Watcher includes:
– Wait times
– CPU
– I/O activity
– Call stacks
– SQL statements
– Communications statistics
– Activation Group statistics
Run Job Watcher when you need detailed performance data for diagnostic purposes.
There are clients that run Job Watcher 24x7 to always have diagnostic data available. You need to manage the data carefully.
This feature requires the Performance Tools Job Watcher feature – 5770PT1 option 3
Job Watcher
• Job Watcher collects more detailed performance data than Collection
Services and at more frequent intervals
– CPU and I/O (like Collection Services)
– Call Stacks
– SQL Statements
– Detailed Wait information:
• Objects being waited on, even record numbers within files
• Holder of object
• Job Watcher does not collect everything that Collection Services collects.
• It does not always collect information about every thread
– Thread must use CPU during interval
– Thread must exist for entire interval
• Data is written to Db2 files

Job Watcher Usage Tips
• Use Job Watcher when you need detailed performance data to resolve a problem
– Typically the problem has been scoped first by Collection Services
• For problem determination, Job Watcher can be run on specific jobs
• Multiple collections can be run at the same time
• Need to manage the amount of data collected
Basic Job Watcher Data Collection Steps
• Create the Job Watcher definition

– Or use one of the IBM-supplied definitions
• Start the Job Watcher collection
• Let it run until the problem has occurred
• Stop the Job Watcher collection
• Analyze the data
• There are times when you may want to run Job Watcher continuously
How do I analyze Job Watcher data?
§ Scope the problem

§ What time?
§ What users or jobs?
§ Look for trends in the data
§ Look for presence of waits

– Drill down into wait details
§ Display call stacks for running or waiting jobs
Viewing Waits with Job Watcher
• Use wait information
– Get to details on what a job is doing when it is not running
– Determine what the job is waiting for
– Determine what job or thread has the resource being waited on
Next: Example with Machine Level Gate Serialization
CPU Utilization and Waits Overview
§ In PDI, both Collection Services and Job Watcher have a
“CPU Utilization and Waits Overview” graph as a general
starting point for wait analysis
CPU Utilization and Waits Overview
Full Zoom Out
Find timeframe and zoom in
- Look for unusual patterns as a way to start
- Zoom into the time where we see the large drop in CPU utilization
Zoom in to see a specific time frame
We can see that operating system contention occurred during the time when the CPU utilization dropped.
Select the beginning and ending intervals to investigate and then drill into Contention Waits Overview.
Contention Waits Overview
Note: for Job Watcher (JW), we select the sorting (vs. selection-based sort)
Finding the significant wait
We want to figure out who might be causing the contention.
Machine level gate serialization is a major reason for the contention waits. Drill into All Waits by Thread or Task Sorted by Machine Level Gate Serialization so we can see the jobs, threads, and tasks that are all waiting.
Note: Drilling into waits by thread or task can take some time, so be patient.
Zoom in to see more detail
We can’t see the machine level gate serialization details at first; zoom in and we can see it appear in many threads. This tells us many threads were waiting.
…But why?
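The drill-down above amounts to totaling, for one wait bucket, the wait time of each thread over the selected intervals and sorting the result. The Python sketch below illustrates that idea on hypothetical records; the record layout, the bucket label "gate", and the function name are assumptions made for illustration, not PDI's implementation or the layout of any Job Watcher file.

```python
from collections import defaultdict


def rank_threads_by_bucket(records, bucket, start, end):
    """Total the time each thread spent in one wait bucket over the
    selected intervals [start, end] and return (thread, seconds) pairs,
    largest first.

    records: iterable of hypothetical (interval, thread, bucket, seconds)
    tuples; this layout is an illustration, not a Job Watcher format.
    """
    totals = defaultdict(float)
    for interval, thread, name, seconds in records:
        if start <= interval <= end and name == bucket:
            totals[thread] += seconds
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    sample = [
        (1, "thread-A", "gate", 0.5),
        (2, "thread-B", "gate", 2.0),
        (2, "thread-C", "disk", 5.0),  # different bucket, ignored
    ]
    print(rank_threads_by_bucket(sample, "gate", 1, 3))
```

Restricting to the selected intervals mirrors the zoom-in step: a thread that waits heavily outside the window of interest should not dominate the ranking.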
Select a thread and look at the waits for that one thread
It may be necessary to drill down into interval details for several threads to find the one with the information we need…
• Select a thread
• Select an interval
• View interval details for one thread or task
The Power of Job Watcher… Show Holder
• If there is a holding job or task for the current thread or task, the “Show Holder” button will be displayed
• When you click the “Show Holder” button, the holding job, task, or thread is displayed, and you can see its call stack
• You can easily move from one interval to the next, or specify an interval number
(Screen capture labels: TESTWAITS, QAUDJRN, QDBSRV02/QSYS/345313)
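Clicking “Show Holder” repeatedly is essentially walking a waiter-to-holder chain until you reach a thread that is not itself waiting on anyone. The Python sketch below shows that walk, assuming a hypothetical dictionary that maps each waiting thread to the thread holding the resource it waits on; Job Watcher does not expose its data in this form, and the thread names are illustrative only.

```python
def holder_chain(holders, start):
    """Follow waiter -> holder links starting at `start`.

    holders: hypothetical dict mapping a waiting thread to the thread
    that holds the resource it is waiting on.
    Returns the chain of threads. The walk stops at a thread that is not
    waiting, or, if a thread repeats (a cycle, i.e. a possible deadlock),
    after appending the repeated thread once.
    """
    chain = [start]
    seen = {start}
    current = start
    while current in holders:
        current = holders[current]
        chain.append(current)
        if current in seen:
            break  # cycle detected; report it and stop
        seen.add(current)
    return chain


if __name__ == "__main__":
    links = {"waiter-1": "holder-A", "holder-A": "holder-B"}
    print(holder_chain(links, "waiter-1"))
```

The last thread in a non-cyclic chain is usually the one whose call stack is worth examining, since it is the one actually doing work while the others wait.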
View Call Stack
The call stack shows how we got to this wait point.
Job Watcher shows information about the object being waited on, as well as call stacks.
In the call stack you will see an entry showing that the job is creating an audit journal entry.
Note that access to the audit journal is serialized by a “gate”. So why is this job blocked, waiting to create the audit record?
Thread or Task Details
The thread is waiting for the QAUDJRN journal at 8:51:05.
Look at the thread that is holding the resource.
Audit Journal
If the audit journal information is still available, you can look at it.
This screen capture shows the audit journal entries from the matching time period.
- NR is Next Receiver
- PR is Previous Receiver
Job Watcher – Example summary
• This exercise showed how a normal system function, moving to a new journal receiver, affected the CPU utilization of the system for a short period of time.
• In this scenario, the next step would be to evaluate what information is being captured in the security audit journal, to ensure you are not auditing information you do not need.
• This exercise also showed how powerful the Job Watcher capabilities are for understanding the details of what is happening on the system.
• This is something only IBM i can do!
More Perspectives
Database with PDI
• SQL Plan Cache and SQL Performance Monitor database performance files can be viewed with the SQL Overview and SQL Attribute Mix perspectives in Performance Data Investigator
• From the Database task, while viewing any of these database performance files, you can launch into PDI to view the set of charts. From within the PDI perspective list panel, the SQL Plan Cache Snapshot, Event Monitor, and SQL Performance Monitor database performance files can be seen in the Collection list
– SQL plan cache data perspectives with new SQL Collection Services data
– Database I/O views for both physical and logical I/O metrics
– SQL Cursor and Native DB Opens
– A Health Indicators perspective for Database Health is added to the Health Indicators package
– The Job Watcher package is enhanced with detailed Logical Database I/O perspectives
– DEMO
7.2+ - Additional Perspectives
Database Package (7.2)
• I/O Reads and Writes
• Physical Database I/O – Detailed
• Logical Database I/O – Detailed
• SQL Performance Data – Collection Services
Health Indicators Package
• Database Health Indicators
Database Health Indicators
This chart shows database health indicators by analyzing all collection time intervals against the defined thresholds for database. Use this chart to determine the proportion of intervals where database health indicators exceeded the defined thresholds.
Drilldowns
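The “proportion of intervals” idea can be made concrete with a small calculation: classify each collection interval against the thresholds and divide each band's count by the total number of intervals. The Python sketch below assumes two thresholds, a warning level and an action level, and assumes that reaching a threshold counts as exceeding it; both the two-level scheme and the boundary rule are assumptions for illustration, not PDI's documented logic.

```python
def classify_intervals(values, warning, action):
    """Return the fraction of intervals in each band.

    Bands (assumed for this sketch):
      value < warning            -> "ok"
      warning <= value < action  -> "warning"
      value >= action            -> "action"
    """
    if not values:
        raise ValueError("no intervals to classify")
    counts = {"ok": 0, "warning": 0, "action": 0}
    for v in values:
        if v >= action:
            counts["action"] += 1
        elif v >= warning:
            counts["warning"] += 1
        else:
            counts["ok"] += 1
    n = len(values)
    return {band: c / n for band, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical per-interval metric values for one health indicator.
    print(classify_intervals([10, 50, 80, 95], warning=70, action=90))
```

Reporting proportions rather than raw counts makes collections with different interval lengths or durations comparable.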
I/O Reads and Writes
SQL CPU Utilization
• Shows you the SQL CPU utilization sorted by thread
• The starting point to determine whether your CPU utilization is due to SQL or other work
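Deciding whether CPU is “due to SQL or other work” comes down to comparing each thread's SQL CPU with its total CPU. The Python sketch below assumes you have, per thread, the SQL CPU time in microseconds (one of the job-level metrics listed in this deck) and the thread's total CPU time in microseconds; the input layout and the function name are hypothetical, and PDI itself performs this analysis graphically.

```python
def sql_cpu_share(threads):
    """threads: hypothetical mapping of thread id -> (sql_cpu_us, total_cpu_us).

    Returns a list of (thread id, sql_cpu_us, share) tuples, where share is
    the fraction of the thread's CPU spent in SQL, sorted by SQL CPU
    descending. Threads that used no CPU get a share of 0.0.
    """
    rows = []
    for tid, (sql_us, total_us) in threads.items():
        share = sql_us / total_us if total_us else 0.0
        rows.append((tid, sql_us, share))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


if __name__ == "__main__":
    data = {"t1": (250_000, 1_000_000), "t2": (900_000, 1_000_000)}
    for tid, sql_us, share in sql_cpu_share(data):
        print(f"{tid}: {sql_us} us in SQL ({share:.0%} of thread CPU)")
```

Sorting by absolute SQL CPU rather than by share keeps small, mostly-SQL threads from crowding out the threads that actually consume the most CPU.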
Database Locks Overview
• Database locks overview gives you a graph of database record lock contention from Collection Services data
Database Locks Overview – Drill down to find contributing jobs
Drilling down shows that the QRWTSRVR jobs had the record lock contention.
Job-Level Database Statistics
The following metrics have been added to the job performance data (*JOBMI category) of Collection Services in 7.1:
– SQL clock time (total time in SQL and below) per thread (microseconds)
– SQL unscaled CPU per thread (microseconds)
– SQL scaled CPU per thread (microseconds)
– SQL synchronous database reads per thread
– SQL synchronous nondatabase reads per thread
– SQL synchronous database writes per thread
– SQL synchronous nondatabase writes per thread
– SQL asynchronous database reads per thread
– SQL asynchronous nondatabase reads per thread
– SQL asynchronous database writes per thread
– SQL asynchronous nondatabase writes per thread
– Number of high level SQL statements per thread
– Special instructions to activate the support:
https://www.ibm.com/developerworks/mydeveloperworks/wikis/home?lang=en#/wiki/IBM%20i%20Technology%20Updates/page/Job%20Level%20SQL%20Metrics
– You get an error if you try to display one of these charts but have not activated the support
Database – Physical Database I/O
Job-Level Database Statistics (7.1)
• Ten new perspectives (8 on the perspective list plus 2 drilldowns), including:
– Physical Database I/O for Jobs or Tasks - Detailed
– Physical Database I/O for One Job or Task - Detailed
More Perspectives
Disk Response Time Charts
A very easy way to see whether you have slow disk operations
Java Perspectives
Find the job that is using a lot of heap…
Storage Allocation Perspectives
Storage Allocation by Thread or Task
Timeline Perspective
The timeline bars on the chart represent the elapsed time of threads or tasks, broken into:
– Dispatched CPU Time
– CPU Queuing Time
– Other Waits Time
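Reading a timeline bar is a matter of proportions: how much of a thread's elapsed time was spent running, queued for CPU, or waiting on something else. The Python sketch below computes that breakdown, assuming (for illustration) that elapsed time is exactly the sum of the three segments listed above; the function name and inputs are hypothetical and are not how PDI computes the chart.

```python
def timeline_breakdown(cpu_s, queue_s, other_s):
    """Split a thread's elapsed time into percentages.

    Assumes elapsed time = dispatched CPU + CPU queuing + other waits,
    all in seconds.
    """
    elapsed = cpu_s + queue_s + other_s
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return {
        "elapsed_s": elapsed,
        "dispatched_cpu_pct": 100 * cpu_s / elapsed,
        "cpu_queuing_pct": 100 * queue_s / elapsed,
        "other_waits_pct": 100 * other_s / elapsed,
    }


if __name__ == "__main__":
    print(timeline_breakdown(2.0, 1.0, 1.0))
```

A large CPU queuing share points at processor contention, while a large other-waits share suggests drilling into “All Waits for One Thread or Task”.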
Timeline Overview for Threads or Tasks
Drill down to this new chart from existing charts:
- Waits by Job or Task
- All Waits by Thread or Task
Select one thread or task and drill down to “All Waits for One Thread or Task” or “All Waits by Thread or Task”
Display Collection Services DB Files
(Screen capture: QAPMCONF)
Memory
In a graphical view!
Note the change in pool sizes: QPFRADJ is on.
Memory
• Memory perspectives are now available
• Similar information to what you get from WRKSYSSTS
• But this is collected on an interval basis, and you can view it across the collection time
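Because the data is collected per interval, a rate such as page faults per second is simply the interval's count divided by the interval length. The Python sketch below shows that conversion, assuming hypothetical per-interval fault counts and a 15-second collection interval; the function name and sample numbers are illustrative, not values from any real collection.

```python
def fault_rates(fault_counts, interval_seconds):
    """Convert per-interval page fault counts into faults per second."""
    if interval_seconds <= 0:
        raise ValueError("interval length must be positive")
    return [count / interval_seconds for count in fault_counts]


if __name__ == "__main__":
    # Hypothetical fault counts for three consecutive 15-second intervals.
    rates = fault_rates([300, 1500, 0], 15)
    peak = max(range(len(rates)), key=rates.__getitem__)
    print(rates, "peak interval index:", peak)
```

Expressing every interval as a per-second rate is what lets a chart line up intervals of different lengths on one axis.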
Memory Charts
Three views (charts) in each:
• Memory Pool Sizes and Fault Rates perspective
View 1: Memory Pool Sizes and Fault Rates (001-004)
View 2: Memory Pool Sizes (All Pools)
View 3: Fault Rates (All Pools)
• Memory Pool Activity Levels
View 1: Memory Pool Activity Levels and Ineligible Transitions Per Second (001-004)
View 2: Memory Pool Activity Levels (All Pools)
View 3: Ineligible Transitions Per Second (All Pools)
• DB and Non-DB Page Faults
View 1: DB and Non-DB Page Faults Overview (All Pools)
View 2: DB Page Faults (All Pools)
View 3: Non-DB Page Faults (All Pools)
• Drilldown:
– Memory Metrics for One Pool
View 1: Memory Metrics Overview for One Pool
View 2: DB and Non-DB Page Faults for One Pool
View 3: DB and Non-DB Pages Read/Written for One Pool
Memory Perspectives
(Screen captures: Memory Pool Sizes and Fault Rates – View one (Pools 001-004); Memory Pool Activity Levels – View one; Memory metrics overview for one pool)
Memory Perspectives – DB and non-DB Page Faults
Three views
Memory - Drilldown
Physical System Perspectives
Start here to view high-level, cross-partition processor performance metrics for the entire frame.
• All logical partitions on the same physical server (regardless of OS)
• Available on POWER6 and above
• Requires turning on an HMC option to enable collection
Physical Systems Perspectives
Display overall CPU utilization for the physical box and all partitions, regardless of operating system
http://ibmsystemsmag.blogs.com/i_can/2009/10/i-can-display-cpu-utilization-for-all-partitions.html
Logical Partitions Overview
Shown for each partition:
• Average CPU utilization
• CPU entitled time used
• Uncapped CPU time used
More data is available in the Table view
I/O: Throughput Rates & Utilization
Data throughput rates and utilization for selected I/O components
• Total megabytes per second
• Total utilization %
More data is available in the Table view
CPU Utilization and Waits Overview Sample 1
This is an example of an SQL workload:
• Disk I/O intensive
• Not CPU intensive
SAVLIB to tape started here
CPU Utilization and Waits Overview Sample 2
This is an RPG workload with many concurrent active jobs
SAVLIB to a save file started here
Resource Utilization Percentages Sample 1
One of two views for the “Resource Utilization Overview” perspective. Data from:
- QAPMSYSTEM
- QAPMDISK
Resource Utilization Percentages Sample 2
Callouts: SAVLIB to save file (I/O-bound workload); CPU-bound workload
Disk Throughput for Disk Pools Sample 1
This 720 system has 2 x 5908 Disk Controllers + 32 Disk Units
Disk Throughput for Disk Pools Sample 2
Disk wait time comparison:
This 720 system has 2 x 5908 Disk Controllers + 32 Disk Units
This 740 system has 4 x 5908 Disk Controllers + 48 Disk Units
Much less disk wait time than the other Disk Throughput graph
Resource Utilization Rates Sample 1
View 2 of the “Resource Utilization Overview” perspective. Data from:
- QAPMJSUM
- QAPMSYSTEM
- QAPMPOOLB
Physical Disk I/O Overview - Basic Sample 1
Displays disk I/O per second in reads and writes (compare to the preceding graph)
Waits Overview Sample 1
This system has 3 cores and 19 GB RAM. Adding more RAM may reduce Disk Page Faults Time.
Waits Overview Sample 2
A lot of disk read here
Substantial CPU queuing, but not too much
Seizes and Locks Waits Overview Sample 1
Disk Page Faults Time (tens of thousands of seconds) on the Waits Overview chart is more significant than the Seize Contention Time shown here.
This chart shows selected waits, compared to what is shown on the Waits Overview chart
Disk Waits Overview Sample 1
This chart shows selected waits, compared to what is shown on the Waits Overview chart
Contention Waits Overview Sample 1
Waits by Job or Task Sample 1
Waits by Job or Task Sample 2
CPU Utilization by Job or Task Sample 1
Use this to identify CPU cycle “hogs”; there should be none here, since CPU % is low for each job
CPU Utilization by Thread or Task Sample 2
Use this to identify CPU cycle “hogs”
Page Faults Overview Sample 1
Page Faults by Pool Sample 1
There are only 2 main memory pools in this system

Next Up:
Trending – Historical Data & Graph History
Modeling – Batch Model
