Performance Data Investigator: Part 1
Lora Powell
Advisory Software Engineer
lrpowell@us.ibm.com
2019
– Mozilla Firefox
– Google Chrome
– Apple Safari
– Microsoft Edge (new)
• Avoid using PF-5 to refresh a panel; instead, use the Refresh button found on Navigator panels
Note: Navigator will not run fast on a system that is already slow!
✓ Ensure there are no bad DNS entries on the system
– http://www-01.ibm.com/support/docview.wss?uid=nas8N1010614
✓ Use Application Runtime Expert to validate your environment
– http://www.ibm.com/developerworks/ibmi/library/i-applicationruntime/index.html
– Network health checker (simple-to-use template for ARE, no charge to run) from QShell:
• /QIBM/ProdData/OS/OSGi/templates/bin/areVerify.sh -network
– http://ibmsystemsmag.blogs.com/i_can/2013/09/application-runtime-expert-network-health-checker.html
✓ Close the Dashboard tab if you do not need it; it consumes system resources because it periodically polls the system status
© 2018 IBM Corporation
Navigator Search
Navigator - Favorites
Throughout Navigator, save favorites to quickly get to the function you want
– Including favorite Performance Data Investigator perspectives
Collection Services
• https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/IBM%20i%20Technology%20Updates/page/Performance%20Data%20Collectors
• IBM develops the performance data collectors that harvest those performance metrics
IBM i has the best performance instrumentation and data collection capabilities in the industry!
Collection Services
• Designed to be Always On – with minimal overhead
– If something goes wrong, you have data that will help analyze the
problem, fix it, and prevent it from happening in the future
– If you can’t solve the problem, you have information that makes it easier
for IBM Support to solve the problem faster
What is Collection Services?
IBM i function that collects performance data at a system level
• Buses
• Memory pools
• Communication lines
• …many others
Collection Services data is used by:
• Performance Data Investigator
• System Monitors
• Performance Tools for i
• PM for Power Systems
• iDoctor
Collection Services
• IBM recommends you always run Collection Services
For Graph History data collection (7.3)
User      Object Authority   List Mgt
*PUBLIC   *EXCLUDE
QSYS      *ALL               X
PDI01     *USE
PDI02     *USE
PDI03     *USE
PDI04     *USE
PDI05     *USE
PDI06     *USE
PDI07     *USE
PDI08     *USE
PDI09     *USE
More...
Two variations:
1. Go to the data: Use Set Target
- Use PDI on one system while analyzing data on another system
- Use PDI on one release to view data on another
Navigator system: the HTTP Server runs on the system you initially log into.
Now just Refresh to view in table
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/IBM%20i%20Technology%20Updates/page/Saving%20and%20Restoring%20from%20previous%20release
Collection File Level
• Collection Services data has a different file level for every release.
• Converting a collection (CVTPFRCOL) allows PDI to view the data without knowing about differences in data from that (previous) release.
• After you convert a back-level collection to the current release, the file level is changed to match the current release level.
But since Fall 2016, PDI handles all Collection Services collections at whatever file level they are set to, on any release.
• When you look at this data, you want to view it with the right bucket
labels!
• PML Single Source provides forward-compatibility in viewing your CS data.
PDI will show release specific information for your collection no matter what
future release you choose to view your collection from.
Cons of converting CS data files: PDI can no longer show the data specific to the release it was collected on. Conversion could lose information.
PML Single Source
• No more maintaining PML for each release! All releases are handled
within the same PML file
– Release to release changes include:
• New fields, additional data (Database SQL, Memory data, etc)
• Wait Bucket name changes
– A 6.1 collection brought to 7.3 to view will correctly show you the
wait buckets for 6.1.
– Collections are forward-compatible with regard to PDI charting
– Release-specific information for your collection, no matter what release you view the collection from.
– The goal is that PDI will show you the most updated definition of
the chart possible for your given collection level on any release.
– Keep the collections at the same file level as they were generated on.
– Easier for us to maintain!
Database Perspectives
View List shows what files are required
System Monitor collections don’t have all the same data as a Collection Services collection
If you select a metric before drilling down into one of these perspectives, the
resulting chart will be sorted based on that selected metric.
If you do not select a metric before doing the drill down action, the chart will
be sorted based on a default metric.
The default is also used when the perspective is chosen from the initial
Investigate Data Perspectives screen, where no selection is available.
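The sorting rule described above can be sketched in a few lines of Python. This is only an illustration of the behavior, not PDI's implementation: the job names, metric names, values, and the choice of default metric are all hypothetical.

```python
from typing import Optional

# Hypothetical per-job totals (seconds); names and values are illustrative only.
rows = [
    {"job": "JOBA", "dispatched_cpu": 40.0, "disk_page_faults": 5.0},
    {"job": "JOBB", "dispatched_cpu": 10.0, "disk_page_faults": 90.0},
    {"job": "JOBC", "dispatched_cpu": 25.0, "disk_page_faults": 30.0},
]

# Assumed default; the real default depends on the perspective.
DEFAULT_METRIC = "dispatched_cpu"


def sort_for_drilldown(rows, selected_metric: Optional[str] = None):
    """Sort descending by the selected metric, or by the default if none was selected."""
    metric = selected_metric or DEFAULT_METRIC
    return sorted(rows, key=lambda r: r[metric], reverse=True)


by_selection = [r["job"] for r in sort_for_drilldown(rows, "disk_page_faults")]
by_default = [r["job"] for r in sort_for_drilldown(rows)]
print(by_selection)  # ['JOBB', 'JOBC', 'JOBA']
print(by_default)    # ['JOBA', 'JOBC', 'JOBB']
```

Selecting a metric first puts the biggest contributor to that metric at the top of the next chart, which is why it pays to click the bar of interest before drilling down.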
Great for drilling down into Waits (more on this later)
Best-kept PDI secret!
Start here to: Check the overall health of your system for today or in
the past few days
+ Summary of key metrics in one view
+ Look at full day multiple metrics
+ Drilldown from here to breakdown over time
Summarizes the full collection for percent of intervals that exceed set
thresholds
+ more in depth information: metrics by job, thread or task; wait bucket data
This feature comes with the base IBM i Operating System
General Health Indicators
Quick overview of Key Performance metrics
Database Health Indicators were introduced in 7.2
Drilldowns
Health Indicators
Go next to: Start a Job Watcher collection and view with PDI Job
Watcher package or iDoctor Job Watcher.
Starting Point
Start with CPU Utilization and Waits Overview
• Shows CPU Utilization (red line)
– Identify that when the CPU utilization dropped, disk time went up
– Contributing jobs
Understanding the basics
Each bar shows the accumulated times from all active jobs on the system during
that IBM i Collection Services sampling interval
You can interact with the graph: Select the tooltips icon, then hover over a bar or
line to see more details of that component
Use the data on one chart to modify the next chart – for example: date & time
filtering, sorting the next chart by a selected metric
Problem Analysis Scenario using PDI
By clicking in the Disk Page Faults Time bar before going to Waits by
Job or Task, that chart will be sorted by Disk Page Faults time
Waits by Job or Task
• By clicking on Disk Page Faults Time before going to Waits by Job or
Task, this chart is sorted by Disk Page Faults time
Click on the first job and select All Waits by Thread or Task
Hover over to see tooltips on the page faults time for the job
All Waits by Thread or Task
– See multiple threads spending nearly all their time waiting on page faults
Zoom Region
Waits Overview
Waits by Job Current User Profile
Waits by Job or Task
What jobs are contributing to this contention?
Object Lock Contention
• The QSCLICEV job wanted to lock the WATCHEVENTSPACE, but
was unable to do so.
– The fact that this job held the lock was sufficient information for
the developer to identify and correct the defect.
Collection Services vs Job Watcher
• Collection Services and Job Watcher both collect wait information
– Graphically view the data that show waits
– Collection Services runs by default
– Job Watcher data is generally collected when additional
information is necessary to analyze a problem
• Job Watcher can also collect call stacks and SQL statements
– Provides additional information for detailed analysis
– More frequent intervals
– More detailed wait information
– Objects being waited on
– Holder of object
• Call Stacks
Wait Accounting
Performance Fact:
But what is it waiting for? Some waits are normal; some are not.
Wait Accounting
Run/wait signature
Elapsed time: 6 hours (360 mins)
CPU       CPU queue   Wait
120 min   70 min      170 min
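As a quick worked check of the signature above, the three components should account for the whole elapsed time. The short Python sketch below uses only the numbers from the slide and also expresses each component as a share of elapsed time.

```python
# Values taken from the slide: one six-hour run.
elapsed_min = 360
signature = {"CPU": 120, "CPU queue": 70, "Wait": 170}

# The run/wait signature accounts for all of the elapsed time.
total_min = sum(signature.values())

# Share of elapsed time spent in each state, rounded to one decimal place.
shares = {name: round(100 * minutes / elapsed_min, 1)
          for name, minutes in signature.items()}

for name, minutes in signature.items():
    print(f"{name}: {minutes} min ({shares[name]}%)")
print(f"Total: {total_min} of {elapsed_min} min")
```

Here the job spends about a third of its time on the CPU and nearly half of it waiting, which is the part wait accounting helps break down further.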
Detailing wait time
Elapsed time: CPU, CPU queue, Wait
Wait: Disk reads, Disk writes, Record locks, Journal
– There are a few block points on the system that are almost
always considered unexpected
Holders, Waiters, and Call Stacks
• IBM i keeps track of who is holding a resource, and if applicable, who is
waiting to access that resource
– A Holder is the job/thread/task that is holding the serialized resource
– A Waiter is the job/thread/task that wants to access the serialized
resource
• The combination of
– Who - holders and waiters
– What – the resource being waited on
– How - call stacks
provides a very powerful solution for analyzing wait conditions
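To make the holder/waiter relationship concrete, here is a minimal Python sketch that follows waiter-to-holder links until it reaches a thread that is not itself waiting. The thread names and the dictionary layout are hypothetical; this is not how IBM i or Job Watcher stores the data.

```python
from typing import Dict, List

# Hypothetical snapshot for one interval: each waiter maps to the thread that
# holds the resource it wants. Thread names are made up for illustration.
holder_of: Dict[str, str] = {
    "THREAD_A": "THREAD_B",
    "THREAD_B": "THREAD_C",
    "THREAD_D": "THREAD_C",
}


def holder_chain(waiter: str, holder_of: Dict[str, str]) -> List[str]:
    """Follow waiter -> holder links until reaching a thread that is not waiting."""
    chain = [waiter]
    seen = {waiter}
    while chain[-1] in holder_of:
        nxt = holder_of[chain[-1]]
        chain.append(nxt)
        if nxt in seen:  # a repeated thread means a cycle; stop instead of looping
            break
        seen.add(nxt)
    return chain


chain = holder_chain("THREAD_A", holder_of)
print(" -> ".join(chain))  # THREAD_A -> THREAD_B -> THREAD_C
```

The last thread in the chain is the one whose call stack is usually worth looking at first, since everything before it is blocked behind it.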
• This level of detail poses challenges to efficiently evaluate the wait time in a
job
– Difficult to fully understand hundreds of block points
– Huge amounts of storage would be needed to track counts and times
for every block point in every job
• For this reason, block points are assigned to categories commonly referred
to as “wait buckets”
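The idea of rolling many block points up into a few wait buckets can be sketched as follows. The block-point identifiers and their bucket assignments are hypothetical and chosen only for illustration; the real system defines its own block points and bucket names.

```python
from collections import defaultdict

# Hypothetical block-point identifiers and bucket assignments (illustration only).
BUCKET_OF = {
    "bp_disk_read": "Disk waits",
    "bp_disk_write": "Disk waits",
    "bp_record_lock": "Database record locks",
    "bp_object_lock": "Object locks",
    "bp_journal_write": "Journaling",
}

# (block point, wait time in microseconds) samples for one job; made-up values.
samples = [
    ("bp_disk_read", 500),
    ("bp_record_lock", 1200),
    ("bp_disk_write", 300),
    ("bp_disk_read", 200),
]

# Keep one count and one time per bucket instead of per block point.
totals = defaultdict(lambda: {"count": 0, "time_us": 0})
for block_point, wait_us in samples:
    bucket = BUCKET_OF.get(block_point, "Other waits")
    totals[bucket]["count"] += 1
    totals[bucket]["time_us"] += wait_us

for bucket, stats in sorted(totals.items()):
    print(bucket, stats)
```

Storing a count and a time per bucket rather than per block point is what keeps the storage cost manageable for every job on the system.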
• Too much work on the partition causing threads to need to wait for the
processors
• Spiky workloads - I/O completing in batches can cause this but so can software
design.
• Workload Groups - workload group can be over-committed even though the system
is under-committed
• Disk waits
• Semaphores, Mutexes, Synchronization Tokens
• Journaling
• Database record locks
• Object locks
• Sockets
The next question likely would be which job(s) are incurring this wait time. Drilling
down further, we can see the list of jobs incurring this wait time:
This type of chart can also be used to understand a job's “run-wait” signature
Start here to: Job Watcher returns real-time information about a selected
set of jobs, threads, or LIC tasks
Keep track of performance of a specific job or how it might be affecting
system performance.
Or dig into a job that was seen to be causing problems when viewed in
Collection Services
• Data collected by Job Watcher includes
– Wait times
– CPU
– I/O activity
– Call Stacks
– SQL statements
– Communications statistics
– Activation Group statistics
Run Job Watcher when you need detailed performance data for diagnostic purposes.
There are clients that run Job Watcher 24x7 to always have diagnostic data available.
Need to manage the data carefully.
This feature requires the Performance Tools Job Watcher feature – 5770PT1 option 3
Job Watcher
• Job Watcher collects more detailed performance data than Collection
Services and at more frequent intervals
– CPU and I/O (like Collection Services)
– Call Stacks
– SQL Statements
– Detailed Wait information:
• Objects being waited on, even record number of files
• Holder of object
• Job Watcher does not collect everything that Collection Services collects.
• It does not always collect information about every thread
– Thread must use CPU during interval
– Thread must exist for entire interval
Basic Job Watcher Data Collection Steps
How do I analyze Job Watcher data?
Viewing Waits with Job Watcher
Find timeframe and zoom in
Select the beginning and ending intervals to investigate and then drill into Contention Waits
Contention Waits Overview
Note: for Job Watcher, we select the sorting (vs. selection-based sort)
Note: Drilling into waits by thread or task can take some time…. be patient.
Machine level gate serialization is a major reason for the contention waits.
Zoom in to see more detail
We can’t see the machine level gate serialization details at first;
Zoom in and we can see it appear in many threads.
This tells us many threads were waiting.
…But why?
It may be necessary to drill down into interval details for several threads to find the
one with the information we need…
Select an interval
The Power of Job Watcher… Show Holder
• If there is a holding job or task for the current thread or task, the “Show Holder” button
will be displayed
• Can move to the next interval or specify an interval number
When you click the “Show Holder” button, the holding job/task/thread is displayed, along with its call stack
Easily navigate from one interval to the next
[Screen capture labels: TESTWAITS, QAUDJRN, QDBSRV02/QSYS/345313]
View Call Stack
We can see the call stack to see how we got to this wait point
Job Watcher shows information about the object being waited on and call stacks
In the call stack you will see an entry that shows the job is creating an audit journal entry.
Note that access to the audit journal is serialized by a “gate”. So why is this job blocked and
waiting to create the audit record?
Thread or Task Details
Thread is waiting for the QAUDJRN journal at 8:51:05
Look at the thread that is holding the resource
Audit Journal
If the audit journal information was still available, you could look at it.
This screen capture shows the audit journal entries from the matching time period.
- NR is Next Receiver
- PR is Previous Receiver
Job Watcher – Example summary
• This exercise showed how a normal system function for going to a new
journal receiver affected the CPU utilization of the system for a short period
of time.
• This exercise also showed how powerful the Job Watcher capabilities are
for understanding the details of what is happening on the system.
More Perspectives
• From the Database task viewing any of these database performance files, you can
launch into PDI to view the set of charts. From within PDI Perspective list panel, the
SQL Plan Cache Snapshot, Event Monitor and SQL Performance Monitor database
performance files can be seen in the Collection
– SQL plan cache data perspectives with new SQL collection services data
– Database I/O views for both Physical and Logical I/O metrics
– SQL Cursor and Native DB Opens
– Health Indicators perspective for Database Health is added to the Health
Indicators package
– Job Watcher package is enhanced with detailed Logical Database I/O
perspectives.
– DEMO ***
7.2+ - Additional Perspectives
Database Package 7.2
• I/O Reads and Writes
• Physical Database I/O - Detailed
• Logical Database I/O – Detailed
• SQL Performance Data – Collection Services
Database Health Indicators
This chart shows Database health indicators by
analyzing all collection time intervals according to the
defined thresholds for database. Use this chart to
determine the proportion of intervals where
Database health indicators exceeded the defined
thresholds.
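The "proportion of intervals" idea can be illustrated with a short Python sketch. The per-interval values and both threshold levels below are hypothetical, and the warning/action naming is an assumption for the example; the real thresholds are whatever is defined for the health indicator.

```python
# Hypothetical per-interval values of one metric (percent) and thresholds.
metric_by_interval = [35.0, 62.0, 81.0, 95.0, 88.0, 40.0, 91.0, 70.0]
WARNING_THRESHOLD = 80.0  # assumed value
ACTION_THRESHOLD = 90.0   # assumed value


def percent_of_intervals(values, predicate):
    """Percentage of intervals whose value satisfies the predicate."""
    matching = sum(1 for v in values if predicate(v))
    return 100.0 * matching / len(values)


pct_action = percent_of_intervals(
    metric_by_interval, lambda v: v >= ACTION_THRESHOLD)
pct_warning = percent_of_intervals(
    metric_by_interval, lambda v: WARNING_THRESHOLD <= v < ACTION_THRESHOLD)
pct_ok = 100.0 - pct_action - pct_warning

print(f"action: {pct_action}%  warning: {pct_warning}%  ok: {pct_ok}%")
```

A summary like this covers the whole collection in one bar, so it shows how often a threshold was exceeded but not when; the drilldowns are where the time dimension comes back.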
Drilldowns
I/O Reads and Writes
SQL CPU Utilization
• Shows you the SQL CPU Utilization sorted by thread
• The starting point to determine if your CPU utilization is due to SQL or
other work
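One way to picture this starting point is as a ratio of SQL CPU time to total CPU time per thread. The Python sketch below is only an illustration: the thread names and microsecond values are made up, and the field names are hypothetical rather than actual Collection Services column names.

```python
# Hypothetical per-thread totals in microseconds; names and values are made up.
threads = [
    {"thread": "T1", "total_cpu_us": 4_000_000, "sql_cpu_us": 3_000_000},
    {"thread": "T2", "total_cpu_us": 2_000_000, "sql_cpu_us": 200_000},
    {"thread": "T3", "total_cpu_us": 1_000_000, "sql_cpu_us": 0},
]


def sql_share(t):
    """Percent of a thread's CPU time spent in SQL (0 if the thread used no CPU)."""
    if t["total_cpu_us"] == 0:
        return 0.0
    return 100.0 * t["sql_cpu_us"] / t["total_cpu_us"]


# Threads sorted by SQL CPU, largest first, as on the chart.
ranked = sorted(threads, key=lambda t: t["sql_cpu_us"], reverse=True)

overall_sql_pct = round(
    100.0 * sum(t["sql_cpu_us"] for t in threads)
    / sum(t["total_cpu_us"] for t in threads), 1)

for t in ranked:
    print(t["thread"], f"{sql_share(t):.1f}% SQL")
print("overall:", overall_sql_pct, "% SQL")
```

If the overall share is small, the CPU is going to non-SQL work and the SQL perspectives are probably not the place to keep digging.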
Database Locks Overview
• Database locks overview gives you a graph of database record lock
contention from Collection Services data
Database Locks Overview -
Drill down to find contributing jobs
We can find out it was the QRWTSRVR jobs with record lock contention
Job-Level Database Statistics
The following metrics have been added to the job performance data *JOBMI category
of Collection Services in 7.1
– SQL clock time (total time in SQ and below) per thread (microseconds)
– SQL unscaled CPU per thread (microseconds)
– SQL scaled CPU per thread (microseconds)
– SQL synchronous database reads per thread
– SQL synchronous nondatabase reads per thread
– SQL synchronous database writes per thread
– SQL synchronous nondatabase writes per thread
– SQL asynchronous database reads per thread
– SQL asynchronous nondatabase reads per thread
– SQL asynchronous database writes per thread
– SQL asynchronous nondatabase writes per thread
– Number of high level SQL statements per thread
– Error if you try to display one of these charts but have not activated the support:
Database – Physical Database I/O
Job-Level Database Statistics
7.1
• Ten new perspectives (8 on perspective list plus 2 drilldowns)
– Physical Database I/O for Jobs or Tasks - Detailed
– Physical Database I/O for One Job or Task - Detailed
More Perspectives
Java Perspectives
Storage Allocation Perspectives
Storage Allocation by Thread or Task
Timeline Perspective
The timeline bars on the chart represent
the elapsed time of threads or tasks
– Dispatched CPU Time
– CPU Queuing Time
– Other Waits Time
Timeline Overview for Threads or Tasks
Memory
In a graphical view!
• But this is collected on an interval basis and you can view across the
collection time
• Drilldown:
– Memory Metrics for One Pool
View 1: Memory Metrics Overview for One Pool
View 2: DB and Non-DB Page Faults for One Pool
View 3: DB and Non-DB Pages Read/Written for One Pool
Memory Perspectives
Memory Pool Sizes and Fault Rates – View one: (Pools 001-004)
3 views
Memory - Drilldown
Physical System Perspectives
Display overall CPU utilization for the physical box and all
partitions, regardless of operating system
http://ibmsystemsmag.blogs.com/i_can/2009/10/i-can-display-cpu-utilization-for-all-partitions.html
SAVLIB to tape started here
CPU Utilization and Waits Overview Sample 2
SAVLIB to save file
IO-bound workload
CPU-bound workload
Disk Throughput for Disk Pools Sample 1
Disk Throughput for Disk Pools Sample 2
This 720 system has 2 x 5908 Disk Controllers + 32 Disk Units
Much less Disk Wait Time than the other Disk Throughput graph
View 2 of “Resource Utilization Overview”
- QAPMJSUM
- QAPMSYSTEM
- QAPMPOOLB
Waits Overview Sample 2
A lot of disk read here
Substantial CPU queuing but not too much
Seizes and Locks Waits Overview Sample 1
Disk Waits Overview Sample 1