
Better Analysis with Performance Data Investigator: Part 1

Lora Powell
Advisory Software Engineer
lrpowell@us.ibm.com

2019

© 2018 IBM Corporation


Agenda
• Tips for Navigator and PDI
• General Health Indicators

• Analysis of Performance Problems


– Scenario #1
– Wait Accounting
– Scenario #2 - Job Watcher
– Demo with Database perspectives



Browser Support
• Supported Browsers for the latest Navigator enhancements, latest
version of:

– Mozilla Firefox
– Google Chrome
– Apple Safari
– Microsoft Edge (new)

– Note: Internet Explorer is no longer supported
– If you get an unsupported browser warning:
• Update your browser

Browser Tips
• Clear your browser cache after installing PTFs
– Then close and restart browser
– Or, always run in Incognito/Private mode!
(Chrome: Incognito; Firefox: Private)
• Review your browser security settings
– Allow pop-ups
• For details see the following web page:
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/IBM%20i%20Technology%20Updates/page/Browser%20tips

• Close unneeded tabs in your Navigator session
– Tasks in tabs consume resources and may cause performance degradation

• Avoid using F5 to refresh a panel; instead use the Refresh button found on Navigator panels

• Unexpected results could be browser related. Example problems are:
• Hung charts
• Empty tables

Tips for Best Performance of Navigator

Note: Navigator will not run fast on a system that is already slow!
✓ Ensure there are no bad DNS entries on the system
– http://www-01.ibm.com/support/docview.wss?uid=nas8N1010614
✓ Use Application Runtime Expert to validate your environment
http://www.ibm.com/developerworks/ibmi/library/i-applicationruntime/index.html
– Network health checker (simple-to-use template for ARE, no charge to run) from QShell:
• /QIBM/ProdData/OS/OSGi/templates/bin/areVerify.sh -network
http://ibmsystemsmag.blogs.com/i_can/2013/09/application-runtime-expert-network-health-checker.html

✓ Close the Dashboard tab if you do not need it; it consumes system resources because it periodically polls the system status

✓ Managing System Performance on IBM i

✓ Use the Web Performance Advisor to validate your web performance
http://pic.dhe.ibm.com/infocenter/iseries/v7r1m0/topic/rzaie/rzaieconwebperfadvisor.htm

✓ Ensure the QMAXACTLVL system value is set to *NOMAX
This system value is the maximum number of jobs/threads that can simultaneously compete for memory and CPU.

Navigator Search

Search for Navigator tasks by things you know


You can find tasks without having to know how to navigate to them

Navigator - Favorites
Throughout Navigator, save favorites to quickly get to the function you want
– Including favorite Performance Data Investigator perspectives

Collection Services

• https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/IBM%20i%20Technology%20Updates/page/Performance%20Data%20Collectors


Performance Instrumentation and Data Collection
The Advantage

• IBM develops the software stack, top to bottom
– Instruments the software with component-specific performance metrics
• IBM develops the performance data collectors that harvest those performance metrics
• IBM i has an integrated database – Db2
– This is a BIG DEAL
– Performance data is stored in the database automatically
• No “add on” application is necessary – it’s all in the operating system
• IBM provides the graphical analysis tools
– Analysis of the performance data in the Db2 files using SQL

IBM i has the best performance instrumentation and data collection capabilities in the industry!
Collection Services
• Designed to be Always On – with minimal overhead

– If something goes wrong, you have data that will help analyze the
problem, fix it, and prevent it from happening in the future

– If you can’t solve the problem, you have information that makes it easier
for IBM Support to solve the problem faster

– This provides a reliable baseline – understand the impact that a software, network, or environmental change had on the performance of your system

– It also provides historical information – enables planning for future growth based on real trends, not guesses
What is Collection Services?
IBM i function that collects performance data at a system level AND at a job/thread/task level

Collects data from many system resources:
• Jobs
• Disk
• Buses
• Memory pools
• Communication lines
• …and many others

Collection Services data is used by:
• Performance Data Investigator
• System Monitors
• Performance Tools for i
• PM for Power Systems
• iDoctor
Collection Services
• IBM recommends you always run Collection Services

• Collect data at regular intervals from 15 seconds to 1 hour


– Default is 15 minutes - suitable for trending/capacity planning
– Consider 5-minute intervals for problem determination and in-depth performance analysis
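The interval length directly sets how many data points a collection holds. A minimal sketch of that arithmetic, assuming a collection that spans one full 24-hour day; the function name is illustrative only, and the range check simply mirrors the 15-second to 1-hour range above.

```python
# Sketch: number of Collection Services intervals in one day of data.
# Assumption: the collection covers a full 24-hour day; the function
# name is illustrative only.

SECONDS_PER_DAY = 24 * 60 * 60
MIN_INTERVAL_S = 15        # shortest interval on the slide: 15 seconds
MAX_INTERVAL_S = 60 * 60   # longest interval on the slide: 1 hour


def intervals_per_day(interval_s: int) -> int:
    """Return how many whole intervals of interval_s seconds fit in one day."""
    if not MIN_INTERVAL_S <= interval_s <= MAX_INTERVAL_S:
        raise ValueError("interval must be between 15 seconds and 1 hour")
    return SECONDS_PER_DAY // interval_s


print(intervals_per_day(15 * 60))  # default 15-minute interval: 96
print(intervals_per_day(5 * 60))   # 5-minute interval: 288
```

At 5-minute intervals a day holds three times as many data points as at the default 15 minutes.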

• Data is initially stored in a management collection object


– Can hold large quantities of performance data with minimal overhead

• Includes valuable Wait Accounting information (more on this later…)

• Performance data is transferred into database files for each collection, upon request or automatically.


Configuring Collection Services

For Graph History data collection:
Collection Services can collect monitor data without starting a system monitor.

Check this box if you plan to use:
• Performance Data Investigator
• Performance Tools
• System Monitors
They all require data in the database files.
The default is checked; leave it checked.


Rebuild Collections Table
• Why do I need to rebuild the table?
– If you restore performance data without using the Restore Performance Collection interface (or the SAVPFRCOL and RSTPFRCOL commands), collections won’t display in the Manage Collections view.
• Use the “Rebuild Collection Table” action
– It rebuilds the metadata used for the Manage Collections task, and then your performance data will be visible.

Ever wonder why PDI doesn’t list your collection? This may be the reason!
Performance Data Investigator - PDI
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/IBM%20i%20Technology%20Updates/page/Performance%20on%20the%20web


Packaging
PDI packages are listed under Investigate Data in Navigator. They are provided by:
• The base operating system
• IBM Performance Tools – Job Watcher feature
• IBM Performance Tools – Manager feature
(Package lists shown for 7.2 and 7.3)
Authority - Authorizing Users to PDI
• Users need to be authorized to use the Investigate Data and Collection Manager performance tasks

• Add users to the QPMCCDATA and QPMCCFCN authorization lists

Edit Authorization List

Object . . . . . . . : QPMCCDATA     Owner . . . . . . . : QSYS
Library . . . . . :   QSYS          Primary group . . . : *NONE

Type changes to current authorities, press Enter.

Object List
User Authority Mgt
*PUBLIC *EXCLUDE
QSYS *ALL X
PDI01 *USE
PDI02 *USE
PDI03 *USE
PDI04 *USE
PDI05 *USE
PDI06 *USE
PDI07 *USE
PDI08 *USE
PDI09 *USE
More...


Use PDI from a system other than where the data was collected
Store data centrally if you have multiple physical or logical partitions
• Easier to analyze and backup
• Resource-intensive analysis won’t impact production partitions

Two variations:
1. Go to the data: Use Set Target
- Use PDI on one system while analyzing data on another system
- Use PDI on one release to view data from another release

2. Bring the data to PDI: Transfer the collections
- Save and restore the right way
- From a different release level
- Know whether you should convert or not

Back up key performance data as you would business data
Viewing Data on another System

• Leave the data where it is!
– Use Set Target System to view data on one partition while running Navigator from another

1. Log in to IBM Navigator from your updated development partition
2. Set Target to your production system
• Look at your production system performance

You can even view data on other releases
Set Target System
(Diagram: Navigator System and Target System)
• The HTTP Server runs on the system you initially log into.
• You can manage a second system:
– No web server is required on the second system
– The Host Servers are used
Transferring Collections to another partition
• Move the data to your Navigator system to analyze it with PDI
• Save the *CSFILE object
– Use *CSMGTCOL when sending it to IBM or to save with less space
• Restore from Navigator on the other partition (RSTPFRCOL)
– This works fine whether or not the customer saved it using SAVPFRCOL.

Now just refresh to view it in the table.

https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/IBM%20i%20Technology%20Updates/page/Saving%20and%20Restoring%20from%20previous%20release
Collection File Level
• Collection Services data has a different file level for every release.
• Converting (CVTPFRCOL) allows PDI to view the data without knowing about differences in the data from that (previous) release.
• After you convert a back-level collection to the current release, its file level is changed to match that of the current release.

Since Fall 2016, however, PDI handles all Collection Services collections at whatever file level they are set to, on any release.

So we no longer need to convert CS collections. You may still need to convert Job Watcher, Disk Watcher, or PEX files.

(Examples shown for 7.2 and 7.3)


To Convert or NOT Convert?
It Depends on the Collection Type
• If you CONVERT a file, PDI looks at it as if it was created for a system at that file level. For Collection Services, this *loses* some of the significant data for the system it was collected on.
– Wait bucket data – for example, Bucket 20:
• 6.1: Classic JVM
• 7.1: not used, so set to RESERVED
• 7.2 and later: reused for Journal Save While Active time

• When you look at this data, you want to view it with the right bucket labels!
• PML Single Source provides forward compatibility in viewing your CS data.

PDI will show release-specific information for your collection no matter what future release you view your collection from.

Don’t convert Collection Services collections
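The reason to keep the original file level is that a bucket number only means something together with the release it was collected on. A minimal sketch of that lookup, using only the bucket 20 labels listed above; the table layout and function name are hypothetical and do not reflect how PDI or PML actually stores these definitions.

```python
# Sketch of release-aware labelling for one wait bucket.
# Assumption: labels for bucket 20 are taken from the slide (6.1 Classic JVM,
# 7.1 Reserved, 7.2 and later Journal save while active). The table and
# function are hypothetical, not the PML format.

BUCKET_20_LABELS = [
    ((6, 1), "Classic JVM"),
    ((7, 1), "Reserved"),
    ((7, 2), "Journal save while active"),
]


def bucket_20_label(release: tuple) -> str:
    """Return the bucket 20 label for the release a collection was generated on."""
    label = None
    for since, name in BUCKET_20_LABELS:
        if release >= since:  # tuples compare as (major, minor)
            label = name
    if label is None:
        raise ValueError(f"no bucket 20 label known for release {release}")
    return label


print(bucket_20_label((6, 1)))  # Classic JVM
print(bucket_20_label((7, 3)))  # Journal save while active
```

Converting a 6.1 collection would, in effect, replace the (6, 1) lookup with the current release's label, which is the information loss described above.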


Viewing Prior Release Performance data
For Collection Services data

• View as a collection for the release it was generated on
– Use the Create Performance Data (CRTPFRDTA) command on the partition it was collected on to get the data into database files
» Uses the current release format
– SAVPFRCOL, transfer, and RSTPFRCOL on the desired partition

Note: the library into which the performance data is restored needs to contain only collections from that same release, so that they are all in the same release format.

If the collection was created on 5.4, it will need to be converted.

There is no PDI package support for 5.4 collections.
Viewing Prior Release Performance data
• For Job Watcher, Disk Watcher, or Performance Explorer collections
• Convert the performance data to the current release format via the GUI
– Save the performance collection - SAVPFRCOL
– FTP the save file to the desired partition
– Restore the collection via the Collection Manager - RSTPFRCOL
– Convert the collection to the current release format - CVTPFRCOL (converts the prior release database files to the current release)

Cons of converting CS data files – we could no longer show you the data specific to the release it was collected on. Conversion could lose information.
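The convert-or-not guidance spread across these slides can be condensed into one decision function. This is a sketch only: the collection-type strings and release tuples are hypothetical inputs, not an IBM API. The slides say back-level Job Watcher, Disk Watcher, or PEX data "may" still need conversion; the sketch assumes any back-level collection of those types does.

```python
# Sketch of the convert-or-not guidance in these slides.
# Assumption: collection-type strings and release tuples are hypothetical
# inputs; this is not an IBM API.

COLLECTION_SERVICES = "collection_services"
CONVERTIBLE_TYPES = {"job_watcher", "disk_watcher", "performance_explorer"}


def should_convert(collection_type: str,
                   collected_on: tuple,
                   viewing_on: tuple) -> bool:
    """Return True if CVTPFRCOL is recommended before viewing in PDI."""
    if collection_type == COLLECTION_SERVICES:
        # PDI reads CS collections at their original file level,
        # except 5.4 collections, which have no PDI package support.
        return collected_on <= (5, 4)
    if collection_type in CONVERTIBLE_TYPES:
        # Assumption: any back-level collection of these types is converted.
        return collected_on < viewing_on
    raise ValueError(f"unknown collection type: {collection_type!r}")


print(should_convert("collection_services", (7, 1), (7, 3)))  # False
print(should_convert("job_watcher", (7, 2), (7, 3)))          # True
```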
PML Single Source
• No more maintaining PML for each release! All releases are handled
within the same PML file
– Release to release changes include:
• New fields, additional data (Database SQL, memory data, etc.)
• Wait Bucket name changes
– A 6.1 collection brought to 7.3 to view will correctly show you the
wait buckets for 6.1.
– Collections are forward-compatible with regard to PDI charting
– Release specific information for your collection no matter what release
you view the collection.
– The goal is that PDI will show you the most updated definition of
the chart possible for your given collection level on any release.
– Keep the collections at the same file level at which they were generated.
– Easier for us to maintain!

Release Specific Metrics

Database Perspectives

The perspectives available in the Database package, and their views, vary by release.
PDI will show you the most updated definition of the chart possible for your given collection and its content.

• Database Health Indicators is only available on 7.2 and later
– QAPMSQLPC is only available on 7.2 and later
Collection and Perspective Compatibility
There is variation in the data available in the regular Q* Collection Services collections and the System Monitor collections, whose names start with R (R*).

System Monitor collections don’t have all the same data as a Collection Services collection, so the perspectives available may be different.
• Monitor files don’t include QAPMSQLPC data
• The Database Health Indicators perspective requires QAPMSQLPC, so it will not be available for the R* collection

View List shows what files are required.
PDI Search

• Search all the metrics displayed
• Search the entire SQL for all perspectives
Launch from the search results to that perspective to select your collection
Selectable Sort Perspectives
• Drilldown actions with integrated sort based on selected metric
• Select an interesting metric on the graph and then use Actions to select a
drilldown chart.

If you select a metric before drilling down into one of these perspectives, the
resulting chart will be sorted based on that selected metric.

If you do not select a metric before doing the drill down action, the chart will
be sorted based on a default metric.
The default is also used when the perspective is chosen from the initial Investigate Data Perspectives screen, where no selection is available.
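The selectable-sort rule above (sort by the selected metric, otherwise by a default) can be sketched in a few lines. A minimal sketch, assuming each drilldown row is a dict of metric name to value; the default metric name is a hypothetical stand-in, since PDI's actual default metric is not named in these slides.

```python
# Sketch of the selectable-sort behavior described above.
# Assumption: each row is a dict of metric name -> value, and the default
# metric below is a hypothetical stand-in for PDI's own default.

from typing import Dict, List, Optional

DEFAULT_SORT_METRIC = "Dispatched CPU Time"  # hypothetical default


def drilldown_order(rows: List[Dict[str, float]],
                    selected_metric: Optional[str] = None) -> List[Dict[str, float]]:
    """Order drilldown rows by the selected metric, or by the default if none was selected."""
    metric = selected_metric if selected_metric is not None else DEFAULT_SORT_METRIC
    # Largest values first so the biggest contributors appear at the top.
    return sorted(rows, key=lambda row: row.get(metric, 0.0), reverse=True)


jobs = [
    {"Dispatched CPU Time": 5.0, "Disk Page Faults Time": 120.0},
    {"Dispatched CPU Time": 40.0, "Disk Page Faults Time": 3.0},
]
print(drilldown_order(jobs, "Disk Page Faults Time")[0])  # the page-fault-heavy job
print(drilldown_order(jobs)[0])                            # the CPU-heavy job
```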

Best-kept PDI secret! Great for drilling down into Waits (more on this later)

Selectable sort is only in the Collection Services package
Drilling Down – Waits by Job or Task
Sorted by Dispatched CPU Time


PDI - General Health Indicators

Start here to: Check the overall health of your system for today or the past few days
+ Summary of key metrics in one view
+ Look at multiple metrics for a full day
+ Drill down from here to a breakdown over time

Summarizes the full collection as the percentage of intervals that exceed the defined thresholds
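The percent-of-intervals summary can be illustrated with a short calculation. A minimal sketch, assuming two threshold levels (a lower warning level and a higher action level) and a plain list of per-interval values for one metric; the threshold values and sample data are hypothetical examples, not Navigator's shipped defaults.

```python
# Sketch of a health-indicator summary: the share of collection intervals
# whose value exceeds a warning or action threshold.
# Assumption: thresholds and sample values are hypothetical examples.

from typing import Dict, List


def threshold_summary(values: List[float], warning: float, action: float) -> Dict[str, float]:
    """Return the percentage of intervals at each health level."""
    if not values:
        raise ValueError("no intervals in collection")
    if warning > action:
        raise ValueError("warning threshold must not exceed action threshold")
    n = len(values)
    above_action = sum(1 for v in values if v > action)
    above_warning = sum(1 for v in values if warning < v <= action)
    return {
        "action %": 100.0 * above_action / n,
        "warning %": 100.0 * above_warning / n,
        "ok %": 100.0 * (n - above_action - above_warning) / n,
    }


# Eight intervals of CPU utilization (percent), hypothetical values
cpu = [45.0, 62.0, 71.0, 88.0, 93.0, 55.0, 79.0, 96.0]
print(threshold_summary(cpu, warning=70.0, action=90.0))
# {'action %': 25.0, 'warning %': 37.5, 'ok %': 37.5}
```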

Go next to: Drill down to other PDI health indicators charts, or go to the Collection Services package to look at a breakdown over time
+ more metrics
+ interval data
+ more in-depth information: metrics by job, thread or task; wait bucket data
This feature comes with the base IBM i Operating System
General Health Indicators
Quick overview of key performance metrics

Database Health Indicators were introduced in 7.2


Health Indicators
System Resources Health Indicators
• Summarizes key metrics from 5 categories
• Drill into areas that exceed thresholds


Database Health Indicators
This chart shows Database health indicators by
analyzing all collection time intervals according to the
defined thresholds for database. Use this chart to
determine the proportion of intervals where
Database health indicators exceeded the defined
thresholds.

Drilldowns
Health Indicators

Customize Health Indicator Thresholds



Performance – When should I be concerned?
• Users complain

• Change in system performance


– Less throughput
– Longer batch runtimes

• When compared to baseline


– A change in amount of wait time
– A new type of wait time appears

• An unexpected drop in CPU utilization



Performance Analysis - Where do I start?
• Start by asking questions:
– What was the symptom of the problem?
– Who reported the problem?
– What time did it occur?
– How long did it last?

– Have there been any recent changes?


• New or changed workload?
• Any application changes?
• Any recent hardware configuration changes?

– What was the scope?


• Did it impact the entire system?
• Did it impact some subset of work?
– Specific users?
– Specific applications?
Performance Data Investigator (PDI)

Start here to: Graphically view and analyze metrics collected by IBM i collector tools
• Integrated and free
• Easy to use
• Simplified analysis

Go next to: Start a Job Watcher collection and view it with the PDI Job Watcher package or iDoctor Job Watcher.

This feature comes with the base IBM i Operating System
Basic Performance Analysis
• Become familiar with your Collection Services performance data & tools to
understand your performance characteristics
– Be proactive!

• Know your baseline performance data

• When a performance problem occurs, you often need to use performance analysis tools to identify the cause of the problem so you can correct it
Starting Point
Start with CPU Utilization and Waits Overview
• Shows CPU Utilization (red line)

• Shows Wait Information (stacked bars)

• Green bars are disk time
– Identify whether disk time went up when the CPU utilization dropped

• Do a full zoom out first to find the time span of interest

• Then zoom in and/or drill down for further analysis
– Type of disk operations
– Contributing jobs
Understanding the basics
Each bar shows the accumulated times from all active jobs on the system during that IBM i Collection Services sampling interval

Each bar represents one time interval of collected data

You can interact with the graph: Select the tooltips icon, then hover over a bar or line to see more details of that component

Use the data on one chart to modify the next chart – for example: date and time filtering, sorting the next chart by a selected metric
Problem Analysis Scenario using PDI


CPU Utilization and Waits Overview
CPU Utilization and Waits Overview is an excellent starting place. Look for interesting points.
Next steps will depend upon the answers to the prior questions, along with what you see.
Example Analysis - Start
• Use tooltips to get specific data points
• Look for peaks or drops in CPU consumption
– Drop in CPU consumption in interval 11
– OS contention time: also note the operating system contention time just before that, in interval 10
– Select the low point
• Drill down for that interval to Waits Overview

You can select one point, or a range by selecting the first and last intervals
Waits Overview – one interval
Most of the wait time is disk page faulting

By clicking in the Disk Page Faults Time bar before going to Waits by
Job or Task, that chart will be sorted by Disk Page Faults time

Waits by Job or Task
• By clicking on Disk Page Faults Time before going to Waits by Job or
Task, this chart is sorted by Disk Page Faults time

Click on the first job and select All Waits by Thread or Task

Hover over a bar to see tooltips on the page faults time for the job
All Waits by Thread or Task
– See multiple threads spending nearly all their time waiting on page faults

Zoom Region

Interval size is 900 seconds


All Waits by Thread or Task - Zoom
• Zoom in to see if there is other data on the leftmost side of the chart
– Less than 1 second per interval is spent in CPU time
Waits Overview

What else can we find out?
Waits by server type - What server
Waits by job current user profile - Who is the client
Waits by Server Type

Waits by Job Current User Profile

- Go back to Waits Overview, then
- Select the Waits by Job Current User Profile chart
- See the user that this server is doing work for
Back “Home” – CPU Utilization & Waits

Next go to Contention Waits Overview for the same interval


Contention Waits Overview
Drill down into Waits by Job or Task to see
if we can figure out what jobs are
contributing to this contention

Machine Level Gate Serialization

Waits by Job or Task
What jobs are contributing to this contention?

We need to look at Job Watcher data to find out more:
• holders
• call stacks
Job Watcher - Object Lock Contention

You should see a single bar. Click on it and then drill down into “Interval Details for One Thread or Task”

Note the following:
- Object waited on
- Holding job or task
- Call stack of the thread requesting the lock
Object Lock Contention
• The QSCLICEV job wanted to lock the WATCHEVENTSPACE, but
was unable to do so.

• Job QZRCSRVS/QUSER/014097 held the lock.

• It turns out this particular example was due to a code defect
– the lock was not released by the QZRCSRVS job.
– The fact that this job held the lock was sufficient information for the developer to identify and correct the defect.
Collection Services vs Job Watcher
• Collection Services and Job Watcher both collect wait information
– Graphically view the data that show waits
– Collection Services runs by default
– Job Watcher data is generally collected when additional information is necessary to analyze a problem

• Job Watcher can also collect call stacks and SQL statements
– Provides additional information for detailed analysis
– More frequent intervals
– More detailed wait information
– Objects being waited on
– Holder of object
– Call stacks
Wait Accounting

Wait Accounting helps to determine if a wait condition is a problem


Why are we waiting?
Figure out why a job is waiting.
Is the wait valid, or can we shorten it? Shortening it improves our run time.

Performance Fact:

“All computers wait at the same speed”
What is Wait Accounting?
Wait Accounting = the ability to determine what a job is doing when it is
waiting (not “running”)

– IBM i exclusive! Patented technology built into IBM i

But what is it waiting for? Some waits are normal; others are not.
Wait Accounting

• Wait Accounting is used to understand what is happening when a job is not running.
– Wait information is tracked for each job, thread and task on the system

• A job spends its time in one of three states
1. CPU
• Time spent dispatched to the processor, active/running
2. CPU queuing
• Ready to be processed, waiting for a processor to become available
3. Wait
• Waiting for something or someone, blocked or idle

Using Run-Wait Analysis to Improve your Job Performance
Basics of Waiting
• Two basic types of waits
– Idle: waiting for a work request
• Typically not indicative of a problem
e.g., waiting for the “Enter” key to be pressed on a 5250 display session
• If a problem, usually external to the machine
e.g., slow arrival of work requests due to a communications problem
• Possible, but not typical, in batch jobs
e.g., waiting for an entry to be placed on a data queue

– Blocked: waits that occur while performing a work request

Blocked waits are the ones we want to take a closer look at
• Outside of CPU usage and CPU queuing time, blocked waits are the reason jobs/threads take as long as they do to complete their work
http://ibmsystemsmag.blogs.com/i_can/2009/11/i-can-tell-you-why-youre-waiting.html
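The idle/blocked split can be sketched over per-bucket wait times. A minimal sketch, assuming wait times are already grouped by bucket name (names taken from the wait-bucket list later in this deck). Treating only the two explicitly idle buckets as idle is a simplification for illustration: as noted above, an idle-type wait such as a data queue receive can also occur in a batch job.

```python
# Sketch: split a job's wait time into idle and blocked waits.
# Assumption: bucket names follow the wait-bucket list in this deck; which
# buckets count as "idle" here is an illustrative choice, not IBM's
# official classification.

from typing import Dict, Tuple

IDLE_BUCKETS = {"Idle / waiting for work", "Socket accepts (idle)"}


def split_waits(wait_times: Dict[str, float]) -> Tuple[float, float]:
    """Return (idle_seconds, blocked_seconds) for one job."""
    idle = sum(t for bucket, t in wait_times.items() if bucket in IDLE_BUCKETS)
    blocked = sum(t for bucket, t in wait_times.items() if bucket not in IDLE_BUCKETS)
    return idle, blocked


job = {
    "Idle / waiting for work": 300.0,
    "Disk page faults": 42.0,
    "Database record lock contention": 45.0,
}
print(split_waits(job))  # (300.0, 87.0)
```

Only the blocked total is a candidate for tuning; the idle 300 seconds reflect the job waiting for work.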

Run/Wait Signature

Typical batch job run/wait signature
[ CPU | CPU queue | Wait ] = Elapsed time

Interactive job run/wait signature
[ Idle | CPU | CPU queue | Wait ] = Elapsed time


Basic example:
Batch job with a total run time of 6 hours

Run/wait signature:
CPU: 120 min | CPU queue: 70 min | Wait: 170 min
Elapsed time: 6 hours (360 min)

Potential to run in 3 hr 10 min if the wait time could be eliminated

Wait analysis and reduction can be a powerful and cost-effective way to improve response time and throughput
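The arithmetic in this example can be checked directly: the three components add up to the elapsed time, and removing the wait component leaves CPU plus CPU queue. The minute values come from the slide; the helper name is illustrative only.

```python
# Worked version of the batch-job example above.
# The minute values come from the slide; the helper name is illustrative.

from typing import Tuple


def run_wait_summary(cpu_min: int, cpu_queue_min: int, wait_min: int) -> Tuple[int, str]:
    """Return (elapsed minutes, best-case run time if all wait time were removed)."""
    elapsed = cpu_min + cpu_queue_min + wait_min
    without_wait = elapsed - wait_min  # CPU + CPU queue only
    hours, minutes = divmod(without_wait, 60)
    return elapsed, f"{hours} h {minutes} min"


print(run_wait_summary(120, 70, 170))  # (360, '3 h 10 min')
```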

Detailing wait time

• Determine the components of time spent waiting

Elapsed time = CPU + CPU queue + Wait
Wait (in this example) = Disk reads + Disk writes + Record locks + Journal
Detailing wait time:
Metrics related to components of wait time

                     Disk reads   Disk writes   Record locks   Journal
Total count          3,523        17,772        355            5,741
Total time           42 sec       73 sec        45 sec         44 sec
Avg time per wait    0.012 sec    0.004 sec     0.126 sec      0.007 sec

Questions we can now ask:
● How many of the reads are page faults? Could memory/pool changes help?
● What programs are causing reads? Could they be reduced or made async?
● What programs are causing writes? Could they be reduced or made async?
● What Db2 files are involved with the record locks?
● What files are being journaled? Are journals needed and optimally configured?
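Each average in the table is total time divided by count. A minimal sketch of that calculation, using the counts and totals from the table; the helper name is illustrative. Recomputing from the totals shown can differ from the slide in the last digit (for example, about 0.127 rather than 0.126 sec for record locks), presumably because the slide's figures were rounded differently.

```python
# Average time per wait = total wait time / number of waits.
# Counts and totals are taken from the table above.

def avg_per_wait(count: int, total_s: float) -> float:
    """Return the average duration of one wait, in seconds."""
    if count <= 0:
        raise ValueError("count must be positive")
    return total_s / count


waits = {
    "Disk reads": (3_523, 42.0),
    "Disk writes": (17_772, 73.0),
    "Record locks": (355, 45.0),
    "Journal": (5_741, 44.0),
}

for component, (count, total_s) in waits.items():
    print(f"{component:<13}{avg_per_wait(count, total_s):.3f} sec per wait")
```

Record locks are the fewest waits but by far the longest per wait, which is what makes the question about the Db2 files involved worth pursuing.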



Expected vs. Unexpected Waits

• Some waits are “expected” and others “unexpected”
– A record lock may be expected, but one that lasts for a very long duration is unexpected
– Regardless of the type of wait, it is always better if wait time can be minimized or eliminated
– There are a few block points on the system that are almost always considered unexpected
Holders, Waiters, and Call Stacks
• IBM i keeps track of who is holding a resource, and if applicable, who is
waiting to access that resource
– A Holder is the job/thread/task that is holding the serialized resource
– A Waiter is the job/thread/task that wants to access the serialized
resource

• IBM i also maintains call stacks for every job/thread/task

• The combination of
– Who - holders and waiters
– What – the resource being waited on
– How - call stacks
provides a very powerful solution for analyzing wait conditions

• Job Watcher is the tool to accomplish this
– Collection Services can tell you about waits, but not about holders/waiters or call stacks
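The holder/waiter relationship can be pictured with a toy model of one serialized resource. This is purely illustrative, assuming a simple first-come queue of waiters; it is not how IBM i or Job Watcher stores holder and waiter information. The job names reuse the object lock contention example from the scenario.

```python
# Toy model of holder/waiter tracking for one serialized resource.
# Assumption: purely illustrative; this is not how IBM i or Job Watcher
# stores holder and waiter information.

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Resource:
    name: str
    holder: Optional[str] = None
    waiters: List[str] = field(default_factory=list)

    def request(self, job: str) -> bool:
        """Grant the resource if it is free; otherwise queue the job as a waiter."""
        if self.holder is None:
            self.holder = job
            return True
        self.waiters.append(job)
        return False

    def release(self, job: str) -> None:
        """Release the resource and hand it to the first waiter, if any."""
        if self.holder != job:
            raise ValueError(f"{job} does not hold {self.name}")
        self.holder = self.waiters.pop(0) if self.waiters else None


space = Resource("WATCHEVENTSPACE")
space.request("QZRCSRVS/QUSER/014097")  # granted: becomes the holder
space.request("QSCLICEV")               # blocked: queued as a waiter
print(space.holder, space.waiters)
```

In the scenario the holder never released the lock, so the waiter would stay queued indefinitely; knowing who the holder was is what pointed to the defect.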


Wait Accounting - “Wait Buckets”


• Licensed Internal Code (LIC) has identified points where waits occur and assigned a numeric identifier to each point, also known as a “block point”

• This level of detail poses challenges to efficiently evaluate the wait time in a job
– Difficult to fully understand hundreds of block points
– Huge amounts of storage would be needed to track counts and times for every block point in every job

• For this reason, block points are assigned to categories commonly referred to as “wait buckets”

• “Mapping” refers to how block points are assigned to buckets
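The storage argument for buckets shows up in a small sketch: instead of a counter per block point per job, wait events are folded into one (count, time) pair per bucket. The block point IDs and their mapping below are hypothetical; the bucket names come from the wait-bucket list in this deck, and sending unmapped block points to "Other waits" is an assumption.

```python
# Sketch of "mapping" block points to wait buckets and accumulating time.
# Assumption: block point IDs are hypothetical; real LIC block point
# numbers and their bucket assignments are not listed in this deck.

from collections import defaultdict
from typing import Dict, List, Tuple

BLOCK_POINT_TO_BUCKET = {
    0x0101: "Disk page faults",                 # hypothetical block point
    0x0102: "Disk page faults",                 # hypothetical block point
    0x0207: "Database record lock contention",  # hypothetical block point
}
FALLBACK_BUCKET = "Other waits"  # assumption: unmapped block points land here


def bucket_totals(events: List[Tuple[int, float]]) -> Dict[str, Tuple[int, float]]:
    """Collapse (block_point, seconds) wait events into per-bucket (count, seconds)."""
    counts: Dict[str, int] = defaultdict(int)
    seconds: Dict[str, float] = defaultdict(float)
    for block_point, wait_s in events:
        bucket = BLOCK_POINT_TO_BUCKET.get(block_point, FALLBACK_BUCKET)
        counts[bucket] += 1
        seconds[bucket] += wait_s
    return {bucket: (counts[bucket], seconds[bucket]) for bucket in counts}


events = [(0x0101, 0.01), (0x0102, 0.02), (0x0207, 0.5), (0x9999, 1.0)]
print(bucket_totals(events))
```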



Blue – Blocked waits
32 Wait Buckets
1. Time dispatched on a CPU
2. CPU queuing
3. Reserved
4. Other waits
5. Disk page faults
6. Disk non fault reads
7. Disk space usage contention
8. Disk op-start contention
9. Disk writes
10. Disk other
11. Journaling
12. Semaphore contention
13. Mutex contention
14. Machine level gate serialization
15. Seize contention
16. Database record lock contention
17. Object lock contention
18. Ineligible waits
19. Main storage pool overcommitment
20. Journal save while active
21. Reserved (7.3 ….)
22. Reserved (7.3….)
23. Socket accepts (idle)
24. Socket transmits
25. Socket receives
26. Socket other
27. IFS
28. PASE
29. Data queue receives
30. Idle / waiting for work
31. Synchronization Token contention
32. Abnormal contention
http://www.ibm.com/developerworks/ibmi/library/i-ibmi-wait-accounting/
http://public.dhe.ibm.com/services/us/igsc/idoctor/Job_Waits_White_Paper_61_71.pdf

Wait Bucket Example


• Database Record Lock Contention – Bucket 16
– Several different causes for waits in this bucket
● Read
● Update
● Weak
● Transfer
● Check
● Conflict Exit

Note: The Information Center also describes ‘buckets’ as ‘groups’ or ‘sets’


Understanding “Time Dispatched on a CPU”

Time dispatched on a CPU (Bucket 1)


• Thread or task has been assigned to a processor and is NOT waiting
• Complicated by certain features
• Hardware Multi Threading (HMT)
● Allows multiple threads/tasks to be assigned to a single physical processor
● Causes bucket 1 time to be greater than actual CPU time
• Background assisting tasks
● Promote their CPU usage back into the client job/thread
● Causes client thread’s bucket 1 time to be smaller than measured CPU time
• LPAR shared/partial processors
● Bucket 1 records time dispatched to the virtual processor
● Bucket 1 time may be greater than CPU time because it may include time the
thread/task is waiting for the physical processor behind the virtual processor

Bucket 1 – Time Dispatched on a CPU does NOT equal CPU time



Understanding “CPU Queuing”

• CPU Queuing (Bucket 2)
– Thread or task is ready to run and is waiting for a processor to become available

• Too much work on the partition, causing threads to wait for the processors

• Spiky workloads – I/O completing in batches can cause this, but so can software design

• Workload Groups – a workload group can be over-committed even though the system is under-committed

• Shared processors – latency due to the hypervisor sharing the physical processors among multiple partitions

Waits that Applications use

• Disk waits
• Semaphores, Mutexes, Synchronization Tokens
• Journaling
• Database record locks
• Object locks
• Sockets



Waits Overview – Collection Services & Job Watcher

“Wait buckets” are collected by, and viewable for, both Collection Services and Job Watcher

Job Watcher data is more granular and has additional drill-down capabilities


Waits by Job or Task

The next question would likely be which jobs are incurring this wait time. Drilling down further, we can see the list of jobs incurring this wait time:

This type of chart can also be used to understand a job’s “run-wait” signature


Analyze Job Watcher data using PDI


Job Watcher

Start here to: Job Watcher returns real-time information about a selected set of jobs, threads, or LIC tasks
Keep track of the performance of a specific job, or how it might be affecting system performance.
Or dig into a job that was seen to be causing problems when viewed in Collection Services
• Data collected by Job Watcher includes
– Wait times
– CPU
– I/O activity
– Call Stacks
– SQL statements
– Communications statistics
– Activation Group statistics

Run Job Watcher when you need detailed performance data for diagnostic purposes.
There are clients that run Job Watcher 24x7 to always have diagnostic data available. The data needs to be managed carefully.

This feature requires the Performance Tools Job Watcher feature – 5770PT1 option 3
Job Watcher
• Job Watcher collects more detailed performance data than Collection
Services and at more frequent intervals
– CPU and I/O (like Collection Services)
– Call Stacks
– SQL Statements
– Detailed Wait information:
• Objects being waited on, even record number of files
• Holder of object

• Job Watcher does not collect everything that Collection Services collects.
• It does not always collect information about every thread
– Thread must use CPU during interval
– Thread must exist for entire interval

• Data is written to Db2 files


Job Watcher Usage Tips

• Use Job Watcher when you need detailed performance data to resolve a problem
– Typically, the problem has first been scoped using Collection Services

• For problem determination, Job Watcher can be run against specific jobs

• Multiple collections can be run at the same time

• You need to manage the amount of data collected
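One rough way to reason about the amount of data collected is to estimate how many thread snapshots a collection can produce: at most one per monitored thread per interval. The sketch below is back-of-the-envelope arithmetic under stated assumptions. `bytes_per_snapshot` is a hypothetical input you would calibrate from your own collections, not a documented Job Watcher value, and because not every thread is collected in every interval the result is an upper bound.

```python
def estimate_snapshots(duration_seconds: float, interval_seconds: float, threads: int) -> int:
    """Upper-bound number of thread snapshots: one per thread per complete interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    intervals = int(duration_seconds // interval_seconds)
    return intervals * threads


def estimate_bytes(snapshots: int, bytes_per_snapshot: int) -> int:
    """Total size for an assumed (hypothetical) average snapshot size."""
    return snapshots * bytes_per_snapshot


if __name__ == "__main__":
    # Example: 8 hours at 1-second intervals while watching 200 threads.
    n = estimate_snapshots(8 * 3600, 1, 200)
    print(n)  # 5760000
```

Doubling the interval length or halving the set of watched jobs halves the estimate, which is why scoping the collection to specific jobs matters when running Job Watcher for long periods.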

Basic Job Watcher Data Collection Steps

• Create the Job Watcher definition
– Or use one of the IBM-supplied definitions

• Start the Job Watcher collection

• Let it run until the problem has occurred

• Stop the Job Watcher collection

• Analyze the data

• There are times when you may want to run Job Watcher continuously
How do I analyze Job Watcher data?

• Scope the problem
– What time?
– What users or jobs?

• Look for trends in the data

• Look for the presence of waits
– Drill down into wait details

• Display call stacks for running or waiting jobs
Viewing Waits with Job Watcher

• Use wait information
– Get to details on what a job is doing when it is not running
– Determine what the job is waiting for
– Determine what job or thread has the resource being waited on

Next: Example with Machine Level Gate Serialization
CPU Utilization and Waits Overview
• In PDI, both Collection Services and Job Watcher have a “CPU Utilization and Waits Overview” graph as a general starting point for wait analysis

CPU Utilization and Waits Overview

Full Zoom Out
Find timeframe and zoom in

- Look for unusual patterns as a way to start
- Zoom into the time where we see the large drop in CPU utilization
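Spotting the large drop in CPU utilization by eye works well in the chart. If you export the per-interval CPU utilization values, a simple heuristic can flag candidate intervals to zoom into. This is a hypothetical helper, not a PDI feature: it flags an interval whose utilization falls below a chosen fraction (`ratio`) of the average of the preceding `window` intervals, and both parameters are assumptions to tune.

```python
from typing import List


def find_cpu_drops(cpu_pct: List[float], window: int = 3, ratio: float = 0.5) -> List[int]:
    """Return indexes of intervals whose CPU utilization is below `ratio`
    times the mean of the previous `window` intervals."""
    drops = []
    for i in range(window, len(cpu_pct)):
        baseline = sum(cpu_pct[i - window:i]) / window
        if baseline > 0 and cpu_pct[i] < ratio * baseline:
            drops.append(i)
    return drops


if __name__ == "__main__":
    samples = [62.0, 65.0, 60.0, 64.0, 18.0, 20.0, 61.0]
    print(find_cpu_drops(samples))  # [4, 5]
```

The flagged indexes give a starting and ending interval to select in the chart before drilling into the wait perspectives.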
Zoom in to see a specific time frame
We can see that operating system contention occurred during the time when the CPU
utilization dropped.

Select the beginning and ending intervals to investigate, and then drill into Contention Waits Overview.
Contention Waits Overview

Note: for JW (Job Watcher), we select the sorting (vs. selection-based sort)

Finding the significant wait


We want to see if we can figure out who might be causing the contention.
Drill into All Waits by Thread or Task Sorted by Machine Level Gate
Serialization so we can see the jobs/threads/tasks that are all waiting.

Note: Drilling into waits by thread or task can take some time, so be patient.

Machine level gate serialization is a major reason for the contention waits.
Zoom in to see more detail
We can’t see the machine level gate serialization details at first;
zoom in and we can see it appear in many threads.
This tells us many threads were waiting.
…But why?


Select a thread and look at the waits for that one thread

Select a Thread

It may be necessary to drill down into interval details for several threads to find the
one with the information we need…
View Interval details for one thread or task

Select an interval
The Power of Job Watcher… Show Holder
• If there is a holding job or task for the current thread or task, the “Show Holder” button
will be displayed
• Can move to the next interval or specify an interval number

TESTWAITS

When clicking the “Show Holder” button, the holding job/task/thread will be displayed, and you can see its call stack.

QAUDJRN

QDBSRV02/QSYS/345313

Easily navigate from one interval to the next
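“Show Holder” lets you step from a waiting thread to the thread holding the resource, and the holder may itself be waiting on something else. The sketch below models that walk over a hypothetical mapping from each waiter to its holder, such as you might transcribe by hand from the interval details panels; it is not a Job Watcher API. It follows the chain until a thread has no holder, and stops if a thread reappears, which would indicate a cycle worth investigating.

```python
from typing import Dict, List, Optional


def holder_chain(waiter: str, holder_of: Dict[str, Optional[str]]) -> List[str]:
    """Follow waiter -> holder links starting at `waiter`.

    Returns the threads visited in order. If a thread reappears (a cycle),
    the chain is returned up to and including the repeated thread.
    """
    chain = [waiter]
    seen = {waiter}
    current = waiter
    while True:
        nxt = holder_of.get(current)
        if nxt is None:
            return chain
        chain.append(nxt)
        if nxt in seen:
            return chain
        seen.add(nxt)
        current = nxt


if __name__ == "__main__":
    # Hypothetical thread names, for illustration only.
    links = {"WAITER_1": "HOLDER_X", "HOLDER_X": None}
    print(holder_chain("WAITER_1", links))
```

The last entry of the chain is the thread whose call stack is usually most worth examining.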

View Call Stack

We can view the call stack to see how we got to this wait point

Job Watcher shows information about the object being waited on and call stacks

In the call stack you will see an entry that shows the job is creating an audit journal entry.

Note that access to the audit journal is serialized by a “gate”. So why is this job blocked and
waiting to create the audit record?

Thread or Task Details
Thread is waiting for the QAUDJRN journal at 8:51:05

Look at the thread that is holding the resource
Audit Journal
If the audit journal information is still available, you could look at it.
This screen capture shows the audit journal entries from the matching time period.
- NR is Next Receiver
- PR is Previous Receiver
Job Watcher – Example summary
• This exercise showed how a normal system function for going to a new
journal receiver affected the CPU utilization of the system for a short period
of time.

• In this scenario, the next steps would be to evaluate what information is being captured in the security audit journal to ensure you are not auditing information you do not need.

• This exercise also showed how powerful the Job Watcher capabilities are
for understanding the details of what is happening on the system.

• This is something only IBM i can do!

More Perspectives



Database with PDI
• SQL Plan Cache and SQL Performance Monitor database performance files can
be viewed with SQL Overview and SQL Attribute Mix perspectives in Performance
Data Investigator

• From the Database task, when viewing any of these database performance files, you can
launch into PDI to view the set of charts. From within the PDI perspective list panel, the
SQL Plan Cache Snapshot, Event Monitor, and SQL Performance Monitor database
performance files can be seen in the Collection

– SQL plan cache data perspectives with new SQL collection services data
– Database I/O views for both Physical and Logical I/O metrics
– SQL Cursor and Native DB Opens
– Health Indicators perspective for Database Health is added to the Health
Indicators package
– Job Watcher package is enhanced with detailed Logical Database I/O
perspectives.

– DEMO ***

7.2+ - Additional Perspectives
Database Package 7.2
• I/O Reads and Writes
• Physical Database I/O - Detailed
• Logical Database I/O – Detailed
• SQL Performance Data – Collection Services

Health Indicators Package
• Database Health Indicators
Database Health Indicators
This chart shows Database health indicators by
analyzing all collection time intervals according to the
defined thresholds for database. Use this chart to
determine the proportion of intervals where
Database health indicators exceeded the defined
thresholds.
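The health indicator charts classify every collection interval against defined thresholds. The sketch below illustrates that idea for a single metric, assuming a two-level scheme (a warning threshold and an action threshold) with hypothetical values; the actual database metrics and their thresholds are defined in PDI and are not reproduced here.

```python
from typing import Dict, List


def proportion_exceeding(values: List[float], warning: float, action: float) -> Dict[str, float]:
    """Classify each interval's value against two thresholds and return the
    fraction of intervals that are ok, at warning level, or at action level."""
    if not values:
        return {"ok": 0.0, "warning": 0.0, "action": 0.0}
    counts = {"ok": 0, "warning": 0, "action": 0}
    for value in values:
        if value >= action:
            counts["action"] += 1
        elif value >= warning:
            counts["warning"] += 1
        else:
            counts["ok"] += 1
    return {key: count / len(values) for key, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical per-interval values for one database metric.
    print(proportion_exceeding([10, 50, 80, 95], warning=50, action=90))
```

A high share of intervals at the action level suggests using the drilldowns to find which intervals and jobs are responsible.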

Drilldowns
I/O Reads and Writes

SQL CPU Utilization
• Shows you the SQL CPU Utilization sorted by thread
• The starting point for determining whether your CPU utilization is due to SQL or
other work
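A minimal sketch of the “SQL or other work” question, assuming you have, for each thread, its total CPU time and its SQL CPU time in microseconds (the job-level SQL CPU metrics added to *JOBMI in 7.1 are reported in microseconds). Compare like with like, for example unscaled SQL CPU against unscaled total CPU. The tuple layout and the 50% cut-off are hypothetical choices for illustration.

```python
from typing import List, Tuple


def sql_cpu_share(sql_cpu_us: int, total_cpu_us: int) -> float:
    """Percentage of a thread's CPU time spent in SQL (both in microseconds)."""
    if total_cpu_us <= 0:
        return 0.0
    if sql_cpu_us > total_cpu_us:
        raise ValueError("SQL CPU time cannot exceed total CPU time")
    return 100.0 * sql_cpu_us / total_cpu_us


def mostly_sql(threads: List[Tuple[str, int, int]], threshold_pct: float = 50.0) -> List[str]:
    """Names of threads whose SQL share of CPU is at or above `threshold_pct`.

    Each tuple is (thread name, SQL CPU microseconds, total CPU microseconds).
    """
    return [name for name, sql_us, total_us in threads
            if sql_cpu_share(sql_us, total_us) >= threshold_pct]
```

Threads returned by `mostly_sql` are where SQL-focused tools such as the plan cache are likely to help; the others point to non-SQL work.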

Database Locks Overview
• Database locks overview gives you a graph of database record lock
contention from Collection Services data

Database Locks Overview -
Drill down to find contributing jobs
We can see that it was the QRWTSRVR jobs that had the record lock contention
Job-Level Database Statistics
The following metrics have been added to the job performance data (*JOBMI category)
of Collection Services in 7.1:
– SQL clock time (total time in SQL and below) per thread (microseconds)
– SQL unscaled CPU per thread (microseconds)
– SQL scaled CPU per thread (microseconds)
– SQL synchronous database reads per thread
– SQL synchronous nondatabase reads per thread
– SQL synchronous database writes per thread
– SQL synchronous nondatabase writes per thread
– SQL asynchronous database reads per thread
– SQL asynchronous nondatabase reads per thread
– SQL asynchronous database writes per thread
– SQL asynchronous nondatabase writes per thread
– Number of high level SQL statements per thread

– Special instructions are needed to activate the support:
https://www.ibm.com/developerworks/mydeveloperworks/wikis/home?lang=en#/wiki/IBM%20i%20Technology%20Updates/page/Job%20Level%20SQL%20Metrics

– An error is displayed if you try to display one of these charts but have not activated the support:
Database – Physical Database I/O

Job-Level Database Statistics
7.1
• Ten new perspectives (8 on perspective list plus 2 drilldowns)
– Physical Database I/O for Jobs or Tasks - Detailed
– Physical Database I/O for One Job or Task - Detailed

More Perspectives



Disk Response Time Charts

A very easy interface to see if you have slow disk operations
Java Perspectives

Find that job using a lot of heap…
Storage Allocation Perspectives

Storage Allocation by Thread or Task

Timeline Perspective
The timeline bars on the chart represent
the elapsed time of threads or tasks
– Dispatched CPU Time
– CPU Queuing Time
– Other Waits Time
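Each timeline bar is made up of these three segments. Assuming, as the chart presents them, that the three components account for a thread's elapsed time, their proportions can be computed as in this minimal sketch; the class name and the values in seconds are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ThreadTimeline:
    dispatched_cpu: float  # seconds running on a processor
    cpu_queuing: float     # seconds ready to run but waiting for a processor
    other_waits: float     # seconds in all other waits

    @property
    def elapsed(self) -> float:
        """Elapsed time as the sum of the three segments."""
        return self.dispatched_cpu + self.cpu_queuing + self.other_waits

    def percentages(self) -> Dict[str, float]:
        """Each segment as a percentage of elapsed time."""
        total = self.elapsed
        if total == 0:
            return {"dispatched_cpu": 0.0, "cpu_queuing": 0.0, "other_waits": 0.0}
        return {
            "dispatched_cpu": 100.0 * self.dispatched_cpu / total,
            "cpu_queuing": 100.0 * self.cpu_queuing / total,
            "other_waits": 100.0 * self.other_waits / total,
        }
```

A thread with a large CPU queuing share is short of processor capacity, while a large other-waits share points back to the wait perspectives.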

Timeline Overview for Threads or Tasks

Drill down to this new chart from existing charts:
- Waits by Job or Task
- All Waits by Thread or Task

Select one thread or task and drill down to “All Waits for One Thread or Task” or “All Waits by Thread or Task”
QAPMCONF

Display Collection Services DB Files
Memory
In a graphical view!

Note the change in pool sizes.


QPFRADJ is on.

Memory
• Memory perspectives are now available

• Similar information to what you get from WRKSYSSTS

• But this data is collected on an interval basis, so you can view it across the entire
collection time
Memory Charts
3 views or charts in each
• Memory Pool Sizes and Fault Rates perspective
View 1: Memory Pool Sizes and Fault Rates (001-004)
View 2: Memory Pool Sizes (All Pools)
View 3: Fault Rates (All Pools)

• Memory Pool Activity Levels
View 1: Memory Pool Activity Levels and Ineligible Transitions Per Second (001-004)
View 2: Memory Pool Activity Levels (All Pools)
View 3: Ineligible Transitions Per Second (All Pools)

• DB and Non-DB Page Faults
View 1: DB and Non-DB Page Faults Overview (All Pools)
View 2: DB Page Faults (All Pools)
View 3: Non-DB Page Faults (All Pools)

• Drilldown:
– Memory Metrics for One Pool
View 1: Memory Metrics Overview for One Pool
View 2: DB and Non-DB Page Faults for One Pool
View 3: DB and Non-DB Pages Read/Written for One Pool
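Because memory data is collected per interval, a rate such as faults per second comes from dividing each interval's count by the interval length. A minimal sketch, assuming hypothetical per-interval DB and non-DB fault counts for one pool and an interval length in seconds (use whatever interval your collection actually ran with):

```python
from typing import List, Tuple


def fault_rates(db_faults: List[int], non_db_faults: List[int],
                interval_seconds: float) -> List[Tuple[float, float, float]]:
    """Return (db_rate, non_db_rate, total_rate) per interval, in faults per second."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if len(db_faults) != len(non_db_faults):
        raise ValueError("fault lists must cover the same intervals")
    rates = []
    for db, non_db in zip(db_faults, non_db_faults):
        db_rate = db / interval_seconds
        non_db_rate = non_db / interval_seconds
        rates.append((db_rate, non_db_rate, db_rate + non_db_rate))
    return rates


if __name__ == "__main__":
    # Two hypothetical 900-second intervals for one pool.
    print(fault_rates([900, 1800], [450, 0], 900))
```

Plotting these per-interval rates alongside pool size is what makes changes such as QPFRADJ pool adjustments visible over the collection time.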

Memory Perspectives

Memory Pool Sizes and Fault Rates – View one: (Pools 001-004)

Memory Pool Activity Levels – View one:

Memory metrics overview for one pool
Memory Perspectives – DB and non-DB Page Faults

3 views

Memory - Drilldown

Physical System Perspectives

Start here to: view high-level cross-partition processor performance metrics for the entire frame.

• All logical partitions on the same physical server (regardless of OS)
• Available on POWER6 and above
• Requires turning on an HMC option to enable collection


Physical Systems Perspectives

Display overall CPU utilization for the physical box and all
partitions, regardless of operating system

http://ibmsystemsmag.blogs.com/i_can/2009/10/i-can-display-cpu-utilization-for-all-partitions.html


Logical Partitions Overview
Shown for each partition:
• Average CPU utilization
• CPU entitled time used
• Uncapped CPU time used

More data is available in the Table view


I/O: Throughput Rates & Utilization
Data throughput rates and utilization for selected I/O components

• Total Megabytes per Second
• Total Utilization %

More data is available in the Table view


CPU Utilization and Waits Overview Sample 1

This is an example of an SQL workload:
• Disk I/O intensive
• Not CPU intensive

SAVLIB to tape started here
CPU Utilization and Waits Overview Sample 2

This is an RPG workload with many concurrent active jobs

Started SAVLIB to save file

Resource Utilization Percentages Sample 1

One of 2 views for the “Resource Utilization Overview” perspective
- QAPMSYSTEM
- QAPMDISK
Resource Utilization Percentages Sample 2

SAVLIB to save file

IO-bound workload

CPU-bound workload
Disk Throughput for Disk Pools Sample 1

This 720 system has 2 x 5908 Disk Controllers + 32 Disk Units

Disk Throughput for Disk Pools Sample 2

This 720 system has 2 x 5908 Disk Controllers + 32 Disk Units

Much less Disk Wait Time than the other Disk Throughput graph

Disk wait time comparison

This 740 system has 4 x 5908 Disk Controllers + 48 Disk Units

Resource Utilization Rates Sample 1

View 2 of the “Resource Utilization Overview” perspective
- QAPMJSUM
- QAPMSYSTEM
- QAPMPOOLB

Physical Disk I/O Overview - Basic Sample 1

Display disk I/O per second as reads and writes (compare to the preceding graph)
Waits Overview Sample 1

This system has 3 cores and 19 GB RAM. Adding more RAM may reduce Disk Page Faults Time.
Waits Overview Sample 2

A lot of disk reads here

Substantial CPU queuing, but not too much
Seizes and Locks Waits Overview Sample 1

Disk Page Faults Time (tens of thousands of seconds) on the Waits Overview chart is more significant than this Seize Contention Time

This chart shows selected waits, as compared to what is shown on the Waits Overview chart
Disk Waits Overview Sample 1

This chart shows selected waits, as compared to what is shown on the Waits Overview chart

Contention Waits Overview Sample 1


Waits by Job or Task Sample 1


Waits by Job or Task Sample 2



CPU Utilization by Job or Task Sample 1

Use this to identify CPU cycle “hogs”; there should be none here, since CPU % is low for each job

CPU Utilization by Thread or Task Sample 2

Use this to identify CPU cycle “hogs”


Page Faults Overview Sample 1


Page Faults by Pool Sample 1

There are only 2 main memory pools in this system


Next Up:

Trending – Historical Data & Graph History

Modeling – Batch Model
