
Systematic Oracle performance tuning

Guy Harrison, Chief Architect, Database Solutions August 2007

Copyright 2006 Quest Software

Agenda
- A brief history of Oracle tuning
- Limitations of common approaches
- Systematically tuning by levels:
  - Application demand
  - Database contention
  - Reducing physical IO
  - Tuning physical IO

The story so far…


First, the earth cooled. Then, the dinosaurs came. Then, Oracle 5 was released.

The Rules of Thumb:
- Buffer cache hit rate > 90%
- Every segment in a single extent
- Index everything
- Trial and error

Limited instrumentation:
- tkprof (Oracle 6.0)
- V$SYSSTAT

No WWW

The performance tuning renaissance


Oracle 7-8, 1992 onwards: an empirical, profiling-based approach. Many champions, but notably:
- Anjo Kolk
- Cary Millsap

An Oracle performance community emerges (Oak Table, etc.)

Yet Another Performance Profiling Methodology (YAPP):
Response Time = Service Time + Wait Time

YAPP and Wait interface based tuning


Based primarily on the existence of the wait interface and related Oracle instrumentation. When Oracle sessions wait for a resource (IO, lock, latch, buffer, etc.) they record wait durations in:
- V$SYSTEM_EVENT, V$SESSION_EVENT, V$SESSION_WAIT
- SQL trace files created with the 10046 event

The largest wait categories represent the greatest opportunities for optimization.
Average wait times can reveal contention points.
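As a sketch of both techniques on 10g (column names as documented for the standard V$ views; the WAIT_CLASS filter assumes 10g or later), the largest wait categories can be ranked directly, and a session can be traced with event 10046:

```sql
-- Rank non-idle wait events instance-wide by total time waited.
-- TIME_WAITED_MICRO gives microsecond resolution (10g time model era).
SELECT event,
       total_waits,
       time_waited_micro / 1000000                  AS seconds_waited,
       time_waited_micro / NULLIF(total_waits, 0)   AS avg_wait_usec
  FROM v$system_event
 WHERE wait_class <> 'Idle'
 ORDER BY time_waited_micro DESC;

-- Trace the current session: level 12 includes wait events and binds.
ALTER SESSION SET EVENTS '10046 trace name context forever, level 12';
```

The resulting trace file can then be formatted with tkprof, mentioned earlier.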

The path to enlightenment


Problems to overcome:
- Many waits were/are mysterious
  - Exactly what does "PMON to cleanup pseudo-branches at svc stop time" mean?
- The message had yet to reach all of the community
  - Many were still busy tweaking hit ratios
- Service time, especially CPU time, was hard to measure accurately
  - Oracle's internal CPU counters were notoriously inaccurate
  - No breakdown within CPU time
  - Tuning was less effective for CPU-bound systems
- Response time included components outside of Oracle, especially as client-server gave way to 3/N-tier systems
  - No guarantee that tuning would result in end-user improvement

In 10g, most of the technical issues have been resolved:


- Documentation and external resources (e.g., the WWW)
- The time model
- Millisecond timings
- Extended V$SQL timings

Wait-based tuning (WBT) pitfalls: an example

Single-block (e.g., indexed) I/Os represent 83% of DB elapsed time.

The average time for an IO is about 10x what we would expect.

It would therefore seem reasonable to conclude that there is insufficient IO bandwidth (e.g., too few distinct disk drives) to support the workload.

But before you go upgrading that disk array…


A single missing index can cause huge increases in logical IO demand. This magnifies disk IO demand, internal contention (latches, etc.) and CPU utilization. Furthermore, the IO demand from a missing index can grow faster than hardware upgrades can be applied.

Making a distinction between symptoms and causes


When faced with an obviously IO-bound database, it's tempting to deal with the IO subsystem immediately. However, as often as not, IO problems are symptoms of problems arising at other levels, typically SQL tuning or schema design.

Likewise, eliminating a contention point might actually increase physical IO demand


Lock waits can reduce the demand on the IO subsystem

Tuning SQL might increase transaction rates, which can then result in latch contention.

You are best advised to ignore symptoms in the lower levels until you have optimized the performance of the higher levels.

Interactions between layers

[Diagram: four layers, each passing demand down and results up.]

- Application (application demand): issues SQL/PLSQL calls and receives rows and return codes. This is the application's DB access code (parse, execute, single fetch, array fetch).
- Oracle software (concurrency management): parses SQL, checks security, acquires locks, latches, buffer space, etc., and returns data blocks.
- Memory, SGA and PGA (logical IO): services block reads/writes; caches data blocks for re-use, sorts data as required, creates hash tables for hash joins.
- IO subsystem (physical IO): performs disk reads/writes; reads/writes data from disk devices and returns data/return codes.

What it means (key slide)


Problems in one layer can be caused, or cured, by configuration in a higher layer. The logical steps in Oracle tuning are therefore:

1. Reduce application demand to its logical minimum by tuning SQL, optimizing physical design (partitioning, indexing) and tuning PL/SQL.
2. Maximize concurrency by minimizing contention for locks, latches, buffers and so on in the Oracle code layer.
3. Having normalized logical IO demand by the preceding steps, minimize the resulting physical IO by optimizing Oracle memory.
4. Now that the physical IO demand is realistic, configure the IO subsystem to meet that demand by providing adequate bandwidth and evenly distributing the resulting load.

Examining the time model


High-end tuning tools offer big returns on investment, but basic Oracle instrumentation serves well for most stages. A lot of insight can be gained from the time model combined with the wait interface.

Why it doesn't all add up…


Items in the time model are nested: some categories incorporate times from other categories. Wait time contributes to both DB time and background elapsed time. (Does parse time include CPU time, etc.?)

background elapsed time
    background cpu time
    background wait time
DB time
    DB CPU
    user wait time
    connection management call elapsed time
    sequence load elapsed time
    sql execute elapsed time
    parse time elapsed
        hard parse elapsed time
            hard parse (sharing criteria) elapsed time
                hard parse (bind mismatch) elapsed time
        failed parse elapsed time
            failed parse (out of shared memory) elapsed time
    PL/SQL execution elapsed time
    inbound PL/SQL rpc elapsed time
    PL/SQL compilation elapsed time
    Java execution elapsed time
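The hierarchy above comes straight from V$SYS_TIME_MODEL (per-session figures are in V$SESS_TIME_MODEL); a minimal query, remembering not to sum children into their parents:

```sql
-- Time model statistics; VALUE is in microseconds.
-- Beware double-counting: e.g. "parse time elapsed" is already
-- included in "sql execute elapsed time"'s parent, DB time.
SELECT stat_name,
       ROUND(value / 1000000, 2) AS seconds
  FROM v$sys_time_model
 ORDER BY value DESC;
```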

Reduce the application demand


If you can, tune the application code.
"The best optimized SQL is the SQL you didn't do" (Anjo Kolk)

Reduce the amount of logical IO generated by SQL statements. This in turn reduces CPU and physical IO, which fluctuate less predictably.
- Rewrite SQL, or use stored outlines/profiles
- Index wisely
- Consider physical design changes: partitioning, denormalization

Don't forget other application demand factors:
- Parse CPU consumption (which can also cause latch contention)
- PL/SQL execution time
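To find where logical IO demand originates, a query along these lines against V$SQL (BUFFER_GETS is the logical IO counter; SQL_ID exists from 10g) ranks the tuning candidates:

```sql
-- Top 10 statements by logical IO (buffer gets) since instance startup.
-- High gets-per-execution often points at a missing or unusable index.
SELECT *
  FROM (SELECT sql_id,
               buffer_gets,
               executions,
               ROUND(buffer_gets / NULLIF(executions, 0)) AS gets_per_exec,
               SUBSTR(sql_text, 1, 60)                    AS sql_text
          FROM v$sql
         ORDER BY buffer_gets DESC)
 WHERE ROWNUM <= 10;
```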

Searching for specific plan steps


- Table and index scans might be amenable to better indexing
- We can also look at row counts relative to the size of the object being scanned
- Procedurally, we can even look for nested table scans and other antipatterns, even before they cause problems
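One way to search cached plans for such steps is V$SQL_PLAN (available from 9i; joined to V$SQL here for the statement's logical IO), for example looking for full table scans:

```sql
-- Cached execution plans containing full table scans, ranked by the
-- owning statement's logical IO: candidates for better indexing.
SELECT p.sql_id,
       p.object_owner,
       p.object_name,
       s.buffer_gets,
       s.executions
  FROM v$sql_plan p
  JOIN v$sql s
    ON s.sql_id       = p.sql_id
   AND s.child_number = p.child_number
 WHERE p.operation = 'TABLE ACCESS'
   AND p.options   = 'FULL'
 ORDER BY s.buffer_gets DESC;
```

A similar filter on `p.operation = 'INDEX'` with the relevant OPTIONS values would catch index fast full scans and range scans worth reviewing.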

PL/SQL tuning workflow

Contention: the proverbial bottleneck

Application demand for DB services meets contention for limited or serialized resources, which causes waits and/or queuing. As a result, the apparent demand at lower layers is reduced.

Types of contention:
- Locks: mostly application, but occasionally system (e.g., the ST lock in legacy dictionary-managed tablespaces)
- Latches: often a side effect of excessive application demand (e.g., lack of bind variables), but sometimes the final constraint on DB throughput
- Buffers: buffer cache, redo buffer, etc.; hot blocks (buffer busy waits); slow "lazy writer" processes (DBWR, LGWR, RVWR)
- Sequences
- Redo/archive logs: contention between LGWR and DBWR or ARCH
- Shared servers / parallel query (PQO) servers
- Global Cache (RAC) contention: most significant for hot-block types of contention (cache buffers chains latch, buffer busy)
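Latch contention in particular can be located from V$LATCH; a sketch (standard columns, any recent Oracle version):

```sql
-- Latches ranked by sleeps. A miss means the spin failed on first try;
-- a sleep means spinning gave up entirely, i.e. real contention rather
-- than brief spinning.
SELECT name,
       gets,
       misses,
       sleeps,
       ROUND(misses * 100 / NULLIF(gets, 0), 2) AS miss_pct
  FROM v$latch
 WHERE sleeps > 0
 ORDER BY sleeps DESC;
```

If "library cache" or "shared pool" latches dominate, suspect the bind-variable problem noted above before touching spin parameters.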

Locks

Latch contention & spin count

[Chart: CPU utilization, execution rate, and time waited on latch, each with a best-fit curve, plotted against spin count values from 0 to 20,000.]

IO is increasingly expensive

[Chart (log scale, % change): IO rates have grown far more slowly than disk capacity and CPU speed (figures shown: IO rate 400, disk capacity 2000, CPU 3200; 20 IO/GB; 12.5 IO/CPU), so relative to storage and CPU, physical IO keeps getting more expensive.]

Reducing physical IO

[Diagram: an Oracle session reading from the buffer pools (SGA) and the data (USER) tablespace, and using Program Global Area (PGA) sort and hash areas backed by the temporary tablespace.]

Memory is used to cache data and to perform sorting and hashing (ORDER/GROUP BY, joins). Oracle 10g manages allocations within the PGA and within the SGA well enough; determining the trade-off between these two areas is the most important IO-reduction method. 11g promises to automate this step.

A good buffer cache hit ratio doesn't help much

[Chart: elapsed time (microseconds), broken down into DB CPU, User I/O, Direct I/O and Other, plotted against sort area size (KB).]

A full table scan can only generate so much IO, but a disk sort can generate many times the IO required to read the data from disk. Some of this sort IO appears to be hidden from the wait interface.

Advisories
Advisories exist that estimate the workload impact if a memory area were a different size. In essence, Oracle maintains LRU chains that are longer than the actual allocations.

Key advisories:
- V_$PGA_TARGET_ADVICE
- V_$DB_CACHE_ADVICE
- V_$SHARED_POOL_ADVICE
- V_$SGA_TARGET_ADVICE (10g)

Unfortunately, Oracle suffered from split-brain during implementation:
- The PGA advisory reports in bytes read/written
- The DB cache advisory reports in IOs saved
- The shared pool advisory reports in parse time

When comparing the PGA and SGA advisories, you therefore need to convert them to common units (e.g., wait time).
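The two main advisories can be read side by side like this (columns as documented for the 10g views; converting to common units is then a matter of multiplying estimated IOs by an average read time, and estimated bytes read/written by temp-tablespace throughput, figures you must supply from your own system):

```sql
-- PGA advisory: estimated temp-segment bytes read/written
-- at each candidate PGA_AGGREGATE_TARGET.
SELECT pga_target_for_estimate / 1048576 AS target_mb,
       estd_extra_bytes_rw,
       estd_pga_cache_hit_percentage
  FROM v$pga_target_advice
 ORDER BY pga_target_for_estimate;

-- Buffer cache advisory: estimated physical reads
-- at each candidate default-pool size.
SELECT size_for_estimate   AS cache_mb,
       estd_physical_reads
  FROM v$db_cache_advice
 WHERE name = 'DEFAULT'
   AND advice_status = 'ON'
 ORDER BY size_for_estimate;
```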

IO management
The demand for physical IO should now be about right:
- You've reduced the application's logical IO demand
- You've eliminated contention that might mask that demand
- You've minimized the amount of logical IO that turns into physical IO through effective memory management

Now optimize the IO subsystem for the amount of physical IO you observe:
- Buy enough disks to meet both the IO demand and the storage demand; remember also that sparsely filled disks are more efficient
- Oracle's SAME (Stripe and Mirror Everything) is a good default approach
- Separating log/FRA and datafile IO is arguably the only split you should have to make: either dedicated disks for sequential redo IO, or a separate wide, fine-grained stripe for redo and the FRA
- ASM makes this very easy, but be careful: it is intolerant of poor underlying configurations

ASM makes it very easy


- You only need to provision for the total storage for all files and the total IO demand
- Balancing within the disk group is (partially) automatic
- Redundancy can be implemented within the group
- The DBA gets more control; sysadmin involvement is minimal

BUT:
- Still a version 1.0 technology
- The simplistic algorithm does not work well if the underlying disks have different sizes/characteristics
- Hard to map segments to spindles, especially if implemented on top of a hardware storage appliance
- Auto-rebalance has not been observed
- The ASM instance is itself a point of vulnerability

ASM disk groups

Q&A
