
Administrating the Database

TUNING I/O FOR BETTER PERFORMANCE -- FOCUSING ON HP AND EMC
Wei Huang, Capital One

INTRODUCTION
Let us consider a scenario. At 2:30 PM, your phone rings, and your support personnel at the Command
Center tell you that the users are experiencing slow performance on your database. At the same time, your
developers call you and ask why GlancePlus shows a warning of 100% I/O bottleneck. Did the I/O
bottleneck cause the application slowdown? Or did the application slowdown cause I/O to be the
bottleneck? What can you do to help them? Maybe your real problem is related to logical I/O and not
physical I/O. If the culprit is physical I/O, how and what can you tune to improve your physical I/O
throughput and obtain optimal performance? This paper is written to answer those questions and share my
experience and knowledge on tuning physical I/O for better performance.
STRATEGIES ON TUNING PHYSICAL I/O PERFORMANCE
The most frequently cited methodology for physical I/O tuning is SAME which stands for Stripe and Mirror
Everything (Loaiza, 2001). The SAME methodology has four rules. First, stripe all files across all disks using
a one-megabyte stripe size. Second, mirror the data against disk failure for high availability. Third, use outer
tracks of the disk for frequently accessed data. Fourth, use partitions rather than disks to subset data. The
SAME paradigm has provided useful guidelines in our tuning practice for both configuring and laying out
data on the disk subsystems. However, tuning I/O for optimal performance requires more than just SAME,
because striping indiscriminately can introduce I/O hot spots and complicate future capacity growth
(Adams, 2001a). Because the size of individual disks has been increasing, using the
SAME methodology tends to waste disk space. SAME can also add to configuration problems for application
failover in an Oracle Parallel Server environment. In addition, the SAME methodology is too simplistic for
database I/O tuning in that there are many factors influencing I/O service time from an application and
Oracle process point of view. In this section, I will discuss the strategies that every DBA needs to know
when tuning I/O. These strategies are:
• Understanding the hardware/OS environment
• Understanding the applications
• Creating optimal physical database layout
• Using appropriate database parameters
• Creating optimal database objects
• Collecting and maintaining I/O statistics

UNDERSTANDING THE HARDWARE/OS ENVIRONMENT


Understanding the hardware and Operating System environment is the first key strategy to achieve better
I/O performance. For example, as DBAs you need to know what kind of disks your databases are going to
be using. Are you using file system or raw devices? Are logical volumes going to be used? What is your logical
volume stripe size? Are you striping inside the storage array? How many controllers will your database reside
on? What RAID configuration do you use? Where is the outer part of the disks on your system? In addition,
you should also understand the basic terminology that your System Administrators use.

Paper #559

First, you need to know the maximum I/O size in your system. In HP UX, MAXPHYS controls the I/O size.
It has been set to 256 KB on HP UX 11.0. If a disk driver sees an I/O request larger than this size, the driver
breaks the request into chunks of MAXPHYS size. In addition, if you are using cooked files (rather than raw
partitions), then you need to know what the chunk size of the file system buffer is. This is determined at the
time your System Administrator creates them, and it is often set to 4K or 8K. It is recommended that the
Oracle database block size matches the chunk size. However, you should not set the database block size
larger than the chunk size, because a single I/O will be broken into pieces, thus increasing the number of
I/Os.
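As a quick sanity check, the splitting behavior can be sketched with simple arithmetic (the 256 KB MAXPHYS value is the HP UX 11.0 default mentioned above; the 1 MB request size is only illustrative):

```shell
# Illustrative sketch: how a disk driver splits an I/O request larger than
# MAXPHYS into MAXPHYS-sized chunks. Values are examples, not measurements.
maxphys=262144                    # 256 KB, the assumed MAXPHYS
request=1048576                   # a hypothetical 1 MB I/O request
chunks=$(( (request + maxphys - 1) / maxphys ))
echo "A ${request}-byte request becomes ${chunks} physical I/Os"
```

For these values the sketch prints that the 1 MB request becomes 4 physical I/Os, which is why oversized requests add to the I/O count rather than to the I/O size.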
Second, you should know how many spindles and controllers you have, not just how much storage capacity
you need for your database. If you have a 130 GB database, but it resides on only two 73 GB disks with one
controller and it also has more than 200 users doing a lot of I/Os, then your database will have I/O
performance problems. To make matters worse, in the name of fully utilizing storage capacity, you may be
surprised to find that your database files have been sharing the same spindles with files belonging to other
databases.
Therefore, each DBA needs to communicate with his/her management, business users, and System
Administrators about performance requirements as well as capacity requirements. DBAs need to write down
the number of disks, the speed of the disks, as well as the size of disks in their project requirement for the
new business initiatives.
Third, you should avoid RAID-5 (or RAID-S if you are using EMC). With RAID-5, data protection is
enforced by Exclusive Or (XOR) Boolean operations. Parity information for each data segment across
physical volumes is written to a separate physical disk. Therefore, RAID-5 uses 25% of the storage capacity
as overhead rather than 50% as in RAID-1 to achieve data protection (three units of data, one unit of parity).
With EMC RAID-S implementation, the EMC Symmetrix also generally addresses a RAID group of four
volumes, also cutting the cost of data protection to 25%. Additionally, EMC Symmetrix RAID-S allows
the XOR calculation to take place at the disk level, instead of in the directors or global cache, thus saving
director/cache cycles and greatly reducing the cost of each I/O. However, in either case, the write
performance penalty is still substantial due to the parity calculation. Therefore, in any environment that
requires high I/O performance, it is preferable to use RAID-1, which simply mirrors all the disks.
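The capacity trade-off described above can be verified with back-of-the-envelope arithmetic, using the 3+1 group from the text:

```shell
# Illustrative sketch: protection overhead of a 3+1 RAID-5/RAID-S group
# versus RAID-1 mirroring. Pure arithmetic, no storage commands involved.
data_units=3
parity_units=1
raid5_pct=$(( 100 * parity_units / (data_units + parity_units) ))
raid1_pct=50                      # a mirror doubles every write
echo "RAID-5 overhead: ${raid5_pct}%, RAID-1 overhead: ${raid1_pct}%"
```

This prints 25% for the 3+1 RAID-5 group against 50% for RAID-1; the point of the section is that the capacity you save comes back as a write penalty.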
Fourth, you should use raw devices rather than file system (cooked files), when write activities are high. I/O
to raw devices is much faster compared to file systems because it bypasses the Unix buffer cache. If you must
use file systems, for example, in the case of archive destination, you should choose a Journaled File System
(JFS), such as Veritas VxFS. VxFS is an extent-based file system, not a block-based file system
as a regular Unix file system is. It also has a direct I/O option. This feature allows for large
I/O operations and multipass writes, which provide more efficient access to the underlying disks. However, it
is still subject to inode contention. Using raw devices also allows you to enable asynchronous I/O on your
system.
Fifth, you should use asynchronous I/O and check its configuration. Enabling asynchronous I/O involves
creating the /dev/async character device and configuring the asynchronous driver in the Unix kernel. In
addition, the HP UX kernel parameter max_async_ports needs to be set to the maximum number of processes
allowed to use asynchronous I/O. If max_async_ports is reached, subsequent processes will use
synchronous I/O. You can use lsof|grep async to see whether your database writer process is using
asynchronous I/O.

ora_arc2_ 1886 oracle 9u VCHR 101,0 0t176892 1258 /dev/async
ora_arc2_ 1886 oracle 15u VCHR 101,0 0t186236 1258 /dev/async
ora_pmon_ 5615 oracle 9u VCHR 101,0 0t0 1258 /dev/async
ora_pmon_ 5615 oracle 13u VCHR 101,0 0t0 1258 /dev/async
ora_dbw0_ 5631 oracle 9u VCHR 101,0 0x6df0e0e04 1258 /dev/async
ora_dbw0_ 5631 oracle 13u VCHR 101,0 0t0 1258 /dev/async
ora_lgwr_ 5633 oracle 9u VCHR 101,0 0xdde449d4 1258 /dev/async
ora_lgwr_ 5633 oracle 13u VCHR 101,0 0t0 1258 /dev/async
ora_ckpt_ 5635 oracle 9u VCHR 101,0 0t68163308 1258 /dev/async
ora_ckpt_ 5635 oracle 13u VCHR 101,0 0t0 1258 /dev/async


Sixth, you should use a Logical Volume Manager, such as HP LVM, to construct Logical Volumes across
several disks. Striping across disks improves I/O performance and allows load balancing across disks and
controllers. You can use vgdisplay, lvdisplay and pvdisplay to view volume group, logical volume, and
physical volume information, respectively.
Finally, you should know the unique configuration of the I/O subsystems that each vendor provides, such as
EMC. Ignorance of those unique configurations will cause you to miss good tuning opportunities. I will
briefly discuss the EMC configuration and its meta volume and CacheStorm™ features. For further reading,
please refer to the EMC web site, EMC training materials, and/or Pearce (2001).
GENERAL EMC SYMMETRIX ARCHITECTURE
Figure 1 shows the general EMC Symmetrix architecture. The front-end directors usually include Fibre
and/or SCSI Channel adapters. They connect hosts and move buffered I/Os to and from the Symmetrix
Cache. These directors each contain two microprocessors that provide high I/O throughput. All read and
write data goes through the Symmetrix cache before reaching the physical disks. Symmetrix cache uses Least
Recently Used (LRU) algorithms and Prefetch algorithms to determine data access patterns and maintain data
availability. Based on data access patterns, the prefetch algorithms move data blocks from disks to cache in
anticipation of reads, avoiding read misses. The back-end disk directors interface cache and physical disk
storage, moving all read data from disk to cache, performing prefetch operations, and destaging write data to
disk storage. Disk directors adjust track counts up to 12 tracks for sequential I/O. Each disk director also
contains two microprocessors to handle all I/O operations.

Figure 1 EMC Architecture

Figure 2 shows the structure of hyper volumes, the basic unit of EMC Symmetrix storage. Each hyper
volume is a logical split of a physical disk. For example, a 36GB disk can be divided into four 9GB hyper
volumes. The host views a hyper volume as a physical disk. The maximum size of a hyper volume is 16 GB.
An EMC physical disk can contain a maximum of 32 hyper volumes. An EMC frame can contain a
maximum of 4096 hyper volumes.
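The hyper volume arithmetic can be sketched as follows; the 36 GB disk and 9 GB hyper size come from the example above, and the quoted Symmetrix limits (16 GB per hyper, 32 per disk, 4096 per frame) bound the result:

```shell
# Illustrative sketch: carving a physical disk into hyper volumes.
# The disk and hyper sizes are the example figures from the text.
disk_gb=36
hyper_gb=9
hypers_per_disk=$(( disk_gb / hyper_gb ))
echo "A ${disk_gb} GB disk yields ${hypers_per_disk} hyper volumes of ${hyper_gb} GB"
```

For the 36 GB example this yields 4 hyper volumes, well within the 32-per-disk limit.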


Figure 2 EMC Hyper volumes

There are a number of advantages in using EMC Symmetrix. The first is a cache configuration that allows
Symmetrix reads and writes to occur at memory speed rather than disk speed. Similar to the Oracle buffer cache,
the Least Recently Used (LRU) algorithm is used to ensure that only pages of data that have been used
recently are kept in cache. With EMC microcode level 5568, the maximum global cache size increases from
32 GB to 64 GB; this increase results in a higher cache hit ratio and a greater performance gain. In addition,
the Symmetrix Enginuity Quality of Service (QoS) operating environment allows customers to assign the
quantity of cache along with its LRU to different database applications/LUNs. The latest CacheStorm™
technology can partition the Global Cache Director into a maximum of 16 separately addressable cache
regions and allocate different amounts of cache to groups of disks. This technology can reduce the probability
of cache contention for one region down to 6%.
The second advantage is EMC PowerPath implementation. PowerPath is an EMC host-based software
offering that allows a maximum of 32 I/O paths from the host to Symmetrix channel directors, automatically
balancing I/O requests among these paths. Without PowerPath, a host accesses each disk resource via a
single I/O path between host and I/O adapters (e.g., SCSI). Significant imbalance among the paths results in
sporadic application slowdown. PowerPath automatically rebalances the I/O requests across all available
paths to the adapters. In addition, PowerPath enhances high availability in standalone or cluster
environments. PowerPath directs I/O to an alternative path at the time of path failure, thus preventing server
failures in standalone environment and node failovers in clustered environment.
The third advantage of EMC Symmetrix is its meta volume addressing (Rarich, 2001). Symmetrix allows the
concatenation of hyper volumes up to 4 terabytes. Figure 3 shows a meta volume of 32 GB with 4 hyper
volumes, each 8 GB in size. With microcode 5265 or higher, Symmetrix allows striping across the hyper
volumes that comprise a meta volume. The minimum stripe size on Symmetrix meta volumes is 960 K. The
host regards each meta volume as a single disk. Thus, you can use LVM to create logical volume groups
spanning multiple meta volumes for better I/O distribution. For example, you can create one meta volume
of 32 GB with eight hyper volumes of 4 GB each and then use LVM to create a 128 GB volume group
across 4 different meta volumes. As a result, any data file created in this volume group will have 32 spindles,
spread across the entire Symmetrix back end, to support its I/O operations. This will greatly reduce chances
of I/O hot spots. Using meta volumes also automatically allows for better I/O queuing on Symmetrix hyper
volumes.
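The spindle arithmetic behind this example can be sketched as:

```shell
# Illustrative sketch: spindles behind a volume group built from striped
# meta volumes (4 metas of eight 4 GB hypers, as in the example above).
metas=4
hypers_per_meta=8
hyper_gb=4
vg_gb=$(( metas * hypers_per_meta * hyper_gb ))
spindles=$(( metas * hypers_per_meta ))
echo "${vg_gb} GB volume group, ${spindles} spindles behind it"
```

The 128 GB volume group ends up backed by 32 spindles, which is the whole point: striped meta volumes multiply the spindle count behind every data file.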


Figure 3 EMC Striped Meta Volumes

UNDERSTANDING THE APPLICATIONS


Understanding applications as well as the business purpose of those applications is critical to achieving
proactive tuning for better I/O performance on your system. DBAs focusing only on hardware or Oracle
internals cannot solve I/O problems successfully. Tuning I/O problems as well as other database
performance issues has to start with an understanding of the purpose of the applications and their
performance requirements. DBAs need to convince business partners that performance is as important as
storage capacity in the capacity planning and budgeting process. DBAs also need to ensure that the SQL
statements in the applications are optimized for high performance. Removing bad SQL that performs
unneeded I/O yields better performance.
Understanding the applications for better I/O performance really means that DBAs know:
1. the business
2. what the applications are trying to achieve
3. the physical data model
4. the access path
5. the access pattern
6. access/transaction volume
7. any application changes
8. the interactions among the applications running on the same system
9. batch scheduling


Obtaining all this information requires a collaborative effort from DBAs and developers and sometimes
involves business analysts and data modelers. How to obtain it is beyond the scope of this article.
However, for successful tuning, DBAs should maintain a repository of access path
and pattern for each table. A form such as the following makes proactive maintenance and tuning more
productive and efficient.

Table Name: Employee


Table Type: Normal hash table
Purpose: List employee information for the company
Number of Initial Rows: 2000
Number of Rows after 3 months: 2300
Bytes per Row: 100
Access Type: Insert, update, delete, select
DML Activity: 100 DML per day
Peak DML Time: 2:00PM to 4:00PM
Batch Schedule: 8:00PM to 8:30PM export
Query Activity: 2000 select per day
Peak Query Time: 9:00AM to 4:00PM
Access Methods: Index range scan
Interaction with Other Databases: Six database links with only select privilege
Concurrency: Low
Primary Key: Employee_id
Index: Last_name, First_name

If you have not done any of the proactive steps, the most often used reactive method you can use is to find
the queries that have high physical reads. The following query on Steven Adams’ web site can give you some
hints on which queries are worth the tuning effort (assuming TIMED_STATISTICS = TRUE in all examples)
(Adams, 2001b).
select
substr(to_char(s.pct, '99.00'), 2) || '%' load,
s.executions executes,
p.sql_text
from
(
select
address,
disk_reads,
executions,

pct,
rank() over (order by disk_reads desc) ranking
from
(
select
address,
disk_reads,
executions,
100 * ratio_to_report(disk_reads) over () pct
from
sys.v$sql
where
command_type != 47
)
where
disk_reads > 50 * executions
) s,
sys.v$sqltext p
where
s.ranking <= 5 and
p.address = s.address
order by
1, s.address, p.piece;

Oracle also offers a powerful tool, STATSPACK, for SQL tuning. Since you can set the snapshot interval,
STATSPACK is especially useful for monitoring and identifying SQL statements with high physical disk
reads during the problematic period. To do this, you have to set the snapshot level to 5 or higher.
However, for real-time troubleshooting, you have to use SQL tracing at a higher level. For example, you can
ask your user to issue the following command before his/her actual statement (Oracle, 2001).
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 12';
With level 12, the trace file will include both bind variable and wait event information. If you want to trace
other sessions, you can use the following methods.
1. Find sid, serial#, and use DBMS_SYSTEM.SET_SQL_TRACE_IN_SESSION (SID,
SERIAL#,TRUE). This will not provide wait and bind variable information though.
2. Find sid, serial#, and use DBMS_SUPPORT.START_TRACE_IN_SESSION (<SID>,
<SERIAL#>, waits=>TRUE, binds=>TRUE). You can also issue
DBMS_SUPPORT.STOP_TRACE_IN_SESSION(<SID>, <SERIAL#>) to stop tracing.
3. If you know the Unix process id, you can find the spid from v$process by using following query.
SELECT P.SPID FROM V$PROCESS P, V$SESSION S
WHERE S.PROCESS = <Unix Process Id> AND P.ADDR = S.PADDR;
Then, use ORADEBUG to set trace for the session.
SVRMGRL> CONNECT INTERNAL
SVRMGRL> ORADEBUG SETOSPID <Process Id>
SVRMGRL> ORADEBUG EVENT 10046 TRACE NAME CONTEXT FOREVER,LEVEL 12

Example: The following listing shows a segment of trace information by level 12.
WAIT #1: nam='db file sequential read' ela= 4 p1=261 p2=233717 p3=1
WAIT #1: nam='db file sequential read' ela= 7 p1=261 p2=233715 p3=1
WAIT #1: nam='db file sequential read' ela= 1 p1=261 p2=233605 p3=1
WAIT #1: nam='db file sequential read' ela= 7 p1=261 p2=233676 p3=1

WAIT #1: nam='db file sequential read' ela= 4 p1=261 p2=1272 p3=1
WAIT #1: nam='db file sequential read' ela= 1 p1=261 p2=233666 p3=1
WAIT #1: nam='db file sequential read' ela= 8 p1=261 p2=1200 p3=1
WAIT #1: nam='db file sequential read' ela= 3 p1=261 p2=1264 p3=1
WAIT #1: nam='db file sequential read' ela= 0 p1=261 p2=233734 p3=1
WAIT #1: nam='db file sequential read' ela= 0 p1=261 p2=1292 p3=1
WAIT #1: nam='db file sequential read' ela= 3 p1=261 p2=233650 p3=1

The trace file will reveal what the process has been waiting on. For I/O performance tuning, look for the
wait events 'db file scattered read' and 'db file sequential read'. Check how long, on
which files, and on which blocks the process has been waiting. If the I/O waits are too long, then you
may have an I/O hot spot on those files and need to redistribute the I/O load or rebuild the tables or
indexes. You may also consider rewriting the SQL statement to change its access path and see whether
you can avoid the I/O contention.
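A quick way to total the I/O waits in such a trace file is a small awk filter. This is a sketch, not part of any Oracle tool; note that the units of ela depend on the Oracle version (centiseconds in Oracle 8i, microseconds in 9i and later), and the here-document below simply stands in for a real trace file using lines from the excerpt above:

```shell
# Sketch: count and total the 'db file ... read' waits in a 10046 trace.
# Replace the here-document with the actual trace file in practice.
awk -F"ela= " '/db file (sequential|scattered) read/ {
    split($2, f, " ")             # f[1] is the elapsed value after "ela= "
    total += f[1]; n++
}
END { printf "%d I/O waits, total ela=%d\n", n, total }' <<'EOF'
WAIT #1: nam='db file sequential read' ela= 4 p1=261 p2=233717 p3=1
WAIT #1: nam='db file sequential read' ela= 7 p1=261 p2=233715 p3=1
WAIT #1: nam='db file sequential read' ela= 1 p1=261 p2=233605 p3=1
EOF
```

For the three sample lines this prints "3 I/O waits, total ela=12"; a per-file breakdown can be added by also keying the totals on the p1 (file number) field.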

CREATE OPTIMAL PHYSICAL DATABASE LAYOUT


There are a number of well-documented guidelines on how to create optimal physical database layout
(Aronoff, Loney, & Sonawalla, 1997; Loney, 1998; Millsap, 1995; Niemiec, 1999). Their philosophy is to
separate database files on dedicated disks based on their functionality and I/O access patterns, thus,
eliminating I/O contention. However, since Oracle database requires one file each for system, data, index,
temporary, and rollback tablespaces, and two files for redo logs, the minimum number of required physical
disks is seven for maximum availability and recovery capability. Given that physical disks are growing
larger and larger, this kind of isolation is becoming more difficult. Furthermore, this kind of
separation can cause I/O imbalance on some disks. The philosophy of the SAME methodology is opposite
to the idea of separation. However, it assumes a large number of spindles to eliminate the overhead caused by
the locality of reference, especially when sequential I/O and random I/O are mixed together. Therefore,
there is no absolute ideal database layout. The goal is to create an optimal layout for each situation. There are
some basic rules to achieve this goal.
1. Prevent different databases from using the same disks and controllers.
2. Follow Optimal Flexible Architecture (OFA) standard religiously. If you do not know all the rules
required by OFA, read Millsap (1995).
3. Know how many and what type of I/O requests there are before you decide to spread or separate
database files. If the database is read intensive, the SAME makes more sense. When DML is dominant,
DBWR, LGWR, and ARCH have to work, and the game changes and starts favoring separation of
database files.
4. Stripe across all disks, instead of separating files, if you have a limited number of disks (e.g., fewer than 10).
5. Consider controllers as well as disks, when separating files.
6. Separate redo logs from the rest of the database files, especially the archive destination if the database is
in archivelog mode. This is not only for the sake of I/O performance, but also for the purpose of
availability and recoverability.
7. Separate batch and online application onto different disks and let them use different controllers. Spread
out batch space as much as possible.
8. Separate data, index, rollback, and redo logs, if transaction rate is high.


9. Place redo log members on equally fast disks, because the faster members always have to wait for the
slower ones. Try to use the outer section of disks (e.g., the first hyper volume of each disk in EMC
Symmetrix) for redo logs.

USE APPROPRIATE DATABASE PARAMETERS


There are a number of init.ora parameters that can be tuned to achieve better I/O performance. This paper
discusses a few of them because of their importance and effectiveness. However, tuning those parameters is a
non-trivial task.
The first set of parameters is related to DBWR process. The DBWR is responsible for writing dirty blocks
from the database buffer cache to the data files, keeping the buffer cache available for new data blocks
(Vengurlekar, 1998). DBWR always writes dirty buffers in batches. One goal of tuning the DBWR process is to
minimize write time. One way to achieve this goal and increase DBWR throughput is to make sure that its
I/O requests are distributed and writes can be carried out in parallel (Alomari, 1999). As we will discuss in
the last section, EMC meta volume can help us tremendously. In addition, enabling asynchronous I/O will
also minimize DBWR wait time, because it allows the write process to continue in parallel without blocking.
However, if AIO is not available, Oracle7 and Oracle8 allow the writing portion of DBWR to be set up in
parallel by using multiple DB_WRITERS / DBWR_IO_SLAVES respectively.
In Oracle 8 or higher versions, setting DB_WRITER_PROCESSES enables both the gathering and
writing of buffers in parallel, producing higher throughput than I/O slaves do. However,
DB_WRITER_PROCESSES and DBWR_IO_SLAVES are mutually exclusive in Oracle 8 and higher versions.
Because each writer process is assigned to an LRU latch set, the value of DB_WRITER_PROCESSES should
be less than or equal to the number of LRU latches set by the DB_BLOCK_LRU_LATCHES parameter. The
value of DB_WRITER_PROCESSES should also not exceed the number of CPUs on the system.
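The two ceilings just described combine into a simple rule of thumb; the CPU and latch counts below are only example values:

```shell
# Sketch: DB_WRITER_PROCESSES should not exceed either the number of CPUs
# or the number of LRU latches (DB_BLOCK_LRU_LATCHES). Example values only.
cpus=8
lru_latches=4
max_writers=$(( cpus < lru_latches ? cpus : lru_latches ))
echo "DB_WRITER_PROCESSES should be at most ${max_writers}"
```

With 8 CPUs and 4 LRU latches, the latch count is the binding limit, so 4 is the most that would be useful here.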
In Oracle 8, careful analysis of the dirty buffer queue length ((summed dirty queue length) / (write requests))
can sometimes lead DBAs to change the value of DB_BLOCK_WRITE_BATCH. Increasing this parameter
will potentially reduce the activity of the DBWR process. Similarly, DB_BLOCK_CHECKPOINT_BATCH
defines the maximum number of blocks, within one batch written by the database writer process, that are
devoted to checkpoints. Careful analysis of checkpoint activity can help DBAs set
DB_BLOCK_CHECKPOINT_BATCH to an optimal value, which can prevent the I/O system from being
flooded with checkpoint writes and allows other modified blocks to be written to disk. However, both
parameters are obsolete in Oracle 8i.
SORT_AREA_SIZE is another important parameter that defines how much memory a session can use for
its sorting operations. In general, any sorting on disk should be avoided. Analysis of system-wide statistics on
disk sort can provide hints on the value of this parameter. Furthermore, DBAs need to estimate sort size of
important jobs and set individual values for each job. Similarly, HASH_AREA_SIZE needs to be analyzed
for each important job and system-wide.
Another important parameter is DB_FILE_MULTIBLOCK_READ_COUNT. This parameter defines the
maximum number of blocks read in one physical I/O during a full table scan. Setting this parameter can reduce
the number of I/O operations. However, the value of this parameter can change the optimizer’s choice in
terms of index scan or full-table scan. Furthermore, the maximum value of this parameter is always
constrained by the maximum I/O size allowed on the OS level (i.e., (MAX I/O size) / DB_BLOCK_SIZE).
Similarly, the parameter DB_FILE_DIRECT_IO_COUNT defines the maximum number of blocks to be
used for one I/O operation done by direct read operations such as parallel full-table scan and backup and
restore situations. In the sorting operations, setting SORT_MULTIBLOCK_READ_COUNT and
HASH_MULTIBLOCK_IO_COUNT to optimal values based on requirements for each session can reduce
the number of I/O operations.
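The OS-level ceiling on DB_FILE_MULTIBLOCK_READ_COUNT works out as follows, assuming the 256 KB MAXPHYS and an 8 KB block size, both figures used earlier in this paper:

```shell
# Sketch: (MAX I/O size) / DB_BLOCK_SIZE caps the multiblock read count.
max_io_kb=256                     # assumed MAXPHYS, in KB
db_block_kb=8                     # assumed DB_BLOCK_SIZE, in KB
mbrc_cap=$(( max_io_kb / db_block_kb ))
echo "DB_FILE_MULTIBLOCK_READ_COUNT is capped at ${mbrc_cap}"
```

Under these assumptions the cap is 32; values above that simply cannot be honored in a single physical read.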


CREATE OPTIMAL DATABASE OBJECTS


Creating optimal database objects has a lot to do with understanding applications. Here is a list of tips and
techniques that can help you to build the objects that minimize physical I/O and increase performance.
1. Place tables and indexes on multiple disks. Placing all tables/indexes on one disk will cause I/O
imbalance and contention.
2. Avoid fragmentation in tablespaces by using the Simple Algorithm for Fragmentation Elimination (SAFE)
(Himatsingka & Loaiza, 1998). SAFE requires that the storage parameters be set at the tablespace level,
not at the segment level. INITIAL is set to a uniform size, and it is always equal to NEXT.
PCTINCREASE is set to zero. Contiguous extents allow large I/Os.
3. Organize segments based on their I/O pattern and activity, such as sequential I/O and random I/O and
put them in separate tablespaces.
4. Place mission-critical segments into their own dedicated tablespaces to allow better I/O analysis and
maintenance.
5. Create all data files in a database the same size for better analysis of I/O data. This tip coupled with a
standard naming convention of volume groups makes tuning much easier, in that each volume group
represents a distinct set of disks/logical/hyper volumes and load balancing is a matter of finding a less
busy volume group and moving data there.
6. Set all extents to a multiple of the multiblock read size.
7. For parallel I/O scan, set more than one extent for the tables.
8. Consider setting parallel degrees in the parallel query to less than the number of disks that tables are
residing on. If you do not have enough disk bandwidth, try to reduce the parallel slaves in your system.
9. Avoid row migration by setting sufficient PCTFREE. Row migration causes additional I/O for index
reads.
10. Set enough FREELISTS, if there will be many concurrent inserts taking place.
11. Use partitions to distribute I/O and spread out the data, especially when tables grow larger.
12. Use Index Organized Tables (IOT), index clustered tables, and hash clustered tables when applications
and system capacity permit.
In summary, creating optimal database objects is related to the purpose of the applications and the nature of
the data. Access patterns and volumes are key factors in determining how the objects should be created.
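Tip 6 above (extents as a multiple of the multiblock read size) can be checked with a one-liner; the 10 MB extent below is a hypothetical example, as are the block size and read count:

```shell
# Sketch: verify that a uniform extent size is a multiple of the multiblock
# read size (DB_BLOCK_SIZE * DB_FILE_MULTIBLOCK_READ_COUNT). Example values.
block_kb=8
mbrc=32
extent_kb=10240                   # a hypothetical 10 MB uniform extent
read_kb=$(( block_kb * mbrc ))
if [ $(( extent_kb % read_kb )) -eq 0 ]; then
    echo "${extent_kb} KB extents align with ${read_kb} KB multiblock reads"
fi
```

Here a 10 MB extent divides evenly into 256 KB multiblock reads, so no read has to straddle an extent boundary.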

COLLECTING AND MAINTAINING I/O STATISTICS


There are a number of tools that can be used to collect I/O statistics at the Unix, EMC, and Oracle database
levels. One of the most popular tools for collecting statistics at the Unix level is sar. Using sar -d, one can
get a disk activity report that includes important information such as the device name, the percentage of time
that the device was busy (%busy), the average queue length for that device (avque), and the average
service time for that device (avserv). Figure 6 shows a portion of a disk I/O report taken at 10-second
intervals, 5 times.
$ sar -d 10 5
HP-UX xyz B.11.00 E 9000/800 01/04/02

23:09:42 device %busy avque r+w/s blks/s avwait avserv
23:09:52 c0t6d0 78.78 0.50 64 640 5.10 35.62
         c2t6d0 67.77 0.50 57 644 4.91 32.05
         c5t2d4 0.80 0.50 13 100 5.00 0.83
         c5t2d5 2.00 0.50 14 183 5.20 1.01
         c5t2d6 0.30 0.50 9 94 5.14 0.67
         c5t2d7 1.40 0.50 10 93 5.57 0.91
         c3t0d0 0.40 0.50 1 11 4.94 3.83
         c3t0d1 0.10 0.50 0 3 5.01 8.39
         c3t0d2 0.90 0.50 1 21 6.52 4.36
         c5t2d0 13.01 0.50 35 717 4.72 4.57

Since sar output can be large, this report should be run often and archived for future analysis. In general, if
the report repeatedly shows a high number of waits (e.g., greater than the number of CPUs) and high service
times (e.g., greater than 20 ms) for certain devices, then you need to concentrate on those devices and map
them to their logical volumes.
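Since the archived sar reports are plain text, flagging the problem devices can be scripted. The awk sketch below keeps only device rows whose avserv exceeds 20 ms; the sample rows are taken from the listing above with the timestamp column removed, and a real report would need its header lines handled as well:

```shell
# Sketch: filter archived sar -d output for devices with avserv > 20 ms.
# Column 7 is avserv in these timestamp-stripped rows.
awk 'NF == 7 && $7 > 20 { print $1, $7 }' <<'EOF'
c0t6d0 78.78 0.50 64 640 5.10 35.62
c2t6d0 67.77 0.50 57 644 4.91 32.05
c5t2d0 13.01 0.50 35 717 4.72 4.57
EOF
```

For this sample it prints only c0t6d0 and c2t6d0, the two devices whose service times cross the 20 ms threshold.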
Another more powerful tool for identifying possible I/O bottlenecks as well as system performance
problems on HP UX is GlancePlus. It can be run in character mode, with the command, glance. It also
can be run in Motif mode, with the gpm command. A detailed discussion on the features of GlancePlus is
beyond the scope of this paper. However, the following steps can help you to determine where a bottleneck
occurs.
1. After Glance is launched, watch the top section of the screen to see whether Disk Util is
approaching the high 90s (in percent). If it is, go to the next step.
2. Type u. It will give you an I/O report by disk. Look for any disks with a high number in the Qlen
column. Consistently high disk queues indicate an I/O bottleneck.
3. Type S and enter the index (Idx) number of the disk that has a high disk queue. It will return a screen
with information on that disk, such as its logical volume, group name, logical reads/writes and physical
reads/writes.
4. Repeat steps 2 and 3 to find all disks having an I/O bottleneck.
5. You can further identify which logical volumes are on those hot disks (logical volume groups) by typing
v. Those logical volumes with a high number of reads and writes are usually hot spots.
One major disadvantage of Glance is that it does very limited logging of data, which prevents historical
analyses, identifying repeated I/O issues, and verifying the effects of tuning. However, use of HP
MeasureWare and HP PerfView can mitigate this disadvantage. We found that the data collected by
MeasureWare and presented by PerfView is complementary to that collected by Glance and sar.
There are several tools that you can use to collect logical to physical mapping and I/O performance data at
the EMC Symmetrix Level. The first is the symrslv command. This command gives detailed logical to
physical mapping information specific to a disk storage object. For example, issuing the following command
as root on your system will display the mapping information on the data file temp_ts_01_1_02.dbf.
symrslv -file /tdbtuner/rdbm2/oracle/TDBTUNER/datafiles/temp_ts_01_1_02.dbf
Here is the result:
Absolute Path :
/tdbtuner/rdbm2/oracle/TDBTUNER/datafiles/temp_ts_01_1_02.dbf
Resolved Object Type : HP-UX VxFS File
Resolved Object Size : 2000m
Number of Trailing Bytes : 0


Extent byte offset to data : 0
Number of Physical Devices : 4
File System Mount Point : /tdbtuner/rdbm2
File System Device Name : /dev/test_dbvg_01/fs_tdbtuner_rdbm2
Number of Mirrors for object (1):
{
1) Mirror Configuration

Mirror Physical Extents (2002):


{
------------------------------------------------------------------
Sym
Size SymID Dev Offset PPdevName Offset
------------------------------------------------------------------
8k 01233 005 2503m /dev/rdsk/c5t1d4 2503m
1m 01233 004 2536m /dev/rdsk/c5t1d3 2536m
1m 01233 005 2536m /dev/rdsk/c5t1d4 2536m
1m 01233 006 2536m /dev/rdsk/c7t1d0 2536m
1m 01233 007 2536m /dev/rdsk/c7t1d1 2536m
1m 01233 004 2546m /dev/rdsk/c5t1d3 2546m

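A quick way to judge stripe balance from a symrslv listing like the one above is to tally the extent rows per physical device. The awk one-liner below is a sketch using the six sample rows shown; in practice you would pipe the extent rows of the full symrslv output through the same filter.

```shell
# Count symrslv extent rows per physical device (field 5) to see how
# evenly extents are spread; rows are the six shown in the listing above.
rows='8k 01233 005 2503m /dev/rdsk/c5t1d4 2503m
1m 01233 004 2536m /dev/rdsk/c5t1d3 2536m
1m 01233 005 2536m /dev/rdsk/c5t1d4 2536m
1m 01233 006 2536m /dev/rdsk/c7t1d0 2536m
1m 01233 007 2536m /dev/rdsk/c7t1d1 2536m
1m 01233 004 2546m /dev/rdsk/c5t1d3 2546m'

echo "$rows" | awk '{ count[$5]++ } END { for (d in count) print d, count[d] }' | sort
```

A heavily skewed tally would suggest that the striping is not spreading the file evenly across the back-end devices.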
Another tool for use with the Symmetrix is EMC Symmetrix Manager (ECC). It provides real-time data for monitoring and analysis, and also helps you obtain information on the back-end configuration, such as physical volumes, hyper volumes, and how they relate to logical volumes. When coupled with EMC Workload
Analyzer, historical data can be maintained for trending as well as for troubleshooting. However, most of the
data is more useful for System Administrators than for DBAs.
The best tool for the Oracle DBA is EMC DB Tuner, jointly developed by EMC and Precise Software. It provides a graphical user interface and has little impact on production systems. One big advantage is that it helps DBAs associate a particular data file, such as one identified by the script below, with EMC back-end information, such as hyper volumes and disk directors. It also stores historical data on physical I/O as well as the SQL statements for detailed analyses.
At the Oracle database level, you can use the following query on v$filestat to collect I/O statistics. You should also set an I/O threshold to filter out hot files and find out which objects reside in those files. Coupled with your understanding of the applications, you can determine which applications may benefit if you tune the I/O performance.
set linesize 200
col owner format a10
col object_name format a40
col object_type format a15
col read_time format 999.90 heading 'ms/read'
break on name nodup skip 2 on read_time nodup
set pagesize 50
ttitle center 'Datafiles with Poor I/O Rates (reads > 40 ms/read)'

select owner, object_name, object_type, v$datafile.name name,
       ((readtim)*10/phyrds) read_time
from v$datafile, ind$ tab, dba_objects, v$filestat a
where v$datafile.file#=tab.file#
and v$datafile.ts#=tab.ts#
and v$datafile.file#=a.file#
and ((readtim)*10/(phyrds)) > 40
and (phyrds) > 0
and object_id=tab.obj#
union
select owner, object_name, object_type, v$datafile.name name,
       ((readtim)*10/phyrds) read_time
from v$datafile, tab$ tab, dba_objects, v$filestat a
where v$datafile.file#=tab.file#
and v$datafile.ts#=tab.ts#
and object_id=tab.obj#
and v$datafile.file#=a.file#
and ((readtim)*10/(phyrds)) > 100
and (phyrds) > 0;

Alternatively, if you know which logical volume/data file has I/O contention from the data collected
by Glance or sar, then you can use the following SQL to identify what objects are in that data file.
set linesize 132
col owner format a9
col segment_name format a35
col name format a40
break on name nodup skip 2
set pages 1000

select distinct owner, segment_name, segment_type, v.name
from dba_extents, v$datafile v
where file_id = v.file#
and v.name in ('<file_name>')
order by 4,1,2;

Likewise, if you know the tables or indexes accessed by a certain application, you can use the following SQL to identify which data files they are using. With PerfView, Glance, and sar, you can then determine how the application can be improved by reducing I/O contention.
select v.name,e.owner, e.segment_name, e.bytes, e.tablespace_name
from dba_extents e, v$datafile v
where e.file_id = v.file#
and e.segment_name in ('<segment_name>')
and e.owner ='<owner_name>';

A drawback of v$filestat is that its information is cumulative from instance startup, which makes it difficult to use for more fine-grained analyses over a period of time, such as hourly analysis, for mission-critical databases. Oracle UTLBSTAT/ESTAT lets us conduct such time-based analyses of overall database performance, including I/O. You can set up a cron job to run BSTAT/ESTAT periodically to collect statistics at the tablespace and data file levels. In my tuning practice, I changed the SQL in ESTAT to the following, so that the output is more readable.
select file_name, table_space,
phys_blks_rd blks_read, phys_rd_time read_time,
phys_blks_wr blks_wrt, phys_wrt_tim write_time,
((phys_rd_time+phys_wrt_tim)*10)/
decode((phys_blks_rd+phys_blks_wr),0,0.0001,
(phys_blks_rd+phys_blks_wr)) "Access Time"
from stats$files
order by 7;

Here is a sample output:

FILE_NAME       TABLE_SPACE  BLKS_READ  READ_TIME  BLKS_WRT  WRITE_TIME  Access Time
/dev/data1.dbf  DATA_TS            355        357       470        1001        16.46
/dev/data2.dbf  DATA_TS            636        693       942        2317        19.07
From the report generated by BSTAT/ESTAT, you can easily identify the top I/O files and further break down I/O activity by table, because you can identify the tables and indexes that need to be tuned and move them to separate tablespaces and data files. This allows us to employ appropriate techniques to resolve I/O contention on those tables and indexes, such as spreading the I/O load or moving them to other, less busy disks.

TUNING I/O FOR BETTER PERFORMANCE – PUTTING IT ALL TOGETHER

In summary, tuning I/O is not just applying SAME or separating indexes from tables. You have to take a holistic approach: understand the hardware/OS environment, understand the applications, create an optimal physical database layout, create optimal database objects, use appropriate database parameters, and collect and maintain I/O-related statistics. Tuning is an iterative process. Applications change over time, and I/O requirements change with them. Thus, performance needs to be monitored and reviewed consistently. In addition, you should always ensure that the SQL is optimized before you plan any major disk and/or database reorganization to improve performance. When the "Fire" starts, you should not only evaluate your statistics to find the hot spot, but also understand the access pattern and access volume of the applications. Failure to do so will only cause future contention and performance problems. When balancing the I/O distribution, remember to use ALTER TABLE MOVE and ALTER INDEX REBUILD to accomplish quick online reorganization, and include MINEXTENTS in these commands to spread the data across all data files within their tablespace.

ACKNOWLEDGEMENT
I would like to express my appreciation to the following individuals for giving me support and help. Without
their efforts, I would not have been able to complete this paper.
1. Michael Erwin, Practice Manager of Oracle, who has shared with me his knowledge on how to tackle
I/O problems as well as how to tune overall system performance.

2. Bob Ritko, Jim Rayhorn, Ray Spillman, Mark Bortle, and Prasad Sangle, our Unix Administrators, who
have supported me in my tuning efforts and have given me tutoring on hardware and OS concepts.
3. David Dalton, Senior System Engineer of EMC, who answered my questions regarding EMC Symmetrix,
Symmetrix Manager, and EMC DB Tuner.
4. Ching-Yin Fang, Technical Consultant of HP, who helped me to understand HP tools and supported me
in solving I/O-related issues.
5. Prasad Kaggallu and Raju Kotini, my managers, for their managerial support.
6. Steve Adams, Mark Bonanno, Mike Craig, Scott Myers, Stan Nickel, Rich Niemiec, and Bob Ritko for
their technical reviews and comments on the earlier versions of this paper.
7. Stephanie Caswell Schuckers, George Trapp, Mike Henry, and John Atkins, my graduate advisors at the Lane Department of CSEE, West Virginia University, who taught me fairness, kindness, and hard work, as well as knowledge. This paper is dedicated to you.

REFERENCES
Adams, S. (2001a). The Seven Deadly Sins Just Got Worse. Oracle OpenWorld Proceedings.
Adams, S. (2001b). www.ixora.com.au
Alomari, A. (1999). Oracle8 & Unix Performance Tuning. Prentice Hall.
Aronoff, E., Loney, K., & Sonawalla, N. (1997). Advanced Oracle Tuning and Administration. Osborne.
Himatsingka, B., & Loaiza, J. (1998). How to Stop Defragmenting and Start Living: The Definitive Word on
Fragmentation. Oracle Corporation White Paper.
Loaiza, J. (2001). Optimal Storage Configuration Made Easy. technet.oracle.com.
Loney, K. (1998). Oracle8 DBA Handbook. Osborne.
Millsap, C. V. (1995). The OFA Standard – Oracle for Open Systems. Oracle Corporation White Paper.
Niemiec, R. J. (1999). Oracle Performance Tuning: Tips & Techniques. Osborne.
Oracle Corporation. (2001). SQL*Trace – Notes for Application Support Analysis. MetaLink Note 77343.1.
Pearce, B. (2001). Opening the Black Box: A DBA's View of the EMC Symmetrix. IOUG Live-2001 Conference Proceedings.
Rarich, T. (2001). Meta Volumes and Striping. EMC Engineering White Paper.
Vengurlekar, N. (1998). Database Writer and Buffer Management. Oracle Corporation White Paper.

Paper #559
