
Redo Configuration for Sustained and Scalable Performance in a Real World Environment


Krishna Manoharan
krishmanoh@gmail.com

1
Background
During routine capacity planning of an Operational Data Store, it was found that large extracts were not scaling as predicted.
Among other findings, the analysis showed elevated log-related waits.
A study was performed to reduce or eliminate these log-related waits.
2
The Environment
Oracle 10.2.0.3 Enterprise Edition 64-bit on Solaris 9.
Database size – 4 TB.
SunFire E4800 with 12 CPUs, 48GB RAM and 2 I/O boards.
Workload – numerous small transactions from replication plus large batch processing through ETL. All operations run in logging mode.
Commit frequency ranges from a commit for every row change (replication) to a commit after a million or more row changes (ETL).
3
Work Load Profile (redo) of the instance (Peak Hours)

Category   Statistic                Value
Session    commit wait requested    72/sec
Session    commit wait performed    72/sec
Session    redo entries generated   13326 entries/sec
Session    redo size                9.45 MB/sec
LGWR       redo synch writes        72 sync writes/sec
LGWR       redo writes              74 writes/sec
LGWR       throughput               9.8 MB/sec
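These per-second figures were derived from snapshots of the cumulative instance statistics. As a minimal sketch (not necessarily how the original numbers were collected), the underlying counters can be sampled with a query such as the one below, run twice over a known interval so the deltas can be turned into rates:

$ sqlplus -s "/ as sysdba" <<'EOF'
select name, value
from   v$sysstat
where  name in ('user commits', 'redo entries', 'redo size',
                'redo synch writes', 'redo writes');
EOF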

4
Objectives of the study
Deliver the predicted scalability by:
Reducing or eliminating log-related waits.
Improving log-related statistics.
Eliminating LGWR/ARCH as a bottleneck.
Measure performance by:
Improved transaction rates.
Meeting pre-defined thresholds for waits and statistics.
Build a standard for an optimal and scalable redo/archive configuration.
This was a performance tuning exercise, not capacity planning.
The scope was limited to configuration only; no code changes.

5
Symptoms from Oracle
Top log related wait statistics (Peak Hours)

Event                                      Waits/sec        Avg wait time    Threshold (avg wait)
latch: redo allocation                     Negligible       Negligible       Eliminate
latch: redo writing                        Negligible       Negligible       Eliminate
log buffer space                           3.5 waits/sec    28.6 ms          Eliminate
log file parallel write                    74 waits/sec     11.8 ms          < 2 ms
log file sequential read                   10 waits/sec     7.02 ms          < 10 ms
log file switch completion                 0.5 waits/sec    20.82 ms         ?
log file sync                              72 waits/sec     26.45 ms         < 5 ms
log file switch (checkpoint incomplete)    0.5 waits/sec    0.38 ms          Eliminate
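These waits are cumulative in the instance views; the per-second and average figures above come from comparing snapshots over the peak window. A sketch of a query that exposes the raw counters (the snapshot/delta arithmetic is omitted):

$ sqlplus -s "/ as sysdba" <<'EOF'
select event, total_waits,
       round(time_waited_micro / nullif(total_waits, 0) / 1000, 2) as avg_wait_ms
from   v$system_event
where  event like 'log%' or event like 'latch: redo%';
EOF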

6
Symptoms from Oracle – contd.
Top log related instance statistics (During peak hours)

Process    Statistic                        Existing                                  Threshold
LGWR       redo sync response time          0.27 ms/sync write                        < 0.1 ms
LGWR       redo write response time         0.12 ms/write                             < 0.1 ms/write
Session    redo buffer allocation retries   0.0002 retries/entry, 3.8 retries/sec     Eliminate
Session    redo log space requests          0.000081 requests/entry, 1 request/sec    Eliminate
Session    redo log space wait time         0.12 ms/wait for space                    < 0.1 ms/wait for space

7
Symptoms from the System
Top system related statistics (During peak hours)

Category   Parameter                                        Before                          Threshold
CPU        Average run queue                                1.2                             No change
CPU        Average involuntary context switches for LGWR    30%                             Eliminate
Storage    Average redo LUN response time                   8 ms (reads), 12 ms (writes)    10 ms (reads), 2 ms (writes)
Storage    Average file response time (redo logs)           8 ms (reads), 12 ms (writes)    10 ms (reads), 2 ms (writes)
Storage    Average redo volume response time                8 ms (reads), 12 ms (writes)    10 ms (reads), 2 ms (writes)
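The system-side figures are the kind of data reported by the standard Solaris tools; a sketch of how they can be gathered during the peak window (the LGWR pid is a placeholder):

# vmstat 5                       (r column – run queue)
# iostat -xn 5                   (asvc_t column – per-LUN service time in ms)
# prstat -mL -p <LGWR pid> 5     (ICX column – involuntary context switches)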

8
Existing Log Configuration (Instance)
No underscore (_) parameters set.
log_buffer – default (observed as 4MB).
Redo log groups – 3.
Size of members – 500M.
log_archive_max_processes – default (observed as 2).
Using Veritas ODM and VxFS.
9
Existing System/Storage Configuration
Default scheduling class – TS with default priorities for the Oracle processes.
Thread affinity set to 150.
Storage Foundation 4.1 MP2 for Oracle Enterprise Edition.
maxphys set to 8M (system and VxVM).
LUN queue depth – 30, with a maximum of 256 per target.
All LUNs – RAID 1 using 72GB, 15K RPM FC drives. Storage – Hitachi AMS1000.
Dual simultaneous active (2Gbit) paths to each LUN, load balanced via vxdmp.
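For context, limits such as maxphys and the per-LUN queue depth typically live in /etc/system or the HBA configuration. The lines below are an illustrative sketch only – the values mirror the ones above, but the exact tunables used in this environment (sd vs. ssd driver, HBA-level settings) are an assumption:

* /etc/system (illustrative sketch)
* 8M maximum physical I/O size
set maxphys=8388608
* per-LUN queue depth of 30
set ssd:ssd_max_throttle=30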

10
Existing Physical Log Configuration (Filesystem)

Filesystem 1 (vxfs): /u05/redo1             Filesystem 2 (vxfs): /u05/redo2
  LOG01A.dbf - 500M (Primary of Group 1)      LOG01B.dbf - 500M (Mirror of Group 1)
  LOG02A.dbf - 500M (Primary of Group 2)      LOG02B.dbf - 500M (Mirror of Group 2)
  LOG03A.dbf - 500M (Primary of Group 3)      LOG03B.dbf - 500M (Mirror of Group 3)

Both filesystems reside on a single 66.4GB RAID 1 LUN built from two 72GB, 15K RPM FC drives.

11
Log Waits - Schematic
[Schematic] User sessions write redo into the private redo strands and the shared log buffer in the SGA (waits: log buffer space, log file sync). LGWR writes the log buffer to the online redo log groups (waits: log file parallel write, log file switch completion). ARCH reads completed log groups and writes the archive log files (waits: log file sequential read, log archive I/O). Both LGWR and ARCH depend on the CPU and I/O subsystems.

12
Analysis of the symptoms
LGWR related
Wait - log file parallel write - “Writing redo records to the redo
log files from the log buffer. The wait time is the time it takes for
the I/Os to complete.”
High average wait time (11.8 ms).
Correlating stats
High LUN response time for the redo log filesystems.
High redo sync time (0.27 ms/sync write).
High redo write time (0.12 ms/write).
LGWR redo synch writes (72 sync writes/sec).
Elevated redo buffer allocation retries (3.8 retries/sec).
High degree of involuntary context switches for the LGWR process.

13
Analysis of the symptoms (contd.)
Wait - log file parallel write (contd.)
With a high rate of commits, it is more important to review the average response time per wait than the number of waits; every commit write wait results in an increment of the log file parallel write event.
The physical layout of the redo log filesystems shows a single LUN used for all the groups. Because redo log members are relatively small, it is common practice for the storage/system administrator to assign a single LUN which is then used for all the redo filesystems. This invariably leads to poor I/O performance and a slower LGWR.
A slower LGWR also results in poor commit performance (sync writes), as evidenced by the correlating stats.
A slower LGWR results in higher buffer allocation retries because LGWR cannot write the redo entries and flush the buffer to disk fast enough to keep up with the sessions.
The overall run queue on the system was low; however, the involuntary context switching (~30%) indicated that LGWR was being switched off the CPU before it could complete its task.
The high LUN response time for the redo log filesystems indicated that I/O was a bottleneck.
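A rough sanity check on these numbers: at 74 redo writes/sec and ~9.8 MB/sec of LGWR throughput, the average write is only about 135 KB, yet it took 11.8 ms to complete – far more than a dedicated mirrored pair of 15K RPM FC drives should need for a sequential write of that size. This supports the conclusion that the shared LUN, rather than the write size, was the bottleneck.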

14
Analysis of the symptoms (contd.)
LGWR related
Wait - log file switch completion – “Waiting for a log switch to
complete”
Wait - log file switch (checkpoint incomplete) – “Waiting for a
log switch because the session cannot wrap into the next log.
Wrapping cannot be performed because the checkpoint for that
log has not completed”
Waits at 0.5 waits/sec with a high average wait time (20.82 ms).
Correlating stats
redo log space requests – 1 request/sec
redo log space wait time – 0.12 ms/wait for space

15
Analysis of the symptoms (contd.)
Wait - log file switch completion (contd.)
During a log file switch, redo generation is disabled, so this wait directly impacts session performance.
The log members were only 500M in size, causing frequent log switches (roughly one per minute), which drives up these waits.
The log_buffer is 4M in size, and during a log switch the log buffer is flushed to disk. If there is an I/O bottleneck on the redo log files, flushing 4M of log buffer can result in higher response times.
Since the redo log groups were on the same set of filesystems, there could be an I/O conflict between the checkpoint and LGWR processes during a log switch, as shown by the wait log file switch (checkpoint incomplete).
However, a bigger log file can also slow down log file switches. The impact of increasing the log member size therefore needs to be studied with respect to the log file switch completion event.
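As a rough check of the switch rate: at the observed redo generation rate of ~9.45 MB/sec, a 500M member fills in roughly 500 / 9.45 ≈ 53 seconds, which matches the observed rate of about one log switch per minute.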

16
Analysis of the symptoms (contd.)
ARCH related
Wait - log file sequential read – “Waiting for the
read from this logfile to return. This is used to read
redo records from the log file – either for recovery or
archival. The wait time is the time it takes to complete
the physical I/O (read)”
High number of waits (10 waits/sec) with
high average wait time (7 ms).
Correlating stats
High LUN response time for the redo log filesystems.
Event – log file parallel write (high average wait time – 11.8 ms)

17
Analysis of the symptoms (contd.)
Wait - log file sequential read (contd.)
Small redo log members cause frequent log switches (one per minute). These logs must be archived, which indirectly drives the log file sequential read event.
Members of the redo groups were located on the same filesystems and shared the same physical LUNs.
This results in I/O contention, because the ARCH process is reading from the previous group while LGWR is writing to the current group.
This in turn hurts LGWR write performance, increasing the response times of the log file parallel write and log file sync events.
Poor archival performance can also indirectly delay log switches, as reported by the event log file switch (archival incomplete), and thus hurt session performance.
For 500M log members, the average read response time was on the high side, again indicating I/O contention.
Since redo log members are relatively small, it is common practice for the storage/system administrator to assign a single LUN which is then used for all the redo filesystems.
Because the access pattern is sequential, the effect is amplified – especially if the LUN is RAID 5.
Increasing log file sizes can also cause this event to report higher wait times.

18
Analysis of the symptoms (contd.)
Session related
Wait - log buffer space – “Waiting for space in
the log buffer because the session is writing data
into the log buffer faster than LGWR can write it
out”
High number of log buffer space waits (3.5
waits/sec) with an average response time of
28.6 ms.
Correlating Stats
Event – log file parallel write (high average
wait time)
redo buffer allocation retries (3.8
retries/sec).

19
Analysis of the symptoms (contd.)
Wait - log buffer space (contd.)
This, along with the high response time for the log file parallel write wait, points to a slow LGWR.
The elevated redo buffer allocation retries corroborate this wait.
It can also mean that the default log buffer (4MB) is too small for the rate of redo generation (9.45 MB/sec).
During a log switch, LGWR flushes the log_buffer to disk, so the impact of increasing the size of the log_buffer needs to be analyzed with respect to the log file switch completion event.

20
Analysis of the symptoms (contd.)
Session related
Wait - log file sync – “When a user session commits,
the session's redo information needs to be flushed to
the redo logfile. The user session will post the LGWR
to write the log buffer to the redo log file. When the
LGWR has finished posting, it will post the user
session. The wait time includes the writing of the log
buffer and the post.”
The average wait time was 26.45 ms.
Correlating Stats
Event – log file parallel write (high average
wait time – 11.8 ms)
High degree of involuntary context switches
for both user session and LGWR.

21
Analysis of the symptoms (contd.)
Wait - log file sync (contd.)
Every commit write wait/immediate results in an increment of this wait counter and a redo write (which in turn increments the log file parallel write counter).
For this event, the average wait time matters more than the number of waits.
Under ideal circumstances, the average wait time for log file sync should be about the same as the average wait time for log file parallel write. A significant gap between the two usually points to a CPU bottleneck for the session.
Higher wait times can therefore be the result of a slow LGWR as well as a CPU bottleneck (evidenced here by the high involuntary context switches for the session processes).

22
Initial Conclusions
From the waits and stats, we came to the following conclusions.
LGWR
The underlying I/O subsystem for the redo logs needed to be improved.
The redo log members needed to be resized from 500M to a more suitable size, and the number of groups increased from 3 to 4.
Reduce LGWR involuntary context switches by addressing OS scheduling.
23
Initial Conclusions (contd.)
ARCH
Separate the redo log groups onto dedicated filesystems to prevent contention between ARCH and LGWR.
Session
Increase the log buffer from the default to a suitable value, taking into consideration the impact on the log file switch completion event.

24
Final Configuration Details
After 30 or so test runs, we arrived at the optimal configuration below.
Redo filesystem configuration (to address the I/O issues)
Striped filesystems on dedicated RAID 1 LUNs were configured for the redo logs, as shown on the next slide.
Filesystem is VxFS with an 8k block size.
Stripe width = 1M.
LGWR configuration (to address the involuntary context switches)
The FX scheduling class was set for the LGWR process; the CPU time quantum was increased to 1000 and the priority set to 59.
# priocntl -s -c FX -m 59 -p 59 -t 1000 -i pid <LGWR process>

Thread affinity was set to 150 for the entire system; however, we decided it was best to bind the LGWR to a specific CPU via a processor set.
# psrset -c <CPU>
# psrset -b 1 <LGWR process>
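To confirm that the class, priority and binding took effect, commands along these lines can be used (the pid lookup is omitted; <LGWR pid> is a placeholder):

# ps -o pid,class,pri,args -p <LGWR pid>     (class should report FX at priority 59)
# priocntl -d -i pid <LGWR pid>              (displays the FX upri and time quantum)
# psrset -q <LGWR pid>                       (shows the processor set the process is bound to)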

25
New Filesystem Layout and redo group placement

Filesystem 1 (vxfs): /u05/redo1 – layout=stripe, stripe width 1M
  LOG01A.dbf - 1500M (Primary of Group 1)
  LOG03A.dbf - 1500M (Primary of Group 3)
Filesystem 2 (vxfs): /u05/redo2 – layout=stripe, stripe width 1M
  LOG02B.dbf - 1500M (Mirror of Group 2)
  LOG04B.dbf - 1500M (Mirror of Group 4)
Filesystem 3 (vxfs): /u05/redo3 – layout=stripe, stripe width 1M
  LOG02A.dbf - 1500M (Primary of Group 2)
  LOG04A.dbf - 1500M (Primary of Group 4)
Filesystem 4 (vxfs): /u05/redo4 – layout=stripe, stripe width 1M
  LOG01B.dbf - 1500M (Mirror of Group 1)
  LOG03B.dbf - 1500M (Mirror of Group 3)

Underlying storage: RAID 1 LUNs of 66.4GB, each built from two 72GB, 15K RPM FC drives.
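A minimal sketch of how one such striped volume and VxFS filesystem might be created with Storage Foundation (the disk group, volume name, size and column count here are illustrative assumptions, not taken from the original environment):

# vxassist -g redodg make redovol1 60g layout=stripe ncol=2 stripeunit=1m
# mkfs -F vxfs -o bsize=8192 /dev/vx/rdsk/redodg/redovol1
# mount -F vxfs /dev/vx/dsk/redodg/redovol1 /u05/redo1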

26
Final Configuration Details (contd.)
Redo groups
4 redo groups were configured with 2 members each.
The log members were placed on the redo filesystems so as to eliminate I/O contention between LGWR and ARCH.
Each member was 8G in size; 8G members reduced the log switches from one per minute to roughly one every 7 minutes.
Reducing log switches improves performance, since redo generation is disabled during a log switch.
8G was an ideal size – log archiving completed within 2 minutes, whereas log switches happened only every 7 minutes.
Increasing the log member size did result in higher wait times for the log file switch completion and log file sequential read events.
However, the overall performance gain was well worth it.
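As a rough cross-check using figures reported elsewhere in this deck: at the observed ARCH throughput of roughly 150 MB/sec, an 8G member can be copied in around a minute, comfortably within both the reported 2-minute archiving window and the ~7-minute switch interval, so archiving never falls behind redo generation.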

27
Final Configuration Details (contd.)
Session
The log buffer was set to 72M (after several iterations).
A 72M log buffer combined with 8G log members did result in a higher response time for the log file switch completion event.
However, we completely eliminated the log buffer space wait event (even when simulating 1.5X load).
72M appears to be an ideal size for a redo generation rate of up to 14MB/sec.
_log_io_size is capped at 1M irrespective of the log_buffer size once the log_buffer crosses roughly 6MB. Since the storage subsystem was quite capable of handling up to 32M in a single write within acceptable response times, we did not reduce _log_io_size.
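For reference, the corresponding parameter setting would look like the following (the byte value is simply 72 x 1024 x 1024; log_buffer is a static parameter, so it only takes effect after an instance restart):

log_buffer = 75497472     # 72M shared log buffer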

28
Final Configuration Details (contd.)
Session (contd.)
Improving the LGWR write performance, however, resulted in redo allocation latch contention.
To reduce the redo allocation latch contention, we increased the parallelism of the shared redo buffer from the default of 2 to 12.
_log_parallelism_max = 12 # Default is 2. Max - Limited to CPU count
_log_parallelism = 4 # Default is 1

With log parallelism enabled, the shared log buffer is split into _log_parallelism_max sections, each protected by its own redo allocation latch.
Per the Oracle documentation, a redo allocation latch for the shared log buffer is assigned to the requestor at random initially and then allocated round-robin. We did notice that this was not an optimal way of assigning latches.
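A simple way to watch the redo latches before and after such a change is to sample v$latch; an illustrative query (the counters are cumulative, so compare two snapshots):

$ sqlplus -s "/ as sysdba" <<'EOF'
select name, gets, misses, sleeps, immediate_gets, immediate_misses
from   v$latch
where  name in ('redo allocation', 'redo writing', 'redo copy');
EOF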

29
Final Configuration Details (contd.)
ARCH
The ARCH process reads in OS-block-sized chunks as governed by the _log_archive_buffer_size parameter.
The default and maximum value on Solaris with Oracle 10g is 2048 OS blocks, which equates to 1MB reads.
So the archive log filesystem was also created as a striped filesystem with a 1MB stripe width.
Performance improved because the redo log filesystems and the archive filesystem were both striped with a 1MB stripe width. Average ARCH throughput was around 150MB/sec.
However, we noticed that the ARCH process reads only from the primary member of a group; it does not read from both members simultaneously.
We did not change log_archive_max_processes from its default (2).
30
Final Results

Peak workload showed an improvement of 7x.
The least improvement was 4x.
At 1.5X load, the scalability was near linear.

31
The results – Work Load Profile (redo)

Category   Statistic                Before                 After
Session    commit wait requested    72/sec                 520/sec
Session    commit wait performed    72/sec                 520/sec
Session    redo entries generated   13326 entries/sec      14677 entries/sec
Session    redo size                9.45 MB/sec            10.1 MB/sec
LGWR       redo synch writes        72 sync writes/sec     520 sync writes/sec
LGWR       redo writes              74 writes/sec          845 writes/sec
LGWR       throughput               9.8 MB/sec             10.75 MB/sec

32
The results – Waits

Event                                      Before (waits/sec, avg wait)    After (waits/sec, avg wait)    Threshold (avg wait)
latch: redo allocation                     Negligible, Negligible          0.002 waits/sec, 0.9 ms        Eliminate
latch: redo writing                        Negligible, Negligible          0 waits/sec, 0 ms              Eliminate
log buffer space                           3.5 waits/sec, 28.6 ms          0 waits/sec, 0 ms              Eliminate
log file parallel write                    74 waits/sec, 11.8 ms           845 waits/sec, 0.55 ms         < 2 ms
log file sequential read                   10 waits/sec, 7.02 ms           10.5 waits/sec, 16.62 ms       ?
log file switch completion                 0.5 waits/sec, 20.82 ms         0.02 waits/sec, 31.5 ms        ?
log file sync                              72 waits/sec, 26.45 ms          519 waits/sec, 2.13 ms         < 5 ms
log file switch (checkpoint incomplete)    0.5 waits/sec, 0.38 ms          0 waits/sec, 0 ms              Eliminate

33
The results – Stats

34
The results – System

Category   Parameter                                        Before                          After                             Threshold
CPU        Average run queue                                1.2                             1.2                               No change
CPU        Average involuntary context switches for LGWR    30%                             < 0.1%                            Eliminate
Storage    Average redo LUN response time                   8 ms (reads), 12 ms (writes)    16 ms (reads), < 1 ms (writes)    10 ms (reads), 2 ms (writes)
Storage    Average file response time (redo logs)           8 ms (reads), 12 ms (writes)    16 ms (reads), < 1 ms (writes)    10 ms (reads), 2 ms (writes)
Storage    Average redo volume response time                8 ms (reads), 12 ms (writes)    16 ms (reads), < 1 ms (writes)    10 ms (reads), 2 ms (writes)

35
Final Thoughts
In descending order of impact on performance:
1. I/O subsystem (50%)
2. Redo group layout and sizing of log members (20%)
3. CPU scheduling (15%)
4. Log buffer (10%)
5. Log parallelism (5%)
The LGWR process in 10g is remarkably efficient and requires minimal tuning; however, it would have been ideal to have a dedicated LGWR for each shared strand.
One can only imagine the performance gain with multiple LGWR processes, each servicing a distinct log buffer.
36
