Each Case Study contains a skill level rating. The rating indicates the skill level the
reader should have in relation to the information in the case study.
Ratings are:
When diagnosing the problem, several diagnostic events were set to highlight which part
of the freelist search algorithm was causing the buffer contention. These events should
not be set unless advised by Oracle Support Services.
Case History
The issue reported by the customer in this case study was:
We have a big 1.4TB table and every day the application inserts around 6 to 9 million
rows.
The row contains a long raw column and is around 3KB. During the time of the inserts
we see lots of sessions waiting on buffer busy waits on the same datablocks.
The customer reported they were carrying out the inserts from a number of concurrent
sessions, and had set the number of process freelists on the table to 23. The table involved
had the following structure:
It was reported that these indexes showed no signs of contention, and that they could
always see one session waiting on a db file sequential read while all the others waited on
buffer busy waits with reason code 120, which indicates the session is waiting for the
block to be read into the buffer cache.
This table had been created some time ago, so the sudden appearance of the buffer busy
waits issue was explained simply by an increase in the volume of data being inserted.
The puzzling thing was that processes are supposed to map to different process freelists
based on the following algorithm:
If no free blocks can be found on the assigned process freelist (PFL) then the process will
search the Master Freelist (MFL) for space. If no space is found, committed Transaction
Freelists are merged into the MFL and those blocks are scanned for usefulness. If still no
space is found the High Water Mark is incremented and the new blocks can be moved to
the PFL and used. The customer was reporting all processes waiting on the same
datablocks whilst carrying out the insert statements, which indicates some problem with
the freelist search mechanism since the multiple process freelists should have mitigated
this kind of contention. This is where the data gathering and analysis begins.
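The search order described above can be condensed into a small simulation. This is a sketch only: the data structures are hypothetical stand-ins, and the real space search lives inside the Oracle kernel and is far more involved.

```python
# Toy model of the free space search described above: PFL, then MFL, then
# merging committed transaction freelists into the MFL, then bumping the HWM.
# All structures here are hypothetical stand-ins for Oracle's internal lists.

def find_free_block(pfl, mfl, tfls, hwm):
    """Return (block, new_hwm) following the documented search order."""
    if pfl:                                  # 1. assigned process freelist
        return pfl.pop(), hwm
    if mfl:                                  # 2. master freelist
        return mfl.pop(), hwm
    for tfl in tfls:                         # 3. merge committed TFLs into the MFL
        if tfl["committed"]:
            mfl.extend(tfl["blocks"])
            tfl["blocks"] = []
    if mfl:
        return mfl.pop(), hwm
    new_blocks = list(range(hwm, hwm + 5))   # 4. bump the high water mark
    pfl.extend(new_blocks)                   #    new blocks land on the PFL
    return pfl.pop(), hwm + 5
```

Note that steps 2-4 all operate on shared structures; only step 1 is private to the process, which is why the later analysis matters.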
The customer's database version used in this case study is 8.1.7.4 on Solaris. However,
this issue could occur on any platform with database versions 8.0 or higher.
Analysis
Summary
A number of different data items were collected to help determine a cause. These
included:
• Statspack reports to show what the largest contention points were for the
database
• I/O statistics from the operating system to see if we were running into bad I/O
times slowing down the freelist search
• Datablock dumps for the segment header and other blocks being waited on to
see how the freelist lists were changing
• Several diagnostic events that dump information to a trace file and show how
the freelist search mechanism was working
Using the collected data it was possible to build a test case that reproduced the
same problem on a multiple-CPU box using concurrent sessions inserting into the
same table. The test case showed that the freelist search mechanism becomes a
single point of contention when there are few rows per block and a high
concurrency level of DML activity. An in-depth review of the collected data is
provided in the next section.
Detailed Analysis
The following sections describe how each piece of data was collected and what it shows.
Once all the data has been described a final cause determination will be presented, tying
everything together.
The statspack report clearly showed massive I/O contention issues:
Looking further down the statspack report, it is clear which tablespace is incurring the
waits:
Ideally, average read times should be at most 10-20ms, so although the DATA01
tablespace is at the upper end of that band, it does not explain why the customer is
reporting large numbers of buffer busy waits. The datafile I/O statistics section of the
statspack reports showed many of the datafiles with waits against them, and four of the
fourteen disks being used for the datafiles showed average read times of 20-25ms.
I/O statistics from the operating system using vxstat also showed the same four disks with
a higher average read time than the rest. The other disks also showed relatively high read
times:
The I/O statistics certainly indicate a performance problem, but they do not explain the
earlier reports of buffer busy waits on the same blocks, which contradicts the expected
way in which process freelists spread the use of datablocks amongst processes.
The customer accepted that the I/O was certainly not helping performance, but wanted to
concentrate on why the process freelists did not appear to be used correctly, so the
investigation moved in that direction.
The buffer busy wait statistics for the class of data block being waited on are shown in
the statspack report:
This indicates we have some hot block issue, so we needed to find out what these hot
blocks are and why they appear so hot.
In order to find the hot block(s), v$session_wait was queried to show which blocks were
being contended for. The results proved interesting, as the particular block being sought
was always changing:
select b.sid,b.username,event,wait_time,p1,p2,p3,b.sql_hash_value,b.status
from v$session_wait a,v$session b
where event not like 'SQL*Net message%'
and event not like 'rdbms%' and a.sid=b.sid
and b.sql_hash_value=4290940428
and b.sid>8
order by sql_hash_value;
SQL> /
SID USERNAME EVENT WAIT_TIME P1 P2 P3
---------- ---------- ------------------ --------- ------ ---------- ----------
16 GALLERY buffer busy waits 0 257 101465 120
44 GALLERY buffer busy waits 0 257 101465 120
86 GALLERY buffer busy waits 0 257 101465 120
104 GALLERY buffer busy waits 0 257 101465 120
147 GALLERY buffer busy waits 0 257 101465 120
179 GALLERY buffer busy waits 0 257 101465 120
200 GALLERY buffer busy waits 0 257 101465 120
226 GALLERY buffer busy waits 0 257 101465 120
254 GALLERY buffer busy waits 0 257 101465 120
316 GALLERY buffer busy waits 0 257 101465 120
313 GALLERY buffer busy waits 4 257 101465 120
292 GALLERY buffer busy waits 0 257 101465 120
184 GALLERY buffer busy waits 0 257 101465 120
164 GALLERY buffer busy waits 0 257 101465 120
111 GALLERY db file sequential read 0 257 101465 1
Note: In v$session_wait, the P1, P2, and P3 columns identify the file number, block
number, and buffer busy reason code, respectively.
We then needed to find out which object these blocks belonged to, so we could dump the
segment header over a period of time to see if the freelists were changing:
Note: Each time we ran this query against the blocks being waited on, they always
belonged to this same table.
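The dictionary query used for this mapping is not reproduced here, but logically it is a range lookup over the segment's extents: find the segment whose extent covers a given (file#, block#) pair from v$session_wait. The extent rows below are invented purely for illustration.

```python
# What the object lookup does in miniature. The segment names and extent
# ranges here are fabricated for illustration only; a real lookup would
# read the data dictionary.

EXTENTS = [
    # (owner.segment, file_id, first_block, block_count) -- hypothetical values
    ("GALLERY.BIG_TABLE", 257, 101000, 1024),
    ("GALLERY.BIG_TABLE_IDX", 258, 2000, 512),
]

def block_to_segment(file_id, block):
    """Return the segment owning file_id/block, or None if no extent matches."""
    for seg, f, first, count in EXTENTS:
        if f == file_id and first <= block < first + count:
            return seg
    return None
```

Fed the P1/P2 values from the query output above (file 257, block 101465), this kind of lookup repeatedly returned the same table, as noted.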
Dumping the segment header showed that the process freelists were being used as
expected, because all the freelist structures were changing over time:
HEADER_FILE HEADER_BLOCK
------------ ------------
14 2
SEG LST:: flg: USED lhd: 0x00000000 ltl: 0x00000000
SEG LST:: flg: USED lhd: 0x09c29f25 ltl: 0x09c29f25
SEG LST:: flg: USED lhd: 0xa5822a00 ltl: 0xa5822a00
SEG LST:: flg: USED lhd: 0x00000000 ltl: 0x00000000
SEG LST:: flg: USED lhd: 0xa0036033 ltl: 0x33038d14
SEG LST:: flg: USED lhd: 0xa1c0d61d ltl: 0x8702d4dc
SEG LST:: flg: USED lhd: 0x9483ab8e ltl: 0x0f83baec
SEG LST:: flg: USED lhd: 0x6ac071bb ltl: 0x61c23e99
SEG LST:: flg: USED lhd: 0xd401cd2c ltl: 0xaa02eabb
SEG LST:: flg: USED lhd: 0x00000000 ltl: 0x00000000
SEG LST:: flg: USED lhd: 0x97432a9a ltl: 0x84c3cced
SEG LST:: flg: USED lhd: 0xa88062fb ltl: 0x9483911c
SEG LST:: flg: USED lhd: 0x79436ef5 ltl: 0x790102af
SEG LST:: flg: USED lhd: 0x00000000 ltl: 0x00000000
SEG LST:: flg: USED lhd: 0x8000b9d1 ltl: 0x1a42881d
SEG LST:: flg: USED lhd: 0x00000000 ltl: 0x00000000
SEG LST:: flg: USED lhd: 0x1cc20fc7 ltl: 0x19005a17 <- Process Freelist #23
XCT LST:: flg: UNUSED lhd: 0x00000000 ltl: 0x00000000 xid:0x0000.000.00000000
... <- Transaction freelists populated when a transaction frees more blocks
XCT LST:: flg: USED lhd: 0x0d03d289 ltl: 0x0d03d289 xid:0x0001.00e.00870924
XCT LST:: flg: USED lhd: 0x2900d4fd ltl: 0x2900d4fd xid:0x0004.05e.006b4288
XCT LST:: flg: USED lhd: 0x1d42e271 ltl: 0x1d42e271 xid:0x0002.052.0086f6c3
XCT LST:: flg: USED lhd: 0xda41e80c ltl: 0xda425b83 xid:0x0004.018.006b42f4
End dump data blocks tsn: 4 file#: 14 minblk 2 maxblk 2
The transaction freelists (TFLs) are emptied when a process cannot find any free blocks
on the MFL (as described earlier). The fact that the TFLs have not been emptied implies
either that the MFL has always managed to supply enough blocks for the requesting
processes, or that all transactions are still uncommitted (which seems unlikely).
The second segment header dump shows 15 out of the 23 process freelists have changed,
including the master freelist. The tail of the master freelist has not changed, but the
head has, which indicates the master freelist had enough free blocks on it to satisfy all
searches within the monitored time period. The fact that many of the freelists are
changing indicates the process freelist assignment is working correctly. What this data
doesn't prove is whether the searching mechanism is working correctly, and it certainly
doesn't highlight any cause for the buffer busy waits issue.
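Comparing consecutive segment header dumps by hand is error-prone; the check performed here (which freelist head/tail pointers moved between snapshots) can be scripted along these lines, assuming the `SEG LST::` line format shown above.

```python
import re

# Extract (lhd, ltl) pointer pairs from dump lines of the form
#   "SEG LST:: flg: USED lhd: 0x... ltl: 0x..."
# and report which freelists changed between two snapshots.
PTRS = re.compile(r"SEG LST::.*lhd: (0x[0-9a-f]+)\s+ltl: (0x[0-9a-f]+)")

def freelist_ptrs(dump):
    return PTRS.findall(dump)

def changed_freelists(before, after):
    """0-based indices of freelists whose head or tail pointer moved."""
    return [i for i, (a, b) in
            enumerate(zip(freelist_ptrs(before), freelist_ptrs(after)))
            if a != b]
```

Running this over the two dumps taken during the monitoring window is how the "15 out of 23 changed" observation above can be reproduced mechanically.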
In order to find out more about what was happening with several of the waiting sessions
during the free space search, we set a few diagnostic events to gather tracing information.
The events that were used:
Note: Setting the freelist tracing events is recommended only under the advice of
Oracle Support. These events can produce a large amount of trace data, so they
should be set only for short periods of time.
The customer was instructed to set the events for three waiting sessions when they saw a
high number of buffer busy waits with the 120 reason code. After tracing for 5 minutes,
all the events were turned off. The PL/SQL code used to enable and disable the events is
listed below.
cursor c2 (wevent NUMBER) is
select sid,serial# from tracing where event=wevent;
BEGIN
if what=10320 -- Freelist tracing
then
if onoff=1 -- Turn freelist tracing ON
then
for rec in c1 loop
dbms_system.set_ev(rec.sid,rec.serial#,10320,1,'');
insert into tracing values (rec.sid, rec.serial#, 10320);
commit;
end loop;
elsif onoff=0 -- Turn freelist tracing OFF
then
for rec in c2(10320) loop
dbms_system.set_ev(rec.sid,rec.serial#,10320,0,'');
delete from tracing where sid=rec.sid and event=10320;
commit;
end loop;
end if;
elsif what=10022 -- Freelist 10022 tracing
then
if onoff=1 -- Turn freelist tracing ON
then
for rec in c2(10320) loop
dbms_system.set_ev(rec.sid,rec.serial#,10022,1,'');
insert into tracing values (rec.sid, rec.serial#, 10022);
commit;
end loop;
elsif onoff=0 -- Turn freelist tracing OFF
then
for rec in c2(10022) loop
dbms_system.set_ev(rec.sid,rec.serial#,10022,0,'');
delete from tracing where sid=rec.sid and event=10022;
commit;
end loop;
end if;
elsif what=10085 -- Freelist 10085 tracing
then
if onoff=1 -- Turn freelist tracing ON
then
for rec in c2(10320) loop
dbms_system.set_ev(rec.sid,rec.serial#,10085,1,'');
insert into tracing values (rec.sid, rec.serial#, 10085);
commit;
end loop;
elsif onoff=0 -- Turn freelist tracing OFF
then
for rec in c2(10085) loop
dbms_system.set_ev(rec.sid,rec.serial#,10085,0,'');
delete from tracing where sid=rec.sid and event=10085;
commit;
end loop;
end if;
elsif what=10080 -- Freelist 10080 tracing
then
if onoff=1 -- Turn freelist tracing ON
then
for rec in c2(10320) loop
dbms_system.set_ev(rec.sid,rec.serial#,10080,1,'');
insert into tracing values (rec.sid, rec.serial#, 10080);
commit;
end loop;
elsif onoff=0 -- Turn freelist tracing OFF
then
for rec in c2(10080) loop
dbms_system.set_ev(rec.sid,rec.serial#,10080,0,'');
delete from tracing where sid=rec.sid and event=10080;
commit;
end loop;
end if;
elsif what=10082 -- Freelist 10082 tracing
then
if onoff=1 -- Turn freelist tracing ON
then
for rec in c2(10320) loop
dbms_system.set_ev(rec.sid,rec.serial#,10082,1,'');
insert into tracing values (rec.sid, rec.serial#, 10082);
commit;
end loop;
elsif onoff=0 -- Turn freelist tracing OFF
then
for rec in c2(10082) loop
dbms_system.set_ev(rec.sid,rec.serial#,10082,0,'');
delete from tracing where sid=rec.sid and event=10082;
commit;
end loop;
end if;
elsif what=10046 -- SQL Trace - SHOULD BE TURNED ON AFTER 10320 TRACE
then
if onoff=1 -- Turn SQL trace ON
then
for rec in c2(10320) loop
dbms_system.set_ev(rec.sid,rec.serial#,10046,12,'');
insert into tracing values (rec.sid, rec.serial#, 10046);
commit;
end loop;
elsif onoff=0 -- Turn SQL trace OFF
then
for rec in c2(10046) loop
dbms_system.set_ev(rec.sid,rec.serial#,10046,0,'');
delete from tracing where sid=rec.sid and event=10046;
commit;
end loop;
end if;
end if;
end trace_freelists;
/
exec trace_freelists(10320,1);
exec dbms_lock.sleep(10); -- need to wait 10secs for next event to work
exec trace_freelists(10022,1);
exec dbms_lock.sleep(10);
exec trace_freelists(10085,1);
exec dbms_lock.sleep(10);
exec trace_freelists(10080,1);
exec dbms_lock.sleep(10);
exec trace_freelists(10082,1);
exec dbms_lock.sleep(10);
exec trace_freelists(10046,1);
Wait 5 minutes then turn off each event in the following order:
exec trace_freelists(10046,0);
exec dbms_lock.sleep(10);
exec trace_freelists(10082,0);
exec dbms_lock.sleep(10);
exec trace_freelists(10080,0);
exec dbms_lock.sleep(10);
exec trace_freelists(10085,0);
exec dbms_lock.sleep(10);
exec trace_freelists(10022,0);
exec dbms_lock.sleep(10);
exec trace_freelists(10320,0);
The trace files generated confirmed that the sessions were assigned different process
freelists and used different blocks for some of the inserts:
Session #1:
*** 2005-07-13 05:56:24.594
KTSGSP: flag = 0xa7, seg free list = 7, tsn = 4 block = 0xb1c3721d
*** 2005-07-13 05:56:24.655
KTSGSP: flag = 0x24, seg free list = 7, tsn = 4 block = 0xca030308
Session #2:
*** 2005-07-13 05:56:11.672
KTSGSP: flag = 0xa7, seg free list = 9, tsn = 4 block = 0xbf02e415
*** 2005-07-13 05:56:11.730
KTSGSP: flag = 0xa7, seg free list = 9, tsn = 4 block = 0xb4c265e7
Session #3:
*** 2005-07-13 05:56:05.362
KTSGSP: flag = 0xa7, seg free list = 20, tsn = 4 block = 0xabc191fa
*** 2005-07-13 05:56:05.369
KTSGSP: flag = 0xa7, seg free list = 20, tsn = 4 block = 0xabc17d9b
But it also showed times when the sessions were checking the same blocks for use:
Session #1:
KDTGSP: seg:0x1c000002 wlk:0 rls:0 options:KTS_EXCHANGE KTS_UNLINK
pdba:0xbf40185d
WAIT #3: nam='buffer busy waits' ela= 2 p1=438 p2=91992 p3=120
WAIT #3: nam='buffer busy waits' ela= 0 p1=429 p2=117294 p3=120
WAIT #3: nam='buffer busy waits' ela= 0 p1=429 p2=117294 p3=120
WAIT #3: nam='buffer busy waits' ela= 1 p1=429 p2=117294 p3=120
... Waits on many different blocks with an occasional db file seq read
WAIT #3: nam='buffer busy waits' ela= 1 p1=839 p2=53567 p3=120 <- Starts
waiting on same blocks here
WAIT #3: nam='buffer busy waits' ela= 0 p1=839 p2=53561 p3=120
WAIT #3: nam='buffer busy waits' ela= 1 p1=832 p2=187797 p3=120
WAIT #3: nam='buffer busy waits' ela= 0 p1=832 p2=187797 p3=220
WAIT #3: nam='buffer busy waits' ela= 1 p1=832 p2=187781 p3=120 <- waiting on
session #2 to read the block
WAIT #3: nam='buffer busy waits' ela= 0 p1=764 p2=65082 p3=120
WAIT #3: nam='buffer busy waits' ela= 2 p1=764 p2=65082 p3=120
WAIT #3: nam='buffer busy waits' ela= 0 p1=758 p2=29094 p3=120
WAIT #3: nam='buffer busy waits' ela= 1 p1=758 p2=29094 p3=120
...
This continues for at least another 30-40 blocks
Session #2:
KDTGSP: seg:0x1c000002 wlk:0 rls:0 options:KTS_EXCHANGE KTS_UNLINK
pdba:0x35429ced
WAIT #3: nam='buffer busy waits' ela= 1 p1=839 p2=53567 p3=120 <- Goes
straight to the MFL here as looking for common blocks
WAIT #3: nam='buffer busy waits' ela= 0 p1=839 p2=53561 p3=120
WAIT #3: nam='buffer busy waits' ela= 0 p1=839 p2=53561 p3=120
WAIT #3: nam='buffer busy waits' ela= 0 p1=839 p2=53561 p3=120
WAIT #3: nam='buffer busy waits' ela= 0 p1=832 p2=187797 p3=120
WAIT #3: nam='buffer busy waits' ela= 0 p1=832 p2=187797 p3=120
WAIT #3: nam='buffer busy waits' ela= 1 p1=832 p2=187797 p3=120
WAIT #3: nam='db file sequential read' ela= 1 p1=832 p2=187781 p3=1 <- first
process to need block so reads in
WAIT #3: nam='buffer busy waits' ela= 0 p1=764 p2=65082 p3=120
WAIT #3: nam='buffer busy waits' ela= 0 p1=764 p2=65082 p3=120
WAIT #3: nam='buffer busy waits' ela= 2 p1=764 p2=65082 p3=120
WAIT #3: nam='buffer busy waits' ela= 0 p1=758 p2=29094 p3=120
WAIT #3: nam='buffer busy waits' ela= 1 p1=758 p2=29094 p3=120
...
This continues for at least another 30-40 blocks
The second trace output shows that session #1 first starts to traverse a set of blocks that
another session (it is not known which) is currently reading from disk. We know this
because it is still waiting on buffer busy waits with a p3 (reason code) value of 120. It
then starts to traverse blocks that session #2 is also trying to read, which indicates that
both sessions are reading from the master freelist at the same time; earlier in the trace
file we saw they are assigned to different process freelists. Because all processes move
to searching the master freelist if no suitable blocks are found on their process freelist,
this highlights a limitation that can occur when several process freelists are empty: a
number of processes end up searching the same master freelist. The severity of this
seemed unexpected, so a bug (4523986) was opened to get some input from Oracle
Development. Development came back with the following thoughts:
The crux of the problem is that all sessions are reading the same five blocks at the same
time (and only one of them will eventually succeed in moving them to their PFL which
means the problem repeats on another five block list for the rest of them).
Your I/O issue is probably making this worse as presumably the cache is running slowly -
the five block lists are read in current shared mode.
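Development's description reduces to a toy model: the MFL is handed out in five-block chunks, every process whose PFL is empty races for the chunk at the head of the list, and only one process wins per round. The numbers below are illustrative, not measurements from the customer system.

```python
# Toy model of the serialization: each 5-block chunk at the head of the MFL
# is won by exactly one process; the other n-1 wait on the same buffers and
# then repeat the race on the next chunk. Numbers are illustrative only.

def mfl_collisions(n_procs, n_chunks):
    """Total 'buffer busy'-style collisions while draining the MFL."""
    waits = 0
    for _ in range(n_chunks):
        waits += n_procs - 1     # losers of this round's race
    return waits
```

With the 14 sessions seen queuing in v$session_wait earlier, every chunk handed out costs 13 near-simultaneous collisions, matching the pattern of all sessions waiting on the same, constantly changing blocks.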
With this input from Development and the diagnostic data we have analyzed, we can
state the cause as the fact that concurrent processes are allowed to traverse and
manipulate the master freelist when looking for space. If the assigned process freelists
are empty of suitable blocks, processes move to searching the master freelist. By default
only one master freelist is created, controlled by the segment header, and this becomes
the new serialized point of contention.
(2) Pinning the segment header when walking the MFL; slightly costly but could be
worked out.
(3) Acquiring the block in CR mode, so that the other processes could do some work
rather than waiting.
A possible workaround we looked at for this problem was to use FREELIST GROUPS
(even in a single instance this can make a difference); I came across a note stating
possible resolutions for high buffer busy waits.
Freelist groups are mapped as follows (in a non-OPS environment, or an OPS
environment running as a single instance):
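The original mapping figure is not reproduced here; a commonly cited form is a simple modulo hash on the Oracle process id, first to a freelist group block and then to a process freelist within it. The modulo arithmetic below is an assumption standing in for the internal kernel hash.

```python
# Hypothetical sketch of the freelist-group mapping in a single-instance
# environment: the process id hashes to a freelist group block, and to a
# process freelist within that block. Modulo arithmetic is an assumed
# stand-in for Oracle's internal hash.

def map_process(pid, n_groups, n_freelists):
    """Return (freelist_group_block, process_freelist) for a process id."""
    return pid % n_groups, pid % n_freelists
```

Under such a mapping, a set of concurrent inserters with FREELIST GROUPS set to 7 spreads across seven group blocks instead of all serializing on the single MFL in the segment header.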
Making RDBMS kernel code changes in 8.1.7.4 would provide a more comprehensive
solution for all tables with this high-concurrency issue, but it would also have further
implications. For example, it may simply move contention from buffer busy waits to
enqueue waits, which might cause a worse bottleneck.
In this customer's case, the suggested workaround of using freelist groups makes perfect
sense. When an object is created it defaults to a FREELIST GROUPS setting of 1, where
all the freelist information is maintained in the segment header, as in this case. When
using a value greater than 1, additional datablocks are created after the segment header,
each storing a master freelist, a number of process freelists, and transaction freelists. The
segment header then contains only a single master freelist, the master of all freelists if
you like. Freelist groups were originally designed for use with OPS (Oracle Parallel
Server): each instance is assigned a different freelist group so that processes connecting
to one instance do not interfere with free space searching on another instance, which
would cause blocks to ping between them. Within a single-instance environment, using
freelist groups can still provide some benefit because each process will be assigned to a
different master freelist and set of process freelists. Increasing freelist groups beyond 1
reduces contention on the single master freelist, as processes will now search the master
freelist in their assigned freelist group block. A process will only search the MFL in the
segment header if no space can be found in its freelist group block.
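With freelist groups the search therefore becomes two-level, and the segment header MFL is touched only as a last resort. A sketch of that ordering, again with hypothetical list structures (the real search also involves TFL merges and HWM moves):

```python
# Two-level search with freelist groups: a process looks in its own group
# block (its PFL, then that group's MFL) and only falls back to the MFL in
# the segment header when its group block has no space. Hypothetical
# structures; TFL merging and HWM bumps are omitted for brevity.

def find_space(group_pfl, group_mfl, seg_header_mfl):
    if group_pfl:
        return group_pfl.pop()          # own process freelist
    if group_mfl:
        return group_mfl.pop()          # master freelist in the group block
    if seg_header_mfl:
        return seg_header_mfl.pop()     # last resort: segment header MFL
    return None                         # would bump the HWM in reality
```

Contention on the segment header now occurs only when a whole group block is exhausted, rather than whenever any single process freelist runs dry.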
I decided to test using freelist groups on my test environment and the results are listed
below:
FREELIST GROUPS = 1
Avg time for each session to complete inserts: 2:57mins
FREELIST GROUPS = 7
Avg time for each session to complete inserts: 2:20mins
Wait % Total
Event Waits Time (cs) Wt Time
-------------------------------------------- ------------ ------------ -------
free buffer waits 1,988 39,364 45.20
db file sequential read 23,666 17,588 20.20
buffer busy waits 5,491 9,534 10.95
rdbms ipc reply 126 9,444 10.84
latch free 793 5,793 6.65
FREELIST GROUPS = 17
Avg time for each session to complete inserts: 2:31mins
Wait % Total
Event Waits Time (cs) Wt Time
-------------------------------------------- ------------ ------------ -------
free buffer waits 462 41,452 46.90
db file sequential read 25,048 24,291 27.48
latch free 2,045 9,825 11.12
rdbms ipc reply 124 6,072 6.87
log file parallel write 341 2,937 3.32
The test involved 12 concurrent sessions inserting 10,000 rows into a table of the same
structure provided at the beginning of this case study.
It is clear from my testing results that using multiple freelist groups can increase
performance significantly by decreasing the number of buffer busy waits. Once the
buffer busy waits decreased and freelist contention was no longer an issue, especially
when using 17 freelist groups, it became apparent that my test system has an I/O issue,
given the increase in free buffer waits and db file sequential read waits. This could also
be due to an incorrectly sized buffer cache, so further investigation would be required to
remove this new area of contention.
NOTE: Freelist groups cannot be dynamically added to objects. The object must be
recreated with a new FREELIST GROUPS setting and then repopulated with data.
In conclusion, when a high number of sessions are waiting on buffer busy waits for the
same, continually changing, datablocks on a table that has a number of process freelists
defined, it is possible you are running into the serialization problem of searching and
moving blocks from the master freelist to the assigned process freelists. The diagnostic
steps outlined in this case study describe an approach to determining the cause of such
waits. A workaround of rebuilding the table with multiple freelist groups is also
provided, which was demonstrated to relieve the buffer busy waits. It is important to
note that removing one area of contention often highlights a different area that needs
further optimization.
References
Note 157250.1 Freelist Management with Oracle 8i, Stephan Haisley, Center of Expertise