hps://support.oracle.com/epmos/faces/DocumentDisplay?_adf.ctrl-st...
APPLIES TO:
Oracle Database - Enterprise Edition - Version 10.2.0.4 and later
Information in this document applies to any platform.
Oracle RDBMS Automatic Workload Repository
PURPOSE
The purpose of this document is to describe the diagnostic data that can be collected and the corrective steps that can be taken when
encountering issues with slow, hung or missing AWR snapshots.
TROUBLESHOOTING STEPS
Sometimes AWR snapshots aren't created as expected. A check in DBA_HIST_SNAPSHOT shows no snapshots in the expected time range.
Err#
----
   0
   0
   0
   0
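The DBA_HIST_SNAPSHOT check described above can be done with a query along the following lines. This is a sketch rather than the exact query behind the output shown, and the 14-day window is an arbitrary choice.

```sql
-- List recent snapshots for the current database, oldest first.
-- The 14-day window is an arbitrary example; widen it as needed.
SELECT snap_id,
       instance_number,
       startup_time,
       begin_interval_time,
       end_interval_time,
       flush_elapsed,
       error_count
FROM   dba_hist_snapshot
WHERE  dbid = (SELECT dbid FROM v$database)
AND    begin_interval_time > SYSDATE - 14
ORDER  BY begin_interval_time, instance_number;
```

A missing hour in this list, or a long gap before the latest row, is the symptom this note deals with.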
There are two different causes for this scenario: snapshots not starting, and snapshots not completing successfully.
12/14/2014 12:19 AM
Document 1301503.1
Snapshots should happen every hour (+0 01:00:00). AWR tries to take the snapshots at the top of the hour (08:00, 09:00, 10:00, ...). That
means that the first snapshot after startup, or the first automatic snapshot following a manual snapshot, may appear to be "late" or "early".
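To confirm which interval is actually configured, a query such as the following may help. It is a sketch that assumes access to the DBA_HIST views. The MODIFY_SNAPSHOT_SETTINGS call is only relevant if the interval has been changed from the one-hour default.

```sql
-- Configured snapshot interval and retention for this database.
SELECT dbid, snap_interval, retention
FROM   dba_hist_wr_control
WHERE  dbid = (SELECT dbid FROM v$database);

-- Optional: restore a one-hour interval (the argument is in minutes).
BEGIN
  dbms_workload_repository.modify_snapshot_settings(interval => 60);
END;
/
```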
Common reasons snapshots might not be collected are given below.
Database Closed
In the case shown here, the last snapshot time was 24th Dec 01:00 PM (see output above) and the date at the time of checking is 28th Dec, so the
last snapshot was taken over 4 days ago. The explanation is simple: the database had just been opened after being closed for several
days.
SQL> alter session set nls_date_format = 'DD-MON-YYYY HH24:MI:SS';
Session altered.
SQL> select open_time from v$thread ;
OPEN_TIME
--------------------
28-DEC-2009 13:11:46
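A single query can also put the instance startup time next to the end of the most recent snapshot recorded for that instance. This is a sketch; a gap between the two that matches the downtime is expected and is not a fault.

```sql
-- Instance startup time versus the end of the latest snapshot for the same instance.
SELECT i.instance_number,
       i.startup_time,
       (SELECT MAX(s.end_interval_time)
        FROM   dba_hist_snapshot s
        WHERE  s.dbid = d.dbid
        AND    s.instance_number = i.instance_number) AS last_snapshot_end
FROM   v$instance i
CROSS  JOIN v$database d;
```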
TYPE        VALUE
----------- ------------------
boolean     FALSE
string      TYPICAL
integer     0
boolean     TRUE
Parameters with "SWRF", "AWR" or "SYSAUX" in their names could also have an impact. If any of those are set, find out why. If they're no longer
required, try to unset them.
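A possible starting point for this parameter check is sketched below. It assumes that underscore (hidden) parameters appear in V$PARAMETER only once they have been set explicitly, which should be confirmed on your release. STATISTICS_LEVEL is included because a value of BASIC disables automatic snapshots.

```sql
-- AWR-related parameters, plus STATISTICS_LEVEL.
-- ISDEFAULT = 'FALSE' marks values that were set explicitly.
SELECT name, value, isdefault
FROM   v$parameter
WHERE  name = 'statistics_level'
OR     name LIKE '%swrf%'
OR     name LIKE '%awr%'
OR     name LIKE '%sysaux%'
ORDER  BY name;
```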
Database Open Mode
Some open modes may also disable automatic snapshots. For example, in restricted mode or read only mode, snapshots are disabled.
Check the database open mode:
SQL> select open_mode from v$database;
OPEN_MODE
----------
READ WRITE
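Restricted mode is not reported in OPEN_MODE, so it needs a separate check. A minimal sketch:

```sql
-- LOGINS shows RESTRICTED when the instance was opened in restricted mode.
SELECT instance_name, status, logins
FROM   v$instance;
```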
If the above isn't sufficient to resolve the issue, customers may need to gather traces and send them to Oracle Support. Often, it's best to get the
MMON action trace ("28" below) and the snapshot flush trace ("10" below).
SQL> alter session set "_swrf_test_action" = 28;
Session altered.
SQL> alter session set "_swrf_test_action" = 10;
Session altered.
Wait for at least an hour, and then gather the MMON and M00x traces.
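The trace files can be located with queries such as the ones below. This is a sketch: V$DIAG_INFO and the TRACEFILE column of V$PROCESS exist from 11g onwards, while 10.2 writes background traces to BACKGROUND_DUMP_DEST. M00x slaves are short-lived, so older slave traces have to be picked up from the trace directory rather than from V$PROCESS.

```sql
-- 11g onwards: directory holding background and slave trace files.
SELECT value
FROM   v$diag_info
WHERE  name = 'Diag Trace';

-- 11g onwards: trace file of MMON and of any M00x slave running right now.
SELECT spid, program, tracefile
FROM   v$process
WHERE  program LIKE '%(MMON)%'
OR     program LIKE '%(M0%)%';

-- 10.2: background traces go to this directory instead.
SELECT value
FROM   v$parameter
WHERE  name = 'background_dump_dest';
```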
To disable the traces again, use the following syntax:
SQL> alter session set "_swrf_test_action" = 29;
Session altered.
SQL> alter session set "_swrf_test_action" = 11;
Session altered.
With tracing enabled, the MMON trace file shows the start time and completion messages for each auto flush and purge:
*** MODULE NAME:() 2011-02-17 06:27:37.051
STATUS ERR SNAP
------ --- ----
     0   0    0
     0   0    0
     0   0    1
     0   0    0
     0   0    0
     0   0    0
     0   1    0
     0   0    0
     0   1    0
     0   1    0
In this case, we can see that a few of the snapshots had errors (ERROR_COUNT > 0), but they still completed (STATUS = 0 and
FLUSH_ELAPSED is not NULL).
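The STATUS, ERROR_COUNT and FLUSH_ELAPSED values discussed above can be listed with a query along these lines. It is a sketch, to be run as SYS because WRM$_SNAPSHOT is an internal repository table.

```sql
-- STATUS = 0 with a FLUSH_ELAPSED value: the snapshot completed.
-- Non-zero STATUS: the snapshot failed, was terminated, or is still running.
SELECT snap_id,
       instance_number,
       begin_interval_time,
       flush_elapsed,
       status,
       error_count
FROM   sys.wrm$_snapshot
WHERE  dbid = (SELECT dbid FROM v$database)
ORDER  BY snap_id, instance_number;
```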
To investigate the errors, start with the WRM$_SNAP_ERROR table:
SQL> select * from wrm$_snap_error
where dbid = (select dbid from v$database)
order by snap_id;
  SNAP_ID      DBID INST TABLE_NAME                    ERROR_NUMBER
--------- --------- ---- ----------------------------- ------------
       97 581521084    1 WRH$_SQL_PLAN                          942
       99 581521084    1 WRH$_SQL_PLAN                          942
      100 581521084    1 WRH$_SQL_PLAN                          942
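Error 942 here is ORA-00942 (table or view does not exist). When many snapshots are affected, a summary by table and error number can be easier to read than the raw list. The query below is a sketch that uses only the columns shown in the output above.

```sql
-- Snapshot errors grouped by table and error number, most frequent first.
SELECT table_name,
       error_number,
       COUNT(*)     AS occurrences,
       MIN(snap_id) AS first_snap_id,
       MAX(snap_id) AS last_snap_id
FROM   sys.wrm$_snap_error
WHERE  dbid = (SELECT dbid FROM v$database)
GROUP  BY table_name, error_number
ORDER  BY occurrences DESC, table_name;
```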
In most cases, the M00x process will also write some information about the error to a file, either an incident dump or a regular
trace file. From 11g onwards, incident dumps should always be collected for service requests where available. If there are no errors but
STATUS is non-zero, the snapshot either ran for too long and was terminated, or it is still running. In those cases, it's good to trace
the snapshots to get some timing information.
If this happens with automatic snapshots, you may find that manual snapshots do complete, but take a long time to do so.
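One way to reproduce a slow manual snapshot under normal SQL trace is sketched below. The trace file identifier is an arbitrary, hypothetical tag that only makes the file easier to find; the resulting trace file can then be formatted with tkprof.

```sql
-- Hypothetical tag added to the trace file name.
ALTER SESSION SET tracefile_identifier = 'awr_manual_snap';

-- Trace this session with wait events, take a manual snapshot, then stop tracing.
EXEC dbms_monitor.session_trace_enable(waits => TRUE, binds => FALSE);
EXEC dbms_workload_repository.create_snapshot();
EXEC dbms_monitor.session_trace_disable();
```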
Tracing with normal SQL trace will tend to just show high CPU usage:
begin
dbms_workload_repository.create_snapshot();
end;
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1    485.98     485.98          0          1         34           1
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2    485.98     485.98          0          1         34           1
This is not particularly useful in terms of diagnostics. You need to get further into the processing so as to determine the source of the issue.
The easiest way to trace the snapshots is the same one shown above:
SQL> alter session set "_swrf_test_action" = 10;
Session altered.
This writes some information (about 120 lines per snapshot) into the M00x process trace, including timing information.
The output will be similar to the following:
...
[Top SQL Selection], btime: 1961091529, etime:1961095597, Elapsed=4068
[WRH$_SQL_BIND_METADATA Attr Selection], btime: 1961095597, etime:1961095598, Elapsed=1
[WRH$_SQL_BIND_METADATA Update], btime: 1961095598, etime:1961095598, Elapsed=0
[WRH$_SQL_BIND_METADATA Insert], btime: 1961095598, etime:1961095599, Elapsed=1
[WRH$_SQL_BIND_METADATA Total], btime: 1961095597, etime:1961095599, Elapsed=2
[WRH$_SQL_PLAN Attr Selection], btime: 1961095599, etime:1961095601, Elapsed=2
[WRH$_SQL_PLAN Update], btime: 1961095601, etime:1961095602, Elapsed=1
[WRH$_SQL_PLAN Insert], btime: 1961095602, etime:1961095604, Elapsed=2
[WRH$_SQL_PLAN Total], btime: 1961095599, etime:1961095604, Elapsed=5
[WRH$_OPTIMIZER_ENV Attr Selection], btime: 1961095604, etime:1961095604, Elapsed=0
[WRH$_OPTIMIZER_ENV Update], btime: 1961095604, etime:1961095604, Elapsed=0
[WRH$_OPTIMIZER_ENV Insert], btime: 1961095604, etime:1961095604, Elapsed=0
[WRH$_OPTIMIZER_ENV Total], btime: 1961095604, etime:1961095604, Elapsed=0
[WRH$_SQLTEXT Attr Selection], btime: 1961095604, etime:1961095605, Elapsed=1
[WRH$_SQLTEXT Update], btime: 1961095605, etime:1961095605, Elapsed=0
[WRH$_SQLTEXT Insert], btime: 1961095605, etime:1961095605, Elapsed=0
[WRH$_SQLTEXT Total], btime: 1961095604, etime:1961095605, Elapsed=1
[WRH$_SQLSTAT All Stats], btime: 1961095605, etime:1961095618, Elapsed=13
[WRH$_SQLSTAT Total], btime: 1961095605, etime:1961095618, Elapsed=13
[WRH$_SQL_SUMMARY Total], btime: 1961095618, etime:1961095618, Elapsed=0
[SQL Group Total], btime: 1961091529, etime:1961095618, Elapsed=4089
[Snapshot Total], btime: 1961089445, etime:1961095618, Elapsed=6173
To interpret the output, look at the [Snapshot Total] line at the end and then see which line makes up the largest proportion of it. In this case,
the [Top SQL Selection] step accounts for most of the time: 4068 of the 6173 in [Snapshot Total], or about two thirds.
You can use this information to look for similar bugs or other occurrences.
If more details are required, then SQL tracing can be used to collect more information. There are two ways of getting SQL trace for
snapshots. The best way is usually to set module/action-specific tracing for the two flush actions:
begin
dbms_monitor.serv_mod_act_trace_enable(service_name=>'SYS$BACKGROUND',
module_name=>'MMON_SLAVE',
action_name=>'Auto-Flush Slave Action');
dbms_monitor.serv_mod_act_trace_enable(service_name=>'SYS$BACKGROUND',
module_name=>'MMON_SLAVE',
action_name=>'Remote-Flush Slave Action');
end;
/
Note: The "_swrf_test_action" settings shown earlier take effect for the entire instance, even though the syntax used is "alter session".
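Once enough snapshots have been traced, the module/action tracing can be checked and switched off again. A sketch using the matching DBMS_MONITOR calls:

```sql
-- Service/module/action traces that are currently enabled.
SELECT trace_type, primary_id, qualifier_id1, qualifier_id2, waits, binds
FROM   dba_enabled_traces;

-- Disable the two flush-action traces enabled above.
BEGIN
  dbms_monitor.serv_mod_act_trace_disable(service_name => 'SYS$BACKGROUND',
                                          module_name  => 'MMON_SLAVE',
                                          action_name  => 'Auto-Flush Slave Action');
  dbms_monitor.serv_mod_act_trace_disable(service_name => 'SYS$BACKGROUND',
                                          module_name  => 'MMON_SLAVE',
                                          action_name  => 'Remote-Flush Slave Action');
END;
/
```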
ADDM
There's one additional scenario where snapshots don't finish properly. At the end of the snapshot itself, we also start ADDM (the Automatic
Database Diagnostic Monitor), which analyzes ASH data to identify potential performance issues. If the call stack (use "pstack" or similar)
shows "keh" functions, ADDM is most likely running. The proper diagnosis of ADDM performance is a separate topic. However, a good place
to start would be to get SQL trace for the process using the 10046 trace event, and run tkprof to see if there are any unusual amounts of
waits or any plans that look bad.
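A possible way to get that trace for a process found with pstack is sketched below. It uses DBMS_MONITOR, which produces extended SQL trace comparable to event 10046 level 12 (waits and binds). &ospid, &&sid and &&serial are SQL*Plus substitution variables you supply. The process may exit before tracing is switched off, in which case the disable call simply has nothing left to do.

```sql
-- Map the OS process id seen in pstack to its session.
SELECT s.sid, s.serial#, p.spid, p.program
FROM   v$session s
JOIN   v$process p ON p.addr = s.paddr
WHERE  p.spid = '&ospid';

-- Extended SQL trace with waits and binds for that session.
EXEC dbms_monitor.session_trace_enable(session_id => &&sid, serial_num => &&serial, waits => TRUE, binds => TRUE);

-- After the ADDM run has finished, switch tracing off again.
EXEC dbms_monitor.session_trace_disable(session_id => &&sid, serial_num => &&serial);
```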
Workarounds
When you review the trace information, you might find a single table that seems to be causing a problem. If so, you can try to disable the
flushing of that table, so that you can verify if snapshots work OK after that. Here's an example of how to disable flush of the table
WRH$_TEMPSTATXS. Please note that this command should be run on all instances in the cluster.
alter system set "_awr_disabled_flush_tables" = 'wrh$_tempstatxs';
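The setting can be verified on every instance, and removed again after testing, roughly as follows. This sketch assumes the instances use an spfile.

```sql
-- Confirm the value on all instances.
SELECT inst_id, name, value
FROM   gv$parameter
WHERE  name = '_awr_disabled_flush_tables';

-- After testing, remove the setting from the spfile for all instances.
-- The parameter reverts to its default at the next restart.
ALTER SYSTEM RESET "_awr_disabled_flush_tables" SCOPE = SPFILE SID = '*';
```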
Complications
In some cases, the SQL trace will show a lot of 'reliable message' waits. That just means that the local node (the one that started the
snapshot) is waiting for the remote nodes to complete. The real issue at that point will be in the trace files from the remote nodes.
In some cases, all local and remote nodes will have completed their snapshots successfully, but the local node doesn't realize that, and
keeps waiting for 'reliable message' until it hits the 15-minute timeout. This is believed to be caused by a bug, but it has not yet been
resolved.
If this is the case, each node will have a successfully completed snapshot (STATUS=0) with an elapsed time less than the timeout
(FLUSH_ELAPSED < 00:15:00.00).
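For a given snapshot ID, the per-instance picture can be checked with a query such as the one below. It is a sketch against the internal WRM$_SNAPSHOT table (run as SYS); &snap_id is a SQL*Plus substitution variable.

```sql
-- Flag instances whose flush completed (STATUS = 0) inside the 15-minute timeout.
SELECT instance_number,
       status,
       error_count,
       flush_elapsed,
       CASE
         WHEN status = 0 AND flush_elapsed < INTERVAL '15' MINUTE
         THEN 'COMPLETED WITHIN TIMEOUT'
         ELSE 'CHECK'
       END AS assessment
FROM   sys.wrm$_snapshot
WHERE  dbid    = (SELECT dbid FROM v$database)
AND    snap_id = &snap_id
ORDER  BY instance_number;
```

If every row reports COMPLETED WITHIN TIMEOUT while the originating node still waited on 'reliable message', the snapshot matches the scenario described above.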
Table timing information
Release 11.2.0.2 introduces a new table, WRM$_SNAPSHOT_DETAILS, which contains timing information for each individual table flush.
This can be used to determine which table is taking a lot of time. For instance, sometimes a single table takes close to 15 minutes, causing
the entire flush to fail.
set pagesize 999
                               TIME
------------------------------ ----------------------------
                               +000000000 00:00:00.238
                               +000000000 00:00:03.159
                               +000000000 00:00:00.476
                               +000000000 00:00:00.296
                               +000000000 00:00:00.041
                               +000000000 00:00:00.011
                               +000000000 00:00:00.018
                               +000000000 00:00:00.033
                               +000000000 00:00:00.071
                               +000000000 00:00:00.000
                               +000000000 00:00:00.067
                               +000000000 00:00:00.023
                               +000000000 00:00:00.204
                               +000000000 00:00:00.042
                               +000000000 00:00:00.040
                               +000000000 00:00:00.018
                               +000000000 00:00:00.031
                               +000000000 00:00:00.028
                               +000000000 00:00:00.125
                               +000000000 00:00:00.010
                               +000000000 00:00:00.079
                               +000000000 00:00:00.027
                               +000000000 00:00:00.033
                               +000000000 00:00:00.010
                               +000000000 00:00:00.047
                               +000000000 00:00:00.014
                               +000000000 00:00:00.012
                               +000000000 00:00:00.017
                               +000000000 00:00:00.016
                               +000000000 00:00:00.009
                               +000000000 00:00:00.026
                               +000000000 00:00:00.110
                               +000000000 00:00:00.052
                               +000000000 00:00:00.009
                               +000000000 00:00:00.048
                               +000000000 00:00:00.048
                               +000000000 00:00:00.246
                               +000000000 00:00:00.013
                               +000000000 00:00:00.040
                               +000000000 00:00:00.172
                               +000000000 00:00:00.009
                               +000000000 00:00:00.007
                               +000000000 00:00:00.007
                               +000000000 00:00:00.070
                               +000000000 00:00:00.071
                               +000000000 00:00:00.009
                               +000000000 00:00:00.027
                               +000000000 00:00:00.141
                               +000000000 00:00:00.032
                               +000000000 00:00:00.006
                               +000000000 00:00:00.012
                               +000000000 00:00:00.031
                               +000000000 00:00:00.028
                               +000000000 00:00:00.011
                               +000000000 00:00:00.036
                               +000000000 00:00:00.542
Segment Group                  +000000000 00:00:00.312
Datafile Group                 +000000000 00:00:00.060
Tempfile Group                 +000000000 00:00:00.044
Service Group                  +000000000 00:00:00.089
Undo Group                     +000000000 00:00:00.009
WRH$_COMP_IOSTAT               +000000000 00:00:00.007
WRH$_SGA_TARGET_ADVICE         +000000000 00:00:00.008
WRH$_EVENT_HISTOGRAM           +000000000 00:00:00.033
WRH$_MUTEX_SLEEP               +000000000 00:00:00.030
WRH$_MEMORY_TARGET_ADVICE      +000000000 00:00:00.007
WRH$_MEMORY_RESIZE_OPS         +000000000 00:00:00.056
WRH$_IOSTAT_FUNCTION           +000000000 00:00:00.037
WRH$_IOSTAT_FILETYPE           +000000000 00:00:00.017
WRH$_IOSTAT_FILETYPE_NAME      +000000000 00:00:00.053
WRH$_RSRC_CONSUMER_GROUP       +000000000 00:00:00.024
WRH$_RSRC_PLAN                 +000000000 00:00:00.005
WRH$_MEM_DYNAMIC_COMP          +000000000 00:00:00.008
WRH$_DISPATCHER                +000000000 00:00:00.006
WRH$_SHARED_SERVER_SUMMARY     +000000000 00:00:00.033
WRH$_IOSTAT_DETAIL             +000000000 00:00:00.036
WRH$_MVPARAMETER               +000000000 00:00:00.023
WRH$_TABLESPACE                +000000000 00:00:00.014
78 rows selected.
In this case, WRH$_PARAMETER_NAME took a lot more time than the other tables, but still not enough to be a problem.
Please note that WRH$_SQLTEXT was flushed twice. That's expected behaviour.
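A query along the following lines ranks the per-table flush times for one snapshot, slowest first. It is a sketch: the WRM$_SNAPSHOT_DETAILS column names used here (TABLE_ID, BEGIN_TIME, END_TIME) are our assumption and should be checked with DESCRIBE first. It reports TABLE_ID rather than a table name, because the mapping from ID to name is not covered in this note. &snap_id is a SQL*Plus substitution variable.

```sql
-- Per-table flush durations for one snapshot (11.2.0.2 onwards, run as SYS).
-- ASSUMED columns: SNAP_ID, DBID, INSTANCE_NUMBER, TABLE_ID, BEGIN_TIME, END_TIME.
SELECT instance_number,
       table_id,
       end_time - begin_time AS elapsed
FROM   sys.wrm$_snapshot_details
WHERE  dbid    = (SELECT dbid FROM v$database)
AND    snap_id = &snap_id
ORDER  BY end_time - begin_time DESC NULLS LAST;
```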
Known Issues