
Five Tuning Tips For Your Data Warehouse

Jeff Moss

My First Presentation

 Yes, my very first presentation


– For BIRT SIG
– For UKOUG
 Useful Advice from friends and colleagues
– Use graphics where appropriate
– Find a friendly or familiar face in the audience
– Imagine your audience is naked!
– …but like Oracle, be careful when combining advice!
Be Careful Combining Advice!

 Thanks for the opportunity, Mark!
Agenda

 My background
 Five tips
– Partition for success
– Squeeze your data with data segment compression
– Make the most of your PGA memory
– Beware of temporal data affecting the optimizer
– Find out where your query is at
 Questions
My Background

 Independent Consultant
 13 years Oracle experience
 Blog: http://oramossoracle.blogspot.com/
 Focused on warehousing / VLDB since 1998
 First project
– UK Music Sales Data Mart
– Produces BBC Radio 1 Top 40 chart and many more
– 2 billion row sales fact table
– 1 Tb total database size
 Currently working with Eon UK (Powergen)
– 4Tb Production Warehouse, 8Tb total storage
– Oracle Product Stack
What Is Partitioning ?

 “Partitioning addresses key issues in supporting very large tables and indexes by letting you decompose them into smaller and more manageable pieces called partitions.” (Oracle Database Concepts Manual, 10gR2)

 Introduced in Oracle 8.0


 Numerous improvements since
 Subpartitioning adds another level of decomposition
 Partitions and Subpartitions are logical containers
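As a sketch of the syntax (the table, column and tablespace names are illustrative, not from a real system), a monthly range-partitioned fact table could be declared like this:

```sql
-- Hypothetical sales fact table, range partitioned by month,
-- with monthly partitions placed in quarterly tablespaces.
CREATE TABLE sales_fact (
  sales_date  DATE          NOT NULL,
  customer_id NUMBER        NOT NULL,
  sales       NUMBER(12,2)
)
PARTITION BY RANGE (sales_date) (
  PARTITION p_jan_2005 VALUES LESS THAN (TO_DATE('01-FEB-2005','DD-MON-YYYY'))
    TABLESPACE t_q1_2005,
  PARTITION p_feb_2005 VALUES LESS THAN (TO_DATE('01-MAR-2005','DD-MON-YYYY'))
    TABLESPACE t_q1_2005,
  PARTITION p_mar_2005 VALUES LESS THAN (TO_DATE('01-APR-2005','DD-MON-YYYY'))
    TABLESPACE t_q1_2005
);
```

Queries that filter on SALES_DATE can then be pruned to only the partitions that can contain matching rows.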
Partition To Tablespace Mapping

 Partitions map to tablespaces
– Partition can only be in One tablespace
– Tablespace can hold many partitions
– Highest granularity is One tablespace per partition
– Lowest granularity is One tablespace for all the partitions
 Tablespace volatility
– Read / Write
– Read Only

[Diagram: monthly partitions P_JAN_2005 through P_MAR_2006 mapped to quarterly tablespaces T_Q1_2005 through T_Q1_2006; older quarters are Read Only, the current quarter is Read / Write]

Why Partition ? - Performance

 Improved query performance
– Pruning or elimination
– Partition wise joins
 Read only partitions
– Quicker checkpointing
– Quicker backup
– Quicker recovery
– …but it depends on mapping of:
 partition:tablespace:datafile

SELECT SUM(sales)
FROM part_tab
WHERE sales_date BETWEEN '01-JAN-2005' AND '30-JUN-2005'

[Diagram: Sales Fact Table with partitions JAN through DEC; the query above is pruned to just the JAN-JUN partitions]

* Oracle 10gR2 Data Warehousing Manual


Why Partition ? - Manageability
 Archiving
– Use a rolling window approach
– ALTER TABLE … ADD/SPLIT/DROP PARTITION…
 Easier ETL Processing
– Build a new dataset in a staging table
– Add indexes and constraints
– Collect statistics
– Then swap the staging table for a partition on the target
 ALTER TABLE…EXCHANGE PARTITION…
 Easier Maintenance
– Table partition move, e.g. to compress data
– Local Index partition rebuild
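A minimal sketch of that ETL pattern, using hypothetical SALES_FACT and STG_SALES names (not from the source):

```sql
-- Build next month's data in an empty staging table with the same shape
CREATE TABLE stg_sales AS
SELECT * FROM sales_fact WHERE 1 = 0;

-- (load stg_sales, add indexes and constraints, gather stats here)

-- Swap the fully prepared staging table in as a partition
ALTER TABLE sales_fact
  EXCHANGE PARTITION p_jan_2005 WITH TABLE stg_sales
  INCLUDING INDEXES WITHOUT VALIDATION;

-- Rolling window: age out the oldest partition
ALTER TABLE sales_fact DROP PARTITION p_jan_2004;
```

The exchange is a data dictionary operation, so the swap itself is near-instant regardless of partition size.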
Why Partition ? - Scalability

 Partitioning gives generally consistent and predictable performance


– Assuming an appropriate partitioning key is used
– …and data has an even distribution across the key
 Read only approach
– Scalable backups - read only tablespaces are ignored
– …so partitions in those tablespaces are ignored
 Pruning allows consistent query performance
Why Partition ? - Availability

 Offline data impact minimised
– …depending on granularity
– Quicker recovery
– Pruned data not missed
– EXCHANGE PARTITION
 Allows offline build
 Quick swap over

[Diagram: monthly partitions P_JAN_2005 through P_MAR_2006 in quarterly tablespaces T_Q1_2005 through T_Q1_2006; Read / Write vs Read Only]


Fact Table Partitioning

[Diagram: the same sales rows placed into January-April partitions two ways. Partitioned by Load Date, a row with Tran Date 21-JAN-2005 but Load Date 04-APR-2005 lands in the April partition; partitioned by Transaction Date, the same row lands in the January partition]

Load Date partitioning:
– Easier ETL Processing - each load deals with only 1 partition
– No use to end user queries!
– Can’t prune - Full scans!

Transaction Date partitioning:
– Harder ETL Processing - but still uses EXCHANGE PARTITION
– Useful to end user queries
– Allows full pruning capability
Watch out for…

 Partition exchange and table statistics 1

– Partition stats updated


– …but Global stats are NOT!
– Affects queries accessing multiple partitions
– Solution
 Gather stats on staging table prior to EXCHANGE
 Gather stats on partitioned table using GLOBAL

Jonathan Lewis: Cost-Based Oracle Fundamentals, Chapter 2
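A sketch of that two-step solution with DBMS_STATS (the owner and table names are illustrative):

```sql
-- 1. Gather stats on the staging table BEFORE the exchange,
--    so the incoming partition arrives with good partition-level stats
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => 'AE_MGMT',     -- illustrative owner
                                tabname => 'STG_SALES');  -- illustrative table
END;
/

-- 2. After the exchange, refresh GLOBAL stats on the partitioned table
--    so queries spanning multiple partitions are costed correctly
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname     => 'AE_MGMT',
                                tabname     => 'SALES_FACT',
                                granularity => 'GLOBAL');
END;
/
```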


Partitioning Feature: Characteristic Reason Matrix

Feature               Performance  Manageability  Scalability  Availability
Read Only Partitions       ✓                           ✓            ✓
Pruning (Partition
Elimination)               ✓                           ✓            ✓
Partition wise joins       ✓                           ✓
Parallel DML               ✓
Archiving                               ✓              ✓            ✓
Exchange Partition         ✓            ✓              ✓            ✓
Partition Truncation                    ✓              ✓            ✓
Local Indexes              ✓            ✓              ✓            ✓
What Is Data Segment Compression ?
 Compresses data by eliminating intra block
repeated column values
 Reduces the space required for a segment
– …but only if there are appropriate repeats!
 Self contained
 Lossless algorithm
Where Can Data Segment Compression Be Used ?

 Can be used with a number of segment types


– Heap & Nested Tables
– Range or List Partitions
– Materialized Views
 Can’t be used with
– Subpartitions
– Hash Partitions
– Indexes – but they have row level compression
– IOT
– External Tables
– Tables that are part of a Cluster
– LOBs
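For illustration, compression is declared per segment; the names here are hypothetical:

```sql
-- Build a compressed copy via direct-path CTAS
CREATE TABLE sales_hist COMPRESS
AS SELECT * FROM sales_fact;

-- Compress the existing data in one partition of a range-partitioned
-- table by moving it (a MOVE is a direct path operation, so the
-- rows are rewritten into compressed blocks)
ALTER TABLE sales_fact MOVE PARTITION p_jan_2005 COMPRESS;
```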
How Does Segment Compression Work ?

Uncompressed rows:

ID   DESCRIPTION                  CONTACT TYPE  OUTCOME  FOLLOWUP
100  Call to discuss bill amount  TEL           NO       YES
101  Call to discuss new product  MAIL          NO       N/A
102  Call to discuss new product  TEL           YES      N/A

Database Block

Symbol Table
1: 100    2: Call to discuss bill amount   3: TEL    4: NO    5: YES
6: 101    7: Call to discuss new product   8: MAIL   9: N/A  10: 102

Row Data Area
1 2 3 4 5
6 7 8 4 9
10 7 3 5 9
Pros & Cons

 Pros
– Saves space
 Reduces LIO / PIO
 Speeds up backup/recovery
 Improves query response time
– Transparent
 To readers
 …and writers
– Decreases time to perform some DML
 Deletes should be quicker
 Bulk inserts may be quicker

 Cons
– Increases CPU load
– Can only be used on Direct Path operations
 CTAS
 Serial Inserts using INSERT /*+ APPEND */
 Parallel Inserts (PDML)
 ALTER TABLE…MOVE…
 Direct Path SQL*Loader
– Increases time to perform some DML
 Bulk inserts may be slower
 Updates are slower
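To illustrate the direct path restriction (table names are hypothetical):

```sql
-- Direct path load: rows are compressed on the way in
INSERT /*+ APPEND */ INTO sales_hist
SELECT * FROM stg_sales;
COMMIT;

-- Conventional path load: rows land in the same COMPRESS table
-- but are stored uncompressed
INSERT INTO sales_hist
SELECT * FROM stg_sales;
COMMIT;
```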
Ordering Your Data For Maximum Benefits

 Colocate data to maximise compression benefits

1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5   Uniformly distributed

1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5   Colocated

 For maximum compression
– Minimise the total space required by the segment
– Identify most “compressible” column(s)
 For optimal access
– We know how the data is to be queried
– Order the data by
 Access path columns
 Then the next most “compressible” column(s)
Get Max Compression Order Package

PROCEDURE mgmt_p_get_max_compress_order
Argument Name                  Type                    In/Out Default?
------------------------------ ----------------------- ------ --------
P_TABLE_OWNER                  VARCHAR2                IN     DEFAULT
P_TABLE_NAME                   VARCHAR2                IN
P_PARTITION_NAME               VARCHAR2                IN     DEFAULT
P_SAMPLE_SIZE                  NUMBER                  IN     DEFAULT
P_PREFIX_COLUMN1               VARCHAR2                IN     DEFAULT
P_PREFIX_COLUMN2               VARCHAR2                IN     DEFAULT
P_PREFIX_COLUMN3               VARCHAR2                IN     DEFAULT

BEGIN
  mgmt_p_get_max_compress_order(p_table_owner => 'AE_MGMT'
                               ,p_table_name  => 'BIG_TABLE'
                               ,p_sample_size => 10000);
END;
/

Running mgmt_p_get_max_compress_order...
----------------------------------------------------------------------------------------------------
Table          : BIG_TABLE
Sample Size    : 10000
Unique Run ID  : 25012006232119
ORDER BY Prefix:
----------------------------------------------------------------------------------------------------
Creating MASTER Table  : TEMP_MASTER_25012006232119
Creating COLUMN Table 1: COL1
Creating COLUMN Table 2: COL2
Creating COLUMN Table 3: COL3
----------------------------------------------------------------------------------------------------
The output below lists each column in the table and the number of blocks/rows and space
used when the table data is ordered by only that column, or in the case where a prefix
has been specified, where the table data is ordered by the prefix and then that column.
From this one can determine if there is a specific ORDER BY which can be applied to
the data in order to maximise compression within the table whilst, in the case of a
prefix being present, ordering data as efficiently as possible for the most common
access path(s).
----------------------------------------------------------------------------------------------------
NAME                           COLUMN                               BLOCKS         ROWS SPACE_GB
============================== ============================== ============ ============ ========
TEMP_COL_001_25012006232119    COL1                                    290        10000    .0022
TEMP_COL_002_25012006232119    COL2                                    345        10000    .0026
TEMP_COL_003_25012006232119    COL3                                    555        10000    .0042
Data Warehousing Specifics

 Star Schema compresses better than Normalized


– More redundant data
 Focus on…
– Fact Tables and Summaries in Star Schema
– Transaction tables in Normalized Schema
 Performance Impact1
– Space Savings
 Star schema: 67%
 Normalized: 24%
– Query Elapsed Times
 Star schema: 16.5%
 Normalized: 10%

1 - Table Compression in Oracle 9iR2: A Performance Analysis


Things To Watch Out For

 DROP COLUMN is awkward


– ORA-39726: Unsupported add/drop column operation on
compressed tables
– Uncompress the table and try again - still gives ORA-39726!
 After UPDATEs data is uncompressed
– Performance impact
– Row migration
 Use appropriate physical design settings
– PCTFREE 0 - pack each block
– Large blocksize - reduce overhead / increase repeats per block
PGA Memory: What For ?

 Sorts
– Standard sorts [SORT]
– Buffer [BUFFER]
– Group By [GROUP BY (SORT)]
– Connect By [CONNECT-BY (SORT)]
– Rollup [ROLLUP (SORT)]
– Window [WINDOW (SORT)]
 Hash Joins [HASH-JOIN]
 Indexes
– Maintenance [IDX MAINTENANCE SOR]
– Bitmap Merge [BITMAP MERGE]
– Bitmap Create [BITMAP CREATE]
 Write Buffers [LOAD WRITE BUFFERS]

[] V$SQL_WORKAREA.OPERATION_TYPE

[Diagram: a serial process connected to a dedicated server whose PGA holds cursors, variables and the sort area]
PGA Memory Management: Manual
 The “old” way of doing things
– Still available though – even in 10g R2
 Configuring
– ALTER SESSION SET WORKAREA_SIZE_POLICY=MANUAL;
– Initialisation parameter: WORKAREA_SIZE_POLICY=MANUAL
 Set memory parameters yourself
– HASH_AREA_SIZE
– SORT_AREA_SIZE
– SORT_AREA_RETAINED_SIZE
– BITMAP_MERGE_AREA_SIZE
– CREATE_BITMAP_AREA_SIZE
 Optimal values depend on the type of work1
– One size does not fit all!

1 - Richmond Shee: If Your Memory Serves You Right
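A sketch of manual sizing for a single big batch session (the values are illustrative, not recommendations):

```sql
-- Switch this session to manual work area sizing
ALTER SESSION SET workarea_size_policy = MANUAL;

-- Size the work areas for the batch workload at hand
ALTER SESSION SET sort_area_size = 104857600;   -- 100Mb sort area
ALTER SESSION SET hash_area_size = 209715200;   -- 200Mb hash area
```

Because these are per-session settings, a large ETL job can be given generous work areas without changing the instance-wide policy.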


PGA Memory Management: Automatic
 The “new” way from 9i R1
– Default OFF in 9i R1/R2
 Enabled by setting at session/instance level:
– WORKAREA_SIZE_POLICY=AUTO
– PGA_AGGREGATE_TARGET > 0
– Default ON since 10g R1
 Oracle dynamically manages the available
memory to suit the workload
– But of course, it’s not perfect!

Jože Senegačnik - Advanced Management Of Working Areas In Oracle 9i/10g, presented at UKOUG 2005
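For illustration, enabling automatic PGA management instance-wide takes one parameter (the 2G target is an arbitrary example value):

```sql
-- Setting a non-zero target enables automatic PGA management;
-- WORKAREA_SIZE_POLICY defaults to AUTO once a target is set
ALTER SYSTEM SET pga_aggregate_target = 2G;
```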
Auto PGA Parameters: Pre 10gR2

 WORKAREA_SIZE_POLICY
– Set to AUTO
 PGA_AGGREGATE_TARGET
– The target for summed PGA across all processes
– Can be exceeded if too small
 Over Allocation
 _PGA_MAX_SIZE
– Target maximum PGA size for a single process
– Default is a fixed value of 200Mb
– Hidden / Undocumented Parameter
 Usual caveats apply
Auto PGA Parameters : Pre 10gR2

 _SMM_MAX_SIZE
– Limit for a single workarea operation for one process
– Derived Default
 LEAST(5% of PGA_AGGREGATE_TARGET
, 50% of _PGA_MAX_SIZE)
 Hits limit of 100Mb
– When PGA_AGGREGATE_TARGET is >= 2000Mb
– And _PGA_MAX_SIZE is left at default of 200Mb

– Hidden / Undocumented Parameter


 Usual caveats apply
Auto PGA Parameters : Pre 10gR2

 _SMM_PX_MAX_SIZE
– Limit for all the parallel slaves of a single workarea operation
– Derived Default
 30% of PGA_AGGREGATE_TARGET
– Hidden / Undocumented Parameter
 Usual caveats apply
– Parallel slaves still limited by _SMM_MAX_SIZE
– Impacts only when…
 Degree Of Parallelism >= CEILING( _SMM_PX_MAX_SIZE / _SMM_MAX_SIZE )

[Example: PGA_AGGREGATE_TARGET = 3000Mb gives _PGA_MAX_SIZE = 200Mb, _SMM_MAX_SIZE = 100Mb and _SMM_PX_MAX_SIZE = 900Mb; with 12 parallel slaves, each slave would be allowed 100Mb by _SMM_MAX_SIZE, but _SMM_PX_MAX_SIZE caps them at 900Mb / 12 = 75Mb each]
10gR2 Improvements

 _SMM_MAX_SIZE now the driver


– More advanced algorithm
PGA_AGGREGATE_TARGET _SMM_MAX_SIZE
<= 500Mb 20% * PGA_AGGREGATE_TARGET
500Mb – 1000Mb 100Mb
1000Mb + 10% * PGA_AGGREGATE_TARGET

– _PGA_MAX_SIZE = 2 * _SMM_MAX_SIZE
 Parallel operations
– _SMM_PX_MAX_SIZE = 50% * PGA_AGGREGATE_TARGET
– When DOP <=5 then _smm_max_size is used
– When DOP > 5 _smm_px_max_size / DOP is used

Jože Senegačnik - Advanced Management Of Working Areas In Oracle 9i/10g, presented at UKOUG 2005
PGA Target Advisor
select trunc(pga_target_for_estimate/1024/1024) pga_target_for_estimate
, to_char(pga_target_factor * 100,'999.9') ||'%' pga_target_factor
, trunc(bytes_processed/1024/1024) bytes_processed
, trunc(estd_extra_bytes_rw/1024/1024) estd_extra_bytes_rw
, to_char(estd_pga_cache_hit_percentage,'999') ||
'%' estd_pga_cache_hit_percentage
, estd_overalloc_count
from v$pga_target_advice
/

PGA Target For PGA Tgt Estimated Extra Estimated PGA Estimated
Estimate Mb Factor Bytes Processed Bytes Read/Written Cache Hit % Overallocation Count
-------------- ------- ---------------- ------------------ --------------- --------------------
5,376 12.5% 5,884,017 7,279,799 45% 113
10,752 25.0% 5,884,017 3,593,510 62% 8
21,504 50.0% 5,884,017 3,140,993 65% 0
32,256 75.0% 5,884,017 3,104,894 65% 0
43,008 100.0% 5,884,017 2,300,826 72% 0
51,609 120.0% 5,884,017 2,189,160 73% 0
60,211 140.0% 5,884,017 2,189,160 73% 0
68,812 160.0% 5,884,017 2,189,160 73% 0
77,414 180.0% 5,884,017 2,189,160 73% 0
86,016 200.0% 5,884,017 2,189,160 73% 0
129,024 300.0% 5,884,017 2,189,160 73% 0
172,032 400.0% 5,884,017 2,189,160 73% 0
258,048 600.0% 5,884,017 2,189,160 73% 0
Beware Of Temporal Data Affecting The Optimizer
 Slowly Changing Dimensions
– Cover ranges of time
– “From” and “To” DATE columns define applicability
– Need BETWEEN operator to retrieve rows for a reporting point in time
SELECT * FROM d_customer
WHERE '15/01/2005' BETWEEN valid_from AND valid_to

CUSTOMER
CUSTOMER_ID NAME CUSTOMER_TYPE
487438 Jeff Moss SME Month 1
1st Jan, 2004
D_CUSTOMER
CUSTOMER_ID NAME CUSTOMER_TYPE VALID_FROM VALID_TO
487438 Jeff Moss SME 01/01/2004

CUSTOMER
CUSTOMER_ID NAME CUSTOMER_TYPE
487438 Jeff Moss I&C
839398 Mark Rittman SME
Month 2
D_CUSTOMER 1st Feb, 2004
CUSTOMER_ID NAME CUSTOMER_TYPE VALID_FROM VALID_TO
487438 Jeff Moss SME 01/01/2004 31/01/2004
487438 Jeff Moss I&C 01/02/2004
839398 Mark Rittman SME 01/02/2004
Dependent Predicates

 When multiple predicates exist, individual selectivities


are combined using standard probability math1:
– P1 AND P2
S(P1 & P2) = S(P1) * S(P2)
– P1 OR P2
S(P1 | P2) = S(P1) + S(P2) – [S(P1) * S(P2)]
 Only valid if the predicates are independent otherwise…
– Incorrect selectivity estimate
– Incorrect cardinality estimate
– Potentially suboptimal execution plan
 BETWEEN is multiple predicates!
 Also known as Correlated Columns2
1 – Wolfgang Breitling, Fallacies Of The Cost Based Optimizer
2 – Jonathan Lewis, Cost-Based Oracle Fundamentals, Chapter 6
Some Test Tables…

 Consider these 3 test tables…


 12 records in an SCD type table
TEST_12_DISTINCT_TD TEST_2_DISTINCT_TD TEST_1_DISTINCT_TD

Key Non Key Attr From To Key Non Key Attr From To Key Non Key Attr From To
1 Jeff 01-Jan-2005 31-Jan-2005 1 Jeff 01-Jan-2005 30-Jun-2005 1 Jeff 01-Jan-2005 31-Dec-2005
2 Mark 01-Feb-2005 28-Feb-2005 2 Mark 01-Feb-2005 30-Jun-2005 2 Mark 01-Feb-2005 31-Dec-2005
3 Doug 01-Mar-2005 31-Mar-2005 3 Doug 01-Mar-2005 30-Jun-2005 3 Doug 01-Mar-2005 31-Dec-2005
4 Niall 01-Apr-2005 30-Apr-2005 4 Niall 01-Apr-2005 30-Jun-2005 4 Niall 01-Apr-2005 31-Dec-2005
5 Tom 01-May-2005 31-May-2005 5 Tom 01-May-2005 30-Jun-2005 5 Tom 01-May-2005 31-Dec-2005
6 Jonathan 01-Jun-2005 30-Jun-2005 6 Jonathan 01-Jun-2005 30-Jun-2005 6 Jonathan 01-Jun-2005 31-Dec-2005
7 Lisa 01-Jul-2005 31-Jul-2005 7 Lisa 01-Jul-2005 31-Dec-2005 7 Lisa 01-Jul-2005 31-Dec-2005
8 Cary 01-Aug-2005 31-Aug-2005 8 Cary 01-Aug-2005 31-Dec-2005 8 Cary 01-Aug-2005 31-Dec-2005
9 Mogens 01-Sep-2005 30-Sep-2005 9 Mogens 01-Sep-2005 31-Dec-2005 9 Mogens 01-Sep-2005 31-Dec-2005
10 Anjo 01-Oct-2005 31-Oct-2005 10 Anjo 01-Oct-2005 31-Dec-2005 10 Anjo 01-Oct-2005 31-Dec-2005
11 Larry 01-Nov-2005 30-Nov-2005 11 Larry 01-Nov-2005 31-Dec-2005 11 Larry 01-Nov-2005 31-Dec-2005
12 Pete 01-Dec-2005 31-Dec-2005 12 Pete 01-Dec-2005 31-Dec-2005 12 Pete 01-Dec-2005 31-Dec-2005
Optimizer Gets Incorrect Cardinality

select * from test_1_distinct_td
where to_date('09-OCT-2005','DD-MON-YYYY') between from_date and to_date;

       KEY NON_KEY_AT FROM_DATE TO_DATE
---------- ---------- --------- ---------
         1 Jeff       01-JAN-05 31-DEC-05
         2 Mark       01-FEB-05 31-DEC-05
         3 Doug       01-MAR-05 31-DEC-05
         4 Niall      01-APR-05 31-DEC-05
         5 Tom        01-MAY-05 31-DEC-05
         6 Jonathan   01-JUN-05 31-DEC-05
         7 Lisa       01-JUL-05 31-DEC-05
         8 Cary       01-AUG-05 31-DEC-05
         9 Mogens     01-SEP-05 31-DEC-05
        10 Anjo       01-OCT-05 31-DEC-05

10 rows selected.

Execution Plan
----------------------------------------------------------------------------------------
| Id | Operation         | Name               | Rows | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |                    |   11 |   264 |     3   (0)| 00:00:01 |
|* 1 |  TABLE ACCESS FULL| TEST_1_DISTINCT_TD |   11 |   264 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
…And Again

select * from test_2_distinct_td
where to_date('09-OCT-2005','DD-MON-YYYY') between from_date and to_date;

       KEY NON_KEY_AT FROM_DATE TO_DATE
---------- ---------- --------- ---------
         7 Lisa       01-JUL-05 31-DEC-05
         8 Cary       01-AUG-05 31-DEC-05
         9 Mogens     01-SEP-05 31-DEC-05
        10 Anjo       01-OCT-05 31-DEC-05

4 rows selected.

Execution Plan
----------------------------------------------------------------------------------------
| Id | Operation         | Name               | Rows | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |                    |   11 |   264 |     3   (0)| 00:00:01 |
|* 1 |  TABLE ACCESS FULL| TEST_2_DISTINCT_TD |   11 |   264 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
…And Again

select * from test_12_distinct_td
where to_date('09-OCT-2005','DD-MON-YYYY') between from_date and to_date;

       KEY NON_KEY_AT FROM_DATE TO_DATE
---------- ---------- --------- ---------
        10 Anjo       01-OCT-05 31-OCT-05

1 row selected.

Execution Plan
-----------------------------------------------------------------------------------------
| Id | Operation         | Name                | Rows | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |                     |    4 |    96 |     3   (0)| 00:00:01 |
|* 1 |  TABLE ACCESS FULL| TEST_12_DISTINCT_TD |    4 |    96 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
Workarounds

 Ignore it
– If your query still gets the right plan of course!
 Hints
– Force the optimizer to do as you tell it
 Stored outlines
 Adjust statistics held against the table
– Affects any SQL that accesses that object
 Optimizer Profile (10g)
– Offline Optimisation1
 Dynamic sampling level 4 or above
– Samples “single table predicates that reference 2 or more
columns”
– Takes extra time during the parse – minimal but often worth it
1 - Jonathan Lewis: Cost-Based Oracle Fundamentals, Chapter 2
Dynamic Sampling With A Hint

select /*+ dynamic_sampling(test_1_distinct_td,4) */ *
from test_1_distinct_td
where to_date('09-OCT-2005','DD-MON-YYYY') between from_date and to_date;

       KEY NON_KEY_AT FROM_DATE TO_DATE
---------- ---------- --------- ---------
         1 Jeff       01-JAN-05 31-DEC-05
         2 Mark       01-FEB-05 31-DEC-05
         3 Doug       01-MAR-05 31-DEC-05
         4 Niall      01-APR-05 31-DEC-05
         5 Tom        01-MAY-05 31-DEC-05
         6 Jonathan   01-JUN-05 31-DEC-05
         7 Lisa       01-JUL-05 31-DEC-05
         8 Cary       01-AUG-05 31-DEC-05
         9 Mogens     01-SEP-05 31-DEC-05
        10 Anjo       01-OCT-05 31-DEC-05

10 rows selected.

Execution Plan
----------------------------------------------------------------------------------------
| Id | Operation         | Name               | Rows | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |                    |   10 |   240 |     3   (0)| 00:00:01 |
|* 1 |  TABLE ACCESS FULL| TEST_1_DISTINCT_TD |   10 |   240 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Find Out Where Your Query Is At

 Data Warehouses are big, big, BIG!


– Big on rows
– Big on disk storage
– Big on hardware
– Big SQL statements issued
 Lots of data to scan, join and sort
 Many operations
 Long running
 So where is my long running query at ?
– No solid answers here, just food for thought…
A “Big” Query Execution Plan

Operations involved:
 Sorts
 Aggregations
 Hash joins
 Merge joins
 Table scans
 Materialized View scans
 Analytics
 Parallel Query
 Pruning
 Temp Space Use

--------------------------------------------------------------------------------------------------------------
| Id | Operation                             | Name                      | Rows | Bytes |TempSpc| Cost (%CPU)|
--------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                      |                           |    1 |   124 |       | 49722 (10)|
|  1 |  PX COORDINATOR                       |                           |      |       |       |           |
|  2 |   PX SEND QC (RANDOM)                 | :TQ20006                  |    1 |   124 |       | 49722 (10)|
|  3 |    HASH JOIN                          |                           |    1 |   124 |       | 49722 (10)|
|  4 |     BUFFER SORT                       |                           |      |       |       |           |
|  5 |      PX RECEIVE                       |                           |  207K|  9510K|       | 25982  (9)|
|  6 |       PX SEND BROADCAST               | :TQ20000                  |  207K|  9510K|       | 25982  (9)|
|  7 |        VIEW                           |                           |  207K|  9510K|       | 25982  (9)|
|  8 |         WINDOW SORT                   |                           |  207K|    10M|    26M| 25982  (9)|
|  9 |          MERGE JOIN                   |                           |  207K|    10M|       | 25976  (9)|
| 10 |           TABLE ACCESS BY INDEX ROWID | AML_T_ANALYSIS_DATE       |    1 |    22 |       |     2  (0)|
| 11 |            INDEX UNIQUE SCAN          | AML_I_ANL_PK              |    1 |       |       |     0  (0)|
| 12 |           SORT AGGREGATE              |                           |    1 |     9 |       |           |
| 13 |            PX COORDINATOR             |                           |      |       |       |           |
| 14 |             PX SEND QC (RANDOM)       | :TQ10000                  |    1 |     9 |       |           |
| 15 |              SORT AGGREGATE           |                           |    1 |     9 |       |           |
| 16 |               FILTER                  |                           |      |       |       |           |
| 17 |                PX BLOCK ITERATOR      |                           |    1 |     9 |       |     2  (0)|
| 18 |                 TABLE ACCESS FULL     | AML_T_ANALYSIS_DATE       |    1 |     9 |       |     2  (0)|
| 19 |     FILTER                            |                           |      |       |       |           |
| 20 |      TABLE ACCESS FULL                | AML_T_BILLING_ACCOUNT_DIM |   82M| 2371M |       |  5457  (5)|
| 21 |     HASH JOIN                         |                           |   18M| 1340M |       | 23704 (10)|
| 22 |      HASH JOIN                        |                           |   10M|  500M |       | 17005 (11)|
| 23 |       PX RECEIVE                      |                           |   10M|  265M |       | 11304 (14)|
| 24 |        PX SEND HASH                   | :TQ20003                  |   10M|  265M |       | 11304 (14)|
| 25 |         BUFFER SORT                   |                           |    1 |   124 |       |           |
| 26 |          VIEW                         | AML_V_MD_CUH_SID          |   10M|  265M |       | 11304 (14)|
| 27 |           HASH JOIN                   |                           |   10M|  337M |       | 11304 (14)|
| 28 |            PX RECEIVE                 |                           |   17M|  310M |       |  5228 (18)|
| 29 |             PX SEND HASH              | :TQ20001                  |   17M|  310M |       |  5228 (18)|
| 30 |              PX BLOCK ITERATOR        |                           |   17M|  310M |       |  5228 (18)|
| 31 |               TABLE ACCESS FULL       | AML_T_MEASURE_DIM         |   17M|  310M |       |  5228 (18)|
| 32 |            PX RECEIVE                 |                           |   34M|  461M |       |  5958 (10)|
| 33 |             PX SEND HASH              | :TQ20002                  |   34M|  461M |       |  5958 (10)|
| 34 |              PX BLOCK ITERATOR        |                           |   34M|  461M |       |  5958 (10)|
| 35 |               TABLE ACCESS FULL       | AML_T_CUSTOMER_DIM        |   34M|  461M |       |  5958 (10)|
| 36 |       PX RECEIVE                      |                           |   55M| 1212M |       |  5562  (3)|
| 37 |        PX SEND HASH                   | :TQ20004                  |   55M| 1212M |       |  5562  (3)|
| 38 |         PX BLOCK ITERATOR             |                           |   55M| 1212M |       |  5562  (3)|
| 39 |          TABLE ACCESS FULL            | AML_T_CUSTOMER_DIM        |   55M| 1212M |       |  5562  (3)|
| 40 |      PX RECEIVE                       |                           |   94M| 2516M |       |  6483  (5)|
| 41 |       PX SEND HASH                    | :TQ20005                  |   94M| 2516M |       |  6483  (5)|
| 42 |        PX BLOCK ITERATOR              |                           |   94M| 2516M |       |  6483  (5)|
| 43 |         MAT_VIEW ACCESS FULL          | AML_M_CD_BAD              |   94M| 2516M |       |  6483  (5)|
--------------------------------------------------------------------------------------------------------------
V$ Views To The Rescue ?

 V$SESSION – Identify your session


 V$SQL_PLAN – Get the execution plan operations
 V$SQL_WORKAREA – Get all the work areas which will be required
 V$SESSION_LONGOPS – Get information on long plan operations
 V$SQL_WORKAREA_ACTIVE – Get the work area(s) being used right now

V$SESSION: SID, SERIAL#, PROGRAM, USERNAME, SQL_ID, SQL_CHILD_NUMBER, SQL_ADDRESS, SQL_HASH_VALUE

V$SQL_PLAN: SQL_ID, CHILD_NUMBER, ADDRESS, HASH_VALUE, OPERATION, ID, PARENT_ID

V$SQL_WORKAREA: SQL_ID, CHILD_NUMBER, WORKAREA_ADDRESS, OPERATION_ID, OPERATION_TYPE, POLICY

V$SQL_WORKAREA_ACTIVE: SQL_ID, SQL_HASH_VALUE, WORKAREA_ADDRESS, OPERATION_ID, OPERATION_TYPE, SID, QCSID, ACTIVE_TIME

V$SESSION_LONGOPS: SID, SERIAL#, OPNAME, TARGET, MESSAGE, SQL_ID, SQL_ADDRESS, SQL_HASH_VALUE, ELAPSED_SECONDS
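Putting those views together, a sketch of a progress check for one session (the SID value 123 is illustrative):

```sql
-- Which plan operation is doing work right now, and how much
-- PGA memory its work area is consuming
SELECT p.id, p.operation, p.options, w.operation_type,
       ROUND(w.actual_mem_used/1024/1024) mem_mb
FROM   v$session s
       JOIN v$sql_workarea_active w ON w.sid = s.sid
       JOIN v$sql_plan p ON  p.sql_id       = s.sql_id
                         AND p.child_number = s.sql_child_number
                         AND p.id           = w.operation_id
WHERE  s.sid = 123;

-- How far along the long-running operations are
SELECT opname, target, sofar, totalwork,
       ROUND(100 * sofar / totalwork) pct_done
FROM   v$session_longops
WHERE  sid = 123
AND    totalwork > 0
AND    sofar < totalwork;
```

As the slides that follow note, this is only food for thought: multiple work areas (or none) can be active at once, and not every operation appears in V$SESSION_LONGOPS.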
Demonstration
Problems

 V$SQL_PLAN Bug
– Service Request: 4990863.992
– Broken in 10gR1, Works in 10gR2
– PARENT_ID corruption
 Can’t link rows in this view to their parents as the values are
corrupted due to this bug
 Shows up in TEMP TABLE TRANSFORMATION operations
 Multiple Work Areas can be active…or None
 Some operations are not shown in Long ops
 V$SESSION sql_id may not be the executing cursor
– E.g. for refreshing Materialized View

* Test case for bug: http://www.oramoss.demon.co.uk/Code/test_error_v_sql_plan.sql


Questions ?
References: Papers

 Table Compression in Oracle 9iR2: A Performance Analysis


 Table Compression in Oracle 9iR2: An Oracle White Paper
 “Fallacies Of The Cost Based Optimizer”, Wolfgang Breitling
 “Scaling To Infinity, Partitioning In Oracle Data Warehouses”, Tim Gorman
 Advanced Management Of Working Areas in Oracle 9i/10g, UKOUG 2005, Joze Senegacnik
 Oracle9i Memory Management: Easier Than Ever, Oracle Open World 2002, Sushil Kumar
 Working with Automatic PGA, Christo Kutrovsky
 Optimising Oracle9i Instance Memory, Ramaswamy, Ramesh
 Oracle Metalink Note 223730.1: Automatic PGA Memory Management in 9i
 Oracle Metalink Note 147806.1:
Oracle9i New Feature: Automated SQL Execution Memory Management
 Oracle Metalink Note 148346.1:
Oracle9i Monitoring Automated SQL Execution Memory Management
 Memory Management and Latching Improvements in Oracle Database 9i and 10g, Oracle Open World 2005, Tanel Põder
 If Your Memory Serves You Right…, IOUG Live! 2004, April 2004, Toronto, Canada,
Richmond Shee
 Decision Speed: Table Compression In Action
References: Online Presentation / Code
 http://www.oramoss.demon.co.uk/presentations/fivetuningtipsforyourdatawarehouse.ppt
 http://www.oramoss.demon.co.uk/Code/mgmt_p_get_max_compression_order.prc
 http://www.oramoss.demon.co.uk/Code/test_dml_performance_delete.sql
 http://www.oramoss.demon.co.uk/Code/test_dml_performance_insert.sql
 http://www.oramoss.demon.co.uk/Code/test_dml_performance_update.sql
 http://www.oramoss.demon.co.uk/Code/test_error_v_sql_plan.sql
 http://www.oramoss.demon.co.uk/Code/run_big_query.sql
 http://www.oramoss.demon.co.uk/Code/run_big_query_parallel.sql
 http://www.oramoss.demon.co.uk/Code/get_query_progress.sql
