
ADVT SQL Plan Explained

1. The query is the same each day except for the date.
2. There are many similar queries with different predicates on column SOURCE.

The query runs for a long time because of high LIO (logical I/O).

A NESTED LOOPS JOIN with a bad cardinality estimate on the first row source is a major cause of the high LIO and CPU usage.

1. Those tiny row estimates are usually the result of missing stats.
2. In this specific case, the query started before the partition stats were ready.
3. When there are no partition stats, global stats are used.

1. Here is the plan currently running, using a SQL profile to force a hash join.
2. Note the high cost of the HASH GROUP BY at the bottom.

Query Structure
1. Three views: SELECT /*+ parallel(d,4) full(d) */
2. One simple view (will be merged)
3. Two complex views (cannot be merged)

FROM SOURCE_BY_SRCH_DLY_REV_MASK s,
     ( complex view based on source_search_type_daily t1 ) t,
     ( complex view based on DM_SUMMARY_DAILY d ) d
WHERE d.datestamp >= s.start_date
  AND d.datestamp <= nvl(s.end_date, sysdate)   -- the troublemaker
  ... other join conditions
GROUP BY d.domain, d.source, d.search_type, d.query_source,
         t.revenue, t.adjusted_revenue, s.descriment_rev_pct
Three tables:
1. SRC_BY_SRCH_DREV_MASK_ED: small code table
2. DM_SUMMARY_DAILY: daily partitioned; using a single partition
3. SOURCE_SEARCH_TYPE_DAILY: daily partitioned; using a single partition

Inline View T
SELECT mrkt_id, datestamp, SOURCE, query_source, search_type,
       SUM (revenue) revenue,
       SUM (adjusted_revenue) adjusted_revenue
  FROM source_search_type_daily t1
 WHERE t1.datestamp = TO_DATE ('20120715', 'yyyymmdd')
   AND t1.mrkt_id = 0
 GROUP BY mrkt_id, datestamp, SOURCE, query_source, search_type

1. Accesses a single partition of SOURCE_SEARCH_TYPE_DAILY.
2. MRKT_ID=0 (skewed, about half of the data)

Inline View D
select mrkt_id, datestamp, SOURCE, query_source, search_type, domain,
       pageview_type, country_of_origin,
       sum(pageviews) pageviews,
       sum(bidded_searches) bidded_searches,
       sum(bidded_results) bidded_results,
       sum(bidded_clicks) bidded_clicks,
       sum(revenue) revenue
  from DM_SUMMARY_DAILY d
 where d.datestamp = to_date('20120715', 'yyyymmdd')
   and d.source like 'geosign%derp'
   and d.mrkt_id = 0
 group by mrkt_id, datestamp, SOURCE, query_source, search_type, domain,
          pageview_type, country_of_origin
1. Accesses a single partition of DM_SUMMARY_DAILY.
2. MRKT_ID=0
3. SOURCE uses a LIKE expression.

How to Analyze This Query?


Verify the row (cardinality) estimates and see whether we can reproduce them from the available table statistics. If we can, check whether anything is wrong with the stats. The last resort is a 10053 trace event to understand how Oracle arrives at a bad (or good) plan.

How to Calculate Cardinality


Cardinality = num_rows * (selectivity of column 1) * (selectivity of column 2) * ... * (selectivity of column n)

Column selectivity:
1. Without histograms, or with a bind value: 1/(number of distinct values (NDV))
2. With frequency histograms: (number of buckets for the specified value) / (total bucket number)
3. With height-balanced histograms: if the value occupies more than 1 bucket, see 2. Otherwise, use the density from the column stats; 1/NDV can always be used as a reference.
4. For an inequality predicate with a bind variable or function: 0.05
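The rules above can be sanity-checked with a small Python sketch (an illustration of the formula, not Oracle's actual code); the numbers used are the DM_SUMMARY_DAILY statistics quoted elsewhere in this document.

```python
from math import ceil

def cardinality(num_rows, *selectivities):
    """Cardinality = num_rows * sel(col1) * ... * sel(coln)."""
    card = num_rows
    for sel in selectivities:
        card *= sel
    return card

# Global stats for DM_SUMMARY_DAILY: NDV-based selectivity (rule 1) per column.
est = cardinality(23_451_579_811, 1/2145, 1/25, 1/65524)
print(ceil(est))   # the ~6.67 estimate is rounded up to 7

# Partition stats with a frequency histogram (rule 2) for mrkt_id=0.
est_hist = cardinality(5_127_832, 3946/5551, 1/601)
print(round(est_hist))   # ~6065
```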

Bad Plan
DM_SUMMARY_DAILY
Partition stats not ready, so global stats are used.
Global stats: rows: 23,451,579,811; NDV: datestamp 2145, mrkt_id 25, source 65524.
Estimate: 23,451,579,811 * (1/2145) * (1/25) * (1/65524) = 6.6742, rounded up to 7.
Actual partition stats: rows: 5,127,832; NDV: datestamp 1, mrkt_id 23, source 601.
Estimate if the partition stats had been ready: 5,127,832 * (1/1) * (1/23) * (1/601) = 371.
If using the histogram for mrkt_id=0 (3946 out of 5551 buckets): 5,127,832 * (3946/5551) * (1/601) = 6065.

SRC_BY_SRCH_DREV_MASK_ED
No stats. Cardinality defaults to (block_size - cache layer) * blocks / 100. block_size is 16K and blocks is 5, so roughly 16*1024*5/100 = 819.2. (I am not sure about the value of the cache layer.)

SOURCE_SEARCH_TYPE_DAILY, per (datestamp,mrkt_id,source)


Neither partition nor global stats were captured.

We concluded that the lack of stats was the reason, so I will skip further research on this plan.

Good Plan With SQL profile


DM_SUMMARY_DAILY
Actual partition stats: rows: 5,193,086; NDV: datestamp 1, mrkt_id 21, source 609 (a huge difference from the global stats).
Estimate using partition stats: 5,193,086 * (1/1) * (1/21) * (1/609) = 406.
If using the histograms for mrkt_id=0 (3924 out of 5615 buckets) and for source like 'geosign%derp' (6 out of 254 buckets): 5,193,086 * (3924/5615) * (6/254) = 85,727.

SRC_BY_SRCH_DREV_MASK_ED
It still uses the default of 818 rows; the actual row count is 61.

SOURCE_SEARCH_TYPE_DAILY
Partition stats: rows: 3,312,381; NDV: datestamp 1, mrkt_id 26 (histogram for value 0: 3015 out of 5590 buckets), source 11890.
With a hash join, filtering on datestamp and mrkt_id=0: 3,312,381 * (3015/5590) = 1,786,542 (1786K in the plan).
With join predicate push-down on column source, the estimate per (datestamp, mrkt_id, source) is 3,312,381 * (3015/5590) * (1/11890) = 150. Here column source is treated as a bind value.

MRKT_ID Histograms

Data is skewed on MRKT_ID=0

SOURCE Histograms

Not easy to count the actual buckets

How Does Oracle Evaluate Join Orders?


Estimated cardinalities of each row source:
SRC_BY_SRCH_DREV_MASK_ED: 818
View D on DM_SUMMARY_DAILY: 406 or 85,727, depending on whether histograms are available
View T on SOURCE_SEARCH_TYPE_DAILY: 1,786,542

Oracle normally starts from the row source with the smallest estimated cardinality, then the next smallest, and eventually tries all the combinations (the factorial of the number of tables, here 3! = 6). So in this case, if histograms are used, the first table will be SRC_BY_SRCH_DREV_MASK_ED; otherwise it will be DM_SUMMARY_DAILY. Since the view on SOURCE_SEARCH_TYPE_DAILY is the last to be evaluated, its cardinality estimate is usually not very important, but the costs of its different access methods are, and those costs are very sensitive to the row count coming out of the join of the other two tables.
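The join-order search described above can be illustrated with a small sketch (illustrative only; the table names and cardinalities are from this document, and the ordering logic is a simplification of what the optimizer actually does):

```python
from itertools import permutations

# Estimated cardinalities from the slides (histogram case).
est_rows = {
    "SRC_BY_SRCH_DREV_MASK_ED": 818,
    "VIEW_D_ON_DM_SUMMARY_DAILY": 85_727,
    "VIEW_T_ON_SOURCE_SEARCH_TYPE_DAILY": 1_786_542,
}

# With 3 row sources the optimizer can consider 3! = 6 join orders.
orders = list(permutations(est_rows))
print(len(orders))  # 6

# The initial join order starts from the smallest estimated row source.
start = min(est_rows, key=est_rows.get)
print(start)  # SRC_BY_SRCH_DREV_MASK_ED
```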

Join Cardinality Between S and D


Join Cardinality =
(num_rows_S - num_nulls_S) * (num_rows_D - num_nulls_D) / max(NDV(mrkt_id_S), NDV(mrkt_id_D))

The NDV of 21 was found in the 10053 trace for the small table. It is interesting how Oracle derives this default value: it is the actual NDV of the other table at partition level. If no histograms are used:
(818-0)*(406-0)/max(21,1) = 15,814

If histograms are used:
(818-0)*(85727-0)/max(21,1) = 3,339,270
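The two cases above can be checked with a few lines of Python (a sketch of the standard join-cardinality formula; num_nulls are taken as 0 as in the slides):

```python
def join_cardinality(rows_s, nulls_s, rows_d, nulls_d, ndv_s, ndv_d):
    # (num_rows_S - num_nulls_S) * (num_rows_D - num_nulls_D)
    #   / max(NDV(join_col_S), NDV(join_col_D))
    return (rows_s - nulls_s) * (rows_d - nulls_d) / max(ndv_s, ndv_d)

no_hist = join_cardinality(818, 0, 406, 0, 21, 1)      # D estimated at 406
with_hist = join_cardinality(818, 0, 85_727, 0, 21, 1) # D estimated at 85,727
print(int(no_hist), int(with_hist))  # 15814 3339270
```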

Join Cardinality Between S and D


After being filtered by d.datestamp >= s.start_date and d.datestamp <= nvl(s.end_date, sysdate): each inequality join filter downgrades the cardinality to 5%, so here it is 5% * 5% = 0.25%.

Without histograms: 15,814 * 0.25% = 39.535 -> 40
With histograms: 3,339,270 * 0.25% = 8348.175 -> 8349 (the plan shows 8347)
Fortunately, the result is inflated by SRC_BY_SRCH_DREV_MASK_ED, by 818/61 = 13.4 times. Side note: when dynamic sampling was tried as an attempt to resolve this issue, it returned the actual count of SRC_BY_SRCH_DREV_MASK_ED, that is, 61. So even with histograms, the join cardinality estimate would be only about 622, not enough for Oracle to pick the right plan.
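The 5%-per-inequality rule and the dynamic-sampling scenario can be verified numerically (a sketch using the figures above):

```python
# Two inequality join filters on datestamp: each contributes the 5% guess.
filter_sel = 0.05 * 0.05                   # 0.25%

print(round(15_814 * filter_sel, 3))       # 39.535, rounded up to 40
print(round(3_339_270 * filter_sel, 3))    # 8348.175 -> 8349

# With the actual 61 rows (found by dynamic sampling) instead of the default 818:
actual = 61 * 85_727 / 21 * filter_sel
print(int(actual))                         # ~622, still far too low
```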

FTS Cost
FTS cost formula: cost = #SRds + #MRds * mreadtim/sreadtim + #CPUCycles/(cpuspeed * sreadtim)
When using noworkload statistics, as in this case:
MBRC = db_file_multiblock_read_count
sreadtim = ioseektim + db_block_size/iotfrspeed
mreadtim = ioseektim + db_file_multiblock_read_count * db_block_size/iotfrspeed
#SRds: number of single-block reads
#MRds: number of multi-block reads, each of size db_file_multiblock_read_count (i.e., blocks/MBRC)

Cost Estimate For View T


FTS on SOURCE_SEARCH_TYPE_DAILY: 26,243 blocks. Parameters (from the 10053 trace; all default values except CPUSPEEDNW):
db_file_multiblock_read_count: 16
CPUSPEEDNW: 1583 million instructions/sec (default is 100)
IOTFRSPEED: 4096 bytes per millisecond (default is 4096)
IOSEEKTIM: 10 milliseconds (default is 10)

sreadtim = 10 + 16*1024/4096 = 14
mreadtim = 10 + 16*16*1024/4096 = 74
FTS cost = 0 + (26,243/16) * 74/14 + cpu_cost = 8669.5625 + cpu_cost
The plan used a cost of 8736; the difference is the cpu_cost of reading rows and filtering the result. Because view T is an aggregated complex view, there is a large additional cost for sorting and grouping, bringing the total cost to 31,796.
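The FTS cost arithmetic can be reproduced directly (a sketch; note that #MRds = blocks/MBRC, there are no single-block reads here, and cpu_cost is left out):

```python
# Noworkload system statistics (from the 10053 trace / defaults).
IOSEEKTIM = 10           # ms
IOTFRSPEED = 4096        # bytes per ms
DB_BLOCK_SIZE = 16 * 1024
MBRC = 16                # db_file_multiblock_read_count
BLOCKS = 26_243          # blocks in the SOURCE_SEARCH_TYPE_DAILY partition

sreadtim = IOSEEKTIM + DB_BLOCK_SIZE / IOTFRSPEED          # 14
mreadtim = IOSEEKTIM + MBRC * DB_BLOCK_SIZE / IOTFRSPEED   # 74

mrds = BLOCKS / MBRC                   # number of multi-block reads
io_cost = mrds * mreadtim / sreadtim   # 8669.5625, before cpu_cost
print(sreadtim, mreadtim, io_cost)
```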

Index Scan Cost


Cost = blevel + ceiling(leaf_blocks * effective index selectivity) + ceiling(clustering_factor * effective table selectivity)
The effective index selectivity is calculated as the product of the selectivities of all leading index columns that appear in the predicates. If the index has more columns than the predicates, stop at the first column without a predicate.

Cost Estimate For T


Cost estimate via JPPD (join predicate push-down), per (datestamp, mrkt_id, source), via an index range scan on IDX2_SOURCE_SEARCH_TYPE_DAILY:
blevel: 2
leaf_blocks: 11,818
clustering_factor: 530,949
Effective selectivity: sel(datestamp) * sel(mrkt_id) * sel(source) = 1 * (3015/5590) * (1/11890) = 0.0000453621
Cost = 2 + ceil(0.536) + ceil(24.08) = 28
Cardinality estimate: 3,312,381 * 0.0000453621 = 150
Because of the low cardinality, the GROUP BY runs in memory and its cost can be ignored.
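The index-scan cost formula applied to IDX2_SOURCE_SEARCH_TYPE_DAILY can be replayed as follows (a sketch using the partition-level figures above):

```python
from math import ceil

blevel, leaf_blocks, clustering_factor = 2, 11_818, 530_949
num_rows = 3_312_381

# sel(datestamp) * sel(mrkt_id) * sel(source); source acts as a bind value.
sel = 1 * (3015 / 5590) * (1 / 11_890)   # ~0.0000453621

cost = blevel + ceil(leaf_blocks * sel) + ceil(clustering_factor * sel)
card = round(num_rows * sel)
print(cost, card)  # 28 150
```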

Cost Estimate for T


If partition stats are not ready, global stats are used (for index IDX2_SOURCE_SEARCH_TYPE_DAILY):
num_rows: 2,967,427,119; blevel: 3; leaf_blocks: 10,109,550; clustering_factor: 213,185,700
NDV: datestamp 3350, mrkt_id 32, source 50900; histogram for mrkt_id value 0: 4393 out of 9212 buckets
Effective index selectivity: (1/3350) * (4393/9212) * (1/50900) = 2.796692e-9
Cardinality: 2,967,427,119 * 2.796692e-9 = 8.3 -> 9
Cost: 3 + ceil(10,109,550 * 2.796692e-9) + ceil(213,185,700 * 2.796692e-9) = 3 + 1 + 1 = 5
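The same formula fed with the stale global index stats yields the tiny cost and cardinality that mislead the optimizer (sketch, using the figures above):

```python
from math import ceil

blevel, leaf_blocks, clustering_factor = 3, 10_109_550, 213_185_700
num_rows = 2_967_427_119

# (1/NDV datestamp) * histogram fraction for mrkt_id=0 * (1/NDV source)
sel = (1 / 3350) * (4393 / 9212) * (1 / 50_900)   # ~2.796692e-9

cost = blevel + ceil(leaf_blocks * sel) + ceil(clustering_factor * sel)
card = num_rows * sel
print(cost, ceil(card))  # 5 9
```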

Cost With Three Table/View Joins


Assume the join cost between S and view D is cost(S,D). The cost after joining view T can be calculated as follows:
with hash join / FTS on T: cost(S,D) + 31,796
with JPPD / NESTED LOOPS JOIN: cost(S,D) + rows(S,D) * 28 / PX, where PX is the DOP from D

So the final decision: compare 31,796 with rows(S,D) * 28 / PX and pick the smaller one. With PX = 4, rows(S,D) has to reach about 4,543 (31,796 * 4 / 28 = 4,542.3) before the correct HASH JOIN is selected. Here are the estimates for rows(S,D):
1: when D has no partition stats
40: when D has partition stats, without histograms
8349: when D has partition stats, with histograms
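This final comparison can be expressed as a small decision sketch (function and variable names are mine; 28 is the per-probe index cost and 31,796 the FTS + GROUP BY cost from the earlier slides):

```python
INDEX_PROBE_COST = 28      # NL join probes the index once per driving row
FTS_GROUP_BY_COST = 31_796 # FTS on T plus the HASH GROUP BY
PX = 4                     # degree of parallelism from D

def chosen_plan(rows_sd):
    nl_cost = rows_sd * INDEX_PROBE_COST / PX
    return "HASH JOIN" if FTS_GROUP_BY_COST < nl_cost else "NESTED LOOPS"

breakeven = FTS_GROUP_BY_COST * PX / INDEX_PROBE_COST
print(round(breakeven, 1))          # ~4542.3 driving rows
for rows in (1, 40, 8349):          # the three rows(S,D) scenarios
    print(rows, chosen_plan(rows))
```

Only the 8,349-row estimate (partition stats plus histograms) clears the break-even point, which is why the histogram case is the only one that picks the hash join on its own.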

What if T has no partition stats? What if we put a PX hint on view T, or on the table SOURCE_SEARCH_TYPE_DAILY inside view T? The FTS cost would then change to 31,796/PX.
