1. The query is the same each day except for the date.
2. Many similar queries run with different predicates on column source.
A NESTED LOOPS join with a bad cardinality estimate on its first row source is a major cause of high LIO and CPU usage.
1. Such small row estimates are usually the result of missing stats.
2. In this specific case, the query started before the partition stats were ready.
3. When there are no partition stats, global stats are used.
1. Here is the plan currently running, using a SQL profile to force the hash join.
2. Note the high cost of HASH GROUP BY at the bottom.
Query Structure
1. Three views
2. One simple view (will be merged)
3. Two complex views (cannot be merged)

SELECT /*+ parallel(d,4) full(d) */ ...
FROM SOURCE_BY_SRCH_DLY_REV_MASK s,
     ( Complex View based on source_search_type_daily t1 ) t,
     ( Complex View based on DM_SUMMARY_DAILY d ) d
WHERE d.datestamp >= s.start_date               -- the troublemaker
  AND d.datestamp <= nvl(s.end_date, sysdate)   -- the troublemaker
  ... other join conditions
GROUP BY d.domain, d.source, d.search_type, d.query_source,
         t.revenue, t.adjusted_revenue, s.descriment_rev_pct

Three tables:
1. SRC_BY_SRCH_DREV_MASK_ED: small code table.
2. DM_SUMMARY_DAILY: daily partitioned. The query uses a single partition.
3. SOURCE_SEARCH_TYPE_DAILY: daily partitioned. The query uses a single partition.
Inline View T
SELECT mrkt_id, datestamp, SOURCE, query_source, search_type,
       SUM (revenue) revenue,
       SUM (adjusted_revenue) adjusted_revenue
FROM source_search_type_daily t1
WHERE t1.datestamp = TO_DATE ('20120715', 'yyyymmdd')
  AND t1.mrkt_id = 0
GROUP BY mrkt_id, datestamp, SOURCE, query_source, search_type

1. Access a single partition of SOURCE_SEARCH_TYPE_DAILY.
2. MRKT_ID=0 (skewed, about half of the data).
Inline View D
select mrkt_id, datestamp, SOURCE, query_source, search_type,
       domain, pageview_type, country_of_origin,
       sum(pageviews) pageviews,
       sum(bidded_searches) bidded_searches,
       sum(bidded_results) bidded_results,
       sum(bidded_clicks) bidded_clicks,
       sum(revenue) revenue
from DM_SUMMARY_DAILY d
where d.datestamp = to_date('20120715', 'yyyymmdd')
  and d.source like 'geosign%derp'
  and d.mrkt_id = 0
group by mrkt_id, datestamp, SOURCE, query_source, search_type,
         domain, pageview_type, country_of_origin
1. Access a single partition of DM_SUMMARY_DAILY.
2. MRKT_ID=0
3. SOURCE uses a LIKE expression.
Bad Plan
DM_SUMMARY_DAILY
Partition stats were not ready, so global stats were used.
Global rows: 23,451,579,811
Global NDV: datestamp: 2145, mrkt_id: 25, source: 65524
Estimate: 23,451,579,811*(1/2145)*(1/25)*(1/65524) = 6.6742, rounded up to 7.
Actual partition stats: rows: 5,127,832; NDV: datestamp: 1, mrkt_id: 23, source: 601
Estimate if the partition stats had been ready: 5,127,832*(1/1)*(1/23)*(1/601) = 371.
Estimate if the histogram is used for mrkt_id=0 (3946 out of 5551 bucket numbers): 5,127,832*(3946/5551)*(1/601) = 6065.
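The three estimates above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming independent equality predicates (rows multiplied by 1/NDV per column), which is the model the slide applies; the function name `equality_cardinality` is made up for illustration and is not an Oracle API.

```python
import math

def equality_cardinality(num_rows, ndvs):
    """Estimated rows after equality predicates on independent columns:
    num_rows * (1/NDV_1) * (1/NDV_2) * ..."""
    card = float(num_rows)
    for ndv in ndvs:
        card /= ndv
    return card

# Global stats (partition stats not ready): datestamp, mrkt_id, source
global_est = equality_cardinality(23_451_579_811, [2145, 25, 65524])

# Partition stats, no histogram
part_est = equality_cardinality(5_127_832, [1, 23, 601])

# Partition stats, histogram on mrkt_id: value 0 covers 3946 of 5551 bucket numbers
hist_est = 5_127_832 * (3946 / 5551) / 601

print(f"global:    {global_est:.4f} -> {math.ceil(global_est)}")
print(f"partition: {part_est:.2f} -> {round(part_est)}")
print(f"histogram: {hist_est:.2f} -> {round(hist_est)}")
```

The gap between 7 and 6065 rows for the same partition is what decides whether DM_SUMMARY_DAILY looks like a cheap driving row source for a nested loops join.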
SRC_BY_SRCH_DREV_MASK_ED
No stats. The row count defaults to (block_size - cache_layer)*blocks/100. block_size is 16K and blocks is 5, so ignoring the cache layer: 16*1024*5/100 = 819.2. Not sure about the value of the cache layer.
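The default can be sketched as a small function. The cache-layer size is not given in the source; the 24 bytes used below is purely an assumption, chosen because it happens to reproduce the 818 rows that appear in the plan, and should not be read as a documented Oracle constant.

```python
def default_row_estimate(block_size_bytes, blocks, cache_layer_bytes=0):
    """Default row count for a table without stats, per the formula on the slide:
    (block_size - cache_layer) * blocks / 100."""
    return (block_size_bytes - cache_layer_bytes) * blocks / 100

# Ignoring the cache layer, as in the slide's arithmetic
no_overhead = default_row_estimate(16 * 1024, 5)

# ASSUMPTION: a 24-byte cache layer (not stated in the source)
with_overhead = default_row_estimate(16 * 1024, 5, cache_layer_bytes=24)

print(no_overhead, with_overhead)
```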
We attributed this plan to the missing stats, so I will skip further research on it.
SRC_BY_SRCH_DREV_MASK_ED
Still uses the default of 818 rows. The actual row count is 61.
SOURCE_SEARCH_TYPE_DAILY
Partition stats: rows: 3,312,381; NDV: datestamp: 1, mrkt_id: 26 (histogram for value 0: 3015 out of 5590), source: 11890
When using a hash join, with datestamp and mrkt_id=0: 3,312,381*(3015/5590) = about 1,786,542 (1786K in the plan).
When using join predicate pushdown on column source, the estimate for each (datestamp, mrkt_id, source) is 3,312,381*(3015/5590)*(1/11890) = 150. Here column source is treated as a bind value.
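A quick check of both estimates for view T, as a sketch using only the figures quoted above. Direct arithmetic lands within a dozen rows of the quoted 1,786,542; both display as 1786K in a plan. The variable names are illustrative.

```python
rows_in_partition = 3_312_381
mrkt0_selectivity = 3015 / 5590   # histogram share of mrkt_id = 0
source_ndv = 11890

# Hash join: view T is scanned once, filtered on datestamp and mrkt_id = 0
hash_join_rows = rows_in_partition * mrkt0_selectivity

# Join predicate pushdown: one probe per source value, source treated like a bind
pushed_rows_per_probe = hash_join_rows / source_ndv

print(f"hash join estimate:  {hash_join_rows:,.0f}")
print(f"per-probe estimate:  {pushed_rows_per_probe:.1f}")
```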
MRKT_ID Histograms
SOURCE Histograms
View T on SOURCE_SEARCH_TYPE_DAILY
Estimated rows: 1,786,542
Oracle normally starts the join order search from the row source with the smallest estimated cardinality, then the next smallest, and eventually considers all permutations (factorial of the number of tables, here 3! = 6). So in this case, if histograms are used, the first table will be SRC_BY_SRCH_DREV_MASK_ED (818 estimated rows against 6065 for DM_SUMMARY_DAILY); otherwise it will be DM_SUMMARY_DAILY (371 against 818). Since the view on SOURCE_SEARCH_TYPE_DAILY is the last to be evaluated, its cardinality estimate is usually not very important, but the costs of the different access methods are very important and are very sensitive to the row count coming out of the join of the other two tables.
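The ordering rule can be illustrated with a toy enumeration. This is only a sketch of "start from ascending estimated cardinality, then permute"; it does not model the optimizer's real search or its cutoffs. The labels S, D and T are the query aliases, and the estimate for T is held at 1,786,542 in both cases as a simplifying assumption (it is the largest row source either way).

```python
from itertools import permutations

def join_orders(estimated_rows):
    """All join orders, the first one sorted by ascending estimated cardinality."""
    ascending = sorted(estimated_rows, key=estimated_rows.get)
    return list(permutations(ascending))

# D estimated with the mrkt_id histogram (6065) and without it (371)
with_hist = join_orders({"S": 818, "D": 6065, "T": 1_786_542})
without_hist = join_orders({"S": 818, "D": 371, "T": 1_786_542})

print(len(with_hist), "orders; first with histograms:   ", with_hist[0])
print(len(without_hist), "orders; first without histograms:", without_hist[0])
```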
An NDV of 21 for the small table is found in the 10053 trace. It is interesting how Oracle derives this default value, because it is the actual NDV of the other table at partition level. If no histogram is used:
(818-0)*(406-0)/max(21,1) = 15,814
If a histogram is used:
(818-0)*(85727-0)/max(21,1) = 3,339,270
Without histograms: 39.535 -> 40
With histograms: 8348.175 -> 8349 (plan uses 8347)
Fortunately, the result is inflated by the default row count of SRC_BY_SRCH_DREV_MASK_ED, by 818/61 = 13.4 times. Side note: when dynamic sampling was tried as a fix for this issue, it returned the actual count of SRC_BY_SRCH_DREV_MASK_ED, that is, 61. With that count, even with histograms the join cardinality estimate is only 622 (8348.175*61/818), not enough for Oracle to pick the right plan.
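The chain of numbers above can be reproduced as follows. The divisor of 400 is inferred from the figures themselves (15,814 / 39.535 = 400); it would match a 5% selectivity applied to each of the two date-range join predicates (0.05 * 0.05 = 1/400), but the source does not state this, so treat it as an assumption. The function name is illustrative.

```python
import math

def join_cardinality(rows_a, rows_b, ndv_a, ndv_b=1):
    """Equi-join cardinality as on the slide: rows_a * rows_b / max(ndv_a, ndv_b)."""
    return rows_a * rows_b / max(ndv_a, ndv_b)

# ASSUMPTION: 1/400 = 0.05 * 0.05 for the two range predicates on d.datestamp
RANGE_DIVISOR = 400

no_hist = join_cardinality(818, 406, 21) / RANGE_DIVISOR
with_hist = join_cardinality(818, 85727, 21) / RANGE_DIVISOR

# Replace the default 818 rows with the actual 61 (what dynamic sampling returned)
with_hist_actual = with_hist * 61 / 818

print(f"without histograms: {no_hist:.3f} -> {math.ceil(no_hist)}")
print(f"with histograms:    {with_hist:.3f} -> {math.ceil(with_hist)}")
print(f"with actual count:  {with_hist_actual:.1f}")
```

The small differences in the last decimals against the quoted 39.535 and 8348.175 come from the slide truncating the intermediate join cardinalities before dividing.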
FTS Cost
FTS cost formula (CPU costing model):
cost = #SRds + #MRds*mreadtim/sreadtim + #CPUCycles/(cpuspeed*sreadtim)
When using noworkload statistics, as in this case:
MBRC = db_file_multiblock_read_count
sreadtim = ioseektim + db_block_size/iotfrspeed
mreadtim = ioseektim + db_file_multiblock_read_count*db_block_size/iotfrspeed
#SRds: number of single-block reads
#MRds: number of multiblock reads, each of db_file_multiblock_read_count blocks
sreadtim = 10 + 16*1024/4096 = 14
mreadtim = 10 + 16*16*1024/4096 = 74
FTS cost = (0 + (26,243/16)*74/14) + cpu_cost = 8669.5625 + cpu_cost
The plan shows a cost of 8736. The difference comes from the cpu_cost of reading the rows and filtering the result. Because view T is an aggregated complex view, sorting and grouping add a large cost on top, bringing the total cost to 31,796.
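The I/O part of the full table scan cost can be checked with a short script. This is a sketch under the noworkload assumptions listed earlier; 26,243 is taken to be the block count of the partition and 16 the multiblock read count (the value used in the mreadtim arithmetic), since dividing the blocks by 16 is what reproduces 8669.5625. Function names are illustrative.

```python
def noworkload_read_times(ioseektim, iotfrspeed, block_size, mbrc):
    """Single-block and multiblock read times derived from noworkload stats."""
    sreadtim = ioseektim + block_size / iotfrspeed
    mreadtim = ioseektim + mbrc * block_size / iotfrspeed
    return sreadtim, mreadtim

def fts_io_cost(blocks, mbrc, sreadtim, mreadtim, single_reads=0):
    """I/O part of the cost: #SRds + #MRds * mreadtim / sreadtim."""
    multi_reads = blocks / mbrc
    return single_reads + multi_reads * mreadtim / sreadtim

sreadtim, mreadtim = noworkload_read_times(
    ioseektim=10, iotfrspeed=4096, block_size=16 * 1024, mbrc=16
)
io_cost = fts_io_cost(blocks=26_243, mbrc=16, sreadtim=sreadtim, mreadtim=mreadtim)
cpu_part = 8736 - io_cost   # what the plan's 8736 leaves for the CPU component

print(sreadtim, mreadtim, io_cost, cpu_part)
```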
So the final decision: compare 31,796 with rows(S,D)*28/PX and pick the smaller one. With PX = 4, rows(S,D) has to exceed about 4,542 (31,796*4/28) for the correct HASH JOIN to be selected. Here are the estimates for rows(S,D):
1: when D has no partition stats.
40: when D has partition stats, without histograms.
8349: when D has partition stats, with histograms.
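The comparison can be written out as a small decision function. This is a simplified sketch of the rule stated above (nested loops cost taken as rows(S,D)*28/PX against a fixed 31,796 for the hash join path), not the optimizer's full costing; the names are illustrative, and 622 is the estimate obtained when the actual row count of the small table is used.

```python
HASH_PATH_COST = 31_796   # full scan of view T plus sort/group, from the plan
PER_ROW_COST = 28         # per-row cost factor used in the comparison above

def nested_loops_cost(rows_sd, px=4):
    """Cost of probing view T once per row of the (S, D) join, spread over PX."""
    return rows_sd * PER_ROW_COST / px

def picks_hash_join(rows_sd, px=4):
    """Hash join wins only when the nested loops cost exceeds the hash path cost."""
    return nested_loops_cost(rows_sd, px) > HASH_PATH_COST

break_even = HASH_PATH_COST * 4 / PER_ROW_COST   # about 4,542 rows at PX = 4

for rows_sd in (1, 40, 622, 8349):
    plan = "HASH JOIN" if picks_hash_join(rows_sd) else "NESTED LOOPS"
    print(f"rows(S,D) = {rows_sd:>5}: NL cost {nested_loops_cost(rows_sd):>9.1f} -> {plan}")
```

Only the inflated estimate of 8349 clears the break-even point, which is why correcting the small table's row count alone still leaves the nested loops plan in place.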
What if T has no partition stats? What if we have a PX hint on view T, or on the table SOURCE_SEARCH_TYPE_DAILY inside view T? The FTS cost would change to 31,796/PX.