
CBQT DP Distinct Placement

Query Transformation
Query transformation rewrites the original query into a semantically equivalent query at parse time. CBQT, cost-based query transformation, searches for the optimal semantically equivalent query based on cost calculation. Almost all RDBMSs have a query transformation phase. Even Apache Hive has query transformation, although it is simple and rule based. In Oracle, even the simplest query goes through a CBQT phase. The sophisticated CBQT process usually generates more efficient execution plans for very complex queries, but it adds overhead for very simple queries, especially when compared with MySQL.

Side note: ANSI SQL join syntax such as A JOIN B ON (A.C1=B.C2) is not Oracle's native SQL syntax. Oracle treats each such join as a view. For example, A JOIN B ON (A.C1=B.C2) JOIN C ON (B.C3=C.C4) becomes SELECT ... FROM (SELECT ... FROM A, B WHERE A.C1=B.C2) V, C WHERE V.C3=C.C4. View merging is then considered, and CBQT may be applied to each query block.
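The view-style rewrite of a chained ANSI join can be checked on a toy schema. The sketch below is only an illustration: it uses Python's built-in sqlite3 module rather than Oracle, and the tables a, b, c and their rows are made up for the example. It shows that the two query forms return the same rows, not how Oracle performs the rewrite internally.

```python
import sqlite3

# Toy stand-ins for tables A, B, C (hypothetical data, SQLite instead of Oracle).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (c1 INTEGER);
    CREATE TABLE b (c2 INTEGER, c3 INTEGER);
    CREATE TABLE c (c4 INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (1, 10), (2, 20), (2, 30), (9, 40);
    INSERT INTO c VALUES (10), (20), (20), (50);
""")

# Chained ANSI joins, as written by the user.
ansi_form = """
    SELECT a.c1, b.c3, c.c4
    FROM a JOIN b ON (a.c1 = b.c2) JOIN c ON (b.c3 = c.c4)
"""

# The same query with the first join wrapped in an inline view V.
view_form = """
    SELECT v.c1, v.c3, c.c4
    FROM (SELECT a.c1, b.c3 FROM a, b WHERE a.c1 = b.c2) v, c
    WHERE v.c3 = c.c4
"""

ansi_rows = sorted(conn.execute(ansi_form).fetchall())
view_rows = sorted(conn.execute(view_form).fetchall())
print("same result:", ansi_rows == view_rows)
```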

Where to Find CBQT Information?


A 10053 trace file is the best place to learn CBQT terminology and processes. Query plans may contain system-generated views with names like VW_xxx_xxx, which indicate a query transformation at work.

10053 Trace File

Distinct Placement
The following is a simple query with a two-table join and a final SELECT DISTINCT.
select distinct dtpc.targ_profile_id
from udd_dim.dim_targ_prof_content_topic dtpc
join udd_dim.map_sp_id_cont_topic msic
  on msic.cont_topic_id = dtpc.targ_cont_topic_id
where dtpc.data_source_id = 4
  and dtpc.tp_cont_top_include_flag = 1

DP The Explain Plan

The Questions
Why does a two-table join have such a long plan, and why is an internal view generated?
The view VW_DTP_xxx (here VW_DTP_C252A5A8) indicates CBQT distinct placement at work.

Where do the join predicate and column at step 6 come from?


The join predicate is now between the table DIM_TARG_PROF_CONTENT_TOPIC and the view VW_DTP_C252A5A8.

ITEM_1

Why Need Distinct Placement


The purpose is to reduce the input size before the join and the final DISTINCT.
From the plan, we can see that table MAP_SP_ID_CONT_TOPIC has 2,163K rows. With the DP transformation, the internal view returns 508 rows, a reduction in input size by a factor of about 4,258.
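The effect of distinct placement can be reproduced by hand on a small data set. The sketch below is an assumption-laden illustration, not the view text Oracle generates (that text is in the 10053 trace): it uses Python's sqlite3 module, made-up rows, shortened table names dtpc and msic, and a hand-written inline view named vw_dtp with a column item_1 modeled on the names seen in the plan. It compares the original query with a manually rewritten one and counts how many rows each join produces before the final DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dtpc (targ_profile_id INTEGER, targ_cont_topic_id INTEGER);
    CREATE TABLE msic (space_id INTEGER, cont_topic_id INTEGER);
""")
# Hypothetical data: msic is large but has only 3 distinct cont_topic_id values.
conn.executemany("INSERT INTO dtpc VALUES (?, ?)",
                 [(100, 1), (100, 2), (101, 2), (102, 3), (103, 4)])
conn.executemany("INSERT INTO msic VALUES (?, ?)",
                 [(i, i % 3 + 1) for i in range(3000)])

# Join body of the original query: every matching msic row takes part.
original_join = """
    FROM dtpc JOIN msic ON msic.cont_topic_id = dtpc.targ_cont_topic_id
"""
# Hand-written DP-style rewrite: de-duplicate the join key first.
placed_join = """
    FROM dtpc
    JOIN (SELECT DISTINCT cont_topic_id AS item_1 FROM msic) vw_dtp
      ON vw_dtp.item_1 = dtpc.targ_cont_topic_id
"""

def distinct_profiles(join_sql):
    sql = "SELECT DISTINCT dtpc.targ_profile_id " + join_sql
    return sorted(row[0] for row in conn.execute(sql))

def join_row_count(join_sql):
    return conn.execute("SELECT COUNT(*) " + join_sql).fetchone()[0]

original_result = distinct_profiles(original_join)
placed_result = distinct_profiles(placed_join)
original_rows = join_row_count(original_join)
placed_rows = join_row_count(placed_join)

print("same answer:", original_result == placed_result)
print("rows fed to DISTINCT:", original_rows, "vs", placed_rows)
```

Both forms return the same profile ids, but the rewritten join feeds far fewer rows into the final DISTINCT, which is the saving the optimizer is after.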

What happens if no DP transformation?


Table Stats:
MAP_SP_ID_CONT_TOPIC: 2,163,072 rows; column CONT_TOPIC_ID NDV: 508
DIM_TARG_PROF_CONTENT_TOPIC: global stats are not accurate, so partition PART_04 is used: 3,571,698 rows; column TARG_CONT_TOPIC_ID NDV: 193

Without DP, the join cardinality would be 2,163,072 * 3,571,698 / max(508, 193) = 15,208,346,331. This would put a huge burden on the final DISTINCT. Consider what happens if this join is only an intermediate step feeding more table joins or other operations.

With DP, the join cardinality is 508 * 3,571,698 / max(508, 193) = 3,571,698. The plan shows 710K instead, because the 10053 trace file uses a selectivity of 0.000392 rather than 1/508. This needs further investigation; one possibility is a histogram.
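The arithmetic above can be replayed in a few lines. This is a minimal sketch that assumes the simplified equi-join formula used in the text, rows_left * rows_right / max(NDV_left, NDV_right); the function name and constants are chosen for this example, and the real optimizer calculation may include more factors.

```python
def join_cardinality(rows_left, rows_right, ndv_left, ndv_right):
    """Simplified equi-join estimate: |L| * |R| / max(NDV_L, NDV_R)."""
    return rows_left * rows_right / max(ndv_left, ndv_right)

# Table stats quoted in the text.
MSIC_ROWS, MSIC_NDV = 2_163_072, 508   # MAP_SP_ID_CONT_TOPIC
DTPC_ROWS, DTPC_NDV = 3_571_698, 193   # DIM_TARG_PROF_CONTENT_TOPIC, PART_04

# Without DP: the full MAP_SP_ID_CONT_TOPIC table enters the join.
without_dp = join_cardinality(MSIC_ROWS, DTPC_ROWS, MSIC_NDV, DTPC_NDV)

# With DP: only the 508 distinct join-key values enter the join.
with_dp = join_cardinality(MSIC_NDV, DTPC_ROWS, MSIC_NDV, DTPC_NDV)

# Input-size reduction achieved by the internal view.
reduction = MSIC_ROWS / MSIC_NDV

# Cardinality implied by the selectivity reported in the 10053 trace.
implied_by_trace = 0.000392 * MSIC_NDV * DTPC_ROWS

print(f"without DP:        {without_dp:,.0f}")
print(f"with DP:           {with_dp:,.0f}")
print(f"input reduction:   {reduction:,.0f}x")
print(f"trace selectivity: {implied_by_trace:,.0f}")
```

The estimate without DP is on the order of 15 billion rows, while the trace selectivity of 0.000392 yields a little over 700K rows, which is in line with the 710K shown in the plan.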

Query Transformation At Work

The following excerpt from the 10053 trace file shows the query transformation process and the full text of the system-generated inline view VW_DTP_C252A5A8. CBQT also makes 10053 trace files harder to read, because it adds many more combinations on top of access path, join type and join order.

Original Issue
The first (good) query has one more table join to DIM_TARG_PROF_CONTENT_TOPIC. The second (bad) query has two more table joins, one of them to MAP_SP_ID_CONT_TOPIC, on a different column.

The Good Plan


The good plan used DP to achieve a substantial reduction in input cardinality.

The Bad Plan


With one more table in the join, DP needs additional output columns (ITEM_1 and ITEM_2), and there is almost no reduction in input cardinality. As a result, the later join to DIM_TARG_PROF_CONTENT_TOPIC produces a huge join cardinality of 1,132M, which makes the subsequent operations nearly impossible to complete.

The Workaround


The workaround is to do our own query transformation by deferring the join of the two tables with the bad join cardinality to the last stage. Inline views with the NO_MERGE hint prevent Oracle from merging the views, which could lead Oracle back into the dilemma of the bad plan. If DISTINCT is not used in the inline view, Oracle will push DP into the first inline view.

SELECT DISTINCT v1.adv_order_line_id AS adv_order_line_id,
       v2.site_id AS pred_site_id
FROM (SELECT /*+ NO_MERGE */ DISTINCT daol.adv_order_line_id, dtpc.targ_cont_topic_id
      FROM udd_dim.dim_advertiser_order_line daol
      JOIN udd_dim.dim_targ_prof_content_topic dtpc
        ON (daol.targ_profile_id = dtpc.targ_profile_id)
      WHERE daol.data_source_id = 4
        AND daol.apt_placement_type_code = 'AIC'
        AND daol.targ_section_name_text = 'N/A'
        AND daol.targ_cont_topic_name_text NOT IN ('N/A', 'abt')
        AND dtpc.data_source_id = 4
        AND dtpc.tp_cont_top_include_flag = 1
     ) v1
JOIN (SELECT /*+ NO_MERGE */ DISTINCT msic.cont_topic_id, msis.site_id
      FROM udd_dim.map_sp_id_cont_topic msic
      JOIN udd_dim.map_sp_id_site msis
        ON (msis.space_id = msic.space_id)
     ) v2
  ON v1.targ_cont_topic_id = v2.cont_topic_id

The Plan With Workaround

While there is not much reduction in input cardinality, no step has a cardinality estimate that reaches 100M or 1B. The response time went from never returning to minutes.
