
Oracle SQL Tuning - A Framework

Prepared by Saurabh Kumar Mishra, Performance Engineering & Enhancement offerings (PE2), Infosys Technologies Limited (NASDAQ: INFY), saurabhkumar_mishra@infosys.com

This paper describes a framework for tuning Oracle queries that leads to maximum improvement in execution time. It includes practical examples that show how to use the framework.

Introduction
Modern RDBMS systems, such as OLTP databases that process huge numbers of records per unit of time, use complex logic that leads to heavy usage of aggregate functions, UNION, UNION ALL, MINUS, EXISTS, GROUP BY views, and so on. The techniques developers and DBAs commonly use while tuning queries are generating an explain plan using the plan table or generating trace files for a session, but analyzing what is affecting query performance before starting to tune is the most important step. This paper provides a step-wise approach for tuning SQL and PL/SQL. SQL has been used extensively in RDBMS systems over a long period; this paper discusses Oracle specifically, with a generalized approach. In Oracle, SQL execution time depends entirely on components such as the query optimizer and the query execution engine.

Query Optimizer
In a typical query optimizer, the parsed query arrives from the parser as input to the optimizer. The next step, the task of the query transformer, is the generation of potential execution plans based on the available access paths and hints. The estimator then uses dictionary tables like dba_tables, dba_tab_columns, dba_indexes, and dba_tab_partitions to generate the cost of each plan. The dictionary tables play an important role here, holding statistics such as the data distribution and storage characteristics of the tables, indexes, and partitions accessed by the SQL. Once the costs are calculated, the plan generator selects the lowest-cost plan and hands it to the row source generator for execution. So for the execution of any query, the key role is played by the statistics held in the dictionary tables/views, which can be stated as the first prerequisite for any query tuning activity.


Framework
The framework (shown originally as a flowchart) consists of the following steps:

1) Collect the timing for the existing untuned query.
2) Collect statistics in dictionary tables; evaluate the statistics captured.
3) Collect SGA sizing details, along with hit ratios (database buffer cache, library cache).
4) Generate the explain plan.
5) Analyse the explain plan.
6) Object-level design evaluation: evaluate predicates, indexing, and partitioned tables.
7) Query-level design evaluation: evaluate joins, nested queries, usage of functions, and usage of hints.
8) Apply the recommendations; perform functional testing.
9) Collect the improvement for the tuned query.


The framework guides you right from the start of query tuning, spreading across and touching the database-level tuning aspects relevant to SQL tuning. The top-to-bottom approach in this framework analyzes each aspect involved in query tuning, which results in high-performing queries as the output.


Descriptions
Each step in the framework plays a significant role in tuning queries, but the first and the last steps, i.e. collection of the before and after results (the response time along with the record count each query generates), play a vital role. Along with this, recording the load (on server and database) under which you execute the queries and gather the results is also important. Some tuning experts also use cost as the baseline for tuning queries and analyzing the improvement, which can be a little distracting when you are looking at response time as the major factor in tuning. SQL tuning can generally be divided into:
o Tuning for best response time.
o Tuning for best throughput (i.e. less usage of resources in the database and on the server).
What is considered best tuning practice, however, is that the query should give the best response time using the least resources, so you need to keep both cost and response time in mind while tuning SQL queries. The next step, before you actually begin generating the explain plan, is the collection and evaluation of the statistics available for the objects (tables and indexes).
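For the first step, a minimal sketch of capturing the baseline timing and row count, assuming SQL*Plus as the client (the table name is illustrative):

   -- SET TIMING ON makes SQL*Plus print an "Elapsed:" line per statement.
   SET TIMING ON
   SET AUTOTRACE ON STATISTICS   -- optional: logical/physical read counters

   -- Record the row count of the driving table for the before/after log:
   SELECT COUNT(*) FROM i_claims;

   -- Then run the untuned query itself and note its elapsed time and row count.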

Statistics Collection and Analysis


DBMS_STATS and ANALYZE are the two tools available for collecting table- and index-level statistics in Oracle, which the optimizer uses to choose the best execution plan. Once the statistics are generated, we need to evaluate them. Parameters like estimate_percent, block_sample, method_opt, and cascade address statistics accuracy and gathering performance in DBMS_STATS, which is one big advantage over the old ANALYZE command. Collection using DBMS_STATS can be done with the following syntax:

   EXECUTE dbms_stats.gather_table_stats(ownname => 'ABC', tabname => 'ABC_TAB',
      estimate_percent => 1, block_sample => TRUE,
      method_opt => 'FOR ALL COLUMNS SIZE 1', cascade => TRUE);

Here ownname is the table owner and tabname is the table for which we are going to collect statistics, while estimate_percent, block_sample, method_opt, and cascade are the parameters that define the accuracy of the statistics. Terry Sutton [1] gives the recommended values below, which again need to be verified for each database:
o Estimate_percent - Balancing the performance of the gathering process against the accuracy of the statistics puts estimate_percent between 5 and 20. Five percent will probably work for most cases, but 20% may be advisable if really accurate statistics are needed.
o Gathering statistics separately for indexes is faster than using the cascade=>true option while gathering table statistics.
o Block_sample=>true doesn't appear to appreciably speed up statistics gathering, while at the same time delivering somewhat less accurate statistics.
o Using the SKEWONLY option to size histograms is inadvisable.
Analysis of the results generated by DBMS_STATS can be done by examining the values captured in dictionary tables like dba_tab_columns, dba_indexes, and dba_ind_columns; the columns which play a significant role are num_rows,


num_distinct, and sample_size for tables, and distinct_keys and sample_size for indexes. I will leave the analysis part to you, using the best recommendations suggested above.
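As a starting point for that evaluation, a small sketch of the dictionary queries involved (the owner and table names are illustrative):

   -- Table-level statistics: row count and sample size used for gathering
   SELECT table_name, num_rows, sample_size, last_analyzed
     FROM dba_tables
    WHERE owner = 'ABC' AND table_name = 'ABC_TAB';

   -- Column-level statistics: cardinality per column
   SELECT column_name, num_distinct, num_nulls, sample_size
     FROM dba_tab_columns
    WHERE owner = 'ABC' AND table_name = 'ABC_TAB';

   -- Index-level statistics
   SELECT index_name, distinct_keys, sample_size, last_analyzed
     FROM dba_indexes
    WHERE owner = 'ABC' AND table_name = 'ABC_TAB';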

Collection of SGA Details


The SGA (System Global Area) is the most important part of the Oracle database; it plays a vital role in storing execution plans, caching data blocks, and so on. The two most important features here are:
o Storage of execution plans in the library cache in the shared pool - library cache hit ratio.
o Caching of data blocks in the database buffer cache - buffer cache hit ratio.
These hit ratio values should be collected before we start the tuning process; the standards are:
o A buffer cache hit ratio from 95 to 98 percent is considered healthy.
o The library cache hit ratio should be at or above 95 percent.
The hit ratios can be calculated from Statspack/AWR report output or by continuous monitoring of the system; if you see any deviation in these, you should contact the DBA about changing the SGA database parameters, using the advisors available.
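For a quick check outside Statspack/AWR, a sketch of the classic hit ratio calculations against the v$ views (hit ratios are rough indicators only, so treat the output as a baseline, not a verdict):

   -- Buffer cache hit ratio: 1 - (physical reads / logical reads)
   SELECT ROUND((1 - phy.value / (db.value + con.value)) * 100, 2)
            AS buffer_cache_hit_pct
     FROM v$sysstat phy, v$sysstat db, v$sysstat con
    WHERE phy.name = 'physical reads'
      AND db.name  = 'db block gets'
      AND con.name = 'consistent gets';

   -- Library cache hit ratio: pin hits / pins
   SELECT ROUND(SUM(pinhits) / SUM(pins) * 100, 2) AS library_cache_hit_pct
     FROM v$librarycache;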

Generation of Explain Plan


The explain plan is the most important thing that needs to be analyzed during query tuning. It can be generated using several tools:
1) Using the plan table created by UTLXPLAN.SQL, which can be queried to get the explain plan of a query:
a. explain plan for your-precious-sql-statement;
b. Displaying the execution plan:
   select substr(lpad(' ', level-1) || operation || ' (' || options || ')', 1, 30) "Operation",
          object_name "Object"
     from plan_table
    start with id = 0
   connect by prior id = parent_id;
2) Using the Quest tool TOAD for Oracle (or similar tools) and selecting the relevant content, like access predicates, filter predicates, CPU cost, time, optimizer_mode, etc.
3) Using TKPROF to generate a report from the trace file of any session from which queries are executing; in the TKPROF report you can analyze I/O issues and the difference between execution and parse times.
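A simpler alternative to the hierarchical query in option 1, assuming Oracle 9.2 or later, is the DBMS_XPLAN package (the statement shown is illustrative):

   -- Populate PLAN_TABLE, then format it with DBMS_XPLAN
   EXPLAIN PLAN FOR
   SELECT * FROM i_claims WHERE n_a = :n_a;

   SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);   -- formatted plan, with predicates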

Analyse Explain Plan


Once generated, the explain plan needs to be analyzed based on parameters like full table scans, full index scans, the types of joins used (hash, nested loops, sort-merge), the total cost of the query, the cost increment pattern at each stage of a join, and the table access paths used. Broadly, these parameters fall into two classes:
a) Object level: table- and index-level parameters, like the access and filter predicates used while joining tables and filtering results. Usage of best practices needs to be checked here, and if they are not used, a recommendation has to be given or applied.
b) Query level: the types of joins, the cost (CPU, I/O, overall) and its increment pattern, and the execution time.


Collecting these few points initially will be of great help while tuning in later stages. The paper now uses a complex example to explain each of these parameters. Query 1 is a module query from a batch processing system, the first step of which collects all the matching values from the I_claims (partitioned) and M_n_audit tables based on filters:
SELECT *
  FROM (SELECT M_ID, N_A, L_NAME, D_R, E_DATE, T_DATE
          FROM (SELECT MBNDCA.M_ID, MBNDCA.N_A, MBNDCA.L_NAME, MBNDCA.D_R,
                       MBNDCA.E_DATE, MBNDCA.T_DATE, R_FLAG,
                       RANK () OVER (PARTITION BY MBNDCA.M_ID, MBNDCA.N_A
                                     ORDER BY MBNDCA.VERSION_N DESC,
                                              MBNDCA.M_NA_VER_ID DESC) R
                  FROM M_N_AUDIT MBNDCA
                 WHERE UPPER (M_ID) = UPPER (:V_M_ID)
                   AND (20071001 BETWEEN MBNDCA.E_DATE AND MBNDCA.T_DATE
                        OR MBNDCA.E_DATE BETWEEN 20071001 AND 20071231)
                   AND MBNDCA.LAST_UPDT_DATE <= SYSTIMESTAMP)
         WHERE R = 1
           AND R_FLAG = 'A') MBNDCA,
       I_CLAIMS IC
 WHERE UPPER (MBNDCA.N_A) = UPPER (IC.N_A)
   AND IC.TR_STATUS != :TR_STATUS_R
   AND (DECODE ('CL', :C_TYPE, IC.C_ID, NULL) NOT LIKE :C_CUSTOMER
        OR DECODE ('PDP', :C_TYPE_PDP, IC.C_ID, NULL) = :PDP_CUST
        OR DECODE ('M', :C_MED, IC.C_ID, NULL)
               BETWEEN :PDP_CUST AND :MAPD_CUST_END
        OR DECODE (:V_C_TYPE_CODE, :CTYPE_MAPD, IC.C_ID, NULL)
               BETWEEN :MAPD_CUST_START AND :MAPD_CUST_END);

So let's generate the explain plan using TOAD for Oracle.


Plan
SELECT STATEMENT FIRST_ROWS  Cost: 43 K  Bytes: 1 M  Cardinality: 8 K  CPU Cost: 8 G  IO Cost: 41 K  Time: 511
  6 HASH JOIN
      Access Predicates: UPPER ("N_A") = UPPER ("IC"."N_A")
      Cost: 43 K  Bytes: 1 M  Cardinality: 8 K  CPU Cost: 8 G  IO Cost: 41 K  Time: 511
    3 VIEW
        Filter Predicates: "R"=1 AND "R_FLAG"='A'
        Cost: 44  Bytes: 160  Cardinality: 4  CPU Cost: 14 M  IO Cost: 41  Time: 1
      2 WINDOW SORT PUSHED RANK
          Filter Predicates: RANK () OVER (PARTITION BY "MBNDCA"."M_ID", "MBNDCA"."N_A"
            ORDER BY INTERNAL_FUNCTION ("MBNDCA"."VERSION_N") DESC,
                     INTERNAL_FUNCTION ("MBNDCA"."M_NA_VER_ID") DESC) <= 1
          Cost: 44  Bytes: 284  Cardinality: 4  CPU Cost: 14 M  IO Cost: 41  Time: 1
        1 TABLE ACCESS FULL TABLE M_N_AUDIT
            Filter Predicates: UPPER (TO_CHAR ("M_ID")) = UPPER (:V_M_ID)
              AND ("MBNDCA"."E_DATE" <= 20071001 AND "MBNDCA"."T_DATE" >= 20071001
                   OR "MBNDCA"."E_DATE" >= 20071001 AND "MBNDCA"."E_DATE" <= 20071231)
              AND SYS_EXTRACT_UTC (INTERNAL_FUNCTION ("MBNDCA"."LAST_UPDT_DATE"))
                    <= SYS_EXTRACT_UTC (SYSTIMESTAMP(6))
            Cost: 43  Bytes: 284  Cardinality: 4  CPU Cost: 8 M  IO Cost: 41  Time: 1
    5 PARTITION RANGE ALL
        Cost: 43 K  Bytes: 30 M  Cardinality: 193 K  CPU Cost: 8 G  IO Cost: 41 K  Time: 511
        Partition #: 5  Partitions accessed: #1 - #6
      4 TABLE ACCESS FULL TABLE I_CLAIMS
          Filter Predicates: "IC"."TR_STATUS" <> :TR_STATUS_R
            AND (DECODE ('CL', :C_TYPE, "IC"."C_ID", NULL) NOT LIKE :C_CUSTOMER
                 OR DECODE ('PDP', :C_TYPE_PDP, "IC"."C_ID", NULL) = :PDP_CUST
                 OR DECODE ('M', :C_MED, "IC"."C_ID", NULL)
                      BETWEEN :PDP_CUST AND :MAPD_CUST_END
                 OR DECODE (:V_C_TYPE_CODE, :CTYPE_MAPD, "IC"."C_ID", NULL)
                      BETWEEN :MAPD_CUST_START AND :MAPD_CUST_END)
          Cost: 43 K  Bytes: 30 M  Cardinality: 193 K  CPU Cost: 8 G  IO Cost: 41 K  Time: 511
          Partition #: 5  Partitions accessed: #1 - #6


Collection of Parameters
1) Object level:
o Indexes used: none.
o Tables used (along with record count details):
  o I_claims - 1 million rows in each partition.
  o M_n_audit - 10K records.
o Full table scans: I_claims, M_n_audit.
o Partitions accessed: all partitions (#1-#6), although the data lies in partition #5.
o Functions used on join columns: UPPER.
o Partition key utilization: no (I_claims is range-partitioned on date_submitted).
o Access predicates:
  o UPPER (MBNDCA.N_A) = UPPER (IC.N_A)
o Filter predicates:
  o Step (2) in the explain plan, for the view MBNDCA.
  o Step (4) in the explain plan, for the table I_claims.
2) Query level:
a. The cost is increased because all partitions of the I_claims table are accessed in the query.
b. A hash join is applied due to the presence of a big table joined with a table with comparatively fewer records, which in turn results in full table scans on both tables.
c. Total cost: 43K. Actual execution time: 10 minutes; predicted time: 511 seconds.
This completes our analysis of the demo explain plan, which gives us most of the loopholes that need to be fixed to tune this query and bring down the response time. The analysis above will serve as input while giving recommendations for tuning any such query.

Recommendations
So let's start acting on the results of the explain plan analysis; we should again do it in two steps, one at the object level and the other at the query level. For the sample query above:

Object Level Recommendations:


o (Evaluate Indexing) No indexes are being utilized, which results in full table scans and hash joins, which are costlier; we should try to find the needed index for each table in the query.
  o For I_claims and M_N_AUDIT, the access predicate column used is N_A, so an index should exist on this column in both tables.
o (Evaluate Partitioned Tables) I_claims has a full partition access because the partitioning key, date_submitted, is not used in the query. This is one of the most common mistakes developers make while using partitioned tables in their queries. The partition key should be utilized by mentioning the range in the query.

Query Level Recommendations:

o (Evaluate Usage of Functions) Because UPPER is used, the optimizer will not use a normal index even if one is present; it will expect a function-based index. So, based on the functionality and the data in the tables, the usage of UPPER should be evaluated and an index created accordingly, which in turn removes the full table scan on both tables, as sketched below.
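A minimal sketch of such function-based indexes (the index names are illustrative, and whether UPPER is genuinely needed depends on the data):

   -- Function-based indexes so UPPER(N_A) predicates can use an index
   CREATE INDEX ix_ic_upper_na  ON i_claims  (UPPER(n_a));
   CREATE INDEX ix_mna_upper_na ON m_n_audit (UPPER(n_a));

   -- The optimizer also needs statistics on the new indexes, e.g.:
   EXECUTE dbms_stats.gather_index_stats(ownname => 'ABC', indname => 'IX_IC_UPPER_NA');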


o (Evaluate Joins) The hash join should be converted to nested loops, or vice versa, by evaluating the benefit in response time.
o (Evaluate Hints) Hints can be used to test the usage of indexes if the optimizer does not pick up a newly created index (see the sketch after this list); but as per regular practice, hints should be removed when deploying code to production/UAT, as statistics should be well generated in production/UAT, which will allow the optimizer to use the proper indexes.
o (Evaluate Nested Queries) On the table M_N_AUDIT a nested query exists, based on RANK over a partition, which may be required functionality; options have to be found to convert this functionality into a direct join of this table with I_claims in the FROM clause.
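For the hint evaluation step, a sketch of forcing the candidate index during testing (the index name matches the hypothetical one created above; the hint would be removed before deployment):

   -- Test whether the new function-based index improves the plan
   SELECT /*+ INDEX(ic ix_ic_upper_na) */
          ic.c_id, ic.n_a, ic.tr_status
     FROM i_claims ic
    WHERE UPPER(ic.n_a) = UPPER(:n_a)
      AND ic.tr_status != :tr_status_r;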

We can get to these recommendations by using the guidelines/best practices for writing Oracle SQL, which can result in high-performing queries. Some of the general best practices for writing SQL are:
o Semi-joins (EXISTS and IN). Roger Schrag [2] says: if the main body of your query is highly selective, then an EXISTS clause might be more appropriate to semi-join to the target table. However, if the main body of your query is not so selective and the subquery (the target of the semi-join) is more selective, then an IN clause might be more appropriate. (A small sketch contrasting the two appears after the cursor example below.)
o Anti-joins (NOT EXISTS and NOT IN), with or without nulls. Roger Schrag [2] says: first consider how null values should be handled when deciding whether to use NOT EXISTS or NOT IN. If you need the special semantics provided by NOT IN, then your decision has been made. Otherwise, you should next consider whether or not the query might benefit from a merge or hash anti-join; if so, then you probably ought to choose NOT IN. If you decide to go with NOT IN but do not want the expression to evaluate to false if a null value is found, then make sure the subquery cannot return a null value. If there is no chance that the query will benefit from a merge or hash anti-join and the special semantics of NOT IN are not desired, then you might want to select the NOT EXISTS construct, so that there is a better chance Oracle will perform an efficient filter instead of an inefficient nested loops anti-join.
o Hervé Deschamps [3] gives the following:
  o Avoid NOT IN or NOT = on indexed columns; they prevent the optimizer from using indexes. Use WHERE amount > 0 instead of WHERE amount != 0.
  o Avoid writing WHERE project_category IS NOT NULL. Nulls can prevent the optimizer from using an index.
  o Consider using IN or UNION in place of OR on indexed columns. ORs on indexed columns cause the optimizer to perform a full table scan.
  o Avoid calculations on indexed columns. Write WHERE approved_amt > 26000/3 instead of WHERE approved_amt/3 > 26000.
  o Consider replacing outer joins on indexed columns with UNIONs. A nested loop outer join takes more time than a nested loop unioned with another table access by index.
  o WHERE EXISTS subqueries can be better than a join if you can drastically reduce the number of records in the driving query; otherwise, a join is better.
  o WHERE EXISTS can be better than a join when driving from parent records and you want to make sure that at least one child exists. The optimizer knows to bail out as soon as it finds one record; a join would get all the records and then distinct them!
o Evaluate views. If a view joins three extra tables to retrieve data that you do not need, don't use the view!


  When joining two views that themselves select from other views, check that the two views you are using do not join the same tables! Avoid multiple layers of views; for example, look for queries based on views that are themselves views. Encapsulation may be desirable from a development point of view, but from a performance point of view you lose control and understanding of exactly how much load your query will generate for the system. Look for tables/views that add no value to the query, and try to remove table joins by getting the data from another table already in the join.
o Remove unnecessary SQL overheads. Try to reduce the number of joins in a cursor's query by looking at the number of rows you need in the output. For example, if the EMP table has 100,000 rows and you are joining it with the DEPT table just to get 2,000-odd rows out of a GROUP BY, the join can be avoided, as shown below.
Example:

   DECLARE
      CURSOR c1 IS
         SELECT e.deptno, e.category, d.description, COUNT(*) cnt
           FROM dept d, emp e
          WHERE d.deptno = e.deptno
          GROUP BY e.deptno, e.category, d.description;
   BEGIN
      FOR xx IN c1 LOOP
         NULL;  -- process each row
      END LOOP;
   END;

Alternative to the above code:

   DECLARE
      xdept NUMBER := -9999;
      xdesc VARCHAR2 (60);
      CURSOR c1 IS
         SELECT e.deptno, e.category, COUNT(*) cnt
           FROM emp e
          GROUP BY e.deptno, e.category
          ORDER BY e.deptno;          -- ordered so each dept is looked up once
      CURSOR c2 IS
         SELECT d.description FROM dept d WHERE d.deptno = xdept;
   BEGIN
      FOR xx IN c1 LOOP
         IF xdept != xx.deptno THEN   -- dept changed: fetch its description once
            xdept := xx.deptno;
            OPEN c2;
            FETCH c2 INTO xdesc;
            CLOSE c2;
         END IF;
         NULL;  -- process each row
      END LOOP;
   END;

Here DEPT is accessed only once per distinct department (say 12 of them) instead of being joined against all 100,000 EMP rows.
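As referenced in the semi-join guideline above, a minimal sketch contrasting EXISTS and IN (the predicates are illustrative):

   -- Selective main query: EXISTS probes DEPT once per qualifying EMP row
   SELECT e.empno, e.ename
     FROM emp e
    WHERE e.hiredate > SYSDATE - 30         -- highly selective driver
      AND EXISTS (SELECT 1 FROM dept d
                   WHERE d.deptno = e.deptno AND d.loc = 'DALLAS');

   -- Selective subquery: IN lets Oracle drive from the small DEPT result set
   SELECT e.empno, e.ename
     FROM emp e
    WHERE e.deptno IN (SELECT d.deptno FROM dept d WHERE d.loc = 'DALLAS');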

General guidelines while coding, from Hervé Deschamps [3]:
1) Understand the data. Look around the table structures and data; get a feel for the data model and how to navigate it.
2) Do not code large, complex PL/SQL blocks; break them down into smaller, simpler, self-contained blocks.
3) While writing SQL, check whether the columns referenced in queries have indexes. These columns can be the select-list columns and any required join or sort columns.
4) Consider adding small, frequently accessed (but not frequently updated) columns to an existing index. This will enable some queries to work only with the index, not the table.
5) The usage of IOTs and index clusters should be checked.
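A small sketch of guideline 4 (the index and column names are illustrative):

   -- An index on (deptno) alone would still require a table access; adding
   -- the small, frequently read CATEGORY column makes the index covering:
   CREATE INDEX ix_emp_deptno_cat ON emp (deptno, category);

   SELECT category          -- both columns are in ix_emp_deptno_cat,
     FROM emp               -- so no table access is needed
    WHERE deptno = 10;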


You can use and add to these general guidelines while tuning the queries of any application; as the application varies, different aspects of SQL (including the new features available with 10g and 11g) will be utilized. This completes the query tuning recommendations. After applying the recommendations to the application, two final steps should occur:
o Functional testing of the changes made after applying the recommendations.
o Analysis of the query improvement gained, which includes analysing the explain plan again (w.r.t. cost and response time) and checking whether the needed overall application improvement was gained or not. If not, the tuning cycle should start again from generating the explain plan.
Once all the above steps are done, a query can be said to be TUNED.

Conclusion
A query tuning exercise does not only involve tuning the identified queries; it is also important to execute all the steps mentioned in this paper to get the maximum improvement in response time. The framework described above is already used by many DBAs and SQL tuning specialists, but an understanding of the underlying engineering framework is developed here, which is necessary for making an effective contribution to the area of query optimization.

References
1. "What's Up with DBMS_STATS?" Terry Sutton, Database Specialists, Inc.
2. "Speeding Up Queries with Semi-Joins and Anti-Joins: How Oracle Evaluates EXISTS, NOT EXISTS, IN, and NOT IN." Roger Schrag, Database Specialists, Inc.
3. Hervé Deschamps, http://www.iherve.com/oracle/tune100.htm

Further Reading
This paper can be extended to include all best practices for SQL and PL/SQL coding, along with PL/SQL tuning. Optimizer behaviour can also be studied to evaluate different sets of queries based on database parameter settings.

