
As of Oracle9i, it is recommended that you use the package dbms_stats, rather than the SQL statement ANALYZE, to gather and manage object statistics.

In fact, not only does the package dbms_stats provide many more features, but in some situations it provides better statistics as well. For example, the SQL statement ANALYZE provides less control over the gathering of statistics, it does not support external tables, and for partitioned objects it gathers statistics only for each segment and estimates the statistics at the object (table or index) level.

By default, the package dbms_stats modifies the data dictionary. Nevertheless, with most of its procedures and functions, it is also possible to work on a user-defined table stored outside the data dictionary. This table is called the backup table. Since managing statistics means much more than simply gathering them, the package dbms_stats has the following features:
- Gathering statistics and storing them either in the data dictionary or in a backup table
- Locking and unlocking statistics stored in the data dictionary
- Restoring statistics in the data dictionary
- Deleting statistics stored in the data dictionary or a backup table
- Exporting the statistics from the data dictionary to a backup table
- Importing the statistics from a backup table to the data dictionary
- Getting (extracting) statistics stored in the data dictionary or a backup table
- Setting (modifying) statistics stored in the data dictionary or a backup table

Depending on the granularity and the operation you want to execute, the package dbms_stats provides different procedures and functions. For example, if you want to operate on a single schema, the package dbms_stats provides gather_schema_stats, delete_schema_stats, lock_schema_stats, unlock_schema_stats, restore_schema_stats, export_schema_stats, and import_schema_stats.

Feature         Database  Dictionary*  Schema  Table**  Index**  System
--------------  --------  -----------  ------  -------  -------  ------
Gather/delete   X         X            X       X        X        X
Lock/unlock*                           X       X
Restore*        X         X            X       X                 X
Export/import   X         X            X       X        X        X
Get/set                                        X        X        X

* Available as of Oracle Database 10g.
** For partitioned objects, it is possible to limit the processing to a single partition.

System Statistics

The query optimizer used to base its cost estimations on the number of physical reads needed to execute SQL statements. This method is known as the I/O cost model. The main drawback of this method is that single-block reads and multiblock reads are considered equally costly. Consequently, multiblock read operations, such as full table scans, are artificially favored. Up to Oracle8i, especially in OLTP systems, the initialization parameters optimizer_index_caching and optimizer_index_cost_adj were used to solve this problem. In fact, their default values used to be appropriate only for reporting systems and data warehouses. As of Oracle9i, a new costing method, known as the CPU cost model, is available to address this flaw. To use the CPU cost model, additional information about the performance of the system where the database engine runs, called system statistics, has to be provided to the query optimizer. Essentially, system statistics supply the following information:
- Performance of the I/O subsystem
- Performance of the CPU

Despite its name, the CPU cost model takes into consideration the cost of physical reads as well. But, instead of basing the I/O costs on the number of physical reads only, the performance of the I/O subsystem is also considered. In Oracle9i, no system statistics are available by default. This means that, by default, the I/O cost model is used. As of Oracle Database 10g, a default set of system statistics is always available. As a result, by default, the CPU cost model is used. Actually, as of Oracle Database 10g, the only way to use the I/O cost model is to specify the hint no_cpu_costing at the SQL statement level (or to set the undocumented initialization parameter _optimizer_cost_model to io). In all other cases, the query optimizer uses the CPU cost model.

There are two kinds of system statistics, noworkload statistics and workload statistics. The main difference between the two is the method used to measure the performance of the I/O subsystem. While the former runs a synthetic benchmark, the latter uses an application benchmark.

An application benchmark, also called a real benchmark, is based on the workload produced by the normal operation of a real application. Although it usually provides very good information about the real performance of the system running it, because of its nature, it is not always possible to apply it in a controlled manner.

A synthetic benchmark is a workload produced by a program that does no real work.
The main idea is that it should simulate (model) an application workload by executing similar operations. Although it can easily be applied in a controlled manner, it usually does not produce performance figures as representative as those of an application benchmark. Nevertheless, it can be useful for comparing different systems.

Data Dictionary

System statistics are stored in the data dictionary table aux_stats$. There are up to three sets of rows, differentiated by the following values of the column sname:

SYSSTATS_INFO is the set containing the status of system statistics and when they were gathered. If they were correctly gathered, the status is set to COMPLETED. If there is a problem during the gathering of statistics, the status is set to BADSTATS, in which case the system statistics are not used by the query optimizer. Two more values may be seen during the gathering of workload statistics: MANUALGATHERING and AUTOGATHERING. In addition, up to Oracle9i, the status is set to NOWORKLOAD when noworkload statistics were gathered.

SQL> SELECT pname, pval1, pval2 FROM sys.aux_stats$ WHERE sname = 'SYSSTATS_INFO';

PNAME                PVAL1 PVAL2
--------------- ---------- --------------------
STATUS                     COMPLETED
DSTART                     02-07-2006 22:52
DSTOP                      02-07-2006 22:52
FLAGS                    1

SYSSTATS_MAIN is the set containing the system statistics themselves. Detailed information about them is provided in the next two sections.

SQL> SELECT pname, pval1 FROM sys.aux_stats$ WHERE sname = 'SYSSTATS_MAIN';

PNAME                  PVAL1
--------------- ------------
CPUSPEEDNW            1617.6
IOSEEKTIM               10.0
IOTFRSPEED            4096.0
SREADTIM                 1.3
MREADTIM                 7.8
CPUSPEED              1620.0
MBRC                     7.0
MAXTHR           473982976.0
SLAVETHR           1781760.0

SYSSTATS_TEMP is the set containing values used for the computation of system statistics. It is available only while gathering workload statistics.

Since a single set of statistics exists for a single database, all instances of a RAC system use the same system statistics. Therefore, if the nodes are not equally sized or loaded, it must be carefully decided on which node the system statistics are to be gathered.

System statistics are gathered with the procedure gather_system_stats in the package dbms_stats. By default, the permission to execute it is granted to public. As a result, every user can gather system statistics. Nevertheless, to change the system statistics stored in the data dictionary, the role gather_system_statistics, or direct grants on the data dictionary table aux_stats$, are needed. By default, the role gather_system_statistics is provided through the role dba.

Noworkload Statistics

As mentioned earlier, the database engine supports two types of system statistics: noworkload statistics and workload statistics. As of Oracle Database 10g, noworkload statistics are always available. If you explicitly delete them, they are automatically gathered during the next database start-up. In Oracle9i, even if they are gathered, no statistics are stored in the data dictionary. Only the column status in aux_stats$ is set to NOWORKLOAD.

You gather noworkload statistics on an idle system because the database engine uses a synthetic benchmark to generate the load used to measure the performance of the system. To measure the CPU speed, most likely some kind of calibrating operation is executed in a loop. To measure the I/O performance, some reads of different sizes are performed on several datafiles of the database.

To gather noworkload statistics, you set the parameter gathering_mode of the procedure gather_system_stats to noworkload, as shown in the following example:

SQL> execute dbms_stats.gather_system_stats(gathering_mode => 'noworkload')

Noworkload Statistics Stored in the Data Dictionary

Name        Description
----------  -----------
CPUSPEEDNW  The number of operations per second (in millions) that one CPU is able to process
IOSEEKTIM   Average time (in milliseconds) needed to locate data on the disk. The default value is 10.
IOTFRSPEED  Average number of bytes per millisecond that can be transferred from the disk. The default value is 4,096.
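From IOSEEKTIM and IOTFRSPEED, single-block and multiblock read times can be synthesized as sreadtim = ioseektim + db_block_size/iotfrspeed and mreadtim = ioseektim + mbrc * db_block_size/iotfrspeed. The Python sketch below only evaluates these two formulas to show the orders of magnitude involved with the default values. The 8 KB block size and the multiblock read count of 8 blocks are assumptions chosen for illustration; they are not taken from the text.

```python
def noworkload_read_times(ioseektim_ms, iotfrspeed_bytes_per_ms, db_block_size, mbrc):
    """Return (sreadtim, mreadtim) in milliseconds from noworkload statistics.

    sreadtim = ioseektim + db_block_size / iotfrspeed
    mreadtim = ioseektim + mbrc * db_block_size / iotfrspeed
    """
    # One seek plus the transfer time of a single block
    sreadtim = ioseektim_ms + db_block_size / iotfrspeed_bytes_per_ms
    # One seek plus the transfer time of mbrc blocks
    mreadtim = ioseektim_ms + mbrc * db_block_size / iotfrspeed_bytes_per_ms
    return sreadtim, mreadtim


if __name__ == "__main__":
    # Default IOSEEKTIM (10 ms) and IOTFRSPEED (4,096 bytes/ms); the 8 KB block
    # size and the multiblock read count of 8 are assumptions for illustration.
    sread, mread = noworkload_read_times(10, 4096, 8192, 8)
    print(f"sreadtim = {sread} ms, mreadtim = {mread} ms")
```

With these inputs the seek time dominates a single-block read, whereas a multiblock read is mostly transfer time, which is why the two read times differ.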

Workload Statistics

Workload statistics are available only when explicitly gathered. To gather them, you cannot use an idle system because the database engine has to take advantage of the regular database load to measure the performance of the I/O subsystem. On the other hand, the same method as for noworkload statistics is used to measure the speed of the CPU.

Workload statistics are gathered in four steps:

1. A snapshot of several performance figures is taken and stored in the data dictionary table aux_stats$ (for these rows, the column sname is set to SYSSTATS_TEMP). This step is carried out by setting the parameter gathering_mode of the procedure gather_system_stats to start, as shown in the following command:

SQL> execute dbms_stats.gather_system_stats(gathering_mode => 'start')

2. The database engine does not control the database load. Consequently, you have to wait long enough to cover a representative load before taking another snapshot. It is difficult to provide general advice about this waiting time, but it is common to wait at least 30 minutes.

3. A second snapshot is taken. This step is carried out by setting the parameter gathering_mode of the procedure gather_system_stats to stop, as shown in the following command:

SQL> execute dbms_stats.gather_system_stats(gathering_mode => 'stop')

4. Then, based on the performance statistics of the two snapshots, the system statistics are computed. If one of the I/O statistics cannot be computed, it is set to NULL (as of Oracle Database 10g) or -1 (in Oracle9i).

To avoid manually taking the ending snapshot, it is also possible to set the parameter gathering_mode of the procedure gather_system_stats to interval. With this parameter, the starting snapshot is immediately taken, and the ending snapshot is scheduled to be executed after the number of minutes specified by a second parameter named interval. The following command specifies that the gathering of statistics should last 30 minutes:

SQL> execute dbms_stats.gather_system_stats(gathering_mode => 'interval', interval => 30)

Note that the execution of the previous command does not take 30 minutes. It just takes the starting snapshot and schedules a job to take the ending snapshot. Up to Oracle Database 10g Release 1, the legacy scheduler (the one managed with the package dbms_job) is used.
As of Oracle Database 10g Release 2, the new scheduler (the one managed with the package dbms_scheduler) is used. You can see the job by querying the views user_jobs and user_scheduler_jobs, respectively.

Workload Statistics Stored in the Data Dictionary

Name      Description
--------  -----------
CPUSPEED  The number of operations per second (in millions) that one CPU is able to process
SREADTIM  Average time (in milliseconds) needed to perform a single-block read operation
MREADTIM  Average time (in milliseconds) needed to perform a multiblock read operation
MBRC      Average number of blocks read during a multiblock read operation
MAXTHR    Maximum I/O throughput (in bytes per second) for the whole system
SLAVETHR  Average I/O throughput (in bytes per second) for a parallel processing slave
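Step 4 of the gathering process derives averages from the difference between the starting and the ending snapshot. The Python sketch below illustrates only that general idea for SREADTIM, MREADTIM, and MBRC. The snapshot fields are hypothetical cumulative counters invented for this example; they are not the actual counters or arithmetic used by the database engine, and MAXTHR and SLAVETHR are not modeled. When no reads of a given kind happened between the two snapshots, the sketch returns None, in the spirit of the NULL values mentioned in step 4.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snapshot:
    # Hypothetical cumulative counters (not Oracle column names)
    single_block_reads: int
    single_block_read_time_ms: float
    multiblock_reads: int
    multiblock_read_time_ms: float
    multiblock_blocks_read: int


def _avg(delta_total: float, delta_count: int) -> Optional[float]:
    # No reads of this kind between the snapshots: the value cannot be computed
    if delta_count <= 0:
        return None
    return delta_total / delta_count


def workload_stats(start: Snapshot, stop: Snapshot) -> dict:
    """Average read times and multiblock read count over the snapshot interval."""
    d_sr = stop.single_block_reads - start.single_block_reads
    d_mr = stop.multiblock_reads - start.multiblock_reads
    return {
        "SREADTIM": _avg(stop.single_block_read_time_ms - start.single_block_read_time_ms, d_sr),
        "MREADTIM": _avg(stop.multiblock_read_time_ms - start.multiblock_read_time_ms, d_mr),
        "MBRC": _avg(stop.multiblock_blocks_read - start.multiblock_blocks_read, d_mr),
    }


if __name__ == "__main__":
    start = Snapshot(1000, 5000.0, 200, 4000.0, 1600)
    stop = Snapshot(3000, 8000.0, 700, 14000.0, 5600)
    print(workload_stats(start, stop))
```

Working on deltas rather than on the raw counters is what makes the waiting time in step 2 matter: only the activity between the two snapshots contributes to the result.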

System statistics can also be set manually. For example, the following PL/SQL block deletes the current system statistics and then sets workload statistics with the procedure set_system_stats:

BEGIN
  dbms_stats.delete_system_stats();
  dbms_stats.set_system_stats(pname => 'CPUSPEED', pvalue => 772);
  dbms_stats.set_system_stats(pname => 'SREADTIM', pvalue => 5.5);
  dbms_stats.set_system_stats(pname => 'MREADTIM', pvalue => 19.4);
  dbms_stats.set_system_stats(pname => 'MBRC', pvalue => 53);
  dbms_stats.set_system_stats(pname => 'MAXTHR', pvalue => 1136136192);
  dbms_stats.set_system_stats(pname => 'SLAVETHR', pvalue => 16870400);
END;
/
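To see how workload statistics like the ones set above feed the CPU cost model, the Python sketch below evaluates the cost formula in units of single-block reads: io_cost = #SRds + #MRds * mreadtim/sreadtim, plus a CPU component equal to the CPU cost (as reported in plan_table.cpu_cost) divided by cpuspeed * sreadtim * 1000. It is a minimal sketch of that formula only; the read counts and the CPU figure in the usage example are made-up inputs, and any rounding or sanity checks the optimizer applies are not modeled.

```python
def cpu_model_cost(srds, mrds, cpu_cycles, sreadtim, mreadtim, cpuspeed):
    """Cost in units of single-block reads under the CPU cost model.

    srds, mrds  : number of single-block and multiblock reads
    cpu_cycles  : CPU cost as reported in plan_table.cpu_cost
    sreadtim    : single-block read time (ms)
    mreadtim    : multiblock read time (ms)
    cpuspeed    : millions of operations per second
    """
    # Each multiblock read is weighted by how much slower it is than a single-block read
    io_cost = srds + mrds * mreadtim / sreadtim
    # cpuspeed * 1000 = operations per millisecond; dividing the resulting time
    # by sreadtim expresses it in single-block-read units
    cpu_cost = cpu_cycles / (cpuspeed * sreadtim * 1000)
    return io_cost + cpu_cost


if __name__ == "__main__":
    # Values set with set_system_stats above: SREADTIM 5.5, MREADTIM 19.4, CPUSPEED 772.
    # The 100 multiblock reads and 50,000,000 CPU operations are made-up inputs.
    print(round(cpu_model_cost(0, 100, 50_000_000, 5.5, 19.4, 772), 2))
```

With these statistics a multiblock read counts as roughly 3.5 single-block reads (19.4/5.5), which is exactly the distinction the I/O cost model could not make.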

Impact on the Query Optimizer

When system statistics are available, the query optimizer computes two costs: I/O and CPU. The estimated CPU cost to access a column depends on its position in the table. The figures below refer to the cost of accessing one row; if several rows are accessed, the CPU cost increases proportionally. In the following example, a table with nine columns is created, one row is inserted, and then the SQL statement EXPLAIN PLAN is used to display the CPU cost of independently accessing each of the nine columns. Notice how there is an initial CPU cost of 35,757 to access the table, and then for each subsequent column, a CPU cost of 20 is added. At the same time, the I/O cost is constant. This makes sense because all columns are stored in the very same database block, and therefore the number of physical reads required to read them is the same for all queries.

SQL> CREATE TABLE t (c1 NUMBER, c2 NUMBER, c3 NUMBER, c4 NUMBER, c5 NUMBER, c6 NUMBER, c7 NUMBER, c8 NUMBER, c9 NUMBER);
SQL> INSERT INTO t VALUES (1, 2, 3, 4, 5, 6, 7, 8, 9);
SQL> EXPLAIN PLAN SET STATEMENT_ID = 'c1' FOR SELECT c1 FROM t;
SQL> EXPLAIN PLAN SET STATEMENT_ID = 'c2' FOR SELECT c2 FROM t;
SQL> EXPLAIN PLAN SET STATEMENT_ID = 'c3' FOR SELECT c3 FROM t;
SQL> EXPLAIN PLAN SET STATEMENT_ID = 'c4' FOR SELECT c4 FROM t;
SQL> EXPLAIN PLAN SET STATEMENT_ID = 'c5' FOR SELECT c5 FROM t;
SQL> EXPLAIN PLAN SET STATEMENT_ID = 'c6' FOR SELECT c6 FROM t;
SQL> EXPLAIN PLAN SET STATEMENT_ID = 'c7' FOR SELECT c7 FROM t;
SQL> EXPLAIN PLAN SET STATEMENT_ID = 'c8' FOR SELECT c8 FROM t;
SQL> EXPLAIN PLAN SET STATEMENT_ID = 'c9' FOR SELECT c9 FROM t;
SQL> SELECT statement_id, cpu_cost AS total_cpu_cost,
            cpu_cost - lag(cpu_cost) OVER (ORDER BY statement_id) AS cpu_cost_1_coll,
            io_cost
     FROM plan_table
     WHERE id = 0
     ORDER BY statement_id;

STATEMENT_ID TOTAL_CPU_COST CPU_COST_1_COLL IO_COST
------------ -------------- --------------- -------
c1                    35757                       3
c2                    35777              20       3
c3                    35797              20       3
c4                    35817              20       3
c5                    35837              20       3
c6                    35857              20       3
c7                    35877              20       3
c8                    35897              20       3
c9                    35917              20       3

If workload statistics are available, the query optimizer uses them and ignores noworkload statistics. You should be aware that the query optimizer performs several sanity checks that could disable or partially replace workload statistics:
- When either sreadtim, mreadtim, or mbrc is not available, the query optimizer ignores workload statistics.
- When mreadtim is less than or equal to sreadtim, the values of sreadtim and mreadtim are recomputed from the noworkload statistics:

$$\mathrm{sreadtim} = \mathrm{ioseektim} + \frac{\mathrm{db\_block\_size}}{\mathrm{iotfrspeed}}$$

$$\mathrm{mreadtim} = \mathrm{ioseektim} + \frac{\mathrm{mbrc} \cdot \mathrm{db\_block\_size}}{\mathrm{iotfrspeed}}$$

With the CPU cost model, the cost is expressed in units of single-block reads and is computed as follows, where #SRds and #MRds are the numbers of single-block and multiblock reads, and cpu_cost is the CPU cost stored in plan_table:

$$\mathrm{cost} = \mathrm{io\_cost} + \frac{\mathrm{cpu\_cost}}{\mathrm{cpuspeed} \cdot \mathrm{sreadtim} \cdot 1000}$$

$$\mathrm{io\_cost} = \#\mathrm{SRds} + \#\mathrm{MRds} \cdot \frac{\mathrm{mreadtim}}{\mathrm{sreadtim}}$$

Object statistics are shown in the following data dictionary views, depending on the level:

Object   Table/Index-Level        Partition-Level           Subpartition-Level
-------  -----------------------  ------------------------  ---------------------------
Tables   user_tab_statistics      user_tab_statistics       user_tab_statistics
         user_tables              user_tab_partitions       user_tab_subpartitions
Columns  user_tab_col_statistics  user_part_col_statistics  user_subpart_col_statistics
         user_tab_histograms      user_part_histograms      user_subpart_histograms
Indexes  user_ind_statistics      user_ind_statistics       user_ind_statistics
         user_indexes             user_ind_partitions       user_ind_subpartitions

Table Statistics

SQL> SELECT num_rows, blocks, empty_blocks, avg_space, chain_cnt, avg_row_len
     FROM user_tab_statistics
     WHERE table_name = 'T';

- num_rows is the number of rows in the table.
- blocks is the number of blocks below the high watermark in the table.
- empty_blocks is the number of blocks above the high watermark in the table. This value is not computed by the package dbms_stats. It is set to 0.
- avg_space is the average free space (in bytes) in the table data blocks. This value is not computed by the package dbms_stats. It is set to 0.
- chain_cnt is the number of rows in the table that are chained or migrated to another block. This value is not computed by the package dbms_stats. It is set to 0.
- avg_row_len is the average size (in bytes) of a row in the table.

Column Statistics

SQL> SELECT column_name AS "NAME", num_distinct AS "#DST", low_value, high_value,
            density AS "DENS", num_nulls AS "#NULL", avg_col_len AS "AVGLEN",
            histogram, num_buckets AS "#BKT"
     FROM user_tab_col_statistics
     WHERE table_name = 'T';

- num_distinct is the number of distinct values in the column.
- low_value is the lowest value in the column. It is shown in the internal representation. Note that for string columns (in the example, the column pad), only the first 32 bytes are used.
- high_value is the highest value in the column. It is shown in the internal representation. As for low_value, only the first 32 bytes are used for string columns.
- density is a decimal number between 0 and 1. Values close to 0 indicate that a restriction on that column filters out the majority of the rows. Values close to 1 indicate that a restriction on that column filters almost no rows. If no histogram is present, density is 1/num_distinct. If a histogram is present, the computation differs and depends on the type of histogram.
- num_nulls is the number of NULL values stored in the column.
- avg_col_len is the average column size in bytes.
- histogram indicates whether a histogram is available for the column and, if it is available, which type it is. Valid values are NONE (meaning no histogram), FREQUENCY, and HEIGHT BALANCED. This column is available as of Oracle Database 10g.
- num_buckets is the number of buckets in the histogram. A bucket, or category as it is called in statistics, is a group of values of the same kind. Histograms are composed of at least one bucket. If no histogram is available, it is set to 1. The maximum number of buckets is 254.

LOW_VALUE AND HIGH_VALUE FORMAT

The columns low_value and high_value are not easily decipherable. In fact, they display the values according to the internal representation used by the database engine to store data. To convert them to human-readable values, there are two options.

First, the package utl_raw provides the functions cast_to_binary_double, cast_to_binary_float, cast_to_binary_integer, cast_to_number, cast_to_nvarchar2, cast_to_raw, and cast_to_varchar2. As the names of the functions suggest, for each specific datatype, there is a corresponding function used to convert the internal value to the actual value.
To get the low and high value of the column val1, you can use the following query:

SQL> SELECT utl_raw.cast_to_number(low_value) AS low_value,
            utl_raw.cast_to_number(high_value) AS high_value
     FROM user_tab_col_statistics
     WHERE table_name = 'T'
     AND column_name = 'VAL1';

Second, the package dbms_stats provides the procedures convert_raw_value (which is overloaded several times), convert_raw_value_nvarchar, and convert_raw_value_rowid. Since procedures cannot be directly used in SQL statements, usually they are used only in PL/SQL programs. In the following example, the PL/SQL block has the same purpose as the previous query:

DECLARE
  l_low_value  user_tab_col_statistics.low_value%TYPE;
  l_high_value user_tab_col_statistics.high_value%TYPE;
  l_val1       t.val1%TYPE;
BEGIN
  SELECT low_value, high_value
  INTO l_low_value, l_high_value
  FROM user_tab_col_statistics
  WHERE table_name = 'T'
  AND column_name = 'VAL1';
  dbms_stats.convert_raw_value(l_low_value, l_val1);
  dbms_output.put_line('low_value: ' || l_val1);
  dbms_stats.convert_raw_value(l_high_value, l_val1);
  dbms_output.put_line('high_value: ' || l_val1);
END;
/

The following function wraps these procedures so that they can be called from SQL. It picks the conversion procedure according to the datatype of the column:

create or replace function display_raw (rawval raw, type varchar2)
return varchar2
is
  cn  number;
  cv  varchar2(32);
  cd  date;
  cnv nvarchar2(32);
  cr  rowid;
  cc  char(32);
begin
  if (type = 'NUMBER') then
    dbms_stats.convert_raw_value(rawval, cn);
    return to_char(cn);
  elsif (type = 'VARCHAR2') then
    dbms_stats.convert_raw_value(rawval, cv);
    return to_char(cv);
  elsif (type = 'DATE') then
    dbms_stats.convert_raw_value(rawval, cd);
    return to_char(cd);
  elsif (type = 'NVARCHAR2') then
    -- NVARCHAR2 values have their own conversion procedure
    dbms_stats.convert_raw_value_nvarchar(rawval, cnv);
    return to_char(cnv);
  elsif (type = 'ROWID') then
    -- ROWID values have their own conversion procedure; return the converted rowid
    dbms_stats.convert_raw_value_rowid(rawval, cr);
    return rowidtochar(cr);
  elsif (type = 'CHAR') then
    dbms_stats.convert_raw_value(rawval, cc);
    return to_char(cc);
  else
    return 'UNKNOWN DATATYPE';
  end if;
end;
/

select a.column_name,
       display_raw(a.low_value, b.data_type) as low_val,
       display_raw(a.high_value, b.data_type) as high_val,
       b.data_type
from user_tab_col_statistics a, user_tab_cols b
where a.table_name = 'TRANSACTION'
and a.table_name = b.table_name
and a.column_name = b.column_name
and a.column_name = 'DATEMODIFIED';
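To see what these conversions do with the hexadecimal strings shown for low_value and high_value, the following Python sketch decodes NUMBER and VARCHAR2 values outside the database. It relies on the commonly described layout of the Oracle NUMBER format: an exponent byte followed by base-100 digits, with negative numbers stored in complemented form and usually terminated by the byte 0x66, and zero stored as the single byte 0x80. Treat it as an illustration of the internal representation, not as a replacement for utl_raw.cast_to_number or dbms_stats.convert_raw_value. For VARCHAR2, the sketch assumes a UTF-8 (or ASCII) database character set.

```python
from decimal import Decimal


def decode_oracle_number(hex_value: str) -> Decimal:
    """Decode the internal NUMBER representation shown in low_value/high_value."""
    data = bytes.fromhex(hex_value)
    if data == b"\x80":
        return Decimal(0)  # zero is stored as the single byte 0x80
    exp_byte, digits = data[0], data[1:]
    if exp_byte & 0x80:
        # Positive: exponent byte = 193 + base-100 exponent, digit byte = digit + 1
        exponent = exp_byte - 193
        values = [b - 1 for b in digits]
        sign = 1
    else:
        # Negative: exponent byte = 62 - exponent, digit byte = 101 - digit,
        # optionally followed by the terminator byte 0x66
        if digits and digits[-1] == 0x66:
            digits = digits[:-1]
        exponent = 62 - exp_byte
        values = [101 - b for b in digits]
        sign = -1
    result = Decimal(0)
    for i, d in enumerate(values):
        result += Decimal(d) * Decimal(100) ** (exponent - i)
    return sign * result


def decode_varchar2(hex_value: str, encoding: str = "utf-8") -> str:
    """Decode a VARCHAR2 value; only the first 32 bytes are stored, so a
    multibyte character may be cut, hence errors='replace'."""
    return bytes.fromhex(hex_value).decode(encoding, errors="replace")


if __name__ == "__main__":
    print(decode_oracle_number("C20D23"))  # base-100 digits 12 and 34
    print(decode_varchar2("414243"))
```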
