UNIT-IV Query Processing and Optimization (Part II)

Motivation for Query Optimisation


List all the managers that work in the sales department.
SELECT * FROM emp, dept WHERE emp.deptno = dept.deptno AND emp.job = 'Manager' AND dept.name = 'Sales';

There are at least three alternative ways of representing this query as a Relational Algebra expression:

σ_{job='Manager' ∧ name='Sales' ∧ emp.deptno=dept.deptno}(EMP × DEPT)

σ_{job='Manager' ∧ name='Sales'}(EMP ⋈_{emp.deptno=dept.deptno} DEPT)

(σ_{job='Manager'}(EMP)) ⋈_{emp.deptno=dept.deptno} (σ_{name='Sales'}(DEPT))

Unit_IV_Class_Lectures(Prof.B.saleena)

Motivation for Query Optimisation


Metrics:
  1000 tuples in the EMP relation
  50 tuples in the DEPT relation
  50 employees are Managers (one per department)
  5 separate Sales departments (across the country)

Cost of processing the following alternative:

σ_{job='Manager' ∧ name='Sales' ∧ emp.deptno=dept.deptno}(EMP × DEPT)

Cartesian product of EMP and DEPT: (1000 + 50) record I/Os to read the relations + (1000 * 50) record I/Os to create an intermediate relation to store the result

Selection on the result of the Cartesian product: (1000 * 50) record I/Os to read the tuples and compare them against the predicate

Total cost of the query: (1000 + 50) + 2*(1000 * 50) = 101,050 record I/Os.

Motivation for Query Optimisation


Using the same metrics (1000 tuples in EMP, 50 tuples in DEPT, 50 Managers, 5 Sales departments), the cost of processing the following alternative is:

σ_{job='Manager' ∧ name='Sales'}(EMP ⋈_{emp.deptno=dept.deptno} DEPT)

Join of EMP and DEPT over deptno: (1000 + 50) record I/Os to read the relations + (1000) record I/Os to create an intermediate relation to store the join result

Selection on the result of the join: (1000) record I/Os to read each tuple and compare it against the predicate

Total cost of the query: (1000 + 50) + 2*(1000) = 3,050 record I/Os.

Motivation for Query Optimisation


Cost of processing the following alternative:

(σ_{job='Manager'}(EMP)) ⋈_{emp.deptno=dept.deptno} (σ_{name='Sales'}(DEPT))

Select Managers in EMP: (1000) record I/Os to read the relation + (50) record I/Os to create an intermediate relation to store the select result

Select Sales in DEPT: (50) record I/Os to read the relation + (5) record I/Os to create an intermediate relation to store the select result

Join of the previous two selections over deptno: (50 + 5) record I/Os to read the relations

Total cost of the query: (1000 + 2*(50) + 5 + (50 + 5)) = 1,160 record I/Os.
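
The three cost totals can be checked with a few lines of arithmetic. The sketch below only restates the record I/O counts from the slides; the function and constant names are illustrative and not part of the lecture material.

```python
# Metrics taken from the slides.
EMP_TUPLES = 1000    # tuples in EMP
DEPT_TUPLES = 50     # tuples in DEPT
MANAGERS = 50        # EMP tuples with job = 'Manager'
SALES_DEPTS = 5      # DEPT tuples with name = 'Sales'


def cost_product_then_select() -> int:
    """Selection applied to the Cartesian product EMP x DEPT."""
    read_inputs = EMP_TUPLES + DEPT_TUPLES
    write_product = EMP_TUPLES * DEPT_TUPLES
    read_product = EMP_TUPLES * DEPT_TUPLES
    return read_inputs + write_product + read_product


def cost_join_then_select() -> int:
    """Selection applied to the join of EMP and DEPT over deptno."""
    read_inputs = EMP_TUPLES + DEPT_TUPLES
    write_join = EMP_TUPLES      # one joined tuple per employee
    read_join = EMP_TUPLES
    return read_inputs + write_join + read_join


def cost_select_then_join() -> int:
    """Selections applied to each relation first, then the join."""
    select_managers = EMP_TUPLES + MANAGERS     # read EMP, write result
    select_sales = DEPT_TUPLES + SALES_DEPTS    # read DEPT, write result
    read_for_join = MANAGERS + SALES_DEPTS
    return select_managers + select_sales + read_for_join


for strategy in (cost_product_then_select,
                 cost_join_then_select,
                 cost_select_then_join):
    print(f"{strategy.__name__}: {strategy():,} record I/Os")
```

The ratio between the first and the last strategy is roughly 87 to 1, which is the motivation for optimising the query before executing it.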

Query Processing Stage - 1


Cast the query into internal form

This involves the conversion of the original (SQL) query into some internal representation more suitable for machine manipulation. The internal representation typically chosen is either some kind of abstract syntax tree, or a relational algebra query tree.


Relational Algebra Query Trees


A Relational Algebra query can be represented as a query tree. For example, the query to list all the managers that work in the sales department could be described as one of the following:

σ_{job='Manager' ∧ name='Sales' ∧ emp.deptno=dept.deptno}(EMP × DEPT)

    σ_{job='Manager' ∧ name='Sales' ∧ emp.deptno=dept.deptno}      Root
                              |
                              ×                                    Intermediate operations
                            /   \
                         EMP     DEPT                              Leaves

Relational Algebra Query Trees


Alternative query tree for the query to list all the managers that work in the sales department:

σ_{job='Manager' ∧ name='Sales'}(EMP ⋈_{emp.deptno=dept.deptno} DEPT)

    σ_{job='Manager' ∧ name='Sales'}
                   |
       ⋈_{emp.deptno=dept.deptno}
              /         \
           EMP           DEPT

Relational Algebra Query Trees


Alternative query tree for the query to list all the managers that work in the sales department:

(σ_{job='Manager'}(EMP)) ⋈_{emp.deptno=dept.deptno} (σ_{name='Sales'}(DEPT))

            ⋈_{emp.deptno=dept.deptno}
               /                 \
    σ_{job='Manager'}      σ_{name='Sales'}
            |                     |
           EMP                   DEPT
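
One possible internal representation of these query trees is a small recursive data structure. The following is a minimal sketch, assuming a hypothetical `Node` class and a text rendering chosen for illustration; a real DBMS would use its own, richer structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """One node of a relational algebra query tree (illustrative names)."""
    op: str                          # "relation", "select", "product" or "join"
    arg: str = ""                    # relation name, predicate or join condition
    left: Optional["Node"] = None    # only child of a select, left child otherwise
    right: Optional["Node"] = None


def to_expr(node: Node) -> str:
    """Render a tree as a one-line algebra expression."""
    if node.op == "relation":
        return node.arg
    if node.op == "select":
        return f"select[{node.arg}]({to_expr(node.left)})"
    if node.op == "product":
        return f"({to_expr(node.left)} x {to_expr(node.right)})"
    if node.op == "join":
        return f"({to_expr(node.left)} join[{node.arg}] {to_expr(node.right)})"
    raise ValueError(f"unknown operation: {node.op}")


EMP = Node("relation", "EMP")
DEPT = Node("relation", "DEPT")
JOIN_COND = "emp.deptno = dept.deptno"

# The three equivalent trees for the "managers in sales" query.
tree_1 = Node("select", f"job = 'Manager' and name = 'Sales' and {JOIN_COND}",
              Node("product", left=EMP, right=DEPT))
tree_2 = Node("select", "job = 'Manager' and name = 'Sales'",
              Node("join", JOIN_COND, EMP, DEPT))
tree_3 = Node("join", JOIN_COND,
              Node("select", "job = 'Manager'", EMP),
              Node("select", "name = 'Sales'", DEPT))

for tree in (tree_1, tree_2, tree_3):
    print(to_expr(tree))
```

The leaves are always `relation` nodes and the root is the last operation to be applied, matching the Root / Intermediate operations / Leaves labels on the slides.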

Query Processing Stage - 2


Convert to canonical form

Find a more efficient representation of the query by converting the internal representation into some equivalent (canonical) form through the application of a set of well-defined transformation rules. The set of transformation rules to apply will generally be the result of the application of specific heuristic processing strategies associated with particular DBMSs.


Transformation Rules for RA Operations


1. Conjunctive selection operations can cascade into individual selection operations (and vice versa). This is sometimes referred to as cascade of selection.
σ_{p ∧ q ∧ r}(R) = σ_p(σ_q(σ_r(R)))

Example:
σ_{deptno=10 ∧ sal>1000}(Emp) = σ_{deptno=10}(σ_{sal>1000}(Emp))

Transformation Rules for RA Operations


2. Commutativity of selection.
σ_p(σ_q(R)) = σ_q(σ_p(R))

Example:
σ_{sal>1000}(σ_{deptno=10}(Emp)) = σ_{deptno=10}(σ_{sal>1000}(Emp))
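
Rules 1 and 2 can be checked on a toy relation. This is a minimal sketch, assuming relations are modelled as lists of dictionaries; the sample tuples and the helper `select` are made up for illustration.

```python
# A toy EMP relation (sample data, not from the lecture).
EMP = [
    {"name": "Adams", "deptno": 10, "sal": 1500},
    {"name": "Baker", "deptno": 10, "sal": 800},
    {"name": "Clark", "deptno": 20, "sal": 2000},
]


def select(pred, rel):
    """Selection: keep the tuples of rel that satisfy pred."""
    return [t for t in rel if pred(t)]


def in_dept_10(t):
    return t["deptno"] == 10


def high_sal(t):
    return t["sal"] > 1000


# Rule 1: one conjunctive selection equals a cascade of selections.
conjunctive = select(lambda t: in_dept_10(t) and high_sal(t), EMP)
cascaded = select(in_dept_10, select(high_sal, EMP))

# Rule 2: the order of the individual selections does not matter.
swapped = select(high_sal, select(in_dept_10, EMP))

print(conjunctive == cascaded == swapped)
print([t["name"] for t in conjunctive])
```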

Transformation Rules for RA Operations


3. In a sequence of projection operations, only the last (outermost) in the sequence is required.
π_L(π_M(... π_N(R))) = π_L(R)

Example:
π_{deptno}(π_{deptno, name}(Dept)) = π_{deptno}(Dept)

Transformation Rules for RA Operations


4. Commutativity of selection and projection.
π_{A1, ..., Am}(σ_p(R)) = σ_p(π_{A1, ..., Am}(R))
where p ∈ {A1, A2, ..., Am}, that is, the selection predicate (p) is only made up of projected attributes.

Example:
π_{name, job}(σ_{name='Smith'}(Emp)) = σ_{name='Smith'}(π_{name, job}(Emp))

Transformation Rules for RA Operations


5. Commutativity of theta-join (and Cartesian product).
R ⋈_p S = S ⋈_p R
R × S = S × R

NOTE: Theta-join is a generalisation of both the equi-join and natural-join.

Example:
EMP ⋈_{emp.deptno=dept.deptno} DEPT = DEPT ⋈_{emp.deptno=dept.deptno} EMP

Transformation Rules for RA Operations


6. Commutativity of selection and theta-join (or Cartesian product).
(σ_p(R)) ⋈_r S = σ_p(R ⋈_r S)
where p involves only the attributes {A1, A2, ..., Am} of R (in the example below, the join attribute emp.deptno).

Example:
(σ_{emp.deptno=10}(EMP)) ⋈_{emp.deptno=dept.deptno} DEPT = σ_{emp.deptno=10}(EMP ⋈_{emp.deptno=dept.deptno} DEPT)
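
Rule 6 is what allows a selection to be moved below a join. The sketch below checks the equivalence on sample data and prints the number of tuples that flow into the join in each case; the data and the helpers `select` and `equi_join` are illustrative assumptions.

```python
# Toy relations (sample data, not from the lecture).
EMP = [
    {"ename": "Adams", "deptno": 10},
    {"ename": "Baker", "deptno": 20},
    {"ename": "Clark", "deptno": 10},
    {"ename": "Davis", "deptno": 30},
]
DEPT = [
    {"deptno": 10, "dname": "Sales"},
    {"deptno": 20, "dname": "Research"},
    {"deptno": 30, "dname": "Accounts"},
]


def select(pred, rel):
    """Selection: keep the tuples of rel that satisfy pred."""
    return [t for t in rel if pred(t)]


def equi_join(r, s, attr):
    """Nested-loop equi-join of r and s on a shared attribute name."""
    return [{**a, **b} for a in r for b in s if a[attr] == b[attr]]


def in_dept_10(t):
    return t["deptno"] == 10


# Selection after the join: the join sees every EMP tuple.
joined = equi_join(EMP, DEPT, "deptno")
select_late = select(in_dept_10, joined)

# Selection before the join: the join sees only the matching EMP tuples.
filtered = select(in_dept_10, EMP)
select_early = equi_join(filtered, DEPT, "deptno")

print(select_early == select_late)
print("EMP tuples entering the join:", len(EMP), "vs", len(filtered))
```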

Transformation Rules for RA Operations


7. Commutativity of projection and theta-join (or Cartesian product).
π_L(R ⋈_r S) = (π_{L1}(R)) ⋈_r (π_{L2}(S))

Project attributes L = L1 ∪ L2, where L1 are attributes of R, and L2 are attributes of S. L will also contain the join attributes.

Example:
π_{job, location, deptno}(EMP ⋈_{emp.deptno=dept.deptno} DEPT) = (π_{job, deptno}(EMP)) ⋈_{emp.deptno=dept.deptno} (π_{location, deptno}(DEPT))
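
The example for rule 7 can be reproduced on sample data. This is a sketch under the assumption that projection removes duplicate tuples, as in relational algebra; the tuples and the helpers `project`, `equi_join` and `as_set` are made up for illustration.

```python
# Toy relations (sample data, not from the lecture).
EMP = [
    {"ename": "Adams", "job": "Manager", "deptno": 10},
    {"ename": "Baker", "job": "Clerk", "deptno": 10},
    {"ename": "Clark", "job": "Clerk", "deptno": 10},
    {"ename": "Davis", "job": "Manager", "deptno": 20},
]
DEPT = [
    {"deptno": 10, "dname": "Sales", "location": "London"},
    {"deptno": 20, "dname": "Research", "location": "Leeds"},
]


def project(attrs, rel):
    """Projection with duplicate elimination, keeping first-seen order."""
    seen, out = set(), []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append({a: t[a] for a in attrs})
    return out


def equi_join(r, s, attr):
    """Nested-loop equi-join of r and s on a shared attribute name."""
    return [{**a, **b} for a in r for b in s if a[attr] == b[attr]]


def as_set(rel):
    """Order-independent view of a relation, for comparing results."""
    return {tuple(sorted(t.items())) for t in rel}


# Projection applied after the join.
late = project(["job", "location", "deptno"], equi_join(EMP, DEPT, "deptno"))

# Projections pushed below the join; both keep the join attribute deptno.
early = equi_join(project(["job", "deptno"], EMP),
                  project(["location", "deptno"], DEPT),
                  "deptno")

print(as_set(late) == as_set(early))
```

Both projection lists must keep `deptno`; dropping it from either side would make the join impossible, which is why L has to contain the join attributes.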

Transformation Rules for RA Operations


8. Commutativity of union and intersection (but not set difference).
R ∪ S = S ∪ R
R ∩ S = S ∩ R

Transformation Rules for RA Operations


9. Commutativity of selection and set operations (union, intersection, and set difference).
Union: σ_p(R ∪ S) = σ_p(S) ∪ σ_p(R)
Intersection: σ_p(R ∩ S) = σ_p(S) ∩ σ_p(R)
Set difference: σ_p(R − S) = σ_p(R) − σ_p(S)
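
Rule 9 can be illustrated with Python sets standing in for union-compatible relations. The tuples below are sample data, and `select` is a hypothetical helper. Note that for set difference the operands must stay in the same order on both sides, because set difference is not commutative.

```python
# Two union-compatible toy relations of (name, deptno) tuples.
R = {("Adams", 10), ("Baker", 20), ("Clark", 10)}
S = {("Clark", 10), ("Davis", 10)}


def select(pred, rel):
    """Selection over a relation modelled as a set of tuples."""
    return {t for t in rel if pred(t)}


def in_dept_10(t):
    return t[1] == 10


# Selection distributes over union, intersection and set difference.
union_ok = select(in_dept_10, R | S) == select(in_dept_10, R) | select(in_dept_10, S)
inter_ok = select(in_dept_10, R & S) == select(in_dept_10, R) & select(in_dept_10, S)
diff_ok = select(in_dept_10, R - S) == select(in_dept_10, R) - select(in_dept_10, S)

# Swapping the operands of the difference gives a different relation.
diff_swapped = select(in_dept_10, S) - select(in_dept_10, R)

print(union_ok, inter_ok, diff_ok)
print(select(in_dept_10, R - S), diff_swapped)
```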

Transformation Rules for RA Operations


10. Commutativity of projection and union.
π_L(R ∪ S) = π_L(S) ∪ π_L(R)

Transformation Rules for RA Operations


11. Associativity of natural join (and Cartesian product).
Natural join: (R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)
Cartesian product: (R × S) × T = R × (S × T)

Transformation Rules for RA Operations


12. Associativity of union and intersection (but not set difference).
Union: (R ∪ S) ∪ T = S ∪ (R ∪ T)
Intersection: (R ∩ S) ∩ T = S ∩ (R ∩ T)

Heuristic Optimization of Query Trees


Query tree (relational algebra expression):
  Leaf node: relations
  Internal node: relational algebra operations

Execution of query trees: post-order traversal of the tree.

Using heuristics in query optimization: apply SELECT and PROJECT before applying JOIN or other binary operations.
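
Post-order execution means that the children of a node are evaluated before the node itself. The following is a minimal sketch, assuming a tree encoded as nested tuples and equality-only predicates; the encoding, the sample database and the `evaluate` function are illustrative, not a real executor.

```python
# Toy database (sample data, not from the lecture).
DB = {
    "EMP": [
        {"ename": "King", "job": "Manager", "deptno": 10},
        {"ename": "Lee", "job": "Clerk", "deptno": 10},
        {"ename": "Moss", "job": "Manager", "deptno": 20},
    ],
    "DEPT": [
        {"deptno": 10, "name": "Sales"},
        {"deptno": 20, "name": "Research"},
    ],
}

# Selections sit below the join, as the heuristic recommends.
TREE = ("join", "deptno",
        ("select", "job", "Manager", ("rel", "EMP")),
        ("select", "name", "Sales", ("rel", "DEPT")))


def evaluate(node, db, trace):
    """Evaluate the children first, then the node itself (post-order)."""
    kind = node[0]
    if kind == "rel":
        label, result = node[1], db[node[1]]
    elif kind == "select":
        _, attr, value, child = node
        rows = evaluate(child, db, trace)
        label = f"select {attr}"
        result = [t for t in rows if t[attr] == value]
    elif kind == "join":
        _, attr, left, right = node
        left_rows = evaluate(left, db, trace)
        right_rows = evaluate(right, db, trace)
        label = f"join {attr}"
        result = [{**a, **b} for a in left_rows for b in right_rows
                  if a[attr] == b[attr]]
    else:
        raise ValueError(f"unknown node kind: {kind}")
    trace.append(label)    # recorded only after the children have finished
    return result


trace = []
result = evaluate(TREE, DB, trace)
print(trace)
print(result)
```

The trace lists the leaves and selections before the join, so the join only ever sees the already reduced inputs.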

Heuristic Processing Strategies


Perform selection operations as early as possible.
Translate a Cartesian product and subsequent selection (whose predicate represents a join condition) into a join operation.
Use associativity of binary operations to ensure that the most restrictive selection operations are executed first.
Perform projections as early as possible.
Compute common expressions once.
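
The first two strategies can be sketched as a small tree rewriter. This is an illustrative toy, not the algorithm of any particular DBMS: it assumes the selections have already been cascaded (rule 1), that each selection lists the attributes it uses, and that trees are nested tuples in the shapes shown in the comments.

```python
# Node shapes (all hypothetical):
#   ("rel", name, attribute_names)
#   ("select", predicate_text, used_attributes, child)
#   ("product", left, right)
#   ("join", condition_text, left, right)

def attrs(node):
    """Attributes available in the output of a subtree."""
    kind = node[0]
    if kind == "rel":
        return set(node[2])
    if kind == "select":
        return attrs(node[3])
    return attrs(node[-2]) | attrs(node[-1])    # product or join


def push_select(pred, used, child):
    """Place one selection as low in the tree as its attributes allow."""
    kind = child[0]
    if kind == "select":
        # Slide past an existing selection (commutativity of selection).
        return ("select", child[1], child[2], push_select(pred, used, child[3]))
    if kind in ("product", "join"):
        left, right = child[-2], child[-1]
        if used <= attrs(left):
            return child[:-2] + (push_select(pred, used, left), right)
        if used <= attrs(right):
            return child[:-2] + (left, push_select(pred, used, right))
        if kind == "product":
            # Predicate spans both inputs: product + selection becomes a join.
            return ("join", pred, left, right)
    return ("select", pred, used, child)


def optimise(node):
    """Rewrite a tree bottom-up, pushing every selection down."""
    kind = node[0]
    if kind == "rel":
        return node
    if kind == "select":
        return push_select(node[1], node[2], optimise(node[3]))
    return node[:-2] + (optimise(node[-2]), optimise(node[-1]))


EMP = ("rel", "EMP", ("emp.ename", "emp.job", "emp.deptno"))
DEPT = ("rel", "DEPT", ("dept.deptno", "dept.name"))

canonical = (
    "select", "emp.job = 'Manager'", frozenset({"emp.job"}),
    ("select", "dept.name = 'Sales'", frozenset({"dept.name"}),
     ("select", "emp.deptno = dept.deptno",
      frozenset({"emp.deptno", "dept.deptno"}),
      ("product", EMP, DEPT))))

optimised = optimise(canonical)
print(optimised[0], "on", optimised[1])
print(optimised[2][:2], optimised[3][:2])
```

For the managers-in-sales query the result is a join over deptno whose inputs are the two single-relation selections.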

Heuristic Processing - Example


Initial query tree:

σ_{job='Manager' ∧ name='Sales' ∧ emp.deptno=dept.deptno}(EMP × DEPT)

Cartesian product and join-condition selection translated into a join:

σ_{job='Manager' ∧ name='Sales'}(EMP ⋈_{emp.deptno=dept.deptno} DEPT)

Selections performed as early as possible (optimised canonical query):

            ⋈_{emp.deptno=dept.deptno}
               /                 \
    σ_{job='Manager'}      σ_{name='Sales'}
            |                     |
           EMP                   DEPT

Query Processing Stage - 3


Choose candidate low-level procedures

Consider the (optimised canonical) query as a series of low-level operations (join, restrict, etc.). For each of these operations, generate alternative execution strategies and calculate the cost of such strategies on the basis of statistical information held about the database tables (files).

Query Processing Stage - 4


Generate query plans and choose the cheapest

Construct a set of candidate Query Execution Plans (QEPs). Each QEP is constructed by selecting a candidate implementation procedure for each operation in the canonical query and then combining them to form a string of associated operations. Each QEP will have an (estimated) cost associated with it: the sum of the costs of its operations.

Choose the QEP with the least cost.
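
Stage 4 can be sketched as an exhaustive search over one implementation choice per operation. All procedure names and cost figures below are hypothetical numbers chosen for illustration; a real optimiser would derive its estimates from database statistics and would usually prune the search space.

```python
from itertools import product

# Hypothetical candidate procedures and estimated costs (record I/Os).
candidates = {
    "select job = 'Manager' on EMP": {"linear scan": 1000, "index on job": 60},
    "select name = 'Sales' on DEPT": {"linear scan": 50, "index on name": 8},
    "join over deptno": {"nested loop": 250, "sort-merge": 120, "hash": 55},
}


def enumerate_plans(candidates):
    """Yield (plan, cost) for every combination of procedures."""
    operations = list(candidates)
    options = (candidates[op].items() for op in operations)
    for choice in product(*options):
        plan = {op: method for op, (method, _) in zip(operations, choice)}
        cost = sum(op_cost for _, op_cost in choice)   # sum of operation costs
        yield plan, cost


plans = list(enumerate_plans(candidates))
best_plan, best_cost = min(plans, key=lambda pc: pc[1])

print(len(plans), "candidate QEPs")
print(best_plan, best_cost)
```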

Cost Based Optimisation


Cost Based Optimisation (stages 3 & 4)

A good declarative query optimiser does not rely solely on heuristic processing strategies. It chooses the QEP with the lowest estimated cost. After heuristic rules are applied to a query, there still remain a number of alternative ways to execute it. The query optimiser estimates the cost of executing each one (or at least a number) of these alternatives, and selects the cheapest one.

Costs associated with query execution


Secondary storage access costs:
  Searching for data blocks on disk
  Reading data blocks from disk
  Writing data blocks to disk

Storage costs:
  Cost of storing intermediate (temp) files

Computation costs:
  Cost of CPU usage

Main memory usage costs:
  Cost of buffering data

Communication costs:
  Cost of moving data across

Database statistics used in cost estimation


Information held on each relation:
  number of tuples
  number of blocks
  blocking factor
  primary access method
  primary access attributes
  secondary indexes
  secondary indexing attributes
  number of levels for each index
  number of distinct values of each attribute
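
A few of these statistics are enough for simple estimates. The sketch below assumes values are uniformly distributed over an attribute's distinct values, a common textbook simplification; the class name, the blocking factor of 20 and the distinct-value counts are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class RelationStats:
    """A subset of the catalogue statistics listed above (hypothetical class)."""
    n_tuples: int
    blocking_factor: int    # tuples per block
    distinct: dict          # attribute name -> number of distinct values

    @property
    def n_blocks(self) -> int:
        """Blocks needed to store the relation."""
        return math.ceil(self.n_tuples / self.blocking_factor)

    def equality_cardinality(self, attr: str) -> float:
        """Expected tuples matching attr = constant, assuming uniform values."""
        return self.n_tuples / self.distinct[attr]


# EMP with 1000 tuples spread over 50 departments; the rest is assumed.
emp = RelationStats(n_tuples=1000, blocking_factor=20,
                    distinct={"deptno": 50, "job": 10})

print(emp.n_blocks)                          # blocks read by a full scan
print(emp.equality_cardinality("deptno"))    # expected employees per department
```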

Query Optimisation Summary


The aims of query processing are to transform a query written in a high-level language (SQL) into a correct and efficient execution strategy expressed in a low-level language (Relational Algebra), and to execute the strategy to retrieve the required data.

There are many equivalent transformations of the same high-level query; the DBMS has to choose the one that minimises resource usage.

There are two main techniques for query optimisation. The first uses heuristic rules that order the operations in a query. The second compares different execution strategies for those operations, based on their relative costs, and selects the least resource-intensive (cheapest) ones.

Using Heuristics in Query Optimization-Example 2


Heuristic Optimization of Query Trees:
The same query could correspond to many different relational algebra expressions and hence many different query trees. The task of heuristic optimization of query trees is to find a final query tree that is efficient to execute.

Example:
Query: Find the last names of employees born after 1957 who work on the project named Aquarius.

SQL query:
SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME = 'Aquarius' AND PNUMBER = PNO AND ESSN = SSN AND BDATE > '1957-12-31';

Steps in converting the query tree during heuristic optimization:
  1. Canonical query tree
  2. Moving SELECT operations down the query tree
  3. Applying the more restrictive SELECT operation first (Figure 18.5(c))
  4. Replacing CARTESIAN PRODUCT and SELECT with JOIN
  5. Moving PROJECT operations down the query tree

Each transformation should keep the query tree equivalent to the original.
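
The query trees for these steps are figures that did not survive in this text. As a hedged reconstruction from the SQL query and the step titles (not a copy of the figures), the five trees, from the canonical tree to the tree with PROJECT operations moved down, can be written as relational algebra expressions:

```latex
\begin{align*}
\text{(1)}\quad & \pi_{\text{LNAME}}\bigl(\sigma_{\text{PNAME}=\text{'Aquarius'} \,\land\, \text{PNUMBER}=\text{PNO} \,\land\, \text{ESSN}=\text{SSN} \,\land\, \text{BDATE}>\text{'1957-12-31'}}
  ((\text{EMPLOYEE} \times \text{WORKS\_ON}) \times \text{PROJECT})\bigr) \\[4pt]
\text{(2)}\quad & \pi_{\text{LNAME}}\bigl(\sigma_{\text{PNUMBER}=\text{PNO}}\bigl(
  \sigma_{\text{ESSN}=\text{SSN}}(\sigma_{\text{BDATE}>\text{'1957-12-31'}}(\text{EMPLOYEE}) \times \text{WORKS\_ON})
  \times \sigma_{\text{PNAME}=\text{'Aquarius'}}(\text{PROJECT})\bigr)\bigr) \\[4pt]
\text{(3)}\quad & \pi_{\text{LNAME}}\bigl(\sigma_{\text{ESSN}=\text{SSN}}\bigl(
  \sigma_{\text{PNUMBER}=\text{PNO}}(\sigma_{\text{PNAME}=\text{'Aquarius'}}(\text{PROJECT}) \times \text{WORKS\_ON})
  \times \sigma_{\text{BDATE}>\text{'1957-12-31'}}(\text{EMPLOYEE})\bigr)\bigr) \\[4pt]
\text{(4)}\quad & \pi_{\text{LNAME}}\bigl(
  (\sigma_{\text{PNAME}=\text{'Aquarius'}}(\text{PROJECT}) \bowtie_{\text{PNUMBER}=\text{PNO}} \text{WORKS\_ON})
  \bowtie_{\text{ESSN}=\text{SSN}} \sigma_{\text{BDATE}>\text{'1957-12-31'}}(\text{EMPLOYEE})\bigr) \\[4pt]
\text{(5)}\quad & \pi_{\text{LNAME}}\bigl(
  \pi_{\text{ESSN}}\bigl(\pi_{\text{PNUMBER}}(\sigma_{\text{PNAME}=\text{'Aquarius'}}(\text{PROJECT}))
  \bowtie_{\text{PNUMBER}=\text{PNO}} \pi_{\text{ESSN},\,\text{PNO}}(\text{WORKS\_ON})\bigr)
  \bowtie_{\text{ESSN}=\text{SSN}} \pi_{\text{SSN},\,\text{LNAME}}(\sigma_{\text{BDATE}>\text{'1957-12-31'}}(\text{EMPLOYEE}))\bigr)
\end{align*}
```

Step (3) starts from PROJECT rather than EMPLOYEE on the assumption that the selection on PNAME matches far fewer tuples than the selection on BDATE, so it is the more restrictive one to apply first.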
