
Distributed Database Systems

Autumn, 2007 Chapter 7

Overview of Query Processing



SQL: Non-Procedural Language of RDB


Tuple calculus
{ t | F(t) }
where:
t : tuple variable
F(t) : well-formed formula

Example
Get the No. and name of all managers

{ t(ENO, ENAME) | t ∈ EMP ∧ t(TITLE) = "MANAGER" }
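
The tuple calculus expression above reads almost directly as a comprehension. The following is a minimal sketch, assuming EMP is held in memory as a list of dictionaries; the sample rows are invented for illustration and do not come from the slides.

```python
# Hypothetical EMP relation; the rows are made up for illustration only.
EMP = [
    {"ENO": "E1", "ENAME": "Alice", "TITLE": "MANAGER"},
    {"ENO": "E2", "ENAME": "Bob", "TITLE": "ENGINEER"},
    {"ENO": "E3", "ENAME": "Carol", "TITLE": "MANAGER"},
]

# t ranges over EMP, the filter plays the role of F(t),
# and the tuple (ENO, ENAME) is the projection of t onto those attributes.
managers = [(t["ENO"], t["ENAME"]) for t in EMP if t["TITLE"] == "MANAGER"]

print(managers)
```

The comprehension states what is wanted, not how to scan EMP, which is the sense in which the calculus is non-procedural.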


SQL: Non-Procedural Language of RDB


Domain calculus
{ x1, x2, ..., xn | F(x1, x2, ..., xn) }
where:
xi : domain variables
F(x1, x2, ..., xn) : well-formed formula

Example

{ x, y | E(x, y, "manager") }
Variables are position sensitive!

SQL: Non-Procedural Language of RDB


SQL is a tuple calculus language:

SELECT ENO, ENAME
FROM EMP
WHERE TITLE = 'manager'

End users express queries in non-procedural languages.



Query Processor
The query processor transforms queries into procedural operations that access the data.


Query Processor
Distributed query processor has to deal with

query decomposition, and data localization


7.1 Query Processing Problems



A centralized query processor must:

transform the calculus query into algebra operations, and
choose the best execution plan

Example:
SELECT ENAME
FROM E, G
WHERE E.ENO = G.ENO AND RESP = 'manager'

7.1 Query Processing Problems


Relational Algebra 1: Π_ENAME(σ_{RESP="Manager" ∧ E.ENO=G.ENO}(E × G))

Relational Algebra 2: Π_ENAME(E ⋈_ENO σ_{RESP="Manager"}(G))

Execution plan 2 is better because it consumes fewer resources!
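
A rough sketch of why the second plan is cheaper: it compares the size of the intermediate result each plan builds before the final projection. The relations, their contents, and the function names plan1 and plan2 are assumptions made for this illustration, not data from the slides.

```python
from itertools import product

# Hypothetical data: E(ENO, ENAME) and G(ENO, RESP), eight tuples each.
E = [(f"E{i}", f"name{i}") for i in range(1, 9)]
G = [(f"E{i}", "Manager" if i % 4 == 0 else "Analyst") for i in range(1, 9)]

def plan1(E, G):
    """Cartesian product first, then selection, then projection on ENAME."""
    pairs = list(product(E, G))  # |E| * |G| intermediate tuples
    kept = [(e, g) for e, g in pairs if g[1] == "Manager" and e[0] == g[0]]
    return sorted(e[1] for e, _ in kept), len(pairs)

def plan2(E, G):
    """Selection on G first, then join with E on ENO, then projection."""
    managers = [g for g in G if g[1] == "Manager"]  # small intermediate result
    joined = [(e, g) for e in E for g in managers if e[0] == g[0]]
    return sorted(e[1] for e, _ in joined), len(managers)

names1, size1 = plan1(E, G)
names2, size2 = plan2(E, G)
print(names1 == names2, size1, size2)
```

Both plans return the same names, but the first materializes every pair of tuples before filtering, while the second only carries the tuples of G that satisfy the selection.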



7.1 Query Processing Problems


In a DDB, the query processor must also consider the communication cost and select the best site!
Same query as in the last example, but G and E are distributed.
Simple plan: transport all segments to the query site and execute there. This causes too much network traffic and is very costly.

7.1 Query Processing Problems


Distributed Query Example
Distribution of E and G


7.1 Query Processing Problems


Distributed Query Example
Query

Π_ENAME(E ⋈_ENO σ_{RESP="Manager"}(G))


7.1 Query Processing Problems


Distributed Query Example
Optimized Processing


7.2 Objectives of Query Processing



Two-fold objectives:

Transformation, and Optimization


7.2 Objectives of Query Processing


Costs to be considered for optimization:

CPU time,
I/O time, and
communication time

WAN: the last cost is dominant
LAN: all three carry roughly equal weight
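
One way to picture the three cost components is as a weighted sum. The sketch below is an illustration only: the Strategy fields, the two strategies, their operation counts, and the WAN/LAN weights are all hypothetical values chosen to show how the network type can change which strategy wins.

```python
from dataclasses import dataclass

@dataclass
class Strategy:
    name: str
    cpu_ops: int      # hypothetical count of CPU work units
    io_ops: int       # hypothetical count of disk accesses
    tuples_sent: int  # hypothetical count of tuples shipped between sites

def total_cost(s, w_cpu, w_io, w_comm):
    """Weighted sum of CPU, I/O, and communication cost."""
    return w_cpu * s.cpu_ops + w_io * s.io_ops + w_comm * s.tuples_sent

def cheapest(strategies, weights):
    """Return the strategy with the lowest total cost under the given weights."""
    return min(strategies, key=lambda s: total_cost(s, *weights))

# Two made-up strategies: ship everything, or do more local work and ship less.
ship_all = Strategy("ship_all", cpu_ops=1000, io_ops=1400, tuples_sent=1400)
ship_reduced = Strategy("ship_reduced", cpu_ops=2000, io_ops=2500, tuples_sent=40)

WAN_WEIGHTS = (1, 1, 100)  # assumed: communication dominates
LAN_WEIGHTS = (1, 1, 1)    # assumed: all three weigh the same

wan_choice = cheapest([ship_all, ship_reduced], WAN_WEIGHTS).name
lan_choice = cheapest([ship_all, ship_reduced], LAN_WEIGHTS).name
print(wan_choice, lan_choice)
```

With these assumed numbers the WAN weights favour the strategy that ships fewer tuples, while the LAN weights favour the one that does less local work.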

7.3 Complexity of Relational Algebra Operations


Measured by n (cardinality) and tuples are sorted on comparison attributes


Operation                                                  Complexity
σ, Π (without duplicate elimination)                       O(n)
Π (with duplicate elimination), GROUP                      O(n log n)
⋈, semi-join, division, set operators (∩, ∪, −)            O(n log n)
× (Cartesian product)                                      O(n²)
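
The gap between O(n) and O(n log n) for projection can be seen in a small sketch. It assumes rows are plain tuples and that duplicates are eliminated by sorting, which is only one possible implementation; the function names are hypothetical.

```python
def project_all(rows, cols):
    """Projection that keeps duplicates: a single pass, O(n)."""
    return [tuple(row[c] for c in cols) for row in rows]

def project_distinct(rows, cols):
    """Projection with duplicate elimination by sorting: O(n log n)."""
    projected = sorted(project_all(rows, cols))  # the sort dominates the cost
    result = []
    for t in projected:
        # After sorting, duplicates are adjacent, so one comparison suffices.
        if not result or result[-1] != t:
            result.append(t)
    return result

# Hypothetical rows of G(ENO, RESP).
rows = [("E1", "Manager"), ("E2", "Analyst"), ("E3", "Manager")]
print(project_all(rows, [1]))
print(project_distinct(rows, [1]))
```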

7.4 Characterization of Query Processor


7.4.1 Languages
For users:

calculus or algebra based languages.


For query processor:

map the input into internal form of algebra augmented with communication primitives.


7.4.2 Types of Optimization


Exhaustive search
Workable for small solution space

Heuristics
Perform σ and Π first, use semi-joins, etc.; suited to a large solution space
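
Exhaustive search can be sketched as enumerating every join order and keeping the cheapest. This is an illustration under strong assumptions: the relation J, the cardinalities, the uniform "1 in 100" join selectivity, and the cost measure (sum of intermediate result sizes) are all hypothetical.

```python
from itertools import permutations

# Hypothetical cardinalities; J is an invented third relation.
CARD = {"E": 400, "G": 1000, "J": 50}
SELECTIVITY_DIVISOR = 100  # assumed: each join keeps 1 in 100 candidate pairs

def plan_cost(order):
    """Sum of estimated intermediate result sizes for a left-deep join order."""
    size = CARD[order[0]]
    cost = 0
    for rel in order[1:]:
        size = size * CARD[rel] // SELECTIVITY_DIVISOR
        cost += size
    return cost

def exhaustive_best(relations):
    """Try all n! orders; feasible only for a small number of relations."""
    return min(permutations(relations), key=plan_cost)

best = exhaustive_best(["E", "G", "J"])
print(best, plan_cost(best))
```

Three relations give only 6 orders, but the count grows as n!, which is why heuristics take over for larger solution spaces.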


7.4.3 Optimization Timing


Static
Done at compile time using statistics; appropriate for exhaustive search, since the query is optimized once but executed many times.

Dynamic
Done at execution time; accurate, but repeated for every execution and therefore expensive.


7.4.4 Statistics
Facts about:
Cardinalities
Attribute value distributions
Sizes of relations, etc.

Provided to the query optimizer and periodically updated.


7.4.5 Decision Site


Query optimization may be done by:
A single site (centralized approach), or
All the sites involved (distributed approach), or
A hybrid: one site makes the major decisions in cooperation with other sites making local decisions


7.4.6 Exploitation of the Network Topology

WAN
Communication cost is dominant.

LAN
Communication cost is comparable to I/O cost. Broadcasting capability and special topologies such as star networks and satellite networks should be considered.


7.4.7 Exploitation of Replicated Fragments

Use replicas to minimize communication costs.


7.4.8 Use of Semi-joins

Reduce the size of operand relations to cut down communication costs when overhead is not significant.
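
The semi-join idea can be sketched as three steps between two sites. The site layout, the sample tuples, and the function name semi_join_reduce are assumptions for illustration; a real system would also weigh the cost of the extra message against the saving.

```python
def semi_join_reduce(e_site1, g_site2):
    """Join E (at site 1) with G (at site 2) on ENO, shipping as little as possible."""
    # Step 1: site 2 sends only the join-attribute values of G to site 1.
    g_enos = {g["ENO"] for g in g_site2}
    # Step 2: site 1 keeps only the E tuples that will find a join partner.
    e_reduced = [e for e in e_site1 if e["ENO"] in g_enos]
    # Step 3: the reduced E is shipped to site 2 and the join is completed there.
    joined = [{**e, **g} for e in e_reduced for g in g_site2 if e["ENO"] == g["ENO"]]
    return joined, len(g_enos), len(e_reduced)

# Hypothetical fragments held at two different sites.
E_SITE1 = [{"ENO": f"E{i}", "ENAME": f"name{i}"} for i in range(1, 6)]
G_SITE2 = [{"ENO": "E2", "RESP": "Manager"}, {"ENO": "E4", "RESP": "Analyst"}]

joined, values_sent, tuples_sent = semi_join_reduce(E_SITE1, G_SITE2)
print(len(joined), values_sent, tuples_sent)
```

Here two ENO values and two E tuples cross the network instead of all five E tuples; the saving only pays off when the extra round trip is cheap relative to the data avoided.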


7.5 Layers of Query Processing


Generic Layering Scheme for Distributed Query Processing


7.5.1 Query Decomposition


Decompose the calculus query into an algebra query using global conceptual schema information.
Step 1: calculus normalization
Step 2: semantic analysis to reject incorrect queries
Step 3: simplification to eliminate redundant components
Step 4: translation of the calculus query into an optimized algebra query

7.5.2 Data Localization


The distributed query is mapped into a fragment query, which is then simplified to produce a good one.


7.5.3 Global Query Optimization


Find an execution strategy close to optimal.
Find the best ordering of operations in the fragment query, including communication operations.
A cost function expressed in terms of time is required.


7.5.4 Local Query Optimization

Centralized system algorithms (to be discussed in chapter 9)


7.6 Conclusions

The query processor must be able to find a good execution plan for a calculus query, such that CPU time, I/O time, and communication time are minimized.
Method: layering of
decomposition,
localization,
global query optimization, and
local query optimization