Sunteți pe pagina 1din 33

Distributed Database Management Systems

Query Optimization
Dr. Eng. Haiyan HOUSROUM

Query Optimization
The number of relations in the query The number of operations to be performed The number of predicates applied The size of each relation in the query The order of operations to be performed, The existence of indexes The number of alternatives for performing each individual operation

Query Optimization
In a distributed system, there are other factors The fragmentation details for the relations The location of these fragments/tables in the system The speed of communication links connecting the sites in the system. The overhead associated with sending messages and the overhead associated with the local processing speed increase exponentially as the number of available alternatives increases. It is generally acceptable to merely try to find a good alternative execution plan for a given query, rather than trying to find the best alternative.

Query Optimization
Bank Database CUSTOMER (CID, CNAME, STREET, CCITY); BRANCH (BNAME, ASSETS, BCITY); ACCOUNT (A#, CID, BNAME, BAL); LOAN (L#, CID, BNAME, AMT); TRANSACTION (TID, CID, A#, Date, AMOUNT);

Computing Selection
No Index on R and Not Sorted (Sorted): Scan all disk blocks B+Tree index on R Simple predicates Complex predicates Hash Index on R

Computing Join
Nested-Loop Join Nested-Loop Join with Indexes Sort-Merge Join

Query Processing in CS.


Minimize the query response time. Maximize the parallelism in the system. Maximize the system throughput. Minimize the total resources used (amount of memory, disk space, cache, etc.)

Query Processing in CS.

Query Processing in CS.


We want to retrieve the name of all customers who have one or more accounts in branches in the city of Edina. Select c.Cname From Customer c, Branch b, Account a Where c.CID = a.CID AND a.Bname = b.Bname AND b.Bcity = Edina;

Query Processing in CS.


Query Optimization PJcname (Customer NJN Account)NJN (Account NJN (SLBcity = Edina(Branch))) PJcname (Customer NJN (Account NJN (SLBcity=Edina (Branch))))

Query Processing in CS.


Unary operator (Uop) is commutative: Uop1(Uop2(R)) Uop2(Uop1(R)) For example, SLBname = Main (SLAssets > 12000000 (Branch) SLAssets > 12000000 (SLBname = Main ((Branch))

Query Processing in CS.


Unary operator is idempotent: Uop((R)) Uop1(Uop2((R)) For example, SLBname = Main AND Assets > 12000000 (Branch) SLBname = Main (SLAssets > 12000000 (Branch)

Query Processing in CS.


Binary operator (Bop) is commutative except for set difference: R Bop1 S S Bop1 R For example, Customer NJN Account Account NJN Customer

Query Processing in CS.


Binary operator is associative: R Bop1 (S Bop2 T) (R Bop1 S) Bop2 T For example, Customer NJN (Account NJN Branch) (Customer NJN Account) NJN Branch

Query Processing in CS.


Unary operator is distributive with respect to some binary operations: Uop(R Bop S) (Uop(R)) Bop (Uop(S)) For example, SLsal > 50000 (PJCname, sal (Customer) UN PJEname, sal (Employee)) (SLsal > 50000 (PJCname, sal (Customer)) UN ((SLsal > 50000 (PJEname, sal (Employee)))

Query Processing in CS.


Unary operator can be factored with respect to some binary operation: (Uop(R)) Bop (Uop(S)) Uop(R Bop S) For example, SLsal > 50000 (PJCname, sal (Customer)) UN (SLsal > 50000 (PJEname, sal (Employee))) SLsal > 50000 (PJCname, sal (Customer)) UN (PJEname, sal (Employee))

Query Processing in CS.


Alt1: (SLBname = Main (Account)) NJN Branch Alt2: Branch NJN (SLBname = Main (Account)) Alt3: (SLBname = Main(Branch)) NJN Account Alt4: Account NJN (SLBname = Main(Branch)) Alt5: SLBname = Main (Account NJN Branch) Alt6: SLBname = Main (Branch NJN Account) Alt7: (SLBname = Main (Account)) NJN (SLBname = Main(Branch)) Alt8: (SLBname = Main(Branch))NJN(SLBname = Main(Account))

Query Processing in CS.

Query Processing in CS.


There are 500 customers in the bank. On average, each customer has two accounts. There are 100 branches in the bank. There are 10 branches in Edina city. Ten percent of customers have accounts in the branches in Edina. 1000 accounts in the bank, and 50 customers have accounts in the branches in Edina resulting in 100 accounts in that city. t: units of time to process each tuple of each relation in memory.

Query Processing in CS.

Query Processing in DS.

Query Processing in DS.

Query Processing in DS.


EMP relation is horizontally fragmented based on the value of the LOC attribute. Each employee works at one of three possible locations (LA, NY, or MPLS). We will consider a query that needs to retrieve the name of all employees who make more than $50,000.

Query Processing in DS.


Global Query: PJEname (SLsal > 50000 (EMP)) LAs Query: PJEname (SLsal > 50000 (LA_EMP)) Nys Query: PJEname (SLsal > 50000 (NY_EMP)) MPLSs Query: PJEname (SLsal > 50000 (MPLS_EMP))

Query Processing in DS.


PJEname (SLsal > 50000 (LA_EMP)) UN PJEname (SLsal > 50000 (NY_EMP)) UN PJEname (SLsal > 50000 (MPLS_EMP))

Query Processing in DS.


PJEname (SLsal > 50000 (LA_EMP)) UN ( PJEname (SLsal > 50000 (NY_EMP)) UN PJEname (SLsal > 50000 (MPLS_EMP)) )

Query Processing in DS.


SLsal > 50000 ( PJEname (LA_EMP) UN ( PJEname (NY_EMP) UN PJEname (MPLS_EMP) ) )

Query Processing in DS.


Three relations: A, B, C A JN B JN C Three-site distributed system. A is not fragmented, it is replicated, there are two copies. B is horizontally fragmented into B1, B2, and B3. C is vertically fragmented into C1 and C2. A JN B JN C A JN (B1 UN B2 UN B3) JN (C1 JN C2)

Query Processing in DS.


Operation Shipping Data Shipping
OODB

Hybrid Shipping

Query Processing in DS.


Distributed Query Solution Space Reduction
Apply Select and Project as Soon as Possible

EMP (EmpID, Name, LOC, SAL, DoB, Dept); Create MPLS_EMPS


AS SELECT * FROM EMP WHERE LOC = Minneapolis;

Create LA_EMPS
AS SELECT * FROM EMP WHERE LOC = LA;

Create NY_EMPS
AS SELECT * FROM EMP WHERE LOC = NY;

Select Name
From EMP Where LOC = LA and Sal > 30,000.

Query Processing in DS.

Query Processing in DS.

Query Processing in DS.


Distributed Query Solution Space Reduction Simplify Operations on Horizontal Fragments Perform Operations Where Most of the Data Is Simplify Join Operations Use Semi-join to Reduce Communication Cost

S-ar putea să vă placă și