
UNIVERSITY OF CALGARY

DEPARTMENT OF COMPUTER SCIENCE


WINTER 2016
CPSC 571: Design and Implementation of Database Systems
TA: Ayeshaa Parveen Abdul Waheed

Tutorial # 05 & 06

Objective
To get familiar with Cost-estimation of Selection and Join operations.

Exercises

Employee Table Information

rE   = 10000 records
bE   = 2000 blocks
bfrE = 5 records/block

Department Table Information

rD   = 125 records
bD   = 13 blocks
bfrD = 9 records/block

Index Information

Index    Index Type        Uniqueness  Level  Selection    Selectivity  Distinct    First-Level Index
Name                                   (x)    Cardinality  (sl = 1/d)   Values (d)  Blocks/Leaf Blocks (bI1)
                                              (s = r/d)
Salary   Clustering Index  NonUnique   3      20           0.002        500         -
Ssn      Secondary Index   Unique      4      1            0.0001       10000       -
Dno      Secondary Index   NonUnique   2      80           0.008        125         4
Sex      Secondary Index   NonUnique   1      5000         0.5          2           -
Dnumber  Primary Index     Unique      1      1            0.008        125         -
Mgrssn   Secondary Index   Unique      2      1            0.008        125         -
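
As a quick check on the index information, the Selection Cardinality and Selectivity values can be recomputed from the Distinct Values (d) and the record counts. The following is a minimal Python sketch; the function names and the dictionary layout are illustrative assumptions and not part of the tutorial.

```python
# Record counts taken from the table information above.
R_EMPLOYEE = 10000
R_DEPARTMENT = 125

# index name -> (distinct values d, records r in the indexed table)
INDEXES = {
    "Salary": (500, R_EMPLOYEE),
    "Ssn": (10000, R_EMPLOYEE),
    "Dno": (125, R_EMPLOYEE),
    "Sex": (2, R_EMPLOYEE),
    "Dnumber": (125, R_DEPARTMENT),
    "Mgrssn": (125, R_DEPARTMENT),
}


def selectivity(d):
    """sl = 1/d: fraction of records matching an equality condition."""
    return 1 / d


def selection_cardinality(r, d):
    """s = r/d: expected number of records matching an equality condition."""
    return r / d


for name, (d, r) in INDEXES.items():
    print(f"{name:8s} sl = {selectivity(d):<7g} s = {selection_cardinality(r, d):g}")
```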

(Q1)
Compare the costs of implementing the following Selection Operations using the relevant selection approaches.
OP1: σ SSN=1234567 (EMPLOYEE)
OP2: σ DNO>5 (EMPLOYEE)
OP3: σ DNO=5 (EMPLOYEE)
OP4: σ DNO=5 AND SALARY>30000 AND SEX=F (EMPLOYEE)

(a) OP1: σ SSN=1234567 (EMPLOYEE)

Using Linear Search:
Cost of Selection on Non-key attribute = b
Average Cost of Selection for Equality condition on Key attribute if record is found = b/2
Hence Average Cost of Selection on SSN (Key attribute) for Equality condition = bE/2 = 2000/2 = 1000
Using Secondary B+Tree Index:
Cost of Selection on a Key attribute with Equality condition = x + 1
Cost of Selection on Non-key attribute with Equality condition = x + s

Cost of Selection with Comparison condition = x + bI1/2 + r/2


Hence Average Cost of Selection on SSN (Key attribute) for Equality condition = xSSN + 1 = 4 + 1 = 5
(b) OP2: σ DNO>5 (EMPLOYEE)
Using Linear Search:
Cost of Selection on DNO (Non-key attribute) = bE = 2000
Using Secondary B+Tree Index:
Cost of Selection with Comparison condition = (xDNO + bI1DNO/2 + rE/2) = (2 + 4/2 + 10000/2) = 5004
(c) OP3: σ DNO=5 (EMPLOYEE)
Using Linear Search:
Cost of Selection on DNO (Non-key attribute) = bE = 2000
Using Secondary B+Tree Index:
Cost of Selection on DNO (Non-key attribute) with Equality condition = (xDNO + sDNO) = (2 + 80) = 82
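
The three comparisons above can be reproduced with a few small cost functions. This is a minimal sketch, assuming the statistics from the tables; the function names are hypothetical and simply restate the formulas used in (a) to (c).

```python
B_E = 2000   # blocks in EMPLOYEE
R_E = 10000  # records in EMPLOYEE


def linear_search(b, key_equality=False):
    """Full scan costs b blocks; an equality match on a key costs b/2 on average."""
    return b / 2 if key_equality else b


def secondary_index_equality(x, s=1):
    """x index levels plus one block access per matching record (s = 1 for a key)."""
    return x + s


def secondary_index_comparison(x, b_i1, r):
    """x levels, half of the first-level index blocks, and half of the records."""
    return x + b_i1 / 2 + r / 2


# (linear search cost, secondary index cost) for each operation
costs = {
    "OP1": (linear_search(B_E, key_equality=True), secondary_index_equality(4)),
    "OP2": (linear_search(B_E), secondary_index_comparison(2, 4, R_E)),
    "OP3": (linear_search(B_E), secondary_index_equality(2, s=80)),
}

for op, (scan, index) in costs.items():
    cheaper = "index" if index < scan else "linear search"
    print(f"{op}: linear search = {scan:g}, index = {index:g} -> {cheaper}")
```

Under these estimates the index is cheaper for OP1 and OP3, while for the range condition in OP2 the linear search (2000) beats the secondary index (5004).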

(d) OP4: σ DNO=5 AND SALARY>30000 AND SEX=F (EMPLOYEE)

For this conjunctive selection condition, we estimate the cost of using each of the three component conditions
to retrieve records, and compare these with the linear search approach.
If DNO=5 is used first, Cost of Selection on DNO (Non-key) with Equality condition using Secondary Index = (xDNO +
sDNO) = (2 + 80) = 82
If SALARY>30000 is used first, Cost of Selection on SALARY with Comparison condition using Ordering Index =
(xSALARY + bE/2) = (3 + 2000/2) = 1003
If SEX=F is used first, Cost of Selection on SEX with Equality condition using Secondary Index = (xSEX + sSEX) = (1 +
5000) = 5001

Since performing the selection with DNO=5 first gives a cost of 82, the lowest of the three, the optimizer will use
it and then linearly search the retrieved records for those satisfying the remaining two conditions.
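
The choice of access path for OP4 amounts to taking the minimum over the candidate costs. A small sketch follows, assuming the figures computed above plus a plain linear search of EMPLOYEE (bE = 2000) as the fallback; the dictionary keys are illustrative labels only.

```python
B_E = 2000  # blocks in EMPLOYEE

# Estimated cost of retrieving candidate records through each component condition.
candidates = {
    "DNO=5 via secondary index": 2 + 80,             # xDNO + sDNO
    "SALARY>30000 via ordering index": 3 + B_E // 2,  # xSALARY + bE/2
    "SEX=F via secondary index": 1 + 5000,           # xSEX + sSEX
    "linear search": B_E,
}

best = min(candidates, key=candidates.get)
print(f"cheapest access path: {best} (cost {candidates[best]})")
```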

(Q2)
Compare the costs of implementing the following Join Operations using the relevant join approaches, also changing
the Outer Loop relation.
OP1: EMPLOYEE ⋈ DNO=DNUMBER DEPARTMENT
OP2: DEPARTMENT ⋈ MGRSSN=SSN EMPLOYEE

Join Selectivity js = ratio of the size (number of tuples) of the join result to the size of the Cartesian product of the two files
js = |(R ⋈ R.A=S.B S)| / |(R X S)|
If (Both A and B are Keys) or (Both A and B are Non-Keys) then js = 1 / max( d(A,R), d(B,S) )
If (Either A or B is the Key) then js = 1 / d(Key,Relation)
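
The two join selectivity rules can be written as one small function. This is a sketch under the assumptions stated in the comments; the function names are hypothetical, and Mgrssn is treated as the non-key side only because the tutorial bases js for that join on SSN being the key of Employee.

```python
def join_selectivity(d_a, d_b, a_is_key, b_is_key):
    """Estimate js for R join S on R.A = S.B using the two rules above."""
    if a_is_key != b_is_key:
        # Exactly one side is a key: use that side's distinct-value count.
        return 1 / (d_a if a_is_key else d_b)
    # Both keys or both non-keys.
    return 1 / max(d_a, d_b)


def join_cardinality(js, r_r, r_s):
    """Expected number of result tuples, rounded to the nearest integer."""
    return round(js * r_r * r_s)


# EMPLOYEE.Dno (non-key, d = 125) joined with DEPARTMENT.Dnumber (key, d = 125)
js_op1 = join_selectivity(125, 125, a_is_key=False, b_is_key=True)
# DEPARTMENT.Mgrssn (d = 125) joined with EMPLOYEE.Ssn (key, d = 10000)
js_op2 = join_selectivity(125, 10000, a_is_key=False, b_is_key=True)

print(js_op1, join_cardinality(js_op1, 10000, 125))
print(js_op2, join_cardinality(js_op2, 125, 10000))
```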
Nested Loop Join
Using NLJ, Worst-case Read Cost = bR + (bR * bS), Best-case Read Cost = bR + bS
Using NLJ, Write Cost = ((js * rR * rS) / bfrRS)
Total Cost estimate = bR + (bR * bS) + ((js * rR * rS) / bfrRS)
Cost formula using the number of available memory blocks = bR + ( bR/(nB-2) * bS) + ((js * rR * rS) / bfrRS)
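
To see how the number of available memory blocks changes the nested loop estimate, the formula can be sketched as below. The ceiling on bR/(nB-2) is an assumption added here (the formula above is written without it), and the example values are the EMPLOYEE and DEPARTMENT sizes with a result of 2500 blocks for the join on DNO=DNUMBER.

```python
import math


def nlj_cost(b_outer, b_inner, result_blocks, n_b=3):
    """Nested loop join cost in block accesses.

    n_b is the number of available memory blocks: n_b - 2 hold the outer
    relation, one holds the inner relation and one buffers the result.
    The ceiling is an assumption so that a partial chunk of the outer
    relation still counts as a full pass over the inner relation.
    """
    passes = math.ceil(b_outer / (n_b - 2))
    return b_outer + passes * b_inner + result_blocks


B_E, B_D = 2000, 13   # blocks of EMPLOYEE and DEPARTMENT
RESULT_BLOCKS = 2500  # (js * rE * rD) / bfrED = ((1/125) * 10000 * 125) / 4

print(nlj_cost(B_E, B_D, RESULT_BLOCKS))          # EMPLOYEE outer, 3 memory blocks
print(nlj_cost(B_D, B_E, RESULT_BLOCKS))          # DEPARTMENT outer, 3 memory blocks
print(nlj_cost(B_D, B_E, RESULT_BLOCKS, n_b=15))  # DEPARTMENT fits in memory
```

With nB = 3 the function reduces to the worst-case formula bR + (bR * bS) plus the write cost; once the outer relation fits in nB - 2 blocks, the read cost drops to the best case bR + bS.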

Single Loop Join


Assuming Index on B attribute of S,
Using SLJ with Secondary Index, Cost = bR + (|R| * (xB + sB)) + ((js * rR * rS) / bfrRS)
Using SLJ with Clustering Index, Cost = bR + (|R| * (xB + (sB/bfrB))) + ((js * rR * rS) / bfrRS)
Using SLJ with Primary Index, Cost = bR + (|R| * (xB + 1)) + ((js * rR * rS) / bfrRS)
Using SLJ with Hash Index, Cost = bR + (|R| * h) + ((js * rR * rS) / bfrRS)
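
The single loop formulas differ only in the per-record probe cost on the indexed relation, which the sketch below makes explicit. The function names are hypothetical; the ceiling in the clustering case is an assumption; the two example calls use the index levels and selection cardinalities from the Index Information table for the join on DNO=DNUMBER.

```python
import math


def probe_secondary(x, s):
    """Secondary index: x levels plus one block access per matching record."""
    return x + s


def probe_clustering(x, s, bfr):
    """Clustering index: matching records sit in about s/bfr consecutive blocks."""
    return x + math.ceil(s / bfr)


def probe_primary(x):
    """Primary index: x levels plus the single data block."""
    return x + 1


def slj_cost(b_outer, r_outer, probe, result_blocks):
    """Read the outer relation once, probe the index once per outer record, write the result."""
    return b_outer + r_outer * probe + result_blocks


RESULT_BLOCKS = 2500  # ((1/125) * 10000 * 125) / 4

# EMPLOYEE as outer loop, primary index on Dnumber (x = 1)
print(slj_cost(2000, 10000, probe_primary(1), RESULT_BLOCKS))
# DEPARTMENT as outer loop, secondary index on Dno (x = 2, s = 80)
print(slj_cost(13, 125, probe_secondary(2, 80), RESULT_BLOCKS))
```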

(a) OP1: EMPLOYEE ⋈ DNO=DNUMBER DEPARTMENT

Join selectivity js = (1/dDnumber,Department) = 1/125, as Dnumber is the Key of Department. Also assume bfrED = 4
records/block.
Using NLJ with Employee as Outer Loop:
Cost estimate = bE + (bE * bD) + ((js * rE * rD) / bfrED) = 2000 + (2000*13) + (((1/125)*10000*125)/4) = 30500

Using NLJ with Department as Outer Loop:


Cost estimate = bD + (bE * bD) + ((js * rE * rD) / bfrED) = 13 + (2000*13) + (((1/125)*10000*125)/4) = 28,513
If Memory-size >= 15 blocks, then Cost = bD + bE + ((js * rE * rD) / bfrED) = 13 + 2000 + (((1/125)*10000*125)/4) =
4513
Using SLJ with Employee as Outer Loop: (Primary Index on Dnumber of Department)
Cost estimate = bE + (rE * (xDNUMBER + 1)) + ((js * rE * rD) / bfrED) = 2000 + (10000*2) + (((1/125)*10000*125)/4) =
24,500
Using SLJ with Department as Outer Loop: (Secondary Index on Dno of Employee)
Cost estimate = bD + (rD * (xDNO + sDNO)) + ((js * rE * rD) / bfrED) = 13 + (125 * (2+80)) + (((1/125)*10000*125)/4) =
12,763
Using Hash Join
Cost estimate = 3 * (bE + bD) + ((js * rE * rD) / bfrED) = 3 * (13+2000) + (((1/125)*10000*125)/4) = 8539
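(b) OP2: DEPARTMENT ⋈ MGRSSN=SSN EMPLOYEE

Join selectivity js = (1/dSsn,Employee) = 1/10000, as SSN is the Key of Employee. Also assume bfrED = 4 records/block.
The write cost ((1/10000)*10000*125)/4 = 31.25 is rounded up to 32 blocks in the estimates below.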
Using NLJ with Employee as Outer Loop:
Cost estimate = bE + (bE * bD) + ((js * rE * rD) / bfrED) = 2000 + (2000*13) + (((1/10000)*10000*125)/4) = 28,032
Using NLJ with Department as Outer Loop:
Cost estimate = bD + (bE * bD) + ((js * rE * rD) / bfrED) = 13 + (2000*13) + (((1/10000)*10000*125)/4) = 26,045
Using SLJ with Employee as Outer Loop: (Secondary Index on Mgrssn of Department)
Cost estimate = bE + (rE * (xMGRSSN + sMGRSSN)) + ((js * rE * rD) / bfrED) = 2000 + (10000*3) + (((1/10000)*10000*125)/4)
= 32,032
Using SLJ with Department as Outer Loop: (Secondary Index on Ssn of Employee)
Cost estimate = bD + (rD * (xSSN + sSSN)) + ((js * rE * rD) / bfrED) = 13 + (125*5) + (((1/10000)*10000*125)/4) = 670
Using Hash Join
Cost estimate = 3 * (bE + bD) + ((js * rE * rD) / bfrED) = 3 * (13+2000) + (((1/10000)*10000*125)/4) = 6071
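
The OP2 estimates can be recomputed and ranked in a few lines. This is a sketch assuming the statistics above; the join cardinality is computed as rE * rD / dSsn (equivalent to js * rE * rD) to stay in integer arithmetic, and the dictionary labels are illustrative.

```python
import math

B_E, R_E = 2000, 10000  # EMPLOYEE blocks and records
B_D, R_D = 13, 125      # DEPARTMENT blocks and records
BFR_ED = 4              # assumed blocking factor of the join result
D_SSN = 10000           # distinct Ssn values, so js = 1 / D_SSN

join_cardinality = R_E * R_D // D_SSN                    # 125 result tuples
result_blocks = math.ceil(join_cardinality / BFR_ED)     # 31.25 rounded up

costs = {
    "NLJ, EMPLOYEE outer": B_E + B_E * B_D + result_blocks,
    "NLJ, DEPARTMENT outer": B_D + B_E * B_D + result_blocks,
    "SLJ, EMPLOYEE outer (index on Mgrssn)": B_E + R_E * (2 + 1) + result_blocks,
    "SLJ, DEPARTMENT outer (index on Ssn)": B_D + R_D * (4 + 1) + result_blocks,
    "Hash join": 3 * (B_E + B_D) + result_blocks,
}

# List the strategies from cheapest to most expensive.
for name, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{cost:>6}  {name}")
```

Under these estimates the single loop join with DEPARTMENT as the outer relation is by far the cheapest, since only 125 index probes on Ssn are needed.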

SOME FORMULAS
Number of tuples in relation R = rR
Number of blocks in relation R = bR
Blocking factor of relation R, bfrR = rR/bR
Number of distinct values of attribute A of relation R = dA,R
Selectivity of attribute A of relation R, slA,R = (1/dA,R) ----> ratio of tuples satisfying an equality condition on A
Selection Cardinality of attribute A of relation R, sA,R = (slA,R * rR) ----> number of tuples of R satisfying an equality
condition on A
Join Selectivity of (R ⋈ R.A=S.B S), jsRS = 1 / Max(dA,R, dB,S) If (Both A and B are Keys) or (Both A and B are Non-Keys)
Join Selectivity of (R ⋈ R.A=S.B S), jsRS = 1 / dKey,Relation If (Either A or B is the Key)
Join Cardinality of (R ⋈ R.A=S.B S) = (jsRS * rR * rS) ----> number of tuples of (R X S) satisfying join condition R.A=S.B

Write cost of resulting relation RES (from Selection/Join) i.e. number of blocks to be written to disk = rRES/bfrRES
rRES of relation RES resulting from a selection operation for equality condition on Key attribute A = 1
rRES of relation RES resulting from a selection operation for equality condition on Non-Key attribute A = sA,R
rRES of relation RES resulting from a selection operation for comparison condition = scondition,R
rRES of relation RES resulting from a join operation = (jsRS * rR * rS) or (slA,R * rR * slB,S * rS) or (sA,R * sB,S)
Blocking factor bfrRES for result of selection operation (σc R) = Blocking factor of its parent relation i.e. bfrR
Blocking factor bfrRES for result of join operation (R ⋈ S) = 1/(1/bfrR + 1/bfrS)
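
The write-cost formulas can be tried out on the EMPLOYEE and DEPARTMENT blocking factors. This is only an illustrative sketch with hypothetical function names; note that the worked answers in this tutorial assume bfrED = 4 rather than the value the formula produces.

```python
import math


def join_result_bfr(bfr_r, bfr_s):
    """Blocking factor of a join result: 1 / (1/bfrR + 1/bfrS)."""
    return 1 / (1 / bfr_r + 1 / bfr_s)


def write_cost(r_res, bfr_res):
    """Blocks written for a result of r_res records, rounded up to whole blocks."""
    return math.ceil(r_res / bfr_res)


bfr_ed = join_result_bfr(5, 9)  # bfrE = 5, bfrD = 9
print(f"bfrED from the formula = {bfr_ed:.2f}")
print(f"blocks for 10000 result tuples at bfrED = 4: {write_cost(10000, 4)}")
```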
SUMMARY OF COSTS OF SELECTION APPROACHES

Using Linear Search = bR


Using Linear Search on a Key attribute with Equality condition = bR/2
Using Binary Search on attribute A (where data file ordered by attribute A) = log2 bR + sA/bfrR - 1
Using Binary Search on Key attribute A with Equality condition = log2 bR
Using Primary Index on attribute A with Equality condition = (xA + 1)
Using Clustering Index on attribute A with Equality condition = (xA + sA/bfrR )
Using Ordering Index on attribute A (Primary/Clustering) with Comparison condition = (xA + bR/2)
Using Secondary Index on a Key attribute A with Equality condition = (xA + 1)
Using Secondary Index on a Non-Key attribute A with Equality condition = (xA + sA) ---> For s, use Min(sA, bR)
Using Secondary Index on attribute A with Comparison condition = (xA + bI1A/2 + rR/2)
Using Hash Key on a Key attribute for Equality condition = 1 which is the best case scenario
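
A few of the summary formulas that are not exercised in Q1 can be sketched as follows. The function names are hypothetical, the ceilings are an assumption so that costs are whole block accesses, and the example values (bE = 2000, bfrE = 5, and the Salary index with x = 3, s = 20) are used for illustration only and are not part of the tutorial's worked answers.

```python
import math


def binary_search_cost(b, s=1, bfr=1):
    """log2(b) + s/bfr - 1 on an ordered file; reduces to log2(b) for a key (s = 1)."""
    return math.ceil(math.log2(b)) + math.ceil(s / bfr) - 1


def primary_index_cost(x):
    """Equality on the ordering key through a primary index: x + 1."""
    return x + 1


def clustering_index_cost(x, s, bfr):
    """Equality on a non-key ordering attribute: x + s/bfr."""
    return x + math.ceil(s / bfr)


def ordering_index_comparison_cost(x, b):
    """Comparison condition through a primary or clustering index: x + b/2."""
    return x + b // 2


print(binary_search_cost(2000))               # hypothetical: file ordered on a key
print(binary_search_cost(2000, s=20, bfr=5))  # equality on Salary, file ordered by Salary
print(clustering_index_cost(3, 20, 5))        # equality on Salary via clustering index
print(ordering_index_comparison_cost(3, 2000))
```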

SUMMARY OF COSTS OF JOIN APPROACHES
Nested Loop Join
Using NLJ, Worst-case Read Cost = bR + (bR * bS), Best-case Read Cost = bR + bS
Using NLJ, Write Cost = ((js * rR * rS) / bfrRS)
Total Cost estimate = bR + (bR * bS) + ((js * rR * rS) / bfrRS)
Cost formula using the number of available memory blocks = bR + ( bR/(nB-2) * bS) + ((js * rR * rS) / bfrRS)

Single Loop Join


Assuming Index on B attribute of S,
Using SLJ with Secondary Index, Cost = bR + (|R| * (xB + sB)) + ((js * rR * rS) / bfrRS)
Using SLJ with Clustering Index, Cost = bR + (|R| * (xB + (sB/bfrB))) + ((js * rR * rS) / bfrRS)
Using SLJ with Primary Index, Cost = bR + (|R| * (xB + 1)) + ((js * rR * rS) / bfrRS)
Using SLJ with Hash Index, Cost = bR + (|R| * h) + ((js * rR * rS) / bfrRS)
Sort-Merge Join
Assuming both R and S are ordered files on the join attributes,
Using SMJ, cost = bR + bS + ((js * rR * rS) / bfrRS)
Hash Join
Using HJ, cost = 3 * (bR + bS) + ((js * rR * rS) / bfrRS)
Best case Read cost when all buckets of one table fits into memory = bR + bS
For a Comparison Condition in a Join (like R ⋈ R.SALARY > S.SALARY S), Hash and Sort-Merge Approaches are not
applicable.
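
The sort-merge and hash join formulas, together with the restriction on comparison conditions, can be sketched as below. The names are hypothetical, and the sort-merge figure is a what-if only: it assumes EMPLOYEE and DEPARTMENT were both ordered on the join attributes, which the tutorial does not state.

```python
def sort_merge_cost(b_r, b_s, result_blocks):
    """Both files already ordered on the join attributes: one pass over each."""
    return b_r + b_s + result_blocks


def hash_join_cost(b_r, b_s, result_blocks):
    """Partitioning hash join: each block is read, written and read again."""
    return 3 * (b_r + b_s) + result_blocks


def is_applicable(method, is_equijoin):
    """Hash and sort-merge joins are ruled out for comparison conditions."""
    return is_equijoin or method not in {"sort-merge", "hash"}


RESULT_BLOCKS = 2500  # join on DNO=DNUMBER: ((1/125) * 10000 * 125) / 4

print(sort_merge_cost(2000, 13, RESULT_BLOCKS))  # what-if: both files sorted
print(hash_join_cost(2000, 13, RESULT_BLOCKS))
print(is_applicable("hash", is_equijoin=False))
```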

(Q3)
L(A,B,C,D)
R(C,D,E,F,G)
NB = 8
rL = 20,000, bL = 80
rR = 50,000, bR = 25
4 level Secondary Index available on C in R.
C is uniformly distributed in R with distinct values dC,R = 10
Compare the Cost of implementing following query for two cases (i) Benefiting from Index on C, (ii) Neglecting
the Index on C.
L ⋈ L.C=R.C (σ C=5 R)
(Q4)
Farmer(sin,name,age,sex,vname)
Village(name,area,population,province)
Kids(sin,f_sin,m_sin,s_name)
School(sname,vname,no_classes)
Main memory, NB = 6 blocks
Size of each block = 4000 bytes
Farmer: Number of records rF = 250; Each record size RF = 50 bytes
Village: Number of records rV = 20; Each record size RV = 75 bytes
Kids: Number of records rK = 3000; Each record size RK = 25 bytes
School: Number of records rS = 2500; Each record size RS = 50 bytes
1. Write the following query in SQL
2. Find the optimized Relational Algebra expression using Heuristic approach
3. Estimate the cost of this query
Query: Find the SIN of kids attending a school not located in their village (assume kids are living with their mother).
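
A natural first step for the cost estimate in Q4 is to turn the record counts and record sizes into blocking factors and block counts. The sketch below assumes unspanned records (a record never crosses a block boundary), which the exercise does not state; the function names are illustrative.

```python
import math

BLOCK_SIZE = 4000  # bytes per block

# relation -> (number of records, record size in bytes)
RELATIONS = {
    "Farmer": (250, 50),
    "Village": (20, 75),
    "Kids": (3000, 25),
    "School": (2500, 50),
}


def blocking_factor(block_size, record_size):
    """Whole records per block, assuming unspanned records."""
    return block_size // record_size


def blocks_needed(records, bfr):
    """Blocks required to store the relation, rounded up."""
    return math.ceil(records / bfr)


for name, (records, record_size) in RELATIONS.items():
    bfr = blocking_factor(BLOCK_SIZE, record_size)
    print(f"{name:8s} bfr = {bfr:3d} records/block, b = {blocks_needed(records, bfr)} blocks")
```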
