Tutorial # 05 & 06
Objective
To become familiar with cost estimation for selection and join operations.
Exercises
UNIVERSITY OF CALGARY
DEPARTMENT OF COMPUTER SCIENCE
WINTER 2016
CPSC 571: Design and Implementation of Database Systems
TA: Ayeshaa Parveen Abdul Waheed
Tutorial # 05 & 06
Index Information
Columns: x = index level; s = selection cardinality (s = r/d); sl = selectivity (sl = 1/d); d = distinct values; bI1 = first-level index blocks / leaf blocks.

Index Name  Index Type        Uniqueness  x  s     sl      d      bI1
Salary      Clustering Index  NonUnique   3  20    0.002   500
Ssn         Secondary Index   Unique      4  1     0.0001  10000
Dno         Secondary Index   NonUnique   2  80    0.008   125    4
Sex         Secondary Index   NonUnique   1  5000  0.5     2
Dnumber     Primary Index     Unique      1  1     0.008   125
Mgrssn      Secondary Index   Unique      2  1     0.008   125    -
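The selectivity and selection-cardinality columns follow directly from the number of distinct values. The sketch below recomputes them, assuming a uniform distribution of values and the record counts used in the join estimates of this tutorial (10,000 EMPLOYEE records, 125 DEPARTMENT records). The function names and the assignment of each index to a relation are illustrative assumptions.

```python
def selectivity(d):
    """Fraction of records matching an equality condition: sl = 1/d."""
    return 1 / d


def selection_cardinality(r, d):
    """Expected number of records matching an equality condition: s = r/d."""
    return r / d


# (index name, records r in the indexed relation, distinct values d)
# Assumption: the first four indexes are on EMPLOYEE, the last two on DEPARTMENT.
indexes = [
    ("Salary", 10000, 500),
    ("Ssn", 10000, 10000),
    ("Dno", 10000, 125),
    ("Sex", 10000, 2),
    ("Dnumber", 125, 125),
    ("Mgrssn", 125, 125),
]

for name, r, d in indexes:
    print(f"{name:8} sl = {selectivity(d):<7g} s = {selection_cardinality(r, d):g}")
```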
(Q1)
Compare the costs of implementing the following selection operations using the relevant selection approaches.
OP1: σ_{SSN=1234567}(EMPLOYEE)
OP2: σ_{DNO>5}(EMPLOYEE)
OP3: σ_{DNO=5}(EMPLOYEE)
OP4: σ_{DNO=5 AND SALARY>30000 AND SEX=F}(EMPLOYEE)
For OP4, performing the selection on DNO=5 first gives a cost of 82, the lowest among the available access paths. The optimizer therefore uses it, and then applies a linear search over the retrieved records to keep those that satisfy the remaining two conditions (SALARY>30000 and SEX=F).
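The choice of access path for a conjunctive selection can be sketched as a minimum over the candidate costs. Only the value 82 for DNO=5 is stated in this tutorial; the other figures below are assumptions based on common textbook-style estimates (secondary index on a non-key equality: x + s; clustering index on a range condition: x + b/2; linear search: b) and on bE = 2000 blocks, the EMPLOYEE size used in the join estimates.

```python
b_E = 2000  # EMPLOYEE blocks (value used in the join estimates of this tutorial)


def secondary_index_equality(x, s):
    """Non-key equality via a secondary index: x index levels plus s record accesses."""
    return x + s


def clustering_index_range(x, b):
    """Range condition via a clustering index, assuming half the file qualifies."""
    return x + b / 2


# Candidate access paths for OP4; index levels and cardinalities come from the index table.
candidates = {
    "DNO=5 via secondary index": secondary_index_equality(x=2, s=80),
    "SALARY>30000 via clustering index": clustering_index_range(x=3, b=b_E),
    "SEX=F via secondary index": secondary_index_equality(x=1, s=5000),
    "linear search": b_E,
}

best = min(candidates, key=candidates.get)
print(best, candidates[best])
```

Under these assumptions the index on Dno is the cheapest path, which agrees with the cost of 82 quoted above.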
(Q2)
Compare the costs of implementing the following join operations using the relevant join approaches, also changing the outer-loop relation.
OP1: EMPLOYEE ⋈_{DNO=DNUMBER} DEPARTMENT
OP2: DEPARTMENT ⋈_{MGRSSN=SSN} EMPLOYEE
Join selectivity js = ratio of the size (number of tuples) of the join result to the size of the Cartesian product of the two files
js = |R ⋈_{R.A=S.B} S| / |R × S|
If (both A and B are keys) or (both A and B are non-keys), then js = 1 / max(d(A,R), d(B,S))
If (either A or B is the key), then js = 1 / d(Key, Relation)
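The two rules for join selectivity can be written as one small function. This is a sketch only; the function name and the boolean flags are illustrative, and `Fraction` is used so that the results stay exact.

```python
from fractions import Fraction


def join_selectivity(d_a, d_b, a_is_key, b_is_key):
    """Estimate js for R.A = S.B from the distinct-value counts d(A,R) and d(B,S)."""
    if a_is_key != b_is_key:
        # Exactly one side is a key: use the distinct-value count of the key.
        return Fraction(1, d_a if a_is_key else d_b)
    # Both keys or both non-keys: use the larger distinct-value count.
    return Fraction(1, max(d_a, d_b))


# OP1: EMPLOYEE.DNO (non-key, d = 125) = DEPARTMENT.DNUMBER (key, d = 125)
print(join_selectivity(125, 125, a_is_key=False, b_is_key=True))    # 1/125

# OP2: DEPARTMENT.MGRSSN (d = 125) = EMPLOYEE.SSN (key, d = 10000)
print(join_selectivity(125, 10000, a_is_key=False, b_is_key=True))  # 1/10000
```

Treating Mgrssn as a key as well (it is listed as unique) gives the same value for OP2, since max(125, 10000) = 10000.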
Nested Loop Join
Using NLJ, Worst-case Read Cost = bR + (bR * bS), Best-case Read Cost = bR + bS
Using NLJ, Write Cost = ((js * rR * rS) / bfrRS)
Total Cost estimate = bR + (bR * bS) + ((js * rR * rS) / bfrRS)
Cost formula using the number of available memory blocks nB = bR + (⌈bR/(nB-2)⌉ * bS) + ((js * rR * rS) / bfrRS)
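The nested-loop formulas can be collected into one function. In this sketch the parameter names are illustrative, fractional block counts are rounded up, and the buffer size nB = 7 in the second call is a hypothetical value chosen only to show the effect of extra memory blocks.

```python
import math
from fractions import Fraction


def nlj_cost(b_outer, b_inner, js, r_outer, r_inner, bfr_result, n_b=None):
    """Nested-loop join cost: read outer file, read inner file per pass, write result."""
    if n_b is None:
        passes = b_outer                          # worst case: one pass per outer block
    else:
        passes = math.ceil(b_outer / (n_b - 2))   # nB-2 outer blocks held per pass
    read_cost = b_outer + passes * b_inner
    write_cost = math.ceil(js * r_outer * r_inner / bfr_result)
    return read_cost + write_cost


# Worst case with the EMPLOYEE/DEPARTMENT figures used in this tutorial.
print(nlj_cost(2000, 13, Fraction(1, 125), 10000, 125, 4))          # 30500

# Same join with a hypothetical buffer of nB = 7 blocks.
print(nlj_cost(2000, 13, Fraction(1, 125), 10000, 125, 4, n_b=7))   # 9700
```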
For OP1, join selectivity js = 1/d(Dnumber, DEPARTMENT) = 1/125, as Dnumber is the key of DEPARTMENT; also assume bfrED = 4 records/block.
Using NLJ with Employee as Outer Loop:
Cost estimate = bE + (bE * bD) + ((js * rE * rD) / bfrED) = 2000 + (2000*13) + (((1/125)*10000*125)/4) = 2000 + 26,000 + 2,500 = 30,500
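Since the question also asks for the outer-loop relation to be changed, the same worst-case formula can be evaluated with DEPARTMENT as the outer relation. The DEPARTMENT-outer figure is not given in this tutorial; it is computed here from the same inputs as a sketch, and the function name is illustrative.

```python
import math
from fractions import Fraction

b_E, r_E = 2000, 10000   # EMPLOYEE blocks and records
b_D, r_D = 13, 125       # DEPARTMENT blocks and records
js = Fraction(1, 125)    # Dnumber is the key of DEPARTMENT
bfr_ED = 4               # assumed blocking factor of the join result

write_cost = math.ceil(js * r_E * r_D / bfr_ED)   # 10000 result records / 4 = 2500 blocks


def nlj_worst_case(b_outer, b_inner):
    """Read the outer file once and the inner file once per outer block, then write."""
    return b_outer + b_outer * b_inner + write_cost


print("Employee outer:  ", nlj_worst_case(b_E, b_D))   # 30500
print("Department outer:", nlj_worst_case(b_D, b_E))   # 28513
```

The product bE * bD is the same in both cases, so the difference comes only from the single read of the outer file.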
For OP2, join selectivity js = 1/d(Ssn, EMPLOYEE) = 1/10000, as SSN is the key of EMPLOYEE; also assume bfrED = 4 records/block. The write cost (js * rE * rD)/bfrED = 125/4 = 31.25 is rounded up to 32 blocks.
Using NLJ with Employee as Outer Loop:
Cost estimate = bE + (bE * bD) + ((js * rE * rD) / bfrED) = 2000 + (2000*13) + (((1/10000)*10000*125)/4) = 28,032
Using NLJ with Department as Outer Loop:
Cost estimate = bD + (bD * bE) + ((js * rE * rD) / bfrED) = 13 + (13*2000) + (((1/10000)*10000*125)/4) = 26,045
Using SLJ (single-loop join) with Employee as Outer Loop (secondary index on Mgrssn of Department):
Cost estimate = bE + (rE * (xMGRSSN + sMGRSSN)) + ((js * rE * rD) / bfrED) = 2000 + (10000*(2+1)) + (((1/10000)*10000*125)/4) = 32,032
Using SLJ with Department as Outer Loop (secondary index on Ssn of Employee):
Cost estimate = bD + (rD * (xSSN + sSSN)) + ((js * rE * rD) / bfrED) = 13 + (125*(4+1)) + (((1/10000)*10000*125)/4) = 670
Using Hash Join:
Cost estimate = 3 * (bE + bD) + ((js * rE * rD) / bfrED) = 3 * (2000+13) + (((1/10000)*10000*125)/4) = 6,071
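The five OP2 estimates above can be reproduced and ranked with a few lines of Python. This is a sketch with illustrative function names; the inputs are the ones used in the estimates, and the write cost is rounded up to whole blocks.

```python
import math
from fractions import Fraction

b_E, r_E = 2000, 10000    # EMPLOYEE blocks and records
b_D, r_D = 13, 125        # DEPARTMENT blocks and records
js = Fraction(1, 10000)   # Ssn is the key of EMPLOYEE
bfr_ED = 4                # assumed blocking factor of the join result

write = math.ceil(js * r_E * r_D / bfr_ED)   # 31.25 rounded up to 32 blocks


def nlj(b_outer, b_inner):
    """Nested-loop join, worst case."""
    return b_outer + b_outer * b_inner + write


def slj(b_outer, r_outer, x, s):
    """Single-loop join: probe an index (x levels, s matches) once per outer record."""
    return b_outer + r_outer * (x + s) + write


def hash_join(b_r, b_s):
    """Hash join estimate: three passes over both files, plus the write."""
    return 3 * (b_r + b_s) + write


costs = {
    "NLJ, Employee outer": nlj(b_E, b_D),
    "NLJ, Department outer": nlj(b_D, b_E),
    "SLJ, Employee outer (index on Mgrssn)": slj(b_E, r_E, x=2, s=1),
    "SLJ, Department outer (index on Ssn)": slj(b_D, r_D, x=4, s=1),
    "Hash join": hash_join(b_E, b_D),
}

for plan, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{cost:>6}  {plan}")
```

Sorting by cost puts the single-loop join with DEPARTMENT as the outer relation first (670), because only 125 index probes are needed.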
SOME FORMULAS
Number of tuples in relation R = rR
Number of blocks in relation R = bR
Blocking factor of relation R, bfrR = rR/bR
Number of distinct values of attribute A of relation R = d(A,R)
Selectivity of attribute A of relation R, sl(A,R) = 1/d(A,R) ----> ratio of tuples satisfying an equality condition on A
Selection cardinality of attribute A of relation R, s(A,R) = sl(A,R) * rR ----> number of tuples of R satisfying an equality condition on A
Join selectivity of (R ⋈_{R.A=S.B} S), jsRS = 1/max(d(A,R), d(B,S)) if (both A and B are keys) or (both A and B are non-keys)
Join selectivity of (R ⋈_{R.A=S.B} S), jsRS = 1/d(Key, Relation) if (either A or B is the key)
Join cardinality of (R ⋈_{R.A=S.B} S) = jsRS * rR * rS ----> number of tuples of (R × S) satisfying join condition R.A=S.B
Write cost of resulting relation RES (from selection/join), i.e. number of blocks to be written to disk = rRES/bfrRES
rRES of relation RES resulting from a selection operation for equality condition on key attribute A = 1
rRES of relation RES resulting from a selection operation for equality condition on non-key attribute A = s(A,R)
rRES of relation RES resulting from a selection operation for comparison condition = s(condition,R)
rRES of relation RES resulting from a join operation = jsRS * rR * rS. Per join value, the number of joined tuples is s(A,R) * s(B,S) = (sl(A,R) * rR) * (sl(B,S) * rS); this is not the total join size.
Blocking factor bfrRES for result of selection operation (σ_c(R)) = blocking factor of its parent relation, i.e. bfrR
Blocking factor bfrRES for result of join operation (R ⋈ S) = 1/(1/bfrR + 1/bfrS)
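The blocking factor of a join result follows from record sizes: a joined record is roughly as long as one R record plus one S record, and the record size is proportional to 1/bfr. A short sketch, with illustrative function names and hypothetical input values (bfrR = 5, bfrS = 20):

```python
import math


def join_result_bfr(bfr_r, bfr_s):
    """bfrRES = 1/(1/bfrR + 1/bfrS): record lengths add, so reciprocals of bfr add."""
    return 1 / (1 / bfr_r + 1 / bfr_s)


def write_cost(r_res, bfr_res):
    """Blocks written to disk for r_res result records, rounded up to whole blocks."""
    return math.ceil(r_res / bfr_res)


bfr_res = join_result_bfr(5, 20)    # hypothetical blocking factors of R and S
print(bfr_res)                      # about 4 records per block
print(write_cost(10000, 4))         # 10000 result records at 4 per block: 2500 blocks
```

In practice a fractional blocking factor would be rounded down, since a block holds a whole number of records.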
SUMMARY OF COSTS OF SELECTION APPROACHES
SUMMARY OF COSTS OF JOIN APPROACHES
Nested Loop Join
Using NLJ, Worst-case Read Cost = bR + (bR * bS), Best-case Read Cost = bR + bS
Using NLJ, Write Cost = ((js * rR * rS) / bfrRS)
Total Cost estimate = bR + (bR * bS) + ((js * rR * rS) / bfrRS)
Cost formula using the number of available memory blocks nB = bR + (⌈bR/(nB-2)⌉ * bS) + ((js * rR * rS) / bfrRS)
(Q3)
L(A,B,C,D)
R(C,D,E,F,G)
NB = 8
rL = 20,000, bL = 80
rR = 50,000, bR = 25
A 4-level secondary index is available on C in R.
C is uniformly distributed in R, with d(C,R) = 10 distinct values.
Compare the cost of implementing the following query for two cases: (i) benefiting from the index on C, (ii) neglecting the index on C.
L ⋈_{L.C=R.C} (σ_{C=5}(R))
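As a starting point for Q3, the two cases can be compared for the selection step alone. This sketch is not a full solution: it leaves out the join and write costs, and it assumes the non-key secondary-index estimate x + s (the same form that gives 2 + 80 = 82 for DNO=5 in Q1) and a linear scan cost of bR when the index is ignored.

```python
r_R, b_R = 50000, 25   # records and blocks of R, as given
d_C = 10               # distinct values of C in R
x_C = 4                # levels of the secondary index on C

s_C = r_R // d_C       # records with C = 5 under a uniform distribution

cost_with_index = x_C + s_C   # index levels plus one access per matching record
cost_without_index = b_R      # linear scan of all blocks of R

print("matching records:", s_C)
print("selection via index:", cost_with_index)
print("selection via linear scan:", cost_without_index)
```

Under these assumptions the index does not help here: R is packed into few blocks, so scanning the file is far cheaper than fetching each matching record through the secondary index.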
(Q4)
Farmer(sin,name,age,sex,vname)
Village(name,area,population,province)
Kids(sin,f_sin,m_sin,s_name)
School(sname,vname,no_classes)
Main memory, NB = 6 blocks
Size of each block = 4000 bytes
Farmer: Number of records rF = 250; Each record size RF = 50 bytes
Village: Number of records rV = 20; Each record size RV = 75 bytes
Kids: Number of records rK = 3000; Each record size RK = 25 bytes
School: Number of records rS = 2500; Each record size RS = 50 bytes
1. Write the following query in SQL
2. Find the optimized relational algebra expression using the heuristic approach
3. Estimate the cost of this query
Query: Find the SIN of kids attending a school not located in their village (assume kids are living with their mother).
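A first step toward the cost estimate in Q4 is to turn record sizes into blocking factors and block counts. The sketch below assumes unspanned records and no per-block overhead, so bfr = floor(block size / record size) and b = ceil(r / bfr); the dictionary layout and names are illustrative.

```python
import math

BLOCK_SIZE = 4000  # bytes per block, as given

# relation name: (number of records, record size in bytes), as given
relations = {
    "Farmer": (250, 50),
    "Village": (20, 75),
    "Kids": (3000, 25),
    "School": (2500, 50),
}

stats = {}
for name, (records, record_size) in relations.items():
    bfr = BLOCK_SIZE // record_size      # whole records that fit in one block
    blocks = math.ceil(records / bfr)    # blocks needed to store the relation
    stats[name] = (bfr, blocks)
    print(f"{name:8} bfr = {bfr:3}  blocks = {blocks}")
```

These block counts are the bR and bS values that the join cost formulas take as input, together with NB = 6 memory blocks.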