
Implementation of Database Exercise 2

Tanmaya Mahapatra, Matriculation Number: 340959, tanmaya.mahapatra@rwth-aachen.de
Bharath Rangaraj, Matriculation Number: 340909, bharath.rangaraj@rwth-aachen.de
Manasi Jayapal, Matriculation Number: 340892, manasi.jayapal@rwth-aachen.de

November 10, 2013

1 Exercise 2.1 [Query Languages]:

1.1 What does relational completeness mean? Show that SQL is relationally complete by enumerating SQL constructs corresponding to selection, projection, Cartesian product, union, and difference.

1.1.1 Relational Completeness

A database query language L is relationally complete if it is at least as expressive as relational algebra, i.e., every relational algebra expression E has an equivalent expression F in L. Relational completeness provides a benchmark for the expressive power of a database query language: it is a threshold that every commercial database query language should meet or exceed.

The SELECT DISTINCT ... FROM ... WHERE ... construct of SQL makes it possible to express Cartesian product (several relations in the FROM clause), projection (the attribute list after SELECT DISTINCT), and selection (the WHERE condition). In addition to this, SQL has explicit constructs for union and difference:

1. The union $A \cup B$ of two relations A and B is expressed by (SELECT * FROM A) UNION (SELECT * FROM B)
2. The difference $A - B$ of two relations A and B is expressed by (SELECT * FROM A) EXCEPT (SELECT * FROM B)
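The five constructs can be tried out on toy data. The following is a minimal sketch using Python's built-in sqlite3 module; the tables A and B, their columns x and y, and the function name are illustrative assumptions, not part of the exercise. SQLite does not accept parentheses around the operands of UNION and EXCEPT, so they are omitted here.

```python
import sqlite3


def demo_relational_operators():
    """Run the five basic relational-algebra operators as SQL on toy tables."""
    con = sqlite3.connect(":memory:")
    # Hypothetical toy relations A(x, y) and B(x, y).
    con.executescript("""
        CREATE TABLE A (x INTEGER, y INTEGER);
        CREATE TABLE B (x INTEGER, y INTEGER);
        INSERT INTO A VALUES (1, 10), (2, 20);
        INSERT INTO B VALUES (2, 20), (3, 30);
    """)
    queries = {
        "selection": "SELECT DISTINCT * FROM A WHERE x = 1",
        "projection": "SELECT DISTINCT x FROM A",
        "product": "SELECT DISTINCT * FROM A, B",
        "union": "SELECT * FROM A UNION SELECT * FROM B",
        "difference": "SELECT * FROM A EXCEPT SELECT * FROM B",
    }
    # Sort each result so the output does not depend on row order.
    results = {name: sorted(con.execute(sql).fetchall())
               for name, sql in queries.items()}
    con.close()
    return results


if __name__ == "__main__":
    for name, rows in demo_relational_operators().items():
        print(name, rows)
```

Each query corresponds to exactly one relational algebra operator, which is the enumeration the completeness argument relies on.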

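1.2 Give two examples of SQL constructs/semantics not expressible in relational algebra (RA).

Solution: Two examples of SQL constructs/semantics that cannot be expressed in relational algebra are:

1. Aggregate functions (e.g. COUNT, SUM, AVG) together with the GROUP BY clause.
2. NOT IN in the presence of NULL values: SQL evaluates it with three-valued logic, which classical RA does not have. Without NULLs, NOT IN can be expressed with the difference operator.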
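To make these examples concrete, here is a small sketch with Python's built-in sqlite3 module. The tables A and B and the function name are assumptions made for illustration. It runs a GROUP BY aggregate and contrasts NOT IN with EXCEPT when the subquery contains a NULL.

```python
import sqlite3


def not_in_vs_except():
    """Contrast NOT IN and EXCEPT on data with a NULL, plus a GROUP BY count."""
    con = sqlite3.connect(":memory:")
    # Hypothetical toy relations; B contains a NULL on purpose.
    con.executescript("""
        CREATE TABLE A (x INTEGER);
        CREATE TABLE B (x INTEGER);
        INSERT INTO A VALUES (1), (2);
        INSERT INTO B VALUES (2), (NULL);
    """)
    not_in = con.execute(
        "SELECT x FROM A WHERE x NOT IN (SELECT x FROM B)").fetchall()
    difference = con.execute(
        "SELECT x FROM A EXCEPT SELECT x FROM B").fetchall()
    grouped = con.execute(
        "SELECT x, COUNT(*) FROM A GROUP BY x ORDER BY x").fetchall()
    con.close()
    return not_in, difference, grouped


if __name__ == "__main__":
    print(not_in_vs_except())
```

With the NULL in B, the comparison `1 NOT IN (2, NULL)` evaluates to unknown rather than true, so the NOT IN query returns no rows while EXCEPT still returns the row with x = 1.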

1.3 Figure 1 shows the flow of a query through a DBMS, in which different forms are used to represent a query at different stages. Fill in the three blanks with the corresponding query languages (i.e., SQL, RC, RA).

Solution:

1. SQL query
2. Relational calculus
3. Relational algebra

2 Exercise 2.2 [Tableau Representation]:

2.1 Solution for Question 1 of Ex 2.2

Optimizing the query, we get $\pi_{A,B,C}(\sigma_{c=1}(R))$, with the tableau:

TAG     A  B  C
TARGET  a  b  c
R       a  b  c=1

2.2 Solution for Question 2 of Ex 2.2


TAG TARGET R R R R R R a1 a1 a1 a2 a2 b1 b2 b3 b3 b1 a2 b2 a1 b1 b1 b3 b2 b4 b4 b4


3 Exercise 2.3 [Sorting & Database]:

Suppose you have a file of 25,000 pages and seven buffer pages and you are sorting it using external merge-sort. Please answer the following questions:

3.1 How many runs will you produce?

Solution:

1. Pass 0 uses all 7 buffer pages, so it produces $\lceil 25000/7 \rceil = 3572$ runs.
2. From pass 1 onward the algorithm uses 6 buffer pages for input and the remaining one for output, so each pass merges 6 runs at a time. Pass 1 produces $\lceil 3572/6 \rceil = 596$ runs.
3. Pass 2 produces $\lceil 596/6 \rceil = 100$ runs.
4. Pass 3 produces $\lceil 100/6 \rceil = 17$ runs.
5. Pass 4 produces $\lceil 17/6 \rceil = 3$ runs.
6. Pass 5 produces 1 run.

So the total number of runs produced over all passes is 3572 + 596 + 100 + 17 + 3 + 1 = 4289 runs.
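The pass-by-pass count can be reproduced with a few lines of Python. This is a minimal sketch; the function name `runs_per_pass` is an assumption made for illustration, and it uses integer ceiling division to avoid floating-point rounding.

```python
def runs_per_pass(num_pages, num_buffers):
    """Return the number of runs left after each pass of external merge-sort."""
    # Pass 0 sorts num_buffers pages at a time.
    runs = [-(-num_pages // num_buffers)]  # ceiling division
    # Every later pass merges (num_buffers - 1) runs at a time.
    while runs[-1] > 1:
        runs.append(-(-runs[-1] // (num_buffers - 1)))
    return runs


if __name__ == "__main__":
    runs = runs_per_pass(25000, 7)
    print(runs, sum(runs))
```

For 25,000 pages and 7 buffers this yields the six per-pass counts listed above, and their sum is the total number of runs.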

3.2 How many passes will it take to sort the file completely?

Solution: The number of passes is $\lceil \log_{B-1} N_1 \rceil + 1$, where $N_1 = \lceil N/B \rceil$ is the number of runs produced by pass 0, $B$ is the number of buffers and $N$ is the number of pages. Pass 0 produces $N_1 = \lceil 25000/7 \rceil = 3572$ runs, so the sort takes $\lceil \log_6 3572 \rceil + 1 = 5 + 1 = 6$ passes.
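The closed-form pass count can be checked without floating-point logarithms. The sketch below is illustrative only; the helper names `ceil_log` and `num_passes` are assumptions. It computes the ceiling of the logarithm as the smallest exponent whose power reaches the value.

```python
def ceil_log(value, base):
    """Smallest integer k with base**k >= value, i.e. ceil(log_base(value))."""
    k, power = 0, 1
    while power < value:
        power *= base
        k += 1
    return k


def num_passes(num_pages, num_buffers):
    """Passes of external merge-sort: ceil(log_{B-1}(N1)) + 1, N1 = ceil(N/B)."""
    initial_runs = -(-num_pages // num_buffers)  # ceiling division
    return ceil_log(initial_runs, num_buffers - 1) + 1


if __name__ == "__main__":
    print(num_passes(25000, 7))
```

Working in integers sidesteps the case where a floating-point logarithm of an exact power comes out slightly above the true value and rounds up one pass too many.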

3.3 How many buffer pages do you need at least to sort the file in two passes?

Solution: The number of buffer pages required to sort the file in two passes must satisfy the following condition:

1. $B(B-1) \geq N$, where $B$ is the number of buffer pages and $N$ is the number of pages. For $B = 159$: $159 \times (159 - 1) = 25122 \geq 25000$. For $B = 158$: $158 \times 157 = 24806 < 25000$, so 159 is the minimum number that satisfies the condition.
2. The number can be validated using the equivalent condition $B - 1 \geq N/B$: $158 \geq 25000/159 \approx 157.23$, so the condition is satisfied.

So we need 159 buffers to sort the file in two passes.
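A direct search confirms the minimum. This is a sketch under the condition stated above; the function name is an assumption, and the search starts at 3 buffers on the assumption that a merge needs at least two input buffers and one output buffer.

```python
def min_buffers_two_passes(num_pages):
    """Smallest B with B * (B - 1) >= num_pages."""
    b = 3  # assumed lower bound: two input buffers plus one output buffer
    while b * (b - 1) < num_pages:
        b += 1
    return b


if __name__ == "__main__":
    print(min_buffers_two_passes(25000))
```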

4 Exercise 2.4 [Selection]:

Given is a relation with 30000 records. Each page for a node in a B+-tree can hold 20 pointers to records or pages. A data page can store 10 records.

4.1 Assume that each node is 70% full. What is the height of the B+-tree?

Solution: The height of a B+-tree is given by $\lceil \log_F M \rceil$, where $F$ is the fan-out and $M$ is the number of entries the leaf level has to point to. A node can hold up to 20 pointers and is 70% full, so $F = 0.7 \times 20 = 14$. The leaf level holds one pointer per record, so $M = 30000$ and

$\lceil \log_{14} 30000 \rceil = \lceil 3.91 \rceil = 4$

So the height is 4.
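The height estimate can be sketched in Python as the smallest number of levels whose combined fan-out covers all records. The function name and the integer percentage parameter are assumptions made for illustration, and the sketch assumes (as the calculation above does) that the leaf level points directly to records.

```python
def bplus_tree_height(num_records, pointers_per_node, fill_percent):
    """Smallest h with fanout**h >= num_records."""
    # Effective fan-out of a partially filled node (integer arithmetic).
    fanout = pointers_per_node * fill_percent // 100
    height, reachable = 0, 1
    while reachable < num_records:
        reachable *= fanout
        height += 1
    return height


if __name__ == "__main__":
    print(bplus_tree_height(30000, 20, 70))
```

With a fan-out of 14, three levels reach only 2744 records while four levels reach 38416, which is why the height comes out as 4.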

4.2 What are the I/O costs for an equality selection on a non-key attribute for the following cases?

4.2.1 With a clustered B+-tree of height 10 (matching records are located in one page);

Solution: Height of the B+-tree + 1 for the single data page (since it is a clustered index), i.e. 10 + 1 = 11 I/Os.


4.2.2 Without any index, and with the file not sorted on the attribute occurring in the selection;

Solution: With no index and no sort order, the whole relation has to be scanned. Since I/O is counted in pages and a data page stores 10 records, the cost is the number of data pages: 30000/10 = 3000 I/Os.

4.2.3 With an unclustered B+-tree index of height 8, and there are 3 matching records;

Solution: In the worst case the three matching records are in three different pages. Cost: 8 + 3 = 11 I/Os.

4.2.4 With an unclustered B+-tree of height 7 and one tenth of the records match the selection.

Solution: The total number of records is 30000, and 10% of it is 3000. In the worst case all 3000 matching records are in different pages. Cost: 7 + 3000 = 3007 I/Os.
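The four cases follow a simple cost model, sketched below in Python. The function names are assumptions made for illustration. The full scan is counted in data pages (10 records per page, as stated in the exercise), and the unclustered cases assume the worst case of one page I/O per matching record.

```python
def full_scan_cost(num_records, records_per_page):
    """Read every data page once."""
    return -(-num_records // records_per_page)  # ceiling division


def clustered_index_cost(height, matching_pages=1):
    """Traverse the tree, then read the page(s) holding the matches."""
    return height + matching_pages


def unclustered_index_cost(height, matching_records):
    """Traverse the tree, then (worst case) one page I/O per matching record."""
    return height + matching_records


if __name__ == "__main__":
    print(clustered_index_cost(10))
    print(full_scan_cost(30000, 10))
    print(unclustered_index_cost(8, 3))
    print(unclustered_index_cost(7, 30000 // 10))
```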

5 Exercise 2.5 [External Sorting]:

Suppose that you just finished inserting several records into a heap file and now want to sort those records. Assume that the DBMS uses external sort and makes efficient use of the available buffer space when it sorts a file. Here is some potentially useful information about the newly loaded file and the DBMS software available to operate on it: The number of records in the file is 5000. Each record is a total of 24 bytes long. The page size is 1024 bytes. Each page has 64 bytes of control information on it. Five buffer pages are available.

5.1 How many sorted subfiles will there be after the initial pass of the sort, and how long will each subfile be?

Proof:

\begin{align}
\text{Number of records in the file} &= 5000 \\
\text{Length of each record} &= 24 \text{ bytes} \\
\text{Total size} &= 5000 \times 24 = 120000 \text{ bytes} \\
\text{Page size} &= 1024 \text{ bytes} \\
\text{Control information in each page} &= 64 \text{ bytes} \\
\text{Actual page size} &= 1024 - 64 = 960 \text{ bytes} \\
\text{Number of pages in file } (N) &= \frac{120000 \text{ bytes}}{960 \text{ bytes}} = 125 \\
\text{Available number of buffers } (B) &= 5
\end{align}

Pass 0: In pass 0, read in 5 pages at a time and sort them internally to produce $\lceil N/B \rceil$ runs of $B$ pages each (except for the last run, which may contain fewer pages). Pass 0 therefore produces $N_1 = \lceil 125/5 \rceil = 25$ runs of 5 pages each.

There will be 25 sorted subfiles of 5 pages each after the initial pass of the sort.

5.2 How many passes (including the initial pass just considered) are required to sort this file?

Solution: The total number of passes required to sort this file, including the initial pass, is given by the formula $\lceil \log_{B-1} N_1 \rceil + 1$. Here $N_1 = 25$ and $B = 5$. Hence:

\begin{align*}
\lceil \log_{B-1} N_1 \rceil + 1 &= \lceil \log_{5-1} 25 \rceil + 1 \\
&= \lceil \log_4 25 \rceil + 1 \\
&= \lceil 2.32 \rceil + 1 \\
&= 3 + 1 \\
&= 4
\end{align*}

4 passes are required to sort this file, including the initial pass.

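5.3 What is the total I/O cost for sorting this file?

Solution: The I/O cost for sorting is given by the formula $2N(\lceil \log_{B-1} N_1 \rceil + 1)$. Here $N_1 = 25$, $N = 125$ and $B = 5$. Hence:

\begin{align*}
2N(\lceil \log_{B-1} N_1 \rceil + 1) &= 2 \times 125 \times (\lceil \log_{5-1} 25 \rceil + 1) \\
&= 2 \times 125 \times (\lceil \log_4 25 \rceil + 1) \\
&= 2 \times 125 \times (\lceil 2.3219280949 \rceil + 1) \\
&= 2 \times 125 \times (3 + 1) \\
&= 2 \times 125 \times 4 \\
&= 1000
\end{align*}

The total I/O cost for sorting this file is 1000 page I/Os.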
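The whole chain from record count to I/O cost can be sketched in one Python function. The function and parameter names are assumptions made for illustration, and the sketch assumes records do not span pages.

```python
def external_sort_cost(num_records, record_bytes, page_bytes,
                       control_bytes, num_buffers):
    """Return (pages, initial runs, passes, total page I/Os)."""
    records_per_page = (page_bytes - control_bytes) // record_bytes
    num_pages = -(-num_records // records_per_page)   # ceiling division
    initial_runs = -(-num_pages // num_buffers)
    passes, runs = 1, initial_runs
    while runs > 1:
        runs = -(-runs // (num_buffers - 1))          # merge B-1 runs at a time
        passes += 1
    # Every pass reads and writes each page once.
    return num_pages, initial_runs, passes, 2 * num_pages * passes


if __name__ == "__main__":
    print(external_sort_cost(5000, 24, 1024, 64, 5))
```

The factor 2 reflects that each pass reads every page once and writes it once.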


5.4 What is the largest file, in terms of the number of records, you can sort with just five buffer pages in two passes? How would your answer change if you had 263 buffer pages?

Solution:

Proof: Let the number of pages in the file be $N = x$, with $B = 5$ buffers available. Pass 0 produces $N_1 = x/5$ runs. The file should be sorted in 2 passes including the initial pass, and the number of passes is given by $\lceil \log_{B-1} N_1 \rceil + 1$:

\begin{align*}
\lceil \log_{B-1} N_1 \rceil + 1 = 2
&\implies \lceil \log_{5-1} (x/5) \rceil + 1 = 2 \\
&\implies \lceil \log_4 (x/5) \rceil + 1 = 2
\end{align*}

The ceiling of $\log_4 (x/5)$ must be 1 for the equation to hold, and the largest value for which it does is

\begin{align*}
x/5 = 4 \implies x = 5 \times 4 \implies x = 20
\end{align*}

So the number of pages is $N = 20$. The size of a file with 20 pages is $20 \times (\text{page size} - \text{control info size}) = 20 \times (1024 - 64) = 20 \times 960 = 19200$ bytes. The number of records is

$\frac{\text{file size}}{\text{size of 1 record}} = \frac{19200}{24} = 800$

The largest file that can be sorted with just five buffer pages in two passes has 800 records.

Now suppose 263 buffers are available, i.e. $B = 263$. Let the number of pages in the file be $N = x$. Pass 0 produces $N_1 = x/263$ runs. Again the file should be sorted in 2 passes including the initial pass:

\begin{align*}
\lceil \log_{B-1} N_1 \rceil + 1 = 2
&\implies \lceil \log_{263-1} (x/263) \rceil + 1 = 2 \\
&\implies \lceil \log_{262} (x/263) \rceil + 1 = 2
\end{align*}

The ceiling of $\log_{262} (x/263)$ must be 1 for the equation to hold, and the largest value for which it does is

\begin{align*}
x/263 = 262 \implies x = 263 \times 262 \implies x = 68906
\end{align*}

So the number of pages is $N = 68906$. The size of a file with 68906 pages is $68906 \times (1024 - 64) = 68906 \times 960 = 66149760$ bytes. The number of records is

$\frac{\text{file size}}{\text{size of 1 record}} = \frac{66149760}{24} = 2756240$

With 263 buffers available, the largest file that can be sorted in two passes has 2756240 records.
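Both cases reduce to the same two-line computation, sketched here in Python. The function name and default parameters are assumptions taken from the exercise data (1024-byte pages, 64 bytes of control information, 24-byte records).

```python
def max_file_two_passes(num_buffers, page_bytes=1024,
                        control_bytes=64, record_bytes=24):
    """Return (max pages, max records) sortable in two passes."""
    # Pass 0 creates runs of B pages; pass 1 can merge at most B - 1 of them.
    max_pages = num_buffers * (num_buffers - 1)
    records_per_page = (page_bytes - control_bytes) // record_bytes
    return max_pages, max_pages * records_per_page


if __name__ == "__main__":
    print(max_file_two_passes(5))
    print(max_file_two_passes(263))
```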
