
DATABASE MANAGEMENT SYSTEM

A DBMS is a software system whose main purpose is to store data, but not every software
that stores data is a DBMS.
Characteristics of DBMS
- It must provide an easy language for retrieval and manipulation of data.
  - The language should not require complex programming techniques, and it should support
    structured programming.
  - Language supported by DBMS: SQL (Structured Query Language)
- It must provide concurrent access to data (multiple transactions can be performed on the
  data at the same time).
- It must provide data integrity (data stays consistent, with no needless replication).
- It must provide security (prevent unauthorized users from accessing data).

A DBMS stores data in the form of tables (relations). A table consists of rows (also called
records or tuples) and columns (also called fields or attributes). A database is a collection
of tables. Consider the Student table given below.
RollNo  Name     Address          DOB
1       Vivek    12, jivajinagar  18/8/87
2       Priyesh  M25-gandhinagar  15/6/90

This Student table has four columns (RollNo, Name, Address, DOB) and two rows (records).
Each column is defined with a data type. Some general data types supported by DBMS are:
1. Varchar - for strings of characters
2. Number - for numeric values
3. Date - for date and time

Constraints are conditions that must be satisfied whenever data is inserted, deleted or
modified. Suppose a constraint on the Student table must ensure that no student is older
than 20; we can then apply the check DOB > '31/12/1990'. If we try to insert a student with
a DOB of 31/12/1990 or earlier, the insertion gives an error and the transaction does not
complete successfully.

RDBMS - A software system is said to be a Relational DBMS if it follows all 12 rules
suggested by E. F. Codd. Oracle is said to follow only 7-8 of these rules and MS Access
only 3-4.

STRUCTURED QUERY LANGUAGE


This is the standard query language, which every RDBMS should implement for defining,
manipulating and retrieving data. SQL statements can be divided into 3 categories:
(i) DDL (Data Definition Language): used to define tables, e.g. create, alter, drop
(ii) DML (Data Manipulation Language): used to manipulate data in tables, e.g. insert, update,
delete, select
(iii) DCL (Data Control Language): used to control access to data, e.g. grant, revoke. Commit
and rollback are often listed here as well, although many texts place them in a separate
transaction control category (TCL).

The use of some components of SQL is described below:


1. Create: used to create a table
Example: create table Student ( Rollno Number(5),
Name Varchar(20) Not Null,
Address varchar(30),
DOB Date )

ENGINEER’S CIRCLE, GWALIOR Page 1


This query will create a table with four fields:
i. Rollno - number type, with at most 5 digits.
ii. Name - variable-length character string with a maximum of 20 characters. Not null is a
constraint which specifies that values in the Name field cannot be null.
iii. Address - variable-length character string with a maximum of 30 characters.
iv. DOB - date type.

RollNo Name Address DOB

2. Insert: used to Insert data into table.


Example: insert into Student values (1, 'Vivek', '12, jivaji nagar', '26/10/1987')
This query will insert a tuple into the Student table.
RollNo  Name   Address           DOB
1       Vivek  12, jivaji nagar  26/10/1987

Note: varchar and date values in a query must be written in single quotes.
SQL keywords, column names and table names are not case sensitive, but data is case
sensitive.

If we want to insert data into particular fields only, the query will be:

insert into student (Rollno, Name) values (2, 'Priyesh')
Now the table will be:
RollNo  Name     Address           DOB
1       Vivek    12, jivaji nagar  26/10/1987
2       Priyesh  Null              Null

If we try to insert a tuple without a name, the query will give an error on execution, because
we have defined the Name field with the constraint 'not null'.
Example: insert into student (Rollno) values (3);
This query will generate an error because a null value is not accepted in the Name field.
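
The create and insert behaviour described above can be tried out with a small sketch. It is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in DBMS (not the DBMS of these notes), and SQLite treats the declared column types more loosely than Oracle-style Number and Varchar.

```python
import sqlite3

# Assumption: SQLite (in memory) stands in for the DBMS used in the notes.
conn = sqlite3.connect(":memory:")
conn.execute("""
    create table student (
        rollno  integer,
        name    varchar(20) not null,
        address varchar(30),
        dob     date
    )
""")

# A full insert, then an insert that names only some of the fields.
conn.execute("insert into student values (1, 'Vivek', '12, jivaji nagar', '26/10/1987')")
conn.execute("insert into student (rollno, name) values (2, 'Priyesh')")

# Leaving out name violates the not null constraint, so the insert is rejected.
try:
    conn.execute("insert into student (rollno) values (3)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute(
    "select rollno, name, address, dob from student order by rollno"
).fetchall()
print(rejected)
print(rows)
```

The fields that were not named in the second insert come back as None, which is how Python represents SQL null.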

3. Update: used to update values in table


Example: update student set Address = '21,gandhinagar'
This query will update every value in the Address field to '21,gandhinagar'.
RollNo  Name     Address         DOB
1       Vivek    21,gandhinagar  26/10/1987
2       Priyesh  21,gandhinagar  Null

To change the address of particular rows only, we can use a where clause.
Example: update student set Address = '12,Jivajinagar' where rollno = 1
This query will update the address of the student with rollno 1 to '12,Jivajinagar'.

4. Delete: used to delete rows from table


Example: delete from student
This will delete all rows in the student table.
To delete particular rows only, we can use a where clause.
Example: delete from student where DOB < '31/12/1990'
This query will delete all records with a date of birth earlier than '31/12/1990'.

5. Commit: saves changes permanently to disk.

Example: update student set Name = 'Vikas' where rollno = 1;
Commit;
';' is used to separate multiple statements. The commit statement makes these changes
permanent on disk. If you do not commit a transaction, its changes are made only in main
memory, and if main memory suddenly loses power (or the computer crashes) the changes are
not visible when you reboot the computer. Generally a DBMS implements autocommit (it
automatically executes commit at some interval).

6. Rollback : undo changes since last commit.


7. Drop: used to delete a table
Example: drop table student
This query will remove the table from the disk.
Note: drop and create cannot be rolled back.

8. Select: used to view data


Example: select * from student
This query will display all rows and columns of the student table.

To view particular rows and columns, a column list and a where clause can be used:

Select rollno, name from student where DOB > '1/1/1995'
This query will return only two columns, rollno and name, for students born after
1/1/1995. In this type of query, records are first selected according to the conditions in
the where clause, and then the fields specified in the query are displayed.

9. Distinct: used to produce a result without duplicates.

Example: select distinct name from student
This query will return the names of students, and if two or more students have the same
name then that name is returned only once.
Query: select distinct name, address from student
This query does not give an error. Distinct applies to the whole (name, address) combination,
so two students with the same name but different addresses are both returned; only rows
that match in both name and address are merged into one.
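
How distinct treats one column versus a combination of columns can be checked with a short sketch. The rows below are hypothetical sample data, and SQLite (through Python's sqlite3 module) is assumed as a stand-in DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite as a stand-in DBMS
conn.execute("create table student (rollno integer, name text, address text)")
# Hypothetical rows: three students share a name, two of them also share an address.
conn.executemany(
    "insert into student values (?, ?, ?)",
    [(1, "Vivek", "Gwalior"), (2, "Vivek", "Indore"), (3, "Vivek", "Gwalior")],
)

# One column: the repeated name collapses to a single row.
names = conn.execute("select distinct name from student").fetchall()

# Two columns: rows are duplicates only if both name and address match.
pairs = conn.execute(
    "select distinct name, address from student order by address"
).fetchall()

print(names)
print(pairs)
```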

10. Where clause: to understand the where clause and conditions, consider the relation
Employee (empid, ename, salary, job, deptid)

Q1. Write a query to find the names of all employees of deptid 10.

Ans. select ename from employee where deptid = 10

Q2. Write a query to find the names of all managers from deptid 20.

Ans. select ename from employee where deptid = 20 and job = 'Manager'

Q3. Write a query to find the names of employees belonging to deptid 10 or deptid 20.

Ans. select ename from employee where deptid = 10 or deptid = 20
This query can be rewritten with the use of 'in' as
Select ename from employee where deptid in (10, 20)
'in' checks for set membership. In this example 'in' checks whether deptid ∈ {10, 20}.

Q4. Write a query to find the names of employees having a salary between 1000 and 2000.
Ans. Select ename from employee where salary >= 1000 and salary <= 2000

This query can be rewritten with the use of 'between' as

Select ename from employee where salary between 1000 and 2000

Between includes the boundary values. In the above example, salaries equal to 1000 and
2000 are also selected.

Q5. Write a query to find the names of employees whose name starts with A or B or C or D.

Note: Before solving this query we first need to understand how strings are handled in SQL.
Suppose S1 = 'ABC' and S2 = 'X'; which of S1 and S2 is greater? SQL compares strings from
left to right, position by position, according to the ASCII value of each character. 'A'
from S1 is compared with 'X' from S2; since the ASCII value of 'X' is greater than the ASCII
value of 'A', the answer is S2 > S1. If the first characters of S1 and S2 are the same, the
strings are compared on the next character position, and so on.
Examples:

(a) S1 = 'ADAMS', S2 = 'A'
S1 > S2, because after the common 'A' the 'D' of S1 is compared with nothing (null) in S2,
and the ASCII value of 'D' is greater than that of null.

(b) S1 = 'ADAMS', S2 = 'D'
S1 < S2, because the ASCII value of 'D' is greater than the ASCII value of 'A'.

Ans. To find the names of employees whose name starts with A or B or C or D, suppose we use
this query:
Select ename from employee where ename between 'A' and 'D'
It will display all names starting with 'A', 'B' and 'C', but of the names starting with
'D' only the name 'D' itself, because any longer name starting with 'D' is greater than 'D'.
Instead we can use this query:

Select ename from employee where ename >= 'A' and ename < 'E'
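
The comparison rules and the 'between' pitfall above can be checked with a small sketch. It assumes SQLite (via Python's sqlite3 module) as a stand-in DBMS, whose default collation compares strings character by character, and it uses hypothetical upper-case names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite as a stand-in DBMS

# Each comparison returns 1 (true) or 0 (false).
checks = conn.execute(
    "select 'X' > 'ABC', 'ADAMS' > 'A', 'ADAMS' < 'D'"
).fetchone()

conn.execute("create table employee (ename text)")
conn.executemany(
    "insert into employee values (?)",
    [("ADAMS",), ("BLAKE",), ("CLARK",), ("D",), ("DAVID",), ("EVANS",)],  # hypothetical names
)

# between 'A' and 'D' stops at the exact string 'D', so 'DAVID' is missed.
between_a_d = [row[0] for row in conn.execute(
    "select ename from employee where ename between 'A' and 'D' order by ename")]

# ename >= 'A' and ename < 'E' keeps every name starting with A, B, C or D.
a_to_d = [row[0] for row in conn.execute(
    "select ename from employee where ename >= 'A' and ename < 'E' order by ename")]

print(checks)
print(between_a_d)
print(a_to_d)
```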

Q6. Write a query to find the names of employees having no deptid.

Ans. Select ename from employee where deptid = null
This query will not display any row, because null cannot be used with relational operators.
deptid = null returns neither true nor false; it returns null.

To compare with null, SQL provides the 'is' and 'is not' operators. The above query is
correctly written as:
Select ename from employee where deptid is null
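
The difference between '= null' and 'is null' can be seen in a short sketch, again assuming SQLite (via Python's sqlite3 module) as a stand-in DBMS and a cut-down, hypothetical employee table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite as a stand-in DBMS
conn.execute("create table employee (ename text, deptid integer)")
conn.executemany(
    "insert into employee values (?, ?)",
    [("Vivek", 25), ("Pawan", 26), ("Hermesh", None)],  # None is stored as SQL null
)

# '= null' evaluates to null for every row, so no row passes the where clause.
with_equals = conn.execute("select ename from employee where deptid = null").fetchall()

# 'is null' is the operator meant for this test.
with_is = conn.execute("select ename from employee where deptid is null").fetchall()

# Even null = null is null (returned to Python as None), not true.
comparison = conn.execute("select null = null").fetchone()[0]

print(with_equals, with_is, comparison)
```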

Q7. Write a query to find the names of employees whose name starts with 'C' and ends with
'T'.
Ans. For this type of query SQL provides wild cards and the like operator. The wild card '%'
stands for a string of any length, including zero. The wild card '_' stands for any single
character. The like and not like operators are used when a string is compared with a
pattern containing wild cards. The query for this question is:
Select ename from employee where ename like 'C%T'

Q8. Write a query to find the names of employees whose name contains 'C' as the second
character.
Ans. Select ename from employee where ename like '_C%'

Q9. Write a query to find the names of employees whose name contains at least two 'T'.
Ans. Select ename from employee where ename like '%T%T%'
Q10. Write a query to find the names of employees whose name contains exactly two 'T'.

Ans. Select ename from employee where ename like '%T%T%' and ename not like '%T%T%T%'

Q11. Write a query to find the names of employees whose name contains exactly two characters.
Ans. Select ename from employee where ename like '__'
(two underscores with no space between them)
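
The like patterns from Q7 to Q11 can be tried on a few hypothetical names. This is a sketch that assumes SQLite (via Python's sqlite3 module) as a stand-in DBMS; note that SQLite's like ignores the case of ASCII letters by default, which is why the sample names are all upper case. The helper names() is introduced here only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite as a stand-in DBMS
conn.execute("create table employee (ename text)")
conn.executemany(
    "insert into employee values (?)",
    [("CHAT",), ("SCOTT",), ("TUTTLE",), ("SMITH",), ("ACE",), ("AL",)],  # hypothetical names
)

def names(condition):
    """Return the enames that satisfy the given where condition, in sorted order."""
    sql = "select ename from employee where " + condition + " order by ename"
    return [row[0] for row in conn.execute(sql)]

starts_c_ends_t = names("ename like 'C%T'")      # Q7
second_char_c = names("ename like '_C%'")        # Q8
at_least_two_t = names("ename like '%T%T%'")     # Q9
exactly_two_t = names("ename like '%T%T%' and ename not like '%T%T%T%'")  # Q10
two_characters = names("ename like '__'")        # Q11

print(starts_c_ends_t, second_char_c, at_least_two_t, exactly_two_t, two_characters)
```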

Q12. Write a query to find the names of employees whose name contains '_'.
Ans. If a wild card character itself has to be matched in the data, an escape character
such as '/' is placed before it, and the escape character is declared with the escape
keyword.
Select ename from employee where ename like '%/_%' escape '/'

Q13. Write a query to find the names of employees whose name contains '/'.
Ans. Select ename from employee where ename like '%//%' escape '/'

11. Order by: used to display data in (either increasing or decreasing) order. In a table,
data is present in the order in which it was inserted, but while retrieving data we can use
the order by keyword to display it in increasing or decreasing order.
Example: select ename from employee order by ename
This displays the names in alphabetical order. Suppose the employee table contains these
values of ename:
SCOTT
Allen
Null
123
Adams
Smith
Then the output, in alphabetical increasing order (by ASCII value), is:
Null
123
Adams
Allen
SCOTT
Smith
Digits sort before uppercase letters, and uppercase letters before lowercase letters. The
position of Null depends on the DBMS: some place nulls first, others (such as Oracle) last.

If two or more names are the same, we can order those rows according to another column:
Select ename from employee order by ename, job desc
Desc is used to specify decreasing order.

12. Aggregate functions: consider the Employee table given below:

empid  Ename    Job         Salary  Deptid
1      Vivek    Manager     50000   25
2      Priyesh  Programmer  20000   25
3      Pawan    HR          30000   26
4      Hermesh  Programmer  22000   Null
5      Null     Manager     Null    Null

There are 5 aggregate functions supported by SQL:


(i) Sum(column_name) –used to find sum of values in field column_name
Example: select sum(salary) from employee
This query will return sum of salaries of all employee.
Sum(salary)
122000



(ii) count(column_name) - used to count the records in field column_name. Rows in which
column_name is null are not counted.

Query: Select count(salary) from employee
Output:
Count(salary)
4

Count is also described for a tuple of two or more columns: count(salary, deptid) treats
(salary, deptid) as a single field and counts each tuple.

Query: select count(salary, deptid) from employee
Output:
Count(salary,deptid)
4

When a tuple of two or more fields is counted this way, only a tuple in which all values are
null, {Null, Null}, is counted as 0; a tuple like {22000, Null} is counted as one. Note that
this multi-column form is not standard SQL and many DBMS reject it.

(iii) max(column_name)- used to find maximum in a column


Query: select max(salary) from employee
Output:
max(salary)
50000
(iv) min(column_name)- used to find minimum in a column
Query: select min(salary) from employee
Output:
min(salary)
20000

(v) avg(column_name) - used to find the average of the values in a column; implemented as
sum(column_name) / count(column_name).
Query: select avg(salary) from employee
Output:
avg(salary)
30500
122000 / 4 =30500
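
The way the five aggregate functions skip null values can be checked against the Employee table above. This is a sketch assuming SQLite (via Python's sqlite3 module) as a stand-in DBMS; count(*) is added for comparison because it counts rows rather than non-null values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite as a stand-in DBMS
conn.execute(
    "create table employee (empid integer, ename text, job text, salary integer, deptid integer)"
)
conn.executemany("insert into employee values (?, ?, ?, ?, ?)", [
    (1, "Vivek", "Manager", 50000, 25),
    (2, "Priyesh", "Programmer", 20000, 25),
    (3, "Pawan", "HR", 30000, 26),
    (4, "Hermesh", "Programmer", 22000, None),
    (5, None, "Manager", None, None),
])

# All aggregates ignore the null salary of empid 5; count(*) counts every row.
total, counted, all_rows, highest, lowest, average = conn.execute(
    "select sum(salary), count(salary), count(*), max(salary), min(salary), avg(salary) "
    "from employee"
).fetchone()

print(total, counted, all_rows, highest, lowest, average)
```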

13. As: used for renaming a field returned by select. In the above query avg(salary) is the
column name returned by select; we can rename it using the 'as' operator.
Query: select avg(salary) as average_salary from employee
Output:
Average_salary
30500

14. Group value function (group by): suppose we want to find the average salary in each
department.
Query: select deptid, avg(salary) from employee group by deptid
Output:
Deptid  Avg(salary)
25      35000
26      30000
Null    22000

If null is an entry in deptid, a separate group is created for the rows whose deptid is
null, and the aggregate function is then applied to it. Here that group contains the
salaries 22000 and Null; avg ignores the null, so the result is 22000.

If you try to print avg(salary) and deptid without using group by, as in
Select deptid, avg(salary) from employee
then this query will give an error because the result is not compatible: avg(salary) is a
single value while deptid has one value per row (to see this, try to draw the result table
for the above query).

Note: a column can be selected together with aggregate functions only if that column
appears in the group by clause.

Query: select deptid, job, avg(salary) from employee group by deptid, job
This query first divides the table into groups by deptid, then divides each deptid group into
subgroups by job.
Output:
Deptid  Job         Avg(salary)
25      Manager     50000
25      Programmer  20000
26      HR          30000
Null    Manager     Null
Null    Programmer  22000
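
Both groupings can be reproduced with a short sketch on the same Employee table, assuming SQLite (via Python's sqlite3 module) as a stand-in DBMS. The results are collected into dictionaries so that the row order of the DBMS does not matter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite as a stand-in DBMS
conn.execute(
    "create table employee (empid integer, ename text, job text, salary integer, deptid integer)"
)
conn.executemany("insert into employee values (?, ?, ?, ?, ?)", [
    (1, "Vivek", "Manager", 50000, 25),
    (2, "Priyesh", "Programmer", 20000, 25),
    (3, "Pawan", "HR", 30000, 26),
    (4, "Hermesh", "Programmer", 22000, None),
    (5, None, "Manager", None, None),
])

# One group per deptid; rows with a null deptid form a group of their own (key None).
by_dept = dict(conn.execute(
    "select deptid, avg(salary) from employee group by deptid"
).fetchall())

# One group per (deptid, job) combination.
by_dept_job = {
    (deptid, job): average
    for deptid, job, average in conn.execute(
        "select deptid, job, avg(salary) from employee group by deptid, job"
    )
}

print(by_dept)
print(by_dept_job)
```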

15. Having: it is like a where clause, but for the groups formed by group by. A where
clause cannot contain aggregate functions, because it is applied to individual rows before
grouping. So if you want to apply a condition involving an aggregate function to each group
formed by group by, use having:

select avg(salary) from employee group by deptid having count(empid) >= 5

This query returns the average salary of each department having at least 5 employees. First
the table is grouped according to deptid, then the groups having at least 5 employees are
selected, and then the average is found for each selected group.

Q14. Write a query to find the deptids with more than 5 employees having a salary of more
than 5000.
Ans. select deptid from employee where salary > 5000 group by deptid having
count(empid) > 5

16. Nested queries - a query within a query is called a nested query.

Example: select ename from employee where salary = (select max(salary) from employee)
In this query, the query inside the parentheses is the inner query. The inner query executes
first and returns the maximum salary in the employee table, which is 50000; the outer query
then looks like:
Select ename from employee where salary = 50000

Q15. Write a query to find the names of employees having a salary greater than that of
'Hermesh'.

Ans. select ename from employee where salary > (select salary from employee where
ename = 'Hermesh')
This query runs correctly if there is only one employee named 'Hermesh', so that the inner
query returns a single value. But if there is more than one employee named 'Hermesh', the
inner query returns a set of values, and the > operator (like all relational operators) can
compare single values but not a set of values. In this case the condition looks like
salary > {20000, 50000}, which is not a valid statement, so the query gives an error. To
overcome this type of error, Any, All and Some are used with relational operators.

Example: select ename from employee where salary > all (select salary from employee
where ename = 'Hermesh')
Now, if the table has more than one employee named 'Hermesh' and the inner query returns a
set of salaries, say {1000, 30000, 15000, 20000}, then '> All' compares each value of the
salary field with all values in the set. Put more simply, '> All' picks the maximum value of
the set and compares the table values with this maximum. There is no need to compare with
the other values in the set: if a salary is greater than the maximum value in the set, it is
also greater than all other values in the set.

Similarly,
> All - means greater than the maximum in the set
< All - means less than the minimum in the set
= All - possible only when all values in the set are equal
> Any - means greater than the minimum in the set
< Any - means less than the maximum in the set
= Any - same as the 'in' operator
Some is the same as Any.

Nested queries for two or more tables

Examples: consider two relations, employee and department
Employee(empno, ename, deptno)
Department(deptno, dname, dcity)

Q16. Write a query to find the names of employees of the 'Research' department.

Ans. select ename from employee where deptno in (select deptno from department where
dname = 'Research')
Here the inner query runs on the department table and returns deptno; this deptno can be
equated with the deptno of the employee table.

Q17. Write a query to find the names of employees whose department is in Delhi.

Ans. select ename from employee where deptno in (select deptno from department where
dcity = 'Delhi')

17. Key constraints:

Unique key - fields of a relation whose values are unique can be assigned a unique key. A
unique key can have any number of null values. For example, if we make the ename field of
the employee table a unique key, the table cannot contain two or more employees with the
same name, but it allows any number of rows in which ename is null.
A unique key can be assigned to a group of fields. Example: (empid, ename) can be assigned
a unique key. In this case employees can have the same name or the same empid, but two or
more employees cannot have both empid and ename the same.

Primary key - this is defined as a unique key with a not null constraint. If a field is a
primary key, it cannot have null values; similarly, if a group of fields is the primary key,
none of those fields can be null in any record. Example: if (empid, ename) is the primary
key, then neither (1, Null) nor (Null, Null) is allowed in the table.

A primary key uniquely identifies a record.



In relation schemas the primary key is denoted by underlining its fields. For example, the
relation employee(empid, ename, deptno) with ename and deptno underlined has the primary key
(ename, deptno).

Candidate key: a key or combination of keys that can be assigned as the primary key. In
other words, the keys that are candidates for primary key are called candidate keys. For
example, if both empid and (ename, deptno) can be assigned as the primary key of a table,
then both are candidate keys. But only one of them is chosen as the primary key; we should
choose the candidate key that is used most in queries. The primary key is used for indexing
the table, and indexes speed up searching.

Super key: any key or combination of keys that uniquely identifies a record is called a
super key.
A candidate key (and hence the primary key) is a minimal super key: it uniquely identifies a
record in the table and no field can be removed from it. Super keys can be subdivided while
a candidate key cannot be divided further. That is, if a combination of fields is taken as
a candidate key and we remove any field from that combination, the combination no longer
remains a candidate key or a super key. If we add more fields to a candidate key, the
combination is no longer a candidate key but is still a super key. Suppose empid and
(ename, deptno) are the candidate keys of the employee relation; then the possible super keys
are: empid, (empid, ename), (empid, deptno), (ename, deptno) and (empid, ename, deptno).
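
The list of super keys follows mechanically from the candidate keys: a set of fields is a super key exactly when it contains at least one candidate key. A minimal sketch in plain Python (the variable names are chosen here for illustration) enumerates them for the employee relation.

```python
from itertools import combinations

attributes = ("empid", "ename", "deptno")
candidate_keys = [{"empid"}, {"ename", "deptno"}]

# Try every non-empty combination of attributes and keep those that
# contain at least one candidate key as a subset.
super_keys = [
    combo
    for size in range(1, len(attributes) + 1)
    for combo in combinations(attributes, size)
    if any(key <= set(combo) for key in candidate_keys)
]

for key in super_keys:
    print(key)
```

The output has five entries, matching the five super keys listed above; removing a field from either candidate key gives a combination that no longer appears in the list.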

All candidate keys are super keys, but not all super keys are candidate keys. Similarly,
every primary key is a candidate key, but not vice versa.

A unique key does not necessarily identify a record uniquely, because it may contain nulls,
but the non-null values in this field or combination of fields are unique.

While defining a table we can create a primary key and unique keys.
Example: create table employee (empid number(6) primary key,
ename varchar(20) unique,
deptno number(2) );

If we want to make (ename, deptno) the primary key, the query is:

create table employee (empid number(6),
ename varchar(20),
deptno number(2),
primary key (ename, deptno)
);

Foreign key: consider two relations employee(empid, ename, deptno) and
department(deptno, dname, city). If we want to make sure that a deptno inserted or modified
in the employee relation must exist in the department table (in other words, every employee
should be in a valid department), then we make deptno of the employee table
(employee.deptno) a foreign key which references the deptno field of the department table
(department.deptno). The queries to create the foreign key are:
First define the table department:
create table department (deptno number(2) primary key,
dname varchar(20),
city varchar(20) );
Now define the table employee as:
create table employee (empid number(6),
ename varchar(20),
deptno number(2),
foreign key (deptno) references department (deptno)
);
Note: a foreign key can have any number of null values, and the field that is referred to by
the foreign key must be unique.

Suppose the entries in the tables are:

Table: Department
Deptno  Dname     City
10      Research  Gwalior
12      Account   Gwalior
13      Managing  Indore

Table: Employee
Empid  Ename    Deptno
1      Vivek    12
2      Hermesh  10
3      Priyesh  10
4      Pawan    Null

If we try to insert the tuple (5, 'XYZ', 20) into the employee table, the query will
generate an error because deptno 20 is not in the department table.

Suppose we delete the row with deptno = 10 from the department table; then two records
(those with empid 2 and empid 3) in the employee table given above become invalid. So SQL
does not allow this row to be deleted from the department table until those two rows are
deleted.

On delete cascade: this is used while defining a foreign key, to handle the above situation.
Example: create table employee (empid number(6),
ename varchar(20), deptno number(2), foreign key (deptno) references
department (deptno) on delete cascade);
Now the rows having deptno 10 in the employee table are deleted first, and then the row
(with deptno 10) is deleted from the department table.

On delete set null: in the above case the data from the employee table is lost. For example,
if a department is removed from a company, this does not mean that the employees belonging
to that department are also removed; they can be shifted to another department or remain in
no department. For this case 'on delete set null' can be used instead of 'on delete
cascade'. Then, when we delete the row with deptno = 10 from the department table, first the
deptno of 'Hermesh' and 'Priyesh' is set to null, and then the row is deleted from the
department table.
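
The foreign key behaviour described above can be tried with the same Department and Employee rows. This sketch assumes SQLite (via Python's sqlite3 module) as a stand-in DBMS; SQLite only enforces foreign keys after 'pragma foreign_keys = on', which is specific to SQLite and not part of the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite as a stand-in DBMS
conn.execute("pragma foreign_keys = on")  # SQLite-specific switch for enforcement
conn.execute("create table department (deptno integer primary key, dname text, city text)")
conn.execute("""
    create table employee (
        empid  integer,
        ename  text,
        deptno integer,
        foreign key (deptno) references department (deptno) on delete set null
    )
""")
conn.executemany("insert into department values (?, ?, ?)", [
    (10, "Research", "Gwalior"), (12, "Account", "Gwalior"), (13, "Managing", "Indore"),
])
conn.executemany("insert into employee values (?, ?, ?)", [
    (1, "Vivek", 12), (2, "Hermesh", 10), (3, "Priyesh", 10), (4, "Pawan", None),
])

# deptno 20 does not exist in department, so this insert is rejected.
try:
    conn.execute("insert into employee values (5, 'XYZ', 20)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Deleting department 10 sets the deptno of its employees to null.
conn.execute("delete from department where deptno = 10")
after_delete = conn.execute("select ename, deptno from employee order by empid").fetchall()

print(rejected)
print(after_delete)
```

With 'on delete cascade' in place of 'on delete set null', the rows of Hermesh and Priyesh would be deleted instead of being kept with a null deptno.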

18. Join: to understand joins we first look at the Cartesian product of tables. The
Cartesian product of the two tables from the previous example (each row of one table is
combined with each row of the other table) is:
Employee × Department

Empid  Ename    Employee.Deptno  Department.Deptno  Dname     City
1      Vivek    12               10                 Research  Gwalior
1      Vivek    12               12                 Account   Gwalior
1      Vivek    12               13                 Managing  Indore
2      Hermesh  10               10                 Research  Gwalior
2      Hermesh  10               12                 Account   Gwalior
2      Hermesh  10               13                 Managing  Indore
3      Priyesh  10               10                 Research  Gwalior
3      Priyesh  10               12                 Account   Gwalior
3      Priyesh  10               13                 Managing  Indore
4      Pawan    Null             10                 Research  Gwalior
4      Pawan    Null             12                 Account   Gwalior
4      Pawan    Null             13                 Managing  Indore

If the 1st table has m rows and the 2nd table has n rows, the Cartesian product has m×n
rows.
The query for the above Cartesian product is:
Select * from employee, department
If two tables have the same column name, the columns are distinguished by prefixing the
table name and a dot ('.').

Now if we want the department information of each employee, the query will be:
Select empid, ename, department.deptno, dname, city from employee, department where
employee.deptno = department.deptno;
The result of this query will be:

Empid  Ename    Department.Deptno  Dname     City
1      Vivek    12                 Account   Gwalior
2      Hermesh  10                 Research  Gwalior
3      Priyesh  10                 Research  Gwalior

This is called a natural join, in which two or more tables are joined according to their
common fields. Pawan does not appear in the result, because his deptno is null and matches
no department.

Q18. Write a query to find the names of employees in the Account department.

Ans. select ename from employee, department where employee.deptno =
department.deptno and dname = 'Account'

This problem can also be solved by using a nested query:

Select ename from employee where deptno = (select deptno from department where
dname = 'Account');

Note: Some problems can be solved by both a join and a nested query, some can be solved only
by a join, and some only by nested queries. There are also some problems which can be
solved by neither joins nor nested queries.

Self join: joining a table with itself.

Consider the table employee(empid, ename, mgrid), where empid is the primary key of the
table and mgrid is a foreign key that references empid of the same employee table. This
table contains information about employees and their managers.

Empid  Ename     Mgrid
1      Vivek     Null
2      Hermesh   3
3      Priyesh   1
4      Pawan     5
5      Ravindra  7
6      Aditya    1
7      Mohan     6

Now if we want to find out who is the manager of Aditya, we have to use a self join:
Select e2.ename from employee e1, employee e2 where e1.mgrid = e2.empid and
e1.ename = 'Aditya';
Here e1 and e2 are aliases of the employee table. e1 and e2 act like two copies of the
employee table, and these copies are joined by equating e1.mgrid with e2.empid.

Outer joins: Notice that much of the data is lost when applying a join to two relations. In
some cases this lost data might hold useful information. An outer join retains the
information that would have been lost from the tables, replacing missing data with nulls.
There are three forms of the outer join, depending on which data is to be kept.
- LEFT OUTER JOIN - keep data from the left-hand table
- RIGHT OUTER JOIN - keep data from the right-hand table
- FULL OUTER JOIN - keep data from both tables
Example:
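
A left outer join on the Employee and Department tables used earlier can be sketched as follows. It assumes SQLite (via Python's sqlite3 module) as a stand-in DBMS and uses the 'join ... on' syntax, which these notes do not otherwise use. Only the left form is shown, since older SQLite versions do not support right and full outer joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite as a stand-in DBMS
conn.execute("create table department (deptno integer, dname text, city text)")
conn.execute("create table employee (empid integer, ename text, deptno integer)")
conn.executemany("insert into department values (?, ?, ?)", [
    (10, "Research", "Gwalior"), (12, "Account", "Gwalior"), (13, "Managing", "Indore"),
])
conn.executemany("insert into employee values (?, ?, ?)", [
    (1, "Vivek", 12), (2, "Hermesh", 10), (3, "Priyesh", 10), (4, "Pawan", None),
])

# Ordinary join: Pawan is lost because his deptno matches no department.
inner = conn.execute(
    "select ename, dname from employee join department "
    "on employee.deptno = department.deptno order by empid"
).fetchall()

# Left outer join: every employee is kept, and the missing dname becomes null.
left = conn.execute(
    "select ename, dname from employee left outer join department "
    "on employee.deptno = department.deptno order by empid"
).fetchall()

print(inner)
print(left)
```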

19. Correlated queries: this is a special type of nested query in which the inner query is
executed for every row selected by the outer query.
Example: for the relation employee(empid, ename, deptno, salary), write a query to find the
names of the employees earning the highest salary in their department.
Select * from employee e1 where e1.salary = (select max(salary) from employee e2
where e1.deptno = e2.deptno)
Now suppose there are 10 records in the employee table selected by the outer query; then
the inner query is executed once for each of these records (i.e. 10 times in total). For
example, if the employee table is:

Empid  Ename     Deptno  Salary
1      Vivek     10      30000
2      Hermesh   20      20000
3      Priyesh   20      12000
4      Pawan     10      25000
5      Ravindra  30      30000
6      Aditya    30      10000
7      Mohan     20      25000

First the inner query runs for the 1st record, with 30000 substituted for e1.salary and 10
for e1.deptno. The inner query returns the maximum salary of department 10; the outer query
then compares the values and saves the result. The inner query then runs for the 2nd, 3rd,
..., 7th record in sequence. Finally the result is produced:

Empid  Ename     Deptno  Salary
1      Vivek     10      30000
5      Ravindra  30      30000
7      Mohan     20      25000
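
The correlated query can be run on the seven rows above to confirm the result. This is a sketch assuming SQLite (via Python's sqlite3 module) as a stand-in DBMS; how often the inner query is physically executed is up to the DBMS, but the result is the same as row-by-row evaluation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite as a stand-in DBMS
conn.execute("create table employee (empid integer, ename text, deptno integer, salary integer)")
conn.executemany("insert into employee values (?, ?, ?, ?)", [
    (1, "Vivek", 10, 30000), (2, "Hermesh", 20, 20000), (3, "Priyesh", 20, 12000),
    (4, "Pawan", 10, 25000), (5, "Ravindra", 30, 30000), (6, "Aditya", 30, 10000),
    (7, "Mohan", 20, 25000),
])

# The inner query refers to e1.deptno, so it depends on the current outer row.
top_earners = conn.execute("""
    select empid, ename, deptno, salary
    from employee e1
    where e1.salary = (select max(salary) from employee e2 where e1.deptno = e2.deptno)
    order by empid
""").fetchall()

print(top_earners)
```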

20. Grant and Revoke: used to decide the access permissions of other users for
select/update/delete/insert queries on tables and views.
Example:
Grant update, delete on employee to Rahul
This query gives Rahul permission to delete and update on the employee table.
Query: Grant all on employee to Rahul with grant option
This query gives Rahul permission for all operations on the employee table, and also
permission to grant these permissions to other users.
Similarly, Revoke is used to withdraw (cancel) a granted permission from a user.
Example: revoke update on employee from Rahul

21. Views: A SQL view is a virtual table, which is based on a SQL SELECT query. Essentially
a view is very close to a real database table (it has columns and rows just like a regular
table), except for the fact that the real tables store data, while the views don’t. The view’s
data is generated dynamically when the view is referenced. A view references one or more
existing database tables or other views. In effect every view is a filter of the table data
referenced in it and this filter can restrict both the columns and the rows of the referenced
tables.

Here is an example of how to create a SQL view using the already familiar employee and
department tables:
Create view employeeinfo as
Select empid, ename, department.deptno, city
From employee, department where employee.deptno = department.deptno

Importance of views: if we want other users to have access to only some fields of a table,
we create a view using only those fields and make the view, instead of the original table,
accessible to those users.

Question: Are views updatable?

For a view to be updatable, there are some conditions which the view should satisfy:
- It should be based on a single table.
- If it is created using two or more tables, then all primary key and not null fields of
  all the tables should be in the view.
- It should not contain aggregate functions.

22. Set operations: There are four set operations supported by SQL:
UNION ALL: Combines the results of two SELECT statements into one result set.
UNION: Combines the results of two SELECT statements into one result set, and then
eliminates any duplicate rows from that result set.
MINUS: Takes the result set of one SELECT statement, and removes those rows that are
also returned by a second SELECT statement. (MINUS is the Oracle keyword; the SQL standard
calls this operation EXCEPT.)
INTERSECT: Returns only those rows that are returned by each of two SELECT
statements.



SQL statements containing these set operators are referred to as compound queries, and
each SELECT statement in a compound query is referred to as a component query. Two
SELECTs can be combined into a compound query by a set operation only if they satisfy
the following two conditions:
1. The result sets of both the queries must have the same number of columns.
2. The datatype of each column in the second result set must match the datatype of its
corresponding column in the first result set.
Note: The datatypes do not need to be the same if those in the second result set can be
automatically converted by DBMS (using implicit casting) to types compatible with those
in the first result set.
These conditions are also referred to as union compatibility conditions. The term union
compatibility is used even though these conditions apply to other set operations as well.
Set operations are often called vertical joins, because the result stacks the rows of two or
more SELECTs on top of each other instead of placing their columns side by side. The
generic syntax of a query involving a set operation is:
<component query>
{UNION | UNION ALL | MINUS | INTERSECT}
<component query>
Example: select ename from employee where salary > 1000
intersect
select ename from employee where deptno = 10
Q19. The employee information in a company is stored in the relation

Employee (name, sex, salary, deptName)
Consider the following SQL query
Select deptName
From Employee
Where sex = 'male'
Group by deptName
Having avg(salary) >
(select avg(salary) from Employee)
It returns the names of the departments in which
(a) the average salary is more than the average salary in the company
(b) the average salary of male employees is more than the average salary of all male
employees in the company
(c) the average salary of male employees is more than the average salary of employees in
the same department
(d) the average salary of male employees is more than the average salary in the company
(CS2004)
Ans. (d)

RELATIONAL ALGEBRA
In order to implement a DBMS, there must exist a set of rules which state how the database
system will behave. For instance, somewhere in the DBMS there must be a set of statements
which indicate that when someone inserts data into a row of a relation, it has the effect
which the user expects. One way to specify this is to use words to write an 'essay' on how
the DBMS will operate, but words tend to be imprecise and open to interpretation. Instead,
relational databases are more usually defined using relational algebra.
Relational algebra is:
- the formal description of how a relational database operates
- an interface to the data stored in the database itself
- the mathematics which underpins SQL operations
Operators in relational algebra are not necessarily the same as SQL operators, even if they
have the same name. For example, the SELECT statement exists in SQL, and also exists in
relational algebra. These two uses of SELECT are not the same. The DBMS must take
whatever SQL statements the user types in and translate them into relational algebra
operations before applying them to the database.
Relational algebra is a procedural language.
Operators in relational algebra:
1. Project (Π): used to select a subset of the attributes of a relation by specifying the names
of the required attributes. It corresponds to the column list of SELECT in SQL.
Π_ename(employee)
This returns the set of ename values from the employee table. Its result is the same as the SQL query
Select ename from employee
The only difference is that SELECT can return duplicate values, while all relational algebra
operators work on sets, and a set does not contain duplicates, so the values returned by
project are distinct.

2. Select (σ): corresponds to the WHERE clause in SQL. The only difference is that the WHERE
clause merely states a condition, whereas σ returns the complete rows of the table that satisfy it.
σ_{salary>5000}(employee)
This relational expression returns the records of employees having salary > 5000.
If we want only the names of employees having salary > 5000, the relational
expression will be:
Π_ename(σ_{salary>5000}(employee))

Q20. Write a relational expression to find the names of employees of department number 10
having salary > 500.
Ans. Π_ename(σ_{salary>500 ∧ deptno=10}(employee))
∧ is the AND operator and ∨ is the OR operator.

3. Cartesian product(×):
Example: Πename,dname(employee × department)
This relational expression is same as sql statement:
Select ename ,dname from employee, department

4. Joins: in relational algebra special operators are used for joins. A natural join is performed by
equating the fields that have the same name in the two tables.
Natural join: ⋈
Full outer join: ⟗
Right outer join: ⟖
Left outer join: ⟕

5. Renaming operator (ρ): used for creating an alias or renaming a table or field in the output
(similar to the 'as' operator in SQL).

6. Group by (ℱ): the grouping attributes are written before the operator and the aggregate
function after it.
Example: write a relational expression to find the average salary of each department.
deptno ℱ avg(salary) (employee)
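
The project and select operators described above can be imitated in a few lines of Python. This is only a sketch: relations are modelled as lists of dictionaries, and the function names and employee rows are invented for illustration.

```python
# Relations are modelled as lists of dicts; the employee rows are made up.
employee = [
    {"ename": "Vivek", "salary": 6000, "deptno": 10},
    {"ename": "Priyesh", "salary": 4000, "deptno": 10},
    {"ename": "Vivek", "salary": 7000, "deptno": 20},
]

def select(relation, predicate):
    """Select (sigma): keep the complete rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """Project (pi): keep only the named attributes; a set drops duplicates."""
    return {tuple(row[a] for a in attributes) for row in relation}

# sigma salary>5000 (employee), then pi ename of that result
high_paid = select(employee, lambda r: r["salary"] > 5000)
names = project(high_paid, ["ename"])
print(names)
```

Two rows satisfy the selection, but the projection contains the name only once, which is the set behaviour that distinguishes project from a plain SQL SELECT.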
Q21. Let R1(A, B, C) and R2(D, E) be two relation schemas, where A and D are the primary keys,
and let C be a foreign key in R1 referring to R2. Suppose there is no violation of the above
referential integrity constraint in the corresponding relation instances r1 and r2. Which one of
the following relational algebra expressions would necessarily produce an empty relation?
(a) Π_D(r2) – Π_C(r1)
(b) Π_C(r1) – Π_D(r2)
(c) Π_D(r1 ⋈_{C=D} r2)
(d) Π_C(r1 ⋈_{C=D} r2) CS2004

Ans. b
Explanation: C is a foreign key referring to R2 (attribute D of R2), which means every value of C
is already present in D. Applying the MINUS operator as Π_C(r1) – Π_D(r2) therefore returns an empty set.
We can also check this by taking an example:
R1                R2
A    B    C       D    E
a1   b1   c1      c1   e1
a2   b2   c2      c2   e2
a3   b3   c2      c3   e2
a4   b4   c3      c4   e4
                  c5   e5

Π_D(r2) – Π_C(r1) returns {c4, c5}
Π_C(r1) – Π_D(r2) returns the empty set {}
Π_D(r1 ⋈_{C=D} r2) returns {c1, c2, c3}
Π_C(r1 ⋈_{C=D} r2) returns {c1, c2, c3}
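
The results for the sample instance can be reproduced with Python sets. This is a small sketch; the variable names are chosen for illustration and the tuples are the ones from the example tables above.

```python
r1 = [("a1", "b1", "c1"), ("a2", "b2", "c2"), ("a3", "b3", "c2"), ("a4", "b4", "c3")]
r2 = [("c1", "e1"), ("c2", "e2"), ("c3", "e2"), ("c4", "e4"), ("c5", "e5")]

proj_c_r1 = {c for (_, _, c) in r1}   # projection of r1 on C
proj_d_r2 = {d for (d, _) in r2}      # projection of r2 on D

option_a = proj_d_r2 - proj_c_r1      # D values never referenced by C
option_b = proj_c_r1 - proj_d_r2      # C values missing from D (none, by referential integrity)

# Join r1 and r2 on C = D, then project on D.
joined_d = {d for (_, _, c) in r1 for (d, _) in r2 if c == d}
print(option_a, option_b, joined_d)
```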

Q22. Consider the relation Student (name, sex, marks), where name is the primary key,
pertaining to students in a class that has at least one boy and one girl. What
does the following relational algebra expression produce? (Note: ρ is the rename operator.)
Π_name(σ_{sex=female}(Student)) – Π_name(Student ⋈_{sex=female ∧ x=male ∧ marks≤m} ρ_{n,x,m}(Student))
(a) names of girl students with the highest marks
(b) names of girl students with more marks than some boy student
(c) names of girl students with marks not less than some boy student
(d) names of girl students with more marks than all the boy students CS2004

Ans. d
Explanation: Π_name(σ_{sex=female}(Student)) returns the names of all female students.
Π_name(Student ⋈_{sex=female ∧ x=male ∧ marks≤m} ρ_{n,x,m}(Student)) returns the names of female
students whose marks are less than or equal to those of at least one male student.
Subtracting the result of the second query from the result of the first leaves the names of female
students whose marks are not less than or equal to those of any male student, i.e. the girls with
more marks than all the boys.

RELATIONAL CALCULUS
Relational calculus consists of two calculi, the tuple relational calculus and the domain
relational calculus, that are part of the relational model for databases and provide a
declarative way to specify database queries. This is in contrast to the relational algebra, which is
also part of the relational model but provides a more procedural way of specifying queries.
A relational calculus query specifies what is to be retrieved rather than how to retrieve it.
- No description of how to evaluate a query.
In first-order logic (or predicate calculus), predicate is a truth-valued function with
arguments. When we substitute values for the arguments, function yields an expression,
called a proposition, which can be either true or false. If predicate contains a variable (e.g. ‘x
is a member of staff’), there must be a range for x. When we substitute some values of this
range for x, proposition may be true; for other values, it may be false.

Tuple Relational Calculus: Interested in finding tuples for which a predicate is true. Based
on the use of tuple variables. A tuple variable is a variable that ‘ranges over’ a named relation, i.e. a
variable whose only permitted values are tuples of the relation. Specify the range of a tuple
variable S as the Staff relation as:
Staff(S)
To find the set of all tuples S such that P(S) is true:
{S | P(S)}
Examples:
To find details of all staff earning more than 10,000:
{S | Staff(S) ∧ S.salary > 10000}
To find a particular attribute, such as salary, write:
{S.salary | Staff(S) ∧ S.salary > 10000}

In relational calculus two quantifiers are used to tell how many instances the predicate
applies to:
– Existential quantifier ∃ (‘there exists’)
– Universal quantifier ∀ (‘for all’)
Tuple variables qualified by ∃ or ∀ are called bound variables; otherwise they are called free
variables.
The existential quantifier is used in formulae that must be true for at least one instance, such as:
Staff(S) ∧ (∃B)(Branch(B) ∧ (B.branchNo = S.branchNo) ∧ B.city = ‘London’)
This means ‘There exists a Branch tuple with the same branchNo as the branchNo of the current Staff
tuple, S, and it is located in London’.
The universal quantifier is used in statements about every instance, such as:
(∀B) (B.city ≠ ‘Paris’)
This means ‘For all Branch tuples, the address is not in Paris’.

These quantifiers can be used with the negation operator (~), as in
~(∃B) (B.city = ‘Paris’)
which means ‘There are no branches with an address in Paris’.

Examples:
List the names of all managers who earn more than £25,000.
{S.fName, S.lName | Staff(S) ∧ S.position = ‘Manager’ ∧ S.salary > 25000}

List the staff who manage properties for rent in Glasgow.
{S | Staff(S) ∧ (∃P) (PropertyForRent(P) ∧ (P.staffNo = S.staffNo) ∧ P.city = ‘Glasgow’)}

List the names of staff who currently do not manage any properties.
{S.fName, S.lName | Staff(S) ∧ (~(∃P) (PropertyForRent(P) ∧ (S.staffNo = P.staffNo)))}
Or
{S.fName, S.lName | Staff(S) ∧ ((∀P) (~PropertyForRent(P) ∨ ~(S.staffNo = P.staffNo)))}
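
Tuple relational calculus expressions read very much like Python set comprehensions: the existential quantifier corresponds to any() and the universal quantifier to all(). The sketch below assumes tiny, made-up Staff and PropertyForRent relations; only the attribute names are taken from the examples above.

```python
from collections import namedtuple

Staff = namedtuple("Staff", "staffNo fName lName position salary")
Property = namedtuple("Property", "propertyNo staffNo city")

# Invented sample tuples, used only to evaluate the three example queries.
staff = [
    Staff("S1", "Ann", "Beech", "Manager", 30000),
    Staff("S2", "John", "White", "Manager", 24000),
    Staff("S3", "Mary", "Howe", "Assistant", 9000),
]
properties = [Property("P1", "S1", "Glasgow"), Property("P2", "S3", "London")]

# Managers earning more than 25000.
rich_managers = {(s.fName, s.lName) for s in staff
                 if s.position == "Manager" and s.salary > 25000}

# Staff for whom there EXISTS a property they manage in Glasgow.
glasgow_staff = {s for s in staff
                 if any(p.staffNo == s.staffNo and p.city == "Glasgow" for p in properties)}

# Staff managing no property: NOT EXISTS form and the equivalent FOR ALL form.
idle_not_exists = {(s.fName, s.lName) for s in staff
                   if not any(p.staffNo == s.staffNo for p in properties)}
idle_for_all = {(s.fName, s.lName) for s in staff
                if all(p.staffNo != s.staffNo for p in properties)}
print(rich_managers, idle_not_exists)
```

Both forms of the last query give the same result, mirroring the equivalence between ~(∃P)(...) and (∀P)(~...).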
Expressions can generate an infinite set. For example:
{S | ~Staff(S)}
Expressions of this type are called unsafe expressions. To avoid them, add the restriction that all
values in the result must be values in the domain of the expression.

DOMAIN RELATIONAL CALCULUS
Uses variables that take values from domains instead of tuples of relations.
If F(d1, d2, . . . , dn) stands for a formula composed of atoms and d1, d2, . . . , dn represent
domain variables, then:
{d1, d2, . . . , dn | F(d1, d2, . . . , dn)}
is a general domain relational calculus expression.
Examples:
Find the names of all managers who earn more than £25,000.
{fn, ln | (∃sn, posn, sex, DOB, sal, bn)
(Staff (sn, fn, ln, posn, sex, DOB, sal, bn) ∧ posn = ‘Manager’ ∧ sal > 25000)}

Note: When restricted to safe expressions, domain relational calculus is equivalent to tuple
relational calculus restricted to safe expressions, which is equivalent to relational
algebra. This means every relational algebra expression has an equivalent safe relational calculus
expression, and vice versa.
If unsafe expressions are not restricted then relational calculus is more powerful than
relational algebra.

Q23. With regard to the expressive power of the formal relational query languages, which
of the following statements is true?
(a) Relational algebra is more powerful than relational calculus
(b) Relational algebra has the same power as relational calculus.
(c) Relational algebra has the same power as safe relational calculus.
(d) None of the above CS2002
Ans. c
Explanation: as stated in the note above, relational algebra is equivalent to relational calculus only
when the calculus is restricted to safe expressions. Option (b) places no restriction on unsafe
queries, which relational algebra cannot express.

FUNCTIONAL DEPENDENCY
Definition: A set of attributes X functionally determines a set of attributes Y if each value of X
determines a unique value of Y.
This is similar to functions in mathematics. In mathematics, f is said to be a valid
function when for every value of x, f(x) returns a single value of y. For example, the function y = x²
returns a single value of y for every value of x (at x=0 y=0, at x=1 y=1, at x=2 y=4). Now
consider y = ±√x: it returns two values of y for every positive value of x (for x=4
it returns y = -2 and y = +2), so it is not a valid function. For a valid function we can say x
functionally determines y.

Q24. Consider relation R(A,B,C) and the sample data in R
A    B    C
a1   b1   c1
a1   b2   c1
a2   b3   c2
a2   b4   c2
Which of the following dependencies hold in R?
1. A→B   2. B→A   3. B→C   4. A→C

Ans.
1. A→B – this functional dependency does not hold, since for a1 there are two values in
field B.
2. B→A – functional dependency holds
3. B→C – functional dependency holds
4. A→C – functional dependency holds

Note:by using sample data we can only decide which functional dependency is not holding. If
a functional dependency is holding in sample data then it may or may not hold in whole
relation.
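
Checking sample data for a violated dependency is mechanical: two rows that agree on the left side but differ on the right side disprove the dependency. A minimal sketch follows; the function name violates is an illustrative choice, and the rows are the Q24 sample data.

```python
def violates(rows, lhs, rhs):
    """Return True if two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value
        if seen.setdefault(key, value) != value:
            return True
    return False

sample = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a1", "B": "b2", "C": "c1"},
    {"A": "a2", "B": "b3", "C": "c2"},
    {"A": "a2", "B": "b4", "C": "c2"},
]

for lhs, rhs in [("A", "B"), ("B", "A"), ("B", "C"), ("A", "C")]:
    print(lhs, "->", rhs, "violated" if violates(sample, [lhs], [rhs]) else "not violated")
```

A result of "not violated" only means the sample does not contradict the dependency; as the note says, it does not prove that the dependency holds in the whole relation.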

Q25. From the following instance of a relation schema R(A,B,C), we can conclude that:

A   B   C
1   1   1
1   1   0
2   3   2
2   3   2

(a) A functionally determines B and B functionally determines C
(b) A functionally determines B and B does not functionally determine C
(c) B does not functionally determine C
(d) A does not functionally determine B and B does not functionally determine C
CS2002

Ans. c
Explanation: from an instance of a schema we can only prove that a particular functional
dependency does not hold; we cannot conclude that a functional dependency holds. So
options (a) and (b), which assert that A determines B, are wrong. Option (d) is wrong because the
instance does not violate A→B. Option (c) is right because for the value ‘1’ in B there are two
different values in C.

Some rules for functional dependencies:
1. Reflexivity: X→X always holds. It means an attribute or group of attributes always
functionally determines itself.
2. Transitivity: if X→Y and Y→Z then X→Z
3. Pseudo-transitivity: if X→Y and YZ→W then XZ→W
4. Additivity: if X→Y and X→Z then X→YZ
5. Projectivity: if X→YZ then X→Y and X→Z
6. Augmentation: if X→Y then XZ→Y

Where X, Y, Z and W are single attributes or groups of attributes of a relation.

Note: if AB→C then you can’t divide AB→C into A→C and B→C.

Closure of an attribute (*): the closure of an attribute contains all attributes that are directly or
indirectly derived from this attribute (using the above rules). Example: for a relation R(A,B,C,D), the
functional dependencies are: A→B, B→C, BC→D.
Closure of A: by the 1st rule an attribute derives itself, so its closure contains A. Now
from A→B, B can be directly derived from A. If A→B and B→C then A→C (2nd rule), so C can
be derived from A. Similarly, if A→B and A→C then A→BC (4th rule), and if A→BC and
BC→D then A→D, so D can be derived from A.
A* = {A,B,C,D}
Similarly, B* = {B,C,D}, C* = {C}, D* = {D}

If the closure of an attribute or group of attributes contains all attributes of the relation, and no
proper subset of the group has this property, then it is a
candidate key of the relation. In the above example A is a candidate key of R.

Note: How to find the closure of a group of attributes: suppose we want to find the closure of BC in
the above example; then the closure of BC contains the attributes directly or indirectly derived from B, C and
BC. {BC}* = {B,C,D}

Q26. Consider the following functional dependencies for relation R(A,B,C,D,E,F,G,H,I,J,K)
AB→C, A→DE, B→F, F→GH, D→IJ
Find the closure of AB.
Ans. {AB}* = {A,B,C,D,E,F,G,H,I,J}
Note: in the above example AB is not a candidate key of R since K is not in the closure of AB. K is also
not in any functional dependency. Attributes that are not in any functional dependency must
be part of the candidate key, so the candidate key of R is ABK.
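
The closure computation described above is a simple fixed-point loop. The sketch below represents each dependency as a (left side, right side) pair of attribute strings; the function name closure and this representation are illustrative choices, and the Q26 dependencies are read as AB→C, A→DE, B→F, F→GH, D→IJ.

```python
def closure(attrs, fds):
    """Attribute closure: keep adding right sides whose left side is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# R(A,B,C,D) with A->B, B->C, BC->D
fds_r = [("A", "B"), ("B", "C"), ("BC", "D")]
# Q26 dependencies (as read above)
fds_q26 = [("AB", "C"), ("A", "DE"), ("B", "F"), ("F", "GH"), ("D", "IJ")]

print(sorted(closure("A", fds_r)))
print(sorted(closure("AB", fds_q26)))
print(closure("ABK", fds_q26) == set("ABCDEFGHIJK"))
```

The last line checks that adding K to AB makes the closure cover every attribute of R, which is why ABK is the candidate key.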

Q27. In a schema with attributes A, B, C, D and E the following set of functional dependencies
is given.
A→B, A→C, CD→E, B→D, E→A
Which of the following functional dependencies is NOT implied by the above set?
(A) CD → AC
(B) BD → CD
(C) BC → CD
(D) AC → BC IT2005

Ans. B
Explanation: Find the closure of the attributes on the left of each option
(A) {CD}+ = {C,D,E,A,B} - AC is in the closure, so AC can be derived from CD
(B) {BD}+ = {B,D} - CD is not in the closure, so CD cannot be derived from BD
(C) {BC}+ = {B,C,D,E,A} - CD is in the closure, so CD can be derived from BC
(D) {AC}+ = {A,C,B,D,E} - BC is in the closure, so BC can be derived from AC

Minimal Cover: A minimal cover of a set of functional dependencies is an equivalent set of functional
dependencies which does not contain any redundant functional dependency. For example, suppose a
relation R(A,B,C) has the functional dependencies {A→B, B→C, A→C}. In this set A→C is
redundant because it can be derived from A→B and B→C, so we need not write this
functional dependency in the set. {A→B, B→C} is a minimal cover of the dependencies.

Steps to find Minimal Cover:
Consider the following functional dependencies of relation R(A,B,C,D,E,F)
A→C, AC→D, E→ADH
Step 1: convert all functional dependencies to simple form (if X→YZ then break it into X→Y
and X→Z). Now the functional dependencies for R are:
A→C, AC→D, E→A, E→D, E→H

Step 2: to check whether a functional dependency is redundant or not, first hide that
functional dependency from the set and then find the closure of the attributes on its left side
using only the remaining dependencies. If the closure still contains the attribute on the right side of
the hidden dependency, then the functional dependency is redundant; remove it
from the set.

First we check A→C. The dependencies remaining after hiding it are:
AC→D, E→A, E→D, E→H
The closure of A under these dependencies is {A}, which does not contain C, so A→C is not redundant.
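
Step 2 can be automated by hiding each dependency in turn and recomputing a closure. The sketch below covers only this redundancy check (it does not try to shrink left-hand sides); the function names are illustrative, and the input is the simple-form set obtained in Step 1.

```python
def closure(attrs, fds):
    """Attribute closure under a list of (lhs, rhs) attribute-string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def remove_redundant(fds):
    """Drop every dependency that the remaining ones already imply."""
    kept = list(fds)
    for fd in list(fds):
        others = [f for f in kept if f != fd]   # hide the dependency being tested
        lhs, rhs = fd
        if set(rhs) <= closure(lhs, others):    # still derivable without it
            kept = others
    return kept

simple_fds = [("A", "C"), ("AC", "D"), ("E", "A"), ("E", "D"), ("E", "H")]
print(remove_redundant(simple_fds))
```

Under these assumptions only E→D is dropped, because D can still be reached from E through E→A, A→C and AC→D.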

Problems in Unorganized Relation
Consider the relation student(Rollno, Name, CourseNo, CourseName) with (rollno, courseno)
as primary key. The following problems exist in this relation:
Data Redundancy: if one course is assigned to many students then that course name and
course number will be repeated in many records of the table. This causes the following anomalies in the table:
1. Insertion anomaly: we can’t insert a new course until at least one student registers for it.
2. Deletion anomaly: if we delete a course from the table then student information may be
lost.
3. Updation anomaly: if we want to change the name of a course then we have to
change it in all the records of the students who are assigned to that course.

Normalization: To remove data redundancy and anomalies we normalize a table by
decomposing it into multiple tables. The following normal forms are defined for normalization:

1st Normal Form: a relation is said to be in 1st normal form if its data is represented in
tabular form with atomic values, and no row is duplicated (a whole row should not be
repeated; two rows must differ in the value of at least one field).
Example:
Example:
Consider the following data in the employee table.

Empid   Ename   Job               Salary
1       Vivek   Programmer        30000
                Analyzer          20000
                Project manager   12000

The above table appears to be in tabular form but it is not. A table is in tabular
form if for every row each column has a single value.
The above table will be in 1NF if it is represented as

Empid   Ename   Job               Salary
1       Vivek   Programmer        30000
1       Vivek   Analyzer          20000
1       Vivek   Project manager   12000

2nd Normal Form (2NF): To understand 2NF first look at these terms:
Consider relation Student(Rollno, Name, CourseNo, CourseName, Deptid, DeptName)
• Prime attribute: attributes that are part of a candidate key/primary key but are not themselves a
candidate key. For example, if {rollno, courseno, deptid} is the candidate key of the student
relation then rollno, courseno, deptid, {rollno, courseno}, {rollno, deptid} and
{courseno, deptid} are the prime attributes. In other words, prime attributes are the proper
subsets of candidate keys. (In standard usage, any single attribute that belongs to some candidate
key is called prime; these notes extend the term to groups of such attributes.)
• Determinant: in a functional dependency X → Y, X is the determinant (the attributes at the
tail (left side) of the arrow).
• Partial dependency: a functional dependency is said to be partial when the determinant is a
prime attribute and the right side of the arrow has a non-prime attribute. Consider the following
functional dependencies for the student relation defined above, with {rollno, courseno,
deptid} as candidate key.
Rollno → name                      partial (prime → non-prime)
Rollno, courseno, deptid → name    not partial (whole candidate key → non-prime)
Rollno → courseno                  not partial (prime → prime)
Name, courseno → rollno            not partial (non-prime → prime)

A relation is said to be in 2NF if and only if it is in 1NF and every non-key attribute is fully
dependent on the primary key. In other words, a relation is said to be in 2NF if and only
if it is in 1NF and there exists no partial dependency.

A relation in 2NF may still have redundancy and suffer from anomalies.

Note: if all candidate keys have a single attribute, then no partial dependency is possible and the
relation will be in 2NF.

3rd Normal Form (3NF): A relation R is in third normal form (3NF) if and only if it is in 2NF
and every non-key (non-prime) attribute is non-transitively dependent on the primary key.
A functional dependency X→Y does not violate the 3NF condition if either X is a candidate key
(more generally, a superkey) or Y is a prime attribute, where X and Y are attributes or groups of
attributes. If any of the functional dependencies violates the 3NF condition then the relation is not in 3NF.
An attribute C is transitively dependent on attribute A if there exists an attribute B such that
A→B and B→C. Note that 3NF is concerned with transitive dependencies which do not
involve candidate keys.
If a 3NF relation has more than one candidate key then it can have transitive dependencies
of the form: primary_key → other_candidate_key → any_non-key_column.
A relation R having just one candidate key is in third normal form (3NF) if and only if the
non-key attributes of R (if any) are:
1) mutually independent (attributes that are not present in any functional dependency are
mutually independent), and
2) fully dependent on the primary key of R.
A non-key attribute is any column which is not part of the primary key. Two or more
attributes are mutually independent if none of the attributes is functionally dependent on any
of the others.

A relation R having just one candidate key is in third normal form (3NF) if and only if no
non-key (non-prime) column (or group of columns) determines another non-key (non-prime)
column (or group of columns).

Example: consider a relation ShipDetails (Ship, Capacity, Date, Cargo, Value) with the following
functional dependencies:
Ship, Date → Cargo, Capacity
Cargo → Value
Capacity → Value
To find whether the given relation is in 3NF or not, first find all candidate keys of the relation using
closure of attributes, then find whether the relation is in 2NF or not, then check for 3NF.
Step 1: the candidate key of the above relation is {Ship, Date}.
Step 2: There is no partial dependency, so the relation is in 2NF.
Step 3:
Ship, Date → Cargo, Capacity    does not violate 3NF (candidate key → non-prime attribute)
Cargo → Value                   violates 3NF (non-prime → non-prime)
Capacity → Value                violates 3NF (non-prime → non-prime)

Relation ShipDetails is not in 3NF.

A relation in 3NF does not have any anomalies but it still has redundancy.
Boyce-Codd Normal Form (BCNF): A relation is in BCNF if every non-trivial functional
dependency X→Y that holds in it has a superkey X on its left side. BCNF is stricter than 3NF.

The strength of the normal forms can be compared as
1NF < 2NF < 3NF < BCNF

Q28. Consider the following functional dependencies in a database.
Date_of_Birth → Age                   Age → Eligibility
Name → Roll_Number                    Roll_Number → Name
Course_Number → Course_Name           Course_Number → Instructor
(Roll_Number, Course_Number) → Grade

The relation (Roll_Number, Name, Date_of_Birth, Age) is
(A) in second normal form but not in third normal form
(B) in third normal form but not in BCNF
(C) in BCNF
(D) in none of the above CS2003

Ans. D
Explanation: the functional dependencies applicable to the relation (Roll_Number, Name,
Date_of_Birth, Age) are:

Date_of_Birth → Age
Name → Roll_Number
Roll_Number → Name

To find the normal form of a relation we should apply the tests from the lowest level upwards. First
apply the test for 2NF.
The candidate keys of the relation are: {Name, Date_of_Birth} and {Roll_Number, Date_of_Birth}
Now check for partial dependencies
Date_of_Birth → Age       - partial (part of a key → non-prime)
Name → Roll_Number        - not partial (prime → prime)
Roll_Number → Name        - not partial (prime → prime)

A partial dependency exists in the relation, so the relation is not in 2NF and therefore it is
in neither 3NF nor BCNF.

Q29. The relation scheme Student Performance (name, courseNo, rollNo, grade) has
the following functional dependencies:
name, courseNo → grade
rollNo, courseNo → grade
name → rollNo
rollNo → name
The highest normal form of this relation scheme is
(a) 2 NF (b) 3 NF (c) BCNF (d) 4 NF CS2004

Ans. b
Explanation: the candidate keys of the relation are: {name, courseNo} and {rollNo, courseNo}
First apply the test for 2NF.
name, courseNo → grade      - not partial (whole candidate key → non-prime)
rollNo, courseNo → grade    - not partial (whole candidate key → non-prime)
name → rollNo               - not partial (prime → prime)
rollNo → name               - not partial (prime → prime)
No partial dependency exists in this relation, so the relation is in 2NF.
Now check for 3NF: for every X→Y either X is a candidate key or Y is a prime attribute.
name, courseNo → grade      - not violating 3NF (candidate key on the left side)
rollNo, courseNo → grade    - not violating 3NF (candidate key on the left side)
name → rollNo               - not violating 3NF (prime attribute on the right side)
rollNo → name               - not violating 3NF (prime attribute on the right side)
No dependency violates the 3NF condition, so the relation is in 3NF.

Now check for BCNF: for every X→Y, X should be a super key.
name, courseNo → grade      - not violating BCNF (super key on the left side)
rollNo, courseNo → grade    - not violating BCNF (super key on the left side)
name → rollNo               - violating BCNF
rollNo → name               - violating BCNF
The relation is not in BCNF.
The highest normal form of the relation is 3NF.
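
The tests applied by hand in Q28 and Q29 can be sketched in code. The version below is only an illustration under stated assumptions: it finds candidate keys by brute force, uses the standard meaning of prime attribute (a single attribute belonging to some candidate key), and checks only the dependencies given, which is sufficient for these small examples. All function names are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure under a list of (lhs, rhs) pairs of attribute tuples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole relation (brute force)."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == set(attributes):
                keys.append(combo)
    return keys

def highest_normal_form(attributes, fds):
    keys = candidate_keys(attributes, fds)
    prime = {a for k in keys for a in k}

    def is_superkey(x):
        return closure(x, fds) == set(attributes)

    bcnf = all(is_superkey(l) or set(r) <= set(l) for l, r in fds)
    third = all(is_superkey(l) or set(r) - set(l) <= prime for l, r in fds)
    # 2NF: no proper subset of a candidate key determines a non-prime attribute
    second = all(not ((closure(sub, fds) - set(sub)) - prime)
                 for k in keys for n in range(1, len(k)) for sub in combinations(k, n))
    if bcnf:
        return "BCNF"
    if third:
        return "3NF"
    return "2NF" if second else "1NF"

q29 = [(("name", "courseNo"), ("grade",)), (("rollNo", "courseNo"), ("grade",)),
       (("name",), ("rollNo",)), (("rollNo",), ("name",))]
q28 = [(("Date_of_Birth",), ("Age",)), (("Name",), ("Roll_Number",)),
       (("Roll_Number",), ("Name",))]
print(highest_normal_form(["name", "courseNo", "rollNo", "grade"], q29))
print(highest_normal_form(["Roll_Number", "Name", "Date_of_Birth", "Age"], q28))
```

For Q29 the sketch reports 3NF, and for the Q28 relation it reports only 1NF, which corresponds to option (D), none of the above.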

Desirable Properties of Decomposition:
Lossy and lossless-join decomposition: if the divided tables are not able to reproduce the original
table after a join then the decomposition of the table is lossy. This does not mean that data is lost
after joining the tables; rather, extra spurious tuples may be produced.
Consider the following relation
enrol (sno, cno, date-enrolled, room-No., instructor)
Sno      cno     date-enrolled   room-No.   instructor
830057   CP302   1FEB1984        MP006      Gupta
830057   CP303   1FEB1984        MP006      Jones
820159   CP302   10JAN1984       MP006      Gupta
825678   CP304   1FEB1984        CE122      Wilson
826789   CP305   15JAN1984       EA123      Smith

Suppose we decompose the above relation into two relations enrol1 and enrol2 as follows
enrol1 (sno, cno, date-enrolled)
enrol2 (date-enrolled, room-No., instructor)
There are problems with this decomposition but we wish to focus on one aspect at the
moment.
Let the decomposed relations enrol1 and enrol2 be:

enrol1
Sno      Cno     date-enrolled
830057   CP302   1FEB1984
830057   CP303   1FEB1984
820159   CP302   10JAN1984
825678   CP304   1FEB1984
826789   CP305   15JAN1984

enrol2
date-enrolled   room-No.   instructor
1FEB1984        MP006      Gupta
1FEB1984        MP006      Jones
10JAN1984       MP006      Gupta
1FEB1984        CE122      Wilson
15JAN1984       EA123      Smith

All the information that was in the relation enrol appears to be still available in enrol1 and
enrol2 but this is not so. Suppose we wanted to retrieve the student numbers of all students
taking a course from Wilson; we would need to join enrol1 and enrol2. The join would have 11
tuples as follows:
Sno      Cno     date-enrolled   room-No.   instructor
830057   CP302   1FEB1984        MP006      Gupta
830057   CP302   1FEB1984        MP006      Jones
830057   CP303   1FEB1984        MP006      Gupta
830057   CP303   1FEB1984        MP006      Jones
830057   CP302   1FEB1984        CE122      Wilson
830057   CP303   1FEB1984        CE122      Wilson
825678   CP304   1FEB1984        MP006      Gupta
825678   CP304   1FEB1984        MP006      Jones
825678   CP304   1FEB1984        CE122      Wilson
820159   CP302   10JAN1984       MP006      Gupta
826789   CP305   15JAN1984       EA123      Smith
The join contains a number of spurious tuples that were not in the original relation enrol.
Because of these additional tuples, we have lost the information about which students take
courses from Wilson. (Yes, we have more tuples but less information, because we are
unable to say with certainty who is taking courses from Wilson.) Such decompositions are
called lossy decompositions.
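
The spurious tuples can be reproduced by projecting the enrol data and joining the two projections again. The following sketch uses plain Python sets; the variable names are illustrative and the data is the enrol table shown above.

```python
enrol = [
    ("830057", "CP302", "1FEB1984", "MP006", "Gupta"),
    ("830057", "CP303", "1FEB1984", "MP006", "Jones"),
    ("820159", "CP302", "10JAN1984", "MP006", "Gupta"),
    ("825678", "CP304", "1FEB1984", "CE122", "Wilson"),
    ("826789", "CP305", "15JAN1984", "EA123", "Smith"),
]

# The two projections that make up the decomposition.
enrol1 = {(sno, cno, date) for (sno, cno, date, _, _) in enrol}
enrol2 = {(date, room, instr) for (_, _, date, room, instr) in enrol}

# Natural join of enrol1 and enrol2 on date-enrolled.
joined = {(sno, cno, d1, room, instr)
          for (sno, cno, d1) in enrol1
          for (d2, room, instr) in enrol2 if d1 == d2}

spurious = joined - set(enrol)
wilson_students = {sno for (sno, _, _, _, instr) in joined if instr == "Wilson"}
print(len(joined), len(spurious), sorted(wilson_students))
```

In the original relation only student 825678 takes a course from Wilson, but after the join student 830057 appears as well, which is exactly the loss of information described above.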

A decomposition must be lossless.

How to check whether a decomposition is a lossy or lossless-join decomposition
For this we have to check whether the decomposed tables are able to reproduce the original table or
not. Suppose we have relation R(A,B,C,D,E) with functional dependencies:
A→B, A→C, D→C, D→E
Let us decompose R into two tables R1(A,B,D) and R2(C,D,E).
Step 1: Create a table with one row per decomposed relation and one column per
attribute of R.
      A   B   C   D   E
R1
R2

Step 2: Now put X into cell (m,n) where m is a decomposed relation and n is a field which is
present in relation m.
      A   B   C   D   E
R1    X   X       X
R2            X   X   X

Step 3: Now search for all columns in the table which have X in two rows (which is D here).
Step 4: Find those functional dependencies which have the column found in step 3 on the left side
(D→C and D→E in the above example).
For each such dependency, if any of the rows selected in step 3 has X in a column on the right side
of the dependency, put X into that column for the other selected rows as well (C and E for the rows
selected through D).
      A   B   C   D   E
R1    X   X   X   X   X
R2            X   X   X

Step 5: repeat steps 3-4 until no further filling is possible.

If any of the rows contains X in all columns then the decomposition is lossless-join, else it is lossy.

Suppose we decompose R into R1(A,B,C) and R2(A,D,E); then the final table for it will be
      A   B   C   D   E
R1    X   X   X
R2    X   X   X   X   X

This means that this decomposition is also a lossless join.

If we divide R into R1(A,B,C) and R2(C,D,E) then the final table for it will be
      A   B   C   D   E
R1    X   X   X
R2            X   X   X

No further filling of the table is possible because there is no functional dependency in the relation
having C as determinant, so this decomposition is lossy.
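
The table-filling steps can be written as a short loop. The sketch below keeps, for every row of the table, the set of columns that currently hold X. It is a simplified form of the textbook chase test: it matches the procedure above for the two-relation examples, and the function name is_lossless is an illustrative choice.

```python
def is_lossless(attributes, fds, decomposition):
    """Fill the X table: table[i] is the set of columns marked X in row i."""
    table = [set(r) for r in decomposition]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # rows that have X in every column of the left side
            rows = [row for row in table if set(lhs) <= row]
            if len(rows) < 2:
                continue
            # an X held by any of these rows in a right-side column is copied to all of them
            fill = set().union(*rows) & set(rhs)
            for row in rows:
                if not fill <= row:
                    row |= fill
                    changed = True
    return any(row == set(attributes) for row in table)

fds = [("A", "B"), ("A", "C"), ("D", "C"), ("D", "E")]
print(is_lossless("ABCDE", fds, ["ABD", "CDE"]))
print(is_lossless("ABCDE", fds, ["ABC", "ADE"]))
print(is_lossless("ABCDE", fds, ["ABC", "CDE"]))
```

The three calls correspond to the three decompositions discussed above: the first two are lossless and the last one is lossy.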

Dependency Preserving: when we decompose a table into multiple tables, every
dependency of the original table must be preserved (every dependency must be satisfied by at
least one decomposed table). In the previous example, if we divide R into R1(A,B,C) and
R2(A,D,E) then the dependency D→C is satisfied by neither R1 nor R2, because neither of them
contains D and C together. This decomposition is not dependency preserving.
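
Whether a single dependency survives a decomposition can be tested with the standard iterative closure check: starting from the left side, repeatedly add whatever can be derived inside each decomposed relation. This is a sketch with illustrative function names, using the relation R and its dependencies from the example above.

```python
def closure(attrs, fds):
    """Attribute closure under a list of (lhs, rhs) attribute-string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_preserved(fd, fds, decomposition):
    """Check one dependency using only what each decomposed relation can enforce."""
    lhs, rhs = fd
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for r in decomposition:
            # attributes of r derivable from the part of z that lies inside r
            gained = closure(z & set(r), fds) & set(r)
            if not gained <= z:
                z |= gained
                changed = True
    return set(rhs) <= z

fds = [("A", "B"), ("A", "C"), ("D", "C"), ("D", "E")]
decomposition = ["ABC", "ADE"]
for fd in fds:
    print(fd, is_preserved(fd, fds, decomposition))
```

Only D→C is reported as lost, in agreement with the discussion above.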

How to decompose a relation into BCNF
A relation is not in BCNF when some functional dependency of the relation does not satisfy the
condition of BCNF. Consider a relation R(A,B,C,D) with the following dependencies:
A→B
B→C
C→D
From this set of dependencies we can find the primary key of R, which is A, because the closure of A is
A* = {A,B,C,D}
In this set of functional dependencies B→C and C→D violate the condition of BCNF.
Take the dependencies that violate the BCNF condition one by one, create a separate
table containing the attributes of the functional dependency (the attributes on the left of the functional
dependency form its key), and remove the attribute on the right of the functional dependency from
the original table.
First we take C→D. Create a separate table for this dependency, R1(C,D), and remove D from R;
now the remaining attributes in R are {A,B,C}.
Next we take B→C. Create a separate table for this dependency, R2(B,C), and remove C from R;
now the remaining attributes in R are {A,B}.
So finally three tables are created: R(A,B), R1(C,D), R2(B,C). These tables are now in
BCNF. This decomposition is lossless and dependency preserving.

Suppose we had taken B→C first instead of C→D; then R and R1 after the first
decomposition would be
R(A,B,D) R1(B,C)
Now C→D is held by neither R nor R1, because no table contains both C and D, so this
dependency is lost and no later step can recover it. This decomposition is lossless but not
dependency preserving.

BCNF decomposition is lossless but may or may not be dependency preserving.

Note: if a relation R has no functional dependency then the highest normal form supported
by such a relation is BCNF.

Q30. Relation R with an associated set of functional dependencies, F, is decomposed
into BCNF. The redundancy (arising out of functional dependencies) in the resulting set of
relations is
(a) Zero
(b) More than zero but less than that of an equivalent 3NF decomposition
(c) Proportional to the size of F+
(d) Indeterminate CS2002

Ans. a
Explanation: if a relation is in BCNF then no redundancy arising from functional dependencies is left
in the relation, but if a relation is only in 3NF then there can still be redundancy.

Q31. Relation R is decomposed using a set of functional dependencies, F, and relation S
is decomposed using another set of functional dependencies, G. One decomposition is
definitely BCNF, the other is definitely 3NF, but it is not known which is which. To make a
guaranteed identification, which one of the following tests should be used on the
decompositions?
(Assume that the closures of F and G are available.)
(a) Dependency-preservation
(b) Lossless-join
(c) BCNF definition
(d) 3NF definition CS2002

Ans. c
Explanation: if we apply the BCNF test to both decompositions then only one of them will pass the
test (the one which is in BCNF); the other will fail (the one which is only in 3NF).

Q32. Which one of the following statements about normal forms is FALSE?
(a) BCNF is stricter than 3NF
(b) Lossless, dependency-preserving decomposition into 3NF is always possible
(c) Lossless, dependency-preserving decomposition into BCNF is always possible
(d) Any relation with two attributes is in BCNF CS2005

Ans. c

TRANSACTION MANAGEMENT
Transactions isA sequence of many actions which are considered to be one atomic unit of
work.Transacttion in DBMS uses following operations:
– Read, write, commit, abort
Each transaction has a unique starting point, some actions and one end point.A transaction is
a unit of work which completes as a unit or fails as a unit.

Properties of transactions (ACID)
• Atomicity: All actions in the transaction happen, or none happen. In other words, a
transaction either completes and is committed, or fails and is rolled back. E.g. a money
transfer debits one account and credits the other; either both the debit and the credit
succeed, or neither does. Transaction failure is called abort. Commit and abort are
irrevocable actions; there is no undo for them. An abort undoes the operations that have
already been executed: for database operations, it restores each data item's value from
before the transaction (rollback). A Rollback command undoes all actions taken since the
last commit for that user. But some real-world operations are not undoable. Examples:
transfer money, print ticket, fire missile.
• Consistency: If each transaction is consistent, and the DB starts consistent, it ends up
consistent. Consistency preservation is a property of a transaction, not of the database
mechanisms for controlling it (unlike the A, I, and D of ACID). If each transaction
maintains consistency, then a serial execution of transactions does also. A database state
consists of the complete set of data values in the database. A database state is consistent
if the database obeys all the integrity constraints. A transaction brings the database from
one consistent state to another consistent state.
• Isolation: Execution of one transaction is isolated from that of other transactions.
• Durability: If a transaction commits, its effects persist. When a transaction commits, its
results will survive failures (e.g. of the application, OS, DB system … even of the disk).
Durability makes it possible for a transaction to be a legal contract. Implementation is
usually via a log:
– The DB system writes all transaction updates to a log file. To commit, it adds a record
"commit(Ti)" to the log. When the commit record is on disk, the transaction is committed.
The system waits for the disk acknowledgement before acknowledging to the user.

A transaction can be in one of five states:
1. Active: transaction is started and is issuing reads and writes to the database.
2. Partially committed: operations are done and values are ready to be written to the
database.
3. Committed: writing to the database is permitted and successfully completed.
4. Aborted: the transaction or the system detects a fatal error.
5. Terminated: transaction leaves the system.

A transaction reaches its commit point when all operations accessing the database are
completed and the result has been recorded in the log. It then writes a [commit, T] record
and terminates.
When a system failure occurs, search the log file for entries [start, T]; if there is no
logged entry [commit, T], then undo all operations of T that have logged entries
[write, T, X, old_value, new_value].

Durability is a hardware aspect, while consistency is a programming aspect (the programmer
should design tables and write queries in such a way that consistency of the database is
maintained).
To guarantee the ACID properties, the following mechanisms are used in a DBMS:
Concurrency Control – Guarantees Consistency and Isolation, given Atomicity.
Logging and Recovery – Guarantees Atomicity and Durability.

Concurrency Control: concurrent transactions cause various problems if they run in an
uncontrolled manner. Consider two transactions T1 and T2 running concurrently; the
following problems may occur:
• Lost update
– Two transactions simultaneously update the same data, and one update overwrites the other
• Uncommitted update
– Transaction 2 uses the result updated by transaction 1
– Transaction 1 aborts and rolls back
– Transaction 2 commits
• Inconsistent analysis
– Transaction 1 reads
– Transaction 2 reads and uses for calculation
– Transaction 1 updates and commits
– Transaction 2 updates and commits
Consider the following two transactions on the Bank table:
T1: Update bank set bal=3000 where accountno=10
T2: Update bank set bal=4000 where accountno=10

If two transactions are valid and they are executed serially (either <T1,T2> or <T2,T1> in
the above case), then the system will always move from one valid state to another valid
state. This type of execution is called serial execution and the schedule is a serial schedule.
A concurrent schedule is called serializable if it behaves like (is equivalent to) a serial
schedule.
Consider following transaction T1 and T2
T1 T2
Read A Read A
A=A+30 A=A*5
Write A Write A
Read B Read B
B=B-30 B=B/5
Write B Write B
There are two possible serial schedules for the above two transactions:
S1:<T1 ,T2> execute T1 first then T2
S2:<T2 ,T1> execute T2 first then T1
Let us analyze these schedules, assuming that initially A is 100 and B is 200.

S1:<T1 ,T2>   Initially   After T1   After T2
Value of A    100         130        650
Value of B    200         170        34

S2:<T2 ,T1>   Initially   After T2   After T1
Value of A    100         500        530
Value of B    200         40         10
These two schedules are not equivalent because T1 and T2 read different values of A and B
(in S1, transaction T1 reads A=100 and B=200, while in S2 it reads A=500 and B=40).
But both schedules are valid because they are serial: a serial schedule always leaves the
database in a consistent state.
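The two serial orders can be replayed in a few lines of code. The following is a minimal sketch, not part of the original notes: the function names t1, t2 and run_serial are hypothetical, the database is modelled as a plain dictionary, and integer division is assumed for B/5.

```python
def t1(db):
    # T1: A = A + 30, then B = B - 30
    db["A"] = db["A"] + 30
    db["B"] = db["B"] - 30


def t2(db):
    # T2: A = A * 5, then B = B / 5 (integer division assumed)
    db["A"] = db["A"] * 5
    db["B"] = db["B"] // 5


def run_serial(order, initial):
    """Run the transactions one after another on a copy of the initial state."""
    db = dict(initial)
    for txn in order:
        txn(db)
    return db


initial = {"A": 100, "B": 200}
s1 = run_serial([t1, t2], initial)  # S1: T1 then T2
s2 = run_serial([t2, t1], initial)  # S2: T2 then T1
print("S1:", s1)
print("S2:", s2)
```

Both orders finish in a valid state, but the final values differ, which is why S1 and S2 are not equivalent to each other.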

Now consider the concurrent schedule S3 for transactions T1 and T2 of the above example:

T1            T2
Read A
A=A+30
Write A
              Read A
              A=A*5
              Write A
Read B
B=B-30
Write B
              Read B
              B=B / 5
              Write B

Now analyze this using the same initial values.

S3           Initially   After first   After first   After second   After second
                         half of T1    half of T2    half of T1     half of T2
Value of A   100         130           650           650            650
Value of B   200         200           200           170            34

This schedule is equivalent to the serial schedule S1 (the values of A and B read by T1 and
T2 are the same as in S1).
A schedule is called serializable if it is equivalent to a serial schedule. So S3 is serializable.
In other words, if changing the order of instructions in a serial schedule results in a
concurrent schedule that exhibits the same behavior as that serial schedule, then the
concurrent schedule is a serializable schedule.

Testing for serializability: There are two types of serializability.
1. Conflict serializability
2. View serializability

Conflict Serializable: Conflicting actions are pairs of actions on the same data item whose
relative order must not be changed if serializability is to be maintained. In the above
example the two transactions are concurrent and execute by interleaving their actions, but
for data item A the sequence T1--->T2 is maintained, and similarly for data item B the
sequence T1--->T2 is maintained. So all actions on the data items execute in the same
sequence as in a serial schedule. This schedule is conflict serializable.

There are three types of conflict that can occur in a schedule.
i. Read-write conflict: A transaction T1 reads a data item (say A) and then another
transaction T2 writes A. These two read-write actions are in conflict.
ii. Write-read conflict: A transaction T1 writes a data item (say A) and then another
transaction T2 reads A. These two write-read actions are in conflict.
iii. Write-write conflict: A transaction T1 writes a data item (say A) and then another
transaction T2 writes A, or vice versa. These two write-write actions are in conflict.
Example: consider the following schedule S4

Step  T1          T2
1     Read A
2     A=A+30
3                 Read A
4                 A=A*5
5     Write A
6     Read B
7     B=B-50
8     Write B
9                 Write A
10                Read B
11                B=B / 5
12                Write B

In this schedule there are three conflicts between the transactions (3-5, 5-9 and 8-10).
Each is drawn as an arrow, and the arrow indicates the sequence in which the actions
conflict.
To find conflicts, check from the start of the schedule: pick a read/write action from the
beginning of the schedule, then find the next conflicting action on that data item. If the
conflicting actions are in the same transaction then do nothing (1 and 5, 6 and 8). If the
conflicting actions are in different transactions then draw an arrow from the earlier to
the later action (e.g. 3-5 and 8-10).
In this example (8-12) is also a pair of conflicting actions, but we have already drawn the
arrow (8-10), where 10 comes before 12 and is in the same transaction T2, so there is no
need to draw an arrow for (8-12).

Conflicting actions show that if you make a new schedule S5 by swapping actions of the
transactions in schedule S4, while the order of the conflicting actions (3-5, 5-9 and 8-10)
remains the same, then the new schedule S5 will be equivalent to S4.
Consider a new schedule S5 created by swapping some instructions in S4:

Step  T1          T2
1     Read A
2                 Read A
3     A=A+30
4     Write A
5     Read B
6                 A=A*5
7     B=B-50
8                 Write A
9     Write B
10                Read B
11                B=B / 5
12                Write B

This schedule is equivalent to S4 since the order of the conflicting actions is the same
as in S4.

To find whether a schedule is conflict serializable or not, draw its dependency graph. This
graph is directed; it contains the transactions as nodes and the conflicts as edges (edges
can be labelled with the name of the data item on which the conflict occurs). For S4 and S5
the dependency graph has the edges T2 ---> T1 (on A, from conflict 3-5) and T1 ---> T2 (on
A from conflict 5-9 and on B from conflict 8-10).

If the dependency graph contains a cycle then the schedule is not conflict serializable.
This graph contains a cycle, so S4 and S5 are not conflict serializable.
If the dependency graph does not contain a cycle, then a topological sort of the graph
gives the serial schedule to which the schedule is equivalent.
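The dependency-graph test can be sketched in code. The functions below are illustrative (the names precedence_edges and has_cycle are not from the notes), and the schedules list only the read/write steps, with the transaction of each step in S4 taken from the conflict pairs quoted in the text.

```python
def precedence_edges(schedule):
    """schedule: list of (transaction, op, item) with op "R" or "W".
    Returns the set of edges (earlier_txn, later_txn), one per conflicting pair."""
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            # Two actions conflict if they belong to different transactions,
            # touch the same item, and at least one of them is a write.
            if ti != tj and item_i == item_j and "W" in (op_i, op_j):
                edges.add((ti, tj))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph given by edges."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    state = {}  # node -> "visiting" or "done"

    def visit(node):
        state[node] = "visiting"
        for nxt in graph[node]:
            if state.get(nxt) == "visiting":
                return True
            if nxt not in state and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in graph)


# S3: every action of T1 on an item precedes T2's actions on that item.
s3 = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A"),
      ("T1", "R", "B"), ("T1", "W", "B"), ("T2", "R", "B"), ("T2", "W", "B")]
# S4: T2 reads A before T1 writes it, then T2 overwrites A.
s4 = [("T1", "R", "A"), ("T2", "R", "A"), ("T1", "W", "A"),
      ("T1", "R", "B"), ("T1", "W", "B"),
      ("T2", "W", "A"), ("T2", "R", "B"), ("T2", "W", "B")]

print(precedence_edges(s3), has_cycle(precedence_edges(s3)))
print(precedence_edges(s4), has_cycle(precedence_edges(s4)))
```

For S3 the only edge is T1 to T2, so there is no cycle; for S4 both directions appear, so the graph is cyclic.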

Q33. Consider three data items D1, D2, and D3, and the following execution schedule of transactions
T1,T2,and T3. In the diagram, R(D) and W(D) denotes the actions reading and writing the data item D
respectively.
T1          T2          T3
            R(D3)
            R(D2)
            W(D2)
                        R(D2)
                        R(D3)
R(D1)
W(D1)
                        W(D2)
                        W(D3)
            R(D1)
R(D2)
W(D2)
            W(D1)
Which of the following statements is correct?


(A) The schedule is serializable as T2;T3;T1
(B) The schedule is serializable as T2;T1;T3
(C) The schedule is serializable as T3;T2;T1
(D) The schedule is not serializable. CS2003

Ans. D
Explanation: If a schedule is conflict serializable then the schedule is serializable. So
first we apply the conflict serializability test to the schedule.
Step 1. Find all conflicts

T1          T2          T3
            R(D3)
            R(D2)
            W(D2)
                        R(D2)
                        R(D3)
R(D1)
W(D1)
                        W(D2)
                        W(D3)
            R(D1)
R(D2)
W(D2)
            W(D1)

T2 writes D2 before T3 reads it: edge T2 ---> T3.
T3 writes D2 before T1 reads it: edge T3 ---> T1.
T1 writes D1 before T2 reads it: edge T1 ---> T2.

Step 2. Draw the dependency graph using these conflicts

T2 ---> T3 ---> T1 ---> T2

The dependency graph contains a cycle, so the schedule is not conflict serializable and no
topological order of the transactions exists. None of the serial orders in (A), (B) or (C)
is equivalent to it.

Locking: used to implement serialization and concurrency control in practice.
There are two types of lock:
(1) Shared lock: applied when a transaction wants only to read a data item. Multiple
transactions can hold a shared lock on the same data item simultaneously.
(2) Exclusive lock: applied when a transaction wants to modify a data item. It can be held
by only a single transaction at a time on a data item.
Locking protocols specify when to lock and unlock.

2PL (two-phase locking): the transaction is divided into two phases.
(1) Growing phase: a transaction can acquire locks only while it is in this phase.
A transaction is in the growing phase when it starts.
(2) Shrinking phase: a transaction cannot acquire any lock in this phase.
It starts as soon as the transaction unlocks any data item.

Growing phase:
X(A)
A=A+30
write(A)
X(B)
Shrinking phase:
unlock(A)
read(B)
B=B+30
write(B)
unlock(B)
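The two-phase rule (no lock may be acquired after the first unlock) is easy to check mechanically. A minimal sketch, assuming a transaction is given as a list of ("lock", item) and ("unlock", item) pairs; the function name is_two_phase is hypothetical.

```python
def is_two_phase(ops):
    """Return True if no lock is acquired after the first unlock."""
    shrinking = False
    for action, _item in ops:
        if action == "unlock":
            shrinking = True          # the shrinking phase has started
        elif action == "lock" and shrinking:
            return False              # lock requested in the shrinking phase
    return True


# Lock steps of the example above: X(A), X(B), unlock(A), unlock(B)
follows_2pl = [("lock", "A"), ("lock", "B"), ("unlock", "A"), ("unlock", "B")]
# A sequence that unlocks A and only then locks B violates the rule
violates_2pl = [("lock", "A"), ("unlock", "A"), ("lock", "B"), ("unlock", "B")]

print(is_two_phase(follows_2pl), is_two_phase(violates_2pl))
```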

Problems in 2PL

Step  T1             T2
1     X(A)
2     A=A+30
3     Write A
4     X(B)
5     Unlock A
6                    X(A)
7                    Read A
8                    A=A+50
9                    Write A
10    Read B
11    If(B<60)
12    rollback
13                   X(B)
14                   Unlock(A)
15                   Read B
16                   B=B / 5
17                   Write B
18                   Unlock(B)

In this schedule, if T1 is rolled back (step 12) then T2 is affected, since it has read the
value of A written by T1. So T2 must also be rolled back.
When the rollback of one transaction causes another transaction to roll back, this is
called cascading rollback.

To avoid cascading rollback, 2PL is modified; the modified protocol is called Strict 2PL.
In Strict 2PL all data items are unlocked only at the end of the transaction, so the
interleaving of transactions is reduced (concurrent execution is restricted to a certain
limit). This increases the waiting time of other transactions.

In 2PL there is a possibility of unrecoverable schedules, in which one transaction reads
the value of a data item written by an uncommitted transaction, manipulates the value and
then commits. If the uncommitted transaction is then rolled back, the value written by the
committed transaction becomes invalid, but that transaction has committed and its value is
stored on disk, so it cannot be rolled back. Such schedules are called unrecoverable
schedules.
In Strict 2PL there is no problem of unrecoverable schedules.
Note: Schedules under 2PL always satisfy conflict serializability, but deadlock is possible
in both simple 2PL and Strict 2PL.

Deadlock: Consider the following situation for two transactions T1 and T2

T1          T2
X(A)
            X(B)
X(B)
            X(A)
.           .
.           .
.           .

In this case T1 will wait for B and T2 will wait for A. This situation is called deadlock:
two or more transactions are each waiting for another transaction to unlock a data item,
and no transaction can make any progress.

2PL is a pessimistic approach.

Optimistic approach: according to this approach no data item is locked, on the assumption
that most transactions (say 99%) work on different data items. All transactions are allowed
to read and manipulate data, but there is a mechanism to detect inconsistencies. If an
inconsistency occurs, the transactions involved are rolled back. An example of the
optimistic approach is the timestamp protocol.

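Q34. Which of the following scenarios may lead to an irrecoverable error in a database system?
(A) A transaction writes a data item after it is read by an uncommitted transaction
(B) A transaction reads a data item after it is read by an uncommitted transaction
(C) A transaction reads a data item after it is written by a committed transaction
(D) A transaction reads a data item after it is written by an uncommitted transaction
CS2003
Ans. D
Explanation: if a transaction reads a value written by an uncommitted transaction and then
commits, and the writer is later rolled back, the committed transaction cannot be undone.
This is the unrecoverable schedule described above.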

ENTITY RELATION DIAGRAM

Entity and Attributes: Any real-world object is an entity. Every entity has attributes.
Ex. Book is an entity and bookno, authors are its attributes. Entities are represented by
rectangles and attributes by ellipses.

If an attribute is a primary key then it is represented by underlining the name of the
attribute.

Types of attribute:
1. Composite attribute: An attribute that can be divided into other attributes is called a
composite attribute. Ex. Address can be divided into street, city, state, country, pin.

2. Derived attribute: An attribute that can be derived from another attribute. Ex. Age can
be derived from DOB (date of birth). Derived attributes are represented by a dashed ellipse.

3. Multivalued attribute: An attribute that can have multiple values is called a
multivalued attribute. Ex. A student can have multiple phone numbers.
Multivalued attributes are denoted by a double ellipse.

4. Simple attribute: An attribute that is not composite, derived or multivalued.

Relation: A relation is an association ("has-a" relationship) among several entities. E.g.
student has a book. A relation is represented by a diamond.

Example: An E-R diagram of Book issuing system of an institute’s library:

A relation can have attributes. For example, in above ER diagram relation issued has attribute
“issuedate” which shows the date on which book is issued.

Cardinality of relations: expresses the number of entities to which another entity can be
associated via a relationship. For binary relationship sets between entity sets A and B,
the mapping cardinality must be one of:
1. One to one: An entity in A is associated with at most one entity in B, and an entity in
B is associated with at most one entity in A. E.g. if every student is allowed to borrow
one book only.
2. One to many: An entity in A is associated with any number of entities in B. An entity in
B is associated with at most one entity in A. E.g. if every student is allowed to borrow
multiple books.
3. Many to many: Entities in A and B are associated with any number of entities from each
other. E.g. if a book can be issued to many students and a student can borrow many books.

To denote cardinality, put the label '1' on the one side and 'n' on the many side:

Student ---1--- issue ---n--- Book

Partial participation: there may be some entries in an entity which do not take part in the
relation, so the entity participates only partially in the relation. Example: there may be
students who have not borrowed any book, or there may be books which are not issued to any
student.
Total participation: every entry in the entity participates in the relation. Total
participation of an entity in a relation is denoted by a double line.

Types of entities:
1. Strong entity: entities having a primary key are called strong entities. These are
represented by a single rectangle.
2. Weak entity: entities not having a primary key are called weak entities. These are
represented by a doubly-outlined rectangle. The relation joining a weak entity is also
represented by a doubly-outlined diamond.
Example: consider the ER diagram for a Bank-Loan system.
[Figure: ER diagram relating Loan and payment, with cardinality labels n and 1 and the
attributes installmentNo, amount, LoanNo, Date]

In this system Payment is a weak entity because the installment number of two customers can
be the same (Customer A pays 3000 for his 1st installment and Customer B pays 3000 for his
1st installment). So in payment there is no primary key.
Facts about Weak Entities
• A weak entity always has total participation in the relation.
• If a weak entity has a relation with a strong entity then the cardinality is one to many,
with one at the strong entity and many at the weak entity (one loan has many payments).
• A weak entity can also be represented as a multivalued attribute. In other words, if a
multivalued attribute is composite (has more than one sub-attribute) then it can be
represented as a weak entity.

Translation of ERD into Tables

Using the following rules we can convert an entity-relationship diagram into tables:
1. Create a table for each strong entity and create a column for each simple attribute of
the entity. For a composite attribute create columns only for its sub-attributes.
2. Do not create a separate column for a derived attribute. Derived attributes are not
added to the tables.
3. Create a separate table for each multivalued attribute, and add the primary key of the
entity as a column of that table. Similarly, create separate tables for weak entities. The
primary key of the table for a weak entity is formed by combining the partial key of the
weak entity and the primary key of the strong entity with which the weak entity is in
relation; for a multivalued attribute it is formed by combining the attribute and the
primary key of the entity. The partial key (discriminator) of a weak entity is the key
which identifies a weak entity uniquely among those related to the same strong entity.
4. Tables for relations depend on their cardinality:
i. For a many-to-many relationship, create a separate table for the relationship and add
the primary keys of both entities to this table. The primary key of this table is formed by
combining the primary keys of both entities.
ii. For a one-to-many relation there is no need to create a separate table. Add the primary
key of the one-side entity to the table for the many-side entity. If the relation has
attributes then add them as columns of the many-side entity.
iii. For a one-to-one relation there is also no need to create a separate table. Add the
primary key of either entity to the table for the other entity.
Example: tables for following ERD will be

Book(BookNo,Bname, Rollno, issueDate)


Student(RollNo, firstname,lastname,DOB)
Phoneno(rollno,phoneno)
Note: tables created using ERD are in 3NF.

Statement for Linked Answer Questions: 35&36


Consider the following ER diagram

Q35. The minimum number of tables needed to represent M, N, P, R1, R2 is


(when one to many relationship is shown by arrow then head of arrow is at one-side entity)
(A) 2 (B) 3 (C) 4 (D) 5 CS2008

Ans. B
Explanation: According to rules table created will be
M(M1,M2,M3,P1)
P(P1,P2)
N(N1,N2,P1)
No relation is many-to-many, so no table created for relation.

Q36. Which of the following is a correct attribute set for one of the tables for the correct
answer to the above question?
(A) {M1,M2,M3,P1} (B) {M1,P1,N1,N2}
(C) {M1,P1,N1} (D) {M1,P1} CS2008

Ans. A
Explanation: see explanation of previous question



Hashing and Indexing
Databases are stored on secondary storage in the form of files. Secondary storage is
divided into blocks and a database table is divided into records. Suppose we have secondary
storage with a block size of 512 bytes and every record (row) is 100 bytes. Suppose we have
to store 500 records of this table; there are two types of organization that can be used
for storing records in blocks.
Spanned organization: the block size is 512 bytes, so 5 records can be stored in the first
block directly; the remaining 12 bytes of the block are then occupied by the first 12 bytes
of the 6th record, and the next 88 bytes of the 6th record are stored in the 2nd block.
This type of organization is called spanned organization. For storing 500 records we
require only (100*500)/512 = 97.6, i.e. 98 blocks.

Unspanned organization: in this organization a whole record is stored in one block. If some
space is available in a block but the record is larger than this space, the record is
stored in a new block. In the above example the 6th record will be stored in a new block.
The total number of blocks required in this organization for storing 500 records is
500/5 = 100 blocks.

In spanned organization, accessing a record can require more time than in unspanned
organization; in the above example, accessing the 6th record requires accessing two blocks.
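The two block counts can be reproduced with a short calculation. This is a sketch using the numbers of the example above; the function names spanned_blocks and unspanned_blocks are illustrative only.

```python
import math


def spanned_blocks(num_records, record_size, block_size):
    # Records may cross block boundaries, so only the total byte count matters.
    return math.ceil(num_records * record_size / block_size)


def unspanned_blocks(num_records, record_size, block_size):
    # Each block holds only whole records; leftover bytes in a block are wasted.
    records_per_block = block_size // record_size
    return math.ceil(num_records / records_per_block)


spanned = spanned_blocks(500, 100, 512)
unspanned = unspanned_blocks(500, 100, 512)
print(spanned, unspanned)
```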

The main purpose of indexing is to speed up searching. Searching a table using linear or
binary search (if the records are sorted) is not practical, since it takes a long time (in
the above example, searching 100 blocks requires loading and unloading 100 blocks into
memory, so the access time will be large). A more sophisticated method is needed for
searching tables.
An index entry is created for every record using some key (generally the primary key) which
identifies each record uniquely. Each index entry is associated with a record pointer which
points to the record stored on secondary storage. An index entry together with its record
pointer is called an index record. These index records are kept in a separate file, called
the index file. To search for a record, we first search for its key in the index file; if
it is found, we locate the whole record using the record pointer.
Suppose we have index records of 6 bytes; then for 500 records in the table we have 500
index records. A block of 512 bytes can store 512/6 = 85 index records, and for 500 index
records we require 500/85 = 5.88, i.e. 6 blocks. Now to search for a record in the table we
have to access only 6 index blocks.
Drawbacks of indexing:
- requires extra storage
- if we want to search using another key then we have to create another index file using
that key.

Still, searching the index file using linear search or binary search is time consuming. To
reduce the time complexity of searching we use a B-tree or B+-tree in index files. The
B-tree/B+-tree is built from the index records.

If we have a B-tree of degree n then every internal node can have at most n-1 keys, n-1
record pointers associated with the keys, and n child pointers or block pointers (the term
"block pointer" is used for a child pointer because, generally, every node of a B-tree is
stored in a separate block).

If the block pointer size is b bytes, the key size is k bytes, and the record pointer size
is r bytes then
nb+(n-1)k+ (n-1) r = block size.
If the block pointer size is 8 bytes, the key size is 4 bytes and the record pointer size
is 6 bytes then the maximum degree supported by the B-tree can be calculated using the
above formula:
n*8+(n-1)*4 +(n-1)*6=512
18n=522
n= 522/18=29

For the above example, the height of the B-tree for 1000 index records with full nodes is
about log_29(1000) = 2.05, i.e. 3 levels, which means that to search for a record we have
to access only about 3 blocks (instead of 1000/85 = 12 blocks in a linear search).
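Solving the node-size formula for n by hand is a common source of arithmetic slips, so here is a small sketch that does it with integer arithmetic. The function name btree_order and its parameter names are assumptions made for illustration.

```python
def btree_order(block_size, key_size, record_ptr_size, block_ptr_size):
    """Largest n with n*b + (n-1)*k + (n-1)*r <= block_size."""
    # Rearranged: n * (b + k + r) <= block_size + k + r
    return (block_size + key_size + record_ptr_size) // (
        block_ptr_size + key_size + record_ptr_size
    )


# Values from the example above: b = 8, k = 4, r = 6, block size 512 bytes
order = btree_order(512, key_size=4, record_ptr_size=6, block_ptr_size=8)
print(order)
```

With n = 29 a node uses 29*8 + 28*4 + 28*6 = 512 bytes, exactly one block.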

Q37. Consider a table T in a relational database with a key field K. A B-tree of order p is
used as an access structure on K, where p denotes the maximum number of tree pointers in a
B-tree index node. Assume that K is 10 bytes long; disk block size is 512 bytes; each data
pointer D is 8 bytes long and each block pointer PB is 5 bytes long. In order for each
B-tree node to fit in a single disk block, the maximum value of p is:
(A) 20 (B) 22 (C) 23 (D)32 IT2004

Ans.C
Explanation: by using formula nb+(n-1)k+ (n-1) r = block size. Here n is p
p *5+ (p-1)*10 + (p-1) * 8 = 512
23p= 530
p=23

B+-trees have a different leaf structure. In a B+-tree a leaf node contains keys with their
associated record pointers, and a block pointer pointing to the next leaf node. Non-leaf
nodes contain only keys and child pointers; there is no need to store record pointers in
non-leaf nodes, because all keys are ultimately present in the leaf nodes.
For a leaf node the order is the maximum number of (key, record pointer) pairs the node can
hold, but the order of a non-leaf node is determined by the maximum number of child
pointers it can have.
For a leaf node the equation will be:
n*k+ n* r + b = block size
For a non-leaf node the equation will be:
(n-1)k+ n b = block size
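Both equations can be solved for the largest integer n that still fits in a block. The sketch below uses made-up sizes (a 1024-byte block, 12-byte keys, 10-byte record pointers, 8-byte block pointers) chosen only for illustration; the function names are hypothetical.

```python
def bplus_internal_order(block_size, key_size, block_ptr_size):
    """Largest n with (n-1)*k + n*b <= block_size."""
    return (block_size + key_size) // (key_size + block_ptr_size)


def bplus_leaf_order(block_size, key_size, record_ptr_size, block_ptr_size):
    """Largest n with n*k + n*r + b <= block_size."""
    return (block_size - block_ptr_size) // (key_size + record_ptr_size)


internal = bplus_internal_order(1024, key_size=12, block_ptr_size=8)
leaf = bplus_leaf_order(1024, key_size=12, record_ptr_size=10, block_ptr_size=8)
print(internal, leaf)
```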

Q38. A B+-tree index is to be built on the Name attribute of the relation STUDENT. Assume
that all student names are of length 8 bytes, disk blocks are of size 512 bytes, and index
pointers are of size 4 bytes. Given this scenario, what would be the best choice of the
degree (i.e. the number of pointers per node) of the B+-tree?
(a) 16 (b) 42 (c) 43 (d) 44 CS2002



Ans.C
Explanation: Degree of B+-tree can be calculated if we know the maximum number of key a
internal node can have. By the formula for internal node of B+-tree
(n-1) k+ n b = block size
(n-1) * 8 + n*4=512
12n=520
n=43

Q39. The order of an internal node in a B+ tree index is the maximum number of children it
can have. Suppose that a child pointer takes 6 bytes, the search field value takes 14 bytes,
and the block size is 512 bytes. What is the order of the internal node?
(a) 24 (b) 25 (c) 26 (d) 27 CS2004

Ans. c
Explanation: by formula for internal node of B+ tree of n degree
(n-1) k+ n b = block size
(n-1)*14 + n*6= 512
20 n=526
n=26

Q40. The order of a leaf node in a B+-tree is the maximum number of (value, data record
pointer) pairs it can hold. Given that the block size is 1K bytes, the data record pointer
is 7 bytes long, the value field is 9 bytes long and a block pointer is 6 bytes long, what
is the order of the leaf node?
(A) 63 (B) 64 (C) 67 (D) 68 CS2007

Ans. A
Explanation: order of leaf node B+ tree can be determined by formula
n*k+ n* r + b = block size
n*9 + n*7 + 6=1024
n*16=1018
n=63

Hashing
In a B/B+ tree searching is faster, but we still have to search. Hashing is used to remove
the cost of searching. A hash function maps each index key to a position in a hash table;
to locate a key, we apply the same hash function to compute its position. No searching is
needed: the hash function alone is used to map and locate keys.
Hash function: the hash function is chosen in such a way that it can map all keys into the
hash table. For example, with the hash function 'mod 10' and the keys 2010, 4011, 3127,
4256, 3214, the hash table will look like

0 2010
1 4011
2
3
4 3214
5
6 4256
7 3127
8
9

If a new key maps to a position which is already filled in the hash table then a collision
occurs. Ex. if the new entry is 5414 for the above hash table then the hash function
returns location 4 in the table, which is already filled. There are two ways to handle
these collisions.
1. Open Addressing / Rehashing: slightly change the hash function for the new key which is
causing the collision. Ex. use (key+5) mod 10 when a collision occurs.
Linear probing: this is a type of rehashing. If a new entry collides, search for the next
free slot in the table and fill this slot with the new entry. If in the above example 5414
and 5444 arrive then the hash table will be
0 2010
1 4011
2
3
4 3214
5 5414
6 4256
7 3127
8 5444
9
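Linear probing as described above can be sketched as follows. The helper name insert_linear is hypothetical, empty slots are modelled as None, and the probe sequence wraps around to the start of the table.

```python
def insert_linear(table, key, hash_fn):
    """Place key in the first free slot at or after hash_fn(key), wrapping around."""
    size = len(table)
    start = hash_fn(key)
    for step in range(size):
        slot = (start + step) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")


table = [None] * 10
for key in (2010, 4011, 3127, 4256, 3214, 5414, 5444):
    insert_linear(table, key, lambda k: k % 10)

for index, value in enumerate(table):
    print(index, "" if value is None else value)
```

Key 5444 hashes to slot 4 and has to skip four occupied slots before landing in slot 8, which illustrates how runs of filled slots grow under linear probing.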

In linear probing there is the problem of primary clustering, which means data becomes
concentrated in one place. The hash function should distribute data uniformly to avoid
primary clustering. Ex. a quadratic function can be used to distribute data:
Hashing function = (key + n^2) mod 10, where n denotes the number of collisions

Key    Collision number   Hash table index
2010   0                  (2010+0^2)%10 = 0
4010   1                  (4010+1^2)%10 = 1
7120   2                  (7120+2^2)%10 = 4
9650   3                  (9650+3^2)%10 = 9
3250   4                  (3250+4^2)%10 = 6

2. Chaining: here the table is an array of pointers instead of data. Entries with the same
hash function value form a linked list, and a new entry is added at the end of the list.
Example: chaining for the data 2010, 4011, 3127, 4256, 3214, 5414, 5444, 6457, 9666, 8888
using the 'key mod 10' function will be:

0 -> 2010 -> Null
1 -> 4011 -> Null
2 -> Null
3 -> Null
4 -> 3214 -> 5414 -> 5444 -> Null
5 -> Null
6 -> 4256 -> 9666 -> Null
7 -> 3127 -> 6457 -> Null
8 -> 8888 -> Null
9 -> Null
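Chaining can be sketched with one list per slot standing in for the linked lists. The function name build_chains is an assumption for illustration, and the hash function is 'key mod size' as in the example.

```python
def build_chains(keys, size):
    """Return a list of buckets; each bucket keeps its keys in insertion order."""
    buckets = [[] for _ in range(size)]
    for key in keys:
        buckets[key % size].append(key)  # new entry goes to the end of the chain
    return buckets


keys = [2010, 4011, 3127, 4256, 3214, 5414, 5444, 6457, 9666, 8888]
chains = build_chains(keys, 10)
for index, chain in enumerate(chains):
    print(index, " -> ".join(str(k) for k in chain + ["Null"]))
```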

Q41. Consider a hash table of size seven, with starting index zero, and a hash function
(3x + 4) mod 7. Assuming the hash table is initially empty, which of the following is the
contents of the table when the sequence 1, 3, 8, 10 is inserted into the table using closed
hashing? Note that − denotes an empty location in the table.
(A) 8, −, −, −, −, −, 10 (B) 1, 8, 10, −, −, −, 3
(C) 1, −, −, −, −, −, 3 (D) 1, 10, 8, −, −, −, 3 CS2007

Ans. B
Explanation: hash function is (3x+4) mod 7
Key hash table index
1 0
3 6
8 0
10 6
Final hash table will be (if linear probing is used)
0 1
1 8
2 10
3
4
5
6 3

Q42. Consider the following SQL query
Select distinct a1, a2, …, an from r1, r2, …, rm where p
For an arbitrary predicate p, this query is equivalent to which of the following
relational algebra expressions?
(A) Π_{a1, a2, …, an} σ_p (r1 × r2 × … × rm)
(B) Π_{a1, a2, …, an} σ_p (r1 ⋈ r2 ⋈ … ⋈ rm)
(C) Π_{a1, a2, …, an} σ_p (r1 ∪ r2 ∪ … ∪ rm)
(D) Π_{a1, a2, …, an} σ_p (r1 ∩ r2 ∩ … ∩ rm) CS2003

Ans. A

Q43. Consider the set of relations shown below and the SQL query that follows.
Students: (Roll_Number, Name, Date_of_Birth)
Courses: (Course_Number, Course_Name, Instructor)
Grades: (Roll_Number, Course_Number, Grade)

Select distinct Name
from Students, Courses, Grades
where Students.Roll_Number = Grades.Roll_Number
and Courses.Instructor = 'Korth'
and Courses.Course_Number = Grades.Course_Number
and Grades.Grade = 'A'

Which of the following sets is computed by above query?


(A) Names of Students who have got an A grade in all courses taught by Korth
(B) Names of Students who have got an A grade in all courses
(C) Names of Students who have got an A grade in at least one of the courses taught by Korth
(D) None of the above CS2003

Ans. C

Q44. Given the following input (4322, 1334, 1471, 9679, 1989, 6171, 6173, 4199) and the
hash function x mod 10, which of the following statements are true?
i) 9679, 1989, 4199 hash to the same value
ii) 1471, 6171 hash to the same value
iii) All elements hash to the same value
iv) Each element hashes to a different value

(a) i only (b) ii only (c) i and ii only (d) iii or iv CS2004

Ans. C
Explanation: when we apply hash function x mod 10 to 9679,1989,4199 , result is 9 , so
statement (i) is correct. Similarly, for 1471 and 6171 hash function returns 1, so statement (ii)
is also correct.

Q45. Consider the following relation schema pertaining to a Students database:
Students (rollno, name, address)
Enroll (rollno, courseno, coursename)
Where the primary keys are shown underlined. The number of tuples in the Students and
Enroll tables are 120 and 8 respectively. What are the maximum and minimum number of tuples
that can be present in (Students * Enroll), where '*' denotes natural join?
(a) 8,8 (b) 120,8 (c) 960,8 (d) 960,120 CS2004

Ans. a
Explanation: The natural join is performed by equating the rollno attribute of both tables.
In the Students relation there are 120 tuples and each tuple has a unique rollno, since
rollno is the primary key of Students. In the Enroll relation there are only 8 tuples, and
there can be two extreme conditions:
1. Minimum condition: each tuple in Enroll has a different student; in other words, there
are 8 students, each enrolled in one course.
2. Maximum condition: each tuple in Enroll has the same student but a different course; in
other words, there is one student enrolled in 8 courses.
In both cases every Enroll tuple matches exactly one Students tuple, so the natural join of
the two relations has exactly 8 tuples.

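Q46. Which one of the following is a key factor for preferring B-trees to binary search
trees for indexing database relations?
(a) Database relations have a large number of records
(b) Database relations are sorted on the primary key
(c) B-trees require less memory than binary search trees
(d) Data transfer from disks is in blocks CS2005

Ans. d
Explanation: a B-tree node is sized to fill one disk block, so each block transfer brings
in many keys and the tree stays shallow; a binary search tree node holds only one key, so
it would need many more block accesses per search.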

Q47. Let r be a relation instance with schema R = (A, B, C, D). We define r1 = Π_{A,B,C}(r)
and r2 = Π_{A,D}(r). Let s = r1 * r2 where * denotes natural join. Given that the
decomposition of r into r1 and r2 is lossy, which one of the following is TRUE?
(a) s ⊂ r (b) r ∪ s = r (c) r ⊂ s (d) r * s = s CS2005

Ans. c
Explanation: A lossy decomposition means that when the decomposed tables are joined, some
spurious (extra, meaningless) tuples are generated, and because of these spurious tuples we
cannot recover the actual data after joining. The join s contains every tuple of r plus the
spurious ones, so option (c) is correct.

Q48. The following table has two attributes A and C where A is the primary key and C is the
foreign key referencing A with on-delete cascade. The set of all tuples that must be
additionally deleted to preserve referential integrity when the tuple (2,4) is deleted is:

A C
2 4
3 4
4 3
5 2
7 2
9 5
6 4

(a) (3,4) and (6,4)


(b) (5,2) and (7,2)
(c) (5,2), (7,2) and (9,5)
(d) (3,4), (4,3) and (6,4) CS2005

Ans. c
Explanation: C is a foreign key referencing A, so C cannot hold a value that is not present in A. If we delete tuple (2,4), the rows with value 2 in column C become invalid. These are tuples (5,2) and (7,2), and they must be deleted from the table.
Once (5,2) is deleted, one more row becomes invalid: tuple (9,5).
So deleting (2,4) also deletes tuples (5,2), (7,2) and (9,5).
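
The cascade described above can be traced mechanically. The following is a minimal sketch in Python, not part of the original question; the function name `cascade_delete` is a hypothetical helper, and each row is assumed to be an (A, C) pair.

```python
def cascade_delete(rows, start):
    """Return the tuples deleted besides `start` when C references A
    with ON DELETE CASCADE. Each row is an (A, C) pair."""
    deleted, pending = [], [start]
    remaining = [r for r in rows if r != start]
    while pending:
        a, _ = pending.pop()
        # Every remaining row whose foreign key C equals the deleted key A goes too.
        dependents = [r for r in remaining if r[1] == a]
        remaining = [r for r in remaining if r[1] != a]
        deleted.extend(dependents)
        pending.extend(dependents)
    return deleted

rows = [(2, 4), (3, 4), (4, 3), (5, 2), (7, 2), (9, 5), (6, 4)]
extra = cascade_delete(rows, (2, 4))
print(sorted(extra))
```

The loop keeps following references until no remaining row points at a deleted key, which is why (9,5) is removed even though it does not reference 2 directly.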

Q49. The relation book (title, price) contains the titles and prices of different books. Assuming that no two books have the same price, what does the following SQL query list?
select title
from book as B
where (select count(*)
       from book as T
       where T.price > B.price) < 5
(a) Titles of the four most expensive books
(b) Title of the fifth most inexpensive book



(c) Title of the fifth most expensive book
(d) Titles of the five most expensive books CS2005

Ans. d

Q50. Consider a relation scheme R = (A,B,C,D,E,H) on which the following functional dependencies hold: {A→B, BC→D, E→C, D→A}. What are the candidate keys of R?
(a) AE, BE
(b) AE, BE, DE
(c) AEH, BEH, BCH
(d) AEH, BEH, DEH CS2005

Ans. d
Explanation: The closures of AE, BE and DE contain every attribute that appears in the functional dependencies:
AE+ = {A,B,C,D,E}   BE+ = {A,B,C,D,E}   DE+ = {A,B,C,D,E}

H does not appear in any FD, so it must be part of every key. Adding H to AE, BE and DE gives the candidate keys AEH, BEH and DEH.
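
The closure computation can be checked with a short brute-force sketch. The helper names `closure` and `candidate_keys` are illustrative and not from the source, and the FDs are assumed to be A→B, BC→D, E→C, D→A, the reading that matches the closures listed in the explanation.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under the functional dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == set(schema):
                keys.append(combo)
    return ["".join(k) for k in keys]

fds = [("A", "B"), ("BC", "D"), ("E", "C"), ("D", "A")]
keys = candidate_keys("ABCDEH", fds)
print(keys)
```

Trying subsets in order of increasing size guarantees that every key found is minimal; this is exponential in the number of attributes, which is acceptable only for small exam-sized schemas.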

Q51. Consider the following log sequence of two transactions on a bank account, with initial balance 12000, that transfer 2000 to a mortgage payment and then apply a 5% interest.
1. T1 start
2. T1 B old=12000 new=10000
3. T1 M old=0 new=2000
4. T1 commit
5. T2 start
6. T2 B old=10000 new=10500
7. T2 commit
Suppose the database system crashes just before log record 7 is written. When the system is restarted, which one statement is true of the recovery procedure?
(A) We must redo log record 6 to set B to 10500
(B) We must undo log record 6 to set B to 10000 and then redo log records 2 and 3
(C) We need not redo log records 2 and 3 because transaction T1 has committed
(D) We can apply redo and undo operations in arbitrary order because they are idempotent. CS2006

Ans. C

Q52. Consider the relation account (customer, balance) where customer is a primary key and there are no null values. We would like to rank customers according to decreasing balance. The customer with the largest balance gets rank 1. Ties are not broken but ranks are skipped: if exactly two customers have the largest balance they each get rank 1 and rank 2 is not assigned.
Query1: select A.customer, count(B.customer)
from account A, account B
where A.balance <= B.balance
group by A.customer

Query2: select A.customer, 1+count(B.customer)
from account A, account B
where A.balance < B.balance
group by A.customer
Consider these statements about Query1 and Query2.
1. Query1 will produce the same row set as Query2 for some but not all databases.
2. Both Query1 and Query2 are correct implementations of the specification
3. Query1 is a correct implementation of the specification but Query2 is not
4. Neither Query1 nor Query2 is a correct implementation of the specification
5. Assigning rank with a pure relational query takes less time than scanning in decreasing balance order assigning ranks using ODBC.
Which two of the above statements are correct?
(A) 2 and 5
(B) 1 and 3
(C) 1 and 4
(D) 3 and 5 CS2006

Ans. C
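Explanation: Work through both queries on an example. Suppose the table contains:

Account                 Query1                Query2
Customer  Balance       Customer  Count       Customer  Count
a         10000         a         5           a         4
b         10000         b         5           b         4
c         50000         c         3           c         3
d         60000         d         2
e         60000         e         2

The expected ranks are d = 1, e = 1, c = 3, a = 4 and b = 4. Query1 counts a tied customer's own group, so d and e get 2 and a and b get 5. Query2 ranks a, b and c correctly, but d and e have no row with a larger balance, so the join drops them from the result. The two results differ here, although they coincide on other databases (for example an empty table), so statement 1 is true. Neither query implements the specification, so statement 4 is true and statements 2 and 3 are false.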
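
Both queries can be run on the five-customer example as a check. This is a sketch using Python's built-in sqlite3 module; the in-memory database and the variable names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (customer TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("a", 10000), ("b", 10000), ("c", 50000), ("d", 60000), ("e", 60000)],
)

QUERY1 = """SELECT A.customer, COUNT(B.customer)
            FROM account A, account B
            WHERE A.balance <= B.balance
            GROUP BY A.customer"""
QUERY2 = """SELECT A.customer, 1 + COUNT(B.customer)
            FROM account A, account B
            WHERE A.balance < B.balance
            GROUP BY A.customer"""

ranks1 = dict(conn.execute(QUERY1).fetchall())
ranks2 = dict(conn.execute(QUERY2).fetchall())
expected = {"a": 4, "b": 4, "c": 3, "d": 1, "e": 1}  # ranks required by the specification

print("Query1:", ranks1)
print("Query2:", ranks2)
print("matches specification:", ranks1 == expected, ranks2 == expected)
```

Query2 returns no row at all for the customers with the largest balance, because the self-join has nothing to pair them with.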

Q53. Consider the relation enrolled (student, course) in which (student, course) is the primary key, and the relation paid (student, amount) where student is the primary key. Assume no null values and no foreign keys or integrity constraints. Given the following four queries:
Query1: select student
from enrolled
where student in
(select student from paid)
Query2: select student
from paid
where student in
(select student from enrolled)
Query3: select E.student
from enrolled E, paid P
where E.student = P.student
Query4: select student
from paid
where exists
(select * from enrolled where enrolled.student = paid.student)

Which one of the following statements is correct?


(A) All queries return identical row sets for any database
(B) Query2 and Query4 return identical row sets for all databases but there exist databases for which Query1 and Query2 return different row sets.



(C) There exist databases for which Query3 returns strictly fewer rows than Query2
(D) There exist databases for which Query4 will encounter an integrity violation at runtime.
CS2006
Ans. b
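Explanation: Solve by taking an example. Query2 and Query4 both return each paid student who appears in enrolled exactly once, because student is the primary key of paid. Query1 and Query3 return one row per matching tuple of enrolled, so a paid student enrolled in two courses appears twice in Query1 but only once in Query2.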
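
One such example can be run directly. The sketch below uses Python's sqlite3 module with made-up rows (students s1 to s3 and courses c1 and c2 are assumptions chosen so that one paid student has two enrollments).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enrolled (student TEXT, course TEXT, PRIMARY KEY (student, course));
    CREATE TABLE paid (student TEXT PRIMARY KEY, amount INTEGER);
    INSERT INTO enrolled VALUES ('s1', 'c1'), ('s1', 'c2'), ('s2', 'c1');
    INSERT INTO paid VALUES ('s1', 100), ('s3', 50);
""")

queries = {
    "Query1": "SELECT student FROM enrolled WHERE student IN (SELECT student FROM paid)",
    "Query2": "SELECT student FROM paid WHERE student IN (SELECT student FROM enrolled)",
    "Query3": "SELECT E.student FROM enrolled E, paid P WHERE E.student = P.student",
    "Query4": """SELECT student FROM paid
                 WHERE EXISTS (SELECT * FROM enrolled
                               WHERE enrolled.student = paid.student)""",
}

# Number of rows each query returns on this instance.
counts = {name: len(conn.execute(sql).fetchall()) for name, sql in queries.items()}
print(counts)
```

On this instance Query1 and Query3 return s1 twice while Query2 and Query4 return it once, which is the difference option (B) describes.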

Q54. The following functional dependencies are given:
AB → CD, AF → D, DE → F, C → G, F → E, G → A.
Which one of the following options is false?
(A) {CF}+ = {ACDEFG}
(B) {BG}+ = {ABCDG}
(C) {AF}+ = {ACDEFG}
(D) {AB}+ = {ABCDFG} CS2006

Ans. C,D
Explanation: The closure of AF is {A, D, E, F}; it does not contain C or G, so option C is false.
The closure of AB is {A, B, C, D, G}; it does not contain F, so option D is also false.

Q55. Information about a collection of students is given by the relation studInfo(studId, name, sex). The relation enroll(studId, courseId) gives which student has enrolled for (or taken) what course(s). Assume that every course is taken by at least one male and at least one female student. What does the following relational algebra expression represent?
Πcourseid( ( ΠstudId(σ sex="female"(studInfo)) × Πcourseid(enroll) ) – enroll )
(A) Courses in which all the female students are enrolled.
(B) Courses in which a proper subset of female students are enrolled.
(C) Courses in which only male students are enrolled.
(D) None of the above CS2007

Ans. b
Explanation: The expression ΠstudId(σ sex="female"(studInfo)) returns the studId of every female student. Its Cartesian product with Πcourseid(enroll) pairs every female student with every course that appears in enroll. Subtracting enroll then removes the pairs that correspond to actual enrollments, so a course in which all female students are enrolled disappears completely. What remains are the (female student, course) pairs in which the student has not enrolled for the course, although other female students may have. Projecting on courseid therefore selects the courses in which a proper subset of the female students is enrolled.

Example:
studInfo                     enroll
studId  name  sex            studId  courseId
S1      A     Female         S1      1
S2      B     Female         S1      2
S3      C     Male           S2      1
S4      D     Male           S3      1
                             S4      2

ΠstudId(σ sex="female"(studInfo)) × Πcourseid(enroll) returns

studId  courseId
S1      1
S1      2
S2      1
S2      2

( ( ΠstudId(σ sex="female"(studInfo)) × Πcourseid(enroll) ) – enroll ) returns

studId  courseId
S2      2

Πcourseid( ( ΠstudId(σ sex="female"(studInfo)) × Πcourseid(enroll) ) – enroll ) returns

courseId
2

Q56. Consider the relation employee(name, sex, supervisorName) with name as the key. supervisorName gives the name of the supervisor of the employee under consideration. What does the following Tuple Relational Calculus query produce?
{e.name | employee(e) ∧
(∀x)[¬employee(x) ∨ x.supervisorName ≠ e.name ∨ x.sex = "male"]}
(A) Names of employees with a male supervisor.
(B) Names of employees with no immediate male subordinates.
(C) Names of employees with no immediate female subordinates.
(D) Names of employees with a female supervisor. CS2007
Ans. C

Q57. Consider the table employee(empId, name, department, salary) and the two queries Q1, Q2 below. Assuming that department 5 has more than one employee, and we want to find the employees who get higher salary than anyone in the department 5, which one of the statements is TRUE for any arbitrary employee table?
Q1 : Select e.empId
From employee e
Where not exists
(Select * From employee s where s.department = "5" and s.salary >= e.salary)
Q2 : Select e.empId
From employee e
Where e.salary > Any
(Select distinct salary From employee s Where s.department = "5")
(A) Q1 is the correct query
(B) Q2 is the correct query
(C) Both Q1 and Q2 produce the same answer.
(D) Neither Q1 nor Q2 is the correct query CS2007

Ans. b

Q58. Which one of the following statements is FALSE?


(A) Any relation with two attributes is in BCNF
(B) A relation in which every key has only one attribute is in 2NF
(C) A prime attribute can be transitively dependent on a key in a 3 NF relation.
(D) A prime attribute can be transitively dependent on a key in a BCNF relation. CS2007
Ans. d



Q59. Consider the following schedules involving two transactions. Which one of the following statements is TRUE?
S1: r1(X); r1(Y); r2(X); r2(Y); w2(Y); w1(X)
S2: r1(X); r2(X); r2(Y); w2(Y); r1(Y); w1(X)
(A) Both S1 and S2 are conflict serializable.
(B) S1 is conflict serializable and S2 is not conflict serializable.
(C) S1 is not conflict serializable and S2 is conflict serializable.
(D) Both S1 and S2 are not conflict serializable. CS2007

Ans. C
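Explanation: To test conflict serializability, first find the conflicting operations in each schedule.

S1                          S2
T1        T2                T1        T2
r1(X)                       r1(X)
r1(Y)                                 r2(X)
          r2(X)                       r2(Y)
          r2(Y)                       w2(Y)
          w2(Y)             r1(Y)
w1(X)                       w1(X)

Dependency graph of S1: T1 → T2 (on Y, r1(Y) before w2(Y)) and T2 → T1 (on X, r2(X) before w1(X)). A cycle exists, so S1 is not conflict serializable.
Dependency graph of S2: T2 → T1 (on X, r2(X) before w1(X)) and T2 → T1 (on Y, w2(Y) before r1(Y)). No cycle exists, so S2 is conflict serializable.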
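
The dependency-graph test can be sketched in a few lines of Python. The function names below are illustrative, and schedules are assumed to be written in the r1(X); w2(Y) notation used in the question.

```python
import re
from itertools import combinations

def precedence_edges(schedule):
    """Edges (Ti, Tj) of the precedence graph for a schedule such as 'r1(X); w2(X)'."""
    ops = [(m.group(1), int(m.group(2)), m.group(3))
           for m in re.finditer(r"([rw])(\d+)\((\w+)\)", schedule)]
    edges = set()
    # combinations() yields pairs in schedule order: (earlier op, later op).
    for (a1, t1, x1), (a2, t2, x2) in combinations(ops, 2):
        # Conflict: different transactions, same item, at least one write.
        if t1 != t2 and x1 == x2 and "w" in (a1, a2):
            edges.add((t1, t2))
    return edges

def is_conflict_serializable(schedule):
    edges = precedence_edges(schedule)
    nodes = {t for edge in edges for t in edge}
    # Repeatedly remove nodes without incoming edges; leftovers mean a cycle.
    while nodes:
        free = [n for n in nodes
                if not any(v == n and u in nodes for u, v in edges)]
        if not free:
            return False
        nodes -= set(free)
    return True

S1 = "r1(X); r1(Y); r2(X); r2(Y); w2(Y); w1(X)"
S2 = "r1(X); r2(X); r2(Y); w2(Y); r1(Y); w1(X)"
print(precedence_edges(S1), is_conflict_serializable(S1))
print(precedence_edges(S2), is_conflict_serializable(S2))
```

The cycle check is a plain topological sort, so the order in which nodes are removed is also a valid serial order when one exists.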

Q60. Which of the following tuple relational calculus expression(s) is/are equivalent to ∀t ∈ r (P(t))?
I. ¬∃t ∈ r (P(t))
II. ∃t ∉ r (P(t))
III. ¬∃t ∈ r (¬P(t))
IV. ∃t ∉ r (¬P(t))
(A) I only (B) II only (C) III only (D) III and IV only CS2008

Ans. C
Explanation: This question uses rules of predicate calculus.
∀ can be replaced by ¬∃¬, and ∃ can be replaced by ¬∀¬.
∀t ∈ r (P(t)) = ¬¬(∀t ∈ r (P(t)))
= ¬(¬∀t ∈ r (P(t)))
Now ¬∀ can be replaced by ∃¬
= ¬(∃t ∈ r (¬P(t)))
= ¬∃t ∈ r (¬P(t))

Alternatively, take a concrete meaning for the symbols and compare each expression.
Suppose r is the student relation and P is the predicate Sick.
P(t) = Sick(t) means student t is sick.
∀t ∈ r (P(t)) means every student is sick.
I. ¬∃t ∈ r (P(t)) means no student is sick.



II. ∃t ∉ r (P(t)) means someone who is not a student is sick.
III. ¬∃t ∈ r (¬P(t)) means no student is not sick, that is, every student is sick.
IV. ∃t ∉ r (¬P(t)) means someone who is not a student is not sick.

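Q61. A clustering index is defined on the fields which are of type
(A) non-key and ordering (B) non-key and non-ordering
(C) key and ordering (D) key and non-ordering CS2008

Ans. A
Explanation: A clustering index is built on a field that physically orders the file but is not a key, so several records can share the same value. An index on an ordering key field is a primary index.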

Q62. The keys 12, 18, 13, 2, 3, 23, 5 and 15 are inserted into an initially empty hash table of length 10 using open addressing with hash function h(k) = k mod 10 and linear probing. What is the resultant hash table?

Index   A     B     C     D
0
1
2       2     12    12    12, 2
3       23    13    13    13, 3, 23
4                   2
5       15    5     3     5, 15
6                   23
7                   5
8       18    18    18    18
9                   15

Ans. C
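
The insertion can be replayed with a small sketch of open addressing with linear probing. The function name `linear_probe_insert` is an illustrative choice, and the sketch assumes fewer keys than slots.

```python
def linear_probe_insert(keys, size=10):
    """Insert keys with h(k) = k mod size, resolving collisions by linear probing."""
    table = [None] * size
    for key in keys:
        slot = key % size
        while table[slot] is not None:   # assumes the table never fills up
            slot = (slot + 1) % size     # probe the next slot, wrapping around
        table[slot] = key
    return table

table = linear_probe_insert([12, 18, 13, 2, 3, 23, 5, 15])
for index, key in enumerate(table):
    print(index, "" if key is None else key)
```

Keys 2, 3, 23, 5 and 15 all collide with earlier keys and slide forward to slots 4, 5, 6, 7 and 9, which is the layout of option C.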

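Q63. Let R and S be relational schemes such that R = {a,b,c} and S = {c}. Now consider the following queries on the database:
I. Π R-S(r) − Π R-S( Π R-S(r) × s − Π R-S,S(r) )
II. {t | t ∈ Π R-S(r) ∧ ∀u ∈ s (∃v ∈ r (u = v[S] ∧ t = v[R-S]))}
III. {t | t ∈ Π R-S(r) ∧ ∀v ∈ r (∃u ∈ s (u = v[S] ∧ t = v[R-S]))}
IV. Select R.a, R.b
From R, S
Where R.c = S.c
Which of the above queries are equivalent?
(A) I and II (B) I and III (C) II and IV (D) III and IV CS2009

Ans. A
Hint: Here R-S means {a, b} and S means {c}.
Query I, Π R-S(r) − Π R-S( Π R-S(r) × s − Π R-S,S(r) ), is equivalent to
Π a,b(r) − Π a,b( Π a,b(r) × s − Π a,b,c(r) )
It returns the combinations of (a, b) in r that appear with every c in s, which is the division of r by s. Query II states the same condition in tuple relational calculus: for every u in s there is a tuple v in r with v[S] = u and v[R-S] = t. Query III requires every tuple of r to match, and query IV returns the (a, b) pairs that match at least one c, so neither is equivalent to I.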

Common Data Questions: 64 & 65


Consider the following relational schema:
Suppliers(sid:integer, sname:string, city:string, street:string)
Parts(pid:integer, pname:string, color:string)
Catalog(sid:integer, pid:integer, cost:real)

Q64. Consider the following relational query on the above database:


SELECT S.sname
FROM Suppliers S
WHERE S.sid NOT IN (SELECT C.sid
FROM Catalog C
WHERE C.pid NOT in (SELECT P.pid
FROM Parts P
WHERE P.color<> 'blue'))

Assume that relations corresponding to the above schema are not empty. Which one of the following is the correct interpretation of the above query?
(A) Find the names of all suppliers who have supplied a non-blue part.
(B) Find the names of all suppliers who have not supplied a non-blue part.
(C) Find the names of all suppliers who have supplied only blue parts.
(D) Find the names of all suppliers who have not supplied only blue parts. CS2009

Ans.B
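Explanation: The innermost query (SELECT P.pid FROM Parts P WHERE P.color <> 'blue') returns the pids of the parts that are not blue.
The middle query (SELECT C.sid FROM Catalog C WHERE C.pid NOT IN (...)) returns the sids of suppliers whose catalog contains at least one part outside that list, that is, at least one blue part.

Finally, the outer query selects the suppliers whose sid is not in the middle result: suppliers who have not supplied any blue part. None of the four options states this condition word for word, so compare the options carefully against this reading.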

Q65. Assume that, in the suppliers relation above, each supplier and each street within a city has a unique name, and (sname, city) forms a candidate key. No other functional dependencies are implied other than those implied by primary and candidate keys. Which one of the following is TRUE about the above schema?
(A) The schema is in BCNF
(B) The schema is in 3NF but not in BCNF
(C) The schema is in 2NF but not in 3NF
(D) The schema is not in 2NF CS2009

Ans.A
Explanation: In this relation every functional dependency has the primary key or the candidate key (sname, city) on its left side, and both are superkeys, so the relation is in BCNF.

Q66. A relational schema for a train reservation database is given below:
Passenger (pid, pname, age)
Reservation (pid, class, tid)

Table: Passenger
pid  pname   age
0    Sachin  65
1    Rahul   66
2    Sourav  67
3    Anil    69

Table: Reservation
pid  class  tid
0    AC     8200
1    AC     8201
2    SC     8201
5    AC     8203
1    SC     8204
3    AC     8202

What pids are returned by the following SQL query for the above instance of the tables?
SELECT pid
FROM Reservation
WHERE class = 'AC' AND
EXISTS (SELECT *
FROM Passenger
WHERE age > 65 AND Passenger.pid = Reservation.pid)

(A) 1, 0 (B) 1, 2 (C) 1, 3 (D) 1, 5 CS2010

Ans. C
Explanation: This is a correlated query, so the inner query runs once for every row examined by the outer query.

pid  class  tid   Inner query returns  EXISTS  Outer query returns
0    AC     8200  no rows              False
1    AC     8201  {1, Rahul, 66}       True    1
2    SC     8201  {2, Sourav, 67}      True    (class is not AC)
5    AC     8203  no rows              False
1    SC     8204  {1, Rahul, 66}       True    (class is not AC)
3    AC     8202  {3, Anil, 69}        True    3

Q67. Which of the following concurrency control protocols ensure both conflict serializability and freedom from deadlock?
I. 2-phase locking
II. Time-stamp ordering
(A) I only (B) II only (C) Both I and II (D) Neither I nor II CS2010

Ans. B

Q68. Consider the following schedule for transactions T1, T2 and T3:
T1          T2          T3
Read(X)
            Read(Y)
                        Read(Y)
            Write(Y)
Write(X)
                        Write(X)
            Read(X)
            Write(X)
Which one of the schedules below is the correct serialization of the above?
(A) T1 → T3 → T2 (B) T2 → T1 → T3
(C) T2 → T3 → T1 (D) T3 → T1 → T2 CS2010

Ans. A
Explanation: First find all conflicts.
Read(Y) of T3 precedes Write(Y) of T2, so T3 → T2.
Read(X) and Write(X) of T1 precede Write(X) of T3, so T1 → T3.
Write(X) of T1 and Write(X) of T3 precede Read(X) and Write(X) of T2, so T1 → T2 and T3 → T2.
Now make the dependency graph:

T1 → T3 → T2 (with the additional edge T1 → T2)

Topological sorting of this graph gives the serialization order T1 → T3 → T2.

Q69. The following functional dependencies hold for relations R(A, B, C) and S(B, D, E):
B → A,
A → C
The relation R contains 200 tuples and the relation S contains 100 tuples. What is the maximum number of tuples possible in the natural join of R and S?
(A) 100 (B) 200 (C) 300 (D) 2000 CS2010

Ans. A
Explanation: From the set of functional dependencies, B is the primary key of R, so the 200 tuples of R carry 200 distinct values of B. Each tuple of S can therefore join with at most one tuple of R. In S there are two extreme cases:
1. All 100 values of B in S are the same (and this value is present in R)
2. All 100 values of B in S are distinct (and every one of them is present in R)
In both cases the natural join contains at most 100 tuples.

Statement for Linked Answer Questions: 70 & 71


A hash table of length 10 uses open addressing with hash function h(k) = k mod 10, and linear probing.
After inserting 6 values into an empty hash table, the table is as shown below
0
1
2 42
3 23
4 34
5 52
6 46
7 33
8
9

Q70. Which one of the following choices gives a possible order in which the key values could have been inserted in the table?
(A) 46, 42, 34, 52, 23, 33 (B) 34, 42, 23, 52, 33, 46
(C) 46, 34, 42, 23, 52, 33 (D) 42, 46, 33, 23, 34, 52 CS2010

Ans. C
Explanation: Build the hash table for each option and compare it with the given table. Only option C reproduces it.



Option A Option B Option C Option D
0
1
2 42 42 42 42
3 52 23 23 33
4 34 34 34 23
5 23 52 52 34
6 46 33 46 46
7 33 46 33 52
8
9

Q71. How many different insertion sequences of the key values using the same hash function and linear probing will result in the hash table shown above?
(A) 10 (B) 20 (C) 30 (D) 40
CS2010

Ans. C
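
The count can be confirmed by brute force, since six keys give only 720 insertion orders. This is a sketch in Python; `linear_probe_insert` is an illustrative helper, not part of the question.

```python
from itertools import permutations

def linear_probe_insert(keys, size=10):
    """Insert keys with h(k) = k mod size, resolving collisions by linear probing."""
    table = [None] * size
    for key in keys:
        slot = key % size
        while table[slot] is not None:
            slot = (slot + 1) % size
        table[slot] = key
    return table

# The table shown in the question: slots 2..7 hold 42, 23, 34, 52, 46, 33.
target = [None, None, 42, 23, 34, 52, 46, 33, None, None]

count = sum(1 for order in permutations([42, 23, 34, 52, 46, 33])
            if linear_probe_insert(order) == target)
print(count)
```

The same result follows by hand: 33 must come last, 52 must follow 42, 23 and 34 (which can appear in 3! = 6 orders), and 46 can take any of the 5 positions among those four keys, giving 6 × 5 = 30.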

Q72. Consider the following entity relationship diagram (ERD), where two entities E1 and E2 have a relation R of cardinality 1:m.

E1 ---(1)--- R ---(m)--- E2

The attributes of E1 are A11, A12 and A13 where A11 is the key attribute. The attributes of E2 are A21, A22 and A23 where A21 is the key attribute and A23 is a multi-valued attribute. Relation R does not have any attribute. A relational database containing minimum number of tables with each table satisfying the requirements of the third normal form (3NF) is designed from the above ERD. The number of tables in the database is:
(A) 2 (B) 3 (C) 5 (D)4 IT2004

Ans. B
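Explanation: The tables created from the ERD are
E1(A11, A12, A13), E2(A21, A22, A11) and A23(A21, A23). The 1:m relationship R needs no table of its own because it is represented by the foreign key A11 in E2, and the multi-valued attribute A23 gets a separate table.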

Q73. A relational database contains two tables, student and department, in which the student table has columns roll_no, name and dept_id and the department table has columns dept_id and dept_name. The following insert statements were executed successfully to populate the empty tables:
Insert into department values (1, 'Mathematics')
Insert into department values (2, 'Physics')
Insert into student values (1, 'Navin', 1)
Insert into student values (2, 'Mukesh', 2)
Insert into student values (3, 'Gita', 1)

How many rows and columns will be retrieved by the following SQL statement?
Select * from student, department
(A) 0 row and 4 columns
(B) 3 rows and 4 columns
(C) 3 rows and 5 columns
(D) 6 rows and 5 columns
IT2004

Ans. D
Explanation: The query is a Cartesian product of student and department, which returns 3*2 = 6 rows and 5 columns (3 from student and 2 from department).

Q74. A relation Empdtl is defined with attributes empcode (unique), name, street, city, state and pincode. For any pincode, there is only one city and state. Also, for any given street, city and state, there is just one pincode. In normalization terms, Empdtl is a relation in
(A) 1 NF only
(B) 2 NF and hence also in 1 NF
(C) 3 NF and hence also in 2 NF and 1 NF
(D) BCNF and hence also in 3 NF, 2NF and 1NF IT2004

Ans. B
Explanation: The functional dependencies given are
empcode → name, street, city, state, pincode (empcode is unique)
pincode → city
pincode → state
street, city, state → pincode
The only candidate key of Empdtl is {empcode}, so every other attribute is non-prime.

Check for 2NF: the key has a single attribute, so no non-prime attribute can depend on a part of it.
Empdtl is in 2NF.

Check for 3NF: in pincode → city and pincode → state the left side is not a superkey and the right side is not a prime attribute, so city and state depend transitively on the key through pincode.
Empdtl is not in 3NF.

The highest normal form satisfied by Empdtl is 2NF.

Q75. A table T1 in a relational database has the following rows and columns:

Roll No Marks
1 10
2 20
3 30
4 Null

The following sequence of SQL statements was successfully executed on table T1.



Update T1 set marks = marks + 5
Select avg(marks) from T1
What is the output of the select statement?
(A) 18.75 (B) 20 (C) 25 (D) Null
IT2004

Ans. C
Explanation: The query "Update T1 set marks = marks + 5" updates the table to

Roll No Marks
1 15
2 25
3 35
4 Null
Null + 5 is still Null, and avg() ignores Null values, so the query "Select avg(marks) from T1" returns
(15+25+35)/3 = 25
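
The handling of Null can be observed directly. The sketch below uses Python's sqlite3 module; the column name roll_no is an assumption standing in for "Roll No", and other database systems are expected to behave the same way for this standard aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (roll_no INTEGER PRIMARY KEY, marks INTEGER)")
conn.executemany("INSERT INTO T1 VALUES (?, ?)",
                 [(1, 10), (2, 20), (3, 30), (4, None)])

conn.execute("UPDATE T1 SET marks = marks + 5")   # NULL + 5 stays NULL
marks = [m for (m,) in conn.execute("SELECT marks FROM T1 ORDER BY roll_no")]
average = conn.execute("SELECT AVG(marks) FROM T1").fetchone()[0]

print(marks)     # the NULL row is returned as None
print(average)   # AVG divides by the number of non-NULL values
```

Dividing by 4 instead of 3 would give 18.75, which is the distractor in option (A).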

Q76. Consider the following schedule S of transactions T1 and T2:

T1              T2
Read(A)
A = A - 10
                Read(A)
                Temp = 0.2*A
                Write(A)
                Read(B)
Write(A)
Read(B)
B = B + 10
Write(B)
                B = B + Temp
                Write(B)
Which of the following is TRUE about the schedule S?
(A) S is serializable only as T1, T2
(B) S is serializable only as T2, T1
(C) S is serializable both as T1, T2 and T2, T1
(D) S is not serializable either as T1 or as T2 IT2004

Ans. D
Explanation: First find the conflicts in the schedule. T1 reads A before T2 writes A, which gives the edge T1 → T2. T2 reads A before T1 writes A, which gives the edge T2 → T1. The dependency graph has a cycle, so S is not conflict serializable.

A schedule that is not conflict serializable may or may not be serializable, so we also have to check the less strict definition of serializability, view serializability. View serializability tolerates write-write conflicts caused by blind writes, that is, writes of a data item that the transaction has not read.

In S, however, no write is blind: both transactions read the initial value of A, and both read B before writing it. In the serial order T1, T2 the transaction T2 would read the value of A written by T1, and in the order T2, T1 the transaction T1 would read the value of A written by T2. Neither matches S, so S is not view serializable either. The effect is visible in the final values: T2 computes B from the value it read before T1's Write(B), so the update B = B + 10 made by T1 is lost.

So option D is correct.

Q77. Consider two tables in a relational database with columns and rows as follows:
Table: Student                    Table: Department
Roll_no  Name  Dept_id           Dept_id  Dept_Name
1        ABC   1                 1        A
2        DEF   1                 2        B
3        GHI   2                 3        C
4        JKL   3

Roll_no is the primary key of the Student table, Dept_id is the primary key of the Department table and Student.Dept_id is a foreign key referencing Department.Dept_id. What will happen if we try to execute the following two SQL statements?
(i) update Student set Dept_id= Null where Roll_no =1
(ii) update Department set Dept_id = Null where Dept_id =1

(A) Both (i) and (ii) will fail (B) (i) will fail but (ii) will succeed
(C) (i) will succeed but (ii) will fail (D) Both (i) and (ii) will succeed IT2004

Ans. C
Explanation:
Query (i) runs correctly because a foreign key (Student.Dept_id) may hold Null in addition to the values present in the referenced column (Department.Dept_id).
Query (ii) fails because it tries to set Dept_id to Null; Dept_id is the primary key, and a primary key implicitly carries two constraints: 1. Unique and 2. Not null.

Q78. A hash table contains 10 buckets and uses linear probing to resolve collisions. The key values are integers and the hash function used is key % 10. If the values 43, 165, 62, 123, 142 are inserted in the table, in what location would the key value 142 be inserted?
(A) 2 (B) 3 (C) 4 (D)6 IT2005

Ans. D
Explanation: create hash table for given data
0
1
2 62
3 43
4 123
5 165
6 142
7
8
9

Q79. Consider the entities 'hotel room' and 'person' with a many-to-many relationship 'lodging' as shown below:

Hotel Room ---(m)--- lodging ---(m)--- person
If we wish to store information about the rent payment to be made by person(s) occupying different hotel rooms, then this information should appear as an attribute of
(A) Person
(B) Hotel Room
(C) Lodging



(D)None of these IT2005

Ans. C
Explanation: A many-to-many relationship is stored in a separate table. Here a separate table is created for lodging, which contains the primary keys of 'hotel room' and 'person' as its attributes. The rent payment depends on both the person and the room occupied, so it is stored in the lodging table.

Q80. A table has fields F1, F2, F3, F4, F5 with the following functional dependencies
F1 → F3
F2 → F4
(F1, F2) → F5
In terms of Normalization, this table is in
(A) 1 NF
(B) 2 NF
(C) 3 NF
(D)None of these IT2005

Ans. A
Explanation: First find the candidate key of the relation: {F1, F2}.
Now check for 2NF by looking for partial dependencies:
F1 → F3 - partial (part of the key → non-prime)
F2 → F4 - partial (part of the key → non-prime)
(F1, F2) → F5 - not partial (whole key → non-prime)
The relation is not in 2NF because it has partial dependencies, so it is only in 1NF.

Q81. A B-tree used as an index for a large database table has four levels including the root node. If a new key is inserted in this index, then the maximum number of nodes that could be newly created in the process are
(A) 5 (B) 4 (C) 3 (D)2 IT2005

Ans. A
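Explanation: Take an example in which every node on the path from the root to the leaf is full. The insertion splits the node at each of the four levels, creating one new node per level, and the split of the root creates one more node as the new root: 4 + 1 = 5.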

Q82. Amongst the ACID properties of a transaction, the 'Durability' property requires that the changes made to the database by a successful transaction persist
(A) except in case of an Operating System crash
(B) except in case of Disk crash
(C) except in case of a power failure
(D) always, even if there is a failure of any kind IT2005

Ans. D

Q83. A company maintains records of sales made by its salespersons and pays them commission based on each individual's total sales made in a year. This data is maintained in a table with the following schema:
salesinfo = (salespersonid, totalsales, commission)
In a certain year, due to better business results, the company decides to further reward its salespersons by enhancing the commission paid to them as per the following formula.
If commission <= 50000, enhance it by 2%
If 50000 < commission <= 100000, enhance it by 4%
If commission > 100000, enhance it by 6%



The IT staff has written three different SQL scripts to calculate the enhancement for each slab. Each of these scripts is to run as a separate transaction as follows:
T1 Update salesinfo
   Set commission = commission * 1.02
   Where commission <= 50000;
T2 Update salesinfo
   Set commission = commission * 1.04
   Where commission > 50000 and commission <= 100000;
T3 Update salesinfo
   Set commission = commission * 1.06
   Where commission > 100000;
Which of the following options of running these transactions will update the commission of all salespersons correctly?
(A) Execute T1, followed by T2 followed by T3
(B) Execute T2, followed by T3; T1 running concurrently throughout
(C) Execute T3 followed by T2; T1 running concurrently throughout
(D) Execute T3 followed by T2 followed by T1

Ans. D
Explanation: If T1 runs first, the commission of some salespersons rises above 50000, and when T2 runs afterwards they also receive the 4% enhancement. So T1 must not run before T2, and similarly T2 must not run before T3. Running the slabs from highest to lowest (T3, then T2, then T1) avoids this, so option D is correct.
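
The effect of the execution order can be simulated in a few lines. This is a sketch in Python rather than SQL; the three starting commissions are made-up values sitting on the slab boundaries, and integer arithmetic is used so the percentages come out exactly.

```python
# Each transaction: (condition on the current commission, percentage enhancement).
UPDATES = {
    "T1": (lambda c: c <= 50000, 2),
    "T2": (lambda c: 50000 < c <= 100000, 4),
    "T3": (lambda c: c > 100000, 6),
}

def run(order, commissions):
    """Apply the transactions serially in the given order and return the result."""
    result = list(commissions)
    for name in order:
        applies, percent = UPDATES[name]
        result = [c * (100 + percent) // 100 if applies(c) else c for c in result]
    return result

start = [50000, 100000, 150000]
correct = [51000, 104000, 159000]       # each commission enhanced exactly once

low_to_high = run(["T1", "T2", "T3"], start)
high_to_low = run(["T3", "T2", "T1"], start)
print(low_to_high)
print(high_to_low)
```

Running from the lowest slab upward enhances the first two commissions twice, because each update pushes them into the next slab before that slab's transaction runs.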

Q84. A table 'student' with schema (roll, name, hostel, marks) and another table 'hobby' with schema (roll, hobbyname) contain records as shown below.

Table: Student
Roll  Name              Hostel  Marks
1798  Manoj Rathod      7       95
2154  Soumic Banerjee   5       68
2369  Gumma Reddy       7       86
2581  Pradeep Pense     6       92
2643  Suhas Kulkarni    5       78
2711  Nitin Kadam       8       72
2872  Kiran Vora        5       92
2926  Manoj Kulkalikar  5       94
2959  Hemant Karkhanis  7       88
3125  Rajesh Doshi      5       82

Table: Hobby
Roll  Hobbyname
1798  Chess
1798  Music
2154  Music
2369  Swimming
2581  Cricket
2643  Chess
2643  Hockey
2711  Volleyball
2872  Football
2926  Cricket
2959  Photography
3125  Music
3125  Chess

The following SQL query is executed on the above tables:


select hostel
from student natural join hobby
where marks > = 75 and roll between 2000 and 3000;

Relations S and H with the same schema as those of these two tables respectively contain the same information as tuples. A new relation S' is obtained by the following relational algebra operation:
S' = Πhostel( σS.roll=H.roll( (σmarks>75 and roll>2000 and roll<3000 (S)) × H ) )
The difference between the number of rows output by the SQL statement and the number of tuples in S' is:
(A) 6 (B) 4 (C) 2 (D) IT2005

Ans. B
Explanation: Joining the tables under the given condition produces the following table:

Roll  Name              Hostel  Marks  Hobby
2369  Gumma Reddy       7       86     Swimming
2581  Pradeep Pense     6       92     Cricket
2643  Suhas Kulkarni    5       78     Chess
2643  Suhas Kulkarni    5       78     Hockey
2872  Kiran Vora        5       92     Football
2926  Manoj Kulkalikar  5       94     Cricket
2959  Hemant Karkhanis  7       88     Photography
The SQL statement selects only hostel from this table and keeps duplicates, so 7 rows are returned.
The relational algebra expression S' is similar to the SQL statement, but relational algebra always returns a set, and a set removes duplicate values. The result produced by S' is {7, 6, 5}, only three tuples. So the difference is 7-3 = 4.
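
The bag-versus-set difference can be reproduced with Python's sqlite3 module. This is a sketch: the table definitions are assumptions matching the stated schemas, and a Python set stands in for the duplicate-free result of the relational algebra projection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll INTEGER, name TEXT, hostel INTEGER, marks INTEGER)")
conn.execute("CREATE TABLE hobby (roll INTEGER, hobbyname TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", [
    (1798, "Manoj Rathod", 7, 95), (2154, "Soumic Banerjee", 5, 68),
    (2369, "Gumma Reddy", 7, 86), (2581, "Pradeep Pense", 6, 92),
    (2643, "Suhas Kulkarni", 5, 78), (2711, "Nitin Kadam", 8, 72),
    (2872, "Kiran Vora", 5, 92), (2926, "Manoj Kulkalikar", 5, 94),
    (2959, "Hemant Karkhanis", 7, 88), (3125, "Rajesh Doshi", 5, 82),
])
conn.executemany("INSERT INTO hobby VALUES (?, ?)", [
    (1798, "Chess"), (1798, "Music"), (2154, "Music"), (2369, "Swimming"),
    (2581, "Cricket"), (2643, "Chess"), (2643, "Hockey"), (2711, "Volleyball"),
    (2872, "Football"), (2926, "Cricket"), (2959, "Photography"),
    (3125, "Music"), (3125, "Chess"),
])

rows = conn.execute("""SELECT hostel FROM student NATURAL JOIN hobby
                       WHERE marks >= 75 AND roll BETWEEN 2000 AND 3000""").fetchall()
sql_row_count = len(rows)                 # SQL keeps duplicate hostels
hostel_set = {h for (h,) in rows}         # a relation is a set: duplicates vanish
print(sql_row_count, sorted(hostel_set), sql_row_count - len(hostel_set))
```

The SQL condition uses marks >= 75 while the algebra expression uses marks > 75; no student has exactly 75 marks, so both select the same tuples here.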

