
Database Management Systems
Subject Code: MI 0034
Revised Edition: Spring 2010
BKID: B1966
Sikkim Manipal University


Directorate of Distance Education
Department of Management Studies
Board of Studies
Chairman, HOD Management Studies, SMU DDE
Additional Registrar, SMU DDE
Dean, SMU DDE
Dr. T. V. Narasimha Rao, Adjunct Faculty and Advisor, SMU DDE
Prof. K. V. Varambally, Director, Manipal Institute of Management, Manipal
Mr. Pankaj Khanna, Director HR, Fidelity Mutual Fund
Mr. Shankar Jagannathan, Former Group Treasurer, Wipro Technologies Limited
Mr. Abraham Mathew, Chief Financial Officer, Infosys BPO
Ms. Sadhna Dash, Senior HR Consultant, Bangalore
Revised Edition: Spring 2010
Printed: September 2014
This book is a distance education module comprising a collection of learning
materials for our students. All rights reserved. No part of this work may be
reproduced in any form by any means without permission in writing from Sikkim
Manipal University, Gangtok, Sikkim. Printed and Published on behalf of Sikkim
Manipal University, Gangtok, Sikkim by Manipal Global Education Services
Manipal 576 104.
Printed at Manipal Technologies Limited, Manipal.

Author's Profile:
Ramya S Gowda holds an MS in Computer Science and Engineering and is pursuing her postgraduation in management. She worked as Scientist C at the Master Control Facility, Department of Space Communication, ISRO, Hassan. She has been associated with academics since 2006 and is presently a faculty member at Sikkim Manipal University. She has published papers in fields such as pattern recognition, e-learning and distance education, data mining, business intelligence, e-commerce and enterprise resource planning in national and international journals and conferences, including the International Conference on Digital Factory (ICDF), the National Conference on IT Enabled Practices and Emerging Management Paradigms, the International Conference on Computer Technology and Development (ICCTD), Emerging Trends in Computer Science and Information Technology (ETCSIT), the International Journal on Computer Science and Information Technology (IJCSIT), the International Journal of Computational Engineering Research (ijceronline.com) and the Journal of Information Technology and Engineering.
Reviewer's Profile
Dr Jai Raj Nair holds a Bachelor's degree in Architecture from Bengal Engineering College (University of Calcutta), a PGDBM from IIM Calcutta and a Ph.D. from Symbiosis International University, Pune. He worked for 9 years in the business domain of engineering and software consultancy in reputed organisations such as Development Consultants Ltd. (Delhi and Calcutta) and Kirloskar Computer Services Ltd. (Bengaluru) before joining the academic world. He has taught IT-related subjects relevant to management at reputed B-Schools for over 12 years. Dr Nair is a voracious reader and an avid writer. He has presented papers at several national, regional and international conferences, and some of his papers were selected for international conferences held in Thailand, Italy, Greece and India. He has also published research papers and articles in reputed management journals. His research interests include e-retailing, supply chain management and retro-logistics, business process reengineering and technology-enabled retailing.
In-House Content Review Team
Dr. Sudhakar G. P., HOD, Dept. of Management Studies, SMU DDE
Ms. Ramya S Gowda, Assistant Professor, Dept. of Management Studies, SMU DDE

Contents
Unit 1: Database Management System, page 1
Unit 2: Database Architecture, page 22
Unit 3: Record Storage and File Structure Organisation, page 41
Unit 4: Database Design, page 63
Unit 5: Entity Relationship Model, page 89
Unit 6: Relational Algebra and Relational Calculus, page 110
Unit 7: Structured Query Language, page 135
Unit 8: Functional Dependencies and Normalisation, page 173
Unit 9: Database Administration, page 192
Unit 10: Operations and Management, page 209
Unit 11: Controls, page 227
Unit 12: Distributed Databases, page 237
Unit 13: Object-Relational Databases, page 249
Unit 14: Security and Integrity, page 265

MI 0034
Database Management Systems
Course Description
A Database Management System (DBMS) is a collection of programs that enables you to store, modify and extract information from a database. There are many different types of DBMSs, ranging from small systems that run on personal computers to huge systems that run on mainframes.
This SLM, Database Management Systems, presents the fundamental concepts of database management in an intuitive manner, geared toward allowing students to begin working with databases as quickly as possible.
This SLM is designed as a first course in databases for postgraduate students. It also contains additional material that can be used as a supplement or as introductory material for an advanced course. To understand this SLM better, you should be familiar with basic data structures, computer organisation and a high-level programming language. Important theoretical results are covered, but formal proofs are omitted; in place of proofs, figures and examples are used to suggest why a result is true.

Course Objectives
Database management has evolved into a central component of the modern computing environment, and knowledge about database systems has therefore become an essential part of an education in computer science. This SLM discusses the fundamental concepts of database management, such as database design and database languages.
After studying this course, the student should be able to:
explain the different components of a DBMS
elaborate on the working of the three-schema architecture
list and explain storage devices
describe various terminologies of database design
elucidate the ER Model concept with an example and describe its components
differentiate between tuple relational calculus and domain relational calculus
list the different commands in SQL
elucidate the different types of normal forms
describe the basic concepts of transaction processing systems
differentiate between centralised systems and Distributed Database Management Systems (DDBMS)
explain security and backup features in databases
list the advantages of DDB
elucidate how ODBMSs work
analyse the role of the database administrator in security

This courseware comprises 14 units. A brief description of the units is given below:
Unit 1: Database Management Systems
This unit discusses the differences between traditional file systems and modern database management systems, the working of a simple centralised database system, the properties and components of a DBMS, the types of database users and database systems, and the advantages of a DBMS.
Unit 2: Database Architecture
This unit discusses the working of the three-schema architecture, conceptual modelling, the meaning of relationships, and database languages and interfaces.
Unit 3: Record Storage and File Structure Organisation
This unit explains the memory hierarchy and secondary storage devices. It also explains the buffering of blocks, the placing of file records on disk and operations on files, and it differentiates between files of unordered records (heap files) and files of ordered records.
Unit 4: Database Design
This unit describes the relational data model, various operations in relational algebra, the data dictionary and normalisation. It also compares the different normal forms.
Unit 5: Entity Relationship Model
This unit discusses the conceptual data model for database design and explains the ER Model concept with an example, the components of an ER Model and the constraints on relationship types.

Unit 6: Relational Algebra and Relational Calculus
This unit covers the constraints of the relational model, update operations on relations and the various operations in relational algebra, and it differentiates between tuple relational calculus and domain relational calculus.
Unit 7: Structured Query Language
This unit explains the definition of SQL, the different types of SQL, multiple-table queries and the data manipulation language, and it demonstrates how to create databases.
Unit 8: Functional Dependencies and Normalisation
This unit discusses the guidelines for designing relational databases, the levels of relational schema, and the different types of normal forms and how they differ from one another.
Unit 9: Database Administration
This unit discusses the basic concepts of transaction processing systems, transactions in multiuser systems and the properties of transactions.
Unit 10: Operations and Management
This unit discusses client-server databases, the different types of locks and the locking protocol used in concurrency management, the differences between centralised systems and Distributed Database Management Systems (DDBMS), the advantages and disadvantages of DDBMS, and the various types of distributed systems.
Unit 11: Controls
This unit describes atomicity, different recovery techniques, and security and backup features in databases.
Unit 12: Distributed Databases
This unit describes distributed database (DDB) systems, the advantages of DDB, data replication and data fragmentation.
Unit 13: Object-Relational Databases
This unit describes the advantages of object-oriented design, the working of the object-oriented data model, how ODBMSs work, and the constraints of ORDBMS.
Unit 14: Security and Integrity
This unit discusses various violations of security and integrity, relates the concepts of authorisation and authentication to the database administrator, and covers security specifications in SQL, the role of the database administrator in security and the ethical issues in database security.

Database Management Systems

Unit 1

Unit 1

Database Management System

Structure:
1.1 Introduction
    Objectives
1.2 Evolution of Database
1.3 Traditional File Systems versus Modern Database Management Systems
    1.3.1 File processing systems
    1.3.2 Database management systems
    1.3.3 Difference between file systems and database management systems
1.4 Database Environment
1.5 Working of Simple Centralised System
1.6 Properties of Database Management System
1.7 Components of Database Management System
    1.7.1 Database engine
    1.7.2 Data dictionary
    1.7.3 Forms generator
    1.7.4 Query processor
    1.7.5 Report writer
1.8 Types of Database Users
    1.8.1 Database Administrator (DBA)
    1.8.2 Database Designers (DBD)
    1.8.3 End users
    1.8.4 System analysts and application programmers
    1.8.5 DBMS system designers and implementers
    1.8.6 Tool developers
    1.8.7 Operators and maintenance personnel
1.9 Types of Database Systems
1.10 Advantages of Database Management System
1.11 Summary
1.12 Glossary
1.13 Terminal Questions
1.14 Answers
1.15 Case Study

Sikkim Manipal University

B1966

Page No. 1

Database Management Systems

Unit 1

1.1 Introduction
To understand the Database Management System (DBMS) better, you should know what data, information and a database are. In this unit, you will study the evolution of databases and the basic differences between traditional file systems and modern database management systems. We will also describe the database environment and how it works. Before going through the unit, let us review the basic concepts required for studying DBMS.
The major component of a database is data. Data is a raw fact that can be recorded. Processed data is called information. For example, the letters E, L, E, P, H, A, N, T have no meaning to us until they are combined into the noun "Elephant". Here, each letter is data and the word "Elephant" is information. A collection of data arranged in rows and columns is called a database.
A database management system is defined as a complex set of software programs that controls the organisation, storage and retrieval of data in a database. A database system therefore consists of a collection of related data together with a set of programs to access that data, and the DBMS also holds a complete description of the database structures and constraints. (Source: www.managefranchise.blogspot.com)
DBMSs are used in many areas, including business, engineering, education, banking, law and any kind of transaction processing. Before discussing the various definitions of DBMS, we first need to know about the earlier method used to store data and how it differs from a DBMS.
In this unit, you will study the differences between file systems and DBMS. You will also study the properties, components, advantages and disadvantages of DBMS, and find out the types of users who use a DBMS.
Let us go through this unit and explore these aspects of DBMS.
Objectives:
After studying this unit, you should be able to:
differentiate between traditional file systems and modern database management systems
elucidate the working of a simple centralised database system
list the properties of DBMS
explain the different components of DBMS
describe various types of database users and database systems
list the advantages of DBMS

1.2 Evolution of Database
In the early days, information was stored in written form. Although Aristotle said that storing large amounts of data would be a delicate issue, large amounts of data were nevertheless stored in voluminous repositories called books. As more and more data went into books, they took up more space, and users needed more time to find the information they required. Collections of books then evolved into the first real databases, namely libraries.
Libraries soon set standards for themselves because, without standards, accessing the data was chaotic. With these standards, libraries made both storage and retrieval efficient.
The library system consisted of an indexing system and pointers. It grew along with storage technologies until the arrival of computers; Alan Turing, known for his work on breaking the German Enigma codes during World War II, was one of the pioneers of computing. Once computers arrived, they were used extensively to store and retrieve data. After World War II, large amounts of data were stored in file systems because they were comparatively cheap. The first file systems borrowed traditional terms and metaphors: archived data files were called tables, rows in a table were called records and columns were called fields. As data volumes gradually increased, the drawbacks of file systems led to the invention of database management systems.
Self-Assessment Questions
1. Archived data files are called ______________.
a) tables
b) fields
c) records
d) rows
2. Rows in the table are called ____________ and columns are called
______________.
3. The collection of data in rows and columns is called _______________.

1.3 Traditional File Systems versus Modern Database Management Systems

1.3.1 File processing systems
In Section 1.2, you studied what a file looks like. A file is a collection of related data records, which may be indexed. In file processing systems, information is stored as groups of records called files. These systems combine the files with the application programs that access them. Such files are called flat files. The application programs were typically written in the Common Business-Oriented Language (COBOL). While these systems were being created, the focus was on business processes. Since business processes are dynamic, the files and applications had to be changed whenever the processes changed. This approach offered little flexibility: as the systems became more complex, they became very difficult to maintain.
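To make the maintenance problem concrete, here is a minimal Python sketch (not COBOL, and not taken from any real system) of an application that reads a flat file of fixed-width employee records. The record layout `LAYOUT` and its field widths are hypothetical; the point is that the layout lives inside the program, so the program and the file must always change together.

```python
# Minimal sketch of a file processing application (hypothetical layout).
# The record layout is hard-coded in the program, much as a COBOL record
# description would be: (field name, start column, end column).
LAYOUT = [("emp_id", 0, 6), ("emp_name", 6, 26), ("emp_sal", 26, 36)]


def format_record(rec: dict) -> str:
    """Write one record as a fixed-width line using the hard-coded widths."""
    return f"{rec['emp_id']:<6}{rec['emp_name']:<20}{rec['emp_sal']:>10}"


def parse_record(line: str) -> dict:
    """Split one fixed-width line back into fields using the same layout."""
    return {name: line[start:end].strip() for name, start, end in LAYOUT}


# The flat file is simulated here as a list of lines.
flat_file = [
    format_record({"emp_id": "100", "emp_name": "Prasad", "emp_sal": "40000"}),
    format_record({"emp_id": "101", "emp_name": "Usha", "emp_sal": "10000"}),
]

records = [parse_record(line) for line in flat_file]
print(records[0])  # {'emp_id': '100', 'emp_name': 'Prasad', 'emp_sal': '40000'}
```

If a new field such as a mobile number were inserted between the name and the salary, every line of the file would have to be rewritten and every program holding its own copy of `LAYOUT` would have to be changed as well.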
1.3.2 Database management systems
According to Ramez Elmasri and Shamkant B. Navathe, a database management system is a collection of programs that enables users to create and maintain a database. A DBMS is a general-purpose software system that facilitates the processes of defining, constructing, manipulating and sharing databases among various users and applications. As discussed in the introduction, a database is a collection of data arranged in rows and columns. Examples of systems built on databases include Automated Teller Machines (ATMs), computerised library systems, ticket reservation systems and computer-based inventory systems.
The database definition specifies the field names and data formats (for example, whether the data is textual, binary or character data); the structure of the records (for example, whether the records are of fixed length, whether they contain pointers and in what order the fields appear); and the structure of the files (for example, whether a file is indexed or sequential).
The main purpose of a DBMS is to store and organise data, to control access to it and to protect it.
1.3.3 Difference between file systems and database management systems
Table 1.1 explains the difference between file systems and database management systems.
Table 1.1: Comparison between File Systems and Database Management Systems

1. Traditional file system: They are small systems, often PC based.
   Modern DBMS: They are large systems, often mainframe based.

2. Traditional file system: It is relatively cheap.
   Modern DBMS: It is relatively expensive.

3. Traditional file system: Data definition is part of the application program and works only with that specific application. A change in the data definition requires a change in the program.
   Modern DBMS: Data definition is part of the DBMS. The data is independent of the application and can be used by any application.

4. Traditional file system: Design driven; new kinds of data require design or coding changes. For example, if a traditional employee master file has the fields Emp_name, Emp_id, Emp_addr, Emp_design, Emp_dept and Emp_sal, inserting one more column, Emp_Mob (mobile number), requires a complete restructuring of the file or a redesign of the application code, even though all the other data stays the same.
   Modern DBMS: One extra column (attribute) can be added without any difficulty. Only minor coding changes in the application program may be required.

5. Traditional file system: Keeps redundant (duplicate) information in many locations, which can result in the loss of data consistency. For example, employee names might exist in separate files such as the Payroll Master File and the Employee Benefit Master File. If an employee changes his or her last name, the name might be changed in the Payroll Master File but not in the Employee Benefit Master File.
   Modern DBMS: Redundancy is eliminated to the maximum extent if the database is properly designed.

6. Traditional file system: Data is scattered across various files, each possibly in a different format, which makes it difficult to write new application programs to retrieve the appropriate data.
   Modern DBMS: This problem is solved, because all the data is stored and accessed through the DBMS.

7. Traditional file system: Security features have to be coded in the application program itself.
   Modern DBMS: Security requirements usually need not be coded, as most of them are taken care of by the DBMS.

8. Traditional file system: It has a simple structure.
   Modern DBMS: It has a complex structure.

9. Traditional file system: It has little or no data security.
   Modern DBMS: It has very stringent data security.

10. Traditional file system: It has simple and primitive backup and recovery.
    Modern DBMS: It has complex and sophisticated backup and recovery.
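Two of the DBMS-side entries in Table 1.1, adding a column without restructuring and storing data once instead of redundantly, can be sketched with Python's built-in sqlite3 module. SQLite is only a convenient stand-in for a DBMS here; the Benefit table, the `payroll_listing` function and the changed employee name are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Emp_id INTEGER PRIMARY KEY, Emp_name TEXT, Emp_sal NUMERIC)")
# Benefit data refers to the employee by Emp_id instead of repeating the name.
conn.execute("CREATE TABLE Benefit (Emp_id INTEGER REFERENCES Employee(Emp_id), Benefit_plan TEXT)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                 [(100, "Prasad", 40000), (101, "Usha", 10000)])
conn.execute("INSERT INTO Benefit VALUES (100, 'Health')")


def payroll_listing(conn: sqlite3.Connection) -> list:
    """An existing application query: it names only the columns it uses."""
    return conn.execute("SELECT Emp_name, Emp_sal FROM Employee ORDER BY Emp_id").fetchall()


before = payroll_listing(conn)

# Adding an attribute: no file restructuring, and payroll_listing is untouched.
conn.execute("ALTER TABLE Employee ADD COLUMN Emp_Mob TEXT")  # existing rows get NULL
after = payroll_listing(conn)

# The name is stored once, so a single UPDATE keeps payroll and benefits consistent.
conn.execute("UPDATE Employee SET Emp_name = 'Prasad Rao' WHERE Emp_id = 100")
benefit_name = conn.execute(
    "SELECT e.Emp_name FROM Benefit AS b JOIN Employee AS e ON e.Emp_id = b.Emp_id"
).fetchone()[0]

print(before == after)  # True
print(benefit_name)     # Prasad Rao
```

The design choice that matters is that the application asks for columns by name and the DBMS owns the record layout, so the physical change made by `ALTER TABLE` is invisible to the existing query.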

Self-Assessment Questions
4. ______________________ are the combination of files and application
programs to access those files.
5. State whether the following statements are true or false.
a) Database defines number of rows and columns.
b) DBMS is relatively cheap.
c) File system is design driven.
d) In DBMS, one extra column can be added without any difficulty.
e) DBMS has a simple structure.

1.4 Database Environment
Consider the name, roll number, class, section, attendance and marks of the students in a class. These data can be noted down in an address book, or stored on a hard disk using a laptop and software such as Microsoft Excel. Here, the name, roll number, class, section, attendance and marks of the students are data; this set of related data with an implicit meaning is called a database. Fig. 1.1 depicts a database environment.


[Figure: Database software. Application programs/queries go to the DBMS software, which contains software to process programs/queries and software to access stored data; the DBMS uses the stored data description (metadata) and the stored database.]
Fig. 1.1: Database Environment

DBMS acts as an intermediary between programs and the data. Application programs do not access the data directly; they send their requests to the DBMS, which then accesses the data. Application programs are therefore independent of the file structures, so a change in the file structures does not require a change in the programs, and vice versa.
Various procedures carried out in a DBMS:
1. The process of specifying the data types, structures and constraints is called defining the database.
2. The process of storing the data on some storage medium is called constructing the database.
3. Manipulating the database involves retrieving (finding) the required data and modifying it as required.


Example: EMPLOYEE database
1. Defining a database
Entity: Employee
Attribute | Data type | Constraints (limitations)
Emp_name | Char (40) | Alphabet only
Emp_id | Num (6) | Val > 0
Emp_addr | Char (100) | (none)
Emp_desig | Char (15) | (none)
Emp_dept | Char (10) | Alphabet only
Emp_sal | Number (10,2) | Val > 0

2. Constructing the database
Emp_name | Emp_id | Emp_addr | Emp_desig | Emp_Sal (Rs.)
Prasad | 100 | Shubhodaya, Near Katariguppe Big Bazaar, BSK II stage, Bangalore | Project leader | 40,000
Usha | 101 | #165, 4th main Chamrajpet, Bangalore | Software engineer | 10,000
Nupur | 102 | #12, Manipal Towers, Bangalore | Lecturer | 30,000
Peter | 103 | Syndicate House, Manipal | IT executive | 15,000

3. Manipulating the database
Examples of some queries:
1. List all employees whose salaries are greater than Rs. 20,000.
2. List all employees whose names start with P.
3. Delete records whose Emp_name is Prasad.
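The three steps of the EMPLOYEE example can be sketched with Python's built-in sqlite3 module, used here only as a convenient stand-in for a DBMS. The mapping of Char, Num and Number to SQLite column types, the GLOB pattern used for "Alphabet only" and the extra rejected row ('Ravi', -5) are assumptions made for illustration.

```python
import sqlite3


def build_employee_db() -> sqlite3.Connection:
    """Define and construct the EMPLOYEE example in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    # 1. Defining: attribute names, data types and constraints.
    #    SQLite does not enforce CHAR lengths; the sizes only document intent.
    #    NOT GLOB '*[^A-Za-z ]*' means "contains nothing but letters and spaces".
    conn.execute("""
        CREATE TABLE Employee (
            Emp_name  CHAR(40) NOT NULL CHECK (Emp_name NOT GLOB '*[^A-Za-z ]*'),
            Emp_id    INTEGER PRIMARY KEY CHECK (Emp_id > 0),
            Emp_addr  CHAR(100),
            Emp_desig CHAR(15),
            Emp_dept  CHAR(10) CHECK (Emp_dept NOT GLOB '*[^A-Za-z ]*'),
            Emp_sal   NUMERIC(10, 2) CHECK (Emp_sal > 0)
        )""")
    # 2. Constructing: store the data (Emp_dept stays NULL, as in the example).
    conn.executemany(
        "INSERT INTO Employee (Emp_name, Emp_id, Emp_addr, Emp_desig, Emp_sal) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            ("Prasad", 100, "Shubhodaya, Near Katariguppe Big Bazaar, BSK II stage, Bangalore",
             "Project leader", 40000),
            ("Usha", 101, "#165, 4th main Chamrajpet, Bangalore", "Software engineer", 10000),
            ("Nupur", 102, "#12, Manipal Towers, Bangalore", "Lecturer", 30000),
            ("Peter", 103, "Syndicate House, Manipal", "IT executive", 15000),
        ],
    )
    conn.commit()
    return conn


conn = build_employee_db()

# 3. Manipulating: the three example queries.
high_earners = [r[0] for r in conn.execute(
    "SELECT Emp_name FROM Employee WHERE Emp_sal > 20000 ORDER BY Emp_id")]
p_names = [r[0] for r in conn.execute(
    "SELECT Emp_name FROM Employee WHERE Emp_name LIKE 'P%' ORDER BY Emp_id")]
conn.execute("DELETE FROM Employee WHERE Emp_name = 'Prasad'")
conn.commit()
remaining = [r[0] for r in conn.execute("SELECT Emp_name FROM Employee ORDER BY Emp_id")]

# The constraints are enforced by the DBMS itself, not by application code.
try:
    conn.execute("INSERT INTO Employee (Emp_name, Emp_id, Emp_sal) VALUES ('Ravi', -5, 1000)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(high_earners)  # ['Prasad', 'Nupur']
print(p_names)       # ['Prasad', 'Peter']
print(remaining)     # ['Usha', 'Nupur', 'Peter']
print(rejected)      # True
```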
Self-Assessment Questions
6. DBMS acts as an intermediary between ________________ and
_______________.
7. The process of specifying the data types, structures and constraints is
called __________________________.


1.5 Working of Simple Centralised System
Figure 1.2 shows the working of a centralised database system.
[Figure: users in the Purchasing, Accounts Payable, Sales, Accounts Receivable, Inventory, Personnel and Payroll departments send requests and queries to the DBMS, which accesses the central database and returns outputs and reports.]

Fig. 1.2: Working of a Simple Centralised System

In a centralised database system:
The database is stored in a central location.
Users have access to the common database.
Users access the data in the central location from their own machines using suitable programs, which are installed on the users' individual computer terminals, as shown in Figure 1.2.


1.6 Properties of Database Management System
The following are the important properties of a database:
A database is a logical collection of data having some implicit meaning. If the data are not related, the collection is not a proper database. For example, the following related data record that a student studying in Class II obtained the 5th rank:
Stud_name | Class | Rank obtained
Vijetha | Class II | 5th
A database consists of both the data and a description of the database structure and constraints. For example:
Field name | Type | Description
Stud_name | Character | It is the student's name.
Class | Alphanumeric | It is the class of the student.
A database can be of any size and complexity. For example, the employee database considered above may consist of only a few records, each with a simple structure:
Emp_name | Emp_id | Emp_addr | Emp_desig | Emp_Sal (Rs.)
Prasad | 100 | Shubhodaya, Near Katariguppe Big Bazaar, BSK II stage, Bangalore | Project leader | 40,000
Usha | 101 | #165, 4th main Chamrajpet, Bangalore | Software engineer | 10,000
Nupur | 102 | #12, Manipal Towers, Bangalore | Lecturer | 30,000
Peter | 103 | Syndicate House, Manipal | IT executive | 15,000
Likewise, there may be any number (n) of records.
A DBMS is a general-purpose software system that facilitates the process of defining, constructing and manipulating databases for various applications.
A database provides insulation between programs and data, and it supports data abstraction. Data abstraction means that users and programs work with a conceptual representation of the data, while the details of how the data is physically stored are hidden from them.

The data in the database is used by a variety of users for a variety of purposes. For example, in a hospital database management system, the view of the patient data needed by the billing staff is different from the view needed by a doctor. The data may appear to be stored separately for the different users, but in fact it is stored in a single database. This property is referred to as multiple views of the database.
A multiuser DBMS must allow the data to be shared by multiple users simultaneously. For this purpose, the DBMS includes concurrency control software to ensure that updates made to the database by several users at the same time are applied correctly. This property is known as multiuser transaction processing.
Self-Assessment Questions
8. ____________________ is a logical collection of data having some
implicit meaning.
9. In Centralised database system, users have access to the common
database. (True/False)
10. DBMS is a specific purpose software system. (True/False)

1.7 Components of Database Management System
The DBMS is essential: without it, database administrators cannot perform any action on the databases. As we studied in Section 1.4, every request has to pass through the DBMS, which contains a powerful set of tools. Figure 1.3 shows the various components of a DBMS.

[Figure: Database Engine, Data Dictionary, Forms Generator, Query Processor and Report Writer]
Fig. 1.3: Components of DBMS

1.7.1 Database engine
The database engine is termed the heart of the database management system. It acts as a coordinator for all the activities of the DBMS components and is responsible for completing database operations correctly. The other components of the DBMS depend on the database engine to carry out their tasks, such as storing data and system information, retrieving data and updating data in the database. In addition to coordinating the tasks of the DBMS components, the database engine is also responsible for the security services of the database.
1.7.2 Data dictionary
The data dictionary contains information about the data, including the type of data and its structures. It is an integral part of the database management system and forms part of the metadata of the database. The data dictionary stores the definitions of the data and its structures as well as information about the storage allocation of the data. It coordinates with the database engine to keep track of data, and it helps the database engine to retrieve data and to generate and store report information.
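In SQLite, used here only as an example DBMS, the data dictionary is visible through the built-in `sqlite_master` table and the `PRAGMA table_info` statement. Other DBMSs expose similar metadata under different names (for example, the standard INFORMATION_SCHEMA views), so treat this as a sketch of the idea rather than a portable recipe. The Student table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Student (
        Stud_id        INTEGER PRIMARY KEY,
        Stud_name      TEXT NOT NULL,
        Marks_obtained REAL CHECK (Marks_obtained BETWEEN 0 AND 100)
    )""")

# sqlite_master is SQLite's catalogue: one row per table, index, view or trigger,
# including the original CREATE statement that defined it.
catalogue = conn.execute("SELECT type, name FROM sqlite_master").fetchall()
stored_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'Student'"
).fetchone()[0]

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
columns = [(name, col_type, bool(notnull))
           for _cid, name, col_type, notnull, _default, _pk
           in conn.execute("PRAGMA table_info(Student)")]

print(catalogue)  # [('table', 'Student')]
print(columns)
```

The DBMS reads this stored definition itself, for example to check the CHECK constraint on every insert, which is why application programs do not need their own copy of the record layout.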
1.7.3 Forms generator
The forms generator is used to design the front-end layout for data input. It allows you to create screen layouts for entering data and for displaying data after retrieval. For example, consider a student database. If you would like to display the details of the students who have scored more than 60% in the examination, you need to enter several details such as student ID, student name and class to specify the query. You can use the forms generator to display the required information in a user-friendly format. Forms generators are found in products such as Microsoft Access.
1.7.4 Query processor
The query processor is one of the important components of a DBMS; it parses, optimises and compiles queries for execution. Each DBMS package may have its own query language for creating queries, but most relational DBMSs use the Structured Query Language (SQL), which provides a set of commands for expressing queries.


For example, to retrieve the data of the students whose marks are 60% or more from the student database, we can write the following query:
SELECT *
FROM STUDENT
WHERE Marks_obtained >= 60
When this query is submitted, the database engine coordinates with the query processor to process the query and produce the output. The query processor first parses the query and checks it for errors; if it finds any, it displays a notification. The query is then optimised by selecting the required resources and an efficient way to access the data, and finally it is compiled and executed to produce the output.
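The parse, optimise and execute stages can be observed with Python's built-in sqlite3 module, used here as an example DBMS. A column name containing a space, such as `Marks obtained`, cannot be written unquoted, so the parser rejects that query before any data is read; the corrected query is then planned and run. SQLite compiles each statement into bytecode when it is prepared, and other DBMSs divide this work differently. The student names and marks are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (Stud_name TEXT, Marks_obtained REAL)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?)",
                 [("Vijetha", 72.5), ("Arun", 58.0), ("Meena", 60.0)])

# Parsing: an unquoted column name containing a space is a syntax error.
try:
    conn.execute("SELECT * FROM STUDENT WHERE Marks obtained >= 60")
    parse_error = None
except sqlite3.OperationalError as exc:
    parse_error = str(exc)

query = "SELECT Stud_name FROM STUDENT WHERE Marks_obtained >= 60 ORDER BY Stud_name"

# Optimising: EXPLAIN QUERY PLAN shows the access strategy the optimiser chose.
# The exact wording of the plan differs between SQLite versions.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

# Executing the compiled query.
passed = [row[0] for row in conn.execute(query)]

print(parse_error)  # a "syntax error" message from the parser
print(passed)       # ['Meena', 'Vijetha']
```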
1.7.5 Report writer
Report writers are an optional component of a DBMS, because the database can be accessed online. However, we sometimes require printed reports for documentation purposes. The format of the reports is product specific. Crystal Reports is an example of a popular report writer.
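At its core, a report writer turns query results into a formatted, printable document. Products such as Crystal Reports do this through a visual designer; the following is only a minimal Python sketch of the same idea, with a hypothetical fixed-width layout and the salary figures from the EMPLOYEE example stored in an in-memory SQLite database.

```python
import sqlite3


def salary_report(conn: sqlite3.Connection) -> str:
    """Format query results as a fixed-width, printable text report."""
    width = 34
    lines = [
        "EMPLOYEE SALARY REPORT",
        f"{'Emp_id':<8}{'Emp_name':<12}{'Emp_sal (Rs.)':>14}",
        "-" * width,
    ]
    total = 0
    for emp_id, name, sal in conn.execute(
            "SELECT Emp_id, Emp_name, Emp_sal FROM Employee ORDER BY Emp_id"):
        lines.append(f"{emp_id:<8}{name:<12}{sal:>14,}")  # thousands separator
        total += sal
    lines.append("-" * width)
    lines.append(f"{'Total':<20}{total:>14,}")
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Emp_id INTEGER PRIMARY KEY, Emp_name TEXT, Emp_sal INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                 [(100, "Prasad", 40000), (101, "Usha", 10000),
                  (102, "Nupur", 30000), (103, "Peter", 15000)])
report = salary_report(conn)
print(report)
```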
Self-Assessment Questions
11. ______________________ component is responsible for the security
services of the database.
12. ________________________ contains the information about the data
including the type of data and its structures.
13. Forms generator is used for generating reports. (True/False)
14. Query processor does __________________, ______________ and
_________.

1.8 Types of Database Users
The different people involved in the design, use and maintenance of a large database include the following:
1. Database Administrator (DBA)
2. Database Designers (DBD)
3. End users
4. System analysts and application programmers
5. DBMS system designers and implementers
6. Tool developers
7. Operators and maintenance personnel


1.8.1 Database Administrator (DBA)
In an organisation, the database is a primary resource used by many people, and the DBMS and related software are secondary resources. Administering these resources is the responsibility of the database administrator.
The responsibilities of the database administrator are listed below:
He/she usually has complete authority to access and monitor the database.
He/she is responsible for creating, modifying and maintaining the database.
He/she grants permissions to the users of the database.
He/she stores the profile of each user in the database.
He/she defines procedures to recover the database from failures due to human, natural or hardware causes.
1.8.2 Database Designers (DBD)
A database designer designs the database in such a manner that it meets
the requirements of the clients.
1.8.3 End users
End users are the people who access the database to query and update it and to generate various reports; the database primarily exists for their use.
End users are of two types:
o Casual users: users who access the DBMS with SQL queries.
o Naive users: users who access the DBMS through menus.
1.8.4 System analysts and application programmers
System analysts collect the information regarding requirements of the
end users and develop specifications for canned transactions
(standardised queries and updates with carefully programmed data
validity checking) that meet their requirements.
Application programmers implement the specifications developed by the system analysts in the form of programs. They are also responsible for testing, debugging, documenting and maintaining these programs. These are the programmers who write menu-driven applications.
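A canned transaction of the kind described above can be sketched as a function that an application programmer writes once and a naive user invokes from a menu. The function name `raise_salary`, the 20 percent limit and the table layout are hypothetical, and Python's built-in sqlite3 module stands in for the DBMS.

```python
import sqlite3


def raise_salary(conn: sqlite3.Connection, emp_id: int, percent: float) -> float:
    """Hypothetical canned transaction: a validated, standardised salary update."""
    # Validity checks are programmed once, so menu users cannot bypass them.
    if not 0 < percent <= 20:
        raise ValueError("raise must be more than 0 and at most 20 percent")
    with conn:  # commits if the block succeeds, rolls back if it raises
        cur = conn.execute(
            "UPDATE Employee SET Emp_sal = ROUND(Emp_sal * (1 + ? / 100.0), 2) "
            "WHERE Emp_id = ?",
            (percent, emp_id),
        )
        if cur.rowcount != 1:
            raise LookupError(f"no employee with Emp_id {emp_id}")
    return conn.execute("SELECT Emp_sal FROM Employee WHERE Emp_id = ?", (emp_id,)).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Emp_id INTEGER PRIMARY KEY, Emp_name TEXT, Emp_sal NUMERIC)")
conn.execute("INSERT INTO Employee VALUES (101, 'Usha', 10000)")
conn.commit()

new_salary = raise_salary(conn, 101, 10)  # Usha's salary rises by 10 percent
print(new_salary)
```

Because the update and its checks are bundled into one transaction, an invalid request leaves the database unchanged instead of half-updated.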


1.8.5 DBMS system designers and implementers

System designers need a detailed report (design specification) so that they can follow the rules for developing and integrating the computer system and ensure that it satisfies the business requirements.
Implementers implement the DBMS modules and interfaces as a software package.
1.8.6 Tool developers
Tools are optional third-party software packages that are not supplied with the DBMS. They include packages for database design, performance monitoring and graphical interfaces. In many cases, independent software vendors develop and market these tools; they are called tool developers.
1.8.7 Operators and maintenance personnel
These are the system administration personnel that are responsible for the
actual running and maintenance of the hardware and software environment
for the DBMS.
Self-Assessment Questions
15. ___________________________ grants permission to the users of the
database.
16. _______________ and ________________ are the types of end users.
17. Implementers are responsible for testing, debugging and maintaining
the programs. (True/False)

1.9 Types of Database Systems


The three major types of database systems are as follows:
Analytic databases
Operational databases
Object-oriented databases

Analytic databases: Analytic databases are also called Online Analytical Processing (OLAP) databases. They are used when historical data must be stored and analysed. For example, if you want to compare a student's progress with his or her performance in previous years, you can use an analytic database. Similarly, you can store previous years' market data and use it to analyse the stock market.
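The following is a rough illustration using Python's standard sqlite3 module. It assumes a hypothetical result_history table filled with made-up sample marks. An analytic workload of this kind mostly reads historical rows and summarises them. Typical examples are one student's progress across years and the average score for each year.

```python
import sqlite3

# Hypothetical historical table: one row per student per year (sample data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE result_history (roll_no INTEGER, year INTEGER, percentage REAL)"
)
conn.executemany(
    "INSERT INTO result_history VALUES (?, ?, ?)",
    [(9, 2010, 62.0), (9, 2011, 70.5), (9, 2012, 78.0),
     (12, 2010, 81.0), (12, 2011, 79.0), (12, 2012, 85.5)],
)


def progress(conn, roll_no):
    """Year-by-year scores for one student and the change from first to last year."""
    rows = conn.execute(
        "SELECT year, percentage FROM result_history"
        " WHERE roll_no = ? ORDER BY year",
        (roll_no,),
    ).fetchall()
    change = rows[-1][1] - rows[0][1] if rows else 0.0
    return rows, change


def average_by_year(conn):
    """Aggregate over all students: a typical read-only analytic query."""
    return conn.execute(
        "SELECT year, AVG(percentage) FROM result_history"
        " GROUP BY year ORDER BY year"
    ).fetchall()
```

Here progress(conn, 9) reports a change of 16.0 percentage points between 2010 and 2012.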


Operational databases: Operational databases are also called Online Transaction Processing (OLTP) databases. They allow you to track information in real time. For example, a company can use an operational database to store its warehouse stock data. Whenever stock is used, the database is updated at once. The company therefore knows the current stock level every day and can place purchase orders on priority.
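The following Python sketch, again using sqlite3, shows the OLTP pattern described above under assumed names. It uses a hypothetical stock table with a reorder_level column; the item and the levels are sample values. A short transaction updates the quantity as soon as stock is issued and reports whether a priority purchase order is needed.

```python
import sqlite3

# Hypothetical warehouse table; item name and levels are sample values.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER NOT NULL,"
    " reorder_level INTEGER NOT NULL)"
)
conn.execute("INSERT INTO stock VALUES ('cement bags', 120, 50)")
conn.commit()


def issue_stock(conn, item, qty):
    """Short OLTP-style transaction: record stock usage the moment it happens."""
    with conn:  # committed immediately, or rolled back if an error is raised
        cur = conn.execute(
            "UPDATE stock SET qty = qty - ? WHERE item = ? AND qty >= ?",
            (qty, item, qty),
        )
        if cur.rowcount != 1:
            raise ValueError("unknown item or not enough stock")
        remaining, level = conn.execute(
            "SELECT qty, reorder_level FROM stock WHERE item = ?", (item,)
        ).fetchone()
    # True means the current level has reached the reorder level.
    return remaining, remaining <= level
```

Each call is a small, self-contained update. This is the kind of workload an operational database is built for, in contrast with the long read-only summaries of an analytic database.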

Object-oriented database management systems: Object-oriented database management systems (OODBMS) are used to store data of various newer data types, such as audio, video, graphics and images. An OODBMS can also store ordinary detailed information, such as name, address, phone number or any kind of numerical or statistical data.

1.10 Advantages of Database Management System


The following are the advantages of DBMS:

Logical organisation of data sets.

Redundancy is reduced.

Data located on a server can be shared by clients.

Integrity (accuracy) can be maintained.

Security features protect the data from unauthorised access.

Modern DBMSs support Internet-based applications.

In DBMS the application program and structure of data are independent.

Consistency of data is maintained.

DBMS supports multiple views. A DBMS has many users, each of whom may use it for a different purpose and may need to view and manipulate only a portion of the database, depending on his or her requirements.

Data flexibility is increased.

Interdependency of program and data is decreased.

Software development cost is reduced.

Database performance can be monitored.

Common data is shared among the application programs.
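Two of these advantages can be sketched in Python with the standard sqlite3 module: shared data with reduced redundancy, and support for multiple views. The employee table, its rows and both view names are hypothetical. One stored table serves two user groups, and each group sees only its own portion. A single update is seen consistently through both views.

```python
import sqlite3

# Hypothetical shared table: one stored copy of the data for all applications.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT,"
    " dept TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Asha", "Sales", 40000),
     (2, "Ravi", "Accounts", 52000),
     (3, "Meena", "Sales", 45000)],
)

# Two views over the same table: each user group sees only the portion it needs.
conn.execute(
    "CREATE VIEW phone_directory AS SELECT emp_id, name, dept FROM employee"
)
conn.execute(
    "CREATE VIEW sales_payroll AS"
    " SELECT name, salary FROM employee WHERE dept = 'Sales'"
)

# One update to the shared table is seen consistently through every view.
conn.execute("UPDATE employee SET salary = 47000 WHERE emp_id = 3")


def view_columns(conn, view_name):
    """Column names exposed by a view (view_name is assumed to be trusted)."""
    cur = conn.execute(f"SELECT * FROM {view_name}")
    return [d[0] for d in cur.description]
```

Because the salary is stored only once, no second copy can fall out of step with it.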


Self-Assessment Questions
18. Analytic databases are also called _____________________.
19. _________________ allows you to track real-time information.
20. ___________________ is helpful in storing media sources.

1.11 Summary
Let us recapitulate the important concepts discussed in this unit:
A database management system consists of a collection of interrelated data and a set of programs to access those data. It also stores a complete description of the database structures and constraints.
In file processing systems, information is stored as groups of records called files. These systems combine files with application programs that access those files. Such files are called flat files.
The database defines the field names and the format of the data, that is, whether the data is textual, binary or character data, and so on. It also defines the structure of the records, that is, whether a record is a pointer, fixed length or field order, and so on. Finally, it defines the structure of the files, that is, whether the file structure is indexed, sequential, and so on.
The DBMS acts as an intermediary between programs and the data: application programs access the DBMS, and the DBMS in turn accesses the data. Application programs are therefore independent of the file structures, so a change in the file structures does not require a change in the programs, and vice versa.
The most important property of a database is that it is a logical collection of data having some implicit meaning.
The important components of DBMS are the database engine, data dictionary, forms generator, report writer and query processor.
The various users of DBMS are the Database Administrator (DBA), Database Designer (DBD), end users, system analysts and application programmers, DBMS system designers and implementers, and tool developers.
The three major types of database systems are analytic databases, operational databases and object-oriented databases.
The major advantage of DBMS is the reduction of redundancy, which leads to an increase in consistency.

1.12 Glossary
Constructing: Process of storing the data on a storage medium that is controlled by the DBMS.
Defining: Specifies the data types, structures and constraints of the data to
be stored in the database.
Manipulating: Includes requests to retrieve the specific data in the
database, updating the database and generating reports from the retrieved
data.
Sharing: Simultaneous accessing of data by multiple users.

1.13 Terminal Questions


1. How do you differentiate between a file system and a database management system?
2. Explain a simple centralised database system.
3. What are the properties of DBMS?
4. What are the components of DBMS? Describe each with its use.
5. What are the types of database users?
6. List all the advantages of database management systems.

1.14 Answers
Self-Assessment Questions
1. (a) Tables
2. Records, fields
3. Database
4. File processing systems
5. (a) False
(b) False
(c) True
(d) True
(e) False
6. Programs, data
7. Defining the database
8. Database
9. True
10. False
11. Database engine
12. Data dictionary
13. False
14. Parsing, optimising, compiling
15. Database administrator
16. Casual users, naïve users
17. False
18. OLAP
19. OLTP
20. OODBMS

Terminal Questions
1. The file processing system is relatively cheaper than a database management system. In a file processing system, the programs and data are interdependent, whereas in a DBMS they are independent of each other. (Refer to Section 1.2 for further information.)
2. In a centralised database system, the database is stored in a central location. (Refer to Section 1.5 for further information.)
3. One of the important properties of database systems is that a database is a logical collection of data having some implicit meaning. If the data are not related, the collection is not a proper database. (Refer to Section 1.6 for further information.)
4. There are many components of DBMS. The most important are the database engine, data dictionary, report writer, forms generator and query processor. (Refer to Section 1.7 for further information.)
5. The different types of database users are the Database Administrator (DBA), Database Designers (DBD), end users, system analysts and application programmers, DBMS system designers and implementers, and tool developers. (Refer to Section 1.8 for further information.)
6. The most important advantage of DBMS is that it reduces redundancy, which increases consistency. (Refer to Section 1.10 for further information.)


1.15 Case Study


Case Study on the Electronic Storage of Data
Database management systems play a very important role in storing the records of an organisation. Using a computer-based DBMS helps overcome the vulnerability of files stored in racks to theft, loss, fire or decay. However, many organisations store their data both ways to avoid any kind of data loss.
Example 1: Consider a Library Management System. In this system, the list of resources is stored electronically, and a manual register is maintained as well. The database is used to obtain information about theft or loss of the library's resources. The manual register is maintained so that the data can be retrieved if the database fails. However, this has a cost impact on the organisation, since it bears the cost of both the database and the manual register, together with their maintenance.
Example 2: As a second example, consider a Hospital Management System. The details of patients, doctors, equipment and other services are maintained in interlinked tables. At the reception, however, paper cards and registers are still maintained manually. These registers are sometimes filled in by the patients or the employees and then entered into the database, which costs the organisation both time and money. The time the reception spends entering details in the register could be used for other constructive work, and the person who does this job must be paid extra.
Discussion Questions:
1. How can the problem of data loss be overcome? Is there any solution to this situation?
2. What are the properties of automation with respect to DBMS?
3. Compare conventional design with database design.
(Hint: Refer to Sections 1.3 and 1.6.)



Unit 2

Database Architecture

Structure:
2.1 Introduction
Objectives
2.2 Three-Schema Architecture
2.3 Conceptual Data Modelling
Relationships
Data independence
2.4 Database Languages and Interfaces
DBMS languages
DBMS interfaces
2.5 Summary
2.6 Glossary
2.7 Terminal Questions
2.8 Answers
2.9 Case Study

2.1 Introduction
In the previous unit, we studied the basic concepts of database management systems, including the working of a centralised system and the components of DBMS. We also discussed the properties and advantages of DBMS and the different users of DBMS. In this unit, we will study the three-tier architecture of the DBMS in detail. We will also discuss the different data models and relationships.
Database modelling is based on the three-tier architecture. As mentioned on the IJCA website, "A database model is a theory or specification describing how a database is structured and used." The different models are the hierarchical model, network model, relational model, and so on.
In this unit, you will also study database languages and interfaces. Database languages are needed to describe the data, the data structures and their operations, and they are the best tool for defining them. There are different database languages, such as the data definition language, data manipulation language and data control language. A very good example of a database language is Structured Query Language (SQL), which is discussed in detail in Unit 7. In this unit we will also discuss the components of DBMS.
Objectives:
After studying this unit, you should be able to:

elaborate the working of three-schema architecture

explain the conceptual modelling

explain the meaning of relationships

describe the database languages and interfaces

2.2 Three-Schema Architecture


According to the three-schema or three-tier architecture, a schema is a representation of real-world objects. In this view, data is represented at four levels: level 0 represents the real world, level 1 the entity relationship model, level 2 the relational model and level 3 the physical data structures. Before studying the working of the three-schema architecture in detail, you should understand data models, schemas and instances.

Data Models: A data model can be defined as a set of concepts for viewing data in a structured way. Data models make it easier for both professionals and non-technical users to understand the database system. They can also explain how the organisation uses and manages its information. In data models, the concepts of entity, attribute and relationship are very important. An entity is something that has a distinct, separate existence, though it need not have a material existence; for example, a student is an entity. An attribute is a characteristic or property that describes an entity, such as weight, size or colour. A relationship describes an association between two or more entities.

Schema: A schema can be defined as the description of the database. In other words, a schema defines the names, data types and sizes of the columns in each table, but not the actual data stored in the tables. The description of a database is called the database schema (or the metadata). It is specified during database design and is not changed frequently. For example, a STUDENT schema may consist of the columns Roll No., Name, Semester and Branch. A schema is differentiated into two parts, namely, the external schema and the internal schema.

Instances: An instance can be defined as the collection of data stored in the database at a particular moment. A database instance is also called a database state or snapshot. Instances change frequently as data is added, deleted and modified. For example, an instance for the schema given above is the record (9, Shubha Dixit, IV, Computer Science).
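A minimal Python sketch with the standard sqlite3 module makes the distinction concrete. It assumes a hypothetical student table built from the example columns above. The CREATE TABLE statement defines the schema, which stays the same, while the rows returned at any moment form the current instance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema: the description of the data, fixed at design time (hypothetical table).
conn.execute("""CREATE TABLE student (
    roll_no  INTEGER PRIMARY KEY,
    name     TEXT,
    semester TEXT,
    branch   TEXT)""")


def schema(conn):
    # PRAGMA table_info returns one row per column: (cid, name, type, notnull, default, pk).
    return [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]


def instance(conn):
    # Instance (state or snapshot): the rows stored at this moment.
    return conn.execute("SELECT * FROM student ORDER BY roll_no").fetchall()


before = instance(conn)  # empty state
conn.execute(
    "INSERT INTO student VALUES (9, 'Shubha Dixit', 'IV', 'Computer Science')"
)
after = instance(conn)   # new state; schema(conn) is unchanged
```

The insertion changes the instance from an empty state to one record, while schema(conn) returns the same four columns before and after.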

The three-schema architecture has three levels: an internal level, a conceptual level and an external level. It is also known as the ANSI/SPARC architecture. Dividing the architecture into three levels has the advantage that developers and users can each work at their own level. They do not need to know the details of the other levels, nor about changes made in them. Note that each of these schemas is only a description of data; the actual data exists only at the physical level. Figure 2.1 depicts the working of the three-schema architecture.

Fig. 2.1: Three-Schema Architecture


(Source: http://www.gitta.info/DBSysConcept/en/html/DBMSArchitec_learningObject1.html, retrieved on 29th May 2012)

The external schema: This is the outermost layer of the three-tier architecture and the one closest to the users. The data that individual users view through their applications is called external level data.
The logical schema: This forms the middle layer of the three-tier architecture. It hides the details of the physical storage structures and concentrates on describing entities. The logical schema is derived from the conceptual schema. This schema is independent of both software and hardware.
The internal schema: This schema describes the physical storage structure of the database. Operations performed at this level are translated into modifications of the contents and structure of the files. This level describes the complete details of the stored records and the access methods used to achieve efficient access to the data.
In the three-schema architecture, each application has its own relevant external schema. The external schema of a specific application is mapped to only that part of the logical schema which is relevant for the application. Therefore, a database has exactly one internal and one logical schema, but may have several external schemas for the several applications using it.
The aim of the three-schema architecture is to keep users away from the physical database. The data is actually stored at the internal level, and the other representations of the data are derived from this level when required. The DBMS has the task of realising the mapping between each of these levels.
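SQLite does not implement the three-schema architecture as separate, named schemas, so the following Python sketch is only an analogy. It uses the standard sqlite3 module with hypothetical table, index and view names. The base table plays the role of the logical schema, and an index stands in for an internal-level storage decision. Each view acts as one application's external schema, and the DBMS maps queries on a view to the stored table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: a single description of the stored data (hypothetical table).
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT,"
    " branch TEXT, fee_due INTEGER)"
)

# Internal level (by analogy): an access path chosen for efficiency.
conn.execute("CREATE INDEX idx_student_branch ON student(branch)")

# External level: each application gets its own view of the relevant part.
conn.execute("CREATE VIEW library_view AS SELECT roll_no, name FROM student")
conn.execute("CREATE VIEW accounts_view AS SELECT roll_no, fee_due FROM student")

conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(9, "Shubha Dixit", "Computer Science", 0),
     (10, "Arpita Mandal", "Electronics", 2500)],
)


def columns(conn, relation):
    """Columns an application sees through a relation (name assumed trusted)."""
    cur = conn.execute(f"SELECT * FROM {relation}")
    return [d[0] for d in cur.description]
```

The library application never sees fee_due, and the accounts application never sees name, yet both read the same stored rows.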
Self-Assessment Questions
1. Database modelling is based on ________________ architecture.
2. ________________ make it easier for professionals and non-technical users to understand a database management system.
3. Entity has distinct, separate existence. (True/False)
4. _____________ is a collection of data stored in the database at a
particular moment.

2.3 Conceptual Data Modelling


Conceptual data modelling identifies the highest-level relationships between the different entities. Highest-level means that the model does not contain detailed information about attributes and schema: a conceptual data model contains only the important entities and the relationships between them. This kind of modelling is used in the initial modelling phase.
To create a conceptual data model, data is gathered from various sources, such as business documents, business analysts, group discussions with the functional teams, database reports and end users.
Figure 2.2 shows a conceptual data model for a student database.
(Figure: the Student entity is linked to the Class, Subject and Teachers entities.)
Fig. 2.2: Representation of Conceptual Data Modelling

In this student database, three entities are related to the Student entity: Teachers, Class and Subject. The conceptual data model therefore shows only the information that describes the entities and the relationships between them. Detailed information about attributes, primary keys and so on is not shown.

Characteristics of conceptual data modelling:
It gives an overall view of the database structure and provides broad information on the data structure.
It is the first step in data modelling and clearly represents the business of the organisation.
It comprises the important entities and the relationships between them.
In conceptual modelling, no primary key is specified.
In conceptual modelling, no attribute is specified.
In conceptual data modelling, both technical and non-technical teams contribute their ideas, which helps build a strong data model in the next step.

2.3.1 Relationships
Relationships are the types of connectivity between two or more entities. For example, if student and class are two entities, then "student belongs to class" is the relationship between them; here, "belongs to" is the relationship.
There are many types of relationships. In this unit we are going to cover the
following three types of relationships:

One-to-one relationships

One-to-many and many-to-one relationships

Many-to-many relationships

One-to-one relationships: In a one-to-one relationship, an entity in one table is uniquely related to an entity in another table. For example, consider a CUSTOMER table and an ADDRESS table that holds one address for each customer. These two tables are in a one-to-one relationship with each other, as shown in Tables 2.1 and 2.2.
Table 2.1
CUSTOMER
customer_id   customer_name   address_id
201           Shubha B.G      301
202           Arpita Mandal   302
203           Priya Mehta     303

Table 2.2
ADDRESS
address_id   address
301          #4, Whitehouse Apartments, 1st cross, III main, RT Nagar, Bangalore 560008
302          #83, Prafulla Sarkar Street, Calcutta 700 001
303          #6, Old Airport Road, near Cammando Hospital, Domlur, Bangalore 560008

We now have two tables, CUSTOMER and ADDRESS. If each record in the ADDRESS table belongs to exactly one record in the CUSTOMER table, the relationship is one-to-one. Observe that an extra field called address_id has to be added to the CUSTOMER table. This field is called a foreign key: it refers to the primary key of another table, here ADDRESS.
The mapping between these tables in a one-to-one relationship is shown in Figure 2.3.

(Figure: customer 201 is linked to address 301, 202 to 302 and 203 to 303.)
Fig. 2.3: Example of a One-to-One Relationship
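The one-to-one relationship can be expressed in SQL DDL. The Python sketch below uses the standard sqlite3 module; the column names follow Tables 2.1 and 2.2, and the addresses are shortened. Declaring address_id UNIQUE is a design assumption that enforces the one-to-one rule; without it, two customers could share one address row. Note that SQLite checks foreign keys only after PRAGMA foreign_keys = ON has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE address (address_id INTEGER PRIMARY KEY, address TEXT)")
# UNIQUE on the foreign key stops two customers from sharing one address row,
# which keeps the relationship one-to-one rather than many-to-one.
conn.execute("""CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT,
    address_id    INTEGER UNIQUE REFERENCES address(address_id))""")
conn.executemany("INSERT INTO address VALUES (?, ?)",
                 [(301, "RT Nagar, Bangalore"), (302, "Calcutta"),
                  (303, "Domlur, Bangalore")])
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [(201, "Shubha B.G", 301), (202, "Arpita Mandal", 302),
                  (203, "Priya Mehta", 303)])


def address_of(conn, customer_id):
    """Follow the foreign key from CUSTOMER to its single ADDRESS row."""
    row = conn.execute("""SELECT a.address FROM customer c
                          JOIN address a ON a.address_id = c.address_id
                          WHERE c.customer_id = ?""", (customer_id,)).fetchone()
    return row[0] if row else None


def try_insert(conn, sql, params):
    """Return True if the insert succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

With these constraints, inserting a second customer with address 301 fails the UNIQUE check. Inserting a customer with a non-existent address_id fails the foreign key check.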

One-to-many and many-to-one relationships: In this kind of relationship, one entity in a table may be related to one or more entities in another table, while each of those entities is related to only one entity in the first table. Seen from the other side, the same relationship is many-to-one. For example, consider the CUSTOMER table in Table 2.1 and online transactions on an e-commerce website. We may observe that:
o One customer can place many orders.
o More than one item can be included in one order.
Considering these situations, we get one-to-many relationships. To record the orders, we need another table, the ORDER table, shown in Table 2.3.
Table 2.3
ORDER
Order_id   Customer_id   Number of items   Date of order   Amount (Rs.)
701        201                             3/06/2012       15,000
702        203                             4/06/2012       30,000
703        201                             4/06/2012       35,000

A customer may have no orders or may have one or many orders. However,
every order belongs to only one customer.
The mapping between these tables in a one-to-many relationship is shown in Figure 2.4.
(Figure: customer 201 is linked to orders 701 and 703, customer 203 to order 702, and customer 202 has no orders.)
Fig. 2.4: Example of a One-to-Many Relationship
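A minimal sketch of the one-to-many relationship follows, using Python's standard sqlite3 module and the values from Tables 2.1 and 2.3. The table is named orders here because ORDER is an SQL keyword; that name and the query function are assumptions for illustration. The foreign key sits on the "many" side, and a LEFT JOIN also reports customers who have no orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT)")
# ORDER is an SQL keyword, so the table is named orders in this sketch.
# The foreign key lives on the "many" side: each order names exactly one customer.
conn.execute("""CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    order_date  TEXT,
    amount      INTEGER)""")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(201, "Shubha B.G"), (202, "Arpita Mandal"), (203, "Priya Mehta")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                 [(701, 201, "3/06/2012", 15000),
                  (702, 203, "4/06/2012", 30000),
                  (703, 201, "4/06/2012", 35000)])


def orders_per_customer(conn):
    # LEFT JOIN keeps customers with zero orders (the "may have no orders" case).
    return conn.execute("""SELECT c.customer_id, COUNT(o.order_id)
                           FROM customer c
                           LEFT JOIN orders o ON o.customer_id = c.customer_id
                           GROUP BY c.customer_id
                           ORDER BY c.customer_id""").fetchall()
```

With the sample rows, orders_per_customer(conn) returns [(201, 2), (202, 0), (203, 1)].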

Many-to-many relationships: In this kind of relationship, an entity in one table may be related to many entities in another table, and an entity in the second table may in turn be related to many entities in the first. For example, every order can contain multiple items, and each item may appear in multiple orders. To represent this, we create one more table that relates the ORDER and ITEMS tables. The only purpose of this table is to implement the many-to-many relationship.
Taking the ORDER table in Table 2.3 and another table, ITEMS, shown in Table 2.4, we need to create an additional table called ITEMS_ORDER, shown in Table 2.5.
Table 2.4
ITEMS
Item_id   Item_name
300       Transcend: 16 GB pendrive
301       Kingston Blu-ray disc
302       Seagate hard disc drive: 300 GB
303       Sony DVD: 4.7 GB

Table 2.5
ITEMS_ORDER
Order_id   Item_id
701        300
702        300
702        301
703        302
703        303

The mapping between these tables in a many-to-many relationship is depicted in Figure 2.5.
(Figure: order 701 is linked to item 300, order 702 to items 300 and 301, and order 703 to items 302 and 303.)
Fig. 2.5: Example of Many-to-Many Relationships
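The junction-table approach can be sketched with Python's standard sqlite3 module, using the rows of Tables 2.4 and 2.5. The orders table name and the two query functions are assumptions for illustration. The composite primary key on items_order allows each (order, item) pair only once, and querying the junction table from either side answers "which items are in this order?" and "which orders contain this item?".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE items (item_id INTEGER PRIMARY KEY, item_name TEXT)")
# Junction table: its composite primary key allows each (order, item) pair once.
conn.execute("""CREATE TABLE items_order (
    order_id INTEGER REFERENCES orders(order_id),
    item_id  INTEGER REFERENCES items(item_id),
    PRIMARY KEY (order_id, item_id))""")
conn.executemany("INSERT INTO orders VALUES (?)", [(701,), (702,), (703,)])
conn.executemany("INSERT INTO items VALUES (?, ?)", [
    (300, "Transcend: 16 GB pendrive"), (301, "Kingston Blu-ray disc"),
    (302, "Seagate hard disc drive: 300 GB"), (303, "Sony DVD: 4.7 GB")])
conn.executemany("INSERT INTO items_order VALUES (?, ?)",
                 [(701, 300), (702, 300), (702, 301), (703, 302), (703, 303)])


def items_in(conn, order_id):
    """All items that appear in one order."""
    return [r[0] for r in conn.execute(
        "SELECT item_id FROM items_order WHERE order_id = ? ORDER BY item_id",
        (order_id,))]


def orders_containing(conn, item_id):
    """All orders that contain one item."""
    return [r[0] for r in conn.execute(
        "SELECT order_id FROM items_order WHERE item_id = ? ORDER BY order_id",
        (item_id,))]
```

For example, items_in(conn, 702) gives [300, 301] and orders_containing(conn, 300) gives [701, 702].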


2.3.2 Data independence


Data independence is defined as the ability to modify a schema definition in
one level without affecting a schema definition in a higher level.
There are two kinds of data independence:
Physical data independence
Logical data independence
Physical data independence: This is the ability to modify the physical schema without having to rewrite application programs. Modifications at this level are usually made to improve performance.
Logical data independence: This is the ability to modify the conceptual schema without having to rewrite application programs. Such modifications are usually made when the logical structure of the database is altered. Logical data independence is harder to achieve, because application programs usually depend heavily on the logical structure of the data. An analogy can be drawn with abstract data types in programming languages, which hide their internal representation from the code that uses them.
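Both kinds of data independence can be imitated in a short Python sketch with the standard sqlite3 module. The tables, the index and the application_query function are hypothetical. Adding an index is a physical change that leaves the query untouched. Moving branch names into a separate table is a logical change, and a view with the old table name keeps the old query working.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, branch TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(9, "Shubha Dixit", "Computer Science"),
                  (10, "Priya Mehta", "Electronics")])


def application_query(conn):
    """Hypothetical application code: it knows only the name 'student' and its columns."""
    return conn.execute(
        "SELECT name FROM student WHERE branch = 'Computer Science'").fetchall()


before = application_query(conn)

# Physical change: add an access path. The application query is untouched.
conn.execute("CREATE INDEX idx_branch ON student(branch)")
after_physical = application_query(conn)

# Logical change: move branch names into their own table, then keep the old
# shape available to applications through a view with the old name.
conn.executescript("""
    CREATE TABLE branches (branch_id INTEGER PRIMARY KEY, branch_name TEXT UNIQUE);
    INSERT INTO branches (branch_name) SELECT DISTINCT branch FROM student;
    CREATE TABLE student_new (roll_no INTEGER PRIMARY KEY, name TEXT,
                              branch_id INTEGER REFERENCES branches(branch_id));
    INSERT INTO student_new
        SELECT s.roll_no, s.name, b.branch_id
        FROM student s JOIN branches b ON b.branch_name = s.branch;
    DROP TABLE student;
    CREATE VIEW student AS
        SELECT n.roll_no, n.name, b.branch_name AS branch
        FROM student_new n JOIN branches b ON b.branch_id = n.branch_id;
""")
after_logical = application_query(conn)
```

In SQLite such a view is read-only unless INSTEAD OF triggers are added. This illustrates why logical data independence is harder to achieve than physical data independence.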
Self-Assessment Questions
5. State whether the following statements are true or false:
a) In conceptual data modelling, primary key is present.
b) Attribute is specified in conceptual data modelling.
6. ________________ are the type of connectivity between two or more
entities.
7. In ____________________ relationships, an entity of one table is uniquely related to an entity of another table.
8. _________________ and ________________ are the two types of
data independence.

2.4 Database Languages and Interfaces


So far we have discussed data structures, the description of data and the components of database management systems. In this section, we will study the various languages required to process this data. As a database supports a number of user groups, the DBMS must have languages and interfaces that support each of these groups.


2.4.1 DBMS languages

Languages are necessary for describing data and data structures. Two languages are used for the definition and manipulation of a database:
DDL: Data Definition Language
DML: Data Manipulation Language
DDL: This language is used by the Database Administrator (DBA) and database designers to define the conceptual and internal schemas.
o The DBMS has a DDL compiler that processes DDL statements in order to identify the schema constructs and to store the schema description in the catalogue.
o In databases where there is a separation between the conceptual and internal schemas, DDL is used to specify the conceptual schema, and SDL (Storage Definition Language) is used to specify the internal schema.
o For a true three-schema architecture, VDL (View Definition Language) is used to specify the user views and their mappings to the conceptual schema. In most DBMSs, however, the DDL is used to specify both the conceptual schema and the external schema.
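SQLite, used here through Python's standard sqlite3 module, has no separate SDL or VDL. The same SQL DDL creates tables, indexes and views, as the paragraph above notes for most DBMSs. SQLite keeps the stored definitions in its catalogue table, sqlite_master. The item table and the other object names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL statements: a table, an index (storage-related) and a view (user-related).
ddl = """
CREATE TABLE item (item_id INTEGER PRIMARY KEY, item_name TEXT NOT NULL);
CREATE INDEX idx_item_name ON item(item_name);
CREATE VIEW item_names AS SELECT item_name FROM item;
"""
conn.executescript(ddl)


def catalogue(conn):
    """The DBMS records each DDL definition in its catalogue (sqlite_master in SQLite)."""
    return conn.execute(
        "SELECT type, name FROM sqlite_master ORDER BY type, name").fetchall()
```

The catalogue can be queried like any other table, which is how tools and the DBMS itself look up schema information.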

DML: This is a family of computer languages used by computer programs or database users to retrieve, insert, delete and update data in a database.
o Currently, the most popular DML is that of SQL, which is used to retrieve and manipulate data in a relational database.
o Other forms of DML are those used by IMS/DL1, CODASYL databases (such as IDMS), and others.
o DMLs were initially used only by computer programs, but with the advent of SQL they have come to be used by non-programmers as well.
o The functional capability of a DML is organised by the initial word of a statement, which is almost always a verb. In the case of SQL, these verbs are SELECT, INSERT, UPDATE and DELETE.
o DMLs tend to differ in flavour and capability between database vendors.
o ANSI has established a standard for SQL, but vendors still go beyond the standard and provide their own extensions.
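The four SQL verbs can be seen in a short Python sketch using the standard sqlite3 module. The item table reuses item IDs from Table 2.4, but the short names and the prices are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; prices are sample values, not taken from the text.
conn.execute("CREATE TABLE item (item_id INTEGER PRIMARY KEY, item_name TEXT, price INTEGER)")

# INSERT: add rows
conn.executemany("INSERT INTO item VALUES (?, ?, ?)",
                 [(300, "pendrive", 600), (301, "Blu-ray disc", 150), (303, "DVD", 40)])
# UPDATE: change existing rows
conn.execute("UPDATE item SET price = price + 10 WHERE item_id = 303")
# DELETE: remove rows
conn.execute("DELETE FROM item WHERE item_id = 301")
# SELECT: retrieve rows
rows = conn.execute(
    "SELECT item_id, item_name, price FROM item ORDER BY item_id").fetchall()
```

After these statements, rows holds [(300, 'pendrive', 600), (303, 'DVD', 50)].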


The two main types of DML are as follows:
High-level/non-procedural: In a high-level language, the user specifies only what data is needed. Such a language is easier to use, but it may not generate code as efficient as that produced by procedural languages.
o It can be used on its own to specify complex database operations.
o DBMSs allow DML statements to be entered interactively from a terminal, or to be embedded in a programming language.
o If the commands are embedded in a general-purpose programming language, the statements must be identified so that they can be extracted by a pre-compiler and processed by the DBMS.
o High-level DMLs, such as SQL, can specify and retrieve many records in a single DML statement, and are therefore called set-at-a-time or set-oriented DMLs.
o High-level languages are often called declarative because the DML specifies what to retrieve rather than how to retrieve it.
Low-level/procedural: In a low-level language, the user specifies what data is needed and how to get it.
o It must be embedded in a general-purpose programming language.
o It typically retrieves individual records or objects from the database and processes each separately.
o Therefore, it needs to use programming language constructs such as loops.
o Low-level DMLs are also called record-at-a-time DMLs.
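The difference between set-at-a-time and record-at-a-time processing can be imitated in Python with the standard sqlite3 module. Both hypothetical functions total a customer's order amounts from Table 2.3. The first hands one declarative statement to the DBMS. The second fetches records one by one and uses a host-language loop, as a low-level DML would. This is only an imitation; real record-at-a-time DMLs, such as those of CODASYL systems, use navigational commands instead of SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(701, 201, 15000), (702, 203, 30000), (703, 201, 35000)])


def total_set_at_a_time(conn, customer_id):
    # Declarative: one statement says WHAT is wanted; the DBMS decides how.
    return conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer_id = ?",
        (customer_id,)).fetchone()[0]


def total_record_at_a_time(conn, customer_id):
    # Procedural imitation: fetch one record at a time and filter in a loop.
    cur = conn.execute("SELECT customer_id, amount FROM orders")
    total = 0
    while True:
        record = cur.fetchone()
        if record is None:
            break
        if record[0] == customer_id:
            total += record[1]
    return total
```

Both functions return 50000 for customer 201. Only the first lets the DBMS choose how to find the rows.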

2.4.2 DBMS interfaces


Types of interfaces provided by the DBMS include the following:
Menu-based interfaces for web clients or browsing

It presents users with a list of options (menus).
It leads the user through the formulation of a request.
The query is composed of selection options from menus displayed by the system.
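A menu-based interface can be sketched as a function that builds a query only from the options a user picks. The menus, the student table and its sample rows below are hypothetical, and Python's standard sqlite3 module is used. Column names come only from fixed menus, and the typed value is passed as a bound parameter, so the user never writes SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and sample rows.
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, semester TEXT, branch TEXT)"
)
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", [
    (9, "Shubha Dixit", "IV", "Computer Science"),
    (10, "Priya Mehta", "II", "Electronics"),
    (11, "Arpita Mandal", "IV", "Electronics")])

# Hypothetical menus: the user can only pick from these numbered options.
FIELD_MENU = {1: "name", 2: "roll_no"}
FILTER_MENU = {1: "branch", 2: "semester"}


def menu_query(conn, field_choice, filter_choice, value):
    """Compose a query from menu selections; the user never types SQL."""
    field = FIELD_MENU[field_choice]      # KeyError for an option not on the menu
    column = FILTER_MENU[filter_choice]
    sql = f"SELECT {field} FROM student WHERE {column} = ? ORDER BY roll_no"
    return [row[0] for row in conn.execute(sql, (value,))]
```

For example, choosing field 1 (name) and filter 1 (branch) with the value "Electronics" returns ["Priya Mehta", "Arpita Mandal"].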


Forms-based interfaces
A form is displayed to each user.
A user can fill out all of the form's entries to insert new data, or fill out only certain entries, in which case the DBMS retrieves matching data for the remaining entries.
Forms are designed and programmed for naïve users as interfaces to canned transactions.

Graphical user interfaces (GUIs)
A GUI displays a schema to the user in diagrammatic form. The user can specify a query by manipulating the diagram. GUIs use both forms and menus.

Natural language interfaces
They accept requests in written English or other languages and attempt to understand them.
Such an interface has its own schema and a dictionary of important words. It uses the schema and the dictionary to interpret a natural language request.

Interfaces for parametric users
Parametric users have a small set of operations that they perform repeatedly.
Analysts and programmers design and implement a special interface for each class of naïve users.
Often a small set of commands is included to minimise the number of keystrokes required (e.g. function keys).

Interfaces for the DBA
Systems contain privileged commands that only DBA staff can use.
These include commands for creating accounts, setting parameters, authorising accounts, changing the schema, reorganising the storage structures, and so on.

Self-Assessment Questions
9. ___________________ and _____________________ are the two
languages that are used for definition and manipulation of database.
10. What are the two types of DML?


2.5 Summary
Let us recapitulate the important concepts discussed in this unit:

The three-schema architecture has three levels: an internal level, a conceptual level and an external level. The three-schema architecture is also known as the ANSI/SPARC architecture.

Conceptual data modelling identifies the highest-level relationships between the different entities. Highest-level means that the model does not have detailed information about the attributes and schema.
Relationships are the types of connectivity between two or more entities. The different types of relationships are one-to-one relationships; one-to-many and many-to-one relationships; and many-to-many relationships.

Data and data structures are defined and manipulated using two DBMS languages, namely, Data Definition Language (DDL) and Data Manipulation Language (DML).

The different types of DBMS interfaces are menu-based interfaces for


web clients or browsing, forms-based interfaces, graphical user
interfaces, natural language interfaces, interfaces for parametric users
and interfaces for the DBA.

2.6 Glossary
ANSI: The American National Standards Institute (ANSI) is a private nonprofit organisation that oversees the development of voluntary consensus
standards for products, services, processes, systems and personnel in the
United States. The organisation also co-ordinates U.S. standards with
international standards so that American products can be used worldwide.
For example, standards ensure that people who own cameras can find the
film they need for that camera anywhere around the globe.
Client: A client is an application or system that accesses a service made
available by a server.
CODASYL: (Conference on Data Systems Languages) CODASYL is
remembered almost entirely for two activities: its work on the development
of the COBOL language and its activities in standardising database
interfaces. It also worked on a wide range of other topics, including end-user
form interfaces and operating-system control languages.

IDMS: Integrated Database Management System is primarily a network (CODASYL) database management system for mainframes. It was first developed at B.F. Goodrich and later marketed by Cullinane Database Systems (renamed Cullinet in 1983).
IMS/DL1: DL1 and IMS are IBM database systems: DL1 for VSE users and IMS for MVS users. Several DL1 files may be processed together, along with input and output of all other types of files.
Mapping: The process of converting a request from the external level, and its result, between the view levels is called mapping. The mapping defines the correspondence between the three view levels (external, conceptual and internal). The mapping description is also stored in the data dictionary. The DBMS is responsible for mapping between the three types of schemas.
Non-technical users: Non-technical users are people who are not required
to be technically savvy to use the product.
Server: A server is a physical computer dedicated to running one or more services that serve the needs of users of the other computers on the network.
SQL: SQL stands for Structured Query Language. SQL lets you access and
manipulate databases. SQL is an ANSI standard.
Statements: Statements are a set of instructions given to perform a specific
task.
Table: A table is a set of data elements or values that are organised using a
model of vertical columns and horizontal rows, the cell being the unit where
a row and column intersect.

2.7 Terminal Questions


1. Define entity, schema and instance.
2. Briefly explain the working of client-server architecture.
3. Differentiate between the different types of relationships with an
example.
4. What are DBMS languages? Explain the different types of DBMS
interfaces.


2.8 Answers
Self-Assessment Questions
1. Three-tier
2. Data models
3. True
4. Instance
5. (a) False
(b) False
6. Relationships
7. One-to-one
8. Physical and logical
9. Data Definition Language and Data Manipulation Language
10. High-level/non-procedural and low-level/procedural
Terminal Questions
1. An entity is something that has a distinct, separate existence, though it need not have a material existence. A schema is a description of the database. An instance is the collection of data stored in the database at a particular moment; a database instance is also called the database state or snapshot. (Refer to Section 2.2 for further information.)
2. The three-schema architecture has three levels: an internal level, a conceptual level and an external level. The three-schema architecture is also referred to as client-server architecture. An advantage of dividing the architecture into three levels is that developers and users can each work at their own level. They do not need to know the details of the other levels, nor do they need to know about changes made in the other levels. Note that each of these schemas is only a description of data; the actual data exists only at the physical level. (Refer to Section 2.2 for further information.)
3. In a one-to-one relationship, an entity of one entity set is uniquely related to an entity of another entity set. In one-to-many and many-to-one relationships, one entity of an entity set may be related to one or more entities of another entity set, and vice versa. In a many-to-many relationship, an entity of one entity set may be related to more than one entity of another entity set, and an entity in the second entity set may also be related to many entities in the first. That is, multiple instances exist on both sides of the relationship. (Refer to Section 2.3 for further information.)
4. DBMS languages are necessary for describing data and data structures. Two languages are used for defining and manipulating the database: DDL (Data Definition Language) and DML (Data Manipulation Language). The different types of DBMS interfaces are menu-based interfaces for web clients or browsing, forms-based interfaces, graphical user interfaces, natural language interfaces, interfaces for parametric users and interfaces for the DBA. (Refer to Section 2.4 for further information.)

2.9 Case Study

Client-Server Environment
A Korean securities firm migrates its ledger systems from third-party-managed mainframe systems to a new, internal client-server-based architecture, creating greater customer satisfaction and benefit.
Before the introduction of the client-server environment in the securities market, most securities companies kept their ledger systems on IBM mainframe systems managed by external agencies. With a third party managing the ledger, it was difficult to respond quickly to rapid changes in IT infrastructure or to the needs of online customers. So the firm built a new ledger transfer system based on client-server and replication technology. By building the new ledger system on a client-server architecture, it could support new customer-centric services and gain benefits such as total customer management and innovations in business processes.
After the new system was built, customers gained more benefits and satisfaction from the fast client-server system. This has led most securities companies to move their systems to a client-server environment. Building on the client-server system, these companies are now extending their systems to Internet trading.


Applications:
The goal of the Korean securities team's client-server project was to remove the inefficiency and rigidity caused by third-party mainframe ledger management and to build a customer-focused information system.
Benefits:

In customer services: Customers can carry out all transactions for services and financial products through just one bankbook. Transaction time has improved rapidly and dramatically.

For employees: By integrating diverse customer account information, the project gives customers total account service for services and financial products. It has also reduced transaction errors by 85% because the old complex process was simplified and the new accounting process is automatic. Closing time has been drastically shortened because tellers manage cash on hand in real time.

In management efficiency: The process enhancement has increased productivity and improved customer services, strengthening the firm's competitive position.

Importance:
The Korean team implemented a distributed computing environment for transaction processing through the client-server architecture and replication technology before competing companies could do so. This let them respond to customer requirements within a specified time. The replication technology they adopted was the most important element and the core of the new system: it enabled them to process a multitude of complicated transactions within a specified time, giving rapid response times and stability. According to the team, they were the first to introduce client-server architecture and a distributed computing environment in the securities market. Before they adopted replication technology, the industry used it only as a backup function; they used it as one of the functions of distributed computing. Since then, many companies have adopted it as a core technology for building their client-server architectures.
Success:
They exceeded their goals and targets. Before they built this system, their targets were 2 million accounts, 700 million transactions per day, 1 million orders processed per day and 1.8 million order inquiries per day. They now have 1.8 million accounts and process 850 million transactions per day, and 1.8 million order processes and inquiries per day. The client-server technology and replication function now operate across the entire business. They plan to extend their system to Customer Relationship Management (CRM) based on the Internet environment. According to the Korean team, before they built the client-server system they ranked fifth in the securities market, but based on the new client-server system they are now the leading company in Korea. Their mission will be to become a worldwide investment company.
Discussion Questions:
1. Why did the Korean team adopt a client-server environment over IBM mainframes for maintaining their ledger systems?
2. What are the success factors of the client-server environment?
Unit 3

Record Storage and File Structure Organisation

Structure:
3.1 Introduction
3.2 Memory Hierarchy
3.3 Secondary Storage Devices
Hard disk drive
DVD drive
Blu-ray disk drive
3.4 Buffering of Blocks
3.5 Placing File Records on Disk
3.6 Operations on Files
Files of unordered records (heap files)
Files of ordered records (sorted files)
Hashing techniques
3.7 Summary
3.8 Glossary
3.9 Terminal Questions
3.10 Answers
3.11 Case Study

3.1 Introduction
In Units 1 and 2, you studied the definition of database management systems and the core concepts of databases. You also studied database architecture in Unit 2. A database is stored on a storage medium, and the DBMS retrieves, updates and processes the stored data as needed. Computer storage media fall into two main categories: primary storage and secondary storage. Primary storage is volatile memory, whereas secondary storage is permanent, non-volatile memory.
The following are the reasons for storing databases on secondary storage:
Databases are too large to fit entirely in main memory.
Secondary storage devices are non-volatile, whereas main memory is volatile.
The cost of storage per unit of data is lower for disk than for primary storage.
Indexes are used to speed up the retrieval of records.
Indexes can be created using one or more columns, providing the basis for both rapid random lookups and efficient ordering of access to records.
The disk space required to store an index is typically less than that required for the table, since indexes usually contain only the key fields by which the table is to be arranged and exclude all the other details in the table.
An index file consists of two fields: the first field contains the key value, and the second contains a list of pointers to the disk blocks that hold records with that value.
Searching an index is much faster than searching the table because the index is sorted and its rows are very small.
An index access structure is usually defined on a single field of a file, called the indexing field.
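The two-field index described above (a key value plus a list of block pointers, kept sorted) can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how any particular DBMS stores its indexes; the employee names and block numbers are hypothetical. Because the keys are sorted, a lookup can use binary search (Python's bisect module) instead of scanning every record.

```python
import bisect


def build_index(records):
    """records: iterable of (key, block_address) pairs.
    Returns parallel lists: sorted keys and, for each key, its list of block pointers."""
    grouped = {}
    for key, block in records:
        grouped.setdefault(key, []).append(block)
    keys = sorted(grouped)
    return keys, [grouped[k] for k in keys]


def lookup(index, key):
    """Binary-search the sorted keys; return the block pointers for key, or [] if absent."""
    keys, pointer_lists = index
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return pointer_lists[i]
    return []


# Hypothetical data file: (NAME, block number where a matching record is stored)
idx = build_index([("Rao", 7), ("Asha", 2), ("Rao", 12), ("Kiran", 5)])
print(lookup(idx, "Rao"))   # [7, 12]
print(lookup(idx, "Zed"))   # []
```

The search costs about log2(n) key comparisons rather than n record reads, which is why searching a small, sorted index beats scanning the whole table.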

In this unit, you will study storage media and how data is stored in the database. You will also learn the difference between conventional file systems and database management systems.
Objectives:
After studying this unit, you should be able to:
describe memory hierarchy
list and explain secondary storage devices
explain buffering of blocks and placing file records on disk
elaborate on operations on files
differentiate between files of unordered records (heap files) and files of ordered records

3.2 Memory Hierarchy

The orderly arrangement of storage in the system architecture is called the memory hierarchy. The memory hierarchy has the following levels:

Processor register: Used for the fastest possible access, typically in 1 CPU cycle. It holds only hundreds of bytes.

Level 1 cache: Often accessed in just a few cycles; usually tens of kilobytes in size.

Level 2 cache: Its latency is about 2 to 10 times higher than that of level 1 cache, and its size is 512 KiB or more (KiB stands for kibibyte, or "kilo binary byte"; 1 KiB = 2^10 = 1,024 bytes, which is close to one kilobyte).

Main memory: Access can take hundreds of cycles; sizes run to multiple gigabytes.

Disk storage: Latency of millions of cycles, but very large capacity.

Tertiary storage: Latency of up to several seconds, but capacity can be huge.
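The kibibyte arithmetic above is easy to get wrong, so here is a tiny Python sketch of the conversion. The constant names are our own, chosen for illustration; the only facts used are 1 KiB = 2^10 bytes and 1 kB = 10^3 bytes.

```python
KIB = 2 ** 10   # 1 kibibyte (KiB) = 1,024 bytes
KB = 10 ** 3    # 1 kilobyte (kB, decimal SI prefix) = 1,000 bytes


def kib_to_bytes(n_kib):
    """Convert a size in kibibytes to bytes."""
    return n_kib * KIB


l2_cache_bytes = kib_to_bytes(512)  # the 512 KiB level 2 cache size mentioned above
print(KIB)                  # 1024
print(l2_cache_bytes)       # 524288
print(l2_cache_bytes / KB)  # 524.288 (decimal kilobytes)
```

The gap between the binary and decimal units is only 2.4% at this scale, which is why the text describes a kibibyte as "nearer to" a kilobyte.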

3.3 Secondary Storage Devices


Secondary storage devices include magnetic disks, optical disks, tapes and drums. They usually have larger capacity and lower cost, and provide slower access to data than primary storage devices. Data in secondary storage cannot be processed directly by the CPU; it must first be copied into primary storage.
A secondary storage device holds information permanently unless the data is deleted or overwritten, and it retains the information even when the computer's power is cut off. Examples include the hard disk drive, DVD drive and Blu-ray Disk (BD) drive. Figure 3.1 depicts the three different types of computer storage.

Fig. 3.1: Types of Computer Storage


(Source: http://www.computerhope.com)
Computer storage is divided into two main types:

Primary memory
Secondary memory
Primary storage: This includes storage media that can be operated on directly by the computer's central processing unit (CPU), such as the computer's main memory and the smaller but faster cache memories.
Secondary storage: Today's computers use it to store all your programs and personal data, even though it is slow compared to primary memory. The best example is the hard disk drive.
Although offline storage is considered to be secondary storage, we have classified it separately for better understanding: a hard drive is always connected to the computer, whereas offline storage is connected only when it is required. It is detachable from the computer and can be stored elsewhere.
In this section, we will study the characteristics of a few secondary storage media: the hard disk drive, DVD drive and BD drive.
3.3.1 Hard disk drive
The hard drive is also known as the hard disk drive or fixed disk drive. It is the main and largest storage device in the computer, and on Windows computers it is usually referred to as the C: drive. It holds all the important programs and applications of the computer. A hard drive is a non-volatile, random-access storage device for electronic data. Hard drives are similar to video tapes in that both record data magnetically; however, whereas a video tape stores data on a long, thin tape coated with magnetic material, a hard drive stores it on rotating platters whose surfaces are coated with magnetic material.
Construction
A hard drive uses magnetic read/write heads to read data from rotating discs. It consists of different parts that serve different functions. A hard disk contains one or more rigid, solid substrates called platters. Platters are made of aluminium because it is a light material. They are circular, and both sides are coated with a magnetic substance for reading and writing data. Two or more magnetic heads are positioned over the platters to read and write data. The platters rotate on a common axis, and the heads move along the platter's radius, which allows them to reach all parts of the surface.
The surface of each platter is divided so that every piece of data has a specific location. The surface is organised as a set of concentric circles used to record the data; each concentric circle on a platter is called a track, and tracks are further divided into sectors. When the head of one surface is on a given track, the heads of the other surfaces are on the corresponding tracks. The set of corresponding tracks on all the surfaces is called a cylinder. Sometimes the terms track and cylinder are used interchangeably. Figure 3.2 shows a typical platter assembly and its data organisation.
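Because all heads sit over corresponding tracks, a location on the disk can be named by a (cylinder, head, sector) triple. The sketch below shows the classic conversion of such a triple to a linear block number, and how capacity follows from the geometry. The geometry (1,024 cylinders, 16 heads, 63 sectors per track) is a hypothetical example, the 512-byte sector size is the traditional value the text gives, and modern drives expose linear block addresses directly, so treat this as an illustration of the idea rather than how a current drive is addressed.

```python
SECTOR_SIZE = 512  # bytes per sector (traditional value)


def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Map a (cylinder, head, sector) address to a linear block number.
    Cylinders and heads count from 0; sectors count from 1 (classic CHS convention)."""
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


def capacity_bytes(cylinders, heads, sectors_per_track, sector_size=SECTOR_SIZE):
    """Total capacity = cylinders x surfaces (heads) x sectors per track x bytes per sector."""
    return cylinders * heads * sectors_per_track * sector_size


# Hypothetical geometry: 1,024 cylinders, 16 heads, 63 sectors per track
print(chs_to_lba(0, 0, 1, 16, 63))   # 0    (first sector of the disk)
print(chs_to_lba(0, 1, 1, 16, 63))   # 63   (same cylinder, next surface)
print(chs_to_lba(1, 0, 1, 16, 63))   # 1008 (next cylinder starts after 16 x 63 sectors)
print(capacity_bytes(1024, 16, 63))  # 528482304
```

Numbering blocks cylinder by cylinder means consecutive blocks fill every surface of a cylinder before the heads have to move, which keeps seeks to a minimum.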

Fig. 3.2: Data Organisation on a Hard Disk Drive

A platter contains thousands of tracks. Tracks are further divided into smaller segments called sectors. Each sector holds 512 bytes of data, along with error-checking and housekeeping data used to identify the sector and track and to store the Cyclic Redundancy Check (CRC) result. A CRC, also known as a polynomial code checksum, is a function designed to detect accidental changes to computer data. Because its components are extremely small, a hard disk must be manufactured to very high precision. The main part of the hard disk is sealed off from the external air so that no dust can reach the platters and damage the read/write heads.
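The role of the CRC can be illustrated with Python's zlib.crc32, which computes the standard CRC-32. Real drives use their own error-detection and correction codes in firmware, so CRC-32 here is only a stand-in for the idea; the sector contents are made up. A checksum is stored when the sector is written, then recomputed on read; a mismatch signals that the data changed.

```python
import zlib

sector_data = bytes(range(256)) * 2   # 512 bytes of sample sector data (hypothetical)
stored_crc = zlib.crc32(sector_data)  # computed and stored when the sector is written


def sector_is_intact(data, expected_crc):
    """Recompute the CRC on read and compare it with the stored value."""
    return zlib.crc32(data) == expected_crc


# Flip a single bit to simulate an accidental change on the media
corrupted = bytearray(sector_data)
corrupted[100] ^= 0x01

print(sector_is_intact(sector_data, stored_crc))       # True
print(sector_is_intact(bytes(corrupted), stored_crc))  # False
```

CRC-32 is guaranteed to catch any single-bit error like the one simulated here; it detects, but does not correct, the damage.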
Data density characteristics: The capacity of a hard drive depends on how densely information can be packed onto its platters. This areal density of the media, the maximum capacity per unit area, is measured in megabytes per square inch (MBSI). The following factors affect areal density:

The size of the magnetic particles is a barrier to areal density. Areal density is higher if the coercivity of the media is large, and a tighter magnetisation field with a smaller read/write head allows higher areal density.

The height of the read/write head above the platter surface (also called head height) affects density. If the head passes closer to the platter, areal density can be higher; if it passes farther from the media, areal density is reduced because the magnetic field spreads out.

Another major limiting factor is surface smoothness, because a smoother surface allows the read/write head to fly closer to the media.

Latency and seek: Latency is the time delay between the moment a read/write command is initiated over the physical interface of the drive and the moment the desired information is placed. Latency also refers to the time taken for the needed byte to pass under a read/write head. If the read/write head has not quite reached the desired location, latency will be short. If the head has just missed the desired location, it must wait for almost one full rotation, so latency can be comparatively long. Seek time is the time taken to step the read/write heads from one track to another, and it is another delay that adds to hard drive access time. Seek time is commonly specified in several ways, such as track-to-track seek, full-stroke seek and average seek:

Track-to-track seek is the time required to step between two adjacent tracks on the platter.

Full stroke is the time required to step from the innermost to the outermost track. This time is relatively long.

The average seek time is roughly half the full-stroke seek time.

Seek time and latency together determine the time needed to load and save files. For example, when a file is loaded, some seek time is spent locating the track that contains the start of the file, followed by some latency while the platter rotates the necessary sector under the head.
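The timing relationships above can be turned into a back-of-the-envelope calculation. This sketch assumes, as the text does, that average seek is about half the full-stroke seek, and it takes average rotational latency as half a revolution (the worst case, just missing the sector, is a full revolution). The 7,200 rpm spindle speed and 18 ms full-stroke seek are hypothetical figures, not the specification of any particular drive.

```python
def rotation_time_ms(rpm):
    """Time for one full revolution of the platter, in milliseconds."""
    return 60_000 / rpm


def rotational_latency_ms(rpm):
    """Average rotational latency: the sector is, on average, half a revolution away."""
    return rotation_time_ms(rpm) / 2


def average_seek_ms(full_stroke_ms):
    """Rule of thumb from the text: average seek is roughly half the full-stroke seek."""
    return full_stroke_ms / 2


def average_access_time_ms(full_stroke_ms, rpm):
    """Average time to reach a random sector: average seek plus average rotational latency."""
    return average_seek_ms(full_stroke_ms) + rotational_latency_ms(rpm)


# Hypothetical drive: 7,200 rpm spindle, 18 ms full-stroke seek
print(round(rotation_time_ms(7200), 2))            # 8.33
print(round(rotational_latency_ms(7200), 2))       # 4.17
print(round(average_access_time_ms(18, 7200), 2))  # 13.17
```

The result is on the order of milliseconds, millions of CPU cycles, which matches the disk storage entry in the memory hierarchy.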
The major parts of the hard disk are the frame, platters, read/write heads,
head actuators, spindle motors and drive electronics.

Frame: The frame, also called the chassis, is an important part of the hard drive. It affects the structural, thermal and electrical integrity of the drive. The frame must be strong and provide a steady platform on which to mount the other components. Therefore, larger drives use a cast aluminium chassis, while the smaller drives in laptop computers use a plastic chassis.

Read/write heads: Read/write heads form the interface between the electronic circuitry and the magnetic media of the hard drive. During writing, the head translates electronic signals into magnetic flux transitions, saturating points on the media where each transition takes place. Reading works almost in reverse: flux transitions induce electrical signals in the head, which are amplified, filtered and translated into the corresponding logic signals.

Head actuators: Hard drives use voice coil motors, also called rotary coil motors, to actuate head movement. Voice coil motors work on the same principle as analog meter movements: a permanent magnet is enclosed within two opposing coils. When current flows in the coils, it produces a magnetic field that opposes the permanent magnet. The head arms are attached to the rotating magnet, so this opposing force causes a deflection that is directly proportional to the amount of driving current. Increasing the current signal produces greater opposition and deflection. A particular cylinder is selected by adjusting the servo signal and holding it at the desired level. Voice coil motors are very small, light assemblies that are well suited to fast access times and small hard drive assemblies. The process of track following is called servoing the heads.

Spindle motors: The speed at which the media passes under the read/write heads is one of the major factors in a drive's performance. The media is passed under the read/write heads by spinning the platters at high speed. The spindle motor, a brushless, low-profile DC (Direct Current) motor, is responsible for spinning the platters. An index sensor provides feedback pulses as the spindle rotates, and the drive's control electronics use these index signals to regulate spindle speed as precisely as possible.

Drive electronics: Hard drives contain sophisticated circuitry. The drive electronics board, mounted below the chassis, contains all the circuitry needed to exchange data and control signals with the particular physical interface, drive the read/write heads and spin the platters.

RAID: RAID stands for Redundant Array of Inexpensive Disks. To balance the I/O load, database files must be distributed evenly across all the disk drives, and managing a large number of individual disk drives this way is cumbersome. RAID was introduced to simplify this task.

A RAID system can be configured in different ways depending on your needs. The configurations differ mainly in their fault tolerance and are known as RAID levels.
Even though these levels work differently, they serve the same purpose: creating one logical disk drive out of two or more physical disks. The disk drives are configured as an array to provide the desired performance and fault tolerance properties.
The RAID array is classified into the following levels:
RAID 0
RAID 1
RAID 0 + 1
RAID 5
1. RAID 0: Although this level provides no redundancy, it is still counted as a RAID level. It is also known as striping and is the most basic of all RAID levels. RAID 0 splits each request to the logical volume into requests to the physical disks. These requests are issued one after another but are executed simultaneously. Performance is at its maximum at this level because there is no redundancy overhead. However, because all the disk space is used for data and none for redundancy, the failure of a single disk drive loses all the data. So, there is no fault tolerance.
2. RAID 1: This level is also known as mirroring. RAID 1 duplicates each disk entirely, so the failure of a single disk drive does not lose any data. However, it is expensive, since it requires twice as many disks for the same storage capacity.
3. RAID 0 + 1: This level combines RAID 0 and RAID 1. In RAID 0 + 1, disk drives are mirrored and then striped. Hence, it offers both the large disk space and performance of striping and the protection of mirroring.
4. RAID 5: This level uses a parity check method for fault tolerance. Instead of doubling the number of disk drives, it needs only one extra drive's worth of capacity to store parity information, which RAID 5 distributes across all the drives.
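The two core ideas behind RAID 0 and RAID 5 can be sketched in Python: striping maps consecutive logical blocks round-robin across the disks, and parity is the byte-wise XOR of the data blocks in a stripe, so any one lost block can be rebuilt by XOR-ing everything that survives. This is a simplified model under stated assumptions: one block per stripe unit, tiny two-byte blocks and made-up data. It shows only the arithmetic; a real RAID 5 array also rotates which disk holds the parity for each stripe.

```python
from functools import reduce


def stripe_location(logical_block, n_disks):
    """RAID 0 striping (one block per stripe unit, round-robin):
    logical block -> (disk index, block offset on that disk)."""
    return logical_block % n_disks, logical_block // n_disks


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_block(data_blocks):
    """Parity for one stripe: XOR of all its data blocks."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks, parity):
    """Recover a single lost block: XOR of the surviving data blocks and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


print(stripe_location(5, 4))  # (1, 1): block 5 lands on disk 1, second block there

stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]  # hypothetical data on 3 disks
p = parity_block(stripe)
print(p.hex())  # ee22

# Simulate losing disk 1 and rebuilding its block from the others plus parity
print(rebuild_missing([stripe[0], stripe[2]], p) == stripe[1])  # True
```

XOR works because x ^ x = 0: XOR-ing the parity with every surviving block cancels them out and leaves exactly the missing block.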
Table 3.1 compares the different levels of RAID.

RAID level    | Read performance | Write performance                         | Fault tolerance | Cost
RAID 0        | Good             | Good                                      | None            | Low
RAID 1 & 0+1  | Good             | Ok (1 logical write = 2 physical I/Os)    | Excellent       | Highest
RAID 5        | Good             | Poor (1 logical write = 4 physical I/Os)  | Ok              | Best for fast tolerance

RAID is mainly used to improve reliability and performance in large organisations.
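The write-performance column of Table 3.1 can be turned into a quick estimate of how much physical disk work a workload generates at each RAID level. This sketch takes the per-write costs from the table (2 physical I/Os for RAID 1 and 0+1, 4 for RAID 5); the cost of 1 for RAID 0, the one-physical-I/O-per-read assumption and the workload numbers are our own simplifications for illustration.

```python
# Physical I/Os per logical write: RAID 1, 0+1 and 5 values from Table 3.1;
# RAID 0 is assumed to need one physical write per logical write.
WRITE_PENALTY = {"RAID 0": 1, "RAID 1": 2, "RAID 0+1": 2, "RAID 5": 4}


def physical_ios(level, logical_reads, logical_writes):
    """Estimate physical I/Os for a workload, assuming each logical read costs one physical I/O."""
    return logical_reads + logical_writes * WRITE_PENALTY[level]


# Hypothetical workload: 1,000 logical reads and 500 logical writes
for level in ("RAID 0", "RAID 1", "RAID 5"):
    print(level, physical_ios(level, 1000, 500))
# RAID 0 1500
# RAID 1 2000
# RAID 5 3000
```

For read-heavy workloads the levels look similar, but as the share of writes grows, RAID 5's four-I/O write penalty dominates, which is why the table rates its write performance as poor.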
3.3.2 DVD drive
The Compact Disc (CD) opened up a new era for the PC. However, because a CD could hold only 650 MB of computer programs and data, or about 1 hour of music, it became an outdated storage medium for multimedia applications, large databases and interactive games.
DVD is a high-density storage medium that is now widely used in computers. DVD stands for Digital Versatile Disk; it can hold programs, data, audio and video, and can provide up to 17 GB of external storage for your computer.
Access time: The time the drive needs to locate the required information on the disk is called access time. These drives are comparatively slow and can take up to hundreds of milliseconds (ms) to access information.
Data transfer rate: This is the speed at which data is read from the disk. Once data has been accessed on the disk, it has to be transferred from the disk to the system. There are two ways of measuring data transfer rates:
Speed at which data is read into the drive's onboard buffer.
Speed at which data is transferred across the interface to the drive controller.
The characteristics of the hard disk drive are as follows:

Storage capacity
Hard disk drive capacity is measured in bytes. Modern drive capacities range from gigabytes to terabytes or more. Capacity is a function of the number of platters, or disks, installed in the drive and the density of the magnetic storage on those platters.

Access speed
The hard disk drive is an electro-mechanical device. Data is read by a head positioned over the surface of the disk. Access speed is the combination of how fast the head moves and how quickly the platter rotates under the head.

Form factor
Compared with earlier hard drives, modern hard drives are compact and come in three physical formats: 3.5-inch, 2.5-inch and 1.8-inch. A smaller physical size limits the number of platters and their diameter. For example, a 1.8-inch drive has a maximum capacity of 320 gigabytes.

Interface
The electronic connection between the hard drive and the processor has changed several times over the years. Each change has improved the data transfer speed and made it easier for the motherboard to handle the hard drive. The current standard interface is SATA (Serial Advanced Technology Attachment).

3.3.3 Blu-ray Disk drive

Blu-ray Disk (BD) is an advanced successor to DVD that uses smaller pits and lands. A single-layer BD can store more than five times the capacity of a DVD (about 25 billion bytes), and a double-layer BD can store up to 50 billion bytes. It is so named because it uses blue-violet laser light to read and write data on the disk.
BD uses a 0.1 mm (millimetre) cover layer, which brings the data closer to the lens and so allows higher density. Reading the smaller pits requires a blue laser with a wavelength of 405 nm (nanometres).
BD uses a technology called High Definition Movie (HDMV). HDMV provides high-definition graphics planes, animated and pop-up menu buttons, and sound effects when menu buttons are selected.
BD has the following features:
It can record High-Definition Television (HDTV) without loss in the
quality.
It can instantly skip to any spot on the disc.
We can simultaneously record one program while we are watching
another on the disc.
We can create playlists.
We can edit or reorder programs recorded on the disk.
It helps us to search the empty space on the disk automatically and
avoid recording over a program.
It allows us to access the Internet to download subtitles and other extra
features.
Self-Assessment Questions
1. ________________ memory can take hundreds of cycles in multiple
gigabytes.
2. ______________ is referred to as the kilo binary byte.
3. How many bytes make up 1 kibibyte?
4. State whether the following statements are true or false:
a) Offline storage is also a part of primary storage.
b) Data in the secondary storage cannot be processed by CPU
directly.
c) Hard disk drive is an example of primary storage.
5. _____________ and ___________ are the types of computer storage.
6. The concentric circle on a platter is called ______________.

7. What is the time delay between the moment a read/write command is initiated over the physical interface of the drive and the moment the desired information is placed?
8. Which of the following is the time required to step from innermost to the
outermost tracks?
a) Track-to-track
b) Full stroke
c) Seek
d) Latency
9. RAID stands for ____________________________________.
10. BD stands for ______________________________.

3.4 Buffering of Blocks


In the previous sections, you studied the features of various secondary storage devices. Before data can be processed, the blocks containing it must be copied into buffers in main memory. When several blocks need to be transferred from disk to main memory, several buffers can be reserved in main memory to speed up the transfer. While one buffer is being read or written with the help of the input/output processor, the CPU can process data in another buffer.
Figure 3.3 illustrates how two processes can proceed in parallel.

Fig. 3.3: Interleaved Concurrency versus Parallel Execution

Processes A and B run concurrently in an interleaved fashion.


Buffering is most useful when the I/O processor and the CPU can work
simultaneously, that is, in parallel. Once a block has been transferred from
secondary storage to main memory, the CPU can start processing it while the
I/O processor reads the next block into a different buffer. This technique is
called double buffering. Double buffering permits continuous reading or
writing of data on consecutive disk blocks, which eliminates the seek time
and rotational delay for all but the first block transfer. Moreover, data is
kept ready for processing, which reduces the waiting time in programs.
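The following Python sketch simulates double buffering with two in-memory buffers. A background thread plays the role of the I/O processor and fills whichever buffer is free, while the main thread (the CPU) processes the other one. The function name `double_buffered_read` and the use of Python threads and queues are illustrative assumptions. A real DBMS performs the transfers through the disk controller, not through application threads.

```python
import queue
import threading


def double_buffered_read(blocks, process):
    """Process disk blocks with two buffers: one is filled while the other is processed."""
    buffers = [None, None]      # the two main memory buffers
    free = queue.Queue()        # ids of buffers the I/O side may fill
    full = queue.Queue()        # ids of buffers ready for the CPU side
    for buf_id in (0, 1):
        free.put(buf_id)

    def io_processor():
        # Simulated I/O processor: copy each block into whichever buffer is free.
        for block in blocks:
            buf_id = free.get()             # wait until a buffer has been released
            buffers[buf_id] = block         # "transfer" the block into the buffer
            full.put(buf_id)
        full.put(None)                      # signal end of file

    reader = threading.Thread(target=io_processor)
    reader.start()

    results = []
    while True:                             # simulated CPU
        buf_id = full.get()
        if buf_id is None:
            break
        results.append(process(buffers[buf_id]))
        free.put(buf_id)                    # hand the buffer back for the next block
    reader.join()
    return results
```

For example, `double_buffered_read([b"a", b"bb", b"ccc"], len)` returns `[1, 2, 3]`. The blocks are processed in file order, and no more than two buffers are ever in use.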
Self-Assessment Questions
11. While one buffer is being read or written with the help of
__________________, the CPU can process data in the other buffer.
12. Double buffering permits continuous reading or writing of data on
consecutive disk blocks. (True/False)

3.5 Placing File Records on Disk


Before learning how records are placed on disk, you need to understand
record types.
Record types
Data is usually stored in the form of records. Each record consists of a
collection of related data values. Records usually describe entities and their
attributes. For example, an EMPLOYEE record represents an employee
entity, and each field value in the record specifies some attribute of that
employee, such as NAME, BIRTH DATE, SALARY or SUPERVISOR. A
collection of field names and their corresponding data types constitutes a
record type or record format definition. You can classify record types as
fixed length records and variable length records. A file is a sequence of
records.
Fixed length records
All records in a file are of the same record type. If every record in the file
has exactly the same size (in bytes), the file is said to be made up of fixed
length records.
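To illustrate a fixed length record type, the Python sketch below defines a hypothetical EMPLOYEE record format with the standard `struct` module. The field widths are assumptions chosen for the example. Every record occupies exactly the same number of bytes, so the position of any record in the file can be computed directly from its number.

```python
import struct

# Hypothetical EMPLOYEE record format (field widths are assumptions):
#   NAME        30-byte string
#   BIRTH_DATE  8-byte string, YYYYMMDD
#   SALARY      4-byte signed integer
#   SUPERVISOR  4-byte signed integer (employee number)
# "<" means little-endian with no padding between fields, so the size is 30 + 8 + 4 + 4.
EMPLOYEE = struct.Struct("<30s8sii")


def pack_employee(name, birth_date, salary, supervisor):
    # "30s" pads shorter names with zero bytes (and truncates longer ones),
    # so every packed record is exactly EMPLOYEE.size bytes long.
    return EMPLOYEE.pack(name.encode("ascii"), birth_date.encode("ascii"), salary, supervisor)


def unpack_employee(raw):
    name, birth_date, salary, supervisor = EMPLOYEE.unpack(raw)
    return name.rstrip(b"\x00").decode("ascii"), birth_date.decode("ascii"), salary, supervisor


def record_offset(i):
    # With fixed length records, record i starts at a position computable from i alone.
    return i * EMPLOYEE.size
```

Here `EMPLOYEE.size` is 46 bytes, so the fourth record (i = 3) starts at byte offset 138.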

Variable length records


If different records in the file have different sizes, the file is said to be made
up of variable length records. A variable length field is a field for which
only a maximum allowed length is specified. When the actual length of the
value is less than the maximum length, the field takes only the space it
requires. In the case of fixed length fields, even if the actual value is
shorter than the specified length, the remaining length is filled with spaces
or null values.
A file may have variable length records for several reasons, some of which
are given below:

Records having variable length fields: The file records are of the
same record type, but one or more of the fields are of varying size. For
example, the NAME field of EMPLOYEE can be a variable length field.

Records having repeating fields: The file records are of the same
record type, but one or more of the fields may have multiple values for
individual records. A group of values for such a field is called a repeating
group. For example, if each BOOK record stores all of its authors, the
record length varies depending on the number of authors.

Records having optional fields: The file records are of the same
record type, but one or more of the fields are optional; that is, some of
the fields will not have values in all the records. For example, if a record
has 25 fields of which 10 are optional, reserving space for all 25 fields in
every record wastes storage. Therefore, only the values that are actually
present in each record are stored.
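One common way to store variable length and optional fields is to write each field that is present as a small header followed by exactly the bytes of its value. Absent fields are not written at all. The Python sketch below assumes a hypothetical five-field EMPLOYEE schema and uses a 1-byte field identifier plus a 2-byte length per field. Separator characters or a presence bitmap would be equally valid alternatives.

```python
import struct

# Hypothetical EMPLOYEE schema; the position in this list is the field identifier.
FIELDS = ["NAME", "BIRTH_DATE", "SALARY", "SUPERVISOR", "PHONE"]
HEADER = struct.Struct("<BH")   # 1-byte field id + 2-byte value length = 3 bytes


def encode_record(record):
    """Store only the fields present in `record`, each as <id, length, value bytes>."""
    out = bytearray()
    for field_id, name in enumerate(FIELDS):
        value = record.get(name)
        if value is None:                       # optional field with no value: nothing stored
            continue
        data = str(value).encode("utf-8")       # variable length: only the bytes needed
        out += HEADER.pack(field_id, len(data))
        out += data
    return bytes(out)


def decode_record(raw):
    """Rebuild a dict of the fields that were stored (values come back as strings)."""
    record, pos = {}, 0
    while pos < len(raw):
        field_id, length = HEADER.unpack_from(raw, pos)
        pos += HEADER.size
        record[FIELDS[field_id]] = raw[pos:pos + length].decode("utf-8")
        pos += length
    return record
```

For example, `encode_record({"NAME": "Priya", "SALARY": 30000})` produces 16 bytes: two 3-byte headers plus 5 bytes for each value. The three absent optional fields take no space at all. In this sketch every value is decoded back as a string.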

Record blocking and spanned versus unspanned records: The
records of a file must be allocated to disk blocks. If the block size is
larger than the record size, each block will contain numerous records.
Some files may have unusually large record sizes that cannot fit in one
block. Suppose that the block size is B bytes. For a file of fixed length
records of size R bytes, with $B \ge R$, we can fit $bfr = \lfloor B/R \rfloor$
records per block. The value bfr is called the blocking factor for the file.
In general, R may not divide B exactly, so we have some unused space in
each block equal to $B - (bfr \times R)$ bytes.

To utilise this unused space, we can store part of a record in one block
and the rest in another block. A pointer at the end of the first block points to
the block containing the remainder of the record. This organisation is called
spanned, because records can span more than one block. Whenever a
record is larger than a block, we must use a spanned organisation, shown in
Figure 3.4(b). If records are not allowed to cross block boundaries, the
organisation is called unspanned, shown in Figure 3.4(a). This is used with
fixed length records having $B > R$, because it makes each record start at a
known location in the block.
We can use bfr to calculate the number of blocks b needed for a file of r
records:
$b = \lceil r / bfr \rceil$ blocks.
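The two formulas can be checked with a few lines of Python. The function names are illustrative. The code assumes fixed length, unspanned records and uses integer arithmetic so that the floor and ceiling are exact.

```python
def blocking_factor(B, R):
    """bfr = floor(B / R) for fixed length, unspanned records with B >= R."""
    if R > B:
        raise ValueError("record larger than block: a spanned organisation is needed")
    return B // R


def unused_bytes_per_block(B, R):
    """Space left over in each block: B - (bfr * R)."""
    return B - blocking_factor(B, R) * R


def blocks_needed(r, B, R):
    """b = ceil(r / bfr), computed with integer arithmetic."""
    bfr = blocking_factor(B, R)
    return (r + bfr - 1) // bfr
```

For instance, take B = 512 and R = 100. `blocking_factor(512, 100)` is 5, each block wastes `unused_bytes_per_block(512, 100)` = 12 bytes, and a file of 1,001 records needs `blocks_needed(1001, 512, 100)` = 201 blocks.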

Fig. 3.4: Types of Record Organisation: (a) Unspanned and (b)Spanned

Allocating File Blocks on Disk


There are several standard techniques for allocating the blocks of a file on
disk. In contiguous (sequential) allocation, the file blocks are allocated to
consecutive disk blocks. This makes reading the whole file very fast when
double buffering is used, but it makes expanding the file difficult. In linked
allocation, each file block contains a pointer to the next file block. This
makes the file easy to expand but slow to read in full. A combination of the
two allocates clusters of consecutive disk blocks, and the clusters are linked
together. Clusters are sometimes called segments or extents.
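The difference between the two basic techniques can be sketched in Python by modelling the disk as a dictionary that maps block numbers to (data, next-block pointer) pairs. This disk model and all three function names are assumptions made for illustration.

```python
# Hypothetical disk model: block number -> (data, number of the next file block or None).

def read_contiguous(disk, first_block, n_blocks):
    """Contiguous allocation: the file occupies n_blocks consecutive disk blocks."""
    return [disk[b][0] for b in range(first_block, first_block + n_blocks)]


def read_linked(disk, first_block):
    """Linked allocation: follow the pointer stored in each block to the next one."""
    data, current = [], first_block
    while current is not None:
        block_data, next_block = disk[current]
        data.append(block_data)
        current = next_block
    return data


def append_linked(disk, last_block, free_block, block_data):
    """Expanding a linked file only needs a free block anywhere and one pointer update."""
    old_data, _ = disk[last_block]
    disk[last_block] = (old_data, free_block)
    disk[free_block] = (block_data, None)
```

In the contiguous case the next-block pointers are simply unused. In the linked case the blocks may be scattered anywhere on the disk.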
File headers: A file header or file descriptor contains information about a
file that is needed by the system programs that access the file records. It
includes information to determine the disk addresses of the file blocks. It
also includes record format descriptions. For fixed length unspanned
records, these may give the field lengths and the order of fields within a
record. For variable length records, they may give field type codes,
separator characters and record type codes.

To search for a record on disk, one or more blocks are copied into main
memory buffers. Programs then search for the desired record utilising the
information in the file header. If the address of the block that contains the
desired record is not known, the search programs must do a linear search
through the file blocks. Each file block is copied into a buffer and searched
until the record is located. This can be very time consuming for a large file.
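A linear search over file blocks can be sketched as follows. Each block is modelled as a Python list of records (dictionaries), and the function name and record layout are hypothetical. The function also counts the blocks read, which shows why linear search is expensive. If the file has b blocks, about half of them are read on average when the record exists, and all b are read when it does not.

```python
def linear_search(blocks, matches):
    """Scan the file block by block until a record satisfies `matches`.

    Returns (record or None, number of blocks copied into the buffer)."""
    blocks_read = 0
    for block in blocks:
        buffer = list(block)        # simulate copying one disk block into a memory buffer
        blocks_read += 1
        for record in buffer:
            if matches(record):
                return record, blocks_read
    return None, blocks_read
```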
Self-Assessment Questions
13. Data is stored in the form of __________________.
14. _______________ and _____________ are the two record types.

3.6 Operations on Files


Operations on files are usually grouped into retrieval operations and update
operations. Update operations change the file by inserting or deleting records
or by modifying field values. The following are typical commands for
operating on a file.

Find (or locate): Searches for the first record that satisfies a search
condition.

Read (or get): Copies the current record from the buffer to a program
variable.

Find next: Searches for the next record that satisfies the search condition.

Delete: Deletes the current record.

Modify: Modifies some field values of the current record.

Insert: Inserts a new record into the file.
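These record-at-a-time commands all work relative to a current record. The Python sketch below models them with a hypothetical `FileCursor` class over an in-memory list of records. The class and method names are illustrative and do not correspond to the interface of any particular DBMS.

```python
class FileCursor:
    """Record-at-a-time operations on an in-memory list standing in for a file."""

    def __init__(self, records):
        self.records = records
        self.current = None      # index of the current record
        self.condition = None

    def find(self, condition):
        """Find (locate): make the first record satisfying the condition current."""
        self.condition = condition
        self.current = -1
        return self.find_next()

    def find_next(self):
        """Find next: move to the next record satisfying the same condition."""
        for i in range(self.current + 1, len(self.records)):
            if self.condition(self.records[i]):
                self.current = i
                return True
        self.current = len(self.records)     # positioned past the end of the file
        return False

    def read(self):
        """Read (get): copy the current record into a program variable."""
        return dict(self.records[self.current])

    def modify(self, **changes):
        """Modify: change some field values of the current record."""
        self.records[self.current].update(changes)

    def delete(self):
        """Delete: remove the current record; find_next continues after it."""
        del self.records[self.current]
        self.current -= 1

    def insert(self, record):
        """Insert: add a new record to the file."""
        self.records.append(record)
```

Suppose `f = FileCursor([{"name": "A", "dept": 5}, {"name": "B", "dept": 4}, {"name": "C", "dept": 5}])`. Calling `f.find(lambda r: r["dept"] == 5)` makes record A current, and a following `f.find_next()` moves to record C.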

Records can be stored in a file either unordered or ordered. Files of
unordered records are called heap files, and files of ordered records are
called sorted files.
3.6.1 Files of unordered records (heap files)
In the simplest and most basic type of organisation, records are placed in
the file in the order in which they are inserted, and new records are inserted
at the end of the file. Such an organisation is called a heap or pile file.
Inserting a new record is very efficient: the last disk block of the file is
copied into a buffer, the new record is added, and the block is then rewritten
back to the disk. However, searching for a record requires a linear search,
which is an expensive procedure.

Deleting a record: To delete a record, a program must first find its
block, copy the block into a buffer, delete the record from the buffer and
finally rewrite the block back to the disk. This leaves unused space in the
disk block. Another technique for record deletion is to store a deletion
marker with each record. A record is deleted by setting its deletion marker
to a certain value; a different value of the marker indicates a valid (not
deleted) record. Search programs consider only valid records in a block
when conducting their search. Both of these deletion techniques require
periodic reorganisation of the file to reclaim the unused space. During
reorganisation, records are packed by removing deleted records.
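A heap file with deletion markers and periodic reorganisation can be sketched in Python as below. The file is modelled as a list of blocks, each holding at most bfr entries of the form [deleted flag, record]. The class name and this layout are assumptions for illustration.

```python
class HeapFile:
    """Sketch of a heap (pile) file: blocks of [deleted_flag, record] entries."""

    def __init__(self, bfr):
        self.bfr = bfr           # blocking factor: records per block
        self.blocks = []

    def insert(self, record):
        # Add the record to the last block; start a new block if the last one is full.
        if not self.blocks or len(self.blocks[-1]) == self.bfr:
            self.blocks.append([])
        self.blocks[-1].append([False, record])

    def delete(self, matches):
        # Set the deletion marker instead of physically removing the record.
        for block in self.blocks:
            for entry in block:
                if not entry[0] and matches(entry[1]):
                    entry[0] = True

    def valid_records(self):
        # Search programs consider only records whose deletion marker is not set.
        return [rec for block in self.blocks for deleted, rec in block if not deleted]

    def reorganise(self):
        # Pack the file: rewrite only the valid records into as few blocks as possible.
        records = self.valid_records()
        self.blocks = []
        for rec in records:
            self.insert(rec)
```

With `h = HeapFile(2)` and five inserted records, the file uses three blocks. Deleting records only sets markers, so the file still uses three blocks until `h.reorganise()` packs the remaining valid records into fewer blocks.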
3.6.2 Files of ordered records (sorted files)
We can physically order the records of a file on disk based on the values
of one of their fields, called the ordering field. If the ordering field is also a
key field of the file, that is, a field guaranteed to have a unique value in
each record, then the field is also called the ordering key for the file.
Files of ordered records have some advantages over unordered files. First,
reading the records in the order of the ordering field values becomes
extremely efficient, since no sorting is required. Second, finding the next
record in ordering field order usually requires no additional block accesses,
because the next record is in the same block as the current one (unless the
current record is the last one in the block). Third, a search condition based
on the value of an ordering key field results in faster access when the binary
search technique is used.
Inserting and deleting records are expensive operations for an ordered file
because the records must remain physically ordered. To insert a new
record, we must find its correct position in the file based on its ordering field
value and then make space in the file to insert the record in that position.
For a large file this can be very time consuming. For record deletion the
problem is less severe if we use deletion markers and reorganise the file
periodically.
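The Python sketch below illustrates both points. The first function performs a binary search over the blocks of a sorted file, which needs roughly log2(b) block accesses for a file of b blocks. The second inserts into a sorted list, where every record after the insertion point has to move. The functions and the block layout (a list of non-empty, sorted lists of records) are assumptions for illustration.

```python
import bisect


def binary_search_blocks(blocks, key, key_of):
    """Binary search on the blocks of a file sorted on its ordering key.

    Returns (record or None, number of blocks accessed)."""
    lo, hi, accesses = 0, len(blocks) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]                  # one block access
        accesses += 1
        if key < key_of(block[0]):
            hi = mid - 1
        elif key > key_of(block[-1]):
            lo = mid + 1
        else:                                # key lies in this block's range
            for record in block:
                if key_of(record) == key:
                    return record, accesses
            return None, accesses
    return None, accesses


def insert_sorted(records, record, key_of):
    """Insert keeping the order; returns how many records had to be shifted."""
    keys = [key_of(r) for r in records]
    pos = bisect.bisect_left(keys, key_of(record))
    records.insert(pos, record)
    return len(records) - pos - 1
```

Take the blocks `[[{"id": 1}, {"id": 3}], [{"id": 5}, {"id": 7}], [{"id": 9}, {"id": 11}]]`. Searching for key 9 reads two of the three blocks. Inserting id 4 into the records with ids 1, 5 and 9 shifts two records.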
3.6.3 Hashing techniques
One disadvantage of sequential file organisation is that we must use linear
search or binary search to locate the desired record. This results in more
I/O operations and many unnecessary comparisons. In the hashing technique,
or direct file organisation, the key value is converted into an address by
performing some arithmetic manipulation on the key value, which provides
very fast access to records.
Key value → Hash function → Address

Consider a hash function h that maps a key value k to the value h(k). The
value h(k) is used as the address of the record.
The basic terms associated with the hashing techniques are:
1. Hash table: An array that holds the addresses of records.
2. Hash function: The transformation of a key into the corresponding
location or address in the hash table; that is, a function that takes a key
as input and transforms it into a hash table index.
3. Hash key: If R is a record, the value into which its key hashes is
called the hash key.
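Below is a minimal hashing sketch in Python. It assumes the division (modulo) method as the hash function, a hypothetical table size M = 7, and collisions handled by chaining records in the same bucket. The text above prescribes neither this hash function nor this collision-handling scheme; both are simply common textbook choices.

```python
M = 7   # hypothetical number of addresses (buckets) in the hash table


def h(key):
    """Hash function (division method): maps key k to an address 0..M-1."""
    return key % M


def build_hash_table(records, key_of):
    table = [[] for _ in range(M)]               # one bucket per address
    for record in records:
        table[h(key_of(record))].append(record)  # colliding records share a bucket
    return table


def hash_lookup(table, key, key_of):
    # Only the bucket at address h(key) is searched, not the whole file.
    for record in table[h(key)]:
        if key_of(record) == key:
            return record
    return None
```

Keys 10, 17 and 3 all hash to address 3 (for example, 17 mod 7 = 3), so they share a bucket. A lookup still examines only that one bucket instead of the whole file.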
Self-Assessment Questions
15. State whether the following statements are true or false:
a) Inserting a new record into a heap file is inefficient.
b) Hash table is an array having address of records.

3.7 Summary
Let us recapitulate the important concepts discussed in this unit:

The orderly arrangement of storage in the system architecture is called
the memory hierarchy.

Secondary storage devices include magnetic disks, optical disks, tapes
and drums. They usually have a larger capacity, cost less and provide
slower access to data than primary storage devices. Examples of
secondary storage devices are hard disk drives, DVD drives and BD
drives.

Buffering is most useful when the I/O processor and the CPU can work
simultaneously.

Data is usually stored in the form of records. Each record consists of a
collection of related data values. Records usually describe entities and
their attributes.

Operations on files are usually grouped into retrieval operations and
update operations, such as insertion or deletion of records or
modification of field values. Files may store their records unordered
(heap files) or ordered (sorted files).

3.8 Glossary
Analog: Analog describes a device or system that represents changing
values as continuously variable physical quantities. A typical analog device
is a clock in which the hands move continuously around the face.
Buffer: A buffer is an area of main memory that holds a copy of a disk
block. In SQL Server, for example, a buffer is an 8-KB page in memory, the
same size as a data or index page.
Cache: Cache is a collection of data duplicating original values stored
elsewhere on a computer. It is a part of primary memory.
Coercivity: The intensity of the magnetic field that must be applied to a
ferromagnetic material to reduce its magnetisation to zero after it has been
magnetised.
Index: A database index is a data structure that improves the speed of data
retrieval operations on a database table at the cost of slower writes and
increased storage space. Indexes can be created using one or more
columns of a database table, providing the basis for both rapid random
lookups and efficient access of ordered records.
Non-volatile: Non-volatile storage is the memory that can retain the stored
information even when not powered.
Platter: A platter is a round magnetic plate that constitutes part of a hard
disk. Hard disks typically contain up to a dozen platters.
Record: A record is a collection of data items arranged for processing by a
program.
Volatile: Volatile memory retains information only as long as the power
supply is on; when the power supply is switched off or interrupted, the
stored information is lost.

3.9 Terminal Questions


1. Describe the construction of hard disk drive.
2. Explain fixed length and variable length record.
3. Differentiate between files of unordered records and files of ordered
records.
4. What is the hashing technique?

3.10 Answers
Self-Assessment Questions
1. Main
2. Kibibyte
3. $2^{10}$ = 1,024 bytes
4. Answers:
a) False
b) True
c) False
5. Primary storage and secondary storage
6. Track
7. Latency
8. b
9. redundant array of inexpensive disks
10. Blu-ray Disc
11. I/O processor
12. True
13. Records
14. Fixed length and variable length
15. Answers:
a) False
b) True
Terminal Questions
1. A hard disk drive consists of magnetic read/write heads that read the
data from rotating disks. Its different parts serve different functions. A
hard disk contains one or more rigid, solid substrates called platters.
Platters are made of aluminium because it is a light material. They are
circular in shape, and both sides of each platter are coated with a
magnetic material for reading/writing the data. (Refer to Section 3.3.1 for
further information.)
2. Fixed length records: All records in a file are of the same record type.
If every record in the file has exactly the same size (in bytes), the file is
said to be made up of fixed length records. Variable length records: If
different records in the file have different sizes, the file is said to be
made up of variable length records. A variable length field is a field for
which only a maximum allowed length is specified. (Refer to Section
3.5 for further information.)
3. Files of ordered records have some advantages over files of unordered
records. First, reading the records in order of the ordering field values
becomes extremely efficient, since no sorting is required. Second, finding
the next record in ordering field order usually requires no additional block
accesses, because the next record is in the same block as the current
one (unless the current record is the last one in the block). Third, using
a search condition based on the value of an ordering key field results in
faster access when the binary search technique is used. (Refer to
Section 3.6 for further information.)
4. In hashing technique or direct file organisation, the key value is
converted into an address by performing some arithmetic manipulation
on the key value, which provides very fast access to records. (Refer to
Section 3.6.3 for further information.)

3.11 Case Study


DVD Spin Coat Process Automation
A leading manufacturer of DVD mastering equipment needed to have an
automated system that would be able to feed cleaned, coated and cured
glass disks into their mastering system. The current method involved
operators who moved individual glass disks from a washing station to a
separate chemical dispense and spin coater and finally to a bake and cure
station. This was a manual process that took up a lot of floor space and
supplied coated disks of varying and unpredictable quality levels to their
flagship mastering system. Not wanting to diverge from their core
technology and dilute engineering resources, this was a perfect outsource
project for both the client and Owens Design.
(Source: http://www.owensdesign.com/case-studies-hard-disk-drive/dvdspin-coat-process.html)
Discussion Question:
1. The challenge here is the complex process requirements. A new
system is needed in 5 months for a major trade show. What is the
solution to overcome this challenge?

References/E-References:
E-References:
http://www.computerhope.com/jargon/s/secostor.htm (Retrieved on 20th June 2012)
http://www.ehow.com/list_6684495_characteristics-hard-drive_.html (Retrieved on 20th June 2012)
http://searchoracle.techtarget.com/definition/record (Retrieved on 22nd June 2012)
http://www.owensdesign.com/case-studies-hard-disk-drive/index.html (Retrieved on 22nd June 2012)

Unit 4

Database Design

Structure:
4.1 Introduction
    Objectives
4.2 Relational Data Model
4.3 Relational Algebra
4.4 Data Dictionary
4.5 Normalisation
4.6 Summary
4.7 Glossary
4.8 Terminal Questions
4.9 Answers
4.10 Case Study

4.1 Introduction
In Unit 3, you studied the basic concepts of secondary storage devices.
This unit explains how to design a database and introduces the models used
in database design.
The relational model was first introduced in 1970 by Ted Codd of IBM
Research. The model uses the concept of a mathematical relation, which
resembles a table, and is based on set theory and first-order predicate
logic. In this unit, we will discuss the basic characteristics of the model and
its constraints. Earlier data models, such as the hierarchical and network
models, are now referred to as legacy database systems.
In this unit, we will describe the basic principles of the relational model of
data. For this purpose, we will start by defining the concepts and notation
of the relational model. We will discuss relational constraints, which are an
important part of the model. We will also define the update operations of
the relational model and discuss how to handle violations of integrity
constraints. Finally, we will introduce relational algebra, which is dealt with
in detail in Unit 6.
Note that this unit introduces concepts that are explained further in
subsequent units, so you should understand them well before moving on.

Objectives:
After studying this unit, you should be able to:

describe the relational data model

list the various operations in relational algebra

explain the data dictionary

define normalisation

compare the different normal forms

4.2 Relational Data Model


The relational model represents the database as a collection of relations.
Each relation is a table of rows and columns and is assigned a unique
name. A relation consists of a relation schema (the structure of the table)
and a relation instance (the data in the table at a particular time). There is
a close correspondence between the concept of a table and the
mathematical concept of a relation.
The relational model uses certain conventions. For instance, a row is called
a tuple and a column is called an attribute. The domain of an attribute is the
pool of legal values that the attribute can take. Consider the relation schema
STUDENT [RegNo, Name, Addr, Phone, Dbirth].
In this example, STUDENT is a relation and its attributes (columns) are
RegNo, Name, Addr, Phone and Dbirth. A possible tuple for the STUDENT
relation is [MBA02C1101, Priyadarshini, "440, 1-main, 2nd cross, Airport
Road, Kodenahalli, Bangalore-560008", 25256789, 11-Jan-1986].
The domain of each attribute is as follows:
RegNo: 10 alphanumeric characters
Name: characters
Addr: alphanumeric characters
Phone: 8 digits
Dbirth: date

Characteristics of a relation: According to brumScouse, a relation can be
represented by a table in a database. In the context of modelling a
problem, a relation includes the fields and, possibly, the identification of
fields that have relationships with other relations. A relation is basically a
table with rows and columns. A row in a relation is called a tuple.
The tuples in a relation need not be ordered.
Each tuple in the relation represents an entity.
Domain
A domain D is a set of atomic values. For each attribute, there is a set of
permitted values called the domain of that attribute.
For example, for the attribute empname, the domain is the set of all
employee names.
Let D1, D2, ..., Dn denote the domains of the attributes of an employee
relation, for example:
D2 → set of all empnames
D3 → set of all addresses
D4 → set of all phone numbers
D5 → set of all salary amounts
In general, a table of n columns must be a subset of
$D_1 \times D_2 \times D_3 \times D_4 \times \cdots \times D_{n-1} \times D_n$.
In relational model terminology, the data type describing the type of values
that can appear in each column is called a domain.
1. Since a relation is a set of tuples, we use the mathematical notation
$t \in r$ to denote that tuple t is in relation r.
2. A domain is atomic if it is not divisible, that is, if the elements of the
domain are considered to be indivisible units.
For example, the set of integers is an atomic domain.
The degree of a relation is defined as the number of attributes of its relation
schema. A relation schema is made up of a relation name R and a list of
attributes, where each attribute is the name of a role played by some
domain in the relation schema. A relation state is the set of tuples in the
relation at a given time.
A tuple is a sequence of components. The type of each component
depends upon the data type specified; a component may be a person's
name, address, date of birth, age, etc. Each component of a tuple is a value
of a specified type, and a tuple containing n components is an n-tuple. For
example, a quadruple (4-tuple) contains four components of specified types,
namely, STUDENT, CLASS, MARKS and SUB. Such a tuple implies that a
certain student of the class obtained certain marks in a certain subject.
Entity set
The set of tuples in a relation is called an entity set. The number of tuples in
a relation is called its cardinality.
Database schema or relational schema:
1. A relation schema, denoted by R[A1, A2, A3, …, An], is made up of a
relation name R and a list of attributes A1, A2, A3, …, An.
2. A database instance is the data in the database at a particular point in
time.
3. Each attribute Ai has a domain D, which is denoted by dom(Ai).
4. A relational schema is a list of attributes and their corresponding
domains (sets of values).
5. To represent incomplete tuples, we must use NULL values; for example,
for an apartment number that is unknown or does not apply.
6. A relation may have more than one candidate key; the candidate keys
other than the primary key are called alternate keys.
7. The primary key is the candidate key chosen to identify tuples; its values
must be unique and NOT NULL.
8. If we denote the cardinality of a domain D by |D| and assume that all
domains are finite, the total number of tuples in the Cartesian product is:
$|dom(A_1)| \times |dom(A_2)| \times \cdots \times |dom(A_n)|$
9. The current relation state reflects only the valid tuples that represent a
particular state of the real world.
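Two ideas above can be checked with Python's `itertools.product`: the Cartesian product formula in point 8, and the fact that a relation state is a subset of $D_1 \times \cdots \times D_n$. The three small domains and the relation R(NAME, DEPT, GRADE) are hypothetical and are chosen only to keep the product small.

```python
from itertools import product

# Hypothetical finite domains for a relation R(NAME, DEPT, GRADE).
dom_name = {"Asha", "Ravi"}
dom_dept = {"HR", "IT", "Sales"}
dom_grade = {1, 2}

# dom(A1) x dom(A2) x dom(A3): every tuple that could possibly appear in R.
all_tuples = set(product(dom_name, dom_dept, dom_grade))

# |dom(A1)| * |dom(A2)| * |dom(A3)| = 2 * 3 * 2 = 12
assert len(all_tuples) == len(dom_name) * len(dom_dept) * len(dom_grade)

# A relation state r(R) is any subset of that Cartesian product.
r = {("Asha", "HR", 1), ("Ravi", "Sales", 2)}
assert r <= all_tuples

degree = 3              # number of attributes of the schema
cardinality = len(r)    # number of tuples in the current state
```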
Relational model notation

A relational schema R of degree n is denoted by R[A1, A2, A3, …, An],
where the degree of the relation is the total number of attributes, n.

An n-tuple t in a relation r(R) is denoted by t = [V1, V2, V3, …, Vn],
where Vi is the value corresponding to attribute Ai.

The letters Q, R, S denote relation names.

The letters q, r, s denote relation states.

The letters t, u, v denote tuples.

In general, the name of a relation such as STUDENT indicates the
current set of tuples in that relation (its current relation state), whereas
STUDENT(Name, SSN) refers to the relation schema.

Primary Key: A key is an attribute whose value is different for every tuple in
the relation. In a relation R, a value of a key attribute A can be used to
uniquely identify each tuple in R. For example, in the STUDENT relation,
std_id can be used to uniquely identify the std_name, Class, Address and
Phone_No of a student. If there is no such key attribute, then we cannot
access individual tuples in the relation. We cannot designate std_name as a
key, because more than one tuple may hold the same student name.
Therefore, std_id is called the primary key.
A relation can have more than one key attribute. For example, the
STUDENT relation shown in Table 4.1 has two key attributes, namely,
Std_ID and Phone_No. In this case, these keys are called candidate keys.
Usually, one of the candidate keys is designated as the primary key.
Table 4.1: Table to Show Candidate Keys
Std_ID | Std_name | Class | Address | Phone_No.
201 | Jagdish Tiwari | 2 MCA | #7, Juhu Post, New Delhi 110023 | 9112345599
302 | Annapoorna | 3 MCA | No.2, Sriram Nagar, Near Mahalaxmi Nagar Main Road, Guduvanchery, Chennai 603201 | 26789580
105 | Sathya Shukla | 1 MCA | #64, near Central Excise & Service Tax Department, Manik Bagh Palace, Indore 452001 | 07312395007
303 | Sathya Shukla | 3 MCA | Juhu post, Andheri West, Mumbai 230098 | 934567889

Therefore, a primary key is defined as a key attribute which uniquely
determines the tuples in the relation.
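Whether an attribute can serve as a key can be tested against the data of Table 4.1 with a short Python sketch. Addresses are omitted, and the function names are illustrative. Data can only disprove a key. A uniqueness check on the current relation state shows that Std_name cannot be a key. It cannot prove that Std_ID or Phone_No will always stay unique, because keys are constraints declared on the schema.

```python
# Table 4.1 without the Address column (a simplification for this sketch).
STUDENT = [
    {"Std_ID": 201, "Std_name": "Jagdish Tiwari", "Class": "2 MCA", "Phone_No": "9112345599"},
    {"Std_ID": 302, "Std_name": "Annapoorna", "Class": "3 MCA", "Phone_No": "26789580"},
    {"Std_ID": 105, "Std_name": "Sathya Shukla", "Class": "1 MCA", "Phone_No": "07312395007"},
    {"Std_ID": 303, "Std_name": "Sathya Shukla", "Class": "3 MCA", "Phone_No": "934567889"},
]


def is_unique(rows, attrs):
    """True if no two rows have the same values for the attribute(s) in attrs."""
    values = [tuple(row[a] for a in attrs) for row in rows]
    return len(values) == len(set(values))


def satisfies_primary_key(rows, attr):
    """Primary key constraint on this state: no NULL (None) values and no duplicates."""
    values = [row.get(attr) for row in rows]
    return None not in values and len(values) == len(set(values))
```

`is_unique(STUDENT, ["Std_name"])` returns False because "Sathya Shukla" occurs twice. Both Std_ID and Phone_No pass the check.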
In some cases, we need to combine or link two tables of a database that
have one or more attributes in common. Consider the two relations
STUDENT and PROFESSOR shown in Tables 4.2(a) and 4.2(b).

Table 4.2(a): Student Relation
Std_ID | Std_name | Class | Subject_code | Marks obtained (%)
101 | Ranjith Jha | 1 MBA | F1 | 78
203 | Meghna Sinha | 2 MBA | HR1 | 89
105 | Mekhala Sha | 1 MBA | F1 | 98
303 | Samiksha Shukla | 3 MBA | M1 | 67
109 | Vinay Singh | 2 MBA | M1 | 95

Table 4.2(b): Professor Relation
Subject_code | Subject name | Professor-in-charge
F1 | Finance | Prof. N. Rao
HR1 | Human Resource | Prof. Ravi
M1 | Marketing Management | Prof. A. Agrawal
IS1 | Information System | Prof. R. Gowda

Subject_code is the primary key of the relation PROFESSOR, and this key
links PROFESSOR with the relation STUDENT. Therefore, the primary key of
PROFESSOR becomes a foreign key in STUDENT. A foreign key is thus
defined as the attribute that is used to link two relations. Remember that
the foreign key of one relation refers to the primary key of the linked
relation.
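The link between the two tables can be illustrated in Python by checking referential integrity: every Subject_code value in STUDENT must appear as a primary key value in PROFESSOR. The data below is taken from Tables 4.2(a) and 4.2(b), reduced to the attributes needed. The function names are hypothetical.

```python
# Table 4.2(a) reduced to (Std_ID, Std_name, Subject_code).
STUDENT = [
    (101, "Ranjith Jha", "F1"), (203, "Meghna Sinha", "HR1"), (105, "Mekhala Sha", "F1"),
    (303, "Samiksha Shukla", "M1"), (109, "Vinay Singh", "M1"),
]
# Table 4.2(b): primary key Subject_code -> (Subject name, Professor-in-charge).
PROFESSOR = {
    "F1": ("Finance", "Prof. N. Rao"),
    "HR1": ("Human Resource", "Prof. Ravi"),
    "M1": ("Marketing Management", "Prof. A. Agrawal"),
    "IS1": ("Information System", "Prof. R. Gowda"),
}


def referential_integrity_violations(students, professors):
    """Students whose foreign key Subject_code matches no PROFESSOR primary key."""
    return [s for s in students if s[2] not in professors]


def professor_in_charge(std_id, students, professors):
    """Follow the foreign key from a STUDENT tuple to the linked PROFESSOR tuple."""
    for sid, _name, subject_code in students:
        if sid == std_id:
            return professors[subject_code][1]
    return None
```

For these tables there are no violations. For example, student 303 is linked through M1 to Prof. A. Agrawal.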
Self-Assessment Questions
1. Relation consists of _________________ and _________________.
2. A row is also called a _________________.
3. _________________ is a set of atomic values.
4. Tuple is the number of attributes of its relation schema. (True/False)

4.3 Relational Algebra


Relational algebra is a procedural query language. Each of its operations
takes one or more relations as input and defines a new relation as output,
without changing the original relations. Many operations are defined in
relational algebra; some of them are given below:

Selection

Projection

Cartesian product

Union

No duplication

Join operators

Intersection

Selection (restriction): This operation selects the rows of a relation
that satisfy a given condition. It is denoted by $\sigma_{condition}(R)$. For
example, the employees in the Staff relation whose salary is more than
Rs. 1,000 are selected by $\sigma_{salary > 1000}(Staff)$.

Projection: Using this operation, you can select columns from a
relation. The projection operation is denoted by the symbol $\pi$, in the
format $\pi_{col_1, col_2, \ldots, col_n}(R)$, where R is the relation. For
example, $\pi_{sno, fname, address}(Student)$.

Cartesian product: This operation concatenates each row of one
relation with every row of another relation. It is written $R \times S$. For
example, $\pi_{sid, name}(Student) \times \pi_{cid}(Courses)$ pairs every
(sid, name) value of the Student relation with every cid value of the
Courses relation.

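Union: The union of two relations R and S contains all tuples that are in
R, in S or in both. To take a union, the two relations must be
union-compatible, that is, they must have the same number of attributes
with the same domains. It is represented by $R \cup S$ and read as
"R union S".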

No duplication: A relation never contains duplicate rows, so duplicates are eliminated from the result of every operation.

Difference: The difference operation results in a relation containing the
tuples that are in R but not in S (R and S must be union-compatible). It is
represented by $R - S$.

Join operators: These operators join two relations. The different types
of join operators are:

Theta join: A theta join matches fields of two relations using a
comparison operator θ, which can be =, <, >, <=, >= or !=. It is
represented by $R \bowtie_{\theta} S$.

Equi-join: An equi-join is a theta join that uses only the = comparison.
Tuples of the two relations are joined on equal values of a common,
meaningful attribute.

Natural join: A natural join is similar to an equi-join, except that the
result relation keeps only one copy of the common attributes.

Outer join: A join in which each matching record from two tables is
combined into one record in the query's results, and at least one
table contributes all of its records, even if the values in the joined
field don't match those in the other table.

Semi-join: A semi-join does not perform a complete join. It returns
only the rows of one table that would join with rows of another
table, and it is usually written $R \ltimes S$. This join is useful in
distributed databases (DDB), where it reduces the data transferred
between sites. The joining attribute values of one relation are sent
to the other site, and only the matching tuples are sent back.

Intersection: The intersection of two relations R and S, which must be
union-compatible, contains the tuples that are in both relations. It is
represented as $R \cap S$.

Division: The division operation produces those tuples of relation R
(restricted to the attributes that are not in S) that appear in R in
combination with every tuple of relation S. You can express this operation
in terms of projection, Cartesian product and difference. Division is
represented as $R \div S$.

Out of the above operators, select, project, union, set difference and
Cartesian product are considered as basic operators and set intersection,
division, join are called derived operators. You will study relational algebra
in more detail in Unit 6.
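The operators above can be made concrete with a small Python model in which a relation is a set of tuples plus a tuple of attribute names, so duplicate rows disappear automatically. This is a teaching sketch under those assumptions and does not show how a DBMS evaluates queries. The class and function names are hypothetical, the union-compatibility check only compares the number of attributes, and outer join and semi-join are omitted.

```python
from itertools import product


class Relation:
    """A relation: a tuple of attribute names plus a set of tuples (no duplicate rows)."""

    def __init__(self, attrs, rows):
        self.attrs = tuple(attrs)
        self.rows = {tuple(row) for row in rows}


def _check_compatible(r, s):
    # Simplified union-compatibility test: same number of attributes (domains not checked).
    if len(r.attrs) != len(s.attrs):
        raise ValueError("relations are not union-compatible")


def select(rel, predicate):          # sigma_predicate(R): rows satisfying the condition
    return Relation(rel.attrs, [row for row in rel.rows if predicate(dict(zip(rel.attrs, row)))])


def project(rel, *cols):             # pi_cols(R): keep only the listed columns
    idx = [rel.attrs.index(c) for c in cols]
    return Relation(cols, [tuple(row[i] for i in idx) for row in rel.rows])


def union(r, s):                     # R union S
    _check_compatible(r, s)
    return Relation(r.attrs, r.rows | s.rows)


def difference(r, s):                # R - S
    _check_compatible(r, s)
    return Relation(r.attrs, r.rows - s.rows)


def intersection(r, s):              # R intersect S
    _check_compatible(r, s)
    return Relation(r.attrs, r.rows & s.rows)


def cartesian(r, s):                 # R x S: every row of R concatenated with every row of S
    return Relation(r.attrs + s.attrs, [a + b for a, b in product(r.rows, s.rows)])


def natural_join(r, s):              # equal common attributes, kept only once in the result
    common = [a for a in r.attrs if a in s.attrs]
    extra = [a for a in s.attrs if a not in common]
    rows = []
    for a, b in product(r.rows, s.rows):
        ra, sb = dict(zip(r.attrs, a)), dict(zip(s.attrs, b))
        if all(ra[c] == sb[c] for c in common):
            rows.append(a + tuple(sb[c] for c in extra))
    return Relation(r.attrs + tuple(extra), rows)


def division(r, s):                  # R divided by S
    keep = [a for a in r.attrs if a not in s.attrs]
    keep_idx = [r.attrs.index(a) for a in keep]
    s_idx = [r.attrs.index(a) for a in s.attrs]
    partners = {}                    # value of the kept attributes -> S-values it occurs with
    for row in r.rows:
        x = tuple(row[i] for i in keep_idx)
        partners.setdefault(x, set()).add(tuple(row[i] for i in s_idx))
    return Relation(keep, [x for x, ys in partners.items() if s.rows <= ys])
```

For example, with a Staff relation of (sno, fname, salary) rows, `select(staff, lambda t: t["salary"] > 1000)` plays the role of $\sigma_{salary > 1000}(Staff)$.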
Self-Assessment Questions
5. Which of the following operations helps you to select columns from a
relation?
a) Selection
b) Projection
c) Cartesian product
d) Join
6. _________________ join is matching of two fields in two relations.
7. __________________ operation produces the tuples of one relation that
match all the tuples of the other relation.

4.4 Data Dictionary


A data dictionary contains files that hold details about the data and
information present in the database. These files record, for example, the
number of records in each file and the names and data types of each field.
The data dictionary is normally hidden from users so that its contents are
not accidentally destroyed.
A typical data dictionary has the following information:

Schema definitions of the objects in the database.

The names of the database users.

Space allocated and the space used by the schemas.

Authorisation details.

Default values of the fields.

Updating information like who is the original author and who has
updated it thereafter.

Structure of data dictionary


The data dictionary consists of the following main components:

Base tables

User-accessible views

SYS file

Base tables
Base tables are the underlying tables that store information about the
associated database. They are normalised, and much of their data is stored
in a cryptic internal format, so users rarely access them directly; only the
DBMS itself should read and write them.
User-accessible views
These views summarise and display the information stored in the base
tables, decoding it into useful information such as user names, table names
and field names. To do this, they mostly use join operations and WHERE
clauses. Using views is the safest way to avoid direct access to the base
tables.
SYS file
(system file) The SYS file, or SYS schema, is the owner of the data
dictionary. No user should alter the rows or objects in the SYS schema with
statements such as INSERT, UPDATE or DELETE. SYS is the central account of
the security administrator, who should keep strict control over access to it.


Uses of data dictionary


The uses of the data dictionary are as follows:

The data dictionary can be used to find information about users, schema
objects and storage structures.

The DBMS modifies the data dictionary every time a data definition
language (DDL) statement is issued.

The data dictionary can be used as a read-only reference for information
about the database.

The data dictionary is stored in the SYSTEM tablespace, which is always
online, so the data dictionary is available whenever the database is open.
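Most DBMSs let users query the data dictionary with ordinary SQL. As a small, self-contained illustration, the Python sketch below uses SQLite through the standard sqlite3 module: SQLite records its schema in the built-in sqlite_master table, and PRAGMA table_info lists a table's field names, data types and default values. The student table is hypothetical, and the dictionary objects shown are SQLite's, not the Oracle-style base tables, views and SYS schema described in this section.

```python
import sqlite3

# SQLite is used here only as a convenient, self-contained stand-in.
# It keeps its data dictionary in the built-in table sqlite_master.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A DDL statement: the DBMS records the new table in its data dictionary.
cur.execute(
    """
    CREATE TABLE student (
        std_id   INTEGER PRIMARY KEY,
        std_name TEXT NOT NULL,
        class    TEXT DEFAULT 'X'
    )
    """
)

# Read-only lookup of the schema objects recorded in the dictionary.
tables = cur.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(tables)

# Field names, data types and default values for one table.
# Each row is (cid, name, type, notnull, dflt_value, pk).
columns = cur.execute("PRAGMA table_info(student)").fetchall()
for _cid, name, col_type, notnull, default, _pk in columns:
    print(name, col_type, "NOT NULL" if notnull else "", default)

conn.close()
```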

4.5 Normalisation
According to R. Elmasri and S.B. Navathe, normalisation is a process of
analysing the given relation schemas based on their functional
dependencies and primary keys to achieve the two desirable properties
mentioned below:

Minimising redundancy

Minimising insertion, deletion and update anomalies

When a relation does not satisfy these properties, it is decomposed into two
or more relations that do, each with its own primary key and sharing the
attributes needed to join them back together. The condition that a relation
satisfies at each stage of this process is called a Normal Form. In general,
a relation in a Higher order Normal Form (HNF) is less vulnerable to these
anomalies.
In this section, we will discuss the following types of Normal Forms.

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

Boyce-Codd Normal Form (BCNF)

Fourth Normal Form (4NF)

Fifth Normal Form (5NF)

First Normal Form (1NF)


A relation is said to be in First Normal Form only if:
1. It is a relation.
2. It has no repeating rows.
3. Each attribute value is atomic.
If a relation does not satisfy any one of these conditions, it is not in
1NF.
For example, consider the STUDENT schema having the fields as shown in
Table 4.3(a).
Table 4.3(a): Relation Schema of a STUDENT Relation
Std_ID | Std_Name | Class | Address | Tel_No
201 | Ranjith | X | #4, Chokkanahalli, Bangalore 560074 | 26677780
202 | Shivraj | XI | Andheri (east), Mumbai 400064 | 2514890, 9885643247
304 | Lavanya | | #10, Dadra Post, Bandra (east), Mumbai 400014 | 25234972, 9912451356

The above table is not in 1NF since the field Tel_No is multi-valued for
Std_ID 202 and 304. However, if we add a field Mobile_No, as shown in
Table 4.3(b), so that each field holds an atomic value, that field is null for
students who have no mobile number (such as Std_ID 201), and null fields are
not allowed in this design. Therefore, Table 4.3(b) is not in 1NF either.
Table 4.3(b): Relation Schema of a STUDENT Relation
Std_ID | Std_Name | Class | Address | Tel_No | Mobile_No
201 | Ranjith | X | #4, Chokkanahalli, Bangalore 560074 | 26677780 |
202 | Shivraj | XI | Andheri (east), Mumbai 400064 | 2514890 | 9885643247
304 | Lavanya | | #10, Dadra Post, Bandra (east), Mumbai 400014 | 25234972 | 9912451356

Therefore, to make the table 1NF, we need to decompose Table 4.3(a)
into two tables as shown in Tables 4.4(a) and 4.4(b).

Table 4.4(a)
Std_ID | Std_Name | Class | Address
201 | Ranjith | X | #4, Chokkanahalli, Bangalore 560074
202 | Shivraj | XI | Andheri (east), Mumbai 400064
304 | Lavanya | | #10, Dadra Post, Bandra (east), Mumbai 400014

Table 4.4(b)
Std_ID | Tel_No
201 | 26677780
202 | 2514890
202 | 9885643247
304 | 25234972
304 | 9912451356

Now, Tables 4.4(a) and 4.4(b) are in First Normal Form.
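The decomposition from Table 4.3(a) to Tables 4.4(a) and 4.4(b) follows a simple rule: remove the multi-valued field from the main relation and store one (key, value) row per value in a separate relation. The Python sketch below applies that rule; the helper name to_1nf is hypothetical, and the Class and Address fields are left out for brevity.

```python
# Rows of Table 4.3(a); Tel_No is held as a Python list because it is
# multi-valued. Class and Address are omitted to keep the sketch short.
unnormalised = [
    {"Std_ID": 201, "Std_Name": "Ranjith", "Tel_No": ["26677780"]},
    {"Std_ID": 202, "Std_Name": "Shivraj", "Tel_No": ["2514890", "9885643247"]},
    {"Std_ID": 304, "Std_Name": "Lavanya", "Tel_No": ["25234972", "9912451356"]},
]


def to_1nf(rows, multi_valued, key):
    """Split rows with a multi-valued field into two 1NF relations:
    one without that field, and one (key, value) row per value."""
    main, detail = [], []
    for row in rows:
        main.append({k: v for k, v in row.items() if k != multi_valued})
        for value in row[multi_valued]:
            detail.append({key: row[key], multi_valued: value})
    return main, detail


student, student_tel = to_1nf(unnormalised, "Tel_No", "Std_ID")
for row in student_tel:
    print(row["Std_ID"], row["Tel_No"])  # five rows, like Table 4.4(b)
```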


Second Normal Form (2NF)
Second Normal Form is based on full functional dependency. A functional
dependency X -> Y is a full functional dependency if removing any attribute
from X means that the dependency no longer holds.
According to R. Elmasri and S.B. Navathe, a relation is said to be in 2NF
only if the relation is in 1NF and every nonprime attribute in the relation is
fully functionally dependent on the primary key of the relation. For example,
consider the STUD_PROJ relation shown in Table 4.5.
Table 4.5: STUD_PROJ Relation
Std_ID | Project_Code | Hours | Std_Name | Class | Proj_name | Prof_incharge
101 | HMS1 | 20 | Ranjith Jha | 1 MBA | Hospital management System | Ms. Sahana
203 | SIM2 | 30 | Meghna Sinha | 2 MBA | Simulation of petrol bunk | Mr. Murali
303 | DM1 | 15 | Samiksha Shukla | 3 MBA | Data mining in research analysis | Mr. Benjamin

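Table 4.5 (STUD_PROJ) is in 1NF but not in 2NF. Its primary key is the
combination {Std_ID, Project_Code}. Hours depends on the whole key, but
Std_Name and Class depend only on Std_ID, and Proj_name and Prof_incharge
depend only on Project_Code. Because of these partial dependencies, we need
to decompose the table as given in Figure 4.1.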


Fig. 4.1: 2NF Normalisation

Now, the relations SP1, SP2 and SP3 are in 2NF.
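The partial-dependency test can be written down directly. The Python sketch below assumes the key {Std_ID, Project_Code} and the dependencies Std_ID -> Std_Name, Class; Project_Code -> Proj_name, Prof_incharge; and Std_ID, Project_Code -> Hours. It treats every attribute outside that key as nonprime and moves each partially dependent group into a relation of its own. Because Figure 4.1 is not reproduced here, the three schemas it prints are a sketch under these assumptions rather than a copy of SP1, SP2 and SP3.

```python
# Assumed functional dependencies for STUD_PROJ (Table 4.5), written as
# (left-hand side, right-hand side) pairs of attribute sets.
ATTRS = {"Std_ID", "Project_Code", "Hours", "Std_Name", "Class",
         "Proj_name", "Prof_incharge"}
KEY = {"Std_ID", "Project_Code"}  # assumed to be the only candidate key
FDS = [
    ({"Std_ID", "Project_Code"}, {"Hours"}),
    ({"Std_ID"}, {"Std_Name", "Class"}),
    ({"Project_Code"}, {"Proj_name", "Prof_incharge"}),
]


def partial_dependencies(key, fds):
    """FDs whose left side is a proper subset of the key and whose right
    side has nonprime attributes (here: attributes outside the key)."""
    found = []
    for lhs, rhs in fds:
        nonprime = rhs - key
        if lhs < key and nonprime:  # '<' is the proper-subset test
            found.append((lhs, nonprime))
    return found


def decompose_2nf(attrs, key, fds):
    """Move each partially dependent group into its own relation."""
    schemas, moved = [], set()
    for lhs, nonprime in partial_dependencies(key, fds):
        schemas.append(lhs | nonprime)
        moved |= nonprime
    # The remaining relation keeps the full key and fully dependent attributes.
    return [attrs - moved] + schemas


for schema in decompose_2nf(ATTRS, KEY, FDS):
    print(sorted(schema))
```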


Third Normal Form (3NF)
According to R. Elmasri and S.B. Navathe, a relation R is in 3NF if,
whenever a nontrivial functional dependency X -> A holds in R, either
1. X is a superkey of R, or
2. A is a prime attribute of R.
3NF is based on transitive dependency. A functional dependency X -> Y in
relation R is a transitive dependency if there is a set of attributes Z in R
that is neither a candidate key nor a subset of any key of R, and both
X -> Z and Z -> Y hold.
Let us take the example of the PROFESSOR relation given in Table 4.6 to
understand 3NF.

Table 4.6: PROFESSOR Relation
Prof_name | Prof_id | Subject_specialisation | Qualification | Dept_Number | Dept_Name | HOD_ID
Dr. Rao | A1 | Finance | PhD | D1 | Management | H2
Dr. Ravi | A2 | Marketing | PhD | D1 | Management | H2
Prof. Sanat Sha | B1 | Computer science | MCA | D2 | IT | H1
Prof. Neena Gupta | B2 | Sociology | MA, MPhil | D3 | Arts & Humanities | H3

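In this relation, Prof_id -> Dept_Number and Dept_Number -> Dept_Name,
HOD_ID, so Dept_Name and HOD_ID depend on the key Prof_id only
transitively, through Dept_Number. Figure 4.2 shows the decomposition of the
above table to form 3NF.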

Fig. 4.2: 3NF Normalisation

Now the relations P1 and P2 are in Third Normal Form.
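The decomposition can be checked on the data of Table 4.6. In the Python sketch below, P1 is assumed to hold the professor attributes plus Dept_Number, and P2 the department attributes (Dept_Number, Dept_Name, HOD_ID); Figure 4.2 is not reproduced here, so this split is an assumption. Joining P1 and P2 on Dept_Number gives back exactly the original rows, while each department's name and HOD are now stored only once.

```python
# Table 4.6 rows: (Prof_name, Prof_id, Subject_specialisation,
#                  Qualification, Dept_Number, Dept_Name, HOD_ID)
PROFESSOR = [
    ("Dr. Rao", "A1", "Finance", "PhD", "D1", "Management", "H2"),
    ("Dr. Ravi", "A2", "Marketing", "PhD", "D1", "Management", "H2"),
    ("Prof. Sanat Sha", "B1", "Computer science", "MCA", "D2", "IT", "H1"),
    ("Prof. Neena Gupta", "B2", "Sociology", "MA, MPhil", "D3",
     "Arts & Humanities", "H3"),
]

# Assumed split: P1 keeps the professor facts plus the foreign key
# Dept_Number; P2 stores each department's facts once.
P1 = {row[:5] for row in PROFESSOR}
P2 = {row[4:] for row in PROFESSOR}

# Natural join on Dept_Number (last column of P1, first column of P2).
rejoined = {p + d[1:] for p in P1 for d in P2 if p[4] == d[0]}

print(len(P2))                     # 3 departments, each stored once
print(rejoined == set(PROFESSOR))  # True: no information is lost
```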


Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form was proposed as a simpler form of 3NF, but it is
stricter than 3NF. Every relation in BCNF is also in 3NF, but a relation in
3NF is not necessarily in Boyce-Codd Normal Form.
A relation is said to be in BCNF if and only if every determinant is a
candidate key. A determinant is any attribute (simple or composite) on which
some other attribute is fully functionally dependent.


Let us take the example of a relation STUD_REPORT which has the fields
shown in Figure 4.3.
STUD_REPORT

Fig. 4.3: Student Report Table

In this figure, the functional dependencies of the relation are:


Std_ID -> Std_name
Course_code -> Course_title, Faculty_incharge
Faculty_incharge -> Fac_loc
Std_ID, Course_code, Program -> Grade
Std_ID, Program -> Coordinator
Coordinator -> Program
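A standard way to reason about such a list is the attribute closure X+, the set of all attributes that X determines. If X+ contains every attribute of the relation, X is a superkey. The Python sketch below computes closures for the dependencies listed above; it assumes that Figure 4.3 contains exactly the attributes that appear in these dependencies.

```python
# Functional dependencies of STUD_REPORT, copied from the list above,
# as (left-hand side, right-hand side) pairs of attribute sets.
FDS = [
    ({"Std_ID"}, {"Std_name"}),
    ({"Course_code"}, {"Course_title", "Faculty_incharge"}),
    ({"Faculty_incharge"}, {"Fac_loc"}),
    ({"Std_ID", "Course_code", "Program"}, {"Grade"}),
    ({"Std_ID", "Program"}, {"Coordinator"}),
    ({"Coordinator"}, {"Program"}),
]
# Assumption: the relation has exactly the attributes used in FDS.
ALL_ATTRS = set().union(*(lhs | rhs for lhs, rhs in FDS))


def closure(attrs, fds):
    """Return X+, every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once X+ contains a left side, it gains the right side too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, fds, all_attrs):
    return closure(attrs, fds) >= all_attrs


print(sorted(closure({"Std_ID", "Program"}, FDS)))
print(is_superkey({"Std_ID", "Course_code", "Program"}, FDS, ALL_ATTRS))
print(is_superkey({"Std_ID", "Course_code"}, FDS, ALL_ATTRS))
```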
The relation in Figure 4.3 is not normalised. To normalise it, we first
remove the repeating groups, which gives the following relations:
STUDENT

Fig. 4.3(a)

STUD_PROG

Fig. 4.3(b)

STUD_COURSE

Fig. 4.3(c)

Figures 4.3(a), 4.3(b) and 4.3(c) are only in 1NF. To bring them into 2NF, we
need to remove the partial key dependencies. Therefore, we will decompose the
schema STUD_COURSE in Figure 4.3(c) into two more schemas, namely,
STUD_COURSE1 as shown in Figure 4.4(a) and COURSE in Figure 4.4(b).
STUD_COURSE1

Fig. 4.4(a): Stud_Course Relation after the Decomposition

COURSE

Fig. 4.4(b): Course Relation Decomposed from Relation STUD_COURSE

Now we have removed the partial key dependencies and the relations are in
2NF. To bring them into 3NF, we need to remove the transitive dependencies.
In COURSE, Course_code -> Faculty_incharge and Faculty_incharge -> Fac_loc,
so Fac_loc depends on the key Course_code only transitively. Therefore, we
decompose the relation COURSE (Figure 4.4(b)), and the normalised schemas
are as shown in Figures 4.5(a) and 4.5(b).
COURSE1

Fig. 4.5(a): COURSE Relation after Decomposition


FACULTY

Fig. 4.5(b): FACULTY Relation Decomposed from Course Relation

Now the above schemas are in 3NF: the relations STUDENT (Figure 4.3(a)),
STUD_PROG (Figure 4.3(b)), STUD_COURSE1 (Figure 4.4(a)), COURSE1
(Figure 4.5(a)) and FACULTY (Figure 4.5(b)) are in Third Normal Form.
We can observe that in the STUDENT relation, the only determinant is
Std_ID. In the STUD_COURSE1 relation, the only determinant is {Std_ID,
Program}. In the COURSE1 relation, the only determinant is Course_code.
In the relation FACULTY, the only determinant is Faculty_incharge. In
STUD_PROG, however, the determinants are {Std_ID, Program} and
Coordinator. {Std_ID, Program} is a candidate key, but Coordinator is not,
so STUD_PROG is not in BCNF. Therefore, we will decompose the relation
STUD_PROG (Figure 4.3(b)) into two relations as shown in Figures
4.6(a) and 4.6(b).
STUD_PROG1

Fig. 4.6(a)

PROG

Fig. 4.6(b)


Therefore, the relations in Figure 4.7 are now in Boyce-Codd Normal Form.

Fig. 4.7: Example of a BCNF Normalised Relation
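The BCNF test can be sketched in Python: for each listed dependency that applies inside a relation, compute the closure of its determinant and check whether it covers the whole relation. The attribute sets assumed below for STUD_PROG, STUD_PROG1 and PROG are inferred from the dependencies in the text because the figures are not reproduced; they are assumptions, not copies of Figures 4.3(b), 4.6(a) and 4.6(b).

```python
# The dependencies listed for STUD_REPORT, as (left side, right side) sets.
FDS = [
    ({"Std_ID"}, {"Std_name"}),
    ({"Course_code"}, {"Course_title", "Faculty_incharge"}),
    ({"Faculty_incharge"}, {"Fac_loc"}),
    ({"Std_ID", "Course_code", "Program"}, {"Grade"}),
    ({"Std_ID", "Program"}, {"Coordinator"}),
    ({"Coordinator"}, {"Program"}),
]


def closure(attrs, fds):
    """Return X+, every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Listed FDs that apply inside `relation` and whose determinant is not
    a superkey of it. Only the listed FDs are checked, not every FD they
    imply, so this is a quick test rather than a complete one."""
    violations = []
    for lhs, rhs in fds:
        inside = (rhs & relation) - lhs
        if lhs <= relation and inside and not closure(lhs, fds) >= relation:
            violations.append((sorted(lhs), sorted(inside)))
    return violations


STUD_PROG = {"Std_ID", "Program", "Coordinator"}  # assumed attributes
STUD_PROG1 = {"Std_ID", "Coordinator"}            # assumed Figure 4.6(a)
PROG = {"Coordinator", "Program"}                 # assumed Figure 4.6(b)

print(bcnf_violations(STUD_PROG, FDS))   # Coordinator -> Program violates BCNF
print(bcnf_violations(STUD_PROG1, FDS))  # []
print(bcnf_violations(PROG, FDS))        # []
```

Note that after this decomposition the dependency Std_ID, Program -> Coordinator can no longer be checked within a single relation. BCNF decompositions do not always preserve dependencies, which is one reason some designs stop at 3NF.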

Fourth Normal Form (4NF)


A relation is in the Fourth Normal Form (4NF) if, for every nontrivial
multi-valued dependency X ->> Y that holds in it, X is a superkey; such a
relation is also in BCNF and hence in 3NF. Informally, if an entity has more
than one independent one-to-many relationship, these relationships must not
be recorded in the same relation; any many-to-many combination that arises
is resolved independently into relations of their own.
For example, consider the relation STUDENT shown in Table 4.7(a),
which has three attributes: Std_name, Sub_name and Fac_incharge.
Table 4.7(a): STUDENT Relation
Std_name | Sub_name | Fac_incharge
Pushpa | Maths | Prof. Chidanand
Pushpa | Physics | Prof. Ramesh
Pushpa | Physics | Prof. Chidanand
Pushpa | Maths | Prof. Ramesh


In this relation, a student (Std_name) opts for a subject (Sub_name) and
is assigned a faculty in-charge (Fac_incharge). A student can opt for
multiple subjects and may have several faculty in-charge, and the student's
subjects and faculty in-charge are independent of one another. Therefore, to
keep the relation state consistent while keeping every value atomic, we must
store a separate row for every combination of the student's faculty in-charge
and subjects. This constraint is called a Multi-Valued Dependency (MVD) on
the STUDENT relation, written Std_name ->> Sub_name (and, equivalently,
Std_name ->> Fac_incharge). An MVD arises when two independent
relationships are mixed in the same relation.
Therefore, to convert to Fourth Normal Form we need to decompose the
STUDENT relation into two 4NF relations STUD_SUB and STUD_FAC as
shown in Tables 4.7(b) and 4.7(c).
Table 4.7(b): STUD_SUB
Std_name | Sub_name
Pushpa | Maths
Pushpa | Physics

Table 4.7(c): STUD_FAC
Std_name | Fac_incharge
Pushpa | Prof. Chidanand
Pushpa | Prof. Ramesh

Now Tables 4.7(b) and 4.7(c) are in 4NF.
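The effect of the 4NF decomposition can be checked on Table 4.7(a). In the Python sketch below (relations as sets of tuples, an assumed representation), joining STUD_SUB and STUD_FAC on Std_name reproduces exactly the four original rows, so no information is lost. The last lines show why the original design is awkward: adding a hypothetical new subject, Chemistry, would require one STUDENT row per faculty in-charge, but only one row in STUD_SUB.

```python
# Table 4.7(a) as (Std_name, Sub_name, Fac_incharge) tuples.
STUDENT = {
    ("Pushpa", "Maths", "Prof. Chidanand"),
    ("Pushpa", "Physics", "Prof. Ramesh"),
    ("Pushpa", "Physics", "Prof. Chidanand"),
    ("Pushpa", "Maths", "Prof. Ramesh"),
}
STUD_SUB = {(name, sub) for (name, sub, _) in STUDENT}  # Table 4.7(b)
STUD_FAC = {(name, fac) for (name, _, fac) in STUDENT}  # Table 4.7(c)

# Natural join on Std_name: recombines every subject with every faculty.
rejoined = {
    (name, sub, fac)
    for (name, sub) in STUD_SUB
    for (name2, fac) in STUD_FAC
    if name == name2
}
print(rejoined == STUDENT)  # True: the decomposition loses no information

# A hypothetical new subject needs one STUDENT row per faculty in-charge,
# but only a single row in STUD_SUB.
new_rows = {("Pushpa", "Chemistry", fac) for (name, fac) in STUD_FAC
            if name == "Pushpa"}
print(len(new_rows))  # 2
```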


Fifth Normal Form (5NF)
A relation is said to be in Fifth Normal Form (5NF) if and only if it is in 4NF
and every join dependency that holds in it is implied by its candidate keys.
A join dependency means that every legal state of the relation should have
a nonadditive (lossless) join decomposition into the given component
relations.
For example, consider the relation STUDENT shown in Table 4.8(a),
which has the attributes Std_name, Sub_name and Proj_name. It has no
MVD and is therefore in 4NF, but it is not in 5NF.
Table 4.8(a): STUDENT Relation
Std_name | Sub_name | Proj_name
Pushpa | Chemistry | ProjX
Pushpa | Physics | ProjY
Kapila | History | ProjY
Kavitha | Maths | ProjZ
Kapila | English | ProjX
Kapila | Chemistry | ProjX
Pushpa | Chemistry | ProjY


To convert to 5NF, we need to decompose the above table into three
relations, namely, STD_SUB, STD_PROJ and SUB_PROJ, as shown in
Tables 4.8(b), 4.8(c) and 4.8(d), respectively.

Now the above tables are in 5NF.
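A join dependency can be tested mechanically: project the relation onto the proposed components, join the projections back, and compare the result with the original. The Python sketch below does this for the rows of Table 4.8(a) as printed (the set-of-tuples representation is an assumption). On these rows the join produces one tuple that is not in the original table, (Kapila, Chemistry, ProjY). The decomposition into STD_SUB, STD_PROJ and SUB_PROJ is therefore nonadditive only if the business rules make that combination valid as well, so it is worth running such a check against the rules of the enterprise before relying on a 5NF decomposition.

```python
# Rows of Table 4.8(a) as (Std_name, Sub_name, Proj_name) tuples.
STUDENT = {
    ("Pushpa", "Chemistry", "ProjX"),
    ("Pushpa", "Physics", "ProjY"),
    ("Kapila", "History", "ProjY"),
    ("Kavitha", "Maths", "ProjZ"),
    ("Kapila", "English", "ProjX"),
    ("Kapila", "Chemistry", "ProjX"),
    ("Pushpa", "Chemistry", "ProjY"),
}

# The three binary projections named in the text.
STD_SUB = {(std, sub) for (std, sub, _) in STUDENT}
STD_PROJ = {(std, proj) for (std, _, proj) in STUDENT}
SUB_PROJ = {(sub, proj) for (_, sub, proj) in STUDENT}

# Three-way natural join of the projections.
rejoined = {
    (std, sub, proj)
    for (std, sub) in STD_SUB
    for (std2, proj) in STD_PROJ
    if std2 == std and (sub, proj) in SUB_PROJ
}

# The join always contains the original rows; any extra rows are spurious.
spurious = rejoined - STUDENT
print(STUDENT <= rejoined)  # True
print(sorted(spurious))     # [('Kapila', 'Chemistry', 'ProjY')]
```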


Self-Assessment Questions
8. ___________________________is a hidden file.
9. HNF stands for _________________________.
10. 2NF is also in _____________.
11. 3NF is based on _________________dependency.
12. ______________________ was proposed as a simpler form of 3NF.
13. _________________________ means every legal state of the relation
should have nonadditive join decomposition.
14. Fifth Normal Form is based on ____________________dependency.

4.6 Summary
Let us recapitulate the important concepts discussed in this unit:

The relational model represents the database as a collection of relations
having a set of rows and columns, each of which is assigned a unique
name. A relation consists of a relational schema and a relational instance.

In the relational model, a row is called a tuple and a column is called an
attribute.

Domain is the set of permitted values for each attribute. The degree of a
relation is the number of attributes of its relation schema.
A tuple is a collection of components in a sequence. The number of
tuples in a relation is called its cardinality. A key is an attribute (or set
of attributes) whose values uniquely identify each tuple in the relation.

Relational algebra is a language in which one relation gets defined by
another relation without the original relation getting changed. Its
operations include Selection, Projection, Cartesian product, Union, No
duplication, Join operators and Intersection.

A data dictionary contains files that have details about the data and
information present in the database. The typical data dictionary has
information such as schema definitions of the objects in the database,
the names of the database users, space allocated and the space used
by the schemas, authorisation details, default values of the fields,
updating information like who is the original author and who has
updated, and so on.

Normalisation is a process of analysing the given relation schemas
based on their functional dependencies and primary keys to achieve the
two desirable properties: minimising redundancy and minimising the
insertion, deletion and update anomalies. There are different types
of normal forms such as the First Normal Form, Second Normal Form,
Third Normal Form, Boyce-Codd Normal Form, Fourth Normal Form
and Fifth Normal Form.

4.7 Glossary
Constraint: Constraints are used to limit the type of data that can go into a
table. Constraints can be specified when a table is created (with the
CREATE TABLE statement) or after the table is created (with the ALTER
TABLE statement).
Conventions: Conventions are tools or terminologies used to represent a
concept.
Functional dependency: A functional dependency is a constraint between
two sets of attributes in a relation from a database.
Model: A model is a representation of an object.
Multi-value databases: Multi-value databases include commercial products
from Rocket Software, TigerLogic, jBASE, Revelation, Ladybridge,
InterSystems, Northgate Information Solutions and other companies. These
databases differ from a relational database in that they have features that
support and encourage the use of attributes which can take a list of values,
rather than all attributes being single-valued.
Nonprime: A nonprime attribute is an attribute that is not a member of any
candidate key.
Nontrivial: A functional dependency X -> Y is nontrivial if Y is not a subset
of X.
Quadruple: A tuple with four components (attribute values) is called a
quadruple.
Schema: The word schema comes from a Greek word meaning shape. A
schema defines the shape of the database: its fields, their types and sizes,
and so on.

4.8 Terminal Questions


1. Explain the various concepts of relational data model with an example
of your own.
2. List the different operations of relational algebra with suitable
examples.
3. Describe data dictionary.
4. Compare the different normal forms and elucidate them with common
examples.

4.9 Answers
Self-Assessment Questions
1. Relational schema and relational instance
2. Tuple
3. Domain
4. False
5. b
6. Theta
7. Division
8. Data dictionary
9. Higher order Normal Form
10. 1NF
11. Transitive
12. Boyce-Codd Normal Form
13. Join dependency
14. Join
Terminal Questions
1. The relational model represents the database as a collection of relations
having a set of rows and columns, each of which is assigned a unique
name. Relation consists of a relational schema and relational instance.
In relational model we use certain conventions. A row is called a tuple
and a column is termed as an attribute. The domain of a relational
schema is a pool of legal values. A domain is a set of atomic values. For
each attribute there is a set of permitted values called the domain of that
attribute. Degree of relation is defined as the number of attributes of its
relation schema. A tuple is a collection of components in a sequence. The
number of tuples in a relation is called its cardinality. Primary Key: an
attribute (or set of attributes) whose values uniquely identify each tuple in
the relation. (Refer to Section 4.2 for further information.)
2. Relational algebra is a language in which one relation gets defined by
another relation without the original relation getting changed. The
various operations of relational algebra are Selection, Projection,
Cartesian product, Union, No duplication, Join operators and
Intersection. (Refer to Section 4.3 for further information.)
3. A data dictionary contains files that have details about the data and
information present in the database. (Refer to Section 4.4 for further
information.)
4. Normalisation is a process of analysing the given relation schemas
based on their functional dependencies and primary keys to achieve the
two desirable properties: minimising redundancy and minimising the
insertion, deletion and updation anomalies. There are different types of
normal forms such as First Normal Form, Second Normal Form, Third
Normal Form, Boyce-Codd Normal Form, Fourth Normal Form and Fifth
Normal Form. (Refer to Section 4.5 for further information.)

4.10 Case Study


Since Internet business is a trend these days, a businessman named
X has decided to start his own Internet business called ABC Ltd. His aim for
ABC Ltd. is to collect imitation jewelry from different parts of the country and
to market it to private individuals and commercial companies. He has called
upon a reputed database designer to design and implement a database to
support his new business. At the initial planning meeting, he put forth
his requirements, which are as follows:

The system must be able to manage the details of customers, jewelry
and the jewelry currently on hire to customers. Customers are
categorised as B (bronze), S (silver), G (gold) or P (platinum). These
categories entitle a customer to a discount of 0%, 5%, 10% or 15%,
respectively.

Customers often request jewelry by a particular maker or pattern (e.g.
devitha, artistry, sterling, temple design, Gujarati style, kundan work,
bridal, etc.).

Over time, a customer may hire the same piece of jewelry more than once.

Each piece of jewelry is hired to a customer at a monthly rental price
defined by the jewelry maker. The jewelry maker is then paid 10% of that
rental price. If a piece of jewelry is not hired within 6 months, it is
returned to the maker. However, after 3 months, the maker can resubmit
the returned jewelry.

Each piece of jewelry can have only one maker associated with it.

Several reports are required from the system. The three main ones are
as follows:
1. For each customer, a report showing an overview of all the jewelry
they have hired or are currently hiring.
2. For each maker, a report of all jewelry submitted for hire.
3. For each maker, a returns report for the jewelry not hired over the
past 6 months.
The first thing the database designer has to do is to take each report and
create a set of Third Normal Form relations from it. For ABC Ltd., the
database designer produced the following relations for the three reports.
Customer Rental Report
Customer (customer no., customer name, customer address, customer
category*)

Category (customer category, category description, category discount)


Rental (customer no., jewelry no., date of hire, date due back, return flag)
Jewelry (jewelry no., jewelry title, jewelry type)
Maker Report
Maker (Maker no., Maker name, address, phone)
Portfolio (maker no., jewelry no.)
Jewelry (jewelry no., jewelry title, jewelry type, rental price, owner no.*)
Owner (owner no., owner name, owner tel. no.)
Return to Owner Report
Owner (owner no., owner name, owner address)
Return (owner no., jewelry no., return date)
jewelry (jewelry no., jewelry title)
This has resulted in three occurrences of the jewelry entity and two
occurrences of the owner entity. Any entity that has the same primary key,
as in the case of the three jewelry and two owner entities, can be merged.
The key remains the same, together with all the non-key attributes.
(Source:http://www.sqa.org.uk/elearning/SoftDevRDS02CD/page_27.htm#MErgingE)
Discussion Questions:
1. Bring out the result of the above occurrences after merging.
2. State the highest normal form that can be achieved in the above
scenario.
References/E-References:
E-References:

http://www.computerhope.com/jargon/s/secostor.htm (Retrieved on 20th June 2012)

http://www.ehow.com/list_6684495_characteristics-hard-drive_html (Retrieved on 20th June 2012)

http://searchoracle.techtarget.com/definition/record (Retrieved on 22nd June 2012)

http://www.owensdesign.com/case-studies-hard-disk-drive/index.html (Retrieved on 22nd June 2012)

http://office.microsoft.com/en-us/access-help/inner-join-operationHA001231487.aspx (Retrieved on 29th June 2012)

http://docs.oracle.com/cd/B19306_01/server.102/b14220/datadict.htm (Retrieved on 2nd July 2012)

http://db.grussell.org/section008.html#_Toc67114443 (Retrieved on 2nd July 2012)

http://db.grussell.org/section008.html#_Toc67114448 (Retrieved on 3rd July 2012)

http://db.grussell.org/section009.html#_Toc67114457 (Retrieved on 4th July 2012)

http://db4u.wikidot.com/fourth-normal-form (Retrieved on 6th July 2012)


Unit 5

Entity Relationship Model

Structure:
5.1 Introduction
Objectives
5.2 Conceptual Data Model for Database Design
Create the ER Model
Conceptual data model
5.3 ER Model Concept with an Example
Components of an ER Model
Different types of attributes
5.4 Relationships, Roles and Structural Constraints
Relationships
Degree of relationship type
5.5 Constraints on Relationship Types
5.6 Summary
5.7 Glossary
5.8 Terminal Questions
5.9 Answers
5.10 Case Study

5.1 Introduction
In Unit 4, you studied the basic concepts of database design such as
data dictionary and normalisation. Using these concepts, we will now study
how to design a database and the different types of design models.
Entity Relationship Model (ER Model) is used to represent objects in the
real world and the relationship among these objects, which represents the
overall logical structure of a database. We have also seen that the data
model that is independent of both the DBMS software and the hardware is
the conceptual model. ER Model is a high-level conceptual model
developed by Chen in 1976 to facilitate database design. The ER Model is
extremely useful in mapping the meaning and interaction of real-world
enterprises onto a conceptual schema. The main usage is in the design of
the database.
For a better understanding of this unit, you should know what relations are
and how the ER Model is defined. The Entity Relationship Model is
generally referred to as the ER Model. As the name indicates, it represents
entities, their features and the relationships that exist between them. We
will take a small example to describe the meaning of a relation. Consider a
relation called PART, which is in 5NF, as shown in Table 5.1. Its five
domains are sets of values representing PNo., Pname, Colour, Weight and
the Location in which parts are stored. For example, the part colour domain
is the set of all valid part colours.
Table 5.1: PART Relation
PNo. | Pname | Colour | Weight | Location
P1 | Nut | Red | 12 | Bangalore
P2 | Bolt | Green | 17 | Ahmedabad
P3 | Screw | Blue | 17 | Rome
P4 | Screw | Red | 14 | Bangalore

In Table 5.1, each row represents one tuple of the relation. The number of
tuples in a relation is called the cardinality of the relation; for example, the
cardinality of the PART relation is four.
Relations of degree one are said to be unary; similarly, relations of degree
two are binary.
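Degree and cardinality can be read straight off a relation. In this Python sketch, Table 5.1 is modelled as a tuple of attribute names plus a list of rows (an assumed representation). Projecting PART onto Colour gives a relation of degree one, that is, a unary relation; note that it holds only the colours that actually occur, which is a subset of the Colour domain.

```python
# Table 5.1: the schema is the attribute names, the instance is the rows.
PART_SCHEMA = ("PNo", "Pname", "Colour", "Weight", "Location")
PART = [
    ("P1", "Nut", "Red", 12, "Bangalore"),
    ("P2", "Bolt", "Green", 17, "Ahmedabad"),
    ("P3", "Screw", "Blue", 17, "Rome"),
    ("P4", "Screw", "Red", 14, "Bangalore"),
]

degree = len(PART_SCHEMA)  # number of attributes
cardinality = len(PART)    # number of tuples
print(degree, cardinality)  # 5 4

# Projection onto Colour: a unary relation (degree one). Using a set
# removes the duplicate 'Red', as a relation cannot hold duplicate tuples.
colour_index = PART_SCHEMA.index("Colour")
colour_relation = {(row[colour_index],) for row in PART}
print(len(colour_relation))  # 3
```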
Objectives
After studying this unit, you should be able to:
describe conceptual data model for database design
elucidate ER Model concept with an example
elaborate components of an ER Model
explain constraints on relationship types

5.2 Conceptual Data Model for Database Design


5.2.1 Create the ER Model
The Entity Relationship Model (ER Model) is a data model; its
diagrammatic representation is called an Entity Relationship Diagram
(ERD). The important elements of the ER Model are entities, attributes,
identifiers and relationships, which are explained in the following sections
of this unit.


Conceptual data modelling is the first and most important of the three
phases of the database design methodology. The three phases of database
design are conceptual design, logical design and physical design.
Conceptual database design - It is the process of constructing a model of
the data used in an enterprise, independent of all physical considerations.
Logical database design - In logical database design, a model of the data
used in an enterprise is constructed based on a specific data model, but
independent of a particular DBMS and other physical considerations.
Physical database design - It is the process of producing a description of
how the database is implemented on secondary storage.
5.2.2 Conceptual data model
In this unit, we will discuss conceptual design in detail. Figure 5.1 describes
the working of database design methodology in detail.

Fig. 5.1: Phase of Database Design (Simplified)



1. Step 1 - The first step in database design is requirements collection and
analysis. In this step, database designers interview clients to obtain the
information they require. Without the exact requirements of the client, it
becomes very difficult to design a good ER diagram, and the result may be a
very poor database design. It is also useful to specify the functional
requirements of the application, for which data flow diagrams are used.
From these requirements, you can create a data model that represents what
is needed: the content, relationships and constraints of the data.
2. Step 2 - The next step in application development is to create a
conceptual schema for the database. The conceptual schema describes the
data types, relationships and constraints. This step is called conceptual
database design. The conceptual schema does not include any
implementation or storage details. It is usually easier to understand and
can also be used to communicate with non-technical users. In this model,
you can include data constraints such as limits on data values, integrity
constraints and business rules.
3. Step 3 - After the conceptual schema has been designed, the next step is
logical database design, in which the conceptual schema is mapped to the
data model of the commercial DBMS, such as Oracle or MS Access, that will
be used to implement the database. The database is then constructed and
filled with objects such as data, forms and reports; at this stage,
responsibilities such as training the users, writing the documentation and
installing the system software are very important.
4. Step 4 - Finally, the last step is the physical database design phase.
During this step, the internal storage structure and file organisation for
the database are specified.
Self-Assessment Questions
1. The three phases of database design are __________, ____________
and physical design.
2. ______________________ is the process of constructing a database
model which is independent of all physical considerations using the
information of the enterprise.
3. The second step in application development is to create a
conceptual schema for the database. (True/False)

5.3 ER Model Concept with an Example


The basic representation of the ER Model is given below. After the
requirements collection and analysis phase, we create the conceptual
schema step by step using ER Model concepts.
Example: A company database consists of the following:
Several departments and each department has a manager.
Several employees work for a department.
A department may have several locations.
Each department controls several projects.
We store each employee's name, Social Security Number (SSN) and
address. An employee is assigned to one department but may work on
several projects. We keep track of the number of hours per week that
an employee works on each project.
We need to keep track of the dependents of each employee for the purpose
of insurance, and so on.


Figure 5.2 is an example of an ER diagram.

Fig. 5.2: The ER Conceptual Schema Diagram for the New COMPANY Database

Notations for Entity Relationship Diagrams


The main advantage of the ER Model is its simplicity, which helps in
understanding the overall structure of a database. An ER diagram includes
various notations as shown in Figure 5.3.


Fig. 5.3: Notations of ER Diagram Representation

Using the above notations, we can draw an ER diagram.


5.3.1 Components of an ER Model
The ER diagram represents three main concepts.
Entities - The fundamental item in any ER Model is the entity, which is a
thing in the real world with an independent existence that is
distinguishable from all other objects. For example, each employee in an
organisation is an entity. A company, a job, a book, and so on, are all
entities. Each entity has particular properties, called attributes, that
describe it; for example, an employee entity may be described by the
employee's name, age, address, salary, and so on.

Entity sets - It is a set of entities of the same type that share the same
properties or attributes. The set of all employees working for the same
department can be defined as the entity set employee, but each entity
has its own values for each attribute. For example, Entity Type Name:
Employee
Company

Attributes - It refers to a set of properties. An example for the attributes


can be Name, Age, Salary
Name, Headquarters, President
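As an informal illustration (not part of the ER notation itself), the entity types EMPLOYEE and COMPANY can be sketched in Python, assuming that each entity type maps to a class, each attribute to a field and each entity set to a Python set of instances. The names and values below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    """Entity type EMPLOYEE with the attributes Name, Age and Salary."""
    name: str
    age: int
    salary: float


@dataclass(frozen=True)
class Company:
    """Entity type COMPANY with the attributes Name, Headquarters and President."""
    name: str
    headquarters: str
    president: str


# Entity sets: every member shares the same attributes,
# but each entity has its own values (hypothetical data).
employee_set = {
    Employee("Ravi", 35, 50000.0),
    Employee("Asha", 29, 42000.0),
}
company_set = {Company("ABC Ltd", "Bangalore", "Ravi")}

for emp in sorted(employee_set, key=lambda e: e.name):
    print(emp.name, emp.age, emp.salary)
```

Frozen dataclasses are used so that entities are hashable and can be stored in a set, which, like an entity set, holds no duplicates.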
Consider the example of a student database and its ER diagram having
entities such as STUDENT, PROGRAM, STUD_SECTION_A,
STUD_SECTION_B, and PROJECT, as shown in Figure 5.4.

Fig. 5.4: ER Diagram for a Student Database


5.3.2 Different types of attributes


Each attribute is associated with a set of values called its domain.
Simple and composite attributes: Simple attributes are not divided
into subparts. They are also called atomic attributes, for example, AGE.
Composite attributes can be divided into subparts, each with an independent
meaning of its own. For example, an address attribute can be composed
of components such as street number, area, city and pin code, as
shown in Figure 5.5.

Fig. 5.5: Example of a Composite Attribute (Address, composed of Street Number, Area, City and Pin code)
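A minimal Python sketch of the composite attribute in Figure 5.5, assuming that a nested dataclass stands in for the composite attribute; the field names and values are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    """Composite attribute: each component has a meaning of its own."""
    street_number: str
    area: str
    city: str
    pin_code: str


@dataclass(frozen=True)
class Employee:
    name: str         # simple (atomic) attribute
    age: int          # simple (atomic) attribute
    address: Address  # composite attribute


emp = Employee("Ravi", 35, Address("12", "Jayanagar", "Bangalore", "560041"))

# The address can be used as a whole, or one component can be used on its own.
print(emp.address)
print(emp.address.city)
```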

Single-valued and multi-valued attributes: A single-valued attribute is
one that holds a single value for a single entity, for example, Age or
room number.
A multi-valued attribute is one that can hold multiple values for a single
entity, for example, a college degree attribute whose values for one person
may be {B.Sc., M.Sc., Ph.D.}.

Derived attributes: A derived attribute is one whose value is derived from
the values of related attributes.
For example, the value of Age can be determined from the current (today's)
date and the value of that person's birth date; the Age attribute is
hence called a derived attribute and is said to be derivable from the
Birth date attribute, which is called a stored attribute.
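The following sketch assumes that a Python set models a multi-valued attribute and that a method models a derived attribute. Degrees holds several values for one person, while Age is computed from the stored birth date instead of being stored. The person and dates are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class Person:
    name: str                                        # single-valued attribute
    birth_date: date                                 # stored attribute
    degrees: set[str] = field(default_factory=set)   # multi-valued attribute

    def age(self, today: date) -> int:
        """Derived attribute: computed from birth_date and today's date, never stored."""
        had_birthday = (today.month, today.day) >= (self.birth_date.month, self.birth_date.day)
        return today.year - self.birth_date.year - (0 if had_birthday else 1)


p = Person("Asha", date(1990, 8, 15), {"B.Sc.", "M.Sc.", "Ph.D."})
print(sorted(p.degrees))
print(p.age(date(2012, 7, 9)))   # birthday not yet reached in 2012
```

Passing the current date in explicitly keeps the derivation visible and makes the result repeatable.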

Null attribute - A null value is used when an attribute does not
have any value. A null value does not mean that the value is equal to
zero; it indicates that no value is stored for that attribute. For
example, (a) the Apartment number attribute of an address applies only to
addresses that are in apartment buildings and not to other types of
residences, such as single-family homes; (b) E-mail: not all employees in an
employee database may have e-mail addresses.

Key attribute - An entity type usually has an attribute whose values are
distinct for each individual entity. Such an attribute is called a key
attribute. The attribute (or set of attributes) chosen to uniquely identify
every instance of the entity is termed the primary key.
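A small Python check can illustrate the uniqueness property of a key attribute. Note the assumption: a key is a rule about all possible entities of the type, so testing one snapshot of data can only show that an attribute is not a key; it cannot prove that it is one. The data and the function name is_key_candidate are hypothetical.

```python
def is_key_candidate(entities, attribute):
    """True if no two entities in this snapshot share a value for `attribute`."""
    values = [entity[attribute] for entity in entities]
    return len(values) == len(set(values))


employees = [
    {"ssn": "123-45-6789", "name": "Ravi", "dept": "D1"},
    {"ssn": "987-65-4321", "name": "Asha", "dept": "D1"},
    {"ssn": "555-11-2222", "name": "Ravi", "dept": "D2"},
]

for attr in ("ssn", "name", "dept"):
    print(attr, is_key_candidate(employees, attr))
```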

Value sets or domain attributes - Each attribute is associated with a
set of values called the domain (value set) of that attribute, which
specifies the values that may be assigned to it. For example, the range
of ages allowed for employees is between 18 and 58. The domain of the
attribute Name might be the set of all text strings of a certain length.
Mathematically, an attribute A of entity type E whose value set is V can
be defined as a function from E to the power set P(V) of V:
$A : E \rightarrow P(V)$
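The function view A : E → P(V) can be sketched in Python, assuming that an entity is a dictionary and that None stands for NULL. Each attribute value is returned as a subset of V: a singleton for a single-valued attribute, possibly a larger set for a multi-valued attribute, and (a common convention, assumed here) the empty set for NULL. The helper names are hypothetical.

```python
AGE_DOMAIN = frozenset(range(18, 59))  # V for Age: employees aged 18 to 58


def attribute_value(entity, attribute):
    """A(e): the value of attribute A for entity e, returned as a subset of V."""
    value = entity.get(attribute)
    if value is None:
        return frozenset()                 # NULL -> empty set
    if isinstance(value, (set, frozenset, list, tuple)):
        return frozenset(value)            # multi-valued -> any subset
    return frozenset({value})              # single-valued -> singleton set


def respects_domain(entity, attribute, domain):
    """True if every value of the attribute lies in the value set (domain)."""
    return attribute_value(entity, attribute) <= domain


print(respects_domain({"age": 30}, "age", AGE_DOMAIN))   # inside 18..58
print(respects_domain({"age": 60}, "age", AGE_DOMAIN))   # outside the domain
```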

Self-Assessment Questions
4. __________________ is a thing in the real world with an independent
existence that is distinguishable from all other objects.
5. State whether the following statements are true or false:
a) Entity sets is a set of properties.
b) Attributes are a set of entities of the same type that share the
same properties.
6. Simple attributes are called _______________ attributes.
7. ___________________ attribute holds a single value for a single entity.
8. State whether the following statements are true or false:
a) Key attribute is an attribute that can be used when an attribute does
not have any value.
b) A set of values associated with the attributes is called domain.

5.4 Relationships, Roles and Structural Constraints


5.4.1 Relationships
In the real world, items have relationships with one another, for example, a
book is published by a particular publisher. The association or relationship
that exists between the entities relates data items to each other in a
meaningful way. A relationship is an association between entities. A
collection of relationships of the same type is called a relationship set.

A relationship type R among n entity types E1, E2, ..., En is a set of
associations among entities of these types. Mathematically, it can be
represented as $R = \{r_i\}$, where R is a set of relationship instances $r_i$.
For example, consider a relationship type WORKS_FOR between two entity
types EMPLOYEE and DEPARTMENT, which associates each employee with
the department the employee works for, as shown in Figure 5.6. Each
relationship instance $r_i$ in WORKS_FOR associates one employee entity with
one department entity; these two entities are said to participate in $r_i$.
Employees E1, E3 and E6 work for department D1; employees E2 and E4
work for D2; and E5 and E7 work for D3. The relationship type R is the set
of all these relationship instances.

Fig. 5.6: Some Instances of the WORKS_FOR Relationship
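The relationship instances described for Figure 5.6 can be written down directly as a set of (employee, department) pairs, which is one simple way, assumed here for illustration, to model R = {r_i}. The helper employees_by_department is hypothetical.

```python
from collections import defaultdict

# WORKS_FOR as a set of relationship instances r_i = (employee, department),
# using the instances described for Figure 5.6.
works_for = {
    ("E1", "D1"), ("E3", "D1"), ("E6", "D1"),
    ("E2", "D2"), ("E4", "D2"),
    ("E5", "D3"), ("E7", "D3"),
}


def employees_by_department(relationship):
    """Group the participating employee entities by department."""
    groups = defaultdict(set)
    for employee, department in relationship:
        groups[department].add(employee)
    return dict(groups)


for dept, emps in sorted(employees_by_department(works_for).items()):
    print(dept, sorted(emps))
```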

5.4.2 Degree of relationship type


The degree of a relationship type is the number of entity types that
participate in it. The degrees of relationship types are shown in
Figures 5.7(a), 5.7(b) and 5.7(c).

Fig. 5.7(a): Example for a Unary Relationship (EMPLOYEE and the relationship Manages)

A unary relationship exists when an association is maintained within a single
entity type.

Fig. 5.7(b): Example for a Binary Relationship

A binary relationship exists when two entities are associated.

Fig. 5.7(c): Example for a Ternary Relationship

A ternary relationship exists when three entities are associated.


Role Names and Recursive Relationship
Each entity type that participates in a relationship type plays a particular
role in the relationship. The role name signifies the role that a participating
entity from the entity type plays in each relationship instance. For example, in
the WORKS_FOR relationship type, EMPLOYEE plays the role of employee or
worker and DEPARTMENT plays the role of department or employer.
However, in some cases the same entity type participates more than once
in a relationship type in different roles. Such relationship types are called
recursive.
For example, the EMPLOYEE entity type participates twice in SUPERVISION:
once in the role of supervisor and once in the role of supervisee.


Fig. 5.8
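A recursive relationship such as SUPERVISION can be sketched as a set of pairs whose two positions carry the role names supervisor and supervisee, with both positions holding EMPLOYEE entities. The employee IDs and helper names are hypothetical.

```python
# SUPERVISION: each instance is (supervisor, supervisee), and both roles are
# played by entities of the same type, EMPLOYEE (hypothetical IDs).
supervision = {("E1", "E2"), ("E1", "E3"), ("E3", "E4")}


def supervisees_of(supervisor):
    """Employees who appear in the supervisee role for the given supervisor."""
    return {supervisee for sup, supervisee in supervision if sup == supervisor}


def supervisor_of(employee):
    """The employee in the supervisor role for the given supervisee, if any."""
    for sup, supervisee in supervision:
        if supervisee == employee:
            return sup
    return None


# E3 plays both roles: supervisee of E1 and supervisor of E4.
print(supervisor_of("E3"), sorted(supervisees_of("E3")))
```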

Self-Assessment Questions
9. Relationship types is a set of all attributes. (True/False)
10. When the association is maintained with a single entity then it is a
______________ relationship.
11. If the same entity type participates more than once in a relationship
type in different roles then such relationship type is called __________
relation.

5.5 Constraints on Relationship Types


Relationship types usually have certain constraints that limit the possible
combinations of entities that may participate in the corresponding relationship
set. For example, a company may have a rule that each employee must work for
exactly one department. The two main types of constraints are cardinality
ratio and participation constraints.
The cardinality ratio specifies the number of entities to which another entity
can be associated through a relationship set.
The different mapping cardinalities, as you have already studied in Unit 2,
are of four types. They are as follows:
One-to-one
One-to-many
Many-to-one
Many-to-many
One-to-one - An entity in A is associated with at most one entity in B and
vice versa, as shown in Figure 5.9.


Fig. 5.9: Example for a One-to-One Cardinality

An employee can manage at most one department, and a department has
only one manager.
One-to-many - An entity in A is associated with any number of entities in B.
An entity in B, however, can be associated with at most one entity in A.

Fig. 5.10: Example for a One-to-Many Cardinality

Each department can be related to numerous employees, but an employee
can be related to only one department, as shown in Figure 5.10.
Many-to-one - An entity in A is associated with at most one entity in B. An
entity in B, however, can be associated with any number of entities in A.
For example, many depositors may deposit into a single account.
Many-to-many - An entity in A is associated with any number of entities in B,
and an entity in B is associated with any number of entities in A, as shown
in Figure 5.11.

Fig. 5.11: Example for Many-to-Many Cardinality


An employee can work on several projects and several employees can work
on a particular project.
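The four cardinality ratios can be illustrated by classifying a binary relationship set of (a, b) pairs by how many partners each entity has. This is a sketch under a stated assumption: it only describes the instances supplied, whereas a real cardinality ratio is a constraint on every permitted state. The sample pairs mirror the examples in the text; the IDs and the relationship names HAS_EMPLOYEE and DEPOSITS are hypothetical.

```python
from collections import Counter


def cardinality_ratio(pairs):
    """Classify (a, b) pairs as one-to-one, one-to-many, many-to-one or many-to-many."""
    max_b_per_a = max(Counter(a for a, _ in pairs).values(), default=0)
    max_a_per_b = max(Counter(b for _, b in pairs).values(), default=0)
    a_side = "many" if max_a_per_b > 1 else "one"  # does some B entity have several A partners?
    b_side = "many" if max_b_per_a > 1 else "one"  # does some A entity have several B partners?
    return f"{a_side}-to-{b_side}"


manages = {("E1", "D1"), ("E2", "D2")}                     # employee -> department
has_employee = {("D1", "E1"), ("D1", "E2"), ("D2", "E3")}  # department -> employee
deposits = {("C1", "A1"), ("C2", "A1")}                    # depositor -> account
works_on = {("E1", "P1"), ("E1", "P2"), ("E2", "P1")}      # employee -> project

for name, rel in [("MANAGES", manages), ("HAS_EMPLOYEE", has_employee),
                  ("DEPOSITS", deposits), ("WORKS_ON", works_on)]:
    print(name, cardinality_ratio(rel))
```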
Participation roles - There are two ways in which an entity can participate
in a relationship:

Total participation - The participation of an entity set E in a relationship
set R is said to be total if every entity in E participates in at least one
relationship in R. For example, if every employee must work for a department,
the participation of EMPLOYEE in WORKS_FOR is total, as shown
in Figure 5.12.

Fig. 5.12: Some Instances of the WORKS_FOR Relationship

Total participation is sometimes called existence dependency.

Partial participation - If only some entities in E participate in
relationships in R, the participation of entity set E in relationship R is said
to be partial, as shown in Figure 5.13.


Fig. 5.13: Some Instances of the WORKS_FOR Relationship

We do not expect every employee to manage a department, so the
participation of EMPLOYEE in the MANAGES relationship type is partial.
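Total versus partial participation can be checked informally in Python: participation of an entity set E in R is total if every entity in E appears in at least one instance of R. The entity sets and instances below are hypothetical, and a check on sample data only describes that sample, not the constraint itself.

```python
def participation(entity_set, relationship, position=0):
    """Return 'total' if every entity appears at `position` in some instance, else 'partial'."""
    participating = {instance[position] for instance in relationship}
    return "total" if set(entity_set) <= participating else "partial"


employees = {"E1", "E2", "E3"}
works_for = {("E1", "D1"), ("E2", "D1"), ("E3", "D2")}  # every employee works for a department
manages = {("E1", "D1")}                                # only some employees manage one

print("WORKS_FOR:", participation(employees, works_for))
print("MANAGES:", participation(employees, manages))
```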
Weak entity - Some entity types may not have any key attribute of their
own; they are called weak entity types. An entity type that has a key
attribute is termed a strong entity type. A weak entity type always has total
participation (existence dependency) with respect to a strong entity type.
A weak entity is dependent on the existence of another entity. Weak
entities are also referred to as child, dependent or subordinate entities;
strong entities are referred to as parent, owner or dominant entities. For
example, in Figure 5.14, the entity PARENT is a weak entity, as it needs the
entity EMPLOYEE for its existence. The entities EMPLOYEE, COMPANY, and so
on, are strong entities. Weak entities are represented by a double-lined
rectangle.

Fig. 5.14: Example to Represent Weak Entity
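The existence dependency of a weak entity can be sketched as follows, assuming that the weak entity PARENT from Figure 5.14 is identified by its owner's EMPLOYEE key together with its own name (a partial key). The attribute names, the data and the add_parent helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    emp_id: str  # key attribute of the strong (owner) entity
    name: str


@dataclass(frozen=True)
class Parent:
    owner_emp_id: str  # key of the owning EMPLOYEE
    name: str          # partial key: unique only among one employee's parents


employees = {"E1": Employee("E1", "Ravi"), "E2": Employee("E2", "Asha")}
parents = {}


def add_parent(parent):
    """Store a weak entity under (owner key, partial key); its owner must exist."""
    if parent.owner_emp_id not in employees:
        raise ValueError(f"no EMPLOYEE {parent.owner_emp_id}: a weak entity needs its owner")
    parents[(parent.owner_emp_id, parent.name)] = parent


add_parent(Parent("E1", "Lakshmi"))
add_parent(Parent("E2", "Lakshmi"))  # same name, different owner: still distinct entities
print(sorted(parents))
```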



Self-Assessment Questions
12. A student belongs to only one class and the class can have many
students. This is a good example for _______________________
relationship.
13. The two types of participation roles are ________________________
and _________________________.
14. Total participation is also called ___________________________.

5.6 Summary
Let us recapitulate the important concepts discussed in this unit:

Entity Relationship Model is used to represent objects in the real world
and the relationships among these objects, which together represent the
overall logical structure of a database. The Entity-Relationship Model is
generally referred to as the ER Model. As the name indicates, it represents
entities, their attributes and the relationships that exist between entities.

Conceptual data modelling is the first and the most important step among
the three phases of database design methodology. The three phases of
database design are conceptual design, logical design and physical
design. Database design based on the ER Model is a four-step
process. The first step is the requirement analysis, the second step is
the creation of the conceptual schema, the third step is the actual
implementation of the conceptual schema and the last step is the
physical database design phase.

After the requirement collection and analysis phase, we create the
conceptual schema step-by-step by using ER Model concepts.

The main advantage of the ER Model is its simplicity. It helps in
understanding the overall structure of a database. An ER diagram
includes various notations to represent concepts such as weak
entities, composite attributes, and so on.

5.7 Glossary
Instance: An occurrence or a copy of an object, whether
currently executing or not.
Notation: A symbol used to represent a particular concept.
Oracle: The brand name of a database software company, especially known
for its Object Relational Database Management System (ORDBMS).
Set: A set is a collection of well-defined and distinct objects.

5.8 Terminal Questions


1. Explain the different phases in the design of ER Model.
2. List the various notations in constructing ER diagrams.
3. Describe the different elements of ER Model.
4. Briefly explain various constraints on relationship types.

5.9 Answers
Self-Assessment Questions
1. Conceptual, logical
2. Conceptual database design
3. True
4. Entity
5. Answers
a) False
b) False
6. Atomic
7. Single-valued
8. Answers
a) False
b) True
9. False
10. Unary
11. Recursive
12. One-to-many
13. Total participation and partial participation
14. Existence dependency
Terminal Questions
1. The four phases in the design of ER Model are requirement analysis,
schema design, implementation and physical design. (Refer to Section
5.2.2 for further information.)


2. Entity, weak entity, attributes and composite attribute, and so on. (Refer
to Section 5.3 and Figure 5.3 for further information.)
3. Entity, attributes, identifier, relationships, and so on. (Refer to
Sections 5.4 and 5.5 for further information.)
4. Relationship types usually have certain constraints that limit the
possible combination of entities that may participate in the relationship
instance.

5.10 Case Study


The ABC Company needs a database to track employee information. When
an employee is hired, they are assigned to a particular department. Each
employee is assigned an employee ID and a manager. The HR department
also needs to track the employee's name, date of birth and hire date.
Department information, such as department code, name and budget code,
should also be tracked. A department will have many employees, but an
employee can be assigned to only one department.
Here, we have two entities, EMPLOYEE and DEPARTMENT. The attributes
emp_ID, Manager_ID, Father_name, Address, DOB and Date_of_joining are
assigned to the EMPLOYEE entity, and the attributes dept_code, dept_name
and budget_code are assigned to the DEPARTMENT entity. The relationship
of the employee details is shown in the figure below.

[Figure: ER diagram for the ABC Company case, showing EMPLOYEE (Emp_ID, Manager_ID, Father_name, Address, DOB, Date_of_joining) with a recursive relation on itself, DEPARTMENT (Dept_code, Dept_name, Budget_code), and the relationship between them labelled "One department may have many employees" and "Each employee belongs to one department"; Dept_code is shown twice in the diagram.]

Discussion Questions:
1. Which are the important elements of the ER Model? Identify the different
elements pertaining to the above case.
2. Why is EMPLOYEE a recursive relation, and what kind of relationship does
the EMPLOYEE entity share with itself?
3. What kind of relationship does DEPARTMENT entity share with
EMPLOYEE?
4. Which is the foreign key in the above case and why?
5. Consider the enhancement of the above case considering each
employee as either salaried or hourly. Hourly employees receive an
hourly rate of pay. Salaried employees can be assigned to projects.
Projects have a definite start and end date and may have a team of
salaried employees working on it. Each project is given a priority level of
low, medium or high. Identify the different components in this case and
construct an ER diagram for the same.



Unit 6

Relational Algebra and Relational Calculus

Structure:
6.1 Introduction
6.2 Relational Model Constraints
Domain constraints
Key constraints
Constraints on NULLs
Entity-integrity constraints
Referential-integrity constraints
6.3 Update Operations on Relations
Insert operations
Delete operations
Modify operations
6.4 The Relational Algebra
Set theoretic operations
Relational operations
6.5 Relational Calculus
Tuple relational calculus
Domain relational calculus
Tuple relational calculus versus domain relational calculus
Relational algebra versus relational calculus
6.6 Summary
6.7 Glossary
6.8 Terminal Questions
6.9 Answers

6.1 Introduction
In Unit 4 you studied the basic concepts of relational algebra, such as
the different types of operations in relational algebra and their definitions.
You have also studied that the relational model represents the database
in terms of relations (tables) made up of rows and columns, where each
relation and each column is assigned a unique name. According to the
relational model, a database is a collection of relations. The relational
model was first introduced by Ted Codd in 1970. As it was simple and based
on mathematics, it was quickly and widely accepted. The model is based on
the mathematical concept of a relation, represented as a table of values,
and has its theoretical basis in set theory and first-order predicate logic.
In this unit, we will discuss the basic characteristics of the model and its
constraints. As discussed in Unit 4, a data model needs a set of
operations to manipulate the database, and also a set of
constraints to define the database structure. The basic operations of
relational algebra let the user specify retrieval requests on the database.
The basic set of operations forms the relational algebra, and a sequence of
relational algebra operations forms a relational algebra expression. Note
that the result of such a request is also a relation.
A relation consists of a relation schema (the structure of the table) and a
relation instance (the data in the table at a particular time); there is a
close correspondence between the concept of a table and the mathematical
concept of a relation.
In this unit, we will discuss two formal languages for the relational model:
relational algebra and relational calculus. Relational algebra defines a set
of operations for expressing queries, whereas relational calculus provides a
notation for specifying database queries declaratively, in terms of the
relations. In this unit, we will also discuss the two variations of
relational calculus.
Objectives:
After studying this unit, you should be able to:
describe relational model constraints
elucidate update operation on relations
demonstrate various operations in relational algebra
differentiate between tuple relational calculus and domain relational
calculus

6.2 Relational Model Constraints


The relational database is composed of many relations, and the tuples in
those relations are related to one another in various ways. The state of the
database at a particular instant of time depends on the states of its
relations, and the permitted states are limited by restrictions on the actual
values in the database. As you have already studied the characteristics of a
single relation in Unit 4, in this unit you will study the various
restrictions, called constraints, that are put on the database. There are
three main types of constraints:

Inherent-model-based
Schema-based
Application-based

Inherent-model-based - This type is also called implicit constraint.
These constraints are inherent in the data model. For example, the rules
about the ordering of tuples in a relation and the ordering of values within
a tuple (under the alternative definition of a relation) are inherent
constraints.

Schema-based - This type is also called explicit constraint. These
constraints are expressed by specifying them in the Data Definition
Language (DDL), which you studied in Unit 2. These
constraints can be directly put on the schemas of the data model.
Examples of schema-based constraints are domain constraints, key
constraints, constraints on NULLs, entity-integrity constraints and
referential-integrity constraints.

Application-based - This type is also called semantic constraint or
business rules. These constraints cannot be directly put on the schemas of
the data model, so they are expressed and enforced by the application
programs. Examples include rules about the behaviour and meaning of
attributes, and other rules that are difficult to express within the data
model and are therefore checked within an application program.

In this section, we will discuss the schema-based constraints in detail.
As you already know, they are expressed on the schemas and specified in
the DDL.
The different types of schema-based constraints are as follows:
Domain constraints
Key constraints
Constraints on NULLs
Entity-integrity constraints
Referential-integrity constraints
6.2.1 Domain constraints
The domain constraints specify that, within each tuple, the value of each
attribute must be an atomic value from the domain of that attribute.
That means for any attribute A and any tuple t in a relation, the value t[A]
must be an atomic value from the domain dom(A) (or NULL, if NULLs are
permitted for A). By atomic we mean that each value in the
domain is indivisible as far as the relational model is concerned.
Examples:
o The set of 11-digit phone numbers valid in India.
o The set of character strings that represent the names of persons.
o The age of an employee in a company must be between 18 and 65.
The data types commonly used to specify domains include character,
integer, real numbers, Boolean, fixed-length and variable-length strings,
date, time, currency, and so on.
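In SQL, a domain constraint such as the age range above is commonly written as a CHECK clause. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the employee table and its columns are hypothetical. SQLite's column types are flexible, so here it is the CHECK clause that actually enforces the range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        age    INTEGER CHECK (age BETWEEN 18 AND 65)  -- domain constraint on age
    )
""")

conn.execute("INSERT INTO employee VALUES (1, 'Ravi', 35)")  # inside the domain

rejected = False
try:
    conn.execute("INSERT INTO employee VALUES (2, 'Asha', 70)")  # outside the domain
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)

print(conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0], "row(s) stored")
```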
6.2.2 Key constraints
Key constraint states that no two distinct tuples in a relation can have
identical values for all the attributes in the key. A key is a minimal
superkey, that is, a superkey from which we cannot remove any attribute
and still keep this uniqueness property.
Table 6.1: STUDENT Relation
Std_ID   Std_name   Class   Subject_code   Marks obtained

If you recall the example of the STUDENT database from Unit 4, Std_ID is a
key, as no two students in the database have the same Std_ID. In Table 6.1,
a superkey is {Std_ID, Std_name, Class, Subject_code, Marks obtained}. This
superkey is not a key, because we can remove attributes such as Std_name or
Class and the remaining attributes (which still include Std_ID) still
uniquely identify each tuple.
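The superkey and minimal-superkey (key) tests can be sketched in Python on the four STUDENT tuples of Table 6.3(a). Since a key is a property of the schema, a test on one relation state only shows which attribute sets are consistent with being keys; for example, Std_name also happens to be unique in this small sample. The function names are hypothetical.

```python
ATTRS = ("Std_ID", "Std_name", "Class", "Subject_code", "Marks")

# The STUDENT tuples of Table 6.3(a).
student = [
    (101, "AAA", "1 MBA", "CN1", 67),
    (103, "BBB", "1 MBA", "DB1", 78),
    (104, "CCC", "1 MBA", "DB1", 67),
    (105, "EEE", "1 MBA", "SE1", 89),
]


def is_superkey(rows, attrs):
    """No two tuples agree on all of `attrs`."""
    positions = [ATTRS.index(a) for a in attrs]
    projected = [tuple(row[i] for i in positions) for row in rows]
    return len(projected) == len(set(projected))


def is_key(rows, attrs):
    """A key is a minimal superkey: dropping any one attribute loses uniqueness."""
    if not is_superkey(rows, attrs):
        return False
    return all(not is_superkey(rows, [a for a in attrs if a != dropped])
               for dropped in attrs)


print(is_superkey(student, ATTRS), is_key(student, ATTRS))  # superkey, but not minimal
print(is_key(student, ["Std_ID"]))
```

Checking only one-attribute removals is enough for minimality, because any superset of a superkey is itself a superkey.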
6.2.3 Constraints on NULLs
Another constraint on attributes specifies whether NULL values are permitted
for them. For example, if every valid tuple in a STUDENT relation must have
a student name and a class, then Std_name and Class are constrained to be
NOT NULL.
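A NOT NULL constraint is declared per attribute in SQL DDL. This sketch uses Python's sqlite3 module and assumes a simplified STUDENT table in which std_name and class are NOT NULL while a marks column may be NULL; the schema is illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        std_id   INTEGER PRIMARY KEY,
        std_name TEXT NOT NULL,  -- every student must have a name
        class    TEXT NOT NULL,  -- ... and a class
        marks    INTEGER         -- NULL allowed, e.g. not yet graded
    )
""")

conn.execute("INSERT INTO student VALUES (101, 'AAA', '1 MBA', NULL)")  # accepted

rejected = False
try:
    conn.execute("INSERT INTO student VALUES (102, NULL, '1 MBA', 70)")
except sqlite3.IntegrityError as exc:
    rejected = True  # NOT NULL constraint on std_name
    print("rejected:", exc)
```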
6.2.4 Entity-integrity constraints
In any relation, no primary key attribute can have a NULL value. The
primary key value is used to identify individual tuples; if it could be
NULL, we might not be able to identify some tuples in the relation, or to
distinguish between tuples that all have NULL keys. This constraint is
expressed by the entity-integrity constraint.
Table 6.2: STUDENT Relation with Null Values
Std_ID   Std_name   Class   Subject_code   Marks obtained (%)
101      AAA        1 MBA   CN1            67
NULL     BBB        1 MBA   DB1            78
104      CCC        1 MBA   DB1            67
NULL     EEE        1 MBA   CN1            89

For example, in the STUDENT relation, if Std_ID could have NULL values, as
in Table 6.2, we could not use Std_ID to refer to the tuples that have a
NULL value (here, the tuples for BBB and EEE), so Std_ID could not be a
primary key as per its definition.
Note that entity-integrity constraints are specified on
individual relations.
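A minimal Python check for entity integrity, applied to the tuples of Table 6.2 with None standing for NULL, simply lists the tuples whose primary key is missing. The function name is hypothetical.

```python
# Tuples of Table 6.2: (Std_ID, Std_name, Class, Subject_code, Marks obtained);
# None stands for a NULL Std_ID.
student = [
    (101, "AAA", "1 MBA", "CN1", 67),
    (None, "BBB", "1 MBA", "DB1", 78),
    (104, "CCC", "1 MBA", "DB1", 67),
    (None, "EEE", "1 MBA", "CN1", 89),
]
PK_POSITION = 0  # Std_ID is the primary key


def entity_integrity_violations(rows, pk_position=PK_POSITION):
    """Tuples that break entity integrity because their primary key is NULL."""
    return [row for row in rows if row[pk_position] is None]


for row in entity_integrity_violations(student):
    print("NULL primary key:", row)
```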
6.2.5 Referential-integrity constraints
In order to have a clear understanding of referential-integrity constraints,
let us recall the definition of a foreign key, which you studied in Unit 4:
a foreign key is a set of attributes in one relation whose values refer to
the primary key of a related relation.
For example, consider Tables 6.3(a) and 6.3(b), STUDENT and SUBJECT
Relations.
Table 6.3 (a): STUDENT Relation
Std_ID   Std_name   Class   Subject_code   Marks obtained (%)
101      AAA        1 MBA   CN1            67
103      BBB        1 MBA   DB1            78
104      CCC        1 MBA   DB1            67
105      EEE        1 MBA   SE1            89

Table 6.3(b): SUBJECT Relation
Sub_code   Sub_name                     Fac_incharge
CN1        Computer Networks            Prof. Prithvi Mehtha
DB1        Database Management System   Prof. Guru. S
SE1        Software Engineering         Prof. Thimmaih

In Table 6.3(a), the attribute Subject_code gives the code of the subject
that each student has opted for in his/her class. Therefore, each
Subject_code value in the STUDENT relation must match the Sub_code value
of some tuple in the SUBJECT relation. Here, Sub_code is the primary key of
the SUBJECT relation, and Subject_code is a foreign key in the STUDENT
relation that refers to it.
In the above example, the STUDENT relation is called the referencing
relation and the SUBJECT relation is called the referenced relation.
Therefore, for a referential-integrity constraint to hold in a database,
the foreign key attributes of the referencing relation must have the same
domain as the primary key attributes of the referenced relation. Also, the
value of the foreign key in each tuple of the current state of the
referencing relation must either occur as the primary key value of some
tuple in the current state of the referenced relation or be NULL.
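The referential-integrity rule (each foreign key value must either match a primary key value in the referenced relation or be NULL) can be checked in Python on Tables 6.3(a) and 6.3(b). The helper dangling_references and the extra test tuples used to exercise it are hypothetical.

```python
# Referencing relation STUDENT (Table 6.3(a)); position 3 is the foreign key Subject_code.
student = [
    (101, "AAA", "1 MBA", "CN1", 67),
    (103, "BBB", "1 MBA", "DB1", 78),
    (104, "CCC", "1 MBA", "DB1", 67),
    (105, "EEE", "1 MBA", "SE1", 89),
]
# Referenced relation SUBJECT (Table 6.3(b)), keyed by its primary key Sub_code.
subject = {
    "CN1": ("Computer Networks", "Prof. Prithvi Mehtha"),
    "DB1": ("Database Management System", "Prof. Guru. S"),
    "SE1": ("Software Engineering", "Prof. Thimmaih"),
}
FK_POSITION = 3


def dangling_references(referencing, fk_position, referenced_keys):
    """Tuples whose foreign key is neither NULL nor an existing primary key value."""
    return [row for row in referencing
            if row[fk_position] is not None and row[fk_position] not in referenced_keys]


print(dangling_references(student, FK_POSITION, subject))  # [] : the constraint holds
```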
Self-Assessment Questions
1. ________________________________ constraints are also called
implicit constraints.
2. Application-based constraint is also called ____________________
constraint.
3. State whether the following statements are true or false:
a) Domain constraint states that any two tuples in a relation cannot
have identical values.
b) NULL attributes are allowed in a relation.
4. If a referential-integrity constraint has to hold in the database, then the
attributes of the foreign key of the referencing relation must have the same
domain as the primary key of the referenced relation. (True/False)


6.3 Update Operations on Relations


The operations of the relational model can be classified as
retrieval operations and update operations. Retrieval operations are the
basis of relational algebra, which is explained in detail in Section 6.4. In
this section, we will discuss the update operations in detail. The three
update operations are as follows:
Insert operations
Delete operations
Modify operations
6.3.1 Insert operations
Insert operations are used to insert a new tuple or tuples into a relation.
Insert operations can violate any of the following four types of constraints:

Domain constraints - Can be violated if an attribute value is given that
does not appear in the corresponding domain (only permitted values are
allowed).

Key constraints - Can be violated if a key value in the new tuple t
already exists in another tuple in the relation r(R) (duplicate entries
are not allowed).

Entity-integrity constraints - Can be violated if the primary key of the
new tuple t is NULL (NULL primary keys are not allowed).

Referential-integrity constraints - Can be violated if the value of any
foreign key in t refers to a tuple that does not exist in the referenced
relation (foreign key values must match primary key values).

For example, consider the following operations on the STUDENT relation std:
INSERT INTO std VALUES (1, 'AAA', ...)
is acceptable.
INSERT INTO std VALUES (1, 'BBB', ...)
is not acceptable, because a tuple with Std_ID = 1 already exists in the
STUDENT relation; it violates the key constraint.
INSERT INTO std VALUES (NULL, 'Raj', ...)
is not acceptable, because the primary key Std_ID is NULL; it violates the
entity-integrity constraint.
INSERT INTO std VALUES (1, 'AAA', 'CO1', ...)
is not acceptable, because it violates the referential-integrity constraint
specified on the subject code: no SUBJECT tuple exists with Sub_code = 'CO1'.
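The four kinds of insert violations can be reproduced with Python's sqlite3 module. This is a sketch under several assumptions: a simplified std table (Std_ID, Std_name, Subject_code, Marks) rather than the full STUDENT relation, a CHECK clause standing in for the domain of Marks, and SQLite-specific details that must be handled explicitly, such as turning on foreign key enforcement with PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys unless asked
conn.executescript("""
    CREATE TABLE subject (
        sub_code TEXT NOT NULL PRIMARY KEY,
        sub_name TEXT
    );
    -- std_id is declared INT rather than INTEGER so that SQLite does not treat it
    -- as a rowid alias, which would silently replace a NULL key with a new number.
    CREATE TABLE std (
        std_id       INT NOT NULL PRIMARY KEY,
        std_name     TEXT,
        subject_code TEXT REFERENCES subject(sub_code),
        marks        INTEGER CHECK (marks BETWEEN 0 AND 100)
    );
    INSERT INTO subject VALUES ('CN1', 'Computer Networks'),
                               ('DB1', 'Database Management System');
""")


def try_insert(row):
    """Insert one tuple into std; return None if accepted, else the error message."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO std VALUES (?, ?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


results = {
    "acceptable":  try_insert((1, "AAA", "CN1", 67)),
    "key":         try_insert((1, "BBB", "DB1", 78)),     # duplicate Std_ID
    "entity":      try_insert((None, "Raj", "DB1", 70)),  # NULL primary key
    "referential": try_insert((2, "CCC", "CO1", 60)),     # no SUBJECT 'CO1'
    "domain":      try_insert((3, "DDD", "DB1", 150)),    # marks outside 0..100
}
for name, outcome in results.items():
    print(f"{name:12} {'accepted' if outcome is None else 'rejected: ' + outcome}")
```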
6.3.2 Delete operations
The delete operation is used to delete tuples. It can violate only referential
integrity, and only if the tuple being deleted is referenced by foreign keys
from other tuples in the database. To specify the deletion, a condition on
the attributes of the relation selects the tuple or tuples to be deleted.
Three options are available if a deletion operation causes a violation:
The first option is to reject the deletion.
The second option is to cascade the deletion, that is, to also delete the
tuples that reference the tuple being deleted.
The third option is to modify the referencing attribute values that cause
the violation (for example, by setting them to NULL).
For example, the deletion operation "Delete the STUDENT tuple with
Std_ID = 5" is acceptable, whereas "Delete the SUBJECT tuple with
Sub_code = 10" is not acceptable, because tuples in STUDENT refer to this
tuple. This results in a referential-integrity violation.
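The three options for a deletion that would break referential integrity correspond to the SQL referential actions RESTRICT (reject), CASCADE (delete the referencing tuples) and SET NULL (modify them). The sketch below uses Python's sqlite3 module and a cut-down STUDENT/SUBJECT schema, assumed for illustration, and deletes subject DB1 under each action.

```python
import sqlite3


def delete_db1(on_delete):
    """Delete SUBJECT 'DB1' under the given ON DELETE action.

    Returns (error message or None, remaining std rows).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(f"""
        CREATE TABLE subject (sub_code TEXT NOT NULL PRIMARY KEY);
        CREATE TABLE std (
            std_id       INT NOT NULL PRIMARY KEY,
            subject_code TEXT REFERENCES subject(sub_code) ON DELETE {on_delete}
        );
        INSERT INTO subject VALUES ('CN1'), ('DB1');
        INSERT INTO std VALUES (101, 'CN1'), (103, 'DB1'), (104, 'DB1');
    """)
    error = None
    try:
        with conn:
            conn.execute("DELETE FROM subject WHERE sub_code = 'DB1'")
    except sqlite3.IntegrityError as exc:
        error = str(exc)
    rows = conn.execute("SELECT std_id, subject_code FROM std ORDER BY std_id").fetchall()
    conn.close()
    return error, rows


for action in ("RESTRICT", "CASCADE", "SET NULL"):
    print(action, delete_db1(action))
```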
6.3.3 Modify operations
The modify operation is used to change the values of one or more attributes
in one or more tuples of some relation R.
It is necessary to specify a condition on the attributes of relation R to
select the tuple or tuples to be modified.
For example: Modify the salary of STUDENT with std_ID = 101 to 1,000.0.
1. Modifying an attribute that is neither a primary key nor a foreign key
usually causes no problems. The DBMS only needs to confirm that the new
value is of the correct data type and domain.
2. If a foreign key attribute is modified, the DBMS must make sure that
the new value refers to an existing tuple in the referenced relation.
3. If a primary key is modified, the key constraint is violated if the new
primary key value already exists in another tuple.
For example, modifying the salary of the STUDENT tuple with std_ID = 100
to 10,000 is an acceptable operation, whereas changing the Sub_code of the
SUBJECT tuple with Sub_code = 10 to 40 is not an acceptable operation, as
STUDENT tuples still refer to Sub_code 10; it violates referential integrity.
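The three modification cases, plus the case of changing a referenced primary key, can be tried with Python's sqlite3 module. The schema and values are a simplified, hypothetical version of Tables 6.3(a) and 6.3(b); with SQLite's default referential action (NO ACTION) and foreign keys switched on, an update that would leave a dangling reference is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE subject (sub_code TEXT NOT NULL PRIMARY KEY, sub_name TEXT);
    CREATE TABLE std (
        std_id       INT NOT NULL PRIMARY KEY,
        std_name     TEXT,
        subject_code TEXT REFERENCES subject(sub_code)
    );
    INSERT INTO subject VALUES ('CN1', 'Computer Networks'),
                               ('DB1', 'Database Management System');
    INSERT INTO std VALUES (101, 'AAA', 'CN1'), (103, 'BBB', 'DB1');
""")


def try_update(sql):
    """Run one UPDATE; report whether the DBMS accepted it."""
    try:
        with conn:
            conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected ({exc})"


outcomes = [
    # 1. Neither primary nor foreign key: only type/domain matter.
    try_update("UPDATE std SET std_name = 'AAB' WHERE std_id = 101"),
    # 2. Foreign key: the new value must exist in SUBJECT.
    try_update("UPDATE std SET subject_code = 'CO1' WHERE std_id = 101"),
    # 3. Primary key: the new value must not already be used.
    try_update("UPDATE std SET std_id = 103 WHERE std_id = 101"),
    # Changing a referenced key would leave std tuple 101 pointing at nothing.
    try_update("UPDATE subject SET sub_code = 'CN9' WHERE sub_code = 'CN1'"),
]
for outcome in outcomes:
    print(outcome)
```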
Self-Assessment Questions
5. ___________________ operations are the base for relational algebra.
6. ______________________ constraint can be violated if an attribute
value is given that does not appear in the corresponding domain.
7. In _____________________________ constraint foreign key values
should match with primary key values.

6.4 The Relational Algebra


In the beginning of this unit, you studied that there are two formal languages
for the relational model. The more important of these two is relational
algebra. There are many reasons for using relational algebra. Some of them
are as follows:
It acts as the building block for relational model operations.
It is the basis for RDBMS.
Several of its operation concepts are used in SQL in RDBMSs.
The operations of relational algebra can be classified into two groups,
namely, operations from mathematical set theory and operations developed
specifically for relational databases.
1. Set-theoretic operations - Based on mathematical set theory, we have
the following operations in relational algebra:
Union
Intersection
Set difference
Cartesian product
2. Relational operations - Based on operations for relational databases,
we have the following operations in relational algebra:
SELECT
PROJECT
JOIN
SELECT and PROJECT are unary operations and JOIN is a binary
operation. Unary operations operate on one relation; binary operations
operate on two relations.

6.4.1 Set theoretic operations


These operations are used to merge the elements of two sets in various ways, including union, intersection and difference. Three of these operations (union, intersection and set difference) require the two tables to be union compatible. Two relations are said to be union compatible if the following conditions are satisfied:
1. The two relations/tables (say R and S, as shown in Tables 6.4(a) and 6.4(b)) have the same number of columns (that is, the same degree).
2. Each column of the first relation/table has the same data type (domain) as the corresponding column of the second relation/table.

Table 6.4(a): R Relation
Std_ID   Name
1        Jyothi
2        Ganga
3        Girija
4        Ankitha

Table 6.4(b): S Relation
Std_ID   Name
3        Girija
4        Ankitha
         Tanvi
         Manvi

Union (∪): The union operation is denoted by the symbol ∪. The result of this operation, written R ∪ S where R and S are relations, is also a relation; it includes all tuples that are in R, in S, or in both. Duplicate tuples do not appear in the output. For example, the union of Tables 6.4(a) and 6.4(b) is shown in Table 6.4(c).

Table 6.4(c): R ∪ S Relation
Std_ID   Name
1        Jyothi
2        Ganga
3        Girija
4        Ankitha
         Tanvi
         Manvi
Intersection (∩): The intersection operation is denoted by the symbol ∩. It selects the tuples that are common to both relations. For example, the intersection of the two relations above is given in Table 6.4(d).

Table 6.4(d): R ∩ S Relation
Std_ID   Name
3        Girija
4        Ankitha

Difference (-): The difference operation is denoted by the symbol -. The set difference operation selects those tuples that are in the first relation but not in the second. For example, the result of the difference R - S consists of all tuples in R but not in S, as shown in Table 6.4(e).

Table 6.4(e): R - S Relation
Std_ID   Name
1        Jyothi
2        Ganga
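The three union-compatible operations map directly onto Python's set operators, because a relation, like a set, holds no duplicate tuples. In this minimal sketch R follows Table 6.4(a); the Std_ID values 5 and 6 for Tanvi and Manvi are hypothetical, since the tables do not give them. The helper union_compatible is an illustrative name that checks the two conditions listed above (same degree, same type per column).

```python
# Relations as Python sets of (Std_ID, Name) tuples.
# R follows Table 6.4(a); the IDs 5 and 6 in S are hypothetical.
R = {(1, "Jyothi"), (2, "Ganga"), (3, "Girija"), (4, "Ankitha")}
S = {(3, "Girija"), (4, "Ankitha"), (5, "Tanvi"), (6, "Manvi")}

R_schema = [("Std_ID", int), ("Name", str)]
S_schema = [("Std_ID", int), ("Name", str)]


def union_compatible(schema_a, schema_b):
    """Same degree and the same type in each corresponding column."""
    return len(schema_a) == len(schema_b) and all(
        type_a is type_b for (_, type_a), (_, type_b) in zip(schema_a, schema_b)
    )


assert union_compatible(R_schema, S_schema)

union = R | S         # R ∪ S: shared tuples appear only once
intersection = R & S  # R ∩ S
difference = R - S    # R - S

print(sorted(union))
print(sorted(intersection))  # [(3, 'Girija'), (4, 'Ankitha')]
print(sorted(difference))    # [(1, 'Jyothi'), (2, 'Ganga')]
```

The union has six tuples rather than eight because Girija and Ankitha appear in both relations and a set keeps each tuple only once.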

Cartesian product (×): The Cartesian product is denoted by the symbol ×. The Cartesian product, or cross-product, is a binary operation that is used to combine two relations: every tuple of the first relation is paired with every tuple of the second. Unlike the three operations above, it does not require the relations to be union compatible. For example, let R and S be relations with n and m attributes, respectively:
R(A1, A2, ..., An)   S(B1, B2, ..., Bm)
The result of the Cartesian product R × S is the relation
Q(A1, A2, ..., An, B1, B2, ..., Bm)
The total number of columns in Q, called the degree of Q, is n + m.
The total number of tuples in Q, called count(Q), is (number of tuples in R) × (number of tuples in S).

For a better understanding of this operation, let us consider two different relations R and S, as shown in Tables 6.5(a) and 6.5(b).

Table 6.5(a): R Relation
Sub_code   Sub_name
EC1        E&C
CS1        Computer Science
HR1        HRD

Table 6.5(b): S Relation
Proj_ID   Proj_name
10        Networking
11        Payroll

The Cartesian product of R and S is written as R × S, and the result is shown in Table 6.5(c).

Table 6.5(c): R × S Relation
Sub_code   Sub_name           Proj_ID   Proj_name
EC1        E&C                10        Networking
EC1        E&C                11        Payroll
CS1        Computer Science   10        Networking
CS1        Computer Science   11        Payroll
HR1        HRD                10        Networking
HR1        HRD                11        Payroll

The relation R has 2 columns and 3 tuples. The relation S has 2 columns and 2 tuples. So the Cartesian product has 4 columns (2 + 2) and 6 tuples (3 × 2).
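On its own, the Cartesian product is rarely meaningful, because it pairs every tuple of one relation with every tuple of the other whether or not they are related. It becomes useful when it is followed by selection and projection operations that keep only the related combinations and the required columns.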
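A minimal sketch of R × S for Tables 6.5(a) and 6.5(b), using itertools.product from the Python standard library; relations are modelled as lists of tuples, which is an illustrative choice rather than anything the DBMS does internally. It confirms the degree (n + m) and count (|R| × |S|) formulas above.

```python
from itertools import product

# Tables 6.5(a) and 6.5(b) as lists of tuples.
R = [("EC1", "E&C"), ("CS1", "Computer Science"), ("HR1", "HRD")]
S = [(10, "Networking"), (11, "Payroll")]

# R × S: concatenate every tuple of R with every tuple of S.
Q = [r + s for r, s in product(R, S)]

degree = len(Q[0])  # n + m = 2 + 2
count = len(Q)      # |R| * |S| = 3 * 2

for row in Q:
    print(row)
print("degree:", degree, "count:", count)  # degree: 4 count: 6
```

The rows come out in the same order as Table 6.5(c), because product varies the second relation fastest.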
6.4.2 Relational operations
These are the operations that were developed specifically for relational databases. In this section, we discuss the SELECT, PROJECT and JOIN operations.
The SELECT operation: This operation selects the required rows from a table, that is, the subset of tuples of a relation that satisfy a selection condition (search criterion). It is denoted by the mathematical symbol σ (read as sigma). The general syntax used to represent the selection operation is:
σ_{<selection condition>}(<relation name>)
The <selection condition> is a Boolean expression made up of attribute names, constants, comparison operators such as =, !=, <, <=, > and >=, and the Boolean operators AND, OR and NOT.
For example, to select the students in the STUDENT relation who have opted for the subject with subject code CS1 and whose marks are greater than 60%, the expression used is:
σ_{sub_code = 'CS1' AND marks_obt > 60}(STUDENT)
To select the students in the STUDENT relation who have opted for the subject with subject code CS1 and scored more than 60%, or who have opted for the subject with subject code HR1 and scored more than 80%, the following expression is used:
σ_{(sub_code = 'CS1' AND marks_obt > 60) OR (sub_code = 'HR1' AND marks_obt > 80)}(STUDENT)
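The selection operator can be sketched as a filter that keeps the tuples for which a Boolean condition holds. In this minimal sketch the STUDENT rows are hypothetical sample data, tuples are dicts keyed by attribute name, and select is an illustrative helper name; Python's and/or play the role of AND/OR.

```python
# A hypothetical STUDENT relation: each tuple is a dict keyed by attribute name.
STUDENT = [
    {"std_ID": 1, "std_name": "Jyothi", "sub_code": "CS1", "marks_obt": 72},
    {"std_ID": 2, "std_name": "Ganga", "sub_code": "CS1", "marks_obt": 55},
    {"std_ID": 3, "std_name": "Girija", "sub_code": "HR1", "marks_obt": 85},
    {"std_ID": 4, "std_name": "Ankitha", "sub_code": "HR1", "marks_obt": 78},
]


def select(condition, relation):
    """σ_condition(relation): keep only the tuples for which condition is true."""
    return [t for t in relation if condition(t)]


# σ_{sub_code = 'CS1' AND marks_obt > 60}(STUDENT)
q1 = select(lambda t: t["sub_code"] == "CS1" and t["marks_obt"] > 60, STUDENT)

# σ_{(sub_code = 'CS1' AND marks_obt > 60) OR (sub_code = 'HR1' AND marks_obt > 80)}(STUDENT)
q2 = select(
    lambda t: (t["sub_code"] == "CS1" and t["marks_obt"] > 60)
    or (t["sub_code"] == "HR1" and t["marks_obt"] > 80),
    STUDENT,
)

print([t["std_name"] for t in q1])  # ['Jyothi']
print([t["std_name"] for t in q2])  # ['Jyothi', 'Girija']
```

Note that selection never changes the columns: every tuple in q1 and q2 still has all four attributes of STUDENT.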

The PROJECT operation: The projection operation is used to select only some of the columns of a table. It is denoted by the symbol π (read as pi), and the general syntax used to represent the projection operation is:
π_{<attribute list>}(<relation name>)
Here, <attribute list> is a list of attributes of the relation. Hence, the degree (number of columns) of the result is equal to the number of attributes specified in the attribute list. Because the result is itself a relation, any duplicate tuples are removed.
For example, to select the names and marks obtained by all the students, we write:
π_{std_name, marks_obt}(STUDENT)
This query selects only the name and marks obtained of each student from the relation STUDENT.

To select the names and addresses of all students who have opted for the subject with subject code CS1, the following query is used:
π_{std_name, address}(σ_{sub_code = 'CS1'}(STUDENT))
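A minimal sketch of projection, including the nesting of σ inside π shown above. The STUDENT rows are hypothetical; the duplicate Jyothi row is there on purpose to show that projection removes duplicate tuples. The helper names project and select are illustrative.

```python
# Hypothetical STUDENT tuples (std_name, address, sub_code, marks_obt).
STUDENT = [
    {"std_name": "Jyothi", "address": "Bangalore", "sub_code": "CS1", "marks_obt": 72},
    {"std_name": "Ganga", "address": "Mysore", "sub_code": "CS1", "marks_obt": 55},
    {"std_name": "Girija", "address": "Hassan", "sub_code": "HR1", "marks_obt": 85},
    {"std_name": "Jyothi", "address": "Bangalore", "sub_code": "HR1", "marks_obt": 72},
]


def project(attributes, relation):
    """π_attributes(relation): keep the listed columns and drop duplicate tuples."""
    result = []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in result:  # a relation is a set, so duplicates are removed
            result.append(row)
    return result


def select(condition, relation):
    """σ_condition(relation)."""
    return [t for t in relation if condition(t)]


# π_{std_name, marks_obt}(STUDENT)
names_marks = project(["std_name", "marks_obt"], STUDENT)

# π_{std_name, address}(σ_{sub_code = 'CS1'}(STUDENT))
cs1_students = project(
    ["std_name", "address"],
    select(lambda t: t["sub_code"] == "CS1", STUDENT),
)

print(names_marks)   # [('Jyothi', 72), ('Ganga', 55), ('Girija', 85)]
print(cs1_students)  # [('Jyothi', 'Bangalore'), ('Ganga', 'Mysore')]
```

The inner select runs first and the outer project then trims its result, which mirrors how the nested algebra expression is read from the inside out.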

The JOIN operation: This operation is denoted by the symbol ⋈. The capability of retrieving data from multiple tables using a single SQL statement is one of the most powerful and useful features of an RDBMS, and it is made possible by the JOIN operation. One table may not give all the information about a particular entity, so the JOIN operation is used to combine two relations to retrieve useful information. A JOIN operation matches data from two or more tables based on the values of one or more columns in each table, which allows us to process more than one table at a time.
For example, the STUDENT table gives only the sub_code of each student. If we want to know the subject name, we have to get the information by joining the STUDENT table and the SUBJECT table. In a JOIN, only the combinations of tuples that satisfy the join condition appear in the result.
The general syntax for a JOIN operation is:
R ⋈_{<join condition>} S
For example, by joining the STUDENT and SUBJECT relations, we can get the name of the subject that each student has opted for (the subject name exists only in the SUBJECT table). The following SQL query can be used for this example:
SELECT std_ID, std_name, SUBJECT.sub_name FROM STUDENT, SUBJECT
WHERE STUDENT.sub_code = SUBJECT.sub_code
AND std_ID = &std_ID
In relational algebra, the join and the projection (without the std_ID filter) are written as:
STUD_SUB ← STUDENT ⋈_{STUDENT.sub_code = SUBJECT.sub_code} SUBJECT
RESULT ← π_{std_ID, std_name, sub_name}(STUD_SUB)

The first step, the JOIN operation, combines the tuples of the STUDENT and SUBJECT relations on the basis of sub_code to form a relation called STUD_SUB. The PROJECT operation then creates a relation RESULT with the attributes std_ID, std_name and sub_name.
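The two algebra steps above can be sketched by building the join from operations already introduced: a selection applied to the Cartesian product, followed by a projection. The sample rows are hypothetical, qualify is an illustrative helper that prefixes attribute names with the relation name, and a real DBMS would not materialise the full product in this way.

```python
from itertools import product

# Hypothetical sample data; attribute names follow the text.
STUDENT = [
    {"std_ID": 1, "std_name": "Jyothi", "sub_code": "CS1"},
    {"std_ID": 2, "std_name": "Ganga", "sub_code": "HR1"},
    {"std_ID": 3, "std_name": "Girija", "sub_code": "EC1"},
]
SUBJECT = [
    {"sub_code": "EC1", "sub_name": "E&C"},
    {"sub_code": "CS1", "sub_name": "Computer Science"},
    {"sub_code": "HR1", "sub_name": "HRD"},
]


def qualify(name, t):
    """Prefix attribute names with the relation name, e.g. STUDENT.sub_code."""
    return {name + "." + attr: value for attr, value in t.items()}


# STUDENT × SUBJECT: every student paired with every subject (9 tuples).
CROSS = [{**qualify("STUDENT", st), **qualify("SUBJECT", sb)}
         for st, sb in product(STUDENT, SUBJECT)]

# STUD_SUB ← σ_{STUDENT.sub_code = SUBJECT.sub_code}(STUDENT × SUBJECT)
STUD_SUB = [t for t in CROSS if t["STUDENT.sub_code"] == t["SUBJECT.sub_code"]]

# RESULT ← π_{std_ID, std_name, sub_name}(STUD_SUB)
# (no duplicate check is needed here because std_ID is unique)
RESULT = [(t["STUDENT.std_ID"], t["STUDENT.std_name"], t["SUBJECT.sub_name"])
          for t in STUD_SUB]
print(RESULT)
# [(1, 'Jyothi', 'Computer Science'), (2, 'Ganga', 'HRD'), (3, 'Girija', 'E&C')]
```

Of the nine combinations in CROSS, only three satisfy the join condition, which is exactly why a join is more useful than a bare Cartesian product.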
To perform a JOIN between two relations, there should normally be a common field between them, that is, attributes with the same domain.
As discussed in Unit 4, there are different types of JOIN operations, such as theta join, equi join, natural join and outer join. We will now discuss the different types of joins with examples.

Theta join: According to Elmasri and Navathe, a join condition is of the form
<condition> AND <condition> AND ... AND <condition>
where each condition is of the form Ai θ Bj (for example, SUBJECT.sub_code = STUDENT.sub_code). Ai is an attribute of R, Bj is an attribute of S, Ai and Bj have the same domain, and θ (read as theta) is one of the comparison operators {=, <, <=, >, >=, !=}. A join operation with such a general join condition is called a theta join.
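A minimal nested-loop sketch of a theta join, where each condition is a triple (Ai, θ, Bj) and θ is looked up in a table of Python comparison functions from the standard operator module. The STUDENT and GRADE relations are hypothetical; GRADE is invented only to show a join whose operator is not =. Tuples are merged as dicts, so the sketch assumes the two relations have no attribute names in common.

```python
import operator

# The comparison operators that θ may stand for.
THETA = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def theta_join(r, s, conditions):
    """R ⋈_{c1 AND c2 AND ...} S, where each condition is (Ai, θ, Bj)."""
    result = []
    for tr in r:
        for ts in s:
            if all(THETA[op](tr[a], ts[b]) for a, op, b in conditions):
                result.append({**tr, **ts})  # assumes disjoint attribute names
    return result


# Hypothetical relations for a non-equi join: marks_obt >= min_marks.
STUDENT = [{"std_name": "Jyothi", "marks_obt": 72},
           {"std_name": "Ganga", "marks_obt": 55}]
GRADE = [{"grade": "First class", "min_marks": 60},
         {"grade": "Second class", "min_marks": 50}]

pairs = theta_join(STUDENT, GRADE, [("marks_obt", ">=", "min_marks")])
for t in pairs:
    print(t["std_name"], t["grade"])
# Jyothi First class
# Jyothi Second class
# Ganga Second class
```

Passing a list of several triples gives the AND of their conditions, matching the general form <condition> AND ... AND <condition>.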

Equi join: If the only comparison operator used in the join condition is =, the join is called an equi join.
For example:
SELECT std_ID, std_name, SUBJECT.sub_name FROM STUDENT, SUBJECT
WHERE STUDENT.sub_code = SUBJECT.sub_code;
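The equi join query above can be run as written against an in-memory SQLite database through Python's sqlite3 module. The table definitions and rows are assumptions made for illustration. The sketch also shows the equivalent JOIN ... ON syntax, and replaces &std_ID from the earlier query (assumed here to be an SQL*Plus-style substitution variable) with a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SUBJECT (sub_code TEXT PRIMARY KEY, sub_name TEXT);
    CREATE TABLE STUDENT (std_ID INTEGER PRIMARY KEY, std_name TEXT,
                          sub_code TEXT REFERENCES SUBJECT(sub_code));
    INSERT INTO SUBJECT VALUES ('EC1', 'E&C'), ('CS1', 'Computer Science'), ('HR1', 'HRD');
    INSERT INTO STUDENT VALUES (1, 'Jyothi', 'CS1'), (2, 'Ganga', 'HR1'), (3, 'Girija', 'EC1');
""")

# Equi join: the only comparison operator in the join condition is '='.
rows = conn.execute(
    "SELECT std_ID, std_name, SUBJECT.sub_name "
    "FROM STUDENT, SUBJECT "
    "WHERE STUDENT.sub_code = SUBJECT.sub_code "
    "ORDER BY std_ID"
).fetchall()
print(rows)
# [(1, 'Jyothi', 'Computer Science'), (2, 'Ganga', 'HRD'), (3, 'Girija', 'E&C')]

# The same equi join written with explicit JOIN ... ON syntax.
rows_on = conn.execute(
    "SELECT std_ID, std_name, SUBJECT.sub_name "
    "FROM STUDENT JOIN SUBJECT ON STUDENT.sub_code = SUBJECT.sub_code "
    "ORDER BY std_ID"
).fetchall()
assert rows == rows_on

# The value for std_ID is supplied as a bound parameter instead of &std_ID.
one = conn.execute(
    "SELECT std_ID, std_name, SUBJECT.sub_name FROM STUDENT, SUBJECT "
    "WHERE STUDENT.sub_code = SUBJECT.sub_code AND std_ID = ?",
    (2,),
).fetchall()
print(one)  # [(2, 'Ganga', 'HRD')]
```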

Natural join: Natural join is represented by the symbol ⋈, usually written without a join condition. The standard definition of natural join requires that the join attributes have the same name in both relations. In general, a natural join is performed by equating all attribute pairs that have the same name in the two relations and keeping only one copy of each such attribute in the result. The general format is:
Q ← R ⋈_{(<list 1>), (<list 2>)} S
Here, list 1 specifies a list of attributes from R and list 2 specifies a list of attributes from S.
Table 6.6(a): SUBJECT Relation
Sub_code   Sub_name
CN1        Computer networks
SE2        Software engineering
HR3        Human Resource

Table 6.6(b): PROJECT Relation
Pnumber   Pname                 Sub_code
10        Library Management    SE2
20        ERP                   HR3
30        Hospital Management   SE2
40        Wireless Network      CN1

Table 6.6(c): SUB_PROJ Relation
Pnumber   Pname                 Sub_code   Sub_name
10        Library Management    SE2        Software engineering
20        ERP                   HR3        Human Resource
30        Hospital Management   SE2        Software engineering
40        Wireless Network      CN1        Computer networks

Here, the join is done over the attribute sub_code of the SUBJECT relation and sub_code of the PROJECT relation, as shown in Tables 6.6(a), 6.6(b) and 6.6(c). In fact, sub_code of PROJECT is a foreign key that references sub_code of SUBJECT. In a natural join, the joining attribute is considered implicitly. If the two relations have no attribute in common, R ⋈ S is simply the cross-product of the two relations. Joining can be done between any set of attributes and need not always be on primary key and foreign key combinations.
The ratio of the expected size of the JOIN result to its maximum possible size, nR × nS (where nR and nS are the numbers of tuples in R and S), is called the join selectivity.
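A minimal sketch of a natural join on Tables 6.6(a) and 6.6(b): the common attributes are found from the attribute names, equated, and kept only once, and the join selectivity is then computed as defined above. Representing tuples as dicts and reading the schema from the first tuple of each relation are simplifying assumptions of this sketch.

```python
# Tables 6.6(a) and 6.6(b) as lists of dicts.
SUBJECT = [
    {"Sub_code": "CN1", "Sub_name": "Computer networks"},
    {"Sub_code": "SE2", "Sub_name": "Software engineering"},
    {"Sub_code": "HR3", "Sub_name": "Human Resource"},
]
PROJECT = [
    {"Pnumber": 10, "Pname": "Library Management", "Sub_code": "SE2"},
    {"Pnumber": 20, "Pname": "ERP", "Sub_code": "HR3"},
    {"Pnumber": 30, "Pname": "Hospital Management", "Sub_code": "SE2"},
    {"Pnumber": 40, "Pname": "Wireless Network", "Sub_code": "CN1"},
]


def natural_join(r, s):
    """Equate all same-named attributes; keep one copy of each of them."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])  # schema read from the first tuple
    result = []
    for tr in r:
        for ts in s:
            # With no common attributes, all() is True: a plain cross-product.
            if all(tr[a] == ts[a] for a in common):
                result.append({**tr, **ts})  # shared attributes appear once
    return result


SUB_PROJ = natural_join(PROJECT, SUBJECT)
for t in SUB_PROJ:
    print(t["Pnumber"], t["Pname"], t["Sub_code"], t["Sub_name"])

# Join selectivity = |R ⋈ S| / (nR * nS)
selectivity = len(SUB_PROJ) / (len(PROJECT) * len(SUBJECT))
print(selectivity)  # 0.3333333333333333, i.e. 4 / (4 * 3)
```

A selectivity close to 0 means the join keeps only a small fraction of all possible tuple pairs, which is the usual case for a foreign key join like this one.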

Outer join: An outer join returns both matching and non-matching rows. It differs from the inner join in that rows in one table that have no matching rows in the other table also appear in the result table, with nulls in the other table's attribute positions, instead of being discarded (as with the inner join). In other words, it outputs rows even if they do not satisfy the join condition. An example of an outer join is shown in Tables 6.7(a), 6.7(b) and 6.7(c).

Table 6.7(a): WORKER Relation
Name     Age
Adah     23
Andrew   29
Barath   22
Jone     19
Donald   23
Elbert   26
George   28
Helen    15

Table 6.7(b): WORKER_SKILL Relation
Name      Skill
Adah      Work
Jone      Smithy
Elbert    Discuss
Helen     Driver
Wilfred   Fitter
Marg      Smithy
Rita      Fitting

Table 6.7(c): RESULT Relation
Name     Age   Skill
Adah     23    Work
Andrew   29
Barath   22
Jone     19    Smithy
Donald   23
Elbert   26    Discuss
George   28
Helen    15    Driver

In the above example (a left outer join of WORKER with WORKER_SKILL on Name), all workers are listed along with their age and skill, even when there is no matching row in WORKER_SKILL for a worker's name. If there is no match, the skill column is simply left empty (null). An outer join can be used when we want to keep all the tuples in R, in S, or in both relations, whether or not they have matching tuples in the other relation.
There are three types of outer joins. They are as follows:
o Left outer join: It is denoted by ⟕. The left outer join operation keeps every tuple of the first (left) relation R in the result of R ⟕ S. If no matching tuple is found in S, the attributes of S in the result are filled with null values.
o Right outer join: It is denoted by ⟖, and it keeps every tuple of the second (right) relation S in the result of R ⟖ S.
o Full outer join: It is denoted by ⟗, and it keeps all tuples of both the left and right relations; when no matching tuples are found, they are padded with null values as needed.
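The three outer joins can be sketched with nested loops over Tables 6.7(a) and 6.7(b), joining on Name and using Python's None in place of NULL. Tuples are written as (name, age) and (name, skill), and each result tuple as (name, age, skill); these layouts and the function names are illustrative assumptions, not a DBMS implementation.

```python
# Tables 6.7(a) and 6.7(b) as lists of tuples.
WORKER = [("Adah", 23), ("Andrew", 29), ("Barath", 22), ("Jone", 19),
          ("Donald", 23), ("Elbert", 26), ("George", 28), ("Helen", 15)]
WORKER_SKILL = [("Adah", "Work"), ("Jone", "Smithy"), ("Elbert", "Discuss"),
                ("Helen", "Driver"), ("Wilfred", "Fitter"), ("Marg", "Smithy"),
                ("Rita", "Fitting")]


def left_outer_join(r, s):
    """R ⟕ S on Name; unmatched R tuples are padded with None (NULL)."""
    result = []
    for name, age in r:
        matches = [skill for n, skill in s if n == name]
        for skill in matches or [None]:
            result.append((name, age, skill))
    return result


def right_outer_join(r, s):
    """R ⟖ S on Name; unmatched S tuples are padded with None."""
    result = []
    for n, skill in s:
        matches = [age for name, age in r if name == n]
        for age in matches or [None]:
            result.append((n, age, skill))
    return result


def full_outer_join(r, s):
    """R ⟗ S: the left outer join plus the S tuples that matched nothing in R."""
    r_names = {name for name, _ in r}
    extra = [(n, None, skill) for n, skill in s if n not in r_names]
    return left_outer_join(r, s) + extra


left = left_outer_join(WORKER, WORKER_SKILL)
right = right_outer_join(WORKER, WORKER_SKILL)
full = full_outer_join(WORKER, WORKER_SKILL)
for row in left:
    print(row)
print(len(left), len(right), len(full))  # 8 7 11
```

The left outer join reproduces Table 6.7(c); the right and full versions additionally keep Wilfred, Marg and Rita, who have skills but no WORKER tuple.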

Self-Assessment Questions
8. _______________________ and ___________________________
are the two categories into which relational algebra operations are classified.
9. JOIN operation is a _______________________ operation.
10. Cartesian product is based on _______________________ operation.
11. Union is denoted by the symbol _________.
12. _______________________ is a binary operation which is used to
combine two relations.
13. _________________________ operation is represented by pi (π).
14. ________________, __________________ and ____________ are
the three types of outer join.

6.5 Relational Calculus


Relational calculus is a higher-level, non-procedural notation for specifying relational queries: a calculus expression states what is to be retrieved rather than how to retrieve it. There are two types of relational calculus. They are:
Tuple relational calculus
Domain relational calculus
6.5.1 Tuple relational calculus
The tuple calculus is based on specifying a number of tuple variables. A tuple variable is a variable that ranges over some named relation, that is, a variable whose permitted values are only tuples of that relation. This means that if the tuple variable t ranges over a relation R, then at any given time t represents some individual tuple of R.
Example 1: Consider a simple tuple relational calculus query of the form
{t | STUDENT(t) AND t.marks > 60}
Here, t is a tuple variable that ranges over the relation STUDENT. Each tuple in the STUDENT relation that satisfies the condition, that is, marks > 60%, will be retrieved.
Example 2: Retrieve the std_ID, std_name and class of students who reside in Bangalore. The query will be:
{t.std_ID, t.std_name, t.class | STUDENT(t) AND t.city = 'Bangalore'}
Here, we first specify the requested attributes and then the condition.
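A Python set comprehension reads almost like a tuple relational calculus expression: STUDENT(t) becomes "for t in STUDENT" and the condition becomes the if clause. In this minimal sketch the STUDENT rows are hypothetical, and the field is named class_ only because class is a Python keyword. Since a comprehension always ranges over a finite relation, it can only express safe queries.

```python
from collections import namedtuple

# Hypothetical STUDENT relation; the tuple variable t ranges over its tuples.
# (class_ is used because class is a reserved word in Python.)
Student = namedtuple("Student", "std_ID std_name class_ city marks")
STUDENT = {
    Student(1, "Jyothi", "MBA", "Bangalore", 72),
    Student(2, "Ganga", "MCA", "Mysore", 58),
    Student(3, "Girija", "MBA", "Bangalore", 55),
}

# {t | STUDENT(t) AND t.marks > 60}
q1 = {t for t in STUDENT if t.marks > 60}

# {t.std_ID, t.std_name, t.class | STUDENT(t) AND t.city = 'Bangalore'}
q2 = {(t.std_ID, t.std_name, t.class_) for t in STUDENT if t.city == "Bangalore"}

print(q1)
print(sorted(q2))  # [(1, 'Jyothi', 'MBA'), (3, 'Girija', 'MBA')]
```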
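Formula specification of tuple relational calculus:
A general expression of the tuple relational calculus is of the following form:
{t1.A1, t2.A2, ..., tn.An | COND(t1, t2, ..., tn, tn+1, tn+2, ..., tn+m)}
where t1, t2, ..., tn+m are tuple variables, each ranging over a relation; each Ai is an attribute of the relation over which ti ranges; and COND is a formula of the tuple relational calculus. The notation t.A denotes the component (value) of tuple t for attribute A.
The expressions of the tuple calculus are constructed from the following elements (called atoms):
R(t), where R is a relation name and t is a tuple variable; this atom is true when t is a tuple of R.
x θ y, where x and y are tuple components (such as t.A) or constants, and θ is one of the comparison operators =, !=, <, >, <= and >=.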
Well-formed formulas (Wff):
A Wff is constructed from one or more atoms connected via the Boolean operators (AND, OR, NOT) and the quantifiers (∃, ∀) according to the rules below:
1. Every atom is a formula.
2. If F1 and F2 are formulas, then so are (F1 AND F2), (F1 OR F2), NOT (F1) and NOT (F2). The truth values of these four formulas are derived from their component formulas F1 and F2 as follows:
a) (F1 AND F2) is TRUE if both F1 and F2 are TRUE; otherwise, it is FALSE.
b) (F1 OR F2) is FALSE if both F1 and F2 are FALSE; otherwise, it is TRUE.
c) NOT (F1) is TRUE if F1 is FALSE; it is FALSE if F1 is TRUE.
d) NOT (F2) is TRUE if F2 is FALSE; it is FALSE if F2 is TRUE.

Free and bound variables:
Each tuple variable within a formula is either free or bound. It is bound if it is quantified, meaning that it appears in an (∃t) or (∀t) clause; otherwise it is free. The two special symbols called quantifiers that can appear in formulas are the universal quantifier (∀) and the existential quantifier (∃).
If F is a formula, then so is (∃t)(F), where t is a tuple variable. The formula (∃t)(F) is true if F evaluates to true for at least one tuple assigned to the free occurrences of t in F; otherwise (∃t)(F) is false.
If F is a formula, then so is (∀t)(F), where t is a tuple variable. The formula (∀t)(F) is true if F evaluates to true for every tuple assigned to the free occurrences of t in F; otherwise (∀t)(F) is false.

A tuple variable t is bound in a formula F1 of the form:
F1 = (∃t)(F) or F1 = (∀t)(F)
Consider an example:
F1 = d.Dname = 'Research'
F2 = (∃t)(d.Dnumber = t.Dno)
Here, the tuple variable d is free in both F1 and F2, whereas t is bound to the ∃ quantifier in F2. The existential quantifier is read as "there exists".
To select the information of students who are not from London:
{t.std_ID, t.Bdate, t.address | STUDENT(t) AND NOT (t.city = 'London')}

Another example:
For every project located in Bangalore, list the subject code, the subject opted, and the last name and address of the faculty in charge:
{s.sub_code, s.sub_opt, f.last_name, f.address | SUBJECT(s) AND STUDENT(f) AND s.location = 'Bangalore' AND ((∃d)(FACULTY(d) AND s.sub_code = d.sub_code AND d.faculty_code = f.faculty_code))}

For example, to get the names of all students whose marks obtained are greater than 70%, together with the names of their subjects:
{s.std_name, d.sub_name | STUDENT(s) AND SUBJECT(d) AND d.sub_code = s.sub_code AND (∃m)(MARKS(m) AND m.std_ID = s.std_ID AND m.marks > 70)}
The existential quantifier ∃ is read as "there exists". The above example can be read as: there exists a MARKS tuple whose std_ID value is equal to the std_ID component of the STUDENT tuple and whose marks obtained are greater than 70%.
Consider PX, a tuple variable for the relation PARTS (P). The symbol ∀ represents the universal quantifier ("for all"). The formula (∀PX)(PX.color = 'RED') can therefore be read as "for all PARTS tuples, the color is RED".
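Over a finite relation, the two quantifiers correspond to Python's built-in any() (∃) and all() (∀). This minimal sketch uses hypothetical PARTS, STUDENT and MARKS rows, and evaluates a simplified form of the marks example above with the SUBJECT part left out.

```python
# Hypothetical relations for illustrating ∃ and ∀.
PARTS = [{"pno": "P1", "color": "RED"}, {"pno": "P2", "color": "RED"}]
STUDENT = [{"std_ID": 1, "std_name": "Jyothi"}, {"std_ID": 2, "std_name": "Ganga"}]
MARKS = [{"std_ID": 1, "marks": 82}, {"std_ID": 2, "marks": 64}]

# (∀PX)(PX.color = 'RED'): true only if every PARTS tuple is red.
all_red = all(px["color"] == "RED" for px in PARTS)

# {s.std_name | STUDENT(s) AND (∃m)(MARKS(m) AND m.std_ID = s.std_ID AND m.marks > 70)}
toppers = [
    s["std_name"]
    for s in STUDENT
    if any(m["std_ID"] == s["std_ID"] and m["marks"] > 70 for m in MARKS)
]

print(all_red)  # True
print(toppers)  # ['Jyothi']
```

As in logic, all() over an empty relation is True and any() over an empty relation is False, so ∀ holds vacuously and ∃ fails when there are no tuples.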
Safe expressions:
A safe expression in relational calculus is one that is guaranteed to yield a finite number of tuples as its result; otherwise, the expression is called unsafe. For example,
{t | NOT (STUDENT(t))}
is unsafe because it yields all tuples in the universe that are not STUDENT tuples, which are infinitely many; most of these values are not even in the domain of the expression.
6.5.2 Domain relational calculus
There is another type of relational calculus called domain calculus. The domain calculus differs from the tuple calculus in the type of variables used in formulas. In domain calculus, the variables range over single values from the domains of attributes rather than over tuples. In other words, domain variables take on values from an attribute domain rather than values for an entire tuple. An expression of the domain calculus is of the form:
{<x1, x2, ..., xn> | COND(x1, x2, ..., xn)}
where x1, x2, ..., xn are domain variables and COND is a condition or formula.
Expressions of the domain calculus are constructed from the following elements:
Domain variables D, E, F and so on. Each domain variable is constrained to range over some specified domain.
Atoms of the form xi θ xj, where xi and xj are domain variables and θ is one of the comparison operators in the set {=, !=, <, >, <=, >=}.
Atoms of the form R(x1, x2, ..., xj), where R is the name of a relation. Such an atom states that the list of values <x1, x2, ..., xj> must be a tuple in the relation R, where xi is the value of the ith attribute of the tuple.
For example, assuming the relations loan(branch_name, loan_number, amount) and borrower(customer_name, loan_number):
Find the branch name, loan number and amount for loans of over 25,000:
{<b, l, a> | <b, l, a> ∈ loan ∧ a > 25000}
Find the names of all customers who have a loan from the Bombay branch, along with the loan amount:
{<c, a> | ∃l (<c, l> ∈ borrower ∧ ∃b (<b, l, a> ∈ loan ∧ b = 'Bombay'))}
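In a comprehension, the domain variables of these two queries become separate loop variables over attribute values rather than whole tuples. This minimal sketch assumes the loan and borrower layouts stated above and uses hypothetical rows. The existentially quantified variables l and b are simply iterated over, while the free variables c and a form the result.

```python
# Hypothetical relations, assuming the attribute order used above:
# loan(branch_name, loan_number, amount) and borrower(customer_name, loan_number).
loan = {
    ("Bombay", "L-11", 30000),
    ("Bombay", "L-14", 15000),
    ("Delhi", "L-15", 40000),
}
borrower = {
    ("Asha", "L-11"),
    ("Ravi", "L-14"),
    ("Meena", "L-15"),
}

# {<b, l, a> | <b, l, a> ∈ loan ∧ a > 25000}
big_loans = {(b, l, a) for (b, l, a) in loan if a > 25000}

# {<c, a> | ∃l (<c, l> ∈ borrower ∧ ∃b (<b, l, a> ∈ loan ∧ b = 'Bombay'))}
bombay_customers = {
    (c, a)
    for (c, l) in borrower
    for (b, l2, a) in loan
    if l2 == l and b == "Bombay"
}

print(sorted(big_loans))         # [('Bombay', 'L-11', 30000), ('Delhi', 'L-15', 40000)]
print(sorted(bombay_customers))  # [('Asha', 30000), ('Ravi', 15000)]
```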


6.5.3 Tuple relational calculus versus domain relational calculus
Now that we have studied two different types of relational calculus, let us look at the differences between them. The differences are given in Table 6.8.
Table 6.8: Tuple Relational Calculus versus Domain Relational Calculus
Tuple relational calculus | Domain relational calculus
A tuple variable is a variable that ranges over tuples of values. | A domain variable is a variable whose value is drawn from the domain of an attribute rather than from an entire tuple.
The result of a query with respect to a given database is the set of all choices of tuples for the tuple variables that make the query condition a true statement about the database. | The result of a query with respect to a given database is the set of all tuples <x1, ..., xn> such that, for i = 1, ..., n, if xi is substituted for the free variable Xi, then COND(x1, ..., xn) is a true statement about the database.
Every atom (such as R(t) or t.A θ s.B) is a formula. | Every atom (such as R(x1, ..., xn) or xi θ xj) is a formula.

We can now point out the differences between relational algebra and relational calculus.
6.5.4 Relational algebra versus relational calculus
Relational algebra is procedural (an expression specifies a sequence of operations to apply), whereas relational calculus is non-procedural (a formula specifies only the properties of the result). The expressive power of relational algebra and of safe relational calculus expressions is equivalent. This means that any query that can be expressed in relational algebra can also be expressed by a formula in relational calculus, and vice versa.
Self-Assessment Questions
15. __________________________ and _________________________
are the two types of relational calculus.
16. _______________________ is a variable that ranges over some named relation, that is, a variable whose permitted values are only tuples of that relation.
17. Wff stands for ______________________________________.
18. ____________________ expression guarantees to yield a finite
number of tuples as its result.

6.6 Summary
Let us recapitulate the important concepts discussed in this unit:
A relational database is composed of many relations, and the tuples in these relations are related to one another in a number of ways. The three main types of constraints are inherent-model-based, schema-based and application-based. Schema-based constraints are also called explicit constraints. The different types of schema-based constraints are domain constraints, key constraints, constraints on NULLs, entity-integrity constraints and referential-integrity constraints.
The three update operations are the insert, delete and modify operations.
Relational algebra operations are classified into two groups, namely, operations based on mathematical set theory and operations developed for relational databases. The common set theoretic operations in relational algebra are union, intersection, set difference and Cartesian product. The relational operations, developed specifically for relational databases, are SELECT, PROJECT and JOIN.
Relational calculus is a higher-level, non-procedural notation for specifying relational queries. There are two types of relational calculus: tuple relational calculus and domain relational calculus.

6.7 Glossary
Atomic: An atom is a single, indivisible value. A cell with an atomic value holds only one value.
Constraint: A constraint is a rule or condition that restricts an entity, a project or a system.
Entity: An entity is something that exists by itself.
Integrity: Integrity is the consistency of values, methods and components. Referential integrity is a database concept that ensures that the relationships between tables remain consistent.
NULL: NULL means no value (the value is missing or unknown). A primary key can never have a NULL value.
Programs: A program is a set of instructions written in a specific language to perform a specific task.
Reference: A reference is a relation between objects in which one object designates, or acts as a means by which to connect or link to, another object.
Schema: A schema is a description of the database structure in a formal language that computers can understand.

6.8 Terminal Questions


1. Explain the different relational constraints based on schema constraint.
2. Briefly describe with an example the different update operations in
relations.
3. Consider any one example of a relational database and show how the different operations of relational algebra can be performed on its tables, showing the output.
4. Differentiate between tuple relational calculus and domain relational
calculus.

6.9 Answers
Self-Assessment Questions
1. Inherent-model-based
2. Semantic
3. Answers
a) False
b) False
4. True
5. Retrieval
6. Domain
7. Referential-integrity
8. Mathematical set theory and operations for relational databases
9. Relational
10. Set theoretic
11. ∪
12. Cartesian product
13. PROJECT
14. Left outer join, right outer join and full outer join
15. Tuple relational calculus and domain relational calculus
16. Tuple variable
17. Well-formed formulas
18. Safe

Terminal Questions
1. The different types of schema-based constraints are domain
constraints, key constraints, constraints on NULLs, entity-integrity
constraints and referential-integrity constraints. (Refer to Section 6.2
for further information.)
2. The update operations are of three types: insert, modify and delete. (Refer to Section 6.3 for further information.)
3. Relational algebra operations can be classified into two main types: set theoretic operations and relational operations. The common set theoretic operations are UNION, INTERSECTION, cross-product and set difference. The relational operations are SELECT, PROJECT and JOIN. (Refer to Section 6.4 for further information.)
4. A tuple variable is a variable that ranges over tuples of values, whereas a domain variable is a variable whose value is drawn from the domain of an attribute rather than an entire tuple. (Refer to Section 6.5.3 for further information.)
Unit 7

Structured Query Language

Structure:
7.1 Introduction
Objectives
7.2 SQL: The Universal Database Language
7.3 Types of SQL Statements
7.4 SQL Tables
Data retrieval statement (SELECT)
7.5 Multi Table Queries
Nested queries or sub queries
Multiple-row nested queries
The exists clause
7.6 Data Manipulation Language
7.7 Creating Databases
7.8 Summary
7.9 Glossary
7.10 Terminal Questions
7.11 Answers
7.12 Case Study

7.1 Introduction
We discussed the role of the ER diagram and the usage of its different notations in Unit 5. Once the database design is done, it needs to be implemented. To implement a database, we need a well-structured language in which queries on the database can be written. In a DBMS, Structured Query Language (SQL) is used for this purpose.
SQL is a non-procedural language that describes what data is to be retrieved, updated or deleted. It is a structured language that can update both the structure of the database and the data in it. In short, if you are trying to do any serious work with a database, you need SQL.
In this unit, we will discuss the various types of SQL statements and elaborate on SQL tables. We will also study multiple-table queries and how to use SQL to create databases, explaining each with examples.

Objectives
After studying this unit, you should be able to:
define SQL
list the different types of SQL
elucidate multiple-table queries
elaborate on data manipulation language
demonstrate creating the databases

7.2 SQL: The Universal Database Language


You may notice that in many textbooks and technical magazines, the term SEQUEL is used to refer to the query language of an RDBMS. That is because, after the relational model was introduced in 1970, a language was needed for querying relational databases, and one of the first such languages was the Structured English Query Language, abbreviated as SEQUEL. IBM later extended SEQUEL by adding features for application programming and called it SEQUEL/2. Thereafter, because of legal (trademark) issues, SEQUEL/2 was renamed SQL.
SQL supported the relational database model, which was introduced in 1970 by Dr. E. F. Codd. Prior to the relational model, users needed complete knowledge of how the data were physically linked with each other in order to access them. Whenever the physical organisation of the database changed, programmers had to change the code of their applications, even though the logical structure remained the same. This problem was overcome in the relational model, where data and applications are independent of each other: if you change the physical structure, you do not lose any data, and you are not required to remember how the data items are linked to each other. The introduction of the relational model changed the way people thought about managing data. After this revolution, database technology was transformed from an art into a scientific industry.
SQL is a relational database language used to communicate with a database. The American National Standards Institute (ANSI) declared SQL the standard language for RDBMSs. Common relational database management systems that use SQL include Oracle, Sybase, Microsoft SQL Server and Microsoft Access. When a database is structured using the relational data model, SQL is a suitable language because it is designed to work with tables. As you will recall from the relational data model, an SQL database is composed of tables, each made up of rows and columns; information about the tables themselves is kept in the data dictionary.
Let us recall the following terms from Units 3 and 4: a relational database system contains one or more objects called tables. Tables contain the data or information and have uniquely identified names. Each column has a column name, a data type and any other attributes of the column. The data or information is stored in rows.
For example, to show a simple structure of a table, consider a sample table
of an employee database having the columns as employee name, address,
salary, department and phone number. The rows constitute the data for a
table.
EMPLOYEE
Emp_name        Address     Salary (Rs.)   Department   Phone_no
Mr. Surat       Orissa      100,000        Purchase     9886725499
Ms. Nandini     Bangalore   50,000         Accounts     9986752388
Ms. Usha Nair   Hyderabad   25,000         Accounts     9886743477
Mr. Abhishek    Delhi       75,000         Marketing    8017634122
Mr. Reddy       Hyderabad   40,000         Marketing    9985467233

Although SQL was developed in the 1970s, it was first standardised in 1986
and has since been universally adopted. It has become popular even for some
non-relational database systems. SQL supports a variety of professionals,
such as programmers, analysts, designers and database administrators,
unlike general programming languages such as C and COBOL, which support
only a specific group of programmers.
SQL is a special-purpose, non-procedural language used to support database
applications; it cannot be used to write general-purpose applications.
Now let us see how a simple SQL statement can be written, using the
EMPLOYEE table as an example. The format of a simple SQL statement is:
SELECT <field name> FROM <table name> WHERE <condition>
For example, to retrieve the names of employees who earn a salary of
Rs.10,000, use the following SQL statement (numeric values are written
without thousands separators):
SELECT emp_name FROM EMPLOYEE WHERE salary = 10000;
This SQL statement has three clauses: SELECT, FROM and WHERE.
The FROM clause names the table. In the above example, FROM EMPLOYEE means
that the data is retrieved from the EMPLOYEE table.
The WHERE clause specifies the condition. WHERE salary = 10000 means that
only the employees whose salary is Rs.10,000 are selected.
The SELECT clause lists the required field names. SELECT emp_name means
that the emp_name column is listed for the rows that satisfy the condition.
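The three clauses can be tried out with the short sketch below. It assumes Python's built-in sqlite3 module, with an in-memory SQLite database standing in for Oracle. It loads the sample EMPLOYEE table shown earlier and stores salaries as plain integers. The helper name names_earning is hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Oracle (an assumption)
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (emp_name TEXT, address TEXT, "
            "salary INTEGER, department TEXT, phone_no TEXT)")
con.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?)",
    [("Mr. Surat", "Orissa", 100000, "Purchase", "9886725499"),
     ("Ms. Nandini", "Bangalore", 50000, "Accounts", "9986752388"),
     ("Ms. Usha Nair", "Hyderabad", 25000, "Accounts", "9886743477"),
     ("Mr. Abhishek", "Delhi", 75000, "Marketing", "8017634122"),
     ("Mr. Reddy", "Hyderabad", 40000, "Marketing", "9985467233")])


def names_earning(amount):
    """Hypothetical helper: SELECT names the column, FROM the table, WHERE filters rows."""
    cur = con.execute("SELECT emp_name FROM EMPLOYEE WHERE salary = ?", (amount,))
    return [name for (name,) in cur]


print(names_earning(10000))  # nobody in the sample table earns Rs.10,000
print(names_earning(50000))
```

The first call returns an empty list: a query whose condition matches no row is not an error, it simply selects nothing.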
Self-Assessment Questions
1. SQL is a _______________________ language.
2. SEQUEL stands for_______________________________________.
3. _____________________ contain the data or information and have
uniquely identified names.
4. ____________________ constitute the data in a table.

7.3 Types of SQL Statements


In the previous section, we saw a simple SQL statement that retrieves a
specific column. A single statement of this kind cannot cover everything
the relational data model requires, so SQL statements are categorised into
the following four types:
DDL (Data Definition Language)
DML (Data Manipulation Language)
DCL (Data Control Language)
TCL (Transaction Control Language)

DDL: Data Definition Language


DDL statements provide commands for defining the relation schema, that is,
for creating tables, indexes, sequences and so on, and for dropping,
altering and renaming these objects:
CREATE - to create objects in the database
ALTER - to alter the structure of the database
DROP - to delete objects from the database
TRUNCATE - to remove all records from a table, including all the space
allocated for the records
COMMENT - to add comments to the data dictionary
RENAME - to rename an object
DML: Data Manipulation Language
DML statements are used to retrieve and change the data stored in database
tables; the data in the database is managed with the help of these
statements. The following are the DML statements:
SELECT - to retrieve data from a database
INSERT - to insert data into a table
UPDATE - to update existing data within a table
DELETE - to delete records from a table (all records, if no condition is
given); the space allocated for the records remains
MERGE - UPSERT operation (insert or update)
CALL - to call a PL/SQL or Java subprogram
EXPLAIN PLAN - to explain the access path to data
LOCK TABLE - to control concurrency
DCL: Data Control Language
DCL statements are used to grant permissions to users and to revoke
permissions already granted to them.
Grant: grants permissions to a user.
For example,
o SQL DBA>Grant all on emp to public;
o SQL DBA>Grant select, update on emp to L.Suresh;
o SQL DBA>Grant all on emp to Akash with grant option;
Revoke: takes back privileges on one or more tables or views from a user.
o SQL DBA>Revoke Import from Akash;
o SQL DBA>Revoke UPDATE, DELETE FROM INSURES;
o SQL DBA>Revoke all on emp from Akash;

TCL: Transaction Control Language
COMMIT - to save the work done
SAVEPOINT - to identify a point in a transaction to which you can later
roll back
ROLLBACK - to restore the database to its state at the last COMMIT
SET TRANSACTION - to change transaction options, such as the isolation
level and the rollback segment to use
SQL* Commands
This subsection discusses the commands often used in the SQL environment.
For example, if your SQL commands are saved in a file (typically created in
Notepad), you can execute the file using the @ (at) command. There are a
number of such commands; a few are given in Table 7.1. The format of the @
command is shown below:
@<filename>    Runs the command file stored in <filename>
Table 7.1: SQL Command List
/                        Runs the SQL command or PL/SQL block currently
                         stored in the SQL buffer
EXECUTE                  Runs a single PL/SQL statement
RUN                      Runs the SQL command or PL/SQL block currently
                         stored in the SQL buffer
R<filename>              Runs the file specified in <filename>
EXIT or QUIT             Exits from SQL
LIST                     Lists the contents of the buffer
APPEND <text>            Adds text at the end of the line
CLEAR BUFFER             Deletes all the lines in the buffer
GET <filename>           Loads a host OS file into the SQL buffer (does not
                         execute it)
SAVE <filename>          Saves the contents of the buffer to an OS file
DEFINE _EDITOR=notepad   Defines Notepad as the editor
EDIT                     Invokes the editor defined through DEFINE _EDITOR

Data Types in Oracle 8i SQL
Table 7.2 lists the main data types allowed in Oracle.
Table 7.2: Data Types in Oracle
Data Type        Description
CHAR(size)       Fixed-length character data. Max = 2,000
VARCHAR2(size)   Variable-length character data. Max = 4,000
DATE             Date; valid range is from Jan 1, 4712 B.C. to
                 Dec 31, 9999 A.D.
BLOB             Binary large object. Max = 4 GB
CLOB             Character large object. Max = 4 GB
BFILE            Pointer to a binary OS file
LONG             Character data of variable size. Max = 2 GB
LONG RAW         Raw binary data; otherwise the same as LONG
NUMBER(size)     Numbers. Max. size = 38 digits
NUMBER(size,d)   Numbers; range = 1.0E-130 to 9.9E125
DECIMAL          Same as NUMBER. Size/d cannot be specified
FLOAT            Same as NUMBER
INTEGER          Same as NUMBER. Size/d cannot be specified
SMALLINT         Same as NUMBER

Example Tables:
To study the various types of SQL commands, we need some tables. Let us
consider Tables 7.3 and 7.4, which will be used throughout our discussion.
Table 7.3: Employee Relation
Ssn    Name     Bdate       Salary   Mgrssn   Dno
1111   Deepak   5-Jan-62    22,000   4444
2222   Yadav    27-Feb-84   30,000   4444
3333   Venkat   22-Jan-65   18,000   2222
4444   Prasad   2-Feb-68    32,000   Null
5555   Reena    4-Aug-79    8,000    4444

Table 7.4: Department Relation
Dno   Dname      Loc
      Admin      Chennai
      Research   Bangalore
      Accounts   Bangalore
In the above tables, Ssn in the EMPLOYEE table is of integer type, Name is
a string of characters, Bdate is of type date, Salary is a number, and
Mgrssn and Dno are integers. Note that Dno of the EMPLOYEE relation is a
unique key in the DEPARTMENT relation. Therefore, Dno is the primary key of
the DEPARTMENT relation and a foreign key in the EMPLOYEE relation.
Self-Assessment Questions
5. DROP is a __________________________ command.
6. The database is managed with the help of ________________
statement.
7. _______________________ command adds text at the end of the line.
8. __________________ data type is used to describe binary large
object.
9. The maximum capacity for storing a character large object using the CLOB
data type is ________________.

7.4 SQL Tables


To work with tables, you need to know how to create a database and select
data from it, and then how to manipulate its tables.
In the following sections, we will discuss various features of tables and the
activities that can be performed on them.
7.4.1 Data retrieval statement (SELECT)
The SELECT statement is used to extract information from one or more tables
in the database. To demonstrate SELECT, we must have some tables created
within a specific user session. Let us assume a default user and create two
tables, EMPLOYEE and DEPARTMENT, using the CREATE TABLE statement.

CREATE TABLE DEPARTMENT (
  Dno    NUMBER(2)     NOT NULL,
  Dname  VARCHAR2(10)  NOT NULL,
  Loc    VARCHAR2(15),
  PRIMARY KEY (Dno));
CREATE TABLE EMPLOYEE (
  Ssn     NUMBER(4)     NOT NULL,
  Name    VARCHAR2(20)  NOT NULL,
  Bdate   DATE,
  Salary  NUMBER(10,2),
  Mgrssn  NUMBER(4),
  Dno     NUMBER(2)     NOT NULL,
  PRIMARY KEY (Ssn),
  FOREIGN KEY (Mgrssn) REFERENCES EMPLOYEE (Ssn),
  FOREIGN KEY (Dno) REFERENCES DEPARTMENT (Dno));
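The effect of these constraints can be checked with a small sketch. It assumes SQLite through Python's sqlite3 module, so the Oracle types are mapped to SQLite equivalents (INTEGER for NUMBER(n), VARCHAR for VARCHAR2). In SQLite, foreign keys are enforced only after they are switched on with PRAGMA foreign_keys. The department and employee rows and the helper try_insert are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
CREATE TABLE DEPARTMENT (
    Dno    INTEGER      NOT NULL,
    Dname  VARCHAR(10)  NOT NULL,
    Loc    VARCHAR(15),
    PRIMARY KEY (Dno));
CREATE TABLE EMPLOYEE (
    Ssn    INTEGER      NOT NULL,
    Name   VARCHAR(20)  NOT NULL,
    Bdate  DATE,
    Salary NUMERIC(10, 2),
    Mgrssn INTEGER,
    Dno    INTEGER      NOT NULL,
    PRIMARY KEY (Ssn),
    FOREIGN KEY (Mgrssn) REFERENCES EMPLOYEE (Ssn),
    FOREIGN KEY (Dno) REFERENCES DEPARTMENT (Dno));
""")
con.execute("INSERT INTO DEPARTMENT VALUES (1, 'Admin', 'Chennai')")
con.execute("INSERT INTO EMPLOYEE VALUES (4444, 'Prasad', '1968-02-02', 32000, NULL, 1)")


def try_insert(row):
    """Hypothetical helper: insert one EMPLOYEE row and report any constraint error."""
    try:
        con.execute("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?, ?)", row)
        return "inserted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


ok = try_insert((1111, "Deepak", "1962-01-05", 22000, 4444, 1))
bad_dept = try_insert((2222, "Yadav", "1984-02-27", 30000, 4444, 9))  # no department 9
dup_ssn = try_insert((4444, "Copy", None, 0, None, 1))  # Ssn 4444 already exists
print(ok, bad_dept, dup_ssn, sep="\n")
```

The first row is accepted because both its manager (4444) and its department (1) exist. The second row violates the foreign key on Dno, and the third repeats a primary key value.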
As you already know, the general syntax of the SELECT statement is:
SELECT * | {[DISTINCT] column | expression [alias], ...} FROM table(s);
A basic SELECT statement contains two clauses:
A SELECT clause, which specifies the columns to be displayed
A FROM clause, which specifies the table(s) containing those columns
To select all columns, use *. Consider, for example:
SELECT * FROM EMPLOYEE;
The * indicates that all the columns of the EMPLOYEE table should be
retrieved. The output of this query is shown in Table 7.5.
Table 7.5: Output1
Ssn    Name     Bdate       Salary   Mgrssn   Dno
1111   Deepak   5-Jan-62    22,000   4444
2222   Yadav    27-Feb-84   30,000   4444
3333   Venkat   22-Jan-65   18,000   2222
4444   Prasad   2-Feb-68    32,000   Null
5555   Reena    4-Aug-79    8,000    4444

Consider another example:
SELECT * FROM EMPLOYEE
ORDER BY Ssn;
The output of the above query is given in Table 7.6.
Table 7.6: Output2
Ssn    Name     Bdate       Salary   Mgrssn   Dno
1111   Deepak   5-Jan-62    22,000   4444
2222   Yadav    27-Feb-84   30,000   4444
3333   Venkat   22-Jan-65   18,000   2222
4444   Prasad   2-Feb-68    32,000   Null
5555   Reena    4-Aug-79    8,000    4444

Selecting specific columns
If we wish to retrieve only the name and salary of the employees, the query
will be:
SELECT name, salary FROM EMPLOYEE;
The output of this query is given in Table 7.7.
Table 7.7: OUTPUT3
Name     Salary
Prasad   32,000
Reena    8,000
Deepak   22,000
Venkat   18,000
Yadav    30,000

Using arithmetic operators
SELECT name, salary, salary * 12
FROM EMPLOYEE;
The output for the above query is given in Table 7.8.
Table 7.8: Output4
Name     Salary   Salary*12
Prasad   32,000   384,000
Reena    8,000    96,000
Deepak   22,000   264,000
Yadav    30,000   360,000
Venkat   18,000   216,000

Using Aliases (Alternate name given to columns)
SELECT Name, Salary, Salary * 12 "YRLY SALARY"
FROM Employee;
The double quotes are needed because the alias contains a space. The output
of the above query is given in Table 7.9.
Table 7.9: Output5
Name     Salary   YRLY SALARY
Prasad   32,000   384,000
Reena    8,000    96,000
Deepak   22,000   264,000
Yadav    30,000   360,000
Venkat   18,000   216,000
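To check the arithmetic in Tables 7.8 and 7.9, here is a minimal sketch. It assumes SQLite (through Python's sqlite3) in place of Oracle and uses only the Ssn, Name and Salary columns of Table 7.3. Like Oracle, SQLite accepts an alias written after the expression, in double quotes when the alias contains a space.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employee (Ssn INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER)")
con.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                [(1111, "Deepak", 22000), (2222, "Yadav", 30000),
                 (3333, "Venkat", 18000), (4444, "Prasad", 32000),
                 (5555, "Reena", 8000)])

# The expression is computed for every row; the quoted alias becomes its heading
cur = con.execute('SELECT Name, Salary, Salary * 12 "YRLY SALARY" '
                  'FROM Employee ORDER BY Ssn')
headers = [col[0] for col in cur.description]
rows = cur.fetchall()
print(headers)
for row in rows:
    print(row)
```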

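Eliminating duplicate rows
To eliminate duplicate rows, use the keyword DISTINCT.
SELECT DISTINCT Mgrssn FROM Employee;
The output for the above query is shown below:
OUTPUT
MGRSSN
2222
4444
(null)
DISTINCT treats all null values as a single value, so Prasad's null Mgrssn
appears once (SQL*Plus displays it as a blank line).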
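A quick sketch confirms that DISTINCT keeps exactly one null: the five Mgrssn values of Table 7.3 collapse to three distinct ones. It assumes SQLite through Python's sqlite3 rather than Oracle.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employee (Ssn INTEGER PRIMARY KEY, Name TEXT, Mgrssn INTEGER)")
con.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                [(1111, "Deepak", 4444), (2222, "Yadav", 4444),
                 (3333, "Venkat", 2222), (4444, "Prasad", None),
                 (5555, "Reena", 4444)])

all_values = [m for (m,) in con.execute("SELECT Mgrssn FROM Employee")]
distinct_values = [m for (m,) in con.execute("SELECT DISTINCT Mgrssn FROM Employee")]
# Sort with None last so the comparison never mixes None and int
print(len(all_values), sorted(distinct_values, key=lambda v: (v is None, v)))
```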
Displaying table structure
To display the schema of a table, use the DESCRIBE (or DESC) command.
DESC Employee;

The output for the above query is shown in Table 7.10.
Table 7.10: Output6
Name     Null?      Type
SSN      NOT NULL   NUMBER(4)
NAME     NOT NULL   VARCHAR2(20)
BDATE               DATE
SALARY              NUMBER(10,2)
MGRSSN              NUMBER(4)
DNO      NOT NULL   NUMBER(2)

SELECT statement with WHERE clause
Conditions are specified in the WHERE clause. It instructs SQL to search the
data in a table and return only those rows that meet the search criteria.
SELECT * FROM Employee
WHERE Name = 'Yadav';
Character values in a condition are enclosed in single quotes. Table 7.11
shows the result of the above query.
Table 7.11: Output1
Ssn    Name    Bdate       Salary   Mgrssn   Dno
2222   Yadav   27-Feb-84   30,000   4444

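SELECT Name, Salary FROM Employee
WHERE Salary > 20000;
The result of the above query is shown in Table 7.12.
Table 7.12: Output
Name     Salary
Prasad   32,000
Deepak   22,000
Yadav    30,000
If you compare Tables 7.11 and 7.12, you will notice that although the two
queries look similar, their outputs differ: the first returns all the
columns of the row that matches a name, while the second returns only the
Name and Salary columns of the rows that satisfy a numeric comparison.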

Relational operators and comparison conditions
Table 7.13 gives the meaning of the relational operators.
Table 7.13: List of Relational Operators
=                     Equal to
>                     Greater than
>=                    Greater than or equal to
<                     Less than
<=                    Less than or equal to
<>                    Not equal to
BETWEEN <a> AND <b>   Range between <a> and <b>, inclusive
IN <set>              True when the value is a member of <set>
LIKE <pattern>        Matches a specified pattern
IS NULL               Is a null value

BETWEEN ... AND operator
SQL supports range searches with the BETWEEN ... AND operator. For example,
to see all the employees with a salary between Rs.22,000 and Rs.32,000
(both limits included):
SELECT Name, Salary FROM Employee
WHERE Salary BETWEEN 22000 AND 32000;
The output of the above query is given in Table 7.14.
Table 7.14: Output
Name     Salary
Prasad   32,000
Yadav    30,000
Deepak   22,000
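The sketch below shows that BETWEEN includes both limits, so it matches the same rows as a pair of >= and <= comparisons. It assumes SQLite through Python's sqlite3 and the salaries of Table 7.3.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employee (Ssn INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER)")
con.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                [(1111, "Deepak", 22000), (2222, "Yadav", 30000),
                 (3333, "Venkat", 18000), (4444, "Prasad", 32000),
                 (5555, "Reena", 8000)])

in_range = con.execute("SELECT Name, Salary FROM Employee "
                       "WHERE Salary BETWEEN 22000 AND 32000 "
                       "ORDER BY Salary DESC").fetchall()
# The same condition written with two comparisons
explicit = con.execute("SELECT Name, Salary FROM Employee "
                       "WHERE Salary >= 22000 AND Salary <= 32000 "
                       "ORDER BY Salary DESC").fetchall()
outside = [name for (name,) in con.execute(
    "SELECT Name FROM Employee WHERE Salary NOT BETWEEN 22000 AND 32000 ORDER BY Ssn")]
print(in_range)
print(outside)
```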

IS NULL or IS NOT NULL
The IS NULL condition tests for null values in a column, and IS NOT NULL
tests for values that are not null. For example, the query
SELECT Name FROM Employee
WHERE Mgrssn IS NULL;
will result in the following output:
NAME
Prasad
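A common mistake is to write = NULL instead of IS NULL. The sketch below (SQLite through Python's sqlite3, an assumption) shows the difference. Comparing a value with NULL using = gives an unknown result, never true, so that query selects nothing.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employee (Ssn INTEGER PRIMARY KEY, Name TEXT, Mgrssn INTEGER)")
con.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                [(1111, "Deepak", 4444), (2222, "Yadav", 4444),
                 (3333, "Venkat", 2222), (4444, "Prasad", None),
                 (5555, "Reena", 4444)])

is_null = [n for (n,) in con.execute("SELECT Name FROM Employee WHERE Mgrssn IS NULL")]
# "= NULL" compares with an unknown value, so the condition is never true
equals_null = [n for (n,) in con.execute("SELECT Name FROM Employee WHERE Mgrssn = NULL")]
not_null = con.execute(
    "SELECT COUNT(*) FROM Employee WHERE Mgrssn IS NOT NULL").fetchone()[0]
print(is_null, equals_null, not_null)
```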
Sorting (ORDER BY clause)
The ORDER BY clause returns the result in a particular order. For example,
the following queries on an EMP table sort the employees by basic salary,
and by joining date in descending order:
SELECT * FROM emp ORDER BY basic;
SELECT job, ename FROM emp ORDER BY joindate DESC;
DESC at the end of the ORDER BY clause orders the list in descending order
instead of the default (ascending) order.
For example:
SELECT Name FROM EMPLOYEE
ORDER BY Name DESC;
will result in:
NAME
Reena
Pooja
Deepak
Aruna
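The sketch below sorts the names of Table 7.3, so its output differs from the listing above, which uses other sample names. It also shows sorting on several keys. It assumes SQLite through Python's sqlite3. The Dno values are hypothetical, because Table 7.3 leaves that column blank.

```python
import sqlite3

# Table 7.3 rows; the Dno values are hypothetical
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employee (Ssn INTEGER PRIMARY KEY, Name TEXT, "
            "Salary INTEGER, Dno INTEGER)")
con.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)",
                [(1111, "Deepak", 22000, 1), (2222, "Yadav", 30000, 2),
                 (3333, "Venkat", 18000, 3), (4444, "Prasad", 32000, 3),
                 (5555, "Reena", 8000, 3)])

by_name_desc = [n for (n,) in con.execute(
    "SELECT Name FROM Employee ORDER BY Name DESC")]
# Several sort keys: ascending Dno first, then descending Salary within a department
by_dept_salary = con.execute(
    "SELECT Dno, Name, Salary FROM Employee ORDER BY Dno, Salary DESC").fetchall()
print(by_name_desc)
print(by_dept_salary)
```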
LIKE condition:
A WHERE clause with a pattern performs sub-string comparisons. SQL supports
pattern searching through the LIKE operator, which uses two special
characters to describe patterns:
o Percent (%): matches any sub-string, including an empty one.
o Underscore (_): matches exactly one character.
For example:
SELECT emp_name FROM emp
WHERE emp_name LIKE 'Ra%';
will result in:
NAME
Raj
Rama
Ramana

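Consider another example: to select the employees whose name starts with R
and has j as the third character, use:
SELECT * FROM emp
WHERE emp_name LIKE 'R_j%';
In Oracle, LIKE comparisons are case-sensitive, so the letters in the
pattern must have the same case as the stored data.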
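Both wildcards can be tried with the sketch below. It assumes SQLite through Python's sqlite3, and the names Rajan and Kumar are hypothetical additions to the sample list. One difference from Oracle is worth noting: SQLite's LIKE ignores case for ASCII letters by default, so 'r_J%' also matches there.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (emp_name TEXT)")
con.executemany("INSERT INTO emp VALUES (?)",
                [("Raj",), ("Rama",), ("Ramana",), ("Rajan",), ("Kumar",)])


def matching(pattern):
    """Return the names matching a LIKE pattern, in alphabetical order."""
    cur = con.execute(
        "SELECT emp_name FROM emp WHERE emp_name LIKE ? ORDER BY emp_name", (pattern,))
    return [name for (name,) in cur]


print(matching("Ra%"))   # %: any sub-string after "Ra"
print(matching("R_j%"))  # _: exactly one character between R and j
print(matching("R_j"))   # no %: the name must be exactly three characters long
```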
WHERE clause using IN operator
SQL supports searching for items within a list: IN compares a value with
the values enclosed in parentheses.
1. Select the employees who work for department 10 or 20.
SELECT * FROM emp
WHERE deptno IN (10, 20);
2. Select the employees who work on the same project as Raja (employee
number E20).
SELECT eno FROM emp
WHERE pno IN (SELECT pno FROM works_on WHERE empno = 'E20');
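The book does not define the emp and works_on tables, so this sketch assumes hypothetical ones: emp(eno, deptno) and works_on(empno, pno). It also assumes SQLite through Python's sqlite3. An employee may work on several projects, so the sketch looks up projects through works_on on both sides instead of relying on a single emp.pno column.

```python
import sqlite3

# Hypothetical tables and rows, for illustration only
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (eno TEXT PRIMARY KEY, deptno INTEGER)")
con.execute("CREATE TABLE works_on (empno TEXT, pno INTEGER)")
con.executemany("INSERT INTO emp VALUES (?, ?)",
                [("E10", 10), ("E20", 20), ("E30", 30), ("E40", 20)])
con.executemany("INSERT INTO works_on VALUES (?, ?)",
                [("E10", 1), ("E20", 1), ("E20", 2), ("E30", 3), ("E40", 2)])

# 1. IN with a literal list of values
in_depts = [e for (e,) in con.execute(
    "SELECT eno FROM emp WHERE deptno IN (10, 20) ORDER BY eno")]

# 2. IN with a sub query: others working on any project that E20 works on
colleagues = [e for (e,) in con.execute(
    "SELECT DISTINCT empno FROM works_on "
    "WHERE pno IN (SELECT pno FROM works_on WHERE empno = 'E20') "
    "AND empno <> 'E20' ORDER BY empno")]
print(in_depts, colleagues)
```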
Aggregate functions and grouping
The GROUP BY clause groups rows on some common criterion; for example, we
can group the rows of an employee table by department. The employees
working for department number 1 form one group, all the employees working
for department number 2 form another group, and so on. GROUP BY is usually
used together with aggregate functions such as SUM, MAX, MIN, AVG and
COUNT: the aggregate functions in the SELECT list are computed once for
each group, giving summary information.
As an example, consider the following query:
For each department, retrieve the department number, the number of
employees in the department and their average salary. The query will be:
SELECT Dno, COUNT(*) "No. of Employees", AVG(Salary) "Average Salary"
FROM Employee
GROUP BY Dno;

The result of the above query is as shown in Table 7.15.


Table 7.15: Output
Dno

No. of employees

30

40

52

53
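To see one summary row per group, here is a minimal sketch. It assumes SQLite through Python's sqlite3 and the salaries of Table 7.3, with hypothetical Dno values (1 for Deepak, 2 for Yadav, 3 for the others). ROUND is used only to keep the average readable.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employee (Ssn INTEGER PRIMARY KEY, Name TEXT, "
            "Salary INTEGER, Dno INTEGER)")
con.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)",
                [(1111, "Deepak", 22000, 1), (2222, "Yadav", 30000, 2),
                 (3333, "Venkat", 18000, 3), (4444, "Prasad", 32000, 3),
                 (5555, "Reena", 8000, 3)])

# One output row per distinct Dno; COUNT and AVG are computed within each group
rows = con.execute("SELECT Dno, COUNT(*), ROUND(AVG(Salary), 2) "
                   "FROM Employee GROUP BY Dno ORDER BY Dno").fetchall()
# Without GROUP BY, the whole table forms a single group
whole_table = con.execute("SELECT COUNT(*), AVG(Salary) FROM Employee").fetchone()
print(rows)
print(whole_table)
```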

Consider another query: select the total salary for each department.
SELECT deptno, SUM(salary) FROM emp
GROUP BY deptno;
The result is given in Table 7.16.
Table 7.16: Output
DNO

SUM(SALARY)

22,000

18,000

7,000

Consider the following queries:
For each project, retrieve the project number, the project name and the
number of employees who work on that project.
SELECT Pnumber, Pname, COUNT(*)
FROM Project, Works_on
WHERE Pnumber = Pno
GROUP BY Pnumber, Pname;
Retrieve the total number of employees in the Research department.
SELECT COUNT(*) FROM Employee, Department
WHERE Dno = Dnumber AND Dname = 'Research';
Find the sum of the salaries, and the maximum and minimum salary, of all
the employees.
SELECT SUM(Salary), MAX(Salary), MIN(Salary)
FROM Employee;
Sum(Salary)   Max(Salary)   Min(Salary)
110,000       32,000        8,000
Find the sum of the salaries of all the employees of the Research
department, as well as the maximum and minimum salary in this department.
SELECT SUM(Salary), MAX(Salary), MIN(Salary)
FROM Employee, Department
WHERE Dno = Dnumber AND Dname = 'Research';
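The totals above can be reproduced with the sketch below. It assumes SQLite through Python's sqlite3 and the salaries of Table 7.3. The Dno values and the department numbering (1 Admin, 2 Research, 3 Accounts) are hypothetical. Following the queries, the department key is named Dnumber.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employee (Ssn INTEGER PRIMARY KEY, Name TEXT, "
            "Salary INTEGER, Dno INTEGER)")
con.execute("CREATE TABLE Department (Dnumber INTEGER PRIMARY KEY, Dname TEXT)")
con.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)",
                [(1111, "Deepak", 22000, 1), (2222, "Yadav", 30000, 2),
                 (3333, "Venkat", 18000, 3), (4444, "Prasad", 32000, 3),
                 (5555, "Reena", 8000, 3)])
con.executemany("INSERT INTO Department VALUES (?, ?)",
                [(1, "Admin"), (2, "Research"), (3, "Accounts")])

# Aggregates over the whole table give one summary row; no GROUP BY is needed
totals = con.execute(
    "SELECT SUM(Salary), MAX(Salary), MIN(Salary) FROM Employee").fetchone()
# The same aggregates after a join condition and a selection condition
research = con.execute(
    "SELECT COUNT(*), SUM(Salary), MAX(Salary), MIN(Salary) "
    "FROM Employee, Department WHERE Dno = Dnumber AND Dname = 'Research'").fetchone()
print(totals, research)
```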

Having Clause
The HAVING clause filters the groups returned by the GROUP BY clause.
To demonstrate this clause, consider the following queries:
SELECT job, COUNT(*) FROM emp GROUP BY job HAVING COUNT(*) > 20;
SELECT deptno, MAX(basic), MIN(basic) FROM emp GROUP BY deptno
HAVING MAX(basic) > 30000;
Find the average salary of department 1 only.
SELECT Dno, AVG(Salary)
FROM Employee
GROUP BY Dno
HAVING Dno = 1;
For each department that contains more than three employees, retrieve the
department number, Dname and the number of employees working in that
department.
SELECT Emp.Dno, Dname, COUNT(*)
FROM Emp, Dept
WHERE Emp.Dno = Dept.Dno
GROUP BY Emp.Dno, Dname
HAVING COUNT(*) > 3;
Here, the WHERE clause limits the tuples to which the aggregate functions
are applied, and the HAVING clause selects individual groups of tuples.

For each department in which the highest salary of the employees born in
January exceeds Rs.10,000, retrieve the department number and the average
salary of those employees.
SELECT Dno, AVG(Salary)
FROM Employee
WHERE TO_CHAR(Bdate, 'MON') = 'JAN'
GROUP BY Dno
HAVING MAX(Salary) > 10000;
The output will be:
DNO   AVG(SALARY)
1     22,000
2     18,000
3     20,000
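The sketch below shows the difference between WHERE and HAVING: WHERE removes rows before grouping, and HAVING removes whole groups afterwards. It assumes SQLite through Python's sqlite3 and the salaries of Table 7.3, with hypothetical Dno values (1 for Deepak, 2 for Yadav, 3 for the others).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employee (Ssn INTEGER PRIMARY KEY, Name TEXT, "
            "Salary INTEGER, Dno INTEGER)")
con.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)",
                [(1111, "Deepak", 22000, 1), (2222, "Yadav", 30000, 2),
                 (3333, "Venkat", 18000, 3), (4444, "Prasad", 32000, 3),
                 (5555, "Reena", 8000, 3)])

per_dept = con.execute(
    "SELECT Dno, COUNT(*) FROM Employee GROUP BY Dno ORDER BY Dno").fetchall()
# HAVING keeps only the groups whose count exceeds one
big_groups = con.execute(
    "SELECT Dno, COUNT(*) FROM Employee GROUP BY Dno HAVING COUNT(*) > 1").fetchall()
# WHERE drops Reena's row before grouping; HAVING then filters the groups
filtered = con.execute(
    "SELECT Dno, COUNT(*) FROM Employee WHERE Salary > 10000 "
    "GROUP BY Dno HAVING COUNT(*) >= 2").fetchall()
print(per_dept, big_groups, filtered)
```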
Self-Assessment Questions
10. To eliminate duplicate rows from the table __________________
keyword is used.
11. ___________________________ operator is used to match a specified
pattern.

7.5 Multi Table Queries


So far, we have discussed queries that use only one table. In many
situations, however, we have to use more than one table to retrieve data or
to update a referencing table. In this section, we will discuss how to deal
with multiple tables in SQL queries.

Simple equi-joins
We must follow the guidelines given below to join two tables together:
o Table names in the FROM clause are separated by commas.
o Use appropriate joining condition. This means that the foreign key of
Table 1 will be made equal to the primary key of Table 2. This
column acts as the joining attribute. For example, dno of employee
table and dno of department will be involved in the joining condition
of WHERE clause.

The following example demonstrates the equi-join; its purpose is to display
the employee names and the names of the departments they work for.
SELECT Name, Dname
FROM Employee, Department
WHERE Employee.Dno = Department.Dno;

Given below is the output for the query:
NAME     DNAME
Prasad   Accounts
Reena    Accounts
Deepak   Admin
Venkat   Accounts
Pooja    Research
Let us now try to display only the employees working for the Accounts
department.
SELECT Name, Salary, Dname
FROM Employee, Department
WHERE (Employee.Dno = Department.Dno)
AND (Dname = 'Accounts');
The above query results in:
NAME     SALARY   DNAME
Prasad   32,000   Accounts
Reena    8,000    Accounts
Venkat   30,000   Accounts
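Both equi-joins can be run with the sketch below. It assumes SQLite through Python's sqlite3, the rows of Table 7.3, and hypothetical department numbers (1 Admin, 2 Research, 3 Accounts). It also shows that the comma-separated form with the join condition in WHERE gives the same result as the JOIN ... ON form.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employee (Ssn INTEGER PRIMARY KEY, Name TEXT, "
            "Salary INTEGER, Dno INTEGER)")
con.execute("CREATE TABLE Department (Dno INTEGER PRIMARY KEY, Dname TEXT)")
con.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)",
                [(1111, "Deepak", 22000, 1), (2222, "Yadav", 30000, 2),
                 (3333, "Venkat", 18000, 3), (4444, "Prasad", 32000, 3),
                 (5555, "Reena", 8000, 3)])
con.executemany("INSERT INTO Department VALUES (?, ?)",
                [(1, "Admin"), (2, "Research"), (3, "Accounts")])

# Foreign key Employee.Dno matched with primary key Department.Dno
pairs = con.execute("SELECT Name, Dname FROM Employee, Department "
                    "WHERE Employee.Dno = Department.Dno ORDER BY Ssn").fetchall()
ansi = con.execute("SELECT Name, Dname FROM Employee JOIN Department "
                   "ON Employee.Dno = Department.Dno ORDER BY Ssn").fetchall()
# The join condition combined with a selection condition
accounts = con.execute("SELECT Name, Salary, Dname FROM Employee, Department "
                       "WHERE Employee.Dno = Department.Dno "
                       "AND Dname = 'Accounts' ORDER BY Ssn").fetchall()
print(pairs)
print(accounts)
```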
Self-Join and Table Aliases
A self-join is a join that involves the same table twice; table aliases
(here e1 and e2) distinguish the two copies. This technique is useful for
solving many queries, as the following example illustrates.
To find the employees who earn more than Venkat, the query will be:
SELECT e1.Name, e1.Salary
FROM Employee e1, Employee e2
WHERE (e1.Salary > e2.Salary) AND (e2.Name = 'Venkat');
And the result is:
NAME     SALARY
Prasad   32,000
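The same self-join is run below against the salaries of Table 7.3, using SQLite through Python's sqlite3 (an assumption). In Table 7.3 Venkat earns Rs.18,000, so three employees qualify here. The single result row above comes from example data in which Venkat earns Rs.30,000.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employee (Ssn INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER)")
con.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                [(1111, "Deepak", 22000), (2222, "Yadav", 30000),
                 (3333, "Venkat", 18000), (4444, "Prasad", 32000),
                 (5555, "Reena", 8000)])

# e2 is pinned to Venkat's row; e1 ranges over every employee
richer = con.execute(
    "SELECT e1.Name, e1.Salary FROM Employee e1, Employee e2 "
    "WHERE e1.Salary > e2.Salary AND e2.Name = 'Venkat' "
    "ORDER BY e1.Salary").fetchall()
print(richer)
```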
Outer Joins
Outer joins are used to display, in addition, the rows that do not meet the
join condition. In Oracle's join syntax, the plus sign (+) is placed on the
side of the condition that belongs to the table which may have no matching
row: for a left outer join, place (+) on the right side of the condition,
and for a right outer join, place it on the left side. The syntax for left
and right outer joins is given below:
Left outer join
SELECT t1.col, t2.col
FROM table1 t1, table2 t2
WHERE t1.col = t2.col (+);
Right outer join
SELECT t1.col, t2.col
FROM table1 t1, table2 t2
WHERE t1.col (+) = t2.col;
Notice that the plus sign cannot be placed on both sides of the condition.
The example below demonstrates the right outer join by retaining the
right-side table (Department) tuples and giving null values for the tuples
that do not match the left-side table (Employee).
SELECT Name, Dname
FROM Employee E, Department D
WHERE E.Name(+) = D.Dname;
The output of the above query is shown below. Because no employee name
equals a department name, every department appears with a null NAME:
NAME     DNAME
         Accounts
         Admin
         Research
Consider another example, which is the same as the previous example; the
only difference is that it is a left outer join. So all the left table
(Employee) rows are kept, and where no match occurs in the right-side table
(Department), a null is shown.
SELECT Name, Dname
FROM Employee E, Department D
WHERE E.Name = D.Dname(+);
The output of the above query is:
NAME     DNAME
Deepak
Venkat
Pooja
Prasad
Reena
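SQLite does not accept Oracle's (+) notation. This sketch, which assumes SQLite through Python's sqlite3, therefore writes the same joins in the standard LEFT JOIN ... ON form. A right outer join is expressed as a left outer join with the tables swapped. The Dno values and the extra Sales department are hypothetical. The last query uses a meaningful condition on Dno, so the department with no employees still appears.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employee (Ssn INTEGER PRIMARY KEY, Name TEXT, Dno INTEGER)")
con.execute("CREATE TABLE Department (Dno INTEGER PRIMARY KEY, Dname TEXT)")
con.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                [(1111, "Deepak", 1), (2222, "Yadav", 2), (3333, "Venkat", 3),
                 (4444, "Prasad", 3), (5555, "Reena", 3)])
con.executemany("INSERT INTO Department VALUES (?, ?)",
                [(1, "Admin"), (2, "Research"), (3, "Accounts"), (4, "Sales")])

# Oracle: WHERE E.Name(+) = D.Dname  (keep every Department row)
right_style = con.execute(
    "SELECT e.Name, d.Dname FROM Department d LEFT JOIN Employee e "
    "ON e.Name = d.Dname ORDER BY d.Dno").fetchall()
# Oracle: WHERE E.Name = D.Dname(+)  (keep every Employee row)
left_style = con.execute(
    "SELECT e.Name, d.Dname FROM Employee e LEFT JOIN Department d "
    "ON e.Name = d.Dname ORDER BY e.Ssn").fetchall()
# COUNT(e.Ssn) ignores the nulls produced for a department with no employees
head_count = con.execute(
    "SELECT d.Dname, COUNT(e.Ssn) FROM Department d LEFT JOIN Employee e "
    "ON e.Dno = d.Dno GROUP BY d.Dno, d.Dname ORDER BY d.Dno").fetchall()
print(right_style, left_style, head_count, sep="\n")
```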
7.5.1 Nested queries or sub queries
A WHERE clause generally contains a condition, but it can also contain an
SQL query. The query within the WHERE clause is called the inner query or
sub query, and the query that encloses it is called the outer query or main
query. It is also possible to place the inner query within a FROM or HAVING
clause. Using nested queries, it is possible to build powerful SQL programs.
Execution of nested queries
The general syntax of a nested query is given below:
SELECT  <column(s)>
FROM    table                  ------ outer query
WHERE   <condn> operator
        (SELECT <column>
         FROM table);          ------ inner query
The operator mentioned in the outer query can be any one of >, = or IN.
Normally, the outer query uses the result of the inner query to display the
values of the columns mentioned in the outer query.
Single-row nested queries
The simplest single-row nested query uses the = sign.
For example, assume that we wish to display the names of the employees
working for the Accounts department.
SELECT Name
FROM Employee
WHERE Dno =
      (SELECT Dno
       FROM Department
       WHERE Dname = 'Accounts');
The output of the above query is:
NAME
Prasad
Reena
Venkat
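The sketch below runs this single-row sub query and the average-salary sub query discussed next. It assumes SQLite through Python's sqlite3, the rows of Table 7.3, and hypothetical department numbers (1 Admin, 2 Research, 3 Accounts). The outer query is not restricted to department 3, so anyone earning at least that department's average qualifies. One dialect difference: Oracle raises an error if a single-row sub query returns several rows, whereas SQLite silently uses the first row.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employee (Ssn INTEGER PRIMARY KEY, Name TEXT, "
            "Salary INTEGER, Dno INTEGER)")
con.execute("CREATE TABLE Department (Dno INTEGER PRIMARY KEY, Dname TEXT)")
con.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)",
                [(1111, "Deepak", 22000, 1), (2222, "Yadav", 30000, 2),
                 (3333, "Venkat", 18000, 3), (4444, "Prasad", 32000, 3),
                 (5555, "Reena", 8000, 3)])
con.executemany("INSERT INTO Department VALUES (?, ?)",
                [(1, "Admin"), (2, "Research"), (3, "Accounts")])

# The inner query yields one value (3), which the outer query compares against
accounts = [n for (n,) in con.execute(
    "SELECT Name FROM Employee WHERE Dno = "
    "(SELECT Dno FROM Department WHERE Dname = 'Accounts') ORDER BY Ssn")]
# The inner query yields department 3's average salary (about 19,333.33)
at_least_avg = con.execute(
    "SELECT Name, Salary FROM Employee WHERE Salary >= "
    "(SELECT AVG(Salary) FROM Employee WHERE Dno = 3) ORDER BY Ssn").fetchall()
print(accounts, at_least_avg)
```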
GROUP BY clause in sub queries
To display all the employees drawing a salary greater than or equal to the
average salary of department number 3, the following query can be used:
SELECT Name, Salary
FROM Employee
WHERE Salary >=
      (SELECT AVG(Salary)
       FROM Employee
       GROUP BY Dno
       HAVING Dno = 3);
The output of the above query is:
NAME     SALARY
Prasad   32,000
Venkat   30,000
7.5.2 Multiple-row nested queries
The operators IN, ANY and ALL are used in multiple-row sub queries, where
the sub query returns more than one row. These operators are described in
Table 7.17.
Table 7.17: Operators in Multiple-Row Nested Query
Operator   Description
IN         Equal to any member in the list
ANY        Compare the value to each value returned by the sub query
ALL        Compare the value to all the values returned by the sub query
Consider an example of a query to display the name of the highest paid
employee:
SELECT Name, Salary
FROM Employee
WHERE Salary IN
      (SELECT MAX(Salary)
       FROM Employee);
Remember that multiple-row sub queries expect one or more results. In this
example, the inner query returns a single value; the next examples return a
set of values. Table 7.18 shows how to use ANY and ALL.

Table 7.18: Use of Operators
Operator   Meaning                 Example
< ANY      Less than the maximum   e < ANY (5, 3, 8): e is less than at least
                                   one item in the list (5, 3, 8). Even 7
                                   qualifies, because 7 < 8.
> ANY      More than the minimum   e > ANY (5, 3, 8): e is greater than at
                                   least one item in the list (5, 3, 8).
                                   Even 4 qualifies, because 4 > 3.
= ANY      Same as IN              e = ANY (5, 3, 8): all the values in the
                                   list qualify.
< ALL      Less than the minimum   e < ALL (5, 3, 8): anything below 3
                                   qualifies.
> ALL      More than the maximum   e > ALL (5, 3, 8): anything greater than 8
                                   qualifies.
!= ALL     Not equal to anything   e != ALL (5, 3, 8): anything other than 5,
                                   3 and 8 qualifies.

Consider the following examples to illustrate the above operators:
Consider query 1:
SELECT Name, Salary
FROM Employee
WHERE Salary < ANY
      (SELECT Salary
       FROM Employee
       WHERE Dno = 3);
The above query results in:
NAME     SALARY
Reena    8,000
Deepak   22,000
Venkat   30,000
Pooja    18,000
Here every row qualifies except the one with the maximum salary in the
result of the sub query (Rs.32,000).
Consider query 2:
SELECT Name, Salary
FROM Employee
WHERE Salary > ANY
      (SELECT Salary
       FROM Employee
       WHERE Dno = 3);
The output of the above query will be:
NAME     SALARY
Prasad   32,000
Deepak   22,000
Venkat   30,000
Pooja    18,000
In query 2, all the rows qualify except the row with salary Rs.8,000,
because every other employee draws more than the minimum salary in the
result of the sub query. If the condition is >= ANY, you get all the rows.
Consider query 3:
SELECT Name, Salary
FROM Employee
WHERE Salary < ALL
      (SELECT Salary
       FROM Employee
       WHERE Dno = 3);
The output of the above query is:
No rows selected.
Only the employees who draw a salary lower than the minimum value in the
set would be displayed. Here, nobody draws less than Rs.8,000 and hence
there is no output.
Consider query 4:
SELECT Name, Salary
FROM Employee
WHERE Salary > ALL
      (SELECT Salary
       FROM Employee
       WHERE Dno = 3);
The result is:
No rows selected.
As in the previous query, nobody draws more than the maximum in the set, so
again there is no output.

Consider query 5:
SELECT Name, Salary
FROM Employee
WHERE Salary != ALL
      (SELECT Salary
       FROM Employee
       WHERE Dno = 3);
The output of the above query is:
NAME     SALARY
Deepak   22,000
Pooja    18,000
This query outputs the employees whose salaries are not in the set returned
by the sub query.
7.5.3 The exists clause
The EXISTS clause returns true in a WHERE clause if the sub query that
follows returns at least one row.
Consider query 1:
Assume that we want to display the names of the employees who work for the
Accounts department. We can write it as:
SELECT Name
FROM Employee E
WHERE EXISTS
      (SELECT * FROM Department D
       WHERE E.Dno = D.Dno AND Dname = 'Accounts');
The result of the above query is:
NAME
Prasad
Reena
Venkat
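The sub query inside EXISTS is correlated: it refers to E.Dno from the outer query, so it is evaluated for each employee row. The sketch below (SQLite through Python's sqlite3, with hypothetical department numbers) runs this EXISTS query and its NOT EXISTS counterpart.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employee (Ssn INTEGER PRIMARY KEY, Name TEXT, Dno INTEGER)")
con.execute("CREATE TABLE Department (Dno INTEGER PRIMARY KEY, Dname TEXT)")
con.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                [(1111, "Deepak", 1), (2222, "Yadav", 2), (3333, "Venkat", 3),
                 (4444, "Prasad", 3), (5555, "Reena", 3)])
con.executemany("INSERT INTO Department VALUES (?, ?)",
                [(1, "Admin"), (2, "Research"), (3, "Accounts")])

CORRELATED = ("(SELECT * FROM Department D "
              "WHERE E.Dno = D.Dno AND D.Dname = 'Accounts')")
# EXISTS is true when the correlated sub query returns at least one row
in_accounts = [n for (n,) in con.execute(
    f"SELECT Name FROM Employee E WHERE EXISTS {CORRELATED} ORDER BY Ssn")]
not_in_accounts = [n for (n,) in con.execute(
    f"SELECT Name FROM Employee E WHERE NOT EXISTS {CORRELATED} ORDER BY Ssn")]
print(in_accounts, not_in_accounts)
```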

7.6 Data Manipulation Language
A Data Manipulation Language (DML) consists of SQL statements that are used
to insert, delete and update the records in a table.

Insert statement
The general syntax to add a new row to a table is given below:
INSERT INTO table [(column-1, column-2, ...)]
VALUES (value-1, value-2, ...);
Using this syntax, you can insert only one row at a time. To insert more
than one row, execute the INSERT statement repeatedly. The simplest example
of the INSERT statement is shown below, where <Dno> stands for Deepak's
department number:
INSERT INTO Employee
VALUES (1111, 'Deepak', '5-JAN-82', 0000, 4444, <Dno>);
To enter more records, we can use / (the slash symbol), which executes the
command stored in the buffer. With substitution variables (names prefixed
with &), SQL*Plus prompts for a new value each time the statement runs:
INSERT INTO EMPLOYEE (empno, eaddr, basic)
VALUES (&empno, '&eaddr', &ba);

Delete command
DELETE is a DML statement that deletes record(s). The general syntax of the
DELETE command is given below:
DELETE FROM table WHERE cond;
Remember that if the WHERE condition is not present in the query, all the
rows in the table are deleted.
For example, DELETE FROM EMPLOYEE WHERE name = 'Yadav';
Update command
UPDATE is used to change existing values in a table. The general syntax of
the UPDATE command is given below:
UPDATE table SET col1 = val1 [, col2 = val2, ...] [WHERE cond];
If the WHERE condition is not present in the query, all the rows in the
table are updated. For example, the following statement sets deptno to 100
for every employee:
UPDATE EMPLOYEE SET deptno = 100;
The following statement changes only the row whose empno is 100:
UPDATE EMPLOYEE SET ename = 'Sourav' WHERE empno = 100;
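The effect of the WHERE clause on DELETE and UPDATE can be seen from the number of rows each statement touches. The sketch assumes SQLite through Python's sqlite3, and the three employee rows are hypothetical. The ? placeholders are filled in from Python at run time, which is loosely comparable to SQL*Plus substitution variables such as &empno.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (empno INTEGER PRIMARY KEY, ename TEXT, deptno INTEGER)")
# INSERT adds one row per execution (hypothetical rows)
for row in [(100, "Ravi", 10), (101, "Yadav", 20), (102, "Reena", 20)]:
    con.execute("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", row)

# rowcount reports how many rows each statement changed
updated_one = con.execute(
    "UPDATE EMPLOYEE SET ename = 'Sourav' WHERE empno = 100").rowcount
deleted = con.execute("DELETE FROM EMPLOYEE WHERE ename = 'Yadav'").rowcount
updated_all = con.execute("UPDATE EMPLOYEE SET deptno = 100").rowcount  # no WHERE
rows = con.execute("SELECT empno, ename, deptno FROM EMPLOYEE ORDER BY empno").fetchall()
print(updated_one, deleted, updated_all, rows)
```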

Transaction control language
As you already know, TCL commands are used to control transactions.
Examples of TCL commands are:
Commit - saves the changes permanently in the database.
Rollback - discards the changes made since the previous commit point.
Savepoint - marks a point within a transaction to which the changes can
later be rolled back.
For example,
COMMIT;
INSERT ...;
UPDATE ...;
SAVEPOINT aa;
DELETE ...;
...
ROLLBACK TO aa;
Here ROLLBACK TO aa undoes the DELETE (and any later statements) but keeps
the INSERT and UPDATE, which remain pending until the next COMMIT.
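This sequence can be replayed with the sketch below. It assumes SQLite through Python's sqlite3, which accepts BEGIN, SAVEPOINT, ROLLBACK TO and COMMIT. The connection is opened with isolation_level=None so that the module does not start transactions on its own. The table and rows are hypothetical.

```python
import sqlite3

# isolation_level=None: the module does not open transactions itself,
# so BEGIN / SAVEPOINT / ROLLBACK TO / COMMIT are issued explicitly
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT)")

con.execute("BEGIN")
con.execute("INSERT INTO emp VALUES (101, 'Deepak')")
con.execute("UPDATE emp SET ename = 'Deepak K' WHERE empno = 101")
con.execute("SAVEPOINT aa")
con.execute("DELETE FROM emp WHERE empno = 101")
con.execute("ROLLBACK TO aa")   # undoes only the DELETE
con.execute("COMMIT")           # makes the INSERT and UPDATE permanent

con.execute("BEGIN")
con.execute("INSERT INTO emp VALUES (102, 'Reena')")
con.execute("ROLLBACK")         # discards everything since the last COMMIT

rows = con.execute("SELECT empno, ename FROM emp").fetchall()
print(rows)
```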
Creating and altering database objects (DDL):
The aim of this section is to introduce the ways to create and manipulate
the following database objects.
Table - A tabular structure that stores data.
View - A tabular structure similar to a table, but a collection of data
pulled out from one or more tables.
Sequence - Automatically generates a sequence of numbers.
Index - Provides an efficient access structure.
Self-Assessment Questions
12. According to simple equi-join guidelines to join two tables together table
names in _____________________ clause are separated by commas.
13. The general syntax for delete command to delete a record(s) is given
by ______________________________________________________
________________________________________________________
14. Commit is __________________ command.

7.7 Creating Databases


Now we shall discuss in detail the ways to create and alter tables with
constraints.
The general syntax to create a table is given below:
CREATE TABLE tablename (column_1 datatype, column_2 datatype, ...);
You need to specify the name of the table (it should be unique) and one or
more attributes and their data types.
For example,
CREATE TABLE Employee (
  SSN     NUMBER(4)     NOT NULL,
  NAME    VARCHAR2(20)  NOT NULL,
  BDATE   DATE,
  Mgrssn  NUMBER(4),
  PRIMARY KEY (SSN),
  FOREIGN KEY (Mgrssn) REFERENCES Employee (SSN));
Alter table statement
After a table has been created, one or more columns can be added to it.
Similarly, columns can be dropped (applies to Oracle 9i only); in either
case, the other existing columns of the table are not affected. For
example, assume that we wish to add a phone number column to the employee
table:
ALTER TABLE Employee
ADD phone NUMBER(7) NOT NULL;
If the table already contains rows, a NOT NULL column can be added only
together with a DEFAULT value.
Using the same ALTER command, you can modify the data type of a column. For
example, the phone column can be changed from NUMBER to VARCHAR2 (such a
change of type is allowed only while the column contains no data):
ALTER TABLE Employee
MODIFY phone VARCHAR2(10);
Oracle 8i does not support dropping a column, but Oracle 9i does:
ALTER TABLE Employee
DROP COLUMN phone;
A table can be dropped even when it contains data.
SYNTAX: DROP TABLE table;
For example, to drop the employee table, use the following statement:
DROP TABLE Employee;
To rename a table, use the RENAME statement as shown below:
RENAME Employee TO workers;
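Adding, renaming and dropping can be tried in the sketch below, which assumes SQLite through Python's sqlite3. SQLite differs from Oracle here: it has no MODIFY clause, and it renames a table with ALTER TABLE ... RENAME TO rather than a separate RENAME statement. Like Oracle, SQLite refuses to add a NOT NULL column without a default, so the sketch adds phone as a nullable column. The table contents are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employee (Ssn INTEGER PRIMARY KEY, Name TEXT NOT NULL)")
con.execute("INSERT INTO Employee VALUES (1111, 'Deepak')")

con.execute("ALTER TABLE Employee ADD COLUMN phone VARCHAR(10)")  # existing rows get NULL
con.execute("ALTER TABLE Employee RENAME TO workers")             # SQLite's rename syntax

columns = [row[1] for row in con.execute("PRAGMA table_info(workers)")]
tables = [name for (name,) in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
phone = con.execute("SELECT phone FROM workers WHERE Ssn = 1111").fetchone()[0]

con.execute("DROP TABLE workers")  # the table and its data are removed together
remaining = con.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'").fetchone()[0]
print(columns, tables, phone, remaining)
```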
VIEWS (Virtual Table)
View is a derived table, which doesnt have storage of its own. Views are
created by picking certain columns from the base table. The advantages of
using views are:
It restricts direct data access from tables, that is, it provides security.
Reduces joining of tables each and every time.
General syntax to create views is given below:
Create view<view name> as select <column/s>
where<condition>

from<table/s>

For example, Create view V1 as Select ssm.ma,e.sa; aru from EMPLOYEE


where Desc V1;
You can create views by referring to more than one table.
Note:
If the base table has NOT NULL columns that are not included in the view, you cannot insert records through the view.
If a view is created by referring to more than one table, DML operations generally cannot be performed through it; only SELECT is allowed.
Updating through a view (INSERT, DELETE, UPDATE) is generally possible only if the view is created from a single table.
For example,
INSERT INTO V1 VALUES (11, 'SURESH', 50000);
UPDATE V1 SET ename = 'Akash' WHERE empno = 101;
CREATE VIEW V2 AS
SELECT empno, ename, dept.deptno, deptname
FROM emp, dept
WHERE emp.deptno = dept.deptno;
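A minimal sketch of a single-table view and a join view, assuming Python's sqlite3 module; the emp and dept rows, and the salary column of emp, are hypothetical. V1 is defined here on the emp table. One dialect difference matters: SQLite treats every view as read-only (unless INSTEAD OF triggers are defined), so an INSERT through V1 fails there even though Oracle would accept it for a single-table view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, deptname TEXT);
    CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT NOT NULL,
                      salary INTEGER, deptno INTEGER REFERENCES dept (deptno));
    INSERT INTO dept VALUES (10, 'Research'), (20, 'Sales');
    INSERT INTO emp VALUES (101, 'Ravi', 40000, 10), (102, 'Meena', 55000, 20);

    -- Single-table view exposing only some columns of emp
    CREATE VIEW V1 AS SELECT empno, ename, salary FROM emp;

    -- Multi-table view; the WHERE clause is the join condition
    CREATE VIEW V2 AS
        SELECT empno, ename, dept.deptno, deptname
        FROM emp, dept
        WHERE emp.deptno = dept.deptno;
""")

v1_rows = conn.execute("SELECT * FROM V1 ORDER BY empno").fetchall()
v2_rows = conn.execute("SELECT ename, deptname FROM V2 ORDER BY empno").fetchall()
print(v1_rows)  # [(101, 'Ravi', 40000), (102, 'Meena', 55000)]
print(v2_rows)  # [('Ravi', 'Research'), ('Meena', 'Sales')]

# SQLite views are read-only, so DML through V1 raises an error here
try:
    conn.execute("INSERT INTO V1 VALUES (11, 'SURESH', 50000)")
    insert_allowed = True
except sqlite3.OperationalError:
    insert_allowed = False
print(insert_allowed)  # False
```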
Indexing
An index provides faster access to the rows of a table when searching on the indexed columns. Indexes can also be used to ensure that no duplicate values are entered into a column, for example, the primary key of a table.
The general syntax for creating an index is given below:
CREATE INDEX index_name ON table_name (column1, column2);

For example:
Create index ind1 ON EMPLOYEE(empno);
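To show both uses of an index mentioned above, here is a minimal sketch assuming Python's sqlite3 module. ind1 is the ordinary index from the example; the ssn column and the unique index ind2 are hypothetical additions that demonstrate how a UNIQUE index rejects duplicate values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (empno INTEGER, ssn TEXT, ename TEXT)")

# Ordinary index, as in the example: speeds up searches on empno
conn.execute("CREATE INDEX ind1 ON EMPLOYEE (empno)")
# UNIQUE index: additionally rejects a second row with the same ssn
conn.execute("CREATE UNIQUE INDEX ind2 ON EMPLOYEE (ssn)")

conn.execute("INSERT INTO EMPLOYEE VALUES (101, '123456789', 'Akash')")
try:
    conn.execute("INSERT INTO EMPLOYEE VALUES (102, '123456789', 'Suresh')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

index_names = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'"))
print(duplicate_rejected)  # True
print(index_names)         # ['ind1', 'ind2']
```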
Query 1:
Retrieve the name and address of all employees who work for the Research department.
SELECT FNAME, LNAME, ADDRESS
FROM EMPLOYEE, DEPARTMENT
WHERE DNAME = 'Research' AND DNUMBER = DNO;
Query 1 is similar to a SELECT-PROJECT-JOIN sequence of relational algebra operations, and such queries are often called select-project-join queries. In the WHERE clause of Q1, the condition DNAME = 'Research' is the selection condition and corresponds to a SELECT operation in relational algebra, while the condition DNUMBER = DNO is the join condition and corresponds to a JOIN operation.
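The three parts of a select-project-join query can be seen by running Q1 on a tiny data set. This is a minimal sketch assuming Python's sqlite3 module; only the columns Q1 needs are created, and the rows are hypothetical. An ORDER BY is added so that the output is repeatable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPARTMENT (DNAME TEXT, DNUMBER INTEGER PRIMARY KEY);
    CREATE TABLE EMPLOYEE (FNAME TEXT, LNAME TEXT, ADDRESS TEXT, DNO INTEGER);
    INSERT INTO DEPARTMENT VALUES ('Research', 5), ('Administration', 4);
    INSERT INTO EMPLOYEE VALUES
        ('John', 'Smith', 'Houston', 5),
        ('Franklin', 'Wong', 'Houston', 5),
        ('Alicia', 'Zelaya', 'Spring', 4);
""")

# DNAME = 'Research' is the selection condition, DNUMBER = DNO the join
# condition, and the SELECT list is the projection
rows = conn.execute("""
    SELECT FNAME, LNAME, ADDRESS
    FROM EMPLOYEE, DEPARTMENT
    WHERE DNAME = 'Research' AND DNUMBER = DNO
    ORDER BY LNAME
""").fetchall()
print(rows)  # [('John', 'Smith', 'Houston'), ('Franklin', 'Wong', 'Houston')]
```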
Other important examples:
Company database example:
This example uses the following tables; the primary key columns of each table are listed after it:
Employee (SSN char(9), Name varchar2(10), Bdate date, Address varchar2(30), Sex char(1), Salary number(10,2), SuperSSN char(9), Dno number(2)); primary key: SSN
Department (Dnumber number(2), Dname varchar2(10), MgrSSN char(9), Mgrstartdate date); primary key: Dnumber
Project (Pnumber number(2), Pname varchar2(10), Plocation varchar2(15), Dnum number(2)); primary key: Pnumber
Dependent (ESSN char(9), Dependent_name varchar2(15), Sex char, Bdate date, Relationship varchar2(10)); primary key: (ESSN, Dependent_name)
Dept_locations (Dnumber number(2), Dlocation varchar2(15)); primary key: (Dnumber, Dlocation)
Works_on (ESSN char(9), Pno number(2), Hours number(3,1)); primary key: (ESSN, Pno)


Query 1:
Retrieve the name and address of all employees who work for the Research department.
Q1:
SELECT FNAME, LNAME, ADDRESS
FROM EMPLOYEE, DEPARTMENT
WHERE DNAME = 'Research' AND DNUMBER = DNO;
Query 2:
Retrieve the birth date and address of the employee whose name is John B. Smith.
Q2:
SELECT BDATE, ADDRESS
FROM EMPLOYEE
WHERE FNAME = 'John' AND MINIT = 'B' AND LNAME = 'Smith';
This query involves only the EMPLOYEE relation listed in the FROM
clause.
Query 3:
For every project located in Stafford, list the project number, the controlling department number and the department manager's last name, address and birth date.
Q3:
SELECT PNUMBER, DNUM, LNAME, ADDRESS, BDATE
FROM PROJECT, DEPARTMENT, EMPLOYEE
WHERE DNUM = DNUMBER AND MGRSSN = SSN AND PLOCATION = 'Stafford';
The join condition DNUM = DNUMBER relates a project to its controlling department, whereas the join condition MGRSSN = SSN relates the controlling department to the employee who manages that department.
Query 4:
Retrieve the name of each employee who has a dependent with the same first name as the employee.
Q4:
SELECT E.FNAME, E.LNAME
FROM EMPLOYEE E
WHERE E.SSN IN (SELECT ESSN
                FROM DEPENDENT
                WHERE ESSN = E.SSN AND E.FNAME = DEPENDENT_NAME);

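Query 5:
Retrieve the name of each employee who works on all the projects controlled by department number 5.
Q5:
SELECT FNAME, LNAME
FROM EMPLOYEE
WHERE ((SELECT PNO FROM WORKS_ON WHERE SSN = ESSN)
       CONTAINS
       (SELECT PNUMBER FROM PROJECT WHERE DNUM = 5));
The CONTAINS operator compares two sets of values and is true if the first set contains every value in the second set. It was part of the original SQL implementation but is not supported by most current SQL systems.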
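Because CONTAINS is not available in most systems, the same query is usually written with two nested NOT EXISTS conditions: select an employee if there is no department-5 project that the employee does not work on. This is a minimal sketch assuming Python's sqlite3 module, with hypothetical rows and only the columns Q5 needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, FNAME TEXT, LNAME TEXT);
    CREATE TABLE PROJECT (PNUMBER INTEGER PRIMARY KEY, DNUM INTEGER);
    CREATE TABLE WORKS_ON (ESSN TEXT, PNO INTEGER);
    INSERT INTO EMPLOYEE VALUES
        ('111', 'John', 'Smith'), ('222', 'Franklin', 'Wong'), ('333', 'Alicia', 'Zelaya');
    INSERT INTO PROJECT VALUES (1, 5), (2, 5), (10, 4);
    INSERT INTO WORKS_ON VALUES ('111', 1), ('111', 2), ('222', 2), ('333', 10);
""")

# "Works on every department-5 project" rewritten as "there is no
# department-5 project that this employee does not work on"
rows = conn.execute("""
    SELECT FNAME, LNAME
    FROM EMPLOYEE
    WHERE NOT EXISTS (
        SELECT * FROM PROJECT
        WHERE DNUM = 5
          AND NOT EXISTS (SELECT * FROM WORKS_ON
                          WHERE ESSN = SSN AND PNO = PNUMBER))
""").fetchall()
print(rows)  # [('John', 'Smith')]
```

Note that if department 5 controlled no projects at all, the inner condition would be vacuously true and every employee would be listed.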

Query 6:
List the names of managers who have at least one dependent.
Q6:
SELECT FNAME, LNAME
FROM EMPLOYEE
WHERE EXISTS (SELECT * FROM DEPENDENT WHERE SSN = ESSN)
  AND EXISTS (SELECT * FROM DEPARTMENT WHERE SSN = MGRSSN);

Q6 specifies two nested correlated queries: the first selects all DEPENDENT tuples related to an EMPLOYEE, and the second selects all DEPARTMENT tuples managed by that EMPLOYEE. The employee is listed only if both queries return at least one tuple.
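A minimal sketch of Q6 on hypothetical data, assuming Python's sqlite3 module: Smith has a dependent but manages no department, Wallace manages a department but has no dependent, and only Wong satisfies both EXISTS conditions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, FNAME TEXT, LNAME TEXT);
    CREATE TABLE DEPENDENT (ESSN TEXT, DEPENDENT_NAME TEXT);
    CREATE TABLE DEPARTMENT (DNUMBER INTEGER PRIMARY KEY, MGRSSN TEXT);
    INSERT INTO EMPLOYEE VALUES
        ('111', 'John', 'Smith'), ('222', 'Franklin', 'Wong'), ('333', 'Jennifer', 'Wallace');
    INSERT INTO DEPENDENT VALUES ('111', 'Alice'), ('222', 'Theodore');
    INSERT INTO DEPARTMENT VALUES (5, '222'), (4, '333');
""")

# Each EXISTS subquery is correlated with the outer EMPLOYEE row through SSN
rows = conn.execute("""
    SELECT FNAME, LNAME
    FROM EMPLOYEE
    WHERE EXISTS (SELECT * FROM DEPENDENT WHERE SSN = ESSN)
      AND EXISTS (SELECT * FROM DEPARTMENT WHERE SSN = MGRSSN)
""").fetchall()
print(rows)  # [('Franklin', 'Wong')]
```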
Query 7:
For each employee, retrieve the employee's first and last name and the first and last name of his or her immediate supervisor.
Q7:
SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
FROM EMPLOYEE E, EMPLOYEE S
WHERE E.SUPERSSN = S.SSN;
In this case, we declare alternative relation names E and S, called aliases, for the EMPLOYEE relation: E refers to the relation in the role of supervisee and S to the same relation in the role of supervisor.
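The self-join can be checked on a three-row hypothetical EMPLOYEE table, assuming Python's sqlite3 module. Borg has no supervisor (SUPERSSN is NULL), so the equality condition never holds for him as E and he appears only in the supervisor role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, FNAME TEXT, LNAME TEXT, SUPERSSN TEXT);
    INSERT INTO EMPLOYEE VALUES
        ('888', 'James', 'Borg', NULL),
        ('333', 'Franklin', 'Wong', '888'),
        ('123', 'John', 'Smith', '333');
""")

# E and S are two aliases for the same table
rows = conn.execute("""
    SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
    FROM EMPLOYEE E, EMPLOYEE S
    WHERE E.SUPERSSN = S.SSN
    ORDER BY E.LNAME
""").fetchall()
print(rows)  # [('John', 'Smith', 'Franklin', 'Wong'), ('Franklin', 'Wong', 'James', 'Borg')]
```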

Query 8:
Make a list of all project numbers for projects that involve an employee whose last name is Smith, either as a worker or as the manager of the department that controls the project.
Q8:
(SELECT PNUMBER
 FROM PROJECT, DEPARTMENT, EMPLOYEE
 WHERE DNUM = DNUMBER AND MGRSSN = SSN AND LNAME = 'Smith')
UNION
(SELECT PNUMBER
 FROM PROJECT, WORKS_ON, EMPLOYEE
 WHERE PNUMBER = PNO AND ESSN = SSN AND LNAME = 'Smith');
The first SELECT query retrieves the projects that involve Smith as the manager of the department that controls the project, and the second retrieves the projects that involve Smith as a worker on the project.
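A minimal sketch of Q8 on hypothetical data, assuming Python's sqlite3 module. SQLite does not accept parentheses around the operands of UNION, so they are dropped here, and ORDER BY 1 is added for repeatable output. Smith manages department 4 (which controls project 30) and works on project 1, so the union returns both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, LNAME TEXT);
    CREATE TABLE DEPARTMENT (DNUMBER INTEGER PRIMARY KEY, MGRSSN TEXT);
    CREATE TABLE PROJECT (PNUMBER INTEGER PRIMARY KEY, DNUM INTEGER);
    CREATE TABLE WORKS_ON (ESSN TEXT, PNO INTEGER);
    INSERT INTO EMPLOYEE VALUES ('123', 'Smith'), ('333', 'Wong');
    INSERT INTO DEPARTMENT VALUES (5, '333'), (4, '123');
    INSERT INTO PROJECT VALUES (1, 5), (2, 5), (30, 4);
    INSERT INTO WORKS_ON VALUES ('123', 1), ('333', 2);
""")

rows = conn.execute("""
    SELECT PNUMBER FROM PROJECT, DEPARTMENT, EMPLOYEE
    WHERE DNUM = DNUMBER AND MGRSSN = SSN AND LNAME = 'Smith'
    UNION
    SELECT PNUMBER FROM PROJECT, WORKS_ON, EMPLOYEE
    WHERE PNUMBER = PNO AND ESSN = SSN AND LNAME = 'Smith'
    ORDER BY 1
""").fetchall()
print(rows)  # [(1,), (30,)]
```

UNION also removes duplicates, so a project reached by both routes would be listed once.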
Query 9:
Retrieve the social security numbers of all employees who work on project number 1, 2 or 3.
Q9:
SELECT DISTINCT ESSN
FROM WORKS_ON
WHERE PNO IN (1, 2, 3);
Query 10:
Find the sum of the salaries of all employees, the maximum salary, the minimum salary and the average salary.
Q10:
SELECT SUM (SALARY), MAX (SALARY), MIN (SALARY), AVG (SALARY)
FROM EMPLOYEE;

Query 11:
Count the number of distinct salary values in the database.
Q11:
SELECT COUNT (DISTINCT SALARY)
FROM EMPLOYEE;
Note that if we write COUNT (SALARY) instead of COUNT (DISTINCT SALARY) in Q11, duplicate values are not eliminated, so every non-NULL salary is counted; the result is the same as COUNT (*) only if no employee has a NULL salary.
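The difference between the three counts, and the way aggregates skip NULLs, can be seen on four hypothetical salaries (one of them NULL and two of them equal). This is a minimal sketch assuming Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, SALARY INTEGER);
    INSERT INTO EMPLOYEE VALUES
        ('111', 30000), ('222', 40000), ('333', 40000), ('444', NULL);
""")

# Q10-style aggregates ignore the NULL salary
total_sal, max_sal, min_sal = conn.execute(
    "SELECT SUM(SALARY), MAX(SALARY), MIN(SALARY) FROM EMPLOYEE").fetchone()

# COUNT(*) counts rows, COUNT(SALARY) non-NULL values, COUNT(DISTINCT ...) unique values
n_rows, n_salaries, n_distinct = conn.execute(
    "SELECT COUNT(*), COUNT(SALARY), COUNT(DISTINCT SALARY) FROM EMPLOYEE").fetchone()

print(total_sal, max_sal, min_sal)     # 110000 40000 30000
print(n_rows, n_salaries, n_distinct)  # 4 3 2
```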

Self-Assessment Questions
15. __________________ is a derived table that does not have storage of its own.
16. State whether the following statements are true or false:
a) Indexing provides faster access.
b) Indexing can be done with the help of primary key.

7.8 Summary
Let us recapitulate the important concepts discussed in this unit:

SQL is a non-procedural language that describes the type of data to be retrieved, updated or deleted. SQL is a relational database language used to communicate with a database. The American National Standards Institute (ANSI) has declared SQL the standard language for relational database management systems; some of the common RDBMSs that use SQL are Oracle, Sybase, Microsoft SQL Server and Access.

SQL statements can be categorised into four types, namely, DDL (Data Definition Language), DML (Data Manipulation Language), DCL (Data Control Language) and TCL (Transaction Control Language).

Working with tables requires the knowledge of selecting and creating the
database and then manipulating the tables. There are various features
of tables and the activities that can be performed on the table.

Nested queries allow a single SQL query to work with data from multiple tables.

DML consists of the SQL statements that are used to insert, delete and update the records in a table.

Creating and altering tables with constraints are important aspects of working with databases.

7.9 Glossary

BFILE: A BFILE is a data type used to store a locator (link) to an external binary file (a file stored outside the database).
BLOB: A BLOB is a binary large object that can hold a variable amount of data.
Buffer: A buffer is a region of physical memory used to temporarily hold data while it is being moved from one place to another.

Clause: In SQL, a clause is a component of a statement, such as the SELECT, FROM or WHERE clause.
CLOB: A CLOB (character large object) is a collection of character data stored in a database management system.
Concurrency: Concurrency is a property of systems in which several computations are executing simultaneously and potentially interacting with each other.
Concurrency control: Concurrency control ensures that correct results are generated for concurrent operations, while getting those results as quickly as possible.
Editor: A text editor is a type of program used for editing plain text files.
Long raw: Long raw is an Oracle data type for storing binary data of variable length, up to 2 GB.
Notepad: Notepad is a simple text editor for Microsoft Windows.
PL/SQL: Procedural Language/Structured Query Language (PL/SQL) is Oracle Corporation's procedural extension language for SQL and the Oracle relational database.
Revoke: REVOKE is an SQL (DCL) statement that withdraws privileges previously granted to a user or role with the GRANT statement.

7.10 Terminal Questions


1. Explain the different types of SQL statements.
2. Create two tables, STUDENT and DEPARTMENT, using SQL statements. After creating the tables, write the queries for the following:
a) List the names and marks of all students in the STUDENT table who have scored above 70%.
b) List all the students sorted by their names in ascending order.
c) List the names of all students whose names start with 'V' and have 'A' as the fourth character.
3. Describe the different types of joins giving examples for each type.
4. List the different commands in DML and give an example for each.
5. How do you create database and manipulate them? Explain with an
example.

7.11 Answers
Self-Assessment Questions
1. Non-procedural
2. Structured English query language
3. Tables
4. Row
5. DDL
6. DML
7. Append
8. BLOB
9. 4 GB
10. DISTINCT
11. LIKE<pattern>
12. FROM
13. DELETE FROM table WHERE condition
14. TCL
15. View
16. Answers:
a) True
b) True
Terminal Questions
1. SQL statements can be categorised into the following four types: DDL (Data Definition Language), DML (Data Manipulation Language), DCL (Data Control Language) and TCL (Transaction Control Language). (Refer to Section 7.3 for further information.)
2. Working with tables needs the knowledge of selecting and creating the
database and then manipulating the tables. We have commands like
create table, update, delete, and so on. (Refer to Section 7.4 for further
information.)
3. The different kinds of joins are: simple equi-join, self-join, outer join.
(Refer to Section 7.5 for further information.)
4. The different DML commands are INSERT, UPDATE, DELETE, and so on. (Refer to Section 7.6 for further information.)
5. The general syntax to create a table is CREATE TABLE tablename (column_1 datatype, column_2 datatype, ...); you specify the name of the table (it should be unique) and one or more attributes and their data types. (Refer to Section 7.7 for further information.)

7.12 Case Study


How to Misuse SQL's FROM Clause?
Relational theory calls relations what most practitioners know as tables: a list of unordered statements (the rows) about the various qualities (the column values) associated with some entities uniquely identified by the primary key. When you join tables, you derive other relations from the known relations, just as you often derive a theorem from other theorems. This is basically what joins are about: returning related data from different sources and creating new relations.
However, when you look closely, you very often realise that the returned
data comes from only some of the tables listed in the FROM clause,
typically something in the guise of:
select distinct a.CUSTOMER_ID, a.CUSTOMER_NAME
from CUSTOMERS a,
ORDERS b
where a.ZIP_CODE in ...
and b.ORDERED_DATE>= ...
and b.CUSTOMER_ID = a.CUSTOMER_ID
order by a.CUSTOMER_NAME;
This is a very simple example of a pattern you meet over and over. Isn't this a very fine query, which does just what it asks?
In my view, this is a very bad query; it's logically flawed, which isn't very surprising, and its execution plan, execution time and various statistics prove that it also performs badly. Poor performance is more often than not the direct consequence of poor logic. The good news is that it's possible to improve the speed of such a slow query significantly. (As anybody who has ever successfully tuned a SQL query can testify, this is an area where "significant" means an order of magnitude of 2, not 20%.)
(Source: Faroult, S. (2004). How to Misuse SQL's FROM Clause? [ONLINE] Available at: http://onlamp.com/ [Retrieved on 10th August 2012])

Discussion Question:
1. Where is the logical flaw?
Hint: Refer section No. 7.7
References/E-References:
References:
Er. Jain, V. K. (2008). Database Management Systems. New Delhi:
Dreamtech Press.
Elmasri, R., & Navathe, S. B. (2009). Fundamentals of Database
Systems, 5th ed. New Delhi: Pearson Education Inc.


Unit 8

Functional Dependencies and Normalisation

Structure:
8.1 Introduction
8.2 Information Design Guidelines for Relational Databases
8.3 Levels of Relation Schema
8.4 Normalisation Based on Primary Keys
8.5 Summary
8.6 Glossary
8.7 Terminal Questions
8.8 Answers
8.9 Case Study

8.1 Introduction
In Unit 7, you studied how to create a database using SQL. In this unit, we will study how to normalise the data in the database. As you have already studied, normalisation is part of the process of building the database structures that store data, and this matters because any application ultimately depends on its data structures. Normalisation is the formal process for deciding which attributes should be grouped together in a relation. If the data structures are poorly designed, the application will start from a poor foundation, and a lot more work will be required to create a useful and efficient application.
Normalisation serves as a tool for validating and improving the logical
design, so that the logical design avoids unnecessary duplication of data,
that is, it eliminates redundancy and promotes integrity. In the normalisation
process, we analyse and decompose the complex relations into smaller,
simpler and well-structured relations. Apart from the normalisation process,
in this unit, you will also study the guidelines for relational database
schema.
Objectives:
After studying this unit, you should be able to:
list the guidelines for designing relational databases
explain the levels of relational schema
elucidate the different types of normal forms
distinguish between the different types of normal forms

8.2 Information Design Guidelines for Relational Databases


As we start discussing the design of relational databases, we should also think about the rules that must be followed while creating a database. Before we start, let us throw some light on the informal measures for improving the quality of relation schema design.
Some criteria for good and bad relation schemas are as follows:
Semantics of the attributes
Reducing the redundant values in tuples
Reducing the null values in tuples
Disallowing spurious tuples
Semantics of the attributes
Understanding the meaning of the attribute values in a tuple, and how those values are related to one another, is referred to as semantics. Whenever we group attributes to form a relation, we assume that a certain meaning is associated with the attributes. This meaning, called semantics, specifies how the attribute values in a tuple relate to one another.
For example, consider a company database schema. The various relations
considered for this database are EMPLOYEE and DEPARTMENT, which
are shown in Figure 8.1.
EMPLOYEE (Emp_ID [P.K], Emp_name, Address, Basic salary, Dept_ID [F.K])
DEPARTMENT (Dept_ID [P.K], Dept_name, Dmgr_id [F.K])
Fig. 8.1: Simplified Version of the Company Relational Database Schema

The meaning of the EMPLOYEE relation is quite simple: each tuple represents an employee. The Dept_ID attribute is a foreign key that represents an implicit relationship between the EMPLOYEE and DEPARTMENT relations.


Guideline 1:
Design a relation schema so that it is easy to explain its meaning. Do not
combine attributes from multiple entity types and relationship types into a
single relation.
Reducing redundant values in tuples
Storage space is one of the most important considerations of a relational schema. Improper grouping of attributes has a significant effect on the storage space of the relational schema.
EMPLOYEE: Emp_ID | Emp_name | Basic salary | Address
Fig. 8.2 (a): Employee
DEPARTMENT: Dept_ID | Dept_name | Dept_loc
Fig. 8.2 (b): Department

In Figure 8.2(b), the information for each department appears only once in the DEPARTMENT relation.
If we combine Figures 8.2(a) and 8.2(b) into a single relation EMP_DEPT, we get the format shown in Figure 8.2(c), in which the details of a department are repeated in the tuple of every employee who works in that department.
EMP_DEPT: Emp_ID | Emp_name | Basic salary | Address | Dept_ID | Dept_name | Dept_loc
Fig. 8.2(c): Emp_dept
Using the design in Figure 8.2(c) causes serious problems, namely insertion anomalies, deletion anomalies and modification anomalies.
Update anomalies: Update anomalies are the problems that arise from data redundancy in an un-normalised database table. There are three types of update anomalies. They are:
o Insertion anomalies
o Deletion anomalies
o Modification anomalies

Insertion anomalies: It is difficult to insert a new department that has no employees into the EMP_DEPT relation. Emp_ID is the primary key of EMP_DEPT, and a primary key value cannot be NULL, so such a department cannot be stored until an employee is assigned to it. This problem does not occur in the design of Figure 8.2(b), because a department is entered in the DEPARTMENT relation whether or not any employee works for it.
Deletion anomalies: If we delete the last employee of a department from the EMP_DEPT relation, all the information about that department is lost. This kind of problem does not occur in Figure 8.2(b), because the department data is stored separately.
Modification anomalies: If we modify the value of one of the attributes of a specific department, we must update the tuples of all employees who work in that department; otherwise, the database becomes inconsistent.
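The insertion and deletion anomalies can be reproduced on a few rows. This is a minimal sketch assuming Python's sqlite3 module; the rows are hypothetical, the combined table keeps only some columns of Figure 8.2(c), and a Dept_ID column is assumed in the separate EMPLOYEE table (as in Figure 8.1) so that employees can still be linked to departments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Combined design in the style of Figure 8.2(c) (columns simplified)
    CREATE TABLE EMP_DEPT (
        Emp_ID    TEXT NOT NULL PRIMARY KEY,
        Emp_name  TEXT,
        Dept_ID   TEXT,
        Dept_name TEXT,
        Dept_loc  TEXT);
    INSERT INTO EMP_DEPT VALUES
        ('E1', 'Asha', 'D1', 'Sales', 'Pune'),
        ('E2', 'Ravi', 'D2', 'Research', 'Mysore');

    -- Separate design in the style of Figures 8.2(a) and 8.2(b)
    CREATE TABLE DEPARTMENT (Dept_ID TEXT PRIMARY KEY, Dept_name TEXT, Dept_loc TEXT);
    CREATE TABLE EMPLOYEE (Emp_ID TEXT PRIMARY KEY, Emp_name TEXT, Dept_ID TEXT);
    INSERT INTO DEPARTMENT VALUES ('D1', 'Sales', 'Pune'), ('D2', 'Research', 'Mysore');
    INSERT INTO EMPLOYEE VALUES ('E1', 'Asha', 'D1'), ('E2', 'Ravi', 'D2');
""")

# Insertion anomaly: a department with no employees needs a NULL primary key
try:
    conn.execute("INSERT INTO EMP_DEPT VALUES (NULL, NULL, 'D3', 'HR', 'Delhi')")
    insert_ok = True
except sqlite3.IntegrityError:
    insert_ok = False
print(insert_ok)  # False

# Deletion anomaly: removing the only Research employee loses department D2
conn.execute("DELETE FROM EMP_DEPT WHERE Emp_ID = 'E2'")
conn.execute("DELETE FROM EMPLOYEE WHERE Emp_ID = 'E2'")
lost = conn.execute("SELECT COUNT(*) FROM EMP_DEPT WHERE Dept_ID = 'D2'").fetchone()[0]
kept = conn.execute("SELECT COUNT(*) FROM DEPARTMENT WHERE Dept_ID = 'D2'").fetchone()[0]
print(lost, kept)  # 0 1
```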

Guideline 2:
Design the database in such a way that no insertion, deletion or modification
anomalies are present in that relation. If there are any anomalies, note them
clearly, so that proper actions can be taken.
NULL values in tuples
If a relation includes attributes that do not apply to many of its tuples, those attributes will hold NULL values in these tuples. This can waste space at the storage level, and it can also lead to problems in understanding the meaning of the attributes and in specifying join operations. NULLs may also lead to counting problems when aggregate functions such as COUNT are used.
Guideline 3:
As far as possible, avoid using NULL values for attributes in a relation.
Disallowing spurious tuples
Design relational schemas so that they can be joined with equality conditions.
For example:
EMP_LOC: Emp_name | P_loc
Fig. 8.3(a)
EMP_PROJECT: Ssn | Proj_id | Proj_name | Proj_loc
Fig. 8.3(b)
If we attempt a natural join of the relations in Figures 8.3(a) and 8.3(b) on the project location (P_loc with Proj_loc), the result contains many more tuples than the actual combinations of employees and projects. The additional tuples are called spurious tuples because they represent wrong information. The problem arises because the location attribute is neither a primary key nor a foreign key, so it does not record which employee works on which project.
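A minimal sketch of how spurious tuples appear, assuming Python's sqlite3 module and two hypothetical facts: Smith (Ssn 1) works on P1 and Wong (Ssn 2) works on P2, and both projects are located in Bellaire. Joining the two relations on location pairs every employee with every Bellaire project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMP_LOC (Emp_name TEXT, P_loc TEXT);
    CREATE TABLE EMP_PROJECT (Ssn TEXT, Proj_id TEXT, Proj_name TEXT, Proj_loc TEXT);
    INSERT INTO EMP_LOC VALUES ('Smith', 'Bellaire'), ('Wong', 'Bellaire');
    INSERT INTO EMP_PROJECT VALUES
        ('1', 'P1', 'ProductX', 'Bellaire'), ('2', 'P2', 'ProductY', 'Bellaire');
""")

# Join on the location attribute, which is neither a primary nor a foreign key
rows = conn.execute("""
    SELECT Emp_name, Ssn, Proj_id
    FROM EMP_LOC, EMP_PROJECT
    WHERE P_loc = Proj_loc
    ORDER BY Emp_name, Ssn
""").fetchall()

genuine = {("Smith", "1", "P1"), ("Wong", "2", "P2")}
spurious = [r for r in rows if r not in genuine]
print(len(rows), spurious)  # 4 [('Smith', '2', 'P2'), ('Wong', '1', 'P1')]
```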
Guideline 4:
Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys. This guarantees that no spurious tuples are generated.
Self-Assessment Questions
1. __________________ specifies how the attribute values in a tuple
relate to one another.
2. ____________________________ are those problems that arise from
the data redundancy of the un-normalised database table.
3. _______________ may lead to counting problems while using
aggregate functions.

8.3 Levels of Relation Schema


There are two levels of relation schema. They are as follows:
Conceptual level schema: The conceptual level schema describes the database structures, inter-relationships and constraints. The basic components of the schema are the entity types, relationship types and attributes.
Physical level schema: The physical level schema specifies the internal storage structures, indexes, access paths and file organisations for the database files. Along with this, application programs are designed and implemented as transactions. This can be represented with the help of ER diagrams.

8.4 Normalisation Based on Primary Keys


As you have already studied the meaning of normalisation and the different
normal forms in Unit 4, in this section we will discuss the types of normal
forms in detail. As you already know, the different normal forms are 1NF,
2NF, 3NF, BCNF, 4NF and 5NF. Here, we will discuss each one in detail.
First Normal Form (1NF): A relation is said to be in First Normal Form only if:
o It is a relation.
o It has no repeating rows.
o Each attribute value is atomic.
If a relation does not satisfy any one of the above conditions, then it is not in 1NF.
For example, consider a STUDENT relation with the fields shown in Table 8.1(a).
Table 8.1(a): Relation Schema of a Student Relation
Std_id | Std_name | Class | Address | Tel no.
201 | Ranjith | | #4, Chokkanahalli, Bangalore 560074 | 26677780
202 | Shivraj | XI | Andheri (east), Mumbai 400064 | 2514890, 9885643247
304 | Lavanya | | #10, Dadra Post, Bandra (east), Mumbai 400014 | 25234972, 9912451356

The above table is not in 1NF because the Tel no. field is multivalued for Std_id 202 and 304. If we instead add a separate Mobile no. field, as shown in Table 8.1(b), to keep each attribute value atomic, the new field is left NULL for a student with only one number (Std_id 201), which is not allowed. Therefore, Table 8.1(b) is not in 1NF either.
Table 8.1(b)
Std_id | Std_name | Class | Address | Tel_no. | Mobile no.
201 | Ranjith | | #4, Chokkanahalli, Bangalore 560074 | 26677780 |
202 | Shivraj | XI | Andheri (east), Mumbai 400064 | 2514890 | 9885643247
304 | Lavanya | | #10, Dadra Post, Bandra (east), Mumbai 400014 | 25234972 | 9912451356

Therefore, to bring the table into 1NF, we need to decompose Table 8.1(a) into two tables, as shown in Tables 8.2(a) and 8.2(b).
Table 8.2(a)
Std_id | Std_name | Class | Address
201 | Ranjith | | #4, Chokkanahalli, Bangalore 560074
202 | Shivraj | XI | Andheri (east), Mumbai 400064
304 | Lavanya | | #10, Dadra Post, Bandra (east), Mumbai 400014
Table 8.2(b)
Std_id | Tel_no.
201 | 26677780
202 | 2514890
202 | 9885643247
304 | 25234972
304 | 9912451356

Now, Tables 8.2(a) and 8.2(b) are in First Normal Form.
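The 1NF decomposition above is mechanical: the multivalued Tel no. field is moved into its own table with one row per (Std_id, Tel_no.) pair. A minimal Python sketch of that step; the dictionary layout and the function name to_1nf are illustrative assumptions, and the Class and Address fields are omitted for brevity.

```python
# Table 8.1(a), simplified: Tel_no holds a list of values (not atomic)
students = [
    {"Std_id": 201, "Std_name": "Ranjith", "Tel_no": ["26677780"]},
    {"Std_id": 202, "Std_name": "Shivraj", "Tel_no": ["2514890", "9885643247"]},
    {"Std_id": 304, "Std_name": "Lavanya", "Tel_no": ["25234972", "9912451356"]},
]


def to_1nf(rows):
    """Split rows into a student part and a (Std_id, Tel_no) part with atomic values."""
    student_part = [{k: v for k, v in r.items() if k != "Tel_no"} for r in rows]
    phone_part = [(r["Std_id"], tel) for r in rows for tel in r["Tel_no"]]
    return student_part, phone_part


table_a, table_b = to_1nf(students)
print(table_b)
# [(201, '26677780'), (202, '2514890'), (202, '9885643247'),
#  (304, '25234972'), (304, '9912451356')]
```

The five rows of table_b correspond to Table 8.2(b), and Std_id links them back to Table 8.2(a).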

Second Normal Form (2NF): Second Normal Form is based on full functional dependency. A functional dependency X -> Y is said to be a full functional dependency if removing any attribute from X means that the dependency no longer holds.
According to R. Elmasri and S.B. Navathe, a relation is said to be in 2NF only if the relation is in 1NF and every nonprime attribute in the relation is fully functionally dependent on the primary key of the relation.
For example, consider a STUD_PROJ relation as shown in Table 8.3.

Table 8.3: STUD_PROJ Relation
Std_ID | Project_code | Hours | Std_name | Class | Proj_name | Prof_incharge
101 | HMS1 | 20 | Ranjith Jha | 1 MBA | Hospital management system | Ms. Sahana
203 | SIM2 | 30 | Meghna Sinha | 2 MBA | Simulation of petrol bunk | Mr. Murali
303 | DM1 | 15 | Samiksha Shukla | 3 MBA | Data mining in research analysis | Mr. Benjamin
The table STUD_PROJ is in 1NF but not in 2NF: its primary key is the combination (Std_ID, Project_code), but Std_name and Class depend only on Std_ID, and Proj_name and Prof_incharge depend only on Project_code. To remove these partial dependencies, we decompose the table as given in Figure 8.4.
Fig. 8.4: 2NF Normalisation
Now the relations SP1, SP2 and SP3 are in 2NF.
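Full and partial dependencies can be tested against a relation instance. The sketch below is a minimal Python example; the helper names fd_holds and is_full_fd are hypothetical. It uses two rows of Table 8.3 plus one invented row (student 101 on a second project), because a sample instance can only show that a dependency is violated; whether a dependency really holds is decided by the meaning of the attributes, not by the data.

```python
def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on every lhs attribute but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def is_full_fd(rows, lhs, rhs):
    """lhs -> rhs holds, and removing any single attribute from lhs breaks it."""
    if not fd_holds(rows, lhs, rhs):
        return False
    return all(not fd_holds(rows, [a for a in lhs if a != b], rhs) for b in lhs)


# Hypothetical STUD_PROJ instance: two rows from Table 8.3 plus an invented
# second project for student 101, so that the instance can refute FDs
rows = [
    {"Std_ID": 101, "Project_code": "HMS1", "Hours": 20, "Std_name": "Ranjith Jha"},
    {"Std_ID": 101, "Project_code": "SIM2", "Hours": 10, "Std_name": "Ranjith Jha"},
    {"Std_ID": 203, "Project_code": "SIM2", "Hours": 30, "Std_name": "Meghna Sinha"},
]
key = ["Std_ID", "Project_code"]

print(is_full_fd(rows, key, ["Hours"]))     # True: Hours needs the whole key
print(is_full_fd(rows, key, ["Std_name"]))  # False: Std_ID alone is enough
```

The second result is exactly the kind of partial dependency that 2NF removes.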


Third Normal Form (3NF): According to R. Elmasri and S.B. Navathe, a relation is said to be in 3NF if, whenever a nontrivial functional dependency X -> A holds in the relation, either:
1. X is a superkey of the relation, or
2. A is a prime attribute of the relation.
3NF is based on transitive dependency. A functional dependency X -> Y in a relation R is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of R, and both X -> Z and Z -> Y hold.
Let us take an example of PROFESSOR relation as given in Table 8.4
for our understanding of 3NF.
Table 8.4: PROFESSOR Relation
Prof_name | Prof_id | Subjects specialisation | Qualification | Dept_number | Dept_name | HOD_id
Dr. Rao | A1 | Finance | PhD | D1 | Management | H2
Dr. Ravi | A2 | Marketing | PhD | D1 | Management | H2
Prof. Sanat Sha | B1 | Computer science | MCA | D2 | IT | H1
Prof. Neena Gupta | B2 | Sociology | MA, MPhil | D3 | Arts & Humanities | H3
In this relation, Prof_id -> Dept_number and Dept_number -> Dept_name, HOD_id, so Dept_name and HOD_id depend on the key Prof_id only transitively, through Dept_number.
Figure 8.5 shows the decomposition of the above table to form 3NF.
Fig. 8.5: 3NF Normalisation

Now the relations P1 and P2 are in Third Normal Form.
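The 3NF condition can be checked with attribute closures. The sketch below is a minimal Python example with hypothetical helper names. It assumes the functional dependencies Prof_id -> Prof_name, Subjects_specialisation, Qualification, Dept_number and Dept_number -> Dept_name, HOD_id, which are read from Table 8.4 rather than stated by the text. It also assumes that P1 and P2 in Figure 8.5 split the relation at Dept_number.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = set(combo)
            if not any(k <= cand for k in keys) and closure(cand, fds) == schema:
                keys.append(cand)
    return keys


def violations_3nf(schema, fds):
    """Nontrivial X -> A where X is not a superkey and A is not a prime attribute."""
    prime = set().union(*candidate_keys(schema, fds))
    return [(tuple(sorted(lhs)), a) for lhs, rhs in fds for a in sorted(rhs - lhs)
            if closure(lhs, fds) != schema and a not in prime]


PROFESSOR = {"Prof_id", "Prof_name", "Subjects_specialisation", "Qualification",
             "Dept_number", "Dept_name", "HOD_id"}
FD_PROF = ({"Prof_id"}, {"Prof_name", "Subjects_specialisation", "Qualification", "Dept_number"})
FD_DEPT = ({"Dept_number"}, {"Dept_name", "HOD_id"})

print(violations_3nf(PROFESSOR, [FD_PROF, FD_DEPT]))
# [(('Dept_number',), 'Dept_name'), (('Dept_number',), 'HOD_id')]

# Assumed decomposition: P1 keeps the professor details, P2 the department details
P1 = {"Prof_id", "Prof_name", "Subjects_specialisation", "Qualification", "Dept_number"}
P2 = {"Dept_number", "Dept_name", "HOD_id"}
print(violations_3nf(P1, [FD_PROF]), violations_3nf(P2, [FD_DEPT]))  # [] []
```

The brute-force key search is fine for small textbook schemas but grows exponentially with the number of attributes.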


Boyce-Codd Normal Form (BCNF): Boyce-Codd Normal Form was proposed as a simpler form of 3NF, but it is stricter than 3NF. Every relation in BCNF is also in 3NF, but not every relation in 3NF is necessarily in BCNF.
A relation is said to be in BCNF only if every determinant is a candidate key. A determinant is any attribute (simple or composite) on which some other attribute is fully functionally dependent.
Let us take an example of a relation STUD_REPORT which has the
following fields as shown in Figure 8.6.
STUD_REPORT

Fig. 8.6: Student Report Table

In this figure, the functional dependencies of the relation are:


Std_ID -> Std_name
Course_code -> Course_title, Faculty_incharge
Faculty_incharge -> Fac_loc
Std_ID, Course_code, Program -> Grade
Std_ID, Program -> Coordinator
Coordinator -> program
The above relation is not normalised (Figure 8.6). To normalise, remove the
redundant groups.
Then it will be:
STUDENT

Fig. 8.7(a)

STUD_PROG

Fig. 8.7(b)

STUD_COURSE

Fig. 8.7(c)

Figures 8.7(a), 8.7(b) and 8.7(c) are only in 1NF. To make them 2NF, we
need to remove the partial key dependencies. Therefore, we will
decompose the schema STUD_COURSE in Figure 8.7(c) into two more
schemas, namely, STUD_COURSE1 and COURSE, which are shown in
Figures 8.8 (a) and 8.8 (b), respectively.
STUD_COURSE1

Fig. 8.8(a): Stud_course Relation after the Decomposition

COURSE

Fig. 8.8(b): Course Relation Decomposed from Relation Stud_course

Now that we have removed the partial key dependencies, the relation is in
2NF. To make this relation into 3NF, we need to remove the transitive
dependency of the relation. Therefore, after the decomposition of relation
COURSE (Figure 8.8(b)), the normalised schemas will be as shown in
Figures 8.9(a) and 8.9(b).


COURSE1

Fig. 8.9(a): Course Relation after Decomposition

FACULTY

Fig. 8.9(b): Faculty Relation Decomposed from Course Relation

Now, the above schemas, that is, the relations STUDENT (Figure 8.7(a)), STUD_PROG (Figure 8.7(b)), STUD_COURSE1 (Figure 8.8(a)), COURSE1 (Figure 8.9(a)) and FACULTY (Figure 8.9(b)), are in Third Normal Form.
We can now observe that in the STUDENT relation, the only determinant is Std_ID. In the STUD_COURSE1 relation, the only determinant is (Std_ID, Course_code, Program). In the COURSE1 relation, the only determinant is Course_code. In the FACULTY relation, the only determinant is Faculty_incharge. In STUD_PROG, the determinants are (Std_ID, Program) and Prog_coordinator. (Std_ID, Program) is a candidate key, but Prog_coordinator is not, so STUD_PROG is not in BCNF. Therefore, we decompose the relation STUD_PROG (Figure 8.7(b)) into two relations, as shown in Figures 8.10(a) and 8.10(b).
STUD_PROG1

Fig. 8.10(a)

PROG

Fig. 8.10(b)


Therefore, the relations in Figure 8.11 are in Boyce-Codd Normal Form.
STUDENT (Std_ID, Std_name)
STUD_COURSE1 (Std_ID, Program, Course_code, Grade obtained)
STUD_PROG1 (Std_ID, Prog_coordinator)
PROG (Prog_coordinator, Program)
COURSE1 (Course_code, Course_title, Faculty_incharge)
FACULTY (Faculty_incharge, Fac_loc)
Fig. 8.11: Example of a BCNF Normalised Relation
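STUD_PROG is the classic case of a relation that is in 3NF but not in BCNF, and attribute closures make this visible. The sketch below is a minimal Python example with hypothetical helper names. It uses the two dependencies given for the relation: (Std_ID, Program) -> Prog_coordinator and Prog_coordinator -> Program. Every attribute turns out to be prime, so 3NF is satisfied, but the determinant Prog_coordinator is not a superkey.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = set(combo)
            if not any(k <= cand for k in keys) and closure(cand, fds) == schema:
                keys.append(cand)
    return keys


def bcnf_violations(schema, fds):
    """Determinants of nontrivial FDs that are not superkeys."""
    return [tuple(sorted(lhs)) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != schema]


def violations_3nf(schema, fds):
    """Nontrivial X -> A with X not a superkey and A not a prime attribute."""
    prime = set().union(*candidate_keys(schema, fds))
    return [(tuple(sorted(lhs)), a) for lhs, rhs in fds for a in sorted(rhs - lhs)
            if closure(lhs, fds) != schema and a not in prime]


STUD_PROG = {"Std_ID", "Program", "Prog_coordinator"}
FDS = [({"Std_ID", "Program"}, {"Prog_coordinator"}),
       ({"Prog_coordinator"}, {"Program"})]

print(candidate_keys(STUD_PROG, FDS))   # two keys: {Std_ID, Program}, {Std_ID, Prog_coordinator}
print(violations_3nf(STUD_PROG, FDS))   # []
print(bcnf_violations(STUD_PROG, FDS))  # [('Prog_coordinator',)]

# After the decomposition: PROG (Prog_coordinator, Program)
PROG = {"Prog_coordinator", "Program"}
print(bcnf_violations(PROG, [({"Prog_coordinator"}, {"Program"})]))  # []
```

Note that the decomposition into STUD_PROG1 and PROG no longer lets the dependency (Std_ID, Program) -> Prog_coordinator be checked within a single relation; this trade-off is common when moving from 3NF to BCNF.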

Fourth Normal Form (4NF): An entity is in Fourth Normal Form (4NF) if it is in 3NF and, where it has more than one independent one-to-many or many-to-many relationship, each such relationship is resolved independently (that is, stored in a separate relation).
For example, consider the relation STUDENT shown in Table 8.5(a), which has three attributes: Std_name, Sub_name and Fac_incharge.
Table 8.5(a): Student Relation
Std_name | Sub_name | Fac_incharge
Pushpa | Maths | Prof. Chidanand
Pushpa | Physics | Prof. Ramesh
Pushpa | Physics | Prof. Chidanand
Pushpa | Maths | Prof. Ramesh

In this relation, a student named Std_name opts for the subject Sub_name and has the faculty incharge Fac_incharge. A student can opt for multiple subjects and may have several faculty incharge, and a student's subjects and faculty incharge are independent of one another. Therefore, to keep the relation state consistent, we must keep every entry atomic and have a separate row for every combination of a student's faculty incharge and subject. This constraint is called a Multi-Valued Dependency (MVD) on the STUDENT relation. An MVD arises when two independent relationships are mixed in the same relation.
Therefore, to convert to Fourth Normal Form, we need to decompose the STUDENT relation into two 4NF relations, STUD_SUB and STUD_FAC, as shown in Tables 8.5(b) and 8.5(c).
Table 8.5(b): STUD_SUB
Std_name | Sub_name
Pushpa | Maths
Pushpa | Physics
Table 8.5(c): STUD_FAC
Std_name | Fac_incharge
Pushpa | Prof. Chidanand
Pushpa | Prof. Ramesh

Now Tables 8.5(b) and 8.5(c) are in 4NF.
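Two properties of this decomposition can be checked directly on the data: Table 8.5(a) satisfies the MVD Std_name ->> Sub_name (every subject of a student appears with every faculty incharge of that student), and joining Tables 8.5(b) and 8.5(c) on Std_name gives back exactly Table 8.5(a). A minimal Python sketch using sets of tuples; the helper name mvd_holds is hypothetical.

```python
# Table 8.5(a): each row is (Std_name, Sub_name, Fac_incharge)
student = {
    ("Pushpa", "Maths", "Prof. Chidanand"),
    ("Pushpa", "Physics", "Prof. Ramesh"),
    ("Pushpa", "Physics", "Prof. Chidanand"),
    ("Pushpa", "Maths", "Prof. Ramesh"),
}


def mvd_holds(rows):
    """Check Std_name ->> Sub_name: per student, every subject pairs with every faculty."""
    by_student = {}
    for std, sub, fac in rows:
        by_student.setdefault(std, set()).add((sub, fac))
    for pairs in by_student.values():
        subs = {s for s, _ in pairs}
        facs = {f for _, f in pairs}
        if pairs != {(s, f) for s in subs for f in facs}:
            return False
    return True


# 4NF decomposition: Tables 8.5(b) and 8.5(c) are projections of STUDENT
stud_sub = {(std, sub) for std, sub, _ in student}
stud_fac = {(std, fac) for std, _, fac in student}

# Natural join on Std_name
rejoined = {(s1, sub, fac) for s1, sub in stud_sub for s2, fac in stud_fac if s1 == s2}

print(mvd_holds(student))   # True
print(rejoined == student)  # True
# Dropping one combination breaks the MVD, which is why all four rows were needed
print(mvd_holds(student - {("Pushpa", "Maths", "Prof. Ramesh")}))  # False
```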


Fifth Normal Form (5NF): An entity is said to be in Fifth Normal Form if and only if it is in 4NF and every join dependency for the entity is a consequence of its candidate keys. A join dependency means that every legal state of the relation has a non-additive (lossless) join decomposition into the specified projections.
For example, consider the relation STUDENT shown in Table 8.6(a), which has the attributes Std_name, Sub_name and Proj_name. It has no MVD and is therefore in 4NF, but it is not in 5NF.
Table 8.6(a): Student Relation
Std_name | Sub_name | Proj_name
Pushpa | Chemistry | ProjX
Pushpa | Physics | ProjY
Kapila | History | ProjY
Kavitha | Maths | ProjZ
Kapila | English | ProjX
Kapila | Chemistry | ProjX
Pushpa | Chemistry | ProjY

To convert to 5NF, we need to decompose the above table into three


relations, namely, STD_SUB, STD_PROJ and SUB_PROJ as shown in
Tables 8.6(b), 8.6(c) and 8.6(d), respectively.

Now the above tables are in 5NF.
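Whether the three-way decomposition is non-additive can be tested by projecting Table 8.6(a) onto STD_SUB, STD_PROJ and SUB_PROJ and joining the projections back. A minimal Python sketch using sets of tuples (the projection attribute pairs are taken from the relation names). With the seven rows listed above, the join reproduces every original row but also produces one extra tuple, (Kapila, Chemistry, ProjY). In other words, this particular sample instance does not satisfy the join dependency, so it is worth running such a check on example data before relying on it. The sketch also confirms that joining only two of the projections produces far more extra tuples, which agrees with the relation having no MVD.

```python
# Table 8.6(a): each row is (Std_name, Sub_name, Proj_name)
student = {
    ("Pushpa", "Chemistry", "ProjX"),
    ("Pushpa", "Physics", "ProjY"),
    ("Kapila", "History", "ProjY"),
    ("Kavitha", "Maths", "ProjZ"),
    ("Kapila", "English", "ProjX"),
    ("Kapila", "Chemistry", "ProjX"),
    ("Pushpa", "Chemistry", "ProjY"),
}

# The three binary projections
std_sub = {(s, sub) for s, sub, _ in student}
std_proj = {(s, p) for s, _, p in student}
sub_proj = {(sub, p) for _, sub, p in student}

# Two-way join STD_SUB with STD_PROJ on Std_name
two_way = {(s, sub, p) for s, sub in std_sub for s2, p in std_proj if s == s2}

# Three-way join: additionally require (Sub_name, Proj_name) to appear in SUB_PROJ
rejoined = {(s, sub, p) for s, sub, p in two_way if (sub, p) in sub_proj}

extra = rejoined - student
print(len(two_way))         # 11
print(student <= rejoined)  # True
print(sorted(extra))        # [('Kapila', 'Chemistry', 'ProjY')]
```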


Self-Assessment Questions
4. The basic components of the schema are _______________,
__________________ and attributes.
5. Physical level schema can be represented with the help of
___________________.
6. A relation is said to be in First Normal Form only if each attribute value
is _________________.
7. State whether the following statements are true or false:
a) Second Normal Form is based on transitive dependency.
b) Third Normal Form is based on full functional dependency.
c) Every relation in Boyce-Codd Normal Form is also in Third Normal Form.
d) A relation is said to be in Boyce-Codd Normal Form only if every determinant is a candidate key.
e) 5NF is based on join dependency.

8.5 Summary
Let us recapitulate the important concepts discussed in this unit:
Some criteria for good and bad relation schemas are: semantics of the attributes, reducing the redundant values in tuples, reducing the null values in tuples and disallowing spurious tuples.
The two levels of relation schema are the conceptual level schema and the physical level schema. The conceptual schema is represented with the help of an ER diagram.
Normalisation is the formal process for deciding which attributes should be grouped together in a relation. The different kinds of normal forms based on the primary keys and their dependencies are First Normal Form, Second Normal Form, Third Normal Form, Boyce-Codd Normal Form, Fourth Normal Form and Fifth Normal Form.

8.6 Glossary
Full functional dependency: A functional dependency X → Y is a constraint between two sets of attributes in a relation. It states that tuples with equal X values also have equal Y values. The dependency is called trivial if Y is a subset of X. X → Y is a full functional dependency if removing any attribute from X means that the dependency no longer holds.
Join dependency: A join dependency is a constraint on the set of legal relations over a database scheme. A table T is subject to a join dependency if T can always be recreated by joining multiple tables, each having a subset of the attributes of T. If one of the tables in the join has all the attributes of the table T, the join dependency is called trivial.
Null: In a database, a NULL value means that no value is stored in the cell (an empty cell). This happens, for example, when the value is unknown or not applicable.
Redundancy: Redundancy means the repeated occurrence of the same field in two or more tables in a database system.
Spurious: Spurious means not genuine, authentic or true.
Tuple: In set theory, a tuple is an ordered list of elements. In a database, the collection of data in a row is called a tuple.

8.7 Terminal Questions
1. Explain the design guidelines for relational databases using an example.
2. Differentiate between physical schema and conceptual schema.
3. Describe the different normal forms with one example throughout.
4. Compare the different types of normal forms.

8.8 Answers
Self-Assessment Questions
1. Semantics
2. Update anomalies
3. Nulls
4. Entity types, relationship types
5. ER diagrams
6. Atomic
7. Answers:
a) False
b) False
c) True
d) True
e) True
Terminal Questions
1. Guideline 1: Design a relation schema so that it is easy to explain its meaning. Do not combine attributes from multiple entity types and relationship types into a single relation.
Guideline 2: Design the database so that no insertion, deletion or modification anomalies are present in its relations. If any anomalies are present, note them clearly so that proper action can be taken.
Guideline 3: As far as possible, avoid placing attributes whose values may frequently be NULL in a relation.
Guideline 4 (disallowing spurious tuples): Design relation schemas so that they can be joined with equality conditions on attributes that are appropriately related primary key and foreign key pairs. This guarantees that no spurious tuples are generated. (Refer to Section 8.2 for further information.)
2. There are two levels of relation schema:
(1) Conceptual level schema: This schema describes the database structures, inter-relationships and constraints. The basic components of the schema are the entity types, relationship types and attributes. It can be represented with the help of ER diagrams.
(2) Physical level schema: This schema specifies the internal storage structures, indexes, access paths and file organisations for the database files. Application programs are also designed and implemented as database transactions.
(Refer to Section 8.3 for further information.)
3. The different kinds of normal forms based on primary keys and functional dependencies are 1NF, 2NF, 3NF, BCNF, 4NF and 5NF. (Refer to Section 8.4 for further information.)
4. First Normal Form: A relation is said to be in 1NF if and only if every attribute value is atomic. (Refer to Section 8.4 for further information.)
8.9 Case Study


Consider a student database with the attributes [std_ID, std_name, address, housename, house color, subject, grade], with std_ID as the primary key. See the table below:
Table 1

This relation is not in First Normal Form. Therefore, create new rows so that each cell contains only one value.
Table 2

As it stands, Table 2 is still not in 1NF. Make std_ID and subject together the primary key so that each tuple can be identified uniquely.
Now the relation is in 1NF.
Now consider this table. Student name and address are dependent on std_ID, which is part of the key. But this relation is still not in 2NF.
Discussion Questions:
1. Why is Table 1 not in 1NF?
2. Why is Table 2 not in 2NF even though student name and address are dependent on std_ID?
(Hint: Refer to Section 8.4, Normalisation Based on Primary Keys.)
Unit 9

Database Administration

Structure:
9.1 Introduction
Objectives
9.2 Transaction Processing Concepts
9.3 Transactions in Multiuser System
9.4 Desirable Properties of Transactions
9.5 Summary
9.6 Glossary
9.7 Terminal Questions
9.8 Answers
9.9 Case Study

9.1 Introduction
So far, we have discussed the various technical concepts of database systems and their application in an organisation for designing and analysing a system. In this unit, we will discuss how to administer the database in an organisation and study the basic concepts of transaction processing systems.
Transaction management is the ability of a database management system to manage the various transactions that occur within the system. A transaction is a set of program statements, or a collection of operations, that forms a single logical unit of work.

A database management system should ensure that transactions are executed properly: either the entire transaction is executed or none of its operations is executed. This all-or-nothing property is called atomicity. The DBMS should execute each transaction in its entirety to avoid inconsistency.

In this unit, we will study various concepts of transaction processing, the uses of transactions and the properties of transactions.
Objectives:
After studying this unit, you should be able to:
describe the basic concepts of transaction processing system
explain transactions in multiuser system
list the properties of transactions
9.2 Transaction Processing Concepts


A transaction is an atomic unit comprising one or more SQL statements. A transaction begins with the first executable statement and ends when it is committed or rolled back.
Single-user versus multiuser systems: A DBMS is single-user if at most one user can use the system at a time. It is multiuser if many users can use the system and access the database concurrently. For example, an airline reservation system is used by hundreds of travel agencies and clerks concurrently.
A single CPU can execute at most one process at a time. On one CPU, the operations of concurrent processes are therefore interleaved. When several CPUs are available, processes such as C and D can execute in parallel.

Fig. 9.1: Interleaved Concurrency versus Parallel Execution
The read and write operations and DBMS buffers
A transaction is a logical unit of database processing that includes one or more database access operations, such as insertion, deletion, modification or retrieval. A transaction that only retrieves data is called a read-only transaction.
The basic database access operations are as follows:
1. Read-item (x): It reads a database item named x into a program variable.
2. Write-item (x): It writes the value of the program variable x into the database.
Executing Read-item (x) includes the following steps:
1. Find the address of the disk block that contains item x.
2. Copy that disk block into a buffer in main memory (if it is not already there).
3. Copy item x from the buffer to the program variable x.
Executing Write-item (x) includes the following steps:
1. Find the address of the disk block that contains item x.
2. Copy that disk block into a buffer in main memory (if it is not already there).
3. Copy item x from the program variable into its correct location in the buffer.
4. Store the updated block from the buffer back to the disk.
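The read and write steps can be traced in a small Python model. It is a sketch under stated assumptions, not a description of how a real DBMS manages storage. Disk blocks are plain dictionaries, and the block address of an item is looked up in a dictionary. Step 4 flushes the block to disk immediately, whereas a real buffer manager may write it back later. The class `ToyStorage` and the block name `B1` are hypothetical.

```python
# Toy model of read_item / write_item through a main-memory buffer.
# Assumptions (not from the text): disk blocks are dicts, the "address" of an item is
# looked up in a dict, and updated blocks are flushed to disk immediately.


class ToyStorage:
    def __init__(self, disk, block_of):
        self.disk = disk          # block id -> {item name: value}, the "disk"
        self.block_of = block_of  # item name -> block id, i.e. the block address
        self.buffer = {}          # block id -> in-memory copy of that block

    def _buffered_block(self, block_id):
        # Step 2: copy the disk block into a buffer unless it is already buffered.
        if block_id not in self.buffer:
            self.buffer[block_id] = dict(self.disk[block_id])
        return self.buffer[block_id]

    def read_item(self, x):
        block_id = self.block_of[x]              # step 1: find the block address
        block = self._buffered_block(block_id)   # step 2: bring block into buffer
        return block[x]                          # step 3: copy item to program variable

    def write_item(self, x, value):
        block_id = self.block_of[x]              # step 1: find the block address
        block = self._buffered_block(block_id)   # step 2: bring block into buffer
        block[x] = value                         # step 3: copy variable into buffer
        self.disk[block_id] = dict(block)        # step 4: store updated block on disk


# Usage: X and Y live in the same hypothetical block "B1".
db = ToyStorage(disk={"B1": {"X": 80, "Y": 20}}, block_of={"X": "B1", "Y": "B1"})
x = db.read_item("X")   # program variable x now holds 80
x = x - 5
db.write_item("X", x)
print(db.disk["B1"])    # {'X': 75, 'Y': 20}
```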
(a) T1
read_item(X);
X := X - N;
write_item(X);
read_item(Y);
Y := Y + N;
write_item(Y);

(b) T2
read_item(X);
X := X + M;
write_item(X);

Concurrency control: Transactions must be able to execute concurrently without violating the ACID (Atomicity, Consistency, Isolation and Durability) properties of the database. Concurrency control operates while transactions are in progress. It regulates their interleaved operations so that the results conform to the consistency rules of the database. It solves the major issues involved in allowing multiple users simultaneous access to shared data items and their object representations.

Need for concurrency control: In a multiuser database, transactions submitted by various users may execute concurrently and update the same data. Concurrently executing transactions must be guaranteed to produce the same effect as a serial execution of those transactions (one by one). Several problems can occur when concurrent transactions execute in an uncontrolled manner. Therefore, the primary concerns of a multiuser database include how to control data concurrency and data consistency.

Data concurrency: Concurrent (simultaneous) access to data by many users must be coordinated.

Data consistency: A user always sees a consistent (accurate) view of all data committed by other transactions as of that time, together with all changes made by the user up to that time.
Let us take the example of an airline reservation database in which a record is stored for each flight. Each record includes the number of reserved seats on that flight. Figure (a) shows a transaction T1 that transfers N reservations from one flight to another. The number of reserved seats on the first flight is X, and on the second flight it is Y. Figure (b) shows a transaction T2 that reserves M seats on the first flight. We will now discuss the types of problems we may encounter when these two transactions run concurrently.

1. The lost update problem: Suppose transactions T1 and T2 are submitted at approximately the same time and their operations are interleaved as shown below. The final value of X is then incorrect. Because T2 reads the value of X before T1 changes it in the database, the updated value resulting from T1 is lost.
For example, suppose X = 80 at the start (80 reservations at the beginning), N = 5 (T1 transfers 5 seat reservations from flight X to flight Y) and M = 4 (T2 reserves 4 seats on X). The final result should be X = 80 - 5 + 4 = 79. Due to the interleaving of operations, however, X = 84, because the update by T1 that removed the 5 seats from X was lost.
T1                      T2
read_item(X);
X := X - N;
                        read_item(X);
                        X := X + M;
write_item(X);
read_item(Y);
                        write_item(X);   (item X has an incorrect value because its update by T1 is lost)
Y := Y + N;
write_item(Y);
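The interleaving above can be replayed deterministically to confirm the arithmetic. This Python sketch keeps the database values in a dictionary and each transaction's program variables in local names. The starting value Y = 10 is a hypothetical choice, since the text fixes only X, N and M.

```python
# Sketch: replay the lost-update schedule step by step. "db" stands for the values
# stored in the database; t1_x, t2_x and t1_y are each transaction's program variables.
# Y = 10 is a hypothetical starting value; the text gives only X, N and M.


def lost_update_schedule(x0, y0, n, m):
    db = {"X": x0, "Y": y0}
    t1_x = db["X"]          # T1: read_item(X)
    t1_x = t1_x - n         # T1: X := X - N
    t2_x = db["X"]          # T2: read_item(X)  (still sees the old value)
    t2_x = t2_x + m         # T2: X := X + M
    db["X"] = t1_x          # T1: write_item(X)
    t1_y = db["Y"]          # T1: read_item(Y)
    db["X"] = t2_x          # T2: write_item(X)  (overwrites T1's update)
    t1_y = t1_y + n         # T1: Y := Y + N
    db["Y"] = t1_y          # T1: write_item(Y)
    return db


def serial_schedule(x0, y0, n, m):
    db = {"X": x0, "Y": y0}
    db["X"] -= n            # T1 runs to completion first ...
    db["Y"] += n
    db["X"] += m            # ... then T2 runs
    return db


print(lost_update_schedule(80, 10, 5, 4))  # {'X': 84, 'Y': 15}
print(serial_schedule(80, 10, 5, 4))       # {'X': 79, 'Y': 15}
```

The interleaved run ends with X = 84, whereas running T1 and then T2 serially gives X = 79. Y is 15 in both runs, so only T1's update to X is lost.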
2. Dirty read problem: This problem occurs when one transaction updates a database item and then fails for some reason. The updated item is accessed by another transaction before it is changed back to its original value.
For example, T1 updates item X and then fails before completion, so the system must change X back to its original value. Before it can do so, transaction T2 reads the temporary value of X. Because T1 failed, this value will not be recorded permanently in the database. The value of item X that is read by T2 is called dirty data, because it has been created by a transaction that has not yet completed and committed. Hence, this problem is also known as the temporary update problem.
T1                      T2
read_item(X);
X := X - N;
write_item(X);
                        read_item(X);
                        X := X + M;
                        write_item(X);
read_item(Y);
(T1 fails here and must change X back to its old value; meanwhile, T2 has read the temporary, incorrect value of X.)
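The dirty read can be replayed in the same way with a Python sketch. It reuses X = 80, N = 5 and M = 4 from the lost update example purely for illustration. It does not model the recovery step. Instead, it reports what T2 read and wrote, compared with what T2 would have written had T1's failed update never happened.

```python
# Sketch of the dirty-read schedule. T1 writes X, T2 reads that uncommitted value,
# then T1 fails. Starting values X = 80, N = 5, M = 4 reuse the lost-update example
# and are only illustrative.


def dirty_read_schedule(x0, n, m):
    db = {"X": x0}
    old_x = db["X"]         # value the system must restore if T1 fails
    db["X"] = db["X"] - n   # T1: read_item(X); X := X - N; write_item(X)
    t2_read = db["X"]       # T2: read_item(X)  -> reads T1's uncommitted value
    db["X"] = t2_read + m   # T2: X := X + M; write_item(X)
    # T1 now fails before committing, so the value T2 read was never valid.
    return {
        "T2 read": t2_read,
        "T2 wrote": db["X"],
        "T2 should have written": old_x + m,   # as if T1's update never took effect
    }


print(dirty_read_schedule(80, 5, 4))
# {'T2 read': 75, 'T2 wrote': 79, 'T2 should have written': 84}
```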
3. Incorrect summary problem: This problem occurs when one transaction calculates an aggregate summary function on a number of records while other transactions are updating some of those records. The aggregate function may then use some values before they are updated and others after they are updated.
For example, transaction T3 is calculating the total number of reservations on all the flights while transaction T1 is executing. T3 reads the value of X after N seats have been subtracted from it, but reads the value of Y before those N seats have been added to it. The sum is therefore short by N.
T1                      T3
                        sum := 0;
                        read_item(A);
                        sum := sum + A;
read_item(X);
X := X - N;
write_item(X);
                        read_item(X);
                        sum := sum + X;
                        read_item(Y);
                        sum := sum + Y;
read_item(Y);
Y := Y + N;
write_item(Y);
(T3 reads X after N is subtracted and reads Y before N is added, so a wrong summary is the result.)
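The incorrect summary can be replayed the same way. In this Python sketch, the values A = 30, X = 80, Y = 20 and N = 5 are hypothetical. T1 only moves reservations from X to Y, so the true total is the same before and after T1.

```python
# Sketch of the incorrect summary problem: T3 totals reservations on flights A, X and Y
# while T1 moves N reservations from X to Y. A = 30, X = 80, Y = 20 and N = 5 are
# hypothetical values; the true total is unchanged by T1.


def interleaved_sum(a, x, y, n):
    db = {"A": a, "X": x, "Y": y}
    total = 0                       # T3: sum := 0
    total += db["A"]                # T3: read_item(A); sum := sum + A
    db["X"] = db["X"] - n           # T1: read_item(X); X := X - N; write_item(X)
    total += db["X"]                # T3: read_item(X); sum := sum + X  (after -N)
    total += db["Y"]                # T3: read_item(Y); sum := sum + Y  (before +N)
    db["Y"] = db["Y"] + n           # T1: read_item(Y); Y := Y + N; write_item(Y)
    return total, sum(db.values())  # T3's answer vs the actual total afterwards


print(interleaved_sum(30, 80, 20, 5))  # (125, 130)
```

T3 reports 125, although the actual total is 130. It missed the N = 5 reservations that were in transit between X and Y.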
Why is recovery needed?
A major responsibility of the database administrator is to prepare for the
possibility of hardware, software, network and system failure. It is usually
desirable to recover the databases and return to normal operation as quickly
as possible. Recovery should proceed in such a manner that it protects the
database and the users from unnecessary problems.
Whenever a transaction is submitted to a DBMS for execution, the system is
responsible for making sure that either:
1. All the operations in the transactions are completed successfully and
their effects are recorded permanently in the database, or
2. The transaction has no effect on the database; this may happen if a
transaction fails after executing some of its operations, but before
executing all of them.
Types of failures
A computer failure (system crash): A hardware, software or network error occurs in the computer system during transaction execution.
Transaction or system error: Some operation in the transaction may cause it to fail, such as integer overflow or division by zero.
Local errors or exception conditions detected by the transaction: During transaction execution, certain conditions may occur that require cancellation of the transaction. For example, data for the transaction may not be found.
Concurrency control enforcement: The concurrency control method may decide to abort a transaction and restart it later, for example because several transactions are in a state of deadlock.
Disk failure: Some disk blocks may lose their data because of a read or write malfunction.
Physical problems and catastrophes: This refers to a list of problems that includes power or air-conditioning failure, fire, theft, overwriting of disks, and so on.

Transaction states and additional operations
A transaction is an atomic unit of work that is either completed entirely or not done at all. For recovery purposes, the system needs to keep track of when each transaction starts, terminates, commits or aborts. Hence, the recovery manager keeps track of the following operations:
Begin transaction: This marks the beginning of transaction execution.
Read/Write: These specify read or write operations on database items that are executed as part of the transaction.
End transaction: This specifies that the read and write operations of the transaction have ended, and it marks the end of transaction execution. At this point, it may be necessary to check whether the changes can be permanently applied to the database or whether the transaction has to be aborted.
Commit transaction: This signals a successful end of the transaction, so that any changes executed by the transaction can be committed to the database.
Rollback: This signals that the transaction has ended unsuccessfully, so that any changes that the transaction may have applied to the database must be undone.

Fig. 9.2: State Transition Diagram Illustrating the States for Transaction
Execution

Figure 9.2 shows a state transition diagram that describes how a transaction moves through its execution states. A transaction goes into the active state immediately after it starts execution, where it can issue read and write operations. When the transaction ends, it moves to the partially committed state.

At this point, some recovery protocols need to ensure that a system failure will not prevent the changes of the transaction from being recorded permanently. This is usually done by recording the changes in the system log. Once this check is successful, the transaction is said to have reached its commit point and enters the committed state.

However, a transaction can go to the failed state if one of the checks fails or if the transaction is aborted during its active state. The transaction may then have to be rolled back to undo the effect of its write operations on the database. The terminated state corresponds to the transaction leaving the system.
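The transitions of Fig. 9.2 described above can be written as a lookup table. This Python sketch uses the state and event names shown in the code. They are labels for this illustration only, not names used by any particular DBMS. The sketch rejects any event that the diagram does not allow.

```python
# Sketch of the transaction states in Fig. 9.2 as a transition table.
# State and event names are hypothetical labels chosen to match the text.

TRANSITIONS = {
    ("active", "read/write"): "active",
    ("active", "end_transaction"): "partially_committed",
    ("active", "abort"): "failed",
    ("partially_committed", "commit"): "committed",
    ("partially_committed", "abort"): "failed",   # a commit-time check failed
    ("committed", "leave"): "terminated",
    ("failed", "leave"): "terminated",            # after the rollback completes
}


def run_transaction(events):
    """Return the sequence of states visited; begin_transaction enters 'active'."""
    state = "active"
    visited = [state]
    for event in events:
        if (state, event) not in TRANSITIONS:
            raise ValueError(f"event {event!r} is not allowed in state {state!r}")
        state = TRANSITIONS[(state, event)]
        visited.append(state)
    return visited


print(run_transaction(["read/write", "end_transaction", "commit", "leave"]))
# ['active', 'active', 'partially_committed', 'committed', 'terminated']
print(run_transaction(["read/write", "abort", "leave"]))
# ['active', 'active', 'failed', 'terminated']
```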
Static and dynamic files: Static files are those on which update operations are performed very rarely. Dynamic files, in contrast, are updated constantly and may change frequently.
For example, the master file of a database is a static file, whereas a transaction file is a dynamic file. Records are retrieved from the master file, and the update operations take place in the transaction file.
Self-Assessment Questions
1. ____________________ is an atomic unit comprising one or more
SQL statements.
2. ____________________ users can access databases and use
computer systems simultaneously.
3. ________________________________ occurs when one transaction
updates a database item and then the transaction fails for some
reason.
4. State whether the following statements are true or false:
a) Whenever a transaction is submitted to a DBMS for execution, the
system is responsible for making sure that all the operations in the
transactions are completed successfully and their effects are
recorded permanently in the database.
b) System crash occurs when some operation in the transaction may
cause failure in the system.
c) Local errors refer to a list of problems that includes power or air
conditioning failure, fire, theft.
d) Static files are those on which the update operations are done
every day.
9.3 Transactions in Multiuser System


A transaction is a process of exchanging data and making changes to the attribute values in the database. The atomicity of a transaction is maintained in order to protect the integrity of the database. Atomicity means that transactions are either completely committed or completely rolled back.
In a multiuser system, multiple users operate on the database during the same interval of time. Consider debit card operations: when a person shops in a mall and pays the bill with a debit card, the payment takes hardly 0.01 second. The main task is to coordinate the transaction between the two accounts, and this must be atomic. Otherwise, the person's account might be debited while the retailer's corporate current account is not credited.
In multiuser access, this coordination is achieved by locking the database (or the data involved) while the transaction is in progress. Other update requests on that data at that instant are postponed until the lock is released.
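The effect of locking on concurrent payments can be illustrated with Python threads. In this sketch, `threading.Lock` stands in for a DBMS lock, and the account names and amounts are hypothetical. It shows only the coordination side: the lock keeps each debit and credit pair from interleaving with other payments. It does not undo a half-finished payment after a failure, which is the job of recovery.

```python
import threading

# Sketch: many debit-card payments running at once, coordinated by one lock.
# threading.Lock stands in for the DBMS lock; account names and amounts are hypothetical.


def run_payments(count, amount, opening_balance):
    accounts = {"customer": opening_balance, "retailer": 0}
    lock = threading.Lock()

    def pay():
        with lock:  # other payments wait here until this debit/credit pair finishes
            if accounts["customer"] >= amount:
                accounts["customer"] -= amount   # debit the customer ...
                accounts["retailer"] += amount   # ... and credit the retailer together

    threads = [threading.Thread(target=pay) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return accounts


print(run_payments(50, 100, 10_000))  # {'customer': 5000, 'retailer': 5000}
```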

9.4 Desirable Properties of Transactions


To ensure data integrity, the database management system should maintain the following transaction properties. These are often called the ACID properties.
1. Atomicity: A transaction is an atomic unit of processing. It is either performed in its entirety or not performed at all.
2. Consistency: A transaction that executes completely, without interference from other transactions, should take the database from one consistent state to another. Atomicity helps to preserve consistency. The database system keeps track of the old values of any data on which a transaction performs a write. If the transaction does not complete its execution, the old values are restored so that it appears as though the transaction was never executed.
For example, let Ti be a transaction that transfers Rs. 50 from account A to account B. This transaction can be defined as:
Ti: read(A);
    A := A - 50;
    write(A);
    read(B);
    B := B + 50;
    write(B).
Suppose that before the execution of transaction Ti, the values of accounts A and B are Rs. 1,000 and Rs. 2,000, respectively. Now suppose that during the execution of Ti, a failure occurs after the write(A) operation but before the write(B) operation, preventing Ti from completing successfully. The values of A and B in the database are then Rs. 950 and Rs. 2,000, respectively. Rs. 50 has been lost, and the sum A + B is no longer preserved. Restoring the old value of A returns the database to a consistent state.
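The Rs. 50 transfer can be run as a real transaction with Python's built-in `sqlite3` module, used here only as a convenient stand-in for a DBMS. The sketch assumes a single table `account(name, balance)`. Using the connection as a context manager commits the transaction if the block succeeds and rolls it back if an exception escapes. A failure between write(A) and write(B) therefore leaves both balances as they were.

```python
import sqlite3

# Sketch: the Rs. 50 transfer run as one SQLite transaction, using Python's built-in
# sqlite3 module. The table name "account" and the failure flag are illustrative.


def setup():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 2000)])
    conn.commit()
    return conn


def transfer(conn, amount, fail_after_write_a=False):
    # "with conn" commits if the block succeeds and rolls back if an exception escapes.
    with conn:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,))
        if fail_after_write_a:
            raise RuntimeError("simulated failure after write(A)")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,))


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


conn = setup()
try:
    transfer(conn, 50, fail_after_write_a=True)
except RuntimeError:
    pass
print(balances(conn))  # {'A': 1000, 'B': 2000}  (write(A) was rolled back)
transfer(conn, 50)
print(balances(conn))  # {'A': 950, 'B': 2050}
```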
3. Isolation: In database systems, isolation determines how and when the changes made by one transaction become visible to other users and systems. A lower isolation level increases the ability of many users to access data at the same time. However, it also increases the number of concurrency effects, such as dirty reads or lost updates, that users might encounter. Conversely, a higher isolation level reduces the types of concurrency effects that users may encounter. It requires more system resources, though, and increases the chances that one transaction will block another.
4. Durability: Once a transaction changes the database and the changes are committed, these changes must never be lost because of subsequent failures. Users need not worry about incomplete transactions, because partially executed transactions are rolled back to the original state. Ensuring durability is the responsibility of the recovery management component of the DBMS.
Concepts of schedule history, recoverability, cascading rollback and strict schedules
A schedule history: A schedule history can be defined as a partial order over the operations of a set of transactions.
Suppose T1, T2 and T3 are three transactions:
T1           T2           T3
Read (x)     Write (x)    Read (x)
Write (x)    Write (y)    Read (y)
Commit       Read (z)     Read (z)
             Commit       Commit
One possible history of these transactions is:
H1 = {W2(x), R1(x), R3(x), W1(x), C1, W2(y), R3(y), R2(z), C2, R3(z), C3}
Here, R1, R2 and R3 are the read operations of T1, T2 and T3, and W1, W2 and W3 are their write operations. C1, C2 and C3 are the COMMIT operations of T1, T2 and T3.
Recoverability: Recoverability is the ability to recover data after a transaction failure. A schedule is recoverable if no transaction commits until every transaction whose written data it has read has committed. Consequently, a committed transaction never depends on data written by a transaction that is later aborted.
For example, consider the schedules S and S1 below, which we refer to as example-recover.

Here, both S and S1 are recoverable schedules. In S, transaction T1 commits before transaction T2, so the value of x read by T2 is correct. Later, T2 also commits, and hence S is recoverable.
In S1, transaction T1 is aborted. Therefore, T2 has to abort itself, since the value of x read by T2 is incorrect.
Thus, in both cases, the consistency of the database is maintained.
Consider another example, schedule S2:

In schedule S2, transaction T2 reads x before T1 commits. The value of x read by T2 is therefore incorrect, because T1 is aborted later. Still, T2 has committed; therefore, S2 is unrecoverable.
Cascading rollbacks: The cascading effect means that if one transaction in a schedule fails, many other transactions may be forced to fail along with it. In this case, a cascading rollback has to take place. Rollback is a process that returns the database to the state it was in before the failed transaction made its changes. It helps in maintaining the integrity of the database.
A ROLLBACK command in SQL returns the data to the state it was in before the current transaction made any changes.
Refer to the example on recoverable schedules, example-recover. In this example, although S1 is recoverable, it cannot avoid cascading failures (aborts): when transaction T1 aborts, transaction T2 must abort itself to maintain consistency.
Below is an example of a recoverable schedule that avoids the cascading effect, although the update of x by T1 is lost.

Strict schedule: In a strict schedule, a transaction can neither read nor write an item x until the last transaction that wrote x has committed or aborted. Strict schedules simplify recovery, because undoing a write of an aborted transaction only requires restoring the old value (before image) of the item.
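The recoverability, cascadeless and strict conditions can be checked mechanically. This Python sketch encodes each operation as a (kind, transaction, item) triple. The function name `classify` and this encoding are hypothetical. For brevity, the sketch does not restore an item's earlier writer when a transaction aborts.

```python
# Sketch: classify a schedule as recoverable, cascadeless and strict.
# Operations are (kind, txn, item) with kind "r", "w", "c" (commit) or "a" (abort).
# Simplification (an assumption of this sketch): when a writer aborts, the item's
# "last writer" is not rolled back to the previous writer.


def classify(schedule):
    last_writer = {}                 # item -> transaction that wrote it most recently
    committed, aborted = set(), set()
    read_from = {}                   # txn -> transactions whose writes it has read
    recoverable = cascadeless = strict = True
    for kind, txn, item in schedule:
        if kind in ("r", "w"):
            writer = last_writer.get(item)
            other = writer is not None and writer != txn and writer not in aborted
            if other and writer not in committed:
                strict = False            # touches an item whose writer is still active
                if kind == "r":
                    cascadeless = False   # reads uncommitted (dirty) data
            if kind == "r" and other:
                read_from.setdefault(txn, set()).add(writer)
            if kind == "w":
                last_writer[item] = txn
        elif kind == "c":
            if not read_from.get(txn, set()) <= committed:
                recoverable = False       # commits before a transaction it read from
            committed.add(txn)
        elif kind == "a":
            aborted.add(txn)
    return {"recoverable": recoverable, "cascadeless": cascadeless, "strict": strict}


# H1 from the text: W2(x), R1(x), R3(x), W1(x), C1, W2(y), R3(y), R2(z), C2, R3(z), C3
H1 = [("w", 2, "x"), ("r", 1, "x"), ("r", 3, "x"), ("w", 1, "x"), ("c", 1, None),
      ("w", 2, "y"), ("r", 3, "y"), ("r", 2, "z"), ("c", 2, None),
      ("r", 3, "z"), ("c", 3, None)]
print(classify(H1))  # {'recoverable': False, 'cascadeless': False, 'strict': False}
```

For H1, all three checks fail. T1 reads x written by T2 while T2 is still active, so H1 is neither cascadeless nor strict. T1 then commits at C1, before T2 commits at C2, so H1 is not recoverable.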
Self-Assessment Questions
5. _________________________ in a transaction means transactions
are completely committed or completely rolled back.
6. Data integrity is ensured by maintaining the transaction properties
called _______________ properties.
7. _____________________ can be defined as a partial order over the
operations of a set of transactions.
8. State whether the following statements are true or false:
a) Recoverability is the ability to recover data from the transaction
failure.
b) During cascading rollback, the transactions will not read data
written by the transaction aborted.

9.5 Summary
Let us recapitulate the important concepts discussed in this unit:

Transaction management is the ability of a database management system to manage the various transactions that occur within the system. A transaction is a set of program statements, or a collection of operations, that forms a single logical unit of work.
The basic database access operations are Read-item (x) and Write-item (x).
The types of problems we may encounter when two transactions run concurrently are the lost update problem, the dirty read problem and the incorrect summary problem.
The different types of failures of a system are:
computer failure (system crash);
transaction or system error;
local errors or exception conditions detected by the transaction;
concurrency control enforcement;
disk failure;
physical problems and catastrophes.
To ensure data integrity, the database management system should maintain the transaction properties of Atomicity, Consistency, Isolation and Durability. These are often called the ACID properties.

9.6 Glossary
Buffer: It is a temporary storage area, usually in RAM. The purpose of
most buffers is to act as a holding area, enabling the CPU to manipulate
data before transferring it to a device.
Cascading rollback: A cascading rollback occurs in database systems when a transaction (T1) causes a failure and a rollback must be performed. Other transactions that depend on T1's actions must also be rolled back because of T1's failure, causing a cascading effect. That is, one transaction's failure causes many to fail.
Catastrophe: In the context of database failures, a catastrophe is a physical problem that can damage the database. Examples include power or air-conditioning failure, fire, theft and overwriting of disks.
Concurrency: It refers to acting together, as agents or circumstances or
events.
Consistency: It refers to reliability or uniformity of successive results or
events.
Deadlock: A deadlock is a situation in which two or more competing actions
are each waiting for the other to finish, and thus neither ever does.
Durability: Durability refers to the ability of the system to recover committed
transaction updates if either the system or the storage media fails.
Integrity: Integrity constraints guard against accidental damage to the
database, by ensuring that authorised changes to the database do not result
in a loss of data consistency.
Variable: It refers to a logical set of attributes. These can be changed with
respect to time.

9.7 Terminal Questions


1. Explain the steps involved in database access operations.
2. List the different types of failures in a system.
3. What is a multiuser system?
4. Describe the ACID properties with examples.

9.8 Answers
Self-Assessment Questions
1. Transaction
2. Multiple
3. Dirty read problem
4. Answers:
a) True
b) False
c) False
d) False
5. Atomicity
6. ACID
7. Schedule history
8. Answers:
a) True
b) False
Terminal Questions
1. The basic database access operations are Read-item (x) and Write-item (x). Read-item (x) reads a database item named x into a program variable, and Write-item (x) writes the value of the program variable x into the database. Each of these operations involves a few steps. (Refer to Section 9.2 for further information.)
2. The different types of failures are:
computer failure (system crash);
transaction or system error;
local errors or exception conditions detected by the transaction;
concurrency control enforcement;
disk failure;
physical problems and catastrophes.
(Refer to Section 9.2 for further information.)
3. A multiuser system is one in which multiple users operate on the database during the same interval of time. (Refer to Section 9.3 for further information.)
4. To ensure data integrity, the database management system should maintain certain transaction properties, often called the ACID properties. ACID stands for Atomicity, Consistency, Isolation and Durability. (Refer to Section 9.4 for further information.)
9.9 Case Study


It is often necessary for more than one user to share stored data in a
database. With shared database comes the problem of concurrency control
for preserving data consistency; that is, ensuring that database operations
from different users do not interfere with each other. Database operations
include both reading and updating stored data. Update involves reading an
object from the secondary storage, changing values in the object in main
memory, and then writing the object back to the secondary storage.
Subsequent access is to the updated object.
Concurrency control has been studied extensively for traditional databases.
But research on concurrency control mechanisms for object-oriented
databases is still at its infancy.
CANDIDE is an experimental information retrieval system that stores a wide
variety of data, such as text, graphics, digitised images, sound, application
programs and computer simulation from different disciplines. Objectoriented database concepts are used to organise these data. Currently,
CANDIDE is a single-user system. This thesis involves the design of
multiuser CANDIDE DBMS, design of an efficient concurrency control
mechanism and showing the feasibility of implementing the system in a PC
environment.
A clientserver architecture is used for this multiuser DBMS. For
concurrency control, the hierarchical locking scheme is extended to suit the
object-oriented data model requirements. This method exploits the
semantics of the data model and provides maximum concurrency control.
This scheme is chosen to minimise the number of locks that need to be
acquired while accessing a set of objects.
Researchers made three major contributions towards making CANDIDE a
multiuser system. They are as follows:
1. The researchers designed the multiuser CANDIDE for Version I and
discussed the complete implementation of Version I. They chose the
client-server architecture because of its suitability for the system.
2. They designed an efficient Concurrency Control (CC) mechanism based
on the theory of granularity locking, and an efficient lock table to suit
the mechanism. This CC mechanism aims to provide maximum parallelism.
3. Finally, they demonstrated the feasibility of implementing this system on
PCs. NetBIOS communication software was recommended for implementing
the system on PCs because it is compatible with most existing LANs.
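The granularity (hierarchical) locking idea behind the CC mechanism can be illustrated with the standard intention-lock modes IS, IX, S, SIX and X from the usual textbook treatment of multiple-granularity locking. This is only a sketch under that assumption. The case study does not describe CANDIDE's actual lock modes or lock table, and the node names (db, Image, img42) are hypothetical. To lock one object, a transaction places an intention lock on every ancestor and the real lock on the object itself. Locking a whole class needs only one S or X lock on the class node, which is how the scheme reduces the number of locks that must be acquired.

```python
# Standard multiple-granularity lock modes (assumed, not CANDIDE's own table).
# COMPATIBLE[held][requested] is True if `requested` can be granted on a node
# where another transaction already holds `held`.
COMPATIBLE = {
    "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
    "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
    "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
    "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
    "X":   {"IS": False, "IX": False, "S": False, "SIX": False, "X": False},
}


def locks_for(path, mode):
    """Locks needed to lock the last node of `path` (root first) in `mode`
    ("S" or "X"): an intention lock on each ancestor, then the real lock."""
    intention = "IS" if mode == "S" else "IX"
    return [(node, intention) for node in path[:-1]] + [(path[-1], mode)]


def can_grant(held_by_others, requested):
    """True if `requested` is compatible with every mode other
    transactions already hold on the same node."""
    return all(COMPATIBLE[held][requested] for held in held_by_others)
```

For example, `locks_for(["db", "Image", "img42"], "S")` asks for IS on db, IS on Image and S on img42. `locks_for(["db", "Image"], "S")` covers every object of the Image class with a single S lock on the class node. An IS lock held on a class does not block another transaction's IX request, but an S lock on the class does.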

Discussion Questions:
1. Is recovery addressed in this case? If not, how can it be addressed?
2. Is a deadlock manager needed? Why?
(Hint: Refer to the thesis at
http://itlab.uta.edu/sharma/PPL/ThesisWeb/hks_thesis.pdf)
References/E-References:
References:
Elmasri, R., & Navathe, S. B. (2009). Fundamentals of Database Systems, 5th ed. New Delhi: Pearson Education Inc.
Er. Jain, V. K. (2008). Database Management Systems. New Delhi: Dreamtech Press.
E-References:
http://www.wiziq.com/tutorial/225006-Transaction-Processing-Systemin-DBMS (retrieved on 12th February 2012)
http://itlab.uta.edu/sharma/PPL/ThesisWeb/hks_thesis.pdf (retrieved on 25th March 2012)
http://www.google.co.in/url?sa=t&rct=j&q=case%20study%20on%20multiuser%20system%20in%20dbms&source=web&cd=5&cad=rja&ved=0CEQQFjAE&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.206.2749%26rep%3Drep1%26type%3Dpdf&ei=RLakUKX9I-eviQe994GYDg&usg=AFQjCNFOQnv9YPfHTx2Lcc7Op0slIjsCsw (retrieved on 25th March 2012)


Unit 10

Operations and Management

Structure:
10.1 Introduction
10.2 Client-Server Databases
10.3 Concurrency Management
Types of locks: Locking technique for concurrency control
The two-phase locking protocol
10.4 Distributed Database Management System
10.5 Heterogeneous and Homogeneous Systems
10.6 Summary
10.7 Glossary
10.8 Terminal Questions
10.9 Answers
10.10 Case Study

10.1 Introduction
In Unit 9, you studied how transactions are processed. In this
unit, we will discuss client-server databases. Most companies use a
Local Area Network (LAN) to connect their computers and to share
resources and peripherals, such as servers and printers. A LAN connects
computers that are physically close to one another. In this unit, you will
also study the main concepts of data warehousing and query processing,
and some of the main techniques used to control the concurrent
execution of transactions, which are based on the concept of locking data
items. A lock is a restriction on access to data in a multiuser environment. It
prevents multiple users from changing the same data simultaneously. If
locking is not used, data within the database may become logically incorrect
and queries may produce unexpected results. In addition, we will discuss
heterogeneous and homogeneous systems.
To meet the needs of present-day information systems, many users need
access to a company's databases. The company database may include
details about employees, customers, suppliers and vendors of various
resources. Keeping each department's data on a separate system while
maintaining the integrity of the whole is not a realistic expectation.
Therefore, in today's Internet world, telecommunications allow a company
to store its entire database in one central place, on a mainframe computer,
and to give users access to it. However, storing information on a single
mainframe has both advantages and disadvantages. On the negative side,
if the central system or site goes down, the whole system comes to a
standstill until the site is repaired. If data is lost at the server, recovering it
can be difficult and risky. The cost of connecting distant terminals to the
main server is also very high. Distributed databases were therefore
designed as an alternative to centralised databases.
In this unit, we will discuss the basic concepts of distributed databases. We
will also study the different types of distributed databases, namely
heterogeneous and homogeneous databases.
Objectives
After studying this unit, you should be able to:
explain client-server databases
list and explain different types of locks and locking protocol in
concurrency management
differentiate between centralised system and Distributed Database
Management System (DDBMS)
list the advantages and disadvantages of DDBMS
describe the types of distributed systems

10.2 Client-Server Databases


A client-server database is an arrangement of personal computers
connected through a communication medium, usually a Local Area Network
(LAN). A LAN connects computers that are located near one another.
The client-server model is basic to distributed systems. It allows clients to
make requests that are routed to the appropriate server in the form of
transactions. The client-server model consists of three parts:
Client
Server
Network
Client: The client is the machine (workstation or PC) running the front-end
applications. It interacts with the user through the keyboard, display
and mouse. The client has no direct data access responsibilities. Instead,
it provides front-end application software for accessing the data on the
server. Clients initiate transactions, and the server processes them.

While an SQL query is processed, the interaction between the client and the
server may proceed as follows:
a) The client parses a user query and decomposes it into a number of
independent site queries. Each site query is sent to the appropriate
server site.
b) Each server processes its local query and sends the resulting relation
to the client site.
c) The client site combines the results of the site queries to produce the
result of the originally submitted query.
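Steps (a) to (c) can be simulated in a few lines of Python. This minimal sketch assumes that every site stores a horizontal fragment of the same EMPLOYEE relation, so the client can combine the partial results with a simple union. The site names, the rows and the use of a Python predicate in place of an SQL WHERE clause are all hypothetical, and no real network is involved.

```python
# Hypothetical data: EMPLOYEE rows spread over three server sites.
SITES = {
    "site_A": [{"name": "Asha", "dno": 10, "salary": 40000},
               {"name": "Ravi", "dno": 10, "salary": 55000}],
    "site_B": [{"name": "Meena", "dno": 20, "salary": 62000}],
    "site_C": [{"name": "John", "dno": 30, "salary": 38000}],
}


def server_process(site, predicate):
    """Step (b): a server evaluates its site query on local data only."""
    return [row for row in SITES[site] if predicate(row)]


def client_query(predicate):
    """Steps (a) and (c): send one site query to each server, then
    combine (here, union) the partial results."""
    result = []
    for site in SITES:                                   # (a) one site query per server
        result.extend(server_process(site, predicate))   # (b) local processing
    return result                                        # (c) combined answer


# Roughly "SELECT name FROM EMPLOYEE WHERE salary > 50000".
high_paid = [r["name"] for r in client_query(lambda r: r["salary"] > 50000)]
```

Here `high_paid` contains Ravi (from site_A) and Meena (from site_B). When fragments overlap or must be joined rather than unioned, step (c) needs a more elaborate combination.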
In this arrangement, the server is called the database processor or back-end
machine, whereas the client is called the application processor or front-end
machine. The client is also responsible for ensuring the consistency of
replicated copies of a data item by using distributed concurrency control
techniques. The client must also ensure the atomicity of global transactions
by performing global recovery when certain sites fail. Finally, it provides
distribution transparency, which means the client hides the details of data
distribution from the user.
Server: The server is the machine referred to as the back end. It
processes SQL and other query statements received from client
applications, and it typically has large disk capacity and fast processors.
Network: The network enables remote data access through client-server
and server-to-server communication.
Each computer in a network is a node, and it acts as a client, a server or
both, depending on the situation.
Advantages
Client applications do not depend on the physical location of the data. If
the data is moved or distributed to other database servers, the
application continues to function with little or no modification.
Servers provide multitasking and shared-memory facilities. As a result,
they can deliver a high degree of concurrency and data integrity.
In a networked environment, shared data is stored on the servers rather
than on every computer in the system. This makes it easier and more
efficient to manage concurrent access. Inexpensive, low-end client
workstations can access the server's remote data effectively.

Data warehousing versus query processing


When discussing client-server databases, an important concept to focus
on is data warehousing, because it supports increasingly powerful
analytical tools and techniques. Unlike traditional databases, which are
transactional, data warehouses are meant for decision making. They are
optimised for data retrieval and are not meant for day-to-day transaction
processing.
W. H. Inmon defines a data warehouse as a subject-oriented, integrated,
non-volatile and time-variant collection of data in support of management
decisions. With the help of a data warehouse, data can be used for
complex analysis, knowledge discovery and decision making.
Self-Assessment Questions
1. Client-server model consists of three parts: client, server and
_________________.
2. ________________ is a workstation running the front-end application.
3. Interaction between ______________ and __________________
might be processed during an SQL query.
4. _______________________________ is defined as subject oriented,
integrated, non-volatile and time variant collection of data in supporting
management decisions.

10.3 Concurrency Management


Suppose you and a colleague are working on the same data in a table. You
both change a common row of the table, and you both try to save your
versions to the database at the same time. Whose changes should the
database keep? Similarly, what should happen if you both try to change the
same object of the table?
This is where concurrency control comes into the picture. Concurrency
control helps solve the problems caused by multiple users using the same
data at the same time. To solve these problems, you must understand the
meaning of transactions and collisions. Transactions were discussed in
detail in Unit 9. A collision occurs when two activities attempt to change the
same entities within a system of record. Each activity may be a complete
transaction or only a part of one. The problems that such collisions cause
were studied in Unit 9.
Let us now discuss the different types of locks used for concurrency control.
These locks are based on collision strategies.
10.3.1 Types of locks: Locking technique for concurrency control
Several types of locks are used in concurrency control. To introduce locking
concepts gradually, we shall first discuss binary locks, which are simple but
restrictive and therefore not used in practice. We shall then discuss shared
locks and exclusive locks, which provide more general locking capabilities
and are used in practical database locking schemes.
Binary locks: A binary lock can have two states or values: locked and
unlocked (or 1 and 0, for simplicity). A distinct lock is associated with each
database item X. If the value of the lock on X is 1, item X cannot be
accessed by a database operation that requests the item. If the value of the
lock on X is 0, the item can be accessed when requested.
Two operations, lock_item and unlock_item, are used with binary locking. A
transaction requests access to an item X by first issuing a lock_item(X)
operation. If LOCK(X) = 1, the transaction is forced to wait. If LOCK(X) = 0,
it is set to 1 and the transaction is allowed to access item X. When the
transaction is through using the item, it issues an unlock_item(X) operation,
which sets LOCK(X) to 0 (unlocks the item), so that X may be accessed by
another transaction. Hence, a binary lock enforces mutual exclusion on the
data item.
lock_item(X):
B: if LOCK(X) = 0 (item is unlocked)
then LOCK(X) ← 1 (lock the item)
else begin
wait (until LOCK(X) = 0 and
the lock manager wakes up the transaction);
goto B
end;
unlock_item(X):
LOCK(X) ← 0; (unlock the item)
if any transactions are waiting
then wake up one of the waiting transactions;
If the simple binary locking scheme described here is used, every
transaction must obey the following rules:
1. A transaction T must issue the operation lock_item(X) before any
read_item(X) or write_item(X) operations are performed in T.
2. A transaction T must issue the operation unlock_item(X) after all
read_item(X) and write_item(X) operations are completed in T.
3. A transaction T will not issue a lock_item(X) operation if it already holds
the lock on item X.
4. A transaction T will not issue an unlock_item(X) operation unless it
already holds the lock on item X.
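The lock_item and unlock_item pseudocode can be made runnable with Python's `threading.Condition`. Its `wait()` and `notify_all()` stand in for "wait until the lock manager wakes up the transaction". This is a single-process sketch, not a real DBMS lock manager, and the class name `BinaryLockManager` is hypothetical. There is one deliberate difference from the pseudocode. Because a single condition variable is shared by all items, `unlock_item` wakes every waiter rather than just one, and each waiter re-tests LOCK(X), which is what the `goto B` loop does.

```python
import threading


class BinaryLockManager:
    """Binary locks: LOCK(X) is 1 (locked) or 0 (unlocked) for each item X."""

    def __init__(self):
        self._lock_bits = {}                  # item -> 0 or 1; a missing item counts as 0
        self._cond = threading.Condition()    # lets waiting transactions sleep

    def lock_item(self, x):
        with self._cond:
            # While LOCK(X) = 1 the caller waits; the loop re-tests LOCK(X)
            # after every wake-up, like "goto B" in the pseudocode.
            while self._lock_bits.get(x, 0) == 1:
                self._cond.wait()
            self._lock_bits[x] = 1            # LOCK(X) <- 1

    def unlock_item(self, x):
        with self._cond:
            self._lock_bits[x] = 0            # LOCK(X) <- 0
            self._cond.notify_all()           # wake waiters; each re-checks its item

    def is_locked(self, x):
        with self._cond:
            return self._lock_bits.get(x, 0) == 1


# Usage: T1 holds X while T2 (a second thread) blocks in lock_item("X").
mgr = BinaryLockManager()
log = []


def t2_body():
    mgr.lock_item("X")                        # waits until T1 unlocks X
    log.append("T2 has X")
    mgr.unlock_item("X")


mgr.lock_item("X")                            # T1 locks X first
t2 = threading.Thread(target=t2_body)
t2.start()
log.append("T1 releases X")
mgr.unlock_item("X")
t2.join()
```

Because T2 cannot append to the log until it obtains X, and T1 releases X only after logging, the log always records T1's release before T2's access. This ordering is the mutual exclusion a binary lock enforces.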
Shared locks: A shared lock is used for read-only operations, that is, for
operations that do not change or update the data.
For example, SELECT statement.
Shared locks allow concurrent transactions to read (SELECT) the same data.
No other transaction can modify the data while shared locks on it exist.
Depending on the isolation level, some systems release shared locks as
soon as the data has been read.
Exclusive locks: Exclusive locks are used for data modification
operations, such as UPDATE, DELETE and INSERT. They ensure that
multiple updates cannot be made to the same resource simultaneously. No
other transaction can read or modify data that is locked by an exclusive lock.
Because they protect write operations, exclusive locks are held until the
transaction commits or rolls back.
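Whether a shared or exclusive request can be granted depends only on the modes that other transactions already hold on the same item. The small Python table below sketches that rule. The names `COMPATIBLE` and `can_grant` are illustrative and are not part of any DBMS API.

```python
# Shared (S) / exclusive (X) compatibility: (held, requested) -> grantable?
COMPATIBLE = {
    ("S", "S"): True,    # many readers may share an item
    ("S", "X"): False,   # a writer must wait for the readers
    ("X", "S"): False,   # readers must wait for the writer
    ("X", "X"): False,   # only one writer at a time
}


def can_grant(held_by_others, requested):
    """Grant only if `requested` is compatible with every mode held by
    other transactions on the item (an empty list means unlocked)."""
    return all(COMPATIBLE[(held, requested)] for held in held_by_others)
```

For instance, `can_grant(["S", "S"], "S")` is true because readers share the item, while `can_grant(["S"], "X")` is false because an UPDATE must wait until the readers have finished.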

In the shared/exclusive locking scheme, there are three locking operations:
read_lock(X), write_lock(X) and unlock(X). A lock associated with an item X,
LOCK(X), now has three possible states: read-locked, write-locked or
unlocked. A read-locked item is also called share-locked, because other
transactions are allowed to read the item, whereas a write-locked item is
called exclusive-locked, because a single transaction exclusively holds the
lock on the item.
Each record in the lock table has four fields: <data item name, LOCK,
no_of_reads, locking_transaction(s)>. The value (state) of LOCK is either
read-locked or write-locked. Items that are unlocked have no record in the
lock table.
read_lock(X):
B: if LOCK(X) = unlocked
then begin LOCK(X) ← read-locked;
no_of_reads(X) ← 1
end
else if LOCK(X) = read-locked
then no_of_reads(X) ← no_of_reads(X) + 1
else begin
wait (until LOCK(X) = unlocked and
the lock manager wakes up the transaction);
goto B
end;
write_lock(X):
B: if LOCK(X) = unlocked
then LOCK(X) ← write-locked
else begin
wait (until LOCK(X) = unlocked and
the lock manager wakes up the transaction);
goto B
end;
unlock(X):
if LOCK(X) = write-locked
then begin LOCK(X) ← unlocked;
wake up one of the waiting transactions, if any
end
else if LOCK(X) = read-locked
then begin
no_of_reads(X) ← no_of_reads(X) - 1;
if no_of_reads(X) = 0
then begin LOCK(X) ← unlocked;
wake up one of the waiting transactions, if any
end
end;
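The read_lock, write_lock and unlock pseudocode can be sketched in Python around a lock table whose records carry the four fields described above. To keep the example short and deterministic, this hypothetical `LockTable` does not block. Each request returns True when granted and False when the transaction would have to wait; a real lock manager would queue the transaction and retry on wake-up. The sketch also assumes, as in the binary-lock rules, that a transaction never requests a lock it already holds, and it does not support upgrading a read lock to a write lock.

```python
class LockTable:
    """Shared/exclusive lock table with records
    <data item name, LOCK, no_of_reads, locking_transaction(s)>.
    Unlocked items have no record. Requests never block: they return
    True if granted, or False if the transaction would have to wait."""

    def __init__(self):
        self.records = {}   # item -> {"lock", "no_of_reads", "holders"}

    def read_lock(self, txn, x):
        rec = self.records.get(x)
        if rec is None:                                 # LOCK(X) = unlocked
            self.records[x] = {"lock": "read-locked", "no_of_reads": 1, "holders": {txn}}
            return True
        if rec["lock"] == "read-locked":                # readers share the item
            rec["no_of_reads"] += 1
            rec["holders"].add(txn)
            return True
        return False                                    # write-locked: must wait

    def write_lock(self, txn, x):
        if x in self.records:                           # any lock held: must wait
            return False
        self.records[x] = {"lock": "write-locked", "no_of_reads": 0, "holders": {txn}}
        return True

    def unlock(self, txn, x):
        rec = self.records.get(x)
        if rec is None or txn not in rec["holders"]:
            raise ValueError(f"{txn} holds no lock on {x}")
        if rec["lock"] == "write-locked":
            del self.records[x]                         # LOCK(X) <- unlocked
            return
        rec["no_of_reads"] -= 1                         # one reader leaves
        rec["holders"].discard(txn)
        if rec["no_of_reads"] == 0:
            del self.records[x]                         # last reader: unlocked
```

Consider a typical sequence. T1 and T2 both read-lock X, so no_of_reads(X) becomes 2. T3's write_lock(X) is refused until both readers have unlocked. At that point X has no record in the table, and T3's write lock succeeds.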
10.3.2 The two-phase locking protocol
The two-phase locking (2PL) protocol governs when a transaction may acquire
and release locks on shared resources. If every transaction follows it, the
resulting schedule is guaranteed to be serialisable. Basic 2PL does not, by
itself, prevent deadlocks. The protocol consists of two phases:
1. Growing phase: In this phase, the transaction may acquire locks but
may not release any lock. Therefore, this phase is also called the
resource acquisition activity.
2. Shrinking phase: In this phase, the transaction may release locks but
may not acquire any new lock. The transaction's remaining data
modifications and its lock releases are grouped together in this second
phase.
A transaction begins in the growing phase and acquires each lock as it is
needed. As soon as it releases its first lock, it enters the shrinking phase
and can no longer request new locks.
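The two-phase rule can be checked mechanically on one transaction's sequence of operations: once the first unlock appears, no later lock is allowed. The function below is a minimal sketch. The encoding of operations as (operation, item) pairs is an assumption made for illustration.

```python
def follows_2pl(ops):
    """ops is one transaction's operations in order, as (operation, item)
    pairs such as ("lock", "X"), ("read", "X"), ("unlock", "X").
    Returns True if no lock is acquired after the first unlock."""
    shrinking = False
    for op, _item in ops:
        if op == "unlock":
            shrinking = True          # first release: shrinking phase begins
        elif op == "lock" and shrinking:
            return False              # acquiring a lock while shrinking breaks 2PL
    return True


# T1 locks both items before releasing either: two-phase.
T1_OPS = [("lock", "Y"), ("read", "Y"), ("lock", "X"),
          ("unlock", "Y"), ("write", "X"), ("unlock", "X")]
# T2 releases Y and then locks X: not two-phase.
T2_OPS = [("lock", "Y"), ("read", "Y"), ("unlock", "Y"),
          ("lock", "X"), ("write", "X"), ("unlock", "X")]
```

T1_OPS passes the check, while T2_OPS fails because its lock on X comes after the shrinking phase has already begun.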
Strict two-phase locking
Basic two-phase locking does not avoid cascading rollback. To avoid it, a
slight modification is made to two-phase locking, called strict two-phase
locking. Under strict 2PL, a transaction does not release any of its exclusive
(write) locks until it commits or aborts. A more restrictive variant, rigorous
two-phase locking, holds all locks, shared as well as exclusive, until the
transaction commits or aborts.
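The usual textbook definitions (for example, Elmasri and Navathe) distinguish two variants. Strict 2PL forbids releasing an exclusive (write) lock before commit or abort, although shared locks may still be released earlier. Rigorous 2PL forbids early release of both kinds. The sketch below checks one transaction's operations against strict 2PL. The (operation, item) encoding and the function name are hypothetical.

```python
def is_strict_2pl(ops):
    """ops: (operation, item) pairs for one transaction, using
    "read_lock", "write_lock", "unlock", "commit" and "abort"
    (item is None for commit/abort). Returns True if the transaction is
    two-phase and releases no write lock before it commits or aborts."""
    shrinking = False
    finished = False                  # set once commit or abort is seen
    write_locked = set()
    for op, item in ops:
        if op in ("read_lock", "write_lock"):
            if shrinking:
                return False          # violates basic 2PL
            if op == "write_lock":
                write_locked.add(item)
        elif op == "unlock":
            shrinking = True
            if item in write_locked and not finished:
                return False          # exclusive lock released before commit
        elif op in ("commit", "abort"):
            finished = True
    return True


# Releasing the shared lock on Y early is allowed; X is held until commit.
STRICT = [("write_lock", "X"), ("read_lock", "Y"), ("unlock", "Y"),
          ("commit", None), ("unlock", "X")]
# Releasing the write lock on X before commit is not strict.
NOT_STRICT = [("write_lock", "X"), ("unlock", "X"), ("commit", None)]
```

STRICT would not satisfy rigorous 2PL, because it releases its shared lock on Y before committing.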
Deadlock and starvation
In a deadlock state, there exists a set of transactions in which every
transaction in the set is waiting for another transaction in the set.
Suppose there is a set of waiting transactions {T1, T2, ..., Tn} such that
T1 is waiting for a data item locked by T2, T2 is waiting for an item locked
by T3, and so on, and Tn is waiting for an item locked by T1. In this state,
none of the transactions can make progress.
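A common way for a lock manager to detect the deadlock described above is to build a wait-for graph, with an edge Ti -> Tj whenever Ti is waiting for an item locked by Tj, and then look for a cycle. The depth-first search below is a minimal sketch of that cycle check. The dictionary representation of the graph is an assumption, and choosing which transaction to abort as the victim is not covered.

```python
def find_deadlock(waits_for):
    """waits_for maps each transaction to the transactions it waits for.
    Returns the transactions on one cycle, or None if there is no cycle."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on current path / finished
    colour = {t: WHITE for t in waits_for}
    path = []

    def visit(t):
        colour[t] = GREY
        path.append(t)
        for u in waits_for.get(t, ()):
            if colour.get(u, WHITE) == GREY:        # back edge: cycle found
                return path[path.index(u):]
            if colour.get(u, WHITE) == WHITE:
                cycle = visit(u)
                if cycle:
                    return cycle
        colour[t] = BLACK                           # fully explored, no cycle via t
        path.pop()
        return None

    for t in list(waits_for):
        if colour[t] == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None


# T1 waits for T2, T2 for T3, and T3 for T1: a deadlock.
DEADLOCKED = {"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}
# Same chain, but T3 waits for nobody: no deadlock.
NO_DEADLOCK = {"T1": ["T2"], "T2": ["T3"], "T3": []}
```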
Self-Assessment Questions
5. State whether the following statements are True/False:
a) Concurrency control helps to solve the problems that are caused
by multiple users using the same data at the same time
b) A binary lock has one state and the value of lock on X is 1.
6. The two-phase locking protocol has two phases:
____________________ and ___________________.
7. In __________________ state there exists a set of transactions in
which every transaction in the set is waiting for another transaction in
the set.

10.4 Distributed Database Management System


A Distributed Database Management System (DDBMS) is based on
decentralisation. In a DDBMS, the data is stored at multiple sites
(computers), and the data at these sites is logically interrelated. Distributed
database systems are considered powerful management tools because
they are effective at supporting and manipulating databases.
The difference between centralised database systems and distributed
database systems is shown in Figures 10.1(a) and 10.1(b).

Fig. 10.1(a): Centralised DBMS

Fig. 10.1(b): Distributed DBMS

In Figure 10.1(a), there is a single database at one central site, and all the
connected systems share it through the communication network. In Figure
10.1(b), many databases are attached to different systems, and these
databases are interlinked through a communication network.
The two figures show that in a centralised database system the data is
stored in one place and all the systems share it, whereas in a distributed
database system the data is stored at various places and is logically
interlinked. Any system connected to the network can therefore access the
data it requires, and the risk of data loss due to a system or database
failure is reduced.
Advantages of DDBMS:
Increased reliability and availability: A centralised database system
becomes unavailable when the central system fails. A distributed system,
by contrast, can continue to function when one site fails, possibly at a
somewhat lower level of performance. Because the data is spread across
several sites, the failure of one site does not make all of the data
unavailable.
Local control: A DDBMS lets each local site keep control over its own
data and administer it rigorously, which improves data integrity and
administration. Users can still access non-local data when needed.
Modular growth: When the system needs to be extended, it is easier to
add a new local system, or to share data over the connected network, than
to expand a central computer. This also causes less disruption to existing
users than modifying a centralised database system would.
Lower communication costs: A DDBMS reduces communication costs
because data can be kept close to the point where it is used.
Faster response: Most data is stored at the same site where it is used,
which helps the system respond faster.

Disadvantages of DDBMS:
Software cost and complexity: A DDBMS requires complex software
to make the networked sites work in alignment, which increases cost.
Processing overhead: Because additional sites are involved, the sites
must exchange messages and perform additional calculations so that the
data is properly coordinated across sites.
Data integrity: The increased complexity and the need for additional
calculation may compromise the integrity of the data.
Slow response: Because the data is distributed, additional queries are
needed for coordination. If queries are not formulated properly, responses
may be slow.
Self-Assessment Questions
8. DDBMS is based on ___________________.
9. In DDBMS, most of the data are stored in the same site where the
system is located. This advantage helps in ______________________
from the system.
10. _______________________ increases integrity and administration.
11. State whether the following statements are True/False:
a) DDBMS uses simple software to help the network to work in
alignment, which increases the cost.
b) Due to increased complexity and need for additional calculations,
DDBMS may hamper the integrity of the data.

10.5 Heterogeneous and Homogeneous Systems


Before defining the two types of distributed systems, we shall study the
data fragmentation, replication and allocation techniques used in
distributed database design.
Data fragmentation: Fragmentation techniques are used to break up the
database into logical units called fragments, which may be assigned for
storage at various sites. In a DDBMS, decisions must be made about which
site should store which portions of the database. The two basic types of
fragmentation are horizontal and vertical; combining them gives mixed
fragmentation:
1. Horizontal fragmentation: A horizontal fragmentation divides a
relation horizontally by grouping rows to create subsets of tuples,
where each subset has a certain logical meaning. These fragments can
then be assigned to different sites in the distributed system. For
example, we may divide the EMPLOYEE relation into three horizontal
fragments with the conditions (DNO = 10), (DNO = 20) and (DNO = 30).
Each fragment contains the EMPLOYEE tuples for the employees working
in one particular department.
2. Vertical fragmentation: A vertical fragment keeps only certain attributes
of the relation, so vertical fragmentation divides a relation by columns.
For example, we may fragment the EMPLOYEE relation into two vertical
fragments. The first fragment includes personal information (Name,
B date, Address); the second includes work-related information (SSN,
Salary, Mgr. no. and so on). Each vertical fragment should also include
the primary key (here, SSN) so that the original relation can be
reconstructed by joining the fragments.
3. Mixed fragmentation: A combination of horizontal and vertical
fragmentation is called mixed fragmentation.
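The EMPLOYEE examples can be sketched with Python lists of dictionaries. The rows, attribute names and helper functions below are hypothetical. The point is that horizontal fragments (selections on DNO) are put back together by union, while vertical fragments (projections that keep the key SSN) are put back together by joining on SSN.

```python
# Hypothetical EMPLOYEE relation; "ssn" is the primary key.
EMPLOYEE = [
    {"ssn": "111", "name": "Asha", "bdate": "1985-02-01", "address": "Manipal", "salary": 40000, "dno": 10},
    {"ssn": "222", "name": "Ravi", "bdate": "1979-07-15", "address": "Gangtok", "salary": 55000, "dno": 20},
    {"ssn": "333", "name": "Meena", "bdate": "1990-11-30", "address": "Bangalore", "salary": 62000, "dno": 30},
]


def horizontal_fragments(rows, dnos):
    """One fragment per department: the selection DNO = d for each d."""
    return {d: [r for r in rows if r["dno"] == d] for d in dnos}


def vertical_fragment(rows, attrs, key="ssn"):
    """Projection on `attrs`, always keeping the key so fragments can be rejoined."""
    cols = [key] + [a for a in attrs if a != key]
    return [{a: r[a] for a in cols} for r in rows]


# Horizontal fragments are recombined by UNION ...
h = horizontal_fragments(EMPLOYEE, [10, 20, 30])
union = [r for d in (10, 20, 30) for r in h[d]]

# ... and vertical fragments by JOIN on the key.
personal = vertical_fragment(EMPLOYEE, ["name", "bdate", "address"])
work = vertical_fragment(EMPLOYEE, ["salary", "dno"])
work_by_key = {r["ssn"]: r for r in work}
joined = [{**p, **work_by_key[p["ssn"]]} for p in personal]
```

Both `union` and `joined` reproduce the original EMPLOYEE rows. Without the ssn column in `personal`, there would be no way to match its rows back to the rows in `work`.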
Data replication and allocation: Replication is useful in improving the
availability of data. Replicating the whole database at every site in the
distributed system gives a fully replicated database. This can improve
availability, because the system can continue to operate as long as at least
one site is up. It also improves the performance of retrieval for global
queries, because the result of such a query can be obtained locally from
any one site. The disadvantage is that full replication can slow down update
operations, since an update must be performed on every copy of the
database to keep the copies consistent. Full replication also makes
concurrency control and recovery techniques more expensive.
The other extreme from full replication is no replication, in which each
fragment is stored at only one site. Between the two extremes is partial
replication, in which some fragments of the database are replicated and
others are not. For example, some users carry partially replicated
databases with them on laptops.
Allocation: Each copy of a fragment must be assigned to a particular site
in the distributed system. This process is called data distribution or
allocation.
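Replication and allocation can be summarised in an allocation schema that lists, for each fragment, the sites that hold a copy of it. The sketch below assumes a simple read-one/write-all policy. This matches the text's point that a read can be served from any one copy while an update must reach every copy. The fragment and site names are hypothetical, and real DDBMSs use more refined protocols.

```python
# Hypothetical allocation schema: fragment -> sites holding a copy.
ALLOCATION = {
    "EMP_D10": ["site_A", "site_B"],   # replicated at two sites
    "EMP_D20": ["site_B"],             # stored at one site only (no replication)
}


def read_sites(fragment, up_sites):
    """Read-one: any single available copy will do."""
    return [s for s in ALLOCATION[fragment] if s in up_sites][:1]


def write_sites(fragment, up_sites):
    """Write-all: every copy must be updated, so the write is possible
    only if all sites holding a copy are up (None otherwise)."""
    copies = ALLOCATION[fragment]
    if all(s in up_sites for s in copies):
        return list(copies)
    return None
```

Suppose site_B fails, so only site_A is up. EMP_D10 can still be read from site_A, but EMP_D20 cannot be read at all, and neither fragment can be updated under write-all. Replication improves availability for reads while making updates harder.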
Types of DDB systems
In a distributed database (DDB) system, software is distributed over multiple
sites connected by a network. Such systems can be distinguished by two
factors. The first factor is the degree of homogeneity of the DDBMS
software. If all servers (or individual local DBMSs) use identical software
and all users use identical software, the DDBMS is called homogeneous;
otherwise, it is called heterogeneous. The second factor is the degree of
local autonomy. At the opposite extreme from a system with no local
autonomy is the federated DDBMS or multidatabase system. In such a
system, each server is an independent DBMS with its own local users, local
programmers and DBA. In a heterogeneous federated database system
(FDBS), one server may be a Relational Database Management System
(RDBMS), another a network DBMS, a third a hierarchical DBMS, and so
on. For this purpose, a canonical system language is needed, along with
language translators that translate from the canonical language into the
language of each server.
Self-Assessment Questions
12. _______________________ divides a relation horizontally by grouping
rows to create subsets of tuples where each subset has a certain
logical meaning.
13. _________________ is useful in improving the availability of data.
14. The process of assigning each copy of a fragment to a particular site in
the distributed system is called _____________________.
15. If all servers and users use identical software, the DDBMS is called
Homogeneous. (True/False)

10.6 Summary
Let us recapitulate the important concepts discussed in this unit:
Client-server database is an arrangement of personal computers
connected through a communication medium, usually a LAN. The
client-server model is basic to distributed systems. The client-server
model consists of three parts: client, server and network.
Concurrency control helps to solve the problems that are caused by
multiple users using the same data at the same time. Locking is the main
technique for concurrency control. The simplest lock, a binary lock, can
have two states or values: locked and unlocked (or 1 and 0, for simplicity).
DDBMS is based on decentralisation. In a DDBMS, the data is stored at
multiple sites and is logically interrelated.
The two basic types of fragmentation are horizontal fragmentation and
vertical fragmentation.
If all servers (or individual local DDBMSs) use identical software and all
users use identical software, the DDBMS is called homogeneous;
otherwise, it is called heterogeneous.

10.7 Glossary
Data warehousing: A data warehouse is a relational database that is
designed for query and analysis rather than for transaction processing.
LAN: A Local Area Network (LAN) is a computer network that interconnects
computers in a limited area such as a home, school, computer laboratory or
office building using network media.
Locks: Locks ensure that data shared by conflicting operations is
accessed by one operation at a time, a simple form of serialisation.
Multitasking: It refers to the ability to execute more than one task at the
same time, a task being a program.
Query processing: Query processing is a process that turns user queries
and data modification commands into a query plan, that is, a sequence of
operations (or algorithm) on the database.
SQL: Structured Query Language (SQL) is a special-purpose programming
language designed for managing data in Relational Database Management
Systems (RDBMS). Originally based on relational algebra and tuple
relational calculus, its scope includes data insert, query, update and delete,
schema creation and modification, and data access control.
Transaction: A transaction is a logical unit of database processing that
includes one or more database access operations, such as reads, inserts,
updates and deletes.
Workstation: Normally, a personal computer that is connected to the
central server is termed as workstation. There can be many workstations
connected to a server.

10.8 Terminal Questions


1. Describe the client-server database system.
2. What are the different types of locks in concurrency management?
3. Compare centralised system and distributed database management
system.
4. What are the advantages of DDBMS?
5. How is homogeneous distributed system different from heterogeneous
system?


10.9 Answers
Self-Assessment Questions
1. Network
2. Client
3. Client, server
4. Data warehouse
5. Answers:
a) True
b) False
6. Growing phase, shrinking phase
7. Deadlock
8. Decentralisation
9. Faster response
10. Local control
11. Answers:
a) False
b) True
12. Horizontal fragmentation
13. Replication
14. Allocation
15. True
Terminal Questions
1. Client-server database is a type of arrangement of personal computers
through a communication medium. These computers are connected
through LAN. LAN is used to connect the computers that are located
nearby to each other. (Refer to Section 10.2 for further information.)
2. Binary locks, shared locks and exclusive locks. (Refer to Section 10.3.1
for further information.)
3. In centralised database system the data is stored in one place and all
the systems share the data which is present in a single place, whereas
in distributed database system data is present in various places and
are interlinked logically. So, any system connected to the network can
have access to the required data. In this way, the fear of data loss due
to system failure or database failure is reduced. (Refer to Section 10.4
for further information.)

4. Advantages of DDBMS are increased reliability and availability, local
control, modular growth, lower communication costs and faster
response. (Refer to Section 10.4 for further information.)
5. If all servers (or individual local DDBMSs) use identical software and all
users use identical software, the DDBMS is called homogeneous;
otherwise, it is called heterogeneous. (Refer to Section 10.5 for further
information.)

10.10 Case Study


Fragmentation and Performance
A telecommunications database that keeps track of cell phone subscribers'
information runs on Informix Dynamic Server™, version 7. The system is
built on a Sun Enterprise 3500 with limited hardware resources: the
machine has only two low-speed CPUs and only 512 MB of RAM. The
system has six 9-GB disks that are mirrored to each other. In other words,
data can be written to only three disks, and the other three are used for
mirroring. The database is small, barely 2 GB of data. It has 130 tables, of
which 66 are fragmented. All fragmented tables use the same
expression-based distribution scheme, which partitions a table's data into
20 fragments based on the mod function. On closer examination, we find
that the fragmented tables are not huge tables (tables with millions of
rows). These tables are quite small; the largest is merely 5 MB. Only six
tables exceed 1 MB (most tables are smaller than 100 KB), and the last few
tables in the list are almost empty. Further investigation shows that all the
fragments are placed on one physical disk slice.
We can observe that this fragmentation strategy did not improve Informix
Dynamic Server performance at all; it actually seemed to affect
performance adversely. The reason is clear. When retrieving data from one
table, Informix Dynamic Server had to search 20 non-contiguous dbspaces
instead of one contiguous extent. Since the system did not use the Informix
Dynamic Server Parallel Database Query (PDQ) mechanism, it had to
perform all those searches serially.
In other words, Informix Dynamic Server did not take advantage of the
fragmentation as expected; it still searched or scanned all fragments for the
needed data. This defeats the whole purpose of fragmentation, whose
ultimate goal is to reduce retrieval time by accessing directly the fragment
that contains the data we need.
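The mod-based distribution in the case study can be sketched to show when fragmentation actually helps. This is a hypothetical Python model and not Informix Dynamic Server's actual implementation. Rows are placed in one of 20 fragments by key mod 20. A query that supplies the key needs to visit only one fragment (fragment elimination). A query without the key must visit all 20, which is what happened in the case, serially and on a single disk slice.

```python
N_FRAGMENTS = 20   # the case study splits each table into 20 fragments


def fragment_of(key):
    """Expression-based placement: a row goes to fragment key mod 20."""
    return key % N_FRAGMENTS


def build_fragments(keys):
    """Distribute row keys over the fragments."""
    frags = {i: [] for i in range(N_FRAGMENTS)}
    for k in keys:
        frags[fragment_of(k)].append(k)
    return frags


def fragments_to_scan(query_key=None):
    """With an equality condition on the fragmentation key, only one
    fragment can hold the row; otherwise every fragment must be scanned."""
    if query_key is None:
        return list(range(N_FRAGMENTS))
    return [fragment_of(query_key)]


frags = build_fragments(range(1000))
```

For a table of only a few kilobytes, even the one-fragment case saves little, while a scan without the key now touches 20 places instead of one. This is consistent with the hint that fragmentation used without care may reduce performance.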
(Source:
http://www.ibm.com/developerworks/data/zones/informix/library/techarticle/0206fan/0206fan.html#section2)

Discussion Questions:
1. What should be done to increase efficiency?
2. Will eliminating tables help? If so, how?
(Hint: Fragmentation is a feature of Informix Dynamic Server
version 7 and above. If used properly, it can improve overall Informix
Dynamic Server performance significantly; but if used without care, it
may adversely affect performance.)
References/E-References
References
Elmasri, R., & Navathe, S. H. (2009). Fundamentals of Database
Systems, 5th ed. New Delhi: Pearson Education Inc.
Er. Jain, V. K. (2008). Database Management Systems. New Delhi:
Dreamtech Press.
E-References
http://www.wiziq.com/tutorial/225006-Transaction-Processing-Systemin-DBMS (Retrieved on 10th November 2012)

http://media.wiley.com/product_data/excerpt/79/EHEP0003/EHEP000379.pdf (Retrieved on 12th November 2012)

http://docs.oracle.com/cd/B10501_01/server.920/a96520/concept.htm#50413 (Retrieved on 15th November 2012)

http://msdn.microsoft.com/en-us/library/orm-9780596521301-02-08.aspx (Retrieved on 15th November 2012)

http://www.agiledata.org/essays/concurrencyControl.html (Retrieved on 15th November 2012)

http://www.cs.wmich.edu/~yang/tlt/cs643/applets/twopc/locktext.html (Retrieved on 18th January 2012)

http://docs.oracle.com/cd/B10501_01/server.920/a96520/concept.htm#50413 (Retrieved on 18th January 2012)

http://www.ibm.com/developerworks/data/zones/informix/library/techarticle/0206fan/0206fan.html#section2 (Retrieved on 28th January 2012)

Unit 11

Controls

Structure:
11.1 Introduction
Objectives
11.2 Atomicity
11.3 Recovery Techniques
Deferred update
Immediate update
11.4 Security, Backup and Recovery
11.5 Summary
11.6 Terminal Questions
11.7 Answers
11.8 Case Study

11.1 Introduction
In the previous unit, you studied the importance of distributed database
management systems and their types. Distributed database systems are more
exposed to hacking, failures and similar problems, and the flow of data may
also be diverted. A computer system, like any other mechanical or electrical
device, is subject to failure. Such failures can be caused by a disk
crash, a power failure, a software error, and so on. In each of these cases,
information may be lost. Therefore, the database system is responsible for
restoring the database to the consistent state that existed just before the
failure. To make this restoration possible, the DBMS must keep
information about the changes made by the various transactions in the
system log.

Block disk cache - A block disk cache stores cached values on disk. Like an
indexed disk cache, it keeps the keys in memory; the values are stored in a
group of fixed-size blocks.

System log - The system log contains events logged by the operating system
and its components. For example, if a driver fails to load during startup, that
event is recorded in the system log. The types of events that system
components log are predetermined by the operating system.

Transaction commit - Transaction commit is responsible for making all
the data modifications permanent in the database. When a transaction
commits, the following happens:
o A commit record is written to the log to indicate that the modifications
are permanent. Depending upon the type of commit, the log information in
memory is written to disk at the same time.
o Locks are released, so the modifications become visible to other
transactions.

In this unit, you will study the different recovery techniques used in databases.
You will also study the security and backup features of a database in detail.
Objectives
After studying this unit, you should be able to:
define atomicity
identify different recovery techniques
explain security and backup features in database

11.2 Atomicity
Atomicity is the ALL-or-NONE rule for transactions: either all the operations
of a transaction take effect in the database, or none of them do. If any one
part of the transaction fails, the whole transaction fails; a transaction that
behaves this way is called an atomic transaction. Maintaining the atomic
nature of transactions is a critical characteristic of database management. An
example of an atomic transaction is ordering a plane ticket, which involves
two actions: paying for a seat and reserving it. Either the customer pays for
the seat and it is reserved, or he/she does not pay and the seat is not reserved.
For a simpler example, suppose you want to move 15 apples from basket A
to basket B, that is, subtract 15 apples from basket A and add 15 apples to
basket B. Assume that you have removed 15 apples from basket A and your
transaction is then aborted because of some error, so the apples are never
added to basket B. In this case, the whole transaction is cancelled and the
15 apples are returned to basket A.
Therefore, atomicity means indivisibility.
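The apple example can be reproduced with Python's built-in sqlite3 module. This is a minimal sketch, not part of the original material: the table name basket, the starting counts (50 and 20 apples) and the helper names are hypothetical. Using the connection as a context manager (with conn:) commits the transaction if the block finishes normally and rolls it back if an exception escapes, so a simulated failure after the first UPDATE leaves both baskets unchanged.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two baskets (hypothetical starting counts)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE basket (name TEXT PRIMARY KEY, apples INTEGER)")
    conn.executemany("INSERT INTO basket VALUES (?, ?)", [("A", 50), ("B", 20)])
    conn.commit()
    return conn


def move_apples(conn, n, fail_midway=False):
    """Move n apples from basket A to basket B as one atomic transaction.

    Returns True if the transaction committed, False if it was rolled back.
    """
    try:
        with conn:  # commit on success, roll back if an exception escapes
            conn.execute("UPDATE basket SET apples = apples - ? WHERE name = 'A'", (n,))
            if fail_midway:
                # Simulated error after A has been debited but before B is credited.
                raise RuntimeError("transaction aborted")
            conn.execute("UPDATE basket SET apples = apples + ? WHERE name = 'B'", (n,))
        return True
    except RuntimeError:
        return False


def counts(conn):
    """Return the current number of apples in each basket."""
    return dict(conn.execute("SELECT name, apples FROM basket ORDER BY name"))


if __name__ == "__main__":
    conn = make_db()
    print(move_apples(conn, 15, fail_midway=True), counts(conn))
    print(move_apples(conn, 15), counts(conn))
```

The failed call leaves basket A at its original count because the subtraction is rolled back together with the rest of the transaction; only the second, successful call changes both baskets.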

Self-Assessment Questions
1. Atomicity means ________________________.
2. ____________________rule says that if any one part of the transaction
fails, the whole transaction fails.
3. _______________________ is responsible for making all the data
modifications permanent in the database.

11.3 Recovery Techniques


Recovery is the process of bringing a database back to a consistent state
after a failure. Suppose an image backup of the database was taken at
7.00 A.M. and a system failure occurred at 2.00 P.M., leaving the database
inconsistent. Say the system was used by 12 people between 7.00 A.M. and
2.00 P.M. Everything these people did to the database was written to the
redo logs. The recovery manager must redo these changes to bring the
database back to its state at the time of the failure.
By using archived redo logs (log files that contain all the changes made to
the database), we can recover the database completely, without losing any
committed data, up to the time of failure. The recovery manager then uses
rollback segments to undo any uncommitted transactions that were recorded
in the redo logs. Thus, the backup together with the redo logs records every
change made to the database since the backup was taken.
The DBA is responsible for bringing the database back into operation
as quickly as possible and with little or no data loss. Conceptually, we can
distinguish two main techniques for recovery from non-catastrophic failures:
(1) deferred update and (2) immediate update.
11.3.1 Deferred update
The deferred update technique defers, or postpones, any actual updates to
the database until the transaction completes its execution successfully and
reaches its commit point. During transaction execution, the updates are
recorded only in the log and in the cache buffers. After the transaction
reaches its commit point and the log is force-written to disk, the updates are
recorded in the database. If a transaction fails before it reaches its commit
point, there is no need to undo any operations, because its changes have not
been applied to the database; the information in the log is simply ignored.
REDO is needed if the system fails after a transaction commits but before
all its changes have been
recorded in the database on disk. After a failure has occurred, the
recovery subsystem consults the log to determine which transactions need
to be redone: a transaction Ti needs to be redone if and only if the log
contains both the record <Ti START> and the record <Ti COMMIT>. Thus,
the information in the log is used to restore the system to a consistent
state.
Hence, deferred update is also known as the NO-UNDO/REDO algorithm.
For example, consider a transaction T0 that transfers Rs. 50 from account A
to account B. It is defined as follows:

T0: read(A);
    A := A - 50;
    write(A);
    read(B);
    B := B + 50;
    write(B).

Let T1 be a transaction that withdraws Rs. 100 from account C, defined as
follows:

T1: read(C);
    C := C - 100;
    write(C).

Before execution, the values of A, B and C were Rs. 1,000, Rs. 2,000 and
Rs. 700, respectively. If T0 and T1 run one after the other and both commit,
the log and the database evolve as follows (under deferred update, the
database is written only after the transaction's commit record is in the log):

Log                  Database
<T0 START>
<T0, A, 950>
<T0, B, 2050>
<T0 COMMIT>
                     A = 950
                     B = 2050
<T1 START>
<T1, C, 600>
<T1 COMMIT>
                     C = 600

Now, assume that a system crash occurs just after the step write(C) of
transaction T1. The log at the time of the crash is:
<T0 START>
<T0, A, 950>
<T0, B, 2050>
<T0 COMMIT>
<T1 START>
<T1, C, 600>

When the system comes back up, the operation redo(T0) is performed,
since the record <T0 COMMIT> appears in the log on disk. After this
operation is executed, the values of accounts A and B are Rs. 950 and
Rs. 2,050. The value of account C remains Rs. 700. Because T1 is
incomplete, its log records are ignored and can be deleted from the log.
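The NO-UNDO/REDO recovery rule can be sketched in Python. This is an illustrative sketch only: the log is modelled as a list of tuples, ("start", T), ("write", T, item, new_value) and ("commit", T), which is an assumed encoding of the <T0, A, 950> style records above, and the database is a plain dictionary. The sample values are those of the T0/T1 example, with the assumption that none of T0's writes had reached disk before the crash.

```python
def deferred_update_recover(log, db):
    """NO-UNDO/REDO recovery for the deferred update technique.

    log: list of records ("start", T), ("write", T, item, new_value), ("commit", T)
    db:  dict mapping data items to the values found on disk after the crash
    Returns the recovered database and the sorted list of redone transactions.
    """
    started = {rec[1] for rec in log if rec[0] == "start"}
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # Ti is redone if and only if the log holds both <Ti START> and <Ti COMMIT>.
    redo_set = started & committed
    recovered = dict(db)
    # Redo: reapply the new values of committed transactions in log order.
    for rec in log:
        if rec[0] == "write" and rec[1] in redo_set:
            _, _, item, new_value = rec
            recovered[item] = new_value
    # No undo: uncommitted transactions never wrote to the database,
    # so their log records are simply ignored.
    return recovered, sorted(redo_set)


crash_log = [
    ("start", "T0"),
    ("write", "T0", "A", 950),
    ("write", "T0", "B", 2050),
    ("commit", "T0"),
    ("start", "T1"),
    ("write", "T1", "C", 600),  # crash happens before <T1 COMMIT>
]
disk = {"A": 1000, "B": 2000, "C": 700}  # assume none of T0's writes reached disk
print(deferred_update_recover(crash_log, disk))
# ({'A': 950, 'B': 2050, 'C': 700}, ['T0'])
```

Because redo only writes new values, running it twice gives the same result, so it is safe even if some of T0's writes had already reached disk before the crash.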
11.3.2 Immediate update
This method updates the database without waiting for the commit point;
that is, when a transaction issues an update command, the database can be
updated immediately. Each update operation must be recorded in the log
(on disk) before it is applied to the database. If a transaction fails after
recording some changes in the database, but before reaching its commit
point, the effect of the transaction on the database must be undone (rolled
back). During recovery, we have to redo the operations of committed
transactions and undo (roll back) the effects of uncommitted transactions.
So in the immediate update technique, both undo and redo operations are
required during recovery. Hence, the immediate update technique is known
as the UNDO/REDO algorithm.
Now we shall take up the algorithm for the UNDO/REDO scheme.
Step 1: Redo all transactions for which the log has both start and
commit entries.
Step 2: Undo all transactions for which the log has a start entry but no
commit entry.
Undo(Ti) restores the values of all data items updated by transaction
Ti to their old values.
Redo(Ti) sets the values of all data items updated by transaction Ti
to their new values.
After a failure has occurred, the recovery scheme consults the log to
determine which transactions need to be undone and which need to be
redone. Transactions are classified as follows:
Transaction Ti needs to be undone if the log contains the record
<Ti START> but does not contain the record <Ti COMMIT>.
Transaction Ti needs to be redone if the log contains both the record
<Ti START> and the record <Ti COMMIT>. The changes made by such
transactions are written back to the database.

Here is an example of immediate update, in which the actual updates take
place in both the log and the database as T0 and T1 execute. Each update
record has the form <Ti, X, old value, new value>.
Log                          Database
<T0 START>
<T0, A, 1000, 950>
<T0, B, 2000, 2050>
                             A = 950
                             B = 2,050
<T0 COMMIT>
<T1 START>
<T1, C, 700, 600>
                             C = 600
<T1 COMMIT>
Here, assume that the crash occurs just after the record <T1, C, 700, 600>
is written but before <T1 COMMIT> (that is, before T1 commits).
When the system comes back up, two recovery actions have to be taken:
The operation undo(T1) must be performed, because only <T1 START>
exists in the log; there is no <T1 COMMIT> record.
The operation redo(T0) must be performed, because the log contains
both the records <T0 START> and <T0 COMMIT>.
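The UNDO/REDO procedure can be sketched the same way. Again this is a hypothetical Python model, not a real DBMS interface: update records are assumed to be ("write", T, item, old_value, new_value), matching <Ti, X, old value, new value>, and the on-disk state is an assumed mix in which T0's write of B had not yet reached disk while T1's write of C had. The steps follow the order given above (redo, then undo); this assumes strict locking, so that no committed transaction overwrote an item that an uncommitted transaction was still updating. Undo scans the log backwards so that the oldest before-image of an item is the one left in place.

```python
def undo_redo_recover(log, db):
    """UNDO/REDO recovery for the immediate update technique.

    log: list of records ("start", T), ("write", T, item, old_value, new_value),
         ("commit", T)
    db:  dict mapping data items to the values found on disk after the crash
    Returns the recovered database, the redone and the undone transactions.
    """
    started = {rec[1] for rec in log if rec[0] == "start"}
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    redo_set = started & committed   # <Ti START> and <Ti COMMIT> both present
    undo_set = started - committed   # <Ti START> present, <Ti COMMIT> missing
    recovered = dict(db)
    # Step 1: redo committed transactions, scanning the log forwards.
    for rec in log:
        if rec[0] == "write" and rec[1] in redo_set:
            recovered[rec[2]] = rec[4]  # new value
    # Step 2: undo uncommitted transactions, scanning the log backwards.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] in undo_set:
            recovered[rec[2]] = rec[3]  # old value
    return recovered, sorted(redo_set), sorted(undo_set)


crash_log = [
    ("start", "T0"),
    ("write", "T0", "A", 1000, 950),
    ("write", "T0", "B", 2000, 2050),
    ("commit", "T0"),
    ("start", "T1"),
    ("write", "T1", "C", 700, 600),  # crash happens before <T1 COMMIT>
]
disk = {"A": 950, "B": 2000, "C": 600}  # assumed partially flushed state
print(undo_redo_recover(crash_log, disk))
# ({'A': 950, 'B': 2050, 'C': 700}, ['T0'], ['T1'])
```

Unlike the deferred update sketch, each log record here must carry the old value as well as the new one, because an uncommitted update may already be on disk and has to be reversed.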
Self-Assessment Questions
4. State whether the following statements are True or False:
a) The DBA should be responsible for bringing the database back to
operation as quickly as possible and with little or no data loss.
b) After a failure has occurred in the transaction, the recovery scheme
consults the log to check whether the transaction has been
completed or not.
5. The two different types of recovery techniques are ________________
and __________________.
6. Deferred update is also known as _________________ algorithm.
7. In ___________________________, both undo and redo operations
are required during recovery.

11.4 Security, Backup and Recovery


In this section, you will study how to secure databases against various
threats. We introduce the main security issues and the various threats to
databases, and discuss how such threats can be handled using control
measures. After going through this section, you should be able to
understand basic database security techniques.
An efficient backup and security system is essential for any enterprise.
A good DBMS must provide aids to recover from hardware and software failures.
The backup and recovery subsystems are together responsible for recovery.
Security is a broad area which has to be addressed in many ways. Some of
the issues are listed below:

Legal and ethical issues - These concern the right to access certain
information. Some information must not be accessible to unauthorised
users, and accessing it may be illegal or unethical. Access to such
information is governed by numerous laws.

Policy issues - There may be policy issues at the institutional,
governmental or corporate level. Some kinds of information, such as
inpatient medical records or the bank statement of an account holder, are
not supposed to be publicly available.

System-related issues - Security has to be enforced at different levels of
the system.

There may be many reasons for the failure of a transaction: a system crash,
errors in local systems, transmission errors, catastrophes or concurrency
control enforcement. The main purpose of the recovery process is to
recover the data after a failure without losing any part of it. Recovery can
be done by the DBMS automatically or by users restoring from backup copies.
A DBMS has metadata, a data manipulation language and other constructs
to meet the responsibilities of the recovery system. The components of a
DBMS include a data definition language, query optimisation algorithms,
performance monitoring functions and recovery and concurrency
mechanisms. An important job of a DBMS is to respond to user requests at
the right time and to the right person.

Self-Assessment Questions
8. ______________________ issue in security is regarding the right to
access certain information.
9. State whether the following statements are True or False:
a) There are some kinds of information which are not supposed to be
publicly available like inpatient medical records or bank statement of
an account holder. Such issues are dealt with as legal and ethical
issues.
b) Security has to be enforced in different levels of the system. Such
issues are handled as system-related issues.

11.5 Summary
Let us recapitulate the important concepts discussed in this unit:
Atomicity is the ALL-or-NONE rule for transactions: either all of a
transaction's operations take effect, or none do.
Recovery restores the database to a consistent state after a failure,
starting from an image backup and applying the changes recorded in the logs.
Deferred update defers or postpones any actual updates to the
database until the transaction completes its execution successfully.
Immediate update updates the database without waiting to reach the
commit point; that is, when a transaction issues an update command, the
database can be updated immediately.
Security is a broad area which has to be addressed in many ways,
including legal and ethical issues, policy issues and system-related issues.

11.6 Terminal Questions


1. What is atomicity? Explain with an example.
2. Explain the different techniques of recovery.
3. Describe the issues related to security.

11.7 Answers
Self-Assessment Questions
1. Indivisibility
2. Atomic transaction
3. Transaction commit
4. Answers:
a) True
b) False
5. Deferred update and immediate update
6. No-undo/redo algorithm
7. Immediate update technique
8. Legal and ethical
9. Answers:
a) False
b) True
Terminal Questions
1. (Refer to Section 11.2 for further information.)
2. (Refer to Section 11.3 for further information.)
3. (Refer to Section 11.4 for further information.)

11.8 Case Study


Read the following news snippets:
On 20 November 1985, the Bank of New York lost over $5 million as a
result of an error in the software of the digital system that registered all
the bank's financial transactions.
In 1992, a software problem created total chaos in the communication
system of the ambulance services in London. The delay in communications
caused the death of 30 people.
On 7 August 1996, the computer system of Internet provider America
Online (AOL) failed for 19 hours after new software had been
installed. Over 16 million subscribers were affected. Before this took
place, the AOL experts had strongly suggested that the system was
immune to this kind of disaster.
Discussion Questions:
1. Which of the recovery techniques discussed in this unit is relevant to
the issues given above?
2. Is it justified to say that digital systems are unreliable and carry
enormous risks?

3. What countermeasures should be put in place to minimise damage due
to failure of digital systems? Give your answer for each of the above
three situations.

Unit 12

Distributed Databases

Structure:
12.1 Introduction
Objectives
12.2 Overview of Distributed Database (DDB) System
Client-server model
12.3 Features of DDB
12.4 Advantages and Disadvantages of DDB
12.5 Data Replication
12.6 Data Fragmentation
12.7 Summary
12.8 Glossary
12.9 Terminal Questions
12.10 Answers
12.11 Case Study

12.1 Introduction
In the 1980s, Distributed Database (DDB) systems evolved to
overcome the limitations of centralised database management systems and
to cope with rapid changes in communication and database
technologies. This unit introduces the fundamentals of distributed database
systems. The benefits and limitations of distributed DBMSs over centralised
DBMSs are briefly discussed. The objectives of a distributed system, the
components of a distributed system and the functionality provided by a
distributed system are also described in this unit.
In this unit, we will study the fundamentals of distributed databases and the
features of distributed DBMSs. The pros and cons of distributed DBMSs
are discussed with an example of a distributed database system. The
classification of distributed DBMSs is explained, and the functions of a
distributed DBMS are introduced. We will also illustrate the components of a
distributed database system and discuss Date's 12 objectives for
distributed database systems.
Objectives:
After studying this unit, you should be able to:
describe DDB system
list the advantages of DDB

describe data replication

elucidate data fragmentation

12.2 Overview of Distributed Database (DDB) System


In a centralised database system, all system components such as data,
DBMS software and storage devices reside at a single computer or site,
whereas in a distributed database system, data is spread over multiple
computers connected by a network.
A Distributed Database (DDB) is thus a set of databases stored on multiple
computers that appears to a user as a single database (Figure 12.1). The
data on several computers (in local and remote databases) can be accessed
and modified simultaneously over a network. Each database server in the
DDB is controlled by its local DBMS, and the servers co-operate to
maintain the consistency of the global database.
As a general goal, distributed computing systems divide a big,
unmanageable problem into smaller pieces and solve it efficiently in a
coordinated manner.

Fig. 12.1: Data Distribution and Replication among Distributed Database

Functions of distributed databases


Basic functions performed by a DDBMS in addition to those of a centralised
DBMS are as follows:

Distributed query processing - Distributed query processing means
the ability to access remote sites and transmit queries and data among
the various sites via the communication network.

Data tracing - A DDBMS should be able to keep track of the data
distribution, fragmentation and replication by maintaining a DDBMS
catalogue.

Distributed transaction management - This is the ability to manage
transactions that access data at more than one site: it synchronises
access to the distributed data and maintains the integrity of the overall
database.

Distributed database recovery - This is the ability to recover from
individual site crashes and from new types of failures, such as the failure
of communication links.

Security - Distributed transactions must be executed with proper
management of the security of the data and the authorisation/access
privileges of the users.

Distributed directory (catalogue) management - A directory contains
information (metadata) about the data in the database. The directory may
be global for the entire DDB, or local for each site. The placement and
distribution of the directory are based on design and policy issues.

These functions increase the complexity of a DDBMS over a centralised
DBMS.

Functions of centralised database
Basic functions performed by a centralised database are as follows:

It provides a complete view of your data. For example, you can query for
the number of customers worldwide or the worldwide inventory level of a
product.

It is easier to manage a centralised database than several distributed
databases.

Differences between distributed database and centralised database

Distributed database | Centralised database
It is a collection of logically related databases that are connected to each other through a network. | It has all the data at one place.
Data availability is efficient. | Data availability is not efficient.
Since the data is stored in multiple computers, the failure of one computer will not lose the data; it is available in another location. | The failure of the central database will lead to loss of the whole database.
The data is available closer to the user, so access is timely and convenient. | Due to the complexity in the design, it requires significant procedures, very experienced DBAs and systems people.
Addition of new nodes in the network is easy. | It does not readily support the addition of new nodes.

12.2.1 Client-server model

The client-server model is basic to distributed systems; it allows clients to
make requests that are routed to the appropriate server in the form of
transactions. The client-server model consists of three parts.
1. Client - The client is the machine (workstation or PC) running the
front-end applications. It interacts with a user through the keyboard, display
and mouse. The client has no direct data access responsibilities. The
client machine provides front-end application software for accessing the
data on the server. The clients initiate transactions and the server
processes the transactions.
Interaction between client and server might proceed as follows while
processing an SQL query:
a) The client parses a user query and decomposes it into a number of
independent site queries. Each site query is sent to the appropriate
server site.
b) Each server processes the local query and sends the resulting
relation to the client site.
c) The client site combines the results of the queries to produce the
result of the originally submitted query.
Thus, the server is called the database processor or back-end machine,
whereas the client is called the application processor or front-end machine.
Another function controlled by the client is ensuring the consistency
of replicated copies of a data item by using distributed concurrency
control techniques. The client must also ensure the atomicity of global
transactions by performing global recovery when certain sites fail. It
provides distribution transparency, which means the client hides the
details of data distribution from the user.
2. Server - The server is the machine that runs the DBMS software. It is
referred to as the back end. The server processes SQL and other query
statements received from client applications. It can have large disk
capacity and fast processors.
3. Network - The network enables remote data access through client-server
and server-to-server communication.
Each computer in a network is a node, and it acts as a client, a server,
or both, depending on the situation.
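The three-step interaction (a) to (c) described above can be sketched with Python's sqlite3 module, using one in-memory database per server site. This is a simplified illustration under assumed names: each hypothetical server holds one horizontal fragment of an employee table, so decomposing the user query amounts to sending the same site query to every server. A real DDBMS would consult its catalogue to decide which sites are relevant and might send each site a different subquery.

```python
import sqlite3


def make_server(rows):
    """Hypothetical server site holding one fragment of an employee table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (ssn TEXT, name TEXT, salary INTEGER, dno INTEGER)")
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def server_process(conn, sql, params):
    """(b) A server runs its local site query and returns the resulting relation."""
    return conn.execute(sql, params).fetchall()


def client_query(servers, min_salary):
    """(a) Decompose the user query into site queries, then (c) combine the results."""
    site_sql = "SELECT ssn, name, salary FROM employee WHERE salary > ?"
    partial_results = [server_process(s, site_sql, (min_salary,)) for s in servers]
    # (c) The union of the partial results answers the original query.
    return sorted(row for part in partial_results for row in part)


if __name__ == "__main__":
    site_1 = make_server([("101", "Asha", 50000, 10), ("102", "Ravi", 30000, 10)])
    site_2 = make_server([("201", "Meena", 70000, 20)])
    print(client_query([site_1, site_2], 40000))
```

The client never reads the servers' tables directly; it only sends queries and merges the relations that come back, which is the division of labour between front end and back end described in the text.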
Advantages

Client applications are not dependent on the physical location of the data.
If the data is moved or distributed to other database servers, the
application continues to function with little or no modification.

Servers provide multitasking and shared-memory facilities; as a result,
they can deliver the highest possible degree of concurrency and data
integrity.

In a networked environment, shared data is stored on the servers rather
than on all computers in the system. This makes it easier and more
efficient to manage concurrent access. Inexpensive, low-end client
workstations can access the remote data on the server effectively.

Self-Assessment Questions
1. ____________________ is a set of databases stored on multiple
computers but it appears to a user as a single database.
2. Which of the following functions of distributed databases has the ability
to keep track of the data distribution, fragmentation and replication by
maintaining the DDBMS catalogue?
a. Distributed query processing
b. Data tracing
c. Distributed database recovery
d. Security
3. The difference between distributed databases and centralized
databases in terms of data availability is that in a centralized database,
data availability is __________________.
4. The client machine provides front-end application software for
accessing the data on the server. True/false?
5. The server enables remote data access through client-server and
server-to-server communication. True/false?

12.3 Features of DDB


A Distributed Database Management System (DDBMS) is designed to serve a
large number of users. It must support at least one global application,
along with many local applications. Considering the global applications,
its features are as follows:

A DDB is a collection of logically related shared data.

Data is fragmented in a DDB.

Fragmented data can be replicated.

These replicas are located at different sites.

The sites are linked by a communication network.

The data at each site is controlled by a DBMS.

The DBMS at each local site can handle its data independently.

12.4 Advantages and Disadvantages of DDB


DDBs have certain advantages and disadvantages.
Advantages

Increased reliability and availability - Reliability is broadly defined as
the probability that a system is running at a certain time point, whereas
availability is the probability that the system is continuously available
during a time interval. When the data and DBMS software are distributed
over several sites, one site may fail while other sites continue to operate.
Only the data and software that exist at the failed site cannot be
accessed. In a centralised system, failure at a single site makes the
whole system unavailable to all users.

Improved performance - A large database is divided into smaller
databases, keeping the necessary data where it is needed most. Data
localisation reduces the contention for CPU and I/O services, and
simultaneously reduces the access delays involved in wide area networks.
When a large database is distributed over multiple sites, smaller
databases exist at each site. As a result, local queries and transactions
accessing data at a single site perform better because of the
smaller local databases. To enable parallel processing, a single
large transaction can be divided into a number of smaller transactions
that execute at different sites.

Data sharing - Data can be accessed by users at other remote sites
through the DDBMS software.

Transparency - Ideally, a DDB should be distribution-transparent in the
sense of hiding the details of where each file is physically stored within
the system. It provides network transparency; that is, the command
used to perform a task is independent of the location of the data and the
location of the system where the command was issued.

Easier expansion - In a distributed environment, expansion of the
system in terms of adding more data, increasing database size or
adding more processors is easier.

All the above advantages can be summarised briefly as follows:

Data is located near the site that has the greatest demand for it.

Access to data is faster.

Data processing is faster.

Communication is improved.

Operating costs are reduced.

It has a user-friendly interface.

Because many sites are connected to each other, there is less chance of
failure due to a single point.

It is independent of the processor.

Disadvantages

Managing and controlling the system becomes complex.

Security is an issue because of the various access rights.

There is a lack of standards.

Storage requirements are tremendous, as a huge amount of data is involved.

The environment is more difficult to manage.

Training costs are increased.

Types of DDB systems

In a DDB, software is distributed over multiple sites connected by a network.
DDB systems are categorised as follows:
The first factor is the degree of homogeneity of the DDBMS software. If all
servers (or individual local DBMSs) use identical software and all users use
identical software, the DDBMS is called homogeneous; otherwise, it is
called heterogeneous. At the other extreme is the federated DDBMS or
multidatabase system. In such a system, each server has an independent
DBMS with its own local users, local programmers and DBA. In a
heterogeneous FDBS (federated database system), one server may be an
RDBMS, another a network DBMS, a third a hierarchical DBMS, and so on.
In such a case, it is necessary to have a canonical system language and
language translators to translate from the canonical language to the
language of each server.
Self-Assessment Questions
6. ________________ is broadly defined as the probability that a system
is running at a certain time point.
7. In a distributed environment, expansion of the system in terms of
adding more data, increasing database size or adding more
processors is much easier. This is called transparency. True/false?
8. What is the first factor used to classify DDBMS software?

12.5 Data Replication


Replication is useful in improving the availability of data. Replicating the
whole database at every site in the distributed system gives a fully
replicated database. This can improve availability because the system can
continue to operate as long as at least one site is up. It also improves the
performance of retrieval for global queries, because the result of such a
query can be obtained locally from any one site. The disadvantage is that it
can slow down update operations, since an update must be performed on
every copy of the database to keep the copies consistent. Full replication
also makes concurrency control and recovery techniques more expensive.

At the other extreme from full replication is no replication: each fragment
is stored at exactly one site. Between these lies partial replication, in
which some fragments of the database are replicated and others are not. For
example, some users carry partially replicated databases with them on
laptops.
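The trade-off between full, partial and no replication can be illustrated
with a small simulation. This is a simplified model, not part of any DDBMS:
the site names, the fragment names and the allocation maps are hypothetical,
a read is assumed to succeed if any site holding a copy is up, and the cost
of an update is counted simply as the number of copies that must be written.

```python
# Hypothetical sites and allocation schemes: fragment name -> sites holding a copy.
SITES = {"S1", "S2", "S3"}

FULL_REPLICATION = {"EMP_D10": set(SITES), "EMP_D20": set(SITES), "EMP_D30": set(SITES)}
PARTIAL_REPLICATION = {"EMP_D10": {"S1", "S2"}, "EMP_D20": {"S2"}, "EMP_D30": {"S3"}}
NO_REPLICATION = {"EMP_D10": {"S1"}, "EMP_D20": {"S2"}, "EMP_D30": {"S3"}}


def readable_fragments(allocation, up_sites):
    """Fragments that can still be read: at least one copy is on a site that is up."""
    return {frag for frag, sites in allocation.items() if sites & up_sites}


def update_cost(allocation, fragment):
    """Number of copies that must be written to keep every replica consistent."""
    return len(allocation[fragment])


if __name__ == "__main__":
    up = {"S1"}  # assume sites S2 and S3 have failed
    for name, alloc in [("full", FULL_REPLICATION),
                        ("partial", PARTIAL_REPLICATION),
                        ("none", NO_REPLICATION)]:
        print(name, sorted(readable_fragments(alloc, up)), update_cost(alloc, "EMP_D10"))
```

With only S1 up, the fully replicated scheme can still serve all three
fragments, whereas the other two schemes can serve only EMP_D10. In return,
each update to EMP_D10 costs three writes under full replication, two under
the partial scheme and one with no replication.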

12.6 Data Fragmentation


Fragmentation is the technique of breaking up the database into logical
units, called fragments, which may be assigned for storage at various sites.
In a DDBMS, decisions must be made regarding which site should store which
portions of the database. There are three types of fragmentation:

Horizontal fragmentation - A horizontal fragmentation divides a relation
horizontally by grouping rows to create subsets of tuples, where each subset
has a certain logical meaning. These fragments can then be assigned to
different sites in the distributed system. For example, we may divide the
EMPLOYEE relation into three horizontal fragments with the conditions
(DNO = 10), (DNO = 20) and (DNO = 30); each fragment contains the EMPLOYEE
tuples of the employees working for a particular department.

Vertical fragmentation - A vertical fragment keeps only certain attributes
of the relation; that is, it divides a relation vertically by columns. For
example, we may fragment the EMPLOYEE relation into two vertical fragments:
the first includes personal information (Name, Bdate, Address), and the
second includes work-related information (SSN, Salary, Mgr_no and so on).
Each vertical fragment should also contain the primary key (here SSN), so
that the original relation can be reconstructed by joining the fragments.

Mixed fragmentation - A combination of horizontal and vertical
fragmentation is called mixed fragmentation.

Allocation - Each copy of a fragment must be assigned to a particular site
in the distributed system. This process is called data distribution or
allocation.
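The three kinds of fragmentation, and how the original relation is rebuilt
from its fragments, can be sketched in Python. This is an illustrative
in-memory model, not how a DDBMS stores data: the EMPLOYEE rows are
invented, and the helper function names are hypothetical. Horizontal
fragments are recombined by union and vertical fragments by a join on the
primary key SSN, which is why the key is kept in every vertical fragment.

```python
# Hypothetical EMPLOYEE rows; every row is a dict keyed by attribute name.
EMPLOYEE = [
    {"SSN": "111", "Name": "Asha", "Bdate": "1980-02-11", "Address": "Pune",
     "Salary": 50000, "Mgr_no": "999", "DNO": 10},
    {"SSN": "222", "Name": "Ravi", "Bdate": "1978-07-30", "Address": "Delhi",
     "Salary": 62000, "Mgr_no": "999", "DNO": 20},
    {"SSN": "333", "Name": "Meena", "Bdate": "1985-12-01", "Address": "Goa",
     "Salary": 48000, "Mgr_no": "888", "DNO": 30},
    {"SSN": "444", "Name": "Kiran", "Bdate": "1990-05-19", "Address": "Pune",
     "Salary": 39000, "Mgr_no": "888", "DNO": 10},
]


def horizontal_fragment(rows, predicate):
    """Selection: keep the whole rows that satisfy the fragment's condition."""
    return [row for row in rows if predicate(row)]


def vertical_fragment(rows, attributes, key="SSN"):
    """Projection: keep only some columns, always including the primary key."""
    columns = [key] + [a for a in attributes if a != key]
    return [{a: row[a] for a in columns} for row in rows]


def rebuild_from_horizontal(fragments):
    """Horizontal fragments are recombined by UNION of their rows."""
    return [row for fragment in fragments for row in fragment]


def rebuild_from_vertical(left, right, key="SSN"):
    """Vertical fragments are recombined by joining them on the primary key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left if row[key] in right_by_key]


# Horizontal fragmentation: one fragment per department (DNO = 10, 20, 30).
h_fragments = [horizontal_fragment(EMPLOYEE, lambda r, d=d: r["DNO"] == d)
               for d in (10, 20, 30)]

# Vertical fragmentation: personal data versus work-related data.
personal = vertical_fragment(EMPLOYEE, ["Name", "Bdate", "Address"])
work = vertical_fragment(EMPLOYEE, ["Salary", "Mgr_no", "DNO"])

# Mixed fragmentation: a horizontal fragment of the vertical fragment "work".
work_dept10 = horizontal_fragment(work, lambda r: r["DNO"] == 10)
```

Each fragment could then be allocated to a different site; the rebuild
functions show what a query over the whole relation would have to do when
the fragments are spread out.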
Self-Assessment Questions
9. _________________ is useful in improving the availability of data.
10. What are the three different types of data fragmentation?

12.7 Summary
Let us recapitulate the important concepts discussed in this unit:

A distributed database (DDB) is a set of databases stored on multiple
computers that appears to the user as a single database.

In addition to the functions of a centralised DBMS, a DDBMS performs
distributed query processing, data tracing, distributed transaction
management, distributed database recovery, security and distributed
directory (catalogue) management.

Replication is useful in improving the availability of data. Replicating
the whole database at every site in the distributed system gives a fully
replicated database, which improves availability because the system can
continue to operate as long as at least one site is up.

Fragmentation is the technique of breaking up the database into logical
units, called fragments, which may be assigned for storage at the various
sites.

The three types of fragmentation are horizontal fragmentation, vertical
fragmentation and mixed fragmentation.

12.8 Glossary
Server: A server is a system (software and suitable computer hardware)
that responds to requests across a computer network to provide, or help to
provide, a network service.
DMS software: A document management system (DMS) is a computer
system (or set of computer programs) used to track and store electronic
documents.
Network: A network is a group of two or more computer systems linked
together.
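Fragments: In a distributed database, fragments are the logical units
(subsets of the rows, the columns, or both, of a relation) into which the
database is broken up for storage at different sites.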
Website: A website is a set of related web pages served from a single web
domain.
Communication: Communication is the exchange and flow of information
and ideas from one person to another; it involves a sender transmitting an
idea, information, or feeling to a receiver.

RDBMS: RDBMS stands for Relational Database Management System. In an
RDBMS, data is structured in database tables, fields and records: each
table consists of rows, and each row consists of one or more fields.
Heterogeneous: A DDBMS is heterogeneous when its servers or users do not
all use identical DBMS software.
Homogeneous: A DDBMS is homogeneous when all its servers and all its users
use identical DBMS software.

12.9 Terminal Questions


1. List the various functions of a distributed database system.
2. Identify the advantages of client applications.
3. Explain the three different types of data fragmentation.

12.10 Answers
Self-Assessment Questions
1. Distributed Database (DDB)
2. Data tracing
3. Not efficient.
4. true
5. false
6. Reliability
7. false
8. degree of homogeneity
9. Replication
10. Horizontal fragmentation, vertical fragmentation and mixed
fragmentation

Terminal Questions
1. Basic functions performed by a DDBMS are distributed query
processing, data tracing, distributed transaction management,
distributed database recovery, security and distributed directory
(catalogue) management. (Refer to Section 12.2, Overview of
distributed database systems.)
2. Client applications are not dependent on the physical location of the
data. If the data is moved or distributed to other database servers, the
application continues to function with little or no modification. The
client/server architecture provides multitasking and shared memory
facilities; as a result, it can deliver a high degree of concurrency and
data integrity. In a networked environment, shared data is stored on the
servers rather than on all computers in the system, which makes it easier
and more efficient to manage concurrent access. Inexpensive, low-end client
workstations can access the remote data of the server effectively. (Refer
to Section 12.2.)
3. There are three types of fragmentation: horizontal fragmentation,
vertical fragmentation and mixed fragmentation. (Refer to Section 12.6,
Data Fragmentation.)
References/E-References:
Reference:

Ray, C. (2009). Distributed Database System. India: Pearson Education.

E-Reference:

http://my.safaribooksonline.com/book/databases/9788131727188/distributed-databaseconcepts/ch03lev1sec2#X2ludGVybmFsX0h0bWxWaWV3P3htbGlkPTk3ODgxMzE3MjcxODglMkZjaDAzJnF1ZXJ5PQ==

http://www.krystaldms.in/ (retrieved in March 2012)

http://www.webopedia.com/TERM/N/network.html (retrieved on 15 May 2014)

http://www.databasedir.com/what-is-rdbms/ (retrieved on 15 May 2014)

http://www.nwlink.com/~donclark/leader/leadcom.html#sthash.VNrbMjkT.dpuf (retrieved on 15 May 2014)

http://developer.android.com/guide/components/fragments.html (retrieved on 13 May 2014)

Unit 13

Object-Relational Databases

Structure:
13.1 Introduction
Objectives
13.2 Basics of Object-Oriented Design (OOD)
Characteristics of OOD
Advantages of OOD
Object-oriented development
Object and object classes
13.3 Object-Oriented Data Model
Object identity
Complex objects
Persistence
Type and class hierarchies
Inheritance
13.4 Object-Oriented Databases
History of databases
How do ODBMSs work?
Implementation issues
Relationships
Advantages
Limitations
13.5 Object Relational Database Management System (ORDBMS)
Performance constraints
ORDBMS benefits
13.6 Summary
13.7 Terminal Questions
13.8 Answers

13.1 Introduction
In the previous unit, you studied distributed databases. This unit introduces
you to the basic concepts of object-oriented databases (OODBs). Its purpose
is to help you decide whether you should investigate such products further,
and to understand how they work. This unit explains the approaches to
object-oriented databases. The object-oriented approach offers the
flexibility to handle the requirements of complex applications without
being limited by the data types and query languages available in
traditional database systems. You will also study object-oriented modelling
and design. The key feature of object-oriented databases is the power they
give the designer to specify both the structure of complex objects and the
operations that can be applied to these objects.
Objectives
After studying this unit, you should be able to:

explain the advantages of object-oriented design

describe the object-oriented data model

elucidate how ODBMSs work

identify the constraints of ORDBMS

13.2 Basics of Object-Oriented Design (OOD)


Object-Oriented Design (OOD) represents a software design as a set of
interacting objects that manage their own state and behaviour.
13.2.1 Characteristics of OOD
The following are the characteristics of OOD:

Objects are abstractions of real-world entities and manage themselves.

Objects are independent and encapsulate state and representation information.

System functionality is expressed in terms of object services.

Shared data areas are eliminated. Objects communicate by message passing.

Objects may be distributed and may execute sequentially or in parallel.

13.2.2 Advantages of OOD

Easier maintenance: objects may be understood as stand-alone entities.

Objects are appropriate reusable components.

For some systems, there may be an obvious mapping from real-world entities
to system objects.

13.2.3 Object-oriented development


Object-oriented development involves three stages, which are distinct but
closely related:
Analysis
Design
Programming
Object-oriented analysis involves developing an object model of the
application domain.
Object-oriented design deals with developing an object-oriented system
model that implements the requirements.
Object-oriented programming is concerned with realising the design using an
object-oriented programming language such as C++ or Java.
13.2.4 Object and object classes
Objects are entities that represent instances of real-world and system
entities. For example, in a classroom, the teacher, students, blackboard,
benches and tables are all objects. Object classes are templates for
objects: objects are created according to an object class definition, which
includes declarations of all the attributes and services that should be
associated with an object of that class. For example, the attributes
student name, roll number, class, marks obtained and rank can be grouped
under the class STUDENT.
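As a hedged illustration, the STUDENT class might be written in Python as
follows. The class and function names, the subject-wise marks dictionary
and the ranking rule are assumptions made for this example; the point is
that the class declares the attributes and services that every STUDENT
object created from it will have.

```python
class Student:
    """Object class STUDENT: a template from which student objects are created."""

    def __init__(self, name, roll_number, class_name, marks_obtained):
        # Attributes declared by the class definition (the state of each object).
        self.name = name
        self.roll_number = roll_number
        self.class_name = class_name          # "class" is a reserved word in Python
        self.marks_obtained = marks_obtained  # assumed: {subject: marks}
        self.rank = None                      # filled in later by assign_ranks()

    def total_marks(self):
        """A service (method) offered by every STUDENT object."""
        return sum(self.marks_obtained.values())


def assign_ranks(students):
    """Hypothetical ranking rule: highest total first; ties are not treated specially."""
    ordered = sorted(students, key=lambda s: s.total_marks(), reverse=True)
    for position, student in enumerate(ordered, start=1):
        student.rank = position
    return ordered
```

For example, `Student("Priya", 1, "MBA-I", {"DBMS": 80, "Finance": 70})`
creates one object from the template; calling `assign_ranks` on a list of
such objects sets each object's `rank` attribute.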
Self-Assessment Questions
1. State whether True or False:
a) Objects are dependent.
b) System functionality is expressed in terms of object services.
c) Easier maintenance of OOD says that objects can be understood
as stand-alone entities.
2. What are the three stages of object-oriented development?

13.3 Object-Oriented Data Model


A data model is a logical organisation of real-world objects (entities), the
constraints placed on them and the relationships that exist among them. A
database language can be seen as a concrete syntax for a data model, and a
data model is implemented by a database system.
The basic concepts of object-oriented data model are the following:
13.3.1 Object identity
Every real-world entity is uniformly modelled as an object. Each object is
given a unique identifier, which is used to refer to the object for
retrieval. An object retains its identity even if some or all of the values
of its variables, or the definitions of its methods, change over time.
This concept of object identity is needed in object-oriented applications
but does not apply to tuples of a relational database. It is a stronger
notion of identity than that typically found in programming languages or in
data models not based on object orientation.
There are several forms of identity:

Value - A data value is used for identity; for example, the primary key of
a tuple in a relational database.

Name - A user-supplied name is used for identity; for example, a file name
in a file system.

Built-in - A notion of identity is built into the data model or programming
language, and no user-supplied identifier is required; for example, in
object-oriented systems.

In many situations it is beneficial for the system to generate identifiers
automatically, so that identity does not depend on values supplied by
people.
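The built-in form of identity can be sketched in Python. This is a
simplified in-memory illustration: the class names and the counter-based
OID generator are assumptions, whereas a real ODBMS generates and stores
object identifiers itself. The sketch shows that the OID stays the same
when attribute values change, and that two objects with identical values
still have different OIDs, which value-based identity cannot distinguish.

```python
import itertools

# Hypothetical system-generated identifiers: increasing and never reused.
_next_oid = itertools.count(1)


class PersistentObject:
    """Base class giving every object an immutable, system-generated OID."""

    def __init__(self):
        self._oid = next(_next_oid)

    @property
    def oid(self):
        return self._oid  # read-only: no setter is defined


class Employee(PersistentObject):
    def __init__(self, name, salary):
        super().__init__()
        self.name = name
        self.salary = salary


e1 = Employee("Asha", 50000)
e2 = Employee("Asha", 50000)  # same attribute values, but a different object

# Value-based identity (like a relational key built from these values) sees no difference.
equal_by_value = (e1.name, e1.salary) == (e2.name, e2.salary)

oid_before = e1.oid
e1.salary = 55000  # the object's state changes, but its identity does not
```

Here `equal_by_value` is true even though `e1` and `e2` are distinct
objects with distinct OIDs.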
13.3.2 Complex objects
Complex objects are formed from simpler objects by applying type
constructors to them. Examples of simple objects are integers, characters,
strings of any length, Booleans (0/1) and floating-point values; examples
of type constructors are set, list and tuple.
Complex objects can be classified as structured or unstructured.
A structured complex object is made up of components and is defined by
applying type constructors recursively at different levels. For example,
consider the object DEPARTMENT. Figure 13.1 shows the diagrammatic
representation of the structured complex object for DEPARTMENT.

Fig. 13.1: Example of a Structured Complex Object
At the first level, DEPARTMENT has a tuple structure with six attributes
(Dno, Dname, Manager, Location, Employee and Projects). Of these
attributes, Dno and Dname have basic values; the other four have a complex
structure, so a second level of the complex object structure must be built.
Of these four, Manager and Employee have a tuple structure, and the other
two (Location and Projects) are set-valued. At the third level, Manager has
a basic attribute, Start_date_exec, and an attribute Mgr that refers to an
employee object and has a tuple structure; Location and Projects each
contain a set of tuple-structured objects.
Thus, a structured complex object represents an object and its hierarchy of
components in a structured form.
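Building a structured complex object by applying type constructors level by
level can be sketched in Python, using frozen dataclasses for the tuple
constructor and `frozenset` for the set constructor. The class names,
attribute types and sample values are assumptions made for illustration;
they follow the three levels described above rather than any particular
ODBMS.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Employee:              # tuple-structured object
    ssn: str
    name: str


@dataclass(frozen=True)
class Location:              # tuple-structured element of the Location set
    city: str


@dataclass(frozen=True)
class Project:               # tuple-structured element of the Projects set
    pno: int
    pname: str


@dataclass(frozen=True)
class Manager:               # tuple: a reference plus a basic value
    mgr: Employee            # refers to an EMPLOYEE object
    start_date_exec: date    # basic (atomic) value


@dataclass(frozen=True)
class Department:            # first level: tuple with six attributes
    dno: int                 # basic value
    dname: str               # basic value
    manager: Manager         # tuple constructor
    location: frozenset      # set constructor over Location tuples
    employee: Employee       # tuple constructor, as described in the text
    projects: frozenset      # set constructor over Project tuples


# Hypothetical sample data.
boss = Employee("123", "Ramesh")
research = Department(
    dno=5,
    dname="Research",
    manager=Manager(mgr=boss, start_date_exec=date(2010, 5, 22)),
    location=frozenset({Location("Bangalore"), Location("Mysore")}),
    employee=boss,
    projects=frozenset({Project(1, "ProductX"), Project(2, "ProductY")}),
)
```

Frozen dataclasses are hashable, which is what allows the tuple-structured
objects to be placed inside the set-valued attributes.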
An unstructured complex object is a data type that requires a large amount
of storage, such as an image or a large text object. For example, consider
objects that are two-dimensional images. If an application needs to select,
from a collection of such images, those that match a certain pattern, the
user must provide the pattern to be recognised, because the DBMS itself
does not interpret the contents of the image. Pattern recognition is a
separate field of study in itself, concerned with analysing patterns and
the relationships between them.
13.3.3 Persistence
Objects are created by executing an application program or by invoking the
object constructor operations. Not all objects are meant to be stored
permanently in the database. Object persistence refers to storing objects
in the database so that they continue to exist after the program that
created them terminates. Persistence is expected to operate with
transactional integrity, and as such it is subject to strict conditions; in
contrast, the services offered through standard language libraries and
packages are often free from transactional constraints. The typical
mechanisms for making an object persistent are naming and reachability.
The naming mechanism involves giving an object a unique persistent name
through which it can be retrieved by this and other programs. However, it
is not practical to give names to all objects in a large database that
includes thousands of objects; therefore, most objects are made persistent
by the second mechanism, reachability. The reachability mechanism works by
making the object reachable from some persistent object: any object that
can be reached by following references from a persistent object also
becomes persistent.
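The two mechanisms can be illustrated with a toy in-memory store. The
`ObjectStore` and `Record` classes and their methods are hypothetical and
ignore transactions and disk storage. The sketch shows that only a few
objects need persistent names, while every other object becomes persistent
simply because it can be reached from a named one.

```python
class Record:
    """A hypothetical database object that may hold references to other objects."""

    def __init__(self, label):
        self.label = label
        self.refs = []  # references (pointers) to other Record objects


class ObjectStore:
    """Toy store illustrating the naming and reachability mechanisms."""

    def __init__(self):
        self.names = {}  # unique persistent name -> named (root) object

    def bind(self, name, obj):
        """Naming: give an object a unique persistent name."""
        if name in self.names:
            raise KeyError(f"name {name!r} is already in use")
        self.names[name] = obj

    def lookup(self, name):
        """Retrieve a named object, as another program could do later."""
        return self.names[name]

    def persistent_objects(self):
        """Reachability: the named objects plus everything reachable from them."""
        found = {}
        stack = list(self.names.values())
        while stack:
            obj = stack.pop()
            if id(obj) not in found:
                found[id(obj)] = obj
                stack.extend(obj.refs)
        return list(found.values())


store = ObjectStore()
all_depts = Record("ALL_DEPARTMENTS")  # a named collection acting as the root
research = Record("Research")
asha = Record("Asha")
scratch = Record("temporary result")   # never named and never referenced
all_depts.refs.append(research)
research.refs.append(asha)
store.bind("ALL_DEPARTMENTS", all_depts)
persistent = store.persistent_objects()
```

Only ALL_DEPARTMENTS is named, yet `research` and `asha` are also
persistent because they are reachable from it; `scratch` is not, so it
would be discarded when the program ends.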
13.3.4 Type and class hierarchies
A type is defined by giving a type name and then listing the names of its
visible (public) functions. For example, you can define a type that gives
the details of an EMPLOYEE as:
EMPLOYEE: Emp_Id, Name, Address, department, DOB, age, Phne_no
In the EMPLOYEE type, you can implement Emp_Id, Name, Address, department,
DOB and Phne_no as stored attributes, and age as a method that calculates
the age from the value of the DOB attribute and the current date.
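The distinction between stored attributes and a computed method can be
sketched in Python. The field names follow the EMPLOYEE type above
(written in lower case), while the optional `today` parameter, which lets
the current date be supplied for testing, is an assumption added for this
example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Employee:
    # Stored attributes
    emp_id: str
    name: str
    address: str
    department: str
    dob: date
    phne_no: str

    def age(self, today=None):
        """Method: age in completed years, computed from DOB and the current date."""
        today = today or date.today()
        had_birthday = (today.month, today.day) >= (self.dob.month, self.dob.day)
        return today.year - self.dob.year - (0 if had_birthday else 1)
```

Because age is computed rather than stored, it never goes out of date.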
A class is a means of grouping all the objects that share the same set of
attributes and methods. An object belongs to only one class, as an instance
of that class (the instance-of relationship). A class is similar to an
abstract data type. A class may also be primitive (with no attributes); for
example, integer, string and Boolean. Class hierarchies derive a new class
(subclass) from an existing class (superclass). The subclass inherits all
the attributes and methods of the superclass and may have additional
attributes and methods.
13.3.5 Inheritance
Inheritance is a way of defining relationships among classes of objects. As
the name indicates, inheritance means that an object is able to inherit
characteristics from another object; more precisely, an object is capable
of acquiring the state and behaviour of its parent. Inheritance works
because the objects have behaviours in common.
For example, suppose we would like to create a class called Human to
represent physical characteristics. It is a generic class that represents
you, me and any other human in the world. Its state describes having legs,
arms and so on, and its behaviour includes eating, sleeping, drinking and
walking. In this way, the Human class captures behaviour common to all of
us. However, characteristics specific to a particular gender are not the
same for everyone, so two new classes, Man and Woman, need to be created.
Each of them inherits the state and behaviour of Human, and humans differ
from each other according to which of these two classes they belong to.
Therefore, inheritance allows the state and behaviour of a parent class to
be carried over into a child class, and the child class is treated as a
specialised version of its parent.
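The Human example can be sketched in Python. The attribute and method names,
and the choice of `describe()` as the specialised behaviour, are
illustrative assumptions. `Man` and `Woman` inherit the state (legs, arms)
and behaviour (eating, walking) of `Human` without redefining them, and
each overrides only what is specific to it.

```python
class Human:
    """Generic superclass: state and behaviour shared by every human."""

    def __init__(self, name):
        self.name = name
        self.legs = 2
        self.arms = 2

    def eat(self):
        return f"{self.name} is eating"

    def walk(self):
        return f"{self.name} is walking"

    def describe(self):
        return f"{self.name}: a human with {self.legs} legs and {self.arms} arms"


class Man(Human):
    """Subclass: inherits everything from Human and specialises describe()."""

    def describe(self):
        return super().describe() + " (man)"


class Woman(Human):
    """Subclass: inherits everything from Human and specialises describe()."""

    def describe(self):
        return super().describe() + " (woman)"
```

A `Woman` object is also a `Human` object, so any code written for `Human`
(for example, calling `walk()`) works unchanged on both subclasses.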
The following are the advantages of inheritance:
It is an abstraction mechanism which may be used to classify entities.
It is a reuse mechanism at both the design and the programming level.
The inheritance graph is a source of organisational knowledge about
domains and systems.
Self-Assessment Questions
3. What are the different forms of object identity?
4. ___________ are those that are formed from simpler objects by applying
type constructors to them.
5. The typical mechanisms for making an object persistent are _________
and _________.
6. _______ is similar to an abstract data type.
7. Which concept of the object-oriented data model allows one object to
acquire the behaviour of another?

13.4 Object-Oriented Databases


Object-oriented databases are managed by Object-Oriented Database
Management Systems (OODBMSs). These databases store objects rather than
data such as integers, strings and real numbers. An object has attributes
and methods. As you know, attributes are data that define the
characteristics of an object; they store integers, strings and so on.
Methods define the behaviour of the objects; they are also called
procedures or functions.
Object databases are best used whenever there is a need to store complex
data or complex relationships.
While a relational database stores data about an object, an object
database stores the object itself. This avoids the extra processing needed
to reassemble the object from its parts. For example, to store the image of
a dog in a relational database, you need to store the different parts of
the dog in the table. Table 13.1 may be the representation of storing the
image of a dog in a relational database.
Table 13.1: Relational Database
Face
Eyes
Moustache
Ears
Body
Front legs
Back legs
The mouth gap

If the same thing is stored in an ODBMS, it is stored as the single object
DOG, which combines many attributes and methods (Table 13.2).
Table 13.2: ODBMS Table
DOG

This way of storing data suits the following types of applications:

CAD applications - This way of storing data helps in storing complex data
types.

Multimedia applications - An ODBMS helps in storing a wide range of data
types in the same database.

Evolutionary applications - An ODBMS makes it easier to follow objects
through time.

13.4.1 History of databases


In the early 1950s, data was stored in files. This was the first method of
recording data for processing and future use, and such systems were called
file processing systems. In these systems, data was stored so that it
remained after the process that created it had ceased to exist. In the
1960s, files were gradually replaced by tree structures. A tree structure
represents repeating information using parent-child relationships: each
parent can have many children, but each child has only one parent. All
attributes of a specific record are listed under an entity type. In the
1970s, the relational DBMS was introduced, which is based on the
relationships that exist between entities. Relational systems offer more
reliability, more flexibility, less redundancy and multiple views of the
same data.
ODBMSs came into existence in the 1990s for better simulation and for
storing complex objects, bringing advantages such as reuse and inheritance
of properties into the database. Table 13.3 shows the history of databases
over time.
Table 13.3: History of Databases

System                          Property
File systems (1950s)            Store data after the process that created it
                                has ceased to exist
Hierarchical/network (1960s)    Concurrency; recovery; fast access; complex
                                structures
Relational (1970s-1980s)        More reliability; less redundancy; more
                                flexibility; multiple views
ODBMS (1990s)                   Better simulation; more (and complex) data
                                types; more relationships (e.g. aggregation,
                                specialisation); single language for database
                                AND programming; better versioning; no
                                reconstruction of objects; other
                                object-oriented advantages (reuse,
                                inheritance, etc.)

13.4.2 How do ODBMSs work?


Consider the example of a Student_Course relationship. The entity STUDENT
has the attributes Std_id, Std_name and Std_add (address). The following is
the table for STUDENT.

STUDENT
Std_id    Std_name              Std_add
MBA2001   Priyadarshini Bhat    #16, Cambridge Layout, Bangalore
MBA2002   Ashwini Sharma        #45, Gupta Layout, Mumbai
MBA2003   Ravi Joshi            #54, Airport Road, Delhi
MBA2004   Shilpa Saxena         5th Main, BTM Layout, Bangalore
MBA2005   Rashi Khanna          #4, Kanaka Layout, Lucknow

The entity COURSE has the attributes Course_id and Course_name.

COURSE
Course_id   Course_name
M1          Marketing
H1          Human Resource
IS1         Information Science
IT2         Information Technology

The relationship between STUDENT and COURSE is represented by the relation
OPTED, which therefore has Std_id and Course_id as its attributes.

OPTED
Std_id    Course_id
MBA2001   M1
MBA2002   H1
MBA2003   IS1
MBA2004   IT2
MBA2005   M1

Examples of queries in the relational database model are given below:
1. To find the course of the student with ID MBA2005:
o Go to OPTED, look up the student with ID MBA2005 and return the
Course_id.
It will return M1.
o Go to COURSE, look up M1 and return the Course_name.
It will return Marketing.
2. To name all students who opted for Marketing:
o Go to COURSE, look up Marketing and return its Course_id.
It will return M1.
o Go to OPTED, look up M1 and return all matching Std_id values.
It will return MBA2001 and MBA2005.
o Go to STUDENT, look up each Std_id and return each Std_name.
It will return Priyadarshini Bhat and Rashi Khanna.
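In SQL, each lookup step above becomes a join between tables on their
shared attribute. The following Python sketch uses the standard `sqlite3`
module with an in-memory database loaded with the rows of the STUDENT,
COURSE and OPTED tables; SQLite is chosen only for convenience, and the
column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENT (Std_id TEXT PRIMARY KEY, Std_name TEXT, Std_add TEXT);
    CREATE TABLE COURSE (Course_id TEXT PRIMARY KEY, Course_name TEXT);
    CREATE TABLE OPTED (Std_id TEXT REFERENCES STUDENT, Course_id TEXT REFERENCES COURSE);
""")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?)", [
    ("MBA2001", "Priyadarshini Bhat", "#16, Cambridge Layout, Bangalore"),
    ("MBA2002", "Ashwini Sharma", "#45, Gupta Layout, Mumbai"),
    ("MBA2003", "Ravi Joshi", "#54, Airport Road, Delhi"),
    ("MBA2004", "Shilpa Saxena", "5th Main, BTM Layout, Bangalore"),
    ("MBA2005", "Rashi Khanna", "#4, Kanaka Layout, Lucknow"),
])
conn.executemany("INSERT INTO COURSE VALUES (?, ?)", [
    ("M1", "Marketing"), ("H1", "Human Resource"),
    ("IS1", "Information Science"), ("IT2", "Information Technology"),
])
conn.executemany("INSERT INTO OPTED VALUES (?, ?)", [
    ("MBA2001", "M1"), ("MBA2002", "H1"), ("MBA2003", "IS1"),
    ("MBA2004", "IT2"), ("MBA2005", "M1"),
])

# Query 1: OPTED gives the Course_id of MBA2005, COURSE gives its name.
course_of_mba2005 = conn.execute(
    """SELECT c.Course_name
       FROM OPTED o JOIN COURSE c ON c.Course_id = o.Course_id
       WHERE o.Std_id = ?""", ("MBA2005",)).fetchone()[0]

# Query 2: COURSE -> OPTED -> STUDENT for everyone who opted for Marketing.
marketing_students = [name for (name,) in conn.execute(
    """SELECT s.Std_name
       FROM COURSE c
       JOIN OPTED o ON o.Course_id = c.Course_id
       JOIN STUDENT s ON s.Std_id = o.Std_id
       WHERE c.Course_name = ?
       ORDER BY s.Std_id""", ("Marketing",))]
```

The second query needs two joins because the link between a course and its
students exists only as values in the OPTED table.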
Figure 13.2 represents the object-oriented database model.

Fig. 13.2: Object-Oriented Database Model

The same queries are expressed in the object-oriented database model as
follows.
1. To find the course of the student with ID MBA2005:
o Search the STUDENT index for the pointer to MBA2005.
o Follow the course pointer to M1 and return the Course_name.
It will return Marketing.
2. To name all students who opted for Marketing:
o Search the COURSE index and find the course (M1).
o Follow the student pointers, looking up each Std_id and returning each
Std_name.
This process is called navigation. Note that it relies on pointers, and for
this reason the pointers must be persistent. When such systems were first
introduced, the way they were queried varied considerably from product to
product, but querying has since become more standardised through
object-oriented languages (OOLs).
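Navigation can be mimicked with ordinary in-memory objects, where Python
object references play the role of the persistent pointers. In this sketch
(all class and variable names are assumptions), each STUDENT object holds a
pointer to its COURSE object and each COURSE object holds pointers back to
its students, so neither query needs a separate table such as OPTED. In a
real ODBMS these references would be persistent object identifiers rather
than memory references.

```python
class Course:
    def __init__(self, course_id, course_name):
        self.course_id = course_id
        self.course_name = course_name
        self.students = []             # pointers to Student objects


class Student:
    def __init__(self, std_id, std_name, course):
        self.std_id = std_id
        self.std_name = std_name
        self.course = course           # pointer to a Course object
        course.students.append(self)   # keep the reverse pointer consistent


courses = {cid: Course(cid, name) for cid, name in [
    ("M1", "Marketing"), ("H1", "Human Resource"),
    ("IS1", "Information Science"), ("IT2", "Information Technology")]}

# Indexes on identifying attributes (the "STUDENT index" and "COURSE index").
student_index = {}
for sid, name, cid in [("MBA2001", "Priyadarshini Bhat", "M1"),
                       ("MBA2002", "Ashwini Sharma", "H1"),
                       ("MBA2003", "Ravi Joshi", "IS1"),
                       ("MBA2004", "Shilpa Saxena", "IT2"),
                       ("MBA2005", "Rashi Khanna", "M1")]:
    student_index[sid] = Student(sid, name, courses[cid])
course_index = {c.course_name: c for c in courses.values()}

# Query 1: look up MBA2005 in the index, then follow its course pointer.
q1 = student_index["MBA2005"].course.course_name

# Query 2: find Marketing in the course index, then follow the student pointers.
q2 = [s.std_name for s in course_index["Marketing"].students]
```

Both queries start with one index lookup and then simply follow pointers;
no join is performed.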
13.4.3 Implementation issues
To implement the equivalent of a stored procedure, the behaviour must be
described in the object model and implemented in the runtime
implementation of the object model.
Likewise, referential integrity, which is traditionally supported through
triggers or declarative constructs in the relational world, must be described
in the object model and implemented in the runtime. The theoretical problem
with this is that such things as database rules must be consistently
implemented in each application, as opposed to once in the DBMS with
most RDBMS and ERDBMS products. If this separation is not managed,
inconsistencies can arise in the database.
The most important implementation factors are the following:
Persistence - Objects are stored between database runs, so they outlive the
programs that create them. Persistence also supports versioning, in which a
new version of an object is created every time changes are made.
Sharing - Objects can be shared between processes wherever required,
including in a distributed environment.
Paging - Object-oriented databases can reduce the need for paging by
loading into memory only the objects that are currently required
(relational databases load tables that contain both the required data
AND other unnecessary data).
13.4.4 Relationships
A relationship is the connection between two or more objects. In diagrams,
a relationship is represented by a diamond. For an object to be part of a
system, it should have at least one relationship with another object or
entity. For example, in a classroom, if STUDENT and CLASSROOM are two
objects, they should have a relationship between them, such as "student
sits in the classroom". Here, Sits in is the relationship between STUDENT
and CLASSROOM. This is represented as:

STUDENT --<Sits in>-- CLASSROOM

Similarly, for the objects TEACHER and STUDENT, TEACHER teaches STUDENT:

TEACHER --<Teaches>-- STUDENT

There are four standard kinds of relationship that object-oriented
databases model:
Inheritance - This kind of relationship is used when one object is a kind
of something else. For example, a student is a kind of person.
Association - This kind of relationship is used when one object has a
connection with another object. For example, a husband is related to his
wife.
Aggregation - This kind of relationship is used when one object is made
out of other objects. For example, the human body is made out of different
organs.
Inverse relationship - This kind of relationship is used when one object
is part of another object. For example, the stomach is part of the human
body.
13.4.5 Advantages
There are many advantages in using ODBMS over RDBMS. They are as
follows:
Objects do not require assembly and disassembly, which saves the coding
time and execution time needed to assemble or disassemble them.
ODBMS has reduced paging.
ODBMS has easier navigation facilities, which leads to easier versioning.
ODBMS has better concurrency control: a whole hierarchy of objects may be
locked.
In ODBMS the data model is based on the real world.
Less code is required in ODBMS when applications are object oriented.
Relationships and constraints on objects can be stored in the server
application.
ODBMS fits in well with client/server and distributed architectures.

13.4.6 Limitations
Despite these advantages, ODBMSs have several drawbacks:
ODBMS has lower efficiency when data and relationships are simple.
Relational tables are simpler than ODBMS object structures.
ODBMS has reduced access speed due to late binding.
More user tools exist for RDBMS.
ODBMS lacks standards, including a common query language such as SQL.
Support for RDBMS is more certain, and change is less likely to be
required.
Self-Assessment Questions
8. The methods that define the behaviour of objects are also called
________.
9. What are the properties of hierarchical systems?
10. A relationship is represented using the notation _________.
11. Name the four standard kinds of relationship that object-oriented
databases model.

13.5 Object Relational Database Management System (ORDBMS)


Object Relational Database Management System, or simply ORDBMS, is a
system that implements an object-oriented front end on a relational
database. It acts as an interface through which other applications interact
with the database, which behaves as though the data were stored as
objects. Information supplied in the form of objects is converted into
relational data (rows and columns) for storage.
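The conversion described above, in which objects are turned into rows and
columns on the way in and rebuilt on the way out, can be sketched with
Python dataclasses and the standard `sqlite3` module. `ObjectFrontEnd` is a
hypothetical name, and the sketch handles only flat objects with simple
attribute types and trusted table names; real object-relational systems
also deal with nested objects, references and user-defined types. The two
conversion steps, `save` and `load_all`, are the extra work that the next
subsection identifies as a performance cost.

```python
import sqlite3
from dataclasses import dataclass, astuple, fields


@dataclass
class Course:
    course_id: str
    course_name: str


class ObjectFrontEnd:
    """Hypothetical object front end: applications save and load objects,
    while the data is actually kept in relational rows and columns."""

    def __init__(self, cls, table):
        self.cls = cls
        self.table = table  # assumed to be a trusted identifier
        self.columns = [f.name for f in fields(cls)]
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(f"CREATE TABLE {table} ({', '.join(self.columns)})")

    def save(self, obj):
        """Object -> row: flatten the object's attributes into a tuple."""
        placeholders = ", ".join("?" for _ in self.columns)
        self.conn.execute(f"INSERT INTO {self.table} VALUES ({placeholders})",
                          astuple(obj))

    def load_all(self):
        """Row -> object: rebuild objects from the stored rows."""
        rows = self.conn.execute(
            f"SELECT {', '.join(self.columns)} FROM {self.table} ORDER BY rowid")
        return [self.cls(*row) for row in rows]
```

Usage: `fe = ObjectFrontEnd(Course, "COURSE")`, then `fe.save(Course("M1",
"Marketing"))`; `fe.load_all()` returns `Course` objects, although the
database itself only ever holds rows.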
13.5.1 Performance constraints
ORDBMS converts data between an object-oriented format and RDBMS format.
Because of this additional conversion work, the speed performance of the
database can be degraded substantially.
13.5.2 ORDBMS benefits
The main benefits of ORDBMS are as follows:
1. Software is provided to convert the objects into database format.
2. Programmers need not write special code for the conversion.
3. Access is easy from an object-oriented language.
Self-Assessment Questions
12. In ___________ systems the objects are converted into relational data
in rows and columns.

13.6 Summary
In this unit, we discussed object-oriented database development and how the
concepts of OOD help in designing systems. Object-oriented systems
represent data in the form of objects. This can be more beneficial than the
relational database format, because the whole object is represented and
data can be retrieved easily by following pointers. Despite these
advantages, there are many drawbacks in using this model. Finally, we
discussed a hybrid model of system, ORDBMS, in which an object-oriented
front end is implemented on a relational database.

13.7 Terminal Questions


1. Explain the basics of OOD.
2. What are the different concepts of object-oriented development?
Discuss.
3. How does ODBMS work?
4. What are the constraints and benefits of ORDBMS?

13.8 Answers
Self-Assessment Questions
1. Answers:
a) False
b) True
c) True
2. Analysis, design and programming
3. Value, name and built-in
4. Complex objects
5. Naming and reachability
6. Class
7. Inheritance
8. Procedures
9. Concurrency, recovery, fast access and complex structures
10. Diamond
11. Inheritance, association, aggregation and inverse relationship
12. ORDBMS

Terminal Questions
1. (Refer to Section 13.2 for further information.)
2. (Refer to Section 13.3 for further information.)
3. (Refer to Section 13.4.2 for further information.)
4. (Refer to Sections 13.5.1 and 13.5.2 for further information.)


Unit 14

Security and Integrity

Structure:
14.1 Introduction
Objectives
14.2 Security and Integrity Violations
14.3 Authorisation
14.4 Authentication
14.5 Encryption
The Data Encryption Standard (DES)
Public key encryption
14.6 Granting of Privileges
14.7 Security Specification in SQL
14.8 Role of Database Administrators (DBAs) in Database Security
14.9 Issues in Database Security
14.10 Summary
14.11 Glossary
14.12 Terminal Questions
14.13 Answers
14.14 Case Study

14.1 Introduction
In the previous unit, we discussed object-oriented database systems. Because object-oriented applications have direct access to the database, securing the data is an important concern, and particular care must be taken with the authorisation and authentication of users. Security is one of the major factors in database management and covers all of these concerns. Data in a database has to be protected from unauthorised access and manipulation. Database security involves allowing or disallowing users to perform actions on the database. The database must also be secured against data misuse and against inconsistency caused by concurrent execution.
Caselet
A database management system is a suite of software applications that
together make it possible for people or businesses to store, modify and
extract information from a database. Sounds like something found only in
bank vaults? It's not. You can find these systems in many places in your
everyday life. The ATM that you get cash out of every week is a database
management system. When you make flight reservations online, you're
providing information that is entered into such a system. Even the library
that you or your children check out books from runs on one. On a more
personal level, your personal computer can have its own database
management system. You might have spreadsheets that contain
mountains of data. Any time you fill up a spreadsheet with data and run
queries to find and analyse data in different ways, you are accessing
such a system. How do you view the data that is the result of a query?
You view the data by looking at a report. Most systems have a reporting
function that is the last step in the data manipulation process. After all,
collating the data without looking at it won't get you very far.
One of the main functions of the database management system is doing
the heavy lifting for you. In other words, you don't necessarily have to
know exactly where all that data is in the system; as long as the system
knows where it all is, it can deliver a report for you to peruse. This might
not seem to matter if you're thinking of just your computer, but throw in a
mainframe that contains reams and reams of data, and we're talking
about a huge amount of information that can be stored in any number of
places within the mainframe system. The result is the same, though: a
report that you can read, analyse and act on. This functionality also
extends to a multiuser database. Such a system under this scenario
would allow you as one user to operate all functions within the database
without having to know what other users are accessing from the same
database. One popular example of this kind of multiuser database is
Microsoft SQL Server.
(Source:http://www.wisegeek.org/what-is-a-database-management-system.htm
(Retrieved on 30th January 2013))

The caselet above mentions ATMs, which allow money to be transacted at any time. A large number of users use such machines, and when money is involved, security is very important. Authorisation and authentication are managed by a person designated as the database administrator. In this unit, we will study the different security and integrity violations, the authentication and authorisation of users, and the role of the database administrator in security. Since security is closely tied to ethics, we will also discuss the ethical issues in database security.
Objectives
After studying this unit, you should be able to:
recognise the various violations of security and integrity
relate the concepts of authorisation and authentication to the role of the database administrator
describe the security specifications in SQL
analyse the role of database administrator in security
identify the ethical issues in database security

14.2 Security and Integrity Violations


The increased use of the Internet allows many people to share resources with one another easily. It also exposes even the most confidential databases to remote access, which may lead to misuse of the database. Misuse of a database can be categorised as either intentional or accidental.
Intentional data loss
Intentional misuse of data is a punishable offence. In this case, the database is deliberately used for personal benefit. Intentional loss of data may be due to the reading, writing or destruction of data by unauthorised users.
Accidental loss of data consistency
Data consistency may sometimes be lost accidentally for the following reasons:
System crashes during transaction processing
Anomalies caused by multiple users accessing the database concurrently
Anomalies caused by the distribution of data over several computers
Database security usually protects data through several techniques:
Certain portions (selected columns) of a database are available only to those persons who are authorised to access them. This ensures that the confidentiality of data is maintained. For example, in large organisations where different users share the same database, sensitive information such as employees' salaries should be kept confidential from most of the other users.
To protect the database, we must take security measures at several levels. Network security is as important as database security.
Security within the operating system is implemented by providing a password for each user account. It protects data in primary memory by preventing direct access to the data.

Self-Assessment Questions
1. ___________________________and_________________________
are the categories of misuse of data.
2. State whether the following statements are True or False:
a) System crashes during transaction processing can lead to accidental loss of data consistency.
b) Access to the database by a single user will lead to accidental loss of data consistency.
c) Database security is done by protecting the data in primary memory
by avoiding direct access to the data.

14.3 Authorisation
A user may have several forms of authorisation on parts of the database.
Among them are the following:
Read authorisation allows reading, but not modification of data.
Insert authorisation allows insertion of new data, but not modification of
existing data.
Update authorisation allows modification, but not deletion of data.
Delete authorisation allows deletion of data.
Index authorisation allows the creation and deletion of indices.
Resource authorisation allows the creation of new relations.
Alteration authorisation allows the addition or deletion of attributes in a relation.
Drop authorisation allows the deletion of relations.
The ultimate form of authority is that given to the database
administrator. The database administrator may authorise new users.
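The forms of authorisation listed above can be modelled as a lookup table of the privileges held by each user on each relation. The following Python sketch assumes hypothetical user names (clerk, manager, dba) and covers only six of the listed forms; a real DBMS keeps this information in its own system catalogue rather than in a Python dictionary.

```python
from enum import Enum, auto


class Privilege(Enum):
    # Six of the forms of authorisation described in the text.
    READ = auto()
    INSERT = auto()
    UPDATE = auto()
    DELETE = auto()
    INDEX = auto()
    DROP = auto()


# Hypothetical authorisation table: (user, relation) -> privileges held.
authorisations: dict[tuple[str, str], set[Privilege]] = {
    ("clerk", "loan"): {Privilege.READ},
    ("manager", "loan"): {Privilege.READ, Privilege.INSERT, Privilege.UPDATE},
}


def is_authorised(user: str, relation: str, privilege: Privilege) -> bool:
    """Return True only if the user holds this privilege on the relation.

    The DBA is treated as holding every privilege, reflecting the text's
    statement that the DBA has the ultimate form of authority.
    """
    if user == "dba":
        return True
    return privilege in authorisations.get((user, relation), set())
```

Each privilege is checked independently, so holding read authorisation on loan says nothing about update authorisation on the same relation.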
Authorisation and views
A view can hide data that a user does not need to see. Views play a very important role in providing data security, and they simplify complex queries so that users can concentrate only on the required portion of the relations (tables). A view prevents users from accessing a relation directly; they can see only the portions of the table that the view exposes.
For example:
create view V_emp as select emp_no, Ename from Emp;
select * from V_emp;
Here, clerks are not authorised to see salary information in the Emp relation, so the view leaves out the Sal column. A clerk is granted access to the view V_emp but not to Emp itself, which protects the salary data in Emp. The user who creates the view V_emp must have read authorisation on the Emp relation.
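The effect of hiding a column behind a view can be tried out with Python's built-in sqlite3 module. The sketch below builds a V_emp view that leaves out the Sal column of a small Emp table with made-up rows. SQLite has no grant statement, so the sketch shows only the hiding half of the technique; in a multi-user DBMS, the clerk would additionally be granted select on V_emp but not on Emp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (emp_no INTEGER PRIMARY KEY, Ename TEXT, Sal REAL)")
conn.executemany(
    "INSERT INTO Emp VALUES (?, ?, ?)",
    [(101, "Ravi", 45000.0), (102, "Meena", 52000.0)],  # made-up sample rows
)

# The view exposes only the non-sensitive columns; Sal is left out.
conn.execute("CREATE VIEW V_emp AS SELECT emp_no, Ename FROM Emp")

cursor = conn.execute("SELECT * FROM V_emp ORDER BY emp_no")
# cursor.description holds one tuple per result column; item 0 is the name.
view_columns = [description[0] for description in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```

Even select * on the view returns only emp_no and Ename, because the view definition, not the query, decides which columns exist.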

14.4 Authentication
Whereas authorisation controls which parts of the database a user may access, authentication is the process that identifies the user. The simplest form of authentication is a password check. The two terms are often confused and used as synonyms, but they are distinct and must be handled separately. Authentication identifies the user of the database: it matches the unique ID supplied by the user against the registered users and fetches the information needed to confirm that the user is valid and to determine the user's rights to access the database.
For example, if user U asks to access the database, the database must identify U as a registered user. This is authentication. If U then wants to fetch a resource or perform an operation on a particular resource, the database must validate that U is allowed to do so. This is known as authorisation.
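The difference between the two steps in this example can be sketched in Python. Authentication checks a registered user ID and password; authorisation then checks whether the identified user may perform the requested operation. The in-memory registry, the operation names and the use of salted PBKDF2 password hashes are assumptions made for illustration; they are not the mechanism of any particular DBMS.

```python
import hashlib
import hmac
import os

# Hypothetical user registry: user ID -> (salt, password hash).
_registered: dict[str, tuple[bytes, bytes]] = {}
# Hypothetical rights table: user ID -> operations the user may perform.
_rights: dict[str, set[str]] = {}


def _hash(password: str, salt: bytes) -> bytes:
    # PBKDF2 deliberately slows down brute-force guessing of stored passwords.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def register(user: str, password: str, rights: set[str]) -> None:
    salt = os.urandom(16)
    _registered[user] = (salt, _hash(password, salt))
    _rights[user] = set(rights)


def authenticate(user: str, password: str) -> bool:
    """Authentication: is this a registered user with the right password?"""
    if user not in _registered:
        return False
    salt, stored = _registered[user]
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(stored, _hash(password, salt))


def authorise(user: str, operation: str) -> bool:
    """Authorisation: may this (already identified) user do this operation?"""
    return operation in _rights.get(user, set())


def request(user: str, password: str, operation: str) -> str:
    if not authenticate(user, password):
        return "rejected: unknown user or wrong password"
    if not authorise(user, operation):
        return "rejected: not authorised"
    return "allowed"
```

The order matters: authorisation is checked only after the user has been authenticated, so an unknown user never reaches the rights table.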
Self-Assessment Questions
3. Insert authorisation allows insertion of new data but not ____________
of existing data.
4. State whether the following statements are True or False:
a) The DBA is not authorised to give access to new users.
b) Authorisation technique identifies the user of the database and
checks with the unique ID given to the user with registered users.

14.5 Encryption
While we try to maintain the security of the data with authentication techniques, there are always ways for an intruder to access the data or divert its flow. To counter such hacking methods, we use encryption to safeguard the data. Encrypting data means disguising it so that, even if the flow of data is diverted, its content is not revealed unless it is decrypted. According to Ramez Elmasri and Shamkant B. Navathe, "Encryption is a means of maintaining secure data in an insecure environment." Data is encrypted with a specified encryption key, and the result can be decrypted with a decryption key to obtain the original data.
A good encryption technique should make it relatively simple for authorised users to encrypt and decrypt data, while making it extremely difficult for an unauthorised person to determine the encryption key.
14.5.1 The Data Encryption Standard (DES)
The DES is an encryption standard adopted by the US Government for general public use. The DES algorithm uses two methods of encryption, namely, substitution and permutation. As the name indicates, substitution is replacement: a symbol or group of symbols is replaced by some other symbol. For example, "MY NAME IS SHANKAR" is replaced by "PB QDPH LV VKDQNDU" by shifting each letter three places down the English alphabet. When the end of the alphabet is reached, counting continues from the beginning, so X, Y and Z become A, B and C.
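The shift substitution in this example can be written as a short Python function. Note that it reproduces only the textbook letter-shift illustration; the real DES algorithm substitutes groups of bits through fixed tables over many rounds and is not a letter shift. The function names are illustrative.

```python
def shift_encrypt(plaintext: str, shift: int = 3) -> str:
    """Replace each letter A-Z with the letter `shift` places further down the
    alphabet, wrapping from Z back to A; other characters are kept unchanged."""
    result = []
    for ch in plaintext.upper():
        if "A" <= ch <= "Z":
            result.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            result.append(ch)
    return "".join(result)


def shift_decrypt(ciphertext: str, shift: int = 3) -> str:
    # Decryption is the same substitution with the shift reversed.
    return shift_encrypt(ciphertext, -shift)


print(shift_encrypt("MY NAME IS SHANKAR"))  # PB QDPH LV VKDQNDU
```

Here the shift of 3 plays the role of the key. The same function gives "BRXU SLHFH RI PLQG" for "YOUR PIECE OF MIND", the answer to Self-Assessment Question 8.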
Permutation is an arbitrary reordering of the elements of a set. One way to specify a permutation is to give, for each position in the new arrangement, the position that the element occupied in the old arrangement. For example, consider the key S, which defines the new order of the old set:
S = {3, 6, 7, 1, 13, 15, 2, 4, 8, 11, 14, 12, 10, 5, 9}
The ordering of S is such that the third element of the old set becomes the first element of the new set, the sixth element of the old set occupies the second position in the new set, and so on. Therefore, applying S to (MYNAMEISSHANKAR) gives (NEIMKRYASAANHMS).
The only difference in notation between a set and a permutation is that a set is enclosed in curly braces {}, whereas a permutation is enclosed in round brackets ().
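The permutation above can be checked with a short Python sketch. Position i of the output takes the character found at position S[i] of the input (positions counted from 1, as in the text), and the inverse function puts every character back in its old position. The function names are illustrative.

```python
def permute(text: str, key: list[int]) -> str:
    """Apply a permutation key: output position i takes the character at
    (1-based) input position key[i], as in the set S example."""
    if sorted(key) != list(range(1, len(text) + 1)):
        raise ValueError("key must be a permutation of 1..len(text)")
    return "".join(text[k - 1] for k in key)


def inverse_permute(text: str, key: list[int]) -> str:
    # Undo the permutation: put each character back at its old position.
    original = [""] * len(text)
    for new_pos, old_pos in enumerate(key):
        original[old_pos - 1] = text[new_pos]
    return "".join(original)


S = [3, 6, 7, 1, 13, 15, 2, 4, 8, 11, 14, 12, 10, 5, 9]
print(permute("MYNAMEISSHANKAR", S))  # NEIMKRYASAANHMS
```

The same function reproduces the answer to Self-Assessment Question 9: permute("PIECES", [3, 6, 1, 5, 4, 2]) gives "ESPECI".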
14.5.2 Public key encryption
Public key encryption is a type of cryptosystem proposed by Diffie and Hellman in 1976. Public key algorithms are based on mathematical functions rather than on the operations on bit patterns used by conventional methods, and they use two keys instead of the single key of the bit-pattern method. This increases the strength of confidentiality. The two keys are a public key and a private key; the private key is kept secret. The public key allows anyone to send a message in coded form, but only the rightful recipient, who holds the secret private (decryption) key, can decode the message. Public key encryption is more secure than single-key encryption, but it is also computationally more expensive.
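To make the two-key idea concrete, the Python sketch below uses RSA, a later public key algorithm by Rivest, Shamir and Adleman, rather than anything specific to Diffie and Hellman's proposal. The numbers are the tiny values often used in textbook examples and are far too small to be secure; real systems also add padding, which is omitted here. The call pow(e, -1, phi) requires Python 3.8 or later.

```python
# Toy parameters from a common textbook RSA example; far too small to be secure.
p, q = 61, 53
n = p * q                # 3233, shared by both keys
phi = (p - 1) * (q - 1)  # 3120
e = 17                   # public exponent: (e, n) is the public key
d = pow(e, -1, phi)      # private exponent: modular inverse of e modulo phi


def encrypt(m: int) -> int:
    # Anyone who knows the public key (e, n) can encrypt.
    return pow(m, e, n)


def decrypt(c: int) -> int:
    # Only the holder of the private exponent d can decrypt.
    return pow(c, d, n)


c = encrypt(65)
print(c, decrypt(c))
```

Publishing (e, n) does not reveal d unless n can be factored into p and q, which is easy for these toy values but impractical for the very large primes used in practice.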
Self-Assessment Questions
5. Which of the following means disguising a message to avoid data leakage when the flow of data is diverted?
a) Authorisation
b) Authentication
c) Encryption
d) Decryption
6. DES stands for ___________________________.
7. ________________________ is an arbitrary reordering of the elements in the set.
8. If the encryption technique shifts each letter three places down the alphabet, then YOUR PIECE OF MIND will be _________.
9. Suppose S = (361542) and the word is PIECES. What is the encrypted form of the word after applying the permutation?
10. __________________________ was based on mathematical function.
11. Public key encryption is based on bit-pattern method. (True/False)

14.6 Granting of Privileges


A user who has been granted some form of authorisation may be allowed to pass all or part of his/her rights to another user.
For example, consider the granting of update authorisation on the loan relation of the bank database. Assume that, initially, the database administrator grants update authorisation on loan to users U1, U2 and U3, who may in turn pass on this authorisation to other users. The passing of authorisation from one user to another can be represented by an authorisation graph. The nodes of this graph are the users. An edge Ui → Uj is included in the graph if Ui grants update authorisation to Uj. The root of the graph is the database administrator. In Fig. 14.1, observe that user U5 is granted authorisation by both U1 and U2, whereas U4 is granted authorisation only by U1.


A user has an authorisation if and only if there is a path from the root of the authorisation graph down to the node representing the user.
Suppose that the database administrator decides to revoke (cancel) the authorisation of user U1. Since users U4 and U5 were granted authorisation by U1, their authorisations must be re-examined. U4 loses its authorisation, because its only path from the root passed through U1. There is no need to revoke the authorisation of U5, because U5 was granted it by both U1 and U2; the path DBA → U2 → U5 is still intact, so U5 retains update authorisation on loan.
To properly revoke access rights, all paths in the authorisation graph must start from the root authoriser, that is, the database administrator; any grant that no longer lies on such a path is revoked as well.
Fig. 14.1: Authorisation Grant Graph (the DBA grants to U1, U2 and U3; U1 grants to U4 and U5; U2 grants to U5)
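The rule that a user holds an authorisation if and only if there is a path from the root can be sketched in Python as a reachability check. The class below records one grant graph for a single privilege (such as update on loan); revoking an edge removes it and then drops every grant made by users who are no longer reachable from the DBA. The class and method names are hypothetical.

```python
from collections import defaultdict


class AuthorisationGraph:
    """Grant graph for one privilege. An edge grantor -> grantee means the
    grantor granted the privilege to the grantee."""

    def __init__(self, root: str = "DBA") -> None:
        self.root = root
        self.edges: dict[str, set[str]] = defaultdict(set)

    def grant(self, grantor: str, grantee: str) -> None:
        self.edges[grantor].add(grantee)

    def revoke(self, grantor: str, grantee: str) -> None:
        self.edges[grantor].discard(grantee)
        self._prune()

    def authorised_users(self) -> set[str]:
        # Users reachable from the root by some path of grants (depth-first search).
        seen: set[str] = set()
        stack = [self.root]
        while stack:
            node = stack.pop()
            for nxt in self.edges[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def _prune(self) -> None:
        # Drop grants made by users no longer reachable from the root, so every
        # remaining edge lies on a path that starts at the DBA.
        reachable = self.authorised_users() | {self.root}
        for grantor in list(self.edges):
            if grantor not in reachable:
                del self.edges[grantor]


# Rebuild Fig. 14.1, then revoke U1's authorisation.
g = AuthorisationGraph()
for user in ("U1", "U2", "U3"):
    g.grant("DBA", user)
g.grant("U1", "U4")
g.grant("U1", "U5")
g.grant("U2", "U5")
g.revoke("DBA", "U1")
print(sorted(g.authorised_users()))  # ['U2', 'U3', 'U5']
```

The result matches the text: U4 loses the privilege, while U5 keeps it through U2. Because reachability is computed from the root, a cycle of users granting to each other cannot keep a privilege alive once the path from the DBA is cut.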

14.7 Security Specification in SQL


The SQL data definition language includes commands to grant and revoke privileges. The SQL standard includes the select, insert, update, delete and references privileges. The select privilege corresponds to the read authorisation, and the references privilege allows a user to declare foreign keys that refer to a relation.
The grant statement is used to give authorisation. The syntax is as follows:
grant <privilege list> on <relation name or view name> to <user list>
The privilege list allows several privileges to be granted in one command. The following grant statement grants users U1, U2 and U3 select authorisation on the branch relation:
grant select on branch to U1, U2, U3;
The update and insert authorisations may be given either on all attributes of the relation or on only some of them. For example, the following statement allows U1, U2 and U3 to update only the amount attribute of loan:
grant update (amount) on loan to U1, U2, U3;
If we wish to grant a privilege and allow the recipient to pass the privilege on to other users, we append the with grant option clause to the appropriate grant command. For example, to give U1 the select privilege on branch and allow U1 to grant this privilege to others, we write:
grant select on branch to U1 with grant option;
To revoke an authorisation, we use the revoke statement. It takes a form almost identical to that of grant:
revoke <privilege list> on <relation name or view name> from <user list> [restrict | cascade]
With cascade, the privilege is also revoked from users who received it through the revoked users; with restrict, the revoke statement fails if any such dependent grants exist. For example:
revoke select on branch from U1, U2, U3 cascade;
revoke update (amount) on loan from U1, U2, U3;
revoke references (branch_name) on branch from U1;
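The effect of the with grant option clause can be sketched as a check performed before each grant: the grantor must be the DBA or must itself hold the privilege with grant option. The Python below is a simplified, hypothetical model; it records only whether each holder may pass the privilege on, not who granted it, so it does not handle revocation (the authorisation graph is needed for that).

```python
# Hypothetical privilege store: (grantee, privilege, relation) -> grantable flag.
# A True flag means the privilege was received "with grant option".
held: dict[tuple[str, str, str], bool] = {}


def grant(grantor: str, privilege: str, relation: str, grantee: str,
          with_grant_option: bool = False) -> None:
    """Mimic `grant <privilege> on <relation> to <grantee> [with grant option]`.

    The DBA may grant anything; any other grantor must hold the privilege
    with grant option, otherwise the statement is rejected.
    """
    if grantor != "DBA" and not held.get((grantor, privilege, relation), False):
        raise PermissionError(f"{grantor} cannot grant {privilege} on {relation}")
    key = (grantee, privilege, relation)
    # Never downgrade: a grantee that already holds the grant option keeps it.
    held[key] = held.get(key, False) or with_grant_option


def has_privilege(user: str, privilege: str, relation: str) -> bool:
    return (user, privilege, relation) in held
```

For example, after grant("DBA", "select", "branch", "U1", with_grant_option=True), U1 may pass select on branch to another user, whereas a user who received the privilege without the clause gets a PermissionError when trying to do the same.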
Self-Assessment Questions
12. To properly revoke access rights, all the paths in the authorisation graph must start from the _________________________.
13. The _________________ statement is used to give authorisation.
14. Which of the following allows the deletion of relations?
a) Index authorisation
b) Drop authorisation
c) Update authorisation
d) Select authorisation

14.8 Role of Database Administrators (DBAs) in Database Security
The role of DBAs in database security ranges from keeping track of application usage in the database environment to maintaining usage records. The following are the roles of DBAs:
Track who is accessing the data in the database.
Track which data is accessed through which application, noting the time of each access.
Manage database users.
Authorise user access to the data.
Maintain records of database usage over time, which help in load balancing and in allocating system costs.
Help in auditing.
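Several of these duties (tracking who accessed which data, through which application and when, and summarising usage) rest on an audit trail. The Python sketch below keeps such a trail in memory; the record fields, application names and methods are hypothetical, and a production DBMS would use its own built-in auditing facilities instead.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AccessRecord:
    user: str
    relation: str
    application: str
    accessed_at: datetime


class AuditLog:
    """Hypothetical audit trail of who accessed which data, via which
    application, and when."""

    def __init__(self) -> None:
        self.records: list[AccessRecord] = []

    def record(self, user: str, relation: str, application: str,
               accessed_at: datetime) -> None:
        self.records.append(AccessRecord(user, relation, application, accessed_at))

    def accesses_by_user(self) -> Counter:
        # Usage per user, e.g. as an input for allocating system costs.
        return Counter(r.user for r in self.records)

    def who_accessed(self, relation: str) -> set[str]:
        # Supports auditing: which users touched a given relation?
        return {r.user for r in self.records if r.relation == relation}

    def accesses_between(self, start: datetime, end: datetime) -> list[AccessRecord]:
        # Usage in a time window, e.g. to spot peak-load periods.
        return [r for r in self.records if start <= r.accessed_at < end]
```

Keeping the raw records and deriving summaries from them lets the same trail serve auditing, load balancing and cost allocation.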

14.9 Issues in Database Security


The threats to database security include loss of availability, loss of data integrity, loss of confidentiality and privacy, theft and fraud, and accidental losses. The different types of issues that database security must address are as follows:
Ethical and legal issues - Some security features are enforced because legal or ethical rules forbid certain data from being accessed by unauthorised users.
System issues - Security functions may be enforced at different system levels, such as the physical hardware, the operating system or the system architecture.
Organisation-based issues - At the organisation level, the data is categorised according to the levels of management, that is, top-level, middle-level and operational-level security, and data access rights are given to the respective authorised users accordingly.
Policy-based issues - An institution or organisation may have a written policy that states which data can be shared and which must be kept private.
When a user needs to access only part of an integrated database, authority should be given for that portion of the database only. Database security then plays a very important role in safeguarding the rest of the data from unauthorised users.
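The organisation-based approach can be sketched as a comparison between the level of each data item and the clearance of each user. The Python below assumes, for illustration, that the three levels are ordered and that a user cleared for a higher level may also read lower-level data; the text itself says only that access rights are given according to level. The data items and users are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    # Assumed ordering: a higher value means more sensitive data / higher clearance.
    OPERATIONAL = 1
    MIDDLE = 2
    TOP = 3


# Hypothetical classification of data items and clearances of users.
data_level = {
    "daily_sales": Level.OPERATIONAL,
    "branch_budget": Level.MIDDLE,
    "merger_plans": Level.TOP,
}
user_clearance = {
    "clerk": Level.OPERATIONAL,
    "branch_manager": Level.MIDDLE,
    "director": Level.TOP,
}


def can_access(user: str, item: str) -> bool:
    """A user may read an item only if their clearance is at least the item's
    level. Unknown users or items are denied by default."""
    if user not in user_clearance or item not in data_level:
        return False
    return user_clearance[user] >= data_level[item]
```

Denying unknown users and items by default keeps a missing entry from silently granting access.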
Self-Assessment Questions
15. __________________tracks the accessing of the data in the database.
16. _________________ type of issue deals with operating system and
system architecture.


14.10 Summary
Let us recapitulate the important concepts discussed in this unit:

Misuse of a database can be categorised as either intentional or accidental. Security within the operating system is implemented by providing a password for each user account; it protects data in primary memory by preventing direct access to the data.
Authorisation controls which parts of the database a user may access.
Authentication is the process that identifies the user.
Encryption is a means of maintaining secure data in an insecure environment. Data is encrypted with a specified encryption key and can be decrypted with a decryption key to obtain the original data.
The DES algorithm uses two methods of encryption, namely, substitution and permutation.
Privileges are granted with the SQL grant statement and withdrawn with the revoke statement.

14.11 Glossary
Integrity: the accuracy and consistency of the data stored in a database; integrity is violated when data becomes incorrect or inconsistent.
Security: the degree of resistance to, or protection from, harm.
DES algorithm: the Data Encryption Standard algorithm, a formerly predominant symmetric-key algorithm for the encryption of electronic data. It was highly influential in the advancement of modern cryptography in the academic world.

14.12 Terminal Questions


1. Explain DES in detail.
2. How is public key encryption different from bit-pattern method?
3. Explain the working of the authorisation grant graph.
4. List the role of DBA in database security.


14.13 Answers
Self-Assessment Questions
1. Intentional data loss and accidental loss of data consistency
2. Answers:
a) True
b) False
c) True
3. Modification
4. Answers:
a) False
b) False
5. c) Encryption
6. Data encryption standard
7. Permutation
8. BRXU SLHFH RI PLQG
9. ESPECI
10. Public key encryption
11. False
12. Authoriser
13. Grant
14. b) Drop authorisation
15. DBA
16. System
Terminal Questions
1. The DES is an encryption standard adopted by the US Government for general public use. The DES algorithm uses two methods of encryption, namely, substitution and permutation. Substitution, as the name indicates, is replacement: a symbol or group of symbols is replaced by some other symbol. For example, "MY NAME IS SHANKAR" is replaced by "PB QDPH LV VKDQNDU" by shifting each letter three places down the English alphabet; when the end of the alphabet is reached, counting continues from the beginning. (Refer to Section 14.5.1 for further information.)
2. Public key encryption is based on mathematical functions rather than on operations on bit patterns, and it uses two keys instead of the single key of the bit-pattern method. This increases the strength of confidentiality. (Refer to Section 14.5.2 for further information.)
3. A user who has been granted some form of authorisation may be
allowed to pass all or part of his/her rights to another user. (Refer to
Section 14.6 for further information.)
4. The roles of the DBA in database security are as follows: track who is accessing the data in the database; track which data is being accessed through which application, along with the time; manage the database users; authorise user access to the data; maintain records of database usage over time, which help in load balancing and allocating system costs; and help in auditing. (Refer to Section 14.8 for further information.)

14.14 Case Study


This case study describes the web application security assessment provided by ABC IT Solution. ABC IT is a leading company in the field of Information Technology (IT) that has been providing excellent web-based security solutions to clients. In this case, we consider one of ABC IT's clients: an industry leader for 30 years, a large sperm bank that combines the world's most comprehensive selection of stringently screened donors with extensive quality control. Keeping in mind the client's requirements, ABC IT provided a Web Application and Database Assessment Report, with the help of which the organisation was able to identify several areas that placed it at risk from hackers and other external threats. The assessment used a variety of manual and automated tools to probe the organisation's website for vulnerabilities. The organisation benefited from these assessment reports and was able to increase security while protecting its assets. ABC IT also provides both network and security assessments as well as network and security audits. Its security assessments review your systems, people and processes to help your organisation manage its risks, while its security audits are designed to review your systems and compare them with the standards and compliance regulations that may apply to your organisation. There are also authorisation tools that give authorised users the required rights to access client details. Assessment services are necessary because they add business value and help organisations maximise their investment in IT infrastructure.
Discussion Questions:
1. What are the drawbacks of not having authorisation in a database?
2. What risks does an unauthorised user pose to a database?
(Hint: Refer to Sections 14.3 and 14.4 for further information.)
References/E-References:
References:
Er. Jain, V. K. (2008). Database Management Systems. New Delhi:
Dreamtech Press.
Elmasri, R., & Navathe, S. B. (2009). Fundamentals of Database
Systems, 5th ed. New Delhi: Pearson Education Inc.
Singh, S. K. (2009). Database Systems: Concepts, Design and Application, 3rd ed. New Delhi: Dorling Kindersley (India) Pvt Ltd., licenced by Pearson Education.
E-References:
http://www.cs.man.ac.uk/~horrocks/Teaching/cs2312/Lectures/Handouts/NFexamples.pdf (Retrieved on 29th January 2013)
www.Vceit.com (Retrieved on 29th January 2013)
http://db.grussell.org/section009.html (Retrieved on 29th January 2013)
http://www.wisegeek.org/what-is-a-database-management-system.htm (Retrieved on 30th January 2013)
http://www.dragonwins.com/domains/getteched/crypto/subs_and_perms.htm (Retrieved on 25th February 2013)
http://docs.oracle.com/cd/E11882_01/server.112/e10897/users_secure.htm#CHDEBHDE (Retrieved on 25th February 2013)
http://niatec.info/ViewPage.aspx?id=153 (Retrieved on 29th February 2013)
http://www.techrepublic.com/whitepapers/case-study-web-applicationdatabase-security-audit/1855003?tag=content;selector-1 (Retrieved on 4th March 2013)
