Subject Code:
MI 0034
BKID
B1966
Additional Registrar
SMU DDE
Dean
SMU DDE
Prof. K. V. Varambally
Director
Manipal Institute of Management, Manipal
Revised Edition: Spring 2010
Printed: September 2014
This book is a distance education module comprising a collection of learning
materials for our students. All rights reserved. No part of this work may be
reproduced in any form by any means without permission in writing from Sikkim
Manipal University, Gangtok, Sikkim. Printed and Published on behalf of Sikkim
Manipal University, Gangtok, Sikkim by Manipal Global Education Services
Manipal 576 104.
Printed at Manipal Technologies Limited, Manipal.
Author's Profile:
Ramya S Gowda holds an MS in Computer Science and Engineering and is pursuing
her post-graduation in management. She worked as Scientist C in the Master
Control Facility, Department of Space Communication, ISRO, Hassan. She has
been associated with academics since 2006. She is presently working as a faculty
member in Sikkim Manipal University. She has published papers in various fields
such as Pattern Recognition, E-Learning and Distance Education, Data Mining,
Business Intelligence, E-Commerce and Enterprise Resource Planning in national and
international journals and conferences such as the International Conference on Digital
Factory (ICDF), National Conference on IT Enabled Practices and Emerging
Management Paradigms, International Conference on Computer Technology and
Development (ICCTD), Emerging Trends in Computer Science and Information
Technology (ETCSIT), International Journal on Computer Science and Information
Technology (IJCSIT), International Journal of Computational Engineering Research
(ijceronline.com), and Journal of Information Technology and Engineering.
Reviewer's Profile
Dr Jai Raj Nair holds a Bachelor's degree in Architecture from Bengal Engineering
College (University of Calcutta), PGDBM from IIM, Calcutta and Ph.D from
Symbiosis International University, Pune. He worked for 9 years in the business
domain of Engineering and Software Consultancy in reputed organizations like
Development Consultants Ltd. (Delhi and Calcutta), and Kirloskar Computer
Services Ltd. (Bengaluru) prior to joining the academic world. At reputed B-Schools,
he has taught IT-related subjects, pertinent for management, for over 12 years. Dr
Nair is a voracious reader and an avid writer. He has presented papers at several
national, regional and international conferences. Some of his papers were selected
for international conferences conducted in Thailand, Italy, Greece and India. He has
also published research papers and articles in management journals of repute. His
research interests include e-retailing, supply chain management and retro-logistics,
business process reengineering, technology-enabled retailing, to name a few.
In House Content Review Team
Dr. Sudhakar G. P.
HOD
Dept. of Management Studies
SMU DDE
Contents
Unit 1 Database Management System
Unit 2 Database Architecture
Unit 3 Record Storage and File Structure Organisation
Unit 4 Database Design
Unit 5 Entity Relationship Model
Unit 6 Relational Algebra and Relational Calculus
Unit 7 Structured Query Language
Unit 8 Functional Dependencies and Normalisation
Unit 9 Database Administration
Unit 10 Operations and Management
Unit 11 Controls
Unit 12 Distributed Databases
Unit 13 Object-Relational Databases
Unit 14 Security and Integrity
MI 0034
Database Management Systems
Course Description
A Database Management System (DBMS) is a collection of programs that
enables you to store, modify, and extract information from a database.
There are many different types of DBMSs, ranging from small systems that
run on personal computers to huge systems that run on mainframes.
This SLM Database Management System presents the fundamental
concepts of database management in an intuitive manner geared toward
allowing students to begin working with databases as quickly as possible.
This SLM is designed as a first course in databases for postgraduate
students. It also contains additional material that can be used as a
supplement or as introductory material for an advanced course. To
understand this SLM better, you should be familiar with basic data
structures, computer organisation and a high-level programming
language. Important theoretical results are covered, but formal proofs
are omitted. In place of proofs, figures and examples are used to suggest
why a result is true.
Course Objectives
Database management has evolved into a central component of the modern
computing environment, and knowledge of database systems has become an
essential part of an education in computer science. In this SLM, the
fundamental concepts of database management, such as database design and
database languages, are discussed.
After studying this course, the student should be able to:
explain the different components of DBMS
elaborate the working of three-schema architecture
list and explain storage devices
describe various terminologies of database design
elucidate ER Model concept with an example and describe its
components
differentiate between tuple relational calculus and domain relational
calculus
Unit 1
Database Management System
Structure:
1.1 Introduction
Objectives
1.2 Evolution of Database
1.3 Traditional File Systems versus Modern Database Management
Systems
File processing systems
Database management systems
Difference between file systems and database management
systems
1.4 Database Environment
1.5 Working of Simple Centralised System
1.6 Properties of Database Management System
1.7 Components of Database Management System
Database engine
Data dictionary
Forms generator
Query processor
Report writer
1.8 Types of Database Users
Database Administrator (DBA)
Database Designers (DBD)
End users
System analysts and application programmers
DBMS system designers and implementers
Tool developers
Operators and maintenance personnel
1.9 Types of Database Systems
1.10 Advantages of Database Management System
1.11 Summary
1.12 Glossary
1.13 Terminal Questions
1.14 Answers
1.15 Case Study
B1966
Page No. 1
Unit 1
1.1 Introduction
To understand a Database Management System (DBMS), you first need to know
what data, information and a database are. In this unit, you will study the
evolution of databases and the basic difference between traditional file
systems and modern database management systems. We will also describe the
database environment and how it works. Before going through the unit, let us
review the basic knowledge required for studying DBMS.
The major component of a database is data. Data is a raw fact that can be
recorded and has a specific meaning. Processed data is called information.
For example, the letters E, L, E, P, H, A, N, T mean nothing to us
individually; only when they are combined into the noun Elephant do they
convey something. Here, the letter E is data and Elephant is information. A
collection of data organised in rows and columns is called a database.
A database management system is therefore defined as a complex set of
software programs that controls the organisation, storage and retrieval of
data in a database. In other words, a DBMS combines a collection of related
data with a set of programs to access that data. It also holds the complete
description of the database structures and constraints. (Source:
www.managefranchise.blogspot.com)
DBMS is used in various areas of computing, including business,
engineering, education, banking, law and any kind of transaction processing.
Before we look at the various definitions of DBMS, we need to know about the
earlier method used to store data and how the two methods differ.
In this unit, you will study the difference between the file system and DBMS.
You will also study the properties, components, advantages and
disadvantages of DBMS, and the types of users who work with a DBMS.
Let us go through this unit and discuss various insights of DBMS.
Objectives:
After studying this unit, you should be able to:
differentiate between traditional file system and modern database
management system
1.3 Traditional File Systems versus Modern Database Management Systems
File processing system | Database management system
It is relatively cheap. | It is relatively expensive.
It is a simple structure. | It is a complex structure.
It has no security. |
Self-Assessment Questions
4. ______________________ are the combination of files and application
programs to access those files.
5. State whether the following statements are true or false.
a) Database defines number of rows and columns.
b) DBMS is relatively cheap.
c) File system is design driven.
d) In DBMS, one extra column can be added without any difficulty.
e) DBMS has a simple structure.
1.4 Database Environment
[Figure: Database environment. Application programs/queries are passed to the
DBMS software (software to process programs/queries and software to access
stored data), which works on the stored database and its data description
(metadata).]
Employee
Attribute | Data types | Constraints (limitations)
Emp_name | Char (40) | Alphabet Only
Emp_id | Num (6) | Val>0
Emp_add | Char (100) |
Emp_desig | Char (15) |
Emp_dept | Char (10) | Alphabet Only
Emp_sal | Number (10.2) | Val>0
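To see how a DBMS might apply such data descriptions, the following is a minimal sketch using Python's built-in sqlite3 module. It is only an illustration under stated assumptions: "Alphabet Only" is read as the letters A to Z with no spaces or digits, the department value "Projects" is hypothetical, and SQLite accepts the type names but does not enforce the declared lengths.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types mirror the data dictionary above. SQLite accepts these type
# names but does not enforce the lengths, so only the CHECK rules are enforced.
conn.execute("""
    CREATE TABLE Employee (
        Emp_name  CHAR(40)       CHECK (Emp_name NOT GLOB '*[^A-Za-z]*'),
        Emp_id    NUMERIC(6)     CHECK (Emp_id > 0),
        Emp_add   CHAR(100),
        Emp_desig CHAR(15),
        Emp_dept  CHAR(10)       CHECK (Emp_dept NOT GLOB '*[^A-Za-z]*'),
        Emp_sal   NUMERIC(10, 2) CHECK (Emp_sal > 0)
    )
""")


def try_insert(row):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        with conn:  # commit on success, roll back on failure
            conn.execute("INSERT INTO Employee VALUES (?, ?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


# "Projects" is a hypothetical department used only for this illustration.
accepted = try_insert(("Prasad", 100, None, "Project leader", "Projects", 40000))
rejected_id = try_insert(("Usha", -1, None, None, "Projects", 10000))       # Val>0 fails
rejected_name = try_insert(("Usha2", 101, None, None, "Projects", 10000))   # digit in name
print(accepted, rejected_id, rejected_name)  # True False False
```

The point of the sketch is that the constraints live in the database description itself, so every program that inserts data is checked against the same rules.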
Emp_name | Emp_id | Emp_addr | Emp_desig | Emp_Sal (Rs.)
Prasad | 100 | Shubhodaya, Near Katariguppe Big Bazaar, BSK II stage, Bangalore | Project leader | 40,000
Usha | 101 | | | 10,000
Nupur | 102 | | Lecturer | 30,000
Peter | 103 | | IT executive | 15,000
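[Figure: Simple centralised system. User requests and queries from application
areas such as Purchasing, Accounts Payable, Sales, Accounts Receivable,
Inventory, Personnel and Payroll pass through the DBMS to the database, which
returns outputs and reports.]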
Class | Rank obtained
Class II | 5th

 | Type | Description
Stud_name | Character |
Class | Alpha numeric |
Emp_name | Emp_id | Emp_addr | Emp_desig | Emp_Sal (Rs.)
Prasad | 100 | | | 40,000
Usha | 101 | | Software engineer | 10,000
Nupur | 102 | | | 30,000
Peter | 103 | | IT executive | 15,000
[Figure: Components of DBMS: database engine, data dictionary, forms
generator, query processor and report writer.]
For example, to retrieve from the student database the records of students
whose marks are more than 60%, we can write the following query:
SELECT *
FROM STUDENT
WHERE Marks_obtained > 60
When this query is submitted, the database engine co-ordinates with the query
processor to process the query and produce the output. The query processor
first parses the query and checks it for errors; if it finds any, it displays
a notification. The query is then optimised by selecting the required
resources and finally compiled to get the output.
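The behaviour described above can be tried with any SQL engine. The sketch below is illustrative only: it uses Python's built-in sqlite3 module, and the STUDENT table, its columns (Stud_name, Marks_obtained) and the sample rows are assumptions made for this example, not data from this unit.

```python
import sqlite3


def run_query(conn, sql):
    """Run a query; return (rows, None) on success or (None, message) on error."""
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.Error as exc:
        # The engine rejects a query that it cannot parse or compile.
        return None, str(exc)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (Stud_name TEXT, Marks_obtained INTEGER)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?)",
    [("Asha", 72), ("Ravi", 55), ("Meena", 60)],
)

# A well-formed query is parsed, planned and executed.
rows, error = run_query(conn, "SELECT * FROM STUDENT WHERE Marks_obtained > 60")
print(rows)  # [('Asha', 72)]

# A misspelt keyword is caught while the query is being parsed.
bad_rows, bad_error = run_query(conn, "SELEC * FROM STUDENT")
print(bad_error)
```

Note that Meena, with exactly 60 marks, is not returned, because the condition asks for marks greater than 60.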
1.7.5 Report writer
Report writers are an optional component of DBMS, as access to the
database is available online. However, we may sometimes require a printed
report for documentation purposes. The format of the report writer is
product specific. Crystal Reports is an example of a popular report writer.
Self-Assessment Questions
11. ______________________ component is responsible for the security
services of the database.
12. ________________________ contains the information about the data
including the type of data and its structures.
13. Forms generator is used for generating reports. (True/False)
14. Query processor does __________________, ______________ and
_________.
Redundancy is reduced.
DBMS supports multiple views. A DBMS has many users, each of whom may
use it for a different purpose and may need to view and manipulate only a
portion of the database, depending on their requirements.
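As a small illustration of multiple views, the sketch below defines two views over one employee table using Python's built-in sqlite3 module. The view names (Staff_directory, Payroll) are hypothetical, and the two sample rows are loosely based on the employee table used in this unit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee ("
    "Emp_id INTEGER PRIMARY KEY, Emp_name TEXT, Emp_desig TEXT, Emp_sal REAL)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [
        (100, "Prasad", "Project leader", 40000),
        (101, "Usha", "Software engineer", 10000),
    ],
)

# View for a user who needs names and designations but not salaries.
conn.execute(
    "CREATE VIEW Staff_directory AS "
    "SELECT Emp_id, Emp_name, Emp_desig FROM Employee"
)
# View for a payroll user who needs only ids and salaries.
conn.execute("CREATE VIEW Payroll AS SELECT Emp_id, Emp_sal FROM Employee")

cursor = conn.execute("SELECT * FROM Staff_directory")
directory_columns = [column[0] for column in cursor.description]
payroll_rows = conn.execute("SELECT * FROM Payroll ORDER BY Emp_id").fetchall()

print(directory_columns)  # ['Emp_id', 'Emp_name', 'Emp_desig']
print(payroll_rows)       # [(100, 40000.0), (101, 10000.0)]
```

Both views read the same stored table, so the data is kept once while each user sees only the portion that is relevant to them.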
Self-Assessment Questions
18. Analytic databases are also called _____________________.
19. _________________ allows you to track real-time information.
20. ___________________ is helpful in storing media sources.
1.11 Summary
Let us recapitulate the important concepts discussed in this unit:
A database management system combines a collection of related data with
a set of programs to access that data. It also holds the complete
description of the database structures and constraints.
In the file processing systems, the information is stored as a group of
records called files. These systems are the combination of files and
application programs to access those files. These files are called flat
files.
The database defines the field names and format of data, that is,
whether the data is a textual data, binary data or character data, and so
on; structures of the records, that is, whether the record is a pointer,
fixed length or field order, and so on; structure of the files, that is,
whether the file structure is indexed, sequential, and so on.
DBMS acts as an intermediary between programs and the data.
Only after the application programs access DBMS, the DBMS accesses
the data. Application programs are independent of the file structures. So
change in file structures does not require change in the programs and
vice versa.
The most important property of a database is that it is a logical collection
of data having some implicit meaning.
The important components of DBMS are database engine, data
dictionary, forms generator, report writer and query processor.
The various users of DBMS are Database Administrator (DBA),
Database Designer (DBD), end users, system analysts and application
programmers, DBMS system designers and implementers and tool
developers.
The three major types of database systems are analytic databases,
operational databases and object-oriented databases.
The major advantage of DBMS is the reduction of redundancy, which leads
to an increase in consistency.
1.12 Glossary
Constructing: Process of storing the data on a storage medium that is
controlled by the DBMS.
Defining: Specifies the data types, structures and constraints of the data to
be stored in the database.
Manipulating: Includes requests to retrieve the specific data in the
database, updating the database and generating reports from the retrieved
data.
Sharing: Simultaneous accessing of data by multiple users.
1.14 Answers
Self-Assessment Questions
1. (a) Tables
2. Records, fields
3. Database
4. File processing systems
5. (a) False
(b) False
(c) True
(d) True
(e) False
6. Programs, data
7. Defining the database
8. Database
9. True
10. False
11. Database engine
12. Data dictionary
13. False
14. Parsing, optimising, compiling
15. Database administrator
16. Casual users, naive users
17. False
18. OLAP
19. OLTP
20. OODBMS
Terminal Questions
1. The file processing system is relatively cheap compared with a database
management system. In a file processing system, the programs and
data are interdependent, whereas in a DBMS they are independent of
each other. (Refer to Section 1.3 for further information.)
2. In the centralised database system, database is stored in a central
location. (Refer to Section 1.5 for further information.)
3. One of the important properties of a database system is that the database
is a logical collection of data having some implicit meaning. If the data
are not related, the collection is not a proper database. (Refer to Section
1.6 for further information.)
4. There are many components of DBMS. Very important are database
engine, data dictionary, report writer, forms generator and query
processor. (Refer to Section 1.7 for further information.)
5. The different types of database users are Database Administrator
(DBA), Database Designers (DBD), end users, system analysts and
application programmers, DBMS system designers and implementers
and tool developers. (Refer to Section 1.8 for further information.)
6. The most important advantage of DBMS is that it reduces redundancy,
which increases consistency. (Refer to Section 1.10 for further
information.)
References/E-References:
References:
Jain, V. K. (2006). Database Management Systems. Dreamtech Press.
Elmasri, R., & Navathe, S. B. (2009). Fundamentals of Database Systems.
Pearson Education.
Gillenson, M. L., Ponniah, P., Kriegel, A., Trukhnov, B. M., Taylor, A. G.,
Powell, G., & Miller, F. (2008). Introduction to Database Management.
Wiley India Edition.
Unit 2
Database Architecture
Structure:
2.1 Introduction
Objectives
2.2 Three-Schema Architecture
2.3 Conceptual Data Modelling
Relationships
Data independence
2.4 Database Languages and Interfaces
DBMS languages
DBMS interfaces
2.5 Summary
2.6 Glossary
2.7 Terminal Questions
2.8 Answers
2.9 Case Study
2.1 Introduction
In the previous unit, we studied the basic concepts of database
management systems, including the working of a centralised system and the
components of DBMS. We also discussed the properties and advantages of
DBMS and the different users of DBMS. In this unit, we will study in detail
the three-tier architecture of the DBMS. We will also discuss the different
models and the relationships.
Database modelling is based on the three-tier architecture. As mentioned on
the IJCA website, a database model is a theory or specification describing
how a database is structured and used. The different models are the
hierarchical model, network model, relational model, and so on.
In this unit, you will also study database languages and interfaces. We
need database languages to describe data, data structures and the
operations on them. There are different database languages such as data
definition language, data manipulation language, data control language, and
so on. A very good example of a database language is Structured Query
Language (SQL)
which is discussed in detail in Unit 7. In this unit we will also discuss the
components of DBMS.
Objectives:
After studying this unit, you should be able to:
Data Models: You can define a data model as a set of concepts for
viewing a set of data in a structured way. Data models make the database
system easier to understand for both professionals and non-technical
users. A data model can explain the way in which the organisation uses
and manages information. In data models, the concepts of entity,
attribute and relationship are very important. An entity is something that
has a distinct, separate existence, though it need not be a material
existence. For example, student is an entity. An attribute is a
characteristic or property that describes an entity, such as weight, size
or colour. A relationship describes the association between two or more
entities.
not have detailed information about the attributes and schema. Conceptual
data modelling contains only the important entities and their relationships.
This kind of modelling is used in the initial modelling phase.
In order to create conceptual data model, the data is gathered from various
sources such as business documents, business analysts, group discussion
with the functional teams, database reports and end users.
The representation of a conceptual data modelling is explained with an
example of a student database which is shown in Figure 2.2.
[Figure 2.2: Conceptual data model of a student database with the entities
Class, Subject, Student and Teachers.]
This is the first step in data modelling, and it gives a clear picture of how
the business of the organisation is represented.
2.3.1 Relationships
Relationships are the type of connectivity between two or more entities. For
example, if student and class are two entities, then "student belongs to
class" is the relationship between them. Here, "belongs to" is the
relationship.
There are many types of relationships. In this unit we are going to cover the
following three types of relationships:
One-to-one relationships
One-to-many relationships
Many-to-many relationships
Table 2.1
CUSTOMER
customer id | customer name | address id
201 | Shubha B.G | 301
202 | Arpita Mandal | 302
203 | Priya Mehta | 303
Table 2.2
ADDRESS
Address id | Address
301 |
302 |
303 |
Now we have two tables: CUSTOMER and ADDRESS. If each record in the
address table belongs to exactly one record in the customer table, the
relationship is called a one-to-one relationship. Notice that this requires an
extra field in the customer table called address_id. This field is called a
foreign key. A foreign key always refers to the primary key of another table.
We can show the mapping of the above tables in one-to-one relationships
as shown in Figure 2.3.
[Figure 2.3: One-to-one mapping of customer ids 201, 202 and 203 to address
ids 301, 302 and 303 respectively.]
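The CUSTOMER and ADDRESS tables can be sketched in code as follows. This is only an illustration using Python's built-in sqlite3 module: the address texts are placeholders (the actual addresses are not reproduced here), customers 204 and 205 are hypothetical, and the UNIQUE rule on address_id is one possible way to keep the relationship one-to-one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE ADDRESS (address_id INTEGER PRIMARY KEY, address TEXT)")
conn.execute("""
    CREATE TABLE CUSTOMER (
        customer_id   INTEGER PRIMARY KEY,
        customer_name TEXT,
        -- UNIQUE stops two customers from sharing one address (one-to-one)
        address_id    INTEGER UNIQUE REFERENCES ADDRESS(address_id)
    )
""")
conn.executemany(
    "INSERT INTO ADDRESS VALUES (?, ?)",
    [(301, "placeholder address A"), (302, "placeholder address B"),
     (303, "placeholder address C")],
)
conn.executemany(
    "INSERT INTO CUSTOMER VALUES (?, ?, ?)",
    [(201, "Shubha B.G", 301), (202, "Arpita Mandal", 302), (203, "Priya Mehta", 303)],
)

# Follow the foreign key from each customer to its address.
pairs = conn.execute("""
    SELECT c.customer_id, a.address_id
    FROM CUSTOMER c JOIN ADDRESS a ON a.address_id = c.address_id
    ORDER BY c.customer_id
""").fetchall()
print(pairs)  # [(201, 301), (202, 302), (203, 303)]


def insert_is_rejected(row):
    """Return True if the DBMS refuses to insert the customer row."""
    try:
        conn.execute("INSERT INTO CUSTOMER VALUES (?, ?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True


shares_address = insert_is_rejected((204, "Hypothetical customer", 301))   # breaks UNIQUE
missing_address = insert_is_rejected((205, "Hypothetical customer", 999))  # breaks foreign key
print(shares_address, missing_address)  # True True
```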
Table 2.3
ORDER
Order_id | Customer_id | Number of items | Date of order | Amount (Rs.)
701 | 201 | | 3/06/2012 | 15,000
702 | 203 | | 4/06/2012 | 30,000
703 | 201 | | 4/06/2012 | 35,000
A customer may have no orders or may have one or many orders. However,
every order belongs to only one customer.
We can show the mapping of the above tables in one-to-many relationships
as shown in Figure 2.4.
[Figure 2.4: One-to-many mapping between customer ids (201, 202, 203) and
order ids (701, 702, 703).]
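A one-to-many relationship can be sketched in the same way. The example below is illustrative and uses Python's built-in sqlite3 module; the table is named ORDERS because ORDER is a reserved word in SQL, and the column names order_id and amount are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE CUSTOMER (customer_id INTEGER PRIMARY KEY, customer_name TEXT)")
# ORDER is a reserved word in SQL, so the table is named ORDERS here.
conn.execute("""
    CREATE TABLE ORDERS (
        order_id    INTEGER PRIMARY KEY,
        -- every order must belong to exactly one existing customer
        customer_id INTEGER NOT NULL REFERENCES CUSTOMER(customer_id),
        amount      INTEGER
    )
""")
conn.executemany(
    "INSERT INTO CUSTOMER VALUES (?, ?)",
    [(201, "Shubha B.G"), (202, "Arpita Mandal"), (203, "Priya Mehta")],
)
conn.executemany(
    "INSERT INTO ORDERS VALUES (?, ?, ?)",
    [(701, 201, 15000), (702, 203, 30000), (703, 201, 35000)],
)

# LEFT JOIN keeps customers that have no orders at all.
orders_per_customer = conn.execute("""
    SELECT c.customer_id, COUNT(o.order_id)
    FROM CUSTOMER c LEFT JOIN ORDERS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
""").fetchall()
print(orders_per_customer)  # [(201, 2), (202, 0), (203, 1)]
```

Customer 201 has two orders, customer 202 has none and customer 203 has one, while each order row carries exactly one customer id.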
Considering the order table in Table 2.3 and another table called the
items table, shown in Table 2.4, we need to create an additional table
called the items_order table, as shown in Table 2.5.
Table 2.4
ITEMS
Item_id | Item_name
300 | Transcend: 16 GB pendrive
301 |
302 |
303 |
Table 2.5
ITEMS_ORDER
Order_id | Item_id
701 | 300
702 | 300
702 | 301
703 | 302
703 | 303
[Figure: Many-to-many mapping between item ids (300, 301, 302, 303) and
order ids (701, 702, 703).]
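The role of the additional items_order table can be sketched as follows, again using Python's built-in sqlite3 module. The rows follow Table 2.5; the item names that are not given are left empty, and the helper functions items_in_order and orders_with_item are hypothetical names introduced for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE ORDERS (order_id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE ITEMS (item_id INTEGER PRIMARY KEY, item_name TEXT)")
# The linking table holds one row per (order, item) pair.
conn.execute("""
    CREATE TABLE ITEMS_ORDER (
        order_id INTEGER REFERENCES ORDERS(order_id),
        item_id  INTEGER REFERENCES ITEMS(item_id),
        PRIMARY KEY (order_id, item_id)
    )
""")
conn.executemany("INSERT INTO ORDERS VALUES (?)", [(701,), (702,), (703,)])
conn.executemany(
    "INSERT INTO ITEMS VALUES (?, ?)",
    [(300, "Transcend: 16 GB pendrive"), (301, None), (302, None), (303, None)],
)
conn.executemany(
    "INSERT INTO ITEMS_ORDER VALUES (?, ?)",
    [(701, 300), (702, 300), (702, 301), (703, 302), (703, 303)],
)


def items_in_order(order_id):
    """One order can contain many items."""
    rows = conn.execute(
        "SELECT item_id FROM ITEMS_ORDER WHERE order_id = ? ORDER BY item_id",
        (order_id,),
    )
    return [row[0] for row in rows]


def orders_with_item(item_id):
    """One item can appear in many orders."""
    rows = conn.execute(
        "SELECT order_id FROM ITEMS_ORDER WHERE item_id = ? ORDER BY order_id",
        (item_id,),
    )
    return [row[0] for row in rows]


print(items_in_order(702))    # [300, 301]
print(orders_with_item(300))  # [701, 702]
```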
Currently, the most popular DML is that of SQL, which is used to retrieve
and manipulate data in a relational database.
DMLs were initially used only by computer programs, but have come to
be used by non-programmers as well (with the advent of SQL).
ANSI has established a standard for SQL, but vendors still go beyond
the standard and provide their own extensions.
High-level DMLs, such as SQL, can specify and retrieve many records in
a single DML statement, and are called set-at-a-time or set-oriented
DMLs.
Low-level/procedural
In a low-level language, the user specifies what data is needed and how
to get it.
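The difference between the two styles can be sketched as follows. This is an illustration only, using Python's built-in sqlite3 module: the salary figures follow the employee example in Unit 1, the pay rise of Rs. 1,000 is hypothetical, and the record-at-a-time version is emulated with a loop in the host language, since SQL itself is set oriented.

```python
import sqlite3


def make_db():
    """Create a small Employee table with the same starting data each time."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (Emp_id INTEGER PRIMARY KEY, Emp_sal INTEGER)")
    conn.executemany(
        "INSERT INTO Employee VALUES (?, ?)",
        [(100, 40000), (101, 10000), (102, 30000), (103, 15000)],
    )
    return conn


def salaries(conn):
    return conn.execute("SELECT Emp_id, Emp_sal FROM Employee ORDER BY Emp_id").fetchall()


# Set-at-a-time: one statement says WHAT is wanted; the DBMS decides how.
high = make_db()
high.execute("UPDATE Employee SET Emp_sal = Emp_sal + 1000 WHERE Emp_sal < 20000")

# Record-at-a-time: the program says HOW, visiting each record in turn.
low = make_db()
for emp_id, sal in low.execute("SELECT Emp_id, Emp_sal FROM Employee").fetchall():
    if sal < 20000:
        low.execute("UPDATE Employee SET Emp_sal = ? WHERE Emp_id = ?", (sal + 1000, emp_id))

print(salaries(high))  # [(100, 40000), (101, 11000), (102, 30000), (103, 16000)]
print(salaries(high) == salaries(low))  # True
```

Both versions leave the table in the same state; the high-level version simply hands the record-by-record work to the DBMS.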
Forms-based interfaces
A user can fill out all the form entries to insert new data, or fill out
only certain entries to retrieve the matching data.
A natural language interface has its own schema and a dictionary of
important words. It uses the schema and dictionary to interpret a natural
language request.
Self-Assessment Questions
9. ___________________ and _____________________ are the two
languages that are used for definition and manipulation of database.
10. What are the two types of DML?
2.5 Summary
Let us recapitulate the important concepts discussed in this unit:
Database definition and manipulation are carried out using two DBMS
languages, namely, Data Definition Language (DDL) and Data Manipulation
Language (DML).
2.6 Glossary
ANSI: The American National Standards Institute (ANSI) is a private
non-profit organisation that oversees the development of voluntary consensus
standards for products, services, processes, systems and personnel in the
United States. The organisation also co-ordinates U.S. standards with
international standards so that American products can be used worldwide.
For example, standards ensure that people who own cameras can find the
film they need for that camera anywhere around the globe.
Client: A client is an application or system that accesses a service made
available by a server.
CODASYL: (Conference on Data Systems Languages) CODASYL is
remembered almost entirely for two activities: its work on the development
of the COBOL language and its activities in standardising database
interfaces. It also worked on a wide range of other topics, including end-user
form interfaces and operating-system control languages.
2.8 Answers
Self-Assessment Questions
1. Three-tier
2. Data models
3. True
4. Instance
5. (a) False
(b) False
6. Relationships
7. One-to-one
8. Physical and logical
9. Data Definition Language and Data Manipulation Language
10. High-level/non-procedural and low-level/procedural
Terminal Questions
1. Entity is something that has a distinct, separate existence, though it
need not be of a material existence. You can define schema as a
description of the database. Instance is the collection of data stored in
the database at a particular moment. A database instance is also called
as database state or snapshot. (Refer to Section 2.2 for further
information.)
2. The three-schema architecture has three levels: an internal level, a
conceptual level and an external level. The three-schema architecture is
also referred to as client-server architecture. The division into three
levels is an advantage of this architecture, as it allows both developers
and users to work on their own levels. They do not need to know the
details of the other levels, and they do not have to know anything about
changes in the other levels. Note that each of these schemas is only a
description of data, and the actual data exists only at the physical
level. (Refer to Section 2.2 for further information.)
3. In one-to-one relationships, an entity of one database is uniquely related
to an entity of another database. In one-to-many and many-to-one
relationships, one entity of a database may be related to one or more
Applications:
The Korean security team adopted the client-server project in order to
remove the inefficiency and rigidity caused by their third-party mainframe
ledger management and to build a customer-focused information system.
Benefits:
Importance:
The Korean team implemented a distributed computing environment for
transaction processing through the client-server architecture and
replication technology before competing companies could do so. They could
respond to customer requirements within a specific time. The replication
technology they adopted was the core of the new system. It enabled them to
process a multitude of complicated transactions within a specific time,
giving them a rapid response time and stability. According to them, they
were the first to introduce the client-server architecture and a distributed
computing environment in the securities market. Before they adopted the
replication technology, the industry used it only as a backup function, but
they used it as one of the functions of distributed computing. Since then,
many companies have adopted it as a core technology to build their
client-server architectures.
Success:
They had exceeded their goals and targets. Before they built this system,
they had as a target 2 million accounts, 700 million transactions per day,
1 million orders processed per day and 1.8 million order inquiries per day.
They now have 1.8 million accounts and process 850 million transactions
per day and 1.8 million orders and inquiries per day. The client-server
technology and replication function now operate across the entire
operational business. They plan to extend their system to Customer
Relationship Management (CRM) based on the Internet environment.
According to the Korean team, before they built the client-server system,
they ranked fifth in the securities market, but now, on the strength of the
new client-server system, they are the leading company in Korea. Their
mission is to become a worldwide investment company.
Discussion Questions:
1. Why did the Korean team adopt a client-server environment over IBM
mainframes for maintaining their ledger systems?
2. What are the success factors of the client-server environment?
References/E-References:
References:
Gillenson, M. L., Ponniah, P., Kriegel, A., Trukhnov, B. M., Taylor, A. G.,
Powell, G., & Miller, F. (2008). Introduction to Database Management.
Wiley India Edition.
E-References:
http://www.1keydata.com/datawarehousing/conceptual-data-model.html
(Retrieved on 4th June 2012)
http://www.cwhonors.org/laureates/finance/20055379.pdf (Retrieved on
18th June 2012)
http://www.cs.sfu.ca/CourseCentral/354/zaiane/material/notes/Chapter1/
node17.html (Retrieved on 14th May 2014)
Unit 3
Record Storage and File Structure Organisation
Structure:
3.1 Introduction
3.2 Memory Hierarchy
3.3 Secondary Storage Devices
Hard disk drive
DVD drive
Blu-ray disk drive
3.4 Buffering of Blocks
3.5 Placing File Records on Disk
3.6 Operations on Files
Files of unordered records (heap files)
Files of ordered records (sorted files)
Hashing techniques
3.7 Summary
3.8 Glossary
3.9 Terminal Questions
3.10 Answers
3.11 Case Study
3.1 Introduction
In Units 1 and 2, you studied the definition of database management
systems and the core concepts of databases. You also studied database
architecture in Unit 2. The data in a database is stored on a storage
medium, and the DBMS can retrieve, update and process this data as
needed. Computer storage media fall into two main categories: primary
storage and secondary storage. Primary storage is volatile memory,
whereas secondary storage is permanent, non-volatile memory.
The following are the reasons for storing databases on secondary storage:
Databases are too large to fit entirely in main memory.
Secondary storage devices are non-volatile storage, whereas main
memory is often called volatile storage.
The cost of storage per unit of data is less for disk than for primary
storage.
In this unit, you will study the definition of storage media and how
data is stored in the database. You will also learn the difference
between conventional file systems and database management systems.
Objectives:
After studying this unit, you should be able to:
describe memory hierarchy
list and explain secondary storage devices
explain buffering of blocks and placing file records on disk
elaborate operation on files
differentiate the files of unordered records (heap files) and ordered
records
Level 2 cache: Its latency is higher than that of level 1 by 2 to 10 times,
and its size is 512 KiB or more (KiB = kibibyte, also referred to as kilo
binary byte; 1 kibibyte = 2^10 bytes = 1,024 bytes, which is close to a
kilobyte).
move along the radius of the platter. Therefore, it allows the heads to read
all parts of the surface.
Each platter surface is divided so that every division represents a
specific location. The divisions form a set of concentric circles on which
the data is recorded. Each concentric circle on a platter is called a track,
and the tracks are further divided into sections called sectors. When the
head of one surface is on a track, the heads of the other surfaces are on
the corresponding tracks. All the tracks at the same position on the
different surfaces together are called a cylinder. Sometimes track and
cylinder are used interchangeably. You can see a typical assembly of
platter and its data organisation in Figure 3.2.
Latency and seek: Latency is the time delay between the moment a
read/write command is initiated over the physical interface of the drive
and the moment the desired information is located. Latency also refers to
the time taken for the needed byte to pass under a read/write head. If the
desired location has not quite reached the read/write head, the latency is
short. If the head has just missed the desired location, it must wait for
almost one full rotation, so latency can be very long. Seek time is another
delay that affects hard drive performance: it is the time taken to step the
read/write head between tracks. Seek time is listed in a number of ways,
such as track-to-track seek, full-stroke seek and average seek.
Full-stroke seek is the time required to step from the innermost to the
outermost track. This time is relatively long.
The average seek time is half the full-stroke seek time. Seek and
latency together determine the time needed to load and save files. For
example, while loading a file a certain amount of seek time is taken to
locate
the track which contains the start of the file. There is then some latency while the platter rotates to bring the required sector under the head.
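The relationship between rotation speed, latency and seek time can be sketched in a few lines of Python. This is only an illustration: the 7200 rpm speed and the 9 ms seek time are assumed sample values, and the function names are hypothetical. The sketch uses the standard approximation that the average rotational latency is half a revolution.

```python
def rotational_latency_ms(rpm: float) -> tuple[float, float]:
    """Return (average, worst-case) rotational latency in milliseconds."""
    full_rotation_ms = 60_000.0 / rpm  # time for one revolution
    # On average the sector is half a turn away; at worst a full turn.
    return full_rotation_ms / 2, full_rotation_ms


def access_time_ms(seek_ms: float, rpm: float) -> float:
    """Approximate access time as seek time plus average latency."""
    average_latency, _ = rotational_latency_ms(rpm)
    return seek_ms + average_latency


average, worst = rotational_latency_ms(7200)  # assumed spindle speed
print(f"average latency: {average:.2f} ms, worst case: {worst:.2f} ms")
print(f"access time with a 9 ms seek: {access_time_ms(9.0, 7200):.2f} ms")
```

The sketch ignores data transfer time and controller overhead, which a real drive would add on top of seek time and latency.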
The major parts of the hard disk are the frame, platters, read/write heads,
head actuators, spindle motors and drive electronics.
Head actuators - Hard drives use voice coil motors, also called rotary coil motors, to actuate head movement. Voice coil motors work on the principle of analog meter movements: a permanent magnet is enclosed within two opposing coils. When current flows in the coils, it produces a magnetic field which opposes the permanent magnet. The head arms are attached to the rotating magnet, so the force of opposition causes a deflection that is directly proportional to the driving current. Increasing the current signal gives greater opposition and greater deflection. A cylinder is selected by increasing the servo signal and holding it at the desired level. Voice coil motors are very small and light assemblies that are well suited to fast access times and small hard drive assemblies. The process of track following is called servoing the heads.
Spindle motors - The speed at which the media passes under the read/write heads is one of the major factors responsible for a drive's performance. The media is passed under the read/write heads by spinning the platters at high speed. The spindle motor is a brushless, low-profile DC (Direct Current) motor which is responsible for spinning
the platter. An index sensor detects the spindle as it rotates and provides feedback pulse signals. The control electronics of the drive use these index signals to regulate spindle speed as precisely as possible.
single disk drive fails. However, this is expensive since it requires a larger number of disks for the same storage capacity.
3. RAID 0 + 1 - This level is the combination of RAID 0 and RAID 1. In RAID 0 + 1, data is both striped and mirrored across the disk drives. Hence, it combines the large disk space and performance of striping with the protection of mirroring.
4. RAID 5 - It uses the parity check method for fault tolerance. This avoids doubling the number of disk drives: only the capacity of one additional drive is needed to store parity.
Table 3.1 below compares the different levels of RAID.
RAID level     Read performance   Write performance                          Fault tolerance   Cost
RAID 0         Good               Good                                       None              Low
RAID 1 & 0+1   Good               Ok (1 logical write = 2 physical I/Os)     Excellent         Highest
RAID 5         Good               Poor (1 logical write = 4 physical I/Os)   Ok                Best for fault tolerance
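The parity check used by RAID 5 can be illustrated with a bitwise XOR. The following is a minimal sketch, not a description of any particular RAID controller: the three sample data blocks and the function name xor_blocks are assumptions made for the example, and real RAID 5 arrays spread parity blocks across all the drives rather than keeping them in one place.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, one per drive (sample contents).
data = [b"DATA", b"BASE", b"DISK"]
parity = xor_blocks(data)  # stored on the extra drive capacity

# Simulate the failure of the second drive and rebuild its block
# from the surviving data blocks and the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

Because XOR is its own inverse, any single lost block can be recomputed from the remaining blocks, which is why one drive's worth of parity is enough to survive one drive failure.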
Data transfer rates - The data transfer rate is the speed at which data is read from the disk. Once data has been accessed on the disk, it has to be transferred from the disk to the system. There are two ways of measuring data rates:
o Speed at which the data is read into the onboard buffer of the drive.
o Speed at which the data is transferred across the interface to the drive controller.
The characteristics of the hard disk drive are as follows:
Storage capacity
The hard disk drive capacity is measured in bytes. Modern drive capacities range from gigabytes to terabytes or more. The capacity depends on the number of platters, or disks, that are installed in the drive and on the density of the magnetic storage of those platters.
Access speed
The hard disk drive is an electro-mechanical device. The data is read by a head which is positioned over the surface of the disk. Access speed is determined by the speed of the head movement and by how quickly the platter rotates under the head.
Form factor
Compared with earlier hard drives, modern hard drives are compact and come in three physical formats: 3.5-inch, 2.5-inch and 1.8-inch. The smaller physical size limits the number of platters and the diameter of those platters. For example, a 1.8-inch drive has a maximum capacity of 320 gigabytes.
Interface
The electronic connection between the hard drive and the processor has changed several times. Each change has improved the transfer speed of the data and the ease with which the motherboard handles the hard drive. The current standard interface is SATA, that is, Serial Advanced Technology Attachment.
7. What is the time delay that exists between the moment that a read/write command is initiated over the physical interface of the drive and the moment where the desired information is located?
8. Which of the following is the time required to step from innermost to the
outermost tracks?
a) Track-to-track
b) Full stroke
c) Seek
d) Latency
9. RAID stands for ____________________________________.
10. BD stands for ______________________________.
Records having variable length fields - The file records are of the same record type, but one or more of the fields are of varying size. For example, the NAME field of EMPLOYEE can be a variable length field.
Records having repeating fields - The file records are of the same record type, but one or more of the fields may have multiple values for individual records. The group of values for such a field is called a repeating group. For example, for a field that holds the authors of a publication, the record length varies depending on the number of authors.
Records having optional fields - The file records are of the same record type, but one or more of the fields are optional. That is, some of the fields will not have values in all the records. For example, if 10 of the 25 fields in a record are optional, reserving space for every field in every record would waste memory. So only the values that are present in each record are stored.
To utilise this unused space, we can store part of the record on one block
and the rest on another block. A pointer at the end of the first block points to
the block containing the remainder of the record. This organisation is called
spanned, because records can span more than one block. Whenever a record is larger than a block, we must use a spanned organisation, as shown in Figure 3.4. If records are not allowed to cross block boundaries, the organisation is called unspanned, also shown in Figure 3.4. This is used with fixed length records having B > R, where B is the block size and R is the record size.
We can use the blocking factor bfr (the number of records that fit in one block) to calculate the number of blocks b needed for a file of r records:
$b = \lceil r / bfr \rceil$ blocks.
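The block calculation can be worked through with a small Python sketch for the unspanned, fixed length case. The block size of 512 bytes, the record size of 100 bytes and the file of 1003 records are assumed sample figures, and the function names are hypothetical.

```python
import math


def blocking_factor(block_size: int, record_size: int) -> int:
    """Whole records that fit in one block (unspanned organisation)."""
    return block_size // record_size


def blocks_needed(num_records: int, bfr: int) -> int:
    """Number of blocks b for r records: r / bfr rounded up."""
    return math.ceil(num_records / bfr)


B, R, r = 512, 100, 1003          # assumed sample values
bfr = blocking_factor(B, R)
unused_per_block = B - bfr * R    # space an unspanned block cannot use
b = blocks_needed(r, bfr)
print(f"bfr = {bfr}, unused bytes per block = {unused_per_block}, b = {b}")
```

The unused bytes per block are the space that a spanned organisation would recover by letting a record continue into the next block.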
To search for a record on disk, one or more blocks are copied into main
memory buffers. Programs then search for the desired record utilising the
information in the file header. If the address of the block that contains the
desired record is not known, the search programs must do a linear search
through the file blocks. Each file block is copied into a buffer and searched
until the record is located. This can be very time consuming for a large file.
Self-Assessment Questions
13. Data is stored in the form of __________________.
14. _______________ and _____________ are the types of record types.
Find (or locate) - Searches for the first record satisfying a search condition.
Read (or get) - Copies the current record from the buffer to a program variable.
There are two different types of records that you can store in the files. They
are unordered records and ordered records. The files of unordered records
are called heap files and the files of ordered records are called sorted files.
3.6.1 Files of unordered records (heap files)
In the simplest and most basic type of organisation, records are placed in
the file in the order in which they are inserted, and new records are inserted
at the end of the file. Such an organisation is called a heap or pile file.
Inserting a new record is very efficient: the last disk block of the file is copied into a buffer, the new record is added and the block is then rewritten back to the disk. However, searching for a record using linear search is an expensive procedure.
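The contrast between cheap insertion and expensive search in a heap file can be sketched as follows. This is a simplified in-memory model, not a real storage engine: the class name HeapFile, the blocking factor of 3 and the sample records are assumptions, and each inner list stands in for one disk block.

```python
class HeapFile:
    """Toy heap file: records are appended to the last block."""

    def __init__(self, bfr: int):
        self.bfr = bfr        # records per block
        self.blocks = []      # each inner list models one disk block

    def insert(self, record: dict) -> None:
        # Only the last block is touched; a new block is started when it is full.
        if not self.blocks or len(self.blocks[-1]) == self.bfr:
            self.blocks.append([])
        self.blocks[-1].append(record)

    def find(self, predicate):
        """Linear search: return (record, number of blocks read)."""
        for blocks_read, block in enumerate(self.blocks, start=1):
            for record in block:
                if predicate(record):
                    return record, blocks_read
        return None, len(self.blocks)


heap = HeapFile(bfr=3)
for emp_id in range(1, 8):
    heap.insert({"id": emp_id})

record, blocks_read = heap.find(lambda rec: rec["id"] == 7)
print(record, "found after reading", blocks_read, "block(s)")
```

An insertion touches one block regardless of file size, whereas the search above may have to read every block of the file.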
hashing technique or direct file organisation, the key value is converted into
an address by performing some arithmetic manipulation on the key value,
which provides very fast access to records.
Key value -> Hash function -> Address
Let us consider a hash function h that maps the key value k to the value h(k). The value h(k) is used as an address.
The basic terms associated with the hashing techniques are:
1. Hash table - It is simply an array that holds the addresses of records.
2. Hash function - It is the transformation of a key into the corresponding location or address in the hash table (that is, a function that takes a key as input and transforms it into a hash table index).
3. Hash key - Let R be a record; the value that the key of R hashes to is called the hash key.
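The three terms can be seen together in a short Python sketch. It assumes a simple division (modulo) hash function and a table of 7 slots, neither of which is prescribed by the text; keeping a small list in each slot to handle keys that hash to the same index is likewise an assumption made for the example.

```python
TABLE_SIZE = 7  # assumed number of slots in the hash table


def h(k: int) -> int:
    """Hash function: transform key k into a hash table index."""
    return k % TABLE_SIZE


# Hash table: an array of slots. Each slot keeps a list so that two
# keys with the same hash key can coexist (an assumption of this sketch).
hash_table = [[] for _ in range(TABLE_SIZE)]


def insert(key: int, record: str) -> None:
    hash_table[h(key)].append((key, record))


def lookup(key: int):
    # Only one slot is examined instead of the whole file.
    for stored_key, record in hash_table[h(key)]:
        if stored_key == key:
            return record
    return None


for key, name in [(201, "Jagdish Tiwari"), (302, "Annapoorna"), (105, "Sathya Shukla")]:
    insert(key, name)

print(h(302), lookup(302))
```

The lookup goes straight to the slot given by h(k), which is what makes hashed access fast compared with a linear search.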
Self-Assessment Questions
15. State whether the following statements are true or false:
a) Inserting a new record is inefficient.
b) Hash table is an array having address of records.
3.7 Summary
Let us recapitulate the important concepts discussed in this unit:
3.8 Glossary
Analog: Analog describes a device or system that represents changing
values as continuously variable physical quantities. A typical analog device
is a clock in which the hands move continuously around the face.
Buffer: A buffer is an 8-KB page in memory, the same size as a data or
index page.
Cache: Cache is a collection of data duplicating original values stored
elsewhere on a computer. It is a part of primary memory.
Coercivity: It is the strength of the magnetic field required to reduce the magnetisation of a ferromagnetic material to zero after the material has been magnetised.
Index: A database index is a data structure that improves the speed of data
retrieval operations on a database table at the cost of slower writes and
increased storage space. Indexes can be created using one or more
columns of a database table, providing the basis for both rapid random
lookups and efficient access of ordered records.
Non-volatile: Non-volatile storage is the memory that can retain the stored
information even when not powered.
Platter: A platter is a round magnetic plate that constitutes part of a hard
disk. Hard disks typically contain up to a dozen platters.
Record: A record is a collection of data items arranged for processing by a
program.
Volatile: Volatile memory retains the information as long as power supply is
on, but when power supply is off or interrupted the stored memory is lost.
3.10 Answers
Self-Assessment Questions
1. Main
2. Kibibyte
3. 210
4. Answers:
a) False
b) True
c) False
5. Primary storage and secondary storage
6. Track
7. Latency
8. b
9. redundant array of inexpensive disks
10. Blu-ray Disc
11. I/O processor
12. True
13. Records
14. Fixed length and variable length
15. Answers:
a) False
b) True
Terminal Questions
1. A hard drive consists of magnetic read/write heads that read data from the rotating discs. It consists of different parts which serve the different functions of the hard disc. A hard disc consists of one or more rigid, solid substrates called platters. Platters are made of aluminium as it is a light material. They are circular in shape, and magnetic substances are coated on both sides of the platters for reading and writing the data. (Refer to Section 3.3.1 for further information.)
2. Fixed length: All records in a file are of the same record type. If every
record in the file has exactly the same size (in bytes), the file is said to
be made up of fixed length records. Variable length records: If
different records in the file have different sizes, the file is said to be
made up of variable length records. The variable length field is a field
Unit 4
Database Design
Structure:
4.1 Introduction
Objectives
4.2 Relational Data Model
4.3 Relational Algebra
4.4 Data Dictionary
4.5 Normalisation
4.6 Summary
4.7 Glossary
4.8 Terminal Questions
4.9 Answers
4.10 Case Study
4.1 Introduction
In Unit 3 you studied the basic concepts of secondary storage devices. This unit will enable you to understand how to design a database and the different types of design models.
The relational model was first introduced in 1970 by Ted Codd, who was working at IBM Research. The model is built on the concept of a mathematical relation, which resembles a table, and it is based on set theory and first-order predicate logic. In this unit, we will discuss the basic characteristics of the model and its constraints. Earlier data models, such as the hierarchical and network models, are referred to as legacy database systems.
In this unit, we will describe the basic principles of the relational model of data. We will start by defining the concepts and notations of the relational model. We will discuss the relational constraints, which are an important part of the model. We will also define the update operations of the relational model and discuss how to handle violations of integrity constraints. We will study the meaning of relational algebra, which is dealt with again in detail in Unit 6.
Note that this unit introduces concepts that are explained in subsequent units. Therefore, you should aim for a good understanding of these concepts.
Objectives:
After studying this unit, you should be able to:
define normalisation
10 alphanumeric characters
Name
characters
Addr
Alphanumeric characters
Phone
7 digits
D birth
Date
specified type, namely, STUDENT, CLASS, MARKS and SUB, implying that
certain students of the class obtained certain marks in a certain subject.
Entity set
The set of tuples in a relation is called an entity set.
Database schema or relational schema:
1. A relation schema, denoted by R[A1, A2, A3, ..., An], is made up of a relation name R and a list of attributes A1, A2, A3, ..., An.
2. A database instance is the data in the database at a particular point in time.
3. The domain D of an attribute Ai is denoted by dom(Ai).
4. A relational schema is a list of attributes and their corresponding domains [sets of values].
5. To represent incomplete tuples, we must use NULL values; for example, Apartment number.
6. Candidate keys are all the attributes, or minimal sets of attributes, whose values uniquely identify a tuple; those other than the primary key are called alternate keys.
7. The primary key is the candidate key chosen to identify tuples; its values are unique and NOT NULL.
8. If we denote the cardinality of a domain D by |D|, and assume that all domains are finite, the total number of tuples in the Cartesian product is:
$|dom(A_1)| \times |dom(A_2)| \times \cdots \times |dom(A_n)|$
9. The current relation state reflects only the valid tuples that represent a particular state of the real world.
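The product formula in item 8 can be checked with a short Python sketch. The three attributes and their domains below are made-up sample values chosen only to keep the numbers small.

```python
from itertools import product
from math import prod

# Hypothetical finite domains for three attributes.
domains = {
    "Class": ["1 MCA", "2 MCA", "3 MCA"],
    "Subject_code": ["F1", "HR1", "M1", "IS1"],
    "Result": ["Pass", "Fail"],
}

# |dom(A1)| * |dom(A2)| * ... * |dom(An)|
expected = prod(len(values) for values in domains.values())

# Enumerate every possible tuple of the Cartesian product.
all_tuples = list(product(*domains.values()))

print(expected, len(all_tuples))
```

A relation state is a subset of these possible tuples: only the combinations that are valid in the real world are actually stored.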
Relational model notation
Std_ID   Std_name         Class   Address                                                                               Phone_No.
201      Jagdish Tiwari   2 MCA                                                                                         9112345599
302      Annapoorna       3 MCA   No.2, Sriram Nagar, Near Mahalaxmi Nagar Main Road, Guduvanchery, Chennai 603201      26789580
105      Sathya Shukla    1 MCA                                                                                         07312395007
303      Sathya Shukla    3 MCA                                                                                         934567889
Std_ID   Std_name          Class   Subject_code   Marks obtained (%)
101      Ranjith Jha       1 MBA   F1             78
203      Meghna Sinha      2 MBA   HR1            89
105      Mekhala Sha       1 MBA   F1             98
303      Samiksha Shukla   3 MBA   M1             67
109      Vinay Singh       2 MBA   M1             95
Subject name
Professor-in-charge
Finance
Prof. N. Rao
Human Resource
Prof. Ravi
M1
Marketing Management
Prof. A. Agrawal
IS1
Information System
Prof. R. Gowda
HR1
Subject_code is the primary key for the relation PROFESSOR and this key
links the relation with relation STUDENT. Therefore, the primary key of
relation PROFESSOR will become the foreign key in relation STUDENT.
Therefore, the foreign key is defined as the key attribute that is used to link
the two relations; also, remember that the foreign key of one relation will
always be the primary key of the linked relation.
Self-Assessment Questions
1. Relation consists of _________________ and _________________.
2. A row is also called as _________________.
3. _________________ is a set of atomic values.
4. Tuple is the number of attributes of its relation schema. (True/False)
Selection
Projection
Cartesian product
Union
No duplication
Join operators
Intersection
Union - To take the union of two relations, they must be union-compatible, that is, they must have the same number of attributes with the same data types. It is represented by R ∪ S and read as R union S.
Join operators - A join operator combines two relations. The different types of join operators are:
o Outer join - A join in which each matching record from two tables is combined into one record in the query's results, and at least one table contributes all of its records, even if the values in the joined field don't match those in the other table.
Out of the above operators, select, project, union, set difference and
Cartesian product are considered as basic operators and set intersection,
division, join are called derived operators. You will study relational algebra
in more detail in Unit 6.
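Some of the operators described above can be sketched in Python by treating a relation as a list of dictionaries. This is only an illustration of the ideas, not how a DBMS implements them: the function names are hypothetical, the sample rows are taken from the student and subject tables of this unit, and the column name Std_ID is an assumption.

```python
def select(relation, predicate):
    """Selection: keep the rows (tuples) that satisfy a condition."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep the chosen columns and drop duplicate rows."""
    result = []
    for row in relation:
        projected = {a: row[a] for a in attributes}
        if projected not in result:
            result.append(projected)
    return result


def union(r, s):
    """Union of two union-compatible relations, without duplicates."""
    result = []
    for row in r + s:
        if row not in result:
            result.append(row)
    return result


def left_outer_join(r, s, attribute):
    """Outer join in which r contributes all of its rows."""
    s_only = [a for a in (s[0] if s else {}) if a != attribute]
    result = []
    for row in r:
        matches = [t for t in s if t[attribute] == row[attribute]]
        if matches:
            result.extend({**row, **t} for t in matches)
        else:
            # No match: pad the columns of s with None (NULL).
            result.append({**row, **{a: None for a in s_only}})
    return result


marks = [
    {"Std_ID": 101, "Std_name": "Ranjith Jha", "Subject_code": "F1", "Marks": 78},
    {"Std_ID": 203, "Std_name": "Meghna Sinha", "Subject_code": "HR1", "Marks": 89},
    {"Std_ID": 109, "Std_name": "Vinay Singh", "Subject_code": "M1", "Marks": 95},
]
subjects = [
    {"Subject_code": "M1", "Subject_name": "Marketing Management"},
    {"Subject_code": "IS1", "Subject_name": "Information System"},
]

high = select(marks, lambda row: row["Marks"] > 80)
codes = project(marks, ["Subject_code"])
joined = left_outer_join(marks, subjects, "Subject_code")
for row in joined:
    print(row["Std_name"], row["Subject_name"])
```

Selection works on rows and projection on columns, while the outer join keeps every row of the first relation even when the joined field has no match in the second.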
Self-Assessment Questions
5. Which of the following operation helps you to select columns from a
relation?
a) Selection
b) Projection
c) Cartesian product
d) Join
6. _________________ join is matching of two fields in two relations.
7. __________________ operation produces the tuples in the relation
which matches all the tuples in the other relation.
number of records in each file, names and the data types of each field. This
data dictionary is always a hidden file from the users so that the contents in
the data dictionary do not get accidentally destroyed.
A typical data dictionary has the following information:
Authorisation details.
Updating information like who is the original author and who has
updated it thereafter.
Base tables
User-accessible views
SYS file
Base tables
The database of a particular software application usually has more than one table, and these tables are associated with one another. Base tables store the associated table information of a database. These tables are normalised and are often stored in an encrypted format to prevent them from being tampered with or destroyed.
User-accessible views
These views summarise the information stored in the base table and
decrypt the information into its respective field names, rows, and so on. For
this, join operation and WHERE clauses are mostly used. Using Views is
the safest way to avoid direct access to the base tables.
SYS file
(system file) The SYS file, or SYS schema, is the owner of the data dictionary. No user should perform alterations such as INSERT, DELETE or MODIFY on the SYS file. This is the central account for the security administrator, who should keep strict control of access to it.
4.5 Normalisation
According to R. Elmasri and S.B. Navathe, normalisation is a process of analysing the given relation schemas based on their functional dependencies and primary keys to achieve two desirable properties: minimising redundancy, and minimising insertion, deletion and update anomalies.
When these properties are not satisfied by a relation, the relation is decomposed into two or more relations that have these properties, by inserting a primary key or by adding a field to the relation. The resulting relational form is called a Normal Form. A Higher order Normal Form (HNF) is always less vulnerable to these anomalies.
In this section, we will discuss the following types of Normal Forms.
Std_Name
Class
Address
Tel. No.
201
Ranjith
#4, Chokkanahalli,
Bangalore 560074
26677780
202
Shivraj
XI
2514890
9885643247
304
Lavanya
25234972
9912451356
The above table is not in 1NF since the field Tel. No. is multi-valued for Std_ID 202 and 304. However, if we insert a field named Mobile_No, as shown in Table 4.3(b), to keep every attribute atomic, we create null values in that field for some records, which is not allowed. Therefore, Table 4.3(b) is not in 1NF.
Table 4.3(b): Relation Schema of a STUDENT Relation
Std_ ID Std_ Name
Class
Address
Tel_no.
Mobile No.
201
Ranjith
#4, Chokkanahalli,
Bangalore 560074
26677780
202
Shivraj
XI
Andheri (east)
Mumbai 400064
2514890
304
Lavanya
9885643247
Std_
Name
Ranjith
Table 4.4(b)
Class
X
Address
#4,
Chokkanahalli,
Std_
ID
201
Tel_No.
26677780
Bangalore 560074
202
Shivraj
XI
Andheri
(east)
Mumbai 400064
202
2514890
304
Lavanya
#10,
Dadra
Post,
Bandra
(east),
Mumbai 400014
202
9885643247
304
25234972
304
9912451356
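The step that brings the STUDENT data into 1NF, with one telephone number per row, can be sketched in Python. The student IDs and numbers are those used in the tables of this section; the function name to_1nf and the dictionary layout are assumptions made for the illustration.

```python
# STUDENT rows with a multi-valued telephone attribute (not in 1NF).
students = [
    {"Std_ID": 201, "Std_Name": "Ranjith", "Tel_No": ["26677780"]},
    {"Std_ID": 202, "Std_Name": "Shivraj", "Tel_No": ["2514890", "9885643247"]},
    {"Std_ID": 304, "Std_Name": "Lavanya", "Tel_No": ["25234972", "9912451356"]},
]


def to_1nf(rows, key, multi_valued):
    """Move a multi-valued attribute into its own relation, one value per row."""
    return [
        {key: row[key], multi_valued: value}
        for row in rows
        for value in row[multi_valued]
    ]


student_tel = to_1nf(students, "Std_ID", "Tel_No")
for row in student_tel:
    print(row["Std_ID"], row["Tel_No"])
```

Every value in the new relation is atomic and no null entries are needed, because a student with two numbers simply contributes two rows.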
Project_
Code
Hours
Std_ Name
Class
Proj_ name
Prof_
incharge
101
HMS1
20
Ranjith Jha
1 MBA
Hospital
Ms.
management Sahana
System
203
SIM2
30
Meghna Sinha
2 MBA
303
DM1
15
Samiksha
Shukla
3 MBA
Prof_id
Subjects
specialisation
Qualification
Dept_
Number
Dept_
Name
HOD_ID
Dr. Rao
A1
Finance
PhD
D1
Management
H2
Dr. Ravi
A2
Marketing
PhD
D1
Management
H2
Prof. Sanat
Sha
B1
Computer
science
MCA
D2
IT
H1
Prof. Neena
Gupta
B2
Sociology
MA,
MPhil
D3
Arts &
Humanities
H3
Figure 4.2 shows the decomposition of the above table to form 3NF.
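One plausible way to carry out such a decomposition is sketched below in Python; it may differ in detail from Figure 4.2. The sketch assumes that Dept_Name and HOD_ID depend on Dept_Number (the transitive dependency), and the column name Prof_name for the first column is an assumption. The rows are those of the table above.

```python
ATTRIBUTES = ("Prof_name", "Prof_id", "Specialisation", "Qualification",
              "Dept_Number", "Dept_Name", "HOD_ID")
rows = [dict(zip(ATTRIBUTES, values)) for values in [
    ("Dr. Rao", "A1", "Finance", "PhD", "D1", "Management", "H2"),
    ("Dr. Ravi", "A2", "Marketing", "PhD", "D1", "Management", "H2"),
    ("Prof. Sanat Sha", "B1", "Computer science", "MCA", "D2", "IT", "H1"),
    ("Prof. Neena Gupta", "B2", "Sociology", "MA, MPhil", "D3", "Arts & Humanities", "H3"),
]]


def project(relation, attributes):
    """Keep the chosen columns and drop duplicate rows."""
    result = []
    for row in relation:
        projected = {a: row[a] for a in attributes}
        if projected not in result:
            result.append(projected)
    return result


# Department facts are stored once per department instead of once per professor.
professor = project(rows, ["Prof_name", "Prof_id", "Specialisation",
                           "Qualification", "Dept_Number"])
department = project(rows, ["Dept_Number", "Dept_Name", "HOD_ID"])

# Joining the two relations on Dept_Number gives back the original rows.
rejoined = [{**p, **d} for p in professor for d in department
            if p["Dept_Number"] == d["Dept_Number"]]
print(len(professor), len(department), rejoined == rows)
```

After the decomposition, the name and head of department D1 appear in one row only, so changing them cannot leave two professors' rows inconsistent with each other.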
Fig. 4.3(a)
STUD_PROG
Fig. 4.3(b)
STUD_COURSE
Fig. 4.3(c)
Figures 4.3(a), 4.3(b) and 4.3(c) are only in 1NF. To make it 2NF, we need
to remove the partial key dependencies. Therefore, we will decompose the
schema STUD_COURSE in Figure 4.3(c) into two more schemas, namely,
STUD_COURSE1 as shown in Figure 4.4(a) and COURSE in Figure 4.4(b).
STUD_COURSE1
COURSE
Now we have removed the partial key dependencies and the relation is in
2NF. To make this relation into 3NF we need to remove the transitive
dependency of the relation. Therefore, after the decomposition of relation
COURSE (Figure 4.4(b)), the normalised schemas will be as shown in the
Figures 4.5(a) and 4.5(b).
COURSE1
FACULTY
Now the above schemas are in 3NF. The relations STUDENT (Figure 4.3(a)), STUD_PROG (Figure 4.3(b)), STUD_COURSE1 (Figure 4.4(a)), COURSE1 (Figure 4.5(a)) and FACULTY (Figure 4.5(b)) are in Third Normal Form.
Now we can observe in STUDENT relation that the only determinant is
Std_ID. In STUD_COURSE1 relation, the only determinant is Std_ID,
Program. In the COURSE1 relation, the only determinant is Course_code.
In the relation FACULTY, the only determinant is Faculty_incharge. In
STUD_PROG, the determinants are Std_ID, Prog or Prog_coordinator.
Therefore, Std_ID, Prog is a candidate key. So, we will decompose the
relation STUD_PROG (Figure 4.3(b)) into two relations as shown in Figures
4.6(a) and 4.6(b).
STUD_PROG1
Fig. 4.6(a)
PROG
Fig. 4.6(b)
Std_name   Sub_name   Fac_incharge
Pushpa     Maths      Prof. Chidanand
Pushpa     Physics    Prof. Ramesh
Pushpa     Physics    Prof. Chidanand
Pushpa     Maths      Prof. Ramesh
Std_name   Sub_name
Pushpa     Maths
Pushpa     Physics

Std_name   Fac_incharge
Pushpa     Prof. Chidanand
Pushpa     Prof. Ramesh
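That the two smaller relations carry the same information as the original four-row relation can be checked by joining them again on the student name. The Python sketch below does this with the rows from the tables above; representing each relation as a list of tuples is an assumption made for brevity.

```python
# Decomposed relations: (Std_name, Sub_name) and (Std_name, Fac_incharge).
std_sub = [("Pushpa", "Maths"), ("Pushpa", "Physics")]
std_fac = [("Pushpa", "Prof. Chidanand"), ("Pushpa", "Prof. Ramesh")]

# Original relation: (Std_name, Sub_name, Fac_incharge).
original = [
    ("Pushpa", "Maths", "Prof. Chidanand"),
    ("Pushpa", "Physics", "Prof. Ramesh"),
    ("Pushpa", "Physics", "Prof. Chidanand"),
    ("Pushpa", "Maths", "Prof. Ramesh"),
]

# Join the two relations on Std_name.
joined = [
    (student, subject, faculty)
    for student, subject in std_sub
    for other_student, faculty in std_fac
    if student == other_student
]

print(sorted(joined) == sorted(original))
```

Four rows of the original relation are replaced by two rows in each smaller relation, and nothing is lost because subjects and faculty vary independently for the student.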
Sub_name
Proj_name
Pushpa
Chemistry
ProjX
Pushpa
Physics
ProjY
Kapila
History
ProjY
Kavitha
Maths
ProjZ
Kapila
English
ProjX
Kapila
Chemistry
ProjX
Pushpa
Chemistry
ProjY
4.6 Summary
Let us recapitulate the important concepts discussed in this unit:
A data dictionary contains files that have details about the data and information present in the database. A typical data dictionary has information such as the schema definitions of the objects in the database, the names of the database users, the space allocated to and used by the schemas, authorisation details, default values of the fields, updating information such as who the original author is and who has updated it since, and so on.
4.7 Glossary
Constraint: Constraints are used to limit the type of data that can go into a
table. Constraints can be specified when a table is created (with the
CREATE TABLE statement) or after the table is created (with the ALTER
TABLE statement).
Conventions: Conventions are tools or terminologies used to represent a
concept.
Functional dependency: A functional dependency is a constraint between
two sets of attributes in a relation from a database.
Model: A model is a representation of an object.
Multi-value databases: Multi-value databases include commercial products
from Rocket Software, TigerLogic, jBASE, Revelation, Ladybridge,
InterSystems, Northgate Information Solutions and other companies. These
databases differ from a relational database in that they have features that
support and encourage the use of attributes which can take a list of values,
rather than all attributes being single-valued.
Nonprime: A nonprime attribute is an attribute that is not included in any candidate key.
Nontrivial: A functional dependency X -> Y is nontrivial if Y is not a subset of X.
Quadruple: A tuple with four values is called a quadruple.
Schema: Schema comes from a Greek word which means shape. A schema defines the shape of the database, with the type of each field, its size, and so on.
4.9 Answers
Self-Assessment Questions
1. Relational schema and relational instance
2. Tuple
3. Domain
4. False
5. b
6. Theta
7. Division
8. Data dictionary
9. Higher order Normal Form
10. 1NF
11. Transitive
ABC Ltd. is to collect imitation jewelry from different parts of the country and
to market it to private individuals and commercial companies. He has called
upon a reputed database designer to design and implement a database to support his new business. At the initial planning meeting, he has put forth his requirements, which are as follows:
Over time, a customer may hire the same jewelry more than once.
Each jewelry can have only one maker associated with it.
Several reports are required from the system. The three main ones are
as follows:
Unit 5
Structure:
5.1 Introduction
Objectives
5.2 Conceptual Data Model for Database Design
Create the ER Model
Conceptual data model
5.3 ER Model Concept with an Example
Components of an ER Model
Different types of attributes
5.4 Relationships, Roles and Structural Constraints
Relationships
Degree of relationship type
5.5 Constraints on Relationship Types
5.6 Summary
5.7 Glossary
5.8 Terminal Questions
5.9 Answers
5.10 Case Study
5.1 Introduction
In Unit 4 you have studied the basic concepts of database design such as
data dictionary and normalisation. Using these concepts, now we will study
how to design a database and the different types of designing models.
Entity Relationship Model (ER Model) is used to represent objects in the
real world and the relationship among these objects, which represents the
overall logical structure of a database. We have also seen that the data
model that is independent of both the DBMS software and the hardware is
the conceptual model. ER Model is a high-level conceptual model
developed by Chen in 1976 to facilitate database design. The ER Model is
extremely useful in mapping the meaning and interaction of real-world
enterprises onto a conceptual schema. The main usage is in the design of
the database.
For better understanding of this unit you should have knowledge of the
relations and definition of ER Model. The Entity Relationship Model is
Pname
Colour
Weight
Location
P1
Nut
Red
12
Bangalore
P2
Bolt
Green
17
Ahmedabad
P3
Screw
Blue
17
Rome
P4
Screw
Red
14
Bangalore
In Table 5.1, each row represents one tuple of the relation. The number of
tuples in a relation is called the cardinality of the relation; for example, the
cardinality of the PART relation is four.
Relations of degree one are said to be unary; similarly, relations of degree
two are binary.
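The cardinality and degree of the PART relation can be computed directly from its rows, as in the following Python sketch. The rows are those of Table 5.1; the name Part_no for the first column is an assumption, since only the other four column headings are given.

```python
# Attribute names of the PART relation ("Part_no" is an assumed name).
ATTRIBUTES = ("Part_no", "Pname", "Colour", "Weight", "Location")

PART = [
    ("P1", "Nut", "Red", 12, "Bangalore"),
    ("P2", "Bolt", "Green", 17, "Ahmedabad"),
    ("P3", "Screw", "Blue", 17, "Rome"),
    ("P4", "Screw", "Red", 14, "Bangalore"),
]

cardinality = len(PART)     # number of tuples (rows)
degree = len(ATTRIBUTES)    # number of attributes (columns)

print(f"cardinality = {cardinality}, degree = {degree}")
```

Adding or deleting a part changes the cardinality, whereas the degree changes only if the structure of the relation itself is altered.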
Objectives
After studying this unit, you should be able to:
describe conceptual data model for database design
elucidate ER Model concept with an example
elaborate components of an ER Model
explain constraints on relationship types
Conceptual data model is the first and important step among the three
phases of database design methodology. The three phases of database
design are conceptual design, logical design and physical design.
Conceptual database design - It is the process of constructing a
database model which is independent of all physical considerations,
using the information of the enterprise.
Logical database design - In logical database design, a model is
constructed based on a specific data model. The model is constructed
on the information used in an enterprise. This model is independent of
particular DBMS and other physical considerations.
Physical database design - The database description is produced
based on the implementation and is stored on a secondary storage.
5.2.2 Conceptual data model
In this unit, we will discuss conceptual design in detail. Figure 5.1 describes
the working of database design methodology in detail.
Fig. 5.2: The ER Conceptual Schema Diagram for the New COMPANY
Database
Entity sets - It is a set of entities of the same type that share the same
properties or attributes. The set of all employees working for the same
department can be defined as the entity set employee, but each entity
has its own values for each attribute. For example, Entity Type Name:
Employee
Company
Address
Street Number
Area
City
Pin code
Null attribute - A null value is used when an attribute does not have any value. A null value does not mean that the value is equal to zero; it indicates that no value is stored for that attribute. For example, (a) the Apartment number attribute of an address applies only to addresses that are in apartment buildings and not in other types of
Key attribute - An entity type usually has an attribute whose values are
distinct for each individual entity. Such an attribute is called a key
attribute. These attributes that uniquely identify every instance of the
entity are termed as the primary key.
Self-Assessment Questions
4. __________________ is a thing in the real world with an independent existence that is distinguishable from all other objects.
5. State whether the following statements are true or false:
a) Entity sets is a set of properties.
b) Attributes are a set of entities of the same type that share the
same properties.
6. Simple attributes are called _______________ attributes.
7. ___________________ attribute holds a single value for a single entity.
8. State whether the following statements are true or false:
a) Key attribute is an attribute that can be used when an attribute does
not have any value.
b) A set of values associated with the attributes is called domain.
Fig. 5.7(a): Example for a Unary Relationship (Manages)
Fig. 5.8
Self-Assessment Questions
9. Relationship types is a set of all attributes. (True/False)
10. When the association is maintained with a single entity then it is a
______________ relationship.
11. If the same entity type participates more than once in a relationship
type in different roles then such relationship type is called __________
relation.
An employee can work in only one department and a department has only one manager.
One-to-many - An entity in A is associated with any number in B. An entity
in B, however, can be associated with at most one entity in A.
An employee can work on several projects and several employees can work
on a particular project.
Participation roles - There are two ways in which an entity can participate
in a relationship:
Self-Assessment Questions
12. A student belongs to only one class and the class can have many
students. This is a good example for _______________________
relationship.
13. The two types of participation roles are ________________________
and _________________________.
14. Total participation is also called ___________________________.
5.6 Summary
Let us recapitulate the important concepts discussed in this unit:
The conceptual data model is the first and the most important step among
the three phases of database design methodology. The three phases of
database design are conceptual design, logical design and physical
design. The conceptual design of the database is a four-step
process. The first step is the requirement analysis, the second step is
the creation of the conceptual schema, the third step is the actual
implementation of the conceptual schema and the last step is the
physical database design phase.
5.7 Glossary
Instance: Instance is an occurrence or a copy of an object, whether
currently executing or not.
Notation: Is a symbol used to represent a particular concept.
5.9 Answers
Self-Assessment Questions
1. Conceptual, logical
2. Conceptual database design
3. True
4. Entity
5. Answers
a) False
b) False
6. Atomic
7. Single-valued
8. Answers
a) False
b) True
9. False
10. Unary
11. Recursive
12. One-to-many
13. Total participation and partial participation
14. Existence dependency
Terminal Questions
1. The four phases in the design of the ER model are requirement analysis,
schema design, implementation and physical design. (Refer to Section
5.2.2 for further information.)
2. Entity, weak entity, attributes and composite attribute, and so on. (Refer
to Section 5.3 and Figure 5.3 for further information.)
3. Entity, attributes, identifier, relationships, and so on. (Refer to
Sections 5.4 and 5.5 for further information.)
4. Relationship types usually have certain constraints that limit the
possible combination of entities that may participate in the relationship
instance.
Recursive Relation (figure not reproduced)
EMPLOYEE: Emp_ID, Manager_ID, Father_name, Address, DOB, Date_of_joining, Dept_code
DEPARTMENT: Dept_code, Dept_name, Budget_code
Discussion Questions:
1. Which are the important elements of the ER Model? Identify the different
elements pertaining to the above case.
2. Why is EMPLOYEE a recursive relation, and what kind of relationship does
the EMPLOYEE entity share with itself?
3. What kind of relationship does DEPARTMENT entity share with
EMPLOYEE?
4. Which is the foreign key in the above case and why?
5. Consider the enhancement of the above case considering each
employee as either salaried or hourly. Hourly employees receive an
hourly rate of pay. Salaried employees can be assigned to projects.
Projects have a definite start and end date and may have a team of
salaried employees working on it. Each project is given a priority level of
low, medium or high. Identify the different components in this case and
construct an ER diagram for the same.
Unit 6
Structure:
6.1 Introduction
6.2 Relational Model Constraints
Domain constraints
Key constraints
Constraints on NULLs
Entity-integrity constraints
Referential-integrity constraints
6.3 Update Operations on Relations
Insert operations
Delete operations
Modify operations
6.4 The Relational Algebra
Set theoretic operations
Relational operations
6.5 Relational Calculus
Tuple relational calculus
Domain relational calculus
Tuple relational calculus versus domain relational calculus
Relational algebra versus relational calculus
6.6 Summary
6.7 Glossary
6.8 Terminal Questions
6.9 Answers
6.1 Introduction
In Unit 4 you have studied the basic concepts of relational algebra such as
the different types of operations in relational algebra and their definition.
You have already studied that the relational model represents the database
in terms of relations having a set of rows and columns, each of which is
assigned a unique name. According to the relational model, a database is a
collection of relations. The relational model was first introduced by E. F. (Ted)
Codd in 1970. Because it was simple and grounded in mathematics, it gained
acceptance quickly. This model is based on mathematical
Inherent-model-based
Schema-based
Application-based
That means that, for any attribute A in a tuple, the value of A must be an atomic
value from the domain dom(A). By atomic we mean that each value in the
domain is indivisible as far as the relational model is concerned.
Examples:
o The set of 10-digit phone numbers that are valid in India.
o The set of character strings that represent the names of persons.
o The age of an employee in a company, which must lie between 18 and 65.
Therefore, the data types available for domain constraint may be character,
integer, real numbers, Boolean, fixed-length and variable-length strings,
date, time, currency, and so on.
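Let us see how a domain constraint can be enforced in practice. The following is a minimal sketch in Python using the standard sqlite3 module; the table name employee, the column names and the sample rows are assumptions made only for illustration. The CHECK clause encodes the age domain of 18 to 65 mentioned in the examples above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the CHECK clause restricts age to the domain 18..65.
conn.execute(
    "CREATE TABLE employee ("
    " emp_name TEXT NOT NULL,"
    " age INTEGER NOT NULL CHECK (age BETWEEN 18 AND 65))"
)

def try_insert(name, age):
    """Return True if the row satisfies the domain constraint, else False."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?)", (name, age))
        return True
    except sqlite3.IntegrityError:
        # Raised by SQLite when the CHECK constraint is violated.
        return False

accepted = try_insert("AAA", 30)  # age inside the domain
rejected = try_insert("BBB", 17)  # age outside the domain
print(accepted, rejected)
```

The second row is refused by the database itself, so the application does not have to repeat the check.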
6.2.2 Key constraints
A key constraint states that no two tuples in a relation can have identical
values for all the attributes in the key. A key is a minimal superkey; that
is, it is a superkey from which we cannot remove any attribute without
losing the uniqueness property stated in the first condition.
Table 6.1: STUDENT Relation
Std_ID  Std_name  Class  Subject_code  Marks obtained
If you recall the example of the STUDENT database from Unit 4, std_ID is a key
as no two students in the database have the same std_ID. In Table 6.1, the set
[std_ID, Std_name, Class, Subject_code, Marks obtained] is a superkey. It
is not a key, because removing std_name or class still leaves a set of
attributes that uniquely identifies each tuple.
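The difference between a superkey and a key can be checked mechanically on a given set of rows. The sketch below is only an illustration: the rows are sample STUDENT values, and the helper names is_superkey and is_key are hypothetical. Note that such a check shows only that the rows at hand do not contradict a key; the key itself is declared as part of the schema.

```python
from itertools import combinations

ATTRS = ("std_id", "std_name", "class", "subject_code", "marks")
# Sample STUDENT rows, used only for illustration.
ROWS = [
    (101, "AAA", "1 MBA", "CN1", 67),
    (103, "BBB", "1 MBA", "DB1", 78),
    (104, "CCC", "1 MBA", "DB1", 67),
    (105, "EEE", "1 MBA", "SE1", 89),
]

def is_superkey(attrs):
    """True if no two rows agree on all of the given attributes."""
    idx = [ATTRS.index(a) for a in attrs]
    projected = [tuple(row[i] for i in idx) for row in ROWS]
    return len(set(projected)) == len(projected)

def is_key(attrs):
    """True if attrs is a superkey and removing any one attribute breaks uniqueness."""
    if not is_superkey(attrs):
        return False
    smaller = combinations(attrs, len(attrs) - 1)
    return not any(is_superkey(sub) for sub in smaller if sub)

print(is_superkey(ATTRS), is_key(ATTRS), is_key(("std_id",)))
```

Checking only the subsets that drop a single attribute is sufficient, because any superset of a superkey is also a superkey.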
6.2.3 Constraints on NULLs
Another constraint on attributes specifies whether NULL values are
permitted. Where an attribute is declared NOT NULL, no tuple in the
relation may hold a NULL for it. For example, suppose that in a STUDENT
database every valid tuple must have a name and a class. In that
case std_name and class are constrained to be NOT NULL.
6.2.4 Entity-integrity constraints
In any relation, a primary key attribute cannot have a NULL value. If
there is a NULL value in the primary key, we may not be able to identify
the tuple in the relation and we may lose one or more tuples which have
the NULL value. This rule is expressed by the entity-integrity constraint.
Table 6.2: STUDENT Relation with Null Values
Std_ID  Std_name  Class  Subject_code  Marks obtained (%)
101     AAA       1 MBA  CN1           67
NULL    BBB       1 MBA  DB1           78
104     CCC       1 MBA  DB1           67
NULL    EEE       1 MBA  CN1           89
For example, in the STUDENT relation, if std_ID can have NULL values as
in the case of Table 6.2, when we give reference of std_ID we may lose the
tuples which have NULL value and then it cannot be a primary key as per its
definition.
We must keep in mind that entity-integrity constraints are expressed on
individual relations.
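The entity-integrity constraint can be tried out with a small sketch in Python's sqlite3 module. The table layout is an assumption for illustration. NOT NULL is written explicitly on the key column because SQLite, unlike most relational systems, does not reject NULL in a non-integer primary key by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical STUDENT table; the primary key is declared NOT NULL explicitly.
conn.execute(
    "CREATE TABLE student ("
    " std_id TEXT NOT NULL PRIMARY KEY,"
    " std_name TEXT NOT NULL)"
)
conn.execute("INSERT INTO student VALUES ('101', 'AAA')")

try:
    # A tuple with a NULL primary key cannot be identified, so it is refused.
    conn.execute("INSERT INTO student VALUES (NULL, 'BBB')")
    null_key_rejected = False
except sqlite3.IntegrityError:
    null_key_rejected = True

print(null_key_rejected)
```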
6.2.5 Referential-integrity constraints
In order to have a clear understanding of referential-integrity constraints, let
us recall the definition of foreign key which you have studied in Unit 4: a
foreign key of one relation is a primary key in the related table.
For example, consider Tables 6.3(a) and 6.3(b), STUDENT and SUBJECT
Relations.
Table 6.3 (a): STUDENT Relation
Std_ID  Std_name  Class  Subject_code  Marks obtained (%)
101     AAA       1 MBA  CN1           67
103     BBB       1 MBA  DB1           78
104     CCC       1 MBA  DB1           67
105     EEE       1 MBA  SE1           89
Table 6.3 (b): SUBJECT Relation
Sub_code  Sub_name              Fac_incharge
CN1       Computer Networks     [missing]
DB1       [missing]             Prof. Guru. S
SE1       Software Engineering  Prof. Thimmaih
In Table 6.3(a), the attribute subject_code gives the subject code for which
each student opts for in his/her class. Therefore, the value in the
subject_code in the STUDENT relation must match the sub_code value of
some tuple in the SUBJECT relation. Here, sub_code is a primary key of
SUBJECT relation and hence it is a foreign key in STUDENT relation.
In the above example, STUDENT relation is called referencing relation and
SUBJECT relation is called referenced relation.
Therefore, if a referential-integrity constraint has to be held in a database,
then the attributes of foreign key of referencing relation must have the same
domain as the primary key of referenced relation. Also, the value of a
foreign key in a tuple of a current state of the referencing relation occurs as
a value of primary key for some tuple in the current state of the referenced
relation or it is a NULL.
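The rule stated above can be observed with a short sqlite3 sketch. The table definitions are simplified assumptions based on the STUDENT and SUBJECT relations, and the subject code XX9 is made up to show a violation. In SQLite, foreign keys are enforced only after the PRAGMA shown below is issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE subject (sub_code TEXT NOT NULL PRIMARY KEY, sub_name TEXT)")
conn.execute(
    "CREATE TABLE student ("
    " std_id INTEGER PRIMARY KEY,"
    " std_name TEXT NOT NULL,"
    " subject_code TEXT REFERENCES subject (sub_code))"
)
conn.executemany(
    "INSERT INTO subject VALUES (?, ?)",
    [("CN1", "Computer Networks"), ("SE1", "Software Engineering")],
)

def insert_student(std_id, name, code):
    """Return True if the referencing tuple is accepted, else False."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", (std_id, name, code))
        return True
    except sqlite3.IntegrityError:
        return False

ok_match = insert_student(101, "AAA", "CN1")  # value occurs as a primary key in SUBJECT
ok_null = insert_student(102, "BBB", None)    # a NULL foreign key is permitted
bad_ref = insert_student(103, "CCC", "XX9")   # no SUBJECT tuple has this code
print(ok_match, ok_null, bad_ref)
```

Here student plays the role of the referencing relation and subject the referenced relation.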
Self-Assessment Questions
1. ________________________________ constraints are also called as
implicit constraints.
2. Application-based constraint is also called as ____________________
constraint.
3. State whether the following statements are true or false:
a) Domain constraint states that any two tuples in a relation cannot
have identical values.
b) NULL attributes are allowed in a relation.
4. If referential integrity has to hold in the database, then the
attributes of the foreign key of the referencing relation must have the same
domain as the primary key of the referenced relation. (True/False)
Entity-integrity constraint: can be violated if the primary key of the new
tuple t is NULL (the constraint rules out NULL primary key values).
R (Std_ID, Name), Name values: Jyothi, Ganga, Girija, Ankitha
S (Std_ID, Name), Name values: Girija, Ankitha, Tanvi, Manvi
R ∪ S (Std_ID, Name), Name values: Jyothi, Ganga, Girija, Ankitha, Tanvi, Manvi
R ∩ S
Std_ID  Name
3       Girija
4       Ankitha
R - S
Name
Jyothi
Ganga
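The set theoretic operations map directly onto the set type of a programming language. The following sketch uses only the Name values of the two relations shown above; treating each tuple as a bare name is a simplification made for illustration.

```python
# Name values of the two union-compatible relations used in the example.
R = {"Jyothi", "Ganga", "Girija", "Ankitha"}
S = {"Girija", "Ankitha", "Tanvi", "Manvi"}

union = R | S         # tuples in R, in S, or in both; duplicates are eliminated
intersection = R & S  # tuples present in both R and S
difference = R - S    # tuples in R that are not in S

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```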
R
Sub_code  Sub_name
EC1       E&C
CS1       Computer Science
HR1       HRD
S
Proj_ID  Proj_name
10       Networking
11       Payroll
R × S
Sub_code  Sub_name          Proj_ID  Proj_name
EC1       E&C               10       Networking
EC1       E&C               11       Payroll
CS1       Computer Science  10       Networking
CS1       Computer Science  11       Payroll
HR1       HRD               10       Networking
HR1       HRD               11       Payroll
The relation R has 2 columns and 3 tuples. The relation S has 2 columns
and 2 tuples. So the Cartesian product has 4 columns (2 + 2) and 6 tuples
(3 x 2).
The Cartesian product operation applied by itself alone is generally
meaningless. It is useful only when followed by selection and projection
operations.
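The counting argument above can be verified with a few lines of Python. This is an illustrative sketch using the R and S values of the example; the selection on Proj_ID 11 and the projected columns are chosen only to show how a product is usually followed by selection and projection.

```python
from itertools import product

R = [("EC1", "E&C"), ("CS1", "Computer Science"), ("HR1", "HRD")]
S = [(10, "Networking"), (11, "Payroll")]

# Cartesian product: every tuple of R is concatenated with every tuple of S.
cartesian = [r + s for r, s in product(R, S)]

# Selection (keep rows whose Proj_ID is 11) followed by projection
# (keep only Sub_code and Proj_name).
payroll_rows = [row for row in cartesian if row[2] == 11]
projected = [(row[0], row[3]) for row in payroll_rows]

print(len(cartesian), len(cartesian[0]))
print(projected)
```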
6.4.2 Relational operations
These are the operations that are developed for relational databases. In this
section, we discuss the SELECT, PROJECT and JOIN operations.
The SELECT operation: This operation selects required rows from the
table. This operation is used to select the subset of the tuples from a
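To select names and addresses of all students who have opted for the
subject with subject code CS1, the below query is used:
π std_name, address (σ sub_code = 'CS1' (STUDENT))
STUD_SUB ← STUDENT ⋈ (STUDENT.sub_code = SUBJECT.sub_code) SUBJECT
RESULT ← π std_ID, std_name, sub_name (STUD_SUB)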
The first operation in the JOIN operation will combine the tuples of the
STUDENT and SUBJECT relations on the basis of the sub_code to form
a relation called STUD_SUB. Then the Project operation will create a
relation RESULT with the attributes std_ID, std_name, and sub_name.
Sub_code  Sub_name
CN1       Computer networks
SE2       Software engineering
HR3       Human Resource

No.  Pname                Sub_code
10   Library Management   SE2
20   ERP                  HR3
30   Hospital Management  SE2
40   Wireless Network     CN1

No.  PName                Sub_code  Sub_name
10   Library Management   SE2       Software engineering
20   ERP                  HR3       Human Resource
30   Hospital Management  SE2       Software engineering
40   Wireless Network     CN1       Computer networks
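The joined table above can be reproduced with a nested loop that pairs tuples whose Sub_code values match. This is only a sketch of the idea of an equi-join followed by a projection; real database systems use more efficient join algorithms, and the variable names are assumptions for illustration.

```python
SUBJECT = [
    ("CN1", "Computer networks"),
    ("SE2", "Software engineering"),
    ("HR3", "Human Resource"),
]
PROJECT = [
    (10, "Library Management", "SE2"),
    (20, "ERP", "HR3"),
    (30, "Hospital Management", "SE2"),
    (40, "Wireless Network", "CN1"),
]

# JOIN: combine a PROJECT tuple with a SUBJECT tuple when the codes are equal.
joined = [
    (pno, pname, code, sname)
    for (pno, pname, code) in PROJECT
    for (scode, sname) in SUBJECT
    if code == scode
]

# PROJECT operation: keep only the project name and the subject name.
result = [(pname, sname) for (_, pname, _, sname) in joined]

for row in joined:
    print(row)
```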
Worker
Name    Age  Addr
Adah    23
Andrew  29
Barath  22
Jone    19
Donald  23
Elbert  26
George  28
Helen   15

Worker_skill
Name     Skill
Adah     Work
Jone     Smithy
Elbert   Discuss
Helen    Driver
Wilfred  Fitter
Marg     Smithy
Rita     Fitting

Result
Name    Age  Addr  Skill
Adah    23         Work
Andrew  29
Barath  22
Jone    18         Smithy
Donald  16
Elbert  43         Discuss
George  41
Helen   27         Driver
Self-Assessment Questions
8. _______________________ and ___________________________
are the two types on which relational algebra is classified.
9. JOIN operation is a _______________________ operation.
10. Cartesian product is based on _______________________ operation.
11. Union is denoted by the symbol _________.
12. _______________________ is a binary operation which is used to
combine two relations.
13. _________________________ operation is represented by pi (π).
14. ________________, __________________ and ____________ are
the three types of outer join.
Here, t is a tuple variable that ranges over relation STUDENT. Each tuple
in a STUDENT relation that satisfies the condition, that is, marks > 60% will
be retrieved.
Example 2: Retrieve std_ID, std_name and class of students who are
residing at Bangalore. So the query will be
{t.std_ID, t.std_name, t.class | STUDENT(t) and t.city = 'Bangalore'}
In this, we first specify the requested attributes and then the condition.
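A tuple relational calculus expression reads much like a comprehension in a programming language: the part before the bar lists what is returned, and the part after it states the condition. The sketch below mirrors the two examples; the Student type, the city values and the marks are hypothetical sample data.

```python
from collections import namedtuple

# Hypothetical STUDENT tuples; the attribute "class" is spelled cls in Python.
Student = namedtuple("Student", "std_id std_name cls city marks")
STUDENT = [
    Student(101, "AAA", "1 MBA", "Bangalore", 67),
    Student(103, "BBB", "1 MBA", "Mysore", 78),
    Student(104, "CCC", "1 MBA", "Bangalore", 55),
]

# {t | STUDENT(t) and t.marks > 60}
above_60 = [t for t in STUDENT if t.marks > 60]

# {t.std_ID, t.std_name, t.class | STUDENT(t) and t.city = 'Bangalore'}
in_bangalore = [(t.std_id, t.std_name, t.cls) for t in STUDENT if t.city == "Bangalore"]

print([t.std_id for t in above_60])
print(in_bangalore)
```

In both cases we state which tuples are wanted, not the sequence of operations needed to find them.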
Formula specification of tuple relational calculus:
The expressions of the tuple calculus are constructed from the following
elements:
{t1.A1, t2.A2, ..., tn.An | COND(t1, t2, ..., tn)}
where t1, t2, ..., tn are tuple variables ranging over relation R; COND is a
formula of the tuple relational calculus; and t.A represents the component
of t for A, where A is an attribute of the relation.
Conditions are of the form x * y, where * is any of the following: =, !=, <, >, <=, >=.
Well-formed formulas (Wff):
A Wff is constructed from one or more atoms connected via Boolean
operators (AND, OR, NOT) and quantifiers (∃, ∀) according to the rules
below:
1. Every atom is a formula.
2. If F1 and F2 are formulas, then so are (F1 and F2), (F1 or F2), not (F1),
and not (F2). The truth values of these four formulas are derived from their
component formulas F1 and F2 as follows:
a) (F1 and F2) is TRUE if both F1 and F2 are TRUE; otherwise, it is
FALSE.
b) (F1 or F2) is FALSE if both F1 and F2 are FALSE; otherwise, it is
TRUE.
c) Not (F1) is TRUE if F1 is FALSE; it is FALSE if F1 is TRUE.
d) Not (F2) is TRUE if F2 is FALSE; it is FALSE if F2 is TRUE.
Select the information of students who are not from London:
{t.std_ID, t.Bdate, t.address | STUDENT(t) and NOT (t.city = 'London')}
Another example:
For every project located in Bangalore, list the subject code, subjects
opted and the faculty in-charge's last name and address:
{s.sub_code, s.sub_opt, f.last_name, f.address | SUBJECT(s) and STUDENT(f)
and s.location = 'Bangalore' and ((∃d)(FACULTY(d) and
s.sub_code = d.sub_code and d.faculty_code = f.faculty_code))}
Example, to get the names and subject names of all students whose marks
obtained are greater than 70%:
{s.std_name, d.name | STUDENT(s) and SUBJECT(d) and
d.sub_code = s.sub_code and s.marks > 70}
For example: Find the branch name, loan number and amount for loans of
over 25,000.
{<b, l, a> | <b, l, a> ∈ loan ∧ a > 25000}
Find the names of all customers who have a loan from the Bombay
branch and find the loan amount.
{<c, a> | ∃l (<c, l> ∈ borrower ∧ ∃b (<b, l, a> ∈ loan ∧ b = 'Bombay'))}
Now the time has come to point out the differences between relational
algebra and relational calculus.
6.5.4 Relational algebra versus relational calculus
Relational algebra is procedural whereas relational calculus is nonprocedural.
The expressive power of relational algebra and relational calculus is
equivalent. This means that any query that can be expressed in
relational algebra can also be expressed by formulas in relational calculus,
and vice versa.
Self-Assessment Questions
15. __________________________ and _________________________
are the two types of calculus.
6.6 Summary
Let us recapitulate the important concepts discussed in this unit:
Relational database is composed of many relations and tuples. Each
tuple is related to one another in a number of ways. The three main
types of constraints are inherent-model-based, schema-based and
application-based. Schema-based constraint is also called explicit
constraint. The different types of schema-based constraints are domain
constraints, key constraints, constraints on NULLs, entity-integrity
constraints and referential-integrity constraints.
The three update operations on relations are the insert operation, the delete
operation and the modify operation.
Relational algebra operations are classified into two types, namely, those
from mathematical set theory and those developed for relational databases.
Set theoretic operations are based on mathematical set theory. The
common operations in relational algebra based on this type are Union,
Intersection, Set difference and Cartesian product. Relational operations
are operations developed specifically for relational databases. The operations
based on this type are SELECT, PROJECT and JOIN.
Relational calculus is a higher-level, declarative notation for specifying
relational queries. There are two types of relational calculus. They are tuple
relational calculus and domain relational calculus.
6.7 Glossary
Atomic: An atomic value is a single, indivisible value. A cell having an atomic
value means that only one value is allowed in that cell of the table.
Constraint: A constraint is a factor that restricts an entity, project or
system.
Entity: An entity is something that exists by itself.
6.9 Answers
Self-Assessment Questions
1. Inherent-model-based
2. Semantic
3. Answers
a) False
b) False
4. True
5. Retrieval
6. Domain
7. Referential-integrity
8. Mathematical set theory and operations for relational databases
9. Relational
12. Cartesian product
13. PROJECT
14. Left outer join, right outer join and full outer join
15. Tuple relational calculus and domain relational calculus
16. Tuple variable
17. Well-formed formulas
18. Safe
Terminal Questions
1. The different types of schema-based constraints are domain
constraints, key constraints, constraints on NULLs, entity-integrity
constraints and referential-integrity constraints. (Refer to Section 6.2
for further information.)
2. The update operations on relations are of three types: insert, modify and
delete operations. (Refer to Section 6.3 for further information.)
3. The relational algebra can be classified based on two main types: set
theoretic operations and relational operations. The common operations
based on set theoretic operations are UNION, INTERSECTION, cross
product and set difference. Based on relational database there are
SELECT, PROJECT and JOIN operations. (Refer to Section 6.4 for
further information.)
4. A tuple variable is a variable that ranges over the tuples of a relation,
whereas a domain variable is a variable whose value is drawn from the
domain of a single attribute rather than from an entire tuple. (Refer to
Section 6.5.3 for further information.)
Unit 7
Structure:
7.1 Introduction
Objectives
7.2 SQL: The Universal Database Language
7.3 Types of SQL Statements
7.4 SQL Tables
Data retrieval statement (SELECT)
7.5 Multi Table Queries
Nested queries or sub queries
Multiple-row nested queries
The exists clause
7.6 Data Manipulation Language
7.7 Creating Databases
7.8 Summary
7.9 Glossary
7.10 Terminal Questions
7.11 Answers
7.12 Case Study
7.1 Introduction
We discussed the role of ER diagram and the different usages of the
notations in Unit 5. Once database design is done, it needs to be
implemented. To implement a database, we need a well-structured language
in which queries on the database can be written. Therefore, in a DBMS,
Structured Query Language (SQL) is used to write the queries.
SQL is a non-procedural language that describes the type of data to be
retrieved, updated or deleted. This is a structured language and has the
capability to update the database and its data. In short, we can say that if
you are trying to do any serious work with the database, you need SQL.
Therefore, in this unit we will discuss various types of SQL statements and
elaborate on SQL tables. We will also study multiple-table queries and how
to deal with SQL in creating databases. In addition, we will discuss how to
use SQL and explain the same with examples.
Objectives
After studying this unit, you should be able to:
define SQL
list the different types of SQL
elucidate multiple-table queries
elaborate on data manipulation language
demonstrate creating the databases
Name          Address    Salary (Rs.)  Department  Phone_no
Mr. Surat     Orissa     100,000       Purchase    9886725499
Ms. Nandini   Bangalore  50,000        Accounts    9986752388
[missing]     Hyderabad  25,000        Accounts    9886743477
Mr. Abhishek  Delhi      75,000        Marketing   8017634122
Mr. Reddy     Hyderabad  40,000        Marketing   9985467233
Though SQL originated in the 1970s, it was first standardised in 1986 and
has since been universally adopted. The language became popular even for
non-relational database systems. SQL supports a variety of professionals
such as programmers, analysts, designers and database administrators,
unlike general programming languages such as C and COBOL, which
support only a specific domain of programmers.
SQL is a special purpose non-procedural language which is used to support
database applications and one cannot write general purpose applications
with it.
Now let us see how a simple SQL statement can be written with an example
of an EMPLOYEE database. The format of the simple SQL statement is as
given below:
SELECT <field name> FROM < table name> WHERE <condition>
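Let us try the format on a small table. The sketch below runs the statement from Python through the standard sqlite3 module; it loads the complete rows of the EMPLOYEE example, and the lower-case table and column names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (name TEXT, address TEXT, salary INTEGER, department TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        ("Mr. Surat", "Orissa", 100000, "Purchase"),
        ("Ms. Nandini", "Bangalore", 50000, "Accounts"),
        ("Mr. Abhishek", "Delhi", 75000, "Marketing"),
        ("Mr. Reddy", "Hyderabad", 40000, "Marketing"),
    ],
)

# SELECT <field name> FROM <table name> WHERE <condition>
rows = conn.execute(
    "SELECT name FROM employee WHERE department = ? ORDER BY name",
    ("Marketing",),
).fetchall()
print(rows)
```

The statement names the data wanted (the field), where it lives (the table) and which rows qualify (the condition); it does not say how the rows are to be found.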
Revoke: Revoke withdraws privileges on one or more tables or views.
o SQL DBA> Revoke UPDATE, DELETE FROM INSURES;
o SQL DBA>Revoke all on emp from Akash;
EXECUTE
RUN
R<filename>
EXIT or QUIT
LIST
APPEND<text>
CLEAR BUFFER
GET <filename>
SAVE <filename>
DEFINE _EDITOR = notepad
EDIT
Description
CHAR (size)
VARCHAR2(size)
DATE
BLOB
CLOB
BFILE
LONG
LONG RAW
NUMBER (size)
NUMBER(size,d)
DECIMAL
FLOAT
Same as NUMBER
INTEGER
SMALLINT
Same as NUMBER
Example Tables:
To study the SQL commands of various types we need some tables. Let us
consider Tables 7.3 and 7.4, which will be used throughout our discussion.
Table 7.3: Employee Relation
Ssn   Name    Bdate      Salary  Mgrssn  Dno
1111  Deepak  5-Jan-62   22,000  4444
2222  Yadav   27-Feb-84  30,000  4444
3333  Venkat  22-Jan-65  18,000  2222
4444  Prasad  2-Feb-68   32,000  Null
5555  Reena   4-Aug-79   8,000   4444

Dname     Loc
Admin     Chennai
Research  Bangalore
Accounts  Bangalore
Ssn   Name    Bdate      Salary  Mgrssn
1111  Deepak  5-Jan-62   20,000  4444
2222  Yadav   27-Feb-60  30,000  4444
3333  Venkat  22-Jan-65  18,000  2222
4444  Prasad  2-Feb-84   32,000  Null
5555  Reena   4-Aug-65   8,000   4444
Ssn   Name    Bdate      Salary  Mgrssn  Dno
1111  Deepak  27-Feb-84  30,000  4444
2222  Yadav   15-Jan-65  8,000   4444
3333  Venkat  22-Jan-85  20,000  2222
4444  Prasad  27-Feb-84  32,000  Null
5555  Reena   15-Jan-65  8,000   4444

Name    Salary
Prasad  32,000
Reena   8,000
Deepak  22,000
Venkat  30,000
Yadav   18,000
Name    Salary  Salary *12
Prasad  32,000  384,000
Reena   8,000   96,000
Deepak  22,000  264,000
Yadav   30,000  360,000
Venkat  18,000  216,000
Employee)
Name    Salary  Salary *12
Prasad  32,000  384,000
Reena   8,000   96,000
Deepak  22,000  264,000
Yadav   30,000  360,000
Venkat  18,000  216,000
Name    Null?     Type
SSN     NOT NULL  NUMBER [4]
NAME    NOT NULL  VARCHAR2 [20]
BDATE             DATE
SALARY            NUMBER [10,20]
MGRSSN            NUMBER [4]
DNO     NOT NULL  NUMBER [2]

Ssn   Name   Bdate      Salary  Mgrssn  Dno
2222  Yadav  10-Dec-60  30,000  4444

Name    Salary
Prasad  32,000
Deepak  22,000
Yadav   18,000
If you observe Tables 7.11 and 7.12, you will notice that though the query
looks similar the output is different.
=                    Equal to
>                    Greater than
<                    Less than
<>                   Not equal
BETWEEN <a> AND <b>
IN <set>
LIKE <pattern>
IS NULL              Is a null value

Name    Salary
Prasad  32,000
Yadav   30,000
Deepak  22,000
No. of employees
30
40
52
53
SUM(SALARY)
22,000
18,000
7,000
Find the sum of salaries, the maximum and minimum salary of all the
employees.
SELECT Sum(salary), Max(salary), Min(salary)
FROM emp;
Sum(salary)  Max(salary)  Min(salary)
50,000       32,000       8,000
Having Clause
The having clause filters the rows returned by the group by clause.
To demonstrate this clause, consider the following queries:
Select job, count(*) from EMP group by job having count(*) > 20;
Select Deptno, max(basic), min(basic) from EMP group by Deptno
having max(basic) > 30000;
Find the average salary of only department 1.
SELECT Dno, avg(salary)
FROM Employee
GROUP BY Dno
HAVING Dno = 1;
For each department, retrieve the department number, Dname and the
number of employees working in that department if the department
should contain more than three employees.
SELECT Emp.Dno, Dname, count(*)
FROM Emp, Dept
WHERE Emp.Dno = Dept.Dno
GROUP BY Emp.Dno, Dname
HAVING count(*) > 3;
Here, where_clause limits the tuples to which functions are applied and the
having clause is used to select individual groups of tuples.
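The order in which the two clauses act can be seen in a small experiment. The sketch below uses the names and salaries of Table 7.3; the Dno values are assumptions made for this illustration, and the queries are run through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn INTEGER, name TEXT, salary INTEGER, dno INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (1111, "Deepak", 22000, 1),  # the dno values are assumed for illustration
        (2222, "Yadav", 30000, 2),
        (3333, "Venkat", 18000, 3),
        (4444, "Prasad", 32000, 3),
        (5555, "Reena", 8000, 3),
    ],
)

# HAVING alone: groups are formed from all rows, then filtered by size.
having_only = conn.execute(
    "SELECT dno, COUNT(*) FROM employee GROUP BY dno HAVING COUNT(*) > 2"
).fetchall()

# WHERE first removes the low-salary row, so the group for department 3 shrinks.
where_then_having = conn.execute(
    "SELECT dno, COUNT(*) FROM employee WHERE salary > 10000 "
    "GROUP BY dno HAVING COUNT(*) > 1"
).fetchall()

print(having_only, where_then_having)
```

The WHERE condition is applied to individual tuples before grouping, whereas the HAVING condition is applied to each group after the aggregate has been computed.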
For each department that has more than three employees, retrieve the
department number and the number of its employees earning more than
Rs.10,000.
SELECT Dno, AVG (salary)
FROM Employee
Simple equi-joins
We must follow the guidelines given below to join two tables together:
o Table names in the FROM clause are separated by commas.
o Use appropriate joining condition. This means that the foreign key of
Table 1 will be made equal to the primary key of Table 2. This
column acts as the joining attribute. For example, dno of employee
table and dno of department will be involved in the joining condition
of WHERE clause.
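The guidelines above can be followed in a short sqlite3 sketch. The department names come from the Department example of this unit; the dno values that link the two tables are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dno INTEGER PRIMARY KEY, dname TEXT, loc TEXT)")
conn.execute("CREATE TABLE employee (ssn INTEGER PRIMARY KEY, name TEXT, dno INTEGER)")
conn.executemany(
    "INSERT INTO department VALUES (?, ?, ?)",
    [(1, "Admin", "Chennai"), (2, "Research", "Bangalore"), (3, "Accounts", "Bangalore")],
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1111, "Deepak", 1), (2222, "Yadav", 2), (3333, "Venkat", 3),
     (4444, "Prasad", 3), (5555, "Reena", 3)],
)

# Table names separated by commas; the foreign key e.dno is made equal
# to the primary key d.dno in the WHERE clause (the joining condition).
rows = conn.execute(
    "SELECT e.name, d.dname FROM employee e, department d "
    "WHERE e.dno = d.dno ORDER BY e.name"
).fetchall()
print(rows)
```

Without the joining condition the query would return the Cartesian product of the two tables, that is, 15 rows instead of 5.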
(SELECT AVG (salary)
FROM Employee
GROUP BY Dno
HAVING dno = 3);
The output of the above query is:
Name    Salary
Prasad  32,000
Venkat  30,000
7.5.2 Multiple-row nested queries
The operators IN, ANY and ALL are used in the multiple-row sub queries.
The descriptions of these operators are shown in Table 7.17. The sub query
in this case returns more than one row.
Table 7.17: Operators in Multiple-Row Nested Query
Operators
Description
IN
ANY
ALL
Consider an
employee:
SELECT
FROM
WHERE
Remember that the multiple-row sub queries expect one or more results. In
this example, the inner query gives a single value and the next example
shows a set of values. Table 7.18 gives an idea of how to use ANY and
ALL.
Operator  Meaning                Example
<ANY
>ANY
=ANY      Same as IN
<ALL
>ALL
!=ALL     Not equal to anything
Consider query 3:
SELECT Name, salary
FROM Employee
WHERE Salary != ALL
(SELECT Salary
FROM Employee
WHERE DNO = 3);
The output of the above query is:
NAME    SALARY
Deepak  22,000
Pooja   18,000
This query outputs the employees whose salaries differ from every salary in
the set returned by the sub query.
7.5.3 The exists clause
The exists clause returns true in a WHERE clause, if the sub query that
follows returns at least one row.
Consider query 1:
Assume that we want to display the names of employees who work for the
Accounts department. We can write it as:
SELECT Name
FROM Employee E
WHERE EXISTS
(SELECT * FROM Department D
WHERE E.DNO = D.DNO AND DNAME = 'Accounts');
The result of the above query is:
NAME
Prasad
Reena
Venkat
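The same query can be tried out with Python's sqlite3 module. In the sketch below, the department numbers are assumptions chosen so that Prasad, Reena and Venkat belong to the Accounts department, as in the result shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dno INTEGER PRIMARY KEY, dname TEXT)")
conn.execute("CREATE TABLE employee (ssn INTEGER PRIMARY KEY, name TEXT, dno INTEGER)")
conn.executemany(
    "INSERT INTO department VALUES (?, ?)",
    [(1, "Admin"), (2, "Research"), (3, "Accounts")],
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1111, "Deepak", 1), (2222, "Yadav", 2), (3333, "Venkat", 3),
     (4444, "Prasad", 3), (5555, "Reena", 3)],
)

# The sub query is evaluated once per employee row (it refers to e.dno);
# the row is kept when the sub query returns at least one row.
rows = conn.execute(
    "SELECT name FROM employee e WHERE EXISTS ("
    " SELECT * FROM department d"
    " WHERE e.dno = d.dno AND d.dname = 'Accounts')"
    " ORDER BY name"
).fetchall()
print(rows)
```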
7.6 Data Manipulation Language
Insert statement
The general syntax to add a new row into the table is given below:
INSERT INTO table [(column-1, column-2, ...)]
VALUES (value-1, value-2, ...);
Using this syntax, you can insert only one row at a time. To insert more than
one row, you can execute the insert statement repeatedly. The simplest
example for the INSERT statement is shown below.
INSERT INTO Employee
VALUES (1111, 'Deepak', '5-jan-82', 0000, 4444,);
To enter more records, we can use / (slash symbol); / is used to execute
the commands stored in the buffer.
INSERT INTO EMPLOYEE (empno, eaddr, basic)
VALUES (&empno, &eaddr, &ba);
Delete command
It is a DML statement to delete record(s). The general syntax to delete
command is given below:
DELETE FROM table WHERE cond;
You should remember that if the WHERE condition is not present in the
query, all the rows in the table are deleted.
For example, DELETE FROM EMPLOYEE WHERE name = 'Yadav';
Update command
It is used to change existing values in a table. The general syntax to update
command is given below:
UPDATE table SET col1 = val1 [, col2 = val2, ...] [WHERE cond];
For example, UPDATE EMPLOYEE SET deptno = 100;
If the WHERE condition is not present in the query, all the rows in the table
are updated.
For example, UPDATE EMPLOYEE SET ename = 'Sourav' WHERE empno = 100;
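The three DML statements can be seen working together in the following sketch, run through Python's sqlite3 module. The column layout (empno, ename, deptno) follows the examples above, while the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (empno INTEGER PRIMARY KEY, ename TEXT, deptno INTEGER)")

# INSERT: one row per statement; executemany repeats the statement for each row.
conn.executemany(
    "INSERT INTO employee (empno, ename, deptno) VALUES (?, ?, ?)",
    [(100, "Yadav", 10), (101, "Deepak", 20), (102, "Reena", 30)],
)

# UPDATE with a WHERE condition changes only the matching row.
conn.execute("UPDATE employee SET ename = ? WHERE empno = ?", ("Sourav", 100))

# UPDATE without a WHERE condition changes every row in the table.
conn.execute("UPDATE employee SET deptno = 100")

# DELETE with a WHERE condition removes only the matching row.
conn.execute("DELETE FROM employee WHERE ename = ?", ("Deepak",))

rows = conn.execute("SELECT empno, ename, deptno FROM employee ORDER BY empno").fetchall()
print(rows)
```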
tablename(Column_1
datatype,
column_2
You will need to specify the name of the table (it should be unique) and one
or more attributes and their data types.
For example,
CREATE TABLE Employee (
SSN number (4) not null,
NAME varchar2 (20) not null,
BDATE date,
Mgrssn number,
Primary key (SSN),
Foreign key (mgrssn) references employee (SSN));
Alter table statement
After creating a table, one or more columns can be added to this table.
Similarly, columns can be dropped (applies to Oracle9i only) and in either
case the existing table columns will not be affected. For example, assume
that we wish to add a column phone numbers to employee table.
For example,
ALTER TABLE Employee
ADD phone number (7) not null;
Using the same alter command you can modify the data type of a column.
For example, the phone column can be modified from number to varchar2.
ALTER TABLE Employee
MODIFY phone varchar2 (10);
Oracle 8i does not support dropping a column, but Oracle 9i does.
ALTER TABLE employee
DROP COLUMN phone;
Dropping a table is possible even when it has data.
SYNTAX: DROP TABLE table;
For example, to drop the employee table, use the following statement:
DROP TABLE employee;
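The life cycle of a table (create, alter, drop) can be followed in a short sketch using Python's sqlite3 module. Please note that SQLite is used here only because it ships with Python; its data type names and ALTER TABLE options differ from Oracle's (for instance, it has no MODIFY clause), so the statements are an adaptation and not Oracle syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE TABLE with a primary key and a foreign key that refers to the same table.
conn.execute(
    "CREATE TABLE employee ("
    " ssn INTEGER NOT NULL,"
    " name TEXT NOT NULL,"
    " bdate TEXT,"
    " mgrssn INTEGER,"
    " PRIMARY KEY (ssn),"
    " FOREIGN KEY (mgrssn) REFERENCES employee (ssn))"
)

# ALTER TABLE: add a phone column; the existing columns are not affected.
conn.execute("ALTER TABLE employee ADD COLUMN phone TEXT")
columns = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]

# DROP TABLE removes the table definition together with any data it holds.
conn.execute("DROP TABLE employee")
remaining = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()

print(columns, remaining)
```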
from<table/s>
For example:
Create index ind1 ON EMPLOYEE(empno);
Query 1:
Retrieve the name and address of all employees who work for the research
department.
SELECT FNAME, LNAME, ADDRESS
FROM EMPLOYEE, DEPARTMENT
WHERE DNAME = 'Research' AND DNUMBER = DNO;
Query 1 is similar to a SELECT-PROJECT-JOIN sequence of relational
algebra operations.
Such queries are often called select-project-join queries. In the WHERE
clause of Q1, the condition DNAME = 'Research' is the selection
condition and corresponds to a SELECT operation in relational algebra.
Other important examples:
Company database example:
This example uses the following tables and underlines columns that are the
primary keys:
Employee (
SSN char (9), Name varchar2 (10), Bdate Date, Address varchar2 (30),
Sex char (1), Salary Number (10, 2), SuperSSN char (9), Dno Number (2))
Department (
Dnumber Number (2), Dname Varchar2 (10), MgrSSN char (9), Mgrstartdate
Date)
Project (
Pnumber Number (2), Pname varchar2 (10), Plocation varchar2 (15), Dnum
Number (2))
Dependent (
ESSN char (9), Dependent_name Varchar2 (15), Sex char, Bdate Date,
Relationship varchar2 (10))
Dept_locations (
Dnumber Number (2), Dlocation varchar2 (15))
Works_on (
ESSN char (9), Pno Number (2), Hours Number (3, 1))
Query 1:
Retrieve the name and address of all employees who work for the research
department.
Q1:
SELECT FNAME, LNAME, ADDRESS
FROM EMPLOYEE, DEPARTMENT
WHERE DNAME = 'Research' AND DNUMBER = DNO;
Query 2:
Retrieve the birth date and address of the employee whose name is John
B. Smith.
Q2:
SELECT BDATE, ADDRESS
FROM EMPLOYEE
WHERE FNAME = 'John' AND MINIT = 'B' AND
LNAME = 'Smith';
This query involves only the EMPLOYEE relation listed in the FROM
clause.
Query 3:
For every project located in Stafford, list the project number, the controlling
department number and the department manager's last name, address and
birth date.
Q3:
SELECT PNUMBER, DNUM, LNAME, ADDRESS, BDATE
FROM PROJECT, DEPARTMENT, EMPLOYEE
WHERE DNUM = DNUMBER AND MGRSSN = SSN AND
PLOCATION = 'Stafford';
The join condition DNUM=DNUMBER relates a project to its controlling
department, whereas, the join condition MGRSSN=SSN relates the
controlling department to the employee who manages that department.
Query 4:
Retrieve the name of each employee who has a dependent with the same
first name as the employee.
Q4:
SELECT
E.FNAME, E.LNAME
FROM
EMPLOYEE E
WHERE
E.SSN IN (SELECT ESSN FROM DEPENDENT
WHERE
ESSN=E.SSN AND
E.FNAME=DEPENDENT_NAME);
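Query 5:
Retrieve the name of each employee who works on all the projects
controlled by department number 5.
Q5: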
SELECT
FNAME, LNAME
FROM EMPLOYEE
WHERE
((SELECT
PNO
FROM WORKS_ON
WHERE
SSN=ESSN)
CONTAINS
(SELECT
PNUMBER
FROM
PROJECT
WHERE
DNUM=5));
Query 6:
List the names of managers who have at least one dependent.
Q6:
SELECT
FNAME, LNAME
FROM
EMPLOYEE
WHERE
EXISTS (SELECT *
FROM
DEPENDENT
WHERE
SSN=ESSN)
AND
EXISTS (SELECT *
FROM
DEPARTMENT
WHERE
SSN=MGRSSN);
One way to write this query is shown in Q6, where we specify two nested
correlated queries: the first one selects all dependent tuples related to an
EMPLOYEE, and the second one selects all department tuples managed by
the EMPLOYEE.
Query 7:
For each employee, retrieve the employee's first and last name and the
first and last name of his or her immediate supervisor.
Q7:
SELECT
E.FNAME, E.LNAME, S.FNAME, S.LNAME
FROM
EMPLOYEE E, EMPLOYEE S
WHERE
E.SUPERSSN=S.SSN;
In this case, we are allowed to declare alternative relation names E and S,
called aliases, for the EMPLOYEE relation.
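The self-join with aliases can be sketched as follows. This is an illustration under assumptions: it runs on Python's built-in sqlite3 module, and the three employee rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, FNAME TEXT, LNAME TEXT,
                       SUPERSSN TEXT);
INSERT INTO EMPLOYEE VALUES
  ('1', 'James', 'Borg', NULL),      -- hypothetical rows
  ('2', 'Franklin', 'Wong', '1'),
  ('3', 'John', 'Smith', '2');
""")

# E plays the role of the employee, S the role of the supervisor.
supervisor_rows = conn.execute("""
SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
FROM EMPLOYEE E, EMPLOYEE S
WHERE E.SUPERSSN = S.SSN
ORDER BY E.SSN
""").fetchall()
print(supervisor_rows)
```

An employee whose SUPERSSN is NULL does not appear in the result, because the equality condition never holds for a NULL value.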
Query 8:
Make a list of all project numbers for projects that involve an employee
whose last name is Smith, either as the worker or as the manager of the
department who controls the project.
Q8:
(SELECT
PNUMBER
FROM
PROJECT, DEPARTMENT, EMPLOYEE
WHERE
DNUM=DNUMBER AND MGRSSN=SSN AND LNAME='Smith')
UNION
(SELECT
PNUMBER
FROM
PROJECT, WORKS_ON, EMPLOYEE
WHERE
PNUMBER=PNO AND ESSN=SSN
AND LNAME='Smith');
The first SELECT query retrieves the project that involves Smith as the
manager of the department who controls the project, and the second
retrieves the projects that involve Smith as a worker on the project.
Query 9:
Retrieve the social security numbers of all employees who work on project
numbers 1, 2, 3.
Q9:
SELECT
DISTINCT ESSN
FROM
WORKS_ON
WHERE
PNO IN (1, 2, 3);
Query 10:
Find the sum of all salaries of all employees, the maximum salary, the
minimum salary and the average salary.
Q10: SELECT
SUM (SALARY), MAX (SALARY), MIN (SALARY),
AVG (SALARY)
FROM
EMPLOYEE;
Query 11:
Count the number of distinct salary values in the database.
Q11: SELECT
COUNT (DISTINCT SALARY)
FROM
EMPLOYEE;
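The aggregate queries of Query 10 and Query 11 can be checked with a short sketch. It assumes Python's built-in sqlite3 module and three hypothetical salary values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (SSN TEXT, SALARY REAL)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)",
                 [("1", 30000), ("2", 40000), ("3", 30000)])  # hypothetical rows

# Query 10 style: several aggregates over the whole table in one pass.
totals = conn.execute(
    "SELECT SUM(SALARY), MAX(SALARY), MIN(SALARY), AVG(SALARY) FROM EMPLOYEE"
).fetchone()

# Query 11 style: DISTINCT removes duplicate salaries before counting.
distinct_count = conn.execute(
    "SELECT COUNT(DISTINCT SALARY) FROM EMPLOYEE"
).fetchone()[0]
print(totals, distinct_count)
```

Two employees share the salary 30000, so the distinct count is smaller than the number of rows.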
Self-Assessment Questions
15. __________________ is a derived table that doesn't have storage of
its own.
16. State whether the following statements are true or false:
a) Indexing provides faster access.
b) Indexing can be done with the help of primary key.
7.8 Summary
Let us recapitulate the important concepts discussed in this unit:
SQL statements can be categorised into four types, namely, DDL (Data
Definition Language), DML (Data Manipulation Language), DCL (Data
Control Language) and TCL (Transaction Control Language).
Working with tables requires the knowledge of selecting and creating the
database and then manipulating the tables. There are various features
of tables and the activities that can be performed on the table.
Using multiple nested queries we can deal with multiple tables using
SQL query.
A DML consists of SQL statements that are used to insert, delete and
update the records in a table.
Creating and altering the table with the constraints are the important
aspects of databases.
7.9 Glossary
7.11 Answers
Self-Assessment Questions
1. Non-procedural
2. Structured English query language
3. Tables
4. Row
5. DDL
6. DML
7. Append
8. BLOB
9. 4 GB
10. DISTINCT
11. LIKE <pattern>
12. FROM
13. DELETE FROM table WHERE condition
14. TCL
15. View
16. Answers:
a) True
b) True
Terminal Questions
1. SQL statements can be categorised into the following four types:
DDL (Data Definition Language), DML (Data Manipulation Language),
DCL (Data Control Language) and TCL (Transaction Control Language).
(Refer to Section 7.3 for further information.)
2. Working with tables needs the knowledge of selecting and creating the
database and then manipulating the tables. We have commands like
create table, update, delete, and so on. (Refer to Section 7.4 for further
information.)
3. The different kinds of joins are: simple equi-join, self-join, outer join.
(Refer to Section 7.5 for further information.)
4. The different DML commands are DELETE, UPDATE, and so on. (Refer
to Section 7.6 for further information.)
5. The general syntax to create a table is CREATE TABLE tablename
(column_1 datatype, column_2 datatype, ...); you specify the
name of the table (it should be unique) and one or more attributes and
their data types. (Refer to Section 7.7 for further information.)
Discussion Question:
1. Where is the logical flaw?
Hint: Refer to Section 7.7.
References/E-References:
References:
Er. Jain, V. K. (2008). Database Management Systems. New Delhi:
Dreamtech Press.
Elmasri, R., & Navathe, S. B. (2009). Fundamentals of Database
Systems, 5th ed. New Delhi: Pearson Education Inc.
Unit 8
Structure:
8.1 Introduction
8.2 Information Design Guidelines for Relational Databases
8.3 Levels of Relation Schema
8.4 Normalisation Based on Primary Keys
8.5 Summary
8.6 Glossary
8.7 Terminal Questions
8.8 Answers
8.9 Case Study
8.1 Introduction
In Unit 7, you studied how to create a database using SQL. In this
unit, we will study how to normalise the data in the database. As you have
already studied, normalisation is the process of building database
structures to store data, because any application ultimately depends on its
data structures. Normalisation is the formal process for deciding which
attributes should be grouped together in a relation. If the data structures are
poorly designed, the application will start from a poor foundation. This will
require a lot more work to create a useful and efficient application.
Normalisation serves as a tool for validating and improving the logical
design, so that the logical design avoids unnecessary duplication of data,
that is, it eliminates redundancy and promotes integrity. In the normalisation
process, we analyse and decompose the complex relations into smaller,
simpler and well-structured relations. Apart from the normalisation process,
in this unit, you will also study the guidelines for relational database
schema.
Objectives:
After studying this unit, you should be able to:
list the guidelines for designing relational databases
explain the levels of relational schema
elucidate the different types of normal forms
distinguish between the different types of normal forms
Figure (schema diagram):
EMPLOYEE (Emp_name, Address, Basic salary, Dept_ID (F.K))
DEPARTMENT (Dept_ID (P.K), Dept_name, Dmgr_id (F.K))
Guideline 1:
Design a relation schema so that it is easy to explain its meaning. Do not
combine attributes from multiple entity types and relationship types into a
single relation.
Reducing redundant values in tuples
Storage space is one of the most important considerations of a relational
schema. Improper grouping of attributes has a significant effect on the
storage space of the relational schema.
Figure (schema diagrams):
(Emp_ID, Emp_name, Basic salary, Address, Dept_name, Dept_loc)
(Emp_name, Basic salary, Address, Dept_ID, Dept_name, Dept_loc)
Guideline 2:
Design the database in such a way that no insertion, deletion or modification
anomalies are present in that relation. If there are any anomalies, note them
clearly, so that proper actions can be taken.
NULL values in tuples
These include unnecessary attributes in the relation. If many of the
attributes do not take any values, we insert NULL values. This can waste
space at the storage level, and it can also lead to problems in understanding
the meaning of the attributes and specifying join operation. Nulls may lead
to counting problems while using aggregate functions.
Guideline 3:
As far as possible, avoid using NULL values for attributes in a relation.
Disallowing spurious tuples
Design relational schema so that they can be joined with equality conditions.
For example:
EMP_LOC (Emp_name, P_loc)
Fig. 8.3(a)
EMP_PROJECT (Ssn, Proj_id, Proj_name, Proj_loc)
Fig. 8.3(b)
If we attempt a natural join operation on Figures 8.3(a) and 8.3(b), the result
produces many more tuples than the actual combination of tuples.
Additional tuples are called spurious tuples as they represent wrong
information.
Guideline 4:
Design relation schemas so that they can be joined with equality conditions
on attributes that are either primary key or foreign key. It guarantees that no
spurious tuples are generated.
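The effect can be reproduced with a small sketch. It assumes Python's built-in sqlite3 module and two hypothetical employees who work on different projects at the same location; the relation and attribute names follow Figures 8.3(a) and 8.3(b), and the join is written as an explicit equality on the location attributes, which are neither a primary key nor a foreign key.

```python
import sqlite3

# Hypothetical facts: (Ssn, Emp_name, Proj_id, Proj_name, Proj_loc)
original = [
    ("111", "Smith", 1, "ProductX", "Bellaire"),
    ("222", "Wong", 2, "ProductY", "Bellaire"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMP_LOC (Emp_name TEXT, P_loc TEXT);
CREATE TABLE EMP_PROJECT (Ssn TEXT, Proj_id INTEGER, Proj_name TEXT,
                          Proj_loc TEXT);
""")
conn.executemany("INSERT INTO EMP_LOC VALUES (?, ?)",
                 [(name, loc) for _, name, _, _, loc in original])
conn.executemany("INSERT INTO EMP_PROJECT VALUES (?, ?, ?, ?)",
                 [(ssn, pid, pname, loc) for ssn, _, pid, pname, loc in original])

# Join on location only.
joined = set(conn.execute("""
SELECT Ssn, Emp_name, Proj_id, Proj_name, Proj_loc
FROM EMP_PROJECT JOIN EMP_LOC ON P_loc = Proj_loc
""").fetchall())

spurious = joined - set(original)   # tuples that were never true
print(len(joined), sorted(spurious))
```

The join returns four tuples although only two facts were stored; the two extra tuples pair each employee with the other employee's Ssn and project.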
Self-Assessment Questions
1. __________________ specifies how the attribute values in a tuple
relate to one another.
2. ____________________________ are those problems that arise from
the data redundancy of the un-normalised database table.
3. _______________ may lead to counting problems while using
aggregate functions.
o It is a relation.
o It has no repeating rows.
o Each attribute value is atomic.
If a relation does not satisfy any one of the above conditions, then it is not in
1NF.
For example, the STUDENT schema having the fields as shown in Table
8.1(a).
Table 8.1(a): Relation Schema of a Student Relation
Std_id | Std_name | Class | Address | Tel no.
201 | Ranjith | | #4, Chokkanahalli, Bangalore 560074 | 26677780
202 | Shivraj | XI | | 2514890, 9885643247
304 | Lavanya | | | 25234972, 9912451356
The above table is not in 1NF because the field Tel no. is multivalued for
Std_id 202 and 304. However, if we insert a field named Mobile no. as shown
in Table 8.1(b) to keep each attribute value atomic, we may create a
null value in that field, which is not allowed. Therefore, Table 8.1(b) is not in
1NF.
Table 8.1(b)
Std_id | Std_name | Class | Address | Tel_no. | Mobile no.
201 | Ranjith | | #4, Chokkanahalli, Bangalore 560074 | 26677780 |
202 | Shivraj | XI | Andheri (east), Mumbai 400064 | 2514890 | 9885643247
304 | Lavanya | | #10, Dadra Post, Bandra (east), Mumbai 400014 | 25234972 | 9912451356
Std_id | Std_name | Class | Address
201 | Ranjith | |
202 | Shivraj | XI |
304 | Lavanya | |

Std_id | Tel_no.
201 | 26677780
202 | 2514890
202 | 9885643247
304 | 25234972
304 | 9912451356
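The decomposition into a student relation and a telephone relation can be sketched in Python. The field names std_id, std_name and tel_no are illustrative spellings of the column headings above, and only the ID, name and telephone columns are carried.

```python
# Un-normalised rows: tel_no holds a list, so the attribute is not atomic.
students = [
    {"std_id": 201, "std_name": "Ranjith", "tel_no": ["26677780"]},
    {"std_id": 202, "std_name": "Shivraj", "tel_no": ["2514890", "9885643247"]},
    {"std_id": 304, "std_name": "Lavanya", "tel_no": ["25234972", "9912451356"]},
]


def to_1nf(rows):
    """Split the multivalued attribute into its own relation keyed by std_id."""
    student_rel = [(r["std_id"], r["std_name"]) for r in rows]
    phone_rel = [(r["std_id"], tel) for r in rows for tel in r["tel_no"]]
    return student_rel, phone_rel


student_rel, phone_rel = to_1nf(students)
print(student_rel)
print(phone_rel)
```

Every cell now holds a single value, and no null is needed for a student who has only one number.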
Std_id | Project_code | Hours | Std_name | Class | Proj_name | Prof_incharge
101 | HMS1 | 20 | Ranjith Jha | 1 MBA | Hospital management system | Ms. Sahana
203 | SIM2 | 30 | Meghna Sinha | 2 MBA | Simulation of petrol bunk | Mr. Murali
303 | DM1 | 15 | Samiksha Shukla | 3 MBA | Data mining in research analysis | Mr. Benjamin
Prof_name | Prof_id | Subjects specialisation | Qualification | Dept_number | Dept_name | HOD_id
Dr. Rao | A1 | Finance | PhD | D1 | Management | H2
Dr. Ravi | A2 | Marketing | PhD | D1 | Management | H2
Prof. Sanat Sha | B1 | Computer science | MCA | D2 | IT | H1
Prof. Neena Gupta | B2 | Sociology | MA, MPhil | D3 | Arts & Humanities | H3
Figure 8.5 shows the decomposition of the above table to form 3NF.
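A decomposition of this kind can be sketched in Python. The sketch assumes that Dept_number determines Dept_name and HOD_id (a transitive dependency through Dept_number); the split shown here is an illustration under that assumption and may differ in detail from Figure 8.5.

```python
# Rows of the professor table:
# (name, prof_id, subject, qualification, dept_number, dept_name, hod_id)
rows = [
    ("Dr. Rao", "A1", "Finance", "PhD", "D1", "Management", "H2"),
    ("Dr. Ravi", "A2", "Marketing", "PhD", "D1", "Management", "H2"),
    ("Prof. Sanat Sha", "B1", "Computer science", "MCA", "D2", "IT", "H1"),
    ("Prof. Neena Gupta", "B2", "Sociology", "MA, MPhil", "D3",
     "Arts & Humanities", "H3"),
]

# Professor facts keep Dept_number as a foreign key.
professor = {r[:5] for r in rows}
# Department facts are stored once per department.
department = {r[4:] for r in rows}

# Joining on Dept_number rebuilds the original rows exactly (lossless join).
rejoined = {p + d[1:] for p in professor for d in department if p[4] == d[0]}
print(len(professor), len(department), rejoined == set(rows))
```

The department name and HOD for D1 are now stored once instead of once per professor, which removes the redundancy that causes update anomalies.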
Fig. 8.7(a)
STUD_PROG
Fig. 8.7(b)
STUD_COURSE
Fig. 8.7(c)
Figures 8.7(a), 8.7(b) and 8.7(c) are only in 1NF. To make them 2NF, we
need to remove the partial key dependencies. Therefore, we will
decompose the schema STUD_COURSE in Figure 8.7(c) into two more
schemas, namely, STUD_COURSE1 and COURSE, which are shown in
Figures 8.8 (a) and 8.8 (b), respectively.
STUD_COURSE1
COURSE
Now that we have removed the partial key dependencies, the relation is in
2NF. To make this relation into 3NF, we need to remove the transitive
dependency of the relation. Therefore, after the decomposition of relation
COURSE (Figure 8.8(b)), the normalised schemas will be as shown in
Figures 8.9(a) and 8.9(b).
COURSE1
FACULTY
Fig. 8.10(a)
PROG
Fig. 8.10(b)
Grade obtained
STUD_PROG1 (Std_ID, Prog_coordinator)
PROG (Prog_coordinator, Program)
COURSE1 (Course_code, Course_title, Faculty incharge)
FACULTY (Faculty incharge, Fac_loc)
Std_name | Sub_name
Pushpa | Maths
Pushpa | Physics

Std_name | Fac_incharge
Pushpa | Prof. Chidanand
Pushpa | Prof. Ramesh

Std_name | Sub_name | Proj_name
Pushpa | Chemistry | ProjX
Pushpa | Physics | ProjY
Kapila | History | ProjY
Kavitha | Maths | ProjZ
Kapila | English | ProjX
Kapila | Chemistry | ProjX
Pushpa | Chemistry | ProjY
8.5 Summary
Let us recapitulate the important concepts discussed in this unit:
Some criteria for good and bad relation schemas are: Semantics of the
attributes, reducing the redundant values in tuples, reducing the null
values in tuples and disallowing spurious tuples.
The two levels of relation schema are conceptual level schema and
physical level schema. Physical schema is represented with the help of
an ER diagram.
8.6 Glossary
Fully functional dependency: A functional dependency is a constraint
between two sets of attributes in a relation from a database. A functional
dependency FD: X → Y is called trivial if Y is a subset of X.
Join dependency: A join dependency is a constraint on the set of legal
relations over a database scheme. A table T is subject to a join dependency
if T can always be recreated by joining multiple tables each having a subset
of the attributes of T. If one of the tables in the join has all the attributes of
the table T, the join dependency is called trivial.
Null: In the database, Null value means having nothing in the cell; or in
other words, an empty cell.
Redundancy: Redundancy means occurrence of the repeated field in two
or more tables in a database system.
Spurious: Spurious means not genuine, authentic or true.
Tuple: A tuple is an ordered list of elements in set theory. In a database,
a collection of data in a row is called a tuple.
8.8 Answers
Self-Assessment Questions
1. Semantics
2. Update anomalies
3. Nulls
4. Entity types, relationship types
5. ER diagrams
6. Atomic
7. Answers:
a) False
b) False
c) True
d) True
e) True
Terminal Questions
1. Guideline 1: Design a relation schema so that it is easy to explain its
meaning. Do not combine attributes from multiple entity types and
relationship types into a single relation. Guideline 2: Design the
database in such a way that no insertion, deletion or modification
anomalies are present in that relation. If there are any anomalies, note
them clearly, so that proper actions can be taken. Guideline 3: As far as
possible, avoid using NULL values for attributes in a relation.
Disallowing spurious tuples: Design relational schema so that they can
be joined with equality conditions. Guideline 4: Design relation schemas
so that they can be joined with equality conditions on attributes that are
either primary key or foreign key. It guarantees that no spurious tuples
are generated. (Refer to Section 8.2 for further information.)
2. There are two levels of relation schema. They are: (1) Conceptual level
schema: This schema describes the database structures, interrelationships and constraints. The basic components of the schema are
the entity types, relationship types and attributes. (2) Physical level
schema: This schema specifies the internal storage, structures, indexes,
access paths and file organisations for the database files. Along with
this, they design application programs which are implemented as
transactions. This can be represented with the help of ER diagrams.
(Refer to Section 8.3 for further information.)
3. The different kinds of normal forms based on primary keys and
functional dependencies are 1NF, 2NF, 3NF, BCNF, 4NF and 5NF.
(Refer to Section 8.4 for further information.)
4. First Normal Form: A relation is said to be in 1NF if and only if the
attribute value is atomic. (Refer to Section 8.4 for further information.)
This relation is not in First Normal Form. Therefore, create new rows so that
each cell contains only one value.
Table 2
Still Table 2 is not in 1NF. Make std_ID and subject together as primary key
so that it can identify a tuple.
Now the relation is in 1NF.
Now consider this table. Student name and address are dependent on
std_ID which is a part of the key. But still this is not in 2NF.
Discussion Questions:
1. Why is Table 1 not in 1NF?
2. Why is Table 2 not in 2NF even though student name and address are
dependent on std_ID?
(Hint: Refer to Section 8.4, Normalisation Based on Primary Keys.)
References/E-References:
References:
Elmasri, R., & Navathe, S. B. (2009). Fundamentals of Database
Systems, 5th ed. New Delhi: Pearson Education Inc.
Er. Jain, V. K. (2008). Database Management Systems. New Delhi:
Dreamtech Press.
E-References:
http://www.cs.man.ac.uk/~horrocks/Teaching/cs2312/Lectures/Handouts/NFexamples.pdf
(retrieved on January 14, 2012)
www.Vceit.com
http://db.grussell.org/section009.html (retrieved on May 15, 2012)
Unit 9
Database Administration
Structure:
9.1 Introduction
Objectives
9.2 Transaction Processing Concepts
9.3 Transactions in Multiuser System
9.4 Desirable Properties of Transactions
9.5 Summary
9.6 Glossary
9.7 Terminal Questions
9.8 Answers
9.9 Case Study
9.1 Introduction
So far, we have discussed the various technical concepts of the database
systems with its application in an organisation for designing and analysis of
a system. In this unit, we will discuss how to administer the database in the
organisation and study the basic concepts of transaction processing
systems.
Transaction management is the ability of a database management system
to manage the various transactions that occur within the system.
A transaction is a set of program statements or a collection of operations that
forms a single logical unit of work. A database management system should
ensure that transactions are executed properly: either the entire
transaction is executed or none of its operations is
executed. This is also called an atomic operation. The DBMS should execute
the transaction in its entirety to avoid inconsistency.
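The all-or-nothing behaviour can be illustrated with a minimal sketch. It assumes Python's built-in sqlite3 module; the account table, the transfer function and the balances are hypothetical and do not come from this unit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one unit of work."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected a negative balance


ok = transfer(conn, "A", "B", 30)       # both updates are applied
failed = transfer(conn, "A", "B", 500)  # credit is undone when the debit fails
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(ok, failed, balances)
```

In the second call the credit to B has already been executed when the debit fails, yet the rollback leaves B unchanged, so the database never shows a half-finished transfer.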
In this unit, we will study various concepts of transaction processing, various
uses of transactions and the properties of transactions.
Objectives:
After studying this unit, you should be able to:
describe the basic concepts of transaction processing system
explain transactions in multiuser system
list the properties of transactions
b.
T1                      T2
Read_item(X);
X := X - N;
Write_item(X);
                        Read_item(X);
                        X := X + M;
                        Write_item(X);
Read_item(Y);
Y := Y + N;
Write_item(Y);
T1                      T2
Read_item(x);
x := x - n;
                        Read_item(x);
                        x := x + m;
Write_item(x);
Read_item(y);
                        Write_item(x);  (Item x has an incorrect value
                                        because its update by T1 is lost.)
y := y + n;
Write_item(y);
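The interleaving that loses T1's update can be traced step by step in a short simulation. The dictionary db stands in for the database, and the starting value and the amounts n and m are hypothetical.

```python
def run_interleaved(x, n, m):
    """Replay the interleaved schedule; returns the final stored value of X."""
    db = {"X": x}
    t1_x = db["X"]       # T1: read_item(X)
    t1_x = t1_x - n      # T1: X := X - N
    t2_x = db["X"]       # T2: read_item(X) still sees the old value
    t2_x = t2_x + m      # T2: X := X + M
    db["X"] = t1_x       # T1: write_item(X)
    db["X"] = t2_x       # T2: write_item(X) overwrites T1's update
    return db["X"]


def run_serial(x, n, m):
    """T1 runs to completion before T2 starts."""
    return x - n + m


print(run_interleaved(80, 5, 4), run_serial(80, 5, 4))
```

With X = 80, N = 5 and M = 4, the serial result is 79, but the interleaved schedule stores 84 because the subtraction made by T1 is overwritten.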
2. Dirty read problem: This problem occurs when one transaction
updates a database item and then the transaction fails for some reason.
The updated item is accessed by another transaction before it is
changed back to its original value.
For example, T1 updates item x and then fails before completion, so the
system must change x back to the original value. Before it can do so,
transaction T2 reads the temporary value of x, which will not be
recorded permanently in the database, because of the failure of T1. The
value of item x that is read by T2 is called Dirty Data, because it has
been created by a transaction that has not been completed and
committed yet. Hence, this problem is also known as the temporary
update problem.
T1                      T2
Read_item(x);
x := x - n;
Write_item(x);
                        Read_item(x);
                        x := x + m;
                        Write_item(x);
Read_item(y);
3. Incorrect summary problem: If one transaction is calculating an
aggregate summary function on a number of records while other
transactions are updating some of these records, the aggregate function
may calculate some values before they are updated and others are
calculated after they are updated.
For example, Transaction T3 is calculating the total number of
reservations on all the flights while transaction T1 is executing. T3 reads
the values of x after n seats have been subtracted from it, but reads the
value of y before those n seats have been added to it.
T1                      T3
                        Sum := 0;
                        Read_item(A);
                        Sum := Sum + A;
Read_item(x);
x := x - n;
Write_item(x);
                        Read_item(y);
Read_item(y);
y := y + n;
Write_item(y);
Why is recovery needed?
A major responsibility of the database administrator is to prepare for the
possibility of hardware, software, network and system failure. It is usually
desirable to recover the databases and return to normal operation as quickly
as possible. Recovery should proceed in such a manner that it protects the
database and the users from unnecessary problems.
Whenever a transaction is submitted to a DBMS for execution, the system is
responsible for making sure that either:
1. All the operations in the transactions are completed successfully and
their effects are recorded permanently in the database, or
2. The transaction has no effect on the database; this may happen if a
transaction fails after executing some of its operations, but before
executing all of them.
Types of failures
Disk failure: Some disk blocks may lose their data because of read or
write malfunctions.
End transaction: This specifies that the read and write transaction
operations have ended, and this marks the end of the transaction
execution. At this point it may be necessary to check whether the
changes can be permanently applied to the database or aborted.
Fig. 9.2: State Transition Diagram Illustrating the States for Transaction
Execution
Figure 9.2 shows a state transition diagram that describes how a transaction
moves through its execution states. A transaction goes into an active state
immediately after it starts execution, where it can issue read and write
operations. When the transaction ends, it moves to the partially committed
state. At this point, some recovery protocols need to ensure that there is no
system failure. Once this check is successful, the transaction is said to have
reached its commit point and enters the committed state.
However, a transaction can go to the failed state if one of the checks fails or
if the transaction is aborted during its active state. The transaction may then
have to be rolled back to undo the effect of its write operations on the
database. The terminated state corresponds to the transaction leaving
the system.
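The state transitions described above can be written as a small lookup table. This is a sketch only: the event names (end_transaction, commit, abort, terminate) are illustrative labels chosen here, not terms fixed by Figure 9.2.

```python
# (current state, event) -> next state
TRANSITIONS = {
    ("active", "end_transaction"): "partially committed",
    ("active", "abort"): "failed",
    ("partially committed", "commit"): "committed",
    ("partially committed", "abort"): "failed",
    ("committed", "terminate"): "terminated",
    ("failed", "terminate"): "terminated",
}


def run(events, state="active"):
    """Apply events in order, starting from the active state."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{event!r} is not allowed in state {state!r}")
        state = TRANSITIONS[key]
    return state


print(run(["end_transaction", "commit"]))
print(run(["abort", "terminate"]))
```

A committed transaction cannot be aborted in this model, so run(["end_transaction", "commit", "abort"]) raises a ValueError.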
Static and dynamic files: Static files are those files on which update
operations are performed very rarely. Dynamic files, in contrast, are
updated constantly and may change frequently.
For example, the master file of any database is a static file whereas a
transaction file is a dynamic file.
The transaction file can retrieve the records from the master file and the
entire update operations take place in transaction file.
Self-Assessment Questions
1. ____________________ is an atomic unit comprising one or more
SQL statements.
2. ____________________ users can access databases and use
computer systems simultaneously.
3. ________________________________ occurs when one transaction
updates a database item and then the transaction fails for some
reason.
4. State whether the following statements are true or false:
a) Whenever a transaction is submitted to a DBMS for execution, the
system is responsible for making sure that all the operations in the
transactions are completed successfully and their effects are
recorded permanently in the database.
b) System crash occurs when some operation in the transaction may
cause failure in the system.
c) Local errors refer to a list of problems that includes power or air
conditioning failure, fire, theft.
d) Static files are those on which the update operations are done
every day.
T1              T2              T3
Write (x)
Commit
                Write (y)
                                Read (y)
                Read (z)
                Commit
                                Read (z)
                                Commit
H1 = {W2(x), R1(x), R3(x), W1(x), C1, W2(y), R3(y), R2(z), C2, R3(z), C3}
Here, R1, R2 and R3 are the read operations of T1, T2 and T3; W1, W2
and W3 are the write operations of T1, T2 and T3; C1, C2 and C3 are the
COMMIT operations of T1, T2 and T3.
Recoverable cascading rollback - Recoverability is the ability to recover
data after a transaction failure. In a recoverable schedule, committed
transactions have not read data written by aborted transactions. This is
because a transaction commits only after every transaction whose changes
it has read has ended with a COMMIT.
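The recoverability condition can be checked mechanically. The sketch below assumes a schedule is given as a list of (operation, transaction, item) triples with operations R, W and C; the function name is_recoverable is hypothetical, and aborts are deliberately left out to keep the example short.

```python
def is_recoverable(schedule):
    """True if every transaction commits only after the transactions
    it read from have committed."""
    last_writer = {}     # item -> transaction that wrote it most recently
    reads_from = set()   # (reader, writer) pairs
    committed = set()
    for op, txn, item in schedule:
        if op == "W":
            last_writer[item] = txn
        elif op == "R":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                reads_from.add((txn, writer))
        elif op == "C":
            for reader, writer in reads_from:
                if reader == txn and writer not in committed:
                    return False   # txn commits before a writer it read from
            committed.add(txn)
    return True


# Schedule H1 from the text, encoded as triples.
H1 = [("W", 2, "x"), ("R", 1, "x"), ("R", 3, "x"), ("W", 1, "x"), ("C", 1, None),
      ("W", 2, "y"), ("R", 3, "y"), ("R", 2, "z"), ("C", 2, None),
      ("R", 3, "z"), ("C", 3, None)]
# A shorter schedule in which the writer commits first.
H2 = [("W", 2, "x"), ("R", 1, "x"), ("C", 2, None), ("C", 1, None)]
print(is_recoverable(H1), is_recoverable(H2))
```

Under this definition the checker reports H1 as not recoverable, because T1 reads x written by T2 and commits before T2 does; moving C2 ahead of C1, as in H2, satisfies the condition.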
For example, consider the below schedules may be termed as examplerecover
9.5 Summary
Let us recapitulate the important concepts discussed in this unit:
The basic database access operations are Read-item (x) and Write-item
(x).
9.6 Glossary
Buffer: It is a temporary storage area, usually in RAM. The purpose of
most buffers is to act as a holding area, enabling the CPU to manipulate
data before transferring it to a device.
Cascading rollback: A cascading rollback occurs in database systems
when a transaction (T1) causes a failure and a rollback must be performed.
Other transactions dependent on T1's actions must also be rolled back due
to T1's failure, thus causing a cascading effect. That is, one transaction's
failure causes many to fail.
Catastrophe: It is the mathematical basis for the study of large changes in
a total system which may result from small changes in a critical variable in
the system.
Concurrency: It refers to acting together, as agents or circumstances or
events.
Consistency: It refers to reliability or uniformity of successive results or
events.
Deadlock: A deadlock is a situation in which two or more competing actions
are each waiting for the other to finish, and thus neither ever does.
Durability: Durability refers to the ability of the system to recover committed
transaction updates if either the system or the storage media fails.
Integrity: Integrity constraints guard against accidental damage to the
database, by ensuring that authorised changes to the database do not result
in a loss of data consistency.
Variable: It refers to a logical set of attributes. These can be changed with
respect to time.
9.8 Answers
Self-Assessment Questions
1. Transaction
2. Multiple
3. Dirty read problem
4. Answers:
a) True
b) False
c) False
d) True
5. Atomicity
6. ACID
7. Schedule history
8. Answers:
a) True
b) False
Terminal Questions
1. The basic database access operations are Read-item (x) and
Write-item (x). Read-item(x) reads a database item named x into a program
variable and Write-item writes the value of the program variable x into
the database. We have a few steps for each of the types of operations.
(Refer to Section 9.2 for further information.)
2. The different types of failures are computer failure (system crash),
transaction or system error, local errors or exception conditions
detected by the transaction, concurrency control enforcement, disk
failure and physical problems and catastrophes. (Refer to Section 9.2
for further information.)
3. A multiuser system is one in which multiple users operate on the
database at the same time. (Refer to Section 9.3 for further information.)
4. To ensure data integrity, the database management system should
maintain the transaction properties. These are often called the ACID
properties. ACID stands for Atomicity, Consistency,
Isolation and Durability. (Refer to Section 9.4 for further information.)
Discussion Questions:
1. Is recovery addressed in this case? If not, how can it be addressed?
2. Is deadlock manager needed and why?
(Hint: Refer to the article at
http://itlab.uta.edu/sharma/PPL/ThesisWeb/hks_thesis.pdf)
References/E-References:
References:
E-References:
http://itlab.uta.edu/sharma/PPL/ThesisWeb/hks_thesis.pdf (retrieved on
25th March 2012)
http://www.google.co.in/url?sa=t&rct=j&q=case%20study%20on%20mult
iuser%20system%20in%20dbms&source=web&cd=5&cad=rja&ved=0C
EQQFjAE&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fd
ownload%3Fdoi%3D10.1.1.206.2749%26rep%3Drep1%26type%3Dpdf
&ei=RLakUKX9I-eviQe994GYDg&usg=AFQjCNFOQnv9YPfHTx2
Lcc7Op0slIjsCsw (retrieved on 25th march 2012)
Unit 10
Structure:
10.1 Introduction
10.2 Client-Server Databases
10.3 Concurrency Management
Types of locks: Locking technique for concurrency control
The two-phase locking protocol
10.4 Distributed Database Management System
10.5 Heterogeneous and Homogeneous Systems
10.6 Summary
10.7 Glossary
10.8 Terminal Questions
10.9 Answers
10.10 Case Study
10.1 Introduction
In Unit 9, you studied how to process transactions. In this
unit, we will discuss client-server databases. A company typically uses a
Local Area Network (LAN) to connect its computers and to share
resources and peripherals, mainly the PCs that act as servers and the
printers. A LAN connects computers that are physically close to
each other and to the shared computer. In this unit, you will study the
major concepts of data warehousing and query processing. You will also
study some of the main techniques used to control concurrent
execution of transactions, which are based on the concept of locking data
items. A lock is a restriction on access to data in a multiuser environment. It
prevents multiple users from changing the same data simultaneously. If
locking is not used, data within the database may become logically incorrect
and may produce unexpected results. In addition, we will discuss
heterogeneous and homogeneous systems.
In order to meet the needs of present information systems, everybody
would like to access a company's databases. The company database may
include the details about the employees, customers, suppliers and vendors
of various resources. Placing the data of each sector in an individual system
and maintaining the integrity of the system is not a practical expectation.
Client: The client is the machine (workstation or PC) running the
front-end applications. It interacts with a user through the keyboard, display
and mouse. The client has no direct data access responsibilities. The
client machine provides front-end application software for accessing the
data on the server. The clients initiate transactions and the server
processes the transactions.
Interaction between the client and the server might be processed in the
following ways while processing an SQL query:
a) The client parses a user query and decomposes it into a number of
independent site queries. Each site query is sent to the appropriate
server site.
b) Each server processes the local query and sends the resulting relation
to the client site.
c) The client site combines the results of the queries to produce the result
of the originally submitted query.
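Steps (a) to (c) can be sketched as a toy simulation. Everything named here is an assumption made for illustration: the two sites, their rows, and the functions server_process and client_query; a real client would send SQL text over the network rather than call a local function.

```python
# Hypothetical fragments of an employee relation stored at two server sites.
SERVERS = {
    "site_a": [("Smith", "Research"), ("Wong", "Research")],
    "site_b": [("Zelaya", "Administration"), ("English", "Research")],
}


def server_process(site, dept):
    """Step (b): a server evaluates the site query on its own data."""
    return [row for row in SERVERS[site] if row[1] == dept]


def client_query(dept):
    """Client side: decompose, dispatch, then combine."""
    site_queries = {site: dept for site in SERVERS}             # step (a)
    partial = [server_process(site, d)                          # step (b)
               for site, d in site_queries.items()]
    return sorted(row for part in partial for row in part)      # step (c)


print(client_query("Research"))
```

The user sees one combined answer and does not need to know which site held which rows, which is the distribution transparency described in this section.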
Thus, the server is called database processor or back-end machine,
whereas the client is called application processor or front-end machine.
Another function controlled by the client is that of ensuring consistency of
replicated copies of a data item by using distributed concurrency control
techniques. The client must also ensure the atomicity of global transactions
by performing global recovery when certain sites fail. It provides distribution
transparency, which means the client hides the details of data distribution
from the user.
Server: The server is a machine that is referred to as the back end. The
server processes SQL and other query statements received from client
applications. It can have large disk capacity and fast processors.
Network: The network enables remote data access through client-server
and server-to-server communication.
Each computer in a network is a node, and it acts as a client, a server, or
both, depending on the situation.
Advantages
Client applications are not dependent on physical location of the data. If
the data is moved or distributed to other database servers, the
application continues to function with little or no modification.
Sikkim Manipal University
{T1, T2, T3, ..., Tn} such that T1 is waiting for a data item held by T2, T2
for T3, and so on, and Tn is waiting for T1. In this state none of the
transactions can progress.
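The circular wait described above can be detected by following the chain of "waits for" links until it either ends or returns to its starting transaction. The sketch below is an illustration only; the function find_deadlock and the dictionary layout are assumptions, and it supposes that each transaction waits for at most one other transaction.

```python
def find_deadlock(waits_for):
    """Return the transactions that form a circular wait, or None.

    waits_for maps a transaction name to the transaction it is waiting for.
    """
    for start in waits_for:
        path = [start]
        current = waits_for.get(start)
        # Follow the chain until it ends or revisits a transaction.
        while current is not None and current not in path:
            path.append(current)
            current = waits_for.get(current)
        if current == start:
            return path  # the chain came back to where it began
    return None


cycle = find_deadlock({"T1": "T2", "T2": "T3", "T3": "T1"})
no_cycle = find_deadlock({"T1": "T2", "T2": "T3"})
```

A lock manager that finds such a cycle typically aborts one of the transactions in it so that the others can proceed.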
Self-Assessment Questions
5. State whether the following statements are True/False:
a) Concurrency control helps to solve the problems that are caused
by multiple users using the same data at the same time
b) A binary lock has one state and the value of lock on X is 1.
6. The two-phase locking protocol has two phases, and they are
____________________ and ___________________.
7. In __________________ state there exists a set of transactions in
which every transaction in the set is waiting for another transaction in
the set.
If you observe Figures 10.1(a) and 10.1(b), you will see that in Figure
10.1(a) there is a single database, which all the connected systems share
through the communication network in a centralised manner. In Figure
10.1(b), several databases are attached to different systems and are
interlinked through a communication network.
From the above two figures, we come to know that in centralised database
system the data is stored in one place and all the systems share the data
which is present in a single place; whereas in distributed database system,
data is present in various places and are interlinked logically. So any system
connected to the network can have access to the required data. In this way,
the fear of data loss due to system failure or database failure is reduced.
Advantages of DDBMS:
Local control - A DDBMS gives each local site more control over its own
data, which it can manage rigorously. This improves data integrity and
administration. Users can still access non-local data when
needed.
Faster response - Data is mostly stored at the site where it is used.
This helps the system respond faster.
Disadvantages of DDBMS:
Software cost and complexity - A DDBMS requires complex software
to keep the sites on the network working together, which increases the cost.
logical units called fragments, and they may be assigned for storage at
various sites. In a DDBMS, decisions must be made regarding which site
should be used to store which portions of the database. There are two types
of fragmentation:
1. Horizontal fragmentation
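The idea of breaking a relation into fragments can be sketched as follows. Horizontal fragmentation keeps whole rows that satisfy a condition; vertical fragmentation, the second type named in the unit summary, keeps selected columns together with the key. The EMPLOYEE rows and both helper functions are hypothetical and serve only as an illustration.

```python
# Hypothetical EMPLOYEE relation, one dictionary per row.
EMPLOYEE = [
    {"emp_id": 1, "name": "Asha", "city": "Manipal", "salary": 50000},
    {"emp_id": 2, "name": "Ravi", "city": "Gangtok", "salary": 42000},
    {"emp_id": 3, "name": "Meena", "city": "Manipal", "salary": 61000},
]


def horizontal_fragment(rows, predicate):
    """Keep complete rows that satisfy the condition."""
    return [row for row in rows if predicate(row)]


def vertical_fragment(rows, key, columns):
    """Keep only the chosen columns, plus the key so rows can be rejoined."""
    return [{c: row[c] for c in (key, *columns)} for row in rows]


# Rows that could be stored at a Manipal site.
manipal_rows = horizontal_fragment(EMPLOYEE, lambda r: r["city"] == "Manipal")
# Salary data that could be stored at a payroll site.
pay_columns = vertical_fragment(EMPLOYEE, "emp_id", ["salary"])
```

Keeping the key in every vertical fragment is what allows the original relation to be reconstructed by joining the fragments.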
10.6 Summary
Let us recapitulate the important concepts discussed in this unit:
Client-server database is an arrangement of personal computers
connected through a communication medium, in which the computers are
connected through a LAN. The client-server model is basic to distributed
systems. The client-server model consists of three parts: Client, Server
and Network.
Concurrency control helps to solve the problems that are caused by
multiple users using the same data at the same time. The locking
technique for concurrency control begins here. A binary lock can have
two states or values: locked and unlocked (or 1 and 0, for simplicity).
DDBMS is based on decentralisation. In a DDBMS the data is stored on
multiple computers, and the data at the different sites is logically interrelated.
The two types of fragmentation are horizontal fragmentation and vertical
fragmentation.
If all servers (or individual local DBMSs) use identical software and all
users use identical software, the DDBMS is called homogeneous;
otherwise, it is called heterogeneous.
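As a reminder of the binary lock recapped in this summary, the two states (1 for locked, 0 for unlocked) can be sketched as below. The class BinaryLock and its method names are hypothetical; a real lock manager would make a refused transaction wait in a queue rather than simply return False.

```python
class BinaryLock:
    """A lock on one data item X with two states: 0 (unlocked) and 1 (locked)."""

    def __init__(self):
        self.state = 0
        self.holder = None

    def lock(self, txn):
        """Grant the lock if it is free; otherwise tell the caller to wait."""
        if self.state == 1:
            return False
        self.state = 1
        self.holder = txn
        return True

    def unlock(self, txn):
        """Only the transaction that holds the lock may release it."""
        if self.holder != txn:
            raise ValueError("only the holder can unlock the item")
        self.state = 0
        self.holder = None


lock_x = BinaryLock()
first = lock_x.lock("T1")   # granted, X is now locked
second = lock_x.lock("T2")  # refused, T2 has to wait
```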
10.7 Glossary
Data warehousing: A data warehouse is a relational database that is
designed for query and analysis rather than for transaction processing.
LAN: A Local Area Network (LAN) is a computer network that interconnects
computers in a limited area such as a home, school, computer laboratory or
office building using network media.
Locks: Locks ensure that data shared by conflicting operations is
accessed by one operation at a time; they are a simple way of achieving serialisation.
Multitasking: It refers to the ability to execute more than one task at the
same time, a task being a program.
Query processing: Query processing is a process that turns user queries
and data modification commands into a query plan, that is, a sequence of
operations (or algorithm) on the database.
SQL: Structured Query Language is a special-purpose programming language
designed for managing data in Relational Database Management Systems
(RDBMS). Originally based on relational algebra and tuple relational
calculus, its scope includes data insert, query, update and delete, schema
creation and modification and data access control.
Transaction: A transaction is a logical unit of work that reads or changes
data in the database; it is either completed in full or not carried out at all.
Workstation: Normally, a personal computer that is connected to the
central server is termed a workstation. There can be many workstations
connected to a server.
10.9 Answers
Self-Assessment Questions
1. Network
2. Client
3. Client, server
4. Data warehouse
5. Answers:
a) True
b) False
6. Growing phase, shrinking phase
7. Deadlock
8. Decentralisation
9. Faster response
10. Local control
11. Answers:
a) False
b) True
12. Horizontal fragmentation
13. Replication
14. Allocation
15. True
Terminal Questions
1. Client-server database is a type of arrangement of personal computers
through a communication medium. These computers are connected
through a LAN. A LAN is used to connect computers that are located
near each other. (Refer to Section 10.2 for further information.)
2. Binary locks, shared locks and exclusive locks. (Refer to Section 10.3.1
for further information.)
3. In centralised database system the data is stored in one place and all
the systems share the data which is present in a single place, whereas
in distributed database system data is present in various places and
are interlinked logically. So, any system connected to the network can
have access to the required data. In this way, the fear of data loss due
to system failure or database failure is reduced. (Refer to Section 10.4
for further information.)
needed data. This actually defeats the whole purpose of fragmentation; the
ultimate goal of fragmentation is to reduce retrieval time by accessing
directly from the fragment that contains the data we need.
(Source: http://www.ibm.com/developerworks/data/zones/informix/library/techarticle/0206fan/0206fan.html#section2)
Discussion Questions:
1. What shall be done to increase the efficiency?
2. Will eliminating tables help? If so how?
(Hint: Fragmentation is a new feature for Informix Dynamic Server
version 7 and above. If used properly, it will improve overall Informix
Dynamic Server performance significantly; but if used without care, it
may adversely affect performance.)
References/E-References
References
Elmasri, R., & Navathe, S. B. (2009). Fundamentals of Database
Systems, 5th ed. New Delhi: Pearson Education Inc.
Er. Jain, V. K. (2008). Database Management Systems. New Delhi:
Dreamtech Press.
E-References
http://www.wiziq.com/tutorial/225006-Transaction-Processing-Systemin-DBMS (Retrieved on 10th November 2012)
http://media.wiley.com/product_data/excerpt/79/EHEP0003/EHEP000379.pdf (Retrieved on 12th November 2012)
http://docs.oracle.com/cd/B10501_01/server.920/a96520/concept.htm#50413 (Retrieved on 15th November 2012)
http://msdn.microsoft.com/en-us/library/orm-9780596521301-02-08.aspx (Retrieved on 15th November 2012)
http://www.agiledata.org/essays/concurrencyControl.html (Retrieved on 15th November 2012)
http://www.cs.wmich.edu/~yang/tlt/cs643/applets/twopc/locktext.html (Retrieved on 18th January 2012)
http://docs.oracle.com/cd/B10501_01/server.920/a96520/concept.htm#50413 (Retrieved on 18th January 2012)
http://www.ibm.com/developerworks/data/zones/informix/library/techarticle/0206fan/0206fan.html#section2 (Retrieved on 28th January 2012)
Unit 11
Controls
Structure:
11.1 Introduction
Objectives
11.2 Atomicity
11.3 Recovery Techniques
Deferred update
Immediate update
11.4 Security, Backup and Recovery
11.5 Summary
11.6 Terminal Questions
11.7 Answers
11.8 Case Study
11.1 Introduction
In the previous unit, you studied the importance of distributed database
management system and its types. A distributed database system is more
exposed to hacking, failures and similar problems than a centralised one,
and the flow of data may also be diverted. A computer system, like any other
mechanical or electrical device, is subject to failure. Common causes of
such failure are disk crashes, power failures, software errors and so on. In
each of these cases, information may be lost. Therefore, the database system
is responsible for restoring the database to the consistent state that
existed just before the failure. To restore this state, the DBMS must keep
information about the changes made by the various transactions in the
system log.
Block disk cache - Cached values on disk are stored by Block Disk
Cache. Like the Indexed Disk Cache, it keeps the keys in memory. The
block disk cache stores the values in a group of fixed size blocks.
In this unit, you will study the different recovery techniques in the database.
You will study in detail about the security and backup feature in a database.
Objectives
After studying this unit, you should be able to:
define atomicity
identify different recovery techniques
explain security and backup features in database
11.2 Atomicity
Atomicity is the rule of ALL or NONE: either every part of a transaction is
applied to the database or none of it is. If any one part of the transaction
fails, the whole transaction fails, and such a transaction is said to be an
atomic transaction. A very critical characteristic of database management is
that it has to maintain the atomic nature of transactions. An example of an
atomic transaction is that of ordering a plane ticket. There are two actions
involved in this transaction, paying for the seat and reserving it. Either
the customer pays for the seat and thus reserves it, or he/she does not pay
for it and does not reserve it.
For a simpler example, let us assume you want to subtract 15 apples from
basket A and add 10 apples to basket B. This is a valid transaction. Assume
that you have removed 15 apples from basket A and your transaction is
aborted due to some error. Then you cannot add apples to basket B. In this
case, the whole transaction is cancelled.
Therefore, atomicity means indivisibility.
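The apple example can be sketched in Python to show the ALL or NONE rule. This is a simplified illustration, not how a DBMS implements atomicity: the function move_apples, the starting basket contents and the fail_midway switch are all assumptions made for the example.

```python
def move_apples(baskets, remove_count, add_count, fail_midway=False):
    """Take apples from basket A and add apples to basket B as one unit."""
    snapshot = dict(baskets)  # remembered so that partial work can be undone
    try:
        if baskets["A"] < remove_count:
            raise ValueError("not enough apples in basket A")
        baskets["A"] -= remove_count
        if fail_midway:
            raise RuntimeError("simulated error after the first step")
        baskets["B"] += add_count
        return True
    except Exception:
        # NONE: put both baskets back exactly as they were.
        baskets.clear()
        baskets.update(snapshot)
        return False


baskets = {"A": 20, "B": 5}
ok = move_apples(baskets, 15, 10, fail_midway=True)
# ok is False and baskets is unchanged: {"A": 20, "B": 5}
```

Because the failed attempt leaves no trace, the same call can simply be repeated later without the baskets ever being seen in a half-updated state.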
Self-Assessment Questions
1. Atomicity means ________________________.
2. ____________________rule says that if any one part of the transaction
fails, the whole transaction fails.
3. _______________________ is responsible for making all the data
modifications permanent in the database.
recorded in the database on the disk (after a failure has occurred, the
recovery subsystem consults the log to determine which transactions need
to be redone). Transaction (Ti) needs to be redone if and only if the log
contains both the record <Ti Start> and the record <Ti commit>. Thus,
information in the log is used in restoring the system to a previous
consistent state.
Hence, it is also known as No-undo/redo algorithm.
For example, consider a transaction T0 that transfers Rs. 50 from account A
to account B.
This is defined as follows:
T0: Read (A)
A := A - 50
Write (A)
Read (B)
B := B + 50
Write (B)
The log and the database after T0 commits are as follows:
Log: <T0 start>, <T0, A, 950>, <T0, B, 2050>, <T0 commit>
Database: A = 950, B = 2050
When the system comes back up, the operation redo (T0) is performed,
since the record <T0 commit> appears in the log on the disk. After this
operation is executed, the values of accounts A and B are Rs. 950 and Rs.
2,050. The value of account C remains Rs. 700. The log records of the
incomplete transaction T1 can be deleted from the log.
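The recovery step under deferred update can be sketched as a single forward pass over the log that redoes only committed transactions. The tuple layout of the log records, the function recover_deferred, the starting balances of A and B and the value that T1 tried to write to C are assumptions made for this illustration.

```python
def recover_deferred(database, log):
    """No-undo/redo recovery: reapply the writes of committed transactions."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _txn, item, new_value = rec
            database[item] = new_value
    return database


# State on disk at the time of the crash; nothing had been written yet
# because updates are deferred until commit.
database = {"A": 1000, "B": 2000, "C": 700}
log = [
    ("start", "T0"),
    ("write", "T0", "A", 950),
    ("write", "T0", "B", 2050),
    ("commit", "T0"),
    ("start", "T1"),
    ("write", "T1", "C", 600),  # T1 never committed, so this is ignored
]
recover_deferred(database, log)
# database is now {"A": 950, "B": 2050, "C": 700}
```

No undo step is needed because an uncommitted transaction never touched the database in the first place.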
11.3.2 Immediate update
This method updates the database without waiting to reach the commit
point, that is, when a transaction issues an update command, the database
can be updated immediately. An update operation must be recorded in the
log before it is applied to the database. If a transaction fails after recording
some changes in the database, but before reaching its commit point, the
effect of the transaction on the database must be undone (rolled back). During
recovery, we have to redo the updates of committed transactions and undo
(roll back) the effects of uncommitted transactions. So in the case of the
immediate update technique, both undo and redo operations are required
during recovery. Hence, the immediate update technique is known as the
UNDO/REDO algorithm.
Now we shall take up the algorithm for UNDO/REDO scheme.
Step 1: Redo all transactions for which the log has both start and
commit entries.
Step 2: Undo all transactions for which the log has start entry but no
commit entry.
Undo (Ti) restores the value of all data items updated by transaction
(Ti) to the old values.
Redo (Ti) sets the value of all data items updated by transaction (Ti)
to the new values.
After a failure has occurred, the recovery scheme consults the log to
determine which transactions need to be undone and which need to be
redone. This classification of transaction is accomplished as follows:
Transaction (Ti) needs to be undone if the log contains the record
<Ti start> but does not contain the record <Ti commit>.
Transaction (Ti) needs to be redone if the log contains both the record
<Ti start> and the record <Ti commit>. The changes made by such
transactions are written back to the database.
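The classification above can be sketched as a small recovery routine. Each write record here carries both the old and the new value, which is what makes undo possible. The record layout, the function recover_immediate and the sample values are assumptions for illustration, and the sketch supposes that an uncommitted transaction and a committed one never wrote the same item.

```python
def recover_immediate(database, log):
    """UNDO/REDO recovery for the immediate update technique."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # Step 1: redo committed transactions, scanning the log forwards.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _txn, item, _old, new = rec
            database[item] = new

    # Step 2: undo uncommitted transactions, scanning the log backwards.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, _txn, item, old, _new = rec
            database[item] = old
    return database


# State on disk at the crash: A was already updated, B was not yet written,
# and the uncommitted T1 had already changed C.
database = {"A": 950, "B": 2000, "C": 600}
log = [
    ("start", "T0"),
    ("write", "T0", "A", 1000, 950),
    ("write", "T0", "B", 2000, 2050),
    ("commit", "T0"),
    ("start", "T1"),
    ("write", "T1", "C", 700, 600),
]
recover_immediate(database, log)
# database is now {"A": 950, "B": 2050, "C": 700}
```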
Legal and ethical issues - These concern the right to access certain
information. Some information must not be accessible to
unauthorised users, and accessing it may be illegal or unethical. Access to
such information is governed by numerous laws.
There may be many reasons for the failure of data transmission. The
transmission may fail because of system crash, errors in local systems,
transmission errors, catastrophes or concurrency control enforcement. The
main purpose of recovery process is to recover the data during any failure
without losing any part of the data. The recovery process can be done by
DBMS automatically or through restoring from backup copies by users.
DBMS has metadata, manipulation language and construct to meet the
responsibilities of recovery system. The components included in DBMS are
data definition language, query optimisation algorithm, performance
monitoring functions and recovery and concurrency mechanism. It is an
important job of a DBMS to respond to user requests at the right time and to
the right person.
Self-Assessment Questions
8. ______________________ issue in security is regarding the right to
access certain information.
9. State whether the following statements are True or False:
a) There are some kinds of information which are not supposed to be
publicly available like inpatient medical records or bank statement of
an account holder. Such issues are dealt with as legal and ethical
issues.
b) Security has to be enforced in different levels of the system. Such
issues are handled as system-related issues.
11.5 Summary
Let us recapitulate the important concepts discussed in this unit:
Atomicity is the rule of ALL or NONE: either every part of a transaction is
applied to the database or none of it is.
Recovery is the process of restoring the database to a consistent state
after a failure, for example from the system log or from a backup copy.
Deferred update defers or postpones any actual updates to the
database until the transaction completes its execution successfully.
Immediate update method updates the database without waiting to
reach the commit point, that is, when a transaction issues an update
command, the database can be updated immediately.
Security is a broad area which has to be addressed in many ways.
Some of them are legal and ethical issues, policy issues and system-related issues.
11.7 Answers
Self-Assessment Questions
1. Indivisibility
2. Atomic transaction
3. Transaction commit
4. Answers:
a) True
b) False
5. Deferred update and immediate update
6. No-undo/redo algorithm
7. Immediate update technique
8. Legal and ethical
9. Answers:
a) False
b) True
Terminal Questions
1. (Refer to Section 11.1 for further information.)
2. (Refer to Section 11.2 for further information.)
3. (Refer to Section 11.3 for further information.)
Unit 12
Distributed Databases
Structure:
12.1 Introduction
Objectives
12.2 Overview of Distributed Database (DDB) System
Client-server model
12.3 Features of DDB
12.4 Advantages and Disadvantages of DDB
12.5 Data Replication
12.6 Data Fragmentation
12.7 Summary
12.8 Glossary
12.9 Terminal Questions
12.10 Answers
12.11 Case Study
12.1 Introduction
In the 1980s, Distributed Database (DDB) systems evolved to
overcome the limitations of centralised database management systems and
to cope with the rapid changes in communication and database
technologies. This unit introduces the fundamentals of distributed database
systems. The benefits and limitations of distributed DBMS over centralised
DBMS are briefly discussed. The objectives of a distributed system, the
components of a distributed system and the functionality provided by a
distributed system are also described in this unit.
In this unit we will study the fundamentals of distributed databases and the
features of distributed DBMSs. The pros and cons of distributed DBMSs
are discussed with an example of a distributed database system. The
classification of distributed DBMSs is explained and the functions of a
distributed DBMS are introduced. We will also illustrate the components of a
distributed database system and discuss Date's 12 objectives for
distributed database systems.
Objectives:
After studying this unit, you should be able to:
describe DDB system
Data tracing - A DDBMS should have the ability to keep track of the data
distribution, fragmentation and replication by maintaining the DDBMS
catalogue.
It provides a complete view of your data. For example, you can query for
the number of customers worldwide or the worldwide inventory level of a
product.
Centralised database
Self-Assessment Questions
1. ____________________ is a set of databases stored on multiple
computers but it appears to a user as a single database.
2. Which of the following functions of a distributed database keeps track
of the data distribution, fragmentation and replication by
maintaining the DDBMS catalogue?
a. Distributed query processing
b. Data tracing
c. Distributed database recovery
d. Security
Each DBMS at a local site can handle its data independently.
All the above advantages can be brought down to the following list in brief:
Data is located near the site that has the greatest demand.
Communication is improved.
Disadvantages
12.7 Summary
Let us recapitulate the important concepts discussed in this unit:
Fragmentation techniques break up the database into logical units
called fragments, which may be assigned for storage at the various sites.
12.8 Glossary
Server: A server is a system (software and suitable computer hardware)
that responds to requests across a computer network to provide, or help to
provide, a network service.
DMS software: A document management system (DMS) is a computer
system (or set of computer programs) used to track and store electronic
documents.
Network: A network is a group of two or more computer systems linked
together.
Fragments: Fragments are the logical units into which a distributed database
is broken up so that they can be assigned for storage at the various sites.
Website: A website is a set of related web pages served from a single web
domain.
Communication: Communication is the exchange and flow of information
and ideas from one person to another; it involves a sender transmitting an
idea, information, or feeling to a receiver.
12.10 Answers
Self-Assessment Questions
1. Distributed Database (DDB)
2. Data tracing
3. Not efficient.
4. true
5. false
6. Reliability
7. false
8. degree of homogeneity
9. Replication
10. Horizontal fragmentation, Vertical fragmentation and Mixed fragmentation
Terminal Questions
1. Basic functions performed by DDBMS are Distributed query
processing, Data tracing, Distributed transaction management,
Distributed database recovery, Security and Distributed directory
(catalogue) management. (Refer to Section 12.2, Overview of
Distributed Database (DDB) System, for further information.)
E-Reference:
http://my.safaribooksonline.com/book/databases/9788131727188/distributed-databaseconcepts/ch03lev1sec2#X2ludGVybmFsX0h0bWxWaWV3P3htbGlkPTk3ODgxMzE3MjcxODglMkZjaDAzJnF1ZXJ5PQ==
http://www.nwlink.com/~donclark/leader/leadcom.html#sthash.VNrbMjkT.dpuf (Retrieved on 15th May 2014)
http://developer.android.com/guide/components/fragments.html (Retrieved on 13th May 2014)
Unit 13
Object-Relational Databases
Structure:
13.1 Introduction
Objectives
13.2 Basics of Object-Oriented Design (OOD)
Characteristics of OOD
Advantages of OOD
Object-oriented development
Object and object classes
13.3 Object-Oriented Data Model
Object identity
Complex objects
Persistence
Type and class hierarchies
Inheritance
13.4 Object-Oriented Databases
History of databases
How do ODBMSs work?
Implementation issues
Relationships
Advantages
Limitations
13.5 Object Relational Database Management System (ORDBMS)
Performance constraints
ORDBMS benefits
13.6 Summary
13.7 Terminal Questions
13.8 Answers
13.1 Introduction
In the previous unit, you studied distributed databases. This unit introduces
you to the basic concepts of object-oriented databases (OODs). Its purpose
is to help you decide whether you should investigate such products further,
and to understand how they work. This unit will explain to you the
approaches to OODs. The object-oriented approach offers the flexibility to
handle some of these requirements without being limited by the data types
You can see a DB language as a concrete syntax for a data model. A data
model is implemented by a DB system.
The basic concepts of object-oriented data model are the following:
13.3.1 Object identity
Any real-world entity is uniformly modelled as an object. Each object is
given a unique ID which is used to refer to the object for retrieval. An
object retains its identity even if some or all of the values of its variables
or the definitions of its methods change over time.
This concept of object identity is necessary in applications but does not
apply to tuples of a relational database. It is a stronger notion of identity
than that typically found in programming languages or in data models not
based on object orientation.
There are many forms of identity. They are as follows:
Value - A data value is used for identity; for example, the primary key of
a tuple in a relational database.
Name - A user-supplied name is used for identity; for example, file name
in a file system.
In many situations it is beneficial to generate the identifiers
automatically, so that identity does not depend on values or names supplied
by people.
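The difference between value-based identity and a system-generated identifier can be sketched as follows. The class DbObject, the oid attribute and the sample values are hypothetical; a real ODBMS assigns and manages object identifiers internally.

```python
import itertools

# A simple generator of identifiers; each call to next() gives a new number.
_next_oid = itertools.count(1)


class DbObject:
    """An object whose identity is a generated ID, independent of its values."""

    def __init__(self, **values):
        self.oid = next(_next_oid)  # assigned once and never changed
        self.values = values


first = DbObject(name="Ravi Joshi", city="Manipal")
second = DbObject(name="Ravi Joshi", city="Manipal")

# Two objects can hold identical values and still be different objects.
same_values_before = first.values == second.values

# Changing the values of an object does not change its identity.
original_oid = first.oid
first.values["city"] = "Gangtok"
```

With a value-based scheme such as a primary key, changing the key value would in effect create a different tuple; with a generated identifier the object stays the same object.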
13.3.2 Complex objects
Complex objects are those that are formed from simpler objects by
applying methods, or constructors, to them. Examples of simpler objects are
integers, characters, strings of any length, Booleans (0/1), floating point
values and so on; examples of such constructors are set, list, tuple and
so on.
You can differentiate complex objects as structured objects and
unstructured objects.
Structured complex objects are made up of components and are defined by
applying type constructors recursively at different levels. For example, consider the
Represents Tuple
Represents Structure
In the first level, DEPARTMENT has a tuple structure with six attributes
(Dno, Dname, Manager, Location, Employee and Projects). Of these
attributes, Dno and Dname have basic values; the other four have a complex
structure, so you need to build a second level of the complex object
structure. Of these four, Manager and Employee have a tuple structure and
the other two (Location and Projects) are set attributes. In the third
level, Manager has one basic attribute, start_date_exec, and an attribute Mgr
that refers to an employee object and has a tuple structure. For Location
and Projects, we have a set of tuple-structured objects.
Thus, the structure represents the object and its hierarchy in a structured form.
Unstructured components are data types that require a large amount of
storage. This kind of complex object is used to represent an image or a
large text. For example, consider objects that are two-dimensional images;
if an application needs to select, from a collection of such images, those
that contain a similar pattern, then the user must provide the pattern
to be recognised. Pattern recognition is a separate field of study in
itself, which helps in studying the different patterns and building
relationships between them.
13.3.3 Persistence
You can create any object by executing some application program or by
invoking the object constructor operations. Not all objects are meant to be
stored permanently in the database. Object persistence, a term you often
hear, is used in conjunction with the issue of storing objects in databases.
Persistence is expected to operate with transactional integrity, and as such
it is subject to strict conditions. In contrast, language services offered
through standard language libraries and packages are often free from
transactional constraints. The typical mechanisms for making an object
persistent are naming and reachability.
The naming mechanism involves giving an object a unique persistent name
through which it can be retrieved by this and other programs. However, it is
sometimes not practical to give names to all objects in a large database that
includes thousands of objects; therefore, most objects are made persistent
by using the second mechanism called reachability. The reachability
mechanism works by making the object reachable from some persistent
object.
13.3.4 Type and class hierarchies
A type is defined by giving a type name and later listing the names of its
visible (public) functions. Here is a simple example: you can define a type
that gives the details of an EMPLOYEE as,
EMPLOYEE: Emp_Id, Name, Address, department, DOB, age, Phne_no
In the EMPLOYEE type, you can implement Emp_Id, Name, Address,
department, DOB, Phne_no functions as stored attributes, and the age
function as a method that calculates the age from the value given in the
DOB attribute and current date.
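The EMPLOYEE type described above can be sketched as a Python class in which most functions are stored attributes and age is a method computed from DOB. The class layout follows the attribute names in the text, but the sample employee, the placeholder phone number and the optional today parameter are assumptions added for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Employee:
    # Stored attributes, following the names used in the text.
    emp_id: str
    name: str
    address: str
    department: str
    dob: date
    phne_no: str

    def age(self, today=None):
        """Computed, not stored: derived from DOB and the current date."""
        today = today or date.today()
        had_birthday = (today.month, today.day) >= (self.dob.month, self.dob.day)
        return today.year - self.dob.year - (0 if had_birthday else 1)


emp = Employee("E1", "Shilpa Saxena", "Manipal", "HR", date(1990, 6, 15), "0000000000")
age_on_fixed_day = emp.age(date(2014, 6, 14))  # one day before the birthday
```

To a user of the type, emp.age() looks like any other function of EMPLOYEE; whether a function is stored or computed is an implementation decision hidden behind the type.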
Class is a means of grouping all the objects that share the same set of
attributes and methods. An object must belong to only one class as an
instance of that class (the instance-of relationship). A class is similar to an
abstract data type. A class may also be primitive (no attributes), for
example, integer, string, Boolean. Class hierarchies derive a new class
(subclass) from an existing class (superclass). The subclass inherits all the
attributes and methods of the existing class and may have additional
attributes and methods.
13.3.5 Inheritance
Inheritance is a way of defining relationships among objects. As the name
indicates, inheritance tells us that an object is able to inherit characteristics
from another object. In more detail, we can say that an object is capable of
acquiring the state and behaviour of its parent object. Inheritance works
because the objects share common behaviours.
For example, suppose we would like to create a class called Human which
would represent the physical characteristics. It is a generic class that would
represent you, me and any other human in the world. It has a state that
includes having legs, arms and so on, and behaviours such as eating,
sleeping, drinking and walking. In that way, Human captures the behaviours
that are common to all of us. But when it comes to the specifics of a
particular gender, humans are not all the same. Here, two new class types
need to be created, namely, Man and Woman. Both classes inherit the state
and behaviour of Human, and individual humans differ from each other
according to which of the two types they belong to. Therefore,
inheritance allows us to pass on the state and behaviour of a parent
class to a child class. The child class is treated as the specialised version of
its parent.
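The Human example can be sketched with Python classes. The attribute and method names below are illustrative assumptions; only the class names Human, Man and Woman come from the text.

```python
class Human:
    """Generic parent class holding the state and behaviour common to all."""

    def __init__(self, name):
        self.name = name
        self.legs = 2
        self.arms = 2

    def eat(self):
        return f"{self.name} is eating"

    def walk(self):
        return f"{self.name} is walking"


class Man(Human):
    """Specialised version of Human; inherits everything and adds a detail."""
    gender = "male"


class Woman(Human):
    """Specialised version of Human; inherits everything and adds a detail."""
    gender = "female"


person = Woman("Ashwini")
message = person.eat()  # behaviour defined once in Human, reused by Woman
```

Neither Man nor Woman repeats the code for eating or walking, which is the reuse advantage of inheritance.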
The following are the advantages of inheritance:
It is an abstraction mechanism which may be used to classify entities.
It is a reuse mechanism at both the design and the programming level.
The inheritance graph is a source of organisational knowledge about
domains and systems.
Self-Assessment Questions
3. What are the different forms of object identity?
4. ___________ are those that are formed from the simpler objects by
applying methods to them.
Eyes
Moustache
Ears
Body
Front legs
Back legs
If the same thing has to be stored in ODBMS, it is the object DOG which is
the combination of many attributes and methods (Table 13.2).
Table 13.2: ODBMS Table
DOG
Database system (period): Property
File systems (1950s)
Hierarchical/network (1960s): Concurrency; Recovery; Fast access; Complex structures
Relational (1970-1980s): More reliability; Less redundancy; More flexibility; Multiple views
ODBMS (1990s): Better simulation; More (and complex) data types; More relationships (e.g. aggregation, specialisation)
Std_id     Std_name              Std_add
MBA2001    Priyadarshini Bhat
MBA2002    Ashwini Sharma
MBA2003    Ravi Joshi
MBA2004    Shilpa Saxena
MBA2005    Rashi Khanna

Course_id  Course_name
M1         Marketing
H1         Human Resource
IS1        Information Science
IT2        Information Technology

Std_id     Course_id
MBA2001    M1
MBA2002    H1
MBA2003    IS1
MBA2004    IT2
MBA2005    M1
2. For the query "name all students opting for Marketing", the query may
proceed as follows:
o Search Course index and find Course_id.
o Follow student pointers, looking up each std_id.
This process is called Navigation. You should note that the process relies
on pointers, and for this reason pointers must be persistent. When this
system was first introduced, querying varied considerably. But due to the
existence of Object-Oriented Language (OOL), it has become standardised.
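Navigation can be sketched with objects that hold direct references (pointers) to other objects, using the student and course data from the tables in this unit. The class definitions, the course_index dictionary and the function students_opting are hypothetical; in-memory Python references stand in here for the persistent pointers of a real ODBMS.

```python
from dataclasses import dataclass, field


@dataclass
class Student:
    std_id: str
    std_name: str


@dataclass
class Course:
    course_id: str
    course_name: str
    students: list = field(default_factory=list)  # pointers to Student objects


s1 = Student("MBA2001", "Priyadarshini Bhat")
s2 = Student("MBA2002", "Ashwini Sharma")
s5 = Student("MBA2005", "Rashi Khanna")

# Index from course name to the course object.
course_index = {
    "Marketing": Course("M1", "Marketing", [s1, s5]),
    "Human Resource": Course("H1", "Human Resource", [s2]),
}


def students_opting(course_name):
    """Search the course index, then follow the student pointers."""
    course = course_index[course_name]
    return [student.std_name for student in course.students]


marketing_students = students_opting("Marketing")
```

No join between tables is needed: once the course object is found, its students are reached directly through the stored references.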
13.4.3 Implementation issues
To implement a stored procedure, the behaviour must be described in the
object model and implemented in the runtime implementation of the object
model behaviour.
Likewise, referential integrity, which is traditionally supported through
triggers or declarative constructs in the relational world, must be described
in the object model and implemented in the runtime. The theoretical problem
with this is that such things as database rules must be consistently
implemented in each application, as opposed to once in the DBMS with
most RDBMS and ERDBMS products. If this separation is not managed,
inconsistencies can arise in the database.
The most important factors are the following:
Persistence - This property of an object-oriented database allows
objects to be stored between database runs. It also helps in versioning,
which means a new object is created every time changes are made.
Sharing - Objects can be shared in the distributed environment. Objects
can be shared between processes wherever required. This is possible
with object-oriented databases.
Paging - Object-oriented databases can reduce the need for paging by
enabling only the currently required objects to be loaded into memory
(relational databases load in tables containing both the required data
AND other unnecessary data).
13.4.4 Relationships
Relationships are the connections between two or more objects. A diamond is
the notation used to represent a relationship. For example:
[Figure: entities TEACHER, STUDENT and CLASSROOM connected by the
relationships "Teaches" and "Sits in"]
There are four different kinds of standard relationships that
object-oriented databases model.
Inheritance - This kind of relationship is used when one object is a kind
of something else. For example, a car is a kind of vehicle.
Association - This kind of relationship is used when one object has a
connection with another object. For example, a husband is related to
his wife.
Aggregation - This kind of relationship is used when one object is made
out of other objects. For example, the human body is made out of
different organs.
Inverse relationship - This kind of relationship is used when one object
is part of another object; it is aggregation seen from the side of the
part. For example, the stomach is part of the human body.
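The four kinds of relationship can be sketched with plain Python classes. This is only an illustration under stated assumptions: the classes `Vehicle`, `Car`, `Person`, `Body` and `Organ`, the function `marry` and the attribute names are all hypothetical, chosen to mirror the examples in the list above.

```python
class Vehicle:
    def __init__(self, wheels):
        self.wheels = wheels


class Car(Vehicle):
    """Inheritance: a Car is a kind of Vehicle."""

    def __init__(self):
        super().__init__(wheels=4)


class Person:
    def __init__(self, name):
        self.name = name
        self.spouse = None  # Association: a link to another independent object.


def marry(first, second):
    # An association connects two objects without one containing the other.
    first.spouse = second
    second.spouse = first


class Organ:
    def __init__(self, name):
        self.name = name
        self.part_of = None  # Inverse relationship: reference back to the whole.


class Body:
    def __init__(self, organs):
        # Aggregation: the body is made out of other objects.
        self.organs = list(organs)
        for organ in self.organs:
            organ.part_of = self


husband, wife = Person("husband"), Person("wife")
marry(husband, wife)

stomach = Organ("stomach")
body = Body([stomach, Organ("heart")])

print(isinstance(Car(), Vehicle))  # True
print(husband.spouse is wife)      # True
print(stomach.part_of is body)     # True
```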
13.4.5 Advantages
There are many advantages in using ODBMS over RDBMS. They are as
follows:
Objects do not require assembly and disassembly, which saves the
coding time and execution time needed to assemble or disassemble
objects.
ODBMS has reduced paging.
ODBMS has easier navigation facilities, which leads to easier
versioning.
13.4.6 Limitations
Despite the several advantages of ODBMS, it has many drawbacks, which are
listed below:
ODBMS has lower efficiency when the data and the relationships are
simple.
Relational tables are simpler than the structures used in ODBMS.
ODBMS has reduced access speed due to late binding.
More user tools exist for RDBMS.
ODBMS lacks standards, including a common query language comparable
to SQL.
Support for RDBMS is more certain and change is less likely to be
required.
Self-Assessment Questions
8. ________ are the behaviour of the objects defined by methods.
9. What are the properties of hierarchical systems?
10. A relationship is represented using the notation _________.
11. Name the kinds of standard relationships that object-oriented
databases model.
13.6 Summary
In this unit, we discussed object-oriented database development and how
the concepts of OOD help in design. Object-oriented systems represent
data in the form of objects. This can be more beneficial than the
relational format because the whole object is represented, and the data
can be retrieved easily by following pointers. Despite these advantages,
there are many drawbacks in using this model. We also discussed the
collaborative model, in which an object-oriented front end is implemented
on a relational database.
13.8 Answers
Self-Assessment Questions
1. Answers:
a) False
b) True
c) True
2. Analysis, design and programming
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
Terminal Questions
1. (Refer to Section 13.2 for further information.)
2. (Refer to Section 13.3 for further information.)
3. (Refer to Section 13.4.2 for further information.)
4. (Refer to Sections 13.5.1 and 13.5.2 for further information.)
Unit 14
Structure:
14.1 Introduction
Objectives
14.2 Security and Integrity Violations
14.3 Authorisation
14.4 Authentication
14.5 Encryption
The Data Encryption Standard (DES)
Public key encryption
14.6 Granting of Privileges
14.7 Security Specification in SQL
14.8 Role of Database Administrators (DBAs) in Database Security
14.9 Issues in Database Security
14.10 Summary
14.11 Glossary
14.12 Terminal Questions
14.13 Answers
14.14 Case Study
14.1 Introduction
In the previous unit we discussed the object-oriented database system.
In an object-oriented system, securing the data is a major concern because
applications have direct access to the database, so authorisation and
authentication need particular care. Security is one of the major factors
in database management. Data in a database has to be protected from
unauthorised access and manipulation. Database security involves
allowing or disallowing users from performing actions on the database.
The database must also be secured against data misuse and against
inconsistency caused by concurrent execution.
Caselet
A database management system is a suite of software applications that
together make it possible for people or businesses to store, modify and
different security and integrity violations. We will also discuss the
authentication and authorisation of users and the role of the database
administrator in security. When we speak about security, we must also
discuss the ethical issues of DBMS.
Objectives
After studying this unit, you should be able to:
find out the various violations in security and integrity
relate the concept of authorisation and authentication with database
administrator
describe the security specifications in SQL
analyse the role of database administrator in security
identify the ethical issues in database security
Self-Assessment Questions
1. ___________________________and_________________________
are the categories of misuse of data.
2. State whether the following statements are True or False:
a) A system crashes during transaction processing.
b) Single-user-accessing of the database will lead to accidental loss of
data consistency.
c) Database security is done by protecting the data in primary memory
by avoiding direct access to the data.
14.3 Authorisation
A user may have several forms of authorisation on parts of the database.
Among them are the following:
Read authorisation allows reading, but not modification of data.
Insert authorisation allows insertion of new data, but not modification of
existing data.
Update authorisation allows modification, but not deletion of data.
Delete authorisation allows deletion of data.
Index authorisation allows the creation and deletion of indices.
Resource authorisation allows the creation of new relations.
Alteration authorisation allows the addition or deletion of attributes in a
relation.
Drop authorisation allows the deletion of relations.
The ultimate form of authority is that given to the database
administrator. The database administrator may authorise new users.
Authorisation and views
A view can hide data that a user does not need to see. Views play a very
important role in providing data security, and they simplify complex
queries so that users can concentrate only on the required portion of the
relations (tables). A view prevents users from accessing a relation
directly; they can only see the portions of the table that the view exposes.
For example:
Create view V_emp as select emp_no, Ename, Sal from Emp;
Select * from V_emp;
Here clerks are not authorised to read salary information directly from
the employee relation; instead, they must be granted access to the view
V_emp. The view therefore provides security on the relation Emp. The user
who creates the view V_emp must have read authorisation on the employee
relation.
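The view example can be tried out with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the sample rows and the extra column `Bank_account` are invented so that there is something for the view to hide, and SQLite itself has no GRANT or REVOKE statements, so the step of granting clerks access to `V_emp` would have to be done in a multi-user DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Base relation. Bank_account is a hypothetical column the clerks must not see.
conn.execute(
    "CREATE TABLE Emp ("
    "emp_no INTEGER PRIMARY KEY, Ename TEXT, Sal REAL, Bank_account TEXT)"
)
conn.executemany(
    "INSERT INTO Emp VALUES (?, ?, ?, ?)",
    [(1, "Asha", 52000.0, "ACC-001"), (2, "Ravi", 48000.0, "ACC-002")],
)

# The view exposes only the three columns named in the text.
conn.execute("CREATE VIEW V_emp AS SELECT emp_no, Ename, Sal FROM Emp")

cursor = conn.execute("SELECT * FROM V_emp ORDER BY emp_no")
view_columns = [column[0] for column in cursor.description]
rows = cursor.fetchall()

print(view_columns)  # ['emp_no', 'Ename', 'Sal']
print(rows)          # [(1, 'Asha', 52000.0), (2, 'Ravi', 48000.0)]
```

A user who can query only `V_emp` never sees `Bank_account`, even though the column is stored in `Emp`.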
14.4 Authentication
While authorisation controls how much of the database a user may access,
authentication is the process that identifies the user. It can be done
with the help of simple passwords. The two terms are often confused and
used as synonyms, but they are distinct and must be treated separately.
Authentication identifies the user of the database: it checks the unique
ID supplied by the user against the list of registered users and fetches
the information needed to confirm that the user is valid. Authorisation
then determines the rights that the user has to access the database.
For example: If User U asks to access the database, the database must
identify U as a registered user. This is authentication. If User U then
wants to fetch a resource or perform an operation on a particular
resource, the database must validate that User U is allowed to do so.
This is known as authorisation.
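The difference between the two checks can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular DBMS stores credentials: the password `s3cret`, the relation name `loan`, and the names `registered_users`, `permissions`, `authenticate`, `authorise` and `access` are all hypothetical.

```python
import hashlib
import hmac
import os


def hash_password(password, salt):
    # Derive a hash so that the password itself is never stored.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


# Registered users (used for authentication).
_salt = os.urandom(16)
registered_users = {"U": {"salt": _salt, "hash": hash_password("s3cret", _salt)}}

# Rights of each user (used for authorisation): (relation, action) pairs.
permissions = {"U": {("loan", "read")}}


def authenticate(user_id, password):
    """Is this really a registered user?"""
    record = registered_users.get(user_id)
    if record is None:
        return False
    candidate = hash_password(password, record["salt"])
    return hmac.compare_digest(candidate, record["hash"])


def authorise(user_id, relation, action):
    """Is this user allowed to perform the action on the relation?"""
    return (relation, action) in permissions.get(user_id, set())


def access(user_id, password, relation, action):
    if not authenticate(user_id, password):
        return "authentication failed"
    if not authorise(user_id, relation, action):
        return "not authorised"
    return "allowed"


print(access("U", "s3cret", "loan", "read"))    # allowed
print(access("U", "s3cret", "loan", "delete"))  # not authorised
print(access("U", "wrong", "loan", "read"))     # authentication failed
```

The second call shows that a correctly identified user can still be refused: passing authentication says nothing about which operations are permitted.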
Self-Assessment Questions
3. Insert authorisation allows insertion of new data but not ____________
of existing data.
4. State whether the following statements are True or False:
a) The DBA is not authorised to give access to new users.
b) Authorisation technique identifies the user of the database and
checks with the unique ID given to the user with registered users.
14.5 Encryption
While we try to maintain the security of the data with authentication
technique, there are always various methods to access and change the flow
A user has an authorisation if and only if there is a path from the root of the
authorisation graph down to the node representing the user.
Suppose that the database administrator decides to revoke the
authorisation of user U1, and that users U4 and U5 have been granted
authorisation by U1. When the authorisation of U1 is revoked, the
authorisation that U4 received from U1 must be revoked as well. There is
no need to revoke the authorisation of U5, however, because U5 was
granted it by both U1 and U2. U2 still holds its authorisation, so U5
retains update authorisation on loan.
To properly revoke access rights, all paths in the authorisation graph must
start from the authoriser (the database administrator at the root).
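[Figure: graph with the DBA at the root and user nodes U1, U2, U3, U4 and
U5; U1 grants to U4 and U5, and U2 grants to U5]
Fig. 14.1: Authorisation Grant Graph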
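The path rule for the authorisation graph can be checked with a short Python sketch. It is an illustration under stated assumptions: the edges from the DBA to U1, U2 and U3 are assumed (the text confirms only that U1 grants to U4 and U5 and that U2 grants to U5), and the function names `authorised_users` and `revoke` are hypothetical.

```python
from collections import deque

ROOT = "DBA"

# An edge (a, b) means "a granted the authorisation to b".
# The three edges leaving the DBA are an assumption about Fig. 14.1.
edges = {
    ("DBA", "U1"), ("DBA", "U2"), ("DBA", "U3"),
    ("U1", "U4"), ("U1", "U5"), ("U2", "U5"),
}


def authorised_users(edge_set, root=ROOT):
    """Return every user that can be reached by a path starting at the root."""
    reached = {root}
    queue = deque([root])
    while queue:
        grantor = queue.popleft()
        for source, target in edge_set:
            if source == grantor and target not in reached:
                reached.add(target)
                queue.append(target)
    return reached - {root}


def revoke(edge_set, grantor, grantee):
    """Remove one grant, then drop grants made by users who lost their path."""
    remaining = edge_set - {(grantor, grantee)}
    still_authorised = authorised_users(remaining) | {ROOT}
    return {(a, b) for a, b in remaining if a in still_authorised}


after = revoke(edges, "DBA", "U1")
print(sorted(authorised_users(edges)))  # ['U1', 'U2', 'U3', 'U4', 'U5']
print(sorted(authorised_users(after)))  # ['U2', 'U3', 'U5']
```

After the DBA revokes U1, U4 loses its authorisation because its only path ran through U1, while U5 keeps it through the path DBA, U2, U5.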
The update and insert authorisations may be given either on all attributes
of the relation or on only some of them. For example:
Grant update (amount) on loan to U1, U2, U3
If we wish to grant a privilege and allow the recipient to pass the privilege on
to other users, we append the with grant option clause to the appropriate
grant command.
For example, to give U1 the select privilege on branch and allow U1 to grant
this privilege to others, we write:
Grant select on branch to U1 with grant option
To revoke an authorisation, we use the revoke statement. It takes a form
almost identical to that of grant:
Revoke <privilege list> on <relation name or view name>
from <user list> [restrict | cascade]
Thus, to revoke the privileges that we granted previously, we write:
Revoke select on branch from U1, U2, U3 cascade
Revoke update (amount) on loan from U1, U2, U3
Revoke references (branch-name) on branch from U1
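The effect of the with grant option clause and of the restrict and cascade keywords can be modelled in Python. This is a simplified sketch, not the behaviour of any specific SQL product: the classes `Grant` and `PrivilegeTable` and their methods are hypothetical, treating a revoke without `cascade=True` as restrict is a choice made for this sketch, and cyclic grants are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    grantor: str
    grantee: str
    privilege: str
    relation: str
    grant_option: bool = False


class PrivilegeTable:
    def __init__(self, owner="DBA"):
        self.owner = owner
        self.grants = set()

    def _held(self, user, privilege, relation):
        key = (user, privilege, relation)
        return [g for g in self.grants
                if (g.grantee, g.privilege, g.relation) == key]

    def holds(self, user, privilege, relation):
        return user == self.owner or bool(self._held(user, privilege, relation))

    def can_grant(self, user, privilege, relation):
        # Only the owner, or a holder "with grant option", may pass it on.
        return user == self.owner or any(
            g.grant_option for g in self._held(user, privilege, relation))

    def grant(self, grantor, grantee, privilege, relation, grant_option=False):
        if not self.can_grant(grantor, privilege, relation):
            raise PermissionError(f"{grantor} may not grant {privilege} on {relation}")
        self.grants.add(Grant(grantor, grantee, privilege, relation, grant_option))

    def revoke(self, grantor, grantee, privilege, relation, cascade=False):
        before = self.grants
        target = {g for g in self._held(grantee, privilege, relation)
                  if g.grantor == grantor}
        self.grants = before - target
        dependants = []
        if not self.can_grant(grantee, privilege, relation):
            key = (grantee, privilege, relation)
            dependants = [g for g in self.grants
                          if (g.grantor, g.privilege, g.relation) == key]
        if dependants and not cascade:
            self.grants = before  # restrict: refuse and change nothing
            raise ValueError("dependent grants exist; use cascade")
        for g in dependants:
            self.revoke(g.grantor, g.grantee, privilege, relation, cascade=True)


table = PrivilegeTable()
table.grant("DBA", "U1", "select", "branch", grant_option=True)
table.grant("U1", "U4", "select", "branch")
table.revoke("DBA", "U1", "select", "branch", cascade=True)
print(table.holds("U1", "select", "branch"))  # False
print(table.holds("U4", "select", "branch"))  # False
```

With restrict, the same revoke would be rejected because U4 still depends on the privilege that U1 passed on; with cascade, both grants are removed together.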
Self-Assessment Questions
12. To properly revoke access rights, all the paths in the authorisation
graph must start from the _________________________.
13. The _________________ statement is used to give authorisation.
14. Which of the following allows the deletion of relations:
a) Index authorisation
b) Drop authorisation
c) Update authorisation
d) Select authorisation
14.10 Summary
Let us recapitulate the important concepts discussed in this unit:
14.11 Glossary
Integrity: In a database context, the accuracy and consistency of the
stored data.
Security: The degree of resistance to, or protection from, harm; in a
database context, the protection of data from unauthorised access and
manipulation.
DES algorithm: The Data Encryption Standard algorithm is a previously
predominant symmetric-key algorithm for the encryption of electronic data. It
was highly influential in the advancement of modern cryptography in the
academic world.
14.13 Answers
Self-Assessment Questions
1. Intentional data loss and accidental loss of data consistency
2. Answers:
a) True
b) False
c) True
3. Modification
4. Answers:
a) False
b) False
5. c) Encryption
6. Data encryption standard
7. Permutation
8. BRXU SLHFH RIPLQG
9. ESPECI
10. Public key encryption
11. False
12. Authoriser
13. Grant
14. b) Drop authorisation
15. DBA
16. System
Terminal Questions
1. DES is a system developed by the US government for the general
use of the public. The DES algorithm uses two methods of encryption,
namely, substitution and permutation. Substitution, as the name
indicates, is replacement: a symbol or group of symbols is replaced
by some other symbol. For example, MY NAME IS SHANKAR is
replaced by PB QDPH LV VKDQNDU by shifting each letter three places
forward in the English alphabet. When the end of the alphabet is
reached, counting continues from the beginning. (Refer to Section
14.5.1 for further information.)
2. Public key encryption is based on mathematical functions. It uses
two keys instead of the single key used in the bit-pattern method. This increases
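The substitution step described in answer 1 can be reproduced with a short Python function. This is only a sketch of a three-letter shift, assuming uppercase English letters; it is not the DES algorithm itself, and the function name `substitute` is hypothetical.

```python
import string

ALPHABET = string.ascii_uppercase


def substitute(text, shift=3):
    """Replace each uppercase letter by the letter `shift` places further on."""
    result = []
    for character in text:
        if character in ALPHABET:
            # The modulo makes the alphabet wrap around from Z back to A.
            position = (ALPHABET.index(character) + shift) % len(ALPHABET)
            result.append(ALPHABET[position])
        else:
            result.append(character)  # Spaces and other symbols are kept.
    return "".join(result)


print(substitute("MY NAME IS SHANKAR"))            # PB QDPH LV VKDQNDU
print(substitute("PB QDPH LV VKDQNDU", shift=-3))  # MY NAME IS SHANKAR
```

Shifting by minus three reverses the substitution, which shows that anyone who knows the shift can recover the original text.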
services are necessary as these add business value and help one maximise
their investment in the IT infrastructure.
Discussion Questions:
1. What are the drawbacks if a database does not provide
authorisation?
2. What problems can an unauthorised user cause?
(Hint: Refer to Sections 14.3 and 14.4 for further information.)