
As the name suggests, the database management system consists of two parts.

They are:
1. Database and
2. Management System

What is a Database?

To find out what a database is, we have to start from data, which is the basic building block of
any DBMS.

Data: Facts, figures, statistics etc. having no particular meaning on their own (e.g. 1, ABC, 19, etc.).

Record: A collection of related data items. In the above example the three data items had no
meaning on their own, but if we organize them in the following way, they collectively represent
meaningful information.

Roll Name Age


1 ABC 19

Table or Relation: Collection of related records.

Roll Name Age


1 ABC 19
2 DEF 22
3 XYZ 28

The columns of this relation are called Fields, Attributes or Domains. The rows are called
Tuples or Records.

Database: Collection of related relations. Consider the following collection of tables:

T1
Roll Name Age
1 ABC 19
2 DEF 22
3 XYZ 28

T2
Roll Address
1 KOL
2 DEL
3 MUM

T3
Roll Year
1 I
2 II
3 I
T4
Year Hostel
I H1
II H2

We now have a collection of 4 tables. They can be called a related collection because there are
common attributes between selected pairs of tables. Because of these common attributes we may
combine the data of two or more tables to find out the complete details of a student. Questions
like "Which hostel does the youngest student live in?" can now be answered, although the Age and
Hostel attributes are in different tables.
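
As a concrete illustration, here is a minimal Python sketch using the standard sqlite3 module.
It loads the four sample tables shown above and answers that question by joining them on the
common attributes Roll and Year (the in-memory database and the query are ours, not part of the
original example):

import sqlite3

# In-memory database holding the four related tables from the text.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE T1 (Roll INTEGER, Name TEXT, Age INTEGER);
    CREATE TABLE T2 (Roll INTEGER, Address TEXT);
    CREATE TABLE T3 (Roll INTEGER, Year TEXT);
    CREATE TABLE T4 (Year TEXT, Hostel TEXT);
""")
con.executemany("INSERT INTO T1 VALUES (?, ?, ?)",
                [(1, "ABC", 19), (2, "DEF", 22), (3, "XYZ", 28)])
con.executemany("INSERT INTO T2 VALUES (?, ?)", [(1, "KOL"), (2, "DEL"), (3, "MUM")])
con.executemany("INSERT INTO T3 VALUES (?, ?)", [(1, "I"), (2, "II"), (3, "I")])
con.executemany("INSERT INTO T4 VALUES (?, ?)", [("I", "H1"), ("II", "H2")])

# "Which hostel does the youngest student live in?" -- the common attributes
# Roll and Year let us combine Age (in T1) with Hostel (in T4).
row = con.execute("""
    SELECT T4.Hostel
    FROM T1 JOIN T3 ON T1.Roll = T3.Roll
            JOIN T4 ON T3.Year = T4.Year
    ORDER BY T1.Age
    LIMIT 1
""").fetchone()
print(row[0])   # prints H1: the youngest student (roll 1, age 19) is in year I, hostel H1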

In a database, data is organized strictly in row and column format. The rows are called Tuples
or Records. The data items within one row may belong to different data types. On the other
hand, the columns are often called Domains or Attributes. All the data items within a single
attribute are of the same data type.

What is Management System?

A management system is a set of rules and procedures which helps us to create, organize and
manipulate the database. It also helps us to add, modify and delete data items in the database. The
management system can be either manual or computerized.

The management system is important because without some kind of rules and regulations it is
not possible to maintain the database. We have to decide which attributes should be included in
a particular table, which common attributes should be used to create the relationship between
two tables, and which tables have to be handled when a new record is inserted or deleted. These
issues must be resolved by having some kind of rules to follow in order to maintain the integrity
of the database.
Three Views of Data

We know that the same thing, if viewed from different angles, produces different sights.
Likewise, the database that we have created can reveal different aspects if seen from different
levels of abstraction. The term Abstraction is very important here. Generally it means the
amount of detail you want to hide. Any entity can be seen from different perspectives and levels
of complexity, each revealing a different amount of abstraction. Let us illustrate with a simple
example.

A computer reveals the minimum of its internal details, when seen from outside. We do not
know what parts it is built with. This is the highest level of abstraction, meaning very few
details are visible. If we open the computer case and look inside at the hard disc,
motherboard, CD drive, CPU and RAM, we are at the middle level of abstraction. If we move on
to open the hard disc and examine its tracks, sectors and read-write heads, we are at the
lowest level of abstraction, where no details are hidden.

In the same manner, the database can also be viewed from different levels of abstraction to
reveal different levels of detail. Going bottom-up, we find that there are three levels of
abstraction, or views, in the database. We discuss them here.

The word schema means arrangement: how we want to arrange the things that we have to store.
The three different schemas used in a DBMS, seen from different levels of abstraction, are
described below.

The lowest level, called the Internal or Physical schema, deals with the description of how
raw data items (like 1, ABC, KOL, H2 etc.) are stored in the physical storage (Hard Disc,
CD, Tape Drive etc.). It also describes the data type of these data items, the size of the items
in the storage media, the location (physical address) of the items in the storage device and so
on. This schema is useful for database application developers and database administrators.

The middle level is known as the Conceptual or Logical Schema, and deals with the
structure of the entire database. Please note that at this level we are not interested in the
raw data items anymore; we are interested in the structure of the database. This means we
want to know about the attributes of each table, the common attributes in different tables
that allow them to be combined, what kind of data can be entered into these attributes, and
so on. The Conceptual or Logical schema is very useful for database administrators, whose
responsibility is to maintain the entire database.

The highest level of abstraction is the External or View Schema. This is targeted at the end
users. An end user does not need to know everything about the structure of the entire database,
but only the amount of detail he/she needs to work with. We may not want the end user to be
confused by an astounding amount of detail by allowing him/her to look at the entire database,
or we may not allow this for reasons of security, where sensitive information must remain
hidden from unwanted persons. The database administrator may want to create custom-made
tables, keeping in mind the specific needs of each user. These tables are also known as virtual
tables, because they have no separate physical existence. They are created dynamically for the
users at runtime. Say for example, in the sample database we created earlier, we have a special
officer whose responsibility is to keep in touch with the parents of any underage student living
in the hostels. That officer does not need to know any detail except the Roll, Name, Address and
Age. The database administrator may create a virtual table with only these four attributes, only
for the use of this officer.
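
Most relational systems implement such an external schema as a view. A minimal sketch with
Python's sqlite3 module is shown below; the tables follow the earlier sample, while the view name
OfficerView and the age limit of 18 are our own assumptions for illustration:

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE T1 (Roll INTEGER, Name TEXT, Age INTEGER);
    CREATE TABLE T2 (Roll INTEGER, Address TEXT);
    INSERT INTO T1 VALUES (1, 'ABC', 19), (2, 'DEF', 22), (3, 'XYZ', 28);
    INSERT INTO T2 VALUES (1, 'KOL'), (2, 'DEL'), (3, 'MUM');

    -- External schema: a virtual table exposing only the four attributes the
    -- officer needs. No data is copied; rows are produced at runtime from the
    -- underlying tables.
    CREATE VIEW OfficerView AS
        SELECT T1.Roll, T1.Name, T2.Address, T1.Age
        FROM T1 JOIN T2 ON T1.Roll = T2.Roll
        WHERE T1.Age < 18;
""")
# With this sample data there happens to be no underage student, so the view
# is currently empty; it would fill up automatically as such rows appear.
print(con.execute("SELECT * FROM OfficerView").fetchall())   # []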

Data Independence

This brings us to our next topic: data independence. It is the property of the database which
tries to ensure that if we make any change at any level of schema of the database, the schema
immediately above it requires minimal or no change.

What does this mean? We know that in a building, each floor stands on the floor below it. If
we change the design of any one floor, e.g. extending the width of a room by demolishing its
western wall, it is likely that the design of the floors above will have to be changed as well.
As a result, one change needed on one particular floor would mean changing the design of every
floor above it until we reach the top, with an increase in time, cost and labour. Would not life
be easy if the change could be contained to one floor only? Data independence is the answer to
this. It removes the need for the additional work of adopting a single change into all the levels
above.

Data independence can be classified into the following two types:


1. Physical Data Independence: This means that for any change made in the physical
schema, the need to change the logical schema is minimal. This is practically easier to
achieve. Let us explain with an example.

Say, you have bought an Audio CD of a recently released film and one of your friends has
bought an Audio Cassette of the same film. If we consider the physical schema, they are
entirely different. The first is a digital recording on an optical medium, where random access
is possible. The second is a magnetic recording on a magnetic medium, where access is strictly
sequential. However, how this change is reflected in the logical schema is very interesting. For
music tracks, the logical schema for both the CD and the Cassette is the title card imprinted
on their back. We have information like Track no, Name of the Song, Name of the Artist and
Duration of the Track, things which are identical for both the CD and the Cassette. We can
clearly say that we have achieved the physical data independence here.

2. Logical Data Independence: This means that for any change made in the logical
schema, the need to change the external schema is minimal. As we shall see, this is a
little difficult to achieve. Let us explain with an example.

Suppose the CD you have bought contains 6 songs, and some of your friends are interested in
copying some of those songs (which they like in the film) into their favorite collection. One
friend wants the songs 1, 2, 4, 5, 6, another wants 1, 3, 4, 5 and another wants 1, 2, 3, 6. Each
of these collections can be compared to a view schema for that friend. Now by some mistake,
a scratch has appeared in the CD and you cannot extract the song 3. Obviously, you will have
to ask the friends who have song 3 in their proposed collection to alter their view by deleting
song 3 from their proposed collection as well.

Database Administrator

The Database Administrator, better known as the DBA, is the person (or a group of persons)
responsible for the well-being of the database management system. S/he has the following
functions and responsibilities regarding database management:

1. Definition of the schema, i.e. the architecture of the three levels of data abstraction and
data independence.
2. Modification of the defined schema as and when required.
3. Definition of the storage structure and the access method of the stored data, i.e.
sequential, indexed or direct.
4. Creating new user-ids, passwords etc., and also creating the access permissions that
each user can or cannot enjoy. The DBA is responsible for creating user roles, which are
collections of permissions (like read, write etc.) granted to or restricted from a class of
users. S/he can also grant additional permissions to and/or revoke existing
permissions from a user if need be.
5. Defining the integrity constraints for the database to ensure that the data entered
conform to some rules, thereby increasing the reliability of the data.
6. Creating a security mechanism to prevent unauthorized access and accidental or
intentional mishandling of data that can cause a security threat.
7. Creating a backup and recovery policy. This is essential because in case of a failure the
database must be able to revive itself to its complete functionality with no loss of data,
as if the failure had never occurred. It is essential to keep regular backups of the data so
that if the system fails then all data up to the point of failure will be available from
stable storage. Only the data gathered after the point of failure would have to be
fed to the database to recover it to a healthy status.

Advantages and Disadvantages of Database Management System

We must evaluate whether there is any gain in using a DBMS over a situation where we do
not use it. Let us summarize the advantages.

1. Reduction of Redundancy: This is perhaps the most significant advantage of using a
DBMS. Redundancy is the problem of storing the same data item in more than one place.
Redundancy creates several problems like requiring extra storage space, entering the
same data more than once during insertion, and deleting data from more than one
place during deletion. Anomalies may occur in the database if insertion, deletion etc.
are not done properly.
2. Sharing of Data: In paper-based record keeping, data cannot be shared among
many users. But in a computerized DBMS, many users can share the same database if
they are connected via a network.
3. Data Integrity: We can maintain data integrity by specifying integrity constraints,
which are rules and restrictions about what kind of data may be entered or
manipulated within the database. This increases the reliability of the database, as it can
be guaranteed that no wrong data can exist within the database at any point of time.
4. Data Security: We can restrict certain people from accessing the database, or allow
them to see only certain portions of the database while blocking sensitive information.
This is not easily possible in paper-based record keeping.

However, there could be a few disadvantages of using a DBMS. They are as follows:

1. As a DBMS needs computers, we have to invest a good amount in acquiring the
hardware, software, installation facilities and training of users.
2. We have to keep regular backups because a failure can occur at any time. Taking a
backup is a lengthy process and the computer system cannot perform any other job at
this time.
3. While the data security system is a boon of using a DBMS, it must be very robust. If
someone can bypass the security system then the database becomes open to any
kind of mishandling.

Entity Relationship Diagram:-


When a company asks you to make them a working, functional DBMS which they can work
with, there are certain steps to follow. Let us summarize them here:

1. Gathering information: This could be a written document that describes the system
in question with a reasonable amount of detail.
2. Producing the ERD: The ERD or Entity Relationship Diagram is a diagrammatic
representation of the description we have gathered about the system.
3. Designing the database: From the ERD we have created, it is very easy to
determine the tables, the attributes which the tables must contain and the relationships
among these tables.
4. Normalization: This is a process of removing different kinds of impurities from the
tables we have just created in the above step.

How to Prepare an ERD

Step 1
Let us take a very simple example and try to reach a fully organized database from it. Let
us look at the following simple statement:
A boy eats an ice cream.

This is a description of a real-world activity, and we may consider the above statement as a
written document (very short, of course).

Step 2
Now we have to prepare the ERD. Before doing that we have to process the statement a little.
We can see that the sentence contains a subject (boy), an object (ice cream) and a verb (eats)
that defines the relationship between the subject and the object. Consider the nouns as entities
(boy and ice cream) and the verb (eats) as a relationship. To plot them in the diagram, put the
nouns within rectangles and the relationship within a diamond. Also, show the relationship
with a directed arrow, starting from the subject entity (boy) towards the object entity (ice
cream).

Well, fine. Up to this point the ERD shows how boy and ice cream are related. Now, every
boy must have a name, address, phone number etc. and every ice cream has a manufacturer,
flavor, price etc. Without these the diagram is not complete. These items are known as
attributes, and they must be incorporated into the ERD as connected ovals. But can only
entities have attributes? Certainly not. Relationships can have attributes too. These
attributes do not tell us anything more about either the boy or the ice cream, but they provide
additional information about the relationship between the boy and the ice cream.

Step 3
We are almost done now. If you look carefully, we have now defined structures for at
least three tables, like the following:

Boy
Name Address Phone

Ice Cream
Manufacturer Flavor Price

Eats
Date Time

However, this is still not a working database, because by definition a database should be a
collection of related tables. To make them connected, the tables must have some common
attributes. If we choose the attribute Name of the Boy table to play the role of the common
attribute, then the revised structure of the above tables becomes something like the following.
Boy
Name Address Phone

Ice Cream
Manufacturer Flavor Price Name

Eats
Date Time Name

This is as complete as it can be. We now have information about the boy, about the ice cream
he has eaten and about the date and time when the eating was done.

Cardinality of Relationship

While creating a relationship between two entities, we often have to face the question of
cardinality. This simply means how many entities of the first set are related to how many
entities of the second set. Cardinality can be of the following four types.

One-to-One
Only one entity of the first set is related to only one entity of the second set. E.g. A teacher
teaches a student. Only one teacher is teaching only one student. This can be expressed in the
following diagram as:

One-to-Many
Only one entity of the first set is related to multiple entities of the second set. E.g. A teacher
teaches students. Only one teacher is teaching many students. This can be expressed in the
following diagram as:
Many-to-One
Multiple entities of the first set are related to only one entity of the second set. E.g. Teachers
teach a student. Many teachers are teaching only one student. This can be expressed in the
following diagram as:

Many-to-Many
Multiple entities of the first set are related to multiple entities of the second set. E.g. Teachers
teach students. In any school or college many teachers are teaching many students. This can
be considered as a two-way one-to-many relationship. This can be expressed in the following
diagram as:

In this discussion we have not included the attributes, but you can understand that they can be
used without any problem if we want to.

The Concept of Keys

A key is an attribute of a table which helps to identify a row. There can be many different
types of keys, which are explained here.

Super Key or Candidate Key: An attribute (or set of attributes) of a table that can uniquely
identify a row in the table. Such attributes contain unique values and can never contain NULL
values. There can be more than one super key or candidate key in a table, e.g. within a STUDENT
table both Roll and Mobile No. can serve to uniquely identify a student.

Primary Key: The candidate key that is chosen to be the identifying key for the
entire table. E.g. although there are two candidate keys in the STUDENT table, the college
would obviously use Roll as the primary key of the table.

Alternate Key: A candidate key which is not chosen as the primary key of the table. It is
named so because, although it is not the primary key, it can still identify a row.

Composite Key: Sometimes one attribute is not enough to uniquely identify a row. E.g. within a
single class Roll is enough to find a student, but in the entire school merely searching by
Roll is not enough, because there could be 10 classes in the school and each one of them may
contain a roll no. 5. To uniquely identify a student we have to say something like
class VII, roll no. 5. So a combination of two or more attributes is used to create a
unique combination of values, such as Class + Roll.

Foreign Key: Sometimes we may have to work with a table that does not have a primary
key of its own. To identify its rows, we have to use the primary key of a related table.
Such a copy of another related table's primary key is called a foreign key.
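
As a minimal sketch of how these keys can be declared, the following Python snippet (again
using the standard sqlite3 module) sets up a STUDENT table and a hypothetical MARKS table; the
MARKS table and its columns are our own illustration, not part of the original example:

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when asked

con.executescript("""
    CREATE TABLE STUDENT (
        Roll   INTEGER PRIMARY KEY,   -- primary key, chosen from the candidate keys
        Name   TEXT,
        Mobile TEXT UNIQUE            -- alternate (candidate) key, kept unique
    );

    -- MARKS copies STUDENT's primary key as a foreign key, and uses the
    -- composite key (Roll, Subject) to identify each of its own rows.
    CREATE TABLE MARKS (
        Roll    INTEGER REFERENCES STUDENT(Roll),   -- foreign key
        Subject TEXT,
        Score   INTEGER,
        PRIMARY KEY (Roll, Subject)                 -- composite key
    );
""")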

Strong and Weak Entity

Based on the concept of the foreign key, there may arise a situation when we have to relate an
entity having a primary key of its own with an entity not having a primary key of its own. In
such a case, the entity having its own primary key is called a strong entity and the entity not
having its own primary key is called a weak entity. Whenever we need to relate a strong and a
weak entity together, the ERD changes just a little.

Say, for example, we have a statement A Student lives in a Home. STUDENT is obviously
a strong entity having a primary key Roll. But HOME may not have a unique primary key, as
its only attribute Address may be shared by many homes (what if it is a housing estate?).
HOME is a weak entity in this case.

The ERD of this statement would be like the following

As you can see, the weak entity itself and the relationship linking the strong and weak entities
must have a double border.

Different Types of Database

There are three different types of databases. The difference lies in the organization of the
database and the storage structure of the data. We shall briefly mention them here.

Relational DBMS
This is our subject of study. A DBMS is relational if the data is organized into relations, that
is, tables. In RDBMS, all data are stored in the well-known row-column format.

Hierarchical DBMS
In HDBMS, data is organized in a tree-like manner. There is a parent-child relationship
among the data items, and the data model is very suitable for representing one-to-many
relationships. To access the data items, some kind of tree-traversal technique is used, such as
preorder traversal.

Because HDBMS is built on the one-to-many model, we face a little difficulty in organizing a
hierarchical database into row-column format. For example, consider the following
hierarchical database that shows four employees (E01, E02, E03, and E04)
belonging to the same department D1.

There are two ways to represent the above one-to-many information in a relation that is
built on one-to-one relationships. The first is called Replication, where the department id is
replicated a number of times in the table, like the following.

Dept-Id Employee Code


D1 E01
D1 E02
D1 E03
D1 E04

Replication makes the same data item redundant and is an inefficient way to store data. A
better way is to use a technique called the Virtual Record. With this technique, the repeating
data item is not stored in the table; it is kept at a separate place. The table, instead of
containing the repeating information, contains a pointer to the place where the data item is
stored.

This organization saves a lot of space as data is not made redundant.

Network DBMS
The NDBMS is built primarily on a one-to-many relationship, but one where a parent-child
representation among the data items cannot be ensured. This may happen in any real-world
situation where any entity can be linked to any other entity. The NDBMS was proposed by a group
of theorists known as the Database Task Group (DBTG). What they said looks like this:
In NDBMS, all entities are called Records and all relationships are called Sets. The record
from where the relationship starts is called the Owner Record and the record where it ends is
called the Member Record. The relationship or set is strictly one-to-many.

In case we need to represent a many-to-many relationship, an interesting thing happens. In
NDBMS, Owner and Member can only have a one-to-many relationship. We have to introduce
a third common record with which both the Owner and the Member can have a one-to-many
relationship. Using this common record, the Owner and Member can be linked by a many-to-many
relationship.

Suppose we have to represent the statement Teachers teach students. We have to introduce a
third record, say CLASS, with which both the teacher and the student can have a one-to-many
relationship. Using the class in the middle, teacher and student can be linked in a virtual
many-to-many relationship.

Normalization:-

While designing a database out of an entity-relationship model, the main problem existing in
that raw database is redundancy. Redundancy is storing the same data item in more than one
place. Redundancy creates several problems like the following:

1. Extra storage space: storing the same data in many places takes a large amount of disk
space.
2. Entering the same data more than once during data insertion.
3. Deleting data from more than one place during deletion.
4. Modifying data in more than one place.
5. Anomalies may occur in the database if insertion, deletion, modification etc. are not
done properly. This creates inconsistency and unreliability in the database.

To solve this problem, the raw database needs to be normalized. This is a step-by-step
process of removing different kinds of redundancy and anomaly at each step. At each step a
specific rule is followed to remove a specific kind of impurity in order to give the database a
slim and clean look.
Un-Normalized Form (UNF)

If a table contains non-atomic values in its rows, it is said to be in UNF. An atomic value is
something that cannot be further decomposed. A non-atomic value, as the name suggests, can
be further decomposed and simplified. Consider the following table:

Emp-Id Emp-Name Month Sales Bank-Id Bank-Name


E01 AA Jan 1000 B01 SBI
Feb 1200
Mar 850
E02 BB Jan 2200 B02 UTI
Feb 2500
E03 CC Jan 1700 B01 SBI
Feb 1800
Mar 1850
Apr 1725

In the sample table above, there are multiple rows under each key Emp-Id.
Although considered to be the primary key, Emp-Id cannot give us unique identification
for any single row. Further, each primary key value points to a variable-length record (3 rows
for E01, 2 for E02 and 4 for E03).

First Normal Form (1NF)

A relation is said to be in 1NF if it contains no non-atomic values and each row can provide a
unique combination of values. The above table in UNF can be processed to create the
following table in 1NF.

Emp-Id Emp-Name Month Sales Bank-Id Bank-Name
E01 AA Jan 1000 B01 SBI
E01 AA Feb 1200 B01 SBI
E01 AA Mar 850 B01 SBI
E02 BB Jan 2200 B02 UTI
E02 BB Feb 2500 B02 UTI
E03 CC Jan 1700 B01 SBI
E03 CC Feb 1800 B01 SBI
E03 CC Mar 1850 B01 SBI
E03 CC Apr 1725 B01 SBI

As you can see, each row now contains a unique combination of values. Unlike in UNF, this
relation contains only atomic values, i.e. the rows cannot be further decomposed, so the
relation is now in 1NF.
Second Normal Form (2NF)

A relation is said to be in 2NF if it is already in 1NF and each and every attribute fully
depends on the primary key of the relation. Speaking inversely, if a table has some attributes
which do not depend on the primary key of that table, then it is not in 2NF.

Let us explain. Emp-Id is the primary key of the above relation. Emp-Name, Month, Sales and Bank-
Name all depend upon Emp-Id. But the attribute Bank-Name also depends on Bank-Id, which is not the
primary key of the table. So the table is in 1NF, but not in 2NF. If this portion can be moved into
another related relation, it would come to 2NF.

Emp-Id Emp-Name Month Sales Bank-Id


E01 AA JAN 1000 B01
E01 AA FEB 1200 B01
E01 AA MAR 850 B01
E02 BB JAN 2200 B02
E02 BB FEB 2500 B02
E03 CC JAN 1700 B01
E03 CC FEB 1800 B01
E03 CC MAR 1850 B01
E03 CC APR 1725 B01

Bank-Id Bank-Name
B01 SBI
B02 UTI

After moving that portion into another relation, we store a smaller amount of data in two
relations without any loss of information. There is also a significant reduction in redundancy.
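
The claim that no information is lost can be checked mechanically: joining the two smaller
relations back on Bank-Id reproduces the original rows. A minimal Python sketch, using plain
tuples and only a few of the rows above for brevity:

# Original 1NF rows: (Emp-Id, Emp-Name, Month, Sales, Bank-Id, Bank-Name)
original = [
    ("E01", "AA", "Jan", 1000, "B01", "SBI"),
    ("E02", "BB", "Jan", 2200, "B02", "UTI"),
    ("E03", "CC", "Jan", 1700, "B01", "SBI"),
]

# Decomposition: the Bank-Name column moves into its own relation.
emp_sales = [(e, n, m, s, b) for (e, n, m, s, b, _) in original]
banks = {(b, bn) for (_, _, _, _, b, bn) in original}      # {(Bank-Id, Bank-Name)}

# A natural join on Bank-Id rebuilds the original relation: nothing is lost.
rejoined = [(e, n, m, s, b, bn)
            for (e, n, m, s, b) in emp_sales
            for (b2, bn) in banks if b == b2]
assert sorted(rejoined) == sorted(original)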

Third Normal Form (3NF)

A relation is said to be in 3NF if it is already in 2NF and there exists no transitive
dependency in that relation. Speaking inversely, if a table contains a transitive dependency,
then it is not in 3NF, and the table must be split to bring it into 3NF.

What is a transitive dependency? Within a relation, if we see

A → B [B depends on A]
and
B → C [C depends on B]
then we may derive
A → C [C depends on A]

Such derived dependencies hold well in most of the situations. For example, if we have
Roll → Marks
and
Marks → Grade
then we may safely derive
Roll → Grade.

This third dependency was not originally specified but we have derived it.

The derived dependency is called a transitive dependency when such a dependency
becomes improbable. For example, we have been given
Roll → City
and
City → STDCode

If we try to derive Roll → STDCode it becomes a transitive dependency, because obviously
the STDCode of a city cannot depend on the roll number issued by a school or college. In
such a case the relation should be broken into two, each containing one of these two
dependencies:
Roll → City
and
City → STDCode

Boyce-Codd Normal Form (BCNF)

A relation is said to be in BCNF if it is already in 3NF and the left hand side of every
dependency is a candidate key. A relation which is in 3NF is almost always in BCNF. A 3NF
relation may fail to be in BCNF when the following conditions are found to be true:

1. The candidate keys in the relation are composite.
2. There is more than one candidate key in the relation.
3. The candidate keys have some common (overlapping) attributes.

Professor Code Department Head of Dept. Percent Time


P1 Physics Ghosh 50
P1 Mathematics Krishnan 50
P2 Chemistry Rao 25
P2 Physics Ghosh 75
P3 Mathematics Krishnan 100

Consider, as an example, the above relation. It is assumed that:

1. A professor can work in more than one department.
2. The percentage of the time he spends in each department is given.
3. Each department has only one Head of Department.
The relation diagram for the above relation is given as the following:

The given relation is in 3NF. Observe, however, that the names of Dept. and Head of Dept.
are duplicated. Further, if Professor P2 resigns, rows 3 and 4 are deleted and we lose the
information that Rao is the Head of Department of Chemistry.

The normalization of the relation is done by creating a new relation for Dept. and Head of Dept. and
deleting Head of Dept. from the given relation. The normalized relations are shown in the following.

Professor Code Department Percent Time


P1 Physics 50
P1 Mathematics 50
P2 Chemistry 25
P2 Physics 75
P3 Mathematics 100

Department Head of Dept.
Physics Ghosh
Mathematics Krishnan
Chemistry Rao

See the dependency diagrams for these new relations.

Fourth Normal Form (4NF)

When attributes in a relation have a multi-valued dependency, further normalization to 4NF
and 5NF is required. Let us first find out what a multi-valued dependency is.

A multi-valued dependency is a typical kind of dependency in which each and every
attribute within a relation depends upon the other, yet none of them is a unique primary key.

We will illustrate this with an example. Consider a vendor supplying many items to many
projects in an organization. The following are the assumptions:

1. A vendor is capable of supplying many items.
2. A project uses many items.
3. A vendor supplies to many projects.
4. An item may be supplied by many vendors.

A multi-valued dependency exists here because all the attributes depend upon each other, and yet
none of them is a primary key having unique values.

Vendor Code Item Code Project No.


V1 I1 P1
V1 I2 P1
V1 I1 P3
V1 I2 P3
V2 I2 P1
V2 I3 P1
V3 I1 P2
V3 I1 P3

The given relation has a number of problems. For example:

1. If vendor V1 has to supply to project P2, but the item is not yet decided, then a row
with a blank for item code has to be introduced.
2. The information about item I1 is stored twice for vendor V3.

Observe that the given relation is in 3NF and also in BCNF. It still has the problems
mentioned above. The problem is reduced by expressing this relation as two relations in
Fourth Normal Form (4NF). A relation is in 4NF if it has no more than one independent multi-
valued dependency, or one independent multi-valued dependency with a functional
dependency.

The table can be expressed as the two 4NF relations given below. The fact that vendors
are capable of supplying certain items and that they are assigned to supply for some projects
is independently specified in the 4NF relations.

Vendor-Supply
Vendor Code Item Code
V1 I1
V1 I2
V2 I2
V2 I3
V3 I1

Vendor-Project
Vendor Code Project No.
V1 P1
V1 P3
V2 P1
V3 P2

Fifth Normal Form (5NF)

These relations still have a problem. While defining 4NF we mentioned that all the attributes
depend upon each other. While creating the two tables in 4NF, although we have preserved the
dependencies between Vendor Code and Item Code in the first table and between Vendor Code and
Project No. in the second table, we have lost the relationship between Item Code and Project No. If
there were a primary key then this loss of dependency would not have occurred. In order to revive this
relationship we must add a new table like the following. Please note that during the entire process of
normalization, this is the only step where a new table is created by joining two attributes, rather than
splitting them into separate tables.

Project No. Item Code


P1 I1
P1 I2
P2 I1
P3 I1
P3 I3
Let us finally summarize the normalization steps we have discussed so far.

Input Relation -> Output Relation: Transformation applied

All Relations -> 1NF: Eliminate variable length records; remove multi-attribute lines in the table.
1NF -> 2NF: Remove dependency of non-key attributes on part of a multi-attribute key.
2NF -> 3NF: Remove dependency of non-key attributes on other non-key attributes.
3NF -> BCNF: Remove dependency of an attribute of a multi-attribute key on an attribute of
another (overlapping) multi-attribute key.
BCNF -> 4NF: Remove more than one independent multi-valued dependency from the relation by
splitting the relation.
4NF -> 5NF: Add one relation relating attributes with a multi-valued dependency.

Indexing

What is an Index?

An index is a small table having only two columns. The first column contains a copy of the
primary or candidate key of a table and the second column contains a set of pointers holding
the address of the disk block where that particular key value can be found.

The advantage of using an index lies in the fact that it makes the search operation very
fast. Suppose a table has several rows of data, each row 20 bytes wide. If you want
to search for record number 100, the management system must thoroughly read each and
every row, and only after reading 99 x 20 = 1980 bytes will it find record number 100. If we have
an index, the management system starts to search for record number 100 not in the table, but
in the index. The index, containing only two columns, may be just 4 bytes wide in each of
its rows. After reading only 99 x 4 = 396 bytes of data from the index, the management system
finds an entry for record number 100, reads the address of the disk block where record
number 100 is stored and points directly at the record in the physical storage device. The
result is a much quicker access to the record (a speed advantage of 1980:396).

The only minor disadvantage of using index is that it takes up a little more space than the
main table. Additionally, index needs to be updated periodically for insertion or deletion of
records in the main table. However, the advantages are so huge that these disadvantages can
be considered negligible.

Types of Index
Primary Index
In primary index, there is a one-to-one relationship between the entries in the index table and
the records in the main table. Primary index can be of two types:

Dense primary index: the number of entries in the index table is the same as the number of
entries in the main table. In other words, each and every record in the main table has an entry
in the index.

Sparse or Non-Dense Primary Index:

For large tables the Dense Primary Index itself begins to grow in size. To keep the size of the
index smaller, instead of pointing to each and every record in the main table, the index points
to the records in the main table at a gap. See the following example.

As you can see, the data records have been divided into several blocks, each containing a
fixed number of records (in our case 10). The pointer in the index table points to the first
record of each data block, which is known as the Anchor Record because of its important function.
If you are searching for roll 14, the index is first searched to find the highest entry which is
smaller than or equal to 14. We get 11. The pointer leads us to roll 11, where a short
sequential search is made to find roll 14.
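
That search can be sketched directly in code. The following minimal Python snippet assumes the
index holds one (anchor key, block) pair per block and that each block is a sorted list of roll
numbers; both the data and the helper name sparse_lookup are ours, for illustration only:

import bisect

# Three data blocks of 10 records each; only the roll numbers are shown.
blocks = [list(range(1, 11)), list(range(11, 21)), list(range(21, 31))]

# Sparse index: one entry per block, keyed by its anchor (first) record.
index = [(block[0], block) for block in blocks]   # anchors 1, 11, 21

def sparse_lookup(roll):
    # Find the highest anchor key that is smaller than or equal to roll ...
    anchors = [key for key, _ in index]
    pos = bisect.bisect_right(anchors, roll) - 1
    if pos < 0:
        return None                      # smaller than every anchor: not present
    # ... then make a short sequential search inside that one block.
    _, block = index[pos]
    return roll if roll in block else None

print(sparse_lookup(14))   # 14, found via the anchor record 11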

Clustering Index
It may happen sometimes that we are asked to create an index on a non-unique key, such as
Dept-id. There could be several employees in each department. Here we use a clustering
index, where all employees belonging to the same Dept-id are considered to be within a
single cluster, and the index pointers point to the cluster as a whole.

Let us explain this diagram. The disk blocks contain a fixed number of records (in this case 4
each). The index contains entries for 5 separate departments. The pointers of these entries
point to the anchor record of the block where the first of the Dept-id in the cluster can be
found. The blocks themselves may point to the anchor record of the next block in case a
cluster overflows a block size. This can be done using a special pointer at the end of each
block (comparable to the next pointer of the linked list organization).

The previous scheme might become a little confusing because one disk block might be shared
by records belonging to different clusters. A better scheme is to use separate disk blocks
for separate clusters, as explained next.
In this scheme, as you can see, we have used separate disk blocks for the clusters. The
pointers, like before, point to the anchor record of the block where the first of the
cluster entries would be found. The block pointers only come into action when a cluster
overflows the block size, as for Dept-id 2. This scheme takes more space in memory and on
disk, but the organization is much better and cleaner looking.

Secondary Index
While creating the index, generally the index table is kept in the primary memory (RAM) and
the main table, because of its size, is kept in the secondary memory (Hard Disk).
Theoretically, a table may contain millions of records (like the telephone directory of a large
city), for which even a sparse index becomes so large that we cannot keep it in the
primary memory. And if we cannot keep the index in the primary memory, then we lose the
advantage of the speed of access. For very large tables, it is better to organize the index in
multiple levels. See the following example.
In this scheme, the primary level index (created with a gap of 100 records, and thereby
smaller in size) is kept in the RAM for quick reference. If you need to find the record of
roll 14 now, the index is first searched to find the highest entry which is smaller than or
equal to 14. We get 1. The adjoining pointer leads us to the anchor record of the
corresponding secondary level index, where another similar search is conducted. This finally
leads us to the actual data block whose anchor record is roll 11. We now come to roll 11,
where a short sequential search is made to find roll 14.

Multilevel Index
The Multilevel Index is a modification of the secondary level index system. In this system we
may use even more levels in case the table is even larger.

Index in a Tree-like Structure

We can use tree-like structures as indexes as well. For example, a binary search tree can also be
used as an index. If we want to find a particular record via a binary search tree, we have
the added advantage of the binary search procedure, which makes searching even
faster. A binary tree can be considered a 2-Way Search Tree, because it has two pointers in
each of its nodes and can thereby guide us in two distinct directions. Remember that for every
node storing 2 pointers, the number of values to be stored in the node is one less than the
number of pointers, i.e. each such node contains 1 value.

M-Way Search Tree

The above-mentioned concept can be further expanded with the notion of the m-Way Search
Tree, where m represents the number of pointers in a particular node. If m = 3, then each
node of the search tree contains up to 3 pointers, and each node can then contain up to 2 values.
A sample m-Way Search Tree with m = 3 is given in the following.
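
To make the search procedure concrete, here is a minimal Python sketch of an m-Way node with
m = 3 and a recursive search over it; the node class, the helper search and the key values are
our own illustration:

class MWayNode:
    # A node holds up to m - 1 = 2 sorted keys and up to m = 3 children.
    def __init__(self, keys, children=None):
        self.keys = keys                    # e.g. [20, 40]
        self.children = children or []      # children[i] holds keys < keys[i];
                                            # the last child holds keys > keys[-1]

def search(node, key):
    if node is None:
        return False
    for i, k in enumerate(node.keys):
        if key == k:
            return True
        if key < k:                         # descend into the subtree left of k
            return search(node.children[i] if node.children else None, key)
    # key is greater than every key in this node: descend into the last subtree
    return search(node.children[-1] if node.children else None, key)

root = MWayNode([20, 40], [MWayNode([5, 10]), MWayNode([25, 30]), MWayNode([45, 50])])
print(search(root, 30), search(root, 15))   # True False
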
Transaction Management

What is a Transaction?

A transaction is an event which occurs on the database. Generally a transaction reads a value
from the database or writes a value to the database. If you have any concept of Operating
Systems, then we can say that a transaction is analogous to a process.

Although a transaction can both read and write on the database, there are some fundamental
differences between these two classes of operations. A read operation does not change the
image of the database in any way. But a write operation, whether performed with the
intention of inserting, updating or deleting data from the database, changes the image of the
database. That is, we may say that these transactions bring the database from an image which
existed before the transaction occurred (called the Before Image or BFIM) to an image
which exists after the transaction occurred (called the After Image or AFIM).

The Four Properties of Transactions

Every transaction, for whatever purpose it is being used, has the following four properties.
Taking the initial letters of these four properties we collectively call them the ACID
Properties. Here we try to describe them and explain them.

Atomicity: This means that either all of the instructions within the transaction will be
reflected in the database, or none of them will be reflected.

Say, for example, we have two accounts A and B, each containing Rs 1000/-. We now start a
transaction to transfer Rs 100/- from account A to account B.

Read A;
A = A - 100;
Write A;
Read B;
B = B + 100;
Write B;

Fine, isn't it? The transaction has 6 instructions to extract the amount from A and submit it
to B. The AFIM will show Rs 900/- in A and Rs 1100/- in B.

Now, suppose there is a power failure just after instruction 3 (Write A) has been completed.
What happens now? After the system recovers, the AFIM will show Rs 900/- in A, but the
same Rs 1000/- in B. It would seem that Rs 100/- has evaporated into thin air because of the
power failure. Clearly such a situation is not acceptable.

The solution is to keep every value calculated by the instructions of the transaction not in
stable storage (hard disc) but in volatile storage (RAM), until the transaction completes its
last instruction. When we see that there has not been any error we do something known as a
COMMIT operation. Its job is to write every temporarily calculated value from the volatile
storage on to the stable storage. In this way, even if power fails at instruction 3, the post-
recovery image of the database will show accounts A and B both containing Rs 1000/-, as if
the failed transaction had never occurred.
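
The same behaviour can be observed with sqlite3, whose connections keep changes pending until
commit() and undo them on rollback(). The sketch below simulates the power failure with an
exception; the table and account names are our own illustration of the example above:

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 1000)])
con.commit()                               # BFIM: A = 1000, B = 1000

try:
    con.execute("UPDATE account SET balance = balance - 100 WHERE name = 'A'")
    raise RuntimeError("power failure")    # crash between Write A and Write B
    con.execute("UPDATE account SET balance = balance + 100 WHERE name = 'B'")
    con.commit()                           # COMMIT: both writes become permanent
except RuntimeError:
    con.rollback()                         # undo the partial write: back to the BFIM

print(con.execute("SELECT name, balance FROM account ORDER BY name").fetchall())
# [('A', 1000), ('B', 1000)] -- as if the failed transaction had never occurred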

Consistency: If we execute a particular transaction in isolation or together with other
transactions (i.e. presumably in a multi-programming environment), the transaction will yield
the same expected result.

To give better performance, every database management system supports the execution of
multiple transactions at the same time, using CPU time sharing. Concurrently executing
transactions may have to deal with the problem of sharable resources, i.e. resources that
multiple transactions are trying to read/write at the same time. For example, we may have a
table or a record on which two transactions are trying to read or write at the same time.
Careful mechanisms are created in order to prevent mismanagement of these sharable
resources, so that there is no change in the way a transaction performs. A
transaction which deposits Rs 100/- to account A must deposit the same amount whether it is
acting alone or in conjunction with another transaction that may be trying to deposit or
withdraw some amount at the same time.

Isolation: In case multiple transactions are executing concurrently and trying to access a
sharable resource at the same time, the system should create an ordering in their execution so
that they should not create any anomaly in the value stored at the sharable resource.

There are several ways to achieve this, and the most popular one is to use some kind of
locking mechanism. Again, if you know the concepts of Operating Systems, you should
remember semaphores and how they are used by a process to mark a resource as busy before
starting to use it, and to release the resource after the usage is over. Other
processes intending to access that same resource must wait during this time. Locking is
very similar. It states that a transaction must first lock the data item that it wishes to access,
and release the lock when the access is no longer required. Once a transaction locks a
data item, other transactions wishing to access the same data item must wait until the lock is
released.

Durability: It states that once a transaction has been completed, the changes it has made
should be permanent.

As we have seen in the explanation of the Atomicity property, the transaction, if it completes
successfully, is committed. Once the COMMIT is done, the changes which the transaction
has made to the database are immediately written into permanent storage. So, after the
transaction has been committed successfully, there is no question of any loss of information
even if the power fails. Committing a transaction guarantees that the AFIM has been reached.

There are several ways Atomicity and Durability can be implemented. One of them is called
Shadow Copy. In this scheme a database pointer is used to point to the BFIM of the
database. During the transaction, all the temporary changes are recorded into a Shadow Copy,
which is an exact copy of the original database plus the changes made by the transaction,
which is the AFIM. Now, if the transaction is required to COMMIT, then the database pointer
is updated to point to the AFIM copy, and the BFIM copy is discarded. On the other hand, if
the transaction is not committed, then the database pointer is not updated. It keeps pointing to
the BFIM, and the AFIM is discarded. This is a simple scheme, but takes a lot of memory
space and time to implement.
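
A toy sketch of the shadow-copy idea, with a Python dict standing in for the database and an
ordinary variable standing in for the database pointer (the values follow the running A/B
example; the flag transaction_ok is our own shorthand for "the transaction reached COMMIT"):

database = {"A": 1000, "B": 1000}    # BFIM, currently referenced by the database pointer

shadow = dict(database)              # exact copy of the original database ...
shadow["A"] -= 100                   # ... plus the changes made by the transaction,
shadow["B"] += 100                   # which makes the shadow the AFIM

transaction_ok = True
if transaction_ok:
    database = shadow                # COMMIT: the pointer now points to the AFIM
# otherwise the pointer is left untouched and the shadow (AFIM) is simply discarded

print(database)                      # {'A': 900, 'B': 1100}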

If you study carefully, you can understand that Atomicity and Durability are closely related,
just as Consistency and Isolation are.

Transaction States

There are the following six states in which a transaction may exist:

Active: The initial state when the transaction has just started execution.

Partially Committed: At any given point of time, if the transaction is executing properly,
then it is moving towards its COMMIT POINT. The values generated during the execution are
all stored in volatile storage.

Failed: If the transaction fails for some reason, the temporary values are no longer required,
and the transaction is set to ROLLBACK. This means that any change made to the database by
this transaction up to the point of the failure must be undone. If the failed transaction has
withdrawn Rs 100/- from account A, then the ROLLBACK operation should add Rs 100/- back to
account A.

Aborted: When the ROLLBACK operation is over, the database reaches the BFIM. The
transaction is now said to have been aborted.

Committed: If no failure occurs then the transaction reaches the COMMIT POINT. All the
temporary values are written to the stable storage and the transaction is said to have been
committed.

Terminated: Either committed or aborted, the transaction finally reaches this state.

The whole process can be described using the following diagram:


Concurrent Execution

A schedule is a collection of many transactions which are executed as a unit. Depending
upon how these transactions are arranged within a schedule, a schedule can be of two
types:

Serial: The transactions are executed one after another, in a non-preemptive manner.
Concurrent: The transactions are executed in a preemptive, time shared method.

In a serial schedule, there is no question of sharing a single data item among many
transactions, because no more than a single transaction is executing at any point of time.
However, a serial schedule is inefficient in the sense that the transactions suffer longer
waiting times and response times, as well as a low amount of resource utilization.

In a concurrent schedule, CPU time is shared among two or more transactions in order to run
them concurrently. However, this creates the possibility that more than one transaction may
need to access a single data item for read/write purposes, and the database could contain
inconsistent values if such accesses are not handled properly. Let us explain with the help of
an example.

Let us consider there are two transactions T1 and T2, whose instruction sets are given as
following. T1 is the same as we have seen earlier, while T2 is a new transaction.

T1
Read A;
A = A - 100;
Write A;
Read B;
B = B + 100;
Write B;

T2
Read A;
Temp = A * 0.1;
Read C;
C = C + Temp;
Write C;

T2 is a new transaction which deposits to account C 10% of the amount in account A.

If we prepare a serial schedule, then either T1 will completely finish before T2 can begin, or
T2 will completely finish before T1 can begin. However, if we want to create a concurrent
schedule, then some Context Switching needs to be made, so that some portion of T1 is
executed, then some portion of T2, and so on. For example, say we have
prepared the following concurrent schedule.

T1 T2

Read A;
A = A - 100;
Write A;
Read A;
Temp = A * 0.1;
Read C;
C = C + Temp;
Write C;
Read B;
B = B + 100;
Write B;

There is no problem here. We have made some Context Switching in this schedule, the first switch
after executing the third instruction of T1, and the second after executing the last statement of
T2. T1 first deducts Rs 100/- from A and writes the new value of Rs 900/- into A. T2 reads the
value of A, calculates the value of Temp to be Rs 90/- and adds it to C. The remaining part of T1
is then executed and Rs 100/- is added to B.

It is clear that proper Context Switching is very important in order to maintain the
Consistency and Isolation properties of the transactions. But let us take another example
where a wrong Context Switch can bring about disaster. Consider the following schedule
involving the same T1 and T2.

T1 T2

Read A;
A = A - 100;
Read A;
Temp = A * 0.1;
Read C;
C = C + Temp;
Write C;
Write A;
Read B;
B = B + 100;
Write B;

This schedule is wrong, because we have made the switch at the second instruction of T1.
The result is very confusing. If we consider accounts A and B both containing Rs 1000/- each,
then the result of this schedule should have left Rs 900/- in A, Rs 1100/- in B and added Rs 90/-
to C (as C should be increased by 10% of the amount in A). But in this wrong schedule, the
Context Switch is performed before the new value of Rs 900/- has been written into
A. T2 reads the old value of A, which is still Rs 1000/-, and deposits Rs 100/- in C. C makes
an unjust gain of Rs 10/- out of nowhere.

In the above example, we detected the error simply by examining the schedule and applying
common sense. But there must be some well-formed rules regarding how to arrange the
instructions of the transactions to create error-free concurrent schedules. This brings us to our
next topic, the concept of Serializability.

Serializability

When several concurrent transactions are trying to access the same data item, the instructions
within these concurrent transactions must be ordered in some way so that there is no problem
in accessing and releasing the shared data item. There are two aspects of serializability, which
are described here:

Conflict Serializability
Two instructions of two different transactions may want to access the same data item in order
to perform a read/write operation. Conflict Serializability deals with detecting whether the
instructions are conflicting in any way, and specifying the order in which these two
instructions will be executed in case there is any conflict. A conflict arises if at least one (or
both) of the instructions is a write operation. The following rules are important in Conflict
Serializability:

1. If two instructions of the two concurrent transactions are both for read operation, then
they are not in conflict, and can be allowed to take place in any order.
2. If one of the instructions wants to perform a read operation and the other instruction
wants to perform a write operation, then they are in conflict, and hence their ordering is
important. If the read instruction is performed first, then it reads the old value of the
data item and, after the reading is over, the new value of the data item is written. If the
write instruction is performed first, then it updates the data item with the new value and
the read instruction reads the newly updated value.
3. If both instructions are write operations, then they are in conflict but can be
allowed to take place in any order, because the transactions do not read the value
updated by each other. However, the value that persists in the data item after the
schedule is over is the one written by the instruction that performed the last write.
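
These three rules reduce to a simple test: two operations conflict exactly when they belong to
different transactions, touch the same data item, and at least one of them is a write. A minimal
Python sketch (the tuple layout and the helper name in_conflict are our own):

# An operation is described as (transaction, kind, data_item), e.g. ("T1", "write", "A").
def in_conflict(op1, op2):
    t1, kind1, item1 = op1
    t2, kind2, item2 = op2
    return (t1 != t2                          # operations of different transactions
            and item1 == item2                # acting on the same data item
            and "write" in (kind1, kind2))    # at least one of them is a write

print(in_conflict(("T1", "read", "A"),  ("T2", "read", "A")))    # False (rule 1)
print(in_conflict(("T1", "read", "A"),  ("T2", "write", "A")))   # True  (rule 2)
print(in_conflict(("T1", "write", "A"), ("T2", "write", "A")))   # True  (rule 3)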

It may happen that we want to execute the same set of transactions in a different schedule
on another day. Keeping these rules in mind, we may sometimes alter parts of one schedule
(S1) to create another schedule (S2) by swapping only the non-conflicting parts of the first
schedule. The conflicting parts cannot be swapped in this way because the ordering of the
conflicting instructions is important and cannot be changed in any schedule that is
derived from the first. If these two schedules are made of the same set of transactions, then
both S1 and S2 would yield the same result if the conflict resolution rules are maintained
while creating the new schedule. In that case the schedules S1 and S2 are called
Conflict Equivalent.

View Serializability:
This is another type of serializability that can be derived by creating another schedule out of
an existing schedule, involving the same set of transactions. The two schedules would be
called View Equivalent if the following rules are followed while creating the second
schedule out of the first. Let us consider that the transactions T1 and T2 are being serialized
to create two different schedules S1 and S2, which we want to be View Equivalent, and that both
T1 and T2 want to access the same data item.

1. If in S1, T1 reads the initial value of the data item, then in S2 also, T1 should read the
initial value of that same data item.
2. If in S1, T1 writes a value in the data item which is read by T2, then in S2 also, T1
should write the value in the data item before T2 reads it.
3. If in S1, T1 performs the final write operation on that data item, then in S2 also, T1
should perform the final write operation on that data item.

Except in these three cases, any alteration is possible while creating S2 from S1.

Concurrency Control:-

When multiple transactions try to access the same sharable resource, many problems can arise
if the access control is not done properly. There are some important mechanisms by which
access control can be maintained. Earlier we talked about theoretical
concepts like serializability, but in practice they are implemented using
Locks and Timestamps. Here we shall discuss some protocols where Locks and Timestamps
can be used to provide an environment in which concurrent transactions can preserve their
Consistency and Isolation properties.

Lock Based Protocol

A lock is nothing but a mechanism that tells the DBMS whether a particular data item is
being used by any transaction for read/write purposes. Since there are two types of operations,
i.e. read and write, whose basic natures are different, the locks for read and write operations
may behave differently.

A read operation performed by different transactions on the same data item poses less of a
challenge. The value of the data item, if constant, can be read by any number of transactions
at any given time.

A write operation is different. When a transaction writes some value into a data
item, the content of that data item remains in an inconsistent state, starting from the moment
the writing operation begins up to the moment the writing operation is over. If we allow
any other transaction to read/write the value of the data item during the write operation, that
transaction will read an inconsistent value or overwrite the value being written by the first
transaction. In both cases anomalies will creep into the database.

The simple rule for locking can be derived from here. If a transaction is reading the content of
a sharable data item, then any number of other processes can be allowed to read the content
of the same data item. But if any transaction is writing into a sharable data item, then no other
transaction will be allowed to read or write that same data item.

Depending upon the rules we have found, we can classify the locks into two types.

Shared Lock: A transaction may acquire a shared lock on a data item in order to read its
content. The lock is shared in the sense that any other transaction can also acquire a shared
lock on that same data item for reading purposes.

Exclusive Lock: A transaction may acquire an exclusive lock on a data item in order to both
read and write it. The lock is exclusive in the sense that no other transaction can acquire any
kind of lock (either shared or exclusive) on that same data item.

The relationship between Shared and Exclusive Lock can be represented by the following
table which is known as Lock Matrix.

                        Locks already existing
Lock requested          Shared          Exclusive
Shared                  TRUE            FALSE
Exclusive               FALSE           FALSE
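
The Lock Matrix translates directly into code. Below is a minimal sketch in Python; the mode
labels "S" and "X" and the function name are illustrative assumptions.

# Lock compatibility: a requested lock is granted only if it is compatible
# with every lock already held on the same data item by other transactions.
COMPATIBLE = {
    ("S", "S"): True,    # shared requested, shared held       -> allowed
    ("S", "X"): False,   # shared requested, exclusive held    -> must wait
    ("X", "S"): False,   # exclusive requested, shared held    -> must wait
    ("X", "X"): False,   # exclusive requested, exclusive held -> must wait
}

def can_grant(requested_mode, held_modes):
    return all(COMPATIBLE[(requested_mode, held)] for held in held_modes)

print(can_grant("S", ["S", "S"]))   # True: any number of readers may coexist
print(can_grant("S", ["X"]))        # False: a writer blocks all other access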

How Should Locks be Used?

In a transaction, a data item which we want to read/write should first be locked before the
read/write is done. After the operation is over, the transaction should unlock the data item so
that other transactions can lock that same data item for their own use. In the earlier chapter
we saw a transaction that transfers Rs 100/- from account A to account B. With locks, the
transaction should now be written as follows:

Lock-X (A); (Exclusive Lock, we want to both read A's value and modify it)
Read A;
A = A - 100;
Write A;
Unlock (A); (Unlocking A after the modification is done)
Lock-X (B); (Exclusive Lock, we want to both read B's value and modify it)
Read B;
B = B + 100;
Write B;
Unlock (B); (Unlocking B after the modification is done)
And the transaction that deposits 10% of the amount in account A to account C should now be
written as:

Lock-S (A); (Shared Lock, we only want to read A's value)
Read A;
Temp = A * 0.1;
Unlock (A); (Unlocking A)
Lock-X (C); (Exclusive Lock, we want to both read C's value and modify it)
Read C;
C = C + Temp;
Write C;
Unlock (C); (Unlocking C after the modification is done)

Let us see how these locking mechanisms help us to create error-free schedules. Recall the
erroneous schedule we discussed in the previous chapter:

T1                              T2

Read A;
A = A - 100;
                                Read A;
                                Temp = A * 0.1;
                                Read C;
                                C = C + Temp;
                                Write C;
Write A;
Read B;
B = B + 100;
Write B;

We detected the error using common sense only: the context switch is performed before the
new value has been written back to A, so T2 reads the old value of A and thus deposits a
wrong amount in C. Had we used the locking mechanism, this error could never have
occurred. Let us rewrite the schedule using locks.

T1                              T2

Lock-X (A)
Read A;
A = A - 100;
                                Lock-S (A)
                                Read A;
                                Temp = A * 0.1;
                                Unlock (A)
                                Lock-X (C)
                                Read C;
                                C = C + Temp;
                                Write C;
                                Unlock (C)
Write A;
Unlock (A)
Lock-X (B)
Read B;
B = B + 100;
Write B;
Unlock (B)

We cannot prepare a schedule like the one above even if we want to, provided that we use
locks in the transactions. See the first statement in T2 that attempts to acquire a lock on A.
This would be impossible because T1 has not released its exclusive lock on A, so T2 simply
cannot get the shared lock it wants on A. It must wait until the exclusive lock on A is released
by T1, and can begin its execution only after that. So the proper schedule would look like the
following:

T1                              T2

Lock-X (A)
Read A;
A = A - 100;
Write A;
Unlock (A)
                                Lock-S (A)
                                Read A;
                                Temp = A * 0.1;
                                Unlock (A)
                                Lock-X (C)
                                Read C;
                                C = C + Temp;
                                Write C;
                                Unlock (C)
Lock-X (B)
Read B;
B = B + 100;
Write B;
Unlock (B)

This schedule is automatically correct. We need not apply any manual effort to detect or
correct the errors that may creep into a schedule when locks are not used.
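
To see concretely why T2's Lock-S (A) request must wait in the rejected schedule, here is a
minimal single-threaded sketch of a lock table in Python. The dictionary layout and function
names are illustrative assumptions, not a real DBMS interface.

# The lock table maps each data item to the list of (transaction, mode) locks
# currently held on it; "S" is shared and "X" is exclusive.
lock_table = {}

def request_lock(txn, mode, item):
    # Grant the lock only if it is compatible with all locks held by others:
    # shared locks may coexist, an exclusive lock excludes everything else.
    held = [m for t, m in lock_table.get(item, []) if t != txn]
    if mode == "S":
        granted = all(m == "S" for m in held)
    else:                                   # mode == "X"
        granted = len(held) == 0
    if granted:
        lock_table.setdefault(item, []).append((txn, mode))
    return granted

def release_lock(txn, item):
    lock_table[item] = [(t, m) for t, m in lock_table.get(item, []) if t != txn]

print(request_lock("T1", "X", "A"))   # True:  T1 takes the exclusive lock on A
print(request_lock("T2", "S", "A"))   # False: T2 must wait, as in the schedule above
release_lock("T1", "A")               # T1 unlocks A after writing it
print(request_lock("T2", "S", "A"))   # True:  only now can T2 proceed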

Two Phase Locking Protocol


The use of locks has helped us to create neat and clean concurrent schedules. The Two Phase
Locking Protocol defines the rules of how to acquire locks on a data item and how to release
them.
The Two Phase Locking Protocol assumes that a transaction can only be in one of two phases.

Growing Phase: In this phase the transaction can only acquire locks, but cannot release any
lock. The transaction enters the growing phase as soon as it acquires the first lock it wants.
From now on it has no option but to keep acquiring all the locks it would need. It cannot
release any lock at this phase even if it has finished working with a locked data item.
Ultimately the transaction reaches a point where all the locks it may need have been acquired.
This point is called Lock Point.

Shrinking Phase: After Lock Point has been reached, the transaction enters the shrinking
phase. In this phase the transaction can only release locks, but cannot acquire any new lock.
The transaction enters the shrinking phase as soon as it releases the first lock after crossing
the Lock Point. From now on it has no option but to keep releasing all the acquired locks.
There are two different versions of the Two Phase Locking Protocol. One is called the Strict
Two Phase Locking Protocol and the other one is called the Rigorous Two Phase Locking
Protocol.
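
Before looking at the two versions, here is a minimal sketch in Python of how the
growing/shrinking rule might be enforced for a single transaction. The class and method
names are illustrative assumptions, not a real DBMS API; the comments indicate where the
Strict and Rigorous variants described below tighten the release rule.

class TwoPhaseLockingViolation(Exception):
    pass

class Transaction2PL:
    def __init__(self, name):
        self.name = name
        self.held = set()        # data items currently locked by this transaction
        self.shrinking = False   # becomes True at the first unlock, i.e. past the Lock Point

    def lock(self, item):
        if self.shrinking:
            # The growing phase is over: no new lock may be acquired.
            raise TwoPhaseLockingViolation(
                self.name + " tried to lock " + item + " after releasing a lock")
        self.held.add(item)

    def unlock(self, item):
        # The first unlock moves the transaction into the shrinking phase.
        self.shrinking = True
        self.held.discard(item)

    def commit(self):
        # Strict 2PL: exclusive locks are released only here, at commit.
        # Rigorous 2PL: all locks (shared and exclusive) are released only here.
        for item in list(self.held):
            self.held.discard(item)

# Example: acquiring a new lock after any unlock violates the protocol.
t = Transaction2PL("T1")
t.lock("A")
t.unlock("A")
try:
    t.lock("B")
except TwoPhaseLockingViolation as e:
    print(e)                     # T1 tried to lock B after releasing a lock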

Strict Two Phase Locking Protocol


In this protocol, a transaction may release all the shared locks after the Lock Point has been
reached, but it cannot release any of the exclusive locks until the transaction commits. This
protocol helps in creating cascadeless schedules.

A cascading schedule is a typical problem faced while creating concurrent schedules.


Consider the following schedule once again.

T1                              T2

Lock-X (A)
Read A;
A = A - 100;
Write A;
Unlock (A)
                                Lock-S (A)
                                Read A;
                                Temp = A * 0.1;
                                Unlock (A)
                                Lock-X (C)
                                Read C;
                                C = C + Temp;
                                Write C;
                                Unlock (C)
Lock-X (B)
Read B;
B = B + 100;
Write B;
Unlock (B)

The schedule is theoretically correct, but a very strange kind of problem may arise here. T1
releases the exclusive lock on A, and immediately after that the context switch is made. T2
acquires a shared lock on A to read its value, performs a calculation, updates the content of
account C and then issues COMMIT. However, T1 is not finished yet. What if the remaining
portion of T1 encounters a problem (power failure, disc failure etc) and cannot be committed?
In that case T1 should be rolled back and the BFIM (before image) value of A should be
restored. In such a case T2, which has read the updated (but uncommitted) value of A and
calculated the value of C based on it, must also be rolled back. We have to roll back T2
through no fault of its own, but because we proceeded with T2 depending on a value which has not
yet been committed. This phenomenon of rolling back a child transaction if the parent
transaction is rolled back is called Cascading Rollback, which causes a tremendous loss of
processing power and execution time.

Using Strict Two Phase Locking Protocol, Cascading Rollback can be prevented. In Strict
Two Phase Locking Protocol a transaction cannot release any of its acquired exclusive locks
until the transaction commits. In such a case, T1 would not release the exclusive lock on A
until it finally commits, which makes it impossible for T2 to acquire the shared lock on A at a
time when A's value has not been committed. This makes cascading rollbacks impossible.

Rigorous Two Phase Locking Protocol

In Rigorous Two Phase Locking Protocol, a transaction is not allowed to release any lock
(either shared or exclusive) until it commits. This means that until the transaction commits,
other transactions may acquire a shared lock on a data item on which the uncommitted
transaction has a shared lock; but cannot acquire any lock on a data item on which the
uncommitted transaction has an exclusive lock.

Timestamp Ordering Protocol

A timestamp is a tag that can be attached to any transaction or any data item, which denotes
a specific time at which the transaction or data item was activated in some way. Anyone who
uses a computer is familiar with the Date Created or Last Modified properties of files and
folders; timestamps are much the same idea.

A timestamp can be implemented in two ways. The simplest one is to directly assign the
current value of the clock to the transaction or the data item. The other policy is to attach the
value of a logical counter that keeps incrementing as new timestamps are required.
The timestamp of a transaction denotes the time when it was first activated. The timestamp of
a data item can be of the following two types:

W-timestamp (Q): This means the latest time when the data item Q has been written into.
R-timestamp (Q): This means the latest time when the data item Q has been read from.

These two timestamps are updated each time a successful read/write operation is performed
on the data item Q.
How Should Timestamps be Used?

The timestamp ordering protocol ensures that any pair of conflicting read/write operations
will be executed in their respective timestamp order. This is an alternative solution to using
locks.

For Read operations:

1. If TS (T) < W-timestamp (Q), then the transaction T is trying to read a value of data
item Q which has already been overwritten by some other transaction. Hence the
value which T wanted to read from Q does not exist there anymore, and T would be
rolled back.
2. If TS (T) >= W-timestamp (Q), then the transaction T is trying to read a value of data
item Q which was last written by a transaction with an older (or equal) timestamp.
Hence T will be allowed to read the value of Q, and the R-timestamp of Q should be
updated to TS (T).

For Write operations:

1. If TS (T) < R-timestamp (Q), then it means that the system has waited too long for
transaction T to write its value, and the delay has become so great that it has allowed
another transaction to read the old value of data item Q. In such a case T has lost its
relevance and will be rolled back.
2. Else if TS (T) < W-timestamp (Q), then transaction T has been delayed so much that the
system has allowed another transaction to write into the data item Q. In such a case
too, T has lost its relevance and will be rolled back.
3. Otherwise the system executes transaction T and updates the W-timestamp of Q to TS
(T).
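
The read and write rules above translate almost directly into code. Below is a minimal sketch
in Python, assuming each data item carries its R-timestamp and W-timestamp and that a
rejected operation rolls the transaction back by raising an exception; the class and function
names are illustrative assumptions.

class RollbackTransaction(Exception):
    pass

class DataItem:
    def __init__(self, value):
        self.value = value
        self.r_timestamp = 0     # latest timestamp of any transaction that read this item
        self.w_timestamp = 0     # latest timestamp of any transaction that wrote this item

def read(ts, q):
    # Read rule 1: the value T wants to read no longer exists, so T is rolled back.
    if ts < q.w_timestamp:
        raise RollbackTransaction("read rejected: Q was overwritten by a later transaction")
    # Read rule 2: the read is allowed and the R-timestamp is advanced.
    q.r_timestamp = max(q.r_timestamp, ts)
    return q.value

def write(ts, q, value):
    # Write rule 1: a later transaction has already read the old value of Q.
    if ts < q.r_timestamp:
        raise RollbackTransaction("write rejected: Q was already read by a later transaction")
    # Write rule 2: a later transaction has already written Q.
    if ts < q.w_timestamp:
        raise RollbackTransaction("write rejected: Q was already written by a later transaction")
    # Write rule 3: the write is executed and the W-timestamp is updated.
    q.value = value
    q.w_timestamp = ts

# Example: a transaction with timestamp 5 cannot write Q after a
# transaction with timestamp 8 has already read it.
q = DataItem(100)
read(8, q)
try:
    write(5, q, 50)
except RollbackTransaction as e:
    print(e)                     # write rejected: Q was already read by a later transaction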
