
UNIT 1 INTRODUCTION TO DBMS

1. File system organization

1. A computer file contains information arranged in an electronic format and facilitates easy
storage, retrieval, and manipulation of data.
2. Files are stored in the form of bits and bytes. Each file has a name, and the computer recognizes
a file by this name.
3. A programmer working with this file can give instructions to the computer to open the file, read
from it, write to it, modify its contents, close it, and so on.
4. When programs pass control to one another in a fixed sequence, with no or minimal human
interaction, this is called batch processing.
5. In many situations the program needs to be conversational (interactive). These days, the computer
performs both the searching and the answering operations in an automated manner.
6. A search that can take place at any time is called an online query.
7. When an instantaneous answer is expected, it is called online processing or real-time
processing. Example: airline reservations.
8. Data can be classified into two types:
o Master data: does not change with time.
o Transaction data: can change from time to time.
9. Example: Library Management
10. There is a library of books and a librarian to maintain it. The librarian has created one card per
book, which contains details such as book number, title, author, price and date of purchase.
11. For this, the librarian has used the conceptual record layout shown in Figure 1.1.

Fig 1.1 Record layout for the Book file


12. A card is similar to a record and, in technical terms, the entire pile of cards is similar to a file.
13. A field used to identify a record is called a record key, or just a key.
14. A record key can be of two types:
o Primary key: identifies a record uniquely based on a field value.
Example: book number.
o Secondary key: may or may not identify a record uniquely, but can identify one or more
records based on a field value.
Example: author.
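A minimal Python sketch of the Book card layout from Figure 1.1, assuming the five fields named in the text; the field names, the class `Book` and the sample values are hypothetical. A primary-key lookup returns at most one record, while a secondary-key lookup may return several.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Book:
    # Fields follow the card layout described above (names are illustrative).
    book_no: int        # primary key: unique for every book
    title: str
    author: str         # secondary key: may repeat across books
    price: float
    purchased: date


books = [
    Book(101, "Title 1", "Author A", 450.0, date(2020, 1, 10)),
    Book(102, "Title 2", "Author B", 500.0, date(2020, 2, 5)),
    Book(103, "Title 3", "Author A", 620.0, date(2021, 3, 15)),
]


def find_by_primary_key(records, book_no):
    """Return the single record with this book number, or None."""
    for rec in records:
        if rec.book_no == book_no:
            return rec
    return None


def find_by_secondary_key(records, author):
    """Return every record whose author matches (zero, one or many)."""
    return [rec for rec in records if rec.author == author]


print(find_by_primary_key(books, 102).title)                          # Title 2
print([b.book_no for b in find_by_secondary_key(books, "Author A")])  # [101, 103]
```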
1.1 Sequential organization
• A file is called a sequential file when its records are arranged in sequential fashion.
• Records are added as and when they become available; that is, a new record is always
added to the end of the current file.
• Advantages of sequential organization
o Simplicity: The sequential organization of records is quite simple; a new book only
requires a new record at the end of the file.
o Less overhead: There is no need to keep a key or any other extra information
for the book file; the file itself is enough.
• Disadvantages
o Difficulties in searching: Searching can be a very slow process. It starts from the
beginning and continues until the end, or until the desired record is found, whichever
is earlier. This is both time-consuming and cumbersome.
o Lack of support for queries: Even to find out whether something is available in
the file or not, the entire file has to be read.
o Problem with record deletion: It is not simple to delete records. The space freed
by the deletion of the record cannot be reclaimed.
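The search cost described above can be seen in a small Python sketch, assuming a list of dictionaries stands in for the sequential file; the class `SequentialFile` and its methods are hypothetical. The search also reports how many records it had to read.

```python
class SequentialFile:
    """A sequential file sketch: records are kept in arrival order."""

    def __init__(self):
        self.records = []

    def append(self, record):
        # A new record always goes to the end of the current file.
        self.records.append(record)

    def search(self, field, value):
        """Scan from the beginning until a match is found or the file ends.

        Returns (record or None, number of records read).
        """
        reads = 0
        for rec in self.records:
            reads += 1
            if rec[field] == value:
                return rec, reads
        return None, reads


f = SequentialFile()
for n in range(1, 1001):
    f.append({"book_no": n, "title": f"Title {n}"})

print(f.search("book_no", 3))     # record 3, found after reading 3 records
print(f.search("book_no", 5000))  # (None, 1000): the whole file had to be read
```

An unsuccessful search always costs a full pass over the file, which is the "lack of support for queries" problem in concrete form.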
1.2 Pointers
• A pointer in a record is a special field whose value is the address/reference of another
record in the same file.
• Pointer fields link records into a chain. A chain of records is a logical sequence of
records created by the use of pointer fields.
• Chains that can be followed in one direction only are called one-way chains.
• Problems with one-way chains:
o They can suffer from the drawback of lost/damaged references: if one pointer is
damaged, the records after it can no longer be reached.
o They are unidirectional by nature.
• Two-way chains
1. In two-way chains, another logical chain is created.
2. A second chain is added in the reverse direction, so that each record has two pointer
fields. The new pointer field is called the back-pointer.
3. The back-pointer points to the previous record in the chain.
4. A broken chain does not cause a major problem in two-way chains, for the
simple reason that another chain exists in the opposite direction.
5. Two-way chains therefore do not suffer from the drawback of lost/damaged references.
• Disadvantages:
1. More effort goes into their maintenance. For every new record that is added,
entries must be made in both the forward and backward pointer fields.
2. If a record is deleted (or lost), both the previous and next pointers need to be adjusted
so that they point to the correct records.
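A minimal sketch of a two-way chain, assuming records live at numeric addresses in a dictionary and `None` plays the role of a null pointer; the class `TwoWayChainFile` and its address-allocation scheme are hypothetical. Adding a record fills both pointer fields, and deleting one adjusts the neighbours on both sides.

```python
class TwoWayChainFile:
    """Records stored at numeric addresses, linked by forward and back pointers."""

    def __init__(self):
        self.slots = {}       # address -> {"data", "next", "prev"}
        self.head = None      # address of the first record in the chain
        self.tail = None      # address of the last record in the chain
        self.next_addr = 0    # next free address (hypothetical allocation scheme)

    def add(self, data):
        addr = self.next_addr
        self.next_addr += 1
        # Both the forward and the backward pointer fields must be filled in.
        self.slots[addr] = {"data": data, "next": None, "prev": self.tail}
        if self.tail is None:
            self.head = addr
        else:
            self.slots[self.tail]["next"] = addr
        self.tail = addr
        return addr

    def delete(self, addr):
        rec = self.slots.pop(addr)
        # Adjust the previous and next records so they skip the removed one.
        if rec["prev"] is None:
            self.head = rec["next"]
        else:
            self.slots[rec["prev"]]["next"] = rec["next"]
        if rec["next"] is None:
            self.tail = rec["prev"]
        else:
            self.slots[rec["next"]]["prev"] = rec["prev"]

    def forward(self):
        addr, out = self.head, []
        while addr is not None:
            out.append(self.slots[addr]["data"])
            addr = self.slots[addr]["next"]
        return out

    def backward(self):
        addr, out = self.tail, []
        while addr is not None:
            out.append(self.slots[addr]["data"])
            addr = self.slots[addr]["prev"]
        return out


chain = TwoWayChainFile()
a = chain.add("A")
b = chain.add("B")
c = chain.add("C")
chain.delete(b)
print(chain.forward())   # ['A', 'C']
print(chain.backward())  # ['C', 'A']
```

Because the chain can be walked from either end, a damaged forward pointer can in principle be rebuilt from the back-pointers, at the cost of maintaining two pointers per record.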
1.3 Indexed Organization
1. One of the fields in the file is the primary key. This field identifies a record uniquely. In
every record, the primary key field should occupy the same position.
2. To create and maintain an indexed file, the computer keeps two files: a data file and an index
file. The data file contains the actual contents (data) of the records, whereas the index file
contains the index entries.
3. The two files are organized as follows:
o The data file is sorted in the order of the primary key field values.
o The index file contains two fields: the key value and the pointer to the data area.
o One record in the index file thus consists of a key value and a pointer to the
corresponding data records.
o The key value is generally the largest primary key value in a given range of
records. The pointer points to the first entry within that range of data records.
4. This is illustrated in Figure 1.2. In the first index entry, the index value is C, which is the
highest primary key value in the first data block.
5. The pointer from this index entry points to the start of this range (i.e. A). The disk address
of A is assumed to be 0, as shown.
6. The second index entry contains F as the highest primary key value for that range of
records, and a pointer to D, which is the start of the range, and so on.
7. The disk address of D is assumed to be 100, as shown.

Fig 1.2 Indexed file organization


8. This arrangement works well, but it has two problems:
o New index values may have to be inserted between two existing values.
o The number of index values can become too high.
• Solutions:
o Inserting a new index entry necessitates a split in the index, and
appropriate adjustments in the address values.
o To solve the second problem, create a multi-level index (an index of indexes).
In this type of index, the very first level does not point to the data items as
before. Instead, it points to another lower-level index.
o Depending on the need, this lower-level index may point to yet another
lower-level index, and so on. Only the final level of index points to the
actual data items.
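A minimal sketch of the indexed organization above, extended to two levels. It assumes each index entry is a pair (highest key in the range, pointer); the data addresses 0 and 100 follow Figure 1.2, while the blocks at 200 and 300 and all variable names are hypothetical. Each level is searched with Python's `bisect_left` to find the first entry whose highest key is not smaller than the search key.

```python
from bisect import bisect_left

# Data area: blocks of records sorted by primary key, keyed by disk address.
# Addresses 0 and 100 follow Figure 1.2; blocks 200 and 300 are hypothetical.
data_blocks = {
    0: ["A", "B", "C"],
    100: ["D", "E", "F"],
    200: ["G", "H", "I"],
    300: ["J", "K", "L"],
}

# Lower-level index blocks: (highest key in the data block, address of the block).
index_blocks = {
    "ix0": [("C", 0), ("F", 100)],
    "ix1": [("I", 200), ("L", 300)],
}

# Top-level index (index of indexes): (highest key covered, index block it points to).
top_index = [("F", "ix0"), ("L", "ix1")]


def search_index(entries, key):
    """Return the pointer of the first entry whose highest key is >= key, else None."""
    highs = [high for high, _ in entries]
    pos = bisect_left(highs, key)
    return entries[pos][1] if pos < len(entries) else None


def lookup(key):
    """Follow the top level, then the lower level, then scan one data block."""
    index_block = search_index(top_index, key)
    if index_block is None:
        return None
    address = search_index(index_blocks[index_block], key)
    if address is None:
        return None
    # Only the final index level points at data; scan just that one block.
    return address if key in data_blocks[address] else None


print(lookup("E"))  # 100
print(lookup("K"))  # 300
print(lookup("Z"))  # None (beyond the highest key)
```

Dropping `top_index` and searching a single list of (key, address) entries gives the one-level index of Figure 1.2.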
1.4 Direct organization
• The idea is quite simple: all records in a direct file are of the same size.
• Every record has an associated record number. The record number serves the same
purpose as a primary key in an indexed file.
• Direct files can be classified into two main types:
o Hashed files
o Non-hashed files
• Non-hashed files
o Here, each record is placed in its appropriate slot based on its record number.
o The drawback of the non-hashed file approach is the creation of too many
empty slots.
• Hashed files
o In a hashed file, the record number (slot) is computed from the primary key by a hash
function, so the computed record number becomes an equivalent of the primary key.
o The term hash indicates splitting or chopping of the key into pieces.
o There are three primary hashing techniques: the division method, the mid-square
method and the folding method.
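The sketch below illustrates both kinds of direct file in Python. It assumes a fixed record size of 64 bytes, a table size of 97 for the division method, two middle digits for mid-square, and three-digit parts for folding; these parameters and all names are hypothetical, and collision handling is deliberately left out.

```python
RECORD_SIZE = 64  # bytes per fixed-size record (hypothetical value)


class NonHashedFile:
    """Non-hashed direct file: record number N is stored in slot N."""

    def __init__(self, max_record_no):
        # One slot per possible record number, all empty at first.
        self.slots = [None] * (max_record_no + 1)

    def write(self, record_no, record):
        self.slots[record_no] = record

    def read(self, record_no):
        # Direct access: the slot is known from the record number, no search needed.
        return self.slots[record_no]

    def empty_slots(self):
        return sum(1 for s in self.slots if s is None)


def byte_offset(record_no):
    # All records are the same size, so a record's position is a simple product.
    return record_no * RECORD_SIZE


def division_hash(key, table_size):
    """Division method: remainder of the key divided by the table size."""
    return key % table_size


def mid_square_hash(key, digits):
    """Mid-square method: square the key and keep its middle `digits` digits."""
    squared = str(key * key)
    start = max((len(squared) - digits) // 2, 0)
    return int(squared[start:start + digits])


def folding_hash(key, part_len, table_size):
    """Folding method (shift folding): chop the key into parts, add them, wrap."""
    s = str(key)
    parts = [int(s[i:i + part_len]) for i in range(0, len(s), part_len)]
    return sum(parts) % table_size


f = NonHashedFile(max_record_no=9999)
f.write(7, "record 7")
f.write(9999, "record 9999")
print(f.read(7), byte_offset(7))  # record 7 448
print(f.empty_slots())            # 9998 empty slots for only 2 records

print(division_hash(1234, 97))           # 70
print(mid_square_hash(1234, 2))          # 1234 * 1234 = 1522756 -> middle digits 22
print(folding_hash(123456789, 3, 1000))  # 123 + 456 + 789 = 1368 -> 368

# A hashed file: the slot for each record is computed from its primary key.
hashed_slots = [None] * 97
for emp_no in (1234, 5678):
    hashed_slots[division_hash(emp_no, 97)] = f"employee {emp_no}"
```

The hashed file needs only 97 slots for keys that range into the thousands, which is the point of hashing; the price is that two keys can map to the same slot, and a real file needs a collision strategy.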
2. Purpose of Database Systems
1. Database systems arose in response to early methods of computerized
management of commercial data.
2. As an example, consider part of a university organization that, among other data, keeps
information about all instructors, students, departments and course offerings.
3. One way to keep the information on a computer is to store it in operating system files.
4. To allow users to manipulate the information, the system has a number of application
programs that manipulate the files, including programs to:
o Add new students, instructors, and courses.
o Register students for courses.
o Assign grades to students, compute grade point averages (GPA) and generate
transcripts.
5. New application programs are added to the system as the need arises.

2.1 File Processing System
o This system is supported by a conventional operating system.
o The system stores permanent records in various files.
o It needs different application programs to extract records from, and add records
to, the appropriate files.
• Before database management systems (DBMSs) were introduced, organizations usually
stored information in such systems.
2.1.1 Drawbacks of using file systems to store data

• Data redundancy and inconsistency
o Different programmers create the files and application programs.
o The files have different structures, and the programs may be
written in several programming languages.
o The same information may be duplicated in several files; if one copy is updated and
another is not, the copies become inconsistent.
• Difficulty in accessing data
o A new program has to be written to carry out each new task.
• Data isolation
o Data are scattered in various files, which may be stored in different
formats.
o Writing new application programs to retrieve the appropriate data is therefore difficult.
• Integrity problems
o Integrity constraints (e.g., account balance > 0) become “buried” in
program code rather than being stated explicitly.
o It is hard to add new constraints or change existing ones.
• Atomicity of updates
o Failures may leave the database in an inconsistent state with partial updates carried
out.
o Example: A transfer of funds from one account to another should either
complete or not happen at all.
• Concurrent access anomalies
o Concurrent access is needed for improved performance.
o Uncontrolled concurrent accesses can lead to inconsistencies.
o Example: Two people read a balance (say 100) and update it by withdrawing
money (say 50 each) at the same time. Both read 100 and both write 50, so the final
balance is 50 instead of 0.
• Security problems
o Not every user of the database system should be able to access all the data.
o Example: In a university, payroll personnel need to see only the financial information;
they do not need to see information about academic records.
o Since application programs are added to the file processing system in an ad hoc
manner, enforcing such security constraints is difficult.
• Database systems offer solutions to all the above problems.
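To make the concurrent-access anomaly concrete, here is a deterministic Python simulation rather than real threads; the function `run` and the schedule format are hypothetical. Each transaction reads the shared balance into a private copy and later writes back its copy minus 50.

```python
def run(schedule, balance):
    """Execute (transaction, action) steps in order against one shared balance."""
    local = {}  # each transaction's privately read copy of the balance
    for txn, action in schedule:
        if action == "read":
            local[txn] = balance
        elif action == "write":
            balance = local[txn] - 50  # each transaction withdraws 50
    return balance


serial = [("T1", "read"), ("T1", "write"), ("T2", "read"), ("T2", "write")]
interleaved = [("T1", "read"), ("T2", "read"), ("T1", "write"), ("T2", "write")]

print(run(serial, 100))       # 0: both withdrawals are reflected
print(run(interleaved, 100))  # 50: T1's update is lost
```

The interleaved schedule is exactly the "two people withdraw 50 each" case: both read 100 before either writes. Concurrency control in a DBMS exists to rule out schedules like this one.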

3. Database System Terminology

Database: A collection of related data.
Data: Known facts that can be recorded and that have an implicit meaning.
Mini-world: Some part of the real world about which data is stored in a database.
For example: student grades and transcripts at a university.
Database Management System (DBMS): A software package/system to facilitate the creation and
maintenance of a computerized database.
Database System: The DBMS software together with the data itself. Sometimes, the applications are
also included.

4. Database Characteristics
The main characteristics of the database approach are the following:
1. Self-describing nature of a database system
2. Insulation between programs and data, and data abstraction
3. Support of multiple views of the data
4. Sharing of data and multiuser transaction processing

1. Self-describing nature of a database system


A DBMS catalog stores the description of a particular database (e.g. data structures, types,
and constraints).
This description is called meta-data.
This allows the DBMS software to work with different database applications.
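As one concrete illustration, SQLite (used here only as an example DBMS, via Python's standard `sqlite3` module) keeps its catalog in the `sqlite_master` table, and the `PRAGMA table_info` statement reports column meta-data. The `student` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# The catalog table sqlite_master stores each schema object's description (meta-data).
catalog = conn.execute(
    "SELECT type, name, sql FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(catalog)

# A generic program can discover column names and types without hard-coding them.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]
print(columns)  # [('roll_no', 'INTEGER'), ('name', 'TEXT')]
```

Nothing in the querying code knows the structure of `student` in advance; it reads that description from the catalog, which is what "self-describing" means in practice.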

2. Insulation between programs and data, and data abstraction


The structure of data files is stored in the DBMS catalog separately from the access
programs. This property is called program-data independence.
This allows changing data structures and storage organization without having to change the
DBMS access programs.
Data Abstraction: A data model is used to hide storage details and present the users
with a conceptual view of the database.
Programs refer to the data model constructs rather than data storage details.

3. Support of multiple views of the data


Each user may see a different view of the database, which describes only the data of
interest to that user.

4. Sharing of data and multiuser transaction processing


o Allows a set of concurrent users to retrieve from and update the
database.
o Concurrency control within the DBMS guarantees that each transaction is
correctly executed or aborted.
o The recovery subsystem ensures that each completed transaction has its effect
permanently recorded in the database.
o OLTP (Online Transaction Processing) is a major part of database
applications; OLTP systems may execute hundreds of concurrent transactions per
second.
VIEWS OF DATA
• A major purpose of a database system is to provide users with an abstract view of
the data; that is, the system hides certain details of how the data are stored and
maintained.
Views have several other benefits.
• Views provide a level of security. Views can be set up to exclude data that some
users should not see.
• Views provide a mechanism to customize the appearance of the database.
• A view can present a consistent, unchanging picture of the structure of the
database, even if the underlying database is changed.
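A minimal sketch of the first and last benefits, using SQLite through Python's `sqlite3` module as an example DBMS; the `instructor` table, the `instructor_public` view and the rows are hypothetical. The view hides the salary column, and its output is unchanged after the base table gains a new column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [(1, "Asha", "CS", 90000), (2, "Ravi", "Physics", 85000)],
)

# The view exposes only the columns a general user should see; salary is excluded.
conn.execute("CREATE VIEW instructor_public AS SELECT id, name, dept FROM instructor")

rows = conn.execute("SELECT * FROM instructor_public ORDER BY id").fetchall()
print(rows)  # [(1, 'Asha', 'CS'), (2, 'Ravi', 'Physics')]

# The underlying table changes, but the picture presented by the view does not.
conn.execute("ALTER TABLE instructor ADD COLUMN phone TEXT")
rows_after = conn.execute("SELECT * FROM instructor_public ORDER BY id").fetchall()
print(rows_after == rows)  # True
```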
The ANSI/SPARC architecture defines three levels of data abstraction:
• External level / view level
• Conceptual level / logical level
• Internal level / physical level
The objective of the three-level architecture is to separate each user's view of
the database from the way the database is physically represented.
External level
• The users' view of the database. The external level describes that part of the database
that is relevant to each user. The external level consists of a number of different
external views of the database. Each user has a view of the 'real world' represented
in a form that is familiar for that user. The external view includes only those
entities, attributes, and relationships in the real world that the user is interested in.
The use of external models has some major advantages:
• Makes application programming much easier.
• Simplifies the database designer's task.
• Helps in ensuring database security.
Conceptual level
• The community view of the database. The conceptual level describes what data is
stored in the database and the relationships among the data. The middle
level in the three-level architecture is the conceptual level. This level
contains the logical structure of the entire database as seen by the DBA. It is a
complete view of the data requirements of the organization that is
independent of any storage considerations.
The conceptual level represents:
• All entities, their attributes and their relationships
• The constraints on the data
• Semantic information about the data
• Security and integrity information
• The conceptual level supports each external view. However, this level must
not contain any storage-dependent details. For instance, the description of an
entity should contain only the data types of attributes and their length, but not any
storage consideration such as the number of bytes occupied.
Internal level
• The physical representation of the database on the computer. The internal level describes
how the data is stored in the database. The internal level covers the physical
implementation of the database to achieve optimal runtime performance and storage
space utilization. It covers the data structures and file organizations used to store
data on storage devices.
The internal level is concerned with:
• Storage space allocation for data and indexes
• Record descriptions for storage
• Record placement
• Data compression and data encryption techniques
• Below the internal level there is a physical level that may be managed by the
operating system under the direction of the DBMS.

Physical level
• The physical level below the DBMS consists of items only the operating system
knows, such as exactly how the sequencing is implemented and whether the fields
of internal records are stored as contiguous bytes on the disk.
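The separation between levels can be tried out with SQLite (via Python's `sqlite3`, as an example DBMS): adding an index is a change at the internal level, and the same application query returns the same answer afterwards. The `account` table, its data and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, branch TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(i, f"B{i % 5}", 100.0 * i) for i in range(1, 101)],
)

QUERY = "SELECT acc_no FROM account WHERE branch = ? ORDER BY acc_no"
before = conn.execute(QUERY, ("B3",)).fetchall()

# Internal-level change only: add an access path (index) on branch.
conn.execute("CREATE INDEX idx_account_branch ON account(branch)")

after = conn.execute(QUERY, ("B3",)).fetchall()
print(before == after)  # True: the application query did not change

# EXPLAIN QUERY PLAN shows the storage-level strategy (its text varies by SQLite version).
for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("B3",)):
    print(row)
```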
Instances and Schemas
• Schemas and instances are similar to types and variables in programming languages.
The schema is the logical structure of the database (e.g., the database consists of
information about a set of customers and accounts and the relationships between
them); it is analogous to the type information of a variable in a program. An instance is
the actual content of the database at a particular point in time, analogous to the value
of a variable.
• Physical schema: database design at the physical level
• Logical schema: database design at the logical level

DATA MODELS
The data model is a collection of conceptual tools for describing data, data relationships, data
semantics, and consistency constraints. A data model provides a way to describe the design of
a database at the physical, logical and view levels.
The purpose of a data model is to represent data and to make the data understandable.
According to the types of concepts used to describe the database structure, there are three
data models:
1. An external data model, to represent each user's view of the organization.
2. A conceptual data model, to represent the logical view that is DBMS independent
3. An internal data model, to represent the conceptual schema in such a way that it can
be understood by the DBMS.

Categories of data model:
1. Record-based data models
2. Object-based data models
3. Physical data models
The first two are used to describe data at the conceptual and external levels; the third is used
to describe data at the internal level.

1. Record-based data models
In a record-based model, the database consists of a number of fixed-format records, possibly
of differing types. Each record type defines a fixed number of fields, each typically of a
fixed length.
There are three types of record-based logical data model:
• Hierarchical data model
• Network data model
• Relational data model
Hierarchical data model
In the hierarchical model, data is represented as collections of records and relationships are
represented by sets. The hierarchical model allows a node to have only one parent. A
hierarchical model can be represented as a tree graph, with records appearing as nodes, also
called segments, and sets as edges.
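A minimal sketch of the one-parent rule, assuming segments are stored in a dictionary that maps each segment to its single parent; all segment names and functions are hypothetical. A second parent for the same segment is rejected, so in a pure hierarchy a student taking two courses would have to be stored twice.

```python
# Each segment maps to its single parent (None for the root).
parent_of = {
    "Department:CS": None,
    "Course:DBMS": "Department:CS",
    "Course:OS": "Department:CS",
    "Student:101": "Course:DBMS",
}


def add_segment(tree, segment, parent):
    """Add a child segment; the hierarchical model allows only one parent per node."""
    if segment in tree:
        raise ValueError(f"{segment} already has parent {tree[segment]}")
    tree[segment] = parent


def path_from_root(tree, segment):
    """Follow parent links up to the root and return the path top-down."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = tree[segment]
    return list(reversed(path))


print(path_from_root(parent_of, "Student:101"))
# ['Department:CS', 'Course:DBMS', 'Student:101']
try:
    add_segment(parent_of, "Student:101", "Course:OS")
except ValueError as err:
    print("rejected:", err)
```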

Network data model
In the network model, data is represented as collections of records and relationships
are represented by sets. Each set is composed of at least two record types:
• An owner record that is equivalent to the hierarchical model's parent
• A member record that is equivalent to the hierarchical model's child
A set represents a 1:M relationship between the owner and the member.
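A minimal sketch of network-model sets, assuming each set type keeps a list of members per owner and enforces one owner per member within that set; the class `SetType` and the record names are hypothetical. Unlike the hierarchical sketch, the same member record can take part in several set types.

```python
from collections import defaultdict


class SetType:
    """A network-model set type: one owner record type, one member record type, 1:M."""

    def __init__(self, name, owner_type, member_type):
        self.name = name
        self.owner_type = owner_type
        self.member_type = member_type
        self.members = defaultdict(list)  # owner -> list of its members
        self.owner_of = {}                # member -> its owner in THIS set type

    def connect(self, owner, member):
        # Within one set type, a member has at most one owner (the 1:M rule).
        if member in self.owner_of:
            raise ValueError(f"{member} already owned by {self.owner_of[member]} in {self.name}")
        self.owner_of[member] = owner
        self.members[owner].append(member)


# A member record may participate in several different set types.
teaches = SetType("TEACHES", "Instructor", "Section")
offers = SetType("OFFERS", "Course", "Section")

teaches.connect("Instructor:Rao", "Section:S1")
teaches.connect("Instructor:Rao", "Section:S2")
offers.connect("Course:DBMS", "Section:S1")

print(teaches.members["Instructor:Rao"])  # ['Section:S1', 'Section:S2']
print(offers.owner_of["Section:S1"])      # Course:DBMS
```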

Relational data model:

The relational data model is based on the concept of mathematical relations. The relational model
stores data in the form of tables. Each table corresponds to an entity, and each row represents
an instance of that entity. Tables, also called relations, are related to each other through the
sharing of a common entity characteristic.
Examples of relational DBMSs: DB2, Oracle, MS SQL Server.
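A small sketch of two tables related through a shared attribute, using SQLite via Python's `sqlite3` as an example relational DBMS; the table names, the `dept_id` column and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, ename TEXT, dept_id INTEGER);
    INSERT INTO department VALUES (4, 'Administration'), (5, 'Research');
    INSERT INTO employee VALUES (1, 'Ana', 5), (2, 'Bo', 4), (3, 'Cy', 5);
""")

# The shared attribute dept_id is what relates the two tables.
rows = conn.execute("""
    SELECT e.ename, d.dname
    FROM employee AS e JOIN department AS d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
""").fetchall()
print(rows)  # [('Ana', 'Research'), ('Bo', 'Administration'), ('Cy', 'Research')]
```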
Object-Based Data Models
Object-based data models use concepts such as entities, attributes, and relationships. An entity
is a distinct object in the organization that is to be represented in the database. An attribute is a
property that describes some aspect of the object, and a relationship is an association between
entities. Common types of object-based data model are:
• Entity-Relationship model
• Object-oriented model
• Semantic model
Entity Relationship Model:
The ER model is based on the following components:
• Entity: An entity is defined as anything about which data are to be collected and stored.
Each row in a relational table corresponds to an entity instance (or entity occurrence) in the ER
model. Each entity is described by a set of attributes that describe particular characteristics of
the entity.
Object-oriented model:
In the object-oriented data model (OODM) both data and their relationships are contained in
a single structure known as an object. An object is described by its factual content. An object
includes information about relationships between the facts within the object, as well as
information about its relationships with other objects. Therefore, the facts within the object
are given greater meaning. The OODM is said to be a semantic data model because semantic
indicates meaning.
The OO data model is based on the following components:
• An object is an abstraction of a real-world entity.
• Attributes describe the properties of an object.
Data abstraction:
o Suppression of details of data organization and storage.
o Highlighting of the essential features for an improved understanding of data.
Data model:
• Collection of concepts that describe the structure of a database.
• Provides means to achieve data abstraction.
Basic operations
o Specify retrievals and updates on the database.
o Dynamic aspect or behavior of a database application.
o Allow the database designer to specify a set of valid operations allowed on
database objects.
Categories of Data Models
• High-level or conceptual data models
o Close to the way many users perceive data.
o Conceptual data models use concepts such as entities, attributes, and relationships.
o Entity: represents a real-world object or concept.
o Attribute: represents some property of interest that further describes an entity.
o Relationship: an association among two or more entities.
• Low-level or physical data models
o Describe the details of how data is stored on computer storage media.
• Representational data models
o Easily understood by end users.
o Also similar to how data is organized in computer storage.
o Relational data model: used most frequently in traditional commercial DBMSs.
o Object data model: a new family of higher-level implementation data models that are
closer to conceptual data models.
• Physical data models
o Describe how data is stored as files in the computer.
o Access path: a structure that makes the search for particular database records efficient.
o Index: an example of an access path that allows direct access to data using an index term
or keyword.
6. DBMS Components
A DBMS is a complex software system.
Figure 6.1 illustrates, in a simplified form, the typical DBMS components.

Fig 6.1 Component modules of a DBMS and their interactions


The top part of the figure refers to the various users of the database environment and
their interfaces.
The lower part shows the internals of the DBMS responsible for storage of data
and processing of transactions.
Let us consider the top part of the figure first:
1. It shows interfaces for:
o the DBA staff,
o casual users who work with interactive interfaces to formulate queries,
o application programmers who program using some host languages, and
o parametric users who do data entry work by supplying parameters to
predefined transactions.
2. The DDL compiler: processes schema definitions, specified in the DDL, and
stores descriptions of the schemas (meta-data) in the DBMS catalog.
3. Casual users and persons with an occasional need for information from the database
interact using some form of interface called the interactive query interface.
4. Query compiler: handles high-level queries that are entered interactively.
5. The query optimizer is concerned with the rearrangement and possible reordering of
operations, elimination of redundancies, and use of correct algorithms and indexes
during execution.
6. Application programmers write programs in host languages such as Java, C, or
C++ that are submitted to a precompiler.
7. The precompiler extracts DML commands from an application program written in a
host programming language.
8. DML compiler: compiles the DML commands into object code for database access.
9. The rest of the program is sent to the host language compiler.
10. The object codes for the DML commands and the rest of the program are linked,
forming a canned transaction whose executable code includes calls to the runtime
database processor.
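A small sketch of a canned transaction as seen by a parametric user: the DML statement is fixed in advance and only parameters are supplied. It assumes Python's `sqlite3` module, which is a call-level interface: the SQL string is passed to the DBMS at run time rather than being extracted by a precompiler, so this illustrates the parametric-user side, not the precompilation step. The `booking` table and `book_seat` function are hypothetical.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE booking (pnr TEXT PRIMARY KEY, passenger TEXT, flight TEXT)")
    return conn


def book_seat(conn, pnr, passenger, flight):
    """A canned transaction: the SQL is fixed; the user only supplies parameters."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "INSERT INTO booking (pnr, passenger, flight) VALUES (?, ?, ?)",
            (pnr, passenger, flight),
        )


conn = make_db()
book_seat(conn, "PNR001", "Meera", "AI-101")
try:
    book_seat(conn, "PNR001", "Arjun", "AI-101")  # same PNR violates the primary key
except sqlite3.IntegrityError:
    print("rejected; nothing was committed")
print(conn.execute("SELECT COUNT(*) FROM booking").fetchone()[0])  # 1
```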
Now let us consider the lower part of the figure:
1. Run-time database processor: handles database access at run time. It receives retrieval
and update operations and carries them out on the database.
2. It also works with the stored data manager, which controls access to DBMS
information that is stored on disk through interaction with the operating system.
3. Concurrency control and backup and recovery systems are integrated into the
working of the runtime database processor for purposes of transaction management.
Database System Utilities
Some functions are not provided through the normal DBMS components; rather, they are
provided through additional programs called utilities. Some of these
are:
1. Loading or import utility: used to load or import existing data files into the database.
2. Backup utility: used to create backup copies of the database, usually by dumping the
entire database onto tape.
3. File reorganization utility: is used to reorganize a database file into a different
file organization to improve performance.
4. Performance monitoring utility: is used to monitor database usage and provides
statistics to the DBA.
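As a small illustration of a backup utility, Python's `sqlite3` module offers `Connection.backup()` (available since Python 3.7) to copy a whole database, and `Connection.iterdump()` to dump it as SQL text. Here a second in-memory database stands in for the backup medium; the `item` table and its rows are hypothetical.

```python
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
src.executemany("INSERT INTO item (name) VALUES (?)", [("pen",), ("ink",)])
src.commit()

# Copy the entire database into another connection (the "backup copy").
backup_copy = sqlite3.connect(":memory:")
src.backup(backup_copy)

restored = backup_copy.execute("SELECT name FROM item ORDER BY id").fetchall()
print(restored)  # [('pen',), ('ink',)]

# A text dump (a script of SQL statements) is another common backup/export format.
dump = "\n".join(src.iterdump())
```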

DATABASE SYSTEM ARCHITECTURE
Transaction Management
A transaction is a collection of operations that performs a single logical function in a
database application. Transaction-management component ensures that the database
remains in a consistent (correct) state despite system failures (e.g. power failures and
operating system crashes) and transaction failures. Concurrency-control manager controls
the interaction among the concurrent transactions, to ensure the consistency of the database.
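A minimal sketch of transaction atomicity, using the funds-transfer example from Section 2.1.1 and SQLite via Python's `sqlite3` module. The `account` table, its `CHECK (balance >= 0)` constraint and the `transfer` function are hypothetical. The credit is applied first; when the debit violates the constraint, the whole transaction is rolled back, so no partial update survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance REAL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100.0), ("B", 0.0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money as one transaction: both updates happen or neither does."""
    try:
        with conn:  # commit on success, roll back on any exception
            conn.execute("UPDATE account SET balance = balance + ? WHERE acc_no = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE acc_no = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT acc_no, balance FROM account"))


first = transfer(conn, "A", "B", 30)
print(first, balances(conn))   # True {'A': 70.0, 'B': 30.0}
second = transfer(conn, "A", "B", 500)
print(second, balances(conn))  # False {'A': 70.0, 'B': 30.0}: the credit to B was undone
```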

Storage Management
a. A storage manager is a program module that provides the interface between the low-level
data stored in the database and the application programs and queries submitted to the system.
b. The storage manager is responsible for the following tasks:
o Interaction with the file manager
o Efficient storing, retrieving, and updating of data
Database Administrator
The database administrator coordinates all the activities of the database system and has a good
understanding of the enterprise’s information resources and needs. The DBA's responsibilities include:
a. Schema definition
b. Storage structure and access method definition
c. Schema and physical organization modification
d. Granting user authority to access the database
e. Specifying integrity constraints
f. Acting as liaison with users
g. Monitoring performance and responding to changes in requirements
Database Users
Users are differentiated by the way they expect to interact with the system.
a. Application programmers: interact with the system through DML calls.
b. Sophisticated users: form requests in a database query language.
c. Specialized users: write specialized database applications that do not fit into the
traditional data processing framework.
d. Naive users: invoke one of the permanent application programs that have been
written previously.
File manager
Manages allocation of disk space and data structures used to represent information on disk.
Database manager
The interface between low-level data and application programs and queries.
Query processor
Translates statements in a query language into low-level instructions the database
manager understands.
DML precompiler
Converts DML statements embedded in an application program to normal procedure calls in
a host language. The precompiler interacts with the query processor.
DDL compiler
Converts DDL statements to a set of tables containing metadata stored in a data dictionary.
In addition, several data structures are required for physical system implementation:
Data files: store the database itself.
Data dictionary: stores information about the structure of the database. Because it is used heavily,
great emphasis should be placed on developing a good design and efficient implementation of
the dictionary.
Indices: provide fast access to data items holding particular values.
7. Relational Algebra
1. A set of operators (unary and binary) that take relation instances as arguments and return
new relations.
2. Gives a procedural method of specifying a retrieval query.
3. Forms the core component of a relational query engine.
4. SQL queries are internally translated into Relational Algebra expressions.
5. Provides a framework for query optimization.
6. A sequence of relational algebra operations forms a relational algebra expression.

7.1 Unary Relational Operations: SELECT, PROJECT and RENAME


• The SELECT operation (denoted by σ, sigma) is used to select those tuples of a relation
that satisfy a given condition.

Notation: σ<selection condition>(R)
σ: select operator (read as sigma)
R: relation name
Examples of select expressions
Obtain information about a professor with name “giridhar”:
σ name = “giridhar” (professor)
Obtain information about professors who joined the university from 1980 up to (but not
including) 1985:
σ startYear ≥ 1980 ∧ startYear < 1985 (professor)
To select the tuples for all employees who either work in department 4 and make over
$25,000 per year, or work in department 5 and make over $30,000, the following
SELECT operation is given:
σ(Dno=4 AND Salary>25000) OR (Dno=5 AND Salary>30000)(EMPLOYEE)
The result is shown in Figure 7.1.

Fig 7.1 Results of SELECT operation


The Boolean conditions AND, OR, and NOT have their normal interpretation, as follows:
(cond1 AND cond2) is TRUE if both (cond1) and (cond2) are TRUE;
otherwise, it is FALSE.
(cond1 OR cond2) is TRUE if either (cond1) or (cond2) or both are TRUE;
otherwise, it is FALSE.
(NOT cond) is TRUE if cond is FALSE; otherwise, it is FALSE.
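The SELECT example above can be sketched in Python, assuming a relation is a list of dictionaries and the selection condition is a Boolean function of one tuple; the function `select` and the sample EMPLOYEE rows are hypothetical.

```python
def select(relation, predicate):
    """σ: keep only the tuples for which the selection condition is TRUE."""
    return [t for t in relation if predicate(t)]


EMPLOYEE = [
    {"Fname": "Asha", "Lname": "Rao", "Dno": 4, "Salary": 43000},
    {"Fname": "Ben", "Lname": "Lee", "Dno": 4, "Salary": 25000},
    {"Fname": "Chen", "Lname": "Wu", "Dno": 5, "Salary": 40000},
    {"Fname": "Dev", "Lname": "Shah", "Dno": 5, "Salary": 30000},
]

# σ (Dno=4 AND Salary>25000) OR (Dno=5 AND Salary>30000) (EMPLOYEE)
result = select(
    EMPLOYEE,
    lambda t: (t["Dno"] == 4 and t["Salary"] > 25000)
    or (t["Dno"] == 5 and t["Salary"] > 30000),
)
print([t["Fname"] for t in result])  # ['Asha', 'Chen']
```

Ben and Dev are excluded because the comparisons are strict: a salary exactly equal to the threshold does not satisfy `>`.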
• The PROJECT operation (denoted by π, pi) is used to keep only the required
attributes of a relation instance and throw away the others.

Notation: π<attribute list>(R)
π: project operator (read as pi)
R: relation name
Examples of project expressions
To list each employee’s first and last name and salary, the PROJECT operation is
used as follows:
πLname, Fname, Salary(EMPLOYEE)
The result is shown in Figure 7.2.

Fig 7.2 Results of PROJECT operation
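A sketch of PROJECT under the same assumptions (a relation as a list of dictionaries; `project` and the rows are hypothetical). Because relational algebra works on sets, duplicate result tuples are kept only once, which is visible when projecting on `Dno`.

```python
def project(relation, attributes):
    """π: keep only the listed attributes, eliminating duplicate tuples."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:  # a relation is a set, so each tuple appears once
            seen.add(row)
            result.append(row)
    return result


EMPLOYEE = [
    {"Fname": "Asha", "Lname": "Rao", "Salary": 43000, "Dno": 4},
    {"Fname": "Chen", "Lname": "Wu", "Salary": 40000, "Dno": 5},
    {"Fname": "Dev", "Lname": "Shah", "Salary": 30000, "Dno": 5},
]

names = project(EMPLOYEE, ["Lname", "Fname", "Salary"])
print(names)  # [('Rao', 'Asha', 43000), ('Wu', 'Chen', 40000), ('Shah', 'Dev', 30000)]

depts = project(EMPLOYEE, ["Dno"])
print(depts)  # [(4,), (5,)]: department 5 appears once
```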


• The RENAME operator is denoted by ρ (rho).
It is used to rename the attributes of a relation, the relation name, or both.
The general RENAME operation ρ can be expressed in any of the following forms:
ρS(B1, B2, …, Bn)(R) changes both:
the relation name to S, and
the column (attribute) names to B1, B2, …, Bn
ρS(R) changes:
the relation name only, to S
ρ(B1, B2, …, Bn)(R) changes:
the column (attribute) names only, to B1, B2, …, Bn
Example of Rename operation
To rename the attributes in a relation, simply list the new attribute names in parentheses,
as in the following example:
TEMP ← σ DNO = 4 (EMPLOYEE)
R (FN, LN, SAL)← π FNAME, LNAME, SALARY (TEMP)
These two operations are illustrated in Figure 7.3.
Fig 7.3 Results of RENAME operation
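A sketch of the three RENAME forms, assuming a relation is a named tuple of (name, attribute names, tuples); the `Relation` type, `rename` function and sample data are hypothetical. For brevity, TEMP here is assumed to already contain only FNAME, LNAME and SALARY. Renaming never touches the tuples themselves.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    name: str
    attributes: tuple
    tuples: list


def rename(r, new_name=None, new_attributes=None):
    """ρ: change the relation name, the attribute names, or both; tuples are unchanged."""
    if new_attributes is not None and len(new_attributes) != len(r.attributes):
        raise ValueError("need exactly one new name per attribute")
    return Relation(
        new_name if new_name is not None else r.name,
        tuple(new_attributes) if new_attributes is not None else r.attributes,
        r.tuples,
    )


TEMP = Relation("TEMP", ("FNAME", "LNAME", "SALARY"),
                [("Asha", "Rao", 43000), ("Ben", "Lee", 25000)])

R = rename(TEMP, "R", ("FN", "LN", "SAL"))         # ρR(FN, LN, SAL)(TEMP): both
S = rename(TEMP, new_name="S")                     # ρS(TEMP): relation name only
T = rename(TEMP, new_attributes=("A", "B", "C"))   # ρ(A, B, C)(TEMP): attributes only
print(R.name, R.attributes)  # R ('FN', 'LN', 'SAL')
```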

7.2 Relational Algebra Operations from Set Theory
Union operation
1. Binary operation, denoted by ∪.
2. The result of R ∪ S is a relation that includes all tuples that are either in R or in S or in
both R and S.
3. Duplicate tuples are eliminated.
4. The two operand relations R and S must be “type compatible” (or UNION compatible):
o R and S must have the same number of attributes.
o Each pair of corresponding attributes must be type compatible (have the same domains).
Intersection operation
1. INTERSECTION is denoted by ∩.
2. The result of the operation R ∩ S is a relation that includes all tuples that are in both R
and S.
3. The attribute names in the result will be the same as the attribute names in R.
4. The two operand relations R and S must be “type compatible”.
Set Difference
1. SET DIFFERENCE (also called MINUS or EXCEPT) is denoted by –.
2. The result of R – S is a relation that includes all tuples that are in R but not in S.
3. The attribute names in the result will be the same as the attribute names in R.
4. The two operand relations R and S must be “type compatible”.
Example of union, intersection and set difference operations
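A sketch of the three set operations, assuming a relation carries its attribute names, one domain per attribute (here a Python type) and a frozen set of tuples; the `Relation` type, the functions and the STUDENT/INSTRUCTOR rows are hypothetical. Each operation first checks type compatibility, and the result keeps the attribute names of R.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    attributes: tuple  # attribute names
    domains: tuple     # one domain (here: a Python type) per attribute
    tuples: frozenset  # a set of tuples, so duplicates cannot occur


def check_compatible(r, s):
    # Same number of attributes and the same domain for each corresponding pair.
    if r.domains != s.domains:
        raise TypeError("relations are not type (union) compatible")


def union(r, s):
    check_compatible(r, s)
    return Relation(r.attributes, r.domains, r.tuples | s.tuples)


def intersection(r, s):
    check_compatible(r, s)
    return Relation(r.attributes, r.domains, r.tuples & s.tuples)


def difference(r, s):
    check_compatible(r, s)
    return Relation(r.attributes, r.domains, r.tuples - s.tuples)


STUDENT = Relation(("Fn", "Ln"), (str, str),
                   frozenset({("Susan", "Yao"), ("Ramesh", "Shah"), ("Johnny", "Kohler")}))
INSTRUCTOR = Relation(("Fname", "Lname"), (str, str),
                      frozenset({("John", "Smith"), ("Ramesh", "Shah"), ("Susan", "Yao")}))

print(sorted(union(STUDENT, INSTRUCTOR).tuples))
print(sorted(intersection(STUDENT, INSTRUCTOR).tuples))  # [('Ramesh', 'Shah'), ('Susan', 'Yao')]
print(sorted(difference(STUDENT, INSTRUCTOR).tuples))    # [('Johnny', 'Kohler')]
print(union(STUDENT, INSTRUCTOR).attributes)             # ('Fn', 'Ln'): names come from R
```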
Cartesian (Or Cross) Product Operation
1. This operation is used to combine tuples from two relations in a combinatorial fashion.
2. Denoted by R(A1, A2, . . ., An) x S(B1, B2, . . ., Bm).
3. Result is a relation Q with degree n + m attributes: Q(A1, A2, . . ., An, B1, B2, . . .,
Bm), in that order.
4. The resulting relation state has one tuple for each combination of tuples—one from R
and one from S.
5. Hence, if R has nR tuples and S has nS tuples, then R x S will have nR * nS tuples.
6. The two operands do NOT have to be type compatible.
7. Example:
FEMALE_EMPS ← σ SEX=’F’(EMPLOYEE)
EMPNAMES ← π FNAME, LNAME, SSN (FEMALE_EMPS)
EMP_DEPENDENTS ← EMPNAMES x DEPENDENT
8. EMP_DEPENDENTS will contain every combination of EMPNAMES and
DEPENDENT.
9. The operations are illustrated in Fig7.4.

Fig7.4. The Cartesian product (Cross Product) operation
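A small Python sketch of the cross product, assuming relations are sets of equal-length tuples; `cross_product` and the sample EMPNAMES and DEPENDENT states are hypothetical (DEPENDENT is trimmed to two attributes). It shows the degree n + m and the nR * nS tuple count directly.

```python
from itertools import product


def cross_product(r_attrs, r_rows, s_attrs, s_rows):
    """R x S: degree n + m and one result tuple per (r, s) combination.

    Unlike UNION, the operands need not be type compatible.
    """
    q_attrs = tuple(r_attrs) + tuple(s_attrs)          # Q(A1..An, B1..Bm), in that order
    q_rows = {tr + ts for tr, ts in product(r_rows, s_rows)}
    return q_attrs, q_rows


# Hypothetical states of EMPNAMES(FNAME, LNAME, SSN) and a trimmed DEPENDENT(ESSN, DEPENDENT_NAME).
empnames = {("Ana", "Diaz", "111"), ("Mei", "Chen", "222")}
dependent = {("111", "Leo"), ("333", "Sam"), ("333", "Ava")}

attrs, emp_dependents = cross_product(("FNAME", "LNAME", "SSN"), empnames,
                                      ("ESSN", "DEPENDENT_NAME"), dependent)
print(len(attrs), len(emp_dependents))   # 5 6, i.e. n + m attributes and nR * nS tuples

# Combinations pairing an employee with one of their own dependents (SSN = ESSN).
actual = {t for t in emp_dependents if t[2] == t[3]}
```

Only the combinations where SSN equals ESSN are meaningful; the rest are the arbitrary pairings described in item 8, which is why a cross product is normally followed by a SELECT.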

8. Relational DBMS (RDBMS)


It is a database management system where the data are organized as tables
of data values and all the operations on the data work on these tables.
8.0 Codd’s rules
Dr. Edgar F. Codd proposed a set of 12 rules that were intended to
define the important characteristics and capabilities of any relational
system [Codd 1986]. The rules are listed below:
Rule 1. Information rule: All information is represented logically by values in tables.
Rule 2. Guaranteed access rule: Every data value is logically accessible by a combination of table
name, primary key value and column name.
Rule 3. Missing information rule: Null values are systematically supported independent of data type.
Rule 4. System catalogue rule: The logical description of the database is represented in the same
way as ordinary data and may be interrogated by authorized users.
Rule 5. Comprehensive language rule: A high-level relational language must support all of the
following: data definition, view definition, data manipulation, integrity constraints,
authorization, and transaction boundaries.
Rule 6. View update rule: The system should be able to perform all theoretically possible updates
on views.
Rule 7. Set-level update rule: The ability to treat a whole table as a single object applies to
insertion, modification and deletion, as well as to retrieval of data.
Rule 8. Physical data independence rule: User operations and application programs should be
independent of any changes in physical storage.
Rule 9. Logical data independence rule: User operations and application programs should be
independent of any changes in the logical structure of base tables, provided they involve no loss
of information.
Rule 10. Integrity independence rule: Entity and referential integrity constraints should be
defined in the high-level relational language, not by application programs.
Rule 11. Distribution independence rule: User operations and application programs should be
independent of the location of data when it is distributed over multiple computers.
Rule 12. Non-subversion rule: If a low-level procedural language is supported, it must not be able
to subvert integrity or security constraints expressed in the high-level relational language.

9. Entity-Relationship model
Entity-Relationship (ER) model: A popular high-level conceptual data model.
ER diagrams: The diagrammatic notation associated with the ER model.
Entity: A thing in the real world with an independent existence.
Attributes: Particular properties that describe an entity. For example, an
EMPLOYEE entity may be described by the attributes employee’s
name, age, address, salary, and job.
Several types of attributes occur in the ER model: simple, composite,
single-valued, multivalued, stored, and derived.
Simple or atomic attributes: Attributes that are not divisible.
Composite attributes: Attributes that can be divided into smaller subparts, which
represent more basic attributes with independent meanings. Composite
attributes can form a hierarchy.
Example: Address attribute of the EMPLOYEE entity can be subdivided
into Street_address, City, State, and Zip.

Fig9.1. A hierarchy of composite attributes


Single-Valued Attributes: Attributes that have a single value for a
particular entity. For example, Age of a person.
Multivalued Attributes: An attribute can have a set of values for the same entity.
A multivalued attribute may have lower and upper bounds to
constrain the number of values allowed for each individual entity.
Stored versus Derived Attributes:
Two (or more) attribute values are related. Example: Age and
Birth_date attributes of a person.
For a particular person entity, the value of Age can be determined from the current
(today’s) date and the value of that person’s Birth_date.
The Age attribute is called a derived attribute and is said to be derivable from the
Birth_date attribute, which is called a stored attribute.
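Deriving Age from Birth_date is simple date arithmetic. The sketch below is an assumption about how it could be computed, not a DBMS feature; `derive_age` is a hypothetical helper that takes today's date as a parameter (instead of calling `date.today()`) so the result is reproducible.

```python
from datetime import date


def derive_age(birth_date: date, today: date) -> int:
    """Derived attribute Age, computed from the stored attribute Birth_date."""
    years = today.year - birth_date.year
    # One year less if this year's birthday has not been reached yet.
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


print(derive_age(date(1990, 6, 15), date(2024, 6, 14)))  # 33
```

Storing Birth_date and deriving Age on demand avoids an Age value that silently goes out of date every year.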
Entity type: Collection (or set) of entities that have the same attributes.

Fig 9.2 Two entity types, EMPLOYEE and COMPANY, and some member entities of
each
Key or uniqueness constraint: An attribute whose values are distinct for each individual
entity in the entity set.
Key attribute: The uniqueness property must hold for every entity set of the entity type.
Value sets (or domain of values): Specifies the set of values that may be assigned to
that attribute for each individual entity.
Relationship: Whenever an attribute of one entity type refers to another entity type, some
relationship exists. In the ER model, such references should be represented as relationships,
not as attributes.
Relationship Types, Sets, and Instances:
Relationship type R among n entity types E1, E2, ..., En: Defines a set of
associations among entities from these entity types.
Relationship instances ri: Each ri associates n individual entities (e1, e2, ..., en), and
each entity ej in ri is a member of entity set Ej.
Relationship Degree
Degree of a relationship type: The number of participating entity types. A
relationship type of degree two is called binary, and one of degree three is called
ternary.
Relationships as attributes: Think of a binary relationship type in terms of
attributes.

Fig9.3. Some instances in the WORKS_FOR relationship set, which represents a relationship
type WORKS_FOR between EMPLOYEE and DEPARTMENT

• Role names: A role name signifies the role that a participating entity plays in
each relationship instance.
• Recursive relationships: The same entity type participates more than once in a relationship
type, in different roles.
• Cardinality ratio for a binary relationship: Specifies the maximum number of relationship
instances that an entity can participate in.
• Participation constraint: Specifies whether the existence of an entity depends on its being
related to another entity via the relationship type.
Types: total and partial.
• Attributes of Relationship Types

Attributes of 1:1 or 1:N relationship types can be migrated to one of the participating entity types.
For a 1:N relationship type, a relationship attribute can be migrated only to the
entity type on the N-side of the relationship.
For M:N relationship types, some attributes may be determined by the combination of
participating entities, not by any single entity; such attributes must be specified as
relationship attributes.
• Weak Entity Types
Weak entity types do not have key attributes of their own.
They are identified by being related to specific entities from another entity type (the owner).
• Regular entity types that do have a key attribute are called strong entity types.
• Identifying relationship of the weak entity type: The relationship type that relates a
weak entity type to its owner.
Summary of the notation for ER diagram:
Fig 9.4 ER Design for the COMPANY Database
10. Functional dependencies
1. Suppose the whole database is described by a single universal relation schema R = {A1, A2, ...,
An}.
2. Definition: A functional dependency, denoted by X → Y, between two sets of attributes X and Y that
are subsets of R specifies a constraint on the possible tuples that can form a relation state
r of R.
3. The constraint is that, for any two tuples t1 and t2 in r that have t1[X] = t2[X], they must
also have t1[Y] = t2[Y].
4. The values of the Y component of a tuple in r depend on, or are determined by, the values
of the X component.
5. The values of the X component of a tuple uniquely (or functionally) determine the values
of the Y component.
6. We also say that there is a functional dependency (FD or f.d.) from X to Y, or that Y is
functionally dependent on X.
7. X functionally determines Y in a relation schema R if, and only if, whenever two tuples of
r(R) agree on their X-value, they must necessarily agree on their Y-value. Note the
following:
• If a constraint on R states that there cannot be more than one tuple with a given X-value
in any relation instance r(R), that is, if X is a candidate key of R, then X → Y for any
subset of attributes Y of R.
• If X → Y in R, this does not say whether or not Y → X in R.
8. A functional dependency is a property of the semantics or meaning of the attributes.
9. Whenever the semantics of two sets of attributes in R indicate that a functional
dependency should hold, specify the dependency as a constraint.
10. Relation extensions r(R) that satisfy the functional dependency constraints are called
legal relation states (or legal extensions) of R.

Fig10.1. Relation schemas EMP_PROJ.


11. Consider the relation schema EMP_PROJ in Fig10.1; from the semantics of the attributes
and the relation, the following functional dependencies should hold:
Ssn→Ename
Pnumber→{Pname,Plocation}
{Ssn, Pnumber}→Hours
12. These functional dependencies specify that:
• The value of an employee’s Social Security number (Ssn) uniquely determines the
employee name (Ename).
• The value of a project’s number (Pnumber) uniquely determines the project
name (Pname) and location (Plocation).
• A combination of Ssn and Pnumber values uniquely determines the number of
hours the employee currently works on the project per week (Hours).
• Alternatively, we say that Ename is functionally determined by (or functionally dependent on)
Ssn.
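The definition in items 3 and 7 can be checked mechanically against a relation state. This Python sketch assumes a state is a list of dictionaries; `fd_holds` and the EMP_PROJ rows are hypothetical.

```python
def fd_holds(rows, lhs, rhs):
    """Check X -> Y in one relation state: tuples that agree on X must agree on Y.

    rows: list of dicts mapping attribute name -> value.
    """
    y_for_x = {}
    for t in rows:
        x_val = tuple(t[a] for a in lhs)
        y_val = tuple(t[a] for a in rhs)
        # The first tuple seen with this X-value fixes the Y-value all others must match.
        if y_for_x.setdefault(x_val, y_val) != y_val:
            return False
    return True


# Hypothetical legal state of EMP_PROJ.
emp_proj = [
    {"Ssn": "123", "Pnumber": 1, "Hours": 32.5, "Ename": "Smith", "Pname": "ProductX", "Plocation": "Bellaire"},
    {"Ssn": "123", "Pnumber": 2, "Hours": 7.5, "Ename": "Smith", "Pname": "ProductY", "Plocation": "Sugarland"},
    {"Ssn": "456", "Pnumber": 1, "Hours": 20.0, "Ename": "Wong", "Pname": "ProductX", "Plocation": "Bellaire"},
]

print(fd_holds(emp_proj, ["Ssn"], ["Ename"]))    # True
print(fd_holds(emp_proj, ["Ssn"], ["Hours"]))    # False: Ssn 123 has two Hours values
```

A check like this can only refute an FD for one state. Because a functional dependency is a property of the semantics of the attributes (item 8 of the list above), a state that happens to satisfy X → Y does not prove that the dependency holds for the schema.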
10.1 Normal Forms Based on Primary Keys
10.1.1 Normalization of Relations:
The normalization process, as first proposed by Codd (1972a), takes a relation schema
through a series of tests to certify whether it satisfies a certain normal form.
10.1.2 Normalization of data:
1. It can be considered a process of analyzing the given relation schemas based on their FDs
and primary keys to achieve the desirable properties of (1) minimizing redundancy and
(2) minimizing the insertion, deletion, and update anomalies.
2. Unsatisfactory relation schemas that do not meet the normal form tests are decomposed into
smaller relation schemas that meet the tests and hence possess the desirable properties.
3. Definition: The normal form of a relation refers to the highest normal form condition
that it meets, and hence indicates the degree to which it has been normalized.
4. Normalization must also confirm the existence of two additional properties of the decomposition:
a. The nonadditive join or lossless join property, which guarantees that the spurious tuple
generation problem does not occur with respect to the relation schemas created after
decomposition.
b. The dependency preservation property, which ensures that each functional dependency
is represented in some individual relation resulting after decomposition.
10.1.3 Denormalization:
It is the process of storing the join of higher normal form relations as a base relation,
which is in a lower normal form.
10.1.4 Definitions of Keys and Attributes Participating in Keys
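1. A superkey of R is a set of attributes S such that no two distinct tuples in any legal relation
state r(R) have t1[S] = t2[S]. A key K is a superkey with the additional property that removal of
any attribute from K will cause K not to be a superkey any more.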
2. If a relation schema has more than one key, each is called a candidate key.
3. One of the candidate keys is arbitrarily designated to be the primary key, and the others
are called secondary keys.
4. An attribute of relation schema R is called a prime attribute of R if it is a member of
some candidate key of R.
5. An attribute is called nonprime if it is not a prime attribute, that is, if it is not a member
of any candidate key.
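Whether a set of attributes is a superkey can be tested with the standard attribute-closure algorithm (X+, the set of all attributes determined by X), a technique these notes do not spell out. The Python sketch below uses it to find the candidate keys and prime attributes of EMP_PROJ under the dependencies listed earlier; the function names are hypothetical, and the brute-force search over subsets is only practical for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+: every attribute functionally determined by X under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If X+ already contains the left side, it must also contain the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    return closure(attrs, fds) >= set(schema)


def candidate_keys(schema, fds):
    """Minimal superkeys, found by brute force from the smallest subsets upward."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            c = set(combo)
            if any(k <= c for k in keys):
                continue                      # contains a smaller key, so not minimal
            if is_superkey(c, schema, fds):
                keys.append(c)
    return keys


emp_proj = {"Ssn", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}
fds = [({"Ssn"}, {"Ename"}),
       ({"Pnumber"}, {"Pname", "Plocation"}),
       ({"Ssn", "Pnumber"}, {"Hours"})]

keys = candidate_keys(emp_proj, fds)
prime = set().union(*keys)
print([sorted(k) for k in keys])   # [['Pnumber', 'Ssn']]
print(sorted(emp_proj - prime))    # ['Ename', 'Hours', 'Plocation', 'Pname']  (nonprime)
```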
10.2 First Normal Form
It states that the domain of an attribute must include only atomic (simple, indivisible)
values and that the value of any attribute in a tuple must be a single value from the
domain of that attribute.
It disallows having a set of values, a tuple of values, or a combination of both as an
attribute value for a single tuple.

Fig10.2. A relation schema that is not in 1NF

Fig10.3. Sample state of relation DEPARTMENT
Fig10.4. 1NF version of the same relation with redundancy
The relation in Fig10.2 is not in 1NF because Dlocations is not an atomic attribute.
There are three main techniques to achieve first normal form:
First technique:
1. Remove the attribute Dlocations and place it in a separate relation
DEPT_LOCATIONS, along with the primary key Dnumber.
2. The primary key of this relation is the combination {Dnumber, Dlocation}.
3. A distinct tuple in DEPT_LOCATIONS exists for each location of a department.
4. This decomposes the non-1NF relation into two 1NF relations.
Second technique:
1. Expand the key so that there will be a separate tuple in the original
DEPARTMENT relation for each location of a department.
2. The primary key becomes the combination {Dnumber, Dlocation}.
3. Disadvantage: it introduces redundancy in the relation.
Third technique:
1. If a maximum number of values is known for the attribute (for example, if it is
known that at most three locations can exist for a department), replace the
Dlocations attribute by three atomic attributes: Dlocation1, Dlocation2, and
Dlocation3.
2. Disadvantage: it introduces NULL values if most departments have fewer than
three locations.
The first solution is considered best because it does not suffer from redundancy and it is
completely general, having no limit placed on a maximum number of values.
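The first technique can be sketched in Python as follows, assuming a non-1NF row stores Dlocations as a Python set; `split_multivalued` and the DEPARTMENT rows are hypothetical, and only Dname and Dnumber are kept besides Dlocations.

```python
def split_multivalued(rows, key, mv_attr):
    """First 1NF technique: move a multivalued attribute into its own relation.

    Returns (base, detail): base keeps every other attribute of each row;
    detail holds one (key value, single value) tuple per element of the set.
    """
    base, detail = [], set()
    for row in rows:
        base.append({a: v for a, v in row.items() if a != mv_attr})
        for value in row[mv_attr]:
            detail.add((row[key], value))
    return base, detail


# Hypothetical non-1NF DEPARTMENT state: Dlocations holds a set of values.
department = [
    {"Dname": "Research", "Dnumber": 5, "Dlocations": {"Bellaire", "Sugarland", "Houston"}},
    {"Dname": "Administration", "Dnumber": 4, "Dlocations": {"Stafford"}},
    {"Dname": "Headquarters", "Dnumber": 1, "Dlocations": {"Houston"}},
]

dept, dept_locations = split_multivalued(department, "Dnumber", "Dlocations")
print(sorted(dept_locations))
# [(1, 'Houston'), (4, 'Stafford'), (5, 'Bellaire'), (5, 'Houston'), (5, 'Sugarland')]
```

Each (Dnumber, Dlocation) pair appears once, which is why {Dnumber, Dlocation} works as the primary key of DEPT_LOCATIONS, and no limit is placed on how many locations a department may have.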
10.3 Second Normal Form
1. It is based on the concept of full functional dependency.
2. A functional dependency X → Y is a full functional dependency if removal of any
attribute A from X means that the dependency does not hold any more.
3. A functional dependency X → Y is a partial dependency if some attribute A ∈ X can be
removed from X and the dependency still holds.
4. In the relation schema EMP_PROJ of Fig10.5, {Ssn, Pnumber} → Hours is a full dependency
(neither Ssn → Hours nor Pnumber → Hours holds).
5. However, the dependency {Ssn, Pnumber} → Ename is partial because Ssn → Ename
holds.

Fig10.5 Relation schema EMP_PROJ


6. The EMP_PROJ relation is in 1NF but is not in 2NF.
7. The functional dependencies FD2 and FD3 make Ename, Pname, and Plocation partially
dependent on the primary key {Ssn, Pnumber} of EMP_PROJ.
8. If a relation schema is not in 2NF, it can be second normalized or 2NF normalized
into a number of 2NF relations.
9. In the resulting 2NF relations, nonprime attributes are associated only with the part of the
primary key on which they are fully functionally dependent.
10. The functional dependencies FD1, FD2, and FD3 lead to the decomposition of
EMP_PROJ into the three relation schemas EP1, EP2, and EP3 shown in Fig10.6, each of
which is in 2NF.

Fig10.6 . Normalizing EMP_PROJ into 2NF relations
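The 2NF test and decomposition can be sketched with attribute closure: for each nonprime attribute, look for a proper part of the primary key that already determines it. This Python sketch assumes the primary key is the only candidate key (so every other attribute is nonprime); the helper names are hypothetical, and the FDs are FD1 to FD3 of EMP_PROJ.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(schema, fds, primary_key):
    """Map each nonprime attribute to the smallest proper part of the key that determines it."""
    nonprime = set(schema) - set(primary_key)
    found = {}
    for size in range(1, len(primary_key)):           # proper subsets only
        for part in combinations(sorted(primary_key), size):
            for attr in closure(part, fds) & nonprime:
                found.setdefault(attr, frozenset(part))
    return found


def decompose_2nf(schema, fds, primary_key):
    """Group partially dependent attributes with their key part; the rest stay with the full key."""
    partial = partial_dependencies(schema, fds, primary_key)
    groups = {}
    for attr, part in partial.items():
        groups.setdefault(part, set()).add(attr)
    relations = [set(part) | attrs for part, attrs in groups.items()]
    relations.append(set(schema) - set(partial))
    return relations


emp_proj = {"Ssn", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}
fds = [({"Ssn", "Pnumber"}, {"Hours"}),           # FD1
       ({"Ssn"}, {"Ename"}),                      # FD2
       ({"Pnumber"}, {"Pname", "Plocation"})]     # FD3

for rel in decompose_2nf(emp_proj, fds, {"Ssn", "Pnumber"}):
    print(sorted(rel))
```

The three resulting schemas correspond to EP1 {Ssn, Pnumber, Hours}, EP2 {Ssn, Ename} and EP3 {Pnumber, Pname, Plocation}. Hours is never reported as partial because it needs the whole key.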


10.4 Third Normal Form
1. It is based on the concept of transitive dependency.
2. A functional dependency X → Y in a relation schema R is a transitive dependency if
there exists a set of attributes Z in R that is neither a candidate key nor a subset of any
key of R, and both X → Z and Z → Y hold.
3. The dependency Ssn → Dmgr_ssn is transitive through Dnumber in EMP_DEPT (Fig10.7),
because both the dependencies Ssn → Dnumber and Dnumber → Dmgr_ssn hold and
Dnumber is neither a key itself nor a subset of the key of EMP_DEPT.

Fig10.7 . Relation schema EMP_DEPT


Definition: A relation schema R is in 3NF if it satisfies 2NF and no nonprime attribute of
R is transitively dependent on the primary key.
The relation schema EMP_DEPT is in 2NF but not in 3NF because of the transitive
dependency.
EMP_DEPT is normalized by decomposing it into the two 3NF relation schemas ED1
and ED2.
Fig10.8. Normalizing EMP_DEPT into 3NF relations
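The sketch below applies the general form of the 3NF test to the listed FDs: a dependency X → A violates 3NF when X is not a superkey and A is nonprime. In this example it flags the same Dnumber dependencies that make Ssn → Dmgr_ssn transitive. It assumes EMP_DEPT has the attributes Ename, Ssn, Bdate, Address, Dnumber, Dname and Dmgr_ssn (only Ssn, Dnumber and Dmgr_ssn are named in the text), checks only the FDs it is given rather than every implied one, and uses hypothetical helper names.

```python
def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def third_nf_violations(schema, fds, keys):
    """Listed FDs X -> A where X is not a superkey and A is nonprime (general 3NF test)."""
    prime = set().union(*keys)
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(schema):
            continue                                   # X is a superkey: allowed
        for attr in sorted(set(rhs) - set(lhs) - prime):
            violations.append((frozenset(lhs), attr))
    return violations


def split_on(schema, lhs, rhs):
    """Decompose R on X -> Y into R - (Y - X) and X | Y (here ED1 and ED2)."""
    return set(schema) - (set(rhs) - set(lhs)), set(lhs) | set(rhs)


emp_dept = {"Ename", "Ssn", "Bdate", "Address", "Dnumber", "Dname", "Dmgr_ssn"}
fds = [({"Ssn"}, {"Ename", "Bdate", "Address", "Dnumber"}),
       ({"Dnumber"}, {"Dname", "Dmgr_ssn"})]

print(third_nf_violations(emp_dept, fds, keys=[{"Ssn"}]))
ed1, ed2 = split_on(emp_dept, {"Dnumber"}, {"Dname", "Dmgr_ssn"})
print(sorted(ed1), sorted(ed2))
```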
10.5 Boyce-Codd Normal Form
Definition: A relation schema R is in BCNF if whenever a nontrivial functional
dependency X → A holds in R, then X is a superkey of R.
1. Example: Consider a relation TEACH with the following dependencies:
FD1: {Student, Course} → Instructor
FD2: Instructor → Course
2. Note that {Student, Course} is a candidate key for this relation and that the dependencies
shown follow the pattern in Fig10.9, with Student as A, Course as B, and Instructor as C.

Fig10.9. A schematic relation with FDs; it is in 3NF, but not in BCNF

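3. Hence this relation is in 3NF but not in BCNF: the left-hand side of FD2 (Instructor) is not a
superkey, so BCNF is violated, while Course is a prime attribute (it belongs to the candidate key
{Student, Course}), so no nonprime attribute is transitively dependent on the key and 3NF holds.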


4. Decomposition of this relation schema into two schemas is not straightforward, because it
may be decomposed into one of the following three possible pairs:
(1) {Student, Instructor} and {Student, Course}
(2) {Course, Instructor} and {Course, Student}
(3) {Instructor, Course} and {Instructor, Student}
5. All three decompositions lose the functional dependency FD1. The desirable
decomposition among those just shown is (3), because it will not generate spurious tuples
after a join.
6. A relation not in BCNF should be decomposed so as to meet this property, while possibly
forgoing the preservation of all functional dependencies in the decomposed relations.
Nonadditive (lossless join) decomposition, however, is a must during normalization.
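The claims about TEACH can be checked in Python. The sketch below flags BCNF violations among the listed FDs and applies the nonadditive join test for binary decompositions: {R1, R2} is lossless if R1 ∩ R2 functionally determines R1 – R2 or R2 – R1. Function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Listed nontrivial FDs X -> Y whose left side X is not a superkey of the schema."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs) and not closure(lhs, fds) >= set(schema)]


def lossless_binary(r1, r2, fds):
    """Nonadditive join test: (R1 & R2) must determine R1 - R2 or R2 - R1."""
    common_plus = closure(r1 & r2, fds)
    return (r1 - r2) <= common_plus or (r2 - r1) <= common_plus


teach = {"Student", "Course", "Instructor"}
fds = [({"Student", "Course"}, {"Instructor"}),   # FD1
       ({"Instructor"}, {"Course"})]              # FD2

print(bcnf_violations(teach, fds))   # only FD2 is reported
pairs = [({"Student", "Instructor"}, {"Student", "Course"}),
         ({"Course", "Instructor"}, {"Course", "Student"}),
         ({"Instructor", "Course"}, {"Instructor", "Student"})]
for i, (r1, r2) in enumerate(pairs, start=1):
    print(i, lossless_binary(r1, r2, fds))
```

Only pair (3) passes, matching item 5: its common attribute Instructor determines Course through FD2.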
10.6 Formal definition of multivalued dependencies (MVD):
The MVD X →→ Y is said to hold for R(X, Y, Z), where Z = R – (X ∪ Y), if, whenever t1 and t2
are two tuples in R that have the same values for the attributes X, that is, t1[X] = t2[X],
then R also contains tuples t3 and t4 such that:
t3[X] = t4[X] = t1[X] = t2[X]
t3[Y] = t1[Y] and t4[Y] = t2[Y]
t3[Z] = t2[Z] and t4[Z] = t1[Z]
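The definition can be turned into a direct check on one relation state. In this sketch (hypothetical `mvd_holds`, state given as a set of tuples), the loop builds t3 for every ordered pair of tuples that agree on X; t4 is the t3 of the reversed pair, so it is covered automatically. The EMP state is made up in the style of Fig10.10.

```python
def mvd_holds(attrs, rows, x, y):
    """Check X ->> Y in one state of R(X, Y, Z), given as a set of tuples over `attrs`."""
    pos = {a: i for i, a in enumerate(attrs)}
    z = {a for a in attrs if a not in x and a not in y}

    def combine(t_xy, t_z):
        # Take X and Y values from t_xy and Z values from t_z.
        return tuple(t_z[pos[a]] if a in z else t_xy[pos[a]] for a in attrs)

    for t1 in rows:
        for t2 in rows:
            if all(t1[pos[a]] == t2[pos[a]] for a in x):
                # t3 of the definition; t4 is produced when the loop reaches (t2, t1).
                if combine(t1, t2) not in rows:
                    return False
    return True


attrs = ("Ename", "Pname", "Dname")
emp = {("Smith", "X", "John"), ("Smith", "Y", "Anna"),
       ("Smith", "X", "Anna"), ("Smith", "Y", "John")}

print(mvd_holds(attrs, emp, {"Ename"}, {"Pname"}))                              # True
print(mvd_holds(attrs, emp - {("Smith", "Y", "John")}, {"Ename"}, {"Pname"}))   # False
```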
10.6.1 Fourth Normal Form
A relation schema R is in 4NF with respect to a set of dependencies F (functional and multivalued)
if, for every nontrivial multivalued dependency X →→ Y in F+ (the set of all dependencies implied
by F), X is a superkey for R.
Consider the EMP relation in Fig10.10. EMP is not in 4NF because the MVDs Ename →→ Pname and
Ename →→ Dname are nontrivial and Ename is not a superkey of EMP.

Fig10.10. The EMP relation with two MVDs: Ename →→ Pname and Ename →→ Dname
Decompose EMP into EMP_PROJECTS and EMP_DEPENDENTS, shown in Fig10.11.
Both EMP_PROJECTS and EMP_DEPENDENTS are in 4NF, because the MVDs
Ename →→ Pname in EMP_PROJECTS and Ename →→ Dname in
EMP_DEPENDENTS are trivial MVDs.
No other nontrivial MVDs hold in either EMP_PROJECTS or EMP_DEPENDENTS. No
FDs hold in these relation schemas either.

Fig10.11. Decomposing the EMP relation into two 4NF relations EMP_PROJECTS
and EMP_DEPENDENTS
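The 4NF argument can be sketched in Python. It assumes, as stated above, that no FDs hold in these schemas, so X is a superkey only if it contains every attribute; `is_trivial_mvd` and `fourth_nf_violations` are hypothetical helpers that check only the MVDs they are given.

```python
def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_trivial_mvd(schema, x, y):
    """X ->> Y is trivial when Y is a subset of X or X together with Y covers the schema."""
    return set(y) <= set(x) or set(x) | set(y) == set(schema)


def fourth_nf_violations(schema, mvds, fds=()):
    """Listed nontrivial MVDs X ->> Y whose left side X is not a superkey."""
    return [(x, y) for x, y in mvds
            if not is_trivial_mvd(schema, x, y) and not closure(x, fds) >= set(schema)]


emp = {"Ename", "Pname", "Dname"}
mvds = [({"Ename"}, {"Pname"}), ({"Ename"}, {"Dname"})]
print(len(fourth_nf_violations(emp, mvds)))   # 2: EMP is not in 4NF

emp_projects = {"Ename", "Pname"}
emp_dependents = {"Ename", "Dname"}
print(fourth_nf_violations(emp_projects, [({"Ename"}, {"Pname"})]))    # []
print(fourth_nf_violations(emp_dependents, [({"Ename"}, {"Dname"})]))  # []
```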
10.7 Join Dependencies
Let A, B, C, ... be subsets of the attributes of a relation R. Then R satisfies the join
dependency (JD) written *(A, B, C, ...) if and only if every possible legal state of R is
equal to the natural join of its projections on A, B, C, ...
10.7.1 Definition of 5NF:
A relation R is in 5NF (or project-join normal form, PJNF) if, for all join dependencies of
the form *(R1, R2, ..., Rn), where each Ri is a subset of the set of attributes of R and
R = R1 ∪ R2 ∪ ... ∪ Rn, at least one of the following holds:
• *(R1, R2, ..., Rn) is a trivial join dependency (that is, one of the Ri is R).
• Every Ri is a superkey for R.
Example:
Department    Subject   Student
Comp. Sc.     CP1000    John Smith
Mathematics   MA1000    John Smith
Comp. Sc.     CP2000    Arun Kumar
Comp. Sc.     CP3000    Reena Rani
Physics       PH1000    Raymond Chew
Chemistry     CH2000    Albert Garcia
1. The above relation says that Comp. Sc. offers subjects CP1000, CP2000 and CP3000 which are
taken by a variety of students.
2. No student takes all the subjects and no subject has all students enrolled in it and therefore all
three fields are needed to represent the information.
3. The above relation does not show MVDs since the attributes subject and student are not
independent; they are related to each other and the pairings have significant information in them.
4. The relation therefore cannot be decomposed into the two relations
(dept, subject) and (dept, student)
without losing some important information.
The relation can, however, be decomposed into the following three relations:
(dept, subject)
(dept, student)
(subject, student)
Now it can be shown that this decomposition is lossless.
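The join dependency behind this example can be checked on the sample state above: project the relation onto each component and natural-join the projections back. The sketch uses hypothetical helpers `project`, `natural_join` and `join_of_projections`, with relations as (attribute tuple, set of tuples) pairs.

```python
def project(attrs, rows, keep):
    """pi_keep(R) with set semantics (duplicates removed)."""
    idx = [attrs.index(a) for a in keep]
    return tuple(keep), {tuple(t[i] for i in idx) for t in rows}


def natural_join(r, s):
    """Natural join of two (attrs, rows) relations on their common attribute names."""
    r_attrs, r_rows = r
    s_attrs, s_rows = s
    common = [a for a in r_attrs if a in s_attrs]
    extra = [a for a in s_attrs if a not in common]
    out_rows = set()
    for tr in r_rows:
        for ts in s_rows:
            if all(tr[r_attrs.index(a)] == ts[s_attrs.index(a)] for a in common):
                out_rows.add(tr + tuple(ts[s_attrs.index(a)] for a in extra))
    return tuple(r_attrs) + tuple(extra), out_rows


def join_of_projections(attrs, rows, components):
    """Project R onto each component, then natural-join the projections back together."""
    result = project(attrs, rows, components[0])
    for comp in components[1:]:
        result = natural_join(result, project(attrs, rows, comp))
    return result


attrs = ("Department", "Subject", "Student")
rows = {
    ("Comp. Sc.", "CP1000", "John Smith"),
    ("Mathematics", "MA1000", "John Smith"),
    ("Comp. Sc.", "CP2000", "Arun Kumar"),
    ("Comp. Sc.", "CP3000", "Reena Rani"),
    ("Physics", "PH1000", "Raymond Chew"),
    ("Chemistry", "CH2000", "Albert Garcia"),
}

two = [("Department", "Subject"), ("Department", "Student")]
three = two + [("Subject", "Student")]
_, two_way = join_of_projections(attrs, rows, two)
_, three_way = join_of_projections(attrs, rows, three)
print(len(two_way), len(two_way - rows))   # 12 6
print(three_way == rows)                   # True
```

The two-way join adds six spurious tuples (for example Comp. Sc., CP1000, Arun Kumar), while the three-way join returns exactly the original six tuples. This confirms the decomposition only for this state; like an FD, a join dependency is a constraint that must hold for every legal state.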
