
Normalization

Pearson Education Limited 1995, 2005

Objectives

What normalization is and what purpose it serves
What update anomalies are
How relations in lower normal forms can be transformed into higher normal forms: 1NF, 2NF and 3NF

Purpose of Normalization

Normalization is a technique for analyzing and correcting table structures in order to produce a set of suitable relations that support the data requirements of an enterprise.

Result: a set of relations with minimized data redundancies

Characteristics of a suitable set of relations include:

the minimal number of attributes necessary to support the data requirements of the enterprise;
attributes with a close logical relationship are found in the same relation, i.e. each table represents a single subject;
minimal redundancy, with each attribute represented only once, with the important exception of attributes that form all or part of foreign keys, i.e. no data item is unnecessarily stored in more than one table;
all attributes in a table are dependent on the primary key.

Purpose of Normalization

The benefits of using a database that has a suitable set of relations are that the database will:

be easier for the user to access and maintain, reducing the opportunities for data inconsistencies;
take up minimal storage space on the computer.

How Normalization Supports Database Design

Data Redundancy and Update Anomalies

The major aim of relational database design (i.e. normalization) is to group attributes into relations so as to minimize data redundancy.
Problems associated with data redundancy are illustrated by comparing the Staff and Branch relations with the StaffBranch relation.

Data Redundancy and Update Anomalies

[Figure: Design 1, the separate Staff and Branch relations]

[Figure: Design 2, the single StaffBranch relation]

Data Redundancy and Update Anomalies

The StaffBranch relation has redundant data: the details of a branch are repeated for every member of staff located at that branch. (Refer to Design 2.)
In contrast, the branch information (bAddress) appears only once for each branch in the Branch relation, and only the branch number (branchNo) is repeated in the Staff relation, to represent where each member of staff is located. (Refer to Design 1.)
Relations that contain redundant information may potentially suffer from update anomalies.
Types of update anomalies include:
Insertion
Deletion
Modification

Insertion Anomalies Examples

If Design 2 is used:
To enter the details of a new member of staff at branch no. B007, the correct details of branch no. B007 must be entered as well, so that they are consistent with the values for branch no. B007 in other tuples. The relations in Design 1 do not suffer from this potential inconsistency.
To insert a new branch that has no members of staff yet, the staff attributes would have to contain nulls. This can violate the primary key requirement, because a primary key may not be null.

Deletion Anomalies Example

If Design 2 is used and we delete the tuple that represents the last member of staff located at a branch (branchNo = B007), the details of that branch are lost as well. With the Staff and Branch relations of Design 1, the branch details remain in the Branch relation.

Modification Anomalies Example

If Design 2 is used and the value of a branch attribute is to be changed, for example to bAddress = 22 Deer Rd, London, then every other tuple with the same bAddress must also be updated.
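
The modification and deletion anomalies can be reproduced with a few in-memory rows. The following is a minimal sketch only: the staff numbers, the second branch and all addresses other than "22 Deer Rd, London" are invented for illustration, and `addresses_per_branch` is a hypothetical helper, not something defined in these slides.

```python
# Design 2: one StaffBranch relation. Sample values are invented.
staff_branch = [
    {"staffNo": "S1", "branchNo": "B007", "bAddress": "16 Argyll St"},
    {"staffNo": "S2", "branchNo": "B007", "bAddress": "16 Argyll St"},
    {"staffNo": "S3", "branchNo": "B005", "bAddress": "22 Deer Rd, London"},
]


def addresses_per_branch(rows):
    """Map each branchNo to the set of addresses recorded for it."""
    found = {}
    for row in rows:
        found.setdefault(row["branchNo"], set()).add(row["bAddress"])
    return found


# Modification anomaly: only one of the two B007 tuples is updated,
# so the relation now holds two different addresses for the same branch.
staff_branch[0]["bAddress"] = "1 New St"
inconsistent = sorted(
    b for b, addrs in addresses_per_branch(staff_branch).items() if len(addrs) > 1
)

# Deletion anomaly: deleting the last member of staff at B005
# also removes the only record of that branch's address.
after_delete = [r for r in staff_branch if r["staffNo"] != "S3"]
b005_lost = "B005" not in addresses_per_branch(after_delete)

# Design 1: separate Staff and Branch relations.
staff = [
    {"staffNo": "S1", "branchNo": "B007"},
    {"staffNo": "S2", "branchNo": "B007"},
    {"staffNo": "S3", "branchNo": "B005"},
]
branch = {"B007": "16 Argyll St", "B005": "22 Deer Rd, London"}

branch["B007"] = "1 New St"                         # one update, nothing to miss
staff = [r for r in staff if r["staffNo"] != "S3"]  # B005 details stay in branch

print(inconsistent, b005_lost, branch["B005"])
```

In Design 1 the address lives in exactly one place, so neither anomaly can occur.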


The Need for Normalization

Example: a company that manages building projects

Each project has its own project number, name, and employees assigned to it
Each employee has an employee number, name and job classification
The company charges its clients by billing the hours spent on each contract
The hourly billing rate depends on the employee's position
Periodically, a report is generated that contains the information displayed in Table 5.1

The Need for Normalization (continued)

The structure of the data set in the previous figure does not handle data very well. It is subject to:

Modification anomalies
Insertion anomalies
Deletion anomalies

The Normalization Process

Works through a series of stages called normal forms:

Unnormalized form (UNF): a table that contains one or more repeating groups
First normal form (1NF): table format; no repeating groups
Second normal form (2NF): 1NF and no partial dependencies
Third normal form (3NF): 2NF and no transitive dependencies

The Process of Normalization

Conversion to First Normal Form

Repeating group

Derives its name from the fact that a group of multiple entries of the same type can exist for any single key attribute occurrence
Example: PROJ_NUM = 15 has 5 entries that are related because they all share the PROJ_NUM = 15 characteristic

A relational table must not contain repeating groups, because they reflect data redundancies
Normalizing the table structure will reduce data redundancies

Conversion to 1NF (continued)

Step 1: Eliminate the Repeating Groups

Present the data in tabular format, where each cell has a single value and there are no repeating groups
To eliminate the repeating groups, eliminate the nulls by making sure that each repeating group attribute contains an appropriate data value
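
Step 1 can be sketched as a small flattening routine. This is an illustrative sketch under assumptions: the nested record layout, the field name `assignments` and all sample values are invented, and `to_1nf` is a hypothetical helper rather than part of the original material.

```python
# Hypothetical unnormalized records: every project carries a repeating
# group of assignments. All values are invented for illustration.
unnormalized = [
    {"proj_num": 15, "proj_name": "Alpha",
     "assignments": [{"emp_num": 101, "hours": 12.0},
                     {"emp_num": 105, "hours": 7.5}]},
    {"proj_num": 18, "proj_name": "Beta",
     "assignments": [{"emp_num": 105, "hours": 20.0}]},
]


def to_1nf(records, group_field):
    """Return one flat row per entry of the repeating group."""
    rows = []
    for record in records:
        parent = {k: v for k, v in record.items() if k != group_field}
        for entry in record[group_field]:
            rows.append({**parent, **entry})  # copy parent values into each row
    return rows


rows_1nf = to_1nf(unnormalized, "assignments")
for row in rows_1nf:
    print(row)
```

Every resulting row holds a single value per cell, and the project values are repeated on each row instead of being left empty.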


Conversion to 1NF (continued)

Step 2: Identify All Dependencies

Definition: a functional dependency occurs when one attribute in a relation uniquely determines another attribute. This can be written A → B, which is the same as stating "B is functionally dependent upon A" or "A determines B".
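
The definition can be tested against sample data: A → B is violated as soon as two rows agree on A but differ on B. The sketch below assumes rows stored as dictionaries; `fd_holds` is a hypothetical helper and the sample rows are invented. Note that data can only refute a dependency. Whether it truly holds is a statement about the business rules, not about one sample.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for this lhs value and
        # report a violation if a later row disagrees with it.
        if seen.setdefault(left, right) != right:
            return False
    return True


# Invented sample rows in the shape of the project/employee example.
rows = [
    {"proj_num": 15, "emp_num": 101, "emp_name": "Lee", "hours": 12.0},
    {"proj_num": 15, "emp_num": 105, "emp_name": "Kim", "hours": 7.5},
    {"proj_num": 18, "emp_num": 105, "emp_name": "Kim", "hours": 20.0},
]

print(fd_holds(rows, ["emp_num"], ["emp_name"]))           # emp_num -> emp_name
print(fd_holds(rows, ["proj_num"], ["emp_num"]))           # proj_num -> emp_num
print(fd_holds(rows, ["proj_num", "emp_num"], ["hours"]))  # composite determinant
```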

Conversion to 1NF (continued)

Dependencies can be depicted with the help of a diagram (or with the dependency notation A → B).
Dependency diagram:
Depicts all dependencies found within a given table structure
Helpful in getting a bird's-eye view of all relationships among a table's attributes
Makes it less likely that an important dependency will be overlooked

Conversion to 1NF (continued)

Partial dependency: a dependency that is based on only part of a composite primary key
Transitive dependency: a dependency of one non-prime attribute on another non-prime attribute
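
The two definitions can be turned into a simple classification rule based on where the determinant sits relative to the primary key. This is a simplified sketch: the dependency list is read off the relations that this deck derives for the project example, while `classify` and its labels are hypothetical names chosen for illustration.

```python
# Dependencies of the project example as (determinant, dependents) pairs.
PRIMARY_KEY = frozenset({"proj_num", "emp_num"})
DEPENDENCIES = [
    ({"proj_num", "emp_num"}, {"hours"}),
    ({"proj_num"}, {"proj_name"}),
    ({"emp_num"}, {"emp_name", "job_class", "chg_hr"}),
    ({"job_class"}, {"chg_hr"}),
]


def classify(determinant, key=PRIMARY_KEY):
    """Label a dependency by how its determinant relates to the primary key."""
    det = frozenset(determinant)
    if det == key:
        return "full"
    if det < key:                # proper subset of a composite key
        return "partial"
    if det.isdisjoint(key):      # determinant is made of non-prime attributes
        return "transitive"
    return "other"


labels = {tuple(sorted(det)): classify(det) for det, _ in DEPENDENCIES}
for det, label in labels.items():
    print(det, "->", label)
```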

Conversion to 1NF (continued)

Step 3: Identify the Primary Key

The primary key must uniquely identify the attribute values. In other words, if a value of the key is given, only one answer can be returned for each of the other attributes.
For example, PROJ_NUM in the sample schema cannot be a primary key, because PROJ_NUM = 15 can identify any one of 5 employees. PROJ_NUM alone is therefore not enough to be used as a primary key.

Conversion to 1NF (continued)

The primary key can be determined from the functional dependencies identified earlier.
It can be a single attribute, i.e. a determinant that uniquely determines all attributes, or
a composition of several determinants that together cover all attributes. [Note: if there are a few possibilities, choose the ones with the biggest scope]

Conversion to 1NF (continued)

The primary key is the combination of proj_num and emp_num.
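
One way to check this choice is to compute the attribute closure: starting from a candidate key, keep adding every attribute that the known dependencies determine, and see whether all attributes are reached. The sketch below assumes the dependencies of the project example; `closure` and `determines_all` are hypothetical helper names.

```python
ALL_ATTRIBUTES = {"proj_num", "proj_name", "emp_num", "emp_name",
                  "job_class", "chg_hr", "hours"}
DEPENDENCIES = [
    ({"proj_num"}, {"proj_name"}),
    ({"emp_num"}, {"emp_name", "job_class", "chg_hr"}),
    ({"proj_num", "emp_num"}, {"hours"}),
]


def closure(attributes, dependencies):
    """Return every attribute determined by the given attributes."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for determinant, dependents in dependencies:
            # Fire a dependency once its whole determinant is available.
            if determinant <= result and not dependents <= result:
                result |= dependents
                changed = True
    return result


def determines_all(attributes):
    return closure(attributes, DEPENDENCIES) == ALL_ATTRIBUTES


print(determines_all({"proj_num"}))
print(determines_all({"emp_num"}))
print(determines_all({"proj_num", "emp_num"}))
```

Neither attribute alone reaches every attribute, but the pair does, which is what makes the combination a suitable primary key.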

Result of 1NF

The result of the 1NF normalization process is one relation with all attributes listed. The primary/composite key is underlined.
Example:
EMPLOYEE_PROJECT (proj_num, proj_name, emp_num, emp_name, job_class, chg_hr, hours)

Here EMPLOYEE_PROJECT is the relation name, proj_num and emp_num form the primary/composite key, and the list in parentheses contains all attributes.

1NF

In first normal form:

All key attributes are defined
There are no repeating groups in the table, that is, each row/column intersection contains one and only one value, not a set of values
All attributes are dependent on the primary key
Problem: the 1NF table structure contains partial dependencies
Sometimes used for performance reasons, but should be used with caution
Still subject to data redundancies. Example: what happens if EMP_NUM = 105 changes JOB_CLASS?

Conversion to Second Normal Form

Relational database design can be improved by converting the database into 2NF.
2NF removes partial dependencies.
If a 1NF relation has a single attribute as its primary key, then the relation is automatically in 2NF as well.
Partial dependencies can only occur when a composite key exists. If the key has more than one attribute, some attributes may depend on only a portion of the key.

Conversion to 2NF (continued)

Step 1: Write Each Key Component on a Separate Line

Write each key component on a separate line, then write the original (composite) key on the last line:
PROJ_NUM
EMP_NUM
PROJ_NUM EMP_NUM

Each component will become the key in a new table/relation.
NOTE: If the key has 2 attributes (A, B), then the possible components are A, B and AB. If it has 3 attributes (A, B, C), then they are A, B, C, AB, AC, BC and ABC.
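
The components listed in the note are the non-empty subsets of the key. A short sketch using the standard library's `itertools.combinations` enumerates them; `key_components` is a hypothetical helper name.

```python
from itertools import combinations


def key_components(key_attributes):
    """List every non-empty subset of the key, smallest subsets first."""
    components = []
    for size in range(1, len(key_attributes) + 1):
        for subset in combinations(key_attributes, size):
            components.append(subset)
    return components


print(key_components(["A", "B"]))
print(key_components(["A", "B", "C"]))
print(key_components(["PROJ_NUM", "EMP_NUM"]))
```

A key with n attributes has 2^n - 1 such components, so the list grows quickly for wide composite keys.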

Conversion to 2NF (continued)

Step 2: Assign Corresponding Dependent Attributes

Determine which attributes are dependent on which key components:
(PROJ_NUM, PROJ_NAME)
(EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
(PROJ_NUM, EMP_NUM, HOURS)

At this point, most anomalies have been eliminated:
PROJECT(PROJ_NUM, PROJ_NAME)
EMPLOYEE(EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
ASSIGNMENT(PROJ_NUM, EMP_NUM, HOURS)
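
On actual rows, this decomposition amounts to projecting the 1NF table onto each attribute group and dropping duplicate tuples. The following is a minimal sketch with invented sample rows; `project` is a hypothetical helper that mimics relational projection.

```python
# Invented 1NF rows in the shape of the project example.
rows_1nf = [
    {"PROJ_NUM": 15, "PROJ_NAME": "Alpha", "EMP_NUM": 101, "EMP_NAME": "Lee",
     "JOB_CLASS": "Engineer", "CHG_HOUR": 80.0, "HOURS": 12.0},
    {"PROJ_NUM": 15, "PROJ_NAME": "Alpha", "EMP_NUM": 105, "EMP_NAME": "Kim",
     "JOB_CLASS": "Analyst", "CHG_HOUR": 95.0, "HOURS": 7.5},
    {"PROJ_NUM": 18, "PROJ_NAME": "Beta", "EMP_NUM": 105, "EMP_NAME": "Kim",
     "JOB_CLASS": "Analyst", "CHG_HOUR": 95.0, "HOURS": 20.0},
]


def project(rows, attributes):
    """Keep only the given attributes and drop duplicate tuples."""
    tuples = (tuple(row[a] for a in attributes) for row in rows)
    return list(dict.fromkeys(tuples))  # dict keys preserve first-seen order


PROJECT = project(rows_1nf, ["PROJ_NUM", "PROJ_NAME"])
EMPLOYEE = project(rows_1nf, ["EMP_NUM", "EMP_NAME", "JOB_CLASS", "CHG_HOUR"])
ASSIGNMENT = project(rows_1nf, ["PROJ_NUM", "EMP_NUM", "HOURS"])

print(PROJECT)
print(EMPLOYEE)
print(ASSIGNMENT)
```

Each project name and each employee's details now appear once, while ASSIGNMENT keeps one row per project/employee pair.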

Conversion to 2NF (continued)

If, at the end, there are relations that contain only key attributes (except for the relation that holds the whole composite key), those relations can be eliminated.
A relation is in second normal form (2NF) when it is in 1NF and includes no partial dependencies.

Another technique to convert to 2NF

Step 1: Write the original 1NF relation

EMPLOYEE_PROJECT (proj_num, proj_name, emp_num, emp_name, job_class, chg_hr, hours)

Step 2: For each partial dependency, create a new relation with the determinant as its key. In the 1NF relation, delete the dependents and mark the determinant as a foreign key (circled/italic/colored).

EMPLOYEE_PROJECT (proj_num, emp_num, emp_name, job_class, chg_hr, hours)
PROJECT (proj_num, proj_name)

Here proj_name moves to PROJECT, and proj_num stays in EMPLOYEE_PROJECT as a foreign key.
**Repeat for the other partial dependencies

Result of 2NF

The result of the 2NF normalization process is multiple relations. Each primary/composite key is underlined, and each foreign key is identified (circled/italic/colored).
Example:
EMPLOYEE_PROJECT (proj_num, emp_num, hours)
PROJECT (proj_num, proj_name)
EMPLOYEE (emp_num, emp_name, job_class, chg_hr)


Conversion to Third Normal Form

3NF removes transitive dependencies.
Step 1: Write the previous 2NF relations.
Step 2: For each transitive dependency (a non-key attribute that depends on another non-key attribute), create a new relation with the determinant as its key.
Step 3: In the original 2NF relation, delete the dependents. Make the determinant a foreign key (circled/italic/colored).

Result of 3NF

The result of the 3NF normalization process is multiple relations. Each primary/composite key is underlined, and each foreign key is identified (circled/italic/colored).
Example:
EMPLOYEE_PROJECT (proj_num, emp_num, hours)
PROJECT (proj_num, proj_name)
EMPLOYEE (emp_num, emp_name, job_class)
JOB (job_class, chg_hr)

Note the difference. In the 2NF step, the determinant left behind (e.g. proj_num in EMPLOYEE_PROJECT) is both a foreign key and part of the primary key. In the 3NF step, the determinant left behind (job_class in EMPLOYEE) is a foreign key only.
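
The 3NF relations above map directly onto table definitions. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the column types, the NOT NULL constraints and all sample data are assumptions made for illustration and are not part of the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE job (
    job_class TEXT PRIMARY KEY,
    chg_hr    REAL NOT NULL
);
CREATE TABLE project (
    proj_num  INTEGER PRIMARY KEY,
    proj_name TEXT NOT NULL
);
CREATE TABLE employee (
    emp_num   INTEGER PRIMARY KEY,
    emp_name  TEXT NOT NULL,
    job_class TEXT NOT NULL REFERENCES job (job_class)
);
CREATE TABLE employee_project (
    proj_num  INTEGER NOT NULL REFERENCES project (proj_num),
    emp_num   INTEGER NOT NULL REFERENCES employee (emp_num),
    hours     REAL NOT NULL,
    PRIMARY KEY (proj_num, emp_num)
);
""")

# Invented sample data.
conn.execute("INSERT INTO job VALUES ('Engineer', 80.0)")
conn.execute("INSERT INTO project VALUES (15, 'Alpha')")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(101, "Lee", "Engineer"), (105, "Kim", "Engineer")])
conn.executemany("INSERT INTO employee_project VALUES (?, ?, ?)",
                 [(15, 101, 12.0), (15, 105, 7.5)])

# The hourly rate is stored once, so one UPDATE changes it for every engineer.
conn.execute("UPDATE job SET chg_hr = 90.0 WHERE job_class = 'Engineer'")

charges = conn.execute("""
    SELECT e.emp_num, ep.hours * j.chg_hr
    FROM employee_project AS ep
    JOIN employee AS e ON e.emp_num = ep.emp_num
    JOIN job AS j ON j.job_class = e.job_class
    ORDER BY e.emp_num
""").fetchall()
print(charges)
```

The report that the unnormalized table produced directly is now obtained with joins, which is the usual trade-off of a normalized design.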

Conversion to 3NF (continued)

Note: the original EMPLOYEE table's transitive dependency is eliminated.

Sample Exercise 1

staffNo | branchNo | branchAddress | name | position | hoursPerWeek
S4552 | B001 | City South Plaza, Seattle, WA 98122 | Ellen London | Assistant | 16
S4555 | B004 | 16 14th Avenue, Seattle, WA 98128 | Ellen Layman | Assistant |
S4612 | B002 | City Center Plaza, Seattle, WA 98122 | Dave Sinclair | Clerk | 14
S4612 | B004 | 16 14th Avenue, Seattle, WA 98128 | Dave Sinclair | Clerk | 10

Examine the table shown above. This table represents the hours worked per week for temporary staff at each branch of a company.

1. Identify the functional dependencies represented by the data shown in the table.

2. Using the functional dependencies identified in part (1), describe and illustrate the process of normalization by converting the table to Third Normal Form (3NF) relations. Identify the primary and foreign keys in your 3NF relations.

Sample Exercise 2

Given the following relational schema:

MOVIE(cinemaID, cinemaCapacity, movieID, movieTitle, movieDuration, showDate, showTime, actorID, actorName, ticketPrice, ticketSold, totalCollection)

1. Sketch a table with the attributes of the above schema as column headers. Populate the table with 10 records of data.

2. Identify the primary key and the functional dependencies represented by the data shown in the table.

3. Using the functional dependencies identified in part (2), describe and illustrate the process of normalization by converting the table to Third Normal Form (3NF) relations. Identify the primary and foreign keys in your 3NF relations.

Sample Exercise 3

Given the following incomplete dependency diagram:

(a) Draw an arrow for each functional dependency
(b) State the primary key based on the identified functional dependencies
(c) Write the 1NF relational schema for the diagram
(d) Identify any partial dependency
(e) Normalize the relation in (c) into 2NF relations
(f) Identify any transitive dependency
(g) Normalize the relation in (e) into 3NF relations.

TEAM ID | CTRY | GRP | MATCH ID | DATE | TIME | STADIUM | LOC | W | L | GF | GA | PT

Learning Outcomes

Students should now be able to:

Explain what normalization is and what its purpose is
Perform the normalization process from lower normal forms to higher normal forms: 1NF, 2NF and 3NF

References

Thomas Connolly, Carolyn Begg (2010). Database Systems: A Practical Approach to Design, Implementation and Management. Fifth Edition. Addison Wesley. Chapter 14.
Peter Rob, Carlos Coronel (2007). Database Systems: Design, Implementation & Management. Seventh Edition. Thomson Course Technology. Chapter 5 (pp. 148-158).
