
Normalization and Codd's Rules

- Normalization
- Normal Forms
  - 1NF
  - 2NF
  - 3NF
- Codd's Rules

Data Normalization
- The purpose of normalization is to produce a stable set of relations that is a faithful model of the operations of the enterprise:
  - Achieve a design that is highly flexible
  - Reduce redundancy
  - Ensure that the design is free of certain update, insertion and deletion anomalies

Normalization
- 1NF: flat file
- 2NF: partial dependencies removed
- 3NF: transitive dependencies removed
- BCNF: every determinant is a candidate key
- 4NF: non-trivial multi-valued dependencies removed
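The BCNF condition ("every determinant is a candidate key") can be tested mechanically with attribute closure: a determinant qualifies when its closure covers the whole relation, i.e. when it is a superkey. The Python sketch below assumes the dependencies are listed explicitly as (determinant, dependents) pairs; the function names and the customer example are hypothetical. Checking only the listed dependencies is enough for the relation they describe, but after a decomposition they must first be projected onto each new relation.

```python
from typing import FrozenSet, Iterable, List, Tuple

# One functional dependency: (determinant, dependent attributes).
FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from space-separated attribute names, e.g. fd("Zip_code", "City State")."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Attribute closure: every attribute functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once a determinant is covered, everything it determines is covered too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def bcnf_violations(relation: FrozenSet[str], fds: List[FD]) -> List[FD]:
    """Non-trivial FDs whose determinant is not a superkey of `relation`."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not relation <= closure(lhs, fds)
    ]


# Hypothetical example: customer data with city and state stored next to the zip code.
customer = frozenset("Cust_account Cust_name Cust_addr Zip_code City State".split())
customer_fds = [
    fd("Cust_account", "Cust_name Cust_addr Zip_code"),
    fd("Zip_code", "City State"),
]
print(bcnf_violations(customer, customer_fds))  # only the Zip_code dependency is reported
```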

Stereos To Go: Invoice

Order No.: 10001    Date: 6/15/99    Date Shipped: 6/18/99
Account No.: 0000-000-0000-0 (John Smith, 1/05)
Customer: John Smith
Address: 2036-26 Street, Sacramento, CA 95819

Item  Product Code  Product Description/Manufacturer  Qty   Price
1     SAGX730       Pioneer Remote A/V Receiver       1     569.95
2     AT10          Cerwin Vega Loudspeakers          1     359.95
3     CDPC725       Sony Disc-Jockey CD Changer       1     399.95

Subtotal             1,329.85
Shipping & Handling    100.00
Sales Tax              103.06
Total                1,532.91
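The derived figures on the invoice can be recomputed from its line items. A small Python sketch using `decimal.Decimal` so money is not rounded through binary floats; the 7.75% tax rate is an assumption inferred from 103.06 / 1,329.85, and charging tax on the merchandise subtotal only (not on shipping) is likewise an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP

# Line items from the sample invoice: (product code, quantity, unit price).
items = [
    ("SAGX730", 1, Decimal("569.95")),
    ("AT10", 1, Decimal("359.95")),
    ("CDPC725", 1, Decimal("399.95")),
]
SHIPPING = Decimal("100.00")
TAX_RATE = Decimal("0.0775")  # assumption: inferred from 103.06 / 1329.85

subtotal = sum((qty * price for _, qty, price in items), Decimal("0"))
# Assumption: tax applies to the merchandise subtotal only, rounded to cents.
sales_tax = (subtotal * TAX_RATE).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
total = subtotal + SHIPPING + sales_tax
print(subtotal, sales_tax, total)  # 1329.85 103.06 1532.91
```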

Unnormalized Relation
(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code, Item1, Item1_descrip, Item1_qty, Item1_price, Item2, Item2_descrip, Item2_qty, Item2_price, . . . , Item7, Item7_descrip, Item7_qty, Item7_price)

How would a program process the data to recreate the invoice?

Unnormalized to 1NF
(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code, Item1, Item1_descrip, Item1_qty, Item1_price, Item2, Item2_descrip, Item2_qty, Item2_price, . . . , Item7, Item7_descrip, Item7_qty, Item7_price)
Repeating groups: the Item1 to Item7 attribute groups (item, description, quantity, price)

A flat file places all the data of a transaction into a single record.
This is reminiscent of a COBOL or BASIC program processing a single transaction with one read statement.

Unnormalized to 1NF
(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code, Item, Item_descrip, Item_qty, Item_price)

- A nominated group of attributes serves as the key (together they form a unique combination).
- Eliminate the repeating groups: each row retains data for one item.
- If a person bought 5 items, we would have five tuples.
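Eliminating the repeating group amounts to emitting one tuple per filled item slot and copying the invoice and customer attributes into each. A Python sketch, assuming the unnormalized record is a dict keyed by the attribute names above and that unused slots are missing or `None`; `to_1nf` is a hypothetical name. The last lines check that (Invoice_number, Item) is unique across the resulting rows.

```python
# Invoice and customer attributes that every 1NF row repeats.
HEADER = ["Invoice_number", "Invoice_date", "Date_delivered", "Cust_account",
          "Cust_name", "Cust_addr", "Cust_city", "Cust_state", "Zip_code"]


def to_1nf(record: dict, max_items: int = 7) -> list:
    """Split one record with Item1..ItemN groups into one flat tuple per item."""
    common = tuple(record.get(col) for col in HEADER)
    rows = []
    for n in range(1, max_items + 1):
        if record.get(f"Item{n}") is None:  # assumption: an unused slot is None/absent
            continue
        rows.append(common + (record[f"Item{n}"], record[f"Item{n}_descrip"],
                              record[f"Item{n}_qty"], record[f"Item{n}_price"]))
    return rows


# Invoice 10001 from the flat-file example (dates and address omitted).
record = {
    "Invoice_number": 10001, "Cust_account": "123456", "Cust_name": "John Smith",
    "Item1": "SAGX730", "Item1_descrip": "Pioneer Remote A/V Rec",
    "Item1_qty": 1, "Item1_price": "569.95",
    "Item2": "AT10", "Item2_descrip": "Cerwin Vega Loudspeakers",
    "Item2_qty": 1, "Item2_price": "359.95",
    "Item3": "CDPC725", "Item3_descrip": "Sony Disc Jockey CD",
    "Item3_qty": 1, "Item3_price": "399.95",
}
rows = to_1nf(record)
keys = {(row[0], row[9]) for row in rows}  # (Invoice_number, Item)
print(len(rows), len(keys) == len(rows))   # 3 True
```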

Flat File (1NF)

Invoice Number  Account Number  Customer    Item     Item Description          Item Quantity  Item Price
10001           123456          John Smith  SAGX730  Pioneer Remote A/V Rec    1              569.95
10001           123456          John Smith  AT10     Cerwin Vega Loudspeakers  1              359.95
10001           123456          John Smith  CDPC725  Sony Disc Jockey CD       1              399.95
10001           123456          John Smith  S/H      Shipping                  1              100.00
10001           123456          John Smith  Tax      Sales Tax                 1              103.06

From 1NF
(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code, Item, Item_descrip, Item_qty, Item_price)

Functional dependencies and determinants
Example: Item_descrip is functionally dependent on Item (Item → Item_descrip), so Item is the determinant of Item_descrip.
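Whether a dependency such as Item → Item_descrip holds in a given set of rows can be tested by checking that no determinant value maps to two different dependent values. A Python sketch; invoice 10002 is a hypothetical second order. Sample data can only refute a dependency, never prove it, because the dependency is a rule about all valid data.

```python
def fd_holds(rows: list, lhs: list, rhs: list) -> bool:
    """True if, in these rows, equal values on `lhs` always imply equal values on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # same determinant value, different dependent value
    return True


rows = [
    {"Invoice_number": 10001, "Item": "AT10", "Item_descrip": "Cerwin Vega Loudspeakers", "Item_qty": 1},
    {"Invoice_number": 10002, "Item": "AT10", "Item_descrip": "Cerwin Vega Loudspeakers", "Item_qty": 2},
]
print(fd_holds(rows, ["Item"], ["Item_descrip"]))  # True
print(fd_holds(rows, ["Item"], ["Item_qty"]))      # False
```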

From 1NF to 2NF


(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)
(Item, Item_descrip, Item_qty, Item_price)

Is Item unique by itself? What happens if the same item is purchased more than once?

From 1NF to 2NF


(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)

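Partial dependency: Item_descrip depends on Item alone, which is only part of the key
(Invoice_number, Item, Item_descrip, Item_qty, Item_price)

Composite key: Invoice_number + Item (together they form a unique combination)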
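Declaring the composite key lets the DBMS enforce it: the same item may appear on many invoices, but only once per invoice. A sketch using Python's built-in `sqlite3` module with an in-memory database; the column types and the second invoice number (10002) are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Invoice_items (
        Invoice_number INTEGER NOT NULL,
        Item           TEXT    NOT NULL,
        Item_qty       INTEGER NOT NULL,
        Item_price     NUMERIC NOT NULL,
        PRIMARY KEY (Invoice_number, Item)  -- composite key
    )""")
conn.execute("INSERT INTO Invoice_items VALUES (10001, 'AT10', 1, 359.95)")
# Same item on another invoice: allowed, because the key is the combination.
conn.execute("INSERT INTO Invoice_items VALUES (10002, 'AT10', 2, 359.95)")

try:
    # Same invoice and same item again: violates the composite key.
    conn.execute("INSERT INTO Invoice_items VALUES (10001, 'AT10', 1, 359.95)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```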

From 1NF to 2NF


(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)
(Invoice_number, Item, Item_qty, Item_price)
(Item, Item_descrip)
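The step from the 1NF relation to these three relations can be mimicked by listing the dependencies and flagging those whose determinant is a proper subset of the composite key. A Python sketch; the FD list is an assumption read off the relations above, and the function only inspects the listed dependencies rather than deriving implied ones.

```python
def partial_dependencies(key: set, fds: list) -> list:
    """Return (determinant, non-key dependents) for FDs whose determinant is a
    proper subset of the composite key, i.e. the partial dependencies."""
    found = []
    for lhs, rhs in fds:
        nonkey = set(rhs) - key
        if set(lhs) < key and nonkey:  # '<' on sets means proper subset
            found.append((set(lhs), nonkey))
    return found


key = {"Invoice_number", "Item"}
fds = [
    ({"Invoice_number", "Item"}, {"Item_qty", "Item_price"}),
    ({"Item"}, {"Item_descrip"}),
    ({"Invoice_number"}, {"Invoice_date", "Date_delivered", "Cust_account"}),
]
# Each reported dependency is moved into its own relation in the 2NF design.
for determinant, dependents in partial_dependencies(key, fds):
    print(sorted(determinant), "->", sorted(dependents))
```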

From 2NF to 3NF


(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)
(Invoice_number, Item, Item_qty, Item_price)
(Item, Item_descrip)

Which attributes are dependent on others? Is there a problem?

Transitive Dependencies and Anomalies
- Insertion anomalies: to add a new row, all customer data (name, address, city, state, zip code, phone) and product data (description) must be consistent with previous entries.
- Deletion anomalies: by deleting a row, a customer or product may cease to exist.
- Modification anomalies: to modify a customer's or product's data in one row, all modifications must be carried out in every row that holds the same data.

Insertion and Modification Anomalies: For Example

Insert a new Panasonic product:

Product_code  Manufacturer_name
DVD-A110      Panasonic
PV-4210       Panasonic
PV-4250       Panasonic
CT-32S35      PAN

Inconsistency:

Product_code  Manufacturer_name
DVD-A110      Panasonic
PV-4210       PanaSonic
PV-4250       Pana Sonic
CT-32S35      PAN

Change all Panasonic products' manufacturer name to Panasonic USA.
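The inconsistent spellings defeat a simple update: a statement that matches the exact name 'Panasonic' leaves the variant rows untouched. A sketch with Python's `sqlite3`, using the product codes and spellings from the inconsistency example; the table name `Products` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (Product_code TEXT PRIMARY KEY, Manufacturer_name TEXT)")
conn.executemany("INSERT INTO Products VALUES (?, ?)", [
    ("DVD-A110", "Panasonic"), ("PV-4210", "PanaSonic"),
    ("PV-4250", "Pana Sonic"), ("CT-32S35", "PAN"),
])

cur = conn.execute(
    "UPDATE Products SET Manufacturer_name = 'Panasonic USA' "
    "WHERE Manufacturer_name = 'Panasonic'")
print(cur.rowcount)  # 1: only the exactly matching row changed

missed = conn.execute(
    "SELECT Product_code FROM Products "
    "WHERE Manufacturer_name <> 'Panasonic USA' ORDER BY Product_code").fetchall()
print(missed)  # [('CT-32S35',), ('PV-4210',), ('PV-4250',)]
```

With the manufacturer's name stored once in a separate relation, the same change would be a single-row update.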

Deletion Anomaly
For example:

Cust_account  Cust_name   Cust_addr  City        State  Zip_code
4377182       John Smith  lll        Sacramento  CA     95831
4398711       Arnold S    lll        Davis       CA     95691
4578461       Gray Davis  lll        Sacramento  CA     95831
4873179       Lisa Carr   lll        Reno        NV     89557

By deleting customer Arnold S, we would also lose the only row recording that zip code 95691 belongs to Davis, California.
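The anomaly is easy to see in code: the zip-code-to-city facts exist only as a side effect of customer rows. A Python sketch over the four customer rows above (street addresses omitted); `known_zip_codes` is a hypothetical helper.

```python
customers = [
    ("4377182", "John Smith", "Sacramento", "CA", "95831"),
    ("4398711", "Arnold S", "Davis", "CA", "95691"),
    ("4578461", "Gray Davis", "Sacramento", "CA", "95831"),
    ("4873179", "Lisa Carr", "Reno", "NV", "89557"),
]


def known_zip_codes(rows):
    """Zip code -> (city, state) facts that can still be read from the rows."""
    return {zip_code: (city, state) for _, _, city, state, zip_code in rows}


after_delete = [row for row in customers if row[1] != "Arnold S"]
print("95691" in known_zip_codes(customers))     # True
print("95691" in known_zip_codes(after_delete))  # False: the Davis, CA fact is gone
```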

Transitive Dependencies
A condition where A, B and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).

Invoice_number → Invoice_date, Date_delivered, Cust_account
Cust_account → Cust_name, Cust_addr, Zip_code
Zip_code → Cust_city, Cust_state
Item → Item_descrip
Invoice_number + Item → Item_qty, Item_price

Why Should City and State Be Separated from the Customer Relation?
- City and state depend on the zip code for their values, not on the customer's identifier (i.e., the key): Zip_code → City, State.
- Otherwise, Cust_account → Cust_addr, Zip_code and Zip_code → City, State, so City and State would be transitively dependent on Cust_account via Zip_code.
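In terms of the definition, a transitive dependency Key → B → C shows up as a listed dependency B → C whose determinant B does not contain the key. A Python sketch, assuming the relation is already in 2NF and has a single candidate key (so B is a superkey exactly when it contains the key); the FD list and the function name are hypothetical.

```python
def transitive_dependencies(key: set, fds: list) -> list:
    """Flag FDs B -> C where B does not contain the key and C has non-key attributes.

    Assumes the relation is already in 2NF and `key` is its only candidate key,
    so a determinant is a superkey exactly when it contains `key`.
    """
    found = []
    for lhs, rhs in fds:
        dependents = set(rhs) - key - set(lhs)  # drop key and trivial attributes
        if not key <= set(lhs) and dependents:
            found.append((set(lhs), dependents))
    return found


# Customer attributes before Zip_code is split out (the FD list is an assumption).
customer_fds = [
    ({"Cust_account"}, {"Cust_name", "Cust_addr", "Zip_code"}),
    ({"Zip_code"}, {"Cust_city", "Cust_state"}),
]
print(transitive_dependencies({"Cust_account"}, customer_fds))
```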

3NF
Invoice Relation
(Invoice_number, Invoice_date, Date_delivered, Cust_account)

Customer Relation
(Cust_account, Cust_name, Cust_addr, Zip_code)

Zip_code Relation
(Zip_code, City, State)

Invoice_items Relation
(Invoice_number, Item, Item_qty, Item_price)

Items Relation
(Item, Item_descrip)


Manufacturers Relation
(Manuf_code, Manuf_name)

Since the Items relation contains the manufacturer's name in the description, a separate Manufacturers relation can be created.
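The relations from Invoice to Items map directly onto SQL tables, and the invoice is recreated by joining them back together. A sketch with Python's `sqlite3`; the column types, foreign-key declarations, ISO date format and sample rows (account 123456 from the flat-file example) are illustrative assumptions. Manufacturers is left out because the relations above do not show which column links it to Items.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    PRAGMA foreign_keys = ON;
    CREATE TABLE Zip_code (Zip_code TEXT PRIMARY KEY, City TEXT, State TEXT);
    CREATE TABLE Customer (Cust_account TEXT PRIMARY KEY, Cust_name TEXT,
                           Cust_addr TEXT, Zip_code TEXT REFERENCES Zip_code);
    CREATE TABLE Invoice (Invoice_number INTEGER PRIMARY KEY, Invoice_date TEXT,
                          Date_delivered TEXT, Cust_account TEXT REFERENCES Customer);
    CREATE TABLE Items (Item TEXT PRIMARY KEY, Item_descrip TEXT);
    CREATE TABLE Invoice_items (Invoice_number INTEGER REFERENCES Invoice,
                                Item TEXT REFERENCES Items,
                                Item_qty INTEGER, Item_price NUMERIC,
                                PRIMARY KEY (Invoice_number, Item));

    INSERT INTO Zip_code VALUES ('95819', 'Sacramento', 'CA');
    INSERT INTO Customer VALUES ('123456', 'John Smith', '2036-26 Street', '95819');
    INSERT INTO Invoice VALUES (10001, '1999-06-15', '1999-06-18', '123456');
    INSERT INTO Items VALUES ('SAGX730', 'Pioneer Remote A/V Receiver'),
                             ('AT10', 'Cerwin Vega Loudspeakers'),
                             ('CDPC725', 'Sony Disc-Jockey CD Changer');
    INSERT INTO Invoice_items VALUES (10001, 'SAGX730', 1, 569.95),
                                     (10001, 'AT10', 1, 359.95),
                                     (10001, 'CDPC725', 1, 399.95);
""")

# Reassemble the invoice lines by joining the 3NF relations back together.
lines = conn.execute("""
    SELECT i.Invoice_number, c.Cust_name, z.City, z.State,
           it.Item, it.Item_descrip, ii.Item_qty, ii.Item_price
    FROM Invoice i
    JOIN Customer c       ON c.Cust_account = i.Cust_account
    JOIN Zip_code z       ON z.Zip_code = c.Zip_code
    JOIN Invoice_items ii ON ii.Invoice_number = i.Invoice_number
    JOIN Items it         ON it.Item = ii.Item
    WHERE i.Invoice_number = 10001
    ORDER BY ii.Item_price DESC
""").fetchall()
for line in lines:
    print(line)
```

Each fact (a city for a zip code, a description for an item) is now stored once, and the join is what rebuilds the flat view of the invoice.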

First to Third Normal Form (1NF - 3NF)
- 1NF: A relation is in first normal form if and only if every attribute is single-valued for each tuple (remove the repeating or multi-valued attributes and create a flat file).
- 2NF: A relation is in second normal form if and only if it is in first normal form and the nonkey attributes are fully functionally dependent on the key (remove partial dependencies).
- 3NF: A relation is in third normal form if and only if it is in second normal form and no nonkey attribute is transitively dependent on the key (remove transitive dependencies).

Codd's Rules
E. F. Codd presented these rules as a basis for determining whether a DBMS could be classified as relational.

Codd's Rules
Codd's Rules can be divided into 5 functional areas:
- Foundation Rules
- Structural Rules
- Integrity Rules
- Data Manipulation Rules
- Data Independence Rules

Foundation Rules
- Rule 0: Any system claimed to be an RDBMS must be able to manage databases entirely through its relational capabilities.
  - All data definition & manipulation must be possible through relational operations.

Foundation Rules
- Rule 12 - Nonsubversion Rule: If an RDBMS has a low-level (record-at-a-time) language, that language cannot be used to subvert or bypass the integrity rules & constraints expressed in the higher-level relational language.
  - All database access must be controlled through the DBMS so that the integrity of the database cannot be compromised without the knowledge of the user or the DBA.
  - This does not prohibit the use of record-at-a-time languages, e.g. PL/SQL.

Codd's Rules
Structural Rules (Rules 1 & 6)
- The fundamental structural construct is the table. Codd states that an RDBMS must support tables, domains, and primary & foreign keys. Each table should have a primary key.

Structural Rules
- Rule 1: All information in an RDB is represented explicitly at the logical level in exactly one way: by values in tables.
  - ALL information, even the metadata held in the system catalogue, MUST be stored as relations (tables) & manipulated in the same way as data.

Structural Rules
- Rule 6 - View Updating: All views that are theoretically updatable are updatable by the system.
  - Not fully implemented by any available system.

Codd's Rules
Integrity Rules (Rules 3 & 10)
- Integrity should be maintained by the DBMS, not the application.

Rule 3 - Systematic treatment of null values: Null values are supported for representing 'missing' & inapplicable information in a systematic way, independent of data type.
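SQL is one concrete realization of Rule 3: NULL behaves the same whatever the column type, and comparisons with it follow three-valued logic, so `NULL = NULL` is unknown rather than true. A sketch with Python's `sqlite3`; invoice 10002 with no delivery date is a hypothetical row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Comparing or computing with NULL yields NULL (returned to Python as None).
row = conn.execute("SELECT NULL = NULL, NULL IS NULL, 1 + NULL").fetchone()
print(row)  # (None, 1, None)

conn.execute("CREATE TABLE Invoice (Invoice_number INTEGER PRIMARY KEY, Date_delivered TEXT)")
conn.executemany("INSERT INTO Invoice VALUES (?, ?)",
                 [(10001, "1999-06-18"), (10002, None)])  # None is stored as NULL
# Missing values are found with IS NULL, not with = NULL.
undelivered = conn.execute(
    "SELECT Invoice_number FROM Invoice WHERE Date_delivered IS NULL").fetchall()
print(undelivered)  # [(10002,)]
```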

Integrity Rules
- Rule 10 - Integrity independence: Integrity constraints specific to a particular RDB MUST be definable in the relational data sublanguage & storable in the DB, NOT in the application program.
  - This gives the advantage of centralised control & enforcement.
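Constraints declared with the table live in the database, so every program that writes to it is checked the same way. A sketch with Python's `sqlite3`; the CHECK conditions and the unknown product code `XYZ9` are hypothetical. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, and the exact wording of its error messages varies between versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript("""
    CREATE TABLE Items (Item TEXT PRIMARY KEY, Item_descrip TEXT NOT NULL);
    CREATE TABLE Invoice_items (
        Invoice_number INTEGER NOT NULL,
        Item       TEXT    NOT NULL REFERENCES Items (Item),
        Item_qty   INTEGER NOT NULL CHECK (Item_qty > 0),      -- hypothetical rule
        Item_price NUMERIC NOT NULL CHECK (Item_price >= 0),   -- hypothetical rule
        PRIMARY KEY (Invoice_number, Item)
    );
    INSERT INTO Items VALUES ('AT10', 'Cerwin Vega Loudspeakers');
""")


def try_insert(row: tuple):
    """Insert one invoice line; return None on success or SQLite's error message."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Invoice_items VALUES (?, ?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


ok = try_insert((10001, "AT10", 1, 359.95))
bad_item = try_insert((10001, "XYZ9", 1, 10.00))  # no such product in Items
bad_qty = try_insert((10002, "AT10", 0, 359.95))  # violates Item_qty > 0
print(ok, bad_item, bad_qty, sep="\n")
```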

Codd's Rules
Data Manipulation Rules (Rules 2, 4, 5 & 7)
- The user should be able to manipulate the 'logical view' of the data with no need to know how it is physically stored or accessed.
- Rule 2 - Guaranteed Access: Each & every datum in an RDB is guaranteed to be logically accessible by a combination of table name, primary key value & column name.
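Rule 2's addressing scheme (table name + primary key value + column name) can be written as a generic lookup. A sketch with Python's `sqlite3`; `get_datum` and `quote_ident` are hypothetical helpers, and for brevity the sketch handles single-column primary keys only, reading the key column from SQLite's `PRAGMA table_info`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (Item TEXT PRIMARY KEY, Item_descrip TEXT)")
conn.execute("INSERT INTO Items VALUES ('CDPC725', 'Sony Disc-Jockey CD Changer')")


def quote_ident(name: str) -> str:
    """Quote a table or column name as an SQL identifier."""
    return '"' + name.replace('"', '""') + '"'


def get_datum(table: str, key_value, column: str):
    """Rule 2 lookup: table name + primary-key value + column name -> one value."""
    # Find the primary-key column from the catalog (single-column keys only).
    info = conn.execute(f"PRAGMA table_info({quote_ident(table)})").fetchall()
    pk_cols = [col[1] for col in info if col[5] > 0]  # col[5] is the pk flag
    if len(pk_cols) != 1:
        raise ValueError("this sketch handles single-column primary keys only")
    sql = (f"SELECT {quote_ident(column)} FROM {quote_ident(table)} "
           f"WHERE {quote_ident(pk_cols[0])} = ?")
    row = conn.execute(sql, (key_value,)).fetchone()
    return None if row is None else row[0]


print(get_datum("Items", "CDPC725", "Item_descrip"))  # Sony Disc-Jockey CD Changer
```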

Data Manipulation Rules
- Rule 4 - Dynamic on-line catalog based on the relational model: The DB description (metadata) is represented at the logical level in the same way as ordinary data, so that the same relational language can be used to interrogate the metadata as regular data.
  - System & other data are stored & manipulated in the same way.
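SQLite illustrates Rule 4 in a small way: its schema is exposed as the table `sqlite_master`, and column metadata through the `pragma_table_info` table-valued function (available in SQLite 3.16 and later), both queried with ordinary `SELECT` statements. A sketch with Python's `sqlite3`; the two tables are just examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (Item TEXT PRIMARY KEY, Item_descrip TEXT)")
conn.execute("CREATE TABLE Zip_code (Zip_code TEXT PRIMARY KEY, City TEXT, State TEXT)")

# The catalog is queried with the same SELECT syntax as user data.
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name").fetchall()
columns = [r[0] for r in conn.execute(
    "SELECT name FROM pragma_table_info('Zip_code') ORDER BY cid")]
print(tables)   # [('Items',), ('Zip_code',)]
print(columns)  # ['Zip_code', 'City', 'State']
```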

Data Manipulation Rules
- Rule 5 - Comprehensive Data Sublanguage: An RDBMS may support many languages & modes of use, but there must be at least ONE language whose statements can express ALL of the following:
  - Data definition
  - View definition
  - Data manipulation (interactive & via program)
  - Integrity constraints
  - Authorization
  - Transaction boundaries (begin, commit & rollback)
- The 1992 ISO standard for SQL provides all these functions.
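Transaction boundaries, the last item in Rule 5's list, are themselves statements in the language. A sketch with Python's `sqlite3` opened with `isolation_level=None`, so the driver adds no implicit `BEGIN` and the SQL below marks the boundaries explicitly; the table and rows are illustrative.

```python
import sqlite3

# isolation_level=None: no implicit transactions, so BEGIN/COMMIT/ROLLBACK are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Invoice (Invoice_number INTEGER PRIMARY KEY, Cust_account TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO Invoice VALUES (10001, '123456')")
conn.execute("COMMIT")    # this insert is kept

conn.execute("BEGIN")
conn.execute("INSERT INTO Invoice VALUES (10002, '123456')")
conn.execute("ROLLBACK")  # this insert is undone

count = conn.execute("SELECT COUNT(*) FROM Invoice").fetchone()[0]
print(count)  # 1
```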

Data Manipulation Rules
- Rule 7 - High-level insert, update & delete: The capability of handling a base table or view as a single operand applies not only to data retrieval but also to insert, update & delete operations.

Codd's Rules
- Data Independence Rules (Rules 8, 9 & 11): These rules protect users & application developers from having to change applications following any low-level reorganisation of the DB.

Data Independence Rules
- Rule 8 - Physical Data Independence: Application programs & terminal activities remain logically unimpaired whenever any changes are made to either the storage organisation or the access methods.
- Rule 9 - Logical Data Independence: Application programs & terminal activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit unimpairment are made to the base tables.
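Rule 8 can be observed with an index: adding one changes the access paths available to the query optimizer, but the application's query text and its result stay the same. A sketch with Python's `sqlite3`, using customer rows from the deletion-anomaly example; the index name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (Cust_account TEXT PRIMARY KEY, Cust_name TEXT, Zip_code TEXT)")
conn.executemany("INSERT INTO Customer VALUES (?, ?, ?)", [
    ("4377182", "John Smith", "95831"), ("4398711", "Arnold S", "95691"),
    ("4578461", "Gray Davis", "95831"), ("4873179", "Lisa Carr", "89557"),
])

QUERY = "SELECT Cust_name FROM Customer WHERE Zip_code = ? ORDER BY Cust_name"
before = conn.execute(QUERY, ("95831",)).fetchall()

# A change to the access methods only: add an index on Zip_code.
conn.execute("CREATE INDEX idx_customer_zip ON Customer (Zip_code)")
after = conn.execute(QUERY, ("95831",)).fetchall()

print(before == after, after)  # True [('Gray Davis',), ('John Smith',)]
```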

Data Independence Rules
- Rule 11 - Distribution Independence: The data manipulation sublanguage of an RDBMS must enable application programs & queries to remain logically unchanged whether & whenever data is physically centralised or distributed.

Data Independence Rules
Rule 11 - Distribution Independence
- This means that an application program that accesses the DBMS on a single computer should also work, without modification, even if the data is moved from one computer to another in a network environment.
- The user should 'see' one centralised DB whether the data is located on one or more computers.
- This rule does not say that, to be fully relational, the DBMS must support distributed DBs, but that if it does, the query must remain the same.

Summary
Codd's Rules can be divided into 5 functional areas:
- Foundation Rules
- Structural Rules
- Integrity Rules
- Data Manipulation Rules
- Data Independence Rules
