
Chapter 3.6 – Databases

What is an entity?

An entity is an object of a system which has some data of the system associated with it. In a
relational database each entity can be developed as a table.

Examples:

Product and Invoice are entities of a retail store system
Order and Product are entities of a purchasing system
Passenger and Flight are entities of an airline seat reservation system
Student and Class are entities of a student registration system of a school

What is a relational database?

A relational database is a set of tables which are linked through “relationships”. Therefore a
relational database can be considered as three dimensional.

[Figure: Table 1, Table 2 and Table 3, linked through relationships]

What is a table?

A table is a set of records about a single entity.

What is a record?

A record is a single instance of an entity and is a row of the table representing that entity. Each
record consists of a set of related fields.

What is a field?

A field is an attribute of an entity and is a column of that table. In a record, a field contains a
single data item.

Example:

The table given below represents the entity “Student”. Name, telephone number and date of birth
are attributes of the “Student” entity. Therefore “Name”, “Telephone Number” and “Date of
Birth” are fields of the “Student Details” table.

Student Details Table

Name     Telephone Number    Date of Birth
John     0112896111          14/03/91
Peter    0112842735          22/06/92

Each row of the table (for example, John's row) is a record.

What is a flat file?

If the data of a particular system resides in a single file, such a file is called a flat file. A flat file
contains a set of records, where each record consists of a set of fields. A flat file can be
considered as two dimensional.

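LW03751, John, 11 Science A
LW03752, Peter, 11 Science A
LW03753, Edward, 11 Commerce A
LW03754, Mark, 11 Science B

Each line of the file is a record, and each comma-separated value within a line (for example,
LW03751) is a field.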
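The record and field structure of this flat file can be illustrated with a short Python sketch. The field names (student_id, name, class_name) are assumptions made for illustration only; the flat file itself carries no field names.

```python
import csv
import io

# The four sample records from the flat file above, one record per line.
FLAT_FILE = """LW03751, John, 11 Science A
LW03752, Peter, 11 Science A
LW03753, Edward, 11 Commerce A
LW03754, Mark, 11 Science B
"""

# Assumed field names; the flat file has no header line.
FIELD_NAMES = ["student_id", "name", "class_name"]


def read_records(text):
    """Return one dict per record, keyed by the assumed field names."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [dict(zip(FIELD_NAMES, row)) for row in reader if row]


records = read_records(FLAT_FILE)
for record in records:
    print(record)
```

Every program that reads the file has to know this field order, which is the root of the "dependence of applications on data" problem that flat files suffer from.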

What are the problems associated with a flat file?

Suppose that an organization stores its data in the following flat files:

(a) File 1: Contains data about customers such as the customer’s name, gender and address
(maintained by the customer service department)

(b) File 2: Contains data about products and their suppliers (maintained by the purchasing
department)

(c) File 3: Contains data about sales such as the products and quantities sold and names of the
customers who purchased them (maintained by the sales department)

1. Separation and isolation of data – which leads to the inability of running queries to generate
important and useful information

Suppose that the organization needs to find the male customers who have purchased a product
supplied by a particular supplier. A single query cannot answer this, because the sales department
file (File 3) does not contain the customer’s gender. Instead, the names of the customers who
purchased that particular product first have to be extracted from the sales department file
(File 3). Then, using that set of names, the customers whose gender is male have to be picked out
of the customer service department’s file (File 1). The two flat files cannot be joined in one
query because no relationship exists that links them.

2. Redundancy of data

Duplication of data wastes the time and money spent on data entry. For example, a customer’s
name must be entered in the customer service department flat file (File 1) as well as in the
sales department flat file (File 3).

3. Data inconsistency

This means that the same data, appearing in two different places, has two different values.
This happens as a consequence of data redundancy and it leads to the loss of data integrity. For
example, the customer service department might correct the misspelling of a customer’s name
in their flat file (File 1) but the sales department may not make that correction in their file
(File 3).

4. Dependence of applications on data

The applications depend on the record format of the flat file. If a new field is added to the file
or an existing field is removed from the file, code in all the application programs accessing the
file must be changed.
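For contrast, the query from problem 1 can be answered in one step once the same data is held in related tables. The sketch below uses Python's built-in sqlite3 module; the table names, column names and sample rows are all assumptions made for illustration and are not part of the organization's files described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, gender TEXT);
CREATE TABLE product  (product_id INTEGER PRIMARY KEY, product_name TEXT, supplier TEXT);
CREATE TABLE sale     (sale_id INTEGER PRIMARY KEY,
                       customer_id INTEGER REFERENCES customer(customer_id),
                       product_id  INTEGER REFERENCES product(product_id),
                       quantity    INTEGER);
-- Made-up sample data
INSERT INTO customer VALUES (1, 'John', 'M'), (2, 'Mary', 'F'), (3, 'Peter', 'M');
INSERT INTO product  VALUES (10, 'Tea', 'Acme'), (11, 'Rice', 'Other Co');
INSERT INTO sale     VALUES (100, 1, 10, 2), (101, 2, 10, 1), (102, 3, 11, 5);
""")

# Male customers who bought a product supplied by a given supplier,
# found by joining the three tables on their shared key fields.
male_buyers = [row[0] for row in conn.execute("""
    SELECT DISTINCT c.name
    FROM sale AS s
    JOIN customer AS c ON c.customer_id = s.customer_id
    JOIN product  AS p ON p.product_id  = s.product_id
    WHERE c.gender = 'M' AND p.supplier = ?
    ORDER BY c.name
""", ("Acme",))]
print(male_buyers)
```

Because the sale table stores only the customer's key rather than the name, the name is entered once, which also addresses problems 2 and 3.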

What is meant by normalizing a database?

Normalization is the process of removing data redundancy from a database. When the database
design is fully normalized, there is no repetition of data across tables with the exception of the
fields used to link the tables together.

What is the 1st rule of normalization?

The first rule of normalization specifies that:

1. Every table must have a field or a combination of fields defined as a primary key which is used
to identify a record uniquely.

3. Each field must contain data that is non-decomposable (that cannot be broken down into
smaller pieces).
For example, the first name and the last name should not be stored together in a single
“Customer Name” field; the name should be broken down into two separate fields, “First Name”
and “Last Name”.

4. A table should not contain repeating groups of data, for example a set of fields such as
Product 1, Product 2, Product 3 and so on.
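A minimal Python sketch of applying these rules to a single unnormalized row follows. The sample values and the helper name to_first_normal_form are made up for illustration, and splitting the name on the first space is a simplifying assumption that will not suit every real name.

```python
# An unnormalized row: a combined name field and a repeating group
# (Product 1, Product 2, Product 3). All values are made up.
unnormalized = {
    "Order ID": 1,
    "Customer Name": "John Smith",
    "Product 1": "Bread",
    "Product 2": "Cake",
    "Product 3": None,
}


def to_first_normal_form(row):
    """Split the name field and turn the repeating group into separate records."""
    first_name, last_name = row["Customer Name"].split(" ", 1)
    order = {
        "Order ID": row["Order ID"],
        "First Name": first_name,
        "Last Name": last_name,
    }
    # One record per product actually ordered, instead of Product 1..3 columns.
    order_lines = [
        {"Order ID": row["Order ID"], "Product": value}
        for key, value in row.items()
        if key.startswith("Product ") and value is not None
    ]
    return order, order_lines


order, order_lines = to_first_normal_form(unnormalized)
print(order)
print(order_lines)
```

The repeating group becomes rows in a second table, so an order can hold any number of products without adding more columns.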

What is a composite primary key?

If a primary key is a combination of two or more fields, such a primary key is
called a composite primary key.

For example the “Order Details” table needs the combination of both “Order ID” and “Product
ID” fields defined as its primary key.

What is the 2nd rule of normalization?

If the table has a composite primary key, every non-key field in the table must relate to the
composite primary key, but not to a part of the composite primary key.

For example, the “Order Details” table can contain the non-key fields, “Quantity”, “Expiry Date”
and “Date Manufactured” as those fields relate to the combination of “Order ID” and “Product
ID” fields, but not to any one of those two fields. But this table should not contain “Unit Price” as
a field as this field relates only to the “Product ID” field.
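The following sketch, using Python's built-in sqlite3 module, shows one way the second rule might look in practice. The table and column names and the sample values are assumptions for illustration; only Quantity and Unit Price from the example above are modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    unit_price REAL NOT NULL          -- relates to product_id only
);
CREATE TABLE order_details (
    order_id   INTEGER NOT NULL,
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    quantity   INTEGER NOT NULL,      -- relates to the whole composite key
    PRIMARY KEY (order_id, product_id)
);
INSERT INTO product VALUES (7, 2.50);
INSERT INTO order_details VALUES (1, 7, 3), (2, 7, 10);
""")

# One update in product changes the price seen by every order line.
conn.execute("UPDATE product SET unit_price = 3.00 WHERE product_id = 7")
line_totals = conn.execute("""
    SELECT d.order_id, d.quantity * p.unit_price
    FROM order_details AS d
    JOIN product AS p ON p.product_id = d.product_id
    ORDER BY d.order_id
""").fetchall()

# The composite primary key rejects a second row for the same order and product.
try:
    conn.execute("INSERT INTO order_details VALUES (1, 7, 99)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Had unit_price been stored in order_details, the price change would have needed one update per order line.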

What is the 3rd rule of normalization?

Any non-key field in a table must relate to the primary key of the table and should not depend on
any other non-key field.

For example, an Orders table should not contain all of the fields Order ID (primary key), Order
Date, Customer ID and Customer Name. This is because the non-key field, Customer Name, depends
on another non-key field, Customer ID. Therefore the inclusion of the Customer Name field in
the Orders table violates the 3rd rule of normalization.
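As a rough sketch in Python, the offending Orders table can be split into two tables so that Customer Name depends only on the key of its own table. The sample rows and the helper name split_to_3nf are made up for illustration.

```python
# Orders rows that violate the third rule: Customer Name depends on Customer ID.
denormalized_orders = [
    {"Order ID": 1, "Order Date": "2024-01-05", "Customer ID": 20, "Customer Name": "John"},
    {"Order ID": 2, "Order Date": "2024-01-06", "Customer ID": 20, "Customer Name": "John"},
    {"Order ID": 3, "Order Date": "2024-01-06", "Customer ID": 21, "Customer Name": "Peter"},
]


def split_to_3nf(rows):
    """Return (orders, customers) with Customer Name moved to its own table."""
    orders = [
        {key: row[key] for key in ("Order ID", "Order Date", "Customer ID")}
        for row in rows
    ]
    # Keyed by Customer ID, so each name is stored exactly once.
    customers = {row["Customer ID"]: row["Customer Name"] for row in rows}
    return orders, customers


orders, customers = split_to_3nf(denormalized_orders)
print(orders)
print(customers)
```

Customer ID stays in the Orders table as the link to the Customer table.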

What are the advantages of database normalization?

• It saves disk space by removing redundant data from the database and thereby reducing the
size of database.
• Maintaining the database is easy and less time consuming as it is enough to change a field
only in one table of the database.
• It increases the efficiency of updating the database, by the implementation of relationships
between multiple tables.

What are the disadvantages of database normalization?

Since complete normalization involves separating data into single-entity tables, multiple tables
have to be joined whenever information is needed from more than one table. These joins add
processing time, so queries that retrieve information from many tables can be slow.

What is a secondary key?


This is an attribute, other than the primary key, used to access the records in a table in a
different order.

What is a foreign key?

This is an attribute in one table which is a primary key in another table. A foreign key is used to
create a link or a relationship between the two tables.
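A small sketch with Python's built-in sqlite3 module shows a foreign key linking two tables. The table names, column names and sample rows are assumptions for illustration. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.executescript("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    order_date  TEXT,
    -- Foreign key: must match a primary key value in customer
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
INSERT INTO customer VALUES (1, 'John');
INSERT INTO orders VALUES (100, '2024-01-05', 1);
""")

# An order that points at a customer who does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (101, '2024-01-06', 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```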

Example 1:

A landscape garden company services a number of gardens. Each GARDEN is owned by an
OWNER. Each owner may have more than one garden. Each garden has a number of PLANTS in
it and each plant may be in a number of gardens.

Draw an entity relationship diagram to represent this data model in 3NF and label the
relationships. (2004 Nov 2)

In this example the three entities, OWNER, GARDEN and PLANT are connected by the
following relationships:

OWNER (1) ---- owns ---- (many) GARDEN
GARDEN (many) ---- is planted with / is planted in ---- (many) PLANT

The above relationships read as follows:

At least one owner owns many gardens
At least one garden is planted with many plants and at least one plant is planted in many
gardens

In the above diagram a many-to-many relationship exists between the Garden entity and the Plant
entity. In E-R diagrams a many-to-many relationship must be broken down into two separate
one-to-many relationships by introducing a linking entity. In this case the linking entity is named
GARDEN PLANT. The E-R diagram corrected to the 3rd normal form is shown below:

OWNER (1) ---- owns ---- (many) GARDEN
GARDEN (1) ---- contains ---- (many) GARDEN PLANT
PLANT (1) ---- is planted as ---- (many) GARDEN PLANT

The above relationships read as follows:


At least one owner owns many gardens
At least one garden contains many garden plants
At least one plant is planted as many garden plants
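One possible implementation of this data model is sketched below with Python's built-in sqlite3 module. The column names (other than the entity names), the sample rows and the helper gardens_with are assumptions for illustration. The linking entity GARDEN PLANT becomes a table whose composite primary key is made of the two foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE owner  (owner_id  INTEGER PRIMARY KEY, owner_name TEXT);
CREATE TABLE garden (garden_id INTEGER PRIMARY KEY, address TEXT,
                     owner_id  INTEGER REFERENCES owner(owner_id));
CREATE TABLE plant  (plant_id  INTEGER PRIMARY KEY, plant_name TEXT);
-- Linking table: one row per (garden, plant) pair
CREATE TABLE garden_plant (
    garden_id INTEGER REFERENCES garden(garden_id),
    plant_id  INTEGER REFERENCES plant(plant_id),
    PRIMARY KEY (garden_id, plant_id)
);
-- Made-up sample data
INSERT INTO owner  VALUES (1, 'Ann');
INSERT INTO garden VALUES (10, '1 High St', 1), (11, '2 Low Rd', 1);
INSERT INTO plant  VALUES (100, 'Rose'), (101, 'Fern');
INSERT INTO garden_plant VALUES (10, 100), (10, 101), (11, 100);
""")


def gardens_with(plant_name):
    """Return the addresses of all gardens that contain the named plant."""
    rows = conn.execute("""
        SELECT g.address
        FROM plant AS p
        JOIN garden_plant AS gp ON gp.plant_id = p.plant_id
        JOIN garden AS g ON g.garden_id = gp.garden_id
        WHERE p.plant_name = ?
        ORDER BY g.address
    """, (plant_name,))
    return [address for (address,) in rows]


print(gardens_with("Rose"))
```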

Entity relationship (E-R) diagram for an order processing system

Example 2:

A bakery accepts catering orders from customers. Most of them are regular customers and each
one is given an id number. A CUSTOMER is issued an invoice when he places an ORDER which
indicates each PRODUCT that the customer has ordered and the date that he can pick up his
order. At the order taking counter the bakery displays a list of products that it produces at the
bakery. Draw an entity relationship diagram to represent this data model in 3NF and label the
relationships.

CUSTOMER (1) ---- has placed ---- (many) ORDER
ORDER (1) ---- contains ---- (many) ORDER DETAIL
PRODUCT (1) ---- appears in ---- (many) ORDER DETAIL

The above relationships are read as follows:

At least one customer has placed many orders
At least one order contains many order details
At least one product appears in many order details

When an entity relationship diagram is implemented as a database the entities become tables of
the database and the attributes become the fields of those tables. When the database structure is
written in standard form the table name is followed by the field names written inside a pair of
brackets and the primary key underlined.

The tables of the above database have the following structure (primary key fields are marked
here with underscores):

Customer (_Customer ID_, Customer Name, Gender, Telephone Number)

Product (_Product ID_, Product Name, Unit Price)

Orders (_Order ID_, Order Date, Customer ID)

Order Details (_Order ID_, _Product ID_, Quantity Ordered, Expiry Date)
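The four tables above could be created and queried as in the following sketch, which uses Python's built-in sqlite3 module. The data types, the lower-case column names and the sample rows are assumptions for illustration. The query builds the lines of an invoice for one order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT,
                       gender TEXT, telephone_number TEXT);
CREATE TABLE product  (product_id INTEGER PRIMARY KEY, product_name TEXT,
                       unit_price REAL);
CREATE TABLE orders   (order_id INTEGER PRIMARY KEY, order_date TEXT,
                       customer_id INTEGER REFERENCES customer(customer_id));
CREATE TABLE order_details (
    order_id         INTEGER REFERENCES orders(order_id),
    product_id       INTEGER REFERENCES product(product_id),
    quantity_ordered INTEGER,
    expiry_date      TEXT,
    PRIMARY KEY (order_id, product_id)
);
-- Made-up sample data
INSERT INTO customer VALUES (1, 'John', 'M', '0112896111');
INSERT INTO product  VALUES (1, 'Bread', 1.50), (2, 'Cake', 12.00);
INSERT INTO orders   VALUES (500, '2024-03-14', 1);
INSERT INTO order_details VALUES (500, 1, 4, '2024-03-17'), (500, 2, 1, '2024-03-16');
""")

# Invoice lines for order 500: product name, quantity and line total.
invoice_lines = conn.execute("""
    SELECT p.product_name, d.quantity_ordered, d.quantity_ordered * p.unit_price
    FROM order_details AS d
    JOIN product AS p ON p.product_id = d.product_id
    WHERE d.order_id = ?
    ORDER BY p.product_name
""", (500,)).fetchall()
invoice_total = sum(line_total for _, _, line_total in invoice_lines)
```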

Advantages of using a relational database


1. Can minimize data redundancy through normalization

2. Data is consistent because the tables are normalized and one field appears only at one place in
the database. Therefore the data has to be updated only at one place in the database.

3. Data is shared throughout the organization because the database can be implemented as a
central database by installing it in a database server.

4. Improved data security through the implementation of a centralized database

5. Can execute complex queries involving multiple tables

6. Can enforce standards at departmental, organizational, national and international level

7. Can economize on size through centralization and using one very large computer with dumb
terminals or a network of computers.

8. Improved data accessibility because the data is shared

9. Increased productivity through the data handling processes of the database management
system

10. Can create user views of data

11. Can achieve increased productivity using file handling techniques of the DBMS instead of
each application having to have its own procedures

12. Changes to the database structure (adding a field or removing a field from the database) do
not cause applications to be re-written

13. Can achieve improved back-up and recovery through the automatic backup and recovery
features of the DBMS. This avoids the need for a human being to remember to backup the
database each day or week.

What is meant by a multi-access database?

This is a database which is installed on a shared drive of a networked computer, called the
database server. More than one user can connect to this database over the network simultaneously,
access its views, records, tables and other objects, and update them concurrently.

Why are access rights implemented on a database?

Access rights are implemented because different users are supposed to access only a part of the
data in the database. For example, information which is sensitive and confidential should only be
accessible to those who need it.

In an information system of a hospital, the receptionists should only have access to the contact
information of the patients, but they should not have access to the information about the diagnosis
of the disease and the medications prescribed to the patients. The nurses should have access only
to the medications prescribed to the patient, but not to the diagnosis of the disease. The doctors
should have access to all of this information, including the diagnosis, which is the most
confidential information. Here the doctors are said to have the highest level of access to the
database.

Access rights are implemented in the database by the database administrator creating different
“views” on data using the features of the DBMS. In the above example the database
administrator would create three different views: one for receptionists, one for nurses and one
for doctors. Each view can be accessed only when those users log on with the authorized user names
and passwords.
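The three views from the hospital example might be defined as in the sketch below, which uses Python's built-in sqlite3 module. The table, column and view names and the sample row are assumptions for illustration, as is the choice to include the patient's name in the nurses' view. SQLite has no user accounts, so the sketch shows only what each view exposes; in a multi-user DBMS the administrator would additionally restrict each group of users to its own view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, name TEXT, telephone TEXT,
                      diagnosis TEXT, medication TEXT);
INSERT INTO patient VALUES (1, 'John', '0112896111', 'Example diagnosis', 'Example drug');

-- Receptionists: contact information only
CREATE VIEW receptionist_view AS SELECT patient_id, name, telephone FROM patient;
-- Nurses: medication but not diagnosis
CREATE VIEW nurse_view AS SELECT patient_id, name, medication FROM patient;
-- Doctors: everything
CREATE VIEW doctor_view AS SELECT * FROM patient;
""")


def columns_of(view_name):
    """Return the column names exposed by a view (view_name is trusted input here)."""
    cursor = conn.execute(f"SELECT * FROM {view_name}")
    return [description[0] for description in cursor.description]


for view in ("receptionist_view", "nurse_view", "doctor_view"):
    print(view, columns_of(view))
```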

The access rights can also be implemented at hardware level. Only those computers in a particular
area of the organization will be given a particular access right. This can be done by setting access
rights using the network interface card id (NIC ID) of the computer.

What is meant by ensuring the integrity of a shared database?

Ensuring the integrity of a shared database means making sure that no data is accidentally lost or
corrupted. Integrity can be lost when several users update the same record at the same time and
at least one of their changes fails to be saved.

When a field is updated, the entire record will be copied into the user’s local memory area at the
user’s local workstation. When the record is saved, the record is rewritten to the file server.
Consider the following example.

Example:

User A accesses a customer record, thereby causing it to be copied into the memory at his/her
workstation and starts typing in a new address for the customer.

User B accesses the same customer record through his/her workstation, alters the credit limit and
then saves the record to the database.

User A then completes the address change and saves the record.

Aim: To update the record to have the new address and the new credit limit

What state will the record actually be in?

Answer: The record will have the new address and the old credit limit.

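Solution: Lock the record as soon as User A accesses it. Unlock the record only after User A
saves the changes to the database. User B can then open the record only after the lock is
released, so User B's copy already contains the new address.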
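The example can be simulated in a few lines of Python. This is only a sketch: a dictionary stands in for the database, threading.Lock stands in for the record lock a DBMS would provide, and the sample values and the helper name update_field are made up.

```python
import threading

# Part 1: no locking. Each user edits a private copy of the whole record.
store = {"customer": {"address": "old address", "credit_limit": 1000}}

copy_a = dict(store["customer"])     # User A opens the record
copy_b = dict(store["customer"])     # User B opens the same record
copy_b["credit_limit"] = 5000
store["customer"] = copy_b           # User B saves
copy_a["address"] = "new address"
store["customer"] = copy_a           # User A saves and overwrites B's change
lost_update_result = dict(store["customer"])

# Part 2: the record stays locked from the moment it is read until it is saved.
store["customer"] = {"address": "old address", "credit_limit": 1000}
record_lock = threading.Lock()


def update_field(field, value):
    """Read, change and save the record while holding the lock."""
    with record_lock:
        working_copy = dict(store["customer"])
        working_copy[field] = value
        store["customer"] = working_copy


threads = [
    threading.Thread(target=update_field, args=("address", "new address")),
    threading.Thread(target=update_field, args=("credit_limit", 5000)),
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
locked_result = dict(store["customer"])

print(lost_update_result)
print(locked_result)
```

In the first part the saved record keeps the old credit limit; in the second part both changes survive, whichever user gets the lock first.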

What is a DBMS?

A DBMS (database management system) is a software application used to control access to the data
stored in a database. It provides an interface between the operating system and the user in order
to make access to the data as simple as possible. A DBMS also has features which help it to
overcome the problems associated with flat files.

What are the features implemented in a DBMS which help to overcome the problems
associated with the flat files?

1. Problem in the flat file: Separation and isolation of data, which leads to the inability to run
   queries to generate important and useful information
   Solution in the DBMS: The ability to create relationships between the tables helps to run
   queries by joining tables.

2. Problem in the flat file: Redundancy of data
   Solution in the DBMS: Normalization of the data in tables removes redundancy.

3. Problem in the flat file: Data inconsistency
   Solution in the DBMS: Normalization of data in tables makes each attribute appear only once in
   the database, so an update to the data takes place in only one place.

4. Problem in the flat file: Dependence of applications on data
   Solution in the DBMS: The ability to create images of tables, such as recordsets

Three-level architecture of a DBMS

A database may be considered from several different levels or “views” known as schemas. The
three levels of schema are:

External or user level

This is the individual’s view of the database. In a multi-user database, there will generally be
several different external schemas, each representing one user’s view according to his/her needs
or access rights.

Conceptual level

This is an integration of all the user views of the entire database, including entities, attributes and
relationships, as designed by the database designer.

Internal or storage level

This describes the physical structure used for the storage of data and the data access methods. It
is generally transparent to the user.

Functions of a DBMS

• DBMS provides an interface between the operating system and the user in order to make
access to the data as simple as possible.

• DBMS allows data to be stored, retrieved and updated as easily as possible, without the user
having to be aware of the internal structure of the database.

• DBMS creates and maintains a data dictionary.

• DBMS must provide facilities to manage simultaneous access and update of a record in the
database.

• DBMS provides the capability to recover the database in the event of a system failure.

• DBMS handles password allocation and checking and then provides the user with the
appropriate “view”.

DDL and DML

DBMSs provide the following two languages:

• Data definition language (DDL) and
• Data manipulation language (DML)

Both DDL and DML are coded in SQL (Structured Query Language). SQL is a declarative
programming language, which you will learn about under the “Programming Languages”
section.

For what purpose is DDL used?

DDL is used to define the structure (tables), data types and constraints of the database. For
example, it uses the following statements:

• CREATE TABLE statement to create a table specifying the field names, field lengths and
data types.

• CREATE INDEX statement to create an index for a table

• ALTER TABLE … ADD statement to add another field to an existing table

• CREATE TABLE …
  PRIMARY KEY …
  FOREIGN KEY … statement to define primary keys and link tables by using
  foreign key(s)

• GRANT <privileges> ON <table> TO <users> statement to define a subschema that allows
  other users to have insert/update/delete access.
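The DDL statements listed above can be tried out with Python's built-in sqlite3 module, as in the following sketch. The table names, field names and data types are assumptions for illustration. GRANT is left out because SQLite, the engine used here, has no user accounts and does not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- CREATE TABLE with field names, data types and a primary key
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name VARCHAR(50) NOT NULL
);

-- CREATE TABLE with a foreign key that links orders to customer
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    order_date  DATE,
    customer_id INTEGER,
    FOREIGN KEY (customer_id) REFERENCES customer(customer_id)
);

-- CREATE INDEX to create an index for a table
CREATE INDEX idx_orders_customer ON orders (customer_id);

-- ALTER TABLE ... ADD to add another field to an existing table
ALTER TABLE customer ADD COLUMN telephone_number VARCHAR(15);
""")

# Inspect the resulting structure through SQLite's PRAGMA statements.
customer_columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
index_names = [row[1] for row in conn.execute("PRAGMA index_list(orders)")]
print(customer_columns)
print(index_names)
```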

For what purpose is DML used?

DML is used to modify, insert, update, delete and retrieve data in a database. It uses mainly the
following statements:

• SELECT statement to extract a collection of fields from a given table

• INSERT INTO <table> (<field list>)
  VALUES (<data list>) statement to add data to create a new record in an existing table

• UPDATE <table>
  SET <field name> = <formula>
  WHERE <field name> = <value> statement to calculate new values for a particular field in
  the records that match the condition

• DELETE FROM <table>
  WHERE <field> = <value> statement to delete a record from a given table

• ORDER BY clause to sort records in ascending or descending order

• GROUP BY clause to categorize data according to a specified field

• INNER JOIN and OUTER JOIN clauses to join tables using common fields and extract a
  collection of fields from those joined tables
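A short sketch of these DML statements, run through Python's built-in sqlite3 module, is given below. The product table, its fields and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT,
                          category TEXT, unit_price REAL)
""")

# INSERT INTO <table> (<field list>) VALUES (<data list>)
conn.executemany(
    "INSERT INTO product (product_id, product_name, category, unit_price) "
    "VALUES (?, ?, ?, ?)",
    [(1, "Bread", "Bakery", 2.0), (2, "Cake", "Bakery", 10.0), (3, "Tea", "Drinks", 4.0)],
)

# UPDATE <table> SET <field name> = <formula> WHERE <field name> = <value>
conn.execute("UPDATE product SET unit_price = unit_price * 2 WHERE product_id = ?", (1,))

# DELETE FROM <table> WHERE <field> = <value>
conn.execute("DELETE FROM product WHERE product_name = ?", ("Tea",))

# SELECT with ORDER BY: product names sorted by price, highest first
names_by_price = [name for (name,) in conn.execute(
    "SELECT product_name FROM product ORDER BY unit_price DESC")]

# SELECT with GROUP BY: number of products in each category
count_per_category = conn.execute(
    "SELECT category, COUNT(*) FROM product GROUP BY category ORDER BY category"
).fetchall()

print(names_by_price)
print(count_per_category)
```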

What is a client server database?

A client server database is a database system where the DBMS server software runs on the
network server and the DBMS client software runs on individual workstations which act as client
computers.

What is an object oriented database?

This is a database which contains DBMS facilities with object oriented programming capability.
Data is stored as objects and can only be interpreted using the methods specified by its class.

Why do we need object oriented databases?

• Conventional DBMSs were designed for homogeneous data that can be easily structured into
predefined data fields and records.
• Many applications today, however, require databases that can store and retrieve not only
structured numbers and characters but also drawings, images, photographs, voice and
full-motion video. For example, a patient database might need to store not only information on
name, address, test results and diagnosis but also X-ray images.
• Conventional DBMSs are not well-suited to handling graphics-based or multimedia data. An
object oriented database stores the data and methods as objects that can be automatically
retrieved and shared.

How are partitioning and duplication used to distribute data on a network?

Under partitioning, part of the data in the database is held locally and processed in that particular
machine.

Under duplication, a copy of the entire database is sent to be used by other machines.

What are the implications of duplicating data?

1. Because copies of the entire database are sent to other machines, the database is always
backed up

2. It increases the speed of response to user requests

3. Data is less secure because of multiple copies

4. Heavy responsibility on network manager to ensure data consistency
