6 – Databases
What is an entity?
An entity is an object in a system which has some of the system's data associated with it. In a
relational database each entity can be implemented as a table.
Examples:
A relational database is a set of tables which are linked through “relationships”. Therefore a
relational database can be considered as three dimensional.
[Figure: Table 1, Table 2 and Table 3 linked through relationships]
What is a table?
What is a record?
A record is a single instance of an entity and is a row of the table representing that entity. Each
record consists of a set of related fields.
What is a field?
A field is an attribute of an entity and is a column of the table representing that entity. In a
record, a field contains a single data item.
Example:
The table given below represents the entity “Student”. Name, telephone number and date of birth
are attributes of the “Student” entity. Therefore “Name”, “Telephone Number” and “Date of
Birth” are fields of the “Student Details” table.
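The table, record and field terms above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the student_id field, the lower-case column names and the sample rows are assumptions that do not come from the notes.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")

# The entity "Student" becomes a table; each attribute becomes a field (column).
# student_id is an assumed extra field used to identify each record.
conn.execute(
    """
    CREATE TABLE student_details (
        student_id       INTEGER PRIMARY KEY,
        name             TEXT,
        telephone_number TEXT,
        date_of_birth    TEXT
    )
    """
)

# Each inserted row is one record: a single instance of the entity.
conn.executemany(
    "INSERT INTO student_details (name, telephone_number, date_of_birth) "
    "VALUES (?, ?, ?)",
    [
        ("Student A", "555-0101", "2001-04-12"),  # made-up sample data
        ("Student B", "555-0102", "2000-11-30"),
    ],
)

# The fields of the table are its column names.
fields = [row[1] for row in conn.execute("PRAGMA table_info(student_details)")]

# One record is one row of the table.
record = conn.execute(
    "SELECT name, telephone_number, date_of_birth "
    "FROM student_details WHERE name = ?",
    ("Student A",),
).fetchone()

print(fields)
print(record)
```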
If the data of a particular system resides in a single file, such a file is called a flat file. A flat file
contains a set of records, where each record consists of a set of fields. A flat file can be
considered as two dimensional.
Suppose that an organization stores its data in the following flat files:
(a) File 1: Contains data about customers such as the customer’s name, gender and address
(maintained by the customer service department)
(b) File 2: Contains data about products and their suppliers (maintained by the purchasing
department)
(c) File 3: Contains data about sales such as the products and quantities sold and names of the
customers who purchased them (maintained by the sales department)
1. Separation and isolation of data – which makes it impossible to run queries that generate
important and useful information
Suppose that the organization needs to find the male customers who have purchased a product
supplied by a particular supplier. This cannot be done with a single query because the sales
department file (File 3) does not contain the customer’s gender. To find this information, the
names of the customers who purchased that particular product first have to be extracted from the
sales department file (File 3). Then, using that set of names, the names of those customers whose
gender is male have to be extracted from the customer service department’s file (File 1). This
is because it is impossible to run a query by joining the two flat files as there is no relationship
between them.
2. Redundancy of data
Duplication of data wastes the time and money spent on data entry. For example, a customer’s
name must be entered in the customer service department flat file (File 1) as well as in the
sales department flat file (File 3).
3. Data inconsistency
This means that the same data which appears in two different places has two different values.
This happens as a consequence of data redundancy and it leads to the loss of data integrity. For
example, the customer service department might correct the wrong spelling of a customer’s name
in their flat file (File 1) but the sales department may not make that correction in their file (File 3).
4. Program-data dependence
The applications depend on the record format of the flat file. If a new field is added to the file
or an existing field is removed from the file, code in all the application programs accessing the
file must be changed.
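For contrast with the flat-file problems above, here is a minimal sketch of the same three sets of data held as related tables, where the "male customers who bought a product from a particular supplier" question becomes one query. The table names, column names, ID fields and sample rows are all assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY, name TEXT, gender TEXT, address TEXT
    );
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY, product_name TEXT, supplier TEXT
    );
    -- Sales rows link to customers and products by ID instead of repeating names.
    CREATE TABLE sales (
        sale_id     INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        product_id  INTEGER REFERENCES products(product_id),
        quantity    INTEGER
    );
    INSERT INTO customers VALUES
        (1, 'Amal', 'M', '1 First St'),
        (2, 'Nadia', 'F', '2 Second St'),
        (3, 'Ravi', 'M', '3 Third St');
    INSERT INTO products VALUES
        (10, 'Kettle', 'Supplier X'),
        (11, 'Toaster', 'Supplier Y');
    INSERT INTO sales VALUES (100, 1, 10, 2), (101, 2, 10, 1), (102, 3, 11, 4);
    """
)

# One query joins all three tables through their common fields.
male_customers = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT c.name
        FROM sales AS s
        INNER JOIN customers AS c ON c.customer_id = s.customer_id
        INNER JOIN products  AS p ON p.product_id  = s.product_id
        WHERE c.gender = 'M' AND p.supplier = ?
        ORDER BY c.name
        """,
        ("Supplier X",),
    )
]
print(male_customers)
```

Because each customer's name is stored only in the customers table, correcting its spelling there is enough; the sales rows refer to the customer by ID.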
Normalization is the process of removing data redundancy from a database. When the database
design is fully normalized, there is no repetition of data across tables with the exception of the
fields used to link the tables together.
1. Every table must have a field or a combination of fields defined as a primary key which is used
to identify a record uniquely.
3. Each field must contain data that is non-decomposable (that cannot be broken down into
smaller pieces).
For example the first name and the last name should not appear in the “Customer Name” field
but it should be broken down into two separate fields such as “First Name” and “Last Name”
fields.
4. A table should not contain repeating groups of data. For example, it should not have fields
such as Product 1, Product 2, Product 3, etc.
If a primary key is a combination of two fields or more than two fields, such a primary key is
called a composite primary key.
For example the “Order Details” table needs the combination of both “Order ID” and “Product
ID” fields defined as its primary key.
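A composite primary key such as the one described above might be declared as follows. This is a sketch using sqlite3; the lower-case column names and the sample quantities are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Neither order_id nor product_id is unique on its own;
# only the combination identifies a record.
conn.execute(
    """
    CREATE TABLE order_details (
        order_id         INTEGER NOT NULL,
        product_id       INTEGER NOT NULL,
        quantity_ordered INTEGER,
        PRIMARY KEY (order_id, product_id)
    )
    """
)
conn.executemany(
    "INSERT INTO order_details VALUES (?, ?, ?)",
    [(1, 10, 5), (1, 11, 2), (2, 10, 7)],  # repeated order_id / product_id is fine
)

# Repeating the same (order_id, product_id) pair violates the primary key.
try:
    conn.execute("INSERT INTO order_details VALUES (1, 10, 9)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM order_details").fetchone()[0]
print(duplicate_rejected, row_count)
```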
If the table has a composite primary key, every non-key field in the table must relate to the
composite primary key, but not to a part of the composite primary key.
For example, the “Order Details” table can contain the non-key fields, “Quantity”, “Expiry Date”
and “Date Manufactured” as those fields relate to the combination of “Order ID” and “Product
ID” fields, but not to any one of those two fields. But this table should not contain “Unit Price” as
a field as this field relates only to the “Product ID” field.
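The rule above can be illustrated by keeping "Unit Price" in a separate products table, keyed by "Product ID" alone, and joining when a price is needed. The products table, the column names and the prices are assumptions for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- unit_price depends only on product_id, so it lives with the product.
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,
        unit_price REAL
    );
    -- quantity depends on the whole composite key (order_id, product_id).
    CREATE TABLE order_details (
        order_id   INTEGER,
        product_id INTEGER REFERENCES products(product_id),
        quantity   INTEGER,
        PRIMARY KEY (order_id, product_id)
    );
    INSERT INTO products VALUES (10, 2.50), (11, 4.00);
    INSERT INTO order_details VALUES (1, 10, 4), (1, 11, 1), (2, 10, 2);
    """
)

# The price is looked up through the join rather than stored on every order line.
line_totals = conn.execute(
    """
    SELECT od.order_id, od.product_id, od.quantity * p.unit_price
    FROM order_details AS od
    INNER JOIN products AS p ON p.product_id = od.product_id
    ORDER BY od.order_id, od.product_id
    """
).fetchall()
print(line_totals)
```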
Any non-key field in a table must relate to the primary key of the table and should not depend on
any other non-key field.
For example, the Orders table should not contain the fields Order ID (primary key), Order Date,
Customer ID and Customer Name together. This is because the non-key field, Customer Name, depends
on another non-key field, Customer ID. Therefore the inclusion of the Customer Name field in
the Orders table violates the 3rd rule of normalization.
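One way to satisfy this rule is to move Customer Name into a separate customers table, as in the sketch below. The customers table, the column names and the sample data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Customer Name depends on Customer ID, so both live in customers.
    CREATE TABLE customers (
        customer_id   INTEGER PRIMARY KEY,
        customer_name TEXT
    );
    -- Orders keeps only the Customer ID as the link to the customer.
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        order_date  TEXT,
        customer_id INTEGER REFERENCES customers(customer_id)
    );
    INSERT INTO customers VALUES (1, 'Jon Smith');
    INSERT INTO orders VALUES (500, '2024-01-05', 1), (501, '2024-02-10', 1);
    """
)

# The spelling is corrected in exactly one place...
conn.execute("UPDATE customers SET customer_name = 'John Smith' WHERE customer_id = 1")

# ...and every order sees the corrected name through the join.
names_on_orders = [
    row[0]
    for row in conn.execute(
        """
        SELECT c.customer_name
        FROM orders AS o
        INNER JOIN customers AS c ON c.customer_id = o.customer_id
        ORDER BY o.order_id
        """
    )
]
print(names_on_orders)
```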
• It saves disk space by removing redundant data from the database and thereby reducing the
size of database.
• Maintaining the database is easy and less time consuming as it is enough to change a field
only in one table of the database.
• It increases the efficiency of updating the database, by the implementation of relationships
between multiple tables.
Since complete normalization involves separating data into single-entity tables, it leads to joining
multiple tables together when it is necessary to obtain information from more than one table.
These joins take a long time to be created and retrieving information from the queries can be very
slow.
A foreign key is an attribute in one table which is a primary key in another table. It is used to
create a link or a relationship between the two tables.
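A foreign key link of this kind can be sketched with sqlite3. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` is issued on the connection; other DBMSs behave differently. The table names, column names and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable enforcement
conn.executescript(
    """
    CREATE TABLE customers (
        customer_id   INTEGER PRIMARY KEY,
        customer_name TEXT
    );
    -- orders.customer_id is a foreign key: it is the primary key of customers.
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        order_date  TEXT,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id)
    );
    INSERT INTO customers VALUES (1, 'Customer A');
    INSERT INTO orders VALUES (500, '2024-01-05', 1);
    """
)

# An order that points at a customer who does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (501, '2024-01-06', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The foreign key is the common field used to link the two tables.
order_with_customer = conn.execute(
    """
    SELECT o.order_id, c.customer_name
    FROM orders AS o
    INNER JOIN customers AS c ON c.customer_id = o.customer_id
    """
).fetchall()
print(orphan_rejected, order_with_customer)
```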
Example 1:
Draw an entity relationship diagram to represent this data model in 3NF and label the
relationships. (2004 Nov 2)
In this example the three entities, OWNER, GARDEN and PLANT are connected by the
following relationships:
In the above diagram a many-to-many relationship exists between the Garden entity and the Plant
entity. In E-R diagrams a many-to-many relationship must be broken down into two separate
one-to-many relationships by introducing a linking entity. In this case the linking entity is named
GARDEN PLANT. The E-R diagram corrected to the 3rd normal form is shown below:
[E-R diagram: OWNER owns GARDEN; GARDEN contains GARDEN PLANT; PLANT is planted as GARDEN PLANT]
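The linking entity in Example 1 can be sketched as a table whose composite primary key combines the keys of the two entities it links. The column names and the sample gardens and plants are assumptions, and the OWNER entity is left out to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE garden (garden_id INTEGER PRIMARY KEY, garden_name TEXT);
    CREATE TABLE plant  (plant_id  INTEGER PRIMARY KEY, plant_name  TEXT);
    -- Linking entity: one row per (garden, plant) pairing, which replaces the
    -- many-to-many relationship with two one-to-many relationships.
    CREATE TABLE garden_plant (
        garden_id INTEGER REFERENCES garden(garden_id),
        plant_id  INTEGER REFERENCES plant(plant_id),
        PRIMARY KEY (garden_id, plant_id)
    );
    INSERT INTO garden VALUES (1, 'Garden A'), (2, 'Garden B');
    INSERT INTO plant  VALUES (1, 'Rose'), (2, 'Tulip');
    INSERT INTO garden_plant VALUES (1, 1), (1, 2), (2, 1);
    """
)

# A garden can contain many plants...
plants_in_garden_a = [
    row[0]
    for row in conn.execute(
        """
        SELECT p.plant_name
        FROM garden_plant AS gp
        INNER JOIN plant AS p ON p.plant_id = gp.plant_id
        WHERE gp.garden_id = 1
        ORDER BY p.plant_name
        """
    )
]

# ...and a plant can be planted in many gardens.
gardens_with_rose = [
    row[0]
    for row in conn.execute(
        """
        SELECT g.garden_name
        FROM garden_plant AS gp
        INNER JOIN garden AS g ON g.garden_id = gp.garden_id
        WHERE gp.plant_id = 1
        ORDER BY g.garden_name
        """
    )
]
print(plants_in_garden_a, gardens_with_rose)
```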
Example 2:
A bakery accepts catering orders from customers. Most of them are regular customers and each
one is given an id number. A CUSTOMER is issued an invoice when he places an ORDER which
indicates each PRODUCT that the customer has ordered and the date that he can pick up his
order. At the order taking counter the bakery displays a list of products that it produces at the
bakery. Draw an entity relationship diagram to represent this data model in 3NF and label the
relationships.
[E-R diagram: CUSTOMER, ORDER, ORDER DETAIL and PRODUCT entities; ORDER contains ORDER DETAIL]
When an entity relationship diagram is implemented as a database the entities become tables of
the database and the attributes become the fields of those tables. When the database structure is
written in standard form the table name is followed by the field names written inside a pair of
brackets and the primary key underlined.
Order Details (Order ID, Product ID, Quantity Ordered, Expiry Date)
2. Data is consistent because the tables are normalized and one field appears only at one place in
the database. Therefore the data has to be updated only at one place in the database.
3. Data is shared throughout the organization because the database can be implemented as a
central database by installing it in a database server.
7. Can economize on size through centralization and using one very large computer with dumb
terminals or a network of computers.
9. Increased productivity through the data handling processes of the database management
system
11. Can achieve increased productivity using file handling techniques of the DBMS instead of
each application having to have its own procedures
12. Changes to the database structure (adding a field or removing a field from the database) do
not cause applications to be re-written
13. Can achieve improved back-up and recovery through the automatic backup and recovery
features of the DBMS. This avoids the need for a human being to remember to backup the
database each day or week.
This is a database which is installed on a shared drive of a network computer, called the
database server. More than one user can connect to this database over the network simultaneously
and access its views, records, tables and other objects and update them concurrently.
Access rights are implemented because different users are supposed to access only a part of the
data in the database. For example, information which is sensitive and confidential should only be
accessible to those who need it.
In an information system of a hospital, the receptionists should only have access to the contact
information of the patients, but they should not have access to the information about the diagnosis
of the disease and the medications prescribed to the patients. The nurses should have access only
to the medications prescribed to the patient, but not to the diagnosis of the disease. The doctors
should have access to all of this information, including the diagnosis, which is the most confidential
information. Here the doctors are said to have the highest level of access to the database.
Access rights are implemented in the database by the database administrator creating different
“views” on data using the features of the DBMS. In the above examples the database
administrator would create three different views: one for receptionists, one for nurses and one
for doctors. Each view can be accessed only when those users log on with their authorized user names
and passwords.
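The three views in the hospital example might look like the sketch below. SQLite is used only because it ships with Python; it has no user accounts, so tying each view to a user name and password would rely on the access-control features of a multi-user DBMS, which are not shown. The table, column and view names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE patients (
        patient_id INTEGER PRIMARY KEY,
        name       TEXT,
        phone      TEXT,
        medication TEXT,
        diagnosis  TEXT
    );
    INSERT INTO patients VALUES (1, 'Patient A', '555-0199', 'Drug X', 'Condition Y');

    -- Receptionists: contact information only.
    CREATE VIEW receptionist_view AS
        SELECT patient_id, name, phone FROM patients;
    -- Nurses: medications, but not the diagnosis.
    CREATE VIEW nurse_view AS
        SELECT patient_id, name, medication FROM patients;
    -- Doctors: everything, including the diagnosis.
    CREATE VIEW doctor_view AS
        SELECT patient_id, name, phone, medication, diagnosis FROM patients;
    """
)


def columns_of(view_name):
    """Return the column names visible through a view."""
    cursor = conn.execute(f"SELECT * FROM {view_name}")
    return [column[0] for column in cursor.description]


receptionist_columns = columns_of("receptionist_view")
nurse_columns = columns_of("nurse_view")
doctor_columns = columns_of("doctor_view")
print(receptionist_columns, nurse_columns, doctor_columns)
```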
The access rights can also be implemented at hardware level. Only those computers in a particular
area of the organization will be given a particular access right. This can be done by setting access
rights using the network interface card id (NIC ID) of the computer.
Ensuring the integrity of a shared database means making sure that no data is accidentally lost or
corrupted. The integrity of a shared database can be lost when several users update the same
record at the same time and at least one of the changes fails to be saved.
When a field is updated, the entire record will be copied into the user’s local memory area at the
user’s local workstation. When the record is saved, the record is rewritten to the file server.
Consider the following example.
Example:
User A accesses a customer record, thereby causing it to be copied into the memory at his/her
workstation and starts typing in a new address for the customer.
User B accesses the same customer record through his computer and alters the credit limit and
then saves the record to the database
User A then completes the address change and saves the record.
Aim: To update the record to have the new address and the new credit limit
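Answer: The record will have the new address and the old credit limit. This is because User A’s copy
of the record, which still holds the old credit limit, overwrites the change saved by User B.
Solution: Lock the record as soon as User A accesses it. Unlock the record only after User A
saves the changes to the database.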
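The example and its solution can be simulated in plain Python. This is a toy model, not how a real DBMS implements locking: RecordStore, LockedError and the sample values are hypothetical names invented for the sketch.

```python
import copy


class LockedError(Exception):
    """Raised when a record is locked by another user."""


class RecordStore:
    """Toy stand-in for the file server: whole records are copied out and back."""

    def __init__(self, records):
        self._records = records
        self._locks = {}  # record id -> user currently holding the lock

    def _check(self, record_id, user):
        holder = self._locks.get(record_id)
        if holder is not None and holder != user:
            raise LockedError(f"record {record_id} is locked by {holder}")

    def open(self, record_id, user, lock=False):
        self._check(record_id, user)
        if lock:
            self._locks[record_id] = user
        return copy.deepcopy(self._records[record_id])  # local working copy

    def save(self, record_id, user, record):
        self._check(record_id, user)
        self._records[record_id] = copy.deepcopy(record)  # entire record rewritten
        self._locks.pop(record_id, None)  # unlock after saving


def new_store():
    return RecordStore({1: {"address": "Old Rd", "credit_limit": 1000}})


# Without locking: User B's credit limit change is lost.
store = new_store()
a = store.open(1, "A")
a["address"] = "New Rd"
b = store.open(1, "B")
b["credit_limit"] = 2000
store.save(1, "B", b)
store.save(1, "A", a)  # overwrites B's change with A's stale copy
unlocked_result = store.open(1, "A")

# With locking: User B has to wait until User A has saved.
store = new_store()
a = store.open(1, "A", lock=True)
a["address"] = "New Rd"
try:
    store.open(1, "B", lock=True)
    b_was_blocked = False
except LockedError:
    b_was_blocked = True
store.save(1, "A", a)  # saving releases the lock
b = store.open(1, "B", lock=True)  # B now sees the new address
b["credit_limit"] = 2000
store.save(1, "B", b)
locked_result = store.open(1, "B")

print(unlocked_result)
print(b_was_blocked, locked_result)
```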
What is DBMS?
A DBMS (database management system) is a software application used to control access to the data
stored in a database. It provides an interface between the operating system and the user in order to
make access to the data as simple as possible. A DBMS also has features which help it to overcome
the problems associated with flat files:
What are the features implemented in a DBMS which help to overcome the problems
associated with the flat files?
A database may be considered from several different levels or “views” known as schemas. The
three levels of schema are:
External level
This is the individual’s view of the database. In a multi-user database, there will generally be
several different external schemas, each representing a user’s view according to his/her needs or
access rights.
Conceptual level
This is an integration of all the user views of the entire database, including entities, attributes and
relationships, as designed by the database designer.
Internal level
This describes the logical structure used for the storage of data and the data access methods. It is
generally transparent to the user.
Functions of a DBMS
• DBMS provides an interface between the operating system and the user in order to make
access to the data as simple as possible.
• DBMS makes storage, retrieval and update of data as easy as possible, without the user having
to be aware of the internal structure of the database.
• DBMS must provide facilities to manage simultaneous access and update of a record in the
database.
• DBMS provides the capability to recover the database in the event of a system failure.
• DBMS handles password allocation and checking and then provides the user with the
appropriate “view”.
Both DDL and DML are coded in SQL (Structured Query Language). SQL is a declarative
programming language, which is covered under the “Programming Languages”
section.
DDL is used to define the structure (tables), data types and constraints of the database. For
example, it uses the following statements:
• CREATE TABLE statement to create a table specifying the field names, field lengths and
data types.
insert/update/delete access.
DML is used to modify, insert, update, delete and retrieve data in a database. It uses mainly the
following statements:
• UPDATE <table>
SET <field name> = <formula>
WHERE <field name> = <value> statement to set new values, calculated by the formula, for a
particular field in the records that match the condition
• INNER JOIN and OUTER JOIN clauses to join tables using common fields and extract a
collection of fields from those joined tables
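The UPDATE and JOIN forms listed above can be tried out with sqlite3. This is a sketch only: the tables, the sample data and the "add 10 to the total" formula are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        total       REAL
    );
    INSERT INTO customers VALUES (1, 'Amal'), (2, 'Nadia');
    INSERT INTO orders VALUES (100, 1, 200.0);
    """
)

# UPDATE <table> SET <field name> = <formula> WHERE <field name> = <value>
conn.execute("UPDATE orders SET total = total + 10 WHERE order_id = ?", (100,))
new_total = conn.execute(
    "SELECT total FROM orders WHERE order_id = 100"
).fetchone()[0]

# INNER JOIN keeps only customers that have a matching order.
inner_rows = conn.execute(
    """
    SELECT c.name, o.order_id
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id
    """
).fetchall()

# LEFT OUTER JOIN keeps every customer; missing order fields come back as NULL.
outer_rows = conn.execute(
    """
    SELECT c.name, o.order_id
    FROM customers AS c
    LEFT OUTER JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id
    """
).fetchall()

print(new_total)
print(inner_rows)
print(outer_rows)
```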
A client server database is a database system where the DBMS server software runs on the
network server and the DBMS client software runs on individual workstations which act as client
computers.
This is a database which contains DBMS facilities with object oriented programming capability.
Data is stored as objects and can only be interpreted using the methods specified by its class.
• Conventional DBMSs were designed for homogeneous data that can be easily structured into
predefined data fields and records.
• Many applications today, however, require databases that can store and retrieve not only
structured numbers and characters but also drawings, images, photographs, voice and
full-motion video. For example, a patient database might need to store not only information on
name, address, test results and diagnosis but also X-ray images.
• Conventional DBMSs are not well-suited to handling graphics-based or multimedia data. An
object oriented database stores the data and methods as objects that can be automatically
retrieved and shared.
Under partitioning, part of the data in the database is held locally and processed in that particular
machine.
Under duplication, a copy of the entire database is sent to be used by other machines.
1. Because many copies of the entire database are sent to other machines, the database is always
backed up