
DATA BASE ENVIRONMENT SYSTEM:

1.0 Introduction
Data: Data are known facts & figures that can be recorded & that have implicit meaning.

Data Base: A data base is a collection of interrelated data, stored more or less permanently in a computer's storage. A data base has the following implicit properties:

A data base represents some aspect of the real world.
A data base is logically coherent.
A data base is designed, built & populated with data for a specific purpose.

Data Base Management System: DBMS


A DBMS is a collection of programs that enables users to create & maintain a data base. It is a general purpose software system that facilitates the processes of (i) defining, (ii) constructing & (iii) manipulating data bases for various applications.
Defining: Specifying the data types, structures & constraints for the data to be stored in the data base.
Constructing: Storing the data itself on some storage medium that is controlled by the DBMS.
Manipulating: Querying the data base to retrieve specific data, & updating the data base to reflect changes.
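The three processes above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the student table, its columns & the sample rows are assumptions made for the example, not part of these notes.

```python
import sqlite3

# In-memory data base; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Defining: specify data types, structure and constraints.
conn.execute(
    "CREATE TABLE student ("
    " roll_no INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " age INTEGER CHECK (age > 0))"
)

# Constructing: store the data itself under the control of the DBMS.
conn.executemany(
    "INSERT INTO student (roll_no, name, age) VALUES (?, ?, ?)",
    [(1, "Asha", 19), (2, "Ravi", 21)],
)

# Manipulating: update the data base, then query it.
conn.execute("UPDATE student SET age = 22 WHERE roll_no = 2")
rows = conn.execute(
    "SELECT roll_no, name, age FROM student ORDER BY roll_no"
).fetchall()
print(rows)
```

The same three steps apply to any relational DBMS; only the connection call would differ.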

1.2 Data Base System Environment:


Users: The laymen or end users whose job is to access the data base for querying, updating & generating reports.

Application Programmers: They implement the specifications (given by the system analysts) as programs. They test, debug, document & maintain these transactions. They are also known as Software Engineers.
Application Programs: Programs written by the Software Engineers to define, construct or manipulate the data base.
Queries: Queries are used to retrieve specific data or to update the data base. Queries are generally issued by end users.
Software to Process Queries / Programs: Application programs & queries are processed by the DBMS software. This software handles programs as well as queries & translates them into a format that is understood by the next level.
Software to Access Stored Data: It receives the translated request from the query-processing software & interprets it. It then issues the low-level input/output requests needed to access the storage devices. These requests may store, list, manipulate or delete the data.

Stored Database: The data base is stored on storage devices.


File Oriented Approach:
The file processing system, or File Management System, was used to store data on computers before the advent of the DBMS. Application data were defined & maintained in a master file together with supporting transaction files, so one master file was used with one or more transaction files. Each user defined & implemented the files needed for a specific application as part of programming that application. The drawbacks of the file oriented approach are the following:
(1) Data Redundancy: The same data may be present in more than one file.
(2) Wasted Memory Space: Redundant or duplicate copies of the same data waste storage space.
(3) Loss of Data Integrity: Data redundancy leads to inconsistency problems, where the same data item has different values in different files.
(4) Difficulty in Accessing Data: To access data in the files, programs must be written in a programming language.
(5) Information is Available Only in Reports: Information is available only through predefined reports; ad hoc (in-between) queries are not supported.
(6) Data Isolation: Because data are accessed only through reports, the data remain isolated in their separate files.
(7) Inadequate Security: Only operating system level security is provided. In-depth security is not provided.
(8) High Maintenance Cost: Software maintenance is very expensive.
(9) Separate Files: Each data file of an application is a separate entity.

Data Base Approach:


In the data base approach, a single repository of data is maintained. The data is defined once & is accessed by various users.

The characteristics of the Database:


(i) Self Describing Nature of the Data Base System: The data base system contains a definition or description of the data base structure & constraints. This definition is stored in the system catalog. The information stored in the catalog is called meta data; it describes the structure of the primary data base.
(ii) Insulation between Programs & Data, & Data Abstraction: In a DBMS it is not necessary to change all programs whenever a change is made to the structure of the data files. This property is called program-data independence. In the data base approach, the detailed structure & organization of each file are stored in the catalog rather than in the programs.
(iii) Support of Multiple Views of the Data: A data base allows different users to have different views of the data, & the DBMS provides facilities for defining these views.
(iv) Sharing of Data & Multi-user Transaction Processing: The data base allows multiple users to access it at the same time. The DBMS includes concurrency control, which ensures that when several users try to update the same data, the result of the updates is correct. Multi-user applications of this kind are called Online Transaction Processing (OLTP).
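The self-describing nature in (i) & the multiple views in (iii) can be seen in a small sketch. It assumes SQLite, whose catalog is exposed as the sqlite_master table; the employee table & the employee_names view are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary REAL)"
)
# A view that hides the salary column from one group of users.
conn.execute("CREATE VIEW employee_names AS SELECT emp_id, name FROM employee")

# Meta data: the catalog lists every object the data base contains.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()

# Meta data about one table: column names read back from the DBMS itself.
columns = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]

print(catalog)
print(columns)
```

A program can therefore discover the structure of the data at run time instead of having it hard-coded, which is what makes program-data independence possible.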

Actors on the Scene:


Actors on the scene are the persons who define, construct & manipulate the data base. They are involved in the design, use & maintenance of a large data base, & their jobs involve its day-to-day use.
(i) Data base Administrators (DBA): The DBA is responsible for administering the resources. The primary resource is the data base & the secondary resource is the DBMS & related software. The DBA is responsible for authorizing access to the data base, for monitoring its use, & for acquiring software & hardware resources as needed. The DBA is held responsible for problems such as breaches of security or poor system response time.
(ii) Data base Designers: Data base designers are responsible for identifying the data to be stored in the data base & for choosing appropriate structures to represent & store this data. They have to communicate with all prospective data base users to understand their requirements, which guides the design.
(iii) End Users: They are the people who access the data base for querying, updating & generating reports. The data base exists primarily to serve the end users. The categories of end users are:
(a) Casual End Users: They access the data base occasionally & may need different information each time. Such end users are typically middle or high level managers. They use a sophisticated data base query language, but learn only the few facilities that they use repeatedly.
(b) Naive or Parametric End Users: Their job is to constantly query & update the data base. They make up a sizable portion of data base end users. They always use standard types of queries & updates called canned transactions, which have been carefully programmed & tested. Examples of naive users are bank tellers & reservation clerks for airlines, hotels, railways etc. They learn very few of the facilities provided by the DBMS & understand only the standard types of transactions.
(c) Sophisticated End Users: Such users are familiar with the facilities of the DBMS. They include engineers, scientists, business analysts & others, & they learn most of the DBMS facilities.
(d) Stand alone Users: Such users maintain personal data bases by using ready-made programs that provide easy to use menu or graphics based interfaces. They are proficient in using a specific software package.
(iv) System Analysts: They determine the requirements of end users & develop specifications for transactions that meet these requirements.
(v) Application Programmers: Application programmers implement the specifications, defined by the system analysts, as programs. They test, debug, document & maintain these transactions.

Workers Behind the Scene:


These workers are typically not interested in the data base itself. The various workers behind the scene are as follows:
i. DBMS System Designers & Implementers: They design & implement the DBMS modules & interfaces as a software package. A DBMS is complex software made up of many modules. The DBMS must interface with other system software, such as the operating system & compilers for various programming languages.
ii. Tool Developers: They design & implement tools, i.e. the software packages that facilitate data base system design & use. Tools are packages that are generally purchased separately. They include packages for data base design, performance monitoring, natural language or graphical interfaces, prototyping, simulation & test data generation.
iii. Operators & Maintenance Personnel: They are responsible for the actual running & maintenance of the hardware & software.

Benefit of Using Data Base Approach:


The benefits of using the data base approach are as follows:
i. Controlling Redundancy: The DBMS approach reduces redundancy, i.e. duplicate copies of data. Storing the same data multiple times leads to several problems:

Storage space is wasted when the same data is stored repeatedly.
Files that represent the same data may become inconsistent.

ii. Restricting Unauthorized Access: When multiple users share a data base, not all users will be authorized to access all information in it. Some users may be permitted only to retrieve data, while others may be permitted both to retrieve & to update.
iii. Providing Persistent Storage for Program Objects & Data Structures: Data bases are used to provide persistent storage for program objects & data structures.
iv. Permitting Inferences & Actions using Rules: Some data base systems allow deduction rules to be defined for inferring new information from the stored data base facts.
v. Providing Multiple User Interfaces: The data base is used by various types of users, so the DBMS provides a variety of interfaces. These include query languages for casual users; programming language interfaces for application programmers; forms & command codes for parametric users; & menu driven & natural language interfaces for standalone users.
vi. Representing Complex Relationships among Data: The DBMS has the capability to represent a variety of complex relationships among the data, as well as to retrieve & update related data easily & efficiently.
vii. Integrity Constraints: The DBMS must provide capabilities to define & enforce constraints. The simplest type of integrity constraint involves specifying a data type for each data item.
viii. Providing Backup & Recovery: The DBMS must provide facilities to recover from hardware or software failures. The backup & recovery subsystem of the DBMS is responsible for recovery.

Additional benefits of the data base approach are as follows:
(i) Potential for Enforcing Standards: The data base approach permits the DBA to define & enforce standards among data base users in a large organization. This facilitates communication & co-operation among various departments, projects & users within the organization.
(ii) Reduced Application Development Time: The DBMS provides facilities to create new applications in a short time. Development time is estimated at one-sixth to one-fourth of that required with a traditional file system.
(iii) Flexibility: It may be necessary to change the structure of a data base as requirements change. The DBMS allows certain types of changes to the structure of the data base without affecting the stored data & the existing application programs.
(iv) Availability of Up-to-Date Information: A DBMS makes the data base available to all users. As soon as one user's update is applied to the data base, all other users can immediately see this update.
(v) Economies of Scale: The DBMS permits consolidation of data & applications, thus reducing the amount of wasteful overlap between the activities of data processing personnel in different departments.
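The integrity constraints benefit (vii) listed above can be illustrated with a short sketch using Python's sqlite3 module. The account table, its columns & the sample rows are hypothetical; the point is only that the DBMS, not each application program, rejects data that violates a declared constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " acc_no INTEGER PRIMARY KEY,"
    " holder TEXT NOT NULL,"
    " balance REAL NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 'Asha', 500.0)")

# Each of these rows breaks one constraint:
# missing holder, negative balance, duplicate account number.
bad_rows = [(2, None, 10.0), (3, "Ravi", -5.0), (1, "Dup", 0.0)]

rejected = []
for row in bad_rows:
    try:
        conn.execute("INSERT INTO account VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError as exc:
        # The DBMS refuses the row; record which one and why.
        rejected.append(row[0])
        print("rejected", row, "->", exc)

count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
```

Only the one valid row remains in the table, whichever program attempted the inserts.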

When not to Use a DBMS:


The overhead costs of using a DBMS are due to the following:
i. High initial investment in hardware, software & training.
ii. Burden of defining & processing data.
iii. Overhead for providing security, concurrency control, recovery & integrity functions.
A DBMS may not be needed under the following circumstances:
(i) The data base & applications are simple, well defined & not expected to change.
(ii) Multiple user access to data is not required.

2. DATA BASE SYSTEM CONCEPTS AND ARCHITECTURE:


In the early days, a DBMS software package was a single, tightly integrated system. Modern DBMS packages are modular in design, with a client-server system architecture. Large centralized mainframe computers have been replaced by hundreds of distributed workstations & personal computers connected via communication networks. In the basic client-server architecture, system functionality is distributed between two types of modules: (i) the Client Module & (ii) the Server Module.
Client Module: It runs on a user workstation or personal computer. It handles user interaction & provides user friendly interfaces such as forms or menu-based GUIs (Graphical User Interfaces). The application programs & user interfaces that access the data run in the client module.
Server Module: The server module handles data storage, access, search & other functions.

Data Models, Schemes & Instances:


Data Models: A data model is a collection of concepts that is used to describe the structure of a data base. The structure of a data base means its data types, relationships & constraints. Most data models also include a set of basic operations for specifying retrievals & updates on the data base.

Categories of Data Models: Data models are classified by the types of concepts they use to describe the data base structure:
(i) High level or Conceptual Data Models
(ii) Low level or Physical Data Models
(iii) Representational or Implementation Data Models

(i) High Level or Conceptual Data Models: These provide concepts that are close to the way users perceive the data. Conceptual data models use concepts such as entities, attributes & relationships.
(ii) Low level or Physical Data Models: These provide concepts that describe the details of how the data is stored on the storage devices. This level is meant for computer specialists, not for end users. A physical data model describes how data is stored in the computer by representing information such as record formats, record ordering & access paths. An access path is a structure that makes the search for particular data base records efficient.
(iii) Representational or Implementation Data Models: These provide concepts that end users can understand but that are not too far removed from the way data is organized within the computer. They hide some details of data storage but can be implemented on a computer system directly. The relational, network & hierarchical data models are representational data models. SQL is the standard language for relational data bases. These models represent data by using record structures & hence are known as record based data models.

Entity: An entity represents a real world object or concept. Example: employee, project.
Attribute: An attribute represents a property that describes an entity. Example: an employee's salary.
Relationship: A relationship represents an interaction among two or more entities. Example: the works-on relationship between an employee & a project.

Schemas, Instances & Data base State:


Data base Schema: The description of a data base is called the data base schema. It is specified during data base design & is not expected to change frequently. A schema diagram is a conventional way of displaying a schema. The figure below shows a schema diagram:

STUD:   Roll No | Name | Class | Age
EXAM:   Reg. No | Sub 1 | Sub 2 | Sub 3 | Sub 4
SPORTS: Roll No | Game | Sfees

A schema diagram displays the names of record types & data items, & a few types of constraints.
Data Base State: The actual data in a data base may change frequently. The data in the data base at a particular moment in time is called a data base state or snapshot. It is also known as the current set of occurrences or instances in the data base.
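The difference between a schema & a data base state can be sketched as follows. The stud table loosely mirrors the STUD record of the schema diagram; the column types & the sample row are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stud (roll_no INTEGER PRIMARY KEY, name TEXT, class TEXT, age INTEGER)"
)

def schema():
    """Column names of stud, read from the catalog (the schema)."""
    return [row[1] for row in conn.execute("PRAGMA table_info(stud)")]

def state():
    """All rows currently stored in stud (the data base state)."""
    return conn.execute("SELECT * FROM stud ORDER BY roll_no").fetchall()

schema_before = schema()
state_empty = state()                      # empty state right after definition

conn.execute("INSERT INTO stud VALUES (1, 'Asha', 'VII', 12)")
state_one = state()                        # a new state after an insert

conn.execute("UPDATE stud SET age = 13 WHERE roll_no = 1")
state_two = state()                        # another state after an update

schema_after = schema()                    # the schema has not changed
```

Every insert or update moves the data base to a new state, while the schema stays the same until the design itself is changed.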

DBMS Architecture & Data Independence:


Three important characteristics of the data base approach are: (i) insulation of programs & data, (ii) support of multiple user views, & (iii) use of a catalog to store the data base description. An architecture for data base systems that helps to achieve these characteristics is called the three schema architecture. Its goal is to separate the user applications from the physical data base. The three levels of the architecture are:
1. Internal Level: It has an internal schema, which describes the physical storage structure of the data base. The internal schema uses a physical data model & describes the complete details of data storage & access paths for the data base.
2. Conceptual Level: It has a conceptual schema, which describes the structure of the complete data base for a community of users. This level hides the details of physical storage structures & concentrates on describing entities, data types, relationships, user operations & constraints.
3. External or View Level: Each external schema describes the part of the data base that a particular user group is interested in & hides the rest of the data base from that user group.

The three schemas are only descriptions of data; the data actually exists only at the physical level. Each user refers only to its own external schema. Hence, the DBMS must transform a request specified on an external schema into a request against the conceptual schema, & then into a request on the internal schema for processing over the stored data base. The processes of transforming requests & results between levels are called mappings.

Data Independence:
Data independence is the capacity to change the schema at one level of a data base system without having to change the schema at the next higher level. There are two types of data independence:
(i) Logical Data Independence: It is the capacity to change the conceptual schema without having to change external schemas or application programs. The conceptual schema is changed when the data base is expanded or reduced.
(ii) Physical Data Independence: It is the capacity to change the internal schema without having to change the conceptual schema. Changes to the internal schema are needed when some physical files have to be reorganized.
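Both kinds of data independence can be demonstrated in a small sketch. It treats an SQLite view as a stand-in for an external schema, the base table as the conceptual schema & an index as part of the internal schema; this mapping onto SQLite, & all the names used, are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary REAL)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 30000.0)")

# External schema for one user group: only ids and names are visible.
conn.execute("CREATE VIEW staff_directory AS SELECT emp_id, name FROM employee")

def directory():
    """The application query, written once against the external schema."""
    return conn.execute(
        "SELECT emp_id, name FROM staff_directory ORDER BY emp_id"
    ).fetchall()

before = directory()

# Logical data independence: the conceptual schema is expanded.
conn.execute("ALTER TABLE employee ADD COLUMN department TEXT")
after_logical = directory()

# Physical data independence: a new access path is added.
conn.execute("CREATE INDEX idx_employee_name ON employee (name)")
after_physical = directory()
```

The function directory() is never edited, yet it returns the same result after both changes.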

Data base Languages & Interfaces:


A DBMS provides appropriate languages & interfaces for each category of users.
(a) Data Definition Language (DDL): This language is used by the DBA & by data base designers to define both the conceptual & internal schemas. The DDL compiler processes DDL statements to identify the descriptions of the schema constructs & stores the schema description in the DBMS catalog.
(b) Storage Definition Language (SDL): It is used to specify the internal schema. The mappings between the two schemas may be specified in either the DDL or the SDL.
(c) View Definition Language (VDL): This language specifies user views & their mappings to the conceptual schema.
(d) Data Manipulation Language (DML): This language allows users to manipulate the data. The manipulations include retrieval, insertion, deletion & modification of the data.

Two types of DML:


(i) High level or Non-procedural DML: It is used to specify complex data base operations concisely. Such DML statements are either entered interactively from a terminal or embedded in a general purpose programming language.
(ii) Low-level or Procedural DML: It must be embedded in a general purpose programming language. Such a DML retrieves individual records or objects from the data base & processes each one separately. Hence, it needs programming language constructs such as looping to retrieve & process each record from a set of records.
Whenever DML commands, either high level or low level, are embedded in a high level language (HLL), that language is called the host language & the DML is called the data sublanguage. A high level DML used in a standalone interactive manner is called a query language.
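The contrast between the two styles can be sketched with Python as the host language & SQL as the data sublanguage. SQL itself is a high level DML, so the second half only imitates the record-at-a-time style of a procedural DML with a host-language loop; the employee table & the salary rule are hypothetical.

```python
import sqlite3

def make_db():
    """Build a small in-memory data base with three employee records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, salary INTEGER)")
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?)",
        [(1, 1000), (2, 2000), (3, 3000)],
    )
    return conn

def salaries(conn):
    return conn.execute(
        "SELECT emp_id, salary FROM employee ORDER BY emp_id"
    ).fetchall()

# High level (set-at-a-time): one statement says WHAT should change.
high = make_db()
high.execute("UPDATE employee SET salary = salary + 100 WHERE salary < 2500")

# Record-at-a-time style: the host language loops over the records
# and decides, one record at a time, HOW to change each of them.
low = make_db()
records = low.execute("SELECT emp_id, salary FROM employee").fetchall()
for emp_id, salary in records:
    if salary < 2500:
        low.execute(
            "UPDATE employee SET salary = ? WHERE emp_id = ?",
            (salary + 100, emp_id),
        )

high_result = salaries(high)
low_result = salaries(low)
```

Both versions leave the data base in the same state; the high level form is shorter & leaves the choice of access strategy to the DBMS.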

DBMS Interfaces:
The user friendly interfaces provided by the DBMS are:
(i) Menu Based Interfaces for Browsing: These interfaces present the user with lists of options called menus. Menus avoid the need to memorize the specific commands & syntax of a query language.
(ii) Forms Based Interfaces: A forms based interface displays a form to each user. Users fill out the form entries to insert new data. Forms are usually designed & programmed as interfaces for naive users. DBMSs provide various front end tools to design the forms.
(iii) Graphical User Interfaces: A GUI displays a schema to the user in diagrammatic form. GUIs use both menus & forms, & use a pointing device such as a mouse to pick parts of the displayed schema diagram.
(iv) Natural Language Interfaces: Such an interface accepts requests written in a human language & tries to understand them. It processes these requests & submits them to the DBMS.
(v) Interfaces for Parametric Users: Parametric users have a small set of operations that they must perform repeatedly. System analysts & programmers design & implement a special interface for them.
(vi) Interfaces for the DBA: The data base system contains privileged commands that can be used only by the DBA's staff.

Database System Environment:


A DBMS is a complex software system. The components of the DBMS environment are:
Stored Data base: The data base & the DBMS catalog are stored on disk. Access to the disk is controlled by the operating system, which schedules disk input/output.
Stored Data Manager: It is a module of the DBMS which controls access to the DBMS information that is stored on the disk. It uses basic operating system services for carrying out low level data transfer between the disk & main memory, but it also controls other aspects of data transfer, such as handling buffers in main memory.
DDL Compiler: It processes schema definitions specified in the DDL (data definition language) & stores descriptions of the schemas in the DBMS catalog. The catalog contains information such as the names of files & data items, storage details of each file, mapping information among schemas, & constraints.
Run Time Data base Processor: It handles data base accesses at run time; it receives retrieval or update operations & carries them out on the data base.
Query Compiler: It handles high level queries that are entered interactively. It parses, analyzes & compiles or interprets a query by creating data base access code, & generates calls to the run-time processor for executing the code.
Pre Compiler: It extracts DML commands from an application program written in a host programming language.
DML Compiler: It accepts the extracted data manipulation language (DML) commands & compiles them into object code.
Host Language Compiler: The rest of the program, other than the DML commands, is sent to the host language compiler, which generates its object code.
Canned Transaction: It contains the object code of the DML commands & of the rest of the program, & includes calls to the run-time data base processor.
DBA: The responsibility of the DBA is to administer the data base (the primary resource) & the DBMS & related software (the secondary resource).
End Users: The job of this class of users is to access the data base for querying, updating & generating reports.
Application Programmer: Implements, as programs, the specifications for canned transactions that the system analysts develop from the requirements of the end users.

Entity Relationship Diagram


When a company asks you to make them a working, functional data base which they can work with, there are certain steps to follow. Let us summarize them here:
1. Gathering information: This could be a written document that describes the system in question in a reasonable amount of detail.
2. Producing the ERD: The ERD, or Entity Relationship Diagram, is a diagrammatic representation of the description we have gathered about the system.
3. Designing the database: From the ERD we have created, it is easy to determine the tables, the attributes which the tables must contain & the relationships among these tables.
4. Normalization: This is a process of removing different kinds of impurities from the tables we have just created in the above step.

How to Prepare an ERD

Step 1: Let us take a very simple example & try to reach a fully organized database from it. Let us look at the following simple statement: A boy eats an ice cream. This is a description of a real world activity, & we may consider the above statement as a written document (very short, of course).

Step 2: Now we have to prepare the ERD. Before doing that we have to process the statement a little. We can see that the sentence contains a subject (boy), an object (ice cream) & a verb (eats) that defines the relationship between the subject & the object. Consider the nouns as entities (boy & ice cream) & the verb (eats) as a relationship. To plot them in the diagram, put the nouns within rectangles & the relationship within a diamond. Also, show the relationship with a directed arrow, starting from the subject entity (boy) towards the object entity (ice cream).

Well, fine. Up to this point the ERD shows how boy and ice cream are related. Now, every boy must have a name, address, phone number etc. and every ice cream has a manufacturer, flavor, price etc. Without these the diagram is not complete. These items which we mentioned here are known as attributes, and they must be incorporated in the ERD as connected ovals.

But can only entities have attributes? Certainly not. If we want, relationships may have attributes too. These attributes do not tell us anything more about either the boy or the ice cream, but they provide additional information about the relationship between the boy & the ice cream.

Step 3: We are almost complete now. If you look carefully, we have now defined structures for at least three tables like the following:

Boy:       Name | Address | Phone
Ice Cream: Manufacturer | Flavor | Price
Eats:      Date | Time

However, this is still not a working database, because by definition a database should be a collection of related tables. To make them connected, the tables must have some common attributes. If we choose the attribute Name of the Boy table to play the role of the common attribute, then the revised structure of the above tables becomes something like the following:

Boy:       Name | Address | Phone
Ice Cream: Manufacturer | Flavor | Price | Name
Eats:      Date | Time | Name

This is as complete as it can be. We now have information about the boy, about the ice cream he has eaten & about the date & time when the eating was done.

Cardinality of Relationship

While creating a relationship between two entities, we often have to face the cardinality problem. This simply means: how many entities of the first set are related to how many entities of the second set? Cardinality can be of the following four types.

One-to-One: Only one entity of the first set is related to only one entity of the second set. E.g. A teacher teaches a student. Only one teacher is teaching only one student. This can be expressed in the following diagram as:

One-to-Many: Only one entity of the first set is related to multiple entities of the second set. E.g. A teacher teaches students. Only one teacher is teaching many students. This can be expressed in the following diagram as:

Many-to-One: Multiple entities of the first set are related to only one entity of the second set. E.g. Teachers teach a student. Many teachers are teaching only one student. This can be expressed in the following diagram as:

Many-to-Many: Multiple entities of the first set are related to multiple entities of the second set. E.g. Teachers teach students. In any school or college many teachers are teaching many students. This can be considered as a two way one-to-many relationship. This can be expressed in the following diagram as:

In this discussion we have not included the attributes, but you can understand that they can be used without any problem if we want to.

The Concept of Keys

A key is an attribute of a table which helps to identify a row. There are several different types of keys, which are explained here.
Super Key or Candidate Key: It is an attribute (or set of attributes) of a table that can uniquely identify a row in the table. Strictly speaking, a candidate key is a minimal super key, i.e. one from which no attribute can be removed without losing uniqueness. Such keys contain unique values & generally cannot contain NULL values. There can be more than one candidate key in a table, e.g. within a STUDENT table, Roll & Mobile No. can both serve to uniquely identify a student.
Primary Key: It is the one candidate key that is chosen to be the identifying key for the entire table. E.g. although there are two candidate keys in the STUDENT table, the college would obviously use Roll as the primary key of the table.
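The STUDENT example above can be sketched with Python's sqlite3 module. Roll is declared as the primary key, & Mobile No., the other candidate key, is declared UNIQUE & NOT NULL so that it can still identify a row. The column types & the sample values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " roll INTEGER PRIMARY KEY,"        # candidate key chosen as primary key
    " mobile_no TEXT NOT NULL UNIQUE,"  # the other candidate key
    " name TEXT NOT NULL)"
)
conn.execute("INSERT INTO student VALUES (1, '9000000001', 'Asha')")

def try_insert(row):
    """Return True if the DBMS accepts the row, False if a key is violated."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

dup_roll = try_insert((1, "9000000002", "Ravi"))    # same Roll as Asha
dup_mobile = try_insert((2, "9000000001", "Ravi"))  # same Mobile No. as Asha
accepted = try_insert((2, "9000000002", "Ravi"))    # both keys are new

# Either candidate key is enough to find exactly one row.
by_mobile = conn.execute(
    "SELECT name FROM student WHERE mobile_no = ?", ("9000000002",)
).fetchone()[0]
```

The DBMS rejects a duplicate value in either column, which is exactly what makes each of them usable as an identifier.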

Alternate Key: This is a candidate key which is not chosen as the primary key of the table. Alternate keys are named so because, although not the primary key, they can still identify a row.
Composite Key: Sometimes one attribute is not enough to uniquely identify a row. E.g. in a single class Roll is enough to find a student, but in the entire school, merely searching by Roll is not enough, because there could be 10 classes in the school & each one of them may contain a roll no 5. To uniquely identify the student we have to say something like class VII, roll no 5. So, two or more attributes are combined to create a unique combination of values, such as Class + Roll.
Foreign Key: Sometimes we may have to work with a table that does not have a primary key of its own. To identify its rows, we have to use the primary key of a related table. Such a copy of another related table's primary key is called a foreign key.

Strong and Weak Entity

Based on the concept of the foreign key, there may arise a situation when we have to relate an entity having a primary key of its own & an entity not having a primary key of its own. In such a case, the entity having its own primary key is called a strong entity & the entity not having its own primary key is called a weak entity. Whenever we need to relate a strong & a weak entity together, the ERD changes just a little. Say, for example, we have the statement: A Student lives in a Home. STUDENT is obviously a strong entity having a primary key Roll. But HOME may not have a unique primary key, as its only attribute Address may be shared by many homes (what if it is a housing estate?). HOME is a weak entity in this case. The ERD of this statement would be like the following:

As you can see, the weak entity itself & the relationship linking a strong & a weak entity must have a double border.

Different Types of Database

There are three different types of data base. The difference lies in the organization of the database & the storage structure of the data. We shall briefly mention them here.

Relational DBMS: This is our subject of study. A DBMS is relational if the data is organized into relations, that is, tables. In an RDBMS, all data are stored in the well-known row-column format.

Hierarchical DBMS: In an HDBMS, data is organized in a tree like manner. There is a parent-child relationship among data items, & the data model is very suitable for representing one-to-many relationships. To access the data items, some kind of tree-traversal technique is used, such as preorder traversal. Because an HDBMS is built on the one-to-many model, we face a little difficulty in organizing a hierarchical database into row-column format. For example, consider the following hierarchical database that shows four employees (E01, E02, E03 & E04) belonging to the same department D1.

There are two ways to represent the above one-to-many information in a relation, where each row pairs one department value with one employee value. The first is called Replication, where the department id is replicated a number of times in the table like the following.

Dept-Id | Employee Code
D1      | E01
D1      | E02
D1      | E03
D1      | E04

Replication makes the same data item redundant & is an inefficient way to store data. A better way is to use a technique called the Virtual Record. With this technique, the repeating data item is not stored in the table. It is kept at a separate place. The table, instead of containing the repeating information, contains a pointer to the place where the data item is stored.
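The difference between Replication & the Virtual Record can be sketched in plain Python, with an object reference playing the role of the pointer. The Department & Employee classes & the department names are hypothetical; only D1 & E01 to E04 come from the example above.

```python
from dataclasses import dataclass

@dataclass
class Department:
    dept_id: str
    name: str

@dataclass
class Employee:
    code: str
    dept: Department  # reference (pointer) to the single shared record

# Replication: the department id is copied into every row.
replicated = [("D1", "E01"), ("D1", "E02"), ("D1", "E03"), ("D1", "E04")]

# Virtual record: the department is stored once and every employee points to it.
d1 = Department("D1", "Accounts")
employees = [Employee(code, d1) for code in ("E01", "E02", "E03", "E04")]

# A change to the shared record is made in one place only,
# and every employee sees it through the pointer.
d1.name = "Finance"
names_seen = {e.dept.name for e in employees}
shared = all(e.dept is d1 for e in employees)
```

With replication, the same rename would have to be repeated in every row that carries a copy of the department data.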

This organization saves a lot of space as data is not made redundant.

Network DBMS: The NDBMS is built primarily on one-to-many relationships, but where a parent-child representation among the data items cannot be ensured. This may happen in any real world situation where any entity can be linked to any entity. The NDBMS was proposed by a group of theorists known as the Database Task Group (DBTG). What they said looks like this: In an NDBMS, all entities are called Records & all relationships are called Sets. The record from where the relationship starts is called the Owner Record & the record where it ends is called the Member Record. The relationship or set is strictly one-to-many.

In case we need to represent a many-to-many relationship, an interesting thing happens. In an NDBMS, Owner & Member can only have a one-to-many relationship. We have to introduce a third, common record with which both the Owner & the Member can have a one-to-many relationship. Using this common record, the Owner & Member can be linked by a many-to-many relationship. Suppose we have to represent the statement: Teachers teach students. We have to introduce a third record, say CLASS, with which both the teacher & the student can have a one-to-many relationship. Using the class in the middle, teacher & student can be linked by a virtual many-to-many relationship.
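The idea of the common record can be sketched in plain Python. This is not DBTG syntax; the Teacher, Student & ClassLink classes are hypothetical stand-ins for the TEACHER, STUDENT & CLASS records, & Python lists stand in for the sets owned by each record.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Teacher:
    name: str
    classes: list = field(default_factory=list)  # set owned by this teacher

@dataclass(eq=False)
class Student:
    name: str
    classes: list = field(default_factory=list)  # set owned by this student

@dataclass(eq=False)
class ClassLink:
    """Common record: a member of exactly one teacher set and one student set."""
    teacher: Teacher
    student: Student

def link(teacher, student):
    """Create one CLASS record and add it to both owners' sets."""
    record = ClassLink(teacher, student)
    teacher.classes.append(record)
    student.classes.append(record)
    return record

def students_of(teacher):
    return sorted(c.student.name for c in teacher.classes)

def teachers_of(student):
    return sorted(c.teacher.name for c in student.classes)

rao, sen = Teacher("Rao"), Teacher("Sen")
asha, ravi = Student("Asha"), Student("Ravi")
link(rao, asha)
link(rao, ravi)
link(sen, asha)

print(students_of(rao))   # students reached through Rao's CLASS records
print(teachers_of(asha))  # teachers reached through Asha's CLASS records
```

Each set on its own is one-to-many (one owner, many CLASS records), yet following the links in both directions gives the many-to-many relationship between teachers & students.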
