1.10 Types of database applications
1.11 Data Models
1.12 The database system environment
1.13 Centralized and Client-Server DBMS Architectures
VTU-EDUSAT
Introduction to Database
1.0 Introduction
A database is a collection of related data. A database management system (DBMS) is software designed to assist in maintaining and using large collections of data. The first general-purpose DBMS, the Integrated Data Store (IDS), was designed by Charles Bachman in the early 1960s. In the late 1960s, IBM introduced IMS (Information Management System). In 1970, Edgar Codd at IBM proposed the relational data model, which became the basis of relational DBMSs (RDBMS). In the 1980s, SQL (Structured Query Language), developed at IBM, became the standard query language for relational databases, and the 1980s and 1990s saw further advances in commercial DBMSs such as DB2 and ORACLE.
Data
Data are raw facts or figures about entities. When activities take place in an organization, their effects need to be recorded; these recorded facts are known as data.
Information
Processed data is called information. The purpose of data processing is to generate the information required to carry out business activities. Data processing involves the following tasks:
Data capture: gathering the data as and when they originate.
Data classification: classifying the captured data based on its nature and intended usage.
Data storage: storing the classified (segregated) data properly.
Data arranging: arranging the stored data properly.
Data retrieval: data is frequently required for further processing, so it is important to create indexes so that it can be retrieved quickly.
Database
A database may be defined, in simple terms, as a collection of related data. A database can be of any size and complexity, and it may be generated and maintained manually or by computer.
A DBMS is therefore a general-purpose software system that facilitates the process of defining, constructing and manipulating databases for various applications.
Characteristics of DBMS
To incorporate the requirements of the organization, the system should be designed for easy maintenance.
Information systems should allow interactive access to data to obtain new information without writing fresh programs.
The system should be designed to correlate different data to meet new requirements. An independent central repository that describes the available data and gives its meaning is required.
An integrated database helps in understanding the inter-relationships between data stored in different applications.
The stored data should be available for simultaneous access by different users.
An automatic recovery feature has to be provided to recover from system failures.
DBMS Utilities
A data loading utility: allows data in an external format to be loaded easily without writing programs.
A backup utility: makes periodic copies of the database to help in cases of crashes and disasters.
A recovery utility: reconstructs the correct state of the database from the backup and the history of transactions.
Monitoring tools: monitor performance so that the internal schema can be changed and database access can be optimized.
File reorganization: allows the data to be restructured from one file organization to another.
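To make the loading and backup utilities concrete, here is a minimal sketch that uses Python's built-in sqlite3 and csv modules as a stand-in DBMS (the notes do not name a particular product). The `employee` table, its columns and the CSV contents are hypothetical. "Recovery" here only means reading from the backup copy; replaying the history of transactions, as a real recovery utility does, is not shown.

```python
import csv
import io
import sqlite3

# Hypothetical external data in CSV format (stands in for a file on disk).
CSV_DATA = """emp_no,name,salary
101,Asha,50000
102,Ravi,42000
"""

def load_csv(conn, text):
    """Data loading: bulk-insert CSV rows without writing per-record code."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS employee ("
        "emp_no INTEGER PRIMARY KEY, name TEXT NOT NULL, salary REAL)"
    )
    reader = csv.DictReader(io.StringIO(text))
    rows = [(int(r["emp_no"]), r["name"], float(r["salary"])) for r in reader]
    with conn:  # commit the whole load as one transaction
        conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)
    return len(rows)

def backup(src, dst):
    """Backup: copy the entire database into another connection."""
    src.backup(dst)

live = sqlite3.connect(":memory:")
loaded = load_csv(live, CSV_DATA)

copy = sqlite3.connect(":memory:")
backup(live, copy)

# Simulate a disaster on the live database; the backup copy still has the data.
live.execute("DROP TABLE employee")
restored = copy.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(loaded, restored)  # 2 2
```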
DBMS
1. A DBMS manages a collection of data, and the user is not required to write procedures for managing the database.
2. A DBMS provides an abstract view of data that hides the storage details.
3. A DBMS is efficient to use, since it employs a wide variety of sophisticated techniques to store and retrieve data.
4. A DBMS handles concurrent access using some form of locking.
5. A DBMS has a crash recovery mechanism that protects users from the effects of system failures.
6. A DBMS has a good protection mechanism.
DBMS = Database Management System
RDBMS = Relational Database Management System
1. Data independence: Application programs should not be exposed to the details of data representation and storage; the DBMS provides an abstract view that hides these details.
2. Efficient data access: A DBMS utilizes a variety of sophisticated techniques to store and retrieve data efficiently.
3. Data integrity and security: Because data is always accessed through the DBMS, it can enforce integrity constraints, e.g., checking that a salary value is valid before salary information for an employee is inserted. It can also enforce access controls that govern which data is visible to different classes of users.
4. Data administration: When several users share data, centralizing its administration is an important task. Experienced professionals can minimize data redundancy and fine-tune the storage of data to reduce retrieval time.
5. Concurrent access and crash recovery: A DBMS schedules concurrent accesses to the data and protects users from the effects of system failures.
6. Reduced application development time: A DBMS supports important functions that are common to many applications, so they need not be rewritten for each application.
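The crash-recovery half of item 5 rests on transactions: either all updates in a transaction take effect or none do. The sketch below assumes SQLite through Python's sqlite3 module; the `account` table, the account numbers and the `transfer` helper are hypothetical. Locking for concurrent access happens inside the DBMS and is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: balances may never go negative.
conn.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, "
             "balance REAL NOT NULL CHECK (balance >= 0))")
with conn:
    conn.executemany("INSERT INTO account VALUES (?, ?)",
                     [("A-1", 500.0), ("A-2", 100.0)])

def transfer(conn, src, dst, amount):
    """Move money between accounts as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE acc_no = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK failed; the credit made above was rolled back too

ok = transfer(conn, "A-1", "A-2", 200.0)
failed = transfer(conn, "A-1", "A-2", 1000.0)
balances = dict(conn.execute("SELECT acc_no, balance FROM account"))
print(ok, failed, balances)  # True False {'A-1': 300.0, 'A-2': 300.0}
```

The second transfer fails part-way, yet A-2 is not left holding the extra 1000.00, because the whole transaction is undone.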
A database management system (DBMS) is a collection of programs that enables users to create and maintain a database. The DBMS is a general-purpose software system that facilitates the process of defining, constructing, manipulating and sharing databases among various users and applications. Defining a database involves specifying the data types, structures and constraints of the data to be stored in the database. This descriptive information is also stored by the DBMS in the form of a database catalog or dictionary; it is called meta-data. Manipulating a database includes querying it to retrieve specific data and updating it. An application program accesses the database by sending queries or requests for data to the DBMS. Other important functions provided by the DBMS include protecting the database and maintaining it.
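The idea that the definition of a database is itself stored as meta-data can be seen directly in SQLite, whose catalog is the `sqlite_master` table. This is a sketch using Python's sqlite3 module; the `student` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Defining the database: the DDL statement specifies types and constraints.
# (Hypothetical table for illustration.)
conn.execute("""
    CREATE TABLE student (
        usn   TEXT PRIMARY KEY,
        name  TEXT NOT NULL,
        sem   INTEGER CHECK (sem BETWEEN 1 AND 8)
    )
""")

# The DBMS records this description (meta-data) in its own catalog.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
columns = [(name, col_type) for _, name, col_type, _, _, _ in
           conn.execute("PRAGMA table_info(student)")]

print(tables)   # ['student']
print(columns)  # [('usn', 'TEXT'), ('name', 'TEXT'), ('sem', 'INTEGER')]
```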
The three-level architecture was proposed by ANSI/SPARC (the American National Standards Institute's Standards Planning and Requirements Committee). ANSI/SPARC produced an interim report in 1972, followed by a final report in 1977. The reports proposed an architectural framework for databases. Under this approach, a database is considered as containing data about an enterprise. The three levels of the architecture are three different views of the data:
External: the individual user view
Conceptual: the community user view
Internal: the physical or storage view
The three-level database architecture allows a clear separation of the information meaning (conceptual view) from the external data representation and from the physical data structure layout. A database system that is able to separate the three different views of data is likely to be flexible and adaptable. This flexibility and adaptability is the data independence discussed earlier.
The external level is the view that the individual user of the database has. This view is often a restricted view of the database and the same database may provide a number of different views for different classes of users. In general, the end users and even the application programmers are only interested in a subset of the database. For example, a department head may only be interested in the departmental finances and student enrolments but not the library information. The librarian would not be expected to have any interest in the information about academic staff. The payroll office would have no interest in student enrolments.
The conceptual view is the information model of the enterprise and contains the view of the whole enterprise without any concern for the physical implementation. This view is normally more stable than the other two views. In a database, it may be desirable to change the internal view to improve performance while there has been no change in the conceptual view.
The internal view is the view about the actual physical storage of data. It tells us what data is stored in the database and how. At least the following aspects are considered at this level:
Storage allocation: e.g., B-trees, hashing, etc.
Access paths: e.g., specification of primary and secondary keys, indexes, pointers and sequencing.
Miscellaneous: e.g., data compression and encryption techniques, and optimization of the internal structures.
Efficiency considerations are the most important at this level and the data structures are chosen to provide an efficient database. The internal view does not deal with the physical devices directly. Instead it views a physical device as a collection of physical pages and allocates space in terms of logical pages.
The separation of the conceptual view from the internal view enables us to provide a logical description of the database without the need to specify physical structures. This is often called physical data independence. Separating the external views from the conceptual view enables us to change the conceptual view without affecting the external views. This separation is sometimes called logical data independence.
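Physical data independence can be sketched with an index, which is an internal-level access path: adding it changes how the DBMS finds the rows, not the query or its result. This assumes SQLite via Python's sqlite3; the `staff` table, its rows and the index name `idx_staff_dept` are hypothetical. The plan text is read from the last column of `EXPLAIN QUERY PLAN`, whose exact wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows for illustration.
conn.execute("CREATE TABLE staff (staff_no INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
with conn:
    conn.executemany("INSERT INTO staff VALUES (?, ?, ?)",
                     [(1, "Anil", "CSE"), (2, "Bina", "ECE"), (3, "Chetan", "CSE")])

QUERY = "SELECT name FROM staff WHERE dept = ? ORDER BY name"

def plan(conn):
    """Return the access-path description SQLite chose (last column of each plan row)."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("CSE",)))

before_rows = conn.execute(QUERY, ("CSE",)).fetchall()
before_plan = plan(conn)

# Internal-level change: add an access path. The query text is untouched.
conn.execute("CREATE INDEX idx_staff_dept ON staff (dept)")

after_rows = conn.execute(QUERY, ("CSE",)).fetchall()
after_plan = plan(conn)

print(before_rows == after_rows)       # True: same answer
print("idx_staff_dept" in after_plan)  # True: different access path
```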
Given the three-level view of the database, a number of mappings are needed to enable users to work with one of the external views. For example, the payroll office may have an external view of the database that consists of only the following information: staff number, name and address.
The conceptual view of the database may contain academic staff, general staff, casual staff, etc. A mapping will need to be created in which all the staff in the different categories are combined into one category for the payroll office. The conceptual view would include information about each staff member's position, the date employment started, whether they are full-time or part-time, etc. This will need to be mapped to the salary level for the salary office. Also, if there is some change in the conceptual view, the external view can stay the same if the mapping is changed.
1. Logical data independence is the capacity to change the conceptual schema without having to change the external schemas.
2. Physical data independence is the capacity to change the internal schema without having to change the conceptual schema.
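The payroll example above can be sketched with a SQL view acting as the external schema and the view definition acting as the external/conceptual mapping. This is a sketch assuming SQLite through Python's sqlite3; the tables `academic_staff` and `general_staff`, the view `payroll_staff` and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: staff stored in separate categories (hypothetical tables).
conn.execute("CREATE TABLE academic_staff (staff_no INTEGER PRIMARY KEY, "
             "name TEXT, address TEXT, position TEXT)")
conn.execute("CREATE TABLE general_staff (staff_no INTEGER PRIMARY KEY, "
             "name TEXT, address TEXT, grade TEXT)")
with conn:
    conn.execute("INSERT INTO academic_staff VALUES (1, 'Meera', 'Mysuru', 'Lecturer')")
    conn.execute("INSERT INTO general_staff VALUES (2, 'Kiran', 'Hubballi', 'Clerk')")

# External level: the payroll office sees one combined list of three columns.
# The view definition is the mapping between the two levels.
conn.execute("""
    CREATE VIEW payroll_staff AS
        SELECT staff_no, name, address FROM academic_staff
        UNION ALL
        SELECT staff_no, name, address FROM general_staff
""")

PAYROLL_QUERY = "SELECT staff_no, name, address FROM payroll_staff ORDER BY staff_no"
before = conn.execute(PAYROLL_QUERY).fetchall()

# Conceptual-schema change: add a column. The payroll query does not change.
conn.execute("ALTER TABLE academic_staff ADD COLUMN start_date TEXT")
after = conn.execute(PAYROLL_QUERY).fetchall()

print(before == after)  # True
print(after)            # [(1, 'Meera', 'Mysuru'), (2, 'Kiran', 'Hubballi')]
```

For a larger conceptual change, such as merging the two staff tables, only the view definition would have to be rewritten; queries against `payroll_staff` could stay as they are.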
1. High-level (conceptual) data model: the user-level data model is the high-level or conceptual model. It provides concepts that are close to the way many users perceive data.
2. Low-level (physical) data model: provides concepts that describe the details of how data is stored in the computer. Low-level data models are meant for computer specialists, not for end users.
3. Representational data model: lies between the high-level and low-level data models. It provides concepts that may be understood by end users but that are not too far removed from the way data is organized within the computer.
The most common data models are:
1. Relational Model
The relational model uses a collection of tables to represent both data and the relationships among those data. Each table has multiple columns, and each column has a unique name.
Figure: a relational database comprising two tables, including the Customer table.
In this example, customers Preethi and Rocky share the same account number, A-111.
Advantages:
1. The main advantage of this model is its ability to represent data in a simplified format.
2. Manipulating records is simplified by the use of key attributes to retrieve data.
3. Different types of relationships can be represented with this model.
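The point that in the relational model even relationships are tables can be sketched with a linking table. This uses SQLite via Python's sqlite3; the table names `customer`, `account` and `depositor` are assumptions, and the balances and Preethi's link to A-101 are illustrative values loosely taken from the example figures, not confirmed by the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; values are illustrative only.
conn.executescript("""
    CREATE TABLE customer (cust_name TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE account  (acc_no TEXT PRIMARY KEY, balance REAL);
    -- The relationship itself is also a table: which customer holds which account.
    CREATE TABLE depositor (
        cust_name TEXT REFERENCES customer(cust_name),
        acc_no    TEXT REFERENCES account(acc_no),
        PRIMARY KEY (cust_name, acc_no)
    );
    INSERT INTO customer VALUES ('Preethi', 'Bangalore'), ('Rocky', NULL);
    INSERT INTO account  VALUES ('A-101', 1000.00), ('A-111', 3000.00);
    INSERT INTO depositor VALUES ('Preethi', 'A-101'), ('Preethi', 'A-111'),
                                 ('Rocky', 'A-111');
""")

# Who shares account A-111? A key-based lookup through the relationship table.
holders = [name for (name,) in conn.execute(
    "SELECT c.cust_name FROM customer c "
    "JOIN depositor d ON c.cust_name = d.cust_name "
    "WHERE d.acc_no = 'A-111' ORDER BY c.cust_name")]
print(holders)  # ['Preethi', 'Rocky']
```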
2. Network Model
The data in the network model are represented by a collection of records, and relationships among data are represented by links, which can be viewed as pointers.
Figure: network-model example with the customer record Preethi (111-222-3456, Yelhanka, Bangalore) and the account records A-101 and A-111 (balances 1000.00 and 3000.00).
3. Hierarchical Model
A hierarchical data model is a data model in which the data is organized into a tree-like structure. The structure allows repeating information using parent/child relationships: each parent can have many children, but each child has only one parent. All attributes of a specific record are listed under an entity type.
Advantages:
1. Records are represented using an ordered tree, which is a natural way of implementing one-to-many relationships.
2. Proper ordering of the tree results in easier and faster retrieval of records.
3. It allows the use of virtual records. This results in a stable database, especially when modifications to the database are made.
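The one-parent rule and ordered-tree retrieval can be sketched in a few lines of Python. The `Record` class, `add_child` and `preorder` are hypothetical names for illustration; real hierarchical DBMSs use their own record and navigation mechanisms. The sample entities reuse names that appear later in these notes.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Record:
    """One record in a hierarchical database: it may have exactly one parent."""
    rec_type: str
    key: str
    children: List["Record"] = field(default_factory=list)
    parent: Optional["Record"] = field(default=None, repr=False)

    def add_child(self, child: "Record") -> "Record":
        # Enforce the hierarchical rule: a child cannot be attached twice.
        if child.parent is not None:
            raise ValueError("a child record may have only one parent")
        child.parent = self
        self.children.append(child)
        return child

def preorder(node: Record) -> List[str]:
    """Hierarchical retrieval: visit the parent first, then children in order."""
    out = [f"{node.rec_type}:{node.key}"]
    for child in node.children:
        out.extend(preorder(child))
    return out

# Hypothetical example: one department owning an employee and a project (1:N links).
dept = Record("DEPARTMENT", "Research")
dept.add_child(Record("EMPLOYEE", "John Smith"))
dept.add_child(Record("PROJECT", "ProductX"))

print(preorder(dept))
# ['DEPARTMENT:Research', 'EMPLOYEE:John Smith', 'PROJECT:ProductX']
```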
Database Schema: the description of a database. It includes descriptions of the database structure, data types, and the constraints on the database.
Schema Diagram: an illustrative display of (most aspects of) a database schema.
Schema Construct: a component of the schema or an object within the schema, e.g., STUDENT, COURSE.
Database State: the actual data stored in a database at a particular moment in time. This includes the collection of all the data in the database. It is also called a database instance (or occurrence or snapshot). The term instance is also applied to individual database components, e.g., record instance, table instance, entity instance.
Initial Database State: the database state when it is initially loaded into the system.
Valid State: a state that satisfies the structure and constraints of the database.
Distinction
The database schema changes very infrequently. The database state changes every time the database is updated
DBMS Languages
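Data Definition Language (DDL): used to specify the conceptual schema of a database.
Data Manipulation Language (DML): used to specify database retrievals and updates.
High-level or non-procedural languages: these include the relational language SQL. They may be used in a standalone way or may be embedded in a programming language.
Low-level or procedural languages: these must be embedded in a programming language.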
Types of DML
High Level or Non-procedural Language:
For example, the SQL relational language is set-oriented and specifies what data to retrieve rather than how to retrieve it. Such languages are also called declarative languages.
Low Level or Procedural Language: these retrieve data one record at a time; constructs such as looping are needed to retrieve multiple records, along with positioning pointers.
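The contrast between the two kinds of DML can be sketched side by side. This assumes SQLite via Python's sqlite3; the `employee` table and its rows are hypothetical. The procedural half only emulates record-at-a-time access with a cursor and an explicit loop, since SQLite offers no true low-level navigational DML.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows for illustration.
conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, name TEXT, salary REAL)")
with conn:
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
        ("111", "Asha", 30000.0), ("222", "Bharat", 45000.0), ("333", "Chitra", 52000.0)])

# High-level (declarative, set-oriented): state WHAT is wanted in one statement.
declarative = [n for (n,) in conn.execute(
    "SELECT name FROM employee WHERE salary > 40000 ORDER BY name")]

# Low-level style (procedural, record-at-a-time): the cursor is positioned on one
# record at a time, and the loop and selection test are written by the programmer.
procedural = []
cur = conn.execute("SELECT name, salary FROM employee ORDER BY name")
row = cur.fetchone()
while row is not None:       # loop until the cursor runs past the last record
    name, salary = row
    if salary > 40000:
        procedural.append(name)
    row = cur.fetchone()     # advance the cursor to the next record

print(declarative == procedural, declarative)  # True ['Bharat', 'Chitra']
```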
DBMS Interfaces
Stand-alone query language interfaces. Example: entering SQL queries at the DBMS interactive SQL interface (e.g., SQL*Plus in ORACLE).
The figure is divided into two halves. The top half refers to the various users of the database environment and their interfaces. The lower half shows the internals of the DBMS responsible for the storage of data and the processing of transactions. The database and the DBMS catalog are usually stored on disk. Access to the disk is primarily controlled by the operating system (OS), which schedules disk input/output. A higher-level stored data manager module of the DBMS controls access to DBMS information that is stored on disk.
Architectures for DBMSs have followed trends similar to those of general computer system architectures. Earlier architectures used mainframe computers to provide the main processing for all system functions, including user application programs and user interface programs as well as all DBMS functionality. The reason was that most users accessed such systems via computer terminals that did not have processing power and only provided display capabilities. Therefore, all processing was performed remotely on the computer system, and only display information and controls were sent from the computer to the display terminals, which were connected to the central computer via various types of communication networks. As prices of hardware declined, most users replaced their terminals with PCs and workstations. At first, database systems used these computers in the same way they had used display terminals, so the DBMS itself was still a centralized DBMS, in which all the DBMS functionality, application program execution and user interface processing were carried out on one machine.
Clients
Provide appropriate interfaces through a client software module to access and utilize the various server resources. Clients may be diskless machines or PCs or Workstations with disks with only the client software installed. Connected to the servers via some form of a network. (LAN: local area network, wireless network, etc.)
DBMS Server
Provides database query and transaction services to the clients. Relational DBMS servers are often called SQL servers, query servers, or transaction servers. Applications running on clients use an Application Program Interface (API) to access server databases via a standard interface such as ODBC (Open Database Connectivity) or JDBC (for Java programs).
Classification of DBMSs
Based on the data model used:
Traditional: Relational, Network, Hierarchical.
Emerging: Object-oriented, Object-relational.
Other classifications:
Single-user (typically used with personal computers) vs. multi-user (most DBMSs).
Centralized (uses a single computer with one database) vs. distributed (uses multiple computers, multiple databases).
Entity-Relationship Model
Introduction to ER Model
The ER model represents real-world situations using concepts that are commonly used by people. It allows a representation of the real world to be defined at the logical level. The ER model has no facilities to describe machine-related aspects. In the ER model, the logical structure of data is captured by indicating the grouping of data into entities. The ER model also supports a top-down approach, by which details can be given in successive stages.
Entity: something that is described in the database by storing its data; it may be a concrete entity or a conceptual entity.
Entity set: a collection of similar entities.
Attribute: describes a property associated with entities. An attribute has a name and a value for each entity.
Domain: defines the set of permitted values for an attribute.
The company is organized into DEPARTMENTs. Each department has a name, a number and an employee who manages the department. We keep track of the start date of the department manager. A department may have several locations. Each department controls a number of PROJECTs. Each project has a unique name and a unique number, and is located at a single location. We store each EMPLOYEE's social security number, address, salary, sex, and birth date. Each employee works for one department but may work on several projects.
ER Model Concepts
Entities and Attributes
Entities are specific objects or things in the mini-world that are represented in the database, for example the EMPLOYEE John Smith, the Research DEPARTMENT, the ProductX PROJECT.
Attributes are properties used to describe an entity. For example an EMPLOYEE entity may have the attributes Name, SSN, Address, Sex, BirthDate .
A specific entity will have a value for each of its attributes. For example, a specific employee entity may have Name='John Smith', SSN='123456789', Address='731, Fondren, Houston, TX', Sex='M', BirthDate='09-JAN-55'.
Each attribute has a value set (or data type) associated with it, e.g., integer, string, subrange, enumerated type, etc.
Types of Attributes
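There are several types of attributes:
Simple: each entity has a single atomic value for the attribute, for example SSN or Sex.
Composite: the attribute may be composed of several components, for example Name (FirstName, MiddleName, LastName).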
Multi-valued
An entity may have multiple values for that attribute. For example, Color of a CAR or Previous Degrees of a STUDENT. Denoted as {Color} or {Previous Degrees}. In general, composite and multi-valued attributes may be nested arbitrarily to any number of levels, although this is rare. For example, Previous Degrees of a STUDENT is a composite multi-valued attribute denoted by {Previous Degrees (College, Year, Degree, Field)} Multiple Previous Degrees values can exist. Each has four subcomponent attributes: College, Year, Degree, Field
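In a relational implementation, one common way to store a composite multi-valued attribute such as {Previous Degrees (College, Year, Degree, Field)} is a separate table with one row per value. The sketch below assumes SQLite via Python's sqlite3; the table names, the primary key choice (usn, college, degree) and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the multi-valued attribute gets its own table.
conn.executescript("""
    CREATE TABLE student (usn TEXT PRIMARY KEY, name TEXT NOT NULL);
    -- {PreviousDegrees(College, Year, Degree, Field)}: one row per degree.
    CREATE TABLE previous_degree (
        usn     TEXT NOT NULL REFERENCES student(usn),
        college TEXT NOT NULL,
        year    INTEGER NOT NULL,
        degree  TEXT NOT NULL,
        field   TEXT,
        PRIMARY KEY (usn, college, degree)
    );
""")
with conn:
    conn.execute("INSERT INTO student VALUES ('1XX01', 'Nisha')")
    conn.executemany("INSERT INTO previous_degree VALUES (?, ?, ?, ?, ?)", [
        ("1XX01", "College A", 2018, "BSc", "Physics"),
        ("1XX01", "College B", 2020, "MSc", "Physics"),
    ])

degrees = conn.execute(
    "SELECT degree, year FROM previous_degree WHERE usn = ? ORDER BY year", ("1XX01",)
).fetchall()
print(degrees)  # [('BSc', 2018), ('MSc', 2020)]
```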
An attribute of an entity type for which each entity must have a unique value is called a key attribute of the entity type. For example, SSN of EMPLOYEE.
A key attribute may be composite. Vehicle Tag Number is a key of the CAR entity type with components (Number, State).
An entity type may have more than one key. For example, the CAR entity type may have two keys: VehicleIdentificationNumber (popularly called VIN) and VehicleTagNumber (Number, State), also known as the license plate number. In ER diagrams, each key attribute is underlined.
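The two CAR keys can be sketched as constraints: VIN as the primary key and (Number, State) as a composite UNIQUE key. This assumes SQLite via Python's sqlite3; the column names, sample values and the `try_insert` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE car (
        vin        TEXT PRIMARY KEY,   -- key 1: VehicleIdentificationNumber
        tag_number TEXT NOT NULL,      -- key 2 is composite: (Number, State)
        tag_state  TEXT NOT NULL,
        UNIQUE (tag_number, tag_state)
    )
""")
with conn:  # hypothetical sample cars
    conn.execute("INSERT INTO car VALUES ('VIN001', 'ABC123', 'TX')")
    conn.execute("INSERT INTO car VALUES ('VIN002', 'ABC123', 'NY')")  # same number, other state

def try_insert(row):
    """Return 'inserted' or 'rejected' depending on the key constraints."""
    try:
        with conn:
            conn.execute("INSERT INTO car VALUES (?, ?, ?)", row)
        return "inserted"
    except sqlite3.IntegrityError:
        return "rejected"

results = [try_insert(r) for r in [
    ("VIN003", "ABC123", "TX"),  # (Number, State) repeats -> rejected
    ("VIN001", "XYZ999", "CA"),  # VIN repeats -> rejected
    ("VIN004", "XYZ999", "CA"),  # both keys new -> inserted
]]
print(results)  # ['rejected', 'rejected', 'inserted']
```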
Entity Set
Each entity type will have a collection of entities stored in the database, called the entity set. The above example shows three CAR entity instances in the entity set for CAR. The same name (CAR) is used to refer to both the entity type and the entity set. The entity set is the current state of the entities of that type that are stored in the database.
The ER model has three main concepts:
Entities (and their entity types and entity sets)
Attributes (simple, composite, multi-valued)
Relationships (and their relationship types and relationship sets)
Relationships of the same type are grouped or typed into a relationship type. For example, the WORKS_ON relationship type in which EMPLOYEEs and PROJECTs participate, or the MANAGES relationship type in which EMPLOYEEs and DEPARTMENTs participate.
The degree of a relationship type is the number of participating entity types. Both MANAGES and WORKS_ON are binary relationships.
Relationship instances of the M:N WORKS_ON relationship between EMPLOYEE and PROJECT
Relationship Set: the current set of relationship instances represented in the database, i.e., the current state of a relationship type. The previous figures displayed relationship sets. Each instance in the set relates individual participating entities, one from each participating entity type.
In ER diagrams, we represent the relationship type as follows: Diamond-shaped box is used to display a relationship type. Connected to the participating entity types via straight lines.
Relationship Types
In the refined design, some attributes from the initial entity types are refined into relationships:
Manager of DEPARTMENT -> MANAGES
Works_on of EMPLOYEE -> WORKS_ON
Department of EMPLOYEE -> WORKS_FOR
etc.
In general, more than one relationship type can exist between the same participating entity types. MANAGES and WORKS_FOR are distinct relationship types between EMPLOYEE and DEPARTMENT, with different meanings and different relationship instances.
Weak entities are identified by the combination of:
a partial key of the weak entity type, and
the particular entity they are related to in the identifying entity type.
Example:
A DEPENDENT entity is identified by the dependent's first name and the specific EMPLOYEE with whom the dependent is related. The name of the DEPENDENT is the partial key. DEPENDENT is a weak entity type, and EMPLOYEE is its identifying entity type via the identifying relationship type DEPENDENT_OF.
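The identification rule for DEPENDENT maps naturally onto a composite primary key made of the owner's key plus the partial key, and the dependence on the owner onto a cascading foreign key. This sketch assumes SQLite via Python's sqlite3; the column names, the second employee 'Ravi' and the dependent rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE employee (ssn TEXT PRIMARY KEY, name TEXT NOT NULL);
    -- Weak entity: key = owner's key (essn) + partial key (dependent_name).
    CREATE TABLE dependent (
        essn           TEXT NOT NULL REFERENCES employee(ssn) ON DELETE CASCADE,
        dependent_name TEXT NOT NULL,
        relationship   TEXT,
        PRIMARY KEY (essn, dependent_name)
    );
    INSERT INTO employee VALUES ('123456789', 'John Smith'), ('222333444', 'Ravi');
    -- The same partial key 'Alice' is allowed under two different owners.
    INSERT INTO dependent VALUES ('123456789', 'Alice', 'Daughter'),
                                 ('222333444', 'Alice', 'Daughter');
""")

# Existence dependency: deleting the owner removes its dependents too.
with conn:
    conn.execute("DELETE FROM employee WHERE ssn = '123456789'")

remaining = conn.execute("SELECT essn, dependent_name FROM dependent").fetchall()
print(remaining)  # [('222333444', 'Alice')]
```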
Constraints on Relationships
Constraints on relationship types (also known as ratio constraints)
A value of HoursPerWeek depends on a particular (employee, project) combination. Most relationship attributes are used with M:N relationships; in 1:N relationships, they can be transferred to the entity type on the N-side of the relationship.
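Because HoursPerWeek belongs to an (employee, project) pair, a relational implementation typically keeps it in the table for the M:N relationship itself. This is a sketch assuming SQLite via Python's sqlite3; table and column names are modelled on the COMPANY example, while ProductY, the second employee and the hour values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (ssn TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE project  (pnumber INTEGER PRIMARY KEY, pname TEXT UNIQUE);
    -- M:N relationship with its own attribute: one row per (employee, project) pair.
    CREATE TABLE works_on (
        essn  TEXT    REFERENCES employee(ssn),
        pno   INTEGER REFERENCES project(pnumber),
        hours REAL    CHECK (hours >= 0),   -- HoursPerWeek
        PRIMARY KEY (essn, pno)
    );
    INSERT INTO employee VALUES ('123456789', 'John Smith'), ('222333444', 'Ravi');
    INSERT INTO project  VALUES (1, 'ProductX'), (2, 'ProductY');
    INSERT INTO works_on VALUES ('123456789', 1, 32.5), ('123456789', 2, 7.5),
                                ('222333444', 1, 20.0);
""")

# HoursPerWeek is meaningful only for a specific (employee, project) combination.
hours = conn.execute(
    "SELECT hours FROM works_on WHERE essn = ? AND pno = ?", ("123456789", 2)
).fetchone()[0]
total_on_x = conn.execute("SELECT SUM(hours) FROM works_on WHERE pno = 1").fetchone()[0]
print(hours, total_on_x)  # 7.5 52.5
```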
If needed, the binary and n-ary relationships can all be included in the schema design (see Figure 3.17a and b, where all relationships convey different meanings).
In some cases, a ternary relationship can be represented as a weak entity if the data model allows a weak entity type to have multiple identifying relationships (and hence multiple owner entity types) (see Fig 3.17c).
If a particular binary relationship can be derived from a higher-degree relationship at all times, then it is redundant.
For example, the TAUGHT_DURING binary relationship in Figure 3.18 can be derived from the ternary relationship OFFERS (based on the meaning of the relationships).
Bank Database
2.2.2 Entities and Entity sets
An entity is any object of interest to an organization, or for representation in the database. Entities represent objects in the real world that are distinguishable from all other objects. For example, every person in a college is an entity, and every room in a college is an entity. Associated with an entity is a set of properties, which are used to distinguish one entity from another. For example:
1. The attributes of the entity Student are USN, Name and Address.
2. The attributes of the entity Vehicle are Vehicle no, Make and Capacity.
For the purpose of accessing and storing information, only certain attributes are used. The attributes that uniquely identify every instance of the entity are termed the primary key.
An entity that has a set of attributes which can uniquely identify all its instances is termed a strong entity. An entity whose primary key does not uniquely determine all the instances of the entity is termed a weak entity.
A collection of similar entities that share certain common properties forms an entity set. For an organization such as a college, the objects of concern include students, teachers, rooms and subjects; each collection of similar entities forms an entity set.
2.2.3 Attributes
Types of attributes:
1. Simple attributes: attributes that cannot be further divided into subparts. E.g., the University Seat Number (USN) of a student, which is unique and cannot be further divided.
2. Composite attributes: attributes that can be further divided into parts. E.g., the attribute Name in the Student database can be divided into First name, Middle name and Last name.
3. Single-valued attributes: attributes that contain only one value at any instant, e.g., the USN.
4. Multi-valued attributes: certain attributes may have a set of values assigned to them. For example, the dependent name in a policy database may have several values, since there may be more than one dependent for a single policy holder.
5. Stored and derived attributes: for a person entity, the value of age can be determined from the current date and the value of that person's birth date. The Age attribute is hence a derived attribute, and is said to be derivable from the BirthDate attribute, which is called a stored attribute.
6. NULL attributes: a NULL value is used when an attribute does not have any value.
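The stored/derived distinction in item 5 amounts to storing BirthDate and computing Age on demand. Below is a small Python sketch; the function name `age_on` is hypothetical, and the sample birth date simply reuses the '09-JAN-55' value from the EMPLOYEE example.

```python
from datetime import date

def age_on(birth_date: date, today: date) -> int:
    """Derive Age from the stored BirthDate instead of storing Age itself."""
    years = today.year - birth_date.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years

birth = date(1955, 1, 9)                 # stored attribute (hypothetical value)
print(age_on(birth, date(2000, 1, 8)))   # 44
print(age_on(birth, date(2000, 1, 9)))   # 45
```

Computing Age from the stored date avoids having to update a stored Age value every year.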
Data integrity
Data is accepted only if it satisfies certain rules, and therefore the data is valid. Enforcing data integrity ensures that the data in the database is valid and correct. Keys play an important role in maintaining data integrity.
The various types of keys that have been identified are:
Candidate key
Primary key
Alternate key
Composite key
Foreign key
Candidate Key
An attribute or set of attributes that uniquely identifies a row is called a candidate key. Such an attribute has a unique value in every row (example table: Vehicle).
Primary Key
The Candidate key that you choose to identify each row uniquely is called the Primary key.
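Alternate Key
A candidate key that is not chosen as the primary key is called an alternate key.
Composite Key
In certain tables, a single attribute cannot be used to identify rows uniquely, and a combination of two or more attributes is used as the primary key. Such keys are called composite keys (example table: Purchase).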
Foreign Key
When the primary key of one table appears as an attribute in another table, it is called a foreign key in the second table. A foreign key is used to relate two tables.
Weak entity:
Relationships
A relationship type is a meaningful association between entity types. A relationship is an association of entities, where the association includes one entity from each participating entity type. Relationship types are represented on the ER diagram by a series of lines. As always, there are many notations in use today. In the original Chen notation, the relationship is placed inside a diamond, e.g., managers manage employees:
Figure: Chen's notation for relationships
For this module, we will use an alternative notation, where the relationship is a label on the line. The meaning is identical.
It is possible to have n-ary relationships (e.g., quaternary or unary). Unary relationships are also known as recursive relationships.
It is a relationship where the same entity participates more than once in different roles. In the example above we are saying that employees are managed by employees. If we wanted more information about who manages whom, we could introduce a second entity type called manager.
This can result in the loss of some information: it is no longer clear which sales assistant sold a customer a particular product. Try replacing the ternary relationship with an entity type and a set of binary relationships. Relationships are usually verbs, so name the new entity type by the relationship verb rewritten as a noun. The relationship sells can become the entity type sale.
So a sales assistant can be linked to a specific customer and both of them to the sale of a particular product. This process also works for higher order relationships.
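The "sells becomes sale" idea can be sketched as a SALE table holding one foreign key to each participant, so the three-way fact is never split up. This assumes SQLite via Python's sqlite3; all table names, people and products are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE sales_assistant (asst_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customer        (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product         (prod_id INTEGER PRIMARY KEY, name TEXT);
    -- 'sells' rewritten as the entity type SALE, linked by three binary relationships.
    CREATE TABLE sale (
        sale_id INTEGER PRIMARY KEY,
        asst_id INTEGER NOT NULL REFERENCES sales_assistant(asst_id),
        cust_id INTEGER NOT NULL REFERENCES customer(cust_id),
        prod_id INTEGER NOT NULL REFERENCES product(prod_id)
    );
    INSERT INTO sales_assistant VALUES (1, 'Ravi'), (2, 'Sita');
    INSERT INTO customer        VALUES (10, 'Anu');
    INSERT INTO product         VALUES (100, 'Kettle'), (101, 'Toaster');
    INSERT INTO sale (asst_id, cust_id, prod_id) VALUES (1, 10, 100), (2, 10, 101);
""")

# The question separate binary links could not answer: who sold Anu the toaster?
seller = conn.execute("""
    SELECT a.name
    FROM sale s
    JOIN sales_assistant a ON a.asst_id = s.asst_id
    JOIN customer c        ON c.cust_id = s.cust_id
    JOIN product p         ON p.prod_id = s.prod_id
    WHERE c.name = 'Anu' AND p.name = 'Toaster'
""").fetchone()[0]
print(seller)  # Sita
```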
Cardinality
Relationships are rarely one-to-one. For example, a manager usually manages more than one employee.
A one-to-many relationship - one manager manages many employees, but each employee only has one manager, so it is a one-to-many (1:n) relationship.
A many-to-one relationship - many students study one course. They do not study more than one course, so it is a many-to-one (m:1) relationship.
A many-to-many relationship - one lecturer teaches many students and a student is taught by many lecturers, so it is a many-to-many (m:n) relationship.
2.3.4 Optionality
A relationship can be optional or mandatory. If the relationship is mandatory, an entity at one end of the relationship must be related to an entity at the other end. The optionality can be different at each end of the relationship. For example, a student must be on a course, so the relationship `student studies course' is mandatory. But a course can exist before any students have enrolled. Thus the relationship `course is_studied_by student' is optional. To show optionality, put a circle or `O' at the `optional end' of the relationship. As the optional relationship is `course is_studied_by student', and the optional part of this is the student, the `O' goes at the student end of the relationship connection.
It is important to know the optionality because you must ensure that whenever you create a new entity it has the required mandatory links.
2.4.1 Entities
Bus - the company owns buses and will hold information about them.
Route - buses travel on routes, which need to be described.
Town - buses pass through towns, and the company needs to know about them.
Driver - the company employs drivers, and personnel will hold their data.
Attributes
Bus (reg-no, make, size, deck, no-pass)
Route (route-no, avg-pass)
Driver (emp-no, name, address, tel-no)
Town (name)
Stage (stage-no)
Garage (name, address)
Example: entity and relationship sets for the hospital called General Hospital: Patients, Doctors, Beds, Examines, Bed Assigned, Accounts, has Account.
patients: entity set with attributes SSNo, LastName, FirstName, HomePhone, Sex, DateofBirth, Age, Street, City, State, Zip.
doctors: entity set with attributes SSNo, LastName, FirstName, OfficePhone, Pager, Specialty.
examines: relationship set with attributes Date, Time, Diagnosis, Fee.
beds: entity set with attributes RoomNumber, BedNumber, Type, Status, PricePerHour.
Also do not include the system as an entity type e.g. if modelling a library, the entity types might be books, borrowers, etc. The library is the system, thus should not be an entity type.
3. List the attributes of each entity (all properties to describe the entity which are relevant to the application).
Ensure that the entity types are really needed. Are any of them just attributes of another entity type? If so, keep them as attributes and cross them off the entity list. Do not have attributes of one entity as attributes of another entity!
Which attributes uniquely identify instances of that entity type? This may not be possible for some weak entities.
ER modelling is an iterative process, so draw several versions, refining each one until you are happy with it. Note that there is no one right answer to the problem, but some solutions are better than others!
Overview
construct an ER model
understand the problems associated with ER models
understand the modelling concepts of Enhanced ER modelling
Entity integrity ensures that each row can be uniquely identified by an attribute (or set of attributes) called the primary key. The primary key cannot have a NULL value.
Domain integrity
Domain integrity refers to the range of valid entries for a given column. It ensures that there are only valid entries in the column.
Referential integrity
Referential integrity ensures that for every non-null value of a foreign key, there is a matching primary key value in the referenced table.
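The three integrity rules can be seen in action with a small sketch using Python's built-in sqlite3 module. The Customer/Orders tables, the Age range and all values are hypothetical. Two SQLite specifics are assumed here. First, SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`. Second, a non-INTEGER primary key needs an explicit NOT NULL in SQLite before NULLs are rejected.

```python
import sqlite3

def try_insert(con, sql, params):
    """Run one INSERT; return True if accepted, False if a constraint rejected it."""
    try:
        con.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

def integrity_demo():
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
    con.execute("""CREATE TABLE Customer (
        Cust_id   TEXT NOT NULL PRIMARY KEY,             -- entity integrity
        Cust_name TEXT,
        Age       INTEGER CHECK (Age BETWEEN 0 AND 150)  -- domain integrity
    )""")
    con.execute("""CREATE TABLE Orders (
        O_id    INTEGER PRIMARY KEY,
        Cust_id TEXT REFERENCES Customer(Cust_id)        -- referential integrity
    )""")
    ins_c = "INSERT INTO Customer VALUES (?, ?, ?)"
    ins_o = "INSERT INTO Orders VALUES (?, ?)"
    return {
        "valid customer":    try_insert(con, ins_c, ("100001", "Kumari", 30)),
        "null primary key":  try_insert(con, ins_c, (None, "Kumar", 40)),
        "duplicate key":     try_insert(con, ins_c, ("100001", "Gubbi", 25)),
        "age out of domain": try_insert(con, ins_c, ("100002", "Gubbi", 200)),
        "valid order":       try_insert(con, ins_o, (1, "100001")),
        "dangling FK":       try_insert(con, ins_o, (2, "999999")),
        "null FK":           try_insert(con, ins_o, (3, None)),  # a NULL FK is allowed
    }

print(integrity_demo())
```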
Informal Definitions
RELATION:
A relation is a table of values. A relation may be thought of as a set of rows; alternatively, it may be thought of as a set of columns. Each row represents a fact that corresponds to a real-world entity or relationship. Each row has a value of an item or set of items that uniquely identifies that row in the table. Sometimes row-ids or sequential numbers are assigned to identify the rows in the table. Each column is typically referred to by its column name, column header or attribute name.
Formal definitions
A relation may be defined in multiple ways. The schema of a relation is written R(A1, A2, ..., An): relation schema R is defined over the attributes A1, A2, ..., An.
For Example -
CUSTOMER (Cust-id, Cust-name, Address, Phone#)
Here, CUSTOMER is a relation defined over the four attributes Cust-id, Cust-name, Address and Phone#, each of which has a domain, or set of valid values. For example, the domain of Cust-id is 6-digit numbers.
Example
Entity Integrity
Relational Database Schema: A set S of relation schemas that belong to the same database. S is the name of the database: S = {R1, R2, ..., Rn}.
Entity Integrity: The primary key attributes PK of each relation schema R in S cannot have null values in any tuple of r(R), because primary key values are used to identify the individual tuples:
\(t[PK] \neq \text{null}\) for any tuple t in r(R).
Note: Other attributes of R may be similarly constrained to disallow null values, even though they are not members of the primary key.
Referential Integrity
The initial design is typically not complete. Some aspects in the requirements will be represented as relationships. The ER model has three main concepts:
Relational Algebra
Relational algebra consists of several groups of operations.
Unary Relational Operations
- SELECT (symbol: σ (sigma))
- PROJECT (symbol: π (pi))
- RENAME (symbol: ρ (rho))
Relational Algebra Operations From Set Theory
- UNION (∪), INTERSECTION (∩), DIFFERENCE (or MINUS, −)
- CARTESIAN PRODUCT (×)
SELECT
The SELECT operation (denoted by σ (sigma)) is used to select a subset of the tuples from a relation based on a selection condition. The selection condition acts as a filter and keeps only those tuples that satisfy the qualifying condition. Tuples satisfying the condition are selected, whereas the other tuples are discarded (filtered out).
Database State for COMPANY
Examples: Select the EMPLOYEE tuples whose department number is 4:
\(\sigma_{DNO = 4}(\text{EMPLOYEE})\)
Select the employee tuples whose salary is greater than $30,000:
\(\sigma_{SALARY > 30000}(\text{EMPLOYEE})\)
In general, the select operation is denoted by \(\sigma_{\langle\text{selection condition}\rangle}(R)\), where:
- the symbol σ (sigma) is used to denote the select operator;
- the selection condition is a Boolean (conditional) expression specified on the attributes of relation R;
- tuples that make the condition true are selected (appear in the result of the operation);
- tuples that make the condition false are filtered out (discarded from the result of the operation).
The Boolean expression specified in <selection condition> is made up of a number of clauses of the form <attribute name> <comparison op> <constant value> or <attribute name> <comparison op> <attribute name>, where <attribute name> is the name of an attribute of R and <comparison op> is normally one of the operators {=, >, >=, <, <=, !=}. Clauses can be arbitrarily connected by the Boolean operators AND, OR and NOT. For example, to select the tuples for all employees who either work in department 4 and make over $25,000 per year, or work in department 5 and make over $30,000, the select operation should be:
\(\sigma_{(DNO = 4 \text{ AND } SALARY > 25000) \text{ OR } (DNO = 5 \text{ AND } SALARY > 30000)}(\text{EMPLOYEE})\)
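A minimal sketch of σ in Python. A relation is modelled as a list of dicts, one dict per tuple, and a Python function stands in for the selection condition. The EMPLOYEE rows (names, salaries, department numbers) are hypothetical sample data, not the COMPANY database state.

```python
# Hypothetical EMPLOYEE tuples; each dict is one tuple of the relation.
EMPLOYEE = [
    {"FNAME": "John",     "SALARY": 30000, "DNO": 5},
    {"FNAME": "Franklin", "SALARY": 40000, "DNO": 5},
    {"FNAME": "Alicia",   "SALARY": 25000, "DNO": 4},
    {"FNAME": "Jennifer", "SALARY": 43000, "DNO": 4},
    {"FNAME": "Ramesh",   "SALARY": 38000, "DNO": 5},
]

def select(condition, relation):
    """sigma_condition(relation): keep only the tuples for which condition(t) is true."""
    return [t for t in relation if condition(t)]

def compound(t):
    # (DNO = 4 AND SALARY > 25000) OR (DNO = 5 AND SALARY > 30000)
    return (t["DNO"] == 4 and t["SALARY"] > 25000) or \
           (t["DNO"] == 5 and t["SALARY"] > 30000)

dept4 = select(lambda t: t["DNO"] == 4, EMPLOYEE)
high_paid = select(lambda t: t["SALARY"] > 30000, EMPLOYEE)
either = select(compound, EMPLOYEE)
print([t["FNAME"] for t in either])
```

Alicia is excluded from the compound selection because her salary is not strictly greater than 25000.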
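SELECT Operation Properties
SELECT (σ) is commutative:
\(\sigma_{\langle\text{cond1}\rangle}(\sigma_{\langle\text{cond2}\rangle}(R)) = \sigma_{\langle\text{cond2}\rangle}(\sigma_{\langle\text{cond1}\rangle}(R))\)
A cascade of SELECT operations may be replaced by a single selection with a conjunction of all the conditions:
\(\sigma_{\langle\text{cond1}\rangle}(\sigma_{\langle\text{cond2}\rangle}(\sigma_{\langle\text{cond3}\rangle}(R))) = \sigma_{\langle\text{cond1}\rangle \text{ AND } \langle\text{cond2}\rangle \text{ AND } \langle\text{cond3}\rangle}(R)\)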
PROJECT
The PROJECT operation is denoted by π (pi). If we are interested in only certain attributes of a relation, we use PROJECT. This operation keeps certain columns (attributes) from a relation and discards the other columns, so PROJECT creates a vertical partitioning. The list of specified columns (attributes) is kept in each tuple, and the other attributes in each tuple are discarded.
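A minimal sketch of π in Python. Relational algebra treats a relation as a set, so projecting onto a few columns can make two tuples identical, and the duplicate collapses into one. The sketch returns a Python set to reflect that. The EMPLOYEE rows are hypothetical.

```python
# Hypothetical EMPLOYEE tuples.
EMPLOYEE = [
    {"LNAME": "Smith",   "FNAME": "John",     "SEX": "M", "SALARY": 30000},
    {"LNAME": "Wong",    "FNAME": "Franklin", "SEX": "M", "SALARY": 40000},
    {"LNAME": "Zelaya",  "FNAME": "Alicia",   "SEX": "F", "SALARY": 25000},
    {"LNAME": "English", "FNAME": "Joyce",    "SEX": "F", "SALARY": 25000},
]

def project(attributes, relation):
    """pi_attributes(relation): keep only the listed columns of every tuple.

    The result is a set, so duplicate tuples are eliminated
    (SQL's SELECT, by contrast, keeps duplicates unless DISTINCT is used).
    """
    return {tuple(t[a] for a in attributes) for t in relation}

names = project(["LNAME", "FNAME"], EMPLOYEE)
sex_salary = project(["SEX", "SALARY"], EMPLOYEE)
print(sex_salary)  # the two ("F", 25000) tuples collapse into one
```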
RENAME
The RENAME operator is denoted by ρ (rho). In some cases, we may want to rename the attributes of a relation, the relation name, or both. This is useful when a query requires multiple operations, and it is necessary in some cases (see the JOIN operation later). The RENAME operation can rename either the relation name or the attribute names, or both.
UNION
UNION is a binary operation, denoted by ∪. The result of R ∪ S is a relation that includes all tuples that are either in R or in S or in both R and S. Duplicate tuples are eliminated. The two operand relations R and S must be type compatible (or UNION compatible):
- R and S must have the same number of attributes;
- each pair of corresponding attributes must be type compatible (have the same or compatible domains).
Example: To retrieve the social security numbers of all employees who either work in department 5 (RESULT1 below) or directly supervise an employee who works in department 5 (RESULT2 below):
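A minimal sketch of UNION with the compatibility check, assuming each relation is given as a pair (attribute names, set of tuples). Domains are not modelled, so only the number of attributes is checked. The result takes the attribute names of the first operand R. The SSN values are hypothetical.

```python
def union(r, s):
    """R ∪ S for relations given as (attribute_names, set_of_tuples)."""
    r_attrs, r_rows = r
    s_attrs, s_rows = s
    # Type (UNION) compatibility: same number of attributes.
    # Domains are not modelled in this sketch, so only the degree is checked.
    if len(r_attrs) != len(s_attrs):
        raise ValueError("R and S are not UNION compatible")
    # The result keeps R's attribute names; set union drops duplicate tuples.
    return r_attrs, r_rows | s_rows

# Hypothetical RESULT1 (works in dept 5) and RESULT2 (supervises someone in dept 5).
RESULT1 = (("SSN",), {("123456789",), ("333445555",), ("666884444",), ("453453453",)})
RESULT2 = (("SSN",), {("333445555",), ("888665555",)})
attrs, rows = union(RESULT1, RESULT2)
print(sorted(rows))  # "333445555" appears once
```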
CARTESIAN PRODUCT
\(\text{ACTUAL\_DEPS} \leftarrow \sigma_{SSN = ESSN}(\text{EMP\_DEPENDENTS})\)
\(\text{RESULT} \leftarrow \pi_{FNAME,\, LNAME,\, \text{DEPENDENT\_NAME}}(\text{ACTUAL\_DEPS})\)
Binary Relational Operations: Division, Join
Division
Interpretation of the division operation A/B:
- Divide the attributes of A into 2 sets: A1 and A2.
- Divide the attributes of B into 2 sets: B2 and B3.
- The sets A2 and B2 have the same attributes.
- For each set of values in B2, search in A2 for the sets of rows (having the same A1 values) whose A2 values, taken together, form a set that contains every value in B2.
- For all the sets of rows in A that satisfy the above search, pick out their A1 values and put them in the answer.
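A minimal sketch of division, assuming A is a set of (a1, a2) pairs and B is a set of a2 values. An a1 value is kept when the a2 values paired with it contain every value in B (a superset test). The WORKS_ON(ESSN, PNO) tuples and the divisor are hypothetical.

```python
from collections import defaultdict

def divide(a, b):
    """A ÷ B, where A holds (a1, a2) pairs and B holds a2 values.

    An a1 value is in the result when the set of a2 values paired with it
    in A contains every value in B.
    """
    a2_values = defaultdict(set)
    for a1, a2 in a:
        a2_values[a1].add(a2)          # group A's rows by their A1 value
    return {a1 for a1, seen in a2_values.items() if b <= seen}

# Hypothetical WORKS_ON(ESSN, PNO) tuples and the project numbers to divide by.
WORKS_ON = {("123456789", 1), ("123456789", 2), ("453453453", 1),
            ("453453453", 2), ("333445555", 2), ("333445555", 3)}
SMITH_PNOS = {1, 2}
answer = divide(WORKS_ON, SMITH_PNOS)
print(sorted(answer))
```

Employee 333445555 is excluded because it works on project 2 but not on project 1.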
JOIN
JOIN Operation (denoted by ⋈)
The sequence of CARTESIAN PRODUCT followed by SELECT is used quite commonly to identify and select related tuples from two relations. This operation is very important for any relational database with more than a single relation, because it allows us to combine related tuples from various relations. The general form of a join operation on two relations R(A1, A2, ..., An) and S(B1, B2, ..., Bm) is:
\(R \bowtie_{\langle\text{join condition}\rangle} S\)
where R and S can be any relations that result from general relational algebra expressions.
Example: Suppose that we want to retrieve the name of the manager of each department.
The join condition is called theta (θ). Theta can be any general Boolean expression on the attributes of R and S; for example: R.Ai < S.Bj AND (R.Ak = S.Bl OR R.Ap < S.Bq)
EQUIJOIN
The most common use of join involves join conditions with equality comparisons only. Such a join, where the only comparison operator used is =, is called an EQUIJOIN. The JOIN seen in the previous example was an EQUIJOIN.
NATURAL JOIN
Another variation of JOIN, called NATURAL JOIN and denoted by *, was created to get rid of the second (superfluous) attribute in an EQUIJOIN condition. Example:
\(Q \leftarrow R(A, B, C, D) * S(C, D, E)\)
The implicit join condition includes each pair of attributes with the same name, ANDed together: R.C = S.C AND R.D = S.D. The result keeps only one attribute of each such pair: Q(A, B, C, D, E).
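The natural join Q ← R(A,B,C,D) * S(C,D,E) can be sketched as a nested-loop join. Relations are given as (attribute names, list of tuples), and the tuple values are hypothetical. The join condition is built from the attribute names the two relations share, and only R's copy of each shared attribute is kept.

```python
def natural_join(r, s):
    """R * S for relations given as (attribute_names, list_of_tuples)."""
    r_attrs, r_rows = r
    s_attrs, s_rows = s
    common = [a for a in r_attrs if a in s_attrs]      # here: C and D
    s_only = [a for a in s_attrs if a not in common]   # here: E
    result_rows = []
    for tr in r_rows:
        dr = dict(zip(r_attrs, tr))
        for ts in s_rows:
            ds = dict(zip(s_attrs, ts))
            # Implicit condition: R.C = S.C AND R.D = S.D
            if all(dr[a] == ds[a] for a in common):
                # Keep one copy of each common attribute (R's), then S's other attributes.
                result_rows.append(tr + tuple(ds[a] for a in s_only))
    return tuple(r_attrs) + tuple(s_only), result_rows

R = (("A", "B", "C", "D"), [(1, 2, 3, 4), (5, 6, 7, 8)])
S = (("C", "D", "E"), [(3, 4, "x"), (7, 0, "y")])
Q = natural_join(R, S)
print(Q)  # (7, 0, "y") does not join: R.D = 8 differs from S.D = 0
```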
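A join can be expressed as a CARTESIAN PRODUCT followed by a SELECT:
\(R \bowtie_{\langle\text{join condition}\rangle} S = \sigma_{\langle\text{join condition}\rangle}(R \times S)\)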
NATURAL JOIN
Example: To apply a natural join on the DNUMBER attributes of DEPARTMENT and DEPT_LOCATIONS, it is sufficient to write:
\(\text{DEPT\_LOCS} \leftarrow \text{DEPARTMENT} * \text{DEPT\_LOCATIONS}\)
The only attribute with the same name is DNUMBER. An implicit join condition is created based on this attribute: DEPARTMENT.DNUMBER = DEPT_LOCATIONS.DNUMBER
In a left outer join of R and S, every tuple in R is kept; if no matching tuple is found in S, then the attributes of S in the join result are filled with null values.
A third operation, full outer join, denoted by ⟗, keeps all tuples in both the left and the right relations; when no matching tuples are found, it pads them with null values as needed.
Outer join
Examples of Queries in Relational Algebra
Q1: Retrieve the name and address of all employees who work for the Research department.
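Q6: Retrieve the names of employees who have no dependents.
\(\text{ALL\_EMPS} \leftarrow \pi_{SSN}(\text{EMPLOYEE})\)
\(\text{EMPS\_WITH\_DEPS}(SSN) \leftarrow \pi_{ESSN}(\text{DEPENDENT})\)
\(\text{EMPS\_WITHOUT\_DEPS} \leftarrow (\text{ALL\_EMPS} - \text{EMPS\_WITH\_DEPS})\)
\(\text{RESULT} \leftarrow \pi_{LNAME,\, FNAME}(\text{EMPS\_WITHOUT\_DEPS} * \text{EMPLOYEE})\)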
What is SQL?
- SQL stands for Structured Query Language.
- SQL lets you access and manipulate databases.
- SQL is an ANSI (American National Standards Institute) standard.
Database Tables
A database most often contains one or more tables. Each table is identified by a name (e.g. "Customers" or "Orders"). Tables contain records (rows) with data. Below is an example of a table called "Persons":
P_Id | LastName | FirstName | Address  | City
1    | Kumari   | Mounitha  | VPura    | Bangalore
2    | Kumar    | Pranav    | Yelhanka | Bangalore
3    | Gubbi    | Sharan    | Hebbal   | Tumkur
The table above contains three records (one for each person) and five columns (P_Id, LastName, FirstName, Address, and City).
SQL Statements
Most of the actions you need to perform on a database are done with SQL statements. The following SQL statement will select all the records in the "Persons" table:
SELECT * FROM Persons
The query and update commands form the DML (Data Manipulation Language) part of SQL:
SELECT - extracts data from a database
UPDATE - updates data in a database
DELETE - deletes data from a database
INSERT INTO - inserts new data into a database
The DDL (Data Definition Language) part of SQL permits database tables to be created or deleted. It also defines indexes (keys), specifies links between tables, and imposes constraints between tables. The most important DDL statements in SQL are:
CREATE DATABASE - creates a new database
ALTER DATABASE - modifies a database
CREATE TABLE - creates a new table
ALTER TABLE - modifies a table
DROP TABLE - deletes a table
Now we want to select the content of the columns named "LastName" and "FirstName" from the table above. We use the following SELECT statement:
SELECT LastName,FirstName FROM Persons
SELECT * Example
Now we want to select all the columns from the "Persons" table. We use the following SELECT statement:
SELECT * FROM Persons
Now we want to select only the distinct values from the column named "City" from the table above. We use the following SELECT statement:
SELECT DISTINCT City FROM Persons
SQL WHERE Syntax
SELECT column_name(s) FROM table_name WHERE column_name operator value
The "Persons" table:
P_Id | LastName | FirstName | Address  | City
1    | Kumari   | Mounitha  | VPura    | Bangalore
2    | Kumar    | Pranav    | Yelhanka | Bangalore
3    | Gubbi    | Sharan    | Hebbal   | Tumkur
Now we want to select only the persons living in the city "Bangalore" from the table above. We use the following SELECT statement:
SELECT * FROM Persons WHERE City='Bangalore'
SQL uses single quotes around text values. Numeric values should not be enclosed in quotes.
For text values:
This is correct: SELECT * FROM Persons WHERE FirstName='Pranav'
This is wrong: SELECT * FROM Persons WHERE FirstName=Pranav
For numeric values:
This is correct: SELECT * FROM Persons WHERE Year=1965
This is wrong: SELECT * FROM Persons WHERE Year='1965'
Operators that can be used in the WHERE clause include:
>=     Greater than or equal
<=     Less than or equal
LIKE   Search for a pattern
IN     If you know the exact value you want to return for at least one of the columns
Now we want to select only the persons with the first name equal to "Pranav" AND the last name equal to "Kumar". We use the following SELECT statement:
SELECT * FROM Persons WHERE FirstName='Pranav' AND LastName='Kumar'
OR Operator Example
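Now we want to select only the persons with the first name equal to "Pranav" OR the first name equal to "Mounitha". We use the following SELECT statement:
SELECT * FROM Persons WHERE FirstName='Pranav' OR FirstName='Mounitha'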
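The WHERE, AND and OR queries above can be tried against an in-memory SQLite database from Python's sqlite3 module, loaded with the three "Persons" rows. ORDER BY P_Id is added only to make the row order predictable. The last query shows a `?` placeholder, where the driver supplies the quoted text value itself.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Persons (P_Id int, LastName varchar(255), FirstName varchar(255), "
            "Address varchar(255), City varchar(255))")
con.executemany("INSERT INTO Persons VALUES (?, ?, ?, ?, ?)", [
    (1, "Kumari", "Mounitha", "VPura", "Bangalore"),
    (2, "Kumar", "Pranav", "Yelhanka", "Bangalore"),
    (3, "Gubbi", "Sharan", "Hebbal", "Tumkur"),
])

def ids(sql, params=()):
    """Run a query whose first column is P_Id and return those ids as a list."""
    return [row[0] for row in con.execute(sql, params)]

in_bangalore = ids("SELECT P_Id FROM Persons WHERE City='Bangalore' ORDER BY P_Id")
pranav_kumar = ids("SELECT P_Id FROM Persons WHERE FirstName='Pranav' AND LastName='Kumar'")
pranav_or_mounitha = ids("SELECT P_Id FROM Persons "
                         "WHERE FirstName='Pranav' OR FirstName='Mounitha' ORDER BY P_Id")
# With a placeholder (?), the value is bound by the driver instead of being quoted by hand.
by_param = ids("SELECT P_Id FROM Persons WHERE FirstName=?", ("Pranav",))
print(in_bangalore, pranav_kumar, pranav_or_mounitha, by_param)
```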
SQL ORDER BY Syntax
SELECT column_name(s) FROM table_name ORDER BY column_name(s) ASC|DESC
ORDER BY Example
The "Persons" table:
P_Id | LastName | FirstName | Address   | City
1    | Kumari   | Mounitha  | VPura     | Bangalore
2    | Kumar    | Pranav    | Yelhanka  | Bangalore
3    | Gubbi    | Sharan    | Hebbal    | Tumkur
4    | Nilsen   | Tom       | Vingvn 23 | Tumkur
Now we want to select all the persons from the table above, but sorted by their last name. We use the following SELECT statement:
SELECT * FROM Persons ORDER BY LastName
INSERT INTO table_name (column1, column2, column3,...) VALUES (value1, value2, value3,...)
Now we want to insert a new row in the "Persons" table. We use the following SQL statement:
INSERT INTO Persons VALUES (4, 'Nilsen', 'Johan', 'Bakken 2', 'Tumkur')
The following SQL statement will add a new row, but only add data in the "P_Id", "LastName" and the "FirstName" columns:
INSERT INTO Persons (P_Id, LastName, FirstName) VALUES (5, 'Tjessem', 'Jakob')
The "Persons" table will now look like this:
P_Id | LastName | FirstName | Address  | City
1    | Kumari   | Mounitha  | VPura    | Bangalore
2    | Kumar    | Pranav    | Yelhanka | Bangalore
3    | Gubbi    | Sharan    | Hebbal   | Tumkur
4    | Nilsen   | Johan     | Bakken 2 | Tumkur
5    | Tjessem  | Jakob     |          |
SQL UPDATE Syntax
UPDATE table_name SET column1=value1, column2=value2,... WHERE some_column=some_value
Note: Notice the WHERE clause in the UPDATE syntax. The WHERE clause specifies which record or records should be updated. If you omit the WHERE clause, all records will be updated!
Now we want to update the person "Tjessem, Jakob" in the "Persons" table. We use the following SQL statement:
UPDATE Persons SET Address='Nissestien 67', City='Bangalore' WHERE LastName='Tjessem' AND FirstName='Jakob'
The "Persons" table will now look like this:
P_Id | LastName | FirstName | Address       | City
1    | Kumari   | Mounitha  | VPura         | Bangalore
2    | Kumar    | Pranav    | Yelhanka      | Bangalore
3    | Gubbi    | Sharan    | Hebbal        | Tumkur
4    | Nilsen   | Johan     | Bakken 2      | Tumkur
5    | Tjessem  | Jakob     | Nissestien 67 | Bangalore
Now we want to delete the person "Tjessem, Jakob" in the "Persons" table. We use the following SQL statement:
DELETE FROM Persons WHERE LastName='Tjessem' AND FirstName='Jakob'
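The INSERT, UPDATE and DELETE steps above can be replayed in SQLite through Python's sqlite3 module. The cursor's `rowcount` shows how many rows each statement touched. With the WHERE clause it is exactly one row; without it, every row would be affected. The table contents follow the "Persons" tables above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Persons (P_Id int, LastName text, FirstName text, Address text, City text)")
con.executemany("INSERT INTO Persons VALUES (?, ?, ?, ?, ?)", [
    (1, "Kumari", "Mounitha", "VPura", "Bangalore"),
    (2, "Kumar", "Pranav", "Yelhanka", "Bangalore"),
    (3, "Gubbi", "Sharan", "Hebbal", "Tumkur"),
    (4, "Nilsen", "Johan", "Bakken 2", "Tumkur"),
])
# Insert data only in some columns; Address and City become NULL.
con.execute("INSERT INTO Persons (P_Id, LastName, FirstName) VALUES (5, 'Tjessem', 'Jakob')")

upd = con.execute("UPDATE Persons SET Address='Nissestien 67', City='Bangalore' "
                  "WHERE LastName='Tjessem' AND FirstName='Jakob'")
updated_rows = upd.rowcount          # only the row matching the WHERE clause
tjessem = con.execute("SELECT Address, City FROM Persons WHERE P_Id=5").fetchone()

dele = con.execute("DELETE FROM Persons WHERE LastName='Tjessem' AND FirstName='Jakob'")
deleted_rows = dele.rowcount
remaining = con.execute("SELECT COUNT(*) FROM Persons").fetchone()[0]
print(updated_rows, tjessem, deleted_rows, remaining)
```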
Now we want to select only the first two records in the table above. We use the following SELECT statement:
SELECT TOP 2 * FROM Persons
Now we want to select only 50% of the records in the table above. We use the following SELECT statement:
SELECT TOP 50 PERCENT * FROM Persons
SQL LIKE Syntax
SELECT column_name(s) FROM table_name WHERE column_name LIKE pattern
Now we want to select the persons living in a city that starts with "B" from the table above. We use the following SELECT statement:
SELECT * FROM Persons WHERE City LIKE 'B%'
The result-set contains the persons with P_Id 1 and 2.
Next, we want to select the persons living in a city that ends with an "r" from the "Persons" table. We use the following SELECT statement:
SELECT * FROM Persons WHERE City LIKE '%r'
Next, we want to select the persons living in a city that contains the pattern "mk" from the "Persons" table. We use the following SELECT statement:
SELECT * FROM Persons WHERE City LIKE '%mk%'
It is also possible to select the persons living in a city that does NOT contain the pattern "mk" from the "Persons" table, by using the NOT keyword. We use the following SELECT statement:
SELECT * FROM Persons WHERE City NOT LIKE '%mk%'
SQL Wildcards
SQL wildcards can be used when searching for data in a database.
SQL wildcards can substitute for one or more characters when searching for data in a database. SQL wildcards must be used with the SQL LIKE operator. With SQL, the following wildcards can be used:
Wildcard                   | Description
%                          | A substitute for zero or more characters
_                          | A substitute for exactly one character
[charlist]                 | Any single character in charlist
[^charlist] or [!charlist] | Any single character not in charlist
Next, we want to select the persons with a first name that starts with "P", followed by any character, followed by "an", followed by any character, followed by "v" from the "Persons" table. We use the following SELECT statement:
SELECT * FROM Persons WHERE FirstName LIKE 'P_an_v'
Next, we want to select the persons with a last name that does not start with "b", "s" or "p" from the "Persons" table. We use the following SELECT statement:
SELECT * FROM Persons WHERE LastName LIKE '[!bsp]%'
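The LIKE patterns above can be checked in SQLite via Python's sqlite3 module, with a few caveats. SQLite implements the % and _ wildcards but not the [charlist] forms, which belong to SQL Server and MS Access. The last query therefore emulates `LIKE '[!bsp]%'` with `substr()`. Also, SQLite's LIKE is case-insensitive for ASCII letters by default. Only the columns needed here are created.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Persons (P_Id int, LastName text, FirstName text, City text)")
con.executemany("INSERT INTO Persons VALUES (?, ?, ?, ?)", [
    (1, "Kumari", "Mounitha", "Bangalore"),
    (2, "Kumar", "Pranav", "Bangalore"),
    (3, "Gubbi", "Sharan", "Tumkur"),
])

def ids(condition):
    """Return the P_Id values of the rows matching a (fixed, trusted) WHERE condition."""
    sql = "SELECT P_Id FROM Persons WHERE " + condition + " ORDER BY P_Id"
    return [row[0] for row in con.execute(sql)]

starts_with_b = ids("City LIKE 'B%'")      # % matches zero or more characters
ends_with_r = ids("City LIKE '%r'")
contains_mk = ids("City LIKE '%mk%'")
not_mk = ids("City NOT LIKE '%mk%'")
pranav = ids("FirstName LIKE 'P_an_v'")    # _ matches exactly one character
# SQLite has no [charlist] wildcard; this emulates LIKE '[!bsp]%'.
not_bsp = ids("lower(substr(LastName, 1, 1)) NOT IN ('b', 's', 'p')")
print(starts_with_b, ends_with_r, contains_mk, not_mk, pranav, not_bsp)
```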
SQL IN Operator
The IN Operator
The IN operator allows you to specify multiple values in a WHERE clause.
IN Operator Example
The "Persons" table:
P_Id | LastName | FirstName | Address  | City
1    | Kumari   | Mounitha  | VPura    | Bangalore
2    | Kumar    | Pranav    | Yelhanka | Bangalore
3    | Gubbi    | Sharan    | Hebbal   | Tumkur
Now we want to select the persons with a last name equal to "Kumari" or "Gubbi" from the table above. We use the following SELECT statement:
SELECT * FROM Persons WHERE LastName IN ('Kumari','Gubbi')
SQL BETWEEN Syntax
SELECT column_name(s) FROM table_name WHERE column_name BETWEEN value1 AND value2
Now we want to select the persons with a last name alphabetically between "Gubbi" and "Kumari" from the table above. We use the following SELECT statement:
SELECT * FROM Persons WHERE LastName BETWEEN 'Gubbi' AND 'Kumari'
Note: The BETWEEN operator is treated differently in different databases. In some databases, a person with the last name "Gubbi" or "Kumari" will not be listed (BETWEEN only selects fields that are between and excluding the test values). In other databases, a person with the last name "Gubbi" or "Kumari" will be listed (BETWEEN selects fields that are between and including the test values). And in other databases, a person with the last name "Gubbi" will be listed, but "Kumari" will not (BETWEEN selects fields between the test values, including the first test value and excluding the last test value). Therefore, check how your database treats the BETWEEN operator.
Example 2
To display the persons outside the range in the previous example, use NOT BETWEEN:
SELECT * FROM Persons WHERE LastName NOT BETWEEN 'Gubbi' AND 'Kumari'
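A small SQLite check of IN, BETWEEN and NOT BETWEEN through Python's sqlite3 module. A fourth person, Nilsen, is taken from the ORDER BY example so that NOT BETWEEN has something to return. In SQLite, BETWEEN includes both test values. The lower bound must come first: with the bounds reversed, nothing is selected.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Persons (P_Id int, LastName text)")
con.executemany("INSERT INTO Persons VALUES (?, ?)",
                [(1, "Kumari"), (2, "Kumar"), (3, "Gubbi"), (4, "Nilsen")])

def ids(condition):
    """Return the P_Id values of the rows matching a (fixed, trusted) WHERE condition."""
    sql = "SELECT P_Id FROM Persons WHERE " + condition + " ORDER BY P_Id"
    return [row[0] for row in con.execute(sql)]

in_list = ids("LastName IN ('Kumari', 'Gubbi')")
between = ids("LastName BETWEEN 'Gubbi' AND 'Kumari'")         # both ends included here
not_between = ids("LastName NOT BETWEEN 'Gubbi' AND 'Kumari'")
reversed_bounds = ids("LastName BETWEEN 'Kumari' AND 'Gubbi'")  # empty: lower bound must come first
print(in_list, between, not_between, reversed_bounds)
```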
SQL Alias
With SQL, an alias name can be given to a table or to a column.
You can give a table or a column another name by using an alias. This can be a good thing to do if you have very long or complex table names or column names. An alias name could be anything, but usually it is short.
SQL Alias Syntax for Tables
SELECT column_name(s) FROM table_name AS alias_name
SQL Alias Syntax for Columns
SELECT column_name AS alias_name FROM table_name
Alias Example
Assume we have a table called "Persons" and another table called "Product_Orders". We will give the tables the aliases "p" and "po", respectively. Now we want to list all the orders that "Mounitha Kumari" is responsible for. We use the following SELECT statement:
SELECT po.OrderID, p.LastName, p.FirstName FROM Persons AS p, Product_Orders AS po WHERE p.LastName='Kumari' AND p.FirstName='Mounitha'
The same SELECT statement without aliases:
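SELECT Product_Orders.OrderID, Persons.LastName, Persons.FirstName FROM Persons, Product_Orders WHERE Persons.LastName='Kumari' AND Persons.FirstName='Mounitha'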
As you can see from the two SELECT statements above, aliases can make queries easier both to write and to read.
SQL Joins
SQL joins are used to query data from two or more tables, based on a relationship between certain columns in these tables.
SQL JOIN
The JOIN keyword is used in an SQL statement to query data from two or more tables, based on a relationship between certain columns in these tables. Tables in a database are often related to each other with keys. A primary key is a column (or a combination of columns) with a unique value for each row. Each primary key value must be unique within the table. The purpose is to bind data together, across tables, without repeating all of the data in every table. Look at the "Persons" table:
P_Id | LastName | FirstName | Address  | City
1    | Kumari   | Mounitha  | VPura    | Bangalore
2    | Kumar    | Pranav    | Yelhanka | Bangalore
3    | Gubbi    | Sharan    | Hebbal   | Tumkur
Note that the "P_Id" column is the primary key in the "Persons" table. This means that no two rows can have the same P_Id. The P_Id distinguishes two persons even if they have the same name. Next, we have the "Orders" table:
O_Id | OrderNo | P_Id
1    | 77895   | 3
2    | 44678   | 3
3    | 22456   | 1
4    | 24562   | 1
5    | 34764   | 15
Note that the "O_Id" column is the primary key in the "Orders" table and that the "P_Id" column refers to the persons in the "Persons" table without using their names. Notice that the relationship between the two tables above is the "P_Id" column.
Before we continue with examples, we will list the types of JOIN you can use, and the differences between them.
- JOIN: Return rows when there is at least one match in both tables
- LEFT JOIN: Return all rows from the left table, even if there are no matches in the right table
- RIGHT JOIN: Return all rows from the right table, even if there are no matches in the left table
- FULL JOIN: Return rows when there is a match in one of the tables
SQL INNER JOIN Syntax
SELECT column_name(s) FROM table_name1 INNER JOIN table_name2 ON table_name1.column_name=table_name2.column_name
PS: INNER JOIN is the same as JOIN.
The "Orders" table:
O_Id | OrderNo | P_Id
1    | 77895   | 3
2    | 44678   | 3
3    | 22456   | 1
4    | 24562   | 1
5    | 34764   | 15
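Now we want to list all the persons with any orders. We use the following SELECT statement:
SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo FROM Persons INNER JOIN Orders ON Persons.P_Id=Orders.P_Id ORDER BY Persons.LastName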
The INNER JOIN keyword returns rows when there is at least one match in both tables. If there are rows in "Persons" that do not have matches in "Orders", those rows will NOT be listed.
SQL LEFT JOIN Syntax
SELECT column_name(s) FROM table_name1 LEFT JOIN table_name2 ON table_name1.column_name=table_name2.column_name
PS: In some databases LEFT JOIN is called LEFT OUTER JOIN.
The "Orders" table:
O_Id | OrderNo | P_Id
1    | 77895   | 3
2    | 44678   | 3
3    | 22456   | 1
4    | 24562   | 1
5    | 34764   | 15
Now we want to list all the persons and their orders - if any, from the tables above. We use the following SELECT statement:
SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo FROM Persons LEFT JOIN Orders ON Persons.P_Id=Orders.P_Id ORDER BY Persons.LastName
The result-set will look like this:
LastName | FirstName | OrderNo
Gubbi    | Sharan    | 77895
Gubbi    | Sharan    | 44678
Kumar    | Pranav    |
Kumari   | Mounitha  | 22456
Kumari   | Mounitha  | 24562
The LEFT JOIN keyword returns all the rows from the left table (Persons), even if there are no matches in the right table (Orders).
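The INNER JOIN and LEFT JOIN examples can be run against the same two tables in SQLite via Python's sqlite3 module. RIGHT and FULL JOIN are left out because SQLite only supports them from version 3.39.0. Orders.OrderNo is added to the ORDER BY so the row order is fully determined. Kumar appears only in the LEFT JOIN result, with a NULL (None) order number. Order 34764, whose P_Id 15 matches no person, appears in neither result.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Persons (P_Id int, LastName text, FirstName text)")
con.execute("CREATE TABLE Orders (O_Id int, OrderNo int, P_Id int)")
con.executemany("INSERT INTO Persons VALUES (?, ?, ?)",
                [(1, "Kumari", "Mounitha"), (2, "Kumar", "Pranav"), (3, "Gubbi", "Sharan")])
con.executemany("INSERT INTO Orders VALUES (?, ?, ?)",
                [(1, 77895, 3), (2, 44678, 3), (3, 22456, 1), (4, 24562, 1), (5, 34764, 15)])

def run(join_kind):
    """Join Persons to Orders with the given join keyword and return (LastName, OrderNo) rows."""
    sql = ("SELECT Persons.LastName, Orders.OrderNo FROM Persons "
           f"{join_kind} Orders ON Persons.P_Id=Orders.P_Id "
           "ORDER BY Persons.LastName, Orders.OrderNo")
    return con.execute(sql).fetchall()

inner = run("INNER JOIN")
left = run("LEFT JOIN")
print(inner)
print(left)
```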
SQL RIGHT JOIN Syntax
SELECT column_name(s) FROM table_name1 RIGHT JOIN table_name2 ON table_name1.column_name=table_name2.column_name
PS: In some databases RIGHT JOIN is called RIGHT OUTER JOIN.
The "Orders" table:
O_Id | OrderNo | P_Id
1    | 77895   | 3
2    | 44678   | 3
3    | 22456   | 1
4    | 24562   | 1
5    | 34764   | 15
Now we want to list all the orders, together with the persons who placed them - if any, from the tables above. We use the following SELECT statement:
SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo FROM Persons RIGHT JOIN Orders ON Persons.P_Id=Orders.P_Id ORDER BY Persons.LastName
The result-set will look like this:
LastName | FirstName | OrderNo
Gubbi    | Sharan    | 77895
Gubbi    | Sharan    | 44678
Kumari   | Mounitha  | 22456
Kumari   | Mounitha  | 24562
         |           | 34764
The RIGHT JOIN keyword returns all the rows from the right table (Orders), even if there are no matches in the left table (Persons).
SQL FULL JOIN Syntax
SELECT column_name(s) FROM table_name1 FULL JOIN table_name2 ON table_name1.column_name=table_name2.column_name
The "Orders" table:
O_Id | OrderNo | P_Id
1    | 77895   | 3
2    | 44678   | 3
3    | 22456   | 1
4    | 24562   | 1
5    | 34764   | 15
Now we want to list all the persons and their orders, and all the orders with their persons. We use the following SELECT statement:
SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo FROM Persons FULL JOIN Orders ON Persons.P_Id=Orders.P_Id ORDER BY Persons.LastName
The result-set will look like this:
LastName | FirstName | OrderNo
Gubbi    | Sharan    | 77895
Gubbi    | Sharan    | 44678
Kumar    | Pranav    |
Kumari   | Mounitha  | 22456
Kumari   | Mounitha  | 24562
         |           | 34764
The FULL JOIN keyword returns all the rows from the left table (Persons) and all the rows from the right table (Orders). If there are rows in "Persons" that do not have matches in "Orders", or if there are rows in "Orders" that do not have matches in "Persons", those rows will be listed as well.
SQL UNION Syntax
SELECT column_name(s) FROM table_name1 UNION SELECT column_name(s) FROM table_name2
Note: The UNION operator selects only distinct values by default. To allow duplicate values, use UNION ALL.
SQL UNION ALL Syntax
SELECT column_name(s) FROM table_name1 UNION ALL SELECT column_name(s) FROM table_name2
PS: The column names in the result-set of a UNION are always equal to the column names in the first SELECT statement in the UNION.
Now we want to list all the different employees in India and USA. We use the following SELECT statement:
SELECT E_Name FROM Employees_India UNION SELECT E_Name FROM Employees_USA
Note: This command cannot be used to list all employees in India and USA. In the example above we have two employees with equal names, and only one of them will be listed. The UNION command selects only distinct values.
SELECT E_Name FROM Employees_India UNION ALL SELECT E_Name FROM Employees_USA
Result:
E_Name
Kumari, Mounitha
Kumar, Pranav
Kumar, Stephen
Gubbi, Sharan
Turner, Sally
Kent, Clark
Kumar, Stephen
Scott, Stephen
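The difference between UNION and UNION ALL can be reproduced in SQLite via Python's sqlite3 module. The split of the eight names between Employees_India and Employees_USA is an assumption read off the UNION ALL result above: the first four are taken as India, the last four as USA. "Kumar, Stephen" is in both tables, so UNION returns 7 rows and UNION ALL returns 8.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employees_India (E_Name text)")
con.execute("CREATE TABLE Employees_USA (E_Name text)")
con.executemany("INSERT INTO Employees_India VALUES (?)",
                [("Kumari, Mounitha",), ("Kumar, Pranav",), ("Kumar, Stephen",), ("Gubbi, Sharan",)])
con.executemany("INSERT INTO Employees_USA VALUES (?)",
                [("Turner, Sally",), ("Kent, Clark",), ("Kumar, Stephen",), ("Scott, Stephen",)])

# UNION removes duplicate rows; UNION ALL keeps every row from both SELECTs.
union_rows = con.execute("SELECT E_Name FROM Employees_India "
                         "UNION SELECT E_Name FROM Employees_USA").fetchall()
union_all_rows = con.execute("SELECT E_Name FROM Employees_India "
                             "UNION ALL SELECT E_Name FROM Employees_USA").fetchall()
print(len(union_rows), len(union_all_rows))
```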
SQL SELECT INTO Syntax
SELECT column_name(s) INTO new_table_name [IN externaldatabase] FROM old_table_name
SELECT Persons.LastName,Orders.OrderNo INTO Persons_Order_Backup FROM Persons INNER JOIN Orders ON Persons.P_Id=Orders.P_Id
SQL CREATE TABLE Syntax
CREATE TABLE table_name
(
column_name1 data_type,
column_name2 data_type,
column_name3 data_type,
....
)
The data type specifies what type of data the column can hold.
CREATE TABLE Persons
(
P_Id int,
LastName varchar(255),
FirstName varchar(255),
Address varchar(255),
City varchar(255)
)
The P_Id column is of type int and will hold a number. The LastName, FirstName, Address, and City columns are of type varchar with a maximum length of 255 characters. The empty "Persons" table will now look like this:
P_Id | LastName | FirstName | Address | City
The empty table can be filled with data using the INSERT INTO statement.
SQL Constraints
Constraints are used to limit the type of data that can go into a table. Constraints can be specified when a table is created (with the CREATE TABLE statement) or after the table is created (with the ALTER TABLE statement). We will focus on the following constraints:
The following SQL enforces the "P_Id" column and the "LastName" column to not accept NULL values:
CREATE TABLE Persons ( P_Id int NOT NULL, LastName varchar(255) NOT NULL, FirstName varchar(255), Address varchar(255), City varchar(255) )
The following SQL creates a PRIMARY KEY on the "P_Id" column when the "Persons" table is created:
CREATE TABLE Persons ( P_Id int NOT NULL, LastName varchar(255) NOT NULL, FirstName varchar(255), Address varchar(255), City varchar(255), PRIMARY KEY (P_Id) )
To allow naming of a PRIMARY KEY constraint, and for defining a PRIMARY KEY constraint on multiple columns, use the following SQL syntax:
Note that the "P_Id" column in the "Orders" table points to the "P_Id" column in the "Persons" table. The "P_Id" column in the "Persons" table is the PRIMARY KEY in the "Persons" table. The "P_Id" column in the "Orders" table is a FOREIGN KEY in the "Orders" table. The FOREIGN KEY constraint is used to prevent actions that would destroy links between tables. The FOREIGN KEY constraint also prevents invalid data from being inserted into the foreign key column, because the value has to be one of the values contained in the table it points to.
CREATE TABLE Orders ( O_Id int NOT NULL, OrderNo int NOT NULL, P_Id int, PRIMARY KEY (O_Id), FOREIGN KEY (P_Id) REFERENCES Persons(P_Id) )
To allow naming of a FOREIGN KEY constraint, and for defining a FOREIGN KEY constraint on multiple columns, use the following SQL syntax:
Now we want to add a column named "DateOfBirth" in the "Persons" table. We use the following SQL statement:
ALTER TABLE Persons ADD DateOfBirth date
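The "Persons" table will now look like this:
P_Id | LastName | FirstName | Address  | City      | DateOfBirth
1    | Kumari   | Mounitha  | VPura    | Bangalore |
2    | Kumar    | Pranav    | Yelhanka | Bangalore |
3    | Gubbi    | Sharan    | Hebbal   | Tumkur    |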
SQL Views
A view is a virtual table. This chapter shows how to create, update, and delete a view.
You can add SQL functions, WHERE, and JOIN statements to a view and present the data as if the data were coming from one single table.
SQL CREATE VIEW Syntax
CREATE VIEW view_name AS SELECT column_name(s) FROM table_name WHERE condition
Note: A view always shows up-to-date data! The database engine recreates the data, using the view's SQL statement, every time a user queries a view.
CREATE VIEW [Current Product List] AS SELECT ProductID,ProductName FROM Products WHERE Discontinued=No
We can query the view above as follows:
SELECT * FROM [Current Product List]
CREATE VIEW [Products Above Average Price] AS SELECT ProductName,UnitPrice FROM Products WHERE UnitPrice>(SELECT AVG(UnitPrice) FROM Products)
We can query the view above as follows:
SELECT * FROM [Products Above Average Price]
CREATE VIEW [Category Sales For 1997] AS SELECT DISTINCT CategoryName,Sum(ProductSales) AS CategorySales FROM [Product Sales for 1997] GROUP BY CategoryName
We can query the view above as follows:
SELECT * FROM [Category Sales For 1997]
We can also add a condition to the query. Now we want to see the total sales only for the category "Beverages":
SELECT * FROM [Category Sales For 1997] WHERE CategoryName='Beverages'
SQL CREATE OR REPLACE VIEW Syntax
CREATE OR REPLACE VIEW view_name AS SELECT column_name(s) FROM table_name WHERE condition
Now we want to add the "Category" column to the "Current Product List" view. We will update the view with the following SQL:
CREATE OR REPLACE VIEW [Current Product List] AS SELECT ProductID,ProductName,Category FROM Products WHERE Discontinued=No
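A view stores a query, not rows, so it always reflects the current table contents. The sketch below demonstrates this in SQLite via Python's sqlite3 module. The Products rows are hypothetical. Discontinued is stored as 0/1 because SQLite has no Yes/No type. SQLite also lacks CREATE OR REPLACE VIEW, so the view definition is changed by dropping and re-creating it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Products (ProductID int, ProductName text, Category text, "
            "UnitPrice real, Discontinued int)")
con.executemany("INSERT INTO Products VALUES (?, ?, ?, ?, ?)", [  # hypothetical rows
    (1, "Chai", "Beverages", 18.0, 0),
    (2, "Chang", "Beverages", 19.0, 0),
    (3, "Aniseed Syrup", "Condiments", 10.0, 1),
])
con.execute('CREATE VIEW "Current Product List" AS '
            'SELECT ProductID, ProductName FROM Products WHERE Discontinued = 0')

query = 'SELECT ProductName FROM "Current Product List" ORDER BY ProductID'
before = con.execute(query).fetchall()
# The view holds no data of its own: the change below is visible on the next query.
con.execute("UPDATE Products SET Discontinued = 1 WHERE ProductName = 'Chang'")
after = con.execute(query).fetchall()

# SQLite has no CREATE OR REPLACE VIEW; drop and re-create to add the Category column.
con.execute('DROP VIEW "Current Product List"')
con.execute('CREATE VIEW "Current Product List" AS '
            'SELECT ProductID, ProductName, Category FROM Products WHERE Discontinued = 0')
cols = [d[0] for d in con.execute('SELECT * FROM "Current Product List"').description]
print(before, after, cols)
```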
AVG() - Returns the average value
COUNT() - Returns the number of rows
FIRST() - Returns the first value
LAST() - Returns the last value
MAX() - Returns the largest value
MIN() - Returns the smallest value
SUM() - Returns the sum
UCASE() - Converts a field to upper case
LCASE() - Converts a field to lower case
MID() - Extracts characters from a text field
LEN() - Returns the length of a text field
ROUND() - Rounds a numeric field to the number of decimals specified
NOW() - Returns the current system date and time
FORMAT() - Formats how a field is to be displayed
Tip: The aggregate functions and the scalar functions will be explained in detail in the next chapters.
Now we want to find the average value of the "OrderPrice" fields. We use the following SQL statement:
SELECT AVG(OrderPrice) FROM Orders
Now we want to count the number of orders from customer "Nilsen". We use the following SQL statement:
SELECT COUNT(Customer) FROM Orders WHERE Customer='Nilsen'
Now we want to find the largest value of the "OrderPrice" column. We use the following SQL statement:
SELECT MAX(OrderPrice) AS LargestOrderPrice FROM Orders
The result-set will look like this:
LargestOrderPrice
2000
Now we want to find the smallest value of the "OrderPrice" column. We use the following SQL statement:
SELECT MIN(OrderPrice) FROM Orders
Now we want to find the sum of all "OrderPrice" fields. We use the following SQL statement:
SELECT SUM(OrderPrice) FROM Orders
SQL GROUP BY Syntax
SELECT column_name, aggregate_function(column_name) FROM table_name WHERE column_name operator value GROUP BY column_name
Now we want to find the total sum (total order) of each customer. We will have to use the GROUP BY statement to group the customers. We use the following SQL statement:
SELECT Customer,SUM(OrderPrice) FROM Orders GROUP BY Customer
If we omit the GROUP BY statement (SELECT Customer,SUM(OrderPrice) FROM Orders), the result-set is not what we wanted. The reason this SELECT statement cannot be used is that it specifies two columns (Customer and SUM(OrderPrice)). "SUM(OrderPrice)" returns a single value (the total sum of the "OrderPrice" column), while "Customer" returns 6 values (one value for each row in the "Orders" table). This will therefore not give us the correct result. The GROUP BY statement solves this problem.
SQL HAVING Syntax
SELECT column_name, aggregate_function(column_name) FROM table_name WHERE column_name operator value GROUP BY column_name HAVING aggregate_function(column_name) operator value
Now we want to find out whether any of the customers have a total order of less than 2000. We use the following SQL statement:
SELECT Customer,SUM(OrderPrice) FROM Orders GROUP BY Customer HAVING SUM(OrderPrice)<2000
Now we want to find if the customers "Kumari" or "Jensen" have a total order of more than 1500. We add an ordinary WHERE clause to the SQL statement:
SELECT Customer,SUM(OrderPrice) FROM Orders WHERE Customer='Kumari' OR Customer='Jensen' GROUP BY Customer HAVING SUM(OrderPrice)>1500
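The aggregate, GROUP BY and HAVING queries can be compared side by side in SQLite via Python's sqlite3 module. The "Orders" table itself is not reproduced above, so the six rows below are hypothetical. They are chosen so that the largest OrderPrice is 2000, as in the MAX example. A plain aggregate collapses the whole table into one value, GROUP BY yields one value per customer, and HAVING filters those groups.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Orders (O_Id int, OrderPrice int, Customer text)")
con.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [  # hypothetical rows
    (1, 1000, "Kumari"), (2, 1600, "Nilsen"), (3, 700, "Kumari"),
    (4, 300, "Kumari"), (5, 2000, "Jensen"), (6, 100, "Nilsen"),
])

# Without GROUP BY, each aggregate returns a single value for the whole table.
total, largest = con.execute("SELECT SUM(OrderPrice), MAX(OrderPrice) FROM Orders").fetchone()

# GROUP BY computes one SUM per customer.
per_customer = dict(con.execute(
    "SELECT Customer, SUM(OrderPrice) FROM Orders GROUP BY Customer").fetchall())

# HAVING filters groups after aggregation (an aggregate cannot appear in WHERE).
under_2000 = con.execute(
    "SELECT Customer, SUM(OrderPrice) FROM Orders "
    "GROUP BY Customer HAVING SUM(OrderPrice) < 2000").fetchall()

# WHERE filters rows before grouping; HAVING then filters the resulting groups.
over_1500 = con.execute(
    "SELECT Customer, SUM(OrderPrice) FROM Orders "
    "WHERE Customer='Kumari' OR Customer='Jensen' "
    "GROUP BY Customer HAVING SUM(OrderPrice) > 1500 ORDER BY Customer").fetchall()
print(total, largest, per_customer, under_2000, over_1500)
```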
The result-set will look like this:
Now we want to select the content of the "LastName" and "FirstName" columns above, and convert the "LastName" column to uppercase. We use the following SELECT statement:
SELECT UCASE(LastName) AS LastName, FirstName FROM Persons
Now we want to select the content of the "LastName" and "FirstName" columns above, and convert the "LastName" column to lowercase. We use the following SELECT statement:
SELECT LCASE(LastName) AS LastName, FirstName FROM Persons
Now we want to extract the first four characters of the "City" column above. We use the following SELECT statement:
SELECT MID(City,1,4) FROM Persons
Now we want to select the length of the values in the "Address" column above. We use the following SELECT statement:
SELECT LEN(Address) FROM Persons
The ROUND() function is used to round a numeric field to the number of decimals specified.
Now we want to display the product name and the price rounded to the nearest integer. We use the following SELECT statement:
SELECT ProductName, ROUND(UnitPrice,0) AS UnitPrice FROM Products
Part of the "Products" table:
Prod_Id | ProductName | Unit   | UnitPrice
2       | Mascarpone  | 1000 g | 32.56
3       | Gorgonzola  | 1000 g | 15.67
Now we want to display the products and prices together with today's date. We use the following SELECT statement:
SELECT ProductName, UnitPrice, Now() FROM Products
Now we want to display the products and prices together with today's date, displayed in the format "YYYY-MM-DD". We use the following SELECT statement:
SELECT ProductName, UnitPrice, FORMAT(Now(),'YYYY-MM-DD') FROM Products
SQL quick reference (statement: syntax)
ALTER TABLE: ALTER TABLE table_name ADD column_name datatype
CREATE INDEX: CREATE INDEX index_name ON table_name (column_name)
DELETE: DELETE FROM table_name or DELETE * FROM table_name (Note: both delete all rows; the empty table itself remains)
DROP DATABASE: DROP DATABASE database_name
DROP INDEX: DROP INDEX table_name.index_name (SQL Server); DROP INDEX index_name ON table_name (MS Access); DROP INDEX index_name (DB2/Oracle); ALTER TABLE table_name DROP INDEX index_name (MySQL)
DROP TABLE: DROP TABLE table_name
GROUP BY: SELECT column_name, aggregate_function(column_name) FROM table_name WHERE column_name operator value GROUP BY column_name
HAVING: SELECT column_name, aggregate_function(column_name) FROM table_name WHERE column_name operator value GROUP BY column_name HAVING aggregate_function(column_name) operator value
IN: SELECT column_name(s) FROM table_name WHERE column_name IN (value1,value2,...)
INSERT INTO: INSERT INTO table_name VALUES (value1, value2, value3,...) or INSERT INTO table_name (column1, column2, column3,...) VALUES (value1, value2, value3,...)
INNER JOIN: SELECT column_name(s) FROM table_name1 INNER JOIN table_name2 ON table_name1.column_name=table_name2.column_name
LEFT JOIN: SELECT column_name(s) FROM table_name1 LEFT JOIN table_name2 ON table_name1.column_name=table_name2.column_name
RIGHT JOIN: SELECT column_name(s) FROM table_name1 RIGHT JOIN table_name2 ON table_name1.column_name=table_name2.column_name
FULL JOIN: SELECT column_name(s) FROM table_name1 FULL JOIN table_name2 ON table_name1.column_name=table_name2.column_name
LIKE: SELECT column_name(s) FROM table_name WHERE column_name LIKE pattern
ORDER BY: SELECT column_name(s) FROM table_name ORDER BY column_name [ASC|DESC]
SELECT: SELECT column_name(s) FROM table_name
SELECT *: SELECT * FROM table_name
SELECT DISTINCT: SELECT DISTINCT column_name(s) FROM table_name
SELECT INTO: SELECT * INTO new_table_name [IN externaldatabase] FROM old_table_name or SELECT column_name(s) INTO new_table_name [IN externaldatabase] FROM old_table_name
SELECT TOP: SELECT TOP number|percent column_name(s) FROM table_name
TRUNCATE TABLE: TRUNCATE TABLE table_name
UNION: SELECT column_name(s) FROM table_name1 UNION SELECT column_name(s) FROM table_name2
UNION ALL: SELECT column_name(s) FROM table_name1 UNION ALL SELECT column_name(s) FROM table_name2
UPDATE: UPDATE table_name SET column1=value, column2=value,... WHERE some_column=some_value
WHERE: SELECT column_name(s) FROM table_name WHERE column_name operator value
INFORMAL DESIGN GUIDELINES FOR RELATIONAL SCHEMAS
1. Semantics of the attributes
2. Reducing redundant values in tuples
3. Reducing null values in tuples
4. Disallowing spurious tuples
1. Semantics of the Attributes
Whenever we form a relational schema, its attributes should carry a clear meaning. This meaning is called semantics; it relates each attribute to the others through the real-world relationship that the relation represents. Eg: STUDENT(USN No, Student name, Sem)
2. Reducing the Redundant Values in Tuples
Mixing attributes of multiple entities in one relation may cause problems:
- information is stored redundantly, wasting storage;
- update anomalies: insertion anomalies, deletion anomalies and modification anomalies.
One goal of schema design is to minimize the storage space that the base relations occupy. Grouping attributes into relation schemas has a significant effect on storage space. Eg: STUDENT(USN No, Student name, Sem) and DEPARTMENT(Dept No, Dept Name)
If we combine these two into a single relation, STUDENT(USN No, Student name, Sem, Dept No, Dept Name):
Whenever we insert tuples into this relation, a department with N students has its Dept No and Dept Name values repeated N times, which leads to data redundancy. It also causes update anomalies:
- Insertion anomaly: a new department that has no students yet cannot be inserted without putting NULL values in the student attributes.
- Deletion anomaly: if we delete the last student of a department, all information about that department is lost.
- Modification anomaly: if we change an attribute of a department (for example its Dept Name), we must update the tuples of all students belonging to that department; otherwise the database becomes inconsistent.
Note: Design the schema so that no insertion, deletion or modification anomalies occur.
3. Reducing Null Values in Tuples
Note: Relations should be designed so that their tuples have as few NULL values as possible. Attributes that are frequently NULL could be placed in separate relations (together with the primary key). Reasons for nulls: the attribute is not applicable or invalid; the attribute value is unknown (it may exist).
Functional dependency
1. Functional dependencies (FDs) are used to specify formal measures of the "goodness" of relational designs.
2. FDs and keys are used to define normal forms for relations.
3. FDs are constraints that are derived from the meaning and interrelationships of the data attributes.
4. X -> Y: a set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.
5. X -> Y holds if whenever two tuples have the same value for X, they must have the same value for Y.
6. For any two tuples t1 and t2 in any relation instance r(R): if t1[X] = t2[X], then t1[Y] = t2[Y].
7. X -> Y in R specifies a constraint on all relation instances r(R).
8. It is written as X -> Y and can be displayed graphically on a relation schema (as in the figures) by an arrow from X to Y.
9. FDs are derived from the real-world constraints on the attributes.
10. Social security number determines employee name: SSN -> ENAME
11. Project number determines project name and location: PNUMBER -> {PNAME, PLOCATION}
12. Employee SSN and project number determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS
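Item 6 gives a direct test that can be run on a relation instance. The sketch below (the function name and sample rows are hypothetical) reports whether any two tuples agree on X but differ on Y. It can only refute an FD, never prove one, because FDs come from the meaning of the attributes (item 9).

```python
def fd_holds(rows, lhs, rhs):
    """Return False if the instance violates X -> Y, True otherwise.

    rows: list of dicts mapping attribute names to values.
    lhs, rhs: lists of attribute names (X and Y).
    Checks item 6: if t1[X] = t2[X] then t1[Y] = t2[Y].
    """
    y_for_x = {}  # first Y-value seen for each distinct X-value
    for t in rows:
        x_val = tuple(t[a] for a in lhs)
        y_val = tuple(t[a] for a in rhs)
        if y_for_x.setdefault(x_val, y_val) != y_val:
            return False  # two tuples agree on X but differ on Y
    return True

# Hypothetical instance of EMP_PROJ-like data.
works_on = [
    {"SSN": "123456789", "ENAME": "Smith", "PNUMBER": 1, "HOURS": 32.5},
    {"SSN": "123456789", "ENAME": "Smith", "PNUMBER": 2, "HOURS": 7.5},
    {"SSN": "453453453", "ENAME": "English", "PNUMBER": 1, "HOURS": 20.0},
]
print(fd_holds(works_on, ["SSN"], ["ENAME"]))             # True
print(fd_holds(works_on, ["PNUMBER"], ["HOURS"]))         # False
print(fd_holds(works_on, ["SSN", "PNUMBER"], ["HOURS"]))  # True
```

Using a dictionary keyed by the X-value compares each tuple with the first tuple that had the same X-value, which is equivalent to checking every pair but runs in linear time.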
The purpose of second normal form (2NF) is to eliminate partial key dependencies: each non-key attribute must depend on the whole key, not just on a part of it.
Third normal form (3NF) further eliminates redundant information by removing dependencies between non-key attributes. A relation is in 3NF if it is already in 2NF and no non-key attribute depends on another non-key attribute.
General Normal Form Definitions (For Multiple Keys)
The definitions above consider the primary key only. The following more general definitions take into account relations with multiple candidate keys. (A prime attribute is an attribute that is a member of some candidate key; all other attributes are non-prime.)
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on every key of R.
Definition: A superkey of relation schema R is a set of attributes S of R that contains a key of R.
A relation schema R is in third normal form (3NF) if, whenever a nontrivial FD X -> A holds in R, either (a) X is a superkey of R, or (b) A is a prime attribute of R.
Example 1
CUSTOMER
CustomerID  Firstname  Surname    City      PostCode
12123       Harry      Enfield    London    SW7 2AP
12443       Leona      Lewis      London    WC2H 7JY
354         Sarah      Brightman  Coventry  CV4 7AL
The Description of what the certificate means can be obtained from the Certificate attribute; it does not need to depend on the primary key VideoID. So split it out into its own table and use the primary key / foreign key approach.
Example 3
CLIENT
ClientID  CinemaID*  CinemaAddress
12123     LON23      1 Leicester Square, London
12443     COV2       34 Bramby St, Coventry
354       MAN4       56 Croydon Rd, Manchester
CINEMAS
CinemaID  CinemaAddress
LON23     1 Leicester Square, London
In this case the database is almost in 3NF: the CinemaAddress is repeated in the CLIENT table even though it can be obtained from the CINEMAS table through CinemaID (CinemaAddress depends on the non-key attribute CinemaID). So simply remove the CinemaAddress column from the CLIENT table.
BOYCE-CODD NORMAL FORM (BCNF)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.
Each normal form is strictly stronger than the previous one:
- Every 2NF relation is in 1NF.
- Every 3NF relation is in 2NF.
- Every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF. The goal is to have each relation in BCNF (or 3NF).
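The 3NF and BCNF definitions can be checked mechanically once the FDs are known. This sketch (all function names hypothetical) computes attribute closures to find superkeys and candidate keys, then classifies each FD X -> A. It checks only the FDs listed in F, which is enough when F is given directly on R, and it finds keys by brute force, so it is meant for small textbook schemas. The TEACH schema with {STUDENT, COURSE} -> INSTRUCTOR and INSTRUCTOR -> COURSE is an assumed example of a relation that is in 3NF but not in BCNF; EMP_DEPT is an assumed example that violates both.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal superkeys, found by brute force (fine for small schemas)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if closure(s, fds) >= schema and not any(k <= s for k in keys):
                keys.append(s)
    return keys

def normal_form_violations(schema, fds):
    """Return (3NF violations, BCNF violations) as lists of (X, A) pairs."""
    prime = set().union(*candidate_keys(schema, fds))
    not_3nf, not_bcnf = [], []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= schema:
            continue                   # X is a superkey: fine for 3NF and BCNF
        for a in sorted(rhs - lhs):    # only the nontrivial part of X -> Y
            not_bcnf.append((set(lhs), a))
            if a not in prime:         # 3NF still allows a prime attribute A
                not_3nf.append((set(lhs), a))
    return not_3nf, not_bcnf

TEACH = {"STUDENT", "COURSE", "INSTRUCTOR"}
teach_fds = [({"STUDENT", "COURSE"}, {"INSTRUCTOR"}), ({"INSTRUCTOR"}, {"COURSE"})]
print(normal_form_violations(TEACH, teach_fds))
# ([], [({'INSTRUCTOR'}, 'COURSE')]): in 3NF, not in BCNF

EMP_DEPT = {"SSN", "DNUMBER", "DNAME"}
emp_dept_fds = [({"SSN"}, {"DNUMBER"}), ({"DNUMBER"}, {"DNAME"})]
print(normal_form_violations(EMP_DEPT, emp_dept_fds))
# ([({'DNUMBER'}, 'DNAME')], [({'DNUMBER'}, 'DNAME')]): violates both
```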
Dependency Preservation Property of a Decomposition
Definition: Given a set of dependencies F on R, the projection of F on Ri, denoted by $\pi_{R_i}(F)$ where Ri is a subset of R, is the set of dependencies X -> Y in F+ such that the attributes in $X \cup Y$ are all contained in Ri. Hence, the projection of F on each relation schema Ri in the decomposition D is the set of functional dependencies in F+, the closure of F, such that all their left-hand-side and right-hand-side attributes are in Ri.
Dependency Preservation Property: A decomposition D = {R1, R2, ..., Rm} of R is dependency-preserving with respect to F if the union of the projections of F on each Ri in D is equivalent to F; that is,
$$\left(\pi_{R_1}(F) \cup \dots \cup \pi_{R_m}(F)\right)^{+} = F^{+}$$
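Computing F+ to apply this definition directly is exponential in general. A well-known polynomial-time alternative, swapped in here rather than taken from the text above, tests each FD X -> Y in F on its own: start from Z = X and repeatedly add $(Z \cap R_i)^{+} \cap R_i$ for every Ri; the FD is preserved exactly when Y ends up inside Z. The relation R(A, B, C), its FDs and the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure: all attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_dependency_preserving(fds, decomposition):
    """True if every FD in fds is implied by the projections of fds on the Ri."""
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                # Attributes of Ri that Z determines, reasoning inside Ri only.
                gained = closure(z & ri, fds) & ri
                if not gained <= z:
                    z |= gained
                    changed = True
        if not rhs <= z:
            return False  # X -> Y cannot be enforced from the Ri separately
    return True

F = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(is_dependency_preserving(F, [{"A", "B"}, {"B", "C"}]))  # True
print(is_dependency_preserving(F, [{"A", "B"}, {"A", "C"}]))  # False: B -> C is lost
```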
Lossless (nonadditive) join test for n-ary decompositions. (c) Case 2: Decomposition of EMP_PROJ into EMP, PROJECT, and WORKS_ON satisfies test
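The n-ary test named in the caption is the matrix (chase) method: build one row per Ri, give it the distinguished symbol a in the columns of Ri and a unique b symbol elsewhere, then repeatedly equate symbols using the FDs; the decomposition is lossless if some row becomes all a. The sketch below assumes the FDs SSN -> ENAME, PNUMBER -> {PNAME, PLOCATION} and {SSN, PNUMBER} -> HOURS on EMP_PROJ(SSN, ENAME, PNUMBER, PNAME, PLOCATION, HOURS), and adds a second, assumed decomposition into EMP_LOCS(ENAME, PLOCATION) and EMP_PROJ1(SSN, PNUMBER, HOURS, PNAME, PLOCATION) as a failing case. The function name is hypothetical.

```python
def lossless_join(attrs, decomposition, fds):
    """Matrix (chase) test for the nonadditive join of an n-ary decomposition.

    attrs: list of attribute names of R; decomposition: list of sets R1..Rm;
    fds: list of (lhs, rhs) pairs of attribute sets.
    """
    # Row i: "a" in the columns of Ri, a unique ("b", i, column) symbol elsewhere.
    rows = [{col: "a" if col in ri else ("b", i, col) for col in attrs}
            for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Group the rows that currently agree on every attribute of lhs.
            groups = {}
            for row in rows:
                key = tuple(row[col] for col in sorted(lhs))
                groups.setdefault(key, []).append(row)
            for group in groups.values():
                for col in rhs:
                    symbols = {row[col] for row in group}
                    if len(symbols) < 2:
                        continue
                    # Equate the symbols: "a" wins, otherwise keep one b symbol.
                    target = "a" if "a" in symbols else min(symbols)
                    for row in rows:  # rename these symbols throughout the column
                        if row[col] in symbols and row[col] != target:
                            row[col] = target
                            changed = True
    return any(all(v == "a" for v in row.values()) for row in rows)

EMP_PROJ = ["SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"]
F = [({"SSN"}, {"ENAME"}),
     ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
     ({"SSN", "PNUMBER"}, {"HOURS"})]
case2 = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]
case1 = [{"ENAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}]
print(lossless_join(EMP_PROJ, case2, F))  # True: the WORKS_ON row becomes all "a"
print(lossless_join(EMP_PROJ, case1, F))  # False
```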
(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3). (d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.
Multi-valued Dependencies and Fourth Normal Form
Definition: A multi-valued dependency (MVD) $X \twoheadrightarrow Y$ specified on relation schema R, where X and Y are both subsets of R, specifies the following constraint on any relation state r of R: if two tuples t1 and t2 exist in r such that t1[X] = t2[X], then two tuples t3 and t4 should also exist in r with the following properties, where we use Z to denote $R - (X \cup Y)$:
- t3[X] = t4[X] = t1[X] = t2[X]
- t3[Y] = t1[Y] and t4[Y] = t2[Y]
- t3[Z] = t2[Z] and t4[Z] = t1[Z]
An MVD $X \twoheadrightarrow Y$ in R is called a trivial MVD if (a) Y is a subset of X, or (b) $X \cup Y = R$.
Inference Rules for Functional and Multi-valued Dependencies:
IR1 (reflexive rule for FDs): If $X \supseteq Y$, then $X \rightarrow Y$.
IR2 (augmentation rule for FDs): $\{X \rightarrow Y\} \models XZ \rightarrow YZ$.
IR3 (transitive rule for FDs): $\{X \rightarrow Y, Y \rightarrow Z\} \models X \rightarrow Z$.
IR4 (complementation rule for MVDs): $\{X \twoheadrightarrow Y\} \models X \twoheadrightarrow (R - (X \cup Y))$.
IR5 (augmentation rule for MVDs): If $X \twoheadrightarrow Y$ and $W \supseteq Z$, then $WX \twoheadrightarrow YZ$.
IR6 (transitive rule for MVDs): $\{X \twoheadrightarrow Y, Y \twoheadrightarrow Z\} \models X \twoheadrightarrow (Z - Y)$.
IR7 (replication rule for FD to MVD): $\{X \rightarrow Y\} \models X \twoheadrightarrow Y$.
IR8 (coalescence rule for FDs and MVDs): If $X \twoheadrightarrow Y$ and there exists W with the properties that (a) $W \cap Y$ is empty, (b) $W \rightarrow Z$, and (c) $Y \supseteq Z$, then $X \rightarrow Z$.
Definition: A relation schema R is in 4NF with respect to a set of dependencies F (that includes functional dependencies and multivalued dependencies) if, for every nontrivial multivalued dependency $X \twoheadrightarrow Y$ in F+, X is a superkey for R.
Note: F+ is the (complete) set of all dependencies (functional or multivalued) that will hold in every relation state r of R that satisfies F. It is also called the closure of F.
Decomposing a relation state of EMP that is not in 4NF: (a) EMP relation with additional tuples. (b) Two corresponding 4NF relations EMP_PROJECTS and EMP_DEPENDENTS.
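The MVD definition can be checked on a relation instance by looking, for every pair t1, t2 that agrees on X, for the tuple t3 it requires; t4 is the same check with t1 and t2 swapped. The EMP(ENAME, PNAME, DNAME) rows below are illustrative values assumed here, and the function name is hypothetical.

```python
def mvd_holds(rows, schema, x, y):
    """Check the MVD X ->> Y on a relation instance.

    rows: list of dicts; schema: attribute names of R; x, y: sets of names.
    For every t1, t2 with t1[X] = t2[X], some tuple t3 must have
    t3[X] = t1[X], t3[Y] = t1[Y] and t3[Z] = t2[Z], where Z = R - (X u Y).
    Looping over ordered pairs also covers t4 (t1 and t2 swapped).
    """
    xs = [a for a in schema if a in x]
    ys = [a for a in schema if a in y]
    zs = [a for a in schema if a not in x and a not in y]

    def proj(t, attrs):
        return tuple(t[a] for a in attrs)

    present = {(proj(t, xs), proj(t, ys), proj(t, zs)) for t in rows}
    for t1 in rows:
        for t2 in rows:
            if proj(t1, xs) == proj(t2, xs):
                if (proj(t1, xs), proj(t1, ys), proj(t2, zs)) not in present:
                    return False  # the required t3 is missing
    return True

EMP = ["ENAME", "PNAME", "DNAME"]
emp = [dict(zip(EMP, t)) for t in [("Smith", "X", "John"), ("Smith", "Y", "Anna"),
                                   ("Smith", "X", "Anna"), ("Smith", "Y", "John")]]
print(mvd_holds(emp, EMP, {"ENAME"}, {"PNAME"}))              # True
print(mvd_holds(emp[:2] + emp[3:], EMP, {"ENAME"}, {"PNAME"}))  # False: (Smith, X, Anna) missing
```

Since ENAME is not a superkey of EMP, the nontrivial MVD ENAME ->> PNAME is what keeps EMP out of 4NF and motivates the split into EMP_PROJECTS and EMP_DEPENDENTS.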
Lossless (Non-additive) Join Decomposition into 4NF Relations:
PROPERTY LJ1: The relation schemas R1 and R2 form a lossless (non-additive) join decomposition of R with respect to a set F of functional and multi-valued dependencies if and only if $(R_1 \cap R_2) \twoheadrightarrow (R_1 - R_2)$, or by symmetry, if and only if $(R_1 \cap R_2) \twoheadrightarrow (R_2 - R_1)$.
Algorithm 11.5: Relational decomposition into 4NF relations with non-additive join property
Input: A universal relation R and a set of functional and multi-valued dependencies F.
1. Set D := { R };
2. While there is a relation schema Q in D that is not in 4NF do
{ choose a relation schema Q in D that is not in 4NF;
  find a nontrivial MVD $X \twoheadrightarrow Y$ in Q that violates 4NF;
  replace Q in D by two relation schemas $(Q - Y)$ and $(X \cup Y)$;
};
Inclusion Dependencies
Definition: An inclusion dependency R.X < S.Y between two sets of attributes, X of relation schema R and Y of relation schema S, specifies the constraint that, at any specific time when r is a relation state of R and s a relation state of S, we must have $\pi_X(r(R)) \subseteq \pi_Y(s(S))$.
Note: The $\subseteq$ (subset) relationship does not necessarily have to be a proper subset. The sets of attributes on which the inclusion dependency is specified, X of R and Y of S, must have the same number of attributes. In addition, the domains for each pair of corresponding attributes should be compatible.
Objective of Inclusion Dependencies:
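An inclusion dependency is a subset test between two projections. This sketch (function name and rows hypothetical) checks EMPLOYEE.DNO < DEPARTMENT.DNUMBER, i.e. every department number used by an employee appears in DEPARTMENT, which is how a referential integrity constraint looks when written as an inclusion dependency.

```python
def inclusion_holds(r_rows, x, s_rows, y):
    """Check R.X < S.Y: the projection of r on X is a subset of the projection of s on Y.

    x and y are equal-length lists; position k of x is compared with position k of y.
    """
    if len(x) != len(y):
        raise ValueError("X and Y must have the same number of attributes")
    proj_r = {tuple(t[a] for a in x) for t in r_rows}
    proj_s = {tuple(t[a] for a in y) for t in s_rows}
    return proj_r <= proj_s  # subset, not necessarily proper

employees = [{"SSN": "1", "DNO": 5}, {"SSN": "2", "DNO": 4}]
departments = [{"DNUMBER": 4}, {"DNUMBER": 5}]
print(inclusion_holds(employees, ["DNO"], departments, ["DNUMBER"]))  # True
employees.append({"SSN": "3", "DNO": 9})
print(inclusion_holds(employees, ["DNO"], departments, ["DNUMBER"]))  # False: no department 9
```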
Domain-Key Normal Form (DKNF): Definition: A relation schema is said to be in DKNF if all constraints and dependencies that should hold on the valid relation states can be enforced simply by enforcing the domain constraints and key constraints on the relation.
PROPERTIES OF TRANSACTIONS
The DBMS needs to ensure the following properties of transactions:
1. Atomicity: Transactions are either done or not done; they are never left partially executed. An executing transaction completes in its entirety or it is aborted altogether. E.g., Transfer_Money(Amount, X, Y) means i) DEBIT(Amount, X); ii) CREDIT(Amount, Y). Either both take place or neither does.
2. Consistency: Transactions should leave the database in a consistent state. If each transaction is consistent and the database starts consistent, then the database ends up consistent. If a transaction violates the database's consistency rules, the entire transaction is rolled back and the database is restored to a state consistent with those rules.
3. Isolation: A transaction should appear as though it is being executed in isolation from other transactions; its execution should not be interfered with by other transactions executing concurrently.
4. Durability: Effects of completed transactions are resilient against failures. Once a transaction commits, the system must guarantee that the results of its operations will never be lost, in spite of subsequent failures.
SIMPLE MODEL OF A DATABASE
A database is a collection of named data items. Granularity of data: a field, a record, or a whole disk block (the concepts are independent of granularity). The basic operations are read and write:
read_item(X): Reads a database item named X into a program variable. To simplify our notation, we assume that the program variable is also named X.
write_item(X): Writes the value of program variable X into the database item named X.
READ AND WRITE OPERATIONS
The basic unit of data transfer from the disk to the computer main memory is one block. In general, a data item (what is read or written) will be the field of some record in the database, although it may be a larger unit such as a record or even a whole block.
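Executing a read_item(X) command includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if that disk block is not already in some main memory buffer).
3. Copy item X from the buffer to the program variable named X.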
Executing a write_item(X) command includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if that disk block is not already in some main memory buffer).
3. Copy item X from the program variable named X into its correct location in the buffer.
4. Store the updated block from the buffer back to disk (either immediately or at some later point in time).
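The read_item and write_item steps can be modelled with a toy buffer. In this sketch the class and attribute names are hypothetical, a Python dict plays the role of the disk, and step 4 of write_item is deferred until flush() is called, which is one of the two options that step allows.

```python
class BufferedStore:
    """Toy model of read_item / write_item over a block-structured disk."""

    def __init__(self, disk, block_of):
        self.disk = disk          # block id -> {item: value}, the on-disk copy
        self.block_of = block_of  # item name -> block id (step 1: find the block)
        self.buffer = {}          # block id -> in-memory copy of that block
        self.dirty = set()        # blocks changed in memory, not yet written back

    def _fetch(self, block):
        # Step 2: copy the block into a main-memory buffer unless already there.
        if block not in self.buffer:
            self.buffer[block] = dict(self.disk[block])
        return self.buffer[block]

    def read_item(self, x):
        # Step 3 of read_item: copy X from the buffer into the program variable.
        return self._fetch(self.block_of[x])[x]

    def write_item(self, x, value):
        block = self.block_of[x]
        self._fetch(block)[x] = value  # step 3: update X inside the buffer
        self.dirty.add(block)          # step 4 is deferred to flush()

    def flush(self):
        # Step 4: store every updated block back to disk.
        for block in self.dirty:
            self.disk[block] = dict(self.buffer[block])
        self.dirty.clear()

store = BufferedStore({0: {"X": 80, "Y": 10}}, {"X": 0, "Y": 0})
store.write_item("X", store.read_item("X") - 5)
print(store.read_item("X"), store.disk[0]["X"])  # 75 80 (disk not updated yet)
store.flush()
print(store.disk[0]["X"])                        # 75
```

The gap between write_item and flush is exactly why a crash can lose committed updates unless the log is used, as discussed later under recovery.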
Transaction Example in MySQL
START TRANSACTION;
SELECT @A:=SUM(salary) FROM table1 WHERE type=1;
UPDATE table2 SET summary=@A WHERE type=1;
COMMIT;
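Transaction Example in Oracle
When you connect to the database with sqlplus (the Oracle command-line utility that runs SQL and PL/SQL commands interactively or from a script), you do not issue an explicit BEGIN: a transaction begins implicitly with the first executable SQL statement, and every SQL DML (Data Manipulation Language) statement you issue subsequently becomes part of this transaction until you issue COMMIT or ROLLBACK. (SQL Server behaves the same way only when implicit-transaction mode is enabled; by default it commits each statement automatically.)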
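The Transfer_Money(Amount, X, Y) example from the Atomicity property can be sketched with Python's sqlite3 module. Used as a context manager, a sqlite3 connection commits when the block finishes normally and rolls back when an exception escapes, so the DEBIT and CREDIT either both happen or neither does. The account table, its CHECK constraint and the function names are assumptions for illustration.

```python
import sqlite3

def make_bank():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, "
                 "balance INTEGER NOT NULL CHECK (balance >= 0))")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("X", 500), ("Y", 100)])
    conn.commit()
    return conn

def transfer_money(conn, amount, src, dst):
    """DEBIT(amount, src) and CREDIT(amount, dst) as one atomic unit."""
    with conn:  # commit if the block succeeds, roll back if it raises
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        cur = conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        if cur.rowcount != 1:
            raise ValueError(f"no account {dst!r}")  # aborting also undoes the debit

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account"))

conn = make_bank()
transfer_money(conn, 200, "X", "Y")
print(balances(conn))  # {'X': 300, 'Y': 300}
try:
    transfer_money(conn, 50, "X", "Z")  # credit fails, so the debit is rolled back
except ValueError:
    pass
print(balances(conn))  # {'X': 300, 'Y': 300}
```

A debit that would make a balance negative fails the CHECK constraint with sqlite3.IntegrityError and is rolled back the same way, which mirrors the Consistency property.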
TRANSACTION STATES 1. Active state 2. Partially committed state 3. Committed state 4. Failed state 5. Terminated State State transition diagram illustrating the states for transaction execution:
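The state transition diagram is not reproduced here. The sketch below encodes the transitions of the usual textbook diagram (an assumption): END TRANSACTION moves an active transaction to partially committed, COMMIT moves it to committed, ABORT from either of the first two states moves it to failed, and both committed and failed transactions end in the terminated state. Class and constant names are hypothetical.

```python
# Allowed transitions, assuming the usual textbook state diagram.
TRANSITIONS = {
    "active": {"partially committed", "failed"},     # END TRANSACTION or ABORT
    "partially committed": {"committed", "failed"},  # COMMIT or ABORT
    "committed": {"terminated"},
    "failed": {"terminated"},
    "terminated": set(),
}

class Transaction:
    def __init__(self, tid):
        self.tid = tid
        self.state = "active"  # BEGIN TRANSACTION enters the active state
        self.history = ["active"]

    def move_to(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.tid}: illegal transition {self.state} -> {new_state}")
        self.state = new_state
        self.history.append(new_state)

t = Transaction("T1")
for state in ("partially committed", "committed", "terminated"):
    t.move_to(state)
print(t.history)  # ['active', 'partially committed', 'committed', 'terminated']
```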
CONCURRENCY CONTROL
Concurrency in a DBMS Concurrent execution of user programs is essential for good DBMS performance. Because disk accesses are frequent, and relatively slow, it is important to keep the CPU humming by working on several user programs concurrently.
Users submit transactions, and can think of each transaction as executing by itself.
Concurrency is achieved by the DBMS, which interleaves actions (reads/writes of DB objects) of various transactions. Each transaction must leave the database in a consistent state if the DB is consistent when the transaction begins. The DBMS will enforce some integrity constraints (ICs), depending on the ICs declared in CREATE TABLE statements. Beyond this, the DBMS does not really understand the semantics of the data (e.g., it does not understand how the interest on a bank account is computed).
Things get even more complicated if we have several DBMS programs (transactions) executed concurrently.
Synchronization of transactions allows concurrency (instead of insisting on strictly serial execution, i.e., processing T1 completely, then T2, then T3, etc.) in order to increase the throughput of the system and minimize the response time of each transaction.
Why do we need concurrent executions? It is essential for good DBMS performance! Disk accesses are frequent and relatively slow. Overlapping I/O with CPU activity increases throughput and reduces response time.
What is the problem with concurrent transactions? Interleaving transactions might lead the system to an inconsistent state (as in the previous example). Scenario: a transaction (Xact) prints the monthly bank account statement for a user U, one bank transaction at a time. Before the report is finalized, another Xact withdraws $X from U's account. Result: although the report contains the updated final balance, it nowhere shows the bank transaction that caused the decrease (the unrepeatable read problem, explained next). A DBMS guarantees that these problems will not arise: users are given the impression that the transactions are executed sequentially, one after the other.
Why Concurrency Control is needed? Problems that can occur for certain transaction schedules without appropriate concurrency control mechanisms:
The Lost Update Problem This occurs when two transactions that access the same database items have their operations interleaved in a way that makes the value of some database item incorrect.
The Temporary Update (or Dirty Read) Problem This occurs when one transaction updates a database item and then the transaction fails for some reason. The updated item is accessed by another transaction before it is changed back to its original value.
The Incorrect Summary Problem If one transaction is calculating an aggregate summary function on a number of records while other transactions are updating some of these records, the aggregate function may calculate some values before they are updated and others after they are updated.
The update performed by T1 gets lost. Possible solution: T1 locks and later unlocks database object X, so T2 cannot read X while X is being modified by T1.
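The lost update can be reproduced by executing an interleaved schedule step by step, with each transaction keeping its own program variables. The sample values X = 80, Y = 10, N = 5 and M = 4 and all names are assumptions; T1 transfers N from X to Y while T2 adds M to X.

```python
def run(schedule, db):
    """Execute (transaction, step) pairs in the given order.

    Steps: ("read", X), ("write", X) or ("compute", X, fn). Each transaction
    has private program variables, so "compute" changes only its local copy.
    """
    db = dict(db)
    local = {}
    for tx, step in schedule:
        vars_ = local.setdefault(tx, {})
        kind, item = step[0], step[1]
        if kind == "read":
            vars_[item] = db[item]   # read_item(X)
        elif kind == "write":
            db[item] = vars_[item]   # write_item(X)
        else:
            vars_[item] = step[2](vars_[item])
    return db

N, M = 5, 4
T1 = [("read", "X"), ("compute", "X", lambda v: v - N), ("write", "X"),
      ("read", "Y"), ("compute", "Y", lambda v: v + N), ("write", "Y")]
T2 = [("read", "X"), ("compute", "X", lambda v: v + M), ("write", "X")]

serial = [("T1", s) for s in T1] + [("T2", s) for s in T2]
# T2 reads X before T1 writes it back, then overwrites T1's value.
interleaved = [("T1", T1[0]), ("T1", T1[1]), ("T2", T2[0]), ("T2", T2[1]),
               ("T1", T1[2]), ("T1", T1[3]), ("T2", T2[2]),
               ("T1", T1[4]), ("T1", T1[5])]

start = {"X": 80, "Y": 10}
print(run(serial, start))       # {'X': 79, 'Y': 15}
print(run(interleaved, start))  # {'X': 84, 'Y': 15}: T1's update of X is lost
```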
T1 modifies a database object, and then T1 fails for some reason. Meanwhile, however, the modified object has been accessed by another transaction T2. Thus T2 has read data that never existed.
In this schedule, the total computed by T1 is wrong; hence T1 must lock (and later unlock) several database objects.
Problem 1: Reading Uncommitted Data (WR Conflicts). Reading the value of an uncommitted object might yield an inconsistency: Dirty Reads or Write-then-Read (WR) Conflicts.
Problem 2: Unrepeatable Reads (RW Conflicts). Reading the same object twice might yield an inconsistency: Read-then-Write (RW) Conflicts (Write-After-Read).
Problem 3: Overwriting Uncommitted Data (WW Conflicts). Overwriting an uncommitted object might yield an inconsistency: Lost Update or Write-After-Write (WW) Conflicts.
Problem caused by the WR conflict: account B was credited with the interest on a smaller amount (i.e., on $100 less), so the result is not equivalent to any serial schedule.
2. Unrepeatable Reads (RW Conflicts) To illustrate the RW-conflict, consider the following problem:
Problem caused by the RW conflict: although T1 reads counter A twice without changing it in between, it sees two different values (an unrepeatable read)! And what happens if T2 aborts? Then T1 has shown an incorrect result.
3. Overwriting Uncommitted Data (WW Conflicts). To illustrate the WW conflict, consider the following problem: the salaries of employees A and B must be kept equal. T1 sets both salaries to 1000; T2 sets both salaries to 2000.
Problem caused by the WW-Conflict? Employee A gets a salary of 2000 while employee B gets a salary of 1000, thus result is not equivalent to the serial schedule!
3. WW Conflict (lost update): A transaction T2 could overwrite the value of an object A, which has already been modified by a transaction T1, while T1 is still in progress.
1. A computer failure (system crash): A hardware or software error occurs in the computer system during transaction execution. If the hardware crashes, the contents of the computer's internal memory may be lost.
2. A transaction or system error: Some operation in the transaction may cause it to fail, such as integer overflow or division by zero. Transaction failure may also occur because of erroneous parameter values or because of a logical programming error. In addition, the user may interrupt the transaction during its execution.
3. Local errors or exception conditions detected by the transaction: Certain conditions necessitate cancellation of the transaction. For example, data for the transaction may not be found. A condition, such as insufficient account balance in a banking database, may cause a transaction, such as a fund withdrawal from that account, to be canceled. A programmed abort in the transaction causes it to fail.
4. Concurrency control enforcement: The concurrency control method may decide to abort the transaction, to be restarted later, because it violates serializability or because several transactions are in a state of deadlock.
5. Disk failure: Some disk blocks may lose their data because of a read or write malfunction or because of a disk read/write head crash. This may happen during a read or a write operation of the transaction.
6. Physical problems and catastrophes: This refers to an endless list of problems that includes power or air-conditioning failure, fire, theft, sabotage, overwriting disks or tapes by mistake, and mounting of a wrong tape by the operator.
read or write: These specify read or write operations on the database items that are executed as part of a transaction.
end_transaction: This specifies that read and write transaction operations have ended and marks the end limit of transaction execution.
At this point it may be necessary to check whether the changes introduced by the transaction can be permanently applied to the database or whether the transaction has to be aborted because it violates concurrency control or for some other reason.
rollback (or abort): This signals that the transaction has ended unsuccessfully, so that any changes or effects that the transaction may have applied to the database must be undone.
Recovery techniques use the following operators: undo: Similar to rollback except that it applies to a single operation rather than to a whole transaction.
redo: This specifies that certain transaction operations must be redone to ensure that all the operations of a committed transaction have been applied successfully to the database.
The System Log
Log or journal: The log keeps track of all transaction operations that affect the values of database items. This information may be needed to permit recovery from transaction failures. The log is kept on disk, so it is not affected by any type of failure except for disk or catastrophic failure. In addition, the log is periodically backed up to archival storage (tape) to guard against such catastrophic failures. T in the following discussion refers to a unique transaction-id that is generated automatically by the system and is used to identify each transaction.
The following actions are recorded in the log:
- Ti writes an object: the old value and the new value. The log record must go to disk before the changed page!
- Ti commits/aborts: a log record indicating this action.
Types of log record:
[start_transaction,T]: Records that transaction T has started execution.
[write_item,T,X,old_value,new_value]: Records that transaction T has changed the value of database item X from old_value to new_value.
[read_item,T,X]: Records that transaction T has read the value of database item X.
[commit,T]: Records that transaction T has completed successfully, and affirms that its effect can be committed (recorded permanently) to the database.
[abort,T]: Records that transaction T has been aborted.
Protocols for recovery that avoid cascading rollbacks do not require that read operations be written to the system log, whereas other protocols require these entries for recovery. Strict protocols require simpler write entries that do not include new_value.
Recovery using log records: If the system crashes, we can recover to a consistent database state by examining the log.
1. Because the log contains a record of every write operation that changes the value of some database item, it is possible to undo the effect of these write operations of a transaction T by tracing backward through the log and resetting all items changed by a write operation of T to their old_values.
2. We can also redo the effect of the write operations of a transaction T by tracing forward through the log and setting all items changed by a write operation of T (that did not get done permanently) to their new_values.
Beyond the commit point, the transaction is said to be committed, and its effect is assumed to be permanently recorded in the database.
The transaction then writes an entry [commit,T] into the log. Roll Back of transactions: Needed for transactions that have a [start_transaction,T] entry into the log but no commit entry [commit,T] into the log.
Redoing transactions: Transactions that have written their commit entry in the log must also have recorded all their write operations in the log; otherwise they would not be committed, so their effect on the database can be redone from the log entries. (Notice that the log file must be kept on disk. At the time of a system crash, only the log entries that have been written back to disk are considered in the recovery process because the contents of main memory may be lost.)
Force writing a log: Before a transaction reaches its commit point, any portion of the log that has not been written to the disk yet must now be written to the disk. This process is called force-writing the log file before committing a transaction.
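The undo/redo rules above can be sketched as a small recovery routine over log records shaped like [write_item,T,X,old_value,new_value]. This is an illustrative simplification (no checkpoints, and not the ARIES algorithm): it redoes the writes of committed transactions tracing forward, then undoes the writes of transactions that have no commit record tracing backward. The tuple encoding, the log contents and the post-crash disk state are hypothetical.

```python
def recover(log, disk):
    """Undo/redo recovery from a list of log records.

    Records: ("start", T), ("write", T, X, old, new), ("commit", T), ("abort", T).
    """
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    started = {rec[1] for rec in log if rec[0] == "start"}
    losers = started - committed  # start record but no commit record
    db = dict(disk)
    # REDO: trace forward and reapply new_value for committed transactions.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, item, _old, new = rec
            db[item] = new
    # UNDO: trace backward and restore old_value for the other transactions.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] in losers:
            _, _, item, old, _new = rec
            db[item] = old
    return db

log = [("start", "T1"), ("write", "T1", "X", 10, 20),
       ("start", "T2"), ("write", "T2", "Y", 5, 7),
       ("commit", "T1"), ("write", "T2", "Z", 1, 2)]  # crash happens here
# Assumed disk state at the crash: T2's change to Y reached disk, T1's X did not.
disk_after_crash = {"X": 10, "Y": 7, "Z": 1}
print(recover(log, disk_after_crash))  # {'X': 20, 'Y': 5, 'Z': 1}
```

Redo only works here because the committed write of X is in the log, which is what force-writing the log before the commit point guarantees.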
A schedule (or history) S of n transactions T1, T2, ..., Tn is an ordering of the operations of the transactions subject to the constraint that, for each transaction Ti that participates in S, the operations of Ti in S must appear in the same order in which they occur in Ti. Note, however, that operations from other transactions Tj can be interleaved with the operations of Ti in S.
Serializability: DBMS must control concurrent execution of transactions to ensure read consistency, i.e., to avoid dirty reads etc. A (possibly concurrent) schedule S is serializable if it is equivalent to a serial schedule S0, i.e., S has the same result database state as S0.
How to ensure serializability of concurrent transactions? Conflicts between operations of two transactions:
Checks for serializability are based on a precedence graph that describes dependencies among concurrent transactions; if the graph has no cycle, then the transactions are serializable: they can be executed concurrently without affecting each other's results.
Atomicity of Transactions
A transaction might commit after completing all its actions, or it could abort (or be aborted by the DBMS) after executing some actions. A very important property guaranteed by the DBMS for all transactions is that they are atomic. That is, a user can think of a Xact as always executing all its actions in one step, or not executing any actions at all. The DBMS logs all actions so that it can undo the actions of aborted transactions.
Example: Consider two transactions (Xacts):
T1: BEGIN A=A+100, B=B-100 END
T2: BEGIN A=1.06*A, B=1.06*B END
Intuitively, the first transaction is transferring $100 from B's account to A's account. The second is crediting both accounts with a 6% interest payment. There is no guarantee that T1 will execute before T2 or vice versa, if both are submitted together. However, the net effect must be equivalent to these two transactions running serially in some order.
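The requirement that the net effect match some serial order can be checked with concrete numbers. Assuming starting balances A = B = 1000 (hypothetical), the sketch runs both serial orders of T1 and T2 and one interleaving; Decimal is used so that the 6% interest is computed exactly. All function names are hypothetical.

```python
from decimal import Decimal

RATE = Decimal("1.06")

# Each step is one action of T1 (move $100 from B to A) or T2 (6% interest).
def t1_credit_a(db):
    db["A"] += 100

def t1_debit_b(db):
    db["B"] -= 100

def t2_interest_a(db):
    db["A"] *= RATE

def t2_interest_b(db):
    db["B"] *= RATE

def execute(steps, a=1000, b=1000):
    db = {"A": Decimal(a), "B": Decimal(b)}
    for step in steps:
        step(db)
    return db

t1_then_t2 = execute([t1_credit_a, t1_debit_b, t2_interest_a, t2_interest_b])  # A 1166, B 954
t2_then_t1 = execute([t2_interest_a, t2_interest_b, t1_credit_a, t1_debit_b])  # A 1160, B 960
# Interleaved: T2 pays interest on B before T1 has debited it.
mixed = execute([t1_credit_a, t2_interest_a, t2_interest_b, t1_debit_b])      # A 1166, B 960
```

Both serial orders end with a total of 2120, while the interleaving ends with 2126: the bank pays interest on $100 that T1 was moving out of B, and the result matches neither serial order, so this schedule is not serializable.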
Scheduling Transactions Serial schedule: Schedule that does not interleave the actions of different transactions.
Equivalent schedules: For any database state, the effect (on the set of objects in the database) of executing the first schedule is identical to the effect of executing the second schedule.
(Note: If each transaction preserves consistency, every serializable schedule preserves consistency.)
Schedules classified on recoverability:
Recoverable schedule: One where no committed transaction needs to be rolled back. A schedule S is recoverable if no transaction T in S commits until all transactions T' that have written an item that T reads have committed.
Cascadeless schedule: One where every transaction reads only the items that are written by committed transactions.
Schedules requiring cascaded rollback: A schedule in which uncommitted transactions that read an item from a failed transaction must be rolled back.
Strict Schedules: A schedule in which a transaction can neither read nor write an item X until the last transaction that wrote X has committed (or aborted).
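The recoverability classes above can be tested on a written-out schedule. In this sketch (names and schedule encoding hypothetical) each operation is ("r" | "w", T, X) or ("c" | "a", T); it records which transactions each transaction reads from, and whether any read or write touches an item whose last writer has not yet committed or aborted.

```python
def classify(schedule):
    """Return (recoverable, cascadeless, strict) for a schedule.

    schedule: list of ("r" | "w", T, X) or ("c" | "a", T) tuples, in order.
    """
    last_writer = {}  # item -> transaction that wrote it most recently
    finished = set()  # transactions that have committed or aborted
    aborted = set()
    reads_from = {}   # T -> transactions whose written values T has read
    commit_order = []
    cascadeless = strict = True
    for op in schedule:
        kind, t = op[0], op[1]
        if kind in ("r", "w"):
            item = op[2]
            writer = last_writer.get(item)
            other = writer is not None and writer != t
            if other and writer not in finished:
                strict = False           # X used before its last writer finished
                if kind == "r":
                    cascadeless = False  # read an uncommitted value
            if kind == "r" and other and writer not in aborted:
                reads_from.setdefault(t, set()).add(writer)
            if kind == "w":
                last_writer[item] = t
        elif kind == "c":
            finished.add(t)
            commit_order.append(t)
        else:  # "a"
            finished.add(t)
            aborted.add(t)
    position = {t: i for i, t in enumerate(commit_order)}
    # Recoverable: each committed T commits after every T' it read from has committed.
    recoverable = all(
        w in position and position[w] < position[t]
        for t in commit_order
        for w in reads_from.get(t, ())
    )
    return recoverable, cascadeless, strict

# Hypothetical schedule: T2 reads X written by T1, commits, and then T1 aborts.
s_bad = [("r", "T1", "X"), ("w", "T1", "X"), ("r", "T2", "X"), ("r", "T1", "Y"),
         ("w", "T2", "X"), ("c", "T2"), ("a", "T1")]
print(classify(s_bad))  # (False, False, False)
```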
Serializable schedule: A schedule S is serializable if it is equivalent to some serial schedule of the same n transactions
Result equivalent: Two schedules are called result equivalent if they produce the same final state of the database.
Conflict equivalent: Two schedules are said to be conflict equivalent if the order of any two conflicting operations is the same in both schedules.
Conflict serializable: A schedule S is said to be conflict serializable if it is conflict equivalent to some serial schedule S'.
Being serializable is not the same as being serial. Being serializable implies that the schedule is a correct schedule. It will leave the database in a consistent state. The interleaving is appropriate and will result in a state as if the transactions were serially executed, yet will achieve efficiency due to concurrent execution.
Serializability is hard to check: the interleaving of operations is decided at run time by the operating system's scheduler, so it is difficult to determine beforehand how the operations in a schedule will be interleaved.
Current approach used in most DBMSs: Use of locks with two phase locking
View serializability: Definition of serializability based on view equivalence. A schedule is view serializable if it is view equivalent to a serial schedule.
Two schedules S and S' are said to be view equivalent if the following three conditions hold:
1. The same set of transactions participates in S and S', and S and S' include the same operations of those transactions.
2. For any operation Ri(X) of Ti in S, if the value of X read by the operation has been written by an operation Wj(X) of Tj (or if it is the original value of X before the schedule started), the same condition must hold for the value of X read by operation Ri(X) of Ti in S'.
3. If the operation Wk(Y) of Tk is the last operation to write item Y in S, then Wk(Y) of Tk must also be the last operation to write item Y in S'.
The premise behind view equivalence: As long as each read operation of a transaction reads the result of the same write operation in both schedules, the write operations of each transaction must produce the same results. The view: the read operations are said to see the same view in both schedules
Relationship between view and conflict equivalence: The two are the same under the constrained write assumption, which assumes that if T writes X, the value written is constrained by the value of X that T read, i.e., new X = f(old X).
Relationship between view and conflict equivalence
Consider the following schedule of three transactions T1: r1(X), w1(X); T2: w2(X); and T3: w3(X):
Schedule Sa: r1(X); w2(X); w1(X); w3(X); c1; c2; c3;
In Sa, the operations w2(X) and w3(X) are blind writes, since T2 and T3 do not read the value of X. Sa is view serializable, since it is view equivalent to the serial schedule T1, T2, T3. However, Sa is not conflict serializable, since it is not conflict equivalent to any serial schedule: r1(X) precedes w2(X), but w2(X) precedes w1(X), so T1 would have to come both before and after T2.
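To make the claim about Sa concrete, the following minimal Python sketch compares two schedules on the three view-equivalence conditions: the same operations, the same reads-from pairs (which write each read sees), and the same final write for each item. It assumes an encoding of our own: a schedule is a list of (kind, transaction, item) tuples with kind "r" or "w", commits are left out, and each transaction's own operations appear in the same order in both schedules. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

Op = Tuple[str, int, str]  # (kind, transaction, item); kind is "r" or "w"


def reads_from(schedule: List[Op]) -> Set[Tuple[int, str, int, Optional[int]]]:
    """Pair each read with the transaction whose write it sees.

    None stands for the initial value of the item. The counter n tells apart
    the first, second, ... read of the same item by the same transaction.
    """
    last_writer: Dict[str, Optional[int]] = {}
    read_count: Dict[Tuple[int, str], int] = {}
    pairs = set()
    for kind, txn, item in schedule:
        if kind == "r":
            n = read_count.get((txn, item), 0)
            read_count[(txn, item)] = n + 1
            pairs.add((txn, item, n, last_writer.get(item)))
        else:
            last_writer[item] = txn
    return pairs


def final_writers(schedule: List[Op]) -> Dict[str, int]:
    """Map each item to the transaction that performs the last write on it."""
    # Later writes overwrite earlier ones, so the last writer wins.
    return {item: txn for kind, txn, item in schedule if kind == "w"}


def view_equivalent(s1: List[Op], s2: List[Op]) -> bool:
    return (
        Counter(s1) == Counter(s2)                  # condition 1: same operations
        and reads_from(s1) == reads_from(s2)        # condition 2: same reads-from pairs
        and final_writers(s1) == final_writers(s2)  # condition 3: same final writes
    )


SA = [("r", 1, "X"), ("w", 2, "X"), ("w", 1, "X"), ("w", 3, "X")]
SERIAL_T1_T2_T3 = [("r", 1, "X"), ("w", 1, "X"), ("w", 2, "X"), ("w", 3, "X")]

print(view_equivalent(SA, SERIAL_T1_T2_T3))  # True
```

In both schedules r1(X) reads the initial value of X and T3 writes the final value, which is exactly why the blind writes w2(X) and w3(X) can be reordered without changing the view.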
Testing for conflict serializability (Algorithm 17.1):
1. Look only at the read_item(X) and write_item(X) operations.
2. Construct a precedence graph (serialization graph): a graph with one node per transaction and directed edges.
3. Create an edge from Ti to Tj if one of the operations in Ti appears before a conflicting operation in Tj.
4. The schedule is conflict serializable if and only if the precedence graph has no cycles.
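The steps of Algorithm 17.1 can be sketched in Python as below. The (kind, transaction, item) tuple encoding, the function names and the use of depth-first search for cycle detection are our own choices for illustration; any cycle-detection method would do. Applied to schedule Sa, the graph contains both T1 -> T2 (r1(X) before w2(X)) and T2 -> T1 (w2(X) before w1(X)), so the test reports that Sa is not conflict serializable.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Op = Tuple[str, int, str]  # (kind, transaction, item); kind is "r" or "w"


def conflict(a: Op, b: Op) -> bool:
    # Different transactions, same item, and at least one of the two is a write.
    return a[1] != b[1] and a[2] == b[2] and "w" in (a[0], b[0])


def precedence_graph(schedule: List[Op]) -> Dict[int, Set[int]]:
    graph: Dict[int, Set[int]] = {txn: set() for _, txn, _ in schedule}
    # combinations yields index pairs (i, j) with i < j, so schedule[i] comes first.
    for i, j in combinations(range(len(schedule)), 2):
        if conflict(schedule[i], schedule[j]):
            graph[schedule[i][1]].add(schedule[j][1])  # edge Ti -> Tj
    return graph


def has_cycle(graph: Dict[int, Set[int]]) -> bool:
    # Depth-first search: reaching a node that is still on the current path means a cycle.
    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def visit(node: int) -> bool:
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY or (color[nxt] == WHITE and visit(nxt)):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in graph)


def is_conflict_serializable(schedule: List[Op]) -> bool:
    return not has_cycle(precedence_graph(schedule))


SA = [("r", 1, "X"), ("w", 2, "X"), ("w", 1, "X"), ("w", 3, "X")]
print(precedence_graph(SA))          # {1: {2, 3}, 2: {1, 3}, 3: set()}
print(is_conflict_serializable(SA))  # False
```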
Strict Two-phase Locking (Strict 2PL) Protocol:
- Each Xact must obtain an S (shared) lock on an object before reading, and an X (exclusive) lock on an object before writing.
- All locks held by a transaction are released when the transaction completes.
- If an Xact holds an X lock on an object, no other Xact can get a lock (S or X) on that object.
Strict 2PL allows only serializable schedules.
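As an illustration, the rules a single transaction must follow under Strict 2PL can be checked against its sequence of actions. This is a minimal sketch under an encoding we assume here: each action is a (verb, item) pair, where "S" and "X" request locks, "r" and "w" read and write, ("commit", "") marks completion and "unlock" releases a lock. The third rule (no other Xact may lock an object held in X mode) concerns several transactions at once and belongs to the lock manager, so this per-transaction check does not cover it.

```python
from typing import Dict, List, Tuple

# (verb, item); verbs are our own encoding: "S", "X", "r", "w", "commit", "unlock".
Action = Tuple[str, str]


def follows_strict_2pl(actions: List[Action]) -> bool:
    held: Dict[str, str] = {}  # item -> strongest lock mode held
    completed = False
    for verb, item in actions:
        if verb in ("S", "X"):
            if completed:
                return False  # no lock requests after completion
            if held.get(item) != "X":
                held[item] = verb  # recording "X" also covers an S-to-X upgrade
        elif verb == "r":
            if completed or item not in held:
                return False  # a read needs an S or X lock
        elif verb == "w":
            if completed or held.get(item) != "X":
                return False  # a write needs an X lock
        elif verb == "commit":
            completed = True
        elif verb == "unlock":
            if not completed:
                return False  # strict 2PL: locks are released only at completion
            held.pop(item, None)
        else:
            raise ValueError(f"unknown verb {verb!r}")
    return True


ok = [("S", "A"), ("r", "A"), ("X", "B"), ("w", "B"),
      ("commit", ""), ("unlock", "A"), ("unlock", "B")]
print(follows_strict_2pl(ok))  # True
```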
Aborting a Transaction
If a transaction Ti is aborted, all its actions have to be undone. Moreover, if Tj reads an object last written by Ti, Tj must be aborted as well (a cascading abort).
Most systems avoid such cascading aborts by releasing a transaction's locks only at commit time:
- If Ti writes an object, Tj can read it only after Ti commits.
In order to undo the actions of an aborted transaction, the DBMS maintains a log in which every write is recorded. This mechanism is also used to recover from system crashes: all Xacts active at the time of the crash are aborted when the system comes back up.
Recovering From a Crash
There are 3 phases in the ARIES recovery algorithm:
- Analysis: Scan the log forward (from the most recent checkpoint) to identify all Xacts that were active, and all dirty pages in the buffer pool, at the time of the crash.
- Redo: Redo all updates to dirty pages in the buffer pool, as needed, to ensure that all logged updates are in fact carried out and written to disk.
- Undo: Undo the writes of all Xacts that were active at the crash (by restoring the before value of each update, which is in the log record for the update), working backwards in the log. (Some care must be taken to handle the case of a crash occurring during the recovery process!)
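The undo-by-before-value idea, and the analysis, redo, undo order of the phases, can be sketched in a few lines of Python. This is deliberately much simpler than ARIES: it assumes an in-memory log of ("write", txn, item, before, after) and ("commit", txn) records, with no checkpoints, LSNs, dirty page table or compensation log records, and it redoes every logged write instead of only those missing from disk. All names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Simplified log records (our own format, not ARIES's):
#   ("write", txn, item, before_value, after_value)
#   ("commit", txn)
LogRecord = Tuple


def active_at_crash(log: List[LogRecord]) -> Set[int]:
    """Analysis: transactions that wrote something but never committed."""
    writers = {rec[1] for rec in log if rec[0] == "write"}
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    return writers - committed


def redo(log: List[LogRecord], db: Dict[str, int]) -> None:
    """Redo: repeat every logged write in log order, using after values."""
    for rec in log:
        if rec[0] == "write":
            _, _, item, _, after = rec
            db[item] = after


def undo(log: List[LogRecord], db: Dict[str, int], losers: Set[int]) -> None:
    """Undo: restore before values of the given transactions, scanning backwards."""
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] in losers:
            _, _, item, before, _ = rec
            db[item] = before


def recover(log: List[LogRecord], disk: Dict[str, int]) -> Dict[str, int]:
    db = dict(disk)  # possibly stale state found on disk after the crash
    losers = active_at_crash(log)
    redo(log, db)
    undo(log, db, losers)
    return db


log = [
    ("write", 1, "A", 100, 50),
    ("write", 2, "B", 200, 250),
    ("commit", 1),
    ("write", 2, "A", 50, 70),
]  # crash here: T2 never committed
print(recover(log, {"A": 100, "B": 200}))  # {'A': 50, 'B': 200}
```

Aborting a single transaction Ti during normal operation corresponds to calling undo(log, db, {i}) on the current database state.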
Conflict Serializable Schedules
Two schedules are conflict equivalent if they:
- Involve the same actions of the same transactions
- Order every pair of conflicting actions the same way
Schedule S is conflict serializable if S is conflict equivalent to some serial schedule.
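The definition translates almost directly into code: collect every ordered pair of conflicting actions in each schedule and compare the two sets. The Python sketch below assumes our own (kind, transaction, item) tuple encoding and that a transaction never repeats exactly the same action, so each tuple identifies one action; the schedule S and the function names are hypothetical examples.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple

Op = Tuple[str, int, str]  # (kind, transaction, item); kind is "r" or "w"


def ordered_conflicts(schedule: List[Op]) -> Set[Tuple[Op, Op]]:
    """Every pair of conflicting actions, listed in the order they occur."""
    pairs = set()
    for i, j in combinations(range(len(schedule)), 2):
        a, b = schedule[i], schedule[j]
        if a[1] != b[1] and a[2] == b[2] and "w" in (a[0], b[0]):
            pairs.add((a, b))  # a comes before b
    return pairs


def conflict_equivalent(s1: List[Op], s2: List[Op]) -> bool:
    # Same actions of the same transactions, and the same order for every conflicting pair.
    return Counter(s1) == Counter(s2) and ordered_conflicts(s1) == ordered_conflicts(s2)


S = [("r", 1, "A"), ("w", 1, "A"), ("r", 2, "A"), ("w", 2, "A"), ("r", 1, "B"), ("w", 1, "B")]
T1_T2 = [("r", 1, "A"), ("w", 1, "A"), ("r", 1, "B"), ("w", 1, "B"), ("r", 2, "A"), ("w", 2, "A")]
print(conflict_equivalent(S, T1_T2))  # True, so S is conflict serializable
```

Moving T1's operations on B ahead of T2's operations on A is allowed because they touch different objects and therefore do not conflict.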
The cycle in the graph reveals the problem: the output of T1 depends on T2, and vice versa.
Dependency Graph
Dependency graph: one node per Xact; an edge from Ti to Tj if an action of Ti precedes and conflicts with an action of Tj (for example, Tj reads or writes an object last written by Ti).
Theorem: A schedule is conflict serializable if and only if its dependency graph is acyclic.
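When the dependency graph is acyclic, any topological order of its nodes is a serial order that is conflict equivalent to the schedule. A minimal sketch using Python's standard graphlib module (Python 3.9 or later) follows; the tuple encoding and function names are our own assumptions, and edges are built from every pair of conflicting actions, with the transaction of the earlier action as the source.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple

Op = Tuple[str, int, str]  # (kind, transaction, item); kind is "r" or "w"


def dependency_graph(schedule: List[Op]) -> Dict[int, Set[int]]:
    """Edge Ti -> Tj when an action of Ti precedes and conflicts with one of Tj."""
    edges: Dict[int, Set[int]] = {txn: set() for _, txn, _ in schedule}
    for i, j in combinations(range(len(schedule)), 2):
        a, b = schedule[i], schedule[j]
        if a[1] != b[1] and a[2] == b[2] and "w" in (a[0], b[0]):
            edges[a[1]].add(b[1])
    return edges


def equivalent_serial_order(schedule: List[Op]) -> Optional[List[int]]:
    """A conflict-equivalent serial order, or None if the graph has a cycle."""
    edges = dependency_graph(schedule)
    # TopologicalSorter expects node -> predecessors, so reverse each edge.
    predecessors: Dict[int, Set[int]] = {txn: set() for txn in edges}
    for src, dsts in edges.items():
        for dst in dsts:
            predecessors[dst].add(src)
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None


schedule = [("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"), ("w", 2, "Y"), ("r", 3, "Y")]
print(equivalent_serial_order(schedule))  # [1, 2, 3]
```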
Review: Strict 2PL
Strict Two-phase Locking (Strict 2PL) Protocol:
- Each Xact must obtain an S (shared) lock on an object before reading, and an X (exclusive) lock on an object before writing.
- All locks held by a transaction are released when the transaction completes.
- If an Xact holds an X lock on an object, no other Xact can get a lock (S or X) on that object.
Strict 2PL allows only schedules whose precedence graph is acyclic.
Two-Phase Locking (2PL)
Two-Phase Locking Protocol:
- Each Xact must obtain an S (shared) lock on an object before reading, and an X (exclusive) lock on an object before writing.
- A transaction cannot request additional locks once it releases any lock.
- If an Xact holds an X lock on an object, no other Xact can get a lock (S or X) on that object.
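The difference from Strict 2PL lies only in when locks may be released: under plain 2PL a transaction may unlock before it completes, as long as it never acquires a lock afterwards. A minimal per-transaction checker in Python, using an assumed (verb, item) encoding with verbs "S", "X", "r", "w" and "unlock" (other verbs are ignored), might look like this:

```python
from typing import Dict, List, Tuple

Action = Tuple[str, str]  # (verb, item); verbs "S", "X", "r", "w", "unlock"


def follows_2pl(actions: List[Action]) -> bool:
    held: Dict[str, str] = {}  # item -> strongest lock mode held
    shrinking = False          # becomes True at the first unlock
    for verb, item in actions:
        if verb in ("S", "X"):
            if shrinking:
                return False   # growing phase is over: no new locks allowed
            if held.get(item) != "X":
                held[item] = verb
        elif verb == "unlock":
            shrinking = True
            held.pop(item, None)
        elif verb == "r" and item not in held:
            return False       # a read needs an S or X lock
        elif verb == "w" and held.get(item) != "X":
            return False       # a write needs an X lock
    return True


early_release = [("X", "A"), ("w", "A"), ("S", "B"), ("unlock", "A"), ("r", "B"), ("unlock", "B")]
lock_after_unlock = [("X", "A"), ("w", "A"), ("unlock", "A"), ("S", "B"), ("r", "B")]
print(follows_2pl(early_release))      # True
print(follows_2pl(lock_after_unlock))  # False
```

The early_release trace is valid 2PL but not Strict 2PL, because A is released before the transaction completes; another transaction could then read A and be forced into a cascading abort if this one later aborts.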
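View Serializability
Schedules S1 and S2 are view equivalent if:
- If Ti reads the initial value of A in S1, then Ti also reads the initial value of A in S2.
- If Ti reads the value of A written by Tj in S1, then Ti also reads the value of A written by Tj in S2.
- If Ti writes the final value of A in S1, then Ti also writes the final value of A in S2.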
Lock Management
Lock and unlock requests are handled by the lock manager.
Lock table entry:
- Number of transactions currently holding a lock
- Type of lock held (shared or exclusive)
- Pointer to queue of lock requests
Locking and unlocking have to be atomic operations.
Lock upgrade: a transaction that holds a shared lock can be upgraded to hold an exclusive lock.
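The lock table entry described above can be sketched as a small Python class. This is an illustration under assumptions of our own, not how any particular DBMS implements its lock manager: transactions and objects are identified by an int and a string, a threading.Lock makes each request or release atomic, waiting requests are granted in FIFO order, and a shared-lock holder may upgrade to exclusive only when it is the sole holder.

```python
import threading
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional, Set, Tuple


@dataclass
class LockEntry:
    holders: Set[int] = field(default_factory=set)  # transactions holding the lock
    mode: Optional[str] = None                      # "S", "X", or None if free
    queue: Deque[Tuple[int, str]] = field(default_factory=deque)  # waiting requests


class LockManager:
    def __init__(self) -> None:
        self.table: Dict[str, LockEntry] = {}
        self._mutex = threading.Lock()  # makes each request/release atomic

    def request(self, txn: int, item: str, mode: str) -> bool:
        """Return True if the lock is granted now, False if the request is queued."""
        with self._mutex:
            entry = self.table.setdefault(item, LockEntry())
            if txn in entry.holders:
                if mode == "S" or entry.mode == "X":
                    return True       # already holds a strong enough lock
                if entry.holders == {txn}:
                    entry.mode = "X"  # upgrade: sole S holder becomes X holder
                    return True
            elif not entry.holders or (mode == "S" and entry.mode == "S" and not entry.queue):
                entry.holders.add(txn)
                entry.mode = mode
                return True
            entry.queue.append((txn, mode))
            return False

    def release(self, txn: int, item: str) -> List[int]:
        """Release txn's lock on item; return the transactions granted as a result."""
        with self._mutex:
            entry = self.table[item]
            entry.holders.discard(txn)
            if not entry.holders:
                entry.mode = None
            granted = []
            while entry.queue:
                waiting, mode = entry.queue[0]
                others = entry.holders - {waiting}
                if (mode == "X" and others) or (mode == "S" and entry.mode == "X"):
                    break             # head of the queue is still incompatible
                entry.queue.popleft()
                entry.holders.add(waiting)
                entry.mode = "X" if mode == "X" else "S"
                granted.append(waiting)
            return granted


lm = LockManager()
lm.request(1, "A", "S")      # granted
lm.request(2, "A", "S")      # granted: S is compatible with S
lm.request(3, "A", "X")      # queued behind the two readers
lm.release(1, "A")
print(lm.release(2, "A"))    # [3]
```

Granting new S requests only when the queue is empty keeps a waiting X request from being starved by a stream of readers.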
Deadlocks
Deadlock: a cycle of transactions waiting for locks to be released by each other.
Two ways of dealing with deadlocks:
- Deadlock prevention
- Deadlock detection
Deadlock Prevention
Assign priorities based on timestamps. Assume Ti wants a lock that Tj holds. Two policies are possible:
- Wait-Die: If Ti has higher priority, Ti waits for Tj; otherwise Ti aborts.
- Wound-Wait: If Ti has higher priority, Tj aborts; otherwise Ti waits.
If a transaction restarts, make sure it has its original timestamp.
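The two policies reduce to a single comparison of timestamps. The sketch below assumes the usual convention that an older transaction (smaller timestamp) has higher priority; the function names and return strings are hypothetical.

```python
def wait_die(ts_i: int, ts_j: int) -> str:
    """Ti (timestamp ts_i) wants a lock that Tj (timestamp ts_j) holds."""
    # Assumption: smaller timestamp = older = higher priority.
    return "Ti waits" if ts_i < ts_j else "Ti aborts"


def wound_wait(ts_i: int, ts_j: int) -> str:
    """Ti (timestamp ts_i) wants a lock that Tj (timestamp ts_j) holds."""
    return "Tj aborts" if ts_i < ts_j else "Ti waits"


# Older T1 (ts=1) and younger T2 (ts=2) request each other's locks.
print(wait_die(1, 2), "|", wound_wait(1, 2))  # Ti waits | Tj aborts
print(wait_die(2, 1), "|", wound_wait(2, 1))  # Ti aborts | Ti waits
```

In both schemes only the younger transaction is ever aborted, so a restarted transaction that keeps its original timestamp eventually becomes the oldest and cannot be aborted indefinitely.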
Deadlock Detection
Multiple-Granularity Locks
It is hard to decide what granularity to lock (tuples vs. pages vs. tables).