
3: Database Systems

Part V: Physical Database Design

Physical Database Design

The process of mapping the logical data model into an internal set of physical database structures

Major consideration:
  Can the user get the desired information, in the appropriate format, and in a timely (i.e., acceptable response time) fashion?

Objectives of Physical Database Design

Implement the database as a set of stored records, files, indexes, etc.
Provide adequate performance
Ensure database integrity, security, and recoverability

Major Inputs to Physical Design

Logical data model
User processing requirements, including:
  Size of database and frequency of use
  Response time, security, backup, recovery, and retention of data
DBMS characteristics and other components of the computer operating environment

Components of Physical Design

Data volume and usage analysis
Data distribution strategy
File organization
Indexes
Integrity constraints

Data Volume and Usage Analysis

Database size
  Used to select physical storage devices and estimate cost of storage
Usage paths
  Used to select file organization and access methods
  Plan for use of indexes
  Strategy for data distribution

Composite Usage Map

[Figure: composite usage map, annotated with data volumes and access frequencies (per hour)]

Usage analysis:
  200 purchased parts accessed per hour
  80 quotations accessed from these 200 purchased part accesses
  70 suppliers accessed from these 80 quotation accesses

Composite Usage Map


Usage analysis:
  75 suppliers accessed per hour
  40 quotations accessed from these 75 supplier accesses
  40 purchased parts accessed from these 40 quotation accesses
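The hourly figures on the two usage-analysis slides can be turned into per-hop access ratios, which is one way to see which paths matter most when choosing indexes. The sketch below is only an illustration: the path labels and helper names are made up here, and only the counts come from the slides.

```python
# Access counts per hour, copied from the two usage-analysis slides.
# Path labels and function names are illustrative, not from the slides.
PATHS = {
    "via purchased part": [("purchased part", 200), ("quotation", 80), ("supplier", 70)],
    "via supplier": [("supplier", 75), ("quotation", 40), ("purchased part", 40)],
}

def step_ratios(path):
    """Return (source, target, ratio) per hop: target accesses per source access."""
    return [
        (src, dst, dst_count / src_count)
        for (src, src_count), (dst, dst_count) in zip(path, path[1:])
    ]

def total_accesses(record_type):
    """Sum the hourly accesses of one record type over the listed paths."""
    return sum(
        count
        for path in PATHS.values()
        for name, count in path
        if name == record_type
    )
```

For example, `step_ratios(PATHS["via purchased part"])` shows that 40% of purchased-part accesses go on to a quotation, and `total_accesses("quotation")` adds the 80 and 40 quotation accesses from the two paths.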

Data Distribution Strategies

Different approaches to determine at which nodes or sites to physically locate the data in a distributed computing network

Four strategies:
  Centralized
  Partitioned
  Replicated
  Hybrid

Centralized

All data are located at a single site
Advantage:
  Simple implementation
Disadvantages:
  Data not readily accessible to remote users
  Expensive data communication costs
  When the central system crashes, the entire database system fails

Partitioned

Database is divided into nonoverlapping partitions or fragments, which are assigned to particular sites
Advantage:
  Data is more accessible to local users
Disadvantage:
  More complex implementation

Replicated

Duplicate copies of the entire database are assigned to more than one site in the network
Advantage:
  Maximizes local access to data
Disadvantage:
  Update problems (synchronization)

Hybrid

Database is partitioned into critical and non-critical fragments
Critical fragments are stored at multiple sites, while non-critical fragments are stored at only one site
What are the advantages and disadvantages of this approach?

File Organization

How records are physically arranged or stored on secondary storage devices
Example:
  Storage on hard disks, tapes, CD-ROMs, etc.

Basic File Organizations

Sequential
Indexed
  Indexed sequential
  Indexed non-sequential
Hashed

Sequential File Organization

Records in the file are stored in sequence according to a primary key value

If sorted:
  Every insert or delete requires re-sorting
If not sorted:
  Average time to find the desired record = n/2, where n is the number of records
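The n/2 figure for an unsorted sequential file can be checked with a small counting experiment. This is a sketch that measures "time" in record comparisons, which is an assumption made here; the function names are illustrative.

```python
def linear_search(records, key):
    """Scan front to back; return (position, comparisons made)."""
    comparisons = 0
    for position, candidate in enumerate(records):
        comparisons += 1
        if candidate == key:
            return position, comparisons
    return None, comparisons

def average_comparisons(n):
    """Average comparisons over one successful search for each of n distinct keys."""
    records = list(range(n))
    return sum(linear_search(records, key)[1] for key in records) / n
```

`average_comparisons(1000)` gives 500.5: the exact average is (n + 1)/2, which the slide rounds to n/2.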

Indexed File Organization

An index is created that allows the user to locate individual records faster
Index:
  A table or other data structure used to determine the location of rows in the main table that satisfy some condition

Indexed Sequential

Records are stored sequentially by primary key value
Uses block index
Example:
  White pages phone directory
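A block index can be pictured as one entry per block holding the lowest key in that block, much like the guide words at the top of a phone directory page. The sketch below assumes fixed-size blocks and in-memory lists; the block size and all names are illustrative choices, not details from the slides.

```python
from bisect import bisect_right

BLOCK_SIZE = 3  # assumed records per block, chosen only for illustration

def build_blocks(sorted_keys, block_size=BLOCK_SIZE):
    """Split keys (already in primary-key order) into fixed-size blocks."""
    return [sorted_keys[i:i + block_size] for i in range(0, len(sorted_keys), block_size)]

def build_block_index(blocks):
    """One index entry per block: the lowest key stored in that block."""
    return [block[0] for block in blocks]

def find(key, blocks, block_index):
    """Pick one block through the index, then scan only that block.
    Returns (block number, offset in block) or None if the key is absent."""
    pos = bisect_right(block_index, key) - 1
    if pos < 0:
        return None
    block = blocks[pos]
    return (pos, block.index(key)) if key in block else None
```

Because only one block is scanned per lookup, the index needs far fewer entries than a full index with one entry per record.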

Indexed Non-Sequential

Records are stored non-sequentially
Full index is required
Example:
  Books in a library

Hashed File Organization

A hashing algorithm is used to determine the address of each record
Hashing algorithm:
  Converts a primary key value into a relative record number or file address
  Example: Divide the primary key value by a prime number and use the remainder as the storage location
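The division-remainder example can be sketched in a few lines. The prime 97 and the use of per-address lists to hold colliding keys are assumptions made for this illustration; the slides do not say how collisions are handled.

```python
PRIME = 97  # assumed divisor; the slide only asks for "a prime number"

def hash_address(primary_key, prime=PRIME):
    """Division-remainder hashing: the remainder is the relative record number."""
    return primary_key % prime

def store(keys, prime=PRIME):
    """Place each key at its hash address; keys that collide share a list."""
    buckets = {}
    for key in keys:
        buckets.setdefault(hash_address(key, prime), []).append(key)
    return buckets
```

With this divisor, key 100 maps to address 3 and key 975 to address 5, while keys 100 and 197 collide at address 3.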

Selecting File Organization

Select a file organization that provides a reasonable balance among the following criteria:

  Fast access for retrieval
  High throughput for processing transactions
  Efficient use of storage devices
  Protection from failures or data loss
  Minimal need for reorganization
  Accommodation for file growth
  Security from unauthorized use

Constraints in Selecting File Organization

Physical characteristics of secondary storage devices
Available operating system
File management software
User needs for storing and accessing data

Indexes

Stored in main memory for faster searching of required values
Types of index:
  Primary key
  Non-key
  Clustering

Types of Indexes

Primary key
  Index created based on the primary key
Non-key
  Index created for each desired non-key attribute
Clustering
  Speeds up retrievals by physically ordering the file or table based on a non-key attribute

Clustering Indexes

Clustering attribute
  Any non-key attribute used to group together rows that have a common value for the attribute
Clustering index
  Index defined on the clustering attribute of a table

Clustering Index: An Example


DESCRIPTION INDEX (Non-clustered)

DESCRIPTION   RECORD NO.
Bookcase      1
Chair         3, 5
Dresser       2, 6, 7
Stand         4

PRODUCT TABLE

RECORD NO.   PRODUCT NO.   DESCRIPTION   FINISH   PRICE
1            0100          Bookcase      Oak      75
2            0350          Dresser       Maple    625
3            0975          Chair         Cherry   100
4            1000          Stand         Pine     750
5            1250          Chair         Maple    125
6            1425          Dresser       Oak      800
7            1775          Dresser       Pine     1200

Clustering Index: An Example


DESCRIPTION INDEX (Clustered)

DESCRIPTION   RECORD NO.
Bookcase      1
Chair         2
Dresser       4
Stand         7

PRODUCT TABLE

RECORD NO.   PRODUCT NO.   DESCRIPTION   FINISH   PRICE
1            0100          Bookcase      Oak      75
2            0975          Chair         Cherry   100
3            1250          Chair         Maple    125
4            0350          Dresser       Maple    625
5            1425          Dresser       Oak      800
6            1775          Dresser       Pine     1200
7            1000          Stand         Pine     750
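The two example slides differ only in how the PRODUCT table is ordered and in how many record numbers each index entry needs. The sketch below rebuilds both indexes from the slide data; the function names and the tuple layout are illustrative choices.

```python
# PRODUCT rows in clustered order: (product_no, description, finish, price).
PRODUCTS = [
    ("0100", "Bookcase", "Oak", 75),
    ("0975", "Chair", "Cherry", 100),
    ("1250", "Chair", "Maple", 125),
    ("0350", "Dresser", "Maple", 625),
    ("1425", "Dresser", "Oak", 800),
    ("1775", "Dresser", "Pine", 1200),
    ("1000", "Stand", "Pine", 750),
]

def full_index(rows):
    """Non-clustered style: list every record number for each description."""
    index = {}
    for record_no, row in enumerate(rows, start=1):
        index.setdefault(row[1], []).append(record_no)
    return index

def clustered_index(rows):
    """Clustered style: keep only the first record number of each group.
    Valid only when rows with the same description are stored together."""
    index = {}
    for record_no, row in enumerate(rows, start=1):
        index.setdefault(row[1], record_no)
    return index
```

Calling `full_index(sorted(PRODUCTS))` orders the rows by product number and reproduces the non-clustered index, while `clustered_index(PRODUCTS)` reproduces the clustered one, with a single entry per description.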

Trees

Most common data structure for implementing indexes
Branching factor
  Degree of a tree
  Maximum number of children allowed per parent
Depth
  Number of levels between the root node and a leaf node in a tree

Balanced Trees

Also called B-trees
A tree in which all leaves are at the same distance from the root
Index files are most commonly organized using B-trees, which have predictable efficiency
Also support sequential retrieval of records

Using B-Trees in Indexes

Uses a tree search
Average time to find the desired record = depth of the tree
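To see why a search cost equal to the depth of the tree is attractive, the depth can be estimated from the branching factor. The sketch below assumes every node is completely full, which real B-trees usually are not, so it gives a lower bound rather than an exact figure; the function name is illustrative.

```python
def levels_needed(num_records, branching_factor):
    """Smallest number of levels whose fan-out covers num_records,
    i.e. the least d with branching_factor ** d >= num_records."""
    levels, capacity = 0, 1
    while capacity < num_records:
        capacity *= branching_factor
        levels += 1
    return levels
```

Under these assumptions, one million records with a branching factor of 100 need 3 levels, compared with about 500,000 record reads (n/2) for an unsorted sequential file.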

Main Trade-Off of Using an Index

Improved performance for retrievals versus degraded performance for inserting, deleting, and updating records in a table
Examples:
  Decision Support Systems (DSS)
  Transaction Processing Systems (TPS)

When to Use Indexes

Specify a unique index for the primary key attribute of each table
In most situations, it is also advisable to specify an index for foreign keys
Specify an index for non-key attributes that are referred to in qualification, sorting, and grouping commands
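These guidelines can be tried out with SQLite through Python's standard `sqlite3` module. SQLite is used here only because it is easy to run; the slides do not name a DBMS, and the table, column, and index names below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        order_date  TEXT
    );
    -- Index on the foreign key.
    CREATE INDEX idx_orders_customer ON orders(customer_id);
    -- Index on a non-key attribute used for qualification and sorting.
    CREATE INDEX idx_orders_date ON orders(order_date);
""")

def plan(sql):
    """Return the detail text of each row of SQLite's query plan for sql."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

fk_plan = plan("SELECT * FROM orders WHERE customer_id = 42")
date_plan = plan("SELECT * FROM orders WHERE order_date >= '2024-01-01' ORDER BY order_date")
```

Inspecting `fk_plan` shows whether the planner chose `idx_orders_customer` for the foreign-key lookup. The exact wording of the plan text differs between SQLite versions.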

When to Use Indexes

Index search fields
Index only large tables (when there are >100 values but not when there are <30 values)
Null values will not be referenced from an index
Remember, only use indexes heavily for non-volatile databases

Integrity Constraints

Business rules that preserve the integrity of the data
Four types:
  Default value
  Domain
  Null value
  Referential integrity
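The first three constraint types map onto column definitions in most SQL systems. The sketch below uses SQLite through Python's `sqlite3` module as an assumed stand-in DBMS; the table and the specific rules (default finish, positive price) are invented for illustration. Referential integrity is left out because the following slides treat it separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product (
        product_no  TEXT PRIMARY KEY,
        description TEXT NOT NULL,             -- null value constraint
        finish      TEXT DEFAULT 'Oak',        -- default value
        price       INTEGER CHECK (price > 0)  -- domain constraint
    )
""")

def try_insert(sql):
    """Run an INSERT; return True if accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert(
    "INSERT INTO product (product_no, description, price) VALUES ('0100', 'Bookcase', 75)")
missing_description = try_insert(
    "INSERT INTO product (product_no, price) VALUES ('0350', 625)")
negative_price = try_insert(
    "INSERT INTO product (product_no, description, price) VALUES ('0975', 'Chair', -5)")
default_finish = conn.execute(
    "SELECT finish FROM product WHERE product_no = '0100'").fetchone()[0]
```

Only the first insert is accepted, and its `finish` column is filled with the default value because the statement did not supply one.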

Referential Integrity

Considers the validity of references between objects in a database
The value of a foreign key in one table (referencing table) must be an actual value of a primary key in some other table (referenced table), or else it must be null, if allowed

Referential Integrity Rules

Insertion Rule

A row cannot be inserted in the referencing table unless a matching entry already exists in the referenced table
If insertion is allowed even without a matching entry in the referenced table, a null value is used for the foreign key in the referencing table
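The insertion rule can be observed directly in a DBMS that enforces foreign keys. The sketch below uses SQLite via Python's `sqlite3` module, which is an assumption of this example, as are the table and column names. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE quotation (
        quotation_id INTEGER PRIMARY KEY,
        supplier_id  INTEGER REFERENCES supplier(supplier_id)  -- nullable foreign key
    );
    INSERT INTO supplier VALUES (1, 'Acme');
""")

def insert_quotation(quotation_id, supplier_id):
    """Insert a referencing row; return False if the DBMS rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO quotation VALUES (?, ?)", (quotation_id, supplier_id))
        return True
    except sqlite3.IntegrityError:
        return False

matching = insert_quotation(10, 1)    # supplier 1 exists
orphan = insert_quotation(11, 99)     # no supplier 99
null_fk = insert_quotation(12, None)  # null foreign key, allowed by this schema
```

The row with a matching supplier and the row with a null foreign key are accepted, while the row pointing at a non-existent supplier is rejected.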

Referential Integrity Rules

Deletion Rule

A row cannot be deleted from the referenced table if there are matching rows in the referencing table
Three applicable rules:
  Restrict
  Nullify
  Cascade


Delete Rules

Restrict
  Deletion is not allowed
Nullify
  Foreign key values are changed to null in the referencing table before the corresponding row in the referenced table is deleted
Cascade
  Affected rows in the referencing table are deleted first, before the matching row in the referenced table is deleted
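In SQL, the three delete rules correspond to the `ON DELETE RESTRICT`, `ON DELETE SET NULL`, and `ON DELETE CASCADE` clauses of a foreign key. The sketch below compares them in SQLite via Python's `sqlite3` module; the choice of DBMS, the helper name, and the schema are assumptions made for illustration.

```python
import sqlite3

def delete_parent(rule):
    """Create a parent/child pair with the given ON DELETE rule, delete the
    parent row, and return (delete_succeeded, remaining child rows).
    `rule` is pasted into the DDL, so pass only trusted keywords."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(f"""
        CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY);
        CREATE TABLE quotation (
            quotation_id INTEGER PRIMARY KEY,
            supplier_id  INTEGER REFERENCES supplier(supplier_id) ON DELETE {rule}
        );
        INSERT INTO supplier VALUES (1);
        INSERT INTO quotation VALUES (10, 1);
    """)
    try:
        with conn:
            conn.execute("DELETE FROM supplier WHERE supplier_id = 1")
        deleted = True
    except sqlite3.IntegrityError:
        deleted = False
    children = conn.execute("SELECT quotation_id, supplier_id FROM quotation").fetchall()
    conn.close()
    return deleted, children
```

`delete_parent("RESTRICT")` refuses the delete and leaves the quotation untouched, `delete_parent("SET NULL")` keeps the quotation with a null `supplier_id`, and `delete_parent("CASCADE")` removes the quotation along with the supplier.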

Enforcing Referential Integrity

Enforcing referential integrity in application programs
  Unreliable: may be handled differently in separate programs and cause conflicts
Enforcing referential integrity constraints within the DBMS
  Consistent enforcement of rules
  Makes programming and maintenance easier

Denormalization

Database may not always be implemented in normalized form
Used to speed up data access
Reduces number of tables that must be accessed to retrieve data
No hard and fast rules

Denormalization

Situations to consider denormalization:
  One-to-one relationship between two entities
  Many-to-many relationship with non-key attributes
  Reference data

Denormalization of One-to-One

[Figure: ER diagram in which STUDENT has SCHOLARSHIP APPLICATION; attributes shown: Student_ID, Name, Address, Application_ID, Application_Date, Status]

Denormalized relation:
  STUDENT (Student_ID, Name, Address, Application_Date, Status)
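The one-to-one case amounts to folding the application attributes into the student row. The sketch below does this with plain dictionaries; the sample rows are invented, and only the attribute names come from the slide.

```python
# Hypothetical sample rows; attribute names follow the slide.
students = {
    "S1": {"Name": "Ana", "Address": "12 Elm St"},
    "S2": {"Name": "Ben", "Address": "9 Oak Ave"},
}
applications = {  # keyed by Student_ID, since the relationship is one-to-one
    "S1": {"Application_Date": "2024-03-01", "Status": "Pending"},
}

def denormalize(students, applications):
    """Fold each student's application attributes into the student row.
    Students without an application get None for those attributes."""
    empty = {"Application_Date": None, "Status": None}
    return {
        sid: {"Student_ID": sid, **row, **applications.get(sid, empty)}
        for sid, row in students.items()
    }
```

Reading a student together with the application now takes one lookup instead of two. One cost visible in this sketch is that students without an application carry null values in the merged relation.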

Denormalization of Many-to-Many

[Figure: ER diagram in which VENDOR submits PRICE QUOTE, which is given for ITEM; attributes shown: Vendor_ID, Vendor_Name, Address, Price, Item_ID, Description]

Denormalized relations:
  VENDOR (Vendor_ID, Vendor_Name, Address)
  ITEM_QUOTE (Vendor_ID, Item_ID, Description, Price)

Denormalization of Reference Data


[Figure: ER diagram in which STORAGE stores ITEM; attributes shown: Storage_ID, Container_No, Cabinet_No, Item_ID, Description]

Denormalized relation:
  STORAGE (Item_ID, Description, Container_No, Cabinet_No)
