Sunteți pe pagina 1din 32

Database

Concepts
What is a Database

 Personal address book in a Word document


 Collection of Word documents
 Collection of Excel Spreadsheets
 Data collected, maintained, & used in airline reservation
 Data used to support launch of a space shuttle

Structured Set of Data


Models of Reality
DML
DATABASE SYSTEM
REALITY
• structures DATABASE
• processes DDL

 Database is a model of structures of reality


 Use of a database reflect processes of reality
 Database system is a software system which supports definition
& use of a database
 DDL: Data Definition Language
 DML: Data Manipulation Language
When does DBMS is
used
 Persistent Storage of Data
 Centralized Control of Data
 Control of Redundancy
 Control of Consistency & Integrity
 Multiple User Support
 Sharing of Data
 Data Independence
 Control of Access & Security
 Backup & Recovery
Data Modeling

DATABASE SYSTEM
REALITY
• structures MODEL
data modeling
• processes

 Model represents a perception of structures of reality


 Data modeling process is to fix a perception of
structures of reality & represent this perception
 In data modeling process we select aspects & we
abstract
Database Design

purpose of database design is to create a


database which

 is a model of structures of reality

 supports queries & updates

 runs efficiently
Database Terminology

 Data Models
 Keys
 Integrity
 Triggers & Stored Procedures
 Null Values
 Surrogates
 Normalization
Data Model

A data model consists of notations for

expressing:
 Data Structures
 Integrity Constraints
 Operations
Data Model - Data
Structures
All data models have notation for defining:
 entity types

 attribute types

 relationship types

FLIGHT-SCHEDULE DEPT-AIRPORT

FLIGHT# AIRLINE WEEKDAY PRICE FLIGHT# AIRPORT-CODE

101 delta mo 156 101 atl

545 american we 110 912 cph

912 sc&inavian fr 450 545 lax

242 usair mo 231


Data Model -
Constraints
Rules that cannot be expressed by Data Structures

 Static constraints apply to database state


 Dynamic constraints apply to change of database state
 E.g., “All FLIGHT-SCHEDULE entities must have precisely one
DEPT-AIRPORT relationship

FLIGHT-SCHEDULE DEPT-AIRPORT

FLIGHT# AIRLINE WEEKDAY PRICE FLIGHT# AIRPORT-CODE

101 delta mo 156 101 atl

545 american we 110 912 cph

912 sc&inavian fr 450 545 lax

242 usair mo 231 242 bos


Data Model -
Operations
Operations support change & retrieval of data:
 insert FLIGHT-SCHEDULE(97, delta, tu, 258);
insert DEPT-AIRPORT(97, atl);
 select FLIGHT#, WEEKDAY
from FLIGHT-SCHEDULE
where AIRLINE=‘delta’;
FLIGHT-SCHEDULE DEPT-AIRPORT

FLIGHT# AIRLINE WEEKDAY PRICE FLIGHT# AIRPORT-CODE

101 delta mo 156 101 atl

545 american we 110 912 cph

912 sc&inavian fr 450 545 lax

242 usair mo 231 242 bos

97 delta tu 258 97 atl


Keys

Keys are uniqueness constraints


 A key on FLIGHT# in FLIGHT-SCHEDULE will force all
FLIGHT#’s to be unique in FLIGHT-SCHEDULE
 Consider following keys on DEPT-AIRPORT:

FLIGHT-SCHEDULE DEPT-AIRPORT

FLIGHT# AIRLINE WEEKDAY PRICE FLIGHT# AIRPORT-CODE

101 delta mo 156 101 atl

545 american we 110 912 cph

912 sc&inavian fr 450 545 lax

242 usair mo 231 242 bos


Integrity

 Integrity:
– does model reflect reality well?
– is model without internal conflicts?
 a FLIGHT# in FLIGHT-SCHEDULE cannot be null because it models
existence of an entity in real world
 a FLIGHT# in DEPT-AIRPORT must exist in FLIGHT-SCHEDULE
because it doesn’t make sense for a non-existing FLIGHT-SCHEDULE
entity to have a DEPT-AIRPORT

FLIGHT-SCHEDULE DEPT-AIRPORT

FLIGHT# AIRLINE WEEKDAY PRICE FLIGHT# AIRPORT-CODE

101 delta mo 156 101 atl

545 american we 110 912 cph

912 sc&inavian fr 450 545 lax

242 usair mo 231 242 bos


Triggers & Stored
Procedures
 Triggers can be defined to enforce constraints on a
database, e.g.,

 DEFINE TRIGGER DELETE-FLIGHT-SCHEDULE


ON DELETE FROM FLIGHT-SCHEDULE WHERE FLIGHT#=‘X’
ACTION DELETE FROM DEPT-AIRPORT WHERE FLIGHT#=‘X’;

FLIGHT-SCHEDULE DEPT-AIRPORT

FLIGHT# AIRLINE WEEKDAY PRICE FLIGHT# AIRPORT-CODE

101 delta mo 156 101 atl

545 american we 110 912 cph

912 sc&inavian fr 450 545 lax

242 usair mo 231 242 bos


Null Values

CUSTOMER# NAME MAIDEN NAME DRAFT STATUS


123-45-6789 Lisa Smith Lisa Jones inapplicable
234-56-7890 George Foreman inapplicable drafted
345-67-8901 unknown Mary Blake inapplicable

 Null-value unknown reflects that attribute does apply, but value is


currently unknown. That’s ok!
 Null-value inapplicable indicates that attribute does not apply.
That’s bad!
 Null-value inapplicable results from direct use of “catch all forms”
in database design
 “Catch all forms” are ok in reality, but detrimental in database design
Surrogates

reality
name custom# customer
customer custom# name addr

customer addr
surrogate-based representation

 surrogates are system-generated, unique, internal identifiers


DATA MODELS
 ER-Model
 Hierarchical Model
 Relational Model
 Object-Oriented Model
ER Model visa
required

dept domestic international


time flight flight
airport airport
dept
name code p
1 airport n
weekdays
airport flight
airport
addr schedule
arriv
1 n
airport 1
zip flight#
street city arriv
time instance
of
customer# date
n
customer reserva- flight
n n
customer instance
name tion

seat#
Hierarchical
Model
flight-sched
flight#

flight-inst dept-airp arriv-airp


date airport-code airport-code
customer
customer# customer name
 parent-child relationship types (1:n only!!):
(flight-sched,flight-inst), (flight-inst,customer)
 one record type is root, all or record types is a child of one
parent record type only
 substantial duplication of customer instances
Relational Model

 Commercial systems include: ORACLE, DB2,

SYBASE, INFORMIX, INGRES, SQL Server

 Dominates database market on all platforms


Relational Model -
Data Structures
 domains
 attributes
 relations

relation name
attribute names
flight-schedule
flight#: airline: weekday: price:

integer char(20) char(2) dec(6,2)

domain names
Relational Model -
Integrity Constraints
 Primary Keys
 Entity Integrity
 Referential Integrity

flight-schedule customer
flight# customer# customer name
p p

reservation
flight# date customer#
Relational Model -
Operations
 Powerful query languages
 Procedural; describes how to compute a query;
operators like JOIN, SELECT, PROJECT
 Declarative; describes desired result, e.g. SQL,
QBE
 insert, delete, & update capabilities
Object-Oriented
Model
based on object-oriented paradigm,
e.g., Simula, Smalltalk, C++, Java
area is in a state of flux

 object-oriented commercial systems include GemStone, Ontos,


Orion-2, Statice, Versant, O2
 object-relational model has relational repository model; adds
object-oriented features;
 object-relational commercial systems include Starburst,
POSTGRES
Clinical Database
 Data from Clinical Trial stored in
some computer system
– Manual
– Electronic
 Complex Data Cleaning, Review &
Reporting
 Excel, Access, SAS, Oracle
Clinical Db Design

Always follows Protocol Creation


 Accuracy

 Clarity

 Ease & Speed of Data Entry

 Creation of Analysis Datasets

 Source Data Transfer formats

 DBMS Requirements
Normalization
 Step-by-step decomposition of
complex records into simple
records
 Reduces redundancy
 Non-loss decomposition
– To avoid inconsistencies
– Update anomalies
Clinical Db
Normalization
 Tall Skinny Vs Short Fat
 Normalized Vs Non-normalized
 Fewer Columns & More Rows
 Efficient Access & Storage
 Reducing the Size of Data
Groupings or Records
 Does not apply to CRF Image Db
Short Fat

Patient Visit BP_DI BP_SY BP_DI BP_SY BP_DI BP_SY


ID A_1 S_1 A_2 S_2 A_3 S_3

1001 2 120 72 118 70 117 68


Tall Skinny

Patien Visi Measureme BP_DI BP_SY


t ID t nt A S

1001 2 1 120 72

1001 2 2 118 70

1001 2 3 117 68
Short Fat Vs Tall
Skinny
 Short Fat
– Data Cleaning Checks for variables
within a single visit is easy
– Missing Values easily detected
 Tall Skinny
– Easy Creation of Structures &
Associated Checks
– Data Querying is easier
Thank You

S-ar putea să vă placă și