Outline
× Definitions
× Selecting a dbms
× Selecting an application layer
× Relational Design
× Planning
× A very few words about Replication
× Space
Definitions
What is a database?
A database is an implementation of freeware or
commercial software that provides a means to
organize and retrieve data. The database is the
set of physical files in which all the objects and
database metadata are stored. These files can
usually be seen at the operating system level.
This talk will focus on the organization aspect of
data storage and retrieval.
Commercial vendors include Microsoft and Oracle.
Freeware products include MySQL and PostgreSQL.
For this discussion, all points/issues apply to both
commercial and freeware products.
Definitions
Instance
A database instance, or an "instance", is
made up of the background processes
needed by the database software.
These processes usually include a process
monitor, session monitor, lock monitor,
etc. They will vary from database vendor
to database vendor.
Definitions
What is a schema?
A SCHEMA IS NOT A DATABASE, AND A DATABASE IS NOT
A SCHEMA.
A database instance controls 0 or more databases.
A database contains 0 or more database application
schemas.
A database application schema is the set of database
objects that apply to a specific application. These objects
are relational in nature, and are related to each other
within a database to serve a specific functionality, for
example payroll, purchasing, calibration, trigger, etc. A
database application schema is not a database. Usually
several schemas coexist in a database.
A database application is the code base to manipulate
and retrieve the data stored in the database application
schema.
Definitions Cont.
Primary Definitions
× Table, a set of columns that contain data. In
the old days, a table was called a file.
× Row, a set of columns from a table reflecting a
record.
× Index, an object that allows for fast retrieval of
table rows. Every primary key and foreign key
should have an index for retrieval speed.
× Primary key, often designated pk, is 1 or more
columns in a table that make a record unique.
Definitions Cont.
Primary Definitions
× Foreign key, often designated fk, is a
column common to 2 tables that defines
the relationship between those 2 tables.
× Foreign keys are either mandatory or optional.
Mandatory forces a child to have a parent by
creating a not null column at the child. Optional
allows a child to exist without a parent, allowing
a nullable column at the child table (not a
common circumstance).
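The key and index definitions above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: SQLite stands in for whatever DBMS is chosen, and the department, project and employee tables are hypothetical names, not part of the original talk.

```python
import sqlite3


def build_schema(conn):
    """Create a hypothetical parent/child schema with keys and indexes."""
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE project (project_id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
    )
    conn.execute(
        """
        CREATE TABLE employee (
            emp_id     INTEGER PRIMARY KEY,                             -- pk
            dept_id    INTEGER NOT NULL REFERENCES department(dept_id), -- mandatory fk
            project_id INTEGER REFERENCES project(project_id)           -- optional fk
        )
        """
    )
    # Index each foreign key column for retrieval speed.
    conn.execute("CREATE INDEX employee_dept_fk_idx ON employee (dept_id)")
    conn.execute("CREATE INDEX employee_project_fk_idx ON employee (project_id)")


def try_insert_employee(conn, emp_id, dept_id, project_id):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO employee (emp_id, dept_id, project_id) VALUES (?, ?, ?)",
            (emp_id, dept_id, project_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

A child row with no project is accepted (optional fk), while a child row with no department, or with a department that does not exist, is rejected (mandatory fk).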
Definitions Cont.
Primary Definitions
An entity relationship diagram, or ER, is a
pictorial representation of the application
schema.
ER Example
[Figure: example ER diagram]
Definitions Cont.
Primary Definitions
Constraints are rules residing in the
database's data dictionary governing
relationships and dictating the ways
records are manipulated: what is a
legal move vs. what is an illegal
move. These are of the utmost
importance for a secure and
consistent set of data.
Definitions Cont.
Primary Definitions
Data Manipulation Language, or DML:
SQL statements that insert, update or
delete data in a database.
Data Definition Language, or DDL: SQL
used to create and modify database
objects used in an application schema.
Definitions Cont.
Primary Definitions
A transaction is a logical unit of work that
contains one or more SQL statements. A
transaction is an atomic unit. The effects
of all the SQL statements in a transaction
can be either all committed (applied to
the database) or all rolled back (undone
from the database), ensuring data
consistency.
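As a minimal sketch of the all-or-nothing behavior, assuming SQLite via Python's sqlite3 module as a stand-in DBMS and a hypothetical account table, the transfer below credits one row and debits another inside one transaction. If the debit violates a constraint, the credit is rolled back with it.

```python
import sqlite3


def open_ledger():
    """Build a hypothetical two-row account table in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (id INTEGER PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:
        conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
    return conn


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction: both updates or neither."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            # If this debit violates the CHECK constraint, the credit above
            # is undone as well.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {account id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
```

The credit is deliberately issued first so that a failed debit shows the rollback undoing work that had already been applied.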
Definitions Cont.
Primary Definitions
× A view is a selective presentation of the
structure of, and data in, one or more
tables (or other views). A view is a "virtual
table", having predefined columns and
joins to one or more tables, reflecting a
specific facet of information.
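A view can be sketched as follows, again assuming SQLite through Python's sqlite3 module. The vendor and purchase_order tables and the vendor_spend view are hypothetical names chosen to match the purchasing example mentioned earlier.

```python
import sqlite3


def build_purchasing(conn):
    """Create hypothetical vendor and purchase_order tables plus a view over them."""
    conn.executescript(
        """
        CREATE TABLE vendor (vendor_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE purchase_order (
            po_id     INTEGER PRIMARY KEY,
            vendor_id INTEGER NOT NULL REFERENCES vendor(vendor_id),
            amount    INTEGER NOT NULL,
            approver  TEXT
        );
        -- The view predefines the join and exposes only selected columns.
        CREATE VIEW vendor_spend AS
            SELECT v.name AS vendor, COUNT(*) AS orders, SUM(p.amount) AS total
            FROM vendor v JOIN purchase_order p ON p.vendor_id = v.vendor_id
            GROUP BY v.vendor_id, v.name;
        INSERT INTO vendor VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO purchase_order VALUES
            (10, 1, 250, 'kim'), (11, 1, 150, 'lee'), (12, 2, 75, 'kim');
        """
    )


def vendor_spend(conn):
    """Query the view exactly as if it were a table."""
    return conn.execute(
        "SELECT vendor, orders, total FROM vendor_spend ORDER BY vendor"
    ).fetchall()
```

Because the view stores no data of its own, a new purchase order shows up in vendor_spend the next time it is queried.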
Definitions Cont.
Primary Definitions
Database triggers are PL/SQL, Java, or C
procedures that run implicitly whenever a table
or view is modified or when some user actions
or database system actions occur. Database
triggers can be used in a variety of ways for
managing your database. For example, they can
automate data generation, audit data
modifications, enforce complex integrity
constraints, and customize complex security
authorizations. Trigger methodology differs
between databases.
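The audit use case can be sketched with SQLite through Python's sqlite3 module. This is an assumption-laden illustration: SQLite triggers are written in SQL rather than PL/SQL, Java, or C, and the calibration and calibration_audit tables are hypothetical.

```python
import sqlite3


def build_audited_table(conn):
    """Create a hypothetical calibration table whose updates are audited."""
    conn.executescript(
        """
        CREATE TABLE calibration (
            channel INTEGER PRIMARY KEY,
            gain    REAL NOT NULL
        );
        CREATE TABLE calibration_audit (
            audit_id INTEGER PRIMARY KEY,
            channel  INTEGER NOT NULL,
            old_gain REAL,
            new_gain REAL
        );
        -- Runs implicitly after every update of calibration.gain.
        CREATE TRIGGER calibration_audit_trg
        AFTER UPDATE OF gain ON calibration
        BEGIN
            INSERT INTO calibration_audit (channel, old_gain, new_gain)
            VALUES (OLD.channel, OLD.gain, NEW.gain);
        END;
        """
    )


def audit_rows(conn):
    """Return the audit trail as (channel, old_gain, new_gain) tuples."""
    return conn.execute(
        "SELECT channel, old_gain, new_gain FROM calibration_audit ORDER BY audit_id"
    ).fetchall()
```

The application only issues the UPDATE. The audit row is written by the database itself, so no client can forget to record the change.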
Definitions Cont.
Primary Definitions
Replication is the process of copying and
maintaining database objects, such as tables, in
multiple databases that make up a distributed
database system.
Backups are copies of the database data in a
format specific to the database.
Media recovery is
used to recover one or more files that have been
physically damaged as the result of a disk
failure. Media recovery requires the restoration
of the damaged files from the most recent
operating system backup of a database. It is of
the utmost importance to perform regularly
scheduled backups.
Definitions Cont.
Mission Critical
An application is defined as mission critical,
in my opinion, if
1. there are legal implications or financial loss to
the institution if the data is lost or unavailable.
2. there are safety issues if the data is lost or
unavailable.
3. no data loss can be tolerated.
4. uptime must be maximized (98%+).
Definitions Cont.
It seems odd, but "large" is a hard definition to
determine. Vldb is an acronym for very large
databases. Its definition varies depending on
the database software one selects. Very large
normally indicates data that is reaching the
limits of capacity for the database software, or
data for which extraordinary measures need to
be taken for operations such as backup,
recovery, storage, etc.
Definitions Cont.
Commercial databases do not have a practical
limit to the size of the load. The issues will be
backup strategies for large databases.
Freeware does limit the size of the databases and
the number of users. Documentation on these
issues varies widely from the freeware sites to the
user sites. MySQL supposedly can support 8 TB and
100 users. However, you will find arguments on
the user lists that these numbers cannot be
met.
Selecting a DBMS
Selecting a DBMS
How do I Choose?
Which database product is appropriate for my
application? You must make a requirements
assessment.
Does your database need 24x7 availability?
Is your database mission critical, such that no data loss can
be tolerated?
Is your database large? (backup and recovery methods)
What data types do I need? (binary, large objects?)
Do I need replication? What level of replication is
required? Read only? Read/Write? Read/Write is
very expensive, so can I justify it?
Selecting a DBMS
How do I Choose? Cont.
If your answer to any of the above is "yes", I would
strongly suggest purchasing and using a commercial
database with support. Support includes:
× 24x7 assistance with technical issues
× Upgrades/new releases
× Assistance with and use of proven backup/recovery
methods
Selecting a DBMS
The Freeware Choice
Freeware is an alternative for applications.
However, be forewarned: support for
these databases is done via email to an ad
hoc support group. The level of support
via these groups may vary over the life of
your database. Be prepared. Also expect
less functionality than any commercial
product. See
http://www-css.fnal.gov/dsg/external/freeware/
Selecting a DBMS
The Freeware Choice
Freeware is free.
Freeware is open source.
Freeware functionality is improving.
Freeware is good for smaller
non-mission-critical applications.
Selecting an Application Layer
Again, planning takes center stage. In the end you
want stability and dependability.
× How many users need access?
× What will the security requirements be?
× Are there software licensing issues that need
consideration?
× Is platform portability a requirement?
× Two tier or three tier architecture?
Selecting an Application Layer
× Direct access to the database layer? (probably
should be avoided)
× Are you replicating? How? Where? With what?
× There are no utilities that will port data from 1
database to another (e.g., postgres to mysql). If
database portability is a requirement,
independent code must be written to satisfy this
requirement.
Selecting an Application Layer
Cont.
Application maintenance issues
× People availability, working with users as a team, talent,
and turnover? (historically a huge issue)
× A "known" or "common" language?
× Freeware? Bug fixes, patches: are they important and
timely?
× Documentation? Set standards, procedures, and code
reviews, making sure the documentation exists and is
clear.
× Is the application flexible enough to easily accommodate
business rule changes that mandate modifications?
× The availability of an ER diagram at this stage is
invaluable. We consider it a must have.
× There are no utilities to port data from 1 type of db to
another. This lack of portability means a method to
port the data must be written if it is needed.
Selecting an Application Layer
Misc. application definitions...
This presentation is not an application
presentation, but I will mention a few terms you
may hear.
SQL, the query language for relational databases. A
must learn.
ODBC, Open Database Connectivity. The software
that allows an application to talk to a database.
JDBC, Java Database Connectivity.
Relational Design
Relational Design
The Setup
The database group has a standard 3 tier
infrastructure for developing and deploying
production databases and applications. This
infrastructure provides 3 database instances:
development, integration and production. This
infrastructure is applicable to any application
schema, mission critical or not. It is designed
to ensure development, testing, feedback,
signoff, and a protected production
environment.
Each of these instances contains 1 or more
applications.
Relational Design
The Setup
The 3 instances are used as follows:
1. Development instance. Developers'
playground. Small in size compared to
production. Much of the data is
"invented" and input by the developers.
Usually there is not enough disk space to
ever "refresh" with production data.
Relational Design Cont.
The Setup
2. The integration instance is used for
moving what is thought to be "complete"
functionality to a pre-production
implementation. Power users and
developers work in concert in integration
to make sure the specs were followed.
The users should use integration as their
sign off area. Cuts from dev to int are
frequent and common to maintain the
newest releases in int for user testing.
Relational Design Cont.
The Setup
3. The production instance, real data. Needs
to be kept pure. NO testing allowed. Very
few logons. The optimal setup of a
production database server machine has
~3 operating system logons: root, the
database logon (e.g. oracle), and a
monitoring tool. In a critical 24x7
supported database, developers,
development tools, web servers, and log files
should all be kept off the production
database server.
Relational Design Cont.
The Setup
Let's talk about mission critical & 24x7 a bit.
1. To optimize a mission critical 24x7 database, the
database server machine should be dedicated to
running the database, nothing else.
2. All software products need maintenance and downtime.
Resist putting software products on the db server
machine so that their maintenance does not inhibit the
running of the database. Further, if the product
breaks, it could inhibit access to the database for a
long period. Example: a logging application monitoring
users on the db goes wild, fills all available space and
halts the database. If this logging app were not on
the db server machine, the db would be unaffected by
the malfunction.
Relational Design Cont.
The Setup
3. All database applications and database software require
modifications. Most times these modifications require
down time because the schema or data modifications
need to lock entire tables exclusively. If you are sharing
your database instance with many other
applications, and 1 of those applications needs the
database for an upgrade, all apps may have to take the
down time. Avoid this by ensuring your 24x7 database
application is segregated from all other software that is
not absolutely needed. In that way you ensure any down
times are specific to your cause.
Our 1st relational example
[Figure: first relational example, ER diagram]
What is a schema?
It is:
× Tables (columns/datatypes) having
constraints (not null, unique, foreign &
primary keys), triggers, indexes, etc.
× Accounts
× Privileges & Roles
× Server side processes
It is not:
× The environment (servers, OS)
× The results of queries, i.e. objects
× Application code
Relational Design
Getting Started
Using your design tool, you will begin by relating objects that will
eventually become tables. All the other schema objects will fall out
of this design.
You will spend LOADS of time in your design tool, honing, redoing,
reacting to modifications, etc.
The end users and the designers need to be working almost at the
same desk for this process. If the end user is the designer, the end
user should involve additional users to ensure an unbiased and
general design.
It is highly suggested that the design be kept up to date for future
documentation and maintainers.
Tables are related, most frequently in a 0 to many relationship.
Example: 1 run will result in 0 or more events. Analyzing and
defining these relationships results in an application schema.
What will a good schema design
buy you?
I am afraid the 80% planning 20% implementation
rule applies. Gather requirements.
× Discovery of data that needs to be gathered.
× Data flexibility
Relational Design
Common Mistakes
Mistakes we see ALL the time
× Do not design your schema around your favorite
query. A relational design will enable all queries
to be speedy, not only your favorite.
× Don't design the schema around your narrow
view of the application. Get other users involved
from the start, ask for input and review.
Relational Design
Examples of Common Mistakes
× Using a timestamp as the primary key assumes
that no other record will be inserted within the
same second. In practice this was not the case, and an
insert operation failed. Use database-generated
sequences as primary keys and a NON-UNIQUE
index on the timestamp.
× A table with more than 900 columns. Such a
design will cause chaining, since each record is
not going to fit in one block. One record
spans many blocks, thus chaining, hence bad
performance.
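The timestamp mistake can be reproduced in a few lines. This is a sketch under assumptions: SQLite (via Python's sqlite3 module) has no separate sequence objects, so an AUTOINCREMENT integer key stands in for a database-generated sequence, and the reading_bad and reading_good tables are hypothetical.

```python
import sqlite3


def build_tables(conn):
    """Create one table keyed by timestamp and one keyed by a generated id."""
    # Mistake: the timestamp is the primary key, so two rows in one second collide.
    conn.execute("CREATE TABLE reading_bad (taken_at TEXT PRIMARY KEY, value REAL)")
    # Fix: a database-generated key, plus a non-unique index on the timestamp.
    conn.execute(
        "CREATE TABLE reading_good ("
        "reading_id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "taken_at TEXT NOT NULL, value REAL)"
    )
    conn.execute("CREATE INDEX reading_good_taken_at_idx ON reading_good (taken_at)")


def insert_readings(conn, table, stamps):
    """Insert one row per timestamp; return how many inserts were rejected."""
    rejected = 0
    for stamp in stamps:
        try:
            # table is a fixed, trusted name here, never user input.
            conn.execute(
                f"INSERT INTO {table} (taken_at, value) VALUES (?, ?)", (stamp, 1.0)
            )
        except sqlite3.IntegrityError:
            rejected += 1
    return rejected
```

With two readings stamped in the same second, reading_bad rejects the second insert while reading_good accepts both, and the index still supports fast lookups by time.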
Relational Design
Examples of Common Mistakes
× Do not let the application control a generated
sequence. Have seen locking issues, and
duplicate values issues when the application
increments the sequence. Have the database
increment/lock/constrain the sequence/primary
key. That is why the databases have sequence
mechanisms, use them.
× Use indices! An Atlas table with 200,000 rows,
halted during a query. Reason? No indices.
Added a primary key index, instantaneous query
response. Indices are not wasted space!
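One way to check whether a query can use an index is to ask the database for its plan. The sketch below assumes SQLite via Python's sqlite3 module and a hypothetical hit table. The exact plan wording differs between SQLite versions and between database products, and here the index is added on the searched column rather than on a primary key.

```python
import sqlite3


def build_hits(conn, n_rows=2000):
    """Create a hypothetical table with no keys or indexes and fill it."""
    conn.execute("CREATE TABLE hit (hit_id INTEGER, channel INTEGER, energy REAL)")
    with conn:
        conn.executemany(
            "INSERT INTO hit VALUES (?, ?, ?)",
            ((i, i % 100, float(i)) for i in range(n_rows)),
        )


def query_plan(conn, sql, params=()):
    """Return the detail text of each step SQLite plans for the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable description is the last column of each plan row.
    return [row[-1] for row in rows]
```

Running query_plan on a lookup by channel before and after `CREATE INDEX hit_channel_idx ON hit (channel)` shows the plan change from a full table scan to an index search.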
Relational Design
Examples of Common Mistakes
× We have examples where constraints were not used,
but "implemented" via the API. Bugs in the API
allowed data to be deleted that should not have
been deleted; constraints would have
prevented the error. We have also seen APIs fail
with "cannot delete" errors. They were trying to
force an invalid delete; luckily the database
constraints saved the data.
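The "cannot delete" case can be sketched as follows, assuming SQLite via Python's sqlite3 module. The run and event tables follow the run/event example used earlier in the talk, and buggy_api_delete_run is a hypothetical API call that forgets to check for child rows.

```python
import sqlite3


def build_run_schema(conn):
    """Create a parent run table, a child event table, and sample rows."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces fks only when enabled
    conn.execute("CREATE TABLE run (run_id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE event ("
        "event_id INTEGER PRIMARY KEY, "
        "run_id INTEGER NOT NULL REFERENCES run(run_id))"
    )
    with conn:
        conn.execute("INSERT INTO run VALUES (1)")
        conn.executemany("INSERT INTO event VALUES (?, 1)", [(10,), (11,)])


def buggy_api_delete_run(conn, run_id):
    """Hypothetical API call that forgets to check for child events first."""
    try:
        with conn:
            conn.execute("DELETE FROM run WHERE run_id = ?", (run_id,))
        return "deleted"
    except sqlite3.IntegrityError:
        # The database constraint, not the API, refused the invalid delete.
        return "cannot delete"
```

Without the foreign key, the delete would succeed and leave two orphaned events. With it, the run and its events both survive the bug.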
Entity Relationship Diagrams
1 to many
[Figure: ER diagram of a 1 to many relationship]
Entity Relationship Diagrams
many to many
[Figure: ER diagram of a many to many relationship]
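A many to many relationship is usually implemented with a third table that holds one row per pairing. The sketch below assumes SQLite via Python's sqlite3 module. The physicist, experiment and membership tables and their sample rows are hypothetical and are not taken from the diagram.

```python
import sqlite3


def build_membership(conn):
    """Create two parent tables and the junction table that links them."""
    conn.executescript(
        """
        PRAGMA foreign_keys = ON;
        CREATE TABLE physicist  (physicist_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE experiment (experiment_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        -- Junction table: one row per (physicist, experiment) pairing.
        CREATE TABLE membership (
            physicist_id  INTEGER NOT NULL REFERENCES physicist(physicist_id),
            experiment_id INTEGER NOT NULL REFERENCES experiment(experiment_id),
            PRIMARY KEY (physicist_id, experiment_id)
        );
        INSERT INTO physicist  VALUES (1, 'Ana'), (2, 'Raj');
        INSERT INTO experiment VALUES (1, 'alpha'), (2, 'beta');
        INSERT INTO membership VALUES (1, 1), (1, 2), (2, 2);
        """
    )


def experiments_of(conn, physicist_name):
    """List the experiments one physicist belongs to, via the junction table."""
    rows = conn.execute(
        "SELECT e.name FROM experiment e "
        "JOIN membership m ON m.experiment_id = e.experiment_id "
        "JOIN physicist p ON p.physicist_id = m.physicist_id "
        "WHERE p.name = ? ORDER BY e.name",
        (physicist_name,),
    )
    return [name for (name,) in rows]
```

Each side of the many to many becomes an ordinary 1 to many relationship into the junction table, and the composite primary key prevents the same pairing from being stored twice.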
Entity Relationship Diagrams
1 to 1
[Figure: ER diagram of a 1 to 1 relationship]
Relational Design
The Good
[Figure: ER diagram of a good design]
Relational Design
The Bad
[Figure: ER diagram of a bad design; caption text not recoverable]
Relational Design
The Ugly
[Figure: ER diagram of an ugly design; caption text not recoverable]
Relational Design
The Good... let's recap
[Figure: ER diagram of the good design, repeated]
Relational Design
What to expect from a design tool
× An entity relationship diagram
× The ability to create the DDL (data
definition language) needed
× The ability to project disk space usage
× DDL in a format that allows you to enter the
code into a code library (CVS), and that will
allow you to run it against your database
Relational Design Why bother?
Experience from RunII
TO SAVE TIME AND PRECIOUS PEOPLE RESOURCES!
Personnel consistency does not exist. Application
developers come and go regularly. The
documentation that a design product provides will
give the next developer an immediate understanding of
the application in picture format.
Application sharing is enhanced when others can look
at your design and determine whether the
application is reusable in their environment. SAM is
a good example of an application that 3 experiments
are now using.
Relational Design
Why bother? Cont.
When an application is under construction,
the ER diagram goes to every application
meeting, and quite possibly the wallet of
the application leader. It is the pictorial
answer to many issues.
Planning for disk space has been an issue;
the design tool should assist with this
task.
Planning
Overall
What do I need to plan for?
People, hardware, software, obsolescence,
maintenance, emergencies.
How far out do I need to plan?
Initially 2-4 years.
How often do I need to review the plans?
Annually.
What if my plan fails or looks undoable?
Nip it in the bud, be proactive, come up with
options.
Planning
Overall
× Disk space requirements. My experience is that all the
WAGs (wild guesses) fall short of what is needed. It
is hard to predict the number of rows in a table. It
would be easier if we knew the amount and results
of the science ahead of time! Remember, plan for 10x what
you think the data will take.
× Hardware requirements. Experience tells us that the
database machine should serve 1 master (if it is a
large database or mission critical): the database,
nothing else. Ideally there will be root, a database
monitor user and a database user, oracle for
example. No apache, no log file areas, no
applications, etc.
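The 10x rule of thumb is easy to turn into a quick calculation. The function below is only a back-of-the-envelope sketch: the parameter names are hypothetical, and it ignores index, log and backup space, which a design tool's projection would normally include.

```python
def estimate_disk_bytes(rows_per_year, bytes_per_row, years, safety_factor=10):
    """Project table storage, then multiply by the rule-of-thumb safety factor."""
    if min(rows_per_year, bytes_per_row, years, safety_factor) < 0:
        raise ValueError("inputs must be non-negative")
    raw_bytes = rows_per_year * bytes_per_row * years
    return raw_bytes * safety_factor


def to_gigabytes(n_bytes):
    """Convert bytes to decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_bytes / 1e9
```

For example, a guess of 1,000,000 rows per year at 200 bytes per row over a 3 year plan is 0.6 GB of raw data, so the rule says to plan for 6 GB.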
Planning
Overall
× Growth and obsolescence. Plan for 3-4 years before
needing to replace hardware. Hardware and
software become obsolete. New/upgraded software
gives additional functionality that you will want/need.
× Maintenance. Do you change the oil in your car?
Plan on 1 morning per month of downtime for caring
for the hardware and software. Security patches
could mandate additional stoppages. I cannot stress
how important this is. Firewalling will not protect
you from bugs and obsolescence. If the downtime is
not needed, it will not be taken. Planning
maintenance time is as important as planning to buy
disks.
Planning
User Requirements
Will user requirements influence your
hardware & software decisions?
Do you need replication?
What architecture is your API going to use?
How many users will be loading the
database and hardware?
Planning
Maintenance
× Database/Operating system software need
upgrades. One always hopes one can get
on a stable version of something and not
upgrade. That is a fallacy. Major version
upgrades provide needed and new
functionality. Bug patches and security
patches are a never ending fact of life.
Planning
Backup and Recovery
Backup and recovery procedures for vldb
(very large databases) are difficult at best.
Vldb is normally defined as multiple gigabyte or
terabyte databases. This is probably the
most sensitive area when choosing a
freeware database.
Hardware plays a part here as well. Ensure
when planning for hardware that there is a plan
for backup and recovery. Disk and tape
may be needed.
Planning
Good Practices with a Hammer
Make a standards document and enforce its
use. When DBAs and developers are
always on the same page, life is easier for
both. Expectations are clear and defined.
Anger and disappointment are lessened.
System as well as database standards need
to be followed and enforced.
Planning
Failover
Replication Cont.
× Oracle supports 3 types of replication: read-only snapshots
(materialized views), Advanced Replication, and Streams-based
replication.
× Streams propagates DDL modifications made to the master
automatically.
× Streams can be configured uni-directional (a single source
and one or more targets) or master to master, where
updates can happen to any participant database.
× Advanced Replication also supports master to master, but
Streams-based replication is recommended.
All databases use disk to store data.
Additional References
× Oracle Designer tutorial:
http://www-css.fnal.gov/dsg/internal/ora_adm/index.htm#designer
(choose Oracle Designer tutorial or
Oracle Designer Short Cuts and Lessons
Learned)