
Advanced Database Systems

Outline for Today


What this class is about: data management
What we will cover in this class
Requirements
Applications
Different advanced database systems

Data Management
Figure: an application issues queries against the data.

DataBase Management System (DBMS)

Example: At a Company
Query 1: Is there an employee named Nemo?
Query 2: What is Nemo's salary?
Query 3: How many departments are there in the company?
Query 4: How many employees have Salary >= 80K?
Query 5: What is the name of Nemo's department?
Query 6: How many employees are there in the Accounts department?

Employee
ID   Name   DeptID   Salary
10   Nemo   12       120K
20   Dory   156      79K
40   Gill   89       76K
52   Ray    34       85K

Department
ID    Name
12    IT
34    Accounts
89    HR
156   Marketing
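The six queries can be tried out directly. The following is a minimal sketch using Python's built-in sqlite3 module; the choice of SQLite, the column types, and storing salaries as plain integers (120K as 120000) are assumptions made for illustration, not something the slides prescribe.

```python
import sqlite3

# In-memory database holding the two example tables from the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Employee (
    ID INTEGER PRIMARY KEY,
    Name TEXT,
    DeptID INTEGER REFERENCES Department(ID),
    Salary INTEGER
);
INSERT INTO Department VALUES
    (12, 'IT'), (34, 'Accounts'), (89, 'HR'), (156, 'Marketing');
INSERT INTO Employee VALUES
    (10, 'Nemo', 12, 120000), (20, 'Dory', 156, 79000),
    (40, 'Gill', 89, 76000), (52, 'Ray', 34, 85000);
""")

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

# Query 1: is there an employee named Nemo? (1 = yes, 0 = no)
q1 = scalar("SELECT EXISTS (SELECT 1 FROM Employee WHERE Name = 'Nemo')")
# Query 2: Nemo's salary.
q2 = scalar("SELECT Salary FROM Employee WHERE Name = 'Nemo'")
# Query 3: number of departments.
q3 = scalar("SELECT COUNT(*) FROM Department")
# Query 4: number of employees with Salary >= 80K.
q4 = scalar("SELECT COUNT(*) FROM Employee WHERE Salary >= 80000")
# Query 5: name of Nemo's department (needs a join).
q5 = scalar("""
    SELECT d.Name FROM Employee e
    JOIN Department d ON e.DeptID = d.ID
    WHERE e.Name = 'Nemo'
""")
# Query 6: number of employees in the Accounts department.
q6 = scalar("""
    SELECT COUNT(*) FROM Employee e
    JOIN Department d ON e.DeptID = d.ID
    WHERE d.Name = 'Accounts'
""")

print(q1, q2, q3, q4, q5, q6)
```

Queries 5 and 6 are the ones that need both tables; the others touch a single table.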

DataBase Management System (DBMS)


High-level Query Q → DBMS → Answer

DBMS:
  Translates Q into the best execution plan for current conditions, runs the plan

Data

Example: Store that Sells Cars


Query: owners of Honda Accords who are <= 23 years old

Execution plan:
  Filter (Make = Honda and Model = Accord)
  Join (Cars.OwnerID = Owners.ID)
  Filter (Age <= 23)

Result
Make    Model    OwnerID   ID    Name   Age
Honda   Accord   12        12    Nemo   22
Honda   Accord   156       156   Dory   21

Cars
Make     Model    OwnerID
Honda    Accord   12
Toyota   Camry    34
Mini     Cooper   89
Honda    Accord   156

Owners
ID    Name   Age
12    Nemo   22
34    Ray    42
89    Gill   36
156   Dory   21
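To make the idea of an execution plan concrete, here is a small sketch in plain Python that runs the filter, join, filter pipeline over the two tables. The helper names `filter_rows` and `hash_join` are hypothetical, and a hash join is only one of several ways a DBMS might implement the join step. A second plan that filters owners by age first is included to show that different plans give the same answer.

```python
cars = [
    {"Make": "Honda", "Model": "Accord", "OwnerID": 12},
    {"Make": "Toyota", "Model": "Camry", "OwnerID": 34},
    {"Make": "Mini", "Model": "Cooper", "OwnerID": 89},
    {"Make": "Honda", "Model": "Accord", "OwnerID": 156},
]
owners = [
    {"ID": 12, "Name": "Nemo", "Age": 22},
    {"ID": 34, "Name": "Ray", "Age": 42},
    {"ID": 89, "Name": "Gill", "Age": 36},
    {"ID": 156, "Name": "Dory", "Age": 21},
]

def filter_rows(rows, predicate):
    """Selection operator: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def hash_join(left, right, left_key, right_key):
    """Equi-join: index the right input by key, then probe with each left row."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[left_key], [])]

def is_accord(row):
    return row["Make"] == "Honda" and row["Model"] == "Accord"

# Plan A: filter cars, join with owners, then filter on age.
accords = filter_rows(cars, is_accord)
joined = hash_join(accords, owners, "OwnerID", "ID")
plan_a = filter_rows(joined, lambda row: row["Age"] <= 23)

# Plan B: filter both inputs first, then join the smaller intermediate results.
young = filter_rows(owners, lambda row: row["Age"] <= 23)
plan_b = hash_join(accords, young, "OwnerID", "ID")

names_a = [row["Name"] for row in plan_a]
names_b = [row["Name"] for row in plan_b]
print(names_a, names_b)
```

Choosing between equivalent plans such as A and B is the job the slides assign to the DBMS.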

DataBase Management System (DBMS)


High-level Query Q → DBMS → Answer

DBMS:
  Translates Q into the best execution plan for current conditions, runs the plan
  Keeps data safe and correct despite failures, concurrent updates, online processing, etc.

Data

DBMS is multi-user
Example
Get account balance from database;
If balance > amount of withdrawal then
    balance = balance - amount of withdrawal;
    dispense cash;
    store new balance into database;

Homer at ATM1 withdraws $100
Marge at ATM2 withdraws $50
Initial balance = $400, final balance = ?
Should be $250 no matter who goes first

Final balance = $300

Homer withdraws $100:                     Marge withdraws $50:

read balance; $400
                                          read balance; $400
                                          if balance > amount then
                                            balance = balance - amount; $350
                                          write balance; $350
if balance > amount then
  balance = balance - amount; $300
write balance; $300

Final balance = $350

Homer withdraws $100:                     Marge withdraws $50:

read balance; $400
                                          read balance; $400
if balance > amount then
  balance = balance - amount; $300
write balance; $300
                                          if balance > amount then
                                            balance = balance - amount; $350
                                          write balance; $350
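The two interleavings can be replayed deterministically. The sketch below is an illustration only: `run_schedule` is a hypothetical helper that executes read, update, and write steps in a given order, and `run_with_lock` shows one simple remedy, holding a lock across the whole withdrawal. A real DBMS uses finer-grained concurrency control than a single lock.

```python
import threading

AMOUNTS = {"Homer": 100, "Marge": 50}

def run_schedule(schedule, initial=400):
    """Replay (customer, step) pairs in order and return the stored balance."""
    stored = initial   # the balance in the database
    local = {}         # each ATM's private copy of the balance
    for who, step in schedule:
        if step == "read":
            local[who] = stored
        elif step == "update":
            if local[who] > AMOUNTS[who]:
                local[who] -= AMOUNTS[who]
        elif step == "write":
            stored = local[who]
    return stored

def steps(who):
    """All three steps of one withdrawal, run back to back."""
    return [(who, s) for s in ("read", "update", "write")]

# Serial execution: one withdrawal completes before the other starts.
serial = run_schedule(steps("Homer") + steps("Marge"))

# Both read $400; Marge writes first, Homer overwrites with $300.
lost_300 = run_schedule([
    ("Homer", "read"), ("Marge", "read"),
    ("Marge", "update"), ("Marge", "write"),
    ("Homer", "update"), ("Homer", "write"),
])

# Both read $400; Homer writes first, Marge overwrites with $350.
lost_350 = run_schedule([
    ("Homer", "read"), ("Marge", "read"),
    ("Homer", "update"), ("Homer", "write"),
    ("Marge", "update"), ("Marge", "write"),
])

def run_with_lock(initial=400):
    """Run both withdrawals in threads; the lock makes each one atomic."""
    account = {"balance": initial}
    lock = threading.Lock()

    def withdraw(amount):
        with lock:  # read, check, and write happen as one unit
            current = account["balance"]
            if current > amount:
                account["balance"] = current - amount

    threads = [threading.Thread(target=withdraw, args=(a,))
               for a in AMOUNTS.values()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account["balance"]

locked = run_with_lock()
print(serial, lost_300, lost_350, locked)
```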

Concurrency control in DBMS


Similar to concurrent programming problems
  But data is not all in main memory
Appears similar to file system concurrent access?
  Approach taken by MySQL initially; now MySQL offers better alternatives
  But want to control at much finer granularity
  Or else one withdrawal would lock up all accounts!

Recovery in DBMS
Example: balance transfer
  decrement the balance of account X by $100;
  increment the balance of account Y by $100;
Scenario 1: Power goes out after the first instruction
Scenario 2: DBMS buffers and updates data in memory (for efficiency); before they are written back to disk, power goes out
Log updates; undo/redo during recovery
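The undo/redo idea can be sketched in a few lines. This is a deliberately simplified illustration, not how any particular DBMS implements recovery: the log record layout, the `recover` function, and the starting balances ($500 for X, $200 for Y) are all assumptions. Each update record keeps the old and new value, so recovery can redo the updates of committed transactions and undo the updates of uncommitted ones.

```python
def recover(disk, log):
    """Bring the on-disk state back to a consistent state using the log.

    Log records are tuples:
      ("update", txn, key, old_value, new_value)
      ("commit", txn)
    """
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # Redo pass (forward): reapply every update of a committed transaction.
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            _, _txn, key, _old, new = rec
            disk[key] = new

    # Undo pass (backward): roll back updates of uncommitted transactions.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            _, _txn, key, old, _new = rec
            disk[key] = old

    return disk

# Scenario 1: X was decremented on disk, then the power went out
# before Y was incremented, so T1 never committed. Recovery undoes X.
disk1 = {"X": 400, "Y": 200}
log1 = [("update", "T1", "X", 500, 400)]
after1 = recover(disk1, log1)

# Scenario 2: T2 finished and committed, but its updates were still
# buffered in memory when the power went out. Recovery redoes both.
disk2 = {"X": 500, "Y": 200}
log2 = [
    ("update", "T2", "X", 500, 400),
    ("update", "T2", "Y", 200, 300),
    ("commit", "T2"),
]
after2 = recover(disk2, log2)

print(after1, after2)
```

In both scenarios the total of X and Y is $700 after recovery, so no money is created or lost by the transfer.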


Motivation
Relational databases are tuned towards:
  simple data
  simple, ad-hoc queries
  multiple users
Other models are more suitable for other types of data:
  Object-Oriented, Deductive, Semi-Structured Databases, Data warehouses

Limitations of the relational model


Not every query can be expressed
Transitive closure cannot be expressed in Relational Algebra
  Give all cities reachable from Antwerp by plane
  Give all smallest components of a part
  Give all descendants of person X

Not even if you're very smart
  proof
  Extension to other relational query languages
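As an illustration of what such an extension can look like, SQL offers recursive common table expressions (`WITH RECURSIVE`), which SQLite supports. The sketch below answers the "reachable from Antwerp" query this way; the `Flight` table and its rows are hypothetical data invented for the example, and the slides do not say which extension the course has in mind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Flight (origin TEXT, destination TEXT)")
# Hypothetical direct flights, made up for this example.
conn.executemany("INSERT INTO Flight VALUES (?, ?)", [
    ("Antwerp", "London"),
    ("London", "New York"),
    ("New York", "Chicago"),
    ("Paris", "Rome"),
])

rows = conn.execute("""
    WITH RECURSIVE Reachable(city) AS (
        -- base case: cities with a direct flight from Antwerp
        SELECT destination FROM Flight WHERE origin = 'Antwerp'
        UNION
        -- recursive case: one more flight from a city already reached
        SELECT f.destination
        FROM Flight f JOIN Reachable r ON f.origin = r.city
    )
    SELECT city FROM Reachable ORDER BY city
""").fetchall()

reachable = [city for (city,) in rows]
print(reachable)
```

Using `UNION` rather than `UNION ALL` discards duplicates, which also keeps the recursion from looping forever if the flight graph contains cycles.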

Deductive Databases
Motivation is two-fold:
Add deductive capabilities to databases; the database contains:
  facts (extensional relations)
  rules to generate derived facts (intensional relations)

Database is knowledge base

Extend the querying
  Datalog allows for recursion

Deductive Databases
Datalog as engine of deductive databases
  similarities with Prolog
  has facts and rules
  rules define (possibly recursive) views

Semantics not always clear
  safety
  negation
  recursion

Deductive Databases
g(a,b).
g(b,c).
g(a,d).
reach(X,X) :- g(X,Y).
reach(X,Y) :- g(X,Y).
reach(X,Z) :- reach(X,Y), reach(Y,Z).
node(X) :- g(X,Y).
node(Y) :- g(X,Y).
unreach(X,Y) :- node(X), node(Y), not reach(X,Y).
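One way to see what this program computes is to evaluate it by hand in Python. The sketch below uses naive bottom-up evaluation: apply the reach rules until no new facts appear, then evaluate the rule with negation afterwards, once reach is complete. That ordering is an assumption about the intended semantics (it corresponds to evaluating the program stratum by stratum); the slide itself only shows the program.

```python
# Facts: the edge relation g.
g = {("a", "b"), ("b", "c"), ("a", "d")}

# reach(X,X) :- g(X,Y).   and   reach(X,Y) :- g(X,Y).
reach = {(x, x) for (x, _y) in g} | set(g)

# reach(X,Z) :- reach(X,Y), reach(Y,Z).
# Repeat until a fixpoint is reached, i.e. no new facts are derived.
while True:
    derived = {
        (x, z)
        for (x, y1) in reach
        for (y2, z) in reach
        if y1 == y2
    }
    new_facts = derived - reach
    if not new_facts:
        break
    reach |= new_facts

# node(X) :- g(X,Y).   and   node(Y) :- g(X,Y).
node = {x for (x, _y) in g} | {y for (_x, y) in g}

# unreach(X,Y) :- node(X), node(Y), not reach(X,Y).
# Evaluated only after reach is fully computed.
unreach = {(x, y) for x in node for y in node if (x, y) not in reach}

print(sorted(reach))
print(sorted(unreach))
```

Note that reach(c,c) and reach(d,d) are not derived, because the first rule only fires for nodes that have an outgoing edge.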

Deductive Databases
In this topic we study:
  How to handle negation and recursion in the same program
  How to efficiently evaluate Datalog queries

OO Databases
Many applications require the storage and manipulation of complex data
  design databases
  geometric databases

Object-Oriented programming languages manipulate complex objects


classes, methods, inheritance, polymorphism

OO Databases
Very simple example:
Class book
  set of authors
  title
  set of keywords

Extremely simple to model in OO language
Hard in relational database!

OO Databases
In many applications persistence of the data is nevertheless required
  protection against system failure
  consistency of the data

Mapping an object in an OO language to tuples of atomic values in a relational database is often problematic

OO Databases

Title                      Author         Keyword
Database System Concepts   Silberschatz   Database
Database System Concepts   Korth          Database
Database System Concepts   Sudarshan      Database
Database System Concepts   Silberschatz   Storage
Database System Concepts   Korth          Storage
Database System Concepts   Sudarshan      Storage

OO Databases
Or we go to 4NF

Title                      Author
Database System Concepts   Silberschatz
Database System Concepts   Korth
Database System Concepts   Sudarshan

Title                      Keyword
Database System Concepts   Database
Database System Concepts   Storage
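The contrast between the object view and the two relational encodings can be sketched as follows. The `Book` dataclass is a hypothetical stand-in for the "Class book" of the earlier slide. The single flat table needs one row per (author, keyword) combination, while the 4NF decomposition stores authors and keywords separately and recovers the flat table by a join.

```python
from dataclasses import dataclass, field
from itertools import product

@dataclass
class Book:
    """OO view: one object holding a title and two set-valued attributes."""
    title: str
    authors: set = field(default_factory=set)
    keywords: set = field(default_factory=set)

book = Book(
    title="Database System Concepts",
    authors={"Silberschatz", "Korth", "Sudarshan"},
    keywords={"Database", "Storage"},
)

# Single flat relation: every author paired with every keyword.
flat = {(book.title, a, k) for a, k in product(book.authors, book.keywords)}

# 4NF decomposition: one relation per set-valued attribute.
book_author = {(book.title, a) for a in book.authors}
book_keyword = {(book.title, k) for k in book.keywords}

# Joining the two relations on Title reproduces the flat relation.
rejoined = {
    (t1, a, k)
    for (t1, a) in book_author
    for (t2, k) in book_keyword
    if t1 == t2
}

print(len(flat), len(book_author), len(book_keyword), rejoined == flat)
```

Three authors and two keywords give 6 flat rows but only 3 + 2 rows after decomposition; the gap grows with the product of the set sizes.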

OO Databases
Basically OODB = persistent OO programming language
  Very important concept
  Rather uninteresting scientifically

This topic will mainly be self-study
  Reading a book chapter + Q&A session

Data Warehousing & OLAP


Figure: data warehouse architecture, in four stages.
Data Sources: operational DBs and other sources
  Extract, Transform, Load, Refresh
Data Storage: Data Warehouse and Data Marts, with Metadata and a Monitor & Integrator
  Serve
OLAP Engine: OLAP Server, ROLAP Server
Front-End Tools: Analysis, Query/Reporting, Data Mining

Data Warehousing & OLAP


Transaction processing
  Operational setting
  Up-to-date = critical
  Simple data
  Simple queries; only touch a small part of the database

Example: flight reservations
  ticket sales
  do not sell a seat twice
  reservation, date, name
  Give flight details of X
  List flights to Y

Data Warehousing & OLAP


Decision support
  Off-line setting
  Historical data
  Summarized data
  Integrate different databases
  Statistical queries

Example: flight company
  Evaluate ROI flights
  Flights of last year
  # passengers per carrier for destination X
  Passengers, fuel costs, maintenance info
  Average % of seats sold/month/destination
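The difference between the two kinds of query shows up even on a toy table. In the sketch below, the `Flight` table, its columns, and its four rows are hypothetical data invented for illustration. The first query is a transaction-processing style lookup that touches a few rows; the second is a decision-support style aggregate ("average % of seats sold per month per destination") that scans and summarizes the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Flight (
        flight_no TEXT, destination TEXT, month INTEGER,
        seats INTEGER, seats_sold INTEGER
    )
""")
# Hypothetical historical flight data.
conn.executemany("INSERT INTO Flight VALUES (?, ?, ?, ?, ?)", [
    ("F1", "Rome", 1, 100, 80),
    ("F2", "Rome", 1, 200, 100),
    ("F3", "Rome", 2, 100, 90),
    ("F4", "Oslo", 1, 50, 25),
])

# Transaction-processing style: list flights to a given destination.
flights_to_oslo = [
    row[0] for row in conn.execute(
        "SELECT flight_no FROM Flight WHERE destination = ?", ("Oslo",)
    )
]

# Decision-support style: average percentage of seats sold,
# grouped by destination and month.
occupancy = conn.execute("""
    SELECT destination, month, AVG(100.0 * seats_sold / seats)
    FROM Flight
    GROUP BY destination, month
    ORDER BY destination, month
""").fetchall()

print(flights_to_oslo)
print(occupancy)
```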

Data Warehousing & OLAP


In this topic we will study:
  Conceptual models for decision support
  Database explosion problem
  Efficient implementation strategies
    indexing, view materialization

XML
Why is XML important?
  simple, open, non-proprietary, widely accepted data exchange format

XML is like HTML but
  no fixed set of tags
    X = extensible
  no fixed semantics (or representation) of tags
    representation determined by separate stylesheet
    semantics determined by application
  no fixed structure
    user-defined schemas

XML
<PersonList Type="Student" Date="2004-12-12">
  <Title Value="Student List"/>
  <Contents>
    <Person>
      <Name>Jan Vijs</Name>
      <Id>11</Id>
      <Address>
        <Number>123</Number>
        <Street>Turnstreet</Street>
      </Address>
    </Person>
    <Person>
      <Id>66</Id>
      <Address>
        <Street>Hole Rd</Street>
      </Address>
    </Person>
  </Contents>
</PersonList>
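The "no fixed structure" point is visible when this document is processed: the second Person has no Name and no house Number. The sketch below parses the document with Python's standard xml.etree.ElementTree module; the dictionary keys used for the result are arbitrary choices for the example. Missing elements simply come back as None.

```python
import xml.etree.ElementTree as ET

doc = """
<PersonList Type="Student" Date="2004-12-12">
  <Title Value="Student List"/>
  <Contents>
    <Person>
      <Name>Jan Vijs</Name>
      <Id>11</Id>
      <Address>
        <Number>123</Number>
        <Street>Turnstreet</Street>
      </Address>
    </Person>
    <Person>
      <Id>66</Id>
      <Address>
        <Street>Hole Rd</Street>
      </Address>
    </Person>
  </Contents>
</PersonList>
"""

root = ET.fromstring(doc)

# Attributes of the root element.
list_type = root.get("Type")
list_date = root.get("Date")

# Collect one record per Person; findtext returns None for absent elements.
people = []
for person in root.iter("Person"):
    people.append({
        "id": person.findtext("Id"),
        "name": person.findtext("Name"),
        "street": person.findtext("Address/Street"),
        "number": person.findtext("Address/Number"),
    })

print(list_type, list_date)
for record in people:
    print(record)
```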

XML
In this topic:
  XML
  XQuery, XSLT
  LiXQuery

Taught by Prof. Paredaens

Summary of modern DBMS features


Persistent storage of data
Logical data model; declarative queries and updates → physical data independence
Multi-user concurrent access
Safety from system failures
Performance, performance, performance
  Massive amounts of data (terabytes ~ petabytes)
  High throughput (thousands ~ millions of transactions per minute)
  High availability (99.999% uptime)

Modern DBMS Architecture


Applications
  SQL
DBMS
  Parser
    Logical query plan
  Query Optimizer
    Physical query plan
  Query Executor
    Access method API calls
  Storage Manager
    File system API calls, storage system API calls
OS
Disk(s)

Using a Traditional DBMS


Figure: the User/Application sends a Query and receives a Result (shown for two queries). A Loader brings data into the stored tables, Table R and Table S.

New Approach for Data Streams


Figure: the User/Application registers a Continuous Query (Standing Query) with the Stream Query Processor, which evaluates it over the input streams and returns the Result.

Example Continuous (Standing) Queries


Web
  Amazon's best sellers over last hour
Network Intrusion Detection
  Track HTTP packets with destination address matching a prefix in given table and content matching *\.ida
Finance
  Monitor NASDAQ stocks between $20 and $200 that have moved down more than 2% in the last 20 minutes
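A standing query differs from a one-off query in that it is registered once and re-evaluated as each new item arrives. The sketch below illustrates this for the finance example with a sliding window per stock. The `DropMonitor` class, the tick format, and the sample prices are all hypothetical, and "moved down more than 2%" is interpreted here as a drop from the highest price seen inside the window, which is only one possible reading.

```python
from collections import defaultdict, deque

class DropMonitor:
    """Standing query: flag stocks priced within [low, high] whose price
    is more than drop_pct percent below their peak in the last
    window_minutes minutes."""

    def __init__(self, window_minutes=20, drop_pct=2.0, low=20.0, high=200.0):
        self.window_minutes = window_minutes
        self.drop_pct = drop_pct
        self.low = low
        self.high = high
        # symbol -> deque of (minute, price) pairs still inside the window
        self.history = defaultdict(deque)

    def on_tick(self, symbol, minute, price):
        """Process one stream item; return an alert tuple or None."""
        window = self.history[symbol]
        window.append((minute, price))
        # Evict ticks that have fallen out of the sliding window.
        while window[0][0] < minute - self.window_minutes:
            window.popleft()
        # Price filter from the query: only stocks between low and high.
        if not (self.low <= price <= self.high):
            return None
        peak = max(p for _m, p in window)
        drop = 100.0 * (peak - price) / peak
        if drop > self.drop_pct:
            return (symbol, minute, drop)
        return None

# Hypothetical input stream of (symbol, minute, price) ticks.
ticks = [
    ("ACME", 0, 100.0),
    ("ACME", 10, 99.0),    # down 1% from the peak: no alert
    ("ACME", 15, 97.0),    # down 3% from the peak: alert
    ("PENNY", 16, 10.0),
    ("PENNY", 18, 9.0),    # down 10%, but priced below $20: no alert
    ("ACME", 40, 96.5),    # earlier ticks have expired: no alert
]

monitor = DropMonitor()
results = [monitor.on_tick(*tick) for tick in ticks]
alerts = [r for r in results if r is not None]
print(alerts)
```

Only the window contents are kept in memory, which is what allows such a query to run over an unbounded input stream.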

New Challenges in DBMSs


High-level Query Q → DBMS → Answer

Data: TeraBytes, PetaBytes

<CD>
  <TITLE>Empire B.</TITLE>
  <ARTIST>Bob Dylan</ARTIST>
  <COUNTRY>USA</COUNTRY>
  <COMPANY>Columbia </COMPANY>
  <PRICE>10.90</PRICE>
</CD>

Summary: Data Management is Important


Core aspect of most sciences and engineering today
Core need in industry
Cool mix of theory and systems
Chances are you will find something interesting even if your primary interest is elsewhere
