
CS6005 - ADVANCED DATABASE SYSTEMS

UNIT I
PARALLEL AND DISTRIBUTED DATABASES

Inter and Intra Query Parallelism – Architecture – Query Evaluation – Optimization – Distributed Architecture – Storage – Catalog Management – Query Processing – Transactions – Recovery – Large-scale Data Analytics in the Internet Context – Map Reduce Paradigm – Run-time System for Supporting Scalable and Fault-tolerant Execution – Paradigms: Pig Latin and Hive, and Parallel Databases versus Map Reduce.
Parallel Database Systems

• Single administrative domain
• Homogeneous working environment
• Close proximity of data storage
• Multiple processors
1.3. Objectives
• The primary objective of parallel database processing is to improve performance
• Two main measures:
– Throughput: the number of tasks that can be completed within a given
time interval
– Response time: the amount of time it takes to complete a single task
from the time it is submitted
• Metrics:
– Speed up
– Scale up

D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008
1.3. Objectives (cont’d)

• Scale up
– Handling of larger tasks by increasing the degree of parallelism
– The ability to process larger tasks in the same amount of time by providing more
resources.

• Linear scale up: the ability to maintain the same level of performance when both the workload and the resources are increased proportionally
• Transactional scale up
• Data scale up

1.3. Objectives (cont’d)

• Transaction scale up
– The increase in the rate at which the transactions are processed
– The size of the database may also increase proportionally to the transactions’
arrival rate
– N-times as many users are submitting N-times as many requests or transactions
against an N-times larger database
– Relevant to transaction processing systems where the transactions are small
updates

• Data scale up
– The increase in the size of the database, where the task is a large job whose runtime depends on the size of the database (e.g. sorting)
– Typically found in online analytical processing (OLAP)

1.3. Objectives (cont’d)

• Parallel Obstacles
– Start-up and Consolidation costs,
– Interference and Communication, and
– Skew

1.3. Objectives (cont’d)

• Start-up and Consolidation
– Start-up: the cost of initiating multiple processes
– Consolidation: the cost of collecting the results obtained from each processor at a host processor
1.3. Objectives (cont’d)

• Interference and Communication
– Interference: competition among processes to access shared resources
– Communication: one process communicating with other processes, often having to wait for the others to be ready (i.e. waiting time)

1.3. Objectives (cont’d)

• Skew
– Unevenness of workload
– Load balancing is one of the critical factors to achieve linear speed up

1.4. Forms of Parallelism
• Forms of parallelism for database processing:
– Interquery parallelism
– Intraquery parallelism
– Interoperation parallelism
– Intraoperation parallelism
– Mixed parallelism

1.4. Forms of Parallelism (cont’d)

• Interquery Parallelism
– “Parallelism among queries”
– Different queries or transactions are executed in parallel with one another
– Main aim: scaling up transaction processing systems

1.4. Forms of Parallelism (cont’d)

• Intraquery Parallelism
– “Parallelism within a query”
– Execution of a single query in parallel on multiple processors and disks
– Main aim: speeding up long-running queries

1.4. Forms of Parallelism (cont’d)

• Execution of a single query can be parallelized in two ways:
– Intraoperation parallelism: speeding up the processing of a query by parallelizing the execution of each individual operation (e.g. parallel sort, parallel search)
– Interoperation parallelism: speeding up the processing of a query by executing in parallel the different operations in a query expression (e.g. a sort running simultaneously with a search)

1.4. Forms of Parallelism (cont’d)

• Intraoperation Parallelism
– “Partitioned parallelism”
– Parallelism due to the data being partitioned
– Since the number of records in a table can be large, the degree of parallelism is potentially enormous

1.4. Forms of Parallelism (cont’d)

• Interoperation parallelism: parallelism created by concurrently executing different operations within the same query or transaction
– Pipeline parallelism
– Independent parallelism

1.4. Forms of Parallelism (cont’d)

• Pipeline Parallelism
– Output records of one operation A are consumed by a second operation B even before the first operation has produced its entire output
– Multiple operations form a kind of assembly line that manufactures the query results
– Useful with a small number of processors, but does not scale up well (sketched below)
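Below is a minimal Python sketch of the pipeline idea (the operations and data are made up for illustration). Generators model the record-at-a-time dataflow: operation B starts consuming A's output before A has produced its entire result. In a real parallel DBMS the stages would run on different processors; here they simply interleave in one process.

def scan(table):
    # Operation A: produce records one at a time.
    for record in table:
        yield record

def select(records, predicate):
    # Operation B: consumes A's output as soon as the first record arrives.
    for r in records:
        if predicate(r):
            yield r

def project(records, columns):
    # Operation C: the next stage of the assembly line.
    for r in records:
        yield {c: r[c] for c in columns}

employees = [{"id": 1, "dept": "db", "salary": 90},
             {"id": 2, "dept": "os", "salary": 80}]
pipeline = project(select(scan(employees), lambda r: r["dept"] == "db"), ["id"])
print(list(pipeline))   # [{'id': 1}]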

1.4. Forms of Parallelism (cont’d)

• Independent Parallelism
– Operations in a query that do not depend on one another are executed in parallel
– Does not provide a high degree of parallelism

1.4. Forms of Parallelism (cont’d)

• Mixed Parallelism
– In practice, a mixture of all available parallelism forms is used.

Parallel Databases – Why?
The Philosophy –

The ideal database machine would be a single infinitely fast processor with an infinite memory with infinite bandwidth – and it would be infinitely cheap (free).

But do we have such an ideal machine? No.

So the challenge is to build an infinitely fast processor out of infinitely many processors of finite speed, and to build an infinitely large memory with infinite memory bandwidth from infinitely many storage units of finite speed.

The answer to this challenge – parallel databases.


Parallel Databases – The Implementation

Parallel Database Implementation – The Basic Techniques
Two key properties – linear speed-up and linear scale-up


Parallel Databases – Implementation
Two Kinds of Scale-up –
 Batch – the same query running on an N-times larger database.
 Transactional – N-times as many clients submitting N-times as many requests against an N-times larger database.
Parallel Databases – Implementation
Threats to Linear Speedup/Scale-up – start-up, interference, and skew (the parallel obstacles listed in Section 1.3)
Parallel Databases – Implementation
Hardware Architecture – Shared Memory, Shared Disk, and Shared Nothing (architecture diagrams not reproduced)
Parallel Databases – Implementation
Parallel Dataflow Approach To SQL Software

The SQL data model was originally proposed to improve programmer productivity by offering a nonprocedural database language.
SQL came with data independence, since programs do not specify how a query is to be executed.
Relational queries with these properties can be executed as a dataflow graph and can use parallelism.
Parallel Databases – The Future
Research Problems
Parallel Query Optimization
Application Program Parallelism
Physical Database Design
On-line Data Reorganization and Utilities
Parallel Databases – The Future
Future Directions
Many commercial success stories, but research issues still remain unresolved.
Some applications are not well supported by the relational data model.
Object-oriented design??
Introduction
• Parallel machines are becoming quite
common and affordable
– Prices of microprocessors, memory and disks
have dropped sharply
• Databases are growing increasingly large
– large volumes of transaction data are collected
and stored for later analysis.
– multimedia objects like images are increasingly
stored in databases
Introduction
• Large-scale parallel database systems are increasingly used for:
 storing large volumes of data
 processing time-consuming decision-support queries
 providing high throughput for transaction processing
Parallelism in Databases
• Data can be partitioned across multiple disks for parallel I/O.
• Individual relational operations (e.g., sort, join, aggregation) can be executed in parallel
– data can be partitioned and each processor can work independently on its own partition.
• Queries are expressed in a high-level language (SQL, translated to relational algebra)
– makes parallelization easier.
• Different queries can be run in parallel with each other. Concurrency control takes care of conflicts.
• Thus, databases naturally lend themselves to parallelism.
Parallel Database Architectures
• Shared memory -- processors share a
common memory
• Shared disk -- processors share a common
disk
• Shared nothing -- processors share neither a
common memory nor common disk
• Hierarchical -- hybrid of the above
architectures
Shared Memory
• Extremely efficient communication between processors
• Downside: not scalable beyond 32 or 64 processors
• Widely used for lower degrees of parallelism (4 to 8)
Shared Disk
• Examples: IBM Sysplex and DEC clusters
(now part of Compaq/HP) running Rdb (now
Oracle Rdb) were early commercial users
• Downside: bottleneck at interconnection to
the disk subsystem.
• Shared-disk systems can scale to a
somewhat larger number of processors, but
communication between processors is
slower.
Shared Nothing
• Examples: Teradata, Tandem, Oracle on nCUBE
• Data accessed from local disks (and local memory) does not pass through the interconnection network, thereby minimizing the interference of resource sharing.
• Shared-nothing multiprocessors can be scaled up to thousands of processors without interference.
• Main drawback: cost of communication and of non-local disk access; sending data involves software interaction at both ends.
1.5. Parallel Database Architectures
• Parallel computers are no longer a monopoly of supercomputers
• Parallel computers are available in many forms:
 Shared-memory architecture
 Shared-disk architecture
 Shared-nothing architecture
 Shared-something architecture
1.5. Parallel Database Architectures (cont’d)
• Shared-Memory and Shared-Disk Architectures
– Shared-Memory: all processors share a common main memory and secondary memory
– Load balancing is relatively easy to achieve, but the architecture suffers from memory and bus contention
– Shared-Disk: all processors, each of which has its own local main memory, share the disks
1.5. Parallel Database Architectures (cont’d)

• Shared-Nothing Architecture
– Each processor has its own local main memory and
disks
– Load balancing becomes difficult

1.5. Parallel Database Architectures (cont’d)

• Shared-Something Architecture
– A mixture of the shared-memory and shared-nothing architectures
– Each node is a shared-memory architecture, and the nodes are connected by an interconnection network as in a shared-nothing architecture
1.5. Parallel Database Architectures (cont’d)

• Interconnection Networks
– Bus, Mesh, Hypercube
Parallel System Performance Measures
• Speedup = (elapsed time of the task on the small system) / (elapsed time of the same task on the large system)
• Scaleup = (elapsed time of the small problem on the small system) / (elapsed time of the big problem on the big system)
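Restated compactly in LaTeX (T denotes elapsed time; the notation is mine):

\[
\text{Speedup} = \frac{T_{\text{task on small system}}}{T_{\text{same task on large system}}},
\qquad
\text{Scaleup} = \frac{T_{\text{small problem on small system}}}{T_{\text{big problem on big system}}}
\]

Linear speedup means Speedup = N when the large system has N times the resources of the small one; linear scaleup means Scaleup stays at 1 as the problem size and the resources grow by the same factor.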

Yan Huang – CSCI5330 Database Implementation – Parallel Database, 04/25/2005
Database Performance Measures
• throughput --- the number of tasks that can
be completed in a given time interval
• response time --- the amount of time it takes
to complete a single task from the time it is
submitted

Parallel Database Issues
• Data Partitioning
• Parallel Query Processing

I/O Parallelism
• Horizontal partitioning – tuples of a relation are
divided among many disks such that each tuple
resides on one disk.
• Partitioning techniques (number of disks = n):
– Round-robin
– Hash partitioning
– Range partitioning

Parallel Databases – Implementation
Data Partitioning

Partitioning a relation involves distributing its tuples over several disks.
Three kinds –
 Round-robin partitioning
 Range partitioning
 Hash partitioning
Parallel Databases – Implementation
(Figure: range, round-robin, and hash partitioning of a relation across disks)
Parallel Databases – Implementation
Round-Robin
Ideal for applications that wish to read the entire relation sequentially for each query.
Not ideal for point and range queries, since each of the n disks must be searched.

Hash
Ideal for point queries based on the partitioning attribute.
Ideal for sequential scans of the entire relation.
Not ideal for point queries on non-partitioning attributes.
Not ideal for range queries on the partitioning attribute.

Range
Ideal for point and range queries on the partitioning attribute (a sketch of all three schemes follows).
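A minimal Python sketch of the three schemes (the relation and partitioning attributes are made-up examples, not from the slides):

def round_robin_partition(tuples, n):
    # The i-th tuple goes to disk i mod n.
    parts = [[] for _ in range(n)]
    for i, t in enumerate(tuples):
        parts[i % n].append(t)
    return parts

def hash_partition(tuples, n, key):
    # Hash on the partitioning attribute to pick a disk.
    parts = [[] for _ in range(n)]
    for t in tuples:
        parts[hash(t[key]) % n].append(t)
    return parts

def range_partition(tuples, boundaries, key):
    # boundaries is a sorted partitioning vector, e.g. [10, 20] gives 3 ranges.
    parts = [[] for _ in range(len(boundaries) + 1)]
    for t in tuples:
        i = sum(t[key] > b for b in boundaries)   # index of the range the tuple falls in
        parts[i].append(t)
    return parts

emp = [{"id": i, "salary": s} for i, s in enumerate([5, 12, 17, 23, 8, 31])]
print(round_robin_partition(emp, 3))
print(hash_partition(emp, 3, "id"))
print(range_partition(emp, [10, 20], "salary"))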
Parallel Databases – Implementation
Handling of Skew
When a relation is partitioned (except by round-robin), the distribution of tuples may be skewed, with a high percentage of tuples placed in some partitions and fewer tuples in other partitions.
Two kinds –
Data skew (attribute-value skew)
Execution skew (partition skew)
Parallel Databases – Implementation
Parallelism with Relational Operators
Consider a simple sequential query – (the example query is not reproduced)
Parallel Databases – Implementation
A Relational Dataflow Graph (figure not reproduced)
Parallel Databases – Implementation
Famous Implementations of Parallel Databases

Teradata
Tandem NonStop SQL
Gamma
The Super Database Computer
Bubba
nCUBE
Skew
• The distribution of tuples to disks may be skewed
– Attribute-value skew
• Some values appear in the partitioning attributes of many tuples
– Partition skew
• Too many tuples go to some partitions and too few to others
• Round-robin handles skew well
• Hash and range partitioning may result in skew
Typical Database Query Types
• Sequential scan
• Point query
• Range query

Comparison of Partitioning Techniques

                  Round-Robin             Hashing                Range
Sequential Scan   Best/good parallelism   Good                   Good
Point Query       Difficult               Good for hash key      Good for range vector
Range Query       Difficult               Difficult              Good for range vector

Handling Skew using Histograms
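The figure for this slide is not reproduced; the idea can be sketched in Python (the histogram values are invented). An equi-depth partitioning vector is derived from a histogram of attribute-value frequencies so that each range partition receives roughly the same number of tuples despite skewed values:

def boundaries_from_histogram(histogram, n_parts):
    # histogram: sorted list of (value, frequency) pairs.
    total = sum(f for _, f in histogram)
    target = total / n_parts
    bounds, running = [], 0
    for value, freq in histogram:
        running += freq
        # Close the current partition once it has (roughly) its share of tuples.
        if running >= target * (len(bounds) + 1) and len(bounds) < n_parts - 1:
            bounds.append(value)
    return bounds

hist = [(10, 30), (20, 5), (30, 5), (40, 30), (50, 5), (60, 5), (70, 20)]
print(boundaries_from_histogram(hist, 3))   # [20, 40] -> partitions of about 35, 35, 30 tuples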

Interquery Parallelism
• Queries/transactions execute in parallel
with one another.
• Increase throughput

Intraquery Parallelism
• Execution of a single query in parallel on multiple processors/disks
• Speed up long-running queries.
• Two complementary forms of intraquery parallelism:
– Intraoperation Parallelism – parallelize the execution of each individual operation in the query.
– Interoperation Parallelism – execute the different operations in a query expression in parallel.
Why Parallel Access To Data?
At 10 MB/s it takes 1.2 days to scan 1 Terabyte; with 1,000-way parallelism it takes about 1.5 minutes.
Parallelism: divide a big problem into many smaller ones to be solved in parallel.
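The arithmetic behind those figures, assuming 1 Terabyte = 10^12 bytes (the slide rounds the parallel time to about 1.5 minutes):

\[
\frac{10^{12}\ \text{bytes}}{10 \times 10^{6}\ \text{bytes/s}} = 10^{5}\ \text{s} \approx 1.2\ \text{days},
\qquad
\frac{10^{5}\ \text{s}}{1000} = 100\ \text{s} \approx 1.7\ \text{minutes}
\]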
Parallel DBMS: Intro
• Parallelism is natural to DBMS processing
– Pipelined parallelism: many machines each doing one step in a multi-step process.
– Partitioned parallelism: many machines doing the same thing to different pieces of data.
– Both are natural in DBMS!
(Figure: any sequential program can be composed into a pipeline, or replicated over partitions with outputs split N ways and inputs merged M ways.)
DBMS: The || Success Story
• For a long time, DBMSs were the most (only?!)
successful/commercial application of parallelism.
– Teradata, Tandem vs. Thinking Machines, KSR.
– Every major DBMS vendor has some || server.
– (Of course we also have Web search engines now. )
• Reasons for success:
– Set-oriented processing (= partition ||-ism).
– Natural pipelining (relational operators/trees).
– Inexpensive hardware can do the trick!
– Users/app-programmers don’t need to think in ||
Some || Terminology
• Speed-Up: adding more resources results in proportionally less running time for a fixed amount of data.
• Scale-Up: if resources are increased in proportion to an increase in data/problem size, the overall time should remain constant.
(Figures: ideal speed-up is linear throughput growth (Xact/sec.) with the degree of ||-ism; ideal scale-up is flat response time (sec./Xact) as the degree of ||-ism grows.)
Architecture Issue: Shared What?
• Shared Memory (SMP): easy to program; expensive to build; difficult to scale. Examples: Sequent, SGI, Sun.
• Shared Disk: use affinity routing to approximate SN-like non-contention. Examples: VMScluster, Sysplex.
• Shared Nothing (network): hard to program; cheap to build; easy to scale. Examples: Tandem, Teradata, SP2.
(Figure: in each architecture, clients connect to processors and memory arranged according to the sharing model.)
What Systems Work This Way (as of 9/1995)

Shared Nothing:
  Teradata: 400 nodes
  Tandem: 110 nodes
  IBM / SP2 / DB2: 128 nodes
  Informix/SP2: 48 nodes
  ATT & Sybase: ? nodes

Shared Disk:
  Oracle: 170 nodes
  DEC Rdb: 24 nodes

Shared Memory:
  Informix: 9 nodes
  RedBrick: ? nodes
Different Types of DBMS ||-ism
• Intra-operator parallelism
– get all machines working together to compute a
given operation (scan, sort, join)
• Inter-operator parallelism
– each operator may run concurrently on a different
site (exploits pipelining)
• Inter-query parallelism
– different queries run on different sites
• We’ll focus mainly on intra-operator ||-ism
Automatic Data Partitioning
Partitioning a table (e.g. spreading tuples over sites A...E, F...J, K...N, O...S, T...Z):
• Range: good for equijoins, exact-match queries, and range queries
• Hash: good for equijoins and exact-match queries
• Round Robin: good to spread load
Shared disk and memory are less sensitive to partitioning; shared nothing benefits from "good" partitioning.
Parallel Scans/Selects

• Scan in parallel and merge (a.k.a. union all).
• Selection may not require all sites for range or hash partitioning, but always does for RR (see the pruning sketch after this list).
• Indexes can be constructed on each partition.
– Indexes useful for local accesses, as expected.
– However, what about unique indexes...? (May not always want primary key partitioning!)
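A minimal Python sketch of the pruning point (partition layout, boundaries, and predicate are made up): with range partitioning only the partitions overlapping the query range are scanned and the partial results are merged (union all); with round-robin every partition must be scanned.

def relevant_partitions(boundaries, lo, hi):
    # Indices of the range partitions that overlap the query range [lo, hi].
    n = len(boundaries) + 1
    first = sum(lo > b for b in boundaries)
    last = sum(hi > b for b in boundaries)
    return range(first, min(last, n - 1) + 1)

def parallel_range_select(partitions, boundaries, key, lo, hi):
    result = []
    for i in relevant_partitions(boundaries, lo, hi):        # prune sites
        result.extend(t for t in partitions[i] if lo <= t[key] <= hi)
    return result                                            # merged output

def parallel_rr_select(partitions, key, lo, hi):
    # Round-robin gives no pruning: every site scans its partition.
    return [t for part in partitions for t in part if lo <= t[key] <= hi]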
Secondary Indexes
• Secondary indexes become a bit troublesome in the face of partitioning...
• Can partition them via the base table key (each base-table partition holds an A..Z index over its own tuples).
– Inserts are local (unless unique??).
– Lookups go to ALL indexes.
• Can partition by secondary key ranges (e.g. A..C, D..F, G..M, N..R, S..Z), as contrasted in the sketch after this list.
– Inserts then hit 2 nodes (base, index).
– Ditto for index lookups (index, base).
– Uniqueness is easy, however.
• Teradata’s index partitioning solution:
– Partition non-unique indexes by base table key.
– Partition unique indexes by secondary key.
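A minimal Python sketch of the two options (data structures are invented): each index partition maps a secondary-key value to the matching base-table keys. Partitioning the index by the base-table key forces a lookup to probe every index partition; partitioning by secondary-key ranges sends the lookup to exactly one index partition.

def lookup_index_by_base_key_partitioning(index_parts, sec_value):
    # Index co-located with base partitions: the value could be anywhere,
    # so the lookup is broadcast to ALL index partitions.
    hits = []
    for part in index_parts:
        hits.extend(part.get(sec_value, []))
    return hits

def lookup_index_by_sec_key_partitioning(index_parts, boundaries, sec_value):
    # Index partitioned on secondary-key ranges: exactly one index partition
    # is probed; each hit then visits one base-table partition.
    i = sum(sec_value > b for b in boundaries)
    return index_parts[i].get(sec_value, [])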
Grace Hash Join
(Figure, Phase 1: the original relations R then S are read through B main-memory buffers, and hash function h scatters their tuples into B-1 partitions on disk.)
• In Phase 1, in the parallel case, the partitions get distributed to different sites:
– A good hash function automatically distributes work evenly! (A different hash function is used for the site partitioning, BTW.)
• Do Phase 2 (the actual joining) at each site.
• Almost always the winner for equi-joins (both phases are sketched below).
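A minimal Python sketch of both phases for the parallel case (relations, keys, and the number of sites are made up): phase 1 hash-partitions R and S on the join key so matching tuples land at the same site; phase 2 performs an ordinary build-and-probe hash join locally at each site.

def phase1_partition(relation, key, n_sites):
    # Hash-partition on the join key so matching R and S tuples co-locate.
    parts = [[] for _ in range(n_sites)]
    for t in relation:
        parts[hash(t[key]) % n_sites].append(t)
    return parts

def phase2_local_join(r_part, s_part, key):
    # Build a hash table on the R partition, probe it with the S partition.
    table = {}
    for r in r_part:
        table.setdefault(r[key], []).append(r)
    return [{**r, **s} for s in s_part for r in table.get(s[key], [])]

def parallel_hash_join(R, S, key, n_sites=4):
    r_parts = phase1_partition(R, key, n_sites)
    s_parts = phase1_partition(S, key, n_sites)
    out = []
    for site in range(n_sites):          # in a real system these run concurrently
        out.extend(phase2_local_join(r_parts[site], s_parts[site], key))
    return out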
Dataflow Network for || Joins

• Use of split/merge makes it easier to build parallel versions of sequential join code.
Parallel Sorting

• Basic idea:
– Scan in parallel, range-partition as you go.
– As tuples arrive, perform “local” sorting.
– Resulting data is sorted and range-partitioned
(i.e., spread across system in known way).
– Problem: skew!
– Solution: “sample” the data at the outset to determine good range partition points (sketched below).
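A minimal Python sketch of those steps (sample size and data are made up): sample to choose range boundaries, range-partition as you scan, then sort each partition locally; concatenating the partitions in order yields a globally sorted, range-partitioned result.

import random

def choose_boundaries(data, key, n_parts, sample_size=100):
    # Evenly spaced values from a sorted sample make better partition points
    # than splitting the key domain evenly, which counters skew.
    sample = sorted(key(x) for x in random.sample(data, min(sample_size, len(data))))
    return [sample[(i + 1) * len(sample) // n_parts] for i in range(n_parts - 1)]

def parallel_sort(data, key, n_parts=4):
    bounds = choose_boundaries(data, key, n_parts)
    parts = [[] for _ in range(n_parts)]
    for x in data:
        parts[sum(key(x) > b for b in bounds)].append(x)    # range-partition as you scan
    return [sorted(p, key=key) for p in parts]              # "local" sorts, one per site

salaries = [random.randint(0, 1000) for _ in range(1000)]
parts = parallel_sort(salaries, key=lambda x: x)
assert sum(parts, []) == sorted(salaries)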
Parallel Aggregation
• For each aggregate function, need a decomposition over the partitions s(1), ..., s(n) of S:
– count(S) = Σi count(s(i)), ditto for sum()
– avg(S) = (Σi sum(s(i))) / (Σi count(s(i)))
– and so on...
• For groups:
– Sub-aggregate groups close to the source.
– Pass each sub-aggregate to its group’s partition site (sketched below).
(Figure: per-partition Count operators over table partitions A...E, F...J, K...N, O...S, T...Z feed a global Count.)
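A minimal Python sketch of the decomposed, grouped average (tables and column names are made up): each site computes per-group (sum, count) sub-aggregates over its own partition, the sub-aggregates are combined per group, and avg is derived only at the end, since avg itself cannot be combined directly.

from collections import defaultdict

def local_aggregate(partition, group_key, value_key):
    # Per-site sub-aggregation: group -> [sum, count].
    acc = defaultdict(lambda: [0, 0])
    for t in partition:
        acc[t[group_key]][0] += t[value_key]
        acc[t[group_key]][1] += 1
    return acc

def combine(partials):
    # Merge one sub-aggregate per site, then derive the average per group.
    total = defaultdict(lambda: [0, 0])
    for acc in partials:
        for g, (s, c) in acc.items():
            total[g][0] += s
            total[g][1] += c
    return {g: s / c for g, (s, c) in total.items()}

partitions = [
    [{"dept": "db", "salary": 90}, {"dept": "os", "salary": 70}],
    [{"dept": "db", "salary": 110}],
]
print(combine(local_aggregate(p, "dept", "salary") for p in partitions))
# {'db': 100.0, 'os': 70.0}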
Complex Parallel Query Plans
• Complex Queries: Inter-Operator parallelism
– Pipelining between operators:
• note that sort or phase 1 of hash-join blocks the pipeline!
– Bushy Trees
(Figure: a bushy plan joins A with B on sites 1-4 and R with S on sites 5-8, then combines the results on sites 1-8.)
Observations
• It is relatively easy to build a fast parallel
query executor.
– S.M.O.P., well understood today.
• It is hard to write a robust and world-class
parallel query optimizer.
– There are many tricks.
– One quickly hits the complexity barrier.
– Many resources to consider simultaneously (CPU,
disk, memory, network).
Parallel Query Optimization

• Common approach: 2 phases
– Pick the best sequential plan (System R algorithm)
– Pick the degree of parallelism based on current system parameters.
• “Bind” operators to processors
– Take the query tree and “decorate” it with site assignments as in the previous picture.
What’s Wrong With That?
• Best serial plan != Best || plan! Why?
• Trivial counter-example:
– Table partitioned with a local secondary index at two nodes (A..M and N..Z)
– Range query: all of node 1 and 1% of node 2.
– Node 1 should do a table scan of its partition.
– Node 2 should use its secondary index (index scan).
• SELECT *
  FROM telephone_book
  WHERE name < “NoGood”;
Parallel DBMS Summary

• ||-ism natural to query processing:
– Both pipeline and partition ||-ism!
• Shared-Nothing vs. Shared-Memory
– Shared-disk too, but less “standard” (~older...)
– Shared-memory easy, costly. Doesn’t scale up.
– Shared-nothing cheap, scales well, harder to implement.
• Intra-op, Inter-op, & Inter-query ||-ism all possible.
|| DBMS Summary, cont.

• Data layout choices important!
– In practice, will not N-way partition every table.
• Most DB operations can be done partition-||
– Select, sort-merge join, hash-join.
– Sorting, aggregation, ...
• Complex plans.
– Allow for pipeline-||ism, but sorts and hashes block the pipeline.
– Partition ||-ism achieved via bushy trees.
|| DBMS Summary, cont.

• Hardest part of the equation: optimization.
– 2-phase optimization simplest, but can be ineffective.
– More complex schemes still at the research stage.
• We haven’t said anything about xacts, logging, etc.
– Easy in a shared-memory architecture.
– Takes a bit more care in a shared-nothing architecture.
|| DBMS Challenges (mid-1990’s)
• Parallel query optimization.
• Physical database design.
• Mixing batch & OLTP activities.
– Resource management and concurrency challenges for
DSS queries versus OLTP queries/updates.
– Also online, incremental, parallel, and recoverable
utilities for load, dump, and various DB reorg ops.
• Application program parallelism.
– MapReduce, anyone...?
– (Some new-ish companies looking at this, e.g.,
GreenPlum, AsterData, …)
