Distributed Database Systems

Vera Goebel
Department of Informatics
University of Oslo

2011
1

Contents
Review: Layered DBMS Architecture
Distributed DBMS Architectures
DDBMS Taxonomy
Client/Server Models
Key Problems of Distributed DBMS
Distributed data modeling
Distributed query processing & optimization
Distributed transaction management
Concurrency Control
Recovery
2

Functional Layers of a DBMS (Transaction Management spans the layers):

Interface:    Applications; Data Model / View Management
Control:      Semantic Integrity Control / Authorization
Compilation:  Query Processing and Optimization
Execution:    Storage Structures
Data Access:  Buffer Management
Consistency:  Concurrency Control / Logging

Database
3

Dependencies among DBMS components
(figure: arrows indicate dependencies among the components)

Central components: Transaction Mgmt, Access Path Mgmt, Sorting
component, Lock component, Log component (with savepoint mgmt),
System Buffer Mgmt.
4

Centralized DBS
- logically integrated
- physically centralized
(figure: terminals T1...T5 access one DBS over a network)

Traditionally: one large mainframe DBMS + n "stupid" terminals
5

Distributed DBS
- Data logically integrated (i.e., access based on one schema)
- Data physically distributed among multiple database nodes
- Processing is distributed among multiple database nodes
(figure: terminals T1...T3 access DBS1, DBS2, DBS3 over a network)

Why a Distributed DBS?
- Performance via parallel execution: more users, quick response
- More data volume: max disks on multiple nodes

Traditionally: m mainframes for the DBMSs + n terminals
6

DBMS Implementation Alternatives
(figure: a cube classifying DBMSs along three axes: Distribution,
Autonomy, and Heterogeneity)

Distributed, homogeneous:   Distributed Homog. DBMS,
                            Distributed Homog. Federated DBMS,
                            Distributed Homog. Multi-DBMS
Distributed, heterogeneous: Distributed Heterog. DBMS,
                            Distributed Heterog. Federated DBMS,
                            Distributed Heterog. Multi-DBMS
Centralized, homogeneous:   Centralized Homog. DBMS,
                            Centralized Homog. Federated DBMS,
                            Centralized Homog. Multi-DBMS
Centralized, heterogeneous: Centralized Heterog. DBMS,
                            Centralized Heterog. Federated DBMS,
                            Centralized Heterog. Multi-DBMS
7

Common DBMS Architectural Configurations

No Autonomy (fully integrated nodes N1...N4):
- complete cooperation on: Data Schema, Transaction Mgmt
- fully aware of all other nodes in the DDBS

Federated (independent DBMSs D1...D4):
- implement some cooperation functions: Transaction Mgmt,
  Schema mapping
- aware of the other DBSs in the federation

Multi (fully independent DBMSs D1...D4, plus a GTM & SM):
- an independent, global manager tries to coordinate the DBMSs
  using only existing DBMS services
- the DBMSs are unaware of the GTM and of the other DBSs in the
  Multi-DBS
8

Parallel Database System Platforms
(Non-autonomous Distributed DBMS)

Shared Everything: processors Proc1...ProcN and memory modules
Mem1...MemM share all disks (Disk 1...Disk p) over a fast
interconnection network. Typically a special-purpose hardware
configuration, but this can be emulated in software.

Shared Nothing: each node pairs a processor with its own memory and
disk (Proc1/Mem1/Disk 1 ... ProcN/MemN/Disk N); nodes communicate
only over a fast interconnection network. Can be a special-purpose
hardware configuration, or a network of workstations.

Distributed DBMS

Advantages:
- Improved performance, efficiency
- Extensibility (addition of new nodes)
- Transparency of distribution
  - Storage of data
  - Query execution
- Autonomy of individual nodes

Problems:
- Complexity of design and implementation
- Data consistency
- Safety
- Failure recovery
10

Client/Server Database Systems

The Simple Case of Distributed Database Systems?
11

Client/Server Environments
- data (object) server + n smart clients (workstations)
- objects stored and administered on the server
- objects processed (accessed and modified) on the workstations
  [sometimes on the server too]
- CPU-time-intensive applications on workstations
  - GUIs, design tools
- use the client systems' local storage capabilities
- combination with distributed DBS services
  => distributed server architecture
12

Clients with Centralized Server Architecture
(figure: workstations connected over a local communication network
to one Data Server, which provides an interface and the database
functions over the database on Disk 1...Disk n)
13

Clients with Distributed Server Architecture
(figure: workstations connected over a local communication network
to Data Server 1...Data Server m; each server runs an interface, the
distributed DBMS layer, and local data mgmt. functions over its own
database disks)
14

Clients with Data Server Approach: 3-tier client/server
(figure: user interface -> Application Server (query parsing, data
server interface) -> communication channel -> Data Server
(application server interface, database functions) -> database on
Disk 1...Disk n)
15

In the Client/Server DBMS Architecture,
how are the DB services organized?

There are several architectural options!

16

Client/Server Architectures

Relational: the client process runs the application with cursor
management; it ships SQL to the server process running the DBMS.

Object-Oriented: the client process runs the application with
object/page cache management; it exchanges objects, pages, or files
with the server process running the DBMS.
17

Object Server Architecture
(figure: the client process runs the Application over an Object
Manager with an Object Cache; client and server exchange objects,
object references, queries, method calls, locks, and log records;
the server process runs a Log/Lock Manager, an Object Manager with
its own Object Cache, a File/Index Manager, a Page Cache Manager
with a Page Cache, and Storage Allocation and I/O over the Database)
18

Object Server Architecture - summary

Unit of transfer: object(s)
The server understands the object concept and can execute methods;
most DBMS functionality is replicated on client(s) and server(s).

Advantages:
- server and client can run methods, workload can be balanced
- simplifies concurrency control design (centralized in server)
- implementation of object-level locking
- low cost for enforcing constraints on objects

Disadvantages/problems:
- remote procedure calls (RPC) for object references
- complex server design
- client cache consistency problems
- page-level locking, objects get copied multiple times,
  large objects
19

Page Server Architecture
(figure: the client process runs the Application over an Object
Manager with an Object Cache, plus a File/Index Manager and a Page
Cache Manager with a Page Cache; client and server exchange pages,
page references, locks, and log records; the server process runs a
Log/Lock Manager and a Page Cache Manager with a Page Cache over
Storage Allocation and I/O and the database)
20

Page Server Architecture - summary

Unit of transfer: page(s)
The server deals only with pages (understands no object semantics).
Server functionality: storage/retrieval of page(s), concurrency
control, recovery.

Advantages:
- most DBMS functionality at client
- page transfer -> server overhead minimized
- more clients can be supported
- object clustering improves performance considerably

Disadvantages/problems:
- method execution only on clients
- object-level locking
- object clustering
- large client buffer pool required
21

File Server Architecture
(figure: the client process runs the Application over an Object
Manager, a File/Index Manager, and a Page Cache Manager with an
Object Cache and a Page Cache; pages and page references flow
through NFS directly to the database; the server process runs a
Log/Lock Manager (locks, log records) and Space Allocation)
22

File Server Architecture - summary

Unit of transfer: page(s)
A simplification of the page server: clients use a remote file
system (e.g., NFS) to read and write DB page(s) directly.
Server functionality: I/O handling, concurrency control, recovery.

Advantages:
- same as for page server
- NFS: user-level context switches can be avoided
- NFS is widely used -> stable SW, will be improved

Disadvantages/problems:
- same as for page server
- NFS writes are slow
- read operations bypass the server -> no request combination
- object clustering
- coordination of disk space allocation
23

Cache Consistency in Client/Server Architectures

Avoidance-Based Algorithms:
- Synchronous: client sends 1 msg per lock to the server; client
  waits; server replies with ACK or NACK.
- Asynchronous: client sends 1 msg per lock to the server; client
  continues; server invalidates cached copies at other clients.
- Deferred: client sends all write lock requests to the server at
  commit time; client waits; server replies when all cached copies
  are freed.

Detection-Based Algorithms:
- Synchronous: client sends an object status query to the server
  for each access; client waits; server replies.
- Asynchronous: client sends 1 msg per lock to the server; client
  continues; after commit, the server sends updates to all cached
  copies.
- Deferred: client sends all write lock requests to the server at
  commit time; client waits; server replies based on W-W conflicts
  only.

(The original figure marks the best performing algorithms; a sketch
of the asynchronous avoidance-based scheme follows below.)
24
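To make the avoidance-based asynchronous variant above concrete, here is a minimal Python sketch. It assumes an in-process server holding a table of cached copies; the names (`Server`, `write_lock`, `invalidate`) are invented for illustration, and a real system would exchange network messages rather than direct calls.

```python
class Server:
    def __init__(self):
        self.copies = {}                 # OID -> set of client ids caching it

    def register_copy(self, oid, cid):
        self.copies.setdefault(oid, set()).add(cid)

    def write_lock(self, oid, writer_cid, clients):
        # Grant at once (the writer continues working); invalidate the
        # cached copies at all other clients afterwards.
        for cid in self.copies.get(oid, set()) - {writer_cid}:
            clients[cid].invalidate(oid)
        self.copies[oid] = {writer_cid}


class Client:
    def __init__(self):
        self.cache = {}                  # OID -> cached value

    def invalidate(self, oid):
        self.cache.pop(oid, None)        # drop the stale copy
```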

Comparison of the 3 Client/Server Architectures

Page & File Server:
- Simple server design
- Complex client design
- Fine-grained concurrency control difficult
- Very sensitive to client buffer pool size and clustering

Object Server:
- Complex server design
- Relatively simple client design
- Fine-grained concurrency control
- Reduces data movement, relatively insensitive to clustering
- Sensitive to client buffer pool size

Conclusions:
- No clear winner
- Depends on object size and the application's object access pattern
- File server ruled out by poor NFS performance
25

Problems in Distributed DBMS Services

- Distributed Database Design
- Distributed Directory/Catalogue Mgmt
- Distributed Query Processing and Optimization
- Distributed Transaction Mgmt
  - Distributed Concurrency Control
  - Distributed Deadlock Mgmt
  - Distributed Recovery Mgmt

(figure: distributed DB design influences directory management,
query processing, concurrency control (lock), and reliability (log);
concurrency control interacts with transaction management and
deadlock management)
26

Distributed Storage in Relational DBMSs

- horizontal fragmentation: distribution of rows (selection)
- vertical fragmentation: distribution of columns (projection)
- hybrid fragmentation: projected columns from selected rows
- allocation: which fragment is assigned to which node?
- replication: multiple copies at different nodes; how many copies?

Design factors:
- Most frequent query access patterns
- Available distributed query processing algorithms

Evaluation criteria:
- Cost metrics for: network traffic, query processing,
  transaction mgmt
- A system-wide goal: maximize throughput or minimize latency

(A sketch of the three fragmentation styles follows below.)
27
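As a concrete illustration of the three fragmentation styles, the following sketch applies them to a toy employee table. The table contents, predicates, and fragment names are invented for this example; they are not from the slides.

```python
# Rows of a toy Employee relation, represented as plain dicts.
rows = [
    {"id": 1, "name": "Ada", "dept": "Oslo",   "salary": 60},
    {"id": 2, "name": "Bo",  "dept": "Bergen", "salary": 50},
]

# Horizontal fragmentation: selection -> subsets of rows per node.
frag_oslo   = [r for r in rows if r["dept"] == "Oslo"]
frag_bergen = [r for r in rows if r["dept"] != "Oslo"]

# Vertical fragmentation: projection -> subsets of columns per node.
# The key ("id") is kept in every fragment so rows can be rejoined.
frag_pay  = [{"id": r["id"], "salary": r["salary"]} for r in rows]
frag_info = [{"id": r["id"], "name": r["name"], "dept": r["dept"]}
             for r in rows]

# Hybrid fragmentation: projected columns from selected rows.
frag_hybrid = [{"id": r["id"], "salary": r["salary"]}
               for r in rows if r["dept"] == "Oslo"]
```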

Distributed Storage in OODBMSs

Must fragment, allocate, and replicate object data among nodes.

Complicating factors:
- Encapsulation hides object structure
- Object methods (bound to class, not instance)
- Users dynamically create new classes
- Complex objects
- Effects on garbage collection algorithm and performance

Approaches:
- Store objects in relations and use RDBMS strategies to
  distribute data
  - All objects are stored in binary relations [OID, attr-value]
  - One relation per class attribute
  - Use nested relations to store complex (multi-class) objects
- Use object semantics
(A sketch of the binary-relation approach follows below.)

Evaluation criteria:
- Affinity metric (maximize)
- Cost metric (minimize)
28
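Below is a minimal sketch of the binary-relation approach listed above: one [OID, attr-value] relation per class attribute, so each relation can be allocated to a node independently. The `Person` class and its attributes are invented for illustration.

```python
binary_relations = {}   # (class, attribute) -> list of (OID, value)

def store(oid, cls, attrs):
    # One [OID, attr-value] relation per class attribute.
    for attr, value in attrs.items():
        binary_relations.setdefault((cls, attr), []).append((oid, value))

def fetch(oid, cls, attr):
    # Reassembling an object means probing one relation per attribute.
    return next(v for o, v in binary_relations[(cls, attr)] if o == oid)

store("oid1", "Person", {"name": "Ada", "age": 36})
assert fetch("oid1", "Person", "age") == 36
```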

Horizontal Partitioning using Object Semantics

- Divide instances of a class into multiple groups
  - Based on selected attribute values:
    Node1: Attr A < 100;  Node2: Attr A >= 100
  - Based on subclass designation:
    Node1: Subcl X;  Node2: Subcl Y
- Fragment an object's complex attributes from its simple attributes
- Fragment an object's attributes based on typical method
  invocation sequence
  - Keep sequentially referenced attributes together
29

Vertical Partitioning using Object Semantics

Fragment object attributes based on the class hierarchy:
- Fragment #1: instances of Class X
- Fragment #2: instances of Subclass subX
- Fragment #3: instances of Subclass subsubX

Breaks object encapsulation?
30

Path Partitioning using Object Semantics

- Use the object composition graph for complex objects
- A path partition is the set of objects corresponding to the
  instance variables in the subtree rooted at the composite object
- Typically represented in a structure index:
  Composite Object OID -> {SubC#1-OID, SubC#2-OID, ..., SubC#n-OID}

(A structure-index sketch follows below.)
31
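A structure index like the one above can be sketched as a plain mapping from a composite object's OID to its sub-object OIDs. The OIDs below are invented, and the sketch assumes the composition graph is a tree (no cycles).

```python
structure_index = {
    "car-17":   ["engine-3", "wheel-8", "wheel-9"],
    "engine-3": ["piston-1"],
}

def path_partition(root_oid):
    """OIDs stored together: the subtree rooted at the composite object."""
    members = {root_oid}
    for sub_oid in structure_index.get(root_oid, []):
        members |= path_partition(sub_oid)   # recurse into sub-objects
    return members

assert path_partition("car-17") == {"car-17", "engine-3", "piston-1",
                                    "wheel-8", "wheel-9"}
```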

More Issues in OODBMS Fragmentation

Local methods, local object data:
- Good performance
- Issue: replicate methods so they are local to all instances?

Remote methods, local object data:
- Send data to the remote site, execute, and return the result,
  OR fetch the method and execute
- Issues: time/cost for two transfers? ability to execute locally?

Local methods, remote object data:
- Fetch the remote data and execute the methods
- Issue: time/cost for data transfer?

Remote methods, remote object data:
- Send additional input values via RPC, execute and return
  the result
- Issues: RPC time? execution load on the remote node?

Replication options:
- Objects
- Classes (collections of objects)
- Methods
- Class/Type specifications
32

Distributed Directory and Catalogue Management

Directory information:
- Description and location of records/objects
- Size, special data properties (e.g., executable, DB type,
  user-defined type, etc.)
- Fragmentation scheme
- Definitions for views, integrity constraints

Options for organizing the directory:
- Centralized        Issues: bottleneck, unreliable
- Fully replicated   Issues: consistency, storage overhead
- Partitioned        Issues: complicated access protocol, consistency
- Combination of partitioned and replicated
                     e.g., zoned, replicated zoned
33

Distributed Query Processing and Optimization

Construction and execution of query plans, query optimization.
Goals: maximize parallelism (response time optimization);
minimize network data transfer (throughput optimization).

Basic approaches to distributed query processing:

Pipelining - functional decomposition:
  Node 1 scans Rel A and applies Select A.x > 100; Node 2 joins the
  selected tuples with Rel B on y. (The processing timelines of the
  two nodes overlap.)

Parallelism - data decomposition:
  Node 1 applies Select A.x > 100 to fragment A.1, Node 2 applies
  the same selection to fragment A.2, and Node 3 unions the partial
  results. (The timelines of nodes 1 and 2 run in parallel; a sketch
  follows below.)
34
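The following sketch illustrates the data-decomposition strategy: the same selection runs on each fragment of A concurrently, and the partial results are unioned. The fragments, the predicate, and the use of threads as stand-ins for nodes are all choices made for this example.

```python
from concurrent.futures import ThreadPoolExecutor

frag_a1 = [{"x": 50}, {"x": 150}]    # fragment A.1 on "node 1"
frag_a2 = [{"x": 200}, {"x": 90}]    # fragment A.2 on "node 2"

def select_x_gt_100(fragment):
    return [row for row in fragment if row["x"] > 100]

# One worker per "node"; both selections run concurrently.
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(select_x_gt_100, [frag_a1, frag_a2]))

# Union of the partial results (done by "node 3" in the slide).
result = [row for part in partials for row in part]
assert result == [{"x": 150}, {"x": 200}]
```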

Creating the Distributed Query Processing Plan

Factors to be considered:
- distribution of data
- communication costs
- lack of sufficient locally available information

4 processing steps:
(1) query decomposition   (control site, uses global information)
(2) data localization     (control site)
(3) global optimization   (control site)
(4) local optimization    (local sites, use local information)
35

Generic Layered Scheme for Planning a Dist. Query

At the control site:
  calculus query on distributed objects
    -> query decomposition (uses the Global schema)
  algebraic query on distributed objects
    -> data localization (uses the Fragment schema)
  fragment query
    -> global optimization (uses Statistics on fragments)
  optimized fragment query with communication operations

At the local sites:
    -> local optimization (uses the Local schema)
  optimized local queries

This approach was initially designed for distributed relational
DBMSs. It also applies to distributed OODBMSs, Multi-DBMSs, and
distributed Multi-DBMSs.
36

Distributed Query Processing Plans in OODBMSs

Two forms of queries:
- Explicit queries written in OQL
- Object navigation/traversal
Both reduce to logical algebraic expressions.

Planning and optimizing:
- Additional rewriting rules for sets of objects:
  Union, Intersection, and Selection
- Cost function:
  - should consider object size, structure, location, indexes, etc.
  - breaks object encapsulation to obtain this info?
  - objects can reflect their access cost estimate

Volatile object databases can invalidate optimizations:
- Dynamic plan selection (compile N plans; select 1 at runtime)
- Periodic replanning during long-running queries
37

Distributed Query Optimization

Information needed for optimization (fragment statistics):
- size of objects, image sizes of attributes
- transfer costs
- workload among nodes
- physical data layout
- access paths, indexes, clustering information
- properties of the result (objects): formulas for estimating the
  cardinalities of operation results

Execution cost is expressed as a weighted combination of I/O, CPU,
and communication costs (the latter mostly dominant):

  total-cost = C_CPU * #insts + C_I/O * #I/Os
             + C_MSG * #msgs + C_TR * #bytes

Optimization goals: response time of a single transaction, or system
throughput. (A worked instance follows below.)
38
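A worked instance of the cost formula above, with invented coefficient values and operation counts; in a real optimizer these come from calibration and from the fragment statistics.

```python
# Cost per instruction, per I/O, per message, per byte transferred
# (illustrative values only).
C_CPU, C_IO, C_MSG, C_TR = 1e-6, 1e-2, 1e-1, 1e-6

def total_cost(insts, ios, msgs, bytes_sent):
    return C_CPU * insts + C_IO * ios + C_MSG * msgs + C_TR * bytes_sent

# A plan that executes 1M instructions, does 100 I/Os, and ships
# 20 messages / 1 MB across the network:
print(total_cost(insts=1e6, ios=100, msgs=20, bytes_sent=1_000_000))
# -> 5.0, of which communication (msgs + bytes) contributes 3.0,
#    illustrating why it often dominates.
```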

Distributed Transaction Management

Transaction Management (TM) in a centralized DBS achieves the
transaction ACID properties by using:
- concurrency control (CC)
- recovery (logging)

TM in a DDBS achieves the transaction ACID properties by using
(with strong algorithmic dependencies among them):
- Concurrency Control (Isolation)
- Recovery Protocol (Durability)
- Commit/Abort Protocol (Atomicity)
- Replica Control Protocol (Mutual Consistency)
supported by the Transaction Manager, the Log Manager, and the
Buffer Manager.
39

Classification of Concurrency Control Approaches    (Isolation)

CC approaches:
- Pessimistic
  - Locking: Centralized, Primary Copy, Distributed
  - Timestamp Ordering: Basic, Multi-Version, Conservative
  - Hybrid
- Optimistic
  - Locking
  - Timestamp Ordering

Phases of pessimistic transaction execution:
  validate -> read -> compute -> write
Phases of optimistic transaction execution:
  read -> compute -> validate -> write
40

Two-Phase-Locking Approach (2PL)    (Isolation)

2PL lock graph: over the transaction duration, the number of locks
held grows as the transaction obtains locks (growing phase) up to
the LOCK POINT, then shrinks as it releases locks (shrinking phase)
before END.

Strict 2PL lock graph: locks are obtained during the period of data
item use and all released together at END. (A minimal sketch follows
below.)
41
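A minimal sketch of strict 2PL for a single transaction: locks may only be acquired while the transaction is active (growing phase) and are all released at commit, so the shrinking phase collapses to one point. The `lock_table` dict is a toy stand-in for a lock manager; exclusive locks only, and a conflict raises instead of waiting.

```python
lock_table = {}          # data item -> transaction id holding its lock

class Transaction:
    def __init__(self, tid):
        self.tid, self.held, self.done = tid, set(), False

    def lock(self, item):
        assert not self.done, "growing phase is over"
        if lock_table.get(item, self.tid) != self.tid:
            raise RuntimeError("conflict: a real LM would block here")
        lock_table[item] = self.tid          # growing phase
        self.held.add(item)

    def commit(self):
        self.done = True                     # shrinking happens at once
        for item in self.held:
            del lock_table[item]
        self.held.clear()
```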

Communication Structure of Centralized 2PL    (Isolation)

Messages among the coordinating TM, the central site LM, and the
data processors at the participating sites:
1. Lock Request     (coordinating TM -> central site LM)
2. Lock Granted     (central site LM -> coordinating TM)
3. Operation        (coordinating TM -> data processors)
4. End of Operation (data processors -> coordinating TM)
5. Release Locks    (coordinating TM -> central site LM)
42

Communication Structure of Distributed 2PL    (Isolation)

Messages among the coordinating TM, the participating LMs, and the
data processors at the participating sites:
1. Lock Request     (coordinating TM -> participating LMs)
2. Operation        (participating LMs -> data processors)
3. End of Operation (data processors -> coordinating TM)
4. Release Locks    (coordinating TM -> participating LMs)
43

Distributed Deadlock Management    (Isolation)

Example:
- Site X: T1x holds lock Lc; T2x holds lock Lb.
  T2x waits for T1x to release Lc.
  T1x needs item a (at site Y), so it waits for T1's subtransaction
  on site Y to complete.
- Site Y: T1y holds lock La; T2y holds lock Ld.
  T1y waits for T2y to release Ld.
  T2y needs item b (at site X), so it waits for T2's subtransaction
  on site X to complete.
Together these waits form a cycle (T1x -> T1y -> T2y -> T2x -> T1x)
in the distributed waits-for graph; no single site can see the
deadlock locally. (A cycle-detection sketch follows below.)

Distributed waits-for graph:
- requires many messages to update lock status
- combine the wait-for tracing message with lock status update
  messages
- still an expensive algorithm
44
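Detection on a global waits-for graph can be sketched as an ordinary cycle search. The edges below mirror the T1/T2 example above, merged from the two sites; the DFS is standard three-color search.

```python
waits_for = {                  # waiter -> set of nodes it waits for
    "T2x": {"T1x"},            # site X: T2x waits for Lc
    "T1x": {"T1y"},            # T1x waits for T1's part on site Y
    "T1y": {"T2y"},            # site Y: T1y waits for Ld
    "T2y": {"T2x"},            # T2y waits for T2's part on site X
}

def has_cycle(graph):
    visiting, done = set(), set()
    def dfs(node):
        if node in visiting:   # back edge -> cycle -> deadlock
            return True
        if node in done:
            return False
        visiting.add(node)
        found = any(dfs(n) for n in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found
    return any(dfs(n) for n in list(graph))

assert has_cycle(waits_for)    # the example above is deadlocked
```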

Communication Structure of Centralized 2P Commit Protocol  (Atomicity)

Messages between the coordinating TM and the participating sites:
1. Prepare (coordinator -> participants)
   Each participant makes its local commit/abort decision and
   writes a log entry.
2. Vote-Commit or Vote-Abort (participants -> coordinator)
   The coordinator counts the votes:
   if (missing any votes or any Vote-Abort)
   then Global-Abort else Global-Commit.
3. Global-Abort or Global-Commit (coordinator -> participants)
   Each participant writes a log entry.
4. ACK (participants -> coordinator)

Who participates? Depends on the CC algorithm.
Other communication structures are possible: Linear, Distributed.
(A vote-counting sketch follows below.)
45
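The coordinator's decision rule is easy to state in code. This sketch abstracts away all messaging and timeouts; as on the slide, a missing vote counts the same as a Vote-Abort.

```python
def global_decision(votes, participants):
    """votes: dict site -> 'commit' | 'abort'; a missing site never replied."""
    if len(votes) < len(participants):            # missing any votes
        return "GLOBAL-ABORT"
    if any(v == "abort" for v in votes.values()): # any Vote-Abort
        return "GLOBAL-ABORT"
    return "GLOBAL-COMMIT"

assert global_decision({"S1": "commit", "S2": "commit"},
                       ["S1", "S2"]) == "GLOBAL-COMMIT"
assert global_decision({"S1": "commit"}, ["S1", "S2"]) == "GLOBAL-ABORT"
```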

State Transitions for 2P Commit Protocol    (Atomicity)

Coordinating TM: Initial --(1 Prepare sent)--> Wait
                 Wait --(2 votes received)--> Commit or Abort
Participants:    Initial --(vote sent)--> Ready
                 Ready --(3 global decision received)--> Commit or
                 Abort, then send 4 ACK

Advantages:
- Preserves atomicity
- All processes are synchronous within one state transition

Disadvantages:
- Many, many messages
- If a failure occurs, the 2PC protocol blocks!

Attempted solutions for the blocking problem:
1) Termination Protocol
2) 3-Phase Commit
3) Quorum 3P Commit
46

Failures in a Distributed System

Types of failure:
- Transaction failure
- Node failure
- Media failure
- Network failure
  - partitions, each containing 1 or more sites

Issues to be addressed, and who addresses each:
- How to continue service
    -> Termination Protocols
- How to maintain ACID properties while providing continued service
    -> Modified Concurrency Control & Commit/Abort Protocols
- How to ensure ACID properties after recovery from the failure(s)
    -> Recovery Protocols, Termination Protocols,
       & Replica Control Protocols
47

Termination Protocols    (Atomicity, Consistency)

Use timeouts to detect potential failures that could block protocol
progress.

Timeout states:
- Coordinator: wait, commit, abort
- Participant: initial, ready

Coordinator termination protocol:
- In wait: send Global-Abort.
- In commit or abort: BLOCKED!

Participant termination protocol:
- In ready: query the coordinator;
  if timeout, then query the other participants;
  if global-abort | global-commit is learned,
  then proceed and terminate, else BLOCKED!
48

Replica Control Protocols    (Consistency)

Update propagation of committed write operations:
- STRICT UPDATE: propagated before the COMMIT POINT, while locks are
  still held, as part of the 2-Phase Commit.
- LAZY UPDATE: propagated after the COMMIT POINT, once the locks
  have been released.
(figure: lock graph from BEGIN through obtain locks, period of data
use, 2-Phase Commit, and release locks, to END)
49

Strict Replica Control Protocol    (Consistency)

Read-One-Write-All (ROWA)
- Part of the Concurrency Control Protocol and the 2-Phase Commit
  Protocol
- CC locks all copies
- 2PC propagates the updated values with the 2PC messages (or an
  update propagation phase is inserted between the wait and commit
  states for those nodes holding an updateable value)
50

Lazy Replica Control Protocol    (Consistency)

Propagates updates from a primary node.
- The Concurrency Control algorithm locks the primary copy node
  (the same node as the primary lock node).
- To preserve single-copy semantics, it must ensure that a
  transaction reads a current copy:
  - changes the CC algorithm for read-locks
  - adds an extra communication cost for reading data
- Extended transaction models may not require single-copy semantics.
51

Recovery in Distributed Systems    (Atomicity, Durability)

Select COMMIT or ABORT (or blocked) for each interrupted
subtransaction.

Commit approaches:
- Redo: use the undo/redo log to perform all the write operations
  again
- Retry: use the transaction log to redo the entire subtransaction
  (R + W)

Abort approaches:
- Undo: use the undo/redo log to back out all the writes that were
  actually performed
- Compensation: use the transaction log to select and execute
  reverse subtransactions that semantically undo the write operations

Implementation requires knowledge of:
- Buffer manager algorithms for writing updated data from volatile
  storage buffers to persistent storage
- Concurrency Control Algorithm
- Commit/Abort Protocols
- Replica Control Protocol

(A toy undo/redo sketch follows below.)
52
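A toy illustration of the Redo and Undo approaches over an undo/redo log that holds before- and after-images. Real log records also carry LSNs, transaction ids, and page ids; this sketch keeps only what the slide mentions.

```python
log = [  # (item, before_image, after_image), in write order
    ("x", 1, 2),
    ("y", 7, 9),
]

def redo(db):                 # commit: replay all writes forward
    for item, _before, after in log:
        db[item] = after

def undo(db):                 # abort: back out writes in reverse order
    for item, before, _after in reversed(log):
        db[item] = before

db = {"x": 2, "y": 7}         # state after a crash mid-transaction
undo(db)
assert db == {"x": 1, "y": 7} # all performed writes backed out
```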

Network Partitions in Distributed Systems
(figure: a network failure splits nodes N1...N8 into three
partitions, Partition #1, #2, and #3)

Issues:
- Termination of interrupted transactions
- Partition integration upon recovery from a network failure
- Data availability while the failure is ongoing
53

Data Availability in Partitioned Networks

The Concurrency Control model impacts data availability:
- ROWA: data replicated in multiple partitions is not available for
  reading or writing.
- Primary Copy Node CC: can execute transactions if the primary copy
  nodes for all of the read-set and all of the write-set are in the
  client's partition.

Availability is still very limited . . . We need a new idea!
54

Quorums    (ACID)

Quorum: a special type of majority.
Use quorums in the Concurrency Control, Commit/Abort, Termination,
and Recovery Protocols:
- CC uses a read-quorum & a write-quorum
- C/A, Term, & Recov use a commit-quorum & an abort-quorum

Advantages:
- More transactions can be executed during site failures and network
  failures (and still retain the ACID properties)

Disadvantages:
- Many messages are required to establish a quorum
- The necessity for a read-quorum slows down read operations
- Not quite sufficient (failures are not clean)
55

Read-Quorums and Write-Quorums    (Isolation)

The Concurrency Control Algorithm serializes valid transactions in a
partition. It must obtain:
- a read-quorum for each read operation
- a write-quorum for each write operation

Let N = total number of nodes in the system. Define the size of the
read-quorum Nr and the size of the write-quorum Nw as follows:
  Nr + Nw > N
  Nw > N/2

Simple example: N = 8, Nr = 4, Nw = 5.

When failures occur, it is possible to have a valid read-quorum but
no valid write-quorum. (A sketch checking these constraints follows
below.)
56
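The two quorum constraints can be checked directly; the function below just encodes the two inequalities, and the sizes are the slide's example (N=8, Nr=4, Nw=5).

```python
def valid_quorums(n, n_r, n_w):
    return (n_r + n_w > n       # every read overlaps the latest write
            and n_w > n / 2)    # no two disjoint write-quorums

assert valid_quorums(8, 4, 5)
assert not valid_quorums(8, 4, 4)   # Nr + Nw = N: a stale read is possible
```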

Commit-Quorums and Abort-Quorums    (ACID)

The Commit/Abort Protocol requires votes from all participants to
commit or abort a transaction:
- Commit a transaction if the accessible nodes can form a
  commit-quorum
- Abort a transaction if the accessible nodes can form an
  abort-quorum
- Elect a new coordinator node (if necessary)
- Try to form a commit-quorum before attempting to form an
  abort-quorum

Let N = total number of nodes in the system. Define the size of the
commit-quorum Nc and the size of the abort-quorum Na as follows:
  Na + Nc > N;  0 <= Na, Nc <= N

Simple examples:
  N = 7, Nc = 4, Na = 4
  N = 7, Nc = 5, Na = 3
57

Conclusions

Nearly all commercial relational database systems offer some form of
distribution; client/server at a minimum.
Only a few commercial object-oriented database systems support
distribution beyond N clients and 1 server.

Future research directions:
- Distributed data architecture and placement schemes
  (app-influenced)
- Distributed databases as part of Internet applications
- Continued service during disconnection or failure
  - Mobile systems accessing databases
  - Relaxed transaction models for semantically-correct continued
    operation
58
