
Distributed Database

Introduction
A major motivation behind the development of database systems is the
desire to integrate the operational data of an organization and to provide
controlled access to the data. Although integration and controlled access
may imply centralization, this is not the intention.
In fact, the development of computer networks promotes a decentralized
mode of work. This decentralized approach mirrors the organizational
structure of many companies, which are logically distributed into
divisions, departments, projects, and so on, and physically distributed
into offices, plants, factories, where each unit maintains its own
operational data. A distributed database system that reflects this
organizational structure should improve both the sharing of data and the
efficiency of data access: it makes the data in all units accessible and
stores data close to the location where it is most frequently used.
Distributed DBMS
The software system that permits the management of the
distributed database and makes the distribution transparent to
users.
A Distributed Database Management System (DDBMS) consists of a
single logical database that is split into a number of fragments. Each
fragment is stored on one or more computers under the control of a
separate DBMS, with the computers connected by a communications
network. Each site is capable of independently processing user requests
that require access to local data and is also capable of processing data
stored on other computers in the network.
Users access the distributed database via applications. Applications are
classified as those that do not require data from other sites (local
applications) and those that do require data from other sites (global
applications). We require a DDBMS to have at least one global
application.
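The local/global distinction can be sketched as a check on where the
required fragments are stored. This is only an illustration: the catalog
FRAGMENT_SITE, the site and fragment names, and the classify helper are
hypothetical and do not come from any particular DDBMS.

```python
# Hypothetical catalog mapping each fragment to the site that stores it.
FRAGMENT_SITE = {"F1": "S1", "F2": "S2", "F3": "S3"}


def classify(origin_site, fragments_needed):
    """Return 'local' if every needed fragment is stored at the origin
    site, otherwise 'global'."""
    sites = {FRAGMENT_SITE[f] for f in fragments_needed}
    return "local" if sites <= {origin_site} else "global"


print(classify("S1", ["F1"]))        # local
print(classify("S1", ["F1", "F2"]))  # global
```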
Banking Example

Using distributed database technology, a bank may implement its
database system on a number of separate computer systems rather than a
single, centralized mainframe. The computer systems may be located at
each local branch office: for example, Amritsar, Patiala, and Qadian. A
network linking the computers will enable the branches to communicate
with each other, and a DDBMS will enable them to access data stored at
another branch office. Thus, a client living in Amritsar can also check
his or her account during a stay in Patiala or Qadian.
Distributed Relational Database Design
In this section we examine the factors that have to be considered for the
design of a distributed relational database. More specifically, we
examine:

♦ Fragmentation A relation may be divided into a number of
subrelations, called fragments, which are then distributed. There are
two main types of fragmentation:
1) Horizontal fragmentation
2) Vertical fragmentation
♦ Allocation Each fragment is stored at the site with ‘optimal’
distribution.
♦ Replication The DDBMS may maintain a copy of a fragment at
several different sites.
The definition and allocation of fragments must be based on how the
database is to be used. This involves analyzing transactions. The design
should be based on both quantitative and qualitative information.
Quantitative information is used in allocation;
qualitative information is used in fragmentation.
The quantitative information may include:
♦ The frequency with which a transaction is run.
♦ The site from which a transaction is run.
♦ The performance criteria for transactions.
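As a rough sketch of how the first two quantities could feed an
allocation decision, the code below sums how often each site accesses each
fragment and picks the busiest site. The workload figures, the fragment
names F1 and F2, and the helper best_site_per_fragment are all
hypothetical; performance criteria are not modelled here.

```python
from collections import defaultdict

# Hypothetical workload:
# (transaction, originating site, fragment accessed, runs per day)
WORKLOAD = [
    ("T1", "S1", "F1", 120),
    ("T2", "S2", "F1", 30),
    ("T3", "S2", "F2", 200),
    ("T4", "S3", "F2", 50),
]


def best_site_per_fragment(workload):
    """For each fragment, return the site that accesses it most often."""
    totals = defaultdict(lambda: defaultdict(int))
    for _txn, site, fragment, freq in workload:
        totals[fragment][site] += freq
    return {frag: max(sites, key=sites.get) for frag, sites in totals.items()}


print(best_site_per_fragment(WORKLOAD))  # {'F1': 'S1', 'F2': 'S2'}
```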
The qualitative information may include information about the
transactions that are executed. Fragments are then defined and allocated
to achieve the following objectives:
•Locality of reference
•Improved reliability and availability
•Acceptable performance
•Balanced storage capacities and costs
• Minimal communication costs
Data Allocation
There are four alternative strategies regarding the placement of data:
♦ Centralized
♦ Fragmented
♦ Complete replication
♦ Selective replication.
We now compare these strategies using the strategic objectives identified
above.
Centralized This strategy consists of a single database and DBMS
stored at one site with users distributed across the network (we referred
to this previously as distributed processing). Locality of reference is at
its lowest as all sites, except the central site, have to use the network for
all data accesses. This also means that communication costs are high.
Reliability and availability are low, as a failure of the central site results
in the loss of the entire database system.
Fragmented (or partitioned)
This strategy partitions the database into disjoint fragments, with each
fragment assigned to one site. If data items are located at the site where
they are used most frequently, locality of reference is high. As there is
no replication, storage costs are low. Reliability and availability are
also low, although they are higher than in the centralized case, because
the failure of a site results in the loss of only that site’s data.
Performance should be good and communication costs low if the
distribution is designed properly.
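The difference in locality of reference between the centralized and
fragmented strategies can be made concrete with a small calculation. The
sketch below treats locality as the fraction of accesses served at the
issuing site; that measure, the access counts, and the names used are
assumptions made for illustration only.

```python
# Hypothetical access counts: (issuing site, fragment) -> accesses per day
ACCESSES = {
    ("S1", "F1"): 90, ("S2", "F1"): 10,
    ("S1", "F2"): 5,  ("S2", "F2"): 95,
}


def locality(allocation, accesses):
    """Fraction of accesses whose fragment is stored at the issuing site."""
    total = sum(accesses.values())
    local = sum(n for (site, frag), n in accesses.items()
                if allocation[frag] == site)
    return local / total


centralized = {"F1": "S1", "F2": "S1"}  # everything stored at one site
fragmented = {"F1": "S1", "F2": "S2"}   # each fragment where it is used most

print(locality(centralized, ACCESSES))  # 0.475
print(locality(fragmented, ACCESSES))   # 0.925
```

With these made-up figures, more than half of all accesses cross the
network in the centralized case, while the fragmented allocation serves
most of them locally.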
Advantages of fragmentation
•Usage
•Efficiency
•Parallelism
•Security
Disadvantages of fragmentation
•Performance
•Integrity
Data Fragmentation
If relation r is fragmented, r is divided into a number of fragments
r1, r2, …, rn. These fragments contain sufficient information to allow
reconstruction of the original relation r. As we shall see, this
reconstruction can take place through the application of either the union
operation or a special type of join operation on the various fragments.
There are three different schemes for fragmenting a relation:
♦ Horizontal fragmentation
♦ Vertical fragmentation
♦ Mixed fragmentation
We shall illustrate these approaches by fragmenting the relation EMP,
with schema:
EMP (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO)
Horizontal Fragmentation
In horizontal fragmentation, the relations (tables) are divided
horizontally. That is, some of the tuples of the relation are placed on
one computer and the rest are placed on other computers.
A horizontal fragment is a subset of the tuples in that relation.
To reconstruct the relation R from its horizontal fragments, a UNION
operation can be performed on the fragments. A set of horizontal
fragments that together contain all the tuples of relation R is called a
complete horizontal fragmentation.
For example, suppose that the relation r is the EMP relation above.
This relation can be divided into n different fragments, each of which
consists of the tuples of employees belonging to a particular department.
If the EMP relation has three departments, 10, 20 and 30, this results in
three different fragments:
EMP1 = σ_{DEPTNO=10}(EMP)
EMP2 = σ_{DEPTNO=20}(EMP)
EMP3 = σ_{DEPTNO=30}(EMP)
These three fragments are shown below. Fragment EMP1 is stored at the
department number 10 site, fragment EMP2 at the department number 20
site, and fragment EMP3 at the department number 30 site.
We obtain the reconstruction of the relation r by taking the union of all
fragments; that is,
r = r1 ∪ r2 ∪ … ∪ rn
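The selection and union steps can be tried out on a toy version of EMP.
This is a minimal sketch: the sample tuples are invented, EMP is cut down
to three attributes, and the select helper is a hypothetical stand-in for
the σ operator.

```python
# Hypothetical sample tuples of EMP, cut down to three attributes.
EMP = [
    {"EMPNO": 1, "ENAME": "Asha", "DEPTNO": 10},
    {"EMPNO": 2, "ENAME": "Bilal", "DEPTNO": 20},
    {"EMPNO": 3, "ENAME": "Chen", "DEPTNO": 30},
    {"EMPNO": 4, "ENAME": "Dev", "DEPTNO": 10},
]


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


# One horizontal fragment per department.
emp1 = select(EMP, lambda t: t["DEPTNO"] == 10)
emp2 = select(EMP, lambda t: t["DEPTNO"] == 20)
emp3 = select(EMP, lambda t: t["DEPTNO"] == 30)

# The fragments are disjoint, so their union is a plain concatenation;
# sorting by EMPNO only restores the original order for the comparison.
reconstructed = sorted(emp1 + emp2 + emp3, key=lambda t: t["EMPNO"])
print(reconstructed == EMP)  # True
```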
Vertical Fragmentation
In vertical fragmentation, some of the columns (attributes) are stored on
one computer and the rest are stored on other computers. This is because
each site may not need all the attributes of a relation.
A vertical fragment keeps only certain attributes of the relation.
The fragmentation should be done such that we can reconstruct relation r
from the fragments by taking the natural join. In practice this means
that every fragment includes the primary key of r (or another set of
attributes that identifies each tuple):
r = r1 * r2 * … * rn
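A small sketch of vertical fragmentation and its reconstruction follows.
It assumes that both fragments keep the key EMPNO so that the natural
join can match tuples; the sample data and the project and natural_join
helpers are hypothetical.

```python
# Hypothetical sample tuples of EMP, cut down to four attributes.
EMP = [
    {"EMPNO": 1, "ENAME": "Asha", "SAL": 3000, "DEPTNO": 10},
    {"EMPNO": 2, "ENAME": "Bilal", "SAL": 2500, "DEPTNO": 20},
]


def project(relation, attrs):
    """Relational projection. Duplicates are not removed, which is safe
    here because every fragment keeps the key EMPNO."""
    return [{a: t[a] for a in attrs} for t in relation]


def natural_join(left, right):
    """Join tuples that agree on every attribute the two sides share."""
    joined = []
    for l in left:
        for r in right:
            common = l.keys() & r.keys()
            if all(l[a] == r[a] for a in common):
                joined.append({**l, **r})
    return joined


# Two vertical fragments, both carrying the key EMPNO.
emp_names = project(EMP, ["EMPNO", "ENAME"])
emp_pay = project(EMP, ["EMPNO", "SAL", "DEPTNO"])

print(natural_join(emp_names, emp_pay) == EMP)  # True
```

If EMPNO were dropped from one fragment, the two sides would share no
attribute and the join would degenerate into a Cartesian product, which
is why the key is repeated in each fragment.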
Mixed Fragmentation
Mixed fragmentation, also known as Hybrid fragmentation, intermixes
the horizontal and vertical fragmentation.
The relation r is divided into a number of fragment relations r1, r2, …,
rn. Each fragment is obtained by applying either the horizontal
fragmentation or the vertical fragmentation scheme to relation r, or to a
fragment of r that was obtained previously.
For example, combining the horizontal and vertical fragmentation of the
EMP relation results in a mixed fragmentation. This relation is divided
initially into the fragments EMP1 and EMP2 as vertical fragments. We can
now further divide fragment EMP1, using the horizontal-fragmentation
scheme, into the following three fragments:
EMP1a = σ_{DEPTNO=10}(EMP1)
EMP2a = σ_{DEPTNO=20}(EMP1)
EMP3a = σ_{DEPTNO=30}(EMP1)
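The mixed scheme can be sketched by chaining the two operations. The
example below assumes that the vertical fragment EMP1 keeps DEPTNO (so it
can be split by department) and that both vertical fragments keep EMPNO;
the attribute split, the sample tuples, and the helper names are
hypothetical.

```python
# Hypothetical sample tuples of EMP, cut down to four attributes.
EMP = [
    {"EMPNO": 1, "ENAME": "Asha", "SAL": 3000, "DEPTNO": 10},
    {"EMPNO": 2, "ENAME": "Bilal", "SAL": 2500, "DEPTNO": 20},
    {"EMPNO": 3, "ENAME": "Chen", "SAL": 2800, "DEPTNO": 30},
]


def project(relation, attrs):
    return [{a: t[a] for a in attrs} for t in relation]


def select(relation, predicate):
    return [t for t in relation if predicate(t)]


def join_on_empno(left, right):
    """Natural join, assuming EMPNO is the only shared attribute."""
    by_key = {t["EMPNO"]: t for t in right}
    return [{**l, **by_key[l["EMPNO"]]} for l in left if l["EMPNO"] in by_key]


# Step 1: vertical fragmentation.
emp1 = project(EMP, ["EMPNO", "ENAME", "DEPTNO"])
emp2 = project(EMP, ["EMPNO", "SAL"])

# Step 2: horizontal fragmentation of EMP1 by department.
emp1a = select(emp1, lambda t: t["DEPTNO"] == 10)
emp2a = select(emp1, lambda t: t["DEPTNO"] == 20)
emp3a = select(emp1, lambda t: t["DEPTNO"] == 30)

# Reconstruction reverses the steps: union first, then join.
rebuilt = join_on_empno(emp1a + emp2a + emp3a, emp2)
print(sorted(rebuilt, key=lambda t: t["EMPNO"]) == EMP)  # True
```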
Data Replication and Fragmentation

The techniques described for data replication and data fragmentation can
be applied successively to the same relation. That is, a fragment can be
replicated, replicas of fragments can be fragmented further, and so on.
For example, consider a distributed system consisting of sites S1, S2, …,
S11. We can fragment EMP into fragments such as EMP1a, EMP2a and EMP2
and, for example, store a copy of EMP1a at sites S1, S3 and S7; a copy of
EMP2a at sites S4 and S11; and a copy of EMP2 at sites S2, S8 and S9.
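The placement in this example can be written down as a replica catalog.
The sketch below uses the fragment-to-site assignment from the text; the
read_site and write_sites helpers, and the policy of falling back to the
first listed replica, are assumptions made for illustration.

```python
# Replica catalog taken from the example: fragment -> sites holding a copy.
REPLICAS = {
    "EMP1a": ["S1", "S3", "S7"],
    "EMP2a": ["S4", "S11"],
    "EMP2": ["S2", "S8", "S9"],
}


def read_site(fragment, origin):
    """Read locally when the origin holds a copy; otherwise fall back to
    the first listed replica (an arbitrary, hypothetical policy)."""
    sites = REPLICAS[fragment]
    return origin if origin in sites else sites[0]


def write_sites(fragment):
    """An update has to reach every site that holds a copy."""
    return list(REPLICAS[fragment])


print(read_site("EMP1a", "S3"))  # S3
print(read_site("EMP1a", "S4"))  # S1
print(write_sites("EMP2a"))      # ['S4', 'S11']
```

The asymmetry is the usual trade-off of replication: a read needs one
copy, while an update must be propagated to all of them.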
Complete replication
This strategy consists of maintaining a complete copy of the database at
each site. Therefore, locality of reference, reliability and availability,
and performance are maximized. However, storage costs and communication
costs for updates are the highest of the four strategies. To overcome
some of these problems, snapshots are sometimes used. A snapshot is a
copy of the data at a given time. The copies are updated periodically,
for example hourly or weekly, so they may not always be up to date.
Snapshots are also sometimes used to implement views in a distributed
database to improve the time it takes to perform a database operation on
a view.
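The staleness of a snapshot between refreshes can be seen in a few lines.
This is a toy sketch, not how any particular DDBMS implements snapshots:
the Snapshot class, its refresh method, and the sample data are
hypothetical, and the periodic schedule is replaced by an explicit call.

```python
import copy


class Snapshot:
    """A read-only copy of the source data as of the last refresh."""

    def __init__(self, source):
        self._source = source
        self.data = copy.deepcopy(source)

    def refresh(self):
        # In a real system this would run on a schedule (hourly, weekly).
        self.data = copy.deepcopy(self._source)


master = {"EMPNO 1": {"SAL": 3000}}
snap = Snapshot(master)

master["EMPNO 1"]["SAL"] = 3200      # update applied at the master copy
print(snap.data["EMPNO 1"]["SAL"])   # 3000 (snapshot is stale)

snap.refresh()
print(snap.data["EMPNO 1"]["SAL"])   # 3200 (up to date again)
```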
Selective replication This strategy is a combination of fragmentation,
replication and centralization. Some data items are fragmented to achieve
high locality of reference and others, which are used at many sites and
are not frequently updated, are replicated; otherwise, the data items are
centralized. The objective of this strategy is to have all the advantages of
the other approaches but none of the disadvantages. This is the most
commonly used strategy because of its flexibility.
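The rule of thumb in this paragraph can be sketched as a small decision
function. The thresholds, the parameter names, and the test for "high
locality" (the item is used at a single site) are all assumptions made
for illustration; a real design would derive them from the transaction
analysis described earlier.

```python
def placement(sites_using, updates_per_day, many_sites=3, few_updates=10):
    """Choose a placement for one data item.

    many_sites and few_updates are hypothetical thresholds.
    """
    if len(sites_using) >= many_sites and updates_per_day <= few_updates:
        return "replicate"   # widely read, rarely updated
    if len(sites_using) == 1:
        return "fragment"    # store it at the one site that uses it
    return "centralize"      # everything else stays at the central site


print(placement(["S1", "S2", "S3"], updates_per_day=2))  # replicate
print(placement(["S2"], updates_per_day=500))            # fragment
print(placement(["S1", "S2"], updates_per_day=500))      # centralize
```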
