DISTRIBUTED DATABASES.

Introduction.

Distributed databases have become an important area of information processing, and it is
easy to foresee that their importance will rapidly grow. There are both organizational and
technological reasons for this trend. Distributed databases eliminate many of the
shortcomings of centralized databases and fit more naturally in the decentralized
structures of many organizations.

Definition.

A distributed database is a collection of data which are distributed over different
computers of a computer network. Each site of the network has autonomous processing
capability and can perform local applications. Each site also participates in the execution
of at least one global application, which requires accessing data at several sites using a
communication subsystem.

An example of a local application is given below:

Consider a bank that has three branches at different locations. At each branch, a computer
controls the teller terminals of the branch and the account database of the branch. Each
computer with its local account database at one branch constitutes one site of the
distributed database; the computers are connected by a communication network. During
normal operations the applications which are requested from the terminals of the branch
need to access only the database of that branch. These applications are completely executed by
the computer of the branch where they are issued, and will therefore be called local
applications. An example of a local application is a debit or a credit application
performed on an account stored at the same branch at which the application is requested.

From a technological viewpoint, it appears that the really important aspect is the
existence of some applications which access data at more than one branch. These
applications are called global applications. The existence of global applications will be
considered the discriminating characteristic of distributed databases with respect to a set
of local databases.
A typical global application is a transfer of funds from an account of one branch to an
account of another branch. This application requires updating the databases at two
different branches. Note that this application is something more than just performing two
local updates at two individual branches (a debit and a credit), because it is also necessary
to ensure that either both updates are performed or neither.
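
The all-or-nothing requirement of the funds transfer can be illustrated with a minimal Python sketch. It assumes that each branch is modeled as an in-memory dictionary of balances; the names `Branch` and `transfer` are hypothetical, and there is no real network or commit protocol here. The only point is that a failed credit must undo the debit.

```python
class Branch:
    """A hypothetical site holding the accounts of one branch in memory."""

    def __init__(self, name, accounts):
        self.name = name
        self.accounts = dict(accounts)
        self.up = True  # set to False to simulate a site failure

    def update(self, account, delta):
        """Local application: debit (delta < 0) or credit (delta > 0)."""
        if not self.up:
            raise ConnectionError(f"site {self.name} is not operational")
        if self.accounts[account] + delta < 0:
            raise ValueError("insufficient funds")
        self.accounts[account] += delta


def transfer(src, src_acct, dst, dst_acct, amount):
    """Global application: either both updates happen or neither does."""
    src.update(src_acct, -amount)        # debit at the first branch
    try:
        dst.update(dst_acct, amount)     # credit at the second branch
    except Exception:
        src.update(src_acct, amount)     # undo the debit, then report failure
        return False
    return True


a = Branch("A", {"acc1": 100})
b = Branch("B", {"acc2": 50})
ok = transfer(a, "acc1", b, "acc2", 30)      # both sites operational
b.up = False
failed = transfer(a, "acc1", b, "acc2", 30)  # credit site not operational
```

The sketch deliberately ignores the case in which the debit site itself fails before the undo is applied; handling that case is what makes the distributed problem harder than two local updates.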

Another definition.

A distributed database (DDB) can be defined as a collection of multiple logically
interrelated databases distributed over a computer network, and a distributed database
management system (DDBMS) as a software system that manages a distributed database
while making the distribution transparent to the user.

Features of Distributed Versus Centralized Databases.

1> Centralized Control.

The possibility of providing centralized control over the information resources of a whole
enterprise or organization was considered one of the strongest motivations for
introducing databases; they were developed as the evolution of information systems in
which each application had its own private files. The fundamental function of the database
administrator (DBA) was to guarantee the safety of data; the data itself was recognized to
be an important investment of the enterprise which required a centralized responsibility.

In distributed databases, the idea of centralized control is much less emphasized. In
general, in distributed databases it is possible to identify a hierarchical control structure
based on a global database administrator, who has the central responsibility for the whole
database, and on local database administrators, who have the responsibility of their
respective local databases.

However, it must be emphasized that local database administrators may have a high
degree of autonomy, up to the point that a global database administrator is completely
missing and the intersite co-ordination is performed by the local administrators
themselves. This characteristic is usually called site autonomy.

Distributed databases may differ very much in the degree of site autonomy: from
complete site autonomy without any centralized database administrator to almost
completely centralized control.

2> Data Independence.

Data independence means that the actual organization of data is transparent to the
application programmer. Programs are written having a “conceptual” view of the data,
the so-called conceptual schema. The main advantage of data independence is that
programs are unaffected by changes in the physical organization of data.

In distributed databases, data independence has the same importance as in traditional
databases; however, a new aspect is added to the usual notion of data independence,
namely distribution transparency.
By distribution transparency we mean that programs can be written as if the database
were not distributed. Thus the correctness of programs is unaffected by the movement of
data from one site to another; however, their speed of execution is affected.

3>Reduction of Redundancy.

In traditional databases, redundancy was reduced as far as possible for two reasons:
first, inconsistencies among several copies of the same logical data are automatically
avoided by having only one copy, and second, storage space is saved by eliminating
redundancy. Reduction of redundancy was obtained by data sharing, i.e., by allowing
several applications to access the same files and records. In distributed databases,
however, there are several reasons for considering data redundancy a desirable feature.
First, the locality of applications can be increased if the data is replicated at all sites
where applications need it. Second, the availability of the system can be increased,
because a site failure does not stop the execution of applications at the other sites if the
data is replicated.

As a very general statement, let us say that the convenience of replicating a data item
increases with the ratio of retrieval accesses to update accesses performed by
applications on it.

The reason is that, if we have several copies of an item, retrieval can be performed on
any copy, while updates must be performed consistently on all copies.
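
A toy cost model makes this trade-off concrete. The sketch below is an illustration under stated assumptions, not a method from the text: it simply counts one copy access per retrieval and one access per copy for each update, and the function names and workload figures are invented.

```python
def copy_accesses(reads, updates, copies):
    """Count copy accesses: a read touches one copy, an update touches all."""
    return reads * 1 + updates * copies


def extra_work_ratio(reads, updates, copies):
    """Accesses with replication divided by accesses with a single copy."""
    return copy_accesses(reads, updates, copies) / copy_accesses(reads, updates, 1)


# Same total number of operations, opposite read/update mixes, three copies.
read_heavy = extra_work_ratio(reads=900, updates=100, copies=3)
update_heavy = extra_work_ratio(reads=100, updates=900, copies=3)
```

With these figures the read-heavy workload pays far less overhead for keeping three copies than the update-heavy one. The model leaves out the benefit side (local reads and higher availability), so it shows only why a high retrieval-to-update ratio keeps the price of replication low.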

4>Complex Physical Structures and Efficient Access.

Complex accessing structures, like secondary indexes, interfile chains, and so on, are a
major aspect of traditional databases. The support for these structures is the most
important part of database management systems (DBMSs). The reason for providing
complex accessing structures is to obtain efficient access to data.

In distributed databases, complex accessing structures are not the right tool for efficient
access. Therefore, while efficient access is a main problem in distributed databases,
physical structures are not a relevant technological issue. Efficient access to a distributed
database cannot be provided by using intersite physical structures because it is very
difficult to build and maintain such structures and because it is not convenient to
“navigate” at a record level in distributed databases.

5>Integrity, Recovery, and Concurrency Control.

In databases the issues of integrity, recovery, and concurrency control, although they
refer to different problems, are strongly interrelated. To a large extent, the solution of
these problems consists of providing transactions.
A transaction is an atomic unit of execution, i.e., it is a sequence of operations which
either are performed in their entirety or are not performed at all.

The ‘funds transfer’ application is a global application which must be an atomic unit:
either both the debit portion and the credit portion are executed or neither is.
It is not acceptable to perform only one of them. Therefore the funds transfer application
is also a global transaction. It is clear that in distributed databases the problem of
transaction atomicity has a particular flavour: how should the system behave if the ‘debit’
site is operational and the ‘credit’ site is not operational when the funds transfer is
requested? Should the transaction be aborted (undoing all operations which have been
performed until the moment of site failure), or should a smart system try to execute the
funds transfer correctly even if both sites are never simultaneously operational? Of course,
the user would be less affected by failures if the latter approach were applied.

Clearly, atomic transactions are the means to obtain database integrity, because they
assure that either all actions which transform the database from one consistent state into
another are performed, or the initial consistent state is left untouched. There are two
dangerous enemies of transaction atomicity: failures and concurrency.
Failures may cause the system to stop in the midst of transaction execution, thus violating
the atomicity requirement. Concurrent execution of different transactions may permit one
transaction to observe an inconsistent transient state created by another transaction
during its execution.

Recovery deals to a large extent with the problem of preserving transaction atomicity in
the presence of failures. In distributed databases this aspect is particularly important
because some of the sites involved in transaction execution might fail.

Concurrency control deals with ensuring atomicity in the presence of concurrent
execution of transactions. In distributed databases, as in all distributed systems, the
synchronization problem is harder than in centralized systems.

6>Privacy and Security.

In traditional databases, the database administrator, having centralized control, can ensure
that only authorized access to the data is performed. Note, however, that the centralized
data approach in itself, without specialized control procedures, is more vulnerable to
privacy and security violations than the older approaches based on separate files.

In distributed databases, local administrators are faced essentially with the same problem
as database administrators in a traditional database. However, two aspects peculiar to
distributed databases are worth mentioning.

First, in a distributed database with a very high degree of site autonomy, the owners of
local data feel more protected because they can enforce their own protections instead of
depending on a central database administrator.

Second, security problems are intrinsic to distributed systems in general, because
communication networks can represent a weak point with respect to protection.

Why Distributed Databases.

There are several reasons why distributed databases are developed. The following is a list
of the main motivations.

1>Organizational and Economic Reasons.

Many organizations are decentralized, and a distributed database approach fits the
structure of the organization more naturally. With the recent developments in computer
technology, the economy-of-scale motivation for having large, centralized computer
centers is becoming questionable. The organizational and economic motivations are
probably the most important reasons for developing distributed databases.

2>Interconnection Of Existing Databases.

Distributed databases are the natural solution when several databases already exist in an
organization and the necessity of performing global applications arises. In this case, the
distributed database is created bottom-up from the preexisting local databases. This
process may require a certain degree of local restructuring; however, the effort
required by this restructuring is much less than that needed for the creation of a completely
new centralized database.

3>Incremental Growth.

If an organization grows by adding new, relatively autonomous organizational units
(new branches, new warehouses, etc.), then the distributed database approach supports
incremental growth with a minimum degree of impact on the already existing units.

With a centralized approach either the initial dimensions of the system take care of future
expansion, which is difficult to foresee and expensive to implement, or the growth has a
major impact not only on the new applications but also on the existing ones.

4>Reduced Communication Overhead.

In a geographically distributed database, the fact that many applications are local
clearly reduces the communication overhead with respect to a centralized database.
Therefore the maximization of the locality of applications is one of the primary objectives in
distributed database design.

5>Performance Consideration.

The existence of several autonomous processors results in an increase of performance
through a high degree of parallelism. This consideration can be applied to any
multiprocessor system, and not only to distributed databases. Distributed databases have
the advantage that the decomposition of data reflects application-dependent criteria
which maximize application locality; in this way the mutual interference between
different processors is minimized.
The load is shared between the different processors, and critical bottlenecks, such as the
communication network itself or common services of the whole system, are avoided.
This effect is a consequence of the autonomous processing capability requirement for
local applications stated in the definition of distributed databases.

6>Reliability and Availability.

The distributed database approach, especially with redundant data, can also be used to
obtain higher reliability and availability. However, obtaining this goal is not
straightforward and requires the use of appropriate techniques. The autonomous processing capability
of the different sites does not by itself guarantee a higher overall reliability of the system,
but it ensures a graceful degradation property; in other words, failures in a distributed
database can be more frequent than in a centralized one because of the greater number of
components, but the effect of each failure is confined to those applications which use the
data of the failed site, and a complete system crash is rare.

Note.
The reasons why the development of distributed database systems has only just begun are as
follows.
1. The recent development of small computers, providing at a lower cost many of
the capabilities which were previously provided by large mainframes, constitutes
the necessary hardware support for the development of distributed information
systems.
2. The technology of distributed databases is based on two other technologies which
developed a sufficiently solid foundation during the seventies: computer
network technology and database technology.

Distributed Database Management Systems (DDBMSs).

The distributed database management system supports the creation and maintenance of
distributed databases. In analyzing the features of DDBMSs it is convenient to
distinguish between commercially available systems and advanced research prototypes.

The distinction is based on the present-day state of the art. Clearly, some features that are
currently experimental in advanced research prototypes will be incorporated in
commercially available systems of the future.

Several commercially available distributed systems were developed by the vendors of
centralized database management systems. They contain additional components which
extend the capabilities of centralized database management systems by supporting
communication and cooperation between several instances of DBMSs which are
installed at different sites of the computer network.

The software components which are typically necessary for building a distributed
database in this case are:

1. The database management component (DB)

2. The data communication component (DC)

3. The data dictionary (DD), which is extended to represent information about the
distribution of data in the network

4. The distributed database component (DDB)

Advantages of Distributed Databases.

Distributed database management has been proposed for various reasons ranging from
organizational decentralization and economical processing to greater autonomy. We
highlight some of these advantages here.

1.Management of distributed data with different levels of transparency.

Ideally, a DBMS should be distribution transparent in the sense of hiding the details of
where each file (table, relation) is physically stored within the system, possibly
with replication. The following types of transparencies are possible:

• Distribution or network transparency.
This refers to freedom for the user from the operational details of the network. It
may be divided into location transparency and naming transparency. Location
transparency refers to the fact that the command used to perform a task is
independent of the location of data and the location of the system where the
command was issued. Naming transparency implies that once a name is specified,
the named objects can be accessed unambiguously without additional specification.

• Replication transparency.
Copies of data may be stored at multiple sites for better availability and reliability.
Replication transparency makes the user unaware of the existence of copies.

• Fragmentation transparency.
Two types of fragmentation are possible.
Horizontal fragmentation distributes a relation into sets of tuples (rows).
Vertical fragmentation distributes a relation into subrelations where each
subrelation is defined by a subset of the columns of the original relation. A global
query by the user must be transformed into several fragment queries.
Fragmentation transparency makes the user unaware of the existence of
fragments.

• Other transparencies include design transparency and execution transparency,
referring to freedom from knowing how the distributed database is designed and
where a transaction executes.
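
The two kinds of fragmentation described above can be sketched in a few lines of Python. This is only an illustration: the relation `ACCOUNT`, its rows, and the helper names are invented, and the sketch assumes that every vertical fragment keeps the key column so that the original relation can be rebuilt by a join.

```python
# A toy global relation ACCOUNT(acc_no, branch, owner, balance); the data is invented.
ACCOUNT = [
    {"acc_no": 1, "branch": "A", "owner": "Ann", "balance": 100},
    {"acc_no": 2, "branch": "B", "owner": "Bob", "balance": 250},
    {"acc_no": 3, "branch": "A", "owner": "Cid", "balance": 75},
]


def horizontal_fragment(relation, predicate):
    """Select the rows (tuples) that satisfy the predicate."""
    return [dict(row) for row in relation if predicate(row)]


def vertical_fragment(relation, key, columns):
    """Project each row onto the key plus a subset of the columns."""
    return [{c: row[c] for c in (key, *columns)} for row in relation]


def join_on_key(left, right, key):
    """Rebuild rows from two vertical fragments that share the key."""
    by_key = {row[key]: row for row in right}
    return [{**row, **by_key[row[key]]} for row in left]


# Horizontal fragments: one per branch; their union is the original relation.
at_a = horizontal_fragment(ACCOUNT, lambda r: r["branch"] == "A")
at_b = horizontal_fragment(ACCOUNT, lambda r: r["branch"] == "B")
rebuilt_h = sorted(at_a + at_b, key=lambda r: r["acc_no"])

# Vertical fragments: each keeps the key so the relation can be rejoined.
names = vertical_fragment(ACCOUNT, "acc_no", ["branch", "owner"])
money = vertical_fragment(ACCOUNT, "acc_no", ["balance"])
rebuilt_v = join_on_key(names, money, "acc_no")
```

Fragmentation transparency would mean that the user queries `ACCOUNT` directly, while the system performs the union or join over the fragments behind the scenes.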

2.Increased Reliability and Availability.

These are two of the most common potential advantages cited for distributed databases.
Reliability is broadly defined as the probability that a system is running (not down) at a
certain time point, whereas availability is the probability that the system is continuously
available during a time interval. When the data and DBMS software are distributed over
several sites, one site may fail while other sites continue to operate. Only the data and
software that exist at the failed site cannot be accessed. This improves both reliability and
availability. Further improvement is achieved by judiciously replicating data and
software at more than one site. In a centralized system, failure at a single site makes the
whole system unavailable to all users. In a distributed database, some of the data may be
unreachable, but users may still be able to access other parts of the database.
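
The effect of replication on reachability can be illustrated with a small calculation. The sketch assumes that sites fail independently and that each site is up with the same probability; both the assumption and the figure 0.9 are invented for illustration, and the function name is hypothetical.

```python
def prob_some_copy_up(site_up_prob, copies):
    """Probability that at least one of `copies` independent sites is up."""
    return 1 - (1 - site_up_prob) ** copies


single = prob_some_copy_up(0.9, 1)   # data stored at one site only
triple = prob_some_copy_up(0.9, 3)   # data replicated at three sites
```

Under these assumptions a data item stored at three sites is unreachable only when all three sites are down at once, which is far less likely than the failure of a single site.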

3.Improved Performance.

A distributed DBMS fragments the database by keeping the data closer to where it is
needed most. Data localization reduces the contention for CPU and I/O services and
simultaneously reduces access delays involved in wide area networks. When a large
database is distributed over multiple sites, smaller databases exist at each site. As a result,
local queries and transactions accessing data at a single site have better performance
because of the smaller local databases. In addition, each site has a smaller number of
transactions executing than if all transactions were submitted to a single centralized
database. Moreover, interquery and intraquery parallelism can be achieved by executing
multiple queries at different sites or by breaking up a query into a number of subqueries
that execute in parallel. This contributes to improved performance.
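
Intraquery parallelism can be sketched as follows. The example is a simulation under stated assumptions: the three "sites" are just in-memory lists with invented balances, threads stand in for separate machines, and the names `SITES`, `local_sum`, and `global_sum` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical local databases: each site stores the balances of its own accounts.
SITES = {
    "site_a": [100, 75],
    "site_b": [250],
    "site_c": [40, 60, 10],
}


def local_sum(site):
    """Subquery run at one site: total balance of the locally stored accounts."""
    return sum(SITES[site])


def global_sum():
    """Global query: run one subquery per site in parallel, then combine."""
    with ThreadPoolExecutor(max_workers=len(SITES)) as pool:
        partials = list(pool.map(local_sum, SITES))
    return sum(partials)


total = global_sum()
```

Each subquery scans only the smaller local database, and only the partial results travel to the site that combines them.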

4.Easier Expansion.

In a distributed environment, expansion of the system in terms of adding more data,
increasing database sizes, or adding more processors is much easier.

Additional Functions of Distributed Databases.

Distribution leads to increased complexity in the system design and implementation. To
achieve the potential advantages listed previously, the DDBMS software must be able to
provide the following functions in addition to those of a centralized DBMS:

• Keeping track of data.
The ability to keep track of the data distribution, fragmentation, and replication by
expanding the DDBMS catalog.

• Distributed query processing.
The ability to access remote sites and transmit queries and data among the
various sites via a communication network.

• Distributed transaction management.
The ability to devise execution strategies for queries and transactions that access
data from more than one site and to synchronize the access to distributed data and
maintain integrity of the overall database.

• Replicated data management.
The ability to decide which copy of a replicated data item to access and to
maintain the consistency of copies of a replicated data item.

• Distributed database recovery.
The ability to recover from individual site crashes and from new types of failures
such as the failure of communication links.

• Security.
Distributed transactions must be executed with the proper management of security
of the data and the authorization/access privileges of users.

• Distributed directory (catalog) management.
A directory contains information (metadata) about data in the database. The
directory may be global for the entire DDB, or local for each site. The placement
and distribution of the directory are design and policy issues.
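
Two of the functions listed above, keeping track of data and choosing a copy of a replicated item, can be sketched together. This is a minimal illustration, not the design of any particular DDBMS: the catalog contents, the site names, and the policy of preferring the local copy are all assumptions.

```python
# Hypothetical global directory: fragment name -> sites that hold a copy.
CATALOG = {
    "ACCOUNT_A": ["site_a", "site_c"],   # replicated fragment
    "ACCOUNT_B": ["site_b"],
}


def sites_for(fragment):
    """Look up every site that stores a copy of the fragment."""
    return CATALOG[fragment]


def choose_copy(fragment, local_site, up_sites):
    """Pick a copy to read: prefer the local one, else any reachable copy."""
    candidates = [s for s in sites_for(fragment) if s in up_sites]
    if not candidates:
        return None                      # no copy is currently reachable
    return local_site if local_site in candidates else candidates[0]


local = choose_copy("ACCOUNT_A", "site_c", {"site_a", "site_b", "site_c"})
remote = choose_copy("ACCOUNT_A", "site_b", {"site_a", "site_b"})
missing = choose_copy("ACCOUNT_B", "site_a", {"site_a", "site_c"})
```

Here the directory is a single global dictionary; whether it is kept at one site, replicated everywhere, or split per site is exactly the placement question raised in the last item of the list.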
