Introduction.
Definition.
Consider a bank that has three branches at different locations. At each branch, a computer
controls the teller terminals and the account database of that branch. Each computer with
its local account database at one branch constitutes one site of the distributed database;
the computers are connected by a communication network. During normal operations, the
applications requested from the terminals of a branch need to access only the database of
that branch. These applications are completely executed by the computer of the branch
where they are issued and are therefore called local applications. An example of a local
application is a debit or credit operation performed on an account stored at the same
branch at which the application is requested.
From a technological viewpoint, the really important aspect is the existence of some
applications which access data at more than one branch. These applications are called
global applications. The existence of global applications is the discriminating
characteristic of a distributed database with respect to a set of local databases.
A typical global application is a transfer of funds from an account of one branch to an
account of another branch. This application requires updating the databases at two
different branches. Note that this application is something more than just performing two
local updates at two individual branches (a debit and a credit), because it is also necessary
to ensure that either both updates are performed or neither is.
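The both-or-neither requirement above can be sketched as a single transaction. The following is a minimal illustration using SQLite, with an invented `accounts` table standing in for the two branches' databases (in a real distributed database the two accounts would live at different sites):

```python
import sqlite3

# Hypothetical accounts table; the account names "A" and "B" stand in
# for accounts held at two different branches.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Debit src and credit dst as one atomic unit: both updates are
    committed together, or the rollback undoes whatever was done."""
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                     (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (amount, dst))
        conn.commit()
    except Exception:
        conn.rollback()   # neither update survives
        raise

transfer(conn, "A", "B", 30)
print(list(conn.execute("SELECT id, balance FROM accounts ORDER BY id")))
# [('A', 70), ('B', 80)]
```

The interesting case is the failure path: if the credit update raises, the rollback also discards the already-performed debit, so the database never exposes a state in which only one of the two updates took effect.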
Another definition.
The possibility of providing centralized control over the information resources of a whole
enterprise or organization was considered one of the strongest motivations for
introducing databases; they were developed as the evolution of information systems in
which each application had its own private files. The fundamental function of the database
administrator (DBA) was to guarantee the safety of the data, which was recognized to be
an important investment of the enterprise requiring a centralized responsibility.
However, it must be emphasized that local database administrators may have a high
degree of autonomy, up to the point that a global database administrator is completely
missing and intersite coordination is performed by the local administrators themselves.
This characteristic is usually called site autonomy.
Distributed databases may differ very much in the degree of site autonomy: from
complete site autonomy without any centralized database administrator to almost
completely centralized control.
Data independence means that the actual organization of data is transparent to the
application programmer. Programs are written having a “conceptual” view of the data,
the so-called conceptual schema. The main advantage of data independence is that
programs are unaffected by changes in the physical organization of data.
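As a minimal illustration of data independence (the storage classes and the `get_balance` function below are invented for this sketch), a program written against the conceptual view is untouched when the physical organization changes:

```python
# Conceptual view: the application only knows get_balance(db, account_id).
def get_balance(db, account_id):
    return db.lookup(account_id)

# One physical organization: an unordered file of records (sequential scan).
class ListStorage:
    def __init__(self, records):
        self.records = list(records)
    def lookup(self, account_id):
        for rec_id, balance in self.records:
            if rec_id == account_id:
                return balance

# A different physical organization: a hash index (direct access).
class HashStorage:
    def __init__(self, records):
        self.index = dict(records)
    def lookup(self, account_id):
        return self.index[account_id]

records = [("A", 100), ("B", 50)]
# The same program works unchanged against either physical organization.
print(get_balance(ListStorage(records), "B"))  # 50
print(get_balance(HashStorage(records), "B"))  # 50
```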
In traditional databases, redundancy was reduced as far as possible for two reasons:
first, inconsistencies among several copies of the same logical data are automatically
avoided by having only one copy, and second, storage space is saved by eliminating
redundancy. Reduction of redundancy was obtained by data sharing, i.e., by allowing
several applications to access the same files and records. In distributed databases,
however, there are several reasons for considering data redundancy a desirable feature:
First, the locality of applications can be increased if the data is replicated at all sites
where applications need it, and
Second, the availability of the system can be increased, because a site failure does not
stop the execution of applications at other sites if the data is replicated.
As a very general statement, the convenience of replicating a data item increases with
the ratio of retrieval accesses to update accesses performed on it by applications.
Complex accessing structures, such as secondary indexes, interfile chains, and so on, are a
major aspect of traditional databases; supporting these structures is the most important
part of database management systems (DBMSs). The reason for providing complex
accessing structures is to obtain efficient access to data.
In distributed databases, complex accessing structures are not the right tool for efficient
access. Therefore, while efficient access is a main problem in distributed databases,
physical structures are not a relevant technological issue. Efficient access to a distributed
database cannot be provided by intersite physical structures, because such structures are
very difficult to build and maintain and because it is not convenient to "navigate" at a
record level in a distributed database.
In databases, the issues of integrity, recovery, and concurrency control, although they
refer to different problems, are strongly interrelated. To a large extent, the solution of
these problems consists of providing transactions.
A transaction is an atomic unit of execution; i.e., it is a sequence of operations which are
either performed in their entirety or not performed at all.
The 'funds transfer' application is a global application which must be an atomic unit:
either both the debit portion and the credit portion are executed, or neither is.
It is not acceptable to perform only one of them. Therefore, the funds transfer application
is also a global transaction. It is clear that in distributed databases the problem of
transaction atomicity has a particular flavour: how should the system behave if the 'debit'
site is operational and the 'credit' site is not operational when the funds transfer is
requested? Should the transaction be aborted (undoing all operations which have been
performed up to the moment of the site failure), or should a smart system try to execute
the funds transfer correctly even if the two sites are never simultaneously operational?
Of course, the user would be less affected by failures if the latter approach were applied.
Clearly, atomic transactions are the means to obtain database integrity, because they
ensure that either all the actions which transform the database from one consistent state
into another are performed, or the initial consistent state is left untouched. There are two
dangerous enemies of transaction atomicity: failures and concurrency.
Failures may cause the system to stop in the midst of transaction execution, thus violating
the atomicity requirement. Concurrent execution of different transactions may permit one
transaction to observe an inconsistent transient state created by another transaction
during its execution.
Recovery deals to a large extent with the problem of preserving transaction atomicity in
the presence of failures. In distributed databases this aspect is particularly important,
because some of the sites involved in a transaction's execution might fail.
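The abort alternative discussed above (undoing the work already done when one site fails) can be sketched as follows. The `Site` objects and their API are invented for illustration; if the credit site is down, the already-performed debit is compensated so the global transaction leaves no partial effect.

```python
class Site:
    """Hypothetical branch site holding local account balances."""
    def __init__(self, balances, up=True):
        self.balances = dict(balances)
        self.up = up
    def apply(self, account, delta):
        if not self.up:
            raise ConnectionError("site not operational")
        self.balances[account] += delta

def transfer(debit_site, credit_site, src, dst, amount):
    """Global funds transfer with abort-on-failure recovery:
    if the credit site has failed, the debit is undone."""
    debit_site.apply(src, -amount)
    try:
        credit_site.apply(dst, +amount)
        return "committed"
    except ConnectionError:
        debit_site.apply(src, +amount)   # compensate: restore atomicity
        return "aborted"

branch1 = Site({"A": 100})
branch2 = Site({"B": 50}, up=False)      # the credit site is down
print(transfer(branch1, branch2, "A", "B", 30))  # aborted
print(branch1.balances["A"])                     # 100 (debit undone)
```

This is only the simplest recovery policy; real systems must also survive a failure of the debit site itself between the two steps, which is what commit protocols address.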
In traditional databases, the database administrator, having centralized control, can
ensure that only authorized access to the data is performed. Note, however, that the
centralized approach in itself, without specialized control procedures, is more vulnerable
to privacy and security violations than the older approaches based on separate files.
In distributed databases, local administrators face essentially the same problems as
database administrators in traditional databases. However, in a distributed database with
a very high degree of site autonomy, the owners of local data feel more protected because
they can enforce their own protection measures instead of depending on a central
database administrator.
There are several reasons why distributed databases are developed. The following is a list
of the main motivations.
1. Organizational and Economic Reasons.
Many organizations are decentralized, and a distributed database approach fits the
structure of the organization more naturally. With the recent developments in computer
technology, the economy-of-scale motivation for having large, centralized computer
centers is becoming questionable. The organizational and economic motivations are
probably the most important reasons for developing distributed databases.
2. Interconnection of Existing Databases.
Distributed databases are the natural solution when several databases already exist in an
organization and the necessity of performing global applications arises. In this case, the
distributed database is created bottom-up from the preexisting local databases. This
process may require a certain degree of local restructuring; however, the effort required
by this restructuring is much less than that needed for the creation of a completely new
centralized database.
3. Incremental Growth.
With a centralized approach, either the initial dimensions of the system must take care of
future expansion, which is difficult to foresee and expensive to implement, or growth has
a major impact not only on the new applications but also on the existing ones. With a
distributed approach, growth can be accommodated by adding new sites, with much less
impact on the existing ones.
5. Reliability and Availability.
The distributed database approach, especially with redundant data, can also be used to
obtain higher reliability and availability. However, obtaining this goal is not
straightforward and requires the use of appropriate techniques. The autonomous
processing capability of the different sites does not by itself guarantee a higher overall
reliability of the system, but it does ensure a graceful degradation property: failures in a
distributed database may be more frequent than in a centralized one because of the
greater number of components, but the effect of each failure is confined to those
applications which use the data of the failed site, and a complete system crash is rare.
Note.
The reasons why the development of distributed database systems has only recently
begun are as follows:
1. The recent development of small computers, providing at a lower cost many of
the capabilities which were previously provided by large mainframes, constitutes
the necessary hardware support for the development of distributed information
systems.
2. The technology of distributed databases is based on two other technologies which
developed a sufficiently solid foundation during the seventies: computer network
technology and database technology.
Distributed Database Management Systems ( DDBMSs ).
A distributed database management system supports the creation and maintenance of
distributed databases. In analyzing the features of DDBMSs, it is convenient to
distinguish between commercially available systems and advanced research prototypes.
The distinction is based on the present-day state of the art; clearly, some features which
are currently experimental in advanced research prototypes will be incorporated in the
commercially available systems of the future.
The software components which are typically necessary for building a distributed
database in this case include:
3. The data dictionary (DD), which is extended to represent information about the
distribution of data in the network.
Distributed database management has been proposed for various reasons, ranging from
organizational decentralization and economical processing to greater autonomy. We
highlight some of these advantages here.
1. Distribution Transparency.
Ideally, a DBMS should be distribution transparent in the sense of hiding the details of
where each file (table, relation) is physically stored within the system, including possible
replication. The following types of transparency are possible:
Replication transparency.
Copies of data may be stored at multiple sites for better availability and reliability.
Replication transparency makes the user unaware of the existence of copies.
Fragmentation transparency.
Two types of fragmentation are possible:
Horizontal fragmentation distributes a relation into subsets of tuples (rows).
Vertical fragmentation distributes a relation into subrelations, where each
subrelation is defined by a subset of the columns of the original relation. A global
query by the user must be transformed into several fragment queries.
Fragmentation transparency makes the user unaware of the existence of
fragments.
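The two fragmentation types can be sketched in a few lines; the `Account` relation below, represented as a list of row dictionaries, is invented for illustration.

```python
# Global relation: Account(acct_no, branch, balance)
accounts = [
    {"acct_no": 1, "branch": "north", "balance": 100},
    {"acct_no": 2, "branch": "south", "balance": 50},
    {"acct_no": 3, "branch": "north", "balance": 75},
]

# Horizontal fragmentation: each fragment is a subset of the tuples,
# here selected by branch so each site stores its own accounts.
north = [t for t in accounts if t["branch"] == "north"]
south = [t for t in accounts if t["branch"] == "south"]
# The union of the fragments rebuilds the global relation.
assert sorted(t["acct_no"] for t in north + south) == [1, 2, 3]

# Vertical fragmentation: each subrelation keeps a subset of the
# columns; the key (acct_no) is repeated so tuples can be rejoined.
balances = [{"acct_no": t["acct_no"], "balance": t["balance"]} for t in accounts]
branches = [{"acct_no": t["acct_no"], "branch": t["branch"]} for t in accounts]

# A global query such as "total balance at the north branch" must be
# transformed into fragment queries; here it touches only one fragment.
print(sum(t["balance"] for t in north))  # 175
```

With fragmentation transparency, the transformation of the global query into fragment queries is performed by the DDBMS, not by the user.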
2. Increased Reliability and Availability.
These are two of the most commonly cited potential advantages of distributed databases.
Reliability is broadly defined as the probability that a system is running (not down) at a
certain point in time, whereas availability is the probability that the system is
continuously available during a time interval. When the data and DBMS software are
distributed over several sites, one site may fail while the other sites continue to operate;
only the data and software that exist at the failed site cannot be accessed. This improves
both reliability and availability. Further improvement is achieved by judiciously
replicating data and software at more than one site. In a centralized system, failure at the
single site makes the whole system unavailable to all users. In a distributed database,
some of the data may be unreachable, but users may still be able to access other parts of
the database.
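The effect of replication on the availability of a single data item can be quantified under a simple independence assumption (the 0.9 per-site availability figure is invented for illustration): the item is unavailable only if every site holding a copy is down at the same time.

```python
def item_availability(per_site, n_copies):
    """Probability that at least one of n_copies sites holding the item
    is up, assuming independent failures with the given per-site
    availability. The inputs here are illustrative assumptions."""
    return 1 - (1 - per_site) ** n_copies

# With 90%-available sites, each added copy sharply cuts unavailability.
print(item_availability(0.9, 1))  # about 0.9
print(item_availability(0.9, 2))  # about 0.99
print(item_availability(0.9, 3))  # about 0.999
```

The independence assumption is optimistic (correlated failures such as network partitions violate it), but it shows why judicious replication improves availability so quickly.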
3. Improved Performance.
A distributed DBMS fragments the database, keeping the data closer to where it is
needed most. Data localization reduces contention for CPU and I/O services and
simultaneously reduces the access delays involved in wide area networks. When a large
database is distributed over multiple sites, smaller databases exist at each site. As a result,
local queries and transactions accessing data at a single site have better performance
because of the smaller local databases. In addition, each site has a smaller number of
transactions executing than if all transactions were submitted to a single centralized
database. Moreover, interquery and intraquery parallelism can be achieved by executing
multiple queries at different sites or by breaking up a query into a number of subqueries
that execute in parallel. This contributes to improved performance.
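Intraquery parallelism can be sketched by splitting a global aggregate into per-site subqueries that run concurrently; the per-site fragments and the thread-based execution below are illustrative stand-ins for real sites.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-site horizontal fragments of one large relation
# (here, just the balance column of each site's accounts).
sites = {
    "north": [100, 75, 20],
    "south": [50, 30],
    "east":  [10, 10, 10],
}

def site_subquery(rows):
    """Subquery executed locally at one site: a partial sum."""
    return sum(rows)

# The global query SUM(balance) is broken into one subquery per site;
# the subqueries execute in parallel and the partial results are merged.
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(site_subquery, sites.values()))
total = sum(partials)
print(total)  # 305
```

The merge step depends on the aggregate: sums and counts combine by addition, while an average requires shipping both partial sums and counts from each site.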
4.Easier Expansion.
Security.
Distributed transactions must be executed with proper management of the security of the
data and of the authorization/access privileges of users.