TKSawe
Distributed Naming
    Name
    Name Spaces
    DNS Domain Names
        Understanding the DNS Domain Namespace
        How the DNS Domain Namespace Is Organized
        Types of DNS Domain Names
        DNS and Internet Domains
Distributed Transactions
    Transaction
    Commit and rollback of transactions
    Desirable properties of transactions
    ACID properties of transactions
    What Are Distributed Transactions?
Distributed System Synchronization
    Computer clocks
    Problems with physical clocks
    Coordinated Universal Time (UTC)
    Physical Clock Synchronization
An Introduction to Distributed Systems
More importantly, computers are faster. Network communication takes computational effort. A slower computer would spend a greater fraction of its time communicating rather than working on the user's program. Couple this with past CPU performance and cost, and networking just wasn't viable. Finally, interconnect technologies have advanced to the point
where it is very easy and inexpensive to connect computers together. Over local area networks,
we can expect connectivity in the range of tens of Mbits/sec to a Gbit/sec. Tanenbaum defines a
distributed system as a collection of independent computers that appear to the users of the
system as a single computer. There are two essential points in this definition. The first is the use
of the word independent. This means that, architecturally, the machines are capable of operating
independently. The second point is that the software enables this set of connected machines to
appear as a single computer to the users of the system. This is known as the single system image
and is a major goal in designing distributed systems that are easy to maintain and operate.
Why distributed systems?
Just because it is easy and inexpensive to connect multiple computers together does not necessarily mean that it is a good idea to do so. There are genuine benefits in building distributed systems:
Price/performance ratio. You don't get twice the performance for twice the price when buying computers. Processors are only so fast, and the price/performance curve becomes nonlinear and steep very quickly. With multiple CPUs, we can get (almost) double the performance for double the money (as long as we can figure out how to keep the processors busy and the overhead negligible).
Distributing machines may make sense. It makes sense to put the CPUs for ATM cash machines at the source, each networked with the bank. Each bank can have one or more computers networked with each other and with other banks. For computer graphics, it makes sense to put the graphics processing at the user's terminal to maximize the bandwidth between the device and processor.
Computer-supported cooperative networking. Users who are geographically separated can now work and play together. Examples of this are electronic whiteboards, distributed document systems, audio/video teleconferencing, email, file transfer, and games such as Doom, Quake, Age of Empires, Duke Nukem, Starcraft, and scores of others.
Increased reliability. If a small percentage of machines break, the rest of the system remains intact and can do useful work.
Incremental growth. A company may buy a computer. Eventually the workload is too great for the machine. Without networking, the only option is to replace the computer with a faster one. Networking allows you to add on to an existing infrastructure.
Remote services. Users may need to access information held by others at their systems. Examples of this include web browsing, remote file access, and programs such as Napster and Gnutella to access MP3 music.
Mobility. Users move around with their laptop computers, Palm Pilots, and WAP phones. It is not feasible for them to carry all the information they need with them.
A distributed system has distinct advantages over a set of non-networked smaller computers. Data can be shared dynamically; distributing private copies (via floppy disk, for example) does not work if the data is changing. Peripherals can also be shared. Some peripherals are expensive and/or infrequently used, so it is not justifiable to give each PC its own. These peripherals include optical and tape jukeboxes, typesetters, large-format color printers, and expensive drum scanners. Machines themselves can be shared, and workload can be distributed amongst idle machines. Finally, networked machines are useful for supporting person-to-person networking: exchanging email, file transfer, and information access (e.g., the web).
2. The network may lose messages and/or become overloaded. Rewiring the network can be
costly and difficult.
3. Security becomes a far greater concern. Easy and convenient data access from anywhere
creates security problems.
typical services:
- infrastructure services: file service, name service
- application services
Mobile and ubiquitous computing
Portable devices
laptops
handheld devices
wearable devices
devices embedded in appliances
Mobile computing
Location-aware computing
Ubiquitous computing, pervasive computing
Resource Sharing
With Distributed Systems, it is easier for users to access remote resources and to share
resources with other users.
Examples: printers, files, Web pages, etc.
A distributed system should also make it easier for users to exchange information. Easier resource and data exchange, however, can create security problems; a distributed system should deal with this.
Openness
The openness of a distributed system is determined primarily by the degree to which new resource-sharing services can be added and made available for use by a variety of client programs.
Openness: offer services according to rules and interfaces that describe the syntax and semantics of those services
Interoperability and portability
-- Separating policy from mechanism
Challenges:
Controlling the cost of resources (or money).
Controlling the performance loss.
Fault Tolerance
Hardware, software and networks fail!
Distributed systems must maintain availability even at low levels of
hardware/software/network reliability.
Fault tolerance is achieved by
recovery
redundancy
Transparency
It hides the fact that the processes and resources are physically distributed across
multiple computers.
How to achieve single-system image? How to hide distribution from users or
programs?
Is it a good idea? Sometimes we must trade off transparency for performance
Access Transparency
Enables local and remote information objects to be accessed using identical
operations.
Example: File system operations in NFS, Navigation in the Web and SQL Queries
Location Transparency
Enables information objects to be accessed without knowledge of their location.
Example: File system operations in NFS, Pages in the Web, Tables in distributed
databases.
Concurrency Transparency
Enables several processes to operate concurrently using shared information
objects without interference between them.
Example: NFS, Automatic teller machine network, Database management
system.
Replication Transparency
Enables multiple instances of information objects to be used to increase reliability
and performance without knowledge of the replicas by users or application
programs
Example: Distributed DBMS, Mirroring Web Pages.
Failure Transparency
Enables the concealment of faults
Allows users and applications to complete their tasks despite the failure of other
components.
Example: Database Management System
Migration Transparency
Allows the movement of information objects within a system without affecting
the operations of users or application programs.
Example: NFS, Web Pages
Performance Transparency
Allows the system to be reconfigured to improve performance as loads vary.
Example: Distributed make.
Scaling Transparency
Allows the system and applications to expand in scale without change to the
system structure or the application algorithms.
Example: World-Wide-Web, Distributed Database
Distributed System Models
Characterization
The structure and the organization of systems and the relationships among their components should be designed with the following goals in mind:
a conceptual view
High variation of workload, partial disconnection of components,
or poor connection.
Internal problems
External threats
Architectural Models
Architectural models provide a high-level view of the distribution of functionality between
system components and the interaction relationships between them.
Architectural models define
components (logical components deployed at physical nodes)
communication
Criteria
performance
reliability
scalability, ..
Client-Server
Clients send requests to servers
A server is a system that runs a service
The server is always on and processes requests from
clients
Clients do not communicate with other clients
Client-server model:
Service provided by multiple servers:
Needed:
name service
trading/broker service
browsing service
Tiered (multi-tier) architectures
distributed systems analogy to a layered architecture
Each tier (layer)
Runs as a network service
Is accessed by surrounding layers
The classic client-server architecture is a two-tier model
Clients: typically responsible for user interaction
Servers: responsible for back-end services (data access, printing, ...)
Layered architectures
Break functionality into multiple layers
Each layer handles a specific abstraction
Hides implementation details and specifics of hardware, OS, network abstractions, data encoding, ...
Goals
Robustness
Expect that some systems may be down
Self-scalability: the system can handle greater workloads as more peers are added
Examples
BitTorrent, Skype
At a high level we can look at the application and user requirements, and classify the individual processes and logic as server, client, or peer processes. Take a dictionary server as an example. We could have had one monolithic application, with all words and the interface available locally.
However, a client-server approach may be more efficient. Previous queries could be cached locally. Word additions, removals, and modifications can be done centrally. C/S is a valid approach!
We could also use a peer model where each peer has a segment of the dictionary.
More recently, these roles have been described in terms of the services offered and requested between processes in the same or different computers.
Breaking up the complexity of systems by designing them through layers and services
Layers exist in the stack at both a single machine and between multiple machines
Operating systems typically split functionality into layers, and hide the complexity of many
common operations from users and programmers.
Layers access levels above and below them as services
Applications, services
Middleware
Operating system
Platform
Platform
The lowest hardware and software layers are often referred to as a platform for
distributed systems and applications.
These low-level layers provide services to the layers above them, which are implemented
independently in each computer.
Major Examples
Intel x86/Windows
Intel x86/Linux
Intel x86/Solaris
SPARC/SunOS
PowerPC/MacOS
Even though these platforms are reasonably disparate (Unix, Windows) and (x86, PPC, SPARC), they still have a core set of services and middleware in common. This is essential to enable these platforms to communicate, interoperate, and collaborate together with minimal effort.
We don't want to be converting between little endian/big endian or have to care about the specifics of each platform at the higher level.
Middleware
Major Examples:
Modern Middleware:
IBM WebSphere
Microsoft .NET
Sun J2EE
Google AppEngine
To enable us to be truly platform agnostic, we need a core set of middleware and functions that
we can use on all platforms
Sun RPC is available on most Unix platforms
CORBA is a distributed middleware that is available for all major platforms
Java RMI is available on any platform with a compliant JVM installed
.NET is becoming a standard, and is available for Windows/Linux/MacOS
Higher-level middleware (Gridbus/Globus) can hide even more complexity
System Architecture
The most evident aspect of DS design is the division of responsibilities between system
components (applications, servers, and other processes) and the placement of the
components on computers in the network.
It has major implications for:
-Performance, reliability, and security of the resulting system.
Fig: Client-server model. Client processes send requests to a server process and receive results. (Key: process, computer.)
Client processes interact with individual server processes on separate computers in order to access data or resources. The server in turn may use the services of other servers.
Example:
Querying a web server, which could then query a MySQL or Oracle database before returning the content of a page.
The web server is a client of the database server.
A search engine, as well as serving search requests from clients, crawls other websites to keep its information current.
The search engine is both a server and a client for other web servers.
Fig: A service provided by multiple servers. Clients interact with a service that is implemented by several server processes.
This topology is extremely common. A web site like Google serves approximately 100M searches a day.
It is simply not feasible to serve them from a single server.
Google uses clusters containing tens of thousands of machines offering equivalent services, and you are redirected (via DNS and other means) to one of them. You can also be redirected at the protocol or application level.
Similar techniques can be used for Oracle databases, which are replicated over many servers to offer redundancy and performance.
Proxy servers (replication transparency) and caches: Web proxy server
Fig: Web proxy server. Clients access web servers either directly or through a proxy server that caches responses.
Web proxy servers can operate at client level, at ISP level and at edge/gateway levels to improve
performance and reduce communication costs for frequently accessed data.
Caching can even be used for dynamic data (such as a Google search). This reduces the load on the web servers and improves the performance for end users by reducing the time taken for a dynamic request. Google uses this technique extensively; when serving 100M requests/day this saves resources.
Fig: Peer-to-peer architecture. Peers 1 through N each run an application and interact cooperatively through sharable objects.
All of the processes play similar roles, interacting cooperatively as peers to perform distributed activities or computations without distinction between clients and servers.
E.g., music-sharing systems such as Gnutella, Napster, KaZaA, etc.
A distributed whiteboard allows users on several computers to view and interactively modify a picture between them.
The peer model suits ad-hoc groupings of participants. It can be used very effectively for BitTorrent swarm-style downloads.
- No central point of failure (reliable)
- No central point of control (difficult for adversaries to deny service)
- Some peers will typically contribute more than others (i.e., a seed or super-peer)
Variants of the Client-Server Model: Mobile code and Web applets
Fig: (a) a client request results in the downloading of applet code from the web server; (b) the client then interacts with the applet locally.
Communication in distributed systems
The most important difference between a distributed system and a single-processor system is the communication between processes. In a single-processor system, communication implicitly assumes the existence of shared memory:
Ex: the producer-consumer problem, where one process writes to a shared buffer and another process reads from it.
In a distributed system there is no shared memory, and thus the whole nature of communication between processes must be rethought. For processes to communicate, they must adhere to rules known as protocols. For distributed systems over a wide area, these protocols often take the form of several layers, and each layer has its own goals and rules. Messages can be exchanged in various ways, and there are many design options in this regard; an important option is the "remote procedure call." It is also important to consider the possibilities of communication between groups of processes, not only between two processes. In distributed systems, given the absence of a physical connection between the memories of the different machines, communication is performed by message transfer.
Message Passing
There are two basic interprocess communication paradigms: the shared-data approach and the message-passing approach.
In the shared-data approach, the information to be shared is placed in a common memory
area that is accessible to all processes involved in an IPC.
Desirable Features of a Good Message-Passing System
Simplicity
A message-passing system should be simple and easy to use. It should be possible to communicate with old and new applications and with different modules, without the need to worry about the system and network aspects.
Uniform Semantics
In a distributed system, a message-passing system may be used for the following two types of interprocess communication:
local communication, in which the communicating processes are on the same node;
remote communication, in which the communicating processes are on different nodes.
Efficiency
An IPC protocol of a message-passing system can be made efficient by reducing the number of
message exchanges, as far as practicable, during the communication process. Some optimizations
normally adopted for efficiency include the following:
avoiding the costs of establishing and terminating connections between the same pair of
processes for each and every message exchange between them;
minimizing the costs of maintaining the connections;
piggybacking of acknowledgement of previous messages with the next message during a
connection between a sender and a receiver that involves several message exchanges.
Correctness
Correctness is a feature related to IPC protocols for group communication. Issues related
to correctness are as follows:
atomicity;
ordered delivery;
survivability.
Atomicity ensures that every message sent to a group of receivers will be delivered to
either all of them or none of them. Ordered delivery ensures that messages arrive to all receivers
in an order acceptable to the application. Survivability guarantees that messages will be correctly
delivered despite partial failures of processes, machines, or communication links.
Other features:
reliability;
flexibility;
security;
portability.
Issues in IPC by Message Passing
Address. It contains characters that uniquely identify the sending and receiving processes in
the network.
Sequence number. This is the message identifier (ID), which is very useful for identifying lost
messages and duplicate messages in case of system failures.
Structural information. This element also has two parts. The type part specifies whether the
data to be passed on to the receiver is included within the message or the message only
contains a pointer to the data, which is stored somewhere outside the contiguous portion of
the message. The second part of this element specifies the length of the variable-size message
data.
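As a concrete illustration, the message elements described above can be modeled as a small data structure. This is a minimal sketch; the field names and layout are assumptions for illustration, not a standard wire format:

```python
from dataclasses import dataclass

@dataclass
class Message:
    # Address: uniquely identifies the sending and receiving processes
    sender: str
    receiver: str
    # Sequence number: message ID used to detect lost/duplicate messages
    seq_no: int
    # Structural information, part 1: "data" means the payload is carried
    # inline; "pointer" means the message only refers to data stored
    # outside the contiguous portion of the message
    msg_type: str
    # Structural information, part 2: length of the variable-size data
    length: int
    payload: bytes

msg = Message(sender="procA@host1", receiver="procB@host2",
              seq_no=1, msg_type="data",
              length=5, payload=b"hello")
```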
Synchronization
In the case of a blocking send primitive, after execution of the send statement, the sending process is blocked until it receives an acknowledgement from the receiver that the message has been received. On the other hand, for a nonblocking send primitive, after execution of the send statement, the sending process is allowed to proceed with its execution as soon as the message has been copied to a buffer.
In the case of blocking receive primitive, after execution of the receive statement, the
receiving process is blocked until it receives a message. On the other hand, for a nonblocking
receive primitive, the receiving process proceeds with its execution after execution of the receive
statement, which returns control almost immediately just after telling the kernel where the
message buffer is.
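These semantics can be sketched with a shared buffer and a thread standing in for the receiving process. This is a toy illustration using Python's queue module, not a real kernel implementation; the primitive names are assumptions:

```python
import queue
import threading

buf = queue.Queue()          # stands in for the kernel message buffer

def nonblocking_send(msg):
    # The sender proceeds as soon as the message is copied to the buffer.
    buf.put(msg)

def blocking_receive():
    # The receiving process is blocked until a message arrives.
    return buf.get()

results = []
receiver = threading.Thread(target=lambda: results.append(blocking_receive()))
receiver.start()             # receiver blocks, waiting for a message
nonblocking_send("hello")    # sender returns immediately
receiver.join()
print(results)               # ['hello']
```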
An important issue in a nonblocking receive primitive is how the receiving process
knows that the message has arrived in the message buffer. One of the following two methods is
commonly used for this purpose:
Polling. In this method, a test primitive is provided to allow the receiver to check the buffer
status. The receiver uses this primitive to periodically poll the kernel to check if the message
is already available in the buffer.
Interrupt. In this method, when the message has been filled in the buffer and is ready for use
by the receiver, a software interrupt is used to notify the receiving process.
A variant of the nonblocking receive primitive is the conditional receive primitive, which
also returns control to the invoking process almost immediately, either with a message or with
an indicator that no message is available.
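A conditional receive combined with polling might look like the following sketch (the primitive names are assumptions for illustration; a real system would poll the kernel, not a Python queue):

```python
import queue
import time

buf = queue.Queue()

def conditional_receive():
    # Returns control almost immediately, either with a message or
    # with an indicator (None) that no message is available.
    try:
        return buf.get_nowait()
    except queue.Empty:
        return None

buf.put("ping")              # a message arrives in the buffer

# Polling: the receiver periodically tests the buffer status.
msg = None
while msg is None:
    msg = conditional_receive()
    if msg is None:
        time.sleep(0.01)     # wait before polling again
print(msg)                   # ping
```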
When both the send and receive primitives of a communication between two processes
use blocking semantics, the communication is said to be synchronous, otherwise it is
asynchronous. The main drawback of synchronous communication is that it limits concurrency
and is subject to communication deadlocks.
Fig: Synchronous mode of communication with both send and receive primitives having blocking-type semantics.
Buffering
In the standard message passing model, messages can be copied many times: from the user
buffer to the kernel buffer (the output buffer of a channel), from the kernel buffer of the sending
computer (process) to the kernel buffer in the receiving computer (the input buffer of a channel),
and finally from the kernel buffer of the receiving computer (process) to a user buffer.
When no buffering is used, there is no place to temporarily store the message. Hence one of the following implementation strategies may be used:
The message remains in the sender process's address space and the execution of the send is delayed until the receiver executes the corresponding receive.
The message is simply discarded and a time-out mechanism is used to resend the message after a timeout period. The sender may have to try several times before succeeding.
Fig: The three types of buffering strategies used in interprocess communication.
Single-Message Buffer
In the single-message buffer strategy, a buffer having the capacity to store a single message is used on the receiver's node. This strategy is usually used for synchronous communication; an application module may have at most one message outstanding at a time.
Unbounded-Capacity Buffer
In the asynchronous mode of communication, since a sender does not wait for the receiver to be
ready, there may be several pending messages that have not yet been accepted by the receiver.
Therefore, an unbounded-capacity message-buffer that can store all unreceived messages is
needed to support asynchronous communication with the assurance that all the messages sent to
the receiver will be delivered.
Finite-Bound Buffer
When the buffer has finite bounds, a strategy is also needed for handling the problem of a
possible buffer overflow. The buffer overflow problem can be dealt with in one of the following
two ways:
Unsuccessful communication. In this method, message transfers simply fail whenever there is no more buffer space, and an error is returned.
Flow-controlled communication. The second method is to use flow control, which means that
the sender is blocked until the receiver accepts some messages, thus creating space in the
buffer for new messages. This method introduces a synchronization between the sender and
the receiver and may result in unexpected deadlocks. Moreover, due to the synchronization
imposed, the asynchronous send does not operate in the truly asynchronous mode for all send
commands.
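Both overflow-handling strategies can be illustrated with a bounded queue standing in for the finite buffer. This is a toy sketch; a real message-passing system implements these inside the kernel:

```python
import queue

buf = queue.Queue(maxsize=2)   # finite-bound buffer

# Fill the buffer to capacity.
buf.put_nowait("m1")
buf.put_nowait("m2")

# Unsuccessful communication: when there is no more buffer space,
# the transfer simply fails and an error is returned to the sender.
try:
    buf.put_nowait("m3")
    status = "sent"
except queue.Full:
    status = "error: buffer full"

# Flow-controlled communication: once the receiver accepts a message,
# space is created and the (previously blocked) sender can proceed.
buf.get_nowait()               # receiver accepts one message
buf.put_nowait("m3")           # sender's transfer now succeeds
print(status, buf.qsize())
```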
Multidatagram Messages
Almost all networks have an upper bound on the amount of data that can be transmitted at a time. This size is known as the maximum transfer unit (MTU). A message whose size is greater than the MTU has to be fragmented into pieces no larger than the MTU, and each fragment then has to be sent separately. Each packet is known as a datagram. Messages larger than the MTU are sent in multiple packets and are known as multidatagram messages.
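Fragmentation and reassembly of a multidatagram message can be sketched as follows (a toy MTU of 4 bytes is assumed purely for illustration; real MTUs are on the order of 1500 bytes for Ethernet):

```python
MTU = 4  # toy maximum transfer unit, in bytes

def fragment(message: bytes, mtu: int = MTU):
    # Split a message larger than the MTU into datagrams of at most
    # mtu bytes each; together they form a multidatagram message.
    return [message[i:i + mtu] for i in range(0, len(message), mtu)]

def reassemble(datagrams):
    # The receiver concatenates the fragments back into the message.
    return b"".join(datagrams)

msg = b"hello world"
parts = fragment(msg)
print(parts)                       # [b'hell', b'o wo', b'rld']
assert reassemble(parts) == msg
```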
The message data should be meaningful to the receiving process. This implies that, ideally, the
structure of program objects should be preserved while they are being transmitted from the
address space of the sending process to the address space of the receiving process. However,
even in homogenous systems, it is very difficult to achieve this goal mainly because of two
reasons:
An absolute pointer value loses its meaning when transferred from one process address space
to another.
To transfer program objects in their original form, they are first converted to a stream form that is suitable for transmission and placed into a message buffer; this is known as encoding of the message data. The process of reconstructing the program objects from the message data on the receiver's side is known as decoding of the message data. One of the following two representations may be used for the encoding and decoding of message data:
In tagged representation the type of each program object along with its value is encoded in
the message.
In untagged representation the message data contains only the program objects. No information is included in the message data to specify the type of each program object.
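The difference between the two representations can be sketched as follows. The tag bytes and packing format here are assumptions for illustration, not any standard encoding:

```python
import struct

# Tagged representation: each program object's type is encoded in the
# message along with its value.
def encode_tagged(value):
    if isinstance(value, int):
        return b"I" + struct.pack("!i", value)
    if isinstance(value, str):
        data = value.encode()
        return b"S" + struct.pack("!i", len(data)) + data
    raise TypeError("unsupported type")

def decode_tagged(buf):
    tag, rest = buf[:1], buf[1:]
    if tag == b"I":
        return struct.unpack("!i", rest[:4])[0]
    if tag == b"S":
        (n,) = struct.unpack("!i", rest[:4])
        return rest[4:4 + n].decode()
    raise ValueError("unknown tag")

# Untagged representation: only the raw value is sent; the sender and
# receiver must agree on the type out of band.
untagged = struct.pack("!i", 42)

print(decode_tagged(encode_tagged(42)))    # 42
print(decode_tagged(encode_tagged("hi")))  # hi
```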
Process Addressing
Explicit addressing. The process with which communication is desired is explicitly named as
a parameter in the communication primitive used.
Implicit addressing. The process willing to communicate does not explicitly name a process
for communication (the sender names a server instead of a process). This type of process
addressing is also known as functional addressing.
What Is RPC
RPC makes the client/server model of computing more powerful and easier to program. When combined with the ONC RPCGEN protocol compiler (Chapter 33), clients transparently make remote calls through a local procedure interface.
An RPC is analogous to a function call. Like a function call, when an RPC is made, the calling
arguments are passed to the remote procedure and the caller waits for a response to be returned
from the remote procedure. Figure 32.1 shows the flow of activity that takes place during an
RPC call between two networked systems. The client makes a procedure call that sends a request
to the server and waits. The thread is blocked from processing until either a reply is received, or
it times out. When the request arrives, the server calls a dispatch routine that performs the
requested service, and sends the reply to the client. After the RPC call is completed, the client
program continues. RPC specifically supports network applications.
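The request/wait/reply flow above can be sketched with Python's built-in XML-RPC support. This is only an analogy for the mechanism: the "lookup" procedure and the port selection are assumptions, and real ONC RPC uses rpcgen-generated stubs rather than XML-RPC:

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# Server side: register a dispatch routine that performs the requested
# service ("lookup" is a made-up example procedure).
server = SimpleXMLRPCServer(("localhost", 0), logRequests=False)
server.register_function(lambda word: word.upper(), "lookup")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the call looks like a local procedure call; the calling
# thread blocks until the reply is returned from the remote procedure.
client = ServerProxy("http://localhost:%d" % port)
result = client.lookup("hello")
server.shutdown()
print(result)   # HELLO
```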
Fig: Remote Procedure Calling Mechanism
A remote procedure is uniquely identified by the triple (program number, version number, procedure number). The program number identifies a group of related remote procedures, each of which has a unique procedure number. A program may consist of one or more versions, and each version consists of a collection of procedures that are available to be called remotely. Version numbers enable multiple versions of an RPC protocol to be available simultaneously. Each version contains a number of procedures that can be called remotely, and each procedure has a procedure number.
Consider an example:
We use UNIX to run a remote shell and execute the command this way. There are some problems
with this method:
Establish a server on the remote machine that can respond to queries.
Retrieve information by calling a query, which will be quicker than the previous approach.
The programs will be compiled separately. The communication protocol is achieved by generated stubs, and these stubs and the RPC (and other) libraries will need to be linked in.
Distributed Naming
Name
A name in a distributed system is a string of bits or characters used to refer to an entity.
Entities: hosts, printers, disks, files, processes, users, mailboxes, web pages, graphical
windows, messages, network connections, etc.
An entity is accessed via an access point, and the name of an access point is an address. Since access points can change, the address cannot be treated as the name of the entity. Moreover, an entity may have more than one access point. A location-independent name is separate from the address of the access point.
Fig: An entity (e.g., Jon_Server; owner: jon; lifetime: 1 hour), its identifiers, and its address, as resolved by a naming service.
Name Spaces
Set of all valid names to be used in a certain context, e.g., all valid URLs in the WWW
Can be described using a generative grammar (e.g., BNF for URLs).
Internal structure
Potentially infinite
Aliases
-In general, allows a convenient name to be substituted for a more complicated one
Naming domain
-Name space for which there exists a single administrative authority for assigning names within it
Types of DNS Domain Names

Root domain. This is the top of the tree, representing an unnamed level; it is sometimes shown as two empty quotation marks (""), indicating a null value. When used in a DNS domain name, it is stated by a trailing period (.) to designate that the name is located at the root or highest level of the domain hierarchy. In this instance, the DNS domain name is considered to be complete and points to an exact location in the tree of names. Names stated this way are called fully qualified domain names (FQDNs). Example: a single period (.) or a period used at the end of a name, such as example.microsoft.com.

Top-level domain. A name used to indicate a country/region or the type of organization using a name. Example: .com, which indicates a name registered to a business for commercial use on the Internet.

Subdomain. Additional names that an organization can create that are derived from the registered second-level domain name. These include names added to grow the DNS tree of names in an organization and divide it into departments or geographic locations. Example: example.microsoft.com., which is a fictitious subdomain assigned by Microsoft for use in documentation example names.

Host or resource name. Names that represent a leaf in the DNS tree of names and identify a specific resource. Typically, the leftmost label of a DNS domain name identifies a specific computer on the network. For example, if a name at this level is used in a host (A) RR, it is used to look up the IP address of a computer based on its host name. Example: host-a.example.microsoft.com., where the first label (host-a) is the DNS host name for a specific computer on the network.
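The hierarchy of an FQDN can be made concrete by splitting it into its labels, from host name up to top-level domain. This is a simplified helper for illustration; it ignores escaping and internationalized names:

```python
def labels(fqdn: str):
    # An FQDN ends with a trailing dot denoting the (unnamed) root;
    # stripping it and splitting on '.' yields the host name, any
    # subdomain labels, the second-level domain, and the TLD.
    return fqdn.rstrip(".").split(".")

fqdn = "host-a.example.microsoft.com."
print(labels(fqdn))   # ['host-a', 'example', 'microsoft', 'com']
```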
DNS and Internet Domains
The Internet Domain Name System is managed by a Name Registration Authority on the Internet, responsible for maintaining top-level domains that are assigned by organization and by country/region. These domain names follow International Standard 3166. Some of the many existing abbreviations reserved for use by organizations, as well as the two-letter and three-letter abbreviations used for countries/regions, are shown in the following table:
Some DNS Top-level Domain Names (TLDs)
Distributed Transactions
Transaction
In computer programming, a transaction usually means a sequence of information exchange and
related work (such as database updating) that is treated as a unit for the purposes of satisfying a
request and for ensuring database integrity. For a transaction to be completed and database
changes to be made permanent, the transaction has to be completed in its entirety. A typical
transaction is a catalog merchandise order phoned in by a customer and entered into a computer
by a customer representative. The order transaction involves checking an inventory database,
confirming that the item is available, placing the order, and confirming that the order has been
placed and the expected time of shipment. If we view this as a single transaction, then all of the
steps must be completed before the transaction is successful and the database is actually changed
to reflect the new order. If something happens before the transaction is successfully completed,
any changes already made to the database must be tracked so that they can be undone.
A program that manages or oversees the sequence of events that are part of a transaction is
sometimes called a transaction monitor. Transactions are supported by Structured Query
Language, the standard database user and programming interface. When a transaction completes
successfully, database changes are said to be committed; when a transaction does not complete,
changes are rolled back. In IBM's Customer Information Control System product, a transaction is
a unit of application data processing that results from a particular type of transaction request. In
CICS, an instance of a particular transaction request by a computer operator or user is called a
task.
Less frequently and in other computer contexts, a transaction may have a different meaning. For
example, in IBM mainframe operating system batch processing, a transaction is a job or a job
step.
At any time, an application process might consist of a single transaction. However, the life of an
application process can involve many transactions as a result of commit or rollback operations.
A transaction begins when data is read or written. A transaction ends with a COMMIT or
ROLLBACK statement or with the end of an application process.
-The COMMIT statement commits the database changes that were made during the
current transaction, making the changes permanent. DB2 holds or releases locks that are
acquired on behalf of an application process, depending on the isolation level in use and
the cause of the lock.
-The ROLLBACK statement backs out, or cancels, the database changes that are made by
the current transaction and restores changed data to the state before the transaction began.
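The commit/rollback behavior described here can be sketched with Python's built-in sqlite3 module (SQLite rather than DB2, but the COMMIT and ROLLBACK semantics illustrated are the same; the table and account names are made up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, bal INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('A', 100), ('B', 0)")
conn.commit()  # establish a point of consistency

# A transaction that completes normally is committed: the changes
# become permanent.
conn.execute("UPDATE accounts SET bal = bal - 30 WHERE name = 'A'")
conn.execute("UPDATE accounts SET bal = bal + 30 WHERE name = 'B'")
conn.commit()

# A transaction that cannot complete is rolled back: the data is
# restored to its state at the last point of consistency.
conn.execute("UPDATE accounts SET bal = bal - 999 WHERE name = 'A'")
conn.rollback()

print(conn.execute("SELECT name, bal FROM accounts ORDER BY name").fetchall())
# -> [('A', 70), ('B', 30)]
```

The debit of 999 is undone by the rollback, so only the committed transfer of 30 is visible afterwards.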
The initiation and termination of a transaction define points of consistency within an application
process. A point of consistency is a time when all recoverable data that an application program
accesses is consistent with other data. The following figure illustrates these concepts.
When a rollback operation is successful, DB2 backs out uncommitted changes to restore the data
consistency that existed when the unit of work was initiated. That is, DB2 undoes the work, as
shown in the following figure. If the transaction fails, the rollback operation begins.
You can use the ROLLBACK statement to back out changes only to a savepoint within the
transaction without ending the transaction.
Savepoint support simplifies the coding of application logic to control the treatment of a
collection of SQL statements within a transaction. Your application can set a savepoint within a
transaction. Without affecting the overall outcome of the transaction, application logic can undo
the data changes that were made since the application set the savepoint. The use of savepoints
makes coding applications more efficient because you don't need to include contingency and
what-if logic in your applications.
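The savepoint behavior described above can be sketched with sqlite3 (SQLite's SAVEPOINT syntax differs slightly from DB2's, but the idea of partially rolling back within a transaction is the same; the table name and savepoint name are made up for the example):

```python
import sqlite3

# isolation_level=None gives manual transaction control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE orders (item TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO orders VALUES ('book')")
conn.execute("SAVEPOINT before_extras")       # set a savepoint mid-transaction
conn.execute("INSERT INTO orders VALUES ('gift wrap')")
conn.execute("ROLLBACK TO before_extras")     # undo only the work after the savepoint
conn.execute("COMMIT")                        # the transaction as a whole still commits

print(conn.execute("SELECT item FROM orders").fetchall())
# -> [('book',)]
```

The rollback to the savepoint cancels only the second insert; the overall transaction still commits, and the first insert persists.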
To assure the ACID properties of a transaction, any changes made to data in the course of a
transaction must be committed or rolled back.
When a transaction completes normally, a transaction processing system commits the changes
made to the data; that is, it makes them permanent and visible to other transactions.
When a transaction does not complete normally, the system rolls back (or backs out) the changes;
that is, it restores the data to its last consistent state.
Resources that can be rolled back to their state at the start of a transaction are known as
recoverable resources; resources that cannot be rolled back are non-recoverable.
Consider the case of funds transfer from account A to account B:
A.bal -= amount;
B.bal += amount;
Suppose the system crashes after only the first statement has run:
A.bal -= amount;
CRASH
On RECOVERY, the partial change is rolled back:
A.bal += amount; -- Rollback
and the transaction can then be re-executed in full:
A.bal -= amount;
B.bal += amount;
Without such recovery, blindly re-applying the statements would corrupt the data: the credit
(B.bal += amount;) could be applied twice, or the retried debit could fail
(A.bal -= amount; FAILS, as A's balance is already 0).
3. Isolation: A transaction should appear as though it is being executed in isolation from other
transactions. That is, the execution of a transaction should not be interfered with by any other
transactions executing concurrently.
Isolation is enforced by the concurrency control subsystem of the DBMS. If every transaction
does not make its updates visible to other transactions until it is committed, one form of isolation
is enforced that solves the temporary update problem and eliminates cascading rollbacks. There
have been attempts to define the level of isolation of a transaction. A transaction is said to have
level 0 (zero) isolation if it does not overwrite the dirty reads of higher-level transactions. A level
1 (one) isolation transaction has no lost updates; and level 2 isolation has no lost updates and no
dirty reads. Finally, level 3 isolation (also called true isolation) has, in addition to degree 2
properties, repeatable reads.
Eg:
Consider the case of funds transfer from account A to account B.
Transaction T1:
A.bal -= amount; (let A's balance become 0 after this)
B.bal += amount;
Transaction T2:
A.bal -= amount2; (if T1's uncommitted changes were visible, T2 would read A's dirty
balance of 0 and the debit would fail)
4. Durability or permanency: The changes applied to the database by a committed transaction
must persist in the database. These changes must not be lost because of any failure.
Finally, the durability property is the responsibility of the recovery subsystem of the DBMS.
Eg:
Consider the case of funds transfer from account A to account B.
Transaction T1:
A.bal -= amount;
B.bal += amount;
Commit
ACID properties of transactions
In the context of transaction processing, the acronym ACID refers to the four key properties of a
transaction: atomicity, consistency, isolation, and durability.
Atomicity
All changes to data are performed as if they are a single operation. That is, all the changes
are performed, or none of them are.
For example, in an application that transfers funds from one account to another, the
atomicity property ensures that, if a debit is made successfully from one account, the
corresponding credit is made to the other account.
Consistency
Data is in a consistent state when a transaction starts and when it ends.
For example, in an application that transfers funds from one account to another, the
consistency property ensures that the total value of funds in both the accounts is the same
at the start and end of each transaction.
Isolation
The intermediate state of a transaction is invisible to other transactions. As a result,
transactions that run concurrently appear to be serialized.
For example, in an application that transfers funds from one account to another, the
isolation property ensures that another transaction sees the transferred funds in one
account or the other, but not in both, nor in neither.
Durability
After a transaction successfully completes, changes to data persist and are not undone,
even in the event of a system failure.
For example, in an application that transfers funds from one account to another, the
durability property ensures that the changes made to each account will not be reversed.
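The recurring funds-transfer example can be made concrete as a short sketch with sqlite3 (a stand-in for any transactional database; the `transfer` function and table layout are made up for illustration). It shows atomicity and consistency together: either both the debit and the credit happen, or neither does, and the total funds are preserved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, bal INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        cur = conn.execute(
            "UPDATE accounts SET bal = bal - ? WHERE name = ? AND bal >= ?",
            (amount, src, amount))
        if cur.rowcount == 0:            # insufficient funds: fail the transaction
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET bal = bal + ? WHERE name = ?",
                     (amount, dst))
        conn.commit()                    # make both changes permanent together
    except Exception:
        conn.rollback()                  # undo any partial change
        raise

transfer(conn, "A", "B", 60)             # succeeds: A = 40, B = 60
try:
    transfer(conn, "A", "B", 60)         # fails: A has only 40 left
except ValueError:
    pass

print(conn.execute("SELECT name, bal FROM accounts ORDER BY name").fetchall())
# -> [('A', 40), ('B', 60)]
```

After the failed second transfer, the rollback leaves both balances exactly as the first committed transfer left them; no money is lost or created.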
What Are Distributed Transactions?
A distributed transaction is a transaction that updates data on two or more networked computer
systems. Distributed transactions extend the benefits of transactions to applications that must
update distributed data. Implementing robust distributed applications is difficult because these
applications are subject to multiple failures, including failure of the client, the server, and the
network connection between the client and server. In the absence of distributed transactions, the
application program itself must detect and recover from these failures.
For distributed transactions, each computer has a local transaction manager. When a transaction
does work at multiple computers, the transaction managers interact with other transaction
managers via either a superior or subordinate relationship. These relationships are relevant only
for a particular transaction.
Each transaction manager performs all the enlistment, prepare, commit, and abort calls for its
enlisted resource managers (usually those that reside on that particular computer). Resource
managers manage persistent or durable data and work in cooperation with the DTC to guarantee
atomicity and isolation to an application.
In a distributed transaction, each participating component must agree to commit a change action
(such as a database update) before the transaction can occur. The DTC performs the transaction
coordination role for the components involved and acts as a transaction manager for each
computer that manages transactions. When committing a transaction that is distributed among
several computers, the transaction manager sends prepare, commit, and abort messages to all its
subordinate transaction managers. In the two-phase commit algorithm for the DTC, phase one
involves the transaction manager requesting each enlisted component to prepare to commit; in
phase two, if all successfully prepare, the transaction manager broadcasts the commit decision.
When the application has prepared its changes, it asks the transaction manager to commit
the transaction. The transaction manager keeps a sequential transaction log so that its
commit or abort decisions will be durable.
o If all components are prepared, the transaction manager commits the transaction
and the log is cleared.
o If any component cannot prepare, the transaction manager broadcasts an abort
decision to all elements involved in the transaction.
o While a component remains prepared but not committed or aborted, it is in doubt
about whether the transaction committed or aborted. If a component or transaction
manager fails, it reconciles in-doubt transactions when it reconnects.
When a transaction manager is in-doubt about a distributed transaction, the transaction manager
queries the superior transaction manager. The root transaction manager, also referred to as the
global commit coordinator, is the transaction manager on the system that initiates a transaction
and is never in-doubt. If an in-doubt transaction persists for too long, the system administrator
can force the transaction to commit or abort.
Note
Many aspects of a distributed transaction are identical to a transaction whose scope is a single database. For
example, a distributed transaction provides predictable behavior by enforcing the ACID properties that define
all transactions.
In order for the scheme to work reliably, both the coordinator and the participating resource
managers independently must be able to guarantee proper completion, including any necessary
restart/redo operations. The algorithms for guaranteeing success by handling failures at any stage
are provided in advanced database texts.
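The two-phase commit flow described above can be sketched as a toy in-process protocol. This is an illustration only, not the DTC implementation: the `Participant` class stands in for an enlisted resource manager, the coordinator's log is a plain list, and the durable logging and restart/redo handling mentioned above are omitted.

```python
class Participant:
    """Stand-in for a resource manager enlisted in the transaction."""
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit, self.state = name, can_commit, "active"

    def prepare(self):                   # phase one: vote yes/no
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):                    # phase two: apply the commit decision
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(coordinator_log, participants):
    # Phase one: ask every enlisted participant to prepare to commit.
    if all(p.prepare() for p in participants):
        coordinator_log.append("commit")     # log the decision before acting on it
        for p in participants:
            p.commit()                       # phase two: broadcast commit
        return True
    coordinator_log.append("abort")          # any "no" vote aborts the transaction
    for p in participants:
        p.abort()
    return False

log = []
ok = two_phase_commit(log, [Participant("db1"), Participant("db2")])
print(ok, log)   # -> True ['commit']
```

Between a successful `prepare` and the coordinator's decision, a real participant is in doubt; that window is why the coordinator must log its commit/abort decision durably before broadcasting it.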
Distributed System synchronization
Synchronization: single-CPU systems vs. distributed systems.
Single CPU:
critical regions, mutual exclusion, and other synchronization problems are solved using
methods such as semaphores and monitors.
DS:
semaphores and monitors are not appropriate, since they rely on the existence of shared
memory.
Problems to be tackled:
Time
Mutual exclusion
Election algorithms
Atomic transactions
Deadlocks
Computer clocks
Each computer has a circuit for keeping track of time.
The word "clock" is used to refer to these devices, but they are not actually clocks in the usual
sense: timer is perhaps a better word.
A computer timer is usually a precisely machined quartz crystal.
When kept under tension, quartz crystals oscillate at a well-defined frequency that depends on
the kind of crystal, how it is cut, and the amount of tension.
Associated with each crystal are two registers, a counter and a holding register.
Each oscillation of the crystal decrements the counter by one.
When the counter gets to zero, an interrupt is generated and the counter is reloaded from the
holding register.
In this way, it is possible to program a timer to generate an interrupt 60 times a second, or at
any other desired frequency.
Each interrupt is called one clock tick.
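The counter/holding-register mechanism described above can be simulated in a few lines. The oscillation rate and register value below are made-up illustration numbers, chosen so that one simulated second produces 60 ticks:

```python
# Toy simulation of a computer timer: each crystal oscillation decrements
# the counter; when it reaches zero, an "interrupt" (one clock tick) fires
# and the counter is reloaded from the holding register.
OSCILLATIONS_PER_SECOND = 600   # assumed crystal frequency (illustrative)
HOLDING_REGISTER = 10           # reload value -> 600 / 10 = 60 ticks per second

ticks = 0
counter = HOLDING_REGISTER
for _ in range(OSCILLATIONS_PER_SECOND):   # one simulated second of oscillations
    counter -= 1
    if counter == 0:
        ticks += 1                         # the interrupt: one clock tick
        counter = HOLDING_REGISTER         # reload from the holding register

print(ticks)  # -> 60
```

Changing the holding-register value changes the tick rate, which is exactly how a timer is programmed to interrupt at a desired frequency.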
When each machine has its own clock, an event that occurred after another event may
nevertheless be assigned an earlier time.
P: calculate ΔP = C(P) − C(S)
send ΔP to S
S: receive all ΔPs
compute an average
send the correction −ΔP to client P
P: apply −ΔP to C(P)
a) The time daemon asks all the other machines for their clock values
b) The machines answer
c) The time daemon tells everyone how to adjust their clock
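The daemon's averaging step (a)–(c) above can be sketched as follows. This is a simplified illustration of Berkeley-style averaging: the function name, machine names, and clock values are made up, and network delays are ignored.

```python
def compute_adjustments(clock_values):
    """Given each machine's reported clock value, return the correction
    the time daemon tells each machine to apply (step c above)."""
    average = sum(clock_values.values()) / len(clock_values)
    # Each machine receives a relative correction, not an absolute time.
    return {name: average - value for name, value in clock_values.items()}

# Step a/b: the daemon polls the machines and they answer with clock values.
clocks = {"daemon": 3000, "m1": 3010, "m2": 2990}
adjust = compute_adjustments(clocks)
print(adjust)  # -> {'daemon': 0.0, 'm1': -10.0, 'm2': 10.0}
```

The fast machine (m1) is told to slow down by 10 units and the slow one (m2) to advance by 10, so all clocks converge on the average, 3000.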
Logical Clocks
For many DS algorithms, associating an event with an absolute real time is not essential; we
only need to know an unambiguous order of events.
Synchronization based on relative time.
Example: Unix make (is output.c updated after the generation of output.o?)
Relative time may not relate to the real time.
What's important is that the processes in the Distributed System agree on the ordering in
which certain events occur.
Such clocks are referred to as Logical Clocks.
Lamport Algorithm
Clock synchronization does not have to be exact
Synchronization not needed if there is no interaction between machines
Synchronization only needed when machines communicate
i.e. must only agree on ordering of interacting events
Two distinct events a and b are concurrent (a || b) if neither a → b nor b → a holds; in that
case we cannot say which event happened before the other.
For any two events a and b in a distributed system, either a → b, b → a, or a || b.
Logical Clocks
There is a clock Ci at each process pi
The clock Ci can be thought of as a function that assigns a number Ci(a), called the timestamp
of event a, to any event a at pi
These clocks can be implemented by counters and have no relation to physical time.
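A minimal counter-based implementation of such a clock is sketched below. The class and method names are made up for illustration; the update rules are Lamport's standard ones (increment the counter on each local event, and on message receipt take the maximum of the local and received timestamps plus one), which guarantee that a → b implies C(a) < C(b).

```python
class LamportClock:
    """A per-process logical clock implemented as a simple counter."""
    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event (including sending a message): advance the counter."""
        self.time += 1
        return self.time

    def receive(self, sent_timestamp):
        """Message arrival: jump past both clocks' current values."""
        self.time = max(self.time, sent_timestamp) + 1
        return self.time

p1, p2 = LamportClock(), LamportClock()
a = p1.tick()        # event a at p1: C1(a) = 1
msg = p1.tick()      # p1 sends a message carrying timestamp 2
b = p2.receive(msg)  # event b at p2: C2(b) = max(0, 2) + 1 = 3
print(a, msg, b)     # -> 1 2 3
```

Because the send happened before the receive, the timestamps respect the ordering: C(a) < C(send) < C(b), even though the two processes never shared a physical clock.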