ACID Properties By Example (And Counterexample) Part Zero

When talking about databases, ACID is an acronym that stands for Atomic, Consistent, Isolated
and Durable. These are important properties of a database system's architecture. Specifically,
these properties refer to how database transactions are designed.

In fact this stuff is important in any transaction processing system (TPS). These systems (not just
database systems) use a server-client architecture and they first became popular in the 1960s.
These systems are successful because they allow multiple clients to modify and share data
concurrently all while enforcing data integrity! Not too shabby.

So most servers (including database servers) were built with this architecture in mind. It's
interesting that many NoSQL databases don't attempt to provide full ACID transactions. Each of
these NoSQL databases relaxes one or more of these properties and attempts to offer something
else in its place (but that's a story for another day).

With SQL Server, these properties are enforced by default. But as it happens, you can relax these
ACID properties in SQL Server if you want. We'll see that it turns out to be easy (maybe too
easy?) to write SQL that ignores some of these properties. The hope is that after reading this
series, you'll

be aware of the properties,
understand why database transactions behave the way they do,
and be aware of any consequences if you're tempted to give up any of these properties.

How This Series Is Organized


So I started this series as a single blog post, but it was getting a bit long for a single article. I
wanted to come up with some examples (and counterexamples) other than the too common
example of a money transfer between two bank accounts.

What you'll see in this series is

A description of each ACID property,
A bit about how each property is handled in SQL Server,
An example from real life (but not necessarily an I.T. example!)
A counterexample from real life (but again, not necessarily an I.T. example!)

ACID Properties By Example (And Counterexample) Part One: Atomic

In the 1800s scientists first used the word "atom" to describe the smallest bits of an element. They
picked that word because it meant un-cuttable or indivisible. The idea was that that's as far as
you can go; you can't break down these bits further. Fast forward to the early 1900s and they
found out that atoms actually can change (through radiation or splitting). But it was too late to
change the language. Splittable or not, atoms are atoms.

This process of splitting atoms is so interesting that somehow the word "atomic" has come to refer
to this process of dividing atoms (e.g. atomic bomb, atomic energy).

Atomic Transactions in SQL Server

But when we talk about the word "atomic" as one of the ACID properties of transactions, the word
regains its original meaning: indivisible. SQL Server transactions are always atomic. They're
all-or-nothing. To twist Meat Loaf's words, this means that two out of three is bad (it's got to be
three out of three or nothing). It's often forgotten, but this applies to single-statement
transactions too; all of a statement (whether an update, delete or insert) will happen or not happen.

To guarantee atomicity, SQL Server uses a Write-Ahead Transaction Log. The log always gets
written to first, before the associated data changes. That way, if and when things go wrong, SQL
Server will know how to roll back to a state where every transaction either happened completely
or didn't happen at all. There's a lot more to it than that, but as a developer all I care about
is that I can trust that my transactions don't get split.
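
Here's a minimal sketch of that point about single statements. It's Python driving an in-memory SQLite database (not SQL Server), purely so that it runs anywhere; the accounts table and the withdraw_from_everyone helper are made up for illustration. One UPDATE touches three rows, the second row breaks a CHECK constraint, and the change already made to the first row gets backed out with it.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server here; table and data are invented.
conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit, no hidden BEGIN
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 500), (2, 50), (3, 500)])


def withdraw_from_everyone(amount):
    """Run ONE update statement against every row; report whether it succeeded."""
    try:
        conn.execute("UPDATE accounts SET balance = balance - ?", (amount,))
        return True
    except sqlite3.IntegrityError:
        # Row 2 would drop below zero, so the whole statement is undone,
        # including the change that was already applied to row 1.
        return False


succeeded = withdraw_from_everyone(100)
balances = [row[0] for row in conn.execute("SELECT balance FROM accounts ORDER BY id")]
print(succeeded, balances)  # False [500, 50, 500]
```

No row ends up half-updated: it's three out of three or nothing.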

Example

Here's an example from outside the I.T. industry. It's a story about an all-or-nothing transaction.
About two years ago (in September 2009), Samoa switched from driving on the right side of the road
to the left (The NYT has a great article on it).

You can imagine the great effort that must go into a switch like this. And it has to happen all or
nothing. The switch has to happen everywhere, all at once, with no exceptions. Unlike other big
projects that can usually be broken down into smaller phases, this one can't.

Translated into SQL, this might be equivalent to:

-- Roads, lights and intersections all switch together, or nothing switches at all.
BEGIN TRANSACTION;

    UPDATE ROADS
    SET TrafficDirection = 'Left'
    WHERE Country = 'Samoa';

    UPDATE TRAFFIC_LIGHTS
    SET TrafficDirectionMode = 'Left'
    WHERE Country = 'Samoa';

    UPDATE INTERSECTIONS
    SET TrafficDirectionConfigurationMode = 'Left'
    WHERE Country = 'Samoa';

COMMIT TRANSACTION;
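
For anyone who wants to watch an all-or-nothing switch fail safely, here's a rough runnable sketch. It uses Python with an in-memory SQLite database as a stand-in for SQL Server, trims the schema down to two invented tables, and adds a CHECK constraint (my assumption, not part of the SQL above) so that a bad value has something to trip over. The switch_sides helper is hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; schema is a cut-down, invented version.
conn = sqlite3.connect(":memory:", isolation_level=None)  # we issue BEGIN/COMMIT ourselves
conn.executescript("""
    CREATE TABLE roads (country TEXT, traffic_direction TEXT);
    CREATE TABLE traffic_lights (
        country TEXT,
        traffic_direction_mode TEXT CHECK (traffic_direction_mode IN ('Left', 'Right'))
    );
    INSERT INTO roads VALUES ('Samoa', 'Right');
    INSERT INTO traffic_lights VALUES ('Samoa', 'Right');
""")


def switch_sides(new_direction):
    """Switch roads and lights in one transaction; roll everything back on error."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "UPDATE roads SET traffic_direction = ? WHERE country = 'Samoa'",
            (new_direction,),
        )
        conn.execute(
            "UPDATE traffic_lights SET traffic_direction_mode = ? WHERE country = 'Samoa'",
            (new_direction,),
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # the roads update is undone as well
        return False


def current_state():
    road = conn.execute("SELECT traffic_direction FROM roads").fetchone()[0]
    light = conn.execute("SELECT traffic_direction_mode FROM traffic_lights").fetchone()[0]
    return (road, light)


bad_attempt = switch_sides("Sideways")   # lights reject it, so roads must not change either
state_after_bad = current_state()
good_attempt = switch_sides("Left")
state_after_good = current_state()
print(bad_attempt, state_after_bad, good_attempt, state_after_good)
```

The first attempt updates the roads successfully and only fails at the traffic lights, yet after the rollback the roads are still 'Right'. Nobody ends up driving on the left under right-hand lights.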

Counterexample

An example of failed atomicity (outside I.T.). One word: Standards.

Say you want to create a universal standard for something (say, the Metric system). The main
purpose is for it to be the single standard that replaces all others. If you fail in your goal,
you've added to the problem!

Some more successful universal standards:

HTTP (over Gopher etc.) for almost all web traffic
Blu-ray (over HD DVD) for hi-def movie formats

But consider the Metric system. It's mostly successful because of its large adoption. But because
there are a few stragglers, it's not as successful as it could be. Translated into SQL:

-- fn_Convert is an imaginary scalar function; T-SQL needs the schema prefix to call it.
UPDATE EVERYTHING
SET Units = 'METRIC',
    Value = dbo.fn_Convert(Value, Units, 'METRIC');
-- no where clause!

This statement didn't update everything. The statement wasn't atomic and this continues to
cause problems. One problem that comes to mind is the failed Mars Climate Orbiter mission.

Let's be grateful for the all-or-nothing transactions we have in our databases!


ACID Properties By Example (And Counterexample) Part Two: Consistent

Consistency
A transaction reaching its normal end, thereby committing its results, preserves the consistency
of the database. In other words, each successful transaction by definition commits only legal
results.

Essentially, consistency means that database systems have to enforce business rules defined for
their databases.

But it's interesting. The word consistency (applied to database systems) isn't always used the
same way! For example, in Brewer's CAP theorem the C, standing for consistency, is defined as
"All clients have the same view of the data". (Really, computer scientists?? That's the word you
decide to overload with different meanings?) So if you ever hear someone say "eventually
consistent", they're using the consistency term from CAP, not the consistency term from ACID.

I guess C means something different for everyone.

Consistency in SQL Server

In my own words, consistency means that any defined checks and constraints are satisfied after
any transaction:

Columns only store values of a particular type (int columns store only ints, etc)
Primary keys and unique keys are unique
Check constraints are satisfied
Foreign key constraints are satisfied

Constraints Are Enforced One Record At A Time


You might notice something about these constraints: they can all be checked and validated by
looking at a single row. Check constraints enforce rules based only on the columns of a single
row. The one exception is that some constraints perform a singleton lookup in an index to
look for the existence of a row (for enforcing foreign keys and primary keys).

Multi-row constraints are not supported directly because it would be impractical to enforce
them efficiently. For example, there's no direct way to declare a constraint on an EMPLOYEE
table that would enforce the rule that the sum of employee salaries must not exceed a specific
amount.
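
A small sketch of the difference, again using Python and SQLite as a stand-in rather than SQL Server itself. The employee tables, the salary cap and the try_sql helper are all invented for the example. A rule about one row fits in a CHECK constraint; a rule about the sum over all rows needs a subquery, and SQLite refuses that inside a CHECK constraint (SQL Server likewise doesn't accept a subquery there).

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; names and limits are made up.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE employee (name TEXT PRIMARY KEY, "
    "salary INTEGER CHECK (salary BETWEEN 0 AND 200000))"
)


def try_sql(sql):
    """Run a statement and return 'ok' or the name of the error it raised."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.Error as exc:
        return type(exc).__name__


# A single-row rule: checked by looking at just the row being written.
single_row_ok = try_sql("INSERT INTO employee VALUES ('Ann', 90000)")
single_row_bad = try_sql("INSERT INTO employee VALUES ('Bob', 900000)")

# A multi-row rule: would have to look at every other row, so it can't be declared.
multi_row_rule = try_sql(
    "CREATE TABLE payroll (name TEXT, salary INTEGER, "
    "CHECK ((SELECT SUM(salary) FROM employee) <= 1000000))"
)

print(single_row_ok, single_row_bad, multi_row_rule)
```

The first insert is accepted, the second is rejected with an integrity error, and the table with the "sum of salaries" rule can't even be created.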

Consistency Enforced After Each Statement


Another interesting thing about SQL Server is that while ACID only requires the DBMS to
enforce consistency after a complete transaction, SQL Server will go further and enforce
consistency after every single statement inside a transaction. It might be nice to insert rows into
several tables in any order you wish. But if these rows reference each other with foreign keys,
you still have to be careful about the order you do the inserting, transaction or no transaction.
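
Here's a hedged sketch of that ordering problem. It runs on SQLite through Python, which happens to check ordinary foreign keys at the end of each statement, much as described above for SQL Server; it is not SQL Server itself. The department and employee tables and the insert_in_order helper are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; schema is invented.
conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit BEGIN/COMMIT below
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY);
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        department_id INTEGER NOT NULL REFERENCES department(id)
    );
""")


def insert_in_order(statements):
    """Run the statements inside one transaction, rolling back on the first error."""
    conn.execute("BEGIN")
    try:
        for sql in statements:
            conn.execute(sql)
        conn.execute("COMMIT")
        return "committed"
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")
        return "rolled back"


# Same two rows, same transaction boundaries; only the order differs.
child_first = insert_in_order([
    "INSERT INTO employee VALUES (1, 10)",
    "INSERT INTO department VALUES (10)",
])
parent_first = insert_in_order([
    "INSERT INTO department VALUES (10)",
    "INSERT INTO employee VALUES (1, 10)",
])
print(child_first, parent_first)  # rolled back committed
```

Both attempts would leave the database consistent at commit time, but the child-first version never gets that far because the check happens after the very first statement.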

Handling Inconsistencies
When SQL Server finds an inconsistency, it handles it in one of a few ways:

If a foreign key is defined with a cascading action, a change to one row can cascade to other rows.
If a value of one datatype is inserted into a column which is defined to hold a different
datatype, SQL Server may sometimes implicitly convert the value to the target datatype.
Most often, SQL Server gives up and throws an error, rolling back all effects of that statement.
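
All three reactions can be seen in one small sketch. As before this is Python with SQLite standing in for SQL Server, so the exact conversion rules differ (SQLite uses column "affinity" rather than SQL Server's implicit conversion rules), and the doctor and requisition tables are invented.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; tables and values are made up.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE doctor (id INTEGER PRIMARY KEY);
    CREATE TABLE requisition (
        id INTEGER PRIMARY KEY,
        doctor_id INTEGER REFERENCES doctor(id) ON DELETE CASCADE,
        sample_count INTEGER
    );
    INSERT INTO doctor VALUES (1);
""")

# 1. Implicit conversion: the text '3' is quietly stored as the integer 3.
conn.execute("INSERT INTO requisition VALUES (100, 1, '3')")
stored = conn.execute(
    "SELECT sample_count, typeof(sample_count) FROM requisition WHERE id = 100"
).fetchone()

# 2. Giving up: a requisition for a doctor who doesn't exist is rejected outright.
try:
    conn.execute("INSERT INTO requisition VALUES (101, 2, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# 3. Cascading: deleting the doctor also removes that doctor's requisitions.
conn.execute("DELETE FROM doctor WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM requisition").fetchone()[0]

print(stored, rejected, remaining)  # (3, 'integer') True 0
```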

Inconsistent Data Anyway

It also turns out that it's very easy to work around these constraints! (Besides the
all-too-common method of not defining constraints in the first place.) Primary keys, unique
constraints and datatype validation are always enforced, no getting around them. But you can get
around foreign keys and check constraints by

using WITH NOCHECK when creating a foreign key or a check constraint. You're basically saying,
"enforce any new or changing data, but don't bother looking at any existing data". These
constraints will then be marked as not trusted.
using the BULK INSERT statement (or other similar bulk operations) without
CHECK_CONSTRAINTS. In this case foreign keys and check constraints are ignored and marked as
not trusted.
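
SQLite has no WITH NOCHECK, so the sketch below is only a loose analogy, written in Python: foreign key enforcement is switched off while a bad row sneaks in and then switched back on. The effect is similar in spirit: new data is enforced, existing data is never looked at, and you need a separate check to find the damage. (On SQL Server, the is_not_trusted column of sys.foreign_keys and sys.check_constraints flags such constraints.) The parent and child tables are invented.

```python
import sqlite3

# Assumption: SQLite's enforcement switch is used as a rough analogue of WITH NOCHECK.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = OFF")  # enforcement explicitly off
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER REFERENCES parent(id));
""")

# With enforcement off, an orphan row goes in without complaint.
conn.execute("INSERT INTO child VALUES (1, 999)")

# Turning enforcement on checks new and changing data only, not what's already there.
conn.execute("PRAGMA foreign_keys = ON")
try:
    conn.execute("INSERT INTO child VALUES (2, 888)")
    new_orphan_rejected = False
except sqlite3.IntegrityError:
    new_orphan_rejected = True

# A separate integrity check is needed to find the existing bad row.
# Each result row starts with (table name, rowid of the offending row, ...).
orphans = [(row[0], row[1]) for row in conn.execute("PRAGMA foreign_key_check")]

print(new_orphan_rejected, orphans)  # True [('child', 1)]
```

The constraint is there and it's active, but it can't be trusted to describe the data that's actually in the table.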

Example

I'm taking the following example not from I.T., but from the world of medical labs.

When processing medical tests (at least in my part of the world), there's a whole set of rules that
medical professionals have to follow. Doctors and their staff have to fill in a requisition properly.
The specimen collection centre has to verify that information, take samples and pass everything
on to a lab. The lab that performs the tests ensures that everything is valid before performing the
test and sending back results to the doctor.

Just like a database transaction, the hope is that everything goes smoothly. All patient
information is entered properly. Patients and lab techs have followed all appropriate instructions.

Fixing Inconsistent Data: It sometimes happens that information is entered incorrectly or is
missing (like insurance info, or the date and time of the test). In these cases, the lab might
often call back for corrections before continuing with the test. This is similar to the case when
SQL Server recognizes that a statement will not leave the database in a consistent state. In some
cases, SQL Server can try to do something about it. For example it can do an implicit conversion
of a datatype, or it can cascade a delete/update.

Giving Up And Rolling Back: But sometimes a medical test can't be saved. For example,
sometimes a sample arrives clotted when it should have arrived unclotted (or vice versa). In these
cases, meaningful results aren't possible and the whole test has to be rejected and performed
again correctly. SQL Server will do this whenever it's necessary to maintain consistent data. It
will raise an error and the entire statement or transaction is undone (to be corrected and
performed again).

Counterexample

Well, this counterexample comes from the world of cheesy Science Fiction. Normally we want
our databases to store only consistent and legal data. Any illegal data should be rejected right
away. What we don't want is for our databases to get hung up on some crazy inconsistent data.

But if you're Captain Kirk and you need to deal with a rogue computer or robot that's acting up,
what do you do? Simple, confuse it with inconsistent information! Those robots won't know
what hit them.

This bit of dialog comes straight from an episode of Star Trek called "I, Mudd" (I'm not even
making this up, Google it!)

Kirk: Norman, Everything Harry tells you is a lie, remember that, everything Harry tells you is a
lie.
Harry Mudd: Listen to this carefully Norman: I am lying.
[Norman the android starts beeping, his light starts flashing and his ears start smoking]
Norman: You say you are lying but if everything you say is a lie then you are telling the truth
but you cannot tell the truth because everything you say is a lie but you lie, you tell the truth but
you cannot for you lie... Illogical! Illogical!
[more of the same, more smoke and kaboom]

Honest-to-God smoke from the ears! It's so classic it gets parodied a lot. (Here's one of my
favourites, a comic from Cyanide and Happiness).

I'm grateful that our databases don't choke on inconsistent data. They just throw an error and tell
clients "Here, you deal with it!"

ACID Properties By Example (And Counterexample) Part Three: Isolation

So the third ACID property of database transactions is I for Isolation. This is the only ACID
property that deals with behaviour of a transaction with respect to other concurrent transactions
(all the other properties describe the behaviour of single transactions). Haerder and Reuter
describe it as:

Isolation: Events within a transaction must be hidden from other transactions running
concurrently.

It's not super-rigorous, but I think of it like this: "No looking at works-in-progress".

So there are different kinds of database isolation, even with the guideline "no looking at
other transactions in progress". And now these levels of isolation are well defined. I wrote a
series on those earlier; the different levels are READ UNCOMMITTED, READ COMMITTED,
REPEATABLE READ and SERIALIZABLE. By the way, READ UNCOMMITTED is the only
isolation level here that is not really isolated; more on that later.

Isolation in SQL Server

SQL Server supports all of these isolation levels. It enforces this isolation using various locks on
data (fascinating stuff actually); processes will wait for each other in order to maintain isolation.
In contrast, Oracle supports only SERIALIZABLE and a kind of READ COMMITTED that is closer in
behaviour to SQL Server's SNAPSHOT isolation. No matter how it's implemented, READ COMMITTED
is the default isolation level in both SQL Server and Oracle.

Unisolated Transactions:

So it is possible for other transactions to see the effects of a transaction in-flight (i.e. as it's
happening, before it's committed). This is done with NOLOCK hints or with the READ
UNCOMMITTED isolation level. In fact, I learned recently that when using NOLOCK hints,
you not only can see the effects of an in-flight transaction, but you can see the effects of an
in-flight statement. This is an Isolation failure and it boils down to this: SQL Server transactions
are atomic, but when using NOLOCK, it might not seem that way. So take care.
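
For contrast, here's a rough sketch of what the isolated case looks like: a reader that can't see a writer's work-in-progress. It uses two SQLite connections from Python, not SQL Server, so it says nothing about NOLOCK itself (SQLite has no such hint and its locking works differently). The file name, the orders table and the reader_sees helper are all made up.

```python
import os
import sqlite3
import tempfile

# Assumption: two SQLite connections to one file stand in for two SQL Server sessions.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "isolation_demo.db")
    writer = sqlite3.connect(path, isolation_level=None)  # explicit BEGIN/COMMIT
    reader = sqlite3.connect(path, isolation_level=None)

    writer.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
    writer.execute("INSERT INTO orders VALUES (1, 'pending')")

    def reader_sees():
        """What the other session reads right now."""
        return reader.execute("SELECT status FROM orders WHERE id = 1").fetchall()[0][0]

    writer.execute("BEGIN")
    writer.execute("UPDATE orders SET status = 'shipped' WHERE id = 1")
    during = reader_sees()   # the update is still in flight
    writer.execute("COMMIT")
    after = reader_sees()    # now it's committed

    writer.close()
    reader.close()

print(during, after)  # pending shipped
```

The reader only ever sees 'pending' or 'shipped' as committed states, never a peek at the uncommitted change.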

Example
Today's example and counterexample both come from the newspaper headlines of Chicago.

For the example (a fictional example) I explain a situation that's all about not making
assumptions. It's all about being cautious and not committing to a decision while the jury's still
out. This immediately brought to mind a scene from the movie Chicago [spoiler alert!]:

The movie (and play) is about a court case. The main character, Roxie, is on trial for murder. It's
a sensational trial and the papers are eager to publish the result. They are so eager, in fact,
that they have printed two editions of their newspaper. One headline reads "She's Innocent", the
other reads "She's Guilty". But those two stacks of papers are just sitting there in the van. The
man in the newspaper van waits for a signal from the courthouse. Once he gets the proper signal,
he cracks open the innocent edition and gives copies to a paper boy to hand out.

It's about not acting on information while the jury is still out. The jury is isolated from the world
and no one can act on what the jury has to say until they've committed to a verdict.

Counterexample

Our counterexample comes from non-fiction. In reality, the assumptions we make tend to be
correct. Our assumptions are only interesting when they turn out to be incorrect. This
counterexample comes from the most incorrect newspaper headline I can think of:

Dewey Defeats Truman

See Wikipedia's page on this cool piece of newspaper history (Chicago newspaper
history). It's a great example of what can go wrong when we act on tentative (uncommitted)
information. The Chicago Tribune published the wrong presidential candidate as the winner.

But the really, really cautious reporters would report neither candidate as the winner. They'd be
waiting at the Electoral College convention. They'd be keen on seeing how that turns out.

ACID Properties By Example (And Counterexample) Part Four: Durable

The last ACID property is D, Durability. Again, Haerder and Reuter describe Durability:

Once a transaction has been completed and has committed its results to the database, the system
must guarantee that these results survive any subsequent malfunctions.

What does this mean exactly? That's a tall order for a database system! I mean, any malfunction
whatsoever? I'm pretty sure our database systems are designed to survive a power failure but I
don't expect that they could survive something as severe as the heat death of the universe.

Actually databases don't have to go that far. When designing a database system, only two kinds
of malfunctions are considered: media failure and system failure.

Media Failures

For media failure (e.g. a faulty hard drive) databases are recovered by using backups and
transaction logs. And this leads directly to three bits of super-common DBA advice:

Take backups regularly.
Keep your transaction logs and your main database files on different hard drives.
When dealing with a disk failure, step one is backing up the tail of the log.

System Failures

System failures (e.g. system crashes, power outages etc) have to be handled too.

SQL Server does it this way. When SQL Server is processing transactions, it will first write
changes to a transaction log and then write the associated changes to the database file. Always,
always in that order (there's a bit more to it, but that's the main part). It's called the
Write-Ahead Transaction Log.

But when there's a system malfunction, a few things need to be cleaned up the next time the
server restarts (to maintain atomicity and consistency). There may be transactions that were
interrupted and not yet committed. And some transactions may not have their changes written to
disk, or sometimes not written completely to disk. How do you recover from stuff like that?

Well, the database recovers from a failure like that during a startup process called
(unsurprisingly) recovery. It can look at these half-performed transactions and it can roll them
back using the info in the logs. Or alternatively it can roll forward and replay committed
transactions that haven't made it to disk, if the conditions are right and there's enough info in
the transaction log to do so. (Further Information at MCM Prep Video: Log File Internals and
Maintenance)
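
A very gentle imitation of that outcome, sketched in Python with a SQLite file rather than SQL Server: one transaction commits, a second one is abandoned when the connection goes away, and then the database is reopened. Closing a connection is nowhere near as brutal as pulling the power cord and SQLite's journal is not SQL Server's transaction log, so treat this only as an illustration of the result recovery is aiming for. The ledger table and file name are invented.

```python
import os
import sqlite3
import tempfile

# Assumption: closing a SQLite connection mid-transaction plays the role of a failure.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "durability_demo.db")

    conn = sqlite3.connect(path, isolation_level=None)  # explicit BEGIN/COMMIT
    conn.execute("CREATE TABLE ledger (id INTEGER PRIMARY KEY, note TEXT)")

    # Transaction 1: committed, so it must survive.
    conn.execute("BEGIN")
    conn.execute("INSERT INTO ledger VALUES (1, 'committed before the failure')")
    conn.execute("COMMIT")

    # Transaction 2: interrupted before COMMIT, so it must disappear.
    conn.execute("BEGIN")
    conn.execute("INSERT INTO ledger VALUES (2, 'still in flight')")
    conn.close()  # the connection goes away without committing

    # "Restart": open the file again and see what's there.
    reopened = sqlite3.connect(path)
    survivors = [row[0] for row in reopened.execute("SELECT id FROM ledger ORDER BY id")]
    reopened.close()

print(survivors)  # [1]
```

The committed row is there after the restart and the half-performed transaction has left no trace.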

So What Does This Mean To You?


If an ACID database system like SQL Server reports that your transaction has committed successfully,
then because it's durable, your transaction is truly persisted: you don't have to worry about buffer
flushes or power failures losing your work.

Example
So what is interestingly durable? Durability in database systems usually means that something is
redundant so that if one thing is lost, the transaction is not lost. So I give a list here of things that
are too redundant:

The Hydra's heads (Greek mythology)
Enchanted brooms from The Sorcerer's Apprentice
Autofac (an interesting short story by Philip K. Dick which I finished reading last night)

Counterexample

I have two examples and they both come from the career of Richard Harris (best known to my
family as the first Dumbledore). Did you know he was a one-hit wonder? He had a hit single in
1968 called "MacArthur Park". If you've never heard the song, skip this article and
experience the utter madness that is MacArthur Park. You won't regret it.

Back to the example. The singer of MacArthur Park would like to have his cake.
Unfortunately, it's been left out in the rain (malfunction). But that's okay, right? He could
always get out the recipe (transaction log) and make a new one, right? Wrong! He'll never
have that recipe again (durability fail). Had he persisted that recipe, the poor sucker would
still have his cake.

Bonus Richard Harris Counterexample

You may remember he played Emperor Marcus Aurelius in the movie Gladiator. (Spoiler
alert!) In that movie, he plans to make Maximus his heir instead of his son Commodus. He first
tells his plans to Maximus (who is reluctant to rule Rome) and then he tells Commodus, who
does not take the news well at all. In fact he murders his father after hearing it! The Emperor's
plans never make it to the public and so Commodus becomes Emperor.

You see, his plans to make Maximus his heir were not durable! Had the Emperor told a bunch
of other people first, then his intended heir Maximus would have ruled Rome as he wanted
(not to mention it would have removed the motive for his murder).

That's The Series

So that's it. I had fun with it. It gave me a chance to geek out. And even though blog post
series are a nice way of treating a topic in depth, I still found myself struggling to keep each
article to blog-post length. There's just so much to learn here. I guarantee I learned more
writing this series than a reader would reading it.
