Fabian Merki
merkisoft informatik
Long Running User Transactions in Database Systems
Table of Contents
1 Preface
2 Application
3 Concept overview
3.1 Long running user transactions
3.2 Master-master or multi master replication
4 Use Case
5 Evaluation
5.1 Oracle Workspace
5.1.1 Example
5.1.2 Conflict resolution
5.1.3 Conclusion of Oracle's Workspace
5.2 Daffodil Replicator
5.2.1 Testing Daffodil Replicator
5.2.2 Conclusion
5.3 Hibernate
5.3.1 What does Hibernate offer?
5.3.2 Replication modes
5.3.3 What is missing in Hibernate?
5.4 Others
5.4.1 Microsoft's SQL Server
5.4.2 Slony I / II
5.5 Conclusion
6 Design and implementation of replication with Hibernate
6.1 Methods of replication
6.2 Replication Framework
6.3 Algorithm
6.4 Additional replication table
6.5 Conflict resolution
6.6 Dependencies
6.7 Additional unique constraint vs. UUID
6.8 Transaction handling
7 Testing
8 Conclusion
9 References
1 Preface
The subject "Long Running User Transactions in Database Systems" was selected for this
assignment because of a requirement that arose from a Course Administration software
that was under development at that time. Refer to http://kursweb.merkisoft.ch for more
information on the Course Administration application.
This application had specific database replication requirements the likes of which I had not
encountered before. The assignment was a perfect opportunity to spend some time to
investigate what solutions are available on the market and then to test a solution in a real
world scenario.
I decided to write the document in English, because I feel the Course Administration
application is not the only software that could benefit from such a solution. Documenting
my findings in English on the web will open up the results to a broader spectrum of people
than if written in my native German. It also gave me the opportunity to practise writing
technical documentation in English which is a requirement in my current employment.
Acknowledgements
I would like to thank my lecturer Mr. Herbet Bitto for his help and support with the content
of this assignment, Mr. Steven Hawkes for reviewing the document and the SBB for offering
a comfortable environment where, due to time pressure, most of this assignment was
written. I found it a challenging experience to research and develop this solution
whilst commuting on a daily basis.
I hope you will enjoy reading this document.
Fabian Merki
I hereby confirm that everything in this assignment was created, written and drawn by
myself unless otherwise stated.
________________ __________________________
2 Application
The following diagram outlines a problem scenario for an application that uses multiple
databases. Customers subscribe for courses via the internet. The administrator manages
courses, subscriptions, teachers and additional data.
Figure 1
A local database was selected because the administrators have a slow internet connection
but require fast data access. The centralised database on the internet can be modified at
the same time as the local database. At some point the local and the central database
must be replicated. Because both databases are master databases, such a replication is
called master-master replication. The term master means that the database is updated by
a user and therefore becomes the master of the modified data. Multiple local databases
might exist since more than one administrator can manage the data. An organisational
process must be established by the administrators so that changes do not get overwritten
by others.
A further requirement of this architecture is that changes may be stored before they are
published. Such time-consuming changes by a user are called long-running user
transactions.
Because a course is not a single database record (actually it is a complex structure) the
replication process must replicate the whole database as a single entity – replication of
single tables would most likely fail because of the foreign key constraints that exist within
the database.
This document outlines how the previous requirements can be addressed using existing “off
the shelf” products together with custom software.
3 Concept overview
4 Use Case
Title          Database replication
Precondition   Local and remote databases exist.
               At least the local database is filled with data.
Description    For all tables which need to be replicated, the versions in both
               systems are checked and the corresponding action for each row is
               applied.
Postcondition  The local and the remote database are identical in terms of the data
               in the replicated tables.
               The versions of the replicated objects are stored.
Variations     Database download:
               1. clean / delete local tables
               2. start replication
Actors         Administrator
5 Evaluation
A major proportion of the available time for this assignment was spent locating and
evaluating existing products that offer solutions for the specified requirements. This
chapter provides detailed information on the key features of the products outlined previously.
It was found that some replication products only cover the simple case of master-slave
replication and therefore do not fit the requirements of this assignment. Others do have
master-master concepts, but most of them only work if the second instance of the database
is always available, and they cannot accommodate a dynamic set of master databases.
The evaluation below focuses on Oracle Workspace, Daffodil and Hibernate and provides an
insight into how these products can be used to solve the master-master replication problem,
covering issues such as merging and conflict resolution.
5.1.1 Example
Session 1 Session 2
execute DBMS_WM.EnableVersioning('emp');
execute DBMS_WM.CreateWorkspace('NEWWORKSPACE');
execute DBMS_WM.GotoWorkspace('NEWWORKSPACE');
execute DBMS_WM.MergeWorkspace('NEWWORKSPACE');
execute DBMS_WM.RefreshWorkspace('NEWWORKSPACE');
execute DBMS_WM.RemoveWorkspace('NEWWORKSPACE');
Example 1
execute DBMS_WM.MergeWorkspace('NEWWORKSPACE');
execute DBMS_WM.ResolveConflicts('NEWWORKSPACE',
'emp', 'empno>=0','child');
execute DBMS_WM.CommitResolve('NEWWORKSPACE');
execute DBMS_WM.MergeWorkspace('NEWWORKSPACE');
Example 2
This product supports bi-directional data replication by either capturing a data source
snapshot or by synchronizing the changes. It monitors for data changes in the tables and
synchronizes all data changes made by the subscriber and the publisher on a periodic
basis or on-demand by the subscriber. While synchronizing with one or more target data
source, Replicator uses pre-defined conflict resolution algorithms to resolve conflicts
between the publisher and subscriber. The publications and subscriptions are defined
using GUI or APIs on existing database servers.
Figure 2
Source: [RES-DAFFODIL]
The problem only occurs when a course row is deleted on the publisher side, and there is no
clear reason why.
A minor issue is that Daffodil Replicator adds triggers to the replicated tables. In some of
my tests the replication completed without actually performing any changes on the other side.
The reason for this was that the test cases drop and recreate tables for a clean test setup,
which also deleted the triggers. Tables should therefore never be dropped and recreated,
because Daffodil will not recreate the triggers.
5.2.2 Conclusion
It was quite easy to use the open source edition of Daffodil Replicator. Apart from the
problem with deletes, the replication process works very well. One useful feature is that
the smallest unit of the merge operation is a single cell and not an entire row.
The detection of the delete bug rendered this product in its current version unusable for my
application.
5.3 Hibernate
Hibernate is an open-source object-relational mapping framework for Java. It is able to
create a database schema to persist the Java objects and to query the database with
either SQL or HQL (Hibernate's object-oriented version of SQL). Model classes
have to be annotated with @Entity (optionally with @Table) and fields with @Id, @Basic,
@OneToMany etc. Alternatives to annotated classes exist but have not been considered in
this research.
The following code illustrates the usage of model classes with annotations from the
javax.persistence package:
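package model;

import java.util.ArrayList;
import java.util.List;
import javax.persistence.*;

@Entity
public class Student extends BaseEntity {
    // mandatory column
    @Column(nullable = false)
    private String name;

    // all operations on a student cascade to its address
    @ManyToOne(cascade = CascadeType.ALL)
    private Address address = new Address();

    // inverse side of Subscription.student; removing a student removes its subscriptions
    @OneToMany(targetEntity = Subscription.class,
               cascade = {CascadeType.REMOVE}, mappedBy = "student")
    private List<Subscription> subscription = new ArrayList<Subscription>();
    // [...]
}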
Apart from this basic database access, the Session also has support for replication. The
method replicate takes an object loaded from another database and persists it into the
current database. To keep key constraints intact, Hibernate preserves the primary key of the
replicated object even if a key generator is configured. This works best with the UUID key
generator: keys must be unique across all participating databases, and UUIDs are a
reasonable way to achieve that.
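The point about keys that are unique across databases can be illustrated with a small sketch. It uses the JDK class java.util.UUID rather than Hibernate's own UUID generator, and the class name UuidKeys is a hypothetical helper introduced only for this example.

```java
import java.util.UUID;

// Hypothetical helper: creates primary keys without any coordination
// between the local and the central database.
final class UuidKeys {

    /** Returns a random (version 4) UUID in its 36 character text form. */
    static String newId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        // Two "databases" create rows independently; a key collision is
        // practically impossible, so the rows can later be merged by key.
        String localStudentId = newId();
        String remoteStudentId = newId();
        System.out.println("local  key: " + localStudentId);
        System.out.println("remote key: " + remoteStudentId);
    }
}
```

With sequence or auto-increment keys, both databases would typically hand out the same next number, and the two new rows would be mistaken for the same record during replication.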
In contrast to Oracle's Workspace, Hibernate is able to manage the full object relationship
model and will replicate related objects or cascade deletes to child objects.
5.4 Others
5.4.2 Slony I / II
Slony-I is a "master to multiple slaves" replication system supporting cascading and
slave promotion.
[..]
But Slony-I, by only having a single origin for each set, is quite unsuitable for really
asynchronous multi-way replication. For those that could use some sort of
"asynchronous multi master replication with conflict resolution" akin to what is
provided by Lotus Notes™ or the "syncing" protocols found on Palm OS systems,
you will really need to look elsewhere. These sorts of replication models are not
without merit, but they represent different replication scenarios that Slony-I does not
attempt to address.
(Source: [RES-SLONY])
It looks as if Slony would not fit my requirements because it only provides “single origin for
each set”. Therefore I did not look more deeply into it.
Nevertheless the following paragraph states very well the issues of conflict resolution:
Some async multimaster systems try to resolve conflicts by finding ways to apply
partial record updates. For instance, with an address update, one user, on one
node, might update the phone number for an address, and another user might
update the street address, and the conflict resolution system might try to apply
these updates in a non-conflicting order.
Conflict resolution systems almost always require some domain knowledge of the
application being used.
It is absolutely true that domain-specific knowledge is needed and that a general conflict
resolution mechanism most likely does not exist.
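The partial record update described in the quotation can be sketched as a field-wise three-way merge. This is only an illustration under stated assumptions: it presumes that the last replicated state of a record (called base here) is available, and the class FieldMerge and all field names are hypothetical. It is not the mechanism of Slony-I or of any product evaluated above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of a field-wise three-way merge of one record.
final class FieldMerge {

    /**
     * Merges two edited copies of a record against their common base.
     * Fields changed on one side only are taken over. Fields changed
     * differently on both sides keep the base value and are reported
     * in conflicts. All three maps are assumed to have the same keys.
     */
    static Map<String, String> merge(Map<String, String> base,
                                     Map<String, String> mine,
                                     Map<String, String> theirs,
                                     List<String> conflicts) {
        Map<String, String> merged = new LinkedHashMap<>(base);
        for (String field : base.keySet()) {
            String b = base.get(field);
            String m = mine.get(field);
            String t = theirs.get(field);
            boolean mineChanged = !Objects.equals(b, m);
            boolean theirsChanged = !Objects.equals(b, t);
            if (mineChanged && theirsChanged && !Objects.equals(m, t)) {
                conflicts.add(field);      // needs domain knowledge to resolve
            } else if (mineChanged) {
                merged.put(field, m);
            } else if (theirsChanged) {
                merged.put(field, t);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> base = new LinkedHashMap<>();
        base.put("street", "Main St 1");
        base.put("phone", "111");
        Map<String, String> mine = new LinkedHashMap<>(base);
        mine.put("phone", "222");          // one user updates the phone number
        Map<String, String> theirs = new LinkedHashMap<>(base);
        theirs.put("street", "Side St 2"); // another user updates the street
        List<String> conflicts = new ArrayList<>();
        System.out.println(merge(base, mine, theirs, conflicts));
        System.out.println("conflicts: " + conflicts);
    }
}
```

The sketch only detects that two sides changed the same field. Deciding which value wins is exactly the part that requires knowledge of the application domain.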
5.5 Conclusion
Because the primary goal of this assignment is a solution to replicate multiple master
databases within Java applications, and no standard product fits the requirements exactly,
a custom solution must be developed. Hibernate was chosen as the basis for this solution.
The reasons for this choice:
● Successful proof of concept
● Database independent (JDBC)
● Free, open-source and 100% pure Java
● No additional server is required
● No additional database mapping required (the mapping was already created for the course
application, so there is no redundancy and fewer changes are needed when adding,
renaming or removing columns or tables)
Figure 3
6.3 Algorithm
After the cleanup, which is only performed if freshDownload is set to true, the order of the
classes to be processed is evaluated and stored in a list. Classes which do not reference
other model classes are at the start of this list, while the classes that reference the most
other classes are at the end of the list. The algorithm to generate this dependency graph
will be explained later in this document.
For each class in the list a replication object is created. In the construction phase the
following query is performed on both databases:
select x.id, (select r.replicatedVersion from ReplicationVersion r where
r.id=x.id and r.system=:SYSTEM), x.version from <tablename> x order by id
Now the results are simultaneously processed to determine if a row has to be inserted,
updated or deleted in one or the other database. Because both results are ordered by id,
the two result sets can be compared row by row in a single pass.
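A minimal sketch of such a simultaneous pass over two id-ordered results is shown below. The decision rules are assumptions made for illustration, not necessarily the rules of the framework: a row without a replicated version is treated as new, a row that is missing on one side but has a replicated version on the other is treated as deleted, and a row whose version differs from its replicated version is treated as changed. The names MergePass, Row and plan are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: one pass over two result lists that are both sorted by id.
final class MergePass {

    /** One result row: id, version at the last replication (null = never), current version. */
    static final class Row {
        final String id;
        final Integer replicated;
        final int version;

        Row(String id, Integer replicated, int version) {
            this.id = id;
            this.replicated = replicated;
            this.version = version;
        }

        boolean changed() {
            return replicated == null || version != replicated;
        }
    }

    /** Returns the actions needed to bring both sides in sync (assumed rules, see lead-in). */
    static List<String> plan(List<Row> local, List<Row> remote) {
        List<String> actions = new ArrayList<>();
        int l = 0;
        int r = 0;
        while (l < local.size() || r < remote.size()) {
            Row a = l < local.size() ? local.get(l) : null;
            Row b = r < remote.size() ? remote.get(r) : null;
            // The comparison must match the ordering used by "order by id".
            int cmp = a == null ? 1 : b == null ? -1 : a.id.compareTo(b.id);
            if (cmp < 0) {          // row exists only locally
                actions.add((a.replicated == null ? "insert into remote " : "delete from local ") + a.id);
                l++;
            } else if (cmp > 0) {   // row exists only remotely
                actions.add((b.replicated == null ? "insert into local " : "delete from remote ") + b.id);
                r++;
            } else {                // row exists on both sides
                if (a.changed() && b.changed()) {
                    actions.add("conflict " + a.id);
                } else if (a.changed()) {
                    actions.add("update remote " + a.id);
                } else if (b.changed()) {
                    actions.add("update local " + a.id);
                }
                l++;
                r++;
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        List<Row> local = Arrays.asList(new Row("a", 1, 1), new Row("b", 1, 2),
                new Row("c", null, 1), new Row("d", 1, 1), new Row("f", 1, 1));
        List<Row> remote = Arrays.asList(new Row("a", 1, 1), new Row("b", 1, 1),
                new Row("d", 1, 3), new Row("e", null, 1));
        for (String action : plan(local, remote)) {
            System.out.println(action);
        }
    }
}
```

Each row of either result is visited exactly once, so the cost of the pass grows linearly with the number of rows and no lookups by id are required.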
6.6 Dependencies
To be able to perform the replication, the relationships between entities must be evaluated.
No child record can be inserted unless the corresponding parent record exists.
The framework should determine how the classes, which are mapped to database tables
by using annotations, are related to each other. The following Java code performs a
dependency sorting. A list of classes which require replication are passed in. The method
returns a list of classes where classes without references to other classes are at the top of
the list followed by classes with references to already processed classes.
Figure 4
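// graph: class -> list of classes it still depends on
while (!graph.isEmpty()) {
    for (Iterator<Class> iterator = graph.keySet().iterator();
         iterator.hasNext();) {
        Class clazz = iterator.next();
        List<Class> list = graph.get(clazz);
        if (list.isEmpty()) {
            // no open dependencies left: the class can be replicated next
            classStack.add(clazz);
            iterator.remove();
            // drop the class from the dependency lists of the remaining classes
            for (List<Class> dependencies : graph.values()) {
                dependencies.remove(clazz);
            }
        }
    }
}
return classStack;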
The model is an adjacency list or, in an object-oriented manner, a map where the key is the
class and the value is the list of classes it depends on.
Classes which have no dependency are added one by one to the sorted list. When a class
is added because it has no more dependencies, it is removed from the dependency
lists of the other classes. Remark: this algorithm only works if no cycle exists, because
classes in a cycle always have a dependency on another class and the loop would never
terminate. Cyclic dependencies are very unlikely for a database model and are therefore
not considered in this solution.
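For illustration, the sorting described above can be written as a self-contained sketch. The class name DependencySort and the example graph are hypothetical. Unlike the solution described here, the sketch also guards against cycles by throwing an exception instead of looping forever.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the dependency sorting on an adjacency list.
final class DependencySort {

    /** Orders the keys of graph so that every node comes after the nodes it depends on. */
    static <T> List<T> sort(Map<T, List<T>> graph) {
        // Work on a copy so that the caller's map is not modified.
        Map<T, List<T>> open = new LinkedHashMap<>();
        for (Map.Entry<T, List<T>> entry : graph.entrySet()) {
            open.put(entry.getKey(), new ArrayList<>(entry.getValue()));
        }
        List<T> sorted = new ArrayList<>();
        while (!open.isEmpty()) {
            boolean progress = false;
            for (Iterator<Map.Entry<T, List<T>>> it = open.entrySet().iterator(); it.hasNext();) {
                Map.Entry<T, List<T>> entry = it.next();
                if (entry.getValue().isEmpty()) {
                    T free = entry.getKey();
                    sorted.add(free);
                    it.remove();
                    // The node is done: remove it from all remaining dependency lists.
                    for (List<T> dependencies : open.values()) {
                        dependencies.remove(free);
                    }
                    progress = true;
                }
            }
            if (!progress) {
                throw new IllegalStateException("cyclic dependency among " + open.keySet());
            }
        }
        return sorted;
    }

    public static void main(String[] args) {
        // Assumed example model: a subscription references a student and a course,
        // a student references a city.
        Map<String, List<String>> graph = new LinkedHashMap<>();
        graph.put("Subscription", Arrays.asList("Student", "Course"));
        graph.put("Student", Arrays.asList("City"));
        graph.put("City", new ArrayList<String>());
        graph.put("Course", new ArrayList<String>());
        System.out.println(sort(graph));
    }
}
```

The method is generic, so it can be called with Class objects as in the framework or with plain strings as in the example.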
7 Testing
JUnit, a widely used test framework for Java, was chosen to verify the functionality and
correctness of the replication framework. Writing the test cases alongside the code helped
to develop the software and additionally allowed quick regression testing.
A very simple data model was used for the tests. The following diagram shows the
relationship between the classes.
The following extract of the test class shows how the tests are written:
public void testSubscription() throws ParseException {
    Session local = DAO.getLocal().openSession();
    Session remote = DAO.getRemote().openSession();
    checkSubscription(0, 0, 0, 3, 0, 0);
    megaSubscriptionTest(local, remote);
    checkSubscription(1, 0, 0, 1, 1, 0);
    initLocalDatabase();
    megaSubscriptionTest(remote, local);
    checkSubscription(0, 1, 1, 0, 0, 1);
    checkSubscription(0, 0, 1, 0, 0, 0);
}
// [...]
    Replication[] rr = Replication.mergeAll(false);
    check(rr[0], Course.class, 0, 0, 0, 0);
    check(rr[1], City.class, 0, 0, 0, 0);
    check(rr[2], Student.class, 0, 0, dlStudent, drStudent);
    check(rr[3], Subscription.class, ulSubscription, urSubscription,
          dlSubscription, drSubscription);
}
The checkSubscription method replicates the databases and then checks whether the expected
number of records was updated. With this approach it is simple to write complex test
cases where both sides insert, update and delete records without writing much code.
Please see the code for full details of the test cases.
8 Conclusion
Depending on the requirements, many products are available to perform replication. In the
situation of the course application the Hibernate solution was a good choice (see chapter
5.5).
The complete course administration software, including a homepage and the replication
process described in this document, was successfully deployed to administer more than
700 children in three regions. In this production environment the software worked very
well. A few minor bugs were quickly fixed. The performance of the application was good. A
replication typically completed within 5-20 seconds. When more than 1000 records had to
be replicated, for example after a mass update, it took up to 5 minutes.
To address this performance issue, I undertook another project in parallel to this
assignment to develop a zipped tunnel solution. Early tests look promising and show
that a 2-5 fold reduction in communication load can be achieved. If successful, the
zipped tunnel will be integrated with the work of this assignment and used in the course
administration software.
9 References
[HIBERNATE]
http://www.hibernate.org
[JAVA]
http://java.sun.com
[ORACLE-WS]
http://www.oracle.com/technology/products/database/workspace_manager/index.html
[DAFFODIL]
http://www.daffodildb.com/replicator/dbreplicator.html
[RES-SLONY]
http://developer.postgresql.org/~wieck/slony1/adminguide-1.1.rc1/slonyintro.html
[RES-MSSQL]
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/replsql/repltypes_30z7.asp
[RES-DAFFODIL]
http://opensource.replicator.daffodilsw.com/what-is-replicator.html
[RES-ORACLE-WS]
http://www.adp-gmbh.ch/blog/2006/05/09.php
http://www.idevelopment.info/data/Oracle/DBA_tips/Workspace_Manager/WM_1.shtml