
Database Development Supporting Offline Update Using CRDT (Conflict-free Replicated Data Types)

Erick Chandra
School of Electrical Engineering and Informatics
Institut Teknologi Bandung
Bandung, Indonesia
erickchandra.1@gmail.com

Achmad Imam Kistijantoro
School of Electrical Engineering and Informatics
Institut Teknologi Bandung
Bandung, Indonesia
imam@informatika.org

Abstract—Databases are nowadays a crucial data store for applications. Popular existing databases include relational databases (SQL) and non-relational databases (NoSQL). However, both types have drawbacks that prevent them from being used in online and offline modes concurrently. Therefore, the authors develop a database that supports both modes, in particular availability in offline mode, without sacrificing consistency. The database integrates the CRDT (conflict-free replicated data types) algorithm to handle and resolve conflicts during synchronization. The tools and technologies used in its development include Apache Thrift for the middleware, LevelDB for the server database, and IndexedDB for the temporary client database. The purpose of this work is to build a CRDT database with a usable API for developer users. Finally, the database is tested with scenarios covering the essential operations: add, remove, and update.

Index Terms—availability, CRDT database, offline update, synchronization.

I. INTRODUCTION

A database is an organized collection of data [13]. Two kinds of database are popular nowadays: relational and non-relational. Relational databases [6] are essentially built on the ACID principle: atomicity, consistency, isolation, and durability [8]. Nonetheless, some of these properties cannot be fully satisfied in real-world applications. Relational databases are basically designed to run on a single server, so they scale relatively poorly. Later improvements produced distributed databases that rely on transactions, but this still does not solve the scalability issue.

On the other hand, non-relational databases follow the BASE principle (basically available, soft state, and eventual consistency). This kind of database is intentionally designed to be scalable, as opposed to relational databases. However, non-relational databases are heavily criticized for their consistency: they offer weaker consistency guarantees than relational databases do.

Generally, a database can only be used in online mode. Nonetheless, a connection to the database cannot always be guaranteed, since connections may be lost, especially for mobile clients. If offline mode is permitted between replicas, consistency and integrity issues arise and conflicts may appear. Currently, very few databases provide an offline mode. One example is Riak, whose replication and synchronization reside on the server. Whenever one of the server replicas fails to connect, it still responds to incoming requests, and once the connection is recovered, Riak performs synchronization.

Given these advantages for offline usage, the authors aim to develop a scalable yet consistency-safe database that can be used in both online and offline modes, with replicas persisted on the client side. To support this, an algorithm is needed to resolve the conflicts that may emerge. CRDT, which stands for conflict-free replicated data types, is exploited for its ability to handle such conflicts. The database is designed and equipped with an API for developer users.

II. RELATED WORKS

A. CAP Theorem

The CAP theorem, also known as Brewer's theorem, stands for consistency, availability, and partition tolerance. Brewer claims that it is impossible for a distributed system to guarantee all three characteristics at the same time: only two of the three can be satisfied, as a trade-off [4][5]. If a system provides consistency and partition tolerance, it must give up availability, and the same applies to the other two combinations. The illustration is shown in Fig 1.

978-1-5386-3001-3/17/$31.00 ©2017 IEEE


Fig 1. CAP Theorem Triangle (source: http://blog.flux7.com/blogs/nosql/cap-theorem-why-does-it-matter)

1) Consistency
Gilbert and Lynch define consistency as each server providing the correct response to each request [7]: after all write operations have completed, the system should return the value of the latest write. Brewer stated that consistency is equivalent to having a single up-to-date copy of the data [4][5].

2) Availability
A running node should respond immediately, and a fast response is preferred over a slow one. In real systems, a sufficiently slow response is viewed as equivalent to no response [4][5][7].

3) Partition Tolerance
The network is allowed to keep operating even when some nodes fail to send or receive messages. Messages may be delayed or lost forever [4][5].

B. Consistency Model

There are several types of consistency model, such as strict consistency, sequential consistency, causal consistency, strong consistency, eventual consistency, and strong eventual consistency. They are used in distributed computing to achieve or guarantee consistency.

1) Strong Consistency
Strong consistency (SC) is a consistency model used in concurrent programming, for instance in distributed shared memory and distributed transactions. A protocol supports SC if all accesses are seen by all parallel processes or nodes in the same sequential order. In databases, SC is used heavily in transactions, where it guarantees that the commits of one transaction are known by all nodes. In other words, a client will never receive an invalid or expired value. Strong consistency basically relies on consensus to decide the chosen value.

2) Eventual Consistency
Eventual consistency (EC) is used to achieve higher availability. EC ensures that, if no new updates are applied, all accesses to an item eventually return the same, newest value. A system is said to have reached eventual consistency once its replicas have converged or are converging [10]. In databases, EC is part of the BASE principle [11].

In eventual consistency, conflicts are solved by reconciliation instead of consensus. Once reconciliation is completed, all replicas should agree on the decided value. Various approaches can be applied to the reconciliation mechanism, including the 'last writer wins' principle [14].

3) Strong Eventual Consistency
Strong eventual consistency (SEC) builds on the properties of EC and adds a stronger guarantee: any replicas that have received and applied the same set of updates must be in equivalent states [3][12].

C. CRDT Algorithm

CRDT is short for conflict-free replicated data types. A CRDT is a data type designed to satisfy strong eventual consistency and monotonicity, so that no rollback is ever required. CRDTs are used to replicate data across many computers in a network and to execute updates without the replicas having to be synchronized at all times. Unlike plain eventual consistency, CRDTs are designed around SEC, so that conflicts mathematically never occur: replicas that receive and apply the same set of updates are guaranteed to reach equivalent states without synchronization. There are two types of CRDT: state-based and op-based [1][2][9].

1) State-based CRDT
State-based CRDTs, also called Convergent Replicated Data Types (CvRDTs), use a state-based object defined as a tuple $(S, s^0, q, u, m)$ [12]. Each replica process $p_i$ holds a payload $s_i \in S$, starting from the initial state $s^0$. Clients read the state of the object with the query method $q$ and modify it with the update method $u$. The method $m$ merges the state received from a remote replica, and the merge is designed so that replica states converge.

2) Operation-based CRDT
Operation-based (op-based) CRDTs are also called Commutative Replicated Data Types (CmRDTs). Unlike state-based CRDTs, op-based CRDTs involve no merge operation. Instead, each update is expressed as a pair $(t, u)$, where $t$ is the prepare-update method and $u$ is the effect-update method [12].

3) CRDT Variants Comparison
The state-based and op-based approaches are compared in TABLE I.

TABLE I. CRDT COMPARISON

| Comparison | State-based | Operation-based |
|---|---|---|
| Updating operation | Update and merge operations are distinct. | The update operation itself is replicated. |
| Update information | The state includes all previous updates, so no past information is kept separately. | Reconciliation is performed for non-commutative operations. |
| Information sent | The complete state is transmitted and merged on receipt. | Only the (smaller) operations are transmitted. |
| Size of transmission | Larger. | Smaller. |
| Concept and data type complexity | Relatively simple. | Relatively complex. |
| Usage | Previously used in NFS, Unison, and Dynamo. | Previously used in collaborative editing, Bayou, and PNUTS. |
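To make the state-based tuple $(S, s^0, q, u, m)$ concrete, the following is a minimal JavaScript sketch of a grow-only counter, a standard textbook state-based CRDT. It is an illustration only, not part of the authors' implementation, and the class name GCounter and its methods are hypothetical: the counts object plays the role of the payload $s_i$, increment is the update method $u$, value is the query method $q$, and merge is the merge method $m$. Because merge takes the entry-wise maximum, replicas that have seen the same updates end in equivalent states regardless of merge order, which is the SEC property described above.

```javascript
// Hypothetical state-based CRDT (CvRDT): a grow-only counter.
// Payload: one non-negative count per replica id; initial state s0: all zeros.
class GCounter {
  constructor(replicaId, replicaIds) {
    this.id = replicaId;
    this.counts = {};
    for (const r of replicaIds) this.counts[r] = 0; // s0
  }

  // Update method u: a replica only ever increments its own entry.
  increment(n = 1) {
    this.counts[this.id] += n;
  }

  // Query method q: the counter value is the sum of all entries.
  value() {
    return Object.values(this.counts).reduce((a, b) => a + b, 0);
  }

  // Merge method m: entry-wise maximum. It is commutative, associative,
  // and idempotent, so merge order and duplicate deliveries do not matter.
  merge(other) {
    for (const [r, c] of Object.entries(other.counts)) {
      this.counts[r] = Math.max(this.counts[r] ?? 0, c);
    }
  }
}

// Two replicas update concurrently while disconnected, then exchange states.
const ids = ["a", "b"];
const ra = new GCounter("a", ids);
const rb = new GCounter("b", ids);
ra.increment(2);
rb.increment(3);
ra.merge(rb);
rb.merge(ra);
console.log(ra.value(), rb.value()); // 5 5
```

Each merge ships the whole counts object, which is the larger transmission size listed for the state-based column in TABLE I.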
D. Example of CRDT Usage and Previous Works

Based on Shapiro's research, the portfolio of CRDT implementations includes the following [12].
1. Register
   a. Last-writer wins
   b. Multi-value
2. Set
   a. Grow-only
   b. 2P
   c. Observe-Remove
3. Map
   a. Set of Registers
4. Counter
   a. Unlimited
   b. Non-negative
5. Graph
   a. Directed
   b. Monotonic Directed Acyclic Graph
   c. Edit Graph
6. Sequence
   a. Edit Sequence

In previous research, Marc Shapiro et al. (2011) used a directed graph as a CRDT data type and applied it to a web crawler application. Shapiro had first proved mathematically that a directed graph is also a CRDT. The graph is composed of vertices and arcs; each arc connects two registered vertices and represents the relationship between them.

There are several alternatives for operation precedence in a directed graph:
1. Vertex removal takes precedence.
2. Vertex addition takes precedence.
3. A vertex removal is held pending until all concurrent vertex additions have been executed.

The third alternative may require synchronization, which violates the CRDT requirements of asynchrony and fault tolerance. The first alternative was selected, although no option is ideal; it was chosen simply because it best suited the algorithm scenario of their research. Eventually, Shapiro concluded that, using the CRDT concept, strong eventual consistency ensures that replicas converge.

III. SYSTEM DESIGN AND ARCHITECTURE

A. CRDT Database and Application

The database is designed and built using a CRDT algorithm to resolve conflicts between replicas. To demonstrate its functionality, a specific application is built: an address book web app. The address book allows contacts to be added, removed, and modified from multiple clients concurrently. The attributes of each contact are saved in a key-value store.

A database that permits both offline and online usage may face conflicts when merging or applying operations. These conflicts are resolved using CRDT, either state-based or op-based. Based on the CRDT comparison, op-based CRDT is chosen because only small operations need to be transmitted each time. If state-based CRDT were selected, a relatively large state would have to be sent once the address book holds a large number of records. State-based CRDT is better suited to applications that require the complete result of all replicas, whereas op-based CRDT is appropriate for active replicas that transmit individual operations [12].

Op-based CRDT requires all operations to be commutative. However, in a real implementation, concurrent addition and removal operations are no longer commutative. This case is handled with additional reconciliation and operation linearization, which make the operations commutative.

B. Model and Representation in CRDT Database

Every stored record needs an identifier and a versioning factor. The identifier is generated with the UUID (universally unique identifier) version 4 algorithm. UUIDv4 was chosen because it produces a random and practically unique value each time; the probability of generating the same ID twice is claimed to be very small. A UUID is written as 32 hexadecimal digits with 4 hyphens.

There are two common kinds of versioning: timestamps and UUIDs. With UUIDs, a record with a given UUID may only be added once; once it is removed, any incoming operations on it are invalid and have no effect. With timestamps, in contrast, a record can be modified or recreated at any time according to its timestamp. However, timestamps are inefficient when a replica has to perform rollbacks because removals or updates are cancelled.

C. Supporting Technology

The server, middleware, and client use the following languages and technologies.
1. Server: NodeJS with LevelDB as the database.
2. Middleware: Apache Thrift.
3. Client: HTML and JS, with IndexedDB as the temporary local database and AppCache to provide caching in offline mode.

Fig 2. IndexedDB Support for Desktop Browser (source: https://developer.mozilla.org/en/docs/Web/API/IndexedDB_API)
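The UUID-based versioning rule above can be illustrated with a short sketch. It is not the authors' code: the RecordReplica class and the shape of the operation objects are hypothetical, and Node's built-in crypto.randomUUID() stands in for the uuidV4 library used in the client. Removed ids are remembered as tombstones, so an add that arrives after, or concurrently with, a remove has no effect. This is one way, in the style of the 2P set from the portfolio above, to make concurrent add and remove operations commute.

```javascript
const { randomUUID } = require("crypto");

// Hypothetical replica state for the UUID-based record model:
// an id is added at most once, and a removed id is kept as a tombstone
// so any later or concurrent operation on it has no effect.
class RecordReplica {
  constructor() {
    this.live = new Map();    // id -> content
    this.removed = new Set(); // tombstoned ids
  }

  apply(op) {
    if (op.type === "ADD") {
      // Ignore duplicate adds and adds of ids that were already removed.
      if (!this.removed.has(op.id) && !this.live.has(op.id)) {
        this.live.set(op.id, op.content);
      }
    } else if (op.type === "REMOVE") {
      this.removed.add(op.id);
      this.live.delete(op.id);
    }
  }

  // Deterministic view of the live records, sorted by id.
  snapshot() {
    return [...this.live.entries()].sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  }
}

// 36-character v4 UUIDs: 32 hexadecimal digits plus 4 hyphens.
const id = randomUUID();
const add = { type: "ADD", id, content: "Alice" };
const remove = { type: "REMOVE", id };
const other = { type: "ADD", id: randomUUID(), content: "Bob" };

// Replicas rx and ry receive the same operations in different orders.
const rx = new RecordReplica();
const ry = new RecordReplica();
[add, other, remove].forEach((op) => rx.apply(op));
[remove, other, add].forEach((op) => ry.apply(op));

console.log(JSON.stringify(rx.snapshot()) === JSON.stringify(ry.snapshot())); // true
```

Both replicas end with only the second record, whatever the delivery order. The cost of this choice is that a removed id can never be reused, which is exactly the UUID versioning rule described above.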
NodeJS and JavaScript were chosen for the server and client to ease development. Apache Thrift was selected for its simplicity and its support for multiple languages. On the client side, IndexedDB is used because it is currently supported across many popular browsers. IndexedDB support per browser is shown in Fig 2 for desktop and Fig 3 for mobile.

Fig 3. IndexedDB Support for Mobile Browser (source: https://developer.mozilla.org/en/docs/Web/API/IndexedDB_API)

D. Address Book Application

The address book works in both offline and online modes and runs on the web platform. Examples of conflicts that may emerge during operation are as follows.
1. A new contact is added in replica $r_x$ and removed in $r_y$.
2. A new attribute of a contact is added in $r_x$ and removed in $r_y$.
3. An attribute of a contact is modified in both replicas $r_x$ and $r_y$.
The CRDT database already handles such conditions, so these conflicts are resolved when the application is built on it.

E. Architecture Design

The design of the application is depicted in Fig 4. Clients are treated as replicas that communicate with the server via the middleware. When a replica goes offline, for example because there is no network connection or the server network fails, each client works in a standalone manner until the connection is recovered, after which the system returns to normal operation.

IV. IMPLEMENTATION AND EVALUATION

A. Implementation

The implementation is separated into three modules: server, middleware, and client. The server and middleware modules run on the server to receive requests from clients, while the client module runs in the client-side browser.

1) Server Module
The server module consists of three submodules: CRDTDB, CRDTDB Log, and CRDTDB Log Counter.

a) CRDTDB
This module is the base of the API that developer users use as a library. It contains the operations connected to the server database, LevelDB. The available functions are a constructor, getOps, get, add, update, and remove, used as follows.

i. Constructor
This method is called first to create a new object. LevelDB generates an LDB file to store the data.

ii. GetOps
This function gives access to all operations that have not yet been retrieved from the server. It returns the operations from the index passed as a parameter up to the last counter on the server. This index corresponds to the log counter's index.

iii. Get
This function accesses the value stored in the database, using the identifier as the key.

iv. Add
Add inserts new data into the database under a new unique identifier as the key. The inserted data can then be retrieved with get and getOps.

v. Remove
Remove deletes the stored data identified by the key passed as a parameter. After this method is called, no later data may use the same key, and updates to this key are ignored forever. This ensures the consistency of the database.
vi. Update
This method modifies the value of the stored data identified by the key passed as an input parameter. Update is derived from the remove and add operations, so it retains their characteristics.
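The server-side CRDTDB API described above can be approximated with a small synchronous, in-memory sketch. The class CrdtDbSketch is hypothetical and is not the authors' module: a Map replaces LevelDB, and the log array plays the role of CRDTDB Log Counter. Two details are assumptions: getOps(lastIndex) returns the operations whose index is greater than the last one the caller retrieved, and update is modeled as overwriting the content of a live key while a removed key ignores updates, since the text only states that update is derived from remove and add.

```javascript
const { randomUUID } = require("crypto");

// Hypothetical in-memory stand-in for the server-side CRDTDB module.
class CrdtDbSketch {
  constructor() {
    this.data = new Map();    // id -> content (live records)
    this.removed = new Set(); // keys that may never be used again
    this.log = [];            // counter-indexed operations, index starts at 1
  }

  // Every applied operation receives the next counter value.
  append(operation, id, content) {
    this.log.push({ index: this.log.length + 1, operation, id, content });
  }

  add(content) {
    const id = randomUUID(); // new unique key for every insertion
    this.data.set(id, content);
    this.append("ADD", id, content);
    return id;
  }

  get(id) {
    return this.data.get(id); // undefined if absent or removed
  }

  remove(id) {
    if (this.removed.has(id)) return;
    this.removed.add(id);
    this.data.delete(id);
    this.append("DELETE", id);
  }

  // Assumed semantics: updates to a removed or unknown key are ignored.
  update(id, content) {
    if (this.removed.has(id) || !this.data.has(id)) return;
    this.data.set(id, content);
    this.append("UPDATE", id, content);
  }

  // Operations the caller has not retrieved yet.
  getOps(lastIndex) {
    return this.log.filter((e) => e.index > lastIndex);
  }
}

const db = new CrdtDbSketch();
const id = db.add("Alice");
db.update(id, "Alice B.");
db.remove(id);
db.update(id, "ignored"); // no effect: the key was removed
console.log(db.get(id)); // undefined
console.log(db.getOps(1).map((e) => e.operation)); // [ 'UPDATE', 'DELETE' ]
```

A LevelDB-backed version would persist both the records and the counter-indexed log instead of keeping them in memory.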
b) CRDTDB Log
This module logs every operation that has ever been applied to the database. CRDTDB Log does not mark the operations with indices; it serves only as a general log for backup and debugging, and the stored log can be used to analyze or diagnose abnormalities.
Fig 4. Application Architecture Design
c) CRDTDB Log Counter
This module is used by the CRDTDB module. It logs all operations with a counter marker so that, when getOps is called later, a partial range of operations can be retrieved. Each time an operation is added, the counter increases by one. CRDTDB Log Counter has a constructor and get, add, and remove methods.

2) Middleware Module
This module implements the Apache Thrift API for the web server. A Thrift file defines the data types and operations, and it is compiled to generate NodeJS and JavaScript files. The middleware receives incoming requests. The Thrift data types are as follows.

i. Operation (Enumeration)
Operation is an enumeration of ADD, UPDATE, and DELETE.

ii. Work
The Work data type consists of operation, id, and content. operation has the Operation enumeration type, id has type IdType, and content holds the value as a string. An optional ServerIdxType field is added for use by the getOps method, to reveal the index of the corresponding operation.

iii. IdType
IdType is a string that holds the generated UUID.

iv. ServerIdxType
This type holds the log counter index from the server.

The following methods are provided by the Apache Thrift protocol.

i. SyncOp
The client calls this method with the last retrieved index as its input parameter. SyncOp executes the getOps function of the CRDTDB module, and the output is converted and transmitted using the generated Thrift data type, Work.

ii. TransmitOperation
This method accepts a parameter of type Work. The given operation is processed according to its Operation enumeration value.

3) Client Module
The client API is provided in one JavaScript module, CRDTDB Client, which uses IndexedDB to store temporary local data in the web browser. The module uses the following libraries.

i. uuidV4
This library generates version 4 UUIDs. It contains functions to convert a buffer to a UUID and to generate random numbers.

ii. Thrift JavaScript generated library
This library is generated by compiling the Apache Thrift file.

The CRDTDB Client module provides the following functions.

a. Constructor
This is called when creating a new object. It creates a database whose name is given by the input parameter.

b. Open function
This function opens the connection to IndexedDB. If no object stores exist yet, four object stores are created: crdtdb, crdtdb-log, crdtdb-log-counter, and crdtdb-buffer.

c. Get function
This function accesses the value stored in the database under the given key identifier.

d. GetAllRequest function
This function retrieves all data from the database using an IndexedDB cursor.

e. SyncServer function
This function retrieves all remaining operations from the server. The log counter identifies the last retrieved operation.

f. SyncClient function
This function transmits all remaining operations stored in the buffer, which were created during offline mode.

g. Add function
This function is called for every data insertion, and a new identifier is generated for each new record. In offline mode, the operations are temporarily stored in crdtdb-buffer.

h. Remove function
This function deletes data identified by a given key. In offline mode, the operations are temporarily stored in the buffer.

i. Update function
This function modifies data identified by a given key. It is derived from the remove and add operations, and it is also buffered in offline mode.

4) Web Client Implementation
The web client implementation is demonstrated with the address book application. On the client side, the developer user's code uses the CRDTDB Client API module, and the API manages the communication with the server. The illustration is shown in Fig 5.

Fig 5. CRDTDB Client API Usage
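The offline flow of the client module (buffering in crdtdb-buffer, then SyncClient and SyncServer) can be sketched end to end in a single process. Everything here is hypothetical: FakeServer, ClientSketch, and applyWork are illustrative names, plain Maps and arrays replace the IndexedDB object stores, and direct method calls replace the generated Thrift client. The Work objects mirror the fields of the Thrift Work type (operation, id, content, plus serverIdx). The scenario follows SP4-4; the paper does not state that test's expected final state, so the remove-wins outcome shown here is an assumption derived from the Remove rule of the server module.

```javascript
const { randomUUID } = require("crypto");

// Hypothetical apply rule shared by server and client replicas:
// ADD takes effect once per id, DELETE is permanent (tombstone),
// UPDATE only changes a live id. Re-applying an operation is harmless.
function applyWork(store, removed, work) {
  const { operation, id, content } = work;
  if (operation === "DELETE") {
    removed.add(id);
    store.delete(id);
  } else if (!removed.has(id)) {
    if (operation === "ADD" && !store.has(id)) store.set(id, content);
    if (operation === "UPDATE" && store.has(id)) store.set(id, content);
  }
}

// Hypothetical in-process server exposing SyncOp and TransmitOperation as plain methods.
class FakeServer {
  constructor() {
    this.store = new Map();
    this.removed = new Set();
    this.log = []; // every received Work, tagged with a serverIdx
  }
  transmitOperation(work) {
    applyWork(this.store, this.removed, work);
    this.log.push({ ...work, serverIdx: this.log.length + 1 });
  }
  syncOp(lastIdx) {
    return this.log.filter((w) => w.serverIdx > lastIdx);
  }
}

// Hypothetical client replica; Maps and arrays stand in for IndexedDB stores.
class ClientSketch {
  constructor(server) {
    this.server = server;
    this.online = true;
    this.store = new Map();   // plays the role of crdtdb
    this.removed = new Set();
    this.buffer = [];         // plays the role of crdtdb-buffer
    this.lastIdx = 0;         // last serverIdx retrieved
  }
  submit(work) {
    applyWork(this.store, this.removed, work); // apply locally first
    if (this.online) this.server.transmitOperation(work);
    else this.buffer.push(work);
  }
  add(content) {
    const id = randomUUID();
    this.submit({ operation: "ADD", id, content });
    return id;
  }
  update(id, content) { this.submit({ operation: "UPDATE", id, content }); }
  remove(id) { this.submit({ operation: "DELETE", id }); }
  syncClient() { // push operations buffered while offline
    for (const work of this.buffer) this.server.transmitOperation(work);
    this.buffer = [];
  }
  syncServer() { // pull operations not retrieved yet
    for (const work of this.server.syncOp(this.lastIdx)) {
      applyWork(this.store, this.removed, work);
      this.lastIdx = work.serverIdx;
    }
  }
}

// SP4-4-style scenario: offline update on client A, online delete on client B.
const server = new FakeServer();
const a = new ClientSketch(server);
const b = new ClientSketch(server);
const id = a.add("Alice");
b.syncServer();
a.online = false;
a.update(id, "Alice B."); // buffered while offline
b.remove(id);             // online delete of the same contact
a.online = true;
a.syncClient();
a.syncServer();
b.syncServer();
console.log(a.store.has(id), b.store.has(id)); // false false
```

Because applying an operation twice has no further effect, a client can safely pull its own operations back from the server log during SyncServer.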
B. Evaluation

The evaluation tests the basic functionality of the implemented CRDT database, namely the add, update, and remove operations. Scenarios were created covering offline and online modes, with single or multiple concurrent client replicas. The demonstration uses the address book application built on the CRDT database. The test scenarios are listed in TABLE II, III, IV, and V.

TABLE II. ADD OPERATION TESTING SCENARIO

| No. | Description | Result |
|---|---|---|
| SP1-1 | Adding a new contact with a single client in online mode. | Passed |
| SP1-2 | Adding a new contact with a single client in offline mode. | Passed |
| SP1-3 | Adding a new contact with a single client in offline-online mode. | Passed |
| SP1-4 | Adding a new contact with multiple clients in online mode. | Passed |
| SP1-5 | Adding a new contact with multiple clients in offline mode. | Passed |
| SP1-6 | Adding a new contact with multiple clients in offline-online mode. | Passed |
| SP1-7 | Adding more than two new contacts in offline-online mode with the network suddenly cut. Delays are added to each operation. | Passed |

TABLE III. REMOVE OPERATION TESTING SCENARIO

| No. | Description | Result |
|---|---|---|
| SP2-1 | Deleting an existing contact with a single client in online mode. | Passed |
| SP2-2 | Deleting an existing contact with a single client in offline mode. | Passed |
| SP2-3 | Deleting an existing contact with a single client in offline-online mode. | Passed |
| SP2-4 | Deleting an existing contact with multiple clients in online mode. | Passed |
| SP2-5 | Deleting an existing contact with multiple clients in offline mode. | Passed |
| SP2-6 | Deleting an existing contact with multiple clients in offline-online mode. | Passed |
| SP2-7 | Deleting more than two existing contacts in offline-online mode with the network suddenly cut. Delays are added to each operation. | Passed |

TABLE IV. UPDATE OPERATION TESTING SCENARIO

| No. | Description | Result |
|---|---|---|
| SP3-1 | Updating an existing contact with a single client in online mode. | Passed |
| SP3-2 | Updating an existing contact with a single client in offline mode. | Passed |
| SP3-3 | Updating an existing contact with a single client in offline-online mode. | Passed |
| SP3-4 | Updating an existing contact with multiple clients in online mode. | Passed |
| SP3-5 | Updating an existing contact with multiple clients in offline mode. | Passed |
| SP3-6 | Updating an existing contact with multiple clients in offline-online mode. | Passed |
| SP3-7 | Updating more than two existing contacts in offline-online mode with the network suddenly cut. Delays are added to each operation. | Passed |

TABLE V. COMBINED OPERATION TESTING

| No. | Description | Result |
|---|---|---|
| SP4-1 | Adding a contact in one client and deleting the contact in the other client in online mode. | Passed |
| SP4-2 | Adding a contact in one client and updating the contact in the other client in online mode. | Passed |
| SP4-3 | Updating an existing contact with multiple clients concurrently in offline-online mode. | Passed |
| SP4-4 | Client A updates an existing contact in offline mode while client B deletes the same contact in online mode. Both clients are finally brought to online mode. | Passed |

V. CONCLUSION AND FUTURE WORKS

The conclusions from designing and implementing a CRDT database that supports offline updates are as follows.
1. A CRDT database using the op-based approach, supporting offline updates with client-side replicas, was successfully built on the web platform.
2. The implemented CRDT database achieves strong eventual consistency with data types that satisfy the CRDT requirements.
3. Applications such as an address book and score entry are well suited to a CRDT database.
4. Versioning with identifiers is more efficient than versioning with timestamps, since no rollbacks are needed.
5. In future work, the API implementation may be extended to mobile apps and other platforms.
6. More complex operations could be improved and added so that they satisfy the commutative property, for example by handling precedence so that add and multiply operations commute.

ACKNOWLEDGMENT

The authors would like to thank Riza Satria Perdana and Fitra Arifiansyah for their reviews during the completion of this paper. In particular, Erick Chandra would like to thank parents and family for their endless support and love; brother Muhamad Fikri Alhawarizmi for the motivation, spirit, care, and reviews; and teachers and friends.

REFERENCES

[1] Almeida, Paulo et al. 2015. Efficient State-Based CRDTs by Delta-Mutation. Springer International Publishing.
[2] Baquero, Carlos et al. 2014. Making Operation-Based CRDTs Operation-Based. Germany: Springer Berlin Heidelberg.
[3] Basho. 2016. Strong Consistency. http://docs.basho.com/riak/kv/2.2.0/learn/concepts/strong-consistency/
[4] Brewer, Eric. 2000. Towards Robust Distributed Systems. PODC Keynotes. USA: UC Berkeley. https://people.eecs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
[5] Brewer, Eric. 2012. CAP Twelve Years Later: How the "Rules" Have Changed. https://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed
[6] Codd, E. F. 1970. "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM. San Jose, CA: IBM Research Lab.
[7] Gilbert, Seth and Lynch, Nancy. 2002. "Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-tolerant Web Services". ACM SIGACT News, Volume 33, Issue 2, pages 51-59.
[8] Gray, Jim. 1981. The Transaction Concept: Virtues and Limitations. Proceedings of the 7th International Conference on Very Large Databases. Vallco Parkway, Cupertino: Tandem Computers.
[9] Letia, Mihai. 2010. Consistency without Concurrency Control in Large, Dynamic Systems. SIGOPS Oper. Syst. Rev. ACM: 29-34.
[10] Petersen, K. et al. 1997. "Flexible Update Propagation for Weakly Consistent Replication". ACM SIGOPS Operating Systems Review.
[11] Pritchett, D. 2008. "Base: An Acid Alternative." New York, USA: ACM.
[12] Shapiro, Marc et al. 2011. Conflict-free Replicated Data Types. https://hal.inria.fr/inria-00609399v1/document
[13] Ullman, Jeffrey et al. 2009. Database Systems: The Complete Book. USA: Pearson Prentice Hall.
[14] Vogels, W. 2009. "Eventually Consistent". Communications of the ACM.
