Documente Academic
Documente Profesional
Documente Cultură
Introduction
Context
Introduction
Elasticity. Scale at any moment. Reduce the need to have IT specialized staff
to implement reliable and scalable data management solutions. on data centers.
Introduction
Dependency on Internet connectivity. Dependency on cloud provider. Data security (privacy). Drawbacks and benets have to be carefully
balanced.
2011 Ricardo Vilaa
4
Amazon SimpleDB
Web service that allows to store and run Requires no schema, automatically indexes
data, and provides real time lookup. and read only applications.
5
Amazon SimpleDB
Data is organized into domains. Domains are collections of items that are
described by attribute-value pairs.
Amazon SimpleDB
CreateDomain DeleteDomain ListDomains PutAttibutes GetAttributes DeleteAttributes Select : =, !=, <, > <=, >=, starts-with, and, or,
not, intersection and union.
7
App Engine Datastore API provides a query Includes a data modeling API and a SQL-like
2011 Ricardo Vilaa
8
An entity has one or more properties and a unique key. The entity properties can be a reference to another entity. Every entity is of a particular kind, a group of entities that can be returned by a query. Datastore API allows to dene data models, and create instances of those models. Entities are organized into entity groups, sets of one or more entities that can be manipulated in a single transaction.
9
10
Microso&
SQL Azure database Windows Azure storage Blob Queue storage Table storage
2011 Ricardo Vilaa
11
Requirements
can be held but also in how required resources can be provisioned dynamically and incrementally. beginning to be elastic and highly available. partitions it is impossible to achieve both strong consistency and availability.
13
Rely on distributed systems designed from the The CAP theorem states that under network
2011 Ricardo Vilaa Semana da LEI 2011
Amazons Dynamo
Properties of both databases and distributed hash tables (DHTs) Building block of some of the Amazon Web Services, such as S3. Several distributed systems concepts in a production system. Hosted data management service that allows multiple applications to concurrently store and query data. Is a component of the Yahoo!s Sherpa, an integrated suite of data services.
14
Yahoos PNUTS
Used to store Googles App Engine Datastore Facebooks Cassandra Initially developed by Facebook and now an Apache
open source project.
DataDroplets
Data Model
Dynamo
Key 1 3 4
Value a b c
K (V C)
2011 Ricardo Vilaa
17
Data Model
PNUTS
Key 1 3 4
Attribute 1 a
Attribute 2
Attribute 3 b
b a d
K P(String V )
2011 Ricardo Vilaa
18
Data Model
BigTable
Column Family1 Key 1 Column1 a b c 3 d v1 v2 v3 v1 Column2 g g h h v1 v2 Family1:Column2 Column3 e f g u v1 v2 v3 v1
e e ff
v 1 v 2
v1 v2
e f
v1 v2
Data Model
Cassandra
Super Column Family1 Column Family1 Key 1 3 4 e Column1 g Column2 e u e Column Family2 Column3 x u v k Column4 v
Data Model
DataDroplets
Key 1 3 4
String (K (V PString))
2011 Ricardo Vilaa
21
API
Dynamo
get(key)
put(key, context,value)
Semana da LEI 2011
22
API
PNUTS
get-any(tableName,key), get-critical (tableName,key,version), get-latest(tableName,key) put(tableName,key,value) delete(tableName,key) test-and-set-put(tableName,key,value,version) scan(tableName,selections,projections) rangeScan(tableName, rangeSelection, projections) multiget(keys)
23
API
BigTable
API
Cassandra
API
DataDroplets
put(collection,key, value, tags) get(collection,key) delete(collection,key) multiPut(mapKeys) multiGet(keys) getByRange(min, max) getByTags(tags)
2011 Ricardo Vilaa
26
Architecture
Dynamo
Clients Request Routing
Partition
DHT
Replication
Storage
27
Architecture
PNUTS
Clients
Tablet Controller
Tablet Controller
Partition
Replication Storage
28
Architecture
BigTable
Clients Request Routing
Tablets Servers
Partition
GFS
Replication Storage
29
Architecture
Cassandra
Clients
Storage
30
Architecture
DataDroplets
Clients
Soft-state layer
DHT
Replication
Replication
31
Tradeos
Tradeos Dynamo PNUTS BigTable Cassandra DataDroplets Data Partition Consistency Multiple versions Replication
2011 Ricardo Vilaa
random
ordered
eventual
atomic
eventual
version
version
timestamp
timestamp
none
quorum
le system
quorum
sync or async
32
Conclusion
Each system offers a different data modeling Flexible data models with dynamic
attributes.
Conclusion
Increase complexity at the application logic. In most enterprises, it is hard to add this
complex layer to the application.
2011 Ricardo Vilaa
34
Conclusion
DataDroplets
Provides additional consistency guarantees Smooths the migration path for existing
applications.
Future Research
Open Issues
Automatic elasticity. Multi-tenancy. More complex operations. Storage of massive data in commodity
hardware.
2011 Ricardo Vilaa
36
Research projects