Apache Cassandra
An Overview
“Apache Cassandra is an open source, distributed,
decentralized, elastically scalable, highly available,
fault-tolerant, tuneably consistent, column-oriented
database that bases its distribution design on Amazon’s
Dynamo and its data model on Google’s Bigtable.”
Created at Facebook, it is now used at some of the most
[Figure: growth of the digital universe, from 161 EB in 2006 (about 322 million 500 GB drives) to 988 EB in 2010 (about 1.98 billion 500 GB drives): six-fold growth in four years]
Source: http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf
YouTube serves 200 million videos every day
Chevron accumulates 2 TB of data every day
Indian telecom collects 155 TB of call data per month, and growing
900,000 Android phones are provisioned by Google every day
By 2015 there will be 2.5 billion email accounts
By 2015 there will be 1 billion subscribers in the telecom sector in India
Will an RDBMS ever scale to these ever-growing volumes?
RDBMS - Structured and organized data
Structured query language (SQL)
Data and its relationships are stored in separate tables.
Data Manipulation Language, Data Definition Language
Tight Consistency
Specialized data structures (think B-trees)
Shines with complicated queries
Focus on fast query and analysis
Not necessarily on large datasets
Stands for Not Only SQL
No declarative query language (recently evolving)
No predefined schema
Key-value pair storage, column stores, document stores, graph databases
Eventual consistency rather than ACID properties
Unstructured and unpredictable data
Driven by the CAP theorem
Prioritizes high performance, high availability and scalability
Advantages
– High scalability
– Distributed computing
– Lower cost
– Schema flexibility, semi-structured data
– No complicated relationships
– Object-oriented programming that is easy to use and flexible
Disadvantages
– No standardization
– Limited query capabilities (so far)
– Eventual consistency is not intuitive to program for
Consistency:
– If we write data to one node and read it from another node in a distributed system, the read returns what was written on the first node.
Availability:
– Each node of the distributed system should respond to a query unless it dies.
Partition-Tolerance:
– The distributed system stays available and keeps operating seamlessly despite a partition (for example, nodes added or removed in different data centers) or message loss over the network.
CA
– To primarily support Consistency and Availability means that you’re likely using two-phase commit for distributed transactions. It means that the system will block when a network partition occurs, so it may be that your system is limited to a single data center cluster in an attempt to mitigate this. If your application needs only this level of scale, this is easy to manage and allows you to rely on familiar, simple structures.
CP
– To primarily support Consistency and Partition Tolerance, you may try to advance your architecture by setting up data shards in order to scale. Your data will be consistent, but you still run the risk of some data becoming unavailable if nodes fail.
AP
ACID
– Atomic
– Consistent
– Isolated
– Durable
– All of the above, but not scalable
BASE
– Basically Available
– Soft state
– Eventual consistency
– All of the above, but not strongly consistent
Amazon Dynamo
– Consistent hashing
– Partitioning
– Replication
– One-hop routing
Google BigTable
– Column Families
– Memtables
– SSTables
Horizontal scaling - commodity hardware, not specialized boxes
All nodes are identical
No master or single point of failure (SPOF)
Adding nodes is simple
Replication factor
– How many nodes the data is replicated on
Consistency level
– Zero, One, Quorum, All
Sync or async for writes
Reliability of reads
– Read repair
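
As a rough illustration of how a consistency level translates into a number of replica acknowledgements, here is a minimal Python sketch. The function name `replicas_required` is hypothetical and not part of any Cassandra API; the quorum rule used is the usual one, a strict majority of the replication factor.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Return how many replicas must acknowledge a request (illustrative only)."""
    if level == "ZERO":
        return 0                            # fire-and-forget: wait for nobody
    if level == "ONE":
        return 1                            # a single replica is enough
    if level == "QUORUM":
        return replication_factor // 2 + 1  # strict majority of the replicas
    if level == "ALL":
        return replication_factor           # every replica must answer
    raise ValueError(f"unknown consistency level: {level}")


for level in ("ZERO", "ONE", "QUORUM", "ALL"):
    print(level, replicas_required(level, replication_factor=3))
```

Higher levels wait for more replicas, so they trade latency and availability for stronger guarantees.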
Ring Topology
RF=3
Conceptual ring
One token per node
Multiple ranges per node
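
The ring placement can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Cassandra's implementation: the node names and token values are made up, the token space is 0 to 99, and the replicas for a key are taken to be the owning node plus the next RF - 1 nodes clockwise. A real partitioner would derive the token by hashing the row key.

```python
import bisect


class Ring:
    """Toy token ring: each node owns the range ending at its token."""

    def __init__(self, node_tokens: dict, rf: int):
        self.rf = rf
        self.entries = sorted((tok, node) for node, tok in node_tokens.items())
        self.tokens = [tok for tok, _ in self.entries]

    def replicas(self, key_token: int) -> list:
        # First node whose token is >= the key's token, wrapping around the ring.
        start = bisect.bisect_left(self.tokens, key_token) % len(self.entries)
        count = min(self.rf, len(self.entries))
        # The owner plus the following nodes clockwise hold the copies.
        return [self.entries[(start + i) % len(self.entries)][1]
                for i in range(count)]


ring = Ring({"a": 0, "d": 25, "g": 50, "j": 75}, rf=3)
print(ring.replicas(30))  # owner g, then j and a
print(ring.replicas(80))  # wraps past the highest token back to a
```

With RF greater than 1, every node stores its own range plus the ranges of its predecessors, which is why each node ends up holding multiple ranges.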
Copyright © 2013 Tech Mahindra. All rights reserved. 15
Ring Topology
RF=2
Conceptual ring
One token per node
Multiple ranges per node
New Node
RF=3
Token assignment
Range adjustment
Bootstrap
Arrival only affects immediate neighbors
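
To make the "only immediate neighbors" point concrete, the sketch below compares primary ownership before and after a node joins. It assumes a toy token space of 0 to 99 and made-up token values, with the new node `m` given token 60; replica sets (RF > 1) are ignored here, so only primary ranges are compared.

```python
import bisect


def owner(tokens: dict, key_token: int) -> str:
    """Primary owner: the first node whose token is >= the key's token."""
    ring = sorted((tok, node) for node, tok in tokens.items())
    idx = bisect.bisect_left([tok for tok, _ in ring], key_token) % len(ring)
    return ring[idx][1]


before = {"a": 0, "d": 25, "g": 50, "j": 75}
after = {**before, "m": 60}  # hypothetical new node placed between g and j

# Key tokens whose primary owner changes when m joins.
moved = {k for k in range(100) if owner(before, k) != owner(after, k)}

print(sorted(moved))
print({owner(before, k) for k in moved}, "->", {owner(after, k) for k in moved})
```

Only the slice of the ring between `g` and the new token changes hands, and all of it comes from a single neighbor; every other node keeps its primary range untouched.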
Ring Partition
RF=3
Node dies
Available?
Hinted handoff
Plan for this
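
Hinted handoff can be sketched as a small in-memory simulation. This is an illustrative model only, assuming the coordinator keeps a hint for every replica that is down and replays it once that replica is reachable again; the `Node`, `write` and `replay_hints` names are hypothetical.

```python
class Node:
    def __init__(self, name: str):
        self.name = name
        self.up = True
        self.data = {}
        self.hints = []  # (target node, key, value) held for unreachable replicas


def write(coordinator: Node, replicas: list, key: str, value: str) -> int:
    """Write to every live replica; store a hint for each replica that is down."""
    acks = 0
    for replica in replicas:
        if replica.up:
            replica.data[key] = value
            acks += 1
        else:
            coordinator.hints.append((replica, key, value))
    return acks


def replay_hints(coordinator: Node) -> None:
    """Deliver stored hints to replicas that have come back; keep the rest."""
    pending = []
    for target, key, value in coordinator.hints:
        if target.up:
            target.data[key] = value
        else:
            pending.append((target, key, value))
    coordinator.hints = pending


a, d, g = Node("a"), Node("d"), Node("g")
d.up = False                            # node d dies
acks = write(a, [a, d, g], "k", "v")    # two replicas acknowledge, one hint stored
d.up = True                             # node d returns
replay_hints(a)                         # d receives the write it missed
```

Whether the original write counts as successful depends on the consistency level requested: with RF=3 and QUORUM, the two acknowledgements here would be enough.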
Schema-free Sparse-table
Flexible column naming
You define the sort order
Not required to have a specific column just because another row does
The Apache Cassandra data model has 4 main concepts
– Cluster
– Keyspace
– Column Family
A column family contains multiple columns referenced by a row key
– Super Column Family
Cassandra is meant to run on a cluster
Although Cassandra can run stand-alone, that defeats the purpose it is built for
The cluster is arranged as a ring of nodes
Clients send read/write requests to any node in the ring
That node takes on the role of coordinator node and forwards the request to the node responsible for servicing it.
A partitioner decides which nodes store which rows.
The cluster is the container for keyspaces
A keyspace is a namespace to group multiple column families, typically one per application. The keyspace is the outermost container for data in Cassandra
The basic attributes that you can set per keyspace are
– Replication factor
Refers to the number of nodes that will hold a copy of each row
– Replica placement strategy
A column family is roughly analogous to a table in the relational model
It is a container for a collection of rows
Each row can have a different set of columns
A column family can be one of two types
– Static Column Family
  – Static set of columns
– Dynamic Column Family
  – Can use application-supplied column names to store data
The column is the smallest increment of data in Cassandra.
It is a tuple containing a name, a value and a timestamp.
A column must have a name, and the name can be a static label (such as “name” or “email”) or it can be dynamically set when the column is created by your application
A Cassandra column family can contain either regular columns or super columns, which add another level of nesting to the regular column family structure.
Super columns comprise a (super) column name and an ordered map of sub-columns.
A comparator can be specified on both the super column name and the sub-column names
• Keyspace
  • ColumnFamily
    • Row (indexed)
      • Key
      • Columns
        Name (sorted)
        Value
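
The hierarchy above can be mirrored with nested dictionaries. This is a minimal sketch to show the shape of the data, not how Cassandra stores it; the `Column` class, the `insert` and `get_row` helpers, the `Users` column family and the sample values are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int  # supplied by the caller in this sketch


# keyspace -> column family -> row key -> {column name: Column}
keyspace = {"Users": {}}


def insert(column_family: str, row_key: str, column: Column) -> None:
    row = keyspace[column_family].setdefault(row_key, {})
    row[column.name] = column


def get_row(column_family: str, row_key: str) -> list:
    """Return the row's columns ordered by column name."""
    row = keyspace[column_family].get(row_key, {})
    return [row[name] for name in sorted(row)]


insert("Users", "alice", Column("name", "Alice", 1))
insert("Users", "alice", Column("email", "alice@example.com", 1))
insert("Users", "bob", Column("name", "Bob", 2))  # no email: rows are sparse

print([c.name for c in get_row("Users", "alice")])
print([c.name for c in get_row("Users", "bob")])
```

Rows in the same column family do not need the same columns, and columns within a row come back sorted by name.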
[Figure: a single column]
[Figure: a single row]
(Business) Key -> Value
(twitter.com) tweet id -> information about tweet
(kayak.com) Flight number -> information about flight, e.g., availability
(yourbank.com) Account number -> information about it
(amazon.com) item number -> information about it
Yes
Relational databases (RDBMSs) have been around for ages
Data stored in tables
Schema-based, i.e., structured tables
Queried using SQL
(client-specified)
Can have index tables
Hence “column-oriented databases”/“NoSQL”
No schemas
Some columns missing from some entries
“Not Only SQL”
Supports get(key) and put(key, value) operations
Often write-heavy workloads
CAP Theorem
– Consistency
– Availability
– Partition Tolerance
Choose two
– Cassandra chooses A and P
Give up a little A and P to get more C
Ratchet up the consistency level
R + W > N gives strong consistency (R = replicas read, W = replicas written, N = replication factor)
More to come
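
The R + W > N rule can be checked by brute force: a read is guaranteed to see the latest write only if every possible set of R replicas shares at least one node with every possible set of W replicas. The sketch below enumerates those sets for a small N; the function name `overlap_guaranteed` is hypothetical.

```python
from itertools import combinations


def overlap_guaranteed(r: int, w: int, n: int) -> bool:
    """True if every read set of size r intersects every write set of size w."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


n = 3
for r in range(1, n + 1):
    for w in range(1, n + 1):
        # The exhaustive check agrees with the simple inequality.
        print(f"R={r} W={w} N={n}: overlap={overlap_guaranteed(r, w, n)} "
              f"rule={r + w > n}")
```

For example, with N=3, quorum reads and quorum writes (R=2, W=2) always overlap, while R=1 and W=1 do not.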
Simple: put(key, col, value)
Complex: put(key, [col:value, …, col:value])
Batch: multiple keys
Configurable fsync
Sequential writes only
Memtable – no disk access (no reads or seeks)
SSTables are final (become read-only)
Indexes
Bloom filter
Raw data
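
The write path can be approximated with a toy model: each write is appended to a sequential log and applied to an in-memory memtable, and a full memtable is flushed as an immutable, sorted SSTable. This sketch is an assumption-laden simplification (everything lives in Python lists and dicts, and the `WritePath` class and `memtable_limit` parameter are hypothetical); it omits the index and Bloom filter that accompany real SSTables.

```python
class WritePath:
    def __init__(self, memtable_limit: int = 2):
        self.commit_log = []   # append-only, stands in for sequential disk writes
        self.memtable = {}     # in-memory, no disk reads or seeks on write
        self.sstables = []     # each flushed table is an immutable sorted tuple
        self.memtable_limit = memtable_limit

    def put(self, key: str, col: str, value: str) -> None:
        self.commit_log.append((key, col, value))  # sequential append first
        self.memtable[(key, col)] = value           # then the in-memory update
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Write the memtable out sorted by (key, column); never modified afterwards.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}


wp = WritePath(memtable_limit=2)
wp.put("b", "name", "Bob")
wp.put("a", "name", "Alice")   # memtable reaches the limit and is flushed
wp.put("c", "name", "Carol")   # lands in a fresh memtable

print(wp.sstables)
print(wp.memtable)
```

Because writes never update data in place, flushed tables accumulate over time, which is the SSTable proliferation that affects reads.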
You need a key or keys:
Single: key=‘a’
Range: key=‘a’ through ‘f’
And columns to retrieve:
Slice: cols={bar through kite}
By name: key=‘b’ cols={bar, cat, llama}
Nothing like SQL “WHERE col=‘faz’”
But secondary indices are being worked on
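
The two ways of selecting columns can be illustrated on a single row held as a Python dict. This is a sketch of the idea only: the helper names `slice_columns` and `columns_by_name` and the sample row are hypothetical, and the slice is treated as inclusive at both ends.

```python
import bisect

# One row: column name -> value (sample data for illustration).
row = {"ant": 1, "bar": 2, "cat": 3, "kite": 4, "llama": 5}


def slice_columns(row: dict, start: str, finish: str) -> dict:
    """Return columns whose names fall in [start, finish] in sorted order."""
    names = sorted(row)
    lo = bisect.bisect_left(names, start)
    hi = bisect.bisect_right(names, finish)
    return {name: row[name] for name in names[lo:hi]}


def columns_by_name(row: dict, wanted: list) -> dict:
    """Return only the named columns that exist in the row."""
    return {name: row[name] for name in wanted if name in row}


print(slice_columns(row, "bar", "kite"))
print(columns_by_name(row, ["bar", "cat", "llama"]))
```

Both lookups work on column names, which are kept sorted; neither can filter on a column's value, which is the gap secondary indices are meant to fill.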
Practically lock free
SSTable proliferation
New in 0.6:
Row cache (avoid SSTable lookup, not write-through)
Key cache (avoid index scan)
– Range queries
• Provisioning
– Virtual or bare metal
– Cluster size
• Data model
– Think in terms of access
– Giving up transactions, ad-hoc queries, arbitrary indexes and joins
• (you may already do this with an RDBMS!)
Wide rows
Data life-span
Cluster planning
Bootstrapping
Vector clocks (server-side conflict resolution)
Alter keyspace/column families on a live cluster
Compression
Multi-tenant features
Fewer memory restrictions
Use Cassandra if you want/need
– High write throughput
– Near-linear scalability
– Automated replication/fault tolerance
and you can tolerate missing RDBMS features