Depends on:
The size and shape of your data
Doing work in batches (batch puts and gets)
Numbers everyone should know
Limitations:
Only one ID or key_name per Entity
Cannot change ID or key_name later
key_name length is limited to 500 bytes
Tools for storing data: Transactions
ACID transactions
Atomicity, Consistency, Isolation, Durability
No queries in transactions
Transactional read and write with Get() and Put()
Common practice
Query, find what you need
Transact with Get() and Put()
Hierarchical
Each Entity may have a parent
A "root" node defines an Entity group
Hierarchy of child Entities can go many levels deep
Watch out! Serialized writes for all children of the root
Datastore scales wide
Each Entity group has serialized writes
No limit to the number of Entity groups to use in parallel
Think of it as many independent hierarchies of data
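The "many independent hierarchies" idea can be pictured with a toy model (this is plain Python, not the App Engine API): a Datastore key is a path of (kind, name) pairs, and the root of that path identifies the Entity group whose writes are serialized together.

```python
def entity_group(key_path):
    """Return the root element of a key path -- the Entity group."""
    return key_path[0]

# Two independent hierarchies, each rooted at its own Blog entity.
blog_a = (('Blog', 'alice'),)
post_1 = (('Blog', 'alice'), ('BlogEntry', 'post1'))
blog_b = (('Blog', 'bob'),)
post_2 = (('Blog', 'bob'), ('BlogEntry', 'post1'))

# post_1 shares a root (and serialized writes) with blog_a, while
# blog_b's hierarchy is a separate group that commits in parallel.
```

Writes inside one group contend with each other; writes in different groups do not, which is why spreading data across many groups lets the Datastore scale wide.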
Tools for storing data: Entity groups
Pitfalls
Large Entity groups = high contention = failed transactions
Failing to plan for write throughput leads to contention
Structure your data to match your usage patterns
Good news
Query across entity groups without serialized access!
Consistent view across all entity groups
No partial commits visible
All Entities in a group are the latest committed version
Example: Counters
Counters
Using Model.count()
Bigtable doesn't know counts by design
O(N); cannot be O(1); must scan every Entity row!
Use an Entity with a count property:
class Counter(db.Model):
    count = db.IntegerProperty()
Frequent updates = high contention!
Transactional writes are serialized and too slow
Fundamental limitation of distributed systems
Counters: Before and after
Single → Sharded
class CounterConfig(Model):
    name = StringProperty(required=True)
    num_shards = IntegerProperty(required=True, default=1)

class Counter(Model):
    name = StringProperty(required=True)
    count = IntegerProperty(required=True, default=0)
Counters: Get the count
def get_count(name):
    total = 0
    for counter in Counter.gql('WHERE name = :1', name):
        total += counter.count
    return total
Counters: Increment the count
def increment(name):
    config = CounterConfig.get_or_insert(name, name=name)
    def txn():
        index = random.randint(0, config.num_shards - 1)
        shard_name = name + str(index)
        counter = Counter.get_by_key_name(shard_name)
        if counter is None:
            counter = Counter(key_name=shard_name, name=name)
        counter.count += 1
        counter.put()
    db.run_in_transaction(txn)
Counters: Cache reads
def get_count(name):
    total = memcache.get(name)
    if total is None:
        total = 0
        for counter in Counter.gql('WHERE name = :1', name):
            total += counter.count
        memcache.add(name, str(total), 60)
    return total
Counters: Cache writes
def increment(name):
    config = CounterConfig.get_or_insert(name, name=name)
    def txn():
        index = random.randint(0, config.num_shards - 1)
        shard_name = name + str(index)
        counter = Counter.get_by_key_name(shard_name)
        if counter is None:
            counter = Counter(key_name=shard_name, name=name)
        counter.count += 1
        counter.put()
    db.run_in_transaction(txn)
    memcache.incr(name)
Example: Building a Blog
Building a Blog
Standard blog
Multiple blog posts
Each post has comments
Efficient paging without using queries with offsets
Remember, Bigtable doesn't know counts!
Building a Blog: Blog entries
class GlobalIndex(db.Model):
    max_index = db.IntegerProperty(required=True, default=0)

class BlogEntry(db.Model):
    index = db.IntegerProperty(required=True)
    title = db.StringProperty(required=True)
    body = db.TextProperty(required=True)
Building a Blog: Posting an entry
Hierarchy of Entities:
Blog
Index
Entry
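The slide gives only the hierarchy, so here is a sketch of the posting flow with the Datastore replaced by a plain dict so the bookkeeping is visible. In the real app this body would run inside db.run_in_transaction, with the GlobalIndex and the new BlogEntry in one Entity group so the increment and the insert commit atomically; the dict and key strings here are illustrative only.

```python
# In-memory stand-in for the Datastore; one GlobalIndex row per blog.
datastore = {'GlobalIndex': {'max_index': 0}}

def post_entry(title, body):
    index_row = datastore['GlobalIndex']   # transactional Get()
    index_row['max_index'] += 1            # assign the next index
    entry = {'index': index_row['max_index'],
             'title': title,
             'body': body}
    datastore['BlogEntry:%d' % entry['index']] = entry  # transactional Put()
    return entry

post_entry('Hello', 'First post')
post_entry('Again', 'Second post')
```

Because every entry gets a dense, monotonically increasing index, pages can be fetched with `WHERE index <= :1` instead of query offsets.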
Building a Blog: Getting one page of entries
def get_entries(start_index):
    extra = None
    if start_index is None:
        entries = BlogEntry.gql(
            'ORDER BY index DESC').fetch(POSTS_PER_PAGE + 1)
    else:
        start_index = int(start_index)
        entries = BlogEntry.gql(
            'WHERE index <= :1 ORDER BY index DESC',
            start_index).fetch(POSTS_PER_PAGE + 1)
    if len(entries) > POSTS_PER_PAGE:
        extra = entries[-1]
        entries = entries[:POSTS_PER_PAGE]
    return entries, extra
Building a Blog: Comments
High write throughput
Can't use a shared index
Would like to order by post date
Post dates aren't unique, so we can't use them to page:
2008-05-26 22:11:04.1000 Before
2008-05-26 22:11:04.1234 My post
2008-05-26 22:11:04.1234 This is another post
2008-05-26 22:11:04.1234 And one more post
2008-05-26 22:11:04.1234 The last post
2008-05-26 22:11:04.2000 After
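One way out, sketched below in plain Python, is a composite property: pack the post date together with a unique key into a single sortable string, so ties on the timestamp break deterministically and the combined value can serve as a paging bookmark. The `composite_order` helper and the `|` separator are illustrative choices, not App Engine API.

```python
import datetime

def composite_order(posted, key_name):
    # ISO-8601 timestamps sort lexicographically; appending the
    # unique key name makes the whole value unique.
    return '%s|%s' % (posted.isoformat(), key_name)

# Three comments, two sharing the exact same post date.
a = composite_order(
    datetime.datetime(2008, 5, 26, 22, 11, 4, 1234), 'comment1')
b = composite_order(
    datetime.datetime(2008, 5, 26, 22, 11, 4, 1234), 'comment2')
c = composite_order(
    datetime.datetime(2008, 5, 26, 22, 11, 4, 2000), 'comment0')
```

Stored as a single indexed property, these values give a total order over comments even when post dates collide.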
Building a Blog: Composite properties
Minimize waste