
The Hacker's Database
Amir Salihefendic (amix)

About Me
Co-founder and former CTO of Plurk.com
Helped Plurk scale to millions of users, billions of page views and 8+ billion unique data items. With minimal hardware!

Founder of Doist.io, creators of Todoist and Wedoist

Outline of the talk


Plurk timelines optimization: how we saved hundreds of thousands of dollars
What's great about Redis?
Different sample implementations: redis_wrap, redis_graph, redis_queue
Advanced analytics using Redis: bitmapist and bitmapist.cohort

Problem
Exponential data growth in social networks

[Figure: data size plotted against number of users]

The Easy Solution
Throw money at the problem

The Smarter Solution
Reduce to linear data growth

[Figure: data size plotted against number of users]

Example: Timelines

[Figure: timeline data size plotted against number of users]

Example: Timelines
Solution: Cheating! Make timelines a fixed size: 500 messages

O(1) insertion
O(1) update
Cacheable

[Figure: timeline data size plotted against number of users]
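
The fixed-size idea can be sketched in a few lines. The slides do not show Plurk's actual code, so this is only an illustration: an in-memory `deque` stands in for the stored timeline, and the `Timeline` class and its method names are hypothetical. With a Redis list, the same effect is usually obtained by pairing `LPUSH` with `LTRIM`.

```python
from collections import deque

TIMELINE_SIZE = 500  # the fixed cap mentioned on the slide


class Timeline:
    """Hypothetical in-memory stand-in for a capped, newest-first timeline."""

    def __init__(self, max_size=TIMELINE_SIZE):
        # A deque with maxlen drops the entry at the opposite end when full.
        self._messages = deque(maxlen=max_size)

    def push(self, message_id):
        # O(1): the newest message goes to the front, the oldest falls off.
        self._messages.appendleft(message_id)

    def latest(self, count=20):
        # Newest messages first.
        return list(self._messages)[:count]

    def __len__(self):
        return len(self._messages)


timeline = Timeline()
for message_id in range(1200):
    timeline.push(message_id)
```

However many messages are pushed, the timeline never holds more than 500 entries, so total storage grows linearly with the number of users.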

Plurk's timelines migration path


Tokyo Tyrant

Problem with MySQL and Tokyo Tyrant?
Death by I/O

What's great about Redis?
Everything is in memory, but the data is persistent.
Amazing performance:
100,000+ SETs per second
80,000+ GETs per second

Redis Rich Datatypes


Relational databases
Schemas, tables, columns, rows, indexes, etc.

Column databases (BigTable, HBase, etc.)
Schemas, columns, column families, rows, etc.

Redis
Key-value, sets, lists, hashes, bitmaps, etc.

Redis datatypes resemble datatypes in programming languages. They are natural to us!

redis_wrap
Implements a wrapper for Redis datatypes so they mimic the datatypes found in Python
100 lines of code
https://github.com/Doist/redis_wrap

redis_wrap
    from redis_wrap import get_list, get_set, get_hash

    # Mimic of Python lists
    bears = get_list('bears')
    bears.append('grizzly')
    assert len(bears) == 1
    assert 'grizzly' in bears

    # Mimic of Python sets
    fishes = get_set('fishes')
    assert 'nemo' not in fishes
    fishes.add('nemo')
    assert 'nemo' in fishes
    for item in fishes:
        assert item == 'nemo'

    # Mimic of hashes
    villains = get_hash('villains')
    assert 'riddler' not in villains
    villains['riddler'] = 'Edward Nigma'
    assert 'riddler' in villains
    assert len(villains.keys()) == 1
    del villains['riddler']
    assert len(villains) == 0
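
To give a feel for how such a wrapper might be built, here is a minimal sketch. It is not the redis_wrap source: `RedisList` and `FakeRedis` are hypothetical names, and `FakeRedis` is a small in-memory stand-in so the example runs without a server. It only imitates the redis-py methods `rpush`, `llen` and `lrange`.

```python
class FakeRedis:
    """Hypothetical in-memory stand-in for three redis-py list methods."""

    def __init__(self):
        self._lists = {}

    def rpush(self, name, *values):
        self._lists.setdefault(name, []).extend(values)
        return len(self._lists[name])

    def llen(self, name):
        return len(self._lists.get(name, []))

    def lrange(self, name, start, end):
        items = self._lists.get(name, [])
        # Redis treats `end` as inclusive; only -1 ("last item") is handled here.
        stop = len(items) if end == -1 else end + 1
        return items[start:stop]


class RedisList:
    """Hypothetical wrapper that makes a Redis list feel like a Python list."""

    def __init__(self, name, conn):
        self.name = name
        self.conn = conn

    def append(self, item):
        self.conn.rpush(self.name, item)

    def __len__(self):
        return self.conn.llen(self.name)

    def __iter__(self):
        return iter(self.conn.lrange(self.name, 0, -1))

    def __contains__(self, item):
        # O(n): fetches the whole list, which is fine for a sketch.
        return item in self.conn.lrange(self.name, 0, -1)


bears = RedisList('bears', FakeRedis())
bears.append('grizzly')
bears.append('polar')
```

The point of the design is that Python's special methods (`__len__`, `__iter__`, `__contains__`) map almost one-to-one onto Redis commands, which is why such a wrapper can stay small.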

redis_graph
Implements a simple graph database in Python
Can scale to a few million nodes easily
You could use something similar to implement LinkedIn's "who is connected to whom" feature
Under 40 lines of code
https://github.com/Doist/redis_graph

redis_graph
    from redis_graph import *

    # Adding an edge between nodes
    add_edge(from_node='frodo', to_node='gandalf')
    assert has_edge(from_node='frodo', to_node='gandalf') == True

    # Getting neighbors of a node
    assert list(neighbors('frodo')) == ['gandalf']

    # Deleting edges
    delete_edge(from_node='frodo', to_node='gandalf')

    # Setting node values
    set_node_value('frodo', '1')
    assert get_node_value('frodo') == '1'

    # Setting edge values
    set_edge_value('frodo_baggins', '2')
    assert get_edge_value('frodo_baggins') == '2'

redis_graph: The implementation

    from redis_wrap import *

    #--- Edges ----------------------------------------------
    def add_edge(from_node, to_node, system='default'):
        edges = get_set( from_node, system=system )
        edges.add( to_node )

    def delete_edge(from_node, to_node, system='default'):
        edges = get_set( from_node, system=system )
        key_node_y = to_node
        if key_node_y in edges:
            edges.remove( key_node_y )

    def has_edge(from_node, to_node, system='default'):
        edges = get_set( from_node, system=system )
        return to_node in edges

    def neighbors(node_x, system='default'):
        return get_set( node_x, system=system )

    #--- Node values ----------------------------
    def get_node_value(node_x, system='default'):
        node_key = 'nv:%s' % node_x
        return get_redis(system).get( node_key )

    def set_node_value(node_x, value, system='default'):
        node_key = 'nv:%s' % node_x
        return get_redis(system).set( node_key, value )

    #--- Edge values -----------------------------
    def get_edge_value(edge_x, system='default'):
        edge_key = 'ev:%s' % edge_x
        return get_redis(system).get( edge_key )

    def set_edge_value(edge_x, value, system='default'):
        edge_key = 'ev:%s' % edge_x
        return get_redis(system).set( edge_key, value )
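
A "who is connected to whom" query can be layered on top of a `neighbors` function like the one above. The following is a sketch under assumptions, not part of redis_graph: a plain dict of sets stands in for the Redis sets, and `degrees_of_separation` is a hypothetical helper that runs a breadth-first search over directed edges.

```python
from collections import deque

# In-memory stand-in for the Redis sets: node -> set of neighbor nodes.
_edges = {}


def add_edge(from_node, to_node):
    _edges.setdefault(from_node, set()).add(to_node)


def neighbors(node):
    return _edges.get(node, set())


def degrees_of_separation(start, target):
    """Breadth-first search; returns the hop count or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        if node == target:
            return distance
        for next_node in neighbors(node):
            if next_node not in seen:
                seen.add(next_node)
                queue.append((next_node, distance + 1))
    return None


add_edge('frodo', 'gandalf')
add_edge('gandalf', 'aragorn')
add_edge('aragorn', 'arwen')
```

Because edges are stored one way only, a path from `frodo` to `arwen` does not imply a path back. A symmetric "connection" would need an edge added in both directions.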

redis_queue
Implements a queue in Python using Redis
Used to process millions of background tasks on Plurk / Todoist / Wedoist daily (billions in total)
Implementation: 18 lines (the real implementation is a bit bigger)
https://github.com/Doist/redis_simple_queue

redis_queue
    from redis_simple_queue import *

    delete_jobs('tasks')
    put_job('tasks', '42')

    assert 'tasks' in get_all_queues()
    assert queue_stats('tasks')['queue_size'] == 1

    assert reserve_job('tasks') == '42'
    assert queue_stats('tasks')['queue_size'] == 0

redis_queue: Implementation

    from redis_wrap import *

    def put_job(queue, job_data, system='default'):
        get_list(queue, system=system).append(job_data)

    def reserve_job(queue, system='default'):
        return get_list(queue, system=system).pop()

    def delete_jobs(queue, system='default'):
        get_redis(system).delete(queue)

    def get_all_queues(system='default'):
        # Every key in this Redis system is treated as a queue name
        return get_redis(system).keys('*')

    def queue_stats(queue, system='default'):
        return {
            'queue_size': len(get_list(queue, system=system))
        }
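
A background worker built on such a queue is essentially a loop around `reserve_job`. The sketch below is an assumption-laden illustration rather than the redis_simple_queue code: a dict of deques stands in for Redis, jobs are handled in first-in, first-out order, and `run_worker` is a hypothetical helper.

```python
from collections import deque

# In-memory stand-in for Redis: queue name -> deque of job payloads.
_queues = {}


def put_job(queue, job_data):
    _queues.setdefault(queue, deque()).append(job_data)


def reserve_job(queue):
    jobs = _queues.get(queue)
    # Returns None when the queue is missing or empty.
    return jobs.popleft() if jobs else None


def run_worker(queue, handler):
    """Drain the queue and return how many jobs were handled."""
    handled = 0
    while True:
        job = reserve_job(queue)
        if job is None:
            break
        handler(job)
        handled += 1
    return handled


results = []
put_job('tasks', '42')
put_job('tasks', '43')
handled = run_worker('tasks', results.append)
```

A long-running worker would typically wait for new jobs instead of exiting when the queue is empty. Redis offers blocking pops such as `BLPOP` for that purpose.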

bitmapist and bitmapist.cohort


Implements an advanced analytics library on top of Redis bitmaps.
Saved us $2000 USD/month (Mixpanel)!

bitmapist
https://github.com/Doist/bitmapist

bitmapist.cohort
Cohort analytics (retention)

bitmapist: What does it help with?


Has user 123 been online today? This week?
Has user 123 performed action "X"?
How many users have been active this month?
How many unique users have performed action "X" this week?
What percentage of users that were active last week are still active?
What percentage of users that were active last month are still active this month?

Bitmapist can answer this for millions of users, and most operations are O(1)! Using very small amounts of memory.

What are bitmaps?


Operations: SETBIT, GETBIT, BITCOUNT, BITOP

SETBIT somekey 8 1
GETBIT somekey 8
BITOP AND destkey somekey1 somekey2

http://en.wikipedia.org/wiki/Bit_array
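
To make the four operations concrete, here is a pure-Python sketch of what they do to the underlying bytes. It is an illustration, not Redis code: the function names are hypothetical, and a `bytearray` stands in for the string value stored at a key. As in Redis, offset 0 is the most significant bit of the first byte.

```python
def setbit(bitmap, offset, value):
    """Set the bit at `offset`, growing the bytearray as needed."""
    byte_index, bit_index = divmod(offset, 8)
    if byte_index >= len(bitmap):
        bitmap.extend(b'\x00' * (byte_index - len(bitmap) + 1))
    mask = 0x80 >> bit_index  # most significant bit first
    if value:
        bitmap[byte_index] |= mask
    else:
        bitmap[byte_index] &= ~mask & 0xFF


def getbit(bitmap, offset):
    byte_index, bit_index = divmod(offset, 8)
    if byte_index >= len(bitmap):
        return 0  # bits beyond the end read as 0
    return (bitmap[byte_index] >> (7 - bit_index)) & 1


def bitcount(bitmap):
    return sum(bin(byte).count('1') for byte in bitmap)


def bitop_and(left, right):
    # The shorter operand is treated as zero-padded to the longer length.
    length = max(len(left), len(right))
    a = bytes(left).ljust(length, b'\x00')
    b = bytes(right).ljust(length, b'\x00')
    return bytearray(x & y for x, y in zip(a, b))


last_week, this_week = bytearray(), bytearray()
for user_id in (3, 8, 123):
    setbit(last_week, user_id, 1)
for user_id in (8, 123, 500):
    setbit(this_week, user_id, 1)
still_active = bitop_and(last_week, this_week)
```

Each user costs one bit, so a bitmap covering user IDs up to 1,000,000 takes about 125 KB.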

bitmapist: Using it

    from datetime import datetime, timedelta
    from bitmapist import mark_event, MonthEvents, WeekEvents, BitOpAnd

    now = datetime.utcnow()
    last_month = now - timedelta(days=30)

    # Mark user 123 as active and has played a song
    mark_event('active', 123)
    mark_event('song:played', 123)

    # Answer if user 123 has been active this month
    assert 123 in MonthEvents('active', now.year, now.month)
    assert 123 in MonthEvents('song:played', now.year, now.month)

    # How many users have been active this week?
    print(len(WeekEvents('active', now.year, now.isocalendar()[1])))

    # Perform bit operations. How many users that
    # have been active last month are still active this month?
    active_2_months = BitOpAnd(
        MonthEvents('active', last_month.year, last_month.month),
        MonthEvents('active', now.year, now.month)
    )
    print(len(active_2_months))

bitmapist.cohort: Manage retention!

http://amix.dk/blog/post/19718

Goal: Inventing a modern way to work together
Join an amazing team of 13 people from all around the world.
A profitable business. 500,000+ users.
Work from anywhere. Hacker-friendly culture. Python. Competitive salaries.
We are hiring: jobs@doist.io
www.doist.io

Questions and Answers

Slides will be posted to http://amix.dk/
For offline questions contact: amix@doist.io
