Amir Salihefendic: Redis - the hacker's database

Redis, the hacker's database:
- simple_queue: feature set, comparison with Celery and RQ
- redis_graph: available options, integration with other tools, and big-O performance
- bitmapist: idea, architecture, reports based on cohorts
- optionally: tagged-logger / ormist (lightweight Object-to-Redis mapper)
- optionally: Lua scripting and LuaJIT (almost as fast as C)

1. The Hacker's Database
   Amir Salihefendic (amix)
2. About Me
   • Co-founder and former CTO of Plurk.com
   • Helped Plurk scale to millions of users, billions of page views and 8+ billion unique data items, with minimal hardware!
   • Founder of Doist.io, creators of Todoist and Wedoist
3. Outline of the talk
   • Plurk Timelines optimization: how we saved hundreds of thousands of dollars
   • What's great about Redis?
   • Different sample implementations:
     – redis_wrap
     – redis_graph
     – redis_queue
   • Advanced analytics using Redis:
     – bitmapist and bitmapist.cohort
4. Problem: Exponential data growth in Social Networks
   [Chart: data size vs. number of users]
5. The Easy Solution: Throw money at the problem
6. The Smarter Solution: Reduce to linear data growth
   [Chart: data size vs. number of users]
7. Example: Timelines
8. Example: Timelines
   [Chart: timeline data size vs. number of users]
9. Example: Timelines
   Solution: Cheating! Make timelines a fixed size: 500 messages.
   • O(1) insertion
   • O(1) update
   • Cacheable
   [Chart: timeline data size vs. number of users]
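The fixed-size timeline trick from slide 9 can be modelled in a few lines of plain Python. This is an in-memory sketch and not Plurk's code. `Timeline` and `latest()` are hypothetical names, and a `collections.deque` with `maxlen` stands in for a Redis list. In Redis the usual equivalent is an `LPUSH` followed by `LTRIM key 0 499`, which keeps only the newest 500 entries.

```python
from collections import deque
from itertools import islice

TIMELINE_SIZE = 500  # the cap mentioned on the slide


class Timeline:
    """Hypothetical capped timeline, newest message first.

    A deque with maxlen drops the oldest entry automatically, which mirrors
    LPUSH followed by LTRIM key 0 (size - 1) on a Redis list.
    """

    def __init__(self, size=TIMELINE_SIZE):
        self._items = deque(maxlen=size)

    def push(self, message_id):
        # O(1): add on the left; a full deque discards from the right (oldest)
        self._items.appendleft(message_id)

    def latest(self, n=20):
        # Newest n message ids, similar to LRANGE key 0 n-1
        return list(islice(self._items, n))

    def __len__(self):
        return len(self._items)


timeline = Timeline(size=3)
for message_id in range(1, 6):
    timeline.push(message_id)

print(timeline.latest())  # [5, 4, 3]
print(len(timeline))      # 3
```

Because each timeline is bounded, storage per user is bounded too, so total timeline storage grows linearly with the number of users.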
10. Plurk's timelines migration path
    [Diagram: migration path including Tokyo Tyrant]
    • Problem with MySQL and Tokyo Tyrant? Death by IO
11. What's great about Redis?
    • Everything is in memory, but the data is persistent.
    • Amazing performance: 100,000+ SETs per second, 80,000+ GETs per second
12. Redis Rich Datatypes
    • Relational databases: schemas, tables, columns, rows, indexes, etc.
    • Column databases (BigTable, HBase, etc.): schemas, columns, column families, rows, etc.
    • Redis: key-value, sets, lists, hashes, bitmaps, etc.
13. Redis datatypes resemble datatypes in programming languages. They are natural to us!
14. redis_wrap
    • Implements a wrapper for Redis datatypes so they mimic the datatypes found in Python
    • 100 lines of code
    • https://github.com/Doist/redis_wrap
15. redis_wrap
        from redis_wrap import *

        # Mimic of Python lists
        bears = get_list('bears')
        bears.append('grizzly')
        assert len(bears) == 1
        assert 'grizzly' in bears

        # Mimic of Python sets
        fishes = get_set('fishes')
        assert 'nemo' not in fishes
        fishes.add('nemo')
        assert 'nemo' in fishes
        for item in fishes:
            assert item == 'nemo'

        # Mimic of hashes
        villains = get_hash('villains')
        assert 'riddler' not in villains
        villains['riddler'] = 'Edward Nigma'
        assert 'riddler' in villains
        assert len(villains.keys()) == 1
        del villains['riddler']
        assert len(villains) == 0
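To illustrate the idea behind redis_wrap on slides 14 and 15, here is a minimal sketch of a list wrapper that maps Python's list protocol onto Redis list commands: `append` becomes RPUSH, `len()` becomes LLEN, and iteration becomes LRANGE. This is not redis_wrap's code. `RedisList` and `InMemoryRedis` are hypothetical names. `InMemoryRedis` is a tiny stand-in that implements only the `rpush`, `llen` and `lrange` methods of redis-py's client, so the sketch runs without a server. With a real server you would pass `redis.Redis()` instead, which returns bytes unless it is created with `decode_responses=True`.

```python
class InMemoryRedis:
    """Hypothetical stand-in for redis.Redis, covering only the list calls used below."""

    def __init__(self):
        self._lists = {}

    def rpush(self, name, *values):
        items = self._lists.setdefault(name, [])
        items.extend(values)
        return len(items)

    def llen(self, name):
        return len(self._lists.get(name, []))

    def lrange(self, name, start, end):
        items = self._lists.get(name, [])
        # Redis treats `end` as inclusive and allows negative indexes (-1 = last)
        stop = len(items) + end + 1 if end < 0 else end + 1
        return items[start:stop]


class RedisList:
    """Makes a Redis list behave like a (partial) Python list."""

    def __init__(self, name, conn):
        self.name = name
        self.conn = conn

    def append(self, item):
        self.conn.rpush(self.name, item)  # RPUSH

    def __len__(self):
        return self.conn.llen(self.name)  # LLEN

    def __iter__(self):
        return iter(self.conn.lrange(self.name, 0, -1))  # LRANGE 0 -1

    def __contains__(self, item):
        # O(n): Redis lists have no membership command, so scan the list
        return item in self.conn.lrange(self.name, 0, -1)


conn = InMemoryRedis()
bears = RedisList("bears", conn)
bears.append("grizzly")
assert len(bears) == 1
assert "grizzly" in bears
print(list(bears))  # ['grizzly']
```

Sets and hashes follow the same pattern with their own commands, for example SADD/SISMEMBER for `add` and `in`, and HSET/HEXISTS for item assignment and `in`.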
16. redis_graph
    • Implements a simple graph database in Python
    • Can scale to a few million nodes easily
    • You could use something similar to implement LinkedIn's "who is connected to who" feature
    • Under 40 lines of code
    • https://github.com/Doist/redis_graph
17. redis_graph
        from redis_graph import *

        # Adding an edge between nodes
        add_edge(from_node='frodo', to_node='gandalf')
        assert has_edge(from_node='frodo', to_node='gandalf')

        # Getting neighbors of a node
        assert list(neighbors('frodo')) == ['gandalf']

        # Deleting edges
        delete_edge(from_node='frodo', to_node='gandalf')

        # Setting node values
        set_node_value('frodo', '1')
        assert get_node_value('frodo') == '1'

        # Setting edge values
        set_edge_value('frodo_baggins', '2')
        assert get_edge_value('frodo_baggins') == '2'
18. redis_graph: The implementation
        from redis_wrap import *

        #--- Edges ----------------------------------------------
        def add_edge(from_node, to_node, system='default'):
            edges = get_set(from_node, system=system)
            edges.add(to_node)

        def delete_edge(from_node, to_node, system='default'):
            edges = get_set(from_node, system=system)
            if to_node in edges:
                edges.remove(to_node)

        def has_edge(from_node, to_node, system='default'):
            edges = get_set(from_node, system=system)
            return to_node in edges

        def neighbors(node_x, system='default'):
            return get_set(node_x, system=system)

        #--- Node values ----------------------------------------
        def get_node_value(node_x, system='default'):
            node_key = 'nv:%s' % node_x
            return get_redis(system).get(node_key)

        def set_node_value(node_x, value, system='default'):
            node_key = 'nv:%s' % node_x
            return get_redis(system).set(node_key, value)

        #--- Edge values ----------------------------------------
        def get_edge_value(edge_x, system='default'):
            edge_key = 'ev:%s' % edge_x
            return get_redis(system).get(edge_key)

        def set_edge_value(edge_x, value, system='default'):
            edge_key = 'ev:%s' % edge_x
            return get_redis(system).set(edge_key, value)
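Slide 16 suggests that something like redis_graph could power LinkedIn-style "how are we connected?" queries. One way to do this is a breadth-first search over a `neighbors(node)` function. That is the shape of redis_graph's `neighbors()` on slide 18, which returns the set of a node's outgoing edges. In this sketch a plain dict of sets stands in for the Redis sets so the example runs on its own. `degrees_of_separation` and `GRAPH` are hypothetical names. Each hop costs one set lookup per visited node, so the depth is capped.

```python
from collections import deque

# Stand-in for the Redis sets that add_edge() fills (node -> outgoing edges)
GRAPH = {
    "frodo": {"gandalf", "sam"},
    "sam": {"rosie"},
    "gandalf": {"saruman"},
    "rosie": set(),
    "saruman": set(),
}


def neighbors(node):
    return GRAPH.get(node, set())


def degrees_of_separation(start, target, get_neighbors=neighbors, max_depth=3):
    """Number of hops from start to target, or None if not reachable within
    max_depth hops. BFS visits nodes in order of distance, so the first hit
    is a shortest path."""
    if start == target:
        return 0
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand past the depth limit
        for nxt in get_neighbors(node):
            if nxt == target:
                return depth + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return None


print(degrees_of_separation("frodo", "gandalf"))  # 1
print(degrees_of_separation("frodo", "rosie"))    # 2
print(degrees_of_separation("rosie", "frodo"))    # None (edges are directed)
```

To run it against Redis, pass a function that wraps redis_graph's `neighbors()` as `get_neighbors`.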
19. redis_queue
    • Implements a queue in Python using Redis
    • Used to process millions of background tasks on Plurk / Todoist / Wedoist daily (billions in total)
    • Implementation: 18 lines ("real" implementation a bit bigger)
    • https://github.com/Doist/redis_simple_queue
20. redis_queue
        from redis_simple_queue import *

        delete_jobs('tasks')
        put_job('tasks', '42')
        assert 'tasks' in get_all_queues()
        assert queue_stats('tasks')['queue_size'] == 1
        assert reserve_job('tasks') == '42'
        assert queue_stats('tasks')['queue_size'] == 0
21. redis_queue: Implementation
        from redis_wrap import *

        def put(queue, job_data, system='default'):
            get_list(queue, system=system).append(job_data)

        def reserve(queue, system='default'):
            return get_list(queue, system=system).pop()

        def delete_jobs(queue, system='default'):
            get_redis(system).delete(queue)

        def get_all_queues(system='default'):
            # KEYS '*' returns every key in the instance, not only queues;
            # redis-py returns it as a list
            return get_redis(system).keys('*')

        def queue_stats(queue, system='default'):
            return {'queue_size': len(get_list(queue, system=system))}
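Whether slide 21's `reserve()` is first-in-first-out depends on which end of the list redis_wrap's `pop()` removes from. If it pops from the same end that `append()` pushes to, jobs come out newest first. A common way to get strict FIFO order with blocking workers is to push with RPUSH and take jobs with BLPOP. The sketch below shows that pattern. `put`, `reserve`, `queue_stats` and `InMemoryRedis` are hypothetical here. `InMemoryRedis` mimics only redis-py's `rpush`, `blpop` and `llen`, and its `blpop` returns immediately instead of blocking, so the example runs without a server.

```python
class InMemoryRedis:
    """Hypothetical stand-in for redis.Redis covering rpush/blpop/llen only."""

    def __init__(self):
        self._lists = {}

    def rpush(self, name, *values):
        items = self._lists.setdefault(name, [])
        items.extend(values)
        return len(items)

    def blpop(self, keys, timeout=0):
        # Real BLPOP blocks up to `timeout` seconds; this stand-in never blocks.
        for key in keys:
            items = self._lists.get(key)
            if items:
                return (key, items.pop(0))
        return None

    def llen(self, name):
        return len(self._lists.get(name, []))


def put(conn, queue, job_data):
    conn.rpush(queue, job_data)  # enqueue at the tail


def reserve(conn, queue, timeout=5):
    # BLPOP takes from the head, so the oldest job comes out first
    item = conn.blpop([queue], timeout=timeout)
    return item[1] if item else None


def queue_stats(conn, queue):
    return {"queue_size": conn.llen(queue)}


conn = InMemoryRedis()
for job in (1, 2, 3):
    put(conn, "tasks", job)

print(reserve(conn, "tasks"))      # 1
print(queue_stats(conn, "tasks"))  # {'queue_size': 2}
```

With a real `redis.Redis()` client the same three functions work unchanged, except that values come back as bytes unless `decode_responses=True` is set.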
22. bitmapist and bitmapist.cohort
    • Implements an advanced analytics library on top of Redis bitmaps. Saved us $2,000 USD/month (Mixpanel)!
    • bitmapist: https://github.com/Doist/bitmapist
    • bitmapist.cohort: cohort analytics (retention)
23. bitmapist: What does it help with?
    • Has user 123 been online today? This week?
    • Has user 123 performed action "X"?
    • How many users have been active this month?
    • How many unique users have performed action "X" this week?
    • What percentage of users that were active last week are still active?
    • What percentage of users that were active last month are still active this month?
    • bitmapist can answer these for millions of users, and most operations are O(1) (marking or checking a single user is one SETBIT or GETBIT), using very small amounts of memory.
24. What are bitmaps?
    • Operations: SETBIT, GETBIT, BITCOUNT, BITOP
    • SETBIT somekey 8 1
    • GETBIT somekey 8
    • BITOP AND destkey somekey1 somekey2
    • http://en.wikipedia.org/wiki/Bit_array
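To make the four commands on slide 24 concrete, here is a pure-Python model that uses an `int` as the bit array. It is a sketch of the semantics, not of how Redis stores bitmaps. Redis numbers bits from the most significant bit of the first byte, whereas here offset i is simply bit i of the integer. Setting, getting, counting and AND-ing behave the same way. The lower-case function names are hypothetical mirrors of the Redis commands.

```python
def setbit(bitmap: int, offset: int, value: int) -> int:
    """Like SETBIT key offset value; returns the new bitmap."""
    if value:
        return bitmap | (1 << offset)
    return bitmap & ~(1 << offset)


def getbit(bitmap: int, offset: int) -> int:
    """Like GETBIT key offset."""
    return (bitmap >> offset) & 1


def bitcount(bitmap: int) -> int:
    """Like BITCOUNT key: the number of bits set to 1."""
    return bin(bitmap).count("1")


def bitop_and(*bitmaps: int) -> int:
    """Like BITOP AND destkey key1 key2 ...; returns the result instead of storing it."""
    result = bitmaps[0]
    for bitmap in bitmaps[1:]:
        result &= bitmap
    return result


somekey1 = setbit(0, 8, 1)          # SETBIT somekey1 8 1
somekey1 = setbit(somekey1, 3, 1)   # SETBIT somekey1 3 1
somekey2 = setbit(setbit(0, 8, 1), 5, 1)

print(getbit(somekey1, 8))                      # 1
print(bitcount(somekey1))                       # 2
print(bitcount(bitop_and(somekey1, somekey2)))  # 1 (only bit 8 is in both)
```

When user ids are used as offsets, one bit per user means a bitmap covering a million users takes about 125 KB (1,000,000 bits / 8).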
25. bitmapist: Using it
        from datetime import datetime, timedelta
        from bitmapist import mark_event, MonthEvents, WeekEvents, BitOpAnd

        now = datetime.utcnow()
        last_month = now - timedelta(days=30)  # approximately one month back

        # Mark user 123 as active and as having played a song
        mark_event('active', 123)
        mark_event('song:played', 123)

        # Answer if user 123 has been active this month
        assert 123 in MonthEvents('active', now.year, now.month)
        assert 123 in MonthEvents('song:played', now.year, now.month)

        # How many users have been active this week?
        print(len(WeekEvents('active', now.year, now.isocalendar()[1])))

        # Perform bit operations. How many users that
        # have been active last month are still active this month?
        active_2_months = BitOpAnd(
            MonthEvents('active', last_month.year, last_month.month),
            MonthEvents('active', now.year, now.month)
        )
        print(len(active_2_months))
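One way to read the usage example on slide 25 is that a single `mark_event` call records the event in several time-bucketed bitmaps at once: one per month, one per week and one per day. That way, "was user 123 active this week?" is a single bit lookup and "how many users were active this month?" is a single count. The sketch below models this with Python ints in a dict. It is a toy model. The key format (`active:month:2013-05` and so on) and the functions `period_keys`, `has_event` and `count_users` are hypothetical, and `mark_event` here only borrows the slide's name. None of them are bitmapist's actual keys or API.

```python
from datetime import datetime

# Hypothetical store: key -> bitmap (an int whose bit `user_id` is set
# when that user performed the event in that period)
BITMAPS = {}


def period_keys(event, when):
    """Hypothetical key scheme: one bitmap per event per month, ISO week and day."""
    iso_year, iso_week, _ = when.isocalendar()
    return {
        "month": f"{event}:month:{when.year}-{when.month:02d}",
        "week": f"{event}:week:{iso_year}-W{iso_week:02d}",
        "day": f"{event}:day:{when:%Y-%m-%d}",
    }


def mark_event(event, user_id, when):
    # One SETBIT per period bucket, so each later question is a single lookup
    for key in period_keys(event, when).values():
        BITMAPS[key] = BITMAPS.get(key, 0) | (1 << user_id)


def has_event(event, user_id, period, when):
    # Like GETBIT on the bucket for `period` ("month", "week" or "day")
    bitmap = BITMAPS.get(period_keys(event, when)[period], 0)
    return bool((bitmap >> user_id) & 1)


def count_users(event, period, when):
    # Like BITCOUNT on the bucket for `period`
    bitmap = BITMAPS.get(period_keys(event, when)[period], 0)
    return bin(bitmap).count("1")


now = datetime(2013, 5, 15)
mark_event("active", 123, now)
mark_event("active", 7, now)
mark_event("song:played", 123, now)

print(has_event("active", 123, "week", now))    # True
print(has_event("song:played", 7, "day", now))  # False
print(count_users("active", "month", now))      # 2
```

The trade-off is a few extra writes per event in exchange for O(1) reads per period.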
26. bitmapist.cohort: Manage retention!
    http://amix.dk/blog/post/19718
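The retention questions on slide 23 ("what percentage of users active last week are still active?") reduce to one AND and two counts. A cohort report repeats that calculation per cohort. Take the users who signed up in period i, and for each later period compute the fraction of them who were also active then. The sketch below models this with Python ints as bitmaps. The data is made up for illustration, and `users`, `retention` and `cohort_table` are hypothetical names, not bitmapist.cohort's API.

```python
def bitcount(bitmap: int) -> int:
    return bin(bitmap).count("1")  # like BITCOUNT


def users(*ids: int) -> int:
    """Build a bitmap with bit `id` set for each user id."""
    bitmap = 0
    for user_id in ids:
        bitmap |= 1 << user_id
    return bitmap


def retention(cohort: int, active: int) -> float:
    """Percentage of `cohort` that is also in `active` (BITOP AND + BITCOUNT)."""
    size = bitcount(cohort)
    return 100.0 * bitcount(cohort & active) / size if size else 0.0


def cohort_table(signups, activity):
    """signups[i]: users who signed up in period i; activity[j]: users active in period j.
    Row i lists retention for periods i+1, i+2, ..."""
    return [
        [retention(signups[i], activity[j]) for j in range(i + 1, len(activity))]
        for i in range(len(signups))
    ]


# Illustrative data for three weeks
signups = [users(1, 2, 3, 4), users(5, 6)]
activity = [users(1, 2, 3, 4), users(1, 2, 5, 6), users(1, 5)]

for row in cohort_table(signups, activity):
    print(row)
# [50.0, 25.0]
# [50.0]
```

Each cell needs only an AND and two counts, so the cost of a report grows with the number of cohorts and periods, not with the number of raw events.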
27. • Goal: Inventing a modern way to work together
    • Join an amazing team of 13 people from all around the world. A profitable business. 500,000+ users.
    • Work from anywhere. Hacker-friendly culture. Python. Competitive salaries.
    • We are hiring: jobs@doist.io
    www.doist.io
28. Questions and Answers
    • Slides will be posted to http://amix.dk/
    • For "offline" questions contact: amix@doist.io
