Advanced Redis data structures


These slides present the following projects:
* redis_wrap: A Pythonic wrapper that makes it nicer to work with builtin Redis datatypes
* redis_graph: A sample graph database
* redis_simple_queue: A simple queue implemented on Redis list structure
* bitmapist: a powerful analytics library using Redis bitmaps, great for retention and cohort tracking
* fixedlist: a fixed list structure that can optimize timelines (and other things)
* how to use Lua scripting for more advanced data structures and better performance

These are Python projects, but some of them (like bitmapist) have been ported to other languages.

This talk was given at PyCon Belarus on 31 Jan. 2015.

Published in: Technology

  1. Advanced Redis data structures, by Amir Salihefendic
  2. About me
     • Founder: millions of data items
     • Co-founder, former CTO: billions of data items
  3. Redis: Greatness
     • Everything is in memory, data is persistent
     • Amazing performance
     • The Hacker’s database
  4. Redis: Greatness
     • Great lead dev
     • Amazing progress (Sentinel, Cluster, …)
  5. Redis Rich Datatypes
     • Relational databases: schemas, tables, columns, rows, indexes, etc.
     • Column databases (BigTable, HBase, etc.): schemas, columns, column families, rows, etc.
     • Redis: key-value, sets, lists, hashes, bitmaps, etc.
  6. Redis datatypes resemble datatypes in programming languages. They are natural to us!
  7. redis_wrap: a wrapper for Redis datatypes, so they mimic the datatypes found in Python
  8. redis_wrap: usage

          from redis_wrap import get_list, get_hash, get_set

          # Mimic of Python lists
          bears = get_list('bears')
          bears.append('grizzly')
          assert len(bears) == 1
          assert 'grizzly' in bears

          # Mimic of hashes
          villains = get_hash('villains')
          assert 'riddler' not in villains
          villains['riddler'] = 'Edward Nigma'
          assert 'riddler' in villains
          assert len(villains.keys()) == 1
          del villains['riddler']
          assert len(villains) == 0

          # Mimic of Python sets
          fishes = get_set('fishes')
          assert 'nemo' not in fishes
          fishes.add('nemo')
          assert 'nemo' in fishes
          for item in fishes:
              assert item == 'nemo'
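A wrapper of this kind can be sketched by mapping Python's container protocol onto Redis list commands. The code below is not redis_wrap's source: `RedisList` and `FakeClient` are hypothetical names, and `FakeClient` is an in-memory stand-in that implements only `rpush`, `llen`, and `lrange` so the example runs without a server.

```python
class FakeClient:
    """In-memory stand-in for a Redis client (assumption, illustration only)."""

    def __init__(self):
        self._lists = {}

    def rpush(self, name, *values):
        self._lists.setdefault(name, []).extend(values)
        return len(self._lists[name])

    def llen(self, name):
        return len(self._lists.get(name, []))

    def lrange(self, name, start, end):
        items = self._lists.get(name, [])
        # Redis treats `end` as inclusive and -1 as "last element".
        # This stand-in only handles end == -1 or a non-negative end.
        stop = len(items) if end == -1 else end + 1
        return items[start:stop]


class RedisList:
    """Hypothetical list wrapper: Python syntax in, Redis commands out."""

    def __init__(self, name, client):
        self.name = name
        self.client = client

    def append(self, item):
        self.client.rpush(self.name, item)                   # RPUSH key item

    def __len__(self):
        return self.client.llen(self.name)                   # LLEN key

    def __iter__(self):
        return iter(self.client.lrange(self.name, 0, -1))    # LRANGE key 0 -1

    def __contains__(self, item):
        return item in self.client.lrange(self.name, 0, -1)


bears = RedisList('bears', FakeClient())
bears.append('grizzly')
assert len(bears) == 1
assert 'grizzly' in bears
```

A redis-py client exposes `rpush`, `llen`, and `lrange` under the same names, so it could be passed in place of `FakeClient`; creating it with `decode_responses=True` makes values come back as `str`, which the membership test relies on.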
  9. redis_graph: a simple graph database in Python
  10. redis_graph: Usage

          from redis_graph import *

          # Adding an edge between nodes
          add_edge(from_node='frodo', to_node='gandalf')
          assert has_edge(from_node='frodo', to_node='gandalf') == True

          # Getting neighbors of a node
          assert list(neighbors('frodo')) == ['gandalf']

          # Deleting edges
          delete_edge(from_node='frodo', to_node='gandalf')

          # Setting node values
          set_node_value('frodo', '1')
          assert get_node_value('frodo') == '1'

          # Setting edge values
          set_edge_value('frodo_baggins', '2')
          assert get_edge_value('frodo_baggins') == '2'
  11. redis_graph: Implementation

          from redis_wrap import *

          #--- Edges ---
          def add_edge(from_node, to_node, system='default'):
              edges = get_set(from_node, system=system)
              edges.add(to_node)

          def delete_edge(from_node, to_node, system='default'):
              edges = get_set(from_node, system=system)
              if to_node in edges:
                  edges.remove(to_node)

          def has_edge(from_node, to_node, system='default'):
              edges = get_set(from_node, system=system)
              return to_node in edges

          def neighbors(node_x, system='default'):
              return get_set(node_x, system=system)

          #--- Node values ---
          def get_node_value(node_x, system='default'):
              node_key = 'nv:%s' % node_x
              return get_redis(system).get(node_key)

          def set_node_value(node_x, value, system='default'):
              node_key = 'nv:%s' % node_x
              return get_redis(system).set(node_key, value)

          #--- Edge values ---
          def get_edge_value(edge_x, system='default'):
              edge_key = 'ev:%s' % edge_x
              return get_redis(system).get(edge_key)

          def set_edge_value(edge_x, value, system='default'):
              edge_key = 'ev:%s' % edge_x
              return get_redis(system).set(edge_key, value)
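Because each node's outgoing edges live in a single set, a traversal reduces to repeated `neighbors` lookups. The sketch below shows a breadth-first search over that layout. It replaces the Redis sets with a plain dict of Python sets so it runs without a server, and `reachable` is a hypothetical helper that is not part of redis_graph.

```python
from collections import deque

# In-memory stand-in for the per-node Redis sets (assumption, illustration only).
_edges = {}


def add_edge(from_node, to_node):
    _edges.setdefault(from_node, set()).add(to_node)


def neighbors(node):
    return _edges.get(node, set())


def reachable(start):
    """Hypothetical helper: nodes reachable from `start`, excluding `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


add_edge('frodo', 'gandalf')
add_edge('gandalf', 'aragorn')
assert reachable('frodo') == {'gandalf', 'aragorn'}
# Edges are directed: only the from_node's set is updated by add_edge.
assert reachable('aragorn') == set()
```

Against a live server each `neighbors` call would be one round trip, so a deep traversal is a natural candidate for the pipelining or Lua scripting covered later in the talk.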
  12. redis_simple_queue: a simple queue in Python using Redis
  13. redis_queue: usage

          from redis_simple_queue import *

          delete_jobs('tasks')
          put_job('tasks', '42')
          assert 'tasks' in get_all_queues()
          assert queue_stats('tasks')['queue_size'] == 1
          assert reserve_job('tasks') == '42'
          assert queue_stats('tasks')['queue_size'] == 0
  14. redis_queue: Implementation

          from redis_wrap import *

          # Function names follow the usage slide (put_job, reserve_job).
          def put_job(queue, job_data, system='default'):
              get_list(queue, system=system).append(job_data)

          def reserve_job(queue, system='default'):
              return get_list(queue, system=system).pop()

          def delete_jobs(queue, system='default'):
              get_redis(system).delete(queue)

          def get_all_queues(system='default'):
              # keys() returns a list in current redis-py, so no split is needed.
              return get_redis(system).keys('*')

          def queue_stats(queue, system='default'):
              return {'queue_size': len(get_list(queue, system=system))}
  15. Cohort/Retention Tracking: how bitmapist was born
  16. bitmapist: The idea. MixPanel looks great!
  17. bitmapist: Problem with MixPanel. MixPanel would cost $2000+ USD per month
  18. bitmapist + bitmapist.cohort
      • Implements an advanced analytics library on top of Redis bitmaps
      • Hundreds of millions of events for Todoist
      • O(1) execution
  19. bitmapist: Features
      • Has user 123 been online today? This week?
      • Has user 123 performed action "X"?
      • How many users have been active this month?
      • How many unique users have performed action "X" this week?
      • What percentage of users that were active last week are still active?
      • What percentage of users that were active last month are still active this month?
      • O(1)! Using very small amounts of memory.
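The small memory footprint follows from storing one bit per user. A rough estimate, assuming user IDs are used directly as bit offsets (an assumption about the layout; the figures are illustrative and not from the talk):

```python
def bitmap_bytes(max_user_id):
    """Bytes needed for a bitmap whose highest set bit is `max_user_id`."""
    # Bit offsets start at 0, so max_user_id needs max_user_id + 1 bits,
    # rounded up to whole bytes.
    return (max_user_id + 1 + 7) // 8


assert bitmap_bytes(7) == 1   # offsets 0..7 fit in one byte
assert bitmap_bytes(8) == 2   # offset 8 spills into a second byte
ten_million = bitmap_bytes(9_999_999)
```

Under that assumption, one event key covering 10 million user IDs takes 1,250,000 bytes, about 1.25 MB, regardless of how many of those users actually triggered the event.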
  20. bitmapist: Bitmaps?
      • SETBIT, GETBIT, BITCOUNT, BITOP
      • SETBIT somekey 8 1
      • GETBIT somekey 8
      • BITOP AND destkey somekey1 somekey2
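To see how these four commands answer retention questions, here is a pure-Python model of their semantics. It is a sketch for illustration, not bitmapist code: `BitmapModel` and the key names are hypothetical, each key is stored as a Python integer instead of a Redis string, and byte-level bit ordering is ignored.

```python
class BitmapModel:
    """Pure-Python model of Redis bitmap commands (illustration only)."""

    def __init__(self):
        self._keys = {}  # key -> int used as a bit field

    def setbit(self, key, offset, value):
        old = self.getbit(key, offset)
        bits = self._keys.get(key, 0)
        if value:
            bits |= 1 << offset
        else:
            bits &= ~(1 << offset)
        self._keys[key] = bits
        return old  # like SETBIT, return the previous bit value

    def getbit(self, key, offset):
        return (self._keys.get(key, 0) >> offset) & 1

    def bitcount(self, key):
        return bin(self._keys.get(key, 0)).count('1')

    def bitop_and(self, destkey, *srckeys):
        result = self._keys.get(srckeys[0], 0)
        for key in srckeys[1:]:
            result &= self._keys.get(key, 0)
        self._keys[destkey] = result


r = BitmapModel()
for user_id in (1, 2, 3):
    r.setbit('active:last_week', user_id, 1)
for user_id in (2, 3, 4):
    r.setbit('active:this_week', user_id, 1)

# Users active in both weeks: one AND plus one count.
r.bitop_and('active:both', 'active:last_week', 'active:this_week')
retained = r.bitcount('active:both')
retention = 100.0 * retained / r.bitcount('active:last_week')
```

Here users 2 and 3 are active in both weeks, so `retained` is 2 out of 3 users from last week, a retention of about 66.7%.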
  21. bitmapist: Usage

          from datetime import datetime, timedelta
          from bitmapist import mark_event, MonthEvents, WeekEvents, BitOpAnd

          now = datetime.utcnow()
          last_month = now - timedelta(days=30)

          # Mark user 123 as active and as having played a song
          mark_event('active', 123)
          mark_event('song:played', 123)

          # Answer if user 123 has been active this month
          assert 123 in MonthEvents('active', now.year, now.month)
          assert 123 in MonthEvents('song:played', now.year, now.month)

          # How many users have been active this week?
          print(len(WeekEvents('active', now.year, now.isocalendar()[1])))

          # Perform bit operations. How many users that
          # have been active last month are still active this month?
          active_2_months = BitOpAnd(
              MonthEvents('active', last_month.year, last_month.month),
              MonthEvents('active', now.year, now.month)
          )
          print(len(active_2_months))
  22. bitmapist.cohort: Visualization. Read more
  23. fixedlist: how fixedlist was born
  24. fixedlist: Problem. Timelines: exponential data growth
  25. fixedlist: The Easy Solution. Throw money at the problem
  26. fixedlist: Cheating!
      • Fixed timeline size
      • O(1) insertion
      • O(1) update
      • O(1) get
      • Cacheable
      Solution that Facebook and Twitter use
  27. fixedlist
      • 2.5x faster than pure Redis solution
      • 1.4x less memory than pure Redis solution
  28. fixedlist: Usage

          import fixedlist

          # Add a value to a list
          fixedlist.add('hello', 'world')

          # Add multiple values to multiple keys at once
          fixedlist.add(['hello1', 'hello2'], ['world1', 'world2'])

          # Get values from a list
          assert fixedlist.get('hello') == ['world', 'world1', 'world2']

          # Remove a value
          fixedlist.remove('hello', 'world1')

      Saved Plurk tens of thousands of dollars
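The fixed-size idea itself can be modeled with a bounded deque. This is a minimal in-memory sketch, not the fixedlist implementation (which stores its data in Redis); the `Timeline` class and the size of 3 are assumptions chosen for illustration.

```python
from collections import deque


class Timeline:
    """Hypothetical fixed-size timeline: keeps only the newest `max_size` items."""

    def __init__(self, max_size):
        self._items = deque(maxlen=max_size)

    def add(self, value):
        # O(1): when the deque is full, the oldest item is dropped automatically.
        self._items.append(value)

    def get(self):
        return list(self._items)


timeline = Timeline(max_size=3)
for post in ['p1', 'p2', 'p3', 'p4']:
    timeline.add(post)

assert timeline.get() == ['p2', 'p3', 'p4']
```

Because the size is capped, storage per timeline stays constant no matter how many items are ever added, which is what keeps the data from growing without bound.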
  29. Redis+Lua+Python. When you want:
      • More complex data types
      • Better performance
  30. Redis+Python: Incr implementation

          def incr_python(key, delta=1, system='default'):
              client, scripts = get_redis(system)

              with client.pipeline() as p:
                  # WATCH puts the pipeline in immediate mode, so get()
                  # returns the stored value instead of being buffered.
                  p.watch(key)
                  value = delta
                  old = p.get(key)
                  if old:
                      value = int(old) + delta
                  p.set(key, value)
                  p.unwatch()
                  return value
  31. Redis+Lua: Incr implementation

          # Python side
          scripts = {
              'incr': client.register_script(_load_lua_script('incr.lua'))
          }
          ...
          def incr_lua(key, delta=1, system='default'):
              client, scripts = get_redis(system)
              # The script reads its inputs from ARGV: ARGV[1] is the key, ARGV[2] the delta.
              return scripts['incr'](keys=['key', 'delta'], args=[key, delta])

          -- incr.lua
          local delta = tonumber(ARGV[2])
          local value = delta
          local old = tonumber(redis.call('get', ARGV[1]))
          if old then
              value = value + old
          end
          if not redis.call('set', ARGV[1], value) then
              return nil
          end
          return value
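Besides speed, running the increment as a server-side script matters for correctness: Redis executes a Lua script atomically, whereas a separate GET and SET from two clients can interleave. The sketch below simulates that interleaving deterministically with a plain dict standing in for the server; `atomic_incr` is a hypothetical name that models the script's behavior and is not code from the talk.

```python
store = {'counter': 10}

# Two clients both read before either writes (one possible interleaving).
read_a = store['counter']
read_b = store['counter']
store['counter'] = read_a + 1   # client A writes 11
store['counter'] = read_b + 1   # client B also writes 11: A's update is lost
lost_update_result = store['counter']


def atomic_incr(store, key, delta=1):
    """Models read-modify-write as one indivisible step, as a script would run."""
    store[key] = store.get(key, 0) + delta
    return store[key]


store['counter'] = 10
atomic_incr(store, 'counter')   # client A
atomic_incr(store, 'counter')   # client B
atomic_result = store['counter']

assert lost_update_result == 11
assert atomic_result == 12
```

With the interleaved reads, two increments move the counter from 10 to only 11; with the indivisible version it reaches 12.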
  32. Performance: Lua 3x faster
      • Python: time python 300000
        python 300000 37.77s user 12.00s system 73% cpu 1:07.73 total
      • Lua: time python 300000
        python 300000 10.76s user 2.85s system 66% cpu 20.513 total
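The "3x" figure can be checked from the wall-clock totals reported by `time`. The helper `to_seconds` below is a hypothetical name introduced for this calculation.

```python
def to_seconds(total):
    """Convert a `time` total such as '1:07.73' or '20.513' to seconds."""
    seconds = 0.0
    for part in total.split(':'):
        seconds = seconds * 60 + float(part)
    return seconds


python_total = to_seconds('1:07.73')   # 1 minute 7.73 seconds
lua_total = to_seconds('20.513')
speedup = python_total / lua_total
```

The Python run takes 67.73 s against 20.513 s for Lua, a speedup of roughly 3.3x for 300,000 increments.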
  33. fixedlist in Lua: proof of concept, Tokyo Tyrant example
  34. Q & A. More questions: @amix3k