Redis and Python

    by Josiah Carlson
        @dr_josiah
 dr-josiah.blogspot.com
  bit.ly/redis-in-action
Redis and Python;
 It's PB & J time
     by Josiah Carlson
         @dr_josiah
  dr-josiah.blogspot.com
   bit.ly/redis-in-action
What will be covered
•   Who am I?
•   What is Redis?
•   Why Redis with Python?
•   Cool stuff you can do by combining them
Who am I?
•   A Python user for 12+ years
•   Former python-dev bike-shedder
•   Former maintainer of Python async sockets libraries
•   Author of a few small OS projects
    o   rpqueue, parse-crontab, async_http, timezone-utils, PyPE

•   Worked at some cool places you've never heard of
    (Networks In Motion, Ad.ly)
•   Cool places you have (Google)
•   And cool places you will (ChowNow)
•   Heavy user of Redis
•   Author of upcoming Redis in Action
What is Redis?
•   In-memory database/data structure server
     o Limited to main memory; vm and diskstore defunct
•   Persistence via snapshot or append-only file
•   Support for master/slave replication (multiple slaves
    and slave chaining supported)
     o No master-master, don't even try
     o Client-side sharding
     o Cluster is in-progress
•   Five data structures + publish/subscribe
     o Strings, Lists, Sets, Hashes, Sorted Sets (ZSETs)
•   Server-side scripting with Lua in Redis 2.6
What is Redis? (compared to other
databases/caches)
• Memcached
  o in-memory, no-persistence, counters, strings, very fast, multi-threaded

• Redis
  o in-memory, optionally persisted, data structures, very fast, server-side
    scripting, single-threaded
• MongoDB
  o on-disk, speed inversely related to data integrity, bson, master/slave,
    sharding, multi-master, server-side mapreduce, database-level locking
• Riak
  o on-disk, pluggable data stores, multi-master sharding, RESTful API,
    server-side map-reduce, (Erlang + C)
• MySQL/PostgreSQL
  o on-disk/in-memory, pluggable data stores, master/slave, sharding,
    stored procedures, ...
What is Redis? (Strings)
•   Really scalars of a few different types
    o   Character strings
          concatenate values to the end
          get/set individual bits
          get/set byte ranges
    o   Integers (platform long int)
          increment/decrement
          auto "casting"
    o   Floats (IEEE 754 FP Double)
          increment/decrement
          auto "casting"
What is Redis? (Lists)
•   Doubly-linked list of character strings
    o   Push/pop from both ends
    o   [Blocking] pop from multiple lists
    o   [Blocking] pop from one list, push on another
    o   Get/set/search for item in a list
    o   Sortable
What is Redis? (Sets)
•   Unique unordered sequence of character
    strings
    o   Backed by a hash table
    o   Add, remove, check membership, pop, random pop
    o   Set intersection, union, difference
    o   Sortable
What is Redis? (Hashes)
•   Key-value mapping inside a key
    o   Get/Set/Delete single/multiple
    o   Increment values by ints/floats
    o   Bulk fetch of Keys/Values/Both
    o   Sort-of like a small version of Redis that only
        supports strings/ints/floats
What is Redis? (Sorted Sets -
ZSETs)
•   Like a Hash, with 'members' and 'scores',
    scores limited to float values
    o   Get, set, delete, increment
    o   Can be accessed by the sorted order of the
        (score,member) pair
          By score
          By index
What is Redis? (Publish/Subscribe)
•   Readers subscribe to "channels" (exact
    strings or patterns)
•   Writers publish to channels, broadcasting to
    all subscribers
•   Messages are transient
Why Redis with Python?
•   The power of Python lies in:
    o   Reasonably sane syntax/semantics
    o   Easy manipulation of data and data structures
    o   Large and growing community
•   Redis also has:
    o   Reasonably sane syntax/semantics
    o   Easy manipulation of data and data structures
    o   Medium-sized and growing community
    o   Available as remote server
         Like a remote IPython, only for data
         So useful, people have asked for a library version
Per-hour and Per-day hit counters
from itertools import imap
import redis
def process_lines(prefix, logfile):
   conn = redis.Redis()
   for log in imap(parse_line, open(logfile, 'rb')):
       time = log.timestamp.isoformat()
       hour = time.partition(':')[0]
       day = time.partition('T')[0]
       conn.zincrby(prefix + hour, log.path)
       conn.zincrby(prefix + day, log.path)
       conn.expire(prefix + hour, 7*86400)
       conn.expire(prefix + day, 30*86400)
Per-hour and Per-day hit counters
(with pipelines for speed)
from itertools import imap
import redis
def process_lines(prefix, logfile):
    pipe = redis.Redis().pipeline(False)
    for i, log in enumerate(imap(parse_line, open(logfile, 'rb'))):
        time = log.timestamp.isoformat()
        hour = time.partition(':')[0]
        day = time.partition('T')[0]
        pipe.zincrby(prefix + hour, log.path)
        pipe.zincrby(prefix + day, log.path)
        pipe.expire(prefix + hour, 7*86400)
        pipe.expire(prefix + day, 30*86400)
        if not i % 1000:
               pipe.execute()
    pipe.execute()
Simple task queue - add/run items
import json
import redis


def add_item(queue, name, *args, **kwargs):
   redis.Redis().rpush(queue,
       json.dumps([name, args, kwargs]))


def execute_one(queues):
   item = redis.Redis().blpop(queues, 30)
   name, args, kwargs = json.loads(item)
   REGISTRY[name](*args, **kwargs)
Simple task queue - register tasks
REGISTRY = {}
def task(queue):
    def wrapper(function):
        def defer(*args, **kwargs):
            add_item(queue, name, *args, **kwargs)
        name = function.__name__
        if name in REGISTRY:
            raise Exception(
                   "Duplicate callback %s"%(name,))
        REGISTRY[name] = function
        return defer
    if isinstance(queue, str):
        return wrapper
    function, queue = queue, 'default'
    return wrapper(function)
Simple task queue – register tasks
@task('high')
def do_something(arg):
    pass


@task
def do_something_else(arg):
    pass
Cool stuff to do...
•    Reddit                •   Publish/Subscribe
•    Caching               •   Messaging
•    Cookies               •   Search engines
•    Analytics             •   Ad targeting
•    Configuration         •   Twitter
     management            •   Chat rooms
•    Autocomplete          •   Job search
•    Distributed locks     •   ...
•    Counting Semaphores
•    Task queues
Thank you
      @dr_josiah
dr-josiah.blogspot.com
 bit.ly/redis-in-action



     Questions?

Python redis talk

  • 1.
    Redis and Python by Josiah Carlson @dr_josiah dr-josiah.blogspot.com bit.ly/redis-in-action
  • 2.
    Redis and Python; It's PB & J time by Josiah Carlson @dr_josiah dr-josiah.blogspot.com bit.ly/redis-in-action
  • 3.
    What will becovered • Who am I? • What is Redis? • Why Redis with Python? • Cool stuff you can do by combining them
  • 4.
    Who am I? • A Python user for 12+ years • Former python-dev bike-shedder • Former maintainer of Python async sockets libraries • Author of a few small OS projects o rpqueue, parse-crontab, async_http, timezone-utils, PyPE • Worked at some cool places you've never heard of (Networks In Motion, Ad.ly) • Cool places you have (Google) • And cool places you will (ChowNow) • Heavy user of Redis • Author of upcoming Redis in Action
  • 5.
    What is Redis? • In-memory database/data structure server o Limited to main memory; vm and diskstore defunct • Persistence via snapshot or append-only file • Support for master/slave replication (multiple slaves and slave chaining supported) o No master-master, don't even try o Client-side sharding o Cluster is in-progress • Five data structures + publish/subscribe o Strings, Lists, Sets, Hashes, Sorted Sets (ZSETs) • Server-side scripting with Lua in Redis 2.6
  • 6.
    What is Redis?(compared to other databases/caches) • Memcached o in-memory, no-persistence, counters, strings, very fast, multi-threaded • Redis o in-memory, optionally persisted, data structures, very fast, server-side scripting, single-threaded • MongoDB o on-disk, speed inversely related to data integrity, bson, master/slave, sharding, multi-master, server-side mapreduce, database-level locking • Riak o on-disk, pluggable data stores, multi-master sharding, RESTful API, server-side map-reduce, (Erlang + C) • MySQL/PostgreSQL o on-disk/in-memory, pluggable data stores, master/slave, sharding, stored procedures, ...
  • 7.
    What is Redis?(Strings) • Really scalars of a few different types o Character strings  concatenate values to the end  get/set individual bits  get/set byte ranges o Integers (platform long int)  increment/decrement  auto "casting" o Floats (IEEE 754 FP Double)  increment/decrement  auto "casting"
  • 8.
    What is Redis?(Lists) • Doubly-linked list of character strings o Push/pop from both ends o [Blocking] pop from multiple lists o [Blocking] pop from one list, push on another o Get/set/search for item in a list o Sortable
  • 9.
    What is Redis?(Sets) • Unique unordered sequence of character strings o Backed by a hash table o Add, remove, check membership, pop, random pop o Set intersection, union, difference o Sortable
  • 10.
    What is Redis?(Hashes) • Key-value mapping inside a key o Get/Set/Delete single/multiple o Increment values by ints/floats o Bulk fetch of Keys/Values/Both o Sort-of like a small version of Redis that only supports strings/ints/floats
  • 11.
    What is Redis?(Sorted Sets - ZSETs) • Like a Hash, with 'members' and 'scores', scores limited to float values o Get, set, delete, increment o Can be accessed by the sorted order of the (score,member) pair  By score  By index
  • 12.
    What is Redis?(Publish/Subscribe) • Readers subscribe to "channels" (exact strings or patterns) • Writers publish to channels, broadcasting to all subscribers • Messages are transient
  • 13.
    Why Redis withPython? • The power of Python lies in: o Reasonably sane syntax/semantics o Easy manipulation of data and data structures o Large and growing community • Redis also has: o Reasonably sane syntax/semantics o Easy manipulation of data and data structures o Medium-sized and growing community o Available as remote server  Like a remote IPython, only for data  So useful, people have asked for a library version
  • 14.
    Per-hour and Per-dayhit counters from itertools import imap import redis def process_lines(prefix, logfile): conn = redis.Redis() for log in imap(parse_line, open(logfile, 'rb')): time = log.timestamp.isoformat() hour = time.partition(':')[0] day = time.partition('T')[0] conn.zincrby(prefix + hour, log.path) conn.zincrby(prefix + day, log.path) conn.expire(prefix + hour, 7*86400) conn.expire(prefix + day, 30*86400)
  • 15.
    Per-hour and Per-dayhit counters (with pipelines for speed) from itertools import imap import redis def process_lines(prefix, logfile): pipe = redis.Redis().pipeline(False) for i, log in enumerate(imap(parse_line, open(logfile, 'rb'))): time = log.timestamp.isoformat() hour = time.partition(':')[0] day = time.partition('T')[0] pipe.zincrby(prefix + hour, log.path) pipe.zincrby(prefix + day, log.path) pipe.expire(prefix + hour, 7*86400) pipe.expire(prefix + day, 30*86400) if not i % 1000: pipe.execute() pipe.execute()
  • 16.
    Simple task queue- add/run items import json import redis def add_item(queue, name, *args, **kwargs): redis.Redis().rpush(queue, json.dumps([name, args, kwargs])) def execute_one(queues): item = redis.Redis().blpop(queues, 30) name, args, kwargs = json.loads(item) REGISTRY[name](*args, **kwargs)
  • 17.
    Simple task queue- register tasks REGISTRY = {} def task(queue): def wrapper(function): def defer(*args, **kwargs): add_item(queue, name, *args, **kwargs) name = function.__name__ if name in REGISTRY: raise Exception( "Duplicate callback %s"%(name,)) REGISTRY[name] = function return defer if isinstance(queue, str): return wrapper function, queue = queue, 'default' return wrapper(function)
  • 18.
    Simple task queue– register tasks @task('high') def do_something(arg): pass @task def do_something_else(arg): pass
  • 19.
    Cool stuff todo... • Reddit • Publish/Subscribe • Caching • Messaging • Cookies • Search engines • Analytics • Ad targeting • Configuration • Twitter management • Chat rooms • Autocomplete • Job search • Distributed locks • ... • Counting Semaphores • Task queues
  • 20.
    Thank you @dr_josiah dr-josiah.blogspot.com bit.ly/redis-in-action Questions?