Caching techniques in python, europython2010
Upcoming SlideShare
Loading in...5
×
 

Caching techniques in python, europython2010

on

  • 2,696 views

Slides from europython2010 conference in Birmingham on the subject of caching in python.

Slides from europython2010 conference in Birmingham on the subject of caching in python.

Statistics

Views

Total Views
2,696
Views on SlideShare
2,680
Embed Views
16

Actions

Likes
2
Downloads
42
Comments
0

5 Embeds 16

http://blog.10clouds.com 9
http://www.linkedin.com 3
https://www.linkedin.com 2
http://www.faveous.com 1
http://www.slashdocs.com 1

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Caching techniques in python, europython2010 Caching techniques in python, europython2010 Presentation Transcript

  • Caching techinques in python Michael Domanski europython 2010 czwartek, 22 lipca 2010
  • who I am • python developer, professionally for a few years now • experienced also in c and objective-c • currently working for 10clouds.com czwartek, 22 lipca 2010
  • Interesting intro • a bit of theory • common patterns • common problems • common solutions czwartek, 22 lipca 2010
  • How I think about cache • imagine a giant dict storing all your data • you have to manage all data manually • or provide some automated behaviour czwartek, 22 lipca 2010
  • similar to.... • manual memory managment in c • cache is memory • and you have to controll it manually czwartek, 22 lipca 2010
  • profits • improved performance • ...? czwartek, 22 lipca 2010
  • problems • managing any type of memory is hard • automation often have to be done custom each time czwartek, 22 lipca 2010
  • common patterns czwartek, 22 lipca 2010
  • memoization czwartek, 22 lipca 2010
  • • very old pattern (circa 1968) • we own the name to Donald Mitchie czwartek, 22 lipca 2010
  • how it works • we assosciate input with output, and store in somewhere • based on the assumption that for a given input, output is always the same czwartek, 22 lipca 2010
  • code example CACHE_DICT = {} def cached(key): def func_wrapper(func): def arg_wrapper(*args, **kwargs): if not key in CACHE_DICT: value = func(*args, **kwargs) CACHE_DICT[key] = value return CACHE_DICT[key] return arg_wrapper return func_wrapper czwartek, 22 lipca 2010
  • what if output can change? • our pattern is still usefull • we simply need to add something czwartek, 22 lipca 2010
  • cache invalidation czwartek, 22 lipca 2010
  • There are only two hard problems in Computer Science: cache invalidation and naming things Phil Karlton czwartek, 22 lipca 2010
  • • basically, we update data in cache • we need to know when and what to change • the more granular you want to be, the harder it gets czwartek, 22 lipca 2010
  • code example def invalidate(key): try: del CACHE_DICT[key] except KeyError: print "someone tried to invalidate not present key: %s" %key czwartek, 22 lipca 2010
  • common problems czwartek, 22 lipca 2010
  • invalidating too much/ not enough • flushing all data any time something changes • not flushing cache at all • tragic effects czwartek, 22 lipca 2010
  • @cached('key1') def simple_function1(): return db_get(id=1) @cached('key2') def simple_function2(): return db_get(id=2) # SUPPOSE THIS IS IN ANOTHER MODULE @cached('big_key1') def some_bigger_function(): """ this function depends on big_key1, key1 and key2 """ def inner_workings(): db_set(1, 'something totally new') ####### ## imagine 100 lines of code here :) ###### inner_workings() return [simple_function1(),simple_function2()] if __name__ == '__main__': simple_function1() simple_function2() a,b = some_bigger_function() assert a == db_get(id=1), "this fails because we didn't invalidated cache properly" czwartek, 22 lipca 2010
  • invalidating too soon/ too late • your cache have to be synchronised to you db • sometimes very hard to spot • leads to tragic mistakes czwartek, 22 lipca 2010
  • @cached('key1') def simple_function1(): return db_get(id=1) @cached('key2') def simple_function2(): return db_get(id=2) # SUPPOSE THIS IS IN ANOTHER MODULE def some_bigger_function(): db_set(1, 'something') value = simple_function1() db_set(2, 'something else') #### now we know we used 2 cached functions so.... invalidate('key1') invalidate('key2') #### now we know we are safe, but for a price return simple_function2() if __name__ == '__main__': some_bigger_function() czwartek, 22 lipca 2010
  • superposition of dependancy • somehow less obvious problem • eventually you will start caching effects of computation • you have to know very preciselly of what your data is dependant czwartek, 22 lipca 2010
  • @cached('key1') def simple_function1(): return db_get(id=1) @cached('key2') def simple_function2(): return db_get(id=2) # SUPPOSE THIS IS IN ANOTHER MODULE @cached('key') def some_bigger_function(): return { '1': simple_function1(), '2': simple_function2(), '3': db_get(id=3) } if __name__ == '__main__': simple_function1() # somewhere else db_set(1, 'foobar') # and again db_set(3, 'bazbar') invalidate('key') # ooops, we forgot something data = some_bigger_function() assert data['1'] == db_get(id=1), "this fails because we didn't manage to invalidate all the keys" czwartek, 22 lipca 2010
  • summing up • know your data.... • be aware what and when you cache • take care when using cached data in computation czwartek, 22 lipca 2010
  • common solutions czwartek, 22 lipca 2010
  • process level cache czwartek, 22 lipca 2010
  • why? • very fast access • simple to implement • very effective as long as you’re using single process czwartek, 22 lipca 2010
  • clever tricks with dicts czwartek, 22 lipca 2010
  • code example CACHE_DICT = {} def cached(key): def func_wrapper(func): def arg_wrapper(*args, **kwargs): if not key in CACHE_DICT: value = func(*args, **kwargs) CACHE_DICT[key] = value return CACHE_DICT[key] return arg_wrapper return func_wrapper czwartek, 22 lipca 2010
  • invalidation czwartek, 22 lipca 2010
  • code example def invalidate(key): try: del CACHE_DICT[key] except KeyError: print "someone tried to invalidate not present key: %s" %key czwartek, 22 lipca 2010
  • application level cache czwartek, 22 lipca 2010
  • memcache czwartek, 22 lipca 2010
  • • battle tested • scales • fast • supports a few cool features • behaves a lot like dict • supports time-based expiration czwartek, 22 lipca 2010
  • libraries? • python-memcache • python-libmemcache • python-cmemcache • pylibmc czwartek, 22 lipca 2010
  • why no benchmarks • not the point of this talk :) • benchmarks are generic, caching is specific • pick your flavour, think for yourself czwartek, 22 lipca 2010
  • code example cache = memcache.Client(['localhost:11211']) def memcached(key): def func_wrapper(func): def arg_wrapper(*args, **kwargs): value = cache.get(str(key)) if not value: value = func(*args, **kwargs) cache.set(str(key), value) return value return arg_wrapper return func_wrapper czwartek, 22 lipca 2010
  • invalidation czwartek, 22 lipca 2010
  • code example def mem_invalidate(key): cache.set(str(key), None) czwartek, 22 lipca 2010
  • batch key managment czwartek, 22 lipca 2010
  • • what if I don’t want to expire each key manually • that’s a lot to remember • and we have to be carefull :( czwartek, 22 lipca 2010
  • groups? • group keys into sets • which are tied to one key per set • expire one key, instead of twenty czwartek, 22 lipca 2010
  • how to get there? • store some extra data • you can store dicts in cache • and cache behaves like dict • so it’s a case of comparing keys and values czwartek, 22 lipca 2010
  • #we start with specified key and group key='some_key' group='some_group' # now retrieve some data from memcached data=memcached_client.get_multi(key, group) # now data is a dict that should look like #{'some_key' :{'group_key' : '1234', # 'value' : 'some_value' }, # 'some_group' : '1234'} # if data and (key in data) and (group in data): if data[key]['group_key']==data[group]: return data[key]['value'] czwartek, 22 lipca 2010
  • def cached(key, group_key='', exp_time=0 ): # we don't want to mix time based and event based expiration models if group_key : assert exp_time==0, "can't set expiration time for grouped keys" def f_wrapper(func): def arg_wrapper(*args, **kwargs): value = None if group_key: data = cache.get_multi([tools.make_key(group_key)]+[tools.make_key(key)]) data_dict = data.get(tools.make_key(key)) if data_dict: value = data_dict['value'] group_value = data_dict['group_value'] if group_value != data[tools.make_key(group_key)]: value = None else: value = cache.get(key) if not value: value = func(*args, **kwargs) if exp_time: cache.set(tools.make_key(key), value, exp_time) elif not group_key: cache.set(tools.make_key(key), value) else: # exp_time not set and we have group_keys group_value = make_group_value(group_key) data_dict = { 'value':value, 'group_value': group_value} cache.set_multi({ tools.make_key(key):data_dict, tools.make_key(group_key):group_value }) return value arg_wrapper.__name__ = func.__name__ return arg_wrapper return f_wrapper czwartek, 22 lipca 2010
  • questions? czwartek, 22 lipca 2010
  • code samples @ http://github.com/ mdomans/europython2010 czwartek, 22 lipca 2010
  • follow me twitter: mdomans blog: blog.mdomans.com czwartek, 22 lipca 2010