Caching techniques in python, europython2010

Slides from the EuroPython 2010 conference in Birmingham on caching in Python.

  1. Caching techniques in python
      Michael Domanski, EuroPython 2010
      Thursday, 22 July 2010
  2. who I am
      • python developer, professionally for a few years now
      • also experienced in C and Objective-C
      • currently working for 10clouds.com
  3. Interesting intro
      • a bit of theory
      • common patterns
      • common problems
      • common solutions
  4. How I think about cache
      • imagine a giant dict storing all your data
      • you have to manage all data manually
      • or provide some automated behaviour
  5. similar to...
      • manual memory management in C
      • cache is memory
      • and you have to control it manually
  6. profits
      • improved performance
      • ...?
  7. problems
      • managing any type of memory is hard
      • automation often has to be custom-built each time
  8. common patterns
  9. memoization
  10. • very old pattern (circa 1968)
      • we owe the name to Donald Michie
  11. how it works
      • we associate input with output and store it somewhere
      • based on the assumption that for a given input, the output is always the same
  12. code example

          CACHE_DICT = {}

          def cached(key):
              # one fixed key per decorated function: the arguments are not part of the key
              def func_wrapper(func):
                  def arg_wrapper(*args, **kwargs):
                      # compute once; every later call returns the stored value
                      if key not in CACHE_DICT:
                          value = func(*args, **kwargs)
                          CACHE_DICT[key] = value
                      return CACHE_DICT[key]
                  return arg_wrapper
              return func_wrapper
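The decorator above stores one value per fixed `key`, whatever arguments the function receives. Memoization in the stricter sense associates each input with its output, so the arguments themselves have to become the cache key. A minimal sketch (Python 3), assuming hashable positional arguments only; `memoize` is a hypothetical name. On Python 3.2+ the standard library's `functools.lru_cache` covers the same ground.

```python
import functools


def memoize(func):
    """Cache func's result per distinct tuple of positional arguments."""
    results = {}

    @functools.wraps(func)
    def wrapper(*args):
        # the argument tuple is the cache key, so arguments must be hashable
        if args not in results:
            results[args] = func(*args)
        return results[args]

    wrapper.cache = results  # exposed so callers can inspect or clear it
    return wrapper


calls = []


@memoize
def square(n):
    calls.append(n)  # record each real execution
    return n * n


print(square(4), square(4), square(5))  # 16 16 25
print(calls)                            # [4, 5]
```

The second `square(4)` never reaches the function body, which is exactly the input-to-output association described on slide 11.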
  13. what if output can change?
      • our pattern is still useful
      • we simply need to add something
  14. cache invalidation
  15. "There are only two hard problems in Computer Science: cache invalidation and naming things."
      Phil Karlton
  16. • basically, we update data in the cache
      • we need to know when and what to change
      • the more granular you want to be, the harder it gets
  17. code example

          def invalidate(key):
              try:
                  del CACHE_DICT[key]
              except KeyError:
                  print("someone tried to invalidate a key that is not present: %s" % key)
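A short, self-contained run of `cached` and `invalidate` together shows the stale read that invalidation exists to fix. This is Python 3, and `DB` is a hypothetical dict standing in for the database.

```python
CACHE_DICT = {}
DB = {1: 'old'}  # hypothetical stand-in for a real database table


def cached(key):
    def func_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            if key not in CACHE_DICT:
                CACHE_DICT[key] = func(*args, **kwargs)
            return CACHE_DICT[key]
        return arg_wrapper
    return func_wrapper


def invalidate(key):
    try:
        del CACHE_DICT[key]
    except KeyError:
        print("someone tried to invalidate a key that is not present: %s" % key)


@cached('key1')
def get_record():
    return DB[1]


print(get_record())  # old
DB[1] = 'new'        # the source of truth changes...
print(get_record())  # old  (stale: still served from CACHE_DICT)
invalidate('key1')
print(get_record())  # new  (recomputed after invalidation)
```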
  18. common problems
  19. invalidating too much / not enough
      • flushing all data any time something changes
      • not flushing cache at all
      • tragic effects
  20.
          @cached('key1')
          def simple_function1():
              return db_get(id=1)

          @cached('key2')
          def simple_function2():
              return db_get(id=2)

          # SUPPOSE THIS IS IN ANOTHER MODULE
          @cached('big_key1')
          def some_bigger_function():
              """ this function depends on big_key1, key1 and key2 """
              def inner_workings():
                  db_set(1, 'something totally new')
                  #######
                  ## imagine 100 lines of code here :)
                  ######
              inner_workings()
              return [simple_function1(), simple_function2()]

          if __name__ == '__main__':
              simple_function1()
              simple_function2()
              a, b = some_bigger_function()
              assert a == db_get(id=1), "this fails because we didn't invalidate the cache properly"
  21. invalidating too soon / too late
      • your cache has to be synchronised with your db
      • sometimes very hard to spot
      • leads to tragic mistakes
  22.
          @cached('key1')
          def simple_function1():
              return db_get(id=1)

          @cached('key2')
          def simple_function2():
              return db_get(id=2)

          # SUPPOSE THIS IS IN ANOTHER MODULE
          def some_bigger_function():
              db_set(1, 'something')
              value = simple_function1()
              db_set(2, 'something else')
              #### now we know we used 2 cached functions so....
              invalidate('key1')
              invalidate('key2')
              #### now we know we are safe, but for a price
              return simple_function2()

          if __name__ == '__main__':
              some_bigger_function()
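One way to keep the cache in step with the database is to invalidate in the single place where data is written, right after the write. This replaces `invalidate()` calls scattered through the callers. Below is a minimal single-process sketch (Python 3). `DB`, `KEYS_FOR_RECORD` and `update_record` are hypothetical names, and the mapping from records to cache keys has to be maintained by hand.

```python
CACHE_DICT = {}
DB = {1: 'a', 2: 'b'}  # hypothetical stand-in for the database

# hypothetical registry: record id -> cache keys whose values were read from it
KEYS_FOR_RECORD = {1: {'key1'}, 2: {'key2'}}


def cached(key):
    def func_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            if key not in CACHE_DICT:
                CACHE_DICT[key] = func(*args, **kwargs)
            return CACHE_DICT[key]
        return arg_wrapper
    return func_wrapper


def invalidate(key):
    CACHE_DICT.pop(key, None)  # silently ignore keys that were never cached


def update_record(record_id, value):
    """Write to the database, then drop every cache key built from that record."""
    DB[record_id] = value
    # the write and its invalidation live in one function, so callers
    # cannot do one and forget (or delay) the other
    for key in KEYS_FOR_RECORD.get(record_id, ()):
        invalidate(key)


@cached('key1')
def simple_function1():
    return DB[1]


@cached('key2')
def simple_function2():
    return DB[2]


simple_function1()  # warm the cache with 'a'
simple_function2()  # ... and 'b'
update_record(1, 'something')
print(simple_function1())  # something
update_record(2, 'something else')
print(simple_function2())  # something else
```

With several threads or processes, a reader can still fetch the old row before the write and store it after the invalidation. This approach narrows that window but does not close it.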
  23. superposition of dependency
      • a somewhat less obvious problem
      • eventually you will start caching the results of computations
      • you have to know very precisely what your data depends on
  24.
          @cached('key1')
          def simple_function1():
              return db_get(id=1)

          @cached('key2')
          def simple_function2():
              return db_get(id=2)

          # SUPPOSE THIS IS IN ANOTHER MODULE
          @cached('key')
          def some_bigger_function():
              return {
                  '1': simple_function1(),
                  '2': simple_function2(),
                  '3': db_get(id=3),
              }

          if __name__ == '__main__':
              simple_function1()
              # somewhere else
              db_set(1, 'foobar')
              # and again
              db_set(3, 'bazbar')
              invalidate('key')
              # ooops, we forgot something
              data = some_bigger_function()
              assert data['1'] == db_get(id=1), "this fails because we didn't manage to invalidate all the keys"
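The failure above comes from `'key'` being computed from `'key1'` without the cache knowing it. One possible approach is to record those dependencies explicitly and invalidate transitively. The minimal sketch below (Python 3) uses `DEPENDENTS` and `DB` as hypothetical names. The graph is maintained by hand, which is the "know precisely what your data depends on" problem again, just written down in one place.

```python
CACHE_DICT = {}
DB = {1: 'one', 2: 'two', 3: 'three'}  # hypothetical stand-in for the database

# hypothetical, hand-maintained graph: key -> cache keys computed from it
DEPENDENTS = {'key1': {'key'}, 'key2': {'key'}}


def cached(key):
    def func_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            if key not in CACHE_DICT:
                CACHE_DICT[key] = func(*args, **kwargs)
            return CACHE_DICT[key]
        return arg_wrapper
    return func_wrapper


def invalidate(key):
    """Drop key and, transitively, every cache key recorded as depending on it."""
    pending = [key]
    seen = set()
    while pending:
        current = pending.pop()
        if current in seen:  # tolerate cycles in DEPENDENTS
            continue
        seen.add(current)
        CACHE_DICT.pop(current, None)
        pending.extend(DEPENDENTS.get(current, ()))


@cached('key1')
def simple_function1():
    return DB[1]


@cached('key2')
def simple_function2():
    return DB[2]


@cached('key')
def some_bigger_function():
    return {'1': simple_function1(), '2': simple_function2(), '3': DB[3]}


some_bigger_function()  # caches 'key1', 'key2' and 'key'
DB[1] = 'foobar'
DB[3] = 'bazbar'
invalidate('key1')      # drops 'key1' and, through DEPENDENTS, 'key'
data = some_bigger_function()
print(data)  # {'1': 'foobar', '2': 'two', '3': 'bazbar'}
```

Record 3 has no cache key of its own in this sketch. A change to record 3 alone would therefore still need an explicit `invalidate('key')`.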
  25. summing up
      • know your data...
      • be aware of what and when you cache
      • take care when using cached data in computations
  26. common solutions
  27. process level cache
  28. why?
      • very fast access
      • simple to implement
      • very effective as long as you’re using a single process
  29. clever tricks with dicts
  30. code example

          CACHE_DICT = {}

          def cached(key):
              # one fixed key per decorated function: the arguments are not part of the key
              def func_wrapper(func):
                  def arg_wrapper(*args, **kwargs):
                      # compute once; every later call returns the stored value
                      if key not in CACHE_DICT:
                          value = func(*args, **kwargs)
                          CACHE_DICT[key] = value
                      return CACHE_DICT[key]
                  return arg_wrapper
              return func_wrapper
  31. invalidation
  32. code example

          def invalidate(key):
              try:
                  del CACHE_DICT[key]
              except KeyError:
                  print("someone tried to invalidate a key that is not present: %s" % key)
  33. application level cache
  34. memcache
  35. • battle tested
      • scales
      • fast
      • supports a few cool features
      • behaves a lot like a dict
      • supports time-based expiration
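Time-based expiration means each entry carries a deadline, and any read after the deadline behaves like a miss. The process-level sketch below illustrates the idea (Python 3). `TTLCache` is a hypothetical class and is not part of any memcache library. With python-memcache, the corresponding step is passing an expiration time as the third argument to `set`, as slide 46 does with `cache.set(tools.make_key(key), value, exp_time)`.

```python
import time


class TTLCache(object):
    """Dict-backed cache whose entries expire `ttl` seconds after being set."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (expires_at, value)
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def set(self, key, value, ttl):
        self._data[key] = (self._clock() + ttl, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # expired: drop it lazily on read and report a miss
            del self._data[key]
            return None
        return value


now = [0.0]  # fake clock for the demo
cache = TTLCache(clock=lambda: now[0])
cache.set('key1', 'value', ttl=60)
print(cache.get('key1'))  # value
now[0] = 61.0
print(cache.get('key1'))  # None
```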
  36. libraries?
      • python-memcache
      • python-libmemcache
      • python-cmemcache
      • pylibmc
  37. why no benchmarks
      • not the point of this talk :)
      • benchmarks are generic, caching is specific
      • pick your flavour, think for yourself
  38. code example

          import memcache

          cache = memcache.Client(['localhost:11211'])

          def memcached(key):
              def func_wrapper(func):
                  def arg_wrapper(*args, **kwargs):
                      value = cache.get(str(key))
                      # get() returns None on a miss; testing "is None" (not "not value")
                      # keeps falsy results such as 0 or '' from being recomputed every time
                      if value is None:
                          value = func(*args, **kwargs)
                          cache.set(str(key), value)
                      return value
                  return arg_wrapper
              return func_wrapper
  39. invalidation
  40. code example

          def mem_invalidate(key):
              # a stored None reads exactly like a miss, so the next call recomputes
              cache.set(str(key), None)
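The python-memcache `Client` also provides `delete(key)`, which removes the entry outright instead of overwriting it with `None`. The sketch below exercises the decorator and that style of invalidation without a running memcached server (Python 3). It uses a hypothetical dict-backed `FakeClient` in place of the real client. `FakeClient` mimics only the `get`, `set` and `delete` calls used here, and its return values are not meant to match the real library's.

```python
class FakeClient(object):
    """Hypothetical in-memory stand-in for the few memcache.Client methods used here."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)  # None on a miss

    def set(self, key, value, time=0):
        self._store[key] = value     # `time` (expiry) is accepted but ignored here

    def delete(self, key):
        self._store.pop(key, None)


cache = FakeClient()
calls = []


def memcached(key):
    def func_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            value = cache.get(str(key))
            if value is None:
                value = func(*args, **kwargs)
                cache.set(str(key), value)
            return value
        return arg_wrapper
    return func_wrapper


def mem_invalidate(key):
    cache.delete(str(key))  # remove the entry instead of overwriting it with None


@memcached('answer')
def compute():
    calls.append(1)
    return 42


compute()
compute()
print(len(calls))  # 1  (second call is a cache hit)
mem_invalidate('answer')
compute()
print(len(calls))  # 2  (recomputed after invalidation)
```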
  41. batch key management
  42. • what if I don’t want to expire each key manually?
      • that’s a lot to remember
      • and we have to be careful :(
  43. groups?
      • group keys into sets
      • which are tied to one key per set
      • expire one key instead of twenty
  44. how to get there?
      • store some extra data
      • you can store dicts in cache
      • and the cache behaves like a dict
      • so it’s a case of comparing keys and values
  45.
          # we start with a specified key and group
          key = 'some_key'
          group = 'some_group'
          # now retrieve both entries from memcached in one call (get_multi takes a list of keys)
          data = memcached_client.get_multi([key, group])
          # now data is a dict that should look like
          # {'some_key': {'group_key': '1234',
          #               'value': 'some_value'},
          #  'some_group': '1234'}
          value = None
          if data and (key in data) and (group in data):
              # the entry is valid only while its stored group stamp matches the group's current one
              if data[key]['group_key'] == data[group]:
                  value = data[key]['value']
  46.
          def cached(key, group_key='', exp_time=0):
              # we don't want to mix time-based and event-based expiration models
              if group_key:
                  assert exp_time == 0, "can't set expiration time for grouped keys"

              # tools.make_key and make_group_value are helpers not shown here
              def f_wrapper(func):
                  def arg_wrapper(*args, **kwargs):
                      value = None
                      if group_key:
                          data = cache.get_multi([tools.make_key(group_key), tools.make_key(key)])
                          data_dict = data.get(tools.make_key(key))
                          if data_dict:
                              value = data_dict['value']
                              group_value = data_dict['group_value']
                              # .get(): the group key itself may have been evicted
                              if group_value != data.get(tools.make_key(group_key)):
                                  value = None
                      else:
                          # read with the same key format that set() uses below
                          value = cache.get(tools.make_key(key))
                      if value is None:
                          value = func(*args, **kwargs)
                          if exp_time:
                              cache.set(tools.make_key(key), value, exp_time)
                          elif not group_key:
                              cache.set(tools.make_key(key), value)
                          else:
                              # exp_time not set and we have a group_key
                              group_value = make_group_value(group_key)
                              data_dict = {'value': value, 'group_value': group_value}
                              cache.set_multi({
                                  tools.make_key(key): data_dict,
                                  tools.make_key(group_key): group_value,
                              })
                      return value
                  arg_wrapper.__name__ = func.__name__
                  return arg_wrapper
              return f_wrapper
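The slides do not show how a group stamp is created or how a whole group is expired. The self-contained sketch below illustrates the same group-stamp idea (Python 3). `FakeClient`, `get_grouped`, `set_grouped` and `expire_group` are hypothetical names. The client is a dict-backed stand-in for the `get_multi`/`set_multi` calls used above. This sketch reuses the group's existing stamp when one exists, because minting a fresh stamp on every write would expire every other key in the group.

```python
import itertools


class FakeClient(object):
    """Hypothetical dict-backed stand-in offering only get_multi/set_multi."""

    def __init__(self):
        self._store = {}

    def get_multi(self, keys):
        # only keys that are present appear in the result
        return {k: self._store[k] for k in keys if k in self._store}

    def set_multi(self, mapping):
        self._store.update(mapping)


cache = FakeClient()
_stamps = itertools.count(1)  # hypothetical source of fresh group stamps


def get_grouped(key, group):
    """Return the cached value, or None if it is missing or its group was expired."""
    data = cache.get_multi([key, group])
    entry = data.get(key)
    if entry is not None and group in data and entry['group_value'] == data[group]:
        return entry['value']
    return None


def set_grouped(key, group, value):
    """Store value stamped with the group's current stamp (created only if absent)."""
    group_value = cache.get_multi([group]).get(group)
    if group_value is None:
        group_value = next(_stamps)
    cache.set_multi({key: {'value': value, 'group_value': group_value},
                     group: group_value})


def expire_group(group):
    """Expire every key in the group at once by giving the group a new stamp."""
    cache.set_multi({group: next(_stamps)})


set_grouped('user:1', 'users', 'alice')
set_grouped('user:2', 'users', 'bob')
print(get_grouped('user:1', 'users'))  # alice
expire_group('users')
print(get_grouped('user:1', 'users'), get_grouped('user:2', 'users'))  # None None
```

Expiring the group writes one key instead of twenty. The outdated entries are never deleted explicitly. They stay in the cache until they are overwritten or evicted.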
  47. questions?
  48. code samples @ http://github.com/mdomans/europython2010
  49. follow me
      twitter: mdomans
      blog: blog.mdomans.com