Caching Techniques in Python

A presentation on caching techniques in Python by Michał Chruszcz & Michał Domański, given at the RuPy conference in Poznań, Poland, 2009.

1. CACHING TECHNIQUES IN PYTHON
RuPy 2009
Michał Chruszcz & Michał Domański
Sensi Soft Ltd. (http://www.sensisoft.com/)

2. CACHING AND HOW TO EAT IT PROPERLY
Why we need to cache data, and what problems to expect while doing it.

3. SOME DATA MAY TAKE TIME TO LOAD
• the user is waiting
• eventually they will leave
• that's bad

4. MAKE IT FASTER
• it usually takes more time to retrieve data from the DB than from a cache
• gains depend heavily on the level of caching involved and the amount of data retrieved

5. UPS AND DOWNS OF CACHING
The easiest way to stress yourself a lot.

6. WHY ISN'T THIS PAGE CHANGING?
Also known as "invalidation is painful".

7. PROBLEM 1: INVALIDATING TOO MUCH / NOT ENOUGH

8. invalidating too much
• usually happens when expiring large sets of keys is as easy as invalidating one key
• you can expect this if one coder writes the actual code and another does the caching part

9. invalidating not enough
• commonly happens when one has problems managing all the references in one's code
• usually occurs at a higher-than-basic level of logic

10. EXAMPLE

    @cached('key1')
    def simple_function1():
        return get_from_db(id=1)

    @cached('key2')
    def simple_function2():
        return get_from_db(id=2)

    # somewhere else in the code
    @cached('big_key1')
    def some_bigger_function():
        """This function depends on big_key1, key1 and key2."""
        return [simple_function1(), simple_function2()]

11. PROBLEM 2: INVALIDATING TOO LATE / TOO SOON

12. invalidating too soon
• common in environments with lots of code saving and fetching values again and again
• generates extra overhead because of all the get/set operations
• can be an effect of poor code design
• sometimes we do so much saving and retrieving that caching isn't the best idea at all

13. invalidating too late
• very common when we update large amounts of data that depend on each other
• sometimes easy to spot, sometimes not; can generate very obscure problems

14. EXAMPLE

    @cached('key1')
    def simple_function1():
        return get_from_db(id=1)

    @cached('key2')
    def simple_function2():
        return get_from_db(id=2)

    # and then
    def some_bigger_function(data):
        put_to_db(id=1, data=data)
        # somewhere far away
        value = simple_function1()  # may still return the stale cached value
        put_to_db(id=2, data=value)
        # and somewhere totally remote (like 100 lines of code later)
        invalidate_cache('key1')
        invalidate_cache('key2')
        return simple_function2()

15. DEPENDENCY SUPERPOSITION
Do you really know "where" you want to put your data?

16. (continued)
• get object X where X.a=1 and X.b=2
• what if X has a parameter 'c' that at some point becomes important (or is simply added to the implementation after the caching code was written)?
• for example, if you have multiple languages on your site, do you need to consider the language a parameter of the cache key?

17. EXAMPLE

    class ClassX:
        def __init__(self, a, b, c):
            self.a = a
            self.b = b
            # this line was added later
            self.c = c

    def get(*args, **kwargs):
        """Written by the original code author."""
        # note: the key ignores the later-added 'c' parameter
        key = 'class_X_key_%s_%s' % (args[0], args[1])

        @cached(key)
        def _get():
            return get_from_db(ClassX, *args, **kwargs)

        return _get()

18. QUESTIONS?
Does it make sense?

19. PROCESS-LEVEL OBJECT CACHING - MEMOIZE

20. THE IDEA BEHIND MEMOIZATION
• Implementation simplicity
• Algorithm maturity
• Very high performance

21. THE ANSWER TO LIFE, THE UNIVERSE AND EVERYTHING

    def the_answer_to_life_the_universe_and_everything():
        print('Performing computation taking 7½ million years...')
        # We'll skip it for the time being.
        # sleep(7.5 * 10 ** 6 * 365 * 24 * 60 * 60)
        return 42

22. SIMPLE IMPLEMENTATION

    _cache = {}  # Global dict - our storage

    def memoize(key):
        def _decorating_wrapper(func):
            def _caching_wrapper(*args, **kwargs):
                cache_key = normalize_key(key, args, kwargs)
                # Value has been cached - use it
                if cache_key in _cache:
                    return _cache[cache_key]
                # Store the return value of the function
                ret = func(*args, **kwargs)
                _cache[cache_key] = ret
                return ret
            return _caching_wrapper
        return _decorating_wrapper

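The deck never shows normalize_key; a minimal sketch of what it could look like, assuming the base key and the call arguments are simply flattened into one string (this helper is hypothetical, not from the talk):

    def normalize_key(key, args, kwargs):
        # Build a single hashable cache key out of the base key and the call arguments.
        parts = [str(key)]
        parts += [repr(a) for a in args]
        parts += ['%s=%r' % (k, v) for k, v in sorted(kwargs.items())]
        return ':'.join(parts)
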
23. THE ANSWER TO LIFE, THE UNIVERSE AND EVERYTHING... USING MEMOIZE

    @memoize('the_ultimate_answer')
    def the_answer_to_life_the_universe_and_everything():
        print('Performing computation taking 7½ million years...')
        # We'll skip it for the time being.
        # sleep(7.5 * 10 ** 6 * 365 * 24 * 60 * 60)
        return 42

24. CHARACTERISTICS (A.K.A. PROS & CONS)
• Performance - virtually the highest possible; objects are read from a global module-level dictionary
• Local scope - the cache is available in the current process only
• Hard cache - never expires
• Threading issues - possible race conditions (see the sketch below)

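The race conditions aren't addressed in the deck; one way to harden the decorator is to guard the dictionary with a lock. A sketch, not the authors' code - under heavy contention the wrapped function may still run more than once for the same key:

    import threading

    _cache = {}
    _cache_lock = threading.Lock()

    def memoize(key):
        def _decorating_wrapper(func):
            def _caching_wrapper(*args, **kwargs):
                cache_key = normalize_key(key, args, kwargs)
                with _cache_lock:
                    if cache_key in _cache:
                        return _cache[cache_key]
                # Compute outside the lock so a slow function doesn't block other threads.
                ret = func(*args, **kwargs)
                with _cache_lock:
                    _cache[cache_key] = ret
                return ret
            return _caching_wrapper
        return _decorating_wrapper
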
25. MEMOIZE WITH TIME-BASED EXPIRATION

    import time

    _timestamps = {}       # expiry time for each cache key
    VALIDITY_PERIOD = 300  # seconds; pick whatever suits your data

    def memoize(key):
        def _decorating_wrapper(func):
            def _caching_wrapper(*args, **kwargs):
                cache_key = normalize_key(key, args, kwargs)
                now = time.time()
                # If the value is cached and still valid - just use it
                if _timestamps.get(cache_key, now) > now:
                    return _cache[cache_key]
                # Store the return value of the function
                ret = func(*args, **kwargs)
                _cache[cache_key] = ret
                _timestamps[cache_key] = now + VALIDITY_PERIOD
                return ret
            return _caching_wrapper
        return _decorating_wrapper

26. DISTRIBUTED OBJECT CACHE - MEMCACHED
How cool is that?

27. WAIT. MEMCACHED?
• one big hash table
• fast
  • based on libevent, can perform non-blocking I/O operations

28. PYTHON LIBRARIES FOR MEMCACHED

29. PYTHON-MEMCACHE
• a pythonic way of interacting with memcached
• implemented in pure Python
• actually, many people never go beyond this one

30. PYTHON-CMEMCACHE
• _cmemcache implements StringClient in C (supports only string values)
• the standard Client is built on top of it, with conversion done via pickling

31. PYTHON-LIBMEMCACHE
• written using Pyrex, wraps the libmemcache library
• going to have UDP and binary protocol support soon

32. PYLIBMC
• usually the fastest of all
• requires libmemcached
• has a binary protocol, supports pooling
• lets you choose the behaviours of the underlying libmemcached library

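A minimal pylibmc connection sketch (the server address, timeout and behaviour flags here are illustrative, not from the talk):

    import pylibmc

    # Connect over the binary protocol and tune a couple of libmemcached behaviours.
    client = pylibmc.Client(['127.0.0.1:11211'], binary=True,
                            behaviors={'tcp_nodelay': True, 'ketama': True})

    client.set('some_key', {'answer': 42}, time=60)  # non-string values are pickled transparently
    print(client.get('some_key'))
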
33. WHY NO SPEED COMPARISON BENCHMARK?
• implementations differ in speed
• not so important
• do your own benchmarks, each case is different
• thinking is much more important

34. SOME BASIC USAGE...
A few examples closely related to using memcached in your application.

35. VERY BASIC AND VERY TEDIOUS (YOU PROBABLY KNOW IT FROM THE WEB)

    # somewhere
    client.set('some_key', 'some_value')

    # somewhere else in the code
    client.get('some_key')

36. SOMETHING THAT ACTUALLY MAY HAPPEN

    # somewhere
    client.set('some_key', some_function())

    # somewhere else
    client.get('some_key')

37. ACTUALLY, DECORATORS ARE WHAT MAKE IT FUN

    @cache('some_key')
    def some_function():
        return some_value_from_db()

    cached_value = some_function()

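The @cache decorator itself isn't shown in the deck; a rough memcached-backed sketch using python-memcache (the default timeout and the miss handling are assumptions):

    import memcache

    client = memcache.Client(['127.0.0.1:11211'])

    def cache(key, timeout=60):
        """Cache the decorated function's return value in memcached under `key`."""
        def _decorator(func):
            def _wrapper(*args, **kwargs):
                value = client.get(key)
                if value is None:  # cache miss (note: a cached None is indistinguishable)
                    value = func(*args, **kwargs)
                    client.set(key, value, time=timeout)
                return value
            return _wrapper
        return _decorator
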
38. WHY IS THE LAST ONE SO MUCH FUN?
• Transparent caching layer
  • we write code and add caching the way it suits our code best
  • someone re-using the code doesn't have to think about the cache
• clean code separation

39. SOMETHING USUALLY FORGOTTEN
• memcached implements retrieval of multiple values
• significantly lowers the overhead caused by repeating the same set of operations for each key

40. BENCHMARK

    class BigObject(object):
        def __init__(self, key, size=10000):
            self.object = str(key) * size

    def test_multi():
        # 'keys' is assumed to be the list of all 10,000 key names
        vals = memcache_client.get_multi(keys)

    def test_single():
        for index in xrange(10000):
            memcache_client.get('key_%s' % index)

    ########
    # test_multi completed in 392.811 ms
    # test_single completed in 1257.483 ms

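The timing harness isn't shown on the slide; the numbers could have been produced with something as simple as this (a sketch, not the authors' code):

    import time

    def run(test):
        start = time.time()
        test()
        print('%s completed in %.3f ms' % (test.__name__, (time.time() - start) * 1000.0))

    run(test_multi)
    run(test_single)
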
41. GROUPS
• What?
  • manage whole sets of keys
  • implemented as another key in the cache
• Useful
  • you don't have to get all the keys to use them
  • an example use case: cached objects that depend on each other in a way that isn't known in advance

42. HOW IT WORKS

    # we start with a specific key and group
    key = 'some_key'
    group = 'some_group'

    # now retrieve some data from memcached
    data = memcached_client.get_multi([key, group])
    # data is a dict that should look like:
    # {'some_key': {'group_key': '1234',
    #               'value': 'some_value'},
    #  'some_group': '1234'}

    if data and (key in data) and (group in data):
        if data[key]['group_key'] == data[group]:
            return data[key]['value']

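The deck only shows the read path; a hedged sketch of what the write and invalidation side might look like under the same scheme (the helper names and the timestamp-as-version idea are assumptions):

    import time

    def set_in_group(client, key, group, value):
        # Make sure the group has a version token; reuse it if it already exists.
        group_version = client.get(group)
        if group_version is None:
            group_version = str(time.time())
            client.set(group, group_version)
        # Store the value together with the group version it was written under.
        client.set(key, {'group_key': group_version, 'value': value})

    def invalidate_group(client, group):
        # Bump the group version; everything written under the old version becomes stale.
        client.set(group, str(time.time()))
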
43. EXPIRATION BY TIME
• Provided by memcached itself
• Usable for summary data computed over a large data set
• Or when you aren't able to provide on-event expiration and you want a defence against stale values

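In practice this is just the timeout argument to set(); for example, with python-memcache (the key, the 5-minute timeout and compute_summary() are illustrative placeholders):

    import memcache

    client = memcache.Client(['127.0.0.1:11211'])

    # Cache an expensive summary for 5 minutes; memcached drops it on its own afterwards.
    client.set('daily_summary', compute_summary(), time=300)
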
44. WHY NOT CACHE RENDERED PARTS OF A WEB PAGE?
• actually, we can...
• it has its pitfalls

45. DJANGO CACHE TEMPLATE TAG AND CACHING PARTS OF A WEB PAGE
• when caching at such a high level, the list of dependencies becomes long
• even something simple can have lots of code under the hood
• Django gives us expiration by timeout and the ability to identify a value by any number of arguments

46. EXAMPLE

    {% load cache %}
    {% cache 500 sidebar request.user.username %}
        {% comment %}
          .. sidebar for logged in user ..
        {% endcomment %}
        {{ user.nickname }}
    {% endcache %}

47. CACHING THE WHOLE RESPONSE
• the speed boost is tremendous
  • practically nothing to do for your app
• expiring something sensibly becomes almost impossible
  • timeout expiration becomes your friend

48. EXAMPLE

    from django.views.decorators.cache import cache_page

    @cache_page(60 * 15)
    def my_view(request):
        """
        some view logic here... :)
        """

49. REVERSE PROXY CACHING - VARNISH

50. WHAT IS VARNISH FOR?
• Brilliant top performance
• Early stage of responding to the user
• Distribution

51. HOW TO BUILD A CACHING-FRIENDLY WEB SITE?
• Static vs. dynamic parts of a web page
• No duplicated content
• Separate language versions are also duplicated content
• HTTP headers defining caching rules

52. DYNAMIC PARTS OF A WEB PAGE

53. LANGUAGE VERSIONS

54. HTTP HEADERS
• Expires - HTTP/1.0 header
• Pragma - HTTP/1.0 header
• Cache-Control - HTTP/1.1 header

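From a Django view this can be as simple as setting the header by hand; a minimal illustrative sketch (the view and the max-age value are not from the talk):

    from django.http import HttpResponse

    def cacheable_view(request):
        response = HttpResponse('hello')
        # Tell shared caches (Varnish included) they may keep this response for 5 minutes.
        response['Cache-Control'] = 'public, max-age=300'
        return response
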
55. HOW TO USE VARNISH?
• Configure Varnish's behaviour
• Mark application responses as cache / don't cache
• Respond with appropriate HTTP headers

56. VARNISH CONFIGURATION
• VCL language as the means of configuration
• Backends as the source of documents
• Request processing stages define the caching rules

57. STORING & LOOKING UP
• Storing vs. looking up
• Varnish follows sane & straightforward rules for storing
• ... but does no look-ups by default

58. EXAMPLE CONFIGURATION FILE

    backend default {
        set backend.host = "127.0.0.1";
        set backend.port = "8080";
    }

    sub vcl_recv {
        set req.backend = default;
        if (req.request == "GET") {
            if (req.url ~ "^/icanhas/") {
                lookup;
            }
            if (req.url ~ "^/cheezburger/") {
                lookup;
            }
        }
    }

59. MARKING RESPONSES

    def cache_response(max_age=60 * 60):
        def _wrapper(fun):
            def _inner(*args, **kwargs):
                response = fun(*args, **kwargs)
                if not hasattr(response, 'dont_cache'):
                    response.cache = True
                    response.max_age = max_age
                return response
            return _inner
        return _wrapper

60. HTTP HEADERS MANIPULATION

    def process_response(self, request, response):
        # Values set by the view decorator
        do_cache = getattr(response, 'cache', False)
        max_age = getattr(response, 'max_age', DEFAULT_MAX_AGE)
        # If a view is marked for caching, let's cache it.
        if do_cache:
            self.clear_user_content(response)
            response['Cache-Control'] = 'public, max-age=%d' % max_age
        else:
            response['Cache-Control'] = 'private, no-cache, must-revalidate, max-age=0'
            response['Pragma'] = 'no-cache'
        return response

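For context, both methods presumably live in a response middleware; a rough sketch of the surrounding class and its registration, in Django 1.x style (the class, module and setting values here are assumptions):

    # middleware.py
    DEFAULT_MAX_AGE = 60 * 60

    class CacheHeadersMiddleware(object):
        """Holds process_response() and clear_user_content() from the two slides above."""

    # settings.py
    MIDDLEWARE_CLASSES = (
        # ...
        'myapp.middleware.CacheHeadersMiddleware',
    )
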
61. REMOVING USER DATA FROM THE RESPONSE

    @classmethod
    def clear_user_content(cls, response):
        # Clear all cookies and drop "Cookie" from the Vary HTTP header
        response.cookies.clear()
        if response.has_header('Vary'):
            varies = map(lambda x: x.strip(), response['Vary'].split(','))
            response['Vary'] = ', '.join(filter(lambda x: x != 'Cookie', varies))

62. DECORATING A VIEW

    # /cheezburger/ points to this view
    @cache_response(max_age=5 * 60)
    def cheezburger(request):
        response = HttpResponse('I can haz cheezburger.\n')
        if request.GET.get('fresh'):
            response.content += 'And I wants this fresch.'
            response.dont_cache = True
        return response

63. RESOURCES
www.slideshare.net/mchruszcz/caching-techniques-in-python
files.me.com/mchruszcz/lhe3c5

64. CONTACT
Michał Chruszcz: mchruszcz@me.com, @mchruszcz
Michał Domański: mdomans@gmail.com, @mdomans
