Caching Techniques in Python

    Notes on slide 1

    mention convoluted data structures also

    something not showing on page, why invalidation is important, why it is not easy

    common for functions heavily dependent on data; ensuring validity costs performance, so is it worth it? sometimes it is better to simply rewrite the code, or to test whether all the invalidations are needed

    big views made for editing large amount of data in one place

    what if we invalidate only big_key1?

    invalidation is proper in terms of the amount of data considered; the problem lies in frequency and timing

    again, the difference between too late and not enough: too late means proper data at an improper time; the data becomes stale, and even if we expire more keys, it won’t help with the problem

    does it make sense? questions?

    function get is defined somewhere else and used by some other piece of code; ClassX gets changed; the tools are generic, but the devil is in the details

    - implementation simplicity - in modern languages only several lines of straightforward code
    - algorithm maturity - known for over 40 years in this form
    - high performance - no external services; simple language-specific data structures as storage

    Douglas Adams’ “The Hitchhiker’s Guide to the Galaxy”
    “Deep Thought” Supercomputer

    - Performance (virtually highest possible - objects are read from a global module-level dictionary)
    - Local scope (cache available in current process only)
    - "Hard" cache - never expires (possible time-based expiration implementation)
    - Threading issues (race conditions in multi-threaded environment, possible thread-safe alterations)

    _timestamps - global dictionary (like _cache) for timestamps
    VALIDITY_PERIOD - predefined caching time in seconds; of course might be parameterized

    yes, it is very similar to python dict, go have fun with it

    benchmarking baseline reference, defines the communication std

    at least so they say :)

    pooling: useful in threaded environments; flags for no-delay TCP sockets, asynchronous I/O, changing the hash and distribution algorithm

    speed gains in caching usually come from caching done properly. drooling over a 50% gain in how fast a library can store and retrieve some abstract data is pretty much useless if you use it wrong. make your code 100% optimal first, then start looking for a better library; it's easy to misuse technology

    do you really think you can remember all your keys?:)

    still having to remember all your keys, but this leads us to a better idea

    think about cache invalidation and how to do it properly (Django signals, maybe Celery-based tasks): on-event invalidation

    decorators give us the ability to hide caching code away from someone else reading our code, while still informing them that the code is cached, so beware mere mortal :), also adding cache this way doesn’t require us to change the original code, implementation virtually identical to memoize

    test done on a MacBook Pro with a 2.33 GHz processor and 4 GB RAM on OS X 10.6.1

    we have a set of objects that can be connected by users in any way possible; adding one object should expire all objects that can be related to it, so that everything displays right

    Some cached values are based on a large set of data, and invalidating the cache or writing new values to it is impractical. This is often true of summary data, or data computed from a large set of original data.

    very fun to use at first glance, but there is no way to have on-event expiration. for example, if a user changes his nick in the account editing panel and we cache the field with his nickname, he will still see his old nick in the sidebar

    use wisely, good for very high traffic parts of application, but you must consider what parts of your app can become 'static' for some time

    - Brilliant top performance - slashdot effect, great architecture based on virtual memory and other modern kernels’ features
    - Early stage of responding - no need to bother HTTP server or web application
    - Distribution - proxy servers on separate machines

    small elements of a web page depending on cookie (session)

    special case of duplicated content - language versions

    VCL (Varnish Configuration Language) is compiled to a shared library
    Several stages of request processing, to which you can hook up

    Utility decorator

    Final stage of shaping response just before sending it from the app.

    Very important stage - no cookies may be cached
    No cookies, no vary.


    Caching Techniques in Python - Presentation Transcript

    1. CACHING TECHNIQUES IN PYTHON RuPy 2009 Michał Chruszcz & Michał Domański Sensi Soft Ltd. (http://www.sensisoft.com/)
    2. CACHING AND HOW TO EAT IT PROPERLY Why do we need to cache data and what problems can be expected while doing it.
    3. SOME DATA MAY TAKE TIME TO LOAD • user is waiting • eventually he will leave • that’s bad
    4. MAKE IT FASTER • usually it takes more time to retrieve data from DB than from cache • gains depend heavily on level of caching involved and amount of data retrieved
    5. UPS AND DOWNS OF CACHING The easiest way to stress yourself a lot.
    6. WHY ISN’T THIS PAGE CHANGING? Also known as "invalidation is painful"
    7. PROBLEM 1: INVALIDATING TOO MUCH/NOT ENOUGH
    8. invalidating too much • usually happens when expiring large sets of keys is as easy as invalidating one key • you can expect this if one coder writes actual code and the other does the caching part
    9. invalidating not enough • commonly happens when one has a problem managing all the references in the code • usually occurs on a higher than basic level of logic
    10. EXAMPLE

            @cached('key1')
            def simple_function1():
                return get_from_db(id=1)

            @cached('key2')
            def simple_function2():
                return get_from_db(id=2)

            # somewhere else in the code
            @cached('big_key1')
            def some_bigger_function():
                """
                this function depends on big_key1, key1 and key2
                """
                return [simple_function1(), simple_function2()]
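
The `cached` decorator and `invalidate_cache` used in the example above are not defined in the slides. The following is a minimal in-process sketch, assuming a plain dictionary as the cache store and a hypothetical `get_from_db` backed by another dictionary. It illustrates why invalidating only `big_key1` is not enough once the underlying data changes:

```python
import functools

_store = {}               # stand-in for a real cache backend (assumption)
_db = {1: "a", 2: "b"}    # stand-in for a database table (assumption)


def get_from_db(id):
    return _db[id]


def cached(key):
    """Cache the wrapped function's result under a fixed key."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if key not in _store:
                _store[key] = func(*args, **kwargs)
            return _store[key]
        return wrapper
    return decorator


def invalidate_cache(key):
    _store.pop(key, None)


@cached("key1")
def simple_function1():
    return get_from_db(id=1)


@cached("key2")
def simple_function2():
    return get_from_db(id=2)


@cached("big_key1")
def some_bigger_function():
    return [simple_function1(), simple_function2()]


first = some_bigger_function()   # fills big_key1, key1 and key2
_db[1] = "changed"               # the underlying row changes

invalidate_cache("big_key1")     # not enough: key1 still holds the old value
stale = some_bigger_function()

invalidate_cache("big_key1")
invalidate_cache("key1")         # invalidate the dependency as well
fresh = some_bigger_function()
```

Here `stale` is rebuilt from the still-cached `key1`, so it repeats the old data; only invalidating both keys yields the updated value.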
    11. PROBLEM 2: INVALIDATING TOO LATE / TOO SOON
    12. invalidating too soon • common for environments with lots of code being saved and fetched again and again • generates extra overhead because of all get/set operations • can be an effect of poor code design • sometimes we do so much saving and retrieving that caching isn’t the best idea
    13. invalidating too late • very common when we update a large amount of data that depends on each other • sometimes easy to spot, sometimes not, can generate very obscure problems
    14. EXAMPLE

            @cached('key1')
            def simple_function1():
                return get_from_db(id=1)

            @cached('key2')
            def simple_function2():
                return get_from_db(id=2)

            # and then
            def some_bigger_function(data):
                put_to_db(id=1, data=data)
                # somewhere far away: this still returns the old cached value
                value = simple_function1()
                put_to_db(id=2, data=value)
                # and somewhere totally remote (like 100 lines of code)
                invalidate_cache('key1')
                invalidate_cache('key2')
                return simple_function2()
    15. DEPENDENCY SUPERPOSITION Do you really know "where" you want to put your data?
    16. • get object X where X.a=1 and X.b=2 • what if X has parameter 'c' that at some point becomes important (or simply is added to implementation after the caching code was written)? • for example, if you have multiple languages on your site, do you need to consider language a parameter for cache?
    17. EXAMPLE

            class ClassX:
                def __init__(self, a, b, c):
                    self.a = a
                    self.b = b
                    # this line was added later
                    self.c = c

            def get(*args, **kwargs):
                """
                written by original code author
                """
                key = 'class_X_key_%s_%s' % (args[0], args[1])

                @cached(key)
                def _get():
                    return get_from_db(ClassX, *args, **kwargs)

                return _get()
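
One way to reduce this risk is to derive the key from every positional and keyword argument rather than from a hand-picked subset. The sketch below assumes that every argument affects the result; the helper name `make_key` is hypothetical and does not come from the slides.

```python
def make_key(prefix, *args, **kwargs):
    """Build a cache key from every argument, not a hand-picked subset."""
    parts = [repr(a) for a in args]
    # Sort keyword arguments so that call order does not change the key.
    parts += ["%s=%r" % (name, kwargs[name]) for name in sorted(kwargs)]
    return "%s:%s" % (prefix, ":".join(parts))


# The original key ignores the third parameter, so two lookups that differ
# only in 'c' (for example a language code) collide:
old_key_en = "class_X_key_%s_%s" % (1, 2)   # intended for c='en'
old_key_pl = "class_X_key_%s_%s" % (1, 2)   # intended for c='pl'

# With every argument in the key they stay distinct:
new_key_en = make_key("class_X_key", 1, 2, c="en")
new_key_pl = make_key("class_X_key", 1, 2, c="pl")
```

Memcached limits key length and does not allow whitespace in keys, so a key built this way would typically be hashed before being sent to the server.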
    18. QUESTIONS? Does it make sense?
    19. PROCESS-LEVEL OBJECT CACHING - MEMOIZE
    20. THE IDEA BEHIND MEMOIZATION • Implementation simplicity • Algorithm maturity • Very high performance
    21. THE ANSWER TO LIFE, THE UNIVERSE AND EVERYTHING

            def the_answer_to_life_the_universe_and_everything():
                print('Performing computation taking 7½ million years...')
                # We'll skip it for the time being.
                # sleep(7.5 * 10 ** 6 * 365 * 24 * 60 * 60)
                return 42
    22. SIMPLE IMPLEMENTATION

            _cache = {}  # Global dict - our storage

            def memoize(key):
                def _decorating_wrapper(func):
                    def _caching_wrapper(*args, **kwargs):
                        cache_key = normalize_key(key, args, kwargs)
                        # Value has been cached - use it
                        if cache_key in _cache:
                            return _cache[cache_key]
                        # Store the return value of the function
                        ret = func(*args, **kwargs)
                        _cache[cache_key] = ret
                        return ret
                    return _caching_wrapper
                return _decorating_wrapper
    23. THE ANSWER TO LIFE, THE UNIVERSE AND EVERYTHING... USING MEMOIZE

            @memoize('the_ultimate_answer')
            def the_answer_to_life_the_universe_and_everything():
                print('Performing computation taking 7½ million years...')
                # We'll skip it for the time being.
                # sleep(7.5 * 10 ** 6 * 365 * 24 * 60 * 60)
                return 42
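
`normalize_key` is called by `memoize` but is not shown in the slides. One possible version is sketched below, assuming that all call arguments are hashable; it simply combines the base key with the arguments into a tuple usable as a dictionary key. The `square` function and the `calls` list exist only to show that the second call is served from the cache.

```python
_cache = {}  # global dict, as on the slide


def normalize_key(key, args, kwargs):
    """Hypothetical helper: combine the base key with the call arguments.

    Assumes every argument is hashable. Keyword arguments are sorted so
    that f(a=1, b=2) and f(b=2, a=1) share one cache entry.
    """
    return (key, args, tuple(sorted(kwargs.items())))


def memoize(key):
    def _decorating_wrapper(func):
        def _caching_wrapper(*args, **kwargs):
            cache_key = normalize_key(key, args, kwargs)
            if cache_key in _cache:
                return _cache[cache_key]
            ret = func(*args, **kwargs)
            _cache[cache_key] = ret
            return ret
        return _caching_wrapper
    return _decorating_wrapper


calls = []


@memoize("square")
def square(x):
    calls.append(x)  # record each real computation
    return x * x


results = [square(3), square(3), square(4)]
```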
    24. CHARACTERISTICS (A.K.A. PROS & CONS) • Performance - virtually highest possible - objects are read from a global module-level dictionary • Local Scope - cache available in current process only • Hard Cache - never expires • Threading issues - possible race conditions
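
The race condition arises when two threads miss the cache at the same moment and both run the expensive function. One possible thread-safe alteration is sketched below with a single module-level lock. This is a deliberately coarse choice, since it serialises every cached call; the name `memoize_threadsafe` is hypothetical.

```python
import threading

_cache = {}
_lock = threading.Lock()


def memoize_threadsafe(key):
    """Like memoize, but computes each value at most once across threads."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            cache_key = (key, args, tuple(sorted(kwargs.items())))
            # Holding the lock during the computation is simple but coarse:
            # it blocks every other cached call until this one finishes.
            with _lock:
                if cache_key not in _cache:
                    _cache[cache_key] = func(*args, **kwargs)
                return _cache[cache_key]
        return wrapper
    return decorator


call_count = []


@memoize_threadsafe("the_ultimate_answer")
def the_answer():
    call_count.append(1)  # record each real computation
    return 42


threads = [threading.Thread(target=the_answer) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With a plain `Lock`, a memoized function that calls another memoized function would deadlock; `threading.RLock` is one way around that.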
    25. MEMOIZE WITH TIME-BASED EXPIRATION

            import time

            def memoize(key):
                def _decorating_wrapper(func):
                    def _caching_wrapper(*args, **kwargs):
                        cache_key = normalize_key(key, args, kwargs)
                        now = time.time()
                        # If value is cached and still valid - just use it
                        if _timestamps.get(cache_key, now) > now:
                            return _cache[cache_key]
                        # Store the return value of the function
                        ret = func(*args, **kwargs)
                        _cache[cache_key] = ret
                        _timestamps[cache_key] = now + VALIDITY_PERIOD
                        return ret
                    return _caching_wrapper
                return _decorating_wrapper
    26. DISTRIBUTED OBJECT CACHE - MEMCACHED How cool is that ?
    27. WAIT. MEMCACHED? • one big hash-table • fast • based on libevent, can perform non-blocking I/O operations
    28. PYTHON LIBRARIES FOR MEMCACHED
    29. PYTHON-MEMCACHE • pythonic way of interacting with memcached • implemented in pure python • actually many people never go beyond this one
    30. PYTHON-CMEMCACHE • _cmemcache implements StringClient in C (support only for string type values) • standard Client is built upon it with conversion done via pickling
    31. PYTHON-LIBMEMCACHE • written using pyrex, wraps around libmemcache library • going to have UDP and binary protocol support soon
    32. PYLIBMC • usually the fastest of all • requires libmemcached • has binary protocol, supports pooling • lets you choose behaviours of underlying libmemcached library
    33. WHY NO SPEED COMPARISON BENCHMARK? • implementations differ in speed • not so important • do your own benchmarks, each case is different • thinking is much more important
    34. SOME BASIC USAGE... A few examples closely related to using memcached in your application.
    35. VERY BASIC AND VERY TEDIOUS (YOU PROBABLY KNOW IT FROM THE WEB)

            # somewhere
            client.set('some_key', 'some_value')
            # somewhere else in code
            client.get('some_key')

    36. SOMETHING THAT ACTUALLY MAY HAPPEN

            # somewhere
            client.set('some_key', some_function())
            # somewhere else
            client.get('some_key')

    37. ACTUALLY DECORATORS ARE WHAT MAKES IT FUN

            @cache('some_key')
            def some_function():
                return some_value_from_db()

            cached_value = some_function()
    38. WHY THE LAST ONE IS SO MUCH FUN? • Transparent caching layer • we write code and add caching the way it suits our code best • someone re-using the code doesn’t have to think about cache • clean code separation
    39. SOMETHING USUALLY FORGOTTEN • memcached implements retrieval of multiple values • significantly lowers the overhead caused by repeating same set of operations for each key
    40. BENCHMARK

            class BigObject(object):
                def __init__(self, key, size=10000):
                    self.object = str(key) * size

            # assumed: the same 10000 keys that test_single fetches one by one
            keys = ['key_%s' % index for index in range(10000)]

            def test_multi():
                vals = memcache_client.get_multi(keys)

            def test_single():
                for index in range(10000):
                    memcache_client.get('key_%s' % index)

            ########
            test_multi completed in 392.811 ms
            test_single completed in 1257.483 ms
    41. GROUPS • What? • manage whole sets of keys • implemented as another key in cache • Useful • you don’t have to get all the keys to use them • example use case is when you have cached objects that depend on each other in an unknown way
    42. HOW IT WORKS

            # we start with specified key and group
            key = 'some_key'
            group = 'some_group'
            # now retrieve some data from memcached
            data = memcached_client.get_multi([key, group])
            # now data is a dict that should look like
            # {'some_key': {'group_key': '1234',
            #               'value': 'some_value'},
            #  'some_group': '1234'}
            if data and (key in data) and (group in data):
                if data[key]['group_key'] == data[group]:
                    return data[key]['value']
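
The fragment above only shows the read path: a stored group key is compared with the current one. A fuller sketch of the same idea follows, assuming a client that offers `get`, `set` and `get_multi`. `FakeClient` is a dictionary-backed stand-in so the example runs without a memcached server, and the helper names `group_set`, `group_get` and `group_invalidate` are hypothetical.

```python
import uuid


class FakeClient(object):
    """Dictionary-backed stand-in for a memcached client (illustration only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def get_multi(self, keys):
        return dict((k, self._data[k]) for k in keys if k in self._data)


client = FakeClient()


def _new_group_key():
    return uuid.uuid4().hex


def group_set(key, group, value):
    """Store a value tagged with the group's current key."""
    group_key = client.get(group)
    if group_key is None:
        group_key = _new_group_key()
        client.set(group, group_key)
    client.set(key, {"group_key": group_key, "value": value})


def group_get(key, group):
    """Return the value only if its tag still matches the group key."""
    data = client.get_multi([key, group])
    if key in data and group in data:
        if data[key]["group_key"] == data[group]:
            return data[key]["value"]
    return None


def group_invalidate(group):
    """Expire every member at once by changing the group key."""
    client.set(group, _new_group_key())


group_set("some_key", "some_group", "some_value")
before = group_get("some_key", "some_group")
group_invalidate("some_group")
after = group_get("some_key", "some_group")
```

Invalidation touches a single key no matter how many values belong to the group, which is why the member keys do not need to be known or remembered.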
    43. EXPIRATION BY TIME • Provided by memcached itself • Usable for summary data computed over large data set • Or when you aren’t able to provide on-event expiration and you want defence against stale values
    44. WHY NOT CACHE RENDERED PARTS OF A WEB PAGE? • actually, we can... • it has its pitfalls
    45. DJANGO CACHE TEMPLATE TAG AND CACHING PARTS OF A WEB PAGE • when caching on such a high level, list of dependencies becomes long • even something simple can have lots of code under the hood • django gives us expiration by timeout and ability to identify value by any amount of arguments
    46. EXAMPLE

            {% cache 500 sidebar request.user.username %}
                {% comment %} .. sidebar for logged in user .. {% endcomment %}
                {{ user.nickname }}
            {% endcache %}
    47. CACHING WHOLE RESPONSE • speed boost is tremendous • practically nothing to do for your app • expiring something sensibly becomes almost impossible • timeout expiration becomes your friend
    48. EXAMPLE

            @cache_page(60 * 15)
            def my_view(request):
                """
                some view logic here... :)
                """
    49. REVERSE PROXY CACHING - VARNISH
    50. WHAT IS VARNISH FOR? • Brilliant top performance • Early stage of responding to user • Distribution
    51. HOW TO BUILD A CACHING-FRIENDLY WEB SITE? • Static vs. dynamic parts of a web page • No duplicated content • Separate language versions are also duplicated content • HTTP headers defining caching rules
    52. DYNAMIC PARTS OF A WEB PAGE
    53. LANGUAGE VERSIONS
    54. HTTP HEADERS • Expires HTTP 1.0 header • Pragma HTTP 1.0 header • Cache-control HTTP 1.1 header
    55. HOW TO USE VARNISH? • Configure varnish behaviour • Mark application responses as cache/don’t cache • Respond with appropriate HTTP headers
    56. VARNISH CONFIGURATION • VCL language as a means of configuration • Backends as the source of documents • Request processing stages define caching rules
    57. STORING & LOOKING UP • Storing vs. looking up • Varnish follows sane & straightforward rules in case of storing • ... but does no look-ups by default
    58. EXAMPLE CONFIGURATION FILE

            backend default {
                set backend.host = "127.0.0.1";
                set backend.port = "8080";
            }

            sub vcl_recv {
                set req.backend = default;
                if (req.request == "GET") {
                    if (req.url ~ "^/icanhas/") {
                        lookup;
                    }
                    if (req.url ~ "^/cheezburger/") {
                        lookup;
                    }
                }
            }

    59. MARKING RESPONSES

            def cache_response(max_age=60 * 60):
                def _wrapper(fun):
                    def _inner(*args, **kwargs):
                        response = fun(*args, **kwargs)
                        if not hasattr(response, 'dont_cache'):
                            response.cache = True
                            response.max_age = max_age
                        return response
                    return _inner
                return _wrapper

    60. HTTP HEADERS MANIPULATION

            def process_response(self, request, response):
                # Values set by view decorator
                do_cache = getattr(response, 'cache', False)
                max_age = getattr(response, 'max_age', DEFAULT_MAX_AGE)
                # If a view is marked for caching, let's cache it.
                if do_cache:
                    self.clear_user_content(response)
                    response['Cache-Control'] = 'public, max-age=%d' % max_age
                else:
                    response['Cache-Control'] = 'private, no-cache, must-revalidate, max-age=0'
                    response['Pragma'] = 'no-cache'
                return response

    61. REMOVING USER-DATA FROM RESPONSE

            @classmethod
            def clear_user_content(cls, response):
                # Clearing all cookies and "Cookie" from Vary HTTP header
                response.cookies.clear()
                if response.has_header('Vary'):
                    varies = [x.strip() for x in response['Vary'].split(',')]
                    response['Vary'] = ', '.join(x for x in varies if x != 'Cookie')

    62. DECORATING VIEW

            # /cheezburger/ points to this view
            @cache_response(max_age=5 * 60)
            def cheezburger(request):
                response = HttpResponse('I can haz cheezburger.\n')
                if request.GET.get('fresh'):
                    response.content += 'And I wants this fresch.'
                    response.dont_cache = True
                return response
    63. RESOURCES www.slideshare.net/mchruszcz/caching-techniques-in-python files.me.com/mchruszcz/lhe3c5
    64. CONTACT • Michał Chruszcz: mchruszcz@me.com, @mchruszcz • Michał Domański: mdomans@gmail.com, @mdomans

    CC Attribution-NonCommercial-ShareAlike License
