Caching Up and Down the Stack

Whether you're looking to make your web app run faster or scale better, one great way to achieve both is to simply do less work. How? By using caches, the data hidey-holes which generations of engineers have thoughtfully left at key junctures in computing infrastructure, from your CPU to the backbone of the internet. Requests into web applications, which span great distances and often involve expensive frontend and backend lifting, are great candidates for caching of all types. We'll discuss the benefits and tradeoffs of caching at different layers of the stack and how to find low-hanging cachable fruit, with a particular focus on server-side improvements.

Transcript

  • 1. Caching Up and Down the Stack Long Island/Queens Django Meetup 5/20/14
  • 2. Hi, I’m Dan Kuebrich ● Software engineer, Python fan ● Web performance geek ● Founder of Tracelytics, now part of AppNeta ● Once (and future?) Queens resident
  • 3. DJANGO
  • 4. What is “caching”? ● Caching is avoiding doing expensive work o by doing cheaper work ● Common examples? o On repeat visits, your browser doesn’t download images that haven’t changed o Your CPU caches instructions, data so it doesn’t have to go to RAM… or to disk!
  • 5. What is “caching”? [Diagram. Uncached: Client -> Data Source]
  • 6. What is “caching”? [Diagram. Uncached: Client -> Data Source. Cached: Client -> Cache Intermediary -> Data Source]
  • 7. What is “caching”? [Diagram. Uncached: Client -> Data Source. Cached: Client -> Cache Intermediary -> Data Source. Labels: “Fast!”, “Slow...”]
  • 8. “Latency Numbers Every Programmer Should Know” Source: Systems Performance: Enterprise and the Cloud by Brendan Gregg http://books.google.com/books?id=xQdvAQAAQBAJ&pg=PA20&lpg=PA20&source=bl&ots=hlTgyxdrnR&sig=CCjddHrY1H6muMVW9BFcbdO7DDo&hl=en&sa=X&ei=dS7oUquhOYr9oAT9oYGoDw&ved=0CCkQ6AEwAA#v=onepage&q&f=false
  • 9. A whole mess of caching (ordered from closer to the user to closer to the data): ● Browser cache ● CDN ● Proxy / optimizer ● Application-based o Full-page o Fragment o Object cache ● Database o Query cache o Denormalization
  • 10. Caching in Django apps: Frontend ● Client-side assets ● Full pages
  • 11. Client-side assets
  • 12. Client-side assets
  • 13. Client-side assets ● Use HTTP caches! o Browser o CDN o Intermediate proxies ● Set policy with cache headers o Cache-Control / Expires o ETag / Last-Modified
  • 14. HTTP Cache-Control and Expires ● Stop the browser from even asking for it ● Expires o Pick a date in the future, good until then ● Cache-Control o More flexible o Introduced in HTTP 1.1 o Use this one
  • 15. HTTP Cache-Control and Expires
    dan@JLTM21:~$ curl -I https://login.tv.appneta.com/cache/tl-layouts_base_unauth-compiled-162c2ceecd9a7ff1e65ab460c2b99852a49f5a43.css
    HTTP/1.1 200 OK
    Accept-Ranges: bytes
    Cache-Control: max-age=315360000
    Content-length: 5955
    Content-Type: text/css
    Date: Tue, 20 May 2014 23:12:16 GMT
    Expires: Thu, 31 Dec 2037 23:55:55 GMT
    Last-Modified: Fri, 16 May 2014 20:51:19 GMT
    Server: nginx
    Connection: keep-alive
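    The two headers in the response above can be derived from a single freshness lifetime. The following is a minimal sketch using only the Python standard library; the function name `caching_headers` and the injectable `now` argument are illustrative assumptions, not part of the talk. In a Django view, the `cache_control` decorator from `django.views.decorators.cache` can set the `Cache-Control` header for you.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def caching_headers(max_age_seconds, now=None):
    """Return Cache-Control and Expires headers for one freshness lifetime.

    Hypothetical helper for illustration; `now` is injectable so the
    result is deterministic.
    """
    now = now or datetime.now(timezone.utc)
    expires_at = now + timedelta(seconds=max_age_seconds)
    return {
        # max-age is relative: "fresh for this many seconds from now".
        "Cache-Control": "max-age=%d" % max_age_seconds,
        # Expires is absolute and must be an HTTP date in GMT.
        "Expires": format_datetime(expires_at, usegmt=True),
    }


headers = caching_headers(
    3600, now=datetime(2014, 5, 20, 23, 12, 16, tzinfo=timezone.utc)
)
print(headers)
```

    When both headers are present, HTTP/1.1 caches give `max-age` precedence over `Expires`, which is one reason the slides recommend Cache-Control.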
  • 16. HTTP Cache Control in Django https://docs.djangoproject.com/en/dev/topics/cache/
  • 17. ETag + Last-Modified
  • 18. ETag + Last-Modified
    dan@JLTM21:~$ curl -I www.appneta.com/stylesheets/styles.css
    HTTP/1.1 200 OK
    Last-Modified: Tue, 20 May 2014 05:52:50 GMT
    ETag: "30854c-1c3d3-4f9ce7d715080"
    Vary: Accept-Encoding
    Content-Type: text/css
    ...
  • 19. ETag + Last-Modified
    dan@JLTM21:~$ curl -I www.appneta.com/stylesheets/styles.css --header 'If-None-Match: "30854c-1c3d3-4f9ce7d715080"'
    HTTP/1.1 304 Not Modified
    Last-Modified: Tue, 20 May 2014 05:52:50 GMT
    ETag: "30854c-1c3d3-4f9ce7d715080"
    Vary: Accept-Encoding
    Content-Type: text/css
    Date: Tue, 20 May 2014 23:21:12 GMT
    ...
  • 20. ETag vs Last-Modified ● Last-Modified is date-based ● ETag is content-based ● Most webservers generate both ● Some webservers (Apache) generate ETags that depend on local state o If you have a load-balanced pool of servers, they might not all generate the same ETag for the same file!
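    One way to avoid the load-balancer problem above is to derive the ETag purely from the response body, so every server computes the same value. This is a minimal sketch, not how any particular webserver does it; `make_etag` and `respond` are hypothetical names, and the handler ignores details such as weak validators and multiple ETags in `If-None-Match`.

```python
import hashlib


def make_etag(body):
    """Content-based ETag: identical bytes give an identical tag on any host."""
    return '"%s"' % hashlib.sha256(body).hexdigest()[:32]


def respond(body, if_none_match=None):
    """Return (status, headers, payload) for a simplified conditional GET."""
    etag = make_etag(body)
    if if_none_match == etag:
        # The client's copy is still current: send headers only.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body


css = b"body { color: #333; }"
status_first, headers_first, _ = respond(css)
status_repeat, _, payload_repeat = respond(css, headers_first["ETag"])
print(status_first, status_repeat)
```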
  • 21. A whole mess of caching: ● Browser cache ● CDN ● Proxy / optimizer ● Application-based o Full-page o Fragment o Object cache ● Database o Query cache o Denormalization
  • 22. CDNs ● Put content closer to your end-users o and offload HTTP requests from your servers ● Best for static assets ● Same cache control policies apply
  • 23. Full-page caching [Diagram: Client -> Varnish -> Data Source] No internet standards necessary!
  • 24. Full-page caching: mod_pagespeed [Diagram: Client -> mod_pagespeed -> Data Source] ● Dynamically rewrites pages with frontend optimizations ● Caches rewritten pages
  • 25. A whole mess of caching: ● Browser cache ● CDN ● Proxy / optimizer ● Application-based o Full-page o Fragment o Object cache ● Database o Query cache o Denormalization
  • 26. Full-page caching in Django
  • 27. Wait, where is this getting cached? ● Django makes it easy to configure o In-memory o File-based o Memcached o etc.
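    As a rough illustration of those options, Django selects the cache backend through the `CACHES` setting. The fragment below is a sketch for a `settings.py`; the alias names other than `default`, the file path, and the memcached address are placeholder assumptions, and the memcached backend class name varies across Django versions (`PyMemcacheCache` is available in Django 3.2 and later).

```python
# settings.py fragment (illustrative values only)
CACHES = {
    # In-memory, per-process cache: simple, but not shared between workers.
    "default": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
    },
    # File-based cache: survives restarts, shared by processes on one host.
    "files": {
        "BACKEND": "django.core.cache.backends.filebased.FileBasedCache",
        "LOCATION": "/var/tmp/django_cache",
    },
    # Memcached: shared across hosts; the address here is a placeholder.
    "memcached": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",
    },
}
```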
  • 28. Full-page caching: dynamic pages?
  • 29. Full-page caching: dynamic pages?
  • 30. Fragment caching
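    The idea behind fragment caching is to cache only the expensive, shared part of a page and render the per-user part on every request. Django provides this through the `{% cache %}` template tag; the sketch below shows the same idea in plain Python with a dictionary standing in for the cache. All names here (`cached_fragment`, `render_sidebar`, `render_page`) are hypothetical.

```python
import html
import time

_fragment_cache = {}  # fragment name -> (expiry time, rendered HTML)
render_count = {"sidebar": 0}


def render_sidebar():
    """Stand-in for an expensive fragment that is the same for every user."""
    render_count["sidebar"] += 1
    return "<ul><li>Popular post A</li><li>Popular post B</li></ul>"


def cached_fragment(name, ttl_seconds, render):
    """Return the cached fragment if it is still fresh, else re-render it."""
    now = time.monotonic()
    entry = _fragment_cache.get(name)
    if entry is not None and entry[0] > now:
        return entry[1]
    rendered = render()
    _fragment_cache[name] = (now + ttl_seconds, rendered)
    return rendered


def render_page(username):
    # The greeting is personalized, so it is rendered on every request.
    greeting = "<p>Hello, %s!</p>" % html.escape(username)
    # The sidebar is shared, so it is cached for five minutes.
    return greeting + cached_fragment("sidebar", 300, render_sidebar)


page_for_alice = render_page("alice")
page_for_bob = render_page("bob")
print(render_count["sidebar"])
```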
  • 31. Full-page caching: dynamic pages?
  • 32. Full-page caching: the ajax solution
  • 33. Object caching
    # `session` is assumed to be a SQLAlchemy session and `User` a mapped model
    def get_item_by_id(key):
        # Look up the item in our database
        return session.query(User).filter_by(id=key).first()
  • 34. Object caching
    # `mc` is assumed to be a memcached client whose get() returns None on a miss
    def get_item_by_id(key):
        # Check in cache
        val = mc.get(key)
        # If exists, return it
        if val is not None:
            return val
        # If not, get the val, store it in the cache
        val = session.query(User).filter_by(id=key).first()
        mc.set(key, val)
        return val
  • 35. Object caching
    # `decorator` is assumed to come from the third-party `decorator` package
    @decorator
    def cache(expensive_func, key):
        # Check in cache
        val = mc.get(key)
        # If exists, return it
        if val is not None:
            return val
        # If not, get the val, store it in the cache
        val = expensive_func(key)
        mc.set(key, val)
        return val
  • 36. Object caching
    @cache
    def get_item_by_id(key):
        # Look up the item in our database
        return session.query(User).filter_by(id=key).first()
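    For readers who want to run the pattern from the object caching slides without a database or memcached, here is a self-contained sketch. It swaps in a plain dictionary for the memcached client and a stub function for the database query, and it namespaces keys with a prefix so two cached functions cannot collide; these are illustrative choices, not the talk's implementation.

```python
import functools

_store = {}          # stands in for a memcached client
_MISSING = object()  # sentinel so that cached falsy values still count as hits


def cached(key_prefix):
    """Decorator factory: cache a one-argument function under prefixed keys."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(key):
            cache_key = "%s:%s" % (key_prefix, key)
            val = _store.get(cache_key, _MISSING)
            if val is not _MISSING:
                return val          # cache hit: skip the expensive call
            val = func(key)         # cache miss: do the work once
            _store[cache_key] = val
            return val
        return wrapper
    return decorate


calls = []  # records how often the "database" is actually queried


@cached("item")
def get_item_by_id(key):
    calls.append(key)
    return {"id": key, "name": "item-%s" % key}


first = get_item_by_id(7)
second = get_item_by_id(7)
print(len(calls))
```

    With a real memcached client a miss is usually reported as `None`, so a cached `None` cannot be told apart from a miss without wrapping the stored value.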
  • 37. Object caching in Django
  • 38. A whole mess of caching: ● Browser cache ● CDN ● Proxy / optimizer ● Application-based o Full-page o Fragment o Object cache ● Database o Query cache o Denormalization
  • 39. Query caching [Diagram: Client -> Database. Inside the database: “Cached?” check against the Query Cache, then the actual tables]
  • 40. Query caching
    mysql> select SQL_CACHE count(*) from traces;
    +----------+
    | count(*) |
    +----------+
    |  3135623 |
    +----------+
    1 row in set (0.56 sec)

    mysql> select SQL_CACHE count(*) from traces;
    +----------+
    | count(*) |
    +----------+
    |  3135623 |
    +----------+
    1 row in set (0.00 sec)
  • 41. Query caching
  • 42. Query caching Uncached Cached
  • 43. Denormalization
    Normalized (requires a join):
    mysql> select table1.x, table2.y from table1 join table2 on table1.z = table2.q where table1.z > 100;
    Denormalized (y is copied into table1, so no join is needed):
    mysql> select table1.x, table1.y from table1 where table1.z > 100;
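    To make the trade concrete, the sketch below builds both layouts in an in-memory SQLite database (chosen only so the example runs anywhere; the slide uses MySQL) and checks that the join and the denormalized read return the same rows. The sample data and the one-time `UPDATE` that copies `y` into `table1` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table2 (q INTEGER PRIMARY KEY, y TEXT);
    CREATE TABLE table1 (x TEXT, z INTEGER, y TEXT);  -- y is the denormalized copy
    INSERT INTO table2 VALUES (50, 'low'), (150, 'mid'), (250, 'high');
    INSERT INTO table1 (x, z) VALUES ('a', 50), ('b', 150), ('c', 250);
    -- Pay the join cost once, at write time, instead of on every read.
    UPDATE table1 SET y = (SELECT y FROM table2 WHERE table2.q = table1.z);
""")

joined = conn.execute(
    "SELECT table1.x, table2.y FROM table1 "
    "JOIN table2 ON table1.z = table2.q "
    "WHERE table1.z > 100 ORDER BY table1.x"
).fetchall()

denormalized = conn.execute(
    "SELECT table1.x, table1.y FROM table1 "
    "WHERE table1.z > 100 ORDER BY table1.x"
).fetchall()

print(joined == denormalized)
```

    The cost is that every later change to `table2.y` must also be applied to the copy in `table1`, which is the invalidation problem in another form.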
  • 44. A whole mess of caching: ● Browser cache ● CDN ● Proxy / optimizer ● Application-based o Full-page o Fragment o Object cache ● Database o Query cache o Denormalization
  • 45. Caching: what can go wrong? ● Invalidation ● Fragmentation ● Stampedes ● Complexity
  • 46. Invalidation [Diagram: Client -> Cache Intermediary -> Data Source. On “Update!”: write to the data source, invalidate the cache entry]
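    The write-then-invalidate flow in the diagram can be sketched as a small class. This is a simplified, single-process illustration with dictionaries standing in for both the data source and the cache; the class name `CachedStore` and its counters are hypothetical.

```python
class CachedStore:
    """Read-through cache that invalidates an entry whenever it is written."""

    def __init__(self):
        self.source = {}       # stands in for the slow data source
        self.cache = {}        # stands in for the cache intermediary
        self.source_reads = 0  # counts trips to the data source

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        self.source_reads += 1
        value = self.source.get(key)
        self.cache[key] = value
        return value

    def write(self, key, value):
        # Write to the source of truth first, then drop the stale copy.
        self.source[key] = value
        self.cache.pop(key, None)


store = CachedStore()
store.write("title", "Caching Up and Down the Stack")
before = (store.read("title"), store.read("title"))
reads_before_update = store.source_reads
store.write("title", "Caching, Revised")
after = store.read("title")
print(after, store.source_reads)
```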
  • 47. Invalidation on page-scale ● Browser cache ● CDN ● Proxy / optimizer ● Application-based o Full-page o Fragment o Object cache ● Database o Query cache o Denormalization (Top of the list: more savings, generally more invalidation. Bottom of the list: smaller savings, generally less invalidation.)
  • 48. Fragmentation ● What if I have a lot of different things to cache? o More misses o Potential cache eviction
  • 49. Fragmentation [Chart: your pages / objects (x-axis) vs. frequency of access (y-axis)]
  • 50. Fragmentation [Chart: your pages / objects (x-axis) vs. frequency of access (y-axis)]
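    A small simulation can show why a wide spread of cacheable items hurts. The sketch below assumes an LRU eviction policy and two synthetic access patterns, one cycling over 10 keys and one cycling over 200, against a cache that holds 100 entries. Real traffic is rarely this regular, so treat the numbers as an illustration of the extremes.

```python
from collections import OrderedDict


def hit_rate(capacity, accesses):
    """Replay `accesses` against an LRU cache and return the fraction of hits."""
    cache = OrderedDict()
    hits = 0
    for key in accesses:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used
    return hits / len(accesses)


hot = [i % 10 for i in range(1000)]    # 10 distinct keys: all fit in the cache
wide = [i % 200 for i in range(1000)]  # 200 distinct keys: twice the capacity

hot_rate = hit_rate(100, hot)
wide_rate = hit_rate(100, wide)
print(hot_rate, wide_rate)
```

    In the wide case every key is evicted before it is requested again, so the cache does work on every request and never pays it back.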
  • 51. Stampedes ● On a cache miss, extra work is done ● The result is stored in the cache ● What if there are multiple simultaneous misses?
  • 52. Stampedes http://allthingsd.com/20080521/stampede-facebook-opens-its-profile-doors/
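    One common mitigation for stampedes is to let only one caller recompute a missing key while the others wait for its result. The sketch below does this with a per-key lock inside a single process; it is an assumption-laden illustration (threads rather than web workers, a dictionary rather than memcached) and does not coordinate across multiple servers.

```python
import threading
import time

cache = {}
key_locks = {}
key_locks_guard = threading.Lock()
compute_calls = []  # records every time the expensive work actually runs


def expensive(key):
    compute_calls.append(key)
    time.sleep(0.05)  # simulate slow work so the misses overlap
    return "value-for-%s" % key


def get(key):
    if key in cache:
        return cache[key]
    # Find or create the lock for this key without racing other threads.
    with key_locks_guard:
        lock = key_locks.setdefault(key, threading.Lock())
    with lock:
        # Re-check: another thread may have filled the cache while we waited.
        if key in cache:
            return cache[key]
        value = expensive(key)
        cache[key] = value
        return value


threads = [threading.Thread(target=get, args=("home",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(compute_calls))
```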
  • 53. Complexity ● How much caching do I need, and where? ● What is the invalidation process o on data update? on release? ● What happens if the caches fall over? ● How do I debug it?
  • 54. Takeaways ● The ‘how’ of caching: o What are you caching? o Where are you caching it? o How bad is a cache miss? o How and when are you invalidating?
  • 55. Takeaways ● The ‘why’ of caching: o Did it actually get faster? o Is speed worth extra complexity? o Don’t guess – measure! o Always use real-world conditions.
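    As a starting point for measuring rather than guessing, the sketch below times the same lookup with and without a cache. The 10 ms `sleep` is an assumed stand-in for a slow backend call, and `functools.lru_cache` is used only because it ships with Python; a microbenchmark like this does not replace measuring under real-world conditions.

```python
import functools
import time


def slow_lookup(n):
    time.sleep(0.01)  # stand-in for a slow database or network call
    return n * n


# Same function, wrapped in an in-process cache.
fast_lookup = functools.lru_cache(maxsize=None)(slow_lookup)


def timed(func, arg, repeats):
    """Return the wall-clock seconds taken to call func(arg) `repeats` times."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(arg)
    return time.perf_counter() - start


uncached_seconds = timed(slow_lookup, 12, 10)
cached_seconds = timed(fast_lookup, 12, 10)
print("uncached: %.3fs, cached: %.3fs" % (uncached_seconds, cached_seconds))
```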
  • 56. Questions?
  • 57. Thanks! ● Interested in measuring your Django app’s performance? o Free trial of TraceView: www.appneta.com/products/traceview ● See you at Velocity NYC this fall? ● Twitter: @appneta / @dankosaur
  • 58. Resources
    ● Django documentation on caching: https://docs.djangoproject.com/en/dev/topics/cache/
    ● Varnish caching, via Disqus: http://blog.disqus.com/post/62187806135/scaling-django-to-8-billion-page-views
    ● Django cache option comparisons: http://codysoyland.com/2010/jan/17/evaluating-django-caching-options/
    ● More Django-specific tips: http://www.slideshare.net/csky/where-django-caching-bust-at-the-seams
    ● Guide to cache-related HTTP headers: http://www.mobify.com/blog/beginners-guide-to-http-cache-headers/
    ● Google PageSpeed: https://developers.google.com/speed/pagespeed/module