Scaling Django

by Mike Malone, presented at EuroDjangoCon

Transcript

  • 1. Scaling Django Web Apps
    Mike Malone, EuroDjangoCon 2009, Tuesday, May 5, 2009
  • 2. Hi, I’m Mike.
  • 5. http://www.flickr.com/photos/kveton/2910536252/
  • 7. Pownce
    • Large scale
        • Hundreds of requests/sec
        • Thousands of DB operations/sec
        • Millions of user relationships
        • Millions of notes
        • Terabytes of static data
  • 8. Pownce
    • Encountered and eliminated many common scaling bottlenecks
    • Real world example of scaling a Django app
    • Django provides a lot for free
    • I’ll be focusing on what you have to build yourself, and the rare places where Django got in the way
  • 9. Scalability
  • 10. Scalability
    Scalability is NOT:
    • Speed / Performance
    • Generally affected by language choice
    • Achieved by adopting a particular technology
  • 11. A Scalable Application
        import time

        def application(environ, start_response):
            time.sleep(10)
            start_response('200 OK', [('content-type', 'text/plain')])
            return ('Hello, world!',)
  • 12. A High Performance Application
        def application(environ, start_response):
            remote_addr = environ['REMOTE_ADDR']
            f = open('access-log', 'a+')
            f.write(remote_addr + "\n")
            f.flush()
            f.seek(0)
            hits = sum(1 for l in f.xreadlines()
                       if l.strip() == remote_addr)
            f.close()
            start_response('200 OK', [('content-type', 'text/plain')])
            return (str(hits),)
  • 13. Scalability
    A scalable system doesn’t need to change when the size of the problem changes.
  • 14. Scalability
    • Accommodate increased usage
    • Accommodate increased data
    • Maintainable
  • 15. Scalability
    • Two kinds of scalability
        • Vertical scalability: buying more powerful hardware, replacing what you already own
        • Horizontal scalability: buying additional hardware, supplementing what you already own
  • 16. Vertical Scalability
    • Costs don’t scale linearly (a server that’s twice as fast costs more than twice as much)
    • Inherently limited by current technology
    • But it’s easy! If you can get away with it, good for you.
  • 17. Vertical Scalability
    “Skyscrapers are special. Normal buildings don’t need 10 floor foundations. Just build!” - Cal Henderson
  • 18. Horizontal Scalability
    The ability to increase a system’s capacity by adding more processing units (servers)
  • 19. Horizontal Scalability
    It’s how large apps are scaled.
  • 20. Horizontal Scalability
    • A lot more work to design, build, and maintain
    • Requires some planning, but you don’t have to do all the work up front
    • You can scale progressively...
    • Rest of the presentation is roughly in order
  • 21. Caching
  • 22. Caching
    • Several levels of caching available in Django
        • Per-site cache: caches every page that doesn’t have GET or POST parameters
        • Per-view cache: caches output of an individual view
        • Template fragment cache: caches fragments of a template
    • None of these are that useful if pages are heavily personalized
  • 23. Caching
    • Low-level Cache API
        • Much more flexible, allows you to cache at any granularity
    • At Pownce we typically cached
        • Individual objects
        • Lists of object IDs
    • Hard part is invalidation
  • 24. Caching
    • Cache backends:
        • Memcached
        • Database caching
        • Filesystem caching
  • 25. Caching
    Use Memcache.
  • 26. Sessions
    Use Memcache.
  • 27. Sessions
    Or Tokyo Cabinet: http://github.com/ericflo/django-tokyo-sessions/
    Thanks @ericflo
  • 28. Caching
    Basic caching comes free with Django:
        from django.core.cache import cache

        class UserProfile(models.Model):
            ...
            def get_social_network_profiles(self):
                cache_key = 'networks_for_%s' % self.user.id
                profiles = cache.get(cache_key)
                if profiles is None:
                    profiles = self.user.social_network_profiles.all()
                    cache.set(cache_key, profiles)
                return profiles
  • 29. Caching
    Invalidate when a model is saved or deleted:
        from django.core.cache import cache
        from django.db.models import signals

        # Signal receivers are called with the sender class and the saved
        # or deleted instance, so the key is built from instance, not self.
        def nuke_social_network_cache(sender, instance, **kwargs):
            cache_key = 'networks_for_%s' % instance.user_id
            cache.delete(cache_key)

        signals.post_save.connect(nuke_social_network_cache,
                                  sender=SocialNetworkProfile)
        signals.post_delete.connect(nuke_social_network_cache,
                                    sender=SocialNetworkProfile)
  • 30. Caching
    • Invalidate post_save, not pre_save
    • Still a small race condition
    • Simple solution, worked for Pownce:
        • Instead of deleting, set the cache key to None for a short period of time
        • Instead of using set to cache objects, use add, which fails if there’s already something stored for the key
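The two tricks on slide 30 can be sketched roughly as follows. This is an illustration in present-day Python 3, not Pownce's code: `FakeCache` is a hypothetical in-memory stand-in for a memcached client, and `TOMBSTONE_SECONDS`, `invalidate` and `get_or_load` are names invented here.

```python
import time

TOMBSTONE_SECONDS = 5  # assumed window; tune it to your own request latency


class FakeCache:
    """Hypothetical in-memory stand-in for a memcached client."""

    def __init__(self):
        self._data = {}

    def _live(self, key):
        # Return the stored (value, expires) pair, or None if absent or expired.
        entry = self._data.get(key)
        if entry is None:
            return None
        if entry[1] is not None and time.time() >= entry[1]:
            del self._data[key]
            return None
        return entry

    def get(self, key):
        entry = self._live(key)
        return None if entry is None else entry[0]

    def set(self, key, value, timeout=None):
        expires = None if timeout is None else time.time() + timeout
        self._data[key] = (value, expires)

    def add(self, key, value, timeout=None):
        # Like memcached's add: store only if nothing is there yet.
        if self._live(key) is not None:
            return False
        self.set(key, value, timeout)
        return True


def invalidate(cache, key):
    # Instead of deleting, park a None placeholder on the key for a short time.
    cache.set(key, None, TOMBSTONE_SECONDS)


def get_or_load(cache, key, loader):
    value = cache.get(key)
    if value is None:
        value = loader()
        # add() is refused while the placeholder exists, so a reader holding
        # stale data cannot write it back over a fresh invalidation.
        cache.add(key, value)
    return value
```

During the placeholder window every read falls through to the database, which trades a few extra queries for not caching a stale value.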
  • 31. Advanced Caching
    • Memcached’s atomic increment and decrement operations are useful for maintaining counts
    • But they’re not available in Django 1.0
        • Added in 1.1 by ticket #6464
  • 32. Advanced Caching
    • You can still use them if you poke at the internals of the cache object a bit
    • cache._cache is the underlying cache object
        try:
            result = cache._cache.incr(cache_key, delta)
        except ValueError:  # nonexistent key raises ValueError
            # Do it the hard way, store the result.
        return result
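One way to fill in the "hard way" branch from slide 32, as a sketch only. `FakeMemcacheClient` is a hypothetical stand-in whose `incr` raises `ValueError` on a missing key, as the slide describes. `compute_count` is an assumed callback that returns the authoritative count, for example from a `COUNT(*)` query.

```python
class FakeMemcacheClient:
    """Hypothetical stand-in for the client behind cache._cache."""

    def __init__(self):
        self.data = {}

    def incr(self, key, delta=1):
        if key not in self.data:
            raise ValueError('key not found')
        self.data[key] += delta
        return self.data[key]

    def add(self, key, value):
        # Store only if the key is absent, like memcached's add.
        if key in self.data:
            return False
        self.data[key] = value
        return True


def incr_or_init(client, key, delta, compute_count):
    try:
        return client.incr(key, delta)
    except ValueError:
        # Key is missing: compute the count the hard way and store the result.
        value = compute_count()
        # add, not set, so a concurrent process that seeded the key first
        # is not overwritten.
        client.add(key, value)
        return value
```

The sketch assumes `compute_count()` already reflects the change being counted, so the delta is not applied a second time on the fallback path.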
  • 33. Advanced Caching
    • Other missing cache API
        • delete_multi & set_multi
        • append: add data to existing key after existing data
        • prepend: add data to existing key before existing data
        • cas: store this data, but only if no one has edited it since I fetched it
  • 34. Advanced Caching
    • It’s often useful to cache objects ‘forever’ (i.e., until you explicitly invalidate them)
    • User and UserProfile
        • fetched almost every request
        • rarely change
    • But Django won’t let you
    • IMO, this is a bug :(
  • 35. The Memcache Backend
        class CacheClass(BaseCache):
            def __init__(self, server, params):
                BaseCache.__init__(self, params)
                self._cache = memcache.Client(server.split(';'))

            def add(self, key, value, timeout=0):
                if isinstance(value, unicode):
                    value = value.encode('utf-8')
                return self._cache.add(smart_str(key), value,
                                       timeout or self.default_timeout)
  • 36. The Memcache Backend
        class CacheClass(BaseCache):
            def __init__(self, server, params):
                BaseCache.__init__(self, params)
                self._cache = memcache.Client(server.split(';'))

            def add(self, key, value, timeout=None):
                if isinstance(value, unicode):
                    value = value.encode('utf-8')
                if timeout is None:
                    timeout = self.default_timeout
                return self._cache.add(smart_str(key), value, timeout)
  • 37. Advanced Caching
    • Typical setup has memcached running on web servers
    • Pownce web servers were I/O and memory bound, not CPU bound
    • Since we had some spare CPU cycles, we compressed large objects before caching them
    • The Python memcache library can do this automatically, but the API is not exposed
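The idea of spending spare CPU to shrink large cached objects can be illustrated with the standard library alone. This is a sketch, not how the memcache library does it internally: `pack`, `unpack` and the one-byte flag prefix are inventions for this example, and the 150000-byte threshold is simply the value that appears in the monkey-patching slide.

```python
import pickle
import zlib

MIN_COMPRESS_LEN = 150000  # bytes; threshold borrowed from the talk's example


def pack(value, min_compress_len=MIN_COMPRESS_LEN):
    """Serialize value, compressing only when the payload is large."""
    raw = pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)
    if len(raw) >= min_compress_len:
        return b'Z' + zlib.compress(raw)   # 'Z' flag: compressed payload
    return b'R' + raw                      # 'R' flag: raw payload


def unpack(blob):
    """Reverse pack(), decompressing if the flag byte says so."""
    flag, body = blob[:1], blob[1:]
    if flag == b'Z':
        body = zlib.decompress(body)
    return pickle.loads(body)
```

Small values skip compression entirely, so the CPU cost is only paid where the memory and network savings are likely to matter.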
  • 38. Monkey Patching core.cache
        from django.core.cache import cache
        from django.utils.encoding import smart_str
        import inspect as i

        if 'min_compress_len' in i.getargspec(cache._cache.set)[0]:
            class CacheClass(cache.__class__):
                def set(self, key, value, timeout=None,
                        min_compress_len=150000):
                    if isinstance(value, unicode):
                        value = value.encode('utf-8')
                    if timeout is None:
                        timeout = self.default_timeout
                    return self._cache.set(smart_str(key), value,
                                           timeout, min_compress_len)
            cache.__class__ = CacheClass
  • 39. Advanced Caching
    • Useful tool: automagic single object cache
    • Use a manager to check the cache prior to any single object get by pk
    • Invalidate assets on save and delete
    • Eliminated several hundred QPS at Pownce
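The single object cache can be sketched without Django to show the control flow. Everything here is hypothetical: `CachingManager` only imitates a Django manager, and plain dicts stand in for memcached and for the database table.

```python
class CachingManager:
    """Hypothetical manager that checks a cache before any get-by-pk."""

    def __init__(self, model_name, cache, db):
        self.model_name = model_name
        self.cache = cache   # dict standing in for memcached
        self.db = db         # dict of pk -> object, standing in for the table
        self.db_hits = 0     # counts lookups that reached the "database"

    def _key(self, pk):
        return '%s:%s' % (self.model_name, pk)

    def get(self, pk):
        key = self._key(pk)
        obj = self.cache.get(key)
        if obj is None:
            # Cache miss: fall through to the database and remember the result.
            self.db_hits += 1
            obj = self.db[pk]
            self.cache[key] = obj
        return obj

    def save(self, pk, obj):
        self.db[pk] = obj
        self.cache.pop(self._key(pk), None)   # invalidate on save

    def delete(self, pk):
        del self.db[pk]
        self.cache.pop(self._key(pk), None)   # invalidate on delete
```

In a real Django project the invalidation would more likely hang off `post_save` and `post_delete` signals, as in the earlier invalidation slide.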
  • 40. Advanced Caching
    All this and more at: http://github.com/mmalone/django-caching/
  • 41. Advanced Caching
    • Consistent hashing: hashes cached objects in such a way that most objects map to the same node after a node is added or removed.
    http://www.flickr.com/photos/deepfrozen/2191036528/
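A minimal hash ring shows why most keys stay put when a node joins or leaves. This is a generic textbook sketch, not the scheme any particular memcached client uses; the class name, the SHA-256 hash and the 100 virtual points per node are all assumptions made for the example.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring with virtual points for each node."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas   # virtual points per node (assumed value)
        self._points = []          # sorted positions on the ring
        self._owner = {}           # position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode('utf-8')).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            point = self._hash('%s#%d' % (node, i))
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node):
        for i in range(self.replicas):
            point = self._hash('%s#%d' % (node, i))
            del self._owner[point]
            self._points.remove(point)

    def get_node(self, key):
        # Walk clockwise to the first point at or after the key, wrapping around.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

Adding a fourth node to a three-node ring should move only about a quarter of the keys, and every key that moves lands on the new node. With plain `hash(key) % len(nodes)` nearly every key would be reassigned.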
  • 42. Caching
    Now you’ve made life easier for your DB server, next thing to fall over: your app server.
  • 43. Load Balancing
  • 44. Load Balancing
    • Out of the box, Django uses a shared nothing architecture
        • App servers have no single point of contention
        • Responsibility pushed down the stack (to DB)
    • This makes scaling the app layer trivial: just add another server
  • 45. Load Balancing
    Spread work between multiple nodes in a cluster using a load balancer.
    • Hardware or software
    • Layer 7 or Layer 4
    (Diagram: Load Balancer, App Servers, Database)
  • 46. Load Balancing
    • Hardware load balancers
        • Expensive, like $35,000 each, plus maintenance contracts
        • Need two for failover / high availability
    • Software load balancers
        • Cheap and easy, but more difficult to eliminate as a single point of failure
        • Lots of options: Perlbal, Pound, HAProxy, Varnish, Nginx
  • 47. Load Balancing
    • Most of these are layer 7 proxies, and some software balancers do cool things
        • Caching
        • Re-proxying
        • Authentication
        • URL rewriting
  • 48. Load Balancing
    A common setup for large operations is to use redundant layer 4 hardware balancers in front of a pool of layer 7 software balancers.
    (Diagram: Hardware Balancers, Software Balancers, App Servers)
  • 49. Load Balancing
    • At Pownce, we used a single Perlbal balancer
        • Easily handled all of our traffic (hundreds of simultaneous connections)
        • A SPOF, but we didn’t have $100,000 for black box solutions, and weren’t worried about service guarantees beyond three or four nines
        • Plus there were some neat features that we took advantage of
  • 50. Perlbal Reproxying
    Perlbal reproxying is a really cool, and really poorly documented, feature.
  • 51. Perlbal Reproxying
    1. Perlbal receives request
    2. Redirects to App Server
        1. App server checks auth (etc.)
        2. Returns HTTP 200 with X-Reproxy-URL header set to internal file server URL
    3. File served from file server via Perlbal
  • 52. Perlbal Reproxying
    • Completely transparent to end user
    • Doesn’t keep large app server instance around to serve file
    • Users can’t access files directly (like they could with a 302)
  • 53. Perlbal Reproxying
    Plus, it’s really easy:
        def download(request, filename):
            # Check auth, do your thing
            response = HttpResponse()
            response['X-REPROXY-URL'] = '%s/%s' % (FILE_SERVER, filename)
            return response
  • 54. Load Balancing
    Best way to reduce load on your app servers: don’t use them to do hard stuff.
  • 55. Queuing
  • 56. Queuing
    • A queue is simply a bucket that holds messages until they are removed for processing by clients
    • Many expensive operations can be queued and performed asynchronously
    • User experience doesn’t have to suffer
        • Tell the user that you’re running the job in the background (e.g., transcoding)
        • Make it look like the job was done real-time (e.g., note distribution)
  • 57. Queuing
    • Lots of open source options for queuing
        • Ghetto Queue (MySQL + Cron)
            • this is the official name.
        • Gearman
        • TheSchwartz
        • RabbitMQ
        • Apache ActiveMQ
        • ZeroMQ
  • 58. Queuing
    • Lots of fancy features: brokers, exchanges, routing keys, bindings...
        • Don’t let that crap get you down, this is really simple stuff
    • Biggest decision: persistence
        • Does your queue need to be durable and persistent, able to survive a crash?
        • This requires logging to disk which slows things down, so don’t do it unless you have to
  • 59. Queuing
    • Pownce used a simple ghetto queue built on MySQL / cron
        • Problematic if you have multiple consumers pulling jobs from the queue
    • No point in reinventing the wheel, there are dozens of battle-tested open source queues to choose from
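The multiple-consumer problem with a table-backed queue comes down to two workers selecting the same row. One common remedy, sketched here with the standard library's SQLite as a stand-in for MySQL, is to make the claim a conditional UPDATE and check how many rows it touched. The table layout and function names are hypothetical, not Pownce's schema.

```python
import sqlite3


def make_queue(conn):
    conn.execute(
        "CREATE TABLE jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " claimed_by TEXT)"
    )


def enqueue(conn, payload):
    conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    conn.commit()


def claim_next(conn, worker):
    """Claim the oldest unclaimed job, or return None if the queue is empty."""
    while True:
        row = conn.execute(
            "SELECT id, payload FROM jobs"
            " WHERE claimed_by IS NULL ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        job_id, payload = row
        # The extra "claimed_by IS NULL" condition makes the claim atomic:
        # only one worker's UPDATE can match a given unclaimed row.
        cur = conn.execute(
            "UPDATE jobs SET claimed_by = ? WHERE id = ? AND claimed_by IS NULL",
            (worker, job_id),
        )
        conn.commit()
        if cur.rowcount == 1:
            return job_id, payload
        # Another consumer won the race for this row; look for the next one.
```

This avoids double processing but still polls the database, which is part of why the talk points to purpose-built queues.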
  • 60. Django Standalone Scripts
    Consumers need to set up the Django environment:
        from django.core.management import setup_environ
        from mysite import settings

        setup_environ(settings)
  • 61. THE DATABASE!
  • 62. The Database
    • Til now we’ve been talking about
        • Shared nothing
        • Pushing problems down the stack
    • But we have to store a persistent and consistent view of our application’s state somewhere
    • Enter, the database...
  • 63. CAP Theorem
    • Three properties of a shared-data system
        • Consistency: all clients see the same data
        • Availability: all clients can see some version of the data
        • Partition Tolerance: system properties hold even when the system is partitioned & messages are lost
    • But you can only have two
  • 64. CAP Theorem
    • Big long proof... here’s my version.
        • Empirically, seems to make sense.
    • Eric Brewer
        • Professor at University of California, Berkeley
        • Co-founder and Chief Scientist of Inktomi
        • Probably smarter than me
  • 65. CAP Theorem
    • The relational database systems we all use were built with consistency as their primary goal
    • But at scale our system needs to have high availability and must be partitionable
        • The RDBMS’s consistency requirements get in our way
        • Most sharding / federation schemes are kludges that trade consistency for availability & partition tolerance
  • 66. The Database
    • There are lots of non-relational databases coming onto the scene
        • CouchDB
        • Cassandra
        • Tokyo Cabinet
    • But they’re not that mature, and they aren’t easy to use with Django
  • 67. The Database
    • Django has no support for
        • Non-relational databases like CouchDB
        • Multiple databases (coming soon?)
    • If you’re looking for a project, plz fix this.
    • Only advice: don’t get too caught up in trying to duplicate the existing ORM API
  • 68. I Want a Pony
    • Save always saves every field of a model
        • Causes unnecessary contention and more data transfer
    • A better way:
        • Use descriptors to determine what’s dirty
        • Only update dirty fields when an object is saved
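The descriptor idea from slide 68 can be shown in plain Python 3. This is an illustrative sketch and not Django's field machinery: `TrackedField` and `Note` are hypothetical, and `save()` only returns the SQL it would issue rather than talking to a database.

```python
class TrackedField:
    """Descriptor that records which attributes changed since the last save."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get('_values', {}).get(self.name)

    def __set__(self, obj, value):
        values = obj.__dict__.setdefault('_values', {})
        dirty = obj.__dict__.setdefault('_dirty', set())
        # Mark the field dirty only when the value actually changes.
        if self.name not in values or values[self.name] != value:
            dirty.add(self.name)
        values[self.name] = value


class Note:
    title = TrackedField()
    body = TrackedField()

    def __init__(self, title, body):
        self.title = title
        self.body = body
        self._dirty.clear()   # a freshly loaded object starts clean

    def save(self):
        """Return the UPDATE a save would issue, or None if nothing changed."""
        if not self._dirty:
            return None
        columns = sorted(self._dirty)
        assignments = ', '.join('%s = %%s' % c for c in columns)
        self._dirty.clear()
        return 'UPDATE note SET %s WHERE id = %%s' % assignments
```

Because only changed columns appear in the statement, two processes editing different fields of the same row no longer overwrite each other's values.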
  • 69. Denormalization
  • 70. Denormalization
    • Django encourages normalized data, which is usually good
    • But at scale you need to denormalize
        • Corollary: joins are evil
    • Django makes it really easy to do joins using the ORM, so pay attention
  • 71. Denormalization
    • Start with a normalized database
    • Selectively denormalize things as they become bottlenecks
    • Denormalized counts, copied fields, etc. can be updated in signal handlers
  • 72. Replication
  • 73. Replication
    • Typical web app is 80 to 90% reads
    • Adding read capacity will get you a long way
    • MySQL Master-Slave replication
    (Diagram: master handles reads & writes, slaves are read only)
  • 74. Replication
    • Django doesn’t make it easy to use multiple database connections, but it is possible
    • Some caveats
        • Slave lag interacts with caching in weird ways
        • You can only save to your primary DB (the one you configure in settings.py)
            • Unless you get really clever...
  • 75. Replication
    1. Create a custom database wrapper by subclassing DatabaseWrapper
        class SlaveDatabaseWrapper(DatabaseWrapper):
            def _cursor(self, settings):
                if not self._valid_connection():
                    kwargs = {
                        'conv': django_conversions,
                        'charset': 'utf8',
                        'use_unicode': True,
                    }
                    kwargs = pick_random_slave(settings.SLAVE_DATABASES)
                    self.connection = Database.connect(**kwargs)
                    ...
                cursor = CursorWrapper(self.connection.cursor())
                return cursor
  • 76. Replication
    2. Custom QuerySet that uses primary DB for writes
        class MultiDBQuerySet(QuerySet):
            ...
            def update(self, **kwargs):
                slave_conn = self.query.connection
                self.query.connection = default_connection
                super(MultiDBQuerySet, self).update(**kwargs)
                self.query.connection = slave_conn
  • 77. Replication
    3. Custom Manager that uses your custom QuerySet
        class SlaveDatabaseManager(db.models.Manager):
            def get_query_set(self):
                return MultiDBQuerySet(self.model,
                                       query=self.create_query())

            def create_query(self):
                return db.models.sql.Query(self.model, connection)
  • 78. Replication
    Example on github: http://github.com/mmalone/django-multidb/
  • 79. Replication
    • Goal:
        • Read-what-you-write consistency for writer
        • Eventual consistency for everyone else
    • Slave lag screws things up
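One common way to approximate read-what-you-write consistency is to pin a user's reads to the primary for a short window after they write. The talk does not say this is what Pownce did; the sketch below is a generic illustration, and `ReplicaRouter`, `pin_seconds` and the database names are all hypothetical.

```python
import random
import time


class ReplicaRouter:
    """Send recent writers to the primary, everyone else to a slave."""

    def __init__(self, replicas, pin_seconds=5.0, clock=time.time):
        self.replicas = list(replicas)
        self.pin_seconds = pin_seconds   # assumed upper bound on slave lag
        self.clock = clock               # injectable so it can be tested
        self._last_write = {}            # user id -> time of last write

    def record_write(self, user_id):
        self._last_write[user_id] = self.clock()

    def db_for_read(self, user_id):
        wrote_at = self._last_write.get(user_id)
        if wrote_at is not None and self.clock() - wrote_at < self.pin_seconds:
            return 'primary'             # the writer sees their own write
        return random.choice(self.replicas)
```

With several app servers the last-write timestamp would have to live somewhere shared, such as the session or a cookie, rather than in a per-process dict as it does here.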
  • 80. Replication
    What happens when you become write saturated?
  • 81. Federation
  • 82. Federation
    • Start with Vertical Partitioning: split tables that aren’t joined across database servers
    • Actually pretty easy
    • Except not with Django
  • 83. Federation
    django.db.models.base FAIL!
  • 84. Federation
    If the Django pony gets kicked every time someone uses {% endifnotequal %}, I don’t want to know what happens every time django.db.connection is imported.
    http://www.flickr.com/photos/captainmidnight/811458621/
  • 85. Federation
    • At some point you’ll need to split a single table across databases (e.g., user table)
    • Now auto-increment won’t work
    • But Django uses auto-increment for PKs
        • ugh
    • Pluggable UUID backend?
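To make the UUID suggestion concrete, here is a small sketch of keys that need no coordination between databases, plus one simple way to map a key to a shard. The shard names and both helper functions are hypothetical, and modulo placement is only the simplest option.

```python
import uuid
import zlib

SHARDS = ['users_db_0', 'users_db_1', 'users_db_2', 'users_db_3']  # hypothetical


def new_primary_key():
    # Random UUIDs can be generated on any app server without asking a
    # database for the next auto-increment value.
    return uuid.uuid4().hex


def shard_for(pk, shards=SHARDS):
    # crc32 is stable across processes, unlike Python's built-in hash().
    return shards[zlib.crc32(pk.encode('ascii')) % len(shards)]
```

Modulo placement moves most rows when the number of shards changes, so a lookup table or a consistent hashing scheme is usually the next step.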
  • 86. Profiling, Monitoring & Measuring
  • 87. Know your SQL
        >>> Article.objects.filter(pk=3).query.as_sql()
        ('SELECT "app_article"."id", "app_article"."name",
        "app_article"."author_id" FROM "app_article" WHERE
        "app_article"."id" = %s ', (3,))
  • 88. Know your SQL
        >>> import sqlparse
        >>> def pp_query(qs):
        ...     t = qs.query.as_sql()
        ...     sql = t[0] % t[1]
        ...     print sqlparse.format(sql, reindent=True, keyword_case='upper')
        ...
        >>> pp_query(Article.objects.filter(pk=3))
        SELECT "app_article"."id",
               "app_article"."name",
               "app_article"."author_id"
        FROM "app_article"
        WHERE "app_article"."id" = 3
  • 89. Know your SQL
        >>> from django.db import connection
        >>> connection.queries
        [{'time': '0.001', 'sql': u'SELECT "app_article"."id",
        "app_article"."name", "app_article"."author_id" FROM
        "app_article"'}]
  • 90. Know your SQL
    • It’d be nice if a lightweight stacktrace could be done in QuerySet.__init__
        • Stick the result in connection.queries
    • Now we know where the query originated
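The stack-trace wish on slide 90 can be illustrated with the standard `traceback` module. This is a standalone sketch and does not hook into Django: the `queries` list stands in for `connection.queries`, and `record_query` and `article_list_view` are hypothetical names.

```python
import traceback

queries = []   # stand-in for connection.queries


def record_query(sql, depth=3):
    # Drop this function's own frame, then keep the innermost few callers.
    stack = traceback.extract_stack()[:-1][-depth:]
    queries.append({
        'sql': sql,
        'origin': ['%s:%d in %s' % (f.filename, f.lineno, f.name)
                   for f in stack],
    })


def article_list_view():
    # A view that "runs" a query; the recorded origin should point here.
    record_query('SELECT "app_article"."id" FROM "app_article"')


article_list_view()
```

Limiting the depth keeps the overhead small, which matters if something like this runs for every query on a busy page.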
  • 91. Measuring
    Django Debug Toolbar: http://github.com/robhudson/django-debug-toolbar/
  • 92. Monitoring
    You can’t improve what you don’t measure.
    • Ganglia
    • Munin
  • 93. Measuring & Monitoring
    • Measure
        • Server load, CPU usage, I/O
        • Database QPS
        • Memcache QPS, hit rate, evictions
        • Queue lengths
        • Anything else interesting
  • 94. All done... Questions?
  • 95. Contact Me
    Mike Malone
    mjmalone@gmail.com
    twitter.com/mjmalone