Scaling Django Web Apps
Mike Malone
EuroDjangoCon 2009

Hi, I’m Mike.

http://www.flickr.com/photos/kveton/2910536252/

Pownce

• Large scale
    • Hundreds of requests/sec
    • Thousands of DB operations/sec
    • Millions of user relationships
    • Millions of notes
    • Terabytes of static data

Pownce

• Encountered and eliminated many common scaling bottlenecks
• Real-world example of scaling a Django app
• Django provides a lot for free
• I’ll be focusing on what you have to build yourself, and the rare places where Django got in the way

Scalability

Scalability

Scalability is NOT:
• Speed / performance
• Generally affected by language choice
• Achieved by adopting a particular technology

A Scalable Application

    import time

    def application(environ, start_response):
        time.sleep(10)
        start_response('200 OK', [('content-type', 'text/plain')])
        return ('Hello, world!',)

Slow (every request takes ten seconds), but it shares nothing between requests or servers, so you can handle more traffic just by adding servers.

A High Performance Application

    def application(environ, start_response):
        remote_addr = environ['REMOTE_ADDR']
        f = open('access-log', 'a+')
        f.write(remote_addr + "\n")
        f.flush()
        f.seek(0)
        hits = sum(1 for l in f if l.strip() == remote_addr)
        f.close()
        start_response('200 OK', [('content-type', 'text/plain')])
        return (str(hits),)

Fast for a single request, but every request rereads the entire access log, so it slows down as the log grows, and it depends on a local file that other servers can’t see.

Scalability

A scalable system doesn’t need to change when the size of the problem changes.

Scalability

• Accommodate increased usage
• Accommodate increased data
• Maintainable

Scalability

• Two kinds of scalability
    • Vertical scalability: buying more powerful hardware, replacing what you already own
    • Horizontal scalability: buying additional hardware, supplementing what you already own

Vertical Scalability

• Costs don’t scale linearly (a server that’s twice as fast costs more than twice as much)
• Inherently limited by current technology
• But it’s easy! If you can get away with it, good for you.

Vertical Scalability

“Skyscrapers are special. Normal buildings don’t need 10-floor foundations. Just build!”
    - Cal Henderson

Horizontal Scalability

The ability to increase a system’s capacity by adding more processing units (servers).

Horizontal Scalability

It’s how large apps are scaled.

Horizontal Scalability

• A lot more work to design, build, and maintain
• Requires some planning, but you don’t have to do all the work up front
• You can scale progressively...
• The rest of the presentation follows roughly the order in which you’ll scale each piece

Caching

Caching

• Several levels of caching available in Django
    • Per-site cache: caches every page that doesn’t have GET or POST parameters
    • Per-view cache: caches the output of an individual view
    • Template fragment cache: caches fragments of a template
• None of these are that useful if pages are heavily personalized

Caching

• Low-level cache API
    • Much more flexible, allows you to cache at any granularity
• At Pownce we typically cached
    • Individual objects
    • Lists of object IDs
• Hard part is invalidation

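A minimal sketch of caching at both granularities listed above: a list of object IDs under one key, plus each object under its own key, fetched with a single multi-get. A plain dict stands in for memcached, and `FakeCache`, `DB_NOTES`, `load_note_ids_from_db`, `load_notes_from_db`, and `notes_for_user` are hypothetical names for this example, not Django APIs.

```python
class FakeCache:
    """Dict-backed stand-in for a memcached client (no expiry)."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def get_many(self, keys):
        return {k: self.data[k] for k in keys if k in self.data}

    def set(self, key, value):
        self.data[key] = value


cache = FakeCache()
DB_NOTES = {1: "hello", 2: "world", 3: "again"}  # hypothetical table: id -> body
db_queries = []  # records each simulated DB round trip


def load_note_ids_from_db(user_id):
    db_queries.append(("ids", user_id))
    return sorted(DB_NOTES)


def load_notes_from_db(ids):
    db_queries.append(("notes", tuple(ids)))
    return {i: DB_NOTES[i] for i in ids}


def notes_for_user(user_id):
    # 1. The list of IDs lives under a single key.
    ids_key = "note_ids_for_%s" % user_id
    ids = cache.get(ids_key)
    if ids is None:
        ids = load_note_ids_from_db(user_id)
        cache.set(ids_key, ids)
    # 2. Each object lives under its own key; fetch them in one multi-get.
    found = cache.get_many(["note_%s" % i for i in ids])
    missing = [i for i in ids if "note_%s" % i not in found]
    if missing:
        for i, body in load_notes_from_db(missing).items():
            cache.set("note_%s" % i, body)
            found["note_%s" % i] = body
    return [found["note_%s" % i] for i in ids]
```

Keeping objects and ID lists apart means editing one note only invalidates `note_<id>`; the ID list only changes when a note is added or removed.
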
Caching

• Cache backends:
    • Memcached
    • Database caching
    • Filesystem caching

Caching

Use Memcache.

Sessions

Use Memcache.

Sessions

Or Tokyo Cabinet
http://github.com/ericflo/django-tokyo-sessions/
Thanks @ericflo

Caching

Basic caching comes free with Django:

    from django.core.cache import cache
    from django.db import models

    class UserProfile(models.Model):
        ...
        def get_social_network_profiles(self):
            cache_key = 'networks_for_%s' % self.user.id
            profiles = cache.get(cache_key)
            if profiles is None:
                profiles = self.user.social_network_profiles.all()
                cache.set(cache_key, profiles)
            return profiles

Caching

Invalidate when a model is saved or deleted:

    from django.core.cache import cache
    from django.db.models import signals

    def nuke_social_network_cache(sender, instance, **kwargs):
        # Receivers get the sending model class first, then the saved instance
        cache_key = 'networks_for_%s' % instance.user_id
        cache.delete(cache_key)

    signals.post_save.connect(nuke_social_network_cache, sender=SocialNetworkProfile)
    signals.post_delete.connect(nuke_social_network_cache, sender=SocialNetworkProfile)

Caching

• Invalidate post_save, not pre_save (invalidating in pre_save lets another request re-cache the old data before the save lands)
• Still a small race condition
• Simple solution, worked for Pownce:
    • Instead of deleting, set the cache key to None for a short period of time
    • Instead of using set to cache objects, use add, which fails if there’s already something stored for the key

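A sketch of the tombstone-plus-`add` trick, assuming a small in-memory stand-in for memcached with a pluggable clock; `ExpiringCache`, `invalidate`, `cache_fill`, and the five-second `TOMBSTONE_SECONDS` are hypothetical names and values.

```python
import time

TOMBSTONE_SECONDS = 5  # hypothetical "short period of time"


class ExpiringCache:
    """In-memory stand-in for memcached: get/set/add with per-key timeouts."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.data = {}  # key -> (value, expires_at)

    def _live(self, key):
        entry = self.data.get(key)
        if entry is not None and entry[1] <= self.clock():
            del self.data[key]  # expired
            entry = None
        return entry

    def get(self, key):
        entry = self._live(key)
        return None if entry is None else entry[0]

    def set(self, key, value, timeout):
        self.data[key] = (value, self.clock() + timeout)

    def add(self, key, value, timeout):
        # Like memcached's add: store only if nothing is stored for the key.
        if self._live(key) is not None:
            return False
        self.set(key, value, timeout)
        return True


def invalidate(cache, key):
    # Instead of delete(): store None briefly so late add() calls fail.
    cache.set(key, None, TOMBSTONE_SECONDS)


def cache_fill(cache, key, value, timeout=3600):
    # Readers populate the cache with add(), never set().
    return cache.add(key, value, timeout)
```

The race it closes: a reader misses the cache and loads the old row, a writer saves and invalidates, then the reader caches what it loaded. With `delete` plus `set`, that stale value would stick; with a short-lived `None` plus `add`, the reader's `add` fails, and the first reader after the tombstone expires caches fresh data.
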
Advanced Caching

• Memcached’s atomic increment and decrement operations are useful for maintaining counts
• But they’re not available in Django 1.0
• Added in 1.1 by ticket #6464

Advanced Caching

• You can still use them if you poke at the internals of the cache object a bit
• cache._cache is the underlying cache object

    try:
        result = cache._cache.incr(cache_key, delta)
    except ValueError:  # nonexistent key raises ValueError
        # Do it the hard way: recompute the count and store the result.
        result = recount()  # hypothetical helper, e.g. a COUNT(*) query
        cache.set(cache_key, result)
    return result

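Why atomic increments matter for counts: a deterministic simulation, assuming a toy single-threaded `Store` (hypothetical) whose `incr` applies the change in one step, the way memcached does server-side. Two clients doing `get` then `set` can lose an update; `incr` cannot.

```python
class Store:
    """Toy key-value store; incr() applies the change in a single step."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value

    def incr(self, key, delta=1):
        if key not in self.data:
            raise ValueError("key %r not found" % key)
        self.data[key] += delta
        return self.data[key]


def read_modify_write_race(store, key):
    """Two clients each do get() then set(count + 1), badly interleaved."""
    a = store.get(key)      # client A reads the count
    b = store.get(key)      # client B reads the same count
    store.set(key, a + 1)   # A writes count + 1
    store.set(key, b + 1)   # B writes count + 1 too: A's increment is lost
    return store.get(key)


def atomic_increments(store, key):
    """The same two clients using incr(): no read-modify-write window."""
    store.incr(key)
    store.incr(key)
    return store.get(key)
```
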
Advanced Caching

• Other missing cache API
    • delete_multi & set_multi
    • append: add data to an existing key after existing data
    • prepend: add data to an existing key before existing data
    • cas: store this data, but only if no one has edited it since I fetched it

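A sketch of the `cas` (check-and-set) semantics listed above, assuming an in-memory stand-in that issues a new token on every write; `CasCache` and `append_with_retry` are hypothetical. Memcached's own protocol has matching `gets` and `cas` commands.

```python
class CasCache:
    """In-memory sketch of memcached-style gets/cas semantics."""

    def __init__(self):
        self.data = {}  # key -> (value, cas_token)
        self.next_token = 1

    def set(self, key, value):
        self.data[key] = (value, self.next_token)
        self.next_token += 1

    def gets(self, key):
        """Return (value, token), or (None, None) if the key is missing."""
        return self.data.get(key, (None, None))

    def cas(self, key, value, token):
        """Store value only if nobody has written the key since gets()."""
        current = self.data.get(key)
        if current is None or current[1] != token:
            return False
        self.set(key, value)
        return True


def append_with_retry(cache, key, item, retries=10):
    """Append to a cached list without clobbering concurrent writers."""
    for _ in range(retries):
        value, token = cache.gets(key)
        if value is None:
            return False  # nothing cached; caller should rebuild it
        if cache.cas(key, value + [item], token):
            return True
    return False
```
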
Advanced Caching

• It’s often useful to cache objects ‘forever’ (i.e., until you explicitly invalidate them)
    • User and UserProfile
        • fetched on almost every request
        • rarely change
• But Django won’t let you
    • IMO, this is a bug :(

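The Memcache Backend

    from django.core.cache.backends.base import BaseCache
    from django.utils.encoding import smart_str
    import memcache

    class CacheClass(BaseCache):
        def __init__(self, server, params):
            BaseCache.__init__(self, params)
            self._cache = memcache.Client(server.split(';'))

        def add(self, key, value, timeout=0):
            if isinstance(value, unicode):
                value = value.encode('utf-8')
            # 0 is falsy, so an explicit timeout of 0 ("never expire")
            # is silently replaced by the default timeout
            return self._cache.add(smart_str(key), value,
                                   timeout or self.default_timeout)
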
The Memcache Backend

    class CacheClass(BaseCache):
        def __init__(self, server, params):
            BaseCache.__init__(self, params)
            self._cache = memcache.Client(server.split(';'))

        def add(self, key, value, timeout=None):
            if isinstance(value, unicode):
                value = value.encode('utf-8')
            # Only fall back to the default when no timeout was given,
            # so timeout=0 reaches memcached and means "never expire"
            if timeout is None:
                timeout = self.default_timeout
            return self._cache.add(smart_str(key), value, timeout)

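The difference between the two `add` methods comes down to Python truthiness. A two-function sketch (the 300-second `DEFAULT_TIMEOUT` is a hypothetical value):

```python
DEFAULT_TIMEOUT = 300  # hypothetical default, in seconds


def timeout_with_or(timeout=0):
    # Mirrors the first add(): 0 is falsy, so "never expire" becomes the default.
    return timeout or DEFAULT_TIMEOUT


def timeout_with_none_check(timeout=None):
    # Mirrors the second add(): only a missing timeout gets the default,
    # so an explicit 0 is passed through (memcached treats 0 as no expiry).
    if timeout is None:
        timeout = DEFAULT_TIMEOUT
    return timeout
```
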
Advanced Caching

• Typical setup has memcached running on the web servers
• Pownce web servers were I/O and memory bound, not CPU bound
• Since we had some spare CPU cycles, we compressed large objects before caching them
• The Python memcache library can do this automatically, but Django’s cache API doesn’t expose it

Monkey Patching core.cache

    from django.core.cache import cache
    from django.utils.encoding import smart_str
    import inspect as i

    # Only patch if the underlying client's set() accepts min_compress_len
    if 'min_compress_len' in i.getargspec(cache._cache.set)[0]:
        class CacheClass(cache.__class__):
            def set(self, key, value, timeout=None, min_compress_len=150000):
                if isinstance(value, unicode):
                    value = value.encode('utf-8')
                if timeout is None:
                    timeout = self.default_timeout
                return self._cache.set(smart_str(key), value, timeout,
                                       min_compress_len)
        # Swap the class of the existing cache object in place
        cache.__class__ = CacheClass

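Roughly the idea behind `min_compress_len`: serialize, compress only payloads over the threshold, and keep the compressed form only if it is actually smaller. This is a hypothetical sketch using `pickle` and `zlib` with a made-up `FLAG_COMPRESSED` value, not python-memcached’s own code.

```python
import pickle
import zlib

MIN_COMPRESS_LEN = 150000  # bytes; the threshold used on the slide above
FLAG_COMPRESSED = 1        # hypothetical flag stored alongside the value


def encode(value, min_compress_len=MIN_COMPRESS_LEN):
    """Serialize value; compress it only if the payload is large."""
    data = pickle.dumps(value)
    if min_compress_len and len(data) > min_compress_len:
        compressed = zlib.compress(data)
        if len(compressed) < len(data):  # keep it only if it helps
            return FLAG_COMPRESSED, compressed
    return 0, data


def decode(flags, data):
    """Undo encode(), decompressing first if the flag is set."""
    if flags & FLAG_COMPRESSED:
        data = zlib.decompress(data)
    return pickle.loads(data)
```

Small objects skip compression entirely, so the CPU cost is only paid on the large values where it saves memory and network transfer.
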
Advanced Caching

• Useful tool: automagic single object cache
• Use a manager to check the cache prior to any single object get by pk
• Invalidate cached objects on save and delete
• Eliminated several hundred QPS at Pownce

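A framework-free sketch of the single-object cache: check the cache before any get by primary key, and invalidate on save and delete. The dicts standing in for the table and for memcached, and the `CachedPkLookup` name, are hypothetical; the slide’s version does this inside a Django model manager.

```python
class CachedPkLookup:
    """Cache-first get-by-pk with invalidation on save/delete."""

    def __init__(self, model_name, db, cache):
        self.model_name = model_name
        self.db = db        # dict standing in for the table: pk -> object
        self.cache = cache  # dict standing in for memcached
        self.db_hits = 0

    def _key(self, pk):
        return "%s:%s" % (self.model_name, pk)

    def get(self, pk):
        key = self._key(pk)
        if key in self.cache:
            return self.cache[key]
        self.db_hits += 1
        obj = self.db[pk]  # KeyError plays the role of DoesNotExist
        self.cache[key] = obj
        return obj

    def save(self, pk, obj):
        self.db[pk] = obj
        self.cache.pop(self._key(pk), None)  # invalidate after the write

    def delete(self, pk):
        del self.db[pk]
        self.cache.pop(self._key(pk), None)
```
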
Advanced Caching

All this and more at:
http://github.com/mmalone/django-caching/

Advanced Caching

• Consistent hashing: hashes cached objects in such a way that most objects map to the same node after a node is added or removed.

http://www.flickr.com/photos/deepfrozen/2191036528/

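A minimal consistent-hash ring, assuming MD5-derived positions and 100 virtual points per server (both arbitrary choices for this sketch; `HashRing` and the server names are hypothetical). A key belongs to the first server point clockwise from its hash, wrapping around the ring.

```python
import bisect
import hashlib


def _hash(key):
    # First 8 bytes of MD5 as an integer position on the ring.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent hash ring with virtual points per server."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas
        self._ring = []   # sorted positions
        self._owner = {}  # position -> node
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.replicas):
            pos = _hash("%s#%d" % (node, i))
            self._owner[pos] = node
            bisect.insort(self._ring, pos)

    def remove(self, node):
        for i in range(self.replicas):
            pos = _hash("%s#%d" % (node, i))
            del self._owner[pos]
            self._ring.remove(pos)

    def node_for(self, key):
        # First position after hash(key), wrapping to the start of the ring.
        idx = bisect.bisect(self._ring, _hash(key)) % len(self._ring)
        return self._owner[self._ring[idx]]
```

Adding a fourth server moves only the keys that now land on its points, roughly a quarter of them, and every moved key goes to the new server. With naive `hash(key) % n_servers`, going from three to four servers would remap about three quarters of the keys.
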
Caching

Now that you’ve made life easier for your DB server, the next thing to fall over is your app server.

Load Balancing

Load Balancing

• Out of the box, Django uses a shared-nothing architecture
    • App servers have no single point of contention
    • Responsibility pushed down the stack (to the DB)
• This makes scaling the app layer trivial: just add another server

Load Balancing

Spread work between multiple nodes in a cluster using a load balancer.

[Diagram: Load Balancer (hardware or software; layer 7 or layer 4) -> App Servers -> Database]

Load Balancing

• Hardware load balancers
    • Expensive, like $35,000 each, plus maintenance contracts
    • Need two for failover / high availability
• Software load balancers
    • Cheap and easy, but more difficult to eliminate as a single point of failure
    • Lots of options: Perlbal, Pound, HAProxy, Varnish, Nginx

Load Balancing

• Most of these are layer 7 proxies, and some software balancers do cool things
    • Caching
    • Re-proxying
    • Authentication
    • URL rewriting

Load Balancing

A common setup for large operations is to use redundant layer 4 hardware balancers in front of a pool of layer 7 software balancers.

[Diagram: Hardware Balancers -> Software Balancers -> App Servers]

Load Balancing

• At Pownce, we used a single Perlbal balancer
    • Easily handled all of our traffic (hundreds of simultaneous connections)
    • A single point of failure (SPOF), but we didn’t have $100,000 for black-box solutions, and weren’t worried about service guarantees beyond three or four nines
• Plus there were some neat features that we took advantage of

Perlbal Reproxying

Perlbal reproxying is a really cool, and really poorly documented, feature.

Perlbal Reproxying

1. Perlbal receives request
2. Proxies it to an app server
    1. App server checks auth (etc.)
    2. Returns HTTP 200 with the X-Reproxy-URL header set to an internal file server URL
3. File served from the file server via Perlbal

Perlbal Reproxying

• Completely transparent to the end user
• Doesn’t keep a large app server instance around to serve the file
• Users can’t access files directly (like they could with a 302)

Perlbal Reproxying

Plus, it’s really easy:

    from django.http import HttpResponse

    def download(request, filename):
        # Check auth, do your thing
        response = HttpResponse()
        # FILE_SERVER: base URL of the internal file server
        response['X-REPROXY-URL'] = '%s/%s' % (FILE_SERVER, filename)
        return response

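A simulation of both sides of the exchange, assuming hypothetical `app` and `balancer` functions and a dict standing in for the internal file server. It shows the division of labor: the app server only checks authorization and names the file; the balancer fetches it and returns it, and the internal URL never reaches the client.

```python
FILE_SERVER = "http://files.internal"  # hypothetical internal file server
INTERNAL_FILES = {"http://files.internal/report.pdf": b"%PDF-1.4 fake contents"}


def app(path, user_is_authorized):
    """App server: check auth, then hand the heavy lifting back."""
    if not user_is_authorized:
        return 403, {}, b"Forbidden"
    filename = path.rsplit("/", 1)[-1]
    return 200, {"X-Reproxy-URL": "%s/%s" % (FILE_SERVER, filename)}, b""


def balancer(path, user_is_authorized, fetch=INTERNAL_FILES.__getitem__):
    """Balancer: if the app set X-Reproxy-URL, fetch that URL itself and
    return its body, stripping the header so the client never sees it."""
    status, headers, body = app(path, user_is_authorized)
    reproxy_url = headers.pop("X-Reproxy-URL", None)
    if status == 200 and reproxy_url:
        body = fetch(reproxy_url)
    return status, headers, body
```
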
Load Balancing

The best way to reduce load on your app servers: don’t use them to do hard stuff.

Queuing

Queuing

• A queue is simply a bucket that holds messages until they are removed for processing by clients
• Many expensive operations can be queued and performed asynchronously
• User experience doesn’t have to suffer
    • Tell the user that you’re running the job in the background (e.g., transcoding)
    • Make it look like the job was done in real time (e.g., note distribution)

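A single-process sketch of deferring expensive work, with `queue.Queue` and a worker thread standing in for a real broker and separate consumer processes; `distribute_note` and `handle_post_note` are hypothetical. The request handler only enqueues the job and returns immediately.

```python
import queue
import threading

jobs = queue.Queue()
results = []
results_lock = threading.Lock()


def distribute_note(note_id, follower_ids):
    # Hypothetical expensive job: fan a note out to every follower's inbox.
    return [(follower, note_id) for follower in follower_ids]


def handle_post_note(note_id, follower_ids):
    """Request path: enqueue the slow work and respond right away."""
    jobs.put((note_id, follower_ids))
    return "Your note was posted"


def worker():
    while True:
        job = jobs.get()
        if job is None:  # sentinel: shut down
            jobs.task_done()
            break
        deliveries = distribute_note(*job)
        with results_lock:
            results.extend(deliveries)
        jobs.task_done()
```
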
Queuing

• Lots of open source options for queuing
    • Ghetto Queue (MySQL + Cron)
        • this is the official name.
    • Gearman
    • TheSchwartz
    • RabbitMQ
    • Apache ActiveMQ
    • ZeroMQ

Queuing

• Lots of fancy features: brokers, exchanges, routing keys, bindings...
    • Don’t let that crap get you down; this is really simple stuff
• Biggest decision: persistence
    • Does your queue need to be durable and persistent, able to survive a crash?
    • This requires logging to disk, which slows things down, so don’t do it unless you have to

Queuing

• Pownce used a simple ghetto queue built on MySQL / cron
    • Problematic if you have multiple consumers pulling jobs from the queue
• No point in reinventing the wheel; there are dozens of battle-tested open source queues to choose from

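One way around the multiple-consumer problem in a table-backed queue is to claim jobs with a conditional `UPDATE` and check how many rows it changed. A sketch using `sqlite3` in place of MySQL; the `jobs` schema and the function names are hypothetical.

```python
import sqlite3


def make_queue():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, payload TEXT, claimed_by TEXT)"
    )
    return db


def enqueue(db, payload):
    db.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    db.commit()


def claim_next(db, worker):
    """Claim the oldest unclaimed job, or return None if there is none."""
    while True:
        row = db.execute(
            "SELECT id, payload FROM jobs WHERE claimed_by IS NULL "
            "ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # Only succeeds if nobody claimed the row since our SELECT.
        cur = db.execute(
            "UPDATE jobs SET claimed_by = ? WHERE id = ? AND claimed_by IS NULL",
            (worker, row[0]),
        )
        db.commit()
        if cur.rowcount == 1:
            return row
        # Lost the race for this row; try the next one.
```

If two workers select the same row, only one conditional `UPDATE` matches; the other sees `rowcount == 0` and moves on instead of processing the job twice.
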
Django Standalone Scripts

Consumers need to set up the Django environment:

    from django.core.management import setup_environ
    from mysite import settings

    setup_environ(settings)

THE DATABASE!

The Database

• Until now we’ve been talking about
    • Shared nothing
    • Pushing problems down the stack
• But we have to store a persistent and consistent view of our application’s state somewhere
• Enter the database...

CAP Theorem

• Three properties of a shared-data system
    • Consistency: all clients see the same data
    • Availability: all clients can see some version of the data
    • Partition tolerance: system properties hold even when the system is partitioned & messages are lost
• But you can only have two

CAP Theorem

• Big long proof... here’s my version.
• Empirically, seems to make sense.
• Eric Brewer
    • Professor at University of California, Berkeley
    • Co-founder and Chief Scientist of Inktomi
    • Probably smarter than me

CAP Theorem

• The relational database systems we all use were built with consistency as their primary goal
• But at scale our system needs to have high availability and must be partitionable
• The RDBMS’s consistency requirements get in our way
• Most sharding / federation schemes are kludges that trade consistency for availability & partition tolerance

The Database

• There are lots of non-relational databases coming onto the scene
    • CouchDB
    • Cassandra
    • Tokyo Cabinet
• But they’re not that mature, and they aren’t easy to use with Django

The Database

• Django has no support for
    • Non-relational databases like CouchDB
    • Multiple databases (coming soon?)
• If you’re looking for a project, plz fix this.
• Only advice: don’t get too caught up in trying to duplicate the existing ORM API

I Want a Pony

• Save always saves every field of a model
    • Causes unnecessary contention and more data transfer
• A better way:
    • Use descriptors to determine what’s dirty
    • Only update dirty fields when an object is saved

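A plain-Python sketch of the pony: a data descriptor records which attributes were assigned new values, and saving builds an `UPDATE` for only those columns. `TrackedField`, `Note`, and the `notes` table are hypothetical, not Django’s model machinery.

```python
class TrackedField:
    """Data descriptor that marks an attribute dirty when its value changes."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__[self.name]

    def __set__(self, obj, value):
        if self.name in obj.__dict__ and obj.__dict__[self.name] == value:
            return  # same value: not dirty
        obj.__dict__[self.name] = value
        obj.__dict__.setdefault("_dirty", set()).add(self.name)


class Note:
    table = "notes"
    body = TrackedField()
    likes = TrackedField()

    def __init__(self, pk, body, likes):
        self.pk = pk
        self.body = body
        self.likes = likes
        self._dirty = set()  # freshly loaded: nothing is dirty yet

    def save(self):
        """Return the UPDATE for dirty columns only (None if clean)."""
        if not self._dirty:
            return None
        cols = sorted(self._dirty)
        sets = ", ".join("%s = ?" % c for c in cols)
        params = [getattr(self, c) for c in cols] + [self.pk]
        self._dirty = set()
        return "UPDATE %s SET %s WHERE id = ?" % (self.table, sets), params
```
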
Denormalization

Denormalization

• Django encourages normalized data, which is usually good
• But at scale you need to denormalize
    • Corollary: joins are evil
• Django makes it really easy to do joins using the ORM, so pay attention

Denormalization

• Start with a normalized database
• Selectively denormalize things as they become bottlenecks
• Denormalized counts, copied fields, etc. can be updated in signal handlers

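A sketch of keeping a denormalized count up to date in signal handlers, using a tiny hypothetical signal registry in place of `django.db.models.signals` (the `created` keyword mirrors what Django’s `post_save` passes). `notes` is the normalized data and `note_count` the denormalized copy.

```python
from collections import defaultdict

# Minimal hypothetical signal registry standing in for Django's signals.
_receivers = defaultdict(list)


def connect(signal, handler):
    _receivers[signal].append(handler)


def send(signal, **kwargs):
    for handler in _receivers[signal]:
        handler(**kwargs)


notes = []                     # normalized rows: {"id": ..., "user_id": ...}
note_count = defaultdict(int)  # denormalized column: user_id -> note count


def increment_note_count(instance, created, **kwargs):
    if created:  # edits to an existing note don't change the count
        note_count[instance["user_id"]] += 1


def decrement_note_count(instance, **kwargs):
    note_count[instance["user_id"]] -= 1


connect("post_save", increment_note_count)
connect("post_delete", decrement_note_count)


def create_note(note_id, user_id):
    instance = {"id": note_id, "user_id": user_id}
    notes.append(instance)
    send("post_save", instance=instance, created=True)


def delete_note(instance):
    notes.remove(instance)
    send("post_delete", instance=instance)
```

Reading `note_count[user_id]` replaces a `COUNT(*)` over the notes table on every page view.
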
Replication

Replication

• Typical web app is 80 to 90% reads
• Adding read capacity will get you a long way
• MySQL master-slave replication

[Diagram: master (read & write) replicating to slaves (read only)]

Replication

• Django doesn’t make it easy to use multiple database connections, but it is possible
• Some caveats
    • Slave lag interacts with caching in weird ways (e.g., a read from a lagging slave can re-populate the cache with stale data right after you invalidate it)
    • You can only save to your primary DB (the one you configure in settings.py)
        • Unless you get really clever...

Replication

1. Create a custom database wrapper by subclassing DatabaseWrapper

    from django.db.backends.mysql.base import (
        CursorWrapper, Database, DatabaseWrapper, django_conversions)

    class SlaveDatabaseWrapper(DatabaseWrapper):
        def _cursor(self, settings):
            if not self._valid_connection():
                kwargs = {
                    'conv': django_conversions,
                    'charset': 'utf8',
                    'use_unicode': True,
                }
                # Add host/user/password/etc. for a randomly chosen slave
                # (update, don't overwrite, the base options above)
                kwargs.update(pick_random_slave(settings.SLAVE_DATABASES))
                self.connection = Database.connect(**kwargs)
                ...
            cursor = CursorWrapper(self.connection.cursor())
            return cursor

Replication

2. Custom QuerySet that uses the primary DB for writes

    class MultiDBQuerySet(QuerySet):
        ...
        def update(self, **kwargs):
            # Temporarily point the query at the primary (default) connection
            slave_conn = self.query.connection
            self.query.connection = default_connection
            try:
                return super(MultiDBQuerySet, self).update(**kwargs)
            finally:
                # Restore the slave connection even if the update fails
                self.query.connection = slave_conn

Replication

3. Custom Manager that uses your custom QuerySet

    class SlaveDatabaseManager(db.models.Manager):
        def get_query_set(self):
            return MultiDBQuerySet(self.model, query=self.create_query())

        def create_query(self):
            # connection: assumed to be a SlaveDatabaseWrapper instance
            return db.models.sql.Query(self.model, connection)

Replication

Example on GitHub:
http://github.com/mmalone/django-multidb/

Replication

• Goal:
    • Read-what-you-write consistency for the writer
    • Eventual consistency for everyone else
• Slave lag screws things up

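A sketch of the goal above: writes go to the master, reads go to a random slave, and a client that just wrote is pinned to the master for a short window so it reads its own writes. `ReplicaRouter`, the connection names, and the five-second default are hypothetical; the window should exceed your typical slave lag.

```python
import random
import time


class ReplicaRouter:
    """Route reads to slaves, writes to the master, and pin recent writers."""

    def __init__(self, primary, replicas, pin_seconds=5.0,
                 clock=time.monotonic, rng=random):
        self.primary = primary
        self.replicas = list(replicas)
        self.pin_seconds = pin_seconds
        self.clock = clock
        self.rng = rng
        self._pinned_until = {}  # client id -> time when pinning ends

    def for_write(self, client_id):
        # Writing pins this client's reads to the master for a while.
        self._pinned_until[client_id] = self.clock() + self.pin_seconds
        return self.primary

    def for_read(self, client_id):
        pinned = self.clock() < self._pinned_until.get(client_id, 0)
        if pinned or not self.replicas:
            return self.primary  # read-what-you-write for the writer
        return self.rng.choice(self.replicas)  # eventual consistency
```
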
Replication

What happens when you become write saturated?

Federation

Federation

• Start with vertical partitioning: split tables that aren’t joined across database servers
    • Actually pretty easy
    • Except not with Django

Federation

django.db.models.base

FAIL!

Federation

If the Django pony gets kicked every time someone uses {% endifnotequal %}, I don’t want to know what happens every time django.db.connection is imported.

http://www.flickr.com/photos/captainmidnight/811458621/

Federation

• At some point you’ll need to split a single table across databases (e.g., the user table)
• Now auto-increment won’t work (each database hands out its own, colliding IDs)
    • But Django uses auto-increment for PKs
    • ugh
• Pluggable UUID backend?

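A sketch of the UUID idea: generate primary keys that need no shared counter, and choose a shard from the key. `SHARDS`, `new_user_id`, and `shard_for` are hypothetical; `uuid.uuid4()` values are random, so taking them modulo the shard count spreads rows roughly evenly.

```python
import uuid

SHARDS = ["users_db0", "users_db1", "users_db2", "users_db3"]  # hypothetical


def new_user_id():
    # Globally unique without coordinating an auto-increment counter.
    return uuid.uuid4()


def shard_for(user_id):
    # Deterministic: the same ID always maps to the same database.
    return SHARDS[user_id.int % len(SHARDS)]
```

`% len(SHARDS)` remaps most rows if the shard count changes, so this sketch assumes a fixed number of shards.
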
Profiling, Monitoring & Measuring

Know your SQL

    >>> Article.objects.filter(pk=3).query.as_sql()
    ('SELECT "app_article"."id", "app_article"."name", "app_article"."author_id" FROM "app_article" WHERE "app_article"."id" = %s ', (3,))

Know your SQL

    >>> import sqlparse
    >>> def pp_query(qs):
    ...     t = qs.query.as_sql()
    ...     # Naive interpolation, for display only (values aren't quoted)
    ...     sql = t[0] % t[1]
    ...     print sqlparse.format(sql, reindent=True, keyword_case='upper')
    ...
    >>> pp_query(Article.objects.filter(pk=3))
    SELECT "app_article"."id",
           "app_article"."name",
           "app_article"."author_id"
    FROM "app_article"
    WHERE "app_article"."id" = 3

Know your SQL

    >>> from django.db import connection
    >>> connection.queries
    [{'time': '0.001', 'sql': u'SELECT "app_article"."id", "app_article"."name", "app_article"."author_id" FROM "app_article"'}]

(connection.queries is only populated when DEBUG = True.)

Know your SQL

• It’d be nice if a lightweight stack trace could be captured in QuerySet.__init__
    • Stick the result in connection.queries
    • Now we know where the query originated

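A sketch of recording where each query originated, using the standard library’s `traceback.extract_stack`. `queries` stands in for `connection.queries`, and `log_query` and `recent_articles` are hypothetical; hooking this into `QuerySet.__init__`, as suggested above, is left out.

```python
import traceback

queries = []  # stand-in for connection.queries


def log_query(sql, skip=1):
    """Record the SQL plus the file, line, and function that issued it.

    Only the innermost caller frame is kept, to stay lightweight;
    `skip` drops frames belonging to the logging helpers themselves.
    """
    stack = traceback.extract_stack()[:-skip]
    origin = stack[-1]
    queries.append({
        "sql": sql,
        "origin": "%s:%d in %s" % (origin.filename, origin.lineno, origin.name),
    })


def recent_articles():
    # Hypothetical call site standing in for ORM code that runs a query.
    log_query("SELECT id, name FROM app_article")


recent_articles()
```
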
Measuring

Django Debug Toolbar
http://github.com/robhudson/django-debug-toolbar/

Monitoring

You can’t improve what you don’t measure.
• Ganglia
• Munin

Measuring & Monitoring

• Measure
    • Server load, CPU usage, I/O
    • Database QPS
    • Memcache QPS, hit rate, evictions
    • Queue lengths
    • Anything else interesting

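A sketch of collecting one of those numbers, cache hit rate, by wrapping the client; `InstrumentedCache` is hypothetical. For server-wide numbers, memcached’s `stats` command already reports `get_hits`, `get_misses`, and `evictions`.

```python
class InstrumentedCache:
    """Wrap a dict-like cache and count hits and misses for monitoring."""

    def __init__(self, backend=None):
        self.backend = {} if backend is None else backend
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.backend:
            self.hits += 1
            return self.backend[key]
        self.misses += 1
        return None

    def set(self, key, value):
        self.backend[key] = value

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Periodically pushing `hit_rate()` into a graphing tool such as Ganglia or Munin makes a sudden drop, for example after a deploy that changes cache keys, easy to spot.
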
All done... Questions?

Contact Me

Mike Malone
mjmalone@gmail.com
twitter.com/mjmalone