7. Pownce
• Large scale
• Hundreds of requests/sec
• Thousands of DB operations/sec
• Millions of user relationships
• Millions of notes
• Terabytes of static data
EuroDjangoCon 2009
8. Pownce
• Encountered and eliminated many common
scaling bottlenecks
• Real world example of scaling a Django app
• Django provides a lot for free
• I’ll be focusing on what you have to build yourself,
and the rare places where Django got in the way
10. Scalability
Scalability is NOT:
• Speed / Performance
• Generally affected by language choice
• Achieved by adopting a particular technology
11. A Scalable Application
import time

def application(environ, start_response):
    time.sleep(10)
    start_response('200 OK', [('content-type', 'text/plain')])
    return ('Hello, world!',)
12. A High Performance Application
def application(environ, start_response):
    remote_addr = environ['REMOTE_ADDR']
    f = open('access-log', 'a+')
    f.write(remote_addr + '\n')
    f.flush()
    f.seek(0)
    hits = sum(1 for l in f.xreadlines()
               if l.strip() == remote_addr)
    f.close()
    start_response('200 OK', [('content-type', 'text/plain')])
    return (str(hits),)
13. Scalability
A scalable system doesn’t need to change when the
size of the problem changes.
14. Scalability
• Accommodate increased usage
• Accommodate increased data
• Maintainable
15. Scalability
• Two kinds of scalability
• Vertical scalability: buying more powerful
hardware, replacing what you already own
• Horizontal scalability: buying additional
hardware, supplementing what you already own
16. Vertical Scalability
• Costs don’t scale linearly (a server that’s twice as
fast costs more than twice as much)
• Inherently limited by current technology
• But it’s easy! If you can get away with it, good
for you.
17. Vertical Scalability
“Skyscrapers are special. Normal
buildings don’t need 10-floor
foundations. Just build!”
- Cal Henderson
18. Horizontal Scalability
The ability to increase a system’s capacity by adding
more processing units (servers)
20. Horizontal Scalability
• A lot more work to design, build, and maintain
• Requires some planning, but you don’t have to
do all the work up front
• You can scale progressively...
• Rest of the presentation is roughly in order
22. Caching
• Several levels of caching available in Django
• Per-site cache: caches every page that doesn’t have
GET or POST parameters
• Per-view cache: caches output of an individual view
• Template fragment cache: caches fragments of a
template
• None of these are that useful if pages are
heavily personalized
23. Caching
• Low-level Cache API
• Much more flexible, allows you to cache at any
granularity
• At Pownce we typically cached
• Individual objects
• Lists of object IDs
• Hard part is invalidation
24. Caching
• Cache backends:
• Memcached
• Database caching
• Filesystem caching
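For reference, a Django 1.0-era settings.py selects among these backends with the CACHE_BACKEND string (the host, table name, and path below are placeholders, not recommendations):

```python
# settings.py -- pick exactly one (values here are placeholders)
CACHE_BACKEND = 'memcached://127.0.0.1:11211/'    # Memcached
# CACHE_BACKEND = 'db://my_cache_table'           # Database caching
# CACHE_BACKEND = 'file:///var/tmp/django_cache'  # Filesystem caching
```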
27. Sessions
Or Tokyo Cabinet
http://github.com/ericflo/django-tokyo-sessions/
Thanks @ericflo
28. Caching
Basic caching comes free with Django:
from django.core.cache import cache

class UserProfile(models.Model):
    ...
    def get_social_network_profiles(self):
        cache_key = 'networks_for_%s' % self.user.id
        profiles = cache.get(cache_key)
        if profiles is None:
            profiles = self.user.social_network_profiles.all()
            cache.set(cache_key, profiles)
        return profiles
29. Caching
Invalidate when a model is saved or deleted:
from django.core.cache import cache
from django.db.models import signals

def nuke_social_network_cache(sender, instance, **kwargs):
    cache_key = 'networks_for_%s' % instance.user_id
    cache.delete(cache_key)

signals.post_save.connect(nuke_social_network_cache,
                          sender=SocialNetworkProfile)
signals.post_delete.connect(nuke_social_network_cache,
                            sender=SocialNetworkProfile)
30. Caching
• Invalidate post_save, not pre_save
• Still a small race condition
• Simple solution, worked for Pownce:
• Instead of deleting, set the cache key to None for a
short period of time
• Instead of using set to cache objects, use add, which
fails if there’s already something stored for the key
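The two tricks above can be sketched against a dict-backed stand-in for the cache API (FakeCache, invalidate, and cache_object are purely illustrative names; against real memcached you would call cache.set and cache.add the same way):

```python
import time

class FakeCache(object):
    """Dict-backed stand-in for the memcached API (illustration only)."""
    def __init__(self):
        self._data = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        if key in self._data:
            value, expires = self._data[key]
            if time.time() < expires:
                return value
            del self._data[key]
        return None

    def set(self, key, value, timeout=300):
        self._data[key] = (value, time.time() + timeout)

    def add(self, key, value, timeout=300):
        self.get(key)  # purge the key first if it has expired
        if key in self._data:
            return False  # like memcached: add fails on a live key
        self.set(key, value, timeout)
        return True

cache = FakeCache()

def invalidate(cache_key):
    # Instead of delete(): park None at the key for a few seconds.
    cache.set(cache_key, None, timeout=5)

def cache_object(cache_key, obj):
    # Instead of set(): add() fails while the None marker is live, so
    # a reader racing with the writer can't re-cache stale data.
    return cache.add(cache_key, obj)
```

While the None marker is live, readers get a cache miss and fall through to the database, but their attempt to re-populate the key fails, which closes the race window.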
31. Advanced Caching
• Memcached’s atomic increment and decrement
operations are useful for maintaining counts
• But they’re not available in Django 1.0
• Added in 1.1 by ticket #6464
32. Advanced Caching
• You can still use them if you poke at the
internals of the cache object a bit
• cache._cache is the underlying cache object
try:
    result = cache._cache.incr(cache_key, delta)
except ValueError:  # nonexistent key raises ValueError
    # Do it the hard way, store the result.
    pass
return result
33. Advanced Caching
• Other missing cache API
• delete_multi & set_multi
• append: add data to existing key after existing data
• prepend: add data to existing key before existing
data
• cas: store this data, but only if no one has edited it
since I fetched it
34. Advanced Caching
• It’s often useful to cache objects ‘forever’ (i.e.,
until you explicitly invalidate them)
• User and UserProfile
• fetched almost every request
• rarely change
• But Django won’t let you
• IMO, this is a bug :(
35. The Memcache Backend
class CacheClass(BaseCache):
    def __init__(self, server, params):
        BaseCache.__init__(self, params)
        self._cache = memcache.Client(server.split(';'))

    def add(self, key, value, timeout=0):
        if isinstance(value, unicode):
            value = value.encode('utf-8')
        return self._cache.add(smart_str(key), value,
                               timeout or self.default_timeout)
36. The Memcache Backend
class CacheClass(BaseCache):
    def __init__(self, server, params):
        BaseCache.__init__(self, params)
        self._cache = memcache.Client(server.split(';'))

    def add(self, key, value, timeout=None):
        if isinstance(value, unicode):
            value = value.encode('utf-8')
        if timeout is None:
            timeout = self.default_timeout
        return self._cache.add(smart_str(key), value,
                               timeout)
37. Advanced Caching
• Typical setup has memcached running on web
servers
• Pownce web servers were I/O and memory
bound, not CPU bound
• Since we had some spare CPU cycles, we
compressed large objects before caching them
• The Python memcache library can do this
automatically, but the API is not exposed
38. Monkey Patching core.cache
from django.core.cache import cache
from django.utils.encoding import smart_str
import inspect as i

if 'min_compress_len' in i.getargspec(cache._cache.set)[0]:
    class CacheClass(cache.__class__):
        def set(self, key, value, timeout=None,
                min_compress_len=150000):
            if isinstance(value, unicode):
                value = value.encode('utf-8')
            if timeout is None:
                timeout = self.default_timeout
            return self._cache.set(smart_str(key), value,
                                   timeout, min_compress_len)
    cache.__class__ = CacheClass
39. Advanced Caching
• Useful tool: automagic single object cache
• Use a manager to check the cache prior to any
single object get by pk
• Invalidate assets on save and delete
• Eliminated several hundred QPS at Pownce
40. Advanced Caching
All this and more at:
http://github.com/mmalone/django-caching/
41. Advanced Caching
• Consistent hashing: hashes cached objects
in such a way that most objects map to the
same node after a node is added or removed.
http://www.flickr.com/photos/deepfrozen/2191036528/
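A minimal sketch of the idea in plain Python (HashRing is an invented name; the 100 virtual nodes per server and the MD5 hash are arbitrary choices for illustration, not anything Django or memcached prescribes):

```python
import bisect
import hashlib

class HashRing(object):
    """Minimal consistent hash ring with virtual nodes (sketch)."""
    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._keys = []   # sorted hash positions on the ring
        self._ring = {}   # hash position -> node name
        for node in nodes:
            self.add_node(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode('utf-8')).hexdigest(), 16)

    def add_node(self, node):
        # Each node occupies many positions so load spreads evenly.
        for i in range(self.replicas):
            h = self._hash('%s:%d' % (node, i))
            self._ring[h] = node
            bisect.insort(self._keys, h)

    def remove_node(self, node):
        for i in range(self.replicas):
            h = self._hash('%s:%d' % (node, i))
            del self._ring[h]
            self._keys.remove(h)

    def get_node(self, key):
        # First ring position clockwise from the key's hash.
        h = self._hash(key)
        idx = bisect.bisect(self._keys, h) % len(self._keys)
        return self._ring[self._keys[idx]]
```

The payoff: removing a node only remaps the keys that lived on that node, whereas naive `hash(key) % num_servers` remaps almost everything and empties the whole cache.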
42. Caching
Now you’ve made life easier for your DB server,
next thing to fall over: your app server.
44. Load Balancing
• Out of the box, Django uses a shared nothing
architecture
• App servers have no single point of contention
• Responsibility pushed down the stack (to DB)
• This makes scaling the app layer trivial: just add
another server
45. Load Balancing
Spread work between multiple nodes in a cluster using a
load balancer.
• Hardware or software
• Layer 7 or Layer 4
[diagram: Load Balancer → App Servers → Database]
46. Load Balancing
• Hardware load balancers
• Expensive, like $35,000 each, plus maintenance
contracts
• Need two for failover / high availability
• Software load balancers
• Cheap and easy, but more difficult to eliminate as a
single point of failure
• Lots of options: Perlbal, Pound, HAProxy, Varnish,
Nginx
47. Load Balancing
• Most of these are layer 7 proxies, and some
software balancers do cool things
• Caching
• Re-proxying
• Authentication
• URL rewriting
48. Load Balancing
A common setup for large operations is to use redundant
layer 4 hardware balancers in front of a pool of layer 7
software balancers.
[diagram: Hardware Balancers → Software Balancers → App Servers]
49. Load Balancing
• At Pownce, we used a single Perlbal balancer
• Easily handled all of our traffic (hundreds of
simultaneous connections)
• A SPOF, but we didn’t have $100,000 for black box
solutions, and weren’t worried about service
guarantees beyond three or four nines
• Plus there were some neat features that we took
advantage of
50. Perlbal Reproxying
Perlbal reproxying is a really cool, and really poorly
documented feature.
51. Perlbal Reproxying
1. Perlbal receives request
2. Perlbal forwards the request to an app server
   • App server checks auth (etc.)
   • App server returns HTTP 200 with an X-Reproxy-URL
     header set to an internal file server URL
3. File served from the file server via Perlbal
52. Perlbal Reproxying
• Completely transparent to end user
• Doesn’t keep large app server instance around
to serve file
• Users can’t access files directly (like they could
with a 302)
53. Perlbal Reproxying
Plus, it’s really easy:
def download(request, filename):
    # Check auth, do your thing
    response = HttpResponse()
    response['X-REPROXY-URL'] = '%s/%s' % (FILE_SERVER, filename)
    return response
54. Load Balancing
Best way to reduce load on your app servers: don’t
use them to do hard stuff.
56. Queuing
• A queue is simply a bucket that holds messages
until they are removed for processing by clients
• Many expensive operations can be queued and
performed asynchronously
• User experience doesn’t have to suffer
• Tell the user that you’re running the job in the
background (e.g., transcoding)
• Make it look like the job was done real-time (e.g.,
note distribution)
57. Queuing
• Lots of open source options for queuing
• Ghetto Queue (MySQL + Cron)
• this is the official name.
• Gearman
• TheSchwartz
• RabbitMQ
• Apache ActiveMQ
• ZeroMQ
58. Queuing
• Lots of fancy features: brokers, exchanges,
routing keys, bindings...
• Don’t let that crap get you down, this is really
simple stuff
• Biggest decision: persistence
• Does your queue need to be durable and
persistent, able to survive a crash?
• This requires logging to disk which slows things
down, so don’t do it unless you have to
59. Queuing
• Pownce used a simple ghetto queue built on
MySQL / cron
• Problematic if you have multiple consumers pulling
jobs from the queue
• No point in reinventing the wheel, there are
dozens of battle-tested open source queues to
choose from
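For illustration, here is roughly what the guarded-UPDATE trick looks like that makes multiple consumers safe with a database-backed queue (sqlite3 stands in for MySQL here, and the table, column, and function names are made up for this sketch):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("""
    CREATE TABLE queue (
        id INTEGER PRIMARY KEY,
        payload TEXT NOT NULL,
        claimed_by TEXT
    )
""")

def enqueue(payload):
    conn.execute("INSERT INTO queue (payload) VALUES (?)", (payload,))
    conn.commit()

def claim(worker):
    # Pick the oldest unclaimed job...
    row = conn.execute(
        "SELECT id, payload FROM queue "
        "WHERE claimed_by IS NULL ORDER BY id LIMIT 1").fetchone()
    if row is None:
        return None
    job_id, payload = row
    # ...then claim it with a guarded UPDATE: only one of two racing
    # consumers can match "claimed_by IS NULL", so only one wins.
    cur = conn.execute(
        "UPDATE queue SET claimed_by = ? "
        "WHERE id = ? AND claimed_by IS NULL", (worker, job_id))
    conn.commit()
    if cur.rowcount == 0:
        return None  # another consumer got there first; retry
    return (job_id, payload)
```

Without the `AND claimed_by IS NULL` guard, two cron-launched consumers that SELECT the same row will both process it, which is exactly the multi-consumer problem the slide mentions.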
60. Django Standalone Scripts
Consumers need to set up the Django environment:
from django.core.management import setup_environ
from mysite import settings
setup_environ(settings)
62. The Database
• Til now we’ve been talking about
• Shared nothing
• Pushing problems down the stack
• But we have to store a persistent and
consistent view of our application’s state
somewhere
• Enter, the database...
63. CAP Theorem
• Three properties of a shared-data system
• Consistency: all clients see the same data
• Availability: all clients can see some version of
the data
• Partition Tolerance: system properties hold
even when the system is partitioned & messages
are lost
• But you can only have two
64. CAP Theorem
• Big long proof... here’s my version.
• Empirically, seems to make sense.
• Eric Brewer
• Professor at University of California, Berkeley
• Co-founder and Chief Scientist of Inktomi
• Probably smarter than me
65. CAP Theorem
• The relational database systems we all use were
built with consistency as their primary goal
• But at scale our system needs to have high
availability and must be partitionable
• The RDBMS’s consistency requirements get in our
way
• Most sharding / federation schemes are kludges
that trade consistency for availability & partition
tolerance
66. The Database
• There are lots of non-relational databases
coming onto the scene
• CouchDB
• Cassandra
• Tokyo Cabinet
• But they’re not that mature, and they aren’t easy
to use with Django
67. The Database
• Django has no support for
• Non-relational databases like CouchDB
• Multiple databases (coming soon?)
• If you’re looking for a project, plz fix this.
• Only advice: don’t get too caught up in trying to
duplicate the existing ORM API
68. I Want a Pony
• Save always saves every field of a model
• Causes unnecessary contention and more data
transfer
• A better way:
• Use descriptors to determine what’s dirty
• Only update dirty fields when an object is saved
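A rough sketch of the descriptor idea (DirtyField and Note are hypothetical names for this sketch, not Django API; a real implementation would build the field list into the model metaclass):

```python
class DirtyField(object):
    """Descriptor that records which fields changed since load."""
    def __init__(self, name, default=None):
        self.name = name
        self.default = default

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name, self.default)

    def __set__(self, obj, value):
        # Only mark dirty when the value actually changes.
        if obj.__dict__.get(self.name, self.default) != value:
            obj._dirty.add(self.name)
        obj.__dict__[self.name] = value

class Note(object):
    title = DirtyField('title')
    body = DirtyField('body')

    def __init__(self, **kwargs):
        self._dirty = set()
        for name, value in kwargs.items():
            self.__dict__[name] = value  # the initial load isn't "dirty"

    def save(self):
        # Only the changed columns would go in the UPDATE statement.
        updated = sorted(self._dirty)
        self._dirty.clear()
        return updated
```

With this in place, saving a Note whose title changed issues `UPDATE ... SET title = ...` instead of rewriting every column, which is the contention win the slide is after.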
70. Denormalization
• Django encourages normalized data, which is
usually good
• But at scale you need to denormalize
• Corollary: joins are evil
• Django makes it really easy to do joins using the
ORM, so pay attention
71. Denormalization
• Start with a normalized database
• Selectively denormalize things as they become
bottlenecks
• Denormalized counts, copied fields, etc. can be
updated in signal handlers
73. Replication
• Typical web app is 80 to 90% reads
• Adding read capacity will get you a long way
• MySQL Master-Slave replication
[diagram: master (read & write) replicating to slaves (read only)]
74. Replication
• Django doesn’t make it easy to use multiple
database connections, but it is possible
• Some caveats
• Slave lag interacts with caching in weird ways
• You can only save to your primary DB (the one
you configure in settings.py)
• Unless you get really clever...
75. Replication
1. Create a custom database wrapper by subclassing DatabaseWrapper
class SlaveDatabaseWrapper(DatabaseWrapper):
    def _cursor(self, settings):
        if not self._valid_connection():
            kwargs = {
                'conv': django_conversions,
                'charset': 'utf8',
                'use_unicode': True,
            }
            kwargs.update(pick_random_slave(settings.SLAVE_DATABASES))
            self.connection = Database.connect(**kwargs)
            ...
        cursor = CursorWrapper(self.connection.cursor())
        return cursor
76. Replication
2. Custom QuerySet that uses primary DB for writes
class MultiDBQuerySet(QuerySet):
    ...
    def update(self, **kwargs):
        slave_conn = self.query.connection
        self.query.connection = default_connection
        super(MultiDBQuerySet, self).update(**kwargs)
        self.query.connection = slave_conn
77. Replication
3. Custom Manager that uses your custom QuerySet
class SlaveDatabaseManager(db.models.Manager):
    def get_query_set(self):
        return MultiDBQuerySet(self.model,
                               query=self.create_query())

    def create_query(self):
        return db.models.sql.Query(self.model, connection)
78. Replication
Example on github:
http://github.com/mmalone/django-multidb/
79. Replication
• Goal:
• Read-what-you-write consistency for writer
• Eventual consistency for everyone else
• Slave lag screws things up
80. Replication
What happens when you become
write saturated?
82. Federation
• Start with Vertical Partitioning: split tables that
aren’t joined across database servers
• Actually pretty easy
• Except not with Django
83. Federation
django.db.models.base
FAIL!
84. Federation
If the Django pony gets kicked every time someone
uses {% endifnotequal %}, I don’t want to know what
happens every time django.db.connection is imported.
http://www.flickr.com/photos/captainmidnight/811458621/
85. Federation
• At some point you’ll need to split a single table
across databases (e.g., user table)
• Now auto-increment won’t work
• But Django uses auto-increment for PKs
• ugh
• Pluggable UUID backend?
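A hedged sketch of what a UUID-keyed scheme might look like (make_pk and shard_for are invented names for this sketch; Django 1.0 offers no such hook, which is the slide's point):

```python
import uuid

def make_pk():
    # Random 128-bit key: no central auto-increment sequence needed,
    # so any shard can mint IDs independently without coordination.
    return uuid.uuid4().hex

def shard_for(pk, num_shards):
    # Deterministically route a row to a shard by its key.
    return int(pk, 16) % num_shards
```

The trade-off versus auto-increment: keys are bigger and unordered (worse for index locality), but inserts never contend on a single sequence generator.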
87. Know your SQL
>>> Article.objects.filter(pk=3).query.as_sql()
('SELECT "app_article"."id", "app_article"."name",
"app_article"."author_id" FROM "app_article" WHERE
"app_article"."id" = %s ', (3,))
88. Know your SQL
>>> import sqlparse
>>> def pp_query(qs):
...     t = qs.query.as_sql()
...     sql = t[0] % t[1]
...     print sqlparse.format(sql, reindent=True,
...                           keyword_case='upper')
...
>>> pp_query(Article.objects.filter(pk=3))
SELECT "app_article"."id",
       "app_article"."name",
       "app_article"."author_id"
FROM "app_article"
WHERE "app_article"."id" = 3
89. Know your SQL
>>> from django.db import connection
>>> connection.queries
[{'time': '0.001', 'sql': u'SELECT "app_article"."id",
"app_article"."name", "app_article"."author_id" FROM
"app_article"'}]
90. Know your SQL
• It’d be nice if a lightweight stacktrace could be
done in QuerySet.__init__
• Stick the result in connection.queries
• Now we know where the query originated
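A sketch of what that might look like outside Django (record_query and the queries list are stand-ins for the internals the slide imagines patching; the stdlib traceback module does the capture):

```python
import traceback

queries = []  # stand-in for django.db.connection.queries

def record_query(sql):
    # Keep only the last few frames: enough to see which view or
    # model method issued the query, cheap enough to leave on in dev.
    stack = traceback.format_stack(limit=6)[:-1]  # drop this frame
    queries.append({'sql': sql, 'origin': stack})

def my_view():
    # Pretend this is QuerySet.__init__ being hit from a view.
    record_query('SELECT * FROM app_article')

my_view()
```

Each entry in connection.queries would then carry not just the SQL and timing but the call site, so a slow or duplicated query can be traced back to the view that caused it.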
91. Measuring
Django Debug Toolbar
http://github.com/robhudson/django-debug-toolbar/
92. Monitoring
You can’t improve what you don’t measure.
• Ganglia
• Munin
93. Measuring & Monitoring
• Measure
• Server load, CPU usage, I/O
• Database QPS
• Memcache QPS, hit rate, evictions
• Queue lengths
• Anything else interesting