6. Pownce
• Large scale
• Hundreds of requests/sec
• Thousands of DB operations/sec
• Millions of user relationships
• Millions of notes
• Terabytes of static data
djangocon 2009 6
Thursday, September 10, 2009
7. Pownce
• Encountered and eliminated many common
scaling bottlenecks
• Real world example of scaling a Django app
• Django provides a lot for free
• I’ll be focusing on what you have to build yourself,
and the rare places where Django got in the way
9. Scalability
Scalability is NOT:
• Speed / Performance
• Generally affected by language choice
• Achieved by adopting a particular technology
10. A Scalable Application
import time

def application(environ, start_response):
    time.sleep(10)
    start_response('200 OK', [('content-type', 'text/plain')])
    return ('Hello, world!',)
11. A High Performance Application
def application(environ, start_response):
    remote_addr = environ['REMOTE_ADDR']
    f = open('access-log', 'a+')
    f.write(remote_addr + "\n")
    f.flush()
    f.seek(0)
    hits = sum(1 for l in f.xreadlines()
               if l.strip() == remote_addr)
    f.close()
    start_response('200 OK', [('content-type', 'text/plain')])
    return (str(hits),)
12. Scalability
A scalable system doesn’t need to change when the
size of the problem changes.
13. Scalability
• Accommodate increased usage
• Accommodate increased data
• Maintainable
14. Scalability
• Two kinds of scalability
• Vertical scalability: buying more powerful
hardware, replacing what you already own
• Horizontal scalability: buying additional
hardware, supplementing what you already own
15. Vertical Scalability
• Costs don’t scale linearly (a server that’s twice as
fast costs more than twice as much)
• Inherently limited by current technology
• But it’s easy! If you can get away with it, good
for you.
16. Vertical Scalability
“Skyscrapers are special. Normal
buildings don’t need 10 floor
foundations. Just build!”
- Cal Henderson
17. Horizontal Scalability
The ability to increase a system’s capacity by adding
more processing units (servers)
18. Horizontal Scalability
It’s how large apps are scaled.
19. Horizontal Scalability
• A lot more work to design, build, and maintain
• Requires some planning, but you don’t have to
do all the work up front
• You can scale progressively...
• Rest of the presentation is roughly in order
21. Caching
• Several levels of caching available in Django
• Per-site cache: caches every page that doesn’t have
GET or POST parameters
• Per-view cache: caches output of an individual view
• Template fragment cache: caches fragments of a
template
• None of these are that useful if pages are
heavily personalized
22. Caching
• Low-level Cache API
• Much more flexible, allows you to cache at any
granularity
• At Pownce we typically cached
• Individual objects
• Lists of object IDs
• Hard part is invalidation
24. Caching
Use Memcache.
25. Sessions
Use Memcache.
26. Sessions
Or Tokyo Cabinet
http://github.com/ericflo/django-tokyo-sessions/
Thanks @ericflo
27. Caching
Basic caching comes free with Django:
from django.core.cache import cache
from django.db import models

class UserProfile(models.Model):
    ...
    def get_social_network_profiles(self):
        cache_key = 'networks_for_%s' % self.user.id
        profiles = cache.get(cache_key)
        if profiles is None:
            profiles = self.user.social_network_profiles.all()
            cache.set(cache_key, profiles)
        return profiles
28. Caching
Invalidate when a model is saved or deleted:
from django.core.cache import cache
from django.db.models import signals

def nuke_social_network_cache(sender, instance, **kwargs):
    # Signal receivers get the saved or deleted object as `instance`.
    cache_key = 'networks_for_%s' % instance.user_id
    cache.delete(cache_key)

signals.post_save.connect(nuke_social_network_cache,
                          sender=SocialNetworkProfile)
signals.post_delete.connect(nuke_social_network_cache,
                            sender=SocialNetworkProfile)
djangocon 2009 28
Thursday, September 10, 2009
29. Caching
• Invalidate post_save, not pre_save
• Still a small race condition
• Simple solution, worked for Pownce:
• Instead of deleting, set the cache key to None for a
short period of time
• Instead of using set to cache objects, use add, which
fails if there’s already something stored for the key
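The two tricks above can be sketched with an in-memory stand-in for memcached. This is a minimal illustration, not Pownce's code: `FakeCache`, `invalidate`, and `cache_if_absent` are hypothetical names, and the 5 second hold time is an assumption. The only behavior borrowed from memcached is that `add` stores a value only when the key is absent.

```python
import time

class FakeCache:
    """In-memory stand-in for a memcached client (hypothetical)."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def _live(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        if entry[1] <= time.time():  # expired: behave as if absent
            del self._data[key]
            return None
        return entry

    def get(self, key):
        entry = self._live(key)
        return entry[0] if entry else None

    def set(self, key, value, timeout):
        self._data[key] = (value, time.time() + timeout)

    def add(self, key, value, timeout):
        # Like memcached's add: store only if the key is absent.
        if self._live(key) is not None:
            return False
        self.set(key, value, timeout)
        return True


cache = FakeCache()

def invalidate(key, hold_seconds=5):
    # Overwrite with None instead of deleting. The key still exists,
    # so add() from a racing reader fails until the placeholder expires.
    cache.set(key, None, hold_seconds)

def cache_if_absent(key, value, timeout=300):
    # Readers populate the cache with add(), never set().
    return cache.add(key, value, timeout)
```

A reader that loaded stale data just before the write finds the placeholder in its way, so the stale value never lands in the cache; once the placeholder expires, the next reader repopulates the key.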
30. Advanced Caching
• Memcached’s atomic increment and decrement
operations are useful for maintaining counts
• They were added to the Django cache API in
Django 1.1
31. Advanced Caching
• On older versions you can still use them if you poke
at the internals of the cache object a bit
• cache._cache is the underlying cache object
try:
    result = cache._cache.incr(cache_key, delta)
except ValueError:  # nonexistent key raises ValueError
    # Do it the hard way, store the result.
return result
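One way the "hard way" branch might be filled in, sketched against a fake client so it runs without memcached. `FakeClient`, `increment`, and `compute_count` are hypothetical names; the `ValueError` on a missing key follows the comment on the slide and is an assumption about the client in use.

```python
class FakeClient:
    """Stand-in for the underlying cache client (hypothetical)."""

    def __init__(self):
        self._data = {}

    def incr(self, key, delta=1):
        if key not in self._data:
            raise ValueError('key not found')
        self._data[key] += delta
        return self._data[key]

    def add(self, key, value):
        # Store only if absent, so a concurrent seed is not overwritten.
        if key in self._data:
            return False
        self._data[key] = value
        return True


def increment(client, key, delta, compute_count):
    try:
        return client.incr(key, delta)
    except ValueError:
        # The hard way: recompute from the source of truth
        # (e.g., a COUNT(*) query) and seed the cache with it.
        count = compute_count()
        client.add(key, count)
        return count
```

Here `compute_count` is expected to return the already up-to-date count, so the fallback stores it unchanged rather than adding `delta` again.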
32. Advanced Caching
• Other memcached operations missing from the cache API
• delete_multi & set_multi
• append: add data to existing key after existing data
• prepend: add data to existing key before existing
data
• cas: store this data, but only if no one has edited it
since I fetched it
33. Advanced Caching
• It’s often useful to cache objects ‘forever’ (i.e.,
until you explicitly invalidate them)
• User and UserProfile
• fetched almost every request
• rarely change
• But Django won’t let you
• IMO, this is a bug :(
34. The Memcache Backend
class CacheClass(BaseCache):
    def __init__(self, server, params):
        BaseCache.__init__(self, params)
        self._cache = memcache.Client(server.split(';'))

    def add(self, key, value, timeout=0):
        if isinstance(value, unicode):
            value = value.encode('utf-8')
        # Bug: a timeout of 0 ("never expire") is falsy, so it is
        # silently replaced by the default timeout.
        return self._cache.add(smart_str(key), value,
                               timeout or self.default_timeout)
35. The Memcache Backend
class CacheClass(BaseCache):
    def __init__(self, server, params):
        BaseCache.__init__(self, params)
        self._cache = memcache.Client(server.split(';'))

    def add(self, key, value, timeout=None):
        if isinstance(value, unicode):
            value = value.encode('utf-8')
        # Fix: fall back to the default only when no timeout was
        # given, so an explicit 0 can mean "forever".
        if timeout is None:
            timeout = self.default_timeout
        return self._cache.add(smart_str(key), value,
                               timeout)
36. Advanced Caching
• Typical setup has memcached running on web
servers
• Pownce web servers were I/O and memory
bound, not CPU bound
• Since we had some spare CPU cycles, we
compressed large objects before caching them
• The Python memcache library can do this
automatically, but the API is not exposed
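The idea of spending spare CPU to save cache memory can be sketched with the standard library alone. This is an illustration of the technique, not what the Python memcache library does internally: the one-byte flag format, the `encode`/`decode` names, and the threshold value are all assumptions.

```python
import pickle
import zlib

MIN_COMPRESS_LEN = 150000  # assumed threshold, in bytes

def encode(value, min_compress_len=MIN_COMPRESS_LEN):
    """Serialize a value, compressing it only when it is large."""
    payload = pickle.dumps(value)
    if len(payload) >= min_compress_len:
        return b'z' + zlib.compress(payload)  # 'z' marks compressed data
    return b'r' + payload                     # 'r' marks raw data

def decode(blob):
    """Reverse encode(), decompressing when the flag says so."""
    flag, body = blob[:1], blob[1:]
    if flag == b'z':
        body = zlib.decompress(body)
    return pickle.loads(body)
```

Small values skip compression entirely, so only the objects that actually cost memory pay the CPU price.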
37. Monkey Patching core.cache
from django.core.cache import cache
from django.utils.encoding import smart_str
import inspect as i

if 'min_compress_len' in i.getargspec(cache._cache.set)[0]:
    class CacheClass(cache.__class__):
        def set(self, key, value, timeout=None,
                min_compress_len=150000):
            if isinstance(value, unicode):
                value = value.encode('utf-8')
            if timeout is None:
                timeout = self.default_timeout
            return self._cache.set(smart_str(key), value,
                                   timeout, min_compress_len)
    cache.__class__ = CacheClass
38. Advanced Caching
• Useful tool: automagic single object cache
• Use a manager to check the cache prior to any
single object get by pk
• Invalidate assets on save and delete
• Eliminated several hundred QPS at Pownce
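The single object cache can be sketched without Django. `CachingManager` below is a hypothetical stand-in for a model manager: the cache is any dict-like object, and `fetch_from_db` is an assumed callable that loads one object by primary key.

```python
class CachingManager:
    """Hypothetical stand-in for a manager that caches get-by-pk."""

    def __init__(self, cache, fetch_from_db, prefix):
        self.cache = cache                  # any dict-like cache
        self.fetch_from_db = fetch_from_db  # callable: pk -> object
        self.prefix = prefix
        self.db_hits = 0                    # counts real database loads

    def _key(self, pk):
        return '%s:%s' % (self.prefix, pk)

    def get(self, pk):
        key = self._key(pk)
        obj = self.cache.get(key)
        if obj is None:
            # Cache miss: load from the database and remember it.
            obj = self.fetch_from_db(pk)
            self.db_hits += 1
            self.cache[key] = obj
        return obj

    def invalidate(self, pk):
        # Call this from save and delete handlers.
        self.cache.pop(self._key(pk), None)
```

Repeated lookups of the same primary key then cost one database query until the object is saved or deleted.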
39. Advanced Caching
All this and more at:
http://github.com/mmalone/django-caching/
40. Caching
Now you’ve made life easier for your DB server,
next thing to fall over: your app server.
42. Load Balancing
• Out of the box, Django uses a shared nothing
architecture
• App servers have no single point of contention
• Responsibility pushed down the stack (to DB)
• This makes scaling the app layer trivial: just add
another server
43. Load Balancing
Spread work between multiple nodes in a cluster using a load
balancer.
• Load balancer can be hardware or software
• Layer 7 or Layer 4
[Diagram: Load Balancer → App Servers → Database]
44. Load Balancing
• Hardware load balancers
• Expensive, like $35,000 each, plus maintenance
contracts
• Need two for failover / high availability
• Software load balancers
• Cheap and easy, but more difficult to eliminate as a
single point of failure
• Lots of options: Perlbal, Pound, HAProxy, Varnish,
Nginx
45. Load Balancing
• Most of these are layer 7 proxies, and some
software balancers do cool things
• Caching
• Re-proxying
• Authentication
• URL rewriting
46. Load Balancing
A common setup for large operations is to use redundant
layer 4 hardware balancers in front of a pool of layer 7
software balancers.
[Diagram: Hardware Balancers → Software Balancers → App Servers]
47. Load Balancing
• At Pownce, we used a single Perlbal balancer
• Easily handled all of our traffic (hundreds of
simultaneous connections)
• A SPOF, but we didn’t have $100,000 for black box
solutions, and weren’t worried about service
guarantees beyond three or four nines
• Plus there were some neat features that we took
advantage of
48. Perlbal Reproxying
Perlbal reproxying is a really cool and really poorly
documented feature.
49. Perlbal Reproxying
1. Perlbal receives request
2. Redirects to App Server
   a. App server checks auth (etc.)
   b. Returns HTTP 200 with X-Reproxy-URL header set to
      internal file server URL
3. File served from file server via Perlbal
50. Perlbal Reproxying
• Completely transparent to end user
• Doesn’t keep large app server instance around
to serve file
• Users can’t access files directly (like they could
with a 302)
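The app server side of reproxying is small enough to sketch as a WSGI app. The `X-Reproxy-URL` header comes from the slides; everything else is assumed for illustration, including the internal host address, the `is_authorized` check, and the demo request header it reads.

```python
INTERNAL_FILE_SERVER = 'http://10.0.0.5:8080'  # hypothetical internal host

def is_authorized(environ):
    # Hypothetical check; a real app would inspect the session or a token.
    return environ.get('HTTP_X_DEMO_USER') == 'alice'

def application(environ, start_response):
    if not is_authorized(environ):
        start_response('403 Forbidden', [('Content-Type', 'text/plain')])
        return [b'Forbidden']
    # Hand the transfer back to the balancer: it fetches this URL itself,
    # so the app server process is free as soon as it has answered.
    internal_url = INTERNAL_FILE_SERVER + environ.get('PATH_INFO', '')
    start_response('200 OK', [('X-Reproxy-URL', internal_url)])
    return [b'']
```

The response body is empty on purpose: the balancer replaces it with the file it fetches from the internal URL, and the client never sees that URL.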
54. Queuing
• A queue is simply a bucket that holds messages
until they are removed for processing by clients
• Many expensive operations can be queued and
performed asynchronously
• User experience doesn’t have to suffer
• Tell the user that you’re running the job in the
background (e.g., transcoding)
• Make it look like the job was done real-time (e.g.,
note distribution)
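The pattern above, reduced to an in-process sketch: the request handler only enqueues, and a separate consumer does the slow part. A real deployment would use one of the queue servers rather than `queue.Queue`; the function names and the uppercase "work" are placeholders.

```python
import queue
import threading

jobs = queue.Queue()
delivered = []

def distribute_note(note):
    # Stand-in for the expensive work (e.g., fanning a note out to followers).
    delivered.append(note.upper())

def worker():
    # Consumer loop: pull jobs until the None sentinel arrives.
    while True:
        note = jobs.get()
        try:
            if note is None:
                return
            distribute_note(note)
        finally:
            jobs.task_done()

def handle_request(note):
    # The request handler only enqueues, so it returns immediately.
    jobs.put(note)
    return 'accepted'
```

To run it, start `worker` in a `threading.Thread`, call `handle_request` as requests arrive, and put `None` on the queue to shut the consumer down.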
55. Queuing
• Lots of open source options for queuing
• Ghetto Queue (MySQL + Cron)
• this is the official name.
• Gearman
• TheSchwartz
• RabbitMQ
• Apache ActiveMQ
• ZeroMQ
56. Queuing
• Lots of fancy features: brokers, exchanges,
routing keys, bindings...
• Don’t let that crap get you down, this is really
simple stuff
• Biggest decision: persistence
• Does your queue need to be durable and
persistent, able to survive a crash?
• This requires logging to disk which slows things
down, so don’t do it unless you have to
57. Queuing
• Pownce used a simple ghetto queue built on
MySQL / cron
• Problematic if you have multiple consumers pulling
jobs from the queue
• No point in reinventing the wheel, there are
dozens of battle-tested open source queues to
choose from
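The multiple-consumer problem with a table-backed queue is that two consumers can select the same pending row. One common guard, not described in the talk, is a conditional UPDATE that only one consumer can win. The sketch uses SQLite in place of MySQL so it runs anywhere; the table layout and function names are assumptions.

```python
import sqlite3

def make_queue():
    conn = sqlite3.connect(':memory:')
    conn.execute(
        "CREATE TABLE jobs ("
        "id INTEGER PRIMARY KEY, payload TEXT, claimed_by TEXT)")
    return conn

def enqueue(conn, payload):
    conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    conn.commit()

def claim(conn, worker):
    """Return the payload of a job claimed by `worker`, or None."""
    row = conn.execute(
        "SELECT id, payload FROM jobs "
        "WHERE claimed_by IS NULL ORDER BY id LIMIT 1").fetchone()
    if row is None:
        return None
    # The WHERE clause re-checks claimed_by, so if another consumer
    # claimed the row after our SELECT, this update touches 0 rows.
    cur = conn.execute(
        "UPDATE jobs SET claimed_by = ? "
        "WHERE id = ? AND claimed_by IS NULL", (worker, row[0]))
    conn.commit()
    return row[1] if cur.rowcount == 1 else None
```

A consumer that loses the race simply gets `None` back and tries again on its next run.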
58. Django Standalone Scripts
Consumers need to set up the Django environment:
from django.core.management import setup_environ
from mysite import settings
setup_environ(settings)
60. The Database
• Til now we’ve been talking about
• Shared nothing
• Pushing problems down the stack
• But we have to store a persistent and
consistent view of our application’s state
somewhere
• Enter, the database...
61. CAP Theorem
• Three properties of a shared-data system
• Consistency: all clients see the same data
• Availability: all clients can see some version of
the data
• Partition Tolerance: system properties hold
even when the system is partitioned & messages
are lost
• But you can only have two
62. CAP Theorem
• Big long proof... here’s my version.
• Empirically, seems to make sense.
• Eric Brewer
• Professor at University of California, Berkeley
• Co-founder and Chief Scientist of Inktomi
• Probably smarter than me
63. CAP Theorem
• The relational database systems we all use were
built with consistency as their primary goal
• But at scale our system needs to have high
availability and must be partitionable
• The RDBMS’s consistency requirements get in our
way
• Most sharding / federation schemes are kludges
that trade consistency for availability & partition
tolerance
64. The Database
• There are lots of non-relational databases
coming onto the scene
• CouchDB
• Cassandra
• Tokyo Cabinet
• But they’re not that mature, and they aren’t easy
to use with Django
66. Denormalization
• Django encourages normalized data, which is
usually good
• But at scale you need to denormalize
• Corollary: joins are evil
• Django makes it really easy to do joins using the
ORM, so pay attention
67. Denormalization
• Start with a normalized database
• Selectively denormalize things as they become
bottlenecks
• Denormalized counts, copied fields, etc. can be
updated in signal handlers
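A denormalized count kept in sync by save and delete handlers, sketched in plain Python. The `post_save` and `post_delete` lists stand in for Django's signals, and `Author`, `Article`, and `article_count` are hypothetical models and fields.

```python
post_save = []    # receivers called after save()
post_delete = []  # receivers called after delete()

class Author:
    def __init__(self, name):
        self.name = name
        self.article_count = 0  # denormalized: avoids a COUNT(*) per view

class Article:
    def __init__(self, author):
        self.author = author
        self._saved = False

    def save(self):
        created = not self._saved
        self._saved = True
        for receiver in post_save:
            receiver(sender=Article, instance=self, created=created)

    def delete(self):
        self._saved = False
        for receiver in post_delete:
            receiver(sender=Article, instance=self)

def bump_count(sender, instance, created, **kwargs):
    if created:  # ignore re-saves of an existing article
        instance.author.article_count += 1

def drop_count(sender, instance, **kwargs):
    instance.author.article_count -= 1

post_save.append(bump_count)
post_delete.append(drop_count)
```

Checking `created` matters: without it, every edit of an existing article would inflate the count.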
69. Replication
• Typical web app is 80 to 90% reads
• Adding read capacity will get you a long way
• MySQL Master-Slave replication
[Diagram: master handles reads & writes; slaves are read only]
70. Replication
• Django doesn’t make it easy to use multiple
database connections, but it is possible
• Some caveats
• Slave lag interacts with caching in weird ways
• You can only save to your primary DB (the one
you configure in settings.py)
• Unless you get really clever...
71. Replication
1. Create a custom database wrapper by subclassing DatabaseWrapper
class SlaveDatabaseWrapper(DatabaseWrapper):
    def _cursor(self, settings):
        if not self._valid_connection():
            kwargs = {
                'conv': django_conversions,
                'charset': 'utf8',
                'use_unicode': True,
            }
            # Merge in the chosen slave's settings instead of
            # discarding the defaults above.
            kwargs.update(pick_random_slave(settings.SLAVE_DATABASES))
            self.connection = Database.connect(**kwargs)
            ...
        cursor = CursorWrapper(self.connection.cursor())
        return cursor
72. Replication
2. Custom QuerySet that uses primary DB for writes
class MultiDBQuerySet(QuerySet):
    ...
    def update(self, **kwargs):
        slave_conn = self.query.connection
        self.query.connection = default_connection
        try:
            return super(MultiDBQuerySet, self).update(**kwargs)
        finally:
            # Always switch back, even if the update fails.
            self.query.connection = slave_conn
73. Replication
3. Custom Manager that uses your custom QuerySet
class SlaveDatabaseManager(db.models.Manager):
    def get_query_set(self):
        return MultiDBQuerySet(self.model,
                               query=self.create_query())

    def create_query(self):
        return db.models.sql.Query(self.model, connection)
74. Replication
Example on github:
http://github.com/mmalone/django-multidb/
76. Replication
• Goal:
• Read-what-you-write consistency for writer
• Eventual consistency for everyone else
• Slave lag screws things up
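One way to approximate read-what-you-write consistency is to pin a user's reads to the master for a short window after they write. This is a sketch of that idea, not the approach from the talk: `PIN_SECONDS` is an assumed upper bound on slave lag, and the function names are hypothetical.

```python
import time

PIN_SECONDS = 5   # assumed upper bound on slave lag
_last_write = {}  # user_id -> timestamp of that user's latest write

def record_write(user_id, now=None):
    _last_write[user_id] = time.time() if now is None else now

def choose_db(user_id, now=None):
    """Return 'master' for recent writers, 'slave' for everyone else."""
    now = time.time() if now is None else now
    wrote_at = _last_write.get(user_id)
    if wrote_at is not None and now - wrote_at < PIN_SECONDS:
        return 'master'
    return 'slave'
```

The writer sees their own change right away, while everyone else keeps reading from the slaves and catches up once replication does.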
77. Replication
What happens when you become
write saturated?
79. Federation
• Start with Vertical Partitioning: split tables that
aren’t joined across database servers
• Actually pretty easy
• Except not with Django
81. Federation
• At some point you’ll need to split a single table
across databases (e.g., user table)
• Auto-increment PKs won’t work
• It’d be nice to have a UUIDField for PKs
• You can probably build this yourself
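The reason UUIDs help: any app server can mint a key without asking a database, and the key itself can decide which shard holds the row. A minimal sketch with the standard library; `new_pk`, `shard_for`, and the modulo placement scheme are assumptions, not part of Django.

```python
import uuid

def new_pk():
    # Random 128-bit key as 32 hex characters; no central counter needed.
    return uuid.uuid4().hex

def shard_for(pk, shards):
    # Deterministic placement: the same key always maps to the same shard.
    return shards[int(pk, 16) % len(shards)]
```

Simple modulo placement reshuffles most keys when the number of shards changes, so a lookup table or consistent hashing may be a better fit if shards are added often.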
83. Know your SQL
>>> Article.objects.filter(pk=3).query.as_sql()
('SELECT "app_article"."id", "app_article"."name",
"app_article"."author_id" FROM "app_article" WHERE
"app_article"."id" = %s ', (3,))
84. Know your SQL
>>> import sqlparse
>>> def pp_query(qs):
...     t = qs.query.as_sql()
...     sql = t[0] % t[1]
...     print sqlparse.format(sql, reindent=True,
...                           keyword_case='upper')
...
>>> pp_query(Article.objects.filter(pk=3))
SELECT "app_article"."id",
       "app_article"."name",
       "app_article"."author_id"
FROM "app_article"
WHERE "app_article"."id" = 3
85. Know your SQL
>>> from django.db import connection
>>> connection.queries
[{'time': '0.001', 'sql': u'SELECT "app_article"."id",
"app_article"."name", "app_article"."author_id" FROM
"app_article"'}]
86. Know your SQL
• It’d be nice if a lightweight stacktrace could be
done in QuerySet.__init__
• Stick the result in connection.queries
• Now we know where the query originated
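The lightweight stacktrace idea can be sketched with the standard `traceback` module. The `queries` list stands in for `connection.queries`, and `record_query` and `load_article` are hypothetical; hooking this into `QuerySet.__init__` is left out.

```python
import traceback

queries = []  # stand-in for connection.queries

def record_query(sql, depth=3):
    # Drop the last frame (this function) and keep the nearest callers.
    stack = traceback.extract_stack()[:-1][-depth:]
    origin = ['%s:%d in %s' % (f.filename, f.lineno, f.name)
              for f in stack]
    queries.append({'sql': sql, 'origin': origin})

def load_article():
    # Stand-in for view code that triggers a query.
    record_query('SELECT 1')
```

Keeping only a few frames keeps the overhead low while still pointing at the view or model method that issued the query.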