Django Con High Performance Django
Published in: Technology

Transcript

  • 1. David Cramer http://www.davidcramer.net/ http://www.ibegin.com/ High Performance Django
  • 2. Curse
    • Peak daily traffic of approx. 15m pages, 150m hits.
    • Average monthly traffic 120m pages, 6m uniques.
    • Python, MySQL, Squid, memcached, mod_python, lighty.
    • Most developers came strictly from PHP (myself included).
    • 12 web servers, 4 database servers, 2 squid caches.
  • 3. iBegin
    • Massive amounts of data, 100m+ rows.
    • Python, PHP, MySQL, mod_wsgi.
    • Small team of developers.
    • Complex database partitioning/synchronization tasks.
    • Attempting to not branch off of Django. 
  • 4. Areas of Concern
    • Database (ORM)
    • Webserver (Resources, Handling Millions of Reqs)
    • Caching (Invalidation, Cache Dump)
    • Template Rendering (Logic Separation)
    • Profiling
  • 5. Tools of the Trade
    • Webserver (Apache, Nginx, Lighttpd)
    • Object Cache (memcached)
    • Database (MySQL, PostgreSQL, …)
    • Page Cache (Squid, Nginx, Varnish)
    • Load Balancing (Nginx, Perlbal)
  • 6. How We Did It
    • “Primary” web servers serving Django using mod_python.
    • Media servers using Django on lighttpd.
    • Static served using additional instances of lighttpd.
    • Load balancers passing requests to multiple Squids.
    • Squids passing requests to multiple web servers.
  • 7. Lessons Learned
    • Don’t be afraid to experiment. You’re not limited to one.
    • mod_wsgi is a huge step forward from mod_python.
    • Serving static files using different software can help.
    • Send proper HTTP headers where they are needed.
    • Use services like S3, Akamai, Limelight, etc.
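The "send proper HTTP headers" point can be illustrated with a minimal sketch, assuming the goal is to let browsers and reverse proxies reuse a response for a fixed time. The helper name `cache_headers` and its parameters are hypothetical and not part of the talk; only the standard library is used.

```python
from email.utils import formatdate


def cache_headers(max_age, now, public=True):
    """Build headers that allow caching for `max_age` seconds.

    `now` is a Unix timestamp, passed in explicitly so the result is
    reproducible (a real view would use time.time()).
    """
    scope = "public" if public else "private"
    return {
        # Cache-Control is what HTTP/1.1 caches such as Squid or Varnish honor.
        "Cache-Control": "%s, max-age=%d" % (scope, max_age),
        # Expires is the HTTP/1.0 equivalent, kept for older caches.
        "Expires": formatdate(now + max_age, usegmt=True),
    }


headers = cache_headers(max_age=3600, now=0)
private_headers = cache_headers(max_age=60, now=0, public=False)
print(headers)
```

Marking per-user responses as `private` keeps shared proxies from serving one user's page to another while still allowing the browser to cache it.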
  • 8. Webserver Software
    • Python Scripts
      • Apache (wsgi, mod_py, fastcgi)
      • Lighttpd (fastcgi)
      • Nginx (fastcgi)
    • Reverse Proxies
      • Nginx
      • Squid
      • Varnish
    • Static Content
      • Apache
      • Lighttpd
      • Tinyhttpd
      • Nginx
    • Software Load Balancers
      • Nginx
      • Perlbal
  • 9. Database (ORM)
    • Won’t make your queries efficient. Make your own indexes.
    • select_related() can be good, as well as bad.
    • Inherited ordering (Meta: ordering) will get you.
    • Hundreds of queries on a page is never a good thing.
    • Know when to not use the ORM.
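To make the "make your own indexes" point concrete, here is a small sketch that asks the database how it plans to run a lookup before and after an index is added. It uses the standard library's `sqlite3` as a stand-in for MySQL or PostgreSQL; the table, column, and index names are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE poll (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER)"
)
conn.executemany(
    "INSERT INTO poll (name, category_id) VALUES (?, ?)",
    [("poll %d" % i, i % 10) for i in range(1000)],
)


def query_plan(sql, params=()):
    """Return the planner's description of how it will run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each row.
    return " | ".join(row[-1] for row in rows)


lookup = "SELECT name FROM poll WHERE category_id = ?"
plan_before = query_plan(lookup, (3,))  # no index yet: full table scan
conn.execute("CREATE INDEX idx_poll_category ON poll (category_id)")
plan_after = query_plan(lookup, (3,))  # planner can now use the index

print(plan_before)
print(plan_after)
```

The exact wording of the plan differs between SQLite versions, and other databases have their own `EXPLAIN` output, but the habit is the same: check the plan for the queries the ORM generates rather than assuming they are efficient.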
  • 10. Handling JOINs
      from django.contrib.auth.models import User
      from django.db import models

      class Category(models.Model):
          name = models.CharField(max_length=100)  # max_length is illustrative
          created_by = models.ForeignKey(User)

      class Poll(models.Model):
          name = models.CharField(max_length=100)  # max_length is illustrative
          category = models.ForeignKey(Category)
          created_by = models.ForeignKey(User)

      # We need to output a page listing all Polls with
      # their name and category's name.
      def a_bad_example(request):
          # We have just caused Poll to JOIN with User and Category,
          # which will also JOIN with User a second time.
          my_polls = Poll.objects.all().select_related()
          return render_to_response('polls.html', locals(), request)

      def a_good_example(request):
          # Use select_related explicitly in each case.
          my_polls = Poll.objects.all().select_related('category')
          return render_to_response('polls.html', locals(), request)
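The cost that a targeted JOIN avoids can be seen without Django at all. The following sketch, assuming a simplified two-table schema in an in-memory SQLite database, counts the SELECT statements issued when each poll's category is fetched lazily versus when a single JOIN is used. All table contents and function names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE poll (id INTEGER PRIMARY KEY, name TEXT,
                       category_id INTEGER REFERENCES category(id));
    INSERT INTO category (id, name) VALUES (1, 'Games'), (2, 'Films');
    INSERT INTO poll (name, category_id) VALUES
        ('Best RPG?', 1), ('Best shooter?', 1), ('Best sequel?', 2);
""")

selects = []  # every SELECT statement SQLite runs is recorded here


def record(sql):
    if sql.lstrip().upper().startswith("SELECT"):
        selects.append(sql)


conn.set_trace_callback(record)


def lazy_listing():
    """One query for the polls, then one more per poll for its category."""
    rows = []
    polls = conn.execute("SELECT name, category_id FROM poll ORDER BY id").fetchall()
    for name, category_id in polls:
        category = conn.execute(
            "SELECT name FROM category WHERE id = ?", (category_id,)
        ).fetchone()[0]
        rows.append((name, category))
    return rows


def joined_listing():
    """A single query that JOINs only the table the page needs."""
    return conn.execute(
        "SELECT poll.name, category.name FROM poll "
        "JOIN category ON category.id = poll.category_id ORDER BY poll.id"
    ).fetchall()


del selects[:]
lazy = lazy_listing()
lazy_queries = len(selects)

del selects[:]
joined = joined_listing()
joined_queries = len(selects)

print(lazy_queries, joined_queries)
```

Both functions return the same rows; the lazy version simply pays one extra query per poll, which is how a page ends up with hundreds of queries.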
  • 11. Template Rendering
    • Sandboxed engines are typically slower by nature.
    • Keep logic in views and template tags.
    • Be aware of performance in loops, and groupby (regroup).
    • Loaded templates can be cached to avoid disk reads.
    • Switching template engines is easy, but may not give you any worthwhile performance gain.
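The idea of caching loaded templates to avoid disk reads can be sketched with the standard library alone. Here `string.Template` stands in for a real template engine, and `load_template` and `disk_reads` are hypothetical names; the sketch only shows that the file is read once however many times it is rendered.

```python
import os
import tempfile
from functools import lru_cache
from string import Template

disk_reads = []  # records each path actually read from disk


@lru_cache(maxsize=None)
def load_template(path):
    """Read and parse a template once; later calls reuse the cached object."""
    disk_reads.append(path)
    with open(path) as handle:
        return Template(handle.read())


# Write a throwaway template file so the sketch runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("Hello, $name!")
    template_path = handle.name

rendered = [
    load_template(template_path).substitute(name=name)
    for name in ("Ada", "Guido")
]
os.remove(template_path)

print(rendered, len(disk_reads))
```

A cache like this needs to be cleared or bypassed during development, otherwise edits to template files will not show up until the process restarts.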
  • 12. Template Engines
  • 13. Using Django with Jinja
      from jinja.contrib.djangosupport import render_to_response
      from models import MyModel

      def myview(request):
          my_object_list = MyModel.objects.all()
          # Both the context and request parameters are optional.
          # If you pass request, it will execute your context processors.
          return render_to_response('template/name.html', locals(), request)

      from jinja.contrib.djangosupport import register, convert_django_filter

      def truncatechars(length=30):
          def wrapped(env, context, value):
              if len(value) > length:
                  value = value[0:length-3] + '...'
              return value
          return wrapped
      register.filter(truncatechars)

      from django.contrib.humanize.templatetags.humanize import intcomma
      register.filter(convert_django_filter(intcomma), 'intcomma')
  • 14. Caching
    • Two flavors of caching: object cache and browser cache.
    • Django provides built-in support for both.
    • Invalidation is a headache without a well-thought-out plan.
    • Caching isn’t a solution for slow-loading pages or improper indexes.
    • Use a reverse proxy in between the browser and your web servers: Squid, Varnish, Nginx, etc.
  • 15. Cache With a Plan
    • Build your pages to use proper cache headers.
    • Create a plan for object cache expiration, and invalidation.
    • For typical web apps you can serve the same cached page for both anonymous and authenticated users.
    • Contain commonly used querysets in managers for transparent caching and invalidation.
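One way to read the last point is that callers should ask a single object for a listing and never touch the cache directly. The sketch below is a framework-free illustration under that assumption: `CachedListing` is a hypothetical stand-in for a manager method, a plain dict stands in for memcached, and a Python list stands in for the database table.

```python
class CachedListing:
    """Hypothetical stand-in for a manager method backed by an object cache."""

    def __init__(self, loader, cache, key):
        self.loader = loader  # callable that runs the expensive query
        self.cache = cache    # a dict here; memcached in a real deployment
        self.key = key

    def all(self):
        items = self.cache.get(self.key)
        if items is None:  # cache miss: load once and remember the result
            items = self.loader()
            self.cache[self.key] = items
        return items

    def invalidate(self):
        # Called whenever the underlying data changes, e.g. from save().
        self.cache.pop(self.key, None)


database = ["first poll"]  # pretend table
loads = []                 # counts how often the "query" really runs


def load_polls():
    loads.append(1)
    return list(database)


polls = CachedListing(load_polls, cache={}, key="poll:all")
first = polls.all()
second = polls.all()             # served from the cache, no second load
database.append("second poll")
polls.invalidate()               # data changed, so drop the cached copy
third = polls.all()

print(first, third, len(loads))
```

Keeping the cache key and the invalidation call in one place is what makes the caching transparent: views and templates never need to know a cache exists.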
  • 16. Cache Commonly Used Items
      from django.core.cache import cache
      from django.db import models

      def my_context_processor(request):
          # We access object_list every time we use our context processors,
          # so it makes sense to cache it.
          cache_key = 'mymodel:all'
          object_list = cache.get(cache_key)
          if object_list is None:
              object_list = MyModel.objects.all()
              cache.set(cache_key, object_list)
          return {'object_list': object_list}

      # Now that we are caching the object list, we need to invalidate it.
      class MyModel(models.Model):
          name = models.CharField(max_length=100)  # max_length is illustrative

          def save(self, *args, **kwargs):
              # Save first, then refresh the cache.
              super(MyModel, self).save(*args, **kwargs)
              cache.set('mymodel:all', MyModel.objects.all())
  • 17. Profiling Code
    • Finding the bottleneck can be time consuming.
    • Tools exist to help identify common problematic areas.
      • cProfile/Profile Python modules.
      • PDB (Python Debugger)
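As a minimal sketch of the cProfile module outside any web framework, the following profiles one deliberately wasteful function and prints the most expensive entries. The function `slow_sum` is made up for the example; `cProfile.Profile.runcall` and `pstats.Stats` are standard library calls.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately wasteful function to give the profiler something to measure."""
    total = 0
    for i in range(n):
        total += sum(range(i % 100))
    return total


profiler = cProfile.Profile()
result = profiler.runcall(slow_sum, 20000)  # runs the function under the profiler

report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)  # top 5 entries by cumulative time
print(report.getvalue())
```

Sorting by cumulative time tends to surface the call path where the time goes, which is usually a better starting point than the function with the most calls.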
  • 18. Profiling Code With cProfile
      import sys
      try:
          import cProfile as profile
      except ImportError:
          import profile
      try:
          from cStringIO import StringIO
      except ImportError:
          from StringIO import StringIO
      from django.conf import settings

      class ProfilerMiddleware(object):
          def can(self, request):
              # Profile only in DEBUG, on request, and for internal IPs if any are set.
              return (settings.DEBUG and 'prof' in request.GET and
                      (not settings.INTERNAL_IPS or
                       request.META['REMOTE_ADDR'] in settings.INTERNAL_IPS))

          def process_view(self, request, callback, callback_args, callback_kwargs):
              if self.can(request):
                  self.profiler = profile.Profile()
                  args = (request,) + callback_args
                  return self.profiler.runcall(callback, *args, **callback_kwargs)

          def process_response(self, request, response):
              if self.can(request):
                  self.profiler.create_stats()
                  out = StringIO()
                  # Capture the profiler's printed report instead of sending it to stdout.
                  old_stdout, sys.stdout = sys.stdout, out
                  self.profiler.print_stats(1)
                  sys.stdout = old_stdout
                  response.content = '<pre>%s</pre>' % out.getvalue()
              return response
  • 19. http://localhost:8000/?prof
  • 20. Profiling Database Queries
      from django.conf import settings
      from django.db import connection
      try:
          from cStringIO import StringIO
      except ImportError:
          from StringIO import StringIO

      class DatabaseProfilerMiddleware(object):
          def can(self, request):
              return (settings.DEBUG and 'dbprof' in request.GET and
                      (not settings.INTERNAL_IPS or
                       request.META['REMOTE_ADDR'] in settings.INTERNAL_IPS))

          def process_response(self, request, response):
              if self.can(request):
                  out = StringIO()
                  out.write('time\tsql\n')
                  total_time = 0
                  # Slowest queries first; 'time' is a string, so compare it as a float.
                  for query in sorted(connection.queries, key=lambda x: float(x['time']), reverse=True):
                      total_time += float(query['time'])*1000
                      out.write('%s\t%s\n' % (query['time'], query['sql']))
                  response.content = '<pre style="white-space:pre-wrap">%d queries executed in %.3f seconds\n\n%s</pre>' % (len(connection.queries), total_time/1000, out.getvalue())
              return response
  • 21. http://localhost:8000/?dbprof
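The summarizing step of such a query report can be tried on its own. This sketch assumes a query log shaped like Django's `connection.queries` (a list of dicts with string `'sql'` and `'time'` values); the sample entries and the helper name `summarize_queries` are invented for illustration.

```python
def summarize_queries(queries):
    """Return (count, total_seconds, slowest-first list) for a query log."""
    # 'time' is stored as a string, so convert before sorting or summing.
    ordered = sorted(queries, key=lambda q: float(q["time"]), reverse=True)
    total = sum(float(q["time"]) for q in queries)
    return len(queries), total, ordered


# Hypothetical log entries, in the order the queries were executed.
sample_queries = [
    {"sql": "SELECT name FROM poll", "time": "0.012"},
    {"sql": "SELECT name FROM category WHERE id = 1", "time": "0.003"},
    {"sql": "SELECT username FROM auth_user", "time": "0.120"},
]

count, total_seconds, slowest_first = summarize_queries(sample_queries)
for query in slowest_first:
    print("%s\t%s" % (query["time"], query["sql"]))
print("%d queries executed in %.3f seconds" % (count, total_seconds))
```

Converting the time to a float before sorting matters: compared as strings, "10.0" would sort before "9.0".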
  • 22. Summary
    • Database efficiency is the typical problem in web apps.
    • Develop and deploy a caching plan early on.
    • Use profiling tools to find your problematic areas. Don’t pre-optimize unless there is good reason.
    • Find someone who knows more than me to configure your server software. 
  • 23. Slides and code available online at: http://www.davidcramer.net/djangocon Thanks!