High Performance Django

  • 1,671 views
Uploaded on

 

More in: Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
1,671
On Slideshare
0
From Embeds
0
Number of Embeds
0

Actions

Shares
Downloads
34
Comments
0
Likes
5

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. High Performance Django David Cramer http://www.davidcramer.net/ http://www.ibegin.com/
  • 2. Curse •  Peak daily traffic of approx. 15m pages, 150m hits. •  Average monthly traffic 120m pages, 6m uniques. •  Python, MySQL, Squid, memcached, mod_python, lighty. •  Most developers came strictly from PHP (myself included). •  12 web servers, 4 database servers, 2 squid caches.
  • 3. iBegin •  Massive amounts of data, 100m+ rows. •  Python, PHP, MySQL, mod_wsgi. •  Small team of developers. •  Complex database partitioning/synchronization tasks. •  Attempting to not branch off of Django. 
  • 4. Areas of Concern •  Database (ORM) •  Webserver (Resources, Handling Millions of Reqs) •  Caching (Invalidation, Cache Dump) •  Template Rendering (Logic Separation) •  Profiling
  • 5. Tools of the Trade •  Webserver (Apache, Nginx, Lighttpd) •  Object Cache (memcached) •  Database (MySQL, PostgreSQL, …) •  Page Cache (Squid, Nginx, Varnish) •  Load Balancing (Nginx, Perlbal)
  • 6. How We Did It •  “Primary” web servers serving Django using mod_python. •  Media servers using Django on lighttpd. •  Static served using additional instances of lighttpd. •  Load balancers passing requests to multiple Squids. •  Squids passing requests to multiple web servers.
  • 7. Lessons Learned •  Don’t be afraid to experiment. You’re not limited to a one. •  mod_wsgi is a huge step forward from mod_python. •  Serving static files using different software can help. •  Send proper HTTP headers where they are needed. •  Use services like S3, Akamai, Limelight, etc..
  • 8. Webserver Software Python Scripts Static Content •  Apache (wsgi, mod_py, •  Apache fastcgi) •  Lighttpd •  Lighttpd (fastcgi) •  Tinyhttpd •  Nginx (fastcgi) •  Nginx Reverse Proxies Software Load Balancers •  Nginx •  Nginx •  Squid •  Perlbal •  Varnish
  • 9. Database (ORM) •  Won’t make your queries efficient. Make your own indexes. •  select_related() can be good, as well as bad. •  Inherited ordering (Meta: ordering) will get you. •  Hundreds of queries on a page is never a good thing. •  Know when to not use the ORM.
  • 10. Handling JOINs class Category(models.Model): name = models.CharField() created_by = models.ForeignKey(User) class Poll(models.Model): name = models.CharField() category = models.ForeignKey(Category) created_by = models.ForeignKey(User) # We need to output a page listing all Poll's with # their name and category's name. def a_bad_example(request): # We have just caused Poll to JOIN with User and Category, # which will also JOIN with User a second time. my_polls = Poll.objects.all().select_related() return render_to_response('polls.html', locals(), request) def a_good_example(request): # Use select_related explicitly in each case. poll = Poll.objects.all().select_related('category') return render_to_response('polls.html', locals(), request)
  • 11. Template Rendering •  Sandboxed engines are typically slower by nature. •  Keep logic in views and template tags. •  Be aware of performance in loops, and groupby (regroup). •  Loaded templates can be cached to avoid disk reads. •  Switching template engines is easy, but may not give you any worthwhile performance gain.
  • 12. Template Engines
  • 13. Caching •  Two flavors of caching: object cache and browser cache. •  Django provides built-in support for both. •  Invalidation is a headache without a well thought out plan. •  Caching isn’t a solution for slow loading pages or improper indexes. •  Use a reverse proxy in between the browser and your web servers: Squid, Varnish, Nginx, etc..
  • 14. Cache With a Plan •  Build your pages to use proper cache headers. •  Create a plan for object cache expiration, and invalidation. •  For typical web apps you can serve the same cached page for both anonymous and authenticated users. •  Contain commonly used querysets in managers for transparent caching and invalidation.
  • 15. Cache Commonly Used Items def my_context_processor(request): # We access object_list every time we use our context processors so # it makes sense to cache this, no? cache_key = ‘mymodel:all’ object_list = cache.get(cache_key) if object_list is None: object_list = MyModel.objects.all() cache.set(cache_key, object_list) return {‘object_list’: object_list} # Now that we are caching the object list we are going to want to invalidate it class MyModel(models.Model): name = models.CharField() def save(self, *args, **kwargs): super(MyModel, self).save(*args, **kwargs) # save it before you update the cache cache.set(‘mymodel:all’, MyModel.objects.all())
  • 16. Profiling Code •  Finding the bottleneck can be time consuming. •  Tools exist to help identify common problematic areas. –  cProfile/Profile Python modules. –  PDB (Python Debugger)
  • 17. Profiling Code With cProfile import sys try: import cProfile as profile except ImportError: import profile try: from cStringIO import StringIO except ImportError: import StringIO from django.conf import settings class ProfilerMiddleware(object): def can(self, request): return settings.DEBUG and 'prof' in request.GET and (not settings.INTERNAL_IPS or request.META['REMOTE_ADDR'] in settings.INTERNAL_IPS) def process_view(self, request, callback, callback_args, callback_kwargs): if self.can(request): self.profiler = profile.Profile() args = (request,) + callback_args return self.profiler.runcall(callback, *args, **callback_kwargs) def process_response(self, request, response): if self.can(request): self.profiler.create_stats() out = StringIO() old_stdout, sys.stdout = sys.stdout, out self.profiler.print_stats(1) sys.stdout = old_stdout response.content = '<pre>%s</pre>' % out.getvalue() return response
  • 18. http://localhost:8000/?prof
  • 19. Profiling Database Queries from django.db import connection class DatabaseProfilerMiddleware(object): def can(self, request): return settings.DEBUG and 'dbprof' in request.GET and (not settings.INTERNAL_IPS or request.META['REMOTE_ADDR'] in settings.INTERNAL_IPS) def process_response(self, request, response): if self.can(request): out = StringIO() out.write('timetsqln') total_time = 0 for query in reversed(sorted(connection.queries, key=lambda x: x['time'])): total_time += float(query['time'])*1000 out.write('%st%sn' % (query['time'], query['sql'])) response.content = '<pre style=quot;white-space:pre-wrapquot;>%d queries executed in %.3f secondsnn%s</pre>' % (len(connection.queries), total_time/1000, out.getvalue()) return response
  • 20. http://localhost:8000/?dbprof
  • 21. Summary •  Database efficiency is the typical problem in web apps. •  Develop and deploy a caching plan early on. •  Use profiling tools to find your problematic areas. Don’t pre- optimize unless there is good reason. •  Find someone who knows more than me to configure your server software. 
  • 22. Thanks! Slides and code available online at: http://www.davidcramer.net/djangocon