Efficient Django

1,614 views

Published on

Does Django scale? How to manage traffic peaks? What happens when the database grows too big? How to find (and fix) the bottlenecks?

We will overview the basics concepts, we'll use metrics to find bottlenecks, and finally we'll see some tips and tricks to improve the scalability and the performance of a Django project.

Main topics:
- System architecture
- Database performance
- Finding bottlenecks
- Monitoring, profiling, debugging
- Query optimization
- Dealing with a slow admin
- Queues and workers
- Faster tests

Talk given at #EuroPython 2016: https://ep2016.europython.eu/conference/talks/efficient-django

Published in: Technology
0 Comments
3 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
1,614
On SlideShare
0
From Embeds
0
Number of Embeds
490
Actions
Shares
0
Downloads
12
Comments
0
Likes
3
Embeds 0
No embeds

No notes for slide

Efficient Django

  1. 1. David Arcos - @DZPMEfficient Django – #EuroPython 2016 Efficient Django
  2. 2. David Arcos - @DZPMEfficient Django – #EuroPython 2016 Abstract Tips and best practices for avoiding scalability issues and performance bottlenecks in Django ● 1) Basic concepts: the theory ● 2) Measuring: how to find bottlenecks ● 3) Tips and tricks ● 4) Conclusion (yes, it scales!)
  3. 3. David Arcos - @DZPMEfficient Django – #EuroPython 2016 Hi! ● I'm David Arcos ● Python/Django developer since 2008 ● Co-organizer at Python Barcelona ● CTO at Lead Ratings
  4. 4. David Arcos - @DZPMEfficient Django – #EuroPython 2016 ● “We improve your sales conversions, using predictive algorithms to rate the leads” ● Prediction API, “Machine Learning as a Service” ● http://lead-ratings.com
  5. 5. David Arcos - @DZPMEfficient Django – #EuroPython 2016 1) Basic concepts
  6. 6. David Arcos - @DZPMEfficient Django – #EuroPython 2016 The Pareto Principle "For many events, roughly 80% of the effects come from 20% of the causes"
  7. 7. David Arcos - @DZPMEfficient Django – #EuroPython 2016 Prioritize and focus Focus on the few tasks that will have the most impact
  8. 8. David Arcos - @DZPMEfficient Django – #EuroPython 2016 Basic scalability “Potential to be enlarged to handle a growing amount of work” ● Stateless app servers – Load balance them, scale horizontally ● Keep the state on the database(s) – This is the difficult part! Each system is different
  9. 9. David Arcos - @DZPMEfficient Django – #EuroPython 2016 Database performance ● Do less requests: – Less reads – Less writes ● Do faster requests: – Indexed fields – De-normalize
  10. 10. David Arcos - @DZPMEfficient Django – #EuroPython 2016 Templates ● Cache them ● Jinja2 is a bit faster than the default engine – but cache them anyways ● You can do fragment caching (for blocks)
  11. 11. David Arcos - @DZPMEfficient Django – #EuroPython 2016 Cache ● Generic approach: cache at each stack level ● The cache documentation is excellent ● Beware of the cache invalidation!
  12. 12. David Arcos - @DZPMEfficient Django – #EuroPython 2016 Cache ● Generic approach: cache at each stack level ● The cache documentation is excellent ● Beware of the cache invalidation!
  13. 13. David Arcos - @DZPMEfficient Django – #EuroPython 2016 Bottlenecks ● Where is your bottleneck? ● CPU bound or I/O bound? – CPU? Run heavy calculations in async workers – Memory? Compress objects before caching – Database? Read from db replicas ● How to find it?
  14. 14. David Arcos - @DZPMEfficient Django – #EuroPython 2016 2) Measuring
  15. 15. David Arcos - @DZPMEfficient Django – #EuroPython 2016 Can't improve what you don't measure ● Measure your system to find bottlenecks ● Optimize those bottlenecks ● Verify the improvements ● Rinse and repeat!
  16. 16. David Arcos - @DZPMEfficient Django – #EuroPython 2016 Monitoring ● System: load, CPU, memory... ● Database: q/s, response time, size ● Cache: q/s, hit rate ● Queue: length ● Custom: metrics for your app
  17. 17. David Arcos - @DZPMEfficient Django – #EuroPython 2016 Profiling ● The cProfile module provides profiling of Python programs by collecting data: – Number of calls, running time, time per call...
  18. 18. David Arcos - @DZPMEfficient Django – #EuroPython 2016 timeit ● The timeit module is a simple way to time execution time of small bits of Python code:
  19. 19. David Arcos - @DZPMEfficient Django – #EuroPython 2016 ipdb ● Like pdb, but for ipython – tab completion, syntax highlighting, better tracebacks, better introspection… ● Use ipdb.set_trace() to add a breakpoint and jump in with the debugger
  20. 20. David Arcos - @DZPMEfficient Django – #EuroPython 2016 django-debug-toolbar ● Display debug information about the current request/response ● Panels, very modular
  21. 21. David Arcos - @DZPMEfficient Django – #EuroPython 2016 django-debug-toolbar-line-profiler ● A toolbar panel for profiling Django Debug Panel ● Chrome extension ● For AJAX requests and non-HTML responses
  22. 22. David Arcos - @DZPMEfficient Django – #EuroPython 2016 3) Tips and tricks
  23. 23. David Arcos - @DZPMEfficient Django – #EuroPython 2016 Add db indexes ● Single (db_index) or multiple (index_together) ● Be sure to profile and measure! – Sometimes it’s not obvious (i.e., admin) – Huge difference, i.e. from 15s to 3 ms (3.5M rows) ● But: uses more space, slower writes
  24. 24. David Arcos - @DZPMEfficient Django – #EuroPython 2016 Do bulk operations ● Will greatly reduce the number of SQL queries: – Model.objects.bulk_create() – qs.update() <- maybe with F() expressions – qs.delete()
  25. 25. David Arcos - @DZPMEfficient Django – #EuroPython 2016 Get related objects ● Return FK fields in same query: – qs.select_related() ● Return M2M fields, extra query: – qs.prefetch_related()
  26. 26. David Arcos - @DZPMEfficient Django – #EuroPython 2016 Slow admin? ● Use list_select_related ● Overwrite get_queryset() with prefetch_related ● Is ordering using an index? Same for search_fields ● readonly_fields will avoid FK/M2M queries ● Use the raw_id_fields widget (or better: django-salmonella) ● Extend admin/filter.html to show filters as <select>
  27. 27. David Arcos - @DZPMEfficient Django – #EuroPython 2016 Cachalot ● Caches your Django ORM queries and automatically invalidates them
  28. 28. David Arcos - @DZPMEfficient Django – #EuroPython 2016 Queues and workers ● Do slow stuff later ● Some operations can be queued, and executed asynchronously in workers ● Use Celery
  29. 29. David Arcos - @DZPMEfficient Django – #EuroPython 2016 Cached sessions ● Use SESSION_ENGINE to set cached sessions: – Non-persistent: don’t hit the DB – Persistent: don’t hit the DB… so often
  30. 30. David Arcos - @DZPMEfficient Django – #EuroPython 2016 Persistent connections ● Use CONN_MAX_AGE to set the lifetime of a database connection (persistence)
  31. 31. David Arcos - @DZPMEfficient Django – #EuroPython 2016 UUIDs ● Use UUID for Primary Keys (instead of incremental IDs) – Guaranteed uniqueness, avoid collisions – UUIDs are well-indexed ● Easier db sharding
  32. 32. David Arcos - @DZPMEfficient Django – #EuroPython 2016 Slow tests? ● Skip migrations: --keepdb ● Run in parallel: --parallel ● Disable unused middlewares, installed_apps, password hashers, logging, etc… ● Use mocking whenever possible
  33. 33. David Arcos - @DZPMEfficient Django – #EuroPython 2016 4) Conclusions ● Measure first ● Optimize only the bottleneck ● Go for the low-hanging fruit ● Measure again
  34. 34. David Arcos - @DZPMEfficient Django – #EuroPython 2016 Good resources ● The official Django documentation ● Book: “High Performance Django” ● Blog: “Instagram Engineering” ● “Latency Numbers Every Programmer Should Know”
  35. 35. David Arcos - @DZPMEfficient Django – #EuroPython 2016 Thanks for attending! - Get the slides at http://slideshare.net/DZPM - We are looking for engineers and data scientists!

×