David Arcos - @DZPM | Efficient Django – #EuroPython 2016
Efficient Django
Abstract
Tips and best practices for avoiding scalability
issues and performance bottlenecks in Django
● 1) Basic concepts: the theory
● 2) Measuring: how to find bottlenecks
● 3) Tips and tricks
● 4) Conclusion (yes, it scales!)
Hi!
● I'm David Arcos
● Python/Django developer since 2008
● Co-organizer at Python Barcelona
● CTO at Lead Ratings
● “We improve your sales conversions, using predictive algorithms to rate the leads”
● Prediction API, “Machine Learning as a Service”
● http://lead-ratings.com
1) Basic concepts
The Pareto Principle
"For many events, roughly 80% of the effects
come from 20% of the causes"
Prioritize and focus
Focus on the few tasks that will have the most impact
Basic scalability
“Potential to be enlarged to handle a growing
amount of work”
● Stateless app servers
– Load balance them, scale horizontally
● Keep the state on the database(s)
– This is the difficult part! Each system is different
Database performance
● Make fewer queries:
– Fewer reads
– Fewer writes
● Make faster queries:
– Index the fields you filter and order by
– De-normalize to avoid expensive joins
Templates
● Cache them (cached template loader)
● Jinja2 is a bit faster than the default engine
– but cache them anyway
● Use fragment caching ({% cache %}) for expensive blocks
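A minimal settings sketch, assuming the default Django template backend: wrapping the filesystem and app-directories loaders in `django.template.loaders.cached.Loader` keeps compiled templates in memory, and the `{% cache %}` tag from the `cache` template library stores a rendered block. The fragment name `sidebar`, the 600-second timeout and the `request.user.id` vary-on key are illustrative assumptions. Recent Django versions already enable the cached loader automatically when `loaders` is not set, so the explicit form mainly matters when you customize the loader list.

```python
# settings.py (sketch): explicitly wrap the usual loaders in the cached loader.
TEMPLATES = [
    {
        "BACKEND": "django.template.backends.django.DjangoTemplates",
        "DIRS": [],
        # APP_DIRS must not be True when "loaders" is set explicitly.
        "APP_DIRS": False,
        "OPTIONS": {
            "context_processors": [
                # Puts `request` in the template context (used by the fragment below).
                "django.template.context_processors.request",
            ],
            "loaders": [
                (
                    "django.template.loaders.cached.Loader",
                    [
                        "django.template.loaders.filesystem.Loader",
                        "django.template.loaders.app_directories.Loader",
                    ],
                ),
            ],
        },
    },
]

# Fragment caching inside a template (hypothetical "sidebar" block):
# cached for 600 seconds, with one cache entry per user id.
SIDEBAR_TEMPLATE = """
{% load cache %}
{% cache 600 sidebar request.user.id %}
  ... expensive sidebar markup ...
{% endcache %}
"""
```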
Cache
● Generic approach: cache at each level of the stack
● The Django cache framework documentation is excellent
● Beware of cache invalidation!
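To make the invalidation warning concrete, here is a plain-Python sketch of the cache-aside pattern with explicit invalidation on write. It uses dicts in place of a real database and cache backend, and `ProfileStore`, `load_profile`, `save_profile` and the key format are hypothetical. With Django you would call `django.core.cache.cache.get()`, `.set()` and `.delete()` at the same points.

```python
class ProfileStore:
    """Hypothetical read-through cache in front of a slow 'database'."""

    def __init__(self):
        self.db = {}       # stands in for the real database
        self.cache = {}    # stands in for memcached/Redis
        self.db_reads = 0  # how many times we actually hit the "database"

    @staticmethod
    def _key(user_id):
        return f"profile:{user_id}"

    def load_profile(self, user_id):
        key = self._key(user_id)
        if key in self.cache:       # cache hit: no database access
            return self.cache[key]
        self.db_reads += 1          # cache miss: read from the database...
        value = self.db[user_id]
        self.cache[key] = value     # ...and populate the cache
        return value

    def save_profile(self, user_id, value):
        self.db[user_id] = value
        # Invalidate on write; without this line, readers get stale data.
        self.cache.pop(self._key(user_id), None)


store = ProfileStore()
store.save_profile(1, {"name": "Ada"})
store.load_profile(1)  # miss: reads the database
store.load_profile(1)  # hit: served from the cache
```

The hard part in a real system is finding every write path that must call the invalidation, which is exactly what the slide warns about.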
Bottlenecks
● Where is your bottleneck?
● CPU bound or I/O bound?
– CPU? Run heavy calculations in async workers
– Memory? Compress objects before caching
– Database? Read from db replicas
● How to find it?
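For the memory case in the list above (compress objects before caching), a minimal sketch using only the standard library's `pickle` and `zlib`. The `pack`/`unpack` helper names and the sample rows are hypothetical; whether compression pays off depends on how repetitive your data is, so measure both the size saved and the extra CPU time.

```python
import pickle
import zlib


def pack(value, level=6):
    """Serialize and compress a Python object before storing it in a cache."""
    return zlib.compress(pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL), level)


def unpack(blob):
    """Reverse of pack(). Only unpickle data from your own, trusted cache."""
    return pickle.loads(zlib.decompress(blob))


# Repetitive data (e.g. a list of similar rows) compresses well.
rows = [{"status": "active", "country": "ES", "score": i % 10} for i in range(1000)]
raw_size = len(pickle.dumps(rows, protocol=pickle.HIGHEST_PROTOCOL))
packed = pack(rows)
print(f"pickled: {raw_size} bytes, compressed: {len(packed)} bytes")
```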
2) Measuring
You can't improve what you don't measure
● Measure your system to find bottlenecks
● Optimize those bottlenecks
● Verify the improvements
● Rinse and repeat!
Monitoring
● System: load, CPU, memory...
● Database: q/s, response time, size
● Cache: q/s, hit rate
● Queue: length
● Custom: metrics for your app
Profiling
● The cProfile module provides profiling of Python programs by collecting data:
– Number of calls, running time, time per call...
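A minimal sketch of profiling one function with `cProfile` and summarizing the results with `pstats`; `slow_sum` is a hypothetical stand-in for the code under investigation. From the shell, `python -m cProfile -s cumulative script.py` produces a similar report for a whole script.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical hot spot: deliberately does redundant work."""
    total = 0
    for i in range(n):
        total += sum(range(i % 100))
    return total


profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(20_000)
profiler.disable()

# Sort by cumulative time and keep the 5 most expensive entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
report = stream.getvalue()
print(report)  # columns: ncalls, tottime, percall, cumtime, percall, function
```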
timeit
● The timeit module is a simple way to measure the execution time of small bits of Python code
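A small sketch comparing two ways of building a list with `timeit`; the two snippets are arbitrary examples. `timeit.repeat()` runs several rounds, and taking the minimum is the usual way to reduce noise from other processes. The shell equivalent is `python -m timeit "[i * 2 for i in range(1000)]"`.

```python
import timeit

setup = "data = range(1000)"
loop_stmt = """
out = []
for i in data:
    out.append(i * 2)
"""
comp_stmt = "out = [i * 2 for i in data]"

# 5 rounds of 1,000 executions each; keep the best (least noisy) round.
loop_best = min(timeit.repeat(loop_stmt, setup=setup, repeat=5, number=1000))
comp_best = min(timeit.repeat(comp_stmt, setup=setup, repeat=5, number=1000))

print(f"for loop:           {loop_best / 1000 * 1e6:.1f} µs per run")
print(f"list comprehension: {comp_best / 1000 * 1e6:.1f} µs per run")
```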
ipdb
● Like pdb, but with IPython features
– tab completion, syntax highlighting, better tracebacks, better introspection…
● Use ipdb.set_trace() to add a breakpoint and jump in with the debugger
django-debug-toolbar
● Display debug information about the current request/response
● Panels, very modular
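A sketch of the usual development-only setup described in the django-debug-toolbar installation docs: add the app, put its middleware as early as possible (but after any middleware that encodes the response, such as GZipMiddleware), and list the IPs allowed to see the toolbar. The toolbar's URLs must also be included in your URLconf; the exact form depends on the toolbar version, so check its docs. The other entries shown are placeholders, not a complete settings module.

```python
# settings.py (development only): a sketch, not a complete settings module.
DEBUG = True

INSTALLED_APPS = [
    "django.contrib.staticfiles",  # the toolbar serves its own static files
    # ... your other apps ...
    "debug_toolbar",
]

MIDDLEWARE = [
    # As early as possible, but after middleware that encodes the response
    # (e.g. GZipMiddleware).
    "debug_toolbar.middleware.DebugToolbarMiddleware",
    "django.middleware.common.CommonMiddleware",
    # ... the rest of your middleware ...
]

# The toolbar is only shown to requests coming from these IPs.
INTERNAL_IPS = ["127.0.0.1"]
```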
django-debug-toolbar-line-profiler
● A toolbar panel for line-by-line profiling
Django Debug Panel
● Chrome extension
● For AJAX requests and non-HTML responses
3) Tips and tricks
Add db indexes
● Single-column (db_index) or multi-column (index_together)
● Be sure to profile and measure!
– Sometimes it’s not obvious (e.g. the admin)
– Huge difference, e.g. from 15 s to 3 ms (3.5M rows)
● But: indexes use more space and make writes slower
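To see what an index changes, here is a self-contained sketch using the standard-library `sqlite3` module rather than the Django ORM: it asks SQLite for the query plan of the same lookup before and after creating an index. The `leads` table, its columns and the index name are hypothetical. In a Django model the equivalent is `db_index=True` on the field (or `index_together` / `Meta.indexes` for multi-column indexes), and on your real database you can inspect plans with `QuerySet.explain()` (Django 2.1+) or your database's own `EXPLAIN`.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query-plan descriptions for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leads (id INTEGER PRIMARY KEY, email TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO leads (email, score) VALUES (?, ?)",
    ((f"user{i}@example.com", i % 100) for i in range(10_000)),
)

lookup = "SELECT id, score FROM leads WHERE email = ?"
params = ("user42@example.com",)

before = query_plan(conn, lookup, params)  # full table scan
# Django equivalent: email = models.CharField(..., db_index=True)
conn.execute("CREATE INDEX idx_leads_email ON leads (email)")
after = query_plan(conn, lookup, params)   # index search

print("before:", before)
print("after: ", after)
```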
Do bulk operations
● Greatly reduces the number of SQL queries:
– Model.objects.bulk_create()
– qs.update() (optionally with F() expressions)
– qs.delete()
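The same idea at the SQL level, sketched with the standard-library `sqlite3` module instead of the ORM: one INSERT per row versus a single multi-row INSERT (similar to what `bulk_create()` sends), and one UPDATE per row versus a single UPDATE that computes the new value inside the database (what `qs.update(score=F("score") + 1)` does). The `leads` table is hypothetical, and statements are counted with `Connection.set_trace_callback()`.

```python
import sqlite3


def count_writes(conn, work):
    """Run work(conn) and count the INSERT/UPDATE/DELETE statements it sends."""
    statements = []
    conn.set_trace_callback(statements.append)
    try:
        work(conn)
    finally:
        conn.set_trace_callback(None)
    writes = ("INSERT", "UPDATE", "DELETE")
    return sum(1 for s in statements if s.lstrip().upper().startswith(writes))


def new_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE leads (id INTEGER PRIMARY KEY, email TEXT, score INTEGER)")
    return conn


ROWS = [(f"user{i}@example.com", 0) for i in range(100)]


def insert_one_by_one(conn):
    for email, score in ROWS:
        conn.execute("INSERT INTO leads (email, score) VALUES (?, ?)", (email, score))


def insert_in_bulk(conn):
    # One multi-row INSERT, similar in spirit to bulk_create().
    placeholders = ", ".join(["(?, ?)"] * len(ROWS))
    params = [value for row in ROWS for value in row]
    conn.execute(f"INSERT INTO leads (email, score) VALUES {placeholders}", params)


def update_one_by_one(conn):
    for lead_id, score in conn.execute("SELECT id, score FROM leads").fetchall():
        conn.execute("UPDATE leads SET score = ? WHERE id = ?", (score + 1, lead_id))


def update_in_bulk(conn):
    # Like qs.update(score=F("score") + 1): the database does the arithmetic.
    conn.execute("UPDATE leads SET score = score + 1")


slow, fast = new_db(), new_db()
slow_writes = count_writes(slow, insert_one_by_one) + count_writes(slow, update_one_by_one)
fast_writes = count_writes(fast, insert_in_bulk) + count_writes(fast, update_in_bulk)
print(slow_writes, fast_writes)  # prints: 200 2
```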
Get related objects
● Fetch FK (and one-to-one) relations in the same query, with a SQL JOIN:
– qs.select_related()
● Fetch M2M (and reverse FK) relations with one extra query per lookup:
– qs.prefetch_related()
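A standard-library sketch of what these two methods save, written against `sqlite3` rather than the ORM. The naive loop issues one query per book (the classic N+1 pattern), a JOIN fetches everything at once (what `select_related()` does), and a second `IN (...)` query (what `prefetch_related()` does) also avoids the per-row queries. The `author` and `book` tables are hypothetical; the Django equivalents would be `Book.objects.select_related("author")` and `Book.objects.prefetch_related("author")`.

```python
import sqlite3


def make_db(n_books=50, n_authors=5):
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                           author_id INTEGER REFERENCES author (id));
        """
    )
    conn.executemany("INSERT INTO author (id, name) VALUES (?, ?)",
                     [(i, f"author {i}") for i in range(n_authors)])
    conn.executemany("INSERT INTO book (title, author_id) VALUES (?, ?)",
                     [(f"book {i}", i % n_authors) for i in range(n_books)])
    conn.commit()
    return conn


def count_selects(conn, fn):
    """Run fn(conn); return (result, number of SELECT statements executed)."""
    statements = []
    conn.set_trace_callback(statements.append)
    try:
        result = fn(conn)
    finally:
        conn.set_trace_callback(None)
    return result, sum(1 for s in statements if s.lstrip().upper().startswith("SELECT"))


def naive(conn):
    # N+1: one query for the books, then one query per book for its author.
    books = conn.execute("SELECT title, author_id FROM book ORDER BY id").fetchall()
    return [
        (title, conn.execute("SELECT name FROM author WHERE id = ?", (author_id,)).fetchone()[0])
        for title, author_id in books
    ]


def joined(conn):
    # Like select_related(): a single query with a JOIN.
    return conn.execute(
        "SELECT book.title, author.name FROM book "
        "JOIN author ON author.id = book.author_id ORDER BY book.id"
    ).fetchall()


def prefetched(conn):
    # Like prefetch_related(): one query for the books, one for all their authors.
    books = conn.execute("SELECT title, author_id FROM book ORDER BY id").fetchall()
    ids = sorted({author_id for _, author_id in books})
    marks = ", ".join("?" * len(ids))
    names = dict(conn.execute(f"SELECT id, name FROM author WHERE id IN ({marks})", ids))
    return [(title, names[author_id]) for title, author_id in books]


conn = make_db()
for fn in (naive, joined, prefetched):
    rows, queries = count_selects(conn, fn)
    print(f"{fn.__name__:>10}: {queries} SELECT(s) for {len(rows)} books")
```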
Slow admin?
● Use list_select_related
● Override get_queryset() to add prefetch_related()
● Is ordering using an index? Same for search_fields
● readonly_fields avoids the FK/M2M queries needed to build select widgets
● Use the raw_id_fields widget (or better: django-salmonella)
● Extend admin/filter.html to show filters as <select>
Cachalot
● Caches your Django ORM queries and automatically invalidates them
Queues and workers
● Do slow stuff later
● Some operations can be queued and executed asynchronously in workers
● Use Celery
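The pattern behind this tip, sketched with the standard library's `queue` and `threading` instead of Celery: the request handler only enqueues a job and returns, and a background worker does the slow part. In a real deployment the queue would be a message broker and the worker a separate process; with Celery you would define the job with the `@shared_task` decorator and enqueue it with `.delay()`. `send_welcome_email` and `handle_signup` are hypothetical.

```python
import queue
import threading
import time

jobs = queue.Queue()
sent = []  # records finished jobs so the result can be checked


def send_welcome_email(address):
    """Hypothetical slow operation (e.g. a call to an external API)."""
    time.sleep(0.01)
    sent.append(address)


def worker():
    while True:
        address = jobs.get()
        if address is None:  # sentinel: stop the worker
            jobs.task_done()
            break
        try:
            send_welcome_email(address)
        finally:
            jobs.task_done()


def handle_signup(address):
    """Hypothetical request handler: enqueue the slow work and return at once."""
    jobs.put(address)
    return "201 Created"


thread = threading.Thread(target=worker, daemon=True)
thread.start()

responses = [handle_signup(f"user{i}@example.com") for i in range(5)]
jobs.join()     # wait until every queued job has been processed
jobs.put(None)  # then tell the worker to exit
thread.join()
```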
Cached sessions
● Use SESSION_ENGINE to store sessions in the cache:
– Non-persistent (cache backend): doesn’t hit the DB
– Persistent (cached_db backend): hits the DB… much less often
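A settings sketch for the two options, assuming a shared cache such as memcached is configured; the `LOCATION` address is a placeholder. With the `cache` engine, sessions are lost when the cache is flushed or evicts them. The `cached_db` engine writes through to the database and serves reads from the cache.

```python
# settings.py (sketch)
CACHES = {
    "default": {
        # Requires the pymemcache package; this backend exists since Django 3.2.
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",  # placeholder address
    }
}

# Non-persistent: sessions live only in the cache; the DB is never touched.
SESSION_ENGINE = "django.contrib.sessions.backends.cache"

# Persistent alternative: write-through cache; reads come from the cache and
# the DB is only hit on writes and cache misses.
# SESSION_ENGINE = "django.contrib.sessions.backends.cached_db"
```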
Persistent connections
● Use CONN_MAX_AGE to set the lifetime of a database connection in seconds (0 closes it after each request, None keeps it open indefinitely)
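A minimal `DATABASES` sketch; the database name, user and the 60-second lifetime are placeholders to tune for your setup. `CONN_MAX_AGE` defaults to 0, which closes the connection at the end of each request.

```python
# settings.py (sketch)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "app",  # placeholder
        "USER": "app",  # placeholder
        "HOST": "localhost",
        "PORT": "5432",
        # Reuse each connection for up to 60 seconds instead of opening a new
        # one per request (0 = close after every request, None = no limit).
        "CONN_MAX_AGE": 60,
        # Django 4.1+: check a reused connection before each request.
        "CONN_HEALTH_CHECKS": True,
    }
}
```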
UUIDs
● Use UUIDs for primary keys (instead of auto-incrementing IDs)
– Unique across tables and servers: collisions are practically impossible
– UUIDs are well-indexed
● Easier db sharding
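In Django the field itself is `models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)`. Because the application can generate the key before it touches any database, it can also use that key to decide where the row lives. The sketch below, using only the standard `uuid` module, shows the idea with a hypothetical `shard_for()` helper and a fixed count of four shards. Real sharding schemes (consistent hashing, lookup tables) are more involved.

```python
import uuid

NUM_SHARDS = 4  # hypothetical fixed number of database shards

# Django model field, for reference:
#     id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)


def new_id():
    """Generate a primary key in the app, without asking any database."""
    return uuid.uuid4()


def shard_for(pk):
    """Hypothetical routing rule: map a UUID to one of NUM_SHARDS shards."""
    return pk.int % NUM_SHARDS


pk = new_id()
print(pk, "-> shard", shard_for(pk))
```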
Slow tests?
● Reuse the test database between runs (no re-creation, no full re-migration): --keepdb
● Run in parallel: --parallel
● Disable unused middleware, INSTALLED_APPS, slow password hashers, logging, etc.
● Use mocking whenever possible
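A sketch of test-only settings, assuming a hypothetical `myproject/test_settings.py` module that starts by importing your base settings: a fast (and insecure, test-only) password hasher and logging switched off. The `--keepdb` and `--parallel` flags go on the `manage.py test` command line. Which middleware and apps you can drop depends on your project.

```python
# Test-only settings (sketch). In a real project this would live in a
# hypothetical myproject/test_settings.py that starts with:
#     from myproject.settings import *
# and would be selected with:
#     python manage.py test --settings=myproject.test_settings --keepdb --parallel
import logging

# MD5 is insecure, but much faster than the default hasher: tests only!
PASSWORD_HASHERS = ["django.contrib.auth.hashers.MD5PasswordHasher"]

# Silence log output during test runs.
logging.disable(logging.CRITICAL)
```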
4) Conclusions
● Measure first
● Optimize only the bottleneck
● Go for the low-hanging fruit
● Measure again
Good resources
● The official Django documentation
● Book: “High Performance Django”
● Blog: “Instagram Engineering”
● “Latency Numbers Every Programmer Should Know”
Thanks for attending!
- Get the slides at http://slideshare.net/DZPM
- We are looking for engineers and data scientists!
