This document discusses optimizing Django performance. It begins by describing the server, Apache, database, and Django configurations. It then discusses optimizing various aspects of a Django project: the ORM, database queries, template rendering, caching, and DjangoCMS-specific concerns. Key recommendations include using select_related and prefetch_related, batching database operations to avoid queries in loops, optimizing templates, reducing cache misses, and handling {% page_url %} tags efficiently in DjangoCMS projects. The document provides examples of inefficient code and more optimized alternatives.
30. Imaginary app (page 3)
class Order(models.Model):
    owner = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    lines = models.ManyToManyField("books.OrderLine", blank=True)
    created = models.DateField(auto_now_add=True)
    updated = models.DateField(auto_now=True)

class OrderLine(models.Model):
    book = models.ForeignKey('books.Book', related_name='order_lines', on_delete=models.CASCADE)
31. ORM & Database queries
Tips:
● Make use of `select_related` and `prefetch_related` for fetching related
objects at once.
● Make use of `only` and `defer` (typical use case: listing views), but use
them with caution.
● Use `annotate` and `aggregate` for calculations on database level.
● When fetching the ID of a related item (if you only need the ID), use
{field_name}_id instead of {field_name}.id, since the former reads the
column already on the row and does not trigger a JOIN or extra query.
● Consider using `values` or `values_list` for listing views, when possible.
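The `{field_name}_id` tip can be illustrated with the Order model from these slides. A minimal sketch (view-level code; it only runs inside a Django project with that model):

```python
# order.owner_id reads the foreign-key column already present on the
# fetched row, while order.owner.id first loads the related User
# object (an extra query unless select_related('owner') was used).
order = Order.objects.first()

owner_pk = order.owner_id    # no extra query, no JOIN
owner_pk = order.owner.id    # may trigger a second query for the User row
```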
32. Use case
Listing of books (about 2000 in total) with the following information:
● Book title
● Names of all authors of the book separated by comma
● Publisher name
● Number of pages
● Price
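Going by the tips above, the optimised queryset for this listing could look roughly like the following sketch. It assumes Book has a ForeignKey to Publisher and a ManyToManyField to Author, and it only runs inside a Django project:

```python
books = (
    Book.objects
    .select_related('publisher')        # JOIN: no per-book publisher query
    .prefetch_related('authors')        # one extra query for all authors
    .only('title', 'pages', 'price', 'publisher__name')
)
for book in books:
    author_names = ', '.join(a.name for a in book.authors.all())
```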
46. Bad example
for __i in range(50):
    author = Author.objects.create(
        salutation='Something %s' % uuid.uuid4(),
        name='Some name %s' % uuid.uuid4(),
        email='name%s@example.com' % uuid.uuid4()
    )
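The fix the notes later describe is `bulk_create`; a hedged sketch of the same 50 records inserted at once (it assumes the same Author model and only runs inside a Django project):

```python
import uuid

authors = [
    Author(
        salutation='Something %s' % uuid.uuid4(),
        name='Some name %s' % uuid.uuid4(),
        email='name%s@example.com' % uuid.uuid4(),
    )
    for __i in range(50)
]
Author.objects.bulk_create(authors)   # a single INSERT for all 50 rows
```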
59. Given the following model...
class Item(models.Model):
text = models.TextField()
page_id = models.IntegerField()
...where page_id is the corresponding ID
of the CMS page
65. Improved example
# Freeze the queryset (evaluate it once)
items = list(Item.objects.all())
# Collect all page ids
page_ids = [item.page_id for item in items]
pages = Page.objects.filter(id__in=page_ids)
pages_dict = {page.id: page for page in pages}
for item in items:
    page = pages_dict.get(item.page_id)
    # ... do something else with item
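Django also ships a shortcut for this collect-ids-then-map pattern: `in_bulk` returns a `{pk: obj}` dict in one query. An equivalent sketch (same models, Django project required):

```python
items = list(Item.objects.all())                        # evaluate once
pages_dict = Page.objects.in_bulk([i.page_id for i in items])
for item in items:
    page = pages_dict.get(item.page_id)
    # ... do something else with item
```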
67. Unnecessary cache hits
Tips:
● Analyze cache usage with django-debug-toolbar.
● Use template fragment caching to minimize the number of cache queries.
● Identify heavy parts of your templates with TemplateProfilerPanel.
Optimise them first and after that, if they are still heavy, cache them.
● Try to avoid repetitive missed cache queries.
● The well known {% page_url %} and {% page_attribute %} tags of
DjangoCMS may produce a lot of missed cache queries.
● Document cache usages. Explain in the code why you cache each
particular part. Keep track of all cache usages in a separate section of your
developer documentation.
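Template fragment caching, mentioned above, looks like this in a Django template (the fragment name, timeout and vary-on key here are illustrative):

```django
{% load cache %}
{% cache 600 book_listing request.LANGUAGE_CODE %}
    {# expensive fragment: one cache query instead of many #}
{% endcache %}
```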
68. Optimise templates
Tips:
● Clean up your templates. Remove all unused import statements.
● Use cached template loading on production.
● Avoid database queries done in the loop. Especially {% page_url %} and
{% page_attribute %} template tags of DjangoCMS.
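Cached template loading on production amounts to wrapping your loaders in `django.template.loaders.cached.Loader`. A minimal settings.py sketch (DIRS and the inner loader list are illustrative; note that APP_DIRS must be left out when 'loaders' is set explicitly):

```python
# settings.py (production) -- cache compiled templates in memory
TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [],
        'OPTIONS': {
            'loaders': [
                ('django.template.loaders.cached.Loader', [
                    'django.template.loaders.filesystem.Loader',
                    'django.template.loaders.app_directories.Loader',
                ]),
            ],
        },
    },
]
```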
69. A couple of things
● Try to reduce the number of database queries
● Use EXPLAIN when things are still slow.
● Add indexes when necessary.
● Compare the "before" and "after". Do it often.
● Test page load speed with caching on and off.
● Test the first page load and the second page load. Analyze the difference.
● Try to reduce the number of missed cache queries.
70. Tips
● Try to get rid of context processors and middleware that hit the database.
● Be careful with templatetags that query the database/cache. Especially,
when they are done in a loop.
74. Hold on, there’s a way to fix it!
# Some sort of a fake page container, to trick django-cms.
PageData = namedtuple('PageData', ['pk'])

class MyView(View):
    def get(self, request):
        # This is to trick the django-cms middleware, so that we don't
        # fetch the page from the request (not needed here).
        request._current_page_cache = PageData("_")
        context = self.get_context(request)
        return render_to_response(self.template_name, context)
77. Dealing with {% page_url %}
Avoid:
● Using {% page_url %} or {% page_attribute %} tags on IDs; pass
complete objects only.
Instead:
● Apply tricks to the querysets to fetch the page objects efficiently.
● If you list a lot of DjangoCMS Page objects (with URLs) on a page, don’t
use {% page_url %} at all; use a customized template tag which fits
your needs better.
82. Improved example (page 2)
pages_dict = {}
for _title in titles:
    if not hasattr(_title.page, 'title_cache'):
        _title.page.title_cache = {}
    _title.page.title_cache[get_language()] = _title
    pages_dict[_title.page.id] = _title.page

_pages = []
for page in pages:
    _pages.append(pages_dict.get(page.id, page))
pages = _pages
83. Improved example (page 3)
class AddonsPageUrlNoCache(PageUrl):
    """A PageUrl variant that doesn't hit the cache.

    You should not be using this everywhere; however, if you know what
    you're doing, it can save you a lot of page rendering time.
    """
    name = 'addons_page_url_no_cache'
84. Improved example (page 4)
def get_value(self, context, page_lookup, lang, site):
    site_id = get_site_id(site)
    request = context.get('request', False)
    if not request:
        return ''
    if lang is None:
        lang = get_language_from_request(request)
    page = _get_page_by_untyped_arg(page_lookup, request, site_id)
    if page:
        url = page.get_absolute_url(language=lang)
        if url:
            return url
    return ''
85. Improved example (page 5)
register.tag(AddonsPageUrlNoCache)
{% for page in pages %}
{% addons_page_url_no_cache page %}
{% endfor %}
90. Bad example
class TextPictureLink(AbstractText):
    title = models.TextField(_("Title"), null=True, blank=True)
    image = FilerImageField(null=True, blank=True)
    page_link = PageField(null=True, blank=True)

class GenericContentPlugin(TextPlugin):
    module = _('Blocks')
    render_template = 'path/to/generic_text_plugin.html'
    name = _('Generic Plugin')
    model = models.TextPictureLink
92. Good example
class GenericContentPlugin(TextPlugin):
    # ... original code

    @classmethod
    def get_render_queryset(cls):
        """Tweak the queryset."""
        return (
            cls.model._default_manager
            .all()
            .select_related('image', 'page_link')
        )
95. Apache JMeter
Highlights:
● Load/stress testing tool for analyzing/measuring the performance.
● Focused on web applications.
● Plugin based architecture.
● Supports variable parametrisation, assertions (response validation), per-
thread cookies, configuration variables.
● Comes with a large variety of reports (aggregations, graphs).
● Can replay access logs, including Apache2, Nginx, Tomcat and more.
● You can save your tests and repeat them later.
Hi,
Next to me is Artur Barseghyan, my name is Job Ganzevoort.
We’re developers at Goldmund, Wyldebeast and Wunderliebe.
We’re here to present the work we did last year to get a high-traffic e-commerce site to perform great, especially under stress.
We’ll discuss overall architecture, bottlenecks, solutions and results
Vacansoleil --- sponsoring today’s event --- is an e-commerce company selling camping holidays.
Over 500 campings throughout Europe.
The company has its head office in the Netherlands, and sales offices in 11 countries.
Domains .nl .be .fr .de .at .it .pl .dk .co.uk .ie .hu
Seasonal: peak visits in the first 2 weeks of January, Sunday evening, 21:30.
600K pageviews per day, sustained load of 2K pageviews per minute for hours.
iSeries ERP-ish system responsible for product, availability and transactions.
Revamping the architecture piece by piece, with a service bus.
New website, gradually deployed to all domains from March to September 2016.
Old website frontend would talk directly to iSeries.
New website frontend through Django workers.
AJAX heavy search mechanism.
Will new website be able to handle the load?
We had a hunch it wouldn't.
Topic of this talk is the performance of the webserver, measured in number of page views per second that the site can handle.
As a result of improving the performance, the user experience improved a lot too.
During 2016 we gradually delivered the new website to all vacansoleil domains.
In this new version, browsers communicate only with the webserver, not with the iSeries backend.
Browser requests are handled by Apache: statics and media are served by Apache directly, while dynamic requests are proxied to Django running under gunicorn. Our database of choice is PostgreSQL.
A lot of content is retrieved from the iSeries backend.
I’ll be honest: our MVP felt sluggish even without load.
We started investigating bottlenecks.
These included the items in this sheet.
Each time we eliminated one, we’d be running into a new wall.
Over time though, things improved a lot.
The order in which we’re presenting here is more outside-in than chronological.
We’re running in a virtualized environment so these parameters can be tuned.
We’re using a single VPS. Partitioning it over several servers so far hasn’t been necessary.
Processors do 5000 bogomips. Yay!
We started with 8 cores. There was a time we were running with 40 cores but better caching brought the need down to about 16.
We’re using 24 now which gives us plenty headroom.
Allocating copious amounts of RAM to different components in the system certainly helped.
OS level filesystem caching is great too.
The virtualization platform allows us to scale servers up to 48 cores, 192GB.
Our webserver is Apache.
Out of the box, the default Multi Processing Module of apache is “prefork”, which allocates a fixed number of processes to handle requests.
Allowing browsers to use keepalive blocks idle processes, severely limiting the number of concurrent users of the site.
Even under a moderate load this quickly became a problem. Setup time for an http connection would be long, waiting for processes to become available.
The event MPM uses a single thread to listen for incoming requests, delegating them to worker threads and keeping the total number of worker threads between a predefined lower and upper bound. With the values we picked, Apache was never the bottleneck during stress tests or real-life load.
We started adding logging.
With a high number of Django workers, max_connections at 100 was not enough.
Our server has more than enough RAM, so we allocated generous amounts to the database.
Large shared_buffers help keeping the database in RAM, reducing accesses to the harddrive.
We added logging to find slow DB queries, choosing 500ms as an arbitrary threshold.
This helped us identify locking issues with the django_sessions table.
As a result of this tuning, database calls were noticeably faster and especially the number of slow queries dropped.
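The PostgreSQL tuning described above boils down to a few lines in postgresql.conf; the values below are illustrative, not our production numbers:

```ini
max_connections = 300              ; the default of 100 was too low for our worker count
shared_buffers = 8GB               ; keep more of the database in RAM
log_min_duration_statement = 500   ; log queries slower than 500 ms
```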
We experimented with the number of gunicorn worker processes. If this number is too low, handling of requests can stall eventually giving timeouts.
Advice per the gunicorn docs: start with (2 x $num_cores) + 1; we found 32 to be plenty.
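That starting-point formula can live directly in a gunicorn config file; a sketch (the bind address and timeout are illustrative):

```python
# gunicorn.conf.py -- start from the docs' (2 x num_cores) + 1 heuristic,
# then tune based on measurements (we eventually settled on 32).
import multiprocessing

workers = multiprocessing.cpu_count() * 2 + 1
bind = '127.0.0.1:8000'   # illustrative; Apache proxies dynamic requests here
timeout = 30
```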
Typical usage of this site is anonymous so we tried to avoid sessions.
We only use sessions for logged-in users, these are site admins and content editors.
Storing the session in memcached instead of postgresql is faster but more volatile.
If memcached doesn’t work properly (not enough memory allocated, or reboot), editors will have to login again.
This is mostly about the search page, but the search widget is basically available on each page of the site.
So here’s how the search widget works. The yellow block on the left is the form containing widgets for destination (country/region), period (arrival/departure/num_days_earlier_later), travel group (adults/children/pets), accommodation types (mobile home/tent/etc) and other features/facilities.
The search API call returns
1: all campings and their accommodations that match the form values.
That information is a plain list containing only IDs, to keep the result set small (<20KB) and fast.
2: facet counts: the number of campings matching a particular item, restricted by the other form widgets.
These results are displayed
1: within the form (facet counts)
2: header
3: pagination bar
4: results area filled through second API call
The current page consists of 5 search results (campings with accommodations matching the form values), rendered through
searchresults API call.
We tried to make this as lean as possible too. You’re seeing 1 photo per camping here, but there’s a bxslider with about 25 photos in it. The template code to process those 5x25 photos took about 60% of the time in this call. We now only include 3 here, and load the rest on interaction with the bxslider.
So, when searching, we’re actually looking for available accommodations that match the selected criteria.
Some data is rather static; for instance: campings and accommodations usually stay within the same country. The presence of a swimming pool also doesn’t change multiple times per day. Availability however does, because someone might have just booked the last remaining mobile home at a particular camping.
We fetch all static searchfacts pre-emptively. Daily sync. From this, we create AccoFilterSets, for example a set for all accommodations that have airconditioning.
On performing a search, we query the iSeries backend for raw availability data including only the date parameters. This yields a set of available accommodations that is then intersected with the AccoFilterSets for the other selected features to get the final search result set.
In that same pass the facet counting is performed: for each possible parameter value the number of campings/accommodations you’d get if that parameter was selected, in relation to the other selected parameters.
So why not elasticsearch (or solr, or whathaveyou)?
1: actually, the code isn’t too complicated (about 200 lines) and fast: after getting the raw availability set, all filtering and faceting is done in less than 20 milliseconds.
2: modeling this, especially the faceted search, in elasticsearch wasn’t easy. A complicating aspect is that we’re filtering on accommodations, but we’re showing the number of campings that have an accommodation with the required features.
The webserver is making a lot of calls to the iSeries backend.
One of the reasons we implemented search as I explained on the previous slide is that the iSeries request for raw availability is decoupled from the other search criteria, increasing the chances that the results are reusable. So we cache these, using Memcached.
A user interacting with the widget, for example selecting a country/region, mobilehome with airconditioning, gets the facet counts updated rather quickly because the iSeries backend is queried only once. A second user, searching with the same arrival/departure dates will reuse the cached availability information too.
Maximum cache age is a setting per API call type. Availability caching is rather short, like 10 minutes to an hour.
For prices, the iSeries allows querying a batch of accommodations at once, and for rendering the search results we do. Pagination shows 5 campings at once, each with a handful of accommodations. In this case, we cache the individual accommodation prices and not the entire batch.
This wasn’t “just add varnish, and poof…”, we had to adapt the site to be varnish friendly.
We also made varnish quite aggressive.
An early observation was that we have many anonymous, few authorized visitors.
We took extra steps to separate these authorized visitors, who are site admins and content editors.
These use a separate URL which is uncached, limited to local network (and VPN) only.
The public site URL is available only for anonymous users, and cacheable.
On GET requests, varnish simply strips all cookies, on the response too.
Most GET query parameters (for example google analytics campaign parameters) are handled in javascript, so removed from the query. Django doesn’t need to see those.
Structuring pages / AJAX calls to be as independent / cacheable as possible.
Shopping cart (suitcase icon, top right).
CSRF token.
Overall 40% hitrate.
Highest on (cms) pages which is great, because those are expensive.
Lower on search API because wide variety in possible search parameters. But those are pretty fast due to reuse of partial search results.
We’re halfway through our presentation now, and I’d like to summarise the approach so far.
We’ve looked at the full stack: server, database, apache configuration.
Varnish reverse proxy caching, for which we adapted the site to make varnish really effective.
We designed the search APIs to be simple, fast, cacheable.
Smart search mechanism.
Reusing queries to the iSeries backend.
But there’s more
Artur will take you on a tour of Django and DjangoCMS-specific optimizations, focussing on the ORM.
Django optimisations
Optimisations take time.
It may seem endless.
And in most of the cases you won’t reach the limits of “no optimisations further possible”.
But, there are a couple of key factors to consider.
But how to measure performance?
Profiling tools. Django has some.
django-debug-toolbar
Just make sure you don’t accidentally enable it on staging or production.
Surprised?
Just joking
django-debug-toolbar-template-profiler
OK, another one. One of the best add-ons ever.
Just pip install it, add the app to INSTALLED_APPS and the extra panel to DEBUG_TOOLBAR_PANELS.
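The pip install plus the two settings changes look roughly like this (package and panel paths per django-debug-toolbar-template-profiler's docs; verify them against the version you install):

```python
# settings.py sketch for django-debug-toolbar-template-profiler
INSTALLED_APPS = [
    # ... your apps ...
    'debug_toolbar',
    'template_profiler_panel',
]

DEBUG_TOOLBAR_PANELS = [
    # ... the default debug-toolbar panels ...
    'template_profiler_panel.panels.template.TemplateProfilerPanel',
]
```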
I discovered it about 2 years ago.
Be careful, though. It affects the page load speed drastically.
Before we move on
django-debug-toolbar is an excellent tool, but page rendering timings lie.
django-debug-toolbar needs to render a lot of HTML as well.
The more database/cache/etc information to show, the longer page loads you get.
As I mentioned, some panels affect the page load speed drastically.
Don’t be scared, though. In production (or with DEBUG set to False) things aren’t that slow.
However, it’s not an excuse to avoid optimisation.
When you’re outside the request/response cycle, use logging to get an idea of what’s happening to your system.
Let’s start with imaginary app...
Just a couple of models. And views.
They aren’t related to Vacansoleil in any way.
Still, they are based on issues and use cases I experienced there.
Imaginary app (page 1)
Publisher, Author
Imaginary app (page 2)
Book model with many-to-many relation to Author and foreign key relation to Publisher.
Imaginary app (page 3)
Do I need this?
ORM & Database queries
To name a couple of tips…
Make use of `select_related` and `prefetch_related` for fetching related objects at once.
Make use of `only` and `defer` (typical use case: listing views), but use them with caution.
Use `annotate` and `aggregate` for calculations on database level.
When fetching the ID of a related item (if you only need the ID), use {field_name}_id instead of {field_name}.id, since the former reads the column already on the row and does not trigger a JOIN or extra query.
Consider using `values` or `values_list` for listing views, when possible.
Use case
Listing of books (about 2000 in total) with the following information:
Book title
Names of all authors of the book separated by comma
Publisher name
Number of pages
Price
Our sample use case - just have a simple book listing
Show title
Show names of all authors of the book (remember, many-to-many relation) separated by comma
Show publisher name (remember, foreign key relation)
Show number of pages (same model/table)
Show price (same model/table)
Bad example
A bad way of fetching information.
Results
4023 queries took 730 ms, page rendered in 23932 ms
As a result, about 4000 queries, slow page loads.
Sad, isn’t it?
Let’s improve
Let’s improve…
We will use:
select_related for dealing with foreign key relations.
prefetch_related for dealing with many-to-many relations.
only for fetching desired columns only.
Good example
Just putting things together.
Results
2 queries took 12 ms, page rendered in 2104 ms
Only 2 database hits, much faster page load.
Surprised?
Even better example
This is how it would look.
Note that no prefetch_related, select_related or only is needed here.
`values` will take care of it.
Results
1 query took 13 ms, page rendered in 568 ms
One query left, page rendered 4 times faster than in improved example.
Nice, isn’t it?
But what is GroupConcat?
Some of you might ask…
But what is a GroupConcat?
Shall I go 2 pages back? OK.
Custom aggregation functions
Django makes it very easy to write custom aggregation functions.
There are a lot of bundled ones, such as Sum, Avg, Min, Max, etc.
If you use Postgres, make sure to check a dedicated contrib package (contrib/postgres/aggregates).
In case of Postgres you should probably be using “StringAgg”.
But for now - it’s just our tiny custom aggregation function, which saved us a lot of page rendering time.
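The GroupConcat code itself isn't in this excerpt; a hedged sketch of such a custom aggregate (MySQL/SQLite-style GROUP_CONCAT; on Postgres use StringAgg as noted above; runs only inside a Django project):

```python
from django.db.models import Aggregate, CharField

class GroupConcat(Aggregate):
    """GROUP_CONCAT aggregate for backends that support it."""
    function = 'GROUP_CONCAT'
    template = '%(function)s(%(distinct)s%(expressions)s)'

    def __init__(self, expression, distinct=False, **extra):
        super().__init__(
            expression,
            distinct='DISTINCT ' if distinct else '',
            output_field=CharField(),
            **extra,
        )

# Illustrative usage on the book listing:
# Book.objects.values('title', 'publisher__name', 'pages', 'price') \
#     .annotate(author_names=GroupConcat('authors__name'))
```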
Surprised?
Avoid database hits in the loop!
I’m gonna repeat it more than once today.
Use case
Our next example.
Create 50 records.
Bad example
A bad way to do it.
Results
98 queries took 40 ms, page rendered in 652 ms
Sad, isn’t it?
Let’s improve
Let’s improve!
Use bulk_create.
Good example
Here 50 objects are inserted at once.
Of course you should be using factories, but it’s out of the scope of this presentation!
Results
2 queries took 1.73 ms, page rendered in 60 ms
That’s many times faster!
Nice, isn’t it?
Surprised?
If you use django-debug-toolbar already...
Those of you who worked with django-debug-toolbar may want to say...
I know it, django-debug-toolbar is great...
“What the hell is this guy talking about? It’s a well known thing!”
“Say something new! We use DDT for at least 5 years already”.
While others would say:
“Yeah, it’s nice, but it doesn’t handle the AJAX requests and no JSON responses at all...”
However...
django-debug-toolbar-force
It’s possible! With django-debug-toolbar-force.
A single pip install, tiny additions to your configuration (MIDDLEWARE_CLASSES or MIDDLEWARE) and…
You can debug AJAX views and JSON responses just the same way, almost.
Just add ?debug-toolbar to a URL of any AJAX or JSON view.
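The "tiny additions" to the configuration look roughly like this (the middleware path is taken from the django-debug-toolbar-force docs; verify it for the version you install):

```python
# settings.py sketch for django-debug-toolbar-force
MIDDLEWARE = [
    # ... your existing middleware, including debug_toolbar's ...
    'debug_toolbar_force.middleware.ForceDebugToolbarMiddleware',
]
```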
Surprised?
You didn’t know about it, did you? :)
Hey, enough with these faces!
You may think, “hey, stop showing stupid pictures”.
Fine. Let’s move on...
OK, let’s go on.
One important thing!
One important thing
As I already mentioned,
I’ll be saying it more than once today...
Avoid querying in the loop!
Don’t make queries in the loop.
No matter what kind of queries, database, cache, elastic, whatever…
Just don’t.
Move all the queries outside the loop.
Fetch at once and reuse the result set later (in the loop).
Insert/update/delete in a batch.
Given the following model...
Another nasty example.
And the page_id is the corresponding ID of the CMS page.
But WHY???
You may think...
“Who writes such things?”
or
“No, it never happens in practice”
Because it happens and you have to deal with it!
But it does.
And you have to deal with it.
You want an example?
A serious example of a Django app where such things happen?
OK, you asked for it - DjangoCMS!
Pretty serious, isn’t it?
Surprised?
Did you know about it?
Surprised?
Bad example
OK, let’s see how we can deal with that.
A bad way of doing that would be… doing that in a loop!
Imagine, you have 100 items...
Results
101 database queries
Sad, isn’t it?
Improved example
Let’s improve!
We can’t use select_related or prefetch_related here.
But we can grab data at once, and do necessary joins in Python.
Results
No additional database queries and no missed cache hits.
A little bit of code, but it’s worth it, right?
Surprised?
Unnecessary cache hits
Now some things about the cache.
And I’m gonna talk about Django cache (Memcached, Locmem).
Don’t mix it with Varnish.
Django's cache can dramatically improve your site performance, but misuse can lead to slow-downs.
First of all, analyse.
Use django-debug-toolbar to track down what the cause is.
In many cases it would be possible to refactor the slow part.
Use template fragment caching when possible.
If you have more than 500 repetitive cache hits on a page, something isn’t right.
Identify heavy parts of your template with TemplateProfilerPanel.
Try to optimise heavy parts and after that,
if it can be done, cache those heavy parts (for at least 10 minutes).
Document in the code why you cache certain parts.
Keep track of all cache usages in a separate section of your developer documentation.
Optimise templates
Clean up your templates.
Remove unused imports.
Try to avoid queries in the loop.
If you work with DjangoCMS, beware that any {% page_url %} or {% page_attribute %} tag will make a lot of queries in the loop.
A couple of things
Reduced number of queries often leads to performance boost…
...still, reduced number of queries does not necessarily mean that your database is used efficiently.
Use EXPLAIN statements (provided by the django-debug-toolbar) to identify the heavy parts.
Fine-tune your queries.
Add indexes when necessary.
In development, run two instances of the project.
One with the change, another without. Always compare the "before" and "after" optimisation.
Test page load speed with caching on and off.
Test the first page load and the second page load.
Tips
Avoid context processors and middleware that make database (or cache) hits.
If you need to have some data in templates, use templatetags on specific templates to fetch it.
In any case, if you absolutely have to use one, make sure it's lazy (unless that is impossible by design).
DjangoCMS specific tricks
Some tricks specific to DjangoCMS.
DjangoCMS is fine...
One thing you should know is that
Once you have integrated DjangoCMS in your site
all your vanilla, non-CMS related views will become heavier.
There will be additional checks and queries fired by CMS middleware and CMS context processors.
It may seem weird and irrelevant, but that’s how it is.
Scary?
Sad isn’t it?
Hold on, there’s a way to fix it!
But wait, there’s a fix for it!
Now your non-CMS views are clean
...or almost clean.
And if they aren’t yet, try to find out how to optimise that.
Surprised?
Dealing with {% page_url %}
Try to avoid:
Usage of {% page_url %} or {% page_attribute %} tags on IDs.
Instead, pass complete objects only.
Apply some tricks to the querysets to fetch the page objects efficiently.
If you list a lot of DjangoCMS Page objects (with URLs) on a page, don’t use {% page_url %} at all.
Instead, use a customised template tag which fits your needs better.
Use case
Or next use case.
A sitemap with about 200 CMS pages.
Bad example
A bad example.
Results
2000 database queries and 3000 missed cache hits
Produces way too many database queries and missed cache hits.
Sad, isn’t it?
Improved example (page 1)
The code I’m going to show isn’t that simple and obvious.
It’s what worked well in the project I worked on.
You should not be using this everywhere.
However if you know what you're doing, it can save you a lot of page rendering time.
You might want to modify it to fit your needs better.
The slides will be available on SlideShare, so you won’t miss the clue.
For now, let’s move on.
Improved example (page 2)
Sorry, it didn’t fit on one page.
Improved example (page 3)
It didn’t fit even on two pages...
Improved example (page 4)
Almost there...
Improved example (page 5)
Yes. That’s it.
Results
As a result - no additional database queries and no missed cache hits.
That’s a lot of custom code, but it’s worth it.
Surprised?
Surprised?
Use get_render_queryset on CMS plugins
The CMS plugins.
Honestly, there are parts (container plugins), which I didn’t optimise due to lack of time.
Maybe I will, one day.
But, let’s talk about parts that are possible to optimise.
Each plugin has a class method get_render_queryset.
Use it to fetch objects in a more efficient way.
Use case
Our use case, a simple plugin, which has 2 foreign key relations.
Bad example
A bad example.
Results
As a result, 3 database queries for rendering of a single plugin.
And how many plugins do you have on a page?
Good example
Let’s improve!
select_related is our friend!
Results
Yep, just 1 query for rendering of a single plugin.
Surprised?
Load testing
OK, the very last part
and I’m afraid I won’t have much time for it.
The load testing.
Apache JMeter
After some research, we picked Apache JMeter, for a number of reasons.
Load/stress testing tool for analyzing/measuring the performance.
Focused on web applications.
Plugin based architecture.
Supports variable parametrisation, assertions (response validation), per-thread cookies, configuration variables.
Comes with a large variety of reports (aggregations, graphs).
Can replay access logs, including Apache2, Nginx, Tomcat and more.
You can save your tests and repeat them later.
We measured “before”...
Before the changes (optimisations) were deployed, we measured the performance.
We measured “after”...
We measured it again after deploying the performance optimisation changes.
And “after” was way better!