-
1.
Advanced Django ORM
techniques
Daniel Roseman http://blog.roseman.org.uk
-
2.
About Me
• Python user for five years
• Discovered Django four years ago
• Worked full-time with Python/Django since
2008.
• Top Django answerer on StackOverflow!
• Occasionally blog on Django, concentrating
on efficient use of the ORM.
-
3.
Contents
• Behind the scenes: models and fields
• How model relationships work
• More efficient relationships
• Other optimising techniques
-
4.
Django ORM
efficiency: a story
-
5.
414 queries!
-
6.
How can you stop this
happening to you?
http://www.flickr.com/photos/m0n0/4479450696
-
7.
Behind the scenes:
models and fields
http://www.flickr.com/photos/spacesuitcatalyst/847530840
-
8.
Defining a model
• Model structure initialised via metaclass
• Called when model is first defined
• Resulting model class stored in cache to
use when instantiated
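A toy sketch of the idea in plain Python, with no Django involved. The metaclass runs once, when the class statement executes. It pulls the field declarations out of the class body into a `_meta`-like object and records the finished class in a module-level cache. `Field`, `Options`, `ToyModelBase` and `registry` are hypothetical stand-ins, far simpler than Django's real `ModelBase`.

```python
class Field:
    """Hypothetical minimal field: remembers its attribute name once attached."""
    def __init__(self, max_length=None):
        self.max_length = max_length
        self.name = None


class Options:
    """Stand-in for Model._meta: holds the collected fields."""
    def __init__(self, fields):
        self.fields = fields

    def get_field(self, name):
        for field in self.fields:
            if field.name == name:
                return field
        raise KeyError(name)


registry = {}  # hypothetical class cache, keyed by class name


class ToyModelBase(type):
    def __new__(mcs, name, bases, attrs):
        # Collect Field instances from the class body before the class is built.
        fields = []
        for attr_name, value in list(attrs.items()):
            if isinstance(value, Field):
                value.name = attr_name
                fields.append(value)
                del attrs[attr_name]
        cls = super().__new__(mcs, name, bases, attrs)
        cls._meta = Options(fields)
        registry[name] = cls  # happens once, at definition time
        return cls


class ToyModel(metaclass=ToyModelBase):
    pass


class Foo(ToyModel):
    name = Field(max_length=10)
```

Creating `Foo()` instances later does not re-run any of this: the metaclass work happened once, when the module was imported.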
-
9.
Fields
• Fields have contribute_to_class
• Adds methods, eg get_FOO_display()
• Enables use of descriptors for field access
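A minimal sketch of the `contribute_to_class` hook, assuming a hypothetical `ChoiceField` and a plain `Article` class rather than Django's own classes. The field is handed the class and its attribute name, and uses that to bolt a `get_FOO_display()` method onto the class. In Django the metaclass makes this call; here it is made by hand.

```python
class ChoiceField:
    """Hypothetical field with choices that attaches itself to a class."""
    def __init__(self, choices):
        self.choices = dict(choices)
        self.name = None

    def contribute_to_class(self, cls, name):
        self.name = name
        field = self

        def get_display(instance):
            # Map the stored value to its label, falling back to the raw value.
            value = getattr(instance, field.name)
            return field.choices.get(value, value)

        get_display.__name__ = "get_%s_display" % name
        setattr(cls, get_display.__name__, get_display)


class Article:
    def __init__(self, status):
        self.status = status


ChoiceField([("d", "Draft"), ("p", "Published")]).contribute_to_class(Article, "status")
```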
-
10.
Model metadata
• Model._meta
• .fields
• .get_field(fieldname)
• .get_all_related_objects()
-
11.
Model instantiation
• Instance is populated from database initially
• Has no subsequent relationship with db
until save
• No identity between instances: two
instances of the same row are separate objects
-
12.
Querysets
• Model.manager returns a queryset:
foos = Foo.objects.all()
• Queryset is an ordered list of instances
of a single model
• No database access until it is evaluated, eg:
• Slice: foos[0]
• Iterate: {% for foo in foos %}
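A toy model of queryset laziness and result caching. `LazyQuerySet` is a hypothetical stand-in, not Django's `QuerySet`: it runs its "query" (a plain callable) only on first evaluation and reuses the cached list afterwards. One simplification to note: here indexing evaluates and caches the whole list, whereas Django's `foos[0]` on an unevaluated queryset issues its own `LIMIT 1` query without filling the cache.

```python
class LazyQuerySet:
    """Hypothetical stand-in for a QuerySet: no query until evaluated."""
    def __init__(self, fetch):
        self._fetch = fetch        # callable that performs the "query"
        self._result_cache = None  # filled on first evaluation
        self.query_count = 0

    def _evaluate(self):
        if self._result_cache is None:
            self.query_count += 1
            self._result_cache = list(self._fetch())
        return self._result_cache

    def __iter__(self):
        return iter(self._evaluate())

    def __getitem__(self, index):
        return self._evaluate()[index]


foos = LazyQuerySet(lambda: ["foo1", "foo2"])  # nothing fetched yet
```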
-
13.
Where do all those
queries come from?
• Repeated queries
• Lack of caching
• Relational lookup
• Templates as well as views
-
14.
Repeated queries
def get_absolute_url(self):
    return "%s/%s" % (
        self.category.slug,
        self.slug
    )
Same category, but query is
repeated for each article
-
15.
Repeated queries
• Same link on every
page
• Dynamic, so can't
go in urlconf
• Could be cached
or memoized
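A sketch of memoizing the category lookup behind `get_absolute_url`, using `functools.lru_cache`. The database is simulated: `CATEGORY_TABLE`, `fetch_category_slug` and `QUERY_LOG` are hypothetical, and `Article` is a plain class holding a `category_id` the way a Django instance holds `category_id` for its foreign key.

```python
import functools

CATEGORY_TABLE = {1: "python", 2: "django"}  # hypothetical category rows
QUERY_LOG = []  # one entry per simulated database hit


def fetch_category_slug(category_id):
    """Simulated database lookup."""
    QUERY_LOG.append(category_id)
    return CATEGORY_TABLE[category_id]


@functools.lru_cache(maxsize=None)
def category_slug(category_id):
    # Memoized per category id: articles sharing a category share one lookup.
    return fetch_category_slug(category_id)


class Article:
    def __init__(self, slug, category_id):
        self.slug = slug
        self.category_id = category_id

    def get_absolute_url(self):
        return "%s/%s" % (category_slug(self.category_id), self.slug)


articles = [Article("a%d" % i, 1 + i % 2) for i in range(10)]
urls = [a.get_absolute_url() for a in articles]
```

Ten URLs cost two lookups. A process-wide cache like this never expires, so if slugs can change, Django's cache framework or a per-request cache is the safer home for it.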
-
16.
Relationships
http://www.flickr.com/photos/katietegtmeyer/124315322
-
17.
Relational lookups
• Forwards:
foo.bar.field
• Backwards:
bar.foo_set.all()
-
18.
Example models
class Foo(models.Model):
    name = models.CharField(max_length=10)

class Bar(models.Model):
    name = models.CharField(max_length=10)
    foo = models.ForeignKey(Foo)
-
19.
Forwards relationship
>>> bar = Bar.objects.all()[0]
>>> bar.__dict__
{'id': 1, 'foo_id': 1, 'name': u'item1'}
-
20.
Forwards relationship
>>> bar.foo.name
u'item1'
>>> bar.__dict__
{'_foo_cache': <Foo: Foo object>, 'id': 1,
'foo_id': 1, 'name': u'item1'}
-
21.
Forwards relationships
• Relational access implemented via a
descriptor:
django.db.models.fields.related.
ReverseSingleRelatedObjectDescriptor
• __get__ tries to access _foo_cache
• If it doesn't exist, does the lookup and
creates the cache
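The caching descriptor reduced to plain Python. `ForwardFKDescriptor`, `FOO_ROWS` and `lookups` are hypothetical stand-ins, not Django's code; the point is the pattern: check `_foo_cache` in the instance `__dict__`, and only on a miss do the lookup and store the result there. A second `Bar` pointing at the same row does its own lookup, because instances share no identity.

```python
FOO_ROWS = {1: {"id": 1, "name": "item1"}}  # hypothetical "foo" table
lookups = []  # one entry per simulated query


class Foo:
    def __init__(self, id, name):
        self.id = id
        self.name = name


class ForwardFKDescriptor:
    """Toy forward foreign-key descriptor with a per-instance cache."""
    def __init__(self, name):
        self.cache_name = "_%s_cache" % name
        self.fk_attname = "%s_id" % name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        try:
            return getattr(instance, self.cache_name)
        except AttributeError:
            fk_value = getattr(instance, self.fk_attname)
            lookups.append(fk_value)  # the "query"
            obj = Foo(**FOO_ROWS[fk_value])
            setattr(instance, self.cache_name, obj)
            return obj


class Bar:
    foo = ForwardFKDescriptor("foo")

    def __init__(self, id, name, foo_id):
        self.id = id
        self.name = name
        self.foo_id = foo_id


bar = Bar(1, "item1", foo_id=1)
```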
-
22.
select_related
• Automatically follows foreign keys in SQL
query
• Prepopulates _foo_cache
• Doesn't follow null=True relationships by
default
• Makes query more expensive, so be sure
you need it
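`select_related` is Django API; this sketch uses the standard-library `sqlite3` module to show the SQL-level difference it makes. The table and column names mirror the `Foo`/`Bar` example models but are assumptions. Following the foreign key per row costs one query for the bars plus one per bar; a JOIN fetches the same pairs in one query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bar (id INTEGER PRIMARY KEY, name TEXT, foo_id INTEGER);
""")
conn.executemany("INSERT INTO foo VALUES (?, ?)",
                 [(i, "foo%d" % i) for i in range(1, 4)])
conn.executemany("INSERT INTO bar VALUES (?, ?, ?)",
                 [(i, "bar%d" % i, 1 + i % 3) for i in range(1, 11)])

selects = []


def record_select(sql):
    # Count only SELECTs, ignoring any transaction statements the driver issues.
    if sql.lstrip().upper().startswith("SELECT"):
        selects.append(sql)


conn.set_trace_callback(record_select)


def names_n_plus_one():
    # One query for the bars, then one more per bar to follow the foreign key.
    out = []
    rows = conn.execute("SELECT name, foo_id FROM bar ORDER BY id").fetchall()
    for bar_name, foo_id in rows:
        (foo_name,) = conn.execute(
            "SELECT name FROM foo WHERE id = ?", (foo_id,)).fetchone()
        out.append((bar_name, foo_name))
    return out


def names_joined():
    # What select_related does: follow the foreign key inside the same query.
    return conn.execute(
        "SELECT bar.name, foo.name FROM bar "
        "JOIN foo ON foo.id = bar.foo_id ORDER BY bar.id").fetchall()


selects.clear()
slow = names_n_plus_one()
n_slow = len(selects)  # 1 + one per bar
selects.clear()
fast = names_joined()
n_fast = len(selects)  # just the JOIN
```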
-
23.
Backwards relationships
{% for foo in my_foos %}
{% for bar in foo.bar_set.all %}
{{ bar.name }}
{% endfor %}
{% endfor %}
-
24.
Backwards relationships
• One query per foo
• If you iterate over foo_set again, you
generate a new set of db hits
• No _foo_cache
• select_related does not work here
-
25.
Optimising backwards
relationships
• Get all related objects at once
• Group them by ID of parent object
• Then cache in hidden attribute as with
select_related
-
26.
# criteria=whatever is a placeholder filter
qs = Foo.objects.filter(criteria=whatever)
obj_dict = dict((obj.id, obj) for obj in qs)
objects = Bar.objects.filter(foo__in=qs)
relation_dict = {}
for obj in objects:
    relation_dict.setdefault(
        obj.foo_id, []).append(obj)
for foo_id, foo in obj_dict.items():
    # Foos with no Bars get an empty list
    foo._related_items = relation_dict.get(foo_id, [])
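The same two-query approach, sketched with the standard-library `sqlite3` module so it runs without Django. The schema, the `Foo` class and `foos_with_related_bars` are assumptions for illustration. Query 1 fetches the parents, query 2 fetches all their children through the same `IN (SELECT ...)` subquery seen in the query log, and the children are grouped by `foo_id` and attached as `_related_items`.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bar (id INTEGER PRIMARY KEY, name TEXT, foo_id INTEGER);
    INSERT INTO foo VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO bar VALUES (1, 'x', 1), (2, 'y', 1), (3, 'z', 2);
""")


class Foo:
    def __init__(self, id, name):
        self.id, self.name = id, name


def foos_with_related_bars():
    # Query 1: the parent objects, indexed by primary key.
    foos = {row[0]: Foo(*row) for row in conn.execute("SELECT id, name FROM foo")}
    # Query 2: every child of those parents at once.
    grouped = defaultdict(list)
    for _bar_id, bar_name, foo_id in conn.execute(
            "SELECT id, name, foo_id FROM bar "
            "WHERE foo_id IN (SELECT id FROM foo) ORDER BY id"):
        grouped[foo_id].append(bar_name)
    # Attach each group to its parent; parents with no children get [].
    for foo_id, foo in foos.items():
        foo._related_items = grouped.get(foo_id, [])
    return foos


queries = []
conn.set_trace_callback(queries.append)  # record every SQL statement run
foos = foos_with_related_bars()
conn.set_trace_callback(None)
```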
-
32.
Optimising backwards
[{'time': '0.000', 'sql': u'SELECT
"foobar_foo"."id", "foobar_foo"."name" FROM
"foobar_foo"'},
{'time': '0.000', 'sql': u'SELECT
"foobar_bar"."id", "foobar_bar"."name",
"foobar_bar"."foo_id" FROM "foobar_bar"
WHERE "foobar_bar"."foo_id" IN (SELECT
U0."id" FROM "foobar_foo" U0)'}]
-
33.
Optimising backwards
• Still quite expensive, as it can mean a large
dependent subquery; MySQL in particular is
very bad at these
• But now just two queries instead of n + 1
• Not automatic – need to remember to use
_related_items attribute
-
34.
Generic relations
• Foreign key to ContentType, plus object_id
• Descriptor to enable direct access
• Iterating through creates n+m queries
(n = number of source objects,
m = number of different content types)
• ContentType objects automatically cached
• Forwards relationship creates _foo_cache
• but select_related doesn't work
-
35.
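from django.contrib.contenttypes.models import ContentType

# queryset: objects with a GenericForeignKey named content_object
generics = {}
for item in queryset:
    generics.setdefault(item.content_type_id,
                        set()).add(item.object_id)
content_types = ContentType.objects.in_bulk(
    list(generics.keys()))
relations = {}
for ct, fk_list in generics.items():
    ct_model = content_types[ct].model_class()
    # one query per content type, not per item
    relations[ct] = ct_model.objects.in_bulk(
        list(fk_list))
for item in queryset:
    # None if the target row has been deleted
    setattr(item, '_content_object_cache',
            relations[item.content_type_id].get(item.object_id))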
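The grouping logic above in a runnable, Django-free form. Everything here is simulated: `TABLES` stands in for one table per content type, `in_bulk` is a hypothetical stand-in for `Model.objects.in_bulk()` that records one "query" per call, and content types are plain strings. Four comments pointing at two content types cost two bulk queries instead of four single lookups.

```python
from collections import defaultdict

# Hypothetical data: one "table" per content type, keyed by primary key.
TABLES = {
    "article": {1: "Article 1", 2: "Article 2"},
    "photo": {7: "Photo 7"},
}
queries = []


def in_bulk(table, ids):
    """Simulated in_bulk(): one query returning {pk: obj}."""
    queries.append((table, sorted(ids)))
    return {pk: TABLES[table][pk] for pk in ids if pk in TABLES[table]}


class Comment:
    def __init__(self, content_type, object_id):
        self.content_type = content_type
        self.object_id = object_id


def prefetch_generic(items):
    # Group the object ids by content type.
    generics = defaultdict(set)
    for item in items:
        generics[item.content_type].add(item.object_id)
    # One bulk query per content type.
    relations = {ct: in_bulk(ct, ids) for ct, ids in generics.items()}
    # Fill the cache attribute the descriptor would check first.
    for item in items:
        item._content_object_cache = relations[item.content_type].get(item.object_id)
    return items


comments = prefetch_generic([Comment("article", 1), Comment("photo", 7),
                             Comment("article", 2), Comment("article", 1)])
```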
-
44.
Other optimising
techniques
-
45.
Memoizing
• Cache property on first access
• Can cache within instance, if multiple
accesses within same request
def get_expensive_items(self):
    if not hasattr(self, '_cache'):
        self._cache = self.expensive_op()
    return self._cache
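The same per-instance memoizing, generalised into a decorator. `memoize_on_instance` and `Report` are hypothetical names. Unlike a single shared `_cache` attribute, the decorator stores each method's result under its own attribute name, so several memoized methods on one model do not clobber each other. For properties, Python 3.8+ ships `functools.cached_property`, which does the same job.

```python
import functools


def memoize_on_instance(method):
    """Cache a zero-argument method's result on the instance after the first call."""
    attr = "_memo_%s" % method.__name__

    @functools.wraps(method)
    def wrapper(self):
        try:
            return self.__dict__[attr]
        except KeyError:
            value = self.__dict__[attr] = method(self)
            return value

    return wrapper


class Report:
    def __init__(self):
        self.calls = 0

    @memoize_on_instance
    def get_expensive_items(self):
        self.calls += 1  # stands in for an expensive query
        return [1, 2, 3]
```

Because the cache lives on the instance, it lasts only as long as that object, which for a model instance usually means one request.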
-
46.
DB Indexes
• Pay attention to slow query log and
debug toolbar output
• Add extra indexes where necessary -
especially for multiple-column lookup
• Use EXPLAIN
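A small illustration of checking a plan before and after adding a multiple-column index. It uses SQLite's `EXPLAIN QUERY PLAN` through the standard-library `sqlite3` module because that runs anywhere; MySQL's `EXPLAIN` output looks different. The table and index names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (id INTEGER PRIMARY KEY, name TEXT, foo_id INTEGER)")

QUERY = "SELECT * FROM bar WHERE foo_id = ? AND name = ?"


def plan(sql, params):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[-1] for row in rows)


before = plan(QUERY, (1, "item1"))  # full table scan
# Multiple-column index matching both lookup columns.
conn.execute("CREATE INDEX bar_foo_id_name ON bar (foo_id, name)")
after = plan(QUERY, (1, "item1"))   # search using the new index
print(before)
print(after)
```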
-
47.
Outsourcing
• Does all the logic need to go in the web
app?
• Services - via eg Piston
• Message queues
• Distributed tasks, eg Celery
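The shape of handing work off, using only the standard library. This is not Celery's API: a `queue.Queue` and one worker thread stand in for a broker and a worker process, and `handle_request` is a hypothetical "view" that enqueues the slow part and returns at once. Unlike a real broker, everything here lives in one process and nothing survives a restart.

```python
import queue
import threading

tasks = queue.Queue()
results = []


def worker():
    # Background consumer: process jobs until the None sentinel arrives.
    while True:
        job = tasks.get()
        if job is None:
            tasks.task_done()
            break
        results.append(job * job)  # stands in for slow work
        tasks.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()


def handle_request(n):
    # The "view" only enqueues the work and responds immediately.
    tasks.put(n)
    return "queued %d" % n


responses = [handle_request(n) for n in range(5)]
tasks.put(None)
tasks.join()  # wait until every queued job has been processed
```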
-
48.
Summary
• Understand where queries are coming
from
• Optimise where necessary, within Django
or in the database
• and...
-
49.
PROFILE
-
50.
Daniel Roseman
http://blog.roseman.org.uk
(background: montage of Limmud, rosemanblog, Capital, Classic, Heart, GlassesDirect)
Some of the same ideas were in Guido's Appstats talk this morning
It's a model, in a field, geddit?
For more, see Marty Alchin, Pro Django (Apress)
descriptors used especially in related objects - see later
Very useful for introspection and working out what's going on
explain identity: multiple instances relating to same model row aren't the same object, changes made to one don't reflect the other; even saving one with new values won't be reflected in others.
Update, Aggregates, Q, F
Find repeated queries with my branch of the django-debug-toolbar, or SimonW's original query debug middleware
Actually in 1.2 there's an extra _state object in __dict__, which is used for the multiple DB support (which I'm not covering here).
Lack of model identity means that accessing the related item on one instance does not cause cache to be created on other instances that might reference the same db row
Note: backwards cache does work on OneToOne as of 1.2
MySQL EXPLAIN output for the IN (SELECT ...) query:
+----+--------------------+-----------+-----------------+---------------+---------+---------+------+------+-------------+
| id | select_type        | table     | type            | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+--------------------+-----------+-----------------+---------------+---------+---------+------+------+-------------+
|  1 | PRIMARY            | sandy_bar | ALL             | NULL          | NULL    | NULL    | NULL |  100 | Using where |
|  2 | DEPENDENT SUBQUERY | U0        | unique_subquery | PRIMARY       | PRIMARY | 4       | func |    1 | Using where |
+----+--------------------+-----------+-----------------+---------------+---------+---------+------+------+-------------+