Advanced Django ORM techniques

Django's ORM is extremely powerful, allowing you to manage your data without ever going near a line of SQL and hiding a multitude of complexities. But its power can sometimes be a curse rather than a blessing, multiplying queries without your knowledge and bringing your database to its knees.

In this session I explain what's going on behind the scenes and present some techniques to make your ORM use more efficient, showing how to monitor what's going on and how to better deal with relationships, indexes and more.

This talk was presented at EuroPython 2010 in Birmingham.

  1. Advanced Django ORM techniques
     Daniel Roseman, http://blog.roseman.org.uk
  2. About Me
     • Python user for five years
     • Discovered Django four years ago
     • Worked full-time with Python/Django since 2008
     • Top Django answerer on StackOverflow!
     • Occasionally blog on Django, concentrating on efficient use of the ORM
  3. Contents
     • Behind the scenes: models and fields
     • How model relationships work
     • More efficient relationships
     • Other optimising techniques
  4. Django ORM efficiency: a story
  5. 414 queries!
  6. How can you stop this happening to you?
     http://www.flickr.com/photos/m0n0/4479450696
  7. Behind the scenes: models and fields
     http://www.flickr.com/photos/spacesuitcatalyst/847530840
  8. Defining a model
     • Model structure initialised via metaclass
     • Called when model is first defined
     • Resulting model class stored in cache to use when instantiated
  9. Fields
     • Fields have contribute_to_class
     • Adds methods, eg get_FOO_display()
     • Enables use of descriptors for field access
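Slides 8 and 9 describe a two-step mechanism: a metaclass runs once when the model class is defined, and each field's contribute_to_class() hook adds things such as get_FOO_display() to that class. The toy below sketches that idea in plain Python. It is not Django's implementation; ModelBase, ChoiceField, Model, Shirt and _meta_fields are hypothetical names chosen for illustration.

```python
class ChoiceField:
    """Toy field that installs a get_FOO_display() method on its model."""

    def __init__(self, choices):
        self.choices = dict(choices)

    def contribute_to_class(self, cls, name):
        # Called by the metaclass once, while the model class is being set up.
        self.name = name
        cls._meta_fields.append(name)

        def display(instance):
            value = instance.__dict__[name]
            return self.choices.get(value, value)

        # Mirrors Django's get_FOO_display() naming convention.
        setattr(cls, "get_%s_display" % name, display)


class ModelBase(type):
    """Toy metaclass: runs once per model definition, not per instance."""

    def __new__(mcs, name, bases, attrs):
        # Take the fields out of the class body before the class is built...
        fields = {key: value for key, value in attrs.items()
                  if hasattr(value, "contribute_to_class")}
        for key in fields:
            del attrs[key]
        cls = super().__new__(mcs, name, bases, attrs)
        cls._meta_fields = []
        # ...then let each field add itself (and helper methods) to the class.
        for field_name, field in fields.items():
            field.contribute_to_class(cls, field_name)
        return cls


class Model(metaclass=ModelBase):
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


class Shirt(Model):
    size = ChoiceField([("S", "Small"), ("L", "Large")])
```

Usage: `Shirt(size="L").get_size_display()` returns `'Large'`, while a value with no matching choice, such as `'M'`, is returned unchanged.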
  10. Model metadata
      • Model._meta
        • .fields
        • .get_field(fieldname)
        • .get_all_related_objects()
  11. Model instantiation
      • Instance is populated from the database initially
      • Has no subsequent relationship with the db until save
      • No identity between instances: two instances of the same row are separate objects
  12. Querysets
      • Model.manager returns a queryset: foos = Foo.objects.all()
      • Queryset is an ordered list of instances of a single model
      • No database access until you:
        • Slice: foos[0]
        • Iterate: {% for foo in foos %}
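To make slide 12's "no database access yet" concrete, here is a toy queryset over a Python list that counts simulated queries. LazyQuerySet and query_count are hypothetical names. The behaviour it models (iteration fills a result cache, while indexing an unevaluated queryset runs its own limited query and leaves the cache empty) reflects my understanding of Django's QuerySet, so treat it as an approximation rather than a description of Django's code.

```python
class LazyQuerySet:
    """Toy queryset over a Python list that counts simulated database hits."""

    def __init__(self, rows):
        self._rows = rows              # stands in for the database table
        self._result_cache = None      # filled on first full iteration
        self.query_count = 0

    def _run_query(self, rows):
        self.query_count += 1          # one simulated round trip
        return list(rows)

    def __iter__(self):
        # Iterating evaluates the whole queryset once and caches the results.
        if self._result_cache is None:
            self._result_cache = self._run_query(self._rows)
        return iter(self._result_cache)

    def __getitem__(self, index):
        # Integer indexes only, to keep the sketch small.
        if index < 0:
            raise ValueError("Negative indexing is not supported.")
        if self._result_cache is not None:
            return self._result_cache[index]
        # Unevaluated: fetch just this row with its own (LIMIT-style) query,
        # without filling the cache.
        return self._run_query(self._rows[index:index + 1])[0]


foos = LazyQuerySet(["foo1", "foo2", "foo3"])   # no "query" has run yet
```

Indexing first costs one query; a full iteration then costs one more, after which both indexing and re-iteration are served from the cache.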
  13. Where do all those queries come from?
      • Repeated queries
      • Lack of caching
      • Relational lookup
      • Templates as well as views
  14. Repeated queries

          def get_absolute_url(self):
              return "%s/%s" % (self.category.slug, self.slug)

      Same category, but the query is repeated for each article
  15. Repeated queries
      • Same link on every page
      • Dynamic, so can't go in urlconf
      • Could be cached or memoized
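Slide 15 suggests caching or memoizing the repeated category lookup from slide 14. A minimal sketch, assuming the slug can be looked up from the article's foreign-key value: category_slug(), CATEGORIES, QUERY_LOG and this Article class are hypothetical stand-ins for the real Category query and model, and functools.lru_cache does the memoizing.

```python
from functools import lru_cache

QUERY_LOG = []                          # records each simulated database hit
CATEGORIES = {1: "news", 2: "sport"}    # stand-in for the category table


@lru_cache(maxsize=None)
def category_slug(category_id):
    # Hypothetical lookup; in Django this would be a Category query by pk.
    QUERY_LOG.append(category_id)
    return CATEGORIES[category_id]


class Article:
    def __init__(self, slug, category_id):
        self.slug = slug
        self.category_id = category_id

    def get_absolute_url(self):
        # Uses the stored foreign-key value, so finding the category id needs
        # no query; the slug lookup itself is memoized per category.
        return "%s/%s" % (category_slug(self.category_id), self.slug)


articles = [Article("a%d" % i, 1) for i in range(5)] + [Article("b", 2)]
urls = [article.get_absolute_url() for article in articles]
```

Six articles in two categories trigger two lookups instead of six. The trade-off is staleness: this cache lives for the whole process, so a renamed category slug goes unnoticed until category_slug.cache_clear() is called.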
  16. Relationships
      http://www.flickr.com/photos/katietegtmeyer/124315322
  17. Relational lookups
      • Forwards: bar.foo.name
      • Backwards: foo.bar_set.all()
  18. Example models

          from django.db import models

          class Foo(models.Model):
              name = models.CharField(max_length=10)

          class Bar(models.Model):
              name = models.CharField(max_length=10)
              # on_delete is required from Django 2.0 onwards
              foo = models.ForeignKey(Foo, on_delete=models.CASCADE)
  19. Forwards relationship

          >>> bar = Bar.objects.all()[0]
          >>> bar.__dict__
          {'id': 1, 'foo_id': 1, 'name': u'item1'}

  20. Forwards relationship

          >>> bar.foo.name
          u'item1'
          >>> bar.__dict__
          {'_foo_cache': <Foo: Foo object>, 'id': 1, 'foo_id': 1, 'name': u'item1'}
  21. Forwards relationships
      • Relational access implemented via a descriptor: django.db.models.fields.related.ReverseSingleRelatedObjectDescriptor
      • __get__ tries to access _foo_cache
      • If it doesn't exist, does the lookup and creates the cache
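The descriptor behaviour on slide 21 can be reproduced in a few lines of plain Python. This is a simplified sketch rather than Django's descriptor: ForeignKeyDescriptor, fetch_foo and LOOKUPS are hypothetical names, and a dict stands in for the related Foo instance.

```python
LOOKUPS = []   # records each simulated query for a Foo


def fetch_foo(foo_id):
    # Stand-in for a SELECT on the foo table by primary key.
    LOOKUPS.append(foo_id)
    return {"id": foo_id, "name": "item%d" % foo_id}


class ForeignKeyDescriptor:
    """Toy forward-relation descriptor that caches in _<name>_cache."""

    def __init__(self, name, lookup):
        self.name = name
        self.cache_name = "_%s_cache" % name   # e.g. _foo_cache
        self.lookup = lookup

    def __get__(self, instance, owner):
        if instance is None:
            return self                        # accessed on the class itself
        try:
            return getattr(instance, self.cache_name)
        except AttributeError:
            # Cache miss: look the object up once and store it on this instance.
            related = self.lookup(getattr(instance, self.name + "_id"))
            setattr(instance, self.cache_name, related)
            return related


class Bar:
    foo = ForeignKeyDescriptor("foo", fetch_foo)

    def __init__(self, id, foo_id, name):
        self.id = id
        self.foo_id = foo_id
        self.name = name
```

Reading bar.foo twice costs one lookup. A second Bar pointing at the same Foo repeats the lookup, which is the lack of identity mentioned in the editor's notes; assigning _foo_cache in advance, as select_related does, avoids the lookup entirely.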
  22. select_related
      • Automatically follows foreign keys in the SQL query
      • Prepopulates _foo_cache
      • Doesn't follow null=True relationships by default
      • Makes the query more expensive, so be sure you need it
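Slide 22 describes select_related as following the foreign key in the SQL itself. At the SQL level that amounts to a JOIN. The sketch below shows the idea with the standard-library sqlite3 module rather than Django; the table layout, the Foo/Bar classes and bars_with_foo() are assumptions for illustration, and set_trace_callback() is used only to count the statements sent.

```python
import sqlite3


class Foo:
    def __init__(self, id, name):
        self.id, self.name = id, name


class Bar:
    def __init__(self, id, name, foo_id):
        self.id, self.name, self.foo_id = id, name, foo_id


def bars_with_foo(conn):
    """Fetch every Bar plus its Foo in one JOINed query, filling _foo_cache."""
    rows = conn.execute(
        "SELECT bar.id, bar.name, bar.foo_id, foo.name "
        "FROM bar INNER JOIN foo ON bar.foo_id = foo.id ORDER BY bar.id"
    )
    bars = []
    for bar_id, bar_name, foo_id, foo_name in rows:
        bar = Bar(bar_id, bar_name, foo_id)
        bar._foo_cache = Foo(foo_id, foo_name)   # prepopulated cache
        bars.append(bar)
    return bars


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bar (id INTEGER PRIMARY KEY, name TEXT,
                      foo_id INTEGER NOT NULL REFERENCES foo (id));
    INSERT INTO foo VALUES (1, 'foo1'), (2, 'foo2');
    INSERT INTO bar VALUES (1, 'bar1', 1), (2, 'bar2', 1), (3, 'bar3', 2);
""")

statements = []
conn.set_trace_callback(statements.append)   # log every SQL statement run
bars = bars_with_foo(conn)
conn.set_trace_callback(None)
```

One statement fetches all three Bars with their Foos already attached. An INNER JOIN like this silently drops any Bar whose foo_id is NULL, so a nullable relationship would need an outer join instead.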
  23. Backwards relationships

          {% for foo in my_foos %}
              {% for bar in foo.bar_set.all %}
                  {{ bar.name }}
              {% endfor %}
          {% endfor %}

  24. Backwards relationships
      • One query per foo
      • If you iterate over bar_set again, you generate a new set of db hits
      • No equivalent of _foo_cache
      • select_related does not work here
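The cost described on slide 24 is easy to measure. This sketch replays the nested loop from slide 23 against an in-memory SQLite database and logs every statement with sqlite3's set_trace_callback(). The schema and the bar_set()/render() helpers are illustrative stand-ins, not Django code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bar (id INTEGER PRIMARY KEY, name TEXT, foo_id INTEGER);
    INSERT INTO foo VALUES (1, 'foo1'), (2, 'foo2'), (3, 'foo3');
    INSERT INTO bar VALUES (1, 'bar1', 1), (2, 'bar2', 1), (3, 'bar3', 3);
""")

queries = []
conn.set_trace_callback(queries.append)   # log every statement, debug-toolbar style


def bar_set(foo_id):
    # No caching: each call is a fresh query, like foo.bar_set.all()
    sql = "SELECT name FROM bar WHERE foo_id = ? ORDER BY id"
    return [name for (name,) in conn.execute(sql, (foo_id,))]


def render(foo_ids):
    # The nested template loop from slide 23, in Python.
    return [name for foo_id in foo_ids for name in bar_set(foo_id)]


foo_ids = [foo_id for (foo_id,) in conn.execute("SELECT id FROM foo ORDER BY id")]
first_pass = render(foo_ids)
queries_after_first = len(queries)
second_pass = render(foo_ids)   # iterating again repeats every per-foo query
```

With three Foos, the first pass issues four statements (one for the Foos, then one per Foo for its Bars), and the second pass adds three more.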
  25. Optimising backwards relationships
      • Get all related objects at once
      • Group them by the ID of the parent object
      • Then cache them in a hidden attribute, as with select_related
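  26.

          # criteria=whatever is a placeholder for your real filter
          qs = Foo.objects.filter(criteria=whatever)
          obj_dict = dict((obj.id, obj) for obj in qs)

          # A single query fetches the Bars for every Foo in qs
          objects = Bar.objects.filter(foo__in=qs)
          relation_dict = {}
          for obj in objects:
              relation_dict.setdefault(obj.foo_id, []).append(obj)

          # Foos with no Bars get an empty list rather than no attribute at all
          for foo_id, foo in obj_dict.items():
              foo._related = relation_dict.get(foo_id, [])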
  32. Optimising backwards

          [{'time': '0.000',
            'sql': u'SELECT "foobar_foo"."id", "foobar_foo"."name" FROM "foobar_foo"'},
           {'time': '0.000',
            'sql': u'SELECT "foobar_bar"."id", "foobar_bar"."name", "foobar_bar"."foo_id" FROM "foobar_bar" WHERE "foobar_bar"."foo_id" IN (SELECT U0."id" FROM "foobar_foo" U0)'}]

  33. Optimising backwards
      • Still quite expensive, as it can mean a large dependent subquery – MySQL in particular is very bad at these
      • But now just two queries instead of n + 1
      • Not automatic – need to remember to use the _related attribute
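Because slide 33 notes that the technique is not automatic, it may be worth wrapping the code on slide 26 in a reusable function. attach_related() is a hypothetical helper, shown here with SimpleNamespace objects so it runs without a database; with Django you would presumably pass the evaluated parent queryset and Bar.objects.filter(foo__in=qs). Later Django releases (1.4 onwards) added prefetch_related(), which automates this pattern.

```python
from types import SimpleNamespace


def attach_related(parents, children, fk_attr, cache_attr="_related"):
    """Store each parent's children on it as `cache_attr`.

    parents: objects with an `id`; children: objects whose `fk_attr`
    holds a parent id. Parents with no children get an empty list.
    """
    by_parent = {}
    for child in children:
        by_parent.setdefault(getattr(child, fk_attr), []).append(child)
    for parent in parents:
        setattr(parent, cache_attr, by_parent.get(parent.id, []))
    return parents


# Plain objects standing in for Foo and Bar model instances.
foos = [SimpleNamespace(id=1), SimpleNamespace(id=2), SimpleNamespace(id=3)]
bars = [SimpleNamespace(id=10, foo_id=1),
        SimpleNamespace(id=11, foo_id=1),
        SimpleNamespace(id=12, foo_id=3)]
attach_related(foos, bars, "foo_id")
```

Giving every parent a list, even an empty one, means templates can loop over foo._related without first checking that the attribute exists.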
  34. Generic relations
      • Foreign key to ContentType, plus object_id
      • Descriptor to enable direct access
      • Iterating through creates n + m queries (n = number of source objects, m = number of different content types)
      • ContentType objects automatically cached
      • Forwards relationship creates _foo_cache
      • but select_related doesn't work
  35.

          from django.contrib.contenttypes.models import ContentType

          # Collect the object ids needed for each content type
          generics = {}
          for item in queryset:
              generics.setdefault(item.content_type_id, set()).add(item.object_id)

          content_types = ContentType.objects.in_bulk(list(generics.keys()))

          # One in_bulk() query per content type
          relations = {}
          for ct, fk_list in generics.items():
              ct_model = content_types[ct].model_class()
              relations[ct] = ct_model.objects.in_bulk(list(fk_list))

          # Iterating again reuses the queryset's result cache, so these are the
          # same instances; assumes the generic foreign key is named content_object
          for item in queryset:
              setattr(item, '_content_object_cache',
                      relations[item.content_type_id][item.object_id])
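Stripped of the Django calls, the generic-relation code on slide 35 is a group-then-bulk-fetch pattern: n source objects spread over m content types cost m bulk queries (plus one for the ContentType rows) instead of n single ones. The sketch below isolates that logic so it can be tested without a database. load_generic_objects(), the fetchers mapping and the "article"/"photo" types are hypothetical, with each fetcher playing the role of in_bulk().

```python
from collections import defaultdict
from types import SimpleNamespace


def load_generic_objects(items, fetchers):
    """Fill item._content_object_cache using one bulk fetch per content type.

    items: objects with content_type_id and object_id attributes.
    fetchers: maps content_type_id to a callable taking a list of ids and
    returning {id: object}, standing in for ct_model.objects.in_bulk().
    """
    wanted = defaultdict(set)
    for item in items:
        wanted[item.content_type_id].add(item.object_id)
    loaded = {ct: fetchers[ct](sorted(ids)) for ct, ids in wanted.items()}
    for item in items:
        item._content_object_cache = loaded[item.content_type_id][item.object_id]
    return items


calls = []   # records each simulated bulk query


def make_fetcher(kind):
    def fetch(ids):
        calls.append((kind, ids))
        return {i: "%s-%d" % (kind, i) for i in ids}
    return fetch


fetchers = {1: make_fetcher("article"), 2: make_fetcher("photo")}
items = [SimpleNamespace(content_type_id=ct, object_id=oid)
         for ct, oid in [(1, 5), (2, 7), (1, 5), (1, 6)]]
load_generic_objects(items, fetchers)
```

Four items over two content types produce two bulk fetches; the duplicate reference to article 5 is fetched only once.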
  44. Other optimising techniques
  45. Memoizing
      • Cache property on first access
      • Can cache within the instance, if there are multiple accesses within the same request

          def get_expensive_items(self):
              if not hasattr(self, '_cache'):
                  self._cache = self.expensive_op()
              return self._cache
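The hasattr() idiom on slide 45 is what a cached-property decorator packages up. As an alternative sketch, Python 3.8's functools.cached_property stores the result on the instance at first access; Django also ships its own django.utils.functional.cached_property. The Article class and the body of expensive_op() are made up for illustration.

```python
from functools import cached_property   # Python 3.8+


class Article:
    def __init__(self, body):
        self.body = body
        self.calls = 0                   # how often the real work has run

    def expensive_op(self):
        self.calls += 1
        return sorted(set(self.body.split()))

    @cached_property
    def expensive_items(self):
        # Computed on first access, then stored in the instance __dict__,
        # so later accesses on the same instance skip expensive_op().
        return self.expensive_op()


article = Article("b a b c")
```

Because the value lives on the instance, it lasts only as long as that object, typically one request. Deleting the attribute (del article.expensive_items) forces a recompute on the next access.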
  46. DB Indexes
      • Pay attention to the slow query log and debug toolbar output
      • Add extra indexes where necessary, especially for multiple-column lookups
      • Use EXPLAIN
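To see what "Use EXPLAIN" and a multiple-column index buy you, here is a sketch with the standard-library sqlite3 module. It swaps in SQLite's EXPLAIN QUERY PLAN for the MySQL EXPLAIN the talk refers to, so the output wording differs; the bar table and the index name bar_foo_id_name are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (id INTEGER PRIMARY KEY, name TEXT, foo_id INTEGER)")
conn.executemany(
    "INSERT INTO bar (name, foo_id) VALUES (?, ?)",
    [("item%d" % i, i % 50) for i in range(1000)],
)

QUERY = "SELECT id FROM bar WHERE foo_id = ? AND name = ?"


def plan(sql, params):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


before = plan(QUERY, (3, "item3"))

# A multiple-column index matching both columns in the WHERE clause.
conn.execute("CREATE INDEX bar_foo_id_name ON bar (foo_id, name)")
after = plan(QUERY, (3, "item3"))

print("before:", before)
print("after: ", after)
```

Before the index the plan is a scan of bar; afterwards SQLite reports a search using bar_foo_id_name. The exact wording varies between SQLite versions, so it is better to look for the index name than to match whole lines.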
  47. Outsourcing
      • Does all the logic need to go in the web app?
      • Services, via eg Piston
      • Message queues
      • Distributed tasks, eg Celery
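The "message queues" and "distributed tasks" bullets are about moving slow work out of the request. The shape of that hand-off can be sketched with the standard library's queue and threading modules. This is a single-process stand-in, not Celery or a real message broker, and send_welcome_email() is a hypothetical task.

```python
import queue
import threading

jobs = queue.Queue()
results = []


def worker():
    # Take jobs off the queue until the None sentinel arrives.
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        func, args = job
        results.append(func(*args))
        jobs.task_done()


def send_welcome_email(address):
    # Hypothetical slow task the web request should not have to wait for.
    return "sent to %s" % address


thread = threading.Thread(target=worker, daemon=True)
thread.start()

# The "view" just enqueues the work and can return immediately.
jobs.put((send_welcome_email, ("ann@example.com",)))
jobs.put((send_welcome_email, ("bob@example.com",)))

jobs.put(None)   # ask the worker to stop once the queue is drained
thread.join()
```

A real broker adds what this sketch lacks: persistence, retries, and workers running in separate processes or on other machines.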
  48. Summary
      • Understand where queries are coming from
      • Optimise where necessary, within Django or in the database
      • and...
  49. PROFILE
  50. Daniel Roseman
      http://blog.roseman.org.uk

Editor's Notes


  • (background: montage of Limmud, rosemanblog, Capital, Classic, Heart, GlassesDirect)
  • Some of the same ideas as in Guido's Appstats talk this morning
  • It's a model, in a field, geddit?
  • For more, see Marty Alchin, Pro Django (Apress)
  • Descriptors are used especially in related objects; see later
  • Very useful for introspection and working out what's going on
  • Explain identity: multiple instances relating to the same model row aren't the same object. Changes made to one aren't reflected in the others; even saving one with new values won't be reflected in the others.
  • Update, Aggregates, Q, F
  • Find repeated queries with my branch of the django-debug-toolbar, or SimonW's original query debug middleware
  • Actually in 1.2 there's an extra _state object in __dict__, which is used for the multiple DB support (which I'm not covering here).
  • Lack of model identity means that accessing the related item on one instance does not cause the cache to be created on other instances that might reference the same db row
  • Note: the backwards cache does work on OneToOne as of 1.2

  • EXPLAIN output (MySQL):

    +----+--------------------+-----------+-----------------+---------------+---------+---------+------+------+-------------+
    | id | select_type        | table     | type            | possible_keys | key     | key_len | ref  | rows | Extra       |
    +----+--------------------+-----------+-----------------+---------------+---------+---------+------+------+-------------+
    |  1 | PRIMARY            | sandy_bar | ALL             | NULL          | NULL    | NULL    | NULL |  100 | Using where |
    |  2 | DEPENDENT SUBQUERY | U0        | unique_subquery | PRIMARY       | PRIMARY | 4       | func |    1 | Using where |
    +----+--------------------+-----------+-----------------+---------------+---------+---------+------+------+-------------+