Django Meetup: Django Multicolumn Joins

A presentation shared by Hearsay Social software engineer Jeremy Tillman.

Transcript

  • 1. Django Meetup: Django Multicolumn Joins. Jeremy Tillman, Software Engineer, Hearsay Social, @hssengineering
  • 2. Django Multicolumn Joins | © 2012 Hearsay Social 2
      About Me
      • Joined Hearsay Social in May 2012 as a Software Engineering Generalist
      • Computer Engineering BA, Purdue University
      • 3 years at Microsoft working on versions of Windows Server
      • 9 years of database experience: Access, SQL Server, MySQL
      • Loves sea turtles!
  • 3. Why do we want multicolumn joins?
  • 4. Django First App: Poll example
      from django.db import models

      class Poll(models.Model):
          question = models.CharField(max_length=200)
          pub_date = models.DateTimeField('date published')

      class Choice(models.Model):
          poll = models.ForeignKey(Poll)
          choice_text = models.CharField(max_length=200)
          votes = models.IntegerField(default=0)
  • 5. What if we stored Polls for X number of customers?
      class Customer(models.Model):
          name = models.CharField(max_length=100)

          class Meta:
              ordering = ('name',)

      class Poll(models.Model):
          customer = models.ForeignKey(Customer)
          question = models.CharField(max_length=200)
          pub_date = models.DateTimeField('date published')

      class Choice(models.Model):
          poll = models.ForeignKey(Poll)
          choice_text = models.CharField(max_length=200)
          votes = models.IntegerField(default=0)

      CREATE TABLE customer(
          id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
          name VARCHAR(100) NOT NULL);
      CREATE TABLE poll(
          id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
          customer_id INT NOT NULL,
          question VARCHAR(200) NOT NULL,
          pub_date DATETIME NOT NULL,
          INDEX idx_customer (customer_id));
      CREATE TABLE choice(
          id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
          poll_id INT NOT NULL,
          choice_text VARCHAR(200),
          votes INT NOT NULL DEFAULT 0,
          INDEX idx_poll (poll_id));
  • 6. How is our data being stored?
      CREATE TABLE choice(
          id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
          poll_id INT NOT NULL,
          choice_text VARCHAR(200),
          votes INT NOT NULL DEFAULT 0,
          INDEX idx_poll (poll_id));

      id          poll_id  choice_text       votes
      1           1        Ham               5
      2           7        Aries             8
      3           2        Elephant          9
      …           …        …                 …
      23,564,149  1        All of the above  2
      23,564,150  74       Sea turtle        7
  • 7. Data locality part 1: Scope by poll
      CREATE TABLE choice(
          id INT NOT NULL,
          poll_id INT NOT NULL,
          choice_text VARCHAR(200),
          votes INT NOT NULL DEFAULT 0,
          PRIMARY KEY (poll_id, id));

      id          poll_id  choice_text       votes
      1           1        Ham               5
      1,562       1        Turkey            46
      23,564,149  1        All of the above  2
      …           …        …                 …
      18,242,234  74       Jellyfish         0
      23,564,150  74       Sea turtle        7
  • 8. Data locality part 2: Scope by customer
      CREATE TABLE choice(
          id INT NOT NULL,
          customer_id INT NOT NULL,
          poll_id INT NOT NULL,
          choice_text VARCHAR(200),
          votes INT NOT NULL DEFAULT 0,
          PRIMARY KEY (customer_id, poll_id, id));

      id          poll_id  customer_id  choice_text       votes
      1           1        1            Ham               5
      1,562       1        1            Turkey            46
      23,564,149  1        1            All of the above  2
      18,242,234  74       1            Jellyfish         0
      23,564,150  74       1            Sea turtle        7
      …           …        …            …                 …
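To see why leading the primary key with customer_id and poll_id helps, here is a rough analogue in SQLite. A `WITHOUT ROWID` table is stored as a B-tree keyed on its declared primary key, which is similar in spirit to InnoDB clustering rows by primary key; that analogy, the use of SQLite, and the reuse of the slide's sample rows are assumptions of this sketch. It asks the query planner how it would fetch one customer's choices for one poll.

```python
import sqlite3

# In-memory database. WITHOUT ROWID stores rows in primary-key order,
# a rough stand-in for MySQL/InnoDB clustering on the primary key.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE choice(
        id INTEGER NOT NULL,
        customer_id INTEGER NOT NULL,
        poll_id INTEGER NOT NULL,
        choice_text VARCHAR(200),
        votes INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (customer_id, poll_id, id)
    ) WITHOUT ROWID
""")
conn.executemany(
    "INSERT INTO choice (id, customer_id, poll_id, choice_text, votes) VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, "Ham", 5),
        (1562, 1, 1, "Turkey", 46),
        (23564149, 1, 1, "All of the above", 2),
        (18242234, 1, 74, "Jellyfish", 0),
        (23564150, 1, 74, "Sea turtle", 7),
    ],
)

query = "SELECT choice_text FROM choice WHERE customer_id = ? AND poll_id = ?"

# Each EXPLAIN QUERY PLAN row has four columns; the last one is the readable detail.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query, (1, 74))]
rows = [r[0] for r in conn.execute(query + " ORDER BY id", (1, 74))]

print(plan)  # exact wording varies by SQLite version
print(rows)
```

The plan should report a search on the primary key rather than a full scan, because the filter columns form a prefix of the key, so one poll's choices sit next to each other on disk.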
  • 9. Representation in Django Models
      class Customer(models.Model):
          name = models.CharField(max_length=100)

          class Meta:
              ordering = ('name',)

      class Poll(models.Model):
          customer = models.ForeignKey(Customer)
          question = models.CharField(max_length=200)
          pub_date = models.DateTimeField('date published')

      class Choice(models.Model):
          customer = models.ForeignKey(Customer)
          poll = models.ForeignKey(Poll)
          choice_text = models.CharField(max_length=200)
          votes = models.IntegerField(default=0)
  • 10. Customer Load/Data Balance
      customer_id  id
      1            1
      2            2
      3            3
      4            4
  • 11. Customer Load/Data Balance: Split Customers
      Database A          Database B
      customer_id  id     customer_id  id
      3            3      1            1
      3            5      1            5
      4            4      2            2
      4            6      2            6
  • 12. Add DB and Balance Load: id collision
      Database A          Database B          Database C
      customer_id  id     customer_id  id     customer_id  id
      3            3      1            1      2            2
      3            5      1            5      2            6
                                              4            4
                                              4            6
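The collision can be reproduced with a toy model. The assumptions here are that each database keeps its own auto-increment counter, that databases created by cloning inherit the counter's current value, and that rows keep their `id` when a customer moves; the `Shard` class is hypothetical and only chosen to reproduce the numbers on the slides, not to describe Hearsay Social's tooling.

```python
import itertools
from collections import Counter


class Shard:
    """Toy database: a list of (customer_id, id) rows plus its own auto-increment counter."""

    def __init__(self, rows, next_id):
        self.rows = list(rows)
        self._ids = itertools.count(next_id)

    def insert(self, customer_id):
        """Insert a row for customer_id using this shard's local counter."""
        row = (customer_id, next(self._ids))
        self.rows.append(row)
        return row

    def take(self, customer_id):
        """Remove and return every row for a customer, ready to move elsewhere."""
        moved = [r for r in self.rows if r[0] == customer_id]
        self.rows = [r for r in self.rows if r[0] != customer_id]
        return moved


def duplicates(keys):
    """Return the keys that occur more than once, sorted."""
    return sorted(k for k, n in Counter(keys).items() if n > 1)


# Slide 10: one database with ids 1-4; the next auto-increment value is 5.
original = [(1, 1), (2, 2), (3, 3), (4, 4)]

# Slide 11: split by cloning, so both copies keep counting from 5.
shard_a = Shard([r for r in original if r[0] in (3, 4)], next_id=5)
shard_b = Shard([r for r in original if r[0] in (1, 2)], next_id=5)
for customer in (3, 4):
    shard_a.insert(customer)  # (3, 5), then (4, 6)
for customer in (1, 2):
    shard_b.insert(customer)  # (1, 5), then (2, 6)

# Slide 12: a new database takes customer 2 from B and customer 4 from A.
shard_c = Shard(shard_b.take(2) + shard_a.take(4), next_id=7)  # counter value is arbitrary here

print(sorted(shard_c.rows))
print(duplicates(row_id for _, row_id in shard_c.rows))  # surrogate id alone collides
print(duplicates(shard_c.rows))                           # (customer_id, id) stays unique
```

Only the bare `id` collides in the new database; scoping the key by customer keeps every row distinct, which is the motivation for multicolumn keys and joins.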
  • 13. Queries: Find all choices for a poll?
      Poll
      customer_id  id  question
      1            1   What's your seat pref.?
      1            2   Are you married?
      2            1   Gender?
      2            2   Did you have fun?

      Choice
      customer_id  poll_id  id  choice_text
      1            1        1   Window
      1            1        2   Aisle
      1            2        1   Yes
      1            2        2   No
      2            1        1   Male
      2            1        2   Female
      2            2        1   Yes?
  • 14. Queries: Find all choices for a poll?
      Attempt 1) Using the related set
        target_poll.choice_set.all()
        or
        Choice.objects.filter(poll=target_poll)

        SELECT * FROM choice WHERE poll_id = 1;

      (Poll and Choice data as in slide 13.)
  • 15. Queries: Find all choices for a poll?
      Attempt 2) Adding an F expression
        target_poll.choice_set.filter(customer=F('poll__customer'))
        or
        Choice.objects.filter(poll=target_poll, customer=F('poll__customer'))

        SELECT c.* FROM choice c
        INNER JOIN poll p ON c.poll_id = p.id
        WHERE c.poll_id = 1 AND c.customer_id = p.customer_id;

      (Poll and Choice data as in slide 13.)
  • 16. Queries: Find all choices for a poll?
      Attempt 3) Filter explicitly
        target_poll.choice_set.filter(customer=target_poll.customer)
        or
        Choice.objects.filter(poll=target_poll, customer=target_poll.customer)

        SELECT * FROM choice WHERE poll_id = 1 AND customer_id = 2;

      (Poll and Choice data as in slide 13.)
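The three attempts can be compared by running their SQL against the slide 13 sample data. This sketch uses an in-memory SQLite database as a stand-in for MySQL and keeps only the columns the slides show; the target poll is customer 2's poll 1, "Gender?".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE poll(customer_id INTEGER, id INTEGER, question TEXT,
                      PRIMARY KEY (customer_id, id));
    CREATE TABLE choice(customer_id INTEGER, poll_id INTEGER, id INTEGER, choice_text TEXT,
                        PRIMARY KEY (customer_id, poll_id, id));
""")
conn.executemany("INSERT INTO poll VALUES (?, ?, ?)", [
    (1, 1, "What's your seat pref.?"), (1, 2, "Are you married?"),
    (2, 1, "Gender?"), (2, 2, "Did you have fun?"),
])
conn.executemany("INSERT INTO choice VALUES (?, ?, ?, ?)", [
    (1, 1, 1, "Window"), (1, 1, 2, "Aisle"), (1, 2, 1, "Yes"), (1, 2, 2, "No"),
    (2, 1, 1, "Male"), (2, 1, 2, "Female"), (2, 2, 1, "Yes?"),
])


def texts(sql, params=()):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql, params)]


# Attempt 1: filter on poll_id only.
attempt1 = texts("SELECT choice_text FROM choice WHERE poll_id = 1 "
                 "ORDER BY customer_id, id")

# Attempt 2: join on poll.id and compare customers row by row (the F expression).
attempt2 = texts("SELECT c.choice_text FROM choice c "
                 "INNER JOIN poll p ON c.poll_id = p.id "
                 "WHERE c.poll_id = 1 AND c.customer_id = p.customer_id "
                 "ORDER BY c.customer_id, c.id")

# Attempt 3: filter explicitly on the target poll's customer.
attempt3 = texts("SELECT choice_text FROM choice WHERE poll_id = 1 AND customer_id = 2 "
                 "ORDER BY id")

print(attempt1)  # choices from both customers' poll 1
print(attempt2)  # each choice matches its own customer's poll 1, so still both customers
print(attempt3)  # only customer 2's "Gender?" choices
```

Attempt 2's comparison only checks that a choice agrees with *some* poll numbered 1, so it does not pin the target customer; only the explicit filter in attempt 3 returns just "Male" and "Female", at the cost of repeating the customer on every query.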
  • 17. Field Assignment
      quantity_inn = Customer.objects.create(id=15, name='Quantity Inn')
      quantity_poll = Poll.objects.create(id=1, customer=quantity_inn,
                                          question='What size bed do you prefer?')
      choice1 = Choice(id=1, choice_text='King', poll=quantity_poll)
      choice1.customer_id                        # ??????
      choice1.customer = quantity_poll.customer  # Repetitive
  • 18. What do we do?
  • 19. Solution via Django 1.6
      class ForeignObject(othermodel, from_fields, to_fields[, **options])
      where:
      from django.db.models import ForeignObject
  • 20. ForeignObject Usage
      from django.db import models
      from django.db.models import ForeignObject

      class ForeignModel(models.Model):
          id1 = models.IntegerField()
          id2 = models.IntegerField()

      class ReferencingModel(models.Model):
          om_id1 = models.IntegerField()
          om_id2 = models.IntegerField()
          om = ForeignObject(ForeignModel, from_fields=('om_id1', 'om_id2'),
                             to_fields=('id1', 'id2'))
  • 21. Conversion from ForeignKey to ForeignObject
      Before:
      class Choice(models.Model):
          customer = models.ForeignKey(Customer)
          poll = models.ForeignKey(Poll)
          choice_text = models.CharField(max_length=200)
          votes = models.IntegerField(default=0)

      After:
      class Choice(models.Model):
          customer = models.ForeignKey(Customer)
          poll_id = models.IntegerField()
          choice_text = models.CharField(max_length=200)
          votes = models.IntegerField(default=0)
          poll = models.ForeignObject(Poll, from_fields=('customer', 'poll_id'),
                                      to_fields=('customer', 'id'))
  • 22. Queries with ForeignObject
      Attempt 1) Using the related set
        target_poll.choice_set.all()

        SELECT * FROM choice WHERE poll_id = 1 AND customer_id = 2;

      (Poll and Choice data as in slide 13.)
  • 23. Queries with ForeignObject
      Attempt 2) Manually stated
        Choice.objects.filter(poll=target_poll)

        SELECT * FROM choice WHERE poll_id = 1 AND customer_id = 2;

      (Poll and Choice data as in slide 13.)
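One way to picture how `filter(poll=target_poll)` gains the customer condition is to pair each local column from `from_fields` with the value of the matching `to_fields` entry on the related object. The helpers below are hypothetical and work on plain Python objects; they are not Django's implementation, and they use column names (`customer_id`, `poll_id`) where the model declares field names.

```python
from types import SimpleNamespace


def related_conditions(from_columns, to_fields, related):
    """Pair each local column with the related object's value for the matching field.

    Hypothetical helper: from_columns and to_fields are parallel tuples, as in
    ForeignObject(from_fields=..., to_fields=...).
    """
    if len(from_columns) != len(to_fields):
        raise ValueError("from_columns and to_fields must have the same length")
    return {col: getattr(related, field) for col, field in zip(from_columns, to_fields)}


def where_clause(conditions):
    """Render a parameterized WHERE fragment (qmark style) and its parameters."""
    sql = " AND ".join(f"{col} = ?" for col in conditions)
    return sql, list(conditions.values())


# Customer 2's poll 1 ("Gender?"), standing in for target_poll.
target_poll = SimpleNamespace(customer_id=2, id=1)
conditions = related_conditions(("customer_id", "poll_id"), ("customer_id", "id"), target_poll)
sql, params = where_clause(conditions)
print("SELECT * FROM choice WHERE " + sql, params)
```

Because both columns come from the same relation definition, the customer filter no longer has to be repeated by hand at each call site.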
  • 24. Queries with ForeignObject
      Attempt 3) Manually stated with a tuple
        Choice.objects.filter(poll=(2, 1))

        SELECT * FROM choice WHERE poll_id = 1 AND customer_id = 2;

      (Poll and Choice data as in slide 13.)
  • 25. Field Assignment with ForeignObject
      quantity_inn = Customer.objects.create(id=15, name='Quantity Inn')
      quantity_poll = Poll.objects.create(id=1, customer=quantity_inn,
                                          question='What size bed do you prefer?')
      choice1 = Choice(id=1, choice_text='King', poll=quantity_poll)
      choice1.customer_id                        # >> 15
      choice1.customer = quantity_poll.customer  # Not needed
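The reason `choice1.customer_id` is already 15 is that assigning the related object fills in every local column of the relation at once. The sketch below models that with a plain Python descriptor; `CompositeForeignKey` and the stripped-down `Poll` and `Choice` classes are hypothetical, and the sketch uses `customer_id` where the slide's field list says `customer`.

```python
class CompositeForeignKey:
    """Toy descriptor: assigning a related object copies its to_fields values
    onto the instance's from_fields attributes, in order (not Django's code)."""

    def __init__(self, from_fields, to_fields):
        if len(from_fields) != len(to_fields):
            raise ValueError("from_fields and to_fields must have the same length")
        self.from_fields = from_fields
        self.to_fields = to_fields

    def __set_name__(self, owner, name):
        # Cache the related object under a private attribute named after the field.
        self.cache_name = "_" + name + "_cache"

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return getattr(instance, self.cache_name, None)

    def __set__(self, instance, related):
        for local, remote in zip(self.from_fields, self.to_fields):
            setattr(instance, local, getattr(related, remote))
        setattr(instance, self.cache_name, related)


class Poll:
    def __init__(self, id, customer_id, question):
        self.id = id
        self.customer_id = customer_id
        self.question = question


class Choice:
    # Mirrors from_fields=('customer', 'poll_id'), to_fields=('customer', 'id'),
    # using plain *_id attributes instead of Django fields.
    poll = CompositeForeignKey(("customer_id", "poll_id"), ("customer_id", "id"))

    def __init__(self, id, choice_text, poll):
        self.id = id
        self.choice_text = choice_text
        self.poll = poll  # routed through CompositeForeignKey.__set__


quantity_poll = Poll(id=1, customer_id=15, question="What size bed do you prefer?")
choice1 = Choice(id=1, choice_text="King", poll=quantity_poll)
print(choice1.customer_id, choice1.poll_id)  # 15 1
```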
  • 26. “With great power comes great responsibility”
  • 27. Tuple ordering matters
      poll = models.ForeignObject(Poll, from_fields=('customer', 'poll_id'),
                                  to_fields=('customer', 'id'))

      Choice.objects.filter(poll=(1, 2))
      SELECT * FROM choice WHERE poll_id = 2 AND customer_id = 1;

      (Poll and Choice data as in slide 13.)
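The tuple is matched position by position against `from_fields`, so swapping its elements silently selects a different poll instead of raising an error. This SQLite sketch (a stand-in for MySQL, with the slide 13 choices and a hypothetical `choices_for` helper) shows both orderings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE choice(customer_id INTEGER, poll_id INTEGER, id INTEGER, choice_text TEXT)")
conn.executemany("INSERT INTO choice VALUES (?, ?, ?, ?)", [
    (1, 1, 1, "Window"), (1, 1, 2, "Aisle"), (1, 2, 1, "Yes"), (1, 2, 2, "No"),
    (2, 1, 1, "Male"), (2, 1, 2, "Female"), (2, 2, 1, "Yes?"),
])

# Column order taken from from_fields=('customer', 'poll_id').
FROM_COLUMNS = ("customer_id", "poll_id")


def choices_for(key):
    """Return choice texts whose (customer_id, poll_id) equals key, position by position."""
    if len(key) != len(FROM_COLUMNS):
        raise ValueError("key must have one value per from_fields column")
    where = " AND ".join(f"{col} = ?" for col in FROM_COLUMNS)
    sql = f"SELECT choice_text FROM choice WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql, tuple(key))]


print(choices_for((2, 1)))  # customer 2, poll 1: "Gender?"
print(choices_for((1, 2)))  # customer 1, poll 2: "Are you married?"
```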
  • 28. IN Operator
      poll = models.ForeignObject(Poll, from_fields=('customer', 'poll_id'),
                                  to_fields=('customer', 'id'))

      Choice.objects.filter(poll__in=[(2, 1), (2, 2)])
      SELECT * FROM choice
      WHERE (poll_id = 1 AND customer_id = 2) OR (poll_id = 2 AND customer_id = 2);

      (Poll and Choice data as in slide 13.)
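Without row-value syntax, a composite IN has to be expanded into one AND group per key, joined with OR. The `composite_in` helper below is a hypothetical sketch of that expansion, not Django's SQL compiler, and SQLite stands in for MySQL.

```python
import sqlite3

FROM_COLUMNS = ("customer_id", "poll_id")  # from_fields order


def composite_in(columns, keys):
    """Expand a composite IN lookup into OR-ed AND groups (qmark placeholders)."""
    if not keys:
        return "0", []  # an empty IN list matches nothing
    group = "(" + " AND ".join(f"{col} = ?" for col in columns) + ")"
    sql = " OR ".join([group] * len(keys))
    params = [value for key in keys for value in key]
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE choice(customer_id INTEGER, poll_id INTEGER, id INTEGER, choice_text TEXT)")
conn.executemany("INSERT INTO choice VALUES (?, ?, ?, ?)", [
    (1, 1, 1, "Window"), (1, 1, 2, "Aisle"), (1, 2, 1, "Yes"), (1, 2, 2, "No"),
    (2, 1, 1, "Male"), (2, 1, 2, "Female"), (2, 2, 1, "Yes?"),
])

where, params = composite_in(FROM_COLUMNS, [(2, 1), (2, 2)])
rows = [r[0] for r in conn.execute(
    f"SELECT choice_text FROM choice WHERE {where} ORDER BY poll_id, id", params)]
print(where)
print(rows)
```

The generated predicate grows linearly with the number of keys, which is one reason a native row-value IN is attractive where the database supports it.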
  • 29. IN Operator with a queryset
      poll = models.ForeignObject(Poll, from_fields=('customer', 'poll_id'),
                                  to_fields=('customer', 'id'))

      Choice.objects.filter(poll__in=Poll.objects.filter(customer_id=2))
      SELECT c.* FROM choice c
      WHERE EXISTS (SELECT p.customer_id, p.id FROM poll p
                    WHERE p.customer_id = 2
                      AND p.customer_id = c.customer_id
                      AND p.id = c.poll_id);

      (Poll and Choice data as in slide 13.)
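The correlated EXISTS form can be checked against the slide 13 data. This sketch runs the slide's query in SQLite (standing in for MySQL) with the customer id as a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE poll(customer_id INTEGER, id INTEGER, question TEXT);
    CREATE TABLE choice(customer_id INTEGER, poll_id INTEGER, id INTEGER, choice_text TEXT);
    INSERT INTO poll VALUES
        (1, 1, 'What''s your seat pref.?'), (1, 2, 'Are you married?'),
        (2, 1, 'Gender?'), (2, 2, 'Did you have fun?');
    INSERT INTO choice VALUES
        (1, 1, 1, 'Window'), (1, 1, 2, 'Aisle'), (1, 2, 1, 'Yes'), (1, 2, 2, 'No'),
        (2, 1, 1, 'Male'), (2, 1, 2, 'Female'), (2, 2, 1, 'Yes?');
""")

# The correlated subquery ties both key columns of choice to a matching poll row.
exists_sql = """
    SELECT c.choice_text FROM choice c
    WHERE EXISTS (SELECT p.customer_id, p.id FROM poll p
                  WHERE p.customer_id = ?
                    AND p.customer_id = c.customer_id
                    AND p.id = c.poll_id)
    ORDER BY c.poll_id, c.id
"""
rows = [r[0] for r in conn.execute(exists_sql, (2,))]
print(rows)
```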
  • 30. IN Operator with MySQL
      poll = models.ForeignObject(Poll, from_fields=('customer', 'poll_id'),
                                  to_fields=('customer', 'id'))

      Choice.objects.filter(poll__in=[(2, 1), (2, 2)])
      SELECT * FROM choice WHERE (poll_id, customer_id) IN ((1, 2), (2, 2));

      (Poll and Choice data as in slide 13.)
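MySQL accepts row constructors on both sides of IN, so the same lookup can be written as a single predicate. The hypothetical `row_value_in` helper below only builds the SQL text and parameter list; `%s` is assumed because it is the placeholder style of MySQLdb-style drivers.

```python
def row_value_in(columns, keys, placeholder="%s"):
    """Build a MySQL-style row-value IN predicate and its flat parameter list.

    Values in each key must follow the column order given.
    """
    if not keys:
        return "0", []  # an empty IN list matches nothing
    width = len(columns)
    if any(len(key) != width for key in keys):
        raise ValueError("every key needs one value per column")
    row = "(" + ", ".join([placeholder] * width) + ")"
    sql = "({}) IN ({})".format(", ".join(columns), ", ".join([row] * len(keys)))
    params = [value for key in keys for value in key]
    return sql, params


# The slide lists poll_id first, so each key here is (poll_id, customer_id).
sql, params = row_value_in(("poll_id", "customer_id"), [(1, 2), (2, 2)])
print("SELECT * FROM choice WHERE " + sql, params)
```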
  • 31. IN Operator with a queryset & MySQL
      poll = models.ForeignObject(Poll, from_fields=('customer', 'poll_id'),
                                  to_fields=('customer', 'id'))

      Choice.objects.filter(poll__in=Poll.objects.filter(customer_id=2))
      SELECT c.* FROM choice c
      WHERE (c.customer_id, c.poll_id) IN
            (SELECT p.customer_id, p.id FROM poll p WHERE p.customer_id = 2);

      (Poll and Choice data as in slide 13.)
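The row-value subquery and the EXISTS form should select the same choices. SQLite has supported row values since version 3.15, so this sketch (SQLite standing in for MySQL, slide 13 data) runs both queries and compares them; the version check is an assumption about the local SQLite build.

```python
import sqlite3

# Row values need SQLite 3.15 or newer.
assert sqlite3.sqlite_version_info >= (3, 15, 0), sqlite3.sqlite_version

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE poll(customer_id INTEGER, id INTEGER, question TEXT);
    CREATE TABLE choice(customer_id INTEGER, poll_id INTEGER, id INTEGER, choice_text TEXT);
    INSERT INTO poll VALUES
        (1, 1, 'What''s your seat pref.?'), (1, 2, 'Are you married?'),
        (2, 1, 'Gender?'), (2, 2, 'Did you have fun?');
    INSERT INTO choice VALUES
        (1, 1, 1, 'Window'), (1, 1, 2, 'Aisle'), (1, 2, 1, 'Yes'), (1, 2, 2, 'No'),
        (2, 1, 1, 'Male'), (2, 1, 2, 'Female'), (2, 2, 1, 'Yes?');
""")

row_in_sql = """
    SELECT c.choice_text FROM choice c
    WHERE (c.customer_id, c.poll_id) IN
          (SELECT p.customer_id, p.id FROM poll p WHERE p.customer_id = ?)
    ORDER BY c.poll_id, c.id
"""
exists_sql = """
    SELECT c.choice_text FROM choice c
    WHERE EXISTS (SELECT p.customer_id, p.id FROM poll p
                  WHERE p.customer_id = ?
                    AND p.customer_id = c.customer_id
                    AND p.id = c.poll_id)
    ORDER BY c.poll_id, c.id
"""

row_in = [r[0] for r in conn.execute(row_in_sql, (2,))]
exists = [r[0] for r in conn.execute(exists_sql, (2,))]
print(row_in)
print(row_in == exists)
```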
  • 32. ForeignKey vs ForeignObject
      What's the difference? ForeignKey is a ForeignObject.
      Pseudo-definition:
      ForeignObject(OtherModel, from_fields=('self',), to_fields=(OtherModel._meta.pk.name,))
  • 33. ForeignKey usage: Order By Example
      Poll.objects.order_by('customer')

      class Customer(models.Model):
          name = models.CharField(max_length=100)

          class Meta:
              ordering = ('name',)

      class Poll(models.Model):
          customer = models.ForeignKey(Customer)
          question = models.CharField(max_length=200)
          pub_date = models.DateTimeField('date published')
  • 34. ForeignKey usage: Order By Example
      Poll.objects.order_by('customer')
      SELECT p.* FROM poll p
      INNER JOIN customer c ON p.customer_id = c.id
      ORDER BY c.name ASC;

      (Customer and Poll models as in slide 33.)
  • 35. ForeignKey usage: Order By Example
      Poll.objects.order_by('customer_id')  # alias for customer
      SELECT p.* FROM poll p
      INNER JOIN customer c ON p.customer_id = c.id
      ORDER BY c.name ASC;

      (Customer and Poll models as in slide 33.)
  • 36. ForeignKey usage: Order By Example
      Poll.objects.order_by('customer__id')
      SELECT p.* FROM poll p
      INNER JOIN customer c ON p.customer_id = c.id
      ORDER BY p.customer_id ASC;

      (Customer and Poll models as in slide 33.)
  • 37. ForeignKey usage: Order By Example
      Poll.objects.order_by('customer_id')
      SELECT * FROM poll ORDER BY customer_id ASC;

      class Customer(models.Model):
          name = models.CharField(max_length=100)

          class Meta:
              ordering = ('name',)

      class Poll(models.Model):
          customer_id = models.IntegerField()
          question = models.CharField(max_length=200)
          pub_date = models.DateTimeField('date published')
          customer = models.ForeignObject(Customer, from_fields=('customer_id',),
                                          to_fields=('id',))
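The practical difference between the two SQL statements shows up as soon as customer names are not in id order. This SQLite sketch runs both; the customer "Acme Hotels" and the two sample polls are hypothetical data chosen so that the orderings disagree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer(id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE poll(id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL, question TEXT);
    -- Hypothetical data: customer names deliberately not in id order.
    INSERT INTO customer VALUES (1, 'Quantity Inn'), (2, 'Acme Hotels');
    INSERT INTO poll VALUES (1, 1, 'What size bed do you prefer?'),
                            (2, 2, 'Did you have fun?');
""")

# order_by('customer') on a ForeignKey: join and sort by Customer.Meta.ordering ('name').
by_related_ordering = [r[0] for r in conn.execute(
    "SELECT p.question FROM poll p INNER JOIN customer c ON p.customer_id = c.id "
    "ORDER BY c.name ASC")]

# order_by('customer_id') on a plain IntegerField: sort on the local column, no join.
by_local_column = [r[0] for r in conn.execute(
    "SELECT question FROM poll ORDER BY customer_id ASC")]

print(by_related_ordering)
print(by_local_column)
```

With a ForeignKey, `customer_id` resolves to the relation and inherits the related model's ordering; once it is a plain column, it sorts numerically and skips the join.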
  • 38. Still more fun stuff
      • ForeignObject.get_extra_description_filter
      • ForeignObject.get_extra_restriction
      • More to come
  • 39. Dig for more information:
      • ForeignObject source
        • django/db/models/fields/related.py
      • V1 version of the patch (based on Django 1.4)
        • https://github.com/jtillman/django/tree/MultiColumnJoin
      • Blog post to come
        • Hearsay Social Blog (http://engineering.hearsaysocial.com/)
  • 40. Questions?