PyCon KR 2018 Effective Tips for Django ORM in Practice

In the following slides, I share solutions that have worked for us with the Django ORM. They cover not only Django ORM issues but also fixes that rely on other techniques.

Published in: Software

  1. 1. Effective Tips for Django ORM in Practice, by 한섬기 (SeomGi, Han)
  2. 2. Based on Django 1.11.14
  3. 3. I learned these tips from my best team members.
  4. 4. Models & QuerySet
  5. 5. Case 1) Implementing an idempotent function: django.db.models.QuerySet.update_or_create()*
      * https://docs.djangoproject.com/en/1.11/ref/models/querysets/#update-or-create
  6. 6. A P2P company settles the deposited amount every day.
      [Diagram labels: Borrower, Repayment, P2P Company, Settlement, Investors]
  7. 7. And a human can always make a mistake.
      [Diagram labels: Borrower, Repayment, P2P Company, Investors, Settlement: 2 times!?]
  8. 8. We prevented this problem with the idempotent update_or_create() function.
        def settle(loan_id, sequence, amount):
            settlement, created = Settlement.objects.update_or_create(
                loan_id=loan_id,
                sequence=sequence,
                defaults={'amount': amount}
            )
            if created:
                return 'Settlement completed!'
            return 'You have already settled this!'
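The behaviour of settle() can be sketched without a database. In this minimal illustration a plain dict stands in for the Settlement table, and the local update_or_create() helper is a hypothetical stand-in that only mimics the (object, created) return value of the real QuerySet method. It is an assumption-laden sketch, not Django's implementation.

```python
# Stand-in for the Settlement table: (loan_id, sequence) acts as the unique key.
settlements = {}


def update_or_create(loan_id, sequence, defaults):
    """Hypothetical helper mimicking the (object, created) result of update_or_create()."""
    key = (loan_id, sequence)
    created = key not in settlements
    # An existing row is updated in place; a missing row is created.
    settlements.setdefault(key, {}).update(defaults)
    return settlements[key], created


def settle(loan_id, sequence, amount):
    _, created = update_or_create(loan_id, sequence, {'amount': amount})
    if created:
        return 'Settlement completed!'
    return 'You have already settled this!'


first = settle(loan_id=1, sequence=3, amount=100100)
second = settle(loan_id=1, sequence=3, amount=100100)  # accidental repeat
print(first)             # Settlement completed!
print(second)            # You have already settled this!
print(len(settlements))  # 1
```

Calling settle() twice with the same loan_id and sequence leaves exactly one settlement record, which is the property the slide relies on.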
  9. 9. Case 2) Overriding predefined model methods: django.db.models.Model.save()
      https://docs.djangoproject.com/en/1.11/ref/models/instances/#django.db.models.Model.save
  10. 10. We needed to show the number of investors per loan,
  11. 11. but running 'COUNT(*)' for every loan adds overhead.
  14. 14. We solved this problem with denormalization and the Model.save() method.
        class Investment(models.Model):
            ...
            def save(self, *args, **kwargs):
                # Note: this runs on every save(), not only when the row is first created.
                self.loan.investor_count += 1
                self.loan.save()
                super().save(*args, **kwargs)
            ...
  15. 15. Note 1: You can specify which fields to save with 'update_fields'.
      Note 2: If you use 'update_fields', a field with 'auto_now' is not updated unless you list it there as well.
      https://code.djangoproject.com/ticket/22981
      https://docs.djangoproject.com/en/1.11/ref/models/fields/#django.db.models.DateField.auto_now
        class Investment(models.Model):
            ...
            def save(self, *args, **kwargs):
                self.loan.investor_count += 1
                self.loan.save(
                    update_fields=['investor_count', 'updated_at'])
                super().save(*args, **kwargs)
            ...
  16. 16. Case 3) Enumerations: model_utils.Choices*
      * https://django-model-utils.readthedocs.io/en/latest/utilities.html#choices
  17. 17. Enum* has been available since Python 3.4, but we can't use it directly for field choices in Django 1.11.
      In Django, an enum value is just a string.
      * https://docs.python.org/3/library/enum.html
        GRADES = (
            ('A1', 'A1'),
            ('A2', 'A2'),
            ('A3', 'A3'),
            ...
        )
  18. 18. Then we found the 'django-model-utils'* package.
      * https://github.com/jazzband/django-model-utils/
        from model_utils import Choices
        class Loan(models.Model):
            GRADES = Choices(
                'A1', 'A2', 'A3', ...
            )
            grade = models.CharField(choices=GRADES)
            ...
        ...
        def use_choices():
            loans = Loan.objects.filter(
                grade=Loan.GRADES.A1
            )
            …
  19. 19. Note: You can use integers (or other types) as enum values.
      Then integer values are stored in the database.
        from model_utils import Choices
        class Loan(models.Model):
            GRADES = Choices(
                (1, 'CODE1', 1),
                (2, 'CODE2', 2),
                ...
            )
            # An integer column is needed for integer values to be stored as integers.
            grade = models.IntegerField(choices=GRADES)
            …
        ...
        def use_choices():
            loans = Loan.objects.filter(
                grade=Loan.GRADES.CODE1
            )
            …
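For comparison, here is a minimal stdlib-only sketch of how a Python Enum could be turned into the tuple-of-pairs shape that the 'choices' argument expects. The Grade class and its choices() helper are hypothetical names introduced for illustration; this is an alternative idea, not what the slides use and not part of django-model-utils.

```python
import enum


class Grade(enum.Enum):
    A1 = 'A1'
    A2 = 'A2'
    A3 = 'A3'

    @classmethod
    def choices(cls):
        # (stored value, human-readable label) pairs, in definition order.
        return tuple((member.value, member.name) for member in cls)


GRADES = Grade.choices()
print(GRADES)  # (('A1', 'A1'), ('A2', 'A2'), ('A3', 'A3'))
```

With this approach a filter would have to pass the raw value, for example Grade.A1.value, because the database column still stores a plain string.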
  20. 20. Case 4) Abstract models: model_utils.models.TimeStampedModel*, django_extensions.db.models.TimeStampedModel**
      * https://django-model-utils.readthedocs.io/en/latest/models.html#timestampedmodel
      ** https://django-extensions.readthedocs.io/en/latest/model_extensions.html?highlight=TimeStamped
  23. 23. Our (internal) customers wanted to know the exact time of some actions.
      Unfortunately, they asked about them a year later!
      So we always add created_at and updated_at fields to all models.
  24. 24. And we thought that abstraction would make it better.
        class TimeStampedModel(models.Model):
            created_datetime = models.DateTimeField(auto_now_add=True)
            updated_datetime = models.DateTimeField(auto_now=True)
            class Meta:
                abstract = True
        class Loan(TimeStampedModel):
            amount = models.FloatField()
            interest = models.FloatField()
            ...
  25. 25. And we applied this further.
        class Loan(TimeStampedModel):
            amount = models.FloatField()
            interest = models.FloatField()
            class Meta:
                abstract = True
        class SecuredLoan(Loan):
            security = models.ForeignKey(Security)
            ...
        class UnsecuredLoan(Loan):
            credit_grade = models.IntegerField()
            ...
  26. 26. Case 5) More Abstract Model to Excel
  28. 28. Our (internal) customers sometimes wanted raw data.
      We found this annoying.
  29. 29. And we solved this problem with abstraction again.
      First of all, we added 'verbose_name' to all models.
        class Loan(TimeStampedModel):
            amount = models.FloatField(verbose_name='Amount')
            interest = models.FloatField(verbose_name='Interest')
            class Meta:
                abstract = True
        class UnsecuredLoan(Loan):
            credit_grade = models.IntegerField(verbose_name='Credit Grade')
            ...
  30. 30. And we wrote a function that finds a field name by its 'verbose_name'.
        def get_field_name_by_verbose_name(verbose_name, model):
            for field in model._meta.fields:
                if field.verbose_name == verbose_name:
                    return field.attname
            return None
  31. 31. Then we wrote a small program to get the raw data.
        from functools import partial
        import xlsxwriter
        verbose_names = ('Amount', 'Interest', 'Credit Grade')
        # Bind `model` by keyword so that map() supplies each verbose_name.
        field_names = tuple(map(
            partial(get_field_name_by_verbose_name, model=model),
            verbose_names))
        loans = model.objects.filter(field=condition)
        with xlsxwriter.Workbook('/tmp/temp.xlsx') as workbook:
            worksheet = workbook.add_worksheet()
            # One worksheet row per loan, one column per requested field.
            for row_num, loan in enumerate(loans):
                row = tuple(getattr(loan, field_name) for field_name in field_names)
                worksheet.write_row(row_num, 0, row)
  32. 32. Whenever someone wanted data, we just listed the relevant 'verbose_name' values.
  33. 33. Managers
  34. 34. Case 6) Predefined filters: django.db.models.manager.Manager*
      * https://docs.djangoproject.com/en/1.11/topics/db/managers/#custom-managers
  39. 39. Our (internal) customers asked questions like the ones below, and we found common ground.
      Questions:
        - Could you calculate the interest amount of a loan for this month?
        - Starting from today, how much can we earn from that loan?
        - How much principal remains for that loan?
        - …
      Common ground:
        - Calculation based on remaining principal
        - Summation based on remaining interest
        - Summation based on remaining principal
  41. 41. So we defined some filters
        .filter(
            loan=loan,
            status__in=REPAYMENT_STATUS.COMPLETED
        )
        .filter(
            loan=loan
        ).exclude(
            status__in=REPAYMENT_STATUS.COMPLETED
        )
  42. 42. So we defined some filters
      and moved them into a custom manager of a model
        class RepaymentManager(models.Manager):
            def completed(self, loan):
                return self.filter(
                    loan=loan,
                    status__in=REPAYMENT_STATUS.COMPLETED
                )
            def not_completed(self, loan):
                return self.filter(
                    loan=loan
                ).exclude(
                    status__in=REPAYMENT_STATUS.COMPLETED
                )
        class Repayment(models.Model):
            objects = RepaymentManager()
            ...
  43. 43. So we defined some filters
      and moved them into a custom manager of a model
      and used it everywhere.
        ...
        remaining_principal = Repayment.objects.not_completed(
            loan=loan
        ).aggregate(
            remaining_principal=Coalesce(Sum('principal'), 0)
        )['remaining_principal']
        ...
  44. 44. Aggregation & Annotation
  45. 45. Case 7) Group By
  46. 46. We wanted to show an investor's summary grouped by loan status.
  47. 47. So we filtered with some conditions first.
        schedules = Schedule.objects.filter(
            user_id=user_id,
            planned_date__gte=start_date,
            planned_date__lt=end_date
        )
  48. 48. And we made groups with the 'values' statement.
      If you use 'values' before 'annotate', it works as a 'group by' statement.
        schedules = schedules.values('status').annotate(
            cnt=Count('loan_id', distinct=True),
            sum_principal=AbsoluteSum('principal'),
            sum_interest=Sum('interest'),
            sum_commission=Sum('commission'),
            sum_tax=Sum('tax')
        )
  49. 49. Finally, we got aggregated values.
        [
            {'cnt': 5, 'sum_principal': 300000, 'sum_interest': 1234, ...},
            {'cnt': 3, 'sum_principal': 200000, 'sum_interest': 123, ...},
            ...
        ]
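To make the grouping concrete, the sketch below runs a comparable GROUP BY query directly against an in-memory SQLite database. The table layout and sample rows are assumptions made up for illustration; the SQL is only roughly what a values('status').annotate(...) call asks the database to do, not the exact statement Django generates.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE schedule ('
    'loan_id INTEGER, status TEXT, principal INTEGER, interest INTEGER)'
)
conn.executemany(
    'INSERT INTO schedule VALUES (?, ?, ?, ?)',
    [
        (1, 'PLANNED', 100000, 500),
        (1, 'PLANNED', 100000, 400),
        (2, 'PLANNED', 50000, 300),
        (3, 'COMPLETED', 70000, 200),
    ],
)

# One output row per status: distinct loans counted, amounts summed.
rows = conn.execute(
    'SELECT status, COUNT(DISTINCT loan_id) AS cnt, '
    'SUM(principal) AS sum_principal, SUM(interest) AS sum_interest '
    'FROM schedule GROUP BY status ORDER BY status'
).fetchall()
print(rows)
# [('COMPLETED', 1, 70000, 200), ('PLANNED', 2, 250000, 1200)]
```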
  50. 50. Case 8) Conditional aggregation
      https://docs.djangoproject.com/en/1.11/ref/models/conditional-expressions/#conditional-aggregation
      https://docs.djangoproject.com/en/1.11/ref/models/database-functions/#coalesce
  51. 51. But our (internal) customers wanted more summarized data, for example PLANNED + IN SETTLING shown as one category.
  52. 52. So we made abstracted categories.
        custom_status_annotation = Case(
            When(status__in=(PLANNED, SETTLING), then=Value(PLANNED)),
            When(status__in=(DELAYED, OVERDUE,), then=Value(DELAYED)),
            When(status__in=(LONG_OVERDUE,), then=Value(LONG_OVERDUE)),
            When(status__in=(SOLD,), then=Value(SOLD)),
            default=Value(COMPLETED),
            output_field=CharField(),
        )
  53. 53. schedules_by_status = schedules.annotate(
            custom_status=custom_status_annotation
        ).values(
            'custom_status'
        ).annotate(
            cnt=Count('loan_id', distinct=True),
            sum_principal=Coalesce(AbsoluteSum('principal'), 0),
            sum_interest=Coalesce(Sum('interest'), 0),
            sum_commission=Coalesce(Sum('commission'), 0),
            sum_tax=Coalesce(Sum('tax'), 0)
        ).values(
            'custom_status',
            'cnt',
            'sum_principal',
            'sum_interest',
            'sum_commission',
            'sum_tax',
        )
  54. 54. That was not the end. They wanted sorted results.
      So we used a trick.
        custom_status_annotation = Case(
            When(status__in=(PLANNED, SETTLING), then=Value('02_PLANNED')),
            When(status__in=(DELAYED, OVERDUE,), then=Value('03_DELAYED')),
            When(status__in=(LONG_OVERDUE,), then=Value('04_LONG_OVERDUE')),
            When(status__in=(SOLD,), then=Value('05_SOLD')),
            default=Value('01_COMPLETED'),
            output_field=CharField(),
        )
  55. 55. schedules_by_status = schedules.annotate(
            custom_status=custom_status_annotation
        ).values(
            'custom_status'
        ).order_by(
            'custom_status'
        ).annotate(
            cnt=Count('loan_id', distinct=True),
            sum_principal=Coalesce(AbsoluteSum('principal'), 0),
            sum_interest=Coalesce(Sum('interest'), 0),
            sum_commission=Coalesce(Sum('commission'), 0),
            sum_tax=Coalesce(Sum('tax'), 0)
        ).values(
            'custom_status',
            'cnt',
            'sum_principal',
            'sum_interest',
            'sum_commission',
            'sum_tax',
        )
  56. 56. As a result, we got summarized and sorted results.
        01_COMPLETED
        02_PLANNED
        03_DELAYED
        04_LONG_OVERDUE
        05_SOLD
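The Case/When annotation with sortable prefixes corresponds to a SQL CASE expression used as the grouping key. The following sketch shows that idea against an in-memory SQLite table. The schema and rows are assumptions for illustration, and only three of the five categories are included to keep it short.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE schedule (loan_id INTEGER, status TEXT, principal INTEGER)'
)
conn.executemany(
    'INSERT INTO schedule VALUES (?, ?, ?)',
    [
        (1, 'PLANNED', 100000),
        (2, 'SETTLING', 50000),
        (3, 'DELAYED', 30000),
        (4, 'OVERDUE', 20000),
        (5, 'COMPLETED', 70000),
    ],
)

# The numeric prefix makes a plain alphabetical sort follow the desired order.
rows = conn.execute(
    """
    SELECT CASE
             WHEN status IN ('PLANNED', 'SETTLING') THEN '02_PLANNED'
             WHEN status IN ('DELAYED', 'OVERDUE') THEN '03_DELAYED'
             ELSE '01_COMPLETED'
           END AS custom_status,
           COUNT(DISTINCT loan_id) AS cnt,
           COALESCE(SUM(principal), 0) AS sum_principal
    FROM schedule
    GROUP BY custom_status
    ORDER BY custom_status
    """
).fetchall()
print(rows)
# [('01_COMPLETED', 1, 70000), ('02_PLANNED', 2, 150000), ('03_DELAYED', 2, 50000)]
```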
  57. 57. Case 9) Custom functions: AbsoluteSum*
      * https://gist.github.com/iandmyhand/b2c32311715113e8c470932a053a6732
  58. 58. We stored transaction values like below.
        Category    Value
        Deposit     ₩100000
        Investment  -₩100000
        Settlement  ₩100100
        Withdraw    -₩50100
  59. 59. If we wanted to know the balance of a user, we needed to sum all values.
        Category    Value
        Deposit     ₩100000
        Investment  -₩100000
        Settlement  ₩100100
        Withdraw    -₩50100
        Balance     ₩50000
  60. 60. But our (internal) customers wanted to know the total transaction amount.
        Category                  Value
        Deposit                   ₩100,000
        Investment                -₩100,000
        Settlement                ₩100,100
        Withdraw                  -₩50,100
        Total Transaction Amount  ₩350,200
  61. 61. So we created a custom ORM function.
        from django.db.models import IntegerField, Sum
        class AbsoluteSum(Sum):
            name = 'AbsoluteSum'
            template = '%(function)s(%(absolute)s(%(expressions)s))'
            def __init__(self, expression, **extra):
                super(AbsoluteSum, self).__init__(
                    expression, absolute='ABS ', output_field=IntegerField(), **extra)
            def __repr__(self):
                # str.format() needs '{}' placeholders, not '%s'.
                return "SUM(ABS({}))".format(
                    self.arg_joiner.join(str(arg) for arg in self.source_expressions)
                )
  62. 62. And used it.
        # aggregate() returns a dict, so the results can be read by key.
        result = Statement.objects.aggregate(
            absolute_sum=AbsoluteSum('amount'),
            normal_sum=Sum('amount')
        )
        …
        # SQL shown on the slide:
        # SELECT (SUM(ABS(`test`.`amount`))) AS `absolute_sum`, (SUM(`test`.`amount`)) AS `normal_sum` FROM `statement`
        print(result['absolute_sum'])  # 350200
        print(result['normal_sum'])  # 50000
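The two totals can be checked with the sample transactions from the slides. This sketch runs the equivalent SUM(ABS(...)) and SUM(...) expressions in an in-memory SQLite database; the table definition is an assumption made for illustration, while the four amounts come from the table above.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE statement (category TEXT, amount INTEGER)')
conn.executemany(
    'INSERT INTO statement VALUES (?, ?)',
    [
        ('Deposit', 100000),
        ('Investment', -100000),
        ('Settlement', 100100),
        ('Withdraw', -50100),
    ],
)

# SUM(ABS(x)) adds magnitudes; SUM(x) lets deposits and withdrawals cancel out.
absolute_sum, normal_sum = conn.execute(
    'SELECT SUM(ABS(amount)), SUM(amount) FROM statement'
).fetchone()
print(absolute_sum)  # 350200
print(normal_sum)    # 50000
```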
  63. 63. Transactions
  64. 64. Case 10) Locks: QuerySet.select_for_update*
      * https://docs.djangoproject.com/en/1.11/ref/models/querysets/#select-for-update
  65. 65. Every investor can invest in a loan simultaneously,
      [Diagram labels: Investors, Invest, Borrower; amounts 100,000, 50,000, 30,000]
  66. 66. Every investor can invest in a loan simultaneously,
      but the sum of the investment amounts must match the loan amount.
      [Diagram labels: Investors, Invest, Borrower; amounts 100,000, 50,000, 30,000; 150,000]
  67. 67. So we used a transaction with a lock.
        @transaction.atomic
        def invest(loan_id, user_id, amount):
            loan = Loan.objects.select_for_update().get(pk=loan_id)
            balance = Balance.objects.select_for_update().get(user_id=user_id)
            ...
  68. 68. How can we check programmatically that the transaction and the lock work well?
        @transaction.atomic
        def invest(loan_id, user_id, amount):
            loan = Loan.objects.select_for_update().get(pk=loan_id)
            balance = Balance.objects.select_for_update().get(user_id=user_id)
            ...
  69. 69. We do not know a perfect way.
      So we tested it by eye.
        @transaction.atomic
        def invest(loan_id, user_id, amount):
            loan = Loan.objects.select_for_update().get(pk=loan_id)
            balance = Balance.objects.select_for_update().get(user_id=user_id)
            time.sleep(60)
            …
  70. 70. We do not know a perfect way.
      So we tested it by eye, sending requests simultaneously.
  71. 71. Yes, I know this is not a good way.
      So if you have a nicer way, please share it.
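One possible direction, offered only as a sketch, is to fire the requests from threads and assert an invariant afterwards instead of watching by eye. In the example below a threading.Lock stands in for the row lock taken by select_for_update(), and a dict stands in for the loan row, so it exercises the pattern rather than a real database. All names are hypothetical. Against a real database, each thread would need its own connection, and in Django that usually means TransactionTestCase rather than TestCase.

```python
import threading

LOAN_AMOUNT = 150000
state = {'invested': 0, 'accepted': 0}  # stand-in for the locked loan row
row_lock = threading.Lock()             # stand-in for SELECT ... FOR UPDATE
start = threading.Barrier(10)           # releases all workers at the same moment


def invest(amount):
    start.wait()                        # make the requests as simultaneous as possible
    with row_lock:                      # critical section: check, then write
        if state['invested'] + amount <= LOAN_AMOUNT:
            state['invested'] += amount
            state['accepted'] += 1


threads = [threading.Thread(target=invest, args=(30000,)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Ten investors of 30,000 each compete for a 150,000 loan: exactly five fit.
print(state)  # {'invested': 150000, 'accepted': 5}
```

The assertion to automate is the invariant itself: the invested total never exceeds the loan amount, no matter how the threads interleave.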
  72. 72. Note: The order in which queries are executed is the most important thing.
      The lock will not be acquired if a query without a lock is executed first.
        @transaction.atomic
        def invest(loan_id, user_id, amount):
            a = AnotherModel.objects.all().first()
            loan = Loan.objects.select_for_update().get(pk=loan_id)
            balance = Balance.objects.select_for_update().get(user_id=user_id)
            ...
  73. 73. Case 11) Locks with two or more DBs
  74. 74. We split the DB into two instances.
      One is for a bank, and the other is for our product.
      [Diagram labels: Bank, Customers, Database for internal products, Database for a bank]
  75. 75. But we needed to tie the two databases together in one transaction.
      [Diagram labels: Bank, Customers, Database for internal products, Database for a bank, One transaction]
  76. 76. There is one tricky way to solve this problem:
      use the transaction statement twice.
        with transaction.atomic(using='default'):
            with transaction.atomic(using='bank'):
                peoplefund = PeoplefundModel.objects.select_for_update().get(pk=loan_id)
                bank = BankModel.objects.select_for_update().get(user_id=user_id)
                ...
  77. 77. Performance
  78. 78. Case 12) Join: prefetch_related & select_related*
      * https://hackernoon.com/all-you-need-to-know-about-prefetching-in-django-f9068ebe1e60
      * https://stackoverflow.com/questions/31237042/whats-the-difference-between-select-related-and-prefetch-related-in-django-orm
  79. 79. We needed to show some information from both an investment model and a loan model.
      [Screenshot labels: from an investment model, from a loan model]
  80. 80. And we wrote some code like below.
        def get_investments(user_id):
            result = []
            investments = Investment.objects.filter(user_id=user_id)
            for investment in investments:
                element = {
                    'investment_amount': investment.amount,
                    'loan_title': investment.loan.title,
                }
                result.append(element)
            return result
  81. 81. And it got slower as time went by,
      because we did not know how Django ORM works.
        def get_investments(user_id):
            result = []
            investments = Investment.objects.filter(user_id=user_id)
            # When execution reaches this loop for the first time,
            # Django ORM fetches all of the investments.
            for investment in investments:
                element = {
                    'investment_amount': investment.amount,
                    # But at this point, Django ORM fetches one loan per iteration!
                    'loan_title': investment.loan.title,
                }
                result.append(element)
            return result
  82. 82. There is a simple way: use select_related.
      With it, the related objects are fetched together with the investments in one query.
        def get_investments(user_id):
            result = []
            investments = Investment.objects.select_related('loan').filter(user_id=user_id)
            for investment in investments:
                element = {
                    'investment_amount': investment.amount,
                    'loan_title': investment.loan.title,
                }
                result.append(element)
            return result
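The difference in query counts can be illustrated without Django. The sketch below counts statements sent to an in-memory SQLite database for the per-row lookup pattern and for a single JOIN, which is roughly what select_related produces. The schema, sample rows, and the run() helper are assumptions introduced for illustration.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE loan (id INTEGER PRIMARY KEY, title TEXT)')
conn.execute(
    'CREATE TABLE investment ('
    'id INTEGER PRIMARY KEY, user_id INTEGER, loan_id INTEGER, amount INTEGER)'
)
conn.executemany(
    'INSERT INTO loan VALUES (?, ?)',
    [(1, 'Loan A'), (2, 'Loan B'), (3, 'Loan C')],
)
conn.executemany(
    'INSERT INTO investment VALUES (?, ?, ?, ?)',
    [(1, 7, 1, 100000), (2, 7, 2, 50000), (3, 7, 3, 30000)],
)

counter = {'queries': 0}


def run(sql, params=()):
    """Hypothetical helper: execute a query and count it."""
    counter['queries'] += 1
    return conn.execute(sql, params).fetchall()


# Lazy pattern: one query for the investments, then one more per loan.
lazy = []
for amount, loan_id in run(
    'SELECT amount, loan_id FROM investment WHERE user_id = ? ORDER BY id', (7,)
):
    title = run('SELECT title FROM loan WHERE id = ?', (loan_id,))[0][0]
    lazy.append((amount, title))
lazy_queries = counter['queries']

# JOIN pattern: everything arrives in a single query.
counter['queries'] = 0
joined = run(
    'SELECT i.amount, l.title FROM investment i '
    'JOIN loan l ON l.id = i.loan_id WHERE i.user_id = ? ORDER BY i.id',
    (7,),
)
joined_queries = counter['queries']

print(lazy_queries, joined_queries)  # 4 1
```

Both patterns return the same rows; only the number of round trips differs, and the lazy count grows with the number of investments.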
  83. 83. Debugging
  84. 84. Case 13) Checking REAL queries: django-debug-toolbar*
      * https://github.com/jazzband/django-debug-toolbar
  85. 85. Sometimes a program produces correct results, but slowly.
  86. 86. That is the best time to use debugging tools.
      [Screenshot label: Same queries executed during one process]
  87. 87. Case 14) Watching data at a breakpoint inside a transaction: isolation level*
      * https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html
  88. 88. We know how to use a transaction.
      But how can we see the data flow during a transaction?
  89. 89. First of all, set a breakpoint and then run the process.
      https://www.jetbrains.com/pycharm/
  90. 90. At this point, you cannot see any data.
  91. 91. But if you set the isolation level to 'READ-UNCOMMITTED',
      you can watch the data during the transaction.
  92. 92. Raw SQL
  93. 93. Case 15) Performing raw SQL queries
  95. 95. Do not use raw SQL expressions, because of SQL injection attacks*.
      You can convert almost every query to Django ORM.
      * https://docs.djangoproject.com/en/1.11/topics/security/#sql-injection-protection
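To show why raw SQL is risky, here is a minimal sketch using the stdlib sqlite3 module rather than Django. It contrasts splicing user input into the SQL text with passing it as a bound parameter. The table and the malicious input are assumptions for illustration. If raw SQL is unavoidable in Django, Manager.raw() and cursor.execute() also accept a params argument for the same purpose.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE loan (id INTEGER PRIMARY KEY, title TEXT)')
conn.executemany(
    'INSERT INTO loan (title) VALUES (?)', [('Loan A',), ('Loan B',)]
)

user_input = "nothing' OR '1'='1"

# Unsafe: the input becomes part of the SQL text and changes the WHERE clause.
unsafe = conn.execute(
    "SELECT title FROM loan WHERE title = '%s'" % user_input
).fetchall()

# Safer: the driver binds the value, so it is only ever compared as data.
safe = conn.execute(
    'SELECT title FROM loan WHERE title = ?', (user_input,)
).fetchall()

print(len(unsafe))  # 2
print(len(safe))    # 0
```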
