Django and working with large database tables
Django Stockholm Meetup Group
March 30, 2017
About me
Ilian Iliev
Platform Engineer at Lifesum
ilian@ilian.io
www.ilian.io
The setup
MacBook Pro, 2.5 GHz i7, 16 GB RAM
Django 1.10
MySQL 5.7.14
PostgreSQL 9.5.4
The Models
from django.db import models
class Tag(models.Model):
    name = models.CharField(max_length=255)
class User(models.Model):
    name = models.CharField(max_length=255)
    date = models.DateTimeField(null=True)
class Message(models.Model):
    sender = models.ForeignKey(User, on_delete=models.CASCADE, related_name='sent_messages')
    receiver = models.ForeignKey(User, on_delete=models.CASCADE, related_name='recieved_messages', null=True)
    tags = models.ManyToManyField(Tag)
The Change
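class Message(models.Model):
    sender = models.ForeignKey(User, on_delete=models.CASCADE, related_name='sent_messages')
    receiver = models.ForeignKey(User, on_delete=models.CASCADE, related_name='recieved_messages', null=True)
    # The only change: blank=True (a validation-only option, no schema change)
    tags = models.ManyToManyField(Tag, blank=True)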
The weird migration
ALTER TABLE `big_tables_message_tags` DROP FOREIGN KEY
`big_tables_message_tags_tag_id_5eb6034e_fk_big_tables_tag_id`;
ALTER TABLE `big_tables_message_tags` ADD CONSTRAINT
`big_tables_message_tags_tag_id_5eb6034e_fk_big_tables_tag_id` FOREIGN KEY (`tag_id`)
REFERENCES `big_tables_tag` (`id`);
ALTER TABLE `big_tables_message_tags` DROP FOREIGN KEY
`big_tables_message__message_id_95bfb6e6_fk_big_tables_message_id`;
ALTER TABLE `big_tables_message_tags` ADD CONSTRAINT
`big_tables_message__message_id_95bfb6e6_fk_big_tables_message_id` FOREIGN KEY
(`message_id`) REFERENCES `big_tables_message` (`id`);
MySQL
Rows ~ 2.7M
Size ~ 88MB
message_id index size ~ 48MB
tag_id index size ~ 61MB
Migration time ~ 41 sec
The weird migration
ALTER TABLE "big_tables_message_tags" DROP CONSTRAINT
"big_tables_message_tags_tag_id_5eb6034e_fk_big_tables_tag_id";
ALTER TABLE "big_tables_message_tags" ADD CONSTRAINT
"big_tables_message_tags_tag_id_5eb6034e_fk_big_tables_tag_id" FOREIGN KEY ("tag_id")
REFERENCES "big_tables_tag" ("id") DEFERRABLE INITIALLY DEFERRED;
ALTER TABLE "big_tables_message_tags" DROP CONSTRAINT
"big_tables_message_message_id_95bfb6e6_fk_big_tables_message_id";
ALTER TABLE "big_tables_message_tags" ADD CONSTRAINT
"big_tables_message_message_id_95bfb6e6_fk_big_tables_message_id" FOREIGN KEY
("message_id") REFERENCES "big_tables_message" ("id") DEFERRABLE INITIALLY DEFERRED;
PostgreSQL
Rows ~ 2.8M
Size ~ 83MB
message_id index size ~ 77MB
tag_id index size ~ 119MB
Migration time ~ 3.2 sec
Solution
Modify the migration that originally created the field and add the change there
* This is a known issue: https://code.djangoproject.com/ticket/25253
Adding fields to big tables
class MessagesTags(models.Model):
    message = models.ForeignKey(Message, on_delete=models.CASCADE)
    tag = models.ForeignKey(Tag, on_delete=models.CASCADE)
    added_by = models.ForeignKey(User, on_delete=models.CASCADE, null=True)
Timing
MySQL: 31 sec
PostgreSQL: 5.3 sec
MySQL INPLACE
ALTER TABLE `big_tables_message_tags` ADD COLUMN `added_by_id`
integer NULL, ALGORITHM INPLACE, LOCK NONE;
ALTER TABLE `big_tables_message_tags` ADD CONSTRAINT
`big_tables_message_ta_added_by_id_88e3a4dc_fk_big_tables_user_id`
FOREIGN KEY (`added_by_id`) REFERENCES `big_tables_user` (`id`),
ALGORITHM INPLACE, LOCK NONE;
* Adding a foreign key constraint with the INPLACE algorithm is supported only when foreign_key_checks is disabled.
Otherwise, only the COPY algorithm is supported.
Running this on prod
Running it on prod crashed the API
The query is non-locking, but still too heavy for the DB
Aurora appears to be even slower
Alternative
class MessagesTagsExtend(models.Model):
    STATUS_PENDING_REVIEW = 0
    STATUS_APPROVED = 10
    DEFAULT_STATUS = STATUS_PENDING_REVIEW
    message_tag = models.OneToOneField(MessagesTags, on_delete=models.CASCADE)
    status = models.IntegerField(default=DEFAULT_STATUS)
Alternative
class MessagesTags(models.Model):
    ...
    @property
    def status(self):
        try:
            return self.messagestagsextend.status
        except MessagesTagsExtend.DoesNotExist:
            # No extension row yet: fall back to the default status
            return MessagesTagsExtend.DEFAULT_STATUS
    @status.setter
    def status(self, value):
        obj, _ = MessagesTagsExtend.objects.get_or_create(message_tag=self)
        obj.status = value
        obj.save()
        # Refresh the cached reverse relation so the getter sees the new row
        self.messagestagsextend = obj
* Performance has not been tested in a production environment
Iterating on big tables
for x in MessagesTags.objects.all():
    print(x)
+ Single SQL query
- Loads everything in memory
Iterating on big tables
for x in MessagesTags.objects.iterator():
    print(x)
+ Single SQL query
+ Does not cache the results; model instances are built in chunks
- prefetch_related() is ignored
Questions?


Editor's Notes

  1. How many of you use MySQL? How many use PostgreSQL? Anyone using SQLite or Oracle?
  2. And of course, you will always have to add select_related().
  3. Single SQL query, but it loads everything in memory: I killed it after it had taken 2 GB of RAM, and it still hadn't started printing the results.
  4. And of course, you will always have to add select_related(). Consider using values() and values_list().
  5. Thank you for listening. Do you have any questions?