PyCon DE 2013 - Table Partitioning with Django

2,315 views

Published on

Table partitioning can be thought of as a division of one large table into several smaller tables which represent that original table. Table partitioning is "transparent", that means that in theory you don't need to change any code to work with partitioned tables.

We will talk about table partitioning theory in general and implementations in different database servers. Why and when we need to do table partitioning. What problems we can face and how we can solve them.

Django provides us with great database abstraction and ORM, but how can we use it with table partitioning ? We will talk about existing libraries for Django to work with table partitioning, their differences, which is the best (if any) and why.

Published in: Software, Technology, Business

PyCon DE 2013 - Table Partitioning with Django

  1. 1. PyCon.DE 2013 1 / 52 Table partitioning with Django Max Tepkeev 17 October 2013 Cologne, Germany
  2. 2. PyCon.DE 2013 2 / 52 Partitioning Theory Django Packages Realization
  3. 3. PyCon.DE 2013 3 / 52 Partitioning Theory
  4. 4. PyCon.DE 2013 4 / 52 Definition Table partitioning - division of one table into several tables, called partitions, which still represent original table.
  5. 5. PyCon.DE 2013 5 / 52 Why • Performance • Manageability • Availability
  6. 6. PyCon.DE 2013 6 / 52 When • Tables greater than 2GB • Tables with historical data • Table need to be distributed across different types of storage devices • Queries ALWAYS contain a filter on the partition field
  7. 7. PyCon.DE 2013 7 / 52 Methods • Horizontal partitioning • Vertical partitioning
  8. 8. PyCon.DE 2013 8 / 52 Strategies • Range partitioning • List partitioning • Hash partitioning • Composite partitioning
  9. 9. PyCon.DE 2013 9 / 52 Strategies
  10. 10. PyCon.DE 2013 10 / 52 Strategies
  11. 11. PyCon.DE 2013 11 / 52 Example id user_id entry added 1 345 Login 2013-08-22 17:24:43 2 345 Went to Store section 2013-08-22 17:25:01 3 345 Ordered a book 2013-08-22 17:33:28 4 345 Payed for a book 2013-08-22 17:35:54 5 345 Logout 2013-08-22 17:38:32
  12. 12. PyCon.DE 2013 12 / 52 Example INSERT INTO user_actions (user_id, entry, added) VALUES (237, 'Login', '2013-08-21 11:54:08') Goes to user_actions_y2013m08 INSERT INTO user_actions (user_id, entry, added) VALUES (198, 'Logout', '2013-09-01 08:43:42') Goes to user_actions_y2013m09
  13. 13. PyCon.DE 2013 13 / 52 Example SELECT * FROM user_actions id user_id entry added 1 237 Login 2013-08-21 11:54:08 2 198 Logout 2013-09-01 08:43:42 Table partitioning is “transparent”. You don’t need to change your code to work with partitioned tables.
  14. 14. PyCon.DE 2013 14 / 52 Realization
  15. 15. PyCon.DE 2013 15 / 52 RDBMS • PostgreSQL • MySQL
  16. 16. PyCon.DE 2013 16 / 52 PostgreSQL Methods: • Horizontal partitioning Strategies: • Range partitioning • List partitioning
  17. 17. PyCon.DE 2013 17 / 52 PostgreSQL Implementation: • Inheritance Available: • >= 8.1
  18. 18. PyCon.DE 2013 18 / 52 PostgreSQL Steps: • Master table • Child tables • Correct partition insertion function • Trigger that calls partition insertion function • Function to delete duplicate rows from master • Trigger that calls delete duplicate rows function
  19. 19. PyCon.DE 2013 19 / 52 PostgreSQL CREATE TABLE logs ( id serial, entry text NOT NULL, added timestamp(6) NOT NULL, CONSTRAINT logs_pkey PRIMARY KEY (id) ) Master table:
  20. 20. PyCon.DE 2013 20 / 52 PostgreSQL CREATE TABLE logs_y2013m08 ( CHECK ( added >= '2013-08-01 00:00:00'::timestamp AND added <= '2013-08-31 23:59:59'::timestamp ) ) INHERITS (logs); Child table:
  21. 21. PyCon.DE 2013 21 / 52 PostgreSQL CREATE FUNCTION "logs_insert_child"() RETURNS "trigger" AS $BODY$ DECLARE tablename TEXT; BEGIN tablename := 'logs_' || to_char(NEW.added, '"y"YYYY"m"MM'); EXECUTE 'INSERT INTO ' || tablename || ' VALUES (($1).*);' USING NEW; RETURN NEW; END; $BODY$ LANGUAGE plpgsql; Correct partition insertion function:
  22. 22. PyCon.DE 2013 22 / 52 PostgreSQL CREATE TRIGGER "before_insert_logs_trigger" BEFORE INSERT ON "logs" FOR EACH ROW EXECUTE PROCEDURE "logs_insert_child"(); Trigger that calls partition insertion function:
  23. 23. PyCon.DE 2013 23 / 52 PostgreSQL CREATE FUNCTION "logs_delete_master"() RETURNS "trigger" AS $BODY$ BEGIN DELETE FROM ONLY logs WHERE id = NEW.id; RETURN NEW; END; $BODY$ LANGUAGE plpgsql; Function to delete duplicate rows from master:
  24. 24. PyCon.DE 2013 24 / 52 PostgreSQL CREATE TRIGGER "after_insert_logs_trigger" AFTER INSERT ON "logs" FOR EACH ROW EXECUTE PROCEDURE "logs_delete_master"(); Trigger that calls delete duplicate rows function:
  25. 25. PyCon.DE 2013 25 / 52 Code for automatic new partition creation PostgreSQL DECLARE start_date TIMESTAMP; start_date := date_trunc('month', NEW.added); IF NOT EXISTS( SELECT relname FROM pg_class WHERE relname=tablename) THEN EXECUTE 'CREATE TABLE ' || tablename || ' ( CHECK ( added >= ''' || start_date || ''' AND added <= ''' || start_date + '1 month'::interval || ''' ) ) INHERITS ('logs');'; END IF;
  26. 26. PyCon.DE 2013 26 / 52 MySQL Methods: • Horizontal partitioning Strategies: • Range partitioning • List partitioning • Hash partitioning • Composite partitioning
  27. 27. PyCon.DE 2013 27 / 52 MySQL Implementation: • Native (PARTITION BY) Available: • >= 5.1
  28. 28. PyCon.DE 2013 28 / 52 How that works MySQL CREATE TABLE members ( username VARCHAR(16) NOT NULL, email VARCHAR(35), joined DATE NOT NULL ) PARTITION BY RANGE( YEAR(joined) ) ( PARTITION p0 VALUES LESS THAN (2012), PARTITION p1 VALUES LESS THAN (2013), PARTITION p2 VALUES LESS THAN MAXVALUE );
  29. 29. PyCon.DE 2013 29 / 52 MySQL Limitations • From lowest to highest (range) • Foreign Key • No real-time partition creation
  30. 30. PyCon.DE 2013 30 / 52 Django Packages
  31. 31. PyCon.DE 2013 31 / 52 Packages • django-parting • django-db-parti
  32. 32. PyCon.DE 2013 32 / 52 django-parting RDBMS: • PostgreSQL
  33. 33. PyCon.DE 2013 33 / 52 django-parting Features: • Partition tables with Foreign Keys
  34. 34. PyCon.DE 2013 34 / 52 django-parting From pypi: $ pip install django-parting or clone from github: $ git clone git://github.com/danfairs/django-parting.git
  35. 35. PyCon.DE 2013 35 / 52 django-parting Add parting to PYTHONPATH and installed applications: INSTALLED_APPS = ( ... 'parting' )
  36. 36. PyCon.DE 2013 36 / 52 django-parting from django.db import models from django.utils import timezone class Tweet(models.Model): json = models.TextField() user = models.TextField() created_at = models.DateTimeField(default=timezone.now()) class Star(models.Model): tweet = models.ForeignKey(Tweet) user = models.TextField()
  37. 37. PyCon.DE 2013 37 / 52 django-parting from django.utils import timezone from parting import PartitionManager from dateutil.relativedelta import relativedelta def _key_for_date(dt): return dt.strftime('%Y%m') class TweetPartitionManager(PartitionManager): def current_partition(self): return _key_for_date(timezone.now()) def next_partition(self): one_months_time = timezone.now() + relativedelta(months=1) return _key_for_date(one_months_time)
  38. 38. PyCon.DE 2013 38 / 52 django-parting class Tweet(models.Model): json = models.TextField() user = models.TextField() created_at = models.DateTimeField(default=timezone.now()) partitions = TweetPartitionManager() class Meta: abstract = True class Star(models.Model): tweet = models.PartitionForeignKey(Tweet) user = models.TextField() partitions = TweetPartitionManager() class Meta: abstract = True
  39. 39. PyCon.DE 2013 39 / 52 django-parting import json from django.utils.timezone import make_aware, utc tweet_data = { 'created_at': make_aware( datetime.datetime(2012, 12, 6, 14, 23), utc) 'json': json.dumps({'key': 'value'}), 'user': 'Jimmy' } partition_key = _key_for_dt(tweet_data['created_at']) partition = Tweet.partitions.get_partition(partition_key) tweet = partition(**tweet_data) tweet.save()
  40. 40. PyCon.DE 2013 40 / 52 django-parting CREATE TABLE "testapp_tweet_2013_03" ( "id" integer NOT NULL PRIMARY KEY, "json" text NOT NULL, "created" datetime NOT NULL ); CREATE TABLE "testapp_star_2013_03" ( "id" integer NOT NULL PRIMARY KEY, "tweet_id" integer NOT NULL REFERENCES "testapp_tweet_2013_03" ("id"), "user" text NOT NULL );
  41. 41. PyCon.DE 2013 41 / 52 django-parting Problems: • Not database-level partitioning • No django admin support • No active development
  42. 42. PyCon.DE 2013 42 / 52 django-db-parti RDBMS: • MySQL • PostgreSQL
  43. 43. PyCon.DE 2013 43 / 52 django-db-parti Features: • Real database-level partitioning • Automatic new partition creation in real-time • Django admin support
  44. 44. PyCon.DE 2013 44 / 52 django-db-parti From pypi: $ pip install django-db-parti or clone from github: $ git clone git://github.com/maxtepkeev/django-db-parti.git
  45. 45. PyCon.DE 2013 45 / 52 django-db-parti Add dbparti to PYTHONPATH and installed applications: INSTALLED_APPS = ( ... 'dbparti' )
  46. 46. PyCon.DE 2013 46 / 52 django-db-parti In models.py add import statement: from dbparti.models import Partitionable Make your model to inherit from Partitionable: class YourModelName(Partitionable):
  47. 47. PyCon.DE 2013 47 / 52 django-db-parti Add a Meta class to your model with a few settings: class Meta(Partitionable.Meta): partition_type = 'range' partition_subtype = 'date' partition_range = 'month' partition_column = 'added' Lastly initialize some database stuff with the command: $ python manage.py partition app_name
  48. 48. PyCon.DE 2013 48 / 52 django-db-parti Possible model settings partition_type: • range partition_subtype: • date partition_range: • day • week • month • year
  49. 49. PyCon.DE 2013 49 / 52 django-db-parti Customize how data will be displayed in the Django admin In admin.py add import statement: from dbparti.admin import PartitionableAdmin Make your admin to inherit from PartitionableAdmin: class YourModelAdminName(PartitionableAdmin): partition_show = 'all'
  50. 50. PyCon.DE 2013 50 / 52 django-db-parti Possible model admin settings partition_show: • all (default) • current • previous
  51. 51. PyCon.DE 2013 51 / 52 django-db-parti Problems: • Only range partitioning (datetime) • Database backend limitations
  52. 52. PyCon.DE 2013 52 / 52 Question time https://www.github.com/maxtepkeev/django-db-parti email: tepkeev@gmail.com skype: max.tepkeev

×