PyCon.DE 2013 1 / 52
Table partitioning
with Django
Max Tepkeev
17 October 2013
Cologne, Germany
PyCon.DE 2013 2 / 52
Partitioning
Theory
Django
Packages
Realization
PyCon.DE 2013 3 / 52
Partitioning Theory
PyCon.DE 2013 4 / 52
Definition
Table partitioning - division of one
table into several tables, called
partitions, which still represent
original table.
PyCon.DE 2013 5 / 52
Why
• Performance
• Manageability
• Availability
PyCon.DE 2013 6 / 52
When
• Tables greater than 2GB
• Tables with historical data
• Table need to be distributed across
different types of storage devices
• Queries ALWAYS contain a filter on the
partition field
PyCon.DE 2013 7 / 52
Methods
• Horizontal partitioning
• Vertical partitioning
PyCon.DE 2013 8 / 52
Strategies
• Range partitioning
• List partitioning
• Hash partitioning
• Composite partitioning
PyCon.DE 2013 9 / 52
Strategies
PyCon.DE 2013 10 / 52
Strategies
PyCon.DE 2013 11 / 52
Example
id user_id entry added
1 345 Login 2013-08-22 17:24:43
2 345 Went to Store section 2013-08-22 17:25:01
3 345 Ordered a book 2013-08-22 17:33:28
4 345 Payed for a book 2013-08-22 17:35:54
5 345 Logout 2013-08-22 17:38:32
PyCon.DE 2013 12 / 52
Example
INSERT INTO user_actions (user_id, entry, added)
VALUES (237, 'Login', '2013-08-21 11:54:08')
Goes to user_actions_y2013m08
INSERT INTO user_actions (user_id, entry, added)
VALUES (198, 'Logout', '2013-09-01 08:43:42')
Goes to user_actions_y2013m09
PyCon.DE 2013 13 / 52
Example
SELECT * FROM user_actions
id user_id entry added
1 237 Login 2013-08-21 11:54:08
2 198 Logout 2013-09-01 08:43:42
Table partitioning is “transparent”. You don’t need to change
your code to work with partitioned tables.
PyCon.DE 2013 14 / 52
Realization
PyCon.DE 2013 15 / 52
RDBMS
• PostgreSQL
• MySQL
PyCon.DE 2013 16 / 52
PostgreSQL
Methods:
• Horizontal partitioning
Strategies:
• Range partitioning
• List partitioning
PyCon.DE 2013 17 / 52
PostgreSQL
Implementation:
• Inheritance
Available:
• >= 8.1
PyCon.DE 2013 18 / 52
PostgreSQL
Steps:
• Master table
• Child tables
• Correct partition insertion function
• Trigger that calls partition insertion function
• Function to delete duplicate rows from master
• Trigger that calls delete duplicate rows function
PyCon.DE 2013 19 / 52
PostgreSQL
CREATE TABLE logs (
id serial,
entry text NOT NULL,
added timestamp(6) NOT NULL,
CONSTRAINT logs_pkey PRIMARY KEY (id)
)
Master table:
PyCon.DE 2013 20 / 52
PostgreSQL
CREATE TABLE logs_y2013m08 (
CHECK (
added >= '2013-08-01 00:00:00'::timestamp AND
added <= '2013-08-31 23:59:59'::timestamp
)
) INHERITS (logs);
Child table:
PyCon.DE 2013 21 / 52
PostgreSQL
CREATE FUNCTION "logs_insert_child"() RETURNS "trigger"
AS $BODY$
DECLARE tablename TEXT;
BEGIN
tablename := 'logs_' || to_char(NEW.added, '"y"YYYY"m"MM');
EXECUTE 'INSERT INTO ' || tablename || ' VALUES (($1).*);'
USING NEW;
RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql;
Correct partition insertion function:
PyCon.DE 2013 22 / 52
PostgreSQL
CREATE TRIGGER "before_insert_logs_trigger"
BEFORE INSERT ON "logs"
FOR EACH ROW EXECUTE PROCEDURE "logs_insert_child"();
Trigger that calls partition insertion function:
PyCon.DE 2013 23 / 52
PostgreSQL
CREATE FUNCTION "logs_delete_master"() RETURNS "trigger"
AS $BODY$
BEGIN
DELETE FROM ONLY logs WHERE id = NEW.id;
RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql;
Function to delete duplicate rows from master:
PyCon.DE 2013 24 / 52
PostgreSQL
CREATE TRIGGER "after_insert_logs_trigger"
AFTER INSERT ON "logs"
FOR EACH ROW EXECUTE PROCEDURE "logs_delete_master"();
Trigger that calls delete duplicate rows function:
PyCon.DE 2013 25 / 52
Code for automatic new partition creation
PostgreSQL
DECLARE start_date TIMESTAMP;
start_date := date_trunc('month', NEW.added);
IF NOT EXISTS(
SELECT relname FROM pg_class WHERE relname=tablename)
THEN
EXECUTE 'CREATE TABLE ' || tablename || ' (
CHECK (
added >= ''' || start_date || ''' AND
added <= ''' || start_date + '1 month'::interval || '''
)
) INHERITS ('logs');';
END IF;
PyCon.DE 2013 26 / 52
MySQL
Methods:
• Horizontal partitioning
Strategies:
• Range partitioning
• List partitioning
• Hash partitioning
• Composite partitioning
PyCon.DE 2013 27 / 52
MySQL
Implementation:
• Native (PARTITION BY)
Available:
• >= 5.1
PyCon.DE 2013 28 / 52
How that works
MySQL
CREATE TABLE members (
username VARCHAR(16) NOT NULL,
email VARCHAR(35),
joined DATE NOT NULL
)
PARTITION BY RANGE( YEAR(joined) ) (
PARTITION p0 VALUES LESS THAN (2012),
PARTITION p1 VALUES LESS THAN (2013),
PARTITION p2 VALUES LESS THAN MAXVALUE
);
PyCon.DE 2013 29 / 52
MySQL
Limitations
• From lowest to highest (range)
• Foreign Key
• No real-time partition creation
PyCon.DE 2013 30 / 52
Django Packages
PyCon.DE 2013 31 / 52
Packages
• django-parting
• django-db-parti
PyCon.DE 2013 32 / 52
django-parting
RDBMS:
• PostgreSQL
PyCon.DE 2013 33 / 52
django-parting
Features:
• Partition tables with Foreign Keys
PyCon.DE 2013 34 / 52
django-parting
From pypi:
$ pip install django-parting
or clone from github:
$ git clone git://github.com/danfairs/django-parting.git
PyCon.DE 2013 35 / 52
django-parting
Add parting to PYTHONPATH and installed applications:
INSTALLED_APPS = (
...
'parting'
)
PyCon.DE 2013 36 / 52
django-parting
from django.db import models
from django.utils import timezone
class Tweet(models.Model):
json = models.TextField()
user = models.TextField()
created_at = models.DateTimeField(default=timezone.now())
class Star(models.Model):
tweet = models.ForeignKey(Tweet)
user = models.TextField()
PyCon.DE 2013 37 / 52
django-parting
from django.utils import timezone
from parting import PartitionManager
from dateutil.relativedelta import relativedelta
def _key_for_date(dt):
return dt.strftime('%Y%m')
class TweetPartitionManager(PartitionManager):
def current_partition(self):
return _key_for_date(timezone.now())
def next_partition(self):
one_months_time = timezone.now() + relativedelta(months=1)
return _key_for_date(one_months_time)
PyCon.DE 2013 38 / 52
django-parting
class Tweet(models.Model):
json = models.TextField()
user = models.TextField()
created_at = models.DateTimeField(default=timezone.now())
partitions = TweetPartitionManager()
class Meta:
abstract = True
class Star(models.Model):
tweet = models.PartitionForeignKey(Tweet)
user = models.TextField()
partitions = TweetPartitionManager()
class Meta:
abstract = True
PyCon.DE 2013 39 / 52
django-parting
import json
from django.utils.timezone import make_aware, utc
tweet_data = {
'created_at': make_aware(
datetime.datetime(2012, 12, 6, 14, 23), utc)
'json': json.dumps({'key': 'value'}),
'user': 'Jimmy'
}
partition_key = _key_for_dt(tweet_data['created_at'])
partition = Tweet.partitions.get_partition(partition_key)
tweet = partition(**tweet_data)
tweet.save()
PyCon.DE 2013 40 / 52
django-parting
CREATE TABLE "testapp_tweet_2013_03" (
"id" integer NOT NULL PRIMARY KEY,
"json" text NOT NULL,
"created" datetime NOT NULL
);
CREATE TABLE "testapp_star_2013_03" (
"id" integer NOT NULL PRIMARY KEY,
"tweet_id" integer NOT NULL REFERENCES
"testapp_tweet_2013_03" ("id"),
"user" text NOT NULL
);
PyCon.DE 2013 41 / 52
django-parting
Problems:
• Not database-level partitioning
• No django admin support
• No active development
PyCon.DE 2013 42 / 52
django-db-parti
RDBMS:
• MySQL
• PostgreSQL
PyCon.DE 2013 43 / 52
django-db-parti
Features:
• Real database-level partitioning
• Automatic new partition creation in real-time
• Django admin support
PyCon.DE 2013 44 / 52
django-db-parti
From pypi:
$ pip install django-db-parti
or clone from github:
$ git clone git://github.com/maxtepkeev/django-db-parti.git
PyCon.DE 2013 45 / 52
django-db-parti
Add dbparti to PYTHONPATH and installed applications:
INSTALLED_APPS = (
...
'dbparti'
)
PyCon.DE 2013 46 / 52
django-db-parti
In models.py add import statement:
from dbparti.models import Partitionable
Make your model to inherit from Partitionable:
class YourModelName(Partitionable):
PyCon.DE 2013 47 / 52
django-db-parti
Add a Meta class to your model with a few settings:
class Meta(Partitionable.Meta):
partition_type = 'range'
partition_subtype = 'date'
partition_range = 'month'
partition_column = 'added'
Lastly initialize some database stuff with the command:
$ python manage.py partition app_name
PyCon.DE 2013 48 / 52
django-db-parti
Possible model settings
partition_type:
• range
partition_subtype:
• date
partition_range:
• day
• week
• month
• year
PyCon.DE 2013 49 / 52
django-db-parti
Customize how data will be displayed in the Django admin
In admin.py add import statement:
from dbparti.admin import PartitionableAdmin
Make your admin to inherit from PartitionableAdmin:
class YourModelAdminName(PartitionableAdmin):
partition_show = 'all'
PyCon.DE 2013 50 / 52
django-db-parti
Possible model admin settings
partition_show:
• all (default)
• current
• previous
PyCon.DE 2013 51 / 52
django-db-parti
Problems:
• Only range partitioning (datetime)
• Database backend limitations
PyCon.DE 2013 52 / 52
Question time
https://www.github.com/maxtepkeev/django-db-parti
email: tepkeev@gmail.com
skype: max.tepkeev

PyCon DE 2013 - Table Partitioning with Django

  • 1.
    PyCon.DE 2013 1/ 52 Table partitioning with Django Max Tepkeev 17 October 2013 Cologne, Germany
  • 2.
    PyCon.DE 2013 2/ 52 Partitioning Theory Django Packages Realization
  • 3.
    PyCon.DE 2013 3/ 52 Partitioning Theory
  • 4.
    PyCon.DE 2013 4/ 52 Definition Table partitioning - division of one table into several tables, called partitions, which still represent original table.
  • 5.
    PyCon.DE 2013 5/ 52 Why • Performance • Manageability • Availability
  • 6.
    PyCon.DE 2013 6/ 52 When • Tables greater than 2GB • Tables with historical data • Table need to be distributed across different types of storage devices • Queries ALWAYS contain a filter on the partition field
  • 7.
    PyCon.DE 2013 7/ 52 Methods • Horizontal partitioning • Vertical partitioning
  • 8.
    PyCon.DE 2013 8/ 52 Strategies • Range partitioning • List partitioning • Hash partitioning • Composite partitioning
  • 9.
    PyCon.DE 2013 9/ 52 Strategies
  • 10.
    PyCon.DE 2013 10/ 52 Strategies
  • 11.
    PyCon.DE 2013 11/ 52 Example id user_id entry added 1 345 Login 2013-08-22 17:24:43 2 345 Went to Store section 2013-08-22 17:25:01 3 345 Ordered a book 2013-08-22 17:33:28 4 345 Payed for a book 2013-08-22 17:35:54 5 345 Logout 2013-08-22 17:38:32
  • 12.
    PyCon.DE 2013 12/ 52 Example INSERT INTO user_actions (user_id, entry, added) VALUES (237, 'Login', '2013-08-21 11:54:08') Goes to user_actions_y2013m08 INSERT INTO user_actions (user_id, entry, added) VALUES (198, 'Logout', '2013-09-01 08:43:42') Goes to user_actions_y2013m09
  • 13.
    PyCon.DE 2013 13/ 52 Example SELECT * FROM user_actions id user_id entry added 1 237 Login 2013-08-21 11:54:08 2 198 Logout 2013-09-01 08:43:42 Table partitioning is “transparent”. You don’t need to change your code to work with partitioned tables.
  • 14.
    PyCon.DE 2013 14/ 52 Realization
  • 15.
    PyCon.DE 2013 15/ 52 RDBMS • PostgreSQL • MySQL
  • 16.
    PyCon.DE 2013 16/ 52 PostgreSQL Methods: • Horizontal partitioning Strategies: • Range partitioning • List partitioning
  • 17.
    PyCon.DE 2013 17/ 52 PostgreSQL Implementation: • Inheritance Available: • >= 8.1
  • 18.
    PyCon.DE 2013 18/ 52 PostgreSQL Steps: • Master table • Child tables • Correct partition insertion function • Trigger that calls partition insertion function • Function to delete duplicate rows from master • Trigger that calls delete duplicate rows function
  • 19.
    PyCon.DE 2013 19/ 52 PostgreSQL CREATE TABLE logs ( id serial, entry text NOT NULL, added timestamp(6) NOT NULL, CONSTRAINT logs_pkey PRIMARY KEY (id) ) Master table:
  • 20.
    PyCon.DE 2013 20/ 52 PostgreSQL CREATE TABLE logs_y2013m08 ( CHECK ( added >= '2013-08-01 00:00:00'::timestamp AND added <= '2013-08-31 23:59:59'::timestamp ) ) INHERITS (logs); Child table:
  • 21.
    PyCon.DE 2013 21/ 52 PostgreSQL CREATE FUNCTION "logs_insert_child"() RETURNS "trigger" AS $BODY$ DECLARE tablename TEXT; BEGIN tablename := 'logs_' || to_char(NEW.added, '"y"YYYY"m"MM'); EXECUTE 'INSERT INTO ' || tablename || ' VALUES (($1).*);' USING NEW; RETURN NEW; END; $BODY$ LANGUAGE plpgsql; Correct partition insertion function:
  • 22.
    PyCon.DE 2013 22/ 52 PostgreSQL CREATE TRIGGER "before_insert_logs_trigger" BEFORE INSERT ON "logs" FOR EACH ROW EXECUTE PROCEDURE "logs_insert_child"(); Trigger that calls partition insertion function:
  • 23.
    PyCon.DE 2013 23/ 52 PostgreSQL CREATE FUNCTION "logs_delete_master"() RETURNS "trigger" AS $BODY$ BEGIN DELETE FROM ONLY logs WHERE id = NEW.id; RETURN NEW; END; $BODY$ LANGUAGE plpgsql; Function to delete duplicate rows from master:
  • 24.
    PyCon.DE 2013 24/ 52 PostgreSQL CREATE TRIGGER "after_insert_logs_trigger" AFTER INSERT ON "logs" FOR EACH ROW EXECUTE PROCEDURE "logs_delete_master"(); Trigger that calls delete duplicate rows function:
  • 25.
    PyCon.DE 2013 25/ 52 Code for automatic new partition creation PostgreSQL DECLARE start_date TIMESTAMP; start_date := date_trunc('month', NEW.added); IF NOT EXISTS( SELECT relname FROM pg_class WHERE relname=tablename) THEN EXECUTE 'CREATE TABLE ' || tablename || ' ( CHECK ( added >= ''' || start_date || ''' AND added <= ''' || start_date + '1 month'::interval || ''' ) ) INHERITS ('logs');'; END IF;
  • 26.
    PyCon.DE 2013 26/ 52 MySQL Methods: • Horizontal partitioning Strategies: • Range partitioning • List partitioning • Hash partitioning • Composite partitioning
  • 27.
    PyCon.DE 2013 27/ 52 MySQL Implementation: • Native (PARTITION BY) Available: • >= 5.1
  • 28.
    PyCon.DE 2013 28/ 52 How that works MySQL CREATE TABLE members ( username VARCHAR(16) NOT NULL, email VARCHAR(35), joined DATE NOT NULL ) PARTITION BY RANGE( YEAR(joined) ) ( PARTITION p0 VALUES LESS THAN (2012), PARTITION p1 VALUES LESS THAN (2013), PARTITION p2 VALUES LESS THAN MAXVALUE );
  • 29.
    PyCon.DE 2013 29/ 52 MySQL Limitations • From lowest to highest (range) • Foreign Key • No real-time partition creation
  • 30.
    PyCon.DE 2013 30/ 52 Django Packages
  • 31.
    PyCon.DE 2013 31/ 52 Packages • django-parting • django-db-parti
  • 32.
    PyCon.DE 2013 32/ 52 django-parting RDBMS: • PostgreSQL
  • 33.
    PyCon.DE 2013 33/ 52 django-parting Features: • Partition tables with Foreign Keys
  • 34.
    PyCon.DE 2013 34/ 52 django-parting From pypi: $ pip install django-parting or clone from github: $ git clone git://github.com/danfairs/django-parting.git
  • 35.
    PyCon.DE 2013 35/ 52 django-parting Add parting to PYTHONPATH and installed applications: INSTALLED_APPS = ( ... 'parting' )
  • 36.
    PyCon.DE 2013 36/ 52 django-parting from django.db import models from django.utils import timezone class Tweet(models.Model): json = models.TextField() user = models.TextField() created_at = models.DateTimeField(default=timezone.now()) class Star(models.Model): tweet = models.ForeignKey(Tweet) user = models.TextField()
  • 37.
    PyCon.DE 2013 37/ 52 django-parting from django.utils import timezone from parting import PartitionManager from dateutil.relativedelta import relativedelta def _key_for_date(dt): return dt.strftime('%Y%m') class TweetPartitionManager(PartitionManager): def current_partition(self): return _key_for_date(timezone.now()) def next_partition(self): one_months_time = timezone.now() + relativedelta(months=1) return _key_for_date(one_months_time)
  • 38.
    PyCon.DE 2013 38/ 52 django-parting class Tweet(models.Model): json = models.TextField() user = models.TextField() created_at = models.DateTimeField(default=timezone.now()) partitions = TweetPartitionManager() class Meta: abstract = True class Star(models.Model): tweet = models.PartitionForeignKey(Tweet) user = models.TextField() partitions = TweetPartitionManager() class Meta: abstract = True
  • 39.
    PyCon.DE 2013 39/ 52 django-parting import json from django.utils.timezone import make_aware, utc tweet_data = { 'created_at': make_aware( datetime.datetime(2012, 12, 6, 14, 23), utc) 'json': json.dumps({'key': 'value'}), 'user': 'Jimmy' } partition_key = _key_for_dt(tweet_data['created_at']) partition = Tweet.partitions.get_partition(partition_key) tweet = partition(**tweet_data) tweet.save()
  • 40.
    PyCon.DE 2013 40/ 52 django-parting CREATE TABLE "testapp_tweet_2013_03" ( "id" integer NOT NULL PRIMARY KEY, "json" text NOT NULL, "created" datetime NOT NULL ); CREATE TABLE "testapp_star_2013_03" ( "id" integer NOT NULL PRIMARY KEY, "tweet_id" integer NOT NULL REFERENCES "testapp_tweet_2013_03" ("id"), "user" text NOT NULL );
  • 41.
    PyCon.DE 2013 41/ 52 django-parting Problems: • Not database-level partitioning • No django admin support • No active development
  • 42.
    PyCon.DE 2013 42/ 52 django-db-parti RDBMS: • MySQL • PostgreSQL
  • 43.
    PyCon.DE 2013 43/ 52 django-db-parti Features: • Real database-level partitioning • Automatic new partition creation in real-time • Django admin support
  • 44.
    PyCon.DE 2013 44/ 52 django-db-parti From pypi: $ pip install django-db-parti or clone from github: $ git clone git://github.com/maxtepkeev/django-db-parti.git
  • 45.
    PyCon.DE 2013 45/ 52 django-db-parti Add dbparti to PYTHONPATH and installed applications: INSTALLED_APPS = ( ... 'dbparti' )
  • 46.
    PyCon.DE 2013 46/ 52 django-db-parti In models.py add import statement: from dbparti.models import Partitionable Make your model to inherit from Partitionable: class YourModelName(Partitionable):
  • 47.
    PyCon.DE 2013 47/ 52 django-db-parti Add a Meta class to your model with a few settings: class Meta(Partitionable.Meta): partition_type = 'range' partition_subtype = 'date' partition_range = 'month' partition_column = 'added' Lastly initialize some database stuff with the command: $ python manage.py partition app_name
  • 48.
    PyCon.DE 2013 48/ 52 django-db-parti Possible model settings partition_type: • range partition_subtype: • date partition_range: • day • week • month • year
  • 49.
    PyCon.DE 2013 49/ 52 django-db-parti Customize how data will be displayed in the Django admin In admin.py add import statement: from dbparti.admin import PartitionableAdmin Make your admin to inherit from PartitionableAdmin: class YourModelAdminName(PartitionableAdmin): partition_show = 'all'
  • 50.
    PyCon.DE 2013 50/ 52 django-db-parti Possible model admin settings partition_show: • all (default) • current • previous
  • 51.
    PyCon.DE 2013 51/ 52 django-db-parti Problems: • Only range partitioning (datetime) • Database backend limitations
  • 52.
    PyCon.DE 2013 52/ 52 Question time https://www.github.com/maxtepkeev/django-db-parti email: tepkeev@gmail.com skype: max.tepkeev