Table partitioning divides one large table into several smaller tables that together represent the original table. Partitioning is "transparent": in theory, you don't need to change any code to work with partitioned tables.
We will cover the theory of table partitioning and its implementations in different database servers: why and when to partition a table, what problems can arise, and how to solve them.
Django provides a great database abstraction and ORM, but how can we use it with table partitioning? We will look at the existing Django libraries for table partitioning, how they differ, and which one is best (if any) and why.
4. PyCon.DE 2013 4 / 52
Definition
Table partitioning: the division of one table into several tables, called partitions, which together still represent the original table.
When
• Tables greater than 2GB
• Tables with historical data
• Tables that need to be distributed across
different types of storage devices
• Queries ALWAYS contain a filter on the
partition field
Example
id  user_id  entry                  added
1   345      Login                  2013-08-22 17:24:43
2   345      Went to Store section  2013-08-22 17:25:01
3   345      Ordered a book         2013-08-22 17:33:28
4   345      Paid for a book        2013-08-22 17:35:54
5   345      Logout                 2013-08-22 17:38:32
Example
INSERT INTO user_actions (user_id, entry, added)
    VALUES (237, 'Login', '2013-08-21 11:54:08');
Goes to user_actions_y2013m08
INSERT INTO user_actions (user_id, entry, added)
    VALUES (198, 'Logout', '2013-09-01 08:43:42');
Goes to user_actions_y2013m09
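The routing rule in this example can be sketched in a few lines of Python. This is only an illustration of the naming scheme shown on the slide; the helper `partition_name` is hypothetical and is not part of any database or library.

```python
from datetime import datetime


def partition_name(table: str, added: datetime) -> str:
    """Return the monthly partition name for a row, e.g. user_actions_y2013m08."""
    # Hypothetical helper: mirrors the <table>_y<YYYY>m<MM> naming used above.
    return f"{table}_y{added.year:04d}m{added.month:02d}"


print(partition_name("user_actions", datetime(2013, 8, 21, 11, 54, 8)))
print(partition_name("user_actions", datetime(2013, 9, 1, 8, 43, 42)))
```

In a real setup the database makes this decision, not the application.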
Example
SELECT * FROM user_actions;
id  user_id  entry   added
1   237      Login   2013-08-21 11:54:08
2   198      Logout  2013-09-01 08:43:42
Table partitioning is “transparent”. You don’t need to change
your code to work with partitioned tables.
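To make the idea of transparency concrete, here is a minimal sketch that emulates it with Python's built-in sqlite3 module. SQLite has no native partitioning, so this swaps in a different technique: a view that unions two tables, plus INSTEAD OF triggers that route inserts. It is not how PostgreSQL or MySQL implement partitions, the trigger names are made up for the illustration, and the id column is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_actions_y2013m08 (user_id INTEGER, entry TEXT, added TEXT);
CREATE TABLE user_actions_y2013m09 (user_id INTEGER, entry TEXT, added TEXT);

-- Reads go through one view that unions all partitions.
CREATE VIEW user_actions AS
    SELECT user_id, entry, added FROM user_actions_y2013m08
    UNION ALL
    SELECT user_id, entry, added FROM user_actions_y2013m09;

-- Writes are routed to the matching partition by the value of "added".
CREATE TRIGGER user_actions_insert_m08 INSTEAD OF INSERT ON user_actions
WHEN NEW.added >= '2013-08-01' AND NEW.added < '2013-09-01'
BEGIN
    INSERT INTO user_actions_y2013m08 VALUES (NEW.user_id, NEW.entry, NEW.added);
END;

CREATE TRIGGER user_actions_insert_m09 INSTEAD OF INSERT ON user_actions
WHEN NEW.added >= '2013-09-01' AND NEW.added < '2013-10-01'
BEGIN
    INSERT INTO user_actions_y2013m09 VALUES (NEW.user_id, NEW.entry, NEW.added);
END;
""")

# The application only ever names "user_actions".
insert = "INSERT INTO user_actions (user_id, entry, added) VALUES (?, ?, ?)"
conn.execute(insert, (237, "Login", "2013-08-21 11:54:08"))
conn.execute(insert, (198, "Logout", "2013-09-01 08:43:42"))

rows = conn.execute("SELECT * FROM user_actions ORDER BY added").fetchall()
august = conn.execute("SELECT COUNT(*) FROM user_actions_y2013m08").fetchone()[0]
september = conn.execute("SELECT COUNT(*) FROM user_actions_y2013m09").fetchone()[0]
print(rows, august, september)
```

The insert and select statements never mention a partition by name, which is the property the slide calls transparency.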
PostgreSQL
Correct partition insertion function:
CREATE FUNCTION "logs_insert_child"() RETURNS "trigger"
AS $BODY$
DECLARE
    tablename TEXT;
BEGIN
    -- Build the partition name from the row's date, e.g. logs_y2013m08
    tablename := 'logs_' || to_char(NEW.added, '"y"YYYY"m"MM');
    EXECUTE 'INSERT INTO ' || tablename || ' VALUES (($1).*);'
        USING NEW;
    -- Returning NEW also stores the row in the master table
    RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql;
PostgreSQL
Trigger that calls the partition insertion function:
CREATE TRIGGER "before_insert_logs_trigger"
    BEFORE INSERT ON "logs"
    FOR EACH ROW EXECUTE PROCEDURE "logs_insert_child"();
PostgreSQL
Function to delete duplicate rows from the master table:
CREATE FUNCTION "logs_delete_master"() RETURNS "trigger"
AS $BODY$
BEGIN
    DELETE FROM ONLY logs WHERE id = NEW.id;
    RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql;
PostgreSQL
Trigger that calls the duplicate-row deletion function:
CREATE TRIGGER "after_insert_logs_trigger"
    AFTER INSERT ON "logs"
    FOR EACH ROW EXECUTE PROCEDURE "logs_delete_master"();
PostgreSQL
Code for automatic new partition creation:
DECLARE start_date TIMESTAMP;
start_date := date_trunc('month', NEW.added);
IF NOT EXISTS(
    SELECT relname FROM pg_class WHERE relname = tablename)
THEN
    -- Half-open range [start_date, start_date + 1 month),
    -- so that adjacent partitions do not overlap
    EXECUTE 'CREATE TABLE ' || tablename || ' (
        CHECK (
            added >= ''' || start_date || ''' AND
            added < ''' || (start_date + '1 month'::interval) || '''
        )
    ) INHERITS (logs);';
END IF;
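The statement that the trigger code builds can be previewed outside the database. The following Python sketch produces the same kind of CREATE TABLE statement for a given row date; `create_partition_sql` is a hypothetical helper written for this illustration, and it assumes the half-open range [start of month, start of next month) so that neighbouring partitions never accept the same row.

```python
from datetime import datetime


def create_partition_sql(master: str, added: datetime) -> str:
    """Build the DDL for the monthly partition that should hold `added`."""
    start = added.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    # First instant of the following month (December rolls over to January).
    if start.month == 12:
        end = start.replace(year=start.year + 1, month=1)
    else:
        end = start.replace(month=start.month + 1)
    name = f"{master}_y{start.year:04d}m{start.month:02d}"
    # Table names are assumed to be trusted identifiers, not user input.
    return (
        f"CREATE TABLE {name} ("
        f"CHECK (added >= '{start.isoformat(sep=' ')}' "
        f"AND added < '{end.isoformat(sep=' ')}')"
        f") INHERITS ({master});"
    )


print(create_partition_sql("logs", datetime(2013, 8, 22, 17, 24, 43)))
```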
MySQL
Methods:
• Horizontal partitioning
Strategies:
• Range partitioning
• List partitioning
• Hash partitioning
• Composite partitioning
MySQL
How that works
CREATE TABLE members (
    username VARCHAR(16) NOT NULL,
    email VARCHAR(35),
    joined DATE NOT NULL
)
PARTITION BY RANGE( YEAR(joined) ) (
    PARTITION p0 VALUES LESS THAN (2012),
    PARTITION p1 VALUES LESS THAN (2013),
    PARTITION p2 VALUES LESS THAN MAXVALUE
);
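How a row finds its partition under this definition can be sketched with a binary search over the upper bounds. This is only a model of the `VALUES LESS THAN` rule for the table above, not MySQL's actual implementation; `partition_for` is a hypothetical helper.

```python
from bisect import bisect_right
from datetime import date

# Upper bounds from the PARTITION clauses; p2 (MAXVALUE) catches everything else.
UPPER_BOUNDS = [2012, 2013]
PARTITIONS = ["p0", "p1", "p2"]


def partition_for(joined: date) -> str:
    """Return the first partition whose bound is strictly greater than YEAR(joined)."""
    return PARTITIONS[bisect_right(UPPER_BOUNDS, joined.year)]


print(partition_for(date(2011, 6, 1)))   # year 2011 is below the first bound
print(partition_for(date(2012, 1, 1)))   # 2012 is not LESS THAN 2012
print(partition_for(date(2020, 3, 9)))   # beyond every explicit bound
```

The search only works because the bounds are sorted, which mirrors the requirement that range partitions are declared from lowest to highest.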
MySQL
Limitations
• Range partitions must be defined from lowest to highest value
• Foreign keys are not supported on partitioned tables
• No real-time partition creation
django-db-parti
Features:
• Real database-level partitioning
• Automatic new partition creation in real-time
• Django admin support
django-db-parti
Install from PyPI:
$ pip install django-db-parti
or clone from GitHub:
$ git clone git://github.com/maxtepkeev/django-db-parti.git
django-db-parti
Add dbparti to PYTHONPATH and installed applications:
INSTALLED_APPS = (
...
'dbparti'
)
django-db-parti
In models.py, add the import statement:
from dbparti.models import Partitionable
Make your model inherit from Partitionable:
class YourModelName(Partitionable):
django-db-parti
Add a Meta class to your model with a few settings:
class Meta(Partitionable.Meta):
    partition_type = 'range'
    partition_subtype = 'date'
    partition_range = 'month'
    partition_column = 'added'
Finally, set up the required database objects with the command:
$ python manage.py partition app_name
django-db-parti
Possible model settings
partition_type:
• range
partition_subtype:
• date
partition_range:
• day
• week
• month
• year
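To show what each partition_range value implies for a single row, the sketch below computes the time window that one partition would cover. It is an assumption-laden illustration, not django-db-parti's code: `partition_bounds` is a hypothetical helper, windows are taken as half-open, and weeks are assumed to start on Monday.

```python
from datetime import datetime, timedelta


def partition_bounds(moment: datetime, partition_range: str):
    """Return (start, end) of the half-open window that contains `moment`."""
    day = moment.replace(hour=0, minute=0, second=0, microsecond=0)
    if partition_range == "day":
        return day, day + timedelta(days=1)
    if partition_range == "week":
        # Assumption: weeks start on Monday (weekday() == 0).
        start = day - timedelta(days=day.weekday())
        return start, start + timedelta(days=7)
    if partition_range == "month":
        start = day.replace(day=1)
        if start.month == 12:
            return start, start.replace(year=start.year + 1, month=1)
        return start, start.replace(month=start.month + 1)
    if partition_range == "year":
        start = day.replace(month=1, day=1)
        return start, start.replace(year=start.year + 1)
    raise ValueError(f"unsupported partition_range: {partition_range!r}")


moment = datetime(2013, 8, 22, 17, 24, 43)
for unit in ("day", "week", "month", "year"):
    print(unit, partition_bounds(moment, unit))
```

A smaller range means more partitions with fewer rows each, so the choice depends on how much data arrives per period.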
django-db-parti
Customize how data is displayed in the Django admin
In admin.py, add the import statement:
from dbparti.admin import PartitionableAdmin
Make your model admin inherit from PartitionableAdmin:
class YourModelAdminName(PartitionableAdmin):
    partition_show = 'all'
django-db-parti
Possible model admin settings
partition_show:
• all (default)
• current
• previous
django-db-parti
Problems:
• Only range partitioning (datetime)
• Database backend limitations