MariaDB Temporal Tables
Federico Razzoli
€ whoami
● Federico Razzoli
● Freelance consultant
● Writing SQL since MySQL 2.23
● info@federico-razzoli.com
● I love open source, sharing,
collaboration, win-win, etc.
● I love MariaDB, MySQL, Postgres, etc
○ Even Db2, somehow
Methods for
Versioning Data
Data versioning… why?
Several reasons:
● Auditing
● Travel back in time
○ Which / how many products were we selling in Dec 2016?
● Track a row’s history
○ History of the relationship with a customer
● Compare today’s situation with 6 months ago
○ How many EU employees did we lose because of Brexit?
● Statistics on data changes
○ Sales trends
● Find correlations
○ Sales decrease because we invest less in web marketing
Example
SELECT * FROM users WHERE id = 24 \G
*************************** 1. row ***************************
id: 24
first_name: Jody
last_name: Whittaker
email: first_lady@doctorwho.co.uk
gender: F
birth_date: NULL
1 row in set (0.00 sec)
Method 1: track row versions
SELECT * FROM user_changes \G
*************************** 1. row ***************************
id: 1
first_name: Jody
last_name: Whittaker
email: first_lady@doctorwho.co.uk
gender: F
valid_from: 2018-10-07
valid_to: NULL
1 row in set (0.00 sec)
Method 1: track row versions
What we can do (easily):
● Undo a column change
● Undo an UPDATE/DELETE
● Get the full state of a row at a given time
● See how often a row changes
Harder to do:
● Audit changes
● See how often a value changed over time
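A minimal sketch of how Method 1 could be set up and queried by hand. The column types and the user_id column are assumptions (the slide only shows an id), as is the convention that valid_to is NULL for the current version:

```sql
-- Assumed layout: one row per version of a user (user_id is hypothetical).
CREATE TABLE user_changes (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id    INT UNSIGNED NOT NULL,
    first_name VARCHAR(50),
    last_name  VARCHAR(50),
    email      VARCHAR(100),
    gender     CHAR(1),
    valid_from DATE NOT NULL,
    valid_to   DATE NULL            -- NULL = still the current version
);

-- Full state of user 24 as it was on 2018-10-08.
SELECT *
FROM user_changes
WHERE user_id = 24
  AND valid_from <= '2018-10-08'
  AND (valid_to IS NULL OR valid_to > '2018-10-08');
```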
Method 2: track field changes
SELECT * FROM user_changes \G
*************************** 1. row ***************************
id: 1
user_id: 24
field: email
old_value: jody@gmail.com
new_value: first_lady@doctorwho.co.uk
valid_from: 2018-10-07
valid_to: NULL
1 row in set (0.00 sec)
Method 2: track field changes
What we can do (easily):
● Undo a column change
● Audit changes
● See how a certain value changed over time
Harder to do:
● Undo an UPDATE/DELETE
● Get/restore an old row version
● See how often a row changes over time
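With the field-level layout shown on the previous slide, the "easy" questions map to simple queries. A sketch, assuming the column names from that example output:

```sql
-- How the email of user 24 changed over time, oldest change first.
SELECT old_value, new_value, valid_from
FROM user_changes
WHERE user_id = 24
  AND field = 'email'
ORDER BY valid_from;

-- How often each field of user 24 was changed.
SELECT field, COUNT(*) AS changes
FROM user_changes
WHERE user_id = 24
GROUP BY field;
```

Rebuilding a whole row at a given time is the hard part here: it needs one lookup per field.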
System-Versioned Tables
They automagically implement Method 1 (track row versions)
● You INSERT, DELETE, UPDATE and SELECT data, getting the same
results you would get with a regular table
● Old versions of the rows are stored in the same (logical) table
● To get old data, you need to use a special syntax, like:
SELECT … AS OF TIMESTAMP '2018/01/01 16:30:00';
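A minimal end-to-end sketch of that behaviour in MariaDB. The product table and the @before_change variable are made up for illustration; the short WITH SYSTEM VERSIONING form lets the server create the period columns itself:

```sql
-- Hypothetical table; WITH SYSTEM VERSIONING enables history.
CREATE TABLE product (
    id    INT UNSIGNED NOT NULL PRIMARY KEY,
    price DECIMAL(10, 2) NOT NULL
) WITH SYSTEM VERSIONING;

INSERT INTO product (id, price) VALUES (1, 9.99);
SET @before_change := NOW(6);
UPDATE product SET price = 12.49 WHERE id = 1;

-- Regular query: only the current version of each row is returned.
SELECT id, price FROM product;

-- Historical query: the version that was current at @before_change.
SELECT id, price
FROM product FOR SYSTEM_TIME AS OF @before_change;
```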
System-Versioned Tables
Implementations
Where are sysver tables implemented?
In the proprietary DBMS world:
● Oracle 11g (2007)
● Db2 (2012)
● SQL Server 2016
Sometimes they are called Temporal Tables
In Db2, a temporal table can use system-period or application-period
Where are sysver tables implemented?
In the open source world:
● PostgreSQL, as an extension
● CockroachDB
● MariaDB 10.3
PostgreSQL and CockroachDB implementations have important
limitations
Where are sysver tables implemented?
In the NoSQL world:
● In HBase, rows have a version property
System-Versioned Tables
In MariaDB
Overview
● Implemented in MariaDB 10.3 (stable since May 2018)
● Rows get row-start and row-end Generated Columns
○ Type: TIMESTAMP(6) or DATETIME(6)
○ You decide the names, or omit them and get implicit ROW_START / ROW_END
○ The implicit ones are Invisible Columns (10.3 feature)
● Any storage engine
○ Except CONNECT (MDEV-15968)
ALTER TABLE
● Forbidden by default
○ Changes that only affect metadata also forbidden
○ Unless system_versioning_alter_history='KEEP'
● ADD COLUMN adds a column that is set to the current DEFAULT value
or NULL for all old version rows
○ When was the column added?
● DROP COLUMN also affects old versions of the rows
○ The column’s history is lost
● CHANGE COLUMN also affects old versions of the rows
○ The column’s history is modified
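A sketch of how an ALTER TABLE could be allowed for one session only. The phone column is a hypothetical example; system_versioning_alter_history accepts ERROR (the default) and KEEP:

```sql
-- Allow ALTER TABLE on system-versioned tables for this session only.
SET SESSION system_versioning_alter_history = 'KEEP';

-- Hypothetical new column: old row versions get the DEFAULT (here NULL),
-- so the history cannot tell when the column was added.
ALTER TABLE employee
    ADD COLUMN phone VARCHAR(20) NULL DEFAULT NULL;

-- Back to the default: ALTER TABLE on versioned tables fails again.
SET SESSION system_versioning_alter_history = 'ERROR';
```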
Example
CREATE OR REPLACE TABLE employee (
...
valid_from TIMESTAMP(6)
GENERATED ALWAYS AS ROW START
COMMENT 'When the row was INSERTed',
valid_to TIMESTAMP(6)
GENERATED ALWAYS AS ROW END
COMMENT 'When row was DELETEd or UPDATEd',
PERIOD FOR SYSTEM_TIME (valid_from, valid_to)
)
WITH SYSTEM VERSIONING,
ENGINE InnoDB;
Adding versioning to an existing table
ALTER TABLE employee
LOCK = SHARED,
ALGORITHM = COPY,
ADD COLUMN valid_from TIMESTAMP(6) GENERATED ALWAYS AS ROW START,
ADD COLUMN valid_to TIMESTAMP(6) GENERATED ALWAYS AS ROW END,
ADD PERIOD FOR SYSTEM_TIME(valid_from, valid_to),
ADD SYSTEM VERSIONING
;
● Notice ALGORITHM=COPY and LOCK=SHARED
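For comparison, a sketch of the short form, assuming employee is not versioned yet and has an id column. Here the server creates the period columns itself, so their names are not under your control:

```sql
-- Short form: MariaDB adds invisible ROW_START / ROW_END columns itself.
ALTER TABLE employee ADD SYSTEM VERSIONING;

-- The implicit columns are hidden from SELECT * but can be named explicitly.
SELECT id, ROW_START, ROW_END
FROM employee;
```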
Querying historical data: point in time
SELECT * FROM my_table FOR SYSTEM_TIME
● AS OF TIMESTAMP '2018-10-01 12:00:00'
● FROM '2018-10-01 00:00:00' TO '2018-11-01 00:00:00'
● BETWEEN (NOW() - INTERVAL 1 YEAR) AND NOW()
● ALL
SELECT * FROM my_table FOR SYSTEM_TIME ALL
WHERE valid_from < @end AND valid_to > @start;
Querying historical data: point in time
SELECT * FROM my_table FOR SYSTEM_TIME
● AS OF TIMESTAMP '2018-10-01 12:00:00'
● AS OF (SELECT valid_from FROM employee WHERE id=50)
● AS OF @some_event_timestamp
Querying historical data: time range
SELECT * FROM my_table FOR SYSTEM_TIME
● FROM '2018-10-01 00:00:00' TO '2018-11-01 00:00:00'
● FROM (SELECT ...) TO (SELECT ...)
● FROM @some_event TO @another_event
Querying historical data: time range
SELECT * FROM my_table FOR SYSTEM_TIME
● BETWEEN '2018-10-01 00:00:00' AND '2018-11-01 00:00:00'
● BETWEEN (NOW() - INTERVAL 1 YEAR) AND NOW()
● BETWEEN @some_event AND (SELECT ...)
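The two range forms differ only in how they treat the end bound: as far as I know, FROM ... TO excludes it and BETWEEN ... AND includes it. A sketch of the equivalent explicit predicates, assuming the period columns are named valid_from and valid_to and that @start and @end hold the bounds:

```sql
-- Like FROM @start TO @end: versions visible at some point in [@start, @end).
SELECT * FROM my_table FOR SYSTEM_TIME ALL
WHERE valid_from < @end AND valid_to > @start;

-- Like BETWEEN @start AND @end: a version starting exactly at @end is included.
SELECT * FROM my_table FOR SYSTEM_TIME ALL
WHERE valid_from <= @end AND valid_to > @start;
```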
Querying historical data: example
INSERT INTO customer (id, first_name, last_name, email) VALUES
(1, 'William', 'Hartnell', 'the_first@gmail.com'),
(2, 'Tom', 'Baker', 'tom.baker@gmail.com');
SET @beginning_of_time := NOW(6);
DELETE FROM customer WHERE id = 1;
UPDATE customer SET email = 'tom.baker@hotmail.com' WHERE id=2;
INSERT INTO customer (id, first_name, last_name, email) VALUES
(3, 'Peter', 'Capaldi', 'capaldi.petey@gmail.com');
SET @twelve_regeneration := NOW(6);
INSERT INTO customer (id, first_name, last_name, email) VALUES
(4, 'Jody', 'Whittaker', 'jody@gmail.com');
Querying historical data: example
SELECT id, first_name, last_name, valid_from, valid_to
FROM customer
FOR SYSTEM_TIME
BETWEEN @beginning_of_time AND @twelve_regeneration;
+----+------------+-----------+----------------------------+----------------------------+
| id | first_name | last_name | valid_from                 | valid_to                   |
+----+------------+-----------+----------------------------+----------------------------+
|  1 | William    | Hartnell  | 2018-11-04 13:50:33.627753 | 2018-11-04 13:50:33.633414 |
|  2 | Tom        | Baker     | 2018-11-04 13:50:33.627753 | 2018-11-04 13:50:33.638419 |
|  2 | Tom        | Baker     | 2018-11-04 13:50:33.638419 | 2038-01-19 03:14:07.999999 |
|  3 | Peter      | Capaldi   | 2018-11-04 13:50:33.644880 | 2038-01-19 03:14:07.999999 |
+----+------------+-----------+----------------------------+----------------------------+
Partitions
● By default, the history is stored together with current data
● You can put the history on separate partitions
● And limit them:
○ By number of rows
○ By time
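A sketch of the three layouts, using throwaway table and partition names. The LIMIT and INTERVAL values are arbitrary examples:

```sql
-- History in its own partition, current rows in another.
CREATE TABLE t_basic (x INT) WITH SYSTEM VERSIONING
    PARTITION BY SYSTEM_TIME (
        PARTITION p_hist HISTORY,
        PARTITION p_cur  CURRENT
    );

-- Move on to the next history partition after 100000 rows.
CREATE TABLE t_by_rows (x INT) WITH SYSTEM VERSIONING
    PARTITION BY SYSTEM_TIME LIMIT 100000 (
        PARTITION p0    HISTORY,
        PARTITION p1    HISTORY,
        PARTITION p_cur CURRENT
    );

-- Move on to the next history partition every week.
CREATE TABLE t_by_time (x INT) WITH SYSTEM VERSIONING
    PARTITION BY SYSTEM_TIME INTERVAL 1 WEEK (
        PARTITION p0    HISTORY,
        PARTITION p1    HISTORY,
        PARTITION p_cur CURRENT
    );
```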
Indexes
● The ROW END column is appended to UNIQUE indexes and the PK
● Other indexes are untouched
○ You may consider adding ROW END to some of your indexes
○ This is a good reason to define temporal columns explicitly
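A sketch of an index that includes the ROW END column, which is only possible by name when the temporal columns are declared explicitly. The order_demo table and its columns are hypothetical, and whether such an index pays off depends on the workload:

```sql
CREATE TABLE order_demo (
    id         INT UNSIGNED NOT NULL,
    status     VARCHAR(20) NOT NULL,
    valid_from TIMESTAMP(6) GENERATED ALWAYS AS ROW START,
    valid_to   TIMESTAMP(6) GENERATED ALWAYS AS ROW END,
    PERIOD FOR SYSTEM_TIME (valid_from, valid_to),
    PRIMARY KEY (id),
    -- Helps historical queries that filter on status and on the end of validity.
    INDEX idx_status_valid_to (status, valid_to)
) WITH SYSTEM VERSIONING;
```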
Practical
Use Cases
Basic Examples
● First examples take advantage of temporal columns to identify INSERTs,
DELETEs and UPDATEs
Get DELETEd rows
● Get canceled orders:
SET @eot := '2038-01-19 03:14:07.999999';
SELECT * FROM `order` FOR SYSTEM_TIME ALL
WHERE valid_to < @eot;
● Get orders canceled today:
SELECT COUNT(*)
FROM `order` FOR SYSTEM_TIME ALL
WHERE valid_to < @eot
AND valid_to BETWEEN
DATE(NOW()) AND (DATE(NOW()) + INTERVAL 1 DAY);
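A closed version (valid_to earlier than the maximum timestamp) is produced by UPDATEs as well as DELETEs. A sketch that keeps only orders that really were deleted, by requiring that no current version exists. The aliases h and cur are arbitrary:

```sql
-- Orders with history but no current row, and when their last version ended.
SELECT h.id, MAX(h.valid_to) AS deleted_at
FROM `order` FOR SYSTEM_TIME ALL AS h
WHERE NOT EXISTS (
    -- No FOR SYSTEM_TIME clause here: this reads current rows only.
    SELECT 1 FROM `order` AS cur WHERE cur.id = h.id
)
GROUP BY h.id;
```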
Get INSERTed rows
● Get orders generated today:
SELECT id, MIN(valid_from) AS insert_time
FROM `order` FOR SYSTEM_TIME ALL
GROUP BY id
HAVING DATE(MIN(valid_from)) = DATE(NOW())
ORDER BY MIN(valid_from);
Get UPDATEd rows
● How many times orders were modified:
SELECT
-- exclude the INSERTions
id, (COUNT(*) - 1) AS how_many_edits
FROM `order` FOR SYSTEM_TIME ALL
-- exclude DELETions
WHERE valid_to < @eot
GROUP BY id
ORDER BY how_many_edits;
Debug mistakes in a single row
● Wrong data has been found in a row. The original version of the row was
correct, so we want to know when the mistake (or malicious change)
happened
Debug mistakes in a single row
● Find all versions of a row:
SELECT id, status, valid_from, valid_to
FROM `order` FOR SYSTEM_TIME ALL
WHERE id = 24;
Debug mistakes in a single row
● When an order was blocked:
SELECT DATE(MIN(valid_from)) AS block_date
FROM `order` FOR SYSTEM_TIME ALL
WHERE status = 'BLOCKED'
AND id = 24;
Debug mistakes in a single row
● Status is wrong. To find out how the problem happened, we want to check the
INSERT and all following status changes:
SELECT id, status, valid_from, valid_to
FROM (
SELECT
NOT (status <=> @prev_status) AS status_changed,
@prev_status := status,
id, status, valid_from, valid_to
FROM `order` FOR SYSTEM_TIME ALL
WHERE id = 24
) t
WHERE status_changed = 1;
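The user-variable trick depends on the order in which rows and expressions are evaluated, which is not guaranteed. A sketch of the same idea with the LAG() window function (available since MariaDB 10.2), assuming the same `order` table:

```sql
SELECT id, status, valid_from, valid_to
FROM (
    SELECT
        id, status, valid_from, valid_to,
        -- Status of the previous version of the same row, NULL for the INSERT.
        LAG(status) OVER (PARTITION BY id ORDER BY valid_from) AS prev_status
    FROM `order` FOR SYSTEM_TIME ALL
    WHERE id = 24
) t
-- Keep the first version and every version whose status differs from the previous one.
WHERE NOT (status <=> prev_status)
ORDER BY valid_from;
```

If the very first status is NULL, the NULL-safe comparison drops the INSERT version, so this is a sketch for non-NULL statuses.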
Debug mistakes in a single row
● We want to know if the status is the same as one month ago:
SELECT
present.id,
present.status AS current_status,
past.status AS past_status
FROM `order` present
INNER JOIN `order`
FOR SYSTEM_TIME AS OF TIMESTAMP
NOW() - INTERVAL 1 MONTH
AS past
ON present.id = past.id
ORDER BY present.id;
Stats on data changes
SELECT
AVG(amount), STDDEV(amount),
MAX(amount), MIN(amount),
COUNT(amount)
FROM account FOR SYSTEM_TIME ALL
WHERE customer_id = 24
AND valid_from BETWEEN '2016-01-01' AND NOW()
GROUP BY customer_id;
Questions?
