M|18 Querying Data at a Previous Point in Time

Published in: Data & Analytics

  1. Querying Data at a Previous Point in Time
     Alexander Krizhanovsky
     Tempesta Technologies, Inc.
     ak@tempesta-tech.com
  2. Who am I?
     CEO & CTO at Tempesta Technologies
     Develop Tempesta FW – an open source hybrid of an HTTP accelerator and a firewall
     ● Web accelerator, load balancer, DDoS mitigation & Web security
     ● x3 faster than Nginx, 40% faster than a DPDK-based Web server
     ● Linux kernel HTTPS/TCP/IP stack
       https://netdevconf.org/2.1/session.html?krizhanovsky
     Custom software development:
     ● high performance network traffic processing
       e.g. WAF mentioned in Gartner magic quadrant
     ● Databases
  3. MariaDB System Versioning
     Commissioned by MariaDB Corporation
  4. SQL System Versioning
     SQL:2011
     The database can store all versions of stored records
     Applications:
     ● Point-in-time recovery
     ● Forensic discovery & legal requirements to store data for N years
     ● Data analysis (retrospective, trends etc.)
     MariaDB starting with 10.3.4
     ● https://mariadb.com/kb/en/library/system-versioned-tables/
  5. Keeping the history
        t                          t                       t
     +------+ update t set x=2; +------+ delete from t; +------+
     | x    |                   | x    |                | x    |
     +------+                   +------+                +------+
     |    1 |                   |    2 |                |    2 |
     +------+                   |    1 |                |    1 |
                                +------+                +------+
  6. Keeping the history
        t                          t                       t
     +------+ update t set x=2; +------+ delete from t; +------+
     | x    |                   | x    |                | x    |
     +------+                   +------+                +------+
     |    1 |                   |    2 |                |    2 |
     +------+                   |    1 |                |    1 |
                                +------+                +------+
     > select * from t;
     Empty set (0.00 sec)
  7. Getting the history
        t              t              t                     t
     +------+ trx_0 +------+ trx_1 +------+ ... trx_1000 +------+
     | x    |       | x    |       | x    |              | x    |
     +------+       +------+       +------+              +------+
     |    1 |       |    2 |       |    3 |              | 1000 |
     +------+       |    1 |       |    1 |              |    1 |
                    |      |       |    2 |              |    2 |
                    +------+       +------+              |    3 |
                      TS0            TS1                   ...
                                                         +------+
     > select * from t              +------+
         for system_time between    |    2 |  AS OF TS0
         timestamp TS0 and          |    3 |
         timestamp TS1;             +------+
  8. System Versioning vs Flashback
     Flashback (since 10.2.4)
       mysqlbinlog --flashback > dump.sql && mysql < dump.sql
     ● Pure binary log based point-in-time recovery mechanism
     ● Typically to recover recent changes (low performance)
     ● Multi-engine
     ● No DDL
     System Versioning
     ● Efficient queries & MVCC-like data analysis
     ● InnoDB & MyISAM fully supported; RocksDB, Aria must be tested
     ● Designed to survive DDL (in progress)
  9. Use cases
     Temporal data processing
     ● How has a sales opportunity fluctuated over time?
     ● Mine changes in client activity during a particular period of time
     ● Analyze trends in your staff changes
     Forensic analysis & legal requirements to store data for N years
     ● An audit requires a financial institution to report on changes made
       to a client's records during the past five years
     Point-in-time recovery
     ● A client inquiry reveals a data entry error involving the three-month
       introductory interest rate on a credit card. The bank needs to
       retroactively correct the error
  10. Sense of System Versioning: CREATE TABLE (SQL:2011)
      > create table t(x int,
          row_start timestamp(6) generated always as row start invisible,
          row_end timestamp(6) generated always as row end invisible,
          period for system_time(row_start, row_end)
        ) with system versioning;
  11. Sense of System Versioning: CREATE TABLE
      > create table t(x int) with system versioning;
  12. Sense of System Versioning
      > create table t(x int) with system versioning;
      > insert into t values (1);
      > set @ts = now(6);
      > insert into t values (2);
      > select * from t for system_time as of timestamp @ts;
      +------+
      | x    |
      +------+
      |    1 |
      +------+
  13. Sense of BETWEEN
      > create table t(x int) with system versioning;
      > insert into t values(1);
      > set @t0 = now(6);
      > update t set x = 2;
      > set @t1 = now(6);
      > delete from t;
      > select *,row_start,row_end from t
          for system_time between timestamp @t0 and timestamp @t1;
      +------+----------------------------+----------------------------+
      | x    | row_start                  | row_end                    |
      +------+----------------------------+----------------------------+
      |    2 | 2018-02-23 18:11:44.017902 | 2018-02-23 18:11:53.634389 |
      |    1 | 2018-02-23 18:06:57.559257 | 2018-02-23 18:11:44.017902 |
      +------+----------------------------+----------------------------+
  14. Point-in-time recovery
      > create table t(x int) with system versioning;
      > insert into t values(1);
      > select sleep(10);
      > delete from t;
      > insert into t select * from t
          for system_time as of (now(6) - interval 10 second);
      > select * from t;
      +------+
      | x    |
      +------+
      |    1 |
      +------+
  15. SQL workaround: a Point in Time Architecture
      https://www.simple-talk.com/sql/database-administration/database-design-a-point-in-time-architecture/
      INSERT: introduces column DateCreated
      DELETE: no actual deletes, introduces column DateEnd
      UPDATE: trigger
      ● UPDATE DateEnd for old record
      ● INSERT a new record
      SELECT: additional WHERE clause by <DateCreated, DateEnd>
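      A minimal sketch of this workaround, assuming a hypothetical `account`
      table. Only the column names DateCreated and DateEnd come from the slide;
      the table, its other columns and the variable names are illustrative. The
      UPDATE step is written as explicit statements inside a transaction rather
      than as a trigger, since MariaDB does not let a trigger modify the table
      it is defined on:

      ```sql
      -- Hypothetical table; DateEnd IS NULL marks the current version of a row.
      create table account (
        id          int           not null,
        balance     decimal(12,2) not null,
        DateCreated datetime(6)   not null,
        DateEnd     datetime(6)   null,
        primary key (id, DateCreated)
      );

      -- INSERT: stamp the creation time.
      insert into account values (1, 100.00, now(6), null);

      -- UPDATE: close the current version, then insert the new one.
      start transaction;
      set @now = now(6);
      update account set DateEnd = @now where id = 1 and DateEnd is null;
      insert into account values (1, 250.00, @now, null);
      commit;

      -- DELETE: no actual delete, only close the current version.
      update account set DateEnd = now(6) where id = 1 and DateEnd is null;

      -- SELECT: every query needs the extra WHERE clause on <DateCreated, DateEnd>.
      set @ts = now(6) - interval 1 hour;   -- assumed point in time of interest
      select id, balance
        from account
       where DateCreated <= @ts
         and (DateEnd is null or DateEnd > @ts);
      ```

      Every statement that touches the table has to repeat this bookkeeping,
      which is the application layer awareness listed as an issue on the next
      slide.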
  16. Point in Time Architecture
      Issues
      ● Application layer awareness
      ● Timestamps only
      ● Low performance
      ● Too complex
      ● Doesn’t survive DDLs
  17. Solutions on the market
      Mostly for point-in-time recovery
      Don’t survive DDL
      Oracle Flashback & IBM DB2
      ● History tables are generated from undo log => limited time to live
      ● Long history leads to performance issues
      MS SQL Server
      ● separate history tables
  18. MariaDB System Versioning
      Intended to survive DDL (for 2.0)
      As engine independent as possible
      ● SQL layer: DML & Queries
      ● InnoDB: transactional history (MVCC-like) only
      No changes are required from an application
      Standard dialect (what is defined)
      Too much data (use partitioning for separate disks)
  19. System versioned tables
      New invisible columns
      ● row_start - transaction ID which created the row
      ● row_end - transaction ID when the row died
      > create table t(x int primary key,
          row_start timestamp(6) generated always as row start invisible,
          row_end timestamp(6) generated always as row end invisible,
          period for system_time(row_start, row_end))
        with system versioning;
      > desc t;
      +-----------+--------------+------+-----+---------+-----------+
      | Field     | Type         | Null | Key | Default | Extra     |
      +-----------+--------------+------+-----+---------+-----------+
      | x         | int(11)      | NO   | PRI | NULL    |           |
      | row_start | timestamp(6) | NO   |     | NULL    | INVISIBLE |
      | row_end   | timestamp(6) | NO   | PRI | NULL    | INVISIBLE |
      +-----------+--------------+------+-----+---------+-----------+
  20. row_end in primary key
      Historical records can now have the same PK values
      +---+-----------+----------------------+
      | x | row_start | row_end              |
      +---+-----------+----------------------+
      | 1 |      5434 |                 5437 |  ← dead (history)
      | 1 |      5437 | 18446744073709551615 |
      +---+-----------+----------------------+
      DELETE and UPDATE now always update the PK
      PK constraints are always satisfied:
      +---+-----------+----------------------+
      | x | row_start | row_end              |
      +---+-----------+----------------------+
      | 1 |      5434 | 18446744073709551615 |  Wrong and impossible!
      | 1 |      5437 | 18446744073709551615 |
  21. Why timestamps aren’t enough?
      Forensic discovery and debugging may need a reliable answer:
      which transactions were visible to transaction X?
      ● However, we have begin timestamp, commit timestamp...
      Limited accuracy for many short concurrent transactions
      ● OS doesn’t guarantee strictly monotonically increasing time
      ● Different CPUs may have different time
      ● MVCC operates with transaction IDs
  22. Transactional System Versioning (InnoDB only)
      > create table t_trx(x int,
          t0 bigint unsigned generated always as row start,
          tx bigint unsigned generated always as row end,
          period for system_time(t0, tx)
        ) with system versioning;
      > insert into t_trx values(1);
      > insert into t_trx values(2);
      > select *,t0,tx from t_trx;
      +------+------+----------------------+
      | x    | t0   | tx                   |
      +------+------+----------------------+
      |    1 | 4046 | 18446744073709551615 |
      |    2 | 4049 | 18446744073709551615 |
      +------+------+----------------------+
  23. mysql.transaction_registry
      Maps trx_id to timestamp (for transaction history only)
      Updated on engine-independent layer through handler interface
      Very large
      Columns
      ● transaction_id - transaction ID
      ● commit_id - transaction commit ID (trx_id)
      ● begin_timestamp - timestamp of the beginning of the transaction
      ● commit_timestamp - timestamp of the commit of the transaction
      ● isolation_level - RR/S, RC/RU
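      As an illustration of the trx_id to timestamp mapping, a query along
      these lines finds the last transaction that had committed by a given
      moment. The column names are the ones listed above; the @ts variable is
      an assumption for the example, and the slides do not say whether the
      server performs its own lookup this way:

      ```sql
      -- Assumed point in time of interest.
      set @ts = now(6) - interval 1 hour;

      -- Latest transaction whose commit happened at or before @ts.
      select transaction_id, commit_id, commit_timestamp
        from mysql.transaction_registry
       where commit_timestamp <= @ts
       order by commit_timestamp desc, commit_id desc
       limit 1;
      ```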
  24. Begin & commit transaction IDs
      > select *,row_start,row_end from t for system_time all;
      +---+-----------+----------------------+
      | x | row_start | row_end              |
      +---+-----------+----------------------+
      | 1 |      5583 | 18446744073709551615 |
      +---+-----------+----------------------+
      > select * from mysql.transaction_registry
          where commit_timestamp > now(6) - interval 15 minute \G
      *************************** 1. row ***************************
        transaction_id: 5583
             commit_id: 5584
       begin_timestamp: 2018-02-25 06:37:42.190825
      commit_timestamp: 2018-02-25 06:37:42.191870
       isolation_level: REPEATABLE-READ
  25. Transaction history view
      Uses trx_id only to provide MVCC-consistent AS OF view
      Only works with InnoDB tables with transactional history
      create function TRX_SEES(TRX_ID1 bigint unsigned,
                               TRX_ID0 bigint unsigned)
      returns bool
      begin
        declare COMMIT_ID1 bigint unsigned default VTQ_COMMIT_ID(TRX_ID1);
        declare COMMIT_ID0 bigint unsigned default VTQ_COMMIT_ID(TRX_ID0);
        declare ISO_LEVEL1 enum('RR', 'RC') default VTQ_ISO_LEVEL(TRX_ID1);
        if TRX_ID1 > COMMIT_ID0 then
          return true;
        end if;
        if COMMIT_ID1 > COMMIT_ID0 and ISO_LEVEL1 = 'RC' then
          return true;
        end if;
        return false;
      end
  26. SELECT
      JOIN::prepare, i.e. system versioning queries are optimized
      Adds WHERE clause for time-related information
      ● row_end = Inf for current data
      transaction_registry is used to convert timestamps to trx_id
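      A sketch of what the added WHERE clause amounts to for a timestamp-based
      table t, assuming the usual SQL:2011 reading of AS OF (a row is visible
      when row_start <= T < row_end). The @ts variable is illustrative, and the
      explicit predicates are an approximation of the server's rewrite, not a
      copy of it:

      ```sql
      set @ts = now(6) - interval 1 day;   -- assumed point in time

      -- Syntax sugar:
      select * from t for system_time as of timestamp @ts;

      -- Roughly the same rows, with the time filter spelled out by hand:
      select * from t for system_time all
       where row_start <= @ts and row_end > @ts;

      -- A plain "select * from t" returns only current rows, i.e. those whose
      -- row_end still holds the maximum ("Inf") value:
      select * from t for system_time all
       where row_end > now(6);
      ```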
  27. SELECT: track the rows
      > select x, sys_trx_start as start, commit_id as commit,
               sys_trx_end as end, begin_timestamp, commit_timestamp
          from t for system_time all
          join mysql.transaction_registry as vtq
            on vtq.transaction_id = t.sys_trx_start
          where x < 10;
      +---+-------+--------+----------------------+----------------------------+----------------------------+
      | x | start | commit | end                  | begin_timestamp            | commit_timestamp           |
      +---+-------+--------+----------------------+----------------------------+----------------------------+
      | 3 |  3033 |   3034 | 18446744073709551615 | 2017-04-12 01:05:55.861774 | 2017-04-12 01:05:55.864698 |
      | 2 |  3026 |   3027 |                 3033 | 2017-04-12 01:00:32.275002 | 2017-04-12 01:00:32.278337 |
      | 1 |  3024 |   3025 |                 3026 | 2017-04-12 01:00:23.585170 | 2017-04-12 01:00:23.596620 |
      +---+-------+--------+----------------------+----------------------------+----------------------------+
  28. Transactional System Versioning: SELECT (syntax sugar)
      -- standard syntax
      > select *,t0,tx from t_trx for system_time as of transaction 4046;
      +------+------+----------------------+
      | x    | t0   | tx                   |
      +------+------+----------------------+
      |    1 | 4046 | 18446744073709551615 |
      +------+------+----------------------+
      -- ...the same (where t0 > 4045 and t0 < 4048 also works)
      > select *,t0,tx from t_trx where t0 = 4046;
      +------+------+----------------------+
      | x    | t0   | tx                   |
      +------+------+----------------------+
      |    1 | 4046 | 18446744073709551615 |
      +------+------+----------------------+
  29. Select all historical records
      > select x as dead_rows from t
          for system_time all where row_end < now(6);
      +-----------+
      | dead_rows |
      +-----------+
      |         1 |
      +-----------+
  30. Range queries
      > select *,row_start,row_end from t for system_time
          between timestamp (now(6) - interval 1 month) and now(6);
      +------+------+-----------+---------+
      | x    | y    | row_start | row_end |
      +------+------+-----------+---------+
      |    7 | NULL |      2922 |    2938 |
      +------+------+-----------+---------+
  31. Range queries
      > select *,row_start,row_end from t for system_time
          between timestamp (now(6) - interval 1 month) and now(6);
      +------+------+-----------+---------+
      | x    | y    | row_start | row_end |
      +------+------+-----------+---------+
      |    7 | NULL |      2922 |    2938 |
      +------+------+-----------+---------+
      > select *,row_start,row_end from t for system_time
          from transaction 2974 to transaction 2986;
      +------+------+-----------+---------+
      | x    | y    | row_start | row_end |
      +------+------+-----------+---------+
      |   44 | NULL |      2965 |    2986 |
      +------+------+-----------+---------+
  32. FROM...TO vs BETWEEN
      > select *,row_start,row_end from t for system_time
          between transaction 0 and transaction 3033;
      +---+-----------+----------------------+
      | x | row_start | row_end              |
      +---+-----------+----------------------+
      | 1 |      3024 |                 3026 |
      | 2 |      3026 |                 3033 |
      | 3 |      3033 | 18446744073709551615 |
      +---+-----------+----------------------+
      > select *,row_start,row_end from t for system_time
          from transaction 0 to transaction 3033;
      +---+-----------+---------+
      | x | row_start | row_end |
      +---+-----------+---------+
      | 1 |      3024 |    3026 |
      | 2 |      3026 |    3033 |
      +---+-----------+---------+
      Required by the standard
      Might be useful to know
      Changes during a period
      state before a disaster
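      The only difference between the two forms is whether the upper bound is
      included, which is why the row with row_start = 3033 shows up for BETWEEN
      but not for FROM...TO. A sketch of the two filters written out by hand
      for a timestamp-based table t, assuming the usual SQL:2011 definitions
      (the @t0 and @t1 variables are illustrative):

      ```sql
      set @t0 = now(6) - interval 1 month;   -- assumed lower bound
      set @t1 = now(6);                      -- assumed upper bound

      -- BETWEEN @t0 AND @t1: the upper bound is inclusive.
      select x from t for system_time all
       where row_start <= @t1 and row_end > @t0;

      -- FROM @t0 TO @t1: the upper bound is exclusive.
      select x from t for system_time all
       where row_start < @t1 and row_end > @t0;
      ```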
  33. Range queries: MyISAM
      > select *,row_start,row_end from my_t
          for system_time between timestamp 0 and timestamp now(6);
      +---+----------------------------+----------------------------+
      | x | row_start                  | row_end                    |
      +---+----------------------------+----------------------------+
      | 1 | 2017-04-12 00:10:47.099814 | 2038-01-19 06:14:07.000000 |
      +---+----------------------------+----------------------------+
      > select *,row_start,row_end from my_t
          for system_time from transaction 0 to transaction 10000;
      ERROR 4109 (HY000): Transaction system versioning for `my_t` is not supported
  34. INSERT
      New record
      ● row_start = current timestamp
      ● row_end = 2038-01-19 06:14:07.999999
      New record (transactional history):
      ● row_start = trx_id
      ● row_end = Inf
  35. DELETE
      UPDATE
      Moves the record to history:
      ● row_end = current timestamp | trx_id
        (as of begin of the transaction)
      Cannot be used for historical data
  36. UPDATE
      UPDATE + INSERT
      New history record:
      ● Copy the record to history
      ● row_end = current timestamp | trx_id
        (as of begin of the transaction)
      New record:
      ● row_start = current timestamp | trx_id
      ● row_end = 2038-01-19 06:14:07.999999 | Inf
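      The INSERT, UPDATE and DELETE rules above can be observed on a small
      timestamp-based table. This is only a sketch: the table name dml_demo
      and the is_current alias are made up for the example.

      ```sql
      create table dml_demo (x int) with system versioning;  -- hypothetical table

      insert into dml_demo values (1);  -- new current row, row_end = max timestamp
      update dml_demo set x = 2;        -- x=1 is copied to history, x=2 is current
      delete from dml_demo;             -- x=2 is moved to history, nothing is current

      -- Both versions are still stored. After the DELETE neither of them is
      -- current, so row_end is in the past for each row.
      select x, row_start, row_end, row_end > now(6) as is_current
        from dml_demo for system_time all
       order by row_start;
      ```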
  37. History partitioning
      > create table t (x int) with system versioning
          partition by system_time interval 1 month
          subpartition by key(x) subpartitions 4 (
            partition p0 history,
            partition p1 history,
            partition pnow current);
      By time interval or by number of records (e.g. limit 1000)
      Partition pruning for history range
      Another way to get all history records:
      > select *,row_start,row_end from t partition(p0,p1);
  38. History purging
      > delete history from t before system_time '2018-02-23 21:36';
      > delete history from t;
      > alter table t drop partition p0;
      > alter table t drop partition p1;
      ERROR 4126 (HY000): Wrong partitions for `t`: must have at least one
      HISTORY and exactly one last CURRENT
  39. ALTER System Versioning
      > create table t (x int);
      > insert into t values(1);
      > alter table t add system versioning;
      > update t set x=2;
      > alter table t drop system versioning;
      -- historical data was dropped
      > select * from t;
      +------+
      | x    |
      +------+
      |    2 |
      +------+
  40. Per-column history
      > create table t (x int) with system versioning;
      > insert into t(x) values(1); update t set x=2;
      > set @@system_versioning_alter_history='keep';
      > alter table t add y int without system versioning;
      > insert into t(x,y) values(3,3);
      > update t set x=4;
      > update t set y=5;
      > select *,row_end from t for system_time all;
      +------+------+----------------------------+
      | x    | y    | row_end                    |
      +------+------+----------------------------+
      |    1 | NULL | 2018-02-24 16:20:30.323272 |
      |    2 | NULL | 2018-02-24 16:22:08.685693 |
      |    3 |    3 | 2018-02-24 16:22:08.685693 |
      |    4 |    5 | 2038-01-19 06:14:07.999999 |
      |    4 |    5 | 2038-01-19 06:14:07.999999 |
      +------+------+----------------------------+
  41. Foreign keys
      > create table p (x int unique key);
      > create table c (px int, foreign key(px) references p(x))
          with system versioning;
      > insert into p values(1);
      > insert into c values(1);
      > delete from c;
      > delete from p;
      > select * from c for system_time all;
      +----+
      | px |
      +----+
      |  1 |
      +----+
  42. Backups
      Fully compatible with MariaDB Backup
      Dump & restore lose the history
  43. Further extensions
      DDL survival (in progress)
      https://github.com/tempesta-tech/mariadb/milestone/15
      Audit plugin:
      https://github.com/tempesta-tech/mariadb/issues/138
      Other storage engines – need to test
      https://github.com/tempesta-tech/mariadb/issues/323
      https://github.com/tempesta-tech/mariadb/issues/345
      Application-time period tables (?)
  44. DDL survival
      TBD: https://github.com/tempesta-tech/mariadb/wiki/DDL-Survival
      In progress: persistent history (tables renaming)
      Versioned Tracking Metadata (VTMD) table:
      ● trx_id_start - transaction which generated a table
      ● trx_id_end - transaction which generated a new version
      ● original_name - original name of the table before the transaction trx_id_start
      ● new_name - new name of the table
      ● col_renames - blob with new to old column name mappings
      Multi-schema SELECT
  45. Application-time period tables (we’re open for requests)
      > create table emp(id int, d_start date, d_end date, dept varchar(30),
          period for e_period(d_start, d_end));
      > insert into emp values (1, '2016-01-01', '2038-01-19', 'sales');
      > update emp for portion of e_period
          from date '2017-03-15' to date '2017-07-15'
          set dept = 'engineering' where id = 1;
      +----+------------+------------+-------------+
      | id | d_start    | d_end      | dept        |
      +----+------------+------------+-------------+
      |  1 | 2016-01-01 | 2017-03-15 | sales       |
      |  1 | 2017-03-15 | 2017-07-15 | engineering |
      |  1 | 2017-07-15 | 2038-01-19 | sales       |
      +----+------------+------------+-------------+
  46. Questions?
      Thanks to:
      ● MariaDB (request, discussions, review)
      ● Alexey Midenkov
      ● Eugene Kosov
      E-mail: ak@tempesta-tech.com
      Tempesta FW – the fastest and secure HTTP accelerator:
      https://github.com/tempesta-tech/tempesta
  47. Replication
      Timestamp-based
      ● SBR, RBR, Galera – same as for ordinary tables
      Transaction-based (InnoDB)
      ● SBR only
      ● RBR for system versioned tables is automatically switched to SBR
        (like mixed replication)
  48. Cascade foreign keys (https://jira.mariadb.org/browse/MDEV-15364)
      > create table p (x int primary key);
      > create table c (px int,
          foreign key (px) references p(x)
            on delete cascade on update cascade)
        with system versioning;
      > insert into p values (1);
      > insert into c values (1);
      > update p set x = 2;
      > select *,row_start,row_end from c for system_time all;
      +------+----------------------------+----------------------------+
      | px   | row_start                  | row_end                    |
      +------+----------------------------+----------------------------+
      |    2 | 2018-02-25 01:31:59.070080 | 2038-01-19 06:14:07.999999 |
      +------+----------------------------+----------------------------+