M|18 Querying Data at a Previous Point in Time

Published in: Data & Analytics

  1. 1. Querying Data at a Previous Point in Time
     Alexander Krizhanovsky
     Tempesta Technologies, Inc.
     ak@tempesta-tech.com
  2. 2. Who am I?
     CEO & CTO at Tempesta Technologies
     Develop Tempesta FW – an open source hybrid of an HTTP accelerator and a firewall
     ● Web accelerator, load balancer, DDoS mitigation & Web security
     ● x3 faster than Nginx, 40% faster than a DPDK-based Web server
     ● Linux kernel HTTPS/TCP/IP stack https://netdevconf.org/2.1/session.html?krizhanovsky
     Custom software development:
     ● high performance network traffic processing, e.g. a WAF mentioned in the Gartner Magic Quadrant
     ● Databases
  3. 3. MariaDB System Versioning Commissioned by MariaDB Corporation
  4. 4. SQL System Versioning
     SQL:2011
     The database can store all versions of stored records
     Applications:
     ● Point-in-time recovery
     ● Forensic discovery & legal requirements to store data for N years
     ● Data analysis (retrospective, trends etc.)
     MariaDB starting with 10.3.4
     ● https://mariadb.com/kb/en/library/system-versioned-tables/
  5. 5. Keeping the history
         t                            t                         t
         +------+  update t set x=2;  +------+  delete from t;  +------+
         | x    |                     | x    |                  | x    |
         +------+                     +------+                  +------+
         |    1 |                     |    2 |                  |    2 |
         +------+                     |    1 |                  |    1 |
                                      +------+                  +------+
  6. 6. Keeping the history
         t                            t                         t
         +------+  update t set x=2;  +------+  delete from t;  +------+
         | x    |                     | x    |                  | x    |
         +------+                     +------+                  +------+
         |    1 |                     |    2 |                  |    2 |
         +------+                     |    1 |                  |    1 |
                                      +------+                  +------+
         > select * from t;
         Empty set (0.00 sec)
  7. 7. Getting the history
         t                  t                  t                       t
         +------+   trx_0   +------+   trx_1   +------+  ... trx_1000  +------+
         | x    |           | x    |           | x    |                | x    |
         +------+           +------+           +------+                +------+
         |    1 |           |    2 |           |    3 |                | 1000 |
         +------+           |    1 |           |    1 |                |    1 |
                            |      |           |    2 |                |    2 |
                            +------+           +------+                |    3 |
                                                                       +------+
                            TS0                TS1         ...
         > select * from t for system_time between timestamp TS0 and timestamp TS1;
         +------+
         |    2 |  AS OF TS0
         |    3 |
         +------+
  8. 8. System Versioning vs Flashback
     Flashback (since 10.2.4)
         mysqlbinlog --flashback > dump.sql && mysql < dump.sql
     ● Pure binary log based point-in-time recovery mechanism
     ● Typically to recover recent changes (low performance)
     ● Multi-engine
     ● No DDL
     System Versioning
     ● Efficient queries & MVCC-like data analysis
     ● InnoDB & MyISAM fully supported; RocksDB, Aria must be tested
     ● Designed to survive DDL (in progress)
  9. 9. Use cases
     Temporal data processing
     ● How has a sales opportunity fluctuated over time?
     ● Mine changes in client activity during a particular period of time
     ● Analyze trends in your staff changes
     Forensic analysis & legal requirements to store data for N years
     ● An audit requires a financial institution to report on changes made to a client's records during the past five years
     Point-in-time recovery
     ● A client inquiry reveals a data entry error involving the three-month introductory interest rate on a credit card. The bank needs to retroactively correct the error
  10. 10. Sense of System Versioning: CREATE TABLE (SQL:2011)
          > create table t(x int,
              row_start timestamp(6) generated always as row start invisible,
              row_end timestamp(6) generated always as row end invisible,
              period for system_time(row_start, row_end)
            ) with system versioning;
  11. 11. Sense of System Versioning: CREATE TABLE > create table t(x int) with system versioning;
  12. 12. Sense of System Versioning
          > create table t(x int) with system versioning;
          > insert into t values (1);
          > set @ts = now(6);
          > insert into t values (2);
          > select * from t for system_time as of timestamp @ts;
          +------+
          | x    |
          +------+
          |    1 |
          +------+
  13. 13. Sense of BETWEEN
          > create table t(x int) with system versioning;
          > insert into t values(1);
          > set @t0 = now(6);
          > update t set x = 2;
          > set @t1 = now(6);
          > delete from t;
          > select *,row_start,row_end from t
              for system_time between timestamp @t0 and timestamp @t1;
          +------+----------------------------+----------------------------+
          | x    | row_start                  | row_end                    |
          +------+----------------------------+----------------------------+
          |    2 | 2018-02-23 18:11:44.017902 | 2018-02-23 18:11:53.634389 |
          |    1 | 2018-02-23 18:06:57.559257 | 2018-02-23 18:11:44.017902 |
          +------+----------------------------+----------------------------+
  14. 14. Point-in-time recovery
          > create table t(x int) with system versioning;
          > insert into t values(1);
          > select sleep(10);
          > delete from t;
          > insert into t select * from t
              for system_time as of (now(6) - interval 10 second);
          > select * from t;
          +------+
          | x    |
          +------+
          |    1 |
          +------+
  15. 15. SQL workaround: a Point in Time Architecture
      https://www.simple-talk.com/sql/database-administration/database-design-a-point-in-time-architecture/
      INSERT: introduces column DateCreated
      DELETE: no actual deletes, introduces column DateEnd
      UPDATE: trigger
      ● UPDATE DateEnd for old record
      ● INSERT a new record
      SELECT: additional WHERE clause by <DateCreated, DateEnd>
  16. 16. Point in Time Architecture Issues
      ● Application layer awareness
      ● Timestamps only
      ● Low performance
      ● Too complex
      ● Doesn’t survive DDLs
  17. 17. Solutions on the market
      Mostly for point-in-time recovery
      Don’t survive DDL
      Oracle Flashback & IBM DB2
      ● History tables are generated from undo log => limited time to live
      ● Long history leads to performance issues
      MS SQL Server
      ● separate history tables
  18. 18. MariaDB System Versioning
      Intended to survive DDL (for 2.0)
      As engine independent as possible
      ● SQL layer: DML & Queries
      ● InnoDB: transactional history (MVCC-like) only
      No changes are required from an application
      Standard dialect (what is defined)
      Too much data (use partitioning for separate disks)
  19. 19. System versioned tables
      New invisible columns
      ● row_start - timestamp or transaction ID that created the row
      ● row_end - timestamp or transaction ID at which the row died
          > create table t(x int primary key,
              row_start timestamp(6) generated always as row start invisible,
              row_end timestamp(6) generated always as row end invisible,
              period for system_time(row_start, row_end))
            with system versioning;
          > desc t;
          +-----------+--------------+------+-----+---------+-----------+
          | Field     | Type         | Null | Key | Default | Extra     |
          +-----------+--------------+------+-----+---------+-----------+
          | x         | int(11)      | NO   | PRI | NULL    |           |
          | row_start | timestamp(6) | NO   |     | NULL    | INVISIBLE |
          | row_end   | timestamp(6) | NO   | PRI | NULL    | INVISIBLE |
          +-----------+--------------+------+-----+---------+-----------+
  20. 20. row_end in primary key
      Historical records now can have the same PK values
          +---+-----------+----------------------+
          | x | row_start | row_end              |
          +---+-----------+----------------------+
          | 1 |      5434 |                 5437 |  ← dead (history)
          | 1 |      5437 | 18446744073709551615 |
          +---+-----------+----------------------+
      DELETE and UPDATE now always update the PK
      PK constraints are always satisfied:
          +---+-----------+----------------------+
          | x | row_start | row_end              |
          +---+-----------+----------------------+
          | 1 |      5434 | 18446744073709551615 |  Wrong and impossible!
          | 1 |      5437 | 18446744073709551615 |
          +---+-----------+----------------------+
  21. 21. Why aren’t timestamps enough?
      Forensic discovery and debugging may need a reliable answer to the question: which transactions were visible to transaction X?
      ● However, we have a begin timestamp, a commit timestamp...
      Limited accuracy for many short concurrent transactions
      ● The OS doesn’t guarantee strictly monotonically increasing time
      ● Different CPUs may have different time
      ● MVCC operates with transaction IDs
  22. 22. Transactional System Versioning (InnoDB only)
          > create table t_trx(x int,
              t0 bigint unsigned generated always as row start,
              tx bigint unsigned generated always as row end,
              period for system_time(t0, tx)
            ) with system versioning;
          > insert into t_trx values(1);
          > insert into t_trx values(2);
          > select *,t0,tx from t_trx;
          +------+------+----------------------+
          | x    | t0   | tx                   |
          +------+------+----------------------+
          |    1 | 4046 | 18446744073709551615 |
          |    2 | 4049 | 18446744073709551615 |
          +------+------+----------------------+
  23. 23. mysql.transaction_registry
      Maps trx_id to timestamp (for transaction history only)
      Updated on the engine-independent layer through the handler interface
      Very large
      Columns
      ● transaction_id – transaction ID
      ● commit_id – transaction commit ID (trx_id)
      ● begin_timestamp – timestamp of the beginning of the transaction
      ● commit_timestamp – timestamp of the commit of the transaction
      ● isolation_level – RR/S, RC/RU
  24. 24. Begin & commit transaction IDs
          > select *,row_start,row_end from t for system_time all;
          +---+-----------+----------------------+
          | x | row_start | row_end              |
          +---+-----------+----------------------+
          | 1 |      5583 | 18446744073709551615 |
          +---+-----------+----------------------+
          > select * from mysql.transaction_registry
              where commit_timestamp > now(6) - interval 15 minute \G
          *************************** 1. row ***************************
            transaction_id: 5583
                 commit_id: 5584
           begin_timestamp: 2018-02-25 06:37:42.190825
          commit_timestamp: 2018-02-25 06:37:42.191870
           isolation_level: REPEATABLE-READ
  25. 25. Transaction history view
      Uses trx_id only to provide MVCC-consistent AS OF view
      Only works with InnoDB tables with transactional history
          create function TRX_SEES(TRX_ID1 bigint unsigned, TRX_ID0 bigint unsigned)
          returns bool
          begin
            declare COMMIT_ID1 bigint unsigned default VTQ_COMMIT_ID(TRX_ID1);
            declare COMMIT_ID0 bigint unsigned default VTQ_COMMIT_ID(TRX_ID0);
            declare ISO_LEVEL1 enum('RR', 'RC') default VTQ_ISO_LEVEL(TRX_ID1);
            if TRX_ID1 > COMMIT_ID0 then
              return true;
            end if;
            if COMMIT_ID1 > COMMIT_ID0 and ISO_LEVEL1 = 'RC' then
              return true;
            end if;
            return false;
          end
  26. 26. SELECT
      JOIN::prepare, i.e. system versioning queries are optimized
      Adds WHERE clause for time-related information
      ● row_end = Inf for current data
      transaction_registry is used to convert timestamps to trx_id
  27. 27. SELECT: track the rows
          > select x, sys_trx_start as start, commit_id as commit, sys_trx_end as end,
                   begin_timestamp, commit_timestamp
              from t for system_time all
              join mysql.transaction_registry as vtq
                on vtq.transaction_id = t.sys_trx_start
              where x < 10;
          +---+-------+--------+----------------------+----------------------------+----------------------------+
          | x | start | commit | end                  | begin_timestamp            | commit_timestamp           |
          +---+-------+--------+----------------------+----------------------------+----------------------------+
          | 3 |  3033 |   3034 | 18446744073709551615 | 2017-04-12 01:05:55.861774 | 2017-04-12 01:05:55.864698 |
          | 2 |  3026 |   3027 |                 3033 | 2017-04-12 01:00:32.275002 | 2017-04-12 01:00:32.278337 |
          | 1 |  3024 |   3025 |                 3026 | 2017-04-12 01:00:23.585170 | 2017-04-12 01:00:23.596620 |
          +---+-------+--------+----------------------+----------------------------+----------------------------+
  28. 28. Transactional System Versioning: SELECT (syntax sugar)
          -- standard syntax
          > select *,t0,tx from t_trx for system_time as of transaction 4046;
          +------+------+----------------------+
          | x    | t0   | tx                   |
          +------+------+----------------------+
          |    1 | 4046 | 18446744073709551615 |
          +------+------+----------------------+
          -- ...the same (where t0 > 4045 and t0 < 4048 also works)
          > select *,t0,tx from t_trx where t0 = 4046;
          +------+------+----------------------+
          | x    | t0   | tx                   |
          +------+------+----------------------+
          |    1 | 4046 | 18446744073709551615 |
          +------+------+----------------------+
  29. 29. Select all historical records
          > select x as dead_rows from t for system_time all where row_end < now(6);
          +-----------+
          | dead_rows |
          +-----------+
          |         1 |
          +-----------+
  30. 30. Range queries
          > select *,row_start,row_end from t
              for system_time between timestamp (now(6) - interval 1 month) and now(6);
          +------+------+-----------+---------+
          | x    | y    | row_start | row_end |
          +------+------+-----------+---------+
          |    7 | NULL |      2922 |    2938 |
          +------+------+-----------+---------+
  31. 31. Range queries
          > select *,row_start,row_end from t
              for system_time between timestamp (now(6) - interval 1 month) and now(6);
          +------+------+-----------+---------+
          | x    | y    | row_start | row_end |
          +------+------+-----------+---------+
          |    7 | NULL |      2922 |    2938 |
          +------+------+-----------+---------+
          > select *,row_start,row_end from t
              for system_time from transaction 2974 to transaction 2986;
          +------+------+-----------+---------+
          | x    | y    | row_start | row_end |
          +------+------+-----------+---------+
          |   44 | NULL |      2965 |    2986 |
          +------+------+-----------+---------+
  32. 32. FROM...TO vs BETWEEN
          > select *,row_start,row_end from t
              for system_time between transaction 0 and transaction 3033;
          +---+-----------+----------------------+
          | x | row_start | row_end              |
          +---+-----------+----------------------+
          | 1 |      3024 |                 3026 |
          | 2 |      3026 |                 3033 |
          | 3 |      3033 | 18446744073709551615 |
          +---+-----------+----------------------+
          > select *,row_start,row_end from t
              for system_time from transaction 0 to transaction 3033;
          +---+-----------+---------+
          | x | row_start | row_end |
          +---+-----------+---------+
          | 1 |      3024 |    3026 |
          | 2 |      3026 |    3033 |
          +---+-----------+---------+
      Required by the standard
      Might be useful to know:
      ● Changes during a period
      ● State before a disaster
  33. 33. Range queries: MyISAM
          > select *,row_start,row_end from my_t
              for system_time between timestamp 0 and timestamp now(6);
          +---+----------------------------+----------------------------+
          | x | row_start                  | row_end                    |
          +---+----------------------------+----------------------------+
          | 1 | 2017-04-12 00:10:47.099814 | 2038-01-19 06:14:07.000000 |
          +---+----------------------------+----------------------------+
          > select *,row_start,row_end from my_t
              for system_time from transaction 0 to transaction 10000;
          ERROR 4109 (HY000): Transaction system versioning for `my_t` is not supported
  34. 34. INSERT
      New record
      ● row_start = current timestamp
      ● row_end = 2038-01-19 06:14:07.999999
      New record (transactional history):
      ● row_start = trx_id
      ● row_end = Inf
  35. 35. DELETE
      UPDATE
      Moves the record to history:
      ● row_end = current timestamp | trx_id (as of the beginning of the transaction)
      Cannot be used for historical data
  36. 36. UPDATE
      UPDATE + INSERT
      New history record:
      ● Copy the record to history
      ● row_end = current timestamp | trx_id (as of the beginning of the transaction)
      New record:
      ● row_start = current timestamp | trx_id
      ● row_end = Inf | 2038-01-19 06:14:07.999999
  37. 37. History partitioning
          > create table t (x int) with system versioning
              partition by system_time interval 1 month
              subpartition by key(x) subpartitions 4 (
                partition p0 history,
                partition p1 history,
                partition pnow current);
      By time interval or by a limit on the number of records (e.g. limit 1000)
      Partition pruning for history range
      Another way to get all history records:
          > select *,row_start,row_end from t partition(p0,p1);
  38. 38. History purging
          > delete history from t before system_time '2018-02-23 21:36';
          > delete history from t;
          > alter table t drop partition p0;
          > alter table t drop partition p1;
          ERROR 4126 (HY000): Wrong partitions for `t`: must have at least one HISTORY and exactly one last CURRENT
  39. 39. ALTER System Versioning
          > create table t (x int);
          > insert into t values(1);
          > alter table t add system versioning;
          > update t set x=2;
          > alter table t drop system versioning; -- historical data was dropped
          > select * from t;
          +------+
          | x    |
          +------+
          |    2 |
          +------+
  40. 40. Per-column history
          > create table t (x int) with system versioning;
          > insert into t(x) values(1);
          > update t set x=2;
          > set @@system_versioning_alter_history='keep';
          > alter table t add y int without system versioning;
          > insert into t(x,y) values(3,3);
          > update t set x=4;
          > update t set y=5;
          > select *,row_end from t for system_time all;
          +------+------+----------------------------+
          | x    | y    | row_end                    |
          +------+------+----------------------------+
          |    1 | NULL | 2018-02-24 16:20:30.323272 |
          |    2 | NULL | 2018-02-24 16:22:08.685693 |
          |    3 |    3 | 2018-02-24 16:22:08.685693 |
          |    4 |    5 | 2038-01-19 06:14:07.999999 |
          |    4 |    5 | 2038-01-19 06:14:07.999999 |
          +------+------+----------------------------+
  41. 41. Foreign keys
          > create table p (x int unique key);
          > create table c (px int, foreign key(px) references p(x))
              with system versioning;
          > insert into p values(1);
          > insert into c values(1);
          > delete from c;
          > delete from p;
          > select * from c for system_time all;
          +----+
          | px |
          +----+
          |  1 |
          +----+
  42. 42. Backups
      Fully compatible with MariaDB Backup
      Dump & restore lose the history
  43. 43. Further extensions
      DDL survival (in progress) https://github.com/tempesta-tech/mariadb/milestone/15
      Audit plugin: https://github.com/tempesta-tech/mariadb/issues/138
      Other storage engines – need to test
      https://github.com/tempesta-tech/mariadb/issues/323
      https://github.com/tempesta-tech/mariadb/issues/345
      Application-time period tables (?)
  44. 44. DDL survival
      TBD: https://github.com/tempesta-tech/mariadb/wiki/DDL-Survival
      In progress: persistent history (tables renaming)
      Versioned Tracking Metadata (VTMD) table:
      ● trx_id_start - transaction which generated a table
      ● trx_id_end - transaction which generated a new version
      ● original_name - original name of the table before the transaction trx_id_start
      ● new_name - new name of the table
      ● col_renames - blob with new to old column name mappings
      Multi-schema SELECT
  45. 45. Application-time period tables (we’re open for requests)
          > create table emp(id int, d_start date, d_end date, dept varchar(30),
              period for e_period(d_start, d_end));
          > insert into emp values (1, '2016-01-01', '2038-01-19', 'sales');
          > update emp for portion of e_period
              from date '2017-03-15' to date '2017-07-15'
              set dept = 'engineering' where id = 1;
          +----+-------------+------------+--------------+
          | id | d_start     | d_end      | dept         |
          +----+-------------+------------+--------------+
          |  1 | 2016-01-01  | 2017-03-15 | sales        |
          |  1 | 2017-03-15  | 2017-07-15 | engineering  |
          |  1 | 2017-07-15  | 2038-01-19 | sales        |
          +----+-------------+------------+--------------+
  46. 46. Questions?
      Thanks to:
      ● MariaDB (request, discussions, review)
      ● Alexey Midenkov
      ● Eugene Kosov
      E-mail: ak@tempesta-tech.com
      Tempesta FW – the fastest and secure HTTP accelerator: https://github.com/tempesta-tech/tempesta
  47. 47. Replication
      Timestamp-based
      ● SBR, RBR, Galera – the same as for ordinary tables
      Transaction-based (InnoDB)
      ● SBR only
      ● RBR for system versioned tables is automatically switched to SBR (like mixed replication)
  48. 48. Cascade foreign keys (https://jira.mariadb.org/browse/MDEV-15364)
          > create table p (x int primary key);
          > create table c (px int,
              foreign key (px) references p(x) on delete cascade on update cascade)
              with system versioning;
          > insert into p values (1);
          > insert into c values (1);
          > update p set x = 2;
          > select *,row_start,row_end from c for system_time all;
          +------+----------------------------+----------------------------+
          | px   | row_start                  | row_end                    |
          +------+----------------------------+----------------------------+
          |    2 | 2018-02-25 01:31:59.070080 | 2038-01-19 06:14:07.999999 |
          +------+----------------------------+----------------------------+
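
Sketch for slide 15 (the Point in Time Architecture workaround). This is a minimal illustration, not the linked article's implementation. It assumes an in-memory SQLite table, an integer logical clock passed in as `now` instead of real timestamps, and plain Python functions where the slide uses a trigger. The table layout, the `FOREVER` marker and all function names are hypothetical; only the column names DateCreated and DateEnd come from the slide.

```python
import sqlite3

FOREVER = 2**62  # hypothetical "still current" marker stored in DateEnd


def connect():
    """Create an in-memory table whose rows carry their own validity interval."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE t (id INTEGER NOT NULL, x INTEGER,"
        " DateCreated INTEGER NOT NULL, DateEnd INTEGER NOT NULL)"
    )
    return db


def insert(db, row_id, x, now):
    # INSERT: stamp the new version with DateCreated and leave it open-ended.
    db.execute("INSERT INTO t VALUES (?, ?, ?, ?)", (row_id, x, now, FOREVER))


def delete(db, row_id, now):
    # DELETE: no physical delete, only close the current version.
    cur = db.execute(
        "UPDATE t SET DateEnd = ? WHERE id = ? AND DateEnd = ?",
        (now, row_id, FOREVER),
    )
    return cur.rowcount  # number of versions that were closed


def update(db, row_id, x, now):
    # UPDATE: close the old version, then insert the new one
    # (only if there was a current version to close).
    if delete(db, row_id, now):
        insert(db, row_id, x, now)


def select_as_of(db, ts):
    # SELECT: the additional WHERE clause over <DateCreated, DateEnd>.
    cur = db.execute(
        "SELECT id, x FROM t WHERE DateCreated <= ? AND DateEnd > ? ORDER BY id",
        (ts, ts),
    )
    return cur.fetchall()


if __name__ == "__main__":
    db = connect()
    insert(db, 1, 1, now=10)
    update(db, 1, 2, now=20)
    delete(db, 1, now=30)
    print(select_as_of(db, 15))  # the version created at tick 10
    print(select_as_of(db, 25))  # the version created at tick 20
    print(select_as_of(db, 35))  # nothing is current after the delete
```

Every reader and writer has to go through these helpers, which is the "application layer awareness" issue listed on slide 16.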

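
Sketch for slide 25 (transaction history view). The following is a direct Python port of the `TRX_SEES` stored function, intended only to make the visibility rule easy to experiment with. The `REGISTRY` dictionary is a hypothetical stand-in for `mysql.transaction_registry` and replaces the `VTQ_COMMIT_ID` and `VTQ_ISO_LEVEL` lookups. Transaction and commit IDs 3024 to 3034 are taken from slide 27, their isolation level is assumed to be RR, and the 40xx entries are invented to show overlapping transactions.

```python
# Hypothetical stand-in for mysql.transaction_registry:
# transaction_id -> (commit_id, isolation level as 'RR' or 'RC').
REGISTRY = {
    3024: (3025, "RR"),  # IDs from slide 27, isolation level assumed
    3026: (3027, "RR"),
    3033: (3034, "RR"),
    4000: (4003, "RC"),  # invented: starts before 4001, commits after it
    4001: (4002, "RR"),  # invented
    4010: (4013, "RR"),  # invented: starts before 4011, commits after it
    4011: (4012, "RR"),  # invented
}


def trx_sees(trx_id1, trx_id0, registry=REGISTRY):
    """Return True if transaction trx_id1 sees the changes of trx_id0."""
    commit_id1, iso_level1 = registry[trx_id1]
    commit_id0, _ = registry[trx_id0]
    # trx_id1 started after trx_id0 had already committed.
    if trx_id1 > commit_id0:
        return True
    # trx_id1 was still running when trx_id0 committed; under READ COMMITTED
    # it picks up the committed changes.
    if commit_id1 > commit_id0 and iso_level1 == "RC":
        return True
    return False


if __name__ == "__main__":
    print(trx_sees(3026, 3024))  # later transaction, earlier one committed
    print(trx_sees(3024, 3026))  # earlier transaction cannot see a later one
    print(trx_sees(4000, 4001))  # overlapping, READ COMMITTED
    print(trx_sees(4010, 4011))  # overlapping, REPEATABLE READ
```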
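
Sketch for slide 32 (FROM...TO vs BETWEEN). The two result sets on that slide differ only in the row whose row_start equals the upper bound. A small Python model of that difference follows, assuming that BETWEEN treats the upper bound as inclusive and FROM...TO treats it as exclusive, which is consistent with the slide's output. `Version`, `between` and `from_to` are hypothetical names, and the data is the history shown on slide 32.

```python
from dataclasses import dataclass

INF = 18446744073709551615  # row_end of a current row, as printed in the slides


@dataclass(frozen=True)
class Version:
    x: int
    row_start: int  # transaction ID that created this version
    row_end: int    # transaction ID that ended it, INF if still current


HISTORY = [
    Version(1, 3024, 3026),
    Version(2, 3026, 3033),
    Version(3, 3033, INF),
]


def between(rows, lo, hi):
    # BETWEEN lo AND hi: a version that starts exactly at hi is included.
    return [r.x for r in rows if r.row_start <= hi and r.row_end > lo]


def from_to(rows, lo, hi):
    # FROM lo TO hi: a version that starts exactly at hi is excluded.
    return [r.x for r in rows if r.row_start < hi and r.row_end > lo]


if __name__ == "__main__":
    print(between(HISTORY, 0, 3033))  # includes the version created by 3033
    print(from_to(HISTORY, 0, 3033))  # stops just before transaction 3033
```

With an upper bound set to the transaction that caused a problem, the exclusive form returns the state just before it, which fits the "state before a disaster" use on the slide.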