In-memory OLTP storage with
persistence and transaction support
Alexander Korotkov
Postgres Professional
October 25, 2017
Alexander Korotkov In-memory OLTP storage with persistence and transaction support
Disclaimer
▶ This talk is not about something production-ready. Don’t hold
your breath waiting to use any of the functionality discussed
here in production. By the time this functionality is
available and production-ready, it might have become something
dramatically different.
▶ This talk is about some intermediate results achieved during
development. These results are presented for discussion and
brainstorming in order to make further development better.
What is this talk about?
▶ We (Postgres Pro) have a prototype of in-memory OLTP storage
implemented using FDW.
▶ It’s a proof of concept of the opportunities for in-memory OLTP in
PostgreSQL (it was debatable whether there are any).
▶ It’s yet another example of alternative storage implemented
using the FDW interface before we have native pluggable storages.
So, it’s a waypoint to verify where we are on pluggable storages.
Why pluggable storages?
▶ Lack of pluggable storage support is understood as a
limitation.
PostgreSQL has always been positioned as a highly extensible DBMS, while
the lack of pluggable storage support is a large gap in this area.
▶ Rising interest in PostgreSQL from enterprises.
Postgres-centric companies have enough resources to support
multiple storage engines. Enterprises are also interested in using
PostgreSQL for non-OLTP tasks. Alternative storages might improve
OLTP too (UNDO log for better update performance).
Use cases for pluggable storages
▶ Different MVCC implementations: mostly variations of UNDO
log, but not only.
▶ Data compression: row-level, page-level, etc.
▶ Non-disk-oriented storage: in-memory, SSD-optimized,
NVRAM-optimized.
▶ Non-heap-like row layout: index-organized table (IOT),
including LSM.
▶ Non-row-oriented data layout: columnar or Parquet
layouts.
Current state of pluggable storage
https://www.postgresql.org/message-id/flat/20160812231527.GA690404%40alvherre.pgsql
▶ Started as a quite mechanical separation of heap_* methods into
a storage AM interface.
▶ The boundary of the storage layer shifted significantly during
discussions.
▶ Still a lot of work to do.
Current view on pluggable storage
properties
▶ All storages should share the same transaction model (or have
no transactions at all).
▶ All storages should write to the same WAL stream.
▶ Tuples have to be identified by TIDs (further improvement is
possible).
▶ Storages should share some of the index access methods.
▶ The index access method interface should be expanded with new
functions (at least retail tuple delete).
▶ Storages may have completely different MVCC implementations.
Why FDW for prototyping
Pros:
▶ FDW is completely free in the way it scans and modifies the
foreign table.
▶ This approach is already used in cstore_fdw [1] and vops [2].
Cons:
▶ Lack of control over associated resources.
▶ Lack of DDL support.
[1] https://github.com/citusdata/cstore_fdw
[2] https://github.com/postgrespro/vops
Why in-memory?
▶ No extra mapping layer (buffer manager) to traverse from one page
to another.
▶ Row-level WAL takes less space (no page-level information, no
explicit index logging), but is slower to apply.
▶ Better IO utilization (both snapshots and WAL are written
sequentially).
What is this particular in-memory engine?
▶ An index-organized table where the index is an in-memory B-tree.
▶ This B-tree supports transactions and MVCC using an UNDO log,
which is a circular buffer in memory containing both row-level
and page-level records.
▶ It writes full data snapshots on checkpoints, plus row-level WAL.
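The persistence model named in the last bullet (full snapshot on checkpoint, row-level WAL in between) can be illustrated with a toy sketch. This is not the engine's code: the `Engine` class, its method names, and the in-process lists standing in for on-disk files are all assumptions made for illustration.

```python
class Engine:
    """Toy key-value store persisted as 'full snapshot + row-level WAL'."""

    def __init__(self):
        self.data = {}      # live in-memory contents
        self.snapshot = {}  # stands in for the last snapshot written to disk
        self.wal = []       # stands in for row-level WAL written since then

    def put(self, key, value):
        # Log the row-level change first, then apply it in memory.
        self.wal.append(("put", key, value))
        self.data[key] = value

    def delete(self, key):
        self.wal.append(("delete", key, None))
        self.data.pop(key, None)

    def checkpoint(self):
        # Write a full copy of the data; older WAL is no longer needed.
        self.snapshot = dict(self.data)
        self.wal.clear()

    def recover(self):
        # Rebuild state: start from the snapshot, then replay the WAL in order.
        data = dict(self.snapshot)
        for op, key, value in self.wal:
            if op == "put":
                data[key] = value
            else:
                data.pop(key, None)
        return data


engine = Engine()
engine.put(1, "a")
engine.checkpoint()
engine.put(2, "b")
engine.delete(1)
recovered = engine.recover()  # snapshot {1: "a"} plus two replayed WAL records
```

Note that the WAL records carry only keys and values, with no page or index information, which is what keeps row-level WAL compact.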
Why our in-memory engine is a good
example of pluggable storage
Because it does things in quite a different way.
▶ It stores data in main memory with a quite different
model of persistence: full data snapshots plus
row-level WAL.
▶ It doesn’t have a heap-like layout.
▶ It uses a very different MVCC implementation: a
combination of row-level and page-level undo logs.
Why our in-memory engine is a bad
example of pluggable storage
▶ It uses the CSN snapshot model, which is far from getting
committed.
▶ Tuples aren’t identified by TIDs.
▶ Persistence is implemented using a set of hacks.
Configuration parameters
▶ in_memory_engine.shared_pool_size – size of the
separate pool of 1k pages for in-memory tables.
▶ in_memory_engine.undo_size – size of the circular
buffer for undo records to support transactions and
MVCC.
Usage: defining an in-memory table and
inserting data
CREATE EXTENSION in_memory;
CREATE FOREIGN TABLE im_test
(
    id int8 NOT NULL,
    val text NOT NULL
) SERVER in_memory OPTIONS (indices 'unique (id)',
                            persistent 'true');
INSERT INTO im_test
    (SELECT id, 'val' || id FROM generate_series(1, 1000000) id);
Usage: querying a single key
# EXPLAIN ANALYZE SELECT * FROM im_test WHERE id = 50000;
QUERY PLAN
----------------------------------------------------------------
Foreign Scan on im_test (cost=0.06..4.52 rows=357 width=40)
(actual time=0.190..0.191 rows=1 loops=1)
Pk conds: (id = 50000)
Planning time: 0.635 ms
Execution time: 0.260 ms
(4 rows)
Usage: querying a key range
# EXPLAIN ANALYZE SELECT * FROM im_test
WHERE id >= 10000 AND id <= 20000;
QUERY PLAN
----------------------------------------------------------------
Foreign Scan on im_test (cost=0.06..5.41 rows=357 width=40)
(actual time=0.045..4.194 rows=10001 loops=1)
Pk conds: (id >= 10000 AND id <= 20000)
Planning time: 0.075 ms
Execution time: 4.915 ms
(4 rows)
Usage: querying for a non-key condition
# EXPLAIN ANALYZE SELECT * FROM im_test WHERE val LIKE '%1111%';
QUERY PLAN
----------------------------------------------------------------
Foreign Scan on im_test (cost=0.06..891.83 rows=571 width=40)
(actual time=0.345..187.325 rows=280 loops=1)
Filter: (val ~~ '%1111%'::text)
Rows Removed by Filter: 999720
Planning time: 0.046 ms
Execution time: 187.375 ms
(5 rows)
Usage: non-persistent tables are writable on
standby
*** Master ***
# CREATE FOREIGN TABLE im_test (id int8 NOT NULL, val text NOT NULL)
SERVER in_memory OPTIONS (indices 'unique (id)',
persistent 'false');
# INSERT INTO im_test
(SELECT id, 'val' || id FROM generate_series(1, 1000000) id);
INSERT 0 1000000
*** Standby ***
# SELECT * FROM im_test;
id | val
----+-----
(0 rows)
# INSERT INTO im_test VALUES (1, 'foo'), (2, 'bar');
INSERT 0 2
# SELECT * FROM im_test;
id | val
----+-----
1 | foo
2 | bar
(2 rows)
Limitations
▶ Only a B-tree with limited functionality is supported.
▶ No secondary indexes are supported yet.
▶ No out-of-line storage is supported for tuples yet.
▶ The undo log must not wrap around during a single transaction (such a
transaction is automatically aborted).
▶ If a required undo record has already been overwritten, a “snapshot’s
too old” error is emitted.
▶ Serializable isolation level isn’t supported.
▶ Replication isn’t supported yet.
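The two undo-log limitations in this list follow directly from the undo log being a fixed-size circular buffer. The sketch below is a toy model under stated assumptions, not the engine's implementation: the `UndoLog` class, the exception names, and counting capacity in records rather than bytes are all hypothetical.

```python
class SnapshotTooOld(Exception):
    """A reader needs an undo record that has already been overwritten."""


class TransactionAborted(Exception):
    """One transaction produced more undo than the buffer can hold."""


class UndoLog:
    """Toy circular undo buffer addressed by ever-increasing positions."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.next_pos = 0  # logical position of the next record to write

    def append(self, record, tx_start):
        # tx_start is the position of the writing transaction's first record.
        # Writing now would overwrite that record, so the transaction aborts.
        if self.next_pos - tx_start >= self.capacity:
            raise TransactionAborted("undo log wrapped within one transaction")
        pos = self.next_pos
        self.slots[pos % self.capacity] = record
        self.next_pos += 1
        return pos

    def read(self, pos):
        if pos >= self.next_pos:
            raise IndexError("undo record not written yet")
        # Positions older than one full lap have been overwritten.
        if pos < self.next_pos - self.capacity:
            raise SnapshotTooOld("required undo record was overwritten")
        return self.slots[pos % self.capacity]


log = UndoLog(capacity=4)
tx1_start = log.next_pos
for value in ("a", "b", "c", "d"):
    log.append(value, tx1_start)  # fills the buffer exactly

tx2_start = log.next_pos
log.append("e", tx2_start)        # a new transaction overwrites position 0
```

After the last append, a reader whose snapshot still needs position 0 gets `SnapshotTooOld`, while a fifth append by the first transaction would have raised `TransactionAborted`.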
Read-only benchmark
[Figure: QPS (axis up to 1,600,000) vs. number of clients (0 to 250); series: builtin, in-memory]
pgbench -s 1000 -j $n -c $n -M prepared -S on 4 x 18 cores Intel Xeon E7-8890 processors
mean of 3 3-minute runs with shared_buffers = 32GB, max_connections = 300
Why is there no win?
Storage is only one layer participating in query
execution. There are also:
▶ Network layer,
▶ Executor,
▶ Parser (analyze & rewrite if not prepared),
▶ Transaction management (including snapshot
acquisition),
▶ ...
Measuring overheads
[Figure: QPS (axis up to 3,500,000) vs. number of clients (0 to 250); series: read-only, 'SELECT 1;', ';' (empty statement)]
pgbench -s 1000 -j $n -c $n -M prepared -S on 4 x 18 cores Intel Xeon E7-8890 processors
mean of 3 3-minute runs with shared_buffers = 32GB, max_connections = 300
Read-only benchmark:
fetch 9 values per single query
\set aid1 random(1, 100000 * :scale)
\set aid2 random(1, 100000 * :scale)
\set aid3 random(1, 100000 * :scale)
\set aid4 random(1, 100000 * :scale)
\set aid5 random(1, 100000 * :scale)
\set aid6 random(1, 100000 * :scale)
\set aid7 random(1, 100000 * :scale)
\set aid8 random(1, 100000 * :scale)
\set aid9 random(1, 100000 * :scale)
SELECT abalance FROM pgbench_accounts WHERE
aid IN (:aid1, :aid2, :aid3, :aid4, :aid5, :aid6, :aid7, :aid8, :aid9);
Read-only benchmark:
fetch 9 values per single query
[Figure: QPS (axis up to 1,200,000) vs. number of clients (0 to 250); series: builtin, in-memory]
pgbench -s 1000 -j $n -c $n -M prepared -f ro9.sql on 4 x 18 cores Intel Xeon E7-8890 processors
mean of 3 3-minute runs with shared_buffers = 32GB, max_connections = 300
Read-only benchmark:
compare values-per-second
[Figure: VPS (values per second, axis up to 10,000,000) vs. number of clients (0 to 250); series: builtin-1, builtin-9, in_memory-1, in_memory-9]
pgbench -s 1000 -j $n -c $n -M prepared -S on 4 x 18 cores Intel Xeon E7-8890 processors
mean of 3 3-minute runs with shared_buffers = 32GB, max_connections = 300
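The values-per-second metric in this comparison is presumably the query rate multiplied by the number of values each query fetches (1 for the standard read-only script, 9 for the nine-key script). A minimal sketch of that conversion follows; the function name is hypothetical, and the QPS figures are made-up placeholders, not results from these benchmarks.

```python
def values_per_second(qps, values_per_query):
    """Convert a query rate into a rate of fetched values."""
    return qps * values_per_query


# Placeholder numbers for illustration only, not measured results.
single_key = values_per_second(qps=1_000_000, values_per_query=1)
nine_keys = values_per_second(qps=400_000, values_per_query=9)

# Fewer queries per second can still mean more values per second,
# because per-query overhead is amortized over more storage lookups.
batching_wins = nine_keys > single_key
```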
Read-write benchmark without persistence
(async commit)
[Figure: TPS (axis up to 250,000) vs. number of clients (0 to 250); series: unlogged table, in_memory]
pgbench -s 1000 -j $n -c $n -M prepared on 4 x 18 cores Intel Xeon E7-8890 processors
mean of 3 3-minute runs with shared_buffers = 32GB, max_connections = 300
Read-write benchmark with persistence
(async commit)
[Figure: TPS (axis up to 200,000) vs. number of clients (0 to 250); series: builtin, in-memory]
pgbench -s 1000 -j $n -c $n -M prepared on 4 x 18 cores Intel Xeon E7-8890 processors
mean of 3 3-minute runs with shared_buffers = 32GB, max_connections = 300
Read-write benchmark:
do the transaction in a single statement
CREATE OR REPLACE FUNCTION tcpb_trx(_aid int, _bid int, _tid int, _delta int)
RETURNS void AS $$
BEGIN
    UPDATE pgbench_accounts SET abalance = abalance + _delta WHERE aid = _aid;
    PERFORM abalance FROM pgbench_accounts WHERE aid = _aid;
    UPDATE pgbench_tellers SET tbalance = tbalance + _delta WHERE tid = _tid;
    UPDATE pgbench_branches SET bbalance = bbalance + _delta WHERE bid = _bid;
    INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES
        (_tid, _bid, _aid, _delta, CURRENT_TIMESTAMP);
END;
$$ LANGUAGE plpgsql;
\set aid random(1, 100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
SELECT tcpb_trx(:aid, :bid, :tid, :delta);
Read-write benchmark with persistence
(async commit, function vs. interactive)
[Figure: TPS (axis up to 600,000) vs. number of clients (0 to 250); series: builtin, builtin-func, in_memory, in_memory-func]
pgbench -s 1000 -j $n -c $n -M prepared on 4 x 18 cores Intel Xeon E7-8890 processors
mean of 3 3-minute runs with shared_buffers = 32GB, max_connections = 300
Hacks used in implementation
▶ Minimalistic CSN implementation
CSNs are assigned, but neither used nor written to SLRUs.
The in-memory engine doesn’t need SLRU.
▶ Checkpoint hook
The in-memory engine writes a full data snapshot on checkpoint.
▶ Generic logical message hook
Used to implement custom recovery/replication. This is an awful
hack.
▶ TRUNCATE using utility command hook
TRUNCATE isn’t supported by FDW directly.
▶ DROP support using event trigger
Used to free the resources occupied by an in-memory table.
Recovery problem
▶ Row-level WAL is compact, but it requires meta-information to
apply. That is, we need to be able to read the system catalog while
applying WAL, including during recovery.
▶ We can’t access the system catalog during recovery, because the
whole database isn’t accessible since it’s not recovered yet.
Recovery problem solution:
2-phase recovery
At the second phase we have a consistent system catalog.
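The slide states only that the catalog is consistent by the second phase. One plausible reading, sketched below as an assumption rather than the actual design, is that phase 1 performs ordinary recovery (which restores the system catalog) and phase 2 then applies the engine's row-level records using that catalog. The record layout and the `recover` function are hypothetical.

```python
def recover(wal):
    """Toy 2-phase recovery over a list of WAL records (dicts)."""
    catalog = {}  # table name -> column names; usable only after phase 1
    tables = {}   # table name -> {primary key: row}

    # Phase 1: replay catalog changes only. Row-level records are skipped
    # because they cannot be interpreted without a consistent catalog.
    for record in wal:
        if record["kind"] == "catalog":
            catalog[record["table"]] = record["columns"]

    # Phase 2: the catalog is now consistent, so bare row values can be
    # decoded into named columns and applied to the in-memory tables.
    for record in wal:
        if record["kind"] == "row":
            columns = catalog[record["table"]]
            row = dict(zip(columns, record["values"]))
            key = row[columns[0]]  # assume the first column is the key
            tables.setdefault(record["table"], {})[key] = row
    return tables


wal = [
    {"kind": "catalog", "table": "im_test", "columns": ["id", "val"]},
    {"kind": "row", "table": "im_test", "values": [1, "foo"]},
    {"kind": "row", "table": "im_test", "values": [2, "bar"]},
]
recovered = recover(wal)
```

The row records carry only values, which is why they stay compact and also why they are useless until the catalog describing `im_test` has been recovered.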
Future roadmap
Integrate in-memory as pluggable storage:
▶ In-memory B-tree as an index access method.
▶ Implement storage for in-memory tables in one of the following
ways:
▶ Write some kind of «in-memory heap» AND/OR
▶ Write a storage wrapper implementing an index-organized
table.
Thank you for your attention!
