OLTP+OLAP=HTAP
Postgres Professional
Konstantin Knizhnik
OLAP- and OLTP-oriented databases

OLTP: MySQL, Oracle, SQLite, SQL Server, Postgres
OLAP: Clickhouse, Greenplum, SparkSQL, Teradata, Vertica
Lambda architecture
Hybrid OLAP and OLTP databases

HTAP: HyPer, SAP HANA, VoltDB, ...

OLAP (Clickhouse, Greenplum, SparkSQL, Teradata, Vertica)
  + OLTP (MySQL, Oracle, SQLite, SQL Server, Postgres)
  = HTAP
Speedup executor

Interpretation overhead elimination: JIT
Vectorized executor: operate on batches of values, e.g. (1,2,3) + (3,4,5) = (4,6,8)
Why is Postgres slow on OLAP queries?
1. Unpacking tuple overhead (heap_deform_tuple)
2. Interpretation overhead (invocation of query plan node functions)
3. Abstraction penalty (user defined types and operations)
4. Pull model overhead (saving/restoring context on each access to the
page)
5. MVCC overhead (~20 bytes per tuple space overhead + visibility
check overhead)
Typical OLAP query profile
16.57% postgres postgres [.] slot_deform_tuple
13.39% postgres postgres [.] ExecEvalExpr
8.64% postgres postgres [.] advance_aggregates
8.58% postgres postgres [.] advance_transition_function
5.83% postgres postgres [.] float8_accum
5.14% postgres postgres [.] tuplehash_insert
3.89% postgres postgres [.] float8pl
3.60% postgres postgres [.] slot_getattr
2.66% postgres postgres [.] bpchareq
2.56% postgres postgres [.] heap_getnext
Query execution plan

select count(*) from … where salary > 100000;

count(*)
  ^
filter: salary > 100000
  ^
heap scan
Traditional query execution

shipdate    quantity  price
21.02.2017  100       99
23.02.2017  200       60
24.02.2017  150       120

SELECT sum(quantity*price) FROM lineitems;

One row at a time:
100 * 99  = 9900
200 * 60  = 12000
150 * 120 = 18000
9900 + 12000 + 18000 = 39900
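The one-row-at-a-time flow above can be sketched in a few lines. This is an illustrative toy, not Postgres internals: each plan node pulls single rows from its child, so every row pays the per-node interpretation overhead listed earlier.

```python
# Minimal sketch (not Postgres internals) of "pull"-model, row-at-a-time
# execution: each row is dragged through a chain of plan nodes one by one.

rows = [
    {"shipdate": "21.02.2017", "quantity": 100, "price": 99},
    {"shipdate": "23.02.2017", "quantity": 200, "price": 60},
    {"shipdate": "24.02.2017", "quantity": 150, "price": 120},
]

def seq_scan(table):
    for row in table:          # plan node 1: produce rows one by one
        yield row

def project(child):
    for row in child:          # plan node 2: evaluate quantity*price per row
        yield row["quantity"] * row["price"]

def aggregate(child):
    total = 0
    for value in child:        # plan node 3: advance the sum() transition state
        total += value
    return total

print(aggregate(project(seq_scan(rows))))  # 39900
```

Every value crosses three function boundaries here; that per-row cost is exactly what JIT and vectorization attack.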
Vectorized query execution

Data is split into tiles of column vectors:

Tile 1: shipdate = (21.02.2017, 23.02.2017, 24.02.2017), quantity = (100, 200, 150), price = (99, 60, 120)
Tile 2: shipdate = (25.02.2017, 26.02.2017, 28.02.2017), quantity = (300, 110, 80), price = (100, 60, 230)

SELECT sum(quantity*price) FROM lineitems;

Tile 1: quantity * price = (100, 200, 150) * (99, 60, 120) = (9900, 12000, 18000), Sum = 39900
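The tile-at-a-time idea can be sketched the same way. This is a hypothetical illustration, not the `vectorize_engine` API: plan nodes exchange small column vectors ("tiles") instead of single rows, so the interpretation overhead is amortized over the tile size.

```python
# Sketch of tile-at-a-time execution: one node invocation processes a whole
# tile, so per-row interpretation overhead is paid once per tile instead.

tiles = [
    {"quantity": [100, 200, 150], "price": [99, 60, 120]},
    {"quantity": [300, 110, 80], "price": [100, 60, 230]},
]

def mul_tiles(source):
    for tile in source:   # one "multiply" node call per tile, not per row
        yield [q * p for q, p in zip(tile["quantity"], tile["price"])]

def sum_tiles(source):
    return sum(sum(tile) for tile in source)

print(sum_tiles(mul_tiles(tiles)))  # 94900 (the first tile alone gives 39900)
```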
Row store vs. column store

Row store: optimal for OLTP (postgres heap)
Column store: optimal for OLAP (cstore, zedstore)
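The difference is purely in layout, which a small sketch makes concrete (the sample values reuse the lineitems rows above; this is an illustration, not how any of these engines store pages):

```python
# The same three rows laid out both ways.
# Row store: whole tuples adjacent - cheap to fetch/update one record (OLTP).
# Column store: one column adjacent - cheap to scan a single attribute (OLAP).

row_store = [
    ("21.02.2017", 100, 99),
    ("23.02.2017", 200, 60),
    ("24.02.2017", 150, 120),
]

column_store = {
    "shipdate": ["21.02.2017", "23.02.2017", "24.02.2017"],
    "quantity": [100, 200, 150],
    "price": [99, 60, 120],
}

# OLTP-style access: read one full tuple
print(row_store[1])                   # ('23.02.2017', 200, 60)

# OLAP-style access: scan one attribute without touching the others
print(sum(column_store["quantity"]))  # 450
```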
Columnar store & vectorized executor

Results of TPCH-10G/Q1:

Configuration                      par.workers=0  par.workers=4
PG9_6    vectorize=off             36             10
PG9_6    vectorize=on              20             -
master   vectorize=off  jit=on     16             5
master   vectorize=off  jit=off    25.5           7
master   vectorize=on   jit=on     15             -
master   vectorize=on   jit=off    17.5           -
zedstore vectorize=off  jit=on     18             5
zedstore vectorize=off  jit=off    26             7
zedstore vectorize=on   jit=on     17             -
zedstore vectorize=on   jit=off    19             -

1. For 9.6, vectorization gives a ~2x speed improvement
2. PG 13 is almost 2 times faster than 9.6 (thanks to JIT)
3. The effect of vectorized execution is much smaller on PG 13
4. Zedstore doesn't give any noticeable performance advantages
To index or not to index?
Fast indexes: LSM

C0 (memory) -> C1 -> C2 -> ... -> Cn (disk)
Levels are periodically combined by a merge join.
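A toy sketch of the LSM idea (this is an illustration of the general technique, not RocksDB's implementation; the four-entry `C0_LIMIT` is an arbitrary assumption): writes land in a small in-memory level C0, and when it overflows it is merged out into larger sorted runs, turning random inserts into sequential merge writes.

```python
# Toy LSM: in-memory C0 plus a list of sorted on-disk runs (C1, C2, ...).

C0_LIMIT = 4

c0 = {}        # in-memory level C0
levels = []    # sorted runs flushed to "disk", oldest first

def insert(key, value):
    c0[key] = value
    if len(c0) >= C0_LIMIT:            # C0 full: flush it as a sorted run
        levels.append(sorted(c0.items()))
        c0.clear()

def lookup(key):
    if key in c0:                      # newest data first
        return c0[key]
    for run in reversed(levels):       # then newer runs before older ones
        for k, v in run:
            if k == key:
                return v
    return None

for i in range(10):
    insert(i, i * 10)
print(lookup(3), lookup(9))  # 30 90
```

Lookups must consult every level, which is why reads get slower as runs pile up and why real LSM engines compact levels in the background.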
Postgres + RocksDB FDW

Index             Clients  TPS
Inclusive B-Tree  1        9387
Inclusive B-Tree  10       18761
RocksDB FDW       1        138350   (x15)
RocksDB FDW       10       122369   (x6)
RocksDB           1        166333
RocksDB           10       141482

Benchmark: insertion of 250 million records with a random key into an inclusive
index containing one bigint key and 8 bigint fields.
LSM simulation with B-Tree

Inserts go to the Current index; when it grows too large it becomes the
Merging index and is merged into the Main index; selects consult all three
indexes and fetch tuples from the Heap table.
B-Tree*3 = Lsm3

Index             Clients  TPS
Inclusive B-Tree  1        9387
Inclusive B-Tree  10       18761
RocksDB FDW       1        138350
RocksDB FDW       10       122369
RocksDB           1        166333
RocksDB           10       141482
Lsm3              1        151699   (x16)
Lsm3              10       65997
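The three-B-tree scheme can be sketched with plain dictionaries standing in for the B-trees (a simplified illustration of the slide's design, not the Lsm3 extension's code; `TOP_LIMIT` and `merge_step` are assumptions for the sketch):

```python
# Lsm3 idea: a small "current" index absorbs inserts; when full it is rotated
# to become the "merging" index and drained into the big "main" index in the
# background; lookups consult all three, newest first.

TOP_LIMIT = 4

current = {}   # small index receiving inserts (fits in memory)
merging = {}   # previous top index, being merged into main
main = {}      # big persistent index

def insert(key, value):
    global current, merging
    current[key] = value
    if len(current) >= TOP_LIMIT and not merging:
        merging, current = current, {}   # rotate: start a fresh top index

def merge_step():
    # background merge: move everything from merging into main
    main.update(merging)
    merging.clear()

def lookup(key):
    for index in (current, merging, main):  # newest version wins
        if key in index:
            return index[key]
    return None

for i in range(6):
    insert(i, i)
merge_step()
print(lookup(0), lookup(5))  # 0 5
```

The payoff is that inserts always hit a small, cache-resident B-tree, while the expensive restructuring of the big index happens in bulk during the merge.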
Write amplification problem
Why Uber Engineering Switched from Postgres to MySQL?
Postgres MVCC

B-Tree:
Key  TID
ABC  <100,1>
ABC  <100,3>
ABC  <200,2>

Heap: Block 100 tuples, Block 200 tuples — every tuple version gets its own
index entry.
Hot update

B-Tree:
Key  TID
ABC  <100,1>
ABC  <200,2>

Heap: Block 100 tuples (hot update chain), Block 200 tuples — versions within
a page are chained, so no new index entry is needed.
TOAST

Main table:              TOAST table:
TID  Main attributes     BlockNo  TID  ToastNo  Extended attributes
1                        100      1    1
2                        100      1    2
3                        100      1    3
4                        100      2    1
UNDAM tuple chains

A tuple is stored as a chain of chunks: a header chunk (Next, Undo, Data)
followed by extension chunks (Next, Data).
UNDAM fixed size allocator

Chain headers (Chain, Next) point to pages of fixed-size chunks; each page
carries a bitmask (e.g. 00101, 10001, 00001, 11100, 00111, 10101) marking
which of its chunks (Chunk 1 ... Chunk 5) are allocated.
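A bitmask allocator like this fits in a few lines. The sketch below is an assumption-laden illustration of the general technique, not UNDAM's code: 5 chunks per page, Python ints as bitmasks, and bit 0 standing for the first chunk (the slide does not say in which direction its bitmask strings are read).

```python
# Fixed-size chunk allocator: each page stores a bitmask of occupied chunks,
# so allocation is just finding and setting a clear bit.

CHUNKS_PER_PAGE = 5

pages = [0b00101, 0b10001, 0b00001]   # example bitmasks from the slide

def allocate():
    """Return (page, chunk) of a free chunk, marking it allocated."""
    for pageno, mask in enumerate(pages):
        for chunk in range(CHUNKS_PER_PAGE):
            if not mask & (1 << chunk):
                pages[pageno] |= 1 << chunk
                return pageno, chunk
    raise MemoryError("no free chunks")

def free(pageno, chunk):
    """Clear the chunk's bit, making it available again."""
    pages[pageno] &= ~(1 << chunk)

pos = allocate()
print(pos)  # (0, 1): the lowest clear bit of page 0 (0b00101)
```

Because chunks are fixed-size, there is no fragmentation bookkeeping beyond the bitmask, which keeps allocation and free O(1) per page.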
UNDAM relation forks

Main fork: head chunks of the last versions
Extension fork (fsm): extension chunks + undo chains (undo tail)
UNDAM and WAL

Old block:                     New block:
Chunk header                   Chunk header
{1,ABB,100,29.05.2020}         {1,ABB,100,29.05.2020}
{2,SAP,200,29.05.2020}         {2,SAP,300,01.06.2020}   <- Update
empty                          empty
{3,IBM,300,29.05.2020}         {3,IBM,29.05.2020}

Delta: only the changed chunks differ between the old and the new block image.
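The delta idea can be sketched as a chunk-wise diff of the two block images. This is an assumption about the intent of the slide, not UNDAM's actual WAL record format:

```python
# Log only the chunks that changed between the old and new block image,
# instead of the whole page.

old_block = ["hdr", "{1,ABB,100,29.05.2020}", "{2,SAP,200,29.05.2020}", "empty"]
new_block = ["hdr", "{1,ABB,100,29.05.2020}", "{2,SAP,300,01.06.2020}", "empty"]

def wal_delta(old, new):
    """Return [(chunk_index, new_content)] for every changed chunk."""
    return [(i, n) for i, (o, n) in enumerate(zip(old, new)) if o != n]

print(wal_delta(old_block, new_block))  # [(2, '{2,SAP,300,01.06.2020}')]
```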
Performance comparison: pgbench

pgbench -c 10          logged  unlogged
Standard heap          7810    8167
UNDAM chunk_size=64    6367    8010
UNDAM chunk_size=150   7011    8475
zheap                  6174    5800
Update only performance

pgbench -c 10 -P 1 -T 1000 -M prepared -f update.sql postgres

update.sql:
\set aid random(1, 100000 * :scale)
update pgbench_accounts set abalance=abalance+1 WHERE aid = :aid;

Configuration    TPS
heap+logged      83339
heap+unlogged    110267
undam+logged     156484
undam+unlogged   163072
zheap+logged     91846
zheap+unlogged   106987
Past: OLAP hypercubes

Materialized views: hypercubes for PG

Timeline (OLTP -> OLAP):
7.1 7.2 7.3 7.4 8.0 8.1 8.2 8.3 8.4 9.0 9.1 9.2 9.3 9.4 9.5 9.6 10 11 12 13 ?
Views -> Materialized Views (9.3) -> Incremental Materialized Views (?)
Incremental materialized view
create incremental materialized view teller_sums as
select t.tid,sum(abalance)
from pgbench_accounts a join pgbench_tellers t
on a.bid=t.bid group by t.tid;
Benchmarking 1
pgbench -i -s 100 postgres
done in 26.07 s
create incremental materialized view teller_sums as select t.tid,sum(abalance) from
pgbench_accounts a join pgbench_tellers t on a.bid=t.bid group by t.tid;
Time: 20805.230 ms (00:20.805)
select * from teller_sums where tid=1;
Time: 0.871 ms
select t.tid,sum(abalance) from pgbench_accounts a join pgbench_tellers t on a.bid=t.bid
group by t.tid having t.tid=1;
Time: 915.508 ms (~1000x slower than the materialized view)
Insertion speed

pgbench -M prepared -N -c 10 -j 4 -T 30 -P 1 postgres

without matview: 10194 TPS
with matview:    141 TPS

Oops!
Investigations...
Reasons for slow updates

1. Lack of index: you have to create an index for the materialized view to
allow efficient updates
2. Exclusive lock on the materialized view kills concurrent execution
3. Adding more views causes a proportional increase in update time
Incremental update using triggers

Table O: x = 1
Table I: y = 2

create materialized view V as select x,y from O,I;

View V:
x y
1 2
Concurrent update of view

Transaction 1                      Transaction 2
begin;                             begin;
insert into O (x) values (3);      insert into I (y) values (4);
end;
                                   end;

View after transaction 1:          View after transaction 2:
x y                                x y
1 2                                1 2
3 2                                3 2
                                   1 4

The row (3, 4) is missing: each trigger worked from its own snapshot, so the
view no longer matches select x,y from O,I.
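The anomaly can be simulated directly (illustrative Python, not real trigger code; the snapshot variables are hypothetical names for what each transaction can see): each trigger computes its view delta against its own snapshot, so neither sees the other's concurrent insert.

```python
# Trigger-maintained view under concurrent inserts: the cross pair (3, 4)
# is never produced because each delta uses a stale snapshot.

O = [1]          # table O, column x
I = [2]          # table I, column y
view = [(1, 2)]  # materialized V = select x, y from O, I

# Transaction 1: insert x=3 into O; its snapshot of I is [2]
snapshot_I = list(I)
O.append(3)
view += [(3, y) for y in snapshot_I]    # delta: (3, 2)

# Transaction 2 (concurrent): insert y=4 into I; its snapshot of O was
# taken before transaction 1 committed, so it is still [1]
snapshot_O = [1]
I.append(4)
view += [(x, 4) for x in snapshot_O]    # delta: (1, 4)

print(sorted(view))   # [(1, 2), (1, 4), (3, 2)] - (3, 4) is missing
```

This is why trigger-based incremental maintenance needs either serialization (the exclusive lock from the previous slide) or a smarter delta protocol.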
Conclusion

1. JIT almost eliminates the need for a vectorized executor
2. An LSM index makes it possible to combine high insertion speed with fast
index scans.
3. UNDO storage provides in-place updates and can significantly increase
update speed
4. Materialized views dramatically decrease insertion speed
Some links:
RocksDB FDW: https://github.com/postgrespro/lsm
LSM3: https://github.com/postgrespro/lsm3
UNDAM: https://github.com/postgrespro/undam
Vectorized engine: https://github.com/zhangh43/vectorize_engine
VOPS: https://github.com/postgrespro/vops
Questions
postgres=# ?
