© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Grant McAlister – Senior Principal Engineer – Amazon RDS
Sept 2017
Tuning PostgreSQL for High Write Workloads
High Write Workloads?
[Diagram: a CLIENT sends writes to the Database, which persists them to Storage. A single insert/update pays commit latency (database to storage) plus client latency (client to database) on every statement. COPY or a multi-row insert amortizes the client round trip across many rows (a sketch follows), and many concurrent clients are needed to drive THROUGHPUT.]
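To make the batching point concrete, here is a minimal sketch, assuming a hypothetical table t(id int, val text) and an invented file path; none of it is from the deck.

-- One round trip and one commit per row: pays full client and commit
-- latency every time.
INSERT INTO t (id, val) VALUES (1, 'a');
INSERT INTO t (id, val) VALUES (2, 'b');

-- Multi-row insert: one round trip and one commit for many rows.
INSERT INTO t (id, val) VALUES (1, 'a'), (2, 'b'), (3, 'c');

-- COPY: the bulk-load path, cheapest per row.
COPY t (id, val) FROM '/tmp/rows.csv' WITH (FORMAT csv);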
Insert Test
Test Table (a hypothetical DDL sketch follows this list)
• UUID PK – Random
• ID int – Right Lean Sequence
• VARCHAR(100) – Random
• VARCHAR(50) – Small Set of Words
• INT – Random
• INT – Random (smaller set)
• BOOLEAN – Random (50/50)
• BOOLEAN – Somewhat Random (75/25)
• Timestamp – Right Lean
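A hedged reconstruction of the table: the name benchmark_uuid and the columns id, e, and last_updated appear later in the deck's psql session; the other names are invented for this sketch.

-- Hypothetical DDL matching the slide's column list.
CREATE TABLE benchmark_uuid (
    pk_uuid      uuid PRIMARY KEY,   -- random UUID
    id           int,                -- right-leaning sequence
    v1           varchar(100),       -- random text
    v2           varchar(50),        -- small set of words
    n1           int,                -- random
    n2           int,                -- random, smaller value set
    b1           boolean,            -- random 50/50
    e            boolean,            -- somewhat random 75/25 (assignment assumed)
    last_updated timestamp           -- right-leaning (recent values)
);
-- The deck indexes every column ("those 9 indexes"), e.g.:
CREATE INDEX ON benchmark_uuid (id);
CREATE INDEX ON benchmark_uuid (last_updated);
-- ...and similar indexes on the remaining columns.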
[Chart: Insert Workload – PostgreSQL 9.6; inserts per second (0–30,000) over ~280 minutes; series: BASE, Non-Random GUID]
Which one would you like?
Update Test
Test Table – same nine columns and nine indexes as the insert test
UPDATE #1 and UPDATE #2 – two updates per transaction (a sketch follows)
[Bar chart: Update Workload – PostgreSQL 9.6; TPS (2 updates per transaction), axis 0–20,000; series left to right: BASE, WAL Compression, 16GB Max WAL, Async, Reduced Indexes, Non-Random GUID, Aurora PostgreSQL; values: 3,729; 3,949; 4,871; 4,302; 9,177; 10,290; 17,158]
Which one would you like?
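The two per-transaction updates are plausibly the statements shown in the HOT-update tracking demo later in the deck; the :id parameters are placeholders, and this shape is an assumption, not a quoted benchmark script.

-- Plausible shape of one benchmark transaction (two updates).
BEGIN;
UPDATE benchmark_uuid SET e = cast(0 as boolean) WHERE id = :id1;
UPDATE benchmark_uuid SET last_updated = CURRENT_TIMESTAMP WHERE id = :id2;
COMMIT;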
Parameter Tuning
Concurrency and Throughput – Log Buffer
[Diagram: in most databases, queued work from many sessions lands in a shared log buffer, and one flush of that buffer to storage commits the work of many sessions at once, so throughput rises with concurrency.]
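PostgreSQL exposes this group-commit behavior through a few parameters; the values below are illustrative assumptions, not recommendations from the deck.

ALTER SYSTEM SET wal_buffers = '16MB';   -- WAL (log) buffer size; needs a restart
ALTER SYSTEM SET commit_delay = 10;      -- microseconds to wait so one flush
                                         -- can cover more commits
ALTER SYSTEM SET commit_siblings = 5;    -- only delay when at least 5 other
                                         -- transactions are active
SELECT pg_reload_conf();                 -- applies commit_delay/commit_siblings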
Full Page Writes
[Diagram: "update t set y = 6;" changes a block in memory. Because the 8K PostgreSQL block reaches the datafile as two 4K storage writes, a crash mid-checkpoint can tear the page; so on the first change to a block after a checkpoint, PostgreSQL writes the full block image to the WAL, which is then archived.]
During crash recovery, PostgreSQL uses the FPW block in the WAL to replace the bad checkpointed block.
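For reference, a quick way to inspect the relevant settings; leaving full_page_writes on is the safe default unless the storage guarantees atomic 8K writes.

SHOW full_page_writes;    -- on by default; protects against torn pages
SHOW checkpoint_timeout;  -- longer intervals mean fewer first-touch FPWs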
WAL Throughput – Dump of the WAL
At the beginning of the run (output resembles pg_xlogdump, renamed pg_waldump in PostgreSQL 10):
Btree desc: INSERT_LEAF off 184, blkref #0: rel 1663/32772/32779 blk 300
Btree desc: INSERT_LEAF off 110, blkref #0: rel 1663/32772/32784 blk 1092
Btree desc: INSERT_LEAF off 41, blkref #0: rel 1663/32772/32782 blk 5752
Btree desc: INSERT_LEAF off 40, blkref #0: rel 1663/32772/32782 blk 8000
Btree desc: INSERT_LEAF off 89, blkref #0: rel 1663/32772/32779 blk 1757
Btree desc: INSERT_LEAF off 363, blkref #0: rel 1663/32772/32781 blk 1355
Btree desc: INSERT_LEAF off 77, blkref #0: rel 1663/32772/32783 blk 4
Btree desc: INSERT_LEAF off 94, blkref #0: rel 1663/32772/32779 blk 2083
Btree desc: INSERT_LEAF off 362, blkref #0: rel 1663/32772/32781 blk 1355
Btree desc: INSERT_LEAF off 10, blkref #0: rel 1663/32772/32782 blk 7687
Btree desc: INSERT_LEAF off 365, blkref #0: rel 1663/32772/32781 blk 1355
Btree desc: INSERT_LEAF off 114, blkref #0: rel 1663/32772/32784 blk 791
Btree desc: INSERT_LEAF off 2, blkref #0: rel 1663/32772/32783 blk 2213
Btree desc: INSERT_LEAF off 2, blkref #0: rel 1663/32772/32785 blk 1639
Btree desc: INSERT_LEAF off 209, blkref #0: rel 1663/32772/32784 blk 1433
Transaction desc: COMMIT 2017-09-07 01:08:55.354810 UTC
Later in the run:
Btree desc: INSERT_LEAF off 216, blkref #0: rel 1663/16395/16407 blk 14331 FPW
Btree desc: INSERT_LEAF off 123, blkref #0: rel 1663/16395/16406 blk 5
Btree desc: INSERT_LEAF off 139, blkref #0: rel 1663/16395/16404 blk 25954
Btree desc: INSERT_LEAF off 59, blkref #0: rel 1663/16395/16407 blk 17944 FPW
Btree desc: INSERT_LEAF off 45, blkref #0: rel 1663/16395/16408 blk 17
Btree desc: INSERT_LEAF off 252, blkref #0: rel 1663/16395/16404 blk 25954
Btree desc: INSERT_LEAF off 135, blkref #0: rel 1663/16395/16408 blk 7
Btree desc: INSERT_LEAF off 5, blkref #0: rel 1663/16395/16405 blk 131373 FPW
Btree desc: INSERT_LEAF off 175, blkref #0: rel 1663/16395/16404 blk 25954
Btree desc: INSERT_LEAF off 19, blkref #0: rel 1663/16395/16405 blk 40974 FPW
Btree desc: INSERT_LEAF off 2, blkref #0: rel 1663/16395/16409 blk 1
Btree desc: INSERT_LEAF off 19, blkref #0: rel 1663/16395/16405 blk 143873 FPW
Btree desc: INSERT_LEAF off 123, blkref #0: rel 1663/16395/16406 blk 5
Btree desc: INSERT_LEAF off 14, blkref #0: rel 1663/16395/16405 blk 37468 FPW
Btree desc: INSERT_LEAF off 2, blkref #0: rel 1663/16395/16409 blk 1
Btree desc: INSERT_LEAF off 84, blkref #0: rel 1663/16395/16407 blk 2696
Btree desc: INSERT_LEAF off 149, blkref #0: rel 1663/16395/16407 blk 1401 FPW
Btree desc: INSERT_LEAF off 2, blkref #0: rel 1663/16395/16406 blk 39718
Btree desc: INSERT_LEAF off 2, blkref #0: rel 1663/16395/16410 blk 29411
Btree desc: INSERT_LEAF off 2, blkref #0: rel 1663/16395/16408 blk 29370
Btree desc: INSERT_LEAF off 123, blkref #0: rel 1663/16395/16406 blk 5
Btree desc: INSERT_LEAF off 2, blkref #0: rel 1663/16395/16409 blk 1
Btree desc: INSERT_LEAF off 24, blkref #0: rel 1663/16395/16405 blk 69991 FPW
Transaction desc: COMMIT 2017-09-07 01:04:32.650362 UTC
1K vs 48K of data (one transaction's WAL, early vs late)
WAL Compression
[Diagram: with wal_compression enabled, the full-page image that "update t set y = 6;" forces into the WAL is compressed before being written; the checkpoint to the datafile and the WAL archive are unchanged.]
Lots of random values = poor compression
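Enabling it is a single setting; this sketch assumes PostgreSQL 9.5+, where wal_compression exists.

-- Compress full-page images written to the WAL (PostgreSQL 9.5+).
ALTER SYSTEM SET wal_compression = on;
SELECT pg_reload_conf();  -- takes effect without a restart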
[Chart: Insert Workload – PostgreSQL 9.6; inserts per second (0–30,000) over ~280 minutes; series: BASE, WAL Compression]
[Bar chart: Update Workload – PostgreSQL 9.6, repeated from above; WAL Compression (3,949 TPS) is barely ahead of BASE (3,729 TPS)]
Why didn't WAL compression help?
Regular – average 5.7KB per FPW
Compressed – average 5.2KB per FPW
Random data does not compress well
[Chart: Insert Workload – PostgreSQL 9.6; inserts per second (0–30,000) over ~280 minutes; series: BASE, WAL Compression, 16GB Max WAL]
max_wal_size = 16GB
More blocks + random inserts = more FPWs
Randomness and Size of the Data
Assume 10K random updates per second
Assume a checkpoint every 60 seconds
Touch 10K x 60 = 600K blocks between checkpoints
A 1GB table is ~130K blocks – every block is touched ~4 times
A 100GB table is ~13M blocks – low chance of touching the same block twice
[Bar chart: Update Workload – PostgreSQL 9.6, repeated; 16GB Max WAL lifts TPS from 3,729 (BASE) to 4,871]
What about async commit?
Async Commit – Log Buffer
[Diagram: with asynchronous commit, sessions whose work sits in the log buffer no longer have to wait for the flush to storage before COMMIT returns.]
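A minimal sketch of turning it on, per session or per database; "app" is a hypothetical database name. A crash can lose the most recently reported commits (bounded by the WAL writer's flush cadence), but it cannot corrupt the database.

SET synchronous_commit = off;  -- this session: COMMIT returns before the
                               -- WAL reaches disk
ALTER DATABASE app SET synchronous_commit = off;  -- hypothetical database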
Longer Checkpoints – Reduce FPW – BUT
Default: min_wal_size=256MB, max_wal_size=2GB → recovery time = 3 seconds
New: min_wal_size=2GB, max_wal_size=16GB → recovery time = 91 seconds
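A sketch of the corresponding settings; checkpoint_timeout is an illustrative addition, not a value quoted on the slide.

-- Fewer checkpoints mean fewer first-touch FPWs, but longer crash recovery.
ALTER SYSTEM SET min_wal_size = '2GB';
ALTER SYSTEM SET max_wal_size = '16GB';
ALTER SYSTEM SET checkpoint_timeout = '30min';  -- illustrative, not from the slide
SELECT pg_reload_conf();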
Non-Parameter Tuning
Insert a Sequence Number into a B-tree
[Diagram: a b-tree covering keys 1–200 (root 1-200; internal nodes 1-100 and 101-200; leaves 1-25, 26-50, 51-75, 76-100, 101-125, 126-150, 151-175, 176-200). Inserting 201 loads the 4 blocks on the rightmost path (now 1-201, 101-201, 151-201, 176-201); inserting 202 loads 0 blocks because that path is already hot.]
At least 1 FPW (only the hot rightmost path is dirtied each checkpoint)
Insert a Random Value into a B-tree
[Diagram: same b-tree. Inserting 124 loads 4 blocks (1-200 → 101-200 → 101-150 → 101-125); inserting 99 loads 3 more (1-100 → 51-100 → 76-100); inserting 161 loads 2 more on the 151-200 path. Random keys keep pulling cold paths into memory.]
At least 3 FPW
Remember Those 9 Indexes – Cut It to 6
Test Table – same nine columns as above, with three indexes marked "Remove Index" (one removal also allows HOT updates); a DROP INDEX sketch follows.
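The slide does not name the dropped indexes, so these identifiers are hypothetical; the point is that dropping the index on a frequently updated column makes HOT updates possible for updates to that column.

-- Hypothetical index names; the slide does not identify them.
DROP INDEX benchmark_uuid_v1_idx;
DROP INDEX benchmark_uuid_n1_idx;
DROP INDEX benchmark_uuid_last_updated_idx;  -- removal allows HOT updates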
HOT (Heap-Only Tuple) Update
[Diagram: a table with indexes on columns a, b, and f.
Not HOT: updating indexed column f (a,b,c,d,e,f → a,b,c,d,e,f1) writes a new heap tuple plus new entries a', b', f1 in all three indexes.
HOT: updating only unindexed column c (a,b,c,d,e,f → a,b,c1,d,e,f) writes just a new heap tuple, chained from the old version on the same page; no index entries change.]
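HOT also needs free space on the same heap page for the new tuple version; a lower fillfactor is one way to leave that room. The value below is an assumption, not from the deck.

-- Leave ~15% free space per heap page so updated versions fit on the
-- same page (a prerequisite for HOT). Illustrative value; applies to
-- newly written pages.
ALTER TABLE benchmark_uuid SET (fillfactor = 85);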
HOT Updates – Looking at FPW in the Logs
HOT update:
Heap 14/68, desc: HOT_UPDATE off 19 xmax 2327993188 ; new off 3 xmax 0, blkref #0: rel 1663/41083/41086 blk 28
XLOG 0/3368, desc: FPI_FOR_HINT, blkref #0: rel 1663/41083/41092 blk 1492899 FPW
Transaction 8/34, desc: COMMIT 2017-09-07 00:07:17.532647 UTC
Non-HOT update:
Heap 14/75, desc: UPDATE off 67 xmax 2327993195 ; new off 7 xmax 0, blkref #0: rel 1663/41083/41086 blk 285
XLOG 0/2774, desc: FPI_FOR_HINT, blkref #0: rel 1663/41083/41090 blk 7039952 FPW
Btree 2/120, desc: INSERT_LEAF off 17, blkref #0: rel 1663/41083/41090 blk 7039952
XLOG 0/3150, desc: FPI_FOR_HINT, blkref #0: rel 1663/41083/41092 blk 29 FPW
Btree 2/64, desc: INSERT_LEAF off 205, blkref #0: rel 1663/41083/41092 blk 29
Btree 2/2639, desc: INSERT_LEAF off 73, blkref #0: rel 1663/41083/41093 blk 4 FPW
Btree 2/3148, desc: INSERT_LEAF off 2, blkref #0: rel 1663/41083/41094 blk 1 FPW
Btree 2/5099, desc: INSERT_LEAF off 364, blkref #0: rel 1663/41083/41095 blk 4237904 FPW
Transaction 8/34, desc: COMMIT 2017-09-07 00:24:29.427017 UTC
3.4K vs 16.7K of WAL
HOT Updates – How to Track
sfo=> select n_tup_upd, n_tup_hot_upd from pg_stat_all_tables where relname = 'benchmark_uuid';
n_tup_upd | n_tup_hot_upd
-----------+---------------
0 | 0
sfo=> update benchmark_uuid set e=cast(0 as boolean) where id = 1000;
UPDATE 1
sfo=> select n_tup_upd, n_tup_hot_upd from pg_stat_all_tables where relname = 'benchmark_uuid';
n_tup_upd | n_tup_hot_upd
-----------+---------------
1 | 1
sfo=> update benchmark_uuid set last_updated=CURRENT_TIMESTAMP where id=1001;
UPDATE 1
sfo=> select n_tup_upd, n_tup_hot_upd from pg_stat_all_tables where relname = 'benchmark_uuid';
n_tup_upd | n_tup_hot_upd
-----------+---------------
2 | 1
First update (unindexed column): HOT – both n_tup_upd and n_tup_hot_upd increment. Second update (indexed column): not HOT – only n_tup_upd increments.
[Chart: Insert Workload – PostgreSQL 9.6; inserts per second (0–30,000) over ~280 minutes; series: BASE, WAL Compression, 16GB Max WAL, Reduced Indexes]
[Bar chart: Update Workload – PostgreSQL 9.6, repeated; Reduced Indexes more than doubles BASE TPS: 3,729 → 9,177]
Insert a Constrained Random Value into a B-tree
[Diagram: same b-tree covering 1–200. Inserting 172 loads 4 blocks (1-200 → 101-200 → 151-200 → 151-175); inserting 199 loads 1 more (leaf 176-200); inserting 161 and then 168 load 0 blocks, because constrained randomness keeps hitting blocks that are already hot.]
At least 2 FPW
Constraining Random Values
Prefix the UUID with a date (see the sketch after this list):
• 550e8400-e29b-41d4-a716-446655440000
• YYYYMMDDHH24-UUID
• Example:
  • 2010022712-550e8400-e29b-41d4-a716-446655440000
  • 2010022713-550e8400-e29b-41d4-a716-446655440000
• Balance the number of hot blocks vs. contention:
  • More date precision → less randomness in the b-tree → fewer blocks touched
  • Possibly more contention on the leaf blocks
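A minimal sketch of generating such a key, assuming the pgcrypto extension for gen_random_uuid() on 9.6 (it is built in from PostgreSQL 13); the text format follows the slide.

CREATE EXTENSION IF NOT EXISTS pgcrypto;  -- provides gen_random_uuid() on 9.6
SELECT to_char(clock_timestamp(), 'YYYYMMDDHH24')
       || '-' || gen_random_uuid()::text AS constrained_key;
-- e.g. 2017090701-550e8400-e29b-41d4-a716-446655440000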
Change PK to UUID-like and Remove 3 Indexes
Test Table – same nine columns as above. Slide annotations mark five indexes "Remove Index" (one noted as allowing HOT updates) and the UUID PK "make non-random – right lean".
[Chart: Insert Workload – PostgreSQL 9.6; inserts per second (0–30,000) over ~280 minutes; series: BASE, WAL Compression, 16GB Max WAL, Reduced Indexes, Non-Random GUID]
[Bar chart: Update Workload – PostgreSQL 9.6, repeated; Non-Random GUID reaches 10,290 TPS]
Why Vacuuming Matters
[Diagram: updates leave dead versions of tuple1, tuple2, ... behind. With vacuum running, the dead space is reclaimed and new versions (tuple3, tuple4) are placed back into it, so the table stays the same size. With no vacuum, every new version is appended to fresh space and the table keeps growing.]
More blocks = more cache misses, non-HOT updates, and more FPWs
[Chart: Updates – No Vacuum Running; TPS (axis 4,000–10,000) over ~1,260 minutes]
Vacuuming in Memory – Insert-Only Workload
[Diagram: with an insert-only workload, the recently filled heap pages and the rightmost b-tree path are still in memory, so vacuum can process them before they are evicted.]
NO LONG-RUNNING TRANSACTIONS
Vacuum Freeze in Memory
[Diagram: if a block is checkpointed to the datafile while still not frozen, a later VACUUM that freezes it dirties the block again and forces another full block into the WAL. If vacuum freezes the block while it is still in memory, before the checkpoint, the block reaches the datafile already frozen and no extra full-page write is needed.]
Vacuum in Memory Continued
• Increase checkpoint_timeout
• alter table X set (vacuum settings); a sketch follows this list
• Manual test:
  • Vacuum in memory before checkpoint – 3.5 seconds
  • Vacuum in memory after checkpoint – 84.5 seconds
  • Vacuum not in memory – 165.8 seconds
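A sketch of per-table settings that push vacuum to run while pages are still hot; the specific values are assumptions, not from the deck.

-- Illustrative per-table autovacuum tuning; values are assumptions.
ALTER TABLE benchmark_uuid SET (
    autovacuum_vacuum_scale_factor = 0.01,  -- vacuum after ~1% dead tuples
    autovacuum_vacuum_cost_delay   = 0,     -- do not throttle the vacuum
    autovacuum_freeze_max_age      = 100000000
);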
Aurora PostgreSQL
Aurora PostgreSQL Differences
• No Checkpoints
• No Full Page Writes (FPW)
• No Log Buffer
Concurrency
[Diagram. PostgreSQL: queued work from all sessions funnels through the shared log buffer, and sessions wait on its flush to storage. Aurora: each piece of queued work (A, B, C, D, E) is shipped to storage independently while a durability-tracking table counts acknowledgments per write across the six storage copies (e.g., A B C D E = 0 0 0 0 0, then 4 3 4 2 4, then 6 5 6 3 5); a write is durable once a quorum of copies acknowledges it, so no session waits behind a shared buffer.]
Aurora – Writing Less
[Diagram: for the same "update t set y = 6;", PostgreSQL writes the full block to the WAL, checkpoints the block to the datafile, and archives the WAL; Aurora sends only the log record to Aurora Storage, with no full page writes, no checkpoints, and no separate archive step.]
[Chart: Insert Workload – PostgreSQL 9.6; inserts per second (0–30,000) over ~280 minutes; series: BASE, WAL Compression, 16GB Max WAL, Reduced Indexes, Non-Random GUID, Aurora PostgreSQL]
[Bar chart: Update Workload – PostgreSQL 9.6; TPS (2 updates per transaction), series left to right with values: BASE 3,729; WAL Compression 3,949; 16GB Max WAL 4,871; Async 4,302; Reduced Indexes 9,177; Non-Random GUID 10,290; Aurora PostgreSQL 17,158]
Thank you!
Questions?

Editor's Notes
• Aurora storage (note repeated across the Aurora slides): quorum system for read/write; latency tolerant; quorum membership changes do not stall writes.
• Ask about the preview environment and other decoders.