Indexes in PostgreSQL (10)

PGDay.IT 2017 - 11th
edition
Milan October, 13th
2017
Giuseppe Broccolo
g.broccolo.7@gmail.com
Viralize.com
Indexes
in
PostgreSQL
(10)

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
The outlineThe outline
• Indexes in PostgreSQL
• What’s new in v10:
– Parallelism
– Hash indexing
– New supports for SP-GiST (inet data)
– Summarization of BRINs

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
~$ whoami~$ whoami
Giuseppe BroccoloGiuseppe Broccolo
- data engineer at- data engineer at
- member of- member of
@giubro
gbroccolo7
gbroccolo
gemini__81

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
PostgreSQL indexesPostgreSQL indexes
• AKA Access Methods

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
PostgreSQL indexesPostgreSQL indexes
• AKA Access Methods
– allow concurrent changes (MVCC compliant)
– persist the information (WAL)
– speed up access to data:
• links to data blocks (sometimes can be avoided)
• Indexes’ blocks live in shared buffers AWA data blocks
8kB8kB8kB8kB
8kB8kB8kB8kB
8kB8kB8kB8kB
8kB8kB8kB8kB
8kB8kB8kB8kB
8kB8kB8kB8kB
8kB8kB8kB8kB
8kB8kB8kB8kB
WALWALWAL
sharedbuffers

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
The default AMs – the treesThe default AMs – the trees
• binary structure hierarchically sorted
– nodes (values, link to pointed nodes, etc.)
– pointing depends from hierarchical criteria
– allow to skip orders of values
• N~O(an
) n~O(logN)→

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
balanced
• N~O(an
) n~O(logN)→
• balanced structures speed up punctual searches

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
balanced
unbalanced
• N~O(an
) n~O(logN)→
• balanced structures speed up punctual searches
• unbalanced ones are quite faster for range
searches

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
The default AMs – the hashesThe default AMs – the hashes
• binary maps (k: v)
– k: the hash of the search key - bucket
– v: the address where the key is stored
– just one kind of search: =
– complexity:
• ~O(1)
– like trees, their sizes are comparable with
the indexed dataset
• ~O(N)
search key
k: value...
hashing
N
complexity
~O(logN)
...
~O(1)

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
The default AMs – the BRINsThe default AMs – the BRINs
• Block Range Indexes:
– À. Herrera, S. Riggs, H. Linnakangas (PG 9.5)
– Range: summarization of adjacent-on-disk blocks
– complexity:
• ~O(N/K), K~10/100
• really small indexes,faster creation
• ~O(N/K’), K’~1000/10000
• can be used for low-selectivity queries
• low performance for “dynamic” data
8kB8kB8kB8kB
8kB8kB8kB8kB
8kB8kB8kB8kB
8kB8kB8kB8kB
8kB8kB8kB8kB
8kB8kB8kB8kB
8kB8kB8kB8kB
8kB8kB8kB8kB
range 0 range 1 range 2 range 3
range 7range 6range 5range 4
Summarization:
blk n. xxxxx
range X blk n. yyyyy
blk n. zzzzz
......

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
The default AMsThe default AMs
• B-tree, GIN, GiST, SP-GiST, Hash, BRIN
• can add user defined new access methods
– fully supported since 9.6 (thanks to postgrespro & 2ndQuadrant)
• CREATE ACCESS METHOD
sortable generalized
balanced unbalanced
trees

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
Extend AMs to datatypes: the OpClassesExtend AMs to datatypes: the OpClasses
• access methods use operator classes (opclass)
•
•
•
• define:
– operators for the needed types
– support functions depending on the access method
• can be extended to specific datatypes
CREATE INDEX idx_name
USING method
ON table (column opclass_name)
WITH (opt=value);
• CREATE OPERATOR CLASS opclass_name
FOR TYPE datatype
USING method
OPERATOR $$(),
[...],
FUNCTION func1(),
[...]

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
Execution plansExecution plans
• IndexScan need to inspect data
pages for row visibility
• IndexOnlyScan just index pages, use
visibility map (PG9.2)
• BitmapIndexScan
BitmapHeapScan 1) reduce # of accesses
using a bitmap
2) used by BRIN to
inspect block ranges
N
complexity
~O(logN)

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
What’s new in PG 10 ?What’s new in PG 10 ?

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
Parallelization in index scansParallelization in index scans
• parallelization is not new in PG (9.6), see G. Ciolli later
– parallel B-tree index scans
– parallel BitmapHeapScan (different areas of the heap are processed
by parallel workers)
– R. Syed, A. Kapila, R. Haas, R. Sabih, D. Kumar, R. Haas, J. Rouhaud

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
Parallelization in Index ScansParallelization in Index Scans
• for B-tree
– Workers inspect leaf pages in parallel
gather
node
gather
node
worker #1
worker #2
worker #N
...
• for bitmap heap scan
– Workers inspect heap chunks in parallel

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
Parallelization in Index ScansParallelization in Index Scans
• The parameters:
– max_parallel_workers (included in max_worker_processes)
– max_parallel_workers_per_gather (included in max_parallel_workers)
– min_parallel_index_scan_size (512kB)
• heuristic: # workers / index size > 512kB * 3# workers
– parallel_setup_cost (1000.0)
– parallel_tuple_cost (0.1)
– force_parallel_mode (false)
• tune them basing on underlying HW!

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
When is parallelization used ?When is parallelization used ?
• Ex. IndexOnlyScan on B-tree
• table/B-tree ~O(300MB)
=# CREATE TABLE test AS
=# SELECT generate_series(1,10000000) t(i);
CREATE
=# CREATE INDEX btree_idx ON test USING btree (i);
CREATE

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
When parallelization is disabled:When parallelization is disabled:
• Ex. IndexOnlyScan on B-tree:
=# EXPLAIN ANALYZE SELECT * FROM test WHERE i=5;
QUERY PLAN
----------------------------------------------------------
Index Only Scan using btree_id on test
(cost=0.43..8.45 rows=1 width=4)
(actual time=0.433..0.434 rows=1 loops=1)
Index Cond: (i = 5)
Heap Fetches: 1
Planning time: 0.525 ms
Execution time: 0.461 ms
(5 rows)

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
• Setup parallel executions:
•
•
•
• Plan does not change!! Force parallelization...
=# SET max_parallel_workers TO 8;
SET
=# SET max_parallel_workers_per_gather TO 8; -- up to 6 workers
SET
=# SET force_parallel_mode TO true;
SET

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
• Ex. IndexOnlyScan on B-tree
=# EXPLAIN ANALYZE SELECT * FROM test WHERE i=5;
QUERY PLAN
----------------------------------------------------------
Gather (cost=1000.43..1008.45 rows=1 width=4)
Workers Planned: 6
Workers Launched: 6
Single Copy: true
-> Index Only Scan using btree_id on test
(cost=0.43..8.45 rows=1 width=4)
Index Cond: (i = 5)
Heap Fetches: 0
(9 rows)

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
• try to “trick” the planner with lower tuple costs:
• the same plan is obtained – and it is still disadvantageous!
– costs parameters are (almost) always fine
– parallelization costs are sustainable in case of (real) big data
=# SET force_parallel_mode TO false;
SET
=# SET parallel_tuple_cost TO 0.01;
SET

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
Hash indexes are now logged!Hash indexes are now logged!
8kB8kB8kB8kB
WALWALWAL
• Hash AMs did not define how index changes had to be logged into WALs:
– Hashes lived just in shared buffers – no crash safe!
– Hashes could not be phisically replicated
• Hashes AMs now include WAL logging (R. Haas, G. Ghosh,
A. Kapila,A. Sharma)

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
• Ex. physical replication, with pre-existing hash index before 1st
base backup:
hot standby
=# d hash_example
Table "public.hash_example"
Column | Type | Modifiers
--------+---------+-----------
i | integer |
Indexes:
"hash_idx" hash (i)
master
=# d hash_example
Table "public.hash_example"
Column | Type | Modifiers
--------+---------+-----------
i | integer |
Indexes:
"hash_idx" hash (i)
WALWAL WALWALWALWAL

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
• pre PostgreSQL 10:
hot standby
=# explain analyze select * from
=# hash_example where i = 123;
QUERY PLAN
-----------------------------------------
Index Scan using hash_idx on hash_example
(cost=0.00..8.02 rows=1 width=21)
[...]
master
=# explain analyze select * from
=# hash_example where i = 123;
ERROR: could not read block 0 in file
"base/16402/458955269": read only 0 of
8192 byte
=# SET enable_index_scan TO false;
SET
WALWAL WALWALWALWAL

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
SP-GiST support forSP-GiST support for inetinet
• Unbalanced indexes perform better in case of inclusion searches:
– Ex. Quad-tree
&&
bbox
• H. Hesegeli extended the use case to IPv4/IPv6 addresses (inet, 7 Bytes/19 Bytes):
– defined the OpClass for inet to be interfaced with SP-GiST AMs
• inet_ops → && >> >>= > >= <> << <<= < <= =
– important improvement in SP-GiST AM: # of child nodes is limited

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
• Ex.
=# CREATE TABLE network_a AS SELECT ((random() * 255)::int::text || '.' ||
=# (random() * 255)::int::text || '.' ||
=# (random() * 255)::int::text || '.' ||
=# (random() * 255)::int::text || '/' ||
=# (random() * 32)::int::text)::inet as addr
=# FROM generate_series(1, 1000);
CREATE
=# CREATE INDEX gist_idx ON network_a USING gist (addr inet_ops);
CREATE
=# CREATE INDEX spgist_idx_a ON network_a USING spgist (addr inet_ops);
CREATE
=# CREATE TABLE network_b AS (
=# SELECT * FROM network_a ORDER BY random() LIMIT 100);
CREATE

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
• Ex. no indexes
=# EXPLAIN ANALYZE SELECT * FROM network_a a JOIN network_b b ON b.addr && a.addr;
QUERY PLAN
-----------------------------------------------------------------------------------
Nested Loop (cost=0.00..15032.50 rows=78724 width=14)
Join Filter: (a.addr && b.addr)
Rows Removed by Join Filter: 905027
-> Seq Scan on network_a a (cost=0.00..15.00 rows=1000 width=7)
-> Materialize (cost=0.00..20.00 rows=1000 width=7)
-> Seq Scan on network_b b (cost=0.00..15.00 rows=1000 width=7)
(8 rows)

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
• Ex. GiST index
QUERY PLAN
-----------------------------------------------------------------------------------
-> Index Only Scan using gist_idx_a on network_a a
(cost=0.14..0.35 rows=10 width=7)
Index Cond: (addr && a.addr)
Heap Fetches: 94973
(7 rows)

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
• Ex. SP-GiST index
QUERY PLAN
-----------------------------------------------------------------------------------
-> Index Only Scan using spgist_idx_a on network_a a
(cost=0.14..0.37 rows=10 width=7)
Index Cond: (addr && a.addr)
Heap Fetches: 94973
(7 rows)

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
BRIN summarization for newBRIN summarization for new INSERTINSERTss
• pre PG 10: perform VACUUM, or call brin_summarize_new_value()
• NOW (Á. Herrera):
– autovacuum daemon is now able to summarize now data in present ranges:
• CREATE INDEX ON table USING brin (column) WITH (autosummarize=on);
– It is possible to summarize/desummarized single blocks (bigint):
• brin_summarize_range / brin_desummarize_range
• BRIN are (still) not able to “shrinks” summarized data
– if you update/delete boundary data, need to REINDEX

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
Other features about indexesOther features about indexes
• Improve hash index performance
(A. Kapila, M. Cy, A. Sharma)
• Improve accuracy in determining if a BRIN index scan is beneficial
(D. Rowley, E. Hasegeli)
• Allow faster GiST INSERTs/UPDATEs by reusing index space efficiently
(A. Borodin)
• Reduce page locking during vacuuming of GIN indexes
(A. Borodin)

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
The future of indexes in PostgreSQLThe future of indexes in PostgreSQL
• Allow compression/decompression AM functions in SP-GiST
OpClasses (good for PostGIS!)
• CREATE GLOBAL INDEX

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
ConclusionsConclusions
• PostgreSQL has a long tradition in indexes development
• different types for different goals
• an eye to the future

edition
Milan October, 13th
2017
Giuseppe Broccolo
Viralize.com
Creative Commons licenseCreative Commons license
This work is licensed under a Creative Commons
Attribution-ShareAlike 4.0 International License
https://creativecommons.org/licenses/by-nc-sa/4.0/
© 2017 Giuseppe Broccolo, ITPUG – www.itpug.org/

Indexes in PostgreSQL (10)

Recommended

Recommended

More Related Content

Similar to Indexes in PostgreSQL (10)

Similar to Indexes in PostgreSQL (10) (20)

Recently uploaded

Recently uploaded (20)

Indexes in PostgreSQL (10)