PGDay.IT 2017 - 11th edition
Milan, October 13th, 2017
Giuseppe Broccolo
g.broccolo.7@gmail.com
Viralize.com
Indexes in PostgreSQL (10)
The outline
• Indexes in PostgreSQL
• What’s new in v10:
– Parallelism
– Hash indexing
– New SP-GiST support (inet data)
– Summarization of BRINs
~$ whoami
Giuseppe Broccolo
- data engineer at
- member of
@giubro
gbroccolo7
gbroccolo
gemini__81
g.broccolo.7@gmail.com
PostgreSQL indexes
• AKA Access Methods
– allow concurrent changes (MVCC compliant)
– persist the information (WAL)
– speed up access to data:
• links to data blocks (heap access can sometimes be avoided)
• index blocks live in shared buffers, as well as data blocks
[Figure: 8kB index and data blocks in shared buffers, with changes logged to the WAL]
The default AMs – the trees
• binary structure hierarchically sorted
– nodes (values, links to child nodes, etc.)
– links depend on hierarchical criteria
– allow skipping orders of magnitude of values
• $N \sim O(a^n) \rightarrow n \sim O(\log N)$
• balanced structures speed up point (exact-match) searches
• unbalanced ones are often faster for range searches
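The logarithmic relation between the number of indexed values N and the number of tree levels n can be checked numerically. The sketch below is only an illustration: `fanout` plays the role of a (entries per node), and the value 256 is an assumed figure, not a PostgreSQL constant.

```python
def tree_levels(n_entries: int, fanout: int) -> int:
    """Smallest number of levels n such that fanout**n >= n_entries."""
    levels = 0
    capacity = 1
    while capacity < n_entries:
        capacity *= fanout
        levels += 1
    return levels

# fanout=256 is an assumed value, chosen only for illustration
for n in (10**3, 10**6, 10**9):
    print(n, tree_levels(n, 256))
```

Growing the dataset by a factor of one million (from 10^3 to 10^9 values) adds only two levels in this toy model, which is why a tree search can skip most of the values.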
The default AMs – the hashes
• key-value maps (k: v)
– k: the hash of the search key (the bucket)
– v: the address where the key is stored
– just one kind of search: =
– complexity:
• ~O(1)
– like trees, their size is comparable with that of
the indexed dataset
• ~O(N)
[Figure: the search key is hashed to find its k: value entry; plot of complexity vs N comparing ~O(logN) with ~O(1)]
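The k: v idea can be sketched in a few lines. This is a toy model, not the PostgreSQL hash AM: the class name `ToyHashIndex`, the fixed number of buckets and the use of row numbers as "addresses" are all assumptions made for illustration.

```python
class ToyHashIndex:
    """Maps hash(key) % n_buckets to a list of (key, row address) pairs."""

    def __init__(self, n_buckets: int = 8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, address):
        self._bucket(key).append((key, address))

    def lookup(self, key):
        # Only equality is supported: one bucket is scanned for exact matches.
        return [addr for k, addr in self._bucket(key) if k == key]

idx = ToyHashIndex()
for row_number, value in enumerate([42, 7, 42, 13]):
    idx.insert(value, row_number)
print(idx.lookup(42))   # row addresses holding the key 42
```

A lookup touches a single bucket regardless of how many keys are stored, which is where the ~O(1) behaviour comes from; a range condition such as `< 42` cannot be answered this way because hashing destroys the ordering.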
The default AMs – the BRINs
• Block Range Indexes:
– Á. Herrera, S. Riggs, H. Linnakangas (PG 9.5)
– Range: summarization of adjacent-on-disk blocks
– complexity:
• ~O(N/K), K~10/100
– really small indexes, faster creation:
• ~O(N/K’), K’~1000/10000
– can be used for low-selectivity queries
– low performance for “dynamic” data
[Figure: 8kB blocks grouped into ranges 0 to 7; the summarization of range X covers its blocks (blk n. xxxxx, blk n. yyyyy, blk n. zzzzz, ...)]
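A rough way to picture range summarization is to keep one (min, max) pair per group of adjacent rows and to use it to discard ranges at query time. The sketch below makes several simplifying assumptions: ranges are counted in rows rather than 8kB blocks, the summary is a plain min/max pair, and the function names are hypothetical.

```python
def brin_summarize(values, rows_per_range):
    """Return one (min, max) summary per range of adjacent rows."""
    summaries = []
    for start in range(0, len(values), rows_per_range):
        chunk = values[start:start + rows_per_range]
        summaries.append((min(chunk), max(chunk)))
    return summaries

def candidate_ranges(summaries, lo, hi):
    """Ranges whose [min, max] interval may contain values in [lo, hi]."""
    return [i for i, (mn, mx) in enumerate(summaries) if mx >= lo and mn <= hi]

data = list(range(1000))               # values correlated with physical order
summaries = brin_summarize(data, 100)  # 10 summaries instead of 1000 entries
print(len(summaries), candidate_ranges(summaries, 250, 320))
```

The index holds one entry per range instead of one per row, and only the candidate ranges have to be read and rechecked. If the values were not correlated with their physical position, most summaries would span almost the whole domain and very few ranges could be skipped.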
The default AMs
• B-tree, GIN, GiST, SP-GiST, Hash, BRIN
• new user-defined access methods can be added
– fully supported since 9.6 (thanks to postgrespro & 2ndQuadrant)
• CREATE ACCESS METHOD
[Figure: classification of the tree AMs: sortable vs generalized, balanced vs unbalanced]
Extend AMs to datatypes: the OpClasses
• access methods use operator classes (opclass)
CREATE INDEX idx_name
ON table USING method (column opclass_name)
WITH (opt=value);
• define:
– operators for the needed types
– support functions depending on the access method
• can be extended to specific datatypes
• CREATE OPERATOR CLASS opclass_name
FOR TYPE datatype
USING method AS
OPERATOR strategy_number $$,
[...],
FUNCTION support_number func1(),
[...]
Execution plans
• IndexScan: needs to inspect data pages for row visibility
• IndexOnlyScan: reads just index pages, using the visibility map (PG 9.2)
• BitmapIndexScan / BitmapHeapScan:
1) reduce the number of heap accesses using a bitmap
2) used by BRIN to inspect block ranges
[Figure: plot of complexity vs N, ~O(logN)]
What’s new in PG 10?
Parallelization in index scans
• parallelization is not new in PG (9.6), see G. Ciolli’s talk later
– parallel B-tree index scans
– parallel BitmapHeapScan (different areas of the heap are processed
by parallel workers)
– R. Syed, A. Kapila, R. Haas, R. Sabih, D. Kumar, J. Rouhaud
Parallelization in Index Scans
• for B-tree
– Workers inspect leaf pages in parallel
• for bitmap heap scan
– Workers inspect heap chunks in parallel
[Figure: workers #1, #2, ... #N feeding a gather node]
Parallelization in Index Scans
• The parameters:
– max_parallel_workers (included in max_worker_processes)
– max_parallel_workers_per_gather (included in max_parallel_workers)
– min_parallel_index_scan_size (512kB)
• heuristic: another worker is added while index size > 512kB * 3^(# workers)
– parallel_setup_cost (1000.0)
– parallel_tuple_cost (0.1)
– force_parallel_mode (false)
• tune them based on the underlying hardware!
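The worker heuristic tied to min_parallel_index_scan_size can be pictured as a loop that adds one worker each time the index is three times larger than the previous threshold. The function below is a toy model, not the planner's code: the name `planned_index_workers`, the choice of starting from one worker at 512kB and the cap parameter are assumptions made for illustration.

```python
def planned_index_workers(index_size_kb, min_scan_size_kb=512, per_gather_cap=8):
    """Toy version of the heuristic; not the planner's actual code."""
    if index_size_kb < min_scan_size_kb:
        return 0                      # index too small: no parallel scan
    workers, threshold = 1, min_scan_size_kb
    # Add one worker each time the index is at least 3x the current threshold.
    while index_size_kb >= threshold * 3:
        workers += 1
        threshold *= 3
    return min(workers, per_gather_cap)

print(planned_index_workers(300 * 1024))  # index of about 300MB
```

Because the threshold grows geometrically, the number of workers grows only with the logarithm of the index size, and max_parallel_workers_per_gather (modelled here by `per_gather_cap`) bounds it from above.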
When is parallelization used?
• Ex. IndexOnlyScan on B-tree
• table/B-tree ~O(300MB)
=# CREATE TABLE test AS
=# SELECT i FROM generate_series(1,10000000) t(i);
CREATE
=# CREATE INDEX btree_idx ON test USING btree (i);
CREATE
When parallelization is disabled:
• Ex. IndexOnlyScan on B-tree:
=# EXPLAIN ANALYZE SELECT * FROM test WHERE i=5;
QUERY PLAN
----------------------------------------------------------
Index Only Scan using btree_idx on test
(cost=0.43..8.45 rows=1 width=4)
(actual time=0.433..0.434 rows=1 loops=1)
Index Cond: (i = 5)
Heap Fetches: 1
Planning time: 0.525 ms
Execution time: 0.461 ms
(5 rows)
When is parallelization used?
• Set up parallel execution:
=# SET max_parallel_workers TO 8;
SET
=# SET max_parallel_workers_per_gather TO 8; -- up to 6 workers
SET
• The plan does not change! Force parallelization...
=# SET force_parallel_mode TO true;
SET
When is parallelization used?
• Ex. IndexOnlyScan on B-tree
=# EXPLAIN ANALYZE SELECT * FROM test WHERE i=5;
QUERY PLAN
----------------------------------------------------------
Gather (cost=1000.43..1008.45 rows=1 width=4)
(actual time=2.523..2.579 rows=1 loops=1)
Workers Planned: 6
Workers Launched: 6
Single Copy: true
-> Index Only Scan using btree_idx on test
(cost=0.43..8.45 rows=1 width=4)
(actual time=0.030..0.032 rows=1 loops=1)
Index Cond: (i = 5)
Heap Fetches: 0
Planning time: 0.063 ms
Execution time: 3.934 ms
(9 rows)
When is parallelization used?
• try to “trick” the planner with lower tuple costs:
=# SET force_parallel_mode TO false;
SET
=# SET parallel_tuple_cost TO 0.01;
SET
• the same plan is obtained, and it is still disadvantageous!
– the default cost parameters are (almost) always fine
– parallelization costs pay off only with (really) big data
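Why lowering parallel_tuple_cost does not make the parallel plan attractive for this query can be seen with a back-of-the-envelope cost comparison. The formula below is a simplified model (setup cost plus child cost plus a per-row transfer cost); the function name is hypothetical and the real planner takes more factors into account.

```python
def gather_total_cost(child_total_cost, rows,
                      parallel_setup_cost=1000.0, parallel_tuple_cost=0.1):
    """Simplified Gather cost: setup + child cost + per-tuple transfer cost."""
    return parallel_setup_cost + child_total_cost + parallel_tuple_cost * rows

serial_cost = 8.45   # total cost of the Index Only Scan in the plans shown earlier
for tuple_cost in (0.1, 0.01):
    parallel_cost = gather_total_cost(serial_cost, rows=1,
                                      parallel_tuple_cost=tuple_cost)
    print(tuple_cost, round(parallel_cost, 2), parallel_cost > serial_cost)
```

With a single returned row the per-tuple term is negligible: parallel_setup_cost alone is two orders of magnitude larger than the whole serial scan, so the serial plan stays cheaper whatever the tuple cost.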
Hash indexes are now logged!
[Figure: 8kB hash index blocks and the WAL]
• The hash AM did not define how index changes had to be logged to the WAL:
– hash index changes were not WAL-logged: not crash safe!
– hash indexes could not be physically replicated
• The hash AM now includes WAL logging (R. Haas, G. Ghosh,
A. Kapila, A. Sharma)
Hash indexes are now logged!
• Ex. physical replication, with a hash index created before the first
base backup:
hot standby
=# \d hash_example
Table "public.hash_example"
Column | Type | Modifiers
--------+---------+-----------
i | integer |
Indexes:
"hash_idx" hash (i)
master
=# \d hash_example
Table "public.hash_example"
Column | Type | Modifiers
--------+---------+-----------
i | integer |
Indexes:
"hash_idx" hash (i)
[Figure: WAL streamed from master to hot standby]
Hash indexes are now logged!
• pre PostgreSQL 10:
master
=# explain analyze select * from
=# hash_example where i = 123;
QUERY PLAN
-----------------------------------------
Index Scan using hash_idx on hash_example
(cost=0.00..8.02 rows=1 width=21)
(actual time=1.526..1.529 rows=1 loops=1)
[...]
hot standby
=# explain analyze select * from
=# hash_example where i = 123;
ERROR: could not read block 0 in file
"base/16402/458955269": read only 0 of
8192 bytes
=# SET enable_indexscan TO false;
SET
[Figure: WAL streamed from master to hot standby]
SP-GiST support for inet
• Unbalanced indexes perform better for inclusion searches:
– Ex. Quad-tree
[Figure: quad-tree partitioning with an && search on a bbox]
• E. Hasegeli extended the use case to IPv4/IPv6 addresses (inet, 7 bytes/19 bytes):
– defined the OpClass for inet to interface with the SP-GiST AM
• inet_ops → && >> >>= > >= <> << <<= < <= =
– important improvement in the SP-GiST AM: the number of child nodes is limited
SP-GiST support for inet
• Ex.
=# CREATE TABLE network_a AS SELECT ((random() * 255)::int::text || '.' ||
=# (random() * 255)::int::text || '.' ||
=# (random() * 255)::int::text || '.' ||
=# (random() * 255)::int::text || '/' ||
=# (random() * 32)::int::text)::inet as addr
=# FROM generate_series(1, 1000);
CREATE
=# CREATE INDEX gist_idx_a ON network_a USING gist (addr inet_ops);
CREATE
=# CREATE INDEX spgist_idx_a ON network_a USING spgist (addr inet_ops);
CREATE
=# CREATE TABLE network_b AS (
=# SELECT * FROM network_a ORDER BY random() LIMIT 100);
CREATE
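The && operator used in the join conditions of the following plans tests whether two networks overlap, that is, whether one contains or equals the other. The sketch below mimics that test with Python's standard `ipaddress` module; the helper name `inet_overlaps` and the sample addresses are made up for illustration, and the list comprehension plays the role of an unindexed nested-loop join.

```python
from ipaddress import ip_network

def inet_overlaps(a: str, b: str) -> bool:
    """True when one network contains or equals the other (like inet &&)."""
    # strict=False masks host bits, so '10.1.2.3/8' is read as '10.0.0.0/8'
    return ip_network(a, strict=False).overlaps(ip_network(b, strict=False))

network_a = ["10.1.2.3/8", "192.168.1.0/24", "172.16.0.0/12"]
network_b = ["10.200.0.0/16", "192.168.2.0/24"]

# Nested-loop join with the overlap test as join filter
pairs = [(a, b) for a in network_a for b in network_b if inet_overlaps(a, b)]
print(pairs)
```

Every pair has to be tested here, which is the quadratic behaviour an index on the inner table avoids.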
SP-GiST support for inet
• Ex. no indexes
=# EXPLAIN ANALYZE SELECT * FROM network_a a JOIN network_b b ON b.addr && a.addr;
QUERY PLAN
-----------------------------------------------------------------------------------
Nested Loop (cost=0.00..15032.50 rows=78724 width=14)
(actual time=0.017..185.134 rows=94973 loops=1)
Join Filter: (a.addr && b.addr)
Rows Removed by Join Filter: 905027
-> Seq Scan on network_a a (cost=0.00..15.00 rows=1000 width=7)
(actual time=0.008..0.187 rows=1000 loops=1)
-> Materialize (cost=0.00..20.00 rows=1000 width=7)
(actual time=0.000..0.061 rows=1000 loops=1000)
-> Seq Scan on network_b b (cost=0.00..15.00 rows=1000 width=7)
(actual time=0.005..0.083 rows=1000 loops=1)
Planning time: 0.522 ms
Execution time: 190.120 ms
(8 rows)
SP-GiST support for inet
• Ex. GiST index
=# EXPLAIN ANALYZE SELECT * FROM network_a a JOIN network_b b ON b.addr && a.addr;
QUERY PLAN
-----------------------------------------------------------------------------------
Nested Loop (cost=0.14..631.40 rows=13600 width=39)
(actual time=0.048..112.023 rows=94973 loops=1)
-> Seq Scan on network_b b (cost=0.00..23.60 rows=1360 width=32)
(actual time=0.016..0.153 rows=1000 loops=1)
-> Index Only Scan using gist_idx_a on network_a a
(cost=0.14..0.35 rows=10 width=7)
(actual time=0.018..0.093 rows=95 loops=1000)
Index Cond: (addr && a.addr)
Heap Fetches: 94973
Planning time: 0.111 ms
Execution time: 119.433 ms
(7 rows)
SP-GiST support for inet
• Ex. SP-GiST index
=# EXPLAIN ANALYZE SELECT * FROM network_a a JOIN network_b b ON b.addr && a.addr;
QUERY PLAN
-----------------------------------------------------------------------------------
Nested Loop (cost=0.14..667.40 rows=13600 width=39)
(actual time=0.034..58.196 rows=94973 loops=1)
-> Seq Scan on network_b b (cost=0.00..23.60 rows=1360 width=32)
(actual time=0.009..0.105 rows=1000 loops=1)
-> Index Only Scan using spgist_idx_a on network_a a
(cost=0.14..0.37 rows=10 width=7)
(actual time=0.008..0.042 rows=95 loops=1000)
Index Cond: (addr && a.addr)
Heap Fetches: 94973
Planning time: 0.109 ms
Execution time: 63.562 ms
(7 rows)
BRIN summarization for new INSERTs
• pre PG 10: perform VACUUM, or call brin_summarize_new_values()
• NOW (Á. Herrera):
– the autovacuum daemon is now able to summarize new data in present ranges:
• CREATE INDEX ON table USING brin (column) WITH (autosummarize=on);
– it is possible to summarize/desummarize single block ranges, identified by a block number (bigint):
• brin_summarize_range / brin_desummarize_range
• BRIN indexes are (still) not able to “shrink” summarized data
– if you update/delete boundary data, you need to REINDEX
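The "no shrinking" limitation can be illustrated with a toy min/max summary that is widened on insert and left untouched on delete. This is only a model of the behaviour described above: the class `RangeSummary` and its methods are hypothetical and do not correspond to PostgreSQL internals.

```python
class RangeSummary:
    """Toy min/max summary of one block range; it widens but never shrinks."""

    def __init__(self):
        self.lo = None
        self.hi = None

    def on_insert(self, value):
        self.lo = value if self.lo is None else min(self.lo, value)
        self.hi = value if self.hi is None else max(self.hi, value)

    def on_delete(self, value):
        pass  # the summary is left untouched when rows are removed

    def may_contain(self, value):
        return self.lo is not None and self.lo <= value <= self.hi

summary = RangeSummary()
rows = [10, 20, 900]
for v in rows:
    summary.on_insert(v)
rows.remove(900)
summary.on_delete(900)               # boundary value deleted
print(summary.may_contain(500))      # the range is still scanned for 500

rebuilt = RangeSummary()             # what rebuilding the summary would compute
for v in rows:
    rebuilt.on_insert(v)
print(rebuilt.may_contain(500))
```

The stale summary still claims the range may hold 500, so the range keeps being scanned until the summary is rebuilt from the surviving rows.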
Other features about indexes
• Improve hash index performance
(A. Kapila, M. Cy, A. Sharma)
• Improve accuracy in determining if a BRIN index scan is beneficial
(D. Rowley, E. Hasegeli)
• Allow faster GiST INSERTs/UPDATEs by reusing index space efficiently
(A. Borodin)
• Reduce page locking during vacuuming of GIN indexes
(A. Borodin)
The future of indexes in PostgreSQL
• Allow compression/decompression AM functions in SP-GiST
OpClasses (good for PostGIS!)
• CREATE GLOBAL INDEX
Conclusions
• PostgreSQL has a long tradition in index development
• different types for different goals
• an eye to the future
Creative Commons license
This work is licensed under a Creative Commons
Attribution-ShareAlike 4.0 International License
https://creativecommons.org/licenses/by-nc-sa/4.0/
© 2017 Giuseppe Broccolo, ITPUG – www.itpug.org/

More Related Content

Similar to Indexes in PostgreSQL (10)

PG Day'14 Russia, GIN — Stronger than ever in 9.4 and further, Александр Коро...
PG Day'14 Russia, GIN — Stronger than ever in 9.4 and further, Александр Коро...PG Day'14 Russia, GIN — Stronger than ever in 9.4 and further, Александр Коро...
PG Day'14 Russia, GIN — Stronger than ever in 9.4 and further, Александр Коро...
pgdayrussia
 
Big Data for crowd mobility
Big Data for crowd mobilityBig Data for crowd mobility
Big Data for crowd mobility
Silvia Pichler
 
Devteach 2017 Store 2 million of audit a day into elasticsearch
Devteach 2017 Store 2 million of audit a day into elasticsearchDevteach 2017 Store 2 million of audit a day into elasticsearch
Devteach 2017 Store 2 million of audit a day into elasticsearch
Taswar Bhatti
 
High scalable applications with Python
High scalable applications with PythonHigh scalable applications with Python
High scalable applications with Python
Giuseppe Broccolo
 
Introduction to R for data science
Introduction to R for data scienceIntroduction to R for data science
Introduction to R for data science
Long Nguyen
 
Query Processing and Optimisation - Lecture 10 - Introduction to Databases (1...
Query Processing and Optimisation - Lecture 10 - Introduction to Databases (1...Query Processing and Optimisation - Lecture 10 - Introduction to Databases (1...
Query Processing and Optimisation - Lecture 10 - Introduction to Databases (1...
Beat Signer
 
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...
Citus Data
 
ETW - Monitor Anything, Anytime, Anywhere (NDC Oslo 2017)
ETW - Monitor Anything, Anytime, Anywhere (NDC Oslo 2017)ETW - Monitor Anything, Anytime, Anywhere (NDC Oslo 2017)
ETW - Monitor Anything, Anytime, Anywhere (NDC Oslo 2017)
Dina Goldshtein
 
[FOSS4G 2017 Boston]Development of an extension of Geoserver for handling 3D ...
[FOSS4G 2017 Boston]Development of an extension of Geoserver for handling 3D ...[FOSS4G 2017 Boston]Development of an extension of Geoserver for handling 3D ...
[FOSS4G 2017 Boston]Development of an extension of Geoserver for handling 3D ...
Hyung-Gyu Ryoo
 
Access Methods - Lecture 9 - Introduction to Databases (1007156ANR)
Access Methods - Lecture 9 - Introduction to Databases (1007156ANR)Access Methods - Lecture 9 - Introduction to Databases (1007156ANR)
Access Methods - Lecture 9 - Introduction to Databases (1007156ANR)
Beat Signer
 
Context Semantic Analysis: a knowledge-based technique for computing inter-do...
Context Semantic Analysis: a knowledge-based technique for computing inter-do...Context Semantic Analysis: a knowledge-based technique for computing inter-do...
Context Semantic Analysis: a knowledge-based technique for computing inter-do...
Fabio Benedetti
 
Bicod2017
Bicod2017Bicod2017
Bicod2017
Rim Moussa
 
BICOD-2017
BICOD-2017BICOD-2017
BICOD-2017
Rim Moussa
 
BigML Summer 2017 Release
BigML Summer 2017 ReleaseBigML Summer 2017 Release
BigML Summer 2017 Release
BigML, Inc
 
Salottino MIX 2017 - ARouteServer - IXP Automation Made Easy
Salottino MIX 2017 - ARouteServer - IXP Automation Made EasySalottino MIX 2017 - ARouteServer - IXP Automation Made Easy
Salottino MIX 2017 - ARouteServer - IXP Automation Made Easy
Pier Carlo Chiodi
 
GeoServer in Production: we do it, here is how!
GeoServer in Production: we do it, here is how!GeoServer in Production: we do it, here is how!
GeoServer in Production: we do it, here is how!
GeoSolutions
 
Provenance and Analytics for Social Machines, Trung Dong Huynh
Provenance and Analytics for Social Machines, Trung Dong HuynhProvenance and Analytics for Social Machines, Trung Dong Huynh
Provenance and Analytics for Social Machines, Trung Dong Huynh
Ulrik Lyngs
 
Juraj vysvader - Python developer's CV
Juraj vysvader - Python developer's CVJuraj vysvader - Python developer's CV
Juraj vysvader - Python developer's CV
Juraj Vysvader
 
Gabriele Petronella - Mythical trees and where to find them - Codemotion Mila...
Gabriele Petronella - Mythical trees and where to find them - Codemotion Mila...Gabriele Petronella - Mythical trees and where to find them - Codemotion Mila...
Gabriele Petronella - Mythical trees and where to find them - Codemotion Mila...
Codemotion
 
ISEC'18 Keynote: Intelligent Software Engineering: Synergy between AI and Sof...
ISEC'18 Keynote: Intelligent Software Engineering: Synergy between AI and Sof...ISEC'18 Keynote: Intelligent Software Engineering: Synergy between AI and Sof...
ISEC'18 Keynote: Intelligent Software Engineering: Synergy between AI and Sof...
Tao Xie
 

Similar to Indexes in PostgreSQL (10) (20)

PG Day'14 Russia, GIN — Stronger than ever in 9.4 and further, Александр Коро...
PG Day'14 Russia, GIN — Stronger than ever in 9.4 and further, Александр Коро...PG Day'14 Russia, GIN — Stronger than ever in 9.4 and further, Александр Коро...
PG Day'14 Russia, GIN — Stronger than ever in 9.4 and further, Александр Коро...
 
Big Data for crowd mobility
Big Data for crowd mobilityBig Data for crowd mobility
Big Data for crowd mobility
 
Devteach 2017 Store 2 million of audit a day into elasticsearch
Devteach 2017 Store 2 million of audit a day into elasticsearchDevteach 2017 Store 2 million of audit a day into elasticsearch
Devteach 2017 Store 2 million of audit a day into elasticsearch
 
High scalable applications with Python
High scalable applications with PythonHigh scalable applications with Python
High scalable applications with Python
 
Introduction to R for data science
Introduction to R for data scienceIntroduction to R for data science
Introduction to R for data science
 
Query Processing and Optimisation - Lecture 10 - Introduction to Databases (1...
Query Processing and Optimisation - Lecture 10 - Introduction to Databases (1...Query Processing and Optimisation - Lecture 10 - Introduction to Databases (1...
Query Processing and Optimisation - Lecture 10 - Introduction to Databases (1...
 
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...
 
ETW - Monitor Anything, Anytime, Anywhere (NDC Oslo 2017)
ETW - Monitor Anything, Anytime, Anywhere (NDC Oslo 2017)ETW - Monitor Anything, Anytime, Anywhere (NDC Oslo 2017)
ETW - Monitor Anything, Anytime, Anywhere (NDC Oslo 2017)
 
[FOSS4G 2017 Boston]Development of an extension of Geoserver for handling 3D ...
[FOSS4G 2017 Boston]Development of an extension of Geoserver for handling 3D ...[FOSS4G 2017 Boston]Development of an extension of Geoserver for handling 3D ...
[FOSS4G 2017 Boston]Development of an extension of Geoserver for handling 3D ...
 
Access Methods - Lecture 9 - Introduction to Databases (1007156ANR)
Access Methods - Lecture 9 - Introduction to Databases (1007156ANR)Access Methods - Lecture 9 - Introduction to Databases (1007156ANR)
Access Methods - Lecture 9 - Introduction to Databases (1007156ANR)
 
Context Semantic Analysis: a knowledge-based technique for computing inter-do...
Context Semantic Analysis: a knowledge-based technique for computing inter-do...Context Semantic Analysis: a knowledge-based technique for computing inter-do...
Context Semantic Analysis: a knowledge-based technique for computing inter-do...
 
Bicod2017
Bicod2017Bicod2017
Bicod2017
 
BICOD-2017
BICOD-2017BICOD-2017
BICOD-2017
 
BigML Summer 2017 Release
BigML Summer 2017 ReleaseBigML Summer 2017 Release
BigML Summer 2017 Release
 
Salottino MIX 2017 - ARouteServer - IXP Automation Made Easy
Salottino MIX 2017 - ARouteServer - IXP Automation Made EasySalottino MIX 2017 - ARouteServer - IXP Automation Made Easy
Salottino MIX 2017 - ARouteServer - IXP Automation Made Easy
 
GeoServer in Production: we do it, here is how!
GeoServer in Production: we do it, here is how!GeoServer in Production: we do it, here is how!
GeoServer in Production: we do it, here is how!
 
Provenance and Analytics for Social Machines, Trung Dong Huynh
Provenance and Analytics for Social Machines, Trung Dong HuynhProvenance and Analytics for Social Machines, Trung Dong Huynh
Provenance and Analytics for Social Machines, Trung Dong Huynh
 
Juraj vysvader - Python developer's CV
Juraj vysvader - Python developer's CVJuraj vysvader - Python developer's CV
Juraj vysvader - Python developer's CV
 
Gabriele Petronella - Mythical trees and where to find them - Codemotion Mila...
Gabriele Petronella - Mythical trees and where to find them - Codemotion Mila...Gabriele Petronella - Mythical trees and where to find them - Codemotion Mila...
Gabriele Petronella - Mythical trees and where to find them - Codemotion Mila...
 
ISEC'18 Keynote: Intelligent Software Engineering: Synergy between AI and Sof...
ISEC'18 Keynote: Intelligent Software Engineering: Synergy between AI and Sof...ISEC'18 Keynote: Intelligent Software Engineering: Synergy between AI and Sof...
ISEC'18 Keynote: Intelligent Software Engineering: Synergy between AI and Sof...
 

Recently uploaded

DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
Jio cinema Retention & Engagement Strategy.pdf
Jio cinema Retention & Engagement Strategy.pdfJio cinema Retention & Engagement Strategy.pdf
Jio cinema Retention & Engagement Strategy.pdf
inaya7568
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理
zsafxbf
 
Drownings spike from May to August in children
Drownings spike from May to August in childrenDrownings spike from May to August in children
Drownings spike from May to August in children
Bisnar Chase Personal Injury Attorneys
 
一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理
keesa2
 
REUSE-SCHOOL-DATA-INTEGRATED-SYSTEMS.pptx
REUSE-SCHOOL-DATA-INTEGRATED-SYSTEMS.pptxREUSE-SCHOOL-DATA-INTEGRATED-SYSTEMS.pptx
REUSE-SCHOOL-DATA-INTEGRATED-SYSTEMS.pptx
KiriakiENikolaidou
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
xclpvhuk
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
hqfek
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Kaxil Naik
 
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
actyx
 
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
eudsoh
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
mkkikqvo
 
Build applications with generative AI on Google Cloud
Build applications with generative AI on Google CloudBuild applications with generative AI on Google Cloud
Build applications with generative AI on Google Cloud
Márton Kodok
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
ihavuls
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 

Recently uploaded (20)

DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
Jio cinema Retention & Engagement Strategy.pdf
Jio cinema Retention & Engagement Strategy.pdfJio cinema Retention & Engagement Strategy.pdf
Jio cinema Retention & Engagement Strategy.pdf
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理
 
Drownings spike from May to August in children

Indexes in PostgreSQL (10)

  • 1. PGDay.IT 2017 - 11th edition Milan October, 13th 2017 Giuseppe Broccolo g.broccolo.7@gmail.com Viralize.com Indexes in PostgreSQL (10)
  • 2. The outline
      • Indexes in PostgreSQL
      • What’s new in v10:
          – Parallelism
          – Hash indexing
          – New support for SP-GiST (inet data)
          – Summarization of BRINs
  • 3. ~$ whoami
      Giuseppe Broccolo
      - data engineer at
      - member of
      @giubro, gbroccolo7, gbroccolo, gemini__81
      g.broccolo.7@gmail.com
  • 4. PostgreSQL indexes
      • AKA Access Methods
  • 5. PostgreSQL indexes
      • AKA Access Methods
          – allow concurrent changes (MVCC compliant)
          – persist the information (WAL)
          – speed up access to data:
              • links to data blocks (following them can sometimes be avoided)
              • index blocks live in shared buffers, as well as data blocks
      [Figure: 8kB blocks in shared buffers, backed by the WAL]
  • 6. The default AMs – the trees
      • binary structure hierarchically sorted
          – nodes (values, links to the nodes they point to, etc.)
          – pointing depends on hierarchical criteria
          – allow skipping orders of magnitude of values
              • N ~ O(a^n) → n ~ O(log N)
  • 7. The default AMs – the trees
      • binary structure hierarchically sorted
          – nodes (values, links to the nodes they point to, etc.)
          – pointing depends on hierarchical criteria
          – allow skipping orders of magnitude of values
              • N ~ O(a^n) → n ~ O(log N)
      • balanced structures speed up point searches
      [Figure: a balanced tree]
  • 8. The default AMs – the trees
      • binary structure hierarchically sorted
          – nodes (values, links to the nodes they point to, etc.)
          – pointing depends on hierarchical criteria
          – allow skipping orders of magnitude of values
              • N ~ O(a^n) → n ~ O(log N)
      • balanced structures speed up point searches
      • unbalanced ones can be considerably faster for range searches
      [Figure: a balanced tree next to an unbalanced tree]
  • 9. The default AMs – the hashes
      • binary maps (k: v)
          – k: the hash of the search key (the bucket)
          – v: the address where the key is stored
          – just one kind of search: =
          – complexity:
              • ~O(1)
          – like trees, their sizes are comparable with the indexed dataset
              • ~O(N)
      [Figure: a search key is hashed to k, which maps to its value; plot of complexity vs. N, ~O(logN) compared with ~O(1)]
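The key-to-bucket mapping on this slide can be sketched with a toy Python class. This is only an illustration of the idea, not PostgreSQL's hash access method: `ToyHashIndex`, `n_buckets` and the use of list positions as row addresses are all assumptions made for the example.

```python
from collections import defaultdict


class ToyHashIndex:
    """Toy equality-only index: hash(key) picks a bucket of (key, row address) pairs."""

    def __init__(self, n_buckets=8):
        self.n_buckets = n_buckets
        self.buckets = defaultdict(list)

    def insert(self, key, row_address):
        # One entry per indexed row, so the index grows as ~O(N).
        self.buckets[hash(key) % self.n_buckets].append((key, row_address))

    def lookup(self, key):
        # Only '=' is supported: a single bucket is probed, whatever the table size.
        bucket = self.buckets[hash(key) % self.n_buckets]
        return [addr for k, addr in bucket if k == key]


# Hypothetical table of four rows; the list position plays the role of the row address.
index = ToyHashIndex()
for address, value in enumerate([42, 7, 42, 13]):
    index.insert(value, address)
```

A range predicate such as `< 10` cannot use this structure, because neighbouring keys land in unrelated buckets.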
  • 10. The default AMs – the BRINs
      • Block Range Indexes:
          – Á. Herrera, S. Riggs, H. Linnakangas (PG 9.5)
          – Range: summarization of adjacent-on-disk blocks
          – complexity:
              • ~O(N/K), K~10/100
      • really small indexes, faster creation
              • ~O(N/K’), K’~1000/10000
      • can be used for low-selectivity queries
      • low performance for “dynamic” data
      [Figure: 8kB blocks grouped into ranges 0 to 7; each range X is summarized from its blocks (blk n. xxxxx, blk n. yyyyy, blk n. zzzzz, ...)]
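As a rough illustration of block-range summarization, the Python sketch below keeps one (min, max) pair per group of adjacent blocks and uses it to discard ranges that cannot match. It is a toy model under simplifying assumptions (integer values, non-empty blocks, a min/max summary); the names `build_brin`, `candidate_ranges` and `pages_per_range` as a function argument are chosen for the example.

```python
def build_brin(blocks, pages_per_range):
    """Summarize each run of adjacent blocks as a (min, max) pair."""
    summaries = []
    for start in range(0, len(blocks), pages_per_range):
        values = [v for block in blocks[start:start + pages_per_range] for v in block]
        summaries.append((min(values), max(values)))
    return summaries


def candidate_ranges(summaries, low, high):
    """Return the ranges whose summary may hold values in [low, high]."""
    return [i for i, (lo, hi) in enumerate(summaries) if lo <= high and hi >= low]


# 8 blocks of 4 values each, physically ordered (the favourable case for BRIN).
blocks = [list(range(i * 4, i * 4 + 4)) for i in range(8)]
summaries = build_brin(blocks, pages_per_range=2)
```

The index holds one entry per range rather than one per row, which is why it stays small; every block of a candidate range still has to be read and rechecked.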
  • 11. The default AMs
      • B-tree, GIN, GiST, SP-GiST, Hash, BRIN
      • user-defined access methods can be added
          – fully supported since 9.6 (thanks to postgrespro & 2ndQuadrant)
              • CREATE ACCESS METHOD
      [Figure: the tree AMs classified as sortable, generalized, balanced, unbalanced]
  • 12. Extend AMs to datatypes: the OpClasses
      • access methods use operator classes (opclass)
          CREATE INDEX idx_name ON table USING method (column opclass_name) WITH (opt=value);
      • define:
          – operators for the needed types
          – support functions depending on the access method
      • can be extended to specific datatypes
          CREATE OPERATOR CLASS opclass_name FOR TYPE datatype USING method AS OPERATOR $$(), [...], FUNCTION func1(), [...]
  • 13. Execution plans
      • IndexScan: needs to inspect data pages for row visibility
      • IndexOnlyScan: reads just index pages, using the visibility map (PG 9.2)
      • BitmapIndexScan + BitmapHeapScan:
          1) reduce the number of accesses using a bitmap
          2) used by BRIN to inspect block ranges
      [Figure: plot of complexity vs. N, ~O(logN)]
  • 14. What’s new in PG 10?
  • 15. Parallelization in index scans
      • parallelization is not new in PG (9.6), see G. Ciolli later
          – parallel B-tree index scans
          – parallel BitmapHeapScan (different areas of the heap are processed by parallel workers)
          – R. Syed, A. Kapila, R. Haas, R. Sabih, D. Kumar, J. Rouhaud
  • 16. Parallelization in Index Scans
      • for B-tree
          – workers inspect leaf pages in parallel
      • for bitmap heap scan
          – workers inspect heap chunks in parallel
      [Figure: a gather node collecting results from worker #1, worker #2, ..., worker #N]
  • 17. Parallelization in Index Scans
      • The parameters:
          – max_parallel_workers (included in max_worker_processes)
          – max_parallel_workers_per_gather (included in max_parallel_workers)
          – min_parallel_index_scan_size (512kB)
              • heuristic: # workers such that index size > 512kB * 3^(# workers)
          – parallel_setup_cost (1000.0)
          – parallel_tuple_cost (0.1)
          – force_parallel_mode (false)
      • tune them based on the underlying HW!
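The size-based heuristic can be sketched in Python as follows. This is a simplified model, not planner code: it assumes one worker once the index reaches min_parallel_index_scan_size and one more each time the size triples again, capped by max_parallel_workers_per_gather. The function name and the kB-based arguments are made up for the example, and the exact exponent offset is an assumption.

```python
def planned_index_workers(index_size_kb, min_parallel_index_scan_size_kb=512,
                          max_parallel_workers_per_gather=2):
    """Estimate how many workers a parallel index scan would be given."""
    if index_size_kb < min_parallel_index_scan_size_kb:
        return 0  # too small: no parallel scan is considered
    workers = 1
    threshold = max(min_parallel_index_scan_size_kb, 1)
    # One more worker every time the index is three times larger again.
    while index_size_kb >= threshold * 3:
        workers += 1
        threshold *= 3
    # Never exceed the per-Gather limit.
    return min(workers, max_parallel_workers_per_gather)


# An index of about 300MB with the per-Gather limit raised to 8.
workers_for_300mb = planned_index_workers(300 * 1024, max_parallel_workers_per_gather=8)
```

Under these assumptions an index of roughly 300MB is given 6 workers even when up to 8 are allowed, because the size thresholds, not the limit, become the binding constraint.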
  • 18. When is parallelization used?
      • Ex. IndexOnlyScan on B-tree
      • table/B-tree ~O(300MB)
          =# CREATE TABLE test AS
          =# SELECT * FROM generate_series(1,10000000) t(i);
          CREATE
          =# CREATE INDEX btree_idx ON test USING btree (i);
          CREATE
  • 19. When parallelization is disabled:
      • Ex. IndexOnlyScan on B-tree:
          =# EXPLAIN ANALYZE SELECT * FROM test WHERE i=5;
                                  QUERY PLAN
          ----------------------------------------------------------
           Index Only Scan using btree_idx on test (cost=0.43..8.45 rows=1 width=4) (actual time=0.433..0.434 rows=1 loops=1)
             Index Cond: (i = 5)
             Heap Fetches: 1
           Planning time: 0.525 ms
           Execution time: 0.461 ms
          (5 rows)
  • 20. When is parallelization used?
      • Setup parallel executions:
          =# SET max_parallel_workers TO 8;
          SET
          =# SET max_parallel_workers_per_gather TO 8; -- up to 6 workers
          SET
      • Plan does not change!! Force parallelization...
          =# SET force_parallel_mode TO true;
          SET
  • 21. When is parallelization used?
      • Ex. IndexOnlyScan on B-tree
          =# EXPLAIN ANALYZE SELECT * FROM test WHERE i=5;
                                  QUERY PLAN
          ----------------------------------------------------------
           Gather (cost=1000.43..1008.45 rows=1 width=4) (actual time=2.523..2.579 rows=1 loops=1)
             Workers Planned: 6
             Workers Launched: 6
             Single Copy: true
             -> Index Only Scan using btree_idx on test (cost=0.43..8.45 rows=1 width=4) (actual time=0.030..0.032 rows=1 loops=1)
                  Index Cond: (i = 5)
                  Heap Fetches: 0
           Planning time: 0.063 ms
           Execution time: 3.934 ms
          (9 rows)
  • 22. When is parallelization used?
      • try to “trick” the planner with lower tuple costs:
          =# SET force_parallel_mode TO false;
          SET
          =# SET parallel_tuple_cost TO 0.01;
          SET
      • the same plan is obtained
          – and it is still disadvantageous!
          – cost parameters are (almost) always fine
          – parallelization costs pay off only with (really) big data
  • 23. Hash indexes are now logged!
      • Hash AMs did not define how index changes had to be logged into WALs:
          – hashes lived just in shared buffers
          – not crash safe!
          – hashes could not be physically replicated
      • Hash AMs now include WAL logging (R. Haas, G. Ghosh, A. Kapila, A. Sharma)
      [Figure: 8kB blocks and the WAL]
  • 24. Hash indexes are now logged!
      • Ex. physical replication, with a hash index that already exists before the 1st base backup:
          hot standby
          =# \d hash_example
          Table "public.hash_example"
           Column |  Type   | Modifiers
          --------+---------+-----------
           i      | integer |
          Indexes:
              "hash_idx" hash (i)
          master
          =# \d hash_example
          Table "public.hash_example"
           Column |  Type   | Modifiers
          --------+---------+-----------
           i      | integer |
          Indexes:
              "hash_idx" hash (i)
      [Figure: WAL flowing from master to hot standby]
  • 25. Hash indexes are now logged!
      • pre PostgreSQL 10:
          master
          =# explain analyze select * from
          =# hash_example where i = 123;
                        QUERY PLAN
          -----------------------------------------
           Index Scan using hash_idx on hash_example (cost=0.00..8.02 rows=1 width=21) (actual time=1.526..1.529 rows=1 loops=1)
          [...]
          hot standby
          =# explain analyze select * from
          =# hash_example where i = 123;
          ERROR: could not read block 0 in file "base/16402/458955269": read only 0 of 8192 bytes
          =# SET enable_indexscan TO false;
          SET
      [Figure: WAL flowing from master to hot standby]
  • 26. SP-GiST support for inet
      • Unbalanced indexes perform better in case of inclusion searches:
          – Ex. Quad-tree && bbox
      • E. Hasegeli extended the use case to IPv4/IPv6 addresses (inet, 7 bytes/19 bytes):
          – defined the OpClass for inet to be interfaced with SP-GiST AMs
              • inet_ops → && >> >>= > >= <> << <<= < <= =
          – important improvement in SP-GiST AM: # of child nodes is limited
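To make the meaning of the inclusion operator `&&` concrete (does either network contain or equal the other?), here is a small Python sketch built on the standard `ipaddress` module. It only mirrors the operator's semantics on text values; it says nothing about how SP-GiST evaluates it, and the helper name `inet_overlaps` is an assumption of this example.

```python
import ipaddress


def inet_overlaps(a, b):
    """Toy equivalent of inet && inet: does either network contain or equal the other?"""
    # strict=False masks host bits instead of raising, since inet values may carry them.
    net_a = ipaddress.ip_network(a, strict=False)
    net_b = ipaddress.ip_network(b, strict=False)
    # IPv4 and IPv6 networks never overlap each other.
    return net_a.version == net_b.version and net_a.overlaps(net_b)
```

Because CIDR networks are either nested or disjoint, "overlaps" and "one contains the other" coincide here, which is the kind of hierarchical inclusion a prefix tree can exploit.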
  • 27. SP-GiST support for inet
      • Ex.
          =# CREATE TABLE network_a AS SELECT ((random() * 255)::int::text || '.' ||
          =# (random() * 255)::int::text || '.' ||
          =# (random() * 255)::int::text || '.' ||
          =# (random() * 255)::int::text || '/' ||
          =# (random() * 32)::int::text)::inet as addr
          =# FROM generate_series(1, 1000);
          CREATE
          =# CREATE INDEX gist_idx_a ON network_a USING gist (addr inet_ops);
          CREATE
          =# CREATE INDEX spgist_idx_a ON network_a USING spgist (addr inet_ops);
          CREATE
          =# CREATE TABLE network_b AS (
          =# SELECT * FROM network_a ORDER BY random() LIMIT 100);
          CREATE
  • 28. SP-GiST support for inet
      • Ex. no indexes
          =# EXPLAIN ANALYZE SELECT * FROM network_a a JOIN network_b b ON b.addr && a.addr;
                                  QUERY PLAN
          -----------------------------------------------------------------------------------
           Nested Loop (cost=0.00..15032.50 rows=78724 width=14) (actual time=0.017..185.134 rows=94973 loops=1)
             Join Filter: (a.addr && b.addr)
             Rows Removed by Join Filter: 905027
             -> Seq Scan on network_a a (cost=0.00..15.00 rows=1000 width=7) (actual time=0.008..0.187 rows=1000 loops=1)
             -> Materialize (cost=0.00..20.00 rows=1000 width=7) (actual time=0.000..0.061 rows=1000 loops=1000)
                  -> Seq Scan on network_b b (cost=0.00..15.00 rows=1000 width=7) (actual time=0.005..0.083 rows=1000 loops=1)
           Planning time: 0.522 ms
           Execution time: 190.120 ms
          (8 rows)
  • 29. SP-GiST support for inet
      • Ex. GiST index
          =# EXPLAIN ANALYZE SELECT * FROM network_a a JOIN network_b b ON b.addr && a.addr;
                                  QUERY PLAN
          -----------------------------------------------------------------------------------
           Nested Loop (cost=0.14..631.40 rows=13600 width=39) (actual time=0.048..112.023 rows=94973 loops=1)
             -> Seq Scan on network_b b (cost=0.00..23.60 rows=1360 width=32) (actual time=0.016..0.153 rows=1000 loops=1)
             -> Index Only Scan using gist_idx_a on network_a a (cost=0.14..0.35 rows=10 width=7) (actual time=0.018..0.093 rows=95 loops=1000)
                  Index Cond: (addr && a.addr)
                  Heap Fetches: 94973
           Planning time: 0.111 ms
           Execution time: 119.433 ms
          (7 rows)
  • 30. SP-GiST support for inet
      • Ex. SP-GiST index
          =# EXPLAIN ANALYZE SELECT * FROM network_a a JOIN network_b b ON b.addr && a.addr;
                                  QUERY PLAN
          -----------------------------------------------------------------------------------
           Nested Loop (cost=0.14..667.40 rows=13600 width=39) (actual time=0.034..58.196 rows=94973 loops=1)
             -> Seq Scan on network_b b (cost=0.00..23.60 rows=1360 width=32) (actual time=0.009..0.105 rows=1000 loops=1)
             -> Index Only Scan using spgist_idx_a on network_a a (cost=0.14..0.37 rows=10 width=7) (actual time=0.008..0.042 rows=95 loops=1000)
                  Index Cond: (addr && a.addr)
                  Heap Fetches: 94973
           Planning time: 0.109 ms
           Execution time: 63.562 ms
          (7 rows)
  • 31. BRIN summarization for new INSERTs
      • pre PG 10: perform VACUUM, or call brin_summarize_new_values()
      • NOW (Á. Herrera):
          – the autovacuum daemon is now able to summarize new data in present ranges:
              • CREATE INDEX ON table USING brin (column) WITH (autosummarize=on);
          – it is possible to summarize/desummarize single block ranges (block number given as bigint):
              • brin_summarize_range / brin_desummarize_range
      • BRINs are (still) not able to “shrink” summarized data
          – if you update/delete boundary data, you need to REINDEX
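The "no shrinking" limitation can be pictured with a toy Python model of a single block range. This is an illustration only: `ToyBrinRange` is a hypothetical class, the summary is a plain (min, max) pair, and `rebuild` merely stands in for recomputing the summary from scratch (as a REINDEX would).

```python
class ToyBrinRange:
    """Min/max summary of one block range; a toy model, not PostgreSQL code."""

    def __init__(self):
        self.values = []     # stands in for the heap rows stored in the range
        self.summary = None  # (min, max), or None while the range is unsummarized

    def insert(self, value):
        self.values.append(value)
        if self.summary is not None:
            lo, hi = self.summary
            self.summary = (min(lo, value), max(hi, value))  # a summary only ever widens

    def delete(self, value):
        self.values.remove(value)  # the summary is left untouched

    def rebuild(self):
        # Recompute the summary from the rows actually present.
        self.summary = (min(self.values), max(self.values))


brin_range = ToyBrinRange()
for v in (10, 20, 30):
    brin_range.insert(v)
brin_range.rebuild()   # summary is now (10, 30)
brin_range.insert(99)  # boundary value: summary widens to (10, 99)
brin_range.delete(99)  # row is gone, but the summary still says (10, 99)
stale_summary = brin_range.summary
```

Until the summary is rebuilt, a query for values around 90 would still have to visit this range even though no matching row remains.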
  • 32. Other features about indexes
      • Improve hash index performance (A. Kapila, M. Cy, A. Sharma)
      • Improve accuracy in determining if a BRIN index scan is beneficial (D. Rowley, E. Hasegeli)
      • Allow faster GiST INSERTs/UPDATEs by reusing index space efficiently (A. Borodin)
      • Reduce page locking during vacuuming of GIN indexes (A. Borodin)
  • 33. The future of indexes in PostgreSQL
      • Allow compression/decompression AM functions in SP-GiST OpClasses (good for PostGIS!)
      • CREATE GLOBAL INDEX
  • 34. Conclusions
      • PostgreSQL has a long tradition in index development
      • different types for different goals
      • an eye to the future
  • 35. Creative Commons license
      This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License
      https://creativecommons.org/licenses/by-nc-sa/4.0/
      © 2017 Giuseppe Broccolo, ITPUG – www.itpug.org/