Indexes in PostgreSQL (10)
Giuseppe Broccolo, g.broccolo.7@gmail.com, Viralize.com
PGDay.IT 2017 - 11th edition
Milan, October 13th, 2017
The outline
• Indexes in PostgreSQL
• What’s new in v10:
– Parallelism
– Hash indexing
– New SP-GiST support (inet data)
– Summarization of BRINs
~$ whoami
Giuseppe Broccolo
- data engineer at
- member of
@giubro
gbroccolo7
gbroccolo
gemini__81
g.broccolo.7@gmail.com
PostgreSQL indexes
• AKA Access Methods (AMs):
– allow concurrent changes (MVCC-compliant)
– persist their information (WAL)
– speed up access to data:
• they link to data blocks (reading the data blocks can sometimes be avoided)
• index blocks live in shared buffers, as data blocks do
[Figure: 8 kB index and data blocks in shared buffers, persisted through the WAL]
The default AMs – the trees
[Figure: a balanced and an unbalanced tree]
• hierarchically sorted tree structures:
– nodes (values, links to child nodes, etc.)
– which node is pointed to depends on a hierarchical criterion
– whole orders of magnitude of values can be skipped at each level
• with fan-out $a$ and depth $n$: $N \sim O(a^n) \Rightarrow n \sim O(\log N)$
• balanced structures speed up point searches
• unbalanced ones can be considerably faster for range searches
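A quick way to see the n ~ O(log N) relation: a minimal Python sketch, assuming every node has up to `fanout` children, that computes how many levels a tree needs to reach N keys. The function name and the fan-out values are illustrative assumptions, not PostgreSQL parameters.

```python
def tree_depth(n_keys: int, fanout: int) -> int:
    """Smallest depth n such that fanout**n >= n_keys (i.e. n ~ log_fanout N)."""
    if n_keys < 1 or fanout < 2:
        raise ValueError("need n_keys >= 1 and fanout >= 2")
    depth, capacity = 0, 1
    while capacity < n_keys:
        capacity *= fanout  # each extra level multiplies the reachable keys by the fan-out
        depth += 1
    return depth


if __name__ == "__main__":
    for fanout in (2, 256):
        print(fanout, tree_depth(10_000_000, fanout))
```

With 10 million keys a binary tree needs 24 levels, while a fan-out in the hundreds (plausible for 8 kB pages holding small keys) needs only 3, so a point search touches just a handful of pages.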
The default AMs – the hashes
• key/value maps (k: v)
– k: the hash of the search key, i.e. the bucket
– v: the address where the key is stored
– just one kind of search: =
– lookup complexity: ~O(1)
– like trees, their size is comparable with the indexed dataset: ~O(N)
[Figure: search key → hashing → bucket (k: value...); lookup cost vs N: ~O(log N) for trees, ~O(1) for hashes]
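To make the k: v idea concrete, a toy in-memory sketch. The class and method names are hypothetical, and real PostgreSQL hash indexes are page-based and far more involved; the sketch only shows why a single bucket read answers an equality search, and why range searches are impossible.

```python
from collections import defaultdict
from collections.abc import Hashable


class ToyHashIndex:
    """Toy hash index: bucket number -> list of (key, row address) pairs."""

    def __init__(self, n_buckets: int = 8):
        self.n_buckets = n_buckets
        self.buckets = defaultdict(list)

    def _bucket(self, key: Hashable) -> int:
        # k: the hash of the search key, reduced to a bucket number
        return hash(key) % self.n_buckets

    def insert(self, key: Hashable, tid: tuple) -> None:
        # v: the address where the row is stored, here a (block, offset) pair
        self.buckets[self._bucket(key)].append((key, tid))

    def lookup(self, key: Hashable) -> list:
        # Equality only: read one bucket, then recheck the keys, because
        # different keys can land in the same bucket.
        return [tid for k, tid in self.buckets[self._bucket(key)] if k == key]


if __name__ == "__main__":
    idx = ToyHashIndex(n_buckets=4)
    for i in range(20):
        idx.insert(i, (i // 10, i % 10))
    print(idx.lookup(13))
```

Since neighbouring keys are scattered across buckets, there is no way to walk "from 10 to 15" here; that is the price paid for the ~O(1) lookup.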
The default AMs – the BRINs
• Block Range Indexes:
– Á. Herrera, S. Riggs, H. Linnakangas (PG 9.5)
– range: summarization of adjacent-on-disk blocks
– complexity:
• ~O(N/K), K~10/100
• really small indexes, faster creation
• ~O(N/K’), K’~1000/10000
• can be used for low-selectivity queries
• low performance for “dynamic” data
[Figure: 8 kB heap blocks grouped into adjacent block ranges 0-7; the summarization of each range X covers its blocks (blk n. xxxxx, blk n. yyyyy, blk n. zzzzz, ...)]
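A minimal sketch of the min/max summarization idea, assuming a "heap" that is just a Python list with a fixed number of rows per page. The names `build_brin`, `candidate_ranges` and `brin_range_scan` are hypothetical; the real BRIN minmax opclass stores one summary per range of `pages_per_range` pages (128 by default) and returns a lossy bitmap of block ranges.

```python
def build_brin(values, rows_per_page, pages_per_range):
    """Summarize each block range of the 'heap' with its (min, max)."""
    rows_per_range = rows_per_page * pages_per_range
    summaries = []
    for start in range(0, len(values), rows_per_range):
        chunk = values[start:start + rows_per_range]
        summaries.append((min(chunk), max(chunk)))
    return summaries, rows_per_range


def candidate_ranges(summaries, lo, hi):
    """Ranges whose [min, max] may hold values in [lo, hi]: no row addresses, so lossy."""
    return [i for i, (mn, mx) in enumerate(summaries) if mx >= lo and mn <= hi]


def brin_range_scan(values, summaries, rows_per_range, lo, hi):
    """Visit only the candidate ranges, then recheck every row inside them."""
    out = []
    for r in candidate_ranges(summaries, lo, hi):
        chunk = values[r * rows_per_range:(r + 1) * rows_per_range]
        out.extend(v for v in chunk if lo <= v <= hi)
    return out
```

With `values = list(range(1000))`, 10 rows per page and 4 pages per range, the index holds just 25 summaries and a search for 95..105 only visits range 2. If a single out-of-order value lands in range 0 (say `values[0] = 999`), its summary widens to (1, 999) and it becomes a candidate for almost every query: the weakness with "dynamic" data noted above.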
The default AMs
• B-tree, GIN, GiST, SP-GiST, Hash, BRIN
• user-defined access methods can be added:
– fully supported since 9.6 (thanks to postgrespro & 2ndQuadrant)
• CREATE ACCESS METHOD
[Figure: tree AMs classified as sortable vs generalized, and balanced vs unbalanced]
Extend AMs to datatypes: the OpClasses
• access methods use operator classes (opclasses):
CREATE INDEX idx_name
ON table USING method (column opclass_name)
WITH (opt = value);
• an opclass defines:
– the operators for the needed types
– the support functions required by the access method
• opclasses can be written for specific datatypes:
CREATE OPERATOR CLASS opclass_name
FOR TYPE datatype USING method AS
OPERATOR 1 op_name,
[...],
FUNCTION 1 func1(arg_types),
[...];
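An analogy in Python, not PostgreSQL code: a generic sorted "access method" that never compares values itself and relies only on a comparison support function, in the spirit of the comparison function of a B-tree opclass. Swapping the support function (here a hypothetical case-insensitive one) extends the same structure to a new behaviour or datatype. All names are illustrative.

```python
import bisect
from functools import cmp_to_key


def int_cmp(a, b):
    """Comparison support function: negative, zero or positive."""
    return (a > b) - (a < b)


def text_ci_cmp(a, b):
    """A hypothetical case-insensitive comparison for strings."""
    a, b = a.casefold(), b.casefold()
    return (a > b) - (a < b)


class SortedIndex:
    """Generic sorted 'access method': it only calls the support function it was built with."""

    def __init__(self, cmp):
        self.wrap = cmp_to_key(cmp)
        self.keys = []  # wrapped keys, kept in sorted order
        self.rows = []  # row ids, parallel to self.keys

    def insert(self, value, row_id):
        k = self.wrap(value)
        pos = bisect.bisect_right(self.keys, k)
        self.keys.insert(pos, k)
        self.rows.insert(pos, row_id)

    def equal(self, value):
        # "=" is answered through the support function: all keys comparing as 0
        k = self.wrap(value)
        lo = bisect.bisect_left(self.keys, k)
        hi = bisect.bisect_right(self.keys, k)
        return self.rows[lo:hi]
```

The same `SortedIndex` class serves integers with `int_cmp` and strings with `text_ci_cmp`; only the "opclass" changes.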
Execution plans
• Index Scan: needs to inspect data pages to check row visibility
• Index Only Scan: reads just index pages, using the visibility map (PG 9.2)
• Bitmap Index Scan + Bitmap Heap Scan:
1) reduce the number of page accesses using a bitmap
2) used by BRIN to inspect block ranges
[Figure: lookup cost vs N, ~O(log N)]
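A sketch of why the bitmap step reduces heap accesses, under the simplifying assumption that a heap block is fetched again whenever the scan moves to a different block (no buffer cache). Function names are hypothetical; PostgreSQL's TID bitmap can also turn lossy (block-level only) when it outgrows work_mem, which this sketch ignores.

```python
from collections import defaultdict


def index_order_block_reads(tids):
    """Block fetches when row pointers are followed in index order."""
    reads, previous = 0, None
    for block, _ in tids:
        if block != previous:
            reads += 1
            previous = block
    return reads


def bitmap_heap_scan(tids):
    """Group (block, offset) row pointers by block and visit blocks in physical order."""
    bitmap = defaultdict(set)
    for block, offset in tids:
        bitmap[block].add(offset)
    return [(block, sorted(bitmap[block])) for block in sorted(bitmap)]


if __name__ == "__main__":
    tids = [(5, 1), (2, 3), (5, 2), (2, 1), (9, 0)]  # as returned in index order
    print(index_order_block_reads(tids), len(bitmap_heap_scan(tids)))
```

Following the index order jumps back and forth between blocks 5 and 2; the bitmap visits blocks 2, 5 and 9 once each, in on-disk order.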
What’s new in PG 10?
Parallelization in index scans
• parallelism is not new in PG (it arrived in 9.6, see G. Ciolli’s talk later); PG 10 adds:
– parallel B-tree index scans
– parallel Bitmap Heap Scans (different areas of the heap are processed by parallel workers)
– R. Syed, A. Kapila, R. Haas, R. Sabih, D. Kumar, J. Rouhaud
Parallelization in Index Scans
• for B-tree:
– workers inspect leaf pages in parallel
• for Bitmap Heap Scan:
– workers inspect heap chunks in parallel
[Figure: a Gather node collecting results from workers #1, #2, ..., #N]
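A rough model of the diagram: threads stand in for PostgreSQL's background worker processes and a list of sorted lists stands in for B-tree leaf pages (all names are hypothetical). In the real parallel B-tree scan, workers coordinate through shared memory to claim the next leaf page; here the pool simply distributes one task per page.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def scan_leaf_page(page, lo, hi):
    """One worker's job: filter the keys of a single leaf page."""
    return [k for k in page if lo <= k <= hi]


def parallel_leaf_scan(leaf_pages, lo, hi, workers=4):
    """Hand leaf pages to a pool of workers, then gather the partial results.

    Results are appended as workers finish, so (as with a Gather node)
    the output order is not guaranteed.
    """
    gathered = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(scan_leaf_page, page, lo, hi) for page in leaf_pages]
        for future in as_completed(futures):
            gathered.extend(future.result())
    return gathered


if __name__ == "__main__":
    leaf_pages = [list(range(start, start + 100)) for start in range(0, 1000, 100)]
    print(sorted(parallel_leaf_scan(leaf_pages, 250, 349))[:5])
```

Callers that need ordered output sort the gathered rows themselves, as the usage line does.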
Parallelization in Index Scans
• The parameters:
– max_parallel_workers (taken from the max_worker_processes pool)
– max_parallel_workers_per_gather (limited by max_parallel_workers)
– min_parallel_index_scan_size (512kB)
• heuristic: $w$ workers need an index size $\ge 512\,\text{kB} \cdot 3^{w-1}$
– parallel_setup_cost (1000.0)
– parallel_tuple_cost (0.1)
– force_parallel_mode (false)
• tune them based on the underlying HW!
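A sketch of the worker-count heuristic as we understand it from PG 10's planner (`compute_parallel_worker()`): one worker once the index reaches min_parallel_index_scan_size (512 kB = 64 pages of 8 kB), one more each time the size triples, capped by max_parallel_workers_per_gather (2 by default). It ignores the heap-side size check, per-table `parallel_workers` settings and the final cost comparison that decides whether the parallel plan is chosen at all; the function name is hypothetical.

```python
def planned_workers(index_pages, min_index_pages=64, max_per_gather=2):
    """Sketch of PG 10's worker-count heuristic for a parallel index scan.

    index_pages: index size in 8 kB pages
    min_index_pages: min_parallel_index_scan_size in pages (512 kB = 64 pages)
    max_per_gather: max_parallel_workers_per_gather
    """
    if index_pages < min_index_pages:
        return 0  # too small: no parallel scan
    workers, threshold = 1, max(min_index_pages, 1)
    while index_pages >= threshold * 3:
        workers += 1  # one more worker each time the size triples
        threshold *= 3
    return min(workers, max_per_gather)


if __name__ == "__main__":
    pages_300mb = 300 * 1024 // 8  # an index of ~300 MB
    print(planned_workers(pages_300mb, max_per_gather=8))
```

For an index of about 300 MB and max_parallel_workers_per_gather = 8 this yields 6 workers, in line with the "up to 6 workers" of the B-tree example; with the default cap of 2 it stops at 2.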
When is parallelization used?
• Ex. IndexOnlyScan on B-tree
• table and B-tree of the order of 300MB
=# CREATE TABLE test AS
=# SELECT i FROM generate_series(1,10000000) t(i);
SELECT 10000000
=# CREATE INDEX btree_idx ON test USING btree (i);
CREATE INDEX
When parallelization is disabled:
• Ex. IndexOnlyScan on B-tree:
=# EXPLAIN ANALYZE SELECT * FROM test WHERE i=5;
QUERY PLAN
----------------------------------------------------------
Index Only Scan using btree_idx on test
(cost=0.43..8.45 rows=1 width=4)
(actual time=0.433..0.434 rows=1 loops=1)
Index Cond: (i = 5)
Heap Fetches: 1
Planning time: 0.525 ms
Execution time: 0.461 ms
(5 rows)
When is parallelization used?
• Set up parallel execution:
=# SET max_parallel_workers TO 8;
SET
=# SET max_parallel_workers_per_gather TO 8; -- up to 6 workers
SET
• The plan does not change! Force parallelization:
=# SET force_parallel_mode TO true;
SET
When is parallelization used?
• Ex. IndexOnlyScan on B-tree
=# EXPLAIN ANALYZE SELECT * FROM test WHERE i=5;
QUERY PLAN
----------------------------------------------------------
Gather (cost=1000.43..1008.45 rows=1 width=4)
(actual time=2.523..2.579 rows=1 loops=1)
Workers Planned: 6
Workers Launched: 6
Single Copy: true
-> Index Only Scan using btree_idx on test
(cost=0.43..8.45 rows=1 width=4)
(actual time=0.030..0.032 rows=1 loops=1)
Index Cond: (i = 5)
Heap Fetches: 0
Planning time: 0.063 ms
Execution time: 3.934 ms
(9 rows)
When is parallelization used?
• try to “trick” the planner with lower tuple costs:
=# SET force_parallel_mode TO false;
SET
=# SET parallel_tuple_cost TO 0.01;
SET
• the same parallel plan is obtained, and it is still disadvantageous!
– the cost parameters are (almost) always fine as they are
– the parallelization overhead pays off only with (really) big data
Hash indexes are now logged!
• Hash AMs did not define how index changes had to be logged into the WAL:
– hash index changes were not written to the WAL: not crash-safe!
– hash indexes could not be physically replicated
• Hash AMs now include WAL logging (R. Haas, G. Ghosh, A. Kapila, A. Sharma)
Hash indexes are now logged!
• Ex. physical replication, with a hash index already existing before the 1st base backup:
master and hot standby (same output on both):
=# \d hash_example
Table "public.hash_example"
Column | Type | Modifiers
--------+---------+-----------
i | integer |
Indexes:
"hash_idx" hash (i)
Hash indexes are now logged!
• pre PostgreSQL 10:
master
=# explain analyze select * from
=# hash_example where i = 123;
QUERY PLAN
-----------------------------------------
Index Scan using hash_idx on hash_example
(cost=0.00..8.02 rows=1 width=21)
(actual time=1.526..1.529 rows=1 loops=1)
[...]
hot standby
=# explain analyze select * from
=# hash_example where i = 123;
ERROR: could not read block 0 in file
"base/16402/458955269": read only 0 of
8192 bytes
=# SET enable_indexscan TO false;
SET
SP-GiST support for inet
• Unbalanced indexes perform better for inclusion searches:
– Ex. quad-tree [Figure: && (overlap) search against a bounding box]
• E. Hasegeli extended the use case to IPv4/IPv6 addresses (inet, 7 bytes/19 bytes):
– defined the OpClass for inet to be interfaced with the SP-GiST AM
• inet_ops → && >> >>= > >= <> << <<= < <= =
– important improvement in the SP-GiST AM: the number of child nodes is limited
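A sketch of the && ("contains or is contained by") semantics and of why a space-partitioning, unbalanced trie suits it, using Python's standard `ipaddress` module. The binary prefix trie is a simplification for IPv4 only: it is not the actual layout of the SP-GiST inet_ops opclass (which, as noted above, bounds the number of child nodes), host bits are masked on parsing, and all names are hypothetical.

```python
import ipaddress


def parse(text):
    # strict=False masks host bits, e.g. 192.168.1.5/24 -> 192.168.1.0/24
    return ipaddress.ip_network(text, strict=False)


def overlaps(a, b):
    """inet && : a contains b, or b contains a (equal networks included)."""
    return a.version == b.version and (a.subnet_of(b) or b.subnet_of(a))


def prefix_bits(net):
    """The first prefixlen bits of an IPv4 network address, most significant first."""
    addr = int(net.network_address)
    return [(addr >> (31 - i)) & 1 for i in range(net.prefixlen)]


class PrefixTrie:
    """Unbalanced binary trie: a /p network is stored at depth p."""

    def __init__(self):
        self.root = {"children": {}, "items": []}

    def insert(self, net):
        node = self.root
        for bit in prefix_bits(net):
            node = node["children"].setdefault(bit, {"children": {}, "items": []})
        node["items"].append(net)

    def search_overlapping(self, query):
        found = []
        node = self.root
        # Networks stored on the path above the query's depth contain the query.
        for bit in prefix_bits(query):
            found.extend(node["items"])
            node = node["children"].get(bit)
            if node is None:
                return found
        # The query's own node and its whole subtree: equal networks and subnets.
        stack = [node]
        while stack:
            current = stack.pop()
            found.extend(current["items"])
            stack.extend(current["children"].values())
        return found
```

A search only walks one root-to-node path plus one subtree, instead of comparing the query against every stored network as the nested loop without indexes does.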
SP-GiST support for inet
• Ex.
=# CREATE TABLE network_a AS SELECT ((random() * 255)::int::text || '.' ||
=# (random() * 255)::int::text || '.' ||
=# (random() * 255)::int::text || '.' ||
=# (random() * 255)::int::text || '/' ||
=# (random() * 32)::int::text)::inet as addr
=# FROM generate_series(1, 1000);
SELECT 1000
=# CREATE INDEX gist_idx_a ON network_a USING gist (addr inet_ops);
CREATE INDEX
=# CREATE INDEX spgist_idx_a ON network_a USING spgist (addr inet_ops);
CREATE INDEX
=# CREATE TABLE network_b AS (
=# SELECT * FROM network_a ORDER BY random() LIMIT 100);
SELECT 100
SP-GiST support for inet
• Ex. no indexes
=# EXPLAIN ANALYZE SELECT * FROM network_a a JOIN network_b b ON b.addr && a.addr;
QUERY PLAN
-----------------------------------------------------------------------------------
Nested Loop (cost=0.00..15032.50 rows=78724 width=14)
(actual time=0.017..185.134 rows=94973 loops=1)
Join Filter: (a.addr && b.addr)
Rows Removed by Join Filter: 905027
-> Seq Scan on network_a a (cost=0.00..15.00 rows=1000 width=7)
(actual time=0.008..0.187 rows=1000 loops=1)
-> Materialize (cost=0.00..20.00 rows=1000 width=7)
(actual time=0.000..0.061 rows=1000 loops=1000)
-> Seq Scan on network_b b (cost=0.00..15.00 rows=1000 width=7)
(actual time=0.005..0.083 rows=1000 loops=1)
Planning time: 0.522 ms
Execution time: 190.120 ms
(8 rows)
SP-GiST support for inet
• Ex. GiST index
=# EXPLAIN ANALYZE SELECT * FROM network_a a JOIN network_b b ON b.addr && a.addr;
QUERY PLAN
-----------------------------------------------------------------------------------
Nested Loop (cost=0.14..631.40 rows=13600 width=39)
(actual time=0.048..112.023 rows=94973 loops=1)
-> Seq Scan on network_b b (cost=0.00..23.60 rows=1360 width=32)
(actual time=0.016..0.153 rows=1000 loops=1)
-> Index Only Scan using gist_idx_a on network_a a
(cost=0.14..0.35 rows=10 width=7)
(actual time=0.018..0.093 rows=95 loops=1000)
Index Cond: (addr && a.addr)
Heap Fetches: 94973
Planning time: 0.111 ms
Execution time: 119.433 ms
(7 rows)
SP-GiST support for inet
• Ex. SP-GiST index
=# EXPLAIN ANALYZE SELECT * FROM network_a a JOIN network_b b ON b.addr && a.addr;
QUERY PLAN
-----------------------------------------------------------------------------------
Nested Loop (cost=0.14..667.40 rows=13600 width=39)
(actual time=0.034..58.196 rows=94973 loops=1)
-> Seq Scan on network_b b (cost=0.00..23.60 rows=1360 width=32)
(actual time=0.009..0.105 rows=1000 loops=1)
-> Index Only Scan using spgist_idx_a on network_a a
(cost=0.14..0.37 rows=10 width=7)
(actual time=0.008..0.042 rows=95 loops=1000)
Index Cond: (addr && a.addr)
Heap Fetches: 94973
Planning time: 0.109 ms
Execution time: 63.562 ms
(7 rows)
BRIN summarization for new INSERTs
• pre PG 10: perform VACUUM, or call brin_summarize_new_values()
• NOW (Á. Herrera):
– the autovacuum daemon is now able to summarize ranges filled by new data:
• CREATE INDEX ON table USING brin (column) WITH (autosummarize=on);
– it is possible to summarize/desummarize single page ranges, identified by a block number (bigint):
• brin_summarize_range / brin_desummarize_range
• BRINs are (still) not able to “shrink” summarized data
– if you update/delete boundary data, you need to REINDEX
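A toy model of the maintenance behaviour described above. Class and method names are hypothetical, loosely mirroring brin_summarize_range / brin_desummarize_range: new ranges start unsummarized and are always scanned, inserts can only widen a summary, and deletes never shrink it, so a range must be rebuilt (REINDEX, or in this toy, desummarizing and re-summarizing the range) to tighten it. Autosummarize is modeled as happening immediately, whereas PostgreSQL queues the work for autovacuum.

```python
class ToyBrin:
    """Toy BRIN: one (min, max) summary per block range, None = not summarized."""

    def __init__(self, rows_per_range, autosummarize=False):
        self.rows_per_range = rows_per_range
        self.autosummarize = autosummarize
        self.heap = []       # row values; None marks a deleted row
        self.summaries = []  # one entry per block range

    def _rows(self, r):
        start = r * self.rows_per_range
        return [v for v in self.heap[start:start + self.rows_per_range] if v is not None]

    def insert(self, value):
        self.heap.append(value)
        r = (len(self.heap) - 1) // self.rows_per_range
        if r == len(self.summaries):
            # First row of a new range: it starts unsummarized.
            self.summaries.append(None)
            if self.autosummarize and r > 0:
                # Summarize the previous range (immediate here, queued in PostgreSQL).
                self.summarize_range(r - 1)
        elif self.summaries[r] is not None:
            # Inserting into a summarized range can only widen its summary.
            lo, hi = self.summaries[r]
            self.summaries[r] = (min(lo, value), max(hi, value))

    def delete(self, position):
        # Deleting a row never shrinks the summary of its range.
        self.heap[position] = None

    def summarize_range(self, r):
        rows = self._rows(r)
        self.summaries[r] = (min(rows), max(rows)) if rows else None

    def desummarize_range(self, r):
        self.summaries[r] = None

    def candidate_ranges(self, lo, hi):
        # Unsummarized ranges must always be scanned.
        return [r for r, s in enumerate(self.summaries)
                if s is None or (s[1] >= lo and s[0] <= hi)]
```

After deleting the boundary value 7 from range (4, 7), the summary still says (4, 7); only re-summarizing the range brings it back to (4, 6).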
Other features about indexes
• Improve hash index performance
(A. Kapila, M. Cy, A. Sharma)
• Improve accuracy in determining if a BRIN index scan is beneficial
(D. Rowley, E. Hasegeli)
• Allow faster GiST INSERTs/UPDATEs by reusing index space efficiently
(A. Borodin)
• Reduce page locking during vacuuming of GIN indexes
(A. Borodin)
The future of indexes in PostgreSQL
• Allow compression/decompression AM functions in SP-GiST
OpClasses (good for PostGIS!)
• CREATE GLOBAL INDEX
Conclusions
• PostgreSQL has a long tradition in indexes development
• different types for different goals
• an eye to the future
Creative Commons license
This work is licensed under a Creative Commons
Attribution-ShareAlike 4.0 International License
https://creativecommons.org/licenses/by-nc-sa/4.0/
© 2017 Giuseppe Broccolo, ITPUG – www.itpug.org/

Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 

Indexes in PostgreSQL (10)

  • 1. PGDay.IT 2017 - 11th edition Milan October, 13th 2017 Giuseppe Broccolo g.broccolo.7@gmail.com Viralize.com Indexes in PostgreSQL (10)
  • 2. The outline • Indexes in PostgreSQL • What’s new in v10: – Parallelism – Hash indexing – New support for SP-GiST (inet data) – Summarization of BRINs
  • 3. ~$ whoami • Giuseppe Broccolo - data engineer at - member of • @giubro gbroccolo7 gbroccolo gemini__81 g.broccolo.7@gmail.com
  • 4. PostgreSQL indexes • AKA Access Methods
  • 5. PostgreSQL indexes • AKA Access Methods – allow concurrent changes (MVCC compliant) – persist the information (WAL) – speed up access to data: • store links to data blocks (access to the data blocks can sometimes be avoided) • index blocks live in shared buffers, as data blocks do [figure: 8kB index and data blocks in shared buffers, persisted through WAL]
  • 6. The default AMs – the trees • hierarchically sorted tree structure – nodes (values, links to child nodes, etc.) – pointers follow hierarchical criteria – allow skipping whole ranges of values • N ~ O(a^n) → n ~ O(log N) (N values, n levels)
  • 7. The default AMs – the trees • hierarchically sorted tree structure – nodes (values, links to child nodes, etc.) – pointers follow hierarchical criteria – allow skipping whole ranges of values • N ~ O(a^n) → n ~ O(log N) (N values, n levels) • balanced structures speed up point searches [diagram: balanced tree]
  • 8. The default AMs – the trees • hierarchically sorted tree structure – nodes (values, links to child nodes, etc.) – pointers follow hierarchical criteria – allow skipping whole ranges of values • N ~ O(a^n) → n ~ O(log N) (N values, n levels) • balanced structures speed up point searches • unbalanced ones can be considerably faster for range searches [diagram: balanced vs unbalanced tree]
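The N ~ O(a^n) → n ~ O(log N) relation can be checked with a minimal Python sketch. It assumes a plain sorted array searched by bisection (fan-out 2) and, for comparison, a hypothetical fan-out of 300 keys per node; neither number describes PostgreSQL's real B-tree page layout. Each probe discards half of the remaining keys, which is the "skip whole ranges of values" property of sorted trees.

```python
import math


def search_probes(sorted_keys, target):
    """Binary search on a sorted list; return (found, number of probes)."""
    lo, hi = 0, len(sorted_keys) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] == target:
            return True, probes
        if sorted_keys[mid] < target:
            lo = mid + 1  # skip the whole lower half
        else:
            hi = mid - 1  # skip the whole upper half
    return False, probes


def tree_depth(n_keys, fanout):
    """Smallest n such that fanout**n >= n_keys, i.e. n ~ log_a(N)."""
    depth, capacity = 0, 1
    while capacity < n_keys:
        capacity *= fanout
        depth += 1
    return depth


if __name__ == "__main__":
    keys = list(range(1_000_000))
    # at most ceil(log2(N)) = 20 probes for one million keys
    print(search_probes(keys, 765_432), math.ceil(math.log2(len(keys))))
    # 10M keys: 24 levels with fan-out 2, only 3 with a (hypothetical) fan-out of 300
    print(tree_depth(10_000_000, 2), tree_depth(10_000_000, 300))  # 24 3
```

The larger the fan-out a, the flatter the tree, which is why wide index pages keep the number of visited levels small even for big tables.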
  • 9. The default AMs – the hashes • key/value maps (k: v) – k: the hash of the search key (the bucket) – v: the address where the key is stored – just one kind of search: = – complexity: ~O(1) – like trees, their size is comparable with the indexed dataset: ~O(N) [figure: search key → hashing → bucket; search complexity vs N, ~O(log N) for trees vs ~O(1) for hashes]
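A minimal Python sketch of an equality-only hash index. Like PostgreSQL's hash AM, it keeps only the hash code and the row address, so candidate rows must be rechecked against the table. The bucket count, the use of Python's built-in hash() and the dict standing in for the heap are assumptions for illustration; the real AM uses its own hash functions and splits buckets as the index grows.

```python
class HashIndex:
    """Toy hash index: buckets of (hash_code, row_address) pairs, no keys stored."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def insert(self, key, row_address):
        h = hash(key)
        self.buckets[h % len(self.buckets)].append((h, row_address))

    def candidates(self, key):
        """Only '=' is supported: hash the key and read a single bucket, ~O(1)."""
        h = hash(key)
        bucket = self.buckets[h % len(self.buckets)]
        return [addr for (code, addr) in bucket if code == h]


def index_scan(index, heap, key):
    """Fetch candidate rows and recheck the real value (hash codes can collide)."""
    return [addr for addr in index.candidates(key) if heap[addr] == key]


if __name__ == "__main__":
    # hypothetical heap: (block, offset) -> column value
    heap = {(blk, off): f"user{blk * 10 + off}" for blk in range(10) for off in range(10)}
    idx = HashIndex(n_buckets=4)
    for addr, value in heap.items():
        idx.insert(value, addr)
    print(index_scan(idx, heap, "user42"))  # [(4, 2)]
```

Since the hash destroys any ordering between keys, there is no way to answer `<` or `>` with this structure, hence the single supported operator.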
  • 10. The default AMs – the BRINs • Block Range Indexes: – Á. Herrera, S. Riggs, H. Linnakangas (PG 9.5) – Range: summarization of adjacent-on-disk blocks – complexity: • ~O(N/K), K ~ 10/100 • really small indexes, faster creation • ~O(N/K’), K’ ~ 1000/10000 • can be used for low-selectivity queries • low performance for “dynamic” data [figure: 8kB heap blocks grouped into block ranges 0–7; each range is summarized by a single index entry]
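A minimal Python sketch of a min/max block range summary. Each range of pages_per_range heap blocks keeps only the (min, max) of the column, so a search returns the ranges that may contain matches, and those blocks are then read and rechecked. The block contents and pages_per_range=4 are made up for the example (PostgreSQL's default is 128 pages per range).

```python
def summarize(blocks, pages_per_range):
    """blocks[i] holds the column values stored in heap block i."""
    summaries = []
    for first in range(0, len(blocks), pages_per_range):
        values = [v for blk in blocks[first:first + pages_per_range] for v in blk]
        if values:  # one small index entry per block range
            summaries.append((first, min(values), max(values)))
    return summaries


def candidate_ranges(summaries, lo, hi):
    """First block of every range whose [min, max] overlaps [lo, hi]."""
    return [first for (first, mn, mx) in summaries if mn <= hi and mx >= lo]


if __name__ == "__main__":
    # 16 blocks of naturally ordered data (e.g. an append-only timestamp column)
    ordered = [[10 * b + i for i in range(10)] for b in range(16)]
    print(candidate_ranges(summarize(ordered, 4), 45, 50))    # [4]
    # every block holds both small and large values: no range can be skipped
    scattered = [[0, 159]] * 16
    print(candidate_ranges(summarize(scattered, 4), 45, 50))  # [0, 4, 8, 12]
```

The second case shows why BRINs work well only when values correlate with their physical position on disk.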
  • 11. The default AMs • B-tree, GIN, GiST, SP-GiST, Hash, BRIN • user-defined access methods can be added – fully supported since 9.6 (thanks to postgrespro & 2ndQuadrant) • CREATE ACCESS METHOD [diagram: sortable, generalized, balanced and unbalanced trees]
  • 12. Extend AMs to datatypes: the OpClasses • access methods use operator classes (opclasses), which define: – operators for the needed types – support functions, depending on the access method • can be extended to specific datatypes
        CREATE INDEX idx_name ON table USING method (column opclass_name) WITH (opt = value);
        CREATE OPERATOR CLASS opclass_name FOR TYPE datatype USING method AS OPERATOR 1 op_name, [...], FUNCTION 1 func1(...), [...];
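A minimal Python sketch of the opclass idea: the access method knows nothing about the data type and only calls what the operator class registers. It borrows the B-tree convention of strategy numbers 1–5 for <, <=, =, >=, > and support function 1 as a three-way comparison; the `OperatorClass`/`OrderedIndex` classes and the case-insensitive `text_ci_ops` opclass are hypothetical illustrations, not PostgreSQL objects.

```python
from dataclasses import dataclass, field
from functools import cmp_to_key
from typing import Any, Callable, Dict, List


@dataclass
class OperatorClass:
    name: str
    operators: Dict[int, Callable[[Any, Any], bool]]  # strategy number -> operator
    support: Dict[int, Callable[[Any, Any], int]]     # support number -> function


@dataclass
class OrderedIndex:
    """Toy ordered AM: it only ever calls what the opclass registered."""
    opclass: OperatorClass
    entries: List[Any] = field(default_factory=list)

    def build(self, values):
        cmp = self.opclass.support[1]  # support function 1: three-way compare
        self.entries = sorted(values, key=cmp_to_key(cmp))

    def search_eq(self, arg):
        cmp, eq = self.opclass.support[1], self.opclass.operators[3]
        lo, hi = 0, len(self.entries)
        while lo < hi:  # binary search for the first entry not < arg
            mid = (lo + hi) // 2
            if cmp(self.entries[mid], arg) < 0:
                lo = mid + 1
            else:
                hi = mid
        result = []
        while lo < len(self.entries) and eq(self.entries[lo], arg):  # strategy 3: '='
            result.append(self.entries[lo])
            lo += 1
        return result


def ci_cmp(a, b):
    """Case-insensitive three-way comparison for text."""
    a, b = a.lower(), b.lower()
    return (a > b) - (a < b)


# Hypothetical case-insensitive text opclass, B-tree style strategy numbers.
text_ci_ops = OperatorClass(
    name="text_ci_ops",
    operators={
        1: lambda a, b: ci_cmp(a, b) < 0,
        2: lambda a, b: ci_cmp(a, b) <= 0,
        3: lambda a, b: ci_cmp(a, b) == 0,
        4: lambda a, b: ci_cmp(a, b) >= 0,
        5: lambda a, b: ci_cmp(a, b) > 0,
    },
    support={1: ci_cmp},
)

if __name__ == "__main__":
    idx = OrderedIndex(text_ci_ops)
    idx.build(["carol", "Alice", "bob", "ALICE"])
    print(idx.search_eq("alice"))  # ['Alice', 'ALICE']
```

Swapping in a different opclass changes the index's semantics without touching the access method code, which is the point of the opclass indirection.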
  • 13. Execution plans • IndexScan: needs to inspect data pages for row visibility • IndexOnlyScan: just index pages, uses the visibility map (PG 9.2) • BitmapIndexScan + BitmapHeapScan: 1) reduce the # of accesses using a bitmap 2) used by BRIN to inspect block ranges
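A minimal Python sketch of the bitmap idea behind BitmapIndexScan + BitmapHeapScan, assuming row pointers are (block, offset) pairs. Pointers from one or more index scans are collected per block, optionally combined (as a BitmapAnd node would for two indexed conditions), and the heap is then read block by block in physical order, each block at most once. The `lossy_bitmap` helper models the BRIN case, where whole block ranges are marked and every row in them is rechecked; all function names are hypothetical.

```python
def build_bitmap(tids):
    """tids: (block, offset) row pointers returned by a BitmapIndexScan."""
    bitmap = {}
    for block, offset in tids:
        bitmap.setdefault(block, set()).add(offset)
    return bitmap


def bitmap_and(a, b):
    """Keep only rows present in both bitmaps (two indexed conditions)."""
    common = {}
    for block in a.keys() & b.keys():
        offsets = a[block] & b[block]
        if offsets:
            common[block] = offsets
    return common


def lossy_bitmap(matching_ranges, pages_per_range):
    """BRIN case: mark every block of each matching range (None = recheck all rows)."""
    return {blk: None for first in matching_ranges
            for blk in range(first, first + pages_per_range)}


def heap_visit_order(bitmap):
    """BitmapHeapScan: read each heap block once, in physical order."""
    return sorted(bitmap)


if __name__ == "__main__":
    bm = build_bitmap([(7, 1), (2, 3), (7, 4), (2, 1), (5, 2)])
    print(heap_visit_order(bm))                          # [2, 5, 7]
    print(bitmap_and(bm, build_bitmap([(7, 4), (5, 9)])))  # {7: {4}}
```

Sorting the blocks turns scattered index hits into a mostly sequential heap read, which is where the reduction in accesses comes from.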
  • 14. What’s new in PG 10?
  • 15. Parallelization in index scans • parallelism itself is not new in PG (9.6, see G. Ciolli later); PG 10 adds: – parallel B-tree index scans – parallel BitmapHeapScan (different areas of the heap are processed by parallel workers) – R. Syed, A. Kapila, R. Haas, R. Sabih, D. Kumar, J. Rouhaud
  • 16. Parallelization in index scans • for B-tree – workers inspect leaf pages in parallel • for bitmap heap scan – workers inspect heap chunks in parallel [diagram: workers #1, #2, …, #N feeding a Gather node]
  • 17. Parallelization in index scans • The parameters: – max_parallel_workers (limited by max_worker_processes) – max_parallel_workers_per_gather (limited by max_parallel_workers) – min_parallel_index_scan_size (512kB) • heuristic: one more worker each time the index size triples, i.e. index size ≥ 512kB × 3^(#workers − 1) – parallel_setup_cost (1000.0) – parallel_tuple_cost (0.1) – force_parallel_mode (false) • tune them based on the underlying HW!
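A minimal Python sketch of the worker-count heuristic, as we understand the PostgreSQL 10 planner: no workers below min_parallel_index_scan_size, then one more worker each time the index size triples, capped by max_parallel_workers_per_gather (default 2). Sizes are in kB here for readability; the real planner counts 8kB pages and also takes the heap into account, so treat this as an approximation.

```python
def planned_workers(index_kb, min_parallel_index_scan_kb=512,
                    max_parallel_workers_per_gather=2):
    """Approximate number of workers planned for a parallel index scan."""
    if index_kb < min_parallel_index_scan_kb:
        return 0  # index too small to be worth parallelizing
    workers = 1
    threshold = max(min_parallel_index_scan_kb, 1)
    while index_kb >= threshold * 3:  # each tripling of the size adds one worker
        workers += 1
        threshold *= 3
    return min(workers, max_parallel_workers_per_gather)


if __name__ == "__main__":
    # a ~300MB index with max_parallel_workers_per_gather = 8
    print(planned_workers(300 * 1024, max_parallel_workers_per_gather=8))  # 6
```

For the ~300MB index used in the following example this gives 6 workers, consistent with the "-- up to 6 workers" comment on slide 20: the worker count grows only logarithmically with the index size.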
  • 18. When is parallelization used? • Ex. IndexOnlyScan on a B-tree • table/B-tree ~O(300MB)
        =# CREATE TABLE test AS SELECT i FROM generate_series(1, 10000000) AS t(i);
        CREATE
        =# CREATE INDEX btree_idx ON test USING btree (i);
        CREATE
  • 19. When parallelization is disabled: • Ex. IndexOnlyScan on a B-tree:
        =# EXPLAIN ANALYZE SELECT * FROM test WHERE i = 5;
                                  QUERY PLAN
        ----------------------------------------------------------
         Index Only Scan using btree_id on test  (cost=0.43..8.45 rows=1 width=4) (actual time=0.433..0.434 rows=1 loops=1)
           Index Cond: (i = 5)
           Heap Fetches: 1
         Planning time: 0.525 ms
         Execution time: 0.461 ms
        (5 rows)
  • 20. When is parallelization used? • Set up parallel execution:
        =# SET max_parallel_workers TO 8;
        SET
        =# SET max_parallel_workers_per_gather TO 8; -- up to 6 workers
        SET
    • The plan does not change!! Force parallelization...
        =# SET force_parallel_mode TO true;
        SET
  • 21. When is parallelization used? • Ex. IndexOnlyScan on a B-tree
        =# EXPLAIN ANALYZE SELECT * FROM test WHERE i = 5;
                                  QUERY PLAN
        ----------------------------------------------------------
         Gather  (cost=1000.43..1008.45 rows=1 width=4) (actual time=2.523..2.579 rows=1 loops=1)
           Workers Planned: 6
           Workers Launched: 6
           Single Copy: true
           ->  Index Only Scan using btree_id on test  (cost=0.43..8.45 rows=1 width=4) (actual time=0.030..0.032 rows=1 loops=1)
                 Index Cond: (i = 5)
                 Heap Fetches: 0
         Planning time: 0.063 ms
         Execution time: 3.934 ms
        (9 rows)
  • 22. When is parallelization used? • try to “trick” the planner with lower tuple costs:
        =# SET force_parallel_mode TO false;
        SET
        =# SET parallel_tuple_cost TO 0.01;
        SET
    • the same plan is obtained – and it is still disadvantageous! – the default cost parameters are (almost) always fine – parallelization overhead pays off with (really) big data
  • 23. Hash indexes are now logged! • Hash AMs did not define how index changes had to be written to WAL: – hash index changes were made only in shared buffers, without WAL records – not crash safe! – hash indexes could not be physically replicated • Hash AMs now include WAL logging (R. Haas, G. Ghosh, A. Kapila, A. Sharma)
  • 24. Hash indexes are now logged! • Ex. physical replication, with a hash index created before the 1st base backup:
        hot standby
        =# \d hash_example
         Table "public.hash_example"
         Column |  Type   | Modifiers
        --------+---------+-----------
         i      | integer |
        Indexes:
            "hash_idx" hash (i)
        master
        =# \d hash_example
        (same output as on the hot standby)
  • 25. Hash indexes are now logged! • pre PostgreSQL 10:
        master
        =# explain analyze select * from hash_example where i = 123;
                           QUERY PLAN
        -----------------------------------------
         Index Scan using hash_idx on hash_example  (cost=0.00..8.02 rows=1 width=21) (actual time=1.526..1.529 rows=1 loops=1)
        [...]
        hot standby
        =# explain analyze select * from hash_example where i = 123;
        ERROR:  could not read block 0 in file "base/16402/458955269": read only 0 of 8192 bytes
        =# SET enable_indexscan TO false;
        SET
  • 26. SP-GiST support for inet • Unbalanced indexes perform better for inclusion searches: – Ex. quad-tree && bbox • E. Hasegeli extended the use case to IPv4/IPv6 addresses (inet, 7 bytes/19 bytes): – defined the OpClass for inet to be interfaced with the SP-GiST AM • inet_ops → && >> >>= > >= <> << <<= < <= = – important improvement in the SP-GiST AM: the # of child nodes is limited
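A minimal Python sketch of what the inet `&&` operator checks: whether either network contains or equals the other. Python's `ipaddress` module stands in for PostgreSQL here; `strict=False` accepts addresses with host bits set (as inet does) and masks them off, and the addresses in the example are hypothetical. The `nested_loop_join` helper mimics the no-index plan shown in the next slides, testing every pair of rows.

```python
import ipaddress


def inet_overlaps(a, b):
    """True if either network contains (or equals) the other, like inet &&."""
    na = ipaddress.ip_network(a, strict=False)
    nb = ipaddress.ip_network(b, strict=False)
    # IPv4 and IPv6 values never overlap
    return na.version == nb.version and na.overlaps(nb)


def nested_loop_join(left, right):
    """No index: every pair is tested, |left| x |right| comparisons."""
    return [(l, r) for l in left for r in right if inet_overlaps(l, r)]


if __name__ == "__main__":
    print(inet_overlaps("192.168.1.0/24", "192.168.1.5/32"))  # True
    print(nested_loop_join(["10.0.0.0/8", "192.168.0.0/16"],
                           ["10.1.0.0/16", "172.16.0.0/12"]))
```

Because two CIDR prefixes are either nested or disjoint, a prefix tree (the kind of space partitioning SP-GiST provides) can discard whole subtrees of non-matching networks instead of testing every pair.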
  • 27. SP-GiST support for inet • Ex.
        =# CREATE TABLE network_a AS
           SELECT ((random() * 255)::int::text || '.' ||
                   (random() * 255)::int::text || '.' ||
                   (random() * 255)::int::text || '.' ||
                   (random() * 255)::int::text || '/' ||
                   (random() * 32)::int::text)::inet AS addr
           FROM generate_series(1, 1000);
        CREATE
        =# CREATE INDEX gist_idx ON network_a USING gist (addr inet_ops);
        CREATE
        =# CREATE INDEX spgist_idx_a ON network_a USING spgist (addr inet_ops);
        CREATE
        =# CREATE TABLE network_b AS (
           SELECT * FROM network_a ORDER BY random() LIMIT 100);
        CREATE
  • 28. SP-GiST support for inet • Ex. no indexes
        =# EXPLAIN ANALYZE SELECT * FROM network_a a JOIN network_b b ON b.addr && a.addr;
                                        QUERY PLAN
        -----------------------------------------------------------------------------------
         Nested Loop  (cost=0.00..15032.50 rows=78724 width=14) (actual time=0.017..185.134 rows=94973 loops=1)
           Join Filter: (a.addr && b.addr)
           Rows Removed by Join Filter: 905027
           ->  Seq Scan on network_a a  (cost=0.00..15.00 rows=1000 width=7) (actual time=0.008..0.187 rows=1000 loops=1)
           ->  Materialize  (cost=0.00..20.00 rows=1000 width=7) (actual time=0.000..0.061 rows=1000 loops=1000)
                 ->  Seq Scan on network_b b  (cost=0.00..15.00 rows=1000 width=7) (actual time=0.005..0.083 rows=1000 loops=1)
         Planning time: 0.522 ms
         Execution time: 190.120 ms
        (8 rows)
  • 29. SP-GiST support for inet • Ex. GiST index
        =# EXPLAIN ANALYZE SELECT * FROM network_a a JOIN network_b b ON b.addr && a.addr;
                                        QUERY PLAN
        -----------------------------------------------------------------------------------
         Nested Loop  (cost=0.14..631.40 rows=13600 width=39) (actual time=0.048..112.023 rows=94973 loops=1)
           ->  Seq Scan on network_b b  (cost=0.00..23.60 rows=1360 width=32) (actual time=0.016..0.153 rows=1000 loops=1)
           ->  Index Only Scan using gist_idx_a on network_a a  (cost=0.14..0.35 rows=10 width=7) (actual time=0.018..0.093 rows=95 loops=1000)
                 Index Cond: (addr && a.addr)
                 Heap Fetches: 94973
         Planning time: 0.111 ms
         Execution time: 119.433 ms
        (7 rows)
  • 30. SP-GiST support for inet • Ex. SP-GiST index
        =# EXPLAIN ANALYZE SELECT * FROM network_a a JOIN network_b b ON b.addr && a.addr;
                                        QUERY PLAN
        -----------------------------------------------------------------------------------
         Nested Loop  (cost=0.14..667.40 rows=13600 width=39) (actual time=0.034..58.196 rows=94973 loops=1)
           ->  Seq Scan on network_b b  (cost=0.00..23.60 rows=1360 width=32) (actual time=0.009..0.105 rows=1000 loops=1)
           ->  Index Only Scan using spgist_idx_a on network_a a  (cost=0.14..0.37 rows=10 width=7) (actual time=0.008..0.042 rows=95 loops=1000)
                 Index Cond: (addr && a.addr)
                 Heap Fetches: 94973
         Planning time: 0.109 ms
         Execution time: 63.562 ms
        (7 rows)
  • 31. BRIN summarization for new INSERTs • pre PG 10: run VACUUM, or call brin_summarize_new_values() • NOW (Á. Herrera): – the autovacuum daemon can summarize newly inserted data in not-yet-summarized ranges: • CREATE INDEX ON table USING brin (column) WITH (autosummarize = on); – it is possible to summarize/desummarize single block ranges (identified by a block number, bigint): • brin_summarize_range / brin_desummarize_range • BRINs are (still) not able to “shrink” summarized data – if you update/delete boundary data, you need to REINDEX
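Why a min/max summary cannot "shrink" can be shown with a minimal Python sketch of a single range summary. Inserting a value only needs to widen [min, max], which is cheap; deleting the current min or max cannot be reflected without re-reading every row of the range, which is what desummarizing and re-summarizing the range (or a REINDEX) amounts to. The `RangeSummary` class is a hypothetical illustration, not a PostgreSQL API.

```python
class RangeSummary:
    """Hypothetical min/max summary of one BRIN block range."""

    def __init__(self):
        self.min = None
        self.max = None

    def add_value(self, v):
        """Incremental summarization on INSERT: the range can only widen."""
        self.min = v if self.min is None else min(self.min, v)
        self.max = v if self.max is None else max(self.max, v)

    def resummarize(self, values):
        """Full rescan of the range's current rows, recomputing min/max."""
        self.min = self.max = None
        for v in values:
            self.add_value(v)


if __name__ == "__main__":
    rows = [10, 20, 30]
    s = RangeSummary()
    for v in rows:
        s.add_value(v)
    rows.remove(30)           # DELETE of the boundary value
    print(s.min, s.max)       # 10 30: stale, still correct but looser
    s.resummarize(rows)
    print(s.min, s.max)       # 10 20
```

A stale, wider range never causes wrong results (a BRIN is lossy and rows are rechecked); it only makes the index select more blocks than necessary.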
  • 32. Other features about indexes • Improve hash index performance (A. Kapila, M. Cy, A. Sharma) • Improve accuracy in determining if a BRIN index scan is beneficial (D. Rowley, E. Hasegeli) • Allow faster GiST INSERTs/UPDATEs by reusing index space more efficiently (A. Borodin) • Reduce page locking during vacuuming of GIN indexes (A. Borodin)
  • 33. The future of indexes in PostgreSQL • Allow compression/decompression AM functions in SP-GiST OpClasses (good for PostGIS!) • CREATE GLOBAL INDEX
  • 34. Conclusions • PostgreSQL has a long tradition of index development • different types for different goals • an eye to the future
  • 35. Creative Commons license • This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License https://creativecommons.org/licenses/by-nc-sa/4.0/ © 2017 Giuseppe Broccolo, ITPUG – www.itpug.org/