Cybertec Schönig & Schönig GmbH
Hans-Jürgen Schönig, www.postgresql-support.de
PostgreSQL Indexing
Dublin, 2013
Hans-Jürgen Schönig
Scope of this session:
- What a basic index does
- The PostgreSQL optimizer (cost model)
- Classical B-tree Indexes
- Partial / functional indexes
- Different types of indexes
- Full-Text-Search
- Fuzzy matching
- Writing your own indexing strategy
1. Basic indexing:
- Generating test data:
- for the purpose of this session we need a
table consisting of two columns:
test=# CREATE TABLE t_test (id serial, name text);
CREATE TABLE
test=# INSERT INTO t_test (name) VALUES ('hans');
INSERT 0 1
test=# INSERT INTO t_test (name) VALUES ('paul');
INSERT 0 1
1. Basic indexing:
- A lot more test data ...
- Let us create some more test data
by repeating the process
test=# INSERT INTO t_test (name) SELECT name FROM t_test;
INSERT 0 2
...
test=# INSERT INTO t_test (name) SELECT name FROM t_test;
INSERT 0 2097152
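The repetitions hidden behind the "..." follow from simple doubling: each INSERT ... SELECT copies every existing row once. A small Python sketch (illustrative arithmetic only, not part of the session) counts how often the statement runs to get from the two seed rows to the 4194304 rows counted by SELECT count(*) later on.

```python
# Each "INSERT INTO t_test (name) SELECT name FROM t_test" copies every
# existing row once, so the table doubles with every repetition.
rows = 2                      # the two seed rows: 'hans' and 'paul'
inserted = []                 # what psql reports as "INSERT 0 <n>"
while rows < 4194304:         # final size reported by count(*)
    inserted.append(rows)     # the statement copies all current rows
    rows *= 2

print(len(inserted), inserted[0], inserted[-1], rows)
```

So 21 repetitions are needed: the first reports 2 rows, the last reports 2097152, leaving 4194304 rows in the table.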
1. Basic indexing:
- Reading some data:
- Let us see how PostgreSQL executes a simple query:
test=# SELECT count(*) FROM t_test;
count
---------
4194304
(1 row)
Time: 431.192 ms
test=# explain analyze SELECT count(*) FROM t_test;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Aggregate (cost=75100.80..75100.81 rows=1 width=0) (actual time=977.865..977.865 rows=1 loops=1)
-> Seq Scan on t_test (cost=0.00..64615.04 rows=4194304 width=0)
(actual time=0.013..531.448 rows=4194304 loops=1)
Total runtime: 977.917 ms
(3 rows)
Time: 1045.065 ms
1. Basic indexing:
- Reading some data:
- Let us add a filter:
test=# SELECT count(*) FROM t_test WHERE id = 421234;
count
-------
1
(1 row)
Time: 476.965 ms
test=# explain analyze SELECT count(*) FROM t_test WHERE id = 421234;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Aggregate (cost=75100.80..75100.81 rows=1 width=0) (actual time=495.134..495.135 rows=1 loops=1)
-> Seq Scan on t_test (cost=0.00..75100.80 rows=1 width=0)
(actual time=53.405..495.126 rows=1 loops=1)
Filter: (id = 421234)
Rows Removed by Filter: 4194303
Total runtime: 495.175 ms
(5 rows)
Time: 520.659 ms
1. Basic indexing:
- Sequentially reading data:
- In case you like reading the phone book sequentially
we are basically done.
- Sequentially reading the phone book is technically ok
=> but socially not accepted
- Defining an index is the desired solution
1. Basic indexing:
- Creating an index
test=# \h CREATE INDEX
Command: CREATE INDEX
Description: define a new index
Syntax:
CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ name ]
ON table_name [ USING method ]
( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ]
[ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
[ WITH ( storage_parameter = value [, ... ] ) ]
[ TABLESPACE tablespace_name ]
[ WHERE predicate ]
- At the end of the day all clauses will be
covered by this training
1. Basic indexing:
- A typical index:
test=# CREATE INDEX idx_id ON t_test (id);
CREATE INDEX
Time: 7357.663 ms
- This gives us a standard btree index
- PostgreSQL provides “High-Concurrency B-Trees”
(Lehman-Yao, 1981)
- Many sessions can modify the index at the same time
- Highly efficient B+ tree
1. Basic indexing:
- How a btree works:
[Figure: an 8k root node points to sorted 8k index pages linked by forward chaining; index entries point into the table, where rows are addressed via line pointers (linp)]
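A toy Python model of the two properties in the diagram: keys are kept sorted across pages, and leaf pages are chained left to right so a range scan can walk along the leaf level after a single descent. This is an illustration, not PostgreSQL's code; the tiny `PAGE_CAPACITY`, the single-level root and the helper names `build` / `range_scan` are assumptions made for readability, and concurrency (the Lehman-Yao part) is ignored.

```python
from bisect import bisect_left, bisect_right

# Assumption: real pages are 8 kB and hold hundreds of keys; 4 keeps it small.
PAGE_CAPACITY = 4

def build(keys):
    """Split sorted keys into leaf pages and build a one-level root."""
    keys = sorted(keys)
    leaves = [keys[i:i + PAGE_CAPACITY] for i in range(0, len(keys), PAGE_CAPACITY)]
    # forward chaining: next_leaf[i] is the right sibling of leaf i
    next_leaf = [i + 1 if i + 1 < len(leaves) else None for i in range(len(leaves))]
    root = [leaf[0] for leaf in leaves]   # separator = first key of each leaf
    return root, leaves, next_leaf

def range_scan(tree, low, high):
    """Return all keys k with low <= k <= high."""
    root, leaves, next_leaf = tree
    # descend once: the last leaf whose first key is <= low
    leaf = max(bisect_right(root, low) - 1, 0)
    pos = bisect_left(leaves[leaf], low)
    result = []
    while leaf is not None:
        page = leaves[leaf]
        while pos < len(page):
            if page[pos] > high:
                return result
            result.append(page[pos])
            pos += 1
        leaf, pos = next_leaf[leaf], 0    # follow the forward link
    return result

tree = build(range(1, 101))
print(range_scan(tree, 42, 42))   # equality lookup: [42]
print(range_scan(tree, 7, 11))    # crosses a page boundary: [7, 8, 9, 10, 11]
```

The equality lookup touches one leaf; the range query starts on one leaf and simply follows the chain to the next, which is what makes sorted B-tree pages useful for ranges and ordering as well.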
1. Basic indexing:
- Indexing is beneficial
test=# explain analyze SELECT count(*)
FROM t_test
WHERE id = 421234;
QUERY PLAN
------------------------------------------------------------------------------
Aggregate (cost=8.73..8.74 rows=1 width=0)
(actual time=0.024..0.024 rows=1 loops=1)
-> Index Only Scan using idx_id on t_test (cost=0.00..8.73 rows=1 width=0)
(actual time=0.019..0.020 rows=1 loops=1)
Index Cond: (id = 421234)
Heap Fetches: 1
Total runtime: 0.057 ms
(5 rows)
Time: 0.395 ms
- A lot faster :).
1. Basic indexing:
- Still slow ...
test=# SELECT count(*) FROM t_test WHERE name = 'hans';
count
---------
2097152
(1 row)
Time: 787.407 ms
- This is still slow. Let us create an index ...
test=# CREATE INDEX idx_name ON t_test (name);
CREATE INDEX
1. Basic indexing:
- The benefit is exactly zero:
test=# SELECT count(*) FROM t_test WHERE name = 'hans';
count
---------
2097152
(1 row)
Time: 782.443 ms
test=# explain SELECT count(*) FROM t_test WHERE name = 'hans';
QUERY PLAN
----------------------------------------------------------------------
Aggregate (cost=80350.32..80350.33 rows=1 width=0)
-> Seq Scan on t_test (cost=0.00..75100.80 rows=2099808 width=0)
Filter: (name = 'hans'::text)
(3 rows)
- The index won't be used
- Too many identical values (“not selective”)
1. Basic indexing:
- The cost is far from zero:
test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
pg_size_pretty
----------------
177 MB
(1 row)
test=# SELECT pg_size_pretty(pg_relation_size('idx_id'));
pg_size_pretty
----------------
90 MB
(1 row)
test=# SELECT pg_size_pretty(pg_relation_size('idx_name'));
pg_size_pretty
----------------
90 MB
(1 row)
- Indexes need a fair amount of space
1. Basic indexing:
- Input values DO make a difference:
test=# explain SELECT count(*) FROM t_test WHERE name = 'hans';
QUERY PLAN
----------------------------------------------------------------------
Aggregate (cost=80350.32..80350.33 rows=1 width=0)
-> Seq Scan on t_test (cost=0.00..75100.80 rows=2099808 width=0)
Filter: (name = 'hans'::text)
(3 rows)
test=# explain SELECT count(*) FROM t_test WHERE name = 'hans2';
QUERY PLAN
----------------------------------------------------------------------------------
Aggregate (cost=7.74..7.75 rows=1 width=0)
-> Index Only Scan using idx_name on t_test (cost=0.00..7.74 rows=1 width=0)
Index Cond: (name = 'hans2'::text)
(3 rows)
- PostgreSQL will decide depending on the input value
=> cost based optimization
1. Basic indexing:
- Partial indexes:
- In our example the index is only used for
rare or non-existent values
- What is the point of indexing 'hans' and 'paul'
when those entries are never used?
=> a more selective strategy is needed
1. Basic indexing:
- Partial indexes:
test=# DROP INDEX idx_name;
DROP INDEX
test=# CREATE INDEX idx_name ON t_test (name)
WHERE name NOT IN ('hans', 'paul');
CREATE INDEX
test=# SELECT pg_size_pretty(pg_relation_size('idx_name'));
pg_size_pretty
----------------
8192 bytes
(1 row)
- A partial index reduces space consumption
- Benefit is still the same
1. Basic indexing:
- Equal benefit – lower cost:
test=# explain SELECT count(*) FROM t_test WHERE name = 'hans';
QUERY PLAN
----------------------------------------------------------------------
Aggregate (cost=80350.32..80350.33 rows=1 width=0)
-> Seq Scan on t_test (cost=0.00..75100.80 rows=2099808 width=0)
Filter: (name = 'hans'::text)
(3 rows)
test=# explain SELECT count(*) FROM t_test WHERE name = 'hans2';
QUERY PLAN
----------------------------------------------------------------------------------
Aggregate (cost=7.28..7.29 rows=1 width=0)
-> Index Only Scan using idx_name on t_test (cost=0.00..7.28 rows=1 width=0)
Index Cond: (name = 'hans2'::text)
(3 rows)
- The plans are the same as before!
1. Basic indexing:
- What about functions?
test=# CREATE INDEX idx_cos ON t_test ( cos(id) );
CREATE INDEX
Time: 16867.228 ms
test=# explain SELECT count(*) FROM t_test WHERE cos(id) = 17;
QUERY PLAN
----------------------------------------------------------------------------------
Aggregate (cost=23960.99..23961.00 rows=1 width=0)
-> Bitmap Heap Scan on t_test (cost=395.25..23908.56 rows=20972 width=0)
Recheck Cond: (cos((id)::double precision) = 17::double precision)
-> Bitmap Index Scan on idx_cos (cost=0.00..390.01 rows=20972 width=0)
Index Cond: (cos((id)::double precision) = 17::double precision)
(5 rows)
- PostgreSQL provides functional indexes
- VERY nice to avoid additional columns
- Gives a lot of extra flexibility
1. Basic indexing:
- Types of functions allowed
- Functions must be deterministic
=> “immutable”
=> Functions can be written in almost any language
=> The function runs for every index entry written,
so its speed matters a lot
2. The PostgreSQL cost model
- How does PostgreSQL decide on
index vs. no index?
- PostgreSQL uses statistics to estimate the number of
rows coming back
- Each operation is assigned a cost
=> costs are just numbers used to compare
different options inside the planner
- Cost parameters can be changed at runtime
or globally
=> be careful, it can backfire
2. The PostgreSQL cost model
- pg_stats is your friend:
test=# \d pg_stats
View "pg_catalog.pg_stats"
Column | Type | Modifiers
-------------------------------+-----------+-----------
schemaname | name |
tablename | name |
attname | name |
inherited | boolean |
null_frac | real |
avg_width | integer |
n_distinct | real |
most_common_vals | anyarray |
most_common_freqs | real[] |
histogram_bounds | anyarray |
correlation | real |
most_common_elems | anyarray |
most_common_elem_freqs | real[] |
elem_count_histogram | real[] |
2. The PostgreSQL cost model
- Updating statistics
- System statistics are updated by ANALYZE:
test=# \h ANALYZE
Command: ANALYZE
Description: collect statistics about a database
Syntax:
ANALYZE [ VERBOSE ] [ table_name [ ( column_name [, ...] ) ] ]
- In most setups autovacuum is in charge
of updating pg_statistic
- In most cases statistics are not an issue
2. The PostgreSQL cost model
- How does PostgreSQL estimate costs?
- seq_page_cost = 1
- random_page_cost = 4
- cpu_tuple_cost = 0.01
- cpu_operator_cost = 0.0025
- cpu_index_tuple_cost = 0.005
2. The PostgreSQL cost model
- Let us do the math (1):
test=# explain SELECT count(*) FROM t_test;
QUERY PLAN
----------------------------------------------------------------------
Aggregate (cost=75100.80..75100.81 rows=1 width=0)
-> Seq Scan on t_test (cost=0.00..64615.04 rows=4194304 width=0)
(2 rows)
- the total cost is 75100.81
- costs are composed of I/O and CPU costs
2. The PostgreSQL cost model
- Let us do the math (2):
test=# SELECT pg_relation_size('t_test') / 8192;
?column?
----------
22672
(1 row)
- our table consists of 22672 blocks
- each block is 8kb in size
2. The PostgreSQL cost model
- Let us do the math (3):
The seq scan:
I/O cost = 22672 * seq_page_cost = 22672
CPU cost = 4194304 * cpu_tuple_cost = 41943.04
=> 22672 + 41943.04 = 64615.04 for the seq scan
The aggregate:
4194304 * cpu_operator_cost = 10485.76
=> 64615.04 + 10485.76 = 75100.80 (startup cost of the aggregate)
Total cost => 75100.80 + cpu_tuple_cost = 75100.81
(we have to emit the one result tuple)
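The same arithmetic as a short Python sketch. It reproduces only the formula on this slide (pages times `seq_page_cost`, rows times `cpu_tuple_cost`, one `cpu_operator_cost` per aggregated row, one `cpu_tuple_cost` for the result row); the function name `count_star_cost` is hypothetical, and the real planner accounts for more factors than this. Running it with `seq_page_cost = 10` also reproduces the "inflation" plan shown next.

```python
# Default planner cost constants, as listed in the cost model section
SEQ_PAGE_COST = 1.0
CPU_TUPLE_COST = 0.01
CPU_OPERATOR_COST = 0.0025

def count_star_cost(pages, tuples, seq_page_cost=SEQ_PAGE_COST):
    """Rough cost of SELECT count(*) via a seq scan: (seq scan total, aggregate total)."""
    seq_scan = pages * seq_page_cost + tuples * CPU_TUPLE_COST
    # count(*) charges one operator evaluation per input row ...
    agg_startup = seq_scan + tuples * CPU_OPERATOR_COST
    # ... plus cpu_tuple_cost for the single result row
    agg_total = agg_startup + CPU_TUPLE_COST
    return round(seq_scan, 2), round(agg_total, 2)

print(count_star_cost(22672, 4194304))                    # (64615.04, 75100.81)
print(count_star_cost(22672, 4194304, seq_page_cost=10))  # (268663.04, 279148.81)
```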
2. The PostgreSQL cost model
- Inflation at work:
test=# SET seq_page_cost TO 10;
SET
test=# explain SELECT count(*) FROM t_test;
QUERY PLAN
-----------------------------------------------------------------------
Aggregate (cost=279148.80..279148.81 rows=1 width=0)
-> Seq Scan on t_test (cost=0.00..268663.04 rows=4194304 width=0)
(2 rows)
- Costs can be changed at runtime to fine tune
index usage
=> only do this if you are fully aware of what
you are doing. It can have unintended side
effects
2. The PostgreSQL cost model
- Spinning disks vs. SSDs
- Traditional disks are fast sequentially
and pretty bad at random
I/O
- SSDs largely remove this penalty
=> consider lowering random_page_cost
2. The PostgreSQL cost model
- Abusing tablespaces:
test=# ALTER TABLESPACE pg_default
SET (random_page_cost = 1);
ALTER TABLESPACE
- Allows different cost settings for various
disk subsystems
- It also allows you to split “cached” and “uncached”
data -> ugly but useful
2. The PostgreSQL cost model
- Correlation and disk layout
test=# CREATE TABLE t_random AS SELECT *
FROM t_test
ORDER BY random();
SELECT 4194304
test=# CREATE INDEX idx_random ON t_random(id);
CREATE INDEX
test=# ANALYZE t_random;
ANALYZE
- The PostgreSQL optimizer considers the
physical order of rows on disk (pg_stats.correlation)
- High correlation makes index scans far
more likely, as the optimizer lowers its
estimate for I/O costs.
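To make "correlation" concrete, here is a Python sketch of the idea behind the `correlation` column in pg_stats: a value between -1 and +1 describing how well the physical row order matches the sorted order of the column. It is an approximation under stated assumptions: it looks at all rows instead of ANALYZE's sample, assumes distinct values, and the function name `physical_correlation` is made up for illustration.

```python
import random

def physical_correlation(values):
    """Pearson correlation between physical position and sort rank (distinct values)."""
    n = len(values)
    # rank of each row in logical (sorted) order
    order = sorted(range(n), key=lambda i: values[i])
    rank = [0] * n
    for r, i in enumerate(order):
        rank[i] = r
    mean = (n - 1) / 2   # positions and ranks are both permutations of 0..n-1
    cov = sum((p - mean) * (r - mean) for p, r in enumerate(rank))
    var = sum((p - mean) * (p - mean) for p in range(n))  # identical for both
    return cov / var

ids = list(range(1, 10001))           # inserted in order, like t_test
shuffled = ids[:]
random.seed(1)
random.shuffle(shuffled)              # like t_random (ORDER BY random())

print(physical_correlation(ids))           # 1.0
print(physical_correlation(ids[::-1]))     # -1.0
print(physical_correlation(shuffled))      # close to 0
```

A value near ±1 means neighbouring index entries usually sit in the same or adjacent table blocks, which is why the optimizer estimates cheaper I/O for t_test than for t_random.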
2. The PostgreSQL cost model
- Correlation and disk layout
test=# explain SELECT count(*) FROM t_test WHERE id < 1000;
QUERY PLAN
-------------------------------------------------------------------------------
Aggregate (cost=75.35..75.36 rows=1 width=0)
-> Index Only Scan using idx_id on t_test
(cost=0.00..72.72 rows=1049 width=0)
Index Cond: (id < 1000)
(3 rows)
test=# explain SELECT count(*) FROM t_random WHERE id < 1000;
QUERY PLAN
-------------------------------------------------------------------------------
Aggregate (cost=950.31..950.32 rows=1 width=0)
-> Index Only Scan using idx_random on t_random
(cost=0.00..947.94 rows=947 width=0)
Index Cond: (id < 1000)
(3 rows)
2. The PostgreSQL cost model
- Implications:
- This is why different plans can pop up
EVEN if the data is the same
- There is no fixed amount of data at which
PostgreSQL switches from an index scan to a
sequential scan
- High correlation can improve performance
=> consider clustering the table
test=# \h CLUSTER
Command: CLUSTER
Description: cluster a table according to an index
Syntax:
CLUSTER [VERBOSE] table_name [ USING index_name ]
CLUSTER [VERBOSE]
3. Indexing many columns
- Using OR / AND:
- PostgreSQL can use more than one index per
table per query
- PostgreSQL provides multi-column indexes
- What you might see is a so-called “Bitmap Scan”
=> don't mix it up with Oracle Bitmap Indexes
3. Indexing many columns
- Bitmap scans:
test=# explain SELECT * FROM t_test WHERE id = 2343 OR id = 423423;
QUERY PLAN
---------------------------------------------------------------------------
Bitmap Heap Scan on t_test (cost=9.44..17.41 rows=2 width=9)
Recheck Cond: ((id = 2343) OR (id = 423423))
-> BitmapOr (cost=9.44..9.44 rows=2 width=0)
-> Bitmap Index Scan on idx_id (cost=0.00..4.72 rows=1 width=0)
Index Cond: (id = 2343)
-> Bitmap Index Scan on idx_id (cost=0.00..4.72 rows=1 width=0)
Index Cond: (id = 423423)
(7 rows)
- PostgreSQL will scan the index twice
- PostgreSQL will then read the matching blocks of the underlying table
- The condition has to be re-evaluated (Recheck Cond)
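The three steps above can be sketched in Python with a toy heap and a toy index. Everything here (`heap`, `index`, `bitmap_or_scan`) is a hypothetical illustration: the sketch always remembers whole blocks, which behaves like a lossy bitmap, whereas PostgreSQL keeps exact tuple positions while the bitmap fits into work_mem and only then degrades to per-block bits.

```python
# Toy heap: block number -> rows (id, name) stored in that block
heap = {
    0: [(1, 'hans'), (2, 'paul')],
    1: [(2343, 'hans'), (2344, 'paul')],
    2: [(423423, 'hans'), (423424, 'paul')],
}

# Toy btree on id: id -> list of (block, offset) tuple identifiers
index = {}
for block, block_rows in heap.items():
    for offset, row in enumerate(block_rows):
        index.setdefault(row[0], []).append((block, offset))

def bitmap_or_scan(heap, index, keys, predicate):
    """Sketch of BitmapOr + Bitmap Heap Scan for 'id = k1 OR id = k2 ...'."""
    bitmap = set()
    for key in keys:                      # one Bitmap Index Scan per condition
        for block, _offset in index.get(key, []):
            bitmap.add(block)             # BitmapOr: union of matching blocks
    result = []
    for block in sorted(bitmap):          # each block read once, in physical order
        for row in heap[block]:
            if predicate(row):            # Recheck Cond: blocks hold other rows too
                result.append(row)
    return result

wanted = (2343, 423423)
rows = bitmap_or_scan(heap, index, wanted, lambda row: row[0] in wanted)
print(rows)   # [(2343, 'hans'), (423423, 'hans')]
```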
3. Indexing many columns
- Bitmap scans:
test=# explain SELECT * FROM t_test WHERE id = 2343 AND name = 'josef';
QUERY PLAN
-----------------------------------------------------------------------
Index Scan using idx_name on t_test (cost=0.00..8.27 rows=1 width=9)
Index Cond: (name = 'josef'::text)
Filter: (id = 2343)
(3 rows)
- PostgreSQL does not always use two indexes
when a query has two conditions (quals)
- The more selective index might be enough
3. Indexing many columns
- Multicolumn indexes:
test=# DROP INDEX idx_id;
DROP INDEX
test=# CREATE INDEX idx_combined ON t_test (id, name);
CREATE INDEX
test=# explain SELECT * FROM t_test WHERE id = 10;
QUERY PLAN
--------------------------------------------------------------------------------
Index Only Scan using idx_combined on t_test (cost=0.00..8.91 rows=1 width=9)
Index Cond: (id = 10)
(2 rows)
- PostgreSQL can use a subset of the indexed columns IF they are
the first (leading) column(s) of the index
- Imagine a phone book; it is just like a combined index
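The phone book analogy as a Python sketch: entries of a combined index on (id, name) are sorted by id first and by name second, so all entries for one id are contiguous and a binary search finds them, while entries for one name are scattered across the whole structure. A sorted list stands in for the B-tree here; `lookup_leading` and `lookup_second` are hypothetical helper names.

```python
from bisect import bisect_left

# Toy combined index on (id, name): sorted by id first, then by name
entries = sorted((i, name) for i in range(1, 1001) for name in ('hans', 'paul'))

def lookup_leading(entries, id_value):
    """id = X: matches are contiguous, so a binary search finds the start."""
    start = bisect_left(entries, (id_value,))   # (10,) sorts before (10, 'hans')
    out = []
    for entry in entries[start:]:
        if entry[0] != id_value:
            break
        out.append(entry)
    return out

def lookup_second(entries, name):
    """name = Y alone: matches are scattered, so every entry has to be checked."""
    return [entry for entry in entries if entry[1] == name]

print(lookup_leading(entries, 10))        # [(10, 'hans'), (10, 'paul')]
print(len(lookup_second(entries, 'paul')))  # 1000, found by reading all 2000 entries
```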
3. Indexing many columns
- Many indexes or combined indexes?
- It depends on what you want to query
- If your queries always filter on the leading column(s),
a combined index might be a good idea
- Many indexes are more flexible but maybe not perfect
- Sometimes a mixed-strategy can be useful
4. Indexes to provide order
- B-trees can be used for more than searching
- B-trees keep their entries sorted, so they provide you with order.
- Order helps to avoid repeated sorting.
test=# explain SELECT * FROM t_test ORDER BY id LIMIT 10;
QUERY PLAN
--------------------------------------------------------------------------------------
Limit (cost=0.00..0.31 rows=10 width=9)
-> Index Scan using idx_id on t_test (cost=0.00..131602.27 rows=4194304 width=9)
(2 rows)
5. Dealing with upper / lowercase
- Upper and lower case searches are common:
- If you want case-insensitive comparisons, don't use
a functional index
- Consider using “citext” instead
test=# CREATE EXTENSION citext;
CREATE EXTENSION
test=# SELECT 'ABC'::citext = 'abc'::citext;
?column?
----------
t
(1 row)
6. Different types of indexes
- PostgreSQL supports more than just btrees
- B-Trees are fine if you are interested in things
which can be sorted
- Try to sort polygons => there is no meaningful order,
so a btree cannot help you find them
- Geometric data and Full-Text-Search need
different algorithms
NOTE: This is not about which index is faster.
It is about the correct ALGORITHM
6. Different types of indexes
- Index types provided by PostgreSQL
- B-Trees
- Gist: Generalized Search Tree
- Gin: Generalized Inverted Index
- Sp-Gist: Space Partitioned Gist
- Hash
6. Different types of indexes
- Indexes and algorithms
- B-Trees: numbers, text, dates, etc.
- Gist: Generalized Search Tree
- Gin: Generalized Inverted Index
- Sp-Gist: Space Partitioned Gist
- Hash
7. Gist indexes
- Gist operates on different principles
than btree
- it supports “contains”, “left of”, “overlaps”, etc.
- “contains”, etc. are good for
=> Full Text Search
=> Geometric operations (PostGIS, etc.)
=> Finding genome sequences
=> Handling ranges (time, etc.)
=> Fuzzy search
- Gist allows KNN-search
7. Gist indexes
- How it works internally ...
7. GIN indexes
- GIN is a so-called inverted index
- Used for Full Text Search
- If 1 million documents contain the word “house”,
do you really want to store “house” in
the index 1 million times?
=> A B-tree over the words
=> A document list for each word
=> Classical approach to text search
- FTS is not about “=”, it is about “contains”
=> forget btree
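A minimal Python sketch of the inverted-index idea described above: each word is stored once, followed by the list of documents containing it, and a "contains" search intersects those lists. A dict stands in for GIN's B-tree over the words; the documents and the helpers `build_inverted_index` / `search_all` are hypothetical, and real GIN stores compressed posting lists or posting trees.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each word to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            postings[word].add(doc_id)
    # every word appears once, followed by its document list
    return {word: sorted(ids) for word, ids in postings.items()}

def search_all(index, *words):
    """Ids of documents containing all given words ('contains', not '=')."""
    lists = [set(index.get(word, [])) for word in words]
    return sorted(set.intersection(*lists)) if lists else []

docs = {1: 'my house is red', 2: 'a red car', 3: 'house and car'}
idx = build_inverted_index(docs)
print(idx['house'])                      # [1, 3]
print(search_all(idx, 'house', 'car'))   # [3]
```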
7. GIN indexes
- GIN internal workings:
7. SP-Gist indexes
- SP-Gist is a space partitioned index
- Can be used for a variety of algorithms that use
space partitioning
=> quad trees
=> suffix trees
=> k-d trees
7. SP-Gist indexes
- Quad trees: A prototype example ...
- We want to insert ... (6, 4) and (2, 8)
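A classic point quad tree in Python, inserting the two points from the example: every node holds a point and divides the plane around it into four quadrants, and each new point descends into the matching quadrant until it finds a free slot. This is a textbook sketch under that assumption; SP-GiST's quad-tree operator class works on the same principle, but its actual page and node layout differs. `Node`, `quadrant`, `insert` and `contains` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    x: float
    y: float
    children: dict = field(default_factory=dict)   # quadrant name -> Node

def quadrant(node, x, y):
    """Which of the four regions around node's point contains (x, y)."""
    return ('N' if y >= node.y else 'S') + ('E' if x >= node.x else 'W')

def insert(root, x, y):
    """Insert a point; returns the (possibly new) root."""
    if root is None:
        return Node(x, y)
    node = root
    while True:
        q = quadrant(node, x, y)
        if q not in node.children:
            node.children[q] = Node(x, y)
            return root
        node = node.children[q]

def contains(root, x, y):
    """Follow one quadrant per level instead of checking every point."""
    node = root
    while node is not None:
        if (node.x, node.y) == (x, y):
            return True
        node = node.children.get(quadrant(node, x, y))
    return False

root = None
for point in [(6, 4), (2, 8)]:
    root = insert(root, *point)
print(quadrant(root, 2, 8))   # NW: (2, 8) lies up and to the left of (6, 4)
print(contains(root, 2, 8))   # True
```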
8. Full Text Search
- Stemming:
- Before searching, it makes sense to perform
“stemming”
test=# SELECT to_tsvector('english', 'having many cars is better than
to have just one car');
to_tsvector
-----------------------------------------
'better':5 'car':3,11 'mani':2 'one':10
(1 row)
8. Full Text Search
- Stemming is language dependent:
- Stemming works nicely for European (Latin-script) languages
=> it is hard to do this for Chinese and similar languages
test=# SELECT to_tsvector('english', 'i am'),
to_tsvector('german', 'i am'),
to_tsvector('dutch', 'i am');
to_tsvector | to_tsvector | to_tsvector
-------------+-------------+--------------
| 'i':1 | 'am':2 'i':1
(1 row)
8. Full Text Search
- “contains” is your friend:
- The @@ operator matches a tsquery (the search string)
against a so-called tsvector:
test=# SELECT to_tsvector('english', 'having many cars is better
than to have just one car')
@@ to_tsquery('english', 'car');
?column?
----------
t
(1 row)
8. Full Text Search
- Indexing is easy:
- All you need is a functional index
- Alternatively the stemmed content can be
“materialized” in a separate column
CREATE INDEX idx_fti ON t_test
USING gist (to_tsvector('german', name));
8. Full Text Search
- ts_vector and ts_query magic
- PostgreSQL allows you to use “and” (&)
and “or” (|)
test=# SELECT to_tsvector('english', 'having many cars is better than
to have just one car')
@@ to_tsquery('english', 'car & truck');
?column?
----------
f
(1 row)
test=# SELECT to_tsvector('english', 'having many cars is better than
to have just one car')
@@ to_tsquery('english', '(car | truck) & many');
?column?
----------
t
(1 row)
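How "and" (&) and "or" (|) combine can be sketched as a tiny evaluator over the lexemes that to_tsvector produced earlier ('better', 'car', 'mani', 'one'). This is a toy model, not PostgreSQL's implementation: the query is written as a pre-parsed tree of nested tuples with already-stemmed lexemes ('many' becomes 'mani'), and negation, positions and phrase search are ignored. The function name `matches` is hypothetical.

```python
def matches(lexemes, query):
    """Evaluate a tiny tsquery-like tree against a set of lexemes.

    query is either a lexeme (str) or a tuple (op, left, right) with op '&' or '|'.
    """
    if isinstance(query, str):
        return query in lexemes
    op, left, right = query
    if op == '&':
        return matches(lexemes, left) and matches(lexemes, right)
    if op == '|':
        return matches(lexemes, left) or matches(lexemes, right)
    raise ValueError(f'unknown operator {op!r}')

# lexemes from to_tsvector('english', 'having many cars is better than ...')
doc = {'better', 'car', 'mani', 'one'}
print(matches(doc, ('&', 'car', 'truck')))                 # False
print(matches(doc, ('&', ('|', 'car', 'truck'), 'mani')))  # True
```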
8. Full Text Search
- A stupid question: What is a “word”?
- PostgreSQL is NOT limited to textual search
- Remember, it is all about “contains” ...
- Create your own parser:
test=# \h CREATE TEXT SEARCH PARSER
Command: CREATE TEXT SEARCH PARSER
Description: define a new text search parser
Syntax:
CREATE TEXT SEARCH PARSER name (
START = start_function ,
GETTOKEN = gettoken_function ,
END = end_function ,
LEXTYPES = lextypes_function
[, HEADLINE = headline_function ]
)
8. Full Text Search
- Even more flexibility (2):
test=# \h CREATE TEXT SEARCH CONFIGURATION
Command: CREATE TEXT SEARCH CONFIGURATION
Description: define a new text search configuration
Syntax:
CREATE TEXT SEARCH CONFIGURATION name (
PARSER = parser_name |
COPY = source_config
)
8. Full Text Search
- Even more flexibility:
test=# \h CREATE TEXT SEARCH DICTIONARY
Command: CREATE TEXT SEARCH DICTIONARY
Description: define a new text search dictionary
Syntax:
CREATE TEXT SEARCH DICTIONARY name (
TEMPLATE = template
[, option = value [, ... ]]
)
test=# \h CREATE TEXT SEARCH TEMPLATE
Command: CREATE TEXT SEARCH TEMPLATE
Description: define a new text search template
Syntax:
CREATE TEXT SEARCH TEMPLATE name (
[ INIT = init_function , ]
LEXIZE = lexize_function
)
9. Operator classes
- What does it take to organize a btree?
Operator   Strategy number
<          1
<=         2
=          3
>=         4
>          5
9. Operator classes
- Why care?
- The way numbers are treated is pretty “common”
- How about sorting this one?
“2305 09 04 78”
“4353 07 06 77”
=> it seems the sort order is correct as shown
=> it isn't – it is an Austrian social security number
=> 1977 was before 1978 and not the other way round
9. Operator classes
- Defining indexing strategies
- We can write our own operators
- Those operators can be assigned to an operator
class, which will tell the index how to “behave”
9. Operator classes
- Writing an operator (1):
test=# CREATE OR REPLACE FUNCTION normalize_si(text)
RETURNS text AS $$
BEGIN
RETURN substring($1, 9, 2) ||
substring($1, 7, 2) ||
substring($1, 5, 2) ||
substring($1, 1, 4);
END; $$
LANGUAGE 'plpgsql' IMMUTABLE;
CREATE FUNCTION
test=# SELECT normalize_si('2305090478');
normalize_si
--------------
7804092305
(1 row)
9. Operator classes
- Writing an operator (2):
test=# CREATE OR REPLACE FUNCTION si_lt(text, text)
RETURNS boolean AS
$$
BEGIN
RETURN normalize_si($1) < normalize_si($2);
END;
$$ LANGUAGE 'plpgsql' IMMUTABLE;
CREATE FUNCTION
test=# CREATE OPERATOR <# (
PROCEDURE=si_lt,
LEFTARG=text,
RIGHTARG=text);
CREATE OPERATOR
test=# SELECT '2305090478'::text <# '4353070677'::text;
?column?
----------
f
(1 row)
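A Python mirror of normalize_si() and the <# operator, useful for checking the slicing and the resulting sort order outside the database. The slices correspond to the substring() calls (Python indices are 0-based, SQL positions 1-based). Like the SQL function, it compares two-digit years only, so it assumes all birth dates fall in the same century.

```python
def normalize_si(ssn: str) -> str:
    """Mirror of normalize_si(): serial + DDMMYY becomes YYMMDD + serial."""
    # substring($1, 9, 2) || substring($1, 7, 2) || substring($1, 5, 2) || substring($1, 1, 4)
    return ssn[8:10] + ssn[6:8] + ssn[4:6] + ssn[0:4]

def si_lt(a: str, b: str) -> bool:
    """Mirror of the <# operator."""
    return normalize_si(a) < normalize_si(b)

numbers = ['2305090478', '4353070677']
print(normalize_si('2305090478'))          # 7804092305
print(si_lt('2305090478', '4353070677'))   # False, just like the SQL query
print(sorted(numbers, key=normalize_si))   # ['4353070677', '2305090478']
```

Sorting by the normalized key puts the person born in 1977 first, which is exactly the order an operator class built on <# should give the index.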
9. Operator classes
- Creating the operator class:
- write operators for all operations needed
- write “support functions” (= “same”, etc.)
- make sure that the most important strategies
have proper operators
test=# \h CREATE OPERATOR CLASS
Command: CREATE OPERATOR CLASS
Description: define a new operator class
Syntax:
CREATE OPERATOR CLASS name [ DEFAULT ] FOR TYPE data_type
USING index_method [ FAMILY family_name ] AS
{ OPERATOR strategy_number operator_name [ ( op_type, op_type ) ]
[ FOR SEARCH | FOR ORDER BY sort_family_name ]
| FUNCTION support_number [ ( op_type [ , op_type ] ) ]
function_name ( argument_type [, ...] )
| STORAGE storage_type
} [, ... ]
10. Available operator classes
- pg_trgm
- Trigrams are perfect to perform fuzzy matching
- Trigrams can be used nicely along with KNN-search
- pg_trgm is available as extension to PostgreSQL
test=# CREATE EXTENSION pg_trgm;
CREATE EXTENSION
- Problem: “What is the proper way to spell the name of this
village?”
“gramatneusiedl” vs. “grammatneusiedel”?
10. Available operator classes
- Testing pg_trgm
test=# CREATE TABLE t_search AS
SELECT relname::text
FROM pg_class;
SELECT 303
test=# CREATE INDEX idx_trgm
ON t_search USING gist(relname gist_trgm_ops);
CREATE INDEX
10. Available operator classes
- Testing pg_trgm (2):
test=# SELECT *, 'pgclass' <-> relname
FROM t_search
ORDER BY 'pgclass' <-> relname
LIMIT 10;
relname | ?column?
--------------------------------+----------
pg_class | 0.454545
pg_opclass | 0.538462
pg_class_oid_index | 0.714286
pg_opclass_oid_index | 0.727273
pg_class_relname_nsp_index | 0.793103
pg_opclass_am_name_nsp_index | 0.8
pg_seclabel | 0.823529
pg_am | 0.833333
pg_seclabels | 0.833333
pg_shseclabel | 0.842105
(10 rows)
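Where do these distances come from? A Python approximation of pg_trgm's default behaviour: text is lowercased, non-alphanumeric characters separate words, each word is padded with two blanks in front and one behind, and the similarity is the number of shared trigrams divided by the number of distinct trigrams in both strings; `<->` returns 1 minus that. Assumptions: only ASCII letters and digits count as word characters, and the helper names are hypothetical. Under these assumptions it reproduces the first three rows above.

```python
import re

def trigrams(text: str) -> set:
    """Approximate pg_trgm's default trigram extraction."""
    result = set()
    # case is ignored; non-alphanumeric characters separate words
    for word in re.findall(r'[0-9a-z]+', text.lower()):
        padded = '  ' + word + ' '          # two blanks before, one after
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result

def similarity(a: str, b: str) -> float:
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)     # shared / distinct trigrams

def distance(a: str, b: str) -> float:
    """What the <-> operator returns: 1 - similarity."""
    return 1 - similarity(a, b)

for name in ['pg_class', 'pg_opclass', 'pg_class_oid_index']:
    print(name, round(distance('pgclass', name), 6))
# pg_class 0.454545, pg_opclass 0.538462, pg_class_oid_index 0.714286
```

For 'pgclass' vs 'pg_class', 6 of 11 distinct trigrams are shared, so the distance is 5/11.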
10. Available operator classes
- KNN in action:
test=# explain SELECT *, 'pgclass' <-> relname
FROM t_search
ORDER BY 'pgclass' <-> relname
LIMIT 10;
QUERY PLAN
-----------------------------------------------------------------------------------
Limit (cost=0.14..1.40 rows=10 width=19)
-> Index Scan using idx_trgm on t_search (cost=0.14..38.20 rows=303 width=19)
Order By: (relname <-> 'pgclass'::text)
(3 rows)
11. Traditional LIKE
- LIKE can be indexed in some cases:
- The PostgreSQL optimizer can rewrite queries featuring LIKE
in a fancy and efficient way
=> The goal is to find the “next character” in line
and query for a range
- This kind of rewrite only works when PostgreSQL
actually knows the next character
- Special operator classes might be needed
=> varchar_pattern_ops, text_pattern_ops
11. Traditional LIKE
- An example:
test=# CREATE INDEX idx_relname
ON t_search (relname);
CREATE INDEX
test=# SET enable_seqscan TO off;
SET
test=# explain SELECT relname
FROM t_search
WHERE relname LIKE 'abc%';
QUERY PLAN
----------------------------------------------------------------------------------
Index Only Scan using idx_relname on t_search (cost=0.27..8.29 rows=1 width=19)
Index Cond: ((relname >= 'abc'::text) AND (relname < 'abd'::text))
Filter: (relname ~~ 'abc%'::text)
(3 rows)
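The Index Cond above turns LIKE 'abc%' into the range 'abc' <= relname < 'abd'. A Python sketch of that rewrite over a sorted list: the upper bound is the prefix with its last character bumped by one. This is a simplification under explicit assumptions: plain code-point ordering (as in the "C" collation), a last character that is not the maximum code point, and a pattern without further wildcards. PostgreSQL's own "next string" logic handles more cases, and other collations are why text_pattern_ops / varchar_pattern_ops may be needed. The helper names are hypothetical.

```python
from bisect import bisect_left

def prefix_range(prefix: str):
    """LIKE 'prefix%' -> prefix <= x < upper (code-point order assumed)."""
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)   # 'abc' -> 'abd'
    return prefix, upper

def like_prefix(sorted_values, prefix):
    """Answer LIKE 'prefix%' with two binary searches on sorted values."""
    low, high = prefix_range(prefix)
    start = bisect_left(sorted_values, low)
    end = bisect_left(sorted_values, high)
    return sorted_values[start:end]

names = sorted(['abba', 'abc', 'abcde', 'abd', 'pg_class', 'abc_x'])
print(prefix_range('abc'))          # ('abc', 'abd')
print(like_prefix(names, 'abc'))    # ['abc', 'abc_x', 'abcde']
```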
12. Indexing MIN / MAX
- An example:
- MIN / MAX works by reading the index from left
and right (backward scan)
test=# explain SELECT min(relname), max(relname) FROM t_search;
QUERY PLAN
----------------------------------------------------------------------------------
Result (cost=0.74..0.75 rows=1 width=0)
InitPlan 1 (returns $0)
-> Limit (cost=0.27..0.37 rows=1 width=19)
-> Index Only Scan using idx_relname on t_search
(cost=0.27..29.57 rows=303 width=19)
Index Cond: (relname IS NOT NULL)
InitPlan 2 (returns $1)
-> Limit (cost=0.27..0.37 rows=1 width=19)
-> Index Only Scan Backward using idx_relname on
t_search t_search_1 (cost=0.27..29.57 rows=303 width=19)
Index Cond: (relname IS NOT NULL)
(9 rows)
Any questions?
Thank you for your attention

PostgreSQL: Advanced indexing

  • 1. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de PostgreSQL Indexing Dublin, 2013 Hans-Jürgen Schönig
  • 2. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de Scope of this session: - What a basic index does - The PostgreSQL optimizer (cost model) - Classical B-tree Indexes - Partial / functional indexes - Different types of indexes - Full-Text-Search - Fuzzy matching - Writing your own indexing strategy
  • 3. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Generating test data: - for the purpose of this session we need a table consisting of two columns: test=# CREATE TABLE t_test (id serial, name text); CREATE TABLE test=# INSERT INTO t_test (name) VALUES ('hans'); INSERT 0 1 test=# INSERT INTO t_test (name) VALUES ('paul'); INSERT 0 1
  • 4. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - A lot more test data ... - Let us create some more test data by repeating the process test=# INSERT INTO t_test (name) SELECT name FROM t_test; INSERT 0 2 ... test=# INSERT INTO t_test (name) SELECT name FROM t_test; INSERT 0 2097152
  • 5. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - A lot more test data ... - Let us create some more test data by repeating the process test=# INSERT INTO t_test (name) SELECT name FROM t_test; INSERT 0 2 ... test=# INSERT INTO t_test (name) SELECT name FROM t_test; INSERT 0 2097152
  • 6. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Reading some data: - Let us see, how PostgreSQL executes a simple query: test=# SELECT count(*) FROM t_test; count --------- 4194304 (1 row) Time: 431.192 ms test=# explain analyze SELECT count(*) FROM t_test; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Aggregate (cost=75100.80..75100.81 rows=1 width=0) (actual time=977.865..977.865 rows=1 loops=1) -> Seq Scan on t_test (cost=0.00..64615.04 rows=4194304 width=0) (actual time=0.013..531.448 rows=4194304 loops=1) Total runtime: 977.917 ms (3 rows) Time: 1045.065 ms
  • 7. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Reading some data: - Let us add a filter: test=# SELECT count(*) FROM t_test WHERE id = 421234; count ------- 1 (1 row) Time: 476.965 ms test=# explain analyze SELECT count(*) FROM t_test WHERE id = 421234; QUERY PLAN ------------------------------------------------------------------------------------------------------------- Aggregate (cost=75100.80..75100.81 rows=1 width=0) (actual time=495.134..495.135 rows=1 loops=1) -> Seq Scan on t_test (cost=0.00..75100.80 rows=1 width=0) (actual time=53.405..495.126 rows=1 loops=1) Filter: (id = 421234) Rows Removed by Filter: 4194303 Total runtime: 495.175 ms (5 rows) Time: 520.659 ms
  • 8. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Sequentially reading data: - In case you like reading the phone book sequentially we are basically done. - Sequentially reading the phone book is technically ok => but socially not accepted - Defining an index is the desired solution
  • 9. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Creating an index test=# h CREATE INDEX Command: CREATE INDEX Description: define a new index Syntax: CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ name ] ON table_name [ USING method ] ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ) [ WITH ( storage_parameter = value [, ... ] ) ] [ TABLESPACE tablespace_name ] [ WHERE predicate ] - At the end of the day all clauses will be covered by this training
  • 10. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - A typical index: test=# CREATE INDEX idx_id ON t_test (id); CREATE INDEX Time: 7357.663 ms - This gives us a standard btree index - PostgreSQL provides “High-Concurrency B-Trees” (Lehman-Yao, 1981) - Many people can modify the index at the same time - Highly efficient B+ tree
  • 11. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - How a btree works: 8k Root Node ... Sorted ... Forward chaining Tabelle Index 8k ... Row linp
  • 12. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Indexing is beneficial test=# explain analyze SELECT count(*) FROM t_test WHERE id = 421234; QUERY PLAN ------------------------------------------------------------------------------ Aggregate (cost=8.73..8.74 rows=1 width=0) (actual time=0.024..0.024 rows=1 loops=1) -> Index Only Scan using idx_id on t_test (cost=0.00..8.73 rows=1 width=0) (actual time=0.019..0.020 rows=1 loops=1) Index Cond: (id = 421234) Heap Fetches: 1 Total runtime: 0.057 ms (5 rows) Time: 0.395 ms - A lot faster :).
  • 13. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Still slow ... test=# SELECT count(*) FROM t_test WHERE name = 'hans'; count --------- 2097152 (1 row) Time: 787.407 ms - This is still slow. Let us create an index ... test=# CREATE INDEX idx_name ON t_test (name); CREATE INDEX
  • 14. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - The benefit is exactly zero: test=# SELECT count(*) FROM t_test WHERE name = 'hans'; count --------- 2097152 (1 row) Time: 782.443 ms test=# explain SELECT count(*) FROM t_test WHERE name = 'hans'; QUERY PLAN ---------------------------------------------------------------------- Aggregate (cost=80350.32..80350.33 rows=1 width=0) -> Seq Scan on t_test (cost=0.00..75100.80 rows=2099808 width=0) Filter: (name = 'hans'::text) (3 rows) - The index won't be used - Too many identical values (“not selective”)
  • 15. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - The cost is far from zero: test=# SELECT pg_size_pretty(pg_relation_size('t_test')); pg_size_pretty ---------------- 177 MB (1 row) test=# SELECT pg_size_pretty(pg_relation_size('idx_id')); pg_size_pretty ---------------- 90 MB (1 row) test=# SELECT pg_size_pretty(pg_relation_size('idx_name')); pg_size_pretty ---------------- 90 MB (1 row) - Indexes need a fair amount of space
  • 16. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Input values DO make a difference: test=# explain SELECT count(*) FROM t_test WHERE name = 'hans'; QUERY PLAN ---------------------------------------------------------------------- Aggregate (cost=80350.32..80350.33 rows=1 width=0) -> Seq Scan on t_test (cost=0.00..75100.80 rows=2099808 width=0) Filter: (name = 'hans'::text) (3 rows) test=# explain SELECT count(*) FROM t_test WHERE name = 'hans2'; QUERY PLAN ---------------------------------------------------------------------------------- Aggregate (cost=7.74..7.75 rows=1 width=0) -> Index Only Scan using idx_name on t_test (cost=0.00..7.74 rows=1 width=0) Index Cond: (name = 'hans2'::text) (3 rows) - PostgreSQL will decide depending on the input value => cost based optimization
  • 17. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Partial indexes: - In our example the index is only used in case of rare or non-existing values - What is the point of an index when its entire content is totally useless? => a more selective strategy is needed
  • 18. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Partial indexes: test=# DROP INDEX idx_name; DROP INDEX test=# CREATE INDEX idx_name ON t_test (name) WHERE name NOT IN ('hans', 'paul'); CREATE INDEX test=# SELECT pg_size_pretty(pg_relation_size('idx_name')); pg_size_pretty ---------------- 8192 bytes (1 row) - A partial index reduces space consumption - Benefit is still the same
  • 19. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Equal benefit – lower cost: test=# explain SELECT count(*) FROM t_test WHERE name = 'hans'; QUERY PLAN ---------------------------------------------------------------------- Aggregate (cost=80350.32..80350.33 rows=1 width=0) -> Seq Scan on t_test (cost=0.00..75100.80 rows=2099808 width=0) Filter: (name = 'hans'::text) (3 rows) test=# explain SELECT count(*) FROM t_test WHERE name = 'hans2'; QUERY PLAN ---------------------------------------------------------------------------------- Aggregate (cost=7.28..7.29 rows=1 width=0) -> Index Only Scan using idx_name on t_test (cost=0.00..7.28 rows=1 width=0) Index Cond: (name = 'hans2'::text) (3 rows) - This is exactly the same as before !
  • 20. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - What about functions? test=# CREATE INDEX idx_cos ON t_test ( cos(id) ); CREATE INDEX Time: 16867.228 ms test=# explain SELECT count(*) FROM t_test WHERE cos(id) = 17; QUERY PLAN ---------------------------------------------------------------------------------- Aggregate (cost=23960.99..23961.00 rows=1 width=0) -> Bitmap Heap Scan on t_test (cost=395.25..23908.56 rows=20972 width=0) Recheck Cond: (cos((id)::double precision) = 17::double precision) -> Bitmap Index Scan on idx_cos (cost=0.00..390.01 rows=20972 width=0) Index Cond: (cos((id)::double precision) = 17::double precision) (5 rows) - PostgreSQL provides functional indexes - VERY nice to avoid additional columns - Gives a lot of extra flexibility
  • 21. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Type of functions allowed - Functions must be deterministic => “immutable” => Functions can be written in almost any language => This is highly performance sensitive
  • 22. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - How does PostgreSQL decide on index vs. no index? - PostgreSQL uses statistics to estimate the number of rows coming back - Each operation will be assigned to costs => costs are just a number to compare different options inside the planner - Costs parameters can be changed at runtime or globally => be careful, it can go against you
  • 23. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - pg_stats is your friend: test=# d pg_stats View "pg_catalog.pg_stats" Column | Type | Modifiers -------------------------------+-----------+----------- schemaname | name | tablename | name | attname | name | inherited | boolean | null_frac | real | avg_width | integer | n_distinct | real | most_common_vals | anyarray | most_common_freqs | real[] | histogram_bounds | anyarray | correlation | real | most_common_elems | anyarray | most_common_elem_freqs | real[] | elem_count_histogram | real[] |
  • 24. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - Updating statistics - System statistics are updated by ANALYZE: test=# h ANALYZE Command: ANALYZE Description: collect statistics about a database Syntax: ANALYZE [ VERBOSE ] [ table_name [ ( column_name [, ...] ) ] ] - In most setups autovacuum is in charge of updating pg_statistic - In most cases statistics are not an issue
  • 25. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - How does PostgreSQL estimate costs? - seq_page_cost = 1 - random_page_cost = 4 - cpu_tuple_cost = 0.01 - cpu_operator_cost = 0.0025 - cpu_index_tuple_cost = 0.005
  • 26. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - Let us do the math (1): test=# explain SELECT count(*) FROM t_test; QUERY PLAN ---------------------------------------------------------------------- Aggregate (cost=75100.80..75100.81 rows=1 width=0) -> Seq Scan on t_test (cost=0.00..64615.04 rows=4194304 width=0) (2 rows) - total costs are at 75100.81 - costs are composed of I/O and CPU costs
  • 27. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - Let us do the math (2): test=# SELECT pg_relation_size('t_test') / 8192; ?column? ---------- 22672 (1 row) - our table consists of 22672 blocks - each block is 8kb in size
  • 28. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - Let us do the math (3): The seq scan: I/O cost = 22672 * seq_page_cost = 22672 4.194.304 * cpu_tuple_cost = 41943.04 = 64615.04 for the seq scan The aggregate: 4.194.304 * cpu_operator_cost = 10485.76 Total costs => 75.100.80 + cpu_operator_cost (we have to display the tuple)
  • 29. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - Inflation at work: test=# SET seq_page_cost TO 10; SET test=# explain SELECT count(*) FROM t_test; QUERY PLAN ----------------------------------------------------------------------- Aggregate (cost=279148.80..279148.81 rows=1 width=0) -> Seq Scan on t_test (cost=0.00..268663.04 rows=4194304 width=0) (2 rows) - Costs can be changed at runtime to fine tune index usage => only do this if you are fully aware of what you are doing. It can have unintended side effects
  • 30. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - Spinning disks vs. SSDs - Traditional disks are fast sequentially and pretty bad when doing random I/O - SSDs fixed the problem. => consider changing random_page_cost
  • 31. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - Abusing tablespaces: test=# ALTER TABLESPACE pg_default SET (random_page_cost = 1); ALTER TABLESPACE - Allows different cost settings for various disk subsystems - It also allows to split “cached” and “uncached” data -> ugly but useful
  • 32. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - Correlation and disk layout test=# CREATE TABLE t_random AS SELECT * FROM t_test ORDER BY random(); SELECT 4194304 test=# CREATE INDEX idx_random ON t_random(id); CREATE INDEX test=# ANALYZE t_random; ANALYZE - The PostgreSQL optimizer considers the physical order of rows on disk - High-correlation will make indexes ways more likely as the optimizer reduces its estimates for I/O costs.
  • 33. 2. The PostgreSQL cost model - Correlation and disk layout:
      test=# explain SELECT count(*) FROM t_test WHERE id < 1000;
                                     QUERY PLAN
      -------------------------------------------------------------------------------
       Aggregate  (cost=75.35..75.36 rows=1 width=0)
         ->  Index Only Scan using idx_id on t_test  (cost=0.00..72.72 rows=1049 width=0)
               Index Cond: (id < 1000)
      (3 rows)
      test=# explain SELECT count(*) FROM t_random WHERE id < 1000;
                                     QUERY PLAN
      -------------------------------------------------------------------------------
       Aggregate  (cost=950.31..950.32 rows=1 width=0)
         ->  Index Only Scan using idx_random on t_random  (cost=0.00..947.94 rows=947 width=0)
               Index Cond: (id < 1000)
      (3 rows)
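A rough Python sketch of why the two plans differ, assuming the approach used in PostgreSQL's cost_index(): the correlation between physical row position and value order (what ANALYZE stores in pg_stats.correlation, computed exactly here instead of from a sample) is squared and used to interpolate between a best-case and a worst-case I/O estimate. The min/max cost inputs below are made-up numbers, not what the planner would compute for t_test or t_random.

```python
import random


def correlation(values):
    """Pearson correlation between physical position and sort rank."""
    n = len(values)
    # Rank of each value in sorted order (stable sort breaks ties by position).
    order = sorted(range(n), key=lambda i: values[i])
    rank = [0] * n
    for r, i in enumerate(order):
        rank[i] = r
    # Positions and ranks are both permutations of 0..n-1: same mean, same variance.
    mean = (n - 1) / 2
    cov = sum((i - mean) * (rank[i] - mean) for i in range(n))
    var = sum((i - mean) ** 2 for i in range(n))
    return cov / var


def index_io_cost(corr, min_io_cost, max_io_cost):
    # Squaring makes perfectly ascending and perfectly descending data equally cheap.
    csquared = corr * corr
    return max_io_cost + csquared * (min_io_cost - max_io_cost)


ordered = list(range(1000))        # like t_test: ids stored in order
shuffled = ordered[:]              # like t_random: ids stored randomly
random.Random(42).shuffle(shuffled)

for name, data in (("ordered", ordered), ("shuffled", shuffled)):
    corr = correlation(data)
    print(name, round(corr, 3), round(index_io_cost(corr, 10.0, 1000.0), 1))
```

The ordered list has a correlation of 1.0, so the estimate collapses to the cheap best case; the shuffled list is close to 0 and stays near the expensive random-I/O case.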
  • 34. 2. The PostgreSQL cost model - Implications:
      - This is why different plans can pop up EVEN if the data is the same
      - There is no fixed amount of data that makes PostgreSQL switch from an index scan to a sequential scan
      - High correlation can improve performance => consider clustering the table
      test=# \h CLUSTER
      Command:     CLUSTER
      Description: cluster a table according to an index
      Syntax:
      CLUSTER [VERBOSE] table_name [ USING index_name ]
      CLUSTER [VERBOSE]
  • 35. 3. Indexing many columns - Using OR / AND:
      - PostgreSQL can use more than one index per table in a single query
      - PostgreSQL provides multi-column indexes
      - What you might see is a so-called “Bitmap Scan”
        => don't mix it up with Oracle bitmap indexes
  • 36. 3. Indexing many columns - Bitmap scans:
      test=# explain SELECT * FROM t_test WHERE id = 2343 OR id = 423423;
                                      QUERY PLAN
      ---------------------------------------------------------------------------
       Bitmap Heap Scan on t_test  (cost=9.44..17.41 rows=2 width=9)
         Recheck Cond: ((id = 2343) OR (id = 423423))
         ->  BitmapOr  (cost=9.44..9.44 rows=2 width=0)
               ->  Bitmap Index Scan on idx_id  (cost=0.00..4.72 rows=1 width=0)
                     Index Cond: (id = 2343)
               ->  Bitmap Index Scan on idx_id  (cost=0.00..4.72 rows=1 width=0)
                     Index Cond: (id = 423423)
      (7 rows)
      - PostgreSQL scans the index twice, once per condition, and ORs the resulting bitmaps
      - PostgreSQL then reads the matching blocks of the underlying table
      - The condition may have to be re-evaluated (“Recheck Cond”): if a bitmap becomes lossy, it only remembers blocks, not individual rows
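A minimal Python sketch of the BitmapOr idea, assuming a toy heap with a hypothetical block size of 4 rows. Each "bitmap index scan" returns only block numbers (so it behaves like a lossy bitmap), the two bitmaps are ORed, every block is visited once in physical order, and each row on it is rechecked against the original condition. All names here are illustrative, not PostgreSQL internals.

```python
# Sketch of BitmapOr + Bitmap Heap Scan with a lossy, block-level bitmap.
ROWS_PER_BLOCK = 4  # hypothetical tiny block size for illustration


def bitmap_index_scan(index, key):
    """Return the set of heap block numbers holding rows with id == key."""
    return {pos // ROWS_PER_BLOCK for pos in index.get(key, [])}


def bitmap_heap_scan(heap, blocks, predicate):
    """Visit blocks in physical order and recheck every row on them."""
    result = []
    for block in sorted(blocks):
        start = block * ROWS_PER_BLOCK
        for row in heap[start:start + ROWS_PER_BLOCK]:
            if predicate(row):  # the "Recheck Cond" step
                result.append(row)
    return result


# Toy table (id, name) and an "index" mapping id -> heap positions.
heap = [(i, "hans" if i % 2 else "paul") for i in range(20)]
index = {}
for pos, (row_id, _) in enumerate(heap):
    index.setdefault(row_id, []).append(pos)

blocks = bitmap_index_scan(index, 3) | bitmap_index_scan(index, 13)  # BitmapOr
rows = bitmap_heap_scan(heap, blocks, lambda r: r[0] == 3 or r[0] == 13)
print(sorted(blocks), rows)  # [0, 3] [(3, 'hans'), (13, 'hans')]
```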
  • 37. 3. Indexing many columns - Bitmap scans:
      test=# explain SELECT * FROM t_test WHERE id = 2343 AND name = 'josef';
                                  QUERY PLAN
      -----------------------------------------------------------------------
       Index Scan using idx_name on t_test  (cost=0.00..8.27 rows=1 width=9)
         Index Cond: (name = 'josef'::text)
         Filter: (id = 2343)
      (3 rows)
      - PostgreSQL does not always use two indexes when you have two quals
      - The more selective index might be enough; the remaining qual is applied as a “Filter”
  • 38. 3. Indexing many columns - Multicolumn indexes:
      test=# DROP INDEX idx_id;
      DROP INDEX
      test=# CREATE INDEX idx_combined ON t_test (id, name);
      CREATE INDEX
      test=# explain SELECT * FROM t_test WHERE id = 10;
                                         QUERY PLAN
      --------------------------------------------------------------------------------
       Index Only Scan using idx_combined on t_test  (cost=0.00..8.91 rows=1 width=9)
         Index Cond: (id = 10)
      (2 rows)
      - PostgreSQL can efficiently use a subset of the indexed columns IF they are the leading column(s) of the index
      - Imagine a phone book: it is just like a combined index on (last name, first name)
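The phone-book analogy as a short Python sketch: a combined index on (last_name, first_name) is modelled as a sorted list of tuples (an assumption for illustration; a real B-tree is paged, not a flat list). A lookup on the leading column can binary-search to a contiguous range, whereas a lookup on the second column alone has to examine every entry. The names are hypothetical.

```python
import bisect

# A combined index on (last_name, first_name): a sorted list of tuples.
phone_book = sorted([
    ("Schmidt", "Anna"), ("Huber", "Paul"), ("Schmidt", "Hans"),
    ("Bauer", "Eva"), ("Huber", "Anna"),
])


def find_by_last_name(book, last):
    """Leading column: binary search, then walk the contiguous range."""
    # (last,) sorts before every (last, first) tuple, so this finds the start.
    i = bisect.bisect_left(book, (last,))
    result = []
    while i < len(book) and book[i][0] == last:
        result.append(book[i])
        i += 1
    return result


def find_by_first_name(book, first):
    """Non-leading column: matches are scattered, so every entry is checked."""
    return [entry for entry in book if entry[1] == first]


print(find_by_last_name(phone_book, "Huber"))   # [('Huber', 'Anna'), ('Huber', 'Paul')]
print(find_by_first_name(phone_book, "Anna"))   # [('Huber', 'Anna'), ('Schmidt', 'Anna')]
```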
  • 39. 3. Indexing many columns - Many indexes or combined indexes?
      - It depends on what you want to query
      - If you always filter on the leading column(s) of the index, a combined index might be a good idea
      - Many single-column indexes are more flexible but maybe not perfect
      - Sometimes a mixed strategy can be useful
  • 40. 4. Indexes to provide order:
      - B-trees can be used for more than searching
      - B-trees provide you with order
      - Order helps to avoid repeated sorting
      test=# explain SELECT * FROM t_test ORDER BY id LIMIT 10;
                                            QUERY PLAN
      --------------------------------------------------------------------------------------
       Limit  (cost=0.00..0.31 rows=10 width=9)
         ->  Index Scan using idx_id on t_test  (cost=0.00..131602.27 rows=4194304 width=9)
      (2 rows)
  • 41. 5. Dealing with upper / lower case:
      - Upper- and lowercase searches are common
      - If you want case-insensitive comparisons, you don't need a functional index (e.g. on lower(name))
      - Consider using “citext”:
      test=# CREATE EXTENSION citext;
      CREATE EXTENSION
      test=# SELECT 'ABC'::citext = 'abc'::citext;
       ?column?
      ----------
       t
      (1 row)
  • 42. 6. Different types of indexes:
      - PostgreSQL supports more than just B-trees
      - B-trees are fine if you are interested in things which can be sorted
      - Try to sort polygons => there is no useful sort order that would help you find them
      - Geometric data and Full-Text-Search need different algorithms
      NOTE: This is not about which index is faster. This is about the correct ALGORITHM
  • 43. 6. Different types of indexes - Index types provided by PostgreSQL:
      - B-Trees
      - GiST: Generalized Search Tree
      - GIN: Generalized Inverted Index
      - SP-GiST: Space-Partitioned GiST
      - Hash
  • 44. 6. Different types of indexes - Indexes and algorithms:
      - B-Trees: numbers, text, dates, etc.
      - GiST: Generalized Search Tree
      - GIN: Generalized Inverted Index
      - SP-GiST: Space-Partitioned GiST
      - Hash
  • 45. 7. GiST indexes:
      - GiST operates on different principles than a B-tree
      - It supports operators such as “contains”, “left of”, “overlaps”, etc.
      - “contains”, etc. are useful for
        => Full Text Search
        => Geometric operations (PostGIS, etc.)
        => Finding genome sequences
        => Handling ranges (time, etc.)
        => Fuzzy search
      - GiST allows KNN search (“k nearest neighbours”, ordered by distance)
  • 46. 7. GiST indexes - How it works internally ...
  • 47. 7. GIN indexes:
      - GIN is a so-called inverted index
      - Used for Full Text Search
      - If you have 1 million documents containing the word “house”, do you really want to store “house” in the index 1 million times?
        => B-tree for words
        => A document list for each word
        => Classical approach to text search
      - FTS is not about “=”, it is about “contains” => forget B-trees
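A minimal Python sketch of the inverted-index idea: each word is stored once and points to a sorted list of document ids, and an “all of these words” search intersects those lists. It assumes plain whitespace tokenization instead of to_tsvector() stemming, and the helper names are hypothetical; PostgreSQL's GIN keeps the words in a B-tree with a posting list per word, which this sketch only approximates with a dict.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to a sorted posting list of document ids."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():  # assumption: no stemming
            postings[word].add(doc_id)
    # Each word is stored once, no matter how many documents contain it.
    return {word: sorted(ids) for word, ids in postings.items()}


def search_all(index, *words):
    """Documents containing every word: intersect the posting lists."""
    lists = [set(index.get(w, [])) for w in words]
    return sorted(set.intersection(*lists)) if lists else []


docs = {
    1: "my house is red",
    2: "a red car",
    3: "the house and the car",
}
index = build_inverted_index(docs)
print(index["house"])                       # [1, 3]
print(search_all(index, "house", "car"))    # [3]
```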
  • 48. 7. GIN indexes - GIN internal workings:
  • 49. 7. SP-GiST indexes:
      - SP-GiST is a space-partitioned index
      - Can be used for a variety of algorithms which use space partitioning
        => quad trees
        => suffix trees
        => k-d trees
  • 50. 7. SP-GiST indexes - Quad trees: a prototype example ...
      - We want to insert ... (6, 4) and (2, 8)
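Since the diagram is not part of the transcript, here is a hedged Python sketch of a point quad tree. It assumes (6, 4) is inserted first and becomes the root, so its point splits the plane into four quadrants; (2, 8) lies north-west of it and becomes that quadrant's child. The class and quadrant names are hypothetical and only illustrate space partitioning, not SP-GiST's on-disk layout.

```python
class QuadNode:
    """A point quad tree node: its point splits the plane into four quadrants."""

    def __init__(self, x, y):
        self.point = (x, y)
        self.children = {}  # quadrant name ("NE", "NW", "SE", "SW") -> QuadNode

    def quadrant(self, x, y):
        # Points on a split line go to the north / east side.
        ns = "N" if y >= self.point[1] else "S"
        ew = "E" if x >= self.point[0] else "W"
        return ns + ew

    def insert(self, x, y):
        q = self.quadrant(x, y)
        if q in self.children:
            self.children[q].insert(x, y)   # descend into the quadrant
        else:
            self.children[q] = QuadNode(x, y)

    def contains(self, x, y):
        if (x, y) == self.point:
            return True
        child = self.children.get(self.quadrant(x, y))
        return child is not None and child.contains(x, y)


root = QuadNode(6, 4)   # first point becomes the root
root.insert(2, 8)       # x < 6 and y >= 4 => north-west quadrant
print(root.children["NW"].point)            # (2, 8)
print(root.contains(2, 8), root.contains(5, 5))  # True False
```

Each lookup only descends into one quadrant per level, which is the property a space-partitioned index exploits.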
  • 51. 8. Full Text Search - Stemming:
      - Before searching, it makes sense to perform “stemming”
      test=# SELECT to_tsvector('english', 'having many cars is better than to have just one car');
                     to_tsvector
      -----------------------------------------
       'better':5 'car':3,11 'mani':2 'one':10
      (1 row)
      - Stop words are dropped, the remaining words are reduced to their stems (“cars” => 'car', “many” => 'mani'), and their positions are kept
  • 52. 8. Full Text Search - Stemming is language dependent:
      - Stemming works nicely for languages written in the Latin alphabet => it is hard to do this for Chinese and so on
      test=# SELECT to_tsvector('english', 'i am'), to_tsvector('german', 'i am'), to_tsvector('dutch', 'i am');
       to_tsvector | to_tsvector | to_tsvector
      -------------+-------------+--------------
                   | 'i':1       | 'am':2 'i':1
      (1 row)
  • 53. 8. Full Text Search - “contains” is your friend:
      - The @@ operator matches a tsvector (the document) against a tsquery (the search condition):
      test=# SELECT to_tsvector('english', 'having many cars is better than to have just one car') @@ to_tsquery('english', 'car');
       ?column?
      ----------
       t
      (1 row)
  • 55. 8. Full Text Search - Indexing is easy:
      - All you need is a functional index
      - Alternatively, the stemmed content can be “materialized” in a separate column
      CREATE INDEX idx_fti ON t_test USING gist (to_tsvector('german', name));
      - Queries must use the same expression, e.g. WHERE to_tsvector('german', name) @@ to_tsquery('german', 'hans'), for the index to be considered
  • 56. 8. Full Text Search - tsvector and tsquery magic:
      - PostgreSQL allows you to use “and” (&) and “or” (|)
      test=# SELECT to_tsvector('english', 'having many cars is better than to have just one car') @@ to_tsquery('english', 'car & truck');
       ?column?
      ----------
       f
      (1 row)
      test=# SELECT to_tsvector('english', 'having many cars is better than to have just one car') @@ to_tsquery('english', '(car | truck) & many');
       ?column?
      ----------
       t
      (1 row)
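How & and | are evaluated can be sketched in Python, assuming the query has already been parsed into a small tree of nested tuples and that stemming has already happened (the lexemes are copied from the to_tsvector output on slide 51, which is why the sketch queries 'mani' rather than “many”). The function name `matches` is hypothetical; PostgreSQL's tsquery also supports ! (NOT) and phrase operators, which this sketch leaves out.

```python
def matches(vector, query):
    """Evaluate a tsquery-like tree against the set of lexemes of a document."""
    if isinstance(query, str):          # a single lexeme
        return query in vector
    op, left, right = query
    if op == "&":
        return matches(vector, left) and matches(vector, right)
    if op == "|":
        return matches(vector, left) or matches(vector, right)
    raise ValueError(f"unknown operator {op!r}")


# Lexemes of 'having many cars is better than to have just one car'.
doc = {"better", "car", "mani", "one"}

print(matches(doc, ("&", "car", "truck")))                 # False
print(matches(doc, ("&", ("|", "car", "truck"), "mani")))  # True
```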
  • 57. 8. Full Text Search - A stupid question: What is a “word”?
      - PostgreSQL is NOT limited to textual search
      - Remember, it is all about “contains” ...
      - Create your own parser:
      test=# \h CREATE TEXT SEARCH PARSER
      Command:     CREATE TEXT SEARCH PARSER
      Description: define a new text search parser
      Syntax:
      CREATE TEXT SEARCH PARSER name (
          START = start_function ,
          GETTOKEN = gettoken_function ,
          END = end_function ,
          LEXTYPES = lextypes_function
          [, HEADLINE = headline_function ]
      )
  • 58. 8. Full Text Search - Even more flexibility (2):
      test=# \h CREATE TEXT SEARCH CONFIGURATION
      Command:     CREATE TEXT SEARCH CONFIGURATION
      Description: define a new text search configuration
      Syntax:
      CREATE TEXT SEARCH CONFIGURATION name (
          PARSER = parser_name |
          COPY = source_config
      )
  • 59. 8. Full Text Search - Even more flexibility:
      test=# \h CREATE TEXT SEARCH DICTIONARY
      Command:     CREATE TEXT SEARCH DICTIONARY
      Description: define a new text search dictionary
      Syntax:
      CREATE TEXT SEARCH DICTIONARY name (
          TEMPLATE = template
          [, option = value [, ... ]]
      )
      test=# \h CREATE TEXT SEARCH TEMPLATE
      Command:     CREATE TEXT SEARCH TEMPLATE
      Description: define a new text search template
      Syntax:
      CREATE TEXT SEARCH TEMPLATE name (
          [ INIT = init_function , ]
          LEXIZE = lexize_function
      )
  • 60. 9. Operator classes - What does it take to organize a B-tree?
        Operator | Strategy number
        ---------+----------------
        <        | 1
        <=       | 2
        =        | 3
        >=       | 4
        >        | 5
  • 61. 9. Operator classes - Why care?
      - The way numbers are treated is pretty “common”
      - How about sorting this one?
        “2305 09 04 78”
        “4353 07 06 77”
        => it seems the sort order is correct as shown
        => it isn't: this is an Austrian social security number, and its last six digits are the date of birth (DD MM YY)
        => 1977 was before 1978 and not the other way round
  • 62. 9. Operator classes - Defining indexing strategies:
      - We can write our own operators
      - Those operators can be assigned to an operator class, which tells the index how to “behave”
        (e.g. sort “2305 09 04 78” and “4353 07 06 77” by date of birth)
  • 63. 9. Operator classes - Writing an operator (1):
      test=# CREATE OR REPLACE FUNCTION normalize_si(text) RETURNS text AS $$
      BEGIN
          -- reorder serial + DDMMYY into YYMMDD + serial, so that plain text comparison sorts by birth date
          RETURN substring($1, 9, 2) || substring($1, 7, 2)
              || substring($1, 5, 2) || substring($1, 1, 4);
      END; $$ LANGUAGE plpgsql IMMUTABLE;
      CREATE FUNCTION
      test=# SELECT normalize_si('2305090478');
       normalize_si
      --------------
       7804092305
      (1 row)
  • 64. 9. Operator classes - Writing an operator (2):
      test=# CREATE OR REPLACE FUNCTION si_lt(text, text) RETURNS boolean AS $$
      BEGIN
          RETURN normalize_si($1) < normalize_si($2);
      END; $$ LANGUAGE plpgsql IMMUTABLE;
      CREATE FUNCTION
      test=# CREATE OPERATOR <# (PROCEDURE = si_lt, LEFTARG = text, RIGHTARG = text);
      CREATE OPERATOR
      test=# SELECT '2305090478'::text <# '4353070677'::text;
       ?column?
      ----------
       f
      (1 row)
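To see why the operator returns f, here is a Python mirror of normalize_si() and si_lt(). It is only a model of the PL/pgSQL functions above (SQL's substring() is 1-based, Python slices are 0-based), not a replacement for them.

```python
def normalize_si(ssn):
    """Mirror of the PL/pgSQL normalize_si(): serial + DDMMYY => YYMMDD + serial."""
    # substring($1, 9, 2) -> ssn[8:10], substring($1, 7, 2) -> ssn[6:8], ...
    return ssn[8:10] + ssn[6:8] + ssn[4:6] + ssn[0:4]


def si_lt(a, b):
    """Mirror of si_lt(), the function behind the <# operator."""
    return normalize_si(a) < normalize_si(b)


numbers = ["2305090478", "4353070677"]
print(normalize_si("2305090478"))                # 7804092305
print(si_lt("2305090478", "4353070677"))         # False: 1978 is after 1977
print(sorted(numbers))                           # plain text order
print(sorted(numbers, key=normalize_si))         # ordered by date of birth
```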
  • 65. 9. Operator classes - Creating the operator class:
      - write operators for all operations needed
      - write “support functions” (e.g. the comparison function of a B-tree, or “same” for GiST)
      - make sure that the most important strategies have proper operators
      test=# \h CREATE OPERATOR CLASS
      Command:     CREATE OPERATOR CLASS
      Description: define a new operator class
      Syntax:
      CREATE OPERATOR CLASS name [ DEFAULT ] FOR TYPE data_type
        USING index_method [ FAMILY family_name ] AS
        {  OPERATOR strategy_number operator_name [ ( op_type, op_type ) ] [ FOR SEARCH | FOR ORDER BY sort_family_name ]
         | FUNCTION support_number [ ( op_type [ , op_type ] ) ] function_name ( argument_type [, ...] )
         | STORAGE storage_type
        } [, ... ]
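What a B-tree operator class has to provide can be modelled in Python: one comparison per strategy number (1 to 5: <, <=, =, >=, >) and support function 1, which returns a negative number, zero or a positive number. In SQL each strategy would be its own operator like <# and the comparison function would be listed as FUNCTION 1 in CREATE OPERATOR CLASS; `si_op` and `si_cmp` are hypothetical names for this sketch, and normalize_si() is re-implemented here so the block runs on its own.

```python
import functools
import operator


def normalize_si(ssn):
    # Mirrors the PL/pgSQL normalize_si(): serial + DDMMYY => YYMMDD + serial.
    return ssn[8:10] + ssn[6:8] + ssn[4:6] + ssn[0:4]


# B-tree strategy numbers => comparisons on the normalized values.
STRATEGIES = {
    1: operator.lt,  # <
    2: operator.le,  # <=
    3: operator.eq,  # =
    4: operator.ge,  # >=
    5: operator.gt,  # >
}


def si_op(strategy, a, b):
    """Apply the operator registered for a strategy number."""
    return STRATEGIES[strategy](normalize_si(a), normalize_si(b))


def si_cmp(a, b):
    """Support function 1 of a B-tree operator class: return <0, 0 or >0."""
    na, nb = normalize_si(a), normalize_si(b)
    return (na > nb) - (na < nb)


numbers = ["2305090478", "4353070677", "1234010180"]
print(sorted(numbers, key=functools.cmp_to_key(si_cmp)))
# ['4353070677', '2305090478', '1234010180']: born 1977, 1978, 1980
```

All five strategies and the comparison function must agree with each other; otherwise the index would order entries differently from how the operators filter them.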
  • 66. 10. Available operator classes - pg_trgm:
      - Trigrams are well suited for fuzzy matching
      - Trigrams can be used nicely along with KNN search
      - pg_trgm is available as an extension to PostgreSQL
      test=# CREATE EXTENSION pg_trgm;
      CREATE EXTENSION
      - Problem: What is the proper way to spell the name of this village: “gramatneusiedl” vs. “grammatneusiedel”?
  • 67. 10. Available operator classes - Testing pg_trgm:
      test=# CREATE TABLE t_search AS SELECT relname::text FROM pg_class;
      SELECT 303
      test=# CREATE INDEX idx_trgm ON t_search USING gist (relname gist_trgm_ops);
      CREATE INDEX
  • 68. 10. Available operator classes - Testing pg_trgm (2):
      test=# SELECT *, 'pgclass' <-> relname FROM t_search ORDER BY 'pgclass' <-> relname LIMIT 10;
                 relname            | ?column?
      ------------------------------+----------
       pg_class                     | 0.454545
       pg_opclass                   | 0.538462
       pg_class_oid_index           | 0.714286
       pg_opclass_oid_index         | 0.727273
       pg_class_relname_nsp_index   | 0.793103
       pg_opclass_am_name_nsp_index |      0.8
       pg_seclabel                  | 0.823529
       pg_am                        | 0.833333
       pg_seclabels                 | 0.833333
       pg_shseclabel                | 0.842105
      (10 rows)
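Where do these numbers come from? A Python sketch of trigram distance, assuming pg_trgm's documented behaviour: the text is lowercased, split into words on non-alphanumeric characters, each word is padded with two leading blanks and one trailing blank, similarity is shared trigrams divided by all distinct trigrams of both strings, and <-> is 1 minus similarity. The sketch only treats ASCII letters and digits as word characters, which is a simplification of pg_trgm's locale-aware rule.

```python
import re


def trigrams(text):
    """Approximate pg_trgm's trigram extraction."""
    result = set()
    # Simplification: ASCII letters/digits form words; everything else separates.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two leading blanks, one trailing blank
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a, b):
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def distance(a, b):
    """The <-> operator: 1 - similarity."""
    return 1.0 - similarity(a, b)


names = ["pg_am", "pg_opclass", "pg_class"]
for name in sorted(names, key=lambda n: distance("pgclass", n)):
    print(f"{name:12} {distance('pgclass', name):.6f}")
# Same order and distances as the psql output: pg_class, pg_opclass, pg_am
```

For example, “pgclass” and “pg_class” share 6 of 11 distinct trigrams, giving 1 - 6/11 = 0.454545.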
  • 69. 10. Available operator classes - KNN in action:
      test=# explain SELECT *, 'pgclass' <-> relname FROM t_search ORDER BY 'pgclass' <-> relname LIMIT 10;
                                          QUERY PLAN
      -----------------------------------------------------------------------------------
       Limit  (cost=0.14..1.40 rows=10 width=19)
         ->  Index Scan using idx_trgm on t_search  (cost=0.14..38.20 rows=303 width=19)
               Order By: (relname <-> 'pgclass'::text)
      (3 rows)
  • 70. 11. Traditional LIKE - LIKE can be indexed in some cases:
      - The PostgreSQL optimizer can rewrite queries featuring LIKE in a clever and efficient way
        => The goal is to find the “next character” in line and query for a range
      - This kind of rewrite only works for patterns anchored at the start (e.g. 'abc%') and only when the “next character” is actually known to PostgreSQL (e.g. in the C locale)
      - Otherwise special operator classes might be needed => varchar_pattern_ops, text_pattern_ops
  • 71. 11. Traditional LIKE - An example:
      test=# CREATE INDEX idx_relname ON t_search (relname);
      CREATE INDEX
      test=# SET enable_seqscan TO off;
      SET
      test=# explain SELECT relname FROM t_search WHERE relname LIKE 'abc%';
                                          QUERY PLAN
      ----------------------------------------------------------------------------------
       Index Only Scan using idx_relname on t_search  (cost=0.27..8.29 rows=1 width=19)
         Index Cond: ((relname >= 'abc'::text) AND (relname < 'abd'::text))
         Filter: (relname ~~ 'abc%'::text)
      (3 rows)
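The “next character” rewrite can be sketched in Python. The sketch assumes a pattern made of a literal prefix followed by a single trailing %, and code-point string comparison (which is how Python compares strings and roughly what the C collation does); `like_prefix_range` is a hypothetical helper, not PostgreSQL's implementation. As the plan shows, PostgreSQL still keeps the original LIKE as a Filter.

```python
def like_prefix_range(pattern):
    """Turn a left-anchored pattern 'abc%' into a half-open range [low, high)."""
    # Assumption: only 'literal%' patterns; no other wildcards or escapes.
    if not pattern.endswith("%") or any(c in "%_\\" for c in pattern[:-1]):
        raise ValueError("only 'literal%' patterns are handled in this sketch")
    prefix = pattern[:-1]
    if not prefix:
        return None  # '%' alone: no usable range
    # The "next character": bump the last character of the prefix.
    high = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, high


low, high = like_prefix_range("abc%")
print(low, high)  # abc abd
names = ["abb", "abc", "abcdef", "abd", "xyz"]
print([n for n in names if low <= n < high])  # ['abc', 'abcdef']
```

Under a linguistic collation the “next string” is not simply the next code point, which is why text_pattern_ops / varchar_pattern_ops exist.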
  • 72. 12. Indexing MIN / MAX - An example:
      - MIN / MAX works by reading the index from the left and from the right (backward scan), i.e. by fetching the first and the last non-NULL entry
      test=# explain SELECT min(relname), max(relname) FROM t_search;
                                          QUERY PLAN
      ----------------------------------------------------------------------------------
       Result  (cost=0.74..0.75 rows=1 width=0)
         InitPlan 1 (returns $0)
           ->  Limit  (cost=0.27..0.37 rows=1 width=19)
                 ->  Index Only Scan using idx_relname on t_search  (cost=0.27..29.57 rows=303 width=19)
                       Index Cond: (relname IS NOT NULL)
         InitPlan 2 (returns $1)
           ->  Limit  (cost=0.27..0.37 rows=1 width=19)
                 ->  Index Only Scan Backward using idx_relname on t_search t_search_1  (cost=0.27..29.57 rows=303 width=19)
                       Index Cond: (relname IS NOT NULL)
      (9 rows)
  • 73. Any questions? Thank you for your attention!