Where’s Waldo? - Text Search and Pattern Matching in PostgreSQL
Joe Conway
joe.conway@crunchydata.com
mail@joeconway.com
Crunchy Data
March 22, 2018
Overview
Overview by Method
Use Cases
Questions
Intro
Agenda
Methods
Sample Data
Where’s Waldo?
Many potential methods
Usually best to use simplest method that fits use case
Might need to combine more than one method
Joe Conway PGConf APAC 2018 2/54
Agenda
Summary of methods
Overview by method
Example use cases
Caveats
Full text search could easily fill a tutorial
⇒ this talk provides overview
Even other methods cannot be covered exhaustively
⇒ this talk provides overview
citext not covered, should be considered
Text Search Methods
Standard Pattern Matching
LIKE operator
SIMILAR TO operator
POSIX-style regular expressions
PostgreSQL extensions
fuzzystrmatch
Soundex
Levenshtein
Metaphone
Double Metaphone
pg_trgm
Full Text Search
Note on Extensions
Extensions used are created as shown below
CREATE EXTENSION pg_trgm;
CREATE EXTENSION fuzzystrmatch;
Sample Data
CREATE TABLE messages (
[...]
_from text NOT NULL,
_to text NOT NULL,
subject text NOT NULL,
bodytxt text NOT NULL,
fti tsvector NOT NULL,
[...]
);
select count(1) from messages;
count
---------
1086568
(1 row)
LIKE Operator
SIMILAR TO Operator
POSIX-style Regular Expressions
Fuzzy
Full Text Search
LIKE Syntax
Expression returns TRUE if string matches pattern
Typically string comes from relation in FROM clause
Used as predicate in the WHERE clause to filter returned rows
LIKE is case sensitive
ILIKE is case insensitive
string LIKE pattern [ESCAPE escape-character]
string ~~ pattern [ESCAPE escape-character]
string ILIKE pattern [ESCAPE escape-character]
string ~~* pattern [ESCAPE escape-character]
lower(string) LIKE pattern [ESCAPE escape-character]
Negating LIKE
To negate match, use the NOT keyword
Appropriate operator also works
string NOT LIKE pattern [ESCAPE escape-character]
string !~~ pattern [ESCAPE escape-character]
string NOT ILIKE pattern [ESCAPE escape-character]
string !~~* pattern [ESCAPE escape-character]
NOT (string LIKE pattern [ESCAPE escape-character])
NOT (string ILIKE pattern [ESCAPE escape-character])
Wildcards
Pattern can contain wildcard characters
Underscore ("_") matches any single character
Percent sign ("%") matches zero or more characters
With no wildcards, expression acts like equals
To match literal wildcard chars, they must be escaped
Default escape char is backslash ("\")
May be changed using ESCAPE clause
Match the literal escape char by doubling up
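The wildcard and escape rules above can be sketched in Python by translating a LIKE pattern into a regular expression. This is only an illustration of the semantics, not how PostgreSQL implements LIKE; the function names like_to_regex and like are hypothetical.

```python
import re

def like_to_regex(pattern: str, escape: str = "\\") -> str:
    """Translate a LIKE pattern into a Python regex string (illustrative only)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if escape and ch == escape:
            # The escape character makes the next character literal;
            # doubling it therefore matches the escape character itself.
            i += 1
            if i >= len(pattern):
                raise ValueError("pattern must not end with the escape character")
            out.append(re.escape(pattern[i]))
        elif ch == "%":
            out.append(".*")   # zero or more characters
        elif ch == "_":
            out.append(".")    # exactly one character
        else:
            out.append(re.escape(ch))
        i += 1
    return "".join(out)

def like(string: str, pattern: str, escape: str = "\\",
         ignore_case: bool = False) -> bool:
    """Return True if the whole string matches the pattern (ILIKE if ignore_case)."""
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.fullmatch(like_to_regex(pattern, escape), string, flags) is not None

# The ESCAPE example from this deck, with '#' as the escape character.
print(like("AbC_%_dEf", "AbC#_#%#_d%", escape="#"))
```

Note that the match is against the whole string, which is why a pattern with no wildcards behaves like equality.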
Alternate Index Op Classes
varchar_pattern_ops, text_pattern_ops and bpchar_pattern_ops
Useful for anchored pattern matching, e.g. "<pattern>%"
Used by LIKE, SIMILAR TO, or POSIX regex when not using "C" locale
Also create "normal" index for queries with <, <=, >, or >=
Does NOT work for ILIKE or ~~*; for those, use one of:
Expression index over lower(column)
pg_trgm index operator class
ESCAPE Example
SELECT 'AbC_%_dEf' LIKE 'AbC#_#%#_d%' ESCAPE '#';
?column?
----------
t
(1 row)
SIMILAR TO Syntax
Works like LIKE: pattern must match the entire string
Interprets pattern using SQL definition of regex
string SIMILAR TO pattern [ESCAPE escape-character]
string NOT SIMILAR TO pattern [ESCAPE escape-character]
Wildcards
Same as LIKE
Also supports meta-characters borrowed from POSIX REs
pipe ("|"): either of two alternatives
asterisk ("*"): repetition >= 0 times
plus ("+"): repetition >= 1 time
question mark ("?"): repetition 0 or 1 time
"{m}": repetition exactly m times
"{m,}": repetition >= m times
"{m,n}": repetition >= m and <= n times
parentheses ("()"): group items into a single logical item
SIMILAR TO Examples
SELECT 'AbCdEf' SIMILAR TO 'AbC%' AS true,
'AbCdEf' SIMILAR TO 'Ab(C|c)%' AS true,
'Abcccdef' SIMILAR TO 'Abc{4}%' AS false,
'Abcccdef' SIMILAR TO 'Abc{3}%' AS true,
'Abccdef' SIMILAR TO 'Abc?d?%' AS true;
true | true | false | true | true
------+------+-------+------+------
t | t | f | t | t
(1 row)
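The results above can be reproduced with a deliberately naive Python sketch: keep the regex-style operators, map the two LIKE wildcards, and require a whole-string match. The helper name similar_to is hypothetical, and the sketch assumes patterns with no ESCAPE clause, no backslash escapes, and no bracket expressions.

```python
import re

def similar_to(string: str, pattern: str) -> bool:
    """Naive SIMILAR TO: whole-string match with % and _ as wildcards."""
    out = []
    for ch in pattern:
        if ch == "%":
            out.append(".*")        # zero or more characters
        elif ch == "_":
            out.append(".")         # exactly one character
        elif ch in ".^$":
            # Ordinary characters in a SIMILAR TO pattern,
            # but special in a Python regex, so escape them.
            out.append("\\" + ch)
        else:
            # | * + ? {m} {m,} {m,n} ( ) pass through unchanged.
            out.append(ch)
    return re.fullmatch("".join(out), string, re.DOTALL) is not None

# The five expressions from the slide, in order.
print(similar_to("AbCdEf", "AbC%"),
      similar_to("AbCdEf", "Ab(C|c)%"),
      similar_to("Abcccdef", "Abc{4}%"),
      similar_to("Abcccdef", "Abc{3}%"),
      similar_to("Abccdef", "Abc?d?%"))
```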
Regular Expression Syntax
Similar to LIKE and ILIKE
Allowed to match anywhere within string
⇒ unless RE is explicitly anchored
Interprets pattern using POSIX definition of regex
string ~ pattern -- matches RE, case sensitive
string ~* pattern -- matches RE, case insensitive
string !~ pattern -- not matches RE, case sensitive
string !~* pattern -- not matches RE, case insensitive
Regular Expression Syntax
POSIX-style REs complex enough to deserve own talk
See: www.postgresql.org/docs/9.5/static/functions-matching.html#FUNCTIONS-POSIX-REGEXP
SELECT 'AbCdefzzzzdef' ~* 'Ab((C|c).*)?z+def.*' AS true,
'AbcabcAbc' ~ '^Ab.*bc$' AS true,
'AbcabcAbc' ~ '^Ab' AS true,
'AbcAbcAbc' ~* 'abc' AS true,
'AbcAbcAbc' ~* '^abc$' AS false;
true | true | true | true | false
------+------+------+------+-------
t | t | t | t | f
(1 row)
Regular Expression Example
Really slow without an index
EXPLAIN ANALYZE SELECT date FROM messages
WHERE bodytxt ~* 'multixact';
QUERY PLAN
-------------------------------------------------------------------
Seq Scan on messages
(cost=0.00..197436.10 rows=108 width=8)
(actual time=6.435..26851.944 rows=2580 loops=1)
Filter: (bodytxt ~* 'multixact'::text)
Rows Removed by Filter: 1083988
Planning time: 1.682 ms
Execution time: 26852.410 ms
Regular Expression Example
Use trigram GIN index
CREATE INDEX trgm_gin_bodytxt_idx
ON messages USING gin (bodytxt using gin_trgm_ops);
EXPLAIN ANALYZE SELECT date FROM messages
WHERE bodytxt ~* 'multixact';
QUERY PLAN
-------------------------------------------------------------------
Bitmap Heap Scan on messages
[...]
-> Bitmap Index Scan on trgm_gin_bodytxt_idx
(cost=0.00..124.81 rows=108 width=0)
(actual time=66.095..66.095 rows=2581 loops=1)
Index Cond: (bodytxt ~* 'multixact'::text)
Planning time: 3.680 ms
Execution time: 192.912 ms
Regular Expression Example
Or use trigram GiST index . . . oops
CREATE INDEX trgm_gist_bodytxt_idx
ON messages USING gist (bodytxt using gist_trgm_ops);
ERROR: index row requires 8672 bytes, maximum size is 8191
Regular Expression Compared to FTS
For the sake of comparison - with full text search
EXPLAIN ANALYZE SELECT date FROM messages
WHERE fti @@ 'multixact:D';
QUERY PLAN
-------------------------------------------------------------------
Bitmap Heap Scan on messages
[...]
-> Bitmap Index Scan on messages_fti_idx
(cost=0.00..64.75 rows=5433 width=0)
(actual time=1.085..1.085 rows=1475 loops=1)
Index Cond: (fti @@ '''multixact'':D'::tsquery)
Planning time: 0.504 ms
Execution time: 22.054 ms
Soundex
soundex: converts string to four character code
difference: converts two strings, reports # matching positions
Generally finds similarity of English names
Part of fuzzystrmatch extension
SELECT soundex('Joseph'), soundex('Josef'),
difference('Joseph', 'Josef');
soundex | soundex | difference
---------+---------+------------
J210 | J210 | 4
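To show where a code such as J210 comes from, here is a minimal Python sketch of a simplified Soundex in which vowels, H, W and Y carry no digit and act as separators. It is an approximation for illustration; fuzzystrmatch may differ in edge cases such as non-alphabetic input. The function names mirror the SQL ones but are otherwise hypothetical.

```python
# Letter groups that share a digit; every other letter gets no digit ("0").
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit

def soundex(text: str) -> str:
    """Return a four-character code: first letter plus up to three digits."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = CODES.get(letters[0], "0")
    for c in letters[1:]:
        digit = CODES.get(c, "0")
        # Emit a digit only when it differs from the previous letter's digit.
        if digit != prev and digit != "0":
            code += digit
        prev = digit
        if len(code) == 4:
            break
    return code.ljust(4, "0")   # pad short codes with zeros

def difference(a: str, b: str) -> int:
    """Count the positions (0 to 4) at which the two codes agree."""
    return sum(x == y for x, y in zip(soundex(a), soundex(b)))

print(soundex("Joseph"), soundex("Josef"), difference("Joseph", "Josef"))
```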
Levenshtein
Calculates Levenshtein distance between two strings
Comparisons case sensitive
Strings non-null, maximum 255 bytes
Part of fuzzystrmatch extension
SELECT levenshtein('Joseph','Josef') AS two,
levenshtein('John','Joan') AS one,
levenshtein('foo','foo') AS zero;
two | one | zero
-----+-----+------
2 | 1 | 0
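The distances above follow from the usual dynamic-programming definition of Levenshtein distance: the minimum number of single-character insertions, deletions and substitutions. A minimal Python sketch, with unit cost for each operation and no 255-byte limit:

```python
def levenshtein(source: str, target: str) -> int:
    """Edit distance with unit costs, computed one row at a time."""
    # previous[j] = distance between the processed part of source and target[:j]
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]   # turning source[:i] into "" takes i deletions
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(previous[j] + 1,          # delete s_char
                               current[j - 1] + 1,       # insert t_char
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]

print(levenshtein("Joseph", "Josef"),
      levenshtein("John", "Joan"),
      levenshtein("foo", "foo"))
```

Because characters are compared exactly, 'FOO' and 'foo' are three edits apart, which matches the case-sensitive behavior noted on the slide.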
Metaphone
Constructs code for an input string
Comparisons case-insensitive
Strings non-null, maximum 255 bytes
max_output_length arg sets max length of code
Part of fuzzystrmatch extension
SELECT metaphone('extensive',6) AS "EKSTNS",
metaphone('exhaustive',6) AS "EKSHST",
metaphone('ExTensive',3) AS "EKS",
metaphone('eXhaustivE',3) AS "EKS";
EKSTNS | EKSHST | EKS | EKS
--------+--------+-----+-----
EKSTNS | EKSHST | EKS | EKS
Double Metaphone
Computes primary and alternate codes for string
For non-English names especially, the two codes can differ
Comparisons case-insensitive
No length limit on the input strings
Part of fuzzystrmatch extension
SELECT dmetaphone('extensive') AS "AKST",
dmetaphone('exhaustive') AS "AKSS",
dmetaphone('Magnus') AS "MNS",
dmetaphone_alt('Magnus') AS "MKNS";
AKST | AKSS | MNS | MKNS
------+------+-----+------
AKST | AKSS | MNS | MKNS
Trigram Matching
Functions and operators for determining similarity
Trigram is group of three consecutive characters from string
Similarity of two strings - count number of trigrams shared
Index operator classes supporting fast similar strings search
Support indexed searches for LIKE and ILIKE queries
Comparisons case-insensitive
Part of pg_trgm extension
Trigram Matching Example
\timing
SELECT set_limit(0.6); -- defaults to 0.3
SELECT DISTINCT _from, -- uses trgm_gist_idx
similarity(_from, 'Josef Konway <mail@joeconway.com>') AS sml
FROM messages WHERE _from % 'Josef Konway <mail@joeconway.com>'
ORDER BY sml DESC, _from;
_from | sml
------------------------------------+----------
Joseph Conway <mail@joeconway.com> | 0.724138
Joe Conway <mail@joeconway.com> | 0.703704
jconway <mail@joeconway.com> | 0.678571
"Joe Conway" <joe.conway@mail.com> | 0.62963
(4 rows)
Time: 502.002 ms
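The similarity scores can be understood with a small Python sketch of trigram matching. It assumes the commonly documented pg_trgm rules: lower-case the text, split it into alphanumeric words, pad each word with two leading spaces and one trailing space, then divide the number of shared trigrams by the number of distinct trigrams in either string. It handles ASCII only and is an illustration, not the extension's code.

```python
import re

def trigrams(text: str) -> set:
    """Return the set of padded, lower-cased trigrams of every word in text."""
    result = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result

def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

query = "Josef Konway <mail@joeconway.com>"
for candidate in ("Joseph Conway <mail@joeconway.com>",
                  "Joe Conway <mail@joeconway.com>"):
    print(candidate, round(similarity(candidate, query), 6))
```

Under these assumptions the first candidate shares 21 of 29 distinct trigrams with the query and the second 19 of 27, which agrees with the 0.724138 and 0.703704 shown above.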
Full Text Search Overview
Searches documents with potentially complex criteria
Superior to other methods in many cases because:
Offers linguistic support for derived words
Ignores stop words
Ranks results by relevance
Very flexibly uses indexes
Topic very complex - see:
http://www.postgresql.org/docs/9.5/static/textsearch.html
http://www.postgresql.org/docs/9.5/static/datatype-textsearch.html
http://www.postgresql.org/docs/9.5/static/functions-textsearch.html
http://www.postgresql.org/docs/9.5/static/textsearch-indexes.html
Preprocessing
Convert text to tsvector
Store tsvector
Index tsvector
CREATE FUNCTION messages_fti_trigger_func()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN NEW.fti =
setweight(to_tsvector(coalesce(NEW.subject, '')), 'A') ||
setweight(to_tsvector(coalesce(NEW.bodytxt, '')), 'D');
RETURN NEW; END $$;
CREATE TRIGGER messages_fti_trigger BEFORE INSERT OR UPDATE
OF subject, bodytxt ON messages FOR EACH ROW
EXECUTE PROCEDURE messages_fti_trigger_func();
CREATE INDEX messages_fti_idx ON messages USING gin (fti);
Weighting
Weights used in relevance ranking
Array specifies how heavily to weigh each category
{D-weight, C-weight, B-weight, A-weight}
defaults: {0.1, 0.2, 0.4, 1.0}
Creating tsvector
Parse into tokens
Classes of tokens can be processed differently
Postgres has standard parser and predefined set of classes
Custom parsers can be created
Convert tokens into lexemes
Dictionaries used for this step
⇒ standard dictionaries provided
⇒ custom ones can be created
Normalized: different forms of same word made alike
⇒ fold upper-case letters to lower-case
⇒ removal of suffixes
⇒ elimination of stop words
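The pipeline above (parse, fold case, drop stop words, strip suffixes) can be sketched as a toy in Python. Everything here is an assumption made for illustration: the stop-word list is tiny, toy_stem only removes a plural "s", and the real parser and dictionaries are far more capable.

```python
import re
from collections import defaultdict

STOP_WORDS = {"a", "an", "and", "are", "it", "on", "the"}  # tiny illustrative list

def toy_stem(token: str) -> str:
    """Very crude suffix removal; a real stemmer is far more careful."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token

def toy_tsvector(document: str) -> dict:
    """Map each lexeme to the list of word positions where it occurs."""
    positions = defaultdict(list)
    # Parse into tokens and fold upper-case letters to lower-case.
    tokens = re.findall(r"[a-z0-9]+", document.lower())
    for position, token in enumerate(tokens, start=1):
        if token in STOP_WORDS:
            continue   # dropped, but its position is still counted
        positions[toy_stem(token)].append(position)
    return dict(sorted(positions.items()))

print(toy_tsvector("a fat cat sat on a mat - it ate a fat rats"))
```

Keeping positions alongside lexemes is what later allows proximity-aware ranking.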
Writing tsquery
The pattern to be matched
Lexemes combined with boolean operators
& (AND)
| (OR)
! (NOT)
! (NOT) binds most tightly
& (AND) binds more tightly than | (OR)
Parentheses used to enforce grouping
Label with * to specify prefix matching
Supports weight labels
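The operator precedence described above can be sketched with a small recursive-descent evaluator in Python that checks a boolean query against a set of lexemes. It is a simplified model: weight labels and prefix matching are ignored, and the function names are hypothetical.

```python
import re

TOKEN = re.compile(r"\s*([&|!()]|[a-z0-9_]+)")

def tokenize(query: str) -> list:
    """Split a query into operators, parentheses and lexemes."""
    query = query.strip().lower()
    tokens, pos = [], 0
    while pos < len(query):
        match = TOKEN.match(query, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def matches(query: str, lexemes: set) -> bool:
    """Evaluate the query: ! binds tightest, then &, then |."""
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        if token is None:
            raise ValueError("unexpected end of query")
        pos += 1
        return token

    def parse_or():            # lowest precedence
        value = parse_and()
        while peek() == "|":
            take()
            right = parse_and()
            value = value or right
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&":
            take()
            right = parse_not()
            value = value and right
        return value

    def parse_not():           # highest precedence, plus grouping and lexemes
        token = take()
        if token == "!":
            return not parse_not()
        if token == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if token in ("&", "|", ")"):
            raise ValueError(f"unexpected {token!r}")
        return token in lexemes

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected {peek()!r}")
    return result

print(matches("a | b & c", {"a"}))      # parsed as a | (b & c)
print(matches("(a | b) & c", {"a"}))    # parentheses change the grouping
```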
Writing tsquery
to_tsquery() to create
Normalizes tokens to lexemes
Discards stop words
Can also cast to tsquery
Tokens taken at face value
Weight labels, if written, are kept as given (the cast example below omits them)
SELECT to_tsquery('(hello:A & the:A) | writing:D') AS to_tsq,
'(hello & the) | writing'::tsquery AS cast_tsq
UNION ALL
SELECT to_tsquery('postgres:*'),
'postgres:*'::tsquery;
        to_tsq         |          cast_tsq
-----------------------+-----------------------------
 'hello':A | 'write':D | 'hello' & 'the' | 'writing'
 'postgr':*            | 'postgres':*
Writing tsquery
Alternative function plainto_tsquery()
Text parsed and normalized
& (AND) operator inserted between surviving words
Should use simple strings only
No boolean operators,
No weight labels,
No prefix-match labels
SELECT plainto_tsquery('(hello:B & the:B) | writing:D') AS tsq1,
plainto_tsquery('postgres:*') AS tsq2;
tsq1 | tsq2
-------------------------------------+----------
'hello' & 'b' & 'b' & 'write' & 'd' | 'postgr'
Match Operator
Text search match operator @@
Returns true if tsvector (preprocessed document)
matches tsquery (search pattern)
Either may be written first
SELECT split_part(_from, '<', 1) AS name, date
FROM messages
WHERE fti @@ 'multixact:A & race:D & bug:D';
name | date
-----------------+------------------------
Alvaro Herrera | 2013-11-25 07:36:19-08
Andres Freund | 2013-11-25 08:26:55-08
Andres Freund | 2013-11-29 11:58:06-08
Relevance Ranking
ts_rank(): based on frequency of matching lexemes
ts_rank_cd(): lexeme proximity taken into consideration
WITH ts(q) AS
(
SELECT 'multixact:A & (crash:D | (data:D & loss:D))'::tsquery
)
SELECT ts_rank(m.fti, ts.q) as tsrank
FROM messages m, ts
WHERE m.fti @@ ts.q
ORDER BY tsrank DESC LIMIT 4;
tsrank
----------
0.999997
0.999997
0.999997
0.999997
Highlighting
ts_headline(): returns excerpt with query terms highlighted
Apply in an outer query, after inner query LIMIT
⇒ avoids ts_headline() overhead on eliminated rows
SELECT subject, tsrank, ts_headline(format('%s: %s', subject, bodytxt), q)
FROM (WITH ts(q) AS
(SELECT 'multixact:A & (crash:D | (data:D & loss:D))'::tsquery)
SELECT ts_rank(m.fti, ts.q) as tsrank, ts.q, m.subject, m.bodytxt
FROM messages m, ts WHERE m.fti @@ ts.q ORDER BY tsrank DESC LIMIT 4
) AS inner_query LIMIT 1;
-[ RECORD 1 ]--------------------------------------------------------------
subject | Is anyone aware of data loss causing MultiXact bugs in 9.3.2?
tsrank | 0.999997
ts_headline | <b>data</b> <b>loss</b> causing <b>MultiXact</b>
bugs in 9.3.2?: I've had multiple complaints
of apparent <b>data</b>
Summary
Equal
Anchored
Fuzzy
Complex
Pattern Matching: Example Use Cases
Equal
Anchored
Anchored case-insensitive
Reverse Anchored case-insensitive
Unanchored case-insensitive
Fuzzy
Complex Search with Relevancy Ranking
Equal
Find all the rows where column matches '<pattern>'
Equal operator with suitable index is best
Without an index
EXPLAIN ANALYZE SELECT date FROM messages
WHERE _from = 'Joseph Conway <mail@joeconway.com>';
QUERY PLAN
-------------------------------------------------------------------
Seq Scan on messages
(cost=0.00..197436.10 rows=61 width=8)
(actual time=49.192..527.343 rows=14 loops=1)
Filter: (_from = 'Joseph Conway <mail@joeconway.com>'::text)
Rows Removed by Filter: 1086554
Planning time: 0.256 ms
Execution time: 527.386 ms
Equal
With an index
CREATE INDEX from_idx ON messages(_from);
EXPLAIN ANALYZE SELECT date FROM messages
WHERE _from = 'Joseph Conway <mail@joeconway.com>';
QUERY PLAN
-------------------------------------------------------------------
Bitmap Heap Scan on messages
[...]
-> Bitmap Index Scan on from_idx
(cost=0.00..4.88 rows=61 width=0)
(actual time=0.051..0.051 rows=14 loops=1)
Index Cond: (_from = 'Joseph Conway <mail@joeconway.com>'::text)
Planning time: 0.267 ms
Execution time: 0.161 ms
Anchored
Find all the rows where column matches '<pattern>%'
LIKE operator with suitable index is best
This index does not do the job
CREATE INDEX from_idx ON messages(_from);
EXPLAIN ANALYZE SELECT date FROM messages
WHERE _from LIKE 'Joseph Conway%';
QUERY PLAN
-------------------------------------------------------------------
Seq Scan on messages
(cost=0.00..197436.10 rows=62 width=8)
(actual time=52.991..536.316 rows=14 loops=1)
Filter: (_from ~~ 'Joseph Conway%'::text)
Rows Removed by Filter: 1086554
Planning time: 0.264 ms
Execution time: 536.362 ms
Anchored
Note text_pattern_ops - this works
CREATE INDEX pattern_idx ON messages(_from using text_pattern_ops);
EXPLAIN ANALYZE SELECT date FROM messages
WHERE _from LIKE 'Joseph Conway%';
QUERY PLAN
-------------------------------------------------------------------
Index Scan using pattern_idx on messages
(cost=0.43..8.45 rows=62 width=8)
(actual time=0.043..0.082 rows=14 loops=1)
Index Cond: ((_from ~>=~ 'Joseph Conway'::text)
AND (_from ~<~ 'Joseph Conwaz'::text))
Filter: (_from ~~ 'Joseph Conway%'::text)
Planning time: 0.490 ms
Execution time: 0.133 ms
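The Index Cond in the plan above shows the key idea: an anchored pattern is rewritten as a range over character-by-character ordered values. A minimal Python sketch of that idea, using a sorted list and bisect in place of a B-tree (the helper names are hypothetical, and the sketch does not handle a prefix ending in the highest possible character):

```python
import bisect

def prefix_range(prefix: str) -> tuple:
    """Return (low, high) with low <= s < high for every s starting with prefix."""
    if not prefix:
        raise ValueError("an empty prefix matches everything")
    # Bumping the final character gives the smallest string that sorts
    # after every string beginning with the prefix.
    return prefix, prefix[:-1] + chr(ord(prefix[-1]) + 1)

def starts_with(sorted_values: list, prefix: str) -> list:
    """Find all values with the prefix using two binary searches."""
    low, high = prefix_range(prefix)
    start = bisect.bisect_left(sorted_values, low)
    stop = bisect.bisect_left(sorted_values, high)
    return sorted_values[start:stop]

senders = sorted([
    "Joe Conway <mail@joeconway.com>",
    "Josef",
    "Joseph Conway",
    "Joseph Conway <mail@joeconway.com>",
    "Joseph Conwaz",
])
print(prefix_range("Joseph Conway"))
print(starts_with(senders, "Joseph Conway"))
```

The range trick only works when the index order is plain character order, which is why the ordinary index from the previous slide did not help under a non-C locale.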
Anchored Case-Insensitive
Find all the rows where column matches '<pattern>%'
⇒ but in Case-Insensitive way
LIKE operator with suitable expression index is good
CREATE INDEX lower_pattern_idx ON messages(lower(_from) using text_pattern_ops);
EXPLAIN ANALYZE SELECT date FROM messages
WHERE lower(_from) LIKE 'joseph conway%';
QUERY PLAN
-------------------------------------------------------------------
Bitmap Heap Scan on messages [...]
-> Bitmap Index Scan on lower_pattern_idx
(cost=0.00..214.76 rows=5433 width=0)
(actual time=0.074..0.074 rows=14 loops=1)
Index Cond: ((lower(_from) ~>=~ 'joseph conway'::text)
AND (lower(_from) ~<~ 'joseph conwaz'::text))
Planning time: 0.505 ms
Execution time: 0.258 ms
Anchored Case-Insensitive
Can also use trigram GIN index with ILIKE
CREATE INDEX trgm_gin_idx
ON messages USING gin (_from using gin_trgm_ops);
EXPLAIN ANALYZE SELECT date FROM messages
WHERE _from ILIKE 'joseph conway%';
QUERY PLAN
-------------------------------------------------------------------
Bitmap Heap Scan on messages
[...]
-> Bitmap Index Scan on trgm_gin_idx
(cost=0.00..176.46 rows=62 width=0)
(actual time=92.980..92.980 rows=155 loops=1)
Index Cond: (_from ~~* 'joseph conway%'::text)
Planning time: 0.857 ms
Execution time: 93.473 ms
Anchored Case-Insensitive
Or a trigram GiST index with ILIKE
CREATE INDEX trgm_gist_idx
ON messages USING gist (_from using gist_trgm_ops);
EXPLAIN ANALYZE SELECT date FROM messages
WHERE _from ILIKE 'joseph conway%';
QUERY PLAN
-------------------------------------------------------------------
Bitmap Heap Scan on messages
[...]
-> Bitmap Index Scan on trgm_gist_idx
(cost=0.00..8.88 rows=62 width=0)
(actual time=53.080..53.080 rows=155 loops=1)
Index Cond: (_from ~~* 'joseph conway%'::text)
Planning time: 1.068 ms
Execution time: 53.604 ms
Reverse Anchored Case-Insensitive
Find all the rows where column matches '%<pattern>'
⇒ but in Case-Insensitive way
LIKE operator with suitable expression index is good
CREATE INDEX rev_lower_pattern_idx ON messages(lower(reverse(_from)) using text_pattern_ops);
EXPLAIN ANALYZE SELECT date FROM messages WHERE lower(reverse(_from))
LIKE reverse('%joeconway.com>');
QUERY PLAN
-------------------------------------------------------------------
Bitmap Heap Scan on messages [...]
-> Bitmap Index Scan on rev_lower_pattern_idx
(cost=0.00..214.76 rows=5433 width=0)
(actual time=1.357..1.357 rows=2749 loops=1)
Index Cond: ((lower(reverse(_from)) ~>=~ '>moc.yawnoceoj'::text)
AND (lower(reverse(_from)) ~<~ '>moc.yawnoceok'::text))
Planning time: 0.278 ms
Execution time: 17.491 ms
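The trick in this plan is that reversing both sides turns a suffix match into a prefix match, which an anchored-pattern index can then serve. A short Python sketch of the transformation (function names are hypothetical; only patterns of the form '%' followed by a literal suffix are accepted):

```python
def suffix_to_prefix_pattern(like_pattern: str) -> str:
    """Turn '%<suffix>' into the equivalent prefix pattern over reversed strings."""
    body = like_pattern[1:]
    if not like_pattern.startswith("%") or "%" in body or "_" in body:
        raise ValueError("expected a pattern of the form '%<literal suffix>'")
    return body[::-1] + "%"

def ends_with_ci(value: str, suffix: str) -> bool:
    """Case-insensitive suffix test, phrased as a prefix test on reversed text."""
    return value[::-1].lower().startswith(suffix[::-1].lower())

print(suffix_to_prefix_pattern("%joeconway.com>"))
print(ends_with_ci("Joe Conway <mail@JoeConway.com>", "joeconway.com>"))
```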
Reverse Anchored Case-Insensitive
Can also use trigram GIN index with ILIKE
CREATE INDEX trgm_gin_idx
ON messages USING gin (_from using gin_trgm_ops);
EXPLAIN ANALYZE SELECT date FROM messages
WHERE _from ILIKE '%joeconway.com>';
QUERY PLAN
-------------------------------------------------------------------
Bitmap Heap Scan on messages
[...]
-> Bitmap Index Scan on trgm_gin_idx
(cost=0.00..177.58 rows=2344 width=0)
(actual time=80.537..80.537 rows=2749 loops=1)
Index Cond: (_from ~~* '%joeconway.com>'::text)
Planning time: 0.915 ms
Execution time: 88.723 ms
Reverse Anchored Case-Insensitive
Or a trigram GiST index with ILIKE
CREATE INDEX trgm_gist_idx
ON messages USING gist (_from using gist_trgm_ops);
EXPLAIN ANALYZE SELECT date FROM messages
WHERE _from ILIKE '%joeconway.com>';
QUERY PLAN
-------------------------------------------------------------------
Bitmap Heap Scan on messages
[...]
-> Bitmap Index Scan on trgm_gist_idx
(cost=0.00..193.99 rows=2344 width=0)
(actual time=58.386..58.386 rows=2749 loops=1)
Index Cond: (_from ~~* '%joeconway.com>'::text)
Planning time: 0.921 ms
Execution time: 66.771 ms
Unanchored Case-Insensitive
Find all the rows where column matches '%<pattern>%'
⇒ but in Case-Insensitive way
This cannot use expression or pattern ops index
EXPLAIN ANALYZE SELECT date FROM messages
WHERE _from ILIKE '%Conway%';
QUERY PLAN
-------------------------------------------------------------------
Seq Scan on messages
(cost=0.00..197436.10 rows=5096 width=8)
(actual time=2.242..2002.998 rows=7402 loops=1)
Filter: (_from ~~* '%Conway%'::text)
Rows Removed by Filter: 1079166
Planning time: 0.860 ms
Execution time: 2003.667 ms
Unanchored Case-Insensitive
Use trigram GIN index with ILIKE
CREATE INDEX trgm_gin_idx
ON messages USING gin (_from using gin_trgm_ops);
EXPLAIN ANALYZE SELECT date FROM messages
WHERE _from ILIKE '%Conway%';
QUERY PLAN
-------------------------------------------------------------------
Bitmap Heap Scan on messages
[...]
-> Bitmap Index Scan on trgm_gin_idx
(cost=0.00..94.22 rows=5096 width=0)
(actual time=9.060..9.060 rows=7402 loops=1)
Index Cond: (_from ~~* '%Conway%'::text)
Planning time: 0.915 ms
Execution time: 30.567 ms
Unanchored Case-Insensitive
Or a trigram GiST index with ILIKE
CREATE INDEX trgm_gist_idx
ON messages USING gist (_from using gist_trgm_ops);
EXPLAIN ANALYZE SELECT date FROM messages
WHERE _from ILIKE '%Conway%';
QUERY PLAN
-------------------------------------------------------------------
Bitmap Heap Scan on messages
[...]
-> Bitmap Index Scan on trgm_gist_idx
(cost=0.00..422.63 rows=5096 width=0)
(actual time=128.881..128.881 rows=7402 loops=1)
Index Cond: (_from ~~* '%Conway%'::text)
Planning time: 0.871 ms
Execution time: 149.755 ms
Fuzzy
Find all the rows where column matches '<pattern>' ⇒ but in an inexact way
Use dmetaphone function with an expression index
Might also use Soundex, Levenshtein, Metaphone, or pg_trgm
CREATE INDEX dmet_expr_idx ON messages(dmetaphone(_from));
EXPLAIN ANALYZE SELECT _from FROM messages
WHERE dmetaphone(_from) = dmetaphone('josef konwei');
QUERY PLAN
-------------------------------------------------------------------
Bitmap Heap Scan on messages [...]
-> Bitmap Index Scan on dmet_expr_idx
(cost=0.00..101.17 rows=5433 width=0)
(actual time=0.085..0.085 rows=108 loops=1)
Index Cond: (dmetaphone(_from) = 'JSFK'::text)
Planning time: 0.272 ms
Execution time: 0.445 ms
Complex Requirements
Full Text Search
Complex multi-word searching
Relevancy Ranking
EXPLAIN ANALYZE SELECT date FROM messages
WHERE fti @@ 'bug:A & deadlock:D & startup:D';
QUERY PLAN
-------------------------------------------------------------------
Bitmap Heap Scan on messages
[...]
-> Bitmap Index Scan on messages_fti_idx
(cost=0.00..52.02 rows=2 width=0)
(actual time=9.261..9.261 rows=93 loops=1)
Index Cond: (fti @@ '''bug'':A & ''deadlock'':D &
''startup'':D'::tsquery)
Planning time: 0.469 ms
Execution time: 12.614 ms
Questions?
Thank You!
mail@joeconway.com
Joe Conway PGConf APAC 2018 54/54

More Related Content

Similar to PGConf APAC 2018 - Where's Waldo - Text Search and Pattern in PostgreSQL

Mining Code Examples with Descriptive Text from Software Artifacts
PGConf APAC 2018 - Where's Waldo - Text Search and Pattern in PostgreSQL

  • 1. Where’s Waldo? - Text Search and Pattern Matching in PostgreSQL
    Joe Conway
    joe.conway@crunchydata.com
    mail@joeconway.com
    Crunchy Data
    March 22, 2018
  • 2. Where’s Waldo?
    Many potential methods
    Usually best to use the simplest method that fits the use case
    Might need to combine more than one method
  • 3. Agenda
    Summary of methods
    Overview by method
    Example use cases
  • 4. Caveats
    Full text search could easily fill a tutorial
      ⇒ this talk provides an overview
    Even the other methods cannot be covered exhaustively
      ⇒ this talk provides an overview
    citext is not covered, but should be considered
  • 5. Text Search Methods
    Standard Pattern Matching
      LIKE operator
      SIMILAR TO operator
      POSIX-style regular expressions
    PostgreSQL extensions
      fuzzystrmatch
        Soundex
        Levenshtein
        Metaphone
        Double Metaphone
      pg_trgm
    Full Text Search
  • 6. Note on Extensions
    Extensions used are created as shown below
      CREATE EXTENSION pg_trgm;
      CREATE EXTENSION fuzzystrmatch;
  • 7. Sample Data
      CREATE TABLE messages (
        [...]
        _from text NOT NULL,
        _to text NOT NULL,
        subject text NOT NULL,
        bodytxt text NOT NULL,
        fti tsvector NOT NULL,
        [...]
      );

      select count(1) from messages;
        count
      ---------
       1086568
      (1 row)
  • 8. LIKE Syntax
    Expression returns TRUE if string matches pattern
    Typically string comes from a relation in the FROM clause
    Used as a predicate in the WHERE clause to filter returned rows
    LIKE is case sensitive
    ILIKE is case insensitive
    The ~~ and ~~* operator spellings do not accept an ESCAPE clause
      string LIKE pattern [ESCAPE escape-character]
      string ~~ pattern
      string ILIKE pattern [ESCAPE escape-character]
      string ~~* pattern
      lower(string) LIKE pattern [ESCAPE escape-character]
  • 9. Negating LIKE
    To negate a match, use the NOT keyword
    The corresponding negated operator also works (without an ESCAPE clause)
      string NOT LIKE pattern [ESCAPE escape-character]
      string !~~ pattern
      string NOT ILIKE pattern [ESCAPE escape-character]
      string !~~* pattern
      NOT (string LIKE pattern [ESCAPE escape-character])
      NOT (string ILIKE pattern [ESCAPE escape-character])
  • 10. Wildcards
    Pattern can contain wildcard characters
      Underscore ("_") matches any single character
      Percent sign ("%") matches zero or more characters
    With no wildcards, expression acts like equals
    To match literal wildcard chars, they must be escaped
      Default escape char is backslash ("\")
      May be changed using ESCAPE clause
      Match the literal escape char by doubling up
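The wildcard and escape rules above can be tried directly on string literals. The following is a minimal sketch, not taken from the talk: the strings and column aliases are made up for illustration, and it assumes the default setting standard_conforming_strings = on, so that a backslash inside an ordinary string literal reaches LIKE unchanged.

```sql
-- Illustrative literals and aliases only; none of them come from the talk.
-- Assumes standard_conforming_strings = on (the default in current releases).
SELECT 'Waldo' LIKE 'W_ldo'            AS underscore_one_char,   -- "_" stands for exactly one character
       'Waldo' LIKE 'Wal%'             AS percent_any_run,       -- "%" stands for zero or more characters
       'Waldo' LIKE 'Waldo%'           AS percent_matches_empty, -- "%" may also match nothing
       'Waldo' LIKE 'Wal'              AS no_wildcard,           -- no wildcards: behaves like "=", so no match
       'Waldo' ILIKE 'wal%'            AS ilike_ignores_case,    -- ILIKE is the case-insensitive variant
       '100%'  LIKE '100\%'            AS escaped_percent,       -- backslash is the default escape character
       '100%'  LIKE '100!%' ESCAPE '!' AS custom_escape;         -- ESCAPE swaps in a different escape character
```

Every column except no_wildcard should come back true, since a pattern without wildcards has to equal the whole string.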
  • 11. Alternate Index Op Classes
    varchar_pattern_ops, text_pattern_ops and bpchar_pattern_ops
      Useful for anchored pattern matching, e.g. "<pattern>%"
      Used by LIKE, SIMILAR TO, or POSIX regex when not using "C" locale
      Also create "normal" index for queries with <, <=, >, or >=
    Does NOT work for ILIKE or ~~*; alternatives:
      Expression index over lower(column)
      pg_trgm index operator class
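As a rough sketch of how those index choices could look in practice, the statements below build the three b-tree variants on a small stand-in table. The table demo_subjects, its rows, and all index names are hypothetical and chosen only so the example is self-contained; on the talk's sample data the same statements would target messages.subject instead.

```sql
-- demo_subjects and every index name below are hypothetical stand-ins.
CREATE TEMP TABLE demo_subjects (subject text NOT NULL);

INSERT INTO demo_subjects (subject)
VALUES ('Where''s Waldo?'), ('Re: Where''s Waldo?'), ('multixact wraparound');

-- Pattern-ops index: usable for left-anchored LIKE patterns such as 'Re:%'
-- when the database does not use the "C" locale.
CREATE INDEX demo_subjects_pattern_idx
    ON demo_subjects (subject text_pattern_ops);

-- Ordinary b-tree index: still needed for <, <=, >, >= comparisons.
CREATE INDEX demo_subjects_plain_idx
    ON demo_subjects (subject);

-- Expression index over lower(column): a case-insensitive alternative,
-- since the pattern-ops index does not help ILIKE.
CREATE INDEX demo_subjects_lower_idx
    ON demo_subjects (lower(subject) text_pattern_ops);

-- Query written to fit the expression index: lowercase, left-anchored pattern.
SELECT subject
  FROM demo_subjects
 WHERE lower(subject) LIKE 'where''s waldo%';
```

On a table this small the planner will most likely still choose a sequential scan; the point of the sketch is the shape of the index definitions and of the matching predicate.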
  • 12. ESCAPE Example
      SELECT 'AbC_%_dEf' LIKE 'AbC#_#%#_d%' ESCAPE '#';
       ?column?
      ----------
       t
      (1 row)
  • 13. SIMILAR TO Syntax
    Works like LIKE: the pattern must match the entire string
    Interprets pattern using the SQL standard's definition of a regular expression
      string SIMILAR TO pattern [ESCAPE escape-character]
      string NOT SIMILAR TO pattern [ESCAPE escape-character]
  • 14. Wildcards
    Same as LIKE
    Also supports meta-characters borrowed from POSIX REs
      pipe ("|"): either of two alternatives
      asterisk ("*"): repetition >= 0 times
      plus ("+"): repetition >= 1 time
      question mark ("?"): repetition 0 or 1 time
      "{m}": repetition exactly m times
      "{m,}": repetition >= m times
      "{m,n}": repetition >= m and <= n times
      parentheses ("()"): group items into a single logical item
  • 15. SIMILAR TO Examples
      SELECT 'AbCdEf' SIMILAR TO 'AbC%' AS true,
             'AbCdEf' SIMILAR TO 'Ab(C|c)%' AS true,
             'Abcccdef' SIMILAR TO 'Abc{4}%' AS false,
             'Abcccdef' SIMILAR TO 'Abc{3}%' AS true,
             'Abccdef' SIMILAR TO 'Abc?d?%' AS true;
       true | true | false | true | true
      ------+------+-------+------+------
       t    | t    | f     | t    | t
      (1 row)
  • 16. Regular Expression Syntax
    Similar to LIKE and ILIKE
    Allowed to match anywhere within string
      ⇒ unless RE is explicitly anchored
    Interprets pattern using POSIX definition of regex
      string ~ pattern    -- matches RE, case sensitive
      string ~* pattern   -- matches RE, case insensitive
      string !~ pattern   -- not matches RE, case sensitive
      string !~* pattern  -- not matches RE, case insensitive
  • 17. Regular Expression Syntax
    POSIX-style REs are complex enough to deserve their own talk
    See: www.postgresql.org/docs/9.5/static/functions-matching.html#FUNCTIONS-POSIX-REGEXP
      SELECT 'AbCdefzzzzdef' ~* 'Ab((C|c).*)?z+def.*' AS true,
             'AbcabcAbc' ~ '^Ab.*bc$' AS true,
             'AbcabcAbc' ~ '^Ab' AS true,
             'AbcAbcAbc' ~* 'abc' AS true,
             'AbcAbcAbc' ~* '^abc$' AS false;
       true | true | true | true | false
      ------+------+------+------+-------
       t    | t    | t    | t    | f
      (1 row)
  • 18. Regular Expression Example
    Really slow without an index
      EXPLAIN ANALYZE SELECT date FROM messages
      WHERE bodytxt ~* 'multixact';
                                QUERY PLAN
      -------------------------------------------------------------------
       Seq Scan on messages (cost=0.00..197436.10 rows=108 width=8)
                            (actual time=6.435..26851.944 rows=2580 loops=1)
         Filter: (bodytxt ~* 'multixact'::text)
         Rows Removed by Filter: 1083988
       Planning time: 1.682 ms
       Execution time: 26852.410 ms
  • 19. Regular Expression Example
      Use trigram GIN index
      CREATE INDEX trgm_gin_bodytxt_idx ON messages USING gin (bodytxt using gin_trgm_ops);
      EXPLAIN ANALYZE SELECT date FROM messages WHERE bodytxt ~* 'multixact';
      QUERY PLAN
      -------------------------------------------------------------------
      Bitmap Heap Scan on messages [...]
        -> Bitmap Index Scan on trgm_gin_bodytxt_idx (cost=0.00..124.81 rows=108 width=0) (actual time=66.095..66.095 rows=2581 loops=1)
             Index Cond: (bodytxt ~* 'multixact'::text)
      Planning time: 3.680 ms
      Execution time: 192.912 ms
  • 20. Regular Expression Example
      Or use trigram GiST index ... oops
      CREATE INDEX trgm_gist_bodytxt_idx ON messages USING gist (bodytxt using gist_trgm_ops);
      ERROR: index row requires 8672 bytes, maximum size is 8191
  • 21. Regular Expression Compared to FTS
      For the sake of comparison - with full text search
      EXPLAIN ANALYZE SELECT date FROM messages WHERE fti @@ 'multixact:D';
      QUERY PLAN
      -------------------------------------------------------------------
      Bitmap Heap Scan on messages [...]
        -> Bitmap Index Scan on messages_fti_idx (cost=0.00..64.75 rows=5433 width=0) (actual time=1.085..1.085 rows=1475 loops=1)
             Index Cond: (fti @@ '''multixact'':D'::tsquery)
      Planning time: 0.504 ms
      Execution time: 22.054 ms
  • 22. Soundex
      soundex: converts string to four-character code
      difference: converts two strings to Soundex codes, reports number of matching code positions (0 to 4)
      Generally finds similarity of English names
      Part of fuzzystrmatch extension
      SELECT soundex('Joseph'), soundex('Josef'), difference('Joseph', 'Josef');
       soundex | soundex | difference
      ---------+---------+------------
       J210    | J210    |          4
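To make the four-character code less opaque, here is a minimal Python sketch of a simplified classic Soundex plus a position-by-position comparison. It is an illustration under stated assumptions, not the fuzzystrmatch source: it handles ASCII letters only, ignores everything else, and the function names simply echo the SQL ones. Edge cases may differ from the extension.

```python
# Digit for each letter A..Z; '0' marks letters that are not coded (vowels, H, W, Y).
SOUNDEX_TABLE = "01230120022455012623010202"


def _code(ch):
    return SOUNDEX_TABLE[ord(ch.upper()) - ord("A")]


def soundex(text):
    """Simplified Soundex: first letter plus up to three digits, zero padded."""
    letters = [c for c in text if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    out = letters[0].upper()
    prev = _code(letters[0])
    for ch in letters[1:]:
        code = _code(ch)
        # Emit a digit only when it is coded and differs from the previous letter's code.
        if code != "0" and code != prev:
            out += code
        prev = code
        if len(out) == 4:
            break
    return out.ljust(4, "0")


def difference(a, b):
    """Count positions (0 to 4) at which the two Soundex codes agree."""
    return sum(x == y for x, y in zip(soundex(a), soundex(b)))


print(soundex("Joseph"), soundex("Josef"), difference("Joseph", "Josef"))
```

Both names reduce to the same code because the vowels are dropped and `p`/`f` share a digit, which is why `difference` reports the maximum score.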
  • 23. Levenshtein
      Calculates Levenshtein distance between two strings
      Comparisons case sensitive
      Strings non-null, maximum 255 bytes
      Part of fuzzystrmatch extension
      SELECT levenshtein('Joseph','Josef') AS two,
             levenshtein('John','Joan') AS one,
             levenshtein('foo','foo') AS zero;
       two | one | zero
      -----+-----+------
         2 |   1 |    0
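The distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. The following Python sketch shows the standard dynamic-programming formulation with unit costs; it is an illustration of the metric, not the extension's implementation, and it does not enforce the length limit mentioned above.

```python
def levenshtein(source, target):
    """Edit distance with unit cost for insert, delete, and substitute."""
    # prev[j] holds the distance between the processed prefix of source and target[:j].
    prev = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        curr = [i]  # turning source[:i] into the empty string takes i deletions
        for j, t_char in enumerate(target, start=1):
            substitution_cost = 0 if s_char == t_char else 1
            curr.append(min(
                prev[j] + 1,                      # delete s_char
                curr[j - 1] + 1,                  # insert t_char
                prev[j - 1] + substitution_cost,  # substitute or keep
            ))
        prev = curr
    return prev[-1]


print(levenshtein("Joseph", "Josef"))
print(levenshtein("John", "Joan"))
print(levenshtein("foo", "foo"))
```

Comparison is case sensitive here as well, so `levenshtein("foo", "FOO")` counts three substitutions.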
  • 24. Metaphone
      Constructs code for an input string
      Comparisons case-insensitive
      Strings non-null, maximum 255 bytes
      max_output_length arg sets max length of code
      Part of fuzzystrmatch extension
      SELECT metaphone('extensive',6) AS "EKSTNS",
             metaphone('exhaustive',6) AS "EKSHST",
             metaphone('ExTensive',3) AS "EKS",
             metaphone('eXhaustivE',3) AS "EKS";
       EKSTNS | EKSHST | EKS | EKS
      --------+--------+-----+-----
       EKSTNS | EKSHST | EKS | EKS
  • 25. Double Metaphone
      Computes primary and alternate codes for string
      The two codes can differ, especially for non-English names
      Comparisons case-insensitive
      No length limit on the input strings
      Part of fuzzystrmatch extension
      SELECT dmetaphone('extensive') AS "AKST",
             dmetaphone('exhaustive') AS "AKSS",
             dmetaphone('Magnus') AS "MNS",
             dmetaphone_alt('Magnus') AS "MKNS";
       AKST | AKSS | MNS | MKNS
      ------+------+-----+------
       AKST | AKSS | MNS | MKNS
  • 26. Trigram Matching
      Functions and operators for determining similarity
      Trigram is group of three consecutive characters from string
      Similarity of two strings - count number of trigrams shared
      Index operator classes supporting fast similar strings search
      Support indexed searches for LIKE and ILIKE queries
      Comparisons case-insensitive
      Part of pg_trgm extension
  • 27. Trigram Matching
      Example timing
      SELECT set_limit(0.6); -- defaults to 0.3
      SELECT DISTINCT _from, -- uses trgm_gist_idx
             similarity(_from, 'Josef Konway <mail@joeconway.com>') AS sml
      FROM messages
      WHERE _from % 'Josef Konway <mail@joeconway.com>'
      ORDER BY sml DESC, _from;
                     _from                |   sml
      ------------------------------------+----------
       Joseph Conway <mail@joeconway.com> | 0.724138
       Joe Conway <mail@joeconway.com>    | 0.703704
       jconway <mail@joeconway.com>       | 0.678571
       "Joe Conway" <joe.conway@mail.com> | 0.62963
      (4 rows)
      Time: 502.002 ms
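The similarity scores above can be approximated by hand. The Python sketch below follows the documented pg_trgm rules as I understand them: lowercase the text, treat non-alphanumeric characters as word breaks, pad each word with two leading spaces and one trailing space, and score two strings as shared trigrams divided by distinct trigrams overall. It assumes ASCII input and is an illustration, not the extension's code; the function names are chosen to echo the SQL.

```python
import re


def trigrams(text):
    """Set of padded three-character groups, one word at a time."""
    result = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two spaces before, one after
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a, b):
    """Shared trigrams divided by the size of the union of both trigram sets."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


query = "Josef Konway <mail@joeconway.com>"
for candidate in ("Joseph Conway <mail@joeconway.com>",
                  "Joe Conway <mail@joeconway.com>"):
    print(candidate, round(similarity(candidate, query), 6))
```

Under these assumptions the first candidate shares 21 of 29 distinct trigrams with the query and the second shares 19 of 27, which lines up with the first two rows of the result set.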
  • 28. Full Text Search Overview
      Searches documents with potentially complex criteria
      Superior to other methods in many cases because:
        Offers linguistic support for derived words
        Ignores stop words
        Ranks results by relevance
        Very flexibly uses indexes
      Topic very complex - see:
        http://www.postgresql.org/docs/9.5/static/textsearch.html
        http://www.postgresql.org/docs/9.5/static/datatype-textsearch.html
        http://www.postgresql.org/docs/9.5/static/functions-textsearch.html
        http://www.postgresql.org/docs/9.5/static/textsearch-indexes.html
  • 29. Preprocessing
      Convert text to tsvector
      Store tsvector
      Index tsvector
      CREATE FUNCTION messages_fti_trigger_func() RETURNS trigger LANGUAGE plpgsql AS $$
      BEGIN
        NEW.fti = setweight(to_tsvector(coalesce(NEW.subject, '')), 'A') ||
                  setweight(to_tsvector(coalesce(NEW.bodytxt, '')), 'D');
        RETURN NEW;
      END
      $$;
      CREATE TRIGGER messages_fti_trigger
        BEFORE INSERT OR UPDATE OF subject, bodytxt ON messages
        FOR EACH ROW EXECUTE PROCEDURE messages_fti_trigger_func();
      CREATE INDEX messages_fti_idx ON messages USING gin (fti);
  • 30. Weighting
      Weights used in relevance ranking
      Array specifies how heavily to weigh each category
        {D-weight, C-weight, B-weight, A-weight}
        defaults: {0.1, 0.2, 0.4, 1.0}
  • 31. Creating tsvector
      Parse into tokens
        Classes of tokens can be processed differently
        Postgres has standard parser and predefined set of classes
        Custom parsers can be created
      Convert tokens into lexemes
        Dictionaries used for this step
        ⇒ standard dictionaries provided
        ⇒ custom ones can be created
        Normalized: different forms of same word made alike
        ⇒ fold upper-case letters to lower-case
        ⇒ removal of suffixes
        ⇒ elimination of stop words
  • 32. Writing tsquery
      The pattern to be matched
      Lexemes combined with boolean operators
        & (AND)   | (OR)   ! (NOT)
        ! (NOT) binds most tightly
        & (AND) binds more tightly than | (OR)
        Parentheses used to enforce grouping
      Label with * to specify prefix matching
      Supports weight labels
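The binding order listed above (NOT, then AND, then OR) is the same relative order Python uses for `not`, `and`, and `or`, so a small Python sketch can show why grouping matters. The document is modeled as a plain set of lexemes, and the query `multixact & !race | crash` is a made-up example; nothing here parses real tsquery syntax.

```python
def default_grouping(doc):
    """multixact & !race | crash, read as ((multixact & (!race)) | crash)."""
    return "multixact" in doc and "race" not in doc or "crash" in doc


def parenthesized(doc):
    """multixact & !(race | crash): parentheses force a different grouping."""
    return "multixact" in doc and not ("race" in doc or "crash" in doc)


doc = {"crash"}
print(default_grouping(doc))  # the OR branch alone is enough to match
print(parenthesized(doc))     # multixact is now required, so no match
```

A document containing only `crash` matches the first form through the OR branch but fails the second, where `multixact` is mandatory.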
  • 33. Writing tsquery
      to_tsquery() to create
        Normalizes tokens to lexemes
        Discards stop words
      Can also cast to tsquery
        Tokens taken at face value
        No weight labels
      SELECT to_tsquery('(hello:A & the:A) | writing:D') AS to_tsq,
             '(hello & the) | writing'::tsquery AS cast_tsq
      UNION ALL
      SELECT to_tsquery('postgres:*'), 'postgres:*'::tsquery;
              to_tsq         |          cast_tsq
      -----------------------+-----------------------------
       'hello':A | 'write':D | 'hello' & 'the' | 'writing'
       'postgr':*            | 'postgres':*
  • 34. Writing tsquery
      Alternative function plainto_tsquery()
        Text parsed and normalized
        & (AND) operator inserted between surviving words
      Should use simple strings only
        No boolean operators, no weight labels, no prefix-match labels
      SELECT plainto_tsquery('(hello:B & the:B) | writing:D') AS tsq1,
             plainto_tsquery('postgres:*') AS tsq2;
                      tsq1                 |   tsq2
      -------------------------------------+----------
       'hello' & 'b' & 'b' & 'write' & 'd' | 'postgr'
  • 35. Match Operator
      Text search match operator @@
      Returns true if tsvector (preprocessed document) matches tsquery (search pattern)
      Either may be written first
      SELECT split_part(_from, '<', 1) AS name, date
      FROM messages
      WHERE fti @@ 'multixact:A & race:D & bug:D';
            name       |          date
      -----------------+------------------------
       Alvaro Herrera  | 2013-11-25 07:36:19-08
       Andres Freund   | 2013-11-25 08:26:55-08
       Andres Freund   | 2013-11-29 11:58:06-08
  • 36. Relevance Ranking
      ts_rank(): based on frequency of matching lexemes
      ts_rank_cd(): lexeme proximity taken into consideration
      WITH ts(q) AS (
        SELECT 'multixact:A & (crash:D | (data:D & loss:D))'::tsquery
      )
      SELECT ts_rank(m.fti, ts.q) as tsrank
      FROM messages m, ts
      WHERE m.fti @@ ts.q
      ORDER BY tsrank DESC LIMIT 4;
        tsrank
      ----------
       0.999997
       0.999997
       0.999997
       0.999997
  • 37. Highlighting
      ts_headline(): returns excerpt with query terms highlighted
      Apply in an outer query, after inner query LIMIT
      ⇒ avoids ts_headline() overhead on eliminated rows
      SELECT subject, tsrank, ts_headline(format('%s: %s', subject, bodytxt), q)
      FROM (WITH ts(q) AS (SELECT 'multixact:A & (crash:D | (data:D & loss:D))'::tsquery)
            SELECT ts_rank(m.fti, ts.q) as tsrank, ts.q, m.subject, m.bodytxt
            FROM messages m, ts
            WHERE m.fti @@ ts.q
            ORDER BY tsrank DESC LIMIT 4
      ) AS inner_query LIMIT 1;
      -[ RECORD 1 ]--------------------------------------------------------------
      subject     | Is anyone aware of data loss causing MultiXact bugs in 9.3.2?
      tsrank      | 0.999997
      ts_headline | <b>data</b> <b>loss</b> causing <b>MultiXact</b> bugs in 9.3.2?: I've had multiple complaints of apparent <b>data</b>
  • 38. Pattern Matching: Example Use Cases
      Equal
      Anchored
      Anchored case-insensitive
      Reverse Anchored case-insensitive
      Unanchored case-insensitive
      Fuzzy
      Complex Search with Relevancy Ranking
  • 39. Equal
      Find all the rows where column matches '<pattern>'
      Equal operator with suitable index is best
      Without an index
      EXPLAIN ANALYZE SELECT date FROM messages
      WHERE _from = 'Joseph Conway <mail@joeconway.com>';
      QUERY PLAN
      -------------------------------------------------------------------
      Seq Scan on messages (cost=0.00..197436.10 rows=61 width=8) (actual time=49.192..527.343 rows=14 loops=1)
        Filter: (_from = 'Joseph Conway <mail@joeconway.com>'::text)
        Rows Removed by Filter: 1086554
      Planning time: 0.256 ms
      Execution time: 527.386 ms
  • 40. Equal
      With an index
      CREATE INDEX from_idx ON messages(_from);
      EXPLAIN ANALYZE SELECT date FROM messages
      WHERE _from = 'Joseph Conway <mail@joeconway.com>';
      QUERY PLAN
      -------------------------------------------------------------------
      Bitmap Heap Scan on messages [...]
        -> Bitmap Index Scan on from_idx (cost=0.00..4.88 rows=61 width=0) (actual time=0.051..0.051 rows=14 loops=1)
             Index Cond: (_from = 'Joseph Conway <mail@joeconway.com>'::text)
      Planning time: 0.267 ms
      Execution time: 0.161 ms
  • 41. Anchored
      Find all the rows where column matches '<pattern>%'
      LIKE operator with suitable index is best
      This index does not do the job
      CREATE INDEX from_idx ON messages(_from);
      EXPLAIN ANALYZE SELECT date FROM messages
      WHERE _from LIKE 'Joseph Conway%';
      QUERY PLAN
      -------------------------------------------------------------------
      Seq Scan on messages (cost=0.00..197436.10 rows=62 width=8) (actual time=52.991..536.316 rows=14 loops=1)
        Filter: (_from ~~ 'Joseph Conway%'::text)
        Rows Removed by Filter: 1086554
      Planning time: 0.264 ms
      Execution time: 536.362 ms
  • 42. Anchored
      Note text_pattern_ops - this works
      CREATE INDEX pattern_idx ON messages(_from using text_pattern_ops);
      EXPLAIN ANALYZE SELECT date FROM messages
      WHERE _from LIKE 'Joseph Conway%';
      QUERY PLAN
      -------------------------------------------------------------------
      Index Scan using pattern_idx on messages (cost=0.43..8.45 rows=62 width=8) (actual time=0.043..0.082 rows=14 loops=1)
        Index Cond: ((_from ~>=~ 'Joseph Conway'::text) AND (_from ~<~ 'Joseph Conwaz'::text))
        Filter: (_from ~~ 'Joseph Conway%'::text)
      Planning time: 0.490 ms
      Execution time: 0.133 ms
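The Index Cond in the plan above shows the anchored pattern rewritten as a half-open range: at or above the prefix, and strictly below the prefix with its last character bumped by one. A minimal Python sketch of that idea follows. It assumes plain character-by-character ordering (which is what Python string comparison does) and skips the edge case where the last character is already the largest possible; `prefix_range` is a hypothetical helper, not a PostgreSQL function.

```python
def prefix_range(prefix):
    """Return (lower, upper) so that lower <= s < upper iff s starts with prefix."""
    if not prefix:
        raise ValueError("an empty prefix matches everything; no range needed")
    # Increment the final character to get the smallest string above every match.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, upper


lower, upper = prefix_range("Joseph Conway")
print(lower, "|", upper)

senders = [
    "Joseph Conway <mail@joeconway.com>",
    "Joseph Conwell <someone@example.com>",  # made-up address for illustration
    "Joe Conway <mail@joeconway.com>",
]
# A range test picks the same rows as the prefix test.
print([s for s in senders if lower <= s < upper])
```

Because a range test is something a sorted B-tree can answer directly, the anchored LIKE becomes an index scan once the index orders strings character by character.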
  • 43. Anchored Case-Insensitive
      Find all the rows where column matches '<pattern>%'
      ⇒ but in Case-Insensitive way
      LIKE operator with suitable expression index is good
      CREATE INDEX lower_pattern_idx ON messages(lower(_from) using text_pattern_ops);
      EXPLAIN ANALYZE SELECT date FROM messages
      WHERE lower(_from) LIKE 'joseph conway%';
      QUERY PLAN
      -------------------------------------------------------------------
      Bitmap Heap Scan on messages [...]
        -> Bitmap Index Scan on lower_pattern_idx (cost=0.00..214.76 rows=5433 width=0) (actual time=0.074..0.074 rows=14 loops=1)
             Index Cond: ((lower(_from) ~>=~ 'joseph conway'::text) AND (lower(_from) ~<~ 'joseph conwaz'::text))
      Planning time: 0.505 ms
      Execution time: 0.258 ms
  • 44. Anchored Case-Insensitive
      Can also use trigram GIN index with ILIKE
      CREATE INDEX trgm_gin_idx ON messages USING gin (_from using gin_trgm_ops);
      EXPLAIN ANALYZE SELECT date FROM messages
      WHERE _from ILIKE 'joseph conway%';
      QUERY PLAN
      -------------------------------------------------------------------
      Bitmap Heap Scan on messages [...]
        -> Bitmap Index Scan on trgm_gin_idx (cost=0.00..176.46 rows=62 width=0) (actual time=92.980..92.980 rows=155 loops=1)
             Index Cond: (_from ~~* 'joseph conway%'::text)
      Planning time: 0.857 ms
      Execution time: 93.473 ms
  • 45. Anchored Case-Insensitive
      Or a trigram GiST index with ILIKE
      CREATE INDEX trgm_gist_idx ON messages USING gist (_from using gist_trgm_ops);
      EXPLAIN ANALYZE SELECT date FROM messages
      WHERE _from ILIKE 'joseph conway%';
      QUERY PLAN
      -------------------------------------------------------------------
      Bitmap Heap Scan on messages [...]
        -> Bitmap Index Scan on trgm_gist_idx (cost=0.00..8.88 rows=62 width=0) (actual time=53.080..53.080 rows=155 loops=1)
             Index Cond: (_from ~~* 'joseph conway%'::text)
      Planning time: 1.068 ms
      Execution time: 53.604 ms
  • 46. Reverse Anchored Case-Insensitive
      Find all the rows where column matches '%<pattern>'
      ⇒ but in Case-Insensitive way
      LIKE operator with suitable expression index is good
      CREATE INDEX rev_lower_pattern_idx ON messages(lower(reverse(_from)) using text_pattern_ops);
      EXPLAIN ANALYZE SELECT date FROM messages
      WHERE lower(reverse(_from)) LIKE reverse('%joeconway.com>');
      QUERY PLAN
      -------------------------------------------------------------------
      Bitmap Heap Scan on messages [...]
        -> Bitmap Index Scan on rev_lower_pattern_idx (cost=0.00..214.76 rows=5433 width=0) (actual time=1.357..1.357 rows=2749 loops=1)
             Index Cond: ((lower(reverse(_from)) ~>=~ '>moc.yawnoceoj'::text) AND (lower(reverse(_from)) ~<~ '>moc.yawnoceok'::text))
      Planning time: 0.278 ms
      Execution time: 17.491 ms
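The trick on this slide is that a suffix test on a string is a prefix test on its reversal, and prefix tests are what a pattern-ops index handles well. A short Python sketch of the equivalence, assuming ASCII text; `ends_with_ci` is a hypothetical helper standing in for the `lower(reverse(...)) LIKE reverse(...)` expression.

```python
def ends_with_ci(value, suffix):
    """Case-insensitive suffix test done as a prefix test on reversed strings."""
    reversed_key = value[::-1].lower()      # plays the role of lower(reverse(_from))
    reversed_prefix = suffix[::-1].lower()  # plays the role of the reversed pattern
    return reversed_key.startswith(reversed_prefix)


# Reversing the LIKE pattern moves the wildcard from the front to the end.
print("%joeconway.com>"[::-1])

print(ends_with_ci("Joseph Conway <mail@JoeConway.com>", "joeconway.com>"))
print(ends_with_ci('"Joe Conway" <joe.conway@mail.com>', "joeconway.com>"))
```

Reversing `'%joeconway.com>'` yields a pattern with the wildcard at the end, which is the anchored form that appears as a range in the Index Cond.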
  • 47. Reverse Anchored Case-Insensitive
      Can also use trigram GIN index with ILIKE
      CREATE INDEX trgm_gin_idx ON messages USING gin (_from using gin_trgm_ops);
      EXPLAIN ANALYZE SELECT date FROM messages
      WHERE _from ILIKE '%joeconway.com>';
      QUERY PLAN
      -------------------------------------------------------------------
      Bitmap Heap Scan on messages [...]
        -> Bitmap Index Scan on trgm_gin_idx (cost=0.00..177.58 rows=2344 width=0) (actual time=80.537..80.537 rows=2749 loops=1)
             Index Cond: (_from ~~* '%joeconway.com>'::text)
      Planning time: 0.915 ms
      Execution time: 88.723 ms
  • 48. Reverse Anchored Case-Insensitive
      Or a trigram GiST index with ILIKE
      CREATE INDEX trgm_gist_idx ON messages USING gist (_from using gist_trgm_ops);
      EXPLAIN ANALYZE SELECT date FROM messages
      WHERE _from ILIKE '%joeconway.com>';
      QUERY PLAN
      -------------------------------------------------------------------
      Bitmap Heap Scan on messages [...]
        -> Bitmap Index Scan on trgm_gist_idx (cost=0.00..193.99 rows=2344 width=0) (actual time=58.386..58.386 rows=2749 loops=1)
             Index Cond: (_from ~~* '%joeconway.com>'::text)
      Planning time: 0.921 ms
      Execution time: 66.771 ms
  • 49. Unanchored Case-Insensitive
      Find all the rows where column matches '%<pattern>%'
      ⇒ but in Case-Insensitive way
      This cannot use expression or pattern ops index
      EXPLAIN ANALYZE SELECT date FROM messages
      WHERE _from ILIKE '%Conway%';
      QUERY PLAN
      -------------------------------------------------------------------
      Seq Scan on messages (cost=0.00..197436.10 rows=5096 width=8) (actual time=2.242..2002.998 rows=7402 loops=1)
        Filter: (_from ~~* '%Conway%'::text)
        Rows Removed by Filter: 1079166
      Planning time: 0.860 ms
      Execution time: 2003.667 ms
  • 50. Unanchored Case-Insensitive
      Use trigram GIN index with ILIKE
      CREATE INDEX trgm_gin_idx ON messages USING gin (_from using gin_trgm_ops);
      EXPLAIN ANALYZE SELECT date FROM messages
      WHERE _from ILIKE '%Conway%';
      QUERY PLAN
      -------------------------------------------------------------------
      Bitmap Heap Scan on messages [...]
        -> Bitmap Index Scan on trgm_gin_idx (cost=0.00..94.22 rows=5096 width=0) (actual time=9.060..9.060 rows=7402 loops=1)
             Index Cond: (_from ~~* '%Conway%'::text)
      Planning time: 0.915 ms
      Execution time: 30.567 ms
  • 51. Unanchored Case-Insensitive
      Or a trigram GiST index with ILIKE
      CREATE INDEX trgm_gist_idx ON messages USING gist (_from using gist_trgm_ops);
      EXPLAIN ANALYZE SELECT date FROM messages
      WHERE _from ILIKE '%Conway%';
      QUERY PLAN
      -------------------------------------------------------------------
      Bitmap Heap Scan on messages [...]
        -> Bitmap Index Scan on trgm_gist_idx (cost=0.00..422.63 rows=5096 width=0) (actual time=128.881..128.881 rows=7402 loops=1)
             Index Cond: (_from ~~* '%Conway%'::text)
      Planning time: 0.871 ms
      Execution time: 149.755 ms
  • 52. Fuzzy
      Find all the rows where column matches '<pattern>'
      ⇒ but in an inexact way
      Use dmetaphone function with an expression index
      Might also use Soundex, Levenshtein, Metaphone, or pg_trgm
      CREATE INDEX dmet_expr_idx ON messages(dmetaphone(_from));
      EXPLAIN ANALYZE SELECT _from FROM messages
      WHERE dmetaphone(_from) = dmetaphone('josef konwei');
      QUERY PLAN
      -------------------------------------------------------------------
      Bitmap Heap Scan on messages [...]
        -> Bitmap Index Scan on dmet_expr_idx (cost=0.00..101.17 rows=5433 width=0) (actual time=0.085..0.085 rows=108 loops=1)
             Index Cond: (dmetaphone(_from) = 'JSFK'::text)
      Planning time: 0.272 ms
      Execution time: 0.445 ms
  • 53. Complex Requirements
      Full Text Search
      Complex multi-word searching
      Relevancy Ranking
      EXPLAIN ANALYZE SELECT date FROM messages
      WHERE fti @@ 'bug:A & deadlock:D & startup:D';
      QUERY PLAN
      -------------------------------------------------------------------
      Bitmap Heap Scan on messages [...]
        -> Bitmap Index Scan on messages_fti_idx (cost=0.00..52.02 rows=2 width=0) (actual time=9.261..9.261 rows=93 loops=1)
             Index Cond: (fti @@ '''bug'':A & ''deadlock'':D & ''startup'':D'::tsquery)
      Planning time: 0.469 ms
      Execution time: 12.614 ms
  • 54. Questions? Thank You! mail@joeconway.com