Accelerating Local Search With PostgreSQL 9.1
Jonathan S. Katz
Exco Ventures
PGConf.eu 2011, Oct 21, 2011

Local Search?
• Not necessarily based on the location of places
• “How close are two entities to one another?”
• “What are the closest entities to me?”
• “What are my nearest neighbors?”

Nearest Neighbor Overview
• Want to know “how similar” objects are relative to each other
  – What are the top “k” choices near me?
• Need to define a “metric” for similarity
  – “distance”

K-Nearest Neighbor
• Given a collection of n objects
• When trying to classify an unknown object
  – compute the distance between the unknown object and each known object
  – find the k (k ≥ 1) closest objects to the unknown object
  – classify the object based on the class of the k closest objects
• When k=1, the unknown object is given the same classification as the object it is closest to

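The classification steps above can be sketched in a few lines of Python. This is a minimal brute-force illustration, not something PostgreSQL does internally. The function name `knn_classify`, the sample points, and the choice of Euclidean distance as the metric are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_classify(known, unknown, k=1):
    """Classify `unknown` by majority vote among its k closest known points.

    `known` is a list of ((x, y), label) pairs. Euclidean distance is an
    assumed metric; any other distance function could be swapped in.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Compute the distance from the unknown object to every known object.
    by_distance = sorted(known, key=lambda item: math.dist(item[0], unknown))
    # Keep the k closest objects and vote on their classes.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical sample data: two small clusters.
known = [((1, 1), "red"), ((1, 2), "red"), ((2, 1), "red"),
         ((8, 8), "blue"), ((8, 9), "blue"), ((9, 8), "blue")]

print(knn_classify(known, (2, 2), k=1))  # prints "red"
print(knn_classify(known, (7, 7), k=3))  # prints "blue"
```

Sorting all n distances costs O(n log n) per query, which is the cost an index is meant to avoid.
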
K=1 Example
A Voronoi diagram of order 1 can be used to make k=1 NN queries

Just a bit more theory…
• Voronoi diagrams of order 1 are created in O(n log n) time
  – Looks similar to…? :-)
• Queried in O(log n) time with a point-location structure
• Therefore:
  – Pay the time penalty to build the index
  – Query against the index quickly

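The build-once, query-fast trade-off is easiest to see in one dimension, where the Voronoi cells of a set of numbers are just the intervals between the midpoints of neighboring sorted values. The sketch below uses a sorted list as the "index"; the names `build_index` and `nearest` and the sample values are hypothetical.

```python
import bisect

def build_index(values):
    """Pay O(n log n) once to sort the values."""
    return sorted(values)

def nearest(index, q):
    """Find the value closest to q in O(log n) using binary search."""
    if not index:
        raise ValueError("index is empty")
    pos = bisect.bisect_left(index, q)
    # The nearest value sits just before or at the insertion point.
    candidates = index[max(pos - 1, 0):pos + 1]
    return min(candidates, key=lambda v: abs(v - q))

index = build_index([42, 7, 19, 88, 63, 3])
print(nearest(index, 20))   # prints 19
print(nearest(index, 100))  # prints 88
```

Without the sorted list, every query would have to scan all n values.
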
Applications
• Geolocation + Optimizing Positioning
• Classification
• Similarity
• Recommendation systems
• Content-based image retrieval
• etc.

So what about PostgreSQL?
• As of PostgreSQL 9.0
  – supports geometric types and distances
    • Points, circles, lines, boxes, polygons
    • Distance operator: <->
  – pg_trgm: supplied module for determining text similarity
    • similarity('abc', 'ade') computes a similarity score
    • <-> defines distance (the opposite of similarity); not defined in 9.0

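To make the similarity score concrete, here is a rough Python approximation of how trigram similarity can be computed. It follows the behavior described in the pg_trgm documentation (lowercase the text, pad each word with two leading spaces and one trailing space, then compare sets of trigrams), but it is a sketch rather than the module's actual code. Word splitting in particular is simplified.

```python
import re

def trigrams(text):
    """Return the set of trigrams for `text`.

    Assumption: words are runs of letters and digits, and each word is
    padded with two leading spaces and one trailing space.
    """
    result = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result

def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def distance(a, b):
    """Distance as the opposite of similarity: 1 - similarity."""
    return 1.0 - similarity(a, b)

print(sorted(trigrams("abc")))              # ['  a', ' ab', 'abc', 'bc ']
print(round(similarity("abc", "ade"), 3))   # prints 0.143
print(round(distance("abc", "ade"), 3))     # prints 0.857
```

'abc' and 'ade' share only the leading trigram out of seven distinct ones, so the score is 1/7.
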
PostgreSQL 9.1: KNN-GiST
• Let n = size of a table
• Can index data that provides a “<->” (distance) operator
  – Geometric
  – pg_trgm
• “k” = LIMIT clause
• Known inefficiencies when k=n and n is small

Example: pg_trgm
• Data:
  – List of 1,000,000 names, 700,000 unique
  – n = 1,000,000
• Indexes:
  – CREATE INDEX names_name_idx ON names (name)
  – CREATE INDEX trgm_idx ON names USING gist (name gist_trgm_ops)
• k=10
• Displaying query plan / execution time after 10 runs

pg_trgm: 9.0
    EXPLAIN ANALYZE
    SELECT name, similarity(name, 'jon') AS sim
    FROM names
    WHERE name % 'jon'
    ORDER BY sim DESC
    LIMIT 10;

pg_trgm: 9.0
    Limit  (cost=2724.95..2724.98 rows=10 width=14) (actual time=192.793..192.794 rows=10 loops=1)
      ->  Sort  (cost=2724.95..2727.45 rows=1000 width=14) (actual time=192.790..192.791 rows=10 loops=1)
            Sort Key: (similarity(name, 'jon'::text))
            Sort Method: top-N heapsort  Memory: 25kB
            ->  Bitmap Heap Scan on names  (cost=56.47..2703.34 rows=1000 width=14) (actual time=188.836..192.499 rows=865 loops=1)
                  Recheck Cond: (name % 'jon'::text)
                  ->  Bitmap Index Scan on trgm_idx  (cost=0.00..56.22 rows=1000 width=0) (actual time=188.652..188.652 rows=865 loops=1)
                        Index Cond: (name % 'jon'::text)
    Total runtime: 192.881 ms

pg_trgm: 9.1
    EXPLAIN ANALYZE
    SELECT name, similarity(name, 'jon') AS sim
    FROM names
    WHERE name % 'jon'
    ORDER BY sim DESC
    LIMIT 10;

pg_trgm: 9.1
    Limit  (cost=2720.91..2720.93 rows=10 width=14) (actual time=202.022..202.023 rows=10 loops=1)
      ->  Sort  (cost=2720.91..2723.41 rows=1000 width=14) (actual time=202.020..202.021 rows=10 loops=1)
            Sort Key: (similarity(name, 'jon'::text))
            Sort Method: top-N heapsort  Memory: 25kB
            ->  Bitmap Heap Scan on names  (cost=52.43..2699.30 rows=1000 width=14) (actual time=198.324..201.719 rows=865 loops=1)
                  Recheck Cond: (name % 'jon'::text)
                  ->  Bitmap Index Scan on names_trgm_idx  (cost=0.00..52.18 rows=1000 width=0) (actual time=198.156..198.156 rows=865 loops=1)
                        Index Cond: (name % 'jon'::text)
    Total runtime: 202.113 ms

Comparable?
• Seems to be similar
  – Need to do more research into why; anyone?
• However, 9.1 offers improvements for LIKE/ILIKE search with pg_trgm

LIKE/ILIKE
    EXPLAIN ANALYZE
    SELECT name FROM names WHERE name LIKE '%ata%n';

LIKE/ILIKE pg_trgm: 9.0 vs 9.1
9.0:
    Seq Scan on names  (cost=0.00..18717.00 rows=99 width=14) (actual time=0.339..205.659 rows=665 loops=1)
      Filter: (name ~~ '%ata%n'::text)
    Total runtime: 205.743 ms
9.1:
    Bitmap Heap Scan on names  (cost=9.45..369.20 rows=99 width=14) (actual time=122.494..125.967 rows=665 loops=1)
      Recheck Cond: (name ~~ '%ata%n'::text)
      ->  Bitmap Index Scan on names_trgm_idx  (cost=0.00..9.42 rows=99 width=0) (actual time=121.972..121.972 rows=3551 loops=1)
            Index Cond: (name ~~ '%ata%n'::text)
    Total runtime: 126.065 ms

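The bitmap plan shows the index returning 3551 candidate rows and the recheck keeping 665 of them. The general idea of "filter by trigram, then recheck the real pattern" can be sketched as below. This is a simplified illustration under stated assumptions: it uses a plain dictionary from trigram to row numbers rather than a GiST signature tree, it ignores word padding and escape characters, and all function names and sample names are hypothetical.

```python
import re
from collections import defaultdict

def trigrams(text):
    """All 3-character substrings of `text` (no padding, for simplicity)."""
    return {text[i:i + 3] for i in range(len(text) - 2)}

def build_index(names):
    """Map each trigram to the set of row numbers whose name contains it."""
    index = defaultdict(set)
    for row, name in enumerate(names):
        for tri in trigrams(name):
            index[tri].add(row)
    return index

def like_to_regex(pattern):
    """Translate a LIKE pattern (% and _ wildcards, no escapes) to a regex."""
    wildcards = {"%": ".*", "_": "."}
    body = "".join(wildcards.get(ch, re.escape(ch)) for ch in pattern)
    return re.compile(body, re.DOTALL)

def like_search(names, index, pattern):
    """Return (candidate_count, matches) for a LIKE pattern."""
    # Trigrams every match must contain: those inside literal fragments.
    required = set()
    for fragment in re.split(r"[%_]", pattern):
        required |= trigrams(fragment)
    if required:
        candidates = set.intersection(*(index.get(t, set()) for t in required))
    else:
        candidates = set(range(len(names)))  # nothing to filter on
    # Recheck: the index only narrows the search, it does not prove a match.
    regex = like_to_regex(pattern)
    matches = [names[r] for r in sorted(candidates) if regex.fullmatch(names[r])]
    return len(candidates), matches

names = ["catamaran", "satan", "natalie", "jonathan", "data"]
index = build_index(names)
print(like_search(names, index, "%ata%n"))  # (4, ['catamaran', 'satan'])
```

Four names contain the trigram "ata", but only two also end in "n", which is why the recheck step is still needed.
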
Geometry
• Data:
  – 2,000,000 points, from (0,0) -> (10000, 10000)
• Index:
  – CREATE INDEX geoloc_coord_idx ON geoloc USING gist (coord);

Geometry
    EXPLAIN ANALYZE
    SELECT *, coord <-> point(500,500)
    FROM geoloc
    ORDER BY coord <-> point(500,500)
    LIMIT 10;

Geometry: 9.0 vs 9.1
9.0:
    Limit  (cost=80958.28..80958.31 rows=10 width=20) (actual time=1035.313..1035.316 rows=10 loops=1)
      ->  Sort  (cost=80958.28..85958.28 rows=2000000 width=20) (actual time=1035.312..1035.314 rows=10 loops=1)
            Sort Key: ((coord <-> '(500,500)'::point))
            Sort Method: top-N heapsort  Memory: 25kB
            ->  Seq Scan on geoloc  (cost=0.00..37739.00 rows=2000000 width=20) (actual time=0.029..569.501 rows=2000000 loops=1)
    Total runtime: 1035.349 ms
9.1:
    Limit  (cost=0.00..0.81 rows=10 width=20) (actual time=0.576..1.255 rows=10 loops=1)
      ->  Index Scan using geoloc_coord_idx on geoloc  (cost=0.00..162068.96 rows=2000000 width=20) (actual time=0.575..1.251 rows=10 loops=1)
            Order By: (coord <-> '(500,500)'::point)
    Total runtime: 1.391 ms

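The gap between the two plans comes from how many rows are touched: the sequential scan computes a distance for all 2,000,000 points, while the ordered index scan stops after producing 10. A best-first search over bounding regions gives a feel for how an index can return rows in distance order. The sketch below is only an illustration: it substitutes a flat grid of square cells for a real GiST tree, and `CELL`, `build_grid`, `nearest_first`, and `knn` are hypothetical names.

```python
import heapq
import itertools
import math
import random
from collections import defaultdict

CELL = 100.0  # assumed cell size; each cell stands in for one index page

def build_grid(points):
    """Group points into square cells keyed by integer cell coordinates."""
    grid = defaultdict(list)
    for x, y in points:
        grid[(math.floor(x / CELL), math.floor(y / CELL))].append((x, y))
    return grid

def min_distance_to_cell(q, cell):
    """Lower bound on the distance from q to any point inside `cell`."""
    cx, cy = cell
    dx = max(cx * CELL - q[0], 0.0, q[0] - (cx + 1) * CELL)
    dy = max(cy * CELL - q[1], 0.0, q[1] - (cy + 1) * CELL)
    return math.hypot(dx, dy)

def nearest_first(grid, q):
    """Yield (distance, point) pairs in increasing distance from q."""
    order = itertools.count()  # tie-breaker so heap entries always compare
    heap = [(min_distance_to_cell(q, cell), next(order), cell, None)
            for cell in grid]
    heapq.heapify(heap)
    while heap:
        dist, _, cell, point = heapq.heappop(heap)
        if point is not None:
            yield dist, point  # nothing left in the heap can be closer
            continue
        for p in grid[cell]:   # expand the cell into its points
            heapq.heappush(heap, (math.dist(p, q), next(order), cell, p))

def knn(grid, q, k):
    """The LIMIT k part: stop pulling results after k points."""
    return list(itertools.islice(nearest_first(grid, q), k))

random.seed(0)
points = [(random.uniform(0, 10000), random.uniform(0, 10000))
          for _ in range(20000)]
grid = build_grid(points)
for dist, point in knn(grid, (500.0, 500.0), 3):
    print(round(dist, 2), point)
```

Only the cells nearest the query point get expanded, so most points never have their distance computed. The search still has to queue every cell up front, which a real tree-structured index avoids.
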
Application Examples
• Proximity map search: fast!

Drawbacks
• Performance benefits are limited when:
  – LIMIT is close to the size of the data set and the data set is large
  – Data set is small
• Time to build index
  – High-transaction table

Conclusions
• GiST: “Generalized Search Tree”; the index is there, and it is up to developers to define access methods for data types
  – e.g. yields KNN-GiST
• Different types of applications can be built, with performance enhancements
• Next steps?

My Wish List
• Further geometric-type support in Postgres
  – N-dimensional points
  – ‘=’ operator for point type
  – (PostGIS still champion of complex geometric + geographic data types)
• Define “distance” over multicolumns with different types?
  – SELECT (a.name, a.geocode) <-> (b.name, b.geocode) FROM x a, x b;

References
• Oleg Bartunov, “Efficient K-nearest neighbour search in PostgreSQL” (http://www.sai.msu.su/~megera/postgres/talks/pgday-2010.pdf)
• Oleg Bartunov and Teodor Sigaev for work on KNN-GiST and notes on pg_trgm (http://developer.postgresql.org/pgdocs/postgres/pgtrgm.html)
• Hubert “depesz” Lubaczewski, patterned benchmarks off of his work: http://www.depesz.com/index.php/2010/12/11/waiting-for-9-1-knngist/

Contact
• jonathan.katz@excoventures.com
• @jkatz05
• Feedback Please!
  – http://2011.pgconf.eu/feedback

Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 

Accelerating Local Search with PostgreSQL (KNN-Search)

  • 1. Accelerating Local Search With PostgreSQL 9.1 • Jonathan S. Katz, Exco Ventures • PGConf.eu 2011 – Oct 21, 2011
  • 2. Local Search? • Not necessarily location based on places • “How close are two entities to one another?” • “What are the closest entities to me?” • “What are my nearest neighbors?”
  • 3. Nearest Neighbor Overview • Want to know “how similar” objects are relative to each other – What are the top “k” choices near me? • Need to define a “metric” for similarity – “distance”
  • 4. K-Nearest Neighbor • Given a collection of n objects • When trying to classify an unknown object – compute the distance from the unknown object to every known object – find the k (k ≥ 1) closest objects to the unknown object – classify the object based on the classes of the k closest objects • When k=1, the unknown object is given the same classification as the object it is closest to
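The classification steps on this slide can be sketched in a few lines of Python. This is a brute-force illustration, not code from the talk: the function name `knn_classify`, the Euclidean metric, the majority vote, and the sample points are all assumptions made for the example.

```python
from collections import Counter
from math import dist


def knn_classify(known, query, k=1):
    """Classify `query` by majority vote among its k nearest known points.

    `known` is a list of (point, label) pairs; the metric is Euclidean
    distance (an assumption; any distance function would do).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Brute force: measure the distance from the query to every known object.
    nearest = sorted(known, key=lambda item: dist(item[0], query))[:k]
    # Vote among the k closest objects; on a tie the label seen first wins.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical sample data: two small clusters with different labels.
known = [
    ((0, 0), "red"),
    ((1, 0), "red"),
    ((5, 5), "blue"),
    ((6, 5), "blue"),
    ((5, 6), "blue"),
]
print(knn_classify(known, (0.5, 0.2), k=1))
print(knn_classify(known, (4, 4), k=3))
```

Every query here touches all n known objects, which is the cost an index is meant to avoid.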
  • 5. K=1 Example • A Voronoi diagram of order 1 can be used to make k=1 NN queries
  • 6. Just a bit more theory… • Voronoi diagrams of order 1 are created in O(n log n) time – Looks similar to…? :-) • Queried in O(1) time • Therefore: – Pay the time penalty to build the index – Query against index quickly
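The "pay once to build, then query quickly" trade-off can be illustrated with a one-dimensional analogue. This sketch is not a Voronoi diagram: it assumes plain numbers on a line, uses a sorted list as the index, and its lookup takes O(log n) via binary search. The names `build_index` and `nearest` are made up for the example.

```python
from bisect import bisect_left


def build_index(values):
    """Pay the O(n log n) cost once: sort the values."""
    return sorted(values)


def nearest(index, q):
    """Return the indexed value closest to q using binary search, O(log n)."""
    if not index:
        raise ValueError("index is empty")
    i = bisect_left(index, q)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = index[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda v: abs(v - q))


index = build_index([16, 1, 9, 4])
print(nearest(index, 10))
```

After the one-time sort, each query inspects at most two candidates instead of all n values.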
  • 7. Applications • Geolocation + Optimizing Positioning • Classification • Similarity • Recommendation systems • Content-based image retrieval • etc.
  • 8. So what about PostgreSQL? • As of PostgreSQL 9.0 – supports geometric types and distances • Points, circles, lines, boxes, polygons • Distance operator: <-> – pg_trgm – supplied module for determining text similarity • similarity('abc', 'ade') computes similarity score • <-> defines distance (opposite of similarity), not defined (in 9.0)
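To make the similarity score and its "opposite" concrete, here is a rough Python approximation of trigram similarity. It assumes a single lowercase-able alphanumeric word and follows the padding rule described in the pg_trgm documentation (two spaces in front, one behind); the real module also handles multi-word strings and non-alphanumeric characters, which this sketch does not attempt. The function names are illustrative.

```python
def trigrams(text):
    """Set of trigrams for a single word, padded with two leading spaces
    and one trailing space after lowercasing."""
    padded = "  " + text.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both words."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def distance(a, b):
    """Distance as the complement of similarity (what <-> expresses)."""
    return 1.0 - similarity(a, b)


print(sorted(trigrams("abc")))
print(similarity("abc", "ade"), distance("abc", "ade"))
```

For 'abc' and 'ade' the only shared trigram is the one made of the two padding spaces plus 'a', out of seven distinct trigrams in total.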
  • 9. PostgreSQL 9.1: KNN-GiST • Let n = size of a table • Can index data that provides a “<->” (distance) operator – Geometric – pg_trgm • “k” = LIMIT clause • Known inefficiencies when k=n and n is small
  • 10. Example: pg_trgm • Data: – List of 1,000,000 names – 700,000 unique – n = 1,000,000 • Indexes: – CREATE INDEX names_name_idx ON names (name) – CREATE INDEX trgm_idx ON names USING gist (name gist_trgm_ops) • k=10 • Displaying query plan / execution time after 10 runs
  • 11. pg_trgm: 9.0 • EXPLAIN ANALYZE SELECT name, similarity(name, 'jon') AS sim FROM names WHERE name % 'jon' ORDER BY sim DESC LIMIT 10;
  • 12. pg_trgm: 9.0
      Limit (cost=2724.95..2724.98 rows=10 width=14) (actual time=192.793..192.794 rows=10 loops=1)
        -> Sort (cost=2724.95..2727.45 rows=1000 width=14) (actual time=192.790..192.791 rows=10 loops=1)
             Sort Key: (similarity(name, 'jon'::text))
             Sort Method: top-N heapsort Memory: 25kB
             -> Bitmap Heap Scan on names (cost=56.47..2703.34 rows=1000 width=14) (actual time=188.836..192.499 rows=865 loops=1)
                  Recheck Cond: (name % 'jon'::text)
                  -> Bitmap Index Scan on trgm_idx (cost=0.00..56.22 rows=1000 width=0) (actual time=188.652..188.652 rows=865 loops=1)
                       Index Cond: (name % 'jon'::text)
      Total runtime: 192.881 ms
  • 13. pg_trgm: 9.1 • EXPLAIN ANALYZE SELECT name, similarity(name, 'jon') AS sim FROM names WHERE name % 'jon' ORDER BY sim DESC LIMIT 10;
  • 14. pg_trgm: 9.1
      Limit (cost=2720.91..2720.93 rows=10 width=14) (actual time=202.022..202.023 rows=10 loops=1)
        -> Sort (cost=2720.91..2723.41 rows=1000 width=14) (actual time=202.020..202.021 rows=10 loops=1)
             Sort Key: (similarity(name, 'jon'::text))
             Sort Method: top-N heapsort Memory: 25kB
             -> Bitmap Heap Scan on names (cost=52.43..2699.30 rows=1000 width=14) (actual time=198.324..201.719 rows=865 loops=1)
                  Recheck Cond: (name % 'jon'::text)
                  -> Bitmap Index Scan on names_trgm_idx (cost=0.00..52.18 rows=1000 width=0) (actual time=198.156..198.156 rows=865 loops=1)
                       Index Cond: (name % 'jon'::text)
      Total runtime: 202.113 ms
  • 15. Comparable? • Seems to be similar – Need to do more research why – anyone? • However, 9.1 offers improvements for LIKE/ILIKE search with pg_trgm
  • 16. LIKE/ILIKE • EXPLAIN ANALYZE SELECT name FROM names WHERE name LIKE '%ata%n';
  • 17. LIKE/ILIKE pg_trgm: 9.0 vs 9.1
      9.0:
      Seq Scan on names (cost=0.00..18717.00 rows=99 width=14) (actual time=0.339..205.659 rows=665 loops=1)
        Filter: (name ~~ '%ata%n'::text)
      Total runtime: 205.743 ms
      9.1:
      Bitmap Heap Scan on names (cost=9.45..369.20 rows=99 width=14) (actual time=122.494..125.967 rows=665 loops=1)
        Recheck Cond: (name ~~ '%ata%n'::text)
        -> Bitmap Index Scan on names_trgm_idx (cost=0.00..9.42 rows=99 width=0) (actual time=121.972..121.972 rows=3551 loops=1)
             Index Cond: (name ~~ '%ata%n'::text)
      Total runtime: 126.065 ms
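The gap in the bitmap plan between the 3551 rows returned by the index scan and the 665 rows left after the recheck reflects a filter-then-verify strategy. The sketch below imitates that idea in Python under simplifying assumptions: an in-memory inverted index from trigram to row ids, no padding, lowercase data, and LIKE patterns without escape characters. All function names and the sample names are hypothetical, and this is not how pg_trgm is implemented internally.

```python
import re
from collections import defaultdict


def word_trigrams(s):
    """All 3-character substrings of s (no padding; a simplification)."""
    return {s[i:i + 3] for i in range(len(s) - 2)}


def build_trigram_index(names):
    """Map each trigram to the set of row ids whose name contains it."""
    index = defaultdict(set)
    for row_id, name in enumerate(names):
        for tri in word_trigrams(name):
            index[tri].add(row_id)
    return index


def like_to_regex(pattern):
    """Translate a LIKE pattern (% and _ wildcards, no escapes) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like_search(names, index, pattern):
    """Return (number of index candidates, names that really match)."""
    # Trigrams every match must contain come from the literal fragments.
    required = set()
    for fragment in re.split(r"[%_]", pattern):
        required |= word_trigrams(fragment)
    if required:
        candidates = set.intersection(*(index.get(t, set()) for t in required))
    else:
        candidates = set(range(len(names)))  # nothing to filter on: full scan
    # Recheck: the index only narrows the search and may return false positives.
    regex = like_to_regex(pattern)
    matches = sorted(i for i in candidates if regex.fullmatch(names[i]))
    return len(candidates), [names[i] for i in matches]


names = ["satan", "catamaran", "data", "natasha", "jon", "atan"]
index = build_trigram_index(names)
print(like_search(names, index, "%ata%n"))
```

Here the index narrows six rows to the five containing 'ata', and the recheck keeps only those that also end in 'n'.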
  • 18. Geometry • Data: – 2,000,000 points, from (0,0) -> (10000, 10000) • Index: – CREATE INDEX geoloc_coord_idx ON geoloc USING gist (coord);
  • 19. Geometry • EXPLAIN ANALYZE SELECT *, coord <-> point(500,500) FROM geoloc ORDER BY coord <-> point(500,500) LIMIT 10;
  • 20. Geometry: 9.0 vs 9.1
      9.0:
      Limit (cost=80958.28..80958.31 rows=10 width=20) (actual time=1035.313..1035.316 rows=10 loops=1)
        -> Sort (cost=80958.28..85958.28 rows=2000000 width=20) (actual time=1035.312..1035.314 rows=10 loops=1)
             Sort Key: ((coord <-> '(500,500)'::point))
             Sort Method: top-N heapsort Memory: 25kB
             -> Seq Scan on geoloc (cost=0.00..37739.00 rows=2000000 width=20) (actual time=0.029..569.501 rows=2000000 loops=1)
      Total runtime: 1035.349 ms
      9.1:
      Limit (cost=0.00..0.81 rows=10 width=20) (actual time=0.576..1.255 rows=10 loops=1)
        -> Index Scan using geoloc_coord_idx on geoloc (cost=0.00..162068.96 rows=2000000 width=20) (actual time=0.575..1.251 rows=10 loops=1)
             Order By: (coord <-> '(500,500)'::point)
      Total runtime: 1.391 ms
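The difference between the two plans is that 9.0 reads all 2,000,000 rows and sorts them, while the 9.1 index scan hands rows back already ordered by distance and stops after ten. A best-first search with a priority queue gives the flavor of that ordered scan. The sketch below is an illustration only: it assumes a flat grid of square cells as a one-level stand-in for a GiST tree, and the names `build_grid`, `cell_min_dist`, and `nearest_first` are invented for the example.

```python
import heapq
from collections import defaultdict
from itertools import islice
from math import dist, floor


def build_grid(points, cell=100.0):
    """Bucket points into square cells: a one-level stand-in for a GiST tree."""
    grid = defaultdict(list)
    for p in points:
        grid[(floor(p[0] / cell), floor(p[1] / cell))].append(p)
    return grid


def cell_min_dist(key, cell, q):
    """Smallest possible distance from q to any point inside the cell."""
    x0, y0 = key[0] * cell, key[1] * cell
    dx = max(x0 - q[0], 0.0, q[0] - (x0 + cell))
    dy = max(y0 - q[1], 0.0, q[1] - (y0 + cell))
    return (dx * dx + dy * dy) ** 0.5


def nearest_first(grid, q, cell=100.0):
    """Yield points in increasing distance from q (best-first search)."""
    # Heap entries are (distance, kind, payload); kind 0 = cell, 1 = point,
    # so at equal distance a cell is expanded before a point is emitted.
    # A real tree would start from the root instead of listing every cell.
    heap = [(cell_min_dist(key, cell, q), 0, key) for key in grid]
    heapq.heapify(heap)
    while heap:
        _, kind, payload = heapq.heappop(heap)
        if kind == 0:
            for p in grid[payload]:
                heapq.heappush(heap, (dist(p, q), 1, p))
        else:
            yield payload


points = [(510.0, 495.0), (100.0, 100.0), (9000.0, 9000.0), (480.0, 530.0)]
grid = build_grid(points)
# Similar in spirit to ORDER BY coord <-> point(500,500) LIMIT 2.
print(list(islice(nearest_first(grid, (500.0, 500.0)), 2)))
```

Because the generator is lazy, taking only the first k results means cells far from the query point are never opened, which is why a small LIMIT benefits the most.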
  • 21. Application Examples • Proximity map search – fast!
  • 22. Drawbacks • Performance benefits are limited when: – LIMIT is close to size of data set and data set is large – Data set is small • Time to build index – High transaction table
  • 23. Conclusions • GiST: “Generalized Search Tree” – index is there, up to developers to define access methods of data types – e.g. yields KNN-GiST • Different types of applications can be built – performance enhancements • Next steps?
  • 24. My Wish List • Further geometric-type support in Postgres – N-dimensional points – “=” operator for point type – (PostGIS is still the champion of complex geometric + geographic data types) • Define “distance” over multiple columns with different types? – SELECT (a.name, a.geocode) <-> (b.name, b.geocode) FROM x a, x b;
  • 25. References • Oleg Bartunov – “Efficient K-nearest neighbour search in PostgreSQL” (http://www.sai.msu.su/~megera/postgres/talks/pgday-2010.pdf) • Oleg Bartunov and Teodor Sigaev for work on KNN-GiST and notes on pg_trgm (http://developer.postgresql.org/pgdocs/postgres/pgtrgm.html) • Hubert “depesz” Lubaczewski – patterned benchmarks off of his work - http://www.depesz.com/index.php/2010/12/11/waiting-for-9-1-knngist/
  • 26. Contact • jonathan.katz@excoventures.com • @jkatz05 • Feedback Please! – http://2011.pgconf.eu/feedback