Accelerating Local Search With PostgreSQL 9.1
Jonathan S. Katz
Exco Ventures
PGConf.eu 2011, Oct 21, 2011

Local Search?
• Not necessarily location-based search on places
• “How close are two entities to one another?”
• “What are the closest entities to me?”
• “What are my nearest neighbors?”

Nearest Neighbor Overview
• Want to know “how similar” objects are relative to each other
  – What are the top “k” choices near me?
• Need to define a “metric” for similarity
  – “distance”

K-Nearest Neighbor
• Given a collection of n objects
• When trying to classify an unknown object
  – compute the distance between the unknown object and all known objects
  – find the k (k ≥ 1) closest objects to the unknown object
  – classify the object based on the class of the k closest objects
• When k=1, the unknown object is given the same classification as the object it is closest to

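The classification steps above can be sketched directly in SQL. This is a minimal illustration, not code from the talk: the `known_objects` table, its columns, and the sample rows are hypothetical, and the distance is the built-in point-to-point `<->` operator.

```sql
-- Hypothetical table of labeled 2-D points (all names are illustrative).
CREATE TEMP TABLE known_objects (
    id    serial PRIMARY KEY,
    coord point NOT NULL,
    class text  NOT NULL
);

INSERT INTO known_objects (coord, class) VALUES
    (point(1, 1), 'red'),
    (point(1, 2), 'red'),
    (point(2, 1), 'red'),
    (point(8, 8), 'blue'),
    (point(9, 8), 'blue');

-- Classify the unknown object at (2, 2) with k = 3:
-- take the 3 nearest known objects, then pick the most common class.
SELECT class, count(*) AS votes
FROM (
    SELECT class
    FROM known_objects
    ORDER BY coord <-> point(2, 2)
    LIMIT 3
) AS nearest
GROUP BY class
ORDER BY votes DESC
LIMIT 1;
```

With these five rows the three nearest neighbors of (2, 2) are all 'red', so the query returns 'red' with 3 votes. Ties between classes are not broken deterministically in this sketch.
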
K=1 Example
Voronoi diagram of order 1 can be used to make k=1 NN queries

Just a bit more theory…
• Voronoi diagrams of order 1 are created in O(n log n) time
  – Looks similar to…? :-)
• Queried in O(log n) time with a point-location structure
• Therefore:
  – Pay the time penalty to build the index
  – Query against the index quickly

Applications
• Geolocation + Optimizing Positioning
• Classification
• Similarity
• Recommendation systems
• Content-based image retrieval
• etc.

So what about PostgreSQL?
• As of PostgreSQL 9.0
  – supports geometric types and distances
    • Points, circles, lines, boxes, polygons
    • Distance operator: <->
  – pg_trgm: supplied module for determining text similarity
    • similarity('abc', 'ade') computes a similarity score
    • <-> defines distance (the opposite of similarity); not defined in 9.0

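A quick way to see what pg_trgm measures is to inspect the trigrams and scores directly. This is a small sketch, assuming the module is installed with CREATE EXTENSION (available from 9.1; on 9.0 the contrib SQL script is loaded instead). The literals are cast to text so that the `<->` operator resolves unambiguously.

```sql
-- Install the module in the current database (9.1 and later).
CREATE EXTENSION IF NOT EXISTS pg_trgm;

SELECT show_trgm('abc')            AS trigrams,  -- the trigram set for 'abc'
       similarity('abc', 'ade')    AS sim,       -- shared trigrams / all trigrams
       'abc'::text <-> 'ade'::text AS dist;      -- 1 - similarity; needs 9.1+
```

Each word yields four trigrams after padding and the two words share one of them, so the similarity is 1/7 (roughly 0.14) and the distance is roughly 0.86.
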
PostgreSQL 9.1: KNN-GiST
• Let n = size of a table
• Can index data that provides a “<->” (distance) operator
  – Geometric
  – pg_trgm
• “k” = LIMIT clause
• Known inefficiencies when k=n and n is small

Example: pg_trgm
• Data:
  – List of 1,000,000 names, 700,000 unique
  – n = 1,000,000
• Indexes:
  – CREATE INDEX names_name_idx ON names (name)
  – CREATE INDEX trgm_idx ON names USING gist (name gist_trgm_ops)
• k=10
• Displaying query plan / execution time after 10 runs

pg_trgm: 9.0
EXPLAIN ANALYZE
SELECT name, similarity(name, 'jon') AS sim
FROM names
WHERE name % 'jon'
ORDER BY sim DESC
LIMIT 10;

pg_trgm: 9.0
Limit  (cost=2724.95..2724.98 rows=10 width=14) (actual time=192.793..192.794 rows=10 loops=1)
  ->  Sort  (cost=2724.95..2727.45 rows=1000 width=14) (actual time=192.790..192.791 rows=10 loops=1)
        Sort Key: (similarity(name, 'jon'::text))
        Sort Method: top-N heapsort  Memory: 25kB
        ->  Bitmap Heap Scan on names  (cost=56.47..2703.34 rows=1000 width=14) (actual time=188.836..192.499 rows=865 loops=1)
              Recheck Cond: (name % 'jon'::text)
              ->  Bitmap Index Scan on trgm_idx  (cost=0.00..56.22 rows=1000 width=0) (actual time=188.652..188.652 rows=865 loops=1)
                    Index Cond: (name % 'jon'::text)
Total runtime: 192.881 ms

pg_trgm: 9.1
EXPLAIN ANALYZE
SELECT name, similarity(name, 'jon') AS sim
FROM names
WHERE name % 'jon'
ORDER BY sim DESC
LIMIT 10;

pg_trgm: 9.1
Limit  (cost=2720.91..2720.93 rows=10 width=14) (actual time=202.022..202.023 rows=10 loops=1)
  ->  Sort  (cost=2720.91..2723.41 rows=1000 width=14) (actual time=202.020..202.021 rows=10 loops=1)
        Sort Key: (similarity(name, 'jon'::text))
        Sort Method: top-N heapsort  Memory: 25kB
        ->  Bitmap Heap Scan on names  (cost=52.43..2699.30 rows=1000 width=14) (actual time=198.324..201.719 rows=865 loops=1)
              Recheck Cond: (name % 'jon'::text)
              ->  Bitmap Index Scan on names_trgm_idx  (cost=0.00..52.18 rows=1000 width=0) (actual time=198.156..198.156 rows=865 loops=1)
                    Index Cond: (name % 'jon'::text)
Total runtime: 202.113 ms

Comparable?
• Seems to be similar
  – Need to do more research why. Anyone?
• However, 9.1 offers improvements for LIKE/ILIKE search with pg_trgm

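One possible explanation, offered as an assumption rather than a confirmed diagnosis of these benchmarks: the query filters with `%` and sorts by `similarity()` descending, so it never orders by the `<->` operator, and the index-ordered (KNN) scan cannot apply. The form the pg_trgm documentation describes for index-assisted nearest-neighbor search orders by distance instead. A sketch against the same `names` table:

```sql
-- Order by trigram distance so a GiST trigram index can return
-- rows nearest-first (9.1 or later; assumes names.name is text).
EXPLAIN ANALYZE
SELECT name, name <-> 'jon' AS dist
FROM names
ORDER BY dist
LIMIT 10;
```

If the index-ordered scan is chosen, the plan should show an Index Scan on the trigram index with an "Order By" line rather than a Sort node. This variant was not measured in the talk, so no timing is claimed here.
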
LIKE/ILIKE
EXPLAIN ANALYZE
SELECT name
FROM names
WHERE name LIKE '%ata%n';

LIKE/ILIKE pg_trgm: 9.0 vs 9.1
9.0:
Seq Scan on names  (cost=0.00..18717.00 rows=99 width=14) (actual time=0.339..205.659 rows=665 loops=1)
  Filter: (name ~~ '%ata%n'::text)
Total runtime: 205.743 ms

9.1:
Bitmap Heap Scan on names  (cost=9.45..369.20 rows=99 width=14) (actual time=122.494..125.967 rows=665 loops=1)
  Recheck Cond: (name ~~ '%ata%n'::text)
  ->  Bitmap Index Scan on names_trgm_idx  (cost=0.00..9.42 rows=99 width=0) (actual time=121.972..121.972 rows=3551 loops=1)
        Index Cond: (name ~~ '%ata%n'::text)
Total runtime: 126.065 ms

Geometry
• Data:
  – 2,000,000 points, from (0,0) -> (10000, 10000)
• Index:
  – CREATE INDEX geoloc_coord_idx ON geoloc USING gist (coord);

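The slides do not show how the `geoloc` table was built. One way to produce a comparable data set is sketched below; the `id` column and the uniform random distribution are assumptions made for illustration, not details from the talk.

```sql
-- Hypothetical schema: only the coord column is named in the slides.
CREATE TABLE geoloc (
    id    serial PRIMARY KEY,
    coord point NOT NULL
);

-- 2,000,000 points spread uniformly over (0,0) .. (10000,10000).
INSERT INTO geoloc (coord)
SELECT point(random() * 10000, random() * 10000)
FROM generate_series(1, 2000000);

-- The index from the slide, then fresh planner statistics.
CREATE INDEX geoloc_coord_idx ON geoloc USING gist (coord);
ANALYZE geoloc;
```

Building the index after the bulk load, rather than before it, keeps the load itself fast; the build cost is the "time penalty" mentioned earlier.
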
Geometry
EXPLAIN ANALYZE
SELECT *, coord <-> point(500,500)
FROM geoloc
ORDER BY coord <-> point(500,500)
LIMIT 10;

Geometry: 9.0 vs 9.1
9.0:
Limit  (cost=80958.28..80958.31 rows=10 width=20) (actual time=1035.313..1035.316 rows=10 loops=1)
  ->  Sort  (cost=80958.28..85958.28 rows=2000000 width=20) (actual time=1035.312..1035.314 rows=10 loops=1)
        Sort Key: ((coord <-> '(500,500)'::point))
        Sort Method: top-N heapsort  Memory: 25kB
        ->  Seq Scan on geoloc  (cost=0.00..37739.00 rows=2000000 width=20) (actual time=0.029..569.501 rows=2000000 loops=1)
Total runtime: 1035.349 ms

9.1:
Limit  (cost=0.00..0.81 rows=10 width=20) (actual time=0.576..1.255 rows=10 loops=1)
  ->  Index Scan using geoloc_coord_idx on geoloc  (cost=0.00..162068.96 rows=2000000 width=20) (actual time=0.575..1.251 rows=10 loops=1)
        Order By: (coord <-> '(500,500)'::point)
Total runtime: 1.391 ms

Application Examples
• Proximity map search: fast!

Drawbacks
• Performance benefits are limited when:
  – LIMIT is close to the size of the data set and the data set is large
  – Data set is small
• Time to build index
  – A concern for high-transaction tables

Conclusions
• GiST (“Generalized Search Tree”): the index is there; it is up to developers to define access methods for data types
  – e.g. yields KNN-GiST
• Different types of applications can be built, with performance enhancements
• Next steps?

My Wish List
• Further geometric-type support in Postgres
  – N-dimensional points
  – '=' operator for point type
  – (PostGIS still champion of complex geometric + geographic data types)
• Define “distance” over multiple columns with different types?
  – SELECT (a.name, a.geocode) <-> (b.name, b.geocode) FROM x a, x b;

References
• Oleg Bartunov: “Efficient K-nearest neighbour search in PostgreSQL” (http://www.sai.msu.su/~megera/postgres/talks/pgday-2010.pdf)
• Oleg Bartunov and Teodor Sigaev for work on KNN-GiST and notes on pg_trgm (http://developer.postgresql.org/pgdocs/postgres/pgtrgm.html)
• Hubert “depesz” Lubaczewski: benchmarks patterned off of his work, http://www.depesz.com/index.php/2010/12/11/waiting-for-9-1-knngist/

Contact
• jonathan.katz@excoventures.com
• @jkatz05
• Feedback Please!
  – http://2011.pgconf.eu/feedback

Introduction to CHERI technology - Cybersecurity
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 

Accelerating Local Search with PostgreSQL (KNN-Search)

  • 1. Accelerating Local Search With PostgreSQL 9.1. Jonathan S. Katz, Exco Ventures. PGConf.eu 2011, Oct 21, 2011
  • 2. Local Search? • Not necessarily about the location of places • “How close are two entities to one another?” • “What are the closest entities to me?” • “What are my nearest neighbors?”
  • 3. Nearest Neighbor Overview • Want to know “how similar” objects are relative to each other – What are the top “k” choices near me? • Need to define a “metric” for similarity – “distance”
  • 4. K-Nearest Neighbor • Given a collection of n objects • When trying to classify an unknown object – compute the distance from the unknown object to every known object – find the k (k ≥ 1) closest objects to the unknown object – classify the object based on the classes of the k closest objects • When k=1, the unknown object is given the same classification as the object it is closest to
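The classification steps on this slide can be sketched in a few lines of Python. This is a minimal brute-force illustration, not anything from the talk: the function name `knn_classify`, the 2-D point representation, Euclidean distance as the metric, and majority voting among the k neighbors are all assumptions made for the example.

```python
import heapq
import math
from collections import Counter

def knn_classify(known, query, k=1):
    """Classify `query` from labelled points.

    known: list of ((x, y), label) pairs (hypothetical layout for this sketch)
    query: (x, y) point to classify
    k:     number of neighbors to consult (k >= 1)
    """
    # Compute the distance from the query to every known object and keep the k closest.
    nearest = heapq.nsmallest(k, known, key=lambda item: math.dist(item[0], query))
    # Majority vote over the labels of the k closest objects.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

known = [((0, 0), "a"), ((1, 1), "a"), ((10, 10), "b")]
print(knn_classify(known, (9, 9), k=1))
print(knn_classify(known, (9, 9), k=3))
```

Every query scans all n known objects, which is the cost an index is meant to avoid.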
  • 5. K=1 Example: a Voronoi diagram of order 1 can be used to make k=1 NN queries
  • 6. Just a bit more theory… • Voronoi diagrams of order 1 are created in O(n log n) time – Looks similar to…? :-) • Queried in O(1) time • Therefore: – Pay the time penalty to build the index – Query against index quickly
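The "build once, query quickly" trade-off can be illustrated with a one-dimensional analogue, sketched below under stated assumptions. On a line, the order-1 Voronoi cells are intervals whose boundaries are the midpoints between neighboring sites, so the index is a sorted list plus those midpoints. The names `build_index` and `nearest` are hypothetical, the input is assumed non-empty, and in this sketch a lookup is a binary search rather than the constant-time query quoted on the slide.

```python
from bisect import bisect_left

def build_index(sites):
    """Pay the O(n log n) cost once: sort the sites and compute cell boundaries."""
    ordered = sorted(sites)
    # Each boundary is the midpoint between two neighboring sites.
    boundaries = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    return ordered, boundaries

def nearest(index, q):
    """Find the site whose cell contains q using a binary search over the boundaries."""
    ordered, boundaries = index
    return ordered[bisect_left(boundaries, q)]

index = build_index([10, 1, 4])
print(nearest(index, 3))
print(nearest(index, 8))
```

A query that falls exactly on a boundary is resolved to the smaller site here; that tie-breaking rule is a choice of this sketch.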
  • 7. Applications • Geolocation + Optimizing Positioning • Classification • Similarity • Recommendation systems • Content-based image retrieval • etc.
  • 8. So what about PostgreSQL? • As of PostgreSQL 9.0 – supports geometric types and distances • Points, circles, lines, boxes, polygons • Distance operator: <-> – pg_trgm – supplied module for determining text similarity • similarity('abc', 'ade') computes similarity score • <-> defines distance (opposite of similarity), not defined (in 9.0)
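To make the similarity score concrete, here is a rough Python approximation of trigram similarity. It is a sketch, not pg_trgm's implementation: it assumes the behavior described in the pg_trgm documentation (lowercased words, each padded with two leading spaces and one trailing space, similarity as shared trigrams divided by total distinct trigrams, and distance as one minus similarity). The function names are chosen for this example only.

```python
import re

def trigrams(text):
    """Return the set of trigrams of `text`, padding each word as the pg_trgm docs describe."""
    result = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two spaces before, one after
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result

def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def distance(a, b):
    """Distance as the opposite of similarity."""
    return 1.0 - similarity(a, b)

print(similarity("abc", "ade"))
print(distance("abc", "ade"))
```

Under these assumptions, 'abc' and 'ade' each produce four trigrams and share only the leading one, so the score is 1/7.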
  • 9. PostgreSQL 9.1: KNN-GiST • Let n = size of a table • Can index data that provides a “<->” (distance) operator – Geometric – pg_trgm • “k” = LIMIT clause • Known inefficiencies when k=n and n is small
  • 10. Example: pg_trgm • Data: – List of 1,000,000 names – 700,000 unique – n = 1,000,000 • Indexes: – CREATE INDEX names_name_idx ON names (name); – CREATE INDEX trgm_idx ON names USING gist (name gist_trgm_ops); • k=10 • Displaying query plan / execution time after 10 runs
  • 11. pg_trgm: 9.0
      EXPLAIN ANALYZE SELECT name, similarity(name, 'jon') AS sim FROM names WHERE name % 'jon' ORDER BY sim DESC LIMIT 10;
  • 12. pg_trgm: 9.0
      Limit (cost=2724.95..2724.98 rows=10 width=14) (actual time=192.793..192.794 rows=10 loops=1)
        -> Sort (cost=2724.95..2727.45 rows=1000 width=14) (actual time=192.790..192.791 rows=10 loops=1)
             Sort Key: (similarity(name, 'jon'::text))
             Sort Method: top-N heapsort Memory: 25kB
             -> Bitmap Heap Scan on names (cost=56.47..2703.34 rows=1000 width=14) (actual time=188.836..192.499 rows=865 loops=1)
                  Recheck Cond: (name % 'jon'::text)
                  -> Bitmap Index Scan on trgm_idx (cost=0.00..56.22 rows=1000 width=0) (actual time=188.652..188.652 rows=865 loops=1)
                       Index Cond: (name % 'jon'::text)
      Total runtime: 192.881 ms
  • 13. pg_trgm: 9.1
      EXPLAIN ANALYZE SELECT name, similarity(name, 'jon') AS sim FROM names WHERE name % 'jon' ORDER BY sim DESC LIMIT 10;
  • 14. pg_trgm: 9.1
      Limit (cost=2720.91..2720.93 rows=10 width=14) (actual time=202.022..202.023 rows=10 loops=1)
        -> Sort (cost=2720.91..2723.41 rows=1000 width=14) (actual time=202.020..202.021 rows=10 loops=1)
             Sort Key: (similarity(name, 'jon'::text))
             Sort Method: top-N heapsort Memory: 25kB
             -> Bitmap Heap Scan on names (cost=52.43..2699.30 rows=1000 width=14) (actual time=198.324..201.719 rows=865 loops=1)
                  Recheck Cond: (name % 'jon'::text)
                  -> Bitmap Index Scan on names_trgm_idx (cost=0.00..52.18 rows=1000 width=0) (actual time=198.156..198.156 rows=865 loops=1)
                       Index Cond: (name % 'jon'::text)
      Total runtime: 202.113 ms
  • 15. Comparable? • Seems to be similar – Need to do more research why – anyone? • However, 9.1 offers improvements for LIKE/ILIKE search with pg_trgm
  • 16. LIKE/ILIKE
      EXPLAIN ANALYZE SELECT name FROM names WHERE name LIKE '%ata%n';
  • 17. LIKE/ILIKE pg_trgm: 9.0 vs 9.1
      9.0:
      Seq Scan on names (cost=0.00..18717.00 rows=99 width=14) (actual time=0.339..205.659 rows=665 loops=1)
        Filter: (name ~~ '%ata%n'::text)
      Total runtime: 205.743 ms
      9.1:
      Bitmap Heap Scan on names (cost=9.45..369.20 rows=99 width=14) (actual time=122.494..125.967 rows=665 loops=1)
        Recheck Cond: (name ~~ '%ata%n'::text)
        -> Bitmap Index Scan on names_trgm_idx (cost=0.00..9.42 rows=99 width=0) (actual time=121.972..121.972 rows=3551 loops=1)
             Index Cond: (name ~~ '%ata%n'::text)
      Total runtime: 126.065 ms
  • 18. Geometry • Data: – 2,000,000 points, from (0,0) to (10000, 10000) • Index: – CREATE INDEX geoloc_coord_idx ON geoloc USING gist (coord);
  • 19. Geometry
      EXPLAIN ANALYZE SELECT *, coord <-> point(500,500) FROM geoloc ORDER BY coord <-> point(500,500) LIMIT 10;
  • 20. Geometry: 9.0 vs 9.1
      9.0:
      Limit (cost=80958.28..80958.31 rows=10 width=20) (actual time=1035.313..1035.316 rows=10 loops=1)
        -> Sort (cost=80958.28..85958.28 rows=2000000 width=20) (actual time=1035.312..1035.314 rows=10 loops=1)
             Sort Key: ((coord <-> '(500,500)'::point))
             Sort Method: top-N heapsort Memory: 25kB
             -> Seq Scan on geoloc (cost=0.00..37739.00 rows=2000000 width=20) (actual time=0.029..569.501 rows=2000000 loops=1)
      Total runtime: 1035.349 ms
      9.1:
      Limit (cost=0.00..0.81 rows=10 width=20) (actual time=0.576..1.255 rows=10 loops=1)
        -> Index Scan using geoloc_coord_idx on geoloc (cost=0.00..162068.96 rows=2000000 width=20) (actual time=0.575..1.251 rows=10 loops=1)
             Order By: (coord <-> '(500,500)'::point)
      Total runtime: 1.391 ms
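As a quick way to compare the three benchmarks, the short script below divides the 9.0 total runtime by the 9.1 total runtime for each query. The runtimes are copied from the plans in this transcript; the dictionary labels and the `speedup` helper are names made up for this sketch, and a single reported run per version is not a rigorous benchmark.

```python
# Total runtimes in milliseconds (9.0, 9.1), copied from the plans in this transcript.
runtimes_ms = {
    "pg_trgm similarity": (192.881, 202.113),
    "LIKE '%ata%n'": (205.743, 126.065),
    "geometry KNN": (1035.349, 1.391),
}

def speedup(before_ms, after_ms):
    """Ratio of the older runtime to the newer one; values above 1 mean 9.1 was faster."""
    return before_ms / after_ms

for label, (v90, v91) in runtimes_ms.items():
    print(f"{label}: {speedup(v90, v91):.2f}x")
```

The geometry query improves by a factor of roughly 744, the LIKE query by about 1.6, and the similarity query is marginally slower in 9.1, which matches the "Comparable?" slide.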
  • 21. Application Examples • Proximity map search – fast!
  • 22. Drawbacks • Performance benefits are limited when: – LIMIT is close to size of data set and data set is large – Data set is small • Time to build index – High-transaction tables
  • 23. Conclusions • GiST: “Generalized Search Tree” – index is there, up to developers to define access methods of data types – e.g. yields KNN-GiST • Different types of applications can be built – performance enhancements • Next steps?
  • 24. My Wish List • Further geometric-type support in Postgres – N-dimensional points – '=' operator for point type – (PostGIS still champion of complex geometric + geographic data types) • Define “distance” over multiple columns with different types? – SELECT (a.name, a.geocode) <-> (b.name, b.geocode) FROM x a, x b;
  • 25. References • Oleg Bartunov – “Efficient K-nearest neighbour search in PostgreSQL” (http://www.sai.msu.su/~megera/postgres/talks/pgday-2010.pdf) • Oleg Bartunov and Teodor Sigaev for work on KNN-GiST and notes on pg_trgm (http://developer.postgresql.org/pgdocs/postgres/pgtrgm.html) • Hubert “depesz” Lubaczewski – patterned benchmarks off of his work - http://www.depesz.com/index.php/2010/12/11/waiting-for-9-1-knngist/
  • 26. Contact • jonathan.katz@excoventures.com • @jkatz05 • Feedback Please! – http://2011.pgconf.eu/feedback