This document summarizes Giuseppe Broccolo's presentation on using indexes on geospatial databases with PostgreSQL and PostGIS. It discusses the different types of indexes available in PostgreSQL for spatial data, including points, geometries, and point clouds. It provides examples of creating indexes on point datasets and measuring performance improvements for nearest neighbor searches and bounding box queries. For large point cloud datasets, it demonstrates that using indexes and patches can greatly speed up spatial queries compared to full table scans.
gbroccolo - Use of Indexes on geospatial databases with PostgreSQL - FOSS4G.EU 2015
1. 2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it
FOSS4G.EU 2015
Como, Politecnico di Milano
July 14th-17th, 2015
Use of indexes on geospatial databases with the PostgreSQL DBMS
Giuseppe Broccolo
www.2ndquadrant.it
$~# whoami
• PostgreSQL and PostGIS consultant
– Development, Replication, Disaster Recovery, pre-production Benchmark, Remote DBA, 24/7 Support, Training
Outline
• Indexes on geospatial DBs
• What does PostgreSQL offer?
• Examples of usage:
– Points in PostgreSQL
– Points in PostGIS extension
– (LiDAR) points in PointCloud extension
Indexes on geospatial databases
• Binary structures used to speed up access to data:
– In the case of trees: balanced or unbalanced structures of nodes
– Theoretical performance:
• R/W: ~O(log N) Size: ~O(N)
– Algorithms are defined not by ordering/comparison operators but by placement operators
– Index nodes are defined starting from the MBR (minimum bounding rectangle) containing the whole dataset
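As a rough illustration of the last point, the sketch below computes the MBR of a small 2D point set and splits it into four quadrants, which is one possible placement rule for child nodes. The helper names `mbr` and `split_in_quadrants` are hypothetical and are not part of PostgreSQL or PostGIS.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbr(points: Iterable[Point]) -> Box:
    """Return the minimum bounding rectangle of a non-empty 2D point set."""
    pts = list(points)
    if not pts:
        raise ValueError("the MBR of an empty point set is undefined")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs), max(ys))


def split_in_quadrants(box: Box) -> List[Box]:
    """Split a box into four equal quadrants (assumed placement rule)."""
    xmin, ymin, xmax, ymax = box
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    return [
        (xmin, ymin, xmid, ymid),  # lower-left
        (xmid, ymin, xmax, ymid),  # lower-right
        (xmin, ymid, xmid, ymax),  # upper-left
        (xmid, ymid, xmax, ymax),  # upper-right
    ]


sample = [(0.0, 0.0), (2.0, 4.0), (1.0, 1.0)]
root = mbr(sample)                   # box enclosing the whole dataset
children = split_in_quadrants(root)  # candidate boxes for the child nodes
```

Real index implementations choose their own split strategy; the equal-quadrant split is only the simplest case to picture.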
MBR
Balanced:
● R-tree, etc.
Unbalanced:
● Kd-tree, Quad-tree, etc.
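To make "unbalanced" concrete, here is a minimal point quad-tree sketch in which clustered points push some leaves deeper than others. The `QuadNode` class, its `capacity` and `max_depth` parameters, and the sample points are all assumptions made for illustration; this is not how SP-GiST is implemented internally.

```python
class QuadNode:
    """A point quad-tree node covering the box (xmin, ymin, xmax, ymax)."""

    def __init__(self, box, capacity=1, depth=0, max_depth=16):
        self.box = box
        self.capacity = capacity    # points a leaf may hold before splitting
        self.depth = depth
        self.max_depth = max_depth  # guard against endless splits on duplicates
        self.points = []
        self.children = None        # None while the node is a leaf

    def _midpoint(self):
        xmin, ymin, xmax, ymax = self.box
        return (xmin + xmax) / 2, (ymin + ymax) / 2

    def _child_for(self, p):
        # Quadrant index: bit 0 set for the right half, bit 1 for the upper half.
        xm, ym = self._midpoint()
        return self.children[(1 if p[0] >= xm else 0) + (2 if p[1] >= ym else 0)]

    def _split(self):
        xmin, ymin, xmax, ymax = self.box
        xm, ym = self._midpoint()
        boxes = [(xmin, ymin, xm, ym), (xm, ymin, xmax, ym),
                 (xmin, ym, xm, ymax), (xm, ym, xmax, ymax)]
        self.children = [
            QuadNode(b, self.capacity, self.depth + 1, self.max_depth)
            for b in boxes
        ]
        pending, self.points = self.points, []
        for q in pending:  # push the stored points down into the new leaves
            self._child_for(q).insert(q)

    def insert(self, p):
        if self.children is not None:
            self._child_for(p).insert(p)
            return
        self.points.append(p)
        if len(self.points) > self.capacity and self.depth < self.max_depth:
            self._split()

    def leaf_depths(self):
        if self.children is None:
            return [self.depth]
        return [d for c in self.children for d in c.leaf_depths()]


tree = QuadNode((0.0, 0.0, 1.0, 1.0))
for pt in [(0.1, 0.1), (0.2, 0.2), (0.9, 0.9)]:  # two clustered points, one isolated
    tree.insert(pt)
depths = tree.leaf_depths()  # leaves sit at different depths: the tree is unbalanced
```

A balanced structure such as an R-tree would instead keep all leaves at the same depth, at the cost of overlapping node boxes.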
What PostgreSQL offers
• “In core” 2D geometric (not geographic) datatypes
– Fixed resolution: double precision
– point, circle, box
– @-@, @@, <->, &&, <<, >>, <<|, |>>, ...
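For readers unfamiliar with these operators, the following sketch mimics, in plain Python, what three of them compute on points and boxes: distance (`<->`), point contained in box (`<@`), and box overlap (`&&`). The function names are hypothetical, and treating boundaries as inclusive is an assumption of this sketch.

```python
import math


def distance(p, q):
    """Analogue of point <-> point: Euclidean distance."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def point_in_box(p, corner_a, corner_b):
    """Analogue of point <@ box(corner_a, corner_b); boundary counted as inside."""
    xmin, xmax = sorted((corner_a[0], corner_b[0]))
    ymin, ymax = sorted((corner_a[1], corner_b[1]))
    return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax


def boxes_overlap(a, b):
    """Analogue of box && box for boxes given as (xmin, ymin, xmax, ymax)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


d = distance((0.0, 0.0), (3.0, 4.0))
inside = point_in_box((0.5, 0.5), (0.4, 0.4), (0.6, 0.6))
outside = point_in_box((0.7, 0.5), (0.4, 0.4), (0.6, 0.6))
overlap = boxes_overlap((0.0, 0.0, 1.0, 1.0), (0.5, 0.5, 2.0, 2.0))
```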
• PostGIS extension:
– geometry, geography
– <@, @>, &&, <<, >>, <<|, |>>, ...
– ST_Length(), ST_Distance(), ...
Tree indexes in PostgreSQL
• Balanced indexes
– B-Tree
– GIN (Generalized Inverted Index) – fast access to data
– GiST (Generalized Search Tree) – good concurrency, “lossy”
• Supports kNN searches
• Unbalanced index
– SP-GiST (Space-Partitioned GiST) – low I/O
• Introduced in PostgreSQL 9.2
• Usable with PostGIS only in versions after 2.1
Working with 2D point sets
• The test environment: Vagrant VM (Ubuntu 14.04)
– Single virtual core 2.26GHz, RAM 512MB, 7.2k disk
• PostgreSQL 9.4 + PostGIS 2.1
– postgresql.conf: default
• ~10M points
– Nearest Neighbours search
– Bounding Box inclusion
Index creation on the 2D sample
– point datatype supports both GiST and SP-GiST indexing
=# CREATE INDEX idx_gist_point ON many_point USING gist(point);
=# CREATE INDEX idx_spgist_point ON many_point USING spgist(point);
– geometry(point,0) datatype supports only GiST indexing
=# CREATE INDEX idx_gist_geom ON many_geom USING gist(point);
=# CREATE INDEX idx_spgist ON many_geom USING spgist(point);
ERROR: data type geometry has no default operator class for access method "spgist"
HINT: You must specify an operator class for the index or define a default operator class for the data type.
index              index size   table size   build time
idx_gist_point     715MB        653MB        214s
idx_spgist_point   437MB        653MB        137s
idx_gist_geom      523MB        501MB        290s
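A quick calculation on the figures above, as a sketch: the index-to-table size ratio for each index, and the SP-GiST build time relative to GiST on the same point table. The dictionary layout and variable names are illustrative only; the numbers are those reported in the table.

```python
# (index size in MB, table size in MB, build time in seconds), from the table above
measurements = {
    "idx_gist_point": (715, 653, 214),
    "idx_spgist_point": (437, 653, 137),
    "idx_gist_geom": (523, 501, 290),
}

# Index size as a fraction of the size of the table it indexes.
size_ratio = {
    name: index_mb / table_mb
    for name, (index_mb, table_mb, _) in measurements.items()
}

# SP-GiST build time relative to GiST on the same many_point table.
spgist_vs_gist_time = (
    measurements["idx_spgist_point"][2] / measurements["idx_gist_point"][2]
)

for name, ratio in size_ratio.items():
    print(f"{name}: index is {ratio:.2f}x the table size")
print(f"SP-GiST build time is {spgist_vs_gist_time:.2f}x the GiST build time")
```

On these numbers both GiST indexes are slightly larger than their tables (roughly 1.09x and 1.04x), while the SP-GiST index is about two thirds of the table size and builds in roughly 64% of the GiST time.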
Nearest Neighbours search (2D)
– point
SELECT *
FROM many_point
ORDER BY point(0.5, 0.5) <-> point LIMIT 10;
– geometry(point,0)
SELECT *
FROM many_geom
ORDER BY ST_MakePoint(0.5, 0.5) <-> geom LIMIT 10;
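Without an index, an `ORDER BY ... <-> ... LIMIT 10` query has to compute the distance to every row before it can return the ten closest. The sketch below reproduces that brute-force strategy in Python on randomly generated points; the function names, the seed and the dataset size are assumptions chosen for illustration and say nothing about how PostgreSQL executes the query internally.

```python
import heapq
import math
import random


def knn_sort(points, query, k):
    """Full scan + sort: compute every distance, sort them all, keep the first k."""
    return sorted(points, key=lambda p: math.dist(p, query))[:k]


def knn_heap(points, query, k):
    """Full scan with a bounded heap: still visits every point, but avoids a full sort."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, query))


rng = random.Random(42)  # fixed seed so the run is repeatable
points = [(rng.random(), rng.random()) for _ in range(10_000)]
query = (0.5, 0.5)

nearest = knn_sort(points, query, 10)
same_answer = nearest == knn_heap(points, query, 10)
```

Both variants touch all N points. A kNN-capable index avoids this by visiting only the tree nodes whose boxes can still contain a closer point.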
• Query timing (without & with indexes):
datatype            planner strategy              exec. time
point               Seq. Scan + Sort              7.3s
point               Index Scan (idx_gist_point)   52ms
geometry(point,0)   Seq. Scan + Sort              17.2s
geometry(point,0)   Index Scan (idx_gist_geom)    18ms
Bounding Box inclusion (2D)
– point
SELECT *
FROM many_point
WHERE point <@ box(point(0.4, 0.4), point(0.6, 0.6));
– geometry(point,0)
SELECT *
FROM many_geom
WHERE point && ST_MakeBox2D(ST_MakePoint(0.4, 0.4),
ST_MakePoint(0.6, 0.6));
• Query timing (without & with indexes):
datatype            planner strategy                exec. time
point               Seq. Scan + <@                  5.7s
point               Index Scan (idx_spgist_point)   0.4s
geometry(point,0)   Seq. Scan + &&                  2.0s
geometry(point,0)   Index Scan (idx_gist_geom)      0.7s
Unbalanced indexes intrinsically provide boxed samples in their nodes, which are used in BB inclusion!
Working with (many) 3D points in PostgreSQL
• The OpenGeo suite (Boundless – P. Ramsey)
– Includes the postgis and pointcloud extensions
• Casting between the two point datatypes is allowed
• pointcloud can group points into patches to reduce the overall data size
– No packages available to work with PostgreSQL 9.4
– Can import LiDAR data from .LAS files
http://suite.opengeo.org/4.1/whatsnew.html
http://suite.opengeo.org/opengeo-docs/dataadmin/pointcloud/loadingdata.html#loading-with-pdal
An example of usage: a 1G-point cloud
• The test environment:
– 16GB RAM, 1TB RAID1 storage, 8 CPU @3.3GHz, PostgreSQL 9.3
• Use the pointcloud extension
– one point → one record
• Search for points inside a BB and for NN
Point record fields (sizes from the schema figure): 4B, 4B, 4B, 2B
http://suite.opengeo.org/opengeo-docs/dataadmin/pointcloud/schemas.html
Build the index
table size   GiST index size   building time
56GB         59GB              6h
CREATE INDEX pc_gist_idx ON pcpoints USING gist(Geometry(pt));
You have to cast to the PostGIS point datatype to use a GiST index.
BB inclusion with a 1G-point cloud
included points   execution time (no index)   execution time (with index)
1M                798s                        208ms
10M               -                           9.27s
100M              -                           99.7s
300M              -                           682s
SELECT * FROM pcpoints
WHERE Geometry(pt) &&
ST_SetSRID(ST_3DMakeBox(ST_MakePoint(0, 0, 100),
ST_MakePoint(100, 100, 500)), 4326);
The index is always used!
BB inclusion with a 1G-point cloud using patches
WITH sel AS (
SELECT PC_Explode(pa) AS pc FROM pcpatch
WHERE ST_SetSRID(ST_GeomFromEWKB(PC_Envelope(pa)), 4326) &&
ST_SetSRID(ST_3DMakeBox(ST_MakePoint(0, 0, 100),
ST_MakePoint(100, 100, 500)), 4326)
)
SELECT pc FROM sel
WHERE ST_Within(Geometry(pc),
ST_SetSRID(ST_3DMakeBox(ST_MakePoint(0, 0, 100),
ST_MakePoint(100, 100, 500)), 4326));
100k patches, 10k points/patch (2h, 9.4GB)
http://suite.opengeo.org/4.1/dataadmin/pointcloud/objects.html#pcpatch
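The query above works in two stages: a coarse filter keeps only the patches whose envelope overlaps the search box, then the surviving patches are exploded and each point is tested exactly. A minimal Python sketch of the same idea follows. It assumes patches are plain lists of (x, y, z) tuples and uses a 3D envelope computed from the points, which is a simplification and not necessarily what `PC_Envelope` returns; all function names are hypothetical.

```python
def envelope(points):
    """Bounding box of a patch as (xmin, ymin, zmin, xmax, ymax, zmax)."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs), max(xs), max(ys), max(zs))


def boxes_overlap(a, b):
    """True when two 3D boxes share at least one point (boundaries included)."""
    return all(a[i] <= b[i + 3] and b[i] <= a[i + 3] for i in range(3))


def point_in_box(p, box):
    """Exact test: is the point inside the box on every axis?"""
    return all(box[i] <= p[i] <= box[i + 3] for i in range(3))


def points_in_box(patches, box):
    selected = []
    for patch in patches:
        # Stage 1: cheap rejection of whole patches by their envelope.
        if not boxes_overlap(envelope(patch), box):
            continue
        # Stage 2: "explode" the patch and test each point exactly.
        selected.extend(p for p in patch if point_in_box(p, box))
    return selected


# Same corners as the search box in the query: (0, 0, 100) and (100, 100, 500).
query_box = (0, 0, 100, 100, 100, 500)
patches = [
    [(10, 10, 150), (150, 150, 150)],  # straddles the box: one point inside
    [(500, 500, 50), (600, 600, 60)],  # entirely outside: skipped in stage 1
]
result = points_in_box(patches, query_box)
```

In the database, stage 1 is where an index on the patch envelopes can help, since far fewer patches than points have to be examined.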
included points   execution time (search of patches)   execution time (patch explosion)
1M                520ms                                3s
10M               3.8s                                 16.5s
100M              33.8s                                150s
So... indexed searches are faster!
Nearest neighbours search with a 1G-point cloud
searched points   execution time (no index)   execution time (with index)
1M                2000s                       1.41s
10M               -                           13.8s
SELECT *
FROM pcpoints
ORDER BY ST_SetSRID(ST_MakePoint(0, 0, 0), 4326) <-> Geometry(pt)
LIMIT <searched points>;
searched points   execution time
100M              2100s
Index blocks in memory are used, then SeqScans.
Conclusions
• PostgreSQL includes many features to work with geospatial entities
– 2D in core geometries, PostGIS, PointCloud (, ...)
• Indexes can be successfully used
– Improved performance for geospatial entities introduced with PostGIS
• Waiting for SP-GiST indexes (PostGIS >2.1)
• Performance achievable with larger numbers of entries shows that the geospatial features of the PostgreSQL DBMS can be suitable for datasets in the 100M-1G range
Questions?
• giuseppe.broccolo@2ndquadrant.it
• @giubro
• gemini__81
• gbroccolo7
Creative Commons License
Copyright 2012-2015,
2ndQuadrant Italia - http://www.2ndquadrant.it
This work is licensed under a Creative Commons
Attribution-ShareAlike 4.0 International License