Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it
June, 14th
2016
BRIN indexes on
geospatial
databases
www.2ndquadrant.com

~$ whoami~$ whoami
Giuseppe Broccolo, PhDGiuseppe Broccolo, PhD
PostgreSQL & PostGIS consultantPostgreSQL & PostGIS consultant
@giubro gbroccolo7
giuseppe.broccolo@2ndquadrant.it
gbroccolo gemini__81

Indexes & PostgreSQLIndexes & PostgreSQL
• Binary structures on disc:
– Indexing organised in nodes into 8kB pages
– Nodes point to table rows
• Speed up data access: ~O(log N)
– High performance until the index can be contained in RAM
• Size: ~O(N)

Geospatial indexes in PostgreSQLGeospatial indexes in PostgreSQL
• All datatypes can be indexed in principle...
– ...just define the needed Operator Class!
• Operators & support functions
– ...define how the data has to be stored into the index nodes
• Geospatial data:
– can be larger than 8kB
– Is generally a varlena
– a single node cannot be contained into a single page

PostGIS datatypes & GiST indexingPostGIS datatypes & GiST indexing
In PostGIS (since v0.6) geometry/geography are
converted into their float-precision bounding boxes
gidx/box2df are then used as storage
datatype to populate the index’s nodes

GIS & BigDataGIS & BigData
LiDAR surveys
Maps of entire countries
It’s relatively easy to have to work with terabytes of geospatial data…
...can indexes be contained in RAM?

A new index: Block Range Indexing (PG 9.5)A new index: Block Range Indexing (PG 9.5)
CREATE INDEX idx ON table
USING brin(column) WITH (pages_per_range=10);
data must be physically sorted on disk!
index node → row
index node → block
less specific than GiST!
Really small!
8kB8kB8kB8kB
8kB8kB8kB8kB
8kB8kB8kB8kB
(S. Riggs, A. Herrera)

BRIN support for PostGISBRIN support for PostGIS
https://github.com/dalibo/postgis/tree/for_upstream
Julien RouhaudRonan Dunklau me

BRIN support for PostGISBRIN support for PostGIS
NO kNN SUPPORT!!
– Different OpClass for
• 2D (default), 3D, 4D geometry
• geography
• box2d/box3d
• cross-operators defined in the OpFamily
– Storage datatype: float-precision (such as in GiST)
• gidx (3D, 4D geometry)
• box2df (2D geometry, geography)
– Operators: &&, @, ~ (2D) - &&& (3D, 4D)
8kB8kB8kB8kB
8kB8kB8kB8kB

11stst
example: “sorted”example: “sorted” geometrygeometry pointspoints
~10M, 3D, SRID=0, distribution range: 0÷100kunits
=# CREATE INDEX idx_gist ON points USING gist
-# (geom gist_geometry_ops_nd);
=# CREATE INDEX idx_brin_128 ON points USING brin
-# (geom brin_geometry_inclusion_ops_3d);
=# CREATE INDEX idx_brin_10 ON points USING brin
-# (geom brin_geometry_inclusion_ops_3d)
-# WITH (pages_per_range=10);
BRIN(128)/GiST 1/2000
BRIN(10)/GiST 1/1000
BRIN(128)/GiST 1/30
BRIN(10)/GiST 1/20
Sizes
Building
times

11stst
example: “sorted”example: “sorted” geometrygeometry pointspoints
=# SELECT * FROM points
-# WHERE ‘BOX3D(10. 10. 10., 12., 12., 12.)’::box3d &&& geom;
BRIN(128)/GiST 50/1
BRIN(10)/GiST 6/1
BRIN(128)/SeqScan 1/60
Execution times

22ndnd
example: “unsorted”example: “unsorted” geometrygeometry pointspoints
=# CREATE TABLE unsorted_points AS
-# SELECT * FROM points ORDER BY random();
=# SELECT * FROM unsorted_points
-# WHERE ‘BOX3D(10. 10. 10., 12., 12., 12.)’::box3d &&& geom;
BRIN(128)/GiST 2800/1
BRIN(10)/GiST 400/1
Execution times

22ndnd
example: “unsorted”example: “unsorted” geometrygeometry pointspoints
=# CREATE EXTENSION pageinspect;
=# SELECT blknum, value FROM brin_page_items(get_raw_page('idx_brin_128', 2), 'idx_brin_128')
-# LIMIT 5;
blknum | value
--------+------------------------------------------------------------------------
0 | {GIDX( 1.11394846439 1.64980053902 1.11335003376, 99982.2578125 99982.640625
99982.1171875 ) .. f .. f}
128 | {GIDX( 0.224781364202 0.184096381068 0.798862099648, 99995.859375 99995.171875
99995.0390625 ) .. f .. f}
256 | {GIDX( 6.25802087784 6.50119876862 6.8031873703, 99993.15625 99993.8203125
99993.2109375 ) .. f .. f}
384 | {GIDX( 1.9203556776 1.11907613277 1.55043530464, 99995.421875 99995.234375
99995.421875 ) .. f .. f}
512 | {GIDX( 3.44181084633 3.4120452404 3.41762590408, 99996.3125 99996.7109375 99996.59375
) .. f .. f}
(5 rows)

...what about...what about INSERTINSERT’s?’s?
INSERT INTO points
SELECT ST_MakePoint(a, a, a)
FROM generate_series(1, 1000) AS f(a);
GiST 6 times than insertions without index
BRIN (10) 3 times than insertions without index
BRIN (128) 2 times than insertions without index
...blocks could be “not sorted” anymore!

Manage LiDAR
data with
PostgreSQL
is it possible?
Is the relational approach valid for point clouds?Is the relational approach valid for point clouds?

PostgreSQL & LiDARPostgreSQL & LiDAR
• PostgreSQL is a relational DBMS with an extension for LiDAR data:
pg_pointcloud
• Part of the OpenGeo suite, completely compatible with PostGIS
– two new datatype: pcpoint, pcpatch (compressed set of points)
• N points (with all attributes from the survey) → 1 patch → 1 record
– compatible with PDAL drivers to import data directly from .las
32b 32b 32b 16b
https://github.com/pgpointcloud/pointcloud

Relational approach to LiDAR data with PGRelational approach to LiDAR data with PG
• GiST indexing in PostGIS
• GiST performances:
– storage: 1TB RAID1, RAM 16GB, 8 CPU @3.3GHz,
PostgreSQL9.3
– index size ~O(table size)
– Index was used:
• up to ~300M points in bbox inclusion searches
• up to ~10M points in kNN searches
LiDAR size: ~O(109
÷1011
) → few % can be properly indexed!
http://www.slideshare.net/GiuseppeBroccolo/gbroccolo-foss4-geugeodbindex

The LiDAR dataset: the ahn2 projectThe LiDAR dataset: the ahn2 project
• 3D point cloud, coverage: almost the whole Netherlands
– EPSG: 28992, ~8 points/m2
• 1.6TB, ~250G points in ~560M patches (compression: ~10x)
– PDAL driver – filter.chipper
• available RAM: 16GB
• the point structure:
X Y Z scan LAS time RGB chipper
32b 32b 32b 40b 16b 64b 48b 32b
the “indexed” part (can be converted to PostGIS datatype)
(thanks to Tom Van Tilburg)

Typical searches on ahn2 -Typical searches on ahn2 - x3d_viewerx3d_viewer
• Intersection with a polygon (PostGIS)
– the part that lasts longer – need indexes here!
• Patch “explosion” + NN sorting (pg_PointCloud+PostGIS)
• constrained Delaunay Triangulation (SFCGAL)

WITH patches AS (
SELECT patches FROM ahn2
WHERE patches && ST_GeomFromText(‘POLYGON(...)’)
), points AS (
SELECT ST_Explode(patches) AS points
FROM patches
), sorted_points AS (
SELECT points,
ST_DumpPoints(ST_GeomFromText(‘POLYGON(...)’))).geom AS poly_pt
FROM points ORDER BY points <#> poly_pt LIMIT 1;
), sel AS (
SELECT points FROM sorted_points
WHERE points && ST_GeomFromText(‘POLYGON(...)’)
)
SELECT ST_Dump(ST_Triangulate2DZ(ST_Collect(points))) FROM sel;
All in the DB & just with one query!All in the DB & just with one query!

patches && polygons - GiST performancepatches && polygons - GiST performance
GiST 2.5 days
GiST 29GB
index building
index size
polygon
size
timing
~O(10m) ~20ms
~O(100m) ~60ms
~O(1Km) ~3s
~O(10km) hours
searches based on GiST
index not contained in
RAM anymore
(~5G points → ~3%)

patches && polygons - BRIN performancepatches && polygons - BRIN performance
BRIN 4 h BRIN 15MB
index building
polygon
size
timing
~O(m) ~150s
searches based on BRIN
index size

patches && polygons - BRIN performancepatches && polygons - BRIN performance
BRIN 5 h BRIN 14MB
index building
polygon
size
timing
~O(m) ~150s
searches based on BRIN
How data was inserted...
LASLASLASLASLASLAS
chipper
chipper
chipper
PDAL driver
(parallel processes)
ahn2
index size

Spatial sorting of a LiDAR datasetSpatial sorting of a LiDAR dataset
CREATE INDEX patch_geohash ON ahn2
USING btree (ST_GeoHash(ST_Transform(Geometry(patch), 4326), 20));
CLUSTER ahn2 USING patch_geohash;
Morton order [http://geohash.org/]
Need more than 2X table size free space
Relatively fast!

CREATE TABLE ahn2_sorted ON ahn2 AS
SELECT * FROM ahn2
ORDER BY ST_GeoHash(ST_Transform(Geometry(patch), 4326), 10)
COLLATE “C”;
Just need size for a second table
Slower than clustering using the index!

CREATE TABLE ahn2_sorted ON ahn2 AS
SELECT * FROM ahn2
ORDER BY ST_GeoHash(ST_Transform(Geometry(patch), 4326), 10)
COLLATE “C”;
Just need size for a second table
Slower than clustering using the index!
45 hours

patches && polygons – BRIN performancepatches && polygons – BRIN performance
polygon
size
BRIN
timing
GiST
timing
~O(10m) ~380ms ~20ms
~O(100m) ~400ms ~60ms
~O(1Km) ~2.7s ~3s
~O(10km) ~4.7s hours
• After sorting: 150s→380ms
• GiST X20 faster than BRIN [r~O(10m)]
– BRIN X200 faster than SeqScan
• BRIN X1000 faster than SeqScan [r~O(100m)]
•
• BRIN ~ GiST [r~O(1Km)]
• Just BRIN is used for searches r>1km

is the drop in performance acceptable?is the drop in performance acceptable?
BRIN searches = x20 GiST searches
= x10÷x100 GiST searches
= x10÷x100 GiST searches
patch explosion
+
NN sorting
constrained
triangulation

the future of the project...add kNN supportthe future of the project...add kNN support
8kB8kB8kB8kB
8kB8kB8kB8kB
8kB8kB8kB8kB
8kB8kB8kB8kB
8kB8kB8kB8kB
8kB8kB8kB8kB
8kB8kB8kB8kB
8kB8kB8kB8kB
k
get blocks gradually included within the k parameter, as already made
in GiST with single tuples
WIP…sponsorships are welcome!
ORDER BY geoms <-> point
LIMIT N

ConclusionsConclusions
• BRINs can be successfully used in geo DB based on PostgreSQL
– totally support PostGIS datatype
• A patch almost ready for the next PostGIS release (2.3.0)
– easier indexes maintenance, low indexing time, low impact on concurrent INSERTs
– less specific than GiST...but not to much!
• Really small indexes!
– GiST performances drop as well as it cannot be totally contained in RAM
• Can be relational approach to LiDAR data valid with PostgreSQL?
– Yes, at least for bbox inclusion searches
– make sure that data has the same sequentiality of .las

Creative Commons license
This work is licensed under a Creative Commons
Attribution-ShareAlike 4.0 International License
http://creativecommons.org/licenses/by-nc-sa/2.5/it/
© 2016 2ndQuadrant Italia – http://www.2ndquadrant.it

Gbroccolo foss4 guk2016_brin4postgis

Recommended

Recommended

More Related Content

Viewers also liked

Viewers also liked (14)

Similar to Gbroccolo foss4 guk2016_brin4postgis

Similar to Gbroccolo foss4 guk2016_brin4postgis (20)

Recently uploaded

Recently uploaded (20)

Gbroccolo foss4 guk2016_brin4postgis