
Lucene/Solr spatial in 2015

An overview of spatial/geospatial developments in Lucene and Solr during 2015 (through October). Presented at Lucene/Solr Revolution 2015 in Austin.

Published in: Technology

  1. OCTOBER 13-16, 2015 • AUSTIN, TX
  2. Lucene/Solr Spatial in 2015
     - David Smiley, Search Engineer/Consultant (Freelance)
  3. About David Smiley
     - Freelance Search Developer/Consultant
     - Expert Lucene/Solr development skills, advising (consulting), training
     - Java, spatial, and full-stack experience
     - Apache Lucene/Solr committer & PMC member
     - Primary author of “Apache Solr Enterprise Search Server”
  4. More Spatial Contributors!
     - Projects (columns): Spatial4j, Lucene, Solr
     - David Smiley ✔️ ✔️ ✔️
     - Ryan McKinley ✔️
     - Justin Deoliveira ✔️
     - Mike McCandless ✔️
     - Nick Knize ✔️
     - Karl Wright ✔️
     - Ishan Chattopadhyaya ✔️
  5. Agenda
     - New Features / Capabilities
     - New Approaches
     - Improvements
     - Pending
  6. Topic: New Features
     - Heatmaps / grid faceting (Lucene, Solr)
     - Surface-of-sphere shapes, Geo3d (Lucene)
     - Accurate indexed geometries (Lucene, Solr)
     - GeoJSON read/write (Spatial4j)
  7. Heatmaps: Spatial Grid Faceting
     - Spatial density summary grid faceting; also useful for point-plotting search results
     - Usually rendered with a gradient radius
     - Lucene & Solr APIs
     - Scalable & fast, usually…
     - v5.2
  8. Heatmaps Under the Hood
     - Requires a PrefixTreeStrategy Lucene field (grid based)
     - The algorithm enumerates the underlying cells/terms and accumulates their counts in a corresponding grid
     - Conceptually facet.method=enum for spatial
     - Works on non-point indexed shapes too
     - Complexity: O(cells * cellDepthFactor), not O(docs)
     - No/low memory; mainly the grid of integers
     - Solr will distribute to shards and merge
     - Could be faster still; a BFS (vs DFS) layout would be perfect
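The enumeration idea on the "Heatmaps Under the Hood" slide can be sketched in plain Java. This is a simplified illustration, not Lucene's implementation: the cell tokens (quad-tree paths over the letters A to D), the `HeatmapSketch` class, and the in-memory map standing in for the terms index are all assumptions made for the example. The point it shows is that the work is driven by the number of cells, each carrying a document count, rather than by the number of documents.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

class HeatmapSketch {
    /**
     * Accumulates per-cell document counts into a 2^gridLevel x 2^gridLevel grid.
     * Cell tokens are hypothetical quad-tree paths over the letters A-D
     * (A = top-left, B = top-right, C = bottom-left, D = bottom-right).
     */
    static int[][] accumulate(Map<String, Integer> cellDocCounts, int gridLevel) {
        int size = 1 << gridLevel;
        int[][] grid = new int[size][size]; // grid[row][column], row 0 at the top
        for (Map.Entry<String, Integer> entry : cellDocCounts.entrySet()) {
            String token = entry.getKey();
            if (token.length() != gridLevel) {
                continue; // this sketch only reads cells at the requested level
            }
            int row = 0;
            int col = 0;
            for (int i = 0; i < token.length(); i++) {
                int quadrant = token.charAt(i) - 'A'; // 0..3
                col = (col << 1) | (quadrant & 1);    // low bit: left or right half
                row = (row << 1) | (quadrant >> 1);   // high bit: top or bottom half
            }
            grid[row][col] += entry.getValue();
        }
        return grid;
    }

    public static void main(String[] args) {
        // Stand-in for enumerating indexed terms and their document frequencies.
        Map<String, Integer> cells = new TreeMap<>();
        cells.put("AA", 3);
        cells.put("BC", 1);
        cells.put("DD", 2);
        for (int[] row : accumulate(cells, 2)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

A real prefix-tree index also has to fold in cells that are coarser or finer than the requested grid level; the sketch skips them to stay short.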
  9. Solr Heatmap Faceting
     - On an RPT field (SpatialRecursivePrefixTreeFieldType) with prefixTree="packedQuad"
     - Query:
         /select?facet=true
           &facet.heatmap=geo_rpt
           &facet.heatmap.geom=["-180 -90" TO "180 90"]
     - facet.heatmap.format=ints2D or png
     - Response:
         // Normal Solr response...
         "facet_counts":{
           ... // facet response fields
           "facet_heatmaps":{
             "loc_srpt":[
               "gridLevel",2,
               "columns",32,
               "rows",32,
               "minX",-180.0,
               "maxX",180.0,
               "minY",-90.0,
               "maxY",90.0,
               "counts_ints2D", [null, null, [0, 0, ... ]]
               ...
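The heatmap request from the slide contains quotes, brackets, and spaces that need URL encoding when sent from code. The sketch below assembles it with the JDK only. The base URL, the `q=*:*` and `rows=0` parameters, and the `HeatmapRequestSketch` class are assumptions added for illustration; the `facet.*` parameter names and values are the ones shown on the slide.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class HeatmapRequestSketch {
    /** Builds a /select URL carrying the heatmap facet parameters from the slide. */
    static String buildUrl(String baseUrl, String field, String geom, String format) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");   // assumption: match all documents
        params.put("rows", "0");  // assumption: only the facet is wanted
        params.put("facet", "true");
        params.put("facet.heatmap", field);
        params.put("facet.heatmap.geom", geom);
        params.put("facet.heatmap.format", format); // ints2D or png

        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "/select?" + query;
    }

    public static void main(String[] args) {
        // Hypothetical host and collection name.
        String base = "http://localhost:8983/solr/mycollection";
        System.out.println(buildUrl(base, "geo_rpt", "[\"-180 -90\" TO \"180 90\"]", "ints2D"));
    }
}
```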
  10. Solr Heatmap Resources
      - Solr Ref guide: https://cwiki.apache.org/confluence/display/solr/Spatial+Search
      - Jack Reed’s Tutorial: http://www.jack-reed.com/2015/06/29/visualizing-10-million-geonames-with-leaflet-solr-heatmap-facets.html
      - Live Demo: http://worldwidegeoweb.com
      - Open-source JavaScript Solr Heatmap Libraries:
        - https://github.com/spacemansteve/SolrHeatmapLayer
        - https://github.com/mejackreed/leaflet-solr-heatmap
        - https://github.com/voyagersearch/leaflet-solr-heatmap
  11. Geo3D: Shapes on the Surface of a Sphere
      - … or Ellipsoid of configurable axis
      - Not a general 3D space geometry lib
      - Internally uses geocentric X, Y, Z coordinates (hence 3D) with 3D planar geometry mathematics
      - Shapes: Point, Lat-Lon Rect, Circle, Polygons, Path (LineString) with optional buffer
      - Distance computations: Arc (angular or surface), Linear (straight-line), Normal
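To make the geocentric X, Y, Z idea concrete, here is a small sketch on a unit sphere. It does not use Geo3D's API; the `GeocentricSketch` class and its method names are hypothetical, and an ellipsoid would need different formulas. It converts latitude/longitude to a 3D unit vector once, after which the arc (angular) distance comes from a dot product and the linear (straight-line) distance is the ordinary Euclidean distance between the two vectors.

```java
class GeocentricSketch {
    /** Converts latitude/longitude in degrees to a unit vector (x, y, z). */
    static double[] toUnitVector(double latDeg, double lonDeg) {
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        double cosLat = Math.cos(lat);
        return new double[] {cosLat * Math.cos(lon), cosLat * Math.sin(lon), Math.sin(lat)};
    }

    static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    /** Arc distance in radians along the surface of the unit sphere. */
    static double arcDistance(double[] a, double[] b) {
        // Clamp to [-1, 1] to guard against rounding before acos.
        double d = Math.max(-1.0, Math.min(1.0, dot(a, b)));
        return Math.acos(d);
    }

    /** Linear (chord) distance: a straight line through the sphere. */
    static double linearDistance(double[] a, double[] b) {
        double dx = a[0] - b[0];
        double dy = a[1] - b[1];
        double dz = a[2] - b[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    public static void main(String[] args) {
        double[] northPole = toUnitVector(90, 0);
        double[] equator = toUnitVector(0, 0);
        System.out.println("arc    = " + arcDistance(northPole, equator));
        System.out.println("linear = " + linearDistance(northPole, equator));
    }
}
```

Once points are stored as vectors, comparisons such as "which of two points is closer" can use the dot product or the chord length directly, with no further trigonometric calls.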
  12. All 2D Maps of the Earth Distort Straight Lines
      - A straight "as the bird flies" path from Anchorage to Miami doesn’t actually cross the ocean!
  13. Geo3D, continued…
      - Benefits:
        - Inherently more accurate than 2D projected spatial, especially for big shapes or near poles
        - Many computations are fast; no expensive trigonometry
        - An alternative to JTS without the LGPL license (still)
      - Has its own Lucene module (spatial3d), thus its own jar file
      - Maven groupId: org.apache.lucene, artifactId: lucene-spatial3d
      - No Solr integration yet; pending more Spatial4j integration
  14. Index & Search Geo3D Geometries
      - Spatial4j Geo3dShape wrapper with RPT (v5.2)
        - In Lucene-spatial for now
        - Index Geo3d shapes
        - Limited to grid accuracy
        - Query by Geo3d shape
        - Limited distance sort
        - Heatmaps
      - Geo3DPointField & PointInGeo3DShapeQuery (v5.4)
        - Based on a 3D BKD index
        - In spatial3d module
        - Index points-only
        - No multi-valued
        - Query by Geo3d shape
        - No distance sort
        - Leaner & faster than RPT
  15. RPT/SpatialPrefixTrees and Accuracy
      - RecursivePrefixTree (RPT) uses Lucene’s index as a PrefixTree
      - Thus represents shapes as grid cells of varying precision by prefix
      - Example, a point shape: D, DR, DRT, DRT2, DRT2Y
        - More accuracy scales
      - Example, a polygon shape: too many to list… 508 cells
        - More accuracy does NOT scale
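The point example on the slide (D, DR, DRT, DRT2, DRT2Y) reads like a geohash and its prefixes. The sketch below uses the standard geohash algorithm, written from scratch rather than taken from Lucene's prefix-tree classes, to show how one point yields one term per precision level. The class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

class GeohashPrefixSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Standard geohash: alternate longitude/latitude bisection, 5 bits per character. */
    static String encode(double lat, double lon, int length) {
        double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
        StringBuilder hash = new StringBuilder();
        boolean lonBit = true; // the first bit splits longitude
        int bits = 0;
        int value = 0;
        while (hash.length() < length) {
            if (lonBit) {
                double mid = (minLon + maxLon) / 2;
                if (lon >= mid) { value = (value << 1) | 1; minLon = mid; }
                else { value = value << 1; maxLon = mid; }
            } else {
                double mid = (minLat + maxLat) / 2;
                if (lat >= mid) { value = (value << 1) | 1; minLat = mid; }
                else { value = value << 1; maxLat = mid; }
            }
            lonBit = !lonBit;
            if (++bits == 5) {
                hash.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return hash.toString();
    }

    /** Every prefix of the hash is a coarser cell containing the point. */
    static List<String> prefixes(String hash) {
        List<String> cells = new ArrayList<>();
        for (int i = 1; i <= hash.length(); i++) {
            cells.add(hash.substring(0, i));
        }
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(prefixes(encode(57.64911, 10.40744, 5)));
    }
}
```

A point costs one term per level, so adding precision adds one more term. A polygon has to be covered by many cells along its edge at the finest level, which is why the slide says more accuracy does not scale for non-point shapes.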
  16. Combining RPT with Serialized Geometry
      - RPT (RecursivePrefixTreeStrategy) is the grid index (inaccurate)
      - SDV (SerializedDVStrategy) stores serialized geometry (accurate)
      - RPT + SDV → CompositeSpatialStrategy
        - Accuracy & speed & smaller indexes
        - Optimized intersects predicate avoids some geometry checks
        - > 80% faster intersects queries, 75% smaller index
      - Solr adapter: RptWithGeometrySpatialField
        - Compatible with the Heatmaps feature
        - Includes a shape cache (per-segment); configurable
      - v5.2
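The "grid first, exact geometry second" idea behind the composite strategy can be illustrated with a toy two-phase filter. This is not CompositeSpatialStrategy: it assumes points in 1x1 cells and a circular query, iterates over documents instead of walking an index, and every name in it is hypothetical. What it does show is the optimization named on the slide: when a grid cell lies entirely inside the query shape, the exact geometry check can be skipped.

```java
import java.util.ArrayList;
import java.util.List;

class TwoPhaseSketch {
    /** Number of exact geometry checks performed by the last call to search(). */
    static int exactChecks;

    static boolean inCircle(double x, double y, double cx, double cy, double r) {
        double dx = x - cx;
        double dy = y - cy;
        return dx * dx + dy * dy <= r * r;
    }

    /** Returns the points inside the circle, consulting the 1x1 grid cell first. */
    static List<double[]> search(List<double[]> points, double cx, double cy, double r) {
        exactChecks = 0;
        List<double[]> hits = new ArrayList<>();
        for (double[] p : points) {
            // Phase 1: classify the grid cell that contains the point.
            double x0 = Math.floor(p[0]);
            double y0 = Math.floor(p[1]);
            double x1 = x0 + 1;
            double y1 = y0 + 1;
            // Closest point of the cell to the circle center.
            double nx = Math.max(x0, Math.min(cx, x1));
            double ny = Math.max(y0, Math.min(cy, y1));
            if (!inCircle(nx, ny, cx, cy, r)) {
                continue; // cell is disjoint from the query: reject without a check
            }
            // A circle is convex, so four corners inside means the whole cell is inside.
            boolean cellWithin = inCircle(x0, y0, cx, cy, r) && inCircle(x1, y0, cx, cy, r)
                    && inCircle(x0, y1, cx, cy, r) && inCircle(x1, y1, cx, cy, r);
            if (cellWithin) {
                hits.add(p); // accept without an exact geometry check
                continue;
            }
            // Phase 2: the cell straddles the boundary, so test the real geometry.
            exactChecks++;
            if (inCircle(p[0], p[1], cx, cy, r)) {
                hits.add(p);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<double[]> points = new ArrayList<>();
        points.add(new double[] {0.5, 0.5});
        points.add(new double[] {2.9, 0.5});
        points.add(new double[] {5.5, 5.5});
        System.out.println(search(points, 0.5, 0.5, 2.5).size() + " hits, "
                + exactChecks + " exact checks");
    }
}
```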
  17. Topic: New Approaches
      - Lucene:
        - BKD Tree Indexes
        - GeoPointField
  18. BKD Tree Indexes
      - New numeric/spatial index approach with its own file format
        - Not based on Lucene Terms index
        - https://www.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf
      - Much faster and more compact than Trie/PrefixTree based indexes
        - Whither term auto-prefixing? LUCENE-5879
      - Indexed point-data only; multi-valued mostly
      - Intersects predicate only
      - Filtering only (no distance or other scoring)
      - Multiple implementations… (next slide)
      - Neat visualization: https://youtu.be/x9WnzOvsGKs
  19. Multiple BKD Implementations
      - Multiple implementations of the same BKD concept:
        - (1D) RangeTreeDocValuesFormat
        - (2D) BKDPointField & BKD…Query
        - (3D) Geo3DPointField & PointInGeo3DShapeQuery
        - (ND) LUCENE-6825 (to Lucene-core), in progress
      - The 1D, 2D, and 3D implementations are in either lucene-sandbox or lucene-spatial3d for now
      - No Lucene-spatial module SpatialStrategy wrappers yet, thus no Spatial4j Shape integration or Solr integration yet
  20. BKD 1D: RangeTree
      - Efficient range search on single/multi-valued numbers or terms
      - Could be used for numbers, dates, IPv6 bytes, …
      - Alternatives: Normal number fields (trie), DateRangeField (RPT)
        - Would love to see a benchmark!
      - How-To: RangeTreeDocValuesFormat
        - Numbers: SortedNumericDocValuesField with NumericRangeTreeQuery
        - Bytes: SortedSetDocValuesField with SortedSetRangeTreeQuery
      - v5.3
  21. BKD 2D: BKDPointField
      - Efficient 2D geospatial point index
      - Alternative to RPT or GeoPointField
      - 5.7x faster than RPT w/ GeoHash. Smaller indexes.
      - How-To: Use BKDPointField (requires BKDTreeDocValuesFormat)
      - Query:
        - BKDPointInBBoxQuery
        - BKDPointInPolygonQuery
        - point-radius (circle): in progress, LUCENE-6698
      - v5.3
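For readers new to the BKD idea, the following is a minimal in-memory sketch of a block k-d tree with a bounding-box count. It is only an illustration under simplifying assumptions: Lucene's BKD implementations write their own on-disk format, and the `BlockKdSketch` class, its leaf size, and its method names are invented here. Points are split recursively on alternating dimensions until a block is small enough to become a leaf, and a query prunes whole subtrees whose bounding box misses the query box.

```java
import java.util.Arrays;
import java.util.Comparator;

class BlockKdSketch {
    static final int LEAF_SIZE = 4; // hypothetical block size

    static final class Node {
        double minX, maxX, minY, maxY; // bounding box of all points below this node
        double[][] leafPoints;         // non-null only for leaf blocks
        Node left, right;
    }

    /** Builds a tree over pts[from, to); each point is {x, y}. Reorders pts in place. */
    static Node build(double[][] pts, int from, int to, int depth) {
        Node n = new Node();
        n.minX = n.minY = Double.POSITIVE_INFINITY;
        n.maxX = n.maxY = Double.NEGATIVE_INFINITY;
        for (int i = from; i < to; i++) {
            n.minX = Math.min(n.minX, pts[i][0]);
            n.maxX = Math.max(n.maxX, pts[i][0]);
            n.minY = Math.min(n.minY, pts[i][1]);
            n.maxY = Math.max(n.maxY, pts[i][1]);
        }
        if (to - from <= LEAF_SIZE) {
            n.leafPoints = Arrays.copyOfRange(pts, from, to);
            return n;
        }
        final int dim = depth % 2; // alternate between x and y
        Arrays.sort(pts, from, to, Comparator.comparingDouble((double[] p) -> p[dim]));
        int mid = (from + to) >>> 1;
        n.left = build(pts, from, mid, depth + 1);
        n.right = build(pts, mid, to, depth + 1);
        return n;
    }

    /** Counts points inside the inclusive box [minX, maxX] x [minY, maxY]. */
    static int countInBox(Node n, double minX, double maxX, double minY, double maxY) {
        if (n.maxX < minX || n.minX > maxX || n.maxY < minY || n.minY > maxY) {
            return 0; // the whole subtree lies outside the query box
        }
        if (n.leafPoints != null) {
            int count = 0;
            for (double[] p : n.leafPoints) {
                if (p[0] >= minX && p[0] <= maxX && p[1] >= minY && p[1] <= maxY) {
                    count++;
                }
            }
            return count;
        }
        return countInBox(n.left, minX, maxX, minY, maxY)
                + countInBox(n.right, minX, maxX, minY, maxY);
    }

    public static void main(String[] args) {
        double[][] pts = new double[100][];
        for (int i = 0; i < 10; i++) {
            for (int j = 0; j < 10; j++) {
                pts[i * 10 + j] = new double[] {i, j};
            }
        }
        Node root = build(pts, 0, pts.length, 0);
        System.out.println(countInBox(root, 2, 4, 3, 5));
    }
}
```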
  22. GeoPointField
      - 2D geospatial point field
      - Indexed point-only data, single/multi-valued
      - Spatial 2D Trie/PrefixTree terms index
        - But not affiliated with Lucene-spatial SpatialPrefixTree/RPT
        - Configurable 2^x grid size (defaults to 512)
      - Compact bit-interleaved Z-order encoding
      - Re-uses much of Lucene’s numeric precisionStep & MultiTermQuery logic
      - 2-phase algorithm: grid/postings, then doc-values
      - v5.3
  23. GeoPointField, continued…
      - Has no affiliation with Spatial4j, RPT, JTS, or SpatialStrategy
        - No Heatmaps, no custom Shape implementations
        - No Solr support yet
        - No dependencies
      - Easy to use compared to RPT; simpler internally too
      - How-To:
        - doc.add(new GeoPointField(name, lon, lat, Store.YES))
        - GeoPointDistanceQuery (sphere only), GeoPointInBBoxQuery, or GeoPointInPolygonQuery
        - …DistanceRangeQuery pending
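The "bit interleaved Z-order encoding" mentioned for GeoPointField can be sketched as follows. This is a generic Morton encoding, not Lucene's exact bit layout: the 31-bit quantization, the choice of longitude for the even bits, and the `ZOrderSketch` class are assumptions made for the example. The useful property is that the two coordinates end up in a single sortable long whose leading bits identify progressively smaller grid cells.

```java
class ZOrderSketch {
    /** Interleaves the bits of x (even positions) and y (odd positions). */
    static long interleave(int x, int y) {
        long z = 0;
        for (int i = 0; i < 32; i++) {
            z |= ((long) (x >>> i) & 1L) << (2 * i);
            z |= ((long) (y >>> i) & 1L) << (2 * i + 1);
        }
        return z;
    }

    /** Maps value in [min, max] to an integer in [0, Integer.MAX_VALUE]. */
    static int quantize(double value, double min, double max) {
        return (int) ((value - min) / (max - min) * Integer.MAX_VALUE);
    }

    /** Encodes a lon/lat pair into one Z-order (Morton) code. */
    static long encode(double lon, double lat) {
        return interleave(quantize(lon, -180, 180), quantize(lat, -90, 90));
    }

    public static void main(String[] args) {
        System.out.println(Long.toBinaryString(interleave(3, 0)));  // x bits only
        System.out.println(Long.toBinaryString(interleave(0, 3)));  // y bits only
        System.out.println(Long.toHexString(encode(-71.06, 42.36)));
    }
}
```

Truncating such a code to its first 2k bits gives the cell at grid level k, which is what makes the encoding a natural fit for a trie of terms.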
  24. Topic: Improvements
      - Spatial4j:
        - Minimal longitude bounding-box algorithm
      - Lucene (PrefixTree / RPT indexing):
        - Leaner & faster non-point indexes
        - New PackedQuadPrefixTree
      - Solr:
        - Distance units: Kilometers/Miles/Degrees
        - Nicer ST_* spatial query parsers (almost done)
  25. Topic: Some Pending Spatial TODOs
      - Spatial4j:
        - Geo3D integration (a JTS alternative)
      - Lucene:
        - FlexPrefixTree (LUCENE-4922)
        - Multi-dimensional BKD (LUCENE-6825)
        - SpatialStrategy adapters for GeoPointField, etc.
      - Solr:
        - Better spatial Solr QParsers (SOLR-4242)
        - GeoJSON parsing
        - More FieldType adapters for latest Lucene spatial
        - DateRangeField faceting
        - Nearest-neighbor search
      - Well, 2015 isn’t over yet. :-)
  26. That’s all for now; thanks for coming!
      - Need Lucene/Solr guidance or custom development? Contact me!
      - Email: dsmiley@apache.org
      - LinkedIn: http://www.linkedin.com/in/davidwsmiley
      - G+: +DavidSmiley
      - Twitter: @DavidWSmiley
