Lucene/Solr Spatial in 2015: Presented by David Smiley


Lucene/Solr Revolution 2015

  1. October 13-16, 2015 • Austin, TX
  2. Lucene/Solr Spatial in 2015. David Smiley, Search Engineer/Consultant (Freelance)
  3. About David Smiley. Freelance Search Developer/Consultant. Expert Lucene/Solr development skills, advice (consulting), and training. Java, spatial, and full-stack experience. Apache Lucene/Solr committer & PMC member. Primary author of “Apache Solr Enterprise Search Server”.
  4. More Spatial Contributors! (Columns: Spatial4j, Lucene, Solr.) David Smiley ✔️ ✔️ ✔️; Ryan McKinley ✔️; Justin Deoliveira ✔️; Mike McCandless ✔️; Nick Knize ✔️; Karl Wright ✔️; Ishan Chattopadhyaya ✔️
  5. Agenda. New Features / Capabilities; New Approaches; Improvements; Pending
  6. Topic: New Features. Heatmaps / grid faceting (Lucene, Solr); Surface-of-sphere shapes, Geo3d (Lucene); Accurate indexed geometries (Lucene, Solr); GeoJSON read/write (Spatial4j)
  7. Heatmaps: Spatial Grid Faceting. A spatial density summary via grid faceting; also useful for point-plotting search results. Usually rendered with a gradient radius. Lucene & Solr APIs. Scalable & fast, usually… (v5.2)
  8. Heatmaps Under the Hood. Requires a PrefixTreeStrategy Lucene field (grid based). The algorithm enumerates the underlying cells/terms and accumulates the counts in a corresponding grid. Conceptually facet.method=enum for spatial. Works on non-point indexed shapes too. Complexity: O(cells * cellDepthFactor), not O(docs). No/low memory; mainly the grid of integers. Solr will distribute to shards and merge. Could be faster still; a BFS (vs DFS) layout would be perfect.
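To make the output of heatmap faceting concrete, here is a minimal sketch of the grid of counters it produces. It is deliberately naive: it bins points one at a time, so it is O(points), whereas the Lucene implementation described on the slide walks indexed grid cells instead. The class and method names are hypothetical, and row 0 is assumed to be the top (maxY) row.

```java
// Naive illustration only: bins points one by one, O(points).
// This is NOT how Lucene computes heatmaps; it only shows what the grid holds.
final class NaiveHeatmap {

    /**
     * Counts points per grid cell. points[i] = {x, y}.
     * Row 0 is assumed to be the top (maxY) row, column 0 the left (minX) column.
     */
    static int[][] count(double[][] points, int columns, int rows,
                         double minX, double maxX, double minY, double maxY) {
        int[][] grid = new int[rows][columns];
        double cellWidth = (maxX - minX) / columns;
        double cellHeight = (maxY - minY) / rows;
        for (double[] p : points) {
            double x = p[0];
            double y = p[1];
            if (x < minX || x > maxX || y < minY || y > maxY) {
                continue; // outside the faceted region
            }
            // Clamp so that points exactly on the max edge land in the last cell.
            int col = Math.min(columns - 1, (int) ((x - minX) / cellWidth));
            int row = Math.min(rows - 1, (int) ((maxY - y) / cellHeight));
            grid[row][col]++;
        }
        return grid;
    }
}
```

The memory footprint is essentially the `int[rows][columns]` array, which matches the "mainly the grid of integers" remark.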
  9. Solr Heatmap Faceting. On an RPT field (SpatialRecursivePrefixTreeFieldType) with prefixTree="packedQuad". Query: ["-180 -90" TO "180 90"] with facet.heatmap.format=ints2D or png. Response:
      // Normal Solr response...
      "facet_counts":{ ... // facet response fields
        "facet_heatmaps":{
          "loc_srpt":[
            "gridLevel",2, "columns",32, "rows",32,
            "minX",-180.0, "maxX",180.0, "minY",-90.0, "maxY",90.0,
            "counts_ints2D", [null, null, [0, 0, ... ]] ...
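A client usually has to expand the sparse `counts_ints2D` value before rendering it. The sketch below assumes the response has already been parsed into nested lists, that a `null` row stands for a row of zeros (as the `[null, null, [0, 0, ...]]` sample suggests), and that row 0 is the top of the grid. The row-order assumption should be checked against the Solr Reference Guide. The class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical client-side helper; not part of Solr or SolrJ.
final class HeatmapCounts {

    /** Expands the sparse counts (null row = all zeros) into a dense rows x columns grid. */
    static int[][] expand(List<List<Integer>> countsInts2D, int rows, int columns) {
        int[][] grid = new int[rows][columns];
        if (countsInts2D == null) {
            return grid; // assumed: the whole array may be null when every count is zero
        }
        for (int r = 0; r < rows && r < countsInts2D.size(); r++) {
            List<Integer> row = countsInts2D.get(r);
            if (row == null) {
                continue; // all-zero row
            }
            for (int c = 0; c < columns && c < row.size(); c++) {
                grid[r][c] = row.get(c);
            }
        }
        return grid;
    }

    /** Returns {minX, minY, maxX, maxY} of one cell, ASSUMING row 0 is the top row. */
    static double[] cellBounds(int row, int col, int rows, int columns,
                               double minX, double maxX, double minY, double maxY) {
        double width = (maxX - minX) / columns;
        double height = (maxY - minY) / rows;
        double cellMinX = minX + col * width;
        double cellMaxY = maxY - row * height;
        return new double[] {cellMinX, cellMaxY - height, cellMinX + width, cellMaxY};
    }
}
```

With the 32 x 32 world grid from the sample response, each cell spans 11.25 degrees of longitude and 5.625 degrees of latitude.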
  10. Solr Heatmap Resources. Solr Ref guide: +Search. Jack Reed’s Tutorial: visualizing-10-million-geonames-with-leaflet-solr-heatmap-facets.html. Live Demo. Open-source JavaScript Solr Heatmap Libraries.
  11. Geo3D: Shapes on the Surface of a Sphere… or Ellipsoid of configurable axis. Not a general 3D space geometry lib. Internally uses geocentric X, Y, Z coordinates (hence 3D) with 3D planar geometry mathematics. Shapes: Point, Lat-Lon Rect, Circle, Polygons, Path (LineString) with optional buffer. Distance computations: Arc (angular or surface), Linear (straight-line), Normal.
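The geocentric idea can be sketched in a few lines. This is not the Geo3D API: it assumes a unit sphere rather than a configurable ellipsoid, and all names are hypothetical. It shows the difference between arc and linear distance, and why a precomputed cosine lets a point-in-circle test run without any trigonometry per point.

```java
// Unit-sphere sketch of geocentric math; hypothetical names, not the Geo3D API.
final class SphereMath {

    /** Converts latitude/longitude in degrees to a unit vector {x, y, z}. */
    static double[] toXyz(double latDeg, double lonDeg) {
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        double cosLat = Math.cos(lat);
        return new double[] {cosLat * Math.cos(lon), cosLat * Math.sin(lon), Math.sin(lat)};
    }

    static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    /** Arc (angular) distance in radians between two unit vectors. */
    static double arcDistance(double[] a, double[] b) {
        double d = Math.max(-1.0, Math.min(1.0, dot(a, b))); // guard against rounding
        return Math.acos(d);
    }

    /** Linear (straight-line, through the sphere) distance between two unit vectors. */
    static double linearDistance(double[] a, double[] b) {
        double dx = a[0] - b[0];
        double dy = a[1] - b[1];
        double dz = a[2] - b[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    /**
     * Circle membership with no trigonometry per point: the point is inside
     * when its dot product with the center is at least cos(radius).
     * cosRadius is computed once, when the circle is created.
     */
    static boolean withinCircle(double[] point, double[] center, double cosRadius) {
        return dot(point, center) >= cosRadius;
    }
}
```

For example, the points (0°, 0°) and (0°, 90°) are a quarter of a great circle apart in arc distance, while their linear distance is the chord length of √2 on the unit sphere.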
  12. All 2D Maps of the Earth Distort Straight Lines. A straight bird-flies path from Anchorage to Miami doesn’t actually cross the ocean!
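The Anchorage to Miami claim can be checked numerically. The sketch below assumes a spherical Earth and uses approximate city coordinates that are not from the slides (Anchorage near 61.2181, -149.9003; Miami near 25.7617, -80.1918). It computes the midpoint of the great-circle arc by normalizing the sum of the two position vectors. The names are hypothetical.

```java
// Great-circle midpoint on a sphere; hypothetical helper, spherical Earth assumed.
final class GreatCircle {

    static double[] toXyz(double latDeg, double lonDeg) {
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        double cosLat = Math.cos(lat);
        return new double[] {cosLat * Math.cos(lon), cosLat * Math.sin(lon), Math.sin(lat)};
    }

    /** Returns {latDeg, lonDeg} of the midpoint of the shorter great-circle arc. */
    static double[] midpoint(double lat1, double lon1, double lat2, double lon2) {
        double[] a = toXyz(lat1, lon1);
        double[] b = toXyz(lat2, lon2);
        double x = a[0] + b[0];
        double y = a[1] + b[1];
        double z = a[2] + b[2];
        double norm = Math.sqrt(x * x + y * y + z * z); // zero only for antipodal points
        double lat = Math.toDegrees(Math.asin(z / norm));
        double lon = Math.toDegrees(Math.atan2(y, x));
        return new double[] {lat, lon};
    }

    public static void main(String[] args) {
        double[] mid = midpoint(61.2181, -149.9003, 25.7617, -80.1918);
        System.out.printf("great-circle midpoint: %.2f, %.2f%n", mid[0], mid[1]);
        System.out.printf("naive lat/lon average: %.2f, %.2f%n",
                (61.2181 + 25.7617) / 2, (-149.9003 + -80.1918) / 2);
    }
}
```

The great-circle midpoint falls at about 48.5°N, 103°W, well inland and about five degrees north of the naive average of the two latitudes. That is the distortion the slide refers to.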
  13. Geo3D, continued… Benefits: Inherently more accurate than 2D projected spatial, especially for big shapes or near the poles. Many computations are fast; no expensive trigonometry. An alternative to JTS without the LGPL license (still). Has its own Lucene module (spatial3d), and thus its own jar file. Maven groupId: org.apache.lucene, artifactId: lucene-spatial3d. No Solr integration yet; pending more Spatial4j integration.
  14. Index & Search Geo3D Geometries. Option 1, Spatial4j Geo3dShape wrapper with RPT (v5.2): In Lucene-spatial for now; Index Geo3d shapes; Limited to grid accuracy; Query by Geo3d shape; Limited distance sort; Heatmaps. Option 2, Geo3DPointField & PointInGeo3DShapeQuery (v5.4): Based on a 3D BKD index; In spatial3d module; Index points-only; No multi-valued; Query by Geo3d shape; No distance sort; Leaner & faster than RPT.
  15. RPT/SpatialPrefixTrees and Accuracy. RecursivePrefixTree (RPT) uses Lucene’s index as a PrefixTree, and thus represents shapes as grid cells of varying precision by prefix. Example, a point shape: D, DR, DRT, DRT2, DRT2Y. More accuracy scales. Example, a polygon shape: too many to list… 508 cells. More accuracy does NOT scale.
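The point example reads like a geohash-style prefix chain, where each extra character narrows the cell. The sketch below assumes that interpretation and uses the standard geohash base-32 alphabet in lowercase. It is not Lucene's SpatialPrefixTree code. The sample coordinate (42.34, -71.08) is a hypothetical point chosen because it falls in the cell `drt2y`.

```java
import java.util.ArrayList;
import java.util.List;

// Standard geohash encoding, used here only to illustrate prefix cells.
final class GeohashCells {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Encodes a point as a geohash of the given length (longitude bit first). */
    static String encode(double lat, double lon, int length) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder hash = new StringBuilder();
        boolean lonTurn = true;
        int bits = 0;
        int value = 0;
        while (hash.length() < length) {
            if (lonTurn) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { value = (value << 1) | 1; lonMin = mid; }
                else { value <<= 1; lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { value = (value << 1) | 1; latMin = mid; }
                else { value <<= 1; latMax = mid; }
            }
            lonTurn = !lonTurn;
            if (++bits == 5) { // five bits per base-32 character
                hash.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return hash.toString();
    }

    /** All prefixes of the geohash: one indexed cell per precision level. */
    static List<String> prefixes(double lat, double lon, int maxLength) {
        String full = encode(lat, lon, maxLength);
        List<String> cells = new ArrayList<>();
        for (int i = 1; i <= maxLength; i++) {
            cells.add(full.substring(0, i));
        }
        return cells;
    }
}
```

A point adds exactly one cell per level, so its cost grows linearly with precision. A polygon's boundary needs many more cells at each finer level, which is why its cell count climbs so quickly.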
  16. Combining RPT with Serialized Geometry. RPT (RecursivePrefixTreeStrategy) is the grid index (inaccurate). SDV (SerializedDVStrategy) stores serialized geometry (accurate). RPT + SDV → CompositeSpatialStrategy: accuracy & speed & smaller indexes. Optimized intersects predicate avoids some geometry checks. > 80% faster intersects queries, 75% smaller index. Solr adapter: RptWithGeometrySpatialField. Compatible with the Heatmaps feature. Includes a shape cache (per-segment); configurable. (v5.2)
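One plausible reading of "avoids some geometry checks" is a two-phase test: the grid settles the clear cases, and the exact geometry is consulted only for borderline ones. The toy sketch below assumes a unit-square grid, a circle as the indexed shape and a rectangle as the query. Every name is hypothetical, and none of this is CompositeSpatialStrategy code. For brevity, the circle's grid cells are recomputed at query time instead of being read from an index.

```java
// Toy two-phase intersects check: coarse grid first, exact geometry only if needed.
final class TwoPhaseIntersects {

    enum Outcome { REJECTED_BY_GRID, ACCEPTED_BY_GRID, ACCEPTED_BY_GEOMETRY, REJECTED_BY_GEOMETRY }

    /** Exact test: does the circle (cx, cy, r) touch the axis-aligned rectangle? */
    static boolean circleIntersectsRect(double cx, double cy, double r,
                                        double minX, double minY, double maxX, double maxY) {
        double nx = Math.max(minX, Math.min(cx, maxX)); // rectangle point nearest the center
        double ny = Math.max(minY, Math.min(cy, maxY));
        double dx = cx - nx;
        double dy = cy - ny;
        return dx * dx + dy * dy <= r * r;
    }

    static Outcome intersects(double cx, double cy, double r,
                              double qMinX, double qMinY, double qMaxX, double qMaxY) {
        boolean needsVerify = false;
        // "Grid index": the unit cells [i, i+1] x [j, j+1] that the circle touches.
        for (int i = (int) Math.floor(cx - r); i <= (int) Math.floor(cx + r); i++) {
            for (int j = (int) Math.floor(cy - r); j <= (int) Math.floor(cy + r); j++) {
                if (!circleIntersectsRect(cx, cy, r, i, j, i + 1, j + 1)) {
                    continue; // this cell would not have been indexed for the shape
                }
                boolean cellInsideQuery =
                        qMinX <= i && i + 1 <= qMaxX && qMinY <= j && j + 1 <= qMaxY;
                if (cellInsideQuery) {
                    // The shape touches a cell that lies wholly inside the query,
                    // so the shape must intersect the query: no geometry check.
                    return Outcome.ACCEPTED_BY_GRID;
                }
                boolean cellTouchesQuery =
                        i <= qMaxX && qMinX <= i + 1 && j <= qMaxY && qMinY <= j + 1;
                needsVerify |= cellTouchesQuery;
            }
        }
        if (!needsVerify) {
            return Outcome.REJECTED_BY_GRID;
        }
        return circleIntersectsRect(cx, cy, r, qMinX, qMinY, qMaxX, qMaxY)
                ? Outcome.ACCEPTED_BY_GEOMETRY
                : Outcome.REJECTED_BY_GEOMETRY;
    }
}
```

Only the last two outcomes touch the exact geometry. In a real index that step means deserializing the stored shape, which is where a shape cache would help.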
  17. Topic: New Approaches. Lucene BKD Tree Indexes; GeoPointField
  18. BKD Tree Indexes. New numeric/spatial index approach with its own file format. Not based on the Lucene Terms index. Much faster and more compact than Trie/PrefixTree based indexes. Whither term auto-prefixing? LUCENE-5879. Indexed point-data only; multi-valued mostly. Intersects predicate only. Filtering only (no distance or other scoring). Multiple implementations… (next slide). Neat visualization.
  19. Multiple BKD Implementations. Multiple implementations of the same BKD concept: (1D) RangeTreeDocValuesFormat; (2D) BKDPointField & BKD…Query; (3D) Geo3DPointField & PointInGeo3DShapeQuery; (ND) LUCENE-6825 (to Lucene-core), in-progress. The 1D, 2D and 3D implementations are in either lucene-sandbox or lucene-spatial3d for now. No Lucene-spatial module SpatialStrategy wrappers yet, thus no Spatial4j Shape integration nor Solr integration yet.
  20. BKD 1D: RangeTree. Efficient range search on single/multi-valued numbers or terms. Could be used for numbers, dates, IPv6 bytes, … Alternatives: normal number fields (trie), DateRangeField (RPT). Would love to see a benchmark! How-To: RangeTreeDocValuesFormat. Numbers: SortedNumericDocValuesField with NumericRangeTreeQuery. Bytes: SortedSetDocValuesField with SortedSetRangeTreeQuery. (v5.3)
  21. BKD 2D: BKDPointField. Efficient 2D geospatial point index. Alternative to RPT or GeoPointField. 5.7x faster than RPT w/ GeoHash. Smaller indexes. How-To: Use BKDPointField (requires BKDTreeDocValuesFormat). Query: BKDPointInBBoxQuery; BKDPointInPolygonQuery; point-radius (circle) is in-progress, LUCENE-6698. (v5.3)
  22. GeoPointField. 2D geospatial point field. Indexed point-only data, single/multi-valued. Spatial 2D Trie/PrefixTree terms index, but not affiliated with Lucene-spatial SpatialPrefixTree/RPT. Configurable 2^x grid size (defaults to 512). Compact bit-interleaved Z-order encoding. Re-uses much of Lucene’s numeric precisionStep & MultiTermQuery logic. 2-phase algorithm: grid/postings, then doc-values. (v5.3)
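Bit-interleaved Z-order encoding can be illustrated independently of Lucene. The sketch below quantizes longitude and latitude to integers and interleaves their bits into one `long`, so that points sharing a long bit prefix share a grid cell. The 31 bits per dimension and the choice of which dimension takes the even bit positions are assumptions of this sketch, not a statement about GeoPointField's actual layout. All names are hypothetical.

```java
// Morton (Z-order) encoding sketch; not GeoPointField's actual implementation.
final class ZOrder {
    static final int BITS = 31; // bits per dimension, an assumption for this sketch

    /** Maps value in [min, max] onto the integer range [0, 2^BITS - 1]. */
    static int quantize(double value, double min, double max) {
        double scaled = (value - min) / (max - min) * ((1L << BITS) - 1);
        return (int) Math.round(scaled);
    }

    /** Interleaves bits: x goes to even positions, y to odd positions. */
    static long interleave(int x, int y) {
        long z = 0;
        for (int i = 0; i < 32; i++) {
            z |= ((long) (x >>> i) & 1L) << (2 * i);
            z |= ((long) (y >>> i) & 1L) << (2 * i + 1);
        }
        return z;
    }

    /** Recovers the value stored in the even bit positions. */
    static int evenBits(long z) {
        int v = 0;
        for (int i = 0; i < 32; i++) {
            v |= (int) ((z >>> (2 * i)) & 1L) << i;
        }
        return v;
    }

    /** Recovers the value stored in the odd bit positions. */
    static int oddBits(long z) {
        return evenBits(z >>> 1);
    }

    static long encode(double lon, double lat) {
        return interleave(quantize(lon, -180, 180), quantize(lat, -90, 90));
    }
}
```

Because each pair of bits halves the cell in both dimensions, truncating the encoded value to a fixed number of leading bits yields a coarser grid cell. That is the property a trie or prefix terms index relies on.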
  23. …continued. Has no affiliation with Spatial4j, RPT, JTS, or SpatialStrategy. No Heatmaps; no custom Shape implementations. No Solr support yet. No dependencies. Easy to use compared to RPT; simpler internally too. How-To: doc.add(new GeoPointField(name, lon, lat, Store.YES)). Query with GeoPointDistanceQuery (sphere only), GeoPointInBBoxQuery, or GeoPointInPolygonQuery. …DistanceRangeQuery pending.
  24. Topic: Improvements. Spatial4j: Minimal longitude bounding-box algorithm. Lucene (PrefixTree / RPT indexing): Leaner & faster non-point indexes; New PackedQuadPrefixTree. Solr: Distance units: Kilometers/Miles/Degrees; Nicer ST_* spatial query parsers (almost done).
  25. Topic: Some Pending Spatial TODOs. Spatial4j: Geo3D integration (a JTS alternative). Lucene: FlexPrefixTree (LUCENE-4922); Multi-dimensional BKD (LUCENE-6825); SpatialStrategy adapters for GeoPointField, etc. Solr: Better spatial Solr QParsers (SOLR-4242); GeoJSON parsing; More FieldType adapters for latest Lucene spatial; DateRangeField faceting; Nearest-neighbor search. Well, 2015 isn’t over yet. :-)
  26. That’s all for now; thanks for coming! Need Lucene/Solr guidance or custom development? Contact me! Email: LinkedIn: G+: +DavidSmiley Twitter: @DavidWSmiley