Lucene/Solr Spatial in 2015
David Smiley
Search Engineer/Consultant (Freelance)
2
About David Smiley
Freelance Search Developer/Consultant
Expert Lucene/Solr development skills,
advice (consulting), training
Java, spatial, and full-stack experience
Apache Lucene/Solr committer & PMC member
Primary author of “Apache Solr Enterprise Search Server”
3
More Spatial Contributors!
Spatial4j Lucene Solr
David Smiley ✔️ ✔️ ✔️
Ryan McKinley ✔️
Justin Deoliveira ✔️
Mike McCandless ✔️
Nick Knize ✔️
Karl Wright ✔️
Ishan Chattopadhyaya ✔️
4
Agenda
New Features / Capabilities
New Approaches
Improvements
Pending
Lucene’s Spatial Module
• Multiple approaches to index spatial data
abstract class SpatialStrategy
(5+ concrete implementations)
• RecursivePrefixTreeStrategy (RPT) is most prominent, versatile
• Grid based
• Uses Spatial4j lib for shapes, distance calculations, and WKT
• Uses JTS Topology Suite lib for polygons
Shape
SpatialPrefixTree / Cell PrefixTreeStrategy
IntersectsPrefixTreeFilter
Contains…
Within…
Geohash | Quad
6
Topic: New Features
Heatmaps / grid faceting — Lucene, Solr
Surface-of-sphere shapes (Geo3d) — Lucene
Accurate indexed geometries — Lucene, Solr
GeoJSON read/write — Spatial4j
7
Heatmaps: Spatial Grid Faceting
Spatial density summary grid faceting,
also useful for point-plotting search results
Usually rendered with a gradient radius
Lucene & Solr APIs
Scalable & fast usually…
v5.2
8
Heatmaps Under the Hood
Requires a PrefixTreeStrategy Lucene field — grid based
Algorithm enumerates the underlying cell/terms and accumulates
the counter in a corresponding grid
Conceptually facet.method=enum for spatial
Works on non-point indexed shapes too
Complexity: O(cells * cellDepthFactor) not O(docs)
No/low memory; mainly the grid of integers
Solr will distribute to shards and merge
Could be faster still; a BFS (vs DFS) layout would be perfect
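The accumulation step can be illustrated without Lucene. This is a minimal sketch, not Lucene's implementation: it assumes the indexed terms are quad-tree cell tokens (letters A to D, one per level) mapped to their document frequency, and the names `HeatmapSketch`, `cellToColumnRow`, and `accumulate` are hypothetical. It shows why the cost follows the number of cells rather than the number of documents: only one counter update happens per term at the grid level.

```java
import java.util.Map;

final class HeatmapSketch {
    /**
     * Converts a quad cell token such as "AD" into {column, row}.
     * Assumed layout: A = top-left, B = top-right, C = bottom-left, D = bottom-right.
     */
    static int[] cellToColumnRow(String token) {
        int col = 0, row = 0;
        for (char c : token.toCharArray()) {
            int q = c - 'A'; // quadrant number 0..3
            if (q < 0 || q > 3) {
                throw new IllegalArgumentException("bad token: " + token);
            }
            col = (col << 1) | (q & 1);  // B and D fall in the right half
            row = (row << 1) | (q >> 1); // C and D fall in the bottom half
        }
        return new int[] {col, row};
    }

    /** Builds a [column][row] grid of counts from term -> docFreq entries. */
    static int[][] accumulate(Map<String, Integer> termDocFreqs, int gridLevel) {
        int size = 1 << gridLevel;
        int[][] grid = new int[size][size];
        for (Map.Entry<String, Integer> e : termDocFreqs.entrySet()) {
            // Only terms at exactly the grid level contribute; coarser and
            // finer terms are skipped in this simplified sketch.
            if (e.getKey().length() != gridLevel) {
                continue;
            }
            int[] cr = cellToColumnRow(e.getKey());
            grid[cr[0]][cr[1]] += e.getValue();
        }
        return grid;
    }
}
```

The memory footprint is essentially the integer grid itself, which matches the "no/low memory" point above.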
9
Solr Heatmap Faceting
On an RPT field
(SpatialRecursivePrefixTreeFieldType)
prefixTree="packedQuad" (optional)
Query:
/select?facet=true
&facet.heatmap=geo_rpt
&facet.heatmap.geom=
["-180 -90" TO "180 90"]
&facet.heatmap.format=ints2D or png
// Normal Solr response...
"facet_counts":{
... // facet response fields
"facet_heatmaps":{
"geo_rpt":[
"gridLevel",2,
"columns",32,
"rows",32,
"minX",-180.0,
"maxX",180.0,
"minY",-90.0,
"maxY",90.0,
"counts_ints2D", [null, null, [0, 0, ... ]]
...
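A client usually has to expand the ints2D response into a dense grid before rendering. The sketch below assumes the JSON has already been parsed into nested lists and that a null row stands for a row of zeros, as the `[null, null, [0, 0, ...]]` sample suggests; the class and method names are hypothetical.

```java
import java.util.List;

final class HeatmapCounts {
    /**
     * Expands a parsed counts_ints2D value into a dense [row][column] grid.
     * A null outer list or a null row is treated as all zeros (assumption
     * based on the sample response).
     */
    static int[][] toDenseGrid(List<List<Integer>> countsInts2D, int rows, int columns) {
        int[][] grid = new int[rows][columns]; // Java zero-fills new int arrays
        if (countsInts2D == null) {
            return grid;
        }
        for (int r = 0; r < rows && r < countsInts2D.size(); r++) {
            List<Integer> row = countsInts2D.get(r);
            if (row == null) {
                continue; // empty row, leave zeros
            }
            for (int c = 0; c < columns && c < row.size(); c++) {
                grid[r][c] = row.get(c);
            }
        }
        return grid;
    }
}
```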
10
Solr Heatmap Resources
Solr Ref guide:
https://cwiki.apache.org/confluence/display/solr/Spatial+Search
Jack Reed’s Tutorial: http://www.jack-reed.com/2015/06/29/visualizing-10-million-geonames-with-leaflet-solr-heatmap-facets.html
Live Demo: http://worldwidegeoweb.com
Open-source JavaScript Solr Heatmap Libraries
https://github.com/spacemansteve/SolrHeatmapLayer
https://github.com/mejackreed/leaflet-solr-heatmap
https://github.com/voyagersearch/leaflet-solr-heatmap
11
Geo3D: Shapes on the Surface of a Sphere
… or Ellipsoid of configurable axis
Not a general 3D space geometry lib
Internally uses geocentric X, Y, Z coordinates (hence 3D) with
3D planar geometry mathematics
Shapes: Point, Lat-Lon Rect, Circle, Polygons, Path (LineString)
with optional buffer
Distance computations: Arc (angular or surface), Linear (straight-line), Normal
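To make the geocentric X, Y, Z idea concrete, here is a minimal sketch on a unit sphere only (no ellipsoid support). It does not use the Geo3D API; `UnitSphere` and its methods are hypothetical names. Once points are converted to unit vectors, an arc distance is the angle between two vectors, and a point-in-circle test reduces to comparing a dot product with a precomputed cosine.

```java
final class UnitSphere {
    /** Converts latitude/longitude in degrees to a unit vector {x, y, z}. */
    static double[] toXyz(double latDeg, double lonDeg) {
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        double cosLat = Math.cos(lat);
        return new double[] {cosLat * Math.cos(lon), cosLat * Math.sin(lon), Math.sin(lat)};
    }

    static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    /** Arc (angular) distance in radians between two unit vectors. */
    static double arcRadians(double[] a, double[] b) {
        // Clamp to [-1, 1] to guard against rounding error before acos.
        return Math.acos(Math.max(-1.0, Math.min(1.0, dot(a, b))));
    }

    /**
     * True when p lies within the circle around center whose angular radius
     * has cosine cosRadius. No trigonometry is needed per tested point.
     */
    static boolean withinCircle(double[] center, double cosRadius, double[] p) {
        return dot(center, p) >= cosRadius;
    }
}
```

Multiplying the arc distance by a sphere radius gives a surface distance.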
12
All 2D Maps of the Earth Distort Straight Lines
A straight (as-the-bird-flies) path from Anchorage to Miami doesn’t actually cross the ocean!
13
Geo3D, continued…
Benefits
Inherently more accurate than 2D projected spatial
especially for big shapes or near poles
Many computations are fast; no expensive trigonometry
An alternative to JTS without the LGPL license (still)
Has own Lucene module (spatial3d), thus jar file
Maven groupId: org.apache.lucene, artifact: lucene-spatial3d
No Solr integration yet; pending more Spatial4j integration
In progress!
14
Index & Search Geo3D Geometries
Spatial4j Geo3dShape
wrapper with RPT
In Lucene-spatial for now
Index Geo3d shapes
Limited to grid accuracy
Query by Geo3d shape
Limited distance sort
Heatmaps
Geo3DPointField &
PointInGeo3DShapeQuery
Based on a 3D BKD index
In spatial3d module
Index points-only
Query by Geo3d shape
No distance sort
Leaner & faster than RPT?
v5.4 v5.2
15
RPT/SpatialPrefixTrees and Accuracy
RecursivePrefixTree (RPT) uses Lucene’s index as a PrefixTree
Thus represents shapes as grid cells of varying precision by
prefix
Example, a point shape:
D, DR, DRT, DRT2, DRT2Y
More accuracy scales
Example, a polygon shape:
Too many to list… 508 cells
More accuracy does NOT scale
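The point example can be reproduced with a plain geohash encoder. This is a sketch of the standard geohash scheme written from scratch, not the Spatial4j or Lucene code; `GeohashPrefixes` is a hypothetical name, and it emits lowercase tokens whereas the slide shows them in uppercase. A point yields one term per precision level, which is why extra accuracy is cheap for points and expensive for polygons that need many cells per level.

```java
import java.util.ArrayList;
import java.util.List;

final class GeohashPrefixes {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Encodes a point as a geohash of the given length. */
    static String encode(double lat, double lon, int length) {
        double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
        StringBuilder hash = new StringBuilder(length);
        boolean useLon = true; // bits alternate, starting with longitude
        int bitCount = 0, value = 0;
        while (hash.length() < length) {
            if (useLon) {
                double mid = (minLon + maxLon) / 2;
                if (lon >= mid) { value = (value << 1) | 1; minLon = mid; }
                else            { value = value << 1;       maxLon = mid; }
            } else {
                double mid = (minLat + maxLat) / 2;
                if (lat >= mid) { value = (value << 1) | 1; minLat = mid; }
                else            { value = value << 1;       maxLat = mid; }
            }
            useLon = !useLon;
            if (++bitCount == 5) { // every 5 bits become one base-32 character
                hash.append(BASE32.charAt(value));
                bitCount = 0;
                value = 0;
            }
        }
        return hash.toString();
    }

    /** Lists every prefix of a cell token, coarsest first. */
    static List<String> prefixes(String token) {
        List<String> out = new ArrayList<>();
        for (int i = 1; i <= token.length(); i++) {
            out.add(token.substring(0, i));
        }
        return out;
    }
}
```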
16
Combining RPT with Serialized Geometry
RPT (RecursivePrefixTreeStrategy) is the grid index (inaccurate)
SDV (SerializedDVStrategy) stores serialized geometry (accurate)
RPT + SDV → CompositeSpatialStrategy
Accuracy & speed & smaller indexes
Optimized intersects predicate avoids some geometry checks
> 80% faster intersects queries, 75% smaller index
Solr adapter: RptWithGeometrySpatialField
Compatible with the Heatmaps feature
Includes a shape cache (per-segment); configurable
v5.2
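The idea of skipping geometry checks can be sketched as a two-phase filter in plain Java. This is not CompositeSpatialStrategy itself: it uses planar coordinates, unit-square grid cells, indexed points, and a circular query, and `TwoPhaseFilter` is a hypothetical name. Cells wholly outside the query are rejected and cells wholly inside are accepted from the grid alone; only points in cells on the query's edge pay for the exact check.

```java
import java.util.List;

final class TwoPhaseFilter {
    private static double dist2(double x1, double y1, double x2, double y2) {
        double dx = x1 - x2, dy = y1 - y2;
        return dx * dx + dy * dy;
    }

    /**
     * Counts points within radius r of (cx, cy).
     * Returns {matchCount, exactCheckCount}. A real index would classify
     * each cell once; this sketch classifies per point for brevity.
     */
    static int[] search(List<double[]> points, double cx, double cy, double r) {
        int matches = 0, exactChecks = 0;
        double r2 = r * r;
        for (double[] p : points) {
            double cellX = Math.floor(p[0]), cellY = Math.floor(p[1]);
            // Phase 1a: the cell's nearest point to the center is too far -> disjoint.
            double nearX = Math.max(cellX, Math.min(cx, cellX + 1));
            double nearY = Math.max(cellY, Math.min(cy, cellY + 1));
            if (dist2(nearX, nearY, cx, cy) > r2) {
                continue;
            }
            // Phase 1b: the cell's farthest corner is inside -> whole cell matches.
            double farX = Math.abs(cx - cellX) > Math.abs(cx - (cellX + 1)) ? cellX : cellX + 1;
            double farY = Math.abs(cy - cellY) > Math.abs(cy - (cellY + 1)) ? cellY : cellY + 1;
            if (dist2(farX, farY, cx, cy) <= r2) {
                matches++;
                continue;
            }
            // Phase 2: edge cell, so consult the exact geometry.
            exactChecks++;
            if (dist2(p[0], p[1], cx, cy) <= r2) {
                matches++;
            }
        }
        return new int[] {matches, exactChecks};
    }
}
```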
17
Topic: New Approaches
Lucene
DimensionalValues (BKD Tree Indexes)
GeoPointField
New Lucene index type for numeric values
Including multi-dimensional values!
Old: IntField, FloatField etc., trie indexing is now legacy
New: DimensionalIntField, DimensionalFloatField, etc. with
DimensionalRangeQuery, …
Implemented using a BKD Index
Paper: https://www.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf
Much faster and more compact than trie/prefix-tree based indexes
Whither term auto-prefixing? LUCENE-5879 Defunct?
v6.0
DimensionalValues (BKD Index)
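For intuition about how a block k-d tree answers multi-dimensional range queries, here is a small in-memory sketch. It is not Lucene's BKD implementation or file format; `BlockKdSketch`, the leaf size of 4, and the two-dimensional restriction are all assumptions made for illustration. Points are split at the median on alternating dimensions, each node keeps a bounding box, and a query skips any subtree whose box is disjoint from the query range.

```java
import java.util.Arrays;
import java.util.Comparator;

final class BlockKdSketch {
    static final int LEAF_SIZE = 4; // hypothetical block size
    final double[] min = new double[2];
    final double[] max = new double[2];
    final double[][] leafPoints; // non-null only for leaf blocks
    final BlockKdSketch left;
    final BlockKdSketch right;

    static BlockKdSketch build(double[][] points) {
        double[][] copy = points.clone(); // sorting reorders rows, so work on a copy
        return new BlockKdSketch(copy, 0, copy.length, 0);
    }

    private BlockKdSketch(double[][] pts, int from, int to, int depth) {
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (int i = from; i < to; i++) {
            for (int d = 0; d < 2; d++) {
                min[d] = Math.min(min[d], pts[i][d]);
                max[d] = Math.max(max[d], pts[i][d]);
            }
        }
        if (to - from <= LEAF_SIZE) {
            leafPoints = Arrays.copyOfRange(pts, from, to);
            left = null;
            right = null;
        } else {
            final int dim = depth % 2; // alternate the split dimension
            Arrays.sort(pts, from, to, Comparator.comparingDouble((double[] p) -> p[dim]));
            int mid = (from + to) >>> 1; // median split
            leafPoints = null;
            left = new BlockKdSketch(pts, from, mid, depth + 1);
            right = new BlockKdSketch(pts, mid, to, depth + 1);
        }
    }

    /** Counts points inside the inclusive box [qMin, qMax]. */
    int count(double[] qMin, double[] qMax) {
        for (int d = 0; d < 2; d++) {
            if (max[d] < qMin[d] || min[d] > qMax[d]) {
                return 0; // bounding box is disjoint from the query: prune
            }
        }
        if (leafPoints != null) {
            int n = 0;
            for (double[] p : leafPoints) {
                if (p[0] >= qMin[0] && p[0] <= qMax[0] && p[1] >= qMin[1] && p[1] <= qMax[1]) {
                    n++;
                }
            }
            return n;
        }
        return left.count(qMin, qMax) + right.count(qMin, qMax);
    }
}
```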
19
Multiple Fields/Queries using this:
(1D) DimensionalIntField
(2D) DimensionalLatLonField
(3D) Geo3DPointField (previously described)
And you can write your own
…continued
20
Efficient range search on single/multi-valued numbers or terms
Could be used for numbers, dates, IPv6 bytes, …
Alternatives: LegacyIntField etc. (trie), DateRangeField (RPT)
Would love to see a benchmark!
How-To:
Dimensional___Field: Int, Long, Float, Double, Binary
DimensionalRangeQuery (or DimensionalQuery?)
v5.3
DimensionalValues 1D
21
Efficient 2D geospatial point index
Alternative to RPT or GeoPointField
In lucene-sandbox
No Lucene-spatial module SpatialStrategy wrappers yet, thus no Spatial4j
Shape integration nor Solr integration yet
How-To:
Index: DimensionalLatLonField
Query:
DimensionalPointInBBoxQuery
DimensionalPointInPolygonQuery
point-radius (circle) — in-progress LUCENE-6698
v5.3
DimensionalValues 2D: DimensionalLatLonField
Cool video: https://www.youtube.com/watch?v=x9WnzOvsGKs
22
GeoPointField
2D geospatial point field
Indexed point-only data, single/multi-valued
Spatial 2D Trie/PrefixTree terms index
But not affiliated with Lucene-spatial SpatialPrefixTree/RPT
Configurable 2^x grid size (defaults to 512)
Compact bit interleaved Z-order encoding
Re-uses much of Lucene’s numeric precisionStep &
MultiTermQuery logic
2-phase grid/postings then doc-values algorithm
v5.3
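Bit-interleaved Z-order encoding can be sketched in a few lines. This is an illustration of the general technique, not GeoPointField's actual encoder: the 32 bits per dimension, the quantization formula, and the `ZOrder` class are assumptions. Interleaving puts longitude bits in even positions and latitude bits in odd positions, so nearby points tend to share long key prefixes.

```java
final class ZOrder {
    /** Spreads the low 32 bits of v so they occupy the even bit positions. */
    static long spread(long v) {
        v &= 0xFFFFFFFFL;
        v = (v | (v << 16)) & 0x0000FFFF0000FFFFL;
        v = (v | (v << 8))  & 0x00FF00FF00FF00FFL;
        v = (v | (v << 4))  & 0x0F0F0F0F0F0F0F0FL;
        v = (v | (v << 2))  & 0x3333333333333333L;
        v = (v | (v << 1))  & 0x5555555555555555L;
        return v;
    }

    /** Interleaves two 32-bit values: x in even bits, y in odd bits. */
    static long interleave(long x, long y) {
        return spread(x) | (spread(y) << 1);
    }

    /** Maps a value in [min, max] to one of 2^32 buckets (assumed resolution). */
    static long quantize(double value, double min, double max) {
        double scaled = (value - min) / (max - min); // 0..1
        long q = (long) (scaled * 4294967296.0);
        return Math.min(q, 0xFFFFFFFFL); // clamp the value == max case
    }

    /** Encodes a lat/lon pair as a single 64-bit Z-order key. */
    static long encode(double lat, double lon) {
        return interleave(quantize(lon, -180, 180), quantize(lat, -90, 90));
    }
}
```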
…continued
Has no affiliation with Spatial4j, RPT, JTS, or SpatialStrategy
No Heatmaps, No custom Shape implementations
No Solr support yet
No dependencies
Easy to use compared to RPT; simpler internally too
How-To:
doc.add(new GeoPointField(name, lon, lat, Store.YES))
GeoPointDistanceQuery (sphere only) or GeoPointInBBoxQuery or
GeoPointInPolygonQuery or GeoPointDistanceRangeQuery
Cool video: https://www.youtube.com/watch?v=l2zB9TDUAL4
24
Topic: Some Pending Spatial TODOs
Spatial4j
JTS-free polygon API
(in-progress)
Geo3D adapter
Lucene
FlexPrefixTree — LUCENE-4922
Heatmap optimized FlexPrefixTree
(Breadth First Search layout)
SpatialStrategy adapters for
GeoPointField, DimensionalLatLonField,
Geo3DPointField
Solr
Better spatial Solr QParsers —
SOLR-4242
GeoJSON parsing
More FieldType adapters for
latest Lucene spatial
Nearest-neighbor search
DateRangeField faceting
25
That’s all for now; thanks for coming!
Need Lucene/Solr guidance or custom development?
Contact me!
Email: dsmiley@apache.org
LinkedIn: http://www.linkedin.com/in/davidwsmiley
G+: +DavidSmiley
Twitter: @DavidWSmiley


2016-01 Lucene Solr spatial in 2015, NYC Meetup

  • 1. Lucene/Solr Spatial in 2015 David Smiley Search Engineer/Consultant (Freelance)
  • 2. 2 About David Smiley Freelance Search Developer/Consultant Expert Lucene/Solr development skills, advise (consulting), training Java, spatial, and full-stack experience Apache Lucene/Solr committer & PMC member Primary author of “Apache Solr Enterprise Search Server”
  • 3. 3 More Spatial Contributors! Spatial4j Lucene Solr David Smiley ✔️ ✔️ ✔️ Ryan McKinley ✔️ Justin Deoliveira ✔️ Mike McCandless ✔️ Nick Knize ✔️ Karl Wright ✔️ Ishan Chattopadhyaya ✔️
  • 4. 4 Agenda New Features / Capabilities New Approaches Improvements Pending
  • 5. Lucene’s Spatial Module • Multiple approaches to index spatial data abstract class SpatialStrategy (5+ concrete implementations) • RecursivePrefixTreeStrategy (RPT) is most prominent, versatile • Grid based • Uses Spatial4j lib for shapes, distance calculations, and WKT • Uses JTS Topology Suite lib for polygons Shape SpatialPrefixTree / Cell PrefixTreeStrategy IntersectsPrefixTreeFilter Contains… Within…Geohash | Quad
  • 6. Topic: New Features. Heatmaps / grid faceting (Lucene, Solr); surface-of-sphere shapes, Geo3d (Lucene); accurate indexed geometries (Lucene, Solr); GeoJSON read/write (Spatial4j)
  • 7. Heatmaps: Spatial Grid Faceting (v5.2). Spatial density summary via grid faceting; also useful for point-plotting search results. Usually rendered with a gradient radius. Lucene & Solr APIs. Usually scalable & fast.
  • 8. Heatmaps Under the Hood: Requires a PrefixTreeStrategy Lucene field (grid based). The algorithm enumerates the underlying cells/terms and accumulates each count into the corresponding grid cell. Conceptually, facet.method=enum for spatial. Works on non-point indexed shapes too. Complexity: O(cells * cellDepthFactor), not O(docs). No/low memory; mainly the grid of integers. Solr will distribute to shards and merge. Could be faster still; a BFS (vs DFS) layout would be perfect.
  • 9. Solr Heatmap Faceting: On an RPT field (SpatialRecursivePrefixTreeFieldType); prefixTree="packedQuad" (optional). Query: /select?facet=true&facet.heatmap=geo_rpt&facet.heatmap.geom=["-180 -90" TO "180 90"]&facet.heatmap.format=ints2D (or png). Response: // Normal Solr response... "facet_counts":{ ... // facet response fields "facet_heatmaps":{ "loc_srpt":[ "gridLevel",2, "columns",32, "rows",32, "minX",-180.0, "maxX",180.0, "minY",-90.0, "maxY",90.0, "counts_ints2D", [null, null, [0, 0, ... ]] ...
  • 10. Solr Heatmap Resources: Solr Ref Guide: https://cwiki.apache.org/confluence/display/solr/Spatial+Search Jack Reed’s tutorial: http://www.jack-reed.com/2015/06/29/visualizing-10-million-geonames-with-leaflet-solr-heatmap-facets.html Live demo: http://worldwidegeoweb.com Open-source JavaScript Solr heatmap libraries: https://github.com/spacemansteve/SolrHeatmapLayer https://github.com/mejackreed/leaflet-solr-heatmap https://github.com/voyagersearch/leaflet-solr-heatmap
  • 11. Geo3D: Shapes on the Surface of a Sphere … or an ellipsoid of configurable axis. Not a general 3D space geometry lib. Internally uses geocentric X, Y, Z coordinates (hence 3D) with 3D planar geometry mathematics. Shapes: Point, Lat-Lon Rect, Circle, Polygons, Path (LineString) with optional buffer. Distance computations: Arc (angular or surface), Linear (straight-line), Normal.
  • 12. All 2D Maps of the Earth Distort Straight Lines: A straight bird-flies path from Anchorage to Miami doesn’t actually cross the ocean!
  • 13. Geo3D, continued… Benefits: Inherently more accurate than 2D projected spatial, especially for big shapes or near the poles. Many computations are fast; no expensive trigonometry. An alternative to JTS without the LGPL license (still). Has its own Lucene module (spatial3d), thus its own jar file. Maven groupId: org.apache.lucene, artifact: lucene-spatial3d. No Solr integration yet; pending more Spatial4j integration. In progress!
  • 14. Index & Search Geo3D Geometries. Option 1: Spatial4j Geo3dShape wrapper with RPT. In Lucene-spatial for now; index Geo3d shapes; limited to grid accuracy; query by Geo3d shape; limited distance sort; heatmaps. Option 2: Geo3DPointField & PointInGeo3DShapeQuery. Based on a 3D BKD index; in the spatial3d module; index points only; query by Geo3d shape; no distance sort; leaner & faster than RPT? (Version tags on the slide: v5.4, v5.2.)
  • 15. RPT/SpatialPrefixTrees and Accuracy: RecursivePrefixTree (RPT) uses Lucene’s index as a PrefixTree, and thus represents shapes as grid cells of varying precision by prefix. Example, a point shape: D, DR, DRT, DRT2, DRT2Y. More accuracy scales. Example, a polygon shape: too many to list… 508 cells. More accuracy does NOT scale.
  • 16. Combining RPT with Serialized Geometry (v5.2): RPT (RecursivePrefixTreeStrategy) is the grid index (inaccurate). SDV (SerializedDVStrategy) stores serialized geometry (accurate). RPT + SDV → CompositeSpatialStrategy. Accuracy & speed & smaller indexes. Optimized intersects predicate avoids some geometry checks. > 80% faster intersects queries, 75% smaller index. Solr adapter: RptWithGeometrySpatialField. Compatible with the Heatmaps feature. Includes a shape cache (per-segment); configurable.
  • 17. Topic: New Approaches. Lucene DimensionalValues (BKD Tree Indexes); GeoPointField
  • 18. DimensionalValues (BKD Index) (v6.0): New Lucene index type for numeric values, including multi-dimensional values! Old: IntField, FloatField, etc.; trie indexing is now legacy. New: DimensionalIntField, DimensionalFloatField, etc., with DimensionalRangeQuery, … Implemented using a BKD index. Paper: https://www.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf Much faster and more compact than trie/prefix-tree based indexes. Whither term auto-prefixing (LUCENE-5879)? Defunct?
  • 19. DimensionalValues, continued: Multiple fields/queries use this: (1D) DimensionalIntField; (2D) DimensionalLatLonField; (3D) Geo3DPointField (previously described). And you can write your own.
  • 20. DimensionalValues 1D (v5.3): Efficient range search on single/multi-valued numbers or terms. Could be used for numbers, dates, IPV6 bytes, … Alternatives: LegacyIntField etc. (trie), DateRangeField (RPT). Would love to see a benchmark! How-To: Dimensional___Field: Int, Long, Float, Double, Binary; DimensionalRangeQuery (or DimensionalQuery?)
  • 21. DimensionalValues 2D: DimensionalLatLonField (v5.3): Efficient 2D geospatial point index. Alternative to RPT or GeoPointField. In lucene-sandbox. No Lucene-spatial module SpatialStrategy wrappers yet, thus no Spatial4j Shape integration nor Solr integration yet. How-To: Index: DimensionalLatLonField. Query: DimensionalPointInBBoxQuery, DimensionalPointInPolygonQuery; point-radius (circle) is in progress (LUCENE-6698). Cool video: https://www.youtube.com/watch?v=x9WnzOvsGKs
  • 22. GeoPointField (v5.3): 2D geospatial point field. Indexes point-only data, single/multi-valued. Spatial 2D Trie/PrefixTree terms index, but not affiliated with Lucene-spatial SpatialPrefixTree/RPT. Configurable 2^x grid size (defaults to 512). Compact bit-interleaved Z-order encoding. Re-uses much of Lucene’s numeric precisionStep & MultiTermQuery logic. 2-phase algorithm: grid/postings, then doc-values.
  • 23. GeoPointField, continued: Has no affiliation with Spatial4j, RPT, JTS, or SpatialStrategy. No Heatmaps, no custom Shape implementations. No Solr support yet. No dependencies. Easy to use compared to RPT; simpler internally too. How-To: doc.add(new GeoPointField(name, lon, lat, Store.YES)); query with GeoPointDistanceQuery (sphere only), GeoPointInBBoxQuery, GeoPointInPolygonQuery, or GeoPointDistanceRangeQuery. Cool video: https://www.youtube.com/watch?v=l2zB9TDUAL4
  • 24. Topic: Some Pending Spatial TODOs. Spatial4j: JTS-free polygon API (in-progress); Geo3D adapter. Lucene: FlexPrefixTree (LUCENE-4922); heatmap-optimized FlexPrefixTree (Breadth First Search layout); SpatialStrategy adapters for GeoPointField, DimensionalLatLonField, Geo3DPointField. Solr: better spatial Solr QParsers (SOLR-4242); GeoJSON parsing; more FieldType adapters for latest Lucene spatial; nearest-neighbor search; DateRangeField faceting.
  • 25. That’s all for now; thanks for coming! Need Lucene/Solr guidance or custom development? Contact me! Email: dsmiley@apache.org LinkedIn: http://www.linkedin.com/in/davidwsmiley G+: +DavidSmiley Twitter: @DavidWSmiley
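
Slide 8 describes the heatmap algorithm as walking the indexed grid cells and adding each cell's count into a grid of integers. The sketch below illustrates only that idea and is not Lucene's implementation. The class `HeatmapSketch`, the method `accumulate`, and the one-letter-per-level quad tokens are all hypothetical. It also assumes every document is a point indexed at the requested grid level, so it ignores the coarser leaf cells that non-point shapes would produce.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of grid-cell counting; not Lucene code. */
final class HeatmapSketch {

    /**
     * Accumulates per-cell document counts into a (2^gridLevel x 2^gridLevel) grid.
     * Tokens use an assumed quad encoding with one char per level:
     * 'A' = upper-left, 'B' = upper-right, 'C' = lower-left, 'D' = lower-right.
     * The loop visits each cell once, so cost grows with cells, not documents.
     */
    static int[][] accumulate(Map<String, Integer> cellCounts, int gridLevel) {
        int size = 1 << gridLevel;
        int[][] grid = new int[size][size]; // indexed [row][column], row 0 at the top
        for (Map.Entry<String, Integer> e : cellCounts.entrySet()) {
            String token = e.getKey();
            if (token.length() != gridLevel) {
                continue; // simplification: skip coarser and finer cells
            }
            int col = 0;
            int row = 0;
            for (int i = 0; i < gridLevel; i++) {
                char c = token.charAt(i);
                col = (col << 1) | ((c == 'B' || c == 'D') ? 1 : 0);
                row = (row << 1) | ((c == 'C' || c == 'D') ? 1 : 0);
            }
            grid[row][col] += e.getValue();
        }
        return grid;
    }

    public static void main(String[] args) {
        Map<String, Integer> cells = new LinkedHashMap<>();
        cells.put("A", 5);  // coarser cell, skipped at level 2
        cells.put("AB", 3);
        cells.put("DD", 2);
        System.out.println(Arrays.deepToString(accumulate(cells, 2)));
    }
}
```

The only state is the integer grid itself, which matches the slide's "No/low memory" remark.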
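
Slides 11 and 13 say Geo3D works in geocentric X, Y, Z coordinates and that many computations avoid expensive trigonometry. The sketch below shows the general idea on a unit sphere using only the standard library. It does not use the Geo3D API, and the class and method names are hypothetical. Once a circle's `cos(radius)` is precomputed, testing whether a point lies inside it takes a single dot product.

```java
/** Hypothetical unit-sphere illustration; not the Geo3D API. */
final class GeocentricSketch {

    /** Converts latitude/longitude in degrees to x, y, z on the unit sphere. */
    static double[] toXyz(double latDeg, double lonDeg) {
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        double cosLat = Math.cos(lat);
        return new double[] {cosLat * Math.cos(lon), cosLat * Math.sin(lon), Math.sin(lat)};
    }

    static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    /** Arc (angular) distance in radians between two unit vectors. */
    static double arcDistance(double[] a, double[] b) {
        double d = Math.max(-1.0, Math.min(1.0, dot(a, b))); // guard against rounding
        return Math.acos(d);
    }

    /** Circle membership with no trigonometry per tested point. */
    static boolean withinCircle(double[] center, double cosRadius, double[] p) {
        return dot(center, p) >= cosRadius;
    }

    public static void main(String[] args) {
        double[] center = toXyz(0, 0);
        double cosRadius = Math.cos(Math.toRadians(10)); // computed once per circle
        System.out.println(withinCircle(center, cosRadius, toXyz(5, 5)));
        System.out.println(arcDistance(center, toXyz(0, 90)));
    }
}
```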
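
Slide 15 shows a point indexed as a chain of prefixes (D, DR, DRT, DRT2, DRT2Y), each naming a smaller grid cell. The sketch below is a standalone geohash encoder and prefix lister that shows where such a chain comes from. It assumes the standard geohash base-32 alphabet and longitude-first bit order. `GeohashPrefixSketch` is a hypothetical name, not a Lucene or Spatial4j class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical standalone geohash illustration; not Lucene's GeohashPrefixTree. */
final class GeohashPrefixSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Encodes a point by repeatedly halving the lon/lat ranges, 5 bits per character. */
    static String encode(double lat, double lon, int length) {
        double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
        StringBuilder sb = new StringBuilder(length);
        boolean lonTurn = true; // bits alternate, starting with longitude
        int bits = 0;
        int value = 0;
        while (sb.length() < length) {
            if (lonTurn) {
                double mid = (minLon + maxLon) / 2;
                if (lon >= mid) { value = (value << 1) | 1; minLon = mid; }
                else { value <<= 1; maxLon = mid; }
            } else {
                double mid = (minLat + maxLat) / 2;
                if (lat >= mid) { value = (value << 1) | 1; minLat = mid; }
                else { value <<= 1; maxLat = mid; }
            }
            lonTurn = !lonTurn;
            if (++bits == 5) {
                sb.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return sb.toString();
    }

    /** Lists every prefix of a cell token; a point is indexed under each one. */
    static List<String> prefixes(String token) {
        List<String> out = new ArrayList<>();
        for (int i = 1; i <= token.length(); i++) {
            out.add(token.substring(0, i));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(prefixes(encode(42.605, -5.603, 5)));
    }
}
```

A point adds one term per level, so extra precision is cheap. A polygon needs many cells at each level, which is why the slide says more accuracy does not scale for polygons.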
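
Slide 18 names a BKD index as the structure behind DimensionalValues. The sketch below shows the underlying k-d tree idea with small leaf blocks: split points at the median on alternating dimensions, then answer a range query by skipping subtrees that cannot overlap the query box. It is an in-memory illustration only, not Lucene's on-disk BKD implementation. `KdBlockSketch`, `LEAF_SIZE`, and the method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical in-memory k-d tree with leaf blocks; not Lucene's BKD code. */
final class KdBlockSketch {
    private static final int LEAF_SIZE = 4; // assumed tiny block size for the demo

    private static final class Node {
        int from, to;     // range of points covered by this node
        int dim;          // split dimension (inner nodes only)
        double split;     // split value (inner nodes only)
        Node left, right; // both null for a leaf block
    }

    private final double[][] points;
    private final Node root;

    KdBlockSketch(double[][] input) {
        points = input.clone(); // reorder references without touching the caller's array
        root = build(0, points.length, 0);
    }

    private Node build(int from, int to, int dim) {
        Node n = new Node();
        n.from = from;
        n.to = to;
        if (to - from <= LEAF_SIZE) {
            return n; // small enough: keep as one unsorted block
        }
        Arrays.sort(points, from, to, (a, b) -> Double.compare(a[dim], b[dim]));
        int mid = (from + to) >>> 1;
        n.dim = dim;
        n.split = points[mid][dim]; // left values <= split, right values >= split
        int next = (dim + 1) % points[mid].length;
        n.left = build(from, mid, next);
        n.right = build(mid, to, next);
        return n;
    }

    /** Counts points inside the inclusive box [min, max]. */
    int count(double[] min, double[] max) {
        return count(root, min, max);
    }

    private int count(Node n, double[] min, double[] max) {
        int hits = 0;
        if (n.left == null) {
            for (int i = n.from; i < n.to; i++) {
                if (inside(points[i], min, max)) hits++;
            }
            return hits;
        }
        if (min[n.dim] <= n.split) hits += count(n.left, min, max);  // box reaches left half
        if (max[n.dim] >= n.split) hits += count(n.right, min, max); // box reaches right half
        return hits;
    }

    private static boolean inside(double[] p, double[] min, double[] max) {
        for (int d = 0; d < p.length; d++) {
            if (p[d] < min[d] || p[d] > max[d]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        double[][] pts = new double[100][];
        for (int i = 0; i < 100; i++) pts[i] = new double[] {i / 10, i % 10};
        System.out.println(new KdBlockSketch(pts).count(new double[] {2, 3}, new double[] {5, 7}));
    }
}
```

The same structure handles 1D, 2D, or 3D values because the split dimension simply cycles through however many dimensions each point has.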
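
Slide 22 mentions GeoPointField's "compact bit interleaved Z-order encoding". The sketch below shows the general technique: quantize longitude and latitude to integers, then interleave their bits so that nearby points tend to share a numeric prefix. The 16-bit width, the longitude-in-the-higher-bit order, and all names are assumptions chosen for readability. The slides do not give GeoPointField's actual layout.

```java
/** Hypothetical Z-order (Morton) illustration; not GeoPointField's actual encoding. */
final class ZOrderSketch {

    /** Maps v in [min, max] to an integer cell index in [0, 2^bits - 1]. */
    static int quantize(double v, double min, double max, int bits) {
        long cells = 1L << bits;
        long q = (long) ((v - min) / (max - min) * cells);
        return (int) Math.max(0, Math.min(q, cells - 1)); // clamp the range edges
    }

    /** Spreads the low 16 bits of x so a zero bit sits between each original bit. */
    static long spread(long x) {
        x &= 0xFFFFL;
        x = (x | (x << 8)) & 0x00FF00FFL;
        x = (x | (x << 4)) & 0x0F0F0F0FL;
        x = (x | (x << 2)) & 0x33333333L;
        x = (x | (x << 1)) & 0x55555555L;
        return x;
    }

    /** Interleaves two 16-bit values; lon takes the odd bits, lat the even bits. */
    static long interleave(int lonBits, int latBits) {
        return (spread(lonBits) << 1) | spread(latBits);
    }

    /** Encodes a lon/lat pair in degrees as a 32-bit Z-order value held in a long. */
    static long encode(double lon, double lat) {
        return interleave(quantize(lon, -180, 180, 16), quantize(lat, -90, 90, 16));
    }

    public static void main(String[] args) {
        System.out.println(Long.toBinaryString(encode(-71.06, 42.36)));
        System.out.println(Long.toBinaryString(encode(-71.05, 42.37)));
    }
}
```

Because the high-order bits describe coarse cells, a range of Z-order values corresponds to a grid cell. This is how a numeric terms index can act as a spatial prefix tree.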

Editor's Notes

  1. There was a “hit by a bus” syndrome until now. I’m going to be presenting a lot of stuff I did not work on.
  2. And list what this talk is *not*. Not a spatial overview
  3. Also, “Spaceman Steve” is a freelancer offering to do heatmap and other Solr/Geo work.
  4. Thanks to Karl Wright (Nokia/HERE)! The only surface-of-sphere shape supported prior to Geo3D was a circle.
  5. From https://www.reddit.com/r/MapPorn/comments/1p8dba/you_can_theoretically_drive_in_a_straight_line/ But don’t harp on this too much; 2D spatial is still useful.
  6. Geo3dShape w/ RPT is more flexible. Geo3DPointField is new & faster; more to come. Neither has Solr support yet.
  7. Suggest QuadPrefixTree for non-point indices like this. Also: supports most spatial predicates. Theoretically could work well for point data too; I haven’t tried.
  8. This is for 6.0. Some BKD versions existed in recent 5.x releases in lucene-sandbox.
  9. Will include non-range (exact lookup) optimization / API convenience. Unknown if this is faster for date ranges than DateRangePrefixTree. Likely smaller indices.
  10. Will point-radius (circle) have a flat and surface-of-sphere version?
  11. Perf? Likely faster than RPT. Indexes are certainly cheaper than Quad or even GeoHash too, via configuration of a high precisionStep.
  12. No promises! Some of these are new, brought on by new features. (e.g. Lucene then Solr adapter). This list is biased to my interests/awareness.