LUCENE 4 SPATIAL
2012 Basis Technology
Open Source Search Conference
Presented by David Smiley, MITRE




                                   © 2012 The MITRE Corporation. All rights reserved.
About David Smiley
• Working at MITRE for 12 years
  • web development, Java, search
  • 3 Solr apps, 1 Endeca
• Published 1st book on Solr; then 2nd edition (2009, 2011)
• Apache Lucene / Solr committer (2012)
  • Specializing in spatial
• Presented at Lucene Revolution (2010) & Basis O.S.
  Search Conference (2011)
• Taught Solr classes at MITRE (2010, 2011, 2012)
• Solr search consultant within MITRE and its sponsors,
  and privately via OpenSource Connections

What is Spatial Search?
Primary features:
  • Spatial filter query
  • Spatial distance sorting
  • Spatial distance relevancy (i.e., spatial query score)
  NOT “geocoding” (resolving “Boston” to its latitude and longitude)


Typical use-case:
1. Index a location for each Lucene document given a
   latitude & longitude
2. Then search for matching documents by a circle
   (point-radius) or bounding box
3. Then sort results by distance
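
The typical use-case above can be sketched in a few lines of plain Python. This is only an illustration of the filter-then-sort idea, not how Lucene implements it: the haversine formula, the Earth radius constant, the `search_circle` helper, and the sample coordinates are all assumptions of the sketch.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def search_circle(docs, center_lat, center_lon, radius_km):
    """Step 2: keep docs inside the circle. Step 3: sort them by distance."""
    hits = []
    for doc_id, (lat, lon) in docs.items():
        dist = haversine_km(center_lat, center_lon, lat, lon)
        if dist <= radius_km:
            hits.append((dist, doc_id))
    hits.sort()
    return [(doc_id, dist) for dist, doc_id in hits]


# Step 1: one (lat, lon) location per document (hypothetical sample data).
docs = {
    "boston": (42.3601, -71.0589),
    "bedford": (42.4906, -71.2760),
    "oakland": (37.8044, -122.2712),
}
results = search_circle(docs, 42.36, -71.06, 50.0)
```

A real index avoids scanning every document; the strategies described later exist precisely to make the filter step fast.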
History of Spatial for Lucene & Solr
• 2007: Local-Lucene
   • by Patrick O’Leary (AOL)
• 2009-09: LL -> Lucene spatial contrib in Lucene 2.9.0
   • Local-Lucene graduates to an official Lucene contrib module
• 2009-12: Spatial Search Plugin (SSP) for Solr
   • by Chris Male (JTeam -> Orange11, ElasticSearch)
• 2010-10: SOLR-2155 a geohash prefix tree filter
   • by David Smiley (MITRE)
• 2011-01: Lucene Spatial Playground (LSP)
   • by Ryan McKinley (Voyager GIS), David, and Chris
• 2011-03: Solr 3.1 new spatial features
   • by Grant Ingersoll and Yonik Seeley (LucidWorks)
• 2012-03: LSP -> Lucene 4 spatial module + Spatial4j
   • replaces former Lucene spatial contrib module

Lucene Spatial Committers
• David Smiley, MITRE
  • Bedford, MA
• Chris Male, ElasticSearch
  • New Zealand
• Ryan McKinley, Voyager GIS
  • Oakland, CA



Breakdown of Spatial Components

• Spatial4j: 43%
• Lucene spatial: 35%
• Misc: 16%
• Solr adapters: 6%

Total: 4,781 Non-Comment Source Statements (without javadocs or tests)
Spatial4j: It’s all about the shapes
• Shapes
  • Types: Point, Rectangle, Circle, Polygon
  • Geospatial & Euclidean/2D implementations
  • Intersection: within, contains, intersects, disjoint
• Distance and area math utilities
• Input/Output serialization to Well Known Text (WKT)
   • Ex: POLYGON ((30 10, 10 20, 20 40, 40 40, 30 10))
• ASL-licensed project, hosted on GitHub independently of Apache
• Requires JTS (3rd party LGPL) for polygon & WKT support
• Ported to .NET as Spatial4n and used by RavenDB
  • by Itamar Syn-Hershko
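
To make the four shape relations concrete, here is a minimal sketch for Euclidean rectangles only. It is not Spatial4j's API: the `Rect` class and the `relate` function are hypothetical names invented for this illustration, and identical rectangles are reported as "within" by convention of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in a flat (Euclidean) plane."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def relate(a: Rect, b: Rect) -> str:
    """Return how `a` relates to `b`: disjoint, within, contains, or intersects."""
    # No shared point at all.
    if a.max_x < b.min_x or a.min_x > b.max_x or a.max_y < b.min_y or a.min_y > b.max_y:
        return "disjoint"
    # Every edge of `a` lies inside `b`.
    if a.min_x >= b.min_x and a.max_x <= b.max_x and a.min_y >= b.min_y and a.max_y <= b.max_y:
        return "within"
    # Every edge of `b` lies inside `a`.
    if b.min_x >= a.min_x and b.max_x <= a.max_x and b.min_y >= a.min_y and b.max_y <= a.max_y:
        return "contains"
    # They overlap only partially.
    return "intersects"


small = Rect(1, 1, 2, 2)
big = Rect(0, 0, 3, 3)
```

Geospatial shapes add complications this sketch ignores, such as rectangles that cross the dateline.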


Lucene 4 Spatial Module
• There isn’t one best way to implement spatial indexing for
 all use-cases
  • Index just points, or other shapes too? Which?
  • Multiple shapes per field?
  • Query by Intersection? Contains? Within? Equals? Disjoint? …
  • Distance sorting? Query boost by distance?
    • Or more exotic shape relevancy like overlap percentage?
  • Trade off shape precision for speed?
• Multiple SpatialStrategy implementations:
  • RecursivePrefixTreeStrategy and TermQueryPrefixTreeStrategy
  • PointVectorStrategy
  • BBoxStrategy (currently in trunk, not 4x)
  • JtsGeoStrategy (in Spatial4j/LSP)
• Names subject to change!

Strategy: PointVector
• Similar to Solr’s PointType / LatLonType
  • X & Y trie double fields; caching via FieldCache
• Characteristics
  • Indexes points (only)
  • Single-valued field (no multi)
  • Query by rectangle or circle (only)
     • Circle uses FieldCache (requires memory)
     • Circle does bbox pre-filter for performance
     • Relations: Intersects, Within (only)
  • Exact precision for x & y coordinates and query shape
  • Distance sort
     • Uses FieldCache (requires memory)
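
The circle query with a bounding-box pre-filter can be sketched as follows. This is an illustrative flat-plane version, not the PointVectorStrategy code: the `circle_query` function and the sample points are assumptions, and the in-memory dictionary merely stands in for the X and Y fields.

```python
import math


def circle_query(points, cx, cy, radius):
    """Return ids of points inside the circle; `points` maps doc id -> (x, y)."""
    min_x, max_x = cx - radius, cx + radius
    min_y, max_y = cy - radius, cy + radius
    matches = []
    for doc_id, (x, y) in points.items():
        # Cheap pre-filter: two range checks, like range queries on the X and Y fields.
        if not (min_x <= x <= max_x and min_y <= y <= max_y):
            continue
        # Exact check, needed only for points that survived the bounding box.
        if math.hypot(x - cx, y - cy) <= radius:
            matches.append(doc_id)
    return matches


points = {"a": (0.0, 0.0), "b": (0.9, 0.9), "c": (3.0, 3.0), "d": (1.0, 0.0)}
inside = circle_query(points, 0.0, 0.0, 1.0)
```

Point "b" shows why the second check matters: it lies inside the bounding box but outside the circle.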



Strategy: RecursivePrefixTree
• Grid / Tile / Trie / Prefix-Tree based
  • With recursive descent algorithm
  • Or TermQueryPrefixTree alternative
• Choose Geohash (geo only) or Quad tree
• The most mature strategy to date
• The current evolution of SOLR-2155
• Potential rename to GridFilterSpatialStrategy
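
The prefix property that makes a geohash usable as a prefix tree can be seen in a small encoder. This is a sketch of the standard public geohash algorithm, not code taken from Lucene; the function name `geohash_encode` is an assumption.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, length=9):
    """Encode a lat/lon point in degrees as a geohash string of `length` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < length:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


coarse = geohash_encode(57.64911, 10.40744, 3)
fine = geohash_encode(57.64911, 10.40744, 9)
```

Each extra character subdivides the previous cell, so a shorter hash is always a prefix of a longer hash for the same point. That is what lets grid cells be indexed as terms and matched by prefix.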

Strategy: RecursivePrefixTree
• Characteristics:
  • Indexes all shapes
    • Variable precision of shape edges
       • Highly precise non-point shapes won’t scale
       • LineStrings are possibly not precise enough for your needs
  • Multi-valued field support
  • Query by any shape
    • Variable precision for query shape
       • Highest precision usually scales
    • Relations: Intersects (only)
  • Distance sort (w/ multi-value support)
    • Warning: immature, won’t scale
    • Uses significant amounts of memory
  • Fast spatial filtering; no cache needed
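
The idea of recursive descent with variable precision can be illustrated on a quad tree over a unit square. This is a simplified sketch, not the strategy's actual filter: the `quad_cells` function, the A/B/C/D cell labels, and the unit-square world are assumptions made for the example.

```python
def quad_cells(query, max_levels, world=(0.0, 0.0, 1.0, 1.0), prefix=""):
    """Return quad-tree cell tokens covering `query`.

    Boxes are (min_x, min_y, max_x, max_y). Descent stops at a cell that lies
    fully inside the query, or when `max_levels` (the precision limit) is hit.
    """
    wx0, wy0, wx1, wy1 = world
    qx0, qy0, qx1, qy1 = query
    # Prune cells that share no area with the query.
    if qx1 <= wx0 or qx0 >= wx1 or qy1 <= wy0 or qy0 >= wy1:
        return []
    cell_within_query = qx0 <= wx0 and qy0 <= wy0 and qx1 >= wx1 and qy1 >= wy1
    if cell_within_query or len(prefix) == max_levels:
        return [prefix]
    mid_x, mid_y = (wx0 + wx1) / 2, (wy0 + wy1) / 2
    children = [
        ("A", (wx0, mid_y, mid_x, wy1)),  # top-left
        ("B", (mid_x, mid_y, wx1, wy1)),  # top-right
        ("C", (wx0, wy0, mid_x, mid_y)),  # bottom-left
        ("D", (mid_x, wy0, wx1, mid_y)),  # bottom-right
    ]
    cells = []
    for label, child in children:
        cells.extend(quad_cells(query, max_levels, child, prefix + label))
    return cells


exact = quad_cells((0.0, 0.0, 0.5, 0.5), max_levels=2)
approx = quad_cells((0.1, 0.1, 0.4, 0.4), max_levels=2)
```

The second query does not align with the grid, so it is approximated by the four level-2 cells it touches. Raising `max_levels` tightens the approximation at the cost of more cells, which is the precision-versus-scale tradeoff listed above.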

Strategy: BBox
• Implemented with 4 doubles & 1 boolean
• Ported from ESRI Open Source Geoportal
• Characteristics:
  • Indexes rectangles (only)
  • Single-valued field (no multi)
  • Query by rectangle (only)
     • Supports all relations: Intersects, Within, Contains, …
  • Distance sort from box center
     • Uses FieldCache (requires memory)
  • Area overlap sorting
     • Sort results by percentage overlap between query and indexed boxes
     • Uses FieldCache (requires memory)
  • Note: FieldCache needs are somewhat high
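
Area overlap sorting can be sketched with a simple score. The slides do not give the formula, so this example assumes intersection-over-union as the overlap measure; the real BBoxStrategy may well score differently. The function names and sample boxes are hypothetical.

```python
def area(box):
    """Area of a (min_x, min_y, max_x, max_y) box; zero if the box is empty."""
    min_x, min_y, max_x, max_y = box
    return max(0.0, max_x - min_x) * max(0.0, max_y - min_y)


def overlap_score(query, indexed):
    """Intersection area divided by union area, in the range 0.0 to 1.0."""
    intersection = (
        max(query[0], indexed[0]),
        max(query[1], indexed[1]),
        min(query[2], indexed[2]),
        min(query[3], indexed[3]),
    )
    inter_area = area(intersection)
    union_area = area(query) + area(indexed) - inter_area
    return inter_area / union_area if union_area > 0 else 0.0


def sort_by_overlap(query, boxes):
    """Return doc ids ordered from best to worst overlap with the query box."""
    return sorted(boxes, key=lambda doc_id: overlap_score(query, boxes[doc_id]), reverse=True)


boxes = {"far": (5, 5, 6, 6), "partial": (1, 1, 3, 3), "same": (0, 0, 2, 2)}
ranked = sort_by_overlap((0, 0, 2, 2), boxes)
```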
Strategy: JtsGeoStrategy
• Stores any JTS geometry in Lucene 4’s DocValues
  • Stores WKB (Well Known Binary), the binary counterpart of WKT
     • Full vector geometry is retained for search
  • DocValues is mostly a better FieldCache
    • Faster loading into memory
    • Can be disk resident or memory
• Characteristics:
  • Indexes any shape
  • Single valued field but can be MultiPoint, MultiPolygon, etc.
  • Query by any shape
     • Uses DocValues (memory use optional)
     • Supports all relations: intersect, within, contains, …
  • No sorting
  • Experimental / immature status
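
To show what a WKB value looks like, the sketch below encodes and decodes a single 2D point using only Python's standard library. It covers the Point type only and is not the JTS writer; the helper names are assumptions.

```python
import struct

WKB_POINT = 1  # geometry type code for Point in WKB


def point_to_wkb(x, y):
    """Encode a 2D point: 1 byte order flag, 4-byte type, two 8-byte doubles."""
    # Byte order flag 1 means little-endian; "<" disables struct padding.
    return struct.pack("<BIdd", 1, WKB_POINT, x, y)


def wkb_to_point(data):
    """Decode a WKB point written in either byte order."""
    endian = "<" if data[0] == 1 else ">"
    geom_type, x, y = struct.unpack(endian + "Idd", data[1:21])
    if geom_type != WKB_POINT:
        raise ValueError("this sketch only handles WKB points")
    return x, y


encoded = point_to_wkb(1.0, 2.0)
decoded = wkb_to_point(encoded)
```

A point takes a fixed 21 bytes, while polygons grow with their vertex count, which is one reason disk-resident DocValues are attractive for full geometries.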

Solr Adapters
• Configuration:
<fieldType name="geo" class="solr.SpatialRecursivePrefixTreeFieldType"
           spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
           distErrPct="0.025" maxDistErr="0.000009" />
<field name="geo" type="geo" indexed="true" stored="true" multiValued="true" />

• Adding data:
<field name="geo">43.17614,-90.57341</field>
<field name="geo">POLYGON((-10 30, -40 40, -10 -20, 40 20, 0 0, -10 30))</field>

• Search Filter
fq=geo:"Intersects(Circle(54.729696,-98.525391 d=10))"

• Distance Sort
sort=query($sortsq) asc&sortsq={! score=distance v=$sq}&sq=geo:"Intersects(Circle(54.729696,-98.525391 d=10))"
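
When sending these parameters from a client, the quotes, spaces, and braces need URL encoding. The sketch below builds the query string with Python's standard library; the `spatial_params` helper and the `q=*:*` main query are assumptions, while the filter and sort syntax is copied from the examples above.

```python
from urllib.parse import parse_qs, urlencode


def spatial_params(field, lat, lon, d, sort_by_distance=False):
    """Build a URL-encoded Solr query string with a circle filter."""
    clause = f'{field}:"Intersects(Circle({lat},{lon} d={d}))"'
    params = {"q": "*:*", "fq": clause}
    if sort_by_distance:
        # Same indirection as the slide: sort by the score of a distance-scored query.
        params["sort"] = "query($sortsq) asc"
        params["sortsq"] = "{! score=distance v=$sq}"
        params["sq"] = clause
    return urlencode(params)


query_string = spatial_params("geo", 54.729696, -98.525391, 10, sort_by_distance=True)
decoded_params = parse_qs(query_string)
```

The resulting string can be appended to a Solr select URL; decoding it again with `parse_qs` is a quick way to confirm nothing was mangled.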




Future Possibilities
• Solr:
  • For multi-valued fields, filter out of the search results the
    points that don’t match the filter
  • Heatmap/grid faceting spatial summarization
• Spatial-Temporal search
  • 3d (x,y,t) point shapes, and “track” shape queries
• Support any query shape for all Strategies
• PrefixTreeStrategy:
  • More efficient binary grid encoding; use Hilbert Curve order
  • Better multi-value point caches
  • Cache-less sort of top-N results
  • More query relations: Contains, Within
• Configurable DocValues vs. FieldCache choice
• Choose floats or configurable bits instead of forcing doubles
• CircleStrategy
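
For readers unfamiliar with Hilbert curve ordering, the sketch below maps grid cells to their position along the curve using the well-known iterative algorithm. It only illustrates the ordering itself; how a prefix tree would use it is not specified in the slides, and the function name is an assumption.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve of an n x n grid.

    `n` must be a power of two, and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


# Walk a 4 x 4 grid in curve order.
order = sorted(((x, y) for x in range(4) for y in range(4)),
               key=lambda cell: hilbert_index(4, cell[0], cell[1]))
```

Consecutive cells along the curve are always neighbors in the grid, so nearby locations tend to receive nearby index values.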

Thank you!
• References
  • Lucene 4 spatial javadocs
    • https://builds.apache.org/job/Lucene-Artifacts-4.x/javadoc/spatial/
  • Spatial4j at GitHub
    • https://github.com/spatial4j/spatial4j (spatial4j.com redirects here)
    • http://spatial4j.16575.n6.nabble.com -- dev@lists.spatial4j.com
  • Solr
    • http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4

• Contact me:
  • David Smiley dsmiley@mitre.org dsmiley@apache.org





Lucene 4 spatial

  • 1. LUCENE 4 SPATIAL 2012 Basis Technology Open Source Search Conference Presented by David Smiley, MITRE © 2012 The MITRE Corporation. All rights reserved.
  • 2. About David Smiley • Working at MITRE for 12 years • web development, Java, search • 3 Solr apps, 1 Endeca • Published 1st book on Solr; then 2nd edition (2009, 2011) • Apache Lucene / Solr committer (2012) • Specializing in spatial • Presented at Lucene Revolution (2010) & Basis O.S. Search Conference (2011) • Taught Solr classes at MITRE (2010, 2011, 2012) • Solr search consultant within MITRE and its sponsors, and privately via OpenSource Connections
  • 3. What is Spatial Search? Primary features: • Spatial filter query • Spatial distance sorting • Spatial distance relevancy (i.e. spatial query score) NOT “geocoding”, i.e. resolving “Boston” to its latitude and longitude. Typical use-case: 1. Index a location for each Lucene document given a latitude & longitude 2. Then search for matching documents by a circle (point-radius) or bounding box 3. Then sort results by distance
  • 4. History of Spatial for Lucene & Solr • 2007: Local-Lucene • by Patrick O’Leary (AOL) • 2009-09: LL -> Lucene spatial contrib in Lucene 2.9.0 • Local-Lucene graduates to an official Lucene contrib module • 2009-12: Spatial Search Plugin (SSP) for Solr • by Chris Male (JTeam -> Orange11, ElasticSearch) • 2010-10: SOLR-2155 a geohash prefix tree filter • by David Smiley (MITRE) • 2011-01: Lucene Spatial Playground (LSP) • by Ryan McKinley (Voyager GIS), David, and Chris • 2011-03: Solr 3.1 new spatial features • by Grant Ingersoll and Yonik Seeley (LucidWorks) • 2012-03: LSP -> Lucene 4 spatial module + Spatial4j • replaces former Lucene spatial contrib module
  • 5. Lucene Spatial Committers • David Smiley, MITRE • Bedford, MA • Chris Male, Elastic Search • New Zealand • Ryan McKinley, Voyager GIS • Oakland, CA
  • 6. Breakdown of Spatial Components • Misc: 16% • Solr adapters: 6% • Spatial4j: 43% • Lucene spatial: 35% • Total: 4,781 Non-Comment Source Statements (without javadocs or tests)
  • 7. Spatial4j: It’s all about the shapes • Shapes • Types: Point, Rectangle, Circle, Polygon • Geospatial & Euclidean/2D implementations • Intersection: within, contains, intersects, disjoint • Distance and area math utilities • Input/Output serialization to Well Known Text (WKT) • Ex: POLYGON ((30 10, 10 20, 20 40, 40 40, 30 10)) • ASL licensed project independent of Apache on GitHub • Requires JTS (3rd party LGPL) for polygon & WKT support • Ported to .NET as Spatial4n and used by RavenDB • by Itamar Syn-Hershko
  • 8. Lucene 4 Spatial Module • There isn’t one best way to implement spatial indexing for all use-cases • Index just points, or other shapes too? Which? • Multiple shapes per field? • Query by Intersection? Contains? Within? Equals? Disjoint? … • Distance sorting? Query boost by distance? • Or more exotic shape relevancy like overlap percentage? • Trade off shape precision for speed? • Multiple SpatialStrategy implementations: • RecursivePrefixTreeStrategy and TermQueryPrefixTreeStrategy • PointVectorStrategy • BBoxStrategy (currently in trunk, not 4x) • JtsGeoStrategy (in Spatial4j/LSP) Names subject to change!
  • 9. Strategy: PointVector • Similar to Solr’s PointType / LatLonType • X & Y trie double fields; caching via FieldCache • Characteristics • Indexes points (only) • Single-valued field (no multi) • Query by rectangle or circle (only) • Circle uses FieldCache (requires memory) • Circle does bbox pre-filter for performance • Relations: Intersects, Within (only) • Exact precision for x & y coordinates and query shape • Distance sort • Uses FieldCache (requires memory)
  • 10. Strategy: RecursivePrefixTree • Grid / Tile / Trie / Prefix-Tree based • With recursive descent algorithm • Or TermQueryPrefixTree alternative • Choose Geohash (geo only) or Quad tree • The most mature strategy to date • The current evolution of SOLR-2155 • Potential rename to GridFilterSpatialStrategy
  • 11. Strategy: RecursivePrefixTree • Characteristics: • Indexes all shapes • Variable precision of shape edges • Highly precise shapes other than point won’t scale • LineStrings possibly not precise enough for your needs • Multi-valued field support • Query by any shape • Variable precision for query shape • Highest precision usually scales • Relations: Intersects (only) • Distance sort (w/ multi-value support) • Warning: immature, won’t scale • Uses significant amounts of memory • Fast spatial filtering; no cache needed
  • 12. Strategy: BBox • Implemented with 4 doubles & 1 boolean • Ported from ESRI Open Source GeoPortal • Characteristics: • Indexes rectangles (only) • Single-valued field (no multi) • Query by rectangle (only) • Supports all relations: Intersects, Within, Contains, … • Distance sort from box center • Uses FieldCache (requires memory) • Area overlap sorting • Sort results by percentage overlap between query and indexed boxes • Uses FieldCache (requires memory) • Note: FieldCache needs are somewhat high
  • 13. Strategy: JtsGeoStrategy • Stores any JTS geometry in Lucene 4’s DocValues • Stores WKB (Well Known Binary, the binary counterpart of WKT) • Full vector geometry is retained for search • DocValues is mostly a better FieldCache • Faster loading into memory • Can be disk resident or memory • Characteristics: • Indexes any shape • Single valued field but can be MultiPoint, MultiPolygon, etc. • Query by any shape • Uses DocValues (memory use optional) • Supports all relations: intersect, within, contains, … • No sorting • Experimental / immature status
  • 14. Solr Adapters • Configuration: <fieldType name="geo" class="solr.SpatialRecursivePrefixTreeFieldType" spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory" distErrPct="0.025" maxDistErr="0.000009" /> <field name="geo" type="geo" indexed="true" stored="true" multiValued="true" /> • Adding data: <field name="geo">43.17614,-90.57341</field> <field name="geo">POLYGON((-10 30, -40 40, -10 -20, 40 20, 0 0, -10 30))</field> • Search Filter fq=geo:"Intersects(Circle(54.729696,-98.525391 d=10))" • Distance Sort sort=query($sortsq) asc&sortsq={! score=distance v=$sq}&sq=store:"Intersects(Circle(54.729696,-98.525391 d=10))"
  • 15. Future Possibilities • Solr: • Filter out points in multi-valued field from search results not matching filter • Heatmap/grid faceting spatial summarization • Spatial-Temporal search • 3d (x,y,t) point shapes, and “track” shape queries • Support any query shape for all Strategies • PrefixTreeStrategy: • More efficient binary grid encoding; use Hilbert Curve order • Better multi-value point caches • Cache-less sort of top-N results • More query relations: Contains, Within • Configurable DocValues vs. FieldCache choice • Choose floats or configurable bits instead of forcing doubles • CircleStrategy
  • 16. Thank you! • References • Lucene 4 spatial javadocs • https://builds.apache.org/job/Lucene-Artifacts-4.x/javadoc/spatial/ • Spatial4j at GitHub • https://github.com/spatial4j/spatial4j (spatial4j.com redirects here) • http://spatial4j.16575.n6.nabble.com -- dev@lists.spatial4j.com • Solr • http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4 • Contact me: • David Smiley dsmiley@mitre.org dsmiley@apache.org
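
The Geohash grid option mentioned for the RecursivePrefixTree strategy can be illustrated with a small standalone sketch. It uses the standard public geohash encoding, not code from Lucene or Spatial4j, and the helper `cell_prefixes` is a hypothetical name introduced here to show why a prefix tree fits a term index: every coarser cell containing a point is a string prefix of the finer cell. How the real strategy lays out its terms may differ.

```python
# Illustrative sketch only: the standard geohash encoding,
# not the Lucene or Spatial4j implementation.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, length):
    """Encode a latitude/longitude pair as a geohash of `length` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < length:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def cell_prefixes(geohash):
    """Hypothetical helper: the grid cells containing a point, coarsest first."""
    return [geohash[:i] for i in range(1, len(geohash) + 1)]


print(geohash_encode(57.64911, 10.40744, 5))  # u4pru
print(cell_prefixes("u4pru"))  # ['u', 'u4', 'u4p', 'u4pr', 'u4pru']
```

Each added character splits the current cell into 32 smaller cells, so a longer hash means a more precise shape edge at the cost of more terms. This is the precision-versus-scale trade-off the slides describe.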

Editor's Notes

  1. Distance sorting & relevancy wind up being the same underlying technical requirement for the implementation.
  2. “Misc” is a demo web application plus a Lucene spatial strategy called “JtsSpatialStrategy” that cannot be included in Lucene spatial due to licensing.
  3. Polygons support dateline wrap. Well tested. Key differentiators: ASL licensed, Geospatial support, Circles & Polygons.
  4. In time there will be additional unique capabilities of different implementations. TermQueryPrefixTreeStrategy too. SpatialStrategies can be combined, just as people index text different ways simultaneously. See SpatialExample.java for some code samples.
  5. This is a simple strategy. I’d like to see it extended to support choosing floats or other more compact means of holding the coordinates in memory for a desired precision level.
  6. Recommend pairing with TwoDoublesStrategy for single-value distance sort
  7. Would like to see this made customizable to use floats or another compact representation.
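
Note 1 above says distance sorting and distance relevancy come down to the same requirement. A minimal sketch of that idea follows, assuming a haversine great-circle distance and a made-up score formula `1 / (1 + distance)`. Neither the function names nor the formula come from Lucene or Solr; they only show that one per-document distance can drive both the sort order and the score.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumed constant for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 so rounding error near antipodal points cannot break asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def rank_by_distance(docs, lat, lon):
    """docs: iterable of (doc_id, lat, lon). Returns (doc_id, km, score), nearest first."""
    ranked = []
    for doc_id, doc_lat, doc_lon in docs:
        km = haversine_km(lat, lon, doc_lat, doc_lon)
        score = 1.0 / (1.0 + km)  # hypothetical relevancy: closer documents score higher
        ranked.append((doc_id, km, score))
    ranked.sort(key=lambda row: row[1])  # ascending distance is also descending score
    return ranked


docs = [("far", 0.0, 90.0), ("near", 0.0, 1.0)]
for doc_id, km, score in rank_by_distance(docs, 0.0, 0.0):
    print(doc_id, round(km, 1), score)
```

Both outputs depend on a per-document coordinate lookup, which is why the slides keep pointing at FieldCache or DocValues memory use for sorting.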