LUCENE / SOLR 4 SPATIAL DEEP-DIVE
David Smiley
Software Systems Engineer, Lead
© 2013 The MITRE Corporation. All rights reserved.
2013 Lucene Revolution
Presented by David Smiley, MITRE
About David Smiley
• Working at MITRE for 13 years
• web development, Java, search
• 3 Solr apps, 1 Endeca
• Published 1st book on Solr; then 2nd edition (2009, 2011)
• Apache Lucene / Solr committer/PMC member (2012)
• Specializing in spatial
• Presented at Lucene Revolution (2010) & Basis O.S. Search Conference (2011, 2012)
• Taught Solr classes at MITRE (2010, 2011, 2012)
• Solr search consultant within MITRE and its sponsors, and privately
Agenda
• Background, overview
• Spatial4j
• Lucene spatial
• PrefixTree / Trie / Grid
• Solr spatial
• Demo
• Interesting use-cases
BACKGROUND &
OVERVIEW
What is Spatial Search?
Popular features:
• Spatial filter query
• Spatial distance sorting
• Spatial distance relevancy (i.e. spatial query score)
NOT “geocoding” (resolving “Boston” to its latitude and longitude)
Typical use-case:
1. Index a location for each Lucene document, given a latitude & longitude
2. Then search for matching documents by a circle (point-radius) or bounding box
3. Then sort results by distance
History of Spatial for Lucene & Solr
• 2007: Local-Lucene
• by Patrick O’Leary (AOL)
• 2009-09: LL -> Lucene spatial contrib in Lucene 2.9.0
• Local-Lucene graduates to an official Lucene contrib module
• 2009-12: Spatial Search Plugin (SSP) for Solr
• by Chris Male (JTeam -> Orange11, ElasticSearch)
• 2010-10: SOLR-2155 a geohash prefix tree filter
• by David Smiley (MITRE)
• 2011-01: Lucene Spatial Playground (LSP)
• by Ryan McKinley (Voyager GIS), David, and Chris
• 2011-03: Solr 3.1 new spatial features
• by Grant Ingersoll and Yonik Seeley (LucidWorks)
• 2012-03: LSP -> Lucene 4 spatial module + Spatial4j + SSP
• replaces former Lucene spatial contrib module
Lucene Spatial Committers
• David Smiley
• Works for MITRE
• Boston area
• Ryan McKinley
• Works for Voyager GIS
• Silicon Valley
• Chris Male
• Formerly at ElasticSearch
• New Zealand
Spatial decomposed
• Spatial4j
• Shapes, WKT, Distance calculations, JTS adapter
• Lucene spatial
• Strategies: PrefixTree (TermQuery & Recursive impl.), BBox, PointVector
• Solr adapters
• Misc: Spatial Solr Sandbox
• LSE
• JtsGeoStrategy
• Spatial-Demo (web app)
Lines of Code for Spatial Components
• Spatial4j: 43%
• Lucene spatial: 35%
• Solr adapters: 6%
• Misc: 16%
• Total: 4,781 Non-Comment Source Statements (without javadocs or tests) as of 2012-09
CarrotSearch Labs’ RandomizedTesting
• http://labs.carrotsearch.com/randomizedtesting.html
• Provides plumbing for repeatable randomized JUnit tests
• All the spatial test code uses it extensively
More generally, randomized testing is a philosophy / approach to testing
• A typical hard-coded test will only catch some regressions
• A randomized test will eventually catch just about anything, especially nasty edge cases
• However, these tests are harder to read / write / maintain
• Randomized testing helped find bugs related to…
• Computing the bounding box of a circle
• Computing the relationship of a circle to a rectangle that has all 4 of its corners inside it
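A seeded randomized test in this spirit can be sketched with plain `java.util.Random` (it does not use the RandomizedTesting library's API). The hypothetical `CircleBBoxRandomizedCheck` computes a geodetic circle's bounding box with the spherical formula Δlon = asin(sin r / cos lat), generates random points inside the circle, and counts any that fall outside the box. The sketch assumes the circle stays away from the poles and the dateline, which are precisely the edge cases a fuller randomized test would need to explore.

```java
import java.util.Random;

/** Hypothetical seeded randomized check: points inside a geodetic circle must lie in its bbox. */
final class CircleBBoxRandomizedCheck {

    /** {minLat, maxLat, minLon, maxLon} in degrees; assumes no pole or dateline crossing. */
    static double[] bbox(double lat, double lon, double radiusDeg) {
        double r = Math.toRadians(radiusDeg);
        double dLon = Math.toDegrees(Math.asin(Math.sin(r) / Math.cos(Math.toRadians(lat))));
        return new double[] {lat - radiusDeg, lat + radiusDeg, lon - dLon, lon + dLon};
    }

    /** {lat, lon} reached by travelling distDeg of arc from (lat, lon) on the given bearing. */
    static double[] destination(double lat, double lon, double distDeg, double bearingDeg) {
        double phi1 = Math.toRadians(lat);
        double d = Math.toRadians(distDeg);
        double theta = Math.toRadians(bearingDeg);
        double phi2 = Math.asin(Math.sin(phi1) * Math.cos(d)
                + Math.cos(phi1) * Math.sin(d) * Math.cos(theta));
        double dLambda = Math.atan2(Math.sin(theta) * Math.sin(d) * Math.cos(phi1),
                Math.cos(d) - Math.sin(phi1) * Math.sin(phi2));
        return new double[] {Math.toDegrees(phi2), lon + Math.toDegrees(dLambda)};
    }

    /** Counts random points that escaped their circle's bbox; the seed makes failures reproducible. */
    static int run(long seed, int iterations) {
        Random rnd = new Random(seed);
        double eps = 1e-9;                              // tolerance for floating-point error
        int failures = 0;
        for (int i = 0; i < iterations; i++) {
            double lat = -60 + 120 * rnd.nextDouble();  // stay away from the poles
            double lon = -150 + 300 * rnd.nextDouble(); // stay away from the dateline
            double radius = 10 * rnd.nextDouble();      // up to 10 degrees
            double[] box = bbox(lat, lon, radius);
            double[] p = destination(lat, lon, radius * rnd.nextDouble(), 360 * rnd.nextDouble());
            boolean inside = p[0] >= box[0] - eps && p[0] <= box[1] + eps
                    && p[1] >= box[2] - eps && p[1] <= box[3] + eps;
            if (!inside) {
                failures++;
            }
        }
        return failures;
    }
}
```

A failing run can be replayed exactly by calling `run` again with the same seed, which is the core idea behind repeatable randomized tests.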
SPATIAL4J
It’s all about the shapes
Spatial4j: It’s all about the shapes
https://github.com/spatial4j/spatial4j (spatial4j.com redirect)
• Shapes
• A “Shape” abstraction with multiple implementations
• Geodetic (sphere) & Cartesian/2D implementations
• Computes intersection relationship with other shapes
• Also…
• Distance and area math utilities, Geohash utilities
• Parsing Well Known Text (WKT) formatted shapes
• ASL licensed project independent of Apache on GitHub
• Requires JTS (LGPL licensed) for polygons & WKT*
• JTS is “JTS Topology Suite”
• * WKT parsing soon to be implemented directly by Spatial4j
• Ported to .NET as Spatial4n and used by RavenDB
• by Itamar Syn-Hershko
The case for Spatial4j’s existence
• Just for shapes? How much code could there be?
• You’d be surprised. Determining the relationship between a lat-lon rectangle and a geodetic circle (Within, Contains, Intersects, Disjoint) is non-trivial, and that’s just one shape.
• Lots of non-trivial test code goes with it.
• Why isn’t it a part of Lucene spatial?
• Parts of Spatial4j depend on JTS, an LGPL licensed library. The Lucene PMC voted not to introduce this compile-time dependency.
• Spatial4j is independently useful.
• Does this duplicate other open-source code that could be used instead?
• Spatial4j needs to be ASL licensed to be a dependency of Lucene.
• Still… I haven’t found existing code that does what Spatial4j does.
• Couldn’t just the JTS-dependent parts live outside Lucene?
The Shape interface
(may become an abstract class in the next version)
interface Shape {
  Point getCenter();
  Rectangle getBoundingBox();
  boolean hasArea();
  double getArea();
  SpatialRelation relate(Shape other);
}
• relate() must support Point & Rectangle arguments
• enum SpatialRelation
• DISJOINT, INTERSECTS, WITHIN, CONTAINS
• Note: simpler set than the “DE-9IM” spatial standard
• no “equals” or “touches”
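To make the four relations concrete, here is a hypothetical Cartesian rectangle with a `relate()` method. It is not Spatial4j's rectangle implementation; the class and enum are local stand-ins, and the constructor follows the minX, maxX, minY, maxY order of `ctx.makeRectangle`. Because there is no "equals" relation, two identical rectangles have to be reported as something, and this sketch arbitrarily reports WITHIN.

```java
/** Hypothetical Cartesian rectangle with a relate() in the spirit of Shape.relate(). */
final class Rect {
    enum SpatialRelation { DISJOINT, INTERSECTS, WITHIN, CONTAINS }

    final double minX, maxX, minY, maxY;

    Rect(double minX, double maxX, double minY, double maxY) {
        this.minX = minX; this.maxX = maxX; this.minY = minY; this.maxY = maxY;
    }

    /** Relationship of THIS rectangle to other: WITHIN means this lies inside other. */
    SpatialRelation relate(Rect other) {
        // No overlap on either axis: disjoint (edge contact counts as intersecting here).
        if (other.minX > maxX || other.maxX < minX || other.minY > maxY || other.maxY < minY) {
            return SpatialRelation.DISJOINT;
        }
        boolean thisWithinOther = minX >= other.minX && maxX <= other.maxX
                && minY >= other.minY && maxY <= other.maxY;
        if (thisWithinOther) {
            return SpatialRelation.WITHIN;     // also chosen for identical rectangles
        }
        boolean thisContainsOther = other.minX >= minX && other.maxX <= maxX
                && other.minY >= minY && other.maxY <= maxY;
        if (thisContainsOther) {
            return SpatialRelation.CONTAINS;
        }
        return SpatialRelation.INTERSECTS;     // partial overlap
    }
}
```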
Spatial4j shapes
Shape                           Cartesian   Cartesian w/ dateline wrap   Geodetic
Point                           Y           Y                            Y
Line & LineString (w/ buffer)   Y           N                            N
Rectangle                       Y           Y                            Y
Circle                          Y           N                            Y
ShapeCollection                 Y           Y                            Y
JTS Geometry (incl. polygons)   Y           Y                            N
• Cartesian (AKA Euclidean): a flat plane
• Dateline wrap assumes the plane circles back on itself
• Geodetic: a spherical mathematical model
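The "dateline wrap" column can be illustrated with a longitude-range check. This sketch assumes the convention that a rectangle whose minX is greater than its maxX crosses the dateline; the class and method names are hypothetical.

```java
/** Hypothetical longitude-range helpers on a plane that wraps at x = +/-180. */
final class DatelineRect {
    /** True if longitude x lies in [minX, maxX]; minX > maxX means the range crosses the dateline. */
    static boolean containsLon(double minX, double maxX, double x) {
        if (minX <= maxX) {
            return x >= minX && x <= maxX;   // ordinary range
        }
        return x >= minX || x <= maxX;       // runs east past +180 and continues from -180
    }

    /** Width in degrees of the longitude range, accounting for the wrap. */
    static double width(double minX, double maxX) {
        return minX <= maxX ? maxX - minX : (180 - minX) + (maxX + 180);
    }
}
```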
Well Known Text (WKT)
(see Wikipedia)
• A popular standard for representing shapes as strings
• Requires JTS’s WKT parser, but Spatial4j has its own in progress
• Extensions are TBD for Rectangles and Circles
• Limited support for EMPTY and “Z” and “M” dimensions (future)
• Some examples:
• POINT (3 -2)
• LINESTRING(30 10, 10 30, …
• POLYGON ((30 10, 10 20, 20 40, 40 40, 30 10))
• MULTIPOLYGON (((…
• …
• Deprecated (may move to Solr):
• -90, -180
• -180 -90 180 90
• CIRCLE(4.56,1.23 d=0.071)
• TBD / Pending:
• ENVELOPE(-180,180,90,-90)
• BOX2D(-180 -90, 180 90)
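As an illustration of the WKT syntax (coordinates separated by whitespace, X before Y), here is a hypothetical reader that understands only `POINT (x y)`. It is not Spatial4j's or JTS's parser, and it deliberately rejects the comma-separated form `POINT (3, -2)`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical minimal WKT reader: POINT (x y) only. */
final class MiniWkt {
    // A signed decimal number with an optional exponent, e.g. -2, 3.5, 1e2
    private static final String NUM = "[-+]?\\d+(?:\\.\\d+)?(?:[eE][-+]?\\d+)?";
    private static final Pattern POINT = Pattern.compile(
            "\\s*POINT\\s*\\(\\s*(" + NUM + ")\\s+(" + NUM + ")\\s*\\)\\s*",
            Pattern.CASE_INSENSITIVE);

    /** Returns {x, y}; WKT lists X (longitude) before Y (latitude), separated by whitespace. */
    static double[] parsePoint(String wkt) {
        Matcher m = POINT.matcher(wkt);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a WKT POINT: " + wkt);
        }
        return new double[] {Double.parseDouble(m.group(1)), Double.parseDouble(m.group(2))};
    }
}
```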
Spatial4j code sample
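import com.spatial4j.core.context.SpatialContext;
import com.spatial4j.core.context.jts.JtsSpatialContext;
import com.spatial4j.core.shape.*;

public class Spatial4jSample {
  public static void main(String[] args) {
    SpatialContext ctx = SpatialContext.GEO;
    Rectangle r = ctx.makeRectangle(-71, -70, 42, 43); // minX, maxX, minY, maxY
    Circle c = ctx.makeCircle(-72, 42, 1);              // center x, y; radius in degrees
    SpatialRelation rel = r.relate(c);
    System.out.println(rel);
    rel.intersects(); // boolean
    ctx = JtsSpatialContext.GEO;                        // JTS-backed context, needed for polygons
    Shape s = ctx.readShape("POLYGON ((30 10, 10 20, 20 40, 40 40, 30 10))");
    double distanceDegrees = ctx.getDistCalc().distance(
        ctx.makePoint(2, 2), ctx.makePoint(3, 3));
  }
}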
Distances (including circle radius) are in “Degrees”, not radians or KM
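Because distances are in degrees, converting to and from kilometers comes down to multiplying by the sphere's radius. The sketch below uses a haversine great-circle distance and an assumed mean Earth radius of 6371.0087714 km; the names are hypothetical, although Spatial4j's `DistanceUtils` offers similar conversions (e.g. `dist2Degrees`, used in the Lucene example code later on). With this radius, one degree is about 111.2 km, so 0.000009 degrees is roughly 1 m.

```java
/** Hypothetical degree/km helpers illustrating the "distances in degrees" convention. */
final class DegreeDistances {
    static final double EARTH_MEAN_RADIUS_KM = 6371.0087714;  // assumed value

    /** Surface distance (km) to degrees of arc on a sphere of the given radius. */
    static double kmToDegrees(double km, double radiusKm) {
        return Math.toDegrees(km / radiusKm);
    }

    /** Degrees of arc back to a surface distance in km. */
    static double degreesToKm(double deg, double radiusKm) {
        return Math.toRadians(deg) * radiusKm;
    }

    /** Great-circle distance in degrees between points given as (x = lon, y = lat). */
    static double distanceDegrees(double x1, double y1, double x2, double y2) {
        double dLat = Math.toRadians(y2 - y1);
        double dLon = Math.toRadians(x2 - x1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(y1)) * Math.cos(Math.toRadians(y2))
                * Math.pow(Math.sin(dLon / 2), 2);
        return Math.toDegrees(2 * Math.asin(Math.min(1.0, Math.sqrt(a))));
    }
}
```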
Spatial4j Future
• Built-in WKT support (no JTS dependency)
• Extensible to user-defined shapes
• API improvements
• Shape argument validation via WKT but not via ctx.makeShape(…)
• ShapeCollection visitor design pattern
• Refactor to remove need for isGeo()
• LineString dateline & geodetic support
• Projection / Datum support
LUCENE SPATIAL
Spatial index information retrieval
Lucene 4 Spatial Module
• There isn’t one best way to implement spatial indexing for all use-cases
• Index just points, or other shapes too? Which?
• Multiple shapes per field?
• Query by Intersection? Contains? Within? Equals? Disjoint? …
• Distance sorting? Query boost by distance?
• Or more exotic shape relevancy like overlap percentage?
• Trade off shape precision for speed?
• Multiple SpatialStrategy implementations:
• RecursivePrefixTreeStrategy and TermQueryPrefixTreeStrategy
• PointVectorStrategy
• BBoxStrategy (currently in trunk, not 4x)
• JtsGeoStrategy (in Spatial Solr Sandbox)
Strategy: PointVector
• Similar to Solr’s PointType / LatLonType
• X & Y trie double fields; caching via FieldCache
• Characteristics
• Indexes points (only)
• Single-valued field (no multi)
• Query by rectangle or circle (only)
• Circle uses FieldCache (requires memory)
• Circle does bbox pre-filter for performance
• Relations: Intersects, Within (only)
• Exact precision for x & y coordinates and query shape
• Distance sort
• Uses FieldCache (requires memory)
Strategy: BBox
• Implemented with 4 doubles & 1 boolean
• Ported from ESRI GeoPortal (Open Source)
• Characteristics:
• Indexes rectangles (only)
• Single-valued field (no multi)
• Query by rectangle (only)
• Supports all relations: Intersects, Within, Contains, …
• Distance sort from box center
• Uses FieldCache (requires memory)
• Area overlap sorting
• Sort results by percentage overlap between query and indexed boxes
• Uses FieldCache (requires memory)
• Note: FieldCache needs are somewhat high
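Area-overlap sorting can be illustrated with a simplified score: the average of the fraction of the query box covered and the fraction of the indexed box covered. This is a hypothetical stand-in that ignores the dateline; BBoxStrategy's actual similarity (ported from GeoPortal) may weigh these ratios differently.

```java
/** Hypothetical, simplified overlap score between a query box and an indexed box. */
final class BoxOverlap {
    /** Boxes are {minX, maxX, minY, maxY}. Returns the intersection area (0 if disjoint). */
    static double intersectionArea(double[] a, double[] b) {
        double w = Math.min(a[1], b[1]) - Math.max(a[0], b[0]);
        double h = Math.min(a[3], b[3]) - Math.max(a[2], b[2]);
        return (w <= 0 || h <= 0) ? 0 : w * h;
    }

    static double area(double[] r) {
        return (r[1] - r[0]) * (r[3] - r[2]);
    }

    /** 1.0 for identical boxes, 0.0 for disjoint ones; higher means a better mutual fit. */
    static double score(double[] query, double[] indexed) {
        double inter = intersectionArea(query, indexed);
        if (inter == 0) {
            return 0;
        }
        return 0.5 * (inter / area(query)) + 0.5 * (inter / area(indexed));
    }
}
```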
Strategy: JtsGeoStrategy
• Stores a JTS geometry in Lucene 4’s DocValues
• Stores WKB (Well-Known Binary, the binary counterpart of WKT)
• Full vector geometry is retained for search
• DocValues is mostly a better FieldCache
• Faster loading into memory
• Can be disk resident or memory
• Multi-valued
• Characteristics:
• Indexes any shape, including Multi… varieties
• Query by any shape
• Uses DocValues (memory use optional)
• Supports all relations: intersect, within, contains, …
• Could easily also support JTS’s exotic DE-9IM based relations
• Exact precision to the vector geometry
• No sorting
• Experimental / immature status
More of a proof-of-concept for now
PREFIXTREE STRATEGY
Spatial grid indexing
Strategy: RecursivePrefixTree
• Grid / Tile / Trie / Prefix-Tree based
• With recursive descent algorithms
• Or the TermQueryPrefixTree alternative
• Choose Geohash (geo only) or Quad tree
• The most mature strategy to date
• Highly tested
• The current evolution of SOLR-2155
Strategy: RecursivePrefixTree
• Characteristics:
• Indexes all shapes
• Variable precision of shape edges
• Highly precise shapes other than Point won’t scale
• LineString possibly not precise enough for your needs
• Multi-valued field support
• Query by any shape
• Variable precision for query shape
• Highest precision usually scales
• All Relations: Intersects, Within, Contains, Disjoint
• Distance sort (w/ multi-value support)
• Warning: immature, won’t scale
• Uses significant amounts of memory
• Fast scalable spatial filtering; no caches needed
new in Lucene 4.3
How many search / NoSQL systems have these capabilities?
Geohashes
• What is a Geohash?
• A lat/lon geocode system
• Has a hierarchical spatial structure
• Gradual precision degradation
• In the public domain
http://en.wikipedia.org/wiki/Geohash
• Example: (Boston) DRT2Y
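Geohash encoding alternately bisects the longitude and latitude ranges (starting with longitude), records one bit per bisection, and maps each group of 5 bits to a base-32 character. A minimal sketch, assuming the standard lowercase geohash alphabet; the example point (42.36, -71.06) is an assumption, since the slide does not give the exact Boston coordinates behind "DRT2Y".

```java
/** Minimal geohash encoder (hypothetical helper; standard lowercase base-32 alphabet). */
final class Geohash {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    static String encode(double lat, double lon, int length) {
        double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
        StringBuilder hash = new StringBuilder(length);
        boolean lonTurn = true;  // bits alternate, starting with longitude
        int bits = 0, ch = 0;
        while (hash.length() < length) {
            if (lonTurn) {
                double mid = (minLon + maxLon) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; minLon = mid; } else { ch <<= 1; maxLon = mid; }
            } else {
                double mid = (minLat + maxLat) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; minLat = mid; } else { ch <<= 1; maxLat = mid; }
            }
            lonTurn = !lonTurn;
            if (++bits == 5) {   // every 5 bits become one base-32 character
                hash.append(BASE32.charAt(ch));
                bits = 0;
                ch = 0;
            }
        }
        return hash.toString();
    }
}
```

Every longer hash of a point starts with its shorter hashes, which is the hierarchical "zooming in" property shown on the next slides.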
Demo
http://openlocation.org/geohash/geohash-js/
Zooming In: D
Zooming In: DR
Zooming In: DRT
Zooming In: DRT2
Zooming In: DRT2Y
Geohash Grids
• Internal coordinates of an odd-length geohash (DRT2Y)…
• …and an even-length geohash (DRT2)
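The odd/even difference comes from the bit interleaving: each character adds 5 bits, so an odd-length hash gives longitude one more bit than latitude. The worked arithmetic below (hypothetical helper names) shows that an odd-length cell is split into 8 columns x 4 rows of children and an even-length cell into 4 x 8, and that a length-5 cell such as DRT2Y is 360/2^13 = 180/2^12 ≈ 0.044 degrees on each side.

```java
/** Worked arithmetic: size in degrees of a geohash cell of a given length. */
final class GeohashCellSize {
    /** Longitude receives the extra bit whenever the total bit count (5 * length) is odd. */
    static int lonBits(int length) { return (5 * length + 1) / 2; }
    static int latBits(int length) { return (5 * length) / 2; }

    static double widthDegrees(int length) { return 360.0 / (1L << lonBits(length)); }
    static double heightDegrees(int length) { return 180.0 / (1L << latBits(length)); }

    /** {columns, rows} into which a cell of (length - 1) is split to make cells of length. */
    static int[] childGrid(int length) {
        int cols = 1 << (lonBits(length) - lonBits(length - 1));
        int rows = 1 << (latBits(length) - latBits(length - 1));
        return new int[] {cols, rows};
    }
}
```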
Demo
• Spatial Solr Playground
• Demo KML grid generation from geometries
• A sample point with quad tree indexes to these tokens:
• A, AD, ADB, ADBA
• A sample circle with quad tree indexes to these tokens:
• A, AB, ABA, ABAB+, ABAC+, ABAD+, ABB, ABBA+,
ABBB+, ABBC+, ABBD+, ABC, ABCA+, ABCB+, ABCC+,
ABCD+, ABD+, AD, ADA, ADAA+, ADAB+, ADAC+, ADAD+,
ADB+, ADC, ADCA+, ADCB+, ADCD+, ADD, ADDA+,
ADDB+, ADDC+, ADDD+, B, BA, BAA, BAAC+, BAAD+,
BAC, BACA+, BACB+, BACC+, BACD+, BC, BCA, BCAA+,
BCAB+, BCAC+, BCC, BCCA+, BCCC+, C, CB, CBB,
CBBA+
• Tokens with a ‘+’ are actually indexed with and without the ‘+’
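A point's tokens are simply the path of quadrants from the root down to the maximum level, with every ancestor indexed as a token. A minimal sketch, assuming Z-order labels (A = north-west, B = north-east, C = south-west, D = south-east); the slide's sample point is not given, so the test uses (-30, 40), which produces the same A, AD, ADB, ADBA path under this labeling.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical quad-tree tokens for a point, assuming labels A=NW, B=NE, C=SW, D=SE. */
final class QuadTokens {
    static List<String> tokens(double x, double y, int maxLevels) {
        double minX = -180, maxX = 180, minY = -90, maxY = 90;
        List<String> out = new ArrayList<>();
        StringBuilder token = new StringBuilder();
        for (int level = 0; level < maxLevels; level++) {
            double midX = (minX + maxX) / 2, midY = (minY + maxY) / 2;
            boolean east = x >= midX, north = y >= midY;
            token.append(north ? (east ? 'B' : 'A') : (east ? 'D' : 'C'));
            out.add(token.toString());   // every ancestor cell is a token, not just the leaf
            if (east) { minX = midX; } else { maxX = midX; }   // descend into the chosen quadrant
            if (north) { minY = midY; } else { maxY = midY; }
        }
        return out;
    }
}
```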
PrefixTreeStrategy Architecture
• Shape: calc rect relationship
• SpatialPrefixTree & Cell: byte string to/from Cell (rect)
• PrefixTreeStrategy: index & search algorithms
• Lucene: TermsEnum
• Filters: IntersectsPrefixTreeFilter, ContainsPrefixTreeFilter, WithinPrefixTreeFilter
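The index-side algorithm for non-point shapes can be sketched as a recursive descent: skip disjoint cells, emit a leaf token (with '+') for cells fully within the shape or at the maximum level, and otherwise emit the cell and recurse into its four children. This hypothetical sketch works on rectangles only, uses assumed Z-order labels (A = NW, B = NE, C = SW, D = SE), treats edge-only contact as disjoint (so it is unsuitable for zero-area shapes), and emits only the '+' form of leaves, whereas the slide notes that leaves are indexed both with and without '+'.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical recursive decomposition of a rectangle into quad-tree cell tokens. */
final class QuadDecompose {
    /** shape = {minX, maxX, minY, maxY} */
    static List<String> cells(double[] shape, int maxLevels) {
        List<String> out = new ArrayList<>();
        recurse("", -180, 180, -90, 90, shape, maxLevels, out);
        return out;
    }

    private static void recurse(String prefix, double minX, double maxX, double minY, double maxY,
                                double[] s, int maxLevels, List<String> out) {
        double midX = (minX + maxX) / 2, midY = (minY + maxY) / 2;
        double[][] kids = {                       // children in A, B, C, D order
            {minX, midX, midY, maxY}, {midX, maxX, midY, maxY},
            {minX, midX, minY, midY}, {midX, maxX, minY, midY}};
        for (int i = 0; i < 4; i++) {
            double[] c = kids[i];
            String token = prefix + (char) ('A' + i);
            boolean disjoint = c[0] >= s[1] || c[1] <= s[0] || c[2] >= s[3] || c[3] <= s[2];
            if (disjoint) {
                continue;                         // prune: nothing to index under this cell
            }
            boolean within = c[0] >= s[0] && c[1] <= s[1] && c[2] >= s[2] && c[3] <= s[3];
            if (within || token.length() == maxLevels) {
                out.add(token + "+");             // leaf: no need to descend further
            } else {
                out.add(token);                   // partial overlap: index it and descend
                recurse(token, c[0], c[1], c[2], c[3], s, maxLevels, out);
            }
        }
    }
}
```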
Lucene Spatial example code
SpatialContext ctx = SpatialContext.GEO;
SpatialStrategy strategy = new RecursivePrefixTreeStrategy(
    new GeohashPrefixTree(ctx, 11), "myGeoField"); // 11 geohash levels
… // make indexWriter and a Document
for (Field f : strategy.createIndexableFields(shape))
  doc.add(f);
indexWriter.addDocument(doc);
…
// Match docs whose shape intersects a 200 km circle centered at x=-80.0, y=33.0
Filter filter = strategy.makeFilter(
    new SpatialArgs(SpatialOperation.Intersects,
        ctx.makeCircle(-80.0, 33.0,
            DistanceUtils.dist2Degrees(200, DistanceUtils.EARTH_MEAN_RADIUS_KM))));
indexSearcher.search(userKeywordQuery, filter, 10);
See SpatialExample.java in Lucene spatial tests for more
Future
• Possible de-emphasis of SpatialStrategy abstraction
• Better options for distance sorting of PrefixTree strategies
• Better PrefixTree encoding than both geohash & quad tree
• Google Summer of Code 2013 -- TBD
• Performance improvements to spatial Intersects RecursivePrefixTree Filter
• Remove the need to double-index leaf-nodes (with and without ‘+’)
• Exact geometry search by blending benefits of PrefixTree and JtsGeoStrategy
• A Single-dimensional PrefixTree (for numeric range index)
SOLR SPATIAL
Adapters to Lucene 4 spatial
Solr 3 Spatial: LatLonType & friends
• Solr 3 was Solr’s first release to include spatial support
• Not based on Lucene’s old spatial contrib module
• Similar to TwoDoublesStrategy but more optimized
• Single-valued only, fast distance sorting, can choose floats (save memory)
• Fields:
• LatLonType (Geodetic)
• PointType (Cartesian)
• Query parsers (spatial filters):
• {!geofilt} (circle) “pt”, “sfield”, and “d” params
• {!bbox} (bounding box of a circle)
• Distance function:
• geodist() and some esoteric others
NOT completely superseded by Solr 4 spatial fields
Solr 4 Spatial
• See http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
<fieldType name="location_rpt"
    class="solr.SpatialRecursivePrefixTreeFieldType"
    spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
    distErrPct="0.025"
    maxDistErr="0.000009"
    units="degrees" />
• spatialContextFactory: if you don’t need JTS (polygons), don’t set this
• distErrPct: non-point shapes are approximated to the grid, up to 2.5% of their radius
• maxDistErr: max precision (1m), measured in degrees
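How maxDistErr maps to tree depth can be worked out from geohash cell sizes: pick the shortest hash whose cell width and height are both no larger than maxDistErr. For 0.000009 degrees this gives 11 levels, which matches the `GeohashPrefixTree(ctx, 11)` in the Lucene example code; whether Solr computes the level exactly this way is an assumption. The cap of 24 is a hypothetical limit for this sketch.

```java
/** Worked arithmetic: shortest geohash whose cells are no larger than maxDistErr degrees. */
final class GeohashLevelForPrecision {
    static int levelFor(double maxDistErrDegrees) {
        for (int len = 1; len <= 24; len++) {
            int lonBits = (5 * len + 1) / 2;      // longitude gets the extra bit at odd lengths
            int latBits = (5 * len) / 2;
            double width = 360.0 / Math.pow(2, lonBits);
            double height = 180.0 / Math.pow(2, latBits);
            if (width <= maxDistErrDegrees && height <= maxDistErrDegrees) {
                return len;
            }
        }
        return 24;   // hypothetical cap for this sketch
    }
}
```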
Indexing
• Point: Latitude, Longitude (i.e. Y, X)
<field name="geo">43.17614, -90.57341</field>
• Point: X Y
<field name="geo">-90.57341 43.17614</field>
• Rect: minX minY maxX maxY
<field name="geo">-74.093 41.042 -69.347 44.558</field>
• Circle: point then d=radius (in degrees); will be deprecated
<field name="geo">Circle(4.56,1.23 d=0.0710)</field>
• WKT (preferred; it’s a standard)
<field name="geo">POLYGON((-10 30, -40 40, -10 -20, 40 20, 0 0, -10 30))</field>
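The two plain point formats above differ in axis order, which is an easy mistake to make. A hypothetical parser (not Solr's) that normalizes both to {x, y}: a comma means "latitude, longitude", whitespace means "X Y".

```java
/** Hypothetical parser for "lat, lon" (comma) and "x y" (whitespace) point strings. */
final class PointFormats {
    /** Returns {x, y}, i.e. {longitude, latitude}. */
    static double[] parse(String s) {
        String t = s.trim();
        int comma = t.indexOf(',');
        if (comma >= 0) {                                   // "Latitude, Longitude" = Y, X
            double lat = Double.parseDouble(t.substring(0, comma).trim());
            double lon = Double.parseDouble(t.substring(comma + 1).trim());
            return new double[] {lon, lat};
        }
        String[] parts = t.split("\\s+");                   // "X Y"
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected 'lat, lon' or 'x y': " + s);
        }
        return new double[] {Double.parseDouble(parts[0]), Double.parseDouble(parts[1])};
    }
}
```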
Filter (search)
• Using Solr 3’s bbox or geofilt query parsers
• Distance radius ‘d’ is interpreted as kilometers, just like LatLonType
• Limited to bbox and bbox of a circle
fq={!geofilt}&sfield=geo&pt=45.15,-93.85&d=5
• Range query style (bounding box)
• Handles dateline wrap
fq=geo:[-90,-180 TO 90,180]
• Field query style
• Unique to Lucene 4 spatial; see SpatialArgsParser
fq=geo:"Intersects(POLYGON((-10 30, -40 40, -10 -20, 40 20, 0 0, -10 30))) distErrPct=0"
• Predicates: Intersects, IsDisjointTo, IsWithin, Contains, …
• distErrPct (& distErr) optional; override field type’s default
SOLR-4242: A better spatial query parser
Distance Sort & Relevancy Boost
• geodist() is for Solr 3 LatLonType only
sort=geodist(lltField,45.15,-93.85) desc
• Solr 4 spatial queries can return the distance as the score
q={!geofilt sfield=geo pt=45.15,-93.85 d=5 score=distance}&sort=score asc&fl=*,score
• Without a filter
sort=query($sortsq) asc&sortsq={!geofilt filter=false score=distance sfield=geo pt=45.15,-93.85 d=0}
• Relevancy boost
defType=edismax&boost=query($mysq)&mysq={!geofilt filter=false score=recipDistance pt=45.15,-98.85 d=5}
Distance Faceting
• sfield=geo (the field)
• pt=45.15,-93.85 (point of reference)
• Within 10km
• facet.query={!geofilt d=10}
• Within 50km
• facet.query={!geofilt d=50}
• Within 100km
• facet.query={!geofilt d=100}
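Each facet.query above counts the documents inside one circle, so the buckets are cumulative rather than disjoint (a document within 10 km also counts for 50 and 100 km). A hypothetical in-memory equivalent using haversine distances and an assumed mean Earth radius:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory equivalent of several facet.query={!geofilt d=...} buckets. */
final class DistanceFacets {
    static final double EARTH_MEAN_RADIUS_KM = 6371.0087714;  // assumed value

    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1), dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_MEAN_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** docs are {lat, lon}; returns radius -> number of docs within that radius of (lat, lon). */
    static Map<Double, Integer> facet(double[][] docs, double lat, double lon, double... radiiKm) {
        Map<Double, Integer> counts = new LinkedHashMap<>();
        for (double r : radiiKm) {
            int n = 0;
            for (double[] d : docs) {
                if (distanceKm(lat, lon, d[0], d[1]) <= r) {
                    n++;
                }
            }
            counts.put(r, n);
        }
        return counts;
    }
}
```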
Future
• A more Solr-friendly spatial query parser SOLR-4242
• Retrofit geodist() to support the SpatialStrategies?
• Expose more tunables
• A grid based heat-map faceting component
• Idea: a multi-strategy spatial field encompassing
• A PrefixTree field for points
• A PrefixTree field for non-points
• A TwoDoubles field for good distance sorting / relevancy
• Knows whether it’s single vs. multi-valued
• A FieldType for multi-value numeric ranges
DEMO
INTERESTING USE CASES
Heatmap / Grid faceting
1. Geohash each point to multiple lengths and index each length into its own field
• geohash_1:D, geohash_2:DR, geohash_3:DRT, geohash_4:DRT2
2. Search with a rectangle (bbox) filter, and…
3. Facet on the geohash field with the desired resolution
• facet.field=geohash_4&facet.limit=10000
• Lots of tuning / customization options
• Projected / quad tree
• facet.prefix may help
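The counting that field faceting performs in step 3 can be mimicked in memory: filter the points by the bounding box, then count them per geohash prefix of the chosen length. The class below is hypothetical and re-declares a minimal geohash encoder so it stands alone; with real data, the geohash_N fields would be computed at index time as in step 1.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory grid facet: bbox filter, then count points per geohash prefix. */
final class GeohashGridFacet {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Minimal geohash encoder: bisect lon/lat alternately (lon first), 5 bits per character. */
    static String geohash(double lat, double lon, int length) {
        double[] latRange = {-90, 90}, lonRange = {-180, 180};
        StringBuilder sb = new StringBuilder(length);
        boolean lonTurn = true;
        int bits = 0, ch = 0;
        while (sb.length() < length) {
            double[] range = lonTurn ? lonRange : latRange;
            double value = lonTurn ? lon : lat;
            double mid = (range[0] + range[1]) / 2;
            if (value >= mid) { ch = (ch << 1) | 1; range[0] = mid; } else { ch <<= 1; range[1] = mid; }
            lonTurn = !lonTurn;
            if (++bits == 5) { sb.append(BASE32.charAt(ch)); bits = 0; ch = 0; }
        }
        return sb.toString();
    }

    /** points are {lat, lon}; bbox is {minLat, maxLat, minLon, maxLon}; returns cell -> count. */
    static Map<String, Integer> facet(double[][] points, double[] bbox, int length) {
        Map<String, Integer> counts = new TreeMap<>();
        for (double[] p : points) {
            boolean inBox = p[0] >= bbox[0] && p[0] <= bbox[1] && p[1] >= bbox[2] && p[1] <= bbox[3];
            if (inBox) {
                counts.merge(geohash(p[0], p[1], length), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```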
Plotting many points on a map
• Why not ask Solr for rows=1000 ?
• It’s slow
• If docs have a variable number of points, 1000 rows could yield anywhere from 1 distinct point to 1M
• Instead facet on a geohash with facet.limit=1000
• Fast
• Guaranteed <= 1000 points
• But might need lots of memory
• Or result-grouping on a geohash
But do you really want to plot 1000+ points on a map?
Filter by indexed distance constraints
• Imagine a dating site where both potential parties have a maximum distance they’re willing to travel
• Q: For the current user, which candidates are not “too far” for the current user, while the current user is also not “too far” for them?
• A: Index each user’s location as a point in one field and as a circle in another. Query the indexed point field with the current user’s circle, and the indexed circle field with the current user’s point.
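The two-part query boils down to a symmetric test per pair of users: the distance between them must be within both users' maximum distances. A hypothetical in-memory sketch (class, field names, and Earth radius are assumptions):

```java
/** Hypothetical "not too far for either party" check. */
final class MutualDistanceFilter {
    static final double EARTH_MEAN_RADIUS_KM = 6371.0087714;  // assumed value

    static final class User {
        final String name;
        final double lat, lon, maxKm;   // location and maximum travel distance
        User(String name, double lat, double lon, double maxKm) {
            this.name = name; this.lat = lat; this.lon = lon; this.maxKm = maxKm;
        }
    }

    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1), dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_MEAN_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Other's point is in my circle AND my point is in other's circle. */
    static boolean compatible(User me, User other) {
        double d = distanceKm(me.lat, me.lon, other.lat, other.lon);
        return d <= me.maxKm && d <= other.maxKm;
    }
}
```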
Multi-valued durations
• What if your documents needed a variable number of time (or other numerical value) durations?
• This approach won’t work, because nothing links each start to its matching end:
<field name="start" type="tdate" multiValued="true"/>
<field name="end" type="tdate" multiValued="true"/>
• Solr (without Solr 4 spatial fields) can’t do it!
• You need to think differently to solve this…
http://wiki.apache.org/solr/SpatialForTimeDurations
• Example use-cases
• Searching for hotel-room vacancies
• Searching for movie show-times
• (next slides) Each document is a person with a variable number of
“shifts” that they are working…
… model durations as points
… queries become rectangles
… some config & search details
• Configuration
<fieldType name="days_of_year"
    class="solr.SpatialRecursivePrefixTreeFieldType"
    geo="false" units="degrees"
    worldBounds="0 0 365 365"
    distErrPct="0" maxDistErr="1"/>
• Sample search: Find shifts that have any overlap with the 19th through 23rd day
daysOfYear:Intersects(0 18.5 23.5 365)
• Caveat: Won’t scale to the full precision of a Java long (timestamp)
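Why the rectangle works: a duration [start, end] indexed as the point (x = start, y = end) overlaps the query days [19, 23] exactly when start ≤ 23 and end ≥ 19, i.e. when the point lies in x ∈ [0, 23.5], y ∈ [18.5, 365] (the half-day padding makes whole-day boundaries inclusive). A hypothetical sketch of the same logic, with the rectangle in the slide's minX minY maxX maxY order:

```java
/** Hypothetical check mirroring "durations as points, queries as rectangles". */
final class DurationsAsPoints {
    /** Rectangle {minX, minY, maxX, maxY} matching durations that overlap [qStartDay, qEndDay]. */
    static double[] overlapQuery(int qStartDay, int qEndDay, double worldMax) {
        return new double[] {0, qStartDay - 0.5, qEndDay + 0.5, worldMax};
    }

    static boolean pointInRect(double x, double y, double[] r) {
        return x >= r[0] && x <= r[2] && y >= r[1] && y <= r[3];
    }

    /** A document matches if ANY of its durations {start, end} (one point each) is in the rectangle. */
    static boolean anyOverlap(int[][] shifts, double[] rect) {
        for (int[] s : shifts) {
            if (pointInRect(s[0], s[1], rect)) {
                return true;
            }
        }
        return false;
    }
}
```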
Thank you!
• References
• Lucene 4 spatial javadocs
• https://builds.apache.org/job/Lucene-Artifacts-4.x/javadoc/spatial/
• Spatial4j at GitHub
• https://github.com/spatial4j/spatial4j (spatial4j.com redirect)
• http://spatial4j.16575.n6.nabble.com -- dev@lists.spatial4j.com
• Solr
• http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
• Spatial Solr Sandbox
• https://github.com/ryantxu/spatial-solr-sandbox
• Contact me:
• David Smiley dsmiley@mitre.org dsmiley@apache.org
Saving Money with Open Source GISSaving Money with Open Source GIS
Saving Money with Open Source GISbryanluman
 
Introduction to libre « fulltext » technology
Introduction to libre « fulltext » technologyIntroduction to libre « fulltext » technology
Introduction to libre « fulltext » technologyRobert Viseur
 
5 Popular Choices for NoSQL on a Microsoft Platform
5 Popular Choices for NoSQL on a Microsoft Platform5 Popular Choices for NoSQL on a Microsoft Platform
5 Popular Choices for NoSQL on a Microsoft PlatformAll Things Open
 
5 Popular Choices for NoSQL on a Microsoft Platform - All Things Open - Octob...
5 Popular Choices for NoSQL on a Microsoft Platform - All Things Open - Octob...5 Popular Choices for NoSQL on a Microsoft Platform - All Things Open - Octob...
5 Popular Choices for NoSQL on a Microsoft Platform - All Things Open - Octob...Matthew Groves
 
5 Popular Choices for NoSQL on a Microsoft Platform - Tulsa - July 2018
5 Popular Choices for NoSQL on a Microsoft Platform - Tulsa - July 20185 Popular Choices for NoSQL on a Microsoft Platform - Tulsa - July 2018
5 Popular Choices for NoSQL on a Microsoft Platform - Tulsa - July 2018Matthew Groves
 

Similar to Lucene solr 4 spatial extended deep dive (20)

2014 11 lucene spatial temporal update
2014 11 lucene spatial temporal update2014 11 lucene spatial temporal update
2014 11 lucene spatial temporal update
 
The Latest in Spatial & Temporal Search: Presented by David Smiley
The Latest in Spatial & Temporal Search: Presented by David SmileyThe Latest in Spatial & Temporal Search: Presented by David Smiley
The Latest in Spatial & Temporal Search: Presented by David Smiley
 
2016-01 Lucene Solr spatial in 2015, NYC Meetup
2016-01 Lucene Solr spatial in 2015, NYC Meetup2016-01 Lucene Solr spatial in 2015, NYC Meetup
2016-01 Lucene Solr spatial in 2015, NYC Meetup
 
State of JTS 2017
State of JTS 2017State of JTS 2017
State of JTS 2017
 
Spatial Data in SQL Server
Spatial Data in SQL ServerSpatial Data in SQL Server
Spatial Data in SQL Server
 
Spatial search with geohashes
Spatial search with geohashesSpatial search with geohashes
Spatial search with geohashes
 
Magellan-Spark as a Geospatial Analytics Engine by Ram Sriharsha
Magellan-Spark as a Geospatial Analytics Engine by Ram SriharshaMagellan-Spark as a Geospatial Analytics Engine by Ram Sriharsha
Magellan-Spark as a Geospatial Analytics Engine by Ram Sriharsha
 
Keynote Yonik Seeley & Steve Rowe lucene solr roadmap
Keynote   Yonik Seeley & Steve Rowe lucene solr roadmapKeynote   Yonik Seeley & Steve Rowe lucene solr roadmap
Keynote Yonik Seeley & Steve Rowe lucene solr roadmap
 
KEYNOTE: Lucene / Solr road map
KEYNOTE: Lucene / Solr road mapKEYNOTE: Lucene / Solr road map
KEYNOTE: Lucene / Solr road map
 
DSL's with Groovy
DSL's with GroovyDSL's with Groovy
DSL's with Groovy
 
NGSI: Geoqueries & Carto integration
NGSI: Geoqueries & Carto integrationNGSI: Geoqueries & Carto integration
NGSI: Geoqueries & Carto integration
 
5 NoSQL Options - Toronto - May 2018
5 NoSQL Options - Toronto - May 20185 NoSQL Options - Toronto - May 2018
5 NoSQL Options - Toronto - May 2018
 
Geospatial for Java
Geospatial for JavaGeospatial for Java
Geospatial for Java
 
"SOLID" Object Oriented Design Principles
"SOLID" Object Oriented Design Principles"SOLID" Object Oriented Design Principles
"SOLID" Object Oriented Design Principles
 
Openstreetmap
OpenstreetmapOpenstreetmap
Openstreetmap
 
Saving Money with Open Source GIS
Saving Money with Open Source GISSaving Money with Open Source GIS
Saving Money with Open Source GIS
 
Introduction to libre « fulltext » technology
Introduction to libre « fulltext » technologyIntroduction to libre « fulltext » technology
Introduction to libre « fulltext » technology
 
5 Popular Choices for NoSQL on a Microsoft Platform
5 Popular Choices for NoSQL on a Microsoft Platform5 Popular Choices for NoSQL on a Microsoft Platform
5 Popular Choices for NoSQL on a Microsoft Platform
 
5 Popular Choices for NoSQL on a Microsoft Platform - All Things Open - Octob...
5 Popular Choices for NoSQL on a Microsoft Platform - All Things Open - Octob...5 Popular Choices for NoSQL on a Microsoft Platform - All Things Open - Octob...
5 Popular Choices for NoSQL on a Microsoft Platform - All Things Open - Octob...
 
5 Popular Choices for NoSQL on a Microsoft Platform - Tulsa - July 2018
5 Popular Choices for NoSQL on a Microsoft Platform - Tulsa - July 20185 Popular Choices for NoSQL on a Microsoft Platform - Tulsa - July 2018
5 Popular Choices for NoSQL on a Microsoft Platform - Tulsa - July 2018
 

More from lucenerevolution

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucenelucenerevolution
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! lucenerevolution
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solrlucenerevolution
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationslucenerevolution
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloudlucenerevolution
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusterslucenerevolution
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiledlucenerevolution
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs lucenerevolution
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchlucenerevolution
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Stormlucenerevolution
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?lucenerevolution
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APIlucenerevolution
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucenelucenerevolution
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMlucenerevolution
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenallucenerevolution
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside downlucenerevolution
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - finallucenerevolution
 

More from lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloud
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Storm
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside down
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
 

Recently uploaded

Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfUmakantAnnand
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfakmcokerachita
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 

Recently uploaded (20)

Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.Compdf
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdf
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 

Lucene/Solr 4 Spatial Extended Deep Dive

  • 1. LUCENE / SOLR 4 SPATIAL DEEP-DIVE David Smiley, Software Systems Engineer, Lead
  • 2. © 2013 The MITRE Corporation. All rights reserved. LUCENE / SOLR 4 SPATIAL DEEP-DIVE 2013 Lucene Revolution Presented by David Smiley, MITRE
  • 3. About David Smiley • Working at MITRE for 13 years • Web development, Java, search • 3 Solr apps, 1 Endeca • Published the first book on Solr, then a 2nd edition (2009, 2011) • Apache Lucene / Solr committer and PMC member (2012) • Specializing in spatial • Presented at Lucene Revolution (2010) & Basis O.S. Search Conference (2011, 2012) • Taught Solr classes at MITRE (2010, 2011, 2012) • Solr search consultant within MITRE and its sponsors, and privately
  • 4. Agenda • Background, overview • Spatial4j • Lucene spatial • PrefixTree / Trie / Grid • Solr spatial • Demo • Interesting use-cases
  • 6. What is Spatial Search? Popular features: • Spatial filter query • Spatial distance sorting • Spatial distance relevancy (i.e. spatial query score) • NOT “geocoding” (resolving “Boston” to its latitude and longitude) Typical use-case: 1. Index a location for each Lucene document, given a latitude & longitude 2. Then search for matching documents by a circle (point-radius) or bounding box 3. Then sort results by distance
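The three-step use-case above can be sketched in plain Java without any Lucene API: a linear scan stands in for the spatial filter, and a sort by great-circle distance stands in for distance sorting. The `Doc` type, the haversine formula, the 6371.0087714 km Earth radius, and the approximate city coordinates in the test are assumptions for illustration only; Lucene's strategies answer step 2 from an index rather than a scan.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class RadiusSearch {
    // Hypothetical indexed document: an id plus a latitude/longitude in degrees.
    record Doc(String id, double lat, double lon) {}

    // Assumed spherical Earth radius, in kilometers.
    static final double EARTH_RADIUS_KM = 6371.0087714;

    // Great-circle distance between two lat/lon points via the haversine formula, in km.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // Step 2: keep documents inside the point-radius circle.
    // Step 3: sort the matches by distance from the circle's center.
    static List<String> search(List<Doc> docs, double lat, double lon, double radiusKm) {
        List<Doc> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (haversineKm(lat, lon, d.lat(), d.lon()) <= radiusKm) hits.add(d);
        }
        hits.sort(Comparator.comparingDouble(d -> haversineKm(lat, lon, d.lat(), d.lon())));
        List<String> ids = new ArrayList<>();
        for (Doc d : hits) ids.add(d.id());
        return ids;
    }
}
```

Searching 80 km around central Boston with documents for New York, Worcester and Cambridge returns Cambridge first and Worcester second, and drops New York.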
  • 7. History of Spatial for Lucene & Solr • 2007: Local-Lucene • by Patrick O’Leary (AOL) • 2009-09: LL -> Lucene spatial contrib in Lucene 2.9.0 • Local-Lucene graduates to an official Lucene contrib module • 2009-12: Spatial Search Plugin (SSP) for Solr • by Chris Male (JTeam -> Orange11, ElasticSearch) • 2010-10: SOLR-2155: a geohash prefix tree filter • by David Smiley (MITRE) • 2011-01: Lucene Spatial Playground (LSP) • by Ryan McKinley (Voyager GIS), David, and Chris • 2011-03: Solr 3.1 new spatial features • by Grant Ingersoll and Yonik Seeley (LucidWorks) • 2012-03: LSP -> Lucene 4 spatial module + Spatial4j + SSP • replaces former Lucene spatial contrib module
  • 8. Lucene Spatial Committers • David Smiley • Works for MITRE • Boston area • Ryan McKinley • Works for Voyager GIS • Silicon Valley • Chris Male • Formerly at ElasticSearch • New Zealand
  • 9. Spatial decomposed • Spatial4j • Shapes, WKT, Distance calculations, JTS adapter • Lucene spatial • Strategies: PrefixTree (TermQuery & Recursive impl.), BBox, PointVector • Solr adapters • Misc: Spatial Solr Sandbox • LSE • JtsGeoStrategy • Spatial-Demo (web app)
  • 10. Lines of Code for Spatial Components (share of 4,781 non-comment source statements, excluding javadocs and tests, as of 2012-09) • Spatial4j: 43% • Lucene spatial: 35% • Solr adapters: 6% • Misc: 16%
  • 11. CarrotSearch Labs’ RandomizedTesting • http://labs.carrotsearch.com/randomizedtesting.html • Provides plumbing for repeatable randomized JUnit tests • All the spatial test code uses it extensively • More generally, randomized testing is a philosophy of how to test • A typical hard-coded test will only catch some regressions • A randomized test will eventually catch just about anything, especially nasty edge cases • However, these tests are harder to read, write, and maintain • Randomized testing helped find bugs related to… • Computing the bounding box of a circle • Computing the relationship of a circle to a rectangle whose 4 corners all lie inside the circle
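To illustrate the repeatable-randomized idea (using plain `java.util.Random` with a fixed seed, not CarrotSearch's framework), here is a sketch that checks one property mentioned on the slide: every point on a geodetic circle's edge must fall inside the circle's computed bounding box. The box formula (longitude half-width = asin(sin r / cos lat)) is a standard spherical result assumed here, not code taken from Spatial4j, and it deliberately ignores circles that reach a pole. Class and method names are hypothetical.

```java
import java.util.Random;

final class CircleBBoxRandomTest {
    // Bounding box of a geodetic circle as {minLat, maxLat, lonHalfWidth}, all in degrees.
    // The radius is an angular distance in degrees. Assumes |lat| + radius < 90, i.e. the
    // circle does not contain a pole (otherwise the box must span all longitudes).
    static double[] bbox(double lat, double radiusDeg) {
        double halfWidth = Math.toDegrees(Math.asin(
                Math.sin(Math.toRadians(radiusDeg)) / Math.cos(Math.toRadians(lat))));
        return new double[] {lat - radiusDeg, lat + radiusDeg, halfWidth};
    }

    // Repeatable randomized check: the same seed always replays the same trials.
    // Each trial picks a random circle and a random point on its edge (using the
    // spherical destination-point formula) and checks the point lies in the box.
    static boolean run(long seed, int trials) {
        Random rnd = new Random(seed);
        for (int i = 0; i < trials; i++) {
            double lat = -80 + 160 * rnd.nextDouble();   // stay away from the poles
            double radius = 0.01 + 5 * rnd.nextDouble(); // degrees of arc
            double bearing = 2 * Math.PI * rnd.nextDouble();
            double phi1 = Math.toRadians(lat), delta = Math.toRadians(radius);
            double phi2 = Math.asin(Math.sin(phi1) * Math.cos(delta)
                    + Math.cos(phi1) * Math.sin(delta) * Math.cos(bearing));
            // Longitude offset of the edge point from the circle's center, in degrees.
            double dLon = Math.toDegrees(Math.atan2(
                    Math.sin(bearing) * Math.sin(delta) * Math.cos(phi1),
                    Math.cos(delta) - Math.sin(phi1) * Math.sin(phi2)));
            double[] box = bbox(lat, radius);
            double eps = 1e-9;
            double edgeLat = Math.toDegrees(phi2);
            if (edgeLat < box[0] - eps || edgeLat > box[1] + eps || Math.abs(dLon) > box[2] + eps) {
                System.out.println("Failure; rerun with seed " + seed + " (trial " + i + ")");
                return false;
            }
        }
        return true;
    }
}
```

If a trial ever fails, the printed seed replays exactly the same sequence of circles, which is what makes this style of test debuggable.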
  • 13. Spatial4j: It’s all about the shapes https://github.com/spatial4j/spatial4j (spatial4j.com redirects there) • Shapes • A “Shape” abstraction with multiple implementations • Geodetic (sphere) & Cartesian/2D implementations • Computes intersection relationship with other shapes • Also… • Distance and area math utilities, Geohash utilities • Parsing Well Known Text (WKT) formatted shapes • ASL licensed project independent of Apache on GitHub • Requires JTS (LGPL licensed) for polygons & WKT* • JTS is “JTS Topology Suite” • * WKT parsing soon to be implemented directly by Spatial4j • Ported to .NET as Spatial4n and used by RavenDB • by Itamar Syn-Hershko
  • 14. The case for Spatial4j’s existence • Just for shapes? How much code could there be? • You’d be surprised. Determining the relationship between a lat-lon rectangle and a geodetic circle (Within, Contains, Intersects, Disjoint) is non-trivial, and that’s just one pair of shapes. • Lots of non-trivial test code goes with it. • Why isn’t it a part of Lucene spatial? • Parts of Spatial4j depend on JTS, an LGPL licensed library. The Lucene PMC voted not to introduce this compile-time dependency. • Spatial4j is independently useful. • Is this duplication of other open-source code that could be used? • Spatial4j needs to be ASL licensed to be a dependency of Lucene. • Still… I haven’t found existing code that does what Spatial4j does. • Couldn’t just the JTS-dependent parts live outside Lucene?
  • 15. The Shape interface (may become an abstract class in the next version)
      interface Shape {
        Point getCenter();
        Rectangle getBoundingBox();
        boolean hasArea();
        double getArea();
        SpatialRelation relate(Shape other); // must support Point & Rectangle
      }
      enum SpatialRelation { DISJOINT, INTERSECTS, WITHIN, CONTAINS }
    • Note: a simpler set than the “DE-9IM” spatial standard; no “equals” or “touches”
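A minimal sketch of the relate() contract for one case, assuming axis-aligned Cartesian rectangles with no dateline wrap. The `Rect` record and the nested enum are hypothetical stand-ins, not Spatial4j's classes. In this sketch the result describes how `this` relates to the argument (so a small box relating to a large one is WITHIN), boundary contact counts as INTERSECTS because there is no “touches”, and equal rectangles are reported as CONTAINS by choice.

```java
final class RectRelate {
    enum SpatialRelation { DISJOINT, INTERSECTS, WITHIN, CONTAINS }

    // Hypothetical axis-aligned rectangle in a flat plane.
    record Rect(double minX, double maxX, double minY, double maxY) {
        // Describes how THIS rectangle relates to the OTHER one.
        SpatialRelation relate(Rect o) {
            // No overlap at all, not even a shared edge.
            if (o.maxX < minX || o.minX > maxX || o.maxY < minY || o.minY > maxY)
                return SpatialRelation.DISJOINT;
            // This rectangle fully covers the other (also true for equal rectangles).
            if (minX <= o.minX && o.maxX <= maxX && minY <= o.minY && o.maxY <= maxY)
                return SpatialRelation.CONTAINS;
            // The other rectangle fully covers this one.
            if (o.minX <= minX && maxX <= o.maxX && o.minY <= minY && maxY <= o.maxY)
                return SpatialRelation.WITHIN;
            // Partial overlap, including rectangles that only share an edge.
            return SpatialRelation.INTERSECTS;
        }
    }
}
```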
  • 16. Spatial4j shapes by coordinate model (Y = supported, N = not supported)
    | Shape | Cartesian | Cartesian with dateline wrap | Geodetic |
    |---|---|---|---|
    | Point | Y | Y | Y |
    | Line & LineString (w/ buffer) | Y | N | N |
    | Rectangle | Y | Y | Y |
    | Circle | Y | N | Y |
    | ShapeCollection | Y | Y | Y |
    | JTS Geometry (incl. polygons) | Y | Y | N |
    • Cartesian (AKA Euclidean): a flat plane • Dateline wrap assumes the plane circles back on itself • Geodetic: a spherical mathematical model
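The “dateline wrap” column means longitudes are treated as circular. One common way to encode a rectangle that crosses the ±180° line is to allow minX > maxX; the sketch below assumes that convention and shows how longitude containment and width change under it. The class and method names are hypothetical.

```java
final class DatelineRect {
    // Longitude containment for a rectangle's x-range on a world that wraps at +/-180.
    // Assumed convention: minX > maxX means the range crosses the dateline, so
    // minX = 170, maxX = -170 covers 170..180 plus -180..-170.
    static boolean containsLon(double minX, double maxX, double lon) {
        if (minX <= maxX) return minX <= lon && lon <= maxX;
        return lon >= minX || lon <= maxX;
    }

    // Width of the x-range in degrees, adding a full turn when it wraps.
    static double width(double minX, double maxX) {
        double w = maxX - minX;
        return w >= 0 ? w : w + 360;
    }
}
```

Under this convention the range 170 to -170 is 20° wide, not 340°, and contains 175 and -175 but not 0.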
  • 17. Well Known Text (WKT) (see Wikipedia) • A popular standard for representing shapes as strings • Requires JTS’s WKT parser, though Spatial4j has its own parser in progress • Extensions are TBD for rectangles and circles • Limited support for EMPTY and “Z” and “M” dimensions (future) • Some examples: • POINT (3 -2) • LINESTRING(30 10, 10 30, … • POLYGON ((30 10, 10 20, 20 40, 40 40, 30 10)) • MULTIPOLYGON (((… • … • Deprecated (may move to Solr): • -90, -180 • -180 -90 180 90 • CIRCLE(4.56,1.23 d=0.071) • TBD / Pending: • ENVELOPE(-180,180,90,-90) • BOX2D(-180 -90, 180 90)
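As a small illustration of the WKT syntax (x and y separated by whitespace, points separated by commas), here is a hypothetical parser for only the simplest form, `POINT (x y)`. It is not Spatial4j's or JTS's parser, and it rejects everything else, including EMPTY and Z/M ordinates.

```java
import java.util.Locale;

final class WktPoint {
    // Parses "POINT (x y)" (case-insensitive) into {x, y}; throws on anything else.
    static double[] parsePoint(String wkt) {
        String s = wkt.trim();
        if (!s.toUpperCase(Locale.ROOT).startsWith("POINT"))
            throw new IllegalArgumentException("not a POINT: " + wkt);
        int open = s.indexOf('('), close = s.lastIndexOf(')');
        if (open < 0 || close < open)
            throw new IllegalArgumentException("missing parentheses: " + wkt);
        // Ordinates inside the parentheses are separated by whitespace, not commas.
        String[] parts = s.substring(open + 1, close).trim().split("\\s+");
        if (parts.length != 2)
            throw new IllegalArgumentException("expected exactly 2 ordinates: " + wkt);
        // Double.parseDouble throws NumberFormatException (an IllegalArgumentException)
        // for tokens such as "3," that carry a stray comma.
        return new double[] {Double.parseDouble(parts[0]), Double.parseDouble(parts[1])};
    }
}
```

Note that `POINT (3, -2)` is rejected: in WKT a comma separates points, not the x and y of a single point.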
  • 18. Spatial4j code sample
      SpatialContext ctx = SpatialContext.GEO;
      Rectangle r = ctx.makeRectangle(-71, -70, 42, 43); // minX, maxX, minY, maxY
      Circle c = ctx.makeCircle(-72, 42, 1);              // x (lon), y (lat), radius in degrees
      SpatialRelation rel = r.relate(c);
      System.out.println(rel);
      rel.intersects(); // boolean
      ctx = JtsSpatialContext.GEO;
      Shape s = ctx.readShape("POLYGON ((30 10, 10 20, 20 40, 40 40, 30 10))");
      double distanceDegrees = ctx.getDistCalc().distance(
          ctx.makePoint(2, 2), ctx.makePoint(3, 3));
    • Distances (including the circle radius) are in degrees, not radians or kilometers
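Because the sample's distances and circle radius are in degrees of arc, converting to and from kilometers takes one constant. The sketch below assumes a spherical Earth with a mean radius of 6371.0087714 km, so one degree is about 111.195 km; the class name is hypothetical, and an application needing ellipsoidal accuracy would need a different model.

```java
final class DegreeUnits {
    // Assumption: a spherical Earth with this mean radius, in kilometers.
    static final double EARTH_MEAN_RADIUS_KM = 6371.0087714;
    // Length of one degree of arc along a great circle, in kilometers (about 111.195).
    static final double KM_PER_DEGREE = EARTH_MEAN_RADIUS_KM * Math.PI / 180.0;

    // Arc length in degrees -> kilometers along the surface.
    static double degreesToKm(double degrees) { return degrees * KM_PER_DEGREE; }

    // Kilometers along the surface -> arc length in degrees (e.g. for a circle radius).
    static double kmToDegrees(double km) { return km / KM_PER_DEGREE; }
}
```

Under this assumption the sample's `makeCircle(-72, 42, 1)` has a radius of roughly 111 km, and `d=0.071` in the deprecated CIRCLE syntax is roughly 7.9 km.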
  • 19. Spatial4j Future • Built-in WKT support (no JTS dependency) • Extensible to user-defined shapes • API improvements • Shape argument validation via WKT but not via ctx.makeShape(…) • ShapeCollection visitor design pattern • Refactor to remove need for isGeo() • LineString dateline & geodetic support • Projection / Datum support
  • 20. LUCENE SPATIAL Spatial index information retrieval
  • 21. Lucene 4 Spatial Module • There isn’t one best way to implement spatial indexing for all use-cases • Index just points, or other shapes too? Which? • Multiple shapes per field? • Query by Intersection? Contains? Within? Equals? Disjoint? … • Distance sorting? Query boost by distance? • Or more exotic shape relevancy like overlap percentage? • Trade off shape precision for speed? • Multiple SpatialStrategy implementations: • RecursivePrefixTreeStrategy and TermQueryPrefixTreeStrategy • PointVectorStrategy • BBoxStrategy (currently in trunk, not 4x) • JtsGeoStrategy (in Spatial Solr Sandbox)
  • 22. Strategy: PointVector • Similar to Solr’s PointType / LatLonType • X & Y trie double fields; caching via FieldCache • Characteristics • Indexes points (only) • Single-valued field (no multi) • Query by rectangle or circle (only) • Circle uses FieldCache (requires memory) • Circle does bbox pre-filter for performance • Relations: Intersects, Within (only) • Exact precision for x & y coordinates and query shape • Distance sort • Uses FieldCache (requires memory)
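The PointVector idea (one x and one y value per document, a bounding-box pre-filter, then an exact circle test) can be sketched with two parallel arrays standing in for the FieldCache. Assumptions: plain Cartesian coordinates with Euclidean distance, a linear scan where the real strategy can use numeric range queries on the indexed trie fields for the box part, and hypothetical class and method names.

```java
import java.util.ArrayList;
import java.util.List;

final class PointVectorSketch {
    // One x and one y per document; the array index is the document id.
    final double[] xs, ys;

    PointVectorSketch(double[] xs, double[] ys) { this.xs = xs; this.ys = ys; }

    // Returns ids of documents within distance r of (cx, cy).
    List<Integer> circleQuery(double cx, double cy, double r) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < xs.length; doc++) {
            double dx = xs[doc] - cx, dy = ys[doc] - cy;
            // Cheap bounding-box pre-filter: skip docs outside the circle's square.
            if (Math.abs(dx) > r || Math.abs(dy) > r) continue;
            // Exact circle test; comparing squared distances avoids a sqrt.
            if (dx * dx + dy * dy <= r * r) hits.add(doc);
        }
        return hits;
    }
}
```

With points (0,0), (3,4), (1,1) and (5,0), a circle of radius 4.5 at the origin matches documents 0 and 2: document 3 fails the box test, and document 1 passes the box but fails the exact distance test.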
  • 23. Strategy: BBox • Implemented with 4 doubles & 1 boolean • Ported from ESRI GeoPortal (Open Source) • Characteristics: • Indexes rectangles (only) • Single-valued field (no multi) • Query by rectangle (only) • Supports all relations: Intersects, Within, Contains, … • Distance sort from box center • Uses FieldCache (requires memory) • Area overlap sorting • Sort results by percentage overlap between query and indexed boxes • Uses FieldCache (requires memory) • Note: FieldCache needs are somewhat high
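The “area overlap” sort can be sketched with a score built from two ratios: how much of the query box the indexed box covers, and how much of the indexed box lies inside the query. The product used below is an illustrative choice, not the GeoPortal / BBoxStrategy formula, whose weighting may differ. Boxes are assumed to be Cartesian `{minX, maxX, minY, maxY}` arrays with no dateline wrap, and zero-area boxes simply score 0.

```java
import java.util.Arrays;
import java.util.Comparator;

final class OverlapSort {
    static double area(double[] b) { return (b[1] - b[0]) * (b[3] - b[2]); }

    // Area of the intersection of two boxes, or 0 if they do not overlap.
    static double intersectionArea(double[] a, double[] b) {
        double w = Math.min(a[1], b[1]) - Math.max(a[0], b[0]);
        double h = Math.min(a[3], b[3]) - Math.max(a[2], b[2]);
        return (w <= 0 || h <= 0) ? 0 : w * h;
    }

    // Illustrative score: (share of query covered) x (share of target covered).
    // It is 1.0 only for identical boxes and 0.0 for non-overlapping ones.
    static double score(double[] query, double[] target) {
        double inter = intersectionArea(query, target);
        if (inter == 0) return 0;
        return (inter / area(query)) * (inter / area(target));
    }

    // Indexes of `targets`, ordered from best to worst overlap with `query`.
    static Integer[] rank(double[] query, double[][] targets) {
        Integer[] order = new Integer[targets.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> score(query, targets[i])).reversed());
        return order;
    }
}
```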
  • 24. Strategy: JtsGeoStrategy • Stores a JTS geometry in Lucene 4’s DocValues • Stores WKB (Well-Known Binary, the binary counterpart of WKT) • Full vector geometry is retained for search • DocValues is mostly a better FieldCache • Faster loading into memory • Can be disk resident or in memory • Multi-valued • Characteristics: • Indexes any shape, including Multi… varieties • Query by any shape • Uses DocValues (memory use optional) • Supports all relations: intersects, within, contains, … • Could easily also support JTS’s exotic DE-9IM based relations • Exact precision to the vector geometry • No sorting • Experimental / immature status; more of a proof-of-concept for now
  • 26. Strategy: RecursivePrefixTree • Grid / Tile / Trie / Prefix-Tree based • With recursive descent algorithms • Or the TermQueryPrefixTree alternative • Choose Geohash (geo only) or Quad tree • The most mature strategy to date • Highly tested • The current evolution of SOLR-2155
  • 27. Strategy: RecursivePrefixTree • Characteristics: • Indexes all shapes • Variable precision of shape edges • Highly precise shapes other than Point won’t scale • LineString possibly not precise enough for your needs • Multi-valued field support • Query by any shape • Variable precision for query shape • Highest precision usually scales • All relations: Intersects, Within, Contains, Disjoint • Distance sort (w/ multi-value support) • Warning: immature, won’t scale • Uses significant amounts of memory • Fast, scalable spatial filtering; no caches needed • New in Lucene 4.3 • How many search / NoSQL systems have these capabilities?
  • 28. Geohashes • What is a Geohash? • A lat/lon geocode system • Has a hierarchical spatial structure • Gradual precision degradation • In the public domain http://en.wikipedia.org/wiki/Geohash • Example: (Boston) DRT2Y
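The hierarchical structure is easiest to see in the encoding itself. Below is a minimal sketch of the standard geohash algorithm (written from scratch here, not Spatial4j's GeohashUtils): bits alternate longitude and latitude, starting with longitude, and every 5 bits become one base-32 character. The Boston coordinate (42.34, -71.08) is an approximate point chosen for illustration; geohashes are conventionally lowercase, so the slide's DRT2Y appears as `drt2y`.

```java
// Minimal standard geohash encoder (not Spatial4j's implementation).
final class Geohash {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    static String encode(double lat, double lon, int length) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder sb = new StringBuilder(length);
        boolean lonBit = true; // the first bit halves the longitude range
        int bits = 0, value = 0;
        while (sb.length() < length) {
            if (lonBit) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { value = (value << 1) | 1; lonMin = mid; }
                else { value <<= 1; lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { value = (value << 1) | 1; latMin = mid; }
                else { value <<= 1; latMax = mid; }
            }
            lonBit = !lonBit;
            if (++bits == 5) { // 5 bits -> one base-32 character
                sb.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return sb.toString();
    }
}
```

Because each character only refines the previous ones, truncating a geohash yields the enclosing, coarser cell: `drt2` contains `drt2y`. That "gradual precision degradation" is what the prefix-tree strategies exploit.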
  • 35. Geohash Grids • [Figure: internal cell coordinates of an odd-length geohash (DRT2Y) and of an even-length geohash (DRT2)]
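The odd/even difference follows from bit counting: a geohash of length n carries 5n bits, and longitude takes the extra bit whenever 5n is odd. A small sketch of that arithmetic (derived from the standard encoding, not read off the missing figure):

```java
// Cell dimensions of a geohash of a given length, from the standard encoding:
// 5 bits per character, alternating longitude/latitude, longitude first.
final class GeohashCellSize {
    /** Returns {widthDegrees, heightDegrees} for a geohash of the given length. */
    static double[] cellSize(int length) {
        int totalBits = 5 * length;
        int lonBits = (totalBits + 1) / 2; // longitude gets the first, and any odd, bit
        int latBits = totalBits / 2;
        return new double[] { 360.0 / (1L << lonBits), 180.0 / (1L << latBits) };
    }
}
```

So a 4-character cell such as DRT2 is 0.3515625° wide by 0.17578125° tall, while a 5-character cell such as DRT2Y is square in degrees (0.0439453125° on each side). By the same counting, the 32 children of an even-length hash form 8 columns × 4 rows, and those of an odd-length hash form 4 columns × 8 rows.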
  • 36. Demo • Spatial Solr Playground • Demo KML grid generation from geometries • A sample point with quad tree indexes to these tokens: • A, AD, ADB, ADBA • A sample circle with quad tree indexes to these tokens: • A, AB, ABA, ABAB+, ABAC+, ABAD+, ABB, ABBA+, ABBB+, ABBC+, ABBD+, ABC, ABCA+, ABCB+, ABCC+, ABCD+, ABD+, AD, ADA, ADAA+, ADAB+, ADAC+, ADAD+, ADB+, ADC, ADCA+, ADCB+, ADCD+, ADD, ADDA+, ADDB+, ADDC+, ADDD+, B, BA, BAA, BAAC+, BAAD+, BAC, BACA+, BACB+, BACC+, BACD+, BC, BCA, BCAA+, BCAB+, BCAC+, BCC, BCCA+, BCCC+, C, CB, CBB, CBBA+ • Tokens with a ‘+’ are actually indexed with and without the ‘+’
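The point example (A, AD, ADB, ADBA) shows that a point is indexed as every ancestor cell down to the chosen depth. A hypothetical tokenizer sketch follows. It assumes the quadrant labels A = upper-left, B = upper-right, C = lower-left, D = lower-right (a Z-order walk); check QuadPrefixTree's source before relying on that ordering. The '+' leaf variants described above are omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical quad-tree point tokenizer. ASSUMED quadrant labels:
// A = upper-left, B = upper-right, C = lower-left, D = lower-right.
final class QuadTokens {
    static List<String> tokens(double x, double y, int levels,
                               double minX, double minY, double maxX, double maxY) {
        List<String> out = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (int level = 0; level < levels; level++) {
            double midX = (minX + maxX) / 2;
            double midY = (minY + maxY) / 2;
            boolean right = x >= midX;
            boolean top = y >= midY;
            char cell = top ? (right ? 'B' : 'A') : (right ? 'D' : 'C');
            prefix.append(cell);
            out.add(prefix.toString()); // each ancestor cell becomes its own token
            // Shrink the bounds to the chosen quadrant for the next level.
            if (right) { minX = midX; } else { maxX = midX; }
            if (top) { minY = midY; } else { maxY = midY; }
        }
        return out;
    }
}
```

For example, with world bounds -180..180 × -90..90, the point (-100, 40) yields A, AC, ACB at three levels. A search can then match a cell at any level with a plain term lookup, which is what the recursive filters walk over.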
  • 37. PrefixTreeStrategy Architecture • Shape: calculates rectangle relationships • SpatialPrefixTree & Cell: byte string to/from Cell (rectangle) • PrefixTreeStrategy: index & search algorithms (IntersectsPrefixTreeFilter, ContainsPrefixTreeFilter, WithinPrefixTreeFilter) • Lucene TermsEnum
  • 38. Lucene Spatial example code
      SpatialContext ctx = SpatialContext.GEO;
      SpatialStrategy strategy = new RecursivePrefixTreeStrategy(
          new GeohashPrefixTree(ctx, 11), "myGeoField");
      … // make indexWriter and a Document
      for (Field f : strategy.createIndexableFields(shape))
        doc.add(f);
      indexWriter.addDocument(doc);
      …
      Filter filter = strategy.makeFilter(new SpatialArgs(SpatialOperation.Intersects,
          ctx.makeCircle(-80.0, 33.0,  // 200 km radius, converted to degrees
              DistanceUtils.dist2Degrees(200, DistanceUtils.EARTH_MEAN_RADIUS_KM))));
      indexSearcher.search(userKeywordQuery, filter, 10);
    • See SpatialExample.java in Lucene spatial tests for more
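Since circle radii are in degrees, the example converts 200 km with `DistanceUtils.dist2Degrees`. The arithmetic is arc length divided by radius (radians), then converted to degrees. A dependency-free sketch; the radius constant is the mean Earth radius Spatial4j uses as far as I know (verify against your version):

```java
// Arithmetic behind dist2Degrees(200, EARTH_MEAN_RADIUS_KM): km -> radians -> degrees.
final class KmToDegrees {
    // Mean Earth radius in km (assumed to match Spatial4j's DistanceUtils constant).
    static final double EARTH_MEAN_RADIUS_KM = 6371.0087714;

    static double dist2Degrees(double dist, double radius) {
        return Math.toDegrees(dist / radius);
    }
}
```

So the 200 km circle in the example has a radius of roughly 1.80 degrees of arc.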
  • 39. Future • Possible de-emphasis of the SpatialStrategy abstraction • Better options for distance sorting with PrefixTree strategies • Better PrefixTree encoding than both geohash & quad tree • Google Summer of Code 2013 -- TBD • Performance improvements to the spatial Intersects RecursivePrefixTree filter • Remove the need to double-index leaf nodes (with and without ‘+’) • Exact geometry search by blending the benefits of PrefixTree and JtsGeoStrategy • A single-dimensional PrefixTree (for numeric range indexing)
  • 40. SOLR SPATIAL Adapters to Lucene 4 spatial
  • 41. Solr 3 Spatial: LatLonType & friends • Solr 3 was Solr’s first release to include spatial support • Not based on Lucene’s old spatial contrib module • Similar to TwoDoublesStrategy but more optimized • Single-valued only, fast distance sorting, can choose floats (saves memory) • Fields: • LatLonType (geodetic) • PointType (Cartesian) • Query parsers (spatial filters): • {!geofilt} (circle): "pt", "sfield", and "d" params • {!bbox} (bounding box of a circle) • Distance function: • geodist() and some esoteric others • NOT completely superseded by Solr 4 spatial fields
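`{!bbox}` filters by the bounding box of a circle rather than the circle itself. One standard way to compute that box on a sphere is sketched below (a textbook formula, not Solr's code). The latitude half-span is simply d/R, while the longitude half-span widens with latitude as asin(sin(d/R) / cos(lat)). Pole and dateline handling are deliberately omitted.

```java
// Spherical bounding box of a circle (textbook formula, not Solr's implementation).
// Ignores poles and dateline wrapping for brevity.
final class CircleBBox {
    static final double EARTH_MEAN_RADIUS_KM = 6371.0087714; // assumed mean radius

    /** Returns {minLat, minLon, maxLat, maxLon} in degrees. */
    static double[] bbox(double lat, double lon, double distKm) {
        double angular = distKm / EARTH_MEAN_RADIUS_KM; // angular radius in radians
        double dLat = Math.toDegrees(angular);
        // Meridians converge toward the poles, so the longitude span grows with latitude.
        double dLon = Math.toDegrees(
            Math.asin(Math.sin(angular) / Math.cos(Math.toRadians(lat))));
        return new double[] { lat - dLat, lon - dLon, lat + dLat, lon + dLon };
    }
}
```

At the equator the box is square in degrees; at 45° latitude (e.g. `bbox(45.15, -93.85, 5)`) it is noticeably wider in longitude than in latitude. The box corners also explain why `{!bbox}` can match documents slightly outside the circle that `{!geofilt}` would exclude.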
  • 42. Solr 4 Spatial • See http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
      <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
          spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
          distErrPct="0.025"
          maxDistErr="0.000009"
          units="degrees" />
    • spatialContextFactory: if you don’t need JTS (polygons), don’t set this
    • distErrPct: non-point shapes are approximated to the grid, with error up to 2.5% of the radius
    • maxDistErr: maximum precision (about 1 m), measured in degrees
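The "1 m" and "2.5%" annotations are easy to sanity-check. A sketch assuming a spherical Earth with mean radius 6371.0087714 km: one degree of arc is about 111.195 km, so `maxDistErr="0.000009"` is about 1.0 m. The distErrPct helper follows the slide's "of radius" wording; Lucene's own calculation may measure the shape's size differently (e.g. toward its bounding-box corner), so treat it as an approximation.

```java
// Back-of-envelope checks for maxDistErr and distErrPct (spherical Earth assumed).
final class PrecisionMath {
    // Kilometres per degree of arc on a sphere of mean radius 6371.0087714 km.
    static final double KM_PER_DEGREE = Math.toRadians(1) * 6371.0087714;

    static double degreesToMeters(double degrees) {
        return degrees * KM_PER_DEGREE * 1000;
    }

    /** Allowed edge error per the slide's "percentage of radius" reading (approximation). */
    static double allowedErrorKm(double radiusKm, double distErrPct) {
        return radiusKm * distErrPct;
    }
}
```

For example, a 10 km circle with the default 0.025 may be approximated with edges about 250 m off, which is usually fine for filtering and far cheaper to index than 1 m precision.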
  • 43. Indexing • Point: Latitude, Longitude (i.e. Y, X) <field name="geo">43.17614, -90.57341</field> • Point: X Y <field name="geo">-90.57341 43.17614</field> • Rect: minX minY maxX maxY <field name="geo">-74.093 41.042 -69.347 44.558</field> • Circle: point then d=radius (in degrees) • will be deprecated <field name="geo">Circle(4.56,1.23 d=0.0710)</field> • WKT (preferred; it’s a standard) <field name="geo">POLYGON((-10 30, -40 40, -10 -20, 40 20, 0 0, -10 30))</field>
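The two point syntaxes put the coordinates in opposite orders, a common source of swapped points. A hypothetical helper that normalizes both to `{x, y}` (i.e. `{lon, lat}`), assuming, as the examples above suggest, that a comma signals the "lat, lon" form and whitespace alone signals "x y":

```java
// Hypothetical helper (not a Solr or Spatial4j API): normalize the two point
// syntaxes to {x, y} = {lon, lat}. Assumes a comma means "lat, lon" order.
final class PointSyntax {
    static double[] parse(String s) {
        s = s.trim();
        if (s.contains(",")) {                  // "lat, lon": latitude first, so swap
            String[] p = s.split(",");
            double lat = Double.parseDouble(p[0].trim());
            double lon = Double.parseDouble(p[1].trim());
            return new double[] { lon, lat };
        }
        String[] p = s.split("\\s+");           // "x y": already longitude first
        return new double[] { Double.parseDouble(p[0]), Double.parseDouble(p[1]) };
    }
}
```

Both `"43.17614, -90.57341"` and `"-90.57341 43.17614"` describe the same location, and the helper returns the same `{x, y}` pair for each.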
  • 44. Filter (search) • Using Solr 3’s bbox or geofilt query parsers • Distance radius ‘d’ is interpreted as kilometers, just like LatLonType • Limited to a circle or the bounding box of a circle: fq={!geofilt}&sfield=geo&pt=45.15,-93.85&d=5 • Range query style (bounding box) • Handles dateline wrap: fq=geo:[-90,-180 TO 90,180] • Field query style • Unique to Lucene 4 spatial; see SpatialArgsParser: fq=geo:"Intersects(POLYGON((-10 30, -40 40, -10 -20, 40 20, 0 0, -10 30))) distErrPct=0" • Predicates: Intersects, IsDisjointTo, IsWithin, Contains, … • distErrPct (& distErr) optional; overrides the field type’s default • SOLR-4242: A better spatial query parser
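"Handles dateline wrap" means a rectangle whose west edge is numerically greater than its east edge (say 170 to -170) is treated as crossing the 180th meridian rather than as an empty or inverted box. A minimal sketch of the longitude test such a rectangle needs (illustrative, not Lucene's code):

```java
// Dateline-aware longitude containment for a rectangle query (illustrative sketch).
final class DatelineRect {
    static boolean containsLon(double minX, double maxX, double x) {
        if (minX <= maxX) {
            return x >= minX && x <= maxX;     // ordinary box
        }
        // minX > maxX: the box wraps, covering [minX, 180] plus [-180, maxX].
        return x >= minX || x <= maxX;
    }
}
```

A box from 170 to -170 therefore contains longitudes 175 and -175 (20 degrees of span) but not 0.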
  • 45. Distance Sort & Relevancy Boost • geodist() is for Solr 3 LatLonType only: sort=geodist(lltField,45.15,-93.85) desc • Solr 4 spatial queries can return the distance as the score: q={!geofilt sfield=geo pt=45.15,-93.85 d=5 score=distance}&sort=score asc&fl=*,score • Sort by distance without filtering: sort=query($sortsq) asc&sortsq={!geofilt filter=false score=distance sfield=geo pt=45.15,-93.85 d=0} • Relevancy boost: defType=edismax&boost=query($mysq)&mysq={!geofilt filter=false score=recipDistance sfield=geo pt=45.15,-93.85 d=5}
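For the boost, a reciprocal of distance turns "closer" into "higher score". Solr's documented `recip(x, m, a, b)` function computes a / (m·x + b); the sketch below shows the curve that shape produces. The slide does not say which constants `score=recipDistance` uses, so `c` here is a hypothetical choice.

```java
// Solr's documented recip(x, m, a, b) = a / (m*x + b).
// With m = 1 and a = b = c (c hypothetical), distance d scores c / (d + c).
final class RecipBoost {
    static double recip(double x, double m, double a, double b) {
        return a / (m * x + b);
    }
}
```

With c = 2, a document at distance 0 scores 1.0, one at distance 2 scores 0.5, and scores keep decaying toward 0 without ever going negative, which makes the result safe to multiply into an edismax `boost`.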
  • 46. Distance Faceting • sfield=geo (the field) • pt=45.15,-93.85 (point of reference) • Within 10km • facet.query={!geofilt d=10} • Within 50km • facet.query={!geofilt d=50} • Within 100km • facet.query={!geofilt d=100}
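Each `facet.query` above is an independent "within d km" count, so the buckets are cumulative rather than disjoint rings: a document 5 km away is counted in all three. A small sketch of that counting, assuming the distances have already been computed:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Cumulative distance-facet counts, like separate facet.query={!geofilt d=...} entries.
// Distances (km) are assumed to be precomputed for each document.
final class DistanceFacets {
    static Map<Integer, Integer> counts(double[] distancesKm, int... radiiKm) {
        Map<Integer, Integer> out = new LinkedHashMap<>();
        for (int r : radiiKm) {
            int n = 0;
            for (double d : distancesKm) {
                if (d <= r) n++; // same doc may count toward several radii
            }
            out.put(r, n);
        }
        return out;
    }
}
```

For documents at 5, 20, 70 and 150 km, the 10/50/100 km facets report 1, 2 and 3. To show disjoint rings ("10 to 50 km"), subtract adjacent counts.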
  • 47. Future • A more Solr-friendly spatial query parser: SOLR-4242 • Retrofit geodist() to support the SpatialStrategies? • Expose more tunables • A grid-based heat-map faceting component • Idea: a multi-strategy spatial field encompassing • A PrefixTree field for points • A PrefixTree field for non-points • A TwoDoubles field for good distance sorting / relevancy • Knows whether it’s single- or multi-valued • A FieldType for multi-value numeric ranges
  • 48. DEMO
  • 50. Heatmap / Grid faceting 1. Geohash each point to multiple lengths and index each length into its own field • geohash_1:D, geohash_2:DR, geohash_3:DRT, geohash_4:DRT2 2. Search with a rectangle (bbox) filter, and… 3. Facet on the geohash field with the desired resolution • facet.field=geohash_4&facet.limit=10000 • Lots of tuning / customization options • Projected / quad tree • facet.prefix may help
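Steps 1 and 3 can be sketched without Solr: derive the `geohash_1`…`geohash_N` values from one full-length geohash, then count documents per cell at one resolution, which is what `facet.field=geohash_4` returns. The field names follow the slide; the helper names are hypothetical, and hashes are shown in conventional lowercase.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helpers illustrating grid faceting on geohash prefixes.
final class GeohashGridFacet {
    /** Step 1: one field per prefix length, e.g. geohash_4 -> "drt2". */
    static Map<String, String> prefixFields(String geohash, int maxLen) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (int len = 1; len <= Math.min(maxLen, geohash.length()); len++) {
            fields.put("geohash_" + len, geohash.substring(0, len));
        }
        return fields;
    }

    /** Step 3: document counts per grid cell at one resolution (a facet on geohash_len). */
    static Map<String, Integer> facet(List<String> docGeohashes, int len) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String h : docGeohashes) {
            counts.merge(h.substring(0, len), 1, Integer::sum);
        }
        return counts;
    }
}
```

Each facet value is a grid cell and each count is its heat; choosing the field (prefix length) chooses the heat-map resolution.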
  • 51. Plotting many points on a map • Why not ask Solr for rows=1000? • It’s slow • If documents have a variable number of points, 1000 rows could yield as few as 1 distinct point or as many as 1M • Instead facet on a geohash with facet.limit=1000 • Fast • Guaranteed <= 1000 points • But might need lots of memory • Or use result grouping on a geohash • But do you really want to plot 1000+ points on a map?
  • 52. Filter by indexed distance constraints • Imagine a dating site where both potential parties have a maximum distance they’re willing to travel • Q: For the current user, who is not “too far” for you but is also not “too far” for them? • A: Index each user’s location as a point in one field and as a circle in another. Query by the current user’s circle to the indexed point field as well as the current user’s point to the indexed circle field.
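The two-field trick answers a symmetric question: the distance between two users must fit inside both of their maximum radii. The in-memory equivalent of that test is sketched below using the haversine great-circle distance on a spherical Earth (an assumed model; the function names are hypothetical).

```java
// In-memory equivalent of the mutual "not too far" constraint (hypothetical helpers).
final class MutualDistance {
    static final double EARTH_MEAN_RADIUS_KM = 6371.0087714; // assumed mean radius

    /** Haversine great-circle distance in km. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_MEAN_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** True if each person lies within the other's maximum travel distance. */
    static boolean mutualMatch(double latA, double lonA, double maxKmA,
                               double latB, double lonB, double maxKmB) {
        double d = haversineKm(latA, lonA, latB, lonB);
        return d <= maxKmA && d <= maxKmB;
    }
}
```

The query in the slide splits this conjunction across two indexed fields: "B's point is inside A's circle" checks `d <= maxKmA`, and "A's point is inside B's indexed circle" checks `d <= maxKmB`.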
  • 53. Multi-valued durations • What if your documents needed a variable number of time (or other numeric) durations? • This approach won’t work, because nothing links each start value to its matching end value: <field name="start" type="tdate" multiValued="true"/> <field name="end" type="tdate" multiValued="true"/> • Solr (without Solr 4 spatial fields) can’t do it! • You need to think differently to solve this… http://wiki.apache.org/solr/SpatialForTimeDurations • Example use-cases • Searching for hotel-room vacancies • Searching for movie show-times • (next slides) Each document is a person with a variable number of “shifts” that they are working…
  • 54. … model durations as points
  • 55. … queries become rectangles
  • 56. … some config & search details • Configuration <fieldType name="days_of_year" class="solr.SpatialRecursivePrefixTreeFieldType" geo="false" units="degrees" worldBounds="0 0 365 365" distErrPct="0" maxDistErr="1"/> • Sample search: find shifts that have any overlap with the 19th through 23rd days: daysOfYear:Intersects(0 18.5 23.5 365) • Caveat: won’t scale to the full precision of a Java long (timestamp)
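The durations-as-points model reduces interval overlap to a rectangle test. A shift [start, end] is indexed as the point (x = start, y = end). It overlaps days [from, to] exactly when start ≤ to and end ≥ from, which is the rectangle x in [0, to], y in [from, 365] within the "0 0 365 365" world bounds. The sketch below builds the slide's query string, including its half-day padding, and does the same test in memory. The helper names are hypothetical.

```java
// Hypothetical helpers for the durations-as-points model.
final class DurationsAsPoints {
    /** Rectangle query ("minX minY maxX maxY") for any overlap with days [fromDay, toDay]. */
    static String queryRect(int fromDay, int toDay) {
        // x = start must be <= toDay, y = end must be >= fromDay; the half-day
        // padding mirrors the slide's Intersects(0 18.5 23.5 365) example.
        return "Intersects(0 " + (fromDay - 0.5) + " " + (toDay + 0.5) + " 365)";
    }

    /** The same overlap test done in memory for one shift. */
    static boolean overlaps(int start, int end, int fromDay, int toDay) {
        return start <= toDay && end >= fromDay;
    }
}
```

`queryRect(19, 23)` reproduces the slide's query. A person with several shifts is simply a document with several points in a multi-valued field, which the two-field start/end approach could not express.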
  • 57. Thank you! • References • Lucene 4 spatial javadocs • https://builds.apache.org/job/Lucene-Artifacts-4.x/javadoc/spatial/ • Spatial4j at GitHub • https://github.com/spatial4j/spatial4j ( spatial4j.com redirect) • http://spatial4j.16575.n6.nabble.com -- dev@lists.spatial4j.com • Solr • http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4 • Spatial Solr Sandbox • https://github.com/ryantxu/spatial-solr-sandbox • Contact me: • David Smiley dsmiley@mitre.org dsmiley@apache.org