The Latest in Spatial & Temporal Search
David Smiley
Agenda 
Spatial
• Polygons and Accuracy: SerializedDVStrategy
• FlexPrefixTree
• BBoxSpatialStrategy
• Student/Intern contributions, Geodesics
Temporal
• Dates, and Date Ranges
• Search
• Faceting
About David Smiley 
• Freelance search consultant / developer 
  • Expert Lucene/Solr development skills, advice (consulting), training
  • Java (full-stack), Web, Spatial
• Apache Lucene / Solr committer & PMC, Eclipse LocationTech PMC
• Authored 1st book on Solr, plus two editions 
• Presented at several conferences & meetups 
• Taught several Solr classes, self-developed & LucidWorks
Lucene Spatial Overview 
• Multiple approaches to index spatial data 
  • abstract class SpatialStrategy (5+ concrete implementations)
• RecursivePrefixTreeStrategy (RPT) is most prominent, versatile
  • Grid based
• Uses Spatial4j lib for shapes, distance calculations, and WKT
  • Uses JTS Topology Suite lib for polygons
[Diagram labels: Shape; SpatialPrefixTree / Cell; PrefixTreeStrategy; IntersectsPrefixTreeFilter, Contains…, Within…; Geohash | Quad]
SpatialPrefixTrees and Accuracy 
RecursivePrefixTree (RPT) uses Lucene’s index as a PrefixTree 
• Thus represents shapes as grid cells of varying precision by prefix
Example, a point shape:
• D, DR, DRT, DRT2, DRT2Y
Example, a polygon shape:
• Too many to list… 508 cells
More details here: http://opensourceconnections.com/blog/2014/04/11/indexing-polygons-in-lucene-with-accuracy/
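The point example above (D, DR, DRT, DRT2, DRT2Y) can be sketched in plain Java: a point is encoded to a geohash, and every prefix of that hash is one grid cell at a coarser level. This is a minimal illustration of the idea, not Lucene's GeohashPrefixTree; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class GeohashPrefixSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Standard geohash encoding: alternate longitude/latitude bisection bits, 5 bits per character. */
    static String encode(double lat, double lon, int length) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder hash = new StringBuilder();
        boolean lonTurn = true; // a geohash starts with a longitude bit
        int bits = 0, value = 0;
        while (hash.length() < length) {
            if (lonTurn) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { value = (value << 1) | 1; lonMin = mid; }
                else { value <<= 1; lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { value = (value << 1) | 1; latMin = mid; }
                else { value <<= 1; latMax = mid; }
            }
            lonTurn = !lonTurn;
            if (++bits == 5) { // 5 bits make one base-32 character
                hash.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return hash.toString();
    }

    /** Every prefix of a cell token is an ancestor cell: one indexed term per level for a point. */
    static List<String> prefixes(String cell) {
        List<String> out = new ArrayList<>();
        for (int i = 1; i <= cell.length(); i++) {
            out.add(cell.substring(0, i));
        }
        return out;
    }
}
```

Calling `GeohashPrefixSketch.prefixes("DRT2Y")` yields the five cells listed on the slide, one per level.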
…continued 
• For more accuracy, index more levels (longer prefixes) 
  • Points: linear relationship of levels to number of cells :)
  • Non-points: exponential relationship… :(
RPT applies a distErrPct shape size ratio to non-point shapes to trade accuracy for scalability
• distErrPct=0.025 (2.5% of the radius, the default):
  • Massachusetts: level 6
  • USA: level 4 (not as precise)
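To make the distErrPct trade-off concrete, here is a rough sketch of how a tolerated error could be turned into a grid level. It assumes a quad-style grid whose cell width halves at each level starting from 360 degrees; the names are hypothetical and this is not Lucene's actual computation, so the levels it returns will not match the geohash levels quoted above.

```java
final class DistErrPctSketch {
    /** First level whose cell width (in degrees) is no larger than the tolerated error. */
    static int levelForDistance(double maxErrorDegrees, int maxLevels) {
        double cellWidth = 360.0; // level 0 spans the whole world (assumption)
        for (int level = 0; level < maxLevels; level++) {
            if (cellWidth <= maxErrorDegrees) {
                return level;
            }
            cellWidth /= 2; // each deeper level halves the cell width
        }
        return maxLevels;
    }

    /** The tolerated error is a fraction (distErrPct) of the shape's radius. */
    static int levelForShape(double shapeRadiusDegrees, double distErrPct, int maxLevels) {
        return levelForDistance(shapeRadiusDegrees * distErrPct, maxLevels);
    }
}
```

A larger shape tolerates a larger absolute error and therefore stops at a shallower level, which is why the USA is indexed less precisely than Massachusetts at the same distErrPct.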
SerializedDVStrategy (Lucene 4.7) 
• Stores serialized geometry into Lucene BinaryDocValues
  • It’s as accurate as the underlying geometry coordinates/shape
  • But it’s not a spatial index; it’s retrievable on a per-document basis
• Use RPT + SerializedDV for speed and accuracy!
• More to come eventually:
  • Solr adapter (SOLR-5728), ElasticSearch adapter (#2361)
  • Speed: skip the serialized geometry check for non-edge cells (LUCENE-5579)
Sample Code 
SpatialArgs args = new SpatialArgs(INTERSECTS, point);
// Fast but approximate: grid cells indexed by RPT
treeStrategy = new RecursivePrefixTreeStrategy(grid, "geometry");
// Exact but slow: checks the serialized geometry per document
verifyStrategy = new SerializedDVStrategy(ctx, "serialized_geometry");
Query treeQuery = new ConstantScoreQuery(treeStrategy.makeFilter(args));
// Run the grid query first, then verify only its matches
Query combinedQuery = new FilteredQuery(
    treeQuery,
    verifyStrategy.makeFilter(args),
    FilteredQuery.QUERY_FIRST_FILTER_STRATEGY);
Code is from a related presentation by the Climate Corporation presented at FOSS4G 2014
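The two-phase idea behind the sample code (approximate grid match first, exact geometry check second) can be illustrated without Lucene. The sketch below is a simplified stand-in: indexed shapes are circles, the grid has a single fixed resolution, and all names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

final class TwoPhaseSpatialSketch {
    static final double CELL_SIZE = 1.0; // hypothetical grid resolution

    /** A circle kept exactly (the "serialized geometry") plus the grid cells covering its bounding box. */
    static final class IndexedCircle {
        final double cx, cy, r;
        final Set<Long> cells = new HashSet<>();

        IndexedCircle(double cx, double cy, double r) {
            this.cx = cx;
            this.cy = cy;
            this.r = r;
            for (long x = cellOf(cx - r); x <= cellOf(cx + r); x++) {
                for (long y = cellOf(cy - r); y <= cellOf(cy + r); y++) {
                    cells.add(key(x, y));
                }
            }
        }
    }

    static long cellOf(double v) {
        return (long) Math.floor(v / CELL_SIZE);
    }

    /** Packs two cell coordinates into one key (assumes both fit in 32 bits). */
    static long key(long x, long y) {
        return (x << 32) ^ (y & 0xffffffffL);
    }

    /** Phase 1: cheap grid lookup; may report false positives near the shape's edge. */
    static boolean coarseMatch(IndexedCircle c, double px, double py) {
        return c.cells.contains(key(cellOf(px), cellOf(py)));
    }

    /** Phase 2: exact check against the stored geometry. */
    static boolean exactMatch(IndexedCircle c, double px, double py) {
        double dx = px - c.cx;
        double dy = py - c.cy;
        return dx * dx + dy * dy <= c.r * c.r;
    }

    /** Only candidates that pass the grid lookup pay for the exact check. */
    static boolean intersects(IndexedCircle c, double px, double py) {
        return coarseMatch(c, px, py) && exactMatch(c, px, py);
    }
}
```

A point in a corner cell of the bounding box passes the coarse phase but fails the exact one, which is the kind of false positive the verification step removes.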
FlexPrefixTree (Coming to Lucene 5) 
• A new SpatialPrefixTree by Varun Shenoy (GSOC 2014)!
  • LUCENE-4922; still needs to be committed. Goal is for 5.0.
• More optimized, more flexible, than Geohash & Quad
  • Configurable sub-cells at each level: 4, 16, 64, 256
  • You choose trade-off between index speed/disk size & search speed
  • Internally uses an integer coordinate system
  • Rectangle searches are particularly fast; minimal floating-point conversion
  • Cells are always squares (equal sides); better for heatmaps
  • YMMV: 10% to 100% faster than GeohashPrefixTree
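One way to reason about the sub-cell setting: with 4, 16, 64, or 256 square sub-cells per level, each level splits each axis into 2, 4, 8, or 16 parts, so fewer levels (and fewer terms per point) are needed for the same resolution. The sketch below only does that arithmetic; it is not FlexPrefixTree code and the names are hypothetical.

```java
final class SubCellLevelsSketch {
    /**
     * Levels needed to split each axis into 2^bitsPerAxis parts when every
     * level has subCells children (4, 16, 64, or 256).
     */
    static int levelsNeeded(int bitsPerAxis, int subCells) {
        // 4 -> 1 bit per axis per level, 16 -> 2, 64 -> 3, 256 -> 4
        int bitsPerLevelPerAxis = Integer.numberOfTrailingZeros(subCells) / 2;
        // Ceiling division
        return (bitsPerAxis + bitsPerLevelPerAxis - 1) / bitsPerLevelPerAxis;
    }

    public static void main(String[] args) {
        for (int subCells : new int[] {4, 16, 64, 256}) {
            System.out.println(subCells + " sub-cells: " + levelsNeeded(24, subCells) + " levels");
        }
    }
}
```

Fewer levels means fewer indexed terms per shape, at the cost of coarser steps between levels when a query has to descend into partially covered cells.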
BBoxSpatialStrategy (Lucene 4.10) 
• Rectangles (BBox’s) only, one value per field 
• Wide predicate support 
  • Equals, Intersects, Within, Contains, Disjoint
• Accurate (8-byte double floating point)
• Area overlap relevancy
  • Weight search results by a combination of query shape overlap & index shape overlap ratios
• Solr BBoxField…
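The area overlap relevancy can be sketched as a weighted blend of two ratios: how much of the query box the intersection covers, and how much of the indexed box it covers. The weighting parameter and all names below are assumptions for illustration, not the exact formula or defaults used by Lucene.

```java
final class OverlapRatioSketch {
    static final class Rect {
        final double minX, maxX, minY, maxY;

        Rect(double minX, double maxX, double minY, double maxY) {
            this.minX = minX;
            this.maxX = maxX;
            this.minY = minY;
            this.maxY = maxY;
        }

        double area() {
            return (maxX - minX) * (maxY - minY);
        }
    }

    /** Area shared by both rectangles, or 0 when they do not overlap. */
    static double intersectionArea(Rect a, Rect b) {
        double w = Math.min(a.maxX, b.maxX) - Math.max(a.minX, b.minX);
        double h = Math.min(a.maxY, b.maxY) - Math.max(a.minY, b.minY);
        return (w <= 0 || h <= 0) ? 0 : w * h;
    }

    /** queryWeight in [0, 1] (hypothetical knob): share of the score given to the query-side ratio. */
    static double overlapScore(Rect query, Rect indexed, double queryWeight) {
        double inter = intersectionArea(query, indexed);
        if (inter == 0) {
            return 0;
        }
        double queryRatio = inter / query.area();     // how much of the query is covered
        double indexedRatio = inter / indexed.area(); // how much of the indexed box is covered
        return queryWeight * queryRatio + (1 - queryWeight) * indexedRatio;
    }
}
```

With this blend, an indexed box that coincides with the query box scores 1.0, while boxes that are much larger or much smaller than the query score lower even though they intersect it.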
Solr BBoxField 
• Schema configuration
<field name="bbox" type="bbox" />
<fieldType name="bbox" class="solr.BBoxField"
           geo="true" units="degrees" numberType="_bbox_coord" />
<fieldType name="_bbox_coord" class="solr.TrieDoubleField"
           precisionStep="8" docValues="true" stored="false"/>
• Search with overlap ratio ordering 
&q={!field f=bbox score=overlapRatio}Intersects(ENVELOPE(-10, 20, 15, 10)) 
  • score can be: overlapRatio, area, area2D
Recent Student/Intern Contributions 
• Varun Shenoy via GSOC: summer 2014 
  • Lucene spatial: new “FlexPrefixTree”, an optimized grid
• Rebecca Alford via F.B. Open-Academy: winter 2014
  • Spatial4j: geodesic polygons
• Chris Pavlicek via F.B. Open-Academy: winter 2014
  • Spatial4j: geodesic buffered lines
• Evana Gizzi, MITRE intern: winter 2014
  • Spatial4j: geodesic circle polygonizer
• Liviy Ambrose, MITRE intern: fall 2013
  • Lucene spatial: integrated with Lucene’s benchmark module
Temporal/Date Durations 
or basically any numeric ranges
Approach: Simple Two-field 
(as you might do in SQL or any system without native range types) 
• A start-time & end-time field pair 
• A search window (time span) becomes two range queries 
  • details vary by predicate (Intersects, Contains, vs. Within)
• Single-valued only
  • …even though Lucene supports multi-valued fields
  • Theoretically possible but would be a lot of work
    • because Lucene doesn’t store “position” info for numeric fields
    • because numeric range/prefix queries are position-less
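How the two range conditions vary by predicate can be written out directly. The sketch below assumes closed intervals on long timestamps and reads each predicate from the indexed document's perspective (for example, "within" means the document's span lies inside the query window); the names are hypothetical.

```java
final class TwoFieldRangeSketch {
    /** Document span overlaps the query window at all. */
    static boolean intersects(long docStart, long docEnd, long qStart, long qEnd) {
        return docStart <= qEnd && docEnd >= qStart;
    }

    /** Document span lies entirely inside the query window. */
    static boolean isWithin(long docStart, long docEnd, long qStart, long qEnd) {
        return docStart >= qStart && docEnd <= qEnd;
    }

    /** Document span entirely covers the query window. */
    static boolean contains(long docStart, long docEnd, long qStart, long qEnd) {
        return docStart <= qStart && docEnd >= qEnd;
    }
}
```

Each predicate is exactly two comparisons, one per field, which is why a search window maps onto two range queries. It also shows the multi-valued problem: with several start and end values per document, nothing ties a matching start to its own end.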
Approach: 2D Spatial PrefixTree 
• Lucene Spatial QuadPrefixTree (2D) with RPT Strategy
• Use ‘x’ for start-time, ‘y’ for end-time
• A search window (time span) becomes a rectangle query
  • details vary by predicate (Intersects, Contains, vs. Within)
• Cool…
  • But floating-point edge issues
  • Only ~50 levels supported; not 64
Details: http://wiki.apache.org/solr/SpatialForTimeDurations
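As a sketch of the mapping above: each duration becomes the point (start, end), and each predicate becomes a rectangle in that plane. The bounds MIN and MAX, the use of long coordinates, and all names are assumptions for illustration; a real grid would use its own world bounds.

```java
final class DurationAsPointSketch {
    static final long MIN = Long.MIN_VALUE; // hypothetical world bounds
    static final long MAX = Long.MAX_VALUE;

    static final class Rect {
        final long minX, maxX, minY, maxY;

        Rect(long minX, long maxX, long minY, long maxY) {
            this.minX = minX;
            this.maxX = maxX;
            this.minY = minY;
            this.maxY = maxY;
        }

        /** x is the indexed start-time, y is the indexed end-time. */
        boolean containsPoint(long x, long y) {
            return minX <= x && x <= maxX && minY <= y && y <= maxY;
        }
    }

    /** Durations overlapping [qStart, qEnd]: start <= qEnd and end >= qStart. */
    static Rect intersectsRect(long qStart, long qEnd) {
        return new Rect(MIN, qEnd, qStart, MAX);
    }

    /** Durations inside [qStart, qEnd]: both start and end fall in the window. */
    static Rect withinRect(long qStart, long qEnd) {
        return new Rect(qStart, qEnd, qStart, qEnd);
    }

    /** Durations covering [qStart, qEnd]: start <= qStart and end >= qEnd. */
    static Rect containsRect(long qStart, long qEnd) {
        return new Rect(MIN, qStart, qEnd, MAX);
    }
}
```

Because every value is a single point, multi-valued fields work naturally here: a document matches if any one of its points falls inside the query rectangle.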
Approach: DateRangePrefixTree (Lucene 5) 
• A new 1D SpatialPrefixTree: NumberRangePrefixTree 
  • NumberRangePrefixTree w/ DateRangePrefixTree subclass
  • NR-SPT: configurable sub-cells per level; no level limit
  • Not just for ranges; instances too
  • Index/Search with NumberRangePrefixTreeStrategy
    • Indexing, and search predicate code (e.g. Intersects…) completely re-used
  • DateRangePrefixTree
    • 9 levels: 1M years, 1K years, years, months, days, hours, minutes, seconds, millis
…continued…
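To illustrate the nine levels, the sketch below turns a date instance into a path of cells, one per level, down to the requested precision. The token format is made up for readability and is not how DateRangePrefixTree encodes its terms; it assumes a non-negative year.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

final class DateCellPathSketch {
    /**
     * Returns one cell token per level, from "millions of years" down to the
     * requested level (1..9). Each token is a prefix of the next one.
     */
    static List<String> cellPath(LocalDateTime t, int levels) {
        int year = t.getYear(); // assumes a non-negative year
        int[] parts = {
            year / 1_000_000,       // level 1: millions of years
            (year / 1000) % 1000,   // level 2: thousands of years
            year % 1000,            // level 3: years
            t.getMonthValue(),      // level 4: months
            t.getDayOfMonth(),      // level 5: days
            t.getHour(),            // level 6: hours
            t.getMinute(),          // level 7: minutes
            t.getSecond(),          // level 8: seconds
            t.getNano() / 1_000_000 // level 9: millis
        };
        List<String> path = new ArrayList<>();
        StringBuilder token = new StringBuilder();
        for (int i = 0; i < levels && i < parts.length; i++) {
            if (i > 0) {
                token.append('/');
            }
            token.append(parts[i]);
            path.add(token.toString());
        }
        return path;
    }
}
```

A date truncated to the hour, such as 2014-05-21T12, stops at level 6 and so produces six cells; a full millisecond timestamp produces nine.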
Trade-offs of N/D-SPT 
• Indexing:
  • “Common” date-ranges use ~ <50 terms, but random millisecond ranges use up to ~14K terms
  • All date instances (not a range) <= 9 terms
  • Comparison to 2D SPT: instance or range, always 50
• Search:
  • Query for “common” query ranges faster than uncommon
  • Comparison to 2D SPT:
    • Contains & Within predicates: overlapping values per document get coalesced, can’t be differentiated
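Why "common" ranges need so few terms while arbitrary ones need many can be seen with a toy prefix tree. The sketch below uses a base-10 tree over integers and counts the fewest cells that exactly cover a range; it is a simplification with hypothetical names, not the date tree's real branching or term counts.

```java
final class RangeCoverSketch {
    /** Counts the cells needed to exactly cover [lo, hi], descending only into partially covered cells. */
    static int countCells(long lo, long hi, long cellStart, long cellSize) {
        long cellEnd = cellStart + cellSize - 1;
        if (cellEnd < lo || cellStart > hi) {
            return 0; // disjoint: contributes nothing
        }
        if (lo <= cellStart && cellEnd <= hi) {
            return 1; // fully covered: a single term stands for the whole cell
        }
        long childSize = cellSize / 10; // partially covered: split into 10 children
        int total = 0;
        for (int i = 0; i < 10; i++) {
            total += countCells(lo, hi, cellStart + i * childSize, childSize);
        }
        return total;
    }

    /** Covers [lo, hi] in a tree whose root spans all numbers with the given digit count. */
    static int termsToCover(long lo, long hi, int digits) {
        long rootSize = 1;
        for (int i = 0; i < digits; i++) {
            rootSize *= 10;
        }
        return countCells(lo, hi, 0, rootSize);
    }
}
```

A range aligned to cell boundaries, such as 1000 to 1999, collapses into one cell, while an unaligned range such as 1234 to 5678 needs dozens, because each ragged edge must be covered with progressively smaller cells.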
Solr DateRangeField 
• Configuration in schema.xml:
<field name="dateRange" type="dateRange" />
<fieldType name="dateRange" class="solr.DateRangeField" />
• Index field data, examples:
  • 2014-05-21T12:00:00.000Z (same as TrieDate)
  • 2014-05-21T12 (truncated to desired precision)
  • [1990 TO 1995]
• Query, examples:
  • fq=dateRange:[* TO 2014-05-21]
  • fq={!field f=dateRange op=Contains}[2000 TO 2014-05-21]
Visualizing Date Facets 
• http://bl.ocks.org/mbostock/4063318
Date Faceting 
• Option A: facet.range 
  • Not for indexed date-ranges
  • Internally executes one query for each value & caches large bitset
• Option B: facet.interval (Solr 4.10)
  • Not for indexed date-ranges
  • Requires DocValues (more index data)
  • Supports variable/custom intervals
• New work-in-progress option: Facet on DateRangeField
  • Ranges are fixed/pre-determined (months, days, etc.)
  • Optimized for thousands of ranges to count
  • Each value-range is only 1 term!
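The appeal of fixed, pre-determined ranges is that each bucket corresponds to a single cell, so counting reduces to tallying one key per value. The sketch below shows that tallying for month buckets over in-memory dates; it is only an analogy with hypothetical names and says nothing about how Solr implements faceting.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MonthFacetSketch {
    /** Counts dates per calendar month; each month plays the role of one pre-determined cell. */
    static Map<YearMonth, Integer> countByMonth(List<LocalDate> dates) {
        Map<YearMonth, Integer> counts = new TreeMap<>();
        for (LocalDate d : dates) {
            counts.merge(YearMonth.from(d), 1, Integer::sum);
        }
        return counts;
    }
}
```

The result is ordered by month, which is the shape a calendar-style visualization of date facets would consume.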
Future stuff I’m excited about 
• Continuing works in-progress 
• Spatial heatmaps! Coming in January 2015! 
  • Lucene layer & Solr adapter
• Lucene term auto-prefixing LUCENE-5879
  • Brings spatial, date, numeric, indexing/search to the next level!
• More prefix-tree optimizations
  • Inner vs edge leaf cell differentiation for non-point shapes
    • RPT + SerializedDVStrategy; skip accuracy checks for inner cells
  • Don’t index leaf cells twice
That’s all for now; thanks for coming!
Need Lucene/Solr guidance or custom development? Contact me!
Email: dsmiley@apache.org
LinkedIn: http://www.linkedin.com/in/davidwsmiley
G+: +DavidSmiley
Twitter: @DavidWSmiley
ETA: December 2014

Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 

The Latest in Spatial & Temporal Search: Presented by David Smiley

  • 1.
  • 2. The Latest in Spatial & Temporal Search David Smiley
  • 3. Agenda Spatial • Polygons and Accuracy: SerializedDVStrategy • FlexPrefixTree • BBoxSpatialStrategy • Student/Intern contributions, Geodesics Temporal • Dates, and Date Ranges • Search • Faceting
  • 4. About David Smiley • Freelance search consultant / developer • Expert Lucene/Solr development skills, advice (consulting), training • Java (full-stack), Web, Spatial • Apache Lucene / Solr committer & PMC, Eclipse Locationtech PMC • Authored 1st book on Solr, plus two editions • Presented at several conferences & meetups • Taught several Solr classes, self-developed & LucidWorks
  • 5. Lucene Spatial Overview • Multiple approaches to index spatial data abstract class SpatialStrategy (5+ concrete implementations) • RecursivePrefixTreeStrategy (RPT) is most prominent, versatile • Grid based Shape SpatialPrefixTree / Cell PrefixTreeStrategy • Uses Spatial4j lib for shapes, distance calculations, and WKT • Uses JTS Topology Suite lib for polygons IntersectsPrefixTreeFilter Contains… Geohash | Quad Within…
  • 6. SpatialPrefixTrees and Accuracy RecursivePrefixTree (RPT) uses Lucene’s index as a PrefixTree • Thus represents shapes as grid cells of varying precision by prefix Example, a point shape: • D, DR, DRT, DRT2, DRT2Y Example, a polygon shape: • Too many to list… 508 cells More details here: http://opensourceconnections.com/blog/2014/04/11/indexing-polygons-in-lucene-with-accuracy/
  • 7. …continued • For more accuracy, index more levels (longer prefixes) • Points: linear relationship of levels to number of cells :) • Non-points: exponential relationship… :( RPT applies a distErrPct shape size ratio to non-point shapes to trade accuracy for scalability • distErrPct=0.025 (2.5% of the radius, the default): • Massachusetts: level 6 • USA: level 4 (not as precise)
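
To make the prefix idea on slides 6 and 7 concrete, here is a minimal, self-contained sketch of standard geohash encoding that lists the cell at each level for a point. It is an illustration only: `GeohashCells` and `cellsForPoint` are names made up for this example, and this is not the code Lucene's GeohashPrefixTree uses.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; GeohashCells is a hypothetical name, not a Lucene class.
class GeohashCells {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Returns the geohash cell at every level from 1 to maxLevels for a point. */
    static List<String> cellsForPoint(double lat, double lon, int maxLevels) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder hash = new StringBuilder();
        List<String> cells = new ArrayList<>();
        boolean lonBit = true; // geohash interleaves bits, starting with longitude
        int bits = 0, value = 0;
        while (cells.size() < maxLevels) {
            if (lonBit) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { value = (value << 1) | 1; lonMin = mid; }
                else { value <<= 1; lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { value = (value << 1) | 1; latMin = mid; }
                else { value <<= 1; latMax = mid; }
            }
            lonBit = !lonBit;
            if (++bits == 5) { // every 5 bits become one base-32 character
                hash.append(BASE32.charAt(value));
                cells.add(hash.toString()); // one cell (term) per level
                bits = 0;
                value = 0;
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(cellsForPoint(57.64911, 10.40744, 5));
    }
}
```

A point indexed to N levels yields exactly N cells, each a prefix of the next, which is the linear relationship noted on slide 7. A polygon needs many cells per level to cover its area, which is why the count grows so much faster.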
  • 8. SerializedDVStrategy (Lucene 4.7) • Stores serialized geometry into Lucene BinaryDocValues • It’s as accurate as the underlying geometry coordinates/shape • But it’s not a spatial index – it’s retrievable on a per-document basis • Use RPT + SerializedDV for speed and accuracy! • More to come eventually: • Solr adapter – SOLR-5728, ElasticSearch adapter #2361 • Speed: Skip the serialized geometry check for non-edge cells – LUCENE-5579
  • 9. Sample Code SpatialArgs args = new SpatialArgs(INTERSECTS, point); treeStrategy = new RecursivePrefixTreeStrategy( grid, "geometry"); verifyStrategy = new SerializedDVStrategy( ctx, "serialized_geometry"); Query treeQuery = new ConstantScoreQuery( treeStrategy.makeFilter(args)); Query combinedQuery = new FilteredQuery( treeQuery, verifyStrategy.makeFilter(args), FilteredQuery.QUERY_FIRST_FILTER_STRATEGY); Code is from a related presentation by the Climate Corporation presented at FOSS4G 2014
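
The RPT + SerializedDV combination on slides 8 and 9 is a two-phase search: a fast but approximate grid lookup, followed by an exact check against the stored geometry. The sketch below shows that pattern in plain Java without any Lucene classes. Everything in it is an assumption made for illustration: a unit grid stands in for the prefix tree, circles stand in for arbitrary shapes, and `TwoPhaseSearch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: a toy grid plays the role of RPT and a stored
// circle plays the role of the serialized geometry.
class TwoPhaseSearch {
    // Phase-1 index: grid cell -> ids of documents whose bounding box touches it.
    private final Map<String, List<Integer>> cellToDocs = new HashMap<>();
    // Phase-2 store: exact geometry per document as {centerX, centerY, radius}.
    private final List<double[]> circles = new ArrayList<>();

    private static String cell(int x, int y) {
        return x + "," + y;
    }

    /** Indexes a circle under every unit cell its bounding box covers. */
    int add(double cx, double cy, double r) {
        int id = circles.size();
        circles.add(new double[] {cx, cy, r});
        for (int x = (int) Math.floor(cx - r); x <= (int) Math.floor(cx + r); x++) {
            for (int y = (int) Math.floor(cy - r); y <= (int) Math.floor(cy + r); y++) {
                cellToDocs.computeIfAbsent(cell(x, y), k -> new ArrayList<>()).add(id);
            }
        }
        return id;
    }

    /** Phase 1 only: cheap, but may include false positives near shape edges. */
    List<Integer> candidates(double px, double py) {
        String key = cell((int) Math.floor(px), (int) Math.floor(py));
        return cellToDocs.getOrDefault(key, Collections.<Integer>emptyList());
    }

    /** Phase 1 followed by phase 2: verify each candidate against its exact shape. */
    List<Integer> search(double px, double py) {
        List<Integer> hits = new ArrayList<>();
        for (int id : candidates(px, py)) {
            double[] c = circles.get(id);
            if (Math.hypot(px - c[0], py - c[1]) <= c[2]) {
                hits.add(id);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TwoPhaseSearch index = new TwoPhaseSearch();
        index.add(2, 2, 1);
        System.out.println(index.candidates(1.1, 1.1)); // grid match near the corner
        System.out.println(index.search(1.1, 1.1));     // exact check rejects it
    }
}
```

The exact check runs only on documents that survive the grid lookup, so the expensive geometry test is paid for a small candidate set rather than the whole index.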
  • 10. FlexPrefixTree (Coming to Lucene 5) • A new SpatialPrefixTree by Varun Shenoy (GSOC 2014) ! • LUCENE-4922; Still needs to be committed. Goal is for 5.0. • More optimized, more flexible, than Geohash & Quad • Configurable sub-cells at each level: 4, 16, 64, 256 • You choose trade-off between index speed/disk size & search speed • Internally uses an integer coordinate system • Rectangle searches are particularly fast; minimal floating-point conversion • Cells are always squares (equal sides) – better for heatmaps • YMMV: 10% - 100% faster than GeohashPrefixTree
  • 11. BBoxSpatialStrategy (Lucene 4.10) • Rectangles (BBox’s) only, one value per field • Wide predicate support • Equals, Intersects, Within, Contains, Disjoint • Accurate (8-byte double floating point) • Area overlap relevancy • Weight search results by a combination of query shape overlap & index shape overlap ratios • Solr BBoxField…
  • 12. Solr BBoxField • Schema configuration <field name="bbox" type="bbox" /> <fieldType name="bbox" class="solr.BBoxField" geo="true" units="degrees" numberType="_bbox_coord" /> <fieldType name="_bbox_coord" class="solr.TrieDoubleField" precisionStep="8" docValues="true" stored="false"/> • Search with overlap ratio ordering &q={!field f=bbox score=overlapRatio}Intersects(ENVELOPE(-10, 20, 15, 10)) • score can be: overlapRatio, area, area2D
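
The area overlap relevancy from slide 11 can be sketched as a weighted blend of two ratios: how much of the query box the intersection covers, and how much of the indexed box it covers. The code below is a simplified planar sketch under assumptions of mine: boxes are `{minX, minY, maxX, maxY}` arrays with non-zero area, and the `queryWeight` parameter and the `OverlapRatio` class are hypothetical. The real Lucene and Solr scoring has more options and handles geodetic boxes.

```java
// Illustrative sketch only; not the Lucene BBoxStrategy implementation.
class OverlapRatio {
    /** Area of a box given as {minX, minY, maxX, maxY}. */
    static double area(double[] box) {
        return (box[2] - box[0]) * (box[3] - box[1]);
    }

    /** Area shared by two boxes, or 0 when they do not overlap. */
    static double intersectionArea(double[] a, double[] b) {
        double width = Math.min(a[2], b[2]) - Math.max(a[0], b[0]);
        double height = Math.min(a[3], b[3]) - Math.max(a[1], b[1]);
        return (width <= 0 || height <= 0) ? 0 : width * height;
    }

    /**
     * Blends the two overlap ratios. queryWeight (0..1) is an assumed knob:
     * it is the share given to the query-side ratio, the rest goes to the
     * indexed-box ratio. Both boxes must have non-zero area.
     */
    static double score(double[] query, double[] doc, double queryWeight) {
        double shared = intersectionArea(query, doc);
        double queryRatio = shared / area(query);
        double docRatio = shared / area(doc);
        return queryWeight * queryRatio + (1 - queryWeight) * docRatio;
    }

    public static void main(String[] args) {
        double[] query = {0, 0, 10, 10};
        double[] doc = {0, 0, 5, 10};
        System.out.println(score(query, doc, 0.25));
    }
}
```

A document box that sits entirely inside the query gets a full indexed-box ratio but only a partial query ratio, so the weight decides whether small fully-covered boxes or large query-covering boxes rank higher.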
  • 13. Recent Student/Intern Contributions • Varun Shenoy via GSOC: summer 2014 • Lucene spatial: new “FlexPrefixTree” – an optimized grid • Rebecca Alford via F.B. Open-Academy: winter 2014 • Spatial4j: geodesic polygons • Chris Pavlicek via F.B. Open-Academy: winter 2014 • Spatial4j: geodesic buffered lines • Evana Gizzi, MITRE intern: winter 2014 • Spatial4j: geodesic circle polygonizer • Liviy Ambrose, MITRE intern: fall 2013 • Lucene spatial: integrated with Lucene’s benchmark module
  • 14. Temporal/Date Durations or basically any numeric ranges
  • 15. Approach: Simple Two-field (as you might do in SQL or any system without native range types) • A start-time & end-time field pair • A search window (time span) becomes two range queries • details vary by predicate (Intersects, Contains, vs. Within) • Single-valued only • …even though Lucene supports multi-valued fields • Theoretically possible but would be a lot of work • because Lucene doesn’t store “position” info for numeric fields • because numeric range/prefix queries are position-less
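
How the two range conditions differ per predicate on slide 15 can be sketched as plain Java boolean checks. This assumes closed (inclusive) ranges stored as `long` timestamps with start <= end; `TwoFieldRange` is a hypothetical name, and in Lucene or Solr each comparison would be a numeric range clause on the start or end field.

```java
// Illustrative sketch only: plain predicates, not Lucene queries.
class TwoFieldRange {
    /** Indexed range overlaps the query window anywhere. */
    static boolean intersects(long docStart, long docEnd, long qStart, long qEnd) {
        return docStart <= qEnd && docEnd >= qStart;
    }

    /** Indexed range lies entirely inside the query window. */
    static boolean isWithin(long docStart, long docEnd, long qStart, long qEnd) {
        return docStart >= qStart && docEnd <= qEnd;
    }

    /** Indexed range entirely covers the query window. */
    static boolean contains(long docStart, long docEnd, long qStart, long qEnd) {
        return docStart <= qStart && docEnd >= qEnd;
    }

    public static void main(String[] args) {
        // Document spans 10..20; the query window is 15..30.
        System.out.println(intersects(10, 20, 15, 30)); // overlap from 15 to 20
        System.out.println(isWithin(10, 20, 15, 30));   // starts before the window
        System.out.println(contains(10, 20, 15, 30));   // ends before the window does
    }
}
```

Each predicate is one condition on the start field and one on the end field. With several ranges per document the two conditions could be satisfied by different ranges, which is why the slide limits this approach to single-valued fields.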
  • 16. Approach: 2D Spatial PrefixTree • Lucene Spatial QuadPrefixTree (2D) with RPT Strategy • Use ‘x’ for start-time, ‘y’ for end-time • A search window (time span) becomes a rectangle query • details vary by predicate (Intersects, Contains, vs. Within) • Cool… • But floating-point edge issues • Only ~50 levels supported; not 64 Details: http://wiki.apache.org/solr/SpatialForTimeDurations
  • 17. Approach: DateRangePrefixTree (Lucene 5) • A new 1D SpatialPrefixTree: NumberRangePrefixTree • NumberRangePrefixTree w/ DateRangePrefixTree subclass • NR-SPT: Configurable sub-cells per level; no level limit • Not just for ranges; instances too • Index/Search with NumberRangePrefixTreeStrategy • Indexing, and search predicate code (e.g. Intersects…) completely re-used • DateRangePrefixTree • 9 Levels: 1M years, 1K years, years, months, days, hours, minutes, seconds, millis …continued…
  • 18. Trade-offs of N/D-SPT • Indexing: • “Common” date-ranges use ~ <50 terms, but random millisecond ranges use up to ~14K terms • All date instances (not a range) <= 9 terms • Comparison to 2D SPT: instance or range, always 50 • Search: • Query for “common” query ranges faster than uncommon • Comparison to 2D SPT: • Contains & Within predicates: overlapping values per document get coalesced, can’t be differentiated
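
The nine levels on slide 17 and the "<= 9 terms" figure for a date instance on slide 18 can be illustrated with a small sketch that emits one prefix cell per level. The cell text format here (`0M/002K/014/...`) is invented for readability and is not Lucene's actual term encoding; `DateCells` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative sketch only: a made-up, human-readable cell format.
class DateCells {
    /**
     * fields = {year[, month, day, hour, minute, second, millis]}, truncated
     * to the desired precision, e.g. {2014, 5} means "May 2014".
     * Returns one cell per level, coarsest first.
     */
    static List<String> cells(int... fields) {
        if (fields.length < 1 || fields.length > 7 || fields[0] < 0) {
            throw new IllegalArgumentException(
                "expected a non-negative year, then up to 6 finer fields");
        }
        int year = fields[0];
        List<String> parts = new ArrayList<>();
        parts.add((year / 1_000_000) + "M");                                // level 1: millions of years
        parts.add(String.format(Locale.ROOT, "%03dK", (year / 1000) % 1000)); // level 2: thousands of years
        parts.add(String.format(Locale.ROOT, "%03d", year % 1000));           // level 3: years
        for (int i = 1; i < fields.length; i++) {
            // levels 4..9: month, day, hour, minute, second, millis
            parts.add(String.format(Locale.ROOT, i == 6 ? "%03d" : "%02d", fields[i]));
        }
        List<String> out = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (String part : parts) {
            if (prefix.length() > 0) {
                prefix.append('/');
            }
            prefix.append(part);
            out.add(prefix.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // 2014-05-21T12, truncated to the hour
        cells(2014, 5, 21, 12).forEach(System.out::println);
    }
}
```

A value truncated to the hour produces six cells and a full millisecond instant produces nine, one per level. A range that lines up with level boundaries, such as a whole month, needs few cells, while a range with arbitrary millisecond endpoints must be covered by many small cells, which is the term-count trade-off listed on slide 18.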
  • 19. Solr DateRangeField • Configuration in schema.xml: <field name="dateRange" type="dateRange" /> <fieldType name="dateRange" class="solr.DateRangeField" /> • Index field data, examples: • 2014-05-21T12:00:00.000Z (same as TrieDate) • 2014-05-21T12 (truncated to desired precision) • [1990 TO 1995] • Query, examples: • fq=dateRange:[* TO 2014-05-21] • fq={!field f=dateRange op=Contains} [2000 TO 2014-05-21]
  • 20. Visualizing Date Facets • http://bl.ocks.org/mbostock/4063318
  • 21. Date Faceting • Option A: facet.range • Not for indexed date-ranges • Internally executes one query for each value & caches large bitset • Option B: facet.interval (Solr 4.10) • Not for indexed date-ranges • Requires DocValues (more index data) • Supports variable/custom intervals • New work-in-progress option: Facet on DateRangeField • Ranges are fixed/pre-determined (months, days, etc.) • Optimized for thousands of ranges to count • Each value-range is only 1 term!
  • 22. Future stuff I’m excited about • Continuing works in-progress • Spatial heatmaps! Coming in January 2015! • Lucene layer & Solr adapter • Lucene term auto-prefixing LUCENE-5879 • Brings spatial, date, numeric, indexing/search to the next level! • More prefix-tree optimizations • Inner vs edge leaf cell differentiation for non-point shapes • RPT + SerializedDVStrategy; skip accuracy checks for inner cells • Don’t index leaf cells twice
  • 23. That’s all for now; thanks for coming! Need Lucene/Solr guidance or custom development? Contact me! Email: dsmiley@apache.org LinkedIn: http://www.linkedin.com/in/davidwsmiley G+: +DavidSmiley Twitter: @DavidWSmiley ETA: December 2014