SlideShare a Scribd company logo
1 of 18
Download to read offline
APACHE SOLR
Four Big Features:
•  Faceting
•  Query auto-complete
•  Geospatial
•  Scaling
2014 March Presented by David Smiley at the Boston Java Meetup Group
About David Smiley
➢  Software Engineer (14 years)
○  Search (5 years)
○  Java, Web, Spatial
➢  Part-time employed at MITRE
➢  Part-time search consultant
➢  Apache Lucene / Solr committer & PMC
➢  Published 1st book on Solr
➢  Presented at several conferences
➢  Taught several Solr classes
Faceting
•  Do you know what I mean by “faceting”?
•  AKA: faceted navigation, or parametric search
•  Popular Apps:
•  eBay, Amazon, and many e-commerce sites
•  Apps I use that don’t use faceting but I wish they did:
•  http://search.maven.org and all Maven repository software: Nexus,
Artifactory, Archiva
•  JIRA
•  Compare this to: http://jirasearch.mikemccandless.com/
Faceted Navigation & Analytics
by example…
Notice the counts
Optionally start with
a keyword search or
filter
Extremely useful feature supported by very few platforms:
Solr, ElasticSearch, Sphinx, … (no DBs)
Credit: Trey Grainger; CareerBuilder
How to: Field Faceting
•  Index setup: schema.xml:
<field name=“category” type=“string” />
<field name=“manufacturer” type=“string” />
•  Facet search:
http://localhost:8983/solr/
collection1/
select?
q=*:*&
facet=true&
facet.field=category&
facet.field=manufacturer
How to: Numeric/Date Faceting
•  Index setup: schema.xml:
<field name=“timestamp” type=“tdate” />
•  Facet search:
http://localhost:8983/solr/
collection1/
select?
q=*:*&
facet=true&
facet.range=timestamp&
facet.range.start=NOW/YEAR-10YEAR
facet.range.end=NOW/YEAR+1YEAR
facet.range.gap=+1YEAR
Query Suggest / Autocomplete
If you aren’t doing this then
you really should!
Several Types
•  Instant search
•  Direct navigation to documents, usually by name/title/id, etc.
•  Implement via edge n-grams or a Suggester
•  Ex: iTunes, Netflix, …
•  Query log completion
•  Searches user queries you’ve captured & indexed
•  Implement via edge n-grams or FreeTextSuggester
•  Ex: Google
•  Term completion
•  Completes indexed words
•  Implement via facet.prefix technique or a Suggester
•  Facet / field value completion
•  Ex: Mint.com
Not mutually exclusive!
Tools for Completing / Suggesting
•  The Suggester
•  A specialization of the spell-check Solr component
•  8 implementations to choose from! Different pros/cons
•  Weighted? Analyzing? Infix? Highlight? Fuzzy? N-gram model?
•  Faceting with facet.prefix
•  Respects your current filters – don’t suggest a 0-result response
•  Edge n-grams, with standard search
•  Terms component
Sample Suggester Search
Search
http://localhost:
8983/solr/
mbartists/
a_term_suggest?
q=sma
Response
{
"responseHeader":{
"status":0,
"QTime":1},
"spellcheck":{
"suggestions":[
"sma",{
"numFound":4,
"startOffset":0,
"endOffset":3,
"suggestion":[
"small",
"smart",
"smash"
“smalley”]},
"collation","small"]}}
Geospatial Features
•  Lucene/Solr can index text, numbers, dates, and spatial
data
•  Features:
•  Index latitude & longitude coordinates or any X Y pairs
•  Index polygons or other geometry
•  Query by point-radius, rectangle, or polygon geometry
•  Including “IsWithin” vs “Intersects” vs “Contains” predicates
•  2d/flat Euclidean OR geodetic spherical world model
•  Sort or relevancy-boost by distance to indexed points
The NoSQL solutions with the best spatial are CouchDB, MongoDB, Solr, and ElasticSearch
How to: Spatial Filter & Sort
• Index setup:
schema.xml:
<field name=“geo”
type=“location_rpt”
/>
• Index latitude comma
longitude in your
document:
37.7752,-100.0232
• Filter :
http://localhost:
8983/solr/
collection1/
select?
q=*:*&
fq={!geofilt}&
sort=geodist() asc&
sfield=geo&
pt=45.15,-93.85&
d=5
Cool Technology Under the Hood
• Grid / tile based
recursive indexed
structure using a prefix
tree / trie indexing
approach on standard
Lucene inverted index
• Future:
•  Precise indexed shapes
•  Geodetic polygons
•  Hilbert curve ordering
Scaling Solr
Solr’s mechanisms for scaling:
•  Replication
•  Eliminates single point of failure
•  Reduces query load on any one node
•  Backups
•  Distributed-search (for sharded indexes)
•  For collections of large multi-million document collections
•  SolrCloud
•  Combines distributed-search and real-time replicated indexing
•  Centrally manages configuration
•  A higher level logical API, manages lots of coordination underneath
•  Advanced: doc routing, shard splitting, migration
Replication & Sharding
Illustrated with a metaphor of an encyclopedia at a library
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
26 Shards
3 Replicas A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Nice Admin Screen UI
More Advanced SolrCloud Features
•  Document routing customization
•  Answers: Which shard does a document belong in?
•  Hash (i.e. random) distribution
•  Or keep certain related documents together (ex: for same user)
•  Helps scale when searching by a subset
•  Or manage it yourself manually (ex: index by month)
•  Shard splitting
•  When your shard(s) get to be too big
•  Live; no down-time
•  Inter-collection document migration
•  Copies a subset of one collection to another, possibly new
collection
•  Live; no down-time
That’s all for now; thanks for coming!
Need Lucene/Solr guidance or custom development?
Contact me: dsmiley@apache.org
ETA: June 2014

More Related Content

What's hot

Enterprise Search Using Apache Solr
Enterprise Search Using Apache SolrEnterprise Search Using Apache Solr
Enterprise Search Using Apache Solrsagar chaturvedi
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash courseTommaso Teofili
 
Intro to Apache Lucene and Solr
Intro to Apache Lucene and SolrIntro to Apache Lucene and Solr
Intro to Apache Lucene and SolrGrant Ingersoll
 
Battle of the giants: Apache Solr vs ElasticSearch
Battle of the giants: Apache Solr vs ElasticSearchBattle of the giants: Apache Solr vs ElasticSearch
Battle of the giants: Apache Solr vs ElasticSearchRafał Kuć
 
Data Science with Solr and Spark
Data Science with Solr and SparkData Science with Solr and Spark
Data Science with Solr and SparkLucidworks
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrTrey Grainger
 
20130310 solr tuorial
20130310 solr tuorial20130310 solr tuorial
20130310 solr tuorialChris Huang
 
Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5israelekpo
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes WorkshopErik Hatcher
 
Apache Solr/Lucene Internals by Anatoliy Sokolenko
Apache Solr/Lucene Internals  by Anatoliy SokolenkoApache Solr/Lucene Internals  by Anatoliy Sokolenko
Apache Solr/Lucene Internals by Anatoliy SokolenkoProvectus
 
Elastic search apache_solr
Elastic search apache_solrElastic search apache_solr
Elastic search apache_solrmacrochen
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache SolrEdureka!
 
Introduction Apache Solr & PHP
Introduction Apache Solr & PHPIntroduction Apache Solr & PHP
Introduction Apache Solr & PHPHiraq Citra M
 
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)Sematext Group, Inc.
 
Solr introduction
Solr introductionSolr introduction
Solr introductionLap Tran
 

What's hot (20)

Enterprise Search Using Apache Solr
Enterprise Search Using Apache SolrEnterprise Search Using Apache Solr
Enterprise Search Using Apache Solr
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash course
 
Solr Recipes
Solr RecipesSolr Recipes
Solr Recipes
 
Intro to Apache Lucene and Solr
Intro to Apache Lucene and SolrIntro to Apache Lucene and Solr
Intro to Apache Lucene and Solr
 
Battle of the giants: Apache Solr vs ElasticSearch
Battle of the giants: Apache Solr vs ElasticSearchBattle of the giants: Apache Solr vs ElasticSearch
Battle of the giants: Apache Solr vs ElasticSearch
 
Data Science with Solr and Spark
Data Science with Solr and SparkData Science with Solr and Spark
Data Science with Solr and Spark
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solr
 
Apache Solr
Apache SolrApache Solr
Apache Solr
 
20130310 solr tuorial
20130310 solr tuorial20130310 solr tuorial
20130310 solr tuorial
 
Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes Workshop
 
it's just search
it's just searchit's just search
it's just search
 
Apache Solr/Lucene Internals by Anatoliy Sokolenko
Apache Solr/Lucene Internals  by Anatoliy SokolenkoApache Solr/Lucene Internals  by Anatoliy Sokolenko
Apache Solr/Lucene Internals by Anatoliy Sokolenko
 
Apache Solr
Apache SolrApache Solr
Apache Solr
 
Elastic search apache_solr
Elastic search apache_solrElastic search apache_solr
Elastic search apache_solr
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache Solr
 
Introduction Apache Solr & PHP
Introduction Apache Solr & PHPIntroduction Apache Solr & PHP
Introduction Apache Solr & PHP
 
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)
 
Solr introduction
Solr introductionSolr introduction
Solr introduction
 

Viewers also liked

Top apache solr features
Top apache solr featuresTop apache solr features
Top apache solr featuresIntellipaat
 
Scaling search to a million pages with Solr, Python, and Django
Scaling search to a million pages with Solr, Python, and DjangoScaling search to a million pages with Solr, Python, and Django
Scaling search to a million pages with Solr, Python, and Djangotow21
 
Top Apache Solr Features
Top Apache Solr FeaturesTop Apache Solr Features
Top Apache Solr FeaturesIntellipaat
 
DevOps for ETL processing at scale with MongoDB, Solr, AWS and Chef
DevOps for ETL processing at scale with MongoDB, Solr, AWS and ChefDevOps for ETL processing at scale with MongoDB, Solr, AWS and Chef
DevOps for ETL processing at scale with MongoDB, Solr, AWS and ChefGaurav "GP" Pal
 
How to Gain Greater Business Intelligence from Lucene/Solr
How to Gain Greater Business Intelligence from Lucene/SolrHow to Gain Greater Business Intelligence from Lucene/Solr
How to Gain Greater Business Intelligence from Lucene/Solrlucenerevolution
 
Faceting optimizations for Solr
Faceting optimizations for SolrFaceting optimizations for Solr
Faceting optimizations for SolrToke Eskildsen
 
Faceting Optimizations for Solr: Presented by Toke Eskildsen, State & Univers...
Faceting Optimizations for Solr: Presented by Toke Eskildsen, State & Univers...Faceting Optimizations for Solr: Presented by Toke Eskildsen, State & Univers...
Faceting Optimizations for Solr: Presented by Toke Eskildsen, State & Univers...Lucidworks
 
Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy
Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, EtsyLessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy
Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, EtsyLucidworks
 
Basics of Solr and Solr Integration with AEM6
Basics of Solr and Solr Integration with AEM6Basics of Solr and Solr Integration with AEM6
Basics of Solr and Solr Integration with AEM6DEEPAK KHETAWAT
 
Google search vs Solr search for Enterprise search
Google search vs Solr search for Enterprise searchGoogle search vs Solr search for Enterprise search
Google search vs Solr search for Enterprise searchVeera Shekar
 
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale ToolkitDeploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkitthelabdude
 
Solr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudSolr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudthelabdude
 
Fun With SHA2 Certificates
Fun With SHA2 CertificatesFun With SHA2 Certificates
Fun With SHA2 CertificatesGabriella Davis
 
Visualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, Lucidworks
Visualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, LucidworksVisualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, Lucidworks
Visualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, LucidworksLucidworks
 
The Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemThe Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemTrey Grainger
 
Scaling search with Solr Cloud
Scaling search with Solr CloudScaling search with Solr Cloud
Scaling search with Solr CloudCominvent AS
 

Viewers also liked (19)

Top apache solr features
Top apache solr featuresTop apache solr features
Top apache solr features
 
Scaling search to a million pages with Solr, Python, and Django
Scaling search to a million pages with Solr, Python, and DjangoScaling search to a million pages with Solr, Python, and Django
Scaling search to a million pages with Solr, Python, and Django
 
Top Apache Solr Features
Top Apache Solr FeaturesTop Apache Solr Features
Top Apache Solr Features
 
DevOps for ETL processing at scale with MongoDB, Solr, AWS and Chef
DevOps for ETL processing at scale with MongoDB, Solr, AWS and ChefDevOps for ETL processing at scale with MongoDB, Solr, AWS and Chef
DevOps for ETL processing at scale with MongoDB, Solr, AWS and Chef
 
How to Gain Greater Business Intelligence from Lucene/Solr
How to Gain Greater Business Intelligence from Lucene/SolrHow to Gain Greater Business Intelligence from Lucene/Solr
How to Gain Greater Business Intelligence from Lucene/Solr
 
Apache Solr vs Oracle Endeca
Apache Solr vs Oracle EndecaApache Solr vs Oracle Endeca
Apache Solr vs Oracle Endeca
 
Faceting optimizations for Solr
Faceting optimizations for SolrFaceting optimizations for Solr
Faceting optimizations for Solr
 
Faceting Optimizations for Solr: Presented by Toke Eskildsen, State & Univers...
Faceting Optimizations for Solr: Presented by Toke Eskildsen, State & Univers...Faceting Optimizations for Solr: Presented by Toke Eskildsen, State & Univers...
Faceting Optimizations for Solr: Presented by Toke Eskildsen, State & Univers...
 
Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy
Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, EtsyLessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy
Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy
 
Basics of Solr and Solr Integration with AEM6
Basics of Solr and Solr Integration with AEM6Basics of Solr and Solr Integration with AEM6
Basics of Solr and Solr Integration with AEM6
 
Google search vs Solr search for Enterprise search
Google search vs Solr search for Enterprise searchGoogle search vs Solr search for Enterprise search
Google search vs Solr search for Enterprise search
 
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale ToolkitDeploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
 
Solr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudSolr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloud
 
Fun With SHA2 Certificates
Fun With SHA2 CertificatesFun With SHA2 Certificates
Fun With SHA2 Certificates
 
Visualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, Lucidworks
Visualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, LucidworksVisualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, Lucidworks
Visualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, Lucidworks
 
The Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemThe Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data Ecosystem
 
Endeca vs solr
Endeca vs solrEndeca vs solr
Endeca vs solr
 
Scaling search with Solr Cloud
Scaling search with Solr CloudScaling search with Solr Cloud
Scaling search with Solr Cloud
 
HTTP Basics
HTTP BasicsHTTP Basics
HTTP Basics
 

Similar to Solr: 4 big features

Data Engineering with Solr and Spark
Data Engineering with Solr and SparkData Engineering with Solr and Spark
Data Engineering with Solr and SparkLucidworks
 
Webinar: Fusion for Data Science
Webinar: Fusion for Data ScienceWebinar: Fusion for Data Science
Webinar: Fusion for Data ScienceLucidworks
 
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrLet's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrSease
 
Practical Machine Learning for Smarter Search with Spark+Solr
Practical Machine Learning for Smarter Search with Spark+SolrPractical Machine Learning for Smarter Search with Spark+Solr
Practical Machine Learning for Smarter Search with Spark+SolrJake Mannix
 
Practical Machine Learning for Smarter Search with Solr and Spark
Practical Machine Learning for Smarter Search with Solr and SparkPractical Machine Learning for Smarter Search with Solr and Spark
Practical Machine Learning for Smarter Search with Solr and SparkJake Mannix
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...S. Diana Hu
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...Joaquin Delgado PhD.
 
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science LabScalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science LabSri Ambati
 
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Lucidworks
 
You're not using ElasticSearch (outdated)
You're not using ElasticSearch (outdated)You're not using ElasticSearch (outdated)
You're not using ElasticSearch (outdated)Timon Vonk
 
Jake Mannix, Lead Data Engineer, Lucidworks at MLconf SEA - 5/20/16
Jake Mannix, Lead Data Engineer, Lucidworks at MLconf SEA - 5/20/16Jake Mannix, Lead Data Engineer, Lucidworks at MLconf SEA - 5/20/16
Jake Mannix, Lead Data Engineer, Lucidworks at MLconf SEA - 5/20/16MLconf
 
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the CloudLeveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the CloudDatabricks
 
Data Scientist Toolbox
Data Scientist ToolboxData Scientist Toolbox
Data Scientist ToolboxAndrei Savu
 
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...Lucidworks
 
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformExtending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformLucidworks (Archived)
 
Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...
Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...
Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...lucenerevolution
 
How to use selenium successfully
How to use selenium successfullyHow to use selenium successfully
How to use selenium successfullyTEST Huddle
 
Large Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphLarge Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphP. Taylor Goetz
 

Similar to Solr: 4 big features (20)

Data Engineering with Solr and Spark
Data Engineering with Solr and SparkData Engineering with Solr and Spark
Data Engineering with Solr and Spark
 
Webinar: Fusion for Data Science
Webinar: Fusion for Data ScienceWebinar: Fusion for Data Science
Webinar: Fusion for Data Science
 
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrLet's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
 
Practical Machine Learning for Smarter Search with Spark+Solr
Practical Machine Learning for Smarter Search with Spark+SolrPractical Machine Learning for Smarter Search with Spark+Solr
Practical Machine Learning for Smarter Search with Spark+Solr
 
Practical Machine Learning for Smarter Search with Solr and Spark
Practical Machine Learning for Smarter Search with Solr and SparkPractical Machine Learning for Smarter Search with Solr and Spark
Practical Machine Learning for Smarter Search with Solr and Spark
 
Using LWE/Solr/Lucene for eCom
Using LWE/Solr/Lucene for eComUsing LWE/Solr/Lucene for eCom
Using LWE/Solr/Lucene for eCom
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
 
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science LabScalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
 
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
 
You're not using ElasticSearch (outdated)
You're not using ElasticSearch (outdated)You're not using ElasticSearch (outdated)
You're not using ElasticSearch (outdated)
 
Elastic pivorak
Elastic pivorakElastic pivorak
Elastic pivorak
 
Jake Mannix, Lead Data Engineer, Lucidworks at MLconf SEA - 5/20/16
Jake Mannix, Lead Data Engineer, Lucidworks at MLconf SEA - 5/20/16Jake Mannix, Lead Data Engineer, Lucidworks at MLconf SEA - 5/20/16
Jake Mannix, Lead Data Engineer, Lucidworks at MLconf SEA - 5/20/16
 
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the CloudLeveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
 
Data Scientist Toolbox
Data Scientist ToolboxData Scientist Toolbox
Data Scientist Toolbox
 
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
 
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformExtending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
 
Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...
Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...
Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...
 
How to use selenium successfully
How to use selenium successfullyHow to use selenium successfully
How to use selenium successfully
 
Large Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphLarge Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraph
 

More from David Smiley

2020-02 Solr's New Plugin System
2020-02 Solr's New Plugin System2020-02 Solr's New Plugin System
2020-02 Solr's New Plugin SystemDavid Smiley
 
H-Hypermap Heatmap Analytics at Scale
H-Hypermap Heatmap Analytics at ScaleH-Hypermap Heatmap Analytics at Scale
H-Hypermap Heatmap Analytics at ScaleDavid Smiley
 
2016-01 Lucene Solr spatial in 2015, NYC Meetup
2016-01 Lucene Solr spatial in 2015, NYC Meetup2016-01 Lucene Solr spatial in 2015, NYC Meetup
2016-01 Lucene Solr spatial in 2015, NYC MeetupDavid Smiley
 
Populate your Search index, NEST 2016-01
Populate your Search index, NEST 2016-01Populate your Search index, NEST 2016-01
Populate your Search index, NEST 2016-01David Smiley
 
Lucene/Solr spatial in 2015
Lucene/Solr spatial in 2015Lucene/Solr spatial in 2015
Lucene/Solr spatial in 2015David Smiley
 
2014 11 lucene spatial temporal update
2014 11 lucene spatial temporal update2014 11 lucene spatial temporal update
2014 11 lucene spatial temporal updateDavid Smiley
 

More from David Smiley (7)

2020-02 Solr's New Plugin System
2020-02 Solr's New Plugin System2020-02 Solr's New Plugin System
2020-02 Solr's New Plugin System
 
H-Hypermap Heatmap Analytics at Scale
H-Hypermap Heatmap Analytics at ScaleH-Hypermap Heatmap Analytics at Scale
H-Hypermap Heatmap Analytics at Scale
 
2016-01 Lucene Solr spatial in 2015, NYC Meetup
2016-01 Lucene Solr spatial in 2015, NYC Meetup2016-01 Lucene Solr spatial in 2015, NYC Meetup
2016-01 Lucene Solr spatial in 2015, NYC Meetup
 
Populate your Search index, NEST 2016-01
Populate your Search index, NEST 2016-01Populate your Search index, NEST 2016-01
Populate your Search index, NEST 2016-01
 
Lucene/Solr spatial in 2015
Lucene/Solr spatial in 2015Lucene/Solr spatial in 2015
Lucene/Solr spatial in 2015
 
2014 11 lucene spatial temporal update
2014 11 lucene spatial temporal update2014 11 lucene spatial temporal update
2014 11 lucene spatial temporal update
 
Lucene 4 spatial
Lucene 4 spatialLucene 4 spatial
Lucene 4 spatial
 

Recently uploaded

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 

Recently uploaded (20)

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 

Solr: 4 big features

  • 1. APACHE SOLR Four Big Features: •  Faceting •  Query auto-complete •  Geospatial •  Scaling 2014 March Presented by David Smiley at the Boston Java Meetup Group
  • 2. About David Smiley ➢  Software Engineer (14 years) ○  Search (5 years) ○  Java, Web, Spatial ➢  Part-time employed at MITRE ➢  Part-time search consultant ➢  Apache Lucene / Solr committer & PMC ➢  Published 1st book on Solr ➢  Presented at several conferences ➢  Taught several Solr classes
  • 3. Faceting •  Do you know what I mean by “faceting”? •  AKA: faceted navigation, or parametric search •  Popular Apps: •  eBay, Amazon, and many e-commerce sites •  Apps I use that don’t use faceting but I wish they did: •  http://search.maven.org and all Maven repository software: Nexus, Artifactory, Archiva •  JIRA •  Compare this to: http://jirasearch.mikemccandless.com/
  • 4. Faceted Navigation & Analytics by example… Notice the counts Optionally start with a keyword search or filter Extremely useful feature supported by very few platforms: Solr, ElasticSearch, Sphinx, … (no DBs) Credit: Trey Grainger; CareerBuilder
  • 5. How to: Field Faceting •  Index setup: schema.xml: <field name=“category” type=“string” /> <field name=“manufacturer” type=“string” /> •  Facet search: http://localhost:8983/solr/ collection1/ select? q=*:*& facet=true& facet.field=category& facet.field=manufacturer
  • 6. How to: Numeric/Date Faceting •  Index setup: schema.xml: <field name=“timestamp” type=“tdate” /> •  Facet search: http://localhost:8983/solr/ collection1/ select? q=*:*& facet=true& facet.range=timestamp& facet.range.start=NOW/YEAR-10YEAR facet.range.end=NOW/YEAR+1YEAR facet.range.gap=+1YEAR
  • 7. Query Suggest / Autocomplete If you aren’t doing this then you really should!
  • 8. Several Types •  Instant search •  Direct navigation to documents, usually by name/title/id, etc. •  Implement via edge n-grams or a Suggester •  Ex: iTunes, Netflix, … •  Query log completion •  Searches user queries you’ve captured & indexed •  Implement via edge n-grams or FreeTextSuggester •  Ex: Google •  Term completion •  Completes indexed words •  Implement via facet.prefix technique or a Suggester •  Facet / field value completion •  Ex: Mint.com Not mutually exclusive!
  • 9. Tools for Completing / Suggesting •  The Suggester •  A specialization of the spell-check Solr component •  8 implementations to choose from! Different pros/cons •  Weighted? Analyzing? Infix? Highlight? Fuzzy? N-gram model? •  Faceting with facet.prefix •  Respects your current filters – don’t suggest a 0-result response •  Edge n-grams, with standard search •  Terms component
  • 11. Geospatial Features •  Lucene/Solr can index text, numbers, dates, and spatial data •  Features: •  Index latitude & longitude coordinates or any X Y pairs •  Index polygons or other geometry •  Query by point-radius, rectangle, or polygon geometry •  Including “IsWithin” vs “Intersects” vs “Contains” predicates •  2d/flat Euclidean OR geodetic spherical world model •  Sort or relevancy-boost by distance to indexed points The NoSQL solutions with the best spatial are CouchDB, MongoDB, Solr, and ElasticSearch
  • 12. How to: Spatial Filter & Sort • Index setup: schema.xml: <field name=“geo” type=“location_rpt” /> • Index latitude comma longitude in your document: 37.7752,-100.0232 • Filter : http://localhost: 8983/solr/ collection1/ select? q=*:*& fq={!geofilt}& sort=geodist() asc& sfield=geo& pt=45.15,-93.85& d=5
  • 13. Cool Technology Under the Hood • Grid / tile based recursive indexed structure using a prefix tree / trie indexing approach on standard Lucene inverted index • Future: •  Precise indexed shapes •  Geodetic polygons •  Hilbert curve ordering
  • 14. Scaling Solr Solr’s mechanisms for scaling: •  Replication •  Eliminates single point of failure •  Reduces query load on any one node •  Backups •  Distributed-search (for sharded indexes) •  For collections of large multi-million document collections •  SolrCloud •  Combines distributed-search and real-time replicated indexing •  Centrally manages configuration •  A higher level logical API, manages lots of coordination underneath •  Advanced: doc routing, shard splitting, migration
  • 15. Replication & Sharding Illustrated with a metaphor of an encyclopedia at a library A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 26 Shards 3 Replicas A B C D E F G H I J K L M N O P Q R S T U V W X Y Z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
  • 17. More Advanced SolrCloud Features •  Document routing customization •  Answers: Which shard does a document belong in? •  Hash (i.e. random) distribution •  Or keep certain related documents together (ex: for same user) •  Helps scale when searching by a subset •  Or manage it yourself manually (ex: index by month) •  Shard splitting •  When your shard(s) get to be too big •  Live; no down-time •  Inter-collection document migration •  Copies a subset of one collection to another, possibly new collection •  Live; no down-time
  • 18. That’s all for now; thanks for coming! Need Lucene/Solr guidance or custom development? Contact me: dsmiley@apache.org ETA: June 2014