SlideShare a Scribd company logo
www.anant.us | solutions@anant.us | 202.905.2818
1010 Wisconsin Ave, NW | Suite 250 | Washington, DC 20007
Research & Development –
Comparing Lucene / SolR / Elastic &
Cloud Search Providers
Building Search Engines
What do we do?
Streamline, Organize & Unify
Business Information
Agenda
• Challenge - Why does this matter?
• Info Retrieval - Retrieval / Routing
• Lucene - More than meets the eye ...
• Search Engine - 30k Foot View
• On Premise - Lucene / SolR / Elastic
• Cloud Providers - Amazon / Azure
Challenge – Why does this matter?
Knowledge
Project
Information
Client Service
Information
Corporate
Guides
Collaborative
Documents
Assets
& Files
Corporate
Resources
Appleseed Framework (Portal, Base, Search)
G Drive
Delta
DropBox
G Drive
Delta
Nutshell
Dropbox
Freshbooks
G Drive
G Sites (KB)
G Drive
Workflowy
Evernote
G Drive
DropBox
OwnCloud
Pocket
Leaves
AIC (WP)
Anant (WP)
Document Retrieval
• Google Search
• Amazon Search
• LinkedIn Search
• CMS Search *
• Portal Search *
• CRM Search *
• Search *
Document Routing
• Google Alerts
• Amazon Recommendations
• Netflix Recommendations
• LinkedIn Recommendations
Information Retrieval
Lucene – Inverted Index
Lucene – More than meets the eye
Who
Next?
Think of it like a “NoSQL” Database that has great indexing..
everywhere.
Search Engine – 30 Thousand Foot View
The search index is only as good as your processed data.
If you put everything you find in your index, you are going to
spend a lot of time telling people how to search.
On Premise – Lucene / ES / SolR
Lucene
• Library
• File System
• Format
• Fast
• Embeddable*
• Indexing Anywhere
• Need to really know
Lucene
• No Interface
• No server
• Lots of house
keeping
SolR
• Server
• Admin / REST
Interface
• Configurable
• Scalable
• Great at Text*
• Truly Open
• 10+ Years
• Good ecosystem
• Too customizable
• Schemas*
• Zookeeper Needed
ElasticSearch
• Server
• Configurable
• Scalable
• Good ecosystem
• Built in Clustering
• Grouping / Filtering
• Great for Logs
• Started as a Cloud
Tool
• No great OTS
Interface
• Only REST Interface
Cloud Search – Amazon / Azure
Amazon
• SolRCloud*
• AWS* Ecosystem
• 5 QParsers
• Dynamic Fields
• 100% Completely
Managed
• Been Around for a
While
• Data / Read Writes
• No nested Objects
Azure
• ElasticSearch*
• Azure* Ecosystem
• 2 QParsers
• 100% Completely
Managed
• Good SDK
• Few Years Old
• Data / Read Writes
• No nested Objects
• Not so Dynamic Fields
Questions & Contact
www.anant.us | solutions@anant.us | 202.905.2818
1010 Wisconsin Ave, NW | Suite 250 | Washington, DC 20007
@anantcorp
facebook.com/anantCorp
linkedin.com/company/anant
rahul@anant.us
linkedin.com/in/xingh
Rahul Singh
CEO & Founder
Questions & Contact
• Modern Enterprise
• Mastering Services in the Service of Others
• Hybrid Agile Project Management
• Building Search Engines
• CICD / DevOps
• Connecting Internet Software
www.anant.us | solutions@anant.us | 202.905.2818
1010 Wisconsin Ave, NW | Suite 250 | Washington, DC 20007
Streamlined Data
Integration / Data Pipelines
Organized Knowledge
Search / Data Warehouses
Unified Interfaces
Portals / Dashboards / Mobile

More Related Content

What's hot

What's hot (20)

A lap around microsofts business intelligence platform
A lap around microsofts business intelligence platformA lap around microsofts business intelligence platform
A lap around microsofts business intelligence platform
 
In Memory Cahce Structure
In Memory Cahce StructureIn Memory Cahce Structure
In Memory Cahce Structure
 
Part I: SharePoint 2013 Administration by Todd Klindt and Shane Young - SPTec...
Part I: SharePoint 2013 Administration by Todd Klindt and Shane Young - SPTec...Part I: SharePoint 2013 Administration by Todd Klindt and Shane Young - SPTec...
Part I: SharePoint 2013 Administration by Todd Klindt and Shane Young - SPTec...
 
SPA vs. MPA
SPA vs. MPASPA vs. MPA
SPA vs. MPA
 
DC Titanium User Group Meetup: Appcelerator Titanium Alloy jan2013
DC Titanium User Group Meetup: Appcelerator Titanium Alloy jan2013DC Titanium User Group Meetup: Appcelerator Titanium Alloy jan2013
DC Titanium User Group Meetup: Appcelerator Titanium Alloy jan2013
 
Rainbows, Unicorns, and other Fairy Tales in the Land of Serverless Dreams
Rainbows, Unicorns, and other Fairy Tales in the Land of Serverless DreamsRainbows, Unicorns, and other Fairy Tales in the Land of Serverless Dreams
Rainbows, Unicorns, and other Fairy Tales in the Land of Serverless Dreams
 
Elasticsearch for Autosuggest in Clojure at Workframe
Elasticsearch for Autosuggest in Clojure at WorkframeElasticsearch for Autosuggest in Clojure at Workframe
Elasticsearch for Autosuggest in Clojure at Workframe
 
Zapping ever faster: how Zap sped up by two orders of magnitude using RavenDB
Zapping ever faster: how Zap sped up by two orders of magnitude using RavenDBZapping ever faster: how Zap sped up by two orders of magnitude using RavenDB
Zapping ever faster: how Zap sped up by two orders of magnitude using RavenDB
 
SQL Server 2016 What's New For Developers
SQL Server 2016  What's New For DevelopersSQL Server 2016  What's New For Developers
SQL Server 2016 What's New For Developers
 
The Importance of Wait Statistics in SQL Server
The Importance of Wait Statistics in SQL ServerThe Importance of Wait Statistics in SQL Server
The Importance of Wait Statistics in SQL Server
 
Performance Tuning Azure SQL Database
Performance Tuning Azure SQL DatabasePerformance Tuning Azure SQL Database
Performance Tuning Azure SQL Database
 
Adf and ala design c sharp corner toronto chapter feb 2019 meetup nik shahriar
Adf and ala design c sharp corner toronto chapter feb 2019 meetup nik shahriarAdf and ala design c sharp corner toronto chapter feb 2019 meetup nik shahriar
Adf and ala design c sharp corner toronto chapter feb 2019 meetup nik shahriar
 
Getting started with Azure Cognitive services
Getting started with Azure Cognitive servicesGetting started with Azure Cognitive services
Getting started with Azure Cognitive services
 
Serverless Real-time Tracking & Analysis
Serverless Real-time Tracking & AnalysisServerless Real-time Tracking & Analysis
Serverless Real-time Tracking & Analysis
 
Azure Stream Analytics
Azure Stream AnalyticsAzure Stream Analytics
Azure Stream Analytics
 
Ohio Devfest - Visual Analysis with GCP
Ohio Devfest - Visual Analysis with GCPOhio Devfest - Visual Analysis with GCP
Ohio Devfest - Visual Analysis with GCP
 
Design for scale
Design for scaleDesign for scale
Design for scale
 
Intro to API Design Principles
Intro to API Design PrinciplesIntro to API Design Principles
Intro to API Design Principles
 
Test driving Azure Search and DocumentDB
Test driving Azure Search and DocumentDBTest driving Azure Search and DocumentDB
Test driving Azure Search and DocumentDB
 
SQL Azure for ISUG(SQL Server Israeli User Group)
SQL Azure for ISUG(SQL Server Israeli User Group)SQL Azure for ISUG(SQL Server Israeli User Group)
SQL Azure for ISUG(SQL Server Israeli User Group)
 

Similar to Building Search Engines - Lucene, SolR and Elasticsearch

10 Things Learned Releasing Databricks Enterprise Wide
10 Things Learned Releasing Databricks Enterprise Wide10 Things Learned Releasing Databricks Enterprise Wide
10 Things Learned Releasing Databricks Enterprise Wide
Databricks
 
Share point 2013 enterprise search (public)
Share point 2013 enterprise search (public)Share point 2013 enterprise search (public)
Share point 2013 enterprise search (public)
Petter Skodvin-Hvammen
 

Similar to Building Search Engines - Lucene, SolR and Elasticsearch (20)

Lucene Enterprise Knowledge Search
Lucene Enterprise Knowledge SearchLucene Enterprise Knowledge Search
Lucene Enterprise Knowledge Search
 
Meet Solr For The Tirst Again
Meet Solr For The Tirst AgainMeet Solr For The Tirst Again
Meet Solr For The Tirst Again
 
Building a Server-less Data Lake on AWS - Technical 301
Building a Server-less Data Lake on AWS - Technical 301Building a Server-less Data Lake on AWS - Technical 301
Building a Server-less Data Lake on AWS - Technical 301
 
10 Things Learned Releasing Databricks Enterprise Wide
10 Things Learned Releasing Databricks Enterprise Wide10 Things Learned Releasing Databricks Enterprise Wide
10 Things Learned Releasing Databricks Enterprise Wide
 
Intro to Solr in Drupal
Intro to Solr in Drupal Intro to Solr in Drupal
Intro to Solr in Drupal
 
AWS Summit Auckland - Building a Server-less Data Lake on AWS
AWS Summit Auckland - Building a Server-less Data Lake on AWSAWS Summit Auckland - Building a Server-less Data Lake on AWS
AWS Summit Auckland - Building a Server-less Data Lake on AWS
 
PASS_Summit_2019_Azure_Storage_Options_for_Analytics
PASS_Summit_2019_Azure_Storage_Options_for_AnalyticsPASS_Summit_2019_Azure_Storage_Options_for_Analytics
PASS_Summit_2019_Azure_Storage_Options_for_Analytics
 
Best Practices for Building a Data Lake on AWS
Best Practices for Building a Data Lake on AWSBest Practices for Building a Data Lake on AWS
Best Practices for Building a Data Lake on AWS
 
Share point 2013 enterprise search (public)
Share point 2013 enterprise search (public)Share point 2013 enterprise search (public)
Share point 2013 enterprise search (public)
 
Managing storage on Prem and in Cloud
Managing storage on Prem and in CloudManaging storage on Prem and in Cloud
Managing storage on Prem and in Cloud
 
Scaling on AWS for the First 10 Million Users
Scaling on AWS for the First 10 Million UsersScaling on AWS for the First 10 Million Users
Scaling on AWS for the First 10 Million Users
 
Emerging technologies in academic libraries
Emerging technologies in academic librariesEmerging technologies in academic libraries
Emerging technologies in academic libraries
 
Building A Self Service Analytics Platform on Hadoop
Building A Self Service Analytics Platform on HadoopBuilding A Self Service Analytics Platform on Hadoop
Building A Self Service Analytics Platform on Hadoop
 
Session #2, tech session: Build realtime search by Sylvain Utard from Algolia
Session #2, tech session: Build realtime search by Sylvain Utard from AlgoliaSession #2, tech session: Build realtime search by Sylvain Utard from Algolia
Session #2, tech session: Build realtime search by Sylvain Utard from Algolia
 
SharePoint Databases: What you need to know (201609)
SharePoint Databases: What you need to know (201609)SharePoint Databases: What you need to know (201609)
SharePoint Databases: What you need to know (201609)
 
(BDT307) Zero Infrastructure, Real-Time Data Collection, and Analytics
(BDT307) Zero Infrastructure, Real-Time Data Collection, and Analytics(BDT307) Zero Infrastructure, Real-Time Data Collection, and Analytics
(BDT307) Zero Infrastructure, Real-Time Data Collection, and Analytics
 
Survey of the Microsoft Azure Data Landscape
Survey of the Microsoft Azure Data LandscapeSurvey of the Microsoft Azure Data Landscape
Survey of the Microsoft Azure Data Landscape
 
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. NielsenJ1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
 
Data modeling trends for analytics
Data modeling trends for analyticsData modeling trends for analytics
Data modeling trends for analytics
 

More from Rahul Singh

Building People First - Lessons in Team Effectiveness & Happiness
Building People First - Lessons in Team Effectiveness & HappinessBuilding People First - Lessons in Team Effectiveness & Happiness
Building People First - Lessons in Team Effectiveness & Happiness
Rahul Singh
 
Rahul.singh.speech presentation
Rahul.singh.speech presentationRahul.singh.speech presentation
Rahul.singh.speech presentation
Rahul Singh
 

More from Rahul Singh (14)

Unifying Business Information with Dashboards
Unifying Business Information with Dashboards Unifying Business Information with Dashboards
Unifying Business Information with Dashboards
 
Get Your Shit Together
Get Your Shit TogetherGet Your Shit Together
Get Your Shit Together
 
Machine Learning & Graph Processing w/ Spark and Accumulo
Machine Learning & Graph Processing w/ Spark and AccumuloMachine Learning & Graph Processing w/ Spark and Accumulo
Machine Learning & Graph Processing w/ Spark and Accumulo
 
Building Online Business Software 101 (B2B)
Building Online Business Software 101 (B2B) Building Online Business Software 101 (B2B)
Building Online Business Software 101 (B2B)
 
Asynchronous Data Processing
Asynchronous Data ProcessingAsynchronous Data Processing
Asynchronous Data Processing
 
Deliver Excellent Service to your Customers
Deliver Excellent Service to your CustomersDeliver Excellent Service to your Customers
Deliver Excellent Service to your Customers
 
Building Smart Indexes for Drupal Sites
Building Smart Indexes for Drupal SitesBuilding Smart Indexes for Drupal Sites
Building Smart Indexes for Drupal Sites
 
Building People First - Lessons in Team Effectiveness & Happiness
Building People First - Lessons in Team Effectiveness & HappinessBuilding People First - Lessons in Team Effectiveness & Happiness
Building People First - Lessons in Team Effectiveness & Happiness
 
Select * From Internet - Integrating the Web
Select * From Internet - Integrating the WebSelect * From Internet - Integrating the Web
Select * From Internet - Integrating the Web
 
Bill Drayton - Father of Social Entrepreneurship, Leading Leader of Social Ch...
Bill Drayton - Father of Social Entrepreneurship, Leading Leader of Social Ch...Bill Drayton - Father of Social Entrepreneurship, Leading Leader of Social Ch...
Bill Drayton - Father of Social Entrepreneurship, Leading Leader of Social Ch...
 
The Future of the Internet - The Next 30 Years
The Future of the Internet - The Next 30 YearsThe Future of the Internet - The Next 30 Years
The Future of the Internet - The Next 30 Years
 
Modern Presidential Communications - Communicating Presidential Rhetorical Vi...
Modern Presidential Communications - Communicating Presidential Rhetorical Vi...Modern Presidential Communications - Communicating Presidential Rhetorical Vi...
Modern Presidential Communications - Communicating Presidential Rhetorical Vi...
 
Rahul.singh.speech presentation
Rahul.singh.speech presentationRahul.singh.speech presentation
Rahul.singh.speech presentation
 
Anant - Micro Enterprise - The Future, Today
Anant - Micro Enterprise - The Future, TodayAnant - Micro Enterprise - The Future, Today
Anant - Micro Enterprise - The Future, Today
 

Recently uploaded

Article writing on excessive use of internet.pptx
Article writing on excessive use of internet.pptxArticle writing on excessive use of internet.pptx
Article writing on excessive use of internet.pptx
abhinandnam9997
 
一比一原版UTS毕业证悉尼科技大学毕业证成绩单如何办理
一比一原版UTS毕业证悉尼科技大学毕业证成绩单如何办理一比一原版UTS毕业证悉尼科技大学毕业证成绩单如何办理
一比一原版UTS毕业证悉尼科技大学毕业证成绩单如何办理
aagad
 

Recently uploaded (12)

Article writing on excessive use of internet.pptx
Article writing on excessive use of internet.pptxArticle writing on excessive use of internet.pptx
Article writing on excessive use of internet.pptx
 
History+of+E-commerce+Development+in+China-www.cfye-commerce.shop
History+of+E-commerce+Development+in+China-www.cfye-commerce.shopHistory+of+E-commerce+Development+in+China-www.cfye-commerce.shop
History+of+E-commerce+Development+in+China-www.cfye-commerce.shop
 
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and GuidelinesMulti-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
 
ER(Entity Relationship) Diagram for online shopping - TAE
ER(Entity Relationship) Diagram for online shopping - TAEER(Entity Relationship) Diagram for online shopping - TAE
ER(Entity Relationship) Diagram for online shopping - TAE
 
The Best AI Powered Software - Intellivid AI Studio
The Best AI Powered Software - Intellivid AI StudioThe Best AI Powered Software - Intellivid AI Studio
The Best AI Powered Software - Intellivid AI Studio
 
一比一原版UTS毕业证悉尼科技大学毕业证成绩单如何办理
一比一原版UTS毕业证悉尼科技大学毕业证成绩单如何办理一比一原版UTS毕业证悉尼科技大学毕业证成绩单如何办理
一比一原版UTS毕业证悉尼科技大学毕业证成绩单如何办理
 
The AI Powered Organization-Intro to AI-LAN.pdf
The AI Powered Organization-Intro to AI-LAN.pdfThe AI Powered Organization-Intro to AI-LAN.pdf
The AI Powered Organization-Intro to AI-LAN.pdf
 
The+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptxThe+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptx
 
Pvtaan Social media marketing proposal.pdf
Pvtaan Social media marketing proposal.pdfPvtaan Social media marketing proposal.pdf
Pvtaan Social media marketing proposal.pdf
 
The Use of AI in Indonesia Election 2024: A Case Study
The Use of AI in Indonesia Election 2024: A Case StudyThe Use of AI in Indonesia Election 2024: A Case Study
The Use of AI in Indonesia Election 2024: A Case Study
 
How Do I Begin the Linksys Velop Setup Process?
How Do I Begin the Linksys Velop Setup Process?How Do I Begin the Linksys Velop Setup Process?
How Do I Begin the Linksys Velop Setup Process?
 
How to Use Contact Form 7 Like a Pro.pptx
How to Use Contact Form 7 Like a Pro.pptxHow to Use Contact Form 7 Like a Pro.pptx
How to Use Contact Form 7 Like a Pro.pptx
 

Building Search Engines - Lucene, SolR and Elasticsearch

  • 1. www.anant.us | solutions@anant.us | 202.905.2818 1010 Wisconsin Ave, NW | Suite 250 | Washington, DC 20007 Research & Development – Comparing Lucene / SolR / Elastic & Cloud Search Providers Building Search Engines
  • 2. What do we do? Streamline, Organize & Unify Business Information
  • 3. Agenda • Challenge - Why does this matter? • Info Retrieval - Retrieval / Routing • Lucene - More than meets the eye ... • Search Engine - 30k Foot View • On Premise - Lucene / SolR / Elastic • Cloud Providers - Amazon / Azure
  • 4. Challenge – Why does this matter? Knowledge Project Information Client Service Information Corporate Guides Collaborative Documents Assets & Files Corporate Resources Appleseed Framework (Portal, Base, Search) G Drive Delta DropBox G Drive Delta Nutshell Dropbox Freshbooks G Drive G Sites (KB) G Drive Workflowy Evernote G Drive DropBox OwnCloud Pocket Leaves AIC (WP) Anant (WP)
  • 5. Document Retrieval • Google Search • Amazon Search • LinkedIn Search • CMS Search * • Portal Search * • CRM Search * • Search * Document Routing • Google Alerts • Amazon Recommendations • Netflix Recommendations • LinkedIn Recommendations Information Retrieval
  • 7. Lucene – More than meets the eye Who Next? Think of it like a “NoSQL” Database that has great indexing.. everywhere.
  • 8. Search Engine – 30 Thousand Foot View The search index is only as good as your processed data. If you put everything you find in your index, you are going to spend a lot of time telling people how to search.
  • 9. On Premise – Lucene / ES / SolR Lucene • Library • File System • Format • Fast • Embeddable* • Indexing Anywhere • Need to really know Lucene • No Interface • No server • Lots of house keeping SolR • Server • Admin / REST Interface • Configurable • Scalable • Great at Text* • Truly Open • 10+ Years • Good ecosystem • Too customizable • Schemas* • Zookeeper Needed ElasticSearch • Server • Configurable • Scalable • Good ecosystem • Built in Clustering • Grouping / Filtering • Great for Logs • Started as a Cloud Tool • No great OTS Interface • Only REST Interface
  • 10. Cloud Search – Amazon / Azure Amazon • SolRCloud* • AWS* Ecosystem • 5 QParsers • Dynamic Fields • 100% Completely Managed • Been Around for a While • Data / Read Writes • No nested Objects Azure • ElasticSearch* • Azure* Ecosystem • 2 QParsers • 100% Completely Managed • Good SDK • Few Years Old • Data / Read Writes • No nested Objects • Not so Dynamic Fields
  • 11. Questions & Contact www.anant.us | solutions@anant.us | 202.905.2818 1010 Wisconsin Ave, NW | Suite 250 | Washington, DC 20007 @anantcorp facebook.com/anantCorp linkedin.com/company/anant rahul@anant.us linkedin.com/in/xingh Rahul Singh CEO & Founder Questions & Contact • Modern Enterprise • Mastering Services in the Service of Others • Hybrid Agile Project Management • Building Search Engines • CICD / DevOps • Connecting Internet Software
  • 12. www.anant.us | solutions@anant.us | 202.905.2818 1010 Wisconsin Ave, NW | Suite 250 | Washington, DC 20007 Streamlined Data Integration / Data Pipelines Organized Knowledge Search / Data Warehouses Unified Interfaces Portals / Dashboards / Mobile