SlideShare a Scribd company logo
How SOLR Search Works
Rajat Jain - 20th Dec, 2016
Agenda
• What do you mean by Search?
• Search Requirements
• Comparison of SOLR with SQL/NoSQL
• SOLR Architecture
• SOLR Usage in Trellis
• How Google Search Works
• Other Search Technologies
What do you mean by Search?
What do you mean by Search?
What do you mean by Search?
Search Requirements
• Text Search – eg. “Architects”
• Filters – eg. “In New Delhi”, “iOS”
• Sorting – eg. “Best Match”, “Highest Rating”, etc.
• And More..
• Facets
• Stemming
• Fuzzy Matching
• Image Search, etc.
Search Requirements
• Full Text Search
• Fast reads (writes can be slower)
• Various Combinations of Filters
• Various Combinations of Sorting
• Non Features:
• Real-time – usually staleness is not a problem
• Data Integrity – usually not a source of storage – can be ‘lossy’
Search Requirements – Faceted Search
• A Type of Filtering with
suggestions
• In most cases – sorted by
number
• Basically helps the user to
narrow down the search without
having to ‘guess’ how to narrow
it
Conventional Storage for Search
• SQL (MySQL)
• Relational Tables
• Normalized Data
• Assuming using Keys / Indexes for reads & writes
• Optimized for reads and writes & transactional data (acid transactions)
• Lots of security, etc.
• Table Data stored in File System
• Indexing - Individual columns – set of columns
• Full Text search – recent addition (full text index)
Conventional Storage for Search
• No SQL (think MongoDB)
• Key Value Pairs
• De-normalized Data
• Unstructured Data
• Optimized for Reads – writes can be slightly slower (in case of transactional)
• Data stored in File System
• Indexing – individual fields
• Full Text Search – has in-built support
Advantages of SOLR over MySQL/NoSQL
• Reversed Index
• Mind-blowing Text-analysis / stemming / scoring / fuzziness
• Weighting fields / boosting – custom scoring functions
• Single document concept – no relations (in general)
• Faceting support out-of-the box
• Optimized for search and search alone (at scale without performance
drop)
SOLR Architecture – Indexing
• Take a ‘document’ / field, etc.
• For each field apply set of filters
/ tokenizers
• Convert to individual tokens
• Update the ‘inverted’ index
based on the tokens
• In general in the Index keep
track of stats, etc. for the various
terms
• Different indexes per field
SOLR Architecture - Indexing
13
XML Update
Handler
CSV Update
Handler
/update /update/csv
XML Update
with custom
processor chain
/update/xml
Extracting
RequestHandler
(PDF, Word, …)
/update/extract
Lucene Index
Data Import
Handler
Database pull
RSS pull
Simple
transforms
SQL DB
RSS
feed
<doc>
<title>
Remove Duplicates
processor
Logging
processor
Index
processor
Custom Transform
processor
PDF
HTTP POST
HTTP POST
pull
pull
Update Processor Chain (per handler)
Lucene
Text Index
Analyzers
SOLR Architecture – Searching
• User enters query
• Parse the query, i.e. apply the
required filters and tokenizers
• Converted to tokens
• Parallel search across multiple
indexes (per field)
• Score all the documents
• Sort in async fashion
SOLR Architecture - Full
SOLR Architecture – Updating Index
• Types of Index Updates
• Instant Index
• Incremental Indexing
• Full Indexing
• Index Update Strategies
• Instant / Incremental Index cannot happen continuously
• Too much causes performance degradation
• Full Index periodically to optimize the index
SOLR Architecture – Scalability
• Sharding
• Splitting collections across servers
– search in parallel
• Replication
• More than one copy of the data
for failover
• SolrCloud
• Using Zookeeper for managing
clusters
SOLR Architecture – Other Features
• Stemming
• Identify root word and variations of the word, eg. "stems", "stemmer",
"stemming", "stemmed" as based on "stem"
• Fuzzy Matching
• Similar Words / Misspellings
• Edit Distance
• NLP
• Identify Entities / Nouns in Search Query
• OpenNLP Plugin for SOLR
• And much more…
SOLR Usage in Trellis
• Architecture
• Data-in from MySQL
• Index Update Strategy
• AutoComplete
• Basic Search
• Advanced Search
• Filters / Sorting / Facets & More
• Demo (Incl. Config Files)
How Google Search Works
• Crawling
• Robots.txt
• Indexing
• Multiple Indexes – Instant / Daily / Weekly / Long Tail
• Searching
• NLP, Stemming, Auto-correct, etc.
• Ranking – PageRank
• Video - https://www.youtube.com/watch?v=BNHR6IQJGZs
Other Search Technologies
• ElasticSearch
• Much newer than Solr
• Built-in scalability
• Uses same Lucene as the base
• JSON instead of XML
• Good for Analytical querying
• Others
• Splunk
• Sphinx
That’s All Folks
References
• SOLR Home Page -
http://lucene.apache.org/solr/
• Tutorials
• http://www.solrtutorial.com/index.h
tml
• https://lucene.apache.org/solr/4_10
_0/tutorial.html
• Just Google the rest!!

More Related Content

What's hot

ElasticSearch Basic Introduction
ElasticSearch Basic IntroductionElasticSearch Basic Introduction
ElasticSearch Basic Introduction
Mayur Rathod
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei Milovidov
Altinity Ltd
 
What I learnt: Elastic search & Kibana : introduction, installtion & configur...
What I learnt: Elastic search & Kibana : introduction, installtion & configur...What I learnt: Elastic search & Kibana : introduction, installtion & configur...
What I learnt: Elastic search & Kibana : introduction, installtion & configur...
Rahul K Chauhan
 
Elastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & KibanaElastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & Kibana
SpringPeople
 
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Edureka!
 
[Meetup] a successful migration from elastic search to clickhouse
[Meetup] a successful migration from elastic search to clickhouse[Meetup] a successful migration from elastic search to clickhouse
[Meetup] a successful migration from elastic search to clickhouse
Vianney FOUCAULT
 
Elastic Stack ELK, Beats, and Cloud
Elastic Stack ELK, Beats, and CloudElastic Stack ELK, Beats, and Cloud
Elastic Stack ELK, Beats, and Cloud
Joe Ryan
 
Elasticsearch From the Bottom Up
Elasticsearch From the Bottom UpElasticsearch From the Bottom Up
Elasticsearch From the Bottom Up
foundsearch
 
Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)
Julian Hyde
 
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...
Altinity Ltd
 
ElasticSearch in action
ElasticSearch in actionElasticSearch in action
ElasticSearch in action
Codemotion
 
Elastic search overview
Elastic search overviewElastic search overview
Elastic search overview
ABC Talks
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearch
hypto
 
MySQL Optimizer Cost Model
MySQL Optimizer Cost ModelMySQL Optimizer Cost Model
MySQL Optimizer Cost Model
Olav Sandstå
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache Calcite
Chris Baynes
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearch
pmanvi
 
MySQL Optimizer Overview
MySQL Optimizer OverviewMySQL Optimizer Overview
MySQL Optimizer Overview
Olav Sandstå
 
Deep Dive Into Elasticsearch
Deep Dive Into ElasticsearchDeep Dive Into Elasticsearch
Deep Dive Into Elasticsearch
Knoldus Inc.
 
Elasticsearch presentation 1
Elasticsearch presentation 1Elasticsearch presentation 1
Elasticsearch presentation 1
Maruf Hassan
 

What's hot (20)

ElasticSearch Basic Introduction
ElasticSearch Basic IntroductionElasticSearch Basic Introduction
ElasticSearch Basic Introduction
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei Milovidov
 
What I learnt: Elastic search & Kibana : introduction, installtion & configur...
What I learnt: Elastic search & Kibana : introduction, installtion & configur...What I learnt: Elastic search & Kibana : introduction, installtion & configur...
What I learnt: Elastic search & Kibana : introduction, installtion & configur...
 
Elastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & KibanaElastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & Kibana
 
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
 
[Meetup] a successful migration from elastic search to clickhouse
[Meetup] a successful migration from elastic search to clickhouse[Meetup] a successful migration from elastic search to clickhouse
[Meetup] a successful migration from elastic search to clickhouse
 
Elastic Stack ELK, Beats, and Cloud
Elastic Stack ELK, Beats, and CloudElastic Stack ELK, Beats, and Cloud
Elastic Stack ELK, Beats, and Cloud
 
Elasticsearch From the Bottom Up
Elasticsearch From the Bottom UpElasticsearch From the Bottom Up
Elasticsearch From the Bottom Up
 
Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)
 
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...
 
MySQL lecture
MySQL lectureMySQL lecture
MySQL lecture
 
ElasticSearch in action
ElasticSearch in actionElasticSearch in action
ElasticSearch in action
 
Elastic search overview
Elastic search overviewElastic search overview
Elastic search overview
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearch
 
MySQL Optimizer Cost Model
MySQL Optimizer Cost ModelMySQL Optimizer Cost Model
MySQL Optimizer Cost Model
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache Calcite
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearch
 
MySQL Optimizer Overview
MySQL Optimizer OverviewMySQL Optimizer Overview
MySQL Optimizer Overview
 
Deep Dive Into Elasticsearch
Deep Dive Into ElasticsearchDeep Dive Into Elasticsearch
Deep Dive Into Elasticsearch
 
Elasticsearch presentation 1
Elasticsearch presentation 1Elasticsearch presentation 1
Elasticsearch presentation 1
 

Similar to How Solr Search Works

Elasticsearch { "Meetup" : "talk" }
Elasticsearch { "Meetup" : "talk" }Elasticsearch { "Meetup" : "talk" }
Elasticsearch { "Meetup" : "talk" }
Lutf Ur Rehman
 
Test driving Azure Search and DocumentDB
Test driving Azure Search and DocumentDBTest driving Azure Search and DocumentDB
Test driving Azure Search and DocumentDB
Andrew Siemer
 
Lucene BootCamp
Lucene BootCampLucene BootCamp
Lucene BootCampGokulD
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
Joaquin Delgado PhD.
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
S. Diana Hu
 
Elasticsearch - Scalability and Multitenancy
Elasticsearch - Scalability and MultitenancyElasticsearch - Scalability and Multitenancy
Elasticsearch - Scalability and Multitenancy
Bozhidar Bozhanov
 
Apache Solr - Enterprise search platform
Apache Solr - Enterprise search platformApache Solr - Enterprise search platform
Apache Solr - Enterprise search platform
Tommaso Teofili
 
Lucene Bootcamp - 2
Lucene Bootcamp - 2Lucene Bootcamp - 2
Lucene Bootcamp - 2GokulD
 
Roaring with elastic search sangam2018
Roaring with elastic search sangam2018Roaring with elastic search sangam2018
Roaring with elastic search sangam2018
Vinay Kumar
 
Elasticsearch tuning
Elasticsearch tuningElasticsearch tuning
Elasticsearch tuning
NIKHIL DUBEY
 
Session #2, tech session: Build realtime search by Sylvain Utard from Algolia
Session #2, tech session: Build realtime search by Sylvain Utard from AlgoliaSession #2, tech session: Build realtime search by Sylvain Utard from Algolia
Session #2, tech session: Build realtime search by Sylvain Utard from Algolia
SaaS Is Beautiful
 
Intro to Apache Solr for Drupal
Intro to Apache Solr for DrupalIntro to Apache Solr for Drupal
Intro to Apache Solr for Drupal
Chris Caple
 
Tagging search solution design
Tagging search solution designTagging search solution design
Tagging search solution design
Alexander Tokarev
 
Tagging search solution design Advanced edition
Tagging search solution design Advanced editionTagging search solution design Advanced edition
Tagging search solution design Advanced edition
Alexander Tokarev
 
Oracle Week 2016 - Modern Data Architecture
Oracle Week 2016 - Modern Data ArchitectureOracle Week 2016 - Modern Data Architecture
Oracle Week 2016 - Modern Data Architecture
Arthur Gimpel
 
Solr/Elasticsearch for CF Developers (and others)
Solr/Elasticsearch for CF Developers (and others)Solr/Elasticsearch for CF Developers (and others)
Solr/Elasticsearch for CF Developers (and others)
Mary Jo Sminkey
 
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Lucidworks
 
How do Solr and Azure Search compare?
How do Solr and Azure Search compare?How do Solr and Azure Search compare?
How do Solr and Azure Search compare?
SearchStax
 
Practical Machine Learning for Smarter Search with Spark+Solr
Practical Machine Learning for Smarter Search with Spark+SolrPractical Machine Learning for Smarter Search with Spark+Solr
Practical Machine Learning for Smarter Search with Spark+SolrJake Mannix
 

Similar to How Solr Search Works (20)

Elasticsearch { "Meetup" : "talk" }
Elasticsearch { "Meetup" : "talk" }Elasticsearch { "Meetup" : "talk" }
Elasticsearch { "Meetup" : "talk" }
 
Test driving Azure Search and DocumentDB
Test driving Azure Search and DocumentDBTest driving Azure Search and DocumentDB
Test driving Azure Search and DocumentDB
 
Lucene BootCamp
Lucene BootCampLucene BootCamp
Lucene BootCamp
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 
Elasticsearch - Scalability and Multitenancy
Elasticsearch - Scalability and MultitenancyElasticsearch - Scalability and Multitenancy
Elasticsearch - Scalability and Multitenancy
 
Apache Solr - Enterprise search platform
Apache Solr - Enterprise search platformApache Solr - Enterprise search platform
Apache Solr - Enterprise search platform
 
Lucene Bootcamp - 2
Lucene Bootcamp - 2Lucene Bootcamp - 2
Lucene Bootcamp - 2
 
Roaring with elastic search sangam2018
Roaring with elastic search sangam2018Roaring with elastic search sangam2018
Roaring with elastic search sangam2018
 
Elasticsearch tuning
Elasticsearch tuningElasticsearch tuning
Elasticsearch tuning
 
Session #2, tech session: Build realtime search by Sylvain Utard from Algolia
Session #2, tech session: Build realtime search by Sylvain Utard from AlgoliaSession #2, tech session: Build realtime search by Sylvain Utard from Algolia
Session #2, tech session: Build realtime search by Sylvain Utard from Algolia
 
Intro to Apache Solr for Drupal
Intro to Apache Solr for DrupalIntro to Apache Solr for Drupal
Intro to Apache Solr for Drupal
 
Solr
SolrSolr
Solr
 
Tagging search solution design
Tagging search solution designTagging search solution design
Tagging search solution design
 
Tagging search solution design Advanced edition
Tagging search solution design Advanced editionTagging search solution design Advanced edition
Tagging search solution design Advanced edition
 
Oracle Week 2016 - Modern Data Architecture
Oracle Week 2016 - Modern Data ArchitectureOracle Week 2016 - Modern Data Architecture
Oracle Week 2016 - Modern Data Architecture
 
Solr/Elasticsearch for CF Developers (and others)
Solr/Elasticsearch for CF Developers (and others)Solr/Elasticsearch for CF Developers (and others)
Solr/Elasticsearch for CF Developers (and others)
 
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
 
How do Solr and Azure Search compare?
How do Solr and Azure Search compare?How do Solr and Azure Search compare?
How do Solr and Azure Search compare?
 
Practical Machine Learning for Smarter Search with Spark+Solr
Practical Machine Learning for Smarter Search with Spark+SolrPractical Machine Learning for Smarter Search with Spark+Solr
Practical Machine Learning for Smarter Search with Spark+Solr
 

More from Atlogys Technical Consulting

Latest UI guidelines for Web Apps
Latest UI guidelines for Web AppsLatest UI guidelines for Web Apps
Latest UI guidelines for Web Apps
Atlogys Technical Consulting
 
Discipline at Atlogys
Discipline at AtlogysDiscipline at Atlogys
Discipline at Atlogys
Atlogys Technical Consulting
 
Reprogram your mind for Positive Thinking
Reprogram your mind for Positive ThinkingReprogram your mind for Positive Thinking
Reprogram your mind for Positive Thinking
Atlogys Technical Consulting
 
Docker @ Atlogys
Docker @ AtlogysDocker @ Atlogys
Tests for Scalable, Fast, Secure Apps
Tests for Scalable, Fast, Secure AppsTests for Scalable, Fast, Secure Apps
Tests for Scalable, Fast, Secure Apps
Atlogys Technical Consulting
 
Atomic Design with PatternLabs
Atomic Design with PatternLabsAtomic Design with PatternLabs
Atomic Design with PatternLabs
Atlogys Technical Consulting
 
Git and Version Control at Atlogys
Git and Version Control at AtlogysGit and Version Control at Atlogys
Git and Version Control at Atlogys
Atlogys Technical Consulting
 
Guidelines HTML5 & CSS3 - Atlogys (2018)
Guidelines HTML5 & CSS3 - Atlogys (2018)Guidelines HTML5 & CSS3 - Atlogys (2018)
Guidelines HTML5 & CSS3 - Atlogys (2018)
Atlogys Technical Consulting
 
Rabbit MQ - Tech Talk at Atlogys
Rabbit MQ - Tech Talk at Atlogys Rabbit MQ - Tech Talk at Atlogys
Rabbit MQ - Tech Talk at Atlogys
Atlogys Technical Consulting
 
BDD and Test Automation Tech Talk - Atlogys Academy Series
BDD and Test Automation Tech Talk - Atlogys Academy SeriesBDD and Test Automation Tech Talk - Atlogys Academy Series
BDD and Test Automation Tech Talk - Atlogys Academy Series
Atlogys Technical Consulting
 
QA Best Practices at Atlogys - Tech Talk (Atlogys Academy)
QA Best Practices at Atlogys - Tech Talk (Atlogys Academy)QA Best Practices at Atlogys - Tech Talk (Atlogys Academy)
QA Best Practices at Atlogys - Tech Talk (Atlogys Academy)
Atlogys Technical Consulting
 
Infinite Scaling using Lambda and Aws - Atlogys Tech Talk
Infinite Scaling using Lambda and Aws - Atlogys Tech TalkInfinite Scaling using Lambda and Aws - Atlogys Tech Talk
Infinite Scaling using Lambda and Aws - Atlogys Tech Talk
Atlogys Technical Consulting
 
Wordpress Tech Talk
Wordpress Tech Talk Wordpress Tech Talk
Wordpress Tech Talk
Atlogys Technical Consulting
 
Tech Talk on ReactJS
Tech Talk on ReactJSTech Talk on ReactJS
Tech Talk on ReactJS
Atlogys Technical Consulting
 
Atlogys Academy - Tech Talk on Mongo DB
Atlogys Academy - Tech Talk on Mongo DBAtlogys Academy - Tech Talk on Mongo DB
Atlogys Academy - Tech Talk on Mongo DB
Atlogys Technical Consulting
 
Atlogys Tech Talk - Web 2.0 Design Guidelines
Atlogys Tech Talk - Web 2.0 Design GuidelinesAtlogys Tech Talk - Web 2.0 Design Guidelines
Atlogys Tech Talk - Web 2.0 Design Guidelines
Atlogys Technical Consulting
 
Firebase Tech Talk By Atlogys
Firebase Tech Talk By AtlogysFirebase Tech Talk By Atlogys
Firebase Tech Talk By Atlogys
Atlogys Technical Consulting
 
Atlogys - Don’t Just Sell Technology, Sell The Experience!
Atlogys - Don’t Just Sell Technology, Sell The Experience!Atlogys - Don’t Just Sell Technology, Sell The Experience!
Atlogys - Don’t Just Sell Technology, Sell The Experience!
Atlogys Technical Consulting
 
Smart CTO Service
Smart CTO ServiceSmart CTO Service
Smart CTO Service
Atlogys Technical Consulting
 
Atlogys Technical Consulting
Atlogys Technical ConsultingAtlogys Technical Consulting
Atlogys Technical Consulting
Atlogys Technical Consulting
 

More from Atlogys Technical Consulting (20)

Latest UI guidelines for Web Apps
Latest UI guidelines for Web AppsLatest UI guidelines for Web Apps
Latest UI guidelines for Web Apps
 
Discipline at Atlogys
Discipline at AtlogysDiscipline at Atlogys
Discipline at Atlogys
 
Reprogram your mind for Positive Thinking
Reprogram your mind for Positive ThinkingReprogram your mind for Positive Thinking
Reprogram your mind for Positive Thinking
 
Docker @ Atlogys
Docker @ AtlogysDocker @ Atlogys
Docker @ Atlogys
 
Tests for Scalable, Fast, Secure Apps
Tests for Scalable, Fast, Secure AppsTests for Scalable, Fast, Secure Apps
Tests for Scalable, Fast, Secure Apps
 
Atomic Design with PatternLabs
Atomic Design with PatternLabsAtomic Design with PatternLabs
Atomic Design with PatternLabs
 
Git and Version Control at Atlogys
Git and Version Control at AtlogysGit and Version Control at Atlogys
Git and Version Control at Atlogys
 
Guidelines HTML5 & CSS3 - Atlogys (2018)
Guidelines HTML5 & CSS3 - Atlogys (2018)Guidelines HTML5 & CSS3 - Atlogys (2018)
Guidelines HTML5 & CSS3 - Atlogys (2018)
 
Rabbit MQ - Tech Talk at Atlogys
Rabbit MQ - Tech Talk at Atlogys Rabbit MQ - Tech Talk at Atlogys
Rabbit MQ - Tech Talk at Atlogys
 
BDD and Test Automation Tech Talk - Atlogys Academy Series
BDD and Test Automation Tech Talk - Atlogys Academy SeriesBDD and Test Automation Tech Talk - Atlogys Academy Series
BDD and Test Automation Tech Talk - Atlogys Academy Series
 
QA Best Practices at Atlogys - Tech Talk (Atlogys Academy)
QA Best Practices at Atlogys - Tech Talk (Atlogys Academy)QA Best Practices at Atlogys - Tech Talk (Atlogys Academy)
QA Best Practices at Atlogys - Tech Talk (Atlogys Academy)
 
Infinite Scaling using Lambda and Aws - Atlogys Tech Talk
Infinite Scaling using Lambda and Aws - Atlogys Tech TalkInfinite Scaling using Lambda and Aws - Atlogys Tech Talk
Infinite Scaling using Lambda and Aws - Atlogys Tech Talk
 
Wordpress Tech Talk
Wordpress Tech Talk Wordpress Tech Talk
Wordpress Tech Talk
 
Tech Talk on ReactJS
Tech Talk on ReactJSTech Talk on ReactJS
Tech Talk on ReactJS
 
Atlogys Academy - Tech Talk on Mongo DB
Atlogys Academy - Tech Talk on Mongo DBAtlogys Academy - Tech Talk on Mongo DB
Atlogys Academy - Tech Talk on Mongo DB
 
Atlogys Tech Talk - Web 2.0 Design Guidelines
Atlogys Tech Talk - Web 2.0 Design GuidelinesAtlogys Tech Talk - Web 2.0 Design Guidelines
Atlogys Tech Talk - Web 2.0 Design Guidelines
 
Firebase Tech Talk By Atlogys
Firebase Tech Talk By AtlogysFirebase Tech Talk By Atlogys
Firebase Tech Talk By Atlogys
 
Atlogys - Don’t Just Sell Technology, Sell The Experience!
Atlogys - Don’t Just Sell Technology, Sell The Experience!Atlogys - Don’t Just Sell Technology, Sell The Experience!
Atlogys - Don’t Just Sell Technology, Sell The Experience!
 
Smart CTO Service
Smart CTO ServiceSmart CTO Service
Smart CTO Service
 
Atlogys Technical Consulting
Atlogys Technical ConsultingAtlogys Technical Consulting
Atlogys Technical Consulting
 

Recently uploaded

Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 

Recently uploaded (20)

Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 

How Solr Search Works

  • 1. How SOLR Search Works Rajat Jain - 20th Dec, 2016
  • 2. Agenda • What do you mean by Search? • Search Requirements • Comparison of SOLR with SQL/NoSQL • SOLR Architecture • SOLR Usage in Trellis • How Google Search Works • Other Search Technologies
  • 3. What do you mean by Search?
  • 4. What do you mean by Search?
  • 5. What do you mean by Search?
  • 6. Search Requirements • Text Search – eg. “Architects” • Filters – eg. “In New Delhi”, “iOS” • Sorting – eg. “Best Match”, “Highest Rating”, etc. • And More.. • Facets • Stemming • Fuzzy Matching • Image Search, etc.
  • 7. Search Requirements • Full Text Search • Fast reads (writes can be slower) • Various Combinations of Filters • Various Combinations of Sorting • Non Features: • Real-time – usually staleness is not a problem • Data Integrity – usually not a source of storage – can be ‘lossy’
  • 8. Search Requirements – Faceted Search • A Type of Filtering with suggestions • In most cases – sorted by number • Basically helps the user to narrow down the search without having to ‘guess’ how to narrow it
  • 9. Conventional Storage for Search • SQL (MySQL) • Relational Tables • Normalized Data • Assuming using Keys / Indexes for reads & writes • Optimized for reads and writes & transactional data (acid transactions) • Lots of security, etc. • Table Data stored in File System • Indexing - Individual columns – set of columns • Full Text search – recent addition (full text index)
  • 10. Conventional Storage for Search • No SQL (think MongoDB) • Key Value Pairs • De-normalized Data • Unstructured Data • Optimized for Reads – writes can be slightly slower (in case of transactional) • Data stored in File System • Indexing – individual fields • Full Text Search – has in-built support
  • 11. Advantages of SOLR over MySQL/NoSQL • Reversed Index • Mind-blowing Text-analysis / stemming / scoring / fuzziness • Weighting fields / boosting – custom scoring functions • Single document concept – no relations (in general) • Faceting support out-of-the box • Optimized for search and search alone (at scale without performance drop)
  • 12. SOLR Architecture – Indexing • Take a ‘document’ / field, etc. • For each field apply set of filters / tokenizers • Convert to individual tokens • Update the ‘inverted’ index based on the tokens • In general in the Index keep track of stats, etc. for the various terms • Different indexes per field
  • 13. SOLR Architecture - Indexing 13 XML Update Handler CSV Update Handler /update /update/csv XML Update with custom processor chain /update/xml Extracting RequestHandler (PDF, Word, …) /update/extract Lucene Index Data Import Handler Database pull RSS pull Simple transforms SQL DB RSS feed <doc> <title> Remove Duplicates processor Logging processor Index processor Custom Transform processor PDF HTTP POST HTTP POST pull pull Update Processor Chain (per handler) Lucene Text Index Analyzers
  • 14. SOLR Architecture – Searching • User enters query • Parse the query, i.e. apply the required filters and tokenizers • Converted to tokens • Parallel search across multiple indexes (per field) • Score all the documents • Sort in async fashion
  • 16. SOLR Architecture – Updating Index • Types of Index Updates • Instant Index • Incremental Indexing • Full Indexing • Index Update Strategies • Instant / Incremental Index cannot happen continuously • Too much causes performance degradation • Full Index periodically to optimize the index
  • 17. SOLR Architecture – Scalability • Sharding • Splitting collections across servers – search in parallel • Replication • More than one copy of the data for failover • SolrCloud • Using Zookeeper for managing clusters
  • 18. SOLR Architecture – Other Features • Stemming • Identify root word and variations of the word, eg. "stems", "stemmer", "stemming", "stemmed" as based on "stem" • Fuzzy Matching • Similar Words / Misspellings • Edit Distance • NLP • Identify Entities / Nouns in Search Query • OpenNLP Plugin for SOLR • And much more…
  • 19. SOLR Usage in Trellis • Architecture • Data-in from MySQL • Index Update Strategy • AutoComplete • Basic Search • Advanced Search • Filters / Sorting / Facets & More • Demo (Incl. Config Files)
  • 20. How Google Search Works • Crawling • Robots.txt • Indexing • Multiple Indexes – Instant / Daily / Weekly / Long Tail • Searching • NLP, Stemming, Auto-correct, etc. • Ranking – PageRank • Video - https://www.youtube.com/watch?v=BNHR6IQJGZs
  • 21. Other Search Technologies • ElasticSearch • Much newer than Solr • Built-in scalability • Uses same Lucene as the base • JSON instead of XML • Good for Analytical querying • Others • Splunk • Sphinx
  • 22. That’s All Folks References • SOLR Home Page - http://lucene.apache.org/solr/ • Tutorials • http://www.solrtutorial.com/index.h tml • https://lucene.apache.org/solr/4_10 _0/tutorial.html • Just Google the rest!!