SOLR BASED SEARCH
SHUBHANGI PARDESHI
ENHANCE SEARCH WITH SYNONYMS, PROXIMITY, PHRASES, RELEVANCY-RANKED SORTING AND MANY MORE!
CONTENTS
▸ Introduction to Lucene
▸ Introduction to Solr
▸ Terminologies
▸ Steps
▸ Document / Query Analysis
▸ Solr Search Features
▸ Solr Search - Query types
▸ Search Interfaces
▸ Search Challenges and Solutions
WHAT IS LUCENE ?
▸ Open source full-text search (information retrieval) library / API
▸ Written in Java by Doug Cutting
▸ Major components
  ▸ Indexing (inverted index: keyword -> page): IndexWriter (index is roughly 20-30% of data size)
  ▸ Search algorithm: IndexSearcher
▸ No notion of schema
▸ Example usage: Atlassian Jira / Confluence, Salesforce, Oracle Text Search
▸ Lucene is very powerful but difficult to use directly
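The inverted index mentioned above can be illustrated with a minimal Python sketch. This is not Lucene's implementation; the function names and sample documents are hypothetical, and the analysis step is reduced to lowercasing and splitting on whitespace.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased, whitespace-separated term to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, term):
    """Return the sorted ids of documents that contain the term."""
    return sorted(index.get(term.lower(), set()))

# Hypothetical sample documents (doc id -> text)
docs = {
    1: "Action thriller film",
    2: "Romantic comedy film",
    3: "Psychological thriller",
}
index = build_inverted_index(docs)
print(search(index, "thriller"))  # [1, 3]
```

A lookup touches only the posting set for the term instead of scanning every document, which is why search stays fast as the collection grows.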
WHAT IS SOLR ?
▸ A full text Enterprise Search Server
▸ Caching
▸ Replication
▸ Easy administration
▸ Web Service layer on top of Lucene
▸ Non-relational data storage and processing
▸ Loose schema to define types and fields
▸ Better recall and precision through various configuration options
▸ Easy to use
TERMINOLOGIES
▸ Document: unit of index and search
  ▸ Format: XML, JSON, CSV
  ▸ Fields: name-value pairs; a type is associated with each field
▸ Search:
  ▸ Query: QueryParser creates the query -> IndexSearcher runs it -> hits are returned
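As a rough illustration of a document as a set of name-value fields, the sketch below builds one film document and serialises it as a JSON array, one of the formats Solr's update handler accepts. The field names `genre`, `directed_by` and `initial_release_date` come from the query examples in this deck; `id`, `name` and all values are hypothetical.

```python
import json

# Hypothetical film document: every field is a name-value pair.
film = {
    "id": "film-001",
    "name": "Example Film",
    "genre": ["Action", "Thriller"],            # multi-valued field
    "directed_by": ["Gary Bose"],
    "initial_release_date": "2005-11-15T00:00:00Z",
}

def to_update_payload(documents):
    """Serialise a list of documents as a JSON array for indexing."""
    return json.dumps(documents, indent=2)

payload = to_update_payload([film])
print(payload)
```

The type of each field (text, string, date, and so on) is not part of the document itself; it is looked up from the schema by field name.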
STEPS
▸ Create indexes
  ▸ Build document
  ▸ Analyse document
  ▸ Index document
▸ Search
  ▸ Input query
  ▸ Analyse query
  ▸ Render result
[Figure: flow diagram with two tracks. CREATE INDEXES: get contents, build Solr doc, analyse doc, index doc. SEARCH DOCUMENT: search UI, build query, analyse query, search query string, render result.]
SEARCH STRING / DOCUMENT ANALYSIS
▸ Analysis = Analyzer + Tokenizer + Filter
▸ The analyzers used for indexing and for searching may or may not be the same
▸ E.g. a single analyzer class used for both indexing and querying:
<fieldType name="nametext" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.core.WhitespaceAnalyzer" />
</fieldType>
▸ E.g. separate index-time and query-time analyzer chains:
<fieldType name="nametext" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonymsfile.txt" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
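To make the tokenizer-plus-filters chain concrete, here is a minimal Python sketch that mimics the index-time and query-time analyzers configured above. It is only an approximation of what the Solr factories do: the regex tokenizer is a crude stand-in for the standard tokenizer, and the keep-word and synonym lists are hypothetical stand-ins for `keepwords.txt` and `synonymsfile.txt`.

```python
import re

def standard_tokenize(text):
    """Crude tokenizer stand-in: split on any run of non-alphanumeric characters."""
    return [t for t in re.split(r"[^0-9A-Za-z]+", text) if t]

def lowercase_filter(tokens):
    return [t.lower() for t in tokens]

def keep_word_filter(tokens, keep_words):
    """Drop every token that is not in the keep-word list."""
    return [t for t in tokens if t in keep_words]

def synonym_filter(tokens, synonyms):
    """Emit each token followed by its configured synonyms."""
    out = []
    for t in tokens:
        out.append(t)
        out.extend(synonyms.get(t, []))
    return out

# Hypothetical contents of keepwords.txt and synonymsfile.txt
KEEP_WORDS = {"action", "thriller", "movie"}
SYNONYMS = {"movie": ["film"]}

def analyze_for_index(text):
    tokens = lowercase_filter(standard_tokenize(text))
    tokens = keep_word_filter(tokens, KEEP_WORDS)
    return synonym_filter(tokens, SYNONYMS)

def analyze_for_query(text):
    return lowercase_filter(standard_tokenize(text))

print(analyze_for_index("An Action-Thriller Movie"))  # ['action', 'thriller', 'movie', 'film']
print(analyze_for_query("Thriller FILM"))             # ['thriller', 'film']
```

Because the synonym "film" is added at index time, the query term "film" matches a document that only said "movie", even though the query chain has no synonym filter.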
SOLR SEARCH FEATURES
▸ Ranked search: documents with high scores appear at the top; the score is one of the fields in each hit
▸ Field Searching
▸ Custom Sort by Field
▸ Boosting Result
▸ Multiword synonyms (Solr 6.5 onwards)
▸ Stemming
▸ Hit highlight
▸ Autocomplete
SOLR SEARCH FEATURES
▸ Faceting
▸ Term Frequency
▸ Document age consideration
▸ Spellchecks
▸ Typo tolerant
▸ Phonetic match
▸ OpenNLP / UIMA integration
▸ Pagination
▸ Functions for computation (function queries)
▸ And so on …
SOLR SEARCH - VARIOUS TYPES OF QUERIES
▸ Simple text search:
  ▸ Find films where genre contains the word "Action" (q=genre:Action)
  ▸ Find films where genre contains the word "Thriller" (q=genre:Thriller)
  ▸ Find films where genre contains the words Action and Thriller (q=*:*&fq=genre:Action&fq=genre:Thriller)
  ▸ Find films directed by Gary Bose (q=directed_by:(Gary Bose)); only one q parameter is used per request, so both words go into a single query
▸ Strict term presence search:
  ▸ Find films where genre contains the word "action" as well as "thriller" (q=*:*&fq=genre:(+action +thriller))
  ▸ Find films directed by a person whose name contains the words "Gary" as well as "Bose" (q=*:*&fq=+directed_by:Bose&fq=+directed_by:Gary)
▸ Proximity search:
  ▸ Find films where genre contains the words Action and Thriller within 5 positions of each other (q=*:*&fq=genre:"action thriller"~5)
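The query strings above are written unencoded for readability. When they are sent over HTTP, characters such as `+`, `"` and spaces have to be percent-encoded, otherwise a `+` is decoded by the server as a space and the "must be present" operator is lost. A minimal sketch using only the Python standard library (the helper name `build_query_string` is hypothetical):

```python
from urllib.parse import urlencode, parse_qs

def build_query_string(q="*:*", filters=()):
    """Build an encoded query string with one q and any number of fq parameters."""
    params = [("q", q)] + [("fq", f) for f in filters]
    return urlencode(params)

# Strict term presence: both terms must occur in genre.
strict = build_query_string(filters=["genre:(+action +thriller)"])
print(strict)

# Proximity: the two terms within 5 positions of each other.
proximity = build_query_string(filters=['genre:"action thriller"~5'])
print(proximity)

# Decoding gives back the original parameters.
print(parse_qs(strict))
```

Passing the parameters as a list of pairs, not a dict, is what allows `fq` to be repeated; each `fq` is applied as an independent filter.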
▸ Phrase search
  ▸ Find films with genre "Action Thriller" (q=genre:"Action Thriller"), or as a filter (q=*:*&fq=genre:"action thriller")
▸ Faceted search
  ▸ Movies released between 2005-10-27 and 2006-11-30, with a count for each director (q=*:*&fl=initial_release_date&fq=initial_release_date:[2005-10-27T00:00:00Z TO 2006-11-30T00:00:00Z]&facet=true&facet.field=directed_by)
▸ Fuzzy search (~)
  ▸ Genre contains a word close to the misspelling "sychologikal" (q=*:*&fq=genre:sychologikal~)
▸ Negative search
  ▸ Genre contains Action but not Thriller (q=*:*&fq=genre:action&fq=-genre:thriller)
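Fuzzy matching is based on edit distance: a term matches if it can be turned into the query term with a small number of single-character edits (Lucene's fuzzy query allows at most 2). The sketch below computes plain Levenshtein distance to show why the misspelling in the fuzzy example still finds "psychological". It is an illustration only; Lucene uses an automaton-based implementation and can also count a transposition as one edit.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute or keep
            ))
        prev = curr
    return prev[-1]

# One missing "p" plus "k" instead of "c": two edits in total.
print(edit_distance("sychologikal", "psychological"))  # 2
```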
▸ Wildcard search
  ▸ Genre contains a word matching *ction (q=*:*&fq=genre:*ction)
▸ Conditional logic in search
  ▸ Genre contains Psychological AND Thriller (q=*:*&fq=genre:(psychological AND Thriller))
  ▸ Genre contains Psychological OR Thriller (q=*:*&fq=genre:(psychological OR Thriller))
  ▸ Genre contains Psychological but not Thriller (q=*:*&fq=genre:(psychological NOT Thriller))
▸ Range search
  ▸ Movies released between 2005-10-27 and 2006-11-30 (q=*:*&fl=initial_release_date&fq=initial_release_date:[2005-10-27T00:00:00Z TO 2006-11-30T00:00:00Z])
▸ And so on…
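As a sketch of how the range and faceted queries fit together on the client side, the helpers below build the date-range filter and turn the facet section of a JSON response into a dictionary. In Solr's default JSON output, field facets come back as a flat list alternating value and count. The response shown is a hypothetical, hand-written sample, not real output, and the helper names are assumptions.

```python
def date_range(field, start, end):
    """Build an inclusive range filter such as field:[start TO end]."""
    return f"{field}:[{start} TO {end}]"

def facet_counts(response, field):
    """Convert the flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))

fq = date_range("initial_release_date", "2005-10-27T00:00:00Z", "2006-11-30T00:00:00Z")
print(fq)

# Hypothetical response fragment for facet=true&facet.field=directed_by
sample_response = {
    "facet_counts": {
        "facet_fields": {
            "directed_by": ["Director A", 3, "Director B", 1],
        }
    }
}
print(facet_counts(sample_response, "directed_by"))  # {'Director A': 3, 'Director B': 1}
```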
EXAMPLES OF SEARCH INTERFACE
▸ REST API
▸ http://<host>:<port>/solr/<collection>/query?
▸ APIs such as SolrJ
▸ Solr Admin UI
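Besides URL parameters, the `/query` endpoint also accepts a JSON request body with keys such as `query`, `filter` and `limit`. The sketch below only constructs such a request with the standard library and does not send it. The host `http://localhost:8983`, the collection name `films` and the helper name are assumptions for illustration.

```python
import json
from urllib.request import Request

def build_search_request(base_url, collection, query, filters=(), limit=10):
    """Prepare (but do not send) a POST request for the JSON request API."""
    url = f"{base_url}/solr/{collection}/query"
    body = {"query": query, "filter": list(filters), "limit": limit}
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Assumed local Solr instance and collection name
req = build_search_request(
    "http://localhost:8983", "films", "genre:thriller", filters=["-genre:action"]
)
print(req.full_url)
print(req.data.decode("utf-8"))

# To execute against a running server:
#   from urllib.request import urlopen
#   with urlopen(req) as resp:
#       docs = json.load(resp)["response"]["docs"]
```

Sending the query in a JSON body avoids the percent-encoding pitfalls of long URL parameters.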
PRECISION AND RECALL WITH SOLR
WHAT
▸ Results
▸ Relevant results
▸ More hits
▸ More relevant results at high rank
HOW
▸ Solr synonyms
▸ Fuzzy search
▸ Proximity search
▸ Phrase search
▸ Negative search
▸ Strict term presence
▸ Doc boosting
▸ Index binary docs
▸ Multiline search
▸ Index many fields
▸ Search string limit
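For reference, precision is the fraction of returned results that are relevant, and recall is the fraction of relevant documents that were returned. A minimal sketch with hypothetical document ids:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved ids against the relevant ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents returned, 3 documents are actually relevant.
p, r = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5])
print(p)  # 0.5 (2 of the 4 results are relevant)
print(r)  # 2 of the 3 relevant documents were found
```

Techniques that add matches (synonyms, fuzzy search) tend to raise recall, while techniques that restrict matches (phrase search, strict term presence, negative search) tend to raise precision.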
CHALLENGES
▸ Translating domain-specific knowledge into config files such as synonyms, protwords, etc.
▸ Proper Solr collection configuration
▸ A slight change in the query string's words changes search results considerably
  ▸ Use stemming
▸ High recall
  ▸ Limit search by score
▸ Spaces
  ▸ Custom tokenizer
▸ Spelling mistakes in the query string
  ▸ Fuzzy search and/or spell checkers
▸ Document field names get indexed
THANK YOU !!
