ENTERPRISE SEARCH: an introduction
Web Search, Desktop Search, Enterprise Search
so what is a Search Engine?
Any search application has two major components:
- INDEXING component: of importance to us developers (read: headache)
- SEARCH component: of importance to the users
INDEXING component: data is indexed into INDEX FILES. SEARCH component: the user sends a search query and receives search results.
Let’s start with INDEXING
is it easy to search here  . . .
or  here  . . .
so what needs to be Indexed and Searched?
various FILE FORMATS: Text Files, HTML, PDF, MS Word, PPT
coming from various DATA SOURCES: Emails, CMS, File System, Database, Web Pages
Indexing flow: data (documents) -> text that should be indexed -> fed to the Analyzer -> tokenized text -> Index Writer -> INDEX FILES
Search flow: user sends search query, receives search results
The Analyzer prepares the text by:
- removing stop words such as "a" or "the"
- converting all text to lowercase letters for case-insensitive searching
- stemming (a stemming algorithm reduces the words "fishing", "fished", "fish", and "fisher" to the root word "fish")
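The analyzer steps can be sketched in a few lines of Python. This is only a toy illustration: the stop-word list, the suffix list and the names `stem` and `analyze` are assumptions made for the example, and the suffix stripping is far cruder than a real stemming algorithm.

```python
import re

# Tiny illustrative stop-word list (an assumption); real analyzers use longer ones.
STOP_WORDS = {"a", "an", "the", "of", "is", "are", "my", "some"}

# Suffixes removed by this toy stemmer, checked in this order.
SUFFIXES = ("ing", "ed", "er", "s")


def stem(word):
    """Strip one known suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text):
    """Lowercase, tokenize, drop stop words, then stem the remaining tokens."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [stem(token) for token in tokens if token not in STOP_WORDS]


print(analyze("The fisher fished while fishing for a fish"))
# -> ['fish', 'fish', 'while', 'fish', 'for', 'fish']
```

The tokens returned by `analyze` are what would be handed on to the index writer.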
Document 1: Coffee isn't my cup of tea.
Document 2: Chocolate, men, coffee - some things are better rich.
INDEX
coffee - 1,2
cup - 1
tea - 1
chocolate - 2
men - 2
things - 2
better - 2
rich - 2
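A minimal sketch of how such an inverted index could be built, assuming a small hand-picked stop-word list and no stemming. The names `tokenize` and `build_index` are made up for this example and are not part of any library.

```python
import re
from collections import defaultdict

# Assumed stop words, chosen so that only the content words of the two documents remain.
STOP_WORDS = {"a", "the", "of", "my", "some", "are", "isn't"}


def tokenize(text):
    """Lowercase the text and keep the words that are not stop words."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOP_WORDS]


def build_index(docs):
    """Map every term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


DOCS = {
    1: "Coffee isn't my cup of tea.",
    2: "Chocolate, men, coffee - some things are better rich.",
}
INDEX = build_index(DOCS)

for term, doc_ids in INDEX.items():
    print(term, "-", ",".join(str(d) for d in doc_ids))
```

Looking up a term is then a dictionary access, which is why searching the index is so much cheaper than scanning every document.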
And now the SEARCH  Component
data is indexed into INDEX FILES; the user sends a search query (search terms) and receives search results
Search Request Terms
-> Spelling Index: Correct Search Terms + Incorrect Search Terms
-> Taxonomy: Search Terms + Related Terms from Taxonomy + Concept IDs
-> Search engine (INDEX)
-> Search results with 1) Actual Location of the result 2) Rank 3) Details 4) Facet Categorization
-> Results' Page
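The first two stages of this flow (spelling, then taxonomy) might look roughly like the sketch below. The vocabulary, the taxonomy entries and the function names are all hypothetical; spelling correction here is just fuzzy string matching from Python's standard `difflib`, and concept IDs are left out.

```python
import difflib

# Hypothetical spelling index: the terms known to the search index.
VOCABULARY = ["coffee", "cup", "tea", "chocolate", "rich"]

# Hypothetical taxonomy: related terms for some known terms.
TAXONOMY = {"coffee": ["espresso", "latte"], "tea": ["chai"]}


def correct(term):
    """Return the closest known term, or the term itself if nothing is close."""
    if term in VOCABULARY:
        return term
    matches = difflib.get_close_matches(term, VOCABULARY, n=1, cutoff=0.8)
    return matches[0] if matches else term


def expand(terms):
    """Keep each typed term, add its corrected form, then add related terms."""
    expanded = []
    for term in terms:
        original = term.lower()
        fixed = correct(original)
        for candidate in [original, fixed, *TAXONOMY.get(fixed, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand(["cofee", "tea"]))
# -> ['cofee', 'coffee', 'espresso', 'latte', 'tea', 'chai']
```

The expanded term list is what would then be sent to the search engine (INDEX).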
introducing   LUCENE
Ways of storing fields of any document:
- Indexed: the field is searchable.
- Stored: the content can be displayed in the search results, even if you choose not to make the field searchable. Example: "summary associated with a page".
- Tokenized: the field is run through an Analyzer, which converts the content into a sequence of tokens.
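To make the three flags concrete, here is a small Python model of them. It is not the Lucene API: the `Field` class, the `add_document` helper and the whitespace "analyzer" are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    value: str
    indexed: bool = True    # searchable
    stored: bool = True     # can be shown in search results
    tokenized: bool = True  # run through an analyzer before indexing


def add_document(doc_id, fields, index, store):
    """Apply each field's flags: fill the inverted index and/or the stored values."""
    for field in fields:
        if field.indexed:
            # Stand-in analyzer: lowercase and split on whitespace.
            terms = field.value.lower().split() if field.tokenized else [field.value]
            for term in terms:
                index.setdefault((field.name, term), set()).add(doc_id)
        if field.stored:
            store.setdefault(doc_id, {})[field.name] = field.value


index, store = {}, {}
add_document(
    1,
    [
        Field("body", "Enterprise search intro", stored=False),  # searchable only
        Field("summary", "A short intro", indexed=False),        # display only
        Field("id", "DOC-1", tokenized=False),                   # indexed as one token
    ],
    index,
    store,
)
print(sorted(index))
print(store)
```

Here `body` can be searched but not displayed, `summary` can be displayed but not searched, and `id` is indexed as a single exact value.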
introducing SOLR
(diagram: Solr, built on Lucene, which reads and writes the Index)
Adding Documents to SOLR
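One way to add documents is Solr's XML update message, an `<add>` element wrapping one `<doc>` per document. The sketch below only builds that message; the field names (id, name, price) and the helper `build_add_xml` are illustrative. Sending it to an update URL such as http://localhost:8983/solr/update is an assumption based on the host used elsewhere in these slides, so nothing is actually sent here.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Build an <add> update message with one <doc> element per dictionary."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


payload = build_add_xml(
    [{"id": "1", "name": "Apple 60 GB iPod with Video", "price": 399.0}]
)
print(payload)
```

After posting such a message, a `<commit/>` message is normally needed before the new documents show up in search results.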
schema.xml: field indexing and display definition
solrconfig.xml: defines cache size, faceted field type, request handler customization
Deleting Documents
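The details of this slide did not survive, so the following is only a hedged sketch based on Solr's XML update syntax, which supports deleting by unique id and deleting by query. The helper names and the example query `name:video` are made up for illustration, and the messages are built but not sent.

```python
import xml.etree.ElementTree as ET


def build_delete_by_id(doc_id):
    """Build a message that deletes the document with the given unique id."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = str(doc_id)
    return ET.tostring(delete, encoding="unicode")


def build_delete_by_query(query):
    """Build a message that deletes every document matching the query."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")


print(build_delete_by_id(1))
# -> <delete><id>1</id></delete>
print(build_delete_by_query("name:video"))
# -> <delete><query>name:video</query></delete>
```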
Search Results
http://localhost:8983/solr/select?q=video&start=0&rows=2&fl=name,price
Default Parameters
http://localhost:8983/solr/select?q=video&start=0&rows=2&fl=name,price

param | default  | description
q     |          | The query
start | 0        | Offset into the list of matches
rows  | 10       | Number of documents to return
fl    | *        | Stored fields to return
qt    | standard | Query type; maps to query handler
df    | (schema) | Default field to search
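A short sketch of how the query URL could be assembled from these parameters with Python's standard library. The helper `build_select_url` is hypothetical; its keyword defaults simply mirror the defaults listed for start, rows and fl.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"


def build_select_url(q, start=0, rows=10, fl="*"):
    """Build a select URL; unspecified parameters fall back to the listed defaults."""
    params = {"q": q, "start": start, "rows": rows, "fl": fl}
    # Keep "," and "*" readable instead of percent-encoding them.
    return SOLR_SELECT + "?" + urlencode(params, safe=",*")


print(build_select_url("video", rows=2, fl="name,price"))
# -> http://localhost:8983/solr/select?q=video&start=0&rows=2&fl=name,price
```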
<response>
  <responseHeader>
    <status>0</status>
    <QTime>1</QTime>
  </responseHeader>
  <result numFound="16173" start="0">
    <doc>
      <str name="name">Apple 60 GB iPod with Video</str>
      <float name="price">399.0</float>
    </doc>
    <doc>
      <str name="name">ASUS Extreme N7800GTX/2DHTV</str>
      <float name="price">479.95</float>
    </doc>
  </result>
</response>
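A client might read such a response as sketched below, using Python's standard `xml.etree.ElementTree`. The function name `parse_results` is an assumption, and only the `str` and `float` element types that appear in this response are handled.

```python
import xml.etree.ElementTree as ET

RESPONSE = """<response>
  <responseHeader><status>0</status><QTime>1</QTime></responseHeader>
  <result numFound="16173" start="0">
    <doc>
      <str name="name">Apple 60 GB iPod with Video</str>
      <float name="price">399.0</float>
    </doc>
    <doc>
      <str name="name">ASUS Extreme N7800GTX/2DHTV</str>
      <float name="price">479.95</float>
    </doc>
  </result>
</response>"""


def parse_results(xml_text):
    """Return (total number of matches, list of documents as dictionaries)."""
    root = ET.fromstring(xml_text)
    result = root.find("result")
    docs = []
    for doc in result.findall("doc"):
        fields = {}
        for child in doc:
            value = child.text
            if child.tag == "float":
                value = float(value)
            fields[child.get("name")] = value
        docs.append(fields)
    return int(result.get("numFound")), docs


total, docs = parse_results(RESPONSE)
print(total, "matches,", len(docs), "returned")
for doc in docs:
    print(doc["name"], doc["price"])
```

Note that `numFound` is the total number of matches, while the number of `<doc>` elements is limited by the rows parameter.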
Solr architecture (diagram labels):
- HTTP Request Servlet: search requests hit here
- Update Servlet: new documents to be added here
- Admin Interface, Standard Request Handler, Disjunction Max Request Handler, Custom Request Handler, XML Response Writer
- XML Update Interface, Update Handler
- Solr Core: Config, Schema, Analysis, Caching, Concurrency, Replication
- Lucene
 

Introduction to Search Engines
