Lucene
THE POWERFUL INFORMATION RETRIEVAL LIBRARY
What is Lucene?
• Lucene is a high-performance, scalable information retrieval (IR) library.
• Lucene is just a software library, a toolkit.
• A number of full-featured search applications have been built on top of Lucene.
• Lucene was written by Doug Cutting.
• Beyond Lucene's core JAR are a number of extension modules that offer useful add-on functionality. Some of these, like the spellchecker and highlighter modules, are vital to almost all applications.
Components of Search
• Indexing
  • Acquire Content
  • Build Document
  • Analyze Document
• Searching
  • Build Query
  • Search Query
  • Render Results
[Figure: Components of Search. Diagram boxes: Search User Interface, Build Query, Run Query, Render Results, Index, Index Doc, Analyze Doc, Build Doc, Acquire Content, Raw Content.]
Building Index - Introduction
• Lucene indexes data as an inverted index.
• What is an inverted index? What does it look like?
• Lucene stores indexed data in files grouped into segments.
• What is inside these segments?
• Lucene has a flexible schema.
• Documents and Fields in Lucene
• De-normalization
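As a rough illustration of the inverted-index idea, the sketch below maps each term to the IDs of the documents that contain it. It is a toy in-memory structure, not Lucene's segment format; the function name `build_inverted_index` and the sample documents are assumptions made for this example.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        # Naive analysis: lowercase and split on whitespace.
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

docs = [
    "Lucene is a search library",
    "Solr is built on Lucene",
    "a library of books",
]
index = build_inverted_index(docs)
print(index["lucene"])   # [0, 1]
print(index["library"])  # [0, 2]
```

Looking up a term is a single dictionary access, which is why search over an inverted index does not need to scan every document.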
Building Index – Indexing Process
• Extracting text and creating the document
• Analysis
• Adding to the index
[Figure: Build Doc → Analyze Doc → Index]
Building Index – Indexing Utils
• Indexing Operations
  • Add
  • Delete
  • Update
• Various Field Types
• Boosting documents and fields
• Optimize Index
• Concurrency, thread safety, and locking issues
• Index Commits
• Merging
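The add, delete, and update operations can be sketched with a toy in-memory index. In Lucene an update is carried out as a delete of the matching documents followed by an add, and the sketch mirrors that behaviour. The class `ToyIndex` and its methods are hypothetical stand-ins, not Lucene's `IndexWriter` API.

```python
class ToyIndex:
    """In-memory stand-in for an index writer; not Lucene's API."""

    def __init__(self):
        self.docs = []  # each document is a dict of field name -> value

    def add(self, doc):
        self.docs.append(doc)

    def delete(self, field, value):
        """Remove every document whose field equals value; return how many."""
        before = len(self.docs)
        self.docs = [d for d in self.docs if d.get(field) != value]
        return before - len(self.docs)

    def update(self, field, value, new_doc):
        """Replace matching documents: delete first, then add."""
        self.delete(field, value)
        self.add(new_doc)

index = ToyIndex()
index.add({"id": "1", "title": "Lucene basics"})
index.add({"id": "2", "title": "Solr basics"})
index.update("id", "1", {"id": "1", "title": "Lucene in depth"})
index.delete("id", "2")
print(index.docs)  # [{'id': '1', 'title': 'Lucene in depth'}]
```

Because an update is a delete plus an add, each document needs a field that identifies it uniquely (here the assumed `id` field).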
Search over Index
• Search Introduction
• Lucene Query Modeling
• Search queries and their parsers
• Paging and Sorting Results
• Understanding Lucene scoring
[Figure: search-side components. Diagram boxes: Search User Interface, Build Query, Run Query, Render Results.]
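To make scoring and paging concrete, here is a minimal sketch that ranks documents with a simplified TF-IDF score and returns one page of results. It is only an illustration of the idea: Lucene's own similarity implementations are more involved, and the function `tf_idf_search`, its parameters, and the sample documents are assumptions for this example.

```python
import math
from collections import Counter

def tf_idf_search(docs, query, page=0, page_size=2):
    """Rank docs by a simplified TF-IDF score; return one page of (doc_id, score)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    scores = []
    for doc_id, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for t in tokenized if term in t)  # document frequency
            if df == 0 or counts[term] == 0:
                continue
            idf = math.log(n / df) + 1.0  # rarer terms weigh more
            score += counts[term] * idf
        if score > 0:
            scores.append((doc_id, score))
    # Highest score first; ties broken by document ID for stable paging.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    start = page * page_size
    return scores[start:start + page_size]

docs = [
    "lucene search library",
    "lucene indexing",
    "cooking recipes",
    "search engines",
]
page0 = tf_idf_search(docs, "lucene search", page=0)
page1 = tf_idf_search(docs, "lucene search", page=1)
print([doc_id for doc_id, _ in page0])  # [0, 1]
print([doc_id for doc_id, _ in page1])  # [3]
```

Document 0 matches both query terms and ranks first; documents 1 and 3 each match one term and tie, so the stable tie-break decides which page they land on.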
Analysis Process
• Default Analyzers
• How Analyzers work
• Writing a custom analyzer
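An analyzer can be thought of as a tokenizer followed by a chain of token filters. The sketch below shows that shape with a lowercase filter and a stop-word filter; a custom analyzer is then just a different filter chain. All names here (`tokenize`, `lowercase`, `remove_stop_words`, `analyze`, `STOP_WORDS`) are hypothetical and do not correspond to Lucene classes.

```python
import re

STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to"}

def tokenize(text):
    """Split text into runs of letters and digits."""
    return re.findall(r"[A-Za-z0-9]+", text)

def lowercase(tokens):
    return [t.lower() for t in tokens]

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

def analyze(text, filters=(lowercase, remove_stop_words)):
    """Tokenize, then pass the tokens through each filter in order."""
    tokens = tokenize(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens

print(analyze("The Quick Brown Fox is a Friend of Lucene"))
# ['quick', 'brown', 'fox', 'friend', 'lucene']

# A "custom analyzer" that keeps stop words: only the lowercase filter.
print(analyze("The Fox", filters=(lowercase,)))
# ['the', 'fox']
```

The same analysis should be applied at index time and at query time, otherwise query terms may not match the terms stored in the index.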
Lucene Extras
• Codecs
  • The Codec API allows you to customise the way index information is stored.
  • Example: SimpleTextCodec
• Faceting
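Faceting groups search hits by the values of a field and reports a count per value, for example how many hits fall into each category. The sketch below shows only that counting step on plain dictionaries; the function `facet_counts` and the `category` field are assumptions for this example, not part of Lucene's facet module.

```python
from collections import Counter

def facet_counts(hits, field):
    """Count how many hits carry each value of the given field."""
    return Counter(doc[field] for doc in hits if field in doc)

hits = [
    {"title": "Lucene in Action", "category": "search"},
    {"title": "Solr Cookbook", "category": "search"},
    {"title": "Java Concurrency", "category": "java"},
    {"title": "Untitled draft"},  # no category: ignored by the facet
]
print(facet_counts(hits, "category").most_common())
# [('search', 2), ('java', 1)]
```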
