SlideShare a Scribd company logo
Apache Lucene
Full text search
Marcelo
What’s that?
 API created on 00’s
 Apache owns that
 Indexing
 Searching
 Available on Java, .NET, C++
Why is that so good?
 Enhance user experience
 More inteligent products
 Speed processing
 Relevance
 Efficient
 Suggestions
Indexing
 IndexWritter
1. Directory implementation
2. Analizer
 Create documents
 Add these document to IndexWritter
 Optimize (merge segments)
 Close writter
Indexing
Searching
 Directory
 IndexSearcher
 QueryParser
 Query(“my search”)
 TopDocs
Searching
How does that work?
 Inverted Index
 Term Normalization
1. Similar words (merge)
2. Stop words (remove)
3. +relevance –size on disk
Term Document Ids
And 1,2,3
Big 2,4,7
Fire 1
Keep 7,8
keeper 3,4
the 1,8
Analyzers
“@Andy52 went to school yesterday!”
 StandardAnalyzer
[@Andy52] [went] [school] [yesterday!]
 StopAnalyzer
[Andy] [went] [school] [yesterday]
 SimpleAnalyzer
[andy] [went] [to] [school] [yesterday]
 WhitespaceAnalyzer
[@Andy52] [went] [to] [school] [yesterday]
 KeywordAnalyzer
[@Andy52 went to school yesterday!]
What known apps use that?
 Twitter
 Linked In
 My Space
That’s all, thanks!

More Related Content

Similar to Apache lucene - full text search

Advanced full text searching techniques using Lucene
Advanced full text searching techniques using LuceneAdvanced full text searching techniques using Lucene
Advanced full text searching techniques using Lucene
Asad Abbas
 
Apache lucene
Apache luceneApache lucene
Apache lucene
Dr. Abhiram Gandhe
 
Lucene Bootcamp -1
Lucene Bootcamp -1 Lucene Bootcamp -1
Lucene Bootcamp -1
GokulD
 
BigData Search Simplified with ElasticSearch
BigData Search Simplified with ElasticSearchBigData Search Simplified with ElasticSearch
BigData Search Simplified with ElasticSearch
TO THE NEW | Technology
 
Lucene Introduction
Lucene IntroductionLucene Introduction
Lucene Introduction
otisg
 
Enterprise Search in SharePoint 2013
Enterprise Search in SharePoint 2013Enterprise Search in SharePoint 2013
Enterprise Search in SharePoint 2013
Findwise
 
Tutorial 5 (lucene)
Tutorial 5 (lucene)Tutorial 5 (lucene)
Tutorial 5 (lucene)
Kira
 
ProjectHub
ProjectHubProjectHub
Episerver and search engines
Episerver and search enginesEpiserver and search engines
Episerver and search engines
Mikko Huilaja
 
Intro to Elasticsearch
Intro to ElasticsearchIntro to Elasticsearch
Intro to Elasticsearch
Clifford James
 
Elastic & Azure & Episever, Case Evira
Elastic & Azure & Episever, Case EviraElastic & Azure & Episever, Case Evira
Elastic & Azure & Episever, Case Evira
Mikko Huilaja
 
Search enabled applications with lucene.net
Search enabled applications with lucene.netSearch enabled applications with lucene.net
Search enabled applications with lucene.net
Willem Meints
 
Full Text Search with Lucene
Full Text Search with LuceneFull Text Search with Lucene
Full Text Search with Lucene
WO Community
 
ElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learnedElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learned
BeyondTrees
 
Introduction To Apache Lucene
Introduction To Apache LuceneIntroduction To Apache Lucene
Introduction To Apache Lucene
Mindfire Solutions
 
Building a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engineBuilding a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engine
Trey Grainger
 
Intro to Apache Lucene and Solr
Intro to Apache Lucene and SolrIntro to Apache Lucene and Solr
Intro to Apache Lucene and Solr
Grant Ingersoll
 
In search of: A meetup about Liferay and Search 2016-04-20
In search of: A meetup about Liferay and Search   2016-04-20In search of: A meetup about Liferay and Search   2016-04-20
In search of: A meetup about Liferay and Search 2016-04-20
Tibor Lipusz
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Trey Grainger
 
Intro to elasticsearch
Intro to elasticsearchIntro to elasticsearch
Intro to elasticsearch
Joey Wen
 

Similar to Apache lucene - full text search (20)

Advanced full text searching techniques using Lucene
Advanced full text searching techniques using LuceneAdvanced full text searching techniques using Lucene
Advanced full text searching techniques using Lucene
 
Apache lucene
Apache luceneApache lucene
Apache lucene
 
Lucene Bootcamp -1
Lucene Bootcamp -1 Lucene Bootcamp -1
Lucene Bootcamp -1
 
BigData Search Simplified with ElasticSearch
BigData Search Simplified with ElasticSearchBigData Search Simplified with ElasticSearch
BigData Search Simplified with ElasticSearch
 
Lucene Introduction
Lucene IntroductionLucene Introduction
Lucene Introduction
 
Enterprise Search in SharePoint 2013
Enterprise Search in SharePoint 2013Enterprise Search in SharePoint 2013
Enterprise Search in SharePoint 2013
 
Tutorial 5 (lucene)
Tutorial 5 (lucene)Tutorial 5 (lucene)
Tutorial 5 (lucene)
 
ProjectHub
ProjectHubProjectHub
ProjectHub
 
Episerver and search engines
Episerver and search enginesEpiserver and search engines
Episerver and search engines
 
Intro to Elasticsearch
Intro to ElasticsearchIntro to Elasticsearch
Intro to Elasticsearch
 
Elastic & Azure & Episever, Case Evira
Elastic & Azure & Episever, Case EviraElastic & Azure & Episever, Case Evira
Elastic & Azure & Episever, Case Evira
 
Search enabled applications with lucene.net
Search enabled applications with lucene.netSearch enabled applications with lucene.net
Search enabled applications with lucene.net
 
Full Text Search with Lucene
Full Text Search with LuceneFull Text Search with Lucene
Full Text Search with Lucene
 
ElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learnedElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learned
 
Introduction To Apache Lucene
Introduction To Apache LuceneIntroduction To Apache Lucene
Introduction To Apache Lucene
 
Building a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engineBuilding a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engine
 
Intro to Apache Lucene and Solr
Intro to Apache Lucene and SolrIntro to Apache Lucene and Solr
Intro to Apache Lucene and Solr
 
In search of: A meetup about Liferay and Search 2016-04-20
In search of: A meetup about Liferay and Search   2016-04-20In search of: A meetup about Liferay and Search   2016-04-20
In search of: A meetup about Liferay and Search 2016-04-20
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solr
 
Intro to elasticsearch
Intro to elasticsearchIntro to elasticsearch
Intro to elasticsearch
 

More from Marcelo Cure

Api design
Api designApi design
Api design
Marcelo Cure
 
Zero mq
Zero mqZero mq
Zero mq
Marcelo Cure
 
Dev ops engineering and chatbots
Dev ops engineering and chatbotsDev ops engineering and chatbots
Dev ops engineering and chatbots
Marcelo Cure
 
Versioning APIs
Versioning APIsVersioning APIs
Versioning APIs
Marcelo Cure
 
Building restful ap is with harvester js
Building restful ap is with harvester jsBuilding restful ap is with harvester js
Building restful ap is with harvester js
Marcelo Cure
 
Cqrs, event sourcing and microservices
Cqrs, event sourcing and microservicesCqrs, event sourcing and microservices
Cqrs, event sourcing and microservices
Marcelo Cure
 
Immutability and immutable js
Immutability and immutable jsImmutability and immutable js
Immutability and immutable js
Marcelo Cure
 
Functional programming with python
Functional programming with pythonFunctional programming with python
Functional programming with python
Marcelo Cure
 
Polymer
PolymerPolymer
Polymer
Marcelo Cure
 
Hexagonal Architecture
Hexagonal ArchitectureHexagonal Architecture
Hexagonal Architecture
Marcelo Cure
 
What's the value of the metrics
What's the value of the metricsWhat's the value of the metrics
What's the value of the metrics
Marcelo Cure
 
Scala
ScalaScala
SciPy - Scientific Computing Tool
SciPy - Scientific Computing ToolSciPy - Scientific Computing Tool
SciPy - Scientific Computing Tool
Marcelo Cure
 
Test driven development
Test driven developmentTest driven development
Test driven development
Marcelo Cure
 
Usability testing
Usability testingUsability testing
Usability testing
Marcelo Cure
 
Corona
CoronaCorona
Corona
Marcelo Cure
 

More from Marcelo Cure (16)

Api design
Api designApi design
Api design
 
Zero mq
Zero mqZero mq
Zero mq
 
Dev ops engineering and chatbots
Dev ops engineering and chatbotsDev ops engineering and chatbots
Dev ops engineering and chatbots
 
Versioning APIs
Versioning APIsVersioning APIs
Versioning APIs
 
Building restful ap is with harvester js
Building restful ap is with harvester jsBuilding restful ap is with harvester js
Building restful ap is with harvester js
 
Cqrs, event sourcing and microservices
Cqrs, event sourcing and microservicesCqrs, event sourcing and microservices
Cqrs, event sourcing and microservices
 
Immutability and immutable js
Immutability and immutable jsImmutability and immutable js
Immutability and immutable js
 
Functional programming with python
Functional programming with pythonFunctional programming with python
Functional programming with python
 
Polymer
PolymerPolymer
Polymer
 
Hexagonal Architecture
Hexagonal ArchitectureHexagonal Architecture
Hexagonal Architecture
 
What's the value of the metrics
What's the value of the metricsWhat's the value of the metrics
What's the value of the metrics
 
Scala
ScalaScala
Scala
 
SciPy - Scientific Computing Tool
SciPy - Scientific Computing ToolSciPy - Scientific Computing Tool
SciPy - Scientific Computing Tool
 
Test driven development
Test driven developmentTest driven development
Test driven development
 
Usability testing
Usability testingUsability testing
Usability testing
 
Corona
CoronaCorona
Corona
 

Recently uploaded

Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 

Recently uploaded (20)

Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 

Apache lucene - full text search

  • 1. Apache Lucene Full text search Marcelo
  • 2. What’s that?  API created on 00’s  Apache owns that  Indexing  Searching  Available on Java, .NET, C++
  • 3. Why is that so good?  Enhance user experience  More inteligent products  Speed processing  Relevance  Efficient  Suggestions
  • 4. Indexing  IndexWritter 1. Directory implementation 2. Analizer  Create documents  Add these document to IndexWritter  Optimize (merge segments)  Close writter
  • 6. Searching  Directory  IndexSearcher  QueryParser  Query(“my search”)  TopDocs
  • 8. How does that work?  Inverted Index  Term Normalization 1. Similar words (merge) 2. Stop words (remove) 3. +relevance –size on disk Term Document Ids And 1,2,3 Big 2,4,7 Fire 1 Keep 7,8 keeper 3,4 the 1,8
  • 9. Analyzers “@Andy52 went to school yesterday!”  StandardAnalyzer [@Andy52] [went] [school] [yesterday!]  StopAnalyzer [Andy] [went] [school] [yesterday]  SimpleAnalyzer [andy] [went] [to] [school] [yesterday]  WhitespaceAnalyzer [@Andy52] [went] [to] [school] [yesterday]  KeywordAnalyzer [@Andy52 went to school yesterday!]
  • 10. What known apps use that?  Twitter  Linked In  My Space