Lucene in Action
Using Lucene to build high-performance systems
Evgeny Gavrilenko
Lead developer, Artezio
Lucene
• What is it?
• Twitter: 1 billion queries per day
• hh.ru: 400 queries per second
• LinkedIn, FedEx…
Core indexing components
• IndexWriter
• Directory (FSDirectory, RAMDirectory)
• Analyzer
• Document
• Field / Multivalued fields
Building an index
// An in-memory index; switch to FSDirectory to keep the index on disk.
var directory = new RAMDirectory();
//var directory = FSDirectory.Open("/tmp/testindex");
var analyzer = new RussianAnalyzer(Version.LUCENE_30);
using (var writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
{
    for (var i = 0; i < 1000000; i++)
    {
        var doc = new Document();
        // Stored and indexed as a single untokenized term, without norms.
        doc.Add(new Field("id", i.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
        // Adding two values under the same name makes "text" a multivalued field.
        // The Russian sample text ("строка" means "line") exercises RussianAnalyzer.
        doc.Add(new Field("text", string.Format("{0} строка.", i), Field.Store.YES, Field.Index.ANALYZED));
        doc.Add(new Field("text", string.Format("{0} строка 2.", i), Field.Store.YES, Field.Index.ANALYZED));
        writer.AddDocument(doc);
        if (i % 100000 == 0)
            Console.WriteLine("[{1}]: {0} documents saved.", i, DateTime.Now);
    }
    // Merge the index segments once bulk indexing is finished.
    writer.Optimize();
}
Data schema
// Lucene has no fixed schema: documents in the same index may carry different fields.
// i and value are assumed to be defined by the surrounding code.
var doc1 = new Document();
doc1.Add(new Field("id", i.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
doc1.Add(new Field("text", string.Format("{0} строка.", i), Field.Store.YES, Field.Index.ANALYZED));
// Numeric field: indexed for range queries, not stored.
var field = new NumericField("numericField1", Field.Store.NO, true);
doc1.Add(field.SetDoubleValue(value));
var doc2 = new Document();
doc2.Add(new Field("id", i.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
doc2.Add(new Field("text", string.Format("{0} строка.", i), Field.Store.YES, Field.Index.ANALYZED));
doc2.Add(new Field("blablaFild1", "blabla-body", Field.Store.YES, Field.Index.ANALYZED));
Core search components
• IndexSearcher/MultiSearcher/ParallelMultiSearcher
• Term
• Query
• TermQuery
• TopDocs
Query
• TermQuery
• MultiFieldQueryParser
• BooleanQuery
• NumericRangeQuery
• SpanQuery
• …
• QueryParser
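The query types listed above can also be built in code instead of going through QueryParser. The following is a minimal sketch, assuming Lucene.Net 3.0.x and the "text" and "numericField1" fields used elsewhere in these slides; the class name QueryExamples, the term "lucene", and the range bounds are illustrative assumptions, not part of the original talk.

```csharp
using System;
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class QueryExamples
{
    // Combines an exact-term match with a numeric range; both clauses are required.
    public static BooleanQuery Build()
    {
        // TermQuery matches one exact term; the term is not passed through an analyzer.
        var byText = new TermQuery(new Term("text", "lucene"));

        // Inclusive range over a field that was indexed as a NumericField.
        var byNumber = NumericRangeQuery.NewDoubleRange("numericField1", 10.0, 100.0, true, true);

        var query = new BooleanQuery();
        query.Add(byText, Occur.MUST);
        query.Add(byNumber, Occur.MUST);
        return query;
    }

    public static void Main()
    {
        // Prints the query in Lucene's textual query syntax.
        Console.WriteLine(Build());
    }
}
```

Because TermQuery bypasses analysis, the term has to be given in the form the analyzer produced at indexing time (for example lowercased or stemmed).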
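Searching
// Open the index read-only.
var reader = IndexReader.Open(directory, true);
var searcher = new IndexSearcher(reader);
// Parse with the same analyzer that was used for indexing; "text" is the default field.
var parser = new QueryParser(Version.LUCENE_30, "text", analyzer);
// "строку" is an inflected form of "строка"; RussianAnalyzer stems terms, so it should still match.
var query = parser.Parse("20 строку");
// Fetch the top 100 hits.
var hits = searcher.Search(query, 100);
Console.WriteLine("total hits: {0}", hits.TotalHits);
if (hits.TotalHits == 0) return;
// Load the stored fields of the best-scoring document.
var rdoc = reader.Document(hits.ScoreDocs[0].Doc);
Console.WriteLine("value:{0}", rdoc.Get("text"));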
Searching with sorting
// sl is the name of the field to sort by; indexDir is the "reverse order" flag.
switch (sl)
{
    case "barcode":
    case "code":
        indexSort = new Sort(new SortField(sl, SortField.STRING, indexDir));
        break;
    case "price":
        indexSort = new Sort(new SortField(sl, SortField.DOUBLE, indexDir));
        break;
    default:
        indexSort = new Sort(new SortField(sl, SortField.STRING, indexDir));
        break;
}
...
// Keep computing document scores when sorting by field, but skip the max score.
searcher.SetDefaultFieldSortScoring(true, false);
var hits = searcher.Search(query, filter, count, indexSort);
Paging
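The slide gives no code for paging. Since Search(query, n) returns only the top n hits, one common approach (an assumption here, not necessarily the one shown in the talk) is to request enough hits to cover the requested page and then cut the page out of the ranked list. The sketch below shows only that arithmetic in plain C#; the Paging class and its method names are hypothetical helpers, and a list of integers stands in for hits.ScoreDocs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Paging
{
    // Number of top hits to request so that the zero-based page is fully covered.
    public static int HitsToRequest(int page, int pageSize)
    {
        if (page < 0) throw new ArgumentOutOfRangeException("page");
        if (pageSize <= 0) throw new ArgumentOutOfRangeException("pageSize");
        return checked((page + 1) * pageSize);
    }

    // Cuts the requested page out of a ranked hit list (for example hits.ScoreDocs).
    public static List<T> Slice<T>(IList<T> rankedHits, int page, int pageSize)
    {
        HitsToRequest(page, pageSize); // reuses the argument validation
        return rankedHits.Skip(page * pageSize).Take(pageSize).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // 25 fake document ids stand in for the ranked search hits.
        var docIds = Enumerable.Range(0, 25).ToList();
        Console.WriteLine(Paging.HitsToRequest(2, 10));                    // 30
        Console.WriteLine(string.Join(",", Paging.Slice(docIds, 2, 10))); // 20,21,22,23,24
    }
}
```

With the searcher from the sorting slide, this would amount to calling searcher.Search(query, filter, Paging.HitsToRequest(page, pageSize), indexSort) and slicing hits.ScoreDocs. The cost grows with the page number, so deep pages get progressively more expensive.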
Анализаторы
• StandardAnalyzer
• SnowballAnalyzer
• KeywordAnalyzer
• WhitespaceAnalyzer
• RussianAnalyzer
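To see how these analyzers differ, it helps to print the tokens each one produces for the same input. The following is a minimal sketch, assuming Lucene.Net 3.0.x; the AnalyzerDemo class, the Tokens helper, and the sample sentence are illustrative assumptions. RussianAnalyzer and SnowballAnalyzer ship in the separate contrib analyzers package and can be passed to the same helper.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.Tokenattributes;
using Version = Lucene.Net.Util.Version;

public static class AnalyzerDemo
{
    // Runs the analyzer over the text and yields every term it emits.
    public static IEnumerable<string> Tokens(Analyzer analyzer, string text)
    {
        using (TokenStream stream = analyzer.TokenStream("text", new StringReader(text)))
        {
            var term = stream.AddAttribute<ITermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
                yield return term.Term;
        }
    }

    public static void Main()
    {
        const string sample = "Quick-Brown fox";
        var analyzers = new Analyzer[]
        {
            new WhitespaceAnalyzer(),                // splits on whitespace only
            new KeywordAnalyzer(),                   // keeps the whole input as one token
            new StandardAnalyzer(Version.LUCENE_30)  // grammar-based tokenizer, lowercasing, stop words
        };
        foreach (var analyzer in analyzers)
            Console.WriteLine("{0}: [{1}]", analyzer.GetType().Name,
                string.Join("] [", Tokens(analyzer, sample)));
    }
}
```

Whatever analyzer is chosen for indexing should normally also be used when parsing queries, as the search slide does by passing the same analyzer to QueryParser.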
Use in e-commerce
[Diagram components: Ecommerce DB, Service/Daemon, Lucene Index, search service, Search backend]
Linq to Lucene
public class Article
{
    [Field(Analyzer = typeof(StandardAnalyzer))]
    public string Author { get; set; }

    [Field(Analyzer = typeof(StandardAnalyzer))]
    public string Title { get; set; }

    public DateTimeOffset PublishDate { get; set; }

    [NumericField]
    public long Id { get; set; }

    [Field(IndexMode.NotIndexed, Store = StoreMode.Yes)]
    public string BodyText { get; set; }

    [Field("text", Store = StoreMode.No, Analyzer = typeof(PorterStemAnalyzer))]
    public string SearchText
    {
        get { return string.Join(" ", new[] {Author, Title, BodyText}); }
    }
}
Linq to Lucene
var directory = new RAMDirectory();
var provider = new LuceneDataProvider(directory, Version.LUCENE_30);
using (var session = provider.OpenSession<Article>())
{
    session.Add(new Article {Author = "John Doe", BodyText = "some body text", PublishDate = DateTimeOffset.UtcNow});
}
var articles = provider.AsQueryable<Article>();
var threshold = DateTimeOffset.UtcNow.Subtract(TimeSpan.FromDays(30));
var articlesByJohn = from a in articles
                     where a.Author == "John Doe" && a.PublishDate > threshold
                     orderby a.Title
                     select a;
Console.WriteLine("Articles by John Doe: " + articlesByJohn.Count());
var searchResults = from a in articles
                    where a.SearchText == "some search query"
                    select a;
Console.WriteLine("Search Results: " + searchResults.Count());
Useful resources
• Lucene http://lucene.apache.org/
• Lucene.Net http://lucenenet.apache.org
• Linq to Lucene https://github.com/themotleyfool/Lucene.Net.Linq
• “Lucene in Action” http://it-ebooks.info/book/2112

More Related Content

What's hot

Dapper performance
Dapper performanceDapper performance
Dapper performance
Suresh Loganatha
 
Ingesting and Manipulating Data with JavaScript
Ingesting and Manipulating Data with JavaScriptIngesting and Manipulating Data with JavaScript
Ingesting and Manipulating Data with JavaScript
Lucidworks
 
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with FlinkSanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Flink Forward
 
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
Lucidworks
 
SH 2 - SES 3 - MongoDB Aggregation Framework.pptx
SH 2 - SES 3 -  MongoDB Aggregation Framework.pptxSH 2 - SES 3 -  MongoDB Aggregation Framework.pptx
SH 2 - SES 3 - MongoDB Aggregation Framework.pptx
MongoDB
 
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Lucidworks
 
Elasticsearch - DevNexus 2015
Elasticsearch - DevNexus 2015Elasticsearch - DevNexus 2015
Elasticsearch - DevNexus 2015
Roy Russo
 
Indexing and Query Optimizer (Mongo Austin)
Indexing and Query Optimizer (Mongo Austin)Indexing and Query Optimizer (Mongo Austin)
Indexing and Query Optimizer (Mongo Austin)
MongoDB
 
Introduction to solr
Introduction to solrIntroduction to solr
Introduction to solr
Sematext Group, Inc.
 
Avro introduction
Avro introductionAvro introduction
Avro introduction
Nanda8904648951
 
Tapping into Scientific Data with Hadoop and Flink
Tapping into Scientific Data with Hadoop and FlinkTapping into Scientific Data with Hadoop and Flink
Tapping into Scientific Data with Hadoop and Flink
Michael Häusler
 
Solr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksSolr Indexing and Analysis Tricks
Solr Indexing and Analysis Tricks
Erik Hatcher
 
ElasticSearch - DevNexus Atlanta - 2014
ElasticSearch - DevNexus Atlanta - 2014ElasticSearch - DevNexus Atlanta - 2014
ElasticSearch - DevNexus Atlanta - 2014
Roy Russo
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
Erik Hatcher
 
Elasticsearch in 15 minutes
Elasticsearch in 15 minutesElasticsearch in 15 minutes
Elasticsearch in 15 minutes
David Pilato
 
ElasticSearch Basics
ElasticSearch BasicsElasticSearch Basics
ElasticSearch Basics
Amresh Singh
 
Webinar: What's New in Solr 6
Webinar: What's New in Solr 6Webinar: What's New in Solr 6
Webinar: What's New in Solr 6
Lucidworks
 
Indexing & Query Optimization
Indexing & Query OptimizationIndexing & Query Optimization
Indexing & Query Optimization
MongoDB
 
Solr 6 Feature Preview
Solr 6 Feature PreviewSolr 6 Feature Preview
Solr 6 Feature Preview
Yonik Seeley
 
Hacking Lucene for Custom Search Results
Hacking Lucene for Custom Search ResultsHacking Lucene for Custom Search Results
Hacking Lucene for Custom Search Results
OpenSource Connections
 

What's hot (20)

Dapper performance
Dapper performanceDapper performance
Dapper performance
 
Ingesting and Manipulating Data with JavaScript
Ingesting and Manipulating Data with JavaScriptIngesting and Manipulating Data with JavaScript
Ingesting and Manipulating Data with JavaScript
 
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with FlinkSanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
 
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
 
SH 2 - SES 3 - MongoDB Aggregation Framework.pptx
SH 2 - SES 3 -  MongoDB Aggregation Framework.pptxSH 2 - SES 3 -  MongoDB Aggregation Framework.pptx
SH 2 - SES 3 - MongoDB Aggregation Framework.pptx
 
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
 
Elasticsearch - DevNexus 2015
Elasticsearch - DevNexus 2015Elasticsearch - DevNexus 2015
Elasticsearch - DevNexus 2015
 
Indexing and Query Optimizer (Mongo Austin)
Indexing and Query Optimizer (Mongo Austin)Indexing and Query Optimizer (Mongo Austin)
Indexing and Query Optimizer (Mongo Austin)
 
Introduction to solr
Introduction to solrIntroduction to solr
Introduction to solr
 
Avro introduction
Avro introductionAvro introduction
Avro introduction
 
Tapping into Scientific Data with Hadoop and Flink
Tapping into Scientific Data with Hadoop and FlinkTapping into Scientific Data with Hadoop and Flink
Tapping into Scientific Data with Hadoop and Flink
 
Solr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksSolr Indexing and Analysis Tricks
Solr Indexing and Analysis Tricks
 
ElasticSearch - DevNexus Atlanta - 2014
ElasticSearch - DevNexus Atlanta - 2014ElasticSearch - DevNexus Atlanta - 2014
ElasticSearch - DevNexus Atlanta - 2014
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Elasticsearch in 15 minutes
Elasticsearch in 15 minutesElasticsearch in 15 minutes
Elasticsearch in 15 minutes
 
ElasticSearch Basics
ElasticSearch BasicsElasticSearch Basics
ElasticSearch Basics
 
Webinar: What's New in Solr 6
Webinar: What's New in Solr 6Webinar: What's New in Solr 6
Webinar: What's New in Solr 6
 
Indexing & Query Optimization
Indexing & Query OptimizationIndexing & Query Optimization
Indexing & Query Optimization
 
Solr 6 Feature Preview
Solr 6 Feature PreviewSolr 6 Feature Preview
Solr 6 Feature Preview
 
Hacking Lucene for Custom Search Results
Hacking Lucene for Custom Search ResultsHacking Lucene for Custom Search Results
Hacking Lucene for Custom Search Results
 

Viewers also liked

Database reverse engineering
Database reverse engineeringDatabase reverse engineering
Database reverse engineeringDevOWL Meetup
 
devOWL coffee-break
devOWL coffee-breakdevOWL coffee-break
devOWL coffee-break
DevOWL Meetup
 
SEO basics for developers
SEO basics for developersSEO basics for developers
SEO basics for developers
DevOWL Meetup
 
Startup tactics for developers: A, B, C
Startup tactics for developers: A, B, CStartup tactics for developers: A, B, C
Startup tactics for developers: A, B, C
DevOWL Meetup
 
HR VS DEV
HR VS DEVHR VS DEV
HR VS DEV
DevOWL Meetup
 
Bootstrap3 basics
Bootstrap3 basicsBootstrap3 basics
Bootstrap3 basics
DevOWL Meetup
 
Testing is coming
Testing is comingTesting is coming
Testing is coming
DevOWL Meetup
 
Easily create apps using Phonegap
Easily create apps using PhonegapEasily create apps using Phonegap
Easily create apps using Phonegap
DevOWL Meetup
 
Trainspotting Transporting: RabbitMQ, Akka.NET, Rx, MVI, Cycle.js
Trainspotting Transporting: RabbitMQ, Akka.NET, Rx, MVI, Cycle.jsTrainspotting Transporting: RabbitMQ, Akka.NET, Rx, MVI, Cycle.js
Trainspotting Transporting: RabbitMQ, Akka.NET, Rx, MVI, Cycle.js
DevOWL Meetup
 
Как и зачем мы тестируем UI
Как и зачем мы тестируем UIКак и зачем мы тестируем UI
Как и зачем мы тестируем UI
Vyacheslav Lyalkin
 
ECMAScript 5 Features
ECMAScript 5 FeaturesECMAScript 5 Features
ECMAScript 5 Features
DevOWL Meetup
 
Потоковая репликация PostgreSQL
Потоковая репликация PostgreSQLПотоковая репликация PostgreSQL
Потоковая репликация PostgreSQL
DevOWL Meetup
 
Async Module Definition via RequireJS
Async Module Definition via RequireJSAsync Module Definition via RequireJS
Async Module Definition via RequireJS
DevOWL Meetup
 
AngularJS basics & theory
AngularJS basics & theoryAngularJS basics & theory
AngularJS basics & theory
DevOWL Meetup
 
Miscosoft Singularity - konkurs presentation
Miscosoft Singularity - konkurs presentationMiscosoft Singularity - konkurs presentation
Miscosoft Singularity - konkurs presentationVasilii Diachenko
 
Reactивная тяга
Reactивная тягаReactивная тяга
Reactивная тяга
Vitebsk Miniq
 
Как оценить время на тестирование. Александр Зиновьев, Test Lead Softengi
Как оценить время на тестирование. Александр Зиновьев, Test Lead SoftengiКак оценить время на тестирование. Александр Зиновьев, Test Lead Softengi
Как оценить время на тестирование. Александр Зиновьев, Test Lead Softengi
Softengi
 

Viewers also liked (17)

Database reverse engineering
Database reverse engineeringDatabase reverse engineering
Database reverse engineering
 
devOWL coffee-break
devOWL coffee-breakdevOWL coffee-break
devOWL coffee-break
 
SEO basics for developers
SEO basics for developersSEO basics for developers
SEO basics for developers
 
Startup tactics for developers: A, B, C
Startup tactics for developers: A, B, CStartup tactics for developers: A, B, C
Startup tactics for developers: A, B, C
 
HR VS DEV
HR VS DEVHR VS DEV
HR VS DEV
 
Bootstrap3 basics
Bootstrap3 basicsBootstrap3 basics
Bootstrap3 basics
 
Testing is coming
Testing is comingTesting is coming
Testing is coming
 
Easily create apps using Phonegap
Easily create apps using PhonegapEasily create apps using Phonegap
Easily create apps using Phonegap
 
Trainspotting Transporting: RabbitMQ, Akka.NET, Rx, MVI, Cycle.js
Trainspotting Transporting: RabbitMQ, Akka.NET, Rx, MVI, Cycle.jsTrainspotting Transporting: RabbitMQ, Akka.NET, Rx, MVI, Cycle.js
Trainspotting Transporting: RabbitMQ, Akka.NET, Rx, MVI, Cycle.js
 
Как и зачем мы тестируем UI
Как и зачем мы тестируем UIКак и зачем мы тестируем UI
Как и зачем мы тестируем UI
 
ECMAScript 5 Features
ECMAScript 5 FeaturesECMAScript 5 Features
ECMAScript 5 Features
 
Потоковая репликация PostgreSQL
Потоковая репликация PostgreSQLПотоковая репликация PostgreSQL
Потоковая репликация PostgreSQL
 
Async Module Definition via RequireJS
Async Module Definition via RequireJSAsync Module Definition via RequireJS
Async Module Definition via RequireJS
 
AngularJS basics & theory
AngularJS basics & theoryAngularJS basics & theory
AngularJS basics & theory
 
Miscosoft Singularity - konkurs presentation
Miscosoft Singularity - konkurs presentationMiscosoft Singularity - konkurs presentation
Miscosoft Singularity - konkurs presentation
 
Reactивная тяга
Reactивная тягаReactивная тяга
Reactивная тяга
 
Как оценить время на тестирование. Александр Зиновьев, Test Lead Softengi
Как оценить время на тестирование. Александр Зиновьев, Test Lead SoftengiКак оценить время на тестирование. Александр Зиновьев, Test Lead Softengi
Как оценить время на тестирование. Александр Зиновьев, Test Lead Softengi
 

Similar to Lucene in Action

Lucene Introduction
Lucene IntroductionLucene Introduction
Lucene Introduction
otisg
 
Full Text Search In PostgreSQL
Full Text Search In PostgreSQLFull Text Search In PostgreSQL
Full Text Search In PostgreSQL
Karwin Software Solutions LLC
 
Advanced full text searching techniques using Lucene
Advanced full text searching techniques using LuceneAdvanced full text searching techniques using Lucene
Advanced full text searching techniques using Lucene
Asad Abbas
 
Examiness hints and tips from the trenches
Examiness hints and tips from the trenchesExaminess hints and tips from the trenches
Examiness hints and tips from the trenches
Ismail Mayat
 
Tutorial 5 (lucene)
Tutorial 5 (lucene)Tutorial 5 (lucene)
Tutorial 5 (lucene)
Kira
 
IR with lucene
IR with luceneIR with lucene
IR with lucene
Stelios Gorilas
 
Fast track to lucene
Fast track to luceneFast track to lucene
Fast track to lucene
Marouane Gazanayi
 
Java Search Engine Framework
Java Search Engine FrameworkJava Search Engine Framework
Java Search Engine Framework
Appsterdam Milan
 
Introduction To Apache Lucene
Introduction To Apache LuceneIntroduction To Apache Lucene
Introduction To Apache Lucene
Mindfire Solutions
 
DIY Percolator
DIY PercolatorDIY Percolator
DIY Percolator
jdhok
 
Data Access Options in SharePoint 2010
Data Access Options in SharePoint 2010Data Access Options in SharePoint 2010
Data Access Options in SharePoint 2010
Rob Windsor
 
Elasticsearch first-steps
Elasticsearch first-stepsElasticsearch first-steps
Elasticsearch first-steps
Matteo Moci
 
Infinispan,Lucene,Hibername OGM
Infinispan,Lucene,Hibername OGMInfinispan,Lucene,Hibername OGM
Infinispan,Lucene,Hibername OGM
JBug Italy
 
CouchDB-Lucene
CouchDB-LuceneCouchDB-Lucene
CouchDB-Lucene
Martin Rehfeld
 
Tutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component pluginTutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component plugin
searchbox-com
 
Solr @ Etsy - Apache Lucene Eurocon
Solr @ Etsy - Apache Lucene EuroconSolr @ Etsy - Apache Lucene Eurocon
Solr @ Etsy - Apache Lucene Eurocon
Giovanni Fernandez-Kincade
 
Open Source Search: An Analysis
Open Source Search: An AnalysisOpen Source Search: An Analysis
Open Source Search: An Analysis
Justin Finkelstein
 
Coherence SIG: Advanced usage of indexes in coherence
Coherence SIG: Advanced usage of indexes in coherenceCoherence SIG: Advanced usage of indexes in coherence
Coherence SIG: Advanced usage of indexes in coherence
aragozin
 
Tutorial, Part 3: SharePoint 101: Jump-Starting the Developer by Rob Windsor ...
Tutorial, Part 3: SharePoint 101: Jump-Starting the Developer by Rob Windsor ...Tutorial, Part 3: SharePoint 101: Jump-Starting the Developer by Rob Windsor ...
Tutorial, Part 3: SharePoint 101: Jump-Starting the Developer by Rob Windsor ...
SPTechCon
 
Full Text Search with Lucene
Full Text Search with LuceneFull Text Search with Lucene
Full Text Search with Lucene
WO Community
 

Similar to Lucene in Action (20)

Lucene Introduction
Lucene IntroductionLucene Introduction
Lucene Introduction
 
Full Text Search In PostgreSQL
Full Text Search In PostgreSQLFull Text Search In PostgreSQL
Full Text Search In PostgreSQL
 
Advanced full text searching techniques using Lucene
Advanced full text searching techniques using LuceneAdvanced full text searching techniques using Lucene
Advanced full text searching techniques using Lucene
 
Examiness hints and tips from the trenches
Examiness hints and tips from the trenchesExaminess hints and tips from the trenches
Examiness hints and tips from the trenches
 
Tutorial 5 (lucene)
Tutorial 5 (lucene)Tutorial 5 (lucene)
Tutorial 5 (lucene)
 
IR with lucene
IR with luceneIR with lucene
IR with lucene
 
Fast track to lucene
Fast track to luceneFast track to lucene
Fast track to lucene
 
Java Search Engine Framework
Java Search Engine FrameworkJava Search Engine Framework
Java Search Engine Framework
 
Introduction To Apache Lucene
Introduction To Apache LuceneIntroduction To Apache Lucene
Introduction To Apache Lucene
 
DIY Percolator
DIY PercolatorDIY Percolator
DIY Percolator
 
Data Access Options in SharePoint 2010
Data Access Options in SharePoint 2010Data Access Options in SharePoint 2010
Data Access Options in SharePoint 2010
 
Elasticsearch first-steps
Elasticsearch first-stepsElasticsearch first-steps
Elasticsearch first-steps
 
Infinispan,Lucene,Hibername OGM
Infinispan,Lucene,Hibername OGMInfinispan,Lucene,Hibername OGM
Infinispan,Lucene,Hibername OGM
 
CouchDB-Lucene
CouchDB-LuceneCouchDB-Lucene
CouchDB-Lucene
 
Tutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component pluginTutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component plugin
 
Solr @ Etsy - Apache Lucene Eurocon
Solr @ Etsy - Apache Lucene EuroconSolr @ Etsy - Apache Lucene Eurocon
Solr @ Etsy - Apache Lucene Eurocon
 
Open Source Search: An Analysis
Open Source Search: An AnalysisOpen Source Search: An Analysis
Open Source Search: An Analysis
 
Coherence SIG: Advanced usage of indexes in coherence
Coherence SIG: Advanced usage of indexes in coherenceCoherence SIG: Advanced usage of indexes in coherence
Coherence SIG: Advanced usage of indexes in coherence
 
Tutorial, Part 3: SharePoint 101: Jump-Starting the Developer by Rob Windsor ...
Tutorial, Part 3: SharePoint 101: Jump-Starting the Developer by Rob Windsor ...Tutorial, Part 3: SharePoint 101: Jump-Starting the Developer by Rob Windsor ...
Tutorial, Part 3: SharePoint 101: Jump-Starting the Developer by Rob Windsor ...
 
Full Text Search with Lucene
Full Text Search with LuceneFull Text Search with Lucene
Full Text Search with Lucene
 

More from DevOWL Meetup

Что такое современная Frontend разработка
Что такое современная Frontend разработкаЧто такое современная Frontend разработка
Что такое современная Frontend разработка
DevOWL Meetup
 
CQRS and EventSourcing
CQRS and EventSourcingCQRS and EventSourcing
CQRS and EventSourcing
DevOWL Meetup
 
Cага о сагах
Cага о сагахCага о сагах
Cага о сагах
DevOWL Meetup
 
MeetupCamp Витебский летний митап 5-6 июля
MeetupCamp Витебский летний митап 5-6 июляMeetupCamp Витебский летний митап 5-6 июля
MeetupCamp Витебский летний митап 5-6 июля
DevOWL Meetup
 
Обзор Haxe & OpenFl
Обзор Haxe & OpenFlОбзор Haxe & OpenFl
Обзор Haxe & OpenFl
DevOWL Meetup
 
Recommerce изнутри
Recommerce изнутриRecommerce изнутри
Recommerce изнутри
DevOWL Meetup
 
Google map markers with Symfony2
Google map markers with Symfony2Google map markers with Symfony2
Google map markers with Symfony2
DevOWL Meetup
 

More from DevOWL Meetup (7)

Что такое современная Frontend разработка
Что такое современная Frontend разработкаЧто такое современная Frontend разработка
Что такое современная Frontend разработка
 
CQRS and EventSourcing
CQRS and EventSourcingCQRS and EventSourcing
CQRS and EventSourcing
 
Cага о сагах
Cага о сагахCага о сагах
Cага о сагах
 
MeetupCamp Витебский летний митап 5-6 июля
MeetupCamp Витебский летний митап 5-6 июляMeetupCamp Витебский летний митап 5-6 июля
MeetupCamp Витебский летний митап 5-6 июля
 
Обзор Haxe & OpenFl
Обзор Haxe & OpenFlОбзор Haxe & OpenFl
Обзор Haxe & OpenFl
 
Recommerce изнутри
Recommerce изнутриRecommerce изнутри
Recommerce изнутри
 
Google map markers with Symfony2
Google map markers with Symfony2Google map markers with Symfony2
Google map markers with Symfony2
 

Recently uploaded

Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
Pixlogix Infotech
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 

Recently uploaded (20)

Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Lucene in Action

  • 1. Lucene in Action: using Lucene to build high-performance systems. Гавриленко Евгений (Evgeny Gavrilenko), Lead Developer, Artezio
  • 2. Lucene
      • What is it?
      • Twitter: 1 billion queries per day
      • hh.ru: 400 queries per second
      • LinkedIn, FedEx…
  • 3. Core indexing components
      • IndexWriter
      • Directory (FSDirectory, RAMDirectory)
      • Analyzer
      • Document
      • Field / Multivalued fields
  • 4. Building the index
      var directory = new RAMDirectory();
      // var directory = FSDirectory.Open("/tmp/testindex");  // on-disk alternative
      var analyzer = new RussianAnalyzer(Version.LUCENE_30);
      using (var writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
      {
          for (var i = 0; i < 1000000; i++)
          {
              var doc = new Document();
              doc.Add(new Field("id", i.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
              // "text" is multivalued: two values are added under the same field name.
              // The sample values stay in Russian ("строка" = "line") to match RussianAnalyzer.
              doc.Add(new Field("text", string.Format("{0} строка.", i), Field.Store.YES, Field.Index.ANALYZED));
              doc.Add(new Field("text", string.Format("{0} строка 2.", i), Field.Store.YES, Field.Index.ANALYZED));
              writer.AddDocument(doc);
              if (i % 100000 == 0)
                  Console.WriteLine("[{1}]: {0} documents saved.", i, DateTime.Now);
          }
          writer.Optimize();
      }
  • 5. Data schema
      // Documents in the same index may carry different sets of fields.
      // i and value come from surrounding code that the slide does not show.
      var doc1 = new Document();
      doc1.Add(new Field("id", i.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
      doc1.Add(new Field("text", string.Format("{0} строка.", i), Field.Store.YES, Field.Index.ANALYZED));
      var field = new NumericField("numericField1", Field.Store.NO, true);
      doc1.Add(field.SetDoubleValue(value));

      var doc2 = new Document();
      doc2.Add(new Field("id", i.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
      doc2.Add(new Field("text", string.Format("{0} строка.", i), Field.Store.YES, Field.Index.ANALYZED));
      doc2.Add(new Field("blablaFild1", "blabla-body", Field.Store.YES, Field.Index.ANALYZED));
  • 6. Core search components
      • IndexSearcher/MultiSearcher/ParallelMultiSearcher
      • Term
      • Query
      • TermQuery
      • TopDocs
  • 7. Query
      • TermQuery
      • MultiFieldQueryParser
      • BooleanQuery
      • NumericRangeQuery
      • SpanQuery
      • …
      • QueryParser
  • 8. Search
      var reader = IndexReader.Open(directory, true);  // true = open read-only
      var searcher = new IndexSearcher(reader);
      var parser = new QueryParser(Version.LUCENE_30, "text", analyzer);
      var query = parser.Parse("20 строку");  // "строку" is an inflected form of "строка" ("line")
      var hits = searcher.Search(query, 100);  // return at most the top 100 hits
      Console.WriteLine("total hits: {0}", hits.TotalHits);
      if (hits.TotalHits == 0) return;
      var rdoc = reader.Document(hits.ScoreDocs[0].Doc);
      Console.WriteLine("value:{0}", rdoc.Get("text"));
  • 9. Search with sorting
      // sl is the name of the field to sort by; indexDir is passed as SortField's reverse flag.
      switch (sl)
      {
          case "barcode":
          case "code":
              indexSort = new Sort(new SortField(sl, SortField.STRING, indexDir));
              break;
          case "price":
              indexSort = new Sort(new SortField(sl, SortField.DOUBLE, indexDir));
              break;
          default:
              indexSort = new Sort(new SortField(sl, SortField.STRING, indexDir));
              break;
      }
      ...
      // Track per-hit scores when sorting by field, but do not compute the maximum score.
      searcher.SetDefaultFieldSortScoring(true, false);
      var hits = searcher.Search(query, filter, count, indexSort);
  • 11. Analyzers
      • StandardAnalyzer
      • SnowballAnalyzer
      • KeywordAnalyzer
      • WhitespaceAnalyzer
      • RussianAnalyzer
  • 13. Linq to Lucene
      public class Article
      {
          [Field(Analyzer = typeof(StandardAnalyzer))]
          public string Author { get; set; }

          [Field(Analyzer = typeof(StandardAnalyzer))]
          public string Title { get; set; }

          public DateTimeOffset PublishDate { get; set; }

          [NumericField]
          public long Id { get; set; }

          [Field(IndexMode.NotIndexed, Store = StoreMode.Yes)]
          public string BodyText { get; set; }

          [Field("text", Store = StoreMode.No, Analyzer = typeof(PorterStemAnalyzer))]
          public string SearchText
          {
              get { return string.Join(" ", new[] {Author, Title, BodyText}); }
          }
      }
  • 14. Linq to Lucene
      var directory = new RAMDirectory();
      var provider = new LuceneDataProvider(directory, Version.LUCENE_30);

      using (var session = provider.OpenSession<Article>())
      {
          session.Add(new Article {Author = "John Doe", BodyText = "some body text", PublishDate = DateTimeOffset.UtcNow});
      }

      var articles = provider.AsQueryable<Article>();
      var threshold = DateTimeOffset.UtcNow.Subtract(TimeSpan.FromDays(30));

      var articlesByJohn = from a in articles
                           where a.Author == "John Doe" && a.PublishDate > threshold
                           orderby a.Title
                           select a;
      Console.WriteLine("Articles by John Doe: " + articlesByJohn.Count());

      var searchResults = from a in articles
                          where a.SearchText == "some search query"
                          select a;
      Console.WriteLine("Search Results: " + searchResults.Count());
  • 15. Useful resources
      • Lucene: http://lucene.apache.org/
      • Lucene.Net: http://lucenenet.apache.org
      • Linq to Lucene: https://github.com/themotleyfool/Lucene.Net.Linq
      • “Lucene in Action”: http://it-ebooks.info/book/2112
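
Slide 7 lists the query types without showing them in use. The following is a minimal sketch, not taken from the slides, of how TermQuery, NumericRangeQuery and BooleanQuery might be combined. It assumes Lucene.Net 3.0.x, where the clause enum is believed to be Occur (older releases expose it as BooleanClause.Occur). The "price" field, the sample texts and the class name QueryTypesSketch are hypothetical, chosen only for illustration.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical demo class; assumes the Lucene.Net 3.0.x assembly is referenced.
public static class QueryTypesSketch
{
    public static void Main()
    {
        var directory = new RAMDirectory();
        var analyzer = new WhitespaceAnalyzer();

        // Index ten toy documents: even ids contain "red", odd ids contain "green";
        // the hypothetical numeric field "price" holds id * 10.
        using (var writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            for (var i = 0; i < 10; i++)
            {
                var doc = new Document();
                doc.Add(new Field("id", i.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
                doc.Add(new Field("text", i % 2 == 0 ? "red apple" : "green apple", Field.Store.YES, Field.Index.ANALYZED));
                doc.Add(new NumericField("price", Field.Store.NO, true).SetDoubleValue(i * 10.0));
                writer.AddDocument(doc);
            }
        }

        // TermQuery matches one exact term in one field (no analysis is applied to the term).
        var textQuery = new TermQuery(new Term("text", "red"));

        // NumericRangeQuery matches values of a NumericField inside [20, 60], both ends inclusive.
        var priceQuery = NumericRangeQuery.NewDoubleRange("price", 20.0, 60.0, true, true);

        // BooleanQuery combines clauses; MUST on both means a document has to satisfy both.
        var query = new BooleanQuery();
        query.Add(textQuery, Occur.MUST);
        query.Add(priceQuery, Occur.MUST);

        var reader = IndexReader.Open(directory, true);
        var searcher = new IndexSearcher(reader);
        var hits = searcher.Search(query, 10);

        Console.WriteLine("total hits: {0}", hits.TotalHits);
        foreach (var scoreDoc in hits.ScoreDocs)
            Console.WriteLine("id: {0}", reader.Document(scoreDoc.Doc).Get("id"));
    }
}
```

Building the query objects directly, rather than going through QueryParser as on slide 8, skips analysis of the search term, so the term handed to TermQuery has to match the indexed token exactly. That is why the sketch uses WhitespaceAnalyzer and a lowercase term.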