Lucene in Action
Using Lucene to build high-performance systems
Evgeny Gavrilenko (Гавриленко Евгений)
Lead Developer, Artezio
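Lucene
• What is it? An open-source full-text search library (Apache Lucene, written in Java). Lucene.Net, used in the examples here, is its .NET port.
• Twitter: 1 billion queries per day
• hh.ru: 400 queries per second
• LinkedIn, FedEx…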
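Core indexing components
• IndexWriter: creates the index and adds, updates and deletes documents
• Directory: where the index is stored (FSDirectory on disk, RAMDirectory in memory)
• Analyzer: turns text into the terms that get indexed
• Document: a set of fields; the unit that is indexed and returned by a search
• Field / multivalued fields (the same field name added to a document more than once)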
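To make the roles above concrete, here is a toy in-memory model in plain C#. It is not Lucene's API and every name in it is hypothetical: an "analyzer" function turns text into lowercase terms, a "document" is a list of (field, value) pairs so one field name can repeat (a multivalued field), and "adding" a document records each term in an inverted index that maps field:term to document ids.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy analyzer (hypothetical): lowercase, then split on anything that is not a letter or digit.
List<string> Analyze(string text) =>
    new string(text.ToLowerInvariant()
            .Select(c => char.IsLetterOrDigit(c) ? c : ' ')
            .ToArray())
        .Split(' ', StringSplitOptions.RemoveEmptyEntries)
        .ToList();

// Toy inverted index: "field:term" -> ids of the documents containing that term.
var postings = new Dictionary<string, SortedSet<int>>();
var nextDocId = 0;

// Toy AddDocument (hypothetical): a document is a list of (field, value) pairs, so one
// field name may occur several times, which is what a multivalued field is.
int AddDocument(List<(string Field, string Value)> doc)
{
    var docId = nextDocId++;
    foreach (var (field, value) in doc)
    {
        foreach (var term in Analyze(value))
        {
            var key = field + ":" + term;
            if (!postings.TryGetValue(key, out var ids))
            {
                ids = new SortedSet<int>();
                postings[key] = ids;
            }
            ids.Add(docId);
        }
    }
    return docId;
}

AddDocument(new List<(string Field, string Value)> { ("text", "Lucene in Action"), ("text", "Search engine") });
AddDocument(new List<(string Field, string Value)> { ("text", "Action movies") });

Console.WriteLine(string.Join(",", postings["text:action"])); // 0,1
Console.WriteLine(string.Join(",", postings["text:search"])); // 0
```

Both values of the multivalued "text" field of document 0 end up searchable under the same field name, which is what the two `doc.Add(new Field("text", ...))` calls in the indexing example rely on.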
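Building the index
    using System;
    using Lucene.Net.Analysis.Ru;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;
    using Lucene.Net.Store;
    using Version = Lucene.Net.Util.Version; // avoids a clash with System.Version

    var directory = new RAMDirectory();                    // index kept in memory
    //var directory = FSDirectory.Open("/tmp/testindex");  // or stored on disk
    var analyzer = new RussianAnalyzer(Version.LUCENE_30);
    using (var writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
    {
        for (var i = 0; i < 1000000; i++)
        {
            var doc = new Document();
            // Exact-match key: indexed as a single term, without norms.
            doc.Add(new Field("id", i.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
            // "text" is added twice, making it a multivalued field. The sample text is
            // Russian ("строка" = "line") so that RussianAnalyzer's stemming applies.
            doc.Add(new Field("text", string.Format("{0} строка.", i), Field.Store.YES, Field.Index.ANALYZED));
            doc.Add(new Field("text", string.Format("{0} строка 2.", i), Field.Store.YES, Field.Index.ANALYZED));
            writer.AddDocument(doc);
            if ((i + 1) % 100000 == 0)
                Console.WriteLine("[{1}]: {0} documents saved.", i + 1, DateTime.Now);
        }
        writer.Optimize(); // merge all index segments into one
    }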
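`writer.Optimize()` in Lucene 3.x merges the index's segments into a single segment. The toy below is not Lucene's merge code; it only shows the core idea for one term's posting list, under an assumed, simplified segment layout: each segment numbers its documents from 0, so merging shifts every segment's ids by the number of documents in the segments before it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy segments (hypothetical layout): each has a document count and, for one term,
// the sorted segment-local ids of the documents containing it. Local ids start at 0.
var segments = new List<(int DocCount, int[] Postings)>
{
    (3, new[] { 0, 2 }),
    (2, new[] { 1 }),
    (4, new[] { 0, 3 }),
};

// Merge into a single segment: shift each segment's ids by the number of documents
// in the segments before it. Visiting segments in order keeps the result sorted.
// Deleted documents, which a real merge drops, are ignored in this sketch.
List<int> MergePostings(List<(int DocCount, int[] Postings)> segs)
{
    var result = new List<int>();
    var docBase = 0;
    foreach (var (docCount, ids) in segs)
    {
        result.AddRange(ids.Select(id => id + docBase));
        docBase += docCount;
    }
    return result;
}

var merged = MergePostings(segments);
Console.WriteLine(string.Join(",", merged)); // 0,2,4,5,8
```

Fewer segments mean fewer posting lists to visit per query, which is why an index that is built once and then only read is often optimized at the end, as in the example.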
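Data schema
Documents do not share a fixed schema: each carries its own set of fields, so doc2 below has a field that doc1 lacks, and doc1 has a numeric field that doc2 lacks.
    // i and value come from surrounding code that is not shown on the slide.
    var doc1 = new Document();
    doc1.Add(new Field("id", i.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
    doc1.Add(new Field("text", string.Format("{0} строка.", i), Field.Store.YES, Field.Index.ANALYZED));
    // Numeric field: indexed (third argument = true) for range queries and sorting, not stored.
    var field = new NumericField("numericField1", Field.Store.NO, true);
    doc1.Add(field.SetDoubleValue(value));

    var doc2 = new Document();
    doc2.Add(new Field("id", i.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
    doc2.Add(new Field("text", string.Format("{0} строка.", i), Field.Store.YES, Field.Index.ANALYZED));
    doc2.Add(new Field("blablaFild1", "blabla-body", Field.Store.YES, Field.Index.ANALYZED));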
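Why a separate `NumericField` exists: index terms are compared as text, so numbers written as plain strings sort wrongly ("10" comes before "9"). Lucene's `NumericUtils` (`doubleToSortableLong` in Java Lucene) maps a double to a long whose order matches numeric order, then indexes it at several precisions for fast range queries. The plain-C# sketch below reproduces only that order-preserving mapping, not the multi-precision trie encoding; the price values are made up.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Map a double to a long whose signed ordering matches numeric ordering.
// Same bit trick as Lucene's doubleToSortableLong; NaN handling is ignored here.
long SortableLong(double value)
{
    long bits = BitConverter.DoubleToInt64Bits(value);
    // For negative doubles, flip every bit except the sign bit to reverse their order.
    if (bits < 0) bits ^= 0x7fffffffffffffffL;
    return bits;
}

var prices = new[] { 9.0, 10.0, -2.5, 0.0, -10.0, 3.75 };

// Numbers stored as plain string terms sort lexicographically: "10" lands before "9".
var asText = prices.Select(p => p.ToString(CultureInfo.InvariantCulture))
                   .OrderBy(s => s, StringComparer.Ordinal)
                   .ToArray();

// Sortable longs preserve numeric order, including for negative values.
var bySortable = prices.OrderBy(p => SortableLong(p)).ToArray();

Console.WriteLine(string.Join(" ", asText)); // -10 -2.5 0 10 3.75 9
Console.WriteLine(string.Join(" ", bySortable.Select(p => p.ToString(CultureInfo.InvariantCulture)))); // -10 -2.5 0 3.75 9 10
```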
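Core search components
• IndexSearcher / MultiSearcher / ParallelMultiSearcher: search one index, or several indexes one after another or in parallel
• Term: a field name plus a text value, the basic unit of search
• Query: the search criteria
• TermQuery: matches documents that contain a given Term
• TopDocs: the result, i.e. the total hit count plus the top-scoring documents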
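`TopDocs` holds the total number of matches but only the best N `ScoreDoc`s. Lucene gets there by collecting hits into a fixed-size priority queue. The sketch below models that idea with .NET's `PriorityQueue` (available from .NET 6) and made-up scores; `TopN` is a hypothetical helper, not a Lucene.Net class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Keep the n highest-scoring (doc, score) pairs and count every match, like TotalHits.
(int TotalHits, List<(int Doc, float Score)> ScoreDocs) TopN(IEnumerable<(int Doc, float Score)> matches, int n)
{
    // Min-heap keyed by score: the root is the weakest of the hits kept so far.
    var heap = new PriorityQueue<(int Doc, float Score), float>();
    var total = 0;
    foreach (var hit in matches)
    {
        total++;
        heap.Enqueue(hit, hit.Score);
        if (heap.Count > n) heap.Dequeue(); // drop the current weakest hit
    }
    var top = new List<(int Doc, float Score)>();
    while (heap.Count > 0) top.Add(heap.Dequeue()); // weakest first
    top.Reverse();                                  // best first
    return (total, top);
}

var matches = new List<(int Doc, float Score)> { (0, 0.3f), (1, 1.2f), (2, 0.9f), (3, 2.0f), (4, 0.1f) };
var (totalHits, scoreDocs) = TopN(matches, 2);
Console.WriteLine($"total hits: {totalHits}");                      // total hits: 5
Console.WriteLine(string.Join(", ", scoreDocs.Select(h => h.Doc))); // 3, 1
```

Memory stays proportional to N rather than to the number of matches, which is why asking for 100 hits out of millions of matches is cheap.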
Query
• TermQuery
• BooleanQuery
• NumericRangeQuery
• SpanQuery
• …
• QueryParser, MultiFieldQueryParser: build Query objects from user-entered text (MultiFieldQueryParser searches several fields at once)
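`BooleanQuery` combines clauses with occurrence flags: MUST (required), SHOULD (optional, but when there are no MUST clauses at least one SHOULD clause has to match) and MUST_NOT (excluded). The toy evaluator below applies those matching rules to made-up posting sets; it is hypothetical and ignores scoring and `minimumNumberShouldMatch`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy posting lists: term -> ids of the documents containing it.
var postings = new Dictionary<string, HashSet<int>>
{
    ["lucene"] = new() { 1, 2, 3, 5 },
    ["net"]    = new() { 2, 3 },
    ["java"]   = new() { 1, 3 },
    ["solr"]   = new() { 4, 5 },
};

HashSet<int> Docs(string term) =>
    postings.TryGetValue(term, out var ids) ? ids : new HashSet<int>();

// Hypothetical evaluator for BooleanQuery-style clauses (matching only, no scoring).
HashSet<int> Evaluate(string[] must, string[] should, string[] mustNot)
{
    HashSet<int> result;
    if (must.Length > 0)
    {
        // Every MUST term is required; SHOULD terms would only affect the score.
        result = new HashSet<int>(Docs(must[0]));
        foreach (var t in must.Skip(1)) result.IntersectWith(Docs(t));
    }
    else
    {
        // No MUST clauses: at least one SHOULD term has to match.
        result = new HashSet<int>();
        foreach (var t in should) result.UnionWith(Docs(t));
    }
    foreach (var t in mustNot) result.ExceptWith(Docs(t));
    return result;
}

// +lucene +net -java: lucene AND net, but NOT java
var q1 = Evaluate(new[] { "lucene", "net" }, Array.Empty<string>(), new[] { "java" });
// net java -solr: net OR java, excluding solr
var q2 = Evaluate(Array.Empty<string>(), new[] { "net", "java" }, new[] { "solr" });
Console.WriteLine(string.Join(",", q1.OrderBy(d => d))); // 2
Console.WriteLine(string.Join(",", q2.OrderBy(d => d))); // 1,2,3
```

A query made only of MUST_NOT clauses matches nothing under these rules, which is also how Lucene behaves.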
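Search
    using System;
    using Lucene.Net.Index;
    using Lucene.Net.QueryParsers;
    using Lucene.Net.Search;
    using Version = Lucene.Net.Util.Version;

    // directory and analyzer are the same objects used to build the index.
    var reader = IndexReader.Open(directory, true);  // true = open read-only
    var searcher = new IndexSearcher(reader);
    var parser = new QueryParser(Version.LUCENE_30, "text", analyzer); // "text" = default field
    // The analyzer also processes the query, so the stemmed "строку" can match "строка".
    var query = parser.Parse("20 строку");
    var hits = searcher.Search(query, 100);          // top 100 hits
    Console.WriteLine("total hits: {0}", hits.TotalHits);
    if (hits.TotalHits == 0) return;
    var rdoc = reader.Document(hits.ScoreDocs[0].Doc); // stored fields of the best hit
    Console.WriteLine("value:{0}", rdoc.Get("text"));  // first stored value of "text"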
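Search with sorting
    // sl: name of the field to sort by; indexDir: SortField's "reverse" flag (true = descending).
    switch (sl)
    {
        case "barcode":
        case "code":
            indexSort = new Sort(new SortField(sl, SortField.STRING, indexDir));
            break;
        case "price":
            // Prices compare as numbers, not as strings.
            indexSort = new Sort(new SortField(sl, SortField.DOUBLE, indexDir));
            break;
        default:
            indexSort = new Sort(new SortField(sl, SortField.STRING, indexDir));
            break;
    }
    ...
    // Still compute relevance scores when sorting by a field (track scores: yes, max score: no).
    searcher.SetDefaultFieldSortScoring(true, false);
    var hits = searcher.Search(query, filter, count, indexSort);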
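Sorting by a field replaces relevance order with the field's values: the `SortField` type decides whether values compare as strings or as numbers, and the third argument (`indexDir` on the slide) reverses the order. The LINQ-to-objects sketch below mirrors that switch with made-up product data; `SortHits` is a hypothetical helper, and ties fall back to the lower document id first.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical hits: document id plus field values, kept as strings as in an index.
var hits = new List<(int Doc, Dictionary<string, string> Fields)>
{
    (0, new Dictionary<string, string> { ["code"] = "B-10", ["price"] = "9.5" }),
    (1, new Dictionary<string, string> { ["code"] = "A-7",  ["price"] = "120" }),
    (2, new Dictionary<string, string> { ["code"] = "C-1",  ["price"] = "15" }),
};

// Mirrors the slide's switch: "price" compares as a number, everything else as a string.
List<int> SortHits(string field, bool reverse)
{
    IOrderedEnumerable<(int Doc, Dictionary<string, string> Fields)> ordered;
    if (field == "price")
        ordered = reverse
            ? hits.OrderByDescending(h => double.Parse(h.Fields[field], CultureInfo.InvariantCulture))
            : hits.OrderBy(h => double.Parse(h.Fields[field], CultureInfo.InvariantCulture));
    else
        ordered = reverse
            ? hits.OrderByDescending(h => h.Fields[field], StringComparer.Ordinal)
            : hits.OrderBy(h => h.Fields[field], StringComparer.Ordinal);
    return ordered.ThenBy(h => h.Doc).Select(h => h.Doc).ToList(); // ties: lower doc id first
}

Console.WriteLine(string.Join(",", SortHits("price", reverse: false))); // 0,2,1
Console.WriteLine(string.Join(",", SortHits("price", reverse: true)));  // 1,2,0
Console.WriteLine(string.Join(",", SortHits("code", reverse: false)));  // 1,0,2
```

Sorting "price" as a string would put "120" before "15" and "9.5", which is why the slide maps that field to `SortField.DOUBLE`.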
Paging
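The Paging slide has no text. With the `Search(query, n)` call shown above, one common approach (an assumption here, not taken from the talk) is to request the top `(page + 1) * pageSize` hits and skip the first `page * pageSize`; deeper pages cost more because every earlier hit is still collected. The arithmetic as a hypothetical helper, with simulated doc ids standing in for `hits.ScoreDocs`:

```csharp
using System;
using System.Linq;

// Hypothetical helper: for a zero-based page, how many hits to request from the
// searcher, how many to skip, and how many to show (the last page may be short).
(int RequestCount, int Skip, int Take) PageWindow(int page, int pageSize, int totalHits)
{
    if (page < 0) throw new ArgumentOutOfRangeException(nameof(page));
    if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));
    var start = page * pageSize;
    var needed = start + pageSize;                               // n for searcher.Search(query, n)
    var count = Math.Max(0, Math.Min(pageSize, totalHits - start));
    return (needed, start, count);
}

// Simulated ranked doc ids: 23 hits, ids 100..122.
var ranked = Enumerable.Range(100, 23).ToArray();
var (request, skip, take) = PageWindow(page: 2, pageSize: 10, totalHits: ranked.Length);
var pageDocs = ranked.Take(request).Skip(skip).Take(take).ToArray();
Console.WriteLine($"request {request}, show {string.Join(",", pageDocs)}"); // request 30, show 120,121,122
```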
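Analyzers
• StandardAnalyzer: grammar-based tokenizer, lowercasing, English stop words
• SnowballAnalyzer: adds Snowball stemmers for many languages, Russian included
• KeywordAnalyzer: the whole value becomes a single term (IDs, barcodes)
• WhitespaceAnalyzer: splits on whitespace only
• RussianAnalyzer: Russian stop words and stemming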
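The analyzers differ mainly in how they cut text into terms. The toy functions below only imitate three of those behaviours and are not the Lucene.Net classes: whitespace splitting that keeps case, the whole value as one keyword term, and a simplified standard-style split on non-alphanumeric characters with lowercasing. Stemming, as done by SnowballAnalyzer and RussianAnalyzer, is not modelled.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Imitates WhitespaceAnalyzer: split on whitespace only; case and punctuation are kept.
string[] WhitespaceLike(string text) =>
    Regex.Matches(text, @"\S+").Select(m => m.Value).ToArray();

// Imitates KeywordAnalyzer: the whole value is one term (useful for codes and IDs).
string[] KeywordLike(string text) => new[] { text };

// Rough imitation of StandardAnalyzer's splitting: runs of letters/digits, lowercased.
// The real analyzer uses a grammar-based tokenizer and also removes English stop words.
string[] StandardLike(string text) =>
    Regex.Matches(text.ToLowerInvariant(), @"[\p{L}\p{Nd}]+").Select(m => m.Value).ToArray();

var input = "Red-Shoes Kids 42";
Console.WriteLine(string.Join(" | ", WhitespaceLike(input))); // Red-Shoes | Kids | 42
Console.WriteLine(string.Join(" | ", KeywordLike(input)));    // Red-Shoes Kids 42
Console.WriteLine(string.Join(" | ", StandardLike(input)));   // red | shoes | kids | 42
```

The choice matters at query time too: a query for "shoes" finds this value only through the standard-style terms, while an exact barcode lookup works best with keyword-style indexing.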
Using Lucene in e-commerce
[Architecture diagram; components: Ecommerce DB, Service/Daemon, Lucene Index, search service, Search backend]
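The diagram appears to place a background service/daemon between the e-commerce database and the Lucene index, with a separate search service reading the index. A minimal sketch of one incremental sync pass, assuming (hypothetically) that each product row has a `ModifiedAt` timestamp and a deleted flag. A dictionary stands in for the index; in Lucene.Net the two branches would correspond to `IndexWriter.DeleteDocuments(Term)` and `IndexWriter.UpdateDocument(Term, Document)` keyed on the "id" field.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical product rows as read from the e-commerce database.
var rows = new List<(string Id, string Title, DateTime ModifiedAt, bool Deleted)>
{
    ("p1", "Red shoes",  new DateTime(2013, 5, 1), false),
    ("p2", "Blue shirt", new DateTime(2013, 5, 3), false),
    ("p3", "Old hat",    new DateTime(2013, 5, 4), true),
};

// Stand-in for the Lucene index: product id -> indexed title.
var index = new Dictionary<string, string> { ["p3"] = "Old hat", ["p9"] = "Untouched" };

// One sync pass: apply rows changed after the previous checkpoint, return the new checkpoint.
DateTime SyncOnce(DateTime lastSync)
{
    var changed = rows.Where(r => r.ModifiedAt > lastSync).ToList();
    foreach (var row in changed)
    {
        if (row.Deleted) index.Remove(row.Id); // cf. IndexWriter.DeleteDocuments(new Term("id", id))
        else index[row.Id] = row.Title;        // cf. IndexWriter.UpdateDocument(new Term("id", id), doc)
    }
    return changed.Count > 0 ? changed.Max(r => r.ModifiedAt) : lastSync;
}

var checkpoint = SyncOnce(new DateTime(2013, 5, 2));
Console.WriteLine(checkpoint.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)); // 2013-05-04
Console.WriteLine(string.Join(",", index.Keys.OrderBy(k => k)));                   // p2,p9
```

Comparing with `>` would miss a row committed later with the same timestamp as the checkpoint; a monotonically increasing version column avoids that, if the database provides one.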
Linq to Lucene
    // Attribute-based mapping from Lucene.Net.Linq: each property becomes a Lucene field.
    public class Article
    {
        [Field(Analyzer = typeof(StandardAnalyzer))]
        public string Author { get; set; }

        [Field(Analyzer = typeof(StandardAnalyzer))]
        public string Title { get; set; }

        public DateTimeOffset PublishDate { get; set; }

        [NumericField]
        public long Id { get; set; }

        // Stored for display but not indexed, so it cannot be searched directly.
        [Field(IndexMode.NotIndexed, Store = StoreMode.Yes)]
        public string BodyText { get; set; }

        // Catch-all "text" field: indexed with stemming, not stored.
        [Field("text", Store = StoreMode.No, Analyzer = typeof(PorterStemAnalyzer))]
        public string SearchText
        {
            get { return string.Join(" ", new[] { Author, Title, BodyText }); }
        }
    }
Linq to Lucene
    using System;
    using System.Linq;
    using Lucene.Net.Linq;
    using Lucene.Net.Store;
    using Version = Lucene.Net.Util.Version;

    var directory = new RAMDirectory();
    var provider = new LuceneDataProvider(directory, Version.LUCENE_30);

    // Writes go through a session; changes are written when the session is disposed.
    using (var session = provider.OpenSession<Article>())
    {
        session.Add(new Article { Author = "John Doe", BodyText = "some body text", PublishDate = DateTimeOffset.UtcNow });
    }

    // Queries are ordinary LINQ expressions translated into Lucene queries.
    var articles = provider.AsQueryable<Article>();
    var threshold = DateTimeOffset.UtcNow.Subtract(TimeSpan.FromDays(30));
    var articlesByJohn = from a in articles
                         where a.Author == "John Doe" && a.PublishDate > threshold
                         orderby a.Title
                         select a;
    Console.WriteLine("Articles by John Doe: " + articlesByJohn.Count());

    // Comparing the catch-all SearchText property runs a full-text search on the "text" field.
    var searchResults = from a in articles
                        where a.SearchText == "some search query"
                        select a;
    Console.WriteLine("Search Results: " + searchResults.Count());
Useful resources
• Lucene http://lucene.apache.org/
• Lucene.Net http://lucenenet.apache.org
• Linq to Lucene https://github.com/themotleyfool/Lucene.Net.Linq
• "Lucene in Action" http://it-ebooks.info/book/2112