Lucene in Action
Transcript

  • 1. Lucene in Action
      Using Lucene to build high-performance systems
      Evgeny Gavrilenko, Lead Developer, Artezio
  • 2. Lucene
      • What is it?
      • Twitter: 1 billion queries per day
      • hh.ru: 400 queries per second
      • LinkedIn, FedEx…
  • 3. Core indexing components
      • IndexWriter
      • Directory (FSDirectory, RAMDirectory)
      • Analyzer
      • Document
      • Field / Multivalued fields
  • 4. Building the index
      var directory = new RAMDirectory();
      //var directory = FSDirectory.Open("/tmp/testindex");
      var analyzer = new RussianAnalyzer(Version.LUCENE_30);
      using (var writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
      {
          for (var i = 0; i < 1000000; i++)
          {
              var doc = new Document();
              // Exact-match identifier: stored, indexed as a single term, no norms.
              doc.Add(new Field("id", i.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
              // Two fields with the same name make "text" a multivalued field.
              // The indexed text stays in Russian because the index uses RussianAnalyzer.
              doc.Add(new Field("text", string.Format("{0} строка.", i), Field.Store.YES, Field.Index.ANALYZED));
              doc.Add(new Field("text", string.Format("{0} строка 2.", i), Field.Store.YES, Field.Index.ANALYZED));
              writer.AddDocument(doc);
              // Progress report every 100,000 documents.
              if (i % 100000 == 0)
                  Console.WriteLine("[{1}]: {0} documents saved.", i, DateTime.Now);
          }
          writer.Optimize();
      }
  • 5. Data schema
      // Documents in the same index do not have to share a set of fields.
      // (i and value are defined elsewhere; this slide shows a fragment.)
      var doc1 = new Document();
      doc1.Add(new Field("id", i.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
      doc1.Add(new Field("text", string.Format("{0} строка.", i), Field.Store.YES, Field.Index.ANALYZED));
      var field = new NumericField("numericField1", Field.Store.NO, true);
      doc1.Add(field.SetDoubleValue(value));

      var doc2 = new Document();
      doc2.Add(new Field("id", i.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
      doc2.Add(new Field("text", string.Format("{0} строка.", i), Field.Store.YES, Field.Index.ANALYZED));
      doc2.Add(new Field("blablaFild1", "blabla-body", Field.Store.YES, Field.Index.ANALYZED));
  • 6. Core search components
      • IndexSearcher / MultiSearcher / ParallelMultiSearcher
      • Term
      • Query
      • TermQuery
      • TopDocs
  • 7. Query
      • TermQuery
      • MultiFieldQueryParser
      • BooleanQuery
      • NumericRangeQuery
      • SpanQuery
      • …
      • QueryParser
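The query types listed on slide 7 can also be combined in code rather than parsed from a string. Below is a minimal sketch, assuming Lucene.Net 3.0.x and a product index with a `code` keyword field and a numeric `price` field. The field names are borrowed from the sorting slide, while the helper names and sample data are hypothetical. It requires both an exact `TermQuery` match and a `NumericRangeQuery` match through a `BooleanQuery`.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

public static class QuerySketch
{
    // Hypothetical helper: "code" must match exactly AND "price" must lie in [min, max].
    public static Query CodeInPriceRange(string code, double min, double max)
    {
        var query = new BooleanQuery();
        query.Add(new TermQuery(new Term("code", code)), Occur.MUST);
        query.Add(NumericRangeQuery.NewDoubleRange("price", min, max, true, true), Occur.MUST);
        return query;
    }

    // Adds one sample product (illustrative data, not from the slides).
    static void AddProduct(IndexWriter writer, string code, double price)
    {
        var doc = new Document();
        doc.Add(new Field("code", code, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new NumericField("price", Field.Store.YES, true).SetDoubleValue(price));
        writer.AddDocument(doc);
    }

    // Builds a three-document in-memory index and counts the matches.
    public static int CountMatches()
    {
        var directory = new RAMDirectory();
        using (var writer = new IndexWriter(directory, new KeywordAnalyzer(),
                                            IndexWriter.MaxFieldLength.UNLIMITED))
        {
            AddProduct(writer, "A", 5.0);   // right code, price below the range
            AddProduct(writer, "A", 50.0);  // satisfies both clauses
            AddProduct(writer, "B", 50.0);  // price in range, wrong code
        }
        using (var searcher = new IndexSearcher(directory, true))
        {
            return searcher.Search(CodeInPriceRange("A", 10.0, 100.0), 10).TotalHits;
        }
    }

    public static void Main()
    {
        Console.WriteLine("matches: {0}", CountMatches());
    }
}
```

Building queries programmatically sidesteps analyzer and escaping surprises for exact-match fields. `QueryParser` remains the more convenient choice for free-text user input.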
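  • 8. Search
      // Open the index read-only and search the "text" field.
      var reader = IndexReader.Open(directory, true);
      var searcher = new IndexSearcher(reader);
      var parser = new QueryParser(Version.LUCENE_30, "text", analyzer);
      // "строку" is an inflected form of the indexed word "строка";
      // the match relies on RussianAnalyzer reducing both to a common stem.
      var query = parser.Parse("20 строку");
      var hits = searcher.Search(query, 100);   // top 100 hits
      Console.WriteLine("total hits: {0}", hits.TotalHits);
      if (hits.TotalHits == 0) return;
      // Load the stored fields of the best-scoring document.
      var rdoc = reader.Document(hits.ScoreDocs[0].Doc);
      Console.WriteLine("value:{0}", rdoc.Get("text"));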
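One detail worth noting about the search slide: the index built on slide 4 stores two `text` values per document, and `Document.Get` returns only the first one. A minimal sketch of reading every stored value with `Document.GetValues`, assuming Lucene.Net 3.0.x. The class name is hypothetical, and `WhitespaceAnalyzer` stands in for `RussianAnalyzer` only so that the example depends on the core assembly alone.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

public static class MultiValuedSketch
{
    // Indexes one document with two "text" values and reads both back.
    public static string[] StoredTextValues()
    {
        var directory = new RAMDirectory();
        using (var writer = new IndexWriter(directory, new WhitespaceAnalyzer(),
                                            IndexWriter.MaxFieldLength.UNLIMITED))
        {
            var doc = new Document();
            doc.Add(new Field("text", "0 строка.", Field.Store.YES, Field.Index.ANALYZED));
            doc.Add(new Field("text", "0 строка 2.", Field.Store.YES, Field.Index.ANALYZED));
            writer.AddDocument(doc);
        }
        using (var reader = IndexReader.Open(directory, true))
        {
            Document stored = reader.Document(0);
            // stored.Get("text") would return only the first value.
            return stored.GetValues("text");
        }
    }

    public static void Main()
    {
        foreach (var value in StoredTextValues())
            Console.WriteLine("value: {0}", value);
    }
}
```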
  • 9. Search with sorting
      // sl is the name of the field to sort by; indexDir is the flag passed as
      // SortField's "reverse" argument (both are defined elsewhere).
      switch (sl)
      {
          case "barcode":
          case "code":
              indexSort = new Sort(new SortField(sl, SortField.STRING, indexDir));
              break;
          case "price":
              indexSort = new Sort(new SortField(sl, SortField.DOUBLE, indexDir));
              break;
          default:
              indexSort = new Sort(new SortField(sl, SortField.STRING, indexDir));
              break;
      }
      ...
      // Compute per-hit scores even when sorting by field; skip the max score.
      searcher.SetDefaultFieldSortScoring(true, false);
      var hits = searcher.Search(query, filter, count, indexSort);
  • 10. Paging
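The paging slide carries only a title, so the following is one common approach rather than necessarily the speaker's: request the top `(page + 1) * pageSize` hits and skip those that belong to earlier pages. A minimal sketch, assuming Lucene.Net 3.0.x. `PagingSketch`, `GetPage` and `BuildIndex` are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

public static class PagingSketch
{
    // Returns the stored "id" values for one zero-based page of results.
    public static List<string> GetPage(IndexSearcher searcher, Query query, int page, int pageSize)
    {
        int start = page * pageSize;
        // Search(query, n) takes no offset: fetch everything up to the end of
        // the requested page, then skip the hits of the earlier pages.
        TopDocs hits = searcher.Search(query, start + pageSize);
        var ids = new List<string>();
        for (int i = start; i < hits.ScoreDocs.Length; i++)
        {
            ids.Add(searcher.Doc(hits.ScoreDocs[i].Doc).Get("id"));
        }
        return ids;
    }

    // Builds a small in-memory index with ids 0..count-1 (illustrative data).
    public static RAMDirectory BuildIndex(int count)
    {
        var directory = new RAMDirectory();
        using (var writer = new IndexWriter(directory, new KeywordAnalyzer(),
                                            IndexWriter.MaxFieldLength.UNLIMITED))
        {
            for (int i = 0; i < count; i++)
            {
                var doc = new Document();
                doc.Add(new Field("id", i.ToString(), Field.Store.YES,
                                  Field.Index.NOT_ANALYZED_NO_NORMS));
                writer.AddDocument(doc);
            }
        }
        return directory;
    }

    public static void Main()
    {
        using (var searcher = new IndexSearcher(BuildIndex(25), true))
        {
            // 25 documents at 10 per page: the third page (index 2) is partial.
            List<string> lastPage = GetPage(searcher, new MatchAllDocsQuery(), 2, 10);
            Console.WriteLine("hits on page 2: {0}", lastPage.Count);
        }
    }
}
```

The cost grows with the page number because all earlier hits are collected again on every request. That is usually acceptable for the first few pages of a result list, which is what users tend to look at.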
  • 11. Analyzers
      • StandardAnalyzer
      • SnowballAnalyzer
      • KeywordAnalyzer
      • WhitespaceAnalyzer
      • RussianAnalyzer
  • 12. Use in e-commerce (diagram labels: Ecommerce DB, Service/Daemon, Lucene Index, search service, Search backend)
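To see how the analyzers on slide 11 differ, it helps to run the same text through several of them and print the resulting terms. A minimal sketch, assuming Lucene.Net 3.0.x. Only analyzers from the core assembly are used here (`SnowballAnalyzer` and `RussianAnalyzer` ship in contrib packages), and `AnalyzerSketch.Tokenize` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.Tokenattributes;
using Version = Lucene.Net.Util.Version;

public static class AnalyzerSketch
{
    // Runs the text through the analyzer and collects the terms it emits.
    public static List<string> Tokenize(Analyzer analyzer, string text)
    {
        var terms = new List<string>();
        TokenStream stream = analyzer.TokenStream("text", new StringReader(text));
        var termAttr = stream.AddAttribute<ITermAttribute>();
        while (stream.IncrementToken())
        {
            terms.Add(termAttr.Term);
        }
        return terms;
    }

    public static void Main()
    {
        const string text = "Lucene in Action";
        var analyzers = new Analyzer[]
        {
            new WhitespaceAnalyzer(),                 // splits on whitespace only
            new KeywordAnalyzer(),                    // whole input as a single term
            new StandardAnalyzer(Version.LUCENE_30)   // lowercases, drops English stop words
        };
        foreach (var analyzer in analyzers)
        {
            Console.WriteLine("{0}: [{1}]", analyzer.GetType().Name,
                              string.Join("|", Tokenize(analyzer, text).ToArray()));
        }
    }
}
```

The same analyzer should normally be used at index time and at query time. Otherwise the terms produced from the query may not match the terms stored in the index.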
  • 13. Linq to Lucene
      public class Article
      {
          [Field(Analyzer = typeof(StandardAnalyzer))]
          public string Author { get; set; }

          [Field(Analyzer = typeof(StandardAnalyzer))]
          public string Title { get; set; }

          public DateTimeOffset PublishDate { get; set; }

          [NumericField]
          public long Id { get; set; }

          [Field(IndexMode.NotIndexed, Store = StoreMode.Yes)]
          public string BodyText { get; set; }

          [Field("text", Store = StoreMode.No, Analyzer = typeof(PorterStemAnalyzer))]
          public string SearchText
          {
              get { return string.Join(" ", new[] {Author, Title, BodyText}); }
          }
      }
  • 14. Linq to Lucene
      var directory = new RAMDirectory();
      var provider = new LuceneDataProvider(directory, Version.LUCENE_30);

      using (var session = provider.OpenSession<Article>())
      {
          session.Add(new Article
          {
              Author = "John Doe",
              BodyText = "some body text",
              PublishDate = DateTimeOffset.UtcNow
          });
      }

      var articles = provider.AsQueryable<Article>();
      var threshold = DateTimeOffset.UtcNow.Subtract(TimeSpan.FromDays(30));

      var articlesByJohn = from a in articles
                           where a.Author == "John Doe" && a.PublishDate > threshold
                           orderby a.Title
                           select a;
      Console.WriteLine("Articles by John Doe: " + articlesByJohn.Count());

      var searchResults = from a in articles
                          where a.SearchText == "some search query"
                          select a;
      Console.WriteLine("Search Results: " + searchResults.Count());
  • 15. Useful resources
      • Lucene http://lucene.apache.org/
      • Lucene.Net http://lucenenet.apache.org
      • Linq to Lucene https://github.com/themotleyfool/Lucene.Net.Linq
      • “Lucene in Action” http://it-ebooks.info/book/2112