Apache Lucene - full-text search

Published in: Technology

  1. 1. Apache Lucene: Full-text search (Marcelo)
  2. 2. What’s that?
       • API dating from around 2000
       • Owned by the Apache Software Foundation
       • Indexing
       • Searching
       • Available for Java, .NET, and C++
  3. 3. Why is that so good?
       • Enhanced user experience
       • More intelligent products
       • Processing speed
       • Relevance
       • Efficiency
       • Suggestions
  4. 4. Indexing
       • IndexWriter
           1. Directory implementation
           2. Analyzer
       • Create documents
       • Add these documents to the IndexWriter
       • Optimize (merge segments)
       • Close the writer
  5. 5. Indexing
  6. 6. Searching
       • Directory
       • IndexSearcher
       • QueryParser
       • Query(“my search”)
       • TopDocs
  7. 7. Searching
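
The index-then-search flow from the Indexing and Searching slides can be modeled in plain Java, without the Lucene library. This is only a sketch under stated assumptions: `TinyIndex`, `addDocument`, and `search` are hypothetical names that mirror the roles of IndexWriter and IndexSearcher, and the score is a plain count of matching query terms, not Lucene's relevance formula.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy model of the index-then-search flow; hypothetical, NOT the Lucene API.
class TinyIndex {
    // term -> ids of the documents that contain it
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    // Role of IndexWriter.addDocument: tokenize the text and record each term.
    int addDocument(String text) {
        int docId = nextDocId++;
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    // Role of IndexSearcher.search: return up to n document ids, best match first.
    List<Integer> search(String query, int n) {
        Map<Integer, Integer> hits = new HashMap<>(); // docId -> number of matching query terms
        for (String term : new LinkedHashSet<>(tokenize(query))) {
            for (int docId : postings.getOrDefault(term, new TreeSet<>())) {
                hits.merge(docId, 1, Integer::sum);
            }
        }
        List<Integer> ranked = new ArrayList<>(hits.keySet());
        ranked.sort((a, b) -> {
            int byScore = Integer.compare(hits.get(b), hits.get(a)); // more matches first
            return byScore != 0 ? byScore : Integer.compare(a, b);   // ties: lower id first
        });
        return new ArrayList<>(ranked.subList(0, Math.min(n, ranked.size())));
    }

    // Minimal stand-in for an analyzer: lowercase, split on anything but a-z and 0-9.
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.addDocument("Keep the fire burning");  // doc 0
        index.addDocument("A big fire");             // doc 1
        index.addDocument("The keeper of the keys"); // doc 2
        System.out.println(index.search("big fire", 10)); // prints [1, 0]
    }
}
```

Document 1 ranks first because it matches both query terms. A real Lucene setup would replace `tokenize` with an Analyzer and return scored hits as TopDocs.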
  8. 8. How does that work?
       • Inverted index
       • Term normalization
           1. Similar words (merge)
           2. Stop words (remove)
           3. More relevance, smaller size on disk

       Term     Document IDs
       And      1, 2, 3
       Big      2, 4, 7
       Fire     1
       Keep     7, 8
       keeper   3, 4
       the      1, 8
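
To make the effect of term normalization concrete, here is a small plain-Java sketch that builds a term-to-document-IDs map while merging similar words and dropping stop words. Everything in it is an assumption for illustration: the class name `NormalizedIndexSketch`, the two-word stop list, and the naive rule that strips a trailing "er" are all hypothetical, and Lucene's own analyzers and stemmers work differently.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index with term normalization; hypothetical, not Lucene code.
class NormalizedIndexSketch {
    // Assumed stop list, chosen only for this example.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("and", "the"));

    // Naive normalization: lowercase, then strip a trailing "er"
    // so that "keeper" merges with "keep".
    static String normalize(String word) {
        String term = word.toLowerCase(Locale.ROOT);
        if (term.length() > 4 && term.endsWith("er")) {
            term = term.substring(0, term.length() - 2);
        }
        return term;
    }

    // Builds term -> sorted document ids, skipping stop words.
    static SortedMap<String, SortedSet<Integer>> build(Map<Integer, String> docs) {
        SortedMap<String, SortedSet<Integer>> index = new TreeMap<>();
        for (Map.Entry<Integer, String> doc : docs.entrySet()) {
            for (String word : doc.getValue().split("\\s+")) {
                String term = normalize(word);
                if (term.isEmpty() || STOP_WORDS.contains(term)) {
                    continue;
                }
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(doc.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<Integer, String> docs = new TreeMap<>();
        docs.put(1, "the fire and the keeper");
        docs.put(2, "Keep the big fire");
        System.out.println(build(docs)); // prints {big=[2], fire=[1, 2], keep=[1, 2]}
    }
}
```

Six distinct words across the two documents collapse into three index entries, which is the "smaller size on disk" effect, and a search for "keep" now also finds the document that only contains "keeper".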
  9. 9. Analyzers
       “@Andy52 went to school yesterday!”
       • StandardAnalyzer: [andy52] [went] [school] [yesterday]
       • StopAnalyzer: [andy] [went] [school] [yesterday]
       • SimpleAnalyzer: [andy] [went] [to] [school] [yesterday]
       • WhitespaceAnalyzer: [@Andy52] [went] [to] [school] [yesterday!]
       • KeywordAnalyzer: [@Andy52 went to school yesterday!]
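
The differences between the simpler analyzers come down to how the text is split and what is filtered afterwards. The plain-Java sketch below approximates four of them on the example sentence. It is an illustration only: `AnalyzerSketch` and its methods are hypothetical names, the stop list is an assumed subset of common English stop words, and StandardAnalyzer is left out because its grammar-based tokenizer is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Rough approximations of four analyzers; hypothetical, not Lucene code.
class AnalyzerSketch {
    // Assumed stop list for this example only.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "to", "of"));

    // Like WhitespaceAnalyzer: split on whitespace only, keep case and punctuation.
    static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Like SimpleAnalyzer: split on every non-letter, then lowercase.
    static List<String> simple(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Like StopAnalyzer: the simple() tokens minus stop words.
    static List<String> stop(String text) {
        List<String> tokens = simple(text);
        tokens.removeAll(STOP_WORDS);
        return tokens;
    }

    // Like KeywordAnalyzer: the whole input is a single token.
    static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }

    public static void main(String[] args) {
        String text = "@Andy52 went to school yesterday!";
        System.out.println(whitespace(text)); // [@Andy52, went, to, school, yesterday!]
        System.out.println(simple(text));     // [andy, went, to, school, yesterday]
        System.out.println(stop(text));       // [andy, went, school, yesterday]
        System.out.println(keyword(text));    // [@Andy52 went to school yesterday!]
    }
}
```

The same analyzer has to be used at indexing time and at query time; otherwise a query token such as "andy" would never match an indexed token such as "@Andy52".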
  10. 10. Which well-known apps use that?
       • Twitter
       • LinkedIn
       • Myspace
  11. 11. That’s all, thanks!
