Apache Lucene

1.
Apache Lucene (Core)
November 2013
Engin Yöyen

2.
What is Lucene?
• Information retrieval library / text search engine
• Built with Java
• High performance, scalable
• Turn-key solution

3.
Document Model

4.
Indexing

5.
Inverted Index

id  term      docId
1   take      3
2   step      3
3   hang      1
4   right     1,2
5   people    2,3
6   consider  2
7   wrong     2
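The table maps each term to the documents that contain it. A minimal sketch of how such a structure can be built is given here in Python for brevity; it is not Lucene's implementation or API. The three term lists are hypothetical, chosen only so that the result reproduces the postings in the table, and `build_inverted_index` is an illustrative name.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


# Hypothetical, already-analyzed documents (assumption: not the slide's originals).
docs = {
    1: ["hang", "right"],
    2: ["right", "people", "consider", "wrong"],
    3: ["take", "step", "people"],
}

index = build_inverted_index(docs)
print(index["people"])  # [2, 3]
print(index["right"])   # [1, 2]
```

Looking up a term is then a single dictionary access, which is why search cost depends on the number of matching postings rather than on the total amount of text.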

6.
Queries
• Field-based (author:tolstoy content:people)
• Boolean (people AND fear) (+people +fear -fun)
• Wildcard (fe?r, peop?)
• Fuzzy (mist~, mist~0.9)
• Proximity ("sooner later"~1)
• Range ([1850 TO 1890])
• Boost factor (people^2 fear)
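To illustrate how two of these query types can be answered from an inverted index, here is a rough Python sketch, not Lucene's query parser or scorer. The postings are hypothetical, and `boolean_search` and `expand_wildcard` are illustrative names. The boolean query `+people +fear -fun` becomes set intersection and difference, and a wildcard pattern is expanded against the term dictionary.

```python
from fnmatch import fnmatchcase


def boolean_search(index, required=(), prohibited=()):
    """Return sorted ids of docs with every required term and no prohibited term."""
    if not required:
        return []
    result = set(index.get(required[0], ()))
    for term in required[1:]:
        result &= set(index.get(term, ()))   # "+term": must be present
    for term in prohibited:
        result -= set(index.get(term, ()))   # "-term": must be absent
    return sorted(result)


def expand_wildcard(index, pattern):
    """Return the indexed terms matching a pattern ('?' = one char, '*' = any run)."""
    return sorted(term for term in index if fnmatchcase(term, pattern))


# Hypothetical postings: term -> ids of the documents containing it.
index = {"people": [1, 2, 3], "fear": [2, 3], "fun": [3]}

print(boolean_search(index, required=["people", "fear"], prohibited=["fun"]))  # [2]
print(expand_wildcard(index, "fe?r"))  # ['fear']
```

A wildcard query can then be treated as an OR over the expanded terms, which is one reason broad patterns tend to be more expensive than exact terms.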

7.
Tokenizers & Token Filters
Example input: the quick brown fox jumped over the lazy dogs
• WhitespaceTokenizer (“the” “quick” “brown” “fox” “jumped” “over” “the” “lazy” “dogs”)
• StandardTokenizer followed by a stop-word filter, as in StandardAnalyzer (“quick” “brown” “fox” “jumped” “over” “lazy” “dogs”)
• Stem filter (waiting -> wait)
• Lower case filter
• Synonym filter
• and more...
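The idea of a tokenizer feeding a chain of token filters can be sketched as follows. This is a simplified Python illustration, not Lucene's analysis API: the stop-word set is a hypothetical short list, and `toy_stem_filter` only strips a trailing "ing", which is far cruder than a real stemmer.

```python
STOP_WORDS = {"the", "a", "an", "and", "of"}  # hypothetical stop list


def whitespace_tokenize(text):
    """Split the text on runs of whitespace."""
    return text.split()


def lower_case_filter(tokens):
    return [t.lower() for t in tokens]


def stop_filter(tokens, stop_words=STOP_WORDS):
    return [t for t in tokens if t not in stop_words]


def toy_stem_filter(tokens):
    """Strip a trailing 'ing' from longer tokens (toy rule, not a real stemmer)."""
    return [t[:-3] if t.endswith("ing") and len(t) > 5 else t for t in tokens]


def analyze(text):
    """Run the tokenizer, then apply each filter in order."""
    tokens = whitespace_tokenize(text)
    for token_filter in (lower_case_filter, stop_filter, toy_stem_filter):
        tokens = token_filter(tokens)
    return tokens


print(analyze("The quick brown fox jumped over the lazy dogs"))
# ['quick', 'brown', 'fox', 'jumped', 'over', 'lazy', 'dogs']
print(analyze("waiting"))  # ['wait']
```

The order of the filters matters: lower-casing runs before stop-word removal so that "The" and "the" are both dropped. The same chain should be applied at index time and at query time so that query terms match indexed terms.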

8.
Performance
• Indexing throughput of over 150 GB/hour

9.
Questions?
