A Brief Introduction to Lucene, the small but powerful engine behind Solr & Elasticsearch.
Lucene is one of the best examples and demonstrations of information retrieval over text-based documents.
2. What is Lucene?
• Lucene is a high-performance, scalable information retrieval (IR) library
• Lucene is just a software library, a toolkit
• A number of full-featured search applications have been built on top of Lucene
• Lucene was written by Doug Cutting
• Beyond Lucene’s core JAR are a number of extension modules that offer useful add-on functionality. Some of these are vital to almost all applications, like the spellchecker and highlighter modules.
3. Components of Search
• Indexing
    • Acquire Content
    • Build Document
    • Analyze Document
• Searching
    • Build Query
    • Search Query
    • Render Results
5. Building Index – Introduction
• Lucene indexes data as an inverted index.
• What is an inverted index? What does it look like?
• Lucene stores indexed data in files called segments.
• What is inside these segments?
• Lucene has a flexible schema
• Documents and Fields in Lucene
• De-normalization
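To make the inverted index idea concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene's API or its on-disk segment format: the class name ToyInvertedIndex and its methods are invented for illustration, and tokenization is a naive split on non-word characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index: maps each term to the sorted set of ids of the
// documents that contain it. Hypothetical class, not part of Lucene.
class ToyInvertedIndex {
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    // Adds a document and returns the id assigned to it.
    int addDocument(String text) {
        int docId = nextDocId++;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Returns the ids of the documents containing the term, in ascending order.
    List<Integer> lookup(String term) {
        SortedSet<Integer> docs = postings.get(term.toLowerCase(Locale.ROOT));
        return docs == null ? new ArrayList<Integer>() : new ArrayList<Integer>(docs);
    }
}
```

A lookup goes straight from a term to the documents containing it, without scanning every document, which is the property that makes this structure suitable for search.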
6. Building Index – Indexing Process
• Extracting text and creating the document
• Analysis
• Adding to the index
Build Doc → Analyze Doc → Index
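The analysis step can be sketched roughly as follows. This is a simplified stand-in written in plain Java, not one of Lucene's analyzers: the class ToyAnalyzer, its stop-word list, and the tokenization rule are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy analysis step: turns raw text into the terms that would be indexed.
// Hypothetical class; the stop-word list is an arbitrary assumption.
class ToyAnalyzer {
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "is", "of"));

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // Tokenize: split on anything that is not a letter or a digit.
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            // Normalize case so "FAST" and "fast" become the same term.
            String term = token.toLowerCase(Locale.ROOT);
            // Drop empty tokens and stop words.
            if (!term.isEmpty() && !STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }
}
```

The terms returned by a step like this are what get added to the index, so the same analysis generally needs to be applied to query text at search time for terms to match.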
7. Building Index – Indexing Utils
• Indexing Operations
    • Add
    • Delete
    • Update
• Various Field Types
• Boosting documents and fields
• Optimize Index
• Concurrency, thread safety, and locking issues
• Index Commits
• Merging
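The three indexing operations can be illustrated with a small in-memory sketch. The one Lucene-specific point it mirrors is that an update is a delete followed by an add rather than an in-place edit, which is how IndexWriter.updateDocument is documented to behave. Everything else (the class ToyWriter, its methods, and representing a document as a map of field names to values) is invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy document store showing add, delete, and update.
// Hypothetical class, not Lucene's IndexWriter.
class ToyWriter {
    // Each document is a map of field name -> value.
    private final List<Map<String, String>> docs = new ArrayList<>();

    // Convenience builder: doc("id", "1", "title", "x").
    static Map<String, String> doc(String... keyValues) {
        Map<String, String> d = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            d.put(keyValues[i], keyValues[i + 1]);
        }
        return d;
    }

    void add(Map<String, String> doc) {
        docs.add(new LinkedHashMap<>(doc));
    }

    // Deletes every document whose field equals value; returns how many were removed.
    int delete(String field, String value) {
        int before = docs.size();
        docs.removeIf(d -> value.equals(d.get(field)));
        return before - docs.size();
    }

    // Update modeled as delete-then-add.
    void update(String field, String value, Map<String, String> replacement) {
        delete(field, value);
        add(replacement);
    }

    // Returns the documents whose field equals value.
    List<Map<String, String>> find(String field, String value) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> d : docs) {
            if (value.equals(d.get(field))) hits.add(d);
        }
        return hits;
    }

    int size() {
        return docs.size();
    }
}
```

Because update removes every document matching the key before adding the replacement, a unique identifier field is the usual choice of key.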
8. Search over Index
• Search Introduction
• Lucene Query Modeling
• Search queries and their parsers
• Paging and Sorting Results
• Understanding Lucene scoring
Search User Interface → Build Query → Run Query → Render Results
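Scoring, sorting, and paging can be sketched together in a few lines of plain Java. The weight used here is a deliberately simplified TF-IDF (term frequency times the log of an inverse document frequency), chosen only to show the idea; it is not Lucene's scoring formula, and the class ToySearcher and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Toy ranked search for a single term: score, sort by score, return one page.
// Hypothetical class; the formula is a simplification, not Lucene's.
class ToySearcher {
    private final List<List<String>> docs = new ArrayList<>();

    void add(String text) {
        docs.add(Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+")));
    }

    // Term frequency in the document times an inverse-document-frequency factor.
    double score(int docId, String term) {
        long tf = docs.get(docId).stream().filter(term::equals).count();
        long df = docs.stream().filter(d -> d.contains(term)).count();
        return df == 0 ? 0.0 : tf * Math.log(1.0 + (double) docs.size() / df);
    }

    // Returns the document ids on the given zero-based page, best score first.
    List<Integer> search(String term, int page, int pageSize) {
        final String t = term.toLowerCase(Locale.ROOT);
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < docs.size(); id++) {
            if (score(id, t) > 0) hits.add(id);
        }
        // Stable sort: documents with equal scores keep ascending id order.
        hits.sort(Comparator.comparingDouble((Integer id) -> score(id, t)).reversed());
        int from = Math.min(page * pageSize, hits.size());
        int to = Math.min(from + pageSize, hits.size());
        return new ArrayList<>(hits.subList(from, to));
    }
}
```

Paging here is just a slice of the sorted hit list, so asking for a page past the end returns an empty list rather than failing.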
10. Lucene Extras
• Codecs
    • The Codec API allows you to customise the way pieces of index information are stored.
    • Example: SimpleTextCodec
• Faceting