4. A SOFTWARE
• that builds an index on text
• that answers queries using that index
5. Any search application has two major components:
• SEARCH component
• INDEXING component
- of importance to us developers (read: headache)
- of importance to the users
15. data (documents) -> text that should be indexed -> fed to the Analyzer
Analyzer:
• removing stop words such as "a" or "the"
• converting all text to lowercase letters for case-insensitive searching
• stemming (a stemming algorithm reduces the words "fishing", "fished", "fish", and "fisher" to the root word "fish")
Analyzer -> tokenized text -> Index Writer -> INDEX FILES
user -> sends search query, receives search results
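The analyzer steps on this slide can be sketched in a few lines of Python. This is only an illustration, not how Lucene implements its analyzers: the stop-word list, the suffix list, and the names `naive_stem` and `analyze` are all assumptions made for the example, and the stemmer is deliberately naive.

```python
import re

# Assumed, tiny stop-word list for illustration; real analyzers use longer lists.
STOP_WORDS = {"a", "an", "and", "the"}

# Hypothetical suffix list for a deliberately naive stemmer (not a real algorithm
# such as Porter's).
SUFFIXES = ("ing", "ed", "er", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text):
    """Lowercase the text, split it into tokens, drop stop words, then stem."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


print(analyze("Fishing, fished, fish and the fisher"))
# ['fish', 'fish', 'fish', 'fish']
```

The list returned by `analyze` plays the role of the "tokenized text" that is handed to the Index Writer.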
16. Document 1:
Coffee isn't my cup of tea.
Document 2:
Chocolate, men, coffee - some things are better rich.
INDEX
coffee - 1,2
cup - 1
tea - 1
chocolate - 2
men - 2
things - 2
better - 2
rich - 2
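An index of this shape (term -> list of document numbers) can be built with a short Python sketch. The stop-word set below is an assumption, chosen so that the surviving terms are the ones listed on the slide; `build_index` is a hypothetical helper name, and no stemming is applied.

```python
import re
from collections import defaultdict

# Assumed stop words, chosen so the remaining terms match the slide's term list.
STOP_WORDS = {"isn't", "my", "of", "some", "are"}


def build_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"[a-z']+", text.lower()):
            if token not in STOP_WORDS:
                postings[token].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


docs = {
    1: "Coffee isn't my cup of tea.",
    2: "Chocolate, men, coffee - some things are better rich.",
}
index = build_index(docs)
print(index["coffee"])  # [1, 2]
print(index["cup"])     # [1]
print(index["rich"])    # [2]
```

Answering a one-word query is then a dictionary lookup: `index.get("coffee", [])` returns the documents that contain the term.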
24. Ways of storing fields of any document:
Indexed: the field is searchable.
Stored: the content can be displayed in the search results; you may choose to store a field without making it searchable. Example: “summary associated with a page”.
Tokenized: the field is run through an Analyzer, which converts the content into a sequence of tokens.
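The three options can be modelled as independent flags on a field. The sketch below is a conceptual illustration in Python, not the Lucene API: the `Field` class, the helper names, and the sample page are all hypothetical, and tokenizing is reduced to lowercasing and splitting on whitespace.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    value: str
    indexed: bool = True    # searchable
    stored: bool = True     # can be displayed in search results
    tokenized: bool = True  # run through an analyzer before indexing


def index_terms(field):
    """Terms that would be written to the index for this field."""
    if not field.indexed:
        return []
    if field.tokenized:
        return field.value.lower().split()
    return [field.value]  # indexed as one exact term


def stored_view(fields):
    """The part of a document that can be shown in search results."""
    return {f.name: f.value for f in fields if f.stored}


page = [
    Field("title", "Intro to Search"),
    Field("summary", "A short overview", indexed=False),        # stored only
    Field("url", "http://example.com/intro", tokenized=False),  # exact match
    Field("body", "long page text", stored=False),              # searchable only
]
print(index_terms(page[0]))  # ['intro', 'to', 'search']
print(stored_view(page))
```

In this sketch the summary is never searchable but is still returned with the results, which is the case the slide's example describes.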
26. • open source
• handles index and query requests to Lucene via HTTP and XML (also JSON)
• manages document add, update, and delete requests to Lucene
• straightforward schema and config files
• comprehensive HTML Admin Interfaces
• highly configurable
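To give a feel for the XML side of this, here is a minimal Python sketch that builds the kind of `<add>` and `<delete>` messages Solr's XML update format uses. It only constructs the message text and sends nothing; the helper names and the sample document are assumptions for illustration, and the endpoint the messages would be posted to depends on the installation.

```python
import xml.etree.ElementTree as ET


def add_message(doc):
    """Build an <add> update message for one document given as a dict."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def delete_message(doc_id):
    """Build a <delete> message that removes one document by its id."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = str(doc_id)
    return ET.tostring(delete, encoding="unicode")


# Hypothetical sample document; the field names must exist in the schema.
print(add_message({"id": 1, "name": "video"}))
# <add><doc><field name="id">1</field><field name="name">video</field></doc></add>
print(delete_message(1))
# <delete><id>1</id></delete>
```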
33. Default Parameters
param   default    description
q       (none)     The query
start   0          Offset into the list of matches
rows    10         Number of documents to return
fl      *          Stored fields to return
qt      standard   Query type; maps to query handler
df      (schema)   Default field to search
Example:
http://localhost:8983/solr/select?q=video&start=0&rows=2&fl=name,price
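A request URL like the one above can be assembled with Python's standard library. This is a small sketch: `select_url` and `effective_params` are hypothetical helpers, the `DEFAULTS` dictionary just restates the defaults listed on the slide (`q` has none and `df` comes from the schema, so both are left out), and no request is actually sent.

```python
from urllib.parse import urlencode

# Defaults as listed on the slide; q has no default and df comes from the schema.
DEFAULTS = {"start": 0, "rows": 10, "fl": "*", "qt": "standard"}


def effective_params(q, **overrides):
    """Parameters in effect once the defaults are filled in."""
    return {"q": q, **DEFAULTS, **overrides}


def select_url(q, base="http://localhost:8983/solr/select", **params):
    """Build a select URL that carries only the explicitly given parameters."""
    query = {"q": q, **params}
    # Keep "," and "*" readable instead of percent-encoding them.
    return base + "?" + urlencode(query, safe=",*")


print(select_url("video", start=0, rows=2, fl="name,price"))
# http://localhost:8983/solr/select?q=video&start=0&rows=2&fl=name,price
print(effective_params("video", rows=2))
```

Parameters that are omitted from the URL fall back to their defaults on the server side, which is what `effective_params` mimics.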