Indexing puts data into a structured format that is easy to search.
So what needs to be indexed and searched?
Various file formats: text files, HTML, PDF, MS Word, PPT
Coming from various data sources: emails, CMS, file system, database, web pages
Indexing flow: data (documents) -> Analyzer -> Index Writer -> index files. The user sends a search query and receives search results.
Analyzer: the text that should be indexed is fed to the Analyzer, which prepares it by
removing stop words such as "a" or "the"
converting all text to lowercase letters for case-insensitive searching
stemming (a stemming algorithm reduces the words "fishing", "fished", "fish", and "fisher" to the root word "fish")
Index Writer: receives the tokenized text from the Analyzer.
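The three Analyzer steps above can be sketched in a few lines of Python. This is only an illustration, not Lucene's Analyzer: the stop-word list and the suffix-stripping `stem` function are assumptions made for this example, and a real stemmer (such as the Porter algorithm) handles far more cases.

```python
import re

# Hypothetical stop-word list, chosen for this example only.
STOP_WORDS = {"a", "an", "and", "the", "of"}

# Naive suffix list: a stand-in for a real stemming algorithm.
SUFFIXES = ("ing", "ed", "er", "s")


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Lowercase the text, split it into tokens, drop stop words, stem the rest."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


print(analyze("Fishing, fished, the Fisher and a fish"))
```

With this toy stemmer, all four variants of "fish" in the sample sentence collapse to the same token, which is what lets a query for one form match the others.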
Document 1: Coffee isn't my cup of tea.
Document 2: Chocolate, men, coffee - some things are better rich.
INDEX (term - documents containing it; stop words are left out)
coffee - 1,2
cup - 1
tea - 1
chocolate - 2
men - 2
things - 2
better - 2
rich - 2
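The index for these two documents can be reproduced with a small Python sketch of an inverted index. It is not how Lucene stores its index files; the stop-word set is an assumption chosen so that the output contains the same terms as the example, and the function names are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed stop words: the words from the two documents that the example index omits.
STOP_WORDS = {"isn't", "my", "of", "some", "are"}

DOCS = {
    1: "Coffee isn't my cup of tea.",
    2: "Chocolate, men, coffee - some things are better rich.",
}


def build_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"[a-z']+", text.lower()):
            if token not in STOP_WORDS:
                index[token].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, term):
    """Return the ids of the documents containing the term (case-insensitive)."""
    return index.get(term.lower(), [])


INDEX = build_index(DOCS)
for term, doc_ids in INDEX.items():
    print(term, "-", ",".join(str(d) for d in doc_ids))
```

A lookup such as `search(INDEX, "Coffee")` only consults the term's entry, so the documents themselves never have to be scanned at query time.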
Ways of storing the fields of a document:
Indexed: the field is searchable.
Stored: the content can be displayed in the search results. You may choose to store a field without making it searchable. Example: "summary associated with a page".
Tokenized: the field is run through an Analyzer, which converts the content into a sequence of tokens.
Default Parameters
Example: http://localhost:8983/solr/select?q=video&start=0&rows=2&fl=name,price

param   default     description
q                   The query
start   0           Offset into the list of matches
rows    10          Number of documents to return
fl      *           Stored fields to return
qt      standard    Query type; maps to query handler
df      (schema)    Default field to search
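As a rough sketch, the example URL can be assembled from these parameters with Python's standard library. The helper name `build_select_url` is hypothetical; the base URL and the defaults are taken from the table above, and the sketch only builds the URL without sending a request.

```python
from urllib.parse import urlencode

# Base URL as shown in the example request above.
BASE_URL = "http://localhost:8983/solr/select"


def build_select_url(q, start=0, rows=10, fl="*"):
    """Build a select URL; the defaults mirror the parameter table."""
    params = {"q": q, "start": start, "rows": rows, "fl": fl}
    # Keep "," and "*" unescaped so the result reads like the example URL.
    return BASE_URL + "?" + urlencode(params, safe=",*")


print(build_select_url("video", start=0, rows=2, fl="name,price"))
```

Leaving out `start`, `rows`, and `fl` falls back to the defaults from the table, which asks for the first 10 matches with all stored fields.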
Solr architecture (diagram)
Servlets: HTTP Request Servlet (search requests hit here), Update Servlet (new documents to be added arrive here)
Request handlers: Standard Request Handler, Disjunction Max Request Handler, Custom Request Handler
Update: XML Update Interface, Update Handler
Response: XML Response Writer
Solr Core: Config, Schema, Analysis, Caching, Concurrency, Replication
Other components: Lucene, Admin Interface