
Solr: Enterprise Search Server


PRESENTATION: Solr Enterprise Search Server, by Armen Polischuk (November 30, 2010)

Introduction
  • Java web application (HTTP/XML)
  • Uses Apache Lucene as its text search engine
  • Built on an inverted index data structure

Common Usage

Lucene Features
  • A text-based inverted index in persistent storage, for efficient retrieval of documents by indexed terms
  • A rich set of text analyzers that transform a string of text into a series of terms (words), which are the fundamental units indexed and searched
  • A query syntax with a parser and a variety of query types, from simple term lookup to exotic fuzzy matches
  • A good scoring algorithm based on sound Information Retrieval (IR) principles that produces the more likely candidates first, with flexible means to affect the scoring
  • A highlighter feature to show words found in context
  • A query spellchecker based on indexed content

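The first two Lucene features, analysis and the inverted index, can be illustrated with a toy sketch. This is not Lucene's implementation: the `analyze`, `build_inverted_index`, and `search_all` names and the sample documents are invented for illustration, and the "analyzer" only lowercases text and splits it into words.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer: lowercase the text and split it into word terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index, query):
    """Return ids of documents that contain every term of the query."""
    terms = analyze(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Java Web Developer",
    2: "Python developer",
    3: "Java Swing developer",
}
index = build_inverted_index(docs)
print(search_all(index, "java developer"))  # [1, 3]
```

The lookup never scans document text: it intersects the posting lists of the query terms, which is what makes retrieval by indexed term efficient.
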
Solr Features
  • HTTP request processing for indexing and querying documents
  • Several caches for faster query responses
  • A web-based administrative interface
  • Configuration files for the schema and the server itself
  • The disjunction-max (dismax) query handler
  • A more-like-this plugin that lists documents similar to a chosen document
  • A distributed Solr server model

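The idea behind a disjunction-max score can be sketched in a few lines: when the same query is matched against several fields, the best field score wins, and the remaining field scores contribute only through a tie-breaker factor. The function name and the sample scores below are assumptions made for illustration, not Solr code.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores as max + tie * (sum of the other scores)."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


# Hypothetical scores of one document in three fields.
scores = [2.0, 1.0, 0.5]
print(dismax_score(scores))           # 2.0 (pure maximum)
print(dismax_score(scores, tie=0.5))  # 2.75
```

With `tie=0.0` a document that matches strongly in one field outranks one that matches weakly in many; raising `tie` toward 1.0 moves the result toward a plain sum.
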
Indexing Data
  • Solr's native XML
  • CSV, JSON
  • Direct database and XML import through Solr's DataImportHandler
  • Rich documents through Solr Cell (PDF, DOC, XLS, PPT)

Indexing XML Request

Adding documents:

    <add allowDups="false">
      <doc boost="2.0">
        <field name="doc_id">1</field>
        <field name="type">PERSON</field>
        <field name="first_name" boost="2.5">Armen</field>
        <field name="last_name">Polischuk</field>
      </doc>
      <doc>
        <field name="doc_id">2</field>
        <field name="type">PERSON</field>
        <field name="first_name">John</field>
        <field name="last_name">Smith</field>
      </doc>
    </add>

Deleting documents:

    <delete><id>doc_id:2</id><id>doc_id:3</id></delete>

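An add request like the one above can be generated rather than written by hand. The sketch below uses only Python's standard library; `build_add_request` is a hypothetical helper, and it does not send anything to a server. Posting the result to a Solr update URL is left out because the exact endpoint depends on the installation.

```python
import xml.etree.ElementTree as ET


def build_add_request(docs, allow_dups=False):
    """Serialize a list of field dictionaries into a Solr <add> message."""
    add = ET.Element("add", {"allowDups": "true" if allow_dups else "false"})
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", {"name": name})
            field.text = str(value)
    # encoding="unicode" returns a str instead of bytes.
    return ET.tostring(add, encoding="unicode")


people = [
    {"doc_id": 1, "type": "PERSON", "first_name": "Armen", "last_name": "Polischuk"},
    {"doc_id": 2, "type": "PERSON", "first_name": "John", "last_name": "Smith"},
]
print(build_add_request(people))
```

Building the message through an XML library rather than string concatenation means characters such as `&` and `<` in field values are escaped automatically.
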
Basic Searching
  • Using the web interface
  • Using HTTP request/response
  • Using the SolrJ client

Example:

    http://localhost:8983/solr/select?indent=on&version=2.2&q=*%3A*&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=

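A query URL of this shape can be assembled with the standard library so that special characters are percent-encoded consistently. The helper name `build_select_url` and its defaults are assumptions for illustration. Only a subset of the parameters from the example is included, and the base URL is the local address used in the slide.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_select_url(base_url, query, start=0, rows=10, fields="*,score"):
    """Build a /select URL with percent-encoded query parameters."""
    params = {
        "q": query,       # the search query, e.g. *:* to match everything
        "start": start,   # offset of the first result (paging)
        "rows": rows,     # number of results to return
        "fl": fields,     # fields to return for each document
        "indent": "on",   # ask for an indented, readable response
    }
    return base_url + "/select?" + urlencode(params)


url = build_select_url("http://localhost:8983/solr", "*:*")
print(url)

# Decoding the URL again recovers the original parameter values.
print(parse_qs(urlsplit(url).query)["q"])  # ['*:*']
```
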
Searching XML Response

    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">392</int>
        <lst name="params">
          <str name="explainOther"/>
          <str name="fl">*,score</str>
          <str name="start">0</str>
          <str name="q">*:*</str>
          <str name="hl.fl"/>
        </lst>
      </lst>
      <result name="response" numFound="1002272" start="0" maxScore="1.0">
        <doc>
          <float name="score">1.0</float>
          <str name="id">PERSON:1</str>
          <str name="first_name">Armen</str>
          <str name="last_name">Polischuk</str>
        </doc>
      </result>
    </response>

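A response in this format can be read with any XML parser. The sketch below is a minimal example, assuming a response with the same layout as the one shown: a `result` element with a `numFound` attribute and flat, single-valued fields. The `parse_response` name and the shortened `SAMPLE` string are made up for illustration, and all field values are returned as strings.

```python
import xml.etree.ElementTree as ET

# Shortened sample modeled on the response above.
SAMPLE = """
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">392</int>
  </lst>
  <result name="response" numFound="1002272" start="0" maxScore="1.0">
    <doc>
      <float name="score">1.0</float>
      <str name="id">PERSON:1</str>
      <str name="first_name">Armen</str>
      <str name="last_name">Polischuk</str>
    </doc>
  </result>
</response>
"""


def parse_response(xml_text):
    """Return (total hit count, list of documents as name -> text dicts)."""
    root = ET.fromstring(xml_text.strip())
    result = root.find("result")
    docs = []
    for doc in result.findall("doc"):
        # Each child (<str>, <float>, ...) carries the field name in "name".
        docs.append({child.get("name"): child.text for child in doc})
    return int(result.get("numFound")), docs


total, found = parse_response(SAMPLE)
print(total)                    # 1002272
print(found[0]["first_name"])   # Armen
```

Note that `numFound` is the total number of matches, while the number of `doc` elements is limited by the `rows` parameter of the request.
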
Features
  • And, or, not:
        Java AND Developer NOT swing
        (Java OR Python) AND Developer
  • Field qualifier:
        first_name:John AND last_name:Doe
  • Phrase and term proximity:
        "Web Developer"
        "Web Developer"~3
  • Wildcards:
        Java* AND Developer
  • Boosting:
        Java^10 AND Web^5 AND Developer
  • Filtering and sorting:
        q=Java&fq=type%3APERSON&sort=score+asc

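The query forms listed above are plain strings, so they can be composed with a few small helpers. The function names below (`phrase`, `fielded`, `boosted`, `all_of`) are hypothetical and cover only the cases shown in the slide; they do not escape every character that is special in the query syntax.

```python
from urllib.parse import urlencode


def phrase(text, slop=None):
    """Quote a phrase; with slop, append ~N for a proximity search."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    quoted = f'"{escaped}"'
    return quoted if slop is None else f"{quoted}~{slop}"


def fielded(field, clause):
    """Restrict a clause to one field, e.g. first_name:John."""
    return f"{field}:{clause}"


def boosted(clause, factor):
    """Raise the weight of a clause, e.g. Java^10."""
    return f"{clause}^{factor}"


def all_of(*clauses):
    """Require every clause by joining them with AND."""
    return " AND ".join(clauses)


print(all_of(boosted("Java", 10), boosted("Web", 5), "Developer"))
# Java^10 AND Web^5 AND Developer
print(phrase("Web Developer", slop=3))
# "Web Developer"~3

# Filtering and sorting are separate request parameters, not query syntax.
print(urlencode({"q": "Java", "fq": "type:PERSON", "sort": "score asc"}))
# q=Java&fq=type%3APERSON&sort=score+asc
```

`sort=score asc`, as in the slide, returns the lowest-scoring matches first; Solr's default ordering is by descending score.
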
Advanced Features
  • Highlighting
  • Query elevation
  • Spell checking, aka "Did you mean..."
  • The more-like-this search

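Several of these features are switched on per request through query parameters. The sketch below builds such a parameter set; the helper name `advanced_params` and the field names are assumptions. It also assumes a request handler that already has the highlighting, spell checking, and more-like-this components configured, which depends on the server's configuration files. Query elevation is left out because it is driven by server-side configuration rather than by these parameters.

```python
from urllib.parse import urlencode


def advanced_params(query, highlight_fields=(), spellcheck=False, mlt_fields=()):
    """Build request parameters that enable optional search features."""
    params = {"q": query}
    if highlight_fields:
        params["hl"] = "true"                        # turn highlighting on
        params["hl.fl"] = ",".join(highlight_fields)  # fields to highlight
    if spellcheck:
        params["spellcheck"] = "true"          # ask for suggestions
        params["spellcheck.collate"] = "true"  # and a rewritten query
    if mlt_fields:
        params["mlt"] = "true"                  # similar documents per hit
        params["mlt.fl"] = ",".join(mlt_fields)  # fields used for similarity
    return params


# Hypothetical misspelled query with all three features enabled.
params = advanced_params(
    "devloper",
    highlight_fields=["first_name", "last_name"],
    spellcheck=True,
    mlt_fields=["type"],
)
print(urlencode(params))
```
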
Scaling: approaches
  • Optimizing a single Solr server
      • JVM params
      • HTTP caching
      • Solr caching
  • Split data by doc type
  • Scale wide
  • Shards

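Sharding can be sketched from the client's side in two steps: decide which shard each document is indexed on, then name all shards when querying so one server can merge the partial results. The hash-based routing below is one common choice, not something the slides prescribe, and the host names and the `shard_for` and `shards_param` helpers are hypothetical.

```python
import zlib
from urllib.parse import urlencode


def shard_for(doc_id, shard_count):
    """Pick a shard by hashing the document id (stable across runs)."""
    return zlib.crc32(str(doc_id).encode("utf-8")) % shard_count


def shards_param(shard_hosts):
    """Format the comma-separated value of the `shards` query parameter."""
    return ",".join(shard_hosts)


# Hypothetical shard addresses, written as host:port/path without a scheme.
hosts = ["solr1:8983/solr", "solr2:8983/solr", "solr3:8983/solr"]

# Indexing: every document id maps to exactly one shard.
target = hosts[shard_for("PERSON:1", len(hosts))]
print(target)

# Searching: send the query to any shard and list all of them.
print(urlencode({"q": "Java", "shards": shards_param(hosts)}))
```

A stable hash matters here: Python's built-in `hash()` is randomized per process for strings, so it would route the same id to different shards on different runs.
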
Scaling: the whole picture

Documentation
  • http://wiki.apache.org/lucene-java/FrontPage
  • http://wiki.apache.org/solr/FrontPage
