An Introduction to Solr


A brief introduction to using Apache Solr for implementing search for your website.
Download the ppt to see comments which add more detail.
Presented at eBig Java SIG, Oakland, CA. June 2008

Published in: Technology
  • Transcript of "An Introduction to Solr"

    1. 1. Searching with Solr <ul><li>Tom Hill </li></ul><ul><li>[email_address] </li></ul><ul><li>eBig Java SIG, June 18th, 2008 </li></ul>
    2. 2. Tonight's Talk <ul><li>Tonight's talk should run about 1 1/2 hours </li></ul><ul><li>About Solr </li></ul><ul><ul><li>Background & overview </li></ul></ul><ul><li>Installing & Bringing Up Solr </li></ul><ul><li>REST Interface & Java Client </li></ul><ul><li>Configuring Solr </li></ul>
    3. 3. Why Implement Search? <ul><li>Does your site need search? </li></ul><ul><li>Do you need to implement it, or is Google enough? </li></ul><ul><ul><li>Just text or Structured Data? </li></ul></ul><ul><ul><li>Do you need to control ranking? </li></ul></ul>
    4. 4. What is Solr? <ul><li>Web application for text search </li></ul><ul><li>A wrapper around Apache Lucene </li></ul><ul><ul><li>Lucene is a library (.jar file) </li></ul></ul><ul><ul><li>Solr is a web app (.war file) </li></ul></ul><ul><li>Written at CNet, now at Apache </li></ul>
    5. 5. What is Lucene? <ul><li>Text search library in Java </li></ul><ul><ul><li>Fast, feature rich. </li></ul></ul><ul><ul><li>Written by Doug Cutting </li></ul></ul><ul><ul><li>Active Apache development community </li></ul></ul><ul><li>Versions also in C++, C#, Ruby, Python, Delphi, Lisp, etc... </li></ul>
    6. 6. Why Solr? <ul><ul><li>Reliable </li></ul></ul><ul><ul><li>Fast </li></ul></ul><ul><ul><li>Supported </li></ul></ul><ul><ul><li>Open Source </li></ul></ul><ul><ul><li>Tunable Scoring </li></ul></ul>
    7. 7. Solr Versions <ul><ul><li>Current Version is 1.2 </li></ul></ul><ul><ul><ul><li>A year old </li></ul></ul></ul><ul><ul><ul><li>1.3 is coming &quot;sometime&quot; </li></ul></ul></ul><ul><ul><li>Large number of features in HEAD </li></ul></ul><ul><ul><ul><li>Use the latest from subversion for new projects </li></ul></ul></ul>
    8. 8. Alternatives to Solr <ul><li>Just Use Google </li></ul><ul><li>Use Lucene </li></ul><ul><li>Use Your Database </li></ul><ul><li>Commercial Libraries </li></ul><ul><li>Write your own </li></ul>
    9. 9. What Solr is Not <ul><li>A replacement for a relational database </li></ul><ul><li>An embedded database* </li></ul><ul><li>Fully cross platform :-( </li></ul><ul><ul><li>Replication depends on the Unix file system </li></ul></ul><ul><ul><li>Admin scripts are bash (minor) </li></ul></ul>
    10. 10. Solr Sites <ul><li>CNet (Reviews & Products) </li></ul><ul><li>Internet Archive (Collections) </li></ul><ul><li>Netflix (Movies) </li></ul><ul><li>Zvents (Events) </li></ul><ul><li>StripSearch.ws (Comics) </li></ul><ul><li>And many more </li></ul>
    11. 11. Features Here's a quick look at some of the features of Solr, as implemented on Zvents.com
    12. 13. Faceted Navigation <ul><li>Groups the results by category </li></ul><ul><li>Can do multiple facets at once </li></ul><ul><li>Returns matching counts </li></ul>
    13. 14. Additional Constraints
    14. 15. Synonyms, etc.
    15. 16. Solr Overview
    16. 17. Simple Webapp Web Servers[1..n] Database Master Database Slaves[0..n] Solr Master Solr Slaves[0..n]
    17. 18. Scaling Solr <ul><li>Master/Slave architecture </li></ul><ul><li>Writes to master/reads to slaves </li></ul><ul><li>Replication: Periodic transfers, not continuous </li></ul><ul><li>Rsync </li></ul>
    18. 19. Updates <ul><li>Updates flush caches, which is bad for performance </li></ul><ul><li>The master is therefore much slower than the slaves </li></ul><ul><ul><li>So send all queries to slaves </li></ul></ul><ul><li>How much this matters depends on your update rates </li></ul>
    19. 20. Solr's Data Model <ul><li>Solr maintains a collection of documents </li></ul><ul><li>A document is a collection of fields & values </li></ul><ul><li>A field can occur multiple times in a document </li></ul><ul><li>Documents are immutable. </li></ul><ul><ul><li>They can be deleted, and a new version added, however. </li></ul></ul>
    20. 21. Querying <ul><li>HTTP request </li></ul><ul><li>http://localhost:8080/comix/select/?q=java </li></ul>
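Since a query is just an HTTP GET, a client only needs to build the URL correctly. The following is a minimal sketch, assuming the webapp root from the slide (`http://localhost:8080/comix`); the class and method names are made up for illustration and are not part of Solr.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SelectUrlSketch {
    // Builds a /select URL for a query string.
    // baseUrl is the webapp root, e.g. "http://localhost:8080/comix" (from the slide).
    static String selectUrl(String baseUrl, String query) {
        // URL-encode the query so characters such as ':' or spaces survive the request.
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return baseUrl + "/select/?q=" + q;
    }

    public static void main(String[] args) {
        System.out.println(selectUrl("http://localhost:8080/comix", "java"));
        System.out.println(selectUrl("http://localhost:8080/comix", "city:paris"));
    }
}
```

The encoding step matters once queries contain field names, quotes or spaces, as in the syntax examples on the following slides.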
    21. 22. Solr Query Syntax <ul><li>Lucene Query Syntax + a bit </li></ul><ul><li>paris </li></ul><ul><li>city:paris </li></ul><ul><li>title:&quot;The Right Way&quot; AND text:go </li></ul><ul><li>id:[* TO *] </li></ul>
    22. 23. Solr Query Syntax II <ul><li>-inStock:false </li></ul><ul><li>te?t </li></ul><ul><li>theat* </li></ul><ul><li>te*t </li></ul><ul><li>test~ </li></ul>
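Because characters such as `:`, `*`, `?` and `~` carry meaning in this syntax, raw user input usually needs escaping before it is dropped into a query. Below is a rough sketch of that idea using only the JDK; the class name and the exact character list are my assumptions, and unlike a real client helper it does not touch whitespace.

```java
public class QueryEscapeSketch {
    // Characters treated as special here: \ + - ! ( ) : ^ [ ] " { } ~ * ? | &
    // (an assumed list based on the Lucene query syntax, not taken from the slides).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    // Prefixes every special character with a backslash so it is matched literally.
    static String escape(String userInput) {
        StringBuilder sb = new StringBuilder();
        for (char c : userInput.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("paris"));       // plain terms pass through unchanged
        System.out.println(escape("city:paris"));  // the colon is escaped
        System.out.println(escape("theat*"));      // the wildcard is escaped
    }
}
```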
    23. 24. Using Solr <ul><li>Getting data into Solr </li></ul><ul><li>Getting data out of Solr </li></ul>
    24. 25. Getting Data Into Solr <ul><li>POST it. </li></ul><add> <doc> <field name=&quot;employeeId&quot;>05991</field> <field name=&quot;office&quot;>Bridgewater</field> <field name=&quot;skills&quot;>Perl</field> <field name=&quot;skills&quot;>Java</field> </doc> [<doc> ... </doc>[<doc> ... </doc>]] </add>
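The `<add>` message above is plain XML, so it can be assembled with nothing but string handling. Here is a small sketch, assuming fields arrive as name/value pairs; `AddXmlSketch` and its methods are hypothetical names, and the escaping covers only the basic XML metacharacters.

```java
public class AddXmlSketch {
    // Escapes the characters that would otherwise break the XML message.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Each row is {fieldName, value}; repeating a name gives a multi-valued field,
    // like "skills" on the slide.
    static String toAddXml(String[][] fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (String[] f : fields) {
            sb.append("<field name=\"").append(escapeXml(f[0])).append("\">")
              .append(escapeXml(f[1]))
              .append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        String[][] employee = {
            {"employeeId", "05991"},
            {"office", "Bridgewater"},
            {"skills", "Perl"},
            {"skills", "Java"}
        };
        System.out.println(toAddXml(employee));
    }
}
```

The resulting string is what gets POSTed to the update URL; nothing becomes searchable until a commit follows.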
    27. 28. Committing <ul><li>Nothing shows up in the index until you commit </li></ul><ul><li>You can just POST <commit/> to http://host:port/solr/update </li></ul>
    28. 29. Getting Data Out http://localhost:8080/comix/select/?q=data&indent=on <ul><li><response> </li></ul><ul><li><lst name=&quot;responseHeader&quot;> </li></ul><ul><li><int name=&quot;status&quot;>0</int> </li></ul><ul><li><int name=&quot;QTime&quot;>0</int> </li></ul><ul><li><lst name=&quot;params&quot;> </li></ul><ul><ul><li><str name=&quot;indent&quot;>on</str> </li></ul></ul><ul><ul><li><str name=&quot;q&quot;>data</str> </li></ul></ul><ul><li></lst> </li></ul><ul><li></lst> </li></ul><ul><li><result name=&quot;response&quot; numFound=&quot;2&quot; start=&quot;0&quot;> </li></ul><ul><li><doc> </li></ul><ul><ul><li><str name=&quot;id&quot;>strip.3136</str> </li></ul></ul><ul><ul><li><str name=&quot;release_date&quot;>1992-05-07</str> </li></ul></ul><ul><ul><li><date name=&quot;timestamp&quot;>2008-02-28T10:06:01.682Z</date> </li></ul></ul><ul><ul><li><str name=&quot;type&quot;>strip</str> </li></ul></ul><ul><li></doc> </li></ul><ul><li></result> </li></ul><ul><li></response> </li></ul>
    31. 32. Getting Data Out (JSON format) http://localhost:8080/comix/select/?q=data&indent=on&wt=json { &quot;responseHeader&quot;:{ &quot;status&quot;:0, &quot;QTime&quot;:1, &quot;params&quot;:{ &quot;wt&quot;:&quot;json&quot;, &quot;rows&quot;:[&quot;1&quot;, &quot;1&quot;], &quot;start&quot;:&quot;0&quot;, &quot;indent&quot;:&quot;on&quot;, &quot;q&quot;:&quot;data&quot;, &quot;version&quot;:&quot;2.2&quot;}}, &quot;response&quot;:{&quot;numFound&quot;:2,&quot;start&quot;:0,&quot;docs&quot;:[ { &quot;feature_id&quot;:&quot;3&quot;, &quot;release_date&quot;:&quot;1992-05-07&quot;, &quot;id&quot;:&quot;strip.3136&quot;, &quot;timestamp&quot;:&quot;2008-02-28T10:06:01.682Z&quot;}] }}
    32. 33. Debug Query Option <ul><li>Add &debugQuery=on to request params </li></ul><ul><li>Returns parsed form of query </li></ul><str name=&quot;rawquerystring&quot;>c.i.a</str><str name=&quot;querystring&quot;>c.i.a</str><str name=&quot;parsedquery&quot;>PhraseQuery(text:&quot;c i a&quot;)</str><str name=&quot;parsedquery_toString&quot;>text:&quot;c i a&quot;</str>
    33. 34. Debug Query Option II <ul><li>Add &debugQuery=on to request params </li></ul><ul><li>Returns scoring information </li></ul><str name=&quot;id=strip.2781,internal_docid=29854&quot;> 2.6219895 = (MATCH) fieldWeight(text:calvin in 29854), product of: 1.0 = tf(termFreq(text:calvin)=1) 2.6219895 = idf(docFreq=6222) 1.0 = fieldNorm(field=text, doc=29854) </str> <str name=&quot;id=strip.4078,internal_docid=31151&quot;> 2.6219895 = (MATCH) fieldWeight(text:calvin in 31151), product of: 1.0 = tf(termFreq(text:calvin)=1) 2.6219895 = idf(docFreq=6222) 1.0 = fieldNorm(field=text, doc=31151) </str>
    34. 35. Deleting Data <ul><li>POST </li></ul><delete><id>35</id></delete> <delete><query>city:paris</query></delete>
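The two delete forms differ only in the inner element: `<id>` removes one document by its unique key, `<query>` removes everything that matches. A minimal sketch of building both messages follows; the class and method names are illustrative only.

```java
public class DeleteXmlSketch {
    // Escapes characters that would otherwise break the XML message.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Delete a single document by its unique key.
    static String deleteById(String id) {
        return "<delete><id>" + escapeXml(id) + "</id></delete>";
    }

    // Delete every document matching a query in Solr query syntax.
    static String deleteByQuery(String query) {
        return "<delete><query>" + escapeXml(query) + "</query></delete>";
    }

    public static void main(String[] args) {
        System.out.println(deleteById("35"));
        System.out.println(deleteByQuery("city:paris"));
    }
}
```

As with adds, a delete only takes effect in search results after a commit.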
    35. 36. Command Line Control <ul><li>curl http://localhost:8983/solr/update -H &quot;Content-type: text/xml&quot; --data-binary '<commit/>' </li></ul><ul><li><?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?><response><lst name=&quot;responseHeader&quot;> </li></ul><ul><ul><li><int name=&quot;status&quot;>0</int> </li></ul></ul><ul><ul><li><int name=&quot;QTime&quot;>20</int> </li></ul></ul><ul><li></lst></response> </li></ul>
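The same commit can be issued from Java without any Solr client library. This sketch uses the JDK's `java.net.http` client (Java 11 or later) and mirrors the curl call above; the class and method names are my own, and `send` only works against a running Solr instance, so `main` just prints the request instead of sending it.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CommitRequestSketch {
    // Builds the equivalent of:
    //   curl <updateUrl> -H "Content-type: text/xml" --data-binary '<commit/>'
    static HttpRequest commitRequest(String updateUrl) {
        return HttpRequest.newBuilder(URI.create(updateUrl))
                .header("Content-Type", "text/xml")
                .POST(HttpRequest.BodyPublishers.ofString("<commit/>"))
                .build();
    }

    // Sends the request and returns the XML response body.
    // Requires a Solr server listening at the request's URL.
    static String send(HttpRequest request) throws Exception {
        HttpResponse<String> rsp = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return rsp.body();
    }

    public static void main(String[] args) {
        HttpRequest req = commitRequest("http://localhost:8983/solr/update");
        System.out.println(req.method() + " " + req.uri());
    }
}
```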
    36. 37. Solr in 3 minutes! <ul><li>Download Solr from Apache </li></ul><ul><li>Untar </li></ul><ul><li>&quot;ant example&quot; </li></ul><ul><li>Start the example app </li></ul><ul><li>Load data into Solr </li></ul><ul><li>Query </li></ul>
    37. 38. Solr in Ten Minutes <ul><li>Copy Solr's example/solr dir to /var/solr </li></ul><ul><li>Edit schema.xml and solrconfig.xml </li></ul><ul><li>In $CATALINA_HOME/conf/Catalina/localhost/foo.xml: </li></ul><Context docBase=&quot;/var/solr/apache-solr-1.2.0.war&quot; debug=&quot;0&quot; crossContext=&quot;true&quot; > <Environment name=&quot;solr/home&quot; type=&quot;java.lang.String&quot; value=&quot;/var/solr&quot; override=&quot;true&quot; /></Context> <ul><li>Load data into Solr </li></ul>
    38. 39. Directory Layout <ul><li>${solr.home}/conf </li></ul><ul><ul><li>schema.xml </li></ul></ul><ul><ul><li>solrconfig.xml </li></ul></ul><ul><li>${solr.home}/data </li></ul><ul><li>${solr.home}/logs </li></ul><ul><li>${solr.home}/bin </li></ul>
    39. 40. Java Solr Client <ul><li>Called SolrJ </li></ul><ul><li>Not in Solr 1.2. </li></ul><ul><ul><li>I grabbed from the HEAD from svn </li></ul></ul><ul><ul><li>Works with Solr 1.2 </li></ul></ul><ul><li>Add/Delete/Query/Commit/Optimize </li></ul>
    40. 41. Adding Docs w/SolrJ Given Map<String, String> fields; CommonsHttpSolrServer server = new CommonsHttpSolrServer( url ); SolrInputDocument doc= new SolrInputDocument(); for (Map.Entry<String, String> e : fields.entrySet()){ doc.addField(e.getKey(), e.getValue()); } UpdateResponse res = server .add( doc);
    41. 42. Deleting Docs w/SolrJ CommonsHttpSolrServer server = new CommonsHttpSolrServer( url ); UpdateResponse res; res = server .deleteById(&quot;100&quot;); res = server .deleteByQuery(&quot;city:paris&quot;);
    42. 43. Simple Query CommonsHttpSolrServer server = new CommonsHttpSolrServer( url ); SolrQuery query = new SolrQuery(); query.setQuery(&quot;dance&quot;); QueryResponse rsp = server .query(query);
    43. 44. More Interesting Query CommonsHttpSolrServer server = new CommonsHttpSolrServer( url ); SolrQuery query = new SolrQuery(); query.setQuery(&quot;dance&quot;); query.setFacet( true ); query.addFacetField(&quot;city&quot;); query.setFacetMinCount(1); query.addSortField( &quot;price&quot;, SolrQuery.ORDER.asc ); QueryResponse rsp = server .query(query);
    44. 45. Query Responses <ul><li>QueryResponse qr = server .query(query); </li></ul><ul><li>SolrDocumentList docs = qr.getResults(); </li></ul><ul><li>List<FacetField> lf = qr.getFacetFields(); </li></ul><ul><li>for (FacetField ff: lf) { </li></ul><ul><li>String fieldName = ff.getName(); </li></ul><ul><li>List<FacetField.Count> lc = ff.getValues(); </li></ul><ul><li>for (FacetField.Count c: lc) { </li></ul><ul><li>String countName = c.getName(); </li></ul><ul><ul><li>long count = c.getCount(); </li></ul></ul><ul><li> } </li></ul><ul><li>} </li></ul>
    45. 46. Other Commands <ul><li>Commit </li></ul><ul><ul><li>server.commit() </li></ul></ul><ul><li>Optimize </li></ul><ul><ul><li>server.optimize() </li></ul></ul><ul><li>Not too complicated! </li></ul>
    46. 47. Request Handlers <ul><li>Request handlers define how the query is processed. </li></ul><ul><li>Two main types </li></ul><ul><ul><li>StandardRequestHandler </li></ul></ul><ul><ul><li>DisMaxRequestHandler </li></ul></ul><ul><ul><li>You can implement your own </li></ul></ul><ul><li>Changing in Solr 1.3 </li></ul>
    47. 48. &quot;Standard&quot; Request Handler <ul><li>Accepts Solr Query Syntax </li></ul><ul><li>I tend to use it for my queries, not user queries. </li></ul>
    48. 49. DisMaxRequestHandler <ul><li>Recommended for user queries </li></ul><ul><li>Allows simple user keywords to be applied to multiple fields, with weighting. </li></ul><ul><li>Boost Functions </li></ul><ul><li>Boost Queries </li></ul>
    49. 50. Boost Functions <ul><li>Allow you to influence scoring at run time </li></ul><ul><li>Computationally Expensive! </li></ul><ul><li>Really useful for tuning scoring </li></ul><ul><li>linear(x,2,4) returns 2*x+4 </li></ul><ul><ul><li>x is a field </li></ul></ul>
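To make the arithmetic concrete, here is a small sketch of two function-query shapes. `linear` follows the definition on the slide. `recip` shows up later in the DisMax config example, and the formula used here, a / (m*x + b), is my recollection of Solr's function query documentation rather than something stated in this deck. In Solr, x is a per-document field value; here it is just a number.

```java
public class BoostFunctionSketch {
    // linear(x, m, c) = m*x + c, so linear(x, 2, 4) is 2*x + 4 as on the slide.
    static double linear(double x, double m, double c) {
        return m * x + c;
    }

    // recip(x, m, a, b) = a / (m*x + b) (assumed formula, see the note above).
    // With a == b the result starts at 1.0 for x = 0 and decays as x grows.
    static double recip(double x, double m, double a, double b) {
        return a / (m * x + b);
    }

    public static void main(String[] args) {
        System.out.println(linear(3, 2, 4));
        System.out.println(recip(0, 1, 1000, 1000));
        System.out.println(recip(1000, 1, 1000, 1000));
    }
}
```

Because such a function is evaluated for every matching document, this is where the "computationally expensive" warning comes from.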
    50. 51. The Solr Schema <ul><li>schema.xml </li></ul><ul><ul><li>Defines types used in this webapp </li></ul></ul><ul><ul><li>Defines the fields and their types </li></ul></ul><ul><ul><li>Defines &quot;copyFields&quot; </li></ul></ul><ul><ul><li>READ THE EXAMPLE SCHEMA.XML </li></ul></ul>
    51. 52. Types <ul><li>Types define processing for a field </li></ul><ul><ul><li>How the words are split (Whitespace? Punctuation? CIA != C.I.A.) </li></ul></ul><ul><ul><ul><li>Stemming </li></ul></ul></ul><ul><ul><ul><li>Case folding, etc. </li></ul></ul></ul><ul><ul><ul><li>Predefined date, int, float, etc. </li></ul></ul></ul>
    52. 53. Analysis: Index and Query Time <ul><li>Types have two modes </li></ul><ul><ul><li>Index Time </li></ul></ul><ul><ul><li>Query Time </li></ul></ul>
    53. 54. Simple Text Field <fieldType name=&quot;text&quot; class=&quot;solr.TextField&quot; positionIncrementGap=&quot;100&quot;> <analyzer type=&quot;index&quot;> <tokenizer class=&quot;solr.WhitespaceTokenizerFactory&quot;/> <filter class=&quot;solr.StopFilterFactory&quot; ignoreCase=&quot;true&quot; words=&quot;stopwords.txt&quot;/></analyzer><analyzer type=&quot;query&quot;><tokenizer class=&quot;solr.WhitespaceTokenizerFactory&quot;/> <filter class=&quot;solr.SynonymFilterFactory&quot; synonyms=&quot;synonyms.txt&quot; ignoreCase=&quot;true&quot; expand=&quot;true&quot;/> <filter class=&quot;solr.StopFilterFactory&quot; ignoreCase=&quot;true&quot; words=&quot;stopwords.txt&quot;/></analyzer></fieldType>
    54. 55. Analysis & Facets <ul><li>Make sure to use an untokenized field for faceting. </li></ul><ul><li>&quot;San Jose&quot; != &quot;San&quot; &quot;Jose&quot; </li></ul>
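A toy illustration of why this matters: facet counts are computed per indexed term, so a whitespace-tokenized field counts "San" and "Jose" separately. The sketch below imitates that with plain Java collections; it is not Solr code, and the class and method names are made up.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FacetFieldSketch {
    // Counts facet values. With tokenized = true each value is split on whitespace
    // first, which is roughly what faceting on an analyzed text field would see.
    static Map<String, Integer> facetCounts(List<String> values, boolean tokenized) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String v : values) {
            String[] terms = tokenized ? v.split("\\s+") : new String[] {v};
            for (String t : terms) {
                counts.merge(t, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> cities = List.of("San Jose", "San Francisco", "San Jose");
        System.out.println("untokenized: " + facetCounts(cities, false));
        System.out.println("tokenized:   " + facetCounts(cities, true));
    }
}
```

With the untokenized variant the facet list contains whole city names; with the tokenized one, "San" ends up as its own, meaningless facet value.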
    55. 56. Fields <ul><li>Elements of a document </li></ul><ul><ul><li>Both predefined & dynamic </li></ul></ul><ul><ul><li>Fields may occur multiple times </li></ul></ul><ul><ul><li>May be indexed and/or stored </li></ul></ul>
    56. 57. Example Fields <field name=&quot;id&quot; type=&quot;string&quot; indexed=&quot;true&quot; stored=&quot;true&quot; required=&quot;true&quot; /><field name=&quot;name&quot; type=&quot;text&quot; indexed=&quot;true&quot; stored=&quot;true&quot;/><field name=&quot;alphaNameSort&quot; type=&quot;alphaOnlySort&quot; indexed=&quot;true&quot; stored=&quot;false&quot;/>
    57. 58. Copy Fields <ul><li>Two main uses </li></ul><ul><ul><li>To analyze a field in two different ways </li></ul></ul><ul><ul><li>To concatenate fields </li></ul></ul>
    58. 59. The Solr Config File <ul><li>solrconfig.xml </li></ul><ul><ul><li>Defines request handlers, defaults, and caches </li></ul></ul><ul><ul><li>Read the example solrconfig.xml </li></ul></ul>
    59. 60. Configuring DisMax <ul><li>Parameter defaults set in solrconfig.xml </li></ul><ul><li>Can be overridden in each request </li></ul><ul><ul><li>Except for params labeled invariant </li></ul></ul>
    60. 61. DisMax Config Example <requestHandler name=&quot;dismax&quot; class=&quot;solr.DisMaxRequestHandler&quot; > <lst name=&quot;defaults&quot;> <str name=&quot;echoParams&quot;>explicit</str> <float name=&quot;tie&quot;>0.01</float> <str name=&quot;qf&quot;> text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4 </str> <str name=&quot;pf&quot;> text^0.2 features^1.1 name^1.5 manu^1.4 manu_exact^1.9 </str> ... </requestHandler>
    61. 62. DisMax Config Example <requestHandler name=&quot;dismax&quot; class=&quot;solr.DisMaxRequestHandler&quot; > ... <str name=&quot;bf&quot;> ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3 </str> <str name=&quot;fl&quot;> id,name,price,score </str>... </requestHandler>
    62. 63. DisMax Config Example <requestHandler name=&quot;dismax&quot; class=&quot;solr.DisMaxRequestHandler&quot; > ... <str name=&quot;mm&quot;> 2&lt;-1 5&lt;-2 6&lt;90% </str> <int name=&quot;ps&quot;>100</int> <str name=&quot;q.alt&quot;>*:*</str> </lst> </requestHandler>
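The `mm` (minimum should match) value `2<-1 5<-2 6<90%` is the least obvious parameter in this config. As I understand the documented semantics: with up to 2 keywords all must match, with 3 to 5 all but one, with 6 all but two, and with more than 6 at least 90% (rounded down). The sketch below is my own reimplementation of that reading for illustration, not Solr's code, and its names are hypothetical.

```java
public class MinShouldMatchSketch {
    // Evaluates one unconditional spec: "3" (absolute), "-1" (all but one),
    // "90%" (percentage, rounded down) or "-25%" (all but that percentage).
    static int simple(int n, String spec) {
        if (spec.endsWith("%")) {
            int percent = Integer.parseInt(spec.substring(0, spec.length() - 1));
            int calc = n * percent / 100; // integer division truncates toward zero
            return percent < 0 ? n + calc : calc;
        }
        int calc = Integer.parseInt(spec);
        return calc < 0 ? n + calc : calc;
    }

    // n is the number of optional clauses (keywords) in the user's query.
    static int minShouldMatch(int n, String spec) {
        int result = n; // default: every clause is required
        for (String part : spec.trim().split("\\s+")) {
            int lt = part.indexOf('<');
            if (lt < 0) {
                result = simple(n, part); // plain spec without a condition
                break;
            }
            int bound = Integer.parseInt(part.substring(0, lt));
            if (n <= bound) {
                break; // this condition does not apply; keep the previous result
            }
            result = simple(n, part.substring(lt + 1));
        }
        return Math.max(0, Math.min(n, result)); // clamp to [0, n]
    }

    public static void main(String[] args) {
        String mm = "2<-1 5<-2 6<90%"; // the value from the config, with &lt; unescaped
        for (int n = 1; n <= 10; n++) {
            System.out.println(n + " clauses -> " + minShouldMatch(n, mm) + " must match");
        }
    }
}
```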
    63. 64. Wrap Up
    64. 65. Resources <ul><li>Solr http://lucene.apache.org/solr/ </li></ul><ul><ul><li>wiki, mailing list, JIRA (bugs/features) </li></ul></ul><ul><li>Lucene http://lucene.apache.org/ </li></ul>
    65. 66. Lucene In Action
    66. 67. Building Search Applications with Lucene, LingPipe and GATE, by Manu Konchady
    67. 68. Other Presentations <ul><li>Yonik Seeley's Solr & Lucene </li></ul><ul><ul><li>http://people.apache.org/~yonik/presentations/ </li></ul></ul><ul><li>Slideshare.net </li></ul><ul><ul><li>Search for solr, or search for lucene </li></ul></ul>
    68. 69. Thanks! Thanks for coming. Feel free to email me if you have questions about Solr Tom Hill [email_address]
    69. 70. Extra Slides Things I didn't have time for in the presentation. Some of them unfinished.
    70. 71. Search Engines are not the Same as Users <ul><li>Search engines have different usage patterns than users </li></ul>
    71. 72. Response Writers <ul><li>http://localhost:8983/solr/select/?q=text_t%3Atiger&version=2.2&start=0&rows=10&indent=on&wt=ruby </li></ul><ul><li>http://localhost:8983/solr/select/?q=text_t%3Atiger&version=2.2&start=0&rows=10&indent=on&wt=xml </li></ul>
    72. 73. Explain <ul><li>Just why did the documents come up in that order? </li></ul>
    73. 74. Data Matters <ul><li>GIGO (garbage in, garbage out) </li></ul><ul><li>The better the data is, the better the search will be. </li></ul>
    74. 75. Watch Your Caches <ul><li>Just like any other app, check your statistics </li></ul><ul><li>What's the hit rate for your caches? </li></ul>
    75. 76. Setting Up Replication <ul><li>Run rsyncd on the master </li></ul><ul><li>Run snapshooter on the master at intervals </li></ul><ul><li>Run snappuller on the slaves at (different) intervals. </li></ul><ul><li>Scripts don't print errors! </li></ul><ul><ul><li>Check the logs </li></ul></ul><ul><ul><li>Use bash -xv </li></ul></ul>
    76. 77. Autowarming <ul><li>Runs after an update to the index </li></ul><ul><ul><li>Updates flush caches </li></ul></ul><ul><li>Runs some queries to populate caches again </li></ul><ul><li>Can be a problem, with frequent updates </li></ul><ul><ul><li>Don't autowarm master, if updating lots </li></ul></ul>
    77. 78. Tour Of Solr's Web UI
    78. 79. Programming Collective Intelligence A Really Fun Book
    79. 80. Geographic Searching <ul><li>Local Lucene & Local Solr </li></ul><ul><li>http://locallucene.wiki.sourceforge.net </li></ul><ul><li>There's also geolucene, but it's not being actively developed, as far as I can tell. </li></ul><ul><li>http://www.gossamer-threads.com/lists/lucene/java-dev/53378 </li></ul>
    80. 81. http://localhost:8983/solr/admin/stats.jsp#update Are there commits pending?
    81. 82. http://localhost:8983/comix/admin/analysis.jsp?name=text&val=wi-fi Analysis Explanation