Introduction to Apache Solr.


Slides of my Tech Talk on Apache Solr, at BarCamp 5, Chennai.

Published in: Technology

  1. BarCamp 5, Chennai. Apache Solr – I can haz Search! Ashish Yadav (ashish_0x90)
  2. Agenda <ul><li>Overview of Apache Solr </li></ul><ul><li>Why Solr? </li></ul><ul><li>Installing Apache Solr </li></ul><ul><li>Getting the Solr configuration right </li></ul><ul><li>Solr query basics and not-so-basic stuff </li></ul><ul><li>Scaling Solr </li></ul><ul><li>Some tips on Solr caching </li></ul>
  3. Overview <ul><li>Apache Solr is a standalone full-text search server with Apache Lucene as its backend. </li></ul><ul><li>Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. </li></ul><ul><li>In brief, Apache Solr exposes Lucene's Java API as REST-like APIs that can be called over HTTP from any programming language or platform. </li></ul>
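Because the API is plain HTTP, a client only needs to parse the response body. The sketch below assumes the JSON response writer (`wt=json`) and uses a made-up sample body instead of a live server; the field names `id` and `title` are illustrative, not taken from the talk.

```python
import json

# Made-up sample of a Solr JSON response (wt=json); values are illustrative.
SAMPLE_RESPONSE = """
{
  "responseHeader": {"status": 0, "QTime": 3},
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {"id": "1", "title": "Android review"},
      {"id": "2", "title": "Android tips"}
    ]
  }
}
"""


def summarize(response_text):
    """Return (total hits, list of titles) from a Solr JSON response body."""
    body = json.loads(response_text)
    result = body["response"]
    titles = [doc.get("title") for doc in result["docs"]]
    return result["numFound"], titles


total, titles = summarize(SAMPLE_RESPONSE)
print(total, titles)
```

Note that `numFound` is the total number of matches in the index, which can be larger than the number of documents actually returned in `docs`.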
  4. Features <ul><li>Full-text search </li></ul><ul><li>Faceted navigation </li></ul><ul><li>More like this (recommendations) / related searches </li></ul><ul><li>Spelling suggestions / auto-complete </li></ul><ul><li>Custom document ranking and ordering </li></ul><ul><li>Snippet generation / highlighting </li></ul><ul><li>And a lot more... </li></ul>
  5. So, why would “I” need Solr? <ul><li>Greater control over your website's search. </li></ul><ul><li>Caching, replication, distributed search. </li></ul><ul><li>Really fast indexing and searching; indexes can be merged and optimized (index compaction). </li></ul><ul><li>A great admin interface, usable over HTTP. </li></ul><ul><li>Awesome community support too. </li></ul><ul><li>Integrates with various other products, such as the Drupal CMS. </li></ul>
  6. Products using Solr <ul><li>E-commerce sites, CMSs, blog sites. </li></ul><ul><li>Heavily used by LinkedIn, Twitter, CNET, Netflix, Digg. </li></ul><ul><li>Many of them contribute back, like LinkedIn's SNA (Search, Network, and Analytics) team. </li></ul>
  7. Installation <ul><li>Minimum requirements: </li></ul><ul><li>A directory for storing index files. </li></ul><ul><li>A directory for storing configuration files. </li></ul><ul><li>Solr_Home containing the other dependencies. </li></ul><ul><li>A servlet container (Tomcat, Jetty) with appropriate configuration. </li></ul>
  8. Configuring Solr <ul><li>schema.xml: contains all of the details about document structure and about index-time and query-time processing. </li></ul><ul><li>solrconfig.xml: contains most of the parameters for configuring Solr itself. </li></ul>
  9. Querying Solr: The basics <ul><li>Plain text search </li></ul><ul><li>q=text:"I love android" </li></ul><ul><li>Expanding the search to more fields: </li></ul><ul><li>q=title:android AND type:review AND price:[* TO 500] </li></ul><ul><li>Add facets </li></ul><ul><li>facet=true&facet.field=product&facet.field=rating </li></ul>
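The parameters above need URL encoding before they go on the wire. Here is a minimal sketch of building such a request URL with the Python standard library; the base URL (host, port, and path) is an assumption and will differ per installation, and `wt=json` is added only to request JSON output.

```python
from urllib.parse import urlencode

# Hypothetical base URL; adjust host, port, and core path for a real install.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_search_url(query, facet_fields=()):
    """Build a select URL for `query`, optionally faceting on the given fields."""
    params = [("q", query), ("wt", "json")]
    if facet_fields:
        params.append(("facet", "true"))
        # A repeated facet.field parameter asks for one facet per field.
        params.extend(("facet.field", field) for field in facet_fields)
    return SOLR_SELECT + "?" + urlencode(params)


url = build_search_url('text:"I love android"', ["product", "rating"])
print(url)
```

Passing a list of pairs rather than a dict keeps repeated parameters such as `facet.field` intact.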
  10. Querying Solr: The basics <ul><li>Add facets for range queries </li></ul><ul><li>facet.query=price:[* TO 100]&facet.query=price:[100 TO 200]&facet.query=price:[500 TO *] </li></ul><ul><li>Ordering results </li></ul><ul><li>sort=score desc,price asc </li></ul><ul><li>Limiting results </li></ul><ul><li>rows=15 </li></ul><ul><li>Paginating results </li></ul><ul><li>start=25&rows=10 </li></ul>
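Pagination boils down to computing `start` from a page number, since `start` is a zero-based offset into the result list. The helper below is a sketch; the function name and the 1-based page convention are assumptions made for illustration.

```python
from urllib.parse import urlencode


def page_params(page, page_size=10, sort="score desc,price asc"):
    """Translate a 1-based page number into Solr's start/rows parameters."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return {
        "start": (page - 1) * page_size,  # zero-based offset of the first result
        "rows": page_size,                # number of results per page
        "sort": sort,
    }


# Range facets are separate facet.query parameters, one per price bucket.
price_buckets = ["price:[* TO 100]", "price:[100 TO 200]", "price:[500 TO *]"]
params = list(page_params(3).items()) + [("facet", "true")]
params += [("facet.query", bucket) for bucket in price_buckets]
print(urlencode(params))
```

Square-bracket ranges are inclusive at both ends, so a document priced exactly 100 is counted in both of the first two buckets.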
  11. Querying Solr: Not-so-basic stuff <ul><li>Advanced query parameters: </li></ul><ul><li>fq: filter query. Example: fq=type:review&fq=price:[* TO 500] </li></ul><ul><li>fl: restricts the fields returned in the result set. </li></ul><ul><li>Example: fl=id,title,text </li></ul>
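A sketch of combining a main query with filter queries and a field list follows. The helper name is hypothetical; it sends each filter as its own `fq` parameter, and Solr returns only documents that match all of them.

```python
from urllib.parse import urlencode


def filtered_query(q, filters=(), fields=()):
    """Return an encoded query string with fq filters and an fl field list."""
    params = [("q", q)]
    # Each filter goes in its own fq parameter; Solr intersects them.
    params.extend(("fq", f) for f in filters)
    if fields:
        params.append(("fl", ",".join(fields)))
    return urlencode(params)


qs = filtered_query(
    "text:android",
    filters=["type:review", "price:[* TO 500]"],
    fields=["id", "title", "text"],
)
print(qs)
```

Filters do not affect relevance scoring, which is why restrictions like type or price usually belong in `fq` rather than in `q`.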
  12. Querying Solr: Not-so-basic stuff <ul><li>hl: highlighting matches, snippet generation, etc. </li></ul><ul><li>Example query: hl=true&hl.fl=title,text </li></ul><ul><li>Custom field boosting </li></ul><ul><li>Example: q=samsung+awesome&defType=dismax&qf=product^20.0+text^0.3 </li></ul><ul><li>Debugging a query: debug=true </li></ul>
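With `hl=true`, the JSON response carries a separate `highlighting` section keyed by each document's unique key. The sketch below pairs documents with their snippets; the sample body is made up, and it assumes the unique key field is called `id`.

```python
import json

# Made-up response for hl=true&hl.fl=title,text; real snippets depend on the index.
SAMPLE = json.loads("""
{
  "response": {"numFound": 1, "start": 0,
               "docs": [{"id": "42", "title": "Samsung phone"}]},
  "highlighting": {
    "42": {"text": ["an <em>awesome</em> phone"]}
  }
}
""")


def attach_snippets(body, id_field="id"):
    """Pair each returned document with its highlight snippets, if any."""
    highlights = body.get("highlighting", {})
    results = []
    for doc in body["response"]["docs"]:
        snippets = highlights.get(str(doc[id_field]), {})
        results.append({"doc": doc, "snippets": snippets})
    return results


for item in attach_snippets(SAMPLE):
    print(item["doc"]["title"], item["snippets"])
```

A document with no match in the highlighted fields simply gets an empty snippet mapping.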
  13. Solr Search: Custom handlers <ul><li>Request handlers </li></ul><ul><li>DataImportHandler, DisMaxHandler </li></ul><ul><li>Response writers </li></ul><ul><li>JSON, XML, and CSV format writers </li></ul>
  14. External Search Components <ul><li>SpellCheckComponent: </li></ul><ul><li>Uses Solr indexes, custom dictionaries, etc. </li></ul><ul><li>MoreLikeThis: term suggestions, similar items, etc. </li></ul><ul><li>Clustering component </li></ul><ul><li>TermVectorComponent </li></ul><ul><li>Returns advanced information about terms, such as offsets and positions </li></ul><ul><li>Query Elevation Component: sponsored results </li></ul>
  15. Scaling Solr (I feel the Need for Speed >>>> ) <ul><li>Distributed search, a.k.a. sharding. </li></ul><ul><li>Create separate indexes (rsync/scp) </li></ul><ul><li>OR </li></ul><ul><li>Run the Solr index replication daemon. </li></ul><ul><li>Optimization and autocommit for the indexes. </li></ul>
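A sketch of both halves of sharding: deciding which index a document goes to at indexing time, and querying all shards at once through Solr's `shards` parameter. The shard addresses are hypothetical, and the hash-based routing is one common approach chosen for illustration, not something prescribed by the talk.

```python
import hashlib
from urllib.parse import urlencode

# Hypothetical shard addresses; a real deployment would list its own hosts.
SHARDS = ["solr1:8983/solr", "solr2:8983/solr"]


def shard_for(doc_id, shards=SHARDS):
    """Pick a shard for a document by hashing its id (an assumed routing rule)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def distributed_query(q, shards=SHARDS, rows=10):
    """Encode a query that asks one Solr node to fan out to all shards."""
    return urlencode({"q": q, "rows": rows, "shards": ",".join(shards)})


print(shard_for("42"))
print(distributed_query("title:android"))
```

The node that receives the request merges the per-shard results, so the client still sees a single result list.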
  16. Solr Caching <ul><li>Build your queries wisely. </li></ul><ul><li>External caching: Memcached, etc. </li></ul><ul><li>Internal caching </li></ul><ul><li>Different types of cache: </li></ul><ul><li>1) FilterCache: used by filter queries (fq), sometimes for faceting too. </li></ul><ul><li>2) QueryResultCache: used for results returned by generic queries. </li></ul>
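External caching can be sketched as a lookup keyed on the normalized request parameters. In the example below a plain dictionary stands in for Memcached and a fake fetch function stands in for the HTTP call to Solr; both are assumptions made to keep the sketch runnable, and there is no expiry or eviction.

```python
from urllib.parse import urlencode


class QueryCache:
    """Tiny in-process stand-in for an external cache such as Memcached."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable that actually queries Solr
        self._store = {}
        self.hits = 0

    def get(self, params):
        # Sort parameters so equivalent queries share one cache key.
        key = urlencode(sorted(params.items()))
        if key in self._store:
            self.hits += 1
        else:
            self._store[key] = self._fetch(params)
        return self._store[key]


# Fake fetch function for demonstration; a real one would call Solr over HTTP.
cache = QueryCache(lambda params: {"echo": dict(params)})
cache.get({"q": "android", "rows": 10})
cache.get({"rows": 10, "q": "android"})  # same query, different parameter order
print(cache.hits)
```

Normalizing the key is one concrete way of "building queries wisely": two requests that differ only in parameter order should not miss the cache.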
  17. Links and resources
  18. Thanks! This talk wouldn't have been possible without support from PayPal and the Apache Solr project. <ul><li>Questions? </li></ul>