
Introduction to Apache Solr.

Slides of my Tech Talk on Apache Solr, at BarCamp 5, Chennai.

Published in: Technology

  1. Barcamp 5, Chennai. Apache Solr – I can haz Search! Ashish Yadav (ashish_0x90)
  2. Agenda
     - Overview of Apache Solr
     - Why Solr?
     - Installing Apache Solr
     - Getting Solr configuration right
     - Solr query basics and not-so-basic stuff
     - Scaling Solr
     - Some tips on Solr caching
  3. Overview
     - Apache Solr is a standalone full-text search server with Apache Lucene at the backend.
     - Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java.
     - In brief, Apache Solr exposes Lucene's Java API as REST-like APIs that can be called over HTTP from any programming language or platform.
  4. Features
     - Full-text search
     - Faceted navigation
     - More items like this (recommendation) / related searches
     - Spell suggest / auto-complete
     - Custom document ranking/ordering
     - Snippet generation/highlighting
     - And a lot more...
  5. So, why would "I" need Solr?
     - You want greater control over your website search.
     - Caching, replication, distributed search.
     - Really fast indexing and searching; indexes can be merged and optimized (index compaction).
     - Great admin interface that can be used over HTTP.
     - Awesome community support too.
     - Support for integration with various other products, such as the Drupal CMS.
  6. Products using Solr
     - E-commerce sites, CMSs, blog sites.
     - Heavily used by LinkedIn, Twitter, CNET, Netflix, Digg.
     - Many of them contribute back, like the LinkedIn SNA (Search, Network, and Analytics) team.
  7. Installation
     - Minimum requirements:
       - Directory for storing index files.
       - Directory for storing configuration files.
       - Solr_Home having other dependencies.
       - A servlet container (Tomcat, Jetty) with appropriate configuration.
  8. Configuring Solr
     - schema.xml: contains all of the details about document structure, index-time and query-time processing.
     - solrconfig.xml: contains most of the parameters for configuring Solr itself.
  9. Querying Solr: The basics
     - Plain text search:
       q=text:"I love android"
     - Expanding the search to more fields:
       q=title:android AND type:review AND price:[* TO 500]
     - Add facets:
       facet=true&facet.field=product&facet.field=rating
  10. Querying Solr: The basics
     - Add facets for range queries:
       facet.query=price:[* TO 100]&facet.query=price:[100 TO 200]&facet.query=price:[500 TO *]
     - Ordering results:
       sort=score desc, price asc
     - Limiting results:
       rows=15
     - Paginating results:
       start=25&rows=10
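The basic query parameters from the two slides above can be combined into one request. The following is a minimal sketch in Python, not part of the original talk. It only builds the URL and does not contact a server. The base URL (a local Solr with a `/select` handler on port 8983) and the helper names `build_select_params` and `build_select_url` are assumptions made for illustration; the field names come from the slides.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance exposing a /select handler.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_select_params(query, facet_fields=(), facet_queries=(),
                        sort=None, start=0, rows=10):
    """Return the request parameters as a list of (name, value) pairs.

    A list of pairs is used instead of a dict because facet.field and
    facet.query may each appear more than once.
    """
    params = [("q", query), ("start", str(start)), ("rows", str(rows))]
    if sort:
        params.append(("sort", sort))
    if facet_fields or facet_queries:
        # Faceting has to be switched on explicitly.
        params.append(("facet", "true"))
    params.extend(("facet.field", field) for field in facet_fields)
    params.extend(("facet.query", fq) for fq in facet_queries)
    return params


def build_select_url(query, **options):
    """URL-encode the parameters and append them to the select URL."""
    return SOLR_SELECT_URL + "?" + urlencode(build_select_params(query, **options))


if __name__ == "__main__":
    print(build_select_url(
        "title:android AND type:review AND price:[* TO 500]",
        facet_fields=["product", "rating"],
        facet_queries=["price:[* TO 100]", "price:[100 TO 200]"],
        sort="score desc, price asc",
        start=25,
        rows=10,
    ))
```

Letting `urlencode` do the escaping avoids hand-writing characters such as `[`, `*` and spaces in the query string.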
  11. Querying Solr: Not-so-basic stuff
     - Advanced query operators:
     - fq: filter query. Example: fq=type:review&fq=price:[* TO 500]
     - fl: restricts the fields returned with the result set. Example: fl=id,title,text
  12. Querying Solr: Not-so-basic stuff
     - hl: highlighting matches in snippets, snippet generation, etc.
       Example query: hl=true&hl.fl=title,text
     - Custom field boosting.
       Example: q=product:samsung&text:awesome & defType=dismax & qf=product^20.0+text^0.3
     - debug = true
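The filter, field-list, highlighting and boosting parameters above can be assembled the same way. This is a hedged sketch rather than the speaker's code: the function name `build_dismax_params` is hypothetical, and it assumes the DisMax parser is given plain keywords in `q`, with the per-field weights supplied through `qf`.

```python
from urllib.parse import urlencode


def build_dismax_params(keywords, field_boosts, filters=(),
                        return_fields=(), highlight_fields=()):
    """Build (name, value) pairs for a boosted DisMax query.

    field_boosts maps a field name to its boost, e.g. {"product": 20.0}.
    """
    params = [("q", keywords), ("defType", "dismax")]
    # qf is a space-separated list of field^boost entries.
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    params.append(("qf", qf))
    # Each filter query goes into its own fq parameter.
    params.extend(("fq", flt) for flt in filters)
    if return_fields:
        params.append(("fl", ",".join(return_fields)))
    if highlight_fields:
        params.append(("hl", "true"))
        params.append(("hl.fl", ",".join(highlight_fields)))
    return params


if __name__ == "__main__":
    params = build_dismax_params(
        "samsung awesome",
        {"product": 20.0, "text": 0.3},
        filters=["type:review", "price:[* TO 500]"],
        return_fields=["id", "title", "text"],
        highlight_fields=["title", "text"],
    )
    print(urlencode(params))
```

When encoded, the space inside `qf` becomes `+`, which matches the `qf=product^20.0+text^0.3` form shown on the slide.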
  13. Solr Search: Custom handlers
     - Request handlers: DataImportHandler, DisMaxHandler
     - Response writers: JSON, XML, CSV format writers
  14. External Search Components
     - SpellCheckComponent: uses Solr indexes, custom dictionaries, etc.
     - More Like This (term suggest, similar items, etc.)
     - Clustering component
     - TermVector component: returns advanced information about query terms, offsets, positions.
     - Query Elevation component: sponsored results.
  15. Scaling Solr (I feel the Need for Speed >>>>)
     - Distributed search, a.k.a. sharding.
     - Create separate indexes (rsync/scp), OR run the Solr index replication daemon.
     - Optimization/autocommit for the indexes.
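For distributed search, Solr accepts a `shards` request parameter listing the cores to query. The sketch below shows how such a request might be put together. The shard host names are made up for illustration, and `build_sharded_params` is a hypothetical helper, not something from the talk.

```python
from urllib.parse import urlencode


def build_sharded_params(query, shard_hosts, rows=10):
    """Build parameters for a query fanned out over several shards.

    shard_hosts entries are given as host:port/path, without the
    http:// prefix, and are joined with commas.
    """
    if not shard_hosts:
        raise ValueError("at least one shard is required")
    return [
        ("q", query),
        ("rows", str(rows)),
        ("shards", ",".join(shard_hosts)),
    ]


if __name__ == "__main__":
    # Hypothetical shard locations.
    shards = ["solr1.example.com:8983/solr", "solr2.example.com:8983/solr"]
    print(urlencode(build_sharded_params("text:android", shards)))
```

The request is sent to any one Solr instance, which queries the listed shards and merges their results.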
  16. Solr Caching
     - Build your queries wisely.
     - External caching: Memcached, etc.
     - Internal caching. Different types of cache:
       1) FilterCache: used by filter queries (fq), sometimes for faceting too.
       2) QueryResultCache: used for results returned by generic queries.
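One way to read "build your queries wisely" together with external caching is to make equivalent requests share a cache key. The sketch below uses an in-process dictionary as a stand-in for Memcached. The `QueryCache` class and its `fetch` callback are hypothetical, and no Solr or Memcached client is involved.

```python
from urllib.parse import urlencode


class QueryCache:
    """Cache query results in front of a fetch function.

    The dict stands in for an external cache such as Memcached.
    Parameters are sorted before building the key, so requests that
    differ only in parameter order share a single entry.
    """

    def __init__(self, fetch):
        self._fetch = fetch   # callable taking a list of (name, value) pairs
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def cache_key(params):
        # Names and values are expected to be strings so they sort cleanly.
        return urlencode(sorted(params))

    def get(self, params):
        key = self.cache_key(params)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._fetch(params)
        return self._store[key]


if __name__ == "__main__":
    cache = QueryCache(lambda params: {"echo": list(params)})
    cache.get([("q", "text:android"), ("rows", "10")])
    cache.get([("rows", "10"), ("q", "text:android")])
    print(cache.hits, cache.misses)
```

A real deployment would also need expiry, so that cached results do not outlive index updates.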
  17. Links and resources
     - http://wiki.apache.org/solr/
     - http://www.lucidimagination.com/developer/Articles
     - http://khaidoan.wikidot.com/solr
     - http://42bits.wordpress.com
  18. Thanks! This talk wouldn't have been possible without the support from PayPal and the Apache Solr project.
     - Questions?
