Barcamp 5, Chennai
Apache Solr – I can haz Search!
Ashish Yadav (ashish_0x90)
Agenda
- Overview of Apache Solr
- Why Solr?
- Installing Apache Solr
- Getting Solr configuration right
- Solr query basics and not-so-basic stuff
- Scaling Solr
- Some tips on Solr caching
Overview
Apache Solr is a standalone full-text search server with Apache Lucene at the backend. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. In brief, Apache Solr exposes Lucene's Java API as a REST-like API that can be called over HTTP from any programming language or platform.
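To make the "REST-like API over HTTP" point concrete, here is a minimal Python sketch that builds a search URL and, optionally, fetches a JSON response. The base address http://localhost:8983/solr/select is an assumption (the default single-core path in older Solr releases), the helper names build_select_url and search are hypothetical, and only search needs a running server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a local single-core Solr at the address older releases used
# by default; change host, port, and path to match your installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_select_url(params, base=SOLR_SELECT_URL):
    """Build a GET URL for Solr's search handler.

    params is a list of (name, value) pairs, so repeated parameters
    (several facet.field or fq entries) keep their order.
    """
    return base + "?" + urlencode(params)


def search(params):
    """Send the query and decode the JSON response (needs a running Solr)."""
    with urlopen(build_select_url(params + [("wt", "json")])) as resp:
        return json.load(resp)


print(build_select_url([("q", "text:android"), ("wt", "json")]))
# prints http://localhost:8983/solr/select?q=text%3Aandroid&wt=json
```

Any HTTP client in any language can issue the same GET request; nothing here is Python-specific apart from the URL encoding helper.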
Features
- Full-text search
- Faceted navigation
- More items like this (recommendations) / related searches
- Spell suggest / auto-complete
- Custom document ranking/ordering
- Snippet generation/highlighting
- And a lot more...
So, why would I need Solr?
- You want greater control over your website's search.
- Caching, replication, distributed search.
- Really fast indexing and searching; indexes can be merged and optimized (index compaction).
- A great admin interface, usable over HTTP.
- Awesome community support too.
- Integration with various other products, such as the Drupal CMS.
Products using Solr
- E-commerce sites, CMSs, blog sites.
- Heavily used by LinkedIn, Twitter, CNET, Netflix, Digg.
- Many of them contribute back, e.g. LinkedIn's SNA (Search, Network, and Analytics) team.
Installation: minimum requirements
- A directory for storing index files.
- A directory for storing configuration files.
- A Solr home (solr_home) holding the other dependencies.
- A servlet container (Tomcat, Jetty) with appropriate configuration.
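A minimal sketch of a sanity check for the Solr home described above, assuming the classic single-core layout with conf/schema.xml, conf/solrconfig.xml, and a data/ directory for the index. The function name is hypothetical, and Solr can create data/ itself on first start; the check simply mirrors the list on this slide.

```python
import tempfile
from pathlib import Path

# Assumed single-core layout (older Solr releases):
#   <solr_home>/conf/schema.xml
#   <solr_home>/conf/solrconfig.xml
#   <solr_home>/data/          index files end up under data/index
REQUIRED_DIRS = ("conf", "data")
REQUIRED_FILES = ("conf/schema.xml", "conf/solrconfig.xml")


def missing_solr_home_entries(solr_home):
    """Return the expected paths that are missing under solr_home."""
    home = Path(solr_home)
    missing = [d for d in REQUIRED_DIRS if not (home / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (home / f).is_file()]
    return missing


# Demo: build the expected layout in a throwaway directory and check it.
with tempfile.TemporaryDirectory() as tmp:
    print(missing_solr_home_entries(tmp))  # everything is missing at first
    (Path(tmp) / "conf").mkdir()
    (Path(tmp) / "data").mkdir()
    for name in ("schema.xml", "solrconfig.xml"):
        (Path(tmp) / "conf" / name).write_text("<!-- placeholder -->")
    print(missing_solr_home_entries(tmp))  # prints []
```

The servlet container is then pointed at this directory, for example through the solr.solr.home Java system property in older releases.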
Configuring Solr
- schema.xml: contains all of the details about document structure and about index-time and query-time processing.
- solrconfig.xml: contains most of the parameters for configuring Solr itself.
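To show what "document structure" means in schema.xml, this sketch parses a trimmed, hypothetical schema with Python's standard xml.etree.ElementTree. The field names are assumptions chosen to match the query examples in these slides; a real schema also defines the fieldType entries (analyzers and tokenizers) that the type attributes refer to.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed schema.xml. The <types> section with the
# fieldType definitions that "string", "text", etc. refer to is omitted.
SCHEMA_XML = """
<schema name="example" version="1.2">
  <fields>
    <field name="id"      type="string" indexed="true" stored="true" required="true"/>
    <field name="title"   type="text"   indexed="true" stored="true"/>
    <field name="text"    type="text"   indexed="true" stored="true"/>
    <field name="product" type="string" indexed="true" stored="true"/>
    <field name="type"    type="string" indexed="true" stored="true"/>
    <field name="rating"  type="int"    indexed="true" stored="true"/>
    <field name="price"   type="float"  indexed="true" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>text</defaultSearchField>
</schema>
"""

root = ET.fromstring(SCHEMA_XML.strip())
# Map each declared field name to its field type.
fields = {f.get("name"): f.get("type") for f in root.iter("field")}
unique_key = root.findtext("uniqueKey")

print(fields)
print("uniqueKey:", unique_key)  # prints uniqueKey: id
```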
Querying Solr: the basics
- Plain text search: q=text:"I love android"
- Expanding the search to more fields: q=title:android AND type:review AND price:[* TO 500]
- Adding facets: facet=true&facet.field=product&facet.field=rating
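The query on this slide is easiest to get right when the parameters are built programmatically. A minimal Python sketch, assuming the hypothetical fields title, type, price, product, and rating:

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical field names taken from the slide: title, type, price,
# product, rating.
params = [
    # Combine clauses with uppercase AND inside q; a bare "&" would start
    # a new URL parameter, and range queries need uppercase TO.
    ("q", "title:android AND type:review AND price:[* TO 500]"),
    ("facet", "true"),           # faceting is off unless enabled
    ("facet.field", "product"),  # repeat the parameter once per field
    ("facet.field", "rating"),
]
query_string = urlencode(params)
print(query_string)

# Decoding the string gives back exactly the pairs that were encoded.
assert parse_qsl(query_string) == params
```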
Querying Solr: the basics (continued)
- Facets for range queries: facet.query=price:[* TO 100]&facet.query=price:[100 TO 200]&facet.query=price:[500 TO *]
- Ordering results: sort=score desc, price asc
- Limiting results: rows=15
- Paginating results: start=25&rows=10 (start is a zero-based offset, so this returns hits 26 to 35)
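Pagination is just arithmetic on start and rows. This sketch, with a hypothetical page_params helper, converts a 1-based page number into those parameters and combines them with the range facets and sort order from this slide:

```python
from urllib.parse import urlencode


def page_params(page, rows=10):
    """Map a 1-based page number to Solr's start/rows parameters.

    start is a zero-based offset into the sorted hits, so page 3 with
    rows=10 asks for start=20 and returns hits 21 through 30.
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    return [("start", (page - 1) * rows), ("rows", rows)]


params = [
    ("q", "android"),
    ("sort", "score desc, price asc"),  # relevance first, then cheapest
    ("facet", "true"),
    # Square brackets make both endpoints inclusive, so an item priced
    # exactly 100 is counted in the first two buckets.
    ("facet.query", "price:[* TO 100]"),
    ("facet.query", "price:[100 TO 200]"),
    ("facet.query", "price:[500 TO *]"),
] + page_params(3)

print(urlencode(params))
```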
Querying Solr: not-so-basic stuff
Advanced query parameters:
- fq (filter query): restricts the result set without affecting scoring. Example: fq=type:review&fq=price:[* TO 500]
- fl: restricts the fields returned in the result set. Example: fl=id,title,text
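Sending each filter as its own fq parameter, rather than joining them with &, keeps them independently cacheable. A sketch using Python's urllib.parse, with the slide's example values:

```python
from urllib.parse import parse_qs, urlencode

params = [
    ("q", "android"),
    # One fq per filter: filters narrow the result set without changing
    # relevance scores, and Solr caches each fq separately.
    ("fq", "type:review"),
    ("fq", "price:[* TO 500]"),
    # Return only these stored fields for each matching document.
    ("fl", "id,title,text"),
]
decoded = parse_qs(urlencode(params))
print(decoded["fq"])  # prints ['type:review', 'price:[* TO 500]']
```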
Querying Solr: not-so-basic stuff (continued)
- hl: highlights matches and generates snippets. Example: hl=true&hl.fl=title,text
- Custom field boosting with the DisMax query parser. Example: q=samsung awesome&defType=dismax&qf=product^20.0 text^0.3 (spaces are sent URL-encoded as +)
- debugQuery=true: returns an explanation of how each document's score was computed
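With the DisMax parser, the boosts live in qf and the user's words go in q unchanged. This sketch shows how the parameters above are URL-encoded; the + in an encoded qf value is just a space. The search terms and boost values are the slide's examples, not recommendations.

```python
from urllib.parse import parse_qs, urlencode

params = [
    # With defType=dismax, q holds plain user terms; which fields to search
    # and how much to boost each comes from qf, not from field:value in q.
    ("q", "samsung awesome"),
    ("defType", "dismax"),
    ("qf", "product^20.0 text^0.3"),  # space-separated field^boost pairs
    ("hl", "true"),
    ("hl.fl", "title,text"),          # fields to build snippets from
    ("debugQuery", "true"),           # adds per-document score explanations
]
query_string = urlencode(params)
print(query_string)
```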
Solr search: custom handlers
- Request handlers: DataImportHandler, DisMaxHandler
- Response writers: JSON, XML, and CSV format writers, selected with the wt parameter (e.g. wt=json)
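As an illustration of a response writer in use, this sketch reads a wt=json response. The body below is invented, but it follows the responseHeader/response/docs envelope that Solr's JSON writer produces; the titles helper is hypothetical.

```python
import json

# Hypothetical body of a response to .../select?q=android&fl=id,title&wt=json
# (values invented for illustration).
RAW = """
{
  "responseHeader": {"status": 0, "QTime": 3},
  "response": {
    "numFound": 2, "start": 0,
    "docs": [
      {"id": "r1", "title": "Android phone review"},
      {"id": "r2", "title": "Android tablet review"}
    ]
  }
}
"""


def titles(body):
    """Return (numFound, [title, ...]) from a wt=json search response."""
    resp = json.loads(body)["response"]
    return resp["numFound"], [doc.get("title") for doc in resp["docs"]]


print(titles(RAW))  # prints (2, ['Android phone review', 'Android tablet review'])
```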
External search components
- SpellCheckComponent: uses Solr indexes, custom dictionaries, etc.
- MoreLikeThis: term suggestions, similar items, etc.
- Clustering component
- TermVectorComponent: returns advanced term information, such as offsets and positions
- QueryElevationComponent: sponsored results
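Search components are switched on per request with their own parameters. A sketch of two such requests, assuming SpellCheckComponent and MoreLikeThis are enabled on the handler being queried (check your solrconfig.xml); the document id r1 is hypothetical:

```python
from urllib.parse import urlencode

# Assumption: both components are enabled on the request handler being
# queried; which handler that is depends on your solrconfig.xml.
spell_params = [
    ("q", "andriod"),                # misspelled user input
    ("spellcheck", "true"),
    ("spellcheck.collate", "true"),  # also suggest a corrected full query
]
mlt_params = [
    ("q", "id:r1"),            # hypothetical id of the "source" document
    ("mlt", "true"),
    ("mlt.fl", "title,text"),  # fields to mine for interesting terms
    ("mlt.mintf", "1"),        # min term frequency in the source document
    ("mlt.mindf", "1"),        # min number of documents containing the term
    ("mlt.count", "5"),        # similar documents returned per result
]

for params in (spell_params, mlt_params):
    print(urlencode(params))
```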
Scaling Solr (I feel the need for speed >>>>)
- Distributed search, a.k.a. sharding.
- Create separate indexes (copied with rsync/scp), OR run Solr's index replication daemon.
- Optimization/autocommit for the indexes.
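In the pre-SolrCloud distributed search of this era, the indexing client decides which shard a document belongs to, and a query fans out via the shards parameter. A sketch under those assumptions, with hypothetical host names and CRC32 hashing of the unique key as one simple, stable routing choice (not something Solr prescribes):

```python
import zlib
from urllib.parse import urlencode

# Hypothetical shard addresses; the shards parameter takes comma-separated
# host:port/path entries without the http:// prefix.
SHARDS = ["solr1:8983/solr", "solr2:8983/solr", "solr3:8983/solr"]


def shard_for(doc_id, shards=SHARDS):
    """Pick the shard that should index a document.

    Hashing the unique key (CRC32 here, an arbitrary but stable choice)
    spreads documents across shards and always sends the same id to the
    same shard, so updates replace the old copy.
    """
    return shards[zlib.crc32(doc_id.encode("utf-8")) % len(shards)]


def distributed_query(q, shards=SHARDS):
    """Query string that fans out across all shards from any one node."""
    return urlencode([("q", q), ("shards", ",".join(shards))])


print(shard_for("r1"))
print(distributed_query("android"))
```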
Solr caching
- Build your queries wisely.
- External caching: Memcached, etc.
- Internal caching, with different types of cache:
  1) filterCache: used by filter queries (fq), and sometimes for faceting too.
  2) queryResultCache: stores the results (ordered lists of document IDs) returned by generic queries.
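Why "build your queries wisely" matters for the filterCache: a toy Python model, assuming an LRU eviction policy, that counts cache hits when the same filters are sent as separate fq parameters versus one combined fq. It is a simplified illustration, not Solr's implementation, and ToyFilterCache and stub_filter are hypothetical names.

```python
from collections import OrderedDict


class ToyFilterCache:
    """LRU cache keyed by the fq string (simplified model, not Solr's code).

    In Solr, a filterCache entry maps a filter to the set of documents it
    matches; here the "document set" comes from a stub.
    """

    def __init__(self, size):
        self.size = size
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, fq, compute):
        if fq in self.entries:
            self.entries.move_to_end(fq)      # mark as most recently used
            self.hits += 1
            return self.entries[fq]
        self.misses += 1
        doc_set = compute(fq)
        self.entries[fq] = doc_set
        if len(self.entries) > self.size:
            self.entries.popitem(last=False)  # evict least recently used
        return doc_set


def stub_filter(fq):
    """Hypothetical stand-in for running a filter against the index."""
    return frozenset()


def replay(requests, size=16):
    """Run each request's fq list through one cache; return (hits, misses)."""
    cache = ToyFilterCache(size)
    for fqs in requests:
        for fq in fqs:
            cache.get(fq, stub_filter)
    return cache.hits, cache.misses


split = [["type:review", "price:[* TO 500]"],
         ["type:review", "price:[* TO 100]"],
         ["type:review", "price:[* TO 500]"]]
combined = [["type:review AND price:[* TO 500]"],
            ["type:review AND price:[* TO 100]"],
            ["type:review AND price:[* TO 500]"]]

print(replay(split))     # prints (3, 3)
print(replay(combined))  # prints (1, 2)
```

With separate filters, the common type:review entry is reused by every request; the combined strings only match when the whole filter repeats exactly.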
Links and resources
- http://wiki.apache.org/solr/
- http://www.lucidimagination.com/developer/Articles
- http://khaidoan.wikidot.com/solr
- http://42bits.wordpress.com
