Introduction to Apache Solr.

Slides of my Tech Talk on Apache Solr, at BarCamp 5, Chennai.


Published in Technology

Transcript

  • 1. Barcamp 5, Chennai Apache Solr – I can haz Search! Ashish Yadav (ashish_0x90)
  • 2. Agenda
    • Overview of Apache Solr
    • Why Solr?
    • Installing Apache Solr
    • Getting Solr configuration right.
    • Solr query basics and not so basic stuff.
    • Scaling Solr
    • Some tips on Solr Caching
  • 3. Overview
    • Apache Solr is a standalone full-text search server with Apache Lucene at the backend.
    • Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java.
    • In brief, Apache Solr exposes Lucene's Java API as REST-like APIs that can be called over HTTP from any programming language or platform.
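
To make the "called over HTTP" point concrete, here is a minimal Python sketch that only builds the request URL. The host, port, and `/solr/select` path are assumptions for a local default install, not something stated in the slides, and `build_select_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed endpoint of a local Solr instance; adjust to your own setup.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_select_url(query, base=SOLR_SELECT):
    """Return a URL asking Solr to run `query` and respond in JSON."""
    params = {"q": query, "wt": "json"}
    # urlencode escapes spaces, quotes, and colons so the URL stays valid.
    return base + "?" + urlencode(params)


url = build_select_url('text:"I love android"')
print(url)
```

Any HTTP client in any language can then fetch that URL, which is the whole appeal of the approach.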
  • 4. Features
    • Full Text Search
    • Faceted navigation
    • More items like this (recommendations) / related searches
    • Spell Suggest/Auto-Complete
    • Custom document ranking/ordering
    • Snippet generation/highlighting
    • And a lot more...
  • 5. So, why would “I” need Solr?
    • Greater control over your website search.
    • Caching, replication, distributed search.
    • Really fast indexing and searching; indexes can be merged and optimized (index compaction).
    • Great admin interface, usable over HTTP.
    • Awesome community support too.
    • Integration with various other products, such as the Drupal CMS.
  • 6. Products using Solr
    • E-commerce sites, CMS, Blog sites.
    • Heavily used by LinkedIn, Twitter, CNET, Netflix, Digg.
    • Many of them contribute back, like LinkedIn's SNA (Search, Network, and Analytics) team.
  • 7. Installation
    • Minimum requirements:
    • A directory for storing index files.
    • A directory for storing configuration files.
    • A Solr home directory (Solr_Home) containing the other dependencies.
    • A servlet container (Tomcat, Jetty) with appropriate configuration.
  • 8. Configuring Solr
    • schema.xml: contains all of the details about document structure, index-time processing, and query-time processing.
    • solrconfig.xml: contains most of the parameters for configuring Solr itself.
  • 9. Querying Solr: The basics
    • Plain text search
    • q=text:"I love android"
    • Expanding search to more fields:
    • q=title:android AND type:review AND price:[* TO 500]
    • Add facets
    • facet=true&facet.field=product&facet.field=rating
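
A small Python sketch of how a fielded query plus facets might be assembled into a query string. The field names come from the slide; using a list of pairs (rather than a dict) is just one way to keep the repeated `facet.field` parameter, and `facet=true` is included on the assumption that faceting is not already enabled by default in the handler.

```python
from urllib.parse import urlencode

# A list of (name, value) pairs lets the same parameter appear more than once.
params = [
    ("q", "title:android AND type:review AND price:[* TO 500]"),
    ("facet", "true"),           # turn faceting on for this request
    ("facet.field", "product"),  # count matches per product
    ("facet.field", "rating"),   # count matches per rating
]

query_string = urlencode(params)
print(query_string)
```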
  • 10. Querying Solr: The basics
    • Add facets for range queries
    • facet.query=price:[* TO 100]&facet.query=price:[100 TO 200]&facet.query=price:[500 TO *]
    • Ordering results
    • sort=score desc, price asc
    • Limiting results
    • rows=15
    • Paginating on results
    • start=25&rows=10
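
Pagination is just arithmetic on `start` and `rows`. The sketch below assumes 1-based page numbers, which is a convention chosen here for illustration and not something Solr requires; `page_params` is a hypothetical helper.

```python
from urllib.parse import urlencode


def page_params(page, rows=10):
    """Translate a 1-based page number into Solr's start/rows pair."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return [("start", (page - 1) * rows), ("rows", rows)]


params = [
    ("q", "title:android"),
    ("sort", "score desc, price asc"),
    ("facet", "true"),
    ("facet.query", "price:[* TO 100]"),
    ("facet.query", "price:[100 TO 200]"),
] + page_params(3, rows=10)  # third page of ten results

query_string = urlencode(params)
print(query_string)
```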
  • 11. Querying Solr - Not so basic stuff
    • Advanced query parameters:
    • fq : Filter query. Example: fq=type:review&fq=price:[* TO 500]
    • fl : Field list; restricts the fields returned in the result set.
    • Example: fl=id,title,text
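
One possible way to combine a main query with filter queries and a field list, sketched in Python. `build_params` is a hypothetical helper; passing each filter as its own `fq` parameter is a common convention, assumed here because it lets Solr treat each filter separately.

```python
from urllib.parse import urlencode


def build_params(query, filters=(), fields=()):
    """Combine a main query with filter queries (fq) and a field list (fl)."""
    params = [("q", query)]
    # Each filter becomes its own fq parameter.
    params += [("fq", f) for f in filters]
    if fields:
        params.append(("fl", ",".join(fields)))
    return params


params = build_params(
    "text:android",
    filters=["type:review", "price:[* TO 500]"],
    fields=["id", "title", "text"],
)
print(urlencode(params))
```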
  • 12. Querying Solr - Not so basic stuff
    • hl : Highlighting matches in snippets, snippet generation, etc.
    • Example query: hl=true&hl.fl=title,text
    • Custom field boosting
    • Example: q=samsung awesome&defType=dismax&qf=product^20.0+text^0.3
    • debugQuery=true : returns details on how each match was scored.
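
The `qf` boost string is easy to get wrong by hand, so here is a hedged Python sketch that builds it from a dict. `boost_spec` is a made-up helper name, and the query terms are plain words on the assumption that the dismax parser is doing the field matching via `qf`.

```python
from urllib.parse import urlencode


def boost_spec(boosts):
    """Turn {"product": 20.0, "text": 0.3} into 'product^20.0 text^0.3'."""
    return " ".join(f"{field}^{weight}" for field, weight in boosts.items())


params = {
    "q": "samsung awesome",   # plain terms; dismax searches the qf fields
    "defType": "dismax",
    "qf": boost_spec({"product": 20.0, "text": 0.3}),
    "hl": "true",             # ask for highlighted snippets
    "hl.fl": "title,text",    # fields to highlight
}

query_string = urlencode(params)
print(query_string)
```

The space inside `qf` is encoded as `+` by `urlencode`, which matches the `product^20.0+text^0.3` form on the slide.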
  • 13. Solr Search Custom handlers
    • Request Handlers
    • DataImportHandler, DisMaxHandler
    • Response Writers
    • JSON, XML, and CSV format writers
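
With the JSON response writer (`wt=json`), the result can be consumed with any JSON parser. The sketch below parses a hand-written sample shaped like a typical Solr response; the documents and numbers in it are invented for illustration, and `summarize` is a hypothetical helper.

```python
import json

# Hand-written sample in the usual shape of a Solr JSON response.
# The values are made up for this example.
sample_response = """
{
  "responseHeader": {"status": 0, "QTime": 3},
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {"id": "1", "title": "Android phone review"},
      {"id": "2", "title": "Android tablet review"}
    ]
  }
}
"""


def summarize(raw):
    """Return the total hit count and the titles on the current page."""
    body = json.loads(raw)["response"]
    return body["numFound"], [doc["title"] for doc in body["docs"]]


total, titles = summarize(sample_response)
print(total, titles)
```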
  • 14. External Search Components
    • SpellCheckComponent:
    • Uses Solr indexes, custom dictionaries, etc.
    • MoreLikeThis (term suggestions, similar items, etc.)
    • Clustering component
    • TermVector Component
    • Returns advanced information about Query terms, offset, positions
    • Query Elevation Component - Sponsored Results
  • 15. Scaling Solr (I feel the Need for Speed >>>> )
    • Distributed search, a.k.a. sharding.
    • Create separate indexes and copy them across servers (rsync/scp)
    • OR
    • Run the Solr index replication daemon.
    • Optimization/autocommit for the indexes.
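
For distributed search, a request can name the shards to query through the `shards` parameter. A minimal Python sketch follows; the host names are placeholders and `sharded_params` is a hypothetical helper, so treat this as an illustration of the parameter format only.

```python
from urllib.parse import urlencode


def sharded_params(query, shard_hosts):
    """Build parameters that ask one Solr node to fan the query out to shards."""
    # The shards value is a comma-separated list of host:port/path entries.
    return {"q": query, "shards": ",".join(shard_hosts)}


# Placeholder hosts; replace with your own shard locations.
shards = ["solr1.example.com:8983/solr", "solr2.example.com:8983/solr"]
params = sharded_params("title:android", shards)
print(urlencode(params))
```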
  • 16. Solr Caching
    • Build your queries wisely.
    • External Caching : Memcached, etc.
    • Internal Caching
    • Different types of cache:
    • 1) FilterCache: Used by filter queries (fq), and sometimes for faceting too.
    • 2) QueryResultCache : Used for results returned by generic queries
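
"Build your queries wisely" is easier to see with a toy model. The Python sketch below is not how Solr implements its caches; it only assumes that a filter cache maps each filter to the set of matching document ids, so a filter reused across requests is computed once. All class and variable names are invented for this example.

```python
class ToyFilterCache:
    """Toy stand-in for a filter cache: (field, value) -> set of doc ids."""

    def __init__(self, docs):
        self.docs = docs
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def doc_ids(self, field, value):
        key = (field, value)
        if key in self.cache:
            self.hits += 1
        else:
            self.misses += 1
            # Cache miss: scan every document once and remember the result.
            self.cache[key] = {
                doc_id for doc_id, doc in self.docs.items()
                if doc.get(field) == value
            }
        return self.cache[key]


docs = {
    1: {"type": "review", "product": "phone"},
    2: {"type": "review", "product": "tablet"},
    3: {"type": "news", "product": "phone"},
}
cache = ToyFilterCache(docs)

# Two requests share the "type:review" filter, so its doc set is reused.
phone_reviews = cache.doc_ids("type", "review") & cache.doc_ids("product", "phone")
tablet_reviews = cache.doc_ids("type", "review") & cache.doc_ids("product", "tablet")
print(phone_reviews, tablet_reviews, cache.hits, cache.misses)
```

Keeping reusable constraints in separate filters, instead of folding them into one long query string, is what gives a cache like this something to reuse.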
  • 17. Links and resources
    • http://wiki.apache.org/solr/
    • http://www.lucidimagination.com/developer/Articles
    • http://khaidoan.wikidot.com/solr
    • http://42bits.wordpress.com
  • 18. Thanks! This talk wouldn't have been possible without support from PayPal and the Apache Solr project.
    • Questions?