Introduction to Apache Solr.
Slides of my Tech Talk on Apache Solr, at BarCamp 5, Chennai.

    Presentation Transcript

    • BarCamp 5, Chennai. Apache Solr – I can haz Search! Ashish Yadav (ashish_0x90)
    • Agenda
      • Overview of Apache Solr
      • Why Solr?
      • Installing Apache Solr
      • Getting Solr configuration right
      • Solr query basics and not-so-basic stuff
      • Scaling Solr
      • Some tips on Solr Caching
    • Overview
      • Apache Solr is a standalone full-text search server built on top of Apache Lucene.
      • Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java.
      • In brief, Apache Solr exposes Lucene's Java API as REST-like APIs that can be called over HTTP from any programming language or platform.
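
As a rough illustration of the "REST-like API over HTTP" idea, the sketch below builds a search URL in Python using only the standard library. The host, port and `/solr/select` path are assumptions about a default local install, and `build_select_url` is a hypothetical helper name. Nothing is sent over the network.

```python
from urllib.parse import urlencode

# Assumed base URL of a local Solr instance; adjust to your deployment.
SOLR_SELECT = "http://localhost:8983/solr/select"

def build_select_url(query, response_format="json"):
    """Return a GET URL asking Solr to run `query` and reply in `response_format`."""
    params = {"q": query, "wt": response_format}
    return SOLR_SELECT + "?" + urlencode(params)

url = build_select_url('text:"I love android"')
print(url)
```

The resulting URL can be fetched with `urllib.request.urlopen` or with any HTTP client in any other language.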
    • Features
      • Full Text Search
      • Faceted navigation
      • More items like this (recommendations) / related searches
      • Spell suggest / auto-complete
      • Custom document ranking/ordering
      • Snippet generation/highlighting
      • And a lot more.
    • So, why would “I” need Solr?
      • You want greater control over your website search.
      • Caching, replication, distributed search.
      • Really fast indexing and searching; indexes can be merged and optimized (index compaction).
      • Great admin interface that can be used over HTTP.
      • Awesome community support too.
      • Integration with various other products, such as the Drupal CMS.
    • Products using Solr
      • E-commerce sites, CMS, Blog sites.
      • Heavily used by LinkedIn, Twitter, CNET, Netflix, Digg.
      • Many of them contribute back, like LinkedIn's SNA (Search, Network, and Analytics) team.
    • Installation
      • Minimum requirements:
      • A directory for storing index files.
      • A directory for storing configuration files.
      • A Solr home directory (Solr_Home) holding the other dependencies.
      • A servlet container (Tomcat, Jetty) with appropriate configuration.
    • Configuring Solr
      • schema.xml – Contains all of the details about document structure, index-time and query-time processing.
      • solrconfig.xml – Contains most of the parameters for configuring Solr itself.
    • Querying Solr: The basics
      • Plain text search
      • q = text:"I love android"
      • Expanding the search to more fields:
      • q = title:android AND type:review AND price:[* TO 500]
      • Add facets
      • facet=true & facet.field=product & facet.field=rating
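
A minimal sketch of how the faceted query on this slide could be assembled in Python. The field clauses are joined with `AND` (the Lucene boolean operator), the range uses uppercase `TO`, and `facet=true` switches faceting on. `facet_params` is a hypothetical helper; a list of pairs is used because `facet.field` is repeated once per field.

```python
from urllib.parse import urlencode, parse_qs

def facet_params(query, facet_fields):
    """Build (key, value) pairs for a faceted search; repeated keys are allowed."""
    params = [("q", query), ("facet", "true")]
    params += [("facet.field", field) for field in facet_fields]
    return params

params = facet_params("title:android AND type:review AND price:[* TO 500]",
                      ["product", "rating"])
query_string = urlencode(params)
print(query_string)

# Decoding the string again shows both facet fields survived the encoding.
decoded = parse_qs(query_string)
print(decoded["facet.field"])
```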
    • Querying Solr: The basics
      • Add facets for range queries
      • facet.query=price:[* TO 100]&facet.query=price:[100 TO 200]&facet.query=price:[500 TO *]
      • Ordering results
      • sort = score desc, price asc
      • Limiting results
      • rows=15
      • Paginating on results
      • start=25 & rows=10
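
The paging and range-facet parameters above are easy to generate programmatically. The sketch below assumes 1-based page numbers and a `price` field. Both helper names are hypothetical.

```python
def page_params(page, page_size):
    """Translate a 1-based page number into Solr's start/rows parameters."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {"start": (page - 1) * page_size, "rows": page_size}

def price_facet_queries(bounds):
    """Turn ascending price bounds into open-ended range facet queries."""
    edges = ["*"] + [str(b) for b in bounds] + ["*"]
    return ["price:[%s TO %s]" % (lo, hi) for lo, hi in zip(edges, edges[1:])]

params = {"q": "title:android", "sort": "score desc, price asc"}
params.update(page_params(page=3, page_size=10))
print(params)
print(price_facet_queries([100, 200]))
```

Square brackets are inclusive in Solr range syntax, so a document priced at exactly 100 is counted in two adjacent buckets.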
    • Querying Solr - Not-so-basic stuff
      • Advanced query parameters:
      • fq: filter query; restricts the result set without affecting scores. Example: fq=type:review & fq=price:[* TO 500]
      • fl: restricts the fields returned with the result set.
      • Example: fl=id,title,text
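
A short sketch of combining `fq` and `fl` in one request, assuming each filter is sent as its own `fq` parameter. `filtered_query` is a hypothetical helper name.

```python
from urllib.parse import urlencode

def filtered_query(query, filters, fields):
    """Build a query string with one fq per filter and a comma-separated fl."""
    params = [("q", query)]
    params += [("fq", f) for f in filters]   # each filter is its own fq parameter
    params.append(("fl", ",".join(fields)))
    return urlencode(params)

qs = filtered_query("text:android",
                    ["type:review", "price:[* TO 500]"],
                    ["id", "title", "text"])
print(qs)
```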
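    • Querying Solr - Not-so-basic stuff
      • hl: highlights matches and generates snippets.
      • Example query: hl=true&hl.fl=title,text
      • Custom field boosting
      • Example: q=samsung awesome & defType=dismax & qf=product^20.0+text^0.3 (with DisMax, q holds the plain search text and qf lists the fields to search with their boosts)
      • Debugging: debugQuery=true (debug=true in newer Solr releases) shows how each score was computed.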
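
A minimal sketch of building a boosted DisMax request with highlighting, assuming the user's search text goes in `q` and the per-field boosts go in `qf` as space-separated `field^boost` entries. `dismax_params` is a hypothetical helper name.

```python
from urllib.parse import urlencode

def dismax_params(user_text, boosts, highlight_fields=()):
    """Build DisMax parameters; `boosts` maps field name -> boost factor."""
    qf = " ".join("%s^%s" % (field, boost) for field, boost in boosts.items())
    params = {"q": user_text, "defType": "dismax", "qf": qf}
    if highlight_fields:
        params["hl"] = "true"
        params["hl.fl"] = ",".join(highlight_fields)
    return params

params = dismax_params("samsung awesome", {"product": 20.0, "text": 0.3},
                       highlight_fields=["title", "text"])
print(params["qf"])
print(urlencode(params))
```

Keeping the boosts in a dictionary makes it easy to tune the relative weight of each field without touching the query text.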
    • Solr Search Custom handlers
      • Request Handlers
      • DataImportHandler, DisMaxHandler
      • Response Writers
      • JSON, XML and CSV format writers
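
To show what a client does with the JSON response writer's output, here is a sketch that parses a response body. The sample body is made up for illustration. It follows the general shape of Solr's JSON output, where facet counts arrive by default as a flat list of alternating values and counts.

```python
import json

# Hypothetical response body, shaped like Solr's default JSON output.
SAMPLE = """
{"response": {"numFound": 2, "start": 0,
              "docs": [{"id": "1", "title": "Android review"},
                       {"id": "2", "title": "Android tips"}]},
 "facet_counts": {"facet_fields": {"product": ["samsung", 12, "htc", 7]}}}
"""

def facet_counts(flat):
    """Convert a flat [value, count, value, count, ...] list into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))

data = json.loads(SAMPLE)
titles = [doc["title"] for doc in data["response"]["docs"]]
products = facet_counts(data["facet_counts"]["facet_fields"]["product"])
print(data["response"]["numFound"], titles, products)
```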
    • External Search Components
      • SpellCheckComponent:
      • Uses Solr indexes, custom dictionaries, etc.
      • MoreLikeThis (term suggestions, similar items, etc.)
      • Clustering component
      • TermVector Component
      • Returns advanced information about Query terms, offset, positions
      • Query Elevation Component - Sponsored Results
    • Scaling Solr (I feel the Need for Speed >>>> )
      • Distributed search, a.k.a. sharding.
      • Replication: create separate copies of the index with rsync/scp,
      • or
      • run Solr's index replication daemon.
      • Optimize the indexes and use autocommit.
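
A sketch of what a distributed (sharded) query looks like from the client side, using Solr's `shards` parameter, which takes a comma-separated list of shard addresses. The host names `solr1` and `solr2` are placeholders, and `sharded_query` is a hypothetical helper name.

```python
from urllib.parse import urlencode

def sharded_query(query, shard_hosts):
    """Build a query string that fans the search out to every listed shard."""
    params = {"q": query, "shards": ",".join(shard_hosts)}
    # Keep ':', '/' and ',' unescaped so the shard list stays readable.
    return urlencode(params, safe=":/,")

qs = sharded_query("title:android",
                   ["solr1:8983/solr", "solr2:8983/solr"])
print(qs)
```

The request is sent to any one Solr instance, which queries the listed shards and merges their results.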
    • Solr Caching
      • Build your queries wisely.
      • External caching: Memcached, etc.
      • Internal caching
      • Different types of cache:
      • 1) filterCache: used by filter queries (fq), and sometimes for faceting too.
      • 2) queryResultCache: used for the results returned by generic queries.
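
A rough sketch of the external-caching idea, assuming a cache-aside pattern in front of Solr. A plain dictionary stands in for Memcached and `fake_search` stands in for the real HTTP call. All names here are hypothetical. Sorting the parameters before building the key is one way to "build queries wisely", so that equivalent requests share a cache entry.

```python
from urllib.parse import urlencode

cache = {}   # stand-in for an external cache such as Memcached
calls = []   # records which queries actually reached the (fake) search backend

def cache_key(params):
    """Sort parameters so equivalent queries map to the same key."""
    return urlencode(sorted(params.items()))

def fake_search(params):
    """Placeholder for a real HTTP request to Solr."""
    calls.append(dict(params))
    return {"numFound": 0, "docs": []}

def cached_search(params):
    """Return the cached result if present, otherwise search and store it."""
    key = cache_key(params)
    if key not in cache:
        cache[key] = fake_search(params)
    return cache[key]

cached_search({"q": "title:android", "rows": 10})
cached_search({"rows": 10, "q": "title:android"})  # same query, different order
print(len(calls))
```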
    • Links and resources
      • http://wiki.apache.org/solr/
      • http://www.lucidimagination.com/developer/Articles
      • http://khaidoan.wikidot.com/solr
      • http://42bits.wordpress.com
    • Thanks! This talk wouldn't have been possible without the support of PayPal and the Apache Solr project.
      • Questions ?