Solr
   Search at the Speed of Light


          JavaZone 2009
           September 10
               Oslo
  Erik Hatcher, Lucid Imagination
erik.hatcher@lucidimagination.com




Solr History

     • Created by Yonik Seeley for CNET
     • Contributed to Apache in January 2006
     • December 2006: Version 1.1 released
     • June 2007: Version 1.2 released
     • September 2008: Version 1.3 released
     • ~September 2009: Version 1.4
http://lucene.apache.org/solr
    © 2008-2009          Lucid Imagination, Inc.
Solr: Big Picture
[Figure: Data (Documents, DB) feeds into Solr, which returns Search Results]

Features

 • Lucene power exposed over HTTP
 • Scalability: caching, replication, distributed
      search
 • Faceting
 • And more: spell checking, highlighting,
      clustering, rich document and DB indexing,
      "more like this"


Lucene

 • Fast, scalable search library
 • Lucene index structure
  • Index contains documents
    • documents have fields
      • indexed fields have terms

Inverted Index

 • Commonly used search
      engine data structure
 • Efficient lookup of terms
      across large number of
      documents
 • Usually stores positional
      information to enable
      phrase/proximity queries

 [Figure from "Taming Text" by Grant Ingersoll and Tom Morton]
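
As a rough illustration of the structure described on this slide, here is a minimal Python sketch of an inverted index that records term positions and uses them to answer a phrase query. It is not how Lucene lays out its index on disk; the function names (`build_index`, `phrase_match`) and the whitespace tokenization are assumptions made for the example.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to {doc_id: [positions]} (a toy positional inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def phrase_match(index, phrase):
    """Return ids of docs where the phrase terms occur at consecutive positions."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return set()
    # Only documents that contain every term can match the phrase.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    hits = set()
    for doc_id in candidates:
        for start in index[terms[0]][doc_id]:
            if all(start + i in index[t][doc_id] for i, t in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits

docs = {1: "the quick brown fox", 2: "the brown quick fox"}
index = build_index(docs)
```

Looking up a single term is one dictionary access regardless of how many documents exist; the stored positions are what make the phrase check possible.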


Analysis Process




Analyzing the analyzer
                    Example phrase

      The quick brown fox jumps over the lazy dog.




WhitespaceAnalyzer
                Simplest built-in analyzer
      The quick brown fox jumps over the lazy dog.




  [The] [quick] [brown] [fox] [jumps] [over] [the]
                    [lazy] [dog.]

SimpleAnalyzer
          Lowercases, splits at non-letter boundaries
      the quick brown fox jumps over the lazy dog.




  [the] [quick] [brown] [fox] [jumps] [over] [the]
                    [lazy] [dog]

StopAnalyzer
              Lowercases and removes stop words


      The quick brown fox jumps over the lazy dog.




 [quick] [brown] [fox] [jumps] [over] [lazy] [dog]
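
The three analyzers shown so far can be approximated in a few lines of Python. This is only a sketch of their observable behavior on the example phrase, not Lucene's implementation; the function names, the ASCII-only letter pattern, and the tiny stop-word set are assumptions for illustration (Lucene's default English stop list is longer).

```python
import re

# Assumption: a small stop-word set, just enough for the example phrase.
STOP_WORDS = {"a", "an", "and", "the", "of", "to"}

def whitespace_analyze(text):
    """Split on whitespace only; case and punctuation are kept."""
    return text.split()

def simple_analyze(text):
    """Lowercase, then split at non-letter characters (ASCII letters only here)."""
    return re.findall(r"[a-z]+", text.lower())

def stop_analyze(text):
    """Same as simple_analyze, then drop stop words."""
    return [t for t in simple_analyze(text) if t not in STOP_WORDS]

phrase = "The quick brown fox jumps over the lazy dog."
```

Note how only the whitespace version keeps the trailing period on `dog.`, which is why it rarely suits full-text search on its own.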




SnowballAnalyzer
                   Stemming algorithm
      The quick brown fox jumps over the lazi dog.




   [the] [quick] [brown] [fox] [jump] [over] [the]
                     [lazi] [dog]

What's in a token?




Relevance

 •    Term frequency (TF): number of times a term
      appears in a document

 •    Inverse document frequency (IDF): inverse of the number of
      documents in the index that contain the term (roughly 1/df)

 •    Field length normalization: controls the effect that field
      length, in number of terms, has on score

 •    Boost factors: terms, fields, or documents
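
To make these factors concrete, here is a small Python sketch. It assumes the formulas of Lucene's default similarity from this era (square-root TF, a log-dampened IDF rather than a literal 1/df, and 1/sqrt(length) normalization) and leaves out the coordination factor and query normalization, so treat it as a simplification, not the full scoring function. The function names are made up for the example.

```python
import math

def tf(freq):
    """Term frequency factor: grows with the square root of the raw count."""
    return math.sqrt(freq)

def idf(doc_freq, num_docs):
    """Inverse document frequency: rarer terms get a larger weight."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def length_norm(num_terms):
    """Shorter fields score higher for the same match."""
    return 1.0 / math.sqrt(num_terms)

def term_score(freq, doc_freq, num_docs, num_terms, boost=1.0):
    """Simplified per-term score: tf * idf^2 * boost * length norm."""
    return tf(freq) * idf(doc_freq, num_docs) ** 2 * boost * length_norm(num_terms)
```

For example, a term found in 1 of 1000 documents contributes far more than one found in 500 of them, all else being equal.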



Lucene Scoring
[Figure: document vector d1 and query vector q1 separated by angle Θ]
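
The figure suggests treating the query and each document as term-weight vectors and scoring by the angle between them. A minimal Python sketch of that idea follows, using cosine similarity over sparse dictionaries; the weights and names (`q1`, `d1`, `d2`) are illustrative assumptions, and Lucene's practical scoring adds further factors on top.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

q1 = {"solr": 1.0, "search": 1.0}
d1 = {"solr": 2.0, "search": 2.0}   # same direction as the query
d2 = {"lucene": 1.0}                # shares no terms with the query
```

A smaller angle (cosine closer to 1) means a better match; vectors with no terms in common are orthogonal and score 0.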




Solr APIs

 • HTTP GET/POST (curl or any other HTTP
      client)
 • JSON
 • SolrJ (embedded or HTTP)
 • solr-ruby
 • Python, PHP, solrsharp, XSLT

Solr in Production
[Figure: Incoming Search Requests -> Load Balancer -> Solr Masters -> Shard Requests -> per-shard Load Balancers -> 1..n shards, each with a Shard Master and several Replicants]

Getting Started: It's This Easy
1. Start Solr

  java -jar start.jar
2. Index your data

  java -jar post.jar *.xml
3. Search

  http://localhost:8983/solr

Configuration
 •    schema.xml

     •    field types and fields

 •    solrconfig.xml

     •    request handler mappings

     •    cache settings: filter, query, document

     •    warming listeners

     •    HTTP cache settings

     •    Lucene index parameters

     •    plugins: spell checking, highlighting


Solr add/update XML
<add><doc>
  <field name="id">MA147LL/A</field>
  <field name="name">Apple 60 GB iPod with Video Playback Black</field>
  <field name="manu">Apple Computer Inc.</field>
  <field name="cat">electronics</field>
  <field name="cat">music</field>
  <field name="features">iTunes, Podcasts, Audiobooks</field>
  <field name="features">Stores up to 15,000 songs, 25,000 photos, or 150 hours of
               video</field>
  <field name="features">2.5-inch, 320x240 color TFT LCD display
                         with LED backlight</field>
  <field name="features">Up to 20 hours of battery life</field>
  <field name="features">Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless,
                         H.264 video</field>
  <field name="features">Notes, Calendar, Phone book, Hold button, Date display,
      Photo wallet, Built-in games, JPEG photo playback, Upgradeable firmware,
      USB 2.0 compatibility, Playback speed control, Rechargeable capability,
      Battery level indication</field>
  <field name="includes">earbud headphones, USB cable</field>
  <field name="weight">5.5</field>
  <field name="price">399.00</field>
  <field name="popularity">10</field>
  <field name="inStock">true</field>
</doc></add>


Indexing Solr XML
 • Via curl:
      curl 'http://localhost:8983/solr/update?commit=true' \
        --data-binary @ipod_video.xml \
        -H 'Content-type:text/xml; charset=utf-8'

 • Via Solr's Java-based post tool:
      java -jar post.jar ipod_video.xml
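
An `<add>` message like the one posted here does not have to be written by hand. Below is a minimal Python sketch that builds one from a dictionary using only the standard library; the helper name `build_add_xml` is hypothetical, and sending the result to Solr (for instance with curl as above) is left out.

```python
import xml.etree.ElementTree as ET

def build_add_xml(doc):
    """Serialize a dict to a Solr <add><doc> message.

    List values become repeated <field> elements (multi-valued fields).
    """
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(v)  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")

message = build_add_xml({
    "id": "MA147LL/A",
    "cat": ["electronics", "music"],
    "inStock": "true",
})
```

Building the XML through a library instead of string concatenation avoids malformed messages when field values contain characters such as `&` or `<`.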



Indexing CSV


curl 'http://localhost:8983/solr/update/csv?commit=true' \
  --data-binary @books.csv \
  -H 'Content-type:text/plain; charset=utf-8'




Content Streams

 •    Allows Solr server to fetch local or remote data
      itself. Must enable remote streaming in
      solrconfig.xml

 •    http://localhost:8983/solr/update?stream.file=<local
      Solr path to exampledocs>/ipod_video.xml

 •    &stream.url=<url to content>

 •    Security warning: allows Solr to fetch arbitrary
      server-side file or network URL content



Indexing Rich Documents


curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true&extractOnly=true&wt=ruby&indent=on' \
  -F "myfile=@tutorial.html"




Indexing with SolrJ

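import java.net.URL;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Connect to a running Solr instance over HTTP
SolrServer solr =
    new CommonsHttpSolrServer(new URL("http://localhost:8983/solr"));

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "JAVAZONE_09");
doc.addField("title", "JavaZone 2009 SolrJ Example");
solr.add(doc);
solr.commit();     // after a batch, not per document
solr.optimize();   // periodically, when needed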




Indexing with Ruby

require 'solr'   # solr-ruby gem

solr = Solr::Connection.new(
  'http://localhost:8983/solr',
  :autocommit => :on)

solr.add(:id => 123,
         :title => 'Solr in Action')

solr.optimize       # periodically, as needed




Data Import Handler


• Indexes relational database, XML data sources,
   e-mail, and more
• Supports full and incremental/delta indexing
• Extensible with custom data sources,
   transformers, etc
• http://wiki.apache.org/solr/DataImportHandler
DB Indexing



http://localhost:8983/solr/db/dataimport?command=full-import




Example Search Request

 • http://localhost:8983/solr/select?q=query
  • &start=50
  • &rows=25
  • &fq=filter+query
  • &facet=on&facet.field=category
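
The request above is just a URL with encoded parameters, so any HTTP client can build it. Here is a minimal Python sketch that assembles the same request with the standard library; the helper name `select_url` and its defaults are assumptions, and no request is actually sent.

```python
from urllib.parse import urlencode

def select_url(base, q, start=0, rows=10, fq=None, facet_fields=()):
    """Build a /select URL with paging, an optional filter query, and field facets."""
    params = [("q", q), ("start", start), ("rows", rows)]
    if fq:
        params.append(("fq", fq))
    if facet_fields:
        params.append(("facet", "on"))
        # facet.field may be repeated, one entry per field
        params.extend(("facet.field", f) for f in facet_fields)
    return base + "/select?" + urlencode(params)

url = select_url("http://localhost:8983/solr", "query", start=50, rows=25,
                 fq="filter query", facet_fields=["category"])
```

With `start=50` and `rows=25` the response would cover results 51 through 75 of the match list.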

Debug Query


 • &debugQuery=true is your friend
 • Includes parsed query, explanations, and
      search component timings in response




Query Parser

 • Controlled by defType parameter
  • &defType=lucene (actually a Solr
          extension of Lucene’s QueryParser)
  • &defType=dismax
 • Local {!..} override syntax

Solr Query Parser

 • http://lucene.apache.org/java/2_4_0/queryparsersyntax.html + Solr extensions
 • Kitchen sink parser, includes advanced, user-unfriendly syntax
 • Syntax errors throw parse exceptions back to client
 • Example: title:ipod* AND price:[0 TO 100]
Dismax Query Parser

 • Simplified syntax:
      loose text “quote phrases” -prohibited
      +required
 • Spreads query terms across query fields
      (qf) with dynamic boosting per field, implicit
      phrase construction (pf), boosting function
      (bf), boosting query (bq), and minimum
      match (mm)
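
The name "dismax" comes from "disjunction max": each query term is tried against every query field, and the best boosted field score wins. The Python sketch below illustrates only that core idea under simplifying assumptions. The per-field scores are made-up inputs, the `tie` parameter (a fraction of the non-winning field scores to add back) is not mentioned on the slide, and pf, bf, bq and mm are ignored entirely.

```python
def dismax_score(term_field_scores, qf_boosts, tie=0.0):
    """Sum, over query terms, of the best boosted field score for that term.

    term_field_scores: {term: {field: raw_score}}  (hypothetical inputs)
    qf_boosts:         {field: boost}, playing the role of the qf parameter
    """
    total = 0.0
    for field_scores in term_field_scores.values():
        boosted = [score * qf_boosts[field]
                   for field, score in field_scores.items()
                   if field in qf_boosts]
        if not boosted:
            continue  # term matched no query field
        best = max(boosted)
        total += best + tie * (sum(boosted) - best)
    return total

scores = {"ipod": {"name": 0.5, "features": 0.25}, "video": {"features": 0.5}}
boosts = {"name": 2.0, "features": 1.0}
```

Taking the maximum rather than the sum keeps a document from ranking high merely because one term appears in many fields.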


Searching with SolrJ


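import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

SolrServer server =
    new CommonsHttpSolrServer("http://localhost:8983/solr");
SolrQuery params = new SolrQuery("author:John");
params.setFields("*,score");   // all stored fields plus the relevance score
params.setRows(3);             // return at most three documents
QueryResponse response = server.query(params);
for (SolrDocument document : response.getResults()) {
      System.out.println("Doc: " + document);
}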




Searching with Ruby


require 'solr'   # solr-ruby gem

conn = Solr::Connection.new(
    'http://localhost:8983/solr')

conn.query('my query') do |hit|
  puts hit.inspect
end




delete, update, etc
 •    Delete:
     • <delete><id>05991</id></delete>
     •    <delete>
             <query>category:Unused</query>
          </delete>

     •    java -Ddata=args -jar post.jar
          "<delete><query>*:*</query></delete>"

 •    Update: simply <add> doc with same unique key

 •    Commit: <commit/>

 •    Optimize: <optimize/>
Faceting


• Counts per subset within results
• Facet on: field terms, queries, date
    ranges
• &facet=on
    &facet.field=cat
    &facet.query=price:[0 TO 100]
• http://wiki.apache.org/solr/SimpleFacetParameters
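
Conceptually, a facet is a count per value (or per query) over the documents that matched. The Python sketch below mimics `facet.field` and `facet.query` on an in-memory result list; the helper names and sample documents are assumptions, and Solr computes these counts from the index, not by looping over results like this.

```python
from collections import Counter

def facet_field(docs, field):
    """Count how many documents carry each value of a possibly multi-valued field."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field, [])
        values = value if isinstance(value, list) else [value]
        counts.update(set(values))  # count each value once per document
    return counts

def facet_query(docs, predicate):
    """Count the documents that satisfy an arbitrary condition."""
    return sum(1 for doc in docs if predicate(doc))

results = [
    {"id": "1", "cat": ["electronics", "music"], "price": 399.0},
    {"id": "2", "cat": ["electronics"], "price": 19.95},
    {"id": "3", "cat": "music", "price": 11.5},
]
```

Here `facet_field(results, "cat")` plays the role of `facet.field=cat`, and a predicate checking `0 <= price <= 100` plays the role of `facet.query=price:[0 TO 100]`.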
Spell checking


•    Not enabled by default, see example config to wire it in

•    http://localhost:8983/solr/spell?q=epod&spellcheck=on&spellcheck.build=true

•    File or index-based dictionaries

•    Supports pluggable distance algorithms: Levenshtein and
     JaroWinkler

•    http://wiki.apache.org/solr/SpellCheckComponent
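
For readers unfamiliar with edit distance, here is a minimal Python sketch of the Levenshtein distance and a naive suggestion lookup built on it. It only illustrates the metric; the `suggest` helper and the word list are assumptions, and Solr's spell checker does not scan a dictionary linearly like this.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def suggest(word, dictionary):
    """Return the dictionary entry closest to the given word."""
    return min(dictionary, key=lambda cand: levenshtein(word, cand))
```

The misspelling `epod` from the example URL is one substitution away from `ipod`, which is why it would be offered as the correction.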


Highlighting


 • http://localhost:8983/solr/select?q=ipod&hl=on&hl.fl=manu,name
 • http://wiki.apache.org/solr/HighlightingParameters
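
The effect of highlighting can be pictured as wrapping matched terms in marker tags. A minimal Python sketch follows; the `<em>` markers and the whole-word, case-insensitive regex match are assumptions for illustration, since Solr highlights analyzed tokens and returns fragments, not whole field values.

```python
import re

def highlight(text, term, pre="<em>", post="</em>"):
    """Wrap each whole-word, case-insensitive occurrence of term in marker tags."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return pattern.sub(lambda m: pre + m.group(0) + post, text)

snippet = highlight("Apple 60 GB iPod with Video Playback Black", "ipod")
```

The original casing of the matched text is preserved, so the query `ipod` still highlights `iPod` as written in the document.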




More Like This


 • http://localhost:8983/solr/select?q=ipod&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mintf=1&fl=id,score,name
 • http://wiki.apache.org/solr/MoreLikeThis


Scaling: Query Throughput

 • Replication
  • slaves poll master for index updates
  • transfers index files from master to slave
  • configuration files can also be transferred
  • entirely Java/HTTP-based in Solr 1.4
          (prior versions used rsync)



Scaling: Collection Size

 • Distribution
  • Index documents across shards
  • query single server with shards
          parameter
         • sends requests to each shard
         • aggregates result to a single response
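
The aggregation step can be pictured as merging per-shard hit lists that are each already sorted by score. Here is a minimal Python sketch of that merge; the hit dictionaries and the function name are assumptions, and Solr's distributed search also has to handle details such as facet merging and fetching stored fields, which are left out.

```python
import heapq
from itertools import islice

def merge_shard_results(shard_results, rows=10):
    """Merge hit lists (each sorted by descending score) into one top-N list."""
    merged = heapq.merge(*shard_results, key=lambda hit: -hit["score"])
    return list(islice(merged, rows))

shard_a = [{"id": "a1", "score": 0.9}, {"id": "a2", "score": 0.4}]
shard_b = [{"id": "b1", "score": 0.7}, {"id": "b2", "score": 0.1}]
top = merge_shard_results([shard_a, shard_b], rows=3)
```

Because each shard returns its own best hits first, the coordinator only needs the top `rows` entries from every shard to produce a correct global top list.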

Solr-powered UI

 • Solritas (from "celeritas"):
      VelocityResponseWriter
     • easily templated output
 • SolrJS: jQuery-based widgets
  • see http://solrjs.solrstuff.org/
 • Blacklight and Flare: RoR plugins

Lucene in Action, 2nd Edition




              http://www.manning.com/lucene
Search at Lucid
http://search.lucidimagination.com/?q=javazone




Who We Are

 • Lucid Imagination is the commercial entity of Apache
      Lucene/Solr ecosystem
 • Lucid team includes the largest collection of
      Lucene/Solr committers, contributors and influencers
 • Our mission is to serve as The Center of Excellence for
      Lucene/Solr-based search solutions
   • Help our customers get the most out of Lucene/Solr - [illegible]
          most widely used open source search software

Lucid Imagination Technical Team

Unique Combination of Enterprise Search and Lucene Expertise

 • Yonik Seeley: Creator of Solr; Lucene/Solr committer, PMC member
 • Grant Ingersoll: Creator of "Lucene Boot Camp"; Lucene/Solr committer, Chair, PMC
 • Erik Hatcher: Co-author of "Lucene in Action" book; Lucene/Solr committer, PMC member
 • Mark Miller: Lucene/Solr committer, PMC member
 • Sami Siren: Nutch/Tika committer, PMC member
 • Andrzej Bialecki: Lucene/Nutch/Hadoop committer, PMC member
 • Doug Cutting (Advisor): Creator of Lucene, Nutch & Hadoop
 • Marc Krellenstein: Co-founder, CTO/SVP, Northern Light; VP Search, CTO, Elsevier
 • Brian Pinkerton: Developed WebCrawler, the web's first comprehensive search engine; Principal Architect at A9
 • Simon Rosenthal: Solutions architect, Northern Light
 • Jay Hill: Solutions Architect, Wells Fargo
 • Ryan McKinley: Lucene/Solr committer, PMC member
 • Chris Hostetter (Advisor): Lucene/Solr committer, PMC member; Member Apache Software Foundation

Lucid Imagination Business Model

[Figure: business model diagram. Recoverable labels: Free Download; Lucene; Apache.org; Certified Distributions (Free); Value-add; Upgrade; Enterprise Version; Commercial Support, Training, Consulting]

Thank you




              http://www.lucidimagination.com
© 2008-2009                Lucid Imagination, Inc.
                                                                 49
© 2008-2009   Lucid Imagination, Inc.
                                        50

More Related Content

What's hot

Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
HostedbyConfluent
 

What's hot (20)

Introduction To Apache Lucene
Introduction To Apache LuceneIntroduction To Apache Lucene
Introduction To Apache Lucene
 
Python-03| Data types
Python-03| Data typesPython-03| Data types
Python-03| Data types
 
HBase Accelerated: In-Memory Flush and Compaction
HBase Accelerated: In-Memory Flush and CompactionHBase Accelerated: In-Memory Flush and Compaction
HBase Accelerated: In-Memory Flush and Compaction
 
An introduction to MongoDB
An introduction to MongoDBAn introduction to MongoDB
An introduction to MongoDB
 
Hashing In Data Structure
Hashing In Data Structure Hashing In Data Structure
Hashing In Data Structure
 
LLAP: Building Cloud First BI
LLAP: Building Cloud First BILLAP: Building Cloud First BI
LLAP: Building Cloud First BI
 
Introduction to php
Introduction to phpIntroduction to php
Introduction to php
 
Introduction to Python Programming
Introduction to Python ProgrammingIntroduction to Python Programming
Introduction to Python Programming
 
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
 
Lessons learned from scaling YARN to 40K machines in a multi tenancy environment
Lessons learned from scaling YARN to 40K machines in a multi tenancy environmentLessons learned from scaling YARN to 40K machines in a multi tenancy environment
Lessons learned from scaling YARN to 40K machines in a multi tenancy environment
 
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
 
Python programming
Python  programmingPython  programming
Python programming
 
Query Parsing - Tips and Tricks
Query Parsing - Tips and TricksQuery Parsing - Tips and Tricks
Query Parsing - Tips and Tricks
 
Tutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component pluginTutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component plugin
 
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
 
Redis overview for Software Architecture Forum
Redis overview for Software Architecture ForumRedis overview for Software Architecture Forum
Redis overview for Software Architecture Forum
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Apache Phoenix + Apache HBase
Apache Phoenix + Apache HBaseApache Phoenix + Apache HBase
Apache Phoenix + Apache HBase
 
Hadoop Presentation - PPT
Hadoop Presentation - PPTHadoop Presentation - PPT
Hadoop Presentation - PPT
 
An Overview of Ambari
An Overview of AmbariAn Overview of Ambari
An Overview of Ambari
 

Viewers also liked

Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash course
Tommaso Teofili
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
Erik Hatcher
 

Viewers also liked (20)

Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash course
 
Using Apache Solr
Using Apache SolrUsing Apache Solr
Using Apache Solr
 
Building a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engineBuilding a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engine
 
Solr for Indexing and Searching Logs
Solr for Indexing and Searching LogsSolr for Indexing and Searching Logs
Solr for Indexing and Searching Logs
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
Enterprise Search in Practice: A Presentation of Survey Results and Areas for...
Enterprise Search in Practice: A Presentation of Survey Results and Areas for...Enterprise Search in Practice: A Presentation of Survey Results and Areas for...
Enterprise Search in Practice: A Presentation of Survey Results and Areas for...
 
Solr introduction
Solr introductionSolr introduction
Solr introduction
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache Solr
 
Enterprise Search Using Apache Solr
Enterprise Search Using Apache SolrEnterprise Search Using Apache Solr
Enterprise Search Using Apache Solr
 
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...
 
How Solr Search Works
How Solr Search WorksHow Solr Search Works
How Solr Search Works
 
Spark overview
Spark overviewSpark overview
Spark overview
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Apache Solr Search Course Drupal 7 Acquia
Apache Solr Search Course Drupal 7 AcquiaApache Solr Search Course Drupal 7 Acquia
Apache Solr Search Course Drupal 7 Acquia
 
State of Solr Security 2016: Presented by Ishan Chattopadhyaya, Lucidworks
State of Solr Security 2016: Presented by Ishan Chattopadhyaya, LucidworksState of Solr Security 2016: Presented by Ishan Chattopadhyaya, Lucidworks
State of Solr Security 2016: Presented by Ishan Chattopadhyaya, Lucidworks
 
Apache Solr-Webinar
Apache Solr-WebinarApache Solr-Webinar
Apache Solr-Webinar
 
High Performance Solr
High Performance SolrHigh Performance Solr
High Performance Solr
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 

Similar to Solr: Search at the Speed of Light

Building specialized industry applications using Solr, and migration from FAS...
Building specialized industry applications using Solr, and migration from FAS...Building specialized industry applications using Solr, and migration from FAS...
Building specialized industry applications using Solr, and migration from FAS...
Lucidworks (Archived)
 
Tricks And Tradeoffs Of Deploying My Sql Clusters In The Cloud
Tricks And Tradeoffs Of Deploying My Sql Clusters In The CloudTricks And Tradeoffs Of Deploying My Sql Clusters In The Cloud
Tricks And Tradeoffs Of Deploying My Sql Clusters In The Cloud
MySQLConference
 
Performance evaluation of cloudera impala 0.6 beta with comparison to Hive
Performance evaluation of cloudera impala 0.6 beta with comparison to HivePerformance evaluation of cloudera impala 0.6 beta with comparison to Hive
Performance evaluation of cloudera impala 0.6 beta with comparison to Hive
Yukinori Suda
 

Similar to Solr: Search at the Speed of Light (20)

The Seven Deadly Sins of Solr
The Seven Deadly Sins of SolrThe Seven Deadly Sins of Solr
The Seven Deadly Sins of Solr
 
The Seven Deadly Sins of Solr - By Jay Hill
The Seven Deadly Sins of Solr - By Jay Hill The Seven Deadly Sins of Solr - By Jay Hill
The Seven Deadly Sins of Solr - By Jay Hill
 
The Seven Deadly Sins of Solr - By Jay Hill
The Seven Deadly Sins of Solr - By Jay Hill The Seven Deadly Sins of Solr - By Jay Hill
The Seven Deadly Sins of Solr - By Jay Hill
 
Games for the Masses (Jax)
Games for the Masses (Jax)Games for the Masses (Jax)
Games for the Masses (Jax)
 
Building specialized industry applications using Solr, and migration from FAS...
Building specialized industry applications using Solr, and migration from FAS...Building specialized industry applications using Solr, and migration from FAS...
Building specialized industry applications using Solr, and migration from FAS...
 
Building specialized industry apps using solr - By Rahul Agarwalla




Solr: Search at the Speed of Light

  • 1. Solr: Search at the Speed of Light. JavaZone 2009, September 10, Oslo. Erik Hatcher, Lucid Imagination, erik.hatcher@lucidimagination.com
  • 2. Solr History • Created by Yonik Seeley for CNET • Contributed to Apache in January 2006 • December 2006: Version 1.1 released • June 2007: Version 1.2 released • September 2008: Version 1.3 released • ~September 2009: Version 1.4 • http://lucene.apache.org/solr © 2008-2009 Lucid Imagination, Inc.
  • 3. Solr: Big Picture (diagram labels: Data, DB, Documents, Solr, Search Results)
  • 4. Features • Lucene power exposed over HTTP • Scalability: caching, replication, distributed search • Faceting • And more: spell checking, highlighting, clustering, rich document and DB indexing, "more like this"
  • 5. Lucene • Fast, scalable search library • Lucene index structure • Index contains documents • documents have fields • indexed fields have terms
  • 6. Inverted Index • Commonly used search engine data structure • Efficient lookup of terms across a large number of documents • Usually stores positional information to enable phrase/proximity queries (figure from "Taming Text" by Grant Ingersoll and Tom Morton)
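The inverted index described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Lucene's on-disk format: the function names `build_inverted_index` and `phrase_match`, the whitespace tokenization, and the sample documents are all assumptions made for the example. It shows why stored positions are what make a phrase query possible.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: [positions]} for a dict of doc_id -> text."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Assumption: lowercase + whitespace split stands in for real analysis.
        for position, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(position)
    return index


def phrase_match(index, first, second):
    """Return sorted ids of documents where `second` directly follows `first`."""
    hits = []
    for doc_id, positions in index.get(first, {}).items():
        following = set(index.get(second, {}).get(doc_id, []))
        if any(p + 1 in following for p in positions):
            hits.append(doc_id)
    return sorted(hits)


docs = {1: "the quick brown fox", 2: "the brown quick fox", 3: "quick thinking"}
index = build_inverted_index(docs)
print(sorted(index["quick"]))                 # documents containing "quick"
print(phrase_match(index, "quick", "brown"))  # documents with the phrase "quick brown"
```

Both documents 1 and 2 contain the terms "quick" and "brown", but only the positions tell them apart for the phrase query.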
  • 7. Analysis Process
  • 8. Analyzing the analyzer • Example phrase: The quick brown fox jumps over the lazy dog.
  • 9. WhitespaceAnalyzer: simplest built-in analyzer. Input: The quick brown fox jumps over the lazy dog. Tokens: [The] [quick] [brown] [fox] [jumps] [over] [the] [lazy] [dog.]
  • 10. SimpleAnalyzer: lowercases, splits at non-letter boundaries. Input: The quick brown fox jumps over the lazy dog. Tokens: [the] [quick] [brown] [fox] [jumps] [over] [the] [lazy] [dog]
  • 11. StopAnalyzer: lowercases and removes stop words. Input: The quick brown fox jumps over the lazy dog. Tokens: [quick] [brown] [fox] [jumps] [over] [lazy] [dog]
  • 12. SnowballAnalyzer: stemming algorithm. Input: The quick brown fox jumps over the lazy dog. Tokens: [the] [quick] [brown] [fox] [jump] [over] [the] [lazi] [dog]
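The first three analyzers above can be imitated in plain Python to see where their token streams differ. This is only a rough sketch: the function names are made up for the example, the stop list is a tiny illustrative set rather than Lucene's built-in one, and the letter matching is ASCII-only. Stemming (the Snowball case) is left out because it needs a real stemmer.

```python
import re

# Assumption: a tiny illustrative stop list, not Lucene's built-in set.
STOP_WORDS = {"a", "an", "and", "the", "of", "to"}


def whitespace_analyze(text):
    """Split on whitespace only; case and punctuation are kept."""
    return text.split()


def simple_analyze(text):
    """Lowercase, then keep runs of (ASCII) letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stop_analyze(text):
    """Same as simple_analyze, with stop words dropped."""
    return [token for token in simple_analyze(text) if token not in STOP_WORDS]


phrase = "The quick brown fox jumps over the lazy dog."
for analyze in (whitespace_analyze, simple_analyze, stop_analyze):
    print(analyze.__name__, analyze(phrase))
```

Note how the whitespace version keeps "The" capitalized and "dog." with its period, which means a query for "dog" would not match that token.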
  • 13. What's in a token?
  • 14. Relevance • Term frequency (TF): number of times a term appears in a document • Inverse document frequency (IDF): one over the number of documents in the index that contain the term (1/df) • Field length normalization: controls the effect that field length, in number of terms, has on the score • Boost factors: terms, fields, or documents
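To make the four relevance factors concrete, here is a deliberately simplified score in Python. It follows the slide's definitions literally (raw TF, 1/df) and is not Lucene's actual scoring formula; the `1/sqrt(length)` normalization and the function name `toy_score` are assumptions chosen for illustration.

```python
import math


def toy_score(term, doc_terms, doc_freq, boost=1.0):
    """Toy relevance score: tf * (1/df) * length norm * boost.

    Assumption: length norm is 1/sqrt(number of terms in the field).
    """
    tf = doc_terms.count(term)
    if tf == 0 or doc_freq == 0:
        return 0.0
    idf = 1.0 / doc_freq
    length_norm = 1.0 / math.sqrt(len(doc_terms))
    return tf * idf * length_norm * boost


short_doc = ["solr", "search"]
long_doc = ["solr", "search", "at", "the"]

# Same term frequency, same df: the shorter field scores higher.
print(toy_score("solr", short_doc, doc_freq=2))
print(toy_score("solr", long_doc, doc_freq=2))
# A boost scales the score linearly.
print(toy_score("solr", long_doc, doc_freq=2, boost=2.0))
```

A rarer term (smaller `doc_freq`) or a shorter field raises the score, which matches the intuition behind IDF and length normalization.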
  • 15. Lucene Scoring (figure labels: d1, q1, Θ)
  • 16. Solr APIs • HTTP GET/POST (curl or any other HTTP client) • JSON • SolrJ (embedded or HTTP) • solr-ruby • python, PHP, solrsharp, XSLT
  • 17. Solr in Production (diagram labels: Incoming Search Requests; Load Balancer; Solr Master; Shard Request; Load Balancer; Shard Master, 1..n shards; Replicant)
  • 18. Getting Started: It's This Easy • 1. Start Solr: java -jar start.jar • 2. Index your data: java -jar post.jar *.xml • 3. Search: http://localhost:8983/solr
  • 19. Configuration • schema.xml • field types and fields • solrconfig.xml • request handler mappings • cache settings: filter, query, document • warming listeners • HTTP cache settings • Lucene index parameters • plugins: spell checking, highlighting
  • 20. Solr add/update XML <add><doc> <field name="id">MA147LL/A</field> <field name="name">Apple 60 GB iPod with Video Playback Black</field> <field name="manu">Apple Computer Inc.</field> <field name="cat">electronics</field> <field name="cat">music</field> <field name="features">iTunes, Podcasts, Audiobooks</field> <field name="features">Stores up to 15,000 songs, 25,000 photos, or 150 hours of video</field> <field name="features">2.5-inch, 320x240 color TFT LCD display with LED backlight</field> <field name="features">Up to 20 hours of battery life</field> <field name="features">Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, H.264 video</field> <field name="features">Notes, Calendar, Phone book, Hold button, Date display, Photo wallet, Built-in games, JPEG photo playback, Upgradeable firmware, USB 2.0 compatibility, Playback speed control, Rechargeable capability, Battery level indication</field> <field name="includes">earbud headphones, USB cable</field> <field name="weight">5.5</field> <field name="price">399.00</field> <field name="popularity">10</field> <field name="inStock">true</field> </doc></add>
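An add message like the one above can be generated rather than hand-written, which takes care of XML escaping. The following is a small sketch using only Python's standard library; the helper name `build_add_xml` is hypothetical, and the field names are taken from the slide. Repeating a field name yields a multi-valued field, as with `cat` above.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc_fields):
    """Build a Solr <add><doc>...</doc></add> message.

    doc_fields is a list of (name, value) pairs; a repeated name
    produces several <field> elements with that name.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in doc_fields:
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)  # special characters are escaped on output
    return ET.tostring(add, encoding="unicode")


xml_message = build_add_xml([
    ("id", "MA147LL/A"),
    ("cat", "electronics"),
    ("cat", "music"),
    ("features", "iTunes & Podcasts"),
])
print(xml_message)
```

The resulting string is what would be sent as the request body to the update handler.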
  • 21. Indexing Solr XML • Via curl: curl 'http://localhost:8983/solr/update?commit=true' --data-binary @ipod_video.xml -H 'Content-type:text/xml; charset=utf-8' • Via Solr's Java-based post tool: java -jar post.jar ipod_video.xml
  • 22. Indexing CSV curl 'http://localhost:8983/solr/update/csv?commit=true' --data-binary @books.csv -H 'Content-type:text/plain; charset=utf-8'
  • 23. Content Streams • Allows Solr server to fetch local or remote data itself. Must enable remote streaming in solrconfig.xml • http://localhost:8983/solr/update?stream.file=<local Solr path to exampledocs>/ipod_video.xml • &stream.url=<url to content> • Security warning: allows Solr to fetch arbitrary server-side file or network URL content
  • 24. Indexing Rich Documents curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true&extractOnly=true&wt=ruby&indent=on' -F "myfile=@tutorial.html"
  • 25. Indexing with SolrJ SolrServer solr = new CommonsHttpSolrServer(new URL("http://localhost:8983/solr")); SolrInputDocument doc = new SolrInputDocument(); doc.addField("id", "JAVAZONE_09"); doc.addField("title", "JavaZone 2009 SolrJ Example"); solr.add(doc); solr.commit(); // after a batch, not per document solr.optimize(); // periodically, when needed
  • 26. Indexing with Ruby solr = Connection.new( 'http://localhost:8983/solr', :autocommit => :on) solr.add(:id => 123, :title => 'Solr in Action') solr.optimize # periodically, as needed
  • 27. Data Import Handler • Indexes relational database, XML data sources, e-mail, and more • Supports full and incremental/delta indexing • Extensible with custom data sources, transformers, etc • http://wiki.apache.org/solr/DataImportHandler
  • 29. Example Search Request • http://localhost:8983/solr/select?q=query • &start=50 • &rows=25 • &fq=filter+query • &facet=on&facet.field=category
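The request on this slide is just a URL with encoded parameters, so it is easy to assemble programmatically. Here is a minimal sketch with Python's standard library; `build_select_url` and its argument names are hypothetical, while the parameter names (`q`, `start`, `rows`, `fq`, `facet`, `facet.field`) come from the slide.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, start=0, rows=10,
                     filter_query=None, facet_field=None):
    """Assemble a /select URL; urlencode handles escaping (space becomes '+')."""
    params = [("q", query), ("start", start), ("rows", rows)]
    if filter_query:
        params.append(("fq", filter_query))
    if facet_field:
        params.append(("facet", "on"))
        params.append(("facet.field", facet_field))
    return base_url + "/select?" + urlencode(params)


url = build_select_url(
    "http://localhost:8983/solr", "query",
    start=50, rows=25, filter_query="filter query", facet_field="category",
)
print(url)
```

With `start=50` and `rows=25`, the request asks for the third page of 25 results.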
  • 30. Debug Query • &debugQuery=true is your friend • Includes parsed query, explanations, and search component timings in response
  • 31. Query Parser • Controlled by defType parameter • &defType=lucene (actually a Solr extension of Lucene’s QueryParser) • &defType=dismax • Local {!..} override syntax
  • 32. Solr Query Parser • http://lucene.apache.org/java/2_4_0/queryparsersyntax.html + Solr extensions • Kitchen sink parser, includes advanced user-unfriendly syntax • Syntax errors throw parse exceptions back to client • Example: title:ipod* AND price:[0 TO 100]
  • 33. Dismax Query Parser • Simplified syntax: loose text “quote phrases” -prohibited +required • Spreads query terms across query fields (qf) with dynamic boosting per field, implicit phrase construction (pf), boosting function (bf), boosting query (bq), and minimum match (mm)
  • 34. Searching with SolrJ SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr"); SolrQuery params = new SolrQuery("author:John"); params.setFields("*,score"); params.setRows(3); QueryResponse response = server.query(params); for (SolrDocument document : response.getResults()) { System.out.println("Doc: " + document); }
  • 35. Searching with Ruby conn = Connection.new( 'http://localhost:8983/solr') conn.query('my query') do |hit| puts hit.inspect end
  • 36. delete, update, etc • Delete: • <delete><id>05991</id></delete> • <delete> <query>category:Unused</query> </delete> • java -Ddata=args -jar post.jar "<delete><query>*:*</query></delete>" • Update: simply <add> doc with same unique key • Commit: <commit/> • Optimize: <optimize/>
  • 37. Faceting • Counts per subset within results • Facet on: field terms, queries, date ranges • &facet=on&facet.field=cat&facet.query=price:[0 TO 100] • http://wiki.apache.org/solr/SimpleFacetParameters
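"Counts per subset within results" can be illustrated without Solr at all. The sketch below computes field-term counts and a range-query count over an in-memory result list; the function names and sample documents are invented for the example, and Solr itself computes facets from the index rather than by looping over documents like this.

```python
from collections import Counter


def facet_field_counts(results, field):
    """Count how many matching documents carry each value of `field`."""
    counts = Counter()
    for doc in results:
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]  # treat single-valued fields uniformly
        counts.update(values)
    return counts


def facet_range_count(results, field, low, high):
    """Count documents whose `field` lies in [low, high], like price:[0 TO 100]."""
    return sum(1 for doc in results if field in doc and low <= doc[field] <= high)


results = [
    {"id": "a", "cat": ["electronics", "music"], "price": 399.0},
    {"id": "b", "cat": ["electronics"], "price": 19.95},
    {"id": "c", "cat": "music", "price": 11.5},
]
print(facet_field_counts(results, "cat"))
print(facet_range_count(results, "price", 0, 100))
```

A multi-valued document contributes to every one of its values, so facet counts can add up to more than the number of results.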
  • 38. Spell checking • Not enabled by default, see example config to wire it in • http://localhost:8983/solr/spell?q=epod&spellcheck=on&spellcheck.build=true • File or index-based dictionaries • Supports pluggable distance algorithms: Levenshtein and JaroWinkler • http://wiki.apache.org/solr/SpellCheckComponent
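For readers unfamiliar with edit distance, here is a compact Levenshtein implementation and a naive suggester built on it. This is a sketch of the general technique only: it is not how Solr's spell checker is implemented, and `suggest` with its small word list is a hypothetical helper for the example.

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def suggest(word, dictionary):
    """Pick the dictionary entry closest to `word` (ties broken alphabetically)."""
    return min(dictionary, key=lambda candidate: (levenshtein(word, candidate), candidate))


print(levenshtein("epod", "ipod"))
print(suggest("epod", ["apple", "ipod", "video"]))
```

The misspelling "epod" from the slide's example URL is one substitution away from "ipod", which is why it makes a good test query.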
  • 39. Highlighting • http://localhost:8983/solr/select?q=ipod&hl=on&hl.fl=manu,name • http://wiki.apache.org/solr/HighlightingParameters
  • 40. More Like This • http://localhost:8983/solr/select?q=ipod&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mintf=1&fl=id,score,name • http://wiki.apache.org/solr/MoreLikeThis
  • 41. Scaling: Query Throughput • Replication • slaves poll master for index updates • transfers index files from master to slave • configuration files can also be transferred • entirely Java/HTTP-based in Solr 1.4 (prior versions used rsync)
  • 42. Scaling: Collection Size • Distribution • Index documents across shards • query a single server with the shards parameter • it sends requests to each shard • and aggregates the results into a single response
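The aggregation step can be pictured as merging several already-sorted result lists and keeping the top rows. The sketch below shows that idea with Python's standard library; it is not Solr's implementation, `merge_shard_results` is a hypothetical name, and it assumes scores from different shards are directly comparable.

```python
import heapq
from itertools import islice


def merge_shard_results(shard_results, rows):
    """Merge per-shard hit lists into one top-`rows` list.

    Each shard list holds (score, doc_id) tuples sorted by descending score.
    Assumption: scores are comparable across shards.
    """
    merged = heapq.merge(*shard_results, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, rows))


shard_a = [(0.9, "a1"), (0.4, "a2")]
shard_b = [(0.7, "b1"), (0.6, "b2"), (0.1, "b3")]
print(merge_shard_results([shard_a, shard_b], rows=3))
```

Because each shard only needs to return its own top hits, the coordinating server merges small lists instead of the whole collection.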
  • 43. Solr-powered UI • Solritas (from "celeritas"): VelocityResponseWriter • easily templated output • SolrJS: jQuery-based widgets • see http://solrjs.solrstuff.org/ • Blacklight and Flare: RoR plugins
  • 44. Lucene in Action, 2nd Edition http://www.manning.com/lucene
  • 47. Unique Combination of Enterprise Search and Lucene Expertise
  • 49. Thank you http://www.lucidimagination.com