Solr
   Search at the Speed of Light


          JavaZone 2009
           September 10
               Oslo
  Erik Hatcher, Lucid Imagination
erik.hatcher@lucidimagination.com




Solr History

     • Created by Yonik Seeley for CNET
     • Contributed to Apache in January 2006
     • December 2006: Version 1.1 released
     • June 2007: Version 1.2 released
     • September 2008: Version 1.3 released
     • ~September 2009: Version 1.4
http://lucene.apache.org/solr
    © 2008-2009 Lucid Imagination, Inc.
Solr: Big Picture
[Diagram: Data (a DB and Documents) feeds Solr, which returns Search Results]
Features

 • Lucene power exposed over HTTP
 • Scalability: caching, replication, distributed search
 • Faceting
 • And more: spell checking, highlighting, clustering, rich document and DB indexing, "more like this"
Lucene

 • Fast, scalable search library
 • Lucene index structure
   • Index contains documents
     • documents have fields
       • indexed fields have terms
Inverted Index

 • Commonly used search engine data structure
 • Efficient lookup of terms across a large number of documents
 • Usually stores positional information to enable phrase/proximity queries

 [Figure: inverted index illustration, from "Taming Text" by Grant Ingersoll and Tom Morton]
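A minimal sketch of a positional inverted index, to make the bullets above concrete. This is not Lucene's on-disk format; the function names `build_index` and `phrase_search` and the sample documents are hypothetical, and tokenization is plain lowercase whitespace splitting.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to {doc_id: [positions]} (a positional inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(position)
    return index


def phrase_search(index, phrase):
    """Return sorted ids of docs where the phrase terms occur at consecutive positions."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return []
    # Only documents containing every term can match the phrase.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    hits = []
    for doc_id in candidates:
        for start in index[terms[0]][doc_id]:
            # The i-th phrase term must sit exactly i positions after the first.
            if all(start + i in index[t][doc_id] for i, t in enumerate(terms)):
                hits.append(doc_id)
                break
    return sorted(hits)


docs = {
    1: "the quick brown fox",
    2: "the brown quick fox",
    3: "a quick brown dog",
}
index = build_index(docs)
print(sorted(index["quick"]))               # ids of docs containing "quick"
print(phrase_search(index, "quick brown"))  # ids where the two terms are adjacent
```

Term lookup is a single dictionary access regardless of how many documents exist, and the stored positions are what make the phrase check possible.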
Analysis Process




Analyzing the analyzer
                    Example phrase

      The quick brown fox jumps over the lazy dog.




WhitespaceAnalyzer
                Simplest built-in analyzer
      The quick brown fox jumps over the lazy dog.




  [The] [quick] [brown] [fox] [jumps] [over] [the]
                    [lazy] [dog.]

SimpleAnalyzer
          Lowercases, splits at non-letter boundaries
      the quick brown fox jumps over the lazy dog.




  [the] [quick] [brown] [fox] [jumps] [over] [the]
                    [lazy] [dog]

StopAnalyzer
              Lowercases and removes stop words


      The quick brown fox jumps over the lazy dog.




 [quick] [brown] [fox] [jumps] [over] [lazy] [dog]




SnowballAnalyzer
                   Stemming algorithm
      The quick brown fox jumps over the lazi dog.




   [the] [quick] [brown] [fox] [jump] [over] [the]
                     [lazi] [dog]

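The token streams on the analyzer slides can be approximated in a few lines. This is a rough imitation, not Lucene's analyzer code: the function names are hypothetical, the stop list is a small illustrative subset, and the regular expression only handles ASCII letters. Stemming (SnowballAnalyzer) is left out.

```python
import re

# Hypothetical stop list for illustration; real stop lists are longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in"}


def whitespace_tokens(text):
    """Split on whitespace only, keeping case and punctuation."""
    return text.split()


def simple_tokens(text):
    """Lowercase, then split at non-letter boundaries (ASCII letters only here)."""
    return re.findall(r"[a-z]+", text.lower())


def stop_tokens(text):
    """Like simple_tokens, but drop stop words."""
    return [t for t in simple_tokens(text) if t not in STOP_WORDS]


phrase = "The quick brown fox jumps over the lazy dog."
print(whitespace_tokens(phrase))
print(simple_tokens(phrase))
print(stop_tokens(phrase))
```

The same analysis has to be applied at index time and at query time, otherwise a query term such as `Dog.` will never match the indexed term `dog`.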
What's in a token?




Relevance

 •    Term frequency (TF): number of times a term appears in a document

 •    Inverse document frequency (IDF): one over the number of documents in the index that contain the term (1/df)

 •    Field length normalization: controls the effect that field length, in number of terms, has on the score

 •    Boost factors: terms, fields, or documents
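A simplified sketch of how the four factors above could combine into a per-term score. The specific damping formulas (square-root TF, logarithmic IDF, inverse square-root length norm) are assumptions modeled on the general shape of classic TF-IDF scoring, not a copy of Lucene's scoring code; all function names are hypothetical.

```python
import math


def tf(freq):
    """Term frequency, damped so 4 occurrences count twice as much as 1."""
    return math.sqrt(freq)


def idf(doc_freq, num_docs):
    """Inverse document frequency: rarer terms get a larger weight."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def length_norm(num_terms):
    """Shorter fields get a larger weight for the same match."""
    return 1.0 / math.sqrt(num_terms)


def term_score(freq, doc_freq, num_docs, num_terms, boost=1.0):
    """Product of the factors from the slide, for one term in one field."""
    return tf(freq) * idf(doc_freq, num_docs) * length_norm(num_terms) * boost


rare = term_score(freq=1, doc_freq=2, num_docs=1000, num_terms=10)
common = term_score(freq=1, doc_freq=900, num_docs=1000, num_terms=10)
print(rare > common)  # a rare term outweighs a common one
```

The logarithm keeps very rare terms from dominating the score the way a raw 1/df would.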
Lucene Scoring
[Diagram: document vector d1 and query vector q1, separated by angle Θ]
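The angle between a document vector and a query vector is usually expressed through its cosine. A minimal sketch, assuming sparse vectors stored as term-to-weight dictionaries; the `cosine` function and the sample weights are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_a * norm_b)


d1 = {"quick": 1.0, "brown": 1.0, "fox": 2.0}
q1 = {"brown": 1.0, "fox": 1.0}
print(cosine(d1, q1))  # closer to 1.0 means a smaller angle
```

A smaller angle (cosine near 1) means the document points in nearly the same direction as the query, independent of document length.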
Solr APIs

 • HTTP GET/POST (curl or any other HTTP client)
 • JSON
 • SolrJ (embedded or HTTP)
 • solr-ruby
 • python, PHP, solrsharp, XSLT
Solr in Production
[Diagram: Incoming Search Requests → Load Balancer → Solr Masters → Shard Requests → one Load Balancer per shard → Shard Master and Replicants, for 1..n shards]
Getting Started:
                 It's This Easy
1. Start Solr

  java -jar start.jar
2. Index your data

  java -jar post.jar *.xml
3. Search

  http://localhost:8983/solr
Configuration
 •    schema.xml

     •    field types and fields

 •    solrconfig.xml

     •    request handler mappings

     •    cache settings: filter, query, document

     •    warming listeners

     •    HTTP cache settings

     •    Lucene index parameters

     •    plugins: spell checking, highlighting
Solr add/update XML
<add><doc>
  <field name="id">MA147LL/A</field>
  <field name="name">Apple 60 GB iPod with Video Playback Black</field>
  <field name="manu">Apple Computer Inc.</field>
  <field name="cat">electronics</field>
  <field name="cat">music</field>
  <field name="features">iTunes, Podcasts, Audiobooks</field>
  <field name="features">Stores up to 15,000 songs, 25,000 photos, or 150 hours of
               video</field>
  <field name="features">2.5-inch, 320x240 color TFT LCD display
                         with LED backlight</field>
  <field name="features">Up to 20 hours of battery life</field>
  <field name="features">Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless,
                         H.264 video</field>
  <field name="features">Notes, Calendar, Phone book, Hold button, Date display,
      Photo wallet, Built-in games, JPEG photo playback, Upgradeable firmware,
      USB 2.0 compatibility, Playback speed control, Rechargeable capability,
      Battery level indication</field>
  <field name="includes">earbud headphones, USB cable</field>
  <field name="weight">5.5</field>
  <field name="price">399.00</field>
  <field name="popularity">10</field>
  <field name="inStock">true</field>
</doc></add>


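An add message like the one above can be generated rather than written by hand. A minimal sketch using Python's standard library; `build_add_xml` is a hypothetical helper, and multi-valued fields are passed as lists so that each value becomes its own `<field>` element.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc):
    """Serialize one document (field name -> value or list of values) as <add><doc>...</doc></add>."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(v)  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


xml_message = build_add_xml({
    "id": "MA147LL/A",
    "cat": ["electronics", "music"],
})
print(xml_message)
```

Letting an XML library do the serialization avoids malformed messages when field values contain characters such as `&` or `<`.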
Indexing Solr XML
 • Via curl:
      curl 'http://localhost:8983/solr/update?commit=true' --data-binary @ipod_video.xml -H 'Content-type:text/xml; charset=utf-8'

 • Via Solr's Java-based post tool:
      java -jar post.jar ipod_video.xml
Indexing CSV


curl 'http://localhost:8983/solr/update/csv?commit=true' --data-binary @books.csv -H 'Content-type:text/plain; charset=utf-8'
Content Streams

 •    Allows the Solr server to fetch local or remote data itself. Remote streaming must be enabled in solrconfig.xml

 •    http://localhost:8983/solr/update?stream.file=<local Solr path to exampledocs>/ipod_video.xml

 •    &stream.url=<url to content>

 •    Security warning: allows Solr to fetch arbitrary server-side file or network URL content
Indexing Rich Documents


curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true&extractOnly=true&wt=ruby&indent=on' -F "myfile=@tutorial.html"
Indexing with SolrJ

SolrServer solr =
    new CommonsHttpSolrServer(new URL("http://localhost:8983/solr"));

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "JAVAZONE_09");
doc.addField("title", "JavaZone 2009 SolrJ Example");
solr.add(doc);
solr.commit();     // after a batch, not per document
solr.optimize();   // periodically, when needed




Indexing with Ruby

require 'solr'   # solr-ruby gem

solr = Solr::Connection.new(
  'http://localhost:8983/solr',
  :autocommit => :on)

solr.add(:id => 123,
         :title => 'Solr in Action')

solr.optimize       # periodically, as needed
Data Import Handler


• Indexes relational databases, XML data sources, e-mail, and more
• Supports full and incremental/delta indexing
• Extensible with custom data sources, transformers, etc.
• http://wiki.apache.org/solr/DataImportHandler
DB Indexing



http://localhost:8983/solr/db/dataimport?command=full-import
Example Search Request

 • http://localhost:8983/solr/select?q=query
   • &start=50
   • &rows=25
   • &fq=filter+query
   • &facet=on&facet.field=category
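Request URLs like the one above are easier to get right when the parameters are encoded programmatically. A minimal sketch with the standard library; `select_url` is a hypothetical helper, and the query, filter, and facet field values are illustrative.

```python
from urllib.parse import urlencode


def select_url(base, query, start=0, rows=10, filters=(), facet_fields=()):
    """Build a /select URL with paging, filter queries, and field facets."""
    params = [("q", query), ("start", start), ("rows", rows)]
    params += [("fq", fq) for fq in filters]  # fq may repeat
    if facet_fields:
        params.append(("facet", "on"))
        params += [("facet.field", f) for f in facet_fields]
    return base + "/select?" + urlencode(params)


url = select_url(
    "http://localhost:8983/solr",
    "ipod",
    start=50,   # skip the first 50 hits
    rows=25,    # return 25 hits, i.e. the third page of 25
    filters=["inStock:true"],
    facet_fields=["cat"],
)
print(url)
```

Using a list of pairs instead of a dictionary allows repeated parameters such as several `fq` or `facet.field` entries.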
Debug Query


 • &debugQuery=true is your friend
 • Includes parsed query, explanations, and search component timings in response
Query Parser

 • Controlled by defType parameter
   • &defType=lucene (actually a Solr extension of Lucene's QueryParser)
   • &defType=dismax
 • Local {!..} override syntax
Solr Query Parser

 • http://lucene.apache.org/java/2_4_0/queryparsersyntax.html + Solr extensions
 • Kitchen sink parser, includes advanced user-unfriendly syntax
 • Syntax errors throw parse exceptions back to client
 • Example: title:ipod* AND price:[0 TO 100]
Dismax Query Parser

 • Simplified syntax: loose text "quote phrases" -prohibited +required
 • Spreads query terms across query fields (qf) with dynamic boosting per field, implicit phrase construction (pf), boosting function (bf), boosting query (bq), and minimum match (mm)
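A minimal sketch of assembling the dismax parameters named above. The helper `dismax_params`, the field names, and the boost values are hypothetical; only the parameter names (defType, qf, pf, mm) come from the slide. Fields and boosts in qf are written as `field^boost`, separated by spaces.

```python
from urllib.parse import urlencode


def dismax_params(user_query, field_boosts, phrase_fields=(), min_match=None):
    """Return request parameters for a dismax query over several boosted fields."""
    params = {
        "defType": "dismax",
        "q": user_query,  # raw user input, no field syntax needed
        "qf": " ".join(f"{field}^{boost}" for field, boost in field_boosts.items()),
    }
    if phrase_fields:
        params["pf"] = " ".join(phrase_fields)
    if min_match is not None:
        params["mm"] = min_match
    return params


params = dismax_params(
    "ipod video",
    {"name": 2, "features": 1},  # matches in name count double
    phrase_fields=["name"],
    min_match="2",
)
query_string = urlencode(params)
print(query_string)
```

Because the user's text goes into `q` untouched, a stray quote or colon typed by the user does not turn into a parse exception the way it can with the full Lucene syntax.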
Searching with SolrJ


SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
SolrQuery params = new SolrQuery("author:John");
params.setFields("*,score");
params.setRows(3);
QueryResponse response = server.query(params);
for (SolrDocument document : response.getResults()) {
      System.out.println("Doc: " + document);
}
Searching with Ruby


require 'solr'   # solr-ruby gem

conn = Solr::Connection.new(
    'http://localhost:8983/solr')

conn.query('my query') do |hit|
  puts hit.inspect
end
delete, update, etc
 •    Delete:
     •    <delete><id>05991</id></delete>
     •    <delete>
             <query>category:Unused</query>
          </delete>

     •    java -Ddata=args -jar post.jar
          "<delete><query>*:*</query></delete>"

 •    Update: simply <add> doc with same unique key

 •    Commit: <commit/>

 •    Optimize: <optimize/>
Faceting


• Counts per subset within results
• Facet on: field terms, queries, date ranges
• &facet=on
    &facet.field=cat
    &facet.query=price:[0 TO 100]
• http://wiki.apache.org/solr/SimpleFacetParameters
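What field and query facets compute can be shown on a small in-memory result set. This sketch only illustrates the counting; Solr computes facets inside the index, not by looping over documents like this. The helper names and sample documents are hypothetical.

```python
from collections import Counter


def facet_field(docs, field):
    """Count how many matching docs carry each value of a (possibly multi-valued) field."""
    counts = Counter()
    for doc in docs:
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]
        counts.update(values)
    return counts


def facet_query(docs, predicate):
    """Count the matching docs that also satisfy an arbitrary condition."""
    return sum(1 for doc in docs if predicate(doc))


results = [
    {"id": "1", "cat": ["electronics", "music"], "price": 399.00},
    {"id": "2", "cat": ["electronics"], "price": 19.95},
    {"id": "3", "cat": ["music"], "price": 74.99},
]
cat_counts = facet_field(results, "cat")                         # like facet.field=cat
cheap = facet_query(results, lambda d: 0 <= d["price"] <= 100)   # like facet.query=price:[0 TO 100]
print(dict(cat_counts), cheap)
```

The counts are always relative to the current result set, which is what lets a UI show how many hits remain behind each refinement link.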
Spell checking


•    Not enabled by default; see the example config to wire it in

•    http://localhost:8983/solr/spell?q=epod&spellcheck=on&spellcheck.build=true

•    File or index-based dictionaries

•    Supports pluggable distance algorithms: Levenshtein and Jaro-Winkler

•    http://wiki.apache.org/solr/SpellCheckComponent
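A sketch of the edit-distance idea behind suggestions such as "ipod" for "epod". This is a textbook Levenshtein implementation, not Solr's spell checker; the `suggest` helper, the tiny dictionary, and the distance cutoff are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def suggest(word, dictionary, max_distance=2):
    """Return dictionary words within max_distance edits, closest first."""
    scored = sorted((levenshtein(word, candidate), candidate) for candidate in dictionary)
    return [candidate for distance, candidate in scored if distance <= max_distance]


print(levenshtein("epod", "ipod"))
print(suggest("epod", ["ipod", "video", "apple"]))
```

A real spell checker would also weigh candidates by how frequent they are in the index, so that common terms win over obscure ones at the same distance.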
Highlighting


 • http://localhost:8983/solr/select?q=ipod&hl=on&hl.fl=manu,name
 • http://wiki.apache.org/solr/HighlightingParameters
More Like This


 • http://localhost:8983/solr/select?q=ipod&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mintf=1&fl=id,score,name
 • http://wiki.apache.org/solr/MoreLikeThis
Scaling: Query Throughput

 • Replication
   • slaves poll master for index updates
   • transfers index files from master to slave
   • configuration files can also be transferred
   • entirely Java/HTTP-based in Solr 1.4 (prior versions used rsync)
Scaling: Collection Size

 • Distribution
   • Index documents across shards
   • query single server with shards parameter
     • sends requests to each shard
     • aggregates results into a single response
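The aggregation step can be pictured as merging per-shard hit lists that are already sorted by score. A minimal sketch, assuming each shard returns (score, id) pairs in descending score order; `merge_shard_results` and the sample hits are hypothetical, and real distributed search also has to merge facets, highlighting, and other response sections.

```python
import heapq
from itertools import islice


def merge_shard_results(shard_results, rows):
    """Merge per-shard hit lists (each sorted by descending score) and keep the top rows."""
    merged = heapq.merge(*shard_results, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, rows))


shard_a = [(0.9, "a1"), (0.4, "a2")]
shard_b = [(0.7, "b1"), (0.6, "b2")]
top = merge_shard_results([shard_a, shard_b], rows=3)
print(top)
```

Because every shard list is already sorted, the merge only needs to look at the head of each list, so the coordinating server never has to re-sort the full result set.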
Solr-powered UI

 • Solritas (from "celeritas"): VelocityResponseWriter
   • easily templated output
 • SolrJS: jQuery-based widgets
   • see http://solrjs.solrstuff.org/
 • Blacklight and Flare: RoR plugins
Lucene in Action, 2nd Edition




              http://www.manning.com/lucene
Search at Lucid
http://search.lucidimagination.com/?q=javazone




 • Lucid Imagination is the commercial entity of Apache Lucene/Solr ecosystem
 • Lucid team includes the largest collection of Lucene/Solr committers, contributors and influencers
 • Our mission is to serve as the center of excellence for Lucene/Solr-based search solutions
   • Help our customers get the most out of Lucene/Solr - most widely used open source search software
Lucid Imagination Technical Team

Unique Combination of Enterprise Search and Lucene Expertise

 • Yonik Seeley: Creator of Solr; Lucene/Solr committer, PMC member
 • Grant Ingersoll: Creator of "Lucene Boot Camp"; Lucene/Solr committer, Chair, PMC
 • Erik Hatcher: Co-author of "Lucene in Action" book; Lucene/Solr committer, PMC member
 • Mark Miller: Lucene/Solr committer, PMC member
 • Sami Siren: Nutch/Tika committer, PMC member
 • Andrzej Bialecki: Lucene/Nutch/Hadoop committer, PMC member
 • Doug Cutting (Advisor): Creator of Lucene, Nutch & Hadoop
 • Marc Krellenstein: Co-founder, CTO/SVP, Northern Light; VP Search, CTO, Elsevier
 • Brian Pinkerton: Developed WebCrawler, the Web's first comprehensive search engine; Principal Architect at A9
 • Simon Rosenthal: Solutions architect, Northern Light
 • Jay Hill: Solutions Architect, Wells Fargo
 • Ryan McKinley: Lucene/Solr committer, PMC member
 • Chris Hostetter (Advisor): Lucene/Solr committer, PMC member; Member Apache Software Foundation
Lucid Imagination Business Model

[Diagram labels: Free Download (Lucene, Apache.org); Certified Distributions & Free; Value-add; Upgrade; Enterprise Version; Commercial Support, Training, Consulting]
Thank you




              http://www.lucidimagination.com
Analyze this! tips and tricks on getting the lucene solr analyzer to index an...Analyze this! tips and tricks on getting the lucene solr analyzer to index an...
Analyze this! tips and tricks on getting the lucene solr analyzer to index an...
Ā 
Building Scale Free Applications with Hadoop and Cascading
Building Scale Free Applications with Hadoop and CascadingBuilding Scale Free Applications with Hadoop and Cascading
Building Scale Free Applications with Hadoop and Cascading
Ā 

Solr: Search at the Speed of Light

  • 1. Solr: Search at the Speed of Light
      JavaZone 2009, September 10, Oslo
      Erik Hatcher, Lucid Imagination
      erik.hatcher@lucidimagination.com
  • 2. Solr History
      • Created by Yonik Seeley for CNET
      • Contributed to Apache in January 2006
      • December 2006: Version 1.1 released
      • June 2007: Version 1.2 released
      • September 2008: Version 1.3 released
      • ~September 2009: Version 1.4
      http://lucene.apache.org/solr
      © 2008-2009 Lucid Imagination, Inc.
  • 3. Solr: Big Picture
      [Diagram labels: Data; DB; Documents; Solr; Search Results]
  • 4. Features
      • Lucene power exposed over HTTP
      • Scalability: caching, replication, distributed search
      • Faceting
      • And more: spell checking, highlighting, clustering, rich document and DB indexing, "more like this"
  • 5. Lucene
      • Fast, scalable search library
      • Lucene index structure
          • Index contains documents
          • Documents have fields
          • Indexed fields have terms
  • 6. Inverted Index
      • Commonly used search engine data structure
      • Efficient lookup of terms across a large number of documents
      • Usually stores positional information to enable phrase/proximity queries
      [Figure from "Taming Text" by Grant Ingersoll and Tom Morton]
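The inverted index described on slide 6 can be sketched in a few lines of plain Java. This is only a toy illustration, not Lucene's actual index format: the class name `InvertedIndexSketch` and its methods are hypothetical, tokenization is a naive lowercase whitespace split, and the phrase lookup handles just two adjacent terms. It shows why stored positions make phrase queries possible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy inverted index: term -> (document id -> positions of the term in that document).
// Illustration only; Lucene's real index structure is far more compact and elaborate.
final class InvertedIndexSketch {
    private final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();

    // Lowercase, split on whitespace, and record the position of every term.
    void add(int docId, String text) {
        String[] tokens = text.toLowerCase().trim().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    // Ids of documents in which `first` is immediately followed by `second`.
    List<Integer> phrase(String first, String second) {
        List<Integer> hits = new ArrayList<>();
        Map<Integer, List<Integer>> a = postings.getOrDefault(first, Map.of());
        Map<Integer, List<Integer>> b = postings.getOrDefault(second, Map.of());
        for (Map.Entry<Integer, List<Integer>> entry : a.entrySet()) {
            List<Integer> nextPositions = b.get(entry.getKey());
            if (nextPositions == null) {
                continue; // the second term never occurs in this document
            }
            for (int pos : entry.getValue()) {
                if (nextPositions.contains(pos + 1)) {
                    hits.add(entry.getKey());
                    break;
                }
            }
        }
        return hits;
    }
}
```

A term lookup is a single map access regardless of how many documents were added, and the phrase check only touches the postings of the two terms involved.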
  • 7. Analysis Process
  • 8. Analyzing the analyzer
      Example phrase: The quick brown fox jumps over the lazy dog.
  • 9. WhitespaceAnalyzer
      Simplest built-in analyzer
      The quick brown fox jumps over the lazy dog.
      [The] [quick] [brown] [fox] [jumps] [over] [the] [lazy] [dog.]
  • 10. SimpleAnalyzer
      Lowercases, splits at non-letter boundaries
      The quick brown fox jumps over the lazy dog.
      [the] [quick] [brown] [fox] [jumps] [over] [the] [lazy] [dog]
  • 11. StopAnalyzer
      Lowercases and removes stop words
      The quick brown fox jumps over the lazy dog.
      [quick] [brown] [fox] [jumps] [over] [lazy] [dog]
  • 12. SnowballAnalyzer
      Stemming algorithm
      The quick brown fox jumps over the lazy dog.
      [the] [quick] [brown] [fox] [jump] [over] [the] [lazi] [dog]
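To make the differences between the first three analyzers concrete, here is a minimal sketch in plain Java that mimics their behavior on the example phrase. It deliberately does not use the Lucene API: `AnalyzerSketch`, its method names, and the shortened stop-word list are assumptions made for illustration, and stemming (the SnowballAnalyzer step) is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Plain-Java imitation of three analysis strategies; not the Lucene classes themselves.
final class AnalyzerSketch {
    // Assumed, shortened stop list for illustration only.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "of", "the", "to");

    // Like WhitespaceAnalyzer: split on whitespace, keep case and punctuation.
    static List<String> whitespace(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Like SimpleAnalyzer: lowercase, then split wherever a non-letter occurs.
    static List<String> simple(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase().split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Like StopAnalyzer: the simple tokens minus stop words.
    static List<String> stop(String text) {
        List<String> tokens = simple(text);
        tokens.removeIf(STOP_WORDS::contains);
        return tokens;
    }
}
```

Calling `AnalyzerSketch.stop("The quick brown fox jumps over the lazy dog.")` yields the seven tokens shown on slide 11, while `whitespace` keeps both the capital "The" and the trailing period in "dog.".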
  • 13. What's in a token?
  • 14. Relevance
      • Term frequency (TF): number of times a term appears in a document
      • Inverse document frequency (IDF): one over the number of documents in the index that contain the term (1/df)
      • Field length normalization: controls the effect that field length, in number of terms, has on the score
      • Boost factors: terms, fields, or documents
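The four relevance factors can be combined into a single per-term score. The sketch below is a simplification, not Lucene's full scoring formula: the class `RelevanceSketch` is hypothetical, and the damping choices (square root for TF, logarithm for IDF, inverse square root for length) are assumptions modeled on the classic Lucene similarity rather than the plain 1/df from the slide. Query-level factors are omitted.

```java
// Simplified per-term relevance score built from the factors on the slide.
// Hypothetical helper for illustration; it does not reproduce Lucene's exact formula.
final class RelevanceSketch {
    // Term frequency, damped so that 100 occurrences are not 100 times better than one.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Inverse document frequency: terms contained in few documents score higher.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Field length normalization: a match in a short field counts for more.
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Product of all factors, including a caller-supplied boost.
    static double termScore(int freq, int docFreq, int numDocs, int numTerms, double boost) {
        return tf(freq) * idf(docFreq, numDocs) * lengthNorm(numTerms) * boost;
    }
}
```

With these definitions, a term found in 1 of 1,000 documents contributes more than one found in 500 of them, and the same match scores higher in a 4-term title than in a 100-term body.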
  • 15. Lucene Scoring
      [Diagram: document d1 and query q1 drawn as vectors separated by angle Θ]
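Assuming the diagram depicts the usual vector space model, in which the query and each document are vectors of term weights, the angle Θ between them can be written as a cosine similarity. The vector notation is an assumption; only the labels d1, q1 and Θ come from the slide.

```latex
\cos\Theta
  = \frac{\vec{q}_1 \cdot \vec{d}_1}
         {\lVert \vec{q}_1 \rVert \, \lVert \vec{d}_1 \rVert}
  = \frac{\sum_{t} w_{t,q_1}\, w_{t,d_1}}
         {\sqrt{\sum_{t} w_{t,q_1}^{2}}\; \sqrt{\sum_{t} w_{t,d_1}^{2}}}
```

Here $w_{t,q_1}$ and $w_{t,d_1}$ stand for the weight of term $t$ in the query and in the document. A smaller angle means a larger cosine and therefore a better match.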
  • 16. Solr APIs
      • HTTP GET/POST (curl or any other HTTP client)
      • JSON
      • SolrJ (embedded or HTTP)
      • solr-ruby
      • python, PHP, solrsharp, XSLT
  • 17. Solr in Production
      [Diagram labels: Incoming Search Requests; Load Balancer; Solr Master; Shard Request; Load Balancer; Shard Master 1..n; Replicant shards]
  • 18. Getting Started: It's This Easy
      1. Start Solr: java -jar start.jar
      2. Index your data: java -jar post.jar *.xml
      3. Search: http://localhost:8983/solr
  • 19. Configuration
      • schema.xml
          • field types and fields
      • solrconfig.xml
          • request handler mappings
          • cache settings: filter, query, document
          • warming listeners
          • HTTP cache settings
          • Lucene index parameters
          • plugins: spell checking, highlighting
  • 20. Solr add/update XML
      <add><doc>
        <field name="id">MA147LL/A</field>
        <field name="name">Apple 60 GB iPod with Video Playback Black</field>
        <field name="manu">Apple Computer Inc.</field>
        <field name="cat">electronics</field>
        <field name="cat">music</field>
        <field name="features">iTunes, Podcasts, Audiobooks</field>
        <field name="features">Stores up to 15,000 songs, 25,000 photos, or 150 hours of video</field>
        <field name="features">2.5-inch, 320x240 color TFT LCD display with LED backlight</field>
        <field name="features">Up to 20 hours of battery life</field>
        <field name="features">Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, H.264 video</field>
        <field name="features">Notes, Calendar, Phone book, Hold button, Date display, Photo wallet, Built-in games, JPEG photo playback, Upgradeable firmware, USB 2.0 compatibility, Playback speed control, Rechargeable capability, Battery level indication</field>
        <field name="includes">earbud headphones, USB cable</field>
        <field name="weight">5.5</field>
        <field name="price">399.00</field>
        <field name="popularity">10</field>
        <field name="inStock">true</field>
      </doc></add>
  • 21. Indexing Solr XML
      • Via curl:
          curl 'http://localhost:8983/solr/update?commit=true' --data-binary @ipod_video.xml -H 'Content-type:text/xml; charset=utf-8'
      • Via Solr's Java-based post tool:
          java -jar post.jar ipod_video.xml
  • 22. Indexing CSV
      curl 'http://localhost:8983/solr/update/csv?commit=true' --data-binary @books.csv -H 'Content-type:text/plain; charset=utf-8'
  • 23. Content Streams
      • Allows the Solr server to fetch local or remote data itself. Remote streaming must be enabled in solrconfig.xml
      • http://localhost:8983/solr/update?stream.file=<local Solr path to exampledocs>/ipod_video.xml
      • &stream.url=<url to content>
      • Security warning: allows Solr to fetch arbitrary server-side file or network URL content
  • 24. Indexing Rich Documents
      curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true&extractOnly=true&wt=ruby&indent=on' -F "myfile=@tutorial.html"
  • 25. Indexing with SolrJ
      SolrServer solr = new CommonsHttpSolrServer(new URL("http://localhost:8983/solr"));
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "JAVAZONE_09");
      doc.addField("title", "JavaZone 2009 SolrJ Example");
      solr.add(doc);
      solr.commit();   // after a batch, not per document
      solr.optimize(); // periodically, when needed
  • 26. Indexing with Ruby
      solr = Connection.new('http://localhost:8983/solr', :autocommit => :on)
      solr.add(:id => 123, :title => 'Solr in Action')
      solr.optimize # periodically, as needed
  • 27. Data Import Handler
      • Indexes relational database, XML data sources, e-mail, and more
      • Supports full and incremental/delta indexing
      • Extensible with custom data sources, transformers, etc.
      • http://wiki.apache.org/solr/DataImportHandler
  • 29. Example Search Request
      • http://localhost:8983/solr/select?q=query
      • &start=50
      • &rows=25
      • &fq=filter+query
      • &facet=on&facet.field=category
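The request above is an ordinary URL with encoded query parameters, so any HTTP client can assemble it. The following sketch builds such a URL with only the Java standard library. `SelectUrlSketch` and its `build` method are hypothetical names, not part of SolrJ, and the code assumes a JDK recent enough to offer `URLEncoder.encode(String, Charset)` (Java 10 or later).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Builds a Solr /select URL from a parameter map. Hypothetical helper, standard library only.
final class SelectUrlSketch {
    static String build(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> param : params.entrySet()) {
            // Form-encode both sides: spaces become '+', ':' and brackets are percent-escaped.
            query.add(URLEncoder.encode(param.getKey(), StandardCharsets.UTF_8)
                    + "="
                    + URLEncoder.encode(param.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "/select?" + query;
    }
}
```

Passing an insertion-ordered map (for example a `LinkedHashMap`) with `q`, `start`, `rows`, `fq`, `facet` and `facet.field` reproduces the request from the slide. A filter such as `price:[0 TO 100]` comes out percent-encoded, which is what an HTTP client should send.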
  • 30. Debug Query
      • &debugQuery=true is your friend
      • Includes parsed query, explanations, and search component timings in response
  • 31. Query Parser
      • Controlled by defType parameter
      • &defType=lucene (actually a Solr extension of Lucene's QueryParser)
      • &defType=dismax
      • Local {!..} override syntax
  • 32. Solr Query Parser
      • http://lucene.apache.org/java/2_4_0/queryparsersyntax.html + Solr extensions
      • Kitchen sink parser, includes advanced user-unfriendly syntax
      • Syntax errors throw parse exceptions back to client
      • Example: title:ipod* AND price:[0 TO 100]
  • 33. Dismax Query Parser
      • Simplified syntax: loose text "quote phrases" -prohibited +required
      • Spreads query terms across query fields (qf) with dynamic boosting per field, implicit phrase construction (pf), boosting function (bf), boosting query (bq), and minimum match (mm)
  • 34. Searching with SolrJ
      SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
      SolrQuery params = new SolrQuery("author:John");
      params.setFields("*,score");
      params.setRows(3);
      QueryResponse response = server.query(params);
      for (SolrDocument document : response.getResults()) {
        System.out.println("Doc: " + document);
      }
  • 35. Searching with Ruby
      conn = Connection.new('http://localhost:8983/solr')
      conn.query('my query') do |hit|
        puts hit.inspect
      end
  • 36. delete, update, etc
      • Delete:
          • <delete><id>05991</id></delete>
          • <delete><query>category:Unused</query></delete>
          • java -Ddata=args -jar post.jar "<delete><query>*:*</query></delete>"
      • Update: simply <add> a doc with the same unique key
      • Commit: <commit/>
      • Optimize: <optimize/>
  • 37. Faceting
      • Counts per subset within results
      • Facet on: field terms, queries, date ranges
      • &facet=on&facet.field=cat&facet.query=price:[0 TO 100]
      • http://wiki.apache.org/solr/SimpleFacetParameters
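Conceptually, a field facet is a count of matching documents per field value. The sketch below computes such counts in memory for a multi-valued field like `cat`. It illustrates the idea only and says nothing about how Solr computes facets internally; the class `FacetSketch` and its input shape (one list of field values per matching document) are assumptions.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration of field faceting: value -> number of matching documents.
final class FacetSketch {
    static Map<String, Integer> fieldCounts(List<List<String>> fieldValuesPerDoc) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> values : fieldValuesPerDoc) {
            // De-duplicate so a document counts once per value, even if the value repeats.
            for (String value : new HashSet<>(values)) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

For three matching documents with `cat` values `[electronics, music]`, `[electronics]` and `[music, music]`, the result is `electronics=2` and `music=2`.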
  • 38. Spell checking
      • Not enabled by default; see the example config to wire it in
      • http://localhost:8983/solr/spell?q=epod&spellcheck=on&spellcheck.build=true
      • File or index-based dictionaries
      • Supports pluggable distance algorithms: Levenshtein and JaroWinkler
      • http://wiki.apache.org/solr/SpellCheckComponent
  • 39. Highlighting
      • http://localhost:8983/solr/select?q=ipod&hl=on&hl.fl=manu,name
      • http://wiki.apache.org/solr/HighlightingParameters
  • 40. More Like This
      • http://localhost:8983/solr/select?q=ipod&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mintf=1&fl=id,score,name
      • http://wiki.apache.org/solr/MoreLikeThis
  • 41. Scaling: Query Throughput
      • Replication
          • slaves poll master for index updates
          • transfers index files from master to slave
          • configuration files can also be transferred
          • entirely Java/HTTP-based in Solr 1.4 (prior versions used rsync)
  • 42. Scaling: Collection Size
      • Distribution
          • index documents across shards
          • query a single server with the shards parameter
          • sends requests to each shard
          • aggregates results into a single response
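The aggregation step of distributed search can be pictured as merging the per-shard result lists and keeping the best `rows` hits. The sketch below shows only that merge, under loud assumptions: `ShardMergeSketch` and `Hit` are hypothetical types, scores from different shards are treated as directly comparable, and nothing here reflects Solr's actual implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Merges per-shard hit lists into one ranked response. Illustration only.
final class ShardMergeSketch {
    // Minimal hit representation: document id plus relevance score.
    static final class Hit {
        final String id;
        final double score;

        Hit(String id, double score) {
            this.id = id;
            this.score = score;
        }
    }

    // Collect every shard's hits, sort by descending score, keep the first `rows`.
    static List<Hit> merge(List<List<Hit>> shardResults, int rows) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> shard : shardResults) {
            all.addAll(shard);
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(rows, all.size())));
    }
}
```

Each shard only needs to return its own top `rows` hits for the merged top `rows` to be correct, which keeps the amount of data sent to the coordinating server small.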
  • 43. Solr-powered UI
      • Solritas (from "celeritas"): VelocityResponseWriter
          • easily templated output
      • SolrJS: jQuery-based widgets
          • see http://solrjs.solrstuff.org/
      • Blacklight and Flare: RoR plugins
  • 44. Lucene in Action, 2nd Edition
      http://www.manning.com/lucene
  • 46. Who We Are
      • Lucid Imagination is the commercial entity of the Apache Lucene/Solr ecosystem
      • Lucid team includes the largest collection of Lucene/Solr committers, contributors and influencers
      • Our mission is to serve as The Center of Excellence for Lucene/Solr-based search solutions
      • Help our customers get the most out of Lucene/Solr - […] most widely used open source search software
      Lucene, Solr and their logos are trademarks of the Apache Software Foundation
  • 47. Lucid Imagination Technical Team
      Unique Combination of Enterprise Search and Lucene Expertise
      • Yonik Seeley: Creator of Solr; Lucene/Solr committer, PMC member
      • Marc Krellenstein: Co-founder, CTO/SVP, Northern Light; VP Search, CTO, Elsevier
      • Grant Ingersoll: Creator of Lucene Boot Camp; Lucene/Solr committer, Chair, PMC
      • Brian Pinkerton: Developed WebCrawler, the Web's first comprehensive search engine; Principal Architect at A9
      • Erik Hatcher: Co-author of Lucene in Action book; Lucene/Solr committer, PMC member
      • Simon Rosenthal: Solutions architect, Northern Light
      • Mark Miller: Lucene/Solr committer, PMC member
      • Jay Hill: Solutions Architect, Wells Fargo
      • Sami Siren: Nutch/Tika committer, PMC member
      • Ryan McKinley: Lucene/Solr committer, PMC member
      • Andrzej Bialecki: Lucene/Nutch/Hadoop committer, PMC member
      • Chris Hostetter (Advisor): Lucene/Solr committer, PMC member
      • Doug Cutting (Advisor): Creator of Lucene, Nutch & Hadoop; Member, Apache Software Foundation
  • 48. Lucid Imagination Business Model
      [Diagram labels: Free Download; Lucene; Apache.org; Upgrade; Enterprise Version; Certified Distributions; Free; Value-add; Commercial Support, Training, Consulting]
  • 49. Thank you
      http://www.lucidimagination.com