Solr Powered Lucene
             Erik Hatcher
           Lucid Imagination
  erik.hatcher@lucidimagination.com



  Northern Virginia Java Users Group
        December 16, 2009




                                       1
Erik Hatcher
• Member of Technical Staff, Lucid
  Imagination
• Apache Lucene/Solr Committer
• Member, Apache Software Foundation
• Co-author, Lucene in Action and Java
  Development with Ant (Manning)


                                         2
A word from our
           sponsor...
•   Pizza!

•   Lucid Imagination

    •   commercial entity exclusively dedicated to Apache Lucene/
        Solr open source search technology

    •   Services: Technical Support, Expert Link, Training, Consulting

    •   Tools: LucidGaze for Lucene and Solr

    •   Free certified distributions of Solr and Lucene

    •   more to come...



                                                                         3
What is Solr?
• Search server
• Built upon Apache Lucene (Java)
• Fast, very
• Scalable, query load and collection size
• Interoperable
• Extensible
                                             4
5
Solr Example
•   Start Solr

    •   java -jar start.jar (Apache Solr distro)

    •   lucidworks start (Lucid certified distro)

•   java -jar post.jar *.xml

•   HTML view (via Solritas)

    •   easily enabled in Apache distro

    •   built-in to Lucid certified distro


                                                   6
Solr History
• Created by Yonik Seeley for CNET
• Contributed to Apache in January 2006
• December 2006: Version 1.
• June 2007: Version 1.2
• September 2008: Version 1.3
• November 2009: Version 1.4
                                          7
Features
•   Lucene power exposed over HTTP

    •   Spell checking, highlighting,
        more-like-this

•   Caching

•   Replication

•   Faceting

•   Distributed search


                                        8
Solr APIs
• HTTP GET/POST (curl or any other HTTP
  client)
• JSON
• SolrJ (embedded or HTTP)
• Ruby: solr-ruby, RSolr, etc
• Many others: python, PHP, solrsharp, XSLT
                                              9
Deployment
                   Architecture
•   Scales from:

    •   single Solr server

    •   master/replicants (slaves)

    •   distributed shards

•   Each Solr instance can also
    have multiple cores




                                    10
Lucene Fundamentals




                      11
Concepts

• Index
• Document
• Field
• Terms (aka Tokens)

                       12
Inverted Index
                                From "Taming Text" by Grant Ingersoll and Tom Morton
•   Commonly used search
    engine data structure

•   Efficient lookup of terms
    across large number of
    documents

•   Usually stores positional
    information to enable
    phrase/proximity queries
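
A minimal sketch of the idea, not Lucene's actual implementation: the class name InvertedIndexSketch and its methods are hypothetical, and the "analysis" is just lowercasing and whitespace splitting. It maps each term to the documents and positions where it occurs, which is enough to show both term lookup and a two-word phrase match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of Lucene or Solr.
class InvertedIndexSketch {
    // term -> (document id -> positions of the term in that document)
    private final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();

    // Naive analysis: lowercase and split on whitespace. Real analyzers do much more.
    void add(int docId, String text) {
        String[] tokens = text.toLowerCase().trim().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    // Documents containing the term: one map lookup, no scan of document text.
    List<Integer> docsFor(String term) {
        Map<Integer, List<Integer>> docs = postings.get(term.toLowerCase());
        return docs == null ? new ArrayList<>() : new ArrayList<>(docs.keySet());
    }

    // Two-word phrase match: 'second' must occur at the position right after 'first'.
    List<Integer> phrase(String first, String second) {
        List<Integer> hits = new ArrayList<>();
        Map<Integer, List<Integer>> a = postings.get(first.toLowerCase());
        Map<Integer, List<Integer>> b = postings.get(second.toLowerCase());
        if (a == null || b == null) {
            return hits;
        }
        for (Map.Entry<Integer, List<Integer>> e : a.entrySet()) {
            List<Integer> bPositions = b.get(e.getKey());
            if (bPositions == null) {
                continue;
            }
            for (int pos : e.getValue()) {
                if (bPositions.contains(pos + 1)) {
                    hits.add(e.getKey());
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Apple iPod with video playback");
        index.add(2, "video iPod accessories");
        System.out.println("ipod -> " + index.docsFor("ipod"));
        System.out.println("\"ipod with\" -> " + index.phrase("ipod", "with"));
    }
}
```

Storing positions alongside document ids is what makes the phrase check possible without re-reading the original text.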




                                                                                       13
14
What's in a token?




                     15
Lucene Scoring
    (diagram: document vector d1 and query vector q1, separated by angle Θ)




                  16
Relevance
•   Term frequency (TF): number of times a term
    appears in a document

•   Inverse document frequency (IDF): one over the
    number of documents in the index that contain
    the term (1/df)

•   Field length normalization: controls the effect
    that field length, in number of terms, has on score

•   Boost factors: terms, fields, or documents
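
A rough sketch of how these factors might combine for a single term. The class RelevanceSketch and its method names are hypothetical. The damping functions (square root for TF, logarithm for IDF) follow the general shape of Lucene's classic TF-IDF similarity rather than the plain 1/df shorthand above, and the coordination and query normalization factors are left out.

```java
// Hypothetical illustration class; this is not Lucene code.
class RelevanceSketch {
    // Term frequency: more occurrences score higher, with diminishing returns.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Inverse document frequency: terms found in fewer documents score higher.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Field length normalization: a match in a short field counts for more.
    static double lengthNorm(int numTermsInField) {
        return 1.0 / Math.sqrt(numTermsInField);
    }

    // Simplified single-term score; omits coordination and query normalization.
    static double score(int freq, int docFreq, int numDocs, int fieldLength, double boost) {
        double idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf * lengthNorm(fieldLength) * boost;
    }

    public static void main(String[] args) {
        // Same term and index; only the field length differs.
        System.out.println("short field: " + score(1, 10, 1000, 4, 1.0));
        System.out.println("long field:  " + score(1, 10, 1000, 100, 1.0));
    }
}
```

Adding &debugQuery=true to a real request shows the factors Solr actually applied, which is the authoritative source when scores look surprising.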


                                                      17
Solr Core

• single primary index
• schema.xml / solrconfig.xml
• (optionally) multiple cores per Solr
  instance, configured in solr.xml
• other configuration and data files

                                         18
schema.xml
•   Field types
•   Fields
•   Unique key (optional*)
    * I've never seen a case that didn't require a
    unique identifier per document
•   copy fields
•   similarity and Solr query parser configuration


                                                     19
Schema Analysis
•   http://localhost:8983/solr/admin/analysis.jsp

•   Document analysis request handler:
    curl http://localhost:8983/solr/analysis/document \
      --data-binary @ipod_video.xml \
      -H 'Content-type:text/xml; charset=utf-8'

•   Field analysis request handler:
    http://localhost:8983/solr/analysis/field?analysis.fieldtype=text&analysis.fieldvalue=Foo%20Bar&q=foo&analysis.showmatch=true




                                                    20
solrconfig.xml
• Lucene indexing parameters
• Cache settings
• Request handler configuration
• HTTP cache settings
• Search components, response writers,
  query parsers


                                         21
Request handlers
• mini-“servlets”
• SearchHandler extensions chain search
  components
• Flexible response formatting:
 • &wt=[json, ruby, xslt, php, phps, javabin,
    python,velocity]


                                                22
Solr XML
<add><doc>
  <field name="id">MA147LL/A</field>
  <field name="name">Apple 60 GB iPod with Video Playback Black</field>
  <field name="manu">Apple Computer Inc.</field>
  <field name="cat">electronics</field>
  <field name="cat">music</field>
  <field name="features">iTunes, Podcasts, Audiobooks</field>
  <field name="features">Stores up to 15,000 songs, 25,000 photos, or 150 hours of
               video</field>
  <field name="features">2.5-inch, 320x240 color TFT LCD display
                         with LED backlight</field>
  <field name="features">Up to 20 hours of battery life</field>
  <field name="features">Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless,
                         H.264 video</field>
  <field name="features">Notes, Calendar, Phone book, Hold button, Date display,
      Photo wallet, Built-in games, JPEG photo playback, Upgradeable firmware,
      USB 2.0 compatibility, Playback speed control, Rechargeable capability,
      Battery level indication</field>
  <field name="includes">earbud headphones, USB cable</field>
  <field name="weight">5.5</field>
  <field name="price">399.00</field>
  <field name="popularity">10</field>
  <field name="inStock">true</field>
</doc></add>




                                                                                     23
Indexing Solr XML

• Via curl:
  curl 'http://localhost:8983/solr/update?commit=true' \
    --data-binary @ipod_video.xml \
    -H 'Content-type:text/xml; charset=utf-8'

• Via Solr's Java-based post tool:
  java -jar post.jar ipod_video.xml




                                             24
Indexing CSV

curl 'http://localhost:8983/solr/update/csv?commit=true' \
  --data-binary @books.csv \
  -H 'Content-type:text/plain; charset=utf-8'




                                                     25
Content Streams
•   Allows Solr server to fetch local or remote data
    itself. Must enable remote streaming in
    solrconfig.xml
•   http://localhost:8983/solr/update

    •   ?stream.file=<local Solr path to
        exampledocs>/ipod_video.xml

    •   ?stream.url=<url to content>

•   Security warning: allows Solr to fetch arbitrary
    server-side file or network URL content


                                                       26
Indexing with SolrJ
SolrServer solr =
    new CommonsHttpSolrServer(
        new URL("http://localhost:8983/solr"));

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "EXAMPLEDOC01");
doc.addField("title", "NOVAJUG SolrJ Example");
solr.add(doc);

solr.commit();     // after a batch, not per document

solr.optimize();   // periodically, if/when needed




                                                        27
Indexing with solr-ruby
solr = Connection.new(
  'http://localhost:8983/solr',
  :autocommit => :on)

solr.add(:id => 123,
         :title => 'Solr in Action')

solr.optimize   # periodically, as needed




                                            28
delete, update, etc
          •   Delete:

              •   <delete><id>05991</id></delete>

              •   <delete>
                    <query>category:Unused</query>
                  </delete>

              •   java -Ddata=args -jar post.jar "<delete><query>*:*</query></delete>"

          •   Update: simply <add> doc with same unique key

          •   <commit/> pending documents

          •   <optimize/> index, squeezes out deleted documents, collapses segments

          •   <rollback/> to last commit point
Update commands via GET: http://localhost:8983/solr/update?stream.body=<commit/>


                                                                                         29
Data Import Handler

•   Indexes relational database, XML data, and
    email sources

•   Supports full and incremental/delta indexing

•   Highly extensible with custom data sources,
    transformers, etc

•   http://wiki.apache.org/solr/DataImportHandler



                                                    30
DIH details
• Put JDBC driver JAR in <solr-home>/lib,
  configure dataimport request handler
• http://localhost:8983/solr/db/admin/dataimport.jsp
  - debugging console
• http://localhost:8983/solr/db/dataimport?command=full-import
  - removes all documents and imports from scratch


                                              31
Solr Cell
• aka ExtractingRequestHandler
• leveraging Tika, extracts and indexes rich
    documents such as Word, PDF, HTML, and
    many other types
• curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true' \
    -F "myfile=@tutorial.html"

• http://wiki.apache.org/solr/ExtractingRequestHandler


                                               32
Standard Search
          Request


• http://localhost:8983/solr/select?q=query




                                              33
Debug Query


• &debugQuery=true is your friend
• Includes parsed query, explanations, and
  search component timings in response




                                             34
Searching
• Send GET HTTP requests
  •   http://localhost:8983/solr/select?q=solr&start=0&rows=10&fl=id,name

• start: zero-based starting result
• rows: number of hits to return
• fl: list of stored fields to return
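
A small sketch of assembling such a request URL by hand, using only the JDK. The class SelectUrlSketch and its helper names are hypothetical; the parameters q, start, rows, and fl are the ones listed above. In practice SolrJ's SolrQuery does this work for you.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical illustration class; not part of SolrJ.
class SelectUrlSketch {
    // URL-encode one parameter value as UTF-8.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // start is zero-based, so page 0 begins at 0, page 1 at rows, and so on.
    static int startForPage(int zeroBasedPage, int rows) {
        return zeroBasedPage * rows;
    }

    // Build a /select URL; each field name is encoded, then joined with commas for fl.
    static String selectUrl(String baseUrl, String q, int start, int rows, String... fields) {
        StringBuilder fl = new StringBuilder();
        for (String field : fields) {
            if (fl.length() > 0) {
                fl.append(',');
            }
            fl.append(encode(field));
        }
        return baseUrl + "/select?q=" + encode(q)
                + "&start=" + start
                + "&rows=" + rows
                + "&fl=" + fl;
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr";
        // Third page of ten results, returning only id and name.
        System.out.println(selectUrl(base, "apple ipod", startForPage(2, 10), 10, "id", "name"));
    }
}
```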

                                           35
Query Parser

• Controlled by defType parameter
 • &defType=lucene (actually a Solr
    extension of Lucene’s QueryParser)
  • &defType=dismax
• Local {!...} override syntax

                                         36
Solr Query Parser
•   http://lucene.apache.org/java/2_9_1/queryparsersyntax.html
    + Solr extensions

•   Kitchen sink parser, includes advanced,
    user-unfriendly syntax

•   Syntax errors throw parse exceptions back to
    client

•   Example: title:ipod* AND price:[0 TO 100]

•   http://wiki.apache.org/solr/SolrQuerySyntax


                                                   37
Dismax Query Parser
• Simplified syntax:
  loose text “quote phrases” -prohibited
  +required
• Spreads query terms across query fields
  (qf) with dynamic boosting per field, phrase
  construction (pf), and boosting query and
  function capabilities (bq and bf)


                                                38
Searching with SolrJ
SolrServer server = new
      CommonsHttpSolrServer("http://localhost:8983/solr");

SolrQuery params = new SolrQuery("author:John");
params.setFields("*,score");
params.setRows(3);

QueryResponse response = server.query(params);

for (SolrDocument document : response.getResults()) {
      System.out.println("Doc: " + document);
}




                                                             39
Searching with Ruby
conn = Connection.new(
    'http://localhost:8983/solr')

conn.query('my query') do |hit|
  puts hit.inspect
end




                                    40
Built-in search
          components
• Standard: query, facet, mlt, highlight,
             stats, debug
• Others: elevation, clustering, term,
           term vector




                                            41
Faceting
•   Counts per subset within results

•   Facet on: field terms, queries,
    date ranges

•   &facet=on
    &facet.field=cat
    &facet.query=price:[0 TO 100]

•   http://wiki.apache.org/solr/SimpleFacetParameters
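
To make "counts per subset within results" concrete, here is a toy sketch that computes the same kind of numbers in memory. The FacetSketch class, its Doc type, and the sample data are all hypothetical; Solr computes these counts server-side over the full result set, far more efficiently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not how Solr implements faceting.
class FacetSketch {
    static class Doc {
        final String id;
        final List<String> cats;   // multi-valued field, like "cat" in the example docs
        final double price;

        Doc(String id, List<String> cats, double price) {
            this.id = id;
            this.cats = cats;
            this.price = price;
        }
    }

    // Analogous to facet.field=cat: one count per distinct field value.
    static Map<String, Integer> facetField(List<Doc> results) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Doc doc : results) {
            for (String cat : doc.cats) {
                counts.merge(cat, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Analogous to facet.query=price:[low TO high]: an inclusive range count.
    static int facetPriceRange(List<Doc> results, double low, double high) {
        int count = 0;
        for (Doc doc : results) {
            if (doc.price >= low && doc.price <= high) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Doc> results = Arrays.asList(
                new Doc("a", Arrays.asList("electronics", "music"), 399.00),
                new Doc("b", Arrays.asList("electronics"), 19.95),
                new Doc("c", Arrays.asList("electronics"), 92.00));
        System.out.println(facetField(results));
        System.out.println("price:[0 TO 100] -> " + facetPriceRange(results, 0, 100));
    }
}
```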


                                       42
Spell checking
• http://localhost:8983/solr/spell?q=epod&spellcheck=on&spellcheck.build=true
• File or index-based dictionaries
• Supports pluggable distance algorithms:
  Levenshtein and JaroWinkler
• http://wiki.apache.org/solr/
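
For reference, a textbook sketch of Levenshtein edit distance and a naive "closest dictionary word" lookup. The class EditDistanceSketch and the closest helper are hypothetical and are not Solr's spell checker, which uses its own dictionary index and pluggable distance classes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of Lucene or Solr.
class EditDistanceSketch {
    // Minimum number of single-character inserts, deletes, or substitutions
    // needed to turn a into b (dynamic programming over two rows).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Naive suggestion: the first dictionary word with the smallest distance, or null if empty.
    static String closest(String word, List<String> dictionary) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : dictionary) {
            int d = levenshtein(word, candidate);
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("apple", "ipod", "video");
        System.out.println("epod -> " + closest("epod", dictionary));
    }
}
```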
                                            43
Highlighting

• http://localhost:8983/solr/select?q=apple&hl=on&hl.fl=*
• http://wiki.apache.org/solr/HighlightingParameters




                                       44
More Like This

• http://localhost:8983/solr/select?q=ipod&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mintf=1&fl=id,score,name
• http://wiki.apache.org/solr/MoreLikeThis


                                             45
Query Elevation
• http://localhost:8983/solr/elevate?q=ipod&debugQuery=true&enableElevation=true
• Configure an “elevate.xml” to boost/exclude
  specific documents
• http://wiki.apache.org/solr/QueryElevationComponent


                                           46
Clustering
•   Dynamic grouping of documents into labeled
    sets

•   http://localhost:8983/solr/clustering?q=*:*&rows=10

•   http://wiki.apache.org/solr/ClusteringComponent

•   Requires additional steps to install (see
    documentation) with Apache Solr distro


                                                 47
Terms

• Enumerates terms from specified fields
• http://localhost:8983/solr/terms?terms.fl=name&terms.sort=index&terms.prefix=vi




                                           48
Term Vectors

• Details term vector information: term
  frequency, document frequency, position
  and offset information
• http://localhost:8983/solr/select/?q=*%3A*&qt=tvrh&tv=true&tv.all=true




                                            49
stats.jsp
• Not technically a “request handler”, outputs
  only XML
• http://localhost:8983/solr/admin/stats.jsp
• Index stats such as number of documents,
  searcher open time
• Request handler details, number of
  requests and errors, average request time,


                                                 50
Replication
• Master is polled
• Replicant pulls Lucene index and optionally
  also Solr configuration files
• Query throughput scaling: replicate and
  load balance
• http://wiki.apache.org/solr/SolrReplication

                                                51
Distributed Search

• Distribute documents to same-schema
  shards
• Scaling for when single index becomes too
  large, or a single query becomes too slow
• http://wiki.apache.org/solr/DistributedSearch
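
One possible way for an indexing client to spread documents across shards, sketched under assumptions: the class ShardSketch, the hash-by-unique-key routing, and the host names are all hypothetical choices, not something the slides prescribe. The shards request parameter itself is the one Solr's distributed search uses to list the shards to query.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; routing policy is up to the indexing client.
class ShardSketch {
    // Pick a shard from the document's unique key; floorMod keeps the result non-negative.
    static int shardFor(String uniqueKey, int numShards) {
        return Math.floorMod(uniqueKey.hashCode(), numShards);
    }

    // Build the shards parameter: a comma-separated list of shard addresses.
    static String shardsParam(List<String> shardAddresses) {
        return "shards=" + String.join(",", shardAddresses);
    }

    public static void main(String[] args) {
        // Assumed host names, for illustration only.
        List<String> shards = Arrays.asList("host1:8983/solr", "host2:8983/solr");
        int target = shardFor("MA147LL/A", shards.size());
        System.out.println("index MA147LL/A on " + shards.get(target));
        System.out.println("query with &" + shardsParam(shards));
    }
}
```

Routing on the unique key keeps each document on exactly one shard, so re-adding a document with the same key replaces it in place.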



                                              52
What’s new in Solr 1.4?
•   Java-based replication

•   VelocityResponseWriter (Solritas)

•   AJAX-Solr

•   Logging switched to SLF4J

•   Rollback, since last commit

•   StatsComponent

•   TermVectorComponent

•   Configurable Directory provider


                                                         53
Lucene 2.9
•   IndexReader#reopen()

•   Faster filter performance, by 300% in some cases

•   Per-segment FieldCache

•   Reusable token streams

•   Faster numeric/date range queries, thanks to trie

•   and tons more, see Lucene 2.9's CHANGES.txt



                                                        54
Performance
       Improvements
• Caching
• Concurrent file access
• Per-segment index updates
• Faceting
• DocSet generation, avoids scoring
• Streaming updates for SolrJ
                                      55
Feature Improvements
• Rich document indexing
• DataImportHandler enhancements
• Smoother replication
• More choices for logging
• Multi-select faceting
• Speedier range queries
• Duplicate detection
• New request handler components



                                                   56
Resources
•   http://wiki.apache.org/solr

•   solr-user@lucene.apache.org



•   Lucid Imagination

    •   http://www.lucidimagination.com

    •   Articles, webinars, blogs, and...

    •   Search the Lucene ecosystem at:
        http://search.lucidimagination.com

    •   support@lucidimagination.com



                                             57
Shout Out




            58
e-book now available!
        print coming soon
http://www.manning.com/lucene

                                59
LucidWorks for Solr
•   Certified Distribution

•   Value-added integration

    •   KStemmer

    •   Carrot2 clustering

    •   LucidGaze for Solr

    •   installer

•   Reference Manual

•   Solr 1.4 certified distro coming soon!



                                            60
LucidGaze for Solr

• Monitoring tool, captures, stores, and
  interactively views Solr performance
  metrics
• requests/second
• time/request

                                           61
62
LucidFind




http://search.lucidimagination.com/?q=novajug


                                                63
64
