Solr Powered Lucene
             Erik Hatcher
           Lucid Imagination
  erik.hatcher@lucidimagination.com



  Northern Virginia Java Users Group
        December 16, 2009




                                       1
Erik Hatcher
• Member of Technical Staff, Lucid
  Imagination
• Apache Lucene/Solr Committer
• Member, Apache Software Foundation
• Co-author, Lucene in Action and Java
  Development with Ant (Manning)


                                         2
A word from our sponsor...
•   Pizza!

•   Lucid Imagination

    •   commercial entity exclusively dedicated to Apache Lucene/
        Solr open source search technology

    •   Services: Technical Support, Expert Link, Training, Consulting

    •   Tools: LucidGaze for Lucene and Solr

    •   Free certified distributions of Solr and Lucene

    •   more to come...



                                                                         3
What is Solr?
• Search server
• Built upon Apache Lucene (Java)
• Fast, very
• Scalable, query load and collection size
• Interoperable
• Extensible
                                             4
5
Solr Example
•   Start Solr

    •   java -jar start.jar (Apache Solr distro)

    •   lucidworks start (Lucid certified distro)

•   java -jar post.jar *.xml

•   HTML view (via Solritas)

    •   easily enabled in Apache distro

    •   built-in to Lucid certified distro


                                                   6
Solr History
• Created by Yonik Seeley for CNET
• Contributed to Apache in January 2006
• December 2006: Version 1.1
• June 2007: Version 1.2
• September 2008: Version 1.3
• November 2009: Version 1.4
                                          7
Features
•   Lucene power exposed over HTTP

    •   Spell checking, highlighting,
        more-like-this

•   Caching

•   Replication

•   Faceting

•   Distributed search


                                        8
Solr APIs
• HTTP GET/POST (curl or any other HTTP
  client)
• JSON
• SolrJ (embedded or HTTP)
• Ruby: solr-ruby, RSolr, etc
• Many others: python, PHP, solrsharp, XSLT
                                              9
Deployment Architecture
•   Scales from:

    •   single Solr server

    •   master/replicants (slaves)

    •   distributed shards

•   Each Solr instance can also
    have multiple cores




                                    10
Lucene Fundamentals




                      11
Concepts

• Index
• Document
• Field
• Terms (aka Tokens)

                       12
Inverted Index
                                From "Taming Text" by Grant Ingersoll and Tom Morton
•   Commonly used search
    engine data structure

•   Efficient lookup of terms
    across large number of
    documents

•   Usually stores positional
    information to enable
    phrase/proximity queries
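
The structure described above can be sketched in a few lines of Python. This is only an illustrative toy, not how Lucene stores its index: the whitespace tokenizer, the function names `build_index` and `phrase_match`, and the sample documents are all assumptions made for the example.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to {doc_id: [positions]} (a positional inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Naive analysis: lowercase and split on whitespace
        for position, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(position)
    return index

def phrase_match(index, phrase):
    """Return ids of documents that contain the phrase terms consecutively."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return set()
    # Candidates must contain every term of the phrase
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    hits = set()
    for doc_id in candidates:
        # Stored positions let us check that the terms are adjacent
        for start in index[terms[0]][doc_id]:
            if all(start + i in index[t][doc_id] for i, t in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits

docs = {
    1: "Apple iPod with video playback",
    2: "video iPod accessories",
    3: "iPod with video and photo playback",
}
index = build_index(docs)
print(sorted(index["ipod"]))                        # ids of documents containing "ipod"
print(sorted(phrase_match(index, "ipod with video")))
```

Looking up a term is a single dictionary access regardless of how many documents exist, and the stored positions are what make the phrase check possible.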




                                                                                       13
14
What's in a token?




                     15
Lucene Scoring
[Figure: document vector d1 and query vector q1, separated by angle Θ]
                  16
Relevance
•   Term frequency (TF): number of times a term
    appears in a document

•   Inverse document frequency (IDF): one over the
    number of documents in which the term appears
    (1/df)

•   Field length normalization: controls the effect that
    field length, in number of terms, has on score

•   Boost factors: terms, fields, or documents
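
As a rough illustration of how these factors combine, here is a simplified sketch loosely modeled on the formulas of Lucene's default similarity (square-root TF, logarithmic IDF, inverse-square-root length norm). It deliberately omits the coordination factor and query normalization, and the function names are chosen for this example only.

```python
import math

def tf(freq):
    """Term frequency factor: grows with the square root of the raw count."""
    return math.sqrt(freq)

def idf(doc_freq, num_docs):
    """Inverse document frequency: rarer terms get a larger weight."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def length_norm(num_terms):
    """Field length normalization: matches in shorter fields count more."""
    return 1.0 / math.sqrt(num_terms)

def term_score(freq, doc_freq, num_docs, num_terms, boost=1.0):
    """Score contribution of one query term in one document field."""
    # idf is squared because it weights both the query side and the document side
    return tf(freq) * idf(doc_freq, num_docs) ** 2 * boost * length_norm(num_terms)

# A term found in 1 of 1000 documents outweighs one found in 500 of them
rare = term_score(freq=1, doc_freq=1, num_docs=1000, num_terms=10)
common = term_score(freq=1, doc_freq=500, num_docs=1000, num_terms=10)
print(rare > common)
```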


                                                      17
Solr Core

• single primary index
• schema.xml / solrconfig.xml
• (optionally) multiple cores per Solr
  instance, configured in solr.xml
• other configuration and data files

                                         18
schema.xml
•   Field types
•   Fields
•   Unique key (optional*)
    * I've never seen a case that didn't require a
    unique identifier per document
•   copy fields
•   similarity and Solr query parser configuration


                                                     19
Schema Analysis
•   http://localhost:8983/solr/admin/analysis.jsp

•   Document analysis request handler:
    curl http://localhost:8983/solr/analysis/document --data-binary @ipod_video.xml -H 'Content-type:text/xml; charset=utf-8'

•   Field analysis request handler:
    http://localhost:8983/solr/analysis/field?analysis.fieldtype=text&analysis.fieldvalue=Foo%20Bar&q=foo&analysis.showmatch=true




                                                    20
solrconfig.xml
• Lucene indexing parameters
• Cache settings
• Request handler configuration
• HTTP cache settings
• Search components, response writers,
  query parsers


                                         21
Request handlers
• mini-“servlets”
• SearchHandler extensions chain search
  components
• Flexible response formatting:
 • &wt=[json, ruby, xslt, php, phps, javabin, python, velocity]


                                                22
Solr XML
<add><doc>
  <field name="id">MA147LL/A</field>
  <field name="name">Apple 60 GB iPod with Video Playback Black</field>
  <field name="manu">Apple Computer Inc.</field>
  <field name="cat">electronics</field>
  <field name="cat">music</field>
  <field name="features">iTunes, Podcasts, Audiobooks</field>
  <field name="features">Stores up to 15,000 songs, 25,000 photos, or 150 hours of video</field>
  <field name="features">2.5-inch, 320x240 color TFT LCD display with LED backlight</field>
  <field name="features">Up to 20 hours of battery life</field>
  <field name="features">Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, H.264 video</field>
  <field name="features">Notes, Calendar, Phone book, Hold button, Date display,
      Photo wallet, Built-in games, JPEG photo playback, Upgradeable firmware,
      USB 2.0 compatibility, Playback speed control, Rechargeable capability,
      Battery level indication</field>
  <field name="includes">earbud headphones, USB cable</field>
  <field name="weight">5.5</field>
  <field name="price">399.00</field>
  <field name="popularity">10</field>
  <field name="inStock">true</field>
</doc></add>




                                                                                     23
Indexing Solr XML

• Via curl:
  curl 'http://localhost:8983/solr/update?commit=true' --data-binary @ipod_video.xml -H 'Content-type:text/xml; charset=utf-8'

• Via Solr's Java-based post tool:
  java -jar post.jar ipod_video.xml
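
The same update can be scripted from any HTTP client. The following is a minimal sketch in Python using only the standard library; the helper names `build_add_xml` and `post_update` are made up for this example, and `post_update` assumes a Solr instance is listening at the URL used on this slide.

```python
import urllib.request
import xml.etree.ElementTree as ET

def build_add_xml(doc):
    """Build an <add><doc>...</doc></add> message; list values become repeated fields."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(v)
    return ET.tostring(add, encoding="utf-8")

def post_update(xml_bytes, url="http://localhost:8983/solr/update?commit=true"):
    """POST the message to the update handler (needs a running Solr; not called here)."""
    request = urllib.request.Request(
        url, data=xml_bytes,
        headers={"Content-type": "text/xml; charset=utf-8"})
    with urllib.request.urlopen(request) as response:
        return response.read()

payload = build_add_xml({
    "id": "MA147LL/A",
    "cat": ["electronics", "music"],   # multi-valued field
    "price": 399.00,
})
print(payload.decode("utf-8"))
```

Building the message with an XML library rather than string concatenation means characters such as `&` and `<` in field values are escaped automatically.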




                                             24
Indexing CSV

curl 'http://localhost:8983/solr/update/csv?commit=true' --data-binary @books.csv -H 'Content-type:text/plain; charset=utf-8'




                                                     25
Content Streams
•   Allows Solr server to fetch local or remote data
    itself. Must enable remote streaming in
    solrconfig.xml
•   http://localhost:8983/solr/update

    •   ?stream.file=<local Solr path to
        exampledocs>/ipod_video.xml

    •   ?stream.url=<url to content>

•   Security warning: allows Solr to fetch arbitrary
    server-side file or network URL content


                                                       26
Indexing with SolrJ
SolrServer solr =
    new CommonsHttpSolrServer(
        new URL("http://localhost:8983/solr"));

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "EXAMPLEDOC01");
doc.addField("title", "NOVAJUG SolrJ Example");
solr.add(doc);

solr.commit();     // after a batch, not per document

solr.optimize();   // periodically, if/when needed




                                                        27
Indexing with solr-ruby
require 'solr'

solr = Solr::Connection.new(
  'http://localhost:8983/solr',
  :autocommit => :on)

solr.add(:id => 123,
         :title => 'Solr in Action')

solr.optimize   # periodically, as needed




                                            28
delete, update, etc
          •   Delete:

              •   <delete><id>05991</id></delete>

              •   <delete>
                    <query>category:Unused</query>
                  </delete>

              •   java -Ddata=args -jar post.jar "<delete><query>*:*</query></delete>"

          •   Update: simply <add> doc with same unique key

          •   <commit/> pending documents

          •   <optimize/> index, squeezes out deleted documents, collapses segments

          •   <rollback/> to last commit point
Update commands via GET: http://localhost:8983/solr/update?stream.body=<commit/>
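
When update commands are sent via GET, the XML has to be URL-encoded into the stream.body parameter. A small sketch of building such messages and URLs in Python follows; the helper names are hypothetical and the base URL is the one used throughout this deck.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape

UPDATE_URL = "http://localhost:8983/solr/update"

def delete_by_id(doc_id):
    """Delete message for a single unique key value."""
    return "<delete><id>%s</id></delete>" % escape(str(doc_id))

def delete_by_query(query):
    """Delete message for every document matching a query."""
    return "<delete><query>%s</query></delete>" % escape(query)

def update_via_get(command_xml):
    """URL that carries an update command in the stream.body parameter."""
    return UPDATE_URL + "?" + urlencode({"stream.body": command_xml})

print(delete_by_id("05991"))
print(delete_by_query("category:Unused"))
print(update_via_get("<commit/>"))
```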


                                                                                         29
Data Import Handler

•   Indexes relational database, XML data, and e-mail
    sources

•   Supports full and incremental/delta indexing

•   Highly extensible with custom data sources,
    transformers, etc

•   http://wiki.apache.org/solr/DataImportHandler



                                                    30
DIH details
• Put JDBC driver JAR in <solr-home>/lib,
  configure dataimport request handler
• http://localhost:8983/solr/db/admin/dataimport.jsp - debugging console
• http://localhost:8983/solr/db/dataimport?command=full-import - removes all
  documents and imports from scratch


                                              31
Solr Cell
• aka ExtractingRequestHandler
• leveraging Tika, extracts and indexes rich
    documents such as Word, PDF, HTML, and
    many other types
• curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true' -F "myfile=@tutorial.html"

• http://wiki.apache.org/solr/ExtractingRequestHandler


                                               32
Standard Search Request


• http://localhost:8983/solr/select?q=query




                                              33
Debug Query


• &debugQuery=true is your friend
• Includes parsed query, explanations, and
  search component timings in response




                                             34
Searching
• Send GET HTTP requests
  •   http://localhost:8983/solr/select?
      q=solr&start=0&rows=10&fl=id,name

• start: zero-based starting result
• rows: number of hits to return
• fl: list of stored fields to return
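
Paging through results is a matter of advancing start in steps of rows. A minimal sketch of building such request URLs in Python, assuming the select URL shown above; the helper name `page_url` and its defaults are invented for the example.

```python
from urllib.parse import urlencode

SELECT_URL = "http://localhost:8983/solr/select"

def page_url(query, page, page_size=10, fields=("id", "name")):
    """Build a select URL for a zero-based page number."""
    params = {
        "q": query,
        "start": page * page_size,   # zero-based offset of the first hit
        "rows": page_size,           # number of hits to return
        "fl": ",".join(fields),      # stored fields to include in each hit
    }
    return SELECT_URL + "?" + urlencode(params)

for page in range(3):
    print(page_url("solr", page))
```

Letting `urlencode` handle escaping avoids hand-built query strings breaking on characters such as spaces, colons, or brackets in the query.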

                                           35
Query Parser

• Controlled by defType parameter
 • &defType=lucene (actually a Solr
    extension of Lucene’s QueryParser)
  • &defType=dismax
• Local {!...} override syntax

                                         36
Solr Query Parser
•   http://lucene.apache.org/java/2_9_1/queryparsersyntax.html
    + Solr extensions

•   Kitchen sink parser, includes advanced,
    user-unfriendly syntax

•   Syntax errors throw parse exceptions back to
    client

•   Example: title:ipod* AND price:[0 TO 100]

•   http://wiki.apache.org/solr/SolrQuerySyntax


                                                   37
Dismax Query Parser
• Simplified syntax:
  loose text “quote phrases” -prohibited
  +required
• Spreads query terms across query fields
  (qf) with dynamic boosting per field, phrase
  construction (pf), and boosting query and
  function capabilities (bq and bf)
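
To make the parameters concrete, here is a sketch of assembling a dismax request in Python. The field names and boost values are taken from the example documents in this deck but are otherwise arbitrary, and `dismax_params` is a hypothetical helper, not part of any Solr client.

```python
from urllib.parse import urlencode

def dismax_params(user_query, field_boosts, phrase_fields=None):
    """Assemble request parameters for the dismax query parser."""
    params = {
        "defType": "dismax",
        "q": user_query,   # passed through as typed by the user
        # qf: space-separated fields, each with an optional ^boost
        "qf": " ".join("%s^%s" % (f, b) for f, b in field_boosts.items()),
    }
    if phrase_fields:
        # pf: fields in which the whole query is also tried as a phrase
        params["pf"] = " ".join(phrase_fields)
    return params

params = dismax_params('ipod "video playback" -case',
                       {"name": 2.0, "features": 1.0},
                       phrase_fields=["name"])
print("http://localhost:8983/solr/select?" + urlencode(params))
```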


                                                38
Searching with SolrJ
SolrServer server = new
      CommonsHttpSolrServer("http://localhost:8983/solr");

SolrQuery params = new SolrQuery("author:John");
params.setFields("*,score");
params.setRows(3);

QueryResponse response = server.query(params);

for (SolrDocument document : response.getResults()) {
      System.out.println("Doc: " + document);
}




                                                             39
Searching with Ruby
require 'solr'

conn = Solr::Connection.new(
    'http://localhost:8983/solr')

conn.query('my query') do |hit|
  puts hit.inspect
end




                                    40
Built-in search components
• Standard: query, facet, mlt, highlight,
             stats, debug
• Others: elevation, clustering, term,
           term vector




                                            41
Faceting
•   Counts per subset within results

•   Facet on: field terms, queries,
    date ranges

•   &facet=on
    &facet.field=cat
    &facet.query=price:[0 TO 100]

•   http://wiki.apache.org/solr/SimpleFacetParameters
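
A sketch of building a faceted request and reading the counts back, in Python. It assumes the JSON response writer (wt=json) with its default layout, in which each facet field comes back as a flat list alternating term and count; the sample counts below are invented for illustration.

```python
from urllib.parse import urlencode

def facet_params(query, fields=(), queries=()):
    """Query string with one facet.field / facet.query entry per facet."""
    params = [("q", query), ("facet", "on")]
    params += [("facet.field", f) for f in fields]      # repeated parameter
    params += [("facet.query", q) for q in queries]     # repeated parameter
    return urlencode(params)

def pair_up(flat_counts):
    """Turn a flat [term, count, term, count, ...] list into a dict."""
    return dict(zip(flat_counts[0::2], flat_counts[1::2]))

print(facet_params("*:*", fields=["cat"], queries=["price:[0 TO 100]"]))

# Hypothetical facet_fields entry as it might appear in a JSON response
sample = ["electronics", 14, "music", 1]
print(pair_up(sample))
```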


                                       42
Spell checking
• http://localhost:8983/solr/spell?q=epod&spellcheck=on&spellcheck.build=true
• File or index-based dictionaries
• Supports pluggable distance algorithms:
  Levenshtein and JaroWinkler
• http://wiki.apache.org/solr/
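
For readers unfamiliar with the first of those distance measures, this is a textbook dynamic-programming version of Levenshtein edit distance in Python. It is not Solr's implementation, and the `suggest` helper with its tiny dictionary is purely illustrative.

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes, or substitutions."""
    previous = list(range(len(b) + 1))        # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]                         # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]

def suggest(word, dictionary, max_distance=2):
    """Dictionary words within max_distance edits, closest first."""
    scored = [(levenshtein(word, w), w) for w in dictionary]
    return [w for d, w in sorted(scored) if d <= max_distance]

print(levenshtein("epod", "ipod"))
print(suggest("epod", ["ipod", "apple", "video", "pod"]))
```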
                                            43
Highlighting

• http://localhost:8983/solr/select?q=apple&hl=on&hl.fl=*
• http://wiki.apache.org/solr/HighlightingParameters




                                       44
More Like This

• http://localhost:8983/solr/select?q=ipod&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mintf=1&fl=id,score,name
• http://wiki.apache.org/solr/MoreLikeThis


                                             45
Query Elevation
• http://localhost:8983/solr/elevate?q=ipod&debugQuery=true&enableElevation=true
• Configure an “elevate.xml” to boost/exclude
  specific documents
• http://wiki.apache.org/solr/QueryElevationComponent


                                           46
Clustering
•   Dynamic grouping of documents into labeled
    sets

•   http://localhost:8983/solr/clustering?q=*:*&rows=10

•   http://wiki.apache.org/solr/ClusteringComponent

•   Requires additional steps to install (see
    documentation) with Apache Solr distro


                                                 47
Terms

• Enumerates terms from specified fields
• http://localhost:8983/solr/terms?terms.fl=name&terms.sort=index&terms.prefix=vi
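
Conceptually, a prefix lookup like the one above is cheap because indexed terms are kept in sorted order. The following Python sketch shows the idea with a binary search over a sorted list; it is an analogy under that assumption, not the component's actual code, and the term list is made up.

```python
from bisect import bisect_left

def terms_with_prefix(sorted_terms, prefix, limit=10):
    """Enumerate terms starting with prefix, in index (sorted) order."""
    start = bisect_left(sorted_terms, prefix)   # first term >= prefix
    matches = []
    for term in sorted_terms[start:]:
        # Sorted order means all prefix matches are contiguous
        if not term.startswith(prefix) or len(matches) >= limit:
            break
        matches.append(term)
    return matches

terms = sorted(["apple", "ipod", "video", "viewsonic", "vision"])
print(terms_with_prefix(terms, "vi"))
```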




                                           48
Term Vectors

• Details term vector information: term
  frequency, document frequency, position
  and offset information
• http://localhost:8983/solr/select/?q=*%3A*&qt=tvrh&tv=true&tv.all=true




                                            49
stats.jsp
• Not technically a “request handler”, outputs
  only XML
• http://localhost:8983/solr/admin/stats.jsp
• Index stats such as number of documents,
  searcher open time
• Request handler details, number of
  requests and errors, average request time,


                                                 50
Replication
• Master is polled
• Replicant pulls Lucene index and optionally
  also Solr configuration files
• Query throughput scaling: replicate and
  load balance
• http://wiki.apache.org/solr/SolrReplication

                                                51
Distributed Search

• Distribute documents to same-schema
  shards
• Scaling for when single index becomes too
  large, or a single query becomes too slow
• http://wiki.apache.org/solr/DistributedSearch
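
A sketch of the two halves of this setup in Python: the indexing client decides which shard receives each document, and the query lists the shards in the shards request parameter. The CRC32-based routing is just one possible scheme chosen for this example, and the second host and port are hypothetical.

```python
import zlib
from urllib.parse import urlencode

def shard_for(doc_id, num_shards):
    """Pick a shard by hashing the unique key (one possible routing scheme)."""
    return zlib.crc32(str(doc_id).encode("utf-8")) % num_shards

def distributed_query_url(coordinator, shards, query):
    """Select URL asking one Solr instance to query all listed shards."""
    params = {"q": query, "shards": ",".join(shards)}
    return "http://%s/select?%s" % (coordinator, urlencode(params))

# Hypothetical two-shard deployment
shards = ["localhost:8983/solr", "localhost:7574/solr"]
print(shard_for("MA147LL/A", len(shards)))
print(distributed_query_url(shards[0], shards, "ipod"))
```

Hashing the unique key keeps a given document on the same shard across re-indexing, so an update replaces the old copy instead of creating a duplicate on another shard.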



                                              52
What’s new in Solr 1.4?
•   Java-based replication

•   VelocityResponseWriter (Solritas)

•   AJAX-Solr

•   Logging switched to SLF4J

•   Rollback, since last commit

•   StatsComponent

•   TermVectorComponent

•   Configurable Directory provider


                                                         53
Lucene 2.9
•   IndexReader#reopen()

•   Faster filter performance, by 300% in some cases

•   Per-segment FieldCache

•   Reusable token streams

•   Faster numeric/date range queries, thanks to trie

•   and tons more, see Lucene 2.9's CHANGES.txt



                                                        54
Performance Improvements
• Caching
• Concurrent file access
• Per-segment index updates
• Faceting
• DocSet generation, avoids scoring
• Streaming updates for SolrJ
                                      55
Feature Improvements
• Rich document indexing
• DataImportHandler enhancements
• Smoother replication
• More choices for logging
• Multi-select faceting
• Speedier range queries
• Duplicate detection
• New request handler components



                                                   56
Resources
•   http://wiki.apache.org/solr

•   solr-user@lucene.apache.org



•   Lucid Imagination

    •   http://www.lucidimagination.com

    •   Articles, webinars, blogs, and...

    •   Search the Lucene ecosystem at:
        http://search.lucidimagination.com

    •   support@lucidimagination.com



                                             57
Shout Out




            58
e-book now available!
        print coming soon
http://www.manning.com/lucene

                                59
LucidWorks for Solr
•   Certified Distribution

•   Value-added integration

    •   KStemmer

    •   Carrot2 clustering

    •   LucidGaze for Solr

    •   installer

•   Reference Manual

•   Solr 1.4 certified distro coming soon!



                                            60
LucidGaze for Solr

• Monitoring tool, captures, stores, and
  interactively views Solr performance
  metrics
• requests/second
• time/request

                                           61
62
LucidFind




http://search.lucidimagination.com/?q=novajug


                                                63
64

More Related Content

What's hot

Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
Apache Solr! Enterprise Search Solutions at your Fingertips!
Apache Solr! Enterprise Search Solutions at your Fingertips!Apache Solr! Enterprise Search Solutions at your Fingertips!
Apache Solr! Enterprise Search Solutions at your Fingertips!Murshed Ahmmad Khan
 
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)Alexandre Rafalovitch
 
Integrating the Solr search engine
Integrating the Solr search engineIntegrating the Solr search engine
Integrating the Solr search engineth0masr
 
Solr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approachSolr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approachAlexandre Rafalovitch
 
What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0Erik Hatcher
 
Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)Erik Hatcher
 
20130310 solr tuorial
20130310 solr tuorial20130310 solr tuorial
20130310 solr tuorialChris Huang
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
Solr 6 Feature Preview
Solr 6 Feature PreviewSolr 6 Feature Preview
Solr 6 Feature PreviewYonik Seeley
 
Solr Flair: Search User Interfaces Powered by Apache Solr
Solr Flair: Search User Interfaces Powered by Apache SolrSolr Flair: Search User Interfaces Powered by Apache Solr
Solr Flair: Search User Interfaces Powered by Apache SolrErik Hatcher
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development TutorialErik Hatcher
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 

What's hot (20)

Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Apache Solr! Enterprise Search Solutions at your Fingertips!
Apache Solr! Enterprise Search Solutions at your Fingertips!Apache Solr! Enterprise Search Solutions at your Fingertips!
Apache Solr! Enterprise Search Solutions at your Fingertips!
 
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
 
Integrating the Solr search engine
Integrating the Solr search engineIntegrating the Solr search engine
Integrating the Solr search engine
 
Solr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approachSolr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approach
 
What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0
 
Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)
 
Apache Solr
Apache SolrApache Solr
Apache Solr
 
20130310 solr tuorial
20130310 solr tuorial20130310 solr tuorial
20130310 solr tuorial
 
Apache Solr
Apache SolrApache Solr
Apache Solr
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
Solr 6 Feature Preview
Solr 6 Feature PreviewSolr 6 Feature Preview
Solr 6 Feature Preview
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr Workshop
 
Solr Flair: Search User Interfaces Powered by Apache Solr
Solr Flair: Search User Interfaces Powered by Apache SolrSolr Flair: Search User Interfaces Powered by Apache Solr
Solr Flair: Search User Interfaces Powered by Apache Solr
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development Tutorial
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
Solr Recipes
Solr RecipesSolr Recipes
Solr Recipes
 
Solr 4
Solr 4Solr 4
Solr 4
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 

Viewers also liked

Introduction to Apache Solr.
Introduction to Apache Solr.Introduction to Apache Solr.
Introduction to Apache Solr.ashish0x90
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache SolrChristos Manios
 
Beyond text similarity
Beyond text similarityBeyond text similarity
Beyond text similaritychristianuhlcc
 
How we (Almost) Forgot Lambda Architecture and used Elasticsearch
How we (Almost) Forgot Lambda Architecture and used ElasticsearchHow we (Almost) Forgot Lambda Architecture and used Elasticsearch
How we (Almost) Forgot Lambda Architecture and used ElasticsearchMichael Stockerl
 
An Introduction to Solr
An Introduction to SolrAn Introduction to Solr
An Introduction to Solrtomhill
 
Webinar: Solr's example/files: From bin/post to /browse and Beyond
Webinar: Solr's example/files: From bin/post to /browse and BeyondWebinar: Solr's example/files: From bin/post to /browse and Beyond
Webinar: Solr's example/files: From bin/post to /browse and BeyondLucidworks
 
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...Lucidworks
 
Retrieving Information From Solr
Retrieving Information From SolrRetrieving Information From Solr
Retrieving Information From SolrRamzi Alqrainy
 
Parallel Computing with SolrCloud: Presented by Joel Bernstein, Alfresco
Parallel Computing with SolrCloud: Presented by Joel Bernstein, AlfrescoParallel Computing with SolrCloud: Presented by Joel Bernstein, Alfresco
Parallel Computing with SolrCloud: Presented by Joel Bernstein, AlfrescoLucidworks
 
Scaling search to a million pages with Solr, Python, and Django
Scaling search to a million pages with Solr, Python, and DjangoScaling search to a million pages with Solr, Python, and Django
Scaling search to a million pages with Solr, Python, and Djangotow21
 
Curso Formacion Apache Solr
Curso Formacion Apache SolrCurso Formacion Apache Solr
Curso Formacion Apache SolrEmpathyBroker
 
Apache Solr Search Course Drupal 7 Acquia
Apache Solr Search Course Drupal 7 AcquiaApache Solr Search Course Drupal 7 Acquia
Apache Solr Search Course Drupal 7 AcquiaDropsolid
 
Webinar: Natural Language Search with Solr
Webinar: Natural Language Search with SolrWebinar: Natural Language Search with Solr
Webinar: Natural Language Search with SolrLucidworks
 
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...Lucidworks
 

Viewers also liked (20)

Introduction to Apache Solr.
Introduction to Apache Solr.Introduction to Apache Solr.
Introduction to Apache Solr.
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
Beyond text similarity
Beyond text similarityBeyond text similarity
Beyond text similarity
 
How we (Almost) Forgot Lambda Architecture and used Elasticsearch
How we (Almost) Forgot Lambda Architecture and used ElasticsearchHow we (Almost) Forgot Lambda Architecture and used Elasticsearch
How we (Almost) Forgot Lambda Architecture and used Elasticsearch
 
An Introduction to Solr
An Introduction to SolrAn Introduction to Solr
An Introduction to Solr
 
Apache solr
Apache solrApache solr
Apache solr
 
Solr5
Solr5Solr5
Solr5
 
Webinar: Solr's example/files: From bin/post to /browse and Beyond
Webinar: Solr's example/files: From bin/post to /browse and BeyondWebinar: Solr's example/files: From bin/post to /browse and Beyond
Webinar: Solr's example/files: From bin/post to /browse and Beyond
 
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
 
Retrieving Information From Solr
Retrieving Information From SolrRetrieving Information From Solr
Retrieving Information From Solr
 
Parallel Computing with SolrCloud: Presented by Joel Bernstein, Alfresco
Parallel Computing with SolrCloud: Presented by Joel Bernstein, AlfrescoParallel Computing with SolrCloud: Presented by Joel Bernstein, Alfresco
Parallel Computing with SolrCloud: Presented by Joel Bernstein, Alfresco
 
Scaling search to a million pages with Solr, Python, and Django
Scaling search to a million pages with Solr, Python, and DjangoScaling search to a million pages with Solr, Python, and Django
Scaling search to a million pages with Solr, Python, and Django
 
Curso Formacion Apache Solr
Curso Formacion Apache SolrCurso Formacion Apache Solr
Curso Formacion Apache Solr
 
Apache Solr Search Course Drupal 7 Acquia
Apache Solr Search Course Drupal 7 AcquiaApache Solr Search Course Drupal 7 Acquia
Apache Solr Search Course Drupal 7 Acquia
 
Solr Presentation
Solr PresentationSolr Presentation
Solr Presentation
 
Webinar: Natural Language Search with Solr
Webinar: Natural Language Search with SolrWebinar: Natural Language Search with Solr
Webinar: Natural Language Search with Solr
 
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
 
Seminario Apache Solr
Seminario Apache SolrSeminario Apache Solr
Seminario Apache Solr
 
Formación apache Solr
Formación apache SolrFormación apache Solr
Formación apache Solr
 

Similar to Solr Powered Lucene

Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr WorkshopJSGB
 
Solr search engine with multiple table relation
Solr search engine with multiple table relationSolr search engine with multiple table relation
Solr search engine with multiple table relationJay Bharat
 
Apache Solr - Enterprise search platform
Apache Solr - Enterprise search platformApache Solr - Enterprise search platform
Apache Solr - Enterprise search platformTommaso Teofili
 
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Shalin Shekhar Mangar
 
Apache Solr 1.4 – Faster, Easier, and More Versatile than Ever
Apache Solr 1.4 – Faster, Easier, and More Versatile than EverApache Solr 1.4 – Faster, Easier, and More Versatile than Ever
Apache Solr 1.4 – Faster, Easier, and More Versatile than EverLucidworks (Archived)
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesRahul Jain
 
Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrRahul Jain
 
Getting started faster with LucidWorks for Solr
Getting started faster with LucidWorks for SolrGetting started faster with LucidWorks for Solr
Getting started faster with LucidWorks for SolrLucidworks (Archived)
 
Oslo Solr MeetUp March 2012 - Solr4 alpha
Oslo Solr MeetUp March 2012 - Solr4 alphaOslo Solr MeetUp March 2012 - Solr4 alpha
Oslo Solr MeetUp March 2012 - Solr4 alphaCominvent AS
 
Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search Hortonworks
 
BigData Faceted Search Comparison between Apache Solr vs. ElasticSearch
BigData Faceted Search Comparison between Apache Solr vs. ElasticSearchBigData Faceted Search Comparison between Apache Solr vs. ElasticSearch
BigData Faceted Search Comparison between Apache Solr vs. ElasticSearchNetConstructor, Inc.
 
Battle of the giants: Apache Solr vs ElasticSearch
Battle of the giants: Apache Solr vs ElasticSearchBattle of the giants: Apache Solr vs ElasticSearch
Battle of the giants: Apache Solr vs ElasticSearchRafał Kuć
 
Meet Solr For The Tirst Again
Meet Solr For The Tirst AgainMeet Solr For The Tirst Again
Meet Solr For The Tirst AgainVarun Thacker
 
OpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve content
OpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve contentOpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve content
OpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve contentAlkacon Software GmbH & Co. KG
 
Rapid prototyping with solr - By Erik Hatcher
Rapid prototyping with solr -  By Erik Hatcher Rapid prototyping with solr -  By Erik Hatcher
Rapid prototyping with solr - By Erik Hatcher lucenerevolution
 

Solr Powered Lucene

  • 1. Solr Powered Lucene • Erik Hatcher, Lucid Imagination, erik.hatcher@lucidimagination.com • Northern Virginia Java Users Group, December 16, 2009
  • 2. Erik Hatcher • Member of Technical Staff, Lucid Imagination • Apache Lucene/Solr Committer • Member, Apache Software Foundation • Co-author, Lucene in Action and Java Development with Ant (Manning)
  • 3. A word from our sponsor... • Pizza! • Lucid Imagination • commercial entity exclusively dedicated to Apache Lucene/Solr open source search technology • Services: Technical Support, Expert Link, Training, Consulting • Tools: LucidGaze for Lucene and Solr • Free certified distributions of Solr and Lucene • more to come...
  • 4. What is Solr? • Search server • Built upon Apache Lucene (Java) • Fast, very • Scalable in both query load and collection size • Interoperable • Extensible
  • 6. Solr Example • Start Solr: java -jar start.jar (Apache Solr distro) or lucidworks start (Lucid certified distro) • Index the example documents: java -jar post.jar *.xml • HTML view (via Solritas): easily enabled in the Apache distro, built into the Lucid certified distro
  • 7. Solr History • Created by Yonik Seeley for CNET • Contributed to Apache in January 2006 • December 2006: Version 1. • June 2007: Version 1.2 • September 2008: Version 1.3 • November 2009: Version 1.4
  • 8. Features • Lucene power exposed over HTTP • Spell checking, highlighting, more-like-this • Caching • Replication • Faceting • Distributed search
  • 9. Solr APIs • HTTP GET/POST (curl or any other HTTP client) • JSON • SolrJ (embedded or HTTP) • Ruby: solr-ruby, RSolr, etc. • Many others: Python, PHP, solrsharp, XSLT
  • 10. Deployment Architecture • Scales from a single Solr server, to master/replicants (slaves), to distributed shards • Each Solr instance can also have multiple cores
  • 12. Concepts • Index • Document • Field • Terms (aka Tokens)
  • 13. Inverted Index (figure from "Taming Text" by Grant Ingersoll and Tom Morton) • Commonly used search engine data structure • Efficient lookup of terms across a large number of documents • Usually stores positional information to enable phrase/proximity queries
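
The inverted index described on this slide can be sketched in a few lines of plain Java. This is only an illustration of the data structure, not how Lucene stores its index: the class name `TinyInvertedIndex`, the tokenization rule (lowercase, split on non-alphanumerics), and the in-memory maps are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy positional inverted index: term -> (document id -> positions). Hypothetical, for illustration. */
class TinyInvertedIndex {
    private final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();

    /** Lowercases the text, splits on non-alphanumeric characters, records each token's position. */
    void add(int docId, String text) {
        String[] tokens = text.toLowerCase().split("[^a-z0-9]+");
        int position = 0;
        for (String token : tokens) {
            if (token.isEmpty()) {
                continue; // split() can yield a leading empty string
            }
            postings.computeIfAbsent(token, t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(position++);
        }
    }

    /** Documents containing the term, each with the positions where the term occurs. */
    Map<Integer, List<Integer>> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(), new TreeMap<>());
    }

    /** True if 'second' occurs immediately after 'first' in the document (a two-word phrase match). */
    boolean hasPhrase(int docId, String first, String second) {
        List<Integer> a = lookup(first).getOrDefault(docId, Collections.emptyList());
        List<Integer> b = lookup(second).getOrDefault(docId, Collections.emptyList());
        for (int pos : a) {
            if (b.contains(pos + 1)) {
                return true;
            }
        }
        return false;
    }
}
```

Looking up a term is a single map access regardless of how many documents were added, and the stored positions are what make the phrase check possible.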
  • 15. What's in a token?
  • 16. Lucene Scoring (figure: document vector d1 and query vector q1 separated by angle Θ)
  • 17. Relevance • Term frequency (TF): number of times a term appears in a document • Inverse document frequency (IDF): one over the number of documents in the index that contain the term (1/df) • Field length normalization: controls the effect that field length, in number of terms, has on the score • Boost factors: terms, fields, or documents
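
The four relevance factors above can be combined in a small worked example. This sketch follows the slide's definitions literally (raw term count, 1/df); the length normalization of 1/sqrt(number of terms) and the class name `SimpleRelevance` are assumptions for illustration, and Lucene's actual similarity implementation dampens these factors differently.

```java
import java.util.Collections;
import java.util.List;

/** Simplified single-term scoring: tf * idf * lengthNorm * boost. Hypothetical, not Lucene's formula. */
class SimpleRelevance {

    /** Number of times the term appears in one document. */
    static int termFrequency(List<String> docTerms, String term) {
        return Collections.frequency(docTerms, term);
    }

    /** Number of documents in the index that contain the term. */
    static int documentFrequency(List<List<String>> index, String term) {
        int df = 0;
        for (List<String> doc : index) {
            if (doc.contains(term)) {
                df++;
            }
        }
        return df;
    }

    /** Scores one document for one term; returns 0 when the term does not occur in it. */
    static double score(List<List<String>> index, List<String> doc, String term, double boost) {
        int tf = termFrequency(doc, term);
        int df = documentFrequency(index, term);
        if (tf == 0 || df == 0) {
            return 0.0;
        }
        double idf = 1.0 / df;                          // rarer terms weigh more
        double lengthNorm = 1.0 / Math.sqrt(doc.size()); // assumed: shorter fields weigh more
        return tf * idf * lengthNorm * boost;
    }
}
```

With these definitions a term that appears twice in a four-term document can outscore a single occurrence in a three-term document, and a boost of 2.0 simply doubles the result.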
  • 18. Solr Core • single primary index • schema.xml / solrconfig.xml • (optionally) multiple cores per Solr instance, configured in solr.xml • other configuration and data files
  • 19. schema.xml • Field types • Fields • Unique key (optional, though I've never seen a case that didn't require a unique identifier per document) • copy fields • similarity and Solr query parser configuration
  • 20. Schema Analysis • http://localhost:8983/solr/admin/analysis.jsp • Document analysis request handler: curl http://localhost:8983/solr/analysis/document --data-binary @ipod_video.xml -H 'Content-type:text/xml; charset=utf-8' • Field analysis request handler: http://localhost:8983/solr/analysis/field?analysis.fieldtype=text&analysis.fieldvalue=Foo%20Bar&q=foo&analysis.showmatch=true
  • 21. solrconfig.xml • Lucene indexing parameters • Cache settings • Request handler configuration • HTTP cache settings • Search components, response writers, query parsers
  • 22. Request handlers • mini-“servlets” • SearchHandler extensions chain search components • Flexible response formatting: &wt=[json, ruby, xslt, php, phps, javabin, python, velocity]
  • 23. Solr XML
      <add><doc>
        <field name="id">MA147LL/A</field>
        <field name="name">Apple 60 GB iPod with Video Playback Black</field>
        <field name="manu">Apple Computer Inc.</field>
        <field name="cat">electronics</field>
        <field name="cat">music</field>
        <field name="features">iTunes, Podcasts, Audiobooks</field>
        <field name="features">Stores up to 15,000 songs, 25,000 photos, or 150 hours of video</field>
        <field name="features">2.5-inch, 320x240 color TFT LCD display with LED backlight</field>
        <field name="features">Up to 20 hours of battery life</field>
        <field name="features">Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, H.264 video</field>
        <field name="features">Notes, Calendar, Phone book, Hold button, Date display, Photo wallet, Built-in games, JPEG photo playback, Upgradeable firmware, USB 2.0 compatibility, Playback speed control, Rechargeable capability, Battery level indication</field>
        <field name="includes">earbud headphones, USB cable</field>
        <field name="weight">5.5</field>
        <field name="price">399.00</field>
        <field name="popularity">10</field>
        <field name="inStock">true</field>
      </doc></add>
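
When documents come from application data rather than hand-written files, the `<add><doc>` message has to be generated, and field values need XML escaping. The following is a minimal sketch using only the JDK; `SolrXmlBuilder` and its methods are hypothetical helpers, not part of Solr or SolrJ. Multi-valued fields such as `cat` are emitted as repeated `<field>` elements, as on the slide.

```java
import java.util.List;
import java.util.Map;

/** Builds a Solr XML add message from field names and values. Hypothetical helper for illustration. */
class SolrXmlBuilder {

    /** Escapes the characters that are unsafe in XML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;")   // must run first so later entities are not re-escaped
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Emits one field element per value, so multi-valued fields repeat the same name. */
    static String addDocument(Map<String, List<String>> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, List<String>> field : fields.entrySet()) {
            for (String value : field.getValue()) {
                xml.append("<field name=\"").append(escape(field.getKey())).append("\">")
                   .append(escape(value))
                   .append("</field>");
            }
        }
        return xml.append("</doc></add>").toString();
    }
}
```

Passing a `LinkedHashMap` keeps the fields in insertion order, which makes the output easier to compare against a hand-written file.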
  • 24. Indexing Solr XML • Via curl: curl 'http://localhost:8983/solr/update?commit=true' --data-binary @ipod_video.xml -H 'Content-type:text/xml; charset=utf-8' • Via Solr's Java-based post tool: java -jar post.jar ipod_video.xml
  • 25. Indexing CSV • curl 'http://localhost:8983/solr/update/csv?commit=true' --data-binary @books.csv -H 'Content-type:text/plain; charset=utf-8'
  • 26. Content Streams • Allows Solr server to fetch local or remote data itself. Must enable remote streaming in solrconfig.xml • http://localhost:8983/solr/update • ?stream.file=<local Solr path to exampledocs>/ipod_video.xml • ?stream.url=<url to content> • Security warning: allows Solr to fetch arbitrary server-side file or network URL content
  • 27. Indexing with SolrJ
      SolrServer solr = new CommonsHttpSolrServer(
          new URL("http://localhost:8983/solr"));
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "EXAMPLEDOC01");
      doc.addField("title", "NOVAJUG SolrJ Example");
      solr.add(doc);
      solr.commit();   // after a batch, not per document
      solr.optimize(); // periodically, if/when needed
  • 28. Indexing with solr-ruby
      solr = Connection.new(
        'http://localhost:8983/solr',
        :autocommit => :on)
      solr.add(:id => 123, :title => 'Solr in Action')
      solr.optimize # periodically, as needed
  • 29. delete, update, etc. • Delete by ID: <delete><id>05991</id></delete> • Delete by query: <delete><query>category:Unused</query></delete> • Delete everything with the post tool: java -Ddata=args -jar post.jar "<delete><query>*:*</query></delete>" • Update: simply <add> a doc with the same unique key • <commit/> pending documents • <optimize/> the index: squeezes out deleted documents, collapses segments • <rollback/> to the last commit point • Update commands via GET: http://localhost:8983/solr/update?stream.body=<commit/>
  • 30. Data Import Handler • Indexes relational database, XML data, and e-mail sources • Supports full and incremental/delta indexing • Highly extensible with custom data sources, transformers, etc. • http://wiki.apache.org/solr/DataImportHandler
  • 31. DIH details • Put JDBC driver JAR in <solr-home>/lib, configure dataimport request handler • http://localhost:8983/solr/db/admin/dataimport.jsp - debugging console • http://localhost:8983/solr/db/dataimport?command=full-import - removes all documents and imports from scratch
  • 32. Solr Cell • aka ExtractingRequestHandler • leveraging Tika, extracts and indexes rich documents such as Word, PDF, HTML, and many other types • curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true' -F "myfile=@tutorial.html" • http://wiki.apache.org/solr/ExtractingRequestHandler
  • 33. Standard Search Request • http://localhost:8983/solr/select?q=query
  • 34. Debug Query • &debugQuery=true is your friend • Includes parsed query, explanations, and search component timings in response
  • 35. Searching • Send GET HTTP requests • http://localhost:8983/solr/select?q=solr&start=0&rows=10&fl=id,name • start: zero-based starting result • rows: number of hits to return • fl: list of stored fields to return
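
Query strings with colons, brackets, or spaces must be URL-encoded before they go into the `q` parameter. Here is a small sketch of assembling the select URL with only the JDK; `SelectUrl.build` is a hypothetical helper, and the base URL is the example server from the slide. Note that `URLEncoder` also encodes the comma in `fl` as `%2C`, which the server decodes back to a comma.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Assembles a /select URL from q, start, rows and fl. Hypothetical helper for illustration. */
class SelectUrl {

    static String build(String baseUrl, String q, int start, int rows, String... fields) {
        try {
            String fl = String.join(",", fields);
            return baseUrl + "/select"
                    + "?q=" + URLEncoder.encode(q, "UTF-8")   // encodes ':', '[', ']' and spaces
                    + "&start=" + start                       // zero-based offset of the first hit
                    + "&rows=" + rows                         // number of hits to return
                    + "&fl=" + URLEncoder.encode(fl, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }
}
```

Paging through results is then a matter of calling `build` again with `start` advanced by `rows`.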
  • 36. Query Parser • Controlled by defType parameter • &defType=lucene (actually a Solr extension of Lucene’s QueryParser) • &defType=dismax • Local {!...} override syntax
  • 37. Solr Query Parser • http://lucene.apache.org/java/2_9_1/queryparsersyntax.html + Solr extensions • Kitchen sink parser, includes advanced user-unfriendly syntax • Syntax errors throw parse exceptions back to client • Example: title:ipod* AND price:[0 TO 100] • http://wiki.apache.org/solr/SolrQuerySyntax
  • 38. Dismax Query Parser • Simplified syntax: loose text “quote phrases” -prohibited +required • Spreads query terms across query fields (qf) with dynamic boosting per field, phrase construction (pf), and boosting query and function capabilities (bq and bf)
  • 39. Searching with SolrJ
      SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
      SolrQuery params = new SolrQuery("author:John");
      params.setFields("*,score");
      params.setRows(3);
      QueryResponse response = server.query(params);
      for (SolrDocument document : response.getResults()) {
        System.out.println("Doc: " + document);
      }
  • 40. Searching with Ruby
      conn = Connection.new('http://localhost:8983/solr')
      conn.query('my query') do |hit|
        puts hit.inspect
      end
  • 41. Built-in search components • Standard: query, facet, mlt, highlight, stats, debug • Others: elevation, clustering, term, term vector
  • 42. Faceting • Counts per subset within results • Facet on: field terms, queries, date ranges • &facet=on&facet.field=cat&facet.query=price:[0 TO 100] • http://wiki.apache.org/solr/SimpleFacetParameters
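
To make "counts per subset within results" concrete, the sketch below computes a field facet and a query facet over an in-memory result set. It only illustrates what the counts mean, not how Solr computes them; `FacetCounter`, the `doc` helper, and the representation of a document as a map of field name to values are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

/** Counts facet values over a list of matching documents. Hypothetical, for illustration only. */
class FacetCounter {

    /** Builds a document from alternating field names and values; repeated names become multi-valued. */
    static Map<String, List<String>> doc(String... keyValues) {
        Map<String, List<String>> d = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            d.computeIfAbsent(keyValues[i], k -> new ArrayList<>()).add(keyValues[i + 1]);
        }
        return d;
    }

    /** Field facet (like facet.field=cat): number of result documents per distinct field value. */
    static Map<String, Integer> facetField(List<Map<String, List<String>>> results, String field) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, List<String>> d : results) {
            for (String value : d.getOrDefault(field, Collections.emptyList())) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Query facet (like facet.query=price:[0 TO 100]): number of result documents matching a condition. */
    static long facetQuery(List<Map<String, List<String>>> results,
                           Predicate<Map<String, List<String>>> query) {
        return results.stream().filter(query).count();
    }
}
```

A document with two `cat` values contributes to both counts, which is why facet counts for a multi-valued field can add up to more than the number of hits.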
  • 43. Spell checking • http://localhost:8983/solr/spell?q=epod&spellcheck=on&spellcheck.build=true • File or index-based dictionaries • Supports pluggable distance algorithms: Levenshtein and JaroWinkler • http://wiki.apache.org/solr/
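
For readers unfamiliar with the Levenshtein distance mentioned above, here is a standalone sketch of the classic dynamic-programming computation, plus a naive suggestion step that picks the dictionary word closest to the misspelling. The class `EditDistance` and the linear scan over the dictionary are illustrative assumptions, not the spell checker's actual implementation.

```java
import java.util.List;

/** Levenshtein edit distance and a naive closest-word suggestion. Hypothetical, for illustration. */
class EditDistance {

    /** Minimum number of single-character inserts, deletes, or substitutions turning a into b. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning the empty prefix of a into b[0..j) takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into the empty string takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the dictionary word with the smallest distance to the input, or null if the list is empty. */
    static String suggest(String word, List<String> dictionary) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : dictionary) {
            int d = distance(word, candidate);
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return best;
    }
}
```

The query on the slide, `epod`, is one substitution away from `ipod`, which is why a distance-based checker would offer it as a correction.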
  • 44. Highlighting • http://localhost:8983/solr/select?q=apple&hl=on&hl.fl=* • http://wiki.apache.org/solr/HighlightingParameters
  • 45. More Like This • http://localhost:8983/solr/select?q=ipod&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mintf=1&fl=id,score,name • http://wiki.apache.org/solr/MoreLikeThis
  • 46. Query Elevation • http://localhost:8983/solr/elevate?q=ipod&debugQuery=true&enableElevation=true • Configure an “elevate.xml” to boost/exclude specific documents • http://wiki.apache.org/solr/QueryElevationComponent
  • 47. Clustering • Dynamic grouping of documents into labeled sets • http://localhost:8983/solr/clustering?q=*:*&rows=10 • http://wiki.apache.org/solr/ClusteringComponent • Requires additional steps to install (see documentation) with Apache Solr distro
  • 48. Terms • Enumerates terms from specified fields • http://localhost:8983/solr/terms?terms.fl=name&terms.sort=index&terms.prefix=vi
  • 49. Term Vectors • Details term vector information: term frequency, document frequency, position and offset information • http://localhost:8983/solr/select/?q=*%3A*&qt=tvrh&tv=true&tv.all=true
  • 50. stats.jsp • Not technically a “request handler”, outputs only XML • http://localhost:8983/solr/admin/stats.jsp • Index stats such as number of documents, searcher open time • Request handler details: number of requests and errors, average request time
  • 51. Replication • Master is polled • Replicant pulls Lucene index and optionally also Solr configuration files • Query throughput scaling: replicate and load balance • http://wiki.apache.org/solr/SolrReplication
  • 52. Distributed Search • Distribute documents to same-schema shards • Scaling for when a single index becomes too large, or a single query becomes too slow • http://wiki.apache.org/solr/DistributedSearch
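
Distributing documents across shards implies two client-side decisions: which shard receives a given document at index time, and which shards a query should fan out to. The sketch below shows one possible approach, assuming routing by a hash of the unique key and Solr's `shards` request parameter (a comma-separated list of shard addresses). `ShardRouter` is a hypothetical helper, the hash-based routing is an assumption rather than something the slide prescribes, and the second port number is made up for the example.

```java
import java.util.List;

/** Client-side shard selection and shards-parameter construction. Hypothetical, for illustration. */
class ShardRouter {

    /** Picks a shard index in [0, numShards) from the document's unique key. */
    static int shardFor(String docId, int numShards) {
        // floorMod keeps the result non-negative even when hashCode() is negative
        return Math.floorMod(docId.hashCode(), numShards);
    }

    /** Builds the shards request parameter listing every shard a query should be sent to. */
    static String shardsParam(List<String> shardAddresses) {
        return "shards=" + String.join(",", shardAddresses);
    }
}
```

Because the same key always hashes to the same shard, re-adding a document with an existing unique key replaces it on the shard that already holds it rather than creating a duplicate elsewhere.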
  • 53. What’s new in Solr 1.4? • Java-based replication • StatsComponent • VelocityResponseWriter (Solritas) • TermVectorComponent • AJAX-Solr • Configurable Directory provider • Rollback, since last commit • Logging switched to SLF4J
  • 54. Lucene 2.9 • IndexReader#reopen() • Faster filter performance, by 300% in some cases • Per-segment FieldCache • Reusable token streams • Faster numeric/date range queries, thanks to trie • and tons more, see Lucene 2.9's CHANGES.txt
  • 55. Performance Improvements • Caching • Concurrent file access • Per-segment index updates • Faceting • DocSet generation, avoids scoring • Streaming updates for SolrJ
  • 56. Feature Improvements • Rich document indexing • Multi-select faceting • DataImportHandler enhancements • Speedier range queries • Smoother replication • Duplicate detection • More choices for logging • New request handler components
  • 57. Resources • http://wiki.apache.org/solr • solr-user@lucene.apache.org • Lucid Imagination • http://www.lucidimagination.com • Articles, webinars, blogs, and... • Search the Lucene ecosystem at: http://search.lucidimagination.com • support@lucidimagination.com
  • 58. Shout Out
  • 59. E-book now available, print coming soon: http://www.manning.com/lucene
  • 60. LucidWorks for Solr • Certified Distribution • Value-added integration • KStemmer • Carrot2 clustering • LucidGaze for Solr • installer • Reference Manual • Solr 1.4 certified distro coming soon!
  • 61. LucidGaze for Solr • Monitoring tool that captures, stores, and interactively displays Solr performance metrics • requests/second • time/request