What's New in Solr?


        code4lib 2011 preconference
               Bloomington, IN
presented by Erik Hatcher of Lucid Imagination
about me
spoken at several code4lib conferences

   Keynoted Athens '07 along with the pioneering Solr preconference,

   Providence '09, "Rising Sun"

   pre-conferenced Asheville '10, "Solr Black Belt"

co-authored "Lucene in Action", first edition; ghost/toast on second edition

Lucene and Solr committer.

library-world claims to fame: founded and named Blacklight; original developer on
Collex and the Rossetti Archive search

now at Lucid Imagination, dedicated to Lucene/Solr support/services/training/etc
abstract
The library world is fired up about Solr. Practically every next-gen catalog is
using it (via Blacklight, VuFind, or other technologies). Solr has continued
improving in some dramatic ways, including geospatial support, field
collapsing/grouping, extended dismax query parsing, pivot/grid/matrix/tree
faceting, autosuggest, and more. This session will cover all of these new
features, showcasing live examples of them all, including anything new that is
implemented prior to the conference.
LIA2 - Lucene in Action
Published: July 2010 - http://www.manning.com/lucene/
New in this second edition:
   Performing hot backups
   Using numeric fields
   Tuning for indexing or searching speed
   Boosting matches with payloads
   Creating reusable analyzers
   Adding concurrency with threads
   Four new case studies, and more
Version Number
Which one ya talking 'bout, Willis?

  3.1? 4.0?? TRUNK??

playing with fire

  index format changes to be expected

     reindexing recommended/required

Solr/Lucene merged development codebases

  releases should occur lock-step moving forward
dependencies

November 2009: Solr 1.4 (Lucene 2.9.1)

June 2010: Solr 1.4.1 (Lucene 2.9.3)

Spring 2011(?): Solr 3.1 (Lucene 3.1)

TRUNK: Solr 4.x (Lucene TRUNK)
lucene
per-segment field cache, etc

Unicode and analysis improvements throughout

Analysis "attributes"

AutomatonQuery: RegexpQuery, WildcardQuery

flexible indexing

and so much more!
README

Reindex!

Upgrade SolrJ libraries too (javabin format
changed)

Read Lucene and Solr's CHANGES.txt files for all
the details
Analysis

UAX, using ICU

CollationKey

PatternReplaceCharFilter

KeywordMarkerFilterFactory,
StemmerOverrideFilterFactory
Standard tokenization

ClassicTokenizer: old StandardTokenizer

StandardTokenizer: now uses Unicode text
segmentation specified by UAX#29

UAX29URLEmailTokenizer

maxTokenLength: default=255
PathHierarchyTokenizer


delimiter: default=/

replace: default=<delimiter>

"/foo/bar" => [/foo] [/foo/bar]
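A toy sketch (not Solr's implementation) of what PathHierarchyTokenizer emits: each prefix of the path becomes its own token, with the delimiter optionally swapped for a replacement character.

```python
def path_hierarchy_tokens(path, delimiter="/", replace=None):
    # Emit one token per path prefix, e.g. "/foo/bar" -> [/foo] [/foo/bar].
    # `replace` (default: the delimiter itself) substitutes the delimiter
    # in the emitted tokens.
    replace = delimiter if replace is None else replace
    parts = [p for p in path.split(delimiter) if p]
    tokens, prefix = [], ""
    for part in parts:
        prefix += replace + part
        tokens.append(prefix)
    return tokens
```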
CollationKeyFilter
A filter that lets one specify:

   A system collator associated with a locale, or

   A collator based on custom rules

This can be used to change sort order for non-English languages, as well as to
modify the collation sequence for certain languages. You must use the same
CollationKeyFilter at both index time and query time for correct results. Also,
the JVM vendor and version (including patch version) on the slave should exactly
match the master (or indexer) for consistent results.

http://wiki.apache.org/solr/UnicodeCollation

see also: ICUCollationKeyFilter
ICU
International Components for Unicode

ICUFoldingFilter

ICUNormalizer2Filter

  name=nfc|nfkc|nfkc_cf

  mode=compose|decompose

  filter
ICUFoldingFilter
Accent removal, case folding, canonical duplicates folding, dashes folding,
diacritic removal (including stroke, hook, descender), Greek letterforms
folding, Han Radical folding, Hebrew Alternates folding, Jamo folding,
Letterforms folding, Math symbol folding, Multigraph Expansions: All, Native
digit folding, No-break folding, Overline folding, Positional forms folding,
Small forms folding, Space folding, Spacing Accents folding, Subscript folding,
Superscript folding, Suzhou Numeral folding, Symbol folding, Underline folding,
Vertical forms folding, Width folding

Additionally, Default Ignorables are removed, and text is normalized to NFKC.

 All foldings, case folding, and normalization mappings are applied recursively
to ensure a fully folded and normalized result.
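A rough stdlib-only approximation (not ICU, and far from the full folding list above) of a few of these steps: compatibility decomposition, diacritic removal, case folding, then recomposition.

```python
import unicodedata

def approx_fold(text):
    # Compatibility-decompose (handles width/ligature folding), drop
    # combining marks (diacritic removal), case-fold, recompose via NFKC.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed
                       if not unicodedata.combining(ch))
    return unicodedata.normalize("NFKC", stripped.casefold())
```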
ICUTransformFilter
id: specific transliterator identifier from ICU's
Transliterator#getAvailableIDs() (required)

direction=forward|reverse

Examples:

  Traditional-Simplified:         =>

  Cyrillic-Latin: Российская Федерация =>
  Rossijskaâ Federaciâ
Tom Burton-West's
latest

ICU

shingles

query parser

ABC -> [A] [B] [C] or [AB] [BC]...
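The shingle idea above, sketched in Python: adjacent tokens are joined into overlapping n-grams (bigrams by default), so ABC yields [AB] [BC].

```python
def shingles(tokens, size=2):
    # Word n-gram "shingles": overlapping windows of `size` tokens.
    return [" ".join(tokens[i:i + size])
            for i in range(len(tokens) - size + 1)]
```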
highlighter


old highlighter configuration deprecated; now configured as a standard
search component

FastVectorHighlighter
FastVectorHighlighter

if termVectors="true", termPositions="true", and
termOffsets="true"

and hl.useFastVectorHighlighter=true

  hl.fragListBuilder

  hl.fragmentsBuilder
spatial
JTeam's plugin: packaged for easy deployment

Solr trunk capabilities

many distance functions

What's missing?

  geo faceting? scoring by distance? distance
  pseudo-field?

All units in kilometers, unless otherwise specified
Spatial field types

Point: n-dimensional, must specify dimension
(default=2), represented by N subfields internally

LatLon: latitude,longitude, represented by two
subfields internally, single valued only

GeoHash: single string representation of lat/lon
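A minimal sketch of the GeoHash encoding idea: lat/lon are range-bisected, the bits interleaved (longitude first), and packed five at a time into base-32 characters, yielding a single sortable string.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash base-32 alphabet

def geohash(lat, lon, precision=12):
    # Interleave longitude/latitude bisection bits, 5 bits per character.
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    out, bits, ch, even = [], 0, 0, True
    while len(out) < precision:
        rng, val = (lon_rng, lon) if even else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        ch <<= 1
        if val >= mid:
            ch |= 1
            rng[0] = mid
        else:
            rng[1] = mid
        even = not even
        bits += 1
        if bits == 5:
            out.append(BASE32[ch])
            bits, ch = 0, 0
    return "".join(out)
```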
Spatial query parsers
geofilt: exact filtering

bbox: uses (trie) range queries

Parameters:

  sfield: spatial field

  pt: reference point

  d: distance
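A hedged sketch of the kind of computation behind geofilt's exact filtering: great-circle (haversine) distance in kilometers from the reference point `pt`, keeping docs within `d`. The `docs`/`sfield` data shapes here are made up for illustration.

```python
import math

def geodist_km(lat1, lon1, lat2, lon2):
    # Haversine great-circle distance, kilometers.
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def geofilt(docs, sfield, pt, d):
    # pt = (lat, lon) reference point; d = distance in km.
    return [doc for doc in docs if geodist_km(*pt, *doc[sfield]) <= d]
```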
field collapsing/grouping
backwards compatibility mode?

http://wiki.apache.org/solr/FieldCollapsing

group=true

group.field / group.func / group.query

rows / start: for groups, not documents

group.limit: number of results per group

group.offset: offset into doclist of each group

sort: how to sort groups, by top document in each group

group.sort: how to sort docs within each group

group.format: grouped | simple

group.main=true|false

faceting works as normal

not distributed savvy yet
query parsing


TextField: autoGeneratePhraseQueries="true"

  if single string analyzes to multiple tokens
{!raw|term|field f=$f}...
Recall why we needed {!raw} from last year

<fieldType .../> - use one string, one numeric, (and one text?)

<field name="..."/>

table for numeric and for string (and text?):

    {!raw f=$f} | TermQuery(...)

    {!term f=$f} | ...

    {!field f=$f} | ...

Which to use when? {!raw} works for strings just fine, but best to migrate to the generally
safer/wiser {!term} for future-proofing.
{!term f=field}


fq={!term f=weight}1.5
dismax

q.op or schema.xml's <solrQueryParser
defaultOperator="[AND|OR]"/> defaults mm to 0%
(OR) or 100% (AND)

#code4lib: issues with non-analyzed fields in qf
edismax
Supports full Lucene query syntax in the absence of syntax errors


supports "and"/"or" to mean "AND"/"OR" in lucene syntax mode


When there are syntax errors, improved smart partial escaping of special characters is done to prevent
them... in this mode, fielded queries, +/-, and phrase queries are still supported.


Improved proximity boosting via word bigrams... this prevents the problem of needing 100% of the words in
the document to get any boost, as well as having all of the words in a single field.


advanced stopword handling... stopwords are not required in the mandatory part of the query but are still
used (if indexed) in the proximity boosting part. If a query consists of all stopwords (e.g. to be or not to be)
then all will be required.


Supports the "boost" parameter.. like the dismax bf param, but multiplies the function query instead of
adding it in


Supports pure negative nested queries... so a query like +foo (-foo) will match all documents
function queries


termfreq, tf, docfreq, idf, norm, maxdoc, numdocs

{!func}termfreq(text,ipod)

standard java.util.Math functions
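A toy model of what the relevance functions above expose, over docs represented as token lists (a sketch of the concepts, not Lucene's internals): termfreq counts a term's occurrences in one doc, docfreq counts docs containing it, and idf follows the classic 1 + ln(numdocs / (docfreq + 1)) shape.

```python
import math

def termfreq(doc_tokens, term):
    # Occurrences of `term` in one document's token list.
    return doc_tokens.count(term)

def docfreq(docs, term):
    # Number of documents containing `term` at least once.
    return sum(1 for d in docs if term in d)

def idf(docs, term):
    # Classic Lucene-style inverse document frequency.
    return 1.0 + math.log(len(docs) / (docfreq(docs, term) + 1))
```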
faceting
per-segment, single-valued fields:

   facet.method=fcs (field cache per segment)

   facet.field={!threads=-1}field_name

       threads=0: direct execution

       threads=-1: thread per segment

   speeds up single and multivalued method=fc, especially for deep paging with
   facet.offset

date faceting improvements, generalized for numeric ranges too

can now exclude main query q={!tag=main}the+query&facet.field={!ex=main}category
pivot/grid/matrix/tree
faceting


is this also "hierarchical faceting"? it depends!
pivot faceting output
/select?q=*:*&rows=0&facet=on
&facet.pivot=cat,popularity,inStock
&facet.pivot=popularity,cat
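The shape of that pivot output can be sketched as nested counting: bucket docs by the first field, count each bucket, then recurse into the remaining fields. The dict-based `docs` shape is made up for illustration.

```python
def pivot_facets(docs, fields):
    # For each value of fields[0], count matching docs and recurse into
    # the remaining fields -- the tree facet.pivot returns.
    if not fields:
        return []
    field, rest = fields[0], fields[1:]
    buckets = {}
    for doc in docs:
        buckets.setdefault(doc.get(field), []).append(doc)
    return [{"field": field, "value": value, "count": len(group),
             "pivot": pivot_facets(group, rest)}
            for value, group in sorted(buckets.items(),
                                       key=lambda kv: -len(kv[1]))]
```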
spell checking


DirectSolrSpellChecker

  no external index needed, uses automaton on
  main index
spellcheck config
solrconfig.xml
  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">

    <str name="queryAnalyzerFieldType">textgen</str>

    <!-- a spellchecker that uses no auxiliary index -->
     <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">spell</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <str name="minPrefix">1</str>
    </lst>
  </searchComponent>
spellcheck handler

solrconfig.xml
  <requestHandler name="standard" class="solr.SearchHandler" default="true">
    <!-- default values for query parameters -->
     <lst name="defaults">
       <str name="echoParams">explicit</str>
       <str name="spellcheck">true</str>
       <str name="spellcheck.collate">true</str>
     </lst>

    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>
spellcheck response
http://localhost:8983/solr/select?q=ipud%20bluck&wt=ruby&indent=on
                        {
                            'responseHeader'=>{
                               'status'=>0,
                               'QTime'=>10,
                               'params'=>{
                                 'indent'=>'on',
                                 'wt'=>'ruby',
                                 'q'=>'ipud bluck'}},
                            'response'=>{'numFound'=>0,'start'=>0,'docs'=>[]
                            },
                            'spellcheck'=>{
                               'suggestions'=>[
                                 'ipud',{
                                   'numFound'=>1,
                                   'startOffset'=>0,
                                   'endOffset'=>4,
                                   'suggestion'=>['ipod']},
                                 'bluck',{
                                   'numFound'=>1,
                                   'startOffset'=>5,
                                   'endOffset'=>10,
                                   'suggestion'=>['black']},
                                 'collation','ipod black']}}
autosuggest

new "spellcheck" component, builds TST

collates query

can check whether collated suggestions yield results,
optionally providing hit counts
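A minimal ternary search tree with prefix lookup, sketching the in-memory structure the suggest component builds (a conceptual sketch, not Solr's Suggester code):

```python
class TSTNode:
    __slots__ = ("ch", "lo", "eq", "hi", "word")
    def __init__(self, ch):
        self.ch, self.lo, self.eq, self.hi, self.word = ch, None, None, None, None

class TST:
    def __init__(self):
        self.root = None

    def insert(self, word):
        self.root = self._insert(self.root, word, 0)

    def _insert(self, node, word, i):
        ch = word[i]
        if node is None:
            node = TSTNode(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, word, i)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, word, i)
        elif i + 1 < len(word):
            node.eq = self._insert(node.eq, word, i + 1)
        else:
            node.word = word
        return node

    def suggest(self, prefix):
        # All inserted words starting with `prefix`, sorted.
        node = self._find(self.root, prefix, 0)
        out = []
        if node is None:
            return out
        if node.word:
            out.append(node.word)
        self._collect(node.eq, out)
        return sorted(out)

    def _find(self, node, prefix, i):
        if node is None:
            return None
        ch = prefix[i]
        if ch < node.ch:
            return self._find(node.lo, prefix, i)
        if ch > node.ch:
            return self._find(node.hi, prefix, i)
        if i + 1 == len(prefix):
            return node
        return self._find(node.eq, prefix, i + 1)

    def _collect(self, node, out):
        if node is None:
            return
        self._collect(node.lo, out)
        if node.word:
            out.append(node.word)
        self._collect(node.eq, out)
        self._collect(node.hi, out)
```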
suggest config
solrconfig.xml
  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">

    <str name="queryAnalyzerFieldType">textgen</str>


    <lst name="spellchecker">
      <str name="name">suggest</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">
        org.apache.solr.spelling.suggest.jaspell.JaspellLookup
      </str>
      <str name="field">suggest</str>
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>

schema.xml
   <field name="suggest" type="textgen" indexed="true" stored="false"/>

   <copyField source="name" dest="suggest"/>
suggest handler
solrconfig.xml
  <requestHandler class="solr.SearchHandler" name="/suggest">
    <lst name="defaults">
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">suggest</str>
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.count">10</str>
      <str name="rows">0</str>
      <str name="spellcheck.maxCollationTries">20</str>
      <str name="spellcheck.maxCollations">10</str>
      <str name="spellcheck.collateExtendedResults">true</str>
    </lst>
    <arr name="components">
      <str>query</str> <!-- to allow suggestion hit counts to be returned -->
      <str>spellcheck</str>
    </arr>
  </requestHandler>
suggest response
http://localhost:8983/solr/suggest?q=ip&wt=ruby&indent=on
         {
             'responseHeader'=>{
                'status'=>0,
                'QTime'=>2},
             'response'=>{'numFound'=>0,'start'=>0,'docs'=>[]
             },
             'spellcheck'=>{
                'suggestions'=>[
                  'ip',{
                    'numFound'=>1,
                    'startOffset'=>0,
                    'endOffset'=>2,
                    'suggestion'=>['ipod']},
                  'collation',[
                    'collationQuery','ipod',
                    'hits',3,
                    'misspellingsAndCorrections',[
                      'ip','ipod']]]}}
sort

by function

  &q=*:*&sfield=store&pt=39.194564,-86.432947&sort=geodist() asc

but still can't get value of function back

  unless you force it to be the score somehow
clustering component


now works out-of-the-box; all Apache license
compatible

supports distributed search
debug=true


debug=true|all|timing|query|results

debug=results&debug.explain.structured=true
structured explain
http://localhost:8983/solr/select?q=title:solr&debug.explain.structured=true&debug=results&wt=ruby&indent=on
     'debug'=>{
       'explain'=>{
         'doc1'=>{
           'match'=>true,
           'value'=>0.076713204,
           'description'=>'fieldWeight(title:solr in 0), product of:',
           'details'=>[{
                'match'=>true,
                'value'=>1.0,
                'description'=>'tf(termFreq(title:solr)=1)'},
             {
                'match'=>true,
                'value'=>0.30685282,
                'description'=>'idf(docFreq=1, maxDocs=1)'},
             {
                'match'=>true,
                'value'=>0.25,
                'description'=>'fieldNorm(field=title, doc=0)'}]}}}}
SolrCloud

shared/central config and core/shard management via ZooKeeper

built-in load balancing, and infrastructure for future SolrCloud work
/update/json
solrconfig.xml
  <requestHandler name="/update/json" class="solr.JsonUpdateRequestHandler"/>




     curl
         'http://localhost:8983/solr/update/json?commit=true'
         -H 'Content-type:application/json' -d '
     {
       "add": {
         "doc": {
           "id" : "MyTestDocument",
           "title" : "This is just a test"
         }
       }
     }'
wt=csv

Writes only docs (no response header or response
extras) in CSV format

Roundtrippable with /update/csv

  provided all fields are stored
UIMA
Unstructured Information Management
Architecture

 http://uima.apache.org/

New update processor chain, augmenting
incoming documents from a UIMA annotator
pipeline

 http://wiki.apache.org/solr/SolrUIMA
(solr|lucene)-dev


ant [idea|eclipse]

go!

http://wiki.apache.org/solr/HowToContribute
works in progress

some interesting open issues (with patches):

  PayloadTermQuery

  XMLQueryParser plugin

  join
{!join from=$f to=$t}


insert <what Yonik said>

  https://issues.apache.org/jira/browse/SOLR-2272
Lucid (imagination)
What's Lucid done for you lately?

  Yonik, Mark, Grant, Hoss: Lucene and Solr performance,
  faceting, grouping, join query, spatial, Mahout, ORP, PMC,
  etc, etc, etc

  Other technical staff involved in mailing list assistance, bug
  reporting, contributing patches (hi Lance, Erick, Jay, Tom,
  Grijesh, Tomas....)

  extended dismax, join, faceting performance improvements

  LucidWorks Enterprise
Hoss Simplicity

http://www.lucidimagination.com/blog/2011/01/21/solr-powered-isfdb-part1/

http://www.lucidimagination.com/blog/2011/01/28/solr-powered-isfdb-part-2/
LucidWorks Enterprise
      "lucid" query parser               REST API

      click boosting                     Data sources,
                                         crawlers, and
      tunable norms, per-                scheduling
      field
                                         Alerts
      role filtering

      administrative UI

http://www.lucidimagination.com/enterprise-search-solutions/lucidworks
Community Questions


fire away!
resources


duh!: #code4lib

lucene.apache.org/solr

search.lucidimagination.com/?q=<your query>
Q&A: faceting


why is paging through facets the way it is?

  short-circuits on enum
Community:
- The state of Extended DisMax, and what Lucene features
remain incompatible with it.

- Any developments on faceting (I've implemented the
standard workaround to the "unknown facet list size"
problem...  but I'd still love to be able to know exactly how
long the lists are)

- Hierarchical documents in Solr -- I haven't followed the
conversations closely, but I gather that this topic is gaining
some momentum in the Solr community.
contact info
erik.hatcher @ lucidimagination . com

http://www.lucidimagination.com

  webinars, documentation

  LucidFind: search.lucidimagination.com

    search mailing list posts, wiki pages, web
    sites, our blog, etc for latest Lucene/Solr
    assistance
re: code4lib

code4lib 2011 preconference: What's New in Solr (since 1.4.1)
