Oslo Solr Community   March 20th 2012




                What is new in Solr 4.0 beta
                     Jan Høydahl




Agenda




–   Solr/Lucene 4 beta: what, when?
–   Near-Realtime-Search
–   SolrCloud
–   Better Spellchecker
–   Flex – smaller index
–   Pluggable Ranking
–   Sort by Function
–   Result field aliasing and pseudo fields
–   Pivot facets
–   Join query
–   New Admin GUI

– And what about Solr 3.6?
4.0 beta?




–   Solr has never released a public beta before
–   With so many changes, a beta makes sense
–   Time frame?
–   Stability
Near-Realtime-Search




– Before:
      • Add, add, add, add (not searchable)
      • Commit (new segment written → searchable)
– 4.0:
      • In-memory index
      • Add
      • Soft commit (commitWithin / autoSoftCommit)
      • Real-time GET:
<!-- realtime get handler, guaranteed to return the latest stored fields of
          any document, without the need to commit or open a new searcher.   The
          current implementation relies on the updateLog feature being enabled.
-->
 <requestHandler name="/get" class="solr.RealTimeGetHandler">
      <lst name="defaults">
          <str name="omitHeader">true</str>
      </lst>
 </requestHandler>
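
The add, soft-commit and real-time GET steps above can be sketched from the client side. This is a minimal Python sketch that only builds the request URLs and sends nothing. The host, the core name `web` (borrowed from the Solr Cloud slides) and the helper names are assumptions for illustration; `commitWithin` and `softCommit` are update request parameters, and `/get?id=...` addresses the handler registered above.

```python
from urllib.parse import urlencode

# Assumed host and core name, for illustration only.
BASE = "http://localhost:8983/solr/web"


def update_url(commit_within_ms=None, soft_commit=False):
    """URL for posting documents to /update, optionally requesting a commit."""
    params = {}
    if commit_within_ms is not None:
        # Ask Solr to make the added documents visible within this many ms.
        params["commitWithin"] = commit_within_ms
    if soft_commit:
        # Open a new searcher without flushing segments to stable storage.
        params["softCommit"] = "true"
    query = urlencode(params)
    return BASE + "/update" + ("?" + query if query else "")


def realtime_get_url(doc_id):
    """URL that fetches the latest stored fields of one document by id."""
    return BASE + "/get?" + urlencode({"id": doc_id})


if __name__ == "__main__":
    print(update_url(commit_within_ms=1000))
    print(realtime_get_url("doc1"))
```

The real-time GET URL can be requested right after the add, before any commit, provided the updateLog is enabled as the comment in the config notes.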
Solr Cloud




         – Solr Cloud is the popular name for an initiative to make Solr
           more easily scalable and manageable in a distributed world
         – Enables centralized configuration and cluster status
           monitoring
         – Solr 4.0 beta contains the first features
               •   Apache ZooKeeper support, including built-in ZK
               •   Automatically distributed and load-balanced queries (by means of ZK)
               •   Fault-tolerant indexing and recovery
               •   Add a new node and let it discover its role and sync up
         – Expected features to come
               • Tools to manage the config in ZK
               • Re-balancing of shards



http://wiki.apache.org/solr/SolrCloud
Solr Cloud...

– New concepts:
   • Collection: Cores making up one data set
   • ZooKeeper: Central coordination server
– Easier distributed search:
   • /solr/web/select?q=*:*&distrib=true
       – This queries all cores in same "collection"
– Easier distributed indexing:
   • http://<any.server>/solr/web/update...
Solr Cloud on the index side...




http://wiki.apache.org/solr/SolrCloud
Better spellchecker




– DirectSpellChecker
– Automaton based
  (no extra Lucene index)
– No long build times
– Better performance
– Better accuracy (?)
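
The idea of suggesting straight from the indexed terms, with no separate spellcheck index to build, can be sketched as follows. Note the swap: the real implementation walks the term dictionary with a Levenshtein automaton, while this sketch uses a plain dynamic-programming edit distance over an in-memory dict. The terms, document frequencies and function names are made up for illustration.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def suggest(word, term_freqs, max_edits=2, count=3):
    """Suggest terms within max_edits, closest first, then most frequent."""
    candidates = [
        (edit_distance(word, term), -doc_freq, term)
        for term, doc_freq in term_freqs.items()
        if term != word
    ]
    candidates = [c for c in candidates if c[0] <= max_edits]
    return [term for _, _, term in sorted(candidates)[:count]]


# Hypothetical term dictionary: term -> document frequency.
TERMS = {"solr": 120, "solar": 15, "sold": 40, "lucene": 90, "search": 200}

if __name__ == "__main__":
    print(suggest("solt", TERMS))
```

Because the candidates come from the live term dictionary, new terms become suggestable as soon as they are searchable, which is where the "no long build times" point comes from.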
Flex – smaller index




– Lucene's Flex APIs
– Lets you plug in your own codecs
– Greater flexibility in how you can represent the binary index
– Opens up for many new features
   •   DocValues
   •   Pluggable ranking
   •   Plain-text index (SimpleText codec)
   •   Terms stored as UTF-8 byte[]
   •   Or another encoding, to save space for Chinese
Pluggable Ranking




– Lucene has so far used TF-IDF with the vector space model (VSM)
– Now also supports BM25




– Plug your own!
– Hopefully this attracts researchers
– Also, pluggable Similarity class per field
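
To make the difference between the two models concrete, here is a minimal Python sketch of the per-term weights. It is a simplification, not Lucene's code: the classic function keeps only one term-frequency factor and one IDF factor (norms, boosts and query normalization are left out), and the BM25 function uses the commonly cited formula with the conventional defaults k1 = 1.2 and b = 0.75. Function and parameter names are chosen for illustration.

```python
import math


def classic_tfidf(freq, doc_freq, num_docs):
    """Simplified classic weight: sqrt(tf) times a log-based idf."""
    tf = math.sqrt(freq)
    idf = 1.0 + math.log(num_docs / (doc_freq + 1))
    return tf * idf


def bm25(freq, doc_freq, num_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """BM25 weight: tf saturates, and document length is normalized via b."""
    idf = math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * freq * (k1 + 1.0) / (freq + length_norm)


if __name__ == "__main__":
    # Same term, growing frequency: TF-IDF keeps growing, BM25 levels off.
    for freq in (1, 2, 4, 8, 16):
        print(freq,
              round(classic_tfidf(freq, 10, 1000), 3),
              round(bm25(freq, 10, 1000, 100, 100), 3))
```

The main practical difference is saturation: repeating a term many times keeps raising the simplified TF-IDF weight, while the BM25 weight approaches an upper bound of idf * (k1 + 1).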
Sort by Function




– q=foo&sort=sub(price,discount) desc
– q=foo&sort=dist(2, x, y, 0, 0) asc
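
What the two sort expressions compute can be emulated in a few lines of Python. This is only a sketch of the semantics on made-up documents: `sub(price,discount)` is the difference of two numeric fields, and `dist(2, x, y, 0, 0)` is the Euclidean (power 2) distance from the point (x, y) to (0, 0). The field values and helper names are illustrative assumptions.

```python
import math

# Made-up documents with the fields used on the slide.
docs = [
    {"id": "a", "price": 100.0, "discount": 30.0, "x": 3.0, "y": 4.0},
    {"id": "b", "price": 80.0, "discount": 0.0, "x": 1.0, "y": 1.0},
    {"id": "c", "price": 120.0, "discount": 60.0, "x": 0.5, "y": 0.5},
]


def sub(doc):
    """Emulates sub(price,discount)."""
    return doc["price"] - doc["discount"]


def dist2(doc, px=0.0, py=0.0):
    """Emulates dist(2, x, y, px, py): Euclidean distance to (px, py)."""
    return math.hypot(doc["x"] - px, doc["y"] - py)


# sort=sub(price,discount) desc
by_net_price_desc = [d["id"] for d in sorted(docs, key=sub, reverse=True)]
# sort=dist(2, x, y, 0, 0) asc
by_distance_asc = [d["id"] for d in sorted(docs, key=dist2)]

if __name__ == "__main__":
    print(by_net_price_desc)  # ['b', 'a', 'c']
    print(by_distance_asc)    # ['c', 'b', 'a']
```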
Result field aliasing and pseudo fields




– Aliasing (the alias names are Norwegian: tittel = title, rabattpris = discounted price):
   • q=foo&fl=score,tittel:title,rabattpris:sub(price,discount)
– Field name globbing:
   • q=foo&fl=score,t*
– Pseudo fields:
   • q=foo&fl=score,[explain],[docid],[shard],[value v=42 t=int]
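
The `fl` values above contain brackets, colons and spaces, so they need URL encoding when sent from a client. A minimal Python sketch, assuming a hypothetical helper `build_fl` that joins plain fields, `alias:expression` pairs and `[pseudo]` fields into one `fl` value:

```python
from urllib.parse import urlencode, parse_qs


def build_fl(fields=(), aliases=None, pseudo=()):
    """Join plain fields, alias:expr pairs and [pseudo] fields into one fl value."""
    parts = list(fields)
    parts += [f"{alias}:{expr}" for alias, expr in (aliases or {}).items()]
    parts += [f"[{name}]" for name in pseudo]
    return ",".join(parts)


fl = build_fl(
    fields=["score"],
    aliases={"tittel": "title", "rabattpris": "sub(price,discount)"},
    pseudo=["explain", "docid", "shard", "value v=42 t=int"],
)

# urlencode takes care of the brackets, colons, commas and spaces.
query_string = urlencode({"q": "foo", "fl": fl})

if __name__ == "__main__":
    print(fl)
    print(query_string)
```

Decoding `query_string` again gives back exactly the `fl` value shown on the slide, which is an easy check that nothing was mangled on the way.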
Pivot facets




– Multi dimensional facets
   • &facet.pivot=cat,popularity
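
A pivot facet is a facet within a facet: counts per `cat`, and under each `cat` value the counts per `popularity`. The nesting can be emulated in Python as below. The documents are made up and the function is a sketch of the semantics only; the `field` / `value` / `count` / `pivot` keys mirror the shape of the pivot response.

```python
from collections import Counter


def pivot(docs, fields):
    """Nested value counts for the given field list, most frequent first."""
    if not fields:
        return []
    head, rest = fields[0], fields[1:]
    counts = Counter(d[head] for d in docs if head in d)
    result = []
    for value, count in counts.most_common():
        entry = {"field": head, "value": value, "count": count}
        if rest:
            # Recurse on the documents that carry this value.
            subset = [d for d in docs if d.get(head) == value]
            entry["pivot"] = pivot(subset, rest)
        result.append(entry)
    return result


# Made-up documents.
docs = [
    {"cat": "electronics", "popularity": 6},
    {"cat": "electronics", "popularity": 6},
    {"cat": "electronics", "popularity": 1},
    {"cat": "music", "popularity": 6},
]

if __name__ == "__main__":
    for top in pivot(docs, ["cat", "popularity"]):
        print(top["value"], top["count"],
              [(p["value"], p["count"]) for p in top["pivot"]])
```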
Join query




– Simple Join feature (inner join)
– &q={!join from=manu_id to=id}ipod
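
The semantics of the join can be emulated in Python: find the documents matching `ipod`, collect their `manu_id` values, and return the documents whose `id` is among them. Only the "to" side is returned, so it behaves more like a SQL `WHERE id IN (subquery)` than a join that merges rows. The documents and the substring matcher below are made up for illustration.

```python
def join(docs, from_field, to_field, matches):
    """Return docs whose to_field equals the from_field of any matching doc."""
    keys = {d[from_field] for d in docs if from_field in d and matches(d)}
    return [d for d in docs if d.get(to_field) in keys]


# Made-up index holding both manufacturers and products.
docs = [
    {"id": "apple", "name": "Apple"},
    {"id": "belkin", "name": "Belkin"},
    {"id": "samsung", "name": "Samsung"},
    {"id": "MA147", "name": "Apple 60 GB iPod", "manu_id": "apple"},
    {"id": "IW-02", "name": "iPod cable", "manu_id": "belkin"},
    {"id": "SP2514N", "name": "Samsung hard drive", "manu_id": "samsung"},
]


def matches_ipod(doc):
    """Stand-in for the query `ipod`: a simple substring match on name."""
    return "ipod" in doc["name"].lower()


# Emulates q={!join from=manu_id to=id}ipod
manufacturers = join(docs, "manu_id", "id", matches_ipod)

if __name__ == "__main__":
    print([d["id"] for d in manufacturers])  # ['apple', 'belkin']
```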
New Admin GUI
Solr 3.6




– SOLR-2764*: NorwegianLightStemmer,
              NorwegianMinimalStemmer
– SOLR-2202*: Money/Currency FieldType
– SOLR-2826*: URLClassify Update Processor
– SOLR-3056: Japanese field type in schema.xml
– SOLR-3026*: eDismax user fields
– SOLR-3140*: omitNorms default for all numeric field types
– SOLR-2901*: Upgrade Solr to Tika 1.0
– SOLR-1709: Distributed Date and Range Faceting
– SOLR-2487*: Do not include slf4j-jdk14 jar in WAR
– SOLR-2509*: spellcheck StringIndexOutOfBoundsException

* Committed by Jan Høydahl
