1. Oslo Solr Community March 20th 2012
What is new in Solr 4.0ß
Jan Høydahl
2. Agenda
– Solr/Lucene 4 ß, what, when?
– Near-Realtime-Search
– SolrCloud
– Better Spellchecker
– Flex – smaller index
– Pluggable Ranking
– Sort by Function
– Result field aliasing and pseudo fields
– Pivot facets
– Join query
– New Admin GUI
– And what about Solr 3.6?
3. 4.0 beta?
– Never released a public beta before
– So many changes, it makes sense
– Time frame??
– Stability
4. Near-Realtime-Search
– Before:
• Add, add, add, add (not searchable)
• Commit (new segment written → searchable)
– 4.0:
• In-memory index
• Add
• Soft commit (commitWithin / auto soft commit)
• Real-time GET:
<!-- realtime get handler, guaranteed to return the latest stored fields of
     any document, without the need to commit or open a new searcher. The
     current implementation relies on the updateLog feature being enabled.
-->
<requestHandler name="/get" class="solr.RealTimeGetHandler">
  <lst name="defaults">
    <str name="omitHeader">true</str>
  </lst>
</requestHandler>
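As a rough sketch of how the pieces above are addressed over HTTP, the Python snippet below only builds the request URLs and does not contact a server. The host `localhost:8983`, the core name `collection1` and the document id `doc1` are assumptions made for illustration; `commitWithin`, `softCommit` and the `id` parameter of `/get` are the request parameters as I understand them in Solr 4.

```python
from urllib.parse import urlencode

# Assumed base URL of a single Solr core (hypothetical host and core name).
BASE = "http://localhost:8983/solr/collection1"


def update_url(commit_within_ms=None, soft_commit=False):
    """Build the URL documents would be posted to, with optional commit hints."""
    params = {}
    if commit_within_ms is not None:
        # Ask Solr to make the added documents visible within this many milliseconds.
        params["commitWithin"] = commit_within_ms
    if soft_commit:
        # Request an explicit soft commit together with the update.
        params["softCommit"] = "true"
    query = urlencode(params)
    return BASE + "/update" + ("?" + query if query else "")


def realtime_get_url(doc_id):
    """Build the URL that fetches the latest stored fields of one document."""
    return BASE + "/get?" + urlencode({"id": doc_id})


print(update_url(commit_within_ms=1000))
print(update_url(soft_commit=True))
print(realtime_get_url("doc1"))
```

The real-time GET URL is meant to return the document even before any commit, provided the update log is enabled as the configuration comment above notes.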
5. Solr Cloud
– Solr Cloud is the popular name for an initiative to make Solr more easily scalable and manageable in a distributed world
– Enables centralized configuration and cluster status monitoring
– Solr 4.0ß contains the first features
• Apache ZooKeeper support, including built-in ZK
• Automatic distributed and load-balanced queries (by means of ZK)
• Fault tolerant indexing and recovery
• Add a new node and let it discover its role and sync up
– Expected features to come
• Tools to manage the config in ZK
• Re-balancing of shards
http://wiki.apache.org/solr/SolrCloud
6. Solr Cloud...
– New concepts:
• Collection: Cores making up one data set
• ZooKeeper: Central coordination server
– Easier distributed search:
• /solr/web/select?q=*:*&distrib=true
– This queries all cores in the same "collection"
– Easier distributed indexing:
• http://<any.server>/solr/web/update...
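A minimal sketch of what "any server" means in practice: the same collection URL can be built against whichever node is at hand. The node names `host1:8983` and `host2:8983` are hypothetical; the collection name `web` and the `distrib=true` parameter are taken from the slide. Nothing is sent over the network here.

```python
from urllib.parse import urlencode


def distributed_select_url(server, collection, query):
    """Build a query URL that asks the node to fan out to the whole collection."""
    params = {"q": query, "distrib": "true"}
    # Keep '*' and ':' readable so that *:* is not percent-encoded.
    return "http://%s/solr/%s/select?%s" % (server, collection, urlencode(params, safe="*:"))


def update_url(server, collection):
    """Build the indexing URL; per the slide, any node of the cluster can receive it."""
    return "http://%s/solr/%s/update" % (server, collection)


# Hypothetical cluster nodes: each of them yields an equivalent entry point.
for node in ["host1:8983", "host2:8983"]:
    print(distributed_select_url(node, "web", "*:*"))
    print(update_url(node, "web"))
```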
7. Solr Cloud on the index side...
http://wiki.apache.org/solr/SolrCloud
8. Better spellchecker
– Direct SpellChecker
– Automaton based
(no extra lucene-index)
– No long build times
– Better performance
– Better accuracy (?)
9. Flex – smaller index
– Lucene's Flex APIs
– Lets you plug in your own codecs
– Greater flexibility in how you can represent the binary index
– Opens the door to many new features
• DocValues
• Pluggable ranking
• TEXT index
• Store as UTF-8[]
• Or another encoding to save space for Chinese
10. Pluggable Ranking
– Lucene uses TF/IDF and VSM
– Now support for BM25
– Plug your own!
– Hopefully attracts researchers
– Also, pluggable Similarity class per field
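To give a feel for what a BM25 model computes, here is a small sketch of the textbook Okapi BM25 score for a single query term. It is not Lucene's implementation, which may differ in details such as how document length is encoded; the function name and the defaults `k1=1.2` and `b=0.75` are commonly used values chosen here as assumptions.

```python
import math


def bm25_term_score(freq, doc_len, avg_doc_len, num_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 contribution of one term to one document's score."""
    # Rare terms (low doc_freq) get a higher inverse document frequency.
    idf = math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalisation: documents longer than average are penalised.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # Term frequency saturates: each extra occurrence adds less than the last.
    return idf * freq * (k1 + 1) / (freq + norm)


short_doc = bm25_term_score(freq=2, doc_len=50, avg_doc_len=100, num_docs=1000, doc_freq=10)
long_doc = bm25_term_score(freq=2, doc_len=400, avg_doc_len=100, num_docs=1000, doc_freq=10)
print(short_doc > long_doc)
```

Unlike plain TF/IDF, the term-frequency part is bounded by `k1 + 1`, so repeating a word many times cannot raise the score without limit.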
11. Sort by Function
– q=foo&sort=sub(price,discount) desc
– q=foo&sort=dist(2, x, y, 0, 0) asc
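The two sort expressions can be emulated on a handful of in-memory documents to show the ordering they are meant to produce. This is only an illustration in Python, not how Solr evaluates function queries; the sample documents are invented, and `dist(2, x, y, 0, 0)` is read here as the Euclidean distance between the point (x, y) and the origin.

```python
import math

# Invented sample documents carrying the fields used in the two sort expressions.
docs = [
    {"id": "a", "price": 100.0, "discount": 10.0, "x": 3.0, "y": 4.0},
    {"id": "b", "price": 80.0, "discount": 0.0, "x": 1.0, "y": 1.0},
    {"id": "c", "price": 120.0, "discount": 50.0, "x": 6.0, "y": 8.0},
]


def discounted_price(doc):
    """Emulates sub(price,discount)."""
    return doc["price"] - doc["discount"]


def distance_from_origin(doc):
    """Emulates dist(2, x, y, 0, 0): Euclidean distance from (x, y) to (0, 0)."""
    return math.hypot(doc["x"] - 0.0, doc["y"] - 0.0)


# sort=sub(price,discount) desc
by_price = [d["id"] for d in sorted(docs, key=discounted_price, reverse=True)]
# sort=dist(2, x, y, 0, 0) asc
by_distance = [d["id"] for d in sorted(docs, key=distance_from_origin)]

print(by_price)     # ['a', 'b', 'c']
print(by_distance)  # ['b', 'a', 'c']
```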
12. Result field aliasing and pseudo fields
– Aliasing:
• q=foo&fl=score,tittel:title,rabattpris:sub(price,discount)
– Field name globbing:
• q=foo&fl=score,t*
– Pseudo fields:
• q=foo&fl=score,[explain],[docid],[shard],[value v=42 t=int]
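A small sketch of how such an `fl` value could be assembled in client code. The helper `build_fl` is hypothetical and not part of any Solr client library; the aliases `tittel` and `rabattpris` are the Norwegian names from the slide ("title" and "discount price").

```python
from urllib.parse import urlencode


def build_fl(fields=(), aliases=None, pseudo=()):
    """Assemble an fl value from plain fields, alias:source pairs and [pseudo] fields."""
    parts = list(fields)
    # An alias renames a field or function result in the response: alias:source
    parts += ["%s:%s" % (alias, source) for alias, source in (aliases or {}).items()]
    # Pseudo fields are written in square brackets, e.g. [explain]
    parts += ["[%s]" % name for name in pseudo]
    return ",".join(parts)


fl = build_fl(
    fields=["score"],
    aliases={"tittel": "title", "rabattpris": "sub(price,discount)"},
    pseudo=["explain", "docid"],
)
print(fl)
# urlencode percent-encodes the commas, colons and brackets for use in a URL.
print(urlencode({"q": "foo", "fl": fl}))
```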
13. Pivot facets
– Multi dimensional facets
• &facet.pivot=cat,popularity
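What `facet.pivot=cat,popularity` counts can be illustrated by emulating it on a few in-memory documents: for every value of the first field, count the documents, then break that count down by the second field. The sample documents and the returned dictionary layout are assumptions for illustration, not Solr's actual response format.

```python
from collections import Counter, defaultdict

# Invented sample documents with the two fields named in the pivot.
docs = [
    {"cat": "electronics", "popularity": 6},
    {"cat": "electronics", "popularity": 6},
    {"cat": "electronics", "popularity": 7},
    {"cat": "books", "popularity": 7},
]


def pivot_counts(documents, first, second):
    """Count documents per value of `first`, broken down by value of `second`."""
    outer = Counter()
    inner = defaultdict(Counter)
    for doc in documents:
        outer[doc[first]] += 1
        inner[doc[first]][doc[second]] += 1
    return {
        value: {"count": count, "pivot": dict(inner[value])}
        for value, count in outer.items()
    }


result = pivot_counts(docs, "cat", "popularity")
print(result["electronics"])  # {'count': 3, 'pivot': {6: 2, 7: 1}}
print(result["books"])        # {'count': 1, 'pivot': {7: 1}}
```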
16. Solr 3.6
– SOLR-2764*: NorwegianLightStemmer, NorwegianMinimalStemmer
– SOLR-2202*: Money/Currency FieldType
– SOLR-2826*: URLClassify Update Processor
– SOLR-3056: Japanese field type in schema.xml
– SOLR-3026*: eDismax user fields
– SOLR-3140*: omitNorms default for all numeric field types
– SOLR-2901*: Upgrade Solr to Tika 1.0
– SOLR-1709: Distributed Date and Range Faceting
– SOLR-2487*: Do not include slf4j-jdk14 jar in WAR
– SOLR-2509*: spellcheck StringIndexOutOfBoundsException
* Committed by Jan Høydahl