Oslo Solr MeetUp March 2012 - Solr4 alpha


Short talk highlighting what we can expect in the Solr 4.0 alpha/beta release, which is due soon.

Published in: Technology

  1. Oslo Solr Community, March 20th 2012
     What is new in Solr 4.0ß
     Jan Høydahl
     Sponsors:
  2. Agenda
     – Solr/Lucene 4 ß, what, when?
     – Near-Realtime-Search
     – SolrCloud
     – Better Spellchecker
     – Flex – smaller index
     – Pluggable Ranking
     – Sort by Function
     – Result field aliasing and pseudo fields
     – Pivot facets
     – Join query
     – New Admin GUI
     – And what about Solr 3.6?
  3. 4.0 beta?
     – Never released a public beta before
     – So many changes, it makes sense
     – Time frame??
     – Stability
  4. Near-Realtime-Search
     – Before:
       • Add, add, add, add (not searchable)
       • Commit (new segment written → searchable)
     – 4.0:
       • In-memory index
       • Add
       • Soft-commit-(within/auto)
       • Real-time GET:

         <!-- realtime get handler, guaranteed to return the latest stored fields
              of any document, without the need to commit or open a new searcher.
              The current implementation relies on the updateLog feature being
              enabled. -->
         <requestHandler name="/get" class="solr.RealTimeGetHandler">
           <lst name="defaults">
             <str name="omitHeader">true</str>
           </lst>
         </requestHandler>
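As a rough illustration of the soft-commit and real-time GET flow above, the following sketch only builds the request URLs. The base URL and the document id are hypothetical; `softCommit` and the `/get` handler's `id` parameter are assumed to behave as in a stock Solr 4 setup.

```python
from urllib.parse import urlencode

# Hypothetical base URL; adjust host, port and core name to your own setup.
SOLR = "http://localhost:8983/solr"


def soft_commit_url(base=SOLR):
    """URL for an update request that finishes with a soft commit."""
    return base + "/update?" + urlencode({"softCommit": "true"})


def realtime_get_url(doc_id, base=SOLR):
    """URL for the /get handler shown in the config on the slide."""
    return base + "/get?" + urlencode({"id": doc_id})


print(soft_commit_url())
print(realtime_get_url("doc1"))
```

The point of the second URL is that it should return the latest version of `doc1` even before any commit has opened a new searcher.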
  5. Solr Cloud
     – Solr Cloud is the popular name for an initiative to make Solr more easily scalable and manageable in a distributed world
     – Enables centralized configuration and cluster status monitoring
     – Solr 4.0ß contains the first features
       • Apache ZooKeeper support, including built-in ZK
       • Support for auto distributed/LB query (by means of ZK)
       • Fault tolerant indexing and recovery
       • Add a new node and let it discover its role and sync up
     – Expected features to come
       • Tools to manage the config in ZK
       • Re-balancing of shards
     http://wiki.apache.org/solr/SolrCloud
  6. Solr Cloud...
     – New concepts:
       • Collection: Cores making up one data set
       • ZooKeeper: Central coordination server
     – Easier distributed search:
       • /solr/web/select?q=*:*&distrib=true
         – This queries all cores in the same "collection"
     – Easier distributed indexing:
       • http://<any.server>/solr/web/update...
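Since any node can accept the distributed query, a client could simply rotate through the nodes it knows about. A minimal sketch, assuming two hypothetical node addresses and the `web` collection from the slide:

```python
from itertools import cycle
from urllib.parse import urlencode

# Hypothetical node addresses; "web" is the collection name used on the slide.
NODES = ["http://server1:8983", "http://server2:8983"]
_next_node = cycle(NODES)


def distributed_select_url(query="*:*", collection="web"):
    """Build a distributed select URL against the next node in the rotation."""
    node = next(_next_node)
    # Keep '*' and ':' unescaped so the URL reads like the one on the slide.
    params = urlencode({"q": query, "distrib": "true"}, safe="*:")
    return f"{node}/solr/{collection}/select?{params}"


first = distributed_select_url()
second = distributed_select_url()
print(first)
print(second)
```

This is client-side rotation only; the slide's point is that the cluster itself fans the query out to all cores in the collection.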
  7. Solr Cloud on the index side...
     http://wiki.apache.org/solr/SolrCloud
  8. Better spellchecker
     – Direct SpellChecker
     – Automaton based (no extra lucene-index)
     – No long build times
     – Better performance
     – Better accuracy (?)
  9. Flex – smaller index
     – Lucene's Flex APIs
     – Lets you plug in your own codecs
     – Greater flexibility in how you can represent the binary index
     – Opens up for many new features
       • DocValues
       • Pluggable ranking
       • TEXT index
       • Store as UTF-8[]
       • Or other encoding for space saving for Chinese
  10. Pluggable Ranking
     – Lucene uses TF/IDF and VSM
     – Now support for BM25
     – Plug your own!
     – Hopefully attracts researchers
     – Also, pluggable Similarity class per field
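For readers unfamiliar with BM25, here is a sketch of the textbook per-term score. It is not taken from the talk and is not claimed to match Lucene's implementation detail for detail; the function name and the default values of `k1` and `b` are assumptions chosen for illustration.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 contribution of one query term to one document.

    tf          term frequency in the document
    doc_len     length of the document (in terms)
    avg_doc_len average document length in the collection
    n_docs      number of documents in the collection
    doc_freq    number of documents containing the term
    """
    # Non-negative IDF variant: rare terms weigh more than common ones.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalisation: b=0 disables it, b=1 applies it fully.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # Term frequency saturates: each extra occurrence adds less than the last.
    return idf * tf * (k1 + 1) / (tf + norm)


short_doc = bm25_term_score(tf=2, doc_len=50, avg_doc_len=100, n_docs=1000, doc_freq=10)
long_doc = bm25_term_score(tf=2, doc_len=200, avg_doc_len=100, n_docs=1000, doc_freq=10)
print(short_doc > long_doc)
```

Compared with plain TF/IDF, the two visible differences are the saturating term frequency and the tunable length normalisation.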
  11. Sort by Function
     – q=foo&sort=sub(price,discount) desc
     – q=foo&sort=dist(2, x, y, 0, 0) asc
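To make the two sort expressions concrete, the sketch below emulates them on a few made-up documents in plain Python. It assumes `sub(price,discount)` means price minus discount and that `dist(2, x, y, 0, 0)` is the Euclidean distance from the point (x, y) to the origin; the sample data is invented.

```python
import math

# Invented sample documents with the fields used on the slide.
docs = [
    {"id": "a", "price": 100.0, "discount": 10.0, "x": 3.0, "y": 4.0},
    {"id": "b", "price": 80.0, "discount": 0.0, "x": 1.0, "y": 1.0},
    {"id": "c", "price": 120.0, "discount": 50.0, "x": 6.0, "y": 8.0},
]


def net_price(doc):
    """Emulates sub(price,discount)."""
    return doc["price"] - doc["discount"]


def distance_to_origin(doc):
    """Emulates dist(2, x, y, 0, 0): Euclidean distance from (x, y) to (0, 0)."""
    return math.hypot(doc["x"] - 0.0, doc["y"] - 0.0)


# sort=sub(price,discount) desc
by_net_price_desc = [d["id"] for d in sorted(docs, key=net_price, reverse=True)]
# sort=dist(2, x, y, 0, 0) asc
by_distance_asc = [d["id"] for d in sorted(docs, key=distance_to_origin)]

print(by_net_price_desc)  # ['a', 'b', 'c']
print(by_distance_asc)    # ['b', 'a', 'c']
```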
  12. Result field aliasing and pseudo fields
     – Aliasing:
       • q=foo&fl=score,tittel:title,rabattpris:sub(price,discount)
     – Field name globbing:
       • q=foo&fl=score,t*
     – Pseudo fields:
       • q=foo&fl=score,[explain],[docid],[shard],[value v=42 t=int]
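The effect of the aliasing and globbing `fl` values above can be emulated locally. This is only a sketch of what the returned document would contain, assuming one invented stored document; it does not parse the `fl` syntax.

```python
from fnmatch import fnmatchcase

# Invented stored document, including its score for the current query.
doc = {"title": "iPod", "type": "player", "price": 100.0, "discount": 10.0, "score": 1.5}


def aliased(doc):
    """Emulates fl=score,tittel:title,rabattpris:sub(price,discount)."""
    return {
        "score": doc["score"],
        "tittel": doc["title"],                          # field returned under a new name
        "rabattpris": doc["price"] - doc["discount"],    # function result returned as a field
    }


def globbed(doc, pattern):
    """Emulates the globbing part of fl=score,t*: keep fields matching the pattern."""
    return {name: value for name, value in doc.items() if fnmatchcase(name, pattern)}


print(aliased(doc))
print(globbed(doc, "t*"))
```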
  13. Pivot facets
     – Multi dimensional facets
       • &facet.pivot=cat,popularity
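Conceptually, `facet.pivot=cat,popularity` counts documents per `cat` and then, within each `cat`, per `popularity`. A small local emulation with invented data follows; the output shape is simplified and is not Solr's actual response format.

```python
from collections import Counter, defaultdict

# Invented (cat, popularity) pairs, one per document.
docs = [
    ("electronics", 5),
    ("electronics", 5),
    ("electronics", 3),
    ("books", 5),
]


def pivot(pairs):
    """Count documents per first field, then per second field within it."""
    tree = defaultdict(Counter)
    for cat, popularity in pairs:
        tree[cat][popularity] += 1
    return {
        cat: {"count": sum(counts.values()), "pivot": dict(counts)}
        for cat, counts in tree.items()
    }


result = pivot(docs)
print(result)
```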
  14. Join query
     – Simple Join feature (inner join)
     – &q={!join from=manu_id to=id}ipod
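The join above can be read as: find the documents matching `ipod`, collect their `manu_id` values, then return the documents whose `id` is one of those values. The sketch below emulates that reading in memory; the documents and the whitespace-based matching are invented for illustration.

```python
# Invented documents: two manufacturers and three products.
docs = [
    {"id": "apple", "name": "Apple"},
    {"id": "belkin", "name": "Belkin"},
    {"id": "p1", "name": "ipod nano", "manu_id": "apple"},
    {"id": "p2", "name": "ipod cable", "manu_id": "belkin"},
    {"id": "p3", "name": "usb hub", "manu_id": "acme"},
]


def join(docs, from_field, to_field, matches):
    """Return docs whose to_field equals the from_field of any matching doc."""
    from_values = {d[from_field] for d in docs if from_field in d and matches(d)}
    return [d for d in docs if d.get(to_field) in from_values]


def mentions_ipod(doc):
    """Crude stand-in for the query 'ipod': whole-word match on the name."""
    return "ipod" in doc.get("name", "").split()


manufacturers = join(docs, from_field="manu_id", to_field="id", matches=mentions_ipod)
print([d["id"] for d in manufacturers])  # ['apple', 'belkin']
```

Note that the result contains the manufacturer documents, not the products that matched the inner query.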
  15. New Admin GUI
  16. Solr 3.6
     – SOLR-2764*: NorwegianLightStemmer, NorwegianMinimalStemmer
     – SOLR-2202*: Money/Currency FieldType
     – SOLR-2826*: URLClassify Update Processor
     – SOLR-3056: Japanese field type in schema.xml
     – SOLR-3026*: eDismax user fields
     – SOLR-3140*: omitNorms default for all numeric field types
     – SOLR-2901*: Upgrade Solr to Tika 1.0
     – SOLR-1709: Distributed Date and Range Faceting
     – SOLR-2487*: Do not include slf4j-jdk14 jar in WAR
     – SOLR-2509*: spellcheck StringIndexOutOfBoundsException
     * Committed by Jan Høydahl