Oslo Solr MeetUp March 2012 - Solr4 alpha
Short talk highlighting what we can expect in the soon-to-be-released Solr 4.0 alpha/beta.


Published in: Technology


  • 1. Oslo Solr Community, March 20th 2012: What is new in Solr 4.0ß, by Jan Høydahl
  • 2. Agenda
      – Solr/Lucene 4 ß, what, when?
      – Near-Realtime-Search
      – SolrCloud
      – Better Spellchecker
      – Flex – smaller index
      – Pluggable Ranking
      – Sort by Function
      – Result field aliasing and pseudo fields
      – Pivot facets
      – Join query
      – New Admin GUI
      – And what about Solr 3.6?
  • 3. 4.0 beta?
      – Never released a public beta before
      – So many changes, it makes sense
      – Time frame??
      – Stability
  • 4. Near-Realtime-Search
      – Before:
          • Add, add, add, add (not searchable)
          • Commit (new segment written → searchable)
      – 4.0:
          • In-memory index
          • Add
          • Soft-commit (within/auto)
          • Real-time GET:
              <!-- realtime get handler, guaranteed to return the latest stored fields of
                   any document, without the need to commit or open a new searcher. The
                   current implementation relies on the updateLog feature being enabled. -->
              <requestHandler name="/get" class="solr.RealTimeGetHandler">
                <lst name="defaults">
                  <str name="omitHeader">true</str>
                </lst>
              </requestHandler>
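To make the soft-commit and real-time GET bullets concrete, here is a minimal sketch that only builds the request URLs a client might send. The base URL and both helper names are assumptions for illustration; `softCommit`, `commitWithin` and the `/get` handler with an `id` parameter are the Solr 4 request names, but check them against the release you actually run.

```python
from urllib.parse import urlencode

# Assumed base URL of a local Solr instance (not from the slides).
SOLR = "http://localhost:8983/solr"


def update_url(soft_commit=False, commit_within_ms=None):
    """Build an /update URL, optionally asking for a soft commit
    or for the added documents to be committed within N milliseconds."""
    params = {}
    if soft_commit:
        params["softCommit"] = "true"
    if commit_within_ms is not None:
        params["commitWithin"] = str(commit_within_ms)
    query = urlencode(params)
    return f"{SOLR}/update" + (f"?{query}" if query else "")


def realtime_get_url(doc_id):
    """Build a URL for the /get handler shown in the config above."""
    return f"{SOLR}/get?" + urlencode({"id": doc_id})


print(update_url(soft_commit=True))
print(update_url(commit_within_ms=1000))
print(realtime_get_url("doc 1"))
```

The point of the real-time GET URL is that it can be requested right after an add, before any commit has made the document visible to ordinary searches.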
  • 5. Solr Cloud
      – Solr Cloud is the popular name for an initiative to make Solr more easily scalable and manageable in a distributed world
      – Enables centralized configuration and cluster status monitoring
      – Solr 4.0ß contains the first features
          • Apache ZooKeeper support, including built-in ZK
          • Support for automatically distributed and load-balanced queries (by means of ZK)
          • Fault-tolerant indexing and recovery
          • Add a new node and let it discover its role and sync up
      – Expected features to come
          • Tools to manage the config in ZK
          • Re-balancing of shards
  • 6. Solr Cloud...
      – New concepts:
          • Collection: Cores making up one data set
          • ZooKeeper: Central coordination server
      – Easier distributed search:
          • /solr/web/select?q=*:*&distrib=true
          • This queries all cores in the same "collection"
      – Easier distributed indexing:
          • http://<any.server>/solr/web/update...
  • 7. Solr Cloud on the index side...
  • 8. Better spellchecker
      – Direct SpellChecker
      – Automaton based (no extra lucene-index)
      – No long build times
      – Better performance
      – Better accuracy (?)
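As a rough illustration of what "suggest from the main index terms by edit distance" means, the sketch below ranks candidate terms by Levenshtein distance. It is a brute-force stand-in, not the automaton-based approach the slide refers to, and the term list, function names and `max_edits` default are all assumptions for the example.

```python
def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def suggest(word, terms, max_edits=2):
    """Return terms within max_edits of word, closest first."""
    scored = [(edit_distance(word, t), t) for t in terms if t != word]
    return [t for d, t in sorted(scored) if d <= max_edits]


# Hypothetical terms standing in for the index's term dictionary.
terms = ["solr", "solar", "lucene", "sole", "search"]
print(suggest("solt", terms))
```

An automaton-based checker avoids comparing the misspelled word against every term, which is why no separate spellcheck index or long build step is needed.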
  • 9. Flex – smaller index
      – Lucene's Flex APIs
      – Lets you plug in your own codecs
      – Greater flexibility in how you can represent the binary index
      – Opens up for many new features
          • DocValues
          • Pluggable ranking
          • TEXT index
          • Store as UTF-8[]
          • Or other encoding for space saving for Chinese
  • 10. Pluggable Ranking
      – Lucene uses TF/IDF and VSM
      – Now support for BM25
      – Plug your own!
      – Hopefully attracts researchers
      – Also, pluggable Similarity class per field
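For readers who have not met BM25, this is a minimal sketch of the textbook per-term score, written as a standalone function rather than as a Lucene Similarity class. The function name, the parameter names and the defaults `k1=1.2` and `b=0.75` are assumptions chosen to match common practice; the slides do not specify them.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Score one query term against one document.

    tf       -- occurrences of the term in the document
    doc_freq -- number of documents containing the term
    """
    # Rare terms get a higher inverse document frequency.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: longer-than-average documents are penalized.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # Term frequency saturates: the fraction approaches (k1 + 1) as tf grows.
    return idf * tf * (k1 + 1) / (tf + norm)


for tf in (1, 2, 10, 100):
    print(tf, bm25_term_score(tf, doc_len=100, avg_doc_len=100,
                              n_docs=1000, doc_freq=10))
```

The saturation is the main practical difference from plain TF/IDF: repeating a term many times adds less and less to the score.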
  • 11. Sort by Function
      – q=foo&sort=sub(price,discount) desc
      – q=foo&sort=dist(2, x, y, 0, 0) asc
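The two sort expressions above can be mimicked on a handful of in-memory documents to show what ordering they produce. This is a local emulation under assumed sample data, not a call into Solr; `sub` and `dist` here are plain Python helpers named after the Solr functions.

```python
import math

# Hypothetical documents with the fields used in the slide's examples.
docs = [
    {"id": "a", "price": 100.0, "discount": 30.0, "x": 3.0, "y": 4.0},
    {"id": "b", "price": 80.0, "discount": 5.0, "x": 1.0, "y": 1.0},
    {"id": "c", "price": 120.0, "discount": 60.0, "x": 0.0, "y": 1.0},
]


def sub(doc, a, b):
    """Emulates sub(a,b): the value of field a minus field b."""
    return doc[a] - doc[b]


def dist(doc, x, y, x0, y0):
    """Emulates dist(2, x, y, x0, y0): Euclidean distance to (x0, y0)."""
    return math.hypot(doc[x] - x0, doc[y] - y0)


# sort=sub(price,discount) desc
by_net_price = sorted(docs, key=lambda d: sub(d, "price", "discount"), reverse=True)
# sort=dist(2, x, y, 0, 0) asc
by_distance = sorted(docs, key=lambda d: dist(d, "x", "y", 0, 0))

print([d["id"] for d in by_net_price])
print([d["id"] for d in by_distance])
```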
  • 12. Result field aliasing and pseudo fields
      – Aliasing:
          • q=foo&fl=score,tittel:title,rabattpris:sub(price,discount)
      – Field name globbing:
          • q=foo&fl=score,t*
      – Pseudo fields:
          • q=foo&fl=score,[explain],[docid],[shard],[value v=42 t=int]
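Because the `fl` syntax mixes plain fields, `alias:expression` pairs and bracketed pseudo fields, a small helper that assembles the parameter can make the structure easier to see. `build_fl` and its arguments are hypothetical names for this sketch; only the resulting `fl` string follows the syntax shown on the slide.

```python
from urllib.parse import urlencode


def build_fl(fields=(), aliases=None, pseudo_fields=()):
    """Assemble an fl value from plain fields, alias -> expression
    pairs, and pseudo fields (written without their brackets)."""
    parts = list(fields)
    parts += [f"{alias}:{expr}" for alias, expr in (aliases or {}).items()]
    parts += [f"[{name}]" for name in pseudo_fields]
    return ",".join(parts)


fl = build_fl(
    fields=["score"],
    aliases={"tittel": "title", "rabattpris": "sub(price,discount)"},
    pseudo_fields=["explain", "docid"],
)
# URL-encode the whole query string before sending it to Solr.
query_string = urlencode({"q": "foo", "fl": fl})

print(fl)
print(query_string)
```

In the response, each document would then carry `tittel` and `rabattpris` keys instead of `title` and the raw function expression.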
  • 13. Pivot facets
      – Multi-dimensional facets
          • &facet.pivot=cat,popularity
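A pivot facet is essentially a nested count: for each value of the first field, count the values of the second. The sketch below computes that for `cat,popularity` over a few made-up documents; the data and the shape of the returned dictionary are assumptions for illustration and do not reproduce Solr's exact response format.

```python
from collections import Counter, defaultdict

# Hypothetical documents; only the two pivoted fields matter here.
docs = [
    {"cat": "electronics", "popularity": 6},
    {"cat": "electronics", "popularity": 6},
    {"cat": "electronics", "popularity": 1},
    {"cat": "book", "popularity": 6},
]


def pivot_counts(docs, outer, inner):
    """Count inner-field values separately for each outer-field value."""
    tree = defaultdict(Counter)
    for doc in docs:
        tree[doc[outer]][doc[inner]] += 1
    return {
        value: {"count": sum(counts.values()), "pivot": dict(counts)}
        for value, counts in tree.items()
    }


result = pivot_counts(docs, "cat", "popularity")
print(result)
```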
  • 14. Join query
      – Simple Join feature (inner join)
      – &q={!join from=manu_id to=id}ipod
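The join query reads most naturally as two steps: find the documents matching `ipod`, collect their `manu_id` values, then return the documents whose `id` is one of those values. The following in-memory sketch follows that reading on assumed sample data; the `join` helper is hypothetical and says nothing about how Solr implements the feature.

```python
# Hypothetical index holding both products and manufacturers.
docs = [
    {"id": "apple", "name": "Apple Inc."},
    {"id": "belkin", "name": "Belkin"},
    {"id": "dell", "name": "Dell"},
    {"id": "p1", "name": "ipod nano", "manu_id": "apple"},
    {"id": "p2", "name": "ipod cable", "manu_id": "belkin"},
    {"id": "p3", "name": "laptop", "manu_id": "dell"},
]


def join(docs, from_field, to_field, matches):
    """Emulate {!join from=from_field to=to_field}<query>."""
    # Step 1: collect from_field values of documents matching the query.
    keys = {d[from_field] for d in docs if from_field in d and matches(d)}
    # Step 2: return documents whose to_field is one of those values.
    return [d for d in docs if d.get(to_field) in keys]


manufacturers = join(docs, "manu_id", "id", lambda d: "ipod" in d["name"])
print([d["id"] for d in manufacturers])
```

Note that the result contains the manufacturers, not the ipod products themselves: the query selects the "from" side, and the join returns the "to" side.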
  • 15. New Admin GUI
  • 16. Solr 3.6
      – SOLR-2764*: NorwegianLightStemmer, NorwegianMinimalStemmer
      – SOLR-2202*: Money/Currency FieldType
      – SOLR-2826*: URLClassify Update Processor
      – SOLR-3056: Japanese field type in schema.xml
      – SOLR-3026*: eDismax user fields
      – SOLR-3140*: omitNorms default for all numeric field types
      – SOLR-2901*: Upgrade Solr to Tika 1.0
      – SOLR-1709: Distributed Date and Range Faceting
      – SOLR-2487*: Do not include slf4j-jdk14 jar in WAR
      – SOLR-2509*: spellcheck StringIndexOutOfBoundsException
      * Committed by Jan Høydahl