Oslo Solr MeetUp March 2012 - Solr4 alpha
Short talk highlighting what we can expect in the upcoming Solr 4.0 alpha/beta release.

Transcript

  • 1. Oslo Solr Community, March 20th 2012: What is new in Solr 4.0ß (Jan Høydahl)
  • 2. Agenda
    – Solr/Lucene 4 ß, what, when?
    – Near-Realtime-Search
    – SolrCloud
    – Better Spellchecker
    – Flex – smaller index
    – Pluggable Ranking
    – Sort by Function
    – Result field aliasing and pseudo fields
    – Pivot facets
    – Join query
    – New Admin GUI
    – And what about Solr 3.6?
  • 3. 4.0 beta?
    – Never released a public beta before
    – So many changes that it makes sense
    – Time frame??
    – Stability
  • 4. Near-Realtime-Search
    – Before:
      • Add, add, add, add (not searchable)
      • Commit (new segment written → searchable)
    – 4.0:
      • In-memory index
      • Add
      • Soft-commit (within/auto)
      • Real-time GET:

        <!-- realtime get handler, guaranteed to return the latest stored fields
             of any document, without the need to commit or open a new searcher.
             The current implementation relies on the updateLog feature being
             enabled. -->
        <requestHandler name="/get" class="solr.RealTimeGetHandler">
          <lst name="defaults">
            <str name="omitHeader">true</str>
          </lst>
        </requestHandler>
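The soft-commit and real-time GET bullets above can be sketched from the client side. This is a minimal illustration that only builds request URLs, assuming a Solr instance at a hypothetical `http://localhost:8983/solr` and the `/get` handler registered as on the slide; `softCommit` and `commitWithin` are passed as parameters to `/update`, and the helper names are made up for this example.

```python
from urllib.parse import urlencode

# Hypothetical base URL of a local Solr core; adjust to your deployment.
SOLR_BASE = "http://localhost:8983/solr"


def soft_commit_update_url(base=SOLR_BASE, commit_within_ms=None):
    """URL for /update asking Solr for a soft commit now, or a commit within N ms."""
    if commit_within_ms is not None:
        params = {"commitWithin": commit_within_ms}
    else:
        params = {"softCommit": "true"}
    return f"{base}/update?{urlencode(params)}"


def realtime_get_url(doc_id, base=SOLR_BASE):
    """URL for the /get handler from the slide (latest stored fields of one document)."""
    return f"{base}/get?{urlencode({'id': doc_id})}"


print(soft_commit_update_url())
print(soft_commit_update_url(commit_within_ms=1000))
print(realtime_get_url("doc-1"))
```

Documents would be posted to the update URL in the usual way; the point of the sketch is only that a real-time GET by `id` does not need to wait for any commit.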
  • 5. Solr Cloud
    – Solr Cloud is the popular name for an initiative to make Solr more easily scalable and manageable in a distributed world
    – Enables centralized configuration and cluster status monitoring
    – Solr 4.0ß contains the first features
      • Apache ZooKeeper support, including built-in ZK
      • Support for automatically distributed and load-balanced queries (by means of ZK)
      • Fault-tolerant indexing and recovery
      • Add a new node and let it discover its role and sync up
    – Expected features to come
      • Tools to manage the config in ZK
      • Re-balancing of shards
    http://wiki.apache.org/solr/SolrCloud
  • 6. Solr Cloud...
    – New concepts:
      • Collection: Cores making up one data set
      • ZooKeeper: Central coordination server
    – Easier distributed search:
      • /solr/web/select?q=*:*&distrib=true
        – This queries all cores in the same "collection"
    – Easier distributed indexing:
      • http://<any.server>/solr/web/update...
  • 7. Solr Cloud on the index side...
    http://wiki.apache.org/solr/SolrCloud
  • 8. Better spellchecker
    – Direct SpellChecker
    – Automaton based (no extra Lucene index)
    – No long build times
    – Better performance
    – Better accuracy (?)
  • 9. Flex – smaller index
    – Lucene's Flex APIs
    – Lets you plug in your own codecs
    – Greater flexibility in how you can represent the binary index
    – Opens up for many new features
      • DocValues
      • Pluggable ranking
      • TEXT index
      • Store as UTF-8[]
      • Or other encoding for space saving for Chinese
  • 10. Pluggable Ranking
    – Lucene uses TF/IDF and VSM
    – Now support for BM25
    – Plug your own!
    – Hopefully attracts researchers
    – Also, pluggable Similarity class per field
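To make the BM25 bullet concrete, here is a sketch of the textbook Okapi BM25 score for a single term in a single document. It is not taken from the talk and is not Lucene's implementation; the parameter defaults `k1=1.2` and `b=0.75` are common choices assumed for illustration.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, num_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 contribution of one query term to one document's score."""
    # Rarer terms (low doc_freq) get a higher inverse document frequency.
    idf = math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: documents longer than average are penalized.
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # Term frequency saturates: each extra occurrence adds less than the previous one.
    return idf * tf * (k1 + 1) / (tf + length_norm)


short_doc = bm25_term_score(tf=2, doc_len=50, avg_doc_len=100, num_docs=1000, doc_freq=10)
long_doc = bm25_term_score(tf=2, doc_len=400, avg_doc_len=100, num_docs=1000, doc_freq=10)
print(short_doc > long_doc)
```

Unlike plain TF/IDF, the term-frequency part is bounded by `k1 + 1`, so repeating a term many times cannot inflate the score without limit.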
  • 11. Sort by Function
    – q=foo&sort=sub(price,discount) desc
    – q=foo&sort=dist(2, x, y, 0, 0) asc
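What these two sort expressions compute can be emulated client-side on a few in-memory documents. The sketch assumes `sub(price,discount)` is the price minus the discount and that `dist(2, x, y, 0, 0)` is the Euclidean distance from the point (x, y) to (0, 0); the sample documents are invented for illustration.

```python
import math

# Invented sample documents with the fields used in the slide's examples.
docs = [
    {"id": "a", "price": 100.0, "discount": 10.0, "x": 3.0, "y": 4.0},
    {"id": "b", "price": 80.0, "discount": 0.0, "x": 1.0, "y": 1.0},
    {"id": "c", "price": 120.0, "discount": 50.0, "x": 6.0, "y": 8.0},
]

# sort=sub(price,discount) desc: highest discounted price first.
by_net_price = sorted(docs, key=lambda d: d["price"] - d["discount"], reverse=True)

# sort=dist(2, x, y, 0, 0) asc: closest to the origin first.
by_distance = sorted(docs, key=lambda d: math.hypot(d["x"] - 0.0, d["y"] - 0.0))

print([d["id"] for d in by_net_price])
print([d["id"] for d in by_distance])
```

In Solr the function is evaluated per document inside the engine, so the ordering applies to the whole result set, not just to the page that is returned.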
  • 12. Result field aliasing and pseudo fields
    – Aliasing (here with the Norwegian aliases tittel, "title", and rabattpris, "discount price"):
      • q=foo&fl=score,tittel:title,rabattpris:sub(price,discount)
    – Field name globbing:
      • q=foo&fl=score,t*
    – Pseudo fields:
      • q=foo&fl=score,[explain],[docid],[shard],[value v=42 t=int]
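The `fl` values above follow a simple pattern: plain field names, `alias:expression` pairs, and bracketed pseudo fields, joined by commas. A small helper can assemble and URL-encode such a parameter; `build_fl` is a hypothetical name introduced only for this sketch.

```python
from urllib.parse import urlencode


def build_fl(plain=(), aliases=None, pseudo=()):
    """Assemble an fl value from plain fields, alias->expression pairs and pseudo fields."""
    parts = list(plain)
    parts += [f"{alias}:{expr}" for alias, expr in (aliases or {}).items()]
    parts += [f"[{name}]" for name in pseudo]
    return ",".join(parts)


fl = build_fl(
    plain=["score"],
    aliases={"tittel": "title", "rabattpris": "sub(price,discount)"},
    pseudo=["explain", "docid"],
)
# urlencode escapes the commas, colons, brackets and parentheses in fl.
query = urlencode({"q": "foo", "fl": fl})
print(fl)
print(query)
```

Encoding matters here because the raw `fl` value contains characters that are not safe to paste into a URL unescaped.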
  • 13. Pivot facets
    – Multi-dimensional facets
      • &facet.pivot=cat,popularity
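A pivot facet comes back as a tree: each value of the first field carries a nested list of counts for the second field. The sketch below flattens such a tree into rows. The node shape (`field`, `value`, `count`, nested `pivot`) is an assumption about the response layout, and the sample data is invented.

```python
def flatten_pivot(nodes, path=()):
    """Yield (path, count) for every node in a pivot tree, depth-first."""
    for node in nodes:
        current = path + ((node["field"], node["value"]),)
        yield current, node["count"]
        # Leaf nodes have no nested "pivot" list.
        yield from flatten_pivot(node.get("pivot", []), current)


# Invented sample for facet.pivot=cat,popularity (assumed response shape).
sample = [
    {"field": "cat", "value": "electronics", "count": 3,
     "pivot": [
         {"field": "popularity", "value": 6, "count": 2},
         {"field": "popularity", "value": 7, "count": 1},
     ]},
    {"field": "cat", "value": "memory", "count": 1,
     "pivot": [{"field": "popularity", "value": 5, "count": 1}]},
]

rows = list(flatten_pivot(sample))
for path, count in rows:
    print(" > ".join(f"{field}={value}" for field, value in path), count)
```

Each row corresponds to one cell of the multi-dimensional breakdown, which is convenient for rendering a nested navigation menu or a table.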
  • 14. Join query
    – Simple Join feature (inner join)
    – &q={!join from=manu_id to=id}ipod
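The join example can be read in two steps: find the documents matching `ipod`, collect their `manu_id` values, then return the documents whose `id` is one of those values. The following in-memory sketch emulates that reading; it is an illustration of the semantics under that assumption, not how Solr executes the join, and the documents are invented.

```python
def join_query(docs, matches, from_field, to_field):
    """Emulate {!join from=F to=T}: return docs whose T is an F value of a matching doc."""
    # Step 1: collect the "from" values of all documents matching the inner query.
    keys = {d[from_field] for d in docs if matches(d) and from_field in d}
    # Step 2: keep the documents whose "to" field holds one of those values.
    return [d for d in docs if d.get(to_field) in keys]


# Invented sample: manufacturers and products in the same index.
docs = [
    {"id": "apple", "name": "Apple Inc."},
    {"id": "belkin", "name": "Belkin"},
    {"id": "samsung", "name": "Samsung"},
    {"id": "p1", "name": "ipod video", "manu_id": "apple"},
    {"id": "p2", "name": "ipod cable", "manu_id": "belkin"},
    {"id": "p3", "name": "hard drive", "manu_id": "samsung"},
]

result = join_query(docs, lambda d: "ipod" in d["name"], "manu_id", "id")
print([d["id"] for d in result])
```

Note that the result contains the manufacturer documents, not the matching products: the join returns the "to" side only.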
  • 15. New Admin GUI
  • 16. Solr 3.6
    – SOLR-2764*: NorwegianLightStemmer, NorwegianMinimalStemmer
    – SOLR-2202*: Money/Currency FieldType
    – SOLR-2826*: URLClassify Update Processor
    – SOLR-3056: Japanese field type in schema.xml
    – SOLR-3026*: eDismax user fields
    – SOLR-3140*: omitNorms default for all numeric field types
    – SOLR-2901*: Upgrade Solr to Tika 1.0
    – SOLR-1709: Distributed Date and Range Faceting
    – SOLR-2487*: Do not include slf4j-jdk14 jar in WAR
    – SOLR-2509*: spellcheck StringIndexOutOfBoundsException
    * Committed by Jan Høydahl