Oslo Solr MeetUp March 2012 - Solr4 alpha

Short talk highlighting what we can expect in the upcoming Solr 4.0 alpha/beta release.

Transcript

  • 1. Oslo Solr Community, March 20th 2012
    What is new in Solr 4.0 beta
    Jan Høydahl
  • 2. Agenda
    – Solr/Lucene 4 beta: what, when?
    – Near-Realtime-Search
    – SolrCloud
    – Better Spellchecker
    – Flex – smaller index
    – Pluggable Ranking
    – Sort by Function
    – Result field aliasing and pseudo fields
    – Pivot facets
    – Join query
    – New Admin GUI
    – And what about Solr 3.6?
  • 3. 4.0 beta?
    – Never released a public beta before
    – So many changes, it makes sense
    – Time frame??
    – Stability
  • 4. Near-Realtime-Search
    – Before:
      • Add, add, add, add (not searchable)
      • Commit (new segment written → searchable)
    – 4.0:
      • In-memory index
      • Add
      • Soft-commit (within/auto)
      • Real-time GET:

        <!-- realtime get handler, guaranteed to return the latest stored fields of
             any document, without the need to commit or open a new searcher. The
             current implementation relies on the updateLog feature being enabled. -->
        <requestHandler name="/get" class="solr.RealTimeGetHandler">
          <lst name="defaults">
            <str name="omitHeader">true</str>
          </lst>
        </requestHandler>
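The add / soft commit / real-time get flow on slide 4 can be sketched as three request URLs. This is a minimal sketch that only builds the URLs and does not contact a server. The base URL and the document id are assumptions for illustration; `softCommit`, `commitWithin` and the `id` parameter of `/get` are standard Solr 4 request parameters.

```python
from urllib.parse import urlencode

# Assumed base URL of a local Solr core; adjust to your own setup.
BASE = "http://localhost:8983/solr"


def soft_commit_url(base=BASE):
    """URL asking the update handler for a soft commit."""
    return base + "/update?" + urlencode({"softCommit": "true"})


def commit_within_url(millis, base=BASE):
    """URL for an update that should be committed within `millis` ms."""
    return base + "/update?" + urlencode({"commitWithin": millis})


def realtime_get_url(doc_id, base=BASE):
    """URL for the /get handler; it can return a document before any commit,
    provided the updateLog is enabled as the config comment says."""
    return base + "/get?" + urlencode({"id": doc_id})


print(soft_commit_url())
print(commit_within_url(1000))
print(realtime_get_url("doc1"))  # "doc1" is a hypothetical document id
```

The point of the real-time get is that the last URL can return a freshly added document even though no searcher has been reopened yet.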
  • 5. Solr Cloud
    – Solr Cloud is the popular name for an initiative to make Solr more easily scalable and manageable in a distributed world
    – Enables centralized configuration and cluster status monitoring
    – Solr 4.0 beta contains the first features
      • Apache ZooKeeper support, including built-in ZK
      • Support for automatic distributed/load-balanced queries (by means of ZK)
      • Fault-tolerant indexing and recovery
      • Add a new node and let it discover its role and sync up
    – Expected features to come
      • Tools to manage the config in ZK
      • Re-balancing of shards
    http://wiki.apache.org/solr/SolrCloud
  • 6. Solr Cloud...
    – New concepts:
      • Collection: Cores making up one data set
      • ZooKeeper: Central coordination server
    – Easier distributed search:
      • /solr/web/select?q=*:*&distrib=true
        – This queries all cores in the same "collection"
    – Easier distributed indexing:
      • http://<any.server>/solr/web/update...
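The distributed search request on slide 6 can be built against any node of the cluster. The sketch below only assembles the URL. The host name is a hypothetical placeholder, while the collection name `web` and the `distrib=true` parameter come from the slide.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def distributed_select_url(host, collection, query="*:*"):
    """Build a select URL that asks Solr to query all cores of a collection."""
    params = urlencode({"q": query, "distrib": "true"})
    return "http://%s/solr/%s/select?%s" % (host, collection, params)


# "solr1.example.com:8983" is an assumed host; any node should do.
url = distributed_select_url("solr1.example.com:8983", "web")
print(url)
print(parse_qs(urlsplit(url).query))
```

Note that `urlencode` percent-encodes `*:*`; Solr decodes it back on the server side.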
  • 7. Solr Cloud on the index side...
    http://wiki.apache.org/solr/SolrCloud
  • 8. Better spellchecker
    – Direct SpellChecker
    – Automaton based (no extra Lucene index)
    – No long build times
    – Better performance
    – Better accuracy (?)
  • 9. Flex – smaller index
    – Lucene's Flex APIs
    – Lets you plug in your own codecs
    – Greater flexibility in how you can represent the binary index
    – Opens up for many new features
      • DocValues
      • Pluggable ranking
      • TEXT index
      • Store as UTF-8[]
      • Or other encoding for space saving for Chinese
  • 10. Pluggable Ranking
    – Lucene uses TF/IDF and VSM
    – Now support for BM25
    – Plug your own!
    – Hopefully attracts researchers
    – Also, pluggable Similarity class per field
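To give a feel for what swapping TF/IDF for BM25 on slide 10 changes, here is a minimal sketch of both term weights in their textbook form. It is not Lucene's actual implementation: norms, boosts and query normalization are left out, and the parameter defaults `k1=1.2` and `b=0.75` are the commonly quoted values, used here as assumptions.

```python
import math


def classic_tfidf(tf, doc_freq, num_docs):
    """Simplified tf-idf weight: sqrt(tf) * (1 + ln(N / (df + 1)))."""
    return math.sqrt(tf) * (1.0 + math.log(num_docs / (doc_freq + 1.0)))


def bm25(tf, doc_freq, num_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Textbook Okapi BM25 weight for one term in one document."""
    idf = math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: longer-than-average documents are penalized.
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + norm)


# Hypothetical corpus statistics, for illustration only.
for tf in (1, 2, 10, 100):
    print(tf, classic_tfidf(tf, 10, 1000), bm25(tf, 10, 1000, 100, 100))
```

The main practical difference is saturation: the tf-idf weight keeps growing with the term frequency, while the BM25 weight approaches an upper bound of `idf * (k1 + 1)`.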
  • 11. Sort by Function
    – q=foo&sort=sub(price,discount) desc
    – q=foo&sort=dist(2, x, y, 0, 0) asc
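What the two sort expressions on slide 11 compute can be emulated client-side on a few documents. The sample documents are invented for illustration. `sub(price,discount)` is the price minus the discount, and `dist(2, x, y, 0, 0)` is the Euclidean distance (power 2) from the point (x, y) to the origin.

```python
import math

# Hypothetical documents; field names follow the slide.
docs = [
    {"id": "a", "price": 100.0, "discount": 10.0, "x": 3.0, "y": 4.0},
    {"id": "b", "price": 80.0, "discount": 0.0, "x": 1.0, "y": 1.0},
    {"id": "c", "price": 120.0, "discount": 50.0, "x": 6.0, "y": 8.0},
]


def sub(doc):
    """Equivalent of sub(price,discount)."""
    return doc["price"] - doc["discount"]


def dist2(doc, px=0.0, py=0.0):
    """Equivalent of dist(2, x, y, px, py): Euclidean distance."""
    return math.hypot(doc["x"] - px, doc["y"] - py)


by_net_price_desc = [d["id"] for d in sorted(docs, key=sub, reverse=True)]
by_distance_asc = [d["id"] for d in sorted(docs, key=dist2)]

print(by_net_price_desc)  # net prices are 90, 80, 70
print(by_distance_asc)    # distances are 5.0, about 1.41, 10.0
```

In Solr the same ordering is computed on the server for every matching document, so no extra field has to be indexed for the derived value.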
  • 12. Result field aliasing and pseudo fields
    – Aliasing ("tittel" and "rabattpris" are Norwegian alias names meaning "title" and "discount price"):
      • q=foo&fl=score,tittel:title,rabattpris:sub(price,discount)
    – Field name globbing:
      • q=foo&fl=score,t*
    – Pseudo fields:
      • q=foo&fl=score,[explain],[docid],[shard],[value v=42 t=int]
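The `fl` values on slide 12 follow a simple pattern: plain field names, `alias:expression` pairs, and pseudo fields in square brackets, all comma-separated. The helper below is a hypothetical convenience function (not part of any Solr client library) that assembles such a value.

```python
def build_fl(fields=(), aliases=None, pseudo=()):
    """Assemble an fl parameter value from fields, aliases and pseudo fields."""
    parts = list(fields)
    # alias:expression, e.g. rabattpris:sub(price,discount)
    for alias, expr in (aliases or {}).items():
        parts.append("%s:%s" % (alias, expr))
    # Pseudo fields (document transformers) are wrapped in square brackets.
    for name in pseudo:
        parts.append("[%s]" % name)
    return ",".join(parts)


print(build_fl(["score"], {"tittel": "title",
                           "rabattpris": "sub(price,discount)"}))
print(build_fl(["score"], pseudo=["explain", "docid", "shard"]))
```

The resulting string still needs URL encoding before it is sent as a request parameter.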
  • 13. Pivot facets
    – Multi-dimensional facets
      • &facet.pivot=cat,popularity
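A pivot facet such as `facet.pivot=cat,popularity` on slide 13 comes back as a tree: each `cat` value carries its own count plus a nested list of `popularity` values. The sketch below flattens such a tree into rows. The sample response is hand-written with invented values, and its shape (`field`, `value`, `count`, `pivot`) is an assumption based on the JSON response format.

```python
# Hand-written sample of the facet_pivot section of a response (invented data).
SAMPLE = {
    "cat,popularity": [
        {
            "field": "cat",
            "value": "electronics",
            "count": 3,
            "pivot": [
                {"field": "popularity", "value": 6, "count": 2},
                {"field": "popularity", "value": 1, "count": 1},
            ],
        }
    ]
}


def flatten_pivot(nodes, path=()):
    """Turn a pivot tree into (path, count) rows, parents before children."""
    rows = []
    for node in nodes:
        current = path + ((node["field"], node["value"]),)
        rows.append((current, node["count"]))
        # Leaf nodes have no "pivot" key, so fall back to an empty list.
        rows.extend(flatten_pivot(node.get("pivot", []), current))
    return rows


for row_path, count in flatten_pivot(SAMPLE["cat,popularity"]):
    print(" > ".join("%s=%s" % pair for pair in row_path), count)
```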
  • 14. Join query
    – Simple Join feature (inner join)
    – &q={!join from=manu_id to=id}ipod
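The join on slide 14 can be read in two steps: find the documents matching `ipod`, collect their `manu_id` values, then return the documents whose `id` is one of those values. The sketch below emulates this on an in-memory list. The sample documents and the `matches` predicate are invented stand-ins for the index and the query.

```python
# Hypothetical index: two products and two manufacturer documents.
DOCS = [
    {"id": "p1", "name": "ipod nano", "manu_id": "apple"},
    {"id": "p2", "name": "galaxy tab", "manu_id": "samsung"},
    {"id": "apple", "name": "Apple Inc."},
    {"id": "samsung", "name": "Samsung"},
]


def join(docs, matches, from_field, to_field):
    """Emulate {!join from=from_field to=to_field}<query>."""
    # Step 1: collect the join keys from the documents matching the query.
    keys = {d[from_field] for d in docs if from_field in d and matches(d)}
    # Step 2: return the documents whose to_field is one of those keys.
    return [d for d in docs if d.get(to_field) in keys]


result = join(DOCS, lambda d: "ipod" in d["name"], "manu_id", "id")
print([d["id"] for d in result])
```

Only the documents on the "to" side are returned; fields of the matching "from" documents are not merged into the result.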
  • 15. New Admin GUI
  • 16. Solr 3.6
    – SOLR-2764*: NorwegianLightStemmer, NorwegianMinimalStemmer
    – SOLR-2202*: Money/Currency FieldType
    – SOLR-2826*: URLClassify Update Processor
    – SOLR-3056: Japanese field type in schema.xml
    – SOLR-3026*: eDismax user fields
    – SOLR-3140*: omitNorms default for all numeric field types
    – SOLR-2901*: Upgrade Solr to Tika 1.0
    – SOLR-1709: Distributed Date and Range Faceting
    – SOLR-2487*: Do not include slf4j-jdk14 jar in WAR
    – SOLR-2509*: spellcheck StringIndexOutOfBoundsException
    * Committed by Jan Høydahl