What's New in Solr, June 2014
 

  • Asynchronous collection API calls in the Solr Reference Guide: https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-AsynchronousCalls
    REQUESTSTATUS action in the Solr Reference Guide: http://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RequestStatus
  • See Pagination of Results in the Solr Reference Guide: https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
  • Chris Hostetter’s scripts to produce the graph: https://github.com/LucidWorks/blog-deep-paging-perf
  • Date Math Expressions in Solr Javadocs: https://lucene.apache.org/solr/4_8_1/solr-core/org/apache/solr/util/DateMathParser.html
    See Chris Hostetter’s blog post “New in Solr 4.8: Document Expiration”: http://searchhub.org/2014/05/07/document-expiration/
  • See the “Managed Resources” page in the Solr Reference Guide: https://cwiki.apache.org/confluence/display/solr/Managed+Resources
    See also Tim Potter’s blog “Using Solr’s REST APIs to manage stop words and synonyms”: http://searchhub.org/2014/03/31/introducing-solrs-restmanager-and-managed-stop-words-and-synonyms/
  • For info on Tri-level compositeId routing, see Anshum Gupta’s blog “Multi level composite-id routing in SolrCloud”: http://searchhub.org/2014/01/06/10590/
    See the Config Sets page in the Solr Reference Guide: https://cwiki.apache.org/confluence/display/solr/Config+Sets
  • Suggester v2 JIRA issue: https://issues.apache.org/jira/browse/SOLR-5378
    Simple Query Parser in the Solr Reference Guide: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-SimpleQueryParser
    Complex Phrase Query Parser in the Solr Reference Guide: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
  • See the Collapse & Expand page in the Solr Reference Guide: https://cwiki.apache.org/confluence/display/solr/Collapse+%26+Expand
    See also Joel Bernstein’s blog post “The CollapsingQParserPlugin: Solr’s New High Performance Field Collapsing PostFilter”: http://heliosearch.org/the-collapsingqparserplugin-solrs-new-high-performance-field-collapsing-postfilter/
    See also Joel Bernstein’s blog post “Solr’s New Expand Component”: http://heliosearch.org/solrs-new-expand-component/
    See also Joel Bernstein’s blog post “Using the ExpandComponent to expand a Solr Block Join”: http://heliosearch.org/expand-block-join/
  • See Joel Bernstein’s blog post “Solr’s New AnalyticsQuery API”: http://heliosearch.org/solrs-new-analyticsquery-api/
    See Joel Bernstein’s blog post “New in Solr 4.9: Query Re-Ranking”: http://heliosearch.org/solrs-new-re-ranking-feature/

Presentation Transcript

  • What’s New in Solr: Solr 4.7 & 4.8 (June 12, 2014)
  • Speaker: Steve Rowe
      • Software Engineer at LucidWorks
      • Lucene/Solr committer and PMC member
      • Previously worked on search and NLP at the Center for Natural Language Processing at Syracuse University’s iSchool
      • Twitter: @steven_a_rowe
  • Agenda
      • A short history of Solr 4
      • Solr 4.7 and 4.8: new features
      • Solr 4.9 and beyond
  • A short history of Solr 4
      • Solr 4.0 released October 2012
  • A short history of Solr 4
      • SolrCloud
          – Distributed indexing and searching, NRT and NoSQL features, e.g. realtime-get, optimistic concurrency and durable updates
          – Sharding, replication, ZooKeeper ensemble
          – High availability with no single points of failure
      • Real-time Get: access the latest document version; no commit or new searcher open required
      • Atomic updates: incremental field add/update/increment via stored fields
      • NRT: “soft” commits
  • A short history of Solr 4
      • Solr Reference Guide now released with each feature release:
          – Live (targeting next Solr release): http://s.apache.org/SolrReferenceGuide
          – Most recent released PDF: http://s.apache.org/Solr-Ref-Guide-PDF
          – Previous release PDFs: http://s.apache.org/Older-Solr-Ref-Guide-PDFs
  • A short history of Solr 4
      • Flexible indexing
          – Solr core = Lucene index
              • Lucene index = 1 or more segments
          – Codec: per-segment suite of formats
      • Flexible scoring
          – You can specify a similarity implementation per fieldType in your schema.xml if you use SchemaSimilarityFactory
          – Built-in Similarities (other than the default TF-IDF):
              • Okapi BM25
              • Divergence from Randomness
              • Information-Based
              • Language Models (with two smoothing implementations)
              • SweetSpot
  • A short history of Solr 4
      • DocValues: typed column stride fields
          – Document-to-value mapping built at index time
          – Reduced memory usage compared to field cache
          – Good for faceting and sorting
          – Missing values supported as of Solr 4.5
      • Pseudo-fields
          – Field aliasing, e.g. &fl=result:indexed
          – Function queries, aliasable too, e.g. &fl=price:sum(a,b)
          – Document transformers
              • Standard: [explain], [value], [shard], [docid]
      • Pseudo-joins, e.g. ?q={!join+from=manu+to=id}ipod
      • Pivot faceting: automatic drill-down (no distributed support)
  • A short history of Solr 4
      • Schema API
          – GET /collection/schema/fields/fieldname
          – PUT /collection/schema/fields/name
          – JSON body: { "type":"text_general", "stored":true, "indexed":true }
      • Schemaless mode
          – a.k.a. data-driven schema or field guessing
          – Class guessed based on field values, then class(es) mapped to a fieldType; first gets added to the schema
          – Supported value classes: Boolean, Integer, Long, Float, Double, and Date
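
The Schema API PUT call on this slide can be sketched in Python. This is a minimal illustration, not an official client: the base URL `http://localhost:8983/solr`, the collection name `collection1`, and the field name `newfield` are all assumptions made for the example. The request is only built, never sent.

```python
import json
import urllib.request


def build_add_field_request(base_url, collection, field_name, field_def):
    """Build (but do not send) a Schema API PUT request for one field."""
    url = f"{base_url}/{collection}/schema/fields/{field_name}"
    body = json.dumps(field_def).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# All three names below are hypothetical placeholders.
req = build_add_field_request(
    "http://localhost:8983/solr",
    "collection1",
    "newfield",
    {"type": "text_general", "stored": True, "indexed": True},
)
```

Passing `req` to `urllib.request.urlopen` would send it to a running Solr instance whose schema is configured to be mutable.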
  • A short history of Solr 4
      • Document routing
          – CompositeId router, e.g. id=tenant!docid
              • Used by default when numShards is specified when creating a collection
              • Restrict queries to shard(s): &_route_=tenant!
          – Implicit router
      • Online shard splitting
          – Allows collections to scale, rather than having to decide up front how much to overshard
          – Split in two; with custom hash ranges; or using the split.key param to split to a dedicated shard
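
As a rough sketch of the compositeId convention above, the helpers below build a `tenant!docid` identifier and the query string that restricts a search to that tenant's shard(s) via `_route_`. The function names and the sample tenant `acme` are hypothetical; only the `!` separator and the `_route_` parameter come from the slide.

```python
from urllib.parse import urlencode


def composite_id(tenant, doc_id):
    """Prefix the document id with the tenant so the router co-locates them."""
    return f"{tenant}!{doc_id}"


def routed_query_string(query, tenant):
    """Query string that limits the search to the shard(s) holding this tenant."""
    return urlencode({"q": query, "_route_": f"{tenant}!"})


doc_id = composite_id("acme", "42")
query_string = routed_query_string("title:solr", "acme")
```

Note that `urlencode` percent-encodes the `!` and `:` characters, which Solr decodes back on receipt.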
  • A short history of Solr 4
      • Nested documents, a.k.a. Block Join
          – Nested doc to be added:
              <add>
                <doc>
                  <field name="id">1</field>
                  <field name="title">Solr adds block join support</field>
                  <field name="content_type">parentDocument</field>
                  <doc>
                    <field name="id">2</field>
                    <field name="comments">SolrCloud supports it too!</field>
                  </doc>
                </doc>
              </add>
          – Queries:
              • Child query parser, e.g. q={!child of="content_type:parentDocument"}title:Solr
              • Parent query parser, e.g. q={!parent which="content_type:parentDocument"}comments:SolrCloud
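
A nested update message like the one on this slide can also be generated programmatically. The following is a small sketch using only the Python standard library; the helper names are made up for illustration and the output is the XML payload only, with nothing posted to Solr.

```python
import xml.etree.ElementTree as ET


def _add_fields(doc, fields):
    """Append one <field name="..."> element per (name, value) pair."""
    for name, value in fields:
        el = ET.SubElement(doc, "field", {"name": name})
        el.text = value


def build_nested_add(parent_fields, children):
    """Return an <add> message with child <doc> elements nested in the parent."""
    add = ET.Element("add")
    parent = ET.SubElement(add, "doc")
    _add_fields(parent, parent_fields)
    for child_fields in children:
        child = ET.SubElement(parent, "doc")
        _add_fields(child, child_fields)
    return ET.tostring(add, encoding="unicode")


xml_payload = build_nested_add(
    [("id", "1"),
     ("title", "Solr adds block join support"),
     ("content_type", "parentDocument")],
    [[("id", "2"), ("comments", "SolrCloud supports it too!")]],
)
```

Child fields are written after the parent's own fields, so the parent and its children stay in one block, as block join requires.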
  • A short history of Solr 4
      • solr.xml legacy & discovery modes
          – Legacy mode (cores listed in solr.xml) is deprecated; support will be removed in Solr 5.
          – Discovery mode (new as of Solr 4.3):
              • No cores are listed in solr.xml
              • Cores are discovered by a recursive walk of the solr home directory, marked by core.properties files
              • Nested core directories are not allowed
  • A short history of Solr 4
      • New web admin UI with SolrCloud support
  • Solr 4.7 and 4.8: new features
      • As of Solr 4.8, Java 7 is the minimum supported JVM version. Recommended: Oracle 1.7.0_60
      • <fields> and <types> tags are no longer necessary in schema.xml
      • Collections API improvements
          – Working toward “ZooKeeper = Truth” mode
              • legacyCloud=false cluster property
          – New actions:
              • CLUSTERSTATUS, LIST, ADDROLE, DELETEROLE, ADDREPLICA, DELETEREPLICA, OVERSEERSTATUS, MIGRATE, CLUSTERPROP
          – Core properties can be specified with CREATE and SPLITSHARD actions
  • Solr 4.7 and 4.8: new features
      • Asynchronous execution of long-running actions
          – SolrCloud Collections API:
              • CREATE, SPLITSHARD, MIGRATE
          – CoreAdminHandler:
              • CREATE, RENAME, UNLOAD, SWAP, MERGEINDEXES, SPLIT
          – Tracking request ID supplied via async param
          – Track status via the new REQUESTSTATUS action, using the tracking request ID
              • Possible states: running, complete, failed, notfound
          – Clear stored statuses with special request ID -1
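
The asynchronous flow above (submit with a tracking ID, then poll) can be sketched as two URL builders. This is an illustration under assumptions: the base URL, collection name, and shard name are hypothetical, and the `requestid` parameter name follows the REQUESTSTATUS description in the Solr Reference Guide rather than the slide, so check it against your Solr version.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, params):
    """Build a Collections API URL from an ordered mapping of parameters."""
    return f"{base_url}/admin/collections?{urlencode(params)}"


def async_splitshard_url(base_url, collection, shard, request_id):
    """SPLITSHARD submitted asynchronously under a caller-chosen tracking ID."""
    return collections_api_url(base_url, {
        "action": "SPLITSHARD",
        "collection": collection,
        "shard": shard,
        "async": request_id,
    })


def request_status_url(base_url, request_id):
    """REQUESTSTATUS for a tracking ID; per the slide, -1 clears stored statuses."""
    return collections_api_url(base_url, {
        "action": "REQUESTSTATUS",
        "requestid": request_id,
    })


base = "http://localhost:8983/solr"  # assumed base URL
submit_url = async_splitshard_url(base, "collection1", "shard1", "1000")
status_url = request_status_url(base, "1000")
clear_url = request_status_url(base, "-1")
```

A client would fetch `submit_url` once and then poll `status_url` until the reported state is no longer running.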
  • Solr 4.7 and 4.8: new features
      • Cursors: Efficient Deep Paging
          – Request must include a sort, which must include the uniqueKey, which must be defined
          – First page: ?q=…&sort=id+asc&rows=N&cursorMark=*
              • Response contains "nextCursorMark":"<base64encoded>"
          – Following pages: ?q=…&sort=id+asc&rows=N&cursorMark=<from response>
          – Repeat; when nextCursorMark equals the cursorMark from the request, there are no more results
          – No server-side state
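
The cursor protocol above reduces to a short client-side loop. The sketch below assumes a caller-supplied `fetch_page(params)` function that returns the parsed JSON response; `make_fake_fetch` is a stand-in written for this example so the loop can run without a Solr server, and its cursor marks are plain offsets rather than Solr's real encoded marks.

```python
def iterate_with_cursor(fetch_page, query, rows, sort="id asc"):
    """Yield every matching document, following nextCursorMark until it repeats."""
    cursor_mark = "*"  # the first request always starts from "*"
    while True:
        response = fetch_page({
            "q": query,
            "sort": sort,  # must include the uniqueKey field
            "rows": rows,
            "cursorMark": cursor_mark,
        })
        yield from response["response"]["docs"]
        next_mark = response["nextCursorMark"]
        if next_mark == cursor_mark:  # unchanged mark: no more results
            return
        cursor_mark = next_mark


def make_fake_fetch(ids):
    """Hypothetical in-memory stand-in for a Solr request function."""
    ordered = sorted(ids)

    def fetch_page(params):
        mark = params["cursorMark"]
        start = 0 if mark == "*" else int(mark)
        page = ordered[start:start + params["rows"]]
        # An empty page echoes the incoming mark, as the slide describes.
        next_mark = str(start + len(page)) if page else mark
        return {
            "response": {"docs": [{"id": i} for i in page]},
            "nextCursorMark": next_mark,
        }

    return fetch_page


docs = list(iterate_with_cursor(make_fake_fetch([3, 1, 2, 5, 4]), "*:*", rows=2))
```

Because the loop carries only the last mark, the client can stop and resume at any point, which matches the "no server-side state" bullet.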
  • Solr 4.7 and 4.8: new features (slide with no transcribed text)
  • Solr 4.7 and 4.8: new features
      • Document expiration and Time To Live (TTL)
          – Auto-delete expired documents
              • DocExpirationUpdateProcessorFactory can periodically wake up and delete expired documents
          – Compute expiration date from TTL
              • Update request _ttl_ param, or
              • Document _ttl_ field
              • Both names are configurable, defaulting to _ttl_.
              • _ttl_ values are interpreted as Date Math Expressions relative to NOW, e.g. “+1YEAR”.
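
To make the TTL-to-expiration step concrete, here is a deliberately simplified sketch. It is not Solr's DateMathParser: it accepts a single `+<N><UNIT>` offset and supports only fixed-length units (seconds, minutes, hours, days), so calendar units such as the slide's `+1YEAR` are rejected. The function name is hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Fixed-length units only; calendar units such as MONTH and YEAR are
# intentionally left out of this sketch.
_UNIT_TO_TIMEDELTA_ARG = {
    "SECOND": "seconds",
    "MINUTE": "minutes",
    "HOUR": "hours",
    "DAY": "days",
}

_TTL_PATTERN = re.compile(r"\+(\d+)(SECOND|MINUTE|HOUR|DAY)S?")


def expiration_from_ttl(ttl, now):
    """Return now + ttl for a single '+<N><UNIT>' offset, e.g. '+2DAYS'."""
    match = _TTL_PATTERN.fullmatch(ttl)
    if match is None:
        raise ValueError(f"unsupported TTL expression in this sketch: {ttl!r}")
    amount = int(match.group(1))
    unit = match.group(2)
    return now + timedelta(**{_UNIT_TO_TIMEDELTA_ARG[unit]: amount})


now = datetime(2014, 6, 12, 12, 0, tzinfo=timezone.utc)
expires_at = expiration_from_ttl("+2DAYS", now)
```

In Solr itself the computed date is stored on the document, and the periodic delete pass removes documents whose expiration date has passed.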
  • Solr 4.7 and 4.8: new features
      • Dynamic synonyms and stopwords
          – “Managed” resources: configuration and content for synonyms and stopwords, persistence managed by Solr
          – Specified as ManagedSynonymFilterFactory and ManagedStopFilterFactory on analyzers in schema.xml
          – CRUD operations are enabled via a REST endpoint per managed resource.
          – The “managed” attribute names the REST endpoint, e.g. <filter class="solr.ManagedStopFilterFactory" managed="french" />
          – E.g. to delete stopword “le” from the “french” managed stoplist: curl -X DELETE "…/solr/colln/schema/analysis/stopwords/french/le"
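
The curl call on this slide has a direct Python equivalent. The sketch below only builds the DELETE request; the slide elides the host part of the URL, so the `http://localhost:8983/solr` value used here is an assumed placeholder, and the function name is hypothetical.

```python
import urllib.request
from urllib.parse import quote


def stopword_delete_request(solr_url, collection, managed_name, word):
    """Build (but do not send) a DELETE for one word in a managed stopword list."""
    url = (
        f"{solr_url}/{collection}/schema/analysis/stopwords/"
        f"{quote(managed_name, safe='')}/{quote(word, safe='')}"
    )
    return urllib.request.Request(url, method="DELETE")


# solr_url is an assumption; the slide does not give the host.
delete_req = stopword_delete_request(
    "http://localhost:8983/solr", "colln", "french", "le"
)
```

Percent-encoding the list name and the word keeps the path valid when a stopword contains non-ASCII characters.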
  • Solr 4.7 and 4.8: new features
      • SSL support in SolrCloud
          – URL scheme stored in ZooKeeper
          – SSL certificates are specifiable via system properties, to enable authentication
      • Nested documents may be specified in JSON format
      • Tri-level compositeId routing
          – E.g. “tenant!group!docid”, 8/8/16 hash bits per component
      • Build Solr indexes with Hadoop’s MapReduce
          – Mark Miller’s blog: http://bit.ly/1oh0fWq
          – GitHub solr-map-reduce-example: http://bit.ly/1pnDAao
      • Named config sets in non-SolrCloud mode
          – Default base directory is SOLR_HOME/configsets/
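
The "8/8/16 hash bits per component" idea can be illustrated with a toy routing hash. This is only a sketch of the bit layout: it uses `zlib.crc32` as a stand-in hash, which is an assumption made to keep the example dependency-free and is not the hash function Solr uses, so the values will not match Solr's shard assignments.

```python
import zlib


def _hash32(text):
    """Stand-in 32-bit hash (CRC32); NOT the hash function Solr uses."""
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF


def tri_level_hash(composite_id):
    """Combine tenant, group, and doc id into one 32-bit value, 8/8/16 bits each."""
    tenant, group, doc_id = composite_id.split("!")
    return (
        (_hash32(tenant) & 0xFF000000)    # top 8 bits come from the tenant
        | (_hash32(group) & 0x00FF0000)   # next 8 bits come from the group
        | (_hash32(doc_id) & 0x0000FFFF)  # low 16 bits come from the doc id
    )


a = tri_level_hash("acme!sales!1")
b = tri_level_hash("acme!sales!2")
c = tri_level_hash("acme!hr!1")
```

Documents that share a tenant and group agree on the top 16 bits, so they fall in the same narrow hash range and tend to land on the same shard.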
  • Solr 4.7 and 4.8: new features
      • Suggester v2
          – Added BlendedInfixSuggester
          – Added FreeTextSuggester
          – Queries can use multiple suggesters
      • New query parsing features
          – SimpleQParserPlugin: parser for human-entered queries with selectable operators
          – ComplexPhraseQParserPlugin: wildcards, ORs, etc. inside Phrase Queries
              • E.g. {!complexphrase inOrder=true}name:"Jo* Smith"
  • Solr 4.7 and 4.8: new features
      • CollapsingQParserPlugin
          – Performant alternative grouping/field collapsing implementation, for high distinct group cardinality
      • ExpandComponent
          – Expands collapsed groups
          – Can also expand nested documents
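
A request that uses both features typically pairs a collapse filter query with the expand flag. The sketch below builds such a query string; the `fq={!collapse field=...}`, `expand`, and `expand.rows` parameters follow the Collapse & Expand page of the Solr Reference Guide rather than the slide, and the field name `group_id` is a hypothetical example.

```python
from urllib.parse import urlencode


def collapse_expand_query(query, collapse_field, expand_rows=None):
    """Query string that collapses on one field and expands the groups."""
    params = [
        ("q", query),
        ("fq", f"{{!collapse field={collapse_field}}}"),  # collapsing post filter
        ("expand", "true"),                               # turn on ExpandComponent
    ]
    if expand_rows is not None:
        params.append(("expand.rows", str(expand_rows)))
    return urlencode(params)


query_string = collapse_expand_query("ipod", "group_id")
limited = collapse_expand_query("ipod", "group_id", expand_rows=5)
```

The collapse filter keeps one document per group in the main result list, and the expanded section of the response carries the other members of each group.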
  • Solr 4.9 and beyond
      • ZooKeeper = Truth / legacyCloud=false
      • MODIFYCOLLECTION collections API
          – Modify maxShardsPerNode, replicationFactor for the entire collection
      • Incremental Field Updates on numeric DocValues
          – Binary DocValues IFUs also coming
      • Multi-valued DocValues sort fields
      • Legacy numeric/date field types deprecated, removed in Solr 5 in favor of Trie field types
  • Solr 4.9 and beyond
      • In Solr 5, the .war will no longer be shipped
      • Index integrity: checksums
          – Integrity check on merge off by default
          – solrconfig.xml option <indexConfig><checkIntegrityAtMerge>
      • New update query param min_rf will allow clients to set the minimum successful replicas for the request
      • Return Block Join child documents when parents match, via a new DocTransformer [child parentFilter="field:value"]
  • Solr 4.9 and beyond
      • AnalyticsQuery: support pluggable, pipeline-able analytics, orderable via the “cost” parameter, like PostFilters
      • ReRankingQParserPlugin
          – Re-rank the top n results
  • LucidWorks Open Source
      • Effortless AWS deployment and monitoring: http://www.github.com/lucidworks/solr-scale-tk
      • Logstash for Solr: https://github.com/LucidWorks/solrlogmanager
      • Banana (Kibana for Solr): https://github.com/LucidWorks/banana
      • Data Quality Toolkit: https://github.com/LucidWorks/data-quality
      • Coming Soon for Big Data: Hadoop, Pig, Hive 2-way support w/ Lucene and Solr, different file formats, pipelines, Logstash
  • Links
      • Solr website: http://lucene.apache.org/solr
      • Solr Reference Guide:
          – Live (targeting next Solr release): http://s.apache.org/SolrReferenceGuide
          – Most recent released PDF: http://s.apache.org/Solr-Ref-Guide-PDF
          – Previous release PDFs: http://s.apache.org/Older-Solr-Ref-Guide-PDFs
      • Lucene/Solr Revolution: http://www.LuceneRevolution.org
      • Q & A