Webinar: What's New in Solr 6

Solr committer Cassandra Targett shares a preview of features and functionality included in the upcoming release of Apache Solr.

Published in: Technology

  1. 1. What’s New in Solr 6 Cassandra Targett
  2. 2. OCTOBER 11-14, 2016 • BOSTON, MA
  3. 3. Introduction • Lucene/Solr committer since 2013 • Director of Engineering at Lucidworks
  4. 4. Solr 6 builds on the innovations of Solr 5 • Easy to use • Scalable • Secure
  5. 5. Solr 5 Main Themes • Easy to Use • bin/solr and bin/post improvements • JSON-based facets • More APIs • Modern UI (Angular-based) • Scalable • SolrCloud hardening • Replica placement strategy • Streaming expressions • Secure • Authentication and Authorization frameworks
  6. 6. Highlights of Recent Solr Releases (5.4 and 5.5) • Solr 5.4 • Basic authentication • ConfigSets API • FORCELEADER command • Optimizations for faceting DocValue fields • Solr 5.5 • Ability to edit ZooKeeper configs with bin/solr • Rule-based authorization flexibility • XML query parser • More async collection APIs
  7. 7. Solr 6 introduces several new features • Parallel SQL • Cross Data Center Replication • Graph Traversal • Modern APIs • Jetty 9.3 and HTTP/2
  8. 8. Parallel SQL Parallelized SQL support in Solr for scalable relational algebra
  9. 9. Seamlessly combines SQL with Solr’s full-text capabilities • Realtime MapReduce(ish) or Facet aggregation modes • Parallel execution of queries across SolrCloud • Advanced SQL syntax for powerful queries
  10. 10. Parallel SQL builds on Solr’s Streaming Capabilities • Export request handler (/export) • Streaming API • Streams tuples in JSON • new package: org.apache.solr.client.solrj.io • Streaming Expressions (/stream) • Allows non-Java programmers to access Streaming API • Expressions are essentially functions which originate the stream or operate on the stream
  11. 11. Streaming Expression Request - search curl -d 'expr=search(gettingstarted, q="*:*", fl="id, manu_exact", sort="manu_exact asc")' http://localhost:8983/solr/gettingstarted/stream { "result-set": { "docs": [ {"manu_exact": "A-DATA Technology Inc.", "id": "VDBDB1A16"}, {"manu_exact": "ASUS Computer Inc.", "id": "EN7800GTX/2DHTV/256M"}, {"manu_exact": "ATI Technologies", "id": "100-435805"} … {"EOF": true,"RESPONSE_TIME": 15}] } }
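A minimal sketch of consuming a response like the one on this slide from Python. It parses a `result-set` document and yields tuples until the `EOF` marker. The function name `iter_tuples` is illustrative, the sample payload is abbreviated from the slide, and nothing is sent to Solr.

```python
import json


def iter_tuples(response_text):
    """Yield data tuples from a /stream response, stopping at the EOF marker."""
    docs = json.loads(response_text)["result-set"]["docs"]
    for doc in docs:
        if doc.get("EOF"):
            break  # the last tuple carries EOF and RESPONSE_TIME, not data
        yield doc


# Sample payload abbreviated from the slide (no live request is made).
sample = """
{"result-set": {"docs": [
  {"manu_exact": "A-DATA Technology Inc.", "id": "VDBDB1A16"},
  {"manu_exact": "ASUS Computer Inc.", "id": "EN7800GTX/2DHTV/256M"},
  {"manu_exact": "ATI Technologies", "id": "100-435805"},
  {"EOF": true, "RESPONSE_TIME": 15}]}}
"""

for t in iter_tuples(sample):
    print(t["id"], "|", t["manu_exact"])
```

Treating the `EOF` tuple as a terminator rather than as data keeps downstream code from having to special-case it.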
  12. 12. Functions, aka Stream Sources and Stream Decorators • Define how data is retrieved and any aggregations performed • Designed to work with entire result sets • Can be compounded or wrapped to perform several operations at the same time
  13. 13. Streaming Expression Request - reduce curl http://localhost:8983/solr/gettingstarted/stream -d 'expr=reduce(search(gettingstarted, q="inStock:true", qt="/export", fl="id,manu_exact", sort="manu_exact asc"), by="manu_exact", group(sort="manu_exact asc", n="2"))'
  14. 14. Streaming Expression Response {"result-set": {"docs":[ {"id":"0380014300","group":[{"id":"0380014300"},{"id":"0553573403"}]}, {"manu_exact":"A-DATA Technology Inc.","id":"VDBDB1A16","group":[{"manu_exact":"A-DATA Technology Inc.","id":"VDBDB1A16"}]}, {"manu_exact":"Apache Software Foundation","id":"UTF8TEST","group":[{"manu_exact":"Apache Software Foundation","id":"UTF8TEST"},{"manu_exact":"Apache Software Foundation","id":"SOLR1000"}]}, {"manu_exact":"Apple Computer Inc.","id":"MA147LL/A","group":[{"manu_exact":"Apple Computer Inc.","id":"MA147LL/A"}]}, {"manu_exact":"Bank of America","id":"USD","group":[{"manu_exact":"Bank of America","id":"USD"}]}, {"manu_exact":"Bank of Norway","id":"NOK","group":[{"manu_exact":"Bank of Norway","id":"NOK"}]}, {"manu_exact":"Canon Inc.","id":"9885A004","group":[{"manu_exact":"Canon Inc.","id":"9885A004"}, {"manu_exact":"Canon Inc.","id":"0579B002"}]}, {"manu_exact":"Corsair Microsystems Inc.","id":"VS1GB400C3","group":[{"manu_exact":"Corsair Microsystems Inc.","id":"VS1GB400C3"},{"manu_exact":"Corsair Microsystems Inc.","id":"TWINX2048-3200PRO"}]}, {"manu_exact":"Dell, Inc.","id":"3007WFP","group":[{"manu_exact":"Dell, Inc.","id":"3007WFP"}]}, {"EOF":true,"RESPONSE_TIME":24}]} }
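To make the shape of this response easier to read, here is a small Python emulation of what a reduce with `group(..., n="2")` appears to do, judging from the output above: walk a stream that is already sorted on the `by` field, keep at most `n` tuples per key, and emit the first tuple of each group with a `group` list attached. This is a sketch of the idea, not Solr's implementation; `reduce_group` and the third Canon tuple are hypothetical.

```python
from itertools import groupby, islice


def reduce_group(tuples, by, n):
    """Group a stream already sorted on `by`, keeping at most n tuples per group."""
    for _, members in groupby(tuples, key=lambda t: t.get(by)):
        kept = list(islice(members, n))
        head = dict(kept[0])   # copy so the original tuple is not mutated
        head["group"] = kept
        yield head


stream = [
    {"manu_exact": "Apache Software Foundation", "id": "UTF8TEST"},
    {"manu_exact": "Apache Software Foundation", "id": "SOLR1000"},
    {"manu_exact": "Apple Computer Inc.", "id": "MA147LL/A"},
    {"manu_exact": "Canon Inc.", "id": "9885A004"},
    {"manu_exact": "Canon Inc.", "id": "0579B002"},
    {"manu_exact": "Canon Inc.", "id": "EXTRA-3RD"},  # hypothetical, dropped by n=2
]

result = list(reduce_group(stream, by="manu_exact", n=2))
for r in result:
    print(r["manu_exact"], [m["id"] for m in r["group"]])
```

The emulation only works because the input is sorted on the grouping field, which is why the request on the previous slide sorts on `manu_exact` before reducing by it.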
  15. 15. Available Functions • Stream Sources • Search • JDBC • Facet • Stats • Topic • Stream Decorators • Complement, Unique, Intersect • leftOuterJoin, innerJoin, hashJoin, outerHashJoin • Top, Rollup, Facet • Parallel • Decorators, cont’d • Update • Merge • Group, Reduce • Daemon • Select
  16. 16. Streaming Expression Request - parallel curl http://localhost:8983/solr/gettingstarted/stream -d 'expr=parallel(workcollection, search(gettingstarted, q="inStock:true", fl="id, manu_exact", sort="manu_exact asc", partitionKeys="manu_exact"), workers=2, zkHost="localhost:9983", sort="manu_exact asc")'
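The roles of `partitionKeys`, `workers`, and the outer `sort` in the request above can be illustrated with a small Python sketch: tuples are assigned to workers by hashing the partition key, so every tuple with the same key lands on the same worker, and the per-worker sorted streams are then merged back into one sorted stream. The hash used here (`zlib.crc32`) is a stand-in chosen for determinism and is not the hash Solr uses; the function names are hypothetical.

```python
import heapq
import zlib


def partition(tuples, key, workers):
    """Assign each tuple to a worker bucket by hashing its partition key."""
    buckets = [[] for _ in range(workers)]
    for t in tuples:
        # crc32 is an illustrative stand-in, not Solr's partitioning hash
        slot = zlib.crc32(t[key].encode("utf-8")) % workers
        buckets[slot].append(t)
    return buckets


def merge_sorted(buckets, key):
    """Merge per-worker streams, each sorted on `key`, into one sorted stream."""
    return list(heapq.merge(*buckets, key=lambda t: t[key]))


# Input already sorted on manu_exact, as the search expression requests.
docs = [
    {"id": "VDBDB1A16", "manu_exact": "A-DATA Technology Inc."},
    {"id": "UTF8TEST", "manu_exact": "Apache Software Foundation"},
    {"id": "SOLR1000", "manu_exact": "Apache Software Foundation"},
    {"id": "MA147LL/A", "manu_exact": "Apple Computer Inc."},
    {"id": "9885A004", "manu_exact": "Canon Inc."},
    {"id": "0579B002", "manu_exact": "Canon Inc."},
]

buckets = partition(docs, key="manu_exact", workers=2)
merged = merge_sorted(buckets, key="manu_exact")
print([d["id"] for d in merged])
```

Because all tuples for one key share a worker, each worker can group or aggregate its keys independently before the merge.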
  17. 17. Parallel SQL builds on export and streaming • SQL statements translated into Streaming Expressions • Automatic merge of results from worker nodes • Advanced SQL syntax
  18. 18. SQL Syntax • SELECT and SELECT DISTINCT • select id, manu_exact from techproducts • select distinct id, manu_exact from techproducts • WHERE • select id, manu_exact from techproducts where inStock=true • select id, manu_exact from techproducts where price='[10 TO 50]' • select id, manu_exact from techproducts where cat='(electronics or music)'
  19. 19. SQL Syntax • ORDER BY and LIMIT • select id, manu_exact from techproducts order by manu_exact asc • select id, manu_exact from techproducts limit 10 • GROUP BY • select id, manu_exact from techproducts where inStock=true group by manu
  20. 20. SQL Syntax • Stats • select count(manu_exact) as count, avg(price) as avg from techproducts • HAVING • select id, manu_exact from techproducts where inStock=true having (avg(price)>5) order by manu_exact asc
  21. 21. SQL Statement and Results curl -d "stmt=select id, manu_exact from techproducts where inStock='true' order by manu_exact limit 5" http://localhost:8983/solr/techproducts/sql {"result-set": {"docs":[ {"manu_exact":"A-DATA Technology Inc.","id":"VDBDB1A16"}, {"manu_exact":"Apache Software Foundation","id":"SOLR1000"}, {"manu_exact":"Apache Software Foundation","id":"UTF8TEST"}, {"manu_exact":"Apple Computer Inc.","id":"MA147LL/A"}, {"manu_exact":"Bank of America","id":"USD"}, {"EOF":"true","RESPONSE_TIME":8}] } }
  22. 22. Aggregation Modes • map_reduce • Tuples are shuffled to worker nodes, where aggregation occurs • Tuples are sent to worker nodes sorted by GROUP BY fields • Great for high cardinality • facet • Pushes computation to JSON Facet API - only aggregates are sent over the network • Great for low-to-moderate cardinality
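As a sketch of how a client might choose between the two modes, the Python below builds the URL and form body for a request to the `/sql` handler, passing the mode in the `aggregationMode` request parameter alongside `stmt`. The helper name `sql_request` and the sample statement are illustrative, and the request is only constructed, not sent.

```python
from urllib.parse import parse_qs, urlencode

VALID_MODES = ("map_reduce", "facet")


def sql_request(base_url, collection, stmt, aggregation_mode="map_reduce"):
    """Return (url, form_body) for a Parallel SQL request; nothing is sent."""
    if aggregation_mode not in VALID_MODES:
        raise ValueError("aggregation_mode must be one of %r" % (VALID_MODES,))
    url = "%s/%s/sql" % (base_url.rstrip("/"), collection)
    body = urlencode({"stmt": stmt, "aggregationMode": aggregation_mode})
    return url, body


stmt = "select manu_exact, count(*) from techproducts group by manu_exact"
url, body = sql_request("http://localhost:8983/solr", "techproducts", stmt, "facet")
print(url)
print(body)
```

Against a running Solr, the pair could be passed to `urllib.request.urlopen(url, data=body.encode("utf-8"))`; URL-encoding the statement avoids the shell quoting problems that hand-written curl commands run into.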
  23. 23. Parallel SQL with map_reduce Aggregation Mode [Diagram: a client sends the query to the /sql handler (SQL Tier); the Worker Tier has workers 1-4; the Data Tier has shards s1-s4 with replicas r1-r3 each.] Each worker queries 1 replica in each shard
  24. 24. JDBC Driver • Solr now includes a JDBC driver which can be used to query Solr • Can be used only with the SQL handler • DB visualization tools can also be used, such as Apache Zeppelin, SQuirreL SQL, DbVisualizer, etc.
  25. 25. Best Practices • Create a separate collection for the /sql handler and worker nodes • Designed for large clusters and large data sets • Use the correct aggregation mode • Usually best to partition on what you are grouping on
  26. 26. DocValue Fields ONLY! Export and Stream request handlers can only be used on fields that use DocValues. Because Parallel SQL uses these capabilities, in most cases it also requires DocValue fields.
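One way to catch this requirement early is to check the field list against the schema before issuing an export, stream, or SQL request. The Python sketch below assumes field definitions shaped like Schema API output (dicts with `name` and an optional `docValues` flag). It is a simplification: it looks only at the field-level flag and ignores values inherited from the field type. The helper name and the sample fields are hypothetical.

```python
def fields_missing_docvalues(schema_fields, fl):
    """Return the names in the comma-separated list `fl` that lack docValues."""
    by_name = {f["name"]: f for f in schema_fields}
    requested = [name.strip() for name in fl.split(",") if name.strip()]
    # Unknown fields are reported too, since they cannot be exported either.
    return [n for n in requested if not by_name.get(n, {}).get("docValues", False)]


# Hypothetical field definitions for illustration only.
fields = [
    {"name": "id", "type": "string", "docValues": True},
    {"name": "manu_exact", "type": "string", "docValues": True},
    {"name": "name", "type": "text_general"},
]

print(fields_missing_docvalues(fields, "id, manu_exact, name"))
```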
  27. 27. Cross Data Center Replication Replication between two or more SolrCloud clusters in two or more data centers
  28. 28. CDCR Design Points • Uses existing transaction logs • Leader-to-Leader communication avoids duplicate updates across data centers • Active-passive disaster recovery • Synchronous or asynchronous indexing • Configurable batch sizes • No single point of failure or bottlenecks
  30. 30. CDCR Limitations • Must start with an empty index or one that is already fully synchronized • May be unsatisfactory if rate of updates is high • Active-passive
  31. 31. Graph Traversal Perform graph queries for interconnected data
  32. 32. Solr supports graph queries • Follow nodes to edges • Apply optional filters during traversal • Use cases: • Find all tweets mentioning “Solr” by me or people I follow • Find all draft blog posts about “parallel sql” written by a developer I know • Find 3-star hotels in NYC my friends stayed in last year • Example: q=Solr&fq={!graph from=following_id to=id maxDepth=1}id:"childerelda"
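The graph filter on this slide contains braces, spaces, and quotes, so it has to be URL-encoded before it goes into a request. A minimal Python sketch of building that query string follows; the collection name `tweets` is an assumption made for illustration and does not come from the slides.

```python
from urllib.parse import parse_qs, urlencode

params = {
    "q": "Solr",
    # Graph query parser local params, copied from the slide
    "fq": '{!graph from=following_id to=id maxDepth=1}id:"childerelda"',
}

query_string = urlencode(params)

# "tweets" is a hypothetical collection name
print("http://localhost:8983/solr/tweets/select?" + query_string)
```

Decoding the query string with `parse_qs` returns the original `fq` value unchanged, which is a quick way to confirm nothing was lost in encoding.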
  33. 33. Modern API Redesign Solr’s user-facing APIs
  34. 34. Designed for Humans • Consistent • Versioned • Friendlier endpoint names • Introspectable • JSON output by default (`wt` still supported) • Not in 6.0, but coming very soon
  35. 35. {"responseHeader": { "status": 0, "QTime": 2 }, "initFailures": {}, "status": { "techproducts": { "name": "techproducts", "instanceDir": "/Users/cass/LuceneSolr/lucene-solr/solr/example/techproducts/solr/techproducts", "dataDir": "/Users/cass/LuceneSolr/lucene-solr/solr/example/techproducts/solr/techproducts/data/", "config": "solrconfig.xml", "schema": "managed-schema", "startTime": "2016-03-07T19:18:07.765Z", "uptime": 295560, "index": { "numDocs": 32, "maxDoc": 32, "deletedDocs": 0, "indexHeapUsageBytes": -1, "version": 6, "segmentCount": 1, "current": true, "hasDeletions": false, "directory": "org.apache.lucene.store.NRTCachingDirectory:NRTCachingDirectory(MMapDirectory@/Users/cass/LuceneSolr/lucene-solr/solr/example/techproducts/solr/techproducts/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@1244fae; maxCacheMB=48.0 maxMergeSizeMB=4.0)", "segmentsFile": "segments_2", "segmentsFileSizeInBytes": 165, "userData": { "commitTimeMSec": "1457378288231" }, "lastModified": "2016-03-07T19:18:08.231Z", "sizeInBytes": 27542, "size": "26.9 KB" } }}} http://localhost:8983/solr/v2/cores
  36. 36. { "schema":{ "name":"example", "version":1.6, "uniqueKey":"id", "fieldTypes":[{ "name":"_bbox_coord", "class":"solr.TrieDoubleField", "stored":false, "docValues":true, "precisionStep":"8"}], "fields":[{ "name":"_root_", "type":"string", "indexed":true, "stored":false}, { "name":"_src_", "type":"string", "indexed":false, "stored":true}, { "name":"_version_", "type":"long", "indexed":true, "stored":true}] } } http://localhost:8983/solr/v2/cores/techproducts/schema (truncated response)
  37. 37. { "spec": [{ "documentation": "https://cwiki.apache.org/confluence/display/solr/Schema+API", "methods": ["POST"], "url": { "paths": ["$handlerName"] }, "commands": { "add-field": { "properties": {}, "additionalProperties": true }, "delete-field": { "additionalProperties": true } } }, { "documentation": "https://cwiki.apache.org/confluence/display/solr$handlerName+API", "methods": ["GET"], "url": { "paths": ["$handlerName", "$handlerName/name", "$handlerName/uniquekey", "$handlerName/version", "$handlerName/similarity", "$handlerName/solrqueryparser", "$handlerName/zkversion", "$handlerName/zkversion", "$handlerName/solrqueryparser/defaultoperator", "$handlerName/name", "$handlerName/version", "$handlerName/uniquekey", "$handlerName/similarity", "$handlerName/similarity"] }, "body": null }] } http://localhost:8983/solr/v2/cores/techproducts/schema/_introspect truncated response
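An introspection response like this one is meant to be machine-readable, so a client can list what an endpoint supports. The Python sketch below summarizes each `spec` entry by HTTP method, distinct paths, and command names. The sample is abbreviated from the slide, the helper name is hypothetical, and since the slides note this API was not yet released, the response shape should be treated as provisional.

```python
import json


def summarize_spec(introspect_text):
    """Summarize each spec entry as methods, de-duplicated paths, and commands."""
    spec = json.loads(introspect_text)["spec"]
    summary = []
    for entry in spec:
        summary.append({
            "methods": entry.get("methods", []),
            "paths": sorted(set(entry.get("url", {}).get("paths", []))),
            "commands": sorted(entry.get("commands", {})),
        })
    return summary


# Abbreviated from the slide; no request is made.
sample = """
{"spec": [
  {"methods": ["POST"],
   "url": {"paths": ["$handlerName"]},
   "commands": {"add-field": {"properties": {}, "additionalProperties": true},
                "delete-field": {"additionalProperties": true}}},
  {"methods": ["GET"],
   "url": {"paths": ["$handlerName", "$handlerName/name", "$handlerName/name"]},
   "body": null}
]}
"""

for item in summarize_spec(sample):
    print(item["methods"], item["paths"], item["commands"])
```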
  38. 38. …and More • BM25 is the default Similarity • SolrCloud Backup/Restore API • AngularJS-based Admin UI • Jetty 9.3 and HTTP/2 (in 6.x)
  39. 39. Collection Overview Screen
  40. 40. Getting Ready to Upgrade Highlights of other major changes
  41. 41. Java 8 or higher only! If you are still using Java 7, you will need to update Java before upgrading to Solr 6.
  42. 42. Changes to Defaults • Default schemaFactory is now ManagedIndexSchemaFactory • Similarity defaults: • If no <similarity> defined, SchemaSimilarityFactory is used • Defaults to BM25 when field type does not declare similarity
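For readers wondering what the switch to BM25 changes in practice, here is a sketch of the classic per-term BM25 score in Python. It uses the common form with `k1` and `b` parameters (1.2 and 0.75 are the usual defaults) and an idf with a `1 +` inside the logarithm. This is the textbook formula for illustration, not a copy of Lucene's code, which also encodes document lengths lossily. The function name is hypothetical.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, doc_freq, num_docs, k1=1.2, b=0.75):
    """Score one term in one document with the classic BM25 formula."""
    # Rare terms (low doc_freq) get a higher idf
    idf = math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: longer-than-average documents are penalized
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # Term frequency saturates: each extra occurrence adds less than the last
    return idf * tf * (k1 + 1) / (tf + norm)


for tf in (1, 2, 5, 50):
    print(tf, round(bm25_term_score(tf, 100, 100, 10, 1000), 4))
```

The main practical difference from classic TF-IDF is the saturation: repeating a term many times in a document raises its score only up to a ceiling of `idf * (k1 + 1)`.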
  43. 43. Deprecations introduced in Solr 5 have been removed • SolrServer and subclasses (use SolrClient) • DefaultSimilarityFactory has been removed • GET methods on the Schema API have been changed • facet.date has been removed (finally) • SolrClient.shutdown() removed in favor of SolrClient.close()
  44. 44. All right, WHEN? The first release candidate could be created this week. Expect release in the next 2-4 weeks.
  45. 45. More Information • Solr Reference Guide • https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface • https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions+(Solr+6) • Joel Bernstein’s presentation at Lucene Revolution • https://www.youtube.com/watch?v=baWQfHWozXc • Yonik’s blog, Solr ’n Stuff • http://yonik.com/solr-cross-data-center-replication/ • http://yonik.com/solr-6/ • Shalin’s presentation to Bangalore Apache Solr/Lucene Group: http://slides.com/shalinmangar/what-s-cooking
  46. 46. Thanks to everyone who’s blogged or presented on upcoming features • Joel Bernstein and Dennis Gove • Shalin Mangar • Yonik Seeley • Doug Turnbull
  47. 47. Questions? @childerelda www.lucidworks.com
