Webinar: What's New in Solr 6

What’s New in Solr 6
Cassandra Targett

2016
OCTOBER 11-14 
BOSTON, MA

Introduction
• Lucene/Solr committer
since 2013
• Director of Engineering
at Lucidworks

Solr 6 builds on the
innovations of Solr 5
• Easy to use
• Scalable
• Secure

Solr 5 Main Themes
• Easy to Use
• bin/solr and bin/post
improvements
• JSON-based facets
• More APIs
• Modern UI (Angular-based)
• Scalable
• SolrCloud hardening
• Replica placement strategy
• Streaming expressions
• Secure
• Authentication and Authorization
frameworks

Highlights of Recent Solr Releases (5.4 and 5.5)
• Solr 5.4
• Basic authentication
• ConfigSets API
• FORCELEADER command
• Optimizations for faceting
DocValue fields
• Solr 5.5
• Ability to edit ZooKeeper configs
with bin/solr
• Rule-based authorization
flexibility
• XML query parser
• More async collection APIs

Solr 6
introduces
several new
features
• Parallel SQL
• Cross Data Center Replication
• Graph Traversal
• Modern APIs
• Jetty 9.3 and HTTP/2

Parallel SQL
Parallelized SQL support in Solr for scalable relational algebra

Seamlessly combines
SQL with Solr’s full-text
capabilities
• Realtime MapReduce(ish)
or Facet aggregation
modes
• Parallel execution of
queries across SolrCloud
• Advanced SQL syntax for
powerful queries

Parallel SQL builds on Solr’s Streaming Capabilities
• Export request handler (/export)
• Streaming API
• Streams tuples in JSON
• New package: org.apache.solr.client.solrj.io
• Streaming Expressions (/stream)
• Allows non-Java programmers to access Streaming API
• Expressions are essentially functions which originate the stream or operate on
the stream

Streaming Expression Request - search
curl -d 'expr=search(gettingstarted,
q="*:*",
fl="id, manu_exact",
sort="manu_exact asc")' http://localhost:8983/solr/gettingstarted/stream
{
"result-set": {
"docs": [
{"manu_exact": "A-DATA Technology Inc.", "id": "VDBDB1A16"},
{"manu_exact": "ASUS Computer Inc.", "id": "EN7800GTX/2DHTV/256M"},
{"manu_exact": "ATI Technologies", "id": "100-435805"}
…
{"EOF": true,"RESPONSE_TIME": 15}]
}
}
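A minimal sketch of how a client might consume a response shaped like the one above: tuples arrive in "result-set" / "docs", and the last entry is a marker carrying "EOF" rather than document fields. The function name read_tuples and the SAMPLE payload are illustrative assumptions (the payload is trimmed from the slide); this is not SolrJ code.

```python
import json

def read_tuples(response_text):
    """Return (tuples, response_time) from a /stream JSON response body.

    Assumes the layout shown on the slide: a "result-set" object whose
    "docs" list ends with a marker tuple containing "EOF": true.
    """
    docs = json.loads(response_text)["result-set"]["docs"]
    tuples = []
    response_time = None
    for doc in docs:
        if doc.get("EOF"):
            # The marker tuple ends the stream; it is not a document.
            response_time = doc.get("RESPONSE_TIME")
            break
        tuples.append(doc)
    return tuples, response_time

# Hypothetical payload, trimmed from the slide's response.
SAMPLE = """
{"result-set": {"docs": [
  {"manu_exact": "A-DATA Technology Inc.", "id": "VDBDB1A16"},
  {"manu_exact": "ASUS Computer Inc.", "id": "EN7800GTX/2DHTV/256M"},
  {"manu_exact": "ATI Technologies", "id": "100-435805"},
  {"EOF": true, "RESPONSE_TIME": 15}]}}
"""

TUPLES, RESPONSE_TIME = read_tuples(SAMPLE)
```

Treating the EOF marker as a terminator rather than a document keeps downstream code from mistaking it for a result.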

Functions, aka
Stream Sources
and Stream
Decorators
• Define how data is retrieved and any
aggregations performed
• Designed to work with entire result sets
• Can be compounded or wrapped to perform
several operations at the same time
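To illustrate how expressions can be wrapped, here is a small sketch that renders nested function calls as expression text. The helper name call is hypothetical and only does string formatting; it does not validate anything against Solr.

```python
def call(name, *items):
    """Render one streaming-expression function call as text.

    Each item is either a string (a collection name or an already
    rendered nested expression) or a (key, value) pair, which is
    rendered as key="value". Order is preserved.
    """
    rendered = []
    for item in items:
        if isinstance(item, tuple):
            key, value = item
            rendered.append('{}="{}"'.format(key, value))
        else:
            rendered.append(str(item))
    return "{}({})".format(name, ", ".join(rendered))

# A stream source ...
SEARCH = call("search", "gettingstarted",
              ("q", "inStock:true"), ("qt", "/export"),
              ("fl", "id,manu_exact"), ("sort", "manu_exact asc"))

# ... wrapped by a stream decorator.
REDUCE = call("reduce", SEARCH, ("by", "manu_exact"),
              call("group", ("sort", "manu_exact asc"), ("n", "2")))
```

The resulting string is what would be sent as the expr parameter to the /stream handler.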

Streaming Expression Request - reduce
curl http://localhost:8983/solr/gettingstarted/stream -d \
'expr=reduce(
search(gettingstarted,
q="inStock:true",
qt="/export",
fl="id,manu_exact",
sort="manu_exact asc"),
by="manu_exact",
group(
sort="manu_exact asc", n="2"))'

Streaming Expression Response
{"result-set":
{"docs":[
{"id":"0380014300","group":[{"id":"0380014300"},{"id":"0553573403"}]},
{"manu_exact":"A-DATA Technology Inc.","id":"VDBDB1A16","group":[{"manu_exact":"A-DATA Technology Inc.","id":"VDBDB1A16"}]},
{"manu_exact":"Apache Software Foundation","id":"UTF8TEST","group":[{"manu_exact":"Apache Software Foundation","id":"UTF8TEST"},{"manu_exact":"Apache Software Foundation","id":"SOLR1000"}]},
{"manu_exact":"Apple Computer Inc.","id":"MA147LL/A","group":[{"manu_exact":"Apple Computer Inc.","id":"MA147LL/A"}]},
{"manu_exact":"Bank of America","id":"USD","group":[{"manu_exact":"Bank of America","id":"USD"}]},
{"manu_exact":"Bank of Norway","id":"NOK","group":[{"manu_exact":"Bank of Norway","id":"NOK"}]},
{"manu_exact":"Canon Inc.","id":"9885A004","group":[{"manu_exact":"Canon Inc.","id":"9885A004"},{"manu_exact":"Canon Inc.","id":"0579B002"}]},
{"manu_exact":"Corsair Microsystems Inc.","id":"VS1GB400C3","group":[{"manu_exact":"Corsair Microsystems Inc.","id":"VS1GB400C3"},{"manu_exact":"Corsair Microsystems Inc.","id":"TWINX2048-3200PRO"}]},
{"manu_exact":"Dell, Inc.","id":"3007WFP","group":[{"manu_exact":"Dell, Inc.","id":"3007WFP"}]},
{"EOF":true,"RESPONSE_TIME":24}]}
}
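The shape of the response above can be reproduced locally with a short sketch: walk a stream that is already sorted on the grouping field, and emit the first tuple of each run with a "group" list of up to n members. This is a toy re-creation based only on the output shown here, not Solr's implementation; reduce_by and STREAM are made-up names, and the sample tuples are copied from the slide.

```python
from itertools import groupby

def reduce_by(tuples, by, n):
    """Collapse a stream sorted on `by` into one tuple per key.

    Each output tuple is a copy of the first tuple in its group, plus a
    "group" list holding up to n members of that group.
    """
    out = []
    for _, members in groupby(tuples, key=lambda t: t.get(by)):
        members = list(members)
        head = dict(members[0])
        head["group"] = [dict(m) for m in members[:n]]
        out.append(head)
    return out

# Already sorted by manu_exact, as the search's sort parameter ensures.
STREAM = [
    {"manu_exact": "Apache Software Foundation", "id": "UTF8TEST"},
    {"manu_exact": "Apache Software Foundation", "id": "SOLR1000"},
    {"manu_exact": "Canon Inc.", "id": "9885A004"},
    {"manu_exact": "Canon Inc.", "id": "0579B002"},
    {"manu_exact": "Dell, Inc.", "id": "3007WFP"},
]

REDUCED = reduce_by(STREAM, "manu_exact", 2)
```

groupby only merges adjacent equal keys, which is why the inner search sorts on the same field that reduce groups by.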

Available Functions
• Stream Sources
• Search
• JDBC
• Facet
• Stats
• Topic
• Stream Decorators
• Complement, Unique,
Intersect
• leftOuterJoin, innerJoin,
hashJoin, outerHashJoin
• Top, Rollup, Facet
• Parallel
• Decorators, cont’d
• Update
• Merge
• Group, Reduce
• Daemon
• Select

Streaming Expression Request - parallel
curl http://localhost:8983/solr/gettingstarted/stream -d \
'expr=parallel(workcollection,
search(gettingstarted,
q="inStock:true",
fl="id, manu_exact",
sort="manu_exact asc",
partitionKeys="manu_exact"),
workers=2,
zkHost="localhost:9983",
sort="manu_exact asc")'
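A rough sketch of the idea behind partitionKeys and the outer sort in the request above: tuples are split across workers by hashing the partition key, so all tuples with the same key land on the same worker, and the sorted per-worker streams are then merged back into one sorted stream. The hash used here (zlib.crc32) and all names are illustrative assumptions; Solr's actual partitioning function is not shown on the slides.

```python
import heapq
import zlib

def partition(tuples, key, workers):
    """Assign each tuple to a worker bucket by hashing its partition key."""
    buckets = [[] for _ in range(workers)]
    for t in tuples:
        h = zlib.crc32(str(t[key]).encode("utf-8"))
        buckets[h % workers].append(t)
    return buckets

def merge_sorted(buckets, key):
    """Merge per-worker streams, each already sorted on key, into one."""
    return list(heapq.merge(*buckets, key=lambda t: t[key]))

# Sorted input, as produced by sort="manu_exact asc".
STREAM = [
    {"manu_exact": "A-DATA Technology Inc.", "id": "VDBDB1A16"},
    {"manu_exact": "Apache Software Foundation", "id": "UTF8TEST"},
    {"manu_exact": "Apache Software Foundation", "id": "SOLR1000"},
    {"manu_exact": "Canon Inc.", "id": "9885A004"},
    {"manu_exact": "Canon Inc.", "id": "0579B002"},
]

BUCKETS = partition(STREAM, "manu_exact", 2)
MERGED = merge_sorted(BUCKETS, "manu_exact")
```

Because partitioning preserves the input order inside each bucket, every worker stream stays sorted and a streaming merge is enough to restore the global order.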

Parallel SQL builds on
export and streaming
• SQL statements
translated into Streaming
Expressions
• Automatic merge of
results from worker
nodes
• Advanced SQL syntax

SQL Syntax
• SELECT and SELECT DISTINCT
• select id, manu_exact from techproducts
• select distinct id, manu_exact from techproducts
• WHERE
• select id, manu_exact from techproducts where inStock=true
• select id, manu_exact from techproducts where price='[10 TO 50]'
• select id, manu_exact from techproducts where cat='(electronics or music)'

SQL Syntax
• ORDER BY and LIMIT
• select id, manu_exact from techproducts order by manu_exact asc
• select id, manu_exact from techproducts limit 10
• GROUP BY
• select id, manu_exact from techproducts where inStock=true group by manu

SQL Syntax
• Stats
• select count(manu_exact) as count, avg(price) as avg from techproducts
• HAVING
• select id, manu_exact from techproducts where inStock=true having
(avg(price)>5) order by manu_exact asc

SQL Statement and Results
curl -d 'stmt=select id, manu_exact from techproducts where inStock=true order by manu_exact limit 5' http://localhost:8983/solr/techproducts/sql
{"result-set":
{"docs":[
{"manu_exact":"A-DATA Technology Inc.","id":"VDBDB1A16"},
{"manu_exact":"Apache Software Foundation","id":"SOLR1000"},
{"manu_exact":"Apache Software Foundation","id":"UTF8TEST"},
{"manu_exact":"Apple Computer Inc.","id":"MA147LL/A"},
{"manu_exact":"Bank of America","id":"USD"},
{"EOF":"true","RESPONSE_TIME":8}]
}
}
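For clients that are not using curl, the same request can be sketched in Python: the statement goes in a form-encoded stmt parameter, POSTed to the collection's /sql handler. The helper names sql_request and run_sql are hypothetical; run_sql needs a running SolrCloud node and is defined but not called here.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import urlopen

def sql_request(collection, stmt, base="http://localhost:8983/solr"):
    """Return (url, form_body) for a POST to the collection's /sql handler."""
    url = "{}/{}/sql".format(base, collection)
    # urlencode escapes spaces, commas and '=' inside the statement.
    body = urlencode({"stmt": stmt})
    return url, body

def run_sql(collection, stmt):
    """Send the statement and return the raw JSON text (requires live Solr)."""
    url, body = sql_request(collection, stmt)
    with urlopen(url, data=body.encode("utf-8")) as resp:
        return resp.read().decode("utf-8")

STMT = ("select id, manu_exact from techproducts "
        "where inStock=true order by manu_exact limit 5")
URL, BODY = sql_request("techproducts", STMT)
```

Encoding the statement avoids the shell-quoting pitfalls that come with embedding SQL in a single-quoted curl argument.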

Aggregation Modes
• map_reduce
• Tuples are shuffled to worker nodes, where aggregation occurs
• Tuples are sent to worker nodes sorted by GROUP BY fields
• Great for high cardinality
• facet
• Pushes computation to JSON Facet API - only aggregates are sent over the
network
• Great for low-to-moderate cardinality
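The trade-off between the two modes can be pictured with a toy model that counts how many records cross the network for a simple count-by-field aggregation. Everything here (the shard contents, function names, and the single aggregating tier) is an invented simplification, not a description of Solr internals.

```python
from collections import Counter

# Hypothetical data: two shards holding a handful of documents each.
SHARDS = [
    [{"manu": "Canon"}, {"manu": "Canon"}, {"manu": "Dell"}],
    [{"manu": "Canon"}, {"manu": "Dell"}, {"manu": "Dell"}, {"manu": "Apple"}],
]

def map_reduce_counts(shards, field):
    """Ship every tuple to the aggregating tier, then count there.

    Returns (counts, number_of_records_shipped).
    """
    shipped = [t for shard in shards for t in shard]
    return Counter(t[field] for t in shipped), len(shipped)

def facet_counts(shards, field):
    """Count inside each shard and ship only the per-value totals.

    Returns (counts, number_of_records_shipped).
    """
    partials = [Counter(t[field] for t in shard) for shard in shards]
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total, sum(len(partial) for partial in partials)

MR_COUNTS, MR_SHIPPED = map_reduce_counts(SHARDS, "manu")
FACET_COUNTS, FACET_SHIPPED = facet_counts(SHARDS, "manu")
```

Both paths give the same totals. The facet path ships one record per distinct value per shard, so its advantage shrinks as the number of distinct values grows, which matches the cardinality guidance above.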

Parallel SQL with map_reduce Aggregation Mode
[Diagram: three tiers. SQL Tier: the client sends the statement to the /sql handler. Worker Tier: workers 1-4. Data Tier: shards s1-s4, each with replicas r1-r3.]
Each worker queries 1 replica in each shard

JDBC Driver
• Solr now includes a JDBC driver which can be
used to query Solr
• Can be used only with the SQL handler
• DB visualization tools can also be used, such as
Apache Zeppelin, Squirrel, DBVisualizer, etc.

Best Practices
• Create a separate collection for the /sql
handler and worker nodes
• Designed for large clusters and large data sets
• Use the correct aggregation mode
• Usually best to partition on what you are
grouping on

DocValue Fields ONLY!
Export and Stream request handlers can only be used on fields
that use DocValues.
Because Parallel SQL uses these capabilities, in most cases it also
requires DocValue fields.
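One way to catch this early is to compare the fields a query will request against the schema before sending it. The sketch below assumes a list of field definitions shaped like the "fields" array of a Schema API response; the sample FIELDS list is invented, and the check only looks at an explicit field-level docValues flag (it does not resolve values inherited from a field type).

```python
def fields_without_docvalues(schema_fields, requested):
    """Return the requested field names that lack docValues=true.

    Unknown fields are reported too, since they cannot be exported.
    """
    by_name = {f["name"]: f for f in schema_fields}
    return [name for name in requested
            if not by_name.get(name, {}).get("docValues", False)]

# Hypothetical schema excerpt.
FIELDS = [
    {"name": "id", "type": "string", "docValues": True},
    {"name": "manu_exact", "type": "string", "docValues": True},
    {"name": "name", "type": "text_general", "docValues": False},
]

OK = fields_without_docvalues(FIELDS, ["id", "manu_exact"])
BAD = fields_without_docvalues(FIELDS, ["id", "name", "missing_field"])
```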

Cross Data Center Replication
Replication between two or more SolrCloud clusters in two or
more data centers

CDCR Design Points
• Uses existing transaction logs
• Leader-to-Leader communication avoids duplicate updates across data centers
• Active-passive disaster recovery
• Synchronous or asynchronous indexing
• Configurable batch sizes
• No single point of failure or bottlenecks
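As a mental model of log-based replication with configurable batches, the toy sketch below forwards entries from a source update log to a target index in fixed-size batches and tracks a checkpoint so a later run resumes where the last one stopped. All names and data structures are hypothetical; this only illustrates the design points listed above and is not how CDCR is implemented.

```python
def replicate(update_log, target_index, checkpoint, batch_size):
    """Forward log entries after `checkpoint` to the target in batches.

    update_log is a list of (doc_id, doc) pairs in arrival order.
    Returns (new_checkpoint, batches_sent).
    """
    batches_sent = 0
    while checkpoint < len(update_log):
        batch = update_log[checkpoint:checkpoint + batch_size]
        for doc_id, doc in batch:
            # Later entries for the same id overwrite earlier ones.
            target_index[doc_id] = doc
        checkpoint += len(batch)
        batches_sent += 1
    return checkpoint, batches_sent

# Hypothetical source-side update log.
LOG = [
    ("1", {"v": 1}), ("2", {"v": 1}), ("1", {"v": 2}),
    ("3", {"v": 1}), ("2", {"v": 2}),
]

TARGET = {}
CHECKPOINT, BATCHES = replicate(LOG, TARGET, 0, 2)
```

Calling replicate again with the returned checkpoint sends nothing until new entries are appended to the log.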

CDCR Limitations
• Must start with an empty
index or one that is
already fully
synchronized
• May be unsatisfactory if
rate of updates is high
• Active-passive

Graph Traversal
Perform graph queries for interconnected data

Solr supports
graph queries
• Follow nodes to edges
• Apply optional filters during traversal
• Use cases:
• Find all tweets mentioning “Solr” by me or
people I follow
• Find all draft blog posts about “parallel sql”
written by a developer I know
• Find 3-star hotels in NYC my friends stayed in
last year
q=Solr&fq={!graph from=following_id to=id maxDepth=1}id:"childerelda"
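The traversal can be pictured as a breadth-first expansion over documents. The sketch below assumes that each hop collects the values of the to field from the current frontier and then pulls in documents whose from field contains one of those values, stopping after max_depth hops. That reading of from/to, the function names, and the sample documents are all assumptions made for illustration; filters and the unlimited-depth case are left out.

```python
def as_list(value):
    """Normalize a missing, single, or multi-valued field to a list."""
    if value is None:
        return []
    if isinstance(value, (list, tuple, set)):
        return list(value)
    return [value]

def graph_traverse(docs, roots, from_field, to_field, max_depth):
    """Return ids reachable from `roots` within max_depth hops."""
    by_id = {d["id"]: d for d in docs}
    visited = set(roots)
    frontier = set(roots)
    depth = 0
    while frontier and depth < max_depth:
        # Edge values leaving the current frontier.
        edge_values = set()
        for doc_id in frontier:
            edge_values.update(as_list(by_id[doc_id].get(to_field)))
        # Unvisited documents whose from_field holds one of those values.
        next_frontier = set()
        for d in docs:
            if d["id"] in visited:
                continue
            if edge_values & set(as_list(d.get(from_field))):
                next_frontier.add(d["id"])
        visited |= next_frontier
        frontier = next_frontier
        depth += 1
    return visited

# Hypothetical documents using the field names from the query above.
DOCS = [
    {"id": "childerelda"},
    {"id": "user_a", "following_id": ["childerelda"]},
    {"id": "user_b", "following_id": ["user_a"]},
    {"id": "user_c", "following_id": ["someone_else"]},
]

ONE_HOP = graph_traverse(DOCS, {"childerelda"}, "following_id", "id", 1)
TWO_HOPS = graph_traverse(DOCS, {"childerelda"}, "following_id", "id", 2)
```

In the query on the slide, the graph result is used as a filter, so the final result set would be the traversal output intersected with documents matching q=Solr.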

Modern API
Redesign Solr’s user-facing APIs

Designed for
Humans
• Consistent
• Versioned
• Friendlier endpoint names
• Introspectable
• JSON output by default (`wt` still supported)
Not in 6.0, but coming very soon

{"responseHeader": {
"status": 0,
"QTime": 2
},
"initFailures": {},
"status": {
"techproducts": {
"name": "techproducts",
"instanceDir": "/Users/cass/LuceneSolr/lucene-solr/solr/example/techproducts/solr/techproducts",
"dataDir": "/Users/cass/LuceneSolr/lucene-solr/solr/example/techproducts/solr/techproducts/data/",
"config": "solrconfig.xml",
"schema": "managed-schema",
"startTime": "2016-03-07T19:18:07.765Z",
"uptime": 295560,
"index": {
"numDocs": 32,
"maxDoc": 32,
"deletedDocs": 0,
"indexHeapUsageBytes": -1,
"version": 6,
"segmentCount": 1,
"current": true,
"hasDeletions": false,
"directory": "org.apache.lucene.store.NRTCachingDirectory:NRTCachingDirectory(MMapDirectory@/Users/cass/LuceneSolr/lucene-solr/solr/example/techproducts/solr/techproducts/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@1244fae; maxCacheMB=48.0 maxMergeSizeMB=4.0)",
"segmentsFile": "segments_2",
"segmentsFileSizeInBytes": 165,
"userData": {
"commitTimeMSec": "1457378288231"
},
"lastModified": "2016-03-07T19:18:08.231Z",
"sizeInBytes": 27542,
"size": "26.9 KB"
}
}}}
http://localhost:8983/solr/v2/cores

{
"schema":{
"name":"example",
"version":1.6,
"uniqueKey":"id",
"fieldTypes":[{
"name":"_bbox_coord",
"class":"solr.TrieDoubleField",
"stored":false,
"docValues":true,
"precisionStep":"8"}],
"fields":[{
"name":"_root_",
"type":"string",
"indexed":true,
"stored":false},
{
"name":"_src_",
"type":"string",
"indexed":false,
"stored":true},
{
"name":"_version_",
"type":"long",
"indexed":true,
"stored":true}]
}
}
http://localhost:8983/solr/v2/cores/techproducts/schema
truncated response

{
"spec": [{
"documentation": "https://cwiki.apache.org/confluence/display/solr/Schema+API",
"methods": ["POST"],
"url": {
"paths": ["$handlerName"]
},
"commands": {
"add-field": {
"properties": {},
"additionalProperties": true
},
"delete-field": {
"additionalProperties": true
}
}
}, {
"documentation": "https://cwiki.apache.org/confluence/display/solr$handlerName+API",
"methods": ["GET"],
"url": {
"paths": ["$handlerName", "$handlerName/name", "$handlerName/uniquekey", "$handlerName/version", "$handlerName/similarity",
"$handlerName/solrqueryparser", "$handlerName/zkversion", "$handlerName/zkversion", "$handlerName/solrqueryparser/defaultoperator",
"$handlerName/name", "$handlerName/version", "$handlerName/uniquekey", "$handlerName/similarity", "$handlerName/similarity"]
},
"body": null
}]
}
http://localhost:8983/solr/v2/cores/techproducts/schema/_introspect
truncated response
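Since the introspection output is plain JSON, a client can discover what an endpoint supports without reading documentation. The sketch below summarizes which commands each HTTP method accepts; summarize_spec is a hypothetical helper, and SAMPLE is a trimmed copy of the response above, so the real payload (from an API that was still in development at the time) may differ.

```python
import json

def summarize_spec(introspect_text):
    """Map each HTTP method to the sorted command names its entries list."""
    summary = {}
    for entry in json.loads(introspect_text)["spec"]:
        commands = sorted(entry.get("commands", {}))
        for method in entry.get("methods", []):
            summary.setdefault(method, []).extend(commands)
    return summary

# Trimmed from the _introspect response shown on the slide.
SAMPLE = """
{"spec": [
  {"methods": ["POST"],
   "url": {"paths": ["$handlerName"]},
   "commands": {
     "add-field": {"properties": {}, "additionalProperties": true},
     "delete-field": {"additionalProperties": true}}},
  {"methods": ["GET"],
   "url": {"paths": ["$handlerName", "$handlerName/name"]},
   "body": null}
]}
"""

SUMMARY = summarize_spec(SAMPLE)
```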

…and More
• BM25 is the default Similarity
• SolrCloud Backup/Restore API
• AngularJS-based Admin UI
• Jetty 9.3 and HTTP/2 (in 6.x)

Getting Ready to Upgrade
Highlights of other major changes

Java 8 or higher only!
If you are still using Java 7, you will need to update Java before
upgrading to Solr 6.

Changes to
Defaults
• Default schemaFactory is now
ManagedIndexSchemaFactory
• Similarity defaults:
• If no <similarity> defined,
SchemaSimilarityFactory is used
• Defaults to BM25 when field type does not
declare similarity
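For readers unfamiliar with BM25, here is a sketch of the textbook per-term score with the commonly used parameters k1=1.2 and b=0.75. It is meant to show the behavior that differs from classic TF-IDF (term-frequency saturation and document-length normalization); Lucene's implementation additionally stores document lengths in a compressed form, so actual scores will not match this function exactly. The function and argument names are illustrative.

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, doc_count, doc_freq,
                    k1=1.2, b=0.75):
    """Score one query term against one document with the BM25 formula.

    tf: occurrences of the term in the document
    doc_len / avg_doc_len: this document's length vs. the average length
    doc_count / doc_freq: total documents vs. documents containing the term
    """
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + length_norm)

# Hypothetical corpus statistics: 1000 documents, term appears in 10.
ONCE = bm25_term_score(1, 100, 100, 1000, 10)
TWICE = bm25_term_score(2, 100, 100, 1000, 10)
LONG_DOC = bm25_term_score(1, 400, 100, 1000, 10)
```

Repeating a term raises the score with diminishing returns, and the same match in a longer-than-average document scores lower.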

Deprecations
introduced in
Solr 5 have
been removed
• SolrServer and subclasses (use SolrClient)
• DefaultSimilarityFactory has been removed
• GET methods on the Schema API have been
changed
• facet.date has been removed (finally)
• SolrClient.shutdown() removed in favor of
SolrClient.close()

All right, WHEN?
The first release candidate could be created this week.
Expect release in the next 2-4 weeks.

More Information
• Solr Reference Guide
• https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface
• https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions+(Solr+6)
• Joel Bernstein’s presentation at Lucene Revolution
• https://www.youtube.com/watch?v=baWQfHWozXc
• Yonik’s blog, Solr ’n Stuff
• http://yonik.com/solr-cross-data-center-replication/
• http://yonik.com/solr-6/
• Shalin’s presentation to Bangalore Apache Solr/Lucene Group: http://slides.com/shalinmangar/what-s-cooking

Thanks to everyone
who’s blogged or
presented on upcoming
features
• Joel Bernstein and
Dennis Gove
• Shalin Mangar
• Yonik Seeley
• Doug Turnbull

Questions?
@childerelda
www.lucidworks.com
