10. Seamlessly combines
SQL with Solr’s full-text
capabilities
• Realtime MapReduce(ish)
or Facet aggregation
modes
• Parallel execution of
queries across SolrCloud
• Advanced SQL syntax for
powerful queries
11. Parallel SQL builds on Solr’s Streaming Capabilities
• Export request handler (/export)
• Streaming API
• Streams tuples in JSON
• New package: org.apache.solr.client.solrj.io
• Streaming Expressions (/stream)
• Allows non-Java programmers to access Streaming API
• Expressions are essentially functions which originate the stream or operate on
the stream
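As a rough sketch of how a non-Java client might address the /stream handler, the snippet below builds a request URL that carries a `search` stream source in the `expr` parameter. The host, port, collection, and field names are assumptions borrowed from the techproducts examples in this deck; the code only constructs the URL and does not contact Solr.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed: a local Solr node and the techproducts example collection.
SOLR = "http://localhost:8983/solr"
COLLECTION = "techproducts"

# A stream source: search() originates a stream of tuples.
# qt="/export" reads the full sorted result set through the export handler.
EXPR = (
    'search(techproducts, q="*:*", fl="id,manu_exact", '
    'sort="manu_exact asc", qt="/export")'
)

def stream_url(expr, base=SOLR, collection=COLLECTION):
    """Return a /stream request URL with the expression URL-encoded in `expr`."""
    return f"{base}/{collection}/stream?{urlencode({'expr': expr})}"

url = stream_url(EXPR)
print(url)
```

Any HTTP client can then fetch that URL; the handler streams back tuples as JSON.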
13. Functions, aka
Stream Sources
and Stream
Decorators
• Define how data is retrieved and any
aggregations performed
• Designed to work with entire result sets
• Can be compounded or wrapped to perform
several operations at the same time
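To illustrate "compounded or wrapped", here is a small sketch that renders a stream decorator (`unique`) around a stream source (`search`). The helper `fn` is hypothetical and exists only to build the expression string; the collection and field names are assumed from the techproducts examples.

```python
def fn(name, *args, **params):
    """Render a streaming-expression call such as name(arg, key="value")."""
    parts = [str(a) for a in args]
    parts += [f'{key}="{value}"' for key, value in params.items()]
    return f"{name}({', '.join(parts)})"

# Stream source: originates the stream, sorted on the field we dedupe on.
source = fn(
    "search", "techproducts",
    q="*:*", fl="id,manu_exact", sort="manu_exact asc", qt="/export",
)

# Stream decorator: wraps the source and operates on its tuples.
wrapped = fn("unique", source, over="manu_exact")
print(wrapped)
```

The inner expression is passed as the first argument of the outer one, which is how several operations end up in a single request.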
18. Parallel SQL builds on
export and streaming
• SQL statements
translated into Streaming
Expressions
• Automatic merge of
results from worker
nodes
• Advanced SQL syntax
19. SQL Syntax
• SELECT and SELECT DISTINCT
• select id, manu_exact from techproducts
• select distinct id, manu_exact from techproducts
• WHERE
• select id, manu_exact from techproducts where inStock=true
• select id, manu_exact from techproducts where price='[10 TO 50]'
• select id, manu_exact from techproducts where cat='(electronics OR music)'
20. SQL Syntax
• ORDER BY and LIMIT
• select id, manu_exact from techproducts order by manu_exact asc
• select id, manu_exact from techproducts limit 10
• GROUP BY
• select manu_exact, count(*) from techproducts where inStock=true group by manu_exact
21. SQL Syntax
• Stats
• select count(manu_exact) as count, avg(price) as avg from techproducts
• HAVING
• select manu_exact, avg(price) from techproducts where inStock=true group by
manu_exact having (avg(price)>5) order by manu_exact asc
22. SQL Statement and Results
curl --data-urlencode "stmt=select id, manu_exact from techproducts where inStock='true' order by manu_exact limit 5" http://localhost:8983/solr/techproducts/sql
{"result-set":
{"docs":[
{"manu_exact":"A-DATA Technology Inc.","id":"VDBDB1A16"},
{"manu_exact":"Apache Software Foundation","id":"SOLR1000"},
{"manu_exact":"Apache Software Foundation","id":"UTF8TEST"},
{"manu_exact":"Apple Computer Inc.","id":"MA147LL/A"},
{"manu_exact":"Bank of America","id":"USD"},
{"EOF":"true","RESPONSE_TIME":8}]
}
}
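The same request can be sketched from Python with only the standard library. The URL is the one from this slide; the function names are hypothetical, and `SAMPLE` is a shortened copy of the response shape shown here, so the parsing step can be tried without a running Solr.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SQL_URL = "http://localhost:8983/solr/techproducts/sql"  # from the slide

def build_request(stmt, url=SQL_URL):
    """Build a POST request that sends the SQL text in the `stmt` parameter."""
    body = urlencode({"stmt": stmt}).encode("utf-8")
    return Request(url, data=body)  # supplying data makes this a POST

def rows(response_text):
    """Return the result tuples, dropping the trailing EOF marker tuple."""
    docs = json.loads(response_text)["result-set"]["docs"]
    return [doc for doc in docs if "EOF" not in doc]

def query(stmt):
    """Run the statement against a live Solr (needs a running server)."""
    with urlopen(build_request(stmt)) as resp:
        return rows(resp.read().decode("utf-8"))

SAMPLE = (
    '{"result-set":{"docs":['
    '{"manu_exact":"A-DATA Technology Inc.","id":"VDBDB1A16"},'
    '{"EOF":"true","RESPONSE_TIME":8}]}}'
)
print(rows(SAMPLE))
```

The last tuple in every response is the EOF marker, so client code normally filters it out before using the rows.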
23. Aggregation Modes
• map_reduce
• Tuples are shuffled to worker nodes, where aggregation occurs
• Tuples are sent to worker nodes sorted by GROUP BY fields
• Great for high cardinality
• facet
• Pushes computation to JSON Facet API - only aggregates are sent over the
network
• Great for low-to-moderate cardinality
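A minimal sketch of selecting the mode per request, assuming the /sql handler reads it from the `aggregationMode` parameter and the worker count from `numWorkers` (parameter names as I understand them from the Solr 6 reference guide; verify against your version). The helper name is hypothetical.

```python
from urllib.parse import urlencode

VALID_MODES = ("map_reduce", "facet")

def sql_params(stmt, mode="map_reduce", num_workers=None):
    """Encode /sql request parameters for the chosen aggregation mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown aggregation mode: {mode!r}")
    params = {"stmt": stmt, "aggregationMode": mode}
    if num_workers is not None:
        # Only meaningful for map_reduce, where worker nodes do the aggregation.
        params["numWorkers"] = num_workers
    return urlencode(params)

# Low-cardinality grouping: push the work down to the JSON Facet API.
facet_query = sql_params(
    "select manu_exact, count(*) from techproducts group by manu_exact",
    mode="facet",
)
print(facet_query)
```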
24. Parallel SQL with map_reduce Aggregation Mode
• SQL Tier: client, /sql handler
• Worker Tier: worker 1, worker 2, worker 3, worker 4
• Data Tier: shards s1 through s4, each with replicas r1 through r3
• Each worker queries 1 replica in each shard
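The shuffle-then-aggregate idea can be simulated in plain Python. This is a conceptual sketch, not Solr's implementation: the hash function, the worker count, and the sample tuples are all assumptions chosen for illustration.

```python
import zlib
from collections import defaultdict
from itertools import groupby

def partition(tuples, key, num_workers):
    """Shuffle: route each tuple to a worker by a stable hash of its group key."""
    workers = defaultdict(list)
    for t in tuples:
        worker_id = zlib.crc32(str(t[key]).encode("utf-8")) % num_workers
        workers[worker_id].append(t)
    return workers

def rollup(sorted_tuples, key, field):
    """Reduce: one pass over tuples sorted by key, summing `field` per group."""
    for group_key, group in groupby(sorted_tuples, key=lambda t: t[key]):
        yield {key: group_key, "sum": sum(t[field] for t in group)}

def map_reduce(tuples, key, field, num_workers=4):
    """Partition, aggregate on each worker, then merge the worker results."""
    merged = []
    for _, part in sorted(partition(tuples, key, num_workers).items()):
        part.sort(key=lambda t: t[key])  # workers see tuples sorted by group key
        merged.extend(rollup(part, key, field))
    return sorted(merged, key=lambda row: row[key])

DATA = [
    {"manu": "apple", "price": 10.0},
    {"manu": "belkin", "price": 5.0},
    {"manu": "apple", "price": 20.0},
    {"manu": "canon", "price": 7.5},
    {"manu": "belkin", "price": 2.5},
]
result = map_reduce(DATA, "manu", "price")
print(result)
```

Because every tuple with the same group key lands on the same worker, each worker can finish its groups independently and the final step only has to merge.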
25. JDBC Driver
• Solr now includes a JDBC driver which can be
used to query Solr
• Can be used only with the SQL handler
• DB visualization tools can also be used, such as
Apache Zeppelin, SQuirreL SQL, DbVisualizer, etc.
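Tools like these usually ask for a JDBC connection string. The sketch below assembles one in the form I understand the Solr driver to expect (ZooKeeper host first, then the collection and optional settings as query parameters); the ZooKeeper address and the helper name are assumptions, so check the format against the reference guide for your release.

```python
from urllib.parse import urlencode

def solr_jdbc_url(zk_host, collection, **props):
    """Assemble a Solr JDBC URL: jdbc:solr://<zkHost>?collection=<name>&..."""
    query = urlencode({"collection": collection, **props})
    return f"jdbc:solr://{zk_host}?{query}"

# Assumed: embedded ZooKeeper of a local SolrCloud example on port 9983.
jdbc_url = solr_jdbc_url("localhost:9983", "techproducts", aggregationMode="facet")
print(jdbc_url)
```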
26. Best Practices
• Create a separate collection for the /sql
handler and worker nodes
• Designed for large clusters and large data sets
• Use the correct aggregation mode
• Usually best to partition on what you are
grouping on
27. DocValue Fields ONLY!
Export and Stream request handlers can only be used on fields
that use DocValues.
Because Parallel SQL uses these capabilities, in most cases it also
requires DocValue fields.
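One way to get such a field is to declare it with docValues enabled through the Schema API. The sketch below only builds the JSON body for an `add-field` command; the field name `manu_dv` is hypothetical, and documents must be reindexed before the new field holds any values.

```python
import json

def add_docvalues_field(name, field_type="string"):
    """Return a Schema API add-field command with docValues turned on."""
    command = {
        "add-field": {
            "name": name,
            "type": field_type,
            "stored": True,
            "docValues": True,  # needed for /export, /stream, and most Parallel SQL
        }
    }
    return json.dumps(command)

# Assumed target: POST this body to http://localhost:8983/solr/techproducts/schema
payload = add_docvalues_field("manu_dv")
print(payload)
```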
28. Cross Data Center Replication
Replication between two or more SolrCloud clusters in two or
more data centers
29. CDCR Design Points
• Uses existing transaction logs
• Leader-to-Leader communication avoids duplicate updates across data centers
• Active-passive disaster recovery
• Synchronous or asynchronous indexing
• Configurable batch sizes
• No single point of failure or bottlenecks
31. CDCR Limitations
• Must start with an empty
index or one that is
already fully
synchronized
• May be unsatisfactory if
rate of updates is high
• Active-passive only
33. Solr supports
graph queries
• Follow nodes to edges
• Apply optional filters during traversal
• Use cases:
• Find all tweets mentioning “Solr” by me or
people I follow
• Find all draft blog posts about “parallel sql”
written by a developer I know
• Find 3-star hotels in NYC my friends stayed in
last year
q=Solr&fq={!graph from=following_id to=id maxDepth=1}id:"childerelda"
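Because the filter contains braces, quotes, and spaces, it needs URL encoding when sent over HTTP. A minimal sketch that rebuilds the query above and encodes it; the helper name is hypothetical and the field names are the ones from this slide.

```python
from urllib.parse import urlencode

def graph_filter(root_query, from_field, to_field, max_depth=None):
    """Build a {!graph ...} filter that starts traversal at root_query."""
    local_params = f"{{!graph from={from_field} to={to_field}"
    if max_depth is not None:
        local_params += f" maxDepth={max_depth}"
    return local_params + "}" + root_query

fq = graph_filter('id:"childerelda"', "following_id", "id", max_depth=1)
query_string = urlencode({"q": "Solr", "fq": fq})
print(fq)
print(query_string)
```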
35. Designed for
Humans
• Consistent
• Versioned
• Friendlier endpoint names
• Introspectable
• JSON output by default (`wt` still supported)
Not in 6.0, but coming very soon
42. Java 8 or higher only!
If you are still using Java 7, you will need to update Java before
upgrading to Solr 6.
43. Changes to
Defaults
• Default schemaFactory is now
ManagedIndexSchemaFactory
• Similarity defaults:
• If no <similarity> defined,
SchemaSimilarityFactory is used
• Defaults to BM25 when field type does not
declare similarity
44. Deprecations
introduced in
Solr 5 have
been removed
• SolrServer and subclasses (use SolrClient)
• DefaultSimilarityFactory has been removed (use
ClassicSimilarityFactory)
• GET methods on the Schema API have been
changed
• facet.date has been removed (finally); use facet.range
• SolrClient.shutdown() removed in favor of
SolrClient.close()
45. All right, WHEN?
The first release candidate could be created this week.
Expect release in the next 2-4 weeks.
46. More Information
• Solr Reference Guide
• https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface
• https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions+(Solr+6)
• Joel Bernstein’s presentation at Lucene Revolution
• https://www.youtube.com/watch?v=baWQfHWozXc
• Yonik’s blog, Solr ’n Stuff
• http://yonik.com/solr-cross-data-center-replication/
• http://yonik.com/solr-6/
• Shalin’s presentation to Bangalore Apache Solr/Lucene Group:
http://slides.com/shalinmangar/what-s-cooking
47. Thanks to everyone
who’s blogged or
presented on upcoming
features
• Joel Bernstein and
Dennis Gove
• Shalin Mangar
• Yonik Seeley
• Doug Turnbull