October 11-14, 2016, Boston, MA
http://lucenerevolution.com
Solr 6 Deep Dive: SQL and Graph
Grant Ingersoll
@gsingers
CTO, Lucidworks
Tim Potter
@thelabdude
Sr. Software Engineer, Lucidworks
Agenda
• Motivations
• Streaming Expressions and Parallel SQL
• Graph Capabilities
• How does this compare to…?
• Future Directions
Search-Driven Everything
Customer Service, Customer Insights, Fraud Surveillance, Research Portal, Online Retail, Digital Content
Why Solr needs Parallel Computation
• Big data systems have grown too complex trying to satisfy a variety of access patterns:
  • Fast primary key lookups / atomic updates (Solr, HBase, Cassandra, …)
  • Low-latency ranked retrieval (Solr, Elastic, DataStax, …)
  • Large, distributed table scans (Spark, M/R, Pig, Cassandra, Hive, Impala, …)
  • Graph traversal (GraphX, Giraph, Neo4j, …)
• De-normalization can be inconvenient because related data sets can change at different velocities (movies vs. movie ratings)
• Leverage the progress the Solr community has made in supporting big data through horizontal scalability (shards & replicas)
• Don’t forget about speed: search engines in general, and Solr in particular, are extremely fast!
Lucidworks Fusion Is Search-Driven Everything
• Drive next-generation relevance via Content, Collaboration and Context
• Harness best-in-class open source: Apache Solr + Spark
• Simplify application development and reduce ongoing maintenance
Diagram: Catalog; Dynamic Navigation and Landing Pages; Instant Insights and Analytics; Personalized Shopping Experience; Promotions; User History; Data Acquisition; Indexing & Streaming; Smart Access API; Recommendations & Alerts; Analytics & Insights; Extreme Relevancy
Access data from anywhere to build intelligent, data-driven applications.
Fusion Architecture
Diagram components:
• REST API
• Apache Spark: workers and cluster manager
• Apache Solr: shards
• HDFS (optional)
• Apache ZooKeeper (ZK 1 … ZK N): shared config management, leader election, load balancing
• Connectors: database, web, file, logs, Hadoop, cloud
• Core Services: alerting/messaging, NLP, pipelines, blob storage, scheduling, recommenders/signals, …
• Admin UI
• Lucidworks View
• Security built-in
Streaming Expressions and SQL
SQL is a natural extension for Solr’s parallel computing engine
• SQL is the ubiquitous language for analytics
• People: less training, and easier to understand
• Tools! Solr as a JDBC data source (DbVisualizer, Apache Zeppelin, and SQuirreL SQL)
• Query planning / optimization can evolve iteratively
Mental Warm-up
Give me the top 5 action movies with a rating of 4 or better
/select?q=*:*
&fq=genre_ss:action
&fq=rating_i:[4 TO *]
&facet=true
&facet.limit=5
&facet.mincount=1
&facet.field=title_s
SELECT title_s, COUNT(*) as cnt
FROM movielens
WHERE genre_ss='action'
AND rating_i='[4 TO *]'
GROUP BY title_s
ORDER BY cnt desc
LIMIT 5
{
  ...
  "facet_counts":{
    "facet_fields":{
      "title_s":[
        "Star Wars (1977)",501,
        "Return of the Jedi (1983)",379,
        "Godfather, The (1972)",351,
        "Raiders of the Lost Ark (1981)",348,
        "Empire Strikes Back, The (1980)",293]},
  ...}}
{"result-set":{"docs":[
  {"title_s":"Star Wars (1977)","cnt":501},
  {"title_s":"Return of the Jedi (1983)","cnt":379},
  {"title_s":"Godfather, The (1972)","cnt":351},
  {"title_s":"Raiders of the Lost Ark (1981)","cnt":348},
  {"title_s":"Empire Strikes Back, The (1980)","cnt":293},
  {"EOF":true,"RESPONSE_TIME":42}]}}
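Both requests above answer the same question, and both responses report the same counts. Here is a minimal Python sketch of what they compute, run over a few made-up rows. The sample data and the `top_titles` helper are illustrative only and are not part of Solr. `genre_ss` is modeled as a list because it is a multi-valued field.

```python
from collections import Counter

# Hypothetical rows standing in for documents in the movielens collection.
ratings = [
    {"title_s": "Star Wars (1977)", "genre_ss": ["action", "sci-fi"], "rating_i": 5},
    {"title_s": "Star Wars (1977)", "genre_ss": ["action", "sci-fi"], "rating_i": 4},
    {"title_s": "Star Wars (1977)", "genre_ss": ["action", "sci-fi"], "rating_i": 2},
    {"title_s": "Return of the Jedi (1983)", "genre_ss": ["action"], "rating_i": 4},
    {"title_s": "Heat (1995)", "genre_ss": ["action", "crime"], "rating_i": 3},
    {"title_s": "Fargo (1996)", "genre_ss": ["crime"], "rating_i": 5},
]

def top_titles(rows, genre, min_rating, limit):
    """Mimic: SELECT title_s, COUNT(*) AS cnt FROM movielens
    WHERE genre_ss=genre AND rating_i='[min_rating TO *]'
    GROUP BY title_s ORDER BY cnt DESC LIMIT limit."""
    counts = Counter(
        r["title_s"]
        for r in rows
        if genre in r["genre_ss"] and r["rating_i"] >= min_rating  # the two fq filters
    )
    # most_common() orders by count descending, like a count-sorted facet
    return [{"title_s": t, "cnt": c} for t, c in counts.most_common(limit)]

print(top_titles(ratings, "action", 4, 5))
# [{'title_s': 'Star Wars (1977)', 'cnt': 2}, {'title_s': 'Return of the Jedi (1983)', 'cnt': 1}]
```

The facet request gets its counts from facet.field plus facet.limit. The SQL form states the same intent as GROUP BY, ORDER BY and LIMIT, which this sketch mirrors directly.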
 	
SQL Examples
Give me the avg rating for men and women aged 30 and over for romance movies
SELECT gender_s, COUNT(*) as num_ratings, avg(rating_i) as avg_rating
FROM movielens
WHERE genre_ss='romance' AND age_i='[30 TO *]'
GROUP BY gender_s
ORDER BY num_ratings desc

Give me the top 5 rated movies with at least 100 ratings
SELECT title_s, genre_s, COUNT(*) as num_ratings, avg(rating_i) as avg_rating
FROM movielens
GROUP BY title_s, genre_s
HAVING num_ratings >= 100
ORDER BY avg_rating desc
LIMIT 5

Give me the set of unique users that have rated documentaries
SELECT DISTINCT(user_id_i) as user_id
FROM movielens
WHERE genre_ss='documentary'
ORDER BY user_id desc
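The HAVING clause in the second query filters whole groups after aggregation, not individual rows. A minimal Python sketch of that ordering of steps follows. The rows and the `top_rated` helper are hypothetical, and the threshold is lowered from 100 to 2 so the toy data shows the effect.

```python
from collections import defaultdict

# Hypothetical rating rows: (title_s, genre_s, rating_i)
rows = [
    ("Casablanca (1942)", "Drama", 5),
    ("Casablanca (1942)", "Drama", 4),
    ("Casablanca (1942)", "Drama", 5),
    ("Heat (1995)", "Action", 4),
    ("Heat (1995)", "Action", 3),
    ("Obscure Film (1990)", "Drama", 5),  # only one rating
]

def top_rated(rows, min_ratings, limit):
    """Mimic GROUP BY title_s, genre_s HAVING num_ratings >= min_ratings
    ORDER BY avg_rating DESC LIMIT limit."""
    groups = defaultdict(list)
    for title, genre, rating in rows:
        groups[(title, genre)].append(rating)  # GROUP BY title_s, genre_s
    result = [
        {"title_s": t, "genre_s": g, "num_ratings": len(rs), "avg_rating": sum(rs) / len(rs)}
        for (t, g), rs in groups.items()
        if len(rs) >= min_ratings  # HAVING filters whole groups, after aggregation
    ]
    result.sort(key=lambda r: r["avg_rating"], reverse=True)  # ORDER BY avg_rating desc
    return result[:limit]  # LIMIT

print([r["title_s"] for r in top_rated(rows, min_ratings=2, limit=5)])
# ['Casablanca (1942)', 'Heat (1995)']
```

Without the HAVING threshold, the single 5-star rating would put "Obscure Film" at the top. That outcome is exactly what "at least 100 ratings" guards against.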
Streaming Expressions
• Perform relational operations on streams
• Stream sources: search, jdbc, facets, stats, topic, gatherNodes
• Stream decorators: complement, daemon, leftOuterJoin, hashJoin, innerJoin, intersect, merge, outerHashJoin, parallel, reduce, random, rollup, select, shortestPath, sort, top, unique, update
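Many decorators depend on their input streams being sorted. The two below work only because the tuples arrive ordered on the relevant field. This is a toy Python model of two of them, `merge` and `unique`, with tuples represented as dicts and streams as iterators. It is a sketch of the idea only, not Solr's Java implementation, and the function names simply echo the expression names.

```python
import heapq
from typing import Dict, Iterable, Iterator

Row = Dict[str, object]  # a stream tuple: field name -> value

def merge(a: Iterable[Row], b: Iterable[Row], on: str) -> Iterator[Row]:
    """Like merge(..., on="field asc"): interleave two streams that are already
    sorted ascending on `on`, producing one stream sorted on `on`."""
    return heapq.merge(a, b, key=lambda t: t[on])

def unique(stream: Iterable[Row], over: str) -> Iterator[Row]:
    """Like unique(..., over="field"): drop tuples repeating the previous `over` value.
    Only correct when the input is sorted on `over`, so duplicates are adjacent."""
    previous = object()  # sentinel that never equals a real field value
    for t in stream:
        if t[over] != previous:
            yield t
            previous = t[over]

shard1 = [{"id": 1}, {"id": 3}, {"id": 3}]
shard2 = [{"id": 2}, {"id": 3}]
print([t["id"] for t in unique(merge(shard1, shard2, on="id"), over="id")])  # [1, 2, 3]
```

Both functions keep only constant state, so they can process arbitrarily long sorted streams. That property is what makes sorted /export output a good foundation for relational operations.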
Streaming API: Nuts and Bolts
• Relies on docValues (column-oriented data structure) and the /export handler
• Extreme read performance (8-10x faster than queries using cursorMark)
• Facet or map/reduce style aggregation modes
• Tiered architecture
  • SQL interface tier
  • Worker tier (scale a pool of worker “nodes” independently of the data collection)
  • Data tier (Solr collection)
Streaming Expression Example: hashJoin
parallel(workers,
  hashJoin(
    search(movielens, q=*:*,
           fl="user_id_i,movie_id_i,rating_i",
           sort="movie_id_i asc",
           partitionKeys="movie_id_i"),
    hashed=search(movielens_movies, q=*:*,
           fl="movie_id_i,title_s,genre_s",
           sort="movie_id_i asc",
           partitionKeys="movie_id_i"),
    on="movie_id_i"
  ),
  workers="4",
  sort="movie_id_i asc"
)
• The small “right” (hashed) side of the join gets loaded into memory on each worker node
• Each shard is queried by N workers, so 4 workers x 4 shards means 16 queries (usually all replicas per shard are hit)
• The workers collection isolates parallel computation nodes from data nodes
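The expression above relies on two ideas: partitioning both sides on the join key, so that matching tuples meet on the same worker, and building an in-memory hash table from the smaller side. A minimal Python sketch of that plan follows. The helper names are hypothetical, and Python's `hash()` stands in for Solr's own partitioning hash.

```python
from collections import defaultdict

def partition(rows, key, workers):
    """Route each tuple to one worker by hashing its partition key
    (the idea behind partitionKeys="movie_id_i")."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[hash(row[key]) % workers].append(row)
    return buckets

def hash_join(left, hashed, on):
    """Load the small hashed side into a dict, then stream the left side past it."""
    index = defaultdict(list)
    for h in hashed:
        index[h[on]].append(h)
    for row in left:
        for match in index.get(row[on], []):
            yield {**row, **match}  # joined tuple; hashed-side values win on name clashes

def parallel_hash_join(left, hashed, on, workers):
    """Each worker joins only its own partition of both sides; results are then
    re-sorted on the join key, as the outer sort="movie_id_i asc" requests."""
    left_parts = partition(left, on, workers)
    right_parts = partition(hashed, on, workers)
    joined = []
    for w in range(workers):
        joined.extend(hash_join(left_parts.get(w, []), right_parts.get(w, []), on))
    return sorted(joined, key=lambda t: t[on])

ratings = [
    {"user_id_i": 1, "movie_id_i": 10, "rating_i": 5},
    {"user_id_i": 2, "movie_id_i": 20, "rating_i": 3},
    {"user_id_i": 3, "movie_id_i": 10, "rating_i": 4},
    {"user_id_i": 4, "movie_id_i": 30, "rating_i": 2},  # no matching movie: dropped
]
movies = [{"movie_id_i": 10, "title_s": "A"}, {"movie_id_i": 20, "title_s": "B"}]
for t in parallel_hash_join(ratings, movies, on="movie_id_i", workers=4):
    print(t)
```

The result is the same for any worker count. Adding workers only spreads the work. The cost appears on the data side: every worker still queries every shard for its partition, which is where the 4 x 4 = 16 queries come from.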
Aggregation Modes
• Map/Reduce aggregationMode: for high-cardinality aggregations and distributed joins (requires a shuffle phase to move keys to the correct worker)
curl --data-urlencode "stmt=SELECT user_id_i, avg(rating_i) as avg_rating FROM movielens GROUP BY user_id_i" \
  "http://host:port/solr/movielens/sql?aggregationMode=map_reduce"
• Facet aggregationMode: uses the JSON Facet engine for high performance on low-to-moderate cardinality fields (e.g. movies)
curl --data-urlencode "stmt=SELECT movie_id_i, avg(rating_i) as avg_rating FROM movielens GROUP BY movie_id_i" \
  "http://host:port/solr/movielens/sql?aggregationMode=facet"
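The shuffle phase is what lets map_reduce mode handle many distinct keys. Each key's tuples all land on one worker, so every worker can finish its averages without coordinating with the others. A minimal Python sketch of the first query's shape, using hypothetical helper names, with `hash()` standing in for the real partitioning. Facet mode instead pushes the aggregation into Solr's JSON Facet engine, which this sketch does not model.

```python
from collections import defaultdict

def shuffle(tuples, key, workers):
    """Shuffle phase: send every tuple with the same key to the same worker."""
    parts = [[] for _ in range(workers)]
    for t in tuples:
        parts[hash(t[key]) % workers].append(t)
    return parts

def reduce_avg(part, key, field):
    """Reduce phase on one worker: it sees all tuples for its keys, so its averages are final."""
    sums, counts = defaultdict(int), defaultdict(int)
    for t in part:
        sums[t[key]] += t[field]
        counts[t[key]] += 1
    return {k: sums[k] / counts[k] for k in sums}

def map_reduce_avg(tuples, key, field, workers):
    """SELECT key, avg(field) ... GROUP BY key, computed map/reduce style.
    Per-worker results cover disjoint key sets, so combining them is a plain union."""
    result = {}
    for part in shuffle(tuples, key, workers):
        result.update(reduce_avg(part, key, field))
    return result

rows = [
    {"user_id_i": 1, "rating_i": 4},
    {"user_id_i": 2, "rating_i": 2},
    {"user_id_i": 1, "rating_i": 5},
    {"user_id_i": 3, "rating_i": 3},
    {"user_id_i": 2, "rating_i": 4},
]
print(map_reduce_avg(rows, "user_id_i", "rating_i", workers=3))
```

An average cannot be computed by averaging partial averages over uneven groups. Routing whole keys to a single worker avoids that issue without shipping partial sums and counts around.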
How we use the Solr streaming API in Fusion
• The spark-solr project uses the streaming API to pull data from Solr into Spark jobs when docValues are enabled; see https://github.com/lucidworks/spark-solr
• Perform aggregations of “signals”, e.g., clicks, to compute boosts and recommendations using Spark
• Custom Scala script jobs perform complex analysis on data in Solr, e.g., sessionizing request logs
• Power rich data visualizations using Spark SQL over Solr streaming aggregations
Graph
Graph Use Cases
• Anomaly detection and fraud detection
• Recommenders
• Social network analysis
• Graph Search
• Access Control
• Examples:
  • Find all tweets mentioning “Solr” by me or people I follow
  • Find all draft blog posts about “Parallel SQL” written by a developer
  • Find 3-star hotels in NYC my friends stayed in last year
Graph Basics
• Some data is much more naturally represented as a graph structure
• Traditionally hard to deal with in search’s inverted index
• Solr 6.0 introduces the Graph Query Parser
• Solr 6.1 brings Graph Streaming Expressions
Graph Query Parser
• Query-time, cycle-aware graph traversal that can rank documents based on relationships
• Provides controls for depth, filtering of results, and inclusion of root and/or leaves
• Limitations: single node/shard only
• Examples:
  • http://localhost:8983/solr/graph/query?fl=id,score&q={!graph+from=in_edge+to=out_edge}id:A
  • http://localhost:8983/solr/my_graph/query?fl=id&q={!graph+from=in_edge+to=out_edge+traversalFilter='foo:[*+TO+15]'}id:A
  • http://localhost:8983/solr/my_graph/query?fl=id&q={!graph+from=in_edge+to=out_edge+maxDepth=1}foo:[*+TO+10]
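The three example URLs can be read as variations of one breadth-first, cycle-aware traversal. A minimal Python sketch follows, based on our reading of the from/to semantics: the `to` field values of the current nodes are looked up in the `from` field of other documents. The sample documents are made up for illustration. Applying the traversal filter only to newly reached documents is an assumption of this sketch. This is not Solr's implementation, which runs inside a single index.

```python
# Hypothetical sample documents with multi-valued in_edge / out_edge fields.
DOCS = {
    "A": {"foo": 7, "out_edge": [1, 9], "in_edge": [4, 2]},
    "B": {"foo": 12, "out_edge": [3, 6], "in_edge": [1]},
    "C": {"foo": 10, "out_edge": [5, 2], "in_edge": [9]},
    "D": {"foo": 20, "out_edge": [4, 7], "in_edge": [3, 5]},
    "E": {"foo": 17, "out_edge": [], "in_edge": [2]},
    "F": {"foo": 11, "out_edge": [], "in_edge": [6]},
    "G": {"foo": 7, "out_edge": [8], "in_edge": []},
    "H": {"foo": 10, "out_edge": [], "in_edge": [8]},
}

def graph_query(docs, roots, from_field, to_field, max_depth=-1, traversal_filter=None):
    """Breadth-first, cycle-aware traversal in the spirit of {!graph from=... to=...}.
    max_depth=-1 means unlimited; traversal_filter applies to newly reached docs only."""
    visited = set(roots)
    frontier = list(roots)
    depth = 0
    while frontier and (max_depth < 0 or depth < max_depth):
        edge_values = {v for doc_id in frontier for v in docs[doc_id][to_field]}
        next_frontier = []
        for doc_id, doc in docs.items():
            if doc_id in visited:
                continue  # cycle awareness: each document is visited at most once
            if traversal_filter is not None and not traversal_filter(doc):
                continue
            if edge_values & set(doc[from_field]):
                visited.add(doc_id)
                next_frontier.append(doc_id)
        frontier = next_frontier
        depth += 1
    return visited

# {!graph from=in_edge to=out_edge}id:A
print(sorted(graph_query(DOCS, ["A"], "in_edge", "out_edge")))  # ['A', 'B', 'C', 'D', 'E', 'F']
# ... traversalFilter='foo:[* TO 15]'}id:A
print(sorted(graph_query(DOCS, ["A"], "in_edge", "out_edge",
                         traversal_filter=lambda d: d["foo"] <= 15)))  # ['A', 'B', 'C', 'F']
# ... maxDepth=1}foo:[* TO 10]
roots = [k for k, d in DOCS.items() if d["foo"] <= 10]
print(sorted(graph_query(DOCS, roots, "in_edge", "out_edge", max_depth=1)))
```

Doc A is reachable again from C (out_edge 2 matches A's in_edge 2). The visited set is what prevents that cycle from looping forever.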
Graph Streaming Expressions (Solr 6.1)
• Part of Solr’s broader Streaming Expressions capability
• Implements a breadth-first traversal
• Works across shards AND collections
• Supports aggregations
• Cycle aware
curl -X POST -H "Content-Type: application/x-www-form-urlencoded" -d 'expr=…' "http://localhost:18984/solr/movielens/stream"
All movies that user 389 watched:
expr=gatherNodes(movielens, walk="389->user_id_i", gather="movie_id_i")
All the movies that viewers of Movie 161 watched:
expr=gatherNodes(movielens,
  gatherNodes(movielens, walk="161->movie_id_i", gather="user_id_i"),
  walk="node->user_id_i", gather="movie_id_i", trackTraversal="true")
Movie 161: “The Air Up There”
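Each gatherNodes call is one hop. It selects documents whose walk field matches the incoming nodes, then emits the distinct values of the gather field. A minimal Python sketch of a single hop over made-up rating documents follows, with two hops chained to mirror the second example. The `gather_nodes` helper is hypothetical and omits the traversal tracking and cycle detection Solr performs.

```python
from collections import Counter

# Hypothetical (user_id_i, movie_id_i) rating documents.
EDGES = [
    {"user_id_i": u, "movie_id_i": m}
    for u, m in [(389, 1), (389, 2), (389, 161), (7, 161), (7, 3), (8, 161), (8, 3), (8, 4)]
]

def gather_nodes(edges, roots, walk_field, gather_field):
    """One gatherNodes-style hop: select documents whose walk_field value is a root
    (walk="root->walk_field") and return the distinct gather_field values, each with
    the number of documents that reached it (like a count(*) aggregation)."""
    return Counter(e[gather_field] for e in edges if e[walk_field] in roots)

# All movies that user 389 watched
print(sorted(gather_nodes(EDGES, {389}, "user_id_i", "movie_id_i")))  # [1, 2, 161]

# All the movies that viewers of movie 161 watched (two chained hops)
viewers = gather_nodes(EDGES, {161}, "movie_id_i", "user_id_i")
print(gather_nodes(EDGES, set(viewers), "user_id_i", "movie_id_i"))
```

In the second result, movie 161 itself comes back with the highest count, because every viewer watched it. Cycle-aware traversal and trackTraversal matter in real queries for exactly this reason.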
Collaborative Filtering Example
expr=top(n="5", sort="count(*) desc",
  gatherNodes(movielens,
    top(n="30", sort="count(*) desc",
      gatherNodes(movielens,
        search(movielens, q="user_id_i:305", fl="movie_id_i", sort="movie_id_i asc", qt="/export"),
        walk="movie_id_i->movie_id_i", gather="user_id_i",
        maxDocFreq="10000", count(*))),
    walk="node->user_id_i", gather="movie_id_i", count(*)))
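Read from the inside out, the expression works in five steps:

1. Find the movies user 305 rated.
2. Find other users who rated those movies, counting how many movies each shares with user 305.
3. Keep the top 30 of those users.
4. Gather the movies they rated.
5. Keep the 5 most common movies.

Below is a minimal Python sketch of that pipeline over made-up data. The helper names are hypothetical. `max_doc_freq` reflects our reading of maxDocFreq, namely skipping overly popular walk values, and is an assumption.

```python
from collections import Counter

# Hypothetical rating documents: (user_id_i, movie_id_i)
EDGES = [
    {"user_id_i": u, "movie_id_i": m}
    for u, m in [(305, 1), (305, 2), (305, 3), (10, 1), (10, 2), (10, 4),
                 (11, 3), (11, 5), (12, 6), (13, 1)]
]

def gather(edges, roots, walk_field, gather_field, max_doc_freq=None):
    """One gatherNodes-style hop with count(*) per gathered node. If max_doc_freq is
    set, root values appearing in more than that many documents are skipped."""
    if max_doc_freq is not None:
        freq = Counter(e[walk_field] for e in edges)
        roots = {r for r in roots if freq[r] <= max_doc_freq}
    return Counter(e[gather_field] for e in edges if e[walk_field] in roots)

def top(counts, n):
    """top(n=..., sort="count(*) desc"): keep the n highest-count nodes."""
    return dict(counts.most_common(n))

def recommend(edges, user, n_users=30, n_movies=5, max_doc_freq=10000):
    seen = {e["movie_id_i"] for e in edges if e["user_id_i"] == user}  # search(...)
    similar = top(gather(edges, seen, "movie_id_i", "user_id_i", max_doc_freq), n_users)
    return top(gather(edges, set(similar), "user_id_i", "movie_id_i"), n_movies)

print(recommend(EDGES, 305))
```

As in the expression, user 305 counts among their own "similar users", and movies they already rated are not filtered out. A production recommender would typically remove both.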
Comparisons
Comparing Graph Choices
| | Solr | Elastic Graph | Neo4J | Spark GraphX |
|---|---|---|---|---|
| Best Use Case | QParser: predefined relationships as filters; Expressions: fast, query-based, distributed graph ops | Term relationship exploration | Graph ops and querying that fit on a single node | Large-scale, iterative graph ops |
| Common Graph Algorithms (e.g. Pregel, Traversal) | Partial | No | Yes | Yes |
| Scaling | QParser: no; Expressions: yes | Yes | Master/Replica | Yes |
| Commercial License Required | No | Yes | GPLv3 | No |
| Visualizations | GraphML support (Gephi) | Kibana | Neo4j browser | 3rd party |
Comparing Big Data SQL Choices
| | Solr | Hive | Drill | SparkSQL |
|---|---|---|---|---|
| Secret Sauce | Push complex query constructs into engine (full text, spatial, functions, etc.) | Mature SQL solution for Hadoop stack | Execute SQL over NoSQL data sources | Spark core (optimized shuffle, in-memory, etc.), integration of other APIs: ML, Streaming, GraphX |
| SQL Features | Evolving | Mature | Maturing | Maturing |
| Scaling | Linear (shards and replicas) backed by inverted index | Limited by Hadoop infrastructure (table scans) | Good, but need to benchmark | Memory intensive; scale out using Spark cluster, backed by RDDs |
| Integration w/ external systems | JDBC stream source | External tables / plugin API | Many drivers available | DataSource API, many systems supported |
Future Work
• Alternate graph traversal approaches, e.g. depth-first
• Possible support for Gremlin (graph traversal language from Apache TinkerPop)
• Additional graph algorithms (e.g. strongly connected components, PageRank)
SQL: Current Limitations and Future Plans
• No support yet for pushing >, >=, <, <= operators in the WHERE clause down into Solr as range queries; use range syntax such as [4 TO *] for now
• Using Solr function queries in the WHERE clause, e.g.
  WHERE location_p='{!geofilt d=90 pt=37.773972,-122.431297 sfield=location_p}'
• SQL Joins (SOLR-8593)
• Port the SQL layer to Apache Calcite (instead of Presto)
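Until comparison operators are pushed down, the `[4 TO *]` workaround extends to the other operators through standard Solr range syntax, where square brackets are inclusive bounds and curly braces exclusive ones. Below is a hypothetical Python helper that writes the WHERE predicate for you. Whether a particular release's SQL layer accepts exclusive-brace bounds inside the string literal is an assumption worth testing.

```python
def to_solr_range(op: str, value) -> str:
    """Translate a comparison into Solr range syntax: [] inclusive, {} exclusive."""
    ranges = {
        ">=": f"[{value} TO *]",
        ">": f"{{{value} TO *]",  # {{ and }} emit literal braces in an f-string
        "<=": f"[* TO {value}]",
        "<": f"[* TO {value}}}",
    }
    if op not in ranges:
        raise ValueError(f"unsupported operator: {op}")
    return ranges[op]

def where_clause(field: str, op: str, value) -> str:
    """Build a WHERE predicate in the form the SQL examples above use."""
    return f"{field}='{to_solr_range(op, value)}'"

print(where_clause("rating_i", ">=", 4))  # rating_i='[4 TO *]'
print(where_clause("age_i", ">", 30))     # age_i='{30 TO *]'
```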

Webinar: Smart answers for employee and customer support after covid 19 - Europe
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 Research
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise Search
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and Beyond
 

Recently uploaded

Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 

Recently uploaded (20)

DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 

Webinar: Solr 6 Deep Dive - SQL and Graph

  • 3. Solr 6 Deep Dive: SQL and Graph Grant Ingersoll @gsingers CTO, Lucidworks Tim Potter @thelabdude Sr. Software Engineer, Lucidworks
  • 4. • Motivations • Streaming Expressions and Parallel SQL • Graph Capabilities • How does this compare to…? • Future Directions Agenda
  • 6. • Big data systems have grown too complex trying to satisfy a variety of access patterns • Fast primary key lookups / atomic updates (Solr, HBase, Cassandra, …) • Low-latency ranked retrieval (Solr, Elastic, DataStax, …) • Large, distributed table scans (Spark, M/R, Pig, Cassandra, Hive, Impala, …) • Graph traversal (GraphX, Giraph, Neo4j, …) • De-normalization can be inconvenient as related data sets can change at different velocities (movies vs. movie ratings) • Leverage progress made by the Solr community to support big data in Solr using horizontal scalability (shards & replicas) • Don’t forget about speed: search engines in general, and Solr in particular, are extremely fast! Why Solr needs Parallel Computation
  • 7. Lucidworks Fusion Is Search-Driven Everything • Drive next generation relevance via Content, Collaboration and Context • Harness best in class Open Source: Apache Solr + Spark • Simplify application development and reduce ongoing maintenance CATALOG DYNAMIC NAVIGATION AND LANDING PAGES INSTANT INSIGHTS AND ANALYTICS PERSONALIZED SHOPPING EXPERIENCE PROMOTIONS USER HISTORY Data Acquisition Indexing & Streaming Smart Access API Recommendations & Alerts Analytics & Insights Extreme Relevancy Access data from anywhere to build intelligent, data-driven applications.
  • 8. Fusion Architecture: REST API • Apache Spark (Worker, Worker, Cluster Mgr.) • Apache Solr (Shards, Shards) • HDFS (Optional) • Apache ZooKeeper (ZK 1 to ZK N: Shared Config Mgmt, Leader Election, Load Balancing) • DATABASE, WEB, FILE, LOGS, HADOOP, CLOUD • Core Services: Connectors, Alerting/Messaging, NLP, Pipelines, Blob Storage, Scheduling, Recommenders/Signals, … • Admin UI • SECURITY BUILT-IN • Lucidworks View
  • 10. • SQL is the ubiquitous language for analytics • People: less training and easier to understand • Tools! Solr as a JDBC data source (DbVisualizer, Apache Zeppelin, and SQuirreL SQL) • Query planning / optimization can evolve iteratively SQL is a natural extension for Solr’s parallel computing engine
  • 11. Mental Warm-up: Give me the top 5 action movies with a rating of 4 or better
    Solr query:
      /select?q=*:*&fq=genre_ss:action&fq=rating_i:[4 TO *]&facet=true&facet.limit=5&facet.mincount=1&facet.field=title_s
    SQL:
      SELECT title_s, COUNT(*) as cnt FROM movielens WHERE genre_ss='action' AND rating_i='[4 TO *]' GROUP BY title_s ORDER BY cnt desc LIMIT 5
    Facet response:
      {...
        "facet_counts":{
          "facet_fields":{
            "title_s":[
              "Star Wars (1977)",501,
              "Return of the Jedi (1983)",379,
              "Godfather, The (1972)",351,
              "Raiders of the Lost Ark (1981)",348,
              "Empire Strikes Back, The (1980)",293]},
          ...}}
    SQL response:
      {"result-set":{"docs":[
        {"title_s":"Star Wars (1977)","cnt":501},
        {"title_s":"Return of the Jedi (1983)","cnt":379},
        {"title_s":"Godfather, The (1972)","cnt":351},
        {"title_s":"Raiders of the Lost Ark (1981)","cnt":348},
        {"title_s":"Empire Strikes Back, The (1980)","cnt":293},
        {"EOF":true,"RESPONSE_TIME":42}]}}
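Both requests compute the same thing: filter the ratings, count per title, and keep the five largest counts. A minimal in-memory sketch of those semantics follows; the `ratings` list and the `top_titles` helper are hypothetical stand-ins for the movielens collection, not Solr APIs.

```python
from collections import Counter

def top_titles(ratings, genre, min_rating, limit):
    """Emulate fq=genre_ss:<genre>, fq=rating_i:[<min_rating> TO *] and a
    title_s facet sorted by count (equivalently GROUP BY ... ORDER BY cnt desc)."""
    counts = Counter(
        r["title_s"]
        for r in ratings
        # genre_ss is multi-valued, so list membership mirrors the fq match
        if genre in r["genre_ss"] and r["rating_i"] >= min_rating
    )
    # Titles with zero matches never enter the Counter (like facet.mincount=1)
    return counts.most_common(limit)

# Hypothetical sample documents, one per rating
ratings = [
    {"title_s": "Star Wars (1977)", "genre_ss": ["action", "sci-fi"], "rating_i": 5},
    {"title_s": "Star Wars (1977)", "genre_ss": ["action", "sci-fi"], "rating_i": 4},
    {"title_s": "Star Wars (1977)", "genre_ss": ["action", "sci-fi"], "rating_i": 2},
    {"title_s": "Godfather, The (1972)", "genre_ss": ["action", "crime"], "rating_i": 5},
    {"title_s": "Toy Story (1995)", "genre_ss": ["animation"], "rating_i": 5},
]

print(top_titles(ratings, "action", 4, 5))
# [('Star Wars (1977)', 2), ('Godfather, The (1972)', 1)]
```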
  • 12. SQL Examples
    Give me the avg rating for men and women aged 30 and over for romance movies:
      SELECT gender_s, COUNT(*) as num_ratings, avg(rating_i) as avg_rating
        FROM movielens
       WHERE genre_ss='romance' AND age_i='[30 TO *]'
    GROUP BY gender_s
    ORDER BY num_ratings desc
    Give me the top 5 rated movies with at least 100 ratings:
      SELECT title_s, genre_s, COUNT(*) as num_ratings, avg(rating_i) as avg_rating
        FROM movielens
    GROUP BY title_s, genre_s
      HAVING num_ratings >= 100
    ORDER BY avg_rating desc
       LIMIT 5
    Give me the set of unique users that have rated documentaries:
      SELECT DISTINCT(user_id_i) as user_id
        FROM movielens
       WHERE genre_ss='documentary'
    ORDER BY user_id desc
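The second query depends on HAVING being evaluated after grouping, so the filter sees the per-movie count rather than individual ratings. A small sketch of that evaluation order, assuming one dict per rating document; `top_rated` is a hypothetical helper, not part of Solr.

```python
from collections import defaultdict

def top_rated(ratings, min_count, limit):
    """Emulate: SELECT title_s, genre_s, COUNT(*) as num_ratings,
    avg(rating_i) as avg_rating ... GROUP BY title_s, genre_s
    HAVING num_ratings >= min_count ORDER BY avg_rating desc LIMIT limit."""
    groups = defaultdict(list)
    for r in ratings:
        groups[(r["title_s"], r["genre_s"])].append(r["rating_i"])
    rows = [
        {"title_s": title, "genre_s": genre,
         "num_ratings": len(vals), "avg_rating": sum(vals) / len(vals)}
        for (title, genre), vals in groups.items()
        # HAVING filters on the aggregate, i.e. after grouping
        if len(vals) >= min_count
    ]
    rows.sort(key=lambda row: row["avg_rating"], reverse=True)
    return rows[:limit]

ratings = [
    {"title_s": "A", "genre_s": "drama", "rating_i": 5},
    {"title_s": "A", "genre_s": "drama", "rating_i": 4},
    {"title_s": "B", "genre_s": "comedy", "rating_i": 5},  # only one rating
    {"title_s": "C", "genre_s": "action", "rating_i": 3},
    {"title_s": "C", "genre_s": "action", "rating_i": 3},
]
for row in top_rated(ratings, min_count=2, limit=5):
    print(row["title_s"], row["num_ratings"], row["avg_rating"])
```

Movie B has the highest average but is dropped by the count threshold, which is exactly why the slide's query filters with HAVING num_ratings >= 100.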
  • 13. • Perform relational operations on streams • Stream sources: search, jdbc, facets, stats, topic, gatherNodes • Stream decorators: complement, daemon, leftOuterJoin, hashJoin, innerJoin, intersect, merge, outerHashJoin, parallel, reduce, random, rollup, select, shortestPath, sort, top, unique, update Streaming Expressions
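Decorators wrap other streams and are composed by nesting, roughly as an expression such as unique(merge(..., on="movie_id_i asc"), over="movie_id_i") does. The sketch below is only a Python-generator analogy of two decorators over tuple streams, not Solr's Java implementation; `merge`, `unique` and the shard lists here are illustrative names. Both rely on inputs already sorted on the key, which is why search streams declare a sort.

```python
import heapq
from typing import Any, Dict, Iterable, Iterator

Row = Dict[str, Any]

def merge(key: str, *streams: Iterable[Row]) -> Iterator[Row]:
    """Decorator: interleave streams that are each sorted ascending on `key`,
    keeping the combined output sorted."""
    return heapq.merge(*streams, key=lambda t: t[key])

def unique(key: str, stream: Iterable[Row]) -> Iterator[Row]:
    """Decorator: drop tuples whose `key` repeats the previous tuple's.
    Only correct when the input is sorted on `key`."""
    previous = object()  # sentinel that never equals a real value
    for t in stream:
        if t[key] != previous:
            yield t
            previous = t[key]

# Two hypothetical shard streams, each sorted by movie_id_i
shard1 = [{"movie_id_i": 1}, {"movie_id_i": 3}]
shard2 = [{"movie_id_i": 1}, {"movie_id_i": 2}]

ids = [t["movie_id_i"] for t in unique("movie_id_i", merge("movie_id_i", shard1, shard2))]
print(ids)  # [1, 2, 3]
```

Because each decorator only pulls tuples lazily from the stream it wraps, a pipeline like this never materializes the full result set, which is the property the streaming API relies on.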
  • 14. • Relies on docValues (column-oriented data structure) and the /export handler • Extreme read performance (8-10x faster than queries using cursorMark) • Facet or map/reduce style aggregation modes • Tiered architecture • SQL interface tier • Worker tier (scale a pool of worker “nodes” independently of the data collection) • Data tier (Solr collection) Streaming API: Nuts and Bolts
  • 15. Streaming Expression Example: hashJoin
      parallel(workers,
        hashJoin(
          search(movielens, q=*:*,
                 fl="user_id_i,movie_id_i,rating_i",
                 sort="movie_id_i asc",
                 partitionKeys="movie_id_i"),
          hashed=search(movielens_movies, q=*:*,
                        fl="movie_id_i,title_s,genre_s",
                        sort="movie_id_i asc",
                        partitionKeys="movie_id_i"),
          on="movie_id_i"
        ),
        workers="4",
        sort="movie_id_i asc"
      )
    • The small “right” side of the join gets loaded into memory on each worker node • Each shard is queried by N workers, so 4 workers x 4 shards means 16 queries (usually all replicas per shard are hit) • The workers collection isolates parallel computation nodes from data nodes
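The mechanics behind this expression can be sketched in memory: partition both sides on the join key so matching tuples meet on the same worker, build a hash table from the small side, and probe it with the large side. This is a simplified model under stated assumptions (CRC32 as the partition hash, workers simulated as a loop, inner-join semantics); `partition`, `hash_join` and `parallel_hash_join` are hypothetical names, not Solr classes.

```python
import zlib
from collections import defaultdict

def partition(stream, key, workers):
    """Route each tuple to one worker by hashing its partition key, so equal
    key values always meet on the same worker (the role of partitionKeys)."""
    parts = [[] for _ in range(workers)]
    for t in stream:
        # crc32 is stable across processes, unlike Python's salted str hash()
        parts[zlib.crc32(str(t[key]).encode()) % workers].append(t)
    return parts

def hash_join(left, right, on):
    """Inner join: load `right` into a dict, then stream `left` past it."""
    table = defaultdict(list)
    for r in right:
        table[r[on]].append(r)
    for row in left:
        for match in table.get(row[on], []):
            yield {**row, **match}

def parallel_hash_join(left, right, on, workers):
    lparts = partition(left, on, workers)
    rparts = partition(right, on, workers)
    joined = []
    for w in range(workers):  # each iteration stands in for one worker node
        joined.extend(hash_join(lparts[w], rparts[w], on))
    # Combine worker outputs in join-key order, like sort="movie_id_i asc"
    return sorted(joined, key=lambda t: t[on])

ratings = [
    {"user_id_i": 1, "movie_id_i": 10, "rating_i": 5},
    {"user_id_i": 2, "movie_id_i": 20, "rating_i": 3},
    {"user_id_i": 3, "movie_id_i": 30, "rating_i": 4},  # no matching movie
]
movies = [
    {"movie_id_i": 10, "title_s": "A", "genre_s": "drama"},
    {"movie_id_i": 20, "title_s": "B", "genre_s": "comedy"},
]
for t in parallel_hash_join(ratings, movies, "movie_id_i", workers=4):
    print(t)
```

Partitioning both inputs on movie_id_i is what lets each worker join independently: no tuple ever needs a match that lives on another worker.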
  • 16. Aggregation Modes
    • Map/Reduce aggregationMode: for high-cardinality aggregations and distributed joins (requires a shuffle phase to move keys to the correct worker)
      curl --data-urlencode "stmt=SELECT user_id_i, avg(rating_i) as avg_rating FROM movielens GROUP BY user_id_i" \
        "http://host:port/solr/movielens/sql?aggregationMode=map_reduce"
    • Facet aggregationMode: uses the JSON facet engine for high performance on low-to-moderate cardinality fields (e.g. movies)
      curl --data-urlencode "stmt=SELECT movie_id_i, avg(rating_i) as avg_rating FROM movielens GROUP BY movie_id_i" \
        "http://host:port/solr/movielens/sql?aggregationMode=facet"
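The map_reduce mode's shuffle-then-reduce split can be illustrated with the first query (average rating per user). This is a conceptual sketch only, assuming CRC32 partitioning and simulated workers; `shuffle`, `reduce_avg` and `map_reduce_avg` are hypothetical names.

```python
import zlib
from collections import defaultdict

def shuffle(tuples, key, workers):
    """Shuffle phase: send every tuple with the same key to the same worker."""
    buckets = [[] for _ in range(workers)]
    for t in tuples:
        buckets[zlib.crc32(str(t[key]).encode()) % workers].append(t)
    return buckets

def reduce_avg(bucket, key, field):
    """Reduce phase on one worker: running (sum, count) per key, then the mean."""
    acc = defaultdict(lambda: [0, 0])
    for t in bucket:
        acc[t[key]][0] += t[field]
        acc[t[key]][1] += 1
    return {k: total / n for k, (total, n) in acc.items()}

def map_reduce_avg(tuples, key, field, workers):
    result = {}
    for bucket in shuffle(tuples, key, workers):
        # After the shuffle no key appears in two buckets, so results never collide
        result.update(reduce_avg(bucket, key, field))
    return result

ratings = [
    {"user_id_i": 1, "rating_i": 4}, {"user_id_i": 1, "rating_i": 5},
    {"user_id_i": 2, "rating_i": 3},
    {"user_id_i": 3, "rating_i": 2}, {"user_id_i": 3, "rating_i": 4},
]
print(map_reduce_avg(ratings, "user_id_i", "rating_i", workers=3))
```

Facet mode roughly avoids the shuffle by having each shard compute per-bucket partials that are merged centrally; that stays cheap only while the number of distinct buckets is small, which is why the slide recommends it for low-to-moderate cardinality fields such as movies.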
  • 17. • The spark-solr project uses the streaming API to pull data from Solr into Spark jobs if docValues are enabled, see: https://github.com/lucidworks/spark-solr • Perform aggregations of “signals”, e.g. clicks, to compute boosts and recommendations using Spark • Custom Scala script jobs to perform complex analysis on data in Solr, e.g. sessionize request logs • Power rich data visualizations using Spark SQL over Solr streaming aggregations How we use the Solr streaming API in Fusion
  • 18. Graph
  • 19. • Anomaly detection and fraud detection • Recommenders • Social network analysis • Graph Search • Access Control • Examples: • Find all tweets mentioning “Solr” by me or people I follow • Find all draft blog posts about “Parallel SQL” written by a developer • Find 3-star hotels in NYC my friends stayed in last year Graph Use Cases
  • 20. • Some data is much more naturally represented as a graph structure • Traditionally hard to deal with in search’s inverted index • Solr 6.0 introduces the Graph Query Parser • Solr 6.1 brings Graph Streaming expressions Graph Basics
  • 21. • Query-time, cycle-aware graph traversal is able to rank documents based on relationships • Provides controls for depth, filtering of results and inclusion of root and/or leaves • Limitations: single node/shard only • Examples:
      http://localhost:8983/solr/graph/query?fl=id,score&q={!graph+from=in_edge+to=out_edge}id:A
      http://localhost:8983/solr/my_graph/query?fl=id&q={!graph+from=in_edge+to=out_edge+traversalFilter='foo:[*+TO+15]'}id:A
      http://localhost:8983/solr/my_graph/query?fl=id&q={!graph+from=in_edge+to=out_edge+maxDepth=1}foo:[*+TO+10]
    Graph Query Parser
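The traversal controls on this slide (maxDepth, traversalFilter, root inclusion) can be modeled as a breadth-first search over documents. The sketch below is an in-memory emulation, not the parser itself. It assumes that each hop collects the `to` field values of the current frontier and matches them against the `from` field of other documents; check that direction against the Solr Reference Guide. `graph_query`, its parameters and the sample documents are hypothetical.

```python
def graph_query(docs, root, from_field, to_field, max_depth=-1,
                traversal_filter=None, return_root=True):
    """Breadth-first, in-memory emulation of {!graph from=... to=...}<root>.
    Each hop collects `to_field` values from the current frontier and follows
    them to documents whose `from_field` contains one of those values."""
    by_id = {d["id"]: d for d in docs}
    roots = {d["id"] for d in docs if root(d)}
    visited = set(roots)  # cycle awareness: a document is reached at most once
    frontier = set(roots)
    depth = 0
    while frontier and (max_depth < 0 or depth < max_depth):
        edges = {e for i in frontier for e in by_id[i].get(to_field, [])}
        frontier = {
            d["id"] for d in docs
            if d["id"] not in visited
            and edges.intersection(d.get(from_field, []))
            and (traversal_filter is None or traversal_filter(d))
        }
        visited |= frontier
        depth += 1
    return visited if return_root else visited - roots

docs = [
    {"id": "A", "in_edge": [3], "out_edge": [1], "foo": 7},
    {"id": "B", "in_edge": [1], "out_edge": [2], "foo": 12},
    {"id": "C", "in_edge": [2], "out_edge": [3], "foo": 5},  # edge 3 leads back to A
    {"id": "D", "in_edge": [9], "out_edge": [], "foo": 1},
]
print(sorted(graph_query(docs, lambda d: d["id"] == "A", "in_edge", "out_edge")))
```

The visited set is what makes the traversal terminate on the A to B to C to A cycle; max_depth and traversal_filter simply cut the frontier earlier.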
  • 22. Graph Streaming Expressions (Solr 6.1) • Part of Solr’s broader Streaming Expressions capability • Implements a powerful, breadth-first traversal • Works across shards AND collections • Supports aggregations • Cycle-aware
      curl -X POST -H "Content-Type: application/x-www-form-urlencoded" -d 'expr=…' "http://localhost:18984/solr/movielens/stream"
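The same form-encoded POST can be issued from Python's standard library. This sketch only builds the request and never sends it. `stream_request` is a hypothetical helper, the host and port come from the curl command, and the expression is the gatherNodes example from slide 23.

```python
from urllib.parse import urlencode
from urllib.request import Request

def stream_request(base_url, collection, expr):
    """Build (but do not send) the same form-encoded POST as the curl command."""
    return Request(
        f"{base_url}/solr/{collection}/stream",
        data=urlencode({"expr": expr}).encode("utf-8"),  # escapes quotes, ->, commas
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )

expr = 'gatherNodes(movielens,walk="389->user_id_i",gather="movie_id_i")'
req = stream_request("http://localhost:18984", "movielens", expr)
print(req.get_method(), req.full_url)
# Against a running SolrCloud node: urllib.request.urlopen(req).read()
```

URL-encoding the body matters here because expressions contain characters such as `"`, `>` and `,` that would otherwise be mangled.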
  • 23. All movies that user 389 watched expr:gatherNodes(movielens,walk="389->user_id_i",gather="movie_id_i")
  • 24. All the Movies that viewers of Movie 161 watched expr:gatherNodes(movielens, gatherNodes(movielens,walk="161->movie_id_i",gather="user_id_i"), walk="node->user_id_i",gather="movie_id_i", trackTraversal="true") Movie 161: “The Air Up There”
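Each gatherNodes call can be read as one hop: find the documents whose walk field matches the incoming nodes, then collect the distinct values of the gather field. The sketch below emulates the two slides in memory, including a trackTraversal-like record of which node led to each result. It deliberately omits gatherNodes' cycle detection and aggregations, and `gather_nodes` and the sample data are hypothetical.

```python
from collections import defaultdict

def gather_nodes(docs, roots, walk_field, gather_field, track_traversal=False):
    """One hop: match docs whose `walk_field` value is in `roots`, then collect
    the distinct `gather_field` values. With track_traversal=True, also return
    which root values led to each gathered node."""
    ancestors = defaultdict(set)
    for d in docs:
        if d[walk_field] in roots:
            ancestors[d[gather_field]].add(d[walk_field])
    return dict(ancestors) if track_traversal else set(ancestors)

# Hypothetical ratings: one document per (user, movie) pair
ratings = [
    {"user_id_i": 389, "movie_id_i": 161},
    {"user_id_i": 389, "movie_id_i": 50},
    {"user_id_i": 7, "movie_id_i": 161},
    {"user_id_i": 7, "movie_id_i": 99},
    {"user_id_i": 8, "movie_id_i": 1},
]

# Slide 23: walk="389->user_id_i", gather="movie_id_i"
print(gather_nodes(ratings, {389}, "user_id_i", "movie_id_i"))

# Slide 24: users who watched 161, then the movies those users watched
viewers = gather_nodes(ratings, {161}, "movie_id_i", "user_id_i")
print(gather_nodes(ratings, viewers, "user_id_i", "movie_id_i", track_traversal=True))
```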
  • 25. Collaborative Filtering Example
      expr=top(n="5", sort="count(*) desc",
             gatherNodes(movielens,
               top(n="30", sort="count(*) desc",
                 gatherNodes(movielens,
                   search(movielens, q="user_id_i:305", fl="movie_id_i", sort="movie_id_i asc", qt="/export"),
                   walk="movie_id_i->movie_id_i",
                   gather="user_id_i",
                   maxDocFreq="10000",
                   count(*))),
               walk="node->user_id_i",
               gather="movie_id_i",
               count(*)))
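Read inside out, the expression finds the movies user 305 rated, the 30 users who share the most of those movies, and then the 5 movies those users rated most often. A hedged in-memory sketch of that pipeline follows. The maxDocFreq handling is an assumption (walk values occurring in more than that many documents are skipped, so very popular movies do not dominate), and `count_gather` and `recommend` are hypothetical names.

```python
from collections import Counter

def count_gather(docs, roots, walk_field, gather_field, max_doc_freq=None):
    """gatherNodes with count(*): for each gathered value, count the documents
    linking it to the frontier. Assumption: walk values occurring in more than
    max_doc_freq documents are skipped, as a stand-in for maxDocFreq."""
    if max_doc_freq is not None:
        freq = Counter(d[walk_field] for d in docs)
        roots = {r for r in roots if freq[r] <= max_doc_freq}
    return Counter(d[gather_field] for d in docs if d[walk_field] in roots)

def recommend(docs, user, n_users=30, n_movies=5, max_doc_freq=10000):
    # search(...): the movies this user rated
    seen = {d["movie_id_i"] for d in docs if d["user_id_i"] == user}
    # Inner gatherNodes + top(n="30"): users with the most movies in common
    overlap = count_gather(docs, seen, "movie_id_i", "user_id_i", max_doc_freq)
    similar = {u for u, _ in overlap.most_common(n_users)}
    # Outer gatherNodes + top(n="5"): movies most often rated by those users
    return count_gather(docs, similar, "user_id_i", "movie_id_i").most_common(n_movies)

pairs = [(305, 1), (305, 2), (10, 1), (10, 2), (10, 3), (11, 1), (11, 3), (11, 4), (12, 5)]
ratings = [{"user_id_i": u, "movie_id_i": m} for u, m in pairs]
print(recommend(ratings, 305))
```

As in the expression itself, nothing here removes user 305 from the similar users or drops movies 305 has already rated, so those show up in the counts; a production recommender would usually filter them out.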
  • 27. Comparing Graph Choices
| | Solr | Elastic Graph | Neo4J | Spark GraphX |
|---|---|---|---|---|
| Best Use Case | QParser: predefined relationships as filters; Expressions: fast, query-based, distributed graph ops | Term relationship exploration | Graph ops and querying that fit on a single node | Large-scale, iterative graph ops |
| Common Graph Algorithms (e.g. Pregel, Traversal) | Partial | No | Yes | Yes |
| Scaling | QParser: no; Expressions: yes | Yes | Master/Replica | Yes |
| Commercial License Required | No | Yes | GPLv3 | No |
| Visualizations | GraphML support (Gephi) | Kibana | Neo4j browser | 3rd party |
  • 28. Comparing Big Data SQL Choices
| | Solr | Hive | Drill | SparkSQL |
|---|---|---|---|---|
| Secret Sauce | Push complex query constructs into engine (full text, spatial, functions, etc.) | Mature SQL solution for Hadoop stack | Execute SQL over NoSQL data sources | Spark core (optimized shuffle, in-memory, etc.), integration of other APIs: ML, Streaming, GraphX |
| SQL Features | Evolving | Mature | Maturing | Maturing |
| Scaling | Linear (shards and replicas) backed by inverted index | Limited by Hadoop infrastructure (table scans) | Good, but need to benchmark | Memory intensive; scale out using Spark cluster, backed by RDDs |
| Integration w/ external systems | JDBC stream source | External tables / plugin API | Many drivers available | DataSource API, many systems supported |
  • 30. • Alternate graph traversal approaches, e.g. depth-first • Possible support for Gremlin (the graph traversal language from Apache TinkerPop) • Additional graph algorithms (e.g. strongly connected components, PageRank) Future Work
  • 31. SQL: Current Limitations and Future Plans • No support for pushing >, >=, <, <= operators in the WHERE clause down into Solr as range queries; use range syntax such as [4 TO *] for now • Using Solr function queries in the WHERE clause, e.g. WHERE location_p='{!geofilt d=90 pt=37.773972,-122.431297 sfield=location_p}' • SQL joins (SOLR-8593) • Port the SQL layer to Apache Calcite instead of Presto
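Until comparison operators are pushed down, a client can rewrite them into Solr range syntax before building the WHERE clause. This is a minimal sketch with a hypothetical helper, `to_solr_range`. The square and curly bracket forms are standard Solr query-parser range syntax, but the slide only shows `[4 TO *]`, so whether the Solr 6 SQL layer accepts the exclusive-bound variants in a WHERE value is an assumption to verify.

```python
def to_solr_range(op: str, value) -> str:
    """Rewrite `field <op> value` as a Solr range value for the WHERE clause.
    [ and ] are inclusive bounds, { and } exclusive, * is unbounded."""
    ranges = {
        ">=": f"[{value} TO *]",
        ">": f"{{{value} TO *]",
        "<=": f"[* TO {value}]",
        "<": f"[* TO {value}}}",
    }
    if op not in ranges:
        raise ValueError(f"unsupported operator: {op!r}")
    return ranges[op]

print(f"WHERE rating_i='{to_solr_range('>=', 4)}'")  # WHERE rating_i='[4 TO *]'
```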