SlideShare a Scribd company logo
O C T O B E R 1 1 - 1 4 , 2 0 1 6 • B O S T O N , M A
1
Leveraging the Power of Solr with Spark
JOHANNES WEIGEND
CTO, QAware GmbH / Germany
2
3
01
Agenda
Introduction to Solr Cloud and Spark
Importing
Searching and Aggregating
Scaling Up
It is Hard to Scale Horizontally!
■ Functions

- Trivial
- Loadbalancing of stateless services (macro- / microservices)
- More users -> more machines
- Nontrivial
- More machines -> faster response times
■ Data

- Trivial
- Linear distribution of data on multiple machines
- More machines -> more data
- Nontrivial
- Constant response times with growing datasets
4
5
Cloud
-Document based NoSQL database with outstanding search capabilities
A document is a collection of fields (string, number, date, …)
Single und multiple fields (fields can be arrays)
Nested documents
Static und dynamic scheme
Powerful query language (Lucene)
-Horizontally scalable with Solr Cloud
Distributed data in separate shards
Resilience by combination of zookeeper and replication
-Powerful aggregations (aka facets)
6
Shard2
Solr Server
Zookeeper
Solr ServerSolr Server
Shard1
Zookeeper Zookeeper Zookeeper
Ensamble
Solr Cloud
Leader
Scale Out
Shard3
Replica8 Replica9
Shard5Shard4 Shard6 Shard8Shard7 Shard9
Replica2 Replica3 Replica5
Shards
Replicas
Collection
Replica4 Replica7 Replica1 Replica6
The Architecture of Solr Cloud
Two Levels of Distribution
Search Search Search
Search

Index

Store

Map Map Map
Calculate

Cache

Join

Combine

Frontend
Reduce Business Layer
Combining Solr + Spark
7
READ THIS: https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
■Distributed computing (100x faster than Hadoop M/R)
■Distributed Map/Reduce on distributed data can be done in-memory
■Supports online and batch workloads
■Scala with Java/Scala/Python APIs
■Processes data from distributed and local sources
-Textfiles (accessible from all nodes)
-Hadoop File System (HDFS)
-Databases (JDBC)
-Solr per Lucidworks API
8
Driver
9
Apache Spark
executing parallel tasks
executing parallel tasks
Executor
Executor
10
Cloud in a Box
The Cloud in a Box

6th generation Intel® Core™ i5-6260U processor
with Intel® Iris™ graphics
(1.9 GHz up to 2.8 GHz Turbo, Dual Core, 4 MB
Cache, 15W TDP)
CPU
32 GB Dual-channel DDR4 SODIMMs
1.2V, 2133 MHz
RAM
256 GB Samsung M.2 internal SSDDISK
! Used for all benchmarks in this talk
10 Cores, 20 HT Units, 160 GB RAM, 1,25 TB DiskTotal
11
12
13
01
Introduction into Solr Cloud and Spark
Importing
Searching and Aggregating
Scaling Up
Agenda
Apache Big Data North America | Vancouver | 05.05.2016 | Johannes Weigend | © QAware GmbH
Monitoring Sample Data
■ Single CSV per process, host, metric type
wls1_lpapp18_jmx.csv
Datetime CPU % Usage Heap % Usage #GC Invocations
1/10/16 9:00,000 50 50 1000
1/10/16 10:00,000 60 60 1100
1/10/16 11:00,000 70 70 1300
1/10/16 12:00,000 80 80 1800
CSV Solr document per cell
14
15
CloudSolrClient
SOLR1
SOLR2
SOLR3
add(List<document> batch) ShardsClient
Input Data
read input data
create batch
add batch to Solr
Bottleneck
Processing
Bottleneck
Network
Importing and Indexing into Solr can be slow
Some Options to Speed Things Up
Spark Executor
16
CloudSolrClient Solr Server 1
add(List<document> batch) Shards
Parallel Cloud Importer
Distributed
Input Data
-read input data
-create batch
-add batch to Solr
Parallel Import with Spark makes Import Scalable
Node1
CloudSolrClient Solr Server 2Spark ExecutorNode2
Scale upScale up Scale up
Node n
Solr Server 3CloudSolrClientSpark ExecutorNode3
17
How to Import Multiple (HDFS) Files
18
19
Solr UUID-Field
20
Import takes - 78411 ms
—> 180.000 Docs per Second
Indexing 14 Mio Docs in 1:20 Min
SolrJ and Spark have Different Transitive Dependencies
Depending on the Software Version
■ Adding both libraries to your classpath leads by transitivity to serious
problems at runtime (Serialization errors / ClassNotFoundExceptions…)
■ Pinning / Exclusion helps - but can produce strange errors. There is
currently no satisfying solution for the BigData class path hell.
21
22
01
Introduction into Solr Cloud and Spark
Importing
Searching and Aggregating
Scaling Up
Agenda
23
Using Solr Facet Queries for Aggregation
#
# Grouping per sub query
#
curl $SOLR/$COLLECTION/select -d '
q=process:wls1 AND metric:*.HeapMemoryUsage.used&
rows=0&
json.facet={
Hosts: {
type: terms,
field: host,
facet:{
Off : { query : "value: [* TO 0]" },
Idle : { query : "value: [0 TO 1000000000]" },
Busy : { query : "value: [1000000001 TO 10000000000]" },
Overload : { query : "value: [10000000001 TO *]" }
}
}
}
Why Do we Need Even More?
■ Data centerer applications need a scalable way of
- Post processing search results or facets (business logik, ML,
data analytics)
- Post filtering search results
- Processing denormalized data (if you store a one-to-many
relation in a single Solr document)
24
Accessing Solr from Spark with SolrRDD
■ https://github.com/
lucidworks/spark-solr
■ You have to build the
library locally. There is no
released version at Maven
Central.
■ Make sure to adjust the
versions depending on
your environment
25
Streaming from Solr into Spark
Not Bad! 14 Mio in 1:27 Minutes
26
27
You Can Speed up Spark / Solr by Factor 10
Using the Export Handler
Using SolrRDD with Java
28
29
Reading 14 Mio Docs in 10 Seconds
Streaming 14 Mio Solr documents into Spark
takes 10 Seconds
—> 1.400 000 Docs per Second
RDDs using /export Handler Rocks!
30
Scaling up
31
Apache Big Data North America | Vancouver | 05.05.2016 | Johannes Weigend | © QAware GmbH
Recap: Monitoring Sample Data
■ Single CSV per process, host, metric type
wls1_lpapp18_jmx.csv
Date CPU % Usage Heap % Usage #GC Invocations
1/10/16 9:00,000 50 50 1000
1/10/16 10:00,000 60 60 1100
1/10/16 11:00,000 70 70 1300
1/10/16 12:00,000 80 80 1800
CSV SOLR
32
1000 lines with 10.000
columuns = 3MB gzipped
1000 x 10.000 docs = 1 Mio Solr docs
A Naive Solr Datamodel
A single Solr document per CSV cell
‣ Advantage
You can use Solr for aggregation, sorting and
searching for values or time intervals
‣ Disadvantage
Data explosion (single compressed CSV file with 3MB
in size produces 1 Mil Solr documents)
33
Column Based Denormalization
wls1_lpapp18_jmx.csv
Date CPU % Usage Heap % Usage #GC Invocations
1/10/16 9:00,000 50 50 1000
1/10/16 10:00,000 60 60 1100
1/10/16 11:00,000 70 70 1300
1/10/16 12:00,000 80 80 1800
CSV
SolrDocument {
process: wls1
host: lpapp18
type: jmx
maxdate: 1/10/16 9:00
mindate: 1/10/16 12:00
metric: CPU % Usage
values: [BINARY (Date, Long)]
max: 80
min: 50
avg: 65
}
n 1
Store 1000-10000 events in a single document
Document per column
34
Storing 1-to-1400 Relation in a Single Document
Base64 encoded and gzipped
values: [{date: …, value:}, … ]
35
32k Limit for DocValues
Benefits of Denomalization
‣ Benefits
- You can scale from a xxx million documents in a Solr Cloud up to
trillions of searchable events
- Import is vastly faster
‣ Drawbacks
- Searching on single values requires additional logic
- Counting and faceting requires additional logic
‣ Spark can solve these problems by parallel post processing
- Decompressing, aggregating, joining, grouping
36
Accessing Compressed Data within Spark
37
38
Indexing 19 Million of CSV Values
in 13500 Solr documents
takes now 24 Seconds (before 1:20)
—> 800,000 Values per Second
39
Streaming One Billion of Solr Values into Spark
Takes now 34 Seconds (Before 700 s)
—> 29,000,000 Values per Second
Summary
■ The combination of Solr Cloud and Spark gives you the power to
deal with BigData workloads in realtime
■ Denormalization can make your Solr application vastly faster
■ Make use of the /export handler when using the SolrRDD
■ Parallel post processing is mandatory for nontrivial applications
■ If you want to learn more: come to the Chronix talk on Friday
40
Learn More
■ https://github.com/lucidworks/spark-solr
■ https://github.com/jweigend/solr-spark
■ http://chronix.io
■ https://github.com/ChronixDB/chronix.spark/
■ http://qaware.blogspot.de
41
42
43

More Related Content

What's hot

Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy
Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, EtsyLessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy
Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy
Lucidworks
 
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
thelabdude
 
Introduction to SolrCloud
Introduction to SolrCloudIntroduction to SolrCloud
Introduction to SolrCloud
Varun Thacker
 
What's New on AWS and What it Means to You
What's New on AWS and What it Means to YouWhat's New on AWS and What it Means to You
What's New on AWS and What it Means to You
Amazon Web Services
 
How to make a simple cheap high availability self-healing solr cluster
How to make a simple cheap high availability self-healing solr clusterHow to make a simple cheap high availability self-healing solr cluster
How to make a simple cheap high availability self-healing solr cluster
lucenerevolution
 
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
Lucidworks
 
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Lucidworks
 
Spark Summit EU talk by Miklos Christine paddling up the stream
Spark Summit EU talk by Miklos Christine paddling up the streamSpark Summit EU talk by Miklos Christine paddling up the stream
Spark Summit EU talk by Miklos Christine paddling up the stream
Spark Summit
 
"Spark Search" - In-memory, Distributed Search with Lucene, Spark, and Tachyo...
"Spark Search" - In-memory, Distributed Search with Lucene, Spark, and Tachyo..."Spark Search" - In-memory, Distributed Search with Lucene, Spark, and Tachyo...
"Spark Search" - In-memory, Distributed Search with Lucene, Spark, and Tachyo...
Lucidworks
 
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale ToolkitDeploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
thelabdude
 
Data Engineering with Solr and Spark
Data Engineering with Solr and SparkData Engineering with Solr and Spark
Data Engineering with Solr and Spark
Lucidworks
 
Solrcloud Leader Election
Solrcloud Leader ElectionSolrcloud Leader Election
Solrcloud Leader Election
ravikgiitk
 
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark Summit
 
Solr cluster with SolrCloud at lucenerevolution (tutorial)
Solr cluster with SolrCloud at lucenerevolution (tutorial)Solr cluster with SolrCloud at lucenerevolution (tutorial)
Solr cluster with SolrCloud at lucenerevolution (tutorial)
searchbox-com
 
Scaling Through Partitioning and Shard Splitting in Solr 4
Scaling Through Partitioning and Shard Splitting in Solr 4Scaling Through Partitioning and Shard Splitting in Solr 4
Scaling Through Partitioning and Shard Splitting in Solr 4
thelabdude
 
Building a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrBuilding a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache Solr
Rahul Jain
 
How To Connect Spark To Your Own Datasource
How To Connect Spark To Your Own DatasourceHow To Connect Spark To Your Own Datasource
How To Connect Spark To Your Own Datasource
MongoDB
 
Data analysis scala_spark
Data analysis scala_sparkData analysis scala_spark
Data analysis scala_spark
Yiguang Hu
 
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Databricks
 
Integrating Spark and Solr-(Timothy Potter, Lucidworks)
Integrating Spark and Solr-(Timothy Potter, Lucidworks)Integrating Spark and Solr-(Timothy Potter, Lucidworks)
Integrating Spark and Solr-(Timothy Potter, Lucidworks)
Spark Summit
 

What's hot (20)

Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy
Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, EtsyLessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy
Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy
 
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
 
Introduction to SolrCloud
Introduction to SolrCloudIntroduction to SolrCloud
Introduction to SolrCloud
 
What's New on AWS and What it Means to You
What's New on AWS and What it Means to YouWhat's New on AWS and What it Means to You
What's New on AWS and What it Means to You
 
How to make a simple cheap high availability self-healing solr cluster
How to make a simple cheap high availability self-healing solr clusterHow to make a simple cheap high availability self-healing solr cluster
How to make a simple cheap high availability self-healing solr cluster
 
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
 
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
 
Spark Summit EU talk by Miklos Christine paddling up the stream
Spark Summit EU talk by Miklos Christine paddling up the streamSpark Summit EU talk by Miklos Christine paddling up the stream
Spark Summit EU talk by Miklos Christine paddling up the stream
 
"Spark Search" - In-memory, Distributed Search with Lucene, Spark, and Tachyo...
"Spark Search" - In-memory, Distributed Search with Lucene, Spark, and Tachyo..."Spark Search" - In-memory, Distributed Search with Lucene, Spark, and Tachyo...
"Spark Search" - In-memory, Distributed Search with Lucene, Spark, and Tachyo...
 
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale ToolkitDeploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
 
Data Engineering with Solr and Spark
Data Engineering with Solr and SparkData Engineering with Solr and Spark
Data Engineering with Solr and Spark
 
Solrcloud Leader Election
Solrcloud Leader ElectionSolrcloud Leader Election
Solrcloud Leader Election
 
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
 
Solr cluster with SolrCloud at lucenerevolution (tutorial)
Solr cluster with SolrCloud at lucenerevolution (tutorial)Solr cluster with SolrCloud at lucenerevolution (tutorial)
Solr cluster with SolrCloud at lucenerevolution (tutorial)
 
Scaling Through Partitioning and Shard Splitting in Solr 4
Scaling Through Partitioning and Shard Splitting in Solr 4Scaling Through Partitioning and Shard Splitting in Solr 4
Scaling Through Partitioning and Shard Splitting in Solr 4
 
Building a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrBuilding a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache Solr
 
How To Connect Spark To Your Own Datasource
How To Connect Spark To Your Own DatasourceHow To Connect Spark To Your Own Datasource
How To Connect Spark To Your Own Datasource
 
Data analysis scala_spark
Data analysis scala_sparkData analysis scala_spark
Data analysis scala_spark
 
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
 
Integrating Spark and Solr-(Timothy Potter, Lucidworks)
Integrating Spark and Solr-(Timothy Potter, Lucidworks)Integrating Spark and Solr-(Timothy Potter, Lucidworks)
Integrating Spark and Solr-(Timothy Potter, Lucidworks)
 

Viewers also liked

Autocomplete Multi-Language Search Using Ngram and EDismax Phrase Queries: Pr...
Autocomplete Multi-Language Search Using Ngram and EDismax Phrase Queries: Pr...Autocomplete Multi-Language Search Using Ngram and EDismax Phrase Queries: Pr...
Autocomplete Multi-Language Search Using Ngram and EDismax Phrase Queries: Pr...
Lucidworks
 
Improving Enterprise Findability: Presented by Jayesh Govindarajan, Salesforce
Improving Enterprise Findability: Presented by Jayesh Govindarajan, SalesforceImproving Enterprise Findability: Presented by Jayesh Govindarajan, Salesforce
Improving Enterprise Findability: Presented by Jayesh Govindarajan, Salesforce
Lucidworks
 
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Lucidworks
 
Webinar: Building Conversational Search with Fusion
Webinar: Building Conversational Search with FusionWebinar: Building Conversational Search with Fusion
Webinar: Building Conversational Search with Fusion
Lucidworks
 
Parallel SQL and Analytics with Solr: Presented by Yonik Seeley, Cloudera
Parallel SQL and Analytics with Solr: Presented by Yonik Seeley, ClouderaParallel SQL and Analytics with Solr: Presented by Yonik Seeley, Cloudera
Parallel SQL and Analytics with Solr: Presented by Yonik Seeley, Cloudera
Lucidworks
 
Customizing Ranking Models for Enterprise Search: Presented by Ammar Haris & ...
Customizing Ranking Models for Enterprise Search: Presented by Ammar Haris & ...Customizing Ranking Models for Enterprise Search: Presented by Ammar Haris & ...
Customizing Ranking Models for Enterprise Search: Presented by Ammar Haris & ...
Lucidworks
 
Search++: Cognitive transformation of human-system interaction: Presented by ...
Search++: Cognitive transformation of human-system interaction: Presented by ...Search++: Cognitive transformation of human-system interaction: Presented by ...
Search++: Cognitive transformation of human-system interaction: Presented by ...
Lucidworks
 
Http2 right now
Http2 right nowHttp2 right now
Http2 right now
Daniel Stenberg
 
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, FlipkartNear Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
Lucidworks
 
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Alexandre Rafalovitch
 
Webinar: Site Search in an Hour with Fusion
Webinar: Site Search in an Hour with FusionWebinar: Site Search in an Hour with Fusion
Webinar: Site Search in an Hour with Fusion
Lucidworks
 
Smart Facets at Rakuten: Presented by Keith Thoma & Michael Pellegrini, Rakut...
Smart Facets at Rakuten: Presented by Keith Thoma & Michael Pellegrini, Rakut...Smart Facets at Rakuten: Presented by Keith Thoma & Michael Pellegrini, Rakut...
Smart Facets at Rakuten: Presented by Keith Thoma & Michael Pellegrini, Rakut...
Lucidworks
 
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Lucidworks
 
SXSW 2016 takeaways
SXSW 2016 takeawaysSXSW 2016 takeaways
SXSW 2016 takeaways
Havas
 

Viewers also liked (15)

Autocomplete Multi-Language Search Using Ngram and EDismax Phrase Queries: Pr...
Autocomplete Multi-Language Search Using Ngram and EDismax Phrase Queries: Pr...Autocomplete Multi-Language Search Using Ngram and EDismax Phrase Queries: Pr...
Autocomplete Multi-Language Search Using Ngram and EDismax Phrase Queries: Pr...
 
Improving Enterprise Findability: Presented by Jayesh Govindarajan, Salesforce
Improving Enterprise Findability: Presented by Jayesh Govindarajan, SalesforceImproving Enterprise Findability: Presented by Jayesh Govindarajan, Salesforce
Improving Enterprise Findability: Presented by Jayesh Govindarajan, Salesforce
 
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
 
Webinar: Building Conversational Search with Fusion
Webinar: Building Conversational Search with FusionWebinar: Building Conversational Search with Fusion
Webinar: Building Conversational Search with Fusion
 
Free Data
Free DataFree Data
Free Data
 
Parallel SQL and Analytics with Solr: Presented by Yonik Seeley, Cloudera
Parallel SQL and Analytics with Solr: Presented by Yonik Seeley, ClouderaParallel SQL and Analytics with Solr: Presented by Yonik Seeley, Cloudera
Parallel SQL and Analytics with Solr: Presented by Yonik Seeley, Cloudera
 
Customizing Ranking Models for Enterprise Search: Presented by Ammar Haris & ...
Customizing Ranking Models for Enterprise Search: Presented by Ammar Haris & ...Customizing Ranking Models for Enterprise Search: Presented by Ammar Haris & ...
Customizing Ranking Models for Enterprise Search: Presented by Ammar Haris & ...
 
Search++: Cognitive transformation of human-system interaction: Presented by ...
Search++: Cognitive transformation of human-system interaction: Presented by ...Search++: Cognitive transformation of human-system interaction: Presented by ...
Search++: Cognitive transformation of human-system interaction: Presented by ...
 
Http2 right now
Http2 right nowHttp2 right now
Http2 right now
 
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, FlipkartNear Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
 
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
 
Webinar: Site Search in an Hour with Fusion
Webinar: Site Search in an Hour with FusionWebinar: Site Search in an Hour with Fusion
Webinar: Site Search in an Hour with Fusion
 
Smart Facets at Rakuten: Presented by Keith Thoma & Michael Pellegrini, Rakut...
Smart Facets at Rakuten: Presented by Keith Thoma & Michael Pellegrini, Rakut...Smart Facets at Rakuten: Presented by Keith Thoma & Michael Pellegrini, Rakut...
Smart Facets at Rakuten: Presented by Keith Thoma & Michael Pellegrini, Rakut...
 
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
 
SXSW 2016 takeaways
SXSW 2016 takeawaysSXSW 2016 takeaways
SXSW 2016 takeaways
 

Similar to Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware

Best Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftBest Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
SnapLogic
 
Calum McCrea, Software Engineer at Kx Systems, "Kx: How Wall Street Tech can ...
Calum McCrea, Software Engineer at Kx Systems, "Kx: How Wall Street Tech can ...Calum McCrea, Software Engineer at Kx Systems, "Kx: How Wall Street Tech can ...
Calum McCrea, Software Engineer at Kx Systems, "Kx: How Wall Street Tech can ...
Dataconomy Media
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
DataWorks Summit/Hadoop Summit
 
Apache Solr as a compressed, scalable, and high performance time series database
Apache Solr as a compressed, scalable, and high performance time series databaseApache Solr as a compressed, scalable, and high performance time series database
Apache Solr as a compressed, scalable, and high performance time series database
Florian Lautenschlager
 
Agility and Scalability with MongoDB
Agility and Scalability with MongoDBAgility and Scalability with MongoDB
Agility and Scalability with MongoDB
MongoDB
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
Amazon Web Services
 
Unified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache SparkUnified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache Spark
C4Media
 
CERN IT Monitoring
CERN IT Monitoring CERN IT Monitoring
CERN IT Monitoring
Tim Bell
 
RedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach Shoolman
RedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach ShoolmanRedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach Shoolman
RedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach Shoolman
Redis Labs
 
Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013
Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013
Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013
Amazon Web Services
 
Re-Engineering PostgreSQL as a Time-Series Database
Re-Engineering PostgreSQL as a Time-Series DatabaseRe-Engineering PostgreSQL as a Time-Series Database
Re-Engineering PostgreSQL as a Time-Series Database
All Things Open
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
Amazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
Amazon Web Services
 
Everything You Need to Know About Sharding
Everything You Need to Know About ShardingEverything You Need to Know About Sharding
Everything You Need to Know About Sharding
MongoDB
 
BWC Supercomputing 2008 Presentation
BWC Supercomputing 2008 PresentationBWC Supercomputing 2008 Presentation
BWC Supercomputing 2008 Presentation
lilyco
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
Amazon Web Services
 
Azure Cosmos DB - Technical Deep Dive
Azure Cosmos DB - Technical Deep DiveAzure Cosmos DB - Technical Deep Dive
Azure Cosmos DB - Technical Deep Dive
Andre Essing
 
Aerospike Hybrid Memory Architecture
Aerospike Hybrid Memory ArchitectureAerospike Hybrid Memory Architecture
Aerospike Hybrid Memory Architecture
Aerospike, Inc.
 
What Are Science Clouds?
What Are Science Clouds?What Are Science Clouds?
What Are Science Clouds?
Robert Grossman
 
MongoDB Knowledge Shareing
MongoDB Knowledge ShareingMongoDB Knowledge Shareing
MongoDB Knowledge Shareing
Philip Zhong
 

Similar to Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware (20)

Best Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftBest Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
 
Calum McCrea, Software Engineer at Kx Systems, "Kx: How Wall Street Tech can ...
Calum McCrea, Software Engineer at Kx Systems, "Kx: How Wall Street Tech can ...Calum McCrea, Software Engineer at Kx Systems, "Kx: How Wall Street Tech can ...
Calum McCrea, Software Engineer at Kx Systems, "Kx: How Wall Street Tech can ...
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
Apache Solr as a compressed, scalable, and high performance time series database
Apache Solr as a compressed, scalable, and high performance time series databaseApache Solr as a compressed, scalable, and high performance time series database
Apache Solr as a compressed, scalable, and high performance time series database
 
Agility and Scalability with MongoDB
Agility and Scalability with MongoDBAgility and Scalability with MongoDB
Agility and Scalability with MongoDB
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Unified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache SparkUnified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache Spark
 
CERN IT Monitoring
CERN IT Monitoring CERN IT Monitoring
CERN IT Monitoring
 
RedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach Shoolman
RedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach ShoolmanRedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach Shoolman
RedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach Shoolman
 
Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013
Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013
Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013
 
Re-Engineering PostgreSQL as a Time-Series Database
Re-Engineering PostgreSQL as a Time-Series DatabaseRe-Engineering PostgreSQL as a Time-Series Database
Re-Engineering PostgreSQL as a Time-Series Database
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Everything You Need to Know About Sharding
Everything You Need to Know About ShardingEverything You Need to Know About Sharding
Everything You Need to Know About Sharding
 
BWC Supercomputing 2008 Presentation
BWC Supercomputing 2008 PresentationBWC Supercomputing 2008 Presentation
BWC Supercomputing 2008 Presentation
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Azure Cosmos DB - Technical Deep Dive
Azure Cosmos DB - Technical Deep DiveAzure Cosmos DB - Technical Deep Dive
Azure Cosmos DB - Technical Deep Dive
 
Aerospike Hybrid Memory Architecture
Aerospike Hybrid Memory ArchitectureAerospike Hybrid Memory Architecture
Aerospike Hybrid Memory Architecture
 
What Are Science Clouds?
What Are Science Clouds?What Are Science Clouds?
What Are Science Clouds?
 
MongoDB Knowledge Shareing
MongoDB Knowledge ShareingMongoDB Knowledge Shareing
MongoDB Knowledge Shareing
 

More from Lucidworks

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce Strategy
Lucidworks
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in Salesforce
Lucidworks
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant Products
Lucidworks
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized Experiences
Lucidworks
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Lucidworks
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
Lucidworks
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020
Lucidworks
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Lucidworks
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and Rosette
Lucidworks
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
Lucidworks
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Lucidworks
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19
Lucidworks
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 Research
Lucidworks
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1
Lucidworks
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Lucidworks
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Lucidworks
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Lucidworks
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise Search
Lucidworks
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and Beyond
Lucidworks
 

More from Lucidworks (20)

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce Strategy
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in Salesforce
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant Products
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized Experiences
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and Rosette
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - Europe
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 Research
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise Search
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and Beyond
 

Recently uploaded

Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 

Recently uploaded (20)

Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 

Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware

  • 1. O C T O B E R 1 1 - 1 4 , 2 0 1 6 • B O S T O N , M A 1
  • 2. Leveraging the Power of Solr with Spark JOHANNES WEIGEND CTO, QAware GmbH / Germany 2
  • 3. 3 01 Agenda Introduction to Solr Cloud and Spark Importing Searching and Aggregating Scaling Up
  • 4. It is Hard to Scale Horizontally! ■ Functions - Trivial - Loadbalancing of stateless services (macro- / microservices) - More users -> more machines - Nontrivial - More machines -> faster response times ■ Data - Trivial - Linear distribution of data on multiple machines - More machines -> more data - Nontrivial - Constant response times with growing datasets 4
  • 5. 5 Cloud -Document based NoSQL database with outstanding search capabilities A document is a collection of fields (string, number, date, …) Single und multiple fields (fields can be arrays) Nested documents Static und dynamic scheme Powerful query language (Lucene) -Horizontally scalable with Solr Cloud Distributed data in separate shards Resilience by combination of zookeeper and replication -Powerful aggregations (aka facets)
  • 6. 6 Shard2 Solr Server Zookeeper Solr ServerSolr Server Shard1 Zookeeper Zookeeper Zookeeper Ensamble Solr Cloud Leader Scale Out Shard3 Replica8 Replica9 Shard5Shard4 Shard6 Shard8Shard7 Shard9 Replica2 Replica3 Replica5 Shards Replicas Collection Replica4 Replica7 Replica1 Replica6 The Architecture of Solr Cloud Two Levels of Distribution
  • 7. Search Search Search Search Index Store Map Map Map Calculate Cache Join Combine Frontend Reduce Business Layer Combining Solr + Spark 7
  • 8. READ THIS: https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf ■Distributed computing (100x faster than Hadoop M/R) ■Distributed Map/Reduce on distributed data can be done in-memory ■Supports online and batch workloads ■Scala with Java/Scala/Python APIs ■Processes data from distributed and local sources -Textfiles (accessible from all nodes) -Hadoop File System (HDFS) -Databases (JDBC) -Solr per Lucidworks API 8
  • 9. Driver 9 Apache Spark executing parallel tasks executing parallel tasks Executor Executor
  • 11. The Cloud in a Box
 6th generation Intel® Core™ i5-6260U processor with Intel® Iris™ graphics (1.9 GHz up to 2.8 GHz Turbo, Dual Core, 4 MB Cache, 15W TDP) CPU 32 GB Dual-channel DDR4 SODIMMs 1.2V, 2133 MHz RAM 256 GB Samsung M.2 internal SSDDISK ! Used for all benchmarks in this talk 10 Cores, 20 HT Units, 160 GB RAM, 1,25 TB DiskTotal 11
  • 12. 12
  • 13. 13 01 Introduction into Solr Cloud and Spark Importing Searching and Aggregating Scaling Up Agenda
  • 14. Apache Big Data North America | Vancouver | 05.05.2016 | Johannes Weigend | © QAware GmbH Monitoring Sample Data ■ Single CSV per process, host, metric type wls1_lpapp18_jmx.csv Datetime CPU % Usage Heap % Usage #GC Invocations 1/10/16 9:00,000 50 50 1000 1/10/16 10:00,000 60 60 1100 1/10/16 11:00,000 70 70 1300 1/10/16 12:00,000 80 80 1800 CSV Solr document per cell 14
  • 15. 15 CloudSolrClient SOLR1 SOLR2 SOLR3 add(List<document> batch) ShardsClient Input Data read input data create batch add batch to Solr Bottleneck Processing Bottleneck Network Importing and Indexing into Solr can be slow Some Options to Speed Things Up
  • 16. Spark Executor 16 CloudSolrClient Solr Server 1 add(List<document> batch) Shards Parallel Cloud Importer Distributed Input Data -read input data -create batch -add batch to Solr Parallel Import with Spark makes Import Scalable Node1 CloudSolrClient Solr Server 2Spark ExecutorNode2 Scale upScale up Scale up Node n Solr Server 3CloudSolrClientSpark ExecutorNode3
  • 17. 17 How to Import Multiple (HDFS) Files
  • 18. 18
  • 20. 20 Import takes - 78411 ms —> 180.000 Docs per Second Indexing 14 Mio Docs in 1:20 Min
  • 21. SolrJ and Spark have Different Transitive Dependencies Depending on the Software Version ■ Adding both libraries to your classpath leads by transitivity to serious problems at runtime (Serialization errors / ClassNotFoundExceptions…) ■ Pinning / Exclusion helps - but can produce strange errors. There is currently no satisfying solution for the BigData class path hell. 21
  • 22. 22 01 Introduction into Solr Cloud and Spark Importing Searching and Aggregating Scaling Up Agenda
  • 23. 23 Using Solr Facet Queries for Aggregation # # Grouping per sub query # curl $SOLR/$COLLECTION/select -d ' q=process:wls1 AND metric:*.HeapMemoryUsage.used& rows=0& json.facet={ Hosts: { type: terms, field: host, facet:{ Off : { query : "value: [* TO 0]" }, Idle : { query : "value: [0 TO 1000000000]" }, Busy : { query : "value: [1000000001 TO 10000000000]" }, Overload : { query : "value: [10000000001 TO *]" } } } }
  • 24. Why Do we Need Even More? ■ Data centerer applications need a scalable way of - Post processing search results or facets (business logik, ML, data analytics) - Post filtering search results - Processing denormalized data (if you store a one-to-many relation in a single Solr document) 24
  • 25. Accessing Solr from Spark with SolrRDD ■ https://github.com/ lucidworks/spark-solr ■ You have to build the library locally. There is no released version at Maven Central. ■ Make sure to adjust the versions depending on your environment 25
  • 26. Streaming from Solr into Spark Not Bad! 14 Mio in 1:27 Minutes 26
  • 27. 27 You Can Speed up Spark / Solr by Factor 10 Using the Export Handler
  • 29. 29 Reading 14 Mio Docs in 10 Seconds Streaming 14 Mio Solr documents into Spark takes 10 Seconds —> 1.400 000 Docs per Second
  • 30. RDDs using /export Handler Rocks! 30
  • 32. Apache Big Data North America | Vancouver | 05.05.2016 | Johannes Weigend | © QAware GmbH Recap: Monitoring Sample Data ■ Single CSV per process, host, metric type wls1_lpapp18_jmx.csv Date CPU % Usage Heap % Usage #GC Invocations 1/10/16 9:00,000 50 50 1000 1/10/16 10:00,000 60 60 1100 1/10/16 11:00,000 70 70 1300 1/10/16 12:00,000 80 80 1800 CSV SOLR 32 1000 lines with 10.000 columuns = 3MB gzipped 1000 x 10.000 docs = 1 Mio Solr docs
  • 33. A Naive Solr Datamodel A single Solr document per CSV cell ‣ Advantage You can use Solr for aggregation, sorting and searching for values or time intervals ‣ Disadvantage Data explosion (single compressed CSV file with 3MB in size produces 1 Mil Solr documents) 33
  • 34. Column Based Denormalization wls1_lpapp18_jmx.csv Date CPU % Usage Heap % Usage #GC Invocations 1/10/16 9:00,000 50 50 1000 1/10/16 10:00,000 60 60 1100 1/10/16 11:00,000 70 70 1300 1/10/16 12:00,000 80 80 1800 CSV SolrDocument { process: wls1 host: lpapp18 type: jmx maxdate: 1/10/16 9:00 mindate: 1/10/16 12:00 metric: CPU % Usage values: [BINARY (Date, Long)] max: 80 min: 50 avg: 65 } n 1 Store 1000-10000 events in a single document Document per column 34
  • 35. Storing 1-to-1400 Relation in a Single Document Base64 encoded and gzipped values: [{date: …, value:}, … ] 35 32k Limit for DocValues
  • 36. Benefits of Denomalization ‣ Benefits - You can scale from a xxx million documents in a Solr Cloud up to trillions of searchable events - Import is vastly faster ‣ Drawbacks - Searching on single values requires additional logic - Counting and faceting requires additional logic ‣ Spark can solve these problems by parallel post processing - Decompressing, aggregating, joining, grouping 36
  • 37. Accessing Compressed Data within Spark 37
  • 38. 38 Indexing 19 Million of CSV Values in 13500 Solr documents takes now 24 Seconds (before 1:20) —> 800,000 Values per Second
  • 39. 39 Streaming One Billion of Solr Values into Spark Takes now 34 Seconds (Before 700 s) —> 29,000,000 Values per Second
  • 40. Summary ■ The combination of Solr Cloud and Spark gives you the power to deal with BigData workloads in realtime ■ Denormalization can make your Solr application vastly faster ■ Make use of the /export handler when using the SolrRDD ■ Parallel post processing is mandatory for nontrivial applications ■ If you want to learn more: come to the Chronix talk on Friday 40
  • 41. Learn More ■ https://github.com/lucidworks/spark-solr ■ https://github.com/jweigend/solr-spark ■ http://chronix.io ■ https://github.com/ChronixDB/chronix.spark/ ■ http://qaware.blogspot.de 41
  • 42. 42
  • 43. 43