SlideShare a Scribd company logo
1 of 37
Download to read offline
Towards the True Elasticity
of Spark
Michael Le and Min Li
IBM,	
  T.J.	
  Watson	
  Research	
  Center	
  
Auto-Scaling
•  Detects changes of resource usage in current workloads →
Dynamically allocate/de-allocate resources
•  Meets SLA requirements at reduced cost
•  Existing auto-scaling approaches react slowly and often miss
optimization opportunities
•  YARN and Mesos have initial auto-scaling support, yet how
workloads can benefit from the capability?
Infrastructure as a Service"
Resource Manager (YARN/Mesos)"
Main Focus
•  Analyze how scaling affects Spark workloads
–  Is simply adding new resources sufficient for performance
improvement?
•  Analyze pros/cons of current Spark auto-scaling mechanism
–  Are there rooms for performance improvement?
Agenda
•  Introduction
•  Evaluation setup
•  Impact of scaling
•  Auto-scaling in Spark
•  Lessons and future work
Experimental Setup
•  Baseline setup (6 nodes):
•  Node: 4 CPUs, 8GB, 100 GB HDDs
•  YARN executor: 3GB, 1CPU
•  Mesos executor: 6GB and 4CPUs
•  4 benchmarks (SparkBench*):
–  Kmeans (input 37GB)
–  Page Rank (input 1.1GB)
–  Spark SQL: SQL queries (input 39GB)
–  Logistic Regression (input 47GB)
•  Scaling up: add 4 new nodes “instantaneously”
•  Wait ~45 seconds after benchmark run
•  Scale down – wait 3min after benchmark run
*	
  h6ps://bitbucket.org/lm0926/sparkbench	
  
Mechanisms for Scaling
•  Scale up – add new node “instantaneously” (VM already
provisioned)
–  Run YARN node manager or Mesos slave daemon
–  New nodes are Task nodes (no HDFS component)
–  Scale down – kill resource manager processes
•  Spark on YARN – set total executors for app higher than
available in cluster to ensure executors get launched
•  Spark on Mesos – make use of new resource offers from
Mesos
Agenda
•  Introduction
•  Evaluation setup
•  Impact of scaling
•  Auto-scaling in Spark
•  Lessons and future work
Runtime – Spark on YARN
Benchmark! Runtime
(baseline)!
Runtime !
(scale out)!
Runtime
Reduction!
Kmeans" 54min 44sec" 29min 26sec" 46%"
Page Rank" 9min 28sec" 8min 18sec" 12.3%"
Spark SQL queries" 15min 45sec" 13min 11sec" 16.3%"
LogisticRegression" 13min 10sec" 12min 55sec" 2%"
Similar behavior seen on Mesos
Why the variation?
•  Delay scheduling preventing tasks to be scheduled on
new node
Benchmark! Total
tasks!
% tasks locality
preference
RACK or ANY!
% tasks wait >3s
to be scheduled!
% tasks on new
nodes!
KMeans" 7800" 0.4%" 3.5%" 24.5%"
Page Rank" 21600" 0%" 0.01%" 13.5%"
Spark SQL" 9805" 3%" 0.2%" 11.8%"
LogisticRegression" 6637" 0%" 0.1%" 1.5%"
Tuning Spark for Scaling
•  locality wait time
–  How soon to change locality preference of tasks in
stage
•  resource revive interval
–  How soon to inform scheduler a resource that has not
been used is still available
Runtime – Tuning Spark
Benchmark! Runtime
(baseline)!
Runtime
(scale out)!
Runtime (scale
out– revive
interval 100ms)!
Runtime (scale
out– locality wait
time 100ms)!
Runtime (scale
out–locality wait
time 0ms)!
KMeans" 54min 44sec" 29min 26sec" 30min 39sec" 11min 32sec" 14min 1sec"
Page Rank" 9min 28sec" 8min 18sec" 8min 35sec" 7min 35sec" 5min 57sec"
Spark SQL queries" 15min 45sec" 13min 11sec" 12min 55sec" 11min 40sec" 19min 57sec"
LogisticRegression" 13min 10sec" 12min 55sec" 12min 43sec" 6min 54sec" 9min 12sec"
Benchmark! % tasks on new
node (locality wait
time 3s)!
% tasks on new node
(locality wait time
100ms)!
% tasks on new node
(locality wait time
0ms)!
KMeans" 24.5%" 39.1%" 38%"
Page Rank" 13.5%" 16.1%" 39%"
Spark SQL queries" 11.8%" 22.6%" 39%"
LogisticRegression" 1.5%" 38%" 39%"
KMeans - CPU Utilization Per Node
Scale out (locality wait - 100ms)
1 of the 4 new nodes
Base line (1 of the 6 base nodes) Scale out (locality wait - 100ms)
1 of the 6 base nodes
Scale out (locality wait - 0ms)
1 of the 6 base nodes
Scale out (locality wait - 0ms)
1 of the 4 new nodes
KMeans - Network Utilization
10 MB shuffle read
12 MB shuffle write
Baseline
10 MB shuffle read
11 MB shuffle write
Scale out (locality wait - 100ms)
Greater bandwidth consumption due to transferring of RDD partitions
Scale out (locality wait - 0ms)
10 MB shuffle read
12 MB shuffle write
KMeans - Memory Utilization
Baseline KMeans – scale out (locality wait - 100ms)
KMeans – scale out (locality wait - 0ms)
PageRank - CPU Utilization
Scale out (locality wait - 100ms)
1 of the 4 new nodes
Base line (1 of the 6 nodes) Scale out (locality wait - 100ms)
1 of the 6 nodes
Scale out (locality wait - 0ms)
1 of the 4 new nodes
Scale out (locality wait - 0ms)
1 of the 6 nodes
PageRank - Network Utilization
15.50 GB shuffle read
16.71 GB shuffle write
Baseline
16.07 GB shuffle read
16.71 GB shuffle write
Scale out (locality wait - 100ms)
Greater bandwidth consumption due to transferring of RDD partitions
16.10 GB shuffle read
16.73 GB shuffle write
Scale out (locality wait - 0ms)
PageRank - Memory Utilization
Baseline Scale out (locality wait - 100ms)
Scale out (locality wait - 0ms)
SQL – Network Utilization
Scale out (locality wait - 0ms)Scale out (locality wait - 100ms)
LogisticRegression – Network Utilization
Scale out (locality wait - 0ms)Scale out (locality wait - 100ms)
Take-aways
•  Locality wait time is key to improvement
– Tune during runtime?
– Adjust during scaling to force use of new
nodes
•  Need to consider gains from running task
on new nodes vs. network bandwidth used
Scaling Down
Benchmark" Runtime – Baseline
(10 nodes)"
Runtime – "
Scale in"
KMeans" 11min 54sec" 49min 13sec"
PageRank" 11min 41sec" 13min 12sec"
Spark SQL" 12min 50sec" 15min 52sec"
LogisticRegression" 10min 20sec" 14min 7sec"
Re-execution overhead worst for some workloads
Scaling Down with Mesos/YARN
•  Prevent Spark from scheduling more tasks on
nodes that are selected to removed
–  Mesos fine-grained, simply not offer resource
from node selected for removal
–  Mesos coarse-grained and YARN, requires
cooperation from Spark to not schedule new
tasks on nodes selected for removal
•  Shutdown node once tasks are drained
–  What to do about stored shuffle data?
Agenda
•  Introduction
•  Background
•  Evaluation setup
•  Impact of scaling
•  Auto-scaling in Spark
•  Lessons and future work
Existing Auto-scaling Mechanism
•  Dynamic Executor Allocation (DEA)
–  Works only with YARN
–  Spark request new executors after a fixed time interval
when there are still pending tasks
–  Number of requested executors grow exponentially
•  Works but does have some potential inefficiency
Improving Auto-scaling
•  Main reason tasks are not scheduled on new node
is data locality preferences
–  Delay scheduling preventing tasks from being
run on new node
•  Approach:
–  Change locality wait time dynamically during
application runtime
–  Ideally, reduce locality wait time at point of scale
out, then after stabilizes, revert to previous
locality wait time value
Improving Auto-scaling Details
•  Ideally: tasks spread evenly among all executors
–  average # of tasks per executor:
•  T = Total tasks / # executors
•  ti = # of tasks per executori
•  If ti < alpha*T, for s seconds, then change locality
wait time
•  If ti is still below threshold after r seconds from first
change of locality time, then remove executor
•  If no executors below threshold, reset locality wait
time to initial value
Dynamically Adjusting Locality Wait Time at
Runtime with Dynamic Allocator Execution
Benchmark! Runtime 

(scale out w/
DEA)!
!
% tasks on new
node (locality
wait time 3s)!
Runtime!
(scale out w/
DEA and runtime
adjustment of
locality wait time)!
% tasks on new
node (dynamic
locality wait time)!
KMeans" 32min 15sec" 28.6%" 20min 35sec" 35%"
PageRank" 12min 2sec" 12.9%" 11min 23sec" 14%"
Spark SQL" 12min 48sec" 11.5%" 12min 26sec" 16%"
Logistic

Regression"
11min 58sec" 0.7%" 12min 16sec" 3.3%"
Parameters: alpha = 30%, s = 5sec, new locality wait time = 100ms
Mechanism helps at increasing % tasks on new nodes
Simple Auto-scaling Improvements
Existing
Mechanism
Description Drawbacks Proposal
When to scale? Backlogged tasks
exist for n secs
Request CPU
resources when
Tasks might be I/O
bound
CPU/memory of more than 50%
nodes are greater than a threshold t
(e.g., 98%) over n seconds
How many more
VMs?
Increase
exponentially number
of executors until
configured max or #
of pending tasks
Initially under request n% of the task queue length
Whether to scale
(still beneficial)?
Always scale if above
condition met
Scaling unnecessary
if near end or short
runtime
Option1: According to the ratio of
unprocessed data: e.g., < 80%.
Rough estimation of job execution
time: proportional to the data
process rate
Option2: Model driven – predict
runtime based on previous runs
Agenda
•  Introduction
•  Evaluation setup
•  Impact of scaling
•  Auto-scaling in Spark
•  Lessons and future work
Lessons
•  Naïve scaling can help but effectiveness varies greatly
across different workloads
•  Why some workloads do not respond well to scaling?
–  Delay scheduling preventing new nodes to be utilized
•  Dynamic executor allocation works but can be improved
–  Dynamically changing locality wait time can be
effective
•  Overhead of transferring RDD partitions can reduce
benefit of scaling
Future Work
•  Study scaling effects given multiple
simultaneous workloads
•  Implement better support for scaling down
•  Enhance DEA to make use of resource
monitors and job runtime prediction
Backup
Delay Scheduling
•  Intended for fixed-size clusters running
multiple workloads with short tasks
•  Emphasis on scheduling task on nodes
containing data
– Wait short time for resources to free up
on nodes containing data rather than run
task on node available now but further
away from data
Resource Managers + Auto-scaling
•  Resource management
→ different frameworks can coexist
→ high resource utilization
•  Facilitates on-demand resource allocation – support
elastic services
Infrastructure as a Service"
Resource Manager (YARN/
Mesos)"
Brief Intro: YARN
•  Resource Manager (RM) controls resource allocation
•  Application Master (AM) negotiates with RM for
resources and launch executors to run jobs
Brief Intro: Mesos
•  Framework schedulers accept or
reject offered resource
•  Resource preferences are
communicated to Mesos thru common
APIs
•  Coarse-grained mode
–  Mesos launches one long-running Spark
executor on each node to execute all Spark
tasks
•  Fine-grained mode
–  Mesos launches executor for each Spark
task
Runtime – Spark on Mesos
Benchmark! Runtime (baseline)! Runtime (scale out)!
Fine-grained" KMeans" 92min 54sec" 35min 37sec"
Page Rank" 19min 29sec" 16min 55 sec"
Spark SQL queries" 24min 45sec" 19min 7sec"
LogisticRegression" 14min 43sec" 14min 39sec"
Coarse-grained" KMeans" 89min 24sec" 11min 49sec"
Page Rank" 8min 29sec" 7min 41sec"
Spark SQL queries" 10min 32sec" 8min 30sec"
LogisticRegression" 10min 57sec" 7min 56sec"

More Related Content

What's hot

No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...Databricks
 
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)Spark Summit
 
Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...
Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...
Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...Spark Summit
 
Processing 70Tb Of Genomics Data With ADAM And Toil
Processing 70Tb Of Genomics Data With ADAM And ToilProcessing 70Tb Of Genomics Data With ADAM And Toil
Processing 70Tb Of Genomics Data With ADAM And ToilSpark Summit
 
Low Latency Execution For Apache Spark
Low Latency Execution For Apache SparkLow Latency Execution For Apache Spark
Low Latency Execution For Apache SparkJen Aman
 
SSR: Structured Streaming for R and Machine Learning
SSR: Structured Streaming for R and Machine LearningSSR: Structured Streaming for R and Machine Learning
SSR: Structured Streaming for R and Machine Learningfelixcss
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityJen Aman
 
Spark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca CanaliSpark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca CanaliSpark Summit
 
Random Walks on Large Scale Graphs with Apache Spark with Min Shen
Random Walks on Large Scale Graphs with Apache Spark with Min ShenRandom Walks on Large Scale Graphs with Apache Spark with Min Shen
Random Walks on Large Scale Graphs with Apache Spark with Min ShenDatabricks
 
Apache Flink vs Apache Spark - Reproducible experiments on cloud.
Apache Flink vs Apache Spark - Reproducible experiments on cloud.Apache Flink vs Apache Spark - Reproducible experiments on cloud.
Apache Flink vs Apache Spark - Reproducible experiments on cloud.Shelan Perera
 
Spark Summit EU talk by Sameer Agarwal
Spark Summit EU talk by Sameer AgarwalSpark Summit EU talk by Sameer Agarwal
Spark Summit EU talk by Sameer AgarwalSpark Summit
 
700 Updatable Queries Per Second: Spark as a Real-Time Web Service
700 Updatable Queries Per Second: Spark as a Real-Time Web Service700 Updatable Queries Per Second: Spark as a Real-Time Web Service
700 Updatable Queries Per Second: Spark as a Real-Time Web ServiceEvan Chan
 
A Spark Framework For &lt; $100, &lt; 1 Hour, Accurate Personalized DNA Analy...
A Spark Framework For &lt; $100, &lt; 1 Hour, Accurate Personalized DNA Analy...A Spark Framework For &lt; $100, &lt; 1 Hour, Accurate Personalized DNA Analy...
A Spark Framework For &lt; $100, &lt; 1 Hour, Accurate Personalized DNA Analy...Spark Summit
 
Continuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert XueContinuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert XueDatabricks
 
Using SparkML to Power a DSaaS (Data Science as a Service) with Kiran Muglurm...
Using SparkML to Power a DSaaS (Data Science as a Service) with Kiran Muglurm...Using SparkML to Power a DSaaS (Data Science as a Service) with Kiran Muglurm...
Using SparkML to Power a DSaaS (Data Science as a Service) with Kiran Muglurm...Databricks
 
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesJen Aman
 
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...Databricks
 
Time Series Analytics with Spark: Spark Summit East talk by Simon Ouellette
Time Series Analytics with Spark: Spark Summit East talk by Simon OuelletteTime Series Analytics with Spark: Spark Summit East talk by Simon Ouellette
Time Series Analytics with Spark: Spark Summit East talk by Simon OuelletteSpark Summit
 
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...Spark Summit
 
Scaling Data Analytics Workloads on Databricks
Scaling Data Analytics Workloads on DatabricksScaling Data Analytics Workloads on Databricks
Scaling Data Analytics Workloads on DatabricksDatabricks
 

What's hot (20)

No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
 
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)
 
Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...
Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...
Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...
 
Processing 70Tb Of Genomics Data With ADAM And Toil
Processing 70Tb Of Genomics Data With ADAM And ToilProcessing 70Tb Of Genomics Data With ADAM And Toil
Processing 70Tb Of Genomics Data With ADAM And Toil
 
Low Latency Execution For Apache Spark
Low Latency Execution For Apache SparkLow Latency Execution For Apache Spark
Low Latency Execution For Apache Spark
 
SSR: Structured Streaming for R and Machine Learning
SSR: Structured Streaming for R and Machine LearningSSR: Structured Streaming for R and Machine Learning
SSR: Structured Streaming for R and Machine Learning
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
 
Spark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca CanaliSpark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca Canali
 
Random Walks on Large Scale Graphs with Apache Spark with Min Shen
Random Walks on Large Scale Graphs with Apache Spark with Min ShenRandom Walks on Large Scale Graphs with Apache Spark with Min Shen
Random Walks on Large Scale Graphs with Apache Spark with Min Shen
 
Apache Flink vs Apache Spark - Reproducible experiments on cloud.
Apache Flink vs Apache Spark - Reproducible experiments on cloud.Apache Flink vs Apache Spark - Reproducible experiments on cloud.
Apache Flink vs Apache Spark - Reproducible experiments on cloud.
 
Spark Summit EU talk by Sameer Agarwal
Spark Summit EU talk by Sameer AgarwalSpark Summit EU talk by Sameer Agarwal
Spark Summit EU talk by Sameer Agarwal
 
700 Updatable Queries Per Second: Spark as a Real-Time Web Service
700 Updatable Queries Per Second: Spark as a Real-Time Web Service700 Updatable Queries Per Second: Spark as a Real-Time Web Service
700 Updatable Queries Per Second: Spark as a Real-Time Web Service
 
A Spark Framework For &lt; $100, &lt; 1 Hour, Accurate Personalized DNA Analy...
A Spark Framework For &lt; $100, &lt; 1 Hour, Accurate Personalized DNA Analy...A Spark Framework For &lt; $100, &lt; 1 Hour, Accurate Personalized DNA Analy...
A Spark Framework For &lt; $100, &lt; 1 Hour, Accurate Personalized DNA Analy...
 
Continuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert XueContinuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert Xue
 
Using SparkML to Power a DSaaS (Data Science as a Service) with Kiran Muglurm...
Using SparkML to Power a DSaaS (Data Science as a Service) with Kiran Muglurm...Using SparkML to Power a DSaaS (Data Science as a Service) with Kiran Muglurm...
Using SparkML to Power a DSaaS (Data Science as a Service) with Kiran Muglurm...
 
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
 
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
 
Time Series Analytics with Spark: Spark Summit East talk by Simon Ouellette
Time Series Analytics with Spark: Spark Summit East talk by Simon OuelletteTime Series Analytics with Spark: Spark Summit East talk by Simon Ouellette
Time Series Analytics with Spark: Spark Summit East talk by Simon Ouellette
 
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
 
Scaling Data Analytics Workloads on Databricks
Scaling Data Analytics Workloads on DatabricksScaling Data Analytics Workloads on Databricks
Scaling Data Analytics Workloads on Databricks
 

Viewers also liked

Цветочные легенды
Цветочные легендыЦветочные легенды
Цветочные легендыNinel Kek
 
Римский корсаков снегурочка
Римский корсаков снегурочкаРимский корсаков снегурочка
Римский корсаков снегурочкаNinel Kek
 
High Performance Distributed Systems with CQRS
High Performance Distributed Systems with CQRSHigh Performance Distributed Systems with CQRS
High Performance Distributed Systems with CQRSJonathan Oliver
 
Building Your First Apache Apex (Next Gen Big Data/Hadoop) Application
Building Your First Apache Apex (Next Gen Big Data/Hadoop) ApplicationBuilding Your First Apache Apex (Next Gen Big Data/Hadoop) Application
Building Your First Apache Apex (Next Gen Big Data/Hadoop) ApplicationApache Apex
 
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)Apache Apex
 
правописание приставок урок№4
правописание приставок урок№4правописание приставок урок№4
правописание приставок урок№4HomichAlla
 
бсп (обоб. урок)
бсп (обоб. урок)бсп (обоб. урок)
бсп (обоб. урок)HomichAlla
 
Troubleshooting mysql-tutorial
Troubleshooting mysql-tutorialTroubleshooting mysql-tutorial
Troubleshooting mysql-tutorialjames tong
 
Windowing in Apache Apex
Windowing in Apache ApexWindowing in Apache Apex
Windowing in Apache ApexApache Apex
 
The 5 People in your Organization that grow Legacy Code
The 5 People in your Organization that grow Legacy CodeThe 5 People in your Organization that grow Legacy Code
The 5 People in your Organization that grow Legacy CodeRoberto Cortez
 
Hadoop basic commands
Hadoop basic commandsHadoop basic commands
Hadoop basic commandsbispsolutions
 
Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application  Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application Apache Apex
 
Build your shiny new pc, with Pangoly
Build your shiny new pc, with PangolyBuild your shiny new pc, with Pangoly
Build your shiny new pc, with PangolyPangoly
 
Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)Emilio Coppa
 
Hadoop Interacting with HDFS
Hadoop Interacting with HDFSHadoop Interacting with HDFS
Hadoop Interacting with HDFSApache Apex
 
Introduction to UNIX Command-Lines with examples
Introduction to UNIX Command-Lines with examplesIntroduction to UNIX Command-Lines with examples
Introduction to UNIX Command-Lines with examplesNoé Fernández-Pozo
 
Introduction to Real-Time Data Processing
Introduction to Real-Time Data ProcessingIntroduction to Real-Time Data Processing
Introduction to Real-Time Data ProcessingApache Apex
 
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Apex
 

Viewers also liked (20)

Цветочные легенды
Цветочные легендыЦветочные легенды
Цветочные легенды
 
Римский корсаков снегурочка
Римский корсаков снегурочкаРимский корсаков снегурочка
Римский корсаков снегурочка
 
High Performance Distributed Systems with CQRS
High Performance Distributed Systems with CQRSHigh Performance Distributed Systems with CQRS
High Performance Distributed Systems with CQRS
 
Building Your First Apache Apex (Next Gen Big Data/Hadoop) Application
Building Your First Apache Apex (Next Gen Big Data/Hadoop) ApplicationBuilding Your First Apache Apex (Next Gen Big Data/Hadoop) Application
Building Your First Apache Apex (Next Gen Big Data/Hadoop) Application
 
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)
 
правописание приставок урок№4
правописание приставок урок№4правописание приставок урок№4
правописание приставок урок№4
 
бсп (обоб. урок)
бсп (обоб. урок)бсп (обоб. урок)
бсп (обоб. урок)
 
Troubleshooting mysql-tutorial
Troubleshooting mysql-tutorialTroubleshooting mysql-tutorial
Troubleshooting mysql-tutorial
 
Windowing in Apache Apex
Windowing in Apache ApexWindowing in Apache Apex
Windowing in Apache Apex
 
The 5 People in your Organization that grow Legacy Code
The 5 People in your Organization that grow Legacy CodeThe 5 People in your Organization that grow Legacy Code
The 5 People in your Organization that grow Legacy Code
 
Hadoop File System Shell Commands,
Hadoop File System Shell Commands,Hadoop File System Shell Commands,
Hadoop File System Shell Commands,
 
Hadoop basic commands
Hadoop basic commandsHadoop basic commands
Hadoop basic commands
 
Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application  Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application
 
Build your shiny new pc, with Pangoly
Build your shiny new pc, with PangolyBuild your shiny new pc, with Pangoly
Build your shiny new pc, with Pangoly
 
HDFS Internals
HDFS InternalsHDFS Internals
HDFS Internals
 
Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)
 
Hadoop Interacting with HDFS
Hadoop Interacting with HDFSHadoop Interacting with HDFS
Hadoop Interacting with HDFS
 
Introduction to UNIX Command-Lines with examples
Introduction to UNIX Command-Lines with examplesIntroduction to UNIX Command-Lines with examples
Introduction to UNIX Command-Lines with examples
 
Introduction to Real-Time Data Processing
Introduction to Real-Time Data ProcessingIntroduction to Real-Time Data Processing
Introduction to Real-Time Data Processing
 
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
 

Similar to Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)

Extending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event ProcessingExtending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event ProcessingOh Chan Kwon
 
Predicting Optimal Parallelism for Data Analytics
Predicting Optimal Parallelism for Data AnalyticsPredicting Optimal Parallelism for Data Analytics
Predicting Optimal Parallelism for Data AnalyticsDatabricks
 
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
 Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov... Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...Databricks
 
Spark on Mesos
Spark on MesosSpark on Mesos
Spark on MesosJen Aman
 
Tackling Scaling Challenges of Apache Spark at LinkedIn
Tackling Scaling Challenges of Apache Spark at LinkedInTackling Scaling Challenges of Apache Spark at LinkedIn
Tackling Scaling Challenges of Apache Spark at LinkedInDatabricks
 
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACPerformance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACKristofferson A
 
Spark Autotuning: Spark Summit East talk by Lawrence Spracklen
Spark Autotuning: Spark Summit East talk by Lawrence SpracklenSpark Autotuning: Spark Summit East talk by Lawrence Spracklen
Spark Autotuning: Spark Summit East talk by Lawrence SpracklenSpark Summit
 
Spark Autotuning - Spark Summit East 2017
Spark Autotuning - Spark Summit East 2017 Spark Autotuning - Spark Summit East 2017
Spark Autotuning - Spark Summit East 2017 Alpine Data
 
Downscaling: The Achilles heel of Autoscaling Apache Spark Clusters
Downscaling: The Achilles heel of Autoscaling Apache Spark ClustersDownscaling: The Achilles heel of Autoscaling Apache Spark Clusters
Downscaling: The Achilles heel of Autoscaling Apache Spark ClustersDatabricks
 
Magnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedInMagnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedInDatabricks
 
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan ZhangExperiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan ZhangDatabricks
 
Spark Overview and Performance Issues
Spark Overview and Performance IssuesSpark Overview and Performance Issues
Spark Overview and Performance IssuesAntonios Katsarakis
 
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon RedshiftPowering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon RedshiftJie Li
 
Adding Value in the Cloud with Performance Test
Adding Value in the Cloud with Performance TestAdding Value in the Cloud with Performance Test
Adding Value in the Cloud with Performance TestRodolfo Kohn
 
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1ScyllaDB
 
Low latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache KuduLow latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache KuduDataWorks Summit
 
Graphene – Microsoft SCOPE on Tez
Graphene – Microsoft SCOPE on Tez Graphene – Microsoft SCOPE on Tez
Graphene – Microsoft SCOPE on Tez DataWorks Summit
 
Low Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexLow Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexApache Apex
 

Similar to Towards True Elasticity of Spark-(Michael Le and Min Li, IBM) (20)

Spark cep
Spark cepSpark cep
Spark cep
 
Extending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event ProcessingExtending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event Processing
 
Predicting Optimal Parallelism for Data Analytics
Predicting Optimal Parallelism for Data AnalyticsPredicting Optimal Parallelism for Data Analytics
Predicting Optimal Parallelism for Data Analytics
 
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
 Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov... Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
 
Spark on Mesos
Spark on MesosSpark on Mesos
Spark on Mesos
 
Tackling Scaling Challenges of Apache Spark at LinkedIn
Tackling Scaling Challenges of Apache Spark at LinkedInTackling Scaling Challenges of Apache Spark at LinkedIn
Tackling Scaling Challenges of Apache Spark at LinkedIn
 
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACPerformance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
 
Spark Autotuning: Spark Summit East talk by Lawrence Spracklen
Spark Autotuning: Spark Summit East talk by Lawrence SpracklenSpark Autotuning: Spark Summit East talk by Lawrence Spracklen
Spark Autotuning: Spark Summit East talk by Lawrence Spracklen
 
Spark Autotuning - Spark Summit East 2017
Spark Autotuning - Spark Summit East 2017 Spark Autotuning - Spark Summit East 2017
Spark Autotuning - Spark Summit East 2017
 
Downscaling: The Achilles heel of Autoscaling Apache Spark Clusters
Downscaling: The Achilles heel of Autoscaling Apache Spark ClustersDownscaling: The Achilles heel of Autoscaling Apache Spark Clusters
Downscaling: The Achilles heel of Autoscaling Apache Spark Clusters
 
Magnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedInMagnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedIn
 
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan ZhangExperiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
 
Spark Overview and Performance Issues
Spark Overview and Performance IssuesSpark Overview and Performance Issues
Spark Overview and Performance Issues
 
Breaking data
Breaking dataBreaking data
Breaking data
 
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon RedshiftPowering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
 
Adding Value in the Cloud with Performance Test
Adding Value in the Cloud with Performance TestAdding Value in the Cloud with Performance Test
Adding Value in the Cloud with Performance Test
 
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
 
Low latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache KuduLow latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache Kudu
 
Graphene – Microsoft SCOPE on Tez
Graphene – Microsoft SCOPE on Tez Graphene – Microsoft SCOPE on Tez
Graphene – Microsoft SCOPE on Tez
 
Low Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexLow Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache Apex
 

More from Spark Summit

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang Spark Summit
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...Spark Summit
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang WuSpark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya RaghavendraSpark Summit
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...Spark Summit
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...Spark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...Spark Summit
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakSpark Summit
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimSpark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraSpark Summit
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Spark Summit
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...Spark Summit
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spark Summit
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovSpark Summit
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Spark Summit
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkSpark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...Spark Summit
 

More from Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 

Recently uploaded

Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknowmakika9823
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...shivangimorya083
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 

Recently uploaded (20)

Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 

Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)

  • 1. Towards the True Elasticity of Spark Michael Le and Min Li IBM,  T.J.  Watson  Research  Center  
  • 2. Auto-Scaling •  Detects changes of resource usage in current workloads → Dynamically allocate/de-allocate resources •  Meets SLA requirements at reduced cost •  Existing auto-scaling approaches react slowly and often miss optimization opportunities •  YARN and Mesos have initial auto-scaling support, yet how workloads can benefit from the capability? Infrastructure as a Service" Resource Manager (YARN/Mesos)"
  • 3. Main Focus •  Analyze how scaling affects Spark workloads –  Is simply adding new resources sufficient for performance improvement? •  Analyze pros/cons of current Spark auto-scaling mechanism –  Are there rooms for performance improvement?
  • 4. Agenda •  Introduction •  Evaluation setup •  Impact of scaling •  Auto-scaling in Spark •  Lessons and future work
  • 5. Experimental Setup •  Baseline setup (6 nodes): •  Node: 4 CPUs, 8GB, 100 GB HDDs •  YARN executor: 3GB, 1CPU •  Mesos executor: 6GB and 4CPUs •  4 benchmarks (SparkBench*): –  Kmeans (input 37GB) –  Page Rank (input 1.1GB) –  Spark SQL: SQL queries (input 39GB) –  Logistic Regression (input 47GB) •  Scaling up: add 4 new nodes “instantaneously” •  Wait ~45 seconds after benchmark run •  Scale down – wait 3min after benchmark run *  h6ps://bitbucket.org/lm0926/sparkbench  
  • 6. Mechanisms for Scaling •  Scale up – add new node “instantaneously” (VM already provisioned) –  Run YARN node manager or Mesos slave daemon –  New nodes are Task nodes (no HDFS component) –  Scale down – kill resource manager processes •  Spark on YARN – set total executors for app higher than available in cluster to ensure executors get launched •  Spark on Mesos – make use of new resource offers from Mesos
  • 7. Agenda •  Introduction •  Evaluation setup •  Impact of scaling •  Auto-scaling in Spark •  Lessons and future work
  • 8. Runtime – Spark on YARN Benchmark! Runtime (baseline)! Runtime ! (scale out)! Runtime Reduction! Kmeans" 54min 44sec" 29min 26sec" 46%" Page Rank" 9min 28sec" 8min 18sec" 12.3%" Spark SQL queries" 15min 45sec" 13min 11sec" 16.3%" LogisticRegression" 13min 10sec" 12min 55sec" 2%" Similar behavior seen on Mesos
  • 9. Why the variation? •  Delay scheduling preventing tasks to be scheduled on new node Benchmark! Total tasks! % tasks locality preference RACK or ANY! % tasks wait >3s to be scheduled! % tasks on new nodes! KMeans" 7800" 0.4%" 3.5%" 24.5%" Page Rank" 21600" 0%" 0.01%" 13.5%" Spark SQL" 9805" 3%" 0.2%" 11.8%" LogisticRegression" 6637" 0%" 0.1%" 1.5%"
  • 10. Tuning Spark for Scaling •  locality wait time –  How soon to change locality preference of tasks in stage •  resource revive interval –  How soon to inform scheduler a resource that has not been used is still available
  • 11. Runtime – Tuning Spark Benchmark! Runtime (baseline)! Runtime (scale out)! Runtime (scale out– revive interval 100ms)! Runtime (scale out– locality wait time 100ms)! Runtime (scale out–locality wait time 0ms)! KMeans" 54min 44sec" 29min 26sec" 30min 39sec" 11min 32sec" 14min 1sec" Page Rank" 9min 28sec" 8min 18sec" 8min 35sec" 7min 35sec" 5min 57sec" Spark SQL queries" 15min 45sec" 13min 11sec" 12min 55sec" 11min 40sec" 19min 57sec" LogisticRegression" 13min 10sec" 12min 55sec" 12min 43sec" 6min 54sec" 9min 12sec" Benchmark! % tasks on new node (locality wait time 3s)! % tasks on new node (locality wait time 100ms)! % tasks on new node (locality wait time 0ms)! KMeans" 24.5%" 39.1%" 38%" Page Rank" 13.5%" 16.1%" 39%" Spark SQL queries" 11.8%" 22.6%" 39%" LogisticRegression" 1.5%" 38%" 39%"
  • 12. KMeans - CPU Utilization Per Node Scale out (locality wait - 100ms) 1 of the 4 new nodes Base line (1 of the 6 base nodes) Scale out (locality wait - 100ms) 1 of the 6 base nodes Scale out (locality wait - 0ms) 1 of the 6 base nodes Scale out (locality wait - 0ms) 1 of the 4 new nodes
  • 13. KMeans - Network Utilization 10 MB shuffle read 12 MB shuffle write Baseline 10 MB shuffle read 11 MB shuffle write Scale out (locality wait - 100ms) Greater bandwidth consumption due to transferring of RDD partitions Scale out (locality wait - 0ms) 10 MB shuffle read 12 MB shuffle write
  • 14. KMeans - Memory Utilization Baseline KMeans – scale out (locality wait - 100ms) KMeans – scale out (locality wait - 0ms)
  • 15. PageRank - CPU Utilization Scale out (locality wait - 100ms) 1 of the 4 new nodes Base line (1 of the 6 nodes) Scale out (locality wait - 100ms) 1 of the 6 nodes Scale out (locality wait - 0ms) 1 of the 4 new nodes Scale out (locality wait - 0ms) 1 of the 6 nodes
  • 16. PageRank - Network Utilization 15.50 GB shuffle read 16.71 GB shuffle write Baseline 16.07 GB shuffle read 16.71 GB shuffle write Scale out (locality wait - 100ms) Greater bandwidth consumption due to transferring of RDD partitions 16.10 GB shuffle read 16.73 GB shuffle write Scale out (locality wait - 0ms)
  • 17. PageRank - Memory Utilization Baseline Scale out (locality wait - 100ms) Scale out (locality wait - 0ms)
  • 18. SQL – Network Utilization Scale out (locality wait - 0ms)Scale out (locality wait - 100ms)
  • 19. LogisticRegression – Network Utilization Scale out (locality wait - 0ms)Scale out (locality wait - 100ms)
  • 20. Take-aways •  Locality wait time is key to improvement – Tune during runtime? – Adjust during scaling to force use of new nodes •  Need to consider gains from running task on new nodes vs. network bandwidth used
  • 21. Scaling Down Benchmark" Runtime – Baseline (10 nodes)" Runtime – " Scale in" KMeans" 11min 54sec" 49min 13sec" PageRank" 11min 41sec" 13min 12sec" Spark SQL" 12min 50sec" 15min 52sec" LogisticRegression" 10min 20sec" 14min 7sec" Re-execution overhead worst for some workloads
  • 22. Scaling Down with Mesos/YARN •  Prevent Spark from scheduling more tasks on nodes that are selected to removed –  Mesos fine-grained, simply not offer resource from node selected for removal –  Mesos coarse-grained and YARN, requires cooperation from Spark to not schedule new tasks on nodes selected for removal •  Shutdown node once tasks are drained –  What to do about stored shuffle data?
  • 23. Agenda •  Introduction •  Background •  Evaluation setup •  Impact of scaling •  Auto-scaling in Spark •  Lessons and future work
  • 24. Existing Auto-scaling Mechanism •  Dynamic Executor Allocation (DEA) –  Works only with YARN –  Spark request new executors after a fixed time interval when there are still pending tasks –  Number of requested executors grow exponentially •  Works but does have some potential inefficiency
  • 25. Improving Auto-scaling •  Main reason tasks are not scheduled on new node is data locality preferences –  Delay scheduling preventing tasks from being run on new node •  Approach: –  Change locality wait time dynamically during application runtime –  Ideally, reduce locality wait time at point of scale out, then after stabilizes, revert to previous locality wait time value
  • 26. Improving Auto-scaling Details •  Ideally: tasks spread evenly among all executors –  average # of tasks per executor: •  T = Total tasks / # executors •  ti = # of tasks per executori •  If ti < alpha*T, for s seconds, then change locality wait time •  If ti is still below threshold after r seconds from first change of locality time, then remove executor •  If no executors below threshold, reset locality wait time to initial value
  • 27. Dynamically Adjusting Locality Wait Time at Runtime with Dynamic Allocator Execution Benchmark! Runtime 
 (scale out w/ DEA)! ! % tasks on new node (locality wait time 3s)! Runtime! (scale out w/ DEA and runtime adjustment of locality wait time)! % tasks on new node (dynamic locality wait time)! KMeans" 32min 15sec" 28.6%" 20min 35sec" 35%" PageRank" 12min 2sec" 12.9%" 11min 23sec" 14%" Spark SQL" 12min 48sec" 11.5%" 12min 26sec" 16%" Logistic
 Regression" 11min 58sec" 0.7%" 12min 16sec" 3.3%" Parameters: alpha = 30%, s = 5sec, new locality wait time = 100ms Mechanism helps at increasing % tasks on new nodes
  • 28. Simple Auto-scaling Improvements Existing Mechanism Description Drawbacks Proposal When to scale? Backlogged tasks exist for n secs Request CPU resources when Tasks might be I/O bound CPU/memory of more than 50% nodes are greater than a threshold t (e.g., 98%) over n seconds How many more VMs? Increase exponentially number of executors until configured max or # of pending tasks Initially under request n% of the task queue length Whether to scale (still beneficial)? Always scale if above condition met Scaling unnecessary if near end or short runtime Option1: According to the ratio of unprocessed data: e.g., < 80%. Rough estimation of job execution time: proportional to the data process rate Option2: Model driven – predict runtime based on previous runs
  • 29. Agenda •  Introduction •  Evaluation setup •  Impact of scaling •  Auto-scaling in Spark •  Lessons and future work
  • 30. Lessons •  Naïve scaling can help but effectiveness varies greatly across different workloads •  Why some workloads do not respond well to scaling? –  Delay scheduling preventing new nodes to be utilized •  Dynamic executor allocation works but can be improved –  Dynamically changing locality wait time can be effective •  Overhead of transferring RDD partitions can reduce benefit of scaling
  • 31. Future Work •  Study scaling effects given multiple simultaneous workloads •  Implement better support for scaling down •  Enhance DEA to make use of resource monitors and job runtime prediction
  • 33. Delay Scheduling •  Intended for fixed-size clusters running multiple workloads with short tasks •  Emphasis on scheduling task on nodes containing data – Wait short time for resources to free up on nodes containing data rather than run task on node available now but further away from data
  • 34. Resource Managers + Auto-scaling •  Resource management → different frameworks can coexist → high resource utilization •  Facilitates on-demand resource allocation – support elastic services Infrastructure as a Service" Resource Manager (YARN/ Mesos)"
  • 35. Brief Intro: YARN •  Resource Manager (RM) controls resource allocation •  Application Master (AM) negotiates with RM for resources and launch executors to run jobs
  • 36. Brief Intro: Mesos •  Framework schedulers accept or reject offered resource •  Resource preferences are communicated to Mesos thru common APIs •  Coarse-grained mode –  Mesos launches one long-running Spark executor on each node to execute all Spark tasks •  Fine-grained mode –  Mesos launches executor for each Spark task
  • 37. Runtime – Spark on Mesos Benchmark! Runtime (baseline)! Runtime (scale out)! Fine-grained" KMeans" 92min 54sec" 35min 37sec" Page Rank" 19min 29sec" 16min 55 sec" Spark SQL queries" 24min 45sec" 19min 7sec" LogisticRegression" 14min 43sec" 14min 39sec" Coarse-grained" KMeans" 89min 24sec" 11min 49sec" Page Rank" 8min 29sec" 7min 41sec" Spark SQL queries" 10min 32sec" 8min 30sec" LogisticRegression" 10min 57sec" 7min 56sec"