SlideShare a Scribd company logo
1 of 29
Download to read offline
Spark on Mesos
Tim Chen
Mirantis
tnachen@gmail.com
Dean Wampler
Lightbend
dean.wampler@lightbend.com
Dean Wampler
• Architect for Big Data Products at
Lightbend
• Early advocate for Spark on Mesos
• O’Reilly author
–Programming Scala, 2nd Edition
–Programming Hive
–Functional Programming for Java
Developers
Timothy Chen
• Principal Engineer at Mirantis
• Previously lead engineer at
Mesosphere
• Apache Mesos PMC
• Spark contributor, help maintain
Spark on Mesos
What’s this all about, then?
• Why Spark on Mesos?
• What’s happened since last year?
• Demo - GPU support
•What’s next for Spark and Mesos?
Why Spark on Mesos
• Hadoop is great, but ...
–… resource management with
YARN is limited to compute
engines like MapReduce and
Spark.
•What if your clustering system
could run everything?
Why Spark on Mesos
• Hadoop is great, but ...
–… Big Data is moving to streaming (“Fast Data”) and
Spark offers mini-batch streaming.
•What if your cluster system offered dynamic and
flexible resource scheduling able to meet the
needs of evolving, long-running streams?
Why Spark on Mesos
• Hadoop is great, but ...
–… it doesn’t support other popular tools like
Cassandra, Akka, web frameworks, ...
•Maybe you need the SMACK stack:
–Spark
–Mesos
–Akka
–Cassandra
–Kafka
There’s a
Scheduler
for that!
What’s happened since last year?
• What’s new in Mesos
• What’s new in Spark on Mesos
• Getting rid of fine-grained mode?
What’s new in Mesos?
• Maintenance primitives
• Resource quotas, dynamic reservation
*Beta*
• CNI network Support
• GPU Support
• Unified Containerizer
• More..
What’s new in Spark on Mesos?
• Integration test suite
• New scheduler
• Mesos framework authentication
• Cluster mode now supports Python
Integration Test Suite
• A recent release candidate for Spark broke
Mesos integration completely.
–Better integration testing clearly needed.
–Lightbend and Mesosphere collaborated on an
automated integration test suite.
https://github.com/typesafehub/mesos-spark-integration-tests
Integration Test Suite
• “mesos-docker” subproject:
–Builds Docker image with Ubuntu, Mesos, Spark, and
HDFS.
–Scripts to run cluster with 1 master and N slaves,
configurable #s of CPUs, memory, etc.
•(Not needed if you already have a Mesos cluster ;^)
Integration Test Suite
• “test-runner” subproject:
–Executes a suite of tests on your Mesos or DC/OS
cluster.
– Currently exercises dynamic allocation, coarse-grain
and fine-grain modes, etc.
New Coarse Grain Scheduler
How the old Coarse grain scheduler works?
Launch 1 Spark executor per agent
- Rough steps:
- Evaluate offers as it comes in from the master
- Offers that meets min cpu (1) and min memory
requirements
- Use as much cores until meets spark.cores.max
- Every executor requests fixed memory
New Coarse Grain Scheduler
How the old Coarse grain scheduler works?
Mesos Agent 1
CPU: 8
Memory: 8gb
Mesos Agent 2
CPU: 8
Memory: 8gb
Mesos Agent 3
CPU: 8
Memory: 8gb
CoarseMesosSchedulerBackend
spark.cores.max=12
spark.executor.memory=4gb
Spark Executor
CPU 8
Memory 4gb
Spark Executor
CPU 4
Memory 4gb
New Coarse Grain Scheduler
How the old Coarse grain scheduler works?
Mesos Agent 1
CPU: 8
Memory: 8gb
Mesos Agent 2
CPU: 2
Memory: 8gb
Mesos Agent 3
CPU: 2
Memory: 8gb
CoarseMesosSchedulerBackend
spark.cores.max=12
spark.executor.memory=4gb
Spark Executor
CPU 8
Memory 4gb
Spark Executor
CPU 2
Memory 4gb
Spark Executor
CPU 2
Memory 4gb
New Coarse Grain Scheduler
How the old Coarse grain scheduler works?
Mesos Agent
CPU: 8
Memory: 64gb
Mesos Agent
CPU: 2
Memory: 64gb
Mesos Agent
CPU: 2
Memory: 64gb
CoarseGrainedMesosScheduler
Spark Executor
CPU 8
Memory 64gb
Spark Executor
CPU 2
Memory 64gb
Spark Executor
CPU 2
Memory 64gb
spark.cores.max=12
spark.executor.memory=64gb
New Coarse Grain Scheduler
Problems with the old scheduler:
• Only allow one executor per slave
• Unpredictable performance
• Can skew allocation
New Coarse Grain Scheduler
Mesos Agent 1
CPU: 8
Memory: 8gb
Mesos Agent 2
CPU: 8
Memory: 8gb
Mesos Agent 3
CPU: 8
Memory: 8gb
CoarseMesosSchedulerBackend
spark.cores.max=12
spark.executor.memory=4gb
spark.executor.cores=4
Spark Executor
CPU 4
Memory 4gb
Spark Executor
CPU 4
Memory 4gb
Spark Executor
CPU 4
Memory 4gb
Mesos Framework Authentication
• Mesos supports framework authentication.
• Roles can be set per framework
–Impacts the relative weight of resource allocation
•Optional authentication information to allow the
framework to be connected to the master.
Getting rid of fine-grained mode?
Coarse-grained Mode Fine-grained Mode
Getting rid of fine-grained mode?
• Why two modes?
–FG uses resources more efficiently, because of start-
on-demand and Spark executor+task are removed
when no longer needed.
–CG holds onto all allocated tasks until the job
finishes.
–But that makes CG faster to start tasks; nice for
interactive jobs (e.g., SQL queries).
–While FG has a longer start up time.
Getting rid of fine-grained mode?
• Why two modes?
–Until recently, only ONE CG executor allowed per
worker node.
–Makes it harder to exploit all of the node’s resources.
Getting rid of fine-grained mode?
• Today:
–Dynamic Allocation reclaims unused executors.
•(Although running this service on every node is a disadvantage)
–Allows more than one CG executor per node.
•Hence, the advantages of FG are becoming less
important.
Getting rid of fine-grained mode?
• Spark has lots of redundant code to implement
both modes.
•So, to simplify the code base and operations, FG
is now deprecated, but it can’t be removed yet.
GPUsMesos
Running _____________ on __________with
_______
on top of ______ using _____ in the _____!
Demo
Deep Learning Tensorflow
Spark
Cloud
What’s Next for Mesos?
• Pod support
• Multiple roles support
• Event Bus
• Improved Container Security (capabilities, etc)
What’s Next for Spark on Mesos?
• GPU Support on Mesos
• Use revocable resources
• Better scheduling
–Strategies (e.g: Spread, Binpack)
–Scheduling metrics
• More integration test coverage:
–More cluster and job configuration options.
–Roles and authentication scenarios.
What’s Next for Spark on Mesos?
• Make “production” easier:
–Easier overriding of configuration with config files
outside the jars.
–Better documentation.
–Easier access to Spark UIs and logs from Mesos UIs
–Improved metrics.
–Smarter acceptance of resources offered.
THANK YOU.
tnachen@gmail.com
@tnachen
dean.wampler@lightbend.com
@deanwampler

More Related Content

What's hot

Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
Databricks
 
Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...
Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...
Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...
Databricks
 
Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim Dowling
Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim DowlingStructured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim Dowling
Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim Dowling
Databricks
 
April 2016 HUG: CaffeOnSpark: Distributed Deep Learning on Spark Clusters
April 2016 HUG: CaffeOnSpark: Distributed Deep Learning on Spark ClustersApril 2016 HUG: CaffeOnSpark: Distributed Deep Learning on Spark Clusters
April 2016 HUG: CaffeOnSpark: Distributed Deep Learning on Spark Clusters
Yahoo Developer Network
 

What's hot (20)

Spark Summit EU talk by Berni Schiefer
Spark Summit EU talk by Berni SchieferSpark Summit EU talk by Berni Schiefer
Spark Summit EU talk by Berni Schiefer
 
High Performance Python on Apache Spark
High Performance Python on Apache SparkHigh Performance Python on Apache Spark
High Performance Python on Apache Spark
 
Deploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using SparkDeploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using Spark
 
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
 
Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...
Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...
Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...
 
Efficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out DatabasesEfficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out Databases
 
Spark Summit EU talk by Jorg Schad
Spark Summit EU talk by Jorg SchadSpark Summit EU talk by Jorg Schad
Spark Summit EU talk by Jorg Schad
 
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik SivashanmugamSpark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik Sivashanmugam
 
Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim Dowling
Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim DowlingStructured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim Dowling
Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim Dowling
 
SSR: Structured Streaming for R and Machine Learning
SSR: Structured Streaming for R and Machine LearningSSR: Structured Streaming for R and Machine Learning
SSR: Structured Streaming for R and Machine Learning
 
April 2016 HUG: CaffeOnSpark: Distributed Deep Learning on Spark Clusters
April 2016 HUG: CaffeOnSpark: Distributed Deep Learning on Spark ClustersApril 2016 HUG: CaffeOnSpark: Distributed Deep Learning on Spark Clusters
April 2016 HUG: CaffeOnSpark: Distributed Deep Learning on Spark Clusters
 
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
 
Spark Summit EU talk by Michael Nitschinger
Spark Summit EU talk by Michael NitschingerSpark Summit EU talk by Michael Nitschinger
Spark Summit EU talk by Michael Nitschinger
 
Spark Summit EU talk by Ruben Pulido Behar Veliqi
Spark Summit EU talk by Ruben Pulido Behar VeliqiSpark Summit EU talk by Ruben Pulido Behar Veliqi
Spark Summit EU talk by Ruben Pulido Behar Veliqi
 
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
 
High Performance Enterprise Data Processing with Apache Spark with Sandeep Va...
High Performance Enterprise Data Processing with Apache Spark with Sandeep Va...High Performance Enterprise Data Processing with Apache Spark with Sandeep Va...
High Performance Enterprise Data Processing with Apache Spark with Sandeep Va...
 
Using Spark with Tachyon by Gene Pang
Using Spark with Tachyon by Gene PangUsing Spark with Tachyon by Gene Pang
Using Spark with Tachyon by Gene Pang
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the Cloud
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
 
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik SivashanmugamSpark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik Sivashanmugam
 

Viewers also liked

Viewers also liked (20)

Spark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 FuriousSpark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 Furious
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
 
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In SparkYggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
 
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
 
Time-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersTime-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity Clusters
 
Spark Uber Development Kit
Spark Uber Development KitSpark Uber Development Kit
Spark Uber Development Kit
 
Livy: A REST Web Service For Apache Spark
Livy: A REST Web Service For Apache SparkLivy: A REST Web Service For Apache Spark
Livy: A REST Web Service For Apache Spark
 
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
 
A Graph-Based Method For Cross-Entity Threat Detection
 A Graph-Based Method For Cross-Entity Threat Detection A Graph-Based Method For Cross-Entity Threat Detection
A Graph-Based Method For Cross-Entity Threat Detection
 
Building Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemMLBuilding Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemML
 
From MapReduce to Apache Spark
From MapReduce to Apache SparkFrom MapReduce to Apache Spark
From MapReduce to Apache Spark
 
GPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production Scale
GPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production ScaleGPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production Scale
GPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production Scale
 
Operational Tips For Deploying Apache Spark
Operational Tips For Deploying Apache SparkOperational Tips For Deploying Apache Spark
Operational Tips For Deploying Apache Spark
 
Spark at Bloomberg: Dynamically Composable Analytics
Spark at Bloomberg:  Dynamically Composable Analytics Spark at Bloomberg:  Dynamically Composable Analytics
Spark at Bloomberg: Dynamically Composable Analytics
 
Huohua: A Distributed Time Series Analysis Framework For Spark
Huohua: A Distributed Time Series Analysis Framework For SparkHuohua: A Distributed Time Series Analysis Framework For Spark
Huohua: A Distributed Time Series Analysis Framework For Spark
 
Big Data in Production: Lessons from Running in the Cloud
Big Data in Production: Lessons from Running in the CloudBig Data in Production: Lessons from Running in the Cloud
Big Data in Production: Lessons from Running in the Cloud
 
Airstream: Spark Streaming At Airbnb
Airstream: Spark Streaming At AirbnbAirstream: Spark Streaming At Airbnb
Airstream: Spark Streaming At Airbnb
 
A Journey into Databricks' Pipelines: Journey and Lessons Learned
A Journey into Databricks' Pipelines: Journey and Lessons LearnedA Journey into Databricks' Pipelines: Journey and Lessons Learned
A Journey into Databricks' Pipelines: Journey and Lessons Learned
 
Recent Developments In SparkR For Advanced Analytics
Recent Developments In SparkR For Advanced AnalyticsRecent Developments In SparkR For Advanced Analytics
Recent Developments In SparkR For Advanced Analytics
 

Similar to Spark on Mesos

Extending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event ProcessingExtending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event Processing
Oh Chan Kwon
 
The state of Spark in the cloud
The state of Spark in the cloudThe state of Spark in the cloud
The state of Spark in the cloud
Nicolas Poggi
 

Similar to Spark on Mesos (20)

Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job Server
 
Productionizing Spark and the REST Job Server- Evan Chan
Productionizing Spark and the REST Job Server- Evan ChanProductionizing Spark and the REST Job Server- Evan Chan
Productionizing Spark and the REST Job Server- Evan Chan
 
Running Spark on Cloud
Running Spark on CloudRunning Spark on Cloud
Running Spark on Cloud
 
Leveraging Databricks for Spark Pipelines
Leveraging Databricks for Spark PipelinesLeveraging Databricks for Spark Pipelines
Leveraging Databricks for Spark Pipelines
 
Leveraging Databricks for Spark pipelines
Leveraging Databricks for Spark pipelinesLeveraging Databricks for Spark pipelines
Leveraging Databricks for Spark pipelines
 
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
 
How to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOSHow to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOS
 
Spark Tips & Tricks
Spark Tips & TricksSpark Tips & Tricks
Spark Tips & Tricks
 
Spark on YARN
Spark on YARNSpark on YARN
Spark on YARN
 
Spark 1.0
Spark 1.0Spark 1.0
Spark 1.0
 
Spark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca CanaliSpark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca Canali
 
Spark Overview and Performance Issues
Spark Overview and Performance IssuesSpark Overview and Performance Issues
Spark Overview and Performance Issues
 
Spark cep
Spark cepSpark cep
Spark cep
 
Extending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event ProcessingExtending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event Processing
 
Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...
Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...
Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup
 
The state of Spark in the cloud
The state of Spark in the cloudThe state of Spark in the cloud
The state of Spark in the cloud
 
Tuning and Monitoring Deep Learning on Apache Spark
Tuning and Monitoring Deep Learning on Apache SparkTuning and Monitoring Deep Learning on Apache Spark
Tuning and Monitoring Deep Learning on Apache Spark
 
Introduction to NetGuardians' Big Data Software Stack
Introduction to NetGuardians' Big Data Software StackIntroduction to NetGuardians' Big Data Software Stack
Introduction to NetGuardians' Big Data Software Stack
 
Machine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkMLMachine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkML
 

More from Jen Aman

More from Jen Aman (15)

Deep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
Deep Learning and Streaming in Apache Spark 2.x with Matei ZahariaDeep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
Deep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
 
Snorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher RéSnorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher Ré
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
 
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
 
RISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time DecisionsRISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time Decisions
 
Spatial Analysis On Histological Images Using Spark
Spatial Analysis On Histological Images Using SparkSpatial Analysis On Histological Images Using Spark
Spatial Analysis On Histological Images Using Spark
 
Elasticsearch And Apache Lucene For Apache Spark And MLlib
Elasticsearch And Apache Lucene For Apache Spark And MLlibElasticsearch And Apache Lucene For Apache Spark And MLlib
Elasticsearch And Apache Lucene For Apache Spark And MLlib
 
EclairJS = Node.Js + Apache Spark
EclairJS = Node.Js + Apache SparkEclairJS = Node.Js + Apache Spark
EclairJS = Node.Js + Apache Spark
 
Spark: Interactive To Production
Spark: Interactive To ProductionSpark: Interactive To Production
Spark: Interactive To Production
 
High-Performance Python On Spark
High-Performance Python On SparkHigh-Performance Python On Spark
High-Performance Python On Spark
 
Scalable Deep Learning Platform On Spark In Baidu
Scalable Deep Learning Platform On Spark In BaiduScalable Deep Learning Platform On Spark In Baidu
Scalable Deep Learning Platform On Spark In Baidu
 
Scaling Machine Learning To Billions Of Parameters
Scaling Machine Learning To Billions Of ParametersScaling Machine Learning To Billions Of Parameters
Scaling Machine Learning To Billions Of Parameters
 
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
 
Temporal Operators For Spark Streaming And Its Application For Office365 Serv...
Temporal Operators For Spark Streaming And Its Application For Office365 Serv...Temporal Operators For Spark Streaming And Its Application For Office365 Serv...
Temporal Operators For Spark Streaming And Its Application For Office365 Serv...
 
Utilizing Human Data Validation For KPI Analysis And Machine Learning
Utilizing Human Data Validation For KPI Analysis And Machine LearningUtilizing Human Data Validation For KPI Analysis And Machine Learning
Utilizing Human Data Validation For KPI Analysis And Machine Learning
 

Recently uploaded

Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 

Recently uploaded (20)

Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 

Spark on Mesos

  • 1. Spark on Mesos Tim Chen Mirantis tnachen@gmail.com Dean Wampler Lightbend dean.wampler@lightbend.com
  • 2. Dean Wampler • Architect for Big Data Products at Lightbend • Early advocate for Spark on Mesos • O’Reilly author –Programming Scala, 2nd Edition –Programming Hive –Functional Programming for Java Developers Timothy Chen • Principal Engineer at Mirantis • Previously lead engineer at Mesosphere • Apache Mesos PMC • Spark contributor, help maintain Spark on Mesos
  • 3. What’s this all about, then? • Why Spark on Mesos? • What’s happened since last year? • Demo - GPU support •What’s next for Spark and Mesos?
  • 4. Why Spark on Mesos • Hadoop is great, but ... –… resource management with YARN is limited to compute engines like MapReduce and Spark. •What if your clustering system could run everything?
  • 5. Why Spark on Mesos • Hadoop is great, but ... –… Big Data is moving to streaming (“Fast Data”) and Spark offers mini-batch streaming. •What if your cluster system offered dynamic and flexible resource scheduling able to meet the needs of evolving, long-running streams?
  • 6. Why Spark on Mesos • Hadoop is great, but ... –… it doesn’t support other popular tools like Cassandra, Akka, web frameworks, ... •Maybe you need the SMACK stack: –Spark –Mesos –Akka –Cassandra –Kafka There’s a Scheduler for that!
  • 7. What’s happened since last year? • What’s new in Mesos • What’s new in Spark on Mesos • Getting rid of fine-grained mode?
  • 8. What’s new in Mesos? • Maintenance primitives • Resource quotas, dynamic reservation *Beta* • CNI network Support • GPU Support • Unified Containerizer • More..
  • 9. What’s new in Spark on Mesos? • Integration test suite • New scheduler • Mesos framework authentication • Cluster mode now supports Python
  • 10. Integration Test Suite • A recent release candidate for Spark broke Mesos integration completely. –Better integration testing clearly needed. –Lightbend and Mesosphere collaborated on an automated integration test suite. https://github.com/typesafehub/mesos-spark-integration-tests
  • 11. Integration Test Suite • “mesos-docker” subproject: –Builds Docker image with Ubuntu, Mesos, Spark, and HDFS. –Scripts to run cluster with 1 master and N slaves, configurable #s of CPUs, memory, etc. •(Not needed if you already have a Mesos cluster ;^)
  • 12. Integration Test Suite • “test-runner” subproject: –Executes a suite of tests on your Mesos or DC/OS cluster. – Currently exercises dynamic allocation, coarse-grain and fine-grain modes, etc.
  • 13. New Coarse Grain Scheduler How the old Coarse grain scheduler works? Launch 1 Spark executor per agent - Rough steps: - Evaluate offers as it comes in from the master - Offers that meets min cpu (1) and min memory requirements - Use as much cores until meets spark.cores.max - Every executor requests fixed memory
  • 14. New Coarse Grain Scheduler How the old Coarse grain scheduler works? Mesos Agent 1 CPU: 8 Memory: 8gb Mesos Agent 2 CPU: 8 Memory: 8gb Mesos Agent 3 CPU: 8 Memory: 8gb CoarseMesosSchedulerBackend spark.cores.max=12 spark.executor.memory=4gb Spark Executor CPU 8 Memory 4gb Spark Executor CPU 4 Memory 4gb
  • 15. New Coarse Grain Scheduler How the old Coarse grain scheduler works? Mesos Agent 1 CPU: 8 Memory: 8gb Mesos Agent 2 CPU: 2 Memory: 8gb Mesos Agent 3 CPU: 2 Memory: 8gb CoarseMesosSchedulerBackend spark.cores.max=12 spark.executor.memory=4gb Spark Executor CPU 8 Memory 4gb Spark Executor CPU 2 Memory 4gb Spark Executor CPU 2 Memory 4gb
  • 16. New Coarse Grain Scheduler How the old Coarse grain scheduler works? Mesos Agent CPU: 8 Memory: 64gb Mesos Agent CPU: 2 Memory: 64gb Mesos Agent CPU: 2 Memory: 64gb CoarseGrainedMesosScheduler Spark Executor CPU 8 Memory 64gb Spark Executor CPU 2 Memory 64gb Spark Executor CPU 2 Memory 64gb spark.cores.max=12 spark.executor.memory=64gb
  • 17. New Coarse Grain Scheduler Problems with the old scheduler: • Only allow one executor per slave • Unpredictable performance • Can skew allocation
  • 18. New Coarse Grain Scheduler Mesos Agent 1 CPU: 8 Memory: 8gb Mesos Agent 2 CPU: 8 Memory: 8gb Mesos Agent 3 CPU: 8 Memory: 8gb CoarseMesosSchedulerBackend spark.cores.max=12 spark.executor.memory=4gb spark.executor.cores=4 Spark Executor CPU 4 Memory 4gb Spark Executor CPU 4 Memory 4gb Spark Executor CPU 4 Memory 4gb
  • 19. Mesos Framework Authentication • Mesos supports framework authentication. • Roles can be set per framework –Impacts the relative weight of resource allocation •Optional authentication information to allow the framework to be connected to the master.
  • 20. Getting rid of fine-grained mode? Coarse-grained Mode Fine-grained Mode
  • 21. Getting rid of fine-grained mode? • Why two modes? –FG uses resources more efficiently, because of start- on-demand and Spark executor+task are removed when no longer needed. –CG holds onto all allocated tasks until the job finishes. –But that makes CG faster to start tasks; nice for interactive jobs (e.g., SQL queries). –While FG has a longer start up time.
  • 22. Getting rid of fine-grained mode? • Why two modes? –Until recently, only ONE CG executor allowed per worker node. –Makes it harder to exploit all of the node’s resources.
  • 23. Getting rid of fine-grained mode? • Today: –Dynamic Allocation reclaims unused executors. •(Although running this service on every node is a disadvantage) –Allows more than one CG executor per node. •Hence, the advantages of FG are becoming less important.
  • 24. Getting rid of fine-grained mode? • Spark has lots of redundant code to implement both modes. •So, to simplify the code base and operations, FG is now deprecated, but it can’t be removed yet.
  • 25. GPUsMesos Running _____________ on __________with _______ on top of ______ using _____ in the _____! Demo Deep Learning Tensorflow Spark Cloud
  • 26. What’s Next for Mesos? • Pod support • Multiple roles support • Event Bus • Improved Container Security (capabilities, etc)
  • 27. What’s Next for Spark on Mesos? • GPU Support on Mesos • Use revocable resources • Better scheduling –Strategies (e.g: Spread, Binpack) –Scheduling metrics • More integration test coverage: –More cluster and job configuration options. –Roles and authentication scenarios.
  • 28. What’s Next for Spark on Mesos? • Make “production” easier: –Easier overriding of configuration with config files outside the jars. –Better documentation. –Easier access to Spark UIs and logs from Mesos UIs –Improved metrics. –Smarter acceptance of resources offered.