
How to deploy Apache Spark 
to Mesos/DCOS

Since 2014, Typesafe has been actively contributing to the Apache Spark project and has become a certified development support partner of Databricks, the company started by the creators of Spark. Typesafe and Mesosphere have forged a partnership in which Typesafe is the official commercial support provider of Spark on Apache Mesos and on Mesosphere's Datacenter Operating System (DCOS).

In this webinar, Iulian Dragos, Spark team lead at Typesafe Inc., explains how Typesafe supports running Spark in various deployment modes, along with the improvements made to Spark to integrate backpressure signals into the underlying technologies, making it a better fit for Reactive Streams. He also shows these features at work and demonstrates how Typesafe makes it simple to deploy Spark on Mesos.

We will introduce:

Various deployment modes for Spark: Standalone, Spark on Mesos, and Spark with Mesosphere DCOS
Overview of Mesos and how it relates to Mesosphere DCOS
Deeper look at how Spark runs on Mesos
How to manage coarse-grained and fine-grained scheduling modes on Mesos
What to know about a client vs. cluster deployment
A demo running Spark on Mesos


1. How to deploy Apache Spark to Mesos/DCOS, with Iulian Dragoș
2. Agenda • Intro Apache Spark • Apache Mesos • Why Spark on Mesos • A look under the hood
3. Spark - lightning-fast cluster computing • next generation Big Data solution • analytics and data processing • up to 100x faster than Hadoop MapReduce • built with Scala and Akka • Apache top-level project
4. Spark • It's a next-generation compute engine • Does not replace the whole Hadoop ecosystem • just MapReduce • Integrates/works with HDFS, Hive, HBase, etc.
5. Spark API • Scala distributed collections • also available from Python and Java • interactive shell and job submission • streaming and batch modes • flourishing ecosystem (Spark SQL, MLlib, GraphX)
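To make the API on this slide concrete, here is a minimal word-count sketch using the Scala RDD API; the input path and app name are placeholders, and running it assumes Spark is on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // "local[*]" runs Spark in-process, useful for experimentation.
    val conf = new SparkConf().setAppName("WordCount").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    // RDDs behave like distributed Scala collections: flatMap, map, reduceByKey, ...
    val counts = sc.textFile("hdfs:///data/input.txt")   // placeholder path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.take(10).foreach(println)
    sc.stop()
  }
}
```

The same logic can also be typed line by line into the interactive shell (spark-shell), where a SparkContext named sc is already available.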
6. Spark execution (diagram; source: http://spark.apache.org/docs/latest/cluster-overview.html)
7. Spark execution • local (for experimentation) • standalone (built-in cluster manager) • YARN (Hadoop cluster manager) • Mesos (general cluster manager)
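The execution mode is selected through the master URL passed to SparkConf.setMaster (or to spark-submit via --master). A small sketch of the common forms, with placeholder host names:

```scala
object MasterUrls {
  // Host names are placeholders; ports are the usual defaults.
  val local      = "local[*]"                  // run in-process, for experimentation
  val standalone = "spark://master-host:7077"  // Spark's built-in cluster manager
  val yarn       = "yarn-client"               // YARN, Spark 1.x style ("yarn-cluster" for cluster mode)
  val mesos      = "mesos://mesos-master:5050" // Mesos master (or mesos://zk://... for an HA master)
}
```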
8. Apache Mesos
9. Why Apache Mesos? • General (a “distributed kernel”) • Efficient resource management • Proven technology (in production at Apple and Twitter) • Typesafe & Mesosphere maintain the Spark/Mesos framework • “Program against your datacenter as a single pool of resources”
10. Frameworks running on Mesos • HDFS • Cassandra • Elasticsearch • YARN (via Myriad) • Marathon, etc. • and of course, Spark
11. Resource scheduling with Mesos • 2-level scheduling • Mesos offers resources to frameworks • Frameworks accept or reject offers • Offers include CPU cores, memory, ports, disk
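As an illustration of the accept/decline decision a framework scheduler makes, here is a sketch using hypothetical, simplified types; it is not the real Mesos scheduler API.

```scala
object OfferSketch {
  // Hypothetical, simplified model of a Mesos resource offer.
  case class Offer(slaveId: String, cpus: Double, memGb: Double)

  // A framework inspects each offer and either uses it to launch a task or declines it.
  def handleOffers(offers: Seq[Offer], cpusNeeded: Double, memGbNeeded: Double): Unit =
    offers.foreach { offer =>
      if (offer.cpus >= cpusNeeded && offer.memGb >= memGbNeeded)
        println(s"accept offer from ${offer.slaveId}: launch task")
      else
        println(s"decline offer from ${offer.slaveId}")
    }

  def main(args: Array[String]): Unit =
    handleOffers(Seq(Offer("S1", 8, 32), Offer("S2", 1, 2)), cpusNeeded = 2, memGbNeeded = 8)
}
```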
12. Diagram: a Mesos cluster with the Mesos master, Mesos slaves (also running the HDFS Name Node / Data Nodes), and the HDFS and Spark framework schedulers. Step 1: a slave reports its available resources (S1, 8 CPU, 32 GB, ...) to the Mesos master.
13. Diagram (continued). Step 2: the Mesos master offers those resources (S1, 8 CPU, 32 GB, ...) to the Spark framework scheduler.
14. Diagram (continued). Step 3: the Spark scheduler accepts part of the offer (S1, 2 CPU, 8 GB, ...) and submits a task, including its serialized closure (def foo(x: Int)).
15. Diagram (continued). Step 4: a Spark executor is launched on the slave and runs the task (task1, ...).
16. Spark Cluster Abstraction (diagram): the Spark driver (object MyApp creating a SparkContext) talks to a cluster manager, which provides Spark executors on the nodes; each executor runs tasks. A completed version of the driver sketch follows below.
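The driver snippet from the diagram, completed into a compilable sketch; the computation is a placeholder, and the master URL is normally supplied by spark-submit.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MyApp {
  def main(args: Array[String]): Unit = {
    // The driver owns the SparkContext; the cluster manager provides executors that run the tasks.
    val conf = new SparkConf().setAppName("MyApp")
    val sc   = new SparkContext(conf)
    try {
      val total = sc.parallelize(1 to 1000).map(_ * 2).reduce(_ + _)
      println(s"total = $total")
    } finally {
      sc.stop()
    }
  }
}
```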
17. Mesos Coarse-Grained Mode (diagram): the Spark driver's scheduler registers the Spark framework with the Mesos master; each node runs a Mesos executor hosting a long-lived Spark executor, which in turn runs the Spark tasks.
18. Mesos Coarse-Grained Mode • Fast startup for tasks: better for interactive sessions. • But resources stay locked up in the larger Mesos task. • (Dynamic allocation, added in 1.5, changes this.) (Same diagram as the previous slide.)
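A sketch of how coarse-grained mode is typically enabled through configuration; the master URL and core count are placeholders, and in the Spark 1.x releases discussed here fine-grained was still the default, so coarse-grained has to be switched on explicitly.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CoarseGrainedExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("coarse-grained-example")
      .setMaster("mesos://mesos-master:5050")   // placeholder master URL
      .set("spark.mesos.coarse", "true")        // one long-lived Mesos task per node hosts the Spark executor
      .set("spark.cores.max", "8")              // cap the cores held for the application's lifetime
    val sc = new SparkContext(conf)
    try println(sc.parallelize(1 to 100).count())
    finally sc.stop()
  }
}
```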
19. Mesos Fine-Grained Mode (diagram): the Mesos executor on each node launches a separate Spark executor per Spark task, so each Spark task runs as its own Mesos task.
20. Mesos Fine-Grained Mode • Better resource utilization. • Slower startup for tasks: fine for batch and relatively static streaming. (Same diagram as the previous slide.)
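For comparison, fine-grained mode (the default at the time, later deprecated) is selected the same way; the master URL is a placeholder.

```scala
import org.apache.spark.SparkConf

object FineGrainedExample {
  // Each Spark task runs as its own Mesos task, trading task-launch latency for utilization.
  val conf = new SparkConf()
    .setAppName("fine-grained-example")
    .setMaster("mesos://mesos-master:5050")   // placeholder master URL
    .set("spark.mesos.coarse", "false")
}
```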
21. Dynamic allocation • Mesos support was added in Spark 1.5 • adds and removes executors based on load • when executors are idle, kills them • when tasks queue up in the scheduler, adds executors • needs the external shuffle service to be running on each node
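A configuration sketch for dynamic allocation on Mesos, assuming the external shuffle service has already been started on every node; the executor bounds and master URL are placeholders.

```scala
import org.apache.spark.SparkConf

object DynamicAllocationExample {
  val conf = new SparkConf()
    .setAppName("dynamic-allocation-example")
    .setMaster("mesos://mesos-master:5050")             // placeholder master URL
    .set("spark.mesos.coarse", "true")                  // dynamic allocation applies to coarse-grained executors
    .set("spark.dynamicAllocation.enabled", "true")     // grow and shrink the executor set based on the task backlog
    .set("spark.shuffle.service.enabled", "true")       // shuffle files must outlive the executor that wrote them
    .set("spark.dynamicAllocation.minExecutors", "1")
    .set("spark.dynamicAllocation.maxExecutors", "10")
}
```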
22. Client vs Cluster mode • Where does the driver process run? • client mode: on the machine that submits the job • cluster mode: on a machine in the cluster
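In practice the choice is made with spark-submit's --deploy-mode flag. The same can be done programmatically with SparkLauncher, sketched below; the jar path, class name, and master URL are placeholders, and on Mesos, cluster mode additionally relies on a running MesosClusterDispatcher (the master URL then points at the dispatcher).

```scala
import org.apache.spark.launcher.SparkLauncher

object SubmitExample {
  def main(args: Array[String]): Unit = {
    val submit = new SparkLauncher()
      .setAppResource("/path/to/my-app.jar")     // placeholder application jar
      .setMainClass("com.example.MyApp")         // placeholder main class
      .setMaster("mesos://mesos-master:5050")    // placeholder master URL
      .setDeployMode("cluster")                  // "client" keeps the driver on the submitting machine
      .launch()                                  // returns the spark-submit child process

    submit.waitFor()
  }
}
```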
23. Demo
24. What's next on Mesos • Oversubscription (0.23) • Persistent Volumes • Dynamic Reservations • Optimistic Offers • Isolation • More…
25. Closing words on Spark Streaming • Spark 1.5 improves resiliency by adding back-pressure inside Spark Streaming • slows down receivers dynamically, based on load • Spark 1.6 will add the ability to connect to Reactive Streams • propagates back-pressure outside of Spark
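Back-pressure inside Spark Streaming 1.5 is switched on with a single configuration flag; a minimal sketch, where the socket source host/port and batch interval are placeholders.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object BackpressureExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("backpressure-example")
      .setMaster("local[2]")                                  // a receiver needs its own core
      .set("spark.streaming.backpressure.enabled", "true")    // receivers adapt their ingestion rate to load

    val ssc = new StreamingContext(conf, Seconds(1))
    ssc.socketTextStream("localhost", 9999).count().print()   // placeholder source
    ssc.start()
    ssc.awaitTermination()
  }
}
```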
26. Key points • Spark is a next-generation compute engine for Big Data • Mesos is a next-generation cluster manager • better utilization of cluster resources across the organization • Spark on Mesos is commercially supported by Typesafe • Typesafe & Mesosphere are the maintainers of Spark/Mesos
27. EXPERT SUPPORT: Why Contact Typesafe for Your Apache Spark Project? Ignite your Spark project with a 24/7 production SLA, unlimited expert support, and on-site training: • Full application lifecycle support for Spark Core, Spark SQL & Spark Streaming • Deployment to Standalone, EC2, and Mesos clusters • Expert support from a dedicated Spark team • Optional 10-day “getting started” services package. Typesafe partners with Databricks, Mesosphere and IBM. Learn more about on-site training. CONTACT US
