Apache Spark on Kubernetes



How to use Kubernetes as a resource manager for Spark. These slides and the associated tutorial discuss the pros and cons of the resource managers available to Spark.

Refer to this GitHub project for more details and code samples:

Published in: Software


  1. Apache Spark on Kubernetes Haridas N
  2. Agenda ● What is Kubernetes ● Why run Spark on Kubernetes ● How Kubernetes compares with other cluster managers ● Hands-on with a few Spark jobs.
  3. Kubernetes ● Container orchestrator ● Provisions containers across multiple nodes and abstracts the network that spans those nodes. ● Supports multiple namespaces for better project isolation. ● User role and privilege management.
  4. Spark on Kubernetes ● Supports different deployment modes
  5. Resource provisioning
  6. Why Spark on Kubernetes ● Kubernetes is now a widely used container management platform for SOA and other application environments. ● Better isolation between different deployments.
  7. Demo / Workshop
  8. Recap
  9. Hadoop, HDFS and YARN ● HDFS is the storage layer; the NameNode and DataNode services take care of data storage. ● The YARN resource negotiator provides a compute framework over the HDFS nodes. ● MapReduce jobs are written against the YARN framework. ● Best fit for batch processing and big-data storage
  10. Apache Spark on YARN ● With YARN, the Spark driver and workers are deployed into the Hadoop cluster. ● YARN provides more flexible resource management. ● Dynamic (on-demand) worker allocation. ● Best fit if you already have a Hadoop cluster and want to run Spark jobs on it.
  11. Apache Drill ● SQL interface for big data, with a Spark-like architecture. ● Interfaces with HDFS, NoSQL stores, Hive, Kafka, etc. and provides a unified, standard SQL interface ● Exposes JDBC and HTTP APIs. ● Best fit for quick data analysis using SQL commands.
  12. Apache Spark on Kubernetes ● Kubernetes is a widely used container orchestrator ● Major deployments exist outside the big-data domain, serving different needs. ● The project supports running big-data tools such as Spark and Hadoop on top of it. ● Run Spark jobs on an existing Kubernetes cluster. ● Offers a richer resource-management feature set than the other cluster managers compared here. ● Best fit if you already have a Kubernetes cluster in your environment.
  13. Q&A
  14. Thank you
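
The slides mention deployment modes, namespaces, and running Spark jobs on an existing Kubernetes cluster, but the transcript carries no command. As a rough illustration, the sketch below assembles a spark-submit command line for a Kubernetes master. It is not taken from the deck or its tutorial: the helper name build_spark_submit is hypothetical, and the API server address, container image, namespace, and jar path in the usage example are placeholder assumptions to be replaced with values from your own cluster. The flags and configuration keys used (--master with a k8s:// prefix, --deploy-mode, spark.kubernetes.namespace, spark.kubernetes.container.image, spark.executor.instances) are standard Spark options.

```python
from typing import Dict, List, Optional


def build_spark_submit(
    api_server: str,
    image: str,
    app_jar: str,
    main_class: str,
    namespace: str = "default",
    executors: int = 2,
    deploy_mode: str = "cluster",
    extra_conf: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Assemble (but do not run) a spark-submit command for Kubernetes.

    Hypothetical helper for illustration only; all argument values are
    supplied by the caller and nothing here contacts a cluster.
    """
    if deploy_mode not in ("cluster", "client"):
        raise ValueError(f"unsupported deploy mode: {deploy_mode!r}")

    # Standard Spark properties for a Kubernetes-backed job.
    conf = {
        "spark.kubernetes.namespace": namespace,      # isolates the job's pods
        "spark.kubernetes.container.image": image,    # image for driver/executors
        "spark.executor.instances": str(executors),   # number of executor pods
    }
    conf.update(extra_conf or {})

    cmd = [
        "spark-submit",
        "--master", f"k8s://{api_server}",  # k8s:// prefix selects Kubernetes
        "--deploy-mode", deploy_mode,
        "--class", main_class,
    ]
    # Sorted so the generated command is deterministic.
    for key, value in sorted(conf.items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app_jar)
    return cmd


if __name__ == "__main__":
    # Placeholder values (assumptions), not taken from the slides.
    command = build_spark_submit(
        api_server="https://k8s.example.com:6443",
        image="example.registry/spark:latest",
        app_jar="local:///opt/spark/examples/jars/spark-examples.jar",
        main_class="org.apache.spark.examples.SparkPi",
        namespace="spark-demo",
        executors=3,
    )
    print(" ".join(command))
```

Building the command as a list keeps each argument separate, so it can be passed to subprocess.run without shell quoting issues once the placeholders are replaced. In cluster mode the driver itself runs in a pod; in client mode it runs where spark-submit is invoked.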