
Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay


DevOps engineers have applied a great deal of creativity and energy to inventing tools that automate infrastructure management, in the service of deploying capable and functional applications. For data-driven applications running on Apache Spark, the details of instantiating and managing the backing Spark cluster can distract from the application logic. In the spirit of DevOps, automating Spark cluster management tasks lets engineers focus their attention on application code that provides value to end users.

Using OpenShift Origin as a laboratory, we implemented a platform where Apache Spark applications create their own clusters and then dynamically manage their own scale via host-platform APIs. This makes it possible to launch a fully elastic Spark application with little more than the click of a button.

We will present a live demo of turn-key deployment for elastic Apache Spark applications, and share what we’ve learned about developing Spark applications that manage their own resources dynamically with platform APIs.

The audience for this talk will be anyone looking for ways to streamline their Apache Spark cluster management, reduce the workload for Spark application deployment, or create self-scaling elastic applications. Attendees can expect to learn about leveraging APIs in the Kubernetes ecosystem that enable application deployments to manipulate their own scale elastically.

Published in: Data & Analytics


  1. Teaching Apache Spark Applications to Manage Their Workers Elastically
     Erik Erlandson, Trevor McKay
     Red Hat, Inc.
  2. Introduction
     Erik Erlandson: Senior SW Engineer; Red Hat, Inc.; Emerging Technologies Group; Internal Data Science; Insightful Applications
     Trevor McKay: Principal SW Engineer; Red Hat, Inc.; Emerging Technologies Group; Oshinko Development; Insightful Applications
  3. Outline
     • Trevor
       – container orchestration
       – containerizing spark
     • Erik
       – spark dynamic allocation
       – metrics
       – elastic worker daemon
     • Demo
  4. Containerizing Spark
     • Container 101
       – What is a container?
       – Docker, Kubernetes and OpenShift
     • Why Containerize Spark?
     • Oshinko
       – features
       – components
       – cluster creation example
  5. What is a container?
     • A process running in a namespace on a container host
       – separate process table, file system, and routing table
       – base operating system elements
       – application-specific code
     • Resources can be limited through cgroups
  6. Docker and Kubernetes
     • “Docker is the world's leading software containerization platform” www.docker.com
       – Open-source tool to build, store, and run containers
       – Images can be stored in shared registries
     • “Kubernetes is an open-source platform for automating deployment, scaling, and operations of application containers across clusters of hosts” kubernetes.io
  7. OpenShift Origin
     • Built around a core of Docker and Kubernetes
     • Adds application lifecycle management functionality and DevOps tooling. www.openshift.org/
       – multi-tenancy
       – Source-to-Image (S2I)
     • Runs on your laptop with “oc cluster up”
  8. Why Containerize Spark?
     • Repeatable clusters with no mandatory config
     • Normal users can create a cluster
       – No special privileges, just an account on a management platform
  9. Why Containerize Spark?
     • Containers allow a cluster-per-app model
       – Quick to spin up and spin down
       – Isolation == multiple clusters on the same host
       – Data can still be shared through common endpoints
       – Do I need to share a large dedicated cluster?
  10. Why Containerize Spark?
      • Ephemeral clusters conserve resources
      • Kubernetes makes horizontal scale out simple
        – Elastic Worker daemon builds on this foundation
        – Elasticity further conserves resources
  11. Deeper on Spark + Containers
      Optimizing Spark Deployments for Containers: Isolation, Safety, and Performance
      • William Benton (Red Hat)
      • Thursday, February 9
      • 11:40 AM – 12:10 PM
      • Ballroom A
  12. Oshinko: Simplifying further
      • Containers simplify deployment but still lots to do ...
        – Create the master and worker containers
        – Handle spark configuration
        – Wire the cluster together
        – Allow access to http endpoints
        – Tear it all down when you’re done
      • Oshinko treats clusters as abstractions and does this work for us
  13. Oshinko Features
      • CLI, web UI, and REST interfaces
      • Cluster creation with sane defaults (name only)
      • Scale and delete with simple commands
      • Advanced configuration
        – Enable features like Elastic Workers with a flag
        – Specify images and spark configuration files
        – Cluster configurations persisted in Kubernetes
      • Source-to-Image integration (pyspark, java, scala)
  14. Oshinko Components
      [Diagram. Labels: pod; Oshinko web UI; Oshinko REST; Oshinko CLI; Oshinko Core; OpenShift console; OpenShift and Kubernetes API servers; s2i image; launch script and user code.]
  15. Creating a Cluster
      CLI from a shell …
          $ oshinko-cli create mycluster --storedconfig=clusterconfig --insecure-skip-tls-verify=true --token=$TOKEN
      Using REST from Python …
          import requests
          r = requests.post("http://oshinko-rest/clusters",
                            json={"name": clustername,
                                  "config": {"name": "clusterconfig"}})
  16. What is a cluster config?
          $ oc export configmap clusterconfig
          apiVersion: v1
          data:
            metrics.enable: "true"
            scorpionstare.enable: "true"
            sparkimage: docker.io/manyangled/var-spark-worker:latest
            sparkmasterconfig: masterconfig
            sparkworkerconfig: workerconfig
          kind: ConfigMap
          metadata:
            creationTimestamp: null
            name: clusterconfig
  17. Source for demo’s oshinko-rest
      • Metrics implementation is being reviewed
        – using carbon and graphite today
        – investigating jolokia metrics
      • Metrics and elastic workers currently supported at https://github.com/tmckayus/oshinko-rest/tree/metrics
      • Both features will be merged to oshinko master soon
  18. Dynamic Allocation
      [Flowchart. Boxes: Count Backlog Jobs; > E ? (Yes / No); Request min(2x current, backlog); Wait Interval; Report Target as Metric; Shut Down Idle Executors.]
      Highlighted setting: spark.dynamicAllocation.enabled
  19. Dynamic Allocation
      [Same flowchart.]
      Highlighted setting: spark.dynamicAllocation.maxExecutors
  20. Dynamic Allocation
      [Same flowchart.]
      Highlighted settings: spark.dynamicAllocation.schedulerBacklogTimeout, *.sink.graphite.host
  21. Dynamic Allocation
      [Same flowchart.]
      Highlighted setting: spark.dynamicAllocation.executorIdleTimeout
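The scale-up rule in the Dynamic Allocation flowchart (slides 18 to 21) can be sketched in a few lines of Python. This is a simplified reading of the slide, not Spark's actual ExecutorAllocationManager code. It assumes that "E" means the current executor count, and it adds a floor of one executor so that doubling can start from zero. The function names are hypothetical.

```python
def next_executor_target(current, backlog, max_executors):
    """Return the executor count to request for the next interval.

    Assumption: "> E ?" on the slide compares the backlog of pending
    jobs against the current executor count.
    """
    if backlog <= current:
        # The backlog fits on the executors we already have.
        return current
    # Grow by at most 2x per interval (the floor of 1 is an assumption
    # so that an empty cluster can start growing), and never beyond the
    # backlog or spark.dynamicAllocation.maxExecutors.
    doubled = max(1, 2 * current)
    return min(doubled, backlog, max_executors)


def ramp(current, backlog, max_executors, intervals):
    """Apply the rule for several intervals with a constant backlog."""
    targets = []
    for _ in range(intervals):
        current = next_executor_target(current, backlog, max_executors)
        targets.append(current)
    return targets


print(ramp(1, 10, 8, 4))  # prints [2, 4, 8, 8]
```

With a steady backlog of 10 jobs and a cap of 8, the target doubles each interval until it reaches the cap. The separate idle-timeout path that shuts executors down is not modeled here.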
  22. Executor Scaling
      spark.dynamicAllocation.initialExecutors
        >= spark.dynamicAllocation.minExecutors
        <= spark.dynamicAllocation.maxExecutors
        <= backlog jobs (<= RDD partitions)
  23. Shuffle Service
      • Caches shuffle results independent of Executor
      • Saves results if Executor is shut down
      • Required for running Dynamic Allocation
      • spark.shuffle.service.enabled = true
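The constraints on slides 22 and 23 can be checked before a job is submitted. The sketch below works on a plain dict of Spark property names and string values. The property names are real Spark settings, but the helper `check_dynamic_allocation` is hypothetical, and the defaults it applies (minExecutors of 0, initialExecutors equal to minExecutors, no upper bound when maxExecutors is unset) are assumptions based on Spark's documented defaults.

```python
def check_dynamic_allocation(conf):
    """Return a list of problems found in a dict of Spark settings."""
    problems = []
    if conf.get("spark.dynamicAllocation.enabled", "false") != "true":
        return problems  # nothing to check when the feature is off

    # Slide 23: the shuffle service is required for dynamic allocation.
    if conf.get("spark.shuffle.service.enabled", "false") != "true":
        problems.append("spark.shuffle.service.enabled must be true")

    # Slide 22: minExecutors <= initialExecutors <= maxExecutors.
    lo = int(conf.get("spark.dynamicAllocation.minExecutors", "0"))
    hi_raw = conf.get("spark.dynamicAllocation.maxExecutors")
    hi = int(hi_raw) if hi_raw is not None else None
    init = int(conf.get("spark.dynamicAllocation.initialExecutors", str(lo)))

    if init < lo:
        problems.append("initialExecutors is below minExecutors")
    if hi is not None and init > hi:
        problems.append("initialExecutors is above maxExecutors")
    if hi is not None and lo > hi:
        problems.append("minExecutors is above maxExecutors")
    return problems


good = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.shuffle.service.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "1",
    "spark.dynamicAllocation.initialExecutors": "2",
    "spark.dynamicAllocation.maxExecutors": "8",
}
print(check_dynamic_allocation(good))  # prints []
```

The bound by backlog jobs and RDD partitions on slide 22 depends on the running job, so it is not something a static check like this can verify.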
  24. Dynamic Allocation Metrics
      Published by the ExecutorAllocationManager:
      numberExecutorsToAdd             Additional executors requested
      numberExecutorsPendingToRemove   Executors being shut down
      numberAllExecutors               Executors in any state
      numberTargetExecutors            Total requested (current+additional)
      numberMaxNeededExecutors         Maximum that could be loaded
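Since slide 17 says the metrics travel through carbon and graphite, a consumer could pick the gauges from slide 24 out of Graphite's plaintext format, where each line is "path value timestamp". The sketch below is a minimal illustration, not the demo's code. The function name is hypothetical, and the metric path in the example is an assumption about how the sink names the gauge, so only the last path component is matched.

```python
# Gauge names listed on slide 24.
ALLOCATION_GAUGES = {
    "numberExecutorsToAdd",
    "numberExecutorsPendingToRemove",
    "numberAllExecutors",
    "numberTargetExecutors",
    "numberMaxNeededExecutors",
}


def parse_allocation_metric(line):
    """Parse one Graphite plaintext line: "<path> <value> <timestamp>".

    Returns (gauge_name, value, timestamp), or None when the line is
    malformed or is not one of the dynamic allocation gauges.
    """
    parts = line.split()
    if len(parts) != 3:
        return None
    path, value, timestamp = parts
    name = path.rsplit(".", 1)[-1]  # last dotted component of the path
    if name not in ALLOCATION_GAUGES:
        return None
    return name, int(float(value)), int(float(timestamp))


# The path prefix here is hypothetical; only the final component matters.
sample = "app-1.driver.ExecutorAllocationManager.executors.numberTargetExecutors 4 1486600000"
print(parse_allocation_metric(sample))
```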
  25. Elastic Worker Daemon
      [Diagram. Components: driver; metric service; elastic daemon; oshinko API server; openshift API server; spark master (Spark Master Pod); spark workers (Spark Worker Pods); REST connections.]
      Highlighted step: Executor Request
  26. Elastic Worker Daemon
      [Same diagram.]
      Highlighted step: numberTargetExecutors
  27. Elastic Worker Daemon
      [Same diagram.]
      Highlighted step: numberTargetExecutors
  28. Elastic Worker Daemon
      [Same diagram.]
  29. Elastic Worker Daemon
      [Same diagram.]
      Highlighted step: Pod Replication
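Read in order, slides 25 to 29 suggest a control loop: the daemon reads numberTargetExecutors from the metric service, then asks the oshinko API server to bring the worker count in line, which results in pod replication. A single pass of such a loop might look like the sketch below. The slides do not show the REST calls, so they are passed in as plain callables. The function name, its parameters, and the one-executor-per-worker mapping are all assumptions made for illustration.

```python
def reconcile_once(read_target, read_workers, scale_to, min_workers=1):
    """Run one pass of a hypothetical elastic-worker control loop.

    read_target  -- callable returning numberTargetExecutors, or None
                    when no metric has been reported yet
    read_workers -- callable returning the current worker count
    scale_to     -- callable that asks the cluster API for N workers
    Returns the new worker count if a scale request was made, else None.
    """
    target = read_target()
    if target is None:
        return None  # no metric yet, leave the cluster alone
    # Assumption: one executor per worker, with a floor so that the
    # cluster never scales to zero workers.
    desired = max(min_workers, target)
    if desired == read_workers():
        return None  # already at the desired size
    scale_to(desired)
    return desired


# Usage with in-memory stand-ins for the metric service and cluster API.
requests_made = []
print(reconcile_once(lambda: 4, lambda: 2, requests_made.append))  # prints 4
print(requests_made)  # prints [4]
```

Injecting the three callables keeps the scaling decision testable without a running cluster. A real daemon would call something like this on a timer and back the callables with REST requests.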
  30. Demo
  31. Radanalytics.io
      New community landing page at http://radanalytics.io/
  32. Where to Find Oshinko
      Oshinko and related bits: http://github.com/radanalyticsio/
      Docker images: https://hub.docker.com/u/radanalyticsio/
      Images and notebook for today’s demo:
        https://hub.docker.com/u/tmckay/
        https://hub.docker.com/u/manyangled/
        https://github.com/erikerlandson/var-notebook/pulls
  33. Related Effort: Spark on K8s
      • Native scheduler backend for Kubernetes
      • https://github.com/apache-spark-on-k8s/spark
      • Developer Community Collaboration
  34. Thank You.
      eje@redhat.com
      tmckay@redhat.com
