Spark day 2017 - Spark on Kubernetes

• Senior Software Engineer at SK Telecom
• Commercial products
  • Big Data Discovery Solution (~’17)
  • Hadoop DW (~’15)
  • PaaS (CloudFoundry) (~’13)
  • IaaS (OpenStack) (~’13)
• Mail to: jerryjung@apache.org

Kubernetes
Spark deployment using Kubernetes
Spark on Kubernetes
Demo

Open source automation framework for deploying, managing, and scaling applications.

Kubernetes provides a common API and a self-healing framework that automatically handles machine failures, application deployments, logging, and monitoring.

Kubernetes architecture: https://github.com/kubernetes/kubernetes/blob/master/docs/design/architecture.md

https://thenewstack.io/kubernetes-an-overview/

• Clusters: a set of compute, storage, and network resources
• Pods: a colocated group of application containers that share volumes and a networking stack
• Replication Controllers: ensure that a specific number of pods is running; manage pods and report status updates
• Services: cluster-wide service discovery
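The self-healing behaviour of a replication controller comes from a control loop that compares the desired replica count with the pods actually running and closes the gap. The Python sketch below is a hypothetical, simplified model of that loop (the class name and counter-based pod names are assumptions; real pods get random name suffixes and the real controller watches the API server), not Kubernetes' own controller code:

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ReplicationControllerSketch:
    """Hypothetical, simplified model of a replication controller's control loop."""

    name: str
    replicas: int                                    # desired number of pods
    pods: list[str] = field(default_factory=list)    # names of pods currently running
    _ids: count = field(default_factory=count)       # source of unique pod suffixes

    def reconcile(self) -> list[str]:
        """Create or delete pods until the running count equals `replicas`.

        Returns the actions taken, so a call on a converged state returns [].
        """
        actions = []
        while len(self.pods) < self.replicas:
            pod = f"{self.name}-{next(self._ids)}"
            self.pods.append(pod)
            actions.append(f"create {pod}")
        while len(self.pods) > self.replicas:
            pod = self.pods.pop()
            actions.append(f"delete {pod}")
        return actions


rc = ReplicationControllerSketch("web", replicas=3)
print(rc.reconcile())        # ['create web-0', 'create web-1', 'create web-2']
rc.pods.remove("web-1")      # simulate a pod lost to a machine failure
print(rc.reconcile())        # ['create web-3']
```

Because the loop only reacts to the difference between desired and observed state, the same code path handles first deployment, recovery after a failure, and scaling down.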

Figure: Pod #1 (10.0.0.2), Pod #2 (10.0.0.3), and Pod #8 (10.0.0.9), each with its own volume and network, spread across Node #1 (192.168.0.2) through Node #5 (192.168.0.6); every pod serves port 8080.
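A Service gives the pods in the figure one stable name while the pod IPs behind it change. The Python sketch below is a hypothetical illustration using the figure's pod IPs and port 8080; it assumes a plain round-robin policy for simplicity, whereas the real selection policy depends on the kube-proxy mode (for example, random selection in iptables mode):

```python
from itertools import cycle


class ServiceSketch:
    """Hypothetical stand-in for a Service: one stable name in front of pod endpoints."""

    def __init__(self, name: str, port: int, pod_ips: list[str]):
        self.name = name
        self.port = port
        self._backends = cycle(pod_ips)   # simplified round-robin over pod IPs

    def pick(self) -> str:
        """Return the next backend endpoint as "ip:port"."""
        return f"{next(self._backends)}:{self.port}"


svc = ServiceSketch("web", 8080, ["10.0.0.2", "10.0.0.3", "10.0.0.9"])
print([svc.pick() for _ in range(4)])
# ['10.0.0.2:8080', '10.0.0.3:8080', '10.0.0.9:8080', '10.0.0.2:8080']
```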

• Support for event stream processing
• Fast data queries in real time
• Improved programmer productivity
• Fast batch processing of large data sets

• Driver: the process that contains the SparkContext
• Executor: a process that executes one or more Spark tasks
• Master: the process that manages applications across the cluster
• Worker: a process that manages executors on a particular node
http://spark.apache.org/docs/latest/cluster-overview.html

Figure: a Driver Program (containing the SparkContext), a Cluster Manager, and three Worker Nodes, each running an Executor.

Figure: cluster mode vs. client mode (source: http://freecontent.manning.com/running-spark-an-overview-of-sparks-runtime-architecture/)

DaemonSet (source: https://www.slideshare.net/grahaindia/new-features-of-kubernetes-v120-beta)

StatefulSets: each pod gets a stable name of the form $(statefulset name)-$(ordinal), with ordinals starting at 0.
http://blog.kubernetes.io/2017/01/running-mongodb-on-kubernetes-with-statefulsets.html
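The naming rule above is what gives stateful services such as HDFS NameNodes or MongoDB members a stable identity. The Python sketch below derives the pod names and the per-pod DNS names published through the StatefulSet's governing headless Service; it assumes the default cluster domain `cluster.local`, and the names `mongo` and `data` are illustrative only:

```python
def statefulset_pod_names(statefulset: str, replicas: int) -> list[str]:
    """StatefulSet pods are named $(statefulset name)-$(ordinal), ordinals 0..replicas-1."""
    return [f"{statefulset}-{ordinal}" for ordinal in range(replicas)]


def pod_dns_name(pod: str, service: str, namespace: str = "default",
                 cluster_domain: str = "cluster.local") -> str:
    """Stable per-pod DNS name provided through the StatefulSet's headless Service."""
    return f"{pod}.{service}.{namespace}.svc.{cluster_domain}"


for pod in statefulset_pod_names("mongo", 3):
    print(pod, pod_dns_name(pod, service="mongo"))
# mongo-0 mongo-0.mongo.default.svc.cluster.local
# mongo-1 mongo-1.mongo.default.svc.cluster.local
# mongo-2 mongo-2.mongo.default.svc.cluster.local
```

Since a rescheduled pod keeps its name, clients can keep addressing `mongo-0` by the same DNS name after a restart.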

https://github.com/Comcast/kube-yarn

Figure: Nodes #1 … #n run HDFS pods (NameNode and DataNodes) and YARN pods (ResourceManager and NodeManagers); a Zeppelin pod submits jobs with spark-submit.

Figure: the same HDFS/YARN pod layout, with an X marked between the Zeppelin pod and spark-submit.

https://github.com/kubernetes/kubernetes/issues/34377
https://issues.apache.org/jira/browse/SPARK-18278

https://spark-summit.org/2017/events/apache-spark-on-kubernetes/

SPARK-18278 - Spark on Kubernetes Design Proposal.pdf

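Figure: External shuffle service. An external shuffle plugin on each Node Manager (#1…N) serves the intermediate shuffle files (RDD intermediate files) written by executors, so those files remain available after an executor is removed.
Use cases:
• Long-running ETL jobs
• Interactive applications or servers
• Any application with large shuffles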

1. Add the shuffle plugin jar to the classpath of each NodeManager.
2. Register the plugin in yarn-site.xml:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
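Step 2 is easy to get wrong by hand, for example by replacing `mapreduce_shuffle` instead of appending to it. The Python sketch below (a hypothetical helper, not part of Spark or Hadoop) edits an existing yarn-site.xml string with the standard library's ElementTree, appends `spark_shuffle` to `yarn.nodemanager.aux-services` only if it is missing, and sets the plugin class. It assumes the usual `<configuration>` root; the jar from step 1 must still be on the NodeManager classpath, and the NodeManagers need a restart to pick up the change.

```python
import xml.etree.ElementTree as ET

AUX = "yarn.nodemanager.aux-services"
SHUFFLE_CLASS = "yarn.nodemanager.aux-services.spark_shuffle.class"


def set_property(conf: ET.Element, name: str, value: str) -> None:
    """Set <property><name>name</name><value>value</value></property>, adding it if absent."""
    for prop in conf.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            return
    prop = ET.SubElement(conf, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value


def register_spark_shuffle(xml_text: str) -> str:
    """Return yarn-site.xml text with the spark_shuffle aux-service registered (idempotent)."""
    conf = ET.fromstring(xml_text)  # root element is <configuration>
    current = next((p.findtext("value") for p in conf.findall("property")
                    if p.findtext("name") == AUX), "")
    services = [s.strip() for s in (current or "").split(",") if s.strip()]
    if "spark_shuffle" not in services:
        services.append("spark_shuffle")  # keep existing services such as mapreduce_shuffle
    set_property(conf, AUX, ",".join(services))
    set_property(conf, SHUFFLE_CLASS, "org.apache.spark.network.yarn.YarnShuffleService")
    return ET.tostring(conf, encoding="unicode")


SAMPLE = """<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>"""
print(register_spark_shuffle(SAMPLE))
```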

3. Edit spark-defaults.conf:
spark.dynamicAllocation.enabled true
spark.shuffle.service.enabled true
spark.dynamicAllocation.minExecutors 50
spark.dynamicAllocation.maxExecutors 100
spark.dynamicAllocation.initialExecutors 50
spark.dynamicAllocation.schedulerBacklogTimeout 5s
spark.dynamicAllocation.executorIdleTimeout 60s
http://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation
Cf. Mesos coarse-grained mode
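With the settings above, the Spark documentation describes scale-up in rounds: the first request fires once tasks have been pending for `schedulerBacklogTimeout` (5s here), later rounds follow every `sustainedSchedulerBacklogTimeout` (which defaults to the same value), and each round adds exponentially more executors (1, 2, 4, 8, ...). Executors idle longer than `executorIdleTimeout` are released. The Python sketch below is a simplified model of that documented policy; it assumes the backlog never drains and ignores the cap Spark also applies based on the number of pending tasks:

```python
def ramp_up(initial: int, max_executors: int, rounds: int) -> list[int]:
    """Target executor count after each request round while tasks stay backlogged.

    Simplified model: round k adds 2**k executors (1, 2, 4, 8, ...), starting from
    spark.dynamicAllocation.initialExecutors and capped at maxExecutors.
    """
    target, to_add, history = initial, 1, []
    for _ in range(rounds):
        if target >= max_executors:
            break                      # already at the upper bound
        target = min(target + to_add, max_executors)
        history.append(target)
        to_add *= 2                    # exponential growth between rounds
    return history


def after_idle_timeout(current: int, idle: int, min_executors: int) -> int:
    """Idle executors are released, but the count never drops below minExecutors."""
    return max(current - idle, min_executors)


print(ramp_up(initial=50, max_executors=100, rounds=10))            # [51, 53, 57, 65, 81, 100]
print(after_idle_timeout(current=100, idle=70, min_executors=50))   # 50
```

Under this model, the slide's configuration reaches `maxExecutors` after six rounds, roughly 30 seconds of sustained backlog.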


https://www.slideshare.net/cfregly/spark-on-kubernetes-advanced-spark-and-tensorflow-meetup-jan-19-2017-anirudh-ramanthan-from-google-kubernetes-team
1. Cluster-mode submission

bin/spark-submit \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --master k8s://https://{k8s address} \
  --kubernetes-namespace default \
  --conf spark.executor.instances=5 \
  --conf spark.app.name=spark-pi \
  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.2.0 \
  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.2.0 \
  --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.2.0 \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.1.0-k8s-0.2.0-SNAPSHOT.jar
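The `--master` value wraps the Kubernetes API server URL in a `k8s://` prefix. The Python sketch below shows how such a value maps to the API server URL. It assumes the rule stated in the Spark 2.3+ documentation (a `k8s://` master without an http(s) scheme defaults to https) also applies to the fork used here; the function name is hypothetical:

```python
def resolve_k8s_master(master: str) -> str:
    """Map a --master value such as k8s://https://host:443 to the API server URL.

    Assumption: a master without an explicit http:// or https:// scheme defaults to https.
    """
    prefix = "k8s://"
    if not master.startswith(prefix):
        raise ValueError(f"not a Kubernetes master URL: {master!r}")
    rest = master[len(prefix):]
    if rest.startswith(("http://", "https://")):
        return rest                  # explicit scheme is kept as-is
    return "https://" + rest         # no scheme: assume https


print(resolve_k8s_master("k8s://https://10.0.0.1:6443"))  # https://10.0.0.1:6443
print(resolve_k8s_master("k8s://10.0.0.1:6443"))          # https://10.0.0.1:6443
```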

https://www.slideshare.net/cfregly/spark-on-kubernetes-advanced-spark-and-tensorflow-meetup-jan-19-2017-anirudh-ramanthan-from-google-kubernetes-team
2. Dynamic allocation with the external shuffle service

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    app: spark-shuffle-service
    spark-version: 2.1.0
  name: shuffle
spec:
  template:
    metadata:
      labels:
        app: spark-shuffle-service
        spark-version: 2.1.0
    spec:
      volumes:
        - name: temp-volume
          hostPath:
            path: '/var/tmp' # change this path according to your cluster configuration.
      containers:
        - name: shuffle
          image: kubespark/spark-shuffle:v2.1.0-kubernetes-0.2.0

bin/spark-submit \
  --class org.apache.spark.examples.GroupByTest \
  --master k8s://https://{k8s address} \
  --conf spark.app.name=group-by-test \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.kubernetes.shuffle.namespace=default \
  --conf spark.kubernetes.shuffle.labels="app=spark-shuffle-service,spark-version=2.1.0" \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.1.0-k8s-0.2.0-SNAPSHOT.jar 10 40000 2
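`spark.kubernetes.shuffle.labels` is an equality-based label selector: it has to match the labels on the DaemonSet's pods so that an executor can find the shuffle service pod running on its own node. The Python sketch below illustrates that lookup; the helper names, node names, pod names, and IPs are hypothetical, and only plain `key=value` terms are handled:

```python
def parse_selector(selector: str) -> dict[str, str]:
    """Parse "k1=v1,k2=v2" (plain key=value terms only) into a dict."""
    pairs = {}
    for term in selector.split(","):
        key, sep, value = term.strip().partition("=")
        if not sep or not key or key.endswith("!") or value.startswith("="):
            raise ValueError(f"unsupported selector term: {term!r}")
        pairs[key] = value
    return pairs


def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """A pod matches when every selector key is present with the same value."""
    return all(labels.get(key) == value for key, value in selector.items())


def shuffle_pod_ip(pods: list[dict], node: str, selector: dict[str, str]) -> str:
    """Return the IP of the single matching shuffle pod on `node` (hypothetical helper)."""
    found = [p for p in pods if p["node"] == node and matches(selector, p["labels"])]
    if len(found) != 1:
        raise LookupError(f"expected exactly one shuffle pod on {node}, found {len(found)}")
    return found[0]["ip"]


LABELS = {"app": "spark-shuffle-service", "spark-version": "2.1.0"}
pods = [  # hypothetical DaemonSet pods, one per node
    {"name": "shuffle-a1b2c", "node": "node-1", "ip": "10.0.0.2", "labels": LABELS},
    {"name": "shuffle-d3e4f", "node": "node-2", "ip": "10.0.0.3", "labels": LABELS},
]
selector = parse_selector("app=spark-shuffle-service,spark-version=2.1.0")
print(shuffle_pod_ip(pods, "node-2", selector))  # 10.0.0.3
```

Including `spark-version` in the selector keeps executors from talking to a shuffle service built for a different Spark version running on the same node.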

3. Resource staging server

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: spark-resource-staging-server
spec:
  replicas: 1
---
apiVersion: v1
kind: Service
metadata:
  name: spark-resource-staging-service
spec:
  type: NodePort
  selector:
    resource-staging-server-instance: default
  ports:
    - protocol: TCP
      port: 10000
      targetPort: 10000
      nodePort: 31000

bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master k8s://{k8s address} \
  --conf spark.executor.instances=5 \
  --conf spark.app.name=spark-pi \
  --conf spark.kubernetes.resourceStagingServer.uri=http://{node ip}:31000 \
  examples/jars/spark-examples_2.11-2.1.0-k8s-0.2.0-SNAPSHOT.jar
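Because the staging Service is of type NodePort, it is reachable on every node's IP at `nodePort` 31000, which forwards to port 10000 on the staging server pod. The Python sketch below builds the `spark.kubernetes.resourceStagingServer.uri` value and checks the port against Kubernetes' default NodePort range (30000-32767, configurable with the API server's `--service-node-port-range`). The function name is hypothetical, and the node IP is taken from the earlier pod figure purely as an example:

```python
from ipaddress import ip_address

# Default range of the API server flag --service-node-port-range.
NODE_PORT_RANGE = range(30000, 32768)


def staging_server_uri(node_ip: str, node_port: int = 31000) -> str:
    """Build the resource staging server URI for a NodePort Service (hypothetical helper)."""
    addr = ip_address(node_ip)                    # ValueError if not a literal IP address
    if node_port not in NODE_PORT_RANGE:
        raise ValueError(f"nodePort {node_port} is outside the default range 30000-32767")
    host = f"[{addr}]" if addr.version == 6 else str(addr)  # IPv6 literals need brackets
    return f"http://{host}:{node_port}"


print("--conf spark.kubernetes.resourceStagingServer.uri=" + staging_server_uri("192.168.0.2"))
# --conf spark.kubernetes.resourceStagingServer.uri=http://192.168.0.2:31000
```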

https://spark-summit.org/2017/events/hdfs-on-kubernetes-lessons-learned/
https://github.com/apache-spark-on-k8s/kubernetes-HDFS
