Kimoon Kim (kimoon@pepperdata.com)
HDFS on Kubernetes --
Lessons Learned
Outline
1. Kubernetes intro
2. Big Data on Kubernetes
3. Demo
4. Problems we fixed -- HDFS data locality
Kubernetes
New open-source cluster manager.
Runs programs in Linux containers.
1000+ contributors and 40,000+ commits.
“My app was running fine until someone installed their software”
More isolation is good
Kubernetes provides each program with:
• a lightweight virtual file system
– an independent set of S/W packages
• a virtual network interface
– a unique virtual IP address
– an entire range of ports
• etc
Kubernetes architecture
[Diagram: node A (196.0.0.5) runs Pod 1 (10.0.0.2) and Pod 2 (10.0.0.3); node B (196.0.0.6) runs Pod 3 (10.0.1.2)]
Pod, a unit of scheduling and isolation.
• runs a user program in a primary container
• holds isolation layers like a virtual IP in an infra container
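For illustration only (not from the talk), here is a minimal Scala sketch of creating such a pod with the fabric8 Kubernetes client; the pod name, image, and namespace are hypothetical:

    import io.fabric8.kubernetes.api.model.PodBuilder
    import io.fabric8.kubernetes.client.DefaultKubernetesClient

    // Connects using the local kubeconfig or the in-cluster service account.
    val client = new DefaultKubernetesClient()

    // A pod spec names only the primary container; Kubernetes itself adds
    // the infra ("pause") container that holds the virtual IP and namespaces.
    val pod = new PodBuilder()
      .withNewMetadata().withName("example-pod").endMetadata()
      .withNewSpec()
        .addNewContainer()
          .withName("primary")
          .withImage("busybox")
          .withCommand("sleep", "3600")
        .endContainer()
      .endSpec()
      .build()

    client.pods().inNamespace("default").create(pod)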
Big Data on Kubernetes
github.com/apache-spark-on-k8s
• Google, Haiwen, Hyperpilot, Intel, Palantir, Pepperdata, Red Hat,
and growing.
• patching up Spark Driver and Executor code to work on
Kubernetes.
Yesterday’s talk:
spark-summit.org/2017/events/apache-spark-on-kubernetes/
Spark on Kubernetes
[Diagram: Job 1's client launches Driver Pod 10.0.0.2 with Executor Pod 1 (10.0.0.3) on node A (196.0.0.5) and Executor Pod 2 (10.0.1.2) on node B (196.0.0.6); Job 2's client launches its own Driver Pod 10.0.0.4 with Executor Pods 10.0.0.5 and 10.0.1.3]
What about storage?
Spark often stores data on HDFS.
How can Spark on Kubernetes access HDFS data?
Hadoop Distributed File System
Namenode
• runs on a central cluster node.
• maintains file system metadata.
Datanodes
• run on every cluster node.
• read and write file data on local disks.
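As a quick illustration (not from the talk), the standard Hadoop FileSystem API shows this split of responsibilities: the namenode answers block-location queries from metadata alone, naming the datanodes that hold each block. A minimal Scala sketch, assuming an HDFS client configuration on the classpath:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    val conf = new Configuration()  // picks up core-site.xml / hdfs-site.xml
    val fs = FileSystem.get(conf)

    // For each block of the file, the namenode reports which datanodes
    // hold a replica; the datanodes themselves serve the actual bytes.
    val status = fs.getFileStatus(new Path("/fileA"))
    val blocks = fs.getFileBlockLocations(status, 0, status.getLen)
    blocks.foreach { b =>
      println(s"offset=${b.getOffset} datanodes=${b.getHosts.mkString(", ")}")
    }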
HDFS on Kubernetes
[Diagram: the same two Spark jobs as before, now with an HDFS Namenode Pod plus Datanode Pod 1 on node A (196.0.0.5) and Datanode Pod 2 on node B (196.0.0.6) running alongside the driver and executor pods]
Hadoop configuration fs.defaultFS:
hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020
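As a sketch (not from the talk), a Spark job can pick up this namenode address through Spark's spark.hadoop.* passthrough; the application name here is hypothetical:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("hdfs-on-k8s-example")
      // Spark copies spark.hadoop.* keys into the Hadoop Configuration,
      // so bare paths like /fileA resolve against the in-cluster HDFS.
      .config("spark.hadoop.fs.defaultFS",
        "hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020")
      .getOrCreate()

    println(spark.sparkContext.textFile("/fileA").count())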
Demo
1. Label cluster nodes
2. Stand up HDFS
3. Launch a Spark job
4. Check Spark job output
What about data locality?
• Read data from local disks when possible
• Remote reads can be slow if the network is slow
• Implemented in HDFS daemons, integrated with app frameworks like Spark
[Diagram: Client 1 on node A reads from Datanode 1 on the same node (FAST); Client 2 on node B reads the same data from Datanode 1 across the network (SLOW)]
HDFS data locality in YARN
The Spark driver sends each task to the right executor: the one running on the same node as the task's HDFS data.
(/fileA → Datanode 1 → 196.0.0.5) == (Executor 1 → 196.0.0.5)
[Diagram, YARN example: Job 1's driver sends "Read /fileA" to Executor 1 on node A (196.0.0.5), where Datanode 1 holds /fileA, and "Read /fileB" to Executor 2 on node B (196.0.0.6), where Datanode 2 holds /fileB]
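The matching rule behind this is simple. A minimal Scala sketch of the idea, not Spark's actual scheduler code:

    // A task is node-local when the executor's address appears among the
    // addresses of the datanodes holding the task's HDFS block.
    def isNodeLocal(blockHosts: Seq[String], executorHost: String): Boolean =
      blockHosts.contains(executorHost)

    // On YARN, executors report the node's address directly, so:
    isNodeLocal(Seq("196.0.0.5"), "196.0.0.5")  // true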
Hmm, how do I find the right executors in Kubernetes...
(/fileA → Datanode 1 → 196.0.0.5) != (Executor 1 → 10.0.0.3)
[Diagram, Kubernetes example: Driver Pod 10.0.0.2 and Executor Pod 1 (10.0.0.3) run on node A (196.0.0.5) next to Datanode Pod 1; Executor Pod 2 (10.0.1.2) runs on node B (196.0.0.6) next to Datanode Pod 2. The driver sees only pod IPs, which never match the datanode IPs, so it cannot detect the locality]
Fix Spark Driver code
1. Ask Kubernetes master to find the cluster node where
the executor pod is running.
2. Get the node IP.
3. Compare with the datanode IPs.
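A hedged Scala sketch of these steps, assuming the fabric8 Kubernetes client (the real change lives in the apache-spark-on-k8s fork's driver code); the names and parameters here are illustrative:

    import io.fabric8.kubernetes.client.DefaultKubernetesClient

    val client = new DefaultKubernetesClient()

    // status.hostIP is the IP of the cluster node running the pod.
    def nodeIpOfPod(namespace: String, podName: String): String =
      client.pods().inNamespace(namespace).withName(podName)
        .get().getStatus.getHostIP

    // Locality holds when the executor pod's node IP matches a datanode IP.
    def isNodeLocal(datanodeIps: Set[String], namespace: String,
                    executorPod: String): Boolean =
      datanodeIps.contains(nodeIpOfPod(namespace, executorPod))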
Rescued data locality
(/fileA → Datanode 1 → 196.0.0.5) == (Executor 1 → 10.0.0.3 → 196.0.0.5)
[Diagram: with the fix, Executor Pod 1 (10.0.0.3) resolves to node A (196.0.0.5), matching Datanode Pod 1, so it reads /fileA locally; Executor Pod 2 (10.0.1.2) on node B (196.0.0.6) reads /fileB locally from Datanode Pod 2]
Rescued data locality!
• with data locality fix: duration 10 minutes
• without data locality fix: duration 25 minutes
Recap
Got HDFS up and running.
Basic data locality works.
Open problems:
• Remaining data locality issues -- rack locality, node preference, etc
• Kerberos support
• Namenode High Availability
Join us!
● github.com/apache-spark-on-k8s
● pepperdata.com/careers/
More questions?
● Come to Pepperdata booth #101
● Mail kimoon@pepperdata.com
