Hadoop {Submarine} Project: Running Deep Learning Workloads on YARN

© Cloudera, Inc. All rights reserved.
HADOOP {SUBMARINE} PROJECT: RUNNING DEEP
LEARNING WORKLOADS ON YARN & KUBERNETES
Sunil Govindan & Zhankun Tang

SPEAKERS
• Sunil Govindan
• Apache Hadoop PMC Member & Committer
• Staff Engineer at Cloudera Inc.
• Zhankun Tang
• Apache Hadoop Committer
• Staff Engineer at Cloudera Inc.

AGENDA
Why did we initiate the Hadoop Submarine project?
What can Hadoop Submarine offer?
Demo
Hadoop Submarine Ecosystem
Present & Future

PAIN POINTS OF ML ENGINEER AND DATA SCIENTIST
“Hidden Technical Debt in Machine Learning Systems”, Google

PAIN POINTS OF ML ENGINEER AND DATA SCIENTIST
No cross-platform solution to streamline the ML lifecycle
[Diagram labels: Amazon S3, HDFS; Data Engineering, Deep Learning, Model Management, Model Serving; Submarine + Ecosystem; YARN & Kubernetes]

BEGINNING: HOW HAS HADOOP HELPED ML WORKLOADS?
• Hadoop HDFS is a widely adopted storage layer
• Hadoop YARN is the best orchestration for big data workloads so far
• Enterprise-level scheduler - 3K+ containers per second!
• Docker support - Don’t worry about complex environments
• GPU/FPGA resources - Faster than CPU
• GPU topology scheduling - Same count but 3X speed (best case)
• YARN Native Service - For long-running services, DNS, etc.
• Hadoop ecosystem for better data engineering
• Spark
• Hive
• Flink

NOW SUBMARINE CAN…
Run an ML job
Launch an ML job without modifying the algorithm
--type “TensorFlow” or “pyTorch”
--num_workers <arg>
--worker_resources <arg>
--worker_launch_cmd <arg>
--worker_docker_image <arg>
--num_ps <arg>
--ps_resources <arg>
--ps_launch_cmd <arg>
--ps_docker_image <arg>
--localization <arg>
OR --f job.yaml
Run TensorBoard
Launch a TensorBoard
--checkpoint_path <arg>
--tensorboard_resources <arg>
--tensorboard_docker_image <arg>
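
The options above can be assembled programmatically. The following is a minimal Python sketch, not the project's own tooling: it only builds an argument list from the flag names shown on the slide. The helper names (`build_job_args`, `build_tensorboard_args`, `render`), the resource string format, the image names, and the paths are illustrative assumptions; the launcher command that would consume these arguments is not shown on the slide and is left out.

```python
import shlex


def build_job_args(job_type, num_workers, worker_resources, worker_launch_cmd,
                   worker_docker_image, num_ps=0, ps_resources=None,
                   ps_launch_cmd=None, ps_docker_image=None):
    """Assemble the job options listed on the slide (flag names from the slide)."""
    args = [
        "--type", job_type,
        "--num_workers", str(num_workers),
        "--worker_resources", worker_resources,
        "--worker_launch_cmd", worker_launch_cmd,
        "--worker_docker_image", worker_docker_image,
    ]
    # Parameter-server options are only emitted when parameter servers are requested.
    if num_ps > 0:
        args += ["--num_ps", str(num_ps)]
        optional = [
            ("--ps_resources", ps_resources),
            ("--ps_launch_cmd", ps_launch_cmd),
            ("--ps_docker_image", ps_docker_image),
        ]
        for flag, value in optional:
            if value is not None:
                args += [flag, value]
    return args


def build_tensorboard_args(checkpoint_path, resources, docker_image):
    """Assemble the TensorBoard options listed on the slide."""
    return [
        "--checkpoint_path", checkpoint_path,
        "--tensorboard_resources", resources,
        "--tensorboard_docker_image", docker_image,
    ]


def render(args):
    """Quote each argument so the result is safe to paste into a shell."""
    return " ".join(shlex.quote(a) for a in args)


# Placeholder values, assumed for illustration only.
job_args = build_job_args(
    job_type="TensorFlow",
    num_workers=2,
    worker_resources="memory=4G,vcores=2",
    worker_launch_cmd="python train.py",
    worker_docker_image="example/tf:latest",
)
print(render(job_args))
print(render(build_tensorboard_args("hdfs:///tmp/ckpt", "memory=2G,vcores=1",
                                    "example/tf:latest")))
```

Keeping the arguments as a list until the last step avoids quoting mistakes when a launch command contains spaces.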

NOW SUBMARINE CAN…
Run notebooks with the Submarine ecosystem
Streamline big data and ML jobs
-- Zeppelin Notebook
-- Zeppelin Submarine interpreter
-- Shell
-- Python
-- Dashboard
Run natively on Kubernetes
Same interface as Submarine on YARN
-- submarine.runtime.class to enable K8s
-- More powerful service management
-- TensorFlow job, TensorBoard
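
The idea of one interface with a runtime chosen by configuration can be sketched as follows. This is a hypothetical Python illustration of the pattern only, not Submarine's implementation: the classes `Runtime`, `YarnRuntime`, and `KubernetesRuntime`, the short values `"yarn"` and `"k8s"`, and the `load_runtime` helper are all assumptions. In the real project the property is expected to name a runtime class rather than a short alias.

```python
from abc import ABC, abstractmethod


class Runtime(ABC):
    """Common interface: callers submit jobs the same way on every backend."""

    @abstractmethod
    def submit(self, job_name: str) -> str:
        ...


class YarnRuntime(Runtime):
    def submit(self, job_name: str) -> str:
        return f"submitted {job_name} to YARN"


class KubernetesRuntime(Runtime):
    def submit(self, job_name: str) -> str:
        return f"submitted {job_name} to Kubernetes"


# Hypothetical aliases; stand-ins for the class names a real config would hold.
RUNTIMES = {"yarn": YarnRuntime, "k8s": KubernetesRuntime}


def load_runtime(config: dict) -> Runtime:
    """Pick the backend from the config key named on the slide."""
    key = config.get("submarine.runtime.class", "yarn")
    return RUNTIMES[key]()


# The calling code is identical; only the configuration differs.
print(load_runtime({}).submit("mnist"))
print(load_runtime({"submarine.runtime.class": "k8s"}).submit("mnist"))
```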

NOW SUBMARINE WILL DO…
TonY runtime on YARN
Integrate with LinkedIn’s TonY (TensorFlow on YARN)
-- PyTorch
-- Support previous Hadoop 2 versions
Model management & serving
Manage, verify and iterate your models easily
-- Model CRUD
-- Model Deploy
-- Model Serving

DEMO
Hadoop Submarine on YARN
• This demo will
• Run a TensorFlow job
• Launch TensorBoard

DEMO
Hadoop Submarine on Kubernetes
• This demo will
• Run a TensorFlow job
• Launch TensorBoard

SUBMARINE ECOSYSTEM
Zeppelin Notebook with Submarine Interpreter
• “%submarine.shell” to play around in the container
• “%submarine.python” to write Python code
• “%submarine.dashboard” to submit a job
• Logging
• Tensorboard
• YARN UI

SUBMARINE ECOSYSTEM
User Experience: Work “Locally” in Zeppelin with Submarine Interpreter

SUBMARINE ECOSYSTEM
Deep Dive Demo: Zeppelin with Submarine Interpreter
• The Submarine Interpreter is created automatically when the user creates a Submarine note

SUBMARINE ECOSYSTEM
“%submarine.shell”

SUBMARINE ECOSYSTEM
“%submarine.python”

SUBMARINE ECOSYSTEM
User Experience: Submit Job to a cluster using Submarine Dashboard

SUBMARINE ECOSYSTEM
Deep Dive Demo: Submarine Dashboard in Zeppelin - Basics
“%submarine.dashboard”
- Introduction

SUBMARINE ECOSYSTEM
Deep Dive Demo: Submarine Dashboard in Zeppelin – Job Submission
- Submit TensorFlow job

SUBMARINE ECOSYSTEM
Deep Dive Demo: Submarine Dashboard in Zeppelin – Job Status
- Check job status

SUBMARINE ECOSYSTEM
User Experience: Access TensorBoard from Submarine Dashboard

SUBMARINE ECOSYSTEM
Deep Dive Demo: Access TensorBoard from Submarine Dashboard

HADOOP SUBMARINE – PRESENT STATUS
• Released v0.1.0 with the Apache Hadoop v3.2.0 release.
• Submarine is now a subproject under Hadoop!
• For a faster release cadence, independent of Hadoop releases
• To build a larger ecosystem
• Focused community development

HADOOP SUBMARINE – FUTURE
• Upcoming Hadoop Submarine v0.2.0 release
• In the next few weeks
• Upcoming Features
• Integrate LinkedIn’s TonY as a new runtime
• Support for older Hadoop versions without Docker
• Integrating PyTorch along with TensorFlow
• New Kubernetes deployment support
• Hyperparameter tuning
• Model Serving

HADOOP SUBMARINE – CASE STUDY
Netease (NASDAQ: NTES)
• One of the largest online game/news/music providers in China.
• 6 Hadoop clusters, ~6k nodes in total.
• 100k jobs per day, 40% of which are Spark jobs
• 1000 ML jobs per day.
• ML jobs run in a separate GPU K8s cluster (~500 nodes); all data comes from HDFS and is processed by Spark, etc.
• Existing problems:
• Low utilization (YARN tasks cannot leverage this cluster)
• High maintenance cost (the separate cluster has to be managed)
• Working with the community on development; verifying Submarine on a 200+ node GPU cluster
• Plan to move all ML workloads to Submarine in the future
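
The workload figures above imply a few derived numbers. A small Python sketch of the arithmetic, using only the values quoted on the slide (the variable names are illustrative, and the slide's figures are approximate):

```python
# Figures quoted on the slide.
total_jobs_per_day = 100_000
spark_percent = 40
ml_jobs_per_day = 1_000

# Integer arithmetic keeps the results exact.
spark_jobs_per_day = total_jobs_per_day * spark_percent // 100
ml_percent = ml_jobs_per_day * 100 // total_jobs_per_day

print(f"Spark jobs per day: {spark_jobs_per_day}")  # 40000
print(f"ML share of all jobs: {ml_percent}%")       # 1%
```

ML jobs are about 1% of the daily job count, yet they are served by a dedicated GPU cluster of roughly 500 nodes, which is the utilization and maintenance concern the slide describes.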

HADOOP SUBMARINE – COMMUNITY
• WebSite: https://hadoop.apache.org/submarine/
• Weekly Community Meeting Details:
https://docs.google.com/document/d/1VCtCvNH6Ew8psuzP1oNgeODpYkv2JN5_rsB7JJ0mgYs/edit?usp=sharing
• Code: https://github.com/apache/hadoop
• Apache Hadoop Submarine is a joint development program driven by the Hadoop community, with contributions from quite a few companies (Cloudera, Netease, LinkedIn, Alibaba, Didi, Huawei, etc.).
