© Cloudera, Inc. All rights reserved.
HADOOP {SUBMARINE} PROJECT: RUNNING DEEP
LEARNING WORKLOADS ON YARN & KUBERNETES
Sunil Govindan & Zhankun Tang
SPEAKERS
• Sunil Govindan
• Apache Hadoop PMC Member & Committer
• Staff Engineer at Cloudera Inc.
• Zhankun Tang
• Apache Hadoop Committer
• Staff Engineer at Cloudera Inc.
AGENDA
Why did we start the Hadoop Submarine project?
What can Hadoop Submarine offer?
Demo
Hadoop Submarine Ecosystem
Present & Future
PAIN POINTS OF ML ENGINEER AND DATA SCIENTIST
“Hidden Technical Debt in Machine Learning Systems”, Google
PAIN POINTS OF ML ENGINEER AND DATA SCIENTIST
No cross-platform solution to streamline the ML lifecycle
[Diagram: Amazon S3 and HDFS; Data Engineering, Deep Learning, Model Management, Model Serving; Submarine + Ecosystem on YARN & Kubernetes]
BEGINNING: HOW HAS HADOOP HELPED ML WORKLOADS?
• Hadoop HDFS is a widely adopted storage layer
• Hadoop YARN is the best orchestrator for big data workloads so far
  • Enterprise-level scheduler - 3K+ containers per second
  • Docker support - no need to worry about complex environments
  • GPU/FPGA resources - faster than CPU
  • GPU topology scheduling - same GPU count, up to 3x speed (best case)
  • YARN Native Service - for long-running services, DNS, etc.
• Hadoop ecosystem for better data engineering
  • Spark
  • Hive
  • Flink
NOW SUBMARINE CAN…
Run an ML job
Launch an ML job without modifying the algorithm
--type “TensorFlow” or “PyTorch”
--num_workers <arg>
--worker_resources <arg>
--worker_launch_cmd <arg>
--worker_docker_image <arg>
--num_ps <arg>
--ps_resources <arg>
--ps_launch_cmd <arg>
--ps_docker_image <arg>
--localization <arg>
OR --f job.yaml
Run TensorBoard
Launch a TensorBoard instance
--checkpoint_path <arg>
--tensorboard_resources <arg>
--tensorboard_docker_image <arg>
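The job flags above can be assembled programmatically. The following is a minimal Python sketch, not Submarine's own tooling: it only builds an argument list from a job description, using the flag names shown on the slide. The function name `build_job_args`, the dictionary layout, and every example value (image name, resource string, launch command) are assumptions made for illustration. How the resulting list is handed to the Submarine launcher is deliberately left out.

```python
from typing import Dict, List, Union

# Flag names copied from the slide; everything else in this sketch is illustrative.
JOB_FLAGS = (
    "type",
    "num_workers", "worker_resources", "worker_launch_cmd", "worker_docker_image",
    "num_ps", "ps_resources", "ps_launch_cmd", "ps_docker_image",
    "localization",
)


def build_job_args(spec: Dict[str, Union[str, int]]) -> List[str]:
    """Turn a job description into a flat list of `--flag value` arguments."""
    unknown = set(spec) - set(JOB_FLAGS)
    if unknown:
        raise ValueError(f"unsupported options: {sorted(unknown)}")
    args: List[str] = []
    # Emit flags in the slide's order so the output is deterministic.
    for flag in JOB_FLAGS:
        if flag in spec:
            args += [f"--{flag}", str(spec[flag])]
    return args


# Hypothetical example values, not taken from the talk.
example = build_job_args({
    "type": "TensorFlow",
    "num_workers": 2,
    "worker_resources": "memory=4G,vcores=2",
    "worker_launch_cmd": "python train.py",
    "worker_docker_image": "example/tf-image",
})
print(" ".join(example))
```

Rejecting unknown keys early keeps a typo in the job description from silently dropping an option.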
NOW SUBMARINE CAN…
Run notebooks with the Submarine ecosystem
Streamline big data and ML jobs
-- Zeppelin Notebook
-- Zeppelin Submarine interpreter
  -- Shell
  -- Python
  -- Dashboard
Run natively on Kubernetes
Same interface as Submarine on YARN
-- submarine.runtime.class to enable K8s
-- More powerful service management
-- TensorFlow job, TensorBoard
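The idea of keeping one interface while switching the runtime through a configuration key can be sketched as follows. This is an illustrative Python sketch, not Submarine's implementation: the `Runtime` classes, the `RUNTIMES` registry, the short values `yarn` and `k8s`, and the YARN default are all hypothetical. Only the key name comes from the slide (lower-cased here).

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class Runtime(ABC):
    """Common interface: callers submit a job the same way on every backend."""

    @abstractmethod
    def submit(self, job_args: List[str]) -> str:
        ...


class YarnRuntime(Runtime):
    def submit(self, job_args: List[str]) -> str:
        # A real runtime would talk to YARN; this only describes the action.
        return "submit to YARN: " + " ".join(job_args)


class KubernetesRuntime(Runtime):
    def submit(self, job_args: List[str]) -> str:
        # A real runtime would talk to Kubernetes; this only describes the action.
        return "submit to Kubernetes: " + " ".join(job_args)


# Hypothetical registry of short runtime names (an assumption of this sketch).
RUNTIMES: Dict[str, Type[Runtime]] = {"yarn": YarnRuntime, "k8s": KubernetesRuntime}


def load_runtime(config: Dict[str, str]) -> Runtime:
    """Pick the runtime named by the config key, defaulting to YARN."""
    name = config.get("submarine.runtime.class", "yarn")
    try:
        return RUNTIMES[name]()
    except KeyError:
        raise ValueError(f"unknown runtime: {name!r}") from None


runtime = load_runtime({"submarine.runtime.class": "k8s"})
print(runtime.submit(["--num_workers", "2"]))
```

Because callers only see `Runtime.submit`, the same job description works on either backend, and changing one configuration value is enough to switch.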
NOW SUBMARINE WILL DO…
TonY Runtime on YARN
Integrate with LinkedIn’s TonY (TensorFlow on YARN)
-- PyTorch
-- Support for earlier Hadoop 2 versions
Model Management & Serving
Manage, verify and iterate on your models easily
-- Model CRUD
-- Model Deploy
-- Model Serving
DEMO
Hadoop Submarine on YARN
• This demo will
• Run a TensorFlow job
• Launch TensorBoard
DEMO
Hadoop Submarine on Kubernetes
• This demo will
• Run a TensorFlow job
• Launch TensorBoard
SUBMARINE ECOSYSTEM
Zeppelin Notebook with Submarine Interpreter
• “%submarine.shell” to play around in the container
• “%submarine.python” to write Python code
• “%submarine.dashboard” to submit a job
• Logging
• Tensorboard
• YARN UI
SUBMARINE ECOSYSTEM
User Experience: Work “Locally” in Zeppelin with Submarine Interpreter
SUBMARINE ECOSYSTEM
Deep Dive Demo: Zeppelin with Submarine Interpreter
• A Submarine interpreter is created automatically when the user asks for a Submarine note
SUBMARINE ECOSYSTEM
Deep Dive Demo: Zeppelin with Submarine Interpreter
“%submarine.shell”
SUBMARINE ECOSYSTEM
Deep Dive Demo: Zeppelin with Submarine Interpreter
“%submarine.python”
SUBMARINE ECOSYSTEM
User Experience: Submit Job to a cluster using Submarine Dashboard
SUBMARINE ECOSYSTEM
Deep Dive Demo: Submarine Dashboard in Zeppelin - Basics
“%submarine.dashboard”
- Introduction
SUBMARINE ECOSYSTEM
Deep Dive Demo: Submarine Dashboard in Zeppelin – Job Submission
“%submarine.dashboard”
- Submit TensorFlow job
SUBMARINE ECOSYSTEM
Deep Dive Demo: Submarine Dashboard in Zeppelin – Job Status
“%submarine.dashboard”
- Check job status
SUBMARINE ECOSYSTEM
User Experience: Access TensorBoard from Submarine Dashboard
SUBMARINE ECOSYSTEM
Deep Dive Demo: Access TensorBoard from Submarine Dashboard
“%submarine.dashboard”
HADOOP SUBMARINE – PRESENT STATUS
• Released v0.1.0 with the Apache Hadoop v3.2.0 release.
• Submarine is now a subproject of Hadoop!
  • Faster release cadence, independent of Hadoop releases
  • Room to build a larger ecosystem
  • Focused community development
HADOOP SUBMARINE – FUTURE
• Upcoming Hadoop Submarine v0.2.0 release
  • Expected within the next few weeks
• Upcoming features
  • Integrate LinkedIn’s TonY as a new runtime
  • Support for older Hadoop versions without Docker
  • PyTorch support alongside TensorFlow
  • New Kubernetes deployment support
  • Hyperparameter tuning
  • Model serving
HADOOP SUBMARINE – CASE STUDY
NetEase (NASDAQ: NTES)
• One of the largest online game/news/music providers in China.
• 6 Hadoop clusters, ~6k nodes in total.
• 100k jobs per day, 40% of which are Spark jobs.
• 1000 ML jobs per day.
  • They run in a separate GPU K8s cluster (~500 nodes); all data comes from HDFS and is processed by Spark, etc.
• Existing problems:
  • Low utilization (YARN tasks cannot leverage this cluster)
  • High maintenance cost (the separate cluster has to be managed)
• Working with the community on development; verifying Submarine on a GPU cluster of 200+ nodes
• Plans to move all ML workloads to Submarine in the future
HADOOP SUBMARINE – COMMUNITY
• Website: https://hadoop.apache.org/submarine/
• Weekly community meeting details:
https://docs.google.com/document/d/1VCtCvNH6Ew8psuzP1oNgeODpYkv2JN5_rsB7JJ0mgYs/edit?usp=sharing
• Code: https://github.com/apache/hadoop
• Apache Hadoop Submarine is a joint development program driven by the Hadoop community; quite a few companies (including Cloudera, NetEase, LinkedIn, Alibaba, Didi, and Huawei) are contributing.
THANK YOU

More Related Content

What's hot

Hadoop on Docker
Hadoop on DockerHadoop on Docker
Hadoop on DockerRakesh Saha
 
Bringing Real-Time to the Enterprise with Hortonworks DataFlow
Bringing Real-Time to the Enterprise with Hortonworks DataFlowBringing Real-Time to the Enterprise with Hortonworks DataFlow
Bringing Real-Time to the Enterprise with Hortonworks DataFlowDataWorks Summit
 
Hadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise HadoopHadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise HadoopYifeng Jiang
 
Using Apache Geode: Lessons Learned at Southwest Airlines
Using Apache Geode: Lessons Learned at Southwest AirlinesUsing Apache Geode: Lessons Learned at Southwest Airlines
Using Apache Geode: Lessons Learned at Southwest AirlinesVMware Tanzu
 
Building Effective Apache Geode Applications with Spring Data GemFire
Building Effective Apache Geode Applications with Spring Data GemFireBuilding Effective Apache Geode Applications with Spring Data GemFire
Building Effective Apache Geode Applications with Spring Data GemFireJohn Blum
 
Docker based Hadoop provisioning - anywhere
Docker based Hadoop provisioning - anywhereDocker based Hadoop provisioning - anywhere
Docker based Hadoop provisioning - anywhereDataWorks Summit
 
Improving HDFS Availability with IPC Quality of Service
Improving HDFS Availability with IPC Quality of ServiceImproving HDFS Availability with IPC Quality of Service
Improving HDFS Availability with IPC Quality of ServiceDataWorks Summit
 
What's new in Hadoop Yarn- Dec 2014
What's new in Hadoop Yarn- Dec 2014What's new in Hadoop Yarn- Dec 2014
What's new in Hadoop Yarn- Dec 2014InMobi Technology
 
The Future of Apache Ambari
The Future of Apache AmbariThe Future of Apache Ambari
The Future of Apache AmbariDataWorks Summit
 
Hadoop Cluster on Docker Containers
Hadoop Cluster on Docker ContainersHadoop Cluster on Docker Containers
Hadoop Cluster on Docker Containerspranav_joshi
 
20151027 sahara + manila final
20151027 sahara + manila final20151027 sahara + manila final
20151027 sahara + manila finalWei Ting Chen
 
20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on dockerWei Ting Chen
 
Docker based Hadoop provisioning - Hadoop Summit 2014
Docker based Hadoop provisioning - Hadoop Summit 2014 Docker based Hadoop provisioning - Hadoop Summit 2014
Docker based Hadoop provisioning - Hadoop Summit 2014 Janos Matyas
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016alanfgates
 
Apache Spark Operations
Apache Spark OperationsApache Spark Operations
Apache Spark OperationsCloudera, Inc.
 
Where to Deploy Hadoop: Bare Metal or Cloud?
Where to Deploy Hadoop: Bare Metal or Cloud? Where to Deploy Hadoop: Bare Metal or Cloud?
Where to Deploy Hadoop: Bare Metal or Cloud? DataWorks Summit
 
What s new in spark 2.3 and spark 2.4
What s new in spark 2.3 and spark 2.4What s new in spark 2.3 and spark 2.4
What s new in spark 2.3 and spark 2.4DataWorks Summit
 

What's hot (20)

Hadoop on Docker
Hadoop on DockerHadoop on Docker
Hadoop on Docker
 
Bringing Real-Time to the Enterprise with Hortonworks DataFlow
Bringing Real-Time to the Enterprise with Hortonworks DataFlowBringing Real-Time to the Enterprise with Hortonworks DataFlow
Bringing Real-Time to the Enterprise with Hortonworks DataFlow
 
Ansible + Hadoop
Ansible + HadoopAnsible + Hadoop
Ansible + Hadoop
 
Hadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise HadoopHadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise Hadoop
 
Using Apache Geode: Lessons Learned at Southwest Airlines
Using Apache Geode: Lessons Learned at Southwest AirlinesUsing Apache Geode: Lessons Learned at Southwest Airlines
Using Apache Geode: Lessons Learned at Southwest Airlines
 
Building Effective Apache Geode Applications with Spring Data GemFire
Building Effective Apache Geode Applications with Spring Data GemFireBuilding Effective Apache Geode Applications with Spring Data GemFire
Building Effective Apache Geode Applications with Spring Data GemFire
 
Docker based Hadoop provisioning - anywhere
Docker based Hadoop provisioning - anywhereDocker based Hadoop provisioning - anywhere
Docker based Hadoop provisioning - anywhere
 
Improving HDFS Availability with IPC Quality of Service
Improving HDFS Availability with IPC Quality of ServiceImproving HDFS Availability with IPC Quality of Service
Improving HDFS Availability with IPC Quality of Service
 
What's new in Hadoop Yarn- Dec 2014
What's new in Hadoop Yarn- Dec 2014What's new in Hadoop Yarn- Dec 2014
What's new in Hadoop Yarn- Dec 2014
 
The Future of Apache Ambari
The Future of Apache AmbariThe Future of Apache Ambari
The Future of Apache Ambari
 
Hadoop Cluster on Docker Containers
Hadoop Cluster on Docker ContainersHadoop Cluster on Docker Containers
Hadoop Cluster on Docker Containers
 
Migrating pipelines into Docker
Migrating pipelines into DockerMigrating pipelines into Docker
Migrating pipelines into Docker
 
20151027 sahara + manila final
20151027 sahara + manila final20151027 sahara + manila final
20151027 sahara + manila final
 
20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker
 
Docker based Hadoop provisioning - Hadoop Summit 2014
Docker based Hadoop provisioning - Hadoop Summit 2014 Docker based Hadoop provisioning - Hadoop Summit 2014
Docker based Hadoop provisioning - Hadoop Summit 2014
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
 
Kafka Security
Kafka SecurityKafka Security
Kafka Security
 
Apache Spark Operations
Apache Spark OperationsApache Spark Operations
Apache Spark Operations
 
Where to Deploy Hadoop: Bare Metal or Cloud?
Where to Deploy Hadoop: Bare Metal or Cloud? Where to Deploy Hadoop: Bare Metal or Cloud?
Where to Deploy Hadoop: Bare Metal or Cloud?
 
What s new in spark 2.3 and spark 2.4
What s new in spark 2.3 and spark 2.4What s new in spark 2.3 and spark 2.4
What s new in spark 2.3 and spark 2.4
 

Similar to Hadoop {Submarine} Project: Running Deep Learning Workloads on YARN

Yarns about YARN: Migrating to MapReduce v2
Yarns about YARN: Migrating to MapReduce v2Yarns about YARN: Migrating to MapReduce v2
Yarns about YARN: Migrating to MapReduce v2DataWorks Summit
 
Nebulaworks Docker Overview 09-22-2015
Nebulaworks Docker Overview 09-22-2015Nebulaworks Docker Overview 09-22-2015
Nebulaworks Docker Overview 09-22-2015Chris Ciborowski
 
Emerging trends in data analytics
Emerging trends in data analyticsEmerging trends in data analytics
Emerging trends in data analyticsWei-Chiu Chuang
 
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Stefan Lipp
 
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Data Con LA
 
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Yahoo Developer Network
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Cloudera, Inc.
 
COP_RoR_QuArrk_Session_Oct_2022.pptx
COP_RoR_QuArrk_Session_Oct_2022.pptxCOP_RoR_QuArrk_Session_Oct_2022.pptx
COP_RoR_QuArrk_Session_Oct_2022.pptxNitesh95975
 
Copr HD OpenStack Day India
Copr HD OpenStack Day IndiaCopr HD OpenStack Day India
Copr HD OpenStack Day Indiaopenstackindia
 
Using Docker For Development
Using Docker For DevelopmentUsing Docker For Development
Using Docker For DevelopmentLaura Frank Tacho
 
Mythical Mysfits: Monolith to Microservice with Docker and AWS Fargate (CON21...
Mythical Mysfits: Monolith to Microservice with Docker and AWS Fargate (CON21...Mythical Mysfits: Monolith to Microservice with Docker and AWS Fargate (CON21...
Mythical Mysfits: Monolith to Microservice with Docker and AWS Fargate (CON21...Amazon Web Services
 
Kubernetes for java developers - Tutorial at Oracle Code One 2018
Kubernetes for java developers - Tutorial at Oracle Code One 2018Kubernetes for java developers - Tutorial at Oracle Code One 2018
Kubernetes for java developers - Tutorial at Oracle Code One 2018Anthony Dahanne
 
New Repository in AEM 6 by Michael Marth
New Repository in AEM 6 by Michael MarthNew Repository in AEM 6 by Michael Marth
New Repository in AEM 6 by Michael MarthAEM HUB
 
Data Science and Machine Learning for the Enterprise
Data Science and Machine Learning for the EnterpriseData Science and Machine Learning for the Enterprise
Data Science and Machine Learning for the EnterpriseCloudera, Inc.
 
Kubernetes Java Operator
Kubernetes Java OperatorKubernetes Java Operator
Kubernetes Java OperatorAnthony Dahanne
 
ActiveMQ Performance Tuning
ActiveMQ Performance TuningActiveMQ Performance Tuning
ActiveMQ Performance TuningChristian Posta
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform WebinarCloudera, Inc.
 

Similar to Hadoop {Submarine} Project: Running Deep Learning Workloads on YARN (20)

Yarns about YARN: Migrating to MapReduce v2
Yarns about YARN: Migrating to MapReduce v2Yarns about YARN: Migrating to MapReduce v2
Yarns about YARN: Migrating to MapReduce v2
 
Nebulaworks Docker Overview 09-22-2015
Nebulaworks Docker Overview 09-22-2015Nebulaworks Docker Overview 09-22-2015
Nebulaworks Docker Overview 09-22-2015
 
Kafka for DBAs
Kafka for DBAsKafka for DBAs
Kafka for DBAs
 
Emerging trends in data analytics
Emerging trends in data analyticsEmerging trends in data analytics
Emerging trends in data analytics
 
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
 
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
 
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 

 
COP_RoR_QuArrk_Session_Oct_2022.pptx
COP_RoR_QuArrk_Session_Oct_2022.pptxCOP_RoR_QuArrk_Session_Oct_2022.pptx
COP_RoR_QuArrk_Session_Oct_2022.pptx
 
Copr HD OpenStack Day India
Copr HD OpenStack Day IndiaCopr HD OpenStack Day India
Copr HD OpenStack Day India
 
Using Docker For Development
Using Docker For DevelopmentUsing Docker For Development
Using Docker For Development
 
YARN
YARNYARN
YARN
 
Effective Spark on Multi-Tenant Clusters
Effective Spark on Multi-Tenant ClustersEffective Spark on Multi-Tenant Clusters
Effective Spark on Multi-Tenant Clusters
 
Mythical Mysfits: Monolith to Microservice with Docker and AWS Fargate (CON21...
Mythical Mysfits: Monolith to Microservice with Docker and AWS Fargate (CON21...Mythical Mysfits: Monolith to Microservice with Docker and AWS Fargate (CON21...
Mythical Mysfits: Monolith to Microservice with Docker and AWS Fargate (CON21...
 
Kubernetes for java developers - Tutorial at Oracle Code One 2018
Kubernetes for java developers - Tutorial at Oracle Code One 2018Kubernetes for java developers - Tutorial at Oracle Code One 2018
Kubernetes for java developers - Tutorial at Oracle Code One 2018
 
New Repository in AEM 6 by Michael Marth
New Repository in AEM 6 by Michael MarthNew Repository in AEM 6 by Michael Marth
New Repository in AEM 6 by Michael Marth
 
Data Science and Machine Learning for the Enterprise
Data Science and Machine Learning for the EnterpriseData Science and Machine Learning for the Enterprise
Data Science and Machine Learning for the Enterprise
 
Kubernetes Java Operator
Kubernetes Java OperatorKubernetes Java Operator
Kubernetes Java Operator
 
ActiveMQ Performance Tuning
ActiveMQ Performance TuningActiveMQ Performance Tuning
ActiveMQ Performance Tuning
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform Webinar
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements



Hadoop {Submarine} Project: Running Deep Learning Workloads on YARN

  • 1. © Cloudera, Inc. All rights reserved. HADOOP {SUBMARINE} PROJECT: RUNNING DEEP LEARNING WORKLOADS ON YARN & KUBERNETES Sunil Govindan & Zhankun Tang
  • 2. SPEAKERS • Sunil Govindan: Apache Hadoop PMC Member & Committer, Staff Engineer at Cloudera Inc. • Zhankun Tang: Apache Hadoop Committer, Staff Engineer at Cloudera Inc.
  • 3. AGENDA • Why did we initiate the Hadoop Submarine project? • What can Hadoop Submarine offer? • Demo • Hadoop Submarine Ecosystem • Present & Future
  • 4. PAIN POINTS OF ML ENGINEERS AND DATA SCIENTISTS • “Hidden Technical Debt in Machine Learning Systems”, Google
  • 5. PAIN POINTS OF ML ENGINEERS AND DATA SCIENTISTS • No cross-platform solution to streamline the ML lifecycle • Diagram: storage (Amazon S3, HDFS); lifecycle stages (Data Engineering, Deep Learning, Model Management, Model Serving); Submarine + Ecosystem running on YARN & Kubernetes
  • 6. BEGINNING: HOW HAS HADOOP HELPED ML WORKLOADS? • Hadoop HDFS is widely adopted storage • Hadoop YARN is the best orchestration for big data workloads so far • Enterprise-level scheduler: 3K+ containers per second • Docker support: no need to worry about complex environments • GPU/FPGA resources: faster than CPU • GPU topology scheduling: same GPU count but 3X speed (best case) • YARN Native Service: for long-running services, DNS, etc. • Hadoop ecosystem for better data engineering: Spark, Hive, Flink
  • 7. AGENDA • Why did we initiate the Hadoop Submarine project? • What can Hadoop Submarine offer? • Demo • Hadoop Submarine Ecosystem • Present & Future
  • 8. NOW SUBMARINE CAN… • Run an ML job: launch an ML job without modifying the algorithm, using --type “TensorFlow” or “pyTorch”, --num_workers <arg>, --worker_resources <arg>, --worker_launch_cmd <arg>, --worker_docker_image <arg>, --num_ps <arg>, --ps_resources <arg>, --ps_launch_cmd <arg>, --ps_docker_image <arg>, --localization <arg>, OR --f job.yaml • Run TensorBoard: launch a TensorBoard using --checkpoint_path <arg>, --tensorboard_resources <arg>, --tensorboard_docker_image <arg>
  • 9. NOW SUBMARINE CAN… • Run notebooks with the Submarine ecosystem: streamline big data and ML jobs; Zeppelin Notebook; Zeppelin Submarine interpreter (Shell, Python, Dashboard) • Run natively on Kubernetes: same interface as Submarine on YARN; Submarine.runtime.class to enable K8s; more powerful service management; TensorFlow job, TensorBoard
  • 10. NOW SUBMARINE WILL DO… • TonY runtime on YARN: integrate with LinkedIn’s TonY (TensorFlow on YARN); PyTorch; support for earlier Hadoop 2 versions • Model management & serving: manage, verify and iterate on your models easily; model CRUD; model deploy; model serving
  • 11. AGENDA • Why did we initiate the Hadoop Submarine project? • What can Hadoop Submarine offer? • Demo • Hadoop Submarine Ecosystem • Present & Future
  • 12. DEMO: Hadoop Submarine on YARN • This demo will run a TensorFlow job and launch TensorBoard
  • 13. DEMO: Hadoop Submarine on Kubernetes • This demo will run a TensorFlow job and launch TensorBoard
  • 14. AGENDA • Why did we initiate the Hadoop Submarine project? • What can Hadoop Submarine offer? • Demo • Hadoop Submarine Ecosystem • Present & Future
  • 15. SUBMARINE ECOSYSTEM: Zeppelin Notebook with Submarine Interpreter • “%submarine.shell” to play around in the container • “%submarine.python” to write Python code • “%submarine.dashboard” to submit a job • Logging • TensorBoard • YARN UI
  • 16. SUBMARINE ECOSYSTEM: User Experience: Work “Locally” in Zeppelin with Submarine Interpreter
  • 17. SUBMARINE ECOSYSTEM: Deep Dive Demo: Zeppelin with Submarine Interpreter • The Submarine Interpreter is created automatically when the user creates a Submarine note
  • 18. SUBMARINE ECOSYSTEM: Deep Dive Demo: Zeppelin with Submarine Interpreter: “%submarine.shell”
  • 19. SUBMARINE ECOSYSTEM: Deep Dive Demo: Zeppelin with Submarine Interpreter: “%submarine.python”
  • 20. SUBMARINE ECOSYSTEM: User Experience: Submit a Job to a Cluster Using Submarine Dashboard
  • 21. SUBMARINE ECOSYSTEM: Deep Dive Demo: Submarine Dashboard in Zeppelin – Basics: “%submarine.dashboard” - Introduction
  • 22. SUBMARINE ECOSYSTEM: Deep Dive Demo: Submarine Dashboard in Zeppelin – Job Submission: “%submarine.dashboard” - Submit TensorFlow job
  • 23. SUBMARINE ECOSYSTEM: Deep Dive Demo: Submarine Dashboard in Zeppelin – Job Status: “%submarine.dashboard” - Check job status
  • 24. SUBMARINE ECOSYSTEM: User Experience: Access TensorBoard from Submarine Dashboard
  • 25. SUBMARINE ECOSYSTEM: Deep Dive Demo: Access TensorBoard from Submarine Dashboard: “%submarine.dashboard”
  • 26. AGENDA • Why did we initiate the Hadoop Submarine project? • What can Hadoop Submarine offer? • Demo • Hadoop Submarine Ecosystem • Present & Future
  • 27. HADOOP SUBMARINE – PRESENT STATUS • Released v0.1.0 with the Apache Hadoop v3.2.0 release • Submarine is now a subproject under Hadoop, which allows: • A faster release cadence, independent of Hadoop releases • Building a larger ecosystem • Focused community development
  • 28. HADOOP SUBMARINE – FUTURE • Upcoming Hadoop Submarine v0.2.0 release, expected in the next few weeks • Upcoming features: • Integrate LinkedIn’s TonY as a new runtime • Support for older Hadoop versions without Docker • PyTorch integration alongside TensorFlow • New Kubernetes deployment support • Hyperparameter tuning • Model serving
  • 29. HADOOP SUBMARINE – CASE STUDY: Netease (NASDAQ: NTES) • One of the largest online game/news/music providers in China • 6 Hadoop clusters, ~6k nodes in total • 100k jobs per day, 40% of which are Spark jobs • 1000 ML jobs per day • ML jobs run in a separate GPU K8s cluster (~500 nodes); all data comes from HDFS and is processed by Spark, etc. • Existing problems: • Low utilization (YARN tasks cannot leverage this cluster) • High maintenance cost (the separate cluster must be managed) • Working with the community to develop and verify Submarine on a 200+ node GPU cluster • Plans to move all ML workloads to Submarine in the future
  • 30. HADOOP SUBMARINE – COMMUNITY • Website: https://hadoop.apache.org/submarine/ • Weekly community meeting details: https://docs.google.com/document/d/1VCtCvNH6Ew8psuzP1oNgeODpYkv2JN5_rsB7JJ0mgYs/edit?usp=sharing • Code: https://github.com/apache/hadoop • Apache Hadoop Submarine is a joint development program driven by the Hadoop community, and quite a few companies (like Cloudera, Netease, LinkedIn, Alibaba, Didi, Huawei, etc.) are contributing.
  • 31. THANK YOU
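
The job-launch options listed on slide 8 can be pictured as a flat argument list. The following is a minimal Python sketch, not the project's own tooling: the flag names are copied from the slide and may differ between Submarine releases, while the function name `build_job_args`, the Docker image name, and the resource and command strings are hypothetical placeholders. The sketch only assembles the arguments; the launcher command that would precede them is deliberately left out because the slides do not show it.

```python
import shlex
from typing import List, Optional


def build_job_args(
    job_type: str,
    num_workers: int,
    worker_resources: str,
    worker_launch_cmd: str,
    worker_docker_image: str,
    num_ps: int = 0,
    ps_resources: Optional[str] = None,
    ps_launch_cmd: Optional[str] = None,
    ps_docker_image: Optional[str] = None,
) -> List[str]:
    """Assemble the worker and parameter-server flags named on slide 8.

    Flag spellings follow the slide; they are not checked against any
    particular Submarine release.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    if num_ps < 0:
        raise ValueError("num_ps must not be negative")

    args = [
        "--type", job_type,
        "--num_workers", str(num_workers),
        "--worker_resources", worker_resources,
        "--worker_launch_cmd", worker_launch_cmd,
        "--worker_docker_image", worker_docker_image,
    ]

    # Parameter-server flags are added only when parameter servers are requested.
    if num_ps > 0:
        args += ["--num_ps", str(num_ps)]
        optional_ps_flags = (
            ("--ps_resources", ps_resources),
            ("--ps_launch_cmd", ps_launch_cmd),
            ("--ps_docker_image", ps_docker_image),
        )
        for flag, value in optional_ps_flags:
            if value is not None:
                args += [flag, value]
    return args


if __name__ == "__main__":
    # All values below are placeholders chosen for illustration.
    example = build_job_args(
        job_type="TensorFlow",
        num_workers=2,
        worker_resources="<worker resources>",
        worker_launch_cmd="python train.py",
        worker_docker_image="example/tf-image",
        num_ps=1,
        ps_resources="<ps resources>",
        ps_launch_cmd="python train.py",
        ps_docker_image="example/tf-image",
    )
    # shlex.quote keeps multi-word values such as the launch command intact.
    print(" ".join(shlex.quote(arg) for arg in example))
```

Keeping the arguments as a list rather than a single string avoids quoting mistakes when a launch command contains spaces. The list can be passed to a process launcher once the actual Submarine entry point for the target release is known.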