Verizon confidential and proprietary. Unauthorized disclosure, reproduction or other use prohibited.
TensorFlow on Spark:
A Deep Dive into
Distributed Deep Learning
DataCon.TW 2020
Evans Ye, Verizon Media
Evans Ye
Engineering Manager @ Verizon Media
● Use data to power advertising/eCommerce experience.
● Build next-gen Big Data & ML/AI solutions.
ASF Member @ Apache Software Foundation
● Spread the Apache way.
Apache Bigtop former VP, PMC member, Committer
● Drive project direction, build community, mentor new committers.
Director of Taiwan Data Engineering Association (TDEA)
● Promote OSS & data engineering technologies.
Agenda
1. Why Distributed Deep Learning?
2. Solution at Verizon Media
3. Distributed Deep Learning
4. Lightweight Distributed Deep Learning on PySpark
5. Recap & Future Work
Why Distributed
Deep Learning?
Industry Trend
OpenAI blog post: AI and Compute
● Computation needs are increasing
drastically!
Before 2012:
● uncommon to use GPUs for ML
2012 to 2014:
● 1-8 GPUs rated at 1-2 TFLOPS
2014 to 2016:
● 10-100 GPUs rated at 5-10 TFLOPS
Applying DL w/ GPUs in Enterprise
Deep Learning requires both big data & computing power (GPUs).
Data has gravity
● A dedicated GPU cluster poses a problem for data migration.
[Figure: three-step pipeline: 1. Prepare data, 2. DL training, 3. Inferencing (labels: data, model)]
Solution at
Verizon Media
Yahoo! TensorFlowOnSpark
(Open-sourced Feb. 2017)
A framework that creates a TensorFlow cluster on Spark and feeds it the data for training.
Yahoo! Developer blog post for TFoS
Input Modes
InputMode.SPARK
● HDFS → RDD.mapPartitions → TF worker (push mode)
InputMode.TENSORFLOW
● TF worker ← tf.data ← HDFS (pull mode)
What’s the Difference?
InputMode.SPARK
● Data is proxied through a Spark RDD, hence slower.
○ Supports any data that can be loaded as an RDD.
● The TF worker runs in the background, so failures happen behind the scenes.
InputMode.TENSORFLOW
● Data is fetched from HDFS directly, hence faster.
○ Supports TFRecords.
● The TF worker runs in the foreground. Failures are raised and retried by Spark.
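As a rough illustration of the push/pull difference, the sketch below mimics both modes in plain Python. It does not use Spark or TensorFlow: push_mode, pull_mode, and the in-memory records list are hypothetical stand-ins, with a queue playing the role of the RDD feed and the list playing the role of HDFS.

```python
import queue
import threading

SENTINEL = None  # marks the end of the pushed stream (records must not be None)


def push_mode(records):
    """Push: a feeder (standing in for Spark) hands records to the worker via a queue."""
    feed = queue.Queue(maxsize=4)
    consumed = []

    def worker():
        # The worker runs in the background and only sees what is pushed to it.
        while True:
            item = feed.get()
            if item is SENTINEL:
                break
            consumed.append(item)

    thread = threading.Thread(target=worker)
    thread.start()
    for record in records:  # extra hop: source -> feeder -> queue -> worker
        feed.put(record)
    feed.put(SENTINEL)
    thread.join()
    return consumed


def pull_mode(records):
    """Pull: the worker reads the source directly, with no proxy in between."""
    return [record for record in records]


records = list(range(10))
print(push_mode(records) == pull_mode(records))
```

Both paths deliver the same records; the push path simply adds a hop and a background consumer, which mirrors the trade-off listed for InputMode.SPARK.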
TFoS API Example
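from tensorflowonspark import TFCluster

# sc, main_fun, args and images_labels are defined elsewhere in the program.
# Launch a TensorFlow cluster on the Spark executors.
cluster = TFCluster.run(sc, main_fun, args,
                        args.cluster_size, args.num_ps,
                        tensorboard=args.tensorboard,
                        input_mode=TFCluster.InputMode.SPARK,
                        master_node='chief')
# InputMode.SPARK only: feed the RDD to the TF workers.
cluster.train(images_labels, args.epochs)
cluster.inference(images_labels)
cluster.shutdown()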
Get Data w/ InputMode.SPARK
Get Data w/ InputMode.TENSORFLOW
TFoS Advantages
● Easily migrate existing TensorFlow programs with <10 lines of code
change.
● Support all TensorFlow functionalities: synchronous/asynchronous
training, model/data parallelism, inferencing and TensorBoard.
● Allow datasets on HDFS and other sources pushed by Spark or pulled by
TensorFlow.
● Easily integrate with your existing Spark data processing pipelines.
● Easily deployed on cloud or on-premise and on CPUs or GPUs.
* Ref: https://github.com/yahoo/TensorFlowOnSpark
More About TFoS
Spark Summit Talks:
● TensorFlow On Spark: Scalable TensorFlow Learning on Spark Clusters
● TensorFlowOnSpark Enhanced: Scala, Pipelines, and Beyond
GitHub:
● https://github.com/yahoo/TensorFlowOnSpark
● TFoS 1.X Keras Example
● TFoS 2.X InputMode.TENSORFLOW Keras Example
● TFoS 2.X InputMode.SPARK Keras Example
Distributed Deep Learning
Types of Parallelism
Data Parallelism
● Each worker trains on different data pieces.
● Sync/async approaches to update the parameters w/ gradients.
● The entire model must fit into a single GPU’s memory.
Model Parallelism
● Ex: to train a 6-layer model,
assign the first 3 layers to worker0 and the last 3 layers to worker1.
● Used when the model can’t fit into a single GPU’s memory.
Hybrid approach
● Data parallelism between nodes, model parallelism between GPUs.
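A minimal sketch of synchronous data parallelism, assuming a toy 1-D linear model and equally sized shards (the function names and the data are made up for illustration). Each simulated worker computes a gradient on its own shard, and the averaged gradient matches what a single worker would compute on the full batch.

```python
def gradient(w, batch):
    """Gradient of the mean squared error for the 1-D model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def data_parallel_step(w, batch, num_workers, lr):
    """Split the batch across workers, average their gradients, apply one update.

    Assumes len(batch) is divisible by num_workers so all shards have equal size.
    """
    shard_size = len(batch) // num_workers
    shards = [batch[i * shard_size:(i + 1) * shard_size] for i in range(num_workers)]
    grads = [gradient(w, shard) for shard in shards]  # one gradient per worker
    avg_grad = sum(grads) / num_workers               # the synchronization step
    return w - lr * avg_grad


data = [(x, 3.0 * x) for x in range(1, 9)]  # the true weight is 3.0
w = 0.0
for _ in range(200):
    w = data_parallel_step(w, data, num_workers=4, lr=0.01)
print(round(w, 3))
```

Model parallelism would instead split the layers of one model across workers, which this sketch does not attempt to show.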
Asynchronous Parameter Server
● Each worker computes the gradient and sends the delta to the PS for updates.
● The updated parameters are then pulled by the worker for the next training step.
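The pull/compute/push loop described above can be sketched with threads standing in for workers. This is an assumption-laden toy: the ParameterServer class, the 1-D model, and the data are hypothetical, and a real parameter server would shard parameters across machines.

```python
import threading


class ParameterServer:
    """Holds the shared weight; workers pull it and push gradients."""

    def __init__(self, w, lr):
        self.w = w
        self.lr = lr
        self._lock = threading.Lock()

    def pull(self):
        with self._lock:
            return self.w

    def push(self, grad):
        # Updates are applied in whatever order they arrive.
        with self._lock:
            self.w -= self.lr * grad


def worker(ps, shard, steps):
    for _ in range(steps):
        w = ps.pull()  # may already be outdated by the time the gradient is pushed
        grad = sum(2 * (w * x - y) * x for x, y in shard) / len(shard)
        ps.push(grad)


data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]  # the true weight is 3.0
ps = ParameterServer(w=0.0, lr=0.05)
threads = [threading.Thread(target=worker, args=(ps, data[i::2], 200)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(round(ps.pull(), 2))
```

No worker waits for another, so the order of updates depends on thread scheduling, yet the weight still settles near the true value in this small example.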
Asynchronous Parameter Server
Asynchronous → Inconsistency
● Different workers may update the parameters at the same time. Since
they act asynchronously, there’s no guaranteed order.
Large Scale Distributed Deep Networks, Google, 2012
● “In practice we found relaxing consistency requirements to be remarkably
effective.”
● Additional stochasticity.
Asynchronous Parameter Server
Pros:
● Scalable: each worker works independently.
● Robust to machine failures.
Cons:
● Workers may compute gradients based on stale weights, which delays
convergence.
Suitable for a large number of less powerful devices and for dynamic
environments in which preemption can happen.
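The cost of stale weights can be illustrated with a toy simulation, assuming a quadratic loss f(w) = w**2 / 2 and a fixed gradient delay. Both are simplifications chosen for illustration, not a model of any real cluster, and total_error is a hypothetical helper.

```python
def total_error(staleness, steps=60, lr=0.3):
    """Minimize f(w) = w**2 / 2 (gradient = w) with gradients that are `staleness` steps old.

    Returns the sum of |w| over the run; a larger value means slower convergence.
    """
    history = [1.0]  # w_0 = 1.0, the optimum is w = 0
    for t in range(steps):
        stale_w = history[max(0, t - staleness)]  # the weights the worker last pulled
        history.append(history[-1] - lr * stale_w)
    return sum(abs(w) for w in history)


fresh = total_error(staleness=0)
stale = total_error(staleness=3)
print(fresh < stale)
```

With up-to-date gradients the weight shrinks steadily toward zero; with delayed gradients it overshoots and oscillates before settling, so the accumulated error is larger.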
Synchronous AllReduce
● Each worker has its own copy of the model parameters
and computes gradients separately.
● All workers synchronize their gradients with
each other.
● The next training step begins after all workers
have updated the model.
Ring AllReduce
Baidu, 2017
● Bringing HPC Techniques to Deep Learning
Uber, 2018
● Horovod: fast and easy distributed deep
learning in TensorFlow
Each worker sends gradients to its successor
and receives gradients from its predecessor.
● Uses both upload & download bandwidth at
the same time, hence the communication
time is optimized.
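The ring exchange can be simulated in a few lines. The sketch below assumes the usual two-phase formulation (scatter-reduce, then all-gather), with one number per chunk and as many chunks as workers; ring_allreduce is a hypothetical helper, not an API of Horovod or NCCL.

```python
def ring_allreduce(grads):
    """Sum per-worker gradient vectors with a simulated ring allreduce.

    grads[i] is worker i's gradient, split into len(grads) chunks (one number each).
    Returns every worker's buffer after the exchange; all buffers hold the full sum.
    """
    n = len(grads)
    data = [list(g) for g in grads]  # each worker's local buffer

    # Phase 1, scatter-reduce: after n-1 steps worker i owns the full sum of chunk (i+1) % n.
    for step in range(n - 1):
        sends = [(i, (i - step) % n, data[i][(i - step) % n]) for i in range(n)]
        for i, chunk, value in sends:  # all workers send and receive in the same step
            data[(i + 1) % n][chunk] += value

    # Phase 2, all-gather: circulate the finished chunks around the ring.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, data[i][(i + 1 - step) % n]) for i in range(n)]
        for i, chunk, value in sends:
            data[(i + 1) % n][chunk] = value

    return data


print(ring_allreduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]]))
```

In every step each worker sends one chunk and receives one chunk, which is why upload and download bandwidth are used at the same time.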
Synchronous AllReduce
Pros:
● Faster convergence w/ powerful devices & strong communication links.
Cons:
● Synchronous by design, hence vulnerable to failures.
● Not suitable for devices with different computing power or bandwidth.
Suitable for multiple GPUs on a single machine or a small number of machines.
Lightweight
Distributed Deep Learning on
PySpark
Rethink Distributed Training
Engineering side:
● Single-node, multi-GPU training w/ synchronous allreduce can lead to
faster convergence.
● A TensorFlow cluster is required only for multi-node training.
Science side:
● Huge amounts of labeled training data are not easy to get.
● In practice, it is common to take a well-trained model and fine-tune it
instead of training from scratch.
What if Only Single Node Distributed Training is
Supported?
● No need to spin up a TensorFlow cluster.
● The code can be simplified w/o coupling to a clustering framework, making it
easier to deploy and test.
● Failure discovery and handling can be simplified.
● Single node, multi-GPU training is supported by
MirroredStrategy(NcclAllReduce) w/ Keras API.
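A minimal sketch of single-node, multi-GPU training with the Keras API, assuming TensorFlow 2.x is installed. The function name, the tiny model, and the hyperparameters are placeholders for illustration and are not taken from the talk.

```python
def train_on_local_gpus(train_x, train_y, epochs=1):
    """Train a small Keras model on all GPUs of one machine (CPU is used if there are none).

    train_x is expected to be a 2-D NumPy array, train_y a matching 1-D array.
    """
    import tensorflow as tf  # imported lazily so this module loads without TensorFlow

    # MirroredStrategy keeps one model replica per local GPU and all-reduces the gradients.
    # NCCL can be requested explicitly on GPU hosts with
    # tf.distribute.MirroredStrategy(cross_device_ops=tf.distribute.NcclAllReduce()).
    strategy = tf.distribute.MirroredStrategy()
    with strategy.scope():
        # Model creation and compile() must happen inside the strategy scope.
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(train_x.shape[1],)),
            tf.keras.layers.Dense(16, activation="relu"),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")
    model.fit(x=train_x, y=train_y, epochs=epochs, batch_size=64, verbose=0)
    return model
```

Nothing here needs a cluster spec or a launcher, so the same function can be run from a notebook, a unit test, or a Spark driver.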
Introducing a Simple, yet
Powerful Solution that Leverages
Several PySpark tricks.
Lightweight Distributed Training Architecture
PySpark Preprocessing
● Leverage Spark for distributed preprocessing.
○ spark.sql or spark.read to load data.
● Collect data back to driver for training.
○ df.toPandas()
Multi-GPU Training (Driver)
● Small data
model.fit(x=train_x, y=train_y, ...)
● Data can’t fit into GPU memory
model.fit(x=generator, ...)
[Diagram label: HDFS]
Huge Data Training is Supported (up to 1B records)
Generator + Spark df.toLocalIterator() to collect the data sequentially.
● it = df.toLocalIterator()
● record = next(it)
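One way to turn such a record iterator into training batches is sketched below. It assumes each record is a (feature_vector, label) pair; batch_generator and the sample records are hypothetical, and a plain Python iterator stands in for df.toLocalIterator() so the example runs without Spark.

```python
from itertools import islice


def batch_generator(record_iter, batch_size):
    """Yield (features, labels) batches from an iterable of (feature_vector, label) records."""
    record_iter = iter(record_iter)  # make sure islice advances instead of restarting
    while True:
        chunk = list(islice(record_iter, batch_size))
        if not chunk:
            return  # source exhausted
        features = [record[0] for record in chunk]
        labels = [record[1] for record in chunk]
        yield features, labels


# Stand-in for df.toLocalIterator(); a Spark Row can be indexed by position like a tuple.
records = iter([([float(i), float(i) * 2], i % 2) for i in range(5)])
for features, labels in batch_generator(records, batch_size=2):
    print(len(features), labels)
```

Only one batch is held in Python memory at a time. Before being passed to model.fit, each batch would typically be converted to NumPy arrays, and the generator would need to be recreated or looped for every epoch.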
Performance
Taking one of our production models as an example:
[Chart: Multi-GPU AllReduce vs. Multi-Node Parameter Server]
Lightweight Parallel Predicting Architecture
Preprocessing
● Leverage Spark for distributed preprocessing.
○ spark.sql or spark.read to load data.
Pandas UDF Predict (Executors)
● Data are handed over to Python in Pandas
DataFrame format.
● In the Python UDF, predict with a simple Keras API call.
● The resulting DataFrames are then passed back as
a Spark DataFrame.
● df.write or other post-processing if needed.
[Diagram label: HDFS]
PySpark Pandas UDF
Pandas Grouped Map UDF (Spark 2.4)
● The UDF leverages Apache Arrow for efficient JVM <-> Python SerDe.
● Group by a uniformly distributed random ID so that data is evenly grouped
across partitions:
● Make predictions in the UDF:
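The two steps above could look roughly like the following with the Spark 2.4 grouped-map API. This is a sketch under assumptions: the function name, the gid and prediction column names, and the parameters are hypothetical, the model is assumed to output one value per row, and model_path must be readable from every executor. On Spark 3.x, applyInPandas is the preferred way to express the same thing.

```python
def predict_with_grouped_map(df, model_path, feature_cols, num_groups=100):
    """Score a Spark DataFrame with a Keras model inside a grouped-map Pandas UDF."""
    # Imported lazily so this module loads on machines without Spark installed.
    from pyspark.sql import functions as F
    from pyspark.sql.functions import PandasUDFType, pandas_udf
    from pyspark.sql.types import DoubleType, StructField, StructType

    # Step 1: a uniformly distributed random group ID, so groups are roughly equal in size.
    with_gid = df.withColumn("gid", (F.rand() * num_groups).cast("int"))

    # Output schema: every input column plus the prediction column.
    out_schema = StructType(
        with_gid.schema.fields + [StructField("prediction", DoubleType())]
    )

    # Step 2: each group arrives as a pandas DataFrame and is scored with Keras.
    @pandas_udf(out_schema, PandasUDFType.GROUPED_MAP)
    def predict(pdf):
        import tensorflow as tf

        model = tf.keras.models.load_model(model_path)  # loaded once per group
        preds = model.predict(pdf[feature_cols].values)
        return pdf.assign(prediction=preds.reshape(-1).astype("float64"))

    return with_gid.groupby("gid").apply(predict).drop("gid")
```

The number of groups trades off parallelism against the per-group cost of loading the model, so it would need tuning for a given dataset and cluster.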
Predicting Performance
● CPU prediction scales well as more
resources are added.
● Though GPU prediction is slightly
faster, CPU prediction is more scalable
because of the number of CPUs available.
● The solution is capable of predicting up
to 1B records.
Summary
Productivity
● The PySpark-based solution is lightweight and easy to test in a local IDE.
No need to spin up a cluster.
Flexibility
● The trainer/predictor implementations are decoupled from the framework and can be
run independently.
○ Ex: run on a local machine w/o Spark.
● The solution can support other frameworks such as PyTorch, XGBoost, etc.
Efficiency
● Cross-language SerDes (toPandas, Pandas UDF) are optimized by the
PySpark Arrow integration.
Recap & Future Work
Recap
● TFoS is a comprehensive solution for distributed deep learning.
● Distributed deep learning can be achieved in a more lightweight manner
with just “TensorFlow on Spark”, which increases productivity.
● Know the details underneath. The problem with your training may be an
architectural one:
Types of Parallelism: Data Parallelism | Model Parallelism
Training Approaches: Async Parameter Server | Sync AllReduce
Future Work
● Petastorm for efficient DL with Parquet file format.
● Horovod to support multi-framework distributed deep learning.
● More efficient AllReduce:
○ NCCL topology optimizations.
○ Blink: Fast and Generic Collectives for Distributed ML.
● Other related areas that are also rapidly developing:
○ ML Lifecycle: MLflow, Kubeflow.
○ Feature Store: Michelangelo.
○ GPU Acceleration: RAPIDS for end-to-end ETL acceleration.
Reference
● https://openai.com/blog/ai-and-compute/
● Open Sourcing TensorFlowOnSpark: Distributed Deep Learning on
Big-Data Clusters
● Introduction to Distributed Deep Learning
● Large Scale Distributed Deep Networks
● Bringing HPC Techniques to Deep Learning
● Horovod: fast and easy distributed deep learning in TensorFlow
● NCCL 2.0
● Distributed Deep Neural Network Training: NCCL on Summit
● Blink: Fast and Generic Collectives for Distributed ML
Q&A
Appendix
TFoS 1.X (TensorFlow 1.X)
The “Parameter Server” approach is adopted for distributed training.
Distributed training is supported via the tf.estimator.train_and_evaluate API.
● A Keras model can be converted to an estimator via
tf.keras.estimator.model_to_estimator, but training and prediction must
still go through the TF APIs.
TFoS 2.X (TensorFlow 2.X)
In 2.X, Keras APIs are supported for distributed training by the
tf.distribute.Strategy API (Experimental).
Distributed training with TensorFlow (as of 2.3)
● Note that support for the Keras API is better than for the other APIs.

More Related Content

What's hot

Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkRunning Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkDatabricks
 
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep LearningApache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep LearningDataWorks Summit
 
Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?inside-BigData.com
 
Python in the Hadoop Ecosystem (Rock Health presentation)
Python in the Hadoop Ecosystem (Rock Health presentation)Python in the Hadoop Ecosystem (Rock Health presentation)
Python in the Hadoop Ecosystem (Rock Health presentation)Uri Laserson
 
Scale up and Scale Out Anaconda and PyData
Scale up and Scale Out Anaconda and PyDataScale up and Scale Out Anaconda and PyData
Scale up and Scale Out Anaconda and PyDataTravis Oliphant
 
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...Databricks
 
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...Databricks
 
ACES QuakeSim 2011
ACES QuakeSim 2011ACES QuakeSim 2011
ACES QuakeSim 2011marpierc
 
Deep Learning with Spark and GPUs
Deep Learning with Spark and GPUsDeep Learning with Spark and GPUs
Deep Learning with Spark and GPUsDataWorks Summit
 
PyData Barcelona Keynote
PyData Barcelona KeynotePyData Barcelona Keynote
PyData Barcelona KeynoteTravis Oliphant
 
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...DataWorks Summit
 
Deep Learning with Apache Spark and GPUs with Pierce Spitler
Deep Learning with Apache Spark and GPUs with Pierce SpitlerDeep Learning with Apache Spark and GPUs with Pierce Spitler
Deep Learning with Apache Spark and GPUs with Pierce SpitlerDatabricks
 
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...Databricks
 
20150314 sahara intro and the future plan for open stack meetup
20150314 sahara intro and the future plan for open stack meetup20150314 sahara intro and the future plan for open stack meetup
20150314 sahara intro and the future plan for open stack meetupWei Ting Chen
 
Scaling PyData Up and Out
Scaling PyData Up and OutScaling PyData Up and Out
Scaling PyData Up and OutTravis Oliphant
 
20151027 sahara + manila final
20151027 sahara + manila final20151027 sahara + manila final
20151027 sahara + manila finalWei Ting Chen
 

What's hot (18)

Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkRunning Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
 
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep LearningApache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
 
Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?
 
Python in the Hadoop Ecosystem (Rock Health presentation)
Python in the Hadoop Ecosystem (Rock Health presentation)Python in the Hadoop Ecosystem (Rock Health presentation)
Python in the Hadoop Ecosystem (Rock Health presentation)
 
Scale up and Scale Out Anaconda and PyData
Scale up and Scale Out Anaconda and PyDataScale up and Scale Out Anaconda and PyData
Scale up and Scale Out Anaconda and PyData
 
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
 
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
 
ACES QuakeSim 2011
ACES QuakeSim 2011ACES QuakeSim 2011
ACES QuakeSim 2011
 
Deep Learning with Spark and GPUs
Deep Learning with Spark and GPUsDeep Learning with Spark and GPUs
Deep Learning with Spark and GPUs
 
PyData Barcelona Keynote
PyData Barcelona KeynotePyData Barcelona Keynote
PyData Barcelona Keynote
 
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...
 
Deep Learning with Apache Spark and GPUs with Pierce Spitler
Deep Learning with Apache Spark and GPUs with Pierce SpitlerDeep Learning with Apache Spark and GPUs with Pierce Spitler
Deep Learning with Apache Spark and GPUs with Pierce Spitler
 
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...
 
20150314 sahara intro and the future plan for open stack meetup
20150314 sahara intro and the future plan for open stack meetup20150314 sahara intro and the future plan for open stack meetup
20150314 sahara intro and the future plan for open stack meetup
 
Scaling PyData Up and Out
Scaling PyData Up and OutScaling PyData Up and Out
Scaling PyData Up and Out
 
London level39
London level39London level39
London level39
 
20151027 sahara + manila final
20151027 sahara + manila final20151027 sahara + manila final
20151027 sahara + manila final
 
PyData Boston 2013
PyData Boston 2013PyData Boston 2013
PyData Boston 2013
 

Similar to TensorFlow on Spark Deep Dive

Bringing Deep Learning into production
Bringing Deep Learning into production Bringing Deep Learning into production
Bringing Deep Learning into production Paolo Platter
 
Going deep (learning) with tensor flow and quarkus
Going deep (learning) with tensor flow and quarkusGoing deep (learning) with tensor flow and quarkus
Going deep (learning) with tensor flow and quarkusRed Hat Developers
 
Nervana AI Overview Deck April 2016
Nervana AI Overview Deck April 2016Nervana AI Overview Deck April 2016
Nervana AI Overview Deck April 2016Sean Everett
 
PyTorch vs TensorFlow: The Force Is Strong With Which One? | Which One You Sh...
PyTorch vs TensorFlow: The Force Is Strong With Which One? | Which One You Sh...PyTorch vs TensorFlow: The Force Is Strong With Which One? | Which One You Sh...
PyTorch vs TensorFlow: The Force Is Strong With Which One? | Which One You Sh...Edureka!
 
Machine learning and Deep learning on edge devices using TensorFlow
Machine learning and Deep learning on edge devices using TensorFlowMachine learning and Deep learning on edge devices using TensorFlow
Machine learning and Deep learning on edge devices using TensorFlowAditya Bhattacharya
 
OpenStack in the Enterprise - Are You Ready? - Maish Saidel-Keesing
OpenStack in the Enterprise - Are You Ready? - Maish Saidel-KeesingOpenStack in the Enterprise - Are You Ready? - Maish Saidel-Keesing
OpenStack in the Enterprise - Are You Ready? - Maish Saidel-KeesingCloud Native Day Tel Aviv
 
Nervana and the Future of Computing
Nervana and the Future of ComputingNervana and the Future of Computing
Nervana and the Future of ComputingIntel Nervana
 
Deep learning beyond the learning - Jörg Schad - Codemotion Amsterdam 2018
Deep learning beyond the learning - Jörg Schad - Codemotion Amsterdam 2018Deep learning beyond the learning - Jörg Schad - Codemotion Amsterdam 2018
Deep learning beyond the learning - Jörg Schad - Codemotion Amsterdam 2018Codemotion
 
RE-Work Deep Learning Summit - September 2016
RE-Work Deep Learning Summit - September 2016RE-Work Deep Learning Summit - September 2016
RE-Work Deep Learning Summit - September 2016Intel Nervana
 
Graal and Truffle: Modularity and Separation of Concerns as Cornerstones for ...
Graal and Truffle: Modularity and Separation of Concerns as Cornerstones for ...Graal and Truffle: Modularity and Separation of Concerns as Cornerstones for ...
Graal and Truffle: Modularity and Separation of Concerns as Cornerstones for ...Thomas Wuerthinger
 
The Edge to AI Deep Dive Barcelona Meetup March 2019
The Edge to AI Deep Dive Barcelona Meetup March 2019The Edge to AI Deep Dive Barcelona Meetup March 2019
The Edge to AI Deep Dive Barcelona Meetup March 2019Timothy Spann
 
Apache Deep Learning 201 - Philly Open Source
Apache Deep Learning 201 - Philly Open SourceApache Deep Learning 201 - Philly Open Source
Apache Deep Learning 201 - Philly Open SourceTimothy Spann
 
Artificial Intelligence on Data Centric Platform
Artificial Intelligence on Data Centric PlatformArtificial Intelligence on Data Centric Platform
Artificial Intelligence on Data Centric PlatformStratio
 
TonY: Native support of TensorFlow on Hadoop
TonY: Native support of TensorFlow on HadoopTonY: Native support of TensorFlow on Hadoop
TonY: Native support of TensorFlow on HadoopAnthony Hsu
 
Pluggable Infrastructure with CI/CD and Docker
Pluggable Infrastructure with CI/CD and DockerPluggable Infrastructure with CI/CD and Docker
Pluggable Infrastructure with CI/CD and DockerBob Killen
 
Meetup 2020 - Back to the Basics part 101 : IaC
Meetup 2020 - Back to the Basics part 101 : IaCMeetup 2020 - Back to the Basics part 101 : IaC
Meetup 2020 - Back to the Basics part 101 : IaCDamienCarpy
 
GraphPipe - Blazingly Fast Machine Learning Inference by Vish Abrams
GraphPipe - Blazingly Fast Machine Learning Inference by Vish AbramsGraphPipe - Blazingly Fast Machine Learning Inference by Vish Abrams
GraphPipe - Blazingly Fast Machine Learning Inference by Vish AbramsOracle Developers
 
Q Con New York 2015 Presentation - Conjur
Q Con New York 2015 Presentation - ConjurQ Con New York 2015 Presentation - Conjur
Q Con New York 2015 Presentation - Conjurconjur_inc
 
FIWARE Wednesday Webinars - Performing Big Data Analysis Using Cosmos With Sp...
FIWARE Wednesday Webinars - Performing Big Data Analysis Using Cosmos With Sp...FIWARE Wednesday Webinars - Performing Big Data Analysis Using Cosmos With Sp...
FIWARE Wednesday Webinars - Performing Big Data Analysis Using Cosmos With Sp...FIWARE
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Cloudera, Inc.
 

Similar to TensorFlow on Spark Deep Dive (20)

Bringing Deep Learning into production
Bringing Deep Learning into production Bringing Deep Learning into production
Bringing Deep Learning into production
 
Going deep (learning) with tensor flow and quarkus
Going deep (learning) with tensor flow and quarkusGoing deep (learning) with tensor flow and quarkus
Going deep (learning) with tensor flow and quarkus
 
Nervana AI Overview Deck April 2016
Nervana AI Overview Deck April 2016Nervana AI Overview Deck April 2016
Nervana AI Overview Deck April 2016
 
PyTorch vs TensorFlow: The Force Is Strong With Which One? | Which One You Sh...
PyTorch vs TensorFlow: The Force Is Strong With Which One? | Which One You Sh...PyTorch vs TensorFlow: The Force Is Strong With Which One? | Which One You Sh...
PyTorch vs TensorFlow: The Force Is Strong With Which One? | Which One You Sh...
 
Machine learning and Deep learning on edge devices using TensorFlow
Machine learning and Deep learning on edge devices using TensorFlowMachine learning and Deep learning on edge devices using TensorFlow
Machine learning and Deep learning on edge devices using TensorFlow
 
OpenStack in the Enterprise - Are You Ready? - Maish Saidel-Keesing
OpenStack in the Enterprise - Are You Ready? - Maish Saidel-KeesingOpenStack in the Enterprise - Are You Ready? - Maish Saidel-Keesing
OpenStack in the Enterprise - Are You Ready? - Maish Saidel-Keesing
 
Nervana and the Future of Computing
Nervana and the Future of ComputingNervana and the Future of Computing
Nervana and the Future of Computing
 
Deep learning beyond the learning - Jörg Schad - Codemotion Amsterdam 2018
Deep learning beyond the learning - Jörg Schad - Codemotion Amsterdam 2018Deep learning beyond the learning - Jörg Schad - Codemotion Amsterdam 2018
Deep learning beyond the learning - Jörg Schad - Codemotion Amsterdam 2018
 
RE-Work Deep Learning Summit - September 2016
RE-Work Deep Learning Summit - September 2016RE-Work Deep Learning Summit - September 2016
RE-Work Deep Learning Summit - September 2016
 
Graal and Truffle: Modularity and Separation of Concerns as Cornerstones for ...
Graal and Truffle: Modularity and Separation of Concerns as Cornerstones for ...Graal and Truffle: Modularity and Separation of Concerns as Cornerstones for ...
TensorFlow on Spark Deep Dive

  • 1. TensorFlow on Spark: A Deep Dive into Distributed Deep Learning
      DataCon.TW 2020
      Evans Ye, Verizon Media
      Verizon confidential and proprietary. Unauthorized disclosure, reproduction or other use prohibited.
  • 2. Evans Ye
      Engineering Manager @ Verizon Media
      ● Use data to power advertising/eCommerce experiences.
      ● Build next-gen Big Data & ML/AI solutions.
      ASF Member @ Apache Software Foundation
      ● Spread the Apache Way.
      Apache Bigtop former VP, PMC member, Committer
      ● Drive project direction, build community, mentor new committers.
      Director of Taiwan Data Engineering Association (TDEA)
      ● Promote OSS & data engineering technologies.
  • 3. Agenda
      1. Why Distributed Deep Learning?
      2. Solution at Verizon Media
      3. Distributed Deep Learning
      4. Lightweight Distributed Deep Learning on PySpark
      5. Recap & Future Work
  • 4. Why Distributed Deep Learning?
  • 5. Industry Trend
      Source: OpenAI blog post, "AI and Compute"
      ● Computation needs are increasing drastically!
      Before 2012:
      ● Using GPUs for ML was uncommon.
      2012 to 2014:
      ● 1-8 GPUs rated at 1-2 TFLOPS
      2014 to 2016:
      ● 10-100 GPUs rated at 5-10 TFLOPS
  • 6. Applying DL w/ GPUs in Enterprise
      Deep learning requires both big data and computing power (GPUs).
      Data has gravity
      ● A dedicated GPU cluster poses a problem for data migration.
      Pipeline diagram: 1. Prepare data, 2. DL training, 3. Inferencing (arrows labeled "data" and "model")
  • 7. Solution at Verizon Media
  • 8. Yahoo! TensorFlowOnSpark (Open-sourced Feb. 2017)
      A framework that creates a TensorFlow cluster on Spark and feeds it data for training.
      Source: Yahoo! Developer blog post for TFoS
  • 9. Input Modes
      InputMode.SPARK
      ● HDFS → RDD.mapPartitions → TF worker (push mode)
      InputMode.TENSORFLOW
      ● TF worker ← tf.data ← HDFS (pull mode)
  • 10. What’s the Difference?
      InputMode.SPARK
      ● Data is proxied through a Spark RDD, hence slower.
          ○ Supports any data that can be loaded as an RDD.
      ● The TF worker runs in the background. Failures happen behind the scenes.
      InputMode.TENSORFLOW
      ● Data is fetched from HDFS directly, hence faster.
          ○ Supports TFRecords.
      ● The TF worker runs in the foreground. Failures are raised and retried by Spark.
  • 11. TFoS API Example
      from tensorflowonspark import TFCluster

      # Launch a TensorFlow cluster on the Spark executors.
      cluster = TFCluster.run(sc, main_fun, args, args.cluster_size, args.num_ps,
                              tensorboard=args.tensorboard,
                              input_mode=TFCluster.InputMode.SPARK,
                              master_node='chief')
      # InputMode.SPARK only: Spark pushes the RDD data to the TF workers.
      cluster.train(images_labels, args.epochs)
      cluster.inference(images_labels)
      cluster.shutdown()
  • 12. Get Data w/ InputMode.SPARK
  • 13. Get Data w/ InputMode.TENSORFLOW
  • 14. TFoS Advantages
      ● Easily migrate existing TensorFlow programs with <10 lines of code change.
      ● Supports all TensorFlow functionalities: synchronous/asynchronous training, model/data parallelism, inferencing and TensorBoard.
      ● Allows datasets on HDFS and other sources to be pushed by Spark or pulled by TensorFlow.
      ● Easily integrates with your existing Spark data processing pipelines.
      ● Easily deployed on cloud or on-premise, and on CPUs or GPUs.
      Ref: https://github.com/yahoo/TensorFlowOnSpark
  • 15. More About TFoS
      Spark Summit Talks:
      ● TensorFlow On Spark: Scalable TensorFlow Learning on Spark Clusters
      ● TensorFlowOnSpark Enhanced: Scala, Pipelines, and Beyond
      GitHub:
      ● https://github.com/yahoo/TensorFlowOnSpark
      ● TFoS 1.X Keras Example
      ● TFoS 2.X InputMode.TENSORFLOW Keras Example
      ● TFoS 2.X InputMode.SPARK Keras Example
  • 16. Distributed Deep Learning
  • 17. Types of Parallelism
      Data Parallelism
      ● Each worker trains on a different piece of the data.
      ● Sync/async approaches are used to update the parameters with gradients.
      ● The entire model must fit into a GPU’s memory.
      Model Parallelism
      ● Example: to train a 6-layer model, assign the first 3 layers to worker0 and the last 3 layers to worker1.
      ● Used when the model can’t fit into a single GPU’s memory.
      Hybrid approach
      ● Data parallelism between nodes, model parallelism between GPUs.
  • 18. Asynchronous Parameter Server
      ● Each worker computes the gradient and sends the delta to the parameter server (PS) for updates.
      ● The updated parameters are then pulled by the worker for the next training step.
  • 19. Asynchronous Parameter Server
      Asynchronous → Inconsistency
      ● Different workers may update the parameters at the same time. Since they act asynchronously, there’s no guaranteed order.
      Large Scale Distributed Deep Networks, Google, 2012
      ● “In practice we found relaxing consistency requirements to be remarkably effective.”
      ● Additional stochasticity.
  • 20. Asynchronous Parameter Server
      Pros:
      ● Scalable: each worker works independently.
      ● Robust to machine failures.
      Cons:
      ● Workers may compute gradients based on stale weights, which delays convergence.
      Suitable for a large number of not-so-powerful devices and for dynamic environments in which preemption can happen.
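The pull/compute/push loop described on the asynchronous parameter server slides can be sketched with a toy, single-process simulation. This is only an illustration under stated assumptions: the `ParameterServer` class, the round-robin worker schedule, and the one-dimensional loss (w - 3)^2 are all hypothetical stand-ins, not part of TFoS or TensorFlow.

```python
class ParameterServer:
    """Toy parameter server holding a single scalar weight (hypothetical)."""

    def __init__(self, w=0.0, lr=0.1):
        self.w = w
        self.lr = lr

    def pull(self):
        return self.w

    def push(self, grad):
        # Apply a gradient as soon as it arrives; no ordering is enforced.
        self.w -= self.lr * grad


def gradient(w, target=3.0):
    # Derivative of the toy loss (w - target)^2.
    return 2.0 * (w - target)


def run_async(steps, num_workers=2):
    ps = ParameterServer()
    # Every worker starts from the same pulled copy of the weights.
    local = [ps.pull() for _ in range(num_workers)]
    for step in range(steps):
        k = step % num_workers        # workers take turns finishing a step
        ps.push(gradient(local[k]))   # may be computed from stale weights
        local[k] = ps.pull()          # pull fresh weights for the next step
    return ps.w
```

With more than one worker, each pushed gradient is computed from weights that other workers have already changed, which is the staleness the "Cons" bullet refers to. In this small convex example the weight still approaches the target value of 3.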
  • 21. Synchronous AllReduce
      ● Each worker has its own model parameters and computes gradients separately.
      ● All the workers sync with each other, exchanging all the gradients.
      ● The next training step begins after all the workers have updated the model.
  • 22. Ring AllReduce
      Baidu, 2017
      ● Bringing HPC Techniques to Deep Learning
      Uber, 2018
      ● Horovod: fast and easy distributed deep learning in TensorFlow
      Each worker sends gradients to its successor and receives gradients from its predecessor.
      ● Uses both upload and download bandwidth at the same time, hence the communication time is optimized.
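The successor/predecessor exchange can be illustrated with a small in-memory simulation. This is a minimal sketch, assuming gradients are plain Python lists of equal length and that all sends within a step happen simultaneously; the function name `ring_allreduce` is hypothetical and this is not the Baidu, Horovod, or NCCL implementation.

```python
def ring_allreduce(worker_grads):
    """Sum equal-length gradient lists across workers over a simulated ring.

    Returns one list per worker; every worker ends with the same summed gradient.
    """
    n = len(worker_grads)
    length = len(worker_grads[0])
    # Split the gradient into n contiguous chunks.
    bounds = [(i * length // n, (i + 1) * length // n) for i in range(n)]
    data = [list(g) for g in worker_grads]

    def exchange(offset, step, accumulate):
        # Snapshot every outgoing chunk first so all sends happen "at once".
        outgoing = []
        for w in range(n):
            c = (w + offset - step) % n
            lo, hi = bounds[c]
            outgoing.append((c, data[w][lo:hi]))
        # Worker w sends to its successor (w + 1) % n.
        for w, (c, payload) in enumerate(outgoing):
            dst = (w + 1) % n
            lo, hi = bounds[c]
            if accumulate:
                for i, v in enumerate(payload):
                    data[dst][lo + i] += v
            else:
                data[dst][lo:hi] = payload

    # Phase 1 (scatter-reduce): after n - 1 steps, worker w holds the
    # fully summed chunk (w + 1) % n.
    for step in range(n - 1):
        exchange(0, step, accumulate=True)
    # Phase 2 (allgather): circulate the finished chunks around the ring.
    for step in range(n - 1):
        exchange(1, step, accumulate=False)
    return data
```

In each step every worker sends one chunk and receives one chunk, so per-worker traffic stays at roughly twice the gradient size regardless of the number of workers, instead of growing with it.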
  • 23. Synchronous AllReduce
      Pros:
      ● Faster convergence with powerful devices and strong communication links.
      Cons:
      ● Synchronous by design, hence may suffer from failures.
      ● Not suitable for devices that differ in computing power or bandwidth.
      Suitable for multi-GPU training on a single machine or on a small number of machines.
  • 24. Lightweight Distributed Deep Learning on PySpark
  • 25. Rethink Distributed Training
      Engineering side:
      ● Single-node, multi-GPU training with synchronous allreduce can lead to faster convergence.
      ● A TensorFlow cluster is required for multi-node training only.
      Science side:
      ● A huge amount of labeled training data is not easy to get.
      ● In practice, it is common to start from a well-trained model and fine-tune it rather than train from scratch.
  • 26. What if Only Single-Node Distributed Training is Supported?
      ● No need to spin up a TensorFlow cluster.
      ● The code can be simplified without coupling to a clustering framework, hence it is easier to deploy and test.
      ● Failure discovery and handling can be simplified.
      ● Single-node, multi-GPU training is supported by MirroredStrategy (NcclAllReduce) with the Keras API.
  • 27. Introducing a Simple yet Powerful Solution that Leverages Several PySpark Tricks.
  • 28. Lightweight Distributed Training Architecture
      PySpark Preprocessing
      ● Leverage Spark for distributed preprocessing.
          ○ spark.sql or spark.read to load data.
      ● Collect data back to the driver for training.
          ○ df.toPandas()
      Multi-GPU Training (Driver)
      ● Small data: model.fit(x=train_x, y=train_y, ...)
      ● Data can’t fit into GPU memory: model.fit(x=generator, ...)
      Diagram label: HDFS
  • 29. Huge Data Training is Supported (up to 1B records)
      Generator + Spark df.toLocalIterator() to collect the data sequentially.
      ● it = df.toLocalIterator()
      ● record = next(it)
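The generator idea can be sketched independently of Spark. The helper below is a minimal illustration, assuming each record is an (x, y) pair; the name `batch_generator` is hypothetical. With PySpark, `records` could be built from `df.toLocalIterator()`, for example `((row.features, row.label) for row in df.toLocalIterator())`, where the column names `features` and `label` are assumptions.

```python
def batch_generator(records, batch_size):
    """Yield (features, labels) batches from an iterator of (x, y) records.

    Only one batch is held in memory at a time, so the full dataset never
    has to fit on the driver.
    """
    xs, ys = [], []
    for x, y in records:
        xs.append(x)
        ys.append(y)
        if len(xs) == batch_size:
            yield xs, ys
            xs, ys = [], []
    if xs:
        # Emit the final, possibly smaller, batch.
        yield xs, ys
```

Before passing such a generator to a training API, each batch would typically be converted to arrays, and the generator would need to be restarted (or made to loop) for every epoch.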
  • 30. Performance
      Taking one of our production models as an example.
      Chart series: Multi-GPU AllReduce, Multi-Node Parameter Server
  • 31. Lightweight Parallel Predicting Architecture
      Preprocessing
      ● Leverage Spark for distributed preprocessing.
          ○ spark.sql or spark.read to load data.
      Pandas UDF Predict (Executors)
      ● Data is handed over to Python in Pandas DataFrame format.
      ● In the Python UDF, predict with a simple Keras API call.
      ● The resulting DataFrame is then passed back as a Spark DataFrame.
      ● df.write or other post-processing if needed.
      Diagram label: HDFS
  • 32. PySpark Pandas UDF
      Pandas Grouped Map UDF (Spark 2.4)
      ● The UDF leverages Apache Arrow for efficient JVM <-> Python SerDe.
      ● GroupBy a uniformly distributed random ID, so that data is evenly grouped across partitions.
      ● Make predictions in the UDF.
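The grouping trick can be illustrated without Spark. The sketch below is a pure-Python stand-in, not the PySpark API: `assign_group_ids` plays the role of adding a random ID column, and `predict_by_group` plays the role of a grouped map UDF that receives one whole group at a time. Both function names and the `predict_fn` callback are hypothetical.

```python
import random
from collections import defaultdict


def assign_group_ids(records, num_groups, seed=None):
    """Tag each record with a uniformly distributed random group ID."""
    rng = random.Random(seed)
    return [(rng.randrange(num_groups), rec) for rec in records]


def predict_by_group(tagged, predict_fn):
    """Group tagged records by ID and run one batched prediction per group."""
    groups = defaultdict(list)
    for gid, rec in tagged:
        groups[gid].append(rec)
    results = []
    for gid in sorted(groups):
        # predict_fn sees the whole group, like a grouped map UDF would.
        results.extend(predict_fn(groups[gid]))
    return results
```

Because the IDs are drawn uniformly, the groups end up roughly equal in size, so each batched prediction call does a similar amount of work. The output order follows the groups rather than the input order.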
  • 33. Predicting Performance
      ● CPU predicting scales well as more resources are added.
      ● Though GPU predicting is slightly faster, CPU predicting is more scalable due to the number of CPUs available.
      ● The solution is capable of predicting up to 1B records.
  • 34. Summary
      Productivity
      ● The PySpark-based solution is lightweight and easy to test in a local IDE. No need to spin up a cluster.
      Flexibility
      ● The trainer/predictor implementations are decoupled from the framework and can be run independently.
          ○ Example: run on a local machine without Spark.
      ● The solution can support other frameworks such as PyTorch, XGBoost, etc.
      Efficiency
      ● Cross-language SerDes (toPandas, Pandas UDF) are optimized by the PySpark Arrow integration.
  • 35. Recap & Future Work
  • 36. Recap
      ● TFoS is a comprehensive solution for distributed deep learning.
      ● Distributed deep learning can be achieved in a more lightweight manner with just “TensorFlow on Spark”, which increases productivity.
      ● Know the details underneath. Your training problem may be an architectural one:
          ○ Types of Parallelism: Data Parallelism, Model Parallelism
          ○ Training Approaches: Async Parameter Server, Sync AllReduce
  • 37. Future Work
      ● Petastorm for efficient DL with the Parquet file format.
      ● Horovod to support multi-framework distributed deep learning.
      ● More efficient AllReduce:
          ○ NCCL topology optimizations.
          ○ Blink: Fast and Generic Collectives for Distributed ML.
      ● Other related areas that are also developing rapidly:
          ○ ML Lifecycle: MLflow, Kubeflow.
          ○ Feature Store: Michelangelo.
          ○ GPU Acceleration: RAPIDS for end-to-end ETL acceleration.
  • 38. Reference
      ● https://openai.com/blog/ai-and-compute/
      ● Open Sourcing TensorFlowOnSpark: Distributed Deep Learning on Big-Data Clusters
      ● Introduction to Distributed Deep Learning
      ● Large Scale Distributed Deep Networks
      ● Bringing HPC Techniques to Deep Learning
      ● Horovod: fast and easy distributed deep learning in TensorFlow
      ● NCCL 2.0
      ● Distributed Deep Neural Network Training: NCCL on Summit
      ● Blink: Fast and Generic Collectives for Distributed ML
  • 39. Q&A
  • 41. Appendix
  • 42. TFoS 1.X (TensorFlow 1.X)
      The “Parameter Server” approach is adopted for distributed training.
      Distributed training is supported via the tf.estimator.train_and_evaluate API.
      ● A Keras model can be converted to an estimator via tf.keras.estimator.model_to_estimator, but the train/predict calls must still use the TF APIs.
  • 43. TFoS 2.X (TensorFlow 2.X)
      In 2.X, Keras APIs are supported for distributed training through the tf.distribute.Strategy API (experimental).
      Source: Distributed training with TensorFlow (as of 2.3)
      ● Note that support for the Keras API is better than for the other APIs.