Hadoop ecosystem boosts Tensorflow and machine learning technologies

Hadoop Ecosystem Boosts TensorFlow
and Machine Learning Technologies
Yanbo Liang
Apache Spark Committer
Hortonworks
Wangda Tan
Apache Hadoop PMC member
Hortonworks

2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
 Overview of Machine Learning on Big Data
Platform
 How Apache Hadoop YARN boosts machine
learning workloads
 Example walkthrough: How to do Click-Through-
Rate on a big data platform

Overview of Machine Learning on Big
Data Platform

Machine Learning on Big Data Platform
Apache Zeppelin

Machine Learning on Big Data Platform
Data Scientists Software engineers
Explore data
Create pipeline
Find best params
Save model
Load model
Deploy in production
Scoring on
batch/streaming data
Apache Zeppelin

Machine learning workflow
Feature
Selection
Data
Feature
Transform
Feature
Encoding
Feature
Evaluation
Model
Training
Feature
Model
Evaluation
Model
Validation
Model
Staging
Experiment
Online
Feature
Model
Database
Exper-
iment
Model as
Service
Real-time
Feature
Calibration
Data Preprocessing
Feature Engineering
Model
Training
Online
Service

Machine Learning in a Unified Platform
“Hidden Technical Debt in Machine Learning Systems”,
Google

Machine learning – Data Preprocessing
Feature
Selection
Data
Feature
Transform
Feature
Encoding
Feature
Evaluation
Feature
Engineering
 Import data
– HDFS
– AWS
– RDBMS
 Join data
 Data exploration
 Data sample
 Training/Test random split

Machine learning – Feature Engineering
Feature
Selection
Data
Feature
Transform
Feature
Encoding
Feature
Evaluation
Feature
Engineering
 Feature transform/selection
 Feature embedding

Machine learning – Model Training
Model
Training
Feature
Model
Evaluation
Model
Validation
Model
Staging
Model
Training
 Traditional machine
learning models
– Logistic Regression
– Gradient boosting tree
– Recommendation/ALS
– LDA
 Libraries
– Apache Spark MLlib
– XGBoost
 Deep learning models
– DNN
– CNN
– RNN
– LSTM
 Libraries
– TensorFlow
– Apache MXNet
– BigDL

Model Training - Deep learning can’t fit all
 Natural language processing
 Computer vision
 Speech/Video
 Anti-fraud
 Recommendation
 CTR estimation
 Topic model
 PageRank

Machine learning – Model Serving
Experiment
Online
Feature
Model
Database
Exper-
iment
Model as
Service
Real-time
Feature
Calibration
Online
Service
 Model deploy
 Model serving
– Batch
– Streaming
 Experiment
– offline
– online (A/B test)

How Apache Hadoop YARN boosts
machine learning workloads

Machine learning platform
Hadoop YARN
HDFS AWS S3 RDBMS
Spark MLlib XGBoost TensorFlow
Zeppelin
Hive/LLAP Spark SQL
CPU GPU SSD

Why all under YARN
SLA!
Monitoring!
A normal YARN user
Quotas!
Isolation!
Capacity Planning, Preemption, Reservation System.
Time time services, Grafana, etc.
Queues / Users quota, user access control.
CPU / Memory, (WIP) GPU, FPGA, Network

All running on the same YARN platform
LLAP
128 G 128 G 128 G 128 G 128 G
LLAP LLAP
128 G 128 G
GPUs

GPU support on YARN
 Why?
 GPU: Many cores to handle massive (but simple) computation tasks simultaneously:
GPU CPU
GPU Computation Intensive Other
Without GPU support, researchers/engineers
are almost impossible to wait job finish.

Challenges of GPU support
 Different levels of support
– Take me to a machine where GPUs are available with Partitions / Node Labels. (Current status)
– Take me to a machine where GPUs are available
• give me a full device only to me for the lifetime of my container
• give me multiple full devices only to me for the lifetime of my container
• give me full device(s) only to me for a portion of the lifetime of my container
• give me a slice of device(s) to me for a full / portion of the lifetime of my
container
 More dimensions:
– Bandwidths and on-GPU memory
– Topology of multiple GPUs
Slide credit to: Vinod Kumar Vavilapalli

YARN assembly: Makes everything easier!
 Forget about writing an application master, this is how you can run app on YARN ..
 Write assembly spec in JSON (we call it Yarnfile)
 Post the JSON as REST request to YARN server.
 YARN to figure out rest of it.
 An example:

YARN assembly: Run multi-stages job

YARN assembly: Run Distributed Tensorflow Training (with PS)

YARN Assembly: Parallel Parameter Tuning

YARN Assembly: Model Serving and Update
 Application & Services
upgrades
– ”Do an upgrade of my
Tensorflow serving model
with minimal impact to end-
users”
- Use serving.tensorflow-mode-serving.wtan.domain:1234 to access the service.
- YARN could do load balancing for launched instances.

Example walkthrough: How to do Click-
Through-Rate on a big data platform

Click-Through Rate (CTR) Prediction
 Given a user and context, predict probability of a click for an ad.
 Probably the most “profitable” machine learning problem in industry
 Basic setting quite well-studied; scale make it challenging
– Google, Facebook, Yahoo, Bing
 Challenges
– Simple binary problem; but want probabilities, not just the label
– Very skewed label distribution: clicks << skips
– Tons of data (every impression generates a training example)
– Limitations at serving: need to predict quickly

Labeled events
Impression0 click
Impression1 non-click
… …
Impression11 click
… …
… …
Labeled events
Labeled events

CTR model
 Logistic regression (LR)
– LR on SGD/LBFGS - batch
– Follow the regularized leader (FTRL) -
online
 Factorization Machines (FM/FFM)
 Gradient boosting tree (GBT)
 Deep neural networks (DNN)

Questions?

Hadoop ecosystem boosts Tensorflow and machine learning technologies

Recommended

Recommended

More Related Content

More from DataWorks Summit

More from DataWorks Summit (20)

Recently uploaded

Recently uploaded (20)

Hadoop ecosystem boosts Tensorflow and machine learning technologies

Editor's Notes