Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Distributed Deep Learning on Hadoop
Clusters
Andy Feng & Jun Shi
Yahoo! Inc.
Our Talks @ Hadoop Summit
2
 Storm on YARN (2013)
› http://bit.ly/1W02tZy
 Spark on YARN (2014)
› http://bit.ly/1W03dxE
...
Agenda
• Why Deep Learning on Hadoop?
• CaffeOnSpark
– Architecture
– API: Scala + Python
• Demo
– CaffeOnSpark + Python N...
Deep Learning
4
Use Case: Flickr Magic View flickr.com/cameraroll
Yahoo Use Case: Yahoo Weather
6
 Beauty
› Computational
assessed
 Relevant
› Location
› Time
› Cloudy
› Shower
› …
Weath...
Yahoo Vision Kit: Demo
7
(4)
Apply
ML Model
@ Scale
Flickr DL/ML Pipeline
(3)
Non-deep
Learning
@ Scale
* http://bit.ly/1KIDfof by Pierre Garrigues...
Deep Learning vs. Hadoop
9
10
Machine Learning & Deep Learning on Hadoop
11
Hadoop Cluster Enhanced
 GPU servers added
› 4 Tesla K80 cards
• 2 GK210 GPUs, 24GB memory
 Network interface enhance...
Deep Learning Frameworks
 Caffe
› Available since Sept, 2013, 6.3k forks
› Popular in vision community & Yahoo
 TensorFl...
 Released in Feb. 2016
• Apache 2.0 license
• Distributed deep learning
– GPU or CPU
– Ethernet or InfiniBand
• Easily de...
CaffeOnSpark: Scalable Architecture
14
CaffeOnSpark: 19x Speedup (est.)
Training latency (hours)
Top-5ValidationError
CaffeOnSpark: Deployment Options
16
• Single node
– Spark-submit –master local
• Multiple nodes
– Spark-submit –master URL...
Spark CLI
• spark-submit
--num-executors #_Processes
--class com.yahoo.ml.CaffeOnSpark
caffe-on-spark.jar
-devices #_gpus_...
CaffeOnSpark: One Program (Scala)
http://bit.ly/21ZY1c2
18
cos = new CaffeOnSpark(ctx) conf = new Config(ctx, args).init()...
CaffeOnSpark: One Notebook (Python)
http://bit.ly/1REZ0cN
19
20
CaffeOnSpark: UI & Logs
Demo: CaffeOnSpark on EC2
 https://github.com/yahoo/CaffeOnSpark/wiki
› Get started on EC2
› Python for CaffeOnSpark
CaffeOnSpark: What’s Next?
 Validation within training
 Enhanced data layer
 RNN and LSTM
 Java API
 Asynchronous dis...
Related Work: SparkNet & DL4J
1) [driver] sc.broadcast(model) to executors
2) [executor] apply DL training against a mini-...
Summary
24
 Yahoo Hadoop clusters enhanced for deep learning
› GPU nodes + CPU nodes
› Infiniband network for fast commun...
25
Thank You!
Repo: github.com/yahoo/CaffeOnSpark
Email: caffeonspark-users@googlegroups.com
Upcoming SlideShare
Loading in …5
×

Distributed Deep Learning on Hadoop Clusters

5,569 views

Published on

Distributed Deep Learning on Hadoop Clusters

Published in: Technology
  • Be the first to comment

Distributed Deep Learning on Hadoop Clusters

  1. 1. Distributed Deep Learning on Hadoop Clusters Andy Feng & Jun Shi Yahoo! Inc.
  2. 2. Our Talks @ Hadoop Summit 2  Storm on YARN (2013) › http://bit.ly/1W02tZy  Spark on YARN (2014) › http://bit.ly/1W03dxE  Machine Learning on Hadoop/Spark (2015) › http://bit.ly/1NW3GvO
  3. 3. Agenda • Why Deep Learning on Hadoop? • CaffeOnSpark – Architecture – API: Scala + Python • Demo – CaffeOnSpark + Python Notebook
  4. 4. Deep Learning 4
  5. 5. Use Case: Flickr Magic View flickr.com/cameraroll
  6. 6. Yahoo Use Case: Yahoo Weather 6  Beauty › Computational assessed  Relevant › Location › Time › Cloudy › Shower › … Weather App Yahoo Weather App
  7. 7. Yahoo Vision Kit: Demo 7
  8. 8. (4) Apply ML Model @ Scale Flickr DL/ML Pipeline (3) Non-deep Learning @ Scale * http://bit.ly/1KIDfof by Pierre Garrigues, Deep Learning Summit 2015 (2) Deep Learning @ Scale (1) Prepare Datasets @ Scale * 10 billion photos * 7.5 million per day
  9. 9. Deep Learning vs. Hadoop 9
  10. 10. 10 Machine Learning & Deep Learning on Hadoop
  11. 11. 11 Hadoop Cluster Enhanced  GPU servers added › 4 Tesla K80 cards • 2 GK210 GPUs, 24GB memory  Network interface enhanced › InfiniBand for direct access to GPU memory › Ethernet for external communication
  12. 12. Deep Learning Frameworks  Caffe › Available since Sept, 2013, 6.3k forks › Popular in vision community & Yahoo  TensorFlow › Released in Nov. 2015, 9.8k forks  Theano, Torch, DL4J, etc.
  13. 13.  Released in Feb. 2016 • Apache 2.0 license • Distributed deep learning – GPU or CPU – Ethernet or InfiniBand • Easily deployed on public cloud or private cloud 13 CaffeOnSpark Open Sourced github.com/yahoo/CaffeOnSpark
  14. 14. CaffeOnSpark: Scalable Architecture 14
  15. 15. CaffeOnSpark: 19x Speedup (est.) Training latency (hours) Top-5ValidationError
  16. 16. CaffeOnSpark: Deployment Options 16 • Single node – Spark-submit –master local • Multiple nodes – Spark-submit –master URL –connection ethernet – Ex. EC2 – Spark-submit –master URL –connection infiniband – Ex., Yahoo Hadoop cluster
  17. 17. Spark CLI • spark-submit --num-executors #_Processes --class com.yahoo.ml.CaffeOnSpark caffe-on-spark.jar -devices #_gpus_per_proc -conf solver_config_file -model model_file -train | -test | -feature Caffe Configuration layer { name: "data" type: "MemoryData" source_class=“com.yahoo.ml.caffe.LMDB” memory_data_param { source: ”hdfs:///mnist/trainingdata/" batch_size: 64; channels: 1; height: 28; width: 28; } … } 17 CaffeOnSpark: DL Made Easy
  18. 18. CaffeOnSpark: One Program (Scala) http://bit.ly/21ZY1c2 18 cos = new CaffeOnSpark(ctx) conf = new Config(ctx, args).init() // (1) training DL model dl_train_source = DataSource.getSource(conf, true) cos.train(dl_train_source) // (2) extract features via DL lr_raw_source = DataSource.getSource(conf, false) ext_df = cos.features(lr_raw_source) // (3) apply ML lr_input=ext_df.withColumn(“L", cos.floats2doubleUDF(ext_df(conf.label))) .withColumn(“F", cos.floats2doublesUDF(ext_df(conf.features(0)))) lr = new LogisticRegression().setLabelCol(”L").setFeaturesCol(”F") lr_model = lr.fit(lr_input_df) Non-deep Learning DeepLearning
  19. 19. CaffeOnSpark: One Notebook (Python) http://bit.ly/1REZ0cN 19
  20. 20. 20 CaffeOnSpark: UI & Logs
  21. 21. Demo: CaffeOnSpark on EC2  https://github.com/yahoo/CaffeOnSpark/wiki › Get started on EC2 › Python for CaffeOnSpark
  22. 22. CaffeOnSpark: What’s Next?  Validation within training  Enhanced data layer  RNN and LSTM  Java API  Asynchronous distributed training
  23. 23. Related Work: SparkNet & DL4J 1) [driver] sc.broadcast(model) to executors 2) [executor] apply DL training against a mini-batch of dataset to update models locally 3) [driver] aggregate(models) to produce a new model REPEAT Driver
  24. 24. Summary 24  Yahoo Hadoop clusters enhanced for deep learning › GPU nodes + CPU nodes › Infiniband network for fast communication  CaffeOnSpark open sourced › Empower Flickr and other Yahoo services • In production since Q3 2015 • Reduced training latency, and improved accuracy › Scalable deep learning made easy • spark-submit on your Spark cluster
  25. 25. 25 Thank You! Repo: github.com/yahoo/CaffeOnSpark Email: caffeonspark-users@googlegroups.com

×