Simplify Distributed TensorFlow Training for Fast Image Categorization at Starbucks

"In addition to the many data engineering initiatives at Starbucks, we are also working on many interesting data science initiatives. The business scenarios involved in our deep learning initiatives range from (but are not limited to) planogram analysis (layout of our stores for efficient partner and customer flow) to predicting product pairings (e.g., if you purchase a caramel macchiato, perhaps you would like a caramel brownie) via the product components using graph convolutional networks.

For this session, we will focus on how to run distributed Keras (TensorFlow backend) training to perform image analytics. This will be combined with MLflow to showcase the data science lifecycle and how Databricks + MLflow simplifies it."

  1. STARBUCKS TECHNOLOGY: Simplifying Deep Learning with HorovodRunner at Starbucks
  2. About the presenters. Denny Lee is a Technology Evangelist with Databricks; he is a hands-on data sciences engineer with more than 15 years of experience developing internet-scale infrastructure, data platforms, and distributed systems for both on-premises and cloud. His key focus is solving complex, large-scale data problems, providing not only architectural direction but also the hands-on implementation of these systems. Vishwanath Subramanian is a Director of Data and Analytics Engineering at Starbucks. He has over 15 years of experience, with a background in distributed systems, product management, software engineering, and analytics. At Starbucks, his key focus is on providing next-generation analytics platforms and enabling large-scale data processing and machine learning to support business intelligence and data services across Starbucks.
  3. Scenarios • On-demand, one-click provisioning of a seamlessly integrated infrastructure bill of materials for data science and intelligent apps • Secured connectivity to the enterprise data platform, completely abstracted from analytics teams • Solution template that organizes deployments to enable ad hoc experiments, shared data engineering, and intelligent app development • Smarter checkout experiences • Predicting customer traffic • Planogram analysis • And more…
  4. Current State • Solving complex / streaming image and video analytics problems is hard • It also typically involves distributing the problem across multiple nodes • But how do I run Keras + TensorFlow in a distributed environment?
  5. Convolutional Neural Networks
  6. Convolutional Neural Networks [diagram] Feature extraction: convolution (32 filters), convolution (64 filters), subsampling with stride (2,2); feature map sizes shown: 28 x 28, 28 x 28, 14 x 14. Classification: fully connected layer, dropout, output classes 0 through 9.
  7. DEMO: Running Keras CNNs Standalone. Keras, TensorFlow, HorovodRunner, and MLflow: https://dbricks.co/2D58PDw
  8. Introducing HorovodRunner • HorovodRunner is a general API to run distributed deep learning workloads on Databricks using Uber's Horovod framework • Combining Horovod with Apache Spark's barrier mode allows longer-running deep learning training jobs • A Horovod MPI job is embedded as a Spark job using barrier execution mode
  9. HorovodRunner • HorovodRunner takes a Python method that contains DL training code with Horovod hooks • The first executor collects the IP addresses of all of the task executors using BarrierTaskContext • Then it triggers a Horovod job using mpirun • Each Python MPI process loads the pickled program back, deserializes it, and runs it
  10. HorovodRunner [diagram: driver and workers]
  11. HorovodRunner [diagram: driver and workers] In standalone or hvd local mode, the code runs on the driver:
      def runCNN():
          model.add(Conv2D(32, …))
          model.add(Conv2D(64, …))
          model.add(MaxPooling2D(…))
          model.add(Dense(128, …))
          model.add(Dense(10, 'softmax'))
          optimizer = keras.optimizers.Adadelta(1.0)
  12. HorovodRunner [diagram: driver, workers, variables] With HorovodRunner, we wrap the original code; the code and its variables are then pushed to the workers:
      def runCNN_hvd():
          hvd.init()
          config = tf.ConfigProto()
          # Original code
          runCNN()
          callbacks = []
  13. HorovodRunner [diagram: driver and workers] With HorovodRunner, we wrap the original code; the code and its variables are then pushed to the workers.
  14. HorovodRunner [diagram: driver and workers] With HorovodRunner, we wrap the original code; the code and its variables are then pushed to the workers.
  15. HorovodRunner [diagram: driver and workers] With HorovodRunner, we wrap the original code; the code and its variables are then pushed to the workers.
  16. HorovodRunner [diagram: driver and workers] Variables are transferred from the driver to the workers. Code is executed on the workers.
  17. Migrate to HorovodRunner
      # Primary code differences are noted below
      + hvd.init()
      + config = tf.ConfigProto()
      + config.gpu_options.allow_growth = True
      + config.gpu_options.visible_device_list = str(hvd.local_rank())
      + epochs = int(math.ceil(12.0 / hvd.size()))
      + callbacks = [
      +     hvd.callbacks.BroadcastGlobalVariablesCallback(0),
      + ]
  18. Comparing the runs using MLflow
  19. DEMO: Object Detection. Keras, TensorFlow, HorovodRunner, and MLflow
  20. Object Detection Approaches: RCNN (2012) • Region proposal algorithms give you a set of regions in the image that are likely to contain objects • Run the image regions inside those bounding boxes through a pre-trained AlexNet to compute the features for each bounding box • Use a support vector machine to classify the object in each box • Run the box through a linear regression model to output tighter coordinates for the box • RCNN -> Fast RCNN -> Faster RCNN. References: Rich feature hierarchies for accurate object detection and semantic segmentation (Girshick, Donahue, Darrell, Malik); Fast R-CNN (Girshick); Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (Ren, He, Girshick, Sun)
  21. Object Detection Approaches (contd.) • YOLO treats detection as a regression problem • Not a traditional classifier • Divide the image into a grid; each cell is responsible for predicting n bounding boxes • Output a confidence score that each predicted bounding box actually encloses an object • Give a probability distribution over all the classes the model was trained on • The confidence score and class prediction are combined into a single score for object classification • Based on a threshold, we determine the relevant boxes • The image is fed to the neural network once, and all boxes are predicted in a single pass. Reference: You Only Look Once: Unified, Real-Time Object Detection (Redmon, Divvala, Girshick, Farhadi)
  22. TALENTED TECHNOLOGISTS DELIVERING TODAY, LEADING INTO THE FUTURE https://www.starbucks.com/careers/
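
The feature map sizes shown on slide 6 (28 x 28, 28 x 28, 14 x 14) can be checked with a little arithmetic. The sketch below is only an illustration, not the presenters' code: it assumes a single-channel 28 x 28 input, 3 x 3 kernels, "same" padding, and 2 x 2 pooling with stride 2, none of which the slide states. The helper names `conv_output_size` and `trace_shapes` are hypothetical.

```python
import math


def conv_output_size(size, kernel, stride=1, padding="same"):
    """Spatial output size of a convolution or pooling layer along one axis."""
    if padding == "same":
        # "same" padding keeps size / stride, rounded up.
        return math.ceil(size / stride)
    if padding == "valid":
        # "valid" padding only uses positions where the kernel fits entirely.
        return (size - kernel) // stride + 1
    raise ValueError("padding must be 'same' or 'valid'")


def trace_shapes(input_size=28, kernel=3, padding="same"):
    """Follow (height, width, channels) through conv32 -> conv64 -> 2x2 pool."""
    shapes = [(input_size, input_size, 1)]  # assumed grayscale input
    size = conv_output_size(input_size, kernel, 1, padding)
    shapes.append((size, size, 32))  # first convolution, 32 filters
    size = conv_output_size(size, kernel, 1, padding)
    shapes.append((size, size, 64))  # second convolution, 64 filters
    size = conv_output_size(size, 2, 2, "valid")
    shapes.append((size, size, 64))  # subsampling with stride (2, 2)
    return shapes


same_shapes = trace_shapes(padding="same")
valid_shapes = trace_shapes(padding="valid")
print(same_shapes)
print(valid_shapes)
```

Under these assumptions, "same" padding reproduces the sizes on the slide, while "valid" padding would shrink the maps to 26, 24, and then 12, so the exact numbers depend on choices the slide leaves open.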
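
Slides 9, 12, and 17 describe wrapping the single-node training code in a function with Horovod hooks and handing it to HorovodRunner. The following is a rough sketch of how that could look, not the code from the demo notebook. It assumes a TensorFlow 2 environment with `horovod` and `sparkdl` installed (as on Databricks Runtime for Machine Learning), uses TF2 device configuration calls in place of the `ConfigProto` lines on slide 17, and fills in 3 x 3 kernels, a dropout layer, and MNIST data as placeholders. It also wraps the optimizer in `hvd.DistributedOptimizer`, which standard Horovod usage requires but slide 17 does not list. The names `epochs_per_worker`, `train_hvd`, and `launch` are hypothetical.

```python
import math


def epochs_per_worker(total_epochs, num_workers):
    """Split a fixed epoch budget across workers, rounding up (slide 17)."""
    return int(math.ceil(float(total_epochs) / num_workers))


def train_hvd(total_epochs=12, batch_size=128):
    # Imports live inside the function because HorovodRunner pickles it
    # and runs it in a separate Python process on each worker.
    import horovod.tensorflow.keras as hvd
    import tensorflow as tf
    from tensorflow import keras

    hvd.init()

    # Pin each process to one GPU, chosen by its local rank (assumed TF2
    # equivalent of the gpu_options lines on slide 17).
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    if gpus:
        tf.config.set_visible_devices(gpus[hvd.local_rank()], "GPU")

    # Placeholder dataset; the slides do not say which data the demo loads.
    (x_train, y_train), _ = keras.datasets.mnist.load_data()
    x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0

    # Layer order follows slide 11; kernel sizes and dropout are assumptions.
    model = keras.Sequential([
        keras.Input(shape=(28, 28, 1)),
        keras.layers.Conv2D(32, (3, 3), activation="relu"),
        keras.layers.Conv2D(64, (3, 3), activation="relu"),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Flatten(),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dropout(0.5),
        keras.layers.Dense(10, activation="softmax"),
    ])

    # Wrapping the optimizer makes Horovod average gradients across workers.
    optimizer = hvd.DistributedOptimizer(
        keras.optimizers.Adadelta(learning_rate=1.0))
    model.compile(optimizer=optimizer,
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # Broadcast initial weights from rank 0 so all workers start identically.
    callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]

    model.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=epochs_per_worker(total_epochs, hvd.size()),
              callbacks=callbacks,
              verbose=1 if hvd.rank() == 0 else 0)


def launch(num_workers=2):
    # Only works on a cluster where sparkdl's HorovodRunner is available.
    from sparkdl import HorovodRunner
    hr = HorovodRunner(np=num_workers)
    return hr.run(train_hvd, total_epochs=12)
```

Calling `launch(2)` from a notebook would ask HorovodRunner to start two worker processes. The epoch helper mirrors the `math.ceil(12.0 / hvd.size())` line on slide 17: with 4 workers each process trains for 3 epochs, and with 5 workers the count still rounds up to 3.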
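
The scoring step described on slide 21 (combine each box's confidence with the cell's class probabilities, then keep boxes above a threshold) can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions rather than a YOLO implementation: the input format, the class names, the example numbers, the 0.5 threshold, and the function name `score_boxes` are all made up for the example, and steps such as non-maximum suppression are left out.

```python
def score_boxes(cells, threshold=0.5):
    """Return (label, coords, score) for boxes whose best class score
    meets the threshold, sorted from highest to lowest score."""
    detections = []
    for cell in cells:
        # Each grid cell carries one class probability distribution...
        label, class_prob = max(cell["class_probs"].items(),
                                key=lambda item: item[1])
        # ...and n candidate boxes, each with its own confidence score.
        for box in cell["boxes"]:
            score = box["confidence"] * class_prob
            if score >= threshold:
                detections.append((label, box["coords"], score))
    return sorted(detections, key=lambda d: d[2], reverse=True)


# Hypothetical predictions for two grid cells (coords are x, y, w, h).
cells = [
    {
        "class_probs": {"cup": 0.8, "pastry": 0.2},
        "boxes": [
            {"coords": (0.10, 0.10, 0.30, 0.40), "confidence": 0.9},
            {"coords": (0.15, 0.05, 0.20, 0.20), "confidence": 0.2},
        ],
    },
    {
        "class_probs": {"cup": 0.1, "pastry": 0.9},
        "boxes": [
            {"coords": (0.60, 0.55, 0.25, 0.25), "confidence": 0.7},
        ],
    },
]

detections = score_boxes(cells, threshold=0.5)
for label, coords, score in detections:
    print(label, coords, round(score, 2))
```

In this toy input the low-confidence second box in the first cell falls below the threshold and is dropped, which is the filtering behavior the slide describes.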
