© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS re:INVENT
Scaling Vision Models Using Caffe2 on AWS
Pieter Noordhuis | Caffe2 Engineering at Facebook
Joseph Spisak | Head of AI/ML Partnerships at AWS
November 29, 2017
Agenda
The AWS ML Strategy
Deep learning and GPU compute
The Caffe2 story
Amazon + Facebook = optimized Caffe2 on AWS
How to get started…
Key takeaways and call to action
ML @ AWS: Our mission
Enable customers (at all levels of expertise) to build machine learning-driven applications
ML in the Hands of Every Developer
• Services
• Platforms
• Frameworks
• Infrastructure
AWS ML Stack
Application Services
• Vision: Rekognition Image, Rekognition Video
• Speech: Amazon Polly, Transcribe
• Language: Amazon Lex, Translate, Comprehend
Platform Services
• Amazon SageMaker
• AWS DeepLens
• Amazon Machine Learning
• Mechanical Turk
• Spark & EMR
Frameworks & Infrastructure
• AWS Deep Learning AMI
• Frameworks: Apache MXNet, PyTorch, Cognitive Toolkit, Keras, Caffe2 & Caffe, TensorFlow, Gluon
• Infrastructure: GPU (P3 Instances), Mobile, CPU, IoT (Greengrass)
Amazon EC2 P3 Instances (October 2017)
The fastest, most powerful GPU instances in the cloud
• Up to eight NVIDIA Tesla V100 GPUs
• 1 PetaFLOP of computational performance (14x better than P2)
• 300 GB/s GPU-to-GPU communication over NVLink (9x better than P2)
• 16 GB GPU memory with 900 GB/s peak GPU memory bandwidth
AWS Deep Learning AMI
• Get started quickly with easy-to-launch tutorials
• Hassle-free setup and configuration
• Pay only for what you use – no additional charge for
the AMI
• Accelerate your model training and deployment
• Support for popular deep learning frameworks
Open Neural Network Exchange (ONNX)
• Developers can choose the framework that best fits their needs
• More customers can take advantage of MXNet’s performance and scalability
• MXNet users can run their models on various mobile and edge devices
(Qualcomm, Huawei, Intel, and ARM announced support for ONNX)
Caffe2
The original Caffe
• Grad student-driven project
• Focuses on CV applications
• Adopted by industry
• #2 DL framework in popularity
Caffe2 brings…
• Full computation graph
• First-class distributed support
• Cross-platform
• CV / NLP / speech / ranking
Cross-platform support
Caffe2 uses CMake and builds on:
• Linux / Mac
• Windows
• iOS
• Android
• Tegra K1/X2
• Raspberry Pi
Modularity
• NNPack
• cuDNN
• Metal
• Custom code
Modularity Enables
Multiple backend support
• cuDNN
• MKLDNN
• Metal
• Snapdragon NPE
Easy extensions
• caffe2/contrib/...
• Or a custom extension!
class MyTSNEOp : public Operator<CPUContext> {…};
REGISTER_OPERATOR(TSNE, MyTSNEOp);
Defining a model
• Model is a container for all ops
• Model can be converted to protobuf
• Argument scope for Brew API
• Brew API is a set of factory functions
• Image input is an operator
• "reader" is an object that can:
  • seek(N)
  • read() → (data, label)
• Image augmentation
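The reader interface described above can be sketched in plain Python. This is a minimal illustration and not Caffe2's reader: the class name `ListReader`, the wrap-around behavior, and the sample data are assumptions made for the example.

```python
from typing import List, Tuple


class ListReader:
    """Hypothetical in-memory reader exposing seek(N) and read()."""

    def __init__(self, samples: List[Tuple[bytes, int]]) -> None:
        if not samples:
            raise ValueError("reader needs at least one sample")
        self._samples = samples
        self._pos = 0

    def seek(self, n: int) -> None:
        # Position the cursor at sample n (wrapping around the dataset).
        self._pos = n % len(self._samples)

    def read(self) -> Tuple[bytes, int]:
        # Return the (data, label) pair under the cursor and advance.
        data, label = self._samples[self._pos]
        self._pos = (self._pos + 1) % len(self._samples)
        return data, label


reader = ListReader([(b"img0", 3), (b"img1", 7), (b"img2", 1)])
reader.seek(1)
first = reader.read()
second = reader.read()
third = reader.read()  # wraps around to the first sample
```

In a multi-worker setup, seek(N) would plausibly let each worker start at a different offset so that workers do not all read the same samples; that use is an assumption, not something the slide states.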
Defining a model
• Helper function to define ResNet-50
• “data” comes from image input
• If label is specified → softmax & loss
• Add operators to compute gradients of the loss w.r.t. the model weights
• Add optimizer to translate gradients into model weight updates
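How an optimizer turns gradients into weight updates can be shown with plain stochastic gradient descent. The sketch below is not the Caffe2 optimizer; the function names `sgd_step` and `loss_grad`, the learning rate, and the toy quadratic loss are assumptions chosen for illustration.

```python
from typing import List


def sgd_step(weights: List[float], grads: List[float], lr: float) -> List[float]:
    """Apply one plain SGD update: w <- w - lr * dL/dw."""
    return [w - lr * g for w, g in zip(weights, grads)]


def loss_grad(weights: List[float], target: List[float]) -> List[float]:
    # Gradient of the toy loss L(w) = sum((w - target)^2) with respect to w.
    return [2.0 * (w - t) for w, t in zip(weights, target)]


weights = [0.0, 0.0]
target = [1.0, -2.0]
for _ in range(100):
    # Backward pass (gradient), then optimizer step (weight update).
    weights = sgd_step(weights, loss_grad(weights, target), lr=0.1)
```

After the loop the weights sit very close to the target, which is the minimum of the toy loss.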
Training loop
Scaling up
• Weak scaling vs strong scaling
• Here we focus on weak scaling
• Data vs model parallelism
• Here we focus on data parallelism
• Modes:
• Single machine / single GPU
• Single machine / multi GPU
• Multi machine / single GPU
• Multi machine / multi GPU
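The difference between weak and strong scaling comes down to which batch size is held constant. A small sketch, with hypothetical function names and a per-device batch of 32 assumed only for the example:

```python
def weak_scaling(per_device_batch: int, num_devices: int) -> dict:
    # Weak scaling: per-device work stays fixed, so the global batch grows.
    return {"per_device": per_device_batch,
            "global": per_device_batch * num_devices}


def strong_scaling(global_batch: int, num_devices: int) -> dict:
    # Strong scaling: the global batch stays fixed, so each device gets less.
    if global_batch % num_devices != 0:
        raise ValueError("global batch must divide evenly across devices")
    return {"per_device": global_batch // num_devices,
            "global": global_batch}


weak = weak_scaling(per_device_batch=32, num_devices=8)
strong = strong_scaling(global_batch=32, num_devices=8)
```

With eight devices, weak scaling keeps 32 images per device and trains on a global batch of 256, while strong scaling leaves only 4 images per device.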
Inefficiencies of small batches
[Figure: GPU throughput per batch size. x-axis: batch size, 0 to 64; y-axis: approx images/sec, 0 to 450]
Parallelizing the model
• Instantiates 1 model per device
• Batch size multiplied by len(devices)
• Gradients are reduced (averaged) before applying weight updates
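The three points above can be checked on a toy problem: when the global batch is split into equal shards, averaging the per-device gradients gives the same result as computing the gradient on the whole batch. This is a pure-Python sketch with simulated devices; the function names and the scalar linear model are assumptions, and it does not use the Caffe2 API.

```python
from typing import List


def grad_mse(w: float, xs: List[float], ys: List[float]) -> float:
    # Gradient of mean((w*x - y)^2) with respect to the scalar weight w.
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def data_parallel_grad(w: float, xs: List[float], ys: List[float],
                       num_devices: int) -> float:
    # Split the global batch into equal shards, one per simulated device.
    # Assumes len(xs) is divisible by num_devices.
    shard = len(xs) // num_devices
    grads = [
        grad_mse(w, xs[d * shard:(d + 1) * shard], ys[d * shard:(d + 1) * shard])
        for d in range(num_devices)
    ]
    # Reduce: average the per-device gradients before the weight update.
    return sum(grads) / num_devices


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]
averaged = data_parallel_grad(0.5, xs, ys, num_devices=4)
full_batch = grad_mse(0.5, xs, ys)
```

Because both values agree, every replica applies the same update and the model copies stay in sync.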
[Diagram 1: one device. Forward (L1, L2, L3), Backward (L3b, L2b, L1b), Update (U3, U2, U1)]
[Diagram 2: two devices. Forward, Backward, Reduce (R3, R2, R1), Update (U3, U2, U1), one after another]
[Diagram 3: two devices. Forward and Backward on one row; Reduce and update (R3, R2, R1, U3, U2, U1) on a second row]
Parallelizing the model
• Near-linear scaling (e.g. ~98%)
• Depends on efficiency of gradient averaging
• Every other operator executes locally
• Faster GPUs mean more pressure on image input
• Throughput on p3.16xlarge: ~2600 images/sec
Parallelizing for multi-machine
• Key/value store for rendezvous
• TCP for cross-machine reduction
• Same “Parallelize” call
Multi-machine
• Scheduling (depends on your environment):
• Amazon EC2 Container Service
• SLURM for MPI on HPC clusters
• Person typing SSH commands
• Etcetera…
• Rendezvous:
• Let instances find each other once started
• Use key/value store, e.g. Redis, Amazon ElastiCache
Multi-machine rendezvous
$ python trainer.py –n=2 –id=0
# Set address at 0/1
# Wait for address at 1/0
# Get address at 1/0
# Byte-compare sockaddr
# If <: close() and connect()
# If >: accept() and close()
# Socket is connected!
$ python trainer.py –n=2 –id=1
# Set address at 1/0
# Wait for address at 0/1
# Get address at 0/1
# Byte-compare sockaddr
# If <: close() and connect()
# If >: accept() and close()
# Socket is connected!
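The handshake above can be simulated without real sockets. In this sketch an in-memory class stands in for the shared key/value store, two threads stand in for the two trainers, and the byte comparison decides which side connects and which side accepts. The class `KeyValueStore`, the function `rendezvous`, and the addresses are hypothetical, and the sketch assumes the two addresses differ.

```python
import threading
from typing import Dict


class KeyValueStore:
    """In-memory stand-in for a shared store such as Redis (illustrative only)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self._cond = threading.Condition()

    def set(self, key: str, value: bytes) -> None:
        with self._cond:
            self._data[key] = value
            self._cond.notify_all()

    def wait_get(self, key: str, timeout: float = 5.0) -> bytes:
        # Block until another participant has published the key.
        with self._cond:
            if not self._cond.wait_for(lambda: key in self._data, timeout):
                raise TimeoutError(f"no value published for {key}")
            return self._data[key]


def rendezvous(store: KeyValueStore, my_id: int, peer_id: int,
               my_addr: bytes, roles: Dict[int, str]) -> None:
    store.set(f"{my_id}/{peer_id}", my_addr)          # set address at id/peer
    peer_addr = store.wait_get(f"{peer_id}/{my_id}")  # wait for, then get peer/id
    # Byte-compare the addresses so exactly one side initiates the connection.
    roles[my_id] = "connect" if my_addr < peer_addr else "accept"


store = KeyValueStore()
roles: Dict[int, str] = {}
addrs = {0: b"10.0.0.1:5000", 1: b"10.0.0.2:5000"}
threads = [
    threading.Thread(target=rendezvous, args=(store, i, 1 - i, addrs[i], roles))
    for i in (0, 1)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Both sides run the same code and still agree on their roles, because the comparison is symmetric: the side with the smaller address connects and the other accepts.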
Multi-machine reduction
• allreduce in 3 stages:
• Local reduce (reduce from all GPUs to system memory)
• Allreduce across machines
• Local broadcast (broadcast from system memory to all GPUs)
• Single allreduce per buffer in the model
• Runtime depends on size of the buffer and network speed
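The three stages can be sketched with nested Python lists standing in for GPU and host buffers. The function names are hypothetical, the cross-machine step is reduced to a plain sum, and no real communication library is involved.

```python
from typing import List

Vector = List[float]


def elementwise_sum(vectors: List[Vector]) -> Vector:
    return [sum(vals) for vals in zip(*vectors)]


def hierarchical_allreduce(machines: List[List[Vector]]) -> List[List[Vector]]:
    """machines[m][g] is the gradient buffer held by GPU g on machine m."""
    # Stage 1: local reduce, all GPUs on a machine into one host buffer.
    host_buffers = [elementwise_sum(gpus) for gpus in machines]
    # Stage 2: allreduce across machines (here a plain sum of host buffers).
    total = elementwise_sum(host_buffers)
    # Stage 3: local broadcast, copy the result back to every GPU.
    return [[list(total) for _ in gpus] for gpus in machines]


machines = [
    [[1.0, 2.0], [3.0, 4.0]],   # machine 0, two GPUs
    [[5.0, 6.0], [7.0, 8.0]],   # machine 1, two GPUs
]
result = hierarchical_allreduce(machines)
```

Every GPU ends up holding the same summed buffer; dividing by the total number of GPUs would turn the sum into the average used for the weight update.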
Multi-machine reduction
[Figure: Parameter size distribution for ResNet-50 (N=214). x-axis: parameter size (power of 2), 6 to 22; y-axis: 0 to 60; series: < 4k and > 4k]
[Figure: x-axis: Allreduce (milliseconds), 100 to 400; y-axis: 0 to 1]
scaling efficiency = (Tf + Tb) / (Tf + max(Tb, M/B))
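The scaling efficiency expression above can be evaluated directly. The slide does not define its symbols, so this sketch assumes Tf is the forward-pass time, Tb the backward-pass time, M the number of bytes to reduce, and B the network bandwidth in bytes per second. All numbers are hypothetical.

```python
def scaling_efficiency(t_forward: float, t_backward: float,
                       model_bytes: float, bandwidth: float) -> float:
    """(Tf + Tb) / (Tf + max(Tb, M/B)), with times in seconds."""
    t_reduce = model_bytes / bandwidth  # M/B: time to move the gradients
    return (t_forward + t_backward) / (t_forward + max(t_backward, t_reduce))


# Hypothetical numbers: 100 MB of gradients, 50 ms forward, 100 ms backward.
hidden = scaling_efficiency(0.05, 0.10, 100e6, 2e9)     # M/B = 50 ms
exposed = scaling_efficiency(0.05, 0.10, 100e6, 0.5e9)  # M/B = 200 ms
```

When the reduction takes no longer than the backward pass it is fully hidden and the efficiency is 1. When it takes 200 ms against a 100 ms backward pass, the efficiency in this example drops to about 0.6.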
Multi-machine observations
• Single slow machine slows down entire collective
• Multi-machine scales well only if:
• Time spent reducing gradients is <= time spent in the backward pass
• Larger model size means more time on the network
• Larger global batch size requires tuning
• Current state of the art is a global batch size of 32k on the ImageNet dataset
(by Preferred Networks; https://arxiv.org/pdf/1711.04325.pdf)
Caffe2 on AWS
• Amazon Deep Learning AMI
• CUDA 9 / cuDNN 7 / NCCL 2
• Amazon ElastiCache for VM rendezvous
• VPC for private network
• (optional) EFS for storing checkpoints
Caffe2 on AWS
• Use Caffe2 installed on AMI (stable)
• Use Caffe2 Docker image (stable & nightly)
• nvidia-docker for execution
Caffe2 on AWS
• Data input is an open problem
• Small datasets in RAM or local disk
• Larger datasets off of network (e.g. S3)
Getting Started
Key takeaways & call to action
• Amazon AI is well-optimized to support Caffe2 and many other frameworks
• The P3 Instance brings a leap forward in performance for deep learning
• Distributed deep learning with Caffe2 enables large-scale training in hours instead of days/weeks
Call to action:
• Get started with Caffe2: http://caffe2.ai/
• Use the Deep Learning AMI: https://aws.amazon.com/amazon-ai/amis/
The Deep Learning Revolution
Terrence Sejnowski, The Salk Institute for Biological Studies
Eye, Robot: Computer Vision and Autonomous
Robotics
Aaron Ames & Pietro Perona, California Institute of Technology
Exploiting the Power of Language
Alexander Smola, Amazon Web Services
Reducing Supervision: Making More with Less
Martial Hebert, Carnegie Mellon University
Learning Where to Look in Video
Kristen Grauman, University of Texas
Look, Listen, Learn: The Intersection of Vision and
Sound
Antonio Torralba, MIT
Investing in the Deep Learning Future
Matt Ocko, Data Collective Venture Capital
Thursday, November 30th
1:00 - 5:00pm | Venetian, Ballroom F
https://reinvent.awsevents.com/learn/deep-learning-summit/
Reminder: please fill out your surveys
Thank you!
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Deep Learning Using Caffe2 on AWS - MCL313 - re:Invent 2017

  • 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS re:Invent: Scaling Vision Models Using Caffe2 on AWS
      • Pieter Noordhuis | Caffe2 Engineering at Facebook
      • Joseph Spisak | Head of AI/ML Partnerships at AWS
      • November 29, 2017
  • 2. Agenda
      • The AWS ML strategy
      • Deep learning and GPU compute
      • The Caffe2 story
      • Amazon + Facebook = optimized Caffe2 on AWS
      • How to get started
      • Key takeaways and call to action
  • 3. ML @ AWS: Our mission. Enable customers (at all levels of expertise) to build machine learning-driven applications
  • 4. ML in the Hands of Every Developer: Services, Platforms, Frameworks, Infrastructure
  • 5. AWS ML Stack
      • Application Services
          • Vision: Rekognition Image, Rekognition Video
          • Speech: Amazon Polly, Transcribe
          • Language: Amazon Lex, Translate, Comprehend
      • Platform Services: Amazon SageMaker, AWS DeepLens, Amazon Machine Learning, Mechanical Turk, Spark & EMR
      • Frameworks & Infrastructure
          • AWS Deep Learning AMI: Apache MXNet, PyTorch, Cognitive Toolkit, Keras, Caffe2 & Caffe, TensorFlow, Gluon
          • GPU (P3 Instances), Mobile, CPU, IoT (Greengrass)
  • 6. AWS ML Stack (same diagram as slide 5)
  • 7. Amazon EC2 P3 Instances (October 2017): the fastest, most powerful GPU instances in the cloud
      • Up to eight NVIDIA Tesla V100 GPUs
      • 1 PetaFLOP of computational performance (14x better than P2)
      • 300 GB/s GPU-to-GPU communication over NVLink (9x better than P2)
      • 16 GB GPU memory with 900 GB/s peak GPU memory bandwidth
  • 8. AWS Deep Learning AMI
      • Get started quickly with easy-to-launch tutorials
      • Hassle-free setup and configuration
      • Pay only for what you use; no additional charge for the AMI
      • Accelerate your model training and deployment
      • Support for popular deep learning frameworks
  • 9. Open Neural Network Exchange (ONNX)
      • Developers can choose the framework that best fits their needs
      • More customers can take advantage of MXNet's performance and scalability
      • MXNet users can run their models on various mobile and edge devices (Qualcomm, Huawei, Intel, and ARM announced support for ONNX)
  • 10. Caffe2
  • 11. The original Caffe
      • Grad student-driven project
      • Focuses on CV applications
      • Adopted by industry
      • #2 DL framework in popularity
  • 12. Caffe2 brings…
      • Full computation graph
      • First-class distributed support
      • Cross-platform
      • CV / NLP / speech / ranking
  • 13. Cross-platform support. Caffe2 uses CMake and builds on:
      • Linux / Mac
      • Windows
      • iOS
      • Android
      • Tegra K1/X2
      • Raspberry Pi
  • 14. (no slide text)
  • 15. Modularity: NNPack, cuDNN, Metal, custom code
  • 16. Modularity enables
      • Multiple backend support: cuDNN, MKLDNN, Metal, Snapdragon NPE
      • Easy extensions: caffe2/contrib/... or a custom extension:
          class MyTSNEOp : public Operator<CPUContext> {…};
          REGISTER_OPERATOR(TSNE, MyTSNEOp);
  • 17. Defining a model
      • The model is a container for all ops
      • The model can be converted to protobuf
      • Argument scope for the Brew API
      • The Brew API is a set of factory functions
      • Image input is an operator
          • "reader" is an object that supports seek(N) and read() → (data, label)
          • Image augmentation
  • 18. Defining a model
      • Helper function to define ResNet-50
      • "data" comes from the image input
      • If a label is specified → softmax & loss
      • Add operators to compute the derivative w.r.t. the loss
      • Add an optimizer to translate gradients into model weight updates
  • 19. Training loop
  • 20. Scaling up
      • Weak scaling vs. strong scaling: here we focus on weak scaling
      • Data vs. model parallelism: here we focus on data parallelism
      • Modes:
          • Single machine / single GPU
          • Single machine / multi GPU
          • Multi machine / single GPU
          • Multi machine / multi GPU
  • 21. Inefficiencies of small batches [Chart: GPU throughput per batch size; x-axis: batch size, 0 to 64; y-axis: approx. images/sec, 0 to 450]
  • 22. Parallelizing the model
      • Instantiates 1 model per device
      • Batch size is multiplied by len(devices)
      • Gradients are reduced (averaged) before applying weight updates
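The data-parallel scheme on slide 22 can be illustrated with a minimal pure-Python sketch. This is not Caffe2 code: the function names, the device list, and the learning rate are hypothetical, and gradients are plain lists of floats rather than GPU tensors. It only shows the two ideas on the slide, namely that the global batch grows with the number of devices and that gradients are averaged before the weight update.

```python
# Illustrative only: hypothetical helpers, not the Caffe2 API.

def average_gradients(per_device_grads):
    """Average gradients element-wise across all devices."""
    num_devices = len(per_device_grads)
    return [sum(values) / num_devices for values in zip(*per_device_grads)]


def data_parallel_step(weights, per_device_grads, learning_rate):
    """Apply one SGD update using the averaged gradients."""
    averaged = average_gradients(per_device_grads)
    return [w - learning_rate * g for w, g in zip(weights, averaged)]


# One model replica per device; the global batch is the per-device
# batch multiplied by len(devices).
devices = ["gpu0", "gpu1", "gpu2", "gpu3"]  # assumed device names
per_device_batch = 32                        # assumed value
global_batch = per_device_batch * len(devices)
```

Because every replica applies the same averaged gradient, all replicas hold identical weights after each step, which is what keeps the copies in sync without exchanging the weights themselves.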
  • 23. Parallelizing the model [Diagram, single device: Forward (L1, L2, L3), Backward (L3b, L2b, L1b), Update (U3, U2, U1)]
  • 24. Parallelizing the model [Diagram, two devices: Forward (L1, L2, L3), Backward (L3b, L2b, L1b), Reduce (R3, R2, R1), Update (U3, U2, U1)]
  • 25. Parallelizing the model [Diagram, two devices: Forward (L1, L2, L3), Backward (L3b, L2b, L1b), Reduce and update (R3, R2, R1, U3, U2, U1)]
  • 26. Parallelizing the model
      • Near-linear scaling (e.g. ~98%)
      • Depends on the efficiency of gradient averaging
      • Every other operator executes locally
      • Faster GPUs mean more pressure on image input
      • Throughput on p3.16xlarge: ~2600 images/sec
  • 27. Parallelizing for multi-machine
      • Key/value store for rendezvous
      • TCP for cross-machine reduction
      • Same "Parallelize" call
  • 28. Multi-machine
      • Scheduling (depends on your environment):
          • Amazon EC2 Container Service
          • SLURM for MPI on HPC clusters
          • Person typing SSH commands
          • Etcetera…
      • Rendezvous:
          • Let instances find each other once started
          • Use a key/value store, e.g. Redis, Amazon ElastiCache
  • 29. Multi-machine rendezvous
      • Machine 0: $ python trainer.py –n=2 –id=0
          # Set address at 0/1
          # Wait for address at 1/0
          # Get address at 1/0
          # Byte-compare sockaddr
          # If <: close() and connect()
          # If >: accept() and close()
          # Socket is connected!
      • Machine 1: $ python trainer.py –n=2 –id=1
          # Set address at 1/0
          # Wait for address at 0/1
          # Get address at 0/1
          # Byte-compare sockaddr
          # If <: close() and connect()
          # If >: accept() and close()
          # Socket is connected!
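The rendezvous steps on slide 29 can be sketched as follows. This is a simplified illustration, not the implementation used by Caffe2: a plain dict stands in for the key/value store (Redis or ElastiCache in the talk), byte strings stand in for the real sockaddr structures, and the helper names are hypothetical. A real implementation would block until the peer's key appears instead of reading it immediately.

```python
# Illustrative only: a dict replaces the key/value store and byte
# strings replace real sockaddr values.

def pair_key(src, dst):
    """Key under which rank `src` publishes its address for rank `dst`."""
    return f"{src}/{dst}"


def publish_address(store, rank, peer, address):
    """Set our listening address so the peer can find it."""
    store[pair_key(rank, peer)] = address


def decide_role(store, rank, peer):
    """Byte-compare both addresses to pick which side connects.

    The side with the smaller address closes its listener and connects;
    the side with the larger address accepts and then closes its listener.
    """
    mine = store[pair_key(rank, peer)]
    theirs = store[pair_key(peer, rank)]  # real code would wait for this key
    if mine < theirs:
        return "connect"
    if mine > theirs:
        return "accept"
    raise ValueError("both ranks published the same address")


store = {}
publish_address(store, 0, 1, b"10.0.0.1:5000")  # assumed example addresses
publish_address(store, 1, 0, b"10.0.0.2:5000")
role_of_rank0 = decide_role(store, 0, 1)
role_of_rank1 = decide_role(store, 1, 0)
```

Comparing the two addresses gives both sides the same deterministic answer without any extra coordination, so exactly one side connects and the other accepts.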
  • 30. Multi-machine reduction
      • Allreduce in 3 stages:
          • Local reduce (reduce from all GPUs to system memory)
          • Allreduce across machines
          • Local broadcast (broadcast from system memory to all GPUs)
      • Single allreduce per buffer in the model
      • Runtime depends on the size of the buffer and network speed
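The three stages on slide 30 can be simulated in a few lines of pure Python. This is a sketch under simplifying assumptions: buffers are lists of floats, "machines" are nested lists of per-GPU buffers, and the cross-machine step is a plain sum rather than a real network collective. All function names are hypothetical.

```python
# Illustrative only: simulates the data flow, not the networking.

def local_reduce(gpu_buffers):
    """Stage 1: sum the buffers of all GPUs on one machine into host memory."""
    return [sum(values) for values in zip(*gpu_buffers)]


def allreduce_across_machines(host_buffers):
    """Stage 2: every machine ends up with the sum over all machines."""
    total = [sum(values) for values in zip(*host_buffers)]
    return [list(total) for _ in host_buffers]


def local_broadcast(host_buffer, num_gpus):
    """Stage 3: copy the reduced host buffer back to every local GPU."""
    return [list(host_buffer) for _ in range(num_gpus)]


def hierarchical_allreduce(machines):
    """Run the three stages for one buffer; `machines[m][g]` is a GPU buffer."""
    host_buffers = [local_reduce(gpus) for gpus in machines]
    reduced = allreduce_across_machines(host_buffers)
    return [local_broadcast(buf, len(gpus)) for buf, gpus in zip(reduced, machines)]


# Two machines with two GPUs each, one two-element gradient buffer per GPU.
machines = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
result = hierarchical_allreduce(machines)
```

Reducing locally first means only one buffer per machine crosses the network, regardless of how many GPUs each machine has.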
  • 31. Multi-machine reduction [Chart: parameter size distribution for ResNet-50 (N=214); x-axis: parameter size (power of 2), 6 to 22; y-axis: 0 to 60; series: < 4k and > 4k]
  • 32. [Chart: scaling efficiency (0 to 1) versus allreduce time (100 to 400 milliseconds)] scaling efficiency = (Tf + Tb) / (Tf + max(Tb, M/B))
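The scaling efficiency formula on slide 32 can be evaluated directly. The slide does not define its symbols, so the sketch below assumes Tf and Tb are the forward and backward pass times, M is the size of the gradients to reduce, and B is the network bandwidth, which makes M/B the allreduce time. The parameter names and the example numbers are illustrative, not measurements from the talk.

```python
# Illustrative only: symbol meanings are assumed, see the note above.

def scaling_efficiency(t_forward, t_backward, model_bytes, bandwidth_bytes_per_s):
    """Compute (Tf + Tb) / (Tf + max(Tb, M/B)).

    The allreduce overlaps with the backward pass, so it only costs
    extra time when M/B exceeds Tb.
    """
    t_reduce = model_bytes / bandwidth_bytes_per_s
    return (t_forward + t_backward) / (t_forward + max(t_backward, t_reduce))


# Reduction hidden behind the backward pass: efficiency is 1.0.
hidden = scaling_efficiency(0.1, 0.2, 100e6, 1e9)

# Reduction twice as long as the backward pass: efficiency drops to 0.6.
exposed = scaling_efficiency(1.0, 2.0, 4.0, 1.0)
```

Under these assumptions, efficiency stays at 1 for as long as the allreduce finishes within the backward pass and falls off once the network becomes the bottleneck.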
  • 33. Multi-machine observations
      • A single slow machine slows down the entire collective
      • Multi-machine scales well only if the time spent reducing gradients is <= the time spent in the backward pass, so the reduction can overlap with it
      • Larger model size means more time on the network
      • Larger global batch size requires tuning
      • Current state of the art is a global batch size of 32k on the ImageNet dataset (by Preferred Networks; https://arxiv.org/pdf/1711.04325.pdf)
  • 34. Caffe2 on AWS
  • 35. Caffe2 on AWS
      • Amazon Deep Learning AMI
      • CUDA 9 / cuDNN 7 / NCCL 2
      • Amazon ElastiCache for VM rendezvous
      • VPC for private network
      • (Optional) EFS for storing checkpoints
  • 36. Caffe2 on AWS
      • Use Caffe2 installed on the AMI (stable)
      • Use the Caffe2 Docker image (stable & nightly)
      • nvidia-docker for execution
  • 37. Caffe2 on AWS
      • Data input is an open problem
      • Small datasets in RAM or on local disk
      • Larger datasets off the network (e.g. S3)
  • 38. Getting Started
  • 39. Key takeaways & call to action
      • Amazon AI is well-optimized to support Caffe2 and many other frameworks
      • The P3 instance brings a leap forward in performance for deep learning
      • Distributed deep learning with Caffe2 enables large-scale training in hours instead of days or weeks
      • Call to action:
          • Get started with Caffe2: http://caffe2.ai/
          • Use the Deep Learning AMI: https://aws.amazon.com/amazon-ai/amis/
  • 40. Thursday, November 30th, 1:00 - 5:00pm | Venetian, Ballroom F | https://reinvent.awsevents.com/learn/deep-learning-summit/
      • The Deep Learning Revolution: Terrence Sejnowski, The Salk Institute for Biological Studies
      • Eye, Robot: Computer Vision and Autonomous Robotics: Aaron Ames & Pietro Perona, California Institute of Technology
      • Exploiting the Power of Language: Alexander Smola, Amazon Web Services
      • Reducing Supervision: Making More with Less: Martial Hebert, Carnegie Mellon University
      • Learning Where to Look in Video: Kristen Grauman, University of Texas
      • Look, Listen, Learn: The Intersection of Vision and Sound: Antonio Torralba, MIT
      • Investing in the Deep Learning Future: Matt Ocko, Data Collective Venture Capital
  • 41. Reminder: please fill out your surveys
  • 42. Thank you!