AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Tom 'Elvis' Jones, Partner Solutions Architect, Amazon Web Services
Diego Oppenheimer, CEO, Algorithmia
December 1, 2016
Bringing Deep Learning to the Cloud
with Amazon EC2
CMP314

* As of 1 October 2016
2009
48
280
722
82
2011 2013 2015
AWS has been continually expanding its services to support virtually any cloud workload
and now has more than 70 services that range from compute, storage, networking,
database, analytics, application services, deployment, management and mobile. AWS
has launched a total of 706 new features and/or services year to date* - for a total of
2,601 new features and/or services since inception in 2006.
AWS Pace of Innovation

Microservices
Serverless
Lambda
Pay as you go
Scalability

Machine learning
Machine learning is the technology that
automatically finds patterns in your data and
uses them to make predictions for new data
points as they become available
Your data + machine learning = smart applications

New P2 GPU Instance Types
• New EC2 GPU instance type for accelerated computing
• Offers up to 16 NVIDIA K80 GPUs (8 K80 cards) in a single instance
• The 16xlarge size provides:
• A combined 192 GB of GPU memory, 40 thousand CUDA cores
• 70 teraflops of single precision floating point performance
• Over 23 teraflops of double precision floating point performance
• Example workloads include:
• Deep learning, computational fluid dynamics, computational finance,
seismic analysis, molecular modeling, genomics, VR content rendering,
accelerated databases

P2 Instance Types
Three P2 instance sizes:
Instance Size GPUs GPU
Peer to
Peer
vCPUs Memory
(GiB)
Network
Bandwidth*
p2.xlarge 1 - 4 61 1.25Gbps
p2.8xlarge 8 Y 32 488 10Gbps
p2.16xlarge 16 Y 64 732 20Gbps
*In a placement group

P2 Instance Summary
High-performance GPU instance types, with many innovations
• Based on NVIDIA K80 GPUs, with up to 16 GPUs in a single instance
• With dedicated peer-to-peer connections supporting GPUDirect
• Intel Broadwell processors with up to 64 vCPUs and up to 732GiB RAM
• 20Gbps network on EC2, using Elastic Network Adaptor (ENA)
• Supporting a wide variety of ISV applications and open-source
frameworks

Diego Oppenheimer – CEO, Algorithmia
Product developer, entrepreneur, extensive background
in all things data.
Microsoft: PowerPivot, PowerBI, Excel, and SQL Server
Founder of algorithmic trading startup
BS/MS Carnegie Mellon University

Make state-of-the-art
algorithms accessible and
discoverable by everyone.

A marketplace for algorithms...
We host algorithms
Anyone can turn their algorithms into scalable web services
Typical users: scientists, academics, domain experts
We make them discoverable
Anyone can use and integrate these algorithms into their solutions
Typical users: businesses, data scientists, app developers, IoT makers
We make them monetizable
Users of algorithms pay for algorithms they use
Typical scenarios: heavy-load use cases with large user base

Sample algorithms (2600+ and growing daily)
● Text analysis summarizer, sentence tagger, profanity detection
● Machine learning digit recognizer, recommendation engines
● Web crawler, scraper, pagerank, emailer, html to text
● Computer vision image similarity, face detection, smile detection
● Audio & video speech recognition, sound filters, file conversions
● Computation linear regression, spike detection, fourier filter
● Graph traveling salesman, maze generator, theta star
● Utilities parallel for-each, geographic distance, email validator

Machine Intelligence Stack
Applications
Scalable CPU/GPU compute
Algorithms as micro services
Data stores

Scale?
• Support the workloads of 32,000 developers
• Mixed use CPU/GPU
• Spikey traffic
• Heterogeneous hardware

Hosting Deep Learning
Cloud hosting of deep learning models can be especially challenging
due to complex hardware and software dependencies
At Algorithmia we had to:
• Learn how to host and scale the 5 most common deep learning
frameworks (more coming).
• Deal with scaling and spikey traffic on GPUs
• Deal with multitenancy on GPUs
• Build an extensible and dynamic architecture to support deep
learning

First: What is Deep Learning?
• Deep learning uses artificial neurons similar to the brain to
represent high-dimensional data
• The mammal brain is organized in a deep architecture, e.g., the visual
system has 5 to 10 levels. [1]
• Deep learning excels in tasks where the basic unit (a single pixel,
frequency or word) has very little meaning in and of itself, but
contains high-level structure. Deep nets have been effective at
learning this structure without human intervention.
[1] Serre, Kreiman, Kouh, Cadieu, Knoblich, & Poggio,

What is Deep Learning Being Applied To?
Primarily: huge growth in unstructured data
• Pictures
• Videos
• Audio
• Speech
• Websites
• Emails
• Reviews
• Log files
• Social media

Today’s Use Cases
• Computer vision
• Image classification
• Object detection
• Face recognition
• Natural language
• Speech to test
• Chatbots
• Q&A systems (Siri, Alexa, Google Now)
• Machine translation
• Optimization
• Anomaly detection
• Recommender systems

Why Now?
…and why is deep learning suddenly everywhere?
Advances in research
• LeCun, Gradient-Based Learning Applied to
Document Recognition,1998
• Hinton, A Fast Learning Algorithm for Deep
Belief Nets, 2006
• Bengio, Learning Deep Architectures for AI,
2009
Advances in hardware
• GPUs: 10x performance, 5x energy efficiency
http://www.nvidia.com/content/events/geoInt2015/LBrown_DL_Image_ClassificationGEOINT.pdf

Deep Learning Hardware (2016)
GPUs: NVIDIA is dominating
One of the first GPU neural nets was on a NVIDIA
GTX 280 up to 9 layers neural network (2010
Ciresan and Schmidhuber)
• NVIDIA chips tend to outperform AMD
• More importantly, all the major frameworks use
CUDA as a first-class citizen. Poor support for
AMD’s OpenCL.

Deep Learning Hardware
GPU:
• Becoming more tailored for deep learning (e.g., Pascal chipset)
Custom hardware:
• FPGA (AWS F1, MSFT Project Catapult)
ASIC:
• Google TPU
• IBM TrueNorth
• Nervana Engine
• Graphcore IPUs

GPU Deep Learning Dependencies
Meta deep learning framework
Deep learning framework
cuDNN
CUDA
NVIDIA driver
GPU

Theano
Created by Université de Montréal
Theano pioneered the trend of using a symbolic graph for programming a network.
Very mature framework, good support for many kinds of networks.
Pros:
• Use Python + Numpy
• Declarative computational graph
• Good support for RNNs
• Wrapper frameworks make it more accessible (Keras, Lasagne, Blocks)
• BSB License
Cons:
• Low level framework
• Error messages can be unhelpful
• Large models can have long compile times
• Weak support for pre-trained models

Torch
Created by collaboration of various researchers. Used by DeepMind (prior to
Google). Torch is a general scientific computing framework for Lua.
Torch is more flexible than TensorFlow and Theano in that it’s imperative while
TF/Theano are declarative. That makes some operations (e.g, beam search) much
easier to do.
Pros:
• Very flexible multidimensional array engine
• Multiple back ends (CUDA and OpenMP)
• Lots of pre-trained models available
Cons:
• Lua
• Not good for recurrent networks
• Lack of commercial support

Caffe
Created by Berkley Vision and Learning center and community contributors.
Probably the most used framework today, certainly for CV.
Pros:
• Optimized for feedforward networks, convolutional nets and image processing.
• Simple Python API
• BSD License
Cons:
• C++/CUDA for new GPU layers
• Limited support for recurrent networks (recently added)
• Cumbersome for big networks (GoogLeNet, ResNet)

TensorFlow
Created by Google.
TensorFlow is written with a Python API over a C/C++ engine. TensorFlow generates a
computational graph (e.g., series of matrix operations) and performs automatic
differentiation.
Pros:
• Uses Python + Numpy
• Lots of interest from community
• Highly parallel, and designed to use various back ends (software, gpu, asic)
• Apache License
Cons:
• Slower than other frameworks [1]
• More features, more abstractions than torch
• Not many pre-trained models yet
[1] https://arxiv.org/pdf/1511.06435v3.pdf

Networks for Training
Where to get networks:
• If you’re just interested in using deep learning to classify images, you can
usually find off-the-shelf networks.
• VGG, GoogleNet, AlexNet, SqueezeNet
• Caffe Model Zoo

Training vs. Running
Deep learning generally consists of two phases: training and running.
Training deep learning models is challenging, with many solutions available
today.
Running deep learning models (at scale) is the next step, and has its own
challenges.

Hosting Deep Learning
Making deep learning models available as an API represents a
unique set of challenges that are rarely, if ever, addressed in
tutorials.

Why ML in the Cloud?
• Need to react to live user data
• Don’t want to manage own servers
• Need enough servers to sustain max load. You can save money
using cloud services
• Limited compute capacity on mobile

http://blog.algorithmia.com/2016/07/cloud-hosted-deep-learning-models
Use case: http://colorize-it.com

Capacity required at peak
Capacity required at peak
Potential for ghost/wasted
capacity
Without elastic
compute on GPUs
our cost would be
~75% more

Service Oriented Architecture
Going to want a dedicated infrastructure for handling
computationally intensive tasks like deep learning

LOADBALANCERS
CPU WORKER #1
CPU WORKER #N
Docker(algorithm#1)
Docker(algorithm#2)
..
Docker(algorithm#n)
CLIENTS
APISERVERSAPISERVERS
GPU WORKER #1
GPU WORKER #N
Docker(deep-algo#1)
Docker(deep-algo#2)
..
Docker(deep-algo#n)
m4
m4
m4
x1

Why P2s?
• More video memory
• 12 GB per GPU
• Modern CUDA support
• More CUDA cores to run in parallel
• New messages
• In particular, we had that problem with CUDA 3.0 not allowing
us to share memory as efficiently
• Price per flop

Customer Showcase:
• CSDisco offers cutting-edge eDiscovery technology for attorneys “DISCO ML”.
• DISCO ML is a deep learning based exploration tool that asynchronously re-learns as
attorneys move through their normal discovery workflows.
• “The proprietary, multi-layer artificial network uses deep learning and arrays of GPUs
to unpack and process learned information quickly. Combining Google’s advanced
Word2Vec embeddings with powerful convolutional and recurrent neural networks for
text...”

Customer Showcase:
Why they chose to host their ML on Algorithmia ?
• Scalability: Required scalable GPU based compute fabric for their neural net based
ML approach to on-board hundreds of new customers without taxing their engineering
department.
• Flexibility: Expected high peaks of usage during certain hours.
• Reduce ghost compute: excess capacity = unnecessary cost
• Ability to chain algorithms: Process is a series of operations from scoring, training ,
validation, querying – each is its own model hosted on Algorithmia. Algorithmia
provides an easy way to pipe algorithms into each other.

Challenges and how EC2 helps
(with some)

Challenge #1: New Hardware, Latest CUDA
AWS: G2 (Grid K520- 2013) and P2 instances
Azure: N-Series
Google: Just announced preview
SoftLayer: various cards including Tesla K80 and M60.
Small providers: Nimbix, Cirrascale, Penguin

Challenge #2: Language Bindings
You probably already have an existing stack in some programming
language. How does it talk to your deep learning framework?
Hope you like Python ( or Lua)
Solution: Services!

Challenge #3: Large Models
Deep learning models are getting larger
• State of the art networks are easily multi-gigabyte
• Need to be loaded and scaled
Solutions:
• More hardware
• Smaller models

Memory Per Model
Size (MB) Error % (top-5)
SqueezeNet
Compressed
0.6 19.7%
SqueezeNet 4.8 19.7%
AlexNet 240 19.7%
Inception v3 84 5.6%
VGG-19 574 7.5%
ResNet-50 102 7.8%
ResNet-200 519 4.8%

SqueezeNet
Hypothesis: the networks we’re using today are much larger and
more complicated than they need to be.
Enter SqueezeNet: AlexNet-level accuracy with 50x fewer
parameters and < 0.5 MB model size.
Not quite state of the art, but close, and MUCH easier to host.
[1] Iandola, Han, Moskewicz, Ashraf, Dally, Keutz

Model Compression
Recent efforts at aimed at pruning the size of networks
“Reduce the storage requirement of neural networks by 35x to 49x
without affecting their accuracy.” (Han, Mao, Dally -2015)

Challenge #4: GPU Sharing
GPUs were not designed to be shared like CPUs
• Limited amount of video memory
• Even with multi-context management, memory overflows and
unrestricted pointer logic are very dangerous for other
applications
• Developers need a way to share GPU resources safely from
potentially malicious applications.

Challenge #5: GPU Sharing - Containers
Docker – new standard in deploying applications, but adds an
additional layer of challenges to GPU computing.
• NVIDIA drivers must match inside and outside containers.
• CUDA drivers must match inside and outside containers.
• Some algorithms require X windows, which must be started
outside the container and mounted inside
• Nvidia-docker container is helpful but not a complete solution.
• New AWS Deep Learning AMI -> Huge step in right direction.

Lessons Learned
• Deep learning in the cloud is still in its infancy
• Hosting deep learning models is the next logical step after the
training model, but the difficulty is underappreciated.
• Tooling and frameworks are making things easier, but there is a
lot of opportunity for improvement
Big picture: the challenges involved with creating DL models is only
half the problem. Deploying them is an entirely different skillset.

Thank you!
Try out Algorithmia for free:
Code: reinvent16
Algorithmia.com

Remember to complete
your evaluations!

AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)

Similar to AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314) (20)

More from Amazon Web Services

More from Amazon Web Services (20)

Recently uploaded

Recently uploaded (20)

AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)