© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Radhika Ravirala | Solutions Architect | AWS
August 17th, 2017
Deep Dive into Apache MXNet on AWS
Deep learning is the coming of age of neural networks, and it is being used to solve
previously intractable machine learning problems.
[Chart: "deep learning" search trend over the past 10 years, spanning image understanding, speech recognition, natural language processing, and autonomy]
Agenda
• Applications
• Apache MXNet Overview
• Framework Comparison
• Mechanics of Apache MXNet
• Walkthrough | MXNet Jupyter Notebook
• Developer Tools and Resources
Deep Learning | Applications
Autonomous Driving Systems
Biometrics | Mobile Devices
Apache MXNet | Overview
Apache MXNet
• Programmable: simple syntax, multiple languages
• Portable: highly efficient models for mobile and IoT
• High Performance: near-linear scaling across hundreds of GPUs
• Most Open: accepted into the Apache Incubator
• Best on AWS: optimized for deep learning on AWS
Amazon AI: Democratized Artificial Intelligence
• AI Services: Amazon Rekognition, Amazon Polly, Amazon Lex, more to come in 2017
• AI Platform: Amazon Machine Learning, Amazon Elastic MapReduce, Spark & SparkML, more to come in 2017
• AI Engines: Apache MXNet, TensorFlow, Caffe, Theano, Keras, Torch, CNTK
• Hardware: P2, ECS, Lambda, AWS Greengrass, FPGA, EMR/Spark, more to come in 2017
Amazon Strategy | Apache MXNet
• Integrate with AWS Services: bring scalable deep learning to AWS services such as Amazon EMR, AWS Lambda, and Amazon ECS.
• Foundation for AI Services: Amazon AI API services, internal AI research, Amazon core AI development.
• Leverage the Community: the community brings velocity and innovation, with no single project owner or controller.
Deep Learning using MXNet @Amazon
• Applied Research
• Core Research
• Alexa
• Demand Forecasting
• Risk Analytics
• Search
• Recommendations
• AI Services | Rek, Lex, Polly
• Q&A Systems
• Supply Chain Optimization
• Advertising
• Machine Translation
• Video Content Analysis
• Robotics
• Lots of Computer Vision…
• Lots of NLP/U…
*Teams are either actively evaluating, in development, or transitioning to production at scale
Collaborations and Community
• 4th DL framework in popularity (outpacing Torch, CNTK, and Theano)
• Diverse community (spans industry and academia)

[Chart: top contributors across industry and academia (as of 3/30/17): Tianqi Chen (UW), Mu Li (AWS), Eric Xie (AWS), Sergey Kolychev (Whitehat), Sandeep K. (AWS), Yizhi Liu (Mediav), Jian Guo (TuSimple), Yao Wang (AWS), Chiyuan Zhang (MIT), Tianjun Xiao (Tesla), Xingjian Shi (HKUST), Liang Depeng (Sun Yat-sen U.), Nan Zhu (MSFT), Yutian Li (Stanford), Bing Su (Apple)]

[Chart: framework popularity comparison (as of 2/11/17): TensorFlow, Caffe, Keras, Apache MXNet, Theano, DL4J, CNTK, Torch]
Deep Learning Framework Comparison (Apache MXNet | TensorFlow | Cognitive Toolkit)
• Industry owner: N/A, Apache community | Google | Microsoft
• Programmability: imperative and declarative | declarative only | declarative only
• Language support: R, Python, Scala, Julia, C++, JavaScript, Go, Matlab, and more | Python, C++, experimental Go and Java | Python, C++, BrainScript
• Code length, AlexNet (Python): 44 sloc | 107 sloc using TF.Slim | 214 sloc
• Memory footprint (LSTM): 2.6 GB | 7.2 GB | N/A
*sloc – source lines of code
[Chart: Multi-GPU Scaling With MXNet; speedup vs. number of GPUs (1-16) for Inception v3, ResNet, and AlexNet against the ideal line; 91% efficiency]

[Chart: Multi-GPU Scaling With MXNet; speedup vs. number of GPUs (1-256)]

[Chart: Multi-Machine Scaling With MXNet; speedup vs. number of machines (1-256) for Inception v3, ResNet, and AlexNet against the ideal line; 88% efficiency]
Apache MXNet | The Basics
• NDArray: manipulate multi-dimensional arrays in a command-by-command paradigm (imperative).
• Symbol: symbolic expressions for neural networks (declarative).
• Module: intermediate- and high-level interfaces for neural network training and inference.
• Loading Data: feeding data into training/inference programs (see the sketch after this list).
• Mixed Programming: training algorithms developed using NDArrays in concert with Symbols.
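
Loading data in practice: a minimal sketch using mx.io.NDArrayIter over in-memory arrays (the shapes and values here are illustrative, not from the talk):

import mxnet as mx
import numpy as np

# Synthetic data: 100 samples, 3 features, binary labels (illustrative only)
data = np.random.rand(100, 3).astype('float32')
label = np.random.randint(0, 2, (100,))

# NDArrayIter wraps in-memory arrays; file-backed iterators
# (e.g. ImageRecordIter) expose the same batch interface
train_iter = mx.io.NDArrayIter(data, label, batch_size=10, shuffle=True)

for batch in train_iter:
    print(batch.data[0].shape, batch.label[0].shape)   # (10, 3) (10,)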
Anatomy of a Deep Learning Model

Common building blocks (the slide diagrams these; the operator calls are reproduced below):
• Fully connected layer (input vector × weights → output vector): mx.sym.FullyConnected(data, num_hidden=128)
• Convolution: mx.sym.Convolution(data, kernel=(5,5), num_filter=20)
• Pooling over a 2×2 window, e.g. over [[4,2],[2,0]] max pooling gives 4 and average pooling gives 2: mx.sym.Pooling(data, pool_type="max", kernel=(2,2), stride=(2,2))
• Activation, with act_type one of "relu", "tanh", "sigmoid", "softrelu": mx.sym.Activation(data, act_type="relu")
• Embedding, e.g. word vectors where cos(w, queen) = cos(w, king) - cos(w, man) + cos(w, woman): mx.symbol.Embedding(data, input_dim, output_dim = k)
• Unrolled LSTM: lstm.lstm_unroll(num_lstm_layer, seq_len, len, num_hidden, num_embed)
• Output layer and training: mx.sym.SoftmaxOutput, mx.model.FeedForward, model.fit

Deep Learning Models
• Image → image labels (Bicycle, People, Road, Sport); face search; image segmentation; neural art
• Image → image caption: “People Riding Bikes”
• Text → machine translation: “People Riding Bikes” → “Οι άνθρωποι ιππασίας ποδήλατα”
• Video → events
• Speech → text
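
To see how these operators compose, here is a sketch wiring a few of them into a small LeNet-style network (layer sizes are illustrative, not from the talk):

import mxnet as mx

# A small convolutional network assembled from the operators above
data  = mx.sym.Variable('data')                  # e.g. batches of 1x28x28 images
conv1 = mx.sym.Convolution(data=data, kernel=(5,5), num_filter=20)
act1  = mx.sym.Activation(data=conv1, act_type="relu")
pool1 = mx.sym.Pooling(data=act1, pool_type="max", kernel=(2,2), stride=(2,2))
flat  = mx.sym.Flatten(data=pool1)
fc1   = mx.sym.FullyConnected(data=flat, num_hidden=128)
act2  = mx.sym.Activation(data=fc1, act_type="relu")
fc2   = mx.sym.FullyConnected(data=act2, num_hidden=10)   # 10 classes
out   = mx.sym.SoftmaxOutput(data=fc2, name='softmax')

# Inspect the resulting graph and per-layer shapes
mx.viz.print_summary(out, shape={'data': (1, 1, 28, 28)})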
Imperative Programming

import numpy as np
a = np.ones(10)
b = np.ones(10) * 2
c = b * a
d = c + 1

PROS
• Straightforward and flexible
• Takes advantage of language-native features (loops, conditionals, the debugger)
• Easy to tweak in Python
• E.g. NumPy, Matlab, Torch, …

CONS
• Hard to optimize
Declarative Programming

A = Variable('A')
B = Variable('B')
C = B * A
D = C + 1
f = compile(D)
d = f(A=np.ones(10),
      B=np.ones(10)*2)

[Diagram: computation graph; inputs A and B feed a × node producing C, which feeds a +1 node producing D]

PROS
• More chances for optimization, e.g. C can share memory with D because C is deleted later
• Crosses different languages
• E.g. TensorFlow, Theano, Caffe

CONS
• Less flexible
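
The snippet above is pseudocode; for reference, here is a sketch of the same computation against MXNet's actual symbolic API, where binding the graph plays the role of compile():

import mxnet as mx
import numpy as np

A = mx.sym.Variable('A')
B = mx.sym.Variable('B')
C = B * A
D = C + 1

# Binding is where the engine sees the whole graph and can plan
# optimizations such as reusing C's memory for D
executor = D.bind(ctx=mx.cpu(),
                  args={'A': mx.nd.ones((10,)),
                        'B': mx.nd.ones((10,)) * 2})
d = executor.forward()[0]
print(d.asnumpy())   # array of 3.0s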
Mixed Programming Paradigm

IMPERATIVE: NDARRAY API
>>> import mxnet as mx
>>> a = mx.nd.zeros((100, 50))
>>> b = mx.nd.ones((100, 50))
>>> c = a + b
>>> c += 1
>>> print(c)

DECLARATIVE: SYMBOLIC EXECUTOR
>>> import mxnet as mx
>>> net = mx.symbol.Variable('data')
>>> net = mx.symbol.FullyConnected(data=net, num_hidden=128)
>>> net = mx.symbol.SoftmaxOutput(data=net)
>>> texec = mx.module.Module(net)
>>> texec.forward(data=c)   # an NDArray can be set as input to the graph
>>> texec.backward()
Mixed Programming Paradigm

Embed symbolic expressions into imperative programming:

texec = mx.module.Module(net)
for batch in train_data:
    texec.forward(batch)
    texec.backward()
    for param, grad in zip(texec.get_params(), texec.get_grads()):
        param -= 0.2 * grad
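
The loop above is schematic (get_grads() is slide shorthand). A runnable sketch of the same idea against the Module API as it shipped, with synthetic data standing in for train_data:

import mxnet as mx
import numpy as np

# Synthetic data standing in for train_data (illustrative only)
X = np.random.rand(1000, 100).astype('float32')
y = np.random.randint(0, 10, (1000,))
train_iter = mx.io.NDArrayIter(X, y, batch_size=32, shuffle=True)

# Declarative part: define the network symbolically
net = mx.sym.Variable('data')
net = mx.sym.FullyConnected(data=net, num_hidden=128)
net = mx.sym.Activation(data=net, act_type='relu')
net = mx.sym.FullyConnected(data=net, num_hidden=10)
net = mx.sym.SoftmaxOutput(data=net, name='softmax')

# Imperative part: drive training from ordinary Python
mod = mx.mod.Module(symbol=net)
mod.fit(train_iter, num_epoch=2,
        optimizer='sgd', optimizer_params={'learning_rate': 0.2})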
Amalgamation
• Fit the core library with all dependencies into a single C++ source file
• Easy to compile on any platform

[Image: BlindTool by Joseph Paul Cohen, demo on a Nexus 4]
[Image: MXNet running in the browser with JavaScript]
And now, even easier with Apple’s Core ML
Roadmap / Areas of Investment
• Usability
• Keras integration / Gluon interface
• Apple’s Core ML converter
• MinPy being merged (dynamic computation graphs, standard NumPy interface)
• Documentation (installation, native documents, etc.)
• Tutorials, examples | Jupyter notebooks
• Platform support (Linux, Windows, OS X, mobile, …)
• Language bindings (Python, C++, R, Scala, Julia, JavaScript, …)
• Sparse datatypes and LSTM performance improvements
• Deploy your model your way: Lambda (+ Greengrass), Amazon EC2/Docker, Raspberry Pi
Gluon Experimental Interface
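
Gluon was experimental at the time of this talk; it layers an imperative, define-by-run API over MXNet. A minimal training-step sketch (layer sizes and data are illustrative):

import mxnet as mx
from mxnet import autograd, gluon

# Define the network imperatively; shapes are inferred on first use
net = gluon.nn.Sequential()
net.add(gluon.nn.Dense(128, activation='relu'))
net.add(gluon.nn.Dense(10))
net.initialize()

loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

# One training step on a synthetic batch (illustrative only)
x = mx.nd.random_uniform(shape=(32, 100))
y = mx.nd.zeros((32,))
with autograd.record():          # the graph is recorded on the fly
    loss = loss_fn(net(x), y)
loss.backward()
trainer.step(batch_size=32)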
Apache MXNet | Developer Tools and Resources
One-Click GPU or CPU Deep Learning

AWS Deep Learning AMI (up to ~40k CUDA cores):
• Apache MXNet, TensorFlow, Theano, Caffe, Torch, Keras
• Pre-configured CUDA drivers, MKL
• Anaconda, Python 3
• Ubuntu and Amazon Linux
• + AWS CloudFormation template
• + Container image
Application Examples | Jupyter Notebooks
• https://github.com/dmlc/mxnet-notebooks
• Basic concepts
• NDArray - multi-dimensional array computation
• Symbol - symbolic expression for neural networks
• Module - neural network training and inference
• Applications
• MNIST: recognize handwritten digits
• Check out the distributed training results
• Predict with pre-trained models
• LSTMs for sequence learning
• Recommender systems
• Train a state-of-the-art computer vision model (CNN)
• Lots more…
Developer Resources
MXNet Resources:
• MXNet Blog Post | AWS Endorsement
• Read up on MXNet and Learn More: mxnet.io
• MXNet Github Repo
• MXNet Recommender Systems Talk | Leo Dirac
Developer Resources:
• Deep Learning AMI | Amazon Linux
• Deep Learning AMI | Ubuntu
• CloudFormation Template Instructions
• Deep Learning Benchmark
• MXNet on Lambda
• MXNet on ECS/Docker
• MXNet on Raspberry Pi | Image Detector using Inception Network
Apache MXNet | Jupyter Notebook Demo
Training MNIST on MXNet
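
A minimal sketch of what the demo covers, using MXNet's bundled MNIST helper (the actual notebook may differ in details):

import mxnet as mx

mnist = mx.test_utils.get_mnist()   # downloads and loads MNIST as numpy arrays
train_iter = mx.io.NDArrayIter(mnist['train_data'], mnist['train_label'],
                               batch_size=100, shuffle=True)
val_iter = mx.io.NDArrayIter(mnist['test_data'], mnist['test_label'],
                             batch_size=100)

# A simple multilayer perceptron over the flattened 28x28 images
data = mx.sym.Variable('data')
net = mx.sym.FullyConnected(mx.sym.Flatten(data), num_hidden=128)
net = mx.sym.Activation(net, act_type='relu')
net = mx.sym.FullyConnected(net, num_hidden=10)
net = mx.sym.SoftmaxOutput(net, name='softmax')

mod = mx.mod.Module(symbol=net, context=mx.cpu())
mod.fit(train_iter, eval_data=val_iter, num_epoch=5,
        optimizer='sgd', optimizer_params={'learning_rate': 0.1},
        eval_metric='acc')
print(mod.score(val_iter, 'acc'))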
Thank You!
spisakj@amazon.com
gernest@amazon.com
