AWS 機器學習 II ─ 深度學習 Deep Learning & MXNet

Deep Learning & MXNet
Name: Jhen-Wei Huang (黃振維)
Department: Solutions Architect
Company: AWS

Image
understanding
Speech
recognition
Natural language
processing
…
Autonomy
Deep Learning Applications
https://trends.google.com/
Deep Learning Applications

Autonomous Driving SystemsAutonomous Driving Systems

Personal assistantsPersonal assistants

Line-free shoppingLine-free shopping

Why It’s Different This Time
Everything is digital: large data sets are available
Imagenet: 14M+ labeled images - http://www.image-net.org/
YouTube-8M: 7M+ labeled videos - https://research.google.com/youtube8m/
AWS public data sets - https://aws.amazon.com/public-datasets/
The parallel computing power of GPUs make training possible
Simard et al (2005), Ciresan et al (2011)
State of the art networks have hundreds of layers
Baidu’s Chinese speech recognition: 4TB of training data, +/- 10 Exaflops
Cloud scalability and elasticity make training affordable
Grab a lot of resources for fast training, then release them
Using a DL model is lightweight: you can do it on a Raspberry Pi

Deep Learning Models
Input Output
1 1 1
1 0 1
0 0 0
3
mx.sym.Convolution(data, kernel=(5,5), num_filter=20)
mx.sym.Pooling(data, pool_type="max", kernel=(2,2),
stride=(2,2)
lstm.lstm_unroll(num_lstm_layer, seq_len, len, num_hidden, num_embed)
4 2
2 0
4=Max
1
3
...
4
0.2
-0.1
...
0.7
mx.sym.FullyConnected(data, num_hidden=128)
2
mx.symbol.Embedding(data, input_dim, output_dim = k)
0.2
-0.1
...
0.7
Queen
4 2
2 0
2=Avg
Input Weights
cos(w, queen) = cos(w, king) - cos(w, man) + cos(w, woman)
mx.sym.Activation(data, act_type="xxxx")
"relu"
"tanh"
"sigmoid"
"softrelu"
Neural Art
Face Search
Image Segmentation
Image Caption
“People Riding
Bikes”
Bicycle, People,
Road, Sport
Image Labels
Image
Video
Speech
Text
“People Riding
Bikes”
Machine Translation
“Οι άνθρωποι
ιππασίας ποδήλατα”
Events
mx.model.FeedForward model.fit
mx.sym.SoftmaxOutput
Anatomy of a Deep Learning Model

Apache MXNet
Apache MXNet
2015 Project created
2016 Nov. Be Amazon DL framework of choice
2017 Jan. Project enters incubation
Latest Version : 0.11
Contributors : 436
Star : 11,578
License : Apache 2.0

Apache MXNet 0.11
- Julien Simon, https://medium.com/@julsimon/keras-shoot-out-tensorflow-vs-mxnet-51ae2b30a9c0
https://aws.amazon.com/cn/blogs/ai/bring-machine-learning-to-ios-apps-using-apache-mxnet-and-apple-core-ml/
Apache MXNet 0.11 Released

Apache MXNet for Deep Learning
https://github.com/apache/incubator-mxnet
Apache MXNet for Deep Learning

Apache MXNet for IoT & the Edge
Apache MXNet for IoT & the Edge

BlindTool by Joseph Paul Cohen, demo on Nexus 4
• Fit the core library with all dependencies into a single
C++ source file
• Easy to compile on any platform
Amalgamation
Runs in browser with Javascript

Apache MXNet – Per formance
https://github.com/ilkarman/DeepLearningFrameworks
Apache MXNet - Performance

Multi-GPU Scaling With MXNet
Multi-GPU Scaling With MXNet

Multi-Machine Scaling With MXNet
Multi-Machine Scaling With MXnet
• Cloud Formation with Deep Learning AMI
• 16x P2.16xlarge
• Mounted on EFS
• ImageNet, 1.2M images, 1K classes
• 152-layer ResNet , 5.4d on 4xK80s (1.2h per epoch)
• 0.22 top-1 validation error
• 6h on 128x K80s , achieves the same validation error

Apache MXNet | The Basics
Apache MXNet | The Basics
• NDArray: Manipulate multi-dimensional arrays in a command line paradigm
(imperative).
• Symbol: Symbolic expression for neural networks (declarative).
• Module: Intermediate-level and high-level interface for neural network
training and inference.
• Loading Data: Feeding data into training/inference programs.
• Mixed Programming: Training algorithms developed using NDArrays in
concert with Symbols.
https://medium.com/@julsimon/an-introduction-to-the-mxnet-api-part-1-848febdcf8ab

Apache MXNet – NDArr yApache MXNet - NDArray
NumPy
• Only CPU
• No automatic differentiation
NDArray
• Support CPUs/GPUs
• Scale to distributed system in
the cloud
• Automatic differentiation
• Lazy evaluation

Apache MXNet – NDArr y
context = mx.cpu()
context = mx.gpu(0)
context = mx.gpu(1)
g = copyto(c)
g = c.as_in_context(mx.gpu(0))
Apache MXNet - NDArray

Imperative Programming
import numpy as np
a = np.ones(10)
b = np.ones(10) * 2
c = b * a
d = c + 1
• Straightforward and flexible.
• Take advantage of language
native features (loop,
condition, debugger).
• E.g. Numpy, Matlab, Torch, …
• Hard to optimize
PROS
CONSEasy to tweak
in Python

Declarative Programming
• More chances for
optimization
• Cross different languages
• E.g. TensorFlow, Theano,
Caffe
• Less flexible
PROS
CONSC can share memory with
D because C is deleted
later
A = Variable('A')
B = Variable('B')
C = B * A
D = C + 1
f = compile(D)
d = f(A=np.ones(10),
B=np.ones(10)*2)
A B
1
+
X

Mixed Programming Paradigm
IMPERATIVE
NDARRAY API
DECLARATIVE
SYMBOLIC
EXECUTOR
>>> import mxnet as mx
>>> a = mx.nd.zeros((100, 50))
>>> b = mx.nd.ones((100, 50))
>>> c = a + b
>>> c += 1
>>> print(c)
>>> import mxnet as mx
>>> net = mx.symbol.Variable('data')
>>> net = mx.symbol.FullyConnected(data=net, num_hidden=128)
>>> net = mx.symbol.SoftmaxOutput(data=net)
>>> texec = mx.module.Module(net)
>>> texec.forward(data=c)
>>> texec.backward()
NDArray can be set
as input to the graph

Demo – Training MXNet on MNIST
https://medium.com/@julsimon/training-mxnet-part-1-mnist-6f0dc4210c62
https://github.com/juliensimon/aws/tree/master/mxnet/mnist

Apache MXNet New Interface – Gluon
New Interface - Gluon

net.hybridize()
develop and debug models
with imperative programming
and switch to efficient symbolic
execution

Symbolic Gluon Imperative
• Efficient & Portable
• But hard to use
• Imperative for developing
• Symbolic for deploying
• Flexible
• May be slow

Apache MXNet Gluon 演示 – Fashion-MNISTMXNet Gluon – Fashion-MNIST
https://github.com/zalandoresearch/fashion-mnist

Apache MXNet
Developer Tools and Resources

Apache MXNet 框架 – AWS 的基础设施
CPU Instance
• C4.8xlarge (36 threads, 60GB RAM, 4Gbit)
• M4.16xlarge (64 threads, 256GB RAM, 10Gbit)
GPU Instance
• P2.16xlarge ( 16X NVIDIA Kepler K80, 64 threads, 732GB RAM, 20Gbit)
• G3.16xlarge (4XNVIDIA Maxwell M60, 64 threads, 488GB RAM, 20Gbit)
• NVIDIA Volta – coming to an instance near you
AWS EC2 Instance

One-Click GPU or CPU
Deep Learning
AWS Deep Learning AMI
Up to~40k CUDA cores
Apache MXNet
TensorFlow
Theano
Caffe
Torch
Keras
Pre-configured CUDA drivers, MKL
Anaconda, Python3
Ubuntu and Amazon Linux
+ AWS CloudFormation template
+ Container image

Apache MXNet 框架 – AWS 的基础设施AWS Machine Image for Deep Learning
http://docs.aws.amazon.com/mxnet/latest/dg/appendix-ami-release-notes.html

Apache MXNet 框架 – AWS 的基础设施AWS CloudFormation Template for Deep Learning
https://github.com/awslabs/deeplearning-cfn
• The AWS CloudFormation Deep
Learning template uses
the Amazon Deep Learning AMI to
launch a cluster of EC2 instances
and other AWS resources needed
to perform distributed deep
learning.

Apache MXNet 框架演示 – AWS上的集群图像分类训练
$ssh –A –i xxx.pem ubuntu@xxx.xxx.xxx.xxx
$mkdir $EFS_MOUNT/cifar_model/
$cd $EFS_MOUNT/deeplearning-cfn/examples/mxnet/example/image
classification/
$ ../../tools/launch.py -n $DEEPLEARNING_WORKERS_COUNT
-H $DEEPLEARNING_WORKERS_PATH python train_cifar10.py
--network resnet --num-layers 110 --kv-store dist_device_sync
--model-prefix $EFS_MOUNT/cifar_model/cifar --num-epochs 300 --batch-size 128
The CIFAR-10 dataset consists of 60000 32x32 colour
images in 10 classes, with 6000 images per class. There
are 50000 training images and 10000 test images
Running Distributed Training

Apache MXNet 框架演示 – AWS上的集群图像分类训练
Environment 1 x P2.16xLarge 5 x P2.16xlarge
Execution Time 36,122s = 10.3h 7,632s = 2.12h
Performance Improved 473%
Running Distributed Training

Apache MXNet 框架 – 从源代码编译安装
USE_CUDA=1
USE_CUDNN=1
USE_OPENMP=1
USE_CUDA_PATH=/usr/local/cuda
USE_BLAS=mkl
USE_MKL2017=1
USE_MKL2017_EXPERIMENTAL=1
MKLML_ROOT=/usr/local
USE_INTEL_PATH=/opt/intel
USE_OPENCV=1
USE_S3=1
USE_NVRTC=1
USE_DIST_KVSTORE=1
##K80 3.7,M60 5.2, GTX 1070 6.1
MSHADOW_NVCCFLAGS += -gencode
arch=compute_61,code=sm_61
mxnet/mshadow/make/mshadow.mk
mxnet/make/config.mk
Build and Installation

在Amazon ECS容器服务上部署Apache MXNetDeploy a Deep Learning Framework on
Amazon ECS
Deploy MXNet on AWS using Docker container
https://github.com/awslabs/ecs-deep-learning-workshop

使用AWS Lambda和MXNet 进行预测Seamlessly Scale Predictions with AWS
Lambda and MXNet
Leverage AWS Lambda and MXNet to
build a scalable prediction pipeline
• https://github.com/awslabs/mxnet-lambda
• https://aws.amazon.com/cn/blogs/compute/seamlessl
y-scale-predictions-with-aws-lambda-and-mxnet/

https://github.com/dmlc/mxnet-notebooks
• Basic concepts
• NDArray - multi-dimensional array computation
• Symbol - symbolic expression for neural networks
• Module - neural network training and inference
• Applications
• MNIST: recognize handwritten digits
• Check out the distributed training results
• Predict with pre-trained models
• LSTMs for sequence learning
• Recommender systems
• Train a state of the art Computer Vision model (CNN)
• Lots more..
Application Examples | Jupyter Notebooks

MXNet Resources:
• MXNet Blog Post | AWS Endorsement
• Read up on MXNet and Learn More: mxnet.io
• MXNet Github Repo
• MXNet Recommender Systems Talk | Leo Dirac
Developer Resources:
• Deep Learning AMI |Amazon Linux
• Deep Learning AMI | Ubuntu
• CloudFormation Template Instructions
• Deep Learning Benchmark
• MXNet on Lambda
• MXNet on ECS/Docker
• MXNet on Raspberry Pi | Image Detector using Inception Network
Developer Resources

Neural art
https://github.com/apache/incubator-mxnet/tree/master/example/neural-style

AWS 機器學習 II ─ 深度學習 Deep Learning & MXNet

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (15)

Similar to AWS 機器學習 II ─ 深度學習 Deep Learning & MXNet

Similar to AWS 機器學習 II ─ 深度學習 Deep Learning & MXNet (20)

More from Amazon Web Services

More from Amazon Web Services (20)

AWS 機器學習 II ─ 深度學習 Deep Learning & MXNet