IBM Cloud Paris Meetup 20180517 - Deep Learning Challenges

Unprecedented progress
DEEP LEARNING CHALLENGES
François Courteille Principal Solutions Architect@NVIDIA – fcourteille@nvidia.com
Meetup IA / PowerAI - 17 mai 2018

2
ACKNOWLEDGEMENTS
Adam Grzywaczewski – adamg@nvidia.com

3
50% Reduction in Emergency
Road Repair Costs
>$6M / Year Savings and
Reduced Risk of Outage
INFRASTRUCTUREHEALTHCARE IOT
AI TO TRANSFORM EVERY INDUSTRY
>80% Accuracy & Immediate
Alert to Radiologists

4
PREDICTIVE
MAINTENANCE
FIELD INSPECTIONFACTORY INSPECTION
AI FOR INDUSTRIAL APPLICATIONS
Quality Inspection
Fault Detection & Classification
Inventory Inspection
Condition Based Maintenance
Remaining Useful Life
Sensor Time Series Analysis
Failure Prediction

5
UIUC & NCSA: ASTROPHYSICS
5,000X LIGO Signal Processing
U. FLORIDA & UNC: DRUG DISCOVERY
300,000X Molecular Energetics Prediction
U.S. DoE: PARTICLE PHYSICS
33% More Accurate Neutrino Detection
PRINCETON & ITER: CLEAN ENERGY
50% Higher Accuracy for Fusion Sustainment
LBNL: High Energy Particle PHYSICS
100,000X faster simulated output by GAN
Univ. of Illinois: DRUG DISCOVERY
2X Accuracy for Protein Structure Prediction
DEEP LEARNING COMES TO HPC
Accelerates Scientific Discovery

7
CHALLENGES
Key challenges when developing Deep Learning projects
• Data & Models
• Infrastructure
• Software
• Algorithms
• People

9
DEEP LEARNING INSTITUTE
DLI Mission: Help the world to solve the most challenging
problems using AI and deep learning
We help developers, data scientists and engineers to get
started in architecting, optimizing, and deploying neural
networks to solve real-world problems in diverse industries
such as autonomous vehicles, healthcare, robotics, media
& entertainment and game development.

MULTI GPU DEEP LEARNING TRAINING

11
October 9-11, 2018 | Munich | #GTC18
www.gputechconf.eu
CONNECT
Connect with technology
experts from NVIDIA and
other leading organizations
LEARN
Gain insight and valuable
hands-on training through
hundreds of sessions and
research posters
DISCOVER
See how GPU technologies
are creating amazing
breakthroughs in important
fields such as deep learning
INNOVATE
Hear about disruptive
innovations as early-stage
companies and startups
present their work
Don’t miss the world’s most important event for GPU developers
October 9—11, 2018 in Munich
REGISTRATION IS OPEN AT WWW.GPUTECHCONF.EU

14
NEURAL NETWORKS ARE NOT NEW
Historically we never had large datasets
Dataset Size
Accuracy
0 5 10 15 20 25
Algorithm performance in small data regime
Small NN ML1
ML2 ML3
The MNIST (1999) database
contains 60,000 training images
and 10,000 testing images.

15
Availability of data and compute
Dataset Size
Accuracy
0 50 100 150 200 250
Algorithm performance in big data regime
Small NN ML1
ML2 ML3

16
Data and model size are the key to accuracy
Dataset Size
Accuracy
0 50 100 150 200 250 300 350 400 450
Algorithm performance in big data regime
Small NN ML1 ML2 ML3 Big NN

17
Surpassing human-level performance
Dataset Size
Accuracy
0 500 1000 1500 2000 2500
Algorithm performance in large data regime
Small NN ML1 ML2 ML3 Big NN Bigger NN

19
EXPLODING DATASETS
Hestness, J., Narang, S., Ardalani, N., Diamos, G., Jun, H., Kianinejad, H., ... & Zhou, Y. (2017). Deep Learning Scaling is Predictable, Empirically. arXiv preprint arXiv:1712.00409.
Logarithmic relationship between the dataset size and accuracy
• Translation
• Language Models
• Character Language Models
• Image Classification
• Attention Speech Models

21
EXPLODING DATASETS
Logarithmic relationship between the dataset size and accuracy

22
EXPLODING DATASETS
The good news – you can calculate how much data you need
Step 1: Establish how much accuracy you need
Target
accuracy

23
EXPLODING DATASETS
Step 2: Conduct a number of experiments with dataset of the varying size (in the Power-law region)
Target
accuracy

24
EXPLODING DATASETS
Step 3: Interpolate
Target
accuracy

25
EXPLODING DATASETS
The bad news – log-scale
Target
accuracy

27
2016 – Baidu Deep Speech 2
Superhuman Voice Recognition
2015 – Microsoft ResNet
Superhuman Image Recognition
2017 – Google Neural Machine Translation
Near Human Language Translation
100 ExaFLOPS
8700 Million Parameters
20 ExaFLOPS
7 ExaFLOPS
To Tackle Increasingly Complex Challenges
NEURAL NETWORK COMPLEXITY IS EXPLODING

28
100 EXAFLOPS
=
2 YEARS ON A DUAL CPU SERVER

29
EXPLODING MODEL COMPLEXITY
„Outrageously large neural networks“ – size does matter
Shazeer, Noam, et al. "Outrageously large neural networks: The sparsely-gated mixture-of-experts layer." arXiv preprint arXiv:1701.06538 (2017).
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
Nuts and Bolts of Applying Deep Learning, Andrew Ng, 2016 - https://youtu.be/F1ka6a13S9I
VGG 19 vs Google LSTM using
Sparsely-Gated Mixture-of-Experts
layer

30
EXPLODING MODEL COMPLEXITY
Good news – model size scales sublinearly
BestModelSize
Amount of Data
Model Size Scales Sublinearly

31
EVIDENCE FROM IMAGE PROCESSING
Good news – model size scales sublinearly
Zoph, Barret, et al. "Learning transferable architectures for scalable image recognition." arXiv preprint arXiv:1707.07012 (2017).

32
IMPLICATIONS
The good news: Requirements
are predictable.
We can predict how much data we
will need
We can predict how much
computing power we will need
The bad news: The values can
be significant.
Good and bad news

33
IMPLICATIONS
Experimental Nature of Deep Learning - Unacceptable training time
Idea
Code
Experiment

34
CONCLUSIONS
What does your research team do in the mean time?
Xkcd.com

35
THE GPU COMPUTING
From Gaming to High Performance Computing

36
TESLA V100 32GB
WORLD’S MOST ADVANCED DATA CENTER GPU
NOW WITH 2X THE MEMORY
5,120 CUDA cores
640 NEW Tensor cores
7.8 FP64 TFLOPS | 15.7 FP32 TFLOPS | 125 Tensor TFLOPS
20MB SM RF | 16MB Cache
32GB HBM2 @ 900GB/s | 300GB/s NVLink

37
FASTER RESULTS ON COMPLEX DL AND HPC
Up to 50% Faster Results With 2x The Memory
Unsupervised Image
Translation
Input winter photo
AI converts it to
summer
Dual E5-2698v4 server, 512GB DDR4, Ubuntu 16.04, CUDA9, cuDNN7| NMT is GNMT-like and run with
TensorFlow NGC Container 18.01 (Batch Size= 128 (for 16GB) and 256 (for 32GB) | FFT is with
cufftbench 1k x 1k x 1k and comparing 2 V100 16GB (DGX1V) vs. 2 V100 32GB (DGX1V)
Neural Machine
Translation (NMT)
3D FFT 1k x 1k x 1k
1.5X Faster
Calculations
1.5X Faster
Language Translation
1.2
step/sec
0.8
step/sec
2.5TF
3.8TF
GAN Image to ImageGen
1024x1024
res images
512x512
res images
4X Higher
resolution
Accuracy
(16 layers)
Accuracy
(152 layers)
HIGHER ACCURACY HIGHER RESOLUTIONFASTER RESULTS
R-CNN for object detection at 1080P with Caffe | V100 16GB
uses VGG16| V100 32GB uses Resnet-152
V100 16GB V100 32GB
VGG-16 RN-152
40% Lower Error
Rate
GAN by NVRESEARCH (https://arxiv.org/pdf/1703.00848.pdf) |
V100 16GB and V100 32GB with FP32

38
IMPLICATIONS
Automotive example
Majority of useful problems are too complex for a single GPU training
VERY CONSERVATIVE CONSERVATIVE
Fleet size (data capture per hour) 100 cars / 1TB/hour 125 cars / 1.5TB/hour
Duration of data collection 260 days * 8 hours 325 days * 10 hours
Data Compression factor 0.0005 0.0008
Total training set 104 TB 487.5 TB
InceptionV3 training time
(with 1 Pascal GPU)
9.1 years 42.6 years
AlexNet training time
(with 1 Pascal GPU)
1.1 years 5.4 years

46
BALANCED SYSTEM DESIGNED FOR SCALE

48
PREFERRED NETWORKS
It consists of 128 nodes with 8 NVIDIA
P100 GPUs each, for 1024 GPUs in
total.
The nodes are connected with two FDR
Infiniband links (56Gbps x 2).
Training ImageNet in 15 minutes
Akiba, T., Suzuki, S., & Fukuda, K. (2017). Extremely large minibatch sgd: Training resnet-50 on
imagenet in 15 minutes. arXiv preprint arXiv:1711.04325.

49
DGX FAMILY
DGX-1V – 1 PetaFLOP FP16

50
IBM POWER9 ARCHITECTURE
CPU-GPU FAST CONNECTION
NVLINK

IBM Cloud Paris Meetup 20180517 - Deep Learning Challenges

Recommended

Recommended

More Related Content

What's hot

What's hot (6)

Similar to IBM Cloud Paris Meetup 20180517 - Deep Learning Challenges

Similar to IBM Cloud Paris Meetup 20180517 - Deep Learning Challenges (20)

More from IBM France Lab

More from IBM France Lab (20)

Recently uploaded

Recently uploaded (20)

IBM Cloud Paris Meetup 20180517 - Deep Learning Challenges