Unprecedented progress
DEEP LEARNING CHALLENGES
François Courteille Principal Solutions Architect@NVIDIA – fcourteille@nvidia.com
AI Meetup / PowerAI - 17 May 2018
ACKNOWLEDGEMENTS
Adam Grzywaczewski – adamg@nvidia.com
50% Reduction in Emergency
Road Repair Costs
>$6M / Year Savings and
Reduced Risk of Outage
INFRASTRUCTURE | HEALTHCARE | IOT
AI TO TRANSFORM EVERY INDUSTRY
>80% Accuracy & Immediate
Alert to Radiologists
PREDICTIVE MAINTENANCE | FIELD INSPECTION | FACTORY INSPECTION
AI FOR INDUSTRIAL APPLICATIONS
Quality Inspection
Fault Detection & Classification
Inventory Inspection
Condition Based Maintenance
Remaining Useful Life
Sensor Time Series Analysis
Failure Prediction
UIUC & NCSA: ASTROPHYSICS
5,000X LIGO Signal Processing
U. FLORIDA & UNC: DRUG DISCOVERY
300,000X Molecular Energetics Prediction
U.S. DoE: PARTICLE PHYSICS
33% More Accurate Neutrino Detection
PRINCETON & ITER: CLEAN ENERGY
50% Higher Accuracy for Fusion Sustainment
LBNL: HIGH ENERGY PARTICLE PHYSICS
100,000X faster simulated output by GAN
UNIV. OF ILLINOIS: DRUG DISCOVERY
2X Accuracy for Protein Structure Prediction
DEEP LEARNING COMES TO HPC
Accelerates Scientific Discovery
CHALLENGES
Key challenges when developing Deep Learning projects
• Data & Models
• Infrastructure
• Software
• Algorithms
• People
PEOPLE
DEEP LEARNING INSTITUTE
DLI Mission: Help the world to solve the most challenging problems using AI and deep learning
We help developers, data scientists and engineers to get started in architecting, optimizing, and deploying neural networks to solve real-world problems in diverse industries such as autonomous vehicles, healthcare, robotics, media & entertainment, and game development.
MULTI GPU DEEP LEARNING TRAINING
October 9-11, 2018 | Munich | #GTC18
www.gputechconf.eu
CONNECT: Connect with technology experts from NVIDIA and other leading organizations
LEARN: Gain insight and valuable hands-on training through hundreds of sessions and research posters
DISCOVER: See how GPU technologies are creating amazing breakthroughs in important fields such as deep learning
INNOVATE: Hear about disruptive innovations as early-stage companies and startups present their work
Don’t miss the world’s most important event for GPU developers: October 9-11, 2018 in Munich
REGISTRATION IS OPEN AT WWW.GPUTECHCONF.EU
NEURAL NETWORKS ARE NOT NEW
Historically we never had large datasets
[Chart: Algorithm performance in small data regime. Accuracy vs. Dataset Size for Small NN, ML1, ML2, ML3]
The MNIST (1999) database contains 60,000 training images and 10,000 testing images.
Availability of data and compute
[Chart: Algorithm performance in big data regime. Accuracy vs. Dataset Size for Small NN, ML1, ML2, ML3]
Data and model size are the key to accuracy
[Chart: Algorithm performance in big data regime. Accuracy vs. Dataset Size for Small NN, ML1, ML2, ML3, Big NN]
Surpassing human-level performance
[Chart: Algorithm performance in large data regime. Accuracy vs. Dataset Size for Small NN, ML1, ML2, ML3, Big NN, Bigger NN]
DATA
EXPLODING DATASETS
Hestness, J., Narang, S., Ardalani, N., Diamos, G., Jun, H., Kianinejad, H., ... & Zhou, Y. (2017). Deep Learning Scaling is Predictable, Empirically. arXiv preprint arXiv:1712.00409.
Power-law relationship between dataset size and generalization error (a straight line on a log-log plot)
• Translation
• Language Models
• Character Language Models
• Image Classification
• Attention Speech Models
The good news – you can calculate how much data you need
Step 1: Establish how much accuracy you need
Step 2: Conduct a number of experiments with datasets of varying size (in the power-law region)
Step 3: Extrapolate the fitted curve to the target accuracy
The bad news: the scale is logarithmic, so reaching the target accuracy can take orders of magnitude more data
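A minimal sketch of the three-step estimate, assuming the error follows a power law error = alpha * size^beta (beta < 0) in the region measured. The pilot measurements below are synthetic and hypothetical, generated only to illustrate the fit; the function names are illustrative and do not come from the cited paper.

```python
import math


def fit_power_law(sizes, errors):
    """Least-squares fit of log(error) = log(alpha) + beta * log(size)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = math.exp(mean_y - beta * mean_x)
    return alpha, beta


def required_dataset_size(alpha, beta, target_error):
    """Invert error = alpha * size**beta to get the size for a target error."""
    return (target_error / alpha) ** (1.0 / beta)


# Step 1: choose the target (hypothetical: 10% error).
target_error = 0.10

# Step 2: pilot experiments at several dataset sizes (synthetic, hypothetical data).
sizes = [1_000, 2_000, 4_000, 8_000]
errors = [0.40 * (s / 1_000) ** -0.25 for s in sizes]

# Step 3: fit the power law and extend it to the target.
alpha, beta = fit_power_law(sizes, errors)
needed = required_dataset_size(alpha, beta, target_error)

# The log scale in practice: data multiplier required to halve the error.
halve_factor = 0.5 ** (1.0 / beta)

print(f"fitted exponent: {beta:.3f}")
print(f"samples needed for {target_error:.0%} error: {needed:,.0f}")
print(f"data multiplier to halve the error: {halve_factor:.1f}x")
```

With a shallow exponent such as the one assumed here, every halving of the error multiplies the data requirement many times over, which is why the prediction can be both reliable and expensive.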
MODEL
NEURAL NETWORK COMPLEXITY IS EXPLODING
To Tackle Increasingly Complex Challenges
2015 – Microsoft ResNet: Superhuman Image Recognition | 7 ExaFLOPS | 60 Million Parameters
2016 – Baidu Deep Speech 2: Superhuman Voice Recognition | 20 ExaFLOPS | 300 Million Parameters
2017 – Google Neural Machine Translation: Near Human Language Translation | 100 ExaFLOPS | 8700 Million Parameters
100 EXAFLOPS = 2 YEARS ON A DUAL CPU SERVER
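As a rough sanity check of this equivalence, the sketch below computes the sustained throughput a server would need to finish the work in two years. It assumes that "100 ExaFLOPS" here means a total of 100 x 10^18 floating-point operations (not a rate) and uses a 365-day year; both are assumptions, not statements from the slide.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year

total_operations = 100e18  # assumption: "100 ExaFLOPS" read as total operations
training_years = 2

# Sustained rate needed to complete the work in the stated time.
sustained_flops = total_operations / (training_years * SECONDS_PER_YEAR)
sustained_tflops = sustained_flops / 1e12

print(f"implied sustained throughput: {sustained_tflops:.2f} TFLOPS")
```

Under these assumptions the implied rate is roughly 1.6 TFLOPS sustained, which is a plausible order of magnitude for a dual-socket CPU server of that period.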
EXPLODING MODEL COMPLEXITY
"Outrageously large neural networks": size does matter
Shazeer, Noam, et al. "Outrageously large neural networks: The sparsely-gated mixture-of-experts layer." arXiv preprint arXiv:1701.06538 (2017).
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
Nuts and Bolts of Applying Deep Learning, Andrew Ng, 2016 - https://youtu.be/F1ka6a13S9I
VGG 19 vs Google LSTM using Sparsely-Gated Mixture-of-Experts layer
Good news – model size scales sublinearly
Hestness, J., Narang, S., Ardalani, N., Diamos, G., Jun, H., Kianinejad, H., ... & Zhou, Y. (2017). Deep Learning Scaling is Predictable, Empirically. arXiv preprint arXiv:1712.00409.
[Chart: Best Model Size vs. Amount of Data. Model size scales sublinearly]
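To make "sublinear" concrete, the sketch below assumes the best model size grows as a power of the dataset size with an exponent below 1. The exponent value 0.7 is purely illustrative and is not taken from the slides; in practice it would have to be measured for each problem domain.

```python
def model_growth_factor(data_growth_factor: float, exponent: float) -> float:
    """Growth in best model size when the dataset grows by the given factor,
    assuming model_size is proportional to data_size ** exponent."""
    if not 0.0 < exponent < 1.0:
        raise ValueError("sublinear scaling requires 0 < exponent < 1")
    return data_growth_factor ** exponent


ILLUSTRATIVE_EXPONENT = 0.7  # hypothetical value, for illustration only

growth_10x = model_growth_factor(10.0, ILLUSTRATIVE_EXPONENT)
growth_100x = model_growth_factor(100.0, ILLUSTRATIVE_EXPONENT)

print(f"10x more data  -> {growth_10x:.1f}x larger model")
print(f"100x more data -> {growth_100x:.1f}x larger model")
```

Under this assumption the model always grows more slowly than the data, so compute per training sample rises less steeply than the dataset itself.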
EVIDENCE FROM IMAGE PROCESSING
Good news – model size scales sublinearly
Zoph, Barret, et al. "Learning transferable architectures for scalable image recognition." arXiv preprint arXiv:1707.07012 (2017).
IMPLICATIONS
Good and bad news
The good news: Requirements are predictable.
• We can predict how much data we will need
• We can predict how much computing power we will need
The bad news: The values can be significant.
Experimental Nature of Deep Learning - Unacceptable training time
[Diagram: iterative cycle of Idea, Code, Experiment]
CONCLUSIONS
What does your research team do in the meantime?
Source: xkcd.com
GPU COMPUTING
From Gaming to High Performance Computing
TESLA V100 32GB
WORLD’S MOST ADVANCED DATA CENTER GPU
NOW WITH 2X THE MEMORY
5,120 CUDA cores
640 NEW Tensor cores
7.8 FP64 TFLOPS | 15.7 FP32 TFLOPS | 125 Tensor TFLOPS
20MB SM RF | 16MB Cache
32GB HBM2 @ 900GB/s | 300GB/s NVLink
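The peak throughput figures above can be approximately reproduced from the unit counts. The sketch below assumes a 1530 MHz boost clock, one fused multiply-add (2 floating-point operations) per core per clock, 64 fused multiply-adds per Tensor core per clock, and half as many FP64 units as FP32 cores; none of these values appear on the slide, so treat them as assumptions.

```python
BOOST_CLOCK_HZ = 1.53e9           # assumption: 1530 MHz boost clock
FLOP_PER_FMA = 2                  # one multiply plus one add

CUDA_CORES = 5120                 # from the slide
TENSOR_CORES = 640                # from the slide
FP64_UNITS = CUDA_CORES // 2      # assumption: half the FP32 core count
FMA_PER_TENSOR_CORE_CLOCK = 64    # assumption: 4x4x4 matrix FMA per clock


def peak_tflops(units: int, fma_per_unit_per_clock: int) -> float:
    """Peak throughput in TFLOPS for the given number of execution units."""
    return units * fma_per_unit_per_clock * FLOP_PER_FMA * BOOST_CLOCK_HZ / 1e12


fp32_tflops = peak_tflops(CUDA_CORES, 1)
fp64_tflops = peak_tflops(FP64_UNITS, 1)
tensor_tflops = peak_tflops(TENSOR_CORES, FMA_PER_TENSOR_CORE_CLOCK)

print(f"FP32:   {fp32_tflops:.1f} TFLOPS")
print(f"FP64:   {fp64_tflops:.1f} TFLOPS")
print(f"Tensor: {tensor_tflops:.0f} TFLOPS")
```

These are theoretical peaks; sustained throughput on real training workloads is typically lower.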
FASTER RESULTS ON COMPLEX DL AND HPC
Up to 50% Faster Results With 2x The Memory
Comparison: V100 16GB vs. V100 32GB
FASTER RESULTS
• Neural Machine Translation (NMT): 0.8 step/sec vs. 1.2 step/sec | 1.5X Faster Language Translation
• 3D FFT 1k x 1k x 1k: 2.5TF vs. 3.8TF | 1.5X Faster Calculations
HIGHER ACCURACY
• R-CNN for object detection: VGG-16 (16 layers) vs. RN-152 (152 layers) | 40% Lower Error Rate
HIGHER RESOLUTION
• GAN Image to ImageGen (Unsupervised Image Translation: input winter photo, AI converts it to summer): 512x512 res images vs. 1024x1024 res images | 4X Higher resolution
Notes:
Dual E5-2698v4 server, 512GB DDR4, Ubuntu 16.04, CUDA9, cuDNN7 | NMT is GNMT-like and run with TensorFlow NGC Container 18.01 (Batch Size = 128 for 16GB and 256 for 32GB) | FFT is with cufftbench 1k x 1k x 1k and comparing 2 V100 16GB (DGX1V) vs. 2 V100 32GB (DGX1V)
R-CNN for object detection at 1080P with Caffe | V100 16GB uses VGG16 | V100 32GB uses Resnet-152
GAN by NVRESEARCH (https://arxiv.org/pdf/1703.00848.pdf) | V100 16GB and V100 32GB with FP32
IMPLICATIONS
Automotive example
The majority of useful problems are too complex for single-GPU training
Parameter                                   VERY CONSERVATIVE      CONSERVATIVE
Fleet size (data capture per hour)          100 cars / 1TB/hour    125 cars / 1.5TB/hour
Duration of data collection                 260 days * 8 hours     325 days * 10 hours
Data compression factor                     0.0005                 0.0008
Total training set                          104 TB                 487.5 TB
InceptionV3 training time (1 Pascal GPU)    9.1 years              42.6 years
AlexNet training time (1 Pascal GPU)        1.1 years              5.4 years
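The totals in this example can be reproduced with a few multiplications. The sketch below assumes the capture rate is per car per hour, which is the reading that matches the stated totals; the function name and the derived throughput figure are illustrative and not part of the original slide.

```python
def training_set_tb(cars, tb_per_car_hour, days, hours_per_day, compression_factor):
    """Total training set in TB after applying the compression factor.

    Assumes the capture rate applies to each car individually.
    """
    raw_tb = cars * tb_per_car_hour * days * hours_per_day
    return raw_tb * compression_factor


very_conservative_tb = training_set_tb(100, 1.0, 260, 8, 0.0005)
conservative_tb = training_set_tb(125, 1.5, 325, 10, 0.0008)

# Derived from the slide's own numbers: data processed per GPU-year
# implied by the InceptionV3 estimate (104 TB in 9.1 years).
implied_tb_per_gpu_year = very_conservative_tb / 9.1

print(f"very conservative: {very_conservative_tb:.1f} TB")
print(f"conservative:      {conservative_tb:.1f} TB")
print(f"implied InceptionV3 throughput: {implied_tb_per_gpu_year:.1f} TB per GPU-year")
```

Both scenarios imply roughly the same InceptionV3 throughput per GPU-year, so the training-time rows appear to scale linearly with dataset size.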
BALANCED SYSTEM DESIGNED FOR SCALE
PREFERRED NETWORKS
Training ImageNet in 15 minutes
The cluster consists of 128 nodes with 8 NVIDIA P100 GPUs each, for 1024 GPUs in total.
The nodes are connected with two FDR InfiniBand links (56Gbps x 2).
Akiba, T., Suzuki, S., & Fukuda, K. (2017). Extremely large minibatch SGD: Training ResNet-50 on ImageNet in 15 minutes. arXiv preprint arXiv:1711.04325.
DGX FAMILY
DGX-1V – 1 PetaFLOP FP16
IBM POWER9 ARCHITECTURE
CPU-GPU FAST CONNECTION
NVLINK

IBM Cloud Paris Meetup 20180517 - Deep Learning Challenges

  • 1. Unprecedented progress DEEP LEARNING CHALLENGES François Courteille Principal Solutions Architect@NVIDIA – fcourteille@nvidia.com Meetup IA / PowerAI - 17 mai 2018
  • 3. 3 50% Reduction in Emergency Road Repair Costs >$6M / Year Savings and Reduced Risk of Outage INFRASTRUCTUREHEALTHCARE IOT AI TO TRANSFORM EVERY INDUSTRY >80% Accuracy & Immediate Alert to Radiologists
  • 4. 4 PREDICTIVE MAINTENANCE FIELD INSPECTIONFACTORY INSPECTION AI FOR INDUSTRIAL APPLICATIONS Quality Inspection Fault Detection & Classification Inventory Inspection Condition Based Maintenance Remaining Useful Life Sensor Time Series Analysis Failure Prediction
  • 5. 5 UIUC & NCSA: ASTROPHYSICS 5,000X LIGO Signal Processing U. FLORIDA & UNC: DRUG DISCOVERY 300,000X Molecular Energetics Prediction U.S. DoE: PARTICLE PHYSICS 33% More Accurate Neutrino Detection PRINCETON & ITER: CLEAN ENERGY 50% Higher Accuracy for Fusion Sustainment LBNL: High Energy Particle PHYSICS 100,000X faster simulated output by GAN Univ. of Illinois: DRUG DISCOVERY 2X Accuracy for Protein Structure Prediction DEEP LEARNING COMES TO HPC Accelerates Scientific Discovery
  • 7. 7 CHALLENGES Key challenges when developing Deep Learning projects • Data & Models • Infrastructure • Software • Algorithms • People
  • 9. 9 DEEP LEARNING INSTITUTE DLI Mission: Help the world to solve the most challenging problems using AI and deep learning We help developers, data scientists and engineers to get started in architecting, optimizing, and deploying neural networks to solve real-world problems in diverse industries such as autonomous vehicles, healthcare, robotics, media & entertainment and game development.
  • 10. MULTI GPU DEEP LEARNING TRAINING
  • 11. 11 October 9-11, 2018 | Munich | #GTC18 www.gputechconf.eu CONNECT Connect with technology experts from NVIDIA and other leading organizations LEARN Gain insight and valuable hands-on training through hundreds of sessions and research posters DISCOVER See how GPU technologies are creating amazing breakthroughs in important fields such as deep learning INNOVATE Hear about disruptive innovations as early-stage companies and startups present their work Don’t miss the world’s most important event for GPU developers October 9—11, 2018 in Munich REGISTRATION IS OPEN AT WWW.GPUTECHCONF.EU
  • 12. 14 NEURAL NETWORKS ARE NOT NEW Historically we never had large datasets Dataset Size Accuracy 0 5 10 15 20 25 Algorithm performance in small data regime Small NN ML1 ML2 ML3 The MNIST (1999) database contains 60,000 training images and 10,000 testing images.
  • 13. 15 NEURAL NETWORKS ARE NOT NEW Availability of data and compute Dataset Size Accuracy 0 50 100 150 200 250 Algorithm performance in big data regime Small NN ML1 ML2 ML3
  • 14. 16 NEURAL NETWORKS ARE NOT NEW Data and model size are the key to accuracy Dataset Size Accuracy 0 50 100 150 200 250 300 350 400 450 Algorithm performance in big data regime Small NN ML1 ML2 ML3 Big NN
  • 15. 17 NEURAL NETWORKS ARE NOT NEW Surpassing human-level performance Dataset Size Accuracy 0 500 1000 1500 2000 2500 Algorithm performance in large data regime Small NN ML1 ML2 ML3 Big NN Bigger NN
  • 17. 19 EXPLODING DATASETS Hestness, J., Narang, S., Ardalani, N., Diamos, G., Jun, H., Kianinejad, H., ... & Zhou, Y. (2017). Deep Learning Scaling is Predictable, Empirically. arXiv preprint arXiv:1712.00409. Logarithmic relationship between the dataset size and accuracy • Translation • Language Models • Character Language Models • Image Classification • Attention Speech Models
  • 18. 21 EXPLODING DATASETS Power-law relationship between dataset size and generalization error (Hestness et al., 2017).
  • 19. 22 EXPLODING DATASETS The good news: you can calculate how much data you need (Hestness et al., 2017). Step 1: Establish how much accuracy you need (the target accuracy).
  • 20. 23 EXPLODING DATASETS The good news: you can calculate how much data you need (Hestness et al., 2017). Step 2: Conduct a number of experiments with datasets of varying size (in the power-law region) and compare the results against the target accuracy.
  • 21. 24 EXPLODING DATASETS The good news: you can calculate how much data you need (Hestness et al., 2017). Step 3: Interpolate (or extrapolate) along the line fitted to those experiments to find the dataset size that reaches the target accuracy.
  • 22. 25 EXPLODING DATASETS The bad news: the axes are log-scale, so each step closer to the target accuracy requires a multiplicative increase in data (Hestness et al., 2017).
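The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure from Hestness et al.: it assumes the measured quantity is generalization error (1 minus accuracy), that error follows `error = alpha * size**beta` in the power-law region, and it uses synthetic measurements. The function names `fit_power_law` and `required_size` are hypothetical.

```python
import numpy as np

def fit_power_law(sizes, errors):
    """Fit error = alpha * size**beta by linear least squares in log-log space."""
    beta, log_alpha = np.polyfit(np.log(sizes), np.log(errors), 1)
    return float(np.exp(log_alpha)), float(beta)

def required_size(alpha, beta, target_error):
    """Invert error = alpha * size**beta to get the dataset size for a target error."""
    return (target_error / alpha) ** (1.0 / beta)

# Step 1: choose the target (assumed here: 2% error, i.e. 98% accuracy).
target_error = 0.02

# Step 2: experiments on datasets of varying size.
# Synthetic, illustrative numbers that follow an exact power law (not real results).
sizes = np.array([1e4, 2e4, 4e4, 8e4, 1.6e5])
errors = 2.0 * sizes ** -0.3

# Step 3: fit the line in log-log space and solve for the required dataset size.
alpha, beta = fit_power_law(sizes, errors)
needed = required_size(alpha, beta, target_error)
print(f"alpha={alpha:.3f}, beta={beta:.3f}, samples needed ~ {needed:.3g}")
```

With a shallow exponent such as the assumed -0.3, halving the error takes roughly ten times more data. That is the "log-scale" bad news in numeric form.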
  • 24. 27 NEURAL NETWORK COMPLEXITY IS EXPLODING To Tackle Increasingly Complex Challenges
      • 2015 – Microsoft ResNet, Superhuman Image Recognition: 7 ExaFLOPS, 60 Million Parameters
      • 2016 – Baidu Deep Speech 2, Superhuman Voice Recognition: 20 ExaFLOPS, 300 Million Parameters
      • 2017 – Google Neural Machine Translation, Near Human Language Translation: 100 ExaFLOPS, 8700 Million Parameters
  • 25. 28 100 EXAFLOPS = 2 YEARS ON A DUAL CPU SERVER
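As a rough check on that figure, the arithmetic can be written out. The sketch below assumes "100 ExaFLOPS" means a total of 100 × 10^18 floating-point operations for one training run (a count, not a rate). It also uses the 125 Tensor TFLOPS peak quoted for the Tesla V100 later in the deck. Peak throughput is an upper bound that real training does not sustain, so the second number is a best case only.

```python
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

total_flop = 100e18  # assumed total operations for one training run

# Sustained throughput implied by "2 years on a dual CPU server".
implied_cpu_flops = total_flop / (2 * SECONDS_PER_YEAR)

def training_days(total_flop, flops_per_second):
    """Ideal training time in days at a given sustained throughput."""
    return total_flop / flops_per_second / SECONDS_PER_DAY

# Best case for a single GPU at its 125 TFLOPS Tensor Core peak.
gpu_days_at_peak = training_days(total_flop, 125e12)

print(f"Implied CPU server throughput: {implied_cpu_flops / 1e12:.2f} TFLOP/s")
print(f"Single GPU at peak: {gpu_days_at_peak:.1f} days")
```

Under these assumptions the dual-CPU server sustains about 1.6 TFLOP/s, and even a GPU running at its theoretical peak would need more than a week for the same workload.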
  • 26. 29 EXPLODING MODEL COMPLEXITY "Outrageously large neural networks": size does matter. VGG 19 vs Google LSTM using Sparsely-Gated Mixture-of-Experts layer. Shazeer, Noam, et al. "Outrageously large neural networks: The sparsely-gated mixture-of-experts layer." arXiv preprint arXiv:1701.06538 (2017). Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014). Nuts and Bolts of Applying Deep Learning, Andrew Ng, 2016 - https://youtu.be/F1ka6a13S9I
  • 27. 30 EXPLODING MODEL COMPLEXITY Good news: model size scales sublinearly. [Chart: best model size vs. amount of data.] Hestness et al. (2017).
  • 28. 31 EVIDENCE FROM IMAGE PROCESSING Good news: model size scales sublinearly. Zoph, Barret, et al. "Learning transferable architectures for scalable image recognition." arXiv preprint arXiv:1707.07012 (2017).
  • 29. 32 IMPLICATIONS Good and bad news. The good news: requirements are predictable. We can predict how much data we will need, and we can predict how much computing power we will need. The bad news: the values can be significant.
  • 30. 33 IMPLICATIONS Experimental nature of deep learning: unacceptable training time. [Diagram: Idea, Code, Experiment cycle.]
  • 31. 34 CONCLUSIONS What does your research team do in the meantime? (Image source: xkcd.com)
  • 32. 35 THE GPU COMPUTING From Gaming to High Performance Computing
  • 33. 36 TESLA V100 32GB: WORLD’S MOST ADVANCED DATA CENTER GPU, NOW WITH 2X THE MEMORY. 5,120 CUDA cores | 640 NEW Tensor cores | 7.8 FP64 TFLOPS | 15.7 FP32 TFLOPS | 125 Tensor TFLOPS | 20MB SM RF | 16MB Cache | 32GB HBM2 @ 900GB/s | 300GB/s NVLink
  • 34. 37 FASTER RESULTS ON COMPLEX DL AND HPC Up to 50% Faster Results With 2x The Memory (values given as V100 16GB vs. V100 32GB)
      • FASTER RESULTS: Neural Machine Translation (NMT), 1.5X faster language translation (0.8 step/sec vs. 1.2 step/sec); 3D FFT 1k x 1k x 1k, 1.5X faster calculations (2.5TF vs. 3.8TF).
      • HIGHER ACCURACY: R-CNN for object detection at 1080P with Caffe; V100 16GB uses VGG-16 (16 layers), V100 32GB uses ResNet-152 (152 layers); 40% lower error rate.
      • HIGHER RESOLUTION: GAN image-to-image generation (unsupervised image translation: input winter photo, AI converts it to summer); 512x512 res images vs. 1024x1024 res images; 4X higher resolution.
      • Test configuration: Dual E5-2698v4 server, 512GB DDR4, Ubuntu 16.04, CUDA9, cuDNN7 | NMT is GNMT-like and run with TensorFlow NGC Container 18.01 (batch size = 128 for 16GB and 256 for 32GB) | FFT is with cufftbench 1k x 1k x 1k, comparing 2 V100 16GB (DGX1V) vs. 2 V100 32GB (DGX1V) | GAN by NVRESEARCH (https://arxiv.org/pdf/1703.00848.pdf) | V100 16GB and V100 32GB with FP32.
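  • 35. 38 IMPLICATIONS Automotive example: the majority of useful problems are too complex for single-GPU training. Two scenarios: VERY CONSERVATIVE and CONSERVATIVE.
      • Fleet size (data capture per hour): 100 cars / 1TB/hour (very conservative); 125 cars / 1.5TB/hour (conservative)
      • Duration of data collection: 260 days * 8 hours (very conservative); 325 days * 10 hours (conservative)
      • Data compression factor: 0.0005 (very conservative); 0.0008 (conservative)
      • Total training set: 104 TB (very conservative); 487.5 TB (conservative)
      • InceptionV3 training time (with 1 Pascal GPU): 9.1 years (very conservative); 42.6 years (conservative)
      • AlexNet training time (with 1 Pascal GPU): 1.1 years (very conservative); 5.4 years (conservative)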
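The training-set totals in the automotive example can be reproduced with a short calculation. The sketch assumes the capture rate (1 TB/hour or 1.5 TB/hour) is per car, since that reading gives the totals on the slide. The function name `training_set_tb` is hypothetical. The last line also checks that the two InceptionV3 training times are consistent with time scaling linearly in data volume, which is an inference from the numbers rather than something the slide states.

```python
def training_set_tb(cars, tb_per_car_hour, days, hours_per_day, compression):
    """Training set size in TB: raw fleet capture scaled by the compression factor."""
    raw_tb = cars * tb_per_car_hour * days * hours_per_day
    return raw_tb * compression

# Scenario parameters taken from the slide (capture rate assumed to be per car).
very_conservative = training_set_tb(100, 1.0, 260, 8, 0.0005)
conservative = training_set_tb(125, 1.5, 325, 10, 0.0008)

# If training time scales linearly with data volume, the 9.1 years quoted for
# InceptionV3 on 104 TB implies this many years for the larger dataset.
inception_years_conservative = 9.1 / very_conservative * conservative

print(f"Very conservative: {very_conservative:.1f} TB")
print(f"Conservative: {conservative:.1f} TB")
print(f"InceptionV3, conservative: {inception_years_conservative:.1f} years")
```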
  • 37. 48 PREFERRED NETWORKS Training ImageNet in 15 minutes. The cluster consists of 128 nodes with 8 NVIDIA P100 GPUs each, for 1024 GPUs in total. The nodes are connected with two FDR InfiniBand links (56Gbps x 2). Akiba, T., Suzuki, S., & Fukuda, K. (2017). Extremely large minibatch SGD: Training ResNet-50 on ImageNet in 15 minutes. arXiv preprint arXiv:1711.04325.
  • 38. 49 DGX FAMILY DGX-1V – 1 PetaFLOP FP16
  • 39. 50 IBM POWER9 ARCHITECTURE CPU-GPU FAST CONNECTION NVLINK