Scaling Deep Learning
Can we learn to play Atari Pong faster than a 7-year-old child?
Mark O’Connor
VP Product Management
Allinea provides high-performance software tools
Allinea Forge: debug and profile codes
Performance Reports: monitor and tune applications
[Chart: simulation steps/s on a 24-core Xeon]
Is that fast? Let’s have a look with Allinea MAP
Choose hardware that matches the application performance
[Chart: 24-core Xeon vs Xeon Phi (KNL)]
gradient descent
What about scaling to multiple nodes?
Model-level Data-level
Large Scale Distributed Deep Networks,
Jeff Dean et al., Google Inc
Scaling image classification to multiple nodes with MPI + TensorFlow
Data-level
Reproduced this on Amazon EC2 with a MPI + TensorFlow GPU cluster
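Data-level parallelism can be sketched in a few lines. This is a minimal illustration, not the MPI + TensorFlow code used for the slides: each worker computes a gradient on its own shard of the data, the gradients are averaged, and every worker applies the same update. The toy model `y = weight * x`, the function names and the learning rate are all assumptions made for the example, and `allreduce_mean` stands in for what would be an allreduce across ranks in a real MPI program.

```python
def local_gradient(weight, batch):
    """Gradient of the mean squared error of y = weight * x on one worker's batch."""
    return sum(2.0 * (weight * x - y) * x for x, y in batch) / len(batch)


def allreduce_mean(values):
    """Stand-in for an MPI allreduce (sum) followed by division by the worker count."""
    return sum(values) / len(values)


def data_parallel_step(weight, shards, lr=0.1):
    """One synchronous step: every worker sees a different shard, all apply one update."""
    grads = [local_gradient(weight, shard) for shard in shards]  # one entry per worker
    return weight - lr * allreduce_mean(grads)


# Hypothetical data: two workers, each holding one sample of y = 2x.
shards = [[(1.0, 2.0)], [(2.0, 4.0)]]
weight = 0.0
for _ in range(50):
    weight = data_parallel_step(weight, shards)
print(weight)
```

The averaging step is the only communication in each iteration, which is why network performance becomes a natural suspect as the node count grows.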
[Chart: Speedup with increasing node count (2, 4, 8, 16, 32 nodes)]
Is the EC2 network or old GPU model limiting the scalability?
What limits the performance most as we scale up?
[Chart: share of runtime spent in MPI time, Mem time and Other at 2, 4, 8, 16 and 32 nodes]
Diving deeper with MAP – why is so much time spent in memory accesses?
[Chart: Runtime with 32 nodes, printing the error every epoch vs every 30 epochs]
2.2x speedup by reducing the frequency of progress updates
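The change behind this speedup can be sketched as a reporting interval in the training loop. This is a hedged illustration rather than the original code: `train`, `report_every` and `compute_error` are hypothetical names, and the epoch body is left as a comment. The point is only that the (potentially expensive) error evaluation runs on a subset of epochs.

```python
def train(num_epochs, report_every, compute_error):
    """Run a dummy training loop, evaluating the error only every `report_every` epochs."""
    reports = []
    for epoch in range(1, num_epochs + 1):
        # ... one epoch of gradient descent would run here ...
        if epoch % report_every == 0:
            # Only now pay for the error evaluation and the progress print.
            reports.append((epoch, compute_error(epoch)))
    return reports


# Hypothetical error function, used only to show how often it is called.
every_epoch = train(90, 1, lambda epoch: 1.0 / epoch)
every_30 = train(90, 30, lambda epoch: 1.0 / epoch)
print(len(every_epoch), len(every_30))
```

With 90 epochs the error is evaluated 90 times in the first case and 3 times in the second; how much wall-clock time that saves depends on how costly the evaluation is relative to an epoch.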
gradient descent
Scaling deep reinforcement learning to multiple nodes
vs
Choosing a model: keep it simple …
Choosing a model: policy gradients
Run a policy for a while. See which actions led to high rewards.
Increase their probability.
Source: Andrej Karpathy’s blog. Karpathy is a genius.
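The idea of raising the probability of actions that led to high rewards can be sketched as follows. This is a minimal illustration under stated assumptions, not the 130-line program referred to in the slides: it assumes a two-action policy (probability of "up" given by a logistic function of a linear score), plain discounted returns, and hypothetical names throughout.

```python
import math


def discount_rewards(rewards, gamma=0.99):
    """Return the discounted sum of future rewards for every time step."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = running * gamma + rewards[t]
        returns[t] = running
    return returns


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def policy_gradient_update(weights, observations, actions, rewards, lr=0.1, gamma=0.99):
    """Nudge the weights so that actions followed by high returns become more likely."""
    returns = discount_rewards(rewards, gamma)
    new_weights = list(weights)
    for x, action, ret in zip(observations, actions, returns):
        p_up = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        # For action in {0, 1}, d log pi(action | x) / d score = action - p_up.
        grad_score = action - p_up
        for i, xi in enumerate(x):
            new_weights[i] += lr * ret * grad_score * xi
    return new_weights


# One observed step: the agent moved up (action 1) and then won a point (+1).
print(policy_gradient_update([0.0], [[1.0]], [1], [1.0]))
```

A positive return pushes the weight towards the action that was taken, and a negative return pushes it away, which is the whole mechanism described above.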
Choosing a model: policy gradients
130 lines of Python, no framework needed!
Policy gradients versus Google DeepMind’s A3C:
[Chart: Hours of training required to beat Atari Pong, Policy Gradients vs Google DeepMind A3C]
So we need a 100x improvement to be competitive. Let’s go!
gradient descent
Parallel Policy Gradients on a single node:
[Chart: Training speed on a single node, steps/s vs MPI processes (1 to 8); linear fit y = 257.4x + 220]
Is this code a good match for Intel® Xeon Phi™ Knights Landing?
Half the time is spent in well-vectorized matrix multiplications
It scales well over multiple cores
The other half of the time is spent emulating the Atari’s 6502 processor
We scale by running one emulator per real CPU core
Conclusion: not a good match
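One way to read this conclusion is through Amdahl's law. This is an interpretation added here, not a calculation from the slides: if only the matrix-multiplication half of the runtime benefits from wider vector units, the overall gain is capped at 2x no matter how fast that half becomes, while the emulator half still depends on scalar core speed. The function name and the sample factors below are assumptions for illustration.

```python
def overall_speedup(accelerated_fraction, factor):
    """Amdahl's law: speedup when only part of the runtime gets `factor` times faster."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


# Half the runtime (the matrix multiplications) is accelerated by various factors.
for factor in (2, 4, 8, 1_000_000):
    print(factor, round(overall_speedup(0.5, factor), 3))
```

Even an essentially unlimited speedup of the vectorized half leaves the total just under 2x, so the emulator half dominates the decision.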
Let’s use ARCHER, the UK National Supercomputing Service
Cray XC30, Intel® Xeon™ nodes, high-performance networking
Source: XKCD
What should I do while waiting for my job to begin?
Here we are on our quick EC2 cluster – how does Parallel Policy Gradients perform?
[Chart: Training speed on a single node, steps/s vs MPI processes (1 to 16)]
Time to investigate with Allinea Forge
What might cause this?
Let’s try this again using Anaconda and an Intel MKL-backed Numpy
[Chart: steps/s vs MPI processes (1 to 16); linear fit y = 336.48x + 745.83]
How does multi-node scaling look now?
[Chart: steps/s vs MPI processes (0 to 90)]
So are we done now? Let’s have a look at our performance at 90 cores:
Insight: we don’t really care about recording complete games
Does it matter what the score is here?
[Diagram: HPC expertise, domain expertise, new insights]
Result from this insight: 25% speedup at 90 cores and 10x scalability
[Chart: Steps/s with 90 cores, update after whole game vs update every 4 seconds]
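A possible explanation for this gain, stated here as an assumption because the slides do not spell it out, is load imbalance: if all workers synchronize only when their games finish, the fastest ones sit idle until the longest game ends, whereas fixed-length windows give every worker the same amount of work between updates. The sketch below uses hypothetical game lengths and function names to show both pieces.

```python
def busy_fraction_whole_games(game_lengths):
    """Fraction of worker time spent playing if everyone waits for the longest game."""
    return sum(game_lengths) / (len(game_lengths) * max(game_lengths))


def fixed_windows(steps, window):
    """Split a stream of game steps into fixed-length windows, ignoring game boundaries."""
    return [steps[i:i + window] for i in range(0, len(steps), window)]


# Hypothetical example: three workers whose games last 100, 200 and 400 steps.
print(busy_fraction_whole_games([100, 200, 400]))

# With fixed windows, an update happens after every `window` steps of the stream.
print(fixed_windows(list(range(10)), 4))
```

In the hypothetical example the workers are busy a little under 60% of the time when waiting for whole games; with equal windows no worker waits on another's game.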
The showdown: Google DeepMind’s A3C vs Supercomputer vs 7-year-old child
[Chart: Minutes of training time to defeat Atari Pong (log scale): DQN (DeepMind, 2015), A3C (Google, 2016), 7-year-old (James, 2016), PPG (Allinea, 2016)]
The showdown: how fast can we beat Atari Pong with 1536 cores on Archer?
3m 54s 8m 00s 28m 00s
Extra: the best strategy found by any agent
Training stats:
•  100 wallclock minutes
•  180 k input frames
•  100 kJ to solution
Comparison to PPG:
•  30 wallclock minutes
•  880,000 k input frames
•  550,000 kJ to solution
Humans still ~10^3 ahead!
(… but AI has improved by 10x / year since 2014…)
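The size of the gap can be checked with quick arithmetic on the figures listed above. The sketch assumes that the first set of training stats refers to the human player and the second to PPG; the dictionary keys are hypothetical names.

```python
# Figures copied from the two lists above (frames in thousands, energy in kJ).
human = {"minutes": 100, "frames_k": 180, "energy_kj": 100}
ppg = {"minutes": 30, "frames_k": 880_000, "energy_kj": 550_000}


def ratio(metric):
    """How many times more of `metric` PPG used than the human."""
    return ppg[metric] / human[metric]


frame_ratio = ratio("frames_k")
energy_ratio = ratio("energy_kj")
time_ratio = ratio("minutes")
print(round(frame_ratio), round(energy_ratio), time_ratio)
```

PPG needs roughly 4,900 times the input frames and 5,500 times the energy while finishing in about a third of the wall-clock time, which is consistent with a gap of about three orders of magnitude in data and energy efficiency.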
Conclusions
Deep learning is going multi-node at significant scales
HPC can and should play a huge role in this
Researchers need frameworks and tools to help them build high-performance multi-node models
Simple but scalable models can converge faster than state-of-the-art single-node models
Humans aren’t down for the count – yet!
Thank you – meet us at booth #1508 this week!
Mark O’Connor
mark@allinea.com | @yieldthought | [ github link ]

More Related Content

What's hot

아마존의 딥러닝 기술 활용 사례
아마존의 딥러닝 기술 활용 사례아마존의 딥러닝 기술 활용 사례
아마존의 딥러닝 기술 활용 사례NAVER Engineering
 
ECS for Amazon Deep Learning and Amazon Machine Learning
ECS for Amazon Deep Learning and Amazon Machine LearningECS for Amazon Deep Learning and Amazon Machine Learning
ECS for Amazon Deep Learning and Amazon Machine LearningAmanda Mackay (she/her)
 
Machine Learning Inference at the Edge
Machine Learning Inference at the EdgeMachine Learning Inference at the Edge
Machine Learning Inference at the EdgeJulien SIMON
 
From Data to Actions and Insights at Conviva with Rui Zhang and Yan Li
From Data to Actions and Insights at Conviva with Rui Zhang and Yan Li From Data to Actions and Insights at Conviva with Rui Zhang and Yan Li
From Data to Actions and Insights at Conviva with Rui Zhang and Yan Li Databricks
 
Deep Learning at the Edge
Deep Learning at the EdgeDeep Learning at the Edge
Deep Learning at the EdgeJulien SIMON
 
Advertising Fraud Detection at Scale at T-Mobile
Advertising Fraud Detection at Scale at T-MobileAdvertising Fraud Detection at Scale at T-Mobile
Advertising Fraud Detection at Scale at T-MobileDatabricks
 
Chaos Engineering on Cloud Foundry
Chaos Engineering on Cloud FoundryChaos Engineering on Cloud Foundry
Chaos Engineering on Cloud FoundryKarun Chennuri
 
Applications of Deep Learning in Telematics
Applications of Deep Learning in TelematicsApplications of Deep Learning in Telematics
Applications of Deep Learning in TelematicsDatabricks
 
"Collaboratively Benchmarking and Optimizing Deep Learning Implementations," ...
"Collaboratively Benchmarking and Optimizing Deep Learning Implementations," ..."Collaboratively Benchmarking and Optimizing Deep Learning Implementations," ...
"Collaboratively Benchmarking and Optimizing Deep Learning Implementations," ...Edge AI and Vision Alliance
 
FPGA on the Cloud
FPGA on the Cloud FPGA on the Cloud
FPGA on the Cloud jtsagata
 
Yangqing Jia at AI Frontiers: Towards Better DL Frameworks
Yangqing Jia at AI Frontiers: Towards Better DL FrameworksYangqing Jia at AI Frontiers: Towards Better DL Frameworks
Yangqing Jia at AI Frontiers: Towards Better DL FrameworksAI Frontiers
 
Lessons Learned from Deploying Apache Spark as a Service on IBM Power Systems...
Lessons Learned from Deploying Apache Spark as a Service on IBM Power Systems...Lessons Learned from Deploying Apache Spark as a Service on IBM Power Systems...
Lessons Learned from Deploying Apache Spark as a Service on IBM Power Systems...Indrajit Poddar
 
Introducing Amazon EC2 P3 Instance - Featuring the Most Powerful GPU for Mach...
Introducing Amazon EC2 P3 Instance - Featuring the Most Powerful GPU for Mach...Introducing Amazon EC2 P3 Instance - Featuring the Most Powerful GPU for Mach...
Introducing Amazon EC2 P3 Instance - Featuring the Most Powerful GPU for Mach...Amazon Web Services
 
An Introduction to Deep Learning (May 2018)
An Introduction to Deep Learning (May 2018)An Introduction to Deep Learning (May 2018)
An Introduction to Deep Learning (May 2018)Julien SIMON
 
GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化
GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化
GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化NVIDIA Taiwan
 
NetBackup vs. Competitor E (Scalability and Performance Benchmark)
NetBackup vs. Competitor E (Scalability and Performance Benchmark) NetBackup vs. Competitor E (Scalability and Performance Benchmark)
NetBackup vs. Competitor E (Scalability and Performance Benchmark) Veritas Technologies LLC
 
NetBackup vs. EMC (Scalability and Performance Benchmark)
NetBackup vs. EMC (Scalability and Performance Benchmark)NetBackup vs. EMC (Scalability and Performance Benchmark)
NetBackup vs. EMC (Scalability and Performance Benchmark)Veritas Technologies LLC
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...Spark Summit
 

What's hot (20)

아마존의 딥러닝 기술 활용 사례
아마존의 딥러닝 기술 활용 사례아마존의 딥러닝 기술 활용 사례
아마존의 딥러닝 기술 활용 사례
 
ECS for Amazon Deep Learning and Amazon Machine Learning
ECS for Amazon Deep Learning and Amazon Machine LearningECS for Amazon Deep Learning and Amazon Machine Learning
ECS for Amazon Deep Learning and Amazon Machine Learning
 
Machine Learning Inference at the Edge
Machine Learning Inference at the EdgeMachine Learning Inference at the Edge
Machine Learning Inference at the Edge
 
From Data to Actions and Insights at Conviva with Rui Zhang and Yan Li
From Data to Actions and Insights at Conviva with Rui Zhang and Yan Li From Data to Actions and Insights at Conviva with Rui Zhang and Yan Li
From Data to Actions and Insights at Conviva with Rui Zhang and Yan Li
 
Deep Learning at the Edge
Deep Learning at the EdgeDeep Learning at the Edge
Deep Learning at the Edge
 
Advertising Fraud Detection at Scale at T-Mobile
Advertising Fraud Detection at Scale at T-MobileAdvertising Fraud Detection at Scale at T-Mobile
Advertising Fraud Detection at Scale at T-Mobile
 
CNN Quantization
CNN QuantizationCNN Quantization
CNN Quantization
 
Chaos Engineering on Cloud Foundry
Chaos Engineering on Cloud FoundryChaos Engineering on Cloud Foundry
Chaos Engineering on Cloud Foundry
 
Applications of Deep Learning in Telematics
Applications of Deep Learning in TelematicsApplications of Deep Learning in Telematics
Applications of Deep Learning in Telematics
 
"Collaboratively Benchmarking and Optimizing Deep Learning Implementations," ...
"Collaboratively Benchmarking and Optimizing Deep Learning Implementations," ..."Collaboratively Benchmarking and Optimizing Deep Learning Implementations," ...
"Collaboratively Benchmarking and Optimizing Deep Learning Implementations," ...
 
FPGA on the Cloud
FPGA on the Cloud FPGA on the Cloud
FPGA on the Cloud
 
Yangqing Jia at AI Frontiers: Towards Better DL Frameworks
Yangqing Jia at AI Frontiers: Towards Better DL FrameworksYangqing Jia at AI Frontiers: Towards Better DL Frameworks
Yangqing Jia at AI Frontiers: Towards Better DL Frameworks
 
Lessons Learned from Deploying Apache Spark as a Service on IBM Power Systems...
Lessons Learned from Deploying Apache Spark as a Service on IBM Power Systems...Lessons Learned from Deploying Apache Spark as a Service on IBM Power Systems...
Lessons Learned from Deploying Apache Spark as a Service on IBM Power Systems...
 
Embracing clouds
Embracing cloudsEmbracing clouds
Embracing clouds
 
Introducing Amazon EC2 P3 Instance - Featuring the Most Powerful GPU for Mach...
Introducing Amazon EC2 P3 Instance - Featuring the Most Powerful GPU for Mach...Introducing Amazon EC2 P3 Instance - Featuring the Most Powerful GPU for Mach...
Introducing Amazon EC2 P3 Instance - Featuring the Most Powerful GPU for Mach...
 
An Introduction to Deep Learning (May 2018)
An Introduction to Deep Learning (May 2018)An Introduction to Deep Learning (May 2018)
An Introduction to Deep Learning (May 2018)
 
GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化
GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化
GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化
 
NetBackup vs. Competitor E (Scalability and Performance Benchmark)
NetBackup vs. Competitor E (Scalability and Performance Benchmark) NetBackup vs. Competitor E (Scalability and Performance Benchmark)
NetBackup vs. Competitor E (Scalability and Performance Benchmark)
 
NetBackup vs. EMC (Scalability and Performance Benchmark)
NetBackup vs. EMC (Scalability and Performance Benchmark)NetBackup vs. EMC (Scalability and Performance Benchmark)
NetBackup vs. EMC (Scalability and Performance Benchmark)
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 

Similar to Scaling Deep Learning

Applying AI in Games (GDC2019)
Applying AI in Games (GDC2019)Applying AI in Games (GDC2019)
Applying AI in Games (GDC2019)Jun Okumura
 
Austin big data ai meetup march 14
Austin big data ai meetup march 14Austin big data ai meetup march 14
Austin big data ai meetup march 14Clarisse Hedglin
 
Google Cloud Platform Empowers TensorFlow and Machine Learning
Google Cloud Platform Empowers TensorFlow and Machine LearningGoogle Cloud Platform Empowers TensorFlow and Machine Learning
Google Cloud Platform Empowers TensorFlow and Machine LearningDataWorks Summit/Hadoop Summit
 
Deep learning for FinTech
Deep learning for FinTechDeep learning for FinTech
Deep learning for FinTechgeetachauhan
 
Ac922 watson 180208 v1
Ac922 watson 180208 v1Ac922 watson 180208 v1
Ac922 watson 180208 v1IBM Sverige
 
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Intel® Software
 
Threading Successes 01 Intro
Threading Successes 01   IntroThreading Successes 01   Intro
Threading Successes 01 Introguest40fc7cd
 
HPE and NVIDIA empowering AI and IoT
HPE and NVIDIA empowering AI and IoTHPE and NVIDIA empowering AI and IoT
HPE and NVIDIA empowering AI and IoTRenee Yao
 
TECHNICAL OVERVIEW NVIDIA DEEP LEARNING PLATFORM Giant Leaps in Performance ...
TECHNICAL OVERVIEW NVIDIA DEEP  LEARNING PLATFORM Giant Leaps in Performance ...TECHNICAL OVERVIEW NVIDIA DEEP  LEARNING PLATFORM Giant Leaps in Performance ...
TECHNICAL OVERVIEW NVIDIA DEEP LEARNING PLATFORM Giant Leaps in Performance ...Willy Marroquin (WillyDevNET)
 
Toronto meetup 20190917
Toronto meetup 20190917Toronto meetup 20190917
Toronto meetup 20190917Bill Liu
 
Dell and NVIDIA for Your AI workloads in the Data Center
Dell and NVIDIA for Your AI workloads in the Data CenterDell and NVIDIA for Your AI workloads in the Data Center
Dell and NVIDIA for Your AI workloads in the Data CenterRenee Yao
 
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...Amazon Web Services
 
Harnessing the power of Generative Adversarial Networks (GANs) for supervised...
Harnessing the power of Generative Adversarial Networks (GANs) for supervised...Harnessing the power of Generative Adversarial Networks (GANs) for supervised...
Harnessing the power of Generative Adversarial Networks (GANs) for supervised...Scaleway
 
Scaling TensorFlow Models for Training using multi-GPUs & Google Cloud ML
Scaling TensorFlow Models for Training using multi-GPUs & Google Cloud MLScaling TensorFlow Models for Training using multi-GPUs & Google Cloud ML
Scaling TensorFlow Models for Training using multi-GPUs & Google Cloud MLSeldon
 
SQL on Hadoop benchmarks using TPC-DS query set
SQL on Hadoop benchmarks using TPC-DS query setSQL on Hadoop benchmarks using TPC-DS query set
SQL on Hadoop benchmarks using TPC-DS query setKognitio
 
GPU Technology Conference 2014 Keynote
GPU Technology Conference 2014 KeynoteGPU Technology Conference 2014 Keynote
GPU Technology Conference 2014 KeynoteNVIDIA
 
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Big Data Spain
 
Enabling Artificial Intelligence - Alison B. Lowndes
Enabling Artificial Intelligence - Alison B. LowndesEnabling Artificial Intelligence - Alison B. Lowndes
Enabling Artificial Intelligence - Alison B. LowndesWithTheBest
 
알리바바 클라우드 PAI (machine learning Platform for AI)
알리바바 클라우드 PAI (machine learning Platform for AI)알리바바 클라우드 PAI (machine learning Platform for AI)
알리바바 클라우드 PAI (machine learning Platform for AI)Alibaba Cloud Korea
 

Similar to Scaling Deep Learning (20)

Applying AI in Games (GDC2019)
Applying AI in Games (GDC2019)Applying AI in Games (GDC2019)
Applying AI in Games (GDC2019)
 
Austin big data ai meetup march 14
Austin big data ai meetup march 14Austin big data ai meetup march 14
Austin big data ai meetup march 14
 
Google Cloud Platform Empowers TensorFlow and Machine Learning
Google Cloud Platform Empowers TensorFlow and Machine LearningGoogle Cloud Platform Empowers TensorFlow and Machine Learning
Google Cloud Platform Empowers TensorFlow and Machine Learning
 
Deep learning for FinTech
Deep learning for FinTechDeep learning for FinTech
Deep learning for FinTech
 
Ac922 watson 180208 v1
Ac922 watson 180208 v1Ac922 watson 180208 v1
Ac922 watson 180208 v1
 
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
 
Threading Successes 01 Intro
Threading Successes 01   IntroThreading Successes 01   Intro
Threading Successes 01 Intro
 
HPE and NVIDIA empowering AI and IoT
HPE and NVIDIA empowering AI and IoTHPE and NVIDIA empowering AI and IoT
HPE and NVIDIA empowering AI and IoT
 
Open power ddl and lms
Open power ddl and lmsOpen power ddl and lms
Open power ddl and lms
 
TECHNICAL OVERVIEW NVIDIA DEEP LEARNING PLATFORM Giant Leaps in Performance ...
TECHNICAL OVERVIEW NVIDIA DEEP  LEARNING PLATFORM Giant Leaps in Performance ...TECHNICAL OVERVIEW NVIDIA DEEP  LEARNING PLATFORM Giant Leaps in Performance ...
TECHNICAL OVERVIEW NVIDIA DEEP LEARNING PLATFORM Giant Leaps in Performance ...
 
Toronto meetup 20190917
Toronto meetup 20190917Toronto meetup 20190917
Toronto meetup 20190917
 
Dell and NVIDIA for Your AI workloads in the Data Center
Dell and NVIDIA for Your AI workloads in the Data CenterDell and NVIDIA for Your AI workloads in the Data Center
Dell and NVIDIA for Your AI workloads in the Data Center
 
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
 
Harnessing the power of Generative Adversarial Networks (GANs) for supervised...
Harnessing the power of Generative Adversarial Networks (GANs) for supervised...Harnessing the power of Generative Adversarial Networks (GANs) for supervised...
Harnessing the power of Generative Adversarial Networks (GANs) for supervised...
 
Scaling TensorFlow Models for Training using multi-GPUs & Google Cloud ML
Scaling TensorFlow Models for Training using multi-GPUs & Google Cloud MLScaling TensorFlow Models for Training using multi-GPUs & Google Cloud ML
Scaling TensorFlow Models for Training using multi-GPUs & Google Cloud ML
 
SQL on Hadoop benchmarks using TPC-DS query set
SQL on Hadoop benchmarks using TPC-DS query setSQL on Hadoop benchmarks using TPC-DS query set
SQL on Hadoop benchmarks using TPC-DS query set
 
GPU Technology Conference 2014 Keynote
GPU Technology Conference 2014 KeynoteGPU Technology Conference 2014 Keynote
GPU Technology Conference 2014 Keynote
 
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
 
Enabling Artificial Intelligence - Alison B. Lowndes
Enabling Artificial Intelligence - Alison B. LowndesEnabling Artificial Intelligence - Alison B. Lowndes
Enabling Artificial Intelligence - Alison B. Lowndes
 
알리바바 클라우드 PAI (machine learning Platform for AI)
알리바바 클라우드 PAI (machine learning Platform for AI)알리바바 클라우드 PAI (machine learning Platform for AI)
알리바바 클라우드 PAI (machine learning Platform for AI)
 

More from Intel® Software

AI for All: Biology is eating the world & AI is eating Biology
AI for All: Biology is eating the world & AI is eating Biology AI for All: Biology is eating the world & AI is eating Biology
AI for All: Biology is eating the world & AI is eating Biology Intel® Software
 
Python Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and AnacondaPython Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and AnacondaIntel® Software
 
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSciStreamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSciIntel® Software
 
AI for good: Scaling AI in science, healthcare, and more.
AI for good: Scaling AI in science, healthcare, and more.AI for good: Scaling AI in science, healthcare, and more.
AI for good: Scaling AI in science, healthcare, and more.Intel® Software
 
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...Intel® Software
 
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...Intel® Software
 
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...Intel® Software
 
AWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI ResearchAWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI ResearchIntel® Software
 
Intel AIDC Houston Summit - Overview Slides
Intel AIDC Houston Summit - Overview SlidesIntel AIDC Houston Summit - Overview Slides
Intel AIDC Houston Summit - Overview SlidesIntel® Software
 
AIDC NY: BODO AI Presentation - 09.19.2019
AIDC NY: BODO AI Presentation - 09.19.2019AIDC NY: BODO AI Presentation - 09.19.2019
AIDC NY: BODO AI Presentation - 09.19.2019Intel® Software
 
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019Intel® Software
 
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...Intel® Software
 
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...Intel® Software
 
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...Intel® Software
 
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...Intel® Software
 
AIDC India - Intel Movidius / Open Vino Slides
AIDC India - Intel Movidius / Open Vino SlidesAIDC India - Intel Movidius / Open Vino Slides
AIDC India - Intel Movidius / Open Vino SlidesIntel® Software
 
AIDC India - AI Vision Slides
AIDC India - AI Vision SlidesAIDC India - AI Vision Slides
AIDC India - AI Vision SlidesIntel® Software
 
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...Intel® Software
 

More from Intel® Software (20)

AI for All: Biology is eating the world & AI is eating Biology
AI for All: Biology is eating the world & AI is eating Biology AI for All: Biology is eating the world & AI is eating Biology
AI for All: Biology is eating the world & AI is eating Biology
 
Python Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and AnacondaPython Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and Anaconda
 
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSciStreamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
 
AI for good: Scaling AI in science, healthcare, and more.
AI for good: Scaling AI in science, healthcare, and more.AI for good: Scaling AI in science, healthcare, and more.
AI for good: Scaling AI in science, healthcare, and more.
 
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
 
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
 
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
 
AWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI ResearchAWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI Research
 
Intel Developer Program
Intel Developer ProgramIntel Developer Program
Intel Developer Program
 
Intel AIDC Houston Summit - Overview Slides
Intel AIDC Houston Summit - Overview SlidesIntel AIDC Houston Summit - Overview Slides
Intel AIDC Houston Summit - Overview Slides
 
AIDC NY: BODO AI Presentation - 09.19.2019
AIDC NY: BODO AI Presentation - 09.19.2019AIDC NY: BODO AI Presentation - 09.19.2019
AIDC NY: BODO AI Presentation - 09.19.2019
 
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
 
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
 
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
 
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
 
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
 
AIDC India - AI on IA
AIDC India  - AI on IAAIDC India  - AI on IA
AIDC India - AI on IA
 
AIDC India - Intel Movidius / Open Vino Slides
AIDC India - Intel Movidius / Open Vino SlidesAIDC India - Intel Movidius / Open Vino Slides
AIDC India - Intel Movidius / Open Vino Slides
 
AIDC India - AI Vision Slides
AIDC India - AI Vision SlidesAIDC India - AI Vision Slides
AIDC India - AI Vision Slides
 
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
 


Scaling Deep Learning

  • 1. Scaling Deep Learning: Can we learn to play Atari Pong faster than a 7-year-old child? Mark O’Connor, VP Product Management
  • 2.
  • 4. Allinea Forge: debug and profile codes. Performance Reports: monitor and tune applications.
  • 5.
  • 7. Is that fast? Let’s have a look with Allinea MAP
  • 8. Choose hardware that matches the application performance. [Bar chart: 24-core Xeon vs Xeon Phi (KNL)]
  • 9. [Figure: gradient descent] What about scaling to multiple nodes? Model-level vs data-level parallelism. Source: Large Scale Distributed Deep Networks, Jeff Dean et al., Google Inc.
  • 10. Scaling image classification to multiple nodes with MPI + TensorFlow (data-level).
  • 11. Reproduced this on Amazon EC2 with an MPI + TensorFlow GPU cluster. [Chart: speedup with increasing node count, 2 to 32 nodes]
  • 12. Is the EC2 network or old GPU model limiting the scalability?
  • 13. What limits the performance most as we scale up? [Chart: share of runtime in MPI time, memory time, and other, from 2 to 32 nodes]
  • 14. Diving deeper with MAP – why is so much time spent in memory accesses?
  • 15.
  • 16. 2.2x speedup by reducing the frequency of progress updates. [Chart: runtime with 32 nodes, printing the error every epoch vs every 30 epochs]
  • 17. Scaling deep reinforcement learning to multiple nodes
  • 18. Choosing a model: keep it simple …
  • 19. Choosing a model: policy gradients. Run a policy for a while. See which actions led to high rewards. Increase their probability. Source: Andrej Karpathy’s blog. Karpathy is a genius.
  • 20. Choosing a model: policy gradients. 130 lines of Python, no framework needed!
  • 21. Policy gradients versus Google DeepMind’s A3C. [Chart: hours of training required to beat Atari Pong, Policy Gradients vs Google DeepMind A3C]
  • 22. So we need a 100x improvement to be competitive. Let’s go!
  • 23. Parallel Policy Gradients on a single node. [Chart: training speed on a single node, steps/s vs MPI processes (1 to 8); linear fit y = 257.4x + 220]
  • 24. Is this code a good match for Intel® Xeon Phi™ Knights Landing? Half the time is spent in well-vectorized matrix multiplications. It scales well over multiple cores. The other half of the time is spent emulating the Atari’s 6502 processor. We scale by running one emulator per real CPU core. Conclusion: not a good match.
  • 25. Let’s use ARCHER, the UK National Supercomputing Service: Cray XC30, Intel® Xeon™ nodes, high-performance networking.
  • 26. What should I do while waiting for my job to begin? Source: XKCD.
  • 27. Here we are on our quick EC2 cluster – how does Parallel Policy Gradients perform? [Chart: training speed on a single node, steps/s vs MPI processes (1 to 16)]
  • 28. Time to investigate with Allinea Forge. What might cause this?
  • 29. Time to investigate with Allinea Forge
  • 30. Let’s try this again using Anaconda and an Intel MKL-backed NumPy. [Chart: steps/s vs MPI processes (1 to 16); linear fit y = 336.48x + 745.83]
  • 31. How does multi-node scaling look now? [Chart: steps/s vs MPI processes, up to 90 processes]
  • 32. So are we done now? Let’s have a look at our performance at 90 cores:
  • 33. Insight: we don’t really care about recording complete games. Does it matter what the score is here? HPC expertise, domain expertise, new insights.
  • 34. Result from this insight: 25% speedup at 90 cores and 10x scalability. [Chart: steps/s with 90 cores, updating after a whole game vs updating every 4 seconds]
  • 35. The showdown: Google DeepMind’s A3C vs supercomputer vs 7-year-old child. [Chart, log scale: minutes of training time to defeat Atari Pong for DQN (DeepMind, 2015), A3C (Google, 2016), 7-year-old (James, 2016), and PPG (Allinea, 2016)]
  • 36. The showdown: how fast can we beat Atari Pong with 1536 cores on ARCHER? Times shown: 3m 54s, 8m 00s, 28m 00s.
  • 37. Extra: the best strategy found by any agent. Training stats: 100 wallclock minutes; 180 k input frames; 100 kJ to solution. Comparison to PPG: 30 wallclock minutes; 880,000 k input frames; 550,000 kJ to solution. Humans still ~10^3 ahead! (… but AI has improved by 10x / year since 2014 …)
  • 38. Conclusions: Deep learning is going multi-node at significant scales. HPC can and should play a huge role in this. Researchers need frameworks and tools to help them build high-performance multi-node models. Simple but scalable models can converge faster than state-of-the-art single-node models. Humans aren’t down for the count – yet!
  • 39. Thank you – meet us at booth #1508 this week! Mark O’Connor, mark@allinea.com | @yieldthought | [ github link ]
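
Slides 9 and 10 refer to data-level parallelism without showing how it works. The sketch below is one possible illustration, not the code used in the talk: each worker computes a gradient on its own shard of the data, the gradients are averaged (a stand-in for what an MPI allreduce would do on a real cluster), and every replica applies the same update. The function names, the toy linear model, and the synthetic data are all assumptions made for the example.

```python
def shard(data, num_workers):
    """Split data into num_workers interleaved shards (hypothetical helper)."""
    return [data[i::num_workers] for i in range(num_workers)]


def local_gradient(w, samples):
    """Gradient of the mean squared error for y ~ w * x on one worker's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in samples) / len(samples)


def data_parallel_step(w, shards, learning_rate):
    # Each worker computes a gradient on its own shard
    # (this loop would run in parallel on a real cluster).
    grads = [local_gradient(w, s) for s in shards]
    # Stand-in for an allreduce: average the gradients so that
    # every model replica applies an identical update.
    avg_grad = sum(grads) / len(grads)
    return w - learning_rate * avg_grad


# Synthetic data for y = 3x, split across 4 simulated workers.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = shard(data, num_workers=4)

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, learning_rate=0.01)

print(f"learned weight: {w:.6f}")
```

Because the shards here are equal in size, the averaged gradient equals the full-batch gradient, so the simulated workers follow the same trajectory as a single-node run.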
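
Slide 16 attributes a 2.2x speedup to printing the error every 30 epochs instead of every epoch. A minimal sketch of that kind of change follows; the `train` function, its parameters, and the placeholder callbacks are hypothetical and only show where the reporting interval enters the loop.

```python
def train(num_epochs, log_every, step, evaluate_error):
    """Run num_epochs training steps, reporting the error every log_every epochs.

    step and evaluate_error are caller-supplied callables (assumed names).
    Returns how many progress reports were produced.
    """
    reports = 0
    for epoch in range(1, num_epochs + 1):
        step()
        # Evaluating and printing the error is the costly progress update;
        # doing it less often leaves more time for training.
        if epoch % log_every == 0:
            print(f"epoch {epoch}: error {evaluate_error():.4f}")
            reports += 1
    return reports


# Placeholder callbacks so the sketch runs on its own.
every_epoch = train(90, 1, step=lambda: None, evaluate_error=lambda: 0.0)
every_30 = train(90, 30, step=lambda: None, evaluate_error=lambda: 0.0)
print(every_epoch, every_30)
```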
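
Slide 19 summarizes policy gradients as: run a policy, see which actions led to high rewards, increase their probability. The following is a small REINFORCE-style sketch of that idea under stated assumptions: a two-action logistic policy, hand-made feature vectors, and the names `discounted_returns` and `policy_gradient_step`, none of which come from the slides.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Discounted sum of future rewards for each time step."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def policy_gradient_step(weights, episode, learning_rate=0.1, gamma=0.99):
    """One update for a logistic policy P(action = 1 | x) = sigmoid(w . x).

    episode is a list of (features, action, reward) tuples.
    """
    returns = discounted_returns([r for _, _, r in episode], gamma)
    grad = [0.0] * len(weights)
    for (x, action, _), ret in zip(episode, returns):
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        # For a logistic policy, d/dw log pi(action | x) = (action - p) * x.
        # Weighting by the return raises the probability of actions
        # that were followed by high rewards.
        for i, xi in enumerate(x):
            grad[i] += (action - p) * xi * ret
    return [w + learning_rate * g for w, g in zip(weights, grad)]


# One-step episode: action 1 was taken and rewarded.
new_weights = policy_gradient_step([0.0], [([1.0], 1, 1.0)])
print(new_weights)
```

After the update, the policy assigns a higher probability to the rewarded action for the same input.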
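
The comparison on slide 37 can be checked with a little arithmetic. The snippet below assumes that the first block of training stats describes the human player and the second block describes PPG; the dictionary keys are names chosen for this example.

```python
# Figures as listed on slide 37 (frames in thousands, energy in kJ).
human = {"frames_k": 180, "energy_kj": 100, "minutes": 100}
ppg = {"frames_k": 880_000, "energy_kj": 550_000, "minutes": 30}

frame_ratio = ppg["frames_k"] / human["frames_k"]
energy_ratio = ppg["energy_kj"] / human["energy_kj"]
time_ratio = human["minutes"] / ppg["minutes"]

print(f"PPG uses about {frame_ratio:,.0f}x more input frames")
print(f"PPG uses about {energy_ratio:,.0f}x more energy")
print(f"PPG finishes about {time_ratio:.1f}x sooner in wallclock time")
```

Under that reading, PPG wins on wallclock time, while the human needs between three and four orders of magnitude fewer frames and less energy.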