Bill Dally, Chief Scientist and SVP of Research
January 17, 2017
Deep Learning and HPC
2
A Decade of Scientific Computing with GPUs
2006 2008 2010 2012 2014 2016
Fermi: World’s
First HPC GPU
Oak Ridge Deploys World’s
Fastest Supercomputer w/ GPUs
World’s First Atomic
Model of HIV Capsid
GPU-Trained AI Machine
Beats World Champion in Go
Stanford Builds AI
Machine using GPUs
World’s First 3-D Mapping
of Human Genome
CUDA Launched
World’s First GPU
Top500 System
Google Outperforms
Humans in ImageNet
Discovered How H1N1
Mutates to Resist Drugs
AlexNet beats expert code
by huge margin using GPUs
Stream Processing
@ Stanford
3
GPUs Enable Science
4
18,688 NVIDIA Tesla K20X GPUs
27 Petaflops Peak: 90% of Performance from GPUs
17.59 Petaflops Sustained Performance on Linpack
TITAN
5
U.S. to Build Two Flagship Supercomputers
Pre-Exascale Systems Powered by the Tesla Platform
100-300 PFLOPS Peak
IBM POWER9 CPU + NVIDIA Volta GPU
NVLink High Speed Interconnect
40 TFLOPS per Node, >3,400 Nodes
2017
Summit & Sierra Supercomputers
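As a rough consistency check on the figures on this slide, the per-node performance and node count can be multiplied out. This is a back-of-the-envelope sketch only: it treats ">3,400 nodes" as exactly 3,400, so the result is a lower bound, and the variable names are illustrative rather than taken from the talk.

```python
# Figures quoted on the slide. ">3,400 nodes" is treated as exactly 3,400,
# so the product below is a lower bound on peak performance.
TFLOPS_PER_NODE = 40
MIN_NODES = 3_400

# 1 PFLOPS = 1,000 TFLOPS
min_peak_pflops = TFLOPS_PER_NODE * MIN_NODES / 1_000

print(f"Lower bound on system peak: {min_peak_pflops:.0f} PFLOPS")
print("Within the quoted 100-300 PFLOPS range:", 100 <= min_peak_pflops <= 300)
```

The product lands inside the quoted 100-300 PFLOPS range, which suggests the per-node and system-level numbers are mutually consistent.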
6
Fastest AI Supercomputer in TOP500
4.9 Petaflops Peak FP64 Performance
19.6 Petaflops DL FP16 Performance
124 NVIDIA DGX-1 Server Nodes
Most Energy Efficient Supercomputer
#1 on Green500 List
9.5 GFLOPS per Watt
2x More Efficient than Xeon Phi System
13 DGX-1 Servers in Top500
38 DGX-1 Servers for Petascale supercomputer
55x fewer servers, 12x less power vs CPU-only
supercomputer of similar performance
DGX SATURNV
World’s Most Efficient AI Supercomputer
FACTOIDS
7
EXASCALE APPLICATIONS ON SATURNV
LQCD - Higher Energy Physics
SATURNV DGX Servers vs SuperMUC Supercomputer
[Chart: Gflop/s (0 to 25,000) vs. # of CPU nodes in the SuperMUC supercomputer (0 to 144)]
1x DGX-1: 8K Gflop/s
2x DGX-1: 15K Gflop/s
4x DGX-1: 20K Gflop/s
Other labeled data points: 2K, 3K, 5K, 7K Gflop/s
QUDA version 0.9beta, using double-half mixed precision
DDalphaAMG using double-single
# of CPU Servers to Match Performance of SATURNV
S3D: Discovering New Fuel for Engines: 2,300 CPU Servers
SPECFEM3D: Simulating Earthquakes: 3,800 CPU Servers
8
Exascale
System
Sketch
9
10
GPUs Enable Deep Learning
11
GPUs + Data + DNNs
12
THE STAGE IS SET FOR THE AI REVOLUTION
2012: Deep Learning researchers worldwide discover GPUs
2015: ImageNet: Deep Learning achieves superhuman image recognition (Microsoft, Google: 3.5% error rate)
2016: Microsoft’s Deep Learning system achieves new milestone in speech recognition (Microsoft, 09/13/16)
[Chart: 2010 to 2015; labeled values 74% and 96%; series labels: Deep Learning, Hand-coded CV, Human]
“The Microsoft 2016 Conversational Speech Recognition System.” W. Xiong, J. Droppo, X. Huang, F. Seide, M. Seltzer, A. Stolcke, D. Yu, G. Zweig. 2016
13
A New era of computing
PC INTERNET
AI & INTELLIGENT DEVICES
MOBILE-CLOUD
14
Deep Learning Explodes at Google
Android apps
Drug discovery
Gmail
Image understanding
Maps
Natural language understanding
Photos
Robotics research
Speech
Translation
YouTube
Jeff Dean's talk at TiECon, May 7, 2016
15
Deep Learning Everywhere
INTERNET & CLOUD
Image Classification
Speech Recognition
Language Translation
Language Processing
Sentiment Analysis
Recommendation
MEDIA & ENTERTAINMENT
Video Captioning
Video Search
Real Time Translation
AUTONOMOUS MACHINES
Pedestrian Detection
Lane Tracking
Traffic Sign Recognition
SECURITY & DEFENSE
Face Detection
Video Surveillance
Satellite Imagery
MEDICINE & BIOLOGY
Cancer Cell Detection
Diabetic Grading
Drug Discovery
16
Now “Superhuman” at Many Tasks
Speech recognition
Image classification and detection
Face recognition
Playing Atari games
Playing Go
17
Deep Learning Enables Science
18
Deep learning enables SCIENCE
Classify Satellite Images for
Carbon Monitoring
Analyze Obituaries on the Web for
Cancer-related Discoveries
Determine Drug Treatments to Increase
Child’s Chance of Survival
NASA AMES
19
ML filters “events” from the ATLAS detector at the LHC
600M events/sec
Cranmer - NIPS 2016 Keynote
20
Using ML to Approximate Fluid Dynamics
“Data-driven Fluid Simulations using Regression Forests” http://people.inf.ethz.ch/ladickyl/fluid_sigasia15.pdf
“… Implementation led to a speed-up of one to three orders of magnitude
compared to the state-of-the-art position-based fluid solver and runs in
real-time for systems with up to 2 million particles”
21
Tompson et al. “Accelerating Eulerian Fluid Simulation With Convolutional Networks,”
arXiv preprint, 2016
Fluid Simulation with CNNs
22
Using ML to Approximate Schrödinger Equation
“Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning”, Rupp et al., Physical Review Letters
“For larger training sets, N >= 1000, the accuracy of the
ML model becomes competitive with mean-field
electronic structure theory—at a fraction of
the computational cost.”
23
Deep Learning has an insatiable demand for
computing performance
24
GPUs enabled Deep Learning
25
GPUs now Gate DL Progress
Important Property of Neural Networks
Results get better with more data + bigger models + more computation
(Better algorithms, new insights and improved techniques always help, too!)
IMAGE RECOGNITION
2012 AlexNet: 8 layers, 1.4 GFLOP, ~16% error
2015 ResNet: 152 layers, 22.6 GFLOP, ~3.5% error
16X Model
SPEECH RECOGNITION
2014 Deep Speech 1: 80 GFLOP, 7,000 hrs of data, ~8% error
2015 Deep Speech 2: 465 GFLOP, 12,000 hrs of data, ~5% error
10X Training Ops
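The "16X Model" and "10X Training Ops" labels can be reproduced from the per-network figures on the slide. The sketch below is one plausible reading, not something the slide spells out: it assumes "16X" is the ratio of the two GFLOP figures, and that "10X Training Ops" is the ratio of (GFLOP figure) x (hours of training data). Variable names are illustrative.

```python
# Per-network figures quoted on the slide.
alexnet_gflop, resnet_gflop = 1.4, 22.6
ds1_gflop, ds1_hours = 80, 7_000
ds2_gflop, ds2_hours = 465, 12_000

# Image recognition: ratio of the two per-model compute figures.
model_growth = resnet_gflop / alexnet_gflop

# Speech recognition: ASSUMPTION - total training work scales as
# (compute figure) x (hours of training data).
training_growth = (ds2_gflop * ds2_hours) / (ds1_gflop * ds1_hours)

print(f"ResNet vs AlexNet compute: {model_growth:.1f}x")
print(f"Deep Speech 2 vs 1 training work: {training_growth:.1f}x")
```

Under these assumptions the two ratios come out at roughly 16x and 10x, matching the slide's labels.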
26
Pascal “5 Miracles”
Boost Deep Learning 65X
Pascal: 5 Miracles
- Pascal
- 16nm FinFET
- CoWoS HBM2
- NVLink
- cuDNN
NVIDIA DGX-1 Supercomputer
65X in 4 yrs
Accelerate Every Framework
PaddlePaddle
Baidu Deep Learning
[Chart: relative speed-up of images/sec vs K40 in 2013, for Kepler, Maxwell, and Pascal, 2013 to 2016, on a scale up to 70X]
Chart notes: AlexNet training throughput based on 20 iterations. CPU: 1x E5-2680v3 12 Core 2.5GHz, 128GB System Memory, Ubuntu 14.04. M40 datapoint: 8x M40 GPUs in a node. P100: 8x P100 NVLink-enabled.
27
Pascal GP100
10 TeraFLOPS FP32
20 TeraFLOPS FP16
16GB HBM – 750GB/s
300W TDP
67GFLOPS/W (FP16)
16nm process
160GB/s NV Link
Power Regulation
HBM Stacks
GPU Chip
Backplane Connectors
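The 67 GFLOPS/W figure follows directly from the FP16 throughput and the TDP quoted on this slide. The sketch below redoes that division and also expresses it as energy per operation; it assumes the chip draws its full TDP while running at peak FP16 rate, which is a simplification.

```python
# Figures quoted on the slide.
fp16_tflops = 20
tdp_watts = 300

# Efficiency: GFLOPS delivered per watt (1 TFLOPS = 1,000 GFLOPS).
gflops_per_watt = fp16_tflops * 1_000 / tdp_watts

# Energy per operation: 1 W per TFLOPS equals 1 pJ per flop, so the
# whole-chip energy per FP16 flop (ASSUMING full TDP at peak rate) is:
pj_per_flop = tdp_watts / fp16_tflops

print(f"{gflops_per_watt:.0f} GFLOPS/W, {pj_per_flop:.0f} pJ per FP16 flop")
```

Note that the per-flop energy here is a whole-chip average, including memory and interconnect, not the energy of an arithmetic unit alone.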
28
TESLA P4 & P40
INFERENCING ACCELERATORS
Pascal Architecture | INT8
P4: 50W | 40X Energy Efficient versus CPU
P40: 250W | 40X Performance versus CPU
29
TensorRT
PERFORMANCE OPTIMIZING
INFERENCING ENGINE
FP32, FP16, INT8 | Vertical & Horizontal Fusion | Auto-Tuning
VGG, GoogLeNet, ResNet, AlexNet & Custom Layers
Available Today: developer.nvidia.com/tensorrt
30
NVLINK enables scalability
31
NVLINK – Enables Fast Interconnect, PGAS Memory
GPU
Memory
System Interconnect
GPU
Memory
NVLINK
32
NVIDIA DGX-1
WORLD’S FIRST DEEP LEARNING SUPERCOMPUTER
170 TFLOPS
8x Tesla P100 16GB
NVLink Hybrid Cube Mesh
Optimized Deep Learning Software
Dual Xeon
7 TB SSD Deep Learning Cache
Dual 10GbE, Quad IB 100Gb
3RU – 3200W
33
Training Datacenter
Intelligent Devices
34
“Billions of INTELLIGENT devices”
“Billions of intelligent devices will take advantage of DNNs
to provide personalization and localization as GPUs
become faster and faster over the next several years.”
— Tractica
35
JETSON TX1
EMBEDDED AI SUPERCOMPUTER
10W | 1 TF FP16 | >20 images/sec/W
36
INTRODUCING
XAVIER
AI SUPERCOMPUTER SOC
7 Billion Transistors 16nm FF
8 Core Custom ARM64 CPU
512 Core Volta GPU
New Computer Vision Accelerator
Dual 8K HDR Video Processors
Designed for ASIL C Functional Safety
20 TOPS DL
160 SPECINT
20W
37
AI TRANSPORTATION — $10T INDUSTRY
PERCEPTION AI PERCEPTION AI LOCALIZATION DRIVING AI
DEEP LEARNING
38
NVIDIA DRIVE PX 2
AutoCruise to Full Autonomy — One Architecture
Full Autonomy
AutoChauffeur
AutoCruise
AUTONOMOUS DRIVING
Perception, Reasoning, Driving
AI Supercomputing, AI Algorithms, Software
Scalable Architecture
39
ANNOUNCING Driveworks alpha 1
OS FOR SELF-DRIVING CARS
DRIVEWORKS
PilotNet
OpenRoadNet
DriveNet
Localization
Path Planning
Traffic Prediction
Action Engine
Occupancy Grid
40
NVIDIA BB8 AI CAR
41
Nvidia AI self-driving cars
in development
Baidu, nuTonomy, Volvo, WEpods, TomTom
42
NVAIL
43
AI Pioneers Pushing state-of-the-art
Reasoning, Attention, Memory — Long-term memory for NN
End-to-end training for autonomous flight and driving
Generic agents — Understand and predict behavior
RNN for long-term dependencies & multiple time scales
Unsupervised Learning — Generative Models
Deep reinforcement learning for autonomous AI agents
Reinforcement learning — Hierarchical and multi-agent
Semantic 3D reconstruction
44
Yasuo Kuniyoshi
Professor, School of Info Sci & Tech
Director, AI Center (Next Generation Intelligence Science Research Center)
The University of Tokyo
45
Challenge:
Provide Continued Performance
Improvement
46
But Moore’s Law is Over
C. Moore, Data Processing in ExaScale-Class Computer Systems, Salishan, April 2011
It's not about the FLOPs
16nm chip, 10mm on a side, 200W
DFMA: 0.01 mm², 10 pJ/op, 2 GFLOPS
A chip with 10^4 FPUs:
100 mm²
200W
20 TFLOPS
Pack 50,000 of these in racks
1 EFLOPS
10MW
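The numbers on this slide chain together, and multiplying them out makes the argument explicit: if arithmetic were the only cost, an exaflop would fit in 10 MW. The sketch below is a minimal reconstruction of that arithmetic, reading the FPU count as 10^4 (the value implied by 100 mm² / 0.01 mm²); constant names are illustrative.

```python
# Per-FPU figures quoted on the slide (double-precision fused multiply-add).
FPU_AREA_MM2 = 0.01
FPU_ENERGY_PJ = 10      # energy per operation
FPU_GFLOPS = 2
FPUS_PER_CHIP = 10**4
CHIPS = 50_000

chip_area_mm2 = FPU_AREA_MM2 * FPUS_PER_CHIP       # 10mm x 10mm die
chip_tflops = FPU_GFLOPS * FPUS_PER_CHIP / 1_000

# Power = energy/op x ops/s. pJ x GFLOPS = 1e-12 J x 1e9 /s = 1e-3 W.
fpu_watts = FPU_ENERGY_PJ * FPU_GFLOPS * 1e-3
chip_watts = fpu_watts * FPUS_PER_CHIP

system_eflops = chip_tflops * CHIPS / 1e6           # 1 EFLOPS = 1e6 TFLOPS
system_megawatts = chip_watts * CHIPS / 1e6

print(f"Chip: {chip_area_mm2:.0f} mm2, {chip_tflops:.0f} TFLOPS, {chip_watts:.0f} W")
print(f"System: {system_eflops:.0f} EFLOPS, {system_megawatts:.0f} MW")
```

The calculation charges nothing for instruction fetch, control, or data movement, which is why the rest of the talk turns to overhead and locality.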
Overhead
Locality
CPU
126 pJ/flop (SP)
Optimized for Latency
Deep Cache Hierarchy
Broadwell E5 v4
14 nm
GPU
28 pJ/flop (SP)
Optimized for Throughput
Explicit Management
of On-chip Memory
Pascal
16 nm
Fixed-Function Logic is Even More Efficient
Energy/Op
CPU (scalar) 1.7nJ
GPU 30pJ
Fixed-Function 3pJ
How is Power Spent in a CPU?
In-order Embedded: Dally [2008] (Embedded in-order CPU)
Clock + Control Logic 24%
Data Supply 17%
Instruction Supply 42%
Register File 11%
ALU 6%
OOO Hi-perf: Natarajan [2003] (Alpha 21264)
Clock + Pins 45%
ALU 4%
Fetch 11%
Rename 10%
Issue 11%
RF 14%
Data Supply 5%
Overhead: 985 pJ
Payload (arithmetic): 15 pJ
53
54
[Block diagram: processor core. Control path: scheduler, thread PCs, active PCs, L0 I$, instruction. Data path: three ORFs feeding two FP/Int units and one LS/BR unit, register files (RF), L0 and L1 address networks, and local memory (LM) banks 0 to 3 connected to LD/ST.]
64 threads
4 active threads
2 DFMAs (4 FLOPS/clock)
ORF bank: 16 entries (128 Bytes)
L0 I$: 64 instructions (1KByte)
LM Bank: 8KB (32KB total)
Simpler Cores
= Energy Efficiency
Source: Azizi [PhD 2010]
Overhead: 15 pJ
Payload (arithmetic): 15 pJ
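The two overhead/payload pairs in this part of the deck (985 pJ vs 15 pJ earlier, 15 pJ vs 15 pJ here) are easier to compare as the fraction of energy that goes to useful arithmetic. The helper below is a small illustrative sketch; the function and variable names are not from the talk.

```python
def payload_fraction(overhead_pj: float, payload_pj: float) -> float:
    """Fraction of per-operation energy spent on the arithmetic itself."""
    return payload_pj / (overhead_pj + payload_pj)


# Figures quoted on the slides: the earlier high-overhead case and the
# simpler core shown here.
high_overhead = payload_fraction(overhead_pj=985, payload_pj=15)
simple_core = payload_fraction(overhead_pj=15, payload_pj=15)

print(f"High-overhead core: {high_overhead:.1%} of energy is arithmetic")
print(f"Simpler core:       {simple_core:.1%} of energy is arithmetic")
```

By this measure the simpler core spends half of its energy on payload, compared with 1.5% in the high-overhead case.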
Communication Dominates Arithmetic (28nm CMOS)
64-bit DP operation: 20 pJ
256-bit access to 8 kB SRAM: 50 pJ
256-bit buses: 26 pJ, 256 pJ, 1 nJ (on a 20mm chip)
Efficient off-chip link: 500 pJ
DRAM Rd/Wr: 16 nJ
Processor Technology             40 nm           10 nm
Vdd (nominal)                    0.9 V           0.7 V
DFMA energy                      50 pJ           7.6 pJ
64b 8 KB SRAM Rd                 14 pJ           2.1 pJ
Wire energy (256 bits, 10mm)     310 pJ          174 pJ

Memory Technology                45 nm           16 nm
DRAM interface pin bandwidth     4 Gbps          50 Gbps
DRAM interface energy            20-30 pJ/bit    2 pJ/bit
DRAM access energy               8-15 pJ/bit     2.5 pJ/bit

Keckler [Micro 2011], Vogelsang [Micro 2010]
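The processor-technology figures show that arithmetic and SRAM energy shrink much faster with process scaling than wire energy does. The sketch below computes the improvement ratios from the 40 nm and 10 nm columns; the dictionary layout and names are illustrative.

```python
# (40 nm value, 10 nm value) in pJ, copied from the processor technology figures.
energies_pj = {
    "DFMA": (50, 7.6),
    "64b 8 KB SRAM read": (14, 2.1),
    "Wire (256 bits, 10mm)": (310, 174),
}

# How many times less energy each operation costs at 10 nm.
improvement = {name: old / new for name, (old, new) in energies_pj.items()}
for name, factor in improvement.items():
    print(f"{name}: {factor:.1f}x less energy at 10 nm")

# Cost of moving 256 bits across 10mm, measured in DFMA operations.
wire_vs_dfma_40nm = energies_pj["Wire (256 bits, 10mm)"][0] / energies_pj["DFMA"][0]
wire_vs_dfma_10nm = energies_pj["Wire (256 bits, 10mm)"][1] / energies_pj["DFMA"][1]
print(f"Wire energy in DFMAs: {wire_vs_dfma_40nm:.1f} at 40 nm, "
      f"{wire_vs_dfma_10nm:.1f} at 10 nm")
```

Because the wire improves far less than the DFMA, the relative cost of communication grows with each process generation.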
Energy Shopping List
FP Op lower bound = 4 pJ
GRS Test Chips
Probe Station
Test Chip #1 on Board
Test Chip #2 fabricated on production GPU
Eye Diagram from Probe
Poulton et al., ISSCC 2013, JSSC Dec 2013
Efficient Machines
Are Highly Parallel
Have Deep Storage Hierarchies
Have Heterogeneous Processors
62
Target Independent Programming
63
Programmers, tools, and architecture
Need to play their positions
Programmer
forall molecule in set { // launch a thread array
    forall neighbor in molecule.neighbors { //
        forall force in forces { // doubly nested
            molecule.force =
                reduce_sum(force(molecule, neighbor))
        }
    }
}
Tools
Map foralls in time and space
Map molecules across memories
Stage data up/down hierarchy
Select mechanisms
Architecture
Exposed storage hierarchy
Fast comm/sync/thread mechanisms
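The forall pseudocode on this slide can be written as a small runnable program to make the structure concrete. The sketch below is illustrative only: the `Molecule` class, the one-dimensional `pair_force` function, and the sample data are hypothetical and not from the talk. It expresses only what to compute, leaving placement and scheduling unspecified, which is the programmer's role the slide describes.

```python
from dataclasses import dataclass, field


@dataclass
class Molecule:
    position: float
    # Excluded from repr/compare because neighbor lists are cyclic.
    neighbors: list["Molecule"] = field(
        default_factory=list, repr=False, compare=False
    )
    force: float = 0.0


def pair_force(a: Molecule, b: Molecule) -> float:
    """Hypothetical 1-D pair force: a linear spring pulling a toward b."""
    return b.position - a.position


def compute_forces(molecules: list[Molecule]) -> None:
    # Outer "forall": iterations are independent of one another, so a
    # mapping tool would be free to run them in parallel.
    for m in molecules:
        # Inner "forall" plus reduce_sum over the molecule's neighbors.
        m.force = sum(pair_force(m, n) for n in m.neighbors)


a, b, c = Molecule(0.0), Molecule(1.0), Molecule(3.0)
a.neighbors = [b, c]
b.neighbors = [a, c]
c.neighbors = [a, b]
compute_forces([a, b, c])
print([m.force for m in (a, b, c)])
```

The loops here run sequentially; nothing in the program says where molecules live in memory or how iterations map to threads, so those decisions remain open to tools.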
64
Target-Independent Source
Mapping Tools
Target-Dependent Executable
Profiling & Visualization
Mapping Directives
Legion Programming Model
Separating program logic from machine mapping
Legion Program
Legion Runtime
Legion Mapper
Target-independent specification
- Task decomposition
- Data description
Compute target-specific mapping
- Placement of data
- Placement of tasks
- Schedule
66
The Legion Data Model: Logical Regions
Main idea: logical regions
- Describe data abstractly
- Relational data model
- No implied layout
- No implied placement
Sophisticated partitioning mechanism
- Multiple views onto data
Capture important data properties
- Locality
- Independence/aliasing
[Diagram: a logical region, labeled SP and N, defined by a Field Space and an Index Space (Unstructured, 1-D, 2-D, N-D), and partitioned into p1 … pn, s1 … sn, g1 … gn]
The Legion Programming Model
Computations expressed as tasks
- Declare logical region usage
- Declare field usage
- Describe privileges:
read-only, read-write, reduce
Tasks specified in sequential order
Legion infers implicit parallelism
Programs are machine-independent
- Tasks decouple computation
- Logical regions decouple data
calc_currents(piece[0], , , );
calc_currents(piece[1], , , );
distribute_charge(piece[0], , , );
distribute_charge(piece[1], , , );
[Diagram: regions p0, s0, g0 and p1, s1, g1 used by the tasks on piece[0] and piece[1]; partitions p1 … pn, s1 … sn, g1 … gn]
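The idea that parallelism can be inferred from declared region usage can be illustrated with a toy dependence analysis. This is a simplified sketch, not Legion's API or algorithm: the region arguments of `calc_currents` and `distribute_charge` are not legible in the transcript, so the privileges below are hypothetical, and regions with different names are assumed to be disjoint (real Legion also reasons about aliasing between regions).

```python
READ_ONLY, READ_WRITE, REDUCE = "read-only", "read-write", "reduce"


def conflicts(priv_a: str, priv_b: str) -> bool:
    """Two uses of one region conflict unless both read or both reduce.

    ASSUMPTION: concurrent reductions to the same region commute.
    """
    if priv_a == priv_b and priv_a in (READ_ONLY, REDUCE):
        return False
    return True


def dependences(tasks):
    """Return (earlier, later) task pairs that must keep program order.

    `tasks` is a list of (name, {region: privilege}) in sequential order.
    """
    deps = set()
    for j, (later, later_use) in enumerate(tasks):
        for earlier, earlier_use in tasks[:j]:
            shared = later_use.keys() & earlier_use.keys()
            if any(conflicts(earlier_use[r], later_use[r]) for r in shared):
                deps.add((earlier, later))
    return deps


# HYPOTHETICAL privileges, chosen only to illustrate the analysis.
program = [
    ("calc_currents[0]", {"p0": READ_WRITE, "s0": READ_ONLY, "g0": READ_ONLY}),
    ("calc_currents[1]", {"p1": READ_WRITE, "s1": READ_ONLY, "g1": READ_ONLY}),
    ("distribute_charge[0]", {"p0": READ_ONLY, "s0": REDUCE, "g0": REDUCE}),
    ("distribute_charge[1]", {"p1": READ_ONLY, "s1": REDUCE, "g1": REDUCE}),
]

deps = dependences(program)
for earlier, later in sorted(deps):
    print(f"{later} must wait for {earlier}")
```

Under these assumptions the only ordering constraints are between the two tasks on the same piece, so the work on piece[0] and piece[1] can proceed in parallel even though the program lists the tasks sequentially.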
68
Legion Runtime System
Functionally correct application code
Mapping to target machine
Extraction of parallelism
Management of data transfers
Task scheduling and latency hiding
Data-dependent behavior
Compiler/runtime understanding of data
Legion applications with tasks and logical regions
Legion mappers for specific machines
Legion runtime understanding of logical regions
Evaluation with a Real App: S3D
Evaluation with a production-grade combustion simulation
Ported more than 100K lines of MPI Fortran to Legion C++
Legion enabled new chemistry: Primary Reference Fuel (PRF) mechanism
Ran on two of the world’s top 10 supercomputers for 1 month
- Titan (#2) and Piz-Daint (#10)
Performance Results: Original S3D
Weak scaling compared to vectorized MPI Fortran version of S3D
Achieved up to 6X speedup
[Charts: Titan, Piz-Daint]
Performance Results: OpenACC S3D
Also compared against experimental MPI+OpenACC version
Achieved 1.73X to 2.85X speedup on Titan
Why? Humans are really bad at scheduling complicated applications
72
HPC
Deep
Learning
73
HPC <-> Deep Learning
• HPC has enabled Deep Learning
  • Concepts developed in the 1980s - GPUs provided needed performance
  • Superhuman performance on many tasks – classification, Go, …
  • Enabling intelligent devices – including cars
• Deep Learning enables HPC
  • Extracting meaning from data
  • Replacing models with recognition
• HPC and Deep Learning both need more performance – but Moore’s Law is over
  • Reduced overhead
  • Efficient communication
  • Resulting machines are parallel with deep memory hierarchies
• Target-Independent Programming
NVIDIA Deep Learning Institute 2017 Keynote

More Related Content

What's hot

1050: 車載用ADAS/自動運転プラットフォームDRIVE PX及びコックピット・プラットフォームDRIVE CXのご紹介
1050: 車載用ADAS/自動運転プラットフォームDRIVE PX及びコックピット・プラットフォームDRIVE CXのご紹介1050: 車載用ADAS/自動運転プラットフォームDRIVE PX及びコックピット・プラットフォームDRIVE CXのご紹介
1050: 車載用ADAS/自動運転プラットフォームDRIVE PX及びコックピット・プラットフォームDRIVE CXのご紹介NVIDIA Japan
 
GPU Accelerated Deep Learning for CUDNN V2
GPU Accelerated Deep Learning for CUDNN V2GPU Accelerated Deep Learning for CUDNN V2
GPU Accelerated Deep Learning for CUDNN V2NVIDIA
 
NVIDIA CES 2016 Press Conference
NVIDIA CES 2016 Press ConferenceNVIDIA CES 2016 Press Conference
NVIDIA CES 2016 Press ConferenceNVIDIA
 
GPU Technology Conference 2014 Keynote
GPU Technology Conference 2014 KeynoteGPU Technology Conference 2014 Keynote
GPU Technology Conference 2014 KeynoteNVIDIA
 
組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステムShinnosuke Furuya
 
Accelerated Computing: The Path Forward
Accelerated Computing: The Path ForwardAccelerated Computing: The Path Forward
Accelerated Computing: The Path ForwardNVIDIA
 
GTC 2012 Jen-Hsun Huang Keynote
GTC 2012 Jen-Hsun Huang KeynoteGTC 2012 Jen-Hsun Huang Keynote
GTC 2012 Jen-Hsun Huang KeynoteNVIDIA
 
Opening Keynote at GTC 2015: Leaps in Visual Computing
Opening Keynote at GTC 2015: Leaps in Visual ComputingOpening Keynote at GTC 2015: Leaps in Visual Computing
Opening Keynote at GTC 2015: Leaps in Visual ComputingNVIDIA
 
GTC Taiwan 2017 自主駕駛車輛發展平台與技術研發
GTC Taiwan 2017 自主駕駛車輛發展平台與技術研發 GTC Taiwan 2017 自主駕駛車輛發展平台與技術研發
GTC Taiwan 2017 自主駕駛車輛發展平台與技術研發 NVIDIA Taiwan
 
GTC 2013 Jen-Hsun Huang Keynote
GTC 2013 Jen-Hsun Huang KeynoteGTC 2013 Jen-Hsun Huang Keynote
GTC 2013 Jen-Hsun Huang KeynoteNVIDIA
 
NVIDIA Overview 2015
NVIDIA Overview 2015NVIDIA Overview 2015
NVIDIA Overview 2015NVIDIA
 
NVIDIA PRO VR DAY 2017 基調講演
NVIDIA PRO VR DAY 2017 基調講演NVIDIA PRO VR DAY 2017 基調講演
NVIDIA PRO VR DAY 2017 基調講演NVIDIA Japan
 
How to Choose Mobile Workstation? VR Ready
How to Choose Mobile Workstation? VR ReadyHow to Choose Mobile Workstation? VR Ready
How to Choose Mobile Workstation? VR ReadyNVIDIA Taiwan
 
Embedded and Reliable Computer Vision
Embedded and Reliable Computer VisionEmbedded and Reliable Computer Vision
Embedded and Reliable Computer VisionNVIDIA Taiwan
 
1030: NVIDIA GRID 2.0
1030: NVIDIA GRID 2.01030: NVIDIA GRID 2.0
1030: NVIDIA GRID 2.0NVIDIA Japan
 
最新の HPC 技術を生かした AI・ビッグデータインフラの東工大 TSUBAME3.0 及び産総研 ABCI
最新の HPC 技術を生かした AI・ビッグデータインフラの東工大 TSUBAME3.0 及び産総研 ABCI最新の HPC 技術を生かした AI・ビッグデータインフラの東工大 TSUBAME3.0 及び産総研 ABCI
最新の HPC 技術を生かした AI・ビッグデータインフラの東工大 TSUBAME3.0 及び産総研 ABCINVIDIA Japan
 
Affordable AI Connects To A Better Life
Affordable AI Connects To A Better LifeAffordable AI Connects To A Better Life
Affordable AI Connects To A Better LifeNVIDIA Taiwan
 
Harnessing AI for the Benefit of All.
Harnessing AI for the Benefit of All.Harnessing AI for the Benefit of All.
Harnessing AI for the Benefit of All.Alison B. Lowndes
 
“Market Analysis on SoCs for Imaging, Vision and Deep Learning in Automotive ...
“Market Analysis on SoCs for Imaging, Vision and Deep Learning in Automotive ...“Market Analysis on SoCs for Imaging, Vision and Deep Learning in Automotive ...
“Market Analysis on SoCs for Imaging, Vision and Deep Learning in Automotive ...Edge AI and Vision Alliance
 

What's hot (20)

1050: 車載用ADAS/自動運転プラットフォームDRIVE PX及びコックピット・プラットフォームDRIVE CXのご紹介
1050: 車載用ADAS/自動運転プラットフォームDRIVE PX及びコックピット・プラットフォームDRIVE CXのご紹介1050: 車載用ADAS/自動運転プラットフォームDRIVE PX及びコックピット・プラットフォームDRIVE CXのご紹介
1050: 車載用ADAS/自動運転プラットフォームDRIVE PX及びコックピット・プラットフォームDRIVE CXのご紹介
 
GPU Accelerated Deep Learning for CUDNN V2
GPU Accelerated Deep Learning for CUDNN V2GPU Accelerated Deep Learning for CUDNN V2
GPU Accelerated Deep Learning for CUDNN V2
 
NVIDIA CES 2016 Press Conference
NVIDIA CES 2016 Press ConferenceNVIDIA CES 2016 Press Conference
NVIDIA CES 2016 Press Conference
 
GPU Technology Conference 2014 Keynote
GPU Technology Conference 2014 KeynoteGPU Technology Conference 2014 Keynote
GPU Technology Conference 2014 Keynote
 
JETSON : AI at the EDGE
JETSON : AI at the EDGEJETSON : AI at the EDGE
JETSON : AI at the EDGE
 
組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム
 
Accelerated Computing: The Path Forward
Accelerated Computing: The Path ForwardAccelerated Computing: The Path Forward
Accelerated Computing: The Path Forward
 
GTC 2012 Jen-Hsun Huang Keynote
GTC 2012 Jen-Hsun Huang KeynoteGTC 2012 Jen-Hsun Huang Keynote
GTC 2012 Jen-Hsun Huang Keynote
 
Opening Keynote at GTC 2015: Leaps in Visual Computing
Opening Keynote at GTC 2015: Leaps in Visual ComputingOpening Keynote at GTC 2015: Leaps in Visual Computing
Opening Keynote at GTC 2015: Leaps in Visual Computing
 
GTC Taiwan 2017 自主駕駛車輛發展平台與技術研發
GTC Taiwan 2017 自主駕駛車輛發展平台與技術研發 GTC Taiwan 2017 自主駕駛車輛發展平台與技術研發
GTC Taiwan 2017 自主駕駛車輛發展平台與技術研發
 
GTC 2013 Jen-Hsun Huang Keynote
GTC 2013 Jen-Hsun Huang KeynoteGTC 2013 Jen-Hsun Huang Keynote
GTC 2013 Jen-Hsun Huang Keynote
 
NVIDIA Overview 2015
NVIDIA Overview 2015NVIDIA Overview 2015
NVIDIA Overview 2015
 
NVIDIA PRO VR DAY 2017 基調講演
NVIDIA PRO VR DAY 2017 基調講演NVIDIA PRO VR DAY 2017 基調講演
NVIDIA PRO VR DAY 2017 基調講演
 
How to Choose Mobile Workstation? VR Ready
How to Choose Mobile Workstation? VR ReadyHow to Choose Mobile Workstation? VR Ready
How to Choose Mobile Workstation? VR Ready
 
Embedded and Reliable Computer Vision
Embedded and Reliable Computer VisionEmbedded and Reliable Computer Vision
Embedded and Reliable Computer Vision
 
1030: NVIDIA GRID 2.0
1030: NVIDIA GRID 2.01030: NVIDIA GRID 2.0
1030: NVIDIA GRID 2.0
 
最新の HPC 技術を生かした AI・ビッグデータインフラの東工大 TSUBAME3.0 及び産総研 ABCI
最新の HPC 技術を生かした AI・ビッグデータインフラの東工大 TSUBAME3.0 及び産総研 ABCI最新の HPC 技術を生かした AI・ビッグデータインフラの東工大 TSUBAME3.0 及び産総研 ABCI
最新の HPC 技術を生かした AI・ビッグデータインフラの東工大 TSUBAME3.0 及び産総研 ABCI
 
Affordable AI Connects To A Better Life
Affordable AI Connects To A Better LifeAffordable AI Connects To A Better Life
Affordable AI Connects To A Better Life
 
Harnessing AI for the Benefit of All.
Harnessing AI for the Benefit of All.Harnessing AI for the Benefit of All.
Harnessing AI for the Benefit of All.
 
“Market Analysis on SoCs for Imaging, Vision and Deep Learning in Automotive ...
“Market Analysis on SoCs for Imaging, Vision and Deep Learning in Automotive ...“Market Analysis on SoCs for Imaging, Vision and Deep Learning in Automotive ...
“Market Analysis on SoCs for Imaging, Vision and Deep Learning in Automotive ...
 

Similar to NVIDIA Deep Learning Institute 2017 基調講演

Introduction to Deep Learning (NVIDIA)
Introduction to Deep Learning (NVIDIA)Introduction to Deep Learning (NVIDIA)
Introduction to Deep Learning (NVIDIA)Rakuten Group, Inc.
 
NVIDIA DGX-1 超級電腦與人工智慧及深度學習
NVIDIA DGX-1 超級電腦與人工智慧及深度學習NVIDIA DGX-1 超級電腦與人工智慧及深度學習
NVIDIA DGX-1 超級電腦與人工智慧及深度學習NVIDIA Taiwan
 
abelbrownnvidiarakuten2016-170208065814 (1).pptx
abelbrownnvidiarakuten2016-170208065814 (1).pptxabelbrownnvidiarakuten2016-170208065814 (1).pptx
abelbrownnvidiarakuten2016-170208065814 (1).pptxgopikahari7
 
Enabling Artificial Intelligence - Alison B. Lowndes
Enabling Artificial Intelligence - Alison B. LowndesEnabling Artificial Intelligence - Alison B. Lowndes
Enabling Artificial Intelligence - Alison B. LowndesWithTheBest
 
Fuelling the AI Revolution with Gaming
Fuelling the AI Revolution with GamingFuelling the AI Revolution with Gaming
Fuelling the AI Revolution with GamingC4Media
 
Fueling the AI Revolution with Gaming
Fueling the AI Revolution with GamingFueling the AI Revolution with Gaming
Fueling the AI Revolution with GamingAlison B. Lowndes
 
Alison B Lowndes - Fueling the Artificial Intelligence Revolution with Gaming...
Alison B Lowndes - Fueling the Artificial Intelligence Revolution with Gaming...Alison B Lowndes - Fueling the Artificial Intelligence Revolution with Gaming...
Alison B Lowndes - Fueling the Artificial Intelligence Revolution with Gaming...Codemotion
 
BAT40 NVIDIA Stampfli Künstliche Intelligenz, Roboter und autonome Fahrzeuge ...
BAT40 NVIDIA Stampfli Künstliche Intelligenz, Roboter und autonome Fahrzeuge ...BAT40 NVIDIA Stampfli Künstliche Intelligenz, Roboter und autonome Fahrzeuge ...
BAT40 NVIDIA Stampfli Künstliche Intelligenz, Roboter und autonome Fahrzeuge ...BATbern
 
Simplifying AI Infrastructure: Lessons in Scaling on DGX Systems
Simplifying AI Infrastructure: Lessons in Scaling on DGX SystemsSimplifying AI Infrastructure: Lessons in Scaling on DGX Systems
Simplifying AI Infrastructure: Lessons in Scaling on DGX SystemsRenee Yao
 
NVIDIA – Inventor of the GPU
NVIDIA – Inventor of the GPUNVIDIA – Inventor of the GPU
NVIDIA – Inventor of the GPUNVIDIA
 
nVidia Presentation in OpenPOWER workshop Brazil
nVidia Presentation in OpenPOWER workshop BrazilnVidia Presentation in OpenPOWER workshop Brazil
nVidia Presentation in OpenPOWER workshop BrazilGanesan Narayanasamy
 
Introduction to PowerAI - The Enterprise AI Platform
Introduction to PowerAI - The Enterprise AI PlatformIntroduction to PowerAI - The Enterprise AI Platform
Introduction to PowerAI - The Enterprise AI PlatformIndrajit Poddar
 
Introduction to multi gpu deep learning with DIGITS 2 - Mike Wang
Introduction to multi gpu deep learning with DIGITS 2 - Mike WangIntroduction to multi gpu deep learning with DIGITS 2 - Mike Wang
Introduction to multi gpu deep learning with DIGITS 2 - Mike WangPAPIs.io
 
Harnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligenceHarnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligenceAlison B. Lowndes
 
Nvidia why every industry should be thinking about AI today
Nvidia why every industry should be thinking about AI todayNvidia why every industry should be thinking about AI today
Nvidia why every industry should be thinking about AI todayJustin Hayward
 
Alison Lowndes, Artificial Intelligence DevRel, Nvidia – Fueling the Artifici...
Alison Lowndes, Artificial Intelligence DevRel, Nvidia – Fueling the Artifici...Alison Lowndes, Artificial Intelligence DevRel, Nvidia – Fueling the Artifici...
Alison Lowndes, Artificial Intelligence DevRel, Nvidia – Fueling the Artifici...Techsylvania
 

Similar to NVIDIA Deep Learning Institute 2017 基調講演 (20)

Introduction to Deep Learning (NVIDIA)
Introduction to Deep Learning (NVIDIA)Introduction to Deep Learning (NVIDIA)
Introduction to Deep Learning (NVIDIA)
 
Aplicações Potenciais de Deep Learning à Indústria do Petróleo
Aplicações Potenciais de Deep Learning à Indústria do PetróleoAplicações Potenciais de Deep Learning à Indústria do Petróleo
Aplicações Potenciais de Deep Learning à Indústria do Petróleo
 
NVIDIA DGX-1 超級電腦與人工智慧及深度學習
NVIDIA DGX-1 超級電腦與人工智慧及深度學習NVIDIA DGX-1 超級電腦與人工智慧及深度學習
NVIDIA DGX-1 超級電腦與人工智慧及深度學習
 
abelbrownnvidiarakuten2016-170208065814 (1).pptx
abelbrownnvidiarakuten2016-170208065814 (1).pptxabelbrownnvidiarakuten2016-170208065814 (1).pptx
abelbrownnvidiarakuten2016-170208065814 (1).pptx
 
Enabling Artificial Intelligence - Alison B. Lowndes
Enabling Artificial Intelligence - Alison B. LowndesEnabling Artificial Intelligence - Alison B. Lowndes
Enabling Artificial Intelligence - Alison B. Lowndes
 
Fuelling the AI Revolution with Gaming
Fuelling the AI Revolution with GamingFuelling the AI Revolution with Gaming
Fuelling the AI Revolution with Gaming
 
Fueling the AI Revolution with Gaming
Fueling the AI Revolution with GamingFueling the AI Revolution with Gaming
Fueling the AI Revolution with Gaming
 
Alison B Lowndes - Fueling the Artificial Intelligence Revolution with Gaming...
Alison B Lowndes - Fueling the Artificial Intelligence Revolution with Gaming...Alison B Lowndes - Fueling the Artificial Intelligence Revolution with Gaming...
Alison B Lowndes - Fueling the Artificial Intelligence Revolution with Gaming...
 
BAT40 NVIDIA Stampfli Künstliche Intelligenz, Roboter und autonome Fahrzeuge ...
BAT40 NVIDIA Stampfli Künstliche Intelligenz, Roboter und autonome Fahrzeuge ...BAT40 NVIDIA Stampfli Künstliche Intelligenz, Roboter und autonome Fahrzeuge ...
BAT40 NVIDIA Stampfli Künstliche Intelligenz, Roboter und autonome Fahrzeuge ...
 
Simplifying AI Infrastructure: Lessons in Scaling on DGX Systems
Simplifying AI Infrastructure: Lessons in Scaling on DGX SystemsSimplifying AI Infrastructure: Lessons in Scaling on DGX Systems
Simplifying AI Infrastructure: Lessons in Scaling on DGX Systems
 
The Revolution of Deep Learning
The Revolution of Deep LearningThe Revolution of Deep Learning
The Revolution of Deep Learning
 
NVIDIA – Inventor of the GPU
NVIDIA – Inventor of the GPUNVIDIA – Inventor of the GPU
NVIDIA – Inventor of the GPU
 
nVidia Presentation in OpenPOWER workshop Brazil
nVidia Presentation in OpenPOWER workshop BrazilnVidia Presentation in OpenPOWER workshop Brazil
nVidia Presentation in OpenPOWER workshop Brazil
 
Introduction to PowerAI - The Enterprise AI Platform
Introduction to PowerAI - The Enterprise AI PlatformIntroduction to PowerAI - The Enterprise AI Platform
Introduction to PowerAI - The Enterprise AI Platform
 
Introduction to multi gpu deep learning with DIGITS 2 - Mike Wang
Introduction to multi gpu deep learning with DIGITS 2 - Mike WangIntroduction to multi gpu deep learning with DIGITS 2 - Mike Wang
Introduction to multi gpu deep learning with DIGITS 2 - Mike Wang
 
Hardware in Space
Hardware in SpaceHardware in Space
Hardware in Space
 
Harnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligenceHarnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligence
 
Nvidia why every industry should be thinking about AI today
Nvidia why every industry should be thinking about AI todayNvidia why every industry should be thinking about AI today
Nvidia why every industry should be thinking about AI today
 
DataArt
DataArtDataArt
DataArt
 
Alison Lowndes, Artificial Intelligence DevRel, Nvidia – Fueling the Artifici...
Alison Lowndes, Artificial Intelligence DevRel, Nvidia – Fueling the Artifici...Alison Lowndes, Artificial Intelligence DevRel, Nvidia – Fueling the Artifici...
Alison Lowndes, Artificial Intelligence DevRel, Nvidia – Fueling the Artifici...
 

More from NVIDIA Japan

HPC 的に H100 は魅力的な GPU なのか?
HPC 的に H100 は魅力的な GPU なのか?HPC 的に H100 は魅力的な GPU なのか?
HPC 的に H100 は魅力的な GPU なのか?NVIDIA Japan
 
NVIDIA cuQuantum SDK による量子回路シミュレーターの高速化
NVIDIA cuQuantum SDK による量子回路シミュレーターの高速化NVIDIA cuQuantum SDK による量子回路シミュレーターの高速化
NVIDIA cuQuantum SDK による量子回路シミュレーターの高速化NVIDIA Japan
 
Physics-ML のためのフレームワーク NVIDIA Modulus 最新事情
Physics-ML のためのフレームワーク NVIDIA Modulus 最新事情Physics-ML のためのフレームワーク NVIDIA Modulus 最新事情
Physics-ML のためのフレームワーク NVIDIA Modulus 最新事情NVIDIA Japan
 
20221021_JP5.0.2-Webinar-JP_Final.pdf
20221021_JP5.0.2-Webinar-JP_Final.pdf20221021_JP5.0.2-Webinar-JP_Final.pdf
20221021_JP5.0.2-Webinar-JP_Final.pdfNVIDIA Japan
 
開発者が語る NVIDIA cuQuantum SDK
開発者が語る NVIDIA cuQuantum SDK開発者が語る NVIDIA cuQuantum SDK
開発者が語る NVIDIA cuQuantum SDKNVIDIA Japan
 
NVIDIA Modulus: Physics ML 開発のためのフレームワーク
NVIDIA Modulus: Physics ML 開発のためのフレームワークNVIDIA Modulus: Physics ML 開発のためのフレームワーク
NVIDIA Modulus: Physics ML 開発のためのフレームワークNVIDIA Japan
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA Japan
 
HPC+AI ってよく聞くけど結局なんなの
HPC+AI ってよく聞くけど結局なんなのHPC+AI ってよく聞くけど結局なんなの
HPC+AI ってよく聞くけど結局なんなのNVIDIA Japan
 
Magnum IO GPUDirect Storage 最新情報
Magnum IO GPUDirect Storage 最新情報Magnum IO GPUDirect Storage 最新情報
Magnum IO GPUDirect Storage 最新情報NVIDIA Japan
 
データ爆発時代のネットワークインフラ
データ爆発時代のネットワークインフラデータ爆発時代のネットワークインフラ
データ爆発時代のネットワークインフラNVIDIA Japan
 
Hopper アーキテクチャで、変わること、変わらないこと
Hopper アーキテクチャで、変わること、変わらないことHopper アーキテクチャで、変わること、変わらないこと
Hopper アーキテクチャで、変わること、変わらないことNVIDIA Japan
 
GPU と PYTHON と、それから最近の NVIDIA
GPU と PYTHON と、それから最近の NVIDIAGPU と PYTHON と、それから最近の NVIDIA
GPU と PYTHON と、それから最近の NVIDIANVIDIA Japan
 
GTC November 2021 – テレコム関連アップデート サマリー
GTC November 2021 – テレコム関連アップデート サマリーGTC November 2021 – テレコム関連アップデート サマリー
GTC November 2021 – テレコム関連アップデート サマリーNVIDIA Japan
 
テレコムのビッグデータ解析 & AI サイバーセキュリティ
テレコムのビッグデータ解析 & AI サイバーセキュリティテレコムのビッグデータ解析 & AI サイバーセキュリティ
テレコムのビッグデータ解析 & AI サイバーセキュリティNVIDIA Japan
 
必見!絶対におすすめの通信業界セッション 5 つ ~秋の GTC 2020~
必見!絶対におすすめの通信業界セッション 5 つ ~秋の GTC 2020~必見!絶対におすすめの通信業界セッション 5 つ ~秋の GTC 2020~
必見!絶対におすすめの通信業界セッション 5 つ ~秋の GTC 2020~NVIDIA Japan
 
2020年10月29日 プロフェッショナルAI×Roboticsエンジニアへのロードマップ
2020年10月29日 プロフェッショナルAI×Roboticsエンジニアへのロードマップ2020年10月29日 プロフェッショナルAI×Roboticsエンジニアへのロードマップ
2020年10月29日 プロフェッショナルAI×RoboticsエンジニアへのロードマップNVIDIA Japan
 
2020年10月29日 Jetson活用によるAI教育
2020年10月29日 Jetson活用によるAI教育2020年10月29日 Jetson活用によるAI教育
2020年10月29日 Jetson活用によるAI教育NVIDIA Japan
 
2020年10月29日 Jetson Nano 2GBで始めるAI x Robotics教育
2020年10月29日 Jetson Nano 2GBで始めるAI x Robotics教育2020年10月29日 Jetson Nano 2GBで始めるAI x Robotics教育
2020年10月29日 Jetson Nano 2GBで始めるAI x Robotics教育NVIDIA Japan
 
COVID-19 研究・対策に活用可能な NVIDIA ソフトウェアと関連情報
COVID-19 研究・対策に活用可能な NVIDIA ソフトウェアと関連情報COVID-19 研究・対策に活用可能な NVIDIA ソフトウェアと関連情報
COVID-19 研究・対策に活用可能な NVIDIA ソフトウェアと関連情報NVIDIA Japan
 
Jetson Xavier NX クラウドネイティブをエッジに
Jetson Xavier NX クラウドネイティブをエッジにJetson Xavier NX クラウドネイティブをエッジに
Jetson Xavier NX クラウドネイティブをエッジにNVIDIA Japan
 


NVIDIA Deep Learning Institute 2017 Keynote

  • 1. Bill Dally, Chief Scientist and SVP of Research January 17, 2017 Deep Learning and HPC
  • 2. A Decade of Scientific Computing with GPUs (timeline: 2006, 2008, 2010, 2012, 2014, 2016): Fermi: World’s First HPC GPU; Oak Ridge Deploys World’s Fastest Supercomputer w/ GPUs; World’s First Atomic Model of HIV Capsid; GPU-Trained AI Machine Beats World Champion in Go; Stanford Builds AI Machine using GPUs; World’s First 3-D Mapping of Human Genome; CUDA Launched; World’s First GPU Top500 System; Google Outperforms Humans in ImageNet; Discovered How H1N1 Mutates to Resist Drugs; AlexNet beats expert code by huge margin using GPUs; Stream Processing @ Stanford
  • 4. TITAN: 18,688 NVIDIA Tesla K20X GPUs; 27 Petaflops Peak, 90% of Performance from GPUs; 17.59 Petaflops Sustained Performance on Linpack
  • 5. Summit & Sierra Supercomputers: U.S. to Build Two Flagship Supercomputers. Pre-Exascale Systems Powered by the Tesla Platform; 100-300 PFLOPS Peak; IBM POWER9 CPU + NVIDIA Volta GPU; NVLink High Speed Interconnect; 40 TFLOPS per Node, >3,400 Nodes; 2017
  • 6. DGX SATURNV: World’s Most Efficient AI Supercomputer. Fastest AI Supercomputer in TOP500: 4.9 Petaflops Peak FP64 Performance; 19.6 Petaflops DL FP16 Performance; 124 NVIDIA DGX-1 Server Nodes. Most Energy Efficient Supercomputer: #1 on Green500 List; 9.5 GFLOPS per Watt; 2x More Efficient than Xeon Phi System. FACTOIDS: 13 DGX-1 Servers in Top500; 38 DGX-1 Servers for Petascale supercomputer; 55x fewer servers, 12x less power vs CPU-only supercomputer of similar performance
  • 7. EXASCALE APPLICATIONS ON SATURNV. LQCD (Higher Energy Physics), SATURNV DGX Servers vs SuperMUC Supercomputer [chart: Gflop/s vs. # of CPU nodes in SuperMUC Supercomputer]: 1x DGX-1: 8K Gflop/s; 2x DGX-1: 15K Gflop/s; 4x DGX-1: 20K Gflop/s; other labeled points: 2K, 3K, 5K, 7K Gflop/s. QUDA version 0.9beta, using double-half mixed precision; DDalphaAMG using double-single. # of CPU Servers to Match Performance of SATURNV: 2,300 CPU Servers, S3D: Discovering New Fuel for Engines; 3,800 CPU Servers, SPECFEM3D: Simulating Earthquakes
  • 11. GPUs + Data + DNNs
  • 12. THE STAGE IS SET FOR THE AI REVOLUTION. 2012: Deep Learning researchers worldwide discover GPUs. 2015: ImageNet, Deep Learning achieves superhuman image recognition. 2016: Microsoft’s Deep Learning system achieves new milestone in speech recognition. [Chart, 2010 to 2015; labels: 74%, 96%, Deep Learning, Human, Hand-coded CV; Microsoft, Google: 3.5% error rate; Microsoft 09/13/16.] “The Microsoft 2016 Conversational Speech Recognition System.” W. Xiong, J. Droppo, X. Huang, F. Seide, M. Seltzer, A. Stolcke, D. Yu, G. Zweig. 2016
  • 13. A New Era of Computing: PC INTERNET | AI & INTELLIGENT DEVICES | MOBILE-CLOUD
  • 14. Deep Learning Explodes at Google: Android apps; Drug discovery; Gmail; Image understanding; Maps; Natural language understanding; Photos; Robotics research; Speech; Translation; YouTube. Source: Jeff Dean's talk at TiECon, May 7, 2016
  • 15. Deep Learning Everywhere. INTERNET & CLOUD: Image Classification, Speech Recognition, Language Translation, Language Processing, Sentiment Analysis, Recommendation. MEDIA & ENTERTAINMENT: Video Captioning, Video Search, Real Time Translation. AUTONOMOUS MACHINES: Pedestrian Detection, Lane Tracking, Recognize Traffic Sign. SECURITY & DEFENSE: Face Detection, Video Surveillance, Satellite Imagery. MEDICINE & BIOLOGY: Cancer Cell Detection, Diabetic Grading, Drug Discovery
  • 16. Now “Superhuman” at Many Tasks: Speech recognition; Image classification and detection; Face recognition; Playing Atari games; Playing Go
  • 18. Deep learning enables SCIENCE: Classify Satellite Images for Carbon Monitoring; Analyze Obituaries on the Web for Cancer-related Discoveries; Determine Drug Treatments to Increase Child’s Chance of Survival. NASA AMES
  • 19. ML Filters “events” from the ATLAS detector at the LHC: 600M events/sec. Source: Cranmer, NIPS 2016 Keynote
  • 20. Using ML to Approximate Fluid Dynamics. “Data-driven Fluid Simulations using Regression Forests”, http://people.inf.ethz.ch/ladickyl/fluid_sigasia15.pdf: “… Implementation led to a speed-up of one to three orders of magnitude compared to the state-of-the-art position-based fluid solver and runs in real-time for systems with up to 2 million particles”
  • 21. Fluid Simulation with CNNs. Tompson et al., “Accelerating Eulerian Fluid Simulation With Convolutional Networks,” arXiv preprint, 2016
  • 22. Using ML to Approximate the Schrödinger Equation. “Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning”, Rupp et al., Physical Review Letters: “For larger training sets, N >= 1000, the accuracy of the ML model becomes competitive with mean-field electronic structure theory, at a fraction of the computational cost.”
  • 23. Deep Learning has an insatiable demand for computing performance
  • 25. GPUs now Gate DL Progress. Important Property of Neural Networks: results get better with more data + bigger models + more computation (better algorithms, new insights and improved techniques always help, too!). IMAGE RECOGNITION: 2012 AlexNet: 8 layers, 1.4 GFLOP, ~16% error; 2015 ResNet: 152 layers, 22.6 GFLOP, ~3.5% error; 16X Model. SPEECH RECOGNITION: 2014 Deep Speech 1: 80 GFLOP, 7,000 hrs of data, ~8% error; 2015 Deep Speech 2: 465 GFLOP, 12,000 hrs of data, ~5% error; 10X Training Ops
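The two growth factors on slide 25 can be checked with a little arithmetic. The sketch below assumes that "16X Model" compares the per-inference GFLOP figures, and that "10X Training Ops" is roughly per-sample GFLOP multiplied by hours of training data. The second reading is an assumption made here for illustration; the slide does not say how the figure was derived.

```python
# Figures copied from slide 25.
alexnet_gflop = 1.4
resnet_gflop = 22.6
ds1_gflop, ds1_hours = 80, 7_000
ds2_gflop, ds2_hours = 465, 12_000

# Growth in per-inference compute from AlexNet (2012) to ResNet (2015).
model_growth = resnet_gflop / alexnet_gflop

# Assumption: total training work scales with per-sample cost times data volume.
training_ops_growth = (ds2_gflop * ds2_hours) / (ds1_gflop * ds1_hours)

print(f"ResNet vs. AlexNet compute: {model_growth:.1f}x")
print(f"Deep Speech 2 vs. Deep Speech 1 training ops: {training_ops_growth:.1f}x")
```

Under that assumption both results land close to the rounded factors quoted on the slide.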
  • 26. Pascal “5 Miracles” Boost Deep Learning 65X. 5 Miracles: Pascal; 16nm FinFET; CoWoS HBM2; NVLink; cuDNN. NVIDIA DGX-1 Supercomputer: 65X in 4 yrs. Accelerate Every Framework (PaddlePaddle, Baidu Deep Learning). [Chart: relative speed-up of images/sec vs K40 in 2013, for 2013 to 2016 across Kepler, Maxwell, Pascal; y-axis up to 70X.] AlexNet training throughput based on 20 iterations. CPU: 1x E5-2680v3 12 Core 2.5GHz, 128GB System Memory, Ubuntu 14.04. M40 datapoint: 8x M40 GPUs in a node. P100: 8x P100 NVLink-enabled.
  • 27. Pascal GP100: 10 TeraFLOPS FP32; 20 TeraFLOPS FP16; 16GB HBM, 750GB/s; 300W TDP; 67 GFLOPS/W (FP16); 16nm process; 160GB/s NVLink. [Board photo labels: Power Regulation, HBM Stacks, GPU Chip, Backplane Connectors]
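The efficiency figure on slide 27 follows directly from the peak throughput and the TDP. The sketch below reproduces it and also derives a compute-to-bandwidth ratio that the slide does not state; variable names are illustrative, and peak figures are used throughout, so sustained numbers would be lower.

```python
# Figures copied from slide 27 (peak values).
fp32_flops = 10e12       # 10 TeraFLOPS FP32
fp16_flops = 20e12       # 20 TeraFLOPS FP16
tdp_watts = 300          # 300W TDP
hbm_bytes_per_s = 750e9  # 750 GB/s HBM bandwidth

# Peak energy efficiency in GFLOPS per watt.
fp16_gflops_per_watt = fp16_flops / tdp_watts / 1e9
fp32_gflops_per_watt = fp32_flops / tdp_watts / 1e9

# Derived here, not on the slide: FP32 operations available per byte of HBM traffic.
fp32_flops_per_byte = fp32_flops / hbm_bytes_per_s

print(f"FP16: {fp16_gflops_per_watt:.0f} GFLOPS/W")
print(f"FP32: {fp32_gflops_per_watt:.0f} GFLOPS/W")
print(f"FP32 FLOPs per HBM byte: {fp32_flops_per_byte:.1f}")
```

A ratio well above one operation per byte means a kernel has to reuse data on chip to approach peak, which ties in with the later slides on communication energy.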
  • 28. TESLA P4 & P40 INFERENCING ACCELERATORS. Pascal Architecture | INT8. P40: 250W | 40X Energy Efficient versus CPU. P40: 250W | 40X Performance versus CPU
  • 29. TensorRT: PERFORMANCE OPTIMIZING INFERENCING ENGINE. FP32, FP16, INT8 | Vertical & Horizontal Fusion | Auto-Tuning. VGG, GoogLeNet, ResNet, AlexNet & Custom Layers. Available Today: developer.nvidia.com/tensorrt
  • 31. NVLINK: Enables Fast Interconnect, PGAS Memory. [Diagram labels: GPU, Memory, System Interconnect, GPU, Memory, NVLINK]
  • 32. NVIDIA DGX-1: WORLD’S FIRST DEEP LEARNING SUPERCOMPUTER. 170 TFLOPS; 8x Tesla P100 16GB; NVLink Hybrid Cube Mesh; Optimized Deep Learning Software; Dual Xeon; 7 TB SSD Deep Learning Cache; Dual 10GbE, Quad IB 100Gb; 3RU, 3200W
  • 34. “Billions of INTELLIGENT devices”: “Billions of intelligent devices will take advantage of DNNs to provide personalization and localization as GPUs become faster and faster over the next several years.” (Tractica)
  • 35. JETSON TX1: EMBEDDED AI SUPERCOMPUTER. 10W | 1 TF FP16 | >20 images/sec/W
  • 36. INTRODUCING XAVIER: AI SUPERCOMPUTER SOC. 7 Billion Transistors, 16nm FF; 8 Core Custom ARM64 CPU; 512 Core Volta GPU; New Computer Vision Accelerator; Dual 8K HDR Video Processors; Designed for ASIL C Functional Safety. 20 TOPS DL | 160 SPECINT | 20W
  • 37. AI TRANSPORTATION: $10T INDUSTRY. [Diagram labels: PERCEPTION AI, PERCEPTION AI, LOCALIZATION, DRIVING AI, DEEP LEARNING]
  • 38. NVIDIA DRIVE PX 2: AutoCruise to Full Autonomy, One Architecture. Full Autonomy; AutoChauffeur; AutoCruise. AUTONOMOUS DRIVING: Perception, Reasoning, Driving; AI Supercomputing, AI Algorithms, Software; Scalable Architecture
  • 39. ANNOUNCING DriveWorks alpha 1: OS FOR SELF-DRIVING CARS. DRIVEWORKS: PilotNet, OpenRoadNet, DriveNet, Localization, Path Planning, Traffic Prediction, Action Engine, Occupancy Grid
  • 41. NVIDIA AI self-driving cars in development: Baidu, nuTonomy, Volvo, WEpods, TomTom
  • 43. AI Pioneers Pushing state-of-the-art: Reasoning, Attention, Memory (long-term memory for NN); End-to-end training for autonomous flight and driving; Generic agents (understand and predict behavior); RNN for long-term dependencies & multiple time scales; Unsupervised Learning (generative models); Deep reinforcement learning for autonomous AI agents; Reinforcement learning (hierarchical and multi-agent); Semantic 3D reconstruction
  • 44. Yasuo Kuniyoshi: Professor, School of Info Sci & Tech; Director, AI Center (Next Generation Intelligence Science Research Center); The University of Tokyo
  • 46. But Moore’s Law is Over. Source: C. Moore, Data Processing in ExaScale-Class Computer Systems, Salishan, April 2011
  • 47. It’s not about the FLOPs. 16nm chip, 10mm on a side, 200W. DFMA: 0.01mm², 10pJ/OP, 2 GFLOPS. A chip with 10^4 FPUs: 100mm², 200W, 20 TFLOPS. Pack 50,000 of these in racks: 1 EFLOPS, 10MW
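The exascale estimate on slide 47 is a chain of multiplications, worked through below using only the slide's figures. Variable names are illustrative, and the calculation counts arithmetic energy only, which appears to be the slide's point: the raw floating-point work fits in about 10 MW, so the difficulty lies elsewhere.

```python
# Figures copied from slide 47.
chip_side_mm = 10            # 16nm chip, 10mm on a side
fpu_area_mm2 = 0.01          # area of one DFMA unit
fpu_energy_j_per_op = 10e-12 # 10 pJ per operation
fpu_rate_flops = 2e9         # 2 GFLOPS per DFMA unit
n_chips = 50_000

chip_area_mm2 = chip_side_mm ** 2
n_fpus = round(chip_area_mm2 / fpu_area_mm2)    # FPUs that fit on one chip
chip_flops = n_fpus * fpu_rate_flops            # peak rate per chip
chip_watts = chip_flops * fpu_energy_j_per_op   # arithmetic power per chip

system_flops = n_chips * chip_flops
system_watts = n_chips * chip_watts

print(f"FPUs per chip: {n_fpus}")
print(f"Per chip: {chip_flops / 1e12:.0f} TFLOPS at {chip_watts:.0f} W")
print(f"System: {system_flops / 1e18:.0f} EFLOPS at {system_watts / 1e6:.0f} MW")
```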
  • 49. CPU: 126 pJ/flop (SP); Optimized for Latency; Deep Cache Hierarchy; Broadwell E5 v4, 14 nm. GPU: 28 pJ/flop (SP); Optimized for Throughput; Explicit Management of On-chip Memory; Pascal, 16 nm
  • 50. Fixed-Function Logic is Even More Efficient. Energy/Op: CPU (scalar) 1.7nJ; GPU 30pJ; Fixed-Function 3pJ
  • 51. How is Power Spent in a CPU? In-order Embedded: Clock + Control Logic 24%; Data Supply 17%; Instruction Supply 42%; Register File 11%; ALU 6%. Source: Dally [2008] (Embedded in-order CPU). OOO Hi-perf: Clock + Pins 45%; ALU 4%; Fetch 11%; Rename 10%; Issue 11%; RF 14%; Data Supply 5%. Source: Natarajan [2003] (Alpha 21264)
  • 54. [Block diagram labels: ORF, FP/Int, LS/BR, To LD/ST, L0Addr, L1Addr, Net, LM Bank 0 to LM Bank 3, RF, Data Path, L0 I$, Thread PCs, Active PCs, Inst, Control Path, Scheduler.] 64 threads; 4 active threads; 2 DFMAs (4 FLOPS/clock); ORF bank: 16 entries (128 Bytes); L0 I$: 64 instructions (1KByte); LM Bank: 8KB (32KB total)
  • 55. Simpler Cores = Energy Efficiency Source: Azizi [PhD 2010]
  • 57. Communication Dominates Arithmetic (28nm CMOS, 20mm die). 64-bit DP: 20pJ; 256-bit buses: 26 pJ, 256 pJ, 1 nJ; Efficient off-chip link: 500 pJ; DRAM Rd/Wr: 16 nJ; 256-bit access, 8 kB SRAM: 50 pJ
  • 58. Energy Shopping List. Processor Technology (40 nm vs. 10nm): Vdd (nominal) 0.9 V vs. 0.7 V; DFMA energy 50 pJ vs. 7.6 pJ; 64b 8 KB SRAM Rd 14 pJ vs. 2.1 pJ; Wire energy (256 bits, 10mm) 310 pJ vs. 174 pJ. Memory Technology (45 nm vs. 16nm): DRAM interface pin bandwidth 4 Gbps vs. 50 Gbps; DRAM interface energy 20-30 pJ/bit vs. 2 pJ/bit; DRAM access energy 8-15 pJ/bit vs. 2.5 pJ/bit. FP Op lower bound = 4 pJ. Sources: Keckler [Micro 2011], Vogelsang [Micro 2010]
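The claim that communication dominates arithmetic can be made concrete by dividing the data-movement energies on slides 57 and 58 by the arithmetic energies. The sketch below is only a ratio calculation over the quoted figures; the dictionary layout and names are illustrative.

```python
# Slide 57 (28nm CMOS), energies in picojoules.
dp_op_pj = 20          # one 64-bit double-precision operation
sram_read_pj = 50      # 256-bit access to an 8 kB SRAM
dram_rdwr_pj = 16_000  # DRAM read/write (16 nJ)

# Slide 58: energies in picojoules at two process nodes.
nodes = {
    "40 nm": {"dfma": 50.0, "sram_rd": 14.0, "wire_10mm": 310.0},
    "10 nm": {"dfma": 7.6, "sram_rd": 2.1, "wire_10mm": 174.0},
}

dram_vs_op = dram_rdwr_pj / dp_op_pj
sram_vs_op = sram_read_pj / dp_op_pj
print(f"28nm: DRAM access costs {dram_vs_op:.0f}x a DP op, SRAM access {sram_vs_op:.1f}x")

wire_vs_dfma = {}
for name, e in nodes.items():
    # Energy to move 256 bits across 10mm of wire, relative to one DFMA.
    wire_vs_dfma[name] = e["wire_10mm"] / e["dfma"]
    print(f"{name}: 10mm wire transfer costs {wire_vs_dfma[name]:.1f}x a DFMA")
```

On these figures the wire-to-arithmetic ratio grows from the 40 nm node to the 10 nm node, because arithmetic energy shrinks faster than wire energy.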
  • 60. GRS Test Chips. [Photo labels: Probe Station; Test Chip #1 on Board; Test Chip #2 fabricated on production GPU; Eye Diagram from Probe.] Poulton et al., ISSCC 2013, JSSC Dec 2013
  • 61. Efficient Machines: Are Highly Parallel; Have Deep Storage Hierarchies; Have Heterogeneous Processors
  • 63. Programmers, tools, and architecture need to play their positions. Programmer:
        forall molecule in set {                   // launch a thread array
          forall neighbor in molecule.neighbors {  //
            forall force in forces {               // doubly nested
              molecule.force = reduce_sum(force(molecule, neighbor))
            }
          }
        }
    Tools: Map foralls in time and space; Map molecules across memories; Stage data up/down hierarchy; Select mechanisms. Architecture: Exposed storage hierarchy; Fast comm/sync/thread mechanisms
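The programmer's part of slide 63 is a nested data-parallel loop with a sum reduction. A minimal Python sketch of the same shape follows. The `Molecule` class and the `pair_force` law (a one-dimensional spring) are hypothetical stand-ins, since the slide does not define either; the code runs sequentially and only marks where the independent iterations are.

```python
from dataclasses import dataclass, field


@dataclass
class Molecule:
    position: float
    neighbors: list = field(default_factory=list)
    force: float = 0.0


def pair_force(a: Molecule, b: Molecule) -> float:
    """Hypothetical force law: a unit spring pulling `a` toward `b`."""
    return b.position - a.position


def compute_forces(molecules: list) -> None:
    # Outer forall: each molecule writes only its own force, so iterations
    # are independent and could be mapped to separate threads by the tools.
    for m in molecules:
        # Inner forall plus reduce_sum over the molecule's neighbors.
        m.force = sum(pair_force(m, n) for n in m.neighbors)


# Three molecules on a line; each is a neighbor of the adjacent ones.
m0, m1, m2 = Molecule(0.0), Molecule(1.0), Molecule(3.0)
m0.neighbors = [m1]
m1.neighbors = [m0, m2]
m2.neighbors = [m1]
compute_forces([m0, m1, m2])
forces = [m0.force, m1.force, m2.force]
print(forces)
```

The code says nothing about where molecules live in memory or how iterations are scheduled, which is the division of labor the slide assigns to the tools and the architecture.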
  • 65. Legion Programming Model: Separating program logic from machine mapping. Legion Program: Target-independent specification (Task decomposition; Data description). Legion Mapper: Compute target-specific mapping (Placement of data; Placement of tasks; Schedule). Legion Runtime
  • 66. The Legion Data Model: Logical Regions. Main idea: logical regions - Describe data abstractly - Relational data model - No implied layout - No implied placement. Sophisticated partitioning mechanism - Multiple views onto data. Capture important data properties - Locality - Independence/aliasing. [Diagram labels: N; S, P; p1 … pn; s1 … sn; g1 … gn; Field Space; Index Space (Unstructured, 1-D, 2-D, N-D)]
  • 67. The Legion Programming Model. Computations expressed as tasks - Declare logical region usage - Declare field usage - Describe privileges: read-only, read-write, reduce. Tasks specified in sequential order. Legion infers implicit parallelism. Programs are machine-independent - Tasks decouple computation - Logical regions decouple data. Example: calc_currents(piece[0], , , ); calc_currents(piece[1], , , ); distribute_charge(piece[0], , , ); distribute_charge(piece[1], , , ); [Diagram labels: p0, p1, s1, s0, g0, g1; p0, s0, g0; p1, s1, g1; p1 … pn; s1 … sn; g1 … gn]
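Slide 67 says tasks are issued in sequential order, declare the regions they use with a privilege, and the runtime infers which of them can run in parallel. The toy sketch below illustrates that idea in plain Python. It is not the Legion API: the conflict rule, the region names (`p0`, `p1`, `shared`), and the privileges attached to each task are assumptions made for illustration, since the slide's task arguments did not survive extraction. Regions with different names are treated as disjoint.

```python
from itertools import combinations

READ_ONLY, READ_WRITE, REDUCE = "read-only", "read-write", "reduce"


def conflicts(priv_a: str, priv_b: str) -> bool:
    """Two accesses to the same region conflict unless both only read,
    or both reduce (assumed here to use the same reduction operator)."""
    if priv_a == READ_ONLY and priv_b == READ_ONLY:
        return False
    if priv_a == REDUCE and priv_b == REDUCE:
        return False
    return True


def dependences(tasks: list) -> set:
    """tasks: list of (name, {region: privilege}) in program order.
    Returns index pairs (earlier, later) that must run in that order."""
    deps = set()
    for (i, (_, a)), (j, (_, b)) in combinations(enumerate(tasks), 2):
        if any(r in b and conflicts(p, b[r]) for r, p in a.items()):
            deps.add((i, j))
    return deps


# Hypothetical privileges for the four task launches named on the slide.
tasks = [
    ("calc_currents(piece[0])", {"p0": READ_WRITE, "shared": READ_ONLY}),
    ("calc_currents(piece[1])", {"p1": READ_WRITE, "shared": READ_ONLY}),
    ("distribute_charge(piece[0])", {"p0": READ_ONLY, "shared": REDUCE}),
    ("distribute_charge(piece[1])", {"p1": READ_ONLY, "shared": REDUCE}),
]
deps = dependences(tasks)
print(sorted(deps))
```

With these assumed privileges, the two `calc_currents` launches are mutually independent, as are the two `distribute_charge` launches, while each `distribute_charge` must wait for both `calc_currents`. The program text stays sequential and the parallelism is derived from the declarations alone.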
  • 68. Legion Runtime System: Functionally correct application code; Mapping to target machine; Extraction of parallelism; Management of data transfers; Task scheduling and latency hiding; Data-dependent behavior; Compiler/runtime understanding of data. Legion Applications with Tasks and Logical Regions; Legion Mappers for specific machines; Legion Runtime understanding of logical regions
  • 69. Evaluation with a Real App: S3D. Evaluation with a production-grade combustion simulation. Ported more than 100K lines of MPI Fortran to Legion C++. Legion enabled new chemistry: Primary Reference Fuel (PRF) mechanism. Ran on two of the world’s top 10 supercomputers for 1 month - Titan (#2) and Piz-Daint (#10)
  • 70. Performance Results: Original S3D. Weak scaling compared to vectorized MPI Fortran version of S3D. Achieved up to 6X speedup. [Charts: Titan, Piz-Daint]
  • 71. Performance Results: OpenACC S3D. Also compared against experimental MPI+OpenACC version. Achieved 1.73 - 2.85X speedup on Titan [chart labels: 1.73X, 2.85X]. Why? Humans are really bad at scheduling complicated applications
  • 73. HPC <-> Deep Learning
      • HPC has enabled Deep Learning
        • Concepts developed in the 1980s - GPUs provided needed performance
        • Superhuman performance on many tasks: classification, Go, …
        • Enabling intelligent devices, including cars
      • Deep Learning enables HPC
        • Extracting meaning from data
        • Replacing models with recognition
      • HPC and Deep Learning both need more performance, but Moore’s Law is over
        • Reduced overhead
        • Efficient communication
        • Resulting machines are parallel with deep memory hierarchies
        • Target-Independent Programming