LEAPS IN VISUAL COMPUTING
JEN-HSUN HUANG, CO-FOUNDER & CEO | GTC 2015
FOUR ANNOUNCEMENTS
A New GPU and Deep Learning
A Very Fast Box and Deep Learning
Roadmap Reveal and Deep Learning
Self-Driving Cars and Deep Learning
AMAZING YEAR IN VISUAL COMPUTING
© 2015 Industrial Light & Magic. All Rights Reserved.
10X GROWTH IN GPU COMPUTING
                            2008       2015
CUDA Downloads              150,000    3 Million
Academic Papers             4,000      60,000
Universities Teaching       60         800
Supercomputing Teraflops    77         54,000
Tesla GPUs                  6,000      450,000
CUDA Apps                   27         319
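The growth factor behind each 2008/2015 pair can be checked with a few lines of Python. This is a minimal sketch, not part of the keynote: the names `METRICS` and `growth_factors` are illustrative, and "3 Million" is read as exactly 3,000,000.

```python
# Figures copied from the slide as (2008 value, 2015 value).
# Assumption: "3 Million" means exactly 3,000,000.
METRICS = {
    "CUDA Downloads": (150_000, 3_000_000),
    "Academic Papers": (4_000, 60_000),
    "Universities Teaching": (60, 800),
    "Supercomputing Teraflops": (77, 54_000),
    "Tesla GPUs": (6_000, 450_000),
    "CUDA Apps": (27, 319),
}


def growth_factors(metrics):
    """Return the 2015 / 2008 ratio for every metric."""
    return {name: new / old for name, (old, new) in metrics.items()}


for name, factor in growth_factors(METRICS).items():
    print(f"{name:26s} {factor:8.1f}x")
```

Under these figures every metric grows by more than 10x, so the slide title reads as a lower bound rather than an average.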
8 Billion Transistors
3,072 CUDA Cores
7 TFLOPS SP / 0.2 TFLOPS DP
12GB Memory
TITAN X
THE WORLD’S FASTEST GPU
TITAN X FOR DEEP LEARNING
[Chart: days to train AlexNet. Bars: 16-core Xeon CPU; TITAN; TITAN Black with cuDNN; TITAN X with cuDNN. Axis: 0 to 7 days, with one off-scale bar labeled ~43.]
TITAN X
THE WORLD’S FASTEST GPU
$999
A SHORT HISTORY OF DEEP LEARNING
Convolutional Neural Networks for
Handwritten Digit Recognition
LECUN, BOTTOU, BENGIO, HAFFNER, 1998
ImageNet Classification with NVIDIA GPUs
KRIZHEVSKY, HINTON, ET AL., 2012
[Timeline: 1995 to 2015]
[Chart: ImageNet accuracy (%) by year, 2010 to 2014, CV vs. DNN; labeled values: 72%, 74%, 84%]
“Delving Deep into Rectifiers: Surpassing
Human-Level Performance on ImageNet Classification”
— Microsoft: 4.94%, Feb. 6, 2015
“Deep Image: Scaling up Image Recognition”
— Baidu: 5.98%, Jan. 13, 2015
“Batch Normalization: Accelerating Deep Network Training
by Reducing Internal Covariate Shift”
— Google: 4.82%, Feb. 11, 2015
IMAGENET CHALLENGE
[Chart: accuracy (%) by year, 2010 to 2014, CV vs. DNN; labeled values: 72%, 74%, 84%]
THE BIG BANG
DEEP LEARNING
VISUALIZED
GPU-ACCELERATED DEEP LEARNING
START-UPS
Detecting Mitosis in Breast Cancer Cells
— IDSIA
Predicting the Toxicity of New Drugs
— Johannes Kepler University
Understanding Gene Mutation to Prevent Disease
— University of Toronto
DEEP LEARNING REVOLUTIONIZING MEDICAL RESEARCH
“Automated Image Captioning with
ConvNets and Recurrent Nets”
—Andrej Karpathy, Fei-Fei Li
DIGITS
DEEP GPU TRAINING SYSTEM
FOR DATA SCIENTISTS
Design DNNs
Visualize activations
Manage multiple trainings
USER INTERFACE: Process Data, Configure DNN, Monitor Progress, Visualize Layers
Caffe, Torch, Theano
cuDNN, cuBLAS
CUDA
GPU HW: GPU, Multi-GPU, GPU Cluster, Cloud
DIGITS
Process Data, Configure DNN, Monitor Progress, Visualize Layers
Test Image
DIGITS DEVBOX
World’s fastest GPU
Max GPU out of a plug
Multi-GPU training & inference
“I’ve never seen AlexNet
run this fast…TitanX is
a monster, Crazy Fast”
DIGITS DEVBOX — EARLY RESULTS
“DIGITS makes it way easier
to design the best network
for the job”
[Chart: Multi-GPU scaling on Torch for AlexNet and VGG; speedup (0x to 4x) vs. number of GPUs (1, 2, 4)]
— Simon Osindero
A.I. Architect
— Soumith Chintala
Research Engineer
DIGITS DEVBOX
Available May 2015
$15,000
GPU ROADMAP
Pascal 2x SGEMM/W
[Chart: SGEMM/W (axis 0 to 72) by year, 2008 to 2018, for Tesla, Fermi, Kepler, Maxwell, Pascal, Volta; Pascal annotated with Mixed Precision, 3D Memory, NVLink]
Pascal 2.7x Memory Capacity
[Chart: Frame Buffer Capacity in GB (axis 0 to 60) by year, 2008 to 2018, for Tesla, Fermi, Kepler, Maxwell, Pascal, Volta; Pascal annotated with Mixed Precision, 3D Memory, NVLink]
Pascal 4x Mixed Precision
[Chart: HGEMM/W (axis 0 to 144) by year, 2008 to 2018, for Tesla, Fermi, Kepler, Maxwell, Pascal, Volta; Pascal annotated with Mixed Precision, 3D Memory, NVLink]
Pascal 3x Bandwidth
[Chart: STREAM GB/s (axis 0 to 900) by year, 2008 to 2018, for Tesla, Fermi, Kepler, Maxwell, Pascal, Volta; Pascal annotated with Mixed Precision, 3D Memory, NVLink]
PASCAL 10X MAXWELL*
Pass        Layer              Limited by      Speedup
forward     CONVOLUTION        compute         4x (FP16)
forward     FULLY CONNECTED    bandwidth       6x
backward    FULLY CONNECTED    bandwidth       6x
backward    CONVOLUTION        compute         4x
            WEIGHT UPDATE      interconnect    10x
Features: Mixed Precision, 3D Memory, NVLINK
Combined factors shown: 5x, 2x
* Very rough estimates
TODAY’S ADAS
[Diagram: SENSE, PLAN, ACT pipeline; blocks: FPGA, CV ASIC, CPU; actions: WARN, BRAKE]
NEXT-GENERATION ADAS
[Diagram: SENSE, PLAN, ACT pipeline; blocks: FPGA, CV ASIC, CPU; actions: WARN, BRAKE, STEER, ACCELERATE]
NVIDIA DRIVE PX SELF-DRIVING CAR COMPUTER
[Diagram: SENSE, PLAN, ACT pipeline; blocks: FPGA, CV ASIC, DNN, CPU; actions: WARN, BRAKE, STEER, ACCELERATE]
[Inset chart: ImageNet Challenge accuracy (%), 2010 to 2014, CV vs. DNN; labeled values: 72%, 74%, 84%]
DNN-based self-driving robot
Training data by human driver
No hand-coded CV algorithms
PROJECT LEADS
Urs Muller: Chief Architect,
Autonomous Driving, NVIDIA
Yann LeCun: Director,
AI Research, Facebook
PROJECT DAVE — DARPA AUTONOMOUS VEHICLE
DAVE IN ACTION
TRAINING DATA
225K Images
TEST DRIVE
No Training
TEST DRIVE
Partially Trained (52K images)
TEST DRIVE
Fully Trained (225K images)
3,000x Faster
                         DAVE           AlexNet on DRIVE PX
Number of Connections    3.1 Million    630 Million
Frames / Second          12             184
Connections / Second     38 Million     116 Billion
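The "Connections / Second" row appears to be the product of the other two rows, and the 3,000x headline the ratio of the two products. The sketch below checks that arithmetic. It assumes the first three figures on the slide belong to DAVE and the last three to AlexNet on DRIVE PX; the function and variable names are illustrative.

```python
def connections_per_second(connections, frames_per_second):
    """Throughput = connections evaluated per frame * frames per second."""
    return connections * frames_per_second


# Slide figures (assumed column assignment: DAVE first, AlexNet second).
dave = connections_per_second(3_100_000, 12)
alexnet_on_drive_px = connections_per_second(630_000_000, 184)

# Ratio of the two throughputs, the basis of the "3,000x Faster" claim.
speedup = alexnet_on_drive_px / dave

print(f"DAVE:                {dave:,} connections/s")
print(f"AlexNet on DRIVE PX: {alexnet_on_drive_px:,} connections/s")
print(f"Speedup:             {speedup:,.0f}x")
```

The products come to 37.2 million and about 115.9 billion connections per second, close to the slide's 38 Million and 116 Billion (the slide's inputs are presumably rounded), and their ratio is roughly 3,100x, in line with the 3,000x headline.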
NVIDIA DRIVE™ PX
SELF-DRIVING CAR COMPUTER
Available May 2015
$10,000
ELON MUSK
LEAPS IN VISUAL COMPUTING
TITAN X
The World’s Fastest GPU
DIGITS DevBox
GPU Deep Learning Platform
Pascal — 10x Maxwell
For Deep Learning
NVIDIA DRIVE PX
Deep Learning Platform for Self-Driving Cars
Opening Keynote at GTC 2015: Leaps in Visual Computing
