GTC 2016 — China
THE DEEP LEARNING
AI REVOLUTION
2
GPU DEEP LEARNING BIG BANG
Deep Learning NVIDIA GPU
NIPS (2012)
ImageNet Classification with Deep Convolutional
Neural Networks
Alex Krizhevsky
University of Toronto
Ilya Sutskever
University of Toronto
Geoffrey E. Hinton
University of Toronto
3
GPU DEEP LEARNING ACHIEVES
“SUPERHUMAN” RESULTS
2012: Deep Learning researchers worldwide discover GPUs
2015: DNN achieves superhuman image recognition
2015: Deep Speech 2 achieves superhuman voice recognition
[Chart: ImageNet accuracy %, 2010 to 2015, with 74% and 96% marked; labels: Hand-coded CV, DL, Human; Microsoft, Google: 3.5% error rate]
4
NVIDIA — “THE AI COMPUTING COMPANY”
GPU Computing Computer Graphics Artificial Intelligence
5
ANNOUNCING NEW GRAPHICS SDKS
Funhouse VR
Open Source
360 Video 1.0
Real-Time Panoramic VR
Iray VR
Photorealistic
VR Ray Tracing
GVDB
Sparse Volumes for
Special Effects
Remote Rendering
Video Compositing
Ansel
In-game Photography
Volumetric
Physical Light Models
OptiX 4.0
Multi-GPU Ray-Tracing
MDL 1.0
Physically Based Materials
Mental Ray
Now GPU-Accelerated!
6
7
NVIDIA VR FUNHOUSE
8
NVIDIA SILICON VALLEY HEADQUARTERS
9
GTC — 25X GROWTH IN GPU DL DEVELOPERS
4X Attendees: 3,700 (2014) to 16,000 (2016)
3X GPU Developers: 120,000 (2014) to 400,000 (2016)
25X Deep Learning Developers: 2,200 (2014) to 55,000 (2016)
2014:
• Japan
• United States
2016:
• Australia
• China
• Europe
• India
• Japan
• Korea
• United States (Silicon Valley, D.C.)
• Higher Ed 35%
• Software 19%
• Internet 15%
• Auto 10%
• Government 5%
• Medical 4%
• Finance 4%
• Manufacturing 4%
10
WHY DID AI RESEARCHERS
ADOPT GPUs FOR DEEP LEARNING?
11
BRAIN IS LIKE A GPU
BRAIN CREATES MENTAL IMAGES WHEN WE THINK
12
GPU IS LIKE A BRAIN
13
GPU DEEP LEARNING
IS A NEW COMPUTING MODEL
Training
Device
Datacenter
14
TRAINING
Billions of Trillions of Operations
GPUs train larger models,
accelerate time to market
15
DATACENTER INFERENCING
10s of billions of image, voice, video
queries per day
GPU inference for fast response,
maximize datacenter throughput
16
DEVICE INFERENCING
Billions of intelligent devices
GPU for real-time accurate response
17
AI — THE ULTIMATE
COMPUTING CHALLENGE
IMAGE RECOGNITION SPEECH RECOGNITION
Important Property of Neural Networks
Results get better with
more data +
bigger models +
more computation
(Better algorithms, new insights and
improved techniques always help, too!)
2012 AlexNet: 8 layers | 1.4 GFLOP/image | ~16% error
2015 ResNet: 152 layers | 22.6 GFLOP/image | ~3.5% error
16X Model
2014 Deep Speech 1: 2 ExaFLOPS | 25M | 7,000 Hours | ~8% error
2015 Deep Speech 2: 20 ExaFLOPS | 100M | 12,000 Hours | ~5% error
10X Training Ops
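The "16X Model" and "10X Training Ops" labels can be checked against the per-model figures quoted on this slide. The following is a minimal sketch; the variable names are illustrative and not from the deck, and the training figures are kept in the slide's own "ExaFLOPS" units.

```python
# Figures as quoted on the slide; variable names are illustrative only.
alexnet_gflop_per_image = 1.4    # 2012, AlexNet, 8 layers
resnet_gflop_per_image = 22.6    # 2015, ResNet, 152 layers

deep_speech_1_training_ops = 2   # 2014, in the slide's "ExaFLOPS" units
deep_speech_2_training_ops = 20  # 2015, same units

# Growth in per-image compute from AlexNet to ResNet.
model_growth = resnet_gflop_per_image / alexnet_gflop_per_image

# Growth in training operations from Deep Speech 1 to Deep Speech 2.
training_growth = deep_speech_2_training_ops / deep_speech_1_training_ops

print(f"Model compute growth: {model_growth:.1f}X")
print(f"Training ops growth: {training_growth:.0f}X")
```

The model ratio comes out slightly above 16, which the slide rounds to 16X.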
18
PASCAL “5 MIRACLES”
BOOST DEEP LEARNING 65X
Pascal — 5 Miracles NVIDIA DGX-1 Supercomputer 65X in 4 yrs Accelerate Every Framework
PaddlePaddle
Baidu Deep Learning
Pascal
16nm FinFET
CoWoS HBM2
NVLink
cuDNN
Chart: Relative speed-up of images/sec vs K40 in 2013. AlexNet training throughput based on 20 iterations. CPU: 1x E5-2680v3 12 Core 2.5GHz, 128GB System Memory, Ubuntu 14.04. M40 datapoint: 8x M40 GPUs in a node. P100 datapoint: 8x P100, NVLink-enabled.
[Chart axes: speed-up up to 70X, by year from 2013 to 2016; series: Kepler, Maxwell, Pascal]
19
ANNOUNCING
NEW IBM SERVER
POWER8 + NVIDIA TESLA P100
FOR THE AI ENTERPRISE
“ Putting NVIDIA’s technology into the IBM system will speed
up performance for such emerging workloads as AI, deep
learning and data analytics.” — eWeek
20
Andrew Ng, Chief Scientist
21
Training
Device
Datacenter
22
ANNOUNCING
TESLA P4 & P40
INFERENCING ACCELERATORS
Pascal Architecture | INT8
P4: 50W | 40X Energy Efficient versus CPU
P40: 250W | 40X Performance versus CPU
23
ANNOUNCING
TensorRT
PERFORMANCE OPTIMIZING
INFERENCING ENGINE
FP32, FP16, INT8 | Vertical & Horizontal Fusion | Auto-Tuning
VGG, GoogLeNet, ResNet, AlexNet & Custom Layers
Available Today: developer.nvidia.com/tensorrt
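To illustrate what the INT8 mode in the precision list refers to, here is a minimal sketch of symmetric 8-bit quantization in plain Python. It is not TensorRT's implementation or API; the function names `quantize_int8` and `dequantize_int8` and the sample weights are hypothetical and chosen only for illustration.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole list; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the 8-bit integers."""
    return [q * scale for q in quantized]


# Hypothetical weights, used only to exercise the round trip.
weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

for original, approx in zip(weights, restored):
    print(f"{original:+.4f} -> {approx:+.4f}")
```

Each restored value differs from the original by at most about half the scale step, which is the precision given up in exchange for smaller and faster integer arithmetic.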
24
25
26
NVIDIA GPU
DEEP LEARNING EVERYWHERE
Alibaba/Aliyun
iQIYI
Shazam
Amazon
JD.com
Skype
Facebook
Orange
Twitter
Flickr
Periscope
Yahoo Supermarket
Google
Pinterest
Yandex
iFLYTEK
Qihoo 360
Yelp
eBay
Tencent
Netflix
Baidu
Sogou
Microsoft
27
>1,500 AI STARTUPS AROUND THE WORLD
Deep Learning
for Cybersecurity
Deep Learning
for Genomics
Deep Learning
for Self-Driving Cars
Deep Learning
for Art
28
AI STARTUPS IN CHINA
Weather & Environment
Forecast
Eye-tracking for Human-
machine Interaction
Medical
Imaging
Face
Recognition
Product Recognition,
Detection, Search
Personal
Concierge App
29
Training
Device
Datacenter
30
“BILLIONS OF INTELLIGENT DEVICES”
“Billions of intelligent devices will take advantage of DNNs
to provide personalization and localization as GPUs
become faster and faster over the next several years.”
— Tractica
31
AI CITY — 1B CAMERAS BY 2020
~1 billion cameras worldwide by 2020
-> 30 billion inferences/sec
Tesla P40: 2,500 inferences/sec @ 720P
-> AI City needs ~10M P40 servers
DATA: 1B cameras, IHS “Video Surveillance Intelligence Service, Aug. 2016”
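The sizing on this slide can be reproduced with a few lines of arithmetic. This is a rough sketch that assumes one inference per camera frame and treats the 2,500 inferences/sec figure as the throughput of a single P40; the variable names are illustrative.

```python
# Figures as quoted on the slide.
cameras = 1_000_000_000                   # ~1 billion cameras by 2020
total_inferences_per_sec = 30_000_000_000 # 30 billion inferences/sec
p40_inferences_per_sec = 2_500            # Tesla P40 at 720P

# Implied per-camera rate (assumption: one inference per frame).
implied_rate_per_camera = total_inferences_per_sec / cameras

# Number of P40s needed to sustain the total load.
p40s_needed = total_inferences_per_sec / p40_inferences_per_sec

print(f"Implied inferences per camera per second: {implied_rate_per_camera:.0f}")
print(f"P40s needed: {p40s_needed / 1e6:.0f} million")
```

The division gives 12 million, which is the same order of magnitude as the slide's rounded "~10M".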
32
1/20TH
THE SPACE,
1/10TH
THE POWER
Hikvision Blade
16 Jetson TX1s
NVIDIA DGX-1 Traditional Server Hikvision Blade
~21 1U Servers
42 CPUs
~4,000 W
1 Hikvision Blade
16 TX1 + 1 CPU
>8 1080 streams
~300 W
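The headline fractions can be compared against the figures listed on this slide. The sketch below assumes that "~21 1U Servers" and "~4,000 W" describe the traditional configuration, that "~300 W" describes the single Hikvision blade, and that the blade takes roughly the rack space of one 1U server; that last point is an assumption, not something the slide states.

```python
# Figures as quoted on the slide; the pairing follows the reading described above.
traditional_servers = 21   # "~21 1U Servers"
traditional_watts = 4_000  # "~4,000 W"
blade_units = 1            # "1 Hikvision Blade" (assumed comparable to one 1U slot)
blade_watts = 300          # "~300 W"

space_fraction = blade_units / traditional_servers
power_fraction = blade_watts / traditional_watts

print(f"Space: about 1/{traditional_servers / blade_units:.0f} of the traditional setup")
print(f"Power: about 1/{traditional_watts / blade_watts:.0f} of the traditional setup")
```

Under these assumptions the space ratio is close to the quoted 1/20th, and the power ratio is somewhat better than the quoted 1/10th.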
33
ANNOUNCING NVIDIA AI CITY PARTNERS
34
AI TRANSPORTATION — $10T INDUSTRY
PERCEPTION AI PERCEPTION AI LOCALIZATION DRIVING AI
DEEP LEARNING
35
FREE SPACE DETECTION CAR 3D DETECTION
36
NVIDIA BB8 AI CAR
37
NVIDIA DRIVE PX 2
AutoCruise to Full Autonomy — One Architecture
Full Autonomy
AutoChauffeur
AutoCruise
AUTONOMOUS DRIVING
Perception, Reasoning, Driving
AI Supercomputing, AI Algorithms, Software
Scalable Architecture
38
NVIDIA DRIVE PX 2
AUTOCRUISE
10W AI Car Computer | Passive Cooling | Automotive IO
AI Highway Driving | Localization & Mapping
39
NVIDIA & BAIDU
PARTNER ON AI SELF-DRIVING CARS
40
NVIDIA AI SELF-DRIVING CARS
IN DEVELOPMENT
Baidu nuTonomy Volvo WEpods NVIDIA
41
NVIDIA END-TO-END
DEEP LEARNING PLATFORM
TRAINING
PaddlePaddle
Baidu Deep Learning
DGX-1 | TESLA P100
42
DATACENTER INFERENCING
ANNOUNCING TESLA P4 & P40
ANNOUNCING
TensorRT
43
CUDA
JETPACK DRIVEWORKS
JETSON TX1
ANNOUNCING
DRIVE PX 2 AUTOCRUISE
INTELLIGENT DEVICES
44
NVIDIA DEEP LEARNING
PLATFORM PARTNERS
AI ENTERPRISE AI CITY AI CAR
45
AI FOR EVERYONE
AI will Revolutionize Transportation AI will Revolutionize Healthcare AI will Revolutionize Society
GTC China 2016
