SUPERCOMPUTING IN A CAR
Carlo Nardone, Senior Solution Architect EMEA Enterprise
ISC 2016
2
ENTERPRISE | AUTO | GAMING | DATA CENTER | PRO VISUALIZATION
THE WORLD LEADER IN VISUAL COMPUTING
3
IN THE BEGINNING
5
SIMULATION MEANS BETTER PRODUCTS, FASTER
ACTUAL CRASH | SIMULATED CRASH
6
THE SELF DRIVING REVOLUTION
Safer Driving | New Mobility Services | Urban Redesign
7
AUTONOMOUS DRIVING IS HARD
8
2016: AN AMAZING YEAR FOR SELF-DRIVING CARS
Uber Enters the Race
Toyota Invests $1B in AI Lab
Volvo Drive Me on Public Roads in 2017
NHTSA: Computer Counts as Driver
Tesla Model 3: 300K pre-orders
Audi, BMW, Daimler Buy HERE
Tesla Model S Auto-pilot
Baidu Enters the Race
Honda, Nissan, Toyota Team Up
GM Buys Cruise
9
DEEP LEARNING FOR SELF-DRIVING CARS
10
NVIDIA PILOTNET VIDEO Paper on http://arxiv.org/abs/1604.07316
11
THE BIG BANG IN MACHINE LEARNING
DNN | GPU | BIG DATA
“The GPU is the workhorse of modern A.I.”
12
WHAT IS DEEP LEARNING?
Image: “Volvo XC90”
Image source: Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Ng, “Unsupervised Learning of Hierarchical Representations with Convolutional Deep Belief Networks,” ICML 2009 & Comm. ACM 2011.
13
TRAINING VS INFERENCE
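The distinction can be illustrated with a deliberately tiny model. The sketch below is an assumption-laden toy (a one-parameter-pair linear model fitted by gradient descent, with made-up sample data), not anything taken from the talk: training repeats forward passes and parameter updates many times, while inference is a single forward pass with the parameters frozen.

```python
# Toy illustration of training vs inference (hypothetical model and data).

def predict(w, b, x):
    """Inference: one forward pass with fixed parameters."""
    return w * x + b


def train(samples, lr=0.05, epochs=1000):
    """Training: many forward passes plus gradient updates of w and b."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in samples) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in samples) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Made-up samples drawn from y = 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

w, b = train(samples)               # expensive, done once (data center)
prediction = predict(w, b, 4.0)     # cheap, done per input (in the car)
print(w, b, prediction)
```

The asymmetry in cost is the point: the loop in `train` runs thousands of multiply-adds per sample, whereas `predict` runs one.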
DEEP LEARNING EVERYWHERE
NVIDIA Titan X
NVIDIA Jetson
NVIDIA Tesla
NVIDIA DRIVE PX
15
NVIDIA DGX-1
WORLD’S FIRST DEEP LEARNING SUPERCOMPUTER
170 TFLOPS FP16
8x Tesla P100 16GB
NVLink Hybrid Cube Mesh
Accelerates Major AI Frameworks
Dual Xeon
7 TB SSD Deep Learning Cache
Dual 10GbE, Quad IB 100Gb
3RU – 3200W
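The 170 TFLOPS headline is consistent with simple arithmetic over the eight GPUs. The check below is a rough sketch that assumes NVIDIA's published peak of 21.2 TFLOPS FP16 per NVLink Tesla P100; that per-GPU figure does not appear on the slide.

```python
# Back-of-the-envelope check of the DGX-1 headline numbers.
# ASSUMPTION: 21.2 TFLOPS peak FP16 per Tesla P100 (not stated on the slide).
NUM_GPUS = 8               # from the slide: 8x Tesla P100
P100_FP16_TFLOPS = 21.2    # assumed per-GPU peak
P100_MEMORY_GB = 16        # from the slide: 16GB per GPU


def peak_fp16_tflops(num_gpus=NUM_GPUS, per_gpu=P100_FP16_TFLOPS):
    """Aggregate peak FP16 throughput across all GPUs."""
    return num_gpus * per_gpu


def total_gpu_memory_gb(num_gpus=NUM_GPUS, per_gpu=P100_MEMORY_GB):
    """Aggregate GPU memory across all GPUs."""
    return num_gpus * per_gpu


print(f"Peak FP16: {peak_fp16_tflops():.1f} TFLOPS")
print(f"GPU memory: {total_gpu_memory_gb()} GB")
```

Peak figures of this kind are theoretical upper bounds; sustained throughput on a real network depends on the workload.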
16
NVIDIA END-TO-END
AUTONOMOUS DRIVING PLATFORM
NVIDIA DRIVE PX 2
NVIDIA DGX-1
NVIDIA DRIVENET
Localization
Planning
Visualization
Perception
DRIVEWORKS
17
NVIDIA DRIVE PX 2
World’s First AI Supercomputer for Self-Driving Cars
12 CPU cores | Pascal GPU | 8 TFLOPS | 24 DL TOPS | 16nm FF | 250W | Liquid Cooled
19
SELF DRIVING COMPUTER I/O
NVIDIA Drive PX 2: 70 Gbps aggregate I/O
DISPLAY
DATA LOGGING
DRIVE TRAIN
POWER TRAIN
Ethernet
GMSL
Ethernet
FlexRay/Ethernet
TEGRA TEGRA
SMART
CAMERAS
CAMERAS
LIDAR
RADAR
CANBus
LVDS
USB/PCIE
20
DRIVE™ PX 2
COMPUTATION ENGINES
24 DL TOPS, 8 TFLOPS, high performance
CPU/GPU complex
21
NVIDIA DRIVE PX SW STACK
A full stack of rich software components
22
GPU INFERENCE ENGINE
Maximum Performance for Deep Learning Inference
• High-performance framework makes it easy to develop GPU-accelerated inference
• Production deployment solution for deep learning inference
• Optimized inference for a given trained neural network and target GPU
• Solutions for Hyperscale, ADAS, Embedded
• Supports deployment of 32-bit or 16-bit inference
developer.nvidia.com/gpu-inference-engine
GPU Inference Engine for Automotive
Pedestrian Detection
Lane Tracking
Traffic Sign Recognition
---
NVIDIA DRIVE PX 2
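The choice between 32-bit and 16-bit inference is a trade of precision for storage and bandwidth. The sketch below illustrates only that trade-off, using Python's `struct` module to round values to IEEE 754 single and half precision; the weights and inputs are made up, and nothing here reflects how the GPU Inference Engine itself is implemented.

```python
import struct

# Hypothetical weights and activations, chosen only for illustration.
weights = [0.1234567, -0.7654321, 0.3333333, 0.0009765]
inputs = [1.5, 2.25, -0.75, 100.0]


def round_to(fmt, x):
    """Round x to the given IEEE 754 format ('<f' = 32-bit, '<e' = 16-bit)."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def dot(ws, xs, fmt):
    """Dot product with operands stored in the chosen format.

    Accumulation is done in double precision, so only storage error is shown.
    """
    return sum(round_to(fmt, w) * round_to(fmt, x) for w, x in zip(ws, xs))


exact = sum(w * x for w, x in zip(weights, inputs))
fp32 = dot(weights, inputs, "<f")
fp16 = dot(weights, inputs, "<e")

bytes32 = struct.calcsize("<f") * len(weights)  # storage for FP32 weights
bytes16 = struct.calcsize("<e") * len(weights)  # storage for FP16 weights

print(f"FP32 error: {abs(fp32 - exact):.2e}, weight bytes: {bytes32}")
print(f"FP16 error: {abs(fp16 - exact):.2e}, weight bytes: {bytes16}")
```

Half precision halves the weight storage at the cost of a larger rounding error, which is often acceptable for inference on a trained network.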
ACTIVE LEARNING
Data Scientist | Vehicle
Drive PX - Deploy
Model Classification
Detection
Segmentation
DIGITS / Tesla - Train
Network
Solver
Dashboard
24
A COMPLETE DEEP LEARNING PLATFORM
MANAGE | TRAIN | DEPLOY
DIGITS
DATA CENTER | AUTOMOTIVE
TRAIN
TEST
MANAGE / AUGMENT
EMBEDDED
GPU INFERENCE ENGINE
NVIDIA DRIVE™ PX 2
Selected by Volvo on
Journey Towards a
Crash-Free Future
26
WORLD’S FIRST AUTONOMOUS CAR RACE
10 teams, 20 identical cars | DRIVE PX 2 as “brain” in every car | 2016/17 Formula E season
THANK YOU!
cnardone@nvidia.com
+39 335 5828197
www.nvidia.com/drive
28
GTC EUROPE
Sep 28-29, 2016 | Amsterdam
www.gputechconf.eu #GTC16
GTC Europe is a two-day conference designed to expose the innovative ways developers, businesses and academics are using parallel computing to transform our world.
DEEP LEARNING & ARTIFICIAL INTELLIGENCE | SELF-DRIVING CARS | VIRTUAL REALITY & AUGMENTED REALITY | SUPERCOMPUTING & HPC
2 Days | 800 Attendees | 50+ Exhibitors | 50+ Speakers | 15+ Tracks | 15+ Workshops | 1-to-1 Meetings
BACKUP SLIDES
30
MANY THINGS TO LEARN
31
THE BASIC SELF-DRIVING LOOP
LOCALIZE
MAP
CONTROL
SENSE
PLAN
PERCEIVE
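The stages of the loop can be sketched as a pipeline. The ordering used here (sense, perceive, localize against a map, plan, control) is an assumption about how the diagram is meant to be read, and every function, name, and number below is a hypothetical stand-in in a toy one-dimensional world, not part of DriveWorks or any NVIDIA API.

```python
from dataclasses import dataclass


@dataclass
class VehicleState:
    lateral_offset_m: float  # distance from the lane centre (toy 1-D world)


LANE_MAP = {"lane_centre_m": 0.0}  # hypothetical stand-in for a real map


def sense(state):
    """Pretend sensor: returns a raw measurement of the lane position."""
    return {"camera_lane_offset_m": state.lateral_offset_m}


def perceive(raw):
    """Turn raw sensor data into a detected feature."""
    return {"lane_offset_m": raw["camera_lane_offset_m"]}


def localize(perception, lane_map):
    """Place the vehicle relative to the map."""
    return perception["lane_offset_m"] - lane_map["lane_centre_m"]


def plan(position_error_m):
    """Choose a target correction: head back towards the lane centre."""
    return -position_error_m


def control(desired_correction_m, gain=0.5):
    """Proportional controller: apply a fraction of the correction per step."""
    return gain * desired_correction_m


def drive(state, steps):
    """Run the sense/perceive/localize/plan/control loop for a few steps."""
    for _ in range(steps):
        raw = sense(state)
        perception = perceive(raw)
        error = localize(perception, LANE_MAP)
        correction = plan(error)
        state.lateral_offset_m += control(correction)
    return state


final = drive(VehicleState(lateral_offset_m=1.0), steps=10)
print(final.lateral_offset_m)
```

Each pass through the loop halves the remaining offset, so the toy vehicle converges on the lane centre; a real system closes the same loop with far richer sensing and planning.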
32
INTERFACES
70 Gigabits per second of I/O
Sensor Fusion Interfaces:
GMSL Camera, CAN, GbE, BroadR-Reach, FlexRay,
LIN, GPIO
Displays and Cockpit Computer Interfaces
HDMI, FPDLink III and GMSL
Development and Debug Interfaces
HDMI, GbE, 10GbE, USB3, USB 2 (UART/debug),
JTAG
Auto Grade connectors Debug/Lab interfaces
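To put the 70 Gbps figure in perspective, the raw data rate of a camera rig can be estimated from resolution, frame rate, and bit depth. Every sensor parameter below (camera count, resolution, frame rate, bits per pixel) is a hypothetical value chosen for illustration; only the 70 Gbps budget comes from the slide.

```python
# All sensor parameters below are HYPOTHETICAL, chosen only for illustration.
PLATFORM_IO_GBPS = 70.0  # from the slide: aggregate I/O of DRIVE PX 2


def camera_gbps(width_px, height_px, fps, bits_per_pixel):
    """Raw (uncompressed) data rate of one camera in gigabits per second."""
    return width_px * height_px * fps * bits_per_pixel / 1e9


def io_budget_share(num_cameras, per_camera_gbps, budget_gbps=PLATFORM_IO_GBPS):
    """Fraction of the aggregate I/O budget consumed by the cameras."""
    return num_cameras * per_camera_gbps / budget_gbps


per_cam = camera_gbps(1920, 1080, 30, 12)   # assumed 1080p, 30 fps, 12-bit raw
share = io_budget_share(12, per_cam)        # assumed 12 cameras

print(f"Per camera: {per_cam:.3f} Gbps")
print(f"Share of 70 Gbps budget: {share:.1%}")
```

Under these assumptions the cameras alone use a noticeable slice of the budget, before lidar, radar, display, and logging traffic are counted.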
33
GPU INFERENCE ENGINE
Optimizations
• Fuse network layers
• Eliminate concatenation layers
• Kernel specialization
• Auto-tuning for target platform
• Select optimal tensor layout
• Batch size tuning
TRAINED NEURAL NETWORK
OPTIMIZED INFERENCE RUNTIME
developer.nvidia.com/gpu-inference-engine
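Layer fusion, the first optimization listed, can be illustrated with one well-known case: folding a batch-normalization layer into the linear layer that precedes it, so that inference runs one layer instead of two. This is a generic sketch of the idea in plain Python with made-up parameters; whether the GPU Inference Engine performs this particular fusion is not stated in the slides.

```python
import math


def linear(W, b, x):
    """z = W x + b for a small dense layer (lists of lists)."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def batchnorm(z, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch norm using fixed running statistics."""
    return [g * (zi - m) / math.sqrt(v + eps) + bt
            for zi, g, bt, m, v in zip(z, gamma, beta, mean, var)]


def fuse_linear_batchnorm(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold batch norm into the linear layer's weights and bias."""
    W_fused, b_fused = [], []
    for row, bi, g, bt, m, v in zip(W, b, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        W_fused.append([scale * w for w in row])
        b_fused.append(scale * (bi - m) + bt)
    return W_fused, b_fused


# Hypothetical parameters for a 2-output, 3-input layer.
W = [[0.2, -0.5, 1.0], [1.5, 0.3, -0.7]]
b = [0.1, -0.2]
gamma, beta = [1.2, 0.8], [0.05, -0.1]
mean, var = [0.3, -0.4], [0.9, 1.6]
x = [1.0, 2.0, -1.0]

two_layers = batchnorm(linear(W, b, x), gamma, beta, mean, var)
W_f, b_f = fuse_linear_batchnorm(W, b, gamma, beta, mean, var)
one_layer = linear(W_f, b_f, x)

print(two_layers)
print(one_layer)
```

The fused layer produces the same outputs up to floating-point rounding while skipping one pass over the activations, which is the kind of saving fusion targets.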
OPEN PLATFORM FOR ALL DEVELOPERS
37
AUTOMOTIVE PARTNERS
Self Driving Vehicles
“Using NVIDIA DIGITS deep learning platform, in less than four hours we achieved over 96% accuracy using Ruhr University Bochum’s traffic sign database. While others invested years of development to achieve similar levels of perception with classical computer vision algorithms, we have been able to do it at the speed of light.”
Matthias Rudolph, Director of Architecture, Driver Assistance Systems, Audi
“Deep learning on NVIDIA DIGITS has allowed for a 30x enhancement in training pedestrian detection algorithms, which are being further tested and developed as we move them onto the NVIDIA DRIVE PX.”
Dragos Maciuca, Technical Director, Ford Research and Innovation Center
DGX-1 DEEP LEARNING
SUPERCOMPUTER
41
42
DGX-1 GPU CLUSTER
Two fully connected quads,
connected at corners
160GB/s per GPU bidirectional to Peers
Load/store access to Peer Memory
Full atomics to Peer GPUs
High speed copy engines for bulk data copy
PCIe to/from CPU
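The topology described above can be written down as a small graph. The sketch below assumes a GPU numbering (0 to 3 in one quad, 4 to 7 in the other, with GPU i linked to GPU i + 4) and a per-link rate of 40 GB/s bidirectional for first-generation NVLink; neither the numbering nor the per-link figure appears on the slide.

```python
from collections import deque
from itertools import combinations

# ASSUMPTION: 40 GB/s bidirectional per first-generation NVLink link.
NVLINK_BIDIR_GB_PER_S = 40


def hybrid_cube_mesh():
    """Links of two fully connected quads joined at corresponding corners."""
    links = set()
    for quad in (range(0, 4), range(4, 8)):      # hypothetical GPU numbering
        links.update(combinations(quad, 2))       # fully connected quad
    links.update((g, g + 4) for g in range(4))    # corner-to-corner links
    return links


def adjacency(links):
    """Map each GPU to the set of GPUs it is directly linked to."""
    adj = {g: set() for g in range(8)}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def max_hops(adj):
    """Largest shortest-path length between any two GPUs (breadth-first search)."""
    worst = 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        worst = max(worst, max(dist.values()))
    return worst


links = hybrid_cube_mesh()
adj = adjacency(links)
links_per_gpu = {g: len(peers) for g, peers in adj.items()}
per_gpu_bandwidth = {g: n * NVLINK_BIDIR_GB_PER_S for g, n in links_per_gpu.items()}

print(len(links), links_per_gpu, per_gpu_bandwidth, max_hops(adj))
```

Under these assumptions every GPU has four links, which matches the 160GB/s per-GPU figure, and any GPU can reach any other in at most two hops.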
