HPC Advisory Council Meeting Lugano | 22 March 2016
The Tesla Accelerated Computing Platform
Axel Koehler, Principal Solution Architect
2
Agenda
Introduction
TESLA Platform for HPC
TESLA Platform for HYPERSCALE
TESLA Platform for MACHINE LEARNING
TESLA System Software and Tools
Data Center GPU Manager, Docker
3
ENTERPRISE · AUTO · GAMING · DATA CENTER · PRO VISUALIZATION
4
TESLA PLATFORM PRODUCT STACK
Segments: HPC · Enterprise Virtualization · Hyperscale (DL Training, Web Services)
Software: Accelerated Computing Toolkit (HPC) · GRID 2.0 (Enterprise Virtualization) · Hyperscale Suite (Hyperscale)
System Tools & Services: Enterprise Services · Data Center GPU Manager · Mesos · Docker
Accelerators: Tesla K80 (HPC) · Tesla M60, M6 (Enterprise Virtualization) · Tesla M40 (DL Training) · Tesla M4 (Web Services)
5
TESLA PLATFORM FOR HPC
6
HETEROGENEOUS COMPUTING MODEL
Complementary Processors Work Together
CPU: Optimized for Serial Tasks
GPU Accelerator: Optimized for Parallel Tasks
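The CPU/GPU split above can be made concrete with Amdahl's law. The law is not part of the slide and is brought in here only as an illustration: if a fraction p of the runtime is parallel work and the accelerator speeds that part up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). A minimal Python sketch follows; the fractions and the factor of 10 are made-up inputs, not measurements.

```python
def overall_speedup(parallel_fraction: float, accel_factor: float) -> float:
    """Amdahl's law: speedup when only the parallel part is accelerated."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if accel_factor <= 0.0:
        raise ValueError("accel_factor must be positive")
    serial_time = 1.0 - parallel_fraction            # stays on the CPU
    parallel_time = parallel_fraction / accel_factor  # offloaded to the GPU
    return 1.0 / (serial_time + parallel_time)


if __name__ == "__main__":
    # Illustrative inputs only: a 10x accelerator and three workloads.
    for p in (0.5, 0.9, 0.99):
        print(f"parallel fraction {p:.2f}: {overall_speedup(p, 10.0):.2f}x")
```

The serial term bounds the result: with half the runtime serial, even an arbitrarily fast accelerator cannot exceed a 2x overall speedup, which is why the serial-optimized CPU still matters in the pairing.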
7
COMMON PROGRAMMING MODELS ACROSS MULTIPLE CPUS
x86
Libraries (AmgX, cuBLAS) · Compiler Directives · Programming Languages
8
370 GPU-Accelerated
Applications
www.nvidia.com/appscatalog
9
TESLA K80
World’s Fastest Accelerator for HPC & Data Analytics
CUDA Cores: 2496 per GPU (two GPUs per board)
Peak DP: 1.9 TFLOPS
Peak DP w/ Boost: 2.9 TFLOPS
GDDR5 Memory: 24 GB
Bandwidth: 480 GB/s
Power: 300 W
GPU Boost: Dynamic
AMBER Performance: 5x Faster
Simulation Time from 1 Month to 1 Week
[Chart: Tesla K80 Server vs. Dual CPU Server, # of Days (axis 0 to 30)]
AMBER Benchmark: PME-JAC-NVE Simulation for 1 microsecond
CPU: E5-2698v3 @ 2.3GHz, 64GB System Memory, CentOS 6.2
10
VISUALIZE DATA INSTANTLY FOR FASTER SCIENCE
Traditional: Slower Time to Discovery
CPU Supercomputer (Simulation: 1 Week) · Data Transfer (Days) · Viz Cluster (Viz: 1 Day)
Multiple Iterations
Time to Discovery = Months
Tesla Platform: Faster Time to Discovery
GPU-Accelerated Supercomputer
Visualize while you simulate, without data transfers
Restart Simulation Instantly
Multiple Iterations
Time to Discovery = Weeks
Flexible · Scalable · Interactive
11
EGL CONTEXT MANAGEMENT
Top systems support OpenGL under X
EGL: Driver based context management
Support for full OpenGL*, not only GL ES
Available in e.g. VTK
New opportunities for CUDA/OpenGL** interop
*Full OpenGL in r355.11; **CUDA interop in r358.7
Leaving it to the driver
Tesla GPU
Tesla driver with EGL
ParaView/VMD
X-server
12
SCALABLE RENDERING AND COMPOSITING
Large-scale (volume) data visualization
Interactive visualization of TB of data
Stand-alone or coupling into simulation
HW Accelerated remote rendering
Plugin for ParaView available
http://www.nvidia-arc.com/products/nvidia-index.html
NVIDIA INDEX
Dataset from NCSA Blue Waters
13
NVLINK: A HIGH-SPEED GPU INTERCONNECT
Whitepaper: http://www.nvidia.com/object/nvlink.html
GPU to CPU via NVLink
[Diagram labels: Pascal GPU; NVLink; CPU (NVLink enabled); HBM, 16-32 GB, 1 TB/s; DDR4 memory, 10s-100s GB, 50-75 GB/s; PCIe]
GPU to GPU via NVLink
[Diagram labels: two Pascal GPUs; NVLink; CPU (x86); PCIe Switch]
14
U.S. TO BUILD TWO FLAGSHIP SUPERCOMPUTERS
Powered by the Tesla Platform
100-300 PFLOPS Peak
10x in Scientific App Performance
IBM POWER9 CPU + NVIDIA Volta GPU
NVLink High Speed Interconnect
40 TFLOPS per Node, >3,400 Nodes
2017
Major Step Forward on the Path to Exascale
15
TESLA PLATFORM FOR HYPERSCALE
16
EXABYTES OF CONTENT PRODUCED DAILY
User-Generated Content Dominates Web Services
10M Users
40 years of video/day
1.7M Broadcasters
Users watch 1.5 hours/day
6B Queries/day
10% use speech
270M Items sold/day
43% on mobile devices
8B Video views/day
400% growth in 6 months
300 hours of video/minute
50% on mobile devices
Challenge: Harnessing the Data Tsunami in Real-time
17
TESLA FOR HYPERSCALE
10M Users
40 years of video/day
270M Items sold/day
43% on mobile devices
TESLA M40: POWERFUL: Fastest Deep Learning Performance
TESLA M4: LOW POWER: Highest Hyperscale Throughput
HYPERSCALE SUITE
GPU Accelerated FFmpeg · Image Compute Engine · GPU REST Engine
18
GPU REST Engine (GRE) SDK
Accelerated Microservices for Web and Mobile Applications
Supercomputer performance for hyper-scale datacenters
Powerful nodes with low response time (~10ms)
Easy to develop new microservices
Open source, integrates with existing infrastructure
Easy to deploy & scale
Ready-to-run Dockerfile
[Diagram: HTTP (~10ms) requests to GPU REST Engine services: Image Classification, Speech Recognition, Image Scaling, …]
developer.nvidia.com/gre
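To illustrate the general shape of an HTTP microservice like those described above, here is a minimal sketch using only the Python standard library. It is not the GRE SDK and does not use its API; the `/classify` route, the `classify` function, and its placeholder response are all hypothetical stand-ins for a GPU-backed model call.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def classify(image_bytes: bytes) -> dict:
    """Hypothetical stand-in for a GPU-backed inference call."""
    return {"label": "placeholder", "bytes_received": len(image_bytes)}


class ClassifyHandler(BaseHTTPRequestHandler):
    """Answers POST /classify (a made-up route) with a JSON result."""

    def do_POST(self):
        if self.path != "/classify":
            self.send_error(404)
            return
        # Read exactly the number of bytes the client announced.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        payload = json.dumps(classify(body)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8000) -> None:
    """Run the service until interrupted."""
    HTTPServer(("", port), ClassifyHandler).serve_forever()
```

Calling `serve()` starts the service; a client would then POST raw image bytes to `/classify`. In a real deployment the stub would be replaced by a call into a model that stays loaded on the GPU between requests, which is what keeps per-request latency low.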
19
TESLA M4
Highest Throughput Hyperscale Workload Acceleration
CUDA Cores: 1024
Peak SP: 2.2 TFLOPS
GDDR5 Memory: 4 GB
Bandwidth: 88 GB/s
Form Factor: PCIe Low Profile
Power: 50 – 75 W
Video Processing: Stabilization and Enhancements
Image Processing: Resize, Filter, Search, Auto-Enhance
Video Transcode: H.264 & H.265, SD & HD
Machine Learning Inference
20
JETSON TX1
Embedded Deep Learning
• Unmatched performance under 10W
• Advanced tech for autonomous machines
• Smaller than a credit card
GPU: 1 TFLOP/s 256-core Maxwell
CPU: 64-bit ARM A57 CPUs
Memory: 4 GB LPDDR4 | 25.6 GB/s
Storage: 16 GB eMMC
Wifi/BT: 802.11 2x2 ac/BT Ready
Networking: 1 Gigabit Ethernet
Size: 50mm x 87mm
Interface: 400 pin board-to-board connector
21
HYPERSCALE DATACENTER NOW ACCELERATED
Tesla Platform
SERVERS FOR TRAINING: Scales with Data
SERVERS FOR INFERENCE, WEB SERVICES: Scales with Users
Flow: Exabytes of Content / Day · Trained Model · Model Deployed on Every Server · Billions of Devices
22
TESLA PLATFORM FOR MACHINE LEARNING
23
DEEP LEARNING EVERYWHERE
INTERNET & CLOUD
Image Classification
Speech Recognition
Language Translation
Language Processing
Sentiment Analysis
Recommendation
MEDIA & ENTERTAINMENT
Video Captioning
Video Search
Real Time Translation
AUTONOMOUS MACHINES
Pedestrian Detection
Lane Tracking
Recognize Traffic Sign
SECURITY & DEFENSE
Face Detection
Video Surveillance
Satellite Imagery
MEDICINE & BIOLOGY
Cancer Cell Detection
Diabetic Grading
Drug Discovery
24
Why is Deep Learning Hot Now?
Big Data Availability · New ML Techniques · GPU Acceleration
350 million images uploaded per day
2.5 petabytes of customer data hourly
300 hours of video uploaded every minute
25
Image “Volvo XC90”
Image source: “Unsupervised Learning of Hierarchical Representations with Convolutional Deep Belief Networks” ICML 2009 & Comm. ACM 2011.
Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Ng.
WHAT IS DEEP LEARNING?
26
Cars That See Better … And Learn
NVIDIA GPU DEEP LEARNING SUPERCOMPUTER
DRIVE PX AUTO-PILOT CAR COMPUTER
[Diagram labels: Camera Inputs, Neural Net Model, Classified Object]
27
Deep Learning Platform In Medical
Medical Compute Center (Training)
Hospital/Doctor (Inference)
[Diagram labels: Camera Inputs, Med. device inputs, Neural Net Model, Classified Object, Feedback]
28
GPUS AND DEEP LEARNING
Shared by NEURAL NETWORKS and GPUS: Inherently Parallel · Matrix Operations · FLOPS · Bandwidth
GPUs deliver:
- same or better prediction accuracy
- faster results
- smaller footprint
- lower power
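The "inherently parallel" and "matrix operations" points can be seen in the forward pass of a single fully connected layer, sketched below in plain Python. The function name and the tiny example values are illustrative assumptions, not taken from the slides; a real framework would hand this computation to a GPU library rather than loop in Python.

```python
def dense_forward(weights, inputs, bias):
    """Compute y[i] = sum_j W[i][j] * x[j] + b[i] for one layer.

    Each output y[i] depends only on row i of the weights, so all
    outputs can be computed independently, which is the kind of work
    a GPU spreads across many cores.
    """
    if len(weights) != len(bias):
        raise ValueError("one bias value is needed per output row")
    outputs = []
    for row, b in zip(weights, bias):
        if len(row) != len(inputs):
            raise ValueError("row length must match input length")
        outputs.append(sum(w * x for w, x in zip(row, inputs)) + b)
    return outputs


if __name__ == "__main__":
    # Illustrative 2x2 layer.
    print(dense_forward([[1, 2], [3, 4]], [1, 1], [0, 1]))
```

Stacking many inputs into a batch turns the matrix-vector product into a matrix-matrix product, which raises the arithmetic done per byte moved and is where FLOPS and memory bandwidth both come into play.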
29
NVIDIA CUDA
ACCELERATED COMPUTING PLATFORM
WATSON CHAINER THEANO MATCONVNET
TENSORFLOW CNTK TORCH CAFFE
NVIDIA GPU THE ENGINE OF DEEP LEARNING
cuDNN
Deep Learning Primitives
IGNITING ARTIFICIAL INTELLIGENCE
• GPU-accelerated Deep Learning subroutines
• High performance neural network training
• Accelerates Major Deep Learning frameworks: Caffe, Theano, Torch
• Up to 3.5x faster AlexNet training in Caffe than baseline GPU
[Chart: Millions of Images Trained Per Day, cuDNN 1 through cuDNN 4 (axis 0 to 100)]
[Chart: Tiled FFT up to 2x faster than FFT (axis 0.0x to 2.5x)]
developer.nvidia.com/cudnn
31
NVIDIA DIGITS
Interactive Deep Learning GPU Training System
Test Image
Process Data · Configure DNN · Monitor Progress · Visualize Layers
http://developer.nvidia.com/digits
32
TESLA M40
World’s Fastest Accelerator for Deep Learning Training
CUDA Cores: 3072
Peak SP: 7 TFLOPS
GDDR5 Memory: 12 GB
Bandwidth: 288 GB/s
Power: 250 W
13x Faster Training
Reduce Training Time from 13 Days to just 1 Day
[Chart: Caffe training time in Number of Days (axis 0 to 13), GPU Server with 4x TESLA M40 vs. Dual CPU Server]
Note: Caffe benchmark with AlexNet; CPU server uses 2x E5-2680v3 12 Core 2.5GHz CPU, 128GB System Memory, Ubuntu 14.04
33
Facebook’s deep learning machine
Purpose-Built for Deep Learning Training
2x Faster Training for Faster Deployment
2x Larger Networks for Higher Accuracy
Powered by Eight Tesla M40 GPUs
Open Rack Compliant
Serkan Piantino
Engineering Director of Facebook AI Research
“Most of the major advances in machine learning and AI in the
past few years have been contingent on tapping into powerful
GPUs and huge data sets to build and train advanced models”
34
Designed for AI Computing at large scale
Built on the NVIDIA Tesla Platform
• 8 Tesla M40s deliver aggregate 96 GB GDDR5
memory and 56 teraflops of SP performance
• Leverages world’s leading deep learning
platform to tap into frameworks such as Torch
and libraries such as cuDNN
Operational Efficiency and Serviceability
• Free-air Cooled Design Optimizes Thermal and
Power Efficiency
• Components swappable without tools
• Configurable PCI-e for versatility
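The aggregate figures quoted above for eight Tesla M40s follow directly from the per-board numbers on the Tesla M40 slide (12 GB GDDR5 and 7 TFLOPS peak SP). A small Python check is shown below; the `Accelerator` class and `aggregate` helper are illustrative names, and the result is a peak-rate sum, not a measured throughput.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    name: str
    memory_gb: int
    peak_sp_tflops: float


# Per-board figures as stated on the Tesla M40 slide.
TESLA_M40 = Accelerator("Tesla M40", memory_gb=12, peak_sp_tflops=7.0)


def aggregate(gpu: Accelerator, count: int) -> tuple:
    """Return (total memory in GB, total peak SP TFLOPS) for `count` boards."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return gpu.memory_gb * count, gpu.peak_sp_tflops * count


if __name__ == "__main__":
    memory, tflops = aggregate(TESLA_M40, 8)
    print(f"{memory} GB GDDR5, {tflops:.0f} TFLOPS peak SP")
```

The memory total is a sum of separate per-GPU pools, so a model still has to be partitioned or replicated across boards to use it.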
35
NCCL
GOAL:
•  Build a research library of accelerated collectives that is easily
integrated and topology-aware so as to improve the scalability of
multi-GPU applications
APPROACH:
•  Pattern the library after MPI’s collectives
•  Handle the intra-node communication in an optimal way
•  Provide the necessary functionality for MPI to build on top to handle
inter-node
Accelerating Multi-GPU Communications for Deep Learning
github.com/NVIDIA/nccl
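An all-reduce is one of the MPI-style collectives this kind of library provides: every participant contributes a buffer and every participant ends up with the element-wise sum. The sketch below simulates a ring all-reduce in plain Python, with lists standing in for GPU buffers. The ring algorithm is a well-known way to implement the collective and is used here as an assumption; the slide does not say which algorithm NCCL uses, and none of these names come from its API.

```python
def ring_all_reduce(buffers):
    """Simulate a sum all-reduce over len(buffers) ranks arranged in a ring.

    Each buffer is split into one chunk per rank. A reduce-scatter phase
    leaves each rank with one fully summed chunk; an all-gather phase then
    circulates the summed chunks until every rank holds the full result.
    """
    n = len(buffers)
    if n == 0:
        return []
    length = len(buffers[0])
    if any(len(b) != length for b in buffers):
        raise ValueError("all buffers must have the same length")
    if length % n != 0:
        raise ValueError("buffer length must be divisible by the rank count")
    chunk = length // n
    data = [list(b) for b in buffers]  # work on copies

    def span(c):
        return slice(c * chunk, (c + 1) * chunk)

    # Reduce-scatter: in step s, rank r sends chunk (r - s) mod n to its
    # right neighbour, which adds it into its own copy of that chunk.
    for step in range(n - 1):
        sends = [(r, (r - step) % n) for r in range(n)]
        payloads = [data[r][span(c)] for r, c in sends]  # snapshot first
        for (r, c), payload in zip(sends, payloads):
            dst = (r + 1) % n
            data[dst][span(c)] = [a + b for a, b in zip(data[dst][span(c)], payload)]

    # All-gather: rank r now owns the finished chunk (r + 1) mod n and
    # forwards finished chunks around the ring, overwriting stale copies.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n) for r in range(n)]
        payloads = [data[r][span(c)] for r, c in sends]
        for (r, c), payload in zip(sends, payloads):
            data[(r + 1) % n][span(c)] = payload

    return data


if __name__ == "__main__":
    print(ring_all_reduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]]))
```

In this scheme each rank sends and receives 2 × (n − 1) chunks in total regardless of how many ranks take part, which is why the pattern is popular for summing gradients across GPUs within a node.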
TESLA SYSTEM SOFTWARE AND TOOLS
DATA CENTER GPU MANAGEMENT
Device Management (Today; All GPUs Supported)
Board-level GPU Configuration & Monitoring
• Device Identification
• Configuration & Monitoring
• Clock Management
Active Diagnostics (Data Center GPU Manager (DCGM); Tesla GPUs Only)
Diagnostics, Recovery & System Validation
• GPU Recovery & Isolation
• System Validation
• Comprehensive Diagnostics
Health & Governance (Data Center GPU Manager (DCGM); Tesla GPUs Only)
Proactive Health, Policy & Power Mgmt.
• Real-time Monitoring & Analysis
• Governance Policies
• Power & Clock Management
DATA CENTER GPU MANAGER (DCGM)
[Diagram labels: Management Node, DC Cluster Management SW, Admin, Network, Compute Node, Mgmt. SW Agent, DC GPU Manager, APIs, CLI, Tesla Enterprise Driver, GPUs]
DCGM Available as library & CLI
Ready for integration into ISV Mgmt. Software
• e.g. Bright Cluster Manager, IBM Platform Cluster Manager
Ready for integration with HPC Job Schedulers
• e.g. Altair PBS Works, Moab & Maui, IBM Platform LSF, SLURM, Univa Grid Engine
DCGM currently in Public Beta
http://www.nvidia.com/object/data-center-gpu-manager.html
GROWING CONTAINER ADOPTION IN DATA CENTER
“Docker spreads like wildfire, especially in the enterprise”
Rightscale 2016 Cloud Survey Report
>2X growth in Docker adoption in a year
Across Enterprise, Cloud and HPC
GPU CONTAINERIZATION USING NVIDIA-DOCKER
Single command-line interface to take care of all deployment steps
• Discovery, Config/setup, Device allocation
Pre-built images on Docker Hub – CUDA, Caffe, DIGITS
• Reproducible builds across heterogeneous targets
Remote deployment using NVIDIA-Docker-Plugin and REST interface
Key Highlights
• NVIDIA Docker on GitHub (experimental) – Available Now
• Bundled with CUDA Product – Future Versions (In planning)
Axel Koehler
akoehler@nvidia.com

Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
 


Tesla Accelerated Computing Platform

  • 1. HPC Advisory Council Meeting Lugano | 22 March 2016. The Tesla Accelerated Computing Platform. Axel Koehler, Principal Solution Architect
  • 2. Agenda: Introduction; TESLA Platform for HPC; TESLA Platform for HYPERSCALE; TESLA Platform for MACHINE LEARNING; TESLA System Software and Tools: Data Center GPU Manager, Docker
  • 3. ENTERPRISE, AUTO, GAMING, DATA CENTER, PRO VISUALIZATION
  • 4. TESLA PLATFORM PRODUCT STACK. HPC: Accelerated Computing Toolkit (software), Tesla K80 (accelerator). Enterprise Virtualization: GRID 2.0 (software), Tesla M60, M6 (accelerators). Hyperscale, DL Training: Hyperscale Suite (software), Tesla M40 (accelerator). Hyperscale, Web Services: Hyperscale Suite (software), Tesla M4 (accelerator). System Tools & Services: Enterprise Services · Data Center GPU Manager · Mesos · Docker
  • 6. HETEROGENEOUS COMPUTING MODEL: Complementary Processors Work Together. CPU: Optimized for Serial Tasks. GPU Accelerator: Optimized for Parallel Tasks.
  • 7. COMMON PROGRAMMING MODELS ACROSS MULTIPLE CPUS: x86. Libraries (AmgX, cuBLAS); Programming Languages; Compiler Directives.
  • 9. TESLA K80: World’s Fastest Accelerator for HPC & Data Analytics. Specifications: CUDA Cores 2496; Peak DP 1.9 TFLOPS; Peak DP w/ Boost 2.9 TFLOPS; GDDR5 Memory 24 GB; Bandwidth 480 GB/s; Power 300 W; GPU Boost Dynamic. 5x Faster AMBER Performance: Simulation Time from 1 Month to 1 Week. [Chart: # of Days, Tesla K80 Server vs. Dual CPU Server.] AMBER Benchmark: PME-JAC-NVE Simulation for 1 microsecond. CPU: E5-2698v3 @ 2.3GHz, 64GB System Memory, CentOS 6.2
  • 10. VISUALIZE DATA INSTANTLY FOR FASTER SCIENCE. Traditional (Slower Time to Discovery): CPU Supercomputer, Simulation 1 Week; Data Transfer, Days; Viz Cluster, Viz 1 Day; Multiple Iterations; Time to Discovery = Months. Tesla Platform (Faster Time to Discovery): GPU-Accelerated Supercomputer; Visualize while you simulate, without data transfers; Restart Simulation Instantly; Multiple Iterations; Time to Discovery = Weeks. Flexible, Scalable, Interactive.
  • 11. EGL CONTEXT MANAGEMENT: Leaving it to the driver. Top systems support OpenGL under X. EGL: Driver based context management. Support for full OpenGL*, not only GL ES. Available in e.g. VTK. New opportunities for CUDA/OpenGL** interop. (*Full OpenGL in r355.11; **CUDA interop in r358.7.) Diagram labels: Tesla GPU; Tesla driver with EGL; ParaView/VMD; X-server
  • 12. SCALABLE RENDERING AND COMPOSITING: NVIDIA INDEX. Large-scale (volume) data visualization; Interactive visualization of TB of data; Stand-alone or coupling into simulation; HW Accelerated remote rendering; Plugin for ParaView available. Dataset from NCSA Blue Waters. http://www.nvidia-arc.com/products/nvidia-index.html
  • 13. NVLINK: A HIGH-SPEED GPU INTERCONNECT. GPU to CPU via NVLink (diagram labels: NVLink; Pascal; CPU (NVLINK Enabled); DDR Memory 10s-100s GB; HBM 16-32GB; DDR4 50-75 GB/s; 1 Tbyte/s; PCIe). GPU to GPU via NVLink (diagram labels: Pascal; Pascal; CPU (x86); PCIe Switch; NVLink). Whitepaper: http://www.nvidia.com/object/nvlink.html
  • 14. U.S. TO BUILD TWO FLAGSHIP SUPERCOMPUTERS: Powered by the Tesla Platform. 100-300 PFLOPS Peak; 10x in Scientific App Performance; IBM POWER9 CPU + NVIDIA Volta GPU; NVLink High Speed Interconnect; 40 TFLOPS per Node, >3,400 Nodes; 2017. Major Step Forward on the Path to Exascale.
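The per-node figure and node count on the flagship-supercomputer slide can be cross-checked against the quoted 100-300 PFLOPS peak range. The Python sketch below does only that arithmetic. The variable and function names are illustrative, and 3,400 is treated as a lower bound because the slide says ">3,400 Nodes".

```python
# Figures are taken from the slide; all names here are illustrative only.
TFLOPS_PER_NODE = 40
MIN_NODES = 3400                 # the slide says ">3,400", so this is a lower bound
PEAK_RANGE_PFLOPS = (100, 300)   # quoted peak range


def aggregate_pflops(nodes: int, tflops_per_node: float) -> float:
    """Aggregate peak in PFLOPS (1 PFLOPS = 1,000 TFLOPS)."""
    return nodes * tflops_per_node / 1000


lower_bound = aggregate_pflops(MIN_NODES, TFLOPS_PER_NODE)
in_range = PEAK_RANGE_PFLOPS[0] <= lower_bound <= PEAK_RANGE_PFLOPS[1]
print(f"{lower_bound:.0f} PFLOPS, within quoted range: {in_range}")
```

With the slide's numbers the lower bound comes out at 136 PFLOPS, which sits inside the quoted range.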
  • 15. TESLA PLATFORM FOR HYPERSCALE
  • 16. EXABYTES OF CONTENT PRODUCED DAILY: User-Generated Content Dominates Web Services. 10M Users, 40 years of video/day; 1.7M Broadcasters, Users watch 1.5 hours/day; 6B Queries/day, 10% use speech; 270M Items sold/day, 43% on mobile devices; 8B Video views/day, 400% growth in 6 months; 300 hours of video/minute, 50% on mobile devices. Challenge: Harnessing the Data Tsunami in Real-time
  • 17. TESLA FOR HYPERSCALE. 10M Users, 40 years of video/day; 270M Items sold/day, 43% on mobile devices. TESLA M40: POWERFUL, Fastest Deep Learning Performance. TESLA M4: LOW POWER, Highest Hyperscale Throughput. HYPERSCALE SUITE: GPU Accelerated FFmpeg; Image Compute Engine; GPU REST Engine
  • 18. GPU REST Engine (GRE) SDK: Accelerated Microservices for Web and Mobile Applications. Supercomputer performance for hyper-scale datacenters; Powerful nodes with low response time (~10ms); Easy to develop new microservices; Open source, integrates with existing infrastructure; Easy to deploy & scale; Ready-to-run Docker file. Diagram labels: HTTP (~10ms); GPU REST Engine; Image Classification; Speech Recognition; …; Image Scaling. developer.nvidia.com/gre
  • 19. TESLA M4: Highest Throughput Hyperscale Workload Acceleration. Specifications: CUDA Cores 1024; Peak SP 2.2 TFLOPS; GDDR5 Memory 4 GB; Bandwidth 88 GB/s; Form Factor PCIe Low Profile; Power 50 – 75 W. Workloads: Video Processing; Image Processing; Video Transcode; Machine Learning Inference. H.264 & H.265, SD & HD; Stabilization and Enhancements; Resize, Filter, Search, Auto-Enhance
  • 20. JETSON TX1: Embedded Deep Learning. Unmatched performance under 10W; Advanced tech for autonomous machines; Smaller than a credit card. Specifications: GPU 1 TFLOP/s 256-core Maxwell; CPU 64-bit ARM A57 CPUs; Memory 4 GB LPDDR4 | 25.6 GB/s; Storage 16 GB eMMC; Wifi/BT 802.11 2x2 ac/BT Ready; Networking 1 Gigabit Ethernet; Size 50mm x 87mm; Interface 400 pin board-to-board connector
  • 21. HYPERSCALE DATACENTER NOW ACCELERATED: Tesla Platform. SERVERS FOR TRAINING: Scales with Data. SERVERS FOR INFERENCE, WEB SERVICES: Scales with Users. Diagram labels: Exabytes of Content / Day; Trained Model; Model Deployed on Every Server; Billions of Devices
  • 22. TESLA PLATFORM FOR MACHINE LEARNING
  • 23. DEEP LEARNING EVERYWHERE. INTERNET & CLOUD: Image Classification, Speech Recognition, Language Translation, Language Processing, Sentiment Analysis, Recommendation. MEDIA & ENTERTAINMENT: Video Captioning, Video Search, Real Time Translation. AUTONOMOUS MACHINES: Pedestrian Detection, Lane Tracking, Recognize Traffic Sign. SECURITY & DEFENSE: Face Detection, Video Surveillance, Satellite Imagery. MEDICINE & BIOLOGY: Cancer Cell Detection, Diabetic Grading, Drug Discovery
  • 24. Why is Deep Learning Hot Now? Big Data Availability; New ML Techniques; GPU Acceleration. 350 million images uploaded per day; 2.5 Petabytes of customer data hourly; 300 hours of video uploaded every minute
  • 25. WHAT IS DEEP LEARNING? Image: “Volvo XC90”. Image source: “Unsupervised Learning of Hierarchical Representations with Convolutional Deep Belief Networks” ICML 2009 & Comm. ACM 2011. Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Ng.
  • 26. Cars That See Better … And Learn. DRIVE PX AUTO-PILOT CAR COMPUTER; NVIDIA GPU DEEP LEARNING SUPERCOMPUTER. Diagram labels: Camera Inputs; Neural Net Model; Classified Object
  • 27. Deep Learning Platform In Medical. Medical Compute Center (Training); Hospital/Doctor (Inference). Diagram labels: Camera Inputs; Med. device inputs; Neural Net Model; Classified Object; Feedback
  • 28. GPUS AND DEEP LEARNING. Properties marked for both NEURAL NETWORKS and GPUS: Inherently Parallel; Matrix Operations; FLOPS; Bandwidth. GPUs deliver: same or better prediction accuracy; faster results; smaller footprint; lower power
  • 29. NVIDIA GPU: THE ENGINE OF DEEP LEARNING. Frameworks: WATSON, CHAINER, THEANO, MATCONVNET, TENSORFLOW, CNTK, TORCH, CAFFE. NVIDIA CUDA ACCELERATED COMPUTING PLATFORM
  • 30. cuDNN Deep Learning Primitives: IGNITING ARTIFICIAL INTELLIGENCE. GPU-accelerated Deep Learning subroutines; High performance neural network training; Accelerates Major Deep Learning frameworks: Caffe, Theano, Torch; Up to 3.5x faster AlexNet training in Caffe than baseline GPU. [Charts: Millions of Images Trained Per Day, cuDNN 1 to cuDNN 4; Tiled FFT up to 2x faster than FFT.] developer.nvidia.com/cudnn
  • 31. NVIDIA DIGITS: Interactive Deep Learning GPU Training System. Test Image; Monitor Progress; Configure DNN; Process Data; Visualize Layers. http://developer.nvidia.com/digits
  • 32. TESLA M40: World’s Fastest Accelerator for Deep Learning Training. Specifications: CUDA Cores 3072; Peak SP 7 TFLOPS; GDDR5 Memory 12 GB; Bandwidth 288 GB/s; Power 250W. 13x Faster Training (Caffe): Reduce Training Time from 13 Days to just 1 Day. [Chart: Number of Days, GPU Server with 4x TESLA M40 vs. Dual CPU Server.] Note: Caffe benchmark with AlexNet, CPU server uses 2x E5-2680v3 12 Core 2.5GHz CPU, 128GB System Memory, Ubuntu 14.04
  • 33. Facebook’s deep learning machine: Purpose-Built for Deep Learning Training. 2x Faster Training for Faster Deployment; 2x Larger Networks for Higher Accuracy; Powered by Eight Tesla M40 GPUs; Open Rack Compliant. “Most of the major advances in machine learning and AI in the past few years have been contingent on tapping into powerful GPUs and huge data sets to build and train advanced models” (Serkan Piantino, Engineering Director of Facebook AI Research)
  • 34. Designed for AI Computing at large scale. Built on the NVIDIA Tesla Platform: 8 Tesla M40s deliver aggregate 96 GB GDDR5 memory and 56 teraflops of SP performance; leverages world’s leading deep learning platform to tap into frameworks such as Torch and libraries such as cuDNN. Operational Efficiency and Serviceability: Free-air Cooled Design Optimizes Thermal and Power Efficiency; Components swappable without tools; Configurable PCI-e for versatility
  • 35. NCCL: Accelerating Multi-GPU Communications for Deep Learning. GOAL: Build a research library of accelerated collectives that is easily integrated and topology-aware so as to improve the scalability of multi-GPU applications. APPROACH: Pattern the library after MPI’s collectives; Handle the intra-node communication in an optimal way; Provide the necessary functionality for MPI to build on top to handle inter-node. github.com/NVIDIA/nccl
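To make the idea of an MPI-style collective concrete, the sketch below simulates an all-reduce (every participant ends up with the element-wise sum of all inputs) in plain Python. It is not NCCL's API or implementation, neither of which the slides describe. The ring schedule is a common textbook scheme chosen here as an assumption, and `ring_all_reduce` is a hypothetical name. Each inner list stands in for one GPU's buffer, with one value per chunk.

```python
from typing import List


def ring_all_reduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate a ring all-reduce (sum) over n ranks holding n chunks each."""
    n = len(buffers)
    if any(len(b) != n for b in buffers):
        raise ValueError("this sketch expects exactly one chunk per rank")
    data = [list(b) for b in buffers]  # work on copies

    # Phase 1, reduce-scatter: after n-1 steps rank r holds the full sum
    # of chunk (r + 1) % n.
    for step in range(n - 1):
        # Snapshot what every rank sends so the step behaves as simultaneous.
        sends = [((r - step) % n, data[r][(r - step) % n]) for r in range(n)]
        for r, (chunk, value) in enumerate(sends):
            data[(r + 1) % n][chunk] += value  # right-hand neighbour accumulates

    # Phase 2, all-gather: circulate the finished chunks around the ring.
    for step in range(n - 1):
        sends = [((r + 1 - step) % n, data[r][(r + 1 - step) % n]) for r in range(n)]
        for r, (chunk, value) in enumerate(sends):
            data[(r + 1) % n][chunk] = value  # right-hand neighbour overwrites

    return data


result = ring_all_reduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
print(result)  # every rank now holds [111, 222, 333]
```

Each rank only ever talks to its neighbour, which is why this kind of schedule maps naturally onto intra-node GPU links. How NCCL actually chooses its communication pattern for a given topology is outside what the slide states.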
  • 37. DATA CENTER GPU MANAGEMENT. Device Management (Board-level GPU Configuration & Monitoring; All GPUs Supported): Device Identification; Configuration & Monitoring; Clock Management. Active Diagnostics (Diagnostics, Recovery & System Validation; Tesla GPUs Only): GPU Recovery & Isolation; System Validation; Comprehensive Diagnostics. Health & Governance (Proactive Health, Policy & Power Mgmt.; Tesla GPUs Only): Real-time Monitoring & Analysis; Governance Policies; Power & Clock Management. Timeline labels: Today; Data Center GPU Manager (DCGM)
  • 38. DATA CENTER GPU MANAGER (DCGM). Diagram labels: Compute Node; Management Node; DC GPU Manager; DC Cluster Management SW; Mgmt. SW Agent; APIs; Network; Tesla Enterprise Driver; Admin; GPU (x4); Admin CLI. DCGM available as library & CLI. Ready for integration into ISV Mgmt. Software, e.g. Bright Cluster Manager, IBM Platform Cluster Manager. Ready for integration with HPC Job Schedulers, e.g. Altair PBS Works, Moab & Maui, IBM Platform LSF, SLURM, Univa GRID Engine. DCGM currently in Public Beta. http://www.nvidia.com/object/data-center-gpu-manager.html
  • 39. GROWING CONTAINER ADOPTION IN DATA CENTER. “Docker spreads like wildfire, especially in the enterprise” (Rightscale 2016 Cloud Survey Report). >2X growth in Docker adoption in a year, across Enterprise, Cloud and HPC
  • 40. GPU CONTAINERIZATION USING NVIDIA-DOCKER. Single command-line interface to take care of all deployment steps: Discovery, Config/setup, Device allocation. Pre-built images on Docker HUB (CUDA, Caffe, Digits): Reproducible builds across heterogeneous targets. Remote deployment using NVIDIA-Docker-Plugin and REST interface. Key Highlights: NVIDIA Docker on GitHUB (experimental), Available Now; Bundled with CUDA Product, Future Versions (In planning)
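As a rough illustration of the "single command-line interface" point, the Python sketch below assembles the kind of invocation the experimental nvidia-docker wrapper accepted at the time. The slides do not show a command line: the `nvidia/cuda` image name and the `nvidia-smi` smoke test are assumptions taken from the project's public README of that period, and `nvidia_docker_cmd` is a hypothetical helper. The snippet only builds and prints the command; it does not run it.

```python
from typing import List, Sequence


def nvidia_docker_cmd(image: str = "nvidia/cuda",
                      command: Sequence[str] = ("nvidia-smi",)) -> List[str]:
    """Build an argv list for a throwaway GPU container (assumed CLI shape)."""
    # "run --rm" mirrors the plain docker CLI; the wrapper is expected to add
    # the GPU device and driver volume arguments on its own.
    return ["nvidia-docker", "run", "--rm", image, *command]


cmd = nvidia_docker_cmd()
print(" ".join(cmd))
```

On a host where the wrapper is installed, the resulting list could be passed to `subprocess.run(cmd, check=True)`. Whether that succeeds depends on the local driver and Docker setup.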