EPSRC CDT Conference

ALISON B LOWNDES
Senior Scientist, Global AI
@alisonblowndes
August 2021

NVIDIA A100 80GB
Supercharging The World’s Highest
Performing AI Supercomputing GPU
80GB HBM2e
For largest datasets
and models
2TB/s +
World’s highest memory
bandwidth to feed the world’s
fastest GPU
Multi-Instance GPU
3rd Gen NVLink
3rd Gen Tensor Core
https://devblogs.nvidia.com/nvidia-ampere-architecture-in-depth/

NEW MULTI-INSTANCE GPU (MIG)
Optimize GPU Utilization, Expand Access to More Users with Guaranteed Quality of Service
nvidia.com/en-us/technologies/multi-instance-gpu/
Up To 7 GPU Instances In a Single A100: Dedicated
SM, Memory, L2 cache, Bandwidth for hardware
QoS & isolation
Simultaneous Workload Execution With Guaranteed
Quality Of Service: All MIG instances run in
parallel with predictable throughput & latency
Right Sized GPU Allocation: Different sized MIG
instances based on target workloads
Flexibility to run any type of workload on a MIG
instance
Diverse Deployment Environments: Supported with
Bare metal, Docker, Kubernetes, Virtualized Env.
[Diagram: a single A100 partitioned into seven MIG instances, each pairing a GPU slice with its own GPU memory; example workload label: Amber]
https://blogs.nvidia.com/blog/2020/05/14/multi-instance-gpus/
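The "right sized GPU allocation" point can be illustrated with a small budget check. This is a minimal sketch, not NVIDIA tooling: it assumes the A100 80GB profile names and slice costs as published in NVIDIA's MIG documentation (7 compute slices, 8 memory slices per GPU), and `fits_on_one_gpu` is a hypothetical helper. It tests only the slice budget; real MIG placement rules are stricter.

```python
# Assumed slice costs per MIG profile on an A100 80GB: (compute slices, memory slices).
MIG_PROFILES = {
    "1g.10gb": (1, 1),
    "2g.20gb": (2, 2),
    "3g.40gb": (3, 4),
    "4g.40gb": (4, 4),
    "7g.80gb": (7, 8),
}
TOTAL_COMPUTE_SLICES = 7
TOTAL_MEMORY_SLICES = 8


def fits_on_one_gpu(requested):
    """Return True if the requested profiles stay within one GPU's slice budget.

    Hypothetical helper: a necessary condition only, since actual placement
    on the hardware follows additional rules not modelled here.
    """
    compute = sum(MIG_PROFILES[name][0] for name in requested)
    memory = sum(MIG_PROFILES[name][1] for name in requested)
    return compute <= TOTAL_COMPUTE_SLICES and memory <= TOTAL_MEMORY_SLICES


print(fits_on_one_gpu(["1g.10gb"] * 7))         # seven small instances
print(fits_on_one_gpu(["3g.40gb", "3g.40gb"]))  # two medium instances
print(fits_on_one_gpu(["4g.40gb", "4g.40gb"]))  # would need 8 compute slices
```

The first two requests fit within the budget and the third does not, which matches the "up to 7 GPU instances" limit on the slide.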

NVIDIA SELENE
Now Featuring NVIDIA DGX A100 640GB
#1 in Green 500 | #5 Top500 | #1 MLPerf | #1 Industrial System
4,480 A100 GPUs
560 DGX A100 640GB systems
850 Mellanox 200G HDR switches
14PB of high-performance storage
2.8 EFLOPS of AI peak performance
63 PFLOPS HPL @ 24GF/W
1 DGX A100 replaces 150 CPU
servers and saves 300 tons of CO2
per DGX per year!
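The Selene figures can be cross-checked with a few lines of arithmetic. This is a rough sketch using only the numbers on the slide; the GPUs-per-system count is derived from 640 GB per DGX divided by 80 GB per A100, and the power figure is simply HPL throughput divided by the quoted efficiency, so it is an implied value rather than a measured one.

```python
# Figures quoted on the slide.
dgx_systems = 560
dgx_memory_gb = 640
gpu_memory_gb = 80
ai_peak_flops = 2.8e18             # 2.8 EFLOPS of AI peak performance
hpl_flops = 63e15                  # 63 PFLOPS HPL
efficiency_flops_per_watt = 24e9   # 24 GF/W

# Derived quantities.
gpus_per_dgx = dgx_memory_gb // gpu_memory_gb
total_gpus = dgx_systems * gpus_per_dgx
ai_tflops_per_gpu = ai_peak_flops / total_gpus / 1e12
hpl_power_megawatts = hpl_flops / efficiency_flops_per_watt / 1e6

print(gpus_per_dgx, total_gpus)
print(ai_tflops_per_gpu)
print(hpl_power_megawatts)
```

The GPU count reproduces the 4,480 on the slide, the AI peak works out to about 625 TFLOPS per GPU, and the implied HPL power draw is a little over 2.6 MW.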

NVIDIA DGX SUPERPOD
READY MADE NATIONAL AI INFRASTRUCTURE
Thailand CMKL Univ installs DGX
A100 Pod to advance National AI
Program sponsored by MHESI
Luxembourg EuroHPC peta-scale
AI cluster, MeluXina,
built with ATOS + NVIDIA
Vietnam VinAI deploys DGX
A100 Pod to support AI
researchers and engineers
UAE Ministry of AI deploys DGX
Pod with 30 petaflops to advance
national AI programs
Egypt 1st national AI supercomputer
as AI testbed
built with Dell and NVIDIA
Slovenia EuroHPC peta-scale
AI supercomputer, Vega, with
ATOS and NVIDIA
Sweden WASP installs #1 AI
supercomputer at Linköping
Univ with ATOS and NVIDIA
India C-DAC installs largest
national AI supercomputer
with ATOS and NVIDIA
Italy #1 AI supercomputer,
LEONARDO, to launch with
14,000 NVIDIA Ampere GPUs
UK #1 AI supercomputer in
Cambridge built by NVIDIA
for life science R&D
US fastest AI supercomputer
for academia at University of
Florida built with NVIDIA
Canada Shared Services installs
DGX Pod to support AI
adoption across Canadian gov’t
agencies
Czech Republic’s largest AI
supercomputer by NVIDIA &
HPE at Technical Univ Ostrava

First and only workstation with 4-way NVIDIA A100
GPUs, NVLink, and MIG
Four A100 Tensor Core GPUs, 320 GB total HBM2E
Multi-Instance GPU (MIG) for up to 28 GPU instances
in a single DGX Station A100
3rd generation NVLink
200 GB/s bi-directional bandwidth between any GPU
pair, almost 3x compared to PCIe Gen4
New maintenance-free refrigerant cooling system
DGX STATION A100 320G
Workgroup Appliance for the Age of AI
CPU and Memory
64-core AMD® EPYC® CPU, PCIe Gen4
512 GB system memory
Internal Storage
1.92 TB NVMe M.2 SSD for OS
7.68 TB NVMe U.2 SSD for data cache
Connectivity
2x 10GbE (RJ45)
4x Mini DisplayPort for display out
Remote management 1GbE LAN port (RJ45)

Lunar surface imagery enhancement
Fig. 1: NASA’s Lunar Reconnaissance
Orbiter
Fig. 2: Apollo XVI landing site

Team Photo from Bootcamp
Researchers
Team Advisors:
Paula Harder
Jose I. Delgado-Centeno
Team leads
Ben Moseley Valentin Bickel
Siddha Ganju Miguel A. Olivarez-Mendez
Freddie Kalaitzis
Muhammed Razzak
Yarin Gal
Chedy Raissi

Next Level of AI GPGPU
in Space Applications
Aitech’s S-A1760 Venus™: most
powerful and smallest space AI GPGPU in
small form factor (SFF). Suitable for the next
gen of short duration spaceflight, NEO and
LEO.

Main Computational Challenges
Edge computing for smart grid management:
• Better data management to match supply and demand more closely;
this will accelerate the integration of renewables and so limit
the need for storage
• Finding the shortest path in a grid is computationally
expensive for a neural network
• Real-time monitoring and adjustments of electricity flows
• Massive usage of IoT devices and sensors that gather and
transmit data back and forth in the system
❖ AI-based approaches for power grid stability
❖ Solutions like digital twins for logistics
❖ Smart metering
❖ Exploring Artificial Neural Networks (ANN) algorithms
for network planning, electric load forecasting
❖ Supply and demand forecasting with weather
forecasting as input
❖ Help develop supply chain digital twins: detailed
simulation models that use real-time data and
snapshots to forecast supply chain dynamics
Smart Grids Assessment using intelligent applications at the EDGE
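For reference on the shortest-path bullet above, the classical (non-neural) baseline is Dijkstra's algorithm. The sketch below is illustrative only: the four-bus network, its node names and its edge weights are hypothetical, with weights standing in for whatever cost a grid operator would use (line impedance, loss, congestion).

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm on a dict-of-dicts graph with non-negative weights."""
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            break
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for neighbour, weight in graph[node].items():
            candidate = dist + weight
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                previous[neighbour] = node
                heapq.heappush(queue, (candidate, neighbour))
    if target not in best:
        return float("inf"), []
    # Walk back from the target to rebuild the route.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return best[target], path[::-1]


# Hypothetical four-bus network.
grid = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"A": 1.0, "C": 2.0, "D": 5.0},
    "C": {"A": 4.0, "B": 2.0, "D": 1.0},
    "D": {"B": 5.0, "C": 1.0},
}
cost, path = shortest_path(grid, "A", "D")
print(cost, path)
```

On this toy network the cheapest route from A to D goes through B and C at a total cost of 4.0. A learned model would have to approximate this result, which is one way to read the slide's remark about computational expense.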

SIMNET CLIMATE
Rapid design optimization for alternative-energy solutions
https://windinspire.jhu.edu/wp-content/uploads/2016/12/Large-Eddy-simulation.jpg

NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.
SIMNET: AI BASED SIMULATION
http://developer.nvidia.com/simnet
Past: Physical Prototyping
Physical prototyping is iterative, time-consuming, costly and not
optimized for material and characteristics.
Present: Traditional Simulations
Traditional numerical solvers work on one problem at a time, making
the design process time-consuming, and do not address real-time
simulations, data assimilation or inverse problems.
Future: AI-based Techniques
Data-driven NNs require data, are oblivious to physics laws, suffer
from interpolation/extrapolation errors and are not generalizable.

SIMNET
Product Architecture
[Diagram]
Visualization: CSV, TensorBoard, VTK, ParaView
Inputs: Geometry & Point Clouds, Boundary Conditions, Inference Data, Validation Data
Framework: PINNs-based Solver with Training Domain, Monitor Domain, Inference Domain, Validation Domain
HW: GPU, DGX, DGX POD
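The slide describes the solver as PINNs based (physics-informed neural networks). The core idea is a loss that penalizes violation of the governing equation at sample points, alongside boundary or data terms. The sketch below is a toy illustration and not SimNet's API: it assumes the ODE u' + u = 0 with u(0) = 1, uses a quadratic trial function in place of a neural network, and uses a hand-derived gradient in place of automatic differentiation.

```python
import math

# Collocation points in [0, 1] where the ODE residual is penalized.
POINTS = [i / 20 for i in range(21)]


def model(params, x):
    """Quadratic trial function standing in for a neural network."""
    a, b, c = params
    return a + b * x + c * x * x


def loss_and_grad(params):
    """Boundary loss (u(0) - 1)^2 plus the mean squared residual of u' + u = 0."""
    a, b, c = params
    boundary_error = a - 1.0
    loss = boundary_error ** 2
    grad = [2.0 * boundary_error, 0.0, 0.0]
    for x in POINTS:
        # u'(x) + u(x) = a + b(1 + x) + c(2x + x^2); the residual is linear
        # in the parameters, so these features are its partial derivatives.
        features = (1.0, 1.0 + x, 2.0 * x + x * x)
        residual = a * features[0] + b * features[1] + c * features[2]
        loss += residual ** 2 / len(POINTS)
        for k in range(3):
            grad[k] += 2.0 * residual * features[k] / len(POINTS)
    return loss, grad


params = [0.0, 0.0, 0.0]
initial_loss, _ = loss_and_grad(params)
for step in range(5000):
    _, grad = loss_and_grad(params)
    params = [p - 0.05 * g for p, g in zip(params, grad)]  # plain gradient descent
final_loss, _ = loss_and_grad(params)

print(initial_loss, final_loss)
print(model(params, 1.0), math.exp(-1.0))  # fitted value vs exact solution at x = 1
```

No solution data is used during fitting: the physics residual and the boundary condition alone drive the trial function toward exp(-x), which is the contrast with the purely data-driven networks described on the previous slide.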

AI ENABLING NEXT GENERATION SIMULATION
Inverse & Data Assimilation Problems Improved Physics & Predictions

EXPANDING NGC
NEW CONTAINERS FOR A100 & ARM
Now
NGC-READY SYSTEMS FOR A100
Starting Q3
NGC Private Registry
NGC Container
Environment Modules
Higher HPC app
performance w/ NVTAGS
NEW FEATURES
Now
Multi-arch support for x86,
Arm and Power
Learn More – ngc.nvidia.com | NGC Private Registry | NVTAGS | NGC Container Environment Modules
HPC Simulation & Visualization
AI Frameworks (A100)
Chroma
AutoDock 4
VMD
* Available week of June 22
** Available starting with v20.06

EARTH-SYSTEM MODELS
Long-wave and short-wave radiation
Cloud macro and micro-physics
Deep and shallow convection
Planetary boundary layer
Turbulent mountain stress
Gravity wave drag
Surface fluxes
Aerosols
Chemistry
Simulating the Earth
LARGE SCALE
DYNAMICS
SMALL SCALE
PHYSICS
Rotational Fluid Dynamics,
Confined to Sphere
Compressible Atmosphere
Chaotic Internal Variability

OMNIVERSE EARTH
Digital Twins Of The Earth for Climate Adaptation And Resilience
ESA
EUMETSAT
ECMWF
NVIDIA

KAOLIN
- A PyTorch library for 3D DL
- Supports a wide range of 3D data representations
- Convenient dataloading/preprocessing/conversions
- Large collection of 3D neural nets to choose from
- Optimized implementations
- Omniverse-Kit integration for easy rendering,
interactive visualization, and much more.
https://gitlab-master.nvidia.com/Toronto_DL_Lab/kaolin

DiSECt
BEST STUDENT PAPER, RSS 2021

https://developer.nvidia.com/nvidia-cloudxr-sdk

https://github.com/NVIDIA/cuda-python

NVIDIA TOOLS EXTENSION LIBRARY (NVTX)
● Nsight 2021.2 release
● NVTX is a platform-agnostic, tool-agnostic API
● Allows developers to annotate (mark) source code, events, code ranges, etc.
● NVIDIA-optimized TensorFlow, PyTorch and MXNet have NVTX annotations built in
Import Library
Insert Python Annotations
Use Any NVIDIA Profiling Tool
*Nsight Systems, Nsight Compute and Deep Learning Profiler make use of NVTX markers
https://docs.nvidia.com/cuda/profiler-users-guide/index.html#nvtx
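The "Import Library / Insert Python Annotations" steps might look like the following. This is a minimal sketch assuming the `nvtx` Python package is installed (`pip install nvtx`); the function names and the placeholder workload are hypothetical, and the fallback class exists only so the example still runs on a machine without NVTX.

```python
import contextlib
import time

try:
    import nvtx  # NVIDIA's NVTX Python bindings
    annotate = nvtx.annotate
except ImportError:
    # No-op stand-in usable as both a decorator and a context manager.
    class annotate(contextlib.ContextDecorator):
        def __init__(self, message=None, color=None):
            self.message = message

        def __enter__(self):
            return self

        def __exit__(self, *exc_info):
            return False


@annotate("load_batch", color="green")
def load_batch(size):
    """Hypothetical data-loading step, marked as a named range."""
    time.sleep(0.001)  # placeholder for real I/O
    return list(range(size))


def train(steps):
    total = 0
    for step in range(steps):
        # Each iteration shows up as a "train_step" range in the profiler timeline.
        with annotate("train_step", color="blue"):
            total += sum(load_batch(4))
    return total


print(train(3))
```

Running the script under a profiler, for example `nsys profile python script.py`, should then show the named ranges on the timeline next to the CUDA activity.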

Processes and
threads
CUDA and OpenGL
API trace
Multi-GPU
Kernel and memory
transfer activities
cuDNN and
cuBLAS trace
Thread/core
migration
Thread state

NVIDIA DATA LOADING LIBRARY (DALI)
Fast Data Processing Library for Accelerating Deep Learning
DALI in DL Training Workflow
Currently supports:
• ResNet50 (Image Classification), SSD (Object Detection)
• Input Formats – JPEG, LMDB, RecordIO, TFRecord, COCO,
H.264, HEVC
• Python/C++ APIs to define, build & run an input pipeline
Full input pipeline acceleration including
data loading and augmentation
Drop-in integration with direct plugins to DL
frameworks and open source bindings
Portable workflows through multiple input
formats and configurable graphs
Flexible through configurable graphs and
custom operators
Over 1000 GitHub stars | Top 50 ML Projects (out of 22,000 in 2018)

DALI RESOURCES
Official Documentation (Quick Start, Developer Guides)
https://docs.nvidia.com/deeplearning/sdk/index.html#data-loading
GitHub Documentation
https://docs.nvidia.com/deeplearning/sdk/dali-developer-guide/docs/index.html
DALI Samples & Tutorial
https://docs.nvidia.com/deeplearning/sdk/dali-developer-guide/docs/examples/index.html
DALI Blog
https://devblogs.nvidia.com/fast-ai-data-preprocessing-with-nvidia-dali/

TENSORS: MULTI-DIMENSIONAL STRUCTURE
Spatio-temporal data, (f)MRI, Deep Net Features, LIDAR data, …

4D CONVNET OVER SPACE AND TIME
3D space + time as a single entity (Minkowski space)
Slides by Chris Choy, NVIDIA

SPATIAL SPARSITY IN 3D
3D PERCEPTION WITH SPARSE TENSORS BY CHRISTOPHER CHOY
Dai et al., ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes, CVPR’17
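Spatial sparsity means most voxels in a 3D scan are empty, so a sparse tensor stores only the occupied coordinates and their features. The sketch below shows that representation in plain Python. It is an illustration of the idea only and does not use the Minkowski Engine API; the grid size and voxel values are hypothetical.

```python
def dense_to_sparse(grid):
    """Convert a dense 3D grid into parallel lists of coordinates and features."""
    coordinates, features = [], []
    for x, plane in enumerate(grid):
        for y, row in enumerate(plane):
            for z, value in enumerate(row):
                if value != 0:
                    coordinates.append((x, y, z))
                    features.append(value)
    return coordinates, features


# Hypothetical 4x4x4 voxel grid with three occupied voxels.
size = 4
grid = [[[0.0] * size for _ in range(size)] for _ in range(size)]
grid[0][1][2] = 0.5
grid[3][3][0] = 1.0
grid[2][0][1] = 0.25

coords, feats = dense_to_sparse(grid)
occupancy = len(coords) / size ** 3
print(coords)
print(feats)
print(occupancy)
```

Only 3 of the 64 voxels are stored here. Sparse convolution libraries exploit the same property by computing only at occupied coordinates.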

MINKOWSKI ENGINE
Discriminative Networks
Benjamin Graham, Sparse 3D convolutional neural networks, BMVC’15
Chris Choy, JunYoung Gwak, Silvio Savarese, 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR’19
github.com/NVIDIA/MinkowskiEngine

Jean Kossaifi
Anima Anandkumar
Maja Pantic
Yannis Panagakis
Jeremy Cohen Julia Gusak Meraj Hashemizadeh Aaron Meurer Yngve Mardal Moe
Taylor Lee Patti Marie Roald Caglayan Tuna
…
Aaron Meyer
tensorly.org

Materials/MDL Physics/VFX
AI Path-Tracing USD
NVIDIA CORE TECH
UNIVERSAL ASSET EXCHANGE AND SHARED VIRTUAL WORLD
COLLABORATORS PORTAL

CUTTING EDGE APPLICATIONS
Core Omniverse Apps
FOR DESIGNERS, CREATORS, ENGINEERS FOR ROBOTICISTS, SIMULATION SPECIALISTS
FOR GEFORCE RTX GAMERS
FOR DESIGNERS, CREATORS, ENGINEERS FOR 3D DEEP LEARNING RESEARCHERS
FOR GAME DEVELOPERS, ANIMATORS

OMNIVERSE MACHINIMA
Advanced Simulation Technologies
wrnch AI Pose Estimation
Use a mobile camera to capture human body motions and
automatically apply to 3D character mesh.
Omniverse RTX Renderer
1440p @ 30 fps, NVIDIA MDL materials, fully dynamic
lighting. Easily toggle between interactive path tracing
mode and real time ray tracing mode.
NVIDIA PhysX 5, Flow, and Blast
Exclusive access to NVIDIA PhysX 5 advanced physics
simulation tool kit, plus Flow and Blast for easily
implemented realistic fire, smoke and explosions.

OMNIVERSE AUDIO2FACE
› Powered by NVIDIA AI
› Instant automatic facial animation with realistic,
believable motion
› Switch between voices, genders, and languages
› Use dialogue track, or singing
AI-Powered Facial Animation from an
Audio Track

Best in Show Award, Siggraph 2021

DEPLOY ON ANY NVIDIA RTX GPU
From Laptop, to Data Center. On-premise or in the Cloud.
NVIDIA Studio
Any RTX Workstation or Laptop
EGX Platform
NVIDIA Certified Systems with
RTX
Professional Visualization
Quadro RTX 4000 to
NVIDIA RTX A6000

SEE YOU IN OMNIVERSE
DEVELOP ON OMNIVERSE DOCUMENTATION TUTORIALS AND WEBINARS
DOWNLOAD OPEN BETA
EXPLORE OMNIVERSE ENTERPRISE

BEST PAPER, ICRA 2021
REACTIVE HUMAN-TO-ROBOT HANDOVERS OF ARBITRARY OBJECTS
https://sites.google.com/nvidia.com/handovers-of-arbitrary-objects

KAYA — A ROBOT FOR MAKERS
Low-cost platform to get
started with robotics
Follow Me App
Object Detection DNN
NVIDIA Jetson Nano

MUCH MORE WITH ISAAC SOFTWARE
GPU Accelerated Algorithms/DNNs (GEMs)
And more…
Free Space Segmentation 3D Object Pose Estimation Motion Planning Stereo Depth
Stereo Visual Inertial Odometry Super Pixels April Tags 2D Skeleton Pose Estimation
DeepStream Integration Planner with Costmaps Multi Lidar Support Navigation (LQR Path Planner)
Sensors Robot Platforms Audio

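SECANT
(1) RL trains expert: the expert sees weakly augmented observations from the training environment and is updated by policy gradient.
(2) Student mimics expert: the student sees strongly augmented observations and is updated by a supervised gradient from the expert.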
SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies.
Fan, Wang, Huang, Yu, Fei-Fei, Zhu, Anandkumar. https://arxiv.org/abs/2106.09678

GANcraft learns details that are much finer than a single block

DRAFT – FOR PARTNER INTERNAL USE ONLY
THE NEW ERA OF VISUAL COMPUTING
Generative Design
Analytics
Ray Tracing
AR, VR, XR
Design Reviews
Virtualization
Performance
Quality
AI EVERYWHERE ADVANCED VISUALIZATION INTERACTIVE SIMULATION REMOTE COLLABORATION
Image courtesy of Altair Engineering
Image courtesy of KPF


Pyramid Vision Transformer (PVT)
https://arxiv.org/abs/2105.15203
Wang et al., Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions, arXiv21

NVIDIA Riva
GPU-Accelerated SDK for Multimodal Conversational AI Applications
Sign up: developer.nvidia.com/riva
End-to-End Multimodal Conversational AI Skills
Pre-trained SOTA models-100,000 Hours of DGX
Retrain with Transfer Learning Toolkit
Deploy Services with One Line of Code
<300 ms latency | 1/3rd Cost on A100 versus CPU
[Diagram: Riva workflow]
Input: audio/text
Pretrained models (NVIDIA GPU Cloud), retrain (TAO / Transfer Learning Toolkit), inference (Riva skills)
ASR: noisy environments; accents & jargon; spontaneous, scripted, phone
NLU: question answering, contextual understanding, sentiment
TTS: voice fonts, emotional control, inflection & cadence
Dialog Manager
Output: multi-speaker, domain-specific
Available in Riva 1.0 Beta

Real-Time Transcription Virtual Assistant
Highly accurate domain specific
conversational AI bot
Riva OPEN BETA USE CASES
End-to-end voice enabled AI assistant
Chatbot
Generate highly accurate real-time
transcriptions

Riva RESOURCES
10 Pre-Trained Models + Notebooks
Collection of pre-trained ASR, NLU, TTS models with
notebooks to fine-tune with TLT and export to Riva
5 New Riva Developer Blogs
Introduction to NVIDIA Riva
Tutorials for building real-time apps with Riva, including:
Question Answering | Virtual Assistant | Transfer Learning
Transcription & Entity Recognition
4 New Sample Applications
Ready-to-run sample apps for transcription and entity
recognition, Virtual Assistant, Virtual Assistant with Rasa
Integration & Speechsquad

DEEP LEARNING INSTITUTE
Training  Labs
Nanodegrees
nvidia.com/DLI
TWO DAYS TO A DEMO
Create your first demo today
developer.nvidia.com/embedded/twodaystoademo
JETSON DEVELOPER KIT
AGX Xavier Developer Kit $699
Xavier NX software patch
developer.nvidia.com/buy-jetson
GTC
Largest event for GPU
developers
gputechconf.com
JETSON - START NOW

ENTERING THE AI HEALTHCARE ERA
AI Papers in PubMed
(Machine Learning or Deep Learning)
4.5x AI Investment
Drugs, Cancer, Molecular, Drug Discovery*
*Source: https://hai.stanford.edu/research/ai-index-2021
Healthcare Spend $8T | Growing & Aging Population | Chronic Disease | Public Health
NLP IMAGING INSTRUMENTS CONVERSATIONAL AI DRUG DISCOVERY
GENOMICS

NLP IMAGING AGX GUARDIAN DISCOVERY
PARABRICKS
NVIDIA CLARA
Computational Platform for the AI Healthcare Era
NVIDIA CLARA
Private Cloud Edge Datacenter Embedded Device

DLI UNIVERSITY
TRAINING
UNIVERSITY AMBASSADOR PROGRAM
• Qualified faculty and researchers can get certified to teach DLI
workshops to their students at no cost.
• Hundreds of universities certified around the world, including:
TEACHING KITS
• Qualified university educators can download courseware across
deep learning, accelerated computing, and robotics.
• Kits include lecture materials, GPU cloud resources, access to
self-paced DLI courses, and more.
Learn more at
www.nvidia.com/dli

APPLIED RESEARCH ACCELERATOR PROGRAM
Use case with deployed
GPU-accelerated
application
Development of GPU-
accelerated
application
Basic Research
conducted by University
Applied Research
project(s)
Program focus
Supports research projects that have the potential to make a real-world impact through
deployment into GPU-accelerated applications adopted by commercial and government
organizations.
Program Benefits
Hardware and funding grants
Technical guidance and support
Grant application support
Hands-on training with the NVIDIA Deep
Learning Institute
Networking and marketing opportunities
Robotics and AI for Automation
Apply online: https://www.nvidia.com/en-gb/industries/higher-education-research/applied-research-program/

THANKS
for listening!
alowndes@nvidia.com
