HPC AT-SCALE ENABLED BY
DDN A3I AND NVIDIA SUPERPOD
William Beaudin
Sr Director, Engineering
wbeaudin@ddn.com
DDN ©2020 DataDirect Networks, Inc.
THE LEADER AT SCALE,
PROVEN IN PRODUCTION
20 years of HPC experience
The largest environments,
the most exacting requirements
DDN A3I Solutions – The World’s Fastest HPC Storage – Made Simple
FASTEST PERFORMANCE,
EFFORTLESS GROWTH
Faster and deeper insight
Complexity eliminated,
seamless end-to-end integration
RELIABLE, RESILIENT
AND FLEXIBLE
24x7 Productivity
The universal AI platform
for all stages of the data cycle
Optimized AI Platforms
For Every Use Case
Accelerate applications by achieving full GPU saturation on DGX
Streamline concurrent and continuous deep learning workflows
Flexible configuration with best technology and economics
Seamless scaling to match evolving workflow needs
Optimized for DGX platforms and NGC containers for DL and HPC
Easy to deploy and manage with turnkey support from DDN and partners
DDN Confidential
DDN A3I X-APPLIANCES – THE BUILDING BLOCKS FOR DATA AT-SCALE
ALL-NVME APPLIANCE FULLY-OPTIMIZED
FOR THE MOST INTENSIVE WORKLOADS
FAST, FLEXIBLE HYBRID APPLIANCE
SCALES EFFORTLESSLY WITH BEST DENSITY
FULLY VALIDATED AT-SCALE WITH NVIDIA DGX-2 SUPERPOD!
AI7990XX
AI400XX
DDN
THE SUPERPOD ACCELERATOR
AI400XX
DDN A3I APPLIANCES MAKE SUPERPOD FAST AND EASY TO DEPLOY
► Simplified design with predictable performance
and capacity with future scaling
► Validated reference architectures for easier
planning with at-scale workflows
► Fully-configured appliances arrive ready to
deploy and install in minutes
► Seamless integration with NVIDIA DGX for
moving rapidly to production
► Comprehensive expert services from DDN and
partners delivered globally
COMMON IO PATH: STORAGE, HBA, SAN SWITCH, HBA, SERVER, HCA/NIC, SWITCH, HCA/NIC, CLIENT
DDN A3I IO PATH: SWITCH, HCA/NIC, CLIENT
SIMPLIFIED STACK WITH DDN A3I APPLIANCES
FILESYSTEM
DDN A3I APPLIANCES REDUCE COST AND COMPLEXITY
SIMPLE TO DEPLOY, MANAGE AND SCALE!
DDN A3I SHARED PARALLEL ARCHITECTURE
LEGACY NAS ARCHITECTURE: NAS CLIENT, NAS FILE SERVER
NAS IS A BOTTLENECK
CRIPPLES AS YOU GROW
DDN A3I ARCHITECTURE - TRUE END-TO-END PARALLELISM: A3I CLIENTS, EMBEDDED A3I SERVERS
DDN IS FAST AND RELIABLE
SCALES SEAMLESSLY AS YOU GROW
INFINITE DGX POD
PERFORMANCE
[Chart: images per second (0 to 140,000) vs. number of GPUs (8, 16, 32, 64, 128), NFS Storage compared with DDN Storage]
At-scale multi-node distributed training
application using 8 NVIDIA V100 GPUs per
server engaged simultaneously.
Results demonstrate linear scaling, with full
application performance up to 128 GPUs.
NFS storage, limited by its architecture and protocol,
stalls at 4 nodes, even with all-flash disks and a
high-speed InfiniBand network.
DDN A3I WITH DGX POD
SCALES SEAMLESSLY WITHOUT BOUNDS
NFS MAX
DDN LIMITLESS
SCALING
DDN A3I SOLUTIONS – NVIDIA SUPERPOD REFERENCE ARCHITECTURE
► AI400 + DGX-2 SuperPOD at-scale testing and
validation published by NVIDIA:
• The AI400 All-flash appliance delivers incredible
sequential and random read performance, as
required by the heaviest DL workloads.
• Metadata performance scales well from 1 to 96
nodes, with no degradation as the number of
nodes and threads increases.
• The AI400 is a fully-integrated platform that’s easy
to deploy. DDN provides excellent technical
deployment and support services.
► RA document available from NVIDIA website
NVIDIA DGX-2 SUPERPOD
REFERENCE ARCHITECTURE
DDN A3I SCALES FLEXIBLY TO MATCH YOUR SUPERPOD ENVIRONMENT
ALL-NVME: ALL-FLASH SOLUTION FOR MAXIMUM PERFORMANCE
MULTI-TIER: HYBRID OPTIMIZED FOR MIXED WORKFLOWS
MULTI-SITE: DISTRIBUTED SYSTEMS, PER DATACENTER CAPACITY
MULTI-CLOUD: FULL-CLOUD INTEGRATION AND BETWEEN CLOUDS
DDN A3I MULTIRAIL ENABLES PLUG-AND-PLAY HPC NETWORKING
Fast, secure, resilient
networking made easy
An enhanced algorithm groups multiple network
interfaces to achieve the full aggregate
throughput capability of a node.
Intelligent interface selection and traffic
management deliver unprecedented node
performance and dynamic load-balancing.
Active link health monitoring ensures rapid
failure detection and automatic recovery.
Multi-Rail Networking Discrete Networking
DDN A3I Multirail greatly simplifies DGX deployments.
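
The three ideas on this slide (grouping interfaces, load-balanced interface selection, and recovery after a link failure) can be pictured with a toy model. This is only an illustrative sketch, not DDN's Multirail algorithm: the class names, the least-loaded selection policy, the interface names, and the 100 Gb/s link speed are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Interface:
    """Hypothetical network interface (rail) on a node."""
    name: str
    gbps: float          # assumed link speed, for illustration only
    healthy: bool = True
    bytes_sent: int = 0


@dataclass
class MultiRailGroup:
    """Toy group that spreads transfers over all healthy interfaces."""
    interfaces: list = field(default_factory=list)

    def aggregate_gbps(self) -> float:
        # Aggregate capacity is the sum over healthy links only.
        return sum(i.gbps for i in self.interfaces if i.healthy)

    def pick(self) -> Interface:
        # Assumed policy: the healthy interface with the least traffic
        # so far, normalised by its link speed.
        healthy = [i for i in self.interfaces if i.healthy]
        if not healthy:
            raise RuntimeError("no healthy interface available")
        return min(healthy, key=lambda i: i.bytes_sent / i.gbps)

    def send(self, nbytes: int) -> str:
        iface = self.pick()
        iface.bytes_sent += nbytes
        return iface.name

    def mark_failed(self, name: str) -> None:
        # Stand-in for link health monitoring detecting a failure.
        for i in self.interfaces:
            if i.name == name:
                i.healthy = False


group = MultiRailGroup([Interface(f"ib{n}", 100.0) for n in range(4)])
used = [group.send(1 << 20) for _ in range(8)]        # spread over 4 rails
group.mark_failed("ib2")                              # simulated link failure
used_after = [group.send(1 << 20) for _ in range(6)]  # traffic avoids ib2
```

In this model the application calls only `send`; which rail carries the data, and what happens when one fails, stays inside the group.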
DDN A3I - GPUDIRECT TO STORAGE DOUBLES DGX-2 THROUGHPUT!
80 GB/s per client,
2X performance gains
Native client integration with A3I, fully
transparent to users and applications
Enables a direct path to transfer data
between GPU memory and data storage
Eliminates unnecessary memory copies,
lowers CPU overhead, reduces latency,
bypasses hardware architecture limitations
Improves AI, DL, HPC application performance
[Chart: GPU READ THROUGHPUT WITH TWO DDN AI400. CPU IO (no GPUDirect): 39.8 GB/s. GPUDirect: 80 GB/s.]
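
The "doubles throughput" claim follows directly from the two chart values. A minimal sketch of the arithmetic, where the function name `speedup` is just an illustrative helper:

```python
def speedup(baseline_gbps: float, accelerated_gbps: float) -> float:
    """Ratio of accelerated throughput to baseline throughput."""
    if baseline_gbps <= 0:
        raise ValueError("baseline must be positive")
    return accelerated_gbps / baseline_gbps


cpu_io_gbps = 39.8     # chart value: no GPUDirect
gpudirect_gbps = 80.0  # chart value: with GPUDirect

gain = speedup(cpu_io_gbps, gpudirect_gbps)
print(f"GPUDirect gain: {gain:.2f}x")  # prints "GPUDirect gain: 2.01x"
```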
[Chart: GPU READ THROUGHPUT SCALING WITH TEN AI400s, from 1 to 5 DGX-2 systems (axis 0 to 400 GB/s). 5 DGX-2 with 10 AI400: 400 GB/s.]
DDN DELIVERS LINEAR PERFORMANCE SCALING
DDN A3I GPUDirect integration delivers
up to 80 GB/s of throughput per DGX-2
Enables a direct path to transfer data
between GPU memory and data storage
Performance scales linearly and provides
maximum at-scale application acceleration
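
As a rough check of the linear-scaling claim, the per-node figure can be multiplied out and compared with the 400 GB/s reported for five DGX-2 systems. This is a sketch using only numbers from the slides. The helper names are illustrative, and the per-appliance figure is an implied average rather than a published specification.

```python
PER_DGX2_GBPS = 80.0  # slide figure: up to 80 GB/s per DGX-2 with GPUDirect


def ideal_throughput(num_dgx2: int, per_node_gbps: float = PER_DGX2_GBPS) -> float:
    """Throughput expected if every added DGX-2 contributes its full rate."""
    return num_dgx2 * per_node_gbps


def scaling_efficiency(measured_gbps: float, num_dgx2: int) -> float:
    """Measured throughput as a fraction of the ideal linear value."""
    return measured_gbps / ideal_throughput(num_dgx2)


ideal = {n: ideal_throughput(n) for n in range(1, 6)}
efficiency = scaling_efficiency(400.0, 5)  # 400 GB/s reported for 5 DGX-2
per_ai400_gbps = 400.0 / 10                # implied average across ten AI400s

for n, gbps in ideal.items():
    print(f"{n} DGX-2: {gbps:.0f} GB/s")
```

The implied 40 GB/s per AI400 agrees with the 80 GB/s read throughput quoted for two AI400 appliances.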
Preprocess Classify Manage Train Tune Infer
Supercomputing, Multi-Physics Workflows, AMR, Checkpoint
User & Service Management
Data Management
Multi-Tenanted
Security
More Value from All
your Data In One Place
► Your data in the right place at the right time
► Multicloud Ready
► The right user and service management to aid your
environment’s efficiencies
► Comprehensive Security for Containers, Cloud and On-Prem
Transparent Tiering
NFS SMB HDFS S3
REAL-TIME
EXASCALE
ANALYTICS
DDN A3I INTEGRATED MONITORING
PLATFORM DELIVERS CURRENT AND
HISTORICAL METRICS TO UNDERSTAND
SYSTEM PERFORMANCE, OPTIMIZE
OPERATIONS, AND ACHIEVE THE FULL
POTENTIAL OF YOUR APPLICATIONS.
DISCOVER INSIGHTFUL TRENDS TO
OPTIMIZE YOUR SUPERPOD
DDN A3I - PARALLEL DATA PATHS TO CONTAINERS
Network Infrastructure
Host Operating System
Containerized Apps
DDN A3I Container-Optimized Client
DDN A3I Appliances
NGC
Full NVMe Performance,
In-Container
DDN A3I enables seamless, high-speed file-level
access to shared storage directly from
containerized applications at runtime
Compartmentalized data access and
multi-tenancy with trusted levels of segregation
The capability is inserted at runtime with a
universal wrapper and requires no
modification to the application or container
DDN A3I: SCALE SECURELY WITH SUPERIOR VISIBILITY AND CONTROL
► AUTHENTICATION: Establish user and node identity with full confidence.
► ACCESS CONTROL: Enforce policy and multiple levels of classification.
► MULTITENANCY: Share infrastructure to enable limitless at-scale flexibility.
► ENCRYPTION: Secure all your data end-to-end, live and at rest.
► AUDITING: Record and retain activity for review and compliance.
SECURED
INNOVATION
WITH MULTITENANCY
Shared access to high-performance
infrastructure enables the most productive
and efficient collaborations at-scale.
Container-based authentication allows
your business to securely and seamlessly
deliver the right data to the right teams.
DDN LIMITLESS FLEXIBILITY AND SCALABILITY FOR ALL WORKLOADS
DDN A3I
MASSIVELY
ACCELERATES
LIFE SCIENCES
RESEARCH
100X
FASTER GENOMICS
1600X
FASTER MICROSCOPY
40X FASTER
GENOMICS
WITH DDN A3I SOLUTIONS
AND NVIDIA DGX AT SCALE!
• Robust national biobank improved through
wider access to data and better performance
through an expanding storage infrastructure
• HPC computational resources: 300 compute
node cluster, 3 shared memory compute
nodes, and NVIDIA DGX systems
• 29 PB of data and growing
• 40X speedup of genomic analysis with DDN,
NVIDIA and Parabricks
DDN A3I ACCELERATED LIFE SCIENCES RESEARCH DATA SOLUTIONS
STORE, PROCESS, ANALYZE, VISUALIZE
FROM A SINGLE DATA PLATFORM
CAPTURE IN REAL TIME
FROM MULTIPLE INSTRUMENTS
THE FASTEST
TIME TO RESULTS
TURNKEY SUPERPOD SOLUTIONS FROM DDN AND GLOBAL PARTNERS
NVIDIA SuperPOD approved partners
delivering completely integrated solutions
with DDN A3I end-to-end workflow
enablement and acceleration.
Jade -- 1st SuperPOD Worldwide
Oxford University and The Hartree Centre
SuperPOD solution delivered by ATOS
ACCELERATE YOUR
SUPERPOD WITH AI400XX
ACHIEVE FULL GPU PERFORMANCE
GROW SEAMLESSLY TO EXASCALE
BOOST WORKLOADS WITH GPUDIRECT
ENABLE END-TO-END AI WORKFLOWS
RELY ON THE AT-SCALE EXPERTS
DDN.COM/A3I
