© 2019 Mellanox Technologies
Gilad Shainer, MUG, August 2019
InfiniBand In-Network Computing Technology and Roadmap
The Need for Intelligent and Faster Interconnect
CPU-Centric (Onload): Must Wait for the Data, Creates Performance Bottlenecks
Data-Centric (Offload): Analyze Data as it Moves! Higher Performance and Scale
Faster Data Speeds and In-Network Computing Enable Higher Performance and Scale
[Diagram: Onload Network vs. In-Network Computing, each connecting CPU and GPU nodes]
Application
• Data Analysis
• Real Time
• Deep Learning
Communication
• Mellanox SHARP In-Network Computing
• MPI Tag Matching
• MPI Rendezvous
Network
• Network Transport Offload
• RDMA and GPU-Direct RDMA
• SHIELD (Self-Healing Network)
• Enhanced Adaptive Routing and Congestion Control
Connectivity
• Multi-Host Technology
• Socket-Direct Technology
• Enhanced Topologies
Accelerating All Levels of HPC / AI Frameworks
Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)
• Reliable Scalable General Purpose Primitive
  • In-network Tree based aggregation mechanism
  • Large number of groups
  • Multiple simultaneous outstanding operations
• Applicable to Multiple Use-cases
  • HPC Applications using MPI / SHMEM
  • Distributed Machine Learning applications
• Scalable High Performance Collective Offload
  • Barrier, Reduce, All-Reduce, Broadcast and more
  • Sum, Min, Max, Min-loc, Max-loc, OR, XOR, AND
  • Integer and Floating-Point, 16/32/64 bits
[Diagram: five hosts send data up a tree of switches; each switch aggregates the data it receives, and the aggregated result is returned down the tree to the hosts]
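The tree-based aggregation in the diagram can be sketched in a few lines of Python. This is a minimal simulation under stated assumptions, not Mellanox's implementation: hosts are plain numbers, each switch is a list of its children, and the names `OPS`, `aggregate`, `count_hosts`, and `allreduce` are hypothetical. The five-host, three-switch layout mirrors the diagram.

```python
from functools import reduce
import operator

# A few of the reduction operators named on the slide, as Python callables.
OPS = {
    "sum": operator.add,
    "min": min,
    "max": max,
    "and": operator.and_,
    "or": operator.or_,
    "xor": operator.xor,
}

def aggregate(node, op):
    """Reduce everything below `node` to one value.

    Assumption: a host is a bare number, a switch is a list of children.
    """
    if not isinstance(node, list):
        return node  # host: contributes its own data
    # switch: combine its children's partial results and pass one value up
    return reduce(OPS[op], (aggregate(child, op) for child in node))

def count_hosts(node):
    """Count the hosts (leaves) below `node`."""
    if not isinstance(node, list):
        return 1
    return sum(count_hosts(child) for child in node)

def allreduce(tree, op):
    """Aggregate up the tree, then hand the same result to every host."""
    result = aggregate(tree, op)
    return [result] * count_hosts(tree)

# Root switch over two leaf switches, with three and two hosts.
tree = [[1, 2, 3], [4, 5]]
print(allreduce(tree, "sum"))
print(aggregate(tree, "max"))
```

Each switch forwards a single partial result upward, so the root combines one value per child switch rather than one value per host.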
© 2019 Mellanox Technologies 6
SHARP AllReduce Performance Advantages (128 Nodes)
SHARP enables 75% Reduction in Latency
Providing Scalable Flat Latency
© 2019 Mellanox Technologies 7
SHARP AllReduce Performance Advantages
1500 Nodes, 60K MPI Ranks, Dragonfly+ Topology
SHARP Enables Highest Performance
© 2019 Mellanox Technologies 8
NCCL-SHARP Delivers Highest Performance
© 2019 Mellanox Technologies 9
SHARP Performance Advantage for AI
• SHARP provides 16% Performance Increase for deep learning, initial results
• TensorFlow with Horovod running ResNet50 benchmark, HDR InfiniBand (ConnectX-6, Quantum)
[Chart: performance increase with SHARP, data labels 16% and 11%]
P100 NVIDIA GPUs, RH 7.5, Mellanox OFED 4.4, HPC-X v2.3, TensorFlow v1.11, Horovod 0.15.0
© 2019 Mellanox Technologies 10
MPI Tag Matching Hardware Engine
© 2019 Mellanox Technologies 11
Tag Matching Hardware Engine Performance Advantage
[Chart: MPI Latency (Eager). Latency (us) vs. Message Size (byte), 0 to 16K. MVAPICH2 vs. MVAPICH2+HW-TM. Data label: 35%]
[Chart: MPI iscatterv (1,280 Processes). Latency (us) vs. Message Size (byte), 16K to 512K. MVAPICH2 vs. MVAPICH2+HW-TM. Data label: 1.8X]
Courtesy of Dhabaleswar K. (DK) Panda
Ohio State University
© 2019 Mellanox Technologies 12
GPUDirect
© 2019 Mellanox Technologies 13
Mellanox PeerDirect™ Technology
• Purpose-built for acceleration of Deep Learning
• Provides significant decrease in communication latency for acceleration devices
• Peer-to-peer communications between Mellanox adapters and third-party devices
• Enables GPUDirect™ RDMA, GPUDirect™ ASYNC, ROCm and others
[Diagram: two data paths, each through CPU, Chipset, and Vendor Device]
Designed for Deep Learning Acceleration
© 2019 Mellanox Technologies 14
10X Higher Performance with GPUDirect™ RDMA
• Accelerates HPC and Deep Learning performance
• Lowest communication latency for GPUs
[Chart: GPUDirect™ RDMA]
Courtesy of Dhabaleswar K. (DK) Panda
Ohio State University
© 2019 Mellanox Technologies 15
Quality of Service
© 2019 Mellanox Technologies 16
InfiniBand Quality of Service
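[Diagram: User / Workload Category mapped to Service Level, mapped to Virtual Lanes over Physical Link]
User / Workload Category: User 1, User 2, User 3, User 4, Other, Clock Sync, Backup, Storage, MPI, MPI, Network
Service Level: SL 0-3, SL 4, SL 6, SL 8, SL 10, SL 12
Virtual Lanes over Physical Link: VL-0, VL-1, VL-2, VL-4, VL-5, VL-6
Weights: W 32, W 32, W 32, W 64, W 64, W 64
Arbitration: High Priority VL Arbitrary, Low Priority VL Arbitrary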
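As a rough illustration of what the weights in the diagram imply, the sketch below computes each virtual lane's share of link bandwidth under weighted arbitration. It assumes all lanes are backlogged and served from the same arbitration table, so that share is proportional to weight. The pairing of lanes to weights in `VL_WEIGHTS` is hypothetical, since the exact mapping cannot be recovered from the slide, and `bandwidth_shares` is an illustrative name.

```python
from fractions import Fraction

# Hypothetical pairing: the slide lists six lanes and six weights,
# but which weight belongs to which lane is an assumption here.
VL_WEIGHTS = {
    "VL-0": 32, "VL-1": 32, "VL-2": 32,
    "VL-4": 64, "VL-5": 64, "VL-6": 64,
}

def bandwidth_shares(weights):
    """Share of the link each lane gets when every lane always has data."""
    total = sum(weights.values())
    return {vl: Fraction(w, total) for vl, w in weights.items()}

shares = bandwidth_shares(VL_WEIGHTS)
for vl, share in shares.items():
    print(f"{vl}: {share} of the link")
```

Under these assumptions a lane with weight 64 gets twice the bandwidth of a lane with weight 32.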
© 2019 Mellanox Technologies 17
SHIELD
Self Healing Technology
© 2019 Mellanox Technologies 18
SHIELD - Self Healing Technology
• The ability to overcome network failures, locally, by the switches
• Software-based solutions suffer from long delays detecting network failures
  • 5-30 seconds for 1K to 10K nodes clusters
• Accelerates network recovery time by 5000X
• The higher the speed or scale the greater the recovery value
• Available with EDR and HDR switches and beyond
Enables Unbreakable Data Centers
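To put the 5000X figure in concrete terms, the sketch below divides the 5-30 second software recovery range by that factor. It assumes the 5000X speedup applies directly to those quoted delays, which the slide implies but does not state; the function name `accelerated_recovery_ms` is illustrative.

```python
def accelerated_recovery_ms(software_seconds, speedup=5000):
    """Recovery time in milliseconds after applying the quoted speedup."""
    return software_seconds * 1000 / speedup

for seconds in (5, 30):
    print(f"{seconds} s in software -> {accelerated_recovery_ms(seconds)} ms")
```

Under that assumption, recovery drops from seconds to roughly 1 to 6 milliseconds.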
© 2019 Mellanox Technologies 19
Adaptive Routing
© 2019 Mellanox Technologies 20
InfiniBand Proven Adaptive Routing Performance
• Oak Ridge National Laboratory – Coral Summit supercomputer
• Bisection bandwidth benchmark, based on mpiGraph
• Explores the bandwidth between possible MPI process pairs
• AR results demonstrate an average performance of 96% of the maximum bandwidth measured
mpiGraph explores the bandwidth between possible MPI process pairs. In the histograms, the single cluster with AR indicates that all pairs achieve nearly maximum bandwidth while single-path static routing has nine clusters as congestion limits bandwidth, negatively impacting overall application performance.
“The Design, Deployment, and Evaluation of the CORAL Pre-Exascale Systems”,
Sudharshan S. Vazhkudai, Arthur S. Bland, Al Geist, Christopher J. Zimmer, Scott Atchley, Sarp Oral, Don
E. Maxwell, Veronica G. Vergara Larrea, Wayne Joubert, Matthew A. Ezell, Dustin Leverman, James H.
Rogers, Drew Schmidt, Mallikarjun Shankar, Feiyi Wang, Junqi Yin (Oak Ridge National Laboratory) and
Bronis R. de Supinski, Adam Bertsch, Robin Goldstone, Chris Chambreau, Ben Casses, Elsa Gonsiorowski,
Ian Karlin, Matthew L. Leininger, Adam Moody, Martin Ohmacht, Ramesh Pankajakshan, Fernando
Pizzano, Py Watson, Lance D. Weems (Lawrence Livermore National Laboratory) and James Sexton, Jim
Kahle, David Appelhans, Robert Blackmore, George Chochia, Gene Davison, Tom Gooding, Leopold
Grinberg, Bill Hanson, Bill Hartner, Chris Marroquin, Bryan Rosenburg, Bob Walkup (IBM)
[Figure: InfiniBand High Network Efficiency - mpiGraph, Oak Ridge National Lab Summit Supercomputer. Panels: Static Routing, Adaptive Routing]
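One plausible reading of "an average performance of 96% of the maximum bandwidth measured" is the mean pair bandwidth divided by the best pair bandwidth. The sketch below computes that ratio. The sample bandwidths are made up for illustration and are not Summit measurements, and `average_fraction_of_peak` is a hypothetical helper, not part of mpiGraph.

```python
def average_fraction_of_peak(pair_bandwidths):
    """Mean per-pair bandwidth as a fraction of the best pair's bandwidth."""
    peak = max(pair_bandwidths)
    return sum(pair_bandwidths) / (len(pair_bandwidths) * peak)

# Made-up bandwidths (GB/s) for four process pairs.
static_pairs = [12.0, 6.0, 3.0, 3.0]       # congestion spreads pairs out
adaptive_pairs = [12.0, 12.0, 11.0, 11.0]  # pairs cluster near the peak

print(f"static:   {average_fraction_of_peak(static_pairs):.0%}")
print(f"adaptive: {average_fraction_of_peak(adaptive_pairs):.0%}")
```

A tight single cluster near the peak yields a ratio close to 1, while widely spread pairs pull the average down.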
© 2019 Mellanox Technologies 21
HDR InfiniBand
© 2019 Mellanox Technologies 22
Highest-Performance 200Gb/s InfiniBand Solutions
Transceivers, Active Optical and Copper Cables (10 / 25 / 40 / 50 / 56 / 100 / 200Gb/s)
40 HDR (200Gb/s) InfiniBand Ports, 80 HDR100 InfiniBand Ports, Throughput of 16Tb/s, <90ns Latency
200Gb/s Adapter, 0.6us latency, 215 million messages per second (10 / 25 / 40 / 50 / 56 / 100 / 200Gb/s)
MPI, SHMEM/PGAS, UPC, For Commercial and Open Source Applications, Leverages Hardware Accelerations
System on Chip and SmartNIC, Programmable adapter, Smart Offloads
© 2019 Mellanox Technologies 23
ConnectX-6 HDR InfiniBand Adapter
Leading Connectivity
• 200Gb/s InfiniBand and Ethernet
• HDR, HDR100, EDR (100Gb/s) and lower speeds
• 200GbE, 100GbE and lower speeds
• Single and dual ports
Leading Performance
• 200Gb/s throughput, 0.6usec latency, 215 million messages per second
• PCIe Gen3 / Gen4, 32 lanes
• Integrated PCIe switch
• Multi-Host - up to 8 hosts, supporting 4 dual-socket servers
Leading Features
• In-network computing and memory for HPC collective offloads
• Security – Block-level encryption to storage, key management, FIPS
• Storage – NVMe Emulation, NVMe-oF target, Erasure coding, T10/DIF
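A rough check of why the lane count matters for a 200Gb/s adapter: the sketch below estimates raw PCIe link bandwidth. It assumes per-lane signalling rates of 8 GT/s for Gen3 and 16 GT/s for Gen4 with 128b/130b encoding, and it ignores packet and protocol overhead, so real throughput is lower. The name `pcie_gbps` is illustrative.

```python
# Per-lane signalling rate in GT/s (assumed from the PCIe generations named on the slide).
GT_PER_LANE = {"gen3": 8.0, "gen4": 16.0}
ENCODING = 128 / 130  # 128b/130b line encoding used by Gen3 and Gen4

def pcie_gbps(gen, lanes):
    """Raw link bandwidth in Gb/s, before packet and protocol overhead."""
    return GT_PER_LANE[gen] * lanes * ENCODING

for gen, lanes in (("gen3", 16), ("gen3", 32), ("gen4", 16)):
    print(f"{gen} x{lanes}: {pcie_gbps(gen, lanes):.1f} Gb/s")
```

Under these assumptions a Gen3 x16 link falls short of 200Gb/s, while Gen3 x32 or Gen4 x16 clears it.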
© 2019 Mellanox Technologies 24
HDR InfiniBand Switches
40 QSFP56 ports
• 40 ports of HDR, 200G
• 80 ports of HDR100, 100G
800 QSFP56 ports
• 800 ports of HDR, 200G
• 1600 ports of HDR100, 100G
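The 16Tb/s throughput quoted earlier in the deck for the 40-port switch lines up with the port counts here if both directions of every port are counted. That bidirectional convention is an assumption, as is the helper name `aggregate_tbps`. The 800-port result is simply the same arithmetic applied to the larger switch, not a figure taken from the deck.

```python
def aggregate_tbps(ports, gbps_per_port, bidirectional=True):
    """Aggregate switch throughput in Tb/s for a given port count and speed."""
    directions = 2 if bidirectional else 1
    return ports * gbps_per_port * directions / 1000

print(aggregate_tbps(40, 200))   # 40 HDR ports
print(aggregate_tbps(80, 100))   # the same switch split into 80 HDR100 ports
print(aggregate_tbps(800, 200))  # 800 HDR ports
```

Splitting each HDR port into two HDR100 ports leaves the aggregate unchanged, since the port count doubles while the per-port speed halves.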
© 2019 Mellanox Technologies 25
Real Time Network Visibility
Network status/health in real time
Advanced monitoring for troubleshooting
• 8 mirror agents triggered by congestion, buffer usage and latency
• Measure queue depth using histograms (64ns granularity)
• Buffer snapshots
• Congestion notifications and buffers status
Built-in Hardware Sensors for Rich Traffic Telemetry and Data Collection
© 2019 Mellanox Technologies 26
BlueField SoC
Advantages and Platforms
© 2019 Mellanox Technologies 27
BlueField Block Diagram
• Tile Architecture - 16 ARM® A72 CPUs subsystem
  • SkyMesh™ fully coherent low-latency interconnect
  • 8MB L2 Cache, 8 Tiles
• Dual Port 100G IO Controller, based on ConnectX-5
  • Dual 100Gb/s Ethernet/InfiniBand, compatible with ConnectX-5
  • NVMe-oF hardware accelerator
  • High-end Networking Offloads: RDMA, Erasure Coding, T10-DIF
• Fully Integrated PCIe switch
  • 32 Bifurcated PCIe Gen3/4 lanes (up to 200Gb/s)
  • Root Complex or Endpoint modes
  • 2x16, 4x8, 8x4 or 16x2 configurations
• Memory Controllers
  • 2x Channels DDR4 Memory Controllers w/ ECC
  • NVDIMM-N Support
[Block diagram labels: Dual VPI Ports, Ethernet/InfiniBand: 1, 10, 25, 40, 50, 100G; 32-lanes PCIe Gen3/4]
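Each bifurcation option listed above splits the same 32 lanes into a different number of links. The short sketch below checks that, reading "2x16" as two links of 16 lanes each; that reading and the helper name `total_lanes` are assumptions.

```python
def total_lanes(config):
    """Total lanes for a 'LINKSxWIDTH' bifurcation string such as '4x8'."""
    links, width = config.lower().split("x")
    return int(links) * int(width)

CONFIGS = ["2x16", "4x8", "8x4", "16x2"]
totals = {config: total_lanes(config) for config in CONFIGS}
print(totals)
```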
© 2019 Mellanox Technologies 28
BlueField for Smart Solutions
BlueField SoC (System on Chip)
• SoC: Compute, networking and PCIe connectivity
• Dual port VPI EDR/100GbE
• 16 Arm cores
• 32 lanes of PCIe switch Gen3/4
Storage Solutions
• NVMe-based storage platforms
• RDMA, NVMe over Fabrics, RAID, Signature offload
• Partner’s solutions based on BlueField storage controller
Smart Adapters
• In-network computing and collective offloads
• Co-processor running proprietary smart algorithms
• Security and privacy algorithms
© 2019 Mellanox Technologies 29
BlueField Smart Adapter is a Computer
[Diagram labels: CPU, L2/3 Cache, Memory, Hardware-based accelerators, Network Adapter, A fully functioning Operating System]
© 2019 Mellanox Technologies 30
Highest Performance and Scalability for Exascale Platforms
7X Higher Performance
96% Network Utilization
Flat Latency
5000X Higher Resiliency
2X Higher Performance (Deep Learning)
HDR 200G, NDR 400G, XDR 1000G
© 2019 Mellanox Technologies 31
HDR 200G InfiniBand Accelerated Supercomputers
[Figure labels: 5, 62, 166, 264; India’s National Supercomputing Program; World’s First HDR InfiniBand Supercomputer]
Thank You

Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 

Recently uploaded (20)

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 

InfiniBand In-Network Computing Technology and Roadmap

  • 1. InfiniBand In-Network Computing Technology and Roadmap. Gilad Shainer, MUG, August 2019. © 2019 Mellanox Technologies
  • 2. The Need for Intelligent and Faster Interconnect
      - CPU-Centric (Onload): Must Wait for the Data, Creates Performance Bottlenecks
      - Data-Centric (Offload): Analyze Data as it Moves! Higher Performance and Scale
      - Faster Data Speeds and In-Network Computing Enable Higher Performance and Scale
      - [Figure: CPU and GPU nodes attached to an Onload Network versus In-Network Computing]
  • 3. Accelerating All Levels of HPC / AI Frameworks
      - Application: Data Analysis; Real Time; Deep Learning
      - Communication and Network: Mellanox SHARP In-Network Computing; MPI Tag Matching; MPI Rendezvous; Network Transport Offload; RDMA and GPU-Direct RDMA; SHIELD (Self-Healing Network); Enhanced Adaptive Routing and Congestion Control
      - Connectivity: Multi-Host Technology; Socket-Direct Technology; Enhanced Topologies
      - [Figure label: GPUDirect RDMA]
  • 4. Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)
  • 5. Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)
      - Reliable Scalable General Purpose Primitive
          - In-network tree-based aggregation mechanism
          - Large number of groups
          - Multiple simultaneous outstanding operations
      - Applicable to Multiple Use-cases
          - HPC Applications using MPI / SHMEM
          - Distributed Machine Learning applications
      - Scalable High Performance Collective Offload
          - Barrier, Reduce, All-Reduce, Broadcast and more
          - Sum, Min, Max, Min-loc, Max-loc, OR, XOR, AND
          - Integer and Floating-Point, 16/32/64 bits
      - [Diagram: Hosts send Data up a tree of Switches; each Switch forwards an Aggregated Result, and the final Aggregated Result is returned to the Hosts]
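The tree-based aggregation listed on slide 5 can be illustrated with a small host-side simulation. This is a minimal sketch and not Mellanox's implementation: the function name `tree_allreduce`, the `radix` parameter, and the sample values are all assumptions made for illustration. Each pass of the loop stands in for one layer of switches that combines the partial results from its children and forwards a single aggregated value upward. The root's result is then handed back to every host.

```python
from functools import reduce
import operator


def tree_allreduce(host_values, op=operator.add, radix=2):
    """Simulate an in-network tree reduction followed by a broadcast.

    host_values: one value per host (hypothetical sample data).
    op: an associative binary reduction such as operator.add, min, or max.
    radix: number of children aggregated by each simulated switch.
    Returns (per-host results, number of switch levels traversed).
    """
    if not host_values:
        raise ValueError("need at least one host value")
    level = list(host_values)
    depth = 0
    # Each pass models one layer of switches aggregating its children.
    while len(level) > 1:
        level = [
            reduce(op, level[i:i + radix])
            for i in range(0, len(level), radix)
        ]
        depth += 1
    # The root now holds the aggregated result; give it to every host.
    return [level[0]] * len(host_values), depth


if __name__ == "__main__":
    results, depth = tree_allreduce([3, 1, 4, 1, 5], op=operator.add, radix=2)
    print(results, depth)
```

In this model the number of switch levels grows logarithmically with the number of hosts, which is one way to read the "scalable" claim on the slide.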
  • 6. SHARP AllReduce Performance Advantages (128 Nodes): SHARP enables 75% Reduction in Latency, Providing Scalable Flat Latency
  • 7. SHARP AllReduce Performance Advantages: 1500 Nodes, 60K MPI Ranks, Dragonfly+ Topology. SHARP Enables Highest Performance
  • 8. NCCL-SHARP Delivers Highest Performance
  • 9. SHARP Performance Advantage for AI
      - SHARP provides 16% Performance Increase for deep learning, initial results
      - TensorFlow with Horovod running ResNet50 benchmark, HDR InfiniBand (ConnectX-6, Quantum)
      - [Chart annotations: 16%, 11%]
      - Configuration: P100 NVIDIA GPUs, RH 7.5, Mellanox OFED 4.4, HPC-X v2.3, TensorFlow v1.11, Horovod 0.15.0
  • 10. MPI Tag Matching Hardware Engine
  • 11. Tag Matching Hardware Engine Performance Advantage
      - [Chart: MPI Latency (Eager), latency (us) versus message size (bytes, 0 to 16K), MVAPICH2 versus MVAPICH2+HW-TM; annotated 35%]
      - [Chart: MPI iscatterv (1,280 Processes), latency (us) versus message size (bytes, 16K to 512K), MVAPICH2 versus MVAPICH2+HW-TM; annotated 1.8X]
      - Courtesy of Dhabaleswar K. (DK) Panda, Ohio State University
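For readers unfamiliar with what a tag-matching engine offloads, the sketch below models generic MPI-style matching in software. It is an illustration under stated assumptions, not the adapter's hardware design or MVAPICH2 code: the class `TagMatcher`, its method names, and the use of `None` as a wildcard are hypothetical. An arriving message is compared against the posted receives in order, and if nothing matches it waits in an unexpected-message queue until a matching receive is posted.

```python
from collections import deque

ANY = None  # stands in for MPI_ANY_SOURCE / MPI_ANY_TAG in this sketch


class TagMatcher:
    """Software model of MPI-style matching with two FIFO queues."""

    def __init__(self):
        self.posted = deque()      # receives waiting for a message
        self.unexpected = deque()  # messages that arrived before a receive

    @staticmethod
    def _matches(want_src, want_tag, src, tag):
        # A wildcard (ANY) matches every source or tag.
        return want_src in (ANY, src) and want_tag in (ANY, tag)

    def post_receive(self, src, tag):
        """Post a receive; return a queued payload if one already matches."""
        for msg in self.unexpected:
            if self._matches(src, tag, msg[0], msg[1]):
                self.unexpected.remove(msg)
                return msg[2]
        self.posted.append((src, tag))
        return None

    def on_arrival(self, src, tag, payload):
        """Handle an incoming message; return the matched receive, if any."""
        for recv in self.posted:
            if self._matches(recv[0], recv[1], src, tag):
                self.posted.remove(recv)
                return recv
        self.unexpected.append((src, tag, payload))
        return None
```

Both searches are linear in queue length, which is the per-message work that a hardware matching engine takes off the host CPU.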
  • 12. GPUDirect
  • 13. Mellanox PeerDirect™ Technology
      - Purpose-built for acceleration of Deep Learning
      - Provides significant decrease in communication latency for acceleration devices
      - Peer-to-peer communications between Mellanox adapters and third-party devices
      - Enables GPUDirect™ RDMA, GPUDirect™ ASYNC, ROCm and others
      - [Diagram: CPU, Chipset, and Vendor Device on each of two hosts, with data passing between them]
      - Designed for Deep Learning Acceleration
  • 14. 10X Higher Performance with GPUDirect™ RDMA
      - Accelerates HPC and Deep Learning performance
      - Lowest communication latency for GPUs
      - [Chart: GPUDirect™ RDMA. Courtesy of Dhabaleswar K. (DK) Panda, Ohio State University]
  • 15. Quality of Service
  • 16. InfiniBand Quality of Service
      - [Diagram: User / Workload Categories (User 1, User 2, User 3, User 4, Other, Clock Sync, Backup, Storage, MPI, Network) are assigned Service Levels (SL 0-3, SL 4, SL 6, SL 8, SL 10, SL 12), each with a weight (W 32 or W 64), and mapped onto Virtual Lanes over the Physical Link (VL-0, VL-1, VL-2, VL-4, VL-5, VL-6) under High Priority and Low Priority VL arbitration]
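The weights on slide 16 suggest that virtual lanes share the link in proportion to a configured weight. The following is a simplified weighted round-robin sketch of that idea, assuming one arbitration table and counting whole packets. The function name `weighted_round_robin` is hypothetical, and the weights 1 and 2 only preserve the 32:64 ratio from the slide, because which lane carries which weight did not survive extraction.

```python
from collections import deque


def weighted_round_robin(queues, weights, rounds=1):
    """Serve packets from per-lane queues in proportion to their weights.

    queues: dict mapping lane name -> deque of packets.
    weights: dict mapping lane name -> packets the lane may send per round.
    Returns the order in which packets were transmitted.
    """
    sent = []
    for _round in range(rounds):
        for lane, weight in weights.items():
            queue = queues.get(lane, deque())
            for _slot in range(weight):
                if not queue:
                    break  # lane is idle; its unused share is skipped
                sent.append(queue.popleft())
    return sent


if __name__ == "__main__":
    lanes = {
        "VL-0": deque(["a1", "a2", "a3"]),
        "VL-1": deque(["b1", "b2", "b3"]),
    }
    # Illustrative weights: VL-1 gets twice the share of VL-0.
    print(weighted_round_robin(lanes, {"VL-0": 1, "VL-1": 2}, rounds=2))
```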
  • 17. SHIELD Self Healing Technology
  • 18. SHIELD - Self Healing Technology
      - The ability to overcome network failures, locally, by the switches
      - Software-based solutions suffer from long delays detecting network failures
          - 5-30 seconds for 1K to 10K node clusters
      - Accelerates network recovery time by 5000X
      - The higher the speed or scale, the greater the recovery value
      - Available with EDR and HDR switches and beyond
      - Enables Unbreakable Data Centers
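As a quick check on what the SHIELD figures imply, the arithmetic below divides the quoted software recovery times by the quoted 5000X factor. It assumes the factor applies uniformly across the 5-30 second range and that the endpoints correspond to the 1K and 10K node cases. Both are readings of the slide, not statements from it, and the function name is hypothetical.

```python
SPEEDUP = 5000  # "5000X" as quoted on the slide

# Assumed pairing of the quoted range with cluster sizes.
software_recovery_s = {"1K nodes": 5.0, "10K nodes": 30.0}


def accelerated_recovery_ms(seconds, speedup=SPEEDUP):
    """Convert a software recovery time into the implied accelerated time."""
    return seconds * 1000.0 / speedup


if __name__ == "__main__":
    for label, seconds in software_recovery_s.items():
        ms = accelerated_recovery_ms(seconds)
        print(f"{label}: {seconds:.0f} s -> {ms:.0f} ms")
```

Under these assumptions the implied switch-local recovery time is on the order of 1 to 6 milliseconds.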
  • 19. Adaptive Routing
  • 20. InfiniBand Proven Adaptive Routing Performance
      - Oak Ridge National Laboratory, CORAL Summit supercomputer
      - Bisection bandwidth benchmark, based on mpiGraph
          - Explores the bandwidth between possible MPI process pairs
      - AR results demonstrate an average performance of 96% of the maximum bandwidth measured
      - In the histograms, the single cluster with AR indicates that all pairs achieve nearly maximum bandwidth, while single-path static routing has nine clusters as congestion limits bandwidth, negatively impacting overall application performance.
      - Source: “The Design, Deployment, and Evaluation of the CORAL Pre-Exascale Systems”, Sudharshan S. Vazhkudai, Arthur S. Bland, Al Geist, Christopher J. Zimmer, Scott Atchley, Sarp Oral, Don E. Maxwell, Veronica G. Vergara Larrea, Wayne Joubert, Matthew A. Ezell, Dustin Leverman, James H. Rogers, Drew Schmidt, Mallikarjun Shankar, Feiyi Wang, Junqi Yin (Oak Ridge National Laboratory) and Bronis R. de Supinski, Adam Bertsch, Robin Goldstone, Chris Chambreau, Ben Casses, Elsa Gonsiorowski, Ian Karlin, Matthew L. Leininger, Adam Moody, Martin Ohmacht, Ramesh Pankajakshan, Fernando Pizzano, Py Watson, Lance D. Weems (Lawrence Livermore National Laboratory) and James Sexton, Jim Kahle, David Appelhans, Robert Blackmore, George Chochia, Gene Davison, Tom Gooding, Leopold Grinberg, Bill Hanson, Bill Hartner, Chris Marroquin, Bryan Rosenburg, Bob Walkup (IBM)
      - [Figure: InfiniBand High Network Efficiency - mpiGraph, Oak Ridge National Lab Summit Supercomputer; histograms for Static Routing and Adaptive Routing]
  • 21. HDR InfiniBand
  • 22. Highest-Performance 200Gb/s InfiniBand Solutions
      - Transceivers, Active Optical and Copper Cables (10 / 25 / 40 / 50 / 56 / 100 / 200Gb/s)
      - 40 HDR (200Gb/s) InfiniBand Ports, 80 HDR100 InfiniBand Ports; Throughput of 16Tb/s, <90ns Latency
      - 200Gb/s Adapter, 0.6us latency, 215 million messages per second (10 / 25 / 40 / 50 / 56 / 100 / 200Gb/s)
      - MPI, SHMEM/PGAS, UPC; For Commercial and Open Source Applications; Leverages Hardware Accelerations
      - System on Chip and SmartNIC; Programmable adapter; Smart Offloads
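The 16Tb/s switch figure on slide 22 is consistent with the port counts if both directions of every port are counted. The short calculation below shows that reading. The bidirectional interpretation is an assumption, and the function name `aggregate_tbps` is hypothetical.

```python
def aggregate_tbps(ports, gbps_per_port, bidirectional=True):
    """Aggregate switch throughput in Tb/s for a given port configuration."""
    directions = 2 if bidirectional else 1
    return ports * gbps_per_port * directions / 1000.0


if __name__ == "__main__":
    # 40 HDR ports at 200 Gb/s, or 80 HDR100 ports at 100 Gb/s.
    print(aggregate_tbps(40, 200))
    print(aggregate_tbps(80, 100))
```

Both port configurations give the same aggregate, which matches the slide presenting them as two modes of the same switch.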
  • 23. ConnectX-6 HDR InfiniBand Adapter: Leading Connectivity, Leading Performance, Leading Features
      - 200Gb/s InfiniBand and Ethernet
          - HDR, HDR100, EDR (100Gb/s) and lower speeds
          - 200GbE, 100GbE and lower speeds
      - Single and dual ports
      - 200Gb/s throughput, 0.6usec latency, 215 million messages per second
      - PCIe Gen3 / Gen4, 32 lanes
      - Integrated PCIe switch
      - Multi-Host: up to 8 hosts, supporting 4 dual-socket servers
      - In-network computing and memory for HPC collective offloads
      - Security: Block-level encryption to storage, key management, FIPS
      - Storage: NVMe Emulation, NVMe-oF target, Erasure coding, T10/DIF
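The "PCIe Gen3 / Gen4, 32 lanes" entry is easier to appreciate next to the 200Gb/s port speed. The sketch below estimates raw PCIe link bandwidth from the per-lane transfer rate (8 GT/s for Gen3, 16 GT/s for Gen4) and the 128b/130b line encoding. It ignores packet and protocol overhead, so real throughput is lower, and the function name `pcie_gbps` is hypothetical.

```python
PCIE_GTPS = {"Gen3": 8.0, "Gen4": 16.0}  # transfer rate per lane, GT/s
ENCODING = 128 / 130  # 128b/130b line encoding used by Gen3 and Gen4


def pcie_gbps(gen, lanes):
    """Raw link bandwidth in Gb/s per direction, before protocol overhead."""
    return PCIE_GTPS[gen] * lanes * ENCODING


if __name__ == "__main__":
    for gen, lanes in [("Gen3", 16), ("Gen3", 32), ("Gen4", 16)]:
        print(f"{gen} x{lanes}: {pcie_gbps(gen, lanes):.1f} Gb/s")
```

By this estimate a Gen3 x16 link tops out at roughly 126 Gb/s, below a 200Gb/s port, while Gen3 x32 or Gen4 x16 provides roughly 252 Gb/s.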
  • 24. HDR InfiniBand Switches
      - 40 QSFP56 ports
          - 40 ports of HDR, 200G
          - 80 ports of HDR100, 100G
      - 800 QSFP56 ports
          - 800 ports of HDR, 200G
          - 1600 ports of HDR100, 100G
  • 25. Real Time Network Visibility
      - Network status/health in real time
      - Advanced monitoring for troubleshooting
      - 8 mirror agents triggered by congestion, buffer usage and latency
      - Measure queue depth using histograms (64ns granularity)
      - Buffer snapshots
      - Congestion notifications and buffer status
      - Built-in Hardware Sensors for Rich Traffic Telemetry and Data Collection
  • 26. BlueField SoC Advantages and Platforms
  • 27. BlueField Block Diagram
      - Tile Architecture - 16 ARM® A72 CPUs subsystem
          - SkyMesh™ fully coherent low-latency interconnect
          - 8MB L2 Cache, 8 Tiles
      - Dual Port 100G IO Controller, based on ConnectX-5
          - Dual 100Gb/s Ethernet/InfiniBand, compatible with ConnectX-5
          - NVMe-oF hardware accelerator
          - High-end Networking Offloads: RDMA, Erasure Coding, T10-DIF
      - Fully Integrated PCIe switch
          - 32 Bifurcated PCIe Gen3/4 lanes (up to 200Gb/s)
          - Root Complex or Endpoint modes
          - 2x16, 4x8, 8x4 or 16x2 configurations
      - Memory Controllers
          - 2x Channels DDR4 Memory Controllers w/ ECC
          - NVDIMM-N Support
      - [Diagram labels: Dual VPI Ports Ethernet/InfiniBand: 1, 10, 25, 40, 50, 100G; 32-lanes PCIe Gen3/4]
  • 28. BlueField for Smart Solutions
      - BlueField SoC (System on Chip)
          - SoC: Compute, networking and PCIe connectivity
          - Dual port VPI EDR/100GbE
          - 16 Arm cores
          - 32 lanes of PCIe switch Gen3/4
      - Storage Solutions
          - NVMe-based storage platforms
          - RDMA, NVMe over Fabrics, RAID, Signature offload
          - Partner’s solutions based on BlueField storage controller
      - Smart Adapters
          - In-network computing and collective offloads
          - Co-processor running proprietary smart algorithms
          - Security and privacy algorithms
  • 29. BlueField Smart Adapter is a Computer
      - [Diagram: Network Adapter containing CPU, L2/3 Cache, Memory, Hardware-based accelerators, and a fully functioning Operating System, alongside a conventional Network Adapter]
  • 30. Highest Performance and Scalability for Exascale Platforms
      - 7X Higher Performance
      - 96% Network Utilization
      - Flat Latency
      - 5000X Higher Resiliency
      - 2X Higher Performance Deep Learning
      - Roadmap: HDR 200G, NDR 400G, XDR 1000G
  • 31. HDR 200G InfiniBand Accelerated Supercomputers
      - India’s National Supercomputing Program
      - World’s First HDR InfiniBand Supercomputer
      - [Figure labels: 5, 62, 166, 264]
  • 32. Thank You