Interconnect Your Future With Mellanox
- 2. Mellanox Performance Advantage (Source TopCrunch)
2014 Results: Higher Performance with Half of the System Size!
LS-DYNA is advanced multiphysics simulation software (CAE) developed by LSTC
Used in the automotive, aerospace, military, manufacturing, and bioengineering industries
[Chart: LS-DYNA Applications, Car2Car Benchmark (Seconds) – CRAY XC30/Aries 2000 Cores, CRAY XC30/Aries 4000 Cores, FDR InfiniBand (SGI) 2000 Cores]
InfiniBand Delivers Highest System Performance, Efficiency and Scalability
All platforms use the same Intel® Xeon® E5-2690 v2 @ 3.00GHz CPUs; the Cray platform is connected with the Cray Aries interconnect, the SGI platform with Mellanox FDR InfiniBand
- 3. Mellanox Performance Advantage (Source HPC Advisory Council)
More than 2X Performance!
HOOMD-blue is a highly optimized, object-oriented many-particle dynamics application that performs general-purpose particle dynamics simulations
Developed by the University of Michigan
InfiniBand Delivers Highest System Performance, Efficiency and Scalability
- 4. InfiniBand Leadership in TOP500 Petascale-Capable Systems
Mellanox InfiniBand is the interconnect of choice for Petascale computing
• Accelerates 48% of the sustained Petaflop systems (19 systems out of 40)
- 5. Mellanox InfiniBand Connected Petascale Systems
Connecting Half of the World’s Petascale Systems
Mellanox Connected Petascale System Examples
- 6. InfiniBand’s Unsurpassed System Efficiency
Average Efficiency
• InfiniBand: 86%
• Cray: 80%
• 10GbE: 65%
• GigE: 44%
TOP500 systems listed according to their efficiency
InfiniBand is the key element responsible for the highest system efficiency
Mellanox delivers efficiencies of more than 97% with InfiniBand
- 7. Mellanox in the TOP500 Supercomputing List (Nov’13)
Mellanox FDR InfiniBand is the fastest interconnect solution on the TOP500
• More than 12GB/s throughput, less than 0.7usec latency
• Being used in 80 systems on the TOP500 list – 1.8X increase from the Nov’12 list
• Connects the fastest InfiniBand-based supercomputers – TACC (#7), LRZ (#10)
• Enables the two most efficient systems in the TOP200
Mellanox InfiniBand is the fastest interconnect technology on the list
• Enables the highest system utilization on the TOP500 – more than 97% system efficiency
• Enables the top seven highest utilized systems on the TOP500 list
Mellanox InfiniBand is the only Petascale-proven, standard interconnect solution
• Connects 19 out of the 40 Petaflop capable systems on the list
• Connects 4X the number of Cray-based systems in the TOP100, 6.5X in the TOP500
Mellanox’s end-to-end scalable solutions accelerate GPU-based systems
• GPUDirect RDMA technology enables faster communications and higher performance
- 8. System Example: NASA Ames Research Center Pleiades
20K InfiniBand nodes
Mellanox end-to-end FDR and QDR InfiniBand
Supports a variety of scientific and engineering projects
• Coupled atmosphere-ocean models
• Future space vehicle design
• Large-scale dark matter halos and galaxy evolution
[Image: Asian Monsoon Water Cycle – High-Resolution Climate Simulations]
- 9. Leading Supplier of End-to-End Interconnect Solutions
Comprehensive End-to-End Software Accelerators and Management
• MXM – Mellanox Messaging Acceleration
• FCA – Fabric Collectives Acceleration
• Management: UFM – Unified Fabric Management
• Storage and Data: VSA – Storage Accelerator (iSCSI), UDA – Unstructured Data Accelerator
Comprehensive End-to-End InfiniBand and Ethernet Portfolio
• ICs, Adapter Cards, Switches/Gateways, Host/Fabric Software, Metro/WAN, Cables/Modules
- 10. Converged Interconnect Solutions to Deliver Highest ROI for all Applications
Accelerating Half of the World’s Petascale Systems
Mellanox Connected Petascale System Examples
InfiniBand Enables Lowest Application Cost in the Cloud (Examples)
Business Success Depends on Mellanox
Dominant in Storage Interconnects
- 12. Virtual Protocol Interconnect (VPI) Technology
VPI Adapter and VPI Switch, with Switch OS Layer and Unified Fabric Manager
• Applications: storage, networking, clustering, management – with acceleration engines
• Ethernet: 10/40/56 Gb/s; InfiniBand: 10/20/40/56 Gb/s; PCIe 3.0
• Switch options: 64 ports 10GbE, 36 ports 40/56GbE, 48 10GbE + 12 40/56GbE, 36 ports IB up to 56Gb/s, 8 VPI subnets
• Adapter form factors: LOM, Adapter Card, Mezzanine Card
• From data center to campus and metro connectivity
Standard Protocols of InfiniBand and Ethernet on the Same Wire!
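The "same wire" claim is visible from software: a VPI port reports whether it is currently running InfiniBand or Ethernet. Below is a minimal sketch using libibverbs, assuming a VPI-capable adapter and the standard libibverbs/rdma-core userspace stack (link with -libverbs):

```c
/* Query each RDMA device port and report its current link layer.
 * On a VPI adapter the same device can show InfiniBand on one port
 * and Ethernet on another, depending on configuration. */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num_devices;
    struct ibv_device **devs = ibv_get_device_list(&num_devices);
    if (!devs) {
        perror("ibv_get_device_list");
        return 1;
    }

    for (int i = 0; i < num_devices; i++) {
        struct ibv_context *ctx = ibv_open_device(devs[i]);
        if (!ctx)
            continue;

        struct ibv_device_attr dev_attr;
        if (ibv_query_device(ctx, &dev_attr) == 0) {
            for (int port = 1; port <= dev_attr.phys_port_cnt; port++) {
                struct ibv_port_attr pattr;
                if (ibv_query_port(ctx, port, &pattr))
                    continue;
                printf("%s port %d: %s\n",
                       ibv_get_device_name(devs[i]), port,
                       pattr.link_layer == IBV_LINK_LAYER_ETHERNET ?
                           "Ethernet" : "InfiniBand");
            }
        }
        ibv_close_device(ctx);
    }

    ibv_free_device_list(devs);
    return 0;
}
```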
- 13. Mellanox ScalableHPC Communication Library to Accelerate Applications
Supports MPI, OpenSHMEM / PGAS, and Berkeley UPC
MXM – Mellanox Messaging Acceleration
• Reliable Messaging
• Hybrid Transport Mechanism
• Efficient Memory Registration
• Receive Side Tag Matching
FCA – Fabric Collectives Acceleration
• Topology Aware Collective Optimization
• Hardware Multicast
• Separate Virtual Fabric for Collectives
• CORE-Direct Hardware Offload
[Charts: Reduce and Barrier collective latency (us) vs. processes (PPN=8), with and without FCA]
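FCA sits underneath standard MPI collectives, so application code does not change; here is a minimal sketch of the Reduce and Barrier patterns benchmarked in the charts above (how FCA acceleration is enabled is distribution-specific, typically an mpirun option or environment variable, and is not shown):

```c
/* The collectives measured in the charts above, written against plain MPI.
 * When FCA/CORE-Direct acceleration is available, the MPI library applies
 * it underneath these calls without source changes. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = (double)rank, sum = 0.0;

    /* "Reduce collective latency" case: reduction to rank 0 */
    MPI_Reduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    /* "Barrier collective latency" case: global synchronization */
    MPI_Barrier(MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of ranks = %.0f\n", sum);

    MPI_Finalize();
    return 0;
}
```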
- 14. Mellanox Connect-IB: The World’s Fastest Adapter
The 7th generation of Mellanox interconnect adapters
World’s first 100Gb/s interconnect adapter (dual-port FDR 56Gb/s InfiniBand)
Delivers 137 million messages per second – 4X higher than competition
World leading scalable transport – no dependency on system size
- 15. Smart Offloads for MPI/SHMEM/PGAS/UPC Collective Operations
[Chart: collective progress under system noise – Ideal, CORE-Direct (Offload), CORE-Direct Asynchronous]
CORE-Direct Technology
• US Department of Energy (DOE) funded project – ORNL and Mellanox
• Adapter-based hardware offloading for collective operations
• Includes floating-point capability on the adapter for data reductions
• CORE-Direct API is exposed through the Mellanox drivers
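The benefit of adapter-side offload shows up when collectives overlap with computation: the network progresses the operation while the CPU keeps working, so system noise matters less. Below is a minimal sketch of that overlap pattern using standard MPI-3 nonblocking collectives (no vendor-specific API is assumed; offload, where present, is applied by the MPI/FCA layer underneath):

```c
/* Overlap pattern that collective offload is designed to accelerate:
 * start a reduction, do independent work, then wait for completion. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = (double)rank, global = 0.0;
    MPI_Request req;

    /* Start the reduction; with hardware offload it can progress
     * in the adapter while the host computes. */
    MPI_Iallreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &req);

    /* ... independent computation would overlap with the collective here ... */

    MPI_Wait(&req, MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}
```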
- 16. GPUDirect RDMA for Highest GPU Performance
[Diagram: without GPUDirect RDMA, GPU-to-InfiniBand transfers are staged through the chipset, CPU, and system memory; with GPUDirect RDMA, the InfiniBand adapter accesses GPU memory directly]
67% Lower Latency, 5X Increase in Throughput (Source: Prof. DK Panda)
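From the application side, GPUDirect RDMA is typically exercised through a CUDA-aware MPI: device pointers are passed straight to MPI calls, and the transport decides whether the HCA reads GPU memory directly. A minimal sketch follows, assuming a CUDA-aware MPI build and two ranks, each with a GPU (whether the transfer truly bypasses host memory depends on the driver stack):

```c
/* Pass a GPU buffer directly to MPI; with GPUDirect RDMA the HCA can
 * read/write GPU memory without staging through system memory. */
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;
    double *d_buf = NULL;
    cudaMalloc((void **)&d_buf, n * sizeof(double));

    if (rank == 0)
        MPI_Send(d_buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(d_buf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}
```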
- 17. Remote GPU Access through rCUDA
GPU as a Service: applications access pooled GPU servers remotely
[Diagram: client side – application, rCUDA library, network interface; server side – rCUDA daemon, CUDA driver + runtime, network interface, GPUs]
rCUDA provides remote access from every node to any GPU in the system
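The application side stays ordinary CUDA: the rCUDA client library implements the CUDA runtime API and forwards each call over the network to the daemon on a GPU server, so remote GPUs appear as local devices. A minimal sketch of such an unmodified client follows (which servers and GPUs are used is set through rCUDA's client-side configuration, not in the code):

```c
/* Plain CUDA runtime code; linked against the rCUDA client library,
 * the same calls are serviced by GPUs on remote servers. */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);   /* remote GPUs show up as local devices */
    printf("visible CUDA devices: %d\n", count);

    float *d_vec = NULL;
    cudaMalloc((void **)&d_vec, 1024 * sizeof(float));  /* allocated on the (possibly remote) GPU */
    cudaMemset(d_vec, 0, 1024 * sizeof(float));
    cudaFree(d_vec);
    return 0;
}
```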
- 18. Campus and Metro RDMA Long Reach Solutions
Example: 4 MetroX TX6100 systems over 6 km
Example: 4 MetroX TX6100 systems
• Connect IB over 2-4 km
• Replace Obsidian SDR
Example: 2 MetroX TX6100 systems over 8 km
“A common problem is the time cost of moving data between datacenters, which can slow computations and delay results. Mellanox's MetroX lets us unify systems across campus, and maintain the high-speed access our researchers need, regardless of the physical location of their work.” – Mike Shuey, Purdue University
- 19. Variety of Clustering Topologies
CLOS (Fat Tree) – see the sizing sketch after this list
• Typically enables best performance, lowest latency
• Non-blocking network; alleviates the bandwidth bottleneck closer to the root
• Most common topology in many supercomputers
Hypercube
• Supported by SGI
Mesh / 3D Torus
• Blocking network, good for applications with locality
• Support for dedicated sub-networks
• Simple expansion for future growth
• Not limited to storage connection only at cube edges
DragonFly+
• Connects “groups” together in a full graph
• Flexible definition of intra-group interconnection
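For the fat-tree case, a rough sizing rule is that a non-blocking (full bisection) tree built from radix-k switches supports k²/2 hosts with two levels and k³/4 hosts with three. Here is a small sketch of that arithmetic (a back-of-the-envelope rule only, ignoring blocking ratios, rails, and director-switch packaging):

```c
/* Back-of-the-envelope host counts for non-blocking fat trees built
 * from radix-k switches: k*k/2 at two levels, k*k*k/4 at three. */
#include <stdio.h>

static long fat_tree_hosts(long radix, int levels)
{
    if (levels == 2)
        return radix * radix / 2;
    if (levels == 3)
        return radix * radix * radix / 4;
    return -1;  /* other depths not covered by this sketch */
}

int main(void)
{
    /* e.g. 36-port switches, as in FDR InfiniBand edge switches */
    printf("2-level, 36-port switches: %ld hosts\n", fat_tree_hosts(36, 2));
    printf("3-level, 36-port switches: %ld hosts\n", fat_tree_hosts(36, 3));
    return 0;
}
```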
- 20. The Mellanox Advantage
Connect-IB delivers superior performance: 100Gb/s, 0.7usec latency, 137 million messages/sec
ScalableHPC software library provides leading performance for MPI, OpenSHMEM/PGAS and UPC
Superior application offloads: RDMA, collectives, scalable transport (Dynamically Connected)
Flexible topologies: Fat Tree, mesh, 3D Torus, Dragonfly+
Standards-based solution, open source support, large ecosystem, one solution for all applications
Converged I/O – compute, storage, management on single fabric
Long term roadmap
- 21. Technology Roadmap – One-Generation Lead over the Competition
[Roadmap timeline, 2000-2020: Mellanox interconnect generations from 20Gb/s to 40Gb/s and 56Gb/s, and on toward 100Gb/s and 200Gb/s, spanning the Terascale, Petascale, Mega Supercomputers, and Exascale eras; Mellanox Connected milestones include Virginia Tech (Apple), 3rd on the TOP500 in 2003, and “Roadrunner”, the 1st Petaflop system]
- 22. The Only Provider of End-to-End 40/56Gb/s Solutions
Comprehensive End-to-End InfiniBand and Ethernet Portfolio: ICs, Adapter Cards, Switches/Gateways, Host/Fabric Software, Metro/WAN, Cables/Modules
From Data Center to Metro and WAN
x86, ARM and Power-based Compute and Storage Platforms
The Interconnect Provider For 10Gb/s and Beyond