Mellanox IBM

Presentation from the HPC event at IBM Denmark - September 2013, Copenhagen

  1. Mellanox Technology Update. Presenter: Phil Hofstad. September 2013.
     • Mellanox Overview (ticker: MLNX): leading provider of high-throughput, low-latency server and storage interconnect (FDR 56Gb/s InfiniBand and 10/40/56GbE) that reduces application wait time for data and dramatically increases ROI on data center infrastructure. Company headquarters in Yokneam, Israel and Sunnyvale, California, with about 1,200 employees worldwide (as of June 2013). Solid financial position: record FY12 revenue of $500.8M, up 93% year over year; Q2'13 revenue of $98.2M; Q3'13 guidance of about $104M to $109M; cash plus investments of $411.3M as of 6/30/13.
     • Providing End-to-End Interconnect Solutions: Virtual Protocol Interconnect (VPI) technology with comprehensive end-to-end software, acceleration and management, spanning VPI adapters, VPI switches, the Unified Fabric Manager, storage and data, and the switch OS layer up to applications. Acceleration engines: MXM (Mellanox Messaging Acceleration), FCA (Fabric Collectives Acceleration), UFM (Unified Fabric Management), VSA (iSCSI Storage Accelerator) and UDA (Unstructured Data Accelerator), covering networking, storage, clustering and management. Switch configurations include 64 ports of 10GbE, 36 ports of 40/56GbE, 48 ports of 10GbE plus 12 ports of 40/56GbE, 36 InfiniBand ports at up to 56Gb/s, and 8 VPI subnets.
     • Comprehensive End-to-End InfiniBand and Ethernet Solutions Portfolio: ICs, adapter cards (including LOM and mezzanine form factors), switches/gateways, long-haul systems, and cables/modules; Ethernet at 10/40/56Gb/s and InfiniBand at 10/20/40/56Gb/s, from the data center to campus and metro connectivity.
  2. • MetroX™ and MetroDX™ extend the reach of InfiniBand and Ethernet RDMA: the fastest interconnect over 40Gb/s InfiniBand or Ethernet links, supporting multiple distances, with simple management of distant sites. A low-cost, low-power, long-haul solution delivering 40Gb/s across campus and metro.
     • Data Center Expansion Example: Disaster Recovery (diagram).
     • Key Elements in a Data Center Interconnect: applications, servers and storage, connected by adapters and ICs, switches and ICs, and cables, silicon photonics and parallel optical modules.
     • IPtronics and Kotura Complete Mellanox's 100Gb/s+ Technology: the recent acquisitions of Kotura and IPtronics enable Mellanox to deliver complete high-speed optical interconnect solutions for 100Gb/s and beyond.
  3. • Leading Interconnect, Leading Performance: bandwidth roadmap from 10Gb/s in 2001 through 20, 40, 56 and 100Gb/s toward 200Gb/s around 2017, with latency falling from 5usec through 2.5usec, 1.3usec, 0.7usec and 0.6usec to below 0.5usec, all behind the same software interface.
     • Mellanox InfiniBand Paves the Road to Exascale.
     • Connect-IB: The Exascale Foundation. World's first 100Gb/s interconnect adapter: PCIe 3.0 x16 with dual FDR 56Gb/s InfiniBand ports providing more than 100Gb/s. Highest InfiniBand message rate: 137 million messages per second, 4X higher than other InfiniBand solutions. Under 0.7 micro-second application latency. Supports GPUDirect RDMA for direct GPU-to-GPU communication. Unmatchable storage performance: 8,000,000 IOPs (1 QP), 18,500,000 IOPs (32 QPs). New innovative transport: the Dynamically Connected Transport service. Supports scalable HPC with MPI, SHMEM and PGAS/UPC offloads. The architectural foundation for exascale computing: "Enter the World of Boundless Performance."
  4. • Dynamically Connected Transport Advantages (diagram).
     • FDR InfiniBand Delivers Highest Application Performance: message rate (millions of messages per second) compared between QDR InfiniBand and FDR InfiniBand.
     • Scalable and Cost Effective Topologies: 3D Torus, Fat-Tree, Dragonfly+.
  5. • Topologies / Capabilities Overview. Fat-Tree: can be fully rearrangeably non-blocking (1:1) or blocking (x:1); typically enables the best performance and lowest latency; alleviates the bandwidth bottleneck closer to the root; the most common topology in many supercomputer installations. 3D Torus: a blocking network that is cost-effective for systems at scale; great performance for applications with locality; support for dedicated sub-networks; simple expansion for future growth; fault tolerant with multiple link failures; support for adaptive routing, congestion control and QoS; storage connection is not limited to the cube edges. Dragonfly+: connects "groups" or "virtual routers" together in a full-graph (all-to-all) way; supports higher-dimensional networks; flexible definition of the intra-group interconnection; higher scalability and better cost effectiveness; no intermediate proprietary storage conversion.
     • Progression of Advanced Capabilities. UpDn routing: based on the minimum number of hops to each node, with nodes ranked by distance from the root nodes and routes restricted by strict ranking rules. Routing chains: multiple systems can be routed with unique associated topologies, providing for more complex routing policies; support for port groups; a new policy file lets the user define "sub-topologies". IB routers: scalability beyond the LID space, administrative subnet isolation, fault isolation, and IB routing between subnets.
     • Unified Fabric Management (UFM) in the Fabric: available as software or as an appliance, with high availability (2 or more), switch and HCA management, and Full Management or Monitoring Only modes.
     • Multi-Site Management: synchronization and heartbeat across Site 1, Site 2, ... Site n.
  6. • UFM Main Features: automatic discovery, central device management, fabric dashboard, congestion analysis, health and performance monitoring, advanced alerting, fabric health reports, and service-oriented provisioning.
     • Integration with 3rd-Party Systems: an extensible, web-services-based architecture with alerts via SNMP traps and an open API for users or 3rd-party extensions, allowing simple reporting, provisioning and monitoring, task automation, and a software development kit. An extensible object model supports user-defined fields and user-defined menus, and the web-based API provides read, write and manage access for monitoring systems, configuration management, orchestrators, job schedulers and similar tools.
     • Mellanox ScalableHPC Accelerates Parallel Applications: MPI, SHMEM and PGAS (logical shared memory) programming models run over MXM and FCA, which sit on the InfiniBand Verbs API. MXM: reliable messaging optimized for Mellanox HCAs, a hybrid transport mechanism, efficient memory registration, and receive-side tag matching. FCA: topology-aware collective optimization, hardware multicast, a separate virtual fabric for collectives, and CORE-Direct hardware offload. (A plain MPI collective sketch and a libibverbs sketch appear after the slide list.)
  7. • MXM v2.0 Highlights: a transport library integrated with Open MPI, OpenSHMEM, Berkeley UPC and MVAPICH2. Supported transports: RC, UD, DC, RoCE and shared memory. Supported APIs (both sync and async): active messages, point-to-point, atomics and synchronization. Built-in mechanisms: tag matching, progress thread, memory registration cache, fast-path send for small messages, zero copy, and flow control. Data transfer protocols: eager send/recv, eager RDMA, and rendezvous. Utilizes Mellanox offload engines; more capabilities will be added in the future.
     • Mellanox FCA Collective Scalability: charts of Reduce and Barrier collective latency (us) and 8-byte broadcast bandwidth (KB x processes) versus process count (PPN=8), with and without FCA.
     • GPUDirect 1.0 and GPUDirect RDMA: transmit and receive data-path diagrams showing GPU, GPU memory, CPU, chipset, system memory and the InfiniBand adapter; with GPUDirect RDMA the adapter accesses GPU memory directly.
  8. • Preliminary Performance of MVAPICH2 with GPUDirect RDMA (source: Prof. DK Panda): GPU-to-GPU internode MPI latency for small messages drops from 19.78us with MVAPICH2-1.9 to 6.12us with MVAPICH2-1.9-GDR, a 69% reduction, and small-message bandwidth increases 3X. (A CUDA-aware MPI sketch appears after the slide list.)
     • Execution Time of the HSG (Heisenberg Spin Glass) Application with 2 GPU Nodes (chart).
     • FDR InfiniBand Switch Portfolio: modular switches with 648, 324, 216 and 108 ports; edge switches SX6005 (12 ports, externally managed), SX6015 (18 ports, externally managed), SX6025 (36 ports, externally managed), SX6012 (12 ports, managed), SX6018 (18 ports, managed) and SX6036 (36 ports, managed); plus a long-distance bridge with VPI, bridging, routing and management.
     • Top-of-Rack (TOR) Ethernet Switch Portfolio: SX1036 (36x 40GbE non-blocking, TOR/aggregation), SX1024 (48x 10GbE plus 12x 40GbE, 1.92Tbps non-blocking, 10GbE to 40GbE aggregation), SX1016 (64x 10GbE, 1.28Tbps non-blocking), and SX1012 (12x 40GbE non-blocking in a half-width 1U, the highest capacity in 1U). Unique value proposition: 12x to 36x 40GbE or 64x 10GbE, 56GbE for 40% better bandwidth, and VPI to connect to InfiniBand. Latency: 220ns at L2, 330ns at L3. Power (SX1036): under 1W per 10GbE interface, 2.3W per 40GbE interface.
  9. • The Generation of Open Ethernet (open switch platform): a centralized SDN controller and open-source applications (L3 routing, traffic engineering, RouteFlow, security manager, power manager and other low-power applications) over an OpenFlow controller and Quagga, with open L2 switching, OpenFlow, L3 routing, management and virtualization on Ethernet and Linux. 10/20/40/56GbE with a roadmap to 100GbE, delivered as an end-to-end solution (NICs, switches, cables, software).
     • ConnectX-3 Pro dual-port VPI adapter: low power (typically 2.9W for a 2-port 10GigE configuration); LOM/mezzanine features including SMBus/NC-SI baseboard management controller interface, Wake-on-LAN, and integrated sensors; performance improvements such as larger context caches and a higher message rate; NVGRE/VXLAN stateless offloads and SR-IOV; small footprint with minimal peripherals (17x17 FCBGA, 1.0mm standard ball pitch).
     • Thank You. Phil Hofstad, philip@mellanox.com, Tel: +44 7597 566281.
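
The collectives that FCA accelerates are ordinary MPI operations; the acceleration happens underneath the MPI library. Below is a minimal sketch in C, assuming only a working MPI installation (for example Open MPI or MVAPICH2 built against the Mellanox stack); nothing in the code is Mellanox-specific, and MPI_Allreduce and MPI_Barrier are exactly the kind of collectives that FCA and CORE-Direct offload target.

    /* allreduce_sketch.c: a plain MPI collective. FCA, when enabled in the
     * underlying MPI library, can offload operations like these to the fabric. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        double local_sum, global_sum;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        local_sum = (double)rank;   /* each rank contributes its own rank id */

        /* Reduction across all ranks: the kind of collective that
         * topology-aware offload targets. */
        MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);

        /* Barrier is another offloadable collective. */
        MPI_Barrier(MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum over %d ranks = %f\n", size, global_sum);

        MPI_Finalize();
        return 0;
    }

Compiled with mpicc and launched with mpirun, the same binary runs with or without FCA; whether the collectives are offloaded is a property of how the MPI library was configured, not of the application code.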
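
The InfiniBand Verbs API shown at the bottom of the ScalableHPC stack is the standard programming interface that adapters such as Connect-IB expose. The following sketch, which assumes a host with libibverbs and at least one HCA installed, opens a device, registers a buffer and creates a reliable-connected (RC) queue pair; the Dynamically Connected transport mentioned on the Connect-IB slide used vendor-specific verbs extensions at the time and is not shown here, and error handling is abbreviated.

    /* verbs_sketch.c: open an InfiniBand device and create an RC queue pair
     * using the standard libibverbs API (link with -libverbs). */
    #include <infiniband/verbs.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        struct ibv_device **dev_list = ibv_get_device_list(NULL);
        if (!dev_list || !dev_list[0]) {
            fprintf(stderr, "no RDMA devices found\n");
            return 1;
        }

        struct ibv_context *ctx = ibv_open_device(dev_list[0]); /* first HCA */
        struct ibv_pd      *pd  = ibv_alloc_pd(ctx);            /* protection domain */
        struct ibv_cq      *cq  = ibv_create_cq(ctx, 16, NULL, NULL, 0);

        /* Register a buffer so the HCA can DMA into and out of it. */
        char *buf = calloc(1, 4096);
        struct ibv_mr *mr = ibv_reg_mr(pd, buf, 4096,
                                       IBV_ACCESS_LOCAL_WRITE |
                                       IBV_ACCESS_REMOTE_WRITE);

        /* Reliable-connected QP; DC would use vendor extensions instead. */
        struct ibv_qp_init_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.send_cq = cq;
        attr.recv_cq = cq;
        attr.qp_type = IBV_QPT_RC;
        attr.cap.max_send_wr  = 16;
        attr.cap.max_recv_wr  = 16;
        attr.cap.max_send_sge = 1;
        attr.cap.max_recv_sge = 1;
        struct ibv_qp *qp = ibv_create_qp(pd, &attr);

        printf("created QP 0x%x on %s\n", qp->qp_num,
               ibv_get_device_name(dev_list[0]));

        /* Connection setup (exchanging QP numbers, moving the QP through
         * INIT/RTR/RTS) is omitted from this sketch. */
        ibv_destroy_qp(qp);
        ibv_dereg_mr(mr);
        free(buf);
        ibv_destroy_cq(cq);
        ibv_dealloc_pd(pd);
        ibv_close_device(ctx);
        ibv_free_device_list(dev_list);
        return 0;
    }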
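
The MVAPICH2-GDR results rely on a CUDA-aware MPI build, which accepts GPU device pointers directly in MPI calls and, with GPUDirect RDMA, lets the HCA move the data without staging it through host memory. A minimal sketch under those assumptions follows; the buffer size and message tag are arbitrary illustration values, one GPU per rank is assumed, and the program expects to be launched with two ranks.

    /* gdr_sketch.c: exchange a GPU-resident buffer between two ranks using a
     * CUDA-aware MPI library (e.g. MVAPICH2-GDR). Device pointers are passed
     * straight to MPI_Send/MPI_Recv. Build with the MPI compiler wrapper and
     * link against the CUDA runtime (-lcudart). */
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdio.h>

    #define COUNT (1024 * 1024)   /* illustrative buffer size: 1M floats */

    int main(int argc, char **argv)
    {
        int rank;
        float *d_buf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        cudaSetDevice(0);          /* one GPU per rank assumed */
        cudaMalloc((void **)&d_buf, COUNT * sizeof(float));
        cudaMemset(d_buf, 0, COUNT * sizeof(float));

        if (rank == 0) {
            /* Device pointer handed directly to MPI; a CUDA-aware library
             * moves the data over InfiniBand, using GPUDirect RDMA when
             * the hardware and driver stack support it. */
            MPI_Send(d_buf, COUNT, MPI_FLOAT, 1, 42, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(d_buf, COUNT, MPI_FLOAT, 0, 42, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d floats into GPU memory\n", COUNT);
        }

        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }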
