This document provides an overview of InfiniBand essentials that every HPC expert must know: InfiniBand principles (fabric components, architecture, and discovery stages), protocol layers, and Mellanox products and implementations, including switches, adapters, cables, and fabric management.
2. Mellanox Training Center 2 Training Material
IB Principles
• Targets
• Fabric Components
• Fabric Architecture
Fabric Discovery Stages
• Topology Discovery
• Information Gathering
• Forwarding Tables
• Fabric SDN
• Fabric Activation
Protocol Layer Principles
• Supported Upper-Layer Protocols
• Transport Layer
• Link Layer
• Physical Layer
Mellanox Products
• InfiniBand Switches
• Channel Adapters
• Cabling
• Fabric Management
HPC
3. Mellanox Training Center 3 Training Material
Leading Supplier of End-to-End Interconnect Solutions
Comprehensive end-to-end InfiniBand and Ethernet portfolio: ICs, adapter cards, switches/gateways, cables, and host/fabric software.
[Diagram: Virtual Protocol Interconnect linking server/compute nodes and storage (front- and back-end) through switches/gateways, carrying 56G InfiniBand and FCoIB on one side and 10/40/56GbE, FCoE, and Fibre Channel on the other.]
4. Mellanox Training Center 4 Training Material
Mellanox Common Target Implementations
Target markets: HPC, Web 2.0, Cloud, DB/Enterprise, Storage, Financial Services. Highlights across these markets:
• Up to 10X performance and simulation runtime
• 33% higher GPU performance
• Unlimited scalability
• Lowest latency
• 62% better execution time
• 42% faster messages per second
• 12X more throughput
• Support more users at higher bandwidth
• Improve and guarantee SLAs
• 10X database query performance
• 4X faster VM migration
• More VMs per server and more bandwidth per VM
• 2X Hadoop performance
• 13X Memcached performance
• 4X price/performance
• Mellanox storage acceleration software provides >80% more IOPS (I/O operations per second)
5. Mellanox Training Center 5 Training Material
Mellanox VPI Interconnect Solutions
VPI Adapter (adapter card, LOM, or mezzanine card; PCIe 3.0), with acceleration engines for networking, storage, clustering, and management applications:
• Ethernet: 10/40 Gb/s
• InfiniBand: 10/20/40/56 Gb/s
VPI Switch (switch OS layer, Unified Fabric Manager):
• 64 ports 10GbE
• 36 ports 40GbE
• 48 10GbE + 12 40GbE
• 36 ports IB up to 56Gb/s
• 8 VPI subnets
7. Mellanox Training Center 7 Training Material
InfiniBand Trade Association (IBTA)
• Founded in 1999
• Actively markets and promotes InfiniBand from an industry perspective through public relations engagements, developer conferences, and workshops
• The InfiniBand standard is developed by the InfiniBand Trade Association (IBTA): http://www.infinibandta.org/home
• InfiniBand software is developed as open source under the OpenFabrics Alliance: http://www.openfabrics.org/index.html
• Steering Committee Members: [company logos]
8. Mellanox Training Center 8 Training Material
InfiniBand is a Switch Fabric Architecture
Interconnect technology connecting CPUs and I/O
Very high performance
• High bandwidth (starting at 10Gb/s and up to 100Gb/s)
• Low latency: fast application response across the cluster, < 1µs end to end (Mellanox switches: 170 ns per hop)
• Low CPU utilization with RDMA (Remote Direct Memory Access): unlike Ethernet, traffic bypasses the OS and the CPUs
The first industry-standard high-speed interconnect!
9. Mellanox Training Center 9 Training Material
InfiniBand is a Switch Fabric Architecture
• InfiniBand was originally designed for large-scale grids and clusters
• Increased application performance
• Single-port solution for all LAN, SAN, and application communication
• High-reliability cluster management (redundant Subnet Manager)
• Automatic configuration of cluster switches and ports, performed by the Subnet Manager software
The first industry-standard high-speed interconnect!
10. Mellanox Training Center 10 Training Material
RDMA – How Does it Work
[Diagram: RDMA over InfiniBand vs. TCP/IP between two hosts (Rack 1 and Rack 2). On the TCP/IP path, data is copied from the application buffer through the OS and kernel into the NIC buffer, crosses the wire, and is copied back up through the remote NIC and OS into the receiving application's buffer. On the RDMA path, the HCAs transfer data directly between the two applications' user-space buffers, bypassing the OS and kernel on both sides.]
11. Mellanox Training Center 11 Training Material
The InfiniBand Architecture
Industry standard defined by the InfiniBand Trade Association
Defines a System Area Network architecture
• Comprehensive specification: from physical layer to applications
The architecture supports
• Host Channel Adapters (HCA)
• Switches
• Routers
[Diagram: an InfiniBand subnet of switches overseen by a Subnet Manager, connecting processor nodes (via HCAs), a storage subsystem, RAID, and consoles, with gateways to Ethernet and Fibre Channel.]
12. Mellanox Training Center 12 Training Material
InfiniBand Components Overview
Host Channel Adapter (HCA)
• A device that terminates an IB link, executes transport-level functions, and supports the verbs interface
Switch
• A device that moves packets from one link to another within the same IB subnet
Router
• A device that transports packets between different IB subnets
Bridge/Gateway
• InfiniBand to Ethernet
13. Mellanox Training Center 13 Training Material
Host Channel Adapters (HCA)
Equivalent to a NIC (Ethernet), identified by a GUID (Global Unique ID)
Converts PCI to InfiniBand
CPU offload of transport operations
End-to-end QoS and congestion control
HCA bandwidth options (4x links):
• Single Data Rate (SDR): 2.5 Gb/s * 4 = 10 Gb/s
• Double Data Rate (DDR): 5 Gb/s * 4 = 20 Gb/s
• Quad Data Rate (QDR): 10 Gb/s * 4 = 40 Gb/s
• Fourteen Data Rate (FDR): 14 Gb/s * 4 = 56 Gb/s
• Enhanced Data Rate (EDR): 25 Gb/s * 4 = 100 Gb/s
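The per-lane rate arithmetic above is easy to tabulate; a minimal sketch (Python, illustrative, using the figures from the slide):

```python
# Per-lane signaling rate in Gb/s for each InfiniBand speed grade (from the slide).
LANE_RATE_GBPS = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0, "FDR": 14.0, "EDR": 25.0}

def link_rate_gbps(grade: str, lanes: int = 4) -> float:
    """Nominal signaling rate of an InfiniBand link (4x is the common width)."""
    return LANE_RATE_GBPS[grade] * lanes

for grade in LANE_RATE_GBPS:
    print(f"{grade} 4x: {link_rate_gbps(grade):g} Gb/s")
```

Note that these are signaling rates: SDR/DDR/QDR links use 8b/10b encoding, so usable data throughput is about 80% of the figures above, while FDR/EDR use the more efficient 64b/66b encoding.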
14. Mellanox Training Center 14 Training Material
Global Unique Identifier (GUID) – Physical Address
Any InfiniBand node requires GUID & LID addresses
GUID (Global Unique Identifier): a 64-bit address, like an Ethernet MAC address
• Assigned by the IB vendor
• Persistent through reboots
An IB switch has multiple GUIDs:
• Node GUID: identifies the HCA or switch as an entity
• Port GUID: identifies the individual port
• System GUID: allows combining multiple GUIDs into one entity
15. Mellanox Training Center 15 Training Material
The IB Fabric Basic Building Block
A single 36-port IB switch chip is the basic building block of every IB switch module
A switching module with more ports is built from multiple chips
In this example we create a 72-port switch using 6 identical chips:
• 4 chips will function as line chips
• 2 chips will function as core chips
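The 4-line/2-core arrangement above is a small two-tier fat tree: each line chip gives half its 36 ports to hosts and half to uplinks toward the core. A sketch of the arithmetic (Python, illustrative; chip counts and radix are parameters):

```python
def external_ports(line_chips: int, core_chips: int, radix: int = 36) -> int:
    """External (host-facing) ports of a two-tier fat tree of identical chips.

    Each line chip splits its radix: half faces hosts, half uplinks to the core.
    The core chips must supply enough ports to terminate every uplink.
    """
    down_per_line = radix // 2                      # 18 host-facing ports per line chip
    uplinks = line_chips * (radix - down_per_line)  # 4 * 18 = 72 uplinks total
    assert uplinks <= core_chips * radix, "core chips cannot terminate all uplinks"
    return line_chips * down_per_line

print(external_ports(line_chips=4, core_chips=2))  # 72
```

With 4 line chips and 2 core chips the 72 uplinks exactly match the 72 core ports, which is what makes this the minimal non-blocking arrangement for a 72-port module.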
16. Mellanox Training Center 16 Training Material
IB Fabric L2 Switching Addressing – Local Identifier (LID)
Local Identifier: a 16-bit L2 address
• Assigned by the Subnet Manager when the port becomes active
• Not persistent through reboots
LID address ranges
• 0x0000 = Reserved
• 0x0001 – 0xBFFF = Unicast
• 0xC000 – 0xFFFE = Multicast
• 0xFFFF = Reserved for special use (permissive LID)
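The ranges above amount to a simple classification of a 16-bit value; a minimal sketch (Python, illustrative):

```python
def lid_class(lid: int) -> str:
    """Classify a 16-bit InfiniBand LID according to the standard address ranges."""
    if not 0 <= lid <= 0xFFFF:
        raise ValueError("LID is a 16-bit field")
    if lid == 0x0000:
        return "reserved"
    if lid <= 0xBFFF:
        return "unicast"
    if lid <= 0xFFFE:
        return "multicast"
    return "permissive"      # 0xFFFF, reserved for special use

print(lid_class(0x0001), lid_class(0xC000), lid_class(0xFFFF))
```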
17. Mellanox Training Center 17 Training Material
InfiniBand Network Segmentation – Partitions
• Define different partitions for different customers
• Define different partitions for different applications
• Allows fabric partitioning for security purposes
• Allows fabric partitioning for Quality of Service (QoS)
• Each partition has an identifier named PKey
[Diagram: three partitions, e.g. PKey ID 2 at Service Level 3, PKey ID 3 at Service Level 3, PKey ID 3 at Service Level 1.]
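The slide does not show the PKey layout; per the InfiniBand specification a PKey is a 16-bit value whose top bit is a membership flag (1 = full member, 0 = limited member), and two ports can communicate only when their 15-bit base keys match and at least one side is a full member. A minimal sketch of that matching rule (Python, illustrative):

```python
def pkeys_match(a: int, b: int) -> bool:
    """True if two 16-bit PKeys allow communication: same 15-bit base key,
    and at least one side is a full member (bit 15 set)."""
    same_partition = (a & 0x7FFF) == (b & 0x7FFF)
    full_member = bool((a | b) & 0x8000)
    return same_partition and full_member

print(pkeys_match(0xFFFF, 0x7FFF))  # full + limited member of the default partition
print(pkeys_match(0x7FFF, 0x7FFF))  # two limited members cannot talk
```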
18. Mellanox Training Center 18 Training Material
GID – Global Identifier
Usage
• A 128-bit field in the Global Routing Header (GRH) used to route packets between different IB subnets
• Multicast group port identifier (IB & IPoIB)
Structure
• IPv6-style address
• Subnet prefix (up to 64 bits): identifies a set of end-ports managed by a common Subnet Manager
• GUID: 64-bit identifier provided by the manufacturer
Example:
• port GUID: 0x0002c90300455fd1
• default GID: fe80:0000:0000:0000:0002:c903:0045:5fd1
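The default GID in the example is just the link-local subnet prefix (fe80::/64) concatenated with the 64-bit port GUID; a sketch that reproduces the slide's example (Python, illustrative):

```python
def default_gid(port_guid: int, subnet_prefix: int = 0xfe80000000000000) -> str:
    """Build a GID string: 64-bit subnet prefix followed by the 64-bit port GUID.

    The default prefix is the link-local fe80::/64 used within a single subnet.
    """
    raw = (subnet_prefix << 64) | port_guid
    # Split the 128-bit value into eight 16-bit groups, IPv6-style.
    groups = [(raw >> shift) & 0xFFFF for shift in range(112, -1, -16)]
    return ":".join(f"{g:04x}" for g in groups)

print(default_gid(0x0002c90300455fd1))
# fe80:0000:0000:0000:0002:c903:0045:5fd1
```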
19. Mellanox Training Center 19 Training Material
IB Basic Management Concepts
• Node: any managed entity (end node, switch, router)
• Manager: active entity that sources commands and queries; the Subnet Manager (SM)
• Agent: (mostly) passive entity that resides on every node and responds to Subnet Manager queries
• Management Datagram (MAD): standard message format for manager–agent communication, carried in an unreliable datagram (UD)
[Diagram: one SM communicating with an agent on every node in the subnet.]
20. Mellanox Training Center 20 Training Material
Objectives of Subnet Management
• Initialization and configuration of the subnet elements
• Establishing the best traffic paths from source to destination through the subnet
• Fault isolation
• Continuing these activities during topology changes
• Preventing unauthorized Subnet Managers
21. Mellanox Training Center 21 Training Material
Node & Switch Main Identifiers
IB port basic identifiers
• Host Channel Adapter (HCA): the IB "NIC"
• Port number
• Global Unique ID (GUID): 64 bits, like a MAC address, e.g. 00:02:C9:02:00:41:38:30
  - Each 36-port "basic" switch has its own switch & system GUID
  - All ports belonging to the same "basic" switch share the switch GUID
• Local Identifier (LID)
LID
• A local identifier assigned to every IB device by the SM, used for packet switching within an IB fabric
• All ports of the same ASIC use the same LID
22. Mellanox Training Center 22Training Material
Node & Switch Main identifiers
Virtual Lanes
• Each Virtual Lane uses its own buffers to send packets towards the other side
• VL 15 is reserved for SM management traffic only
• VLs 0-7 are used for data traffic
• Used to separate traffic with different bandwidth & QoS requirements over the same physical port
23. Mellanox Training Center 23Training Material
(Diagram: traffic packets on VLs 0-7 and link-control/SM packets on VL 15 are multiplexed onto the physical link at the transmitter, then de-multiplexed back into per-VL buffers at the receiver.)
Node & Switch Main identifiers
25. Mellanox Training Center 25Training Material
1. Physical Subnet Establishment
2. Subnet Discovery
3. Information Gathering
4. LID Assignment
5. Path Establishment
6. Port Configuration
7. Switch Configuration
8. Subnet Activation
Subnet Manager & Fabric configuration Process
26. Mellanox Training Center 26Training Material
Subnet Manager (SM) Rules & Roles
Every subnet must have at least one
- Manages all elements in the IB fabric
- Discovers the subnet topology
- Assigns LIDs to devices
- Calculates and programs switch forwarding tables (LFT pathing)
- Monitors changes in the subnet
Can be implemented anywhere in the fabric
- Node, switch, or specialized device
No more than one active SM is allowed
- 1 active (Master); the rest are Standby (HA)
27. Mellanox Training Center 27Training Material
1. The SM wakes up and starts the fabric discovery
process.
2. The SM starts a “conversation” with every node over
the InfiniBand link it is connected to.
During this discovery stage, the SM collects:
• Switch information, followed by port information
• Host information
3. Any switch that has already been discovered is used
by the SM as a gateway for further discovery of all of that
switch’s links and the switches it is connected to,
also known as its neighbors.
Fabric Discovery (A)
28. Mellanox Training Center 28Training Material
4. The SM gathers information by sending and receiving SMPs (Subnet Management Packets)
a. These special management packets are sent on Virtual Lane 15 (VL15)
• VL15 is a special, non-flow-controlled VL
b. Two primary “types” of SMPs are used while building the cluster routing tables:
• Directed-route (DR) SMPs, addressed by
node GUIDs & port numbers
• This is the type primarily used by OpenSM
c. LID-routed (LR) SMPs
• Used once the topology and packet routing tables exist,
addressed by the LIDs which have been assigned to each node by the SM
Fabric Discovery (B)
29. Mellanox Training Center 29Training Material
Node Info Gathered
• Node type
• Num of ports
• GUID
• Partition table size
Port Info Gathered
• Forwarding Database size
• MTU
• Width
• VLs
Fabric Information Gathering During Discovery
30. Mellanox Training Center 30Training Material
Fabric Direct Route Information Gathering
Building the direct-route table
from & to each one of the fabric elements
Each node in a path is identified by its port number & GUID
The table contents are saved in the SM’s LMX table
(Diagram: an example direct-route table for a fabric of six switches plus hosts H-11 and H-16. Each entry records a hop-by-hop path as a sequence of switch/port pairs, e.g. Switch1 Port 2 to Switch2, or Switch1 Port 3 to Switch3 Port 5 to Switch6 Port 29, giving the SM a directed route to every fabric element.)
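The table-building process described above can be sketched as a breadth-first search from the SM's port, recording for every discovered node the outbound port sequence that reaches it. A toy sketch, not OpenSM code; the 3-switch fabric below is hypothetical.

```python
# Toy sketch of direct-route table construction: BFS from the SM's node,
# recording the exit-port sequence (directed route) to every element.
from collections import deque

def build_direct_routes(links, root):
    """links: {node: {out_port: neighbor_node}} -> {node: [ports]}."""
    routes = {root: []}                 # the SM reaches itself via an empty path
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for port, neighbor in links.get(node, {}).items():
            if neighbor not in routes:  # first (shortest) path wins
                routes[neighbor] = routes[node] + [port]
                queue.append(neighbor)
    return routes

# Hypothetical fabric: the SM sits behind Switch1.
links = {
    "Switch1": {2: "Switch2", 3: "Switch3"},
    "Switch2": {1: "Switch1", 5: "H-11"},
    "Switch3": {1: "Switch1"},
}
routes = build_direct_routes(links, "Switch1")
print(routes["H-11"])   # -> [2, 5]: exit port 2, then Switch2's port 5
```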
31. Mellanox Training Center 31Training Material
LID Assignment
After the SM has finished gathering
all the subnet information it needs, it assigns a base LID and LMC
to each of the attached end ports
• The LID is assigned at the port level rather than the device level
• Switch external ports do not get/need LIDs
The DLID is used as the main address for InfiniBand packet
switching
Each switch port can be identified by the combination
of LID & port number
(Diagram: example fabric in which the SM has assigned LIDs 3, 21, 22, 23, 72, 75, 81 and 82 to the attached end ports.)
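Base-LID-plus-LMC assignment can be sketched as below: the LMC (LID Mask Control) gives each end port 2^LMC consecutive LIDs, which is what enables multipathing. A minimal sketch; the port names and LMC value are invented.

```python
# Sketch of base-LID assignment with LMC: each end port receives a
# contiguous block of 2**LMC LIDs starting at its base LID.
def assign_lids(ports, lmc=0, first_lid=1):
    step = 1 << lmc                     # LIDs consumed per end port
    table, next_lid = {}, first_lid
    for port in ports:
        table[port] = (next_lid, next_lid + step - 1)  # inclusive range
        next_lid += step
    return table

lids = assign_lids(["H-11/P1", "H-16/P1", "Switch1/P0"], lmc=2)
print(lids["H-16/P1"])   # -> (5, 8): base LID 5, four LIDs with LMC=2
```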
32. Mellanox Training Center 32Training Material
Linear Forwarding Table Establishment (Path Establishment)
After the SM has finished gathering
all fabric information, including the direct-route tables,
it assigns a LID to each of the nodes
At this stage the LMX table is populated with the relevant route
options to each of the nodes
The output of the LMX provides the best route
to reach a DLID, as well as the alternative routes
The best-path result is based on a Shortest Path First (SPF)
algorithm
LMX route options for Switch_1 (per destination LID, the cost of each candidate route):
Dest. LID | Route costs
21        | 1 2 3 1
22        | 2 1 2 1
23        | 3 2 1 1
75        | 3 2 3 2
81        | 4 3 4 3
82        | 4 3 2 2

Resulting LFT (D-LID -> best route / exit port):
D-LID | Port
21    | 2
22    | 3
23    | 8
75    | 3
81    | 3
82    | 8
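The SPF selection step described above reduces to: for each destination LID, pick the exit port with the lowest cost among the LMX route options. A minimal sketch; the per-port hop counts below are invented but loosely modeled on the slide's Switch_1 example.

```python
# Minimal SPF sketch: from per-destination costs on each candidate exit
# port (the LMX view), pick the cheapest exit port to build the LFT.
def build_lft(lmx):
    """lmx: {dlid: {exit_port: hop_count}} -> {dlid: best_exit_port}"""
    return {dlid: min(ports, key=ports.get) for dlid, ports in lmx.items()}

# Hypothetical LMX rows for a switch with exit ports 2, 3 and 8.
lmx = {
    21: {2: 1, 3: 2, 8: 3},
    22: {2: 2, 3: 1, 8: 2},
    23: {2: 3, 3: 2, 8: 1},
}
lft = build_lft(lmx)
print(lft)   # -> {21: 2, 22: 3, 23: 8}
```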
33. Mellanox Training Center 33Training Material
LID Routed (LR) Forwarding
Uses the LFT tables
Based on the data gathered via direct routing into the LMX
This is the standard packet routing used by switches
Uses regular link-level headers to define the destination and other
information, such as:
• DLID = LID of the final destination
• SL = Service Level of the path
• Each switch uses the forwarding table and the SL-to-VL table to decide on
the packet’s output port/VL
LFT Switch_1 (Destination LID -> best route / exit port):
21 -> 2, 22 -> 3, 23 -> 8, 75 -> 3, 81 -> 3, 82 -> 8
34. Mellanox Training Center 34Training Material
LRH: Local Routing Header :
• Source & Destination LID
• Service Level-SL
• Virtual Lane-VL
• Packet Length
LID Routed (LR) Forwarding
InfiniBand Data Packet layout:
LRH (8B) | GRH (40B) | BTH (12B) | Ext HDRs (var) | Payload (0…4096B) | ICRC (4B) | VCRC (2B)
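The 8-byte LRH listed above can be sketched as a packed structure. Illustrative only; field layout (VL, link version, SL, LNH, DLID, packet length, SLID) follows the IBTA spec, and the example values are made up.

```python
# Illustrative packing of the 8-byte Local Routing Header (LRH).
import struct

def pack_lrh(vl, sl, dlid, slid, pkt_len, lver=0, lnh=2):
    b0 = (vl << 4) | lver                 # VL (4b) | link version (4b)
    b1 = (sl << 4) | lnh                  # SL (4b) | rsvd (2b) | LNH (2b)
    return struct.pack("!BBHHH", b0, b1, dlid,
                       pkt_len & 0x07FF,  # rsvd (5b) | PktLen (11b)
                       slid)

lrh = pack_lrh(vl=0, sl=3, dlid=21, slid=3, pkt_len=64)
assert len(lrh) == 8                      # LRH is always 8 bytes
```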
35. Mellanox Training Center 35Training Material
Light sweep:
• Routine sweep by the Subnet Manager
• By default runs every 30 seconds
• Queries all switches for their switch and port info
A light sweep detects:
• Port status changes
• A new SM speaking on the subnet
• A Subnet Manager priority change
Tracking FABRIC STATUS – SM Sweeps
36. Mellanox Training Center 36Training Material
Any change detected by the light sweep triggers a heavy sweep
IB TRAP
• A change in a switch’s status causes an online IB TRAP
to be sent to the Subnet Manager, which also triggers a heavy sweep
Heavy Sweep
• Causes the SM to perform the entire fabric discovery from scratch
Tracking FABRIC STATUS – SM Sweeps
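The sweep policy above can be sketched as a simple loop: a periodic light sweep compares fresh switch/port state against the last known state and escalates to a heavy sweep on any change. A schematic sketch, not OpenSM's actual logic; the state sequence is invented.

```python
# Schematic sketch of the SM sweep policy: periodic light sweeps,
# escalating to a heavy sweep whenever a change is detected.
def sm_sweep(poll_state, heavy_sweep, sweeps):
    """poll_state(): light sweep; heavy_sweep(): full rediscovery."""
    last = poll_state()
    for _ in range(sweeps):          # a real SM loops forever, every ~30 s
        current = poll_state()       # light sweep: switch and port info
        if current != last:          # port change, new SM, priority change
            heavy_sweep()            # rediscover the whole fabric
            last = current

# Hypothetical state sequence: a port goes down on the second poll.
states = iter([{"p1": "up"}, {"p1": "up"}, {"p1": "down"}])
events = []
sm_sweep(lambda: next(states), lambda: events.append("heavy"), sweeps=2)
print(events)   # -> ['heavy']: the port-down triggered one heavy sweep
```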
38. Mellanox Training Center 38Training Material
InfiniBand Fabric Commonly Used Topologies
Back to Back
2 Tier Fat Tree
Modular switches are based on Fat Tree architecture:
3D Torus Dual Star Hybrid
39. Mellanox Training Center 39Training Material
The IB Fabric Basic Building Block
A single 36 ports IB switch chip, is the Basic
block for every IB switch module
We create a multiple ports switching Module
using multiple chips
In this example we create 72 ports
switch, using 6 identical chips
• 4 chips will function as lines
• 2 chips will function as core
40. Mellanox Training Center 40Training Material
CLOS Topology
A pyramid-shaped topology
The switches at the top of the pyramid are called Spines/Cores
• The Core/Spine switches interconnect the other switches
The switches at the bottom of the pyramid are called Leafs/Lines/Edges
• The Leaf/Line/Edge switches connect to the fabric nodes/hosts
In a non-blocking CLOS fabric there is an equal number of external and internal connections
Internal connections: towards the Spines/Cores
External connections: at the Leaf/Line/Edge
41. Mellanox Training Center 41Training Material
External connections:
• The connections between the hosts and the Line switches
Internal connections:
• The connections between the Core and the Line switches
In a non-blocking fabric the cross-bisectional
bandwidth is always balanced
If the number of external connections is higher than the number of internal connections, the configuration is blocking
CLOS Topology
42. Mellanox Training Center 42Training Material
CLOS - 3
The topology detailed here is called CLOS-3
The maximum traffic path between source and destination
includes 3 hops (3 switches)
Example: a session from A to B
• One hop from A to switch L1-1
• Next hop from switch L1-1 to switch L2-1
• Last hop from L2-1 to L1-4
Spines/Core
Leaf/Line/Edge
43. Mellanox Training Center 43Training Material
In this example we see a 108-port non-blocking fabric
• 108 hosts are connected to the line switches
• 108 links connect the line switches to the core switches, enabling non-blocking
interconnection of the line switches
CLOS - 3
18*6=108
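The 18*6=108 arithmetic can be checked directly, under the stated assumption that each of six 36-port leaf switches splits its ports 18 down (to hosts) and 18 up (to spines):

```python
# Quick check of the non-blocking CLOS arithmetic from the slide.
leaves, ports_per_leaf = 6, 36
down_per_leaf = up_per_leaf = ports_per_leaf // 2   # 18 down, 18 up

external = leaves * down_per_leaf   # host-facing connections: 18 * 6
internal = leaves * up_per_leaf     # leaf-to-spine connections: 18 * 6

print(external, internal)           # -> 108 108: equal, hence non-blocking
```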
45. Mellanox Training Center 45Training Material
IB Switch - L2
(Diagram: the IB protocol stack on two end nodes connected through an L2 switch. Upper-layer protocols exchange client transactions and messages; the transport layer handles messages and queue pairs with IBA operations and SAR; the network layer performs inter-subnet routing; the link layer performs LID-based L2 switching with link encoding and media access control, the switch acting as a packet relay; the physical layer carries the bits.)
46. Mellanox Training Center 46Training Material
IB Architecture Layers
Software Transport Verbs and Upper-Layer Protocols:
- Interface between application programs and hardware
- Allows support of legacy protocols such as TCP/IP
- Defines the methodology for management functions
Physical:
- Signal levels and frequency, media, connectors
Transport:
- Delivers packets to the appropriate Queue Pair;
message assembly/disassembly, access rights, etc.
Data Link (symbols and framing):
- From source to destination on the same subnet
- Flow control (credit-based); how packets are routed
Network:
- How packets are routed between different partitions/subnets
47. Mellanox Training Center 47Training Material
InfiniBand Header Structure
(Diagram: an upper-layer message is segmented by the transport layer (SAR) into packets, delivered through the network, link and physical layers across an L2 switch acting as a packet relay, and reassembled (SAR) into the original message at the far end. At the network layer the address is Subnet Prefix + GUID; at the link layer switching is LID based.)
48. Mellanox Training Center 48Training Material
The network and link protocols deliver a packet to the desired destination.
The Transport Layer
• Segmentation and reassembly of:
- message data payloads coming from the upper layer, into multiple packets that fit a valid
MTU size
• Delivers each packet to the proper Queue Pair (assigned to a specific session)
• Instructs the QP how to process the packet’s data (Work Queue Element)
• Reassembles the packets arriving from the other side into messages
Transport Layer – Responsibilities
(Diagram: transmit and receive WQEs posted on a local QP.)
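The segmentation and reassembly responsibility above can be sketched in a few lines: split a message into MTU-sized packets on transmit, rejoin them on receive. A minimal sketch; real HCAs do this in hardware.

```python
# Sketch of transport-layer SAR: segment a message into MTU-sized
# packets and reassemble them on the receiving side.
def segment(message: bytes, mtu: int):
    return [message[i:i + mtu] for i in range(0, len(message), mtu)]

def reassemble(packets):
    return b"".join(packets)

msg = b"x" * 10000                 # a 10 KB upper-layer message
pkts = segment(msg, mtu=4096)      # IB MTUs go up to 4096 bytes
print([len(p) for p in pkts])      # -> [4096, 4096, 1808]
assert reassemble(pkts) == msg     # round-trips back to the original
```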
49. Mellanox Training Center 49Training Material
Transport Layer – Responsibilities
50. Mellanox Training Center 50Training Material
Switches use FDB (Forwarding Database)
• Based on DLID and SL, a packet is sent to the correct output port +specific VL
Layer 2 Forwarding
(Diagram: an inbound packet arrives on an RX port; its DLID is looked up in the FDB (DLID to port) to select the output port, and its SL is looked up in the SL-to-VL table to select the output VL, before the packet is transmitted.)
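The forwarding decision above reduces to two table lookups. A minimal sketch with invented tables; real SL-to-VL tables are also indexed by input and output port, which is simplified to SL alone here.

```python
# Sketch of the L2 forwarding decision: FDB maps DLID to output port,
# and (simplified) the SL-to-VL table maps service level to output VL.
FDB = {21: 2, 22: 3, 23: 8}         # DLID -> output port (invented)
SL2VL = {0: 0, 3: 1}                # SL   -> output VL   (invented)

def forward(dlid: int, sl: int):
    port = FDB[dlid]                # linear forwarding table lookup
    vl = SL2VL.get(sl, 0)           # map service level to a data VL
    return port, vl

print(forward(dlid=23, sl=3))       # -> (8, 1)
```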
51. Mellanox Training Center 51Training Material
(Diagram: per-VL transmit buffers feed an arbitration stage and a mux onto the link; the receiver de-muxes packets into per-VL receive buffers, and the link control returns credits to the transmitter.)
Credit-based link-level flow control
• Link flow control assures NO packet loss within the fabric, even in the presence of congestion
• Link receivers grant packet receive-buffer space credits per Virtual Lane
• Flow-control credits are issued in 64-byte units
Separate flow control per Virtual Lane provides:
• Alleviation of head-of-line blocking
• Virtual fabrics: congestion and latency on one VL do not impact traffic with guaranteed QoS on another VL,
even though they share the same physical link
Link Layer – Flow Control
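The credit mechanism above can be modeled in a few lines: a transmitter may only send on a VL when the receiver has advertised enough 64-byte credits, so packets wait instead of being dropped. A toy model, not switch firmware; the credit counts are invented.

```python
# Toy model of credit-based, per-VL flow control (64-byte credit units).
class VirtualLane:
    def __init__(self, credits):
        self.credits = credits               # advertised by the receiver

    def try_send(self, packet_bytes):
        needed = -(-packet_bytes // 64)      # ceil: credits are 64 B each
        if self.credits < needed:
            return False                     # blocked on this VL only
        self.credits -= needed
        return True

    def return_credits(self, n):             # receiver freed buffer space
        self.credits += n

vl0 = VirtualLane(credits=4)                 # room for 256 bytes
assert vl0.try_send(256) is True
assert vl0.try_send(64) is False             # no credits: wait, never drop
vl0.return_credits(1)
assert vl0.try_send(64) is True
```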
52. Mellanox Training Center 52Training Material
InfiniBand is a lossless fabric.
The maximum Bit Error Rate (BER) allowed by the IB spec is 10^-12.
Statistically, Mellanox fabrics provide around 10^-15.
The physical layer must guarantee effective signaling to meet this BER requirement.
Physical Layer- Responsibilities
53. Mellanox Training Center 53Training Material
Industry-standard media types
• Copper: 7-meter QDR, 3-meter FDR
• Fiber: 100/300 m QDR & FDR
64/66 encoding on FDR links
• Encoding makes it possible to send high-speed digital signals over longer distances and enhances
performance & bandwidth effectiveness
• X actual data bits are sent on the line as Y signal bits
• 64/66 × 56.25 Gb/s ≈ 54.5 Gb/s
8/10-bit encoding (DDR and QDR)
• X/Y line efficiency (example: 80% × 40 Gb/s = 32 Gb/s)
Physical Layer Cont
4X QSFP Fiber 4X QSFP Copper
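The efficiency figures above are just line rate multiplied by the data/signal bit ratio. A quick check, assuming the FDR signaling rate of 14.0625 Gb/s per lane (56.25 Gb/s over a 4X link):

```python
# Check of the encoding-efficiency arithmetic from the slide.
def effective_rate(line_rate_gbps, data_bits, signal_bits):
    return line_rate_gbps * data_bits / signal_bits

qdr = effective_rate(40, 8, 10)       # 8b/10b at QDR: 80% efficiency
fdr = effective_rate(56.25, 64, 66)   # 64b/66b at FDR: ~97% efficiency

print(round(qdr, 1), round(fdr, 1))   # -> 32.0 54.5
```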
54. Mellanox Training Center 54Training Material
Perception: Mellanox cables are rebranded from a cable vendor
• Fact: Mellanox cables are manufactured by Mellanox
Perception: the vendor can sell the same cables
• Fact: no other vendor is allowed to sell Mellanox cables
• Mellanox cables use a different assembly procedure
• Mellanox cables are tested with a unique test suite
• Vendors’ “finished goods” fail Mellanox’s dedicated testing
Mellanox allows customers to use any IBTA-approved IB cables
Mellanox Cables – Perceptions Vs. Facts
Passive Copper Cables SFP+
Active Optical Cables
Active Copper Cables
55. Mellanox Training Center 55Training Material
Superior design and qualification process
Committed to a Bit Error Rate (BER) better than 10^-15
Longest reach with Mellanox end-to-end solution
Mellanox Passive Copper Cables
Data Rate PCC Max Reach
FDR 3 meter
FDR10 5 meter
QDR 7 meter
40GbE 7 meter
10GbE 7 meter
56. Mellanox Training Center 56Training Material
Superior design and qualification process
Committed to a Bit Error Rate (BER) better than 10^-15
Longest reach with Mellanox end-to-end solution
Optical Performance Optimization (patent pending)
Mellanox Active Fiber Cables
Data Rate Max Reach
FDR 300 meter
FDR10 100 meter
QDR 300 meter
40GbE 100 meter
60. Mellanox Training Center 60Training Material
FDR InfiniBand Switch Portfolio
648 port 324 port 216 port 108 port
Modular Switches Edge Switches
SX6025 – 36 ports, externally managed
SX6036 – 36 ports, managed
SX6018 – 18 ports, managed
SX6015 – 18 ports, externally managed
SX6012 – 12 ports, managed
SX6005 – 12 ports, externally managed
Long Distance Bridge − VPI
Bridging
Routing
61. Mellanox Training Center 61Training Material
SwitchX® VPI Technology Highlights
Virtual Protocol Interconnect® (VPI) – One Switch, Multiple Technologies
1. VPI on Box: the same box runs InfiniBand OR Ethernet
2. VPI per Port: the same box runs InfiniBand AND Ethernet
3. VPI Bridging: the same box bridges InfiniBand AND Ethernet (bridging & routing)
62. Mellanox Training Center 62Training Material
Provides InfiniBand and Ethernet long-haul solutions
of up to 80 km for campus and metro applications
Connects data centers deployed across
multiple geographically distributed sites
Extends InfiniBand RDMA and Ethernet RoCE
beyond local data centers and storage clusters
A cost-effective, low-power, easily managed and
scalable solution
Managed as a single unified network infrastructure
MetroX™ - Mellanox Long-Haul Solutions
63. Mellanox Training Center 63Training Material
MetroDX and MetroX Features
Feature      | TX6000                 | TX6100                 | TX6240                    | TX6280
Distance     | 1 km                   | 10 km                  | 40 km                     | 80 km
Throughput   | 640 Gb/s               | 240 Gb/s               | 80 Gb/s                   | 40 Gb/s
Port density | 16p FDR10 long haul,   | 6p 40 Gb/s long haul,  | 2p 10/40 Gb/s long haul,  | 1p 10/40 Gb/s long haul,
             | 16p FDR downlink       | 6p 56 Gb/s downlink    | 2p 56 Gb/s downlink       | 1p 56 Gb/s downlink
Latency      | 200 ns + 5 µs/km fiber | 200 ns + 5 µs/km fiber | 700 ns + 5 µs/km fiber    | 700 ns + 5 µs/km fiber
Power        | ~200 W                 | ~200 W                 | ~280 W                    | ~280 W
QoS          | 1 data VL + VL15       | 1 data VL + VL15       | 1 data VL + VL15          | 1 data VL + VL15
Space        | 1RU                    | 1RU                    | 2RU                       | 2RU
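The latency row combines a fixed box latency with fiber propagation delay of about 5 µs per kilometer. A quick illustrative check for a TX6100 (200 ns box latency) at its full 10 km reach:

```python
# One-way link latency: box latency plus ~5 us/km of fiber propagation.
def link_latency_us(box_latency_ns, distance_km, per_km_us=5.0):
    return box_latency_ns / 1000 + per_km_us * distance_km

print(link_latency_us(200, 10))   # -> 50.2 (microseconds, one direction)
```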
64. Mellanox Training Center 64Training Material
Mellanox Host Channel Adapters (HCA)
Reference to the following Document :
ConnectX®-3 VPI Single and Dual QSFP Port Adapter Card User Manual
http://www.mellanox.com/page/products_dyn?product_family=119&mtag=connectx_3_vpi
65. Mellanox Training Center 65Training Material
Up to 56Gb/s InfiniBand or 40 Gigabit Ethernet per port
PCI Express 3.0 (up to 8GT/s)
CPU offload of transport operations
Application offload
GPU communication acceleration
End-to-end QoS and congestion control
Hardware-based I/O virtualization
Dynamic power management
Fibre Channel encapsulation (FCoIB or FCoE)
Ethernet encapsulation (EoIB)
HCA ConnectX-3 InfiniBand Main Features
(Target markets: HPC, Database, Virtualization, Cloud Computing)
66. Mellanox Training Center 66Training Material
Adapters offering
ConnectX-3 (VPI):
• Up to 56 Gb/s InfiniBand
• Up to 56 Gb/s Ethernet
• RDMA, CPU offload, SR-IOV
ConnectX-3 Pro adds:
• NVGRE and VXLAN HW offload
• RoCE v2 (UDP)
• ECN / QCN
Connect-IB:
• Up to 56 Gb/s InfiniBand; greater than 100 Gb/s bidirectional
• Dynamically Connected transport (DC)
• T10-DIF
• PCIe x16
• More than 130M messages/sec
68. Mellanox Training Center 68Training Material
Enable fast cluster bring-up
• Point out issues with devices, systems, cables
• Provide inventory including cables, devices, FW, SW
• Perform device specific (proprietary) checks
• Eye-Opening and BER checks
• Catch cabling mistakes
Validate the Subnet Manager’s work
• Verify connectivity at the lowest level possible
• Report subnet configuration
• SM agnostic
Goal of Fabric Utilities in HPC Context
69. Mellanox Training Center 69Training Material
Diagnose L2 communication failures
• At the entire subnet level
• On a point to point path
Monitor the Network Health
• Continuous and with low overhead
Goal of Fabric Utilities in HPC Context
70. Mellanox Training Center 70Training Material
ibutils: ibdiagnet/ibdiagpath
• An automated L2 health analysis procedure
• Text interface
• No dedicated “monitoring” mode
• Significant development over the past year on features and runtime performance at scale
UFM
• High-end monitoring and provisioning capabilities
• GUI based with CLI options
• Includes ibutils capabilities with additional features
• Central device management
- Fabric dashboard
- Congestion analysis
• System Integration Capabilities
- SMP Traps and Alarms
Fabric Management Solution Overview
71. Mellanox Training Center 71Training Material
UFM in the Fabric
Available as software or as an appliance
High availability with 2 or more instances (synchronization, heartbeat)
Switch and HCA management
Full management or monitoring-only modes
73. Mellanox Training Center 73Training Material
Extensible architecture
• Based on Web-services
Open API for users or 3rd-party extensions
• Allows simple reporting, provisioning, and monitoring
• Task automation
• Software Development Kit
Extensible object model
• User-defined fields
• User-defined menus
Integration with 3rd Party Systems
(Diagram: a web-based API exposes read/write/manage operations to monitoring systems, configuration management, orchestrators and job schedulers; alerts are delivered via SNMP traps.)
74. Mellanox Training Center 74Training Material
UFM – Comprehensive Robust Management
Automatic Discovery | Central Device Management | Fabric Dashboard | Congestion Analysis | Health & Performance Monitoring | Service-Oriented Provisioning | Fabric Health Reports
75. Mellanox Training Center 75Training Material
UFM Main Features
Automatic Discovery Central Device Mgmt Fabric Dashboard Congestion Analysis
Health & Perf Monitoring Advanced Alerting Fabric Health Reports Service Oriented Provisioning
76. Mellanox Training Center 76Training Material
Monitor & analyze fabric performance
• Bandwidth utilization
• Unique congestion monitoring
• Dashboard for aggregated fabric view
Real-time fabric-wide health monitoring
• Monitors events and errors throughout the fabric
• Threshold-based alarms
• Granular monitoring of host and switch parameters
Innovative congestion mapping
• One view for fabric-wide congestion and traffic patterns
• Enables root cause analysis for routing, job placement
or resource allocation inefficiencies
All is managed at the job/aggregation level
Advanced Monitoring and Analysis