QsNetIII: An HPC Interconnect for PetaScale Systems

    Duncan Roweth, Quadrics Ltd
     ISC08 Dresden June 2008
Quadrics Background


• Develops interconnect products for the HPC market
   – HPC Linux systems
   – AlphaServer SC systems
• Quadrics is owned by the Finmeccanica group
• Quadrics will be 12 years old in July
Interconnect Network – QsNet

• QsNetIII Network
   – Multi-stage switch network
   – Evolution of the QsNetII design
   – Increased use of commodity hardware
   – Increasing support for standard software
• QsNetIII Components
   – ASICs Elan5 and Elite5
   – Adapters, switches, cables
   – Firmware, drivers, libraries
   – Diagnostics, documentation
Elan5 Adapter Overview

• 2 × 25 Gbit/s QsNetIII links
• PCIe, PCIe2 host interface
• Multiple packet engines
• 512KB of high-bandwidth on-chip local memory
• SDRAM interface to optional local memory
• Buffer manager, object cache

[Figure: Elan5 adapter block diagram. Two CX4/QsNetIII links; seven packet engines, each with a 16K instruction cache and 9K data buffers; x8 fabric and bridge; host interface with TLB, command launch and PCIe SERDES (16 lanes); local memory of 16K × 8 × 8 banks = 1MB ECC RAM; local functions including object cache tags, buffer manager, free list, external cache, SDRAM i/f (DDRII), external i/f, external EEPROM, PLL and clocks.]
QsNetIII Adapter Overview



•   QM700 PCIe x16
•   128MB adapter memory
•   2 QSFP links
•   Half height low profile



• Adapter variants
    – PCIe Gen2
    – Blade formats
    – 10Gbit/s Ethernet 10GBase-CX4
Elite5 - Overview


• Physical layer: DDR XAUI
     – 4 × 6.25 Gbit/s (2.5 Gbytes/s) in each direction
•   32-way crosspoint router
•   32 virtual channels per link
•   Fat tree or mesh topologies
•   Adaptive routing
•   Broadcast & barrier support
•   Memory mapped stats & error
    counters accessed via control
    network
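The 2.5 Gbytes/s figure follows from the lane count and lane rate once line coding is accounted for. A minimal sketch of the arithmetic, assuming the DDR XAUI links keep standard XAUI 8b/10b encoding (the slide does not state the coding). The 25 Gbit/s raw rate also matches the "2 × 25 Gbit/s links" quoted for the Elan5 adapter.

```python
# Per-direction bandwidth of one Elite5 link
LANES = 4
LANE_RATE_GBIT = 6.25          # line rate per lane, from the slide
# Assumption: 8b/10b line coding, as in standard XAUI (8 data bits per 10 line bits)

raw_gbit = LANES * LANE_RATE_GBIT        # bits on the wire per second, in Gbit/s
data_gbit = raw_gbit * 8 / 10            # payload bits after removing coding overhead
data_gbyte = data_gbit / 8               # convert Gbit/s to Gbytes/s

print(f"raw {raw_gbit} Gbit/s, data {data_gbit} Gbit/s = {data_gbyte} Gbytes/s")
```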
QsNetIII Adaptive Routing


• Packet by packet dynamic routing
   – Single cycle routing decision
• Selects route based on
   – Link state, errors, etc.
   – Number of pending acks
• High radix switches
   – 2 routing decisions for 2048 nodes
• More flexible than QsNetII
   – Operates on groups of links
   – Can adaptively route up or down
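The selection criteria above can be illustrated with a small sketch: given a group of equivalent output links, skip any that are down or reporting errors, then pick the one with the fewest pending acks. The `Link` fields, the `max_errors` threshold and the tie-break rule are hypothetical; the slide does not describe the Elite5 hardware at this level, and the real decision is made in a single clock cycle in silicon, not in software.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Link:
    # Hypothetical per-link state; the real Elite5 registers are not described here
    port: int
    up: bool
    error_count: int
    pending_acks: int


def choose_output(group: List[Link], max_errors: int = 0) -> Optional[int]:
    """Pick one output port from a group of equivalent links.

    Down links and links with more than max_errors recent errors are skipped;
    of the rest, the link with the fewest pending acks wins, ties going to the
    lowest port number. Returns None if no link in the group is usable.
    """
    usable = [link for link in group if link.up and link.error_count <= max_errors]
    if not usable:
        return None
    best = min(usable, key=lambda link: (link.pending_acks, link.port))
    return best.port


# Usage: port 1 is down and port 3 is reporting errors, so the choice is
# between port 0 (5 pending acks) and port 2 (2 pending acks)
uplinks = [
    Link(port=0, up=True, error_count=0, pending_acks=5),
    Link(port=1, up=False, error_count=0, pending_acks=0),
    Link(port=2, up=True, error_count=0, pending_acks=2),
    Link(port=3, up=True, error_count=3, pending_acks=0),
]
print(choose_output(uplinks))  # 2
```

Using pending acks as the load signal means a congested path is avoided without any global state: each switch only needs to know how much of its own traffic is still unacknowledged on each link.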
Bandwidth scalability – 1024 nodes


• Bandwidth achieved
  when 1024 nodes all
  communicate at the
  same time
• QsNetII provides better
  average bandwidth
  and a much narrower
  spread between best and worst
  case performance



      System     Interconnect     Min     Max     Average
      Atlas      Infiniband        95     762         263
      Thunder    QsNetII          248     403         369

      Data from Lawrence Livermore National Laboratory, published at the Sonoma OpenFabrics Workshop, June 2007
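The "narrower spread" claim can be quantified directly from the table: dividing Max by Min gives the best-to-worst ratio, which is roughly 8 for Atlas and roughly 1.6 for Thunder. A minimal sketch using only the published figures (the slide does not restate their units, so none are assumed here).

```python
# LLNL all-nodes bandwidth figures from the table: (Min, Max, Average)
results = {
    "Atlas / Infiniband": (95, 762, 263),
    "Thunder / QsNetII": (248, 403, 369),
}


def spread(worst: float, best: float) -> float:
    """Ratio of best-case to worst-case bandwidth; 1.0 means perfectly uniform."""
    return best / worst


for system, (worst, best, average) in results.items():
    print(f"{system}: best/worst = {spread(worst, best):.2f}, "
          f"worst case = {100 * worst / average:.0f}% of average")
```

For a bulk-synchronous application the worst-case node sets the pace, so the Min column and the spread matter more than the Max column.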
QsNetIII Device Overview




• Manufacturing partner LSI/TSMC, G90 process
• Semi-custom ASICs, 500MHz system clock
• High performance BGA package

                 Elan       Elite
      Pins       672        982
      Power      17W        18W
QsNetIII – Federated Network Switches


• Node switch chassis
   – 128 links up, 128 down


• Same chassis provides multiple
  top switch configurations:
   –   64 × 4: 512-way systems
   –   32 × 8: 1024-way systems
   –   16 × 16: 2048-way systems
   –   8 × 32: 4096-way systems
QsNetIII Network 4096–way
QsNetIII cables


• QSFP connectors throughout
• Optical cables (e.g. Luxtera), 5-300 m
    – PVDF, plenum rated
    – LSZH available as an option
• Active copper cables (Gore), 8-20 m
• Copper cables (Gore), 1-10 m
• No longer Quadrics proprietary

• Bit error rates become a significant issue at 5 Gbit/s
  and above
    – Optical cables between switches
    – Short copper cables from nodes
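To see why bit error rates matter at these speeds, the sketch below estimates the mean time between bit errors on one lane and across many lanes running at once. The BER of 1e-12 and the 10,000-lane count are illustrative assumptions, not QsNetIII specifications; 3.125 Gbit/s is the standard XAUI lane rate and 6.25 Gbit/s the DDR XAUI rate from the Elite5 slide.

```python
def seconds_between_errors(line_rate_gbit: float, ber: float, lanes: int = 1) -> float:
    """Mean time between bit errors across `lanes` lanes, all running flat out."""
    errors_per_second = line_rate_gbit * 1e9 * ber * lanes
    return 1.0 / errors_per_second


ASSUMED_BER = 1e-12              # illustrative only
for rate in (3.125, 6.25):       # standard XAUI vs DDR XAUI lane rate, Gbit/s
    one = seconds_between_errors(rate, ASSUMED_BER)
    many = seconds_between_errors(rate, ASSUMED_BER, lanes=10_000)
    print(f"{rate} Gbit/s: one lane every {one:.0f} s, "
          f"10,000 lanes every {many * 1000:.0f} ms")
```

Doubling the lane rate halves the time between errors at a fixed BER, and at system scale errors arrive many times per second, which is why per-packet CRCs and retransmission are essential rather than optional.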
QsNetIII for HP BladeSystem


Elan5 mezzanine adapter
• 2 QsNet links
• PCI-E x8 (initially)
• 128 MB of memory

Elite5 switch module
• Full bandwidth
• 16 links to the blades (via backplane)
• 16 links to back of the module
2048-way QsNetIII BladeSystem Network
Building a 16K node system in 2009/10


• A single water-cooled rack will provide 1000-2000 standard
  cores (~12-25 TF)
• 8 blade switches per rack
• Connect 128 of these racks with 1024-way top switches
• Single fibre cable per node for full bisection bandwidth
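The 16K figure follows from the per-rack numbers combined with the BladeSystem switch module's 16 blade links and 16 external links. A minimal sketch of the arithmetic; the wiring rule that each 1024-way top switch takes exactly one uplink from every blade switch is an assumption, not something the slide states.

```python
RACKS = 128
BLADE_SWITCHES_PER_RACK = 8
BLADES_PER_SWITCH = 16       # "16 links to the blades" (BladeSystem slide)
UPLINKS_PER_SWITCH = 16      # "16 links to back of the module"
TOP_SWITCH_PORTS = 1024      # "1024-way top switches"

nodes_per_rack = BLADE_SWITCHES_PER_RACK * BLADES_PER_SWITCH     # 128
total_nodes = RACKS * nodes_per_rack                               # 16,384 = 16K
blade_switches_total = RACKS * BLADE_SWITCHES_PER_RACK             # 1,024
uplinks_total = blade_switches_total * UPLINKS_PER_SWITCH          # 16,384

# One uplink per node is what "single fibre cable per node" implies
assert uplinks_total == total_nodes

# Assumption: every blade switch sends one uplink to each top switch,
# so a 1024-way top switch has exactly one port per blade switch
top_switches = uplinks_total // TOP_SWITCH_PORTS                   # 16
assert top_switches == UPLINKS_PER_SWITCH

# Cores per node implied by 1000-2000 standard cores per rack
cores_per_node = (1000 / nodes_per_rack, 2000 / nodes_per_rack)   # about 8 to 16
print(total_nodes, top_switches, cores_per_node)
```

Because every blade switch has as many uplinks as blade links, traffic leaving a rack is never squeezed through fewer links than feed it, which is what full bisection bandwidth requires.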
QsNetIII Fault Tolerance


• All of the QsNetII Features
   –   CRCs on every packet
   –   Automatic retransmission
   –   Adaptive routing avoids failed links
   –   Redundant routes
   –   Redundant, hot-pluggable PSUs and fans


+ Full line rate testing of each link as it comes up
   – Switches generate CRPAT, CJPAT or PRBS packets
   – Links are only added to the route tables when they are (a)
     up, (b) connect to the right place, and (c) can transfer data
     without error.
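The three admission conditions can be sketched as a simple gate in front of the route table. This illustration uses only a PRBS7 pattern (x^7 + x^6 + 1) as the test traffic; the switches also use CRPAT and CJPAT, which are not modelled here. `LinkTest`, `admit_link` and the neighbour naming scheme are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


def prbs7(seed: int = 0x7F, n: int = 127) -> List[int]:
    """PRBS7 bit stream from the x^7 + x^6 + 1 LFSR (period 127 for a nonzero seed)."""
    state = seed & 0x7F
    bits = []
    for _ in range(n):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1   # taps at stages 7 and 6
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


@dataclass
class LinkTest:
    # Hypothetical summary of one link's bring-up test
    up: bool                 # (a) physical link is up
    neighbour: str           # identity reported by the far end
    sent: List[int]          # test pattern transmitted at line rate
    received: List[int]      # pattern captured at the far end


def bit_errors(sent: List[int], received: List[int]) -> int:
    """Count mismatched bits; missing or extra bits also count as errors."""
    mismatches = sum(a != b for a, b in zip(sent, received))
    return mismatches + abs(len(sent) - len(received))


def admit_link(test: LinkTest, expected_neighbour: str,
               route_table: Set[str], link_id: str) -> bool:
    """Add link_id to route_table only if it is up, correctly cabled and error free."""
    if not test.up:                                    # (a) link down
        return False
    if test.neighbour != expected_neighbour:           # (b) miscabled
        return False
    if bit_errors(test.sent, test.received) != 0:      # (c) errors seen
        return False
    route_table.add(link_id)
    return True


# Usage: one clean link and one with a single flipped bit
pattern = prbs7()
clean = LinkTest(True, "top-switch-3/port-12", pattern, list(pattern))
noisy_rx = list(pattern)
noisy_rx[40] ^= 1
noisy = LinkTest(True, "top-switch-3/port-13", pattern, noisy_rx)

routes: Set[str] = set()
admit_link(clean, "top-switch-3/port-12", routes, "link-A")
admit_link(noisy, "top-switch-3/port-13", routes, "link-B")
print(sorted(routes))  # ['link-A']
```

Gating on all three checks means a marginal cable is kept out of the routing tables entirely, rather than being admitted and then relying on retransmission to mask its errors.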
Software Model – Firmware & Drivers


• Base firmware in the ROMs
• Firmware modules loadable with the device driver
   – Elan, OpenFabrics, 10 Gbit/s Ethernet, …
• Kernel modules
   – elan5, elan, rms
• Device dependent library (libelan5)
• Device independent library (libelan)
• User libraries
Software Model – Elan Libraries


• Point-to-point message passing
• One-sided put/get
• Transparent rail striping
• Optimised collectives
• Locks and atomic ops
• Global memory allocation
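The slide does not say how libelan splits traffic across rails. As a hypothetical illustration only, one simple scheme deals fixed-size chunks round-robin across the rails and tags each with its byte offset, so the receiver can place data correctly whatever order the rails deliver it in. The chunk size and the `stripe`/`reassemble` names are assumptions, not part of the Elan libraries.

```python
from typing import List, Tuple

Chunk = Tuple[int, bytes]   # (byte offset in the message, payload)


def stripe(message: bytes, rails: int, chunk_size: int = 4096) -> List[List[Chunk]]:
    """Deal fixed-size chunks of message round-robin across rails."""
    per_rail: List[List[Chunk]] = [[] for _ in range(rails)]
    for index, offset in enumerate(range(0, len(message), chunk_size)):
        per_rail[index % rails].append((offset, message[offset:offset + chunk_size]))
    return per_rail


def reassemble(per_rail: List[List[Chunk]], length: int) -> bytes:
    """Rebuild the message; chunk offsets make arrival order irrelevant."""
    buffer = bytearray(length)
    for rail in reversed(per_rail):          # deliberately not in rail order
        for offset, payload in rail:
            buffer[offset:offset + len(payload)] = payload
    return bytes(buffer)


message = bytes(range(256)) * 100            # 25,600 bytes = 7 chunks of up to 4 KB
rails = stripe(message, rails=2)
print([len(r) for r in rails])               # [4, 3]
assert reassemble(rails, len(message)) == message
```

"Transparent" here means the application sees a single send and receive; the splitting and reassembly happen inside the library.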
Why Quadrics?


• Focus on the most demanding HPC applications
• Delivers large system scalability
   – All nodes achieve host adapter bandwidth at the same time
   – Minimal spread between best and worst case performance
   – Low and uniform latency
   – Highly optimised collectives
• Single supplier of interconnect hardware, software, support
• Stability of our products
• Track record of delivering production systems
• European company

More Related Content

What's hot

From virtual to high end HW routing for the adult
From virtual to high end HW routing for the adultFrom virtual to high end HW routing for the adult
From virtual to high end HW routing for the adultMarketingArrowECS_CZ
 
LF_DPDK17_ OpenVswitch hardware offload over DPDK
LF_DPDK17_ OpenVswitch hardware offload over DPDKLF_DPDK17_ OpenVswitch hardware offload over DPDK
LF_DPDK17_ OpenVswitch hardware offload over DPDKLF_DPDK
 
Massively Parallel RISC-V Processing with Transactional Memory
Massively Parallel RISC-V Processing with Transactional MemoryMassively Parallel RISC-V Processing with Transactional Memory
Massively Parallel RISC-V Processing with Transactional MemoryNetronome
 
Eliminating SAN Congestion Just Got Much Easier- webinar - Nov 2015
Eliminating SAN Congestion Just Got Much Easier-  webinar - Nov 2015 Eliminating SAN Congestion Just Got Much Easier-  webinar - Nov 2015
Eliminating SAN Congestion Just Got Much Easier- webinar - Nov 2015 Tony Antony
 
DPDK summit 2015: It's kind of fun to do the impossible with DPDK
DPDK summit 2015: It's kind of fun  to do the impossible with DPDKDPDK summit 2015: It's kind of fun  to do the impossible with DPDK
DPDK summit 2015: It's kind of fun to do the impossible with DPDKLagopus SDN/OpenFlow switch
 
DPDK Summit 2015 - NTT - Yoshihiro Nakajima
DPDK Summit 2015 - NTT - Yoshihiro NakajimaDPDK Summit 2015 - NTT - Yoshihiro Nakajima
DPDK Summit 2015 - NTT - Yoshihiro NakajimaJim St. Leger
 
DPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitch
DPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitchDPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitch
DPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitchJim St. Leger
 
Open CAPI, A New Standard for High Performance Attachment of Memory, Accelera...
Open CAPI, A New Standard for High Performance Attachment of Memory, Accelera...Open CAPI, A New Standard for High Performance Attachment of Memory, Accelera...
Open CAPI, A New Standard for High Performance Attachment of Memory, Accelera...inside-BigData.com
 
LF_OVS_17_Enabling hardware acceleration in OVS-DPDK using DPDK Framework.
LF_OVS_17_Enabling hardware acceleration in OVS-DPDK using DPDK Framework.LF_OVS_17_Enabling hardware acceleration in OVS-DPDK using DPDK Framework.
LF_OVS_17_Enabling hardware acceleration in OVS-DPDK using DPDK Framework.LF_OpenvSwitch
 
Ovs dpdk hwoffload way to full offload
Ovs dpdk hwoffload way to full offloadOvs dpdk hwoffload way to full offload
Ovs dpdk hwoffload way to full offloadKevin Traynor
 
Intel® Ethernet Update
Intel® Ethernet Update Intel® Ethernet Update
Intel® Ethernet Update Michelle Holley
 
Virtual Network Performance Challenge
Virtual Network Performance ChallengeVirtual Network Performance Challenge
Virtual Network Performance ChallengeStephen Hemminger
 
Cisco usNIC: how it works, how it is used in Open MPI
Cisco usNIC: how it works, how it is used in Open MPICisco usNIC: how it works, how it is used in Open MPI
Cisco usNIC: how it works, how it is used in Open MPIJeff Squyres
 
LF_OVS_17_Red Hat's perspective on OVS HW Offload Status
LF_OVS_17_Red Hat's perspective on OVS HW Offload StatusLF_OVS_17_Red Hat's perspective on OVS HW Offload Status
LF_OVS_17_Red Hat's perspective on OVS HW Offload StatusLF_OpenvSwitch
 
Cisco nexus 7000, nexus 5000 and 2000 fa qs
Cisco nexus 7000, nexus 5000 and 2000 fa qsCisco nexus 7000, nexus 5000 and 2000 fa qs
Cisco nexus 7000, nexus 5000 and 2000 fa qsIT Tech
 
Unifying Network Filtering Rules for the Linux Kernel with eBPF
Unifying Network Filtering Rules for the Linux Kernel with eBPFUnifying Network Filtering Rules for the Linux Kernel with eBPF
Unifying Network Filtering Rules for the Linux Kernel with eBPFNetronome
 

What's hot (20)

From virtual to high end HW routing for the adult
From virtual to high end HW routing for the adultFrom virtual to high end HW routing for the adult
From virtual to high end HW routing for the adult
 
LF_DPDK17_ OpenVswitch hardware offload over DPDK
LF_DPDK17_ OpenVswitch hardware offload over DPDKLF_DPDK17_ OpenVswitch hardware offload over DPDK
LF_DPDK17_ OpenVswitch hardware offload over DPDK
 
Massively Parallel RISC-V Processing with Transactional Memory
Massively Parallel RISC-V Processing with Transactional MemoryMassively Parallel RISC-V Processing with Transactional Memory
Massively Parallel RISC-V Processing with Transactional Memory
 
Eliminating SAN Congestion Just Got Much Easier- webinar - Nov 2015
Eliminating SAN Congestion Just Got Much Easier-  webinar - Nov 2015 Eliminating SAN Congestion Just Got Much Easier-  webinar - Nov 2015
Eliminating SAN Congestion Just Got Much Easier- webinar - Nov 2015
 
DPDK summit 2015: It's kind of fun to do the impossible with DPDK
DPDK summit 2015: It's kind of fun  to do the impossible with DPDKDPDK summit 2015: It's kind of fun  to do the impossible with DPDK
DPDK summit 2015: It's kind of fun to do the impossible with DPDK
 
Cisco nx os
Cisco nx os Cisco nx os
Cisco nx os
 
DPDK Summit 2015 - NTT - Yoshihiro Nakajima
DPDK Summit 2015 - NTT - Yoshihiro NakajimaDPDK Summit 2015 - NTT - Yoshihiro Nakajima
DPDK Summit 2015 - NTT - Yoshihiro Nakajima
 
Cisco data center training for ibm
Cisco data center training for ibmCisco data center training for ibm
Cisco data center training for ibm
 
Решения NFV в контексте операторов связи
Решения NFV в контексте операторов связиРешения NFV в контексте операторов связи
Решения NFV в контексте операторов связи
 
DPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitch
DPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitchDPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitch
DPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitch
 
Open CAPI, A New Standard for High Performance Attachment of Memory, Accelera...
Open CAPI, A New Standard for High Performance Attachment of Memory, Accelera...Open CAPI, A New Standard for High Performance Attachment of Memory, Accelera...
Open CAPI, A New Standard for High Performance Attachment of Memory, Accelera...
 
LF_OVS_17_Enabling hardware acceleration in OVS-DPDK using DPDK Framework.
LF_OVS_17_Enabling hardware acceleration in OVS-DPDK using DPDK Framework.LF_OVS_17_Enabling hardware acceleration in OVS-DPDK using DPDK Framework.
LF_OVS_17_Enabling hardware acceleration in OVS-DPDK using DPDK Framework.
 
Ovs dpdk hwoffload way to full offload
Ovs dpdk hwoffload way to full offloadOvs dpdk hwoffload way to full offload
Ovs dpdk hwoffload way to full offload
 
NFV в сетях операторов связи
NFV в сетях операторов связиNFV в сетях операторов связи
NFV в сетях операторов связи
 
Intel® Ethernet Update
Intel® Ethernet Update Intel® Ethernet Update
Intel® Ethernet Update
 
Virtual Network Performance Challenge
Virtual Network Performance ChallengeVirtual Network Performance Challenge
Virtual Network Performance Challenge
 
Cisco usNIC: how it works, how it is used in Open MPI
Cisco usNIC: how it works, how it is used in Open MPICisco usNIC: how it works, how it is used in Open MPI
Cisco usNIC: how it works, how it is used in Open MPI
 
LF_OVS_17_Red Hat's perspective on OVS HW Offload Status
LF_OVS_17_Red Hat's perspective on OVS HW Offload StatusLF_OVS_17_Red Hat's perspective on OVS HW Offload Status
LF_OVS_17_Red Hat's perspective on OVS HW Offload Status
 
Cisco nexus 7000, nexus 5000 and 2000 fa qs
Cisco nexus 7000, nexus 5000 and 2000 fa qsCisco nexus 7000, nexus 5000 and 2000 fa qs
Cisco nexus 7000, nexus 5000 and 2000 fa qs
 
Unifying Network Filtering Rules for the Linux Kernel with eBPF
Unifying Network Filtering Rules for the Linux Kernel with eBPFUnifying Network Filtering Rules for the Linux Kernel with eBPF
Unifying Network Filtering Rules for the Linux Kernel with eBPF
 

Similar to QsNetIII, An HPC Interconnect For Peta Scale Systems

IBM Power9 Features and Specifications
IBM Power9 Features and SpecificationsIBM Power9 Features and Specifications
IBM Power9 Features and Specificationsinside-BigData.com
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDKKernel TLV
 
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...PROIDEA
 
Memory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and VirtualizationMemory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and VirtualizationBigstep
 
Scaling the Container Dataplane
Scaling the Container Dataplane Scaling the Container Dataplane
Scaling the Container Dataplane Michelle Holley
 
Recent Developments in Donard
Recent Developments in DonardRecent Developments in Donard
Recent Developments in DonardPMC-Sierra Inc.
 
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running LinuxLinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linuxbrouer
 
Optimized HPC/AI cloud with OpenStack acceleration service and composable har...
Optimized HPC/AI cloud with OpenStack acceleration service and composable har...Optimized HPC/AI cloud with OpenStack acceleration service and composable har...
Optimized HPC/AI cloud with OpenStack acceleration service and composable har...Shuquan Huang
 
Heterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of SystemsHeterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of SystemsAnand Haridass
 
Flexible and Scalable Domain-Specific Architectures
Flexible and Scalable Domain-Specific ArchitecturesFlexible and Scalable Domain-Specific Architectures
Flexible and Scalable Domain-Specific ArchitecturesNetronome
 
CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016] CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016] IO Visor Project
 
High-performance 32G Fibre Channel Module on MDS 9700 Directors:
High-performance 32G Fibre Channel Module on MDS 9700 Directors:High-performance 32G Fibre Channel Module on MDS 9700 Directors:
High-performance 32G Fibre Channel Module on MDS 9700 Directors:Tony Antony
 
Ocpeu14
Ocpeu14Ocpeu14
Ocpeu14KALRAY
 
Maximizing Application Performance on Cray XT6 and XE6 Supercomputers DOD-MOD...
Maximizing Application Performance on Cray XT6 and XE6 Supercomputers DOD-MOD...Maximizing Application Performance on Cray XT6 and XE6 Supercomputers DOD-MOD...
Maximizing Application Performance on Cray XT6 and XE6 Supercomputers DOD-MOD...Jeff Larkin
 

Similar to QsNetIII, An HPC Interconnect For Peta Scale Systems (20)

XS Boston 2008 Network Topology
XS Boston 2008 Network TopologyXS Boston 2008 Network Topology
XS Boston 2008 Network Topology
 
IBM Power9 Features and Specifications
IBM Power9 Features and SpecificationsIBM Power9 Features and Specifications
IBM Power9 Features and Specifications
 
100 M pps on PC.
100 M pps on PC.100 M pps on PC.
100 M pps on PC.
 
To Infiniband and Beyond
To Infiniband and BeyondTo Infiniband and Beyond
To Infiniband and Beyond
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDK
 
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...
 
Memory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and VirtualizationMemory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and Virtualization
 
Scaling the Container Dataplane
Scaling the Container Dataplane Scaling the Container Dataplane
Scaling the Container Dataplane
 
Recent Developments in Donard
Recent Developments in DonardRecent Developments in Donard
Recent Developments in Donard
 
POWER9 for AI & HPC
POWER9 for AI & HPCPOWER9 for AI & HPC
POWER9 for AI & HPC
 
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running LinuxLinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
 
Optimized HPC/AI cloud with OpenStack acceleration service and composable har...
Optimized HPC/AI cloud with OpenStack acceleration service and composable har...Optimized HPC/AI cloud with OpenStack acceleration service and composable har...
Optimized HPC/AI cloud with OpenStack acceleration service and composable har...
 
Heterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of SystemsHeterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of Systems
 
Flexible and Scalable Domain-Specific Architectures
Flexible and Scalable Domain-Specific ArchitecturesFlexible and Scalable Domain-Specific Architectures
Flexible and Scalable Domain-Specific Architectures
 
CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016] CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016]
 
High-performance 32G Fibre Channel Module on MDS 9700 Directors:
High-performance 32G Fibre Channel Module on MDS 9700 Directors:High-performance 32G Fibre Channel Module on MDS 9700 Directors:
High-performance 32G Fibre Channel Module on MDS 9700 Directors:
 
Power overview 2018 08-13b
Power overview 2018 08-13bPower overview 2018 08-13b
Power overview 2018 08-13b
 
Ocpeu14
Ocpeu14Ocpeu14
Ocpeu14
 
pps Matters
pps Matterspps Matters
pps Matters
 
Maximizing Application Performance on Cray XT6 and XE6 Supercomputers DOD-MOD...
Maximizing Application Performance on Cray XT6 and XE6 Supercomputers DOD-MOD...Maximizing Application Performance on Cray XT6 and XE6 Supercomputers DOD-MOD...
Maximizing Application Performance on Cray XT6 and XE6 Supercomputers DOD-MOD...
 

Recently uploaded

Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 

Recently uploaded (20)

Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 

QsNetIII: An HPC Interconnect for PetaScale Systems

  • 1. QsNetIII: An HPC Interconnect for PetaScale Systems • Duncan Roweth, Quadrics Ltd • ISC08, Dresden, June 2008
  • 2. Quadrics Background • Develops interconnect products for the HPC market – HPC Linux systems – AlphaServer SC systems • Quadrics is owned by the Finmeccanica group • Quadrics will be 12 years old in July
  • 3. Interconnect Network – QsNet • QsNetIII Network – Multi-stage switch network – Evolution of the QsNetII design – Increased use of commodity hardware – Increasing support for standard software • QsNetIII Components – ASICs: Elan5 and Elite5 – Adapters, switches, cables – Firmware, drivers, libraries – Diagnostics, documentation
  • 4. Elan5 Adapter Overview • 2 × 25 Gbit/s QsNetIII links • PCIe, PCIe2 host interface • Multiple packet engines • 512KB of high bandwidth on-chip local memory • SDRAM interface to optional x8 local memory • Buffer manager, object cache • [Block diagram: two CX4/QsNetIII links; seven packet engines, each with a 16K instruction cache and 9K data buffers; fabric; bridge; PCIe host interface (16 lanes); local memory (16K × 8 × 8 banks = 1MB ECC RAM); local functions; object cache tags; TLB; buffer manager; free list; command launch; external DDRII SDRAM interface; SERDES; PLL; clocks; external EEPROM]
  • 5. QsNetIII Adapter Overview • QM700 PCIe x16 • 128MB adapter memory • 2 QSFP links • Half height, low profile • Adapter variants – PCIe Gen2 – Blade formats – 10Gbit/s Ethernet (10GBase-CX4)
  • 6. Elite5 - Overview • Physical layer DDR XAUI – 4 x 6.25Gbit/s (2.5Gbytes/s) in each direction • 32-way crosspoint router • 32 virtual channels per link • Fat tree or mesh topologies • Adaptive routing • Broadcast & barrier support • Memory mapped stats & error counters accessed via control network
  • 7. QsNetIII Adaptive Routing • Packet by packet dynamic routing – Single cycle routing decision • Selects route based on – Link state, errors etc – Number of pending acks • High radix switches – 2 routing decisions for 2048 nodes • More flexible than QsNetII – Operates on groups of links – Can adaptively route up or down
  • 8. Bandwidth scalability – 1024 nodes • Bandwidth achieved when all 1024 nodes communicate at the same time • QsNetII provides better average bandwidth and a much narrower spread between best- and worst-case performance • Results by system and interconnect (min / max / average): Atlas, InfiniBand: 95 / 762 / 263; Thunder, QsNetII: 248 / 403 / 369 • Data from Lawrence Livermore National Lab, published at the Sonoma OpenFabrics workshop, June 2007
  • 9. QsNetIII Device Overview (Elan and Elite) • Manufacturing partner LSI/TSMC, G90 process • Semi-custom ASICs, 500MHz system clock • High performance BGA package: Elan 672 pins, Elite 982 pins • Power: Elan 17W, Elite 18W
  • 10. QsNetIII – Federated Network Switches • Node switch chassis – 128 links up, 128 down • Same chassis provides multiple top switch configurations: – 64 × 4 for 512-way systems – 32 × 8 for 1024-way systems – 16 × 16 for 2048-way systems – 8 × 32 for 4096-way systems
  • 12. QsNetIII cables • QSFP connectors throughout • Optical cables (e.g. Luxtera), 5-300m – PVDF, plenum rated – LSZH available as an option • Active copper cables (Gore), 8-20m • Copper cables (Gore), 1-10m • No longer Quadrics proprietary • Bit error rates become a significant issue at 5 Gbit/s and above – Optical cables between switches – Short copper cables from nodes
  • 13. QsNetIII for HP BladeSystem • Elan5 mezzanine adapter – 2 QsNet links – PCI-E x8 (initially) – 128 MB of memory • Elite5 switch module – Full bandwidth – 16 links to the blades (via backplane) – 16 links to back of the module
  • 15. Building a 16K node system in 2009/10 • A single water cooled rack will provide 1000-2000 standard cores, ~12-25 TF • 8 blade switches per rack • Connect 128 of these racks with 1024-way top switches • Single fibre cable per node – for full bisection bandwidth
  • 16. QsNetIII Fault Tolerance • All of the QsNetII features – CRCs on every packet – Automatic retransmission – Adaptive routing avoids failed links – Redundant routes – Redundant, hot-pluggable PSUs and fans • Plus full line rate testing of each link as it comes up – Switches generate CRPAT, CJPAT or PRBS packets – Links are only added to the route tables when they (a) are up, (b) connect to the right place, and (c) can transfer data without error
  • 17. Software Model – Firmware & Drivers • Base firmware in the ROMs • Firmware modules loadable with the device driver – Elan, OpenFabrics, 10GE Ethernet, … • Kernel modules – elan5, elan, rms • Device dependent library (libelan5) • Device independent library (libelan) • User libraries
  • 18. Software Model – Elan Libraries • Point-to-point message passing • Optimised collectives • Locks and atomic operations • One-sided put/get • Global memory allocation • Transparent rail striping
  • 19. Why Quadrics? • Focus on the most demanding HPC applications • Delivers large system scalability – All nodes achieve host adapter bandwidth at the same time – Minimal spread between best and worst case performance – Low and uniform latency – Highly optimised collectives • Single supplier of interconnect hardware, software, support • Stability of our products • Track record of delivering production systems • European company
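Slide 6 quotes each Elite5 link as 4 × 6.25 Gbit/s lanes delivering 2.5 Gbytes/s in each direction. The short calculation below reconciles those two figures. It assumes that DDR XAUI keeps the 8b/10b line coding of standard XAUI, which the slide does not state. Under that assumption, the 25 Gbit/s raw rate is also the per-link figure given for Elan5 on slide 4.

```python
# Elite5 link arithmetic from slide 6.
# Assumption: DDR XAUI uses the same 8b/10b line coding as standard XAUI,
# so 8 of every 10 bits on the wire carry data.
from typing import Tuple

LANES = 4                   # lanes per link, each direction
LANE_RATE_GBITS = 6.25      # line rate per lane (Gbit/s)
CODING_EFFICIENCY = 8 / 10  # 8b/10b: 8 data bits per 10 line bits


def link_bandwidth(lanes: int, lane_rate: float,
                   efficiency: float) -> Tuple[float, float, float]:
    """Return (raw Gbit/s, data Gbit/s, data Gbytes/s) for one direction."""
    raw_gbits = lanes * lane_rate
    data_gbits = raw_gbits * efficiency
    return raw_gbits, data_gbits, data_gbits / 8  # 8 bits per byte


raw, data, data_gbytes = link_bandwidth(LANES, LANE_RATE_GBITS, CODING_EFFICIENCY)
print(f"raw {raw} Gbit/s, data {data} Gbit/s, {data_gbytes} Gbytes/s per direction")
```

Under this assumption, 25 Gbit/s on the wire carries 20 Gbit/s of data, which is the 2.5 Gbytes/s on the slide.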
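Slide 7 describes packet-by-packet adaptive routing. The router chooses a link from a group of equivalent links based on link state, errors and the number of pending acks. The function below sketches one such selection policy. It is not the Elite5 logic: the `Link` record, the `max_errors` threshold and the lowest-port tie-break are all hypothetical choices made for illustration.

```python
# Hypothetical model of packet-by-packet adaptive route selection (slide 7).
# Not the Elite5 implementation: names, threshold and tie-break are illustrative.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Link:
    port: int
    up: bool = True          # link state
    error_count: int = 0     # errors seen recently on this link
    pending_acks: int = 0    # packets sent but not yet acknowledged


def choose_output(group: List[Link], max_errors: int = 0) -> Optional[int]:
    """Choose one port from a group of equivalent links, or None if none is usable.

    Links that are down or have more than `max_errors` errors are skipped;
    among the rest, the one with the fewest pending acks (a proxy for
    congestion) wins, with ties broken by the lowest port number.
    """
    usable = [link for link in group if link.up and link.error_count <= max_errors]
    if not usable:
        return None
    return min(usable, key=lambda link: (link.pending_acks, link.port)).port


# Four equivalent links towards the top of the tree: port 2 is down and
# port 3 has logged errors, so the choice is between ports 0 and 1.
group = [Link(0, pending_acks=3), Link(1, pending_acks=1),
         Link(2, up=False), Link(3, pending_acks=0, error_count=2)]
print(choose_output(group))  # 1
```

Because the decision uses only state already held at the switch, it can be made independently for every packet. This matches the slide's single-cycle routing decision in spirit, though not necessarily in mechanism.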
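The narrower-spread claim on slide 8 can be checked directly against the LLNL figures. The ratio of best-case to worst-case bandwidth does not depend on the units, which the slide does not give. `spread` is simply a helper name introduced for this sketch.

```python
# LLNL figures from slide 8: (min, max, average) bandwidth with 1024 nodes.
results = {
    "Atlas (InfiniBand)": (95, 762, 263),
    "Thunder (QsNetII)": (248, 403, 369),
}


def spread(min_bw: float, max_bw: float) -> float:
    """Best-case / worst-case ratio; 1.0 would mean every node got the same bandwidth."""
    return max_bw / min_bw


for system, (low, high, average) in results.items():
    print(f"{system}: max/min = {spread(low, high):.2f}, "
          f"worst case = {low / average:.0%} of average")
```

The ratio is about 8 for Atlas and 1.625 for Thunder, which quantifies the "much narrower spread" described on the slide.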
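Slide 10's top switch configurations appear to pair a count with the number of 128-way node switches. Read that way, they are 64 × 4, 32 × 8, 16 × 16 and 8 × 32. The following interpretation reproduces all four pairs, but it is an assumption, not the documented design. A top switch chassis uses all 256 of its links and is partitioned into small switches, each taking one uplink from every node switch. The sketch below computes the layout under that assumption. `federated_layout` is a hypothetical name.

```python
# Sketch of a two-level federated QsNetIII network (slide 10).
# Assumption (not on the slide): each top switch takes one uplink from every
# node switch, and a top switch chassis splits its 256 links into such switches.
from typing import Dict, Union

NODE_SWITCH_DOWN = 128                             # node-facing links (slide)
NODE_SWITCH_UP = 128                               # uplinks per node switch (slide)
CHASSIS_LINKS = NODE_SWITCH_DOWN + NODE_SWITCH_UP  # 256, all used as top switch ports


def federated_layout(nodes: int) -> Dict[str, Union[int, str]]:
    if nodes <= 0 or nodes % NODE_SWITCH_DOWN:
        raise ValueError("node count must be a positive multiple of 128")
    node_switches = nodes // NODE_SWITCH_DOWN
    if node_switches < 2:
        raise ValueError("a single node switch needs no top switches")
    if CHASSIS_LINKS % node_switches:
        raise ValueError("size not covered by this sketch")
    ports_per_top_switch = node_switches                # one link per node switch
    switches_per_chassis = CHASSIS_LINKS // ports_per_top_switch
    top_switches = NODE_SWITCH_UP                       # one per uplink of a node switch
    return {
        "node_switches": node_switches,
        "top_config": f"{switches_per_chassis} x {ports_per_top_switch}",
        "top_chassis": top_switches // switches_per_chassis,
    }


for n in (512, 1024, 2048, 4096):
    print(n, federated_layout(n))
```

Under this reading, an N-node system needs N/256 top switch chassis. Their 256 ports each add up to the N uplinks from the node switches, which preserves full bisection bandwidth.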
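The 16K node count on slide 15 follows from slide 13 if each Elite5 blade switch module serves 16 nodes, one per blade link. That is an assumption, because slide 13 counts links rather than nodes. The arithmetic below multiplies the numbers out. It also computes the system peak implied by ~12-25 TF per rack.

```python
# Arithmetic behind slide 15's 16K node system.
# Assumption: one node per blade link, i.e. each Elite5 blade switch module
# (slide 13) serves 16 nodes and has 16 uplinks.
BLADE_SWITCHES_PER_RACK = 8
NODES_PER_BLADE_SWITCH = 16      # "16 links to the blades (via backplane)"
UPLINKS_PER_BLADE_SWITCH = 16    # "16 links to back of the module"
RACKS = 128
TOP_SWITCH_PORTS = 1024          # "1024-way top switches"
TF_PER_RACK = (12, 25)           # "~12-25 TF" per rack

nodes_per_rack = BLADE_SWITCHES_PER_RACK * NODES_PER_BLADE_SWITCH      # 128
uplinks_per_rack = BLADE_SWITCHES_PER_RACK * UPLINKS_PER_BLADE_SWITCH  # 128
total_nodes = RACKS * nodes_per_rack                                   # 16384
top_switches = RACKS * uplinks_per_rack // TOP_SWITCH_PORTS            # 16
full_bisection = uplinks_per_rack == nodes_per_rack                    # True
peak_pf = tuple(tf * RACKS / 1000 for tf in TF_PER_RACK)               # PF range

print(f"{total_nodes} nodes, {top_switches} top switches, "
      f"full bisection: {full_bisection}, peak ~{peak_pf[0]:.1f}-{peak_pf[1]:.1f} PF")
```

Each rack sends as many links up as it has nodes, so the top switches see the full injection bandwidth. At roughly 1.5-3.2 PF peak, this is the PetaScale class of system named in the title.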
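Slide 16 lists three conditions a link must meet before it is added to the route tables, and mentions PRBS test packets. The sketch below illustrates the idea with a PRBS7 generator and a hypothetical `admit_link` check. The generator uses polynomial x^7 + x^6 + 1 with the usual 7-bit shift-register taps. Several details are assumptions, not taken from the slide: which PRBS order the switches use, how a peer identifies itself (the `"top3:p17"` strings), and the zero-error threshold.

```python
# Sketch of link bring-up testing (slide 16): a PRBS7 generator plus a
# hypothetical admission check for conditions (a)-(c); not the switch firmware.
from itertools import islice
from typing import Iterator, List


def prbs7(seed: int = 0x7F) -> Iterator[int]:
    """Yield PRBS7 bits from a 7-bit shift register (x^7 + x^6 + 1 taps).

    Any non-zero seed gives a maximal-length sequence that repeats every 127 bits.
    """
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit


def bit_errors(received: List[int], seed: int = 0x7F) -> int:
    """Count received bits that differ from the expected PRBS7 pattern."""
    return sum(bit != expected for bit, expected in zip(received, prbs7(seed)))


def admit_link(is_up: bool, peer: str, expected_peer: str,
               received: List[int], seed: int = 0x7F) -> bool:
    """Admit a link only if it is (a) up, (b) connected to the expected peer
    and (c) delivered the test pattern without error."""
    return is_up and peer == expected_peer and bit_errors(received, seed) == 0


pattern = list(islice(prbs7(), 1000))     # what an error-free link would deliver
corrupted = pattern.copy()
corrupted[42] ^= 1                        # inject a single bit error
print(admit_link(True, "top3:p17", "top3:p17", pattern))    # True
print(admit_link(True, "top3:p17", "top3:p17", corrupted))  # False
print(admit_link(True, "top4:p17", "top3:p17", pattern))    # False: miscabled
```

A pseudo-random pattern is useful here because the receiver can regenerate it independently. A single flipped bit anywhere in the test run is therefore detected.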
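Slide 18 lists transparent rail striping, in which one large transfer is spread across several rails without the application seeing it. The two QsNet links on an Elan5 adapter are one example of multiple rails. The function below sketches simple round-robin striping in fixed-size chunks. Both the scheme and the 4096-byte chunk size are illustrative assumptions, not the libelan algorithm or API, and `stripe` is a hypothetical name.

```python
# Hypothetical round-robin rail striping; not the libelan implementation.
from typing import List, Tuple


def stripe(length: int, rails: int, chunk: int) -> List[List[Tuple[int, int]]]:
    """Split a `length`-byte message into (offset, size) pieces, one list per rail.

    Chunks are dealt to rails in turn; the final chunk may be shorter.
    """
    if rails < 1 or chunk < 1:
        raise ValueError("rails and chunk must be positive")
    plan: List[List[Tuple[int, int]]] = [[] for _ in range(rails)]
    for i, offset in enumerate(range(0, length, chunk)):
        plan[i % rails].append((offset, min(chunk, length - offset)))
    return plan


plan = stripe(length=10_000, rails=2, chunk=4096)
for rail, pieces in enumerate(plan):
    print(rail, pieces)
```

Fixed-size chunks keep reassembly simple, because each piece carries its own offset. A real implementation would also have to decide when a message is small enough to send on one rail only.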