QsNetIII An HPC Interconnect
  for PetaScale Systems

    Duncan Roweth, Quadrics Ltd
     ISC08 Dresden June 2008
Quadrics Background


• Develops interconnect products for the HPC market
   – HPC Linux systems
   – AlphaServer SC syste...
Interconnect Network – QsNet

• QsNetIII Network                           • QsNetIII Components
   –                     ...
Elan5 Adapter Overview


                                    CX4/              CX4/

 •                                 QS...
QsNetIII Adapter Overview



•   QM700 PCIe x16
•   128MB adapter memory
•   2 QSFP links
•   Half height low profile



•...
Elite5 - Overview


• Physical layer DDR XAUI
     – 4 x 6.25Gbit/s (2.5Gbytes/s)
       in each direction
•   32-way cros...
QsNetIII Adaptive Routing


• Packet by packet dynamic routing
   – Single cycle routing decision
• Selects route based on...
Bandwidth scalability – 1024 nodes


• Bandwidth achieved
  when 1024 nodes all
  communicate at the
  same time
• QsNetII...
QsNetIII Device Overview




             Elan                     Elite
       Manufacturing partner LSI/TSMC G90 process...
QsNetIII – Federated Network Switches


• Node switch chassis
   – 128 links up 128 down


• Same chassis provides multipl...
QsNetIII Network 4096–way
QsNetIII cables


• QSFP connectors throughout
• Optical cables (e.g. Luxtera), 5-300m
    – PVDF Plenum rated
    – LSZH a...
QsNetIII for HP BladeSystem


Elan5 mezzanine adapter
2 QsNet links                 Elite5 switch module
PCI-E x8 (initial...
2048-way QsNetIII BladeSystem Network
Building a 16K node system in 2009/10


• Single water cooled rack will   • 8 Blade switches per rack
  provide 1000-2000 ...
QsNetIII Fault Tolerance


• All of the QsNetII Features
   –   CRCs on every packet
   –   Automatic retransmission
   – ...
Software Model – Firmware & Drivers


• Base firmware in the ROMs
• Firmware modules loadable with the device driver
   – ...
Software Model – Elan Libraries


• Point-to-point message      • Optimised collectives
  passing                     • Lo...
Why Quadrics?


• Focus on the most demanding HPC applications
• Delivers large system scalability
   – All nodes achieve ...
QsNetIII, An HPC Interconnect For Peta Scale Systems

  1. QsNetIII: An HPC Interconnect for PetaScale Systems. Duncan Roweth, Quadrics Ltd. ISC08, Dresden, June 2008
  2. Quadrics Background • Develops interconnect products for the HPC market – HPC Linux systems – AlphaServer SC systems • Quadrics is owned by the Finmeccanica group • Quadrics will be 12 years old in July
  3. Interconnect Network – QsNet • QsNetIII Network – Multi-stage switch network – Evolution of the QsNetII design – Increased use of commodity hardware – Increasing support for standard software • QsNetIII Components – ASICs Elan5 and Elite5 – Adapters, switches, cables – Firmware, drivers, libraries – Diagnostics, documentation
  4. Elan5 Adapter Overview • 2 × 25 Gbit/s QsNetIII links • PCIe, PCIe2 host interface • Multiple packet engines • 512KB of high bandwidth on chip local memory • SDRAM interface to optional local memory • Buffer manager, object cache [Block diagram labels: Link ×2 (CX4/), Packet Engine ×7 (each with 16K inst cache and 9K data buffers), Fabric, Bridge, Host I/F, Local Memory, Local Functions, Object Cache Tags, TLB, Buffer Manager, External, Cmd Launch, SDRAM i/f, Ext i/f, Free List, PCIe, 16K x 8 x 8 banks = 1MB ECC RAM, PLL, SERDES, External EEPROM, Clocks, DDRII, x8, 16 Lanes]
  5. QsNetIII Adapter Overview • QM700 PCIe x16 • 128MB adapter memory • 2 QSFP links • Half height low profile • Adapter variants – PCIe Gen2 – Blade formats – 10Gbit/s Ethernet 10GBase-CX4
  6. Elite5 - Overview • Physical layer DDR XAUI – 4 x 6.25Gbit/s (2.5Gbytes/s) in each direction • 32-way crosspoint router • 32 virtual channels per link • Fat tree or mesh topologies • Adaptive routing • Broadcast & barrier support • Memory mapped stats & error counters accessed via control network
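
     The Elite5 slide quotes 4 x 6.25Gbit/s lanes alongside 2.5Gbytes/s per direction. A minimal sketch of how the two figures can be reconciled, assuming the link uses 8b/10b line coding as standard XAUI does (the slide does not state the encoding, so ENCODING_EFFICIENCY is an assumption):

```python
# Sketch only: reconciles the lane rate with the quoted payload bandwidth.
# The 8b/10b efficiency factor is an assumption carried over from standard
# XAUI; the slide itself does not name the line coding.
LANES = 4
LANE_RATE_GBIT = 6.25          # per-lane signalling rate, from the slide
ENCODING_EFFICIENCY = 8 / 10   # assumed 8b/10b coding


def link_bandwidth_gbytes(lanes=LANES, lane_rate=LANE_RATE_GBIT,
                          efficiency=ENCODING_EFFICIENCY):
    """Return payload bandwidth in Gbytes/s for one direction of a link."""
    raw_gbit = lanes * lane_rate          # 25 Gbit/s on the wire
    payload_gbit = raw_gbit * efficiency  # bits left after line coding
    return payload_gbit / 8               # bits to bytes


if __name__ == "__main__":
    print(f"{link_bandwidth_gbytes():.2f} Gbytes/s per direction")
```

     Under that assumption the raw 25 Gbit/s figure quoted for the Elan5 links and the 2.5Gbytes/s figure quoted here describe the same link.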
  7. QsNetIII Adaptive Routing • Packet by packet dynamic routing – Single cycle routing decision • Selects route based on – Link state, errors etc. – Number of pending acks • High radix switches – 2 routing decisions for 2048 nodes • More flexible than QsNetII – Operates on groups of links – Can adaptively route up or down
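
     The selection criteria on the adaptive routing slide (link state, errors, pending acks, groups of links) can be illustrated in software. This is a hypothetical sketch, not the Elite5 hardware logic: the Link fields, the choose_link name and the max_errors threshold are all assumptions made for illustration.

```python
# Illustrative sketch only: picks one link from a group per packet.
# The data layout and tie-breaking rule are assumptions, not taken from
# the slides, which only list the inputs to the routing decision.
from dataclasses import dataclass


@dataclass
class Link:
    port: int
    up: bool            # link state
    error_count: int    # errors observed on this link
    pending_acks: int   # packets sent but not yet acknowledged


def choose_link(group, max_errors=0):
    """Return the healthy link with the fewest pending acks, or None."""
    candidates = [l for l in group if l.up and l.error_count <= max_errors]
    if not candidates:
        return None
    # min() keeps the first link on ties, so the order of the group matters.
    return min(candidates, key=lambda l: l.pending_acks)


if __name__ == "__main__":
    group = [
        Link(port=0, up=True, error_count=0, pending_acks=5),
        Link(port=1, up=True, error_count=0, pending_acks=2),
        Link(port=2, up=False, error_count=0, pending_acks=0),
        Link(port=3, up=True, error_count=3, pending_acks=0),
    ]
    print(choose_link(group).port)
```

     Ports 2 and 3 are skipped because one is down and the other has logged errors; of the remaining two, the less loaded port is chosen.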
  8. Bandwidth scalability – 1024 nodes • Bandwidth achieved when 1024 nodes all communicate at the same time • QsNetII provides better average bandwidth and much narrower spread in best to worst case performance
     System   Interconnect  Min  Max  Average
     Atlas    InfiniBand     95  762  263
     Thunder  QsNetII       248  403  369
     Data from Lawrence Livermore National Lab, published at the Sonoma OpenFabrics workshop, June 2007
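
     The "narrower spread" claim can be checked directly from the figures on the bandwidth slide. A small sketch that computes the best-to-worst ratio for each system; the slide gives no units, so the ratio is used because it is unit free, and the dictionary layout is only for illustration:

```python
# Figures copied from the slide: (min, max, average) bandwidth per system.
# Units are not stated on the slide, so only ratios are computed here.
RESULTS = {
    "Atlas": (95, 762, 263),     # InfiniBand
    "Thunder": (248, 403, 369),  # QsNetII
}


def spread(min_bw, max_bw):
    """Ratio of best case to worst case bandwidth."""
    return max_bw / min_bw


if __name__ == "__main__":
    for system, (lo, hi, avg) in RESULTS.items():
        print(f"{system}: max/min = {spread(lo, hi):.2f}, average = {avg}")
```

     On these numbers the best case is roughly 8 times the worst case on Atlas and about 1.6 times on Thunder.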
  9. QsNetIII Device Overview • Manufacturing partner LSI/TSMC, G90 process • Semi custom ASICs, 500MHz system clock • High performance BGA package – Elan: 672 pin, 17W – Elite: 982 pin, 18W
  10. QsNetIII – Federated Network Switches • Node switch chassis – 128 links up 128 down • Same chassis provides multiple top switch configurations: – 64 × 4 for 512-way systems – 32 × 8 for 1024-way systems – 16 × 16 for 2048-way systems – 8 × 32 for 4096-way systems
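
     Reading the four top switch entries as 64 × 4, 32 × 8, 16 × 16 and 8 × 32, each product comes to 256, the total link count of one chassis. A sketch of that pattern, assuming every top switch takes exactly one link from each node switch chassis (an assumption; the slide does not spell out the wiring, and top_switch_layout is a hypothetical name):

```python
# Sketch under stated assumptions: a chassis carries 256 links in total
# (128 up + 128 down as a node switch) and, used as a top switch, is split
# into equal partitions with one port per node switch chassis.
NODE_PORTS_PER_CHASSIS = 128
LINKS_PER_CHASSIS = 256


def top_switch_layout(system_size):
    """Return (switches_per_chassis, ports_per_switch) for a system size."""
    node_chassis, remainder = divmod(system_size, NODE_PORTS_PER_CHASSIS)
    if remainder or node_chassis == 0 or LINKS_PER_CHASSIS % node_chassis:
        raise ValueError(f"unsupported system size: {system_size}")
    return LINKS_PER_CHASSIS // node_chassis, node_chassis


if __name__ == "__main__":
    for size in (512, 1024, 2048, 4096):
        switches, ports = top_switch_layout(size)
        print(f"{size}-way: {switches} x {ports}")
```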
  11. QsNetIII Network 4096–way
  12. QsNetIII cables • QSFP connectors throughout • Optical cables (e.g. Luxtera), 5-300m – PVDF Plenum rated – LSZH available as an option • Active copper cables (Gore), 8-20m • Copper cables (Gore), 1-10m • No longer Quadrics proprietary • Bit error rates are a big issue at 5 Gbps and above – Optical cables between switches – Short copper cables from nodes
  13. QsNetIII for HP BladeSystem • Elan5 mezzanine adapter – 2 QsNet links – PCI-E x8 (initially) – 128 MB of memory • Elite5 switch module – Full bandwidth – 16 links to the blades (via backplane) – 16 links to back of the module
  14. 2048-way QsNetIII BladeSystem Network
  15. Building a 16K node system in 2009/10 • Single water cooled rack will provide 1000-2000 standard cores ~12-25 TF. • Single fibre cable per node • 8 Blade switches per rack • Connect 128 of these racks with 1024-way top switches - for full bi-section bandwidth.
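
     The 16K figure follows from numbers given on the BladeSystem slides. A back-of-envelope sketch, assuming each node uses one of the 16 blade-facing links on a switch module (the one-link-per-node mapping is an assumption; the other inputs are copied from the slides):

```python
# Sizing arithmetic for the 16K node system. Inputs come from the slides,
# except the assumption that each node occupies one blade-facing link.
SWITCHES_PER_RACK = 8
BLADE_LINKS_PER_SWITCH = 16     # "16 links to the blades (via backplane)"
RACKS = 128
RACK_TFLOPS = (12, 25)          # "~12-25 TF" per rack
RACK_CORES = (1000, 2000)       # "1000-2000 standard cores" per rack

nodes_per_rack = SWITCHES_PER_RACK * BLADE_LINKS_PER_SWITCH
total_nodes = nodes_per_rack * RACKS
total_pflops = tuple(tf * RACKS / 1000 for tf in RACK_TFLOPS)
cores_per_node = tuple(c / nodes_per_rack for c in RACK_CORES)

if __name__ == "__main__":
    print(f"nodes per rack: {nodes_per_rack}")
    print(f"total nodes:    {total_nodes}")
    print(f"peak range:     {total_pflops[0]:.2f}-{total_pflops[1]:.2f} PF")
    print(f"cores per node: {cores_per_node[0]:.1f}-{cores_per_node[1]:.1f}")
```

     With 128 nodes per rack, 128 racks give 16,384 nodes, and the per-rack figures scale to roughly 1.5 to 3.2 PF, which is consistent with the petascale framing of the talk.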
  16. QsNetIII Fault Tolerance • All of the QsNetII Features – CRCs on every packet – Automatic retransmission – Adaptive routing avoids failed links – Redundant routes – Redundant, hot pluggable PSUs and fans + Full line rate testing of each link as it comes up – Switches generate CRPAT, CJPAT or PRBS packets – Links are only added to the route tables when they are (a) up, (b) connected to the right place, and (c) able to transfer data without error.
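
     The three admission conditions on the fault tolerance slide amount to a simple predicate. A hypothetical sketch of that check; the LinkStatus fields and the eligible_for_routing name are illustrative assumptions and do not reflect the actual Quadrics firmware or its data structures:

```python
# Illustrative only: models conditions (a), (b) and (c) from the slide.
from dataclasses import dataclass


@dataclass
class LinkStatus:
    is_up: bool                 # (a) link has come up
    peer_id: str                # what the link is actually cabled to
    expected_peer_id: str       # what the topology says it should reach
    test_packets_sent: int      # line-rate test traffic sent
    test_packets_in_error: int  # test packets that failed their check


def eligible_for_routing(status):
    """True only if the link passes all three admission conditions."""
    connected_correctly = status.peer_id == status.expected_peer_id   # (b)
    tested_clean = (status.test_packets_sent > 0
                    and status.test_packets_in_error == 0)            # (c)
    return status.is_up and connected_correctly and tested_clean


if __name__ == "__main__":
    good = LinkStatus(True, "top-3", "top-3", 1_000_000, 0)
    miswired = LinkStatus(True, "top-4", "top-3", 1_000_000, 0)
    print(eligible_for_routing(good), eligible_for_routing(miswired))
```

     Requiring at least one test packet means an untested link is never admitted, which mirrors the slide's point that testing happens as each link comes up rather than after it has carried traffic.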
  17. Software Model – Firmware & Drivers • Base firmware in the ROMs • Firmware modules loadable with the device driver – Elan, OpenFabrics, 10GE Ethernet, … • Kernel modules – elan5, elan, rms • Device dependent library (libelan5) • Device independent library (libelan) • User libraries
  18. Software Model – Elan Libraries • Point-to-point message passing • One-sided put/get • Optimised collectives • Locks and atomic ops • Global memory allocation • Transparent rail striping
  19. Why Quadrics? • Focus on the most demanding HPC applications • Delivers large system scalability – All nodes achieve host adapter bandwidth at the same time – Minimal spread between best and worst case performance – Low and uniform latency – Highly optimised collectives • Single supplier of interconnect hardware, software, support • Stability of our products • Track record of delivering production systems • European company