What is 3D Torus


A high-level description of what a 3D Torus is and how it is implemented in the Aurora supercomputers from Eurotech.

Published in: Technology

  1. What is 3D Torus: The switchless interconnection topology
  2. A demanding future for HPC
     • Supercomputers are asked to process demanding computational loads (process and data)
     • Processor power is paramount, but a key aspect of parallel computers is the communication network that interconnects the computing nodes
     • Together with speed, HPC systems are increasingly required to offer higher availability
     • One additional challenge with large systems is scalability: the ability to add nodes to a cluster while affecting performance and reliability as little as possible
     • It is also paramount for future machines to consume less energy
  3. Conceptual difference: switched Infiniband network vs. switchless Torus network
  4. 3D Torus topology
     • Connecting nodes in a 3D Torus configuration means that each node in a cluster is connected to the adjacent ones via short cabling
     • The signal is routed directly from one node to the other with no need for switches. 3D means that communication takes place in 6 different "directions": X+, X-, Y+, Y-, Z+, Z-
     • In practical terms, each node can be connected to 6 other nodes, so the graph of the connections resembles a three-dimensional matrix
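The six-direction connectivity described in slide 4 can be sketched in a few lines of Python. This is a minimal illustration, not Eurotech's implementation: the function name `torus_neighbors`, the integer coordinate scheme, and the 4 x 4 x 4 grid size are assumptions chosen for the example.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int, int]


def torus_neighbors(node: Coord, dims: Coord) -> Dict[str, Coord]:
    """Return the six neighbors of `node` in a 3D torus of size `dims`.

    Coordinates wrap around at the edges (modulo arithmetic), which is
    what distinguishes a torus from a plain 3D mesh.
    """
    x, y, z = node
    nx, ny, nz = dims
    return {
        "X+": ((x + 1) % nx, y, z),
        "X-": ((x - 1) % nx, y, z),
        "Y+": (x, (y + 1) % ny, z),
        "Y-": (x, (y - 1) % ny, z),
        "Z+": (x, y, (z + 1) % nz),
        "Z-": (x, y, (z - 1) % nz),
    }


# Hypothetical 4 x 4 x 4 torus: the corner node still has six neighbors,
# because X-, Y- and Z- wrap around to the opposite face.
for direction, coord in torus_neighbors((0, 0, 0), (4, 4, 4)).items():
    print(direction, coord)
```

Because of the wraparound, every node has exactly six neighbors regardless of its position, so there are no special edge or corner cases.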
  5. 3D Torus topology
     • Such a configuration allows nodes to be added to a system without degrading performance
     • Each new node is joined as an extension of the grid, linked to it with no extensive cabling or switching
     • Linear scaling, with little or no performance loss, strictly holds only for problems that rely heavily on nearest-neighbor communication
     • Adding a node to a large system takes much less work and causes fewer potential problems
     • Because the connections between nodes are short and direct, the latency of the links is very low
  6. 3D Torus advantages
     • High speed and low latency
     • Linear scalability. Switchless configuration that avoids bottlenecks and allows hardware cost reduction
     • Improved MTBF (mean time between failures)
     • Regular and hidden wiring, leading to less cabling
     • Lower energy usage for communication
     • Good match between physical communication channels and local pattern algorithms
     • Less energy consumed
  7. 3D Torus applications
  8. 3D Torus applications
     • The 3D Torus delivers its best performance on a subset of problems that is specific but rather large
     • These are local pattern problems, which typically model systems whose behavior depends on adjacent systems
  9. Example: Lattice QCD
     • Computer simulation of Lattice QCD (the theory of strong interactions, e.g. inside protons) is one of the great challenges for massively parallel supercomputers and requires a communication network with high bandwidth and low latency
     • The equations governing Lattice QCD describe local interactions: each degree of freedom interacts with its nearest neighbors. This results in a well-balanced computational task in which each degree of freedom (the value of a field at a space-time point) obeys the same equations, which are coupled to a small number of other degrees of freedom residing nearby
  10. Example: Fluid Dynamics
     • Fluid dynamics in the turbulent regime can likewise be mapped "easily" onto a supercomputer when formulated with what are known as Lattice Boltzmann (LB) methods
     • This is a scientific field that is both intriguing from the point of view of fundamental science and relevant to many technological applications
  11. Additional applications
     • Many Monte Carlo simulations and embarrassingly parallel problems can exploit the full performance advantage of the 3D Torus architecture
     • Problems that require all-to-all communication between nodes may benefit less from the full performance of the 3D Torus interconnection
     • However, regardless of the type of application and problem, the 3D Torus still retains the major advantages of scalability and serviceability
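Why nearest-neighbor traffic fits a torus better than all-to-all traffic can be illustrated by counting hops. The sketch below assumes minimal routing in which each hop moves one step along one axis; the slides do not describe the actual routing scheme, and the function names and grid sizes are hypothetical.

```python
from itertools import product
from typing import Tuple

Coord = Tuple[int, int, int]


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimal hop count between two nodes of a 3D torus.

    Along each axis the message may travel in either direction,
    so the per-axis distance is the shorter way around the ring.
    """
    hops = 0
    for pa, pb, n in zip(a, b, dims):
        d = abs(pa - pb)
        hops += min(d, n - d)
    return hops


def mean_hops_all_to_all(dims: Coord) -> float:
    """Average hop count from one node to every other node.

    By symmetry of the torus, the average measured from the origin
    equals the average over all pairs of distinct nodes.
    """
    origin = (0, 0, 0)
    others = [c for c in product(*(range(n) for n in dims)) if c != origin]
    return sum(torus_hops(origin, c, dims) for c in others) / len(others)


# A nearest-neighbor exchange is always one hop, whatever the system size,
# while the average all-to-all distance grows as the torus grows.
for dims in [(4, 4, 4), (8, 8, 8), (16, 16, 16)]:
    print(dims, "mean all-to-all hops:", round(mean_hops_all_to_all(dims), 2))
```

Under these assumptions the cost of a neighbor exchange stays constant as nodes are added, whereas all-to-all patterns pay an increasing number of hops per message.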
  12. Eurotech Aurora 3D Torus
  13. Aurora Torus peculiarities
     • Unified network architecture:
       – The 3D Torus coexists with an Infiniband network
       – Both local and global MPI calls can be processed efficiently
       – Dedicated synchronization network
       – Gigabit Ethernet
     • FPGA-driven Torus, based on the work of Aurora Science researchers who gained experience with Janus and QPACE
     • Full duplex communication links
       – Allowing sub-tori to create subdomains
     • Cable lengths are kept very short thanks to smart backplane design
  14. Aurora 3D Torus Network
     • Aurora Science implementation
       – Based on FTNW (Pisanti, Schifano, Simma): http://sourceforge.net/projects/ftnw
       – GPL licensed
       – Optimized for nearest-neighbor communication
       – Proven technology in LQCD communities
     • Extoll implementation on Aurora
       – Licensed
       – Optimized for all-to-all communication for a wide range of applications
       – Future interconnect paving the way to exascale computing
  15. Aurora FPGA: 3D Torus network processor (PCIe 2.0 x8)
  16. Aurora S– 3D Torus network
     [Diagram: two CPUs, each attached to the FPGA over PCIe2 x8 (40 Gbps); the FPGA drives six phy blocks, one 4x link per direction (X+, X-, Y+, Y-, Z+, Z-), each labeled 10 Gbps]
  17. Aurora systems
  18. AURORA, a high-density, highly efficient family of supercomputers
     • One Aurora rack: 256 nodes, 512 CPUs, 3072 cores, 48U
     • 100 TFLOPS @ 100 kW
     • Entirely liquid cooled
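The rack figures on slide 18 imply a few per-node and per-watt numbers, worked out below as simple arithmetic. The slide does not say whether 100 TFLOPS is a peak or sustained figure, so the derived values inherit that ambiguity; the variable names are illustrative.

```python
# Figures quoted for one Aurora rack on the slide.
nodes = 256
cpus = 512
cores = 3072
rack_flops = 100e12    # 100 TFLOPS (peak vs. sustained is not stated)
rack_power_w = 100e3   # 100 kW

cpus_per_node = cpus // nodes                      # 2
cores_per_cpu = cores // cpus                      # 6
gflops_per_watt = rack_flops / rack_power_w / 1e9  # 1.0
gflops_per_node = rack_flops / nodes / 1e9         # 390.625
watts_per_node = rack_power_w / nodes              # 390.625

print(f"{cpus_per_node} CPUs per node, {cores_per_cpu} cores per CPU")
print(f"{gflops_per_watt:.1f} GFLOPS/W")
print(f"{gflops_per_node:.1f} GFLOPS and {watts_per_node:.1f} W per node")
```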
  19. Aurora identity card
     • High computational power
     • Liquid cooling
     • Energy efficiency
     • Reliability and availability
     • Scalability
     • Unified network architecture
     • Compatibility
  20. Aurora identity card
     • High computational power
     • Liquid cooling
     • Energy efficiency
     • Reliability and availability
     • Scalability
     • Unified network architecture
     • Compatibility
  21. Unified Network Architecture
     • 3D Torus
       – Properties: ultra low latency, high bandwidth, unlimited scalability
       – Use: regular, massive, local patterns; nearest neighbor
     • Infiniband switched network
       – Properties: very low latency, high bandwidth, multiple services
       – Use: irregular, long distance patterns (Molecular Dynamics); storage (SAN); monitoring (IPMI)
     • Synch
       – Properties: very fast channel, global commands, global clock
       – Use: net processor synch, thread synch, subdomain management, low/high level synch, system coordination, debugging
  22. Eurotech HPC Principles
     • High performance. We want our customers to run their simulations and applications as quickly as the world's latest technologies allow.
     • Energy efficiency and Green. We build products that allow our customers to save on energy bills and support sustainability.
     • Scalability. We want our solutions to scale linearly and our customers to grow according to their needs and budget availability.
     • Availability. Intelligent design, quality, support readiness and preventive maintenance to increase the availability of our HPC systems during their lifetime.
     • Cost effectiveness. We concentrate much of our effort on delivering advanced technology at competitive prices and on allowing our customers to reduce the total cost of ownership.
     • Versatility and compatibility. We design our products to tackle different problems in the most effective way possible.
  23. www.eurotech.com/aurora