Fca Product Overview Feb222010 As



  1. Voltaire Fabric Collective Accelerator™ (FCA): Accelerate your Fabric
  2. The Challenge: Collective Operations Performance
     • Collective operations take up a large share of application run time and don’t scale well
     • System/OS “noise” affects scalability
     • Simple offload solutions DON’T address the key problems:
        • Network congestion due to “All-to-All” communication
        • Computation and messaging performance
        • Difficulty of management and orchestration
     Result: poor application scalability and low cluster efficiency
  3. Collective Communication Portion of MPI Runtime
     [Chart: collective operations as a percentage of MPI job runtime]
  4. Introducing Voltaire Fabric Collective Accelerator (FCA)
     • CPU in switch used to offload collective operations
     • Collective tree and rank placement optimized to the topology
     • Use of IB multicast for result distribution
     • Inter-core communication optimized
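
The combination on slide 4 (reduction performed in the switches along a tree, then result distribution by multicast) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Voltaire's implementation: the function name `tree_allreduce`, the `fanout` parameter, and the choice of a sum as the reduction are all hypothetical, and the multicast is modeled simply as every rank receiving the final value.

```python
from collections import Counter


def tree_allreduce(values, fanout=2):
    """Sum `values` up a reduction tree, then hand the total to every rank.

    Returns (per_rank_results, link_messages). `link_messages` counts the
    messages that crossed each uplink during the reduction phase; the
    downward multicast is not counted.
    """
    if not values:
        raise ValueError("need at least one rank")
    if fanout < 2:
        raise ValueError("fanout must be at least 2")

    link_messages = Counter()
    level = list(values)  # level 0: one contribution per host
    depth = 0
    while len(level) > 1:
        next_level = []
        for start in range(0, len(level), fanout):
            group = level[start:start + fanout]
            parent = start // fanout
            for offset in range(len(group)):
                # Each child sends exactly one message to its parent switch.
                link_messages[(depth, start + offset, parent)] += 1
            # The (simulated) switch CPU combines its children's values.
            next_level.append(sum(group))
        level = next_level
        depth += 1

    total = level[0]
    # Result distribution, modeled as one multicast reaching every rank.
    return [total] * len(values), link_messages
```

With eight hosts and a fan-out of 2, `tree_allreduce(list(range(1, 9)))` gives every rank the sum 36 and uses 14 uplinks, each carrying a single message.
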
  5. FCA Solution Architecture
     • First fully integrated solution to offload collectives
     • Combines intelligence on servers, switches, and management
        • UFM™: automates fabric collective offload and monitoring, and integrates with schedulers
        • Voltaire “smart” switch-based CPUs perform reduction and messaging operations
        • Voltaire OMA (Open MPI plug-in): addresses host-side (multi-core) collectives
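
The slides do not say how the host-side (multi-core) part works. One common approach, shown here purely as an assumption, is a two-level reduction: cores on the same node combine their values locally, so only one partial result per node enters the fabric. The function name `two_level_sum` and its return values are hypothetical.

```python
def two_level_sum(values_by_node):
    """Reduce per-core values in two stages.

    `values_by_node` is a list of lists: one inner list of per-core values
    for each node. Returns (total, messages_without_local_stage,
    messages_with_local_stage), where the message counts are the number of
    contributions that would have to leave the hosts.
    """
    # Stage 1 (host side): cores on the same node combine their values locally.
    node_partials = [sum(core_values) for core_values in values_by_node]

    # Stage 2 (fabric): only one partial per node has to leave the host.
    messages_without_local_stage = sum(len(core_values) for core_values in values_by_node)
    messages_with_local_stage = len(node_partials)

    return sum(node_partials), messages_without_local_stage, messages_with_local_stage
```

For two nodes with four cores each, `two_level_sum([[1, 2, 3, 4], [5, 6, 7, 8]])` returns `(36, 8, 2)`: the same total, with two contributions entering the fabric instead of eight.
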
  6. FCA: Addressing the Problem End to End
     • Increase performance, reduce congestion
        • Reduce fabric traffic to a single message per wire, dramatically reducing congestion
        • FCA offload “shields” collective operations from node “noise”
        • Enable non-blocking collectives (overlap communication and calculation)
        • Linear scalability to many thousands of nodes with predictable hardware performance
     • Simple, fully integrated
        • No change in the application: OMA is a drop-in Open MPI plug-in
        • Switches come equipped with FCA offload code out of the box
        • UFM automates the process and integrates with the scheduler, reducing setup burden
        • Fully integrated monitoring capabilities
     • FCA reduced collective operations runtime by up to 100X
        • MPI collective operations across 11K nodes complete within 25 usec
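
To give a sense of scale for the congestion and scalability points on slide 6, the sketch below compares a naive exchange, in which every rank sends directly to every other rank, with the number of switch levels a reduction tree needs. The fan-out of 18 and the function names are assumptions made for illustration; only the 11K node count comes from the slide.

```python
def pairwise_messages(ranks):
    """Messages sent when every rank sends directly to every other rank."""
    return ranks * (ranks - 1)


def tree_levels(ranks, fanout):
    """Smallest number of tree levels whose leaves can cover `ranks`."""
    if ranks < 1 or fanout < 2:
        raise ValueError("need ranks >= 1 and fanout >= 2")
    levels = 0
    capacity = 1
    while capacity < ranks:
        capacity *= fanout
        levels += 1
    return levels


if __name__ == "__main__":
    ranks = 11_000   # "11K nodes" from the slide
    fanout = 18      # hypothetical fan-out per switch, not from the slides
    print(pairwise_messages(ranks))    # 120989000
    print(tree_levels(ranks, fanout))  # 4
```

Under these assumptions, the naive exchange grows quadratically (about 121 million messages at 11,000 ranks), while the tree depth grows only logarithmically (4 levels).
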
  7. FCA Preliminary Performance Results
     [Chart; labeled values: 78%, 66%]
  8. FCA: What is the alternative/competitive solution?
     [Comparison table: NIC-based offload vs. FCA]
     • Features compared: topology awareness, result distribution based on IB multicast, OS “noise” reduction, fabric switches offloading computation, network congestion elimination, integration with job schedulers
     • Expected MPI job runtime improvement: 1-2% (NIC-based offload) vs. 30-40% (FCA)
     A Fabric Wide Challenge requires a Fabric Wide Solution
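
How a speedup of the collectives translates into a job-level improvement depends on the share of runtime spent in collectives. A minimal sketch of that arithmetic (Amdahl's law) follows; the 40% collective share used in the example is a hypothetical input, not a figure from the slides, and the function name is illustrative.

```python
def job_runtime_reduction(collective_fraction, collective_speedup):
    """Fraction of total job runtime saved when only collectives get faster.

    collective_fraction: share of the original runtime spent in collectives (0..1).
    collective_speedup:  factor by which the collectives are accelerated (> 0).
    """
    if not 0.0 <= collective_fraction <= 1.0:
        raise ValueError("collective_fraction must be between 0 and 1")
    if collective_speedup <= 0:
        raise ValueError("collective_speedup must be positive")
    # The non-collective part is unchanged; the collective part shrinks by 1/speedup.
    return collective_fraction * (1.0 - 1.0 / collective_speedup)


if __name__ == "__main__":
    # Hypothetical job: 40% of runtime in collectives, collectives 100X faster.
    print(round(job_runtime_reduction(0.40, 100), 3))  # 0.396
```

The saving can never exceed the collective share itself, which is why a 100X speedup of the collectives yields a job-level gain of tens of percent rather than 100X.
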
  9. FCA: Bringing InfiniBand to Capability Clusters
     • Scalability of collective operations has been limiting the reach of InfiniBand in capability computing
     • FCA is the first and only solution on the market that allows collective operations to scale efficiently to thousands of ranks
     • Voltaire is the only standards-based, high-performance fabric solution suitable for both capability and capacity computing
     [Diagram labels: Price/Complexity, Performance, Capacity, Capability]
  10. Voltaire Fabric Collective Accelerator Summary
     • Fabric computing offload
        • Combination of SW & HW in a single solution
        • Offloading blocking computational tasks
        • Algorithms leveraging the topology for computation (trees)
     • Extreme MPI performance & scalability
        • Capability computing on commodity clusters
        • Up to two orders of magnitude (100X) faster collective runtime
        • Linear scalability (O18)
     • Transparent to the application
        • Standard Open MPI plug-in
        • Plug & play: no code changes needed
        • Simple SDK for integration with other MPIs
     Accelerate your Fabric!
  11. Thank You