
Reconfigurable FPGA-based Clusters: Next step in Supercomputing

Presented at Graduate Student Symposium 2007.



Vivek Venugopal, Kevin Shinpaugh

Introduction
• Current High Performance Computing (HPC) applications include genome sequencing (BLAST), molecular dynamics simulation (AMBER, NAMD), astrophysics simulation, weather prediction, etc.

Motivation
• HPC systems cater to two types of applications: compute-bound and I/O-bound.
• Issues prevalent with current HPC systems:
  • Scalability of the system with respect to the type of application: more processors, or more floating-point cores, for compute-bound applications.
  • Availability of a fast interconnection network (bandwidth access) for I/O-bound applications.
• HPC systems therefore need to be built according to the application for maximum efficiency.
• Current platform (HPC scenario): a general-purpose processor (GPP) with fixed hardware logic blocks for computation and communication; performance and speedup scale with more processors.
• Future platform (HPReC scenario): flexible hardware; better performance and speedup with FPGAs.

Reconfigurable systems
• Reconfigurable computing is based on the concept that the application defines the processor.
• FPGAs are inherently parallel, dissipate less power, and are available with a huge library of application cores.
• Reconfiguration can result in (i) efficient hardware utilization for repeated operations in a specific application, and (ii) better data passing on the interconnection network between the processors.

HPReC systems
• Partially Reconfigurable System: inside each node, the FPGA acts as a co-processor to the GPP, sharing RAM over a bus/switch; the nodes communicate over the interconnection network.
• Completely Reconfigurable System: inside each node, a >100-million-gate FPGA holds the network interface (I/F), the user logic, and an embedded PowerPC with attached RAM; the nodes communicate over the interconnection network (e.g., an InfiniBand interconnect).

FPGA Interconnect
• A cluster of FPGAs is equivalent to a huge processor with embedded reconfigurability, replication, and parallelism.
• (Figure: 4 x 4 mesh of FPGA nodes, labeled 00-33, connected by the interconnection network.)

Test platforms (HPReC)

                           Cray XD1                     SGI RASC
  Processing hardware      AMD Opteron +                Intel Itanium +
                           Xilinx Virtex-4 FPGAs        Xilinx Virtex-II FPGAs
  Interconnect             RapidArray Interconnect      NUMAlink Interconnect
  Network bandwidth        4 GB/s (2 x 2 GB/s)          12.8 GB/s
  access

• (Figure: Cray XD1 node — the AMD Opteron CPU connects through the RapidArray interface chip at 3.2 GB/s to the Xilinx Virtex-4 application FPGA with 16 MB QDR SDRAM cache memory; the RapidArray Interconnect bus provides 2 x 2 GB/s.)
• (Figure: SGI RASC node — the Intel Itanium 2 CPU connects at 2 x 3.2 GB/s; the Xilinx Virtex-II TIO FPGA, programmed over a SelectMAP interface via a loader FPGA on 66 MHz PCI, has 16 MB QDR SDRAM cache memory at 9.6 GB/s; the NUMAlink 4 Interconnect bus provides 4 x 3.2 GB/s.)
• Most of the hardware mapping from the software is automated and is based on the availability of specified libraries or processors for implementation, e.g., Mitrionics, Handel-C, etc.

HPReC applications
• A combination of I/O-bound and compute-bound applications:
  • Bioinformatics: Smith-Waterman algorithm, BLAST
  • Physics: collider data
  • Molecular Dynamics Simulation: AMBER

References
[1] "Cray XD1 datasheet," Cray Inc., technical report, 2005.
[2] "Cray XD1 supercomputer for reconfigurable computing," Cray Inc., technical report, 2005.
[3] "SGI Reconfigurable Application Specific Computing: Accelerating Production Workflows," Silicon Graphics Inc., technical report, December 2006.
[4] "Extraordinary Acceleration of Workflows with Reconfigurable Application-specific Computing from SGI," Silicon Graphics Inc., technical report, November 2004.
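The Smith-Waterman algorithm listed under the bioinformatics applications is a dynamic-programming local alignment whose inner loop performs the same small cell update over the whole score matrix — exactly the kind of repeated operation the poster argues FPGAs accelerate well. A minimal software sketch follows, using hypothetical scoring parameters (match +2, mismatch -1, linear gap -1; the poster does not specify a parameterization):

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between sequences a and b.

    Scoring values are illustrative, not taken from the poster.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] is the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0,                  # local alignment: scores never go negative
                          diag,               # match or mismatch
                          H[i - 1][j] + gap,  # gap in b
                          H[i][j - 1] + gap)  # gap in a
            best = max(best, H[i][j])
    return best

print(smith_waterman("ACACACTA", "AGCACACA"))  # prints 12
```

Note that every cell depends only on its left, upper, and upper-left neighbors, so all cells on one anti-diagonal can be computed simultaneously — the parallelism an FPGA systolic-array implementation exploits.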