CFD Acceleration with FPGA (byteLAKE's & Xilinx's presentation from H2RC workshop, SC19)


byteLAKE's and Xilinx's presentation from the Fifth International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC'19).

The workshop was held in conjunction with:
SC19: The International Conference for High Performance Computing, Networking, Storage and Analysis.

Workshop: CFD Acceleration with FPGA

Abstract: Learn more about byteLAKE's CFD Suite (BCS), a collection of Alveo (FPGA) optimized CFD workloads, announced today with its first microservices: Advection and the Thomas Algorithm (linear algebra module). The presenters will explain the technical details, followed by an in-depth technical discussion with a demo at the Xilinx booth.


Footnote:
This presentation covers the non-AI version of byteLAKE's CFD kernels, highly optimized for Alveo FPGAs. Based on this research project and many others in the CFD space, we decided to change the course of the CFD Suite product development and use AI to accelerate computations and enable new possibilities. Instead of adapting CFD solvers to accelerators, we use AI and work on a cross-platform solution. More on the latest:

Update for 2020: byteLAKE is currently developing CFD Suite as AI for CFD Suite, a collection of AI (Artificial Intelligence) models that accelerate CFD simulations and enable new features for them. It is a cross-platform solution (not only for FPGAs). More:

Published in: Technology

  1. CFD Acceleration with FPGA
     Launching byteLAKE’s CFD Suite
     Krzysztof Rojek, CTO at byteLAKE, PhD, DSc at Czestochowa University of Technology
     Jamon Bowen, Director, Segment Marketing and Planning at Xilinx
  2. FPGAs – The Ultimate Parallel Processing Device
     › No predefined instruction set or underlying architecture
     › Developer customizes the architecture to their needs
       » Custom datapaths
       » Custom bit-width
       » Custom memory hierarchies
     › Excels at all types of parallelism
       » Deeply pipelined (e.g. video codecs)
       » Bit manipulations (e.g. AES, SHA)
       » Wide datapath (e.g. DNN)
       » Custom memory hierarchy (e.g. data analytics)
     › Adapts to evolving algorithms and workload needs
  3. VITIS – Heterogeneous compute development environment
  4. Using C, C++ or OpenCL to Program FPGAs
     › Xilinx pioneered C to FPGA compilation technology (aka “HLS”) in 2011
     › Enables “Software Programmability” of FPGAs
     › Includes open source collection of optimized HLS libraries
     Code excerpt shown on the slide (labelled "FPGA Compile"):
         loop_main: for (int j = 0; j < NUM_SIMGROUPS; j += 2) {
             loop_share: for (uint k = 0; k < NUM_SIMS; k++) {
                 loop_parallel: for (int i = 0; i < NUM_RNGS; i++) {
                     mt_rng[i].BOX_MULLER(&num1[i][k], &num2[i][k], ratio4, ratio3);
                     float payoff1 = expf(num1[i][k]) - 1.0f;
                     float payoff2 = expf(num2[i][k]) - 1.0f;
                     if (num1[i][k] > 0.0f) pCall1[i][k] += payoff1;
                     else                   pPut1[i][k]  -= payoff1;
                     if (num2[i][k] > 0.0f) pCall2[i][k] += payoff2;
                     else                   pPut2[i][k]  -= payoff2;
                 }
             }
         }
  5. Software Programmability: FPGA Development in C/C++
     [Diagram labels: x86 CPU (User Application Code: C/C++ code with OpenCL API calls; Host Application; Acceleration API; Runtime and Drivers), PCIe, FPGA (Xilinx Acceleration Platform; DMA Engine; AXI Interfaces; FPGA Accelerated Functions: C/C++ or OpenCL C)]
  6. Agenda: CFD, Computational Fluid Dynamics
     › Numerical analysis and algorithms to solve fluid flow problems.
     › Models fluid density, velocity, pressure, temperature, and chemical concentrations in relation to time and space.
     › Typical applications: weather simulations, modelling and optimization of aerodynamic characteristics, simulations of flow around buildings, etc.
  7. Architecture
     › The compute domain is divided into 4 sub-domains
     › Host sends data to the FPGA global memory
     › Host calls the kernel to execute it on the FPGA (the kernel is called many times)
     › Each kernel call represents a single time step
     › FPGA sends the output array back to the host
  8. Alveo Optimizations
     [Chart: Execution time [s]. Values shown: 5774.60, 4597.60, 4572.00, 1179.00, 673.10, 575.70, 483.60, 342.90, 23.80, 9.96]
  9. Conclusions
     [Three charts: Performance (the higher the better), Energy (the lower the better), Performance/W (the higher the better). Each compares: Intel Xeon E5-2995, Intel Xeon E5-2995, Intel Xeon Gold 6148, Intel Xeon Platinum 8168, Xilinx Alveo U250]
     • Up to 4x more performance
     • Up to 80% lower energy consumption
     • Up to 6x more performance/Watt
  10. Launching byteLAKE’s CFD Suite (BCS)
     › Highlights
       » Collection of Alveo Optimized CFD Workloads
       » Acceleration = Faster Results
       » Green Computing = Improved Efficiency
       » Microservices = Quick Start
       » Excellent TCO = Cost Saving
       » AI Driven Approach
  11. First Microservices Launching Today
     › Advection
     › Thomas Algorithm (linear algebra module)
     › Low barrier to entry
       » Scalable on demand
       » As a Service / Cloud
       » On-premise
  12. Way Forward
     More Microservices (roadmap)
     byteLAKE’s CFD Suite (BCS): Use Case Specific, AI Driven, Highly Optimized
     Green Energy, Automotive, Construction, Chemistry, Oil & Gas
  13. byteLAKE at SC19
     HPC and AI Convergence
     Denver, CO, Colorado Convention Center, Nov 17-21
     Booth: H2RC, 607
     • CFD Acceleration with FPGA (workshop)
     • byteLAKE’s CFD Suite (Alveo optimized, demo)
     • Leveraging AI for Reforestation Efforts and AI Training Acceleration (demo)
     /en/SC19
  14. Thank You