CFD Acceleration with FPGA
Krzysztof Rojek, CTO at byteLAKE, PhD, DSc at Czestochowa University of Technology
Jamon Bowen, Director, Segment Marketing and Planning at Xilinx
Launching byteLAKE’s CFD Suite
FPGAs – The Ultimate Parallel Processing Device
› No predefined instruction set or underlying architecture
› Developer customizes the architecture to his needs
» Custom datapaths
» Custom bit-width
» Custom memory hierarchies
› Excels at all types of parallelism
» Deeply pipelined (e.g. Video codecs)
» Bit manipulations (e.g. AES, SHA)
» Wide datapath (e.g. DNN)
» Custom memory hierarchy (e.g: Data analytics)
› Adapts to evolving algorithms and workload needs
VITIS – Heterogeneous compute development
environment
Using C, C++ or OpenCL to Program FPGAs
› Xilinx pioneered C to FPGA compilation technology (aka “HLS”) in 2011
› Enables “Software Programmability” of FPGAs
› Includes open source collection of optimized HLS libraries
loop_main:for(int j=0;j<NUM_SIMGROUPS;j+=2) {
loop_share:for(uint k=0;k<NUM_SIMS;k++) {
loop_parallel:for(int i=0;i<NUM_RNGS;i++) {
mt_rng[i].BOX_MULLER(&num1[i][k],&num2[i][k],ratio4,ratio3);
float payoff1 = expf(num1[i][k])-1.0f;
float payoff2 = expf(num2[i][k])-1.0f;
if(num1[i][k]>0.0f)
pCall1[i][k]+= payoff1;
else
pPut1[i][k]-=payoff1;
if(num2[i][k]>0.0f)
pCall2[i][k]+=payoff2;
else
pPut2[i][k]-=payoff2;
}
}
}
FPGACompile
Software Programmability: FPGA Development in C/C++
Page 6
PCIe
x86 CPU
Host
Application
Runtime and Drivers
Acceleration API
FPGA
Accelerated
Functions
DMA Engine
AXI Interfaces
User
Application
Code
Xilinx
Acceleration
Platform
C/C++ code
with
OpenCL API calls
C/C++
or
OpenCL C
FPG
A
CPU
Agenda
CFD,
Computational
Fluid Dynamics
› Numerical analysis and algorithms
to solve fluid flows problems.
› Model fluids density, velocity,
pressure, temperature, and
chemical concentrations in relation
to time and space.
› Typical applications: weather
simulations, aerodynamic
characteristics modelling and
optimization, flow around buildings
simulations etc.
7
Architecture
› The compute domain is divided
into 4 sub-domains
› Host sends data to the FPGA
global memory
› Host calls kernel to execute it on
FPGA (kernel is called many times)
› Each kernel call represents
a single time step
› FPGA sends the output array
back to host
Alveo Optimizations
5774.60
4597.60 4572.00
1179.00
673.10 575.70 483.60 342.90 23.80 9.96
Execution time [s]
10
Conclusions
INTEL
XEON E5-
2995
INTEL
XEON E5-
2995
INTEL
XEON
GOLD 6148
INTEL
XEON
PLATINUM
8168
XILINX
ALVEO
U250
Performance (the higher
the better)
INTEL
XEON E5-
2995
INTEL
XEON E5-
2995
INTEL
XEON
GOLD 6148
INTEL
XEON
PLATINUM
8168
XILINX
ALVEO
U250
Energy (the lower the
better)
INTEL
XEON E5-
2995
INTEL
XEON E5-
2995
INTEL
XEON
GOLD 6148
INTEL
XEON
PLATINUM
8168
XILINX
ALVEO
U250
Performance/W (the
higher the better)
• Up to 4x more performance
• Up to 80% lower energy consumption
• Up to 6x more performance/Watt
Launching byteLAKE’s CFD Suite
(BCS)
› Highlights
» Collection of Alveo Optimized CFD Workloads
» Acceleration = Faster Results
» Green Computing = Improved Efficiency
» Microservices = Quick Start
» Excellent TCO = Cost Saving
» AI Driven Approach
First Microservices Launching Today
› Advection
› Thomas Algorithm (linear algebra module)
› Low barrier entry
» Scalable on demand
» As a Service / Cloud
» On-premise
Way Forward
More Microservices (roadmap)
byteLAKE’s
CFD Suite
(GCS)
Use Case
Specific
AI Driven
Highly Optimized
Green Energy Automotive Construction Chemistry Oil & Gas
byteLAKE at SC19
HPC and AI Convergence
Denver, CO, Colorado Convention Center, Nov 17-21
Booth:
H2RC, 607• CFD Acceleration with FPGA (workshop)
• byteLAKE’s CFD Suite (Alveo optimized, demo)
• Leveraging AI for Reforestation Efforts
and AI Training Acceleration (demo)
byteLAKE.com
/en/SC19
Thank You
welcome@byteLAKE.com

CFD Acceleration with FPGA (byteLAKE's & Xilinx's presentation from H2RC workshop, SC19)

  • 1.
    CFD Acceleration withFPGA Krzysztof Rojek, CTO at byteLAKE, PhD, DSc at Czestochowa University of Technology Jamon Bowen, Director, Segment Marketing and Planning at Xilinx Launching byteLAKE’s CFD Suite
  • 2.
    FPGAs – TheUltimate Parallel Processing Device › No predefined instruction set or underlying architecture › Developer customizes the architecture to his needs » Custom datapaths » Custom bit-width » Custom memory hierarchies › Excels at all types of parallelism » Deeply pipelined (e.g. Video codecs) » Bit manipulations (e.g. AES, SHA) » Wide datapath (e.g. DNN) » Custom memory hierarchy (e.g: Data analytics) › Adapts to evolving algorithms and workload needs
  • 3.
    VITIS – Heterogeneouscompute development environment
  • 4.
    Using C, C++or OpenCL to Program FPGAs › Xilinx pioneered C to FPGA compilation technology (aka “HLS”) in 2011 › Enables “Software Programmability” of FPGAs › Includes open source collection of optimized HLS libraries loop_main:for(int j=0;j<NUM_SIMGROUPS;j+=2) { loop_share:for(uint k=0;k<NUM_SIMS;k++) { loop_parallel:for(int i=0;i<NUM_RNGS;i++) { mt_rng[i].BOX_MULLER(&num1[i][k],&num2[i][k],ratio4,ratio3); float payoff1 = expf(num1[i][k])-1.0f; float payoff2 = expf(num2[i][k])-1.0f; if(num1[i][k]>0.0f) pCall1[i][k]+= payoff1; else pPut1[i][k]-=payoff1; if(num2[i][k]>0.0f) pCall2[i][k]+=payoff2; else pPut2[i][k]-=payoff2; } } } FPGACompile
  • 5.
    Software Programmability: FPGADevelopment in C/C++ Page 6 PCIe x86 CPU Host Application Runtime and Drivers Acceleration API FPGA Accelerated Functions DMA Engine AXI Interfaces User Application Code Xilinx Acceleration Platform C/C++ code with OpenCL API calls C/C++ or OpenCL C FPG A CPU
  • 6.
    Agenda CFD, Computational Fluid Dynamics › Numericalanalysis and algorithms to solve fluid flows problems. › Model fluids density, velocity, pressure, temperature, and chemical concentrations in relation to time and space. › Typical applications: weather simulations, aerodynamic characteristics modelling and optimization, flow around buildings simulations etc. 7
  • 7.
    Architecture › The computedomain is divided into 4 sub-domains › Host sends data to the FPGA global memory › Host calls kernel to execute it on FPGA (kernel is called many times) › Each kernel call represents a single time step › FPGA sends the output array back to host
  • 8.
    Alveo Optimizations 5774.60 4597.60 4572.00 1179.00 673.10575.70 483.60 342.90 23.80 9.96 Execution time [s]
  • 9.
    10 Conclusions INTEL XEON E5- 2995 INTEL XEON E5- 2995 INTEL XEON GOLD6148 INTEL XEON PLATINUM 8168 XILINX ALVEO U250 Performance (the higher the better) INTEL XEON E5- 2995 INTEL XEON E5- 2995 INTEL XEON GOLD 6148 INTEL XEON PLATINUM 8168 XILINX ALVEO U250 Energy (the lower the better) INTEL XEON E5- 2995 INTEL XEON E5- 2995 INTEL XEON GOLD 6148 INTEL XEON PLATINUM 8168 XILINX ALVEO U250 Performance/W (the higher the better) • Up to 4x more performance • Up to 80% lower energy consumption • Up to 6x more performance/Watt
  • 10.
    Launching byteLAKE’s CFDSuite (BCS) › Highlights » Collection of Alveo Optimized CFD Workloads » Acceleration = Faster Results » Green Computing = Improved Efficiency » Microservices = Quick Start » Excellent TCO = Cost Saving » AI Driven Approach
  • 11.
    First Microservices LaunchingToday › Advection › Thomas Algorithm (linear algebra module) › Low barrier entry » Scalable on demand » As a Service / Cloud » On-premise
  • 12.
    Way Forward More Microservices(roadmap) byteLAKE’s CFD Suite (GCS) Use Case Specific AI Driven Highly Optimized Green Energy Automotive Construction Chemistry Oil & Gas
  • 13.
    byteLAKE at SC19 HPCand AI Convergence Denver, CO, Colorado Convention Center, Nov 17-21 Booth: H2RC, 607• CFD Acceleration with FPGA (workshop) • byteLAKE’s CFD Suite (Alveo optimized, demo) • Leveraging AI for Reforestation Efforts and AI Training Acceleration (demo) byteLAKE.com /en/SC19
  • 14.