SlideShare a Scribd company logo
1 of 39
Download to read offline
FPGA-accelerated High-Performance Computing
Close to Breakthrough or Pipedream?
Christian Plessl
Paderborn Center for Parallel Computing
& Dept. Computer Science
Paderborn University, Germany
24 January 2018
• HPC and Computational Science
• Status of using FPGAs in HPC
• FPGA-accelerated HPC in Paderborn
– plans
– lessons learned
• Conclusions and call to action
Outline
to
Science
Computational
Science
From
Experiment
4
Theory
5
High-Performance Computing (HPC)
6
• Use computers simulation to obtain scientific
results
• “Third paradigm” following experiment and theory
• Advantages of computer experiments:
– make predictions what will happen
– perform experiments that would otherwise be impossible,
too difficult, too dangerous
– perfect reproducibility
– can offer explanations why something happens
7
What is Computational Science?
• Computational science penetration all fields
– engineering
– natural sciences
– humanities
• Growing processing demand
– simulation
– optimization
– data intensive analytics
• Computer are virtual instruments
– microscopes, telescopes, chemistry labs, ...
– improve exponentially in capabilities in contrast to their
physical counterparts
8
Computational Science Drives HPC Demand
images: UCLA, MPG
9
Which Sciences Are Using HPC?
Paderborn Center for
Parallel Computing
Materials Science 30%
Physics 24%
Engineering 13%
Earth Science 10%
Chemistry 15%
Biological Sciences 6%
Computer Science 2%
2016 INCITE
BY DOMAIN
3.57 BILLION
CORE-HOURS
Chemistry 9%
Biological Sciences 1%
Argonne Leadership
Computing Facility
Physics
33%
Engineering
21%
Chemistry
27%
Computer	
Science
12%
Economics
2%
other
5%
ca. 700M
core-hours/year
• Massively parallel computation across all levels
– instruction, core, socket, rack
• Power consumption has become a first-class concern
– operating cost and and power supply
– cooling infrastructure
10
HPC: Massive Scale and Challenges
1
4
16
64
256
1,024
4,096
16,384
65,536
262,144
1,048,576
4,194,304
16,777,216
06/93
06/94
06/95
06/96
06/97
06/98
06/99
06/00
06/01
06/02
06/03
06/04
06/05
06/06
06/07
06/08
06/09
06/10
06/11
06/12
06/13
06/14
06/15
06/16
06/17
Number	of	CPU	cores	in	Top	500	Supercomputers	
Cores #1
Cores #100
Power #1 [kW]
Power #100 [kW]
Trend for rank 1
Trend for rank 100
data: Top500
• Ambitious roadmaps for HPC
– launch of Exascale computing projects in US, Europe, Japan around
2010
– objective: 1 ExaFLOP – less than 20 MW – by 2020
• Requires substantial improvements in the whole stack
– processor architecture, network, programming models
– system design, cooling
• Efficient computing resources are more important than ever
– a wealth of new and re-invented architectures
– accelerators
– heterogeneous computing
11
Quest for Energy-Efficient Computing
a huge opportunity for
reconfigurable computing
f
Cell
FPGA
GPU
Manycore
Vector-
processor
accelerators
• Accelerators entered HPC a decade
ago
• Performance (Top 500, 11/2017)
– 20% of systems use accelerators
– 25%-35% of accumulated performance
• Efficiency (Green 500, 11/2017)
– most efficient systems use PEZY-SC or
GPU accelerators
12
Accelerators on the Rise
statistics: Top500.org
• Breakdown of Top 500 11/2017 by
accelerator type
• Interesting observations
1. 80% of the systems do not use any
accelerators
2. Only NVidia GPUs and Intel Xeon Phi
gained traction
3. FPGAs are absent from the Top500
• Quick rise ≠ universal adoption
– Why don’t we see much broader
adoption of accelerators?
– Stagnation or a matter of time?
13
A Different Take on the Same Data
statistics: Top500.org
FPGA
14
Overarching Questions and Motivation for this Talk
1. If accelerators – in particular
FPGAs – are so great, why aren’t
they in much wider use?
2. What can we do to change this
situation?
• Currently operational, larger scale general-purpose FPGA installations
– CHREC U. Florida: Novo-G#
– Hartree Center UK: Maxeler MPC-X cluster
– Texas TACC: Catapult 1 and Intel HARP v2 cluster
– Paderborn University: XCL + HARP v2 cluster
• HPC Applications with FPGA support
– no generally available, production-ready HPC codes
– some proof of concept codes (e.g. Maxeler Application Gallery)
– probably some integrated solutions/appliances (bioinformatics, cryptography)
• HPC Libraries with FPGA support
– nothing usable/maintained (not even FFT, BLAS, LAPACK)
– announced: Intel and Xilinx acceleration libraries (mainly deep learning)
15
Maybe Top500 Is Too Narrow. Perform a Broader Search
• Numerous publications show the potential of FPGAs for relevant HPC problems
• Some examples
– Linear algebra: CG solver for sparse linear equation systems [1]
§ 20-40x faster than CPU
– Geophysics: 3D convolution [1]
§ 70x faster than CPU, 14x faster than GPU
– Molecular dynamics [2]
§ 80x faster than NAMD (single core) CPU
– Bioinformatics (BLAST) [3]
§ 5x faster than optimized, parallel CPU implementation
– Climate modeling [4]
§ 4 FPGAs 19x faster than two socket CPU, 7x faster than GPU
16
Are FPGAs Not Promising for HPC? I Don’t Think So
[1] O. Lindtjorn, R. G. Clapp, O. Pell, O. Mencer, M. J. Flynn, and H. Fu. Beyond traditional microprocessors for geoscience high-performance computing applications. IEEE Micro,
Mar.–Apr. 2011.
[2] M. Chiu and M. C. Herbordt. Molecular dynamics simulations on high-performance reconfigurable computing systems. ACM TRETS Nov. 2010.
[3] A. Mahram, and M. C. Herbordt. NCBI BLASTP on High-Performance Reconfigurable Computing System. ACM TRETS Jan 2015.
[4] L. Gan, H. Fu, W. Luk et. al. Solving the Global Atmospheric Equations through Heterogeneous Reconfigurable Platforms. ACM TRETS Mar. 2015
• Hypothesis: clear value proposition is mandatory
– no other affordable technology can satisfy the requirements
– NRE: avoidance of cost for ASIC design
– CAPEX, OPEX: reduction in investment and operating cost
• Areas where FPGAs seem to have some commercial relevance
– networking equipment (latency, NRE)
– high frequency trading (latency, NRE)
– bioinformatics (NRE, CAPEX)
– cryptanalysis (CAPEX and OPEX)
– defense / medical signal processing (space, power consumption)
– deep learning inference (CAPEX and OPEX)
• Viability of general purpose use (e.g. Amazon F1) so far unproven
17
Some Areas Where FPGAs Are Successfully Used
So, where is the problem?
• Since joining Paderborn University in 2007, I started to connect
more intensely with the HPC community
• The timing was good
– HPC community was increasingly interested in computer architecture and
accelerators
– FPGAs were already known as hot technology for the future of HPC
– some computational scientists were actually interested in collaboration
• My naïve assumptions
– HPC folks will be convinced of FPGAs once they see case studies
– publishing the results at mainstream HPC conferences is in reach
– FPGAs will soon be mainstream to have chances for an HPC faculty
position
19
Pitching FPGAs Acceleration to HPC Audience (1)
20
Pitching FPGAs Acceleration to HPC Audience (2)
collaboration with
physicist ✓
problem that sounds
somewhat important ✓
state-of-the-art CPUs
and FPGAs ✓
21
Pitching FPGAs Acceleration to HPC Audience (3)
algorithm simple
enough to understand ✓
22
Pitching FPGAs Acceleration to HPC Audience (4)
high-level hardware
synthesis, no HDL
(any computational
scientist can do this) ✓
23
Pitching FPGAs Acceleration to HPC Audience (5)
the synthesis result is
not something weird,
that can only be understood
by electrical engineers ✓
24
Pitching FPGAs Acceleration to HPC Audience (6)
CPU and FPGA use double
precision arithmetic ✓
CPU implementation appears
to be reasonably optimized:
• multi-threaded ✓
• cache blocking ✓
• NUMA-aware memory
allocation ✓
speedup is not stellar, but OK
considering the strong CPU
baseline ✓
• The CPU performance baseline is too low
– stencil codes can be much better optimized
– code probably not vectorized
• Optimization for FPGA insufficiently understood
– what is the theoretical performance limit and bottlenecks
(computation, memory, dependencies)
– how can FPGAs ever win, if the DRAM is slower than for
CPUs (lack of understanding of pipelining, streaming, ...)
• Fear, uncertainty, doubt
– is this work actually relevant for computational scientists?
– can you train HPC developers to use FPGAs?
– will the required investment in expensive FPGA hard and
software pay off?
25
My Pitch Was Not Received as Well as Expected
Performance of FPGA
~16 DP computations / update
→ 1000 MCell/s = 16 GFLOPS
Peak Performance CPU
2 sockets * 4 cores * 2.5GHz * 1 – 8
FLOPS = 20 – 160 GFLOPS
• HPC developers are constantly told exciting stories
– this technology is the future: Itanium, Cell, BlueGene, Xeon Phi
– the compiler will handle the complexity for you
• No user cares for energy efficiency
– only infrastructure providers do
• User care for ease-of-use and protection of their investments
– many codes are gigantic, countless person years investment
– there are plenty of free computing resources available for academics
• Benefits of the new technology are not convincingly
presented
– proof-of-concept case study, no real-state of the art problems
– improvement in metrics not relevant for target users (method vs.
insight-driven research)
26
What was Going Wrong?
TRUST US ✓
• The pitfalls FPGA acceleration in
HPC are currently not widely
acknowledged, discussed and
understood
• Interesting position paper published
2009 in ACM TRETS
– Premise: FPGAs show lots of promise
but lack acceptance in general-purpose
HPC installations
– Proposed 12 areas where researcher
need to make contributions to increase
acceptance of FPGAs in HPC
• Many observations and conclusions
still apply today
27
Pitfalls of HPC Acceleration for HPC
The Long Road to Production Reconfigurable Supercomputing · 26: 11
Table I. The State of FPGA Research Toward HPC
Area Status Activity Difficulty
Step 1: Standardization poor moderate low
Step 2: High Performance Forward Portability poor low high
Step 3: Enhanced Device Performance good low high
Step 4: Enhanced System Architecture fair none moderate
Step 5: Simplified Library Usage fair low low
Step 6: Concurrent APIs poor low low
Step 7: Better Performance Studies fair moderate moderate
Step 8: Improved Programming Environment good high high
Step 9: Improved Infrastructure poor low moderate
Step 10: Enhanced Communications good moderate moderate
Step 11: Enhanced Reliability poor low high
Step 12: Provide OS Support poor low low
objective viewpoint that only it can. Step 2 (high performance forward porta-
bility) is in similarly bad shape, and is a much harder problem. There is rela-
Critical Areas Identified by Underwood et al.
28
Getting to the Core of the Problem
29
Accelerator research stands in striking contrast to high
performance computing and general microprocessor
optimization work. In the latter, optimization work often
goes into widely available libraries (e.g. ATLAS and
FFTW).
In contrast, accelerator research tends to be a single
proof of concept effort that never makes it outside the
lab – despite the fact that it targets widely used core
algorithms. [..]
It is time for accelerator researchers to invest the extra
effort and make their work applicable.
[Underwood et. al 2009]
Intuitive Assessment of the Progress We Made Since 2009
30
The Long Road to Production Reconfigurable Supercomputing · 26: 11
Table I. The State of FPGA Research Toward HPC
Area Status Activity Difficulty
Step 1: Standardization poor moderate low
Step 2: High Performance Forward Portability poor low high
Step 3: Enhanced Device Performance good low high
Step 4: Enhanced System Architecture fair none moderate
Step 5: Simplified Library Usage fair low low
Step 6: Concurrent APIs poor low low
Step 7: Better Performance Studies fair moderate moderate
Step 8: Improved Programming Environment good high high
Step 9: Improved Infrastructure poor low moderate
Step 10: Enhanced Communications good moderate moderate
Step 11: Enhanced Reliability poor low high
Step 12: Provide OS Support poor low low
objective viewpoint that only it can. Step 2 (high performance forward porta-
bility) is in similarly bad shape, and is a much harder problem. There is rela-
• The time of the free lunch for performance is over
– GPUs have paved the way for application modifications
– previously the code was assumed to be sacred and untouchable
• Energy efficiency has become a pressing issue
– opens up another dimension for competition
• There is finally a “killer app”
– inference for deep neural networks
– FPGAs ride the AI hype-wave
• Cloud and data center players make massive investments in FPGAs
– Altera acquisition by Intel, IBM/Xilinx partnership
– use of FPGAs in clouds of Microsoft, Amazon, Baidu, IBM, Huawei, etc.
– the overall ecosystem will profit from this
31
Changes in Ecosystem Since Underwood’s Assessment
• OpenCL High-Level Synthesis design flows
– language capable of specifying many aspects
relevant for FPGAs
– standardized and used in other contexts too
– supports easier design space exploration
– abstracts from FPGA board, memory channels,
PCIe interfaces
• Highly capable FPGA devices
– vast amounts of DSP blocks
– suitable bit widths for implementing floating point
arithmetic
• Steps towards better system integration
– shared and coherent global memory access
32
Technological Progress Since Underwood’s Assessment
HPC-relevant Intel Stratix 10 features
• 5.5 M LE
• 28 MB block RAM
• 10 TFLOPS single-precision floating-
point performance
• 80 GFLOPS/W (best Green500
system achieves 17 DP GFLOPS/W)
• hardened PCIe x16
• hardened memory controllers for
DDR4
• up to 96 transceivers
• Longstanding experience with FPGAs for HPC
• Current FPGA infrastructure
– two testbed clusters for public use
– additional FPGA systems from most major vendors
33
HPC with FPGAs at Paderborn University
System Inst CPU FPGA Toolflow Properties
Convey HC-1 2010 Xeon 5138 4x Xilinx Virtex-5 LX 330 HDL + vector
processor overlay
CPU and FPGA connected via FSB, cache-
coherent NUMA architecture
Maxeler MPC-C 2012 Xeon X5660 4x Xilinx Virtex-6 SX475T MaxJ data flow
language
4 PCIe boards, MaxRing interconnect
Nallatech 385A 2016 Xeon E5-
1260v2
Intel/Altera Arria 10 GX1150 Intel OpenCL Nallatech 385A FPGA card
IBM S812L 2016 POWER8
10-cores
Xilinx Virtex-7 VX690T Xilinx OpenCL AlphaData PCIe FPGA board (ADM-PCIE-
7V3)
Micron
Workstation
2016 Intel i7-
5930K
Xilinx Kintex-7
UltrascaleKU115
Xilinx OpenCL Pico AC-510 FPGA board with Hybrid-
memory cube
XCL cluster 2017 Xeon E5-
1630v4
Xilinx Virtex-7 VX690T +
Xilinx Kintex Ultrascale
KU115
Xilinx OpenCL 8-node cluster with 2 FPGA cards per node
(AlphaData ADM-PCIE-7V3 and ADM-PCIE-
8K5)
HARP cluster 2017 Xeon E5-v4 Intel BDW+FPGA hybrid
CPU/FPGA
Intel OpenCL, HDL 10-node cluster with 1 BDW+FPGA
processor per node
• Recently acquired funding for next generation
HPC system
– 10M€ HPC system + 15M€ data center building
• FPGAs play a strategic role our HPC investment
– exploration of FPGAs in HPC
– port libraries and real scientific applications to FPGAs
– work on parallel FPGA implementations (MPI, PGAS)
– study performance and energy trade-offs
• Investment complemented by research, development and support efforts
– infrastructure accessible for free for researchers in Germany
– international collaborations possible and desired, negotiated on case-by-case basis
34
HPC with FPGAs at Paderborn University (2)
• Idea: Build experience for production system with FPGA testbed clusters
– building a cluster from components proved far more difficult than ever expected
– lot of effort from technicians, admins, and researchers
• FPGA hard- and software stacks are not ready for primetime yet
• Main difficulties
– poor onboarding experience
– fragility of firmware, drivers and software stack
– available management tools not suitable for multi-user HPC environment
– security implications poorly understood
• Conclusion: we will procure the production FPGA systems as validated solutions
from major HPC vendors
35
War Stories and Challenges (1)
• Poor onboarding experience
– hardly anything works out of the gate when installing FPGA
card in server
– outdated and incorrect administrator guides
– typical admins are not able to cope with the technology,
lack of good self-diagnostics
• Fragility of firmware, driver and software stack
– reliance of very specific (sometimes patched) OS versions
– intermingling of HLS flows, backend tools and BSPs
– unstable drivers (crashes, deadlocks, corruption of
data/configuration)
– in-field firmware upgrades not always possible, take too
long or cannot be automated
36
War Stories and Challenges (2)
• Available management tools not suitable for multi-user HPC environment
– no best practices to support applications relying on specific BSP-variants, driver versions, etc.
– no best practices/capabilities for automated firmware provisioning in cluster and workload
management systems
– static partitioning of FPGA into subsets per firmware (OpenCL / HDL and different tool releases)
leads to inacceptable resource fragmentation
• Security implications poorly understood
– ecosystem does not systematically consider multi-user scenario
– FPGA and board vendors are not confident asserting security properties of BSPs
– shared memory without memory protection opens the gates for evil (cache coherent CPU+FPGA,
PCIe bus master)
– OpenCL BSPs are delivered by vendors, no possibility to verify correctness and security
– too many ways to crash or lock up a machine (denial of service)
37
War Stories and Challenges (3)
• The future is bright
– FPGAs can deliver attractive solutions for HPC and data center workloads
– we have the most capable FPGA silicon we ever had
– HLS tools can not only deliver increased productivity but also competitive results for increasing
number of domains
– there finally is a “killer application” for FPGAs
– serious investments and commitment to FPGAs from suppliers and hyperscale data centers
• There is still substantial groundwork to do
– improve stability of software and hardware stack
– address needs of multi-user environment (security, backward compatibility, automated
provisioning of BSPs)
– better support for HPC languages and libraries (Fortran, OpenMP, OpenACC, MPI)
• The needs of data center applications will hopefully move the whole field along
38
Conclusions
• Perform fair comparisons
– no overblown claims, use strong and optimized baselines
– equivalent hardware generations
• Break out of the case studies dilemma
– target actual scientific codes rather than extracted kernels
– use relevant problem sizes and test data
– aim for generic designs that can handle broad range of problems
– target multi-FPGA implementation
• Spread the word
– connect with the HPC community and present your results
– release the results as open-source
• Join us in this effort!
39
Call to Action
christian.plessl@uni-paderborn.de
https://pc2.uni-paderborn.de
Twitter: @plessl // @pc2_upb

More Related Content

What's hot

05 Preparing for Extreme Geterogeneity in HPC
05 Preparing for Extreme Geterogeneity in HPC05 Preparing for Extreme Geterogeneity in HPC
05 Preparing for Extreme Geterogeneity in HPCRCCSRENKEI
 
High performance computing for research
High performance computing for researchHigh performance computing for research
High performance computing for researchEsteban Hernandez
 
Introducing the TPCx-HS Benchmark for Big Data
Introducing the TPCx-HS Benchmark for Big DataIntroducing the TPCx-HS Benchmark for Big Data
Introducing the TPCx-HS Benchmark for Big Datainside-BigData.com
 
Computação de Alto Desempenho - Fator chave para a competitividade do País, d...
Computação de Alto Desempenho - Fator chave para a competitividade do País, d...Computação de Alto Desempenho - Fator chave para a competitividade do País, d...
Computação de Alto Desempenho - Fator chave para a competitividade do País, d...Igor José F. Freitas
 
OpenPOWER Academia and Research team's webinar - Presentations from Oak Ridg...
OpenPOWER Academia and Research team's webinar  - Presentations from Oak Ridg...OpenPOWER Academia and Research team's webinar  - Presentations from Oak Ridg...
OpenPOWER Academia and Research team's webinar - Presentations from Oak Ridg...Ganesan Narayanasamy
 
Scaling Deep Learning Algorithms on Extreme Scale Architectures
Scaling Deep Learning Algorithms on Extreme Scale ArchitecturesScaling Deep Learning Algorithms on Extreme Scale Architectures
Scaling Deep Learning Algorithms on Extreme Scale Architecturesinside-BigData.com
 
OpenACC Monthly Highlights: October2020
OpenACC Monthly Highlights: October2020OpenACC Monthly Highlights: October2020
OpenACC Monthly Highlights: October2020OpenACC
 
High Performance Computing - The Future is Here
High Performance Computing - The Future is HereHigh Performance Computing - The Future is Here
High Performance Computing - The Future is HereMartin Hamilton
 
Building High Performance Computing Capability in the African Continent/Happy...
Building High Performance Computing Capability in the African Continent/Happy...Building High Performance Computing Capability in the African Continent/Happy...
Building High Performance Computing Capability in the African Continent/Happy...Academy of Science of South Africa (ASSAf)
 
OpenACC Monthly Highlights: May 2019
OpenACC Monthly Highlights: May 2019OpenACC Monthly Highlights: May 2019
OpenACC Monthly Highlights: May 2019OpenACC
 
OpenACC Monthly Highlights: March 2021
OpenACC Monthly Highlights: March 2021OpenACC Monthly Highlights: March 2021
OpenACC Monthly Highlights: March 2021OpenACC
 
Artifact Evaluation Experience CGO'15 / PPoPP'15
Artifact Evaluation Experience CGO'15 / PPoPP'15Artifact Evaluation Experience CGO'15 / PPoPP'15
Artifact Evaluation Experience CGO'15 / PPoPP'15Grigori Fursin
 
13 Supercomputer-Scale AI with Cerebras Systems
13 Supercomputer-Scale AI with Cerebras Systems13 Supercomputer-Scale AI with Cerebras Systems
13 Supercomputer-Scale AI with Cerebras SystemsRCCSRENKEI
 
The Coming Age of Extreme Heterogeneity in HPC
The Coming Age of Extreme Heterogeneity in HPCThe Coming Age of Extreme Heterogeneity in HPC
The Coming Age of Extreme Heterogeneity in HPCinside-BigData.com
 
Sc10 slide share
Sc10 slide shareSc10 slide share
Sc10 slide shareGuy Tel-Zur
 
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...Joachim Schlosser
 

What's hot (20)

05 Preparing for Extreme Geterogeneity in HPC
05 Preparing for Extreme Geterogeneity in HPC05 Preparing for Extreme Geterogeneity in HPC
05 Preparing for Extreme Geterogeneity in HPC
 
High performance computing for research
High performance computing for researchHigh performance computing for research
High performance computing for research
 
Introducing the TPCx-HS Benchmark for Big Data
Introducing the TPCx-HS Benchmark for Big DataIntroducing the TPCx-HS Benchmark for Big Data
Introducing the TPCx-HS Benchmark for Big Data
 
HPC in higher education
HPC in higher educationHPC in higher education
HPC in higher education
 
Computação de Alto Desempenho - Fator chave para a competitividade do País, d...
Computação de Alto Desempenho - Fator chave para a competitividade do País, d...Computação de Alto Desempenho - Fator chave para a competitividade do País, d...
Computação de Alto Desempenho - Fator chave para a competitividade do País, d...
 
OpenPOWER Academia and Research team's webinar - Presentations from Oak Ridg...
OpenPOWER Academia and Research team's webinar  - Presentations from Oak Ridg...OpenPOWER Academia and Research team's webinar  - Presentations from Oak Ridg...
OpenPOWER Academia and Research team's webinar - Presentations from Oak Ridg...
 
Scaling Deep Learning Algorithms on Extreme Scale Architectures
Scaling Deep Learning Algorithms on Extreme Scale ArchitecturesScaling Deep Learning Algorithms on Extreme Scale Architectures
Scaling Deep Learning Algorithms on Extreme Scale Architectures
 
OpenACC Monthly Highlights: October2020
OpenACC Monthly Highlights: October2020OpenACC Monthly Highlights: October2020
OpenACC Monthly Highlights: October2020
 
High Performance Computing - The Future is Here
High Performance Computing - The Future is HereHigh Performance Computing - The Future is Here
High Performance Computing - The Future is Here
 
Building High Performance Computing Capability in the African Continent/Happy...
Building High Performance Computing Capability in the African Continent/Happy...Building High Performance Computing Capability in the African Continent/Happy...
Building High Performance Computing Capability in the African Continent/Happy...
 
OpenACC Monthly Highlights: May 2019
OpenACC Monthly Highlights: May 2019OpenACC Monthly Highlights: May 2019
OpenACC Monthly Highlights: May 2019
 
OpenACC Monthly Highlights: March 2021
OpenACC Monthly Highlights: March 2021OpenACC Monthly Highlights: March 2021
OpenACC Monthly Highlights: March 2021
 
Artifact Evaluation Experience CGO'15 / PPoPP'15
Artifact Evaluation Experience CGO'15 / PPoPP'15Artifact Evaluation Experience CGO'15 / PPoPP'15
Artifact Evaluation Experience CGO'15 / PPoPP'15
 
13 Supercomputer-Scale AI with Cerebras Systems
13 Supercomputer-Scale AI with Cerebras Systems13 Supercomputer-Scale AI with Cerebras Systems
13 Supercomputer-Scale AI with Cerebras Systems
 
The Coming Age of Extreme Heterogeneity in HPC
The Coming Age of Extreme Heterogeneity in HPCThe Coming Age of Extreme Heterogeneity in HPC
The Coming Age of Extreme Heterogeneity in HPC
 
CORAL Fact Sheet
CORAL Fact SheetCORAL Fact Sheet
CORAL Fact Sheet
 
Sc10 slide share
Sc10 slide shareSc10 slide share
Sc10 slide share
 
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...
 
OpenPOWER/POWER9 AI webinar
OpenPOWER/POWER9 AI webinar OpenPOWER/POWER9 AI webinar
OpenPOWER/POWER9 AI webinar
 
CLIM Program: Remote Sensing Workshop, High Performance Computing and Spatial...
CLIM Program: Remote Sensing Workshop, High Performance Computing and Spatial...CLIM Program: Remote Sensing Workshop, High Performance Computing and Spatial...
CLIM Program: Remote Sensing Workshop, High Performance Computing and Spatial...
 

Similar to FPGA-accelerated High-Performance Computing – Close to Breakthrough or Pipedream?

BDW16 London - Ingrid Funie, Imperial College London - Machine Learning and F...
BDW16 London - Ingrid Funie, Imperial College London - Machine Learning and F...BDW16 London - Ingrid Funie, Imperial College London - Machine Learning and F...
BDW16 London - Ingrid Funie, Imperial College London - Machine Learning and F...Big Data Week
 
How HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental scienceHow HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental scienceinside-BigData.com
 
Panel: NRP Science Impacts​
Panel: NRP Science Impacts​Panel: NRP Science Impacts​
Panel: NRP Science Impacts​Larry Smarr
 
Development Trends of Next-Generation Supercomputers
Development Trends of Next-Generation SupercomputersDevelopment Trends of Next-Generation Supercomputers
Development Trends of Next-Generation Supercomputersinside-BigData.com
 
OpenNebula TechDay Boston 2015 - HA HPC with OpenNebula
OpenNebula TechDay Boston 2015 - HA HPC with OpenNebulaOpenNebula TechDay Boston 2015 - HA HPC with OpenNebula
OpenNebula TechDay Boston 2015 - HA HPC with OpenNebulaOpenNebula Project
 
Dell High-Performance Computing solutions: Enable innovations, outperform exp...
Dell High-Performance Computing solutions: Enable innovations, outperform exp...Dell High-Performance Computing solutions: Enable innovations, outperform exp...
Dell High-Performance Computing solutions: Enable innovations, outperform exp...Dell World
 
CK: from ad hoc computer engineering to collaborative and reproducible data s...
CK: from ad hoc computer engineering to collaborative and reproducible data s...CK: from ad hoc computer engineering to collaborative and reproducible data s...
CK: from ad hoc computer engineering to collaborative and reproducible data s...Grigori Fursin
 
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...confluent
 
Raspberry Pi Cluster Test Bed
Raspberry Pi Cluster Test BedRaspberry Pi Cluster Test Bed
Raspberry Pi Cluster Test BedBradford Bazemore
 
Pioneering and Democratizing Scalable HPC+AI at PSC
Pioneering and Democratizing Scalable HPC+AI at PSCPioneering and Democratizing Scalable HPC+AI at PSC
Pioneering and Democratizing Scalable HPC+AI at PSCinside-BigData.com
 
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...BigDataEverywhere
 
DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...
DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...
DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...zionsaint
 
Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...
Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...
Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...inside-BigData.com
 
From the Benchtop to the Datacenter: HPC Requirements in Life Science Research
From the Benchtop to the Datacenter: HPC Requirements in Life Science ResearchFrom the Benchtop to the Datacenter: HPC Requirements in Life Science Research
From the Benchtop to the Datacenter: HPC Requirements in Life Science ResearchAri Berman
 
OpenACC Monthly Highlights: January 2021
OpenACC Monthly Highlights: January 2021OpenACC Monthly Highlights: January 2021
OpenACC Monthly Highlights: January 2021OpenACC
 
HPC DAY 2017 | HPE Strategy And Portfolio for AI, BigData and HPC
HPC DAY 2017 | HPE Strategy And Portfolio for AI, BigData and HPCHPC DAY 2017 | HPE Strategy And Portfolio for AI, BigData and HPC
HPC DAY 2017 | HPE Strategy And Portfolio for AI, BigData and HPCHPC DAY
 

Similar to FPGA-accelerated High-Performance Computing – Close to Breakthrough or Pipedream? (20)

BDW16 London - Ingrid Funie, Imperial College London - Machine Learning and F...
BDW16 London - Ingrid Funie, Imperial College London - Machine Learning and F...BDW16 London - Ingrid Funie, Imperial College London - Machine Learning and F...
BDW16 London - Ingrid Funie, Imperial College London - Machine Learning and F...
 
Available HPC Resources at CSUC
Available HPC Resources at CSUCAvailable HPC Resources at CSUC
Available HPC Resources at CSUC
 
Dl2 computing gpu
Dl2 computing gpuDl2 computing gpu
Dl2 computing gpu
 
How HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental scienceHow HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental science
 
FPGA-enhanced Bioinformatics @ NECST
FPGA-enhanced Bioinformatics @ NECSTFPGA-enhanced Bioinformatics @ NECST
FPGA-enhanced Bioinformatics @ NECST
 
Panel: NRP Science Impacts​
Panel: NRP Science Impacts​Panel: NRP Science Impacts​
Panel: NRP Science Impacts​
 
Development Trends of Next-Generation Supercomputers
Development Trends of Next-Generation SupercomputersDevelopment Trends of Next-Generation Supercomputers
Development Trends of Next-Generation Supercomputers
 
OpenNebula TechDay Boston 2015 - HA HPC with OpenNebula
OpenNebula TechDay Boston 2015 - HA HPC with OpenNebulaOpenNebula TechDay Boston 2015 - HA HPC with OpenNebula
OpenNebula TechDay Boston 2015 - HA HPC with OpenNebula
 
Dell High-Performance Computing solutions: Enable innovations, outperform exp...
Dell High-Performance Computing solutions: Enable innovations, outperform exp...Dell High-Performance Computing solutions: Enable innovations, outperform exp...
Dell High-Performance Computing solutions: Enable innovations, outperform exp...
 
CK: from ad hoc computer engineering to collaborative and reproducible data s...
CK: from ad hoc computer engineering to collaborative and reproducible data s...CK: from ad hoc computer engineering to collaborative and reproducible data s...
CK: from ad hoc computer engineering to collaborative and reproducible data s...
 
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
 
Raspberry Pi Cluster Test Bed
Raspberry Pi Cluster Test BedRaspberry Pi Cluster Test Bed
Raspberry Pi Cluster Test Bed
 
Pioneering and Democratizing Scalable HPC+AI at PSC
Pioneering and Democratizing Scalable HPC+AI at PSCPioneering and Democratizing Scalable HPC+AI at PSC
Pioneering and Democratizing Scalable HPC+AI at PSC
 
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
 
DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...
DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...
DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...
 
Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...
Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...
Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...
 
SC13 Diary
SC13 DiarySC13 Diary
SC13 Diary
 
From the Benchtop to the Datacenter: HPC Requirements in Life Science Research
From the Benchtop to the Datacenter: HPC Requirements in Life Science ResearchFrom the Benchtop to the Datacenter: HPC Requirements in Life Science Research
From the Benchtop to the Datacenter: HPC Requirements in Life Science Research
 
OpenACC Monthly Highlights: January 2021
OpenACC Monthly Highlights: January 2021OpenACC Monthly Highlights: January 2021
OpenACC Monthly Highlights: January 2021
 
HPC DAY 2017 | HPE Strategy And Portfolio for AI, BigData and HPC
HPC DAY 2017 | HPE Strategy And Portfolio for AI, BigData and HPCHPC DAY 2017 | HPE Strategy And Portfolio for AI, BigData and HPC
HPC DAY 2017 | HPE Strategy And Portfolio for AI, BigData and HPC
 

Recently uploaded

“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfakmcokerachita
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...M56BOOKSTORE PRODUCT/SERVICE
 

Recently uploaded (20)

“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdf
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 

FPGA-accelerated High-Performance Computing – Close to Breakthrough or Pipedream?

  • 1. FPGA-accelerated High-Performance Computing Close to Breakthrough or Pipedream? Christian Plessl Paderborn Center for Parallel Computing & Dept. Computer Science Paderborn University, Germany 24 January 2018
  • 2. • HPC and Computational Science • Status of using FPGAs in HPC • FPGA-accelerated HPC in Paderborn – plans – lessons learned • Conclusions and call to action Outline
  • 7. • Use computers simulation to obtain scientific results • “Third paradigm” following experiment and theory • Advantages of computer experiments: – make predictions what will happen – perform experiments that would otherwise be impossible, too difficult, too dangerous – perfect reproducibility – can offer explanations why something happens 7 What is Computational Science?
  • 8. • Computational science penetration all fields – engineering – natural sciences – humanities • Growing processing demand – simulation – optimization – data intensive analytics • Computer are virtual instruments – microscopes, telescopes, chemistry labs, ... – improve exponentially in capabilities in contrast to their physical counterparts 8 Computational Science Drives HPC Demand images: UCLA, MPG
  • 9. 9 Which Sciences Are Using HPC? Paderborn Center for Parallel Computing Materials Science 30% Physics 24% Engineering 13% Earth Science 10% Chemistry 15% Biological Sciences 6% Computer Science 2% 2016 INCITE BY DOMAIN 3.57 BILLION CORE-HOURS Chemistry 9% Biological Sciences 1% Argonne Leadership Computing Facility Physics 33% Engineering 21% Chemistry 27% Computer Science 12% Economics 2% other 5% ca. 700M core-hours/year
  • 10. • Massively parallel computation across all levels – instruction, core, socket, rack • Power consumption has become a first-class concern – operating cost and and power supply – cooling infrastructure 10 HPC: Massive Scale and Challenges 1 4 16 64 256 1,024 4,096 16,384 65,536 262,144 1,048,576 4,194,304 16,777,216 06/93 06/94 06/95 06/96 06/97 06/98 06/99 06/00 06/01 06/02 06/03 06/04 06/05 06/06 06/07 06/08 06/09 06/10 06/11 06/12 06/13 06/14 06/15 06/16 06/17 Number of CPU cores in Top 500 Supercomputers Cores #1 Cores #100 Power #1 [kW] Power #100 [kW] Trend for rank 1 Trend for rank 100 data: Top500
  • 11. • Ambitious roadmaps for HPC – launch of Exascale computing projects in US, Europe, Japan around 2010 – objective: 1 ExaFLOP – less than 20 MW – by 2020 • Requires substantial improvements in the whole stack – processor architecture, network, programming models – system design, cooling • Efficient computing resources are more important than ever – a wealth of new and re-invented architectures – accelerators – heterogeneous computing 11 Quest for Energy-Efficient Computing a huge opportunity for reconfigurable computing f Cell FPGA GPU Manycore Vector- processor accelerators
  • 12. • Accelerators entered HPC a decade ago • Performance (Top 500, 11/2017) – 20% of systems use accelerators – 25%-35% of accumulated performance • Efficiency (Green 500, 11/2017) – most efficient systems use PEZY-SC or GPU accelerators 12 Accelerators on the Rise statistics: Top500.org
  • 13. • Breakdown of Top 500 11/2017 by accelerator type • Interesting observations 1. 80% of the systems do not use any accelerators 2. Only NVidia GPUs and Intel Xeon Phi gained traction 3. FPGAs are absent from the Top500 • Quick rise ≠ universal adoption – Why don’t we see much broader adoption of accelerators? – Stagnation or a matter of time? 13 A Different Take on the Same Data statistics: Top500.org
  • 14. FPGA 14 Overarching Questions and Motivation for this Talk 1. If accelerators – in particular FPGAs – are so great, why aren’t they in much wider use? 2. What can we do to change this situation?
  • 15. • Currently operational, larger scale general-purpose FPGA installations – CHREC U. Florida: Novo-G# – Hartree Center UK: Maxeler MPC-X cluster – Texas TACC: Catapult 1 and Intel HARP v2 cluster – Paderborn University: XCL + HARP v2 cluster • HPC Applications with FPGA support – no generally available, production-ready HPC codes – some proof of concept codes (e.g. Maxeler Application Gallery) – probably some integrated solutions/appliances (bioinformatics, cryptography) • HPC Libraries with FPGA support – nothing usable/maintained (not even FFT, BLAS, LAPACK) – announced: Intel and Xilinx acceleration libraries (mainly deep learning) 15 Maybe Top500 Is Too Narrow. Perform a Broader Search
  • 16. • Numerous publications show the potential of FPGAs for relevant HPC problems • Some examples – Linear algebra: CG solver for sparse linear equation systems [1] § 20-40x faster than CPU – Geophysics: 3D convolution [1] § 70x faster than CPU, 14x faster than GPU – Molecular dynamics [2] § 80x faster than NAMD (single core) CPU – Bioinformatics (BLAST) [3] § 5x faster than optimized, parallel CPU implementation – Climate modeling [4] § 4 FPGAs 19x faster than two socket CPU, 7x faster than GPU 16 Are FPGAs Not Promising for HPC? I Don’t Think So [1] O. Lindtjorn, R. G. Clapp, O. Pell, O. Mencer, M. J. Flynn, and H. Fu. Beyond traditional microprocessors for geoscience high-performance computing applications. IEEE Micro, Mar.–Apr. 2011. [2] M. Chiu and M. C. Herbordt. Molecular dynamics simulations on high-performance reconfigurable computing systems. ACM TRETS Nov. 2010. [3] A. Mahram, and M. C. Herbordt. NCBI BLASTP on High-Performance Reconfigurable Computing System. ACM TRETS Jan 2015. [4] L. Gan, H. Fu, W. Luk et. al. Solving the Global Atmospheric Equations through Heterogeneous Reconfigurable Platforms. ACM TRETS Mar. 2015
  • 17. • Hypothesis: clear value proposition is mandatory – no other affordable technology can satisfy the requirements – NRE: avoidance of cost for ASIC design – CAPEX, OPEX: reduction in investment and operating cost • Areas where FPGAs seem to have some commercial relevance – networking equipment (latency, NRE) – high frequency trading (latency, NRE) – bioinformatics (NRE, CAPEX) – cryptanalysis (CAPEX and OPEX) – defense / medical signal processing (space, power consumption) – deep learning inference (CAPEX and OPEX) • Viability of general purpose use (e.g. Amazon F1) so far unproven 17 Some Areas Where FPGAs Are Successfully Used
  • 18. So, where is the problem?
  • 19. • Since joining Paderborn University in 2007, I started to connect more intensely with the HPC community • The timing was good – HPC community was increasingly interested in computer architecture and accelerators – FPGAs were already known as hot technology for the future of HPC – some computational scientists were actually interested in collaboration • My naïve assumptions – HPC folks will be convinced of FPGAs once they see case studies – publishing the results at mainstream HPC conferences is in reach – FPGAs will soon be mainstream to have chances for an HPC faculty position 19 Pitching FPGAs Acceleration to HPC Audience (1)
  • 20. 20 Pitching FPGAs Acceleration to HPC Audience (2) collaboration with physicist ✓ problem that sounds somewhat important ✓ state-of-the-art CPUs and FPGAs ✓
  • 21. 21 Pitching FPGAs Acceleration to HPC Audience (3) algorithm simple enough to understand ✓
  • 22. 22 Pitching FPGAs Acceleration to HPC Audience (4) high-level hardware synthesis, no HDL (any computational scientist can do this) ✓
  • 23. 23 Pitching FPGAs Acceleration to HPC Audience (5) the synthesis result is not something weird, that can only be understood by electrical engineers ✓
  • 24. 24 Pitching FPGAs Acceleration to HPC Audience (6) CPU and FPGA use double precision arithmetic ✓ CPU implementation appears to be reasonably optimized: • multi-threaded ✓ • cache blocking ✓ • NUMA-aware memory allocation ✓ speedup is not stellar, but OK considering the strong CPU baseline ✓
  • 25. • The CPU performance baseline is too low – stencil codes can be much better optimized – code probably not vectorized • Optimization for FPGA insufficiently understood – what is the theoretical performance limit and bottlenecks (computation, memory, dependencies) – how can FPGAs ever win, if the DRAM is slower than for CPUs (lack of understanding of pipelining, streaming, ...) • Fear, uncertainty, doubt – is this work actually relevant for computational scientists? – can you train HPC developers to use FPGAs? – will the required investment in expensive FPGA hard and software pay off? 25 My Pitch Was Not Received as Well as Expected Performance of FPGA ~16 DP computations / update → 1000 MCell/s = 16 GFLOPS Peak Performance CPU 2 sockets * 4 cores * 2.5GHz * 1 – 8 FLOPS = 20 – 160 GFLOPS
  • 26. • HPC developers are constantly told exciting stories – this technology is the future: Itanium, Cell, BlueGene, Xeon Phi – the compiler will handle the complexity for you • No user cares for energy efficiency – only infrastructure providers do • User care for ease-of-use and protection of their investments – many codes are gigantic, countless person years investment – there are plenty of free computing resources available for academics • Benefits of the new technology are not convincingly presented – proof-of-concept case study, no real-state of the art problems – improvement in metrics not relevant for target users (method vs. insight-driven research) 26 What was Going Wrong? TRUST US ✓
  • 27. • The pitfalls FPGA acceleration in HPC are currently not widely acknowledged, discussed and understood • Interesting position paper published 2009 in ACM TRETS – Premise: FPGAs show lots of promise but lack acceptance in general-purpose HPC installations – Proposed 12 areas where researcher need to make contributions to increase acceptance of FPGAs in HPC • Many observations and conclusions still apply today 27 Pitfalls of HPC Acceleration for HPC
  • 28. The Long Road to Production Reconfigurable Supercomputing · 26: 11 Table I. The State of FPGA Research Toward HPC Area Status Activity Difficulty Step 1: Standardization poor moderate low Step 2: High Performance Forward Portability poor low high Step 3: Enhanced Device Performance good low high Step 4: Enhanced System Architecture fair none moderate Step 5: Simplified Library Usage fair low low Step 6: Concurrent APIs poor low low Step 7: Better Performance Studies fair moderate moderate Step 8: Improved Programming Environment good high high Step 9: Improved Infrastructure poor low moderate Step 10: Enhanced Communications good moderate moderate Step 11: Enhanced Reliability poor low high Step 12: Provide OS Support poor low low objective viewpoint that only it can. Step 2 (high performance forward porta- bility) is in similarly bad shape, and is a much harder problem. There is rela- Critical Areas Identified by Underwood et al. 28
  • 29. Getting to the Core of the Problem 29 Accelerator research stands in striking contrast to high performance computing and general microprocessor optimization work. In the latter, optimization work often goes into widely available libraries (e.g. ATLAS and FFTW). In contrast, accelerator research tends to be a single proof of concept effort that never makes it outside the lab – despite the fact that it targets widely used core algorithms. [..] It is time for accelerator researchers to invest the extra effort and make their work applicable. [Underwood et. al 2009]
  • 30. Intuitive Assessment of the Progress We Made Since 2009 30 The Long Road to Production Reconfigurable Supercomputing · 26: 11 Table I. The State of FPGA Research Toward HPC Area Status Activity Difficulty Step 1: Standardization poor moderate low Step 2: High Performance Forward Portability poor low high Step 3: Enhanced Device Performance good low high Step 4: Enhanced System Architecture fair none moderate Step 5: Simplified Library Usage fair low low Step 6: Concurrent APIs poor low low Step 7: Better Performance Studies fair moderate moderate Step 8: Improved Programming Environment good high high Step 9: Improved Infrastructure poor low moderate Step 10: Enhanced Communications good moderate moderate Step 11: Enhanced Reliability poor low high Step 12: Provide OS Support poor low low objective viewpoint that only it can. Step 2 (high performance forward porta- bility) is in similarly bad shape, and is a much harder problem. There is rela-
  • 31. • The time of the free lunch for performance is over – GPUs have paved the way for application modifications – previously the code was assumed to be sacred and untouchable • Energy efficiency has become a pressing issue – opens up another dimension for competition • There is finally a “killer app” – inference for deep neural networks – FPGAs ride the AI hype-wave • Cloud and data center players make massive investments in FPGAs – Altera acquisition by Intel, IBM/Xilinx partnership – use of FPGAs in clouds of Microsoft, Amazon, Baidu, IBM, Huawei, etc. – the overall ecosystem will profit from this 31 Changes in Ecosystem Since Underwood’s Assessment
  • 32. • OpenCL High-Level Synthesis design flows – language capable of specifying many aspects relevant for FPGAs – standardized and used in other contexts too – supports easier design space exploration – abstracts from FPGA board, memory channels, PCIe interfaces • Highly capable FPGA devices – vast amounts of DSP blocks – suitable bit widths for implementing floating point arithmetic • Steps towards better system integration – shared and coherent global memory access 32 Technological Progress Since Underwood’s Assessment HPC-relevant Intel Stratix 10 features • 5.5 M LE • 28 MB block RAM • 10 TFLOPS single-precision floating- point performance • 80 GFLOPS/W (best Green500 system achieves 17 DP GFLOPS/W) • hardened PCIe x16 • hardened memory controllers for DDR4 • up to 96 transceivers
  • 33. • Longstanding experience with FPGAs for HPC • Current FPGA infrastructure – two testbed clusters for public use – additional FPGA systems from most major vendors 33 HPC with FPGAs at Paderborn University System Inst CPU FPGA Toolflow Properties Convey HC-1 2010 Xeon 5138 4x Xilinx Virtex-5 LX 330 HDL + vector processor overlay CPU and FPGA connected via FSB, cache- coherent NUMA architecture Maxeler MPC-C 2012 Xeon X5660 4x Xilinx Virtex-6 SX475T MaxJ data flow language 4 PCIe boards, MaxRing interconnect Nallatech 385A 2016 Xeon E5- 1260v2 Intel/Altera Arria 10 GX1150 Intel OpenCL Nallatech 385A FPGA card IBM S812L 2016 POWER8 10-cores Xilinx Virtex-7 VX690T Xilinx OpenCL AlphaData PCIe FPGA board (ADM-PCIE- 7V3) Micron Workstation 2016 Intel i7- 5930K Xilinx Kintex-7 UltrascaleKU115 Xilinx OpenCL Pico AC-510 FPGA board with Hybrid- memory cube XCL cluster 2017 Xeon E5- 1630v4 Xilinx Virtex-7 VX690T + Xilinx Kintex Ultrascale KU115 Xilinx OpenCL 8-node cluster with 2 FPGA cards per node (AlphaData ADM-PCIE-7V3 and ADM-PCIE- 8K5) HARP cluster 2017 Xeon E5-v4 Intel BDW+FPGA hybrid CPU/FPGA Intel OpenCL, HDL 10-node cluster with 1 BDW+FPGA processor per node
  • 34. • Recently acquired funding for next generation HPC system – 10M€ HPC system + 15M€ data center building • FPGAs play a strategic role our HPC investment – exploration of FPGAs in HPC – port libraries and real scientific applications to FPGAs – work on parallel FPGA implementations (MPI, PGAS) – study performance and energy trade-offs • Investment complemented by research, development and support efforts – infrastructure accessible for free for researchers in Germany – international collaborations possible and desired, negotiated on case-by-case basis 34 HPC with FPGAs at Paderborn University (2)
  • 35. • Idea: Build experience for production system with FPGA testbed clusters – building a cluster from components proved far more difficult than ever expected – lot of effort from technicians, admins, and researchers • FPGA hard- and software stacks are not ready for primetime yet • Main difficulties – poor onboarding experience – fragility of firmware, drivers and software stack – available management tools not suitable for multi-user HPC environment – security implications poorly understood • Conclusion: we will procure the production FPGA systems as validated solutions from major HPC vendors 35 War Stories and Challenges (1)
  • 36. • Poor onboarding experience – hardly anything works out of the gate when installing FPGA card in server – outdated and incorrect administrator guides – typical admins are not able to cope with the technology, lack of good self-diagnostics • Fragility of firmware, driver and software stack – reliance of very specific (sometimes patched) OS versions – intermingling of HLS flows, backend tools and BSPs – unstable drivers (crashes, deadlocks, corruption of data/configuration) – in-field firmware upgrades not always possible, take too long or cannot be automated 36 War Stories and Challenges (2)
  • 37. • Available management tools not suitable for multi-user HPC environment – no best practices to support applications relying on specific BSP-variants, driver versions, etc. – no best practices/capabilities for automated firmware provisioning in cluster and workload management systems – static partitioning of FPGA into subsets per firmware (OpenCL / HDL and different tool releases) leads to inacceptable resource fragmentation • Security implications poorly understood – ecosystem does not systematically consider multi-user scenario – FPGA and board vendors are not confident asserting security properties of BSPs – shared memory without memory protection opens the gates for evil (cache coherent CPU+FPGA, PCIe bus master) – OpenCL BSPs are delivered by vendors, no possibility to verify correctness and security – too many ways to crash or lock up a machine (denial of service) 37 War Stories and Challenges (3)
  • 38. • The future is bright – FPGAs can deliver attractive solutions for HPC and data center workloads – we have the most capable FPGA silicon we ever had – HLS tools can not only deliver increased productivity but also competitive results for increasing number of domains – there finally is a “killer application” for FPGAs – serious investments and commitment to FPGAs from suppliers and hyperscale data centers • There is still substantial groundwork to do – improve stability of software and hardware stack – address needs of multi-user environment (security, backward compatibility, automated provisioning of BSPs) – better support for HPC languages and libraries (Fortran, OpenMP, OpenACC, MPI) • The needs of data center applications will hopefully move the whole field along 38 Conclusions
  • 39. • Perform fair comparisons – no overblown claims, use strong and optimized baselines – equivalent hardware generations • Break out of the case studies dilemma – target actual scientific codes rather than extracted kernels – use relevant problem sizes and test data – aim for generic designs that can handle broad range of problems – target multi-FPGA implementation • Spread the word – connect with the HPC community and present your results – release the results as open-source • Join us in this effort! 39 Call to Action christian.plessl@uni-paderborn.de https://pc2.uni-paderborn.de Twitter: @plessl // @pc2_upb