Presentation given by Tobias Becker (Maxeler) at the LEGaTO Final Event: Low-Energy Heterogeneous Computing Workshop on 4 September 2020. This event was co-located with FPL 2020.

- 1. Infection Research with Maxeler Dataflow Computing
  • Tobias Becker, Maxeler Technologies
  • LEGaTO Final Event: Low-Energy Heterogeneous Computing Workshop, 9 September 2020
  • The LEGaTO project has received funding from the European Union's Horizon 2020 research and innovation programme under the grant agreement No 780681
- 2. Introduction – Biomarkers Analysis
  • Biomarkers can serve many purposes, including:
    − screening for early signs of disease, or monitoring its progression
    − monitoring the effects of treatments
    − studying organ function
  • Various data types:
    − measurable molecules found in body tissues, cells and fluids
    − MRI
    − function tests
    − heart rate, blood pressure
  • Modern high-throughput technologies can produce lots of data: microarray, next-gen sequencing, mass spectroscopy
  • Goal: use statistical methods to research the effectiveness of drugs, vaccination strategies and the harmfulness of pathogens
- 3. Computational aspects
  • High-throughput data from biological experiments are gathered into vectors whose dimension corresponds to the number of measurements
  • For statistical analysis, the number of samples (n) must be larger than the number of biomarker candidates (p) to achieve good classification performance and a reliable diagnosis
  • Obtaining samples is expensive
  • Basis: pilot studies
    − Small sample size but large number of candidates ("short fat data")
    − Distinguishing real correlations from random correlations
  • Strategy:
    − Select the top features from thousands that can predict the cases
    − Estimate the appropriate sample size to get significant results
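Why random correlations are a concern when p is much larger than n can be seen from a back-of-the-envelope count. This is only a sketch: it assumes p candidates with no real effect, each tested at a significance level α, and both the symbol α and the numbers are illustrative assumptions that do not appear in the slides.

```latex
\mathbb{E}[\text{chance hits}] = p \cdot \alpha,
\qquad
p = 10\,000,\ \alpha = 0.01
\;\Rightarrow\;
\mathbb{E}[\text{chance hits}] = 100
```

Under these assumptions, around 100 candidates would look "good" by chance alone, so a pilot study is only informative if it yields clearly more good candidates than that.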
- 4. Algorithm: Biomarker Candidate Evaluation
  • Flowchart:
    − Pilot study with small sample size
    − Global evaluation of single biomarkers
    − More "good" single biomarkers than expected?
      no: Discard/redesign study
      yes: Compute potential of biomarker combinations & estimate required sample size to validate combinations
  • Evaluate pilot study (HiPerMAb), diagram labels: Real Data, Random Data, Compute goodness of BCs (Entropy), Estimate p-values, Maxeler DFE
  • Implemented in R
  • Time consuming: runs for hours on typical datasets
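The "Random Data" and "Estimate p-values" boxes suggest a Monte Carlo style comparison of the real-data score against scores obtained on random data. The following is a minimal sketch of that general idea, not the HiPerMAb implementation: the function name `empirical_p_value`, the "larger score is better" convention and the +1 correction are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Empirical (Monte Carlo) p-value: the fraction of random-data scores that
 * are at least as good as the score observed on the real data.
 * Assumption: a larger score means a better biomarker candidate.
 * The +1 in numerator and denominator counts the observed score itself,
 * so the estimate is never exactly zero. */
double empirical_p_value(double observed, const double *random_scores, size_t m)
{
    assert(random_scores != NULL || m == 0);
    size_t at_least_as_good = 0;
    for (size_t i = 0; i < m; i++) {
        if (random_scores[i] >= observed) {
            at_least_as_good++;
        }
    }
    return (double)(at_least_as_good + 1) / (double)(m + 1);
}
```

With m random data sets the smallest p-value this estimator can report is 1/(m+1), which is one reason such evaluations need many random repetitions and become time consuming.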
- 5. Optimisation
  • Goals:
    − Decrease simulation runtime
    − Enable handling of larger data sets and more complex problems
    − Decrease energy consumption
  • Target:
    − Maxeler dataflow computer with MAX5 DFE
    − Also works on Xilinx Alveo and Amazon EC2 F1
  • Optimisation plan:
    − Manual: port R code to C and MaxJ
    − Optimise using LEGaTO toolflow: port to OmpSs and MaxJ
- 6. MaxCompiler: Application Development in MaxJ
  • Diagram labels: CPU, Main Memory, CPU Code, SLiC, MaxelerOS, PCI Express, Manager, DFE Memory; kernel dataflow graph computing y = x * x + 30
  • CPU Code: for (int i = 0; i < DATA_SIZE; i++) y[i] = x[i] * x[i] + 30;
  • Host Code (.c): int *x, *y; MyKernel(DATA_SIZE, x, DATA_SIZE*4, y, DATA_SIZE*4);
  • MyManager (.maxj): Manager m = new Manager(); Kernel k = new MyKernel(); m.setKernel(k); m.setIO(link("x", CPU), link("y", CPU)); m.build();
  • MyKernel (.maxj): DFEVar x = io.input("x", dfeInt(32)); DFEVar result = x * x + 30; io.output("y", result, dfeInt(32));
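When porting a loop like this to a DFE, it can help to keep a plain CPU version around to check the accelerator output against. The sketch below is one way to do that. The names `my_kernel_reference` and `first_mismatch` are hypothetical and not part of MaxCompiler or SLiC; the argument list simply mirrors the `MyKernel(...)` call shown on the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU stand-in for the DFE call on the slide. It takes the same
 * argument list (element count, input buffer and its size in bytes, output
 * buffer and its size in bytes) but runs the original CPU loop.
 * Assumption: x[i] * x[i] + 30 fits in 32 bits for every input. */
void my_kernel_reference(int n, const int32_t *x, size_t x_bytes,
                         int32_t *y, size_t y_bytes)
{
    assert(n >= 0);
    assert(x_bytes == (size_t)n * sizeof(int32_t));
    assert(y_bytes == (size_t)n * sizeof(int32_t));
    for (int i = 0; i < n; i++) {
        y[i] = x[i] * x[i] + 30;
    }
}

/* Returns the index of the first mismatch between two result buffers,
 * or -1 if they agree; intended for comparing DFE output to the reference. */
int first_mismatch(const int32_t *expected, const int32_t *actual, int n)
{
    for (int i = 0; i < n; i++) {
        if (expected[i] != actual[i]) {
            return i;
        }
    }
    return -1;
}
```

In a real port, the buffer filled by the generated `MyKernel` call would be passed as `actual` and the buffer filled by `my_kernel_reference` as `expected`.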
- 7. Practical Acceleration Process in MaxJ
  • Focus profiling on loop structure, not function calls
  • Example profile output:
    init_run     :    66.92s CPU    66.92s WALL (     1 calls)
    electrons    :  3116.79s CPU  3116.79s WALL (     1 calls)
    Called by electrons:
    v_of_rho     :    17.91s CPU    17.91s WALL (    51 calls)
    c_bands      :  2305.58s CPU  2305.58s WALL (    50 calls)
    sum_band     :   597.16s CPU   597.16s WALL (    50 calls)
    mix_rho      :     2.80s CPU     2.80s WALL (    50 calls)
    Called by c_bands:
    h_psi        :   962.63s CPU   962.63s WALL (   684 calls)
    g_psi        :    25.24s CPU    25.24s WALL (   582 calls)
    cdiaghg      :   666.02s CPU   666.02s WALL (   682 calls)
    Called by h_psi:
    add_vuspsi   :    79.00s CPU    79.00s WALL (   684 calls)
    General routines
    calbec       :    56.10s CPU    56.10s WALL (   684 calls)
    fft          :     3.30s CPU     3.30s WALL (   560 calls)
    fftw         :  1156.55s CPU  1156.50s WALL ( 97578 calls)
- 8. Task-based kernel identification for DFE mapping
  • OmpSs identifies "static" task graphs while running
  • Annotation of IO and compute helps to create the DFE task model
  • Instantiate static, customized, ultra-deep (>1,000 stages) computing pipelines
- 9. LEGaTO tools for DFE mapping
  • Diagram labels: task graph, loopflow graph
  • Automatically generate loopflow graph
  • Annotate with key information
- 10. Results
  • Application accelerated with Maxeler DFE
    − Accelerated Cut Index function (entropy calculation) in Biomarker Evaluation
    − Added hardware random number generator to Cut Index accelerator
    − Evaluated 10^6 problem size on Jülich testbed
    − Validated new toolflow and results

  |                 | Original R code | C++ Cut Index | DFE Cut Index |
  |-----------------|-----------------|---------------|---------------|
  | Time [s]        | 4514            | 74.8          | 5.49          |
  | Speedup         | 1 (baseline)    | 60            | 822           |
  | Energy [Wh]     | 575             | 9.7           | 1.3           |
  | Efficiency gain | 1 (baseline)    | 59            | 443           |
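The speedup and efficiency-gain figures are ratios of the R baseline to each optimised version. Recomputing them from the rounded values reported on the slide gives 4514 / 74.8 ≈ 60.3, 4514 / 5.49 ≈ 822.2, 575 / 9.7 ≈ 59.3 and 575 / 1.3 ≈ 442.3. The last is slightly below the reported 443, presumably because the 1.3 Wh figure is itself rounded (an assumption; the slides do not say). A small helper for this arithmetic, with the hypothetical name `improvement_factor`, might look like this:

```c
#include <assert.h>

/* Improvement factor of an optimised version over the baseline:
 * baseline / optimised, for any "smaller is better" metric
 * (runtime in seconds, energy in watt-hours). */
double improvement_factor(double baseline, double optimised)
{
    assert(baseline > 0.0 && optimised > 0.0);
    return baseline / optimised;
}
```

For example, `improvement_factor(4514.0, 5.49)` gives the DFE speedup over the original R code.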
- 11. Thanks!