The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111.
Heterogeneous HPC Computing in the DeepHealth Project
José Flich (UPV)
Mónica Caballero (everis)
European Big Data Value Forum (EBDVF) 2019
15 October 2019, Helsinki
About DeepHealth
Aim & Goals
• Facilitate the daily work and increase the productivity of medical personnel and IT professionals in terms of image processing and the use and training of predictive models without the need of combining numerous tools.
• Offer a unified framework adapted to exploit underlying heterogeneous HPC and Big Data architectures supporting state-of-the-art and next-generation Deep Learning (AI) and Computer Vision algorithms to enhance European-based medical software platforms.
• Put HPC computing power at the service of biomedical applications with DL needs and, through an interdisciplinary approach, apply DL techniques on large and complex image biomedical datasets to support new and more efficient ways of diagnosis, monitoring and treatment of diseases.
Duration: 36 months
Starting date: Jan 2019
Budget: 14.642.366 €
EU funding: 12.774.824 €
21 partners from 9 countries: research centers, health organizations, large industries and SMEs
About DeepHealth
• The DeepHealth toolkit: Free and open-source software with two core technology libraries and a dedicated front-end.
• EDDLL: The European Distributed Deep Learning Library
• ECVL: the European Computer Vision Library
• Ready to run algorithms on Hybrid HPC + Big Data architectures with heterogeneous hardware
• Seven biomedical and AI software platforms will integrate the DeepHealth libraries to improve their potential.
Use-cases
• 14 pilot test-beds in 3 areas:
• Neurological diseases
• Tumor detection and early cancer prediction
• Digital pathology and automated image annotation.
• Pilots will make it possible to train models and evaluate the performance of the proposed solutions in terms of time and accuracy.
Expected results
DeepHealth HPC Goals
DeepHealth Goals
• Develop a European Distributed Deep-Learning Library (EDDL)
• Develop a European Computer Vision Library (ECVL)
• Adapt EDDL/ECVL to HPC infrastructure
• Heterogeneous Architectures
• Apply the EDDL/ECVL to 7 European Platforms for Medical applications
• Apply the DeepHealth solution to 14 use cases (pilots) for medical diagnosis
(Phases labelled on the slide: development, adaptation, use)
HPC Goals and Related Challenges
• Adapt EDDL and ECVL libraries to HPC infrastructure
• Computation
• CPUs, GPUs, FPGAs
• Communication
• Distribution of training process
• KPI
• 4X performance improvement and 7X better power efficiency for the target DeepHealth infrastructure with advanced HPC technologies (combining manycores with vector units, GPUs, FPGAs, and low-latency interconnects) compared to standard HPC infrastructure
Challenges
At different levels
[Figure: stack diagram with use cases, platforms, the EDDL and ECVL libraries, and heterogeneous HPC resources (CPUs, GPUs, FPGAs, interconnect); the elements carry numbered markers 1 to 4]
• Develop EDDL/ECVL
• Adapt Platforms
• Adapt Use Cases
• Adapt HPC
• computation, runtime, distribution, interconnect
Implementation Challenge:
Adapting new libraries (for performance)
as they are being implemented and tested
Types of Systems
Heterogeneity support
[Figure: example system layouts, each a set of nodes joined by an interconnect; the nodes range from CPU-only to combinations of CPUs with GPUs and FPGAs]
DeepHealth HPC Goals
• Reinvest in FET-HPC projects (MANGO)
• Large FPGA cluster for heterogeneous HPC Exploration
Target HPC Systems
MareNostrum 4
Total peak performance: 13.7 Pflops
• General Purpose Cluster: 11.15 Pflops (1.07.2017)
• CTE1-P9+Volta: 1.57 Pflops (1.03.2018)
• CTE2-Arm V8: 0.5 Pflops (????)
• CTE3-KNH?: 0.5 Pflops (????)
MareNostrum generations
• MareNostrum 1: 2004, 42.3 Tflops, 1st Europe / 4th World, new technologies
• MareNostrum 2: 2006, 94.2 Tflops, 1st Europe / 5th World, new technologies
• MareNostrum 3: 2012, 1.1 Pflops, 12th Europe / 36th World
• MareNostrum 4: 2017, 11.1 Pflops, 2nd Europe / 13th World, new technologies
BSC HPC Infrastructures
• General Purpose Cluster (in production)
• 48 racks with 3456 nodes, each with 2 Intel Xeon Platinum processors
• Total of 11.15 PFLOPs in double precision
• System with a total of 165,888 processor cores and 390 TB of main memory
• 29th fastest supercomputer in the TOP500, 7th fastest supercomputer in Europe
• CTE1-P9+VOLTA (in production)
• 54 nodes, each with 2 POWER9 proc., 4 Volta GPUs, 6.4TB NVMe
• Total of 1.57 PFLOPs in Double Precision
• Same node as the Sierra supercomputer at LLNL (2nd fastest supercomputer in the TOP500)
• Suitable for HPC and Machine Learning workloads
BSC HPC Infrastructures
• CTE2-Arm v8 (to be deployed in 2020)
• Same processor as in the future post-K supercomputer in Japan
• Targets Exascale workloads: 2.7 TFLOPS in double precision, 5.4 TFLOPS in single precision, 10.8 TFLOPS in half precision (16 bits)
• HPC and AI convergence: up to 21.6 TOPS in 8-bit int precision
• 7nm technology; 48 cores; 4 stacks of 8GB HBM2 (total of 32GB)
• Novel 512-bit SVE ext. with specific instructions for machine learning
• Might be interesting as a cutting edge system by the end of DeepHealth
• Mont-Blanc 3 prototype (in production)
• 48 nodes, 2 processors/node (96 processors in total)
• Cavium ThunderX2 processor: 32-core Arm v8, 4-way SMT, up to 2.5 GHz
• Targets HPC workloads in datacenters
• System with up to 3K cores and 12K threads
• Liquid cooling
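The CTE2-Arm v8 throughput figures quoted above double each time the element width halves. A minimal sketch of that arithmetic follows, assuming throughput is proportional to the number of lanes in one 512-bit SVE vector (this proportionality is an assumption made here, not a statement from the slide). It also checks the Mont-Blanc 3 core and thread counts from the node description.

```python
VECTOR_BITS = 512    # SVE vector width quoted on the slide
FP64_TFLOPS = 2.7    # double-precision figure quoted for CTE2-Arm v8

def lanes(element_bits):
    """Number of elements that fit in one vector register."""
    return VECTOR_BITS // element_bits

def scaled_throughput(element_bits):
    # Assumption: throughput grows linearly with lanes per vector.
    return FP64_TFLOPS * lanes(element_bits) / lanes(64)

for name, bits in [("FP64", 64), ("FP32", 32), ("FP16", 16), ("INT8", 8)]:
    print(f"{name}: {lanes(bits)} lanes, {scaled_throughput(bits):.1f} T(FL)OPS")

# Mont-Blanc 3 prototype: 48 nodes x 2 processors x 32 cores, 4-way SMT.
cores = 48 * 2 * 32
threads = cores * 4
print(cores, threads)
```

The results line up with the slide's 5.4, 10.8 and 21.6 figures and with its rounded "3K cores and 12K threads".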
MANGO prototype
From FET-HPC MANGO project
• 16 (interconnected) clusters, each with
• One Server node
• 12 FPGAs (lego system)
• Xilinx 7–series, Zynq-7000, Kintex Ultrascale+
• Intel Stratix-10
• DDR3, DDR4 pluggable memory modules
• Connections: PCI Express Gen 2/3 lanes, 40 Gbps QSFP
[Figure: the prototype and one cluster]
PROD: Development of a customized FPGA-based PCIe Board
• Based on latest Intel or Xilinx FPGA technology (TBD)
• High-bandwidth, low-latency PCIe interface for data exchange with host
• Modular peripherals (memories, interfaces) - TBD
The DeepHealth Computing Infrastructure
Overview
[Figure: layered diagram of the DeepHealth SW architecture on top of the HPC and cloud HW resources; the labels recovered from it are listed below]
DeepHealth SW Architecture: programming models and access methods for EDDLL and ECVL development
• API provided to ECVL and EDDLL developers (WP2/WP3)
• Non-functional requirements description
• Distributed Programming Model (e.g., M/R, task-based): COMPSs
• Global Resource Manager (Slurm-based)
• Multiple Workloads Scheduling and Single Workload Scheduling, with an EDDLL workload (e.g., training) and an EDDL workload (e.g., inference) as examples
• Parallel Programming Models (e.g., CUDA, OpenCL, OpenMP) and Parallel Run-time
• Netlist Partitioning, Vivado tools, N2D2 framework, Mango Run-time
• Cloud API, OpenStack platform
• Layer labels: Container-based, (Parallel) Programming Models, HW
The DeepHealth computing infrastructure including HPC and big-data cloud-based resources
• HW resources (DeepHealth HPC HW Resources and DeepHealth Cloud HW Resources): MareNostrum 4 (Intel), Arm ThunderX2, POWER9+Voltas Cluster, Mango Cluster, tailored FPGA PCIe card, 1200 cores cluster (x86), Private Cloud (x86+NVIDIA T4), Private (NVIDIA) + Public Cloud
• Partners named on the figure: BSC, UNITO, PROD, UPV, TREE
COMPSs
• Framework (programming model + runtime system) to develop parallel applications for distributed infrastructures
• Abstract model: exposes parallelism while hiding the infrastructure
• Agnostic of computing platform
• Task-based programming model built on top of general-purpose sequential programming languages (Python, C, C++, Java)
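Sequential code:
def display(c):
    …
def add(a, b, c):
    c = a + b
for i in range(MSIZE):
    add(A[i], B[i], C[i])
display(C)
With COMPSs task annotations:
from pycompss.api.task import task
from pycompss.api.parameter import IN, INOUT, OUT
# c is read and written, so display waits for the tasks that produce it
@task(c=INOUT)
def display(c):
    …
# a and b are inputs, c is the output: this is what exposes the dependencies
@task(a=IN, b=IN, c=OUT)
def add(a, b, c):
    c = a + b  # schematic: a real task must update c in place
# each call is submitted as an asynchronous task
for i in range(MSIZE):
    add(A[i], B[i], C[i])
display(C)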
[Figure: task graph with MSIZE add tasks followed by one display task]
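The annotated listing needs the COMPSs runtime to execute. As a rough stand-in that runs with the Python standard library alone, the sketch below reproduces the same dependency pattern (MSIZE independent add tasks, then one display task that waits for all of them) using `concurrent.futures`. It is not COMPSs and does no distributed scheduling; the values of `MSIZE`, `A` and `B`, and the choice to return results instead of writing to an output parameter, are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

MSIZE = 4                 # hypothetical problem size
A = [1, 2, 3, 4]
B = [10, 20, 30, 40]

def add(a, b):
    # One independent task: shares no state with the other add calls.
    return a + b

def display(c):
    # Consumes every add result, so it can only run after all of them.
    text = " ".join(str(x) for x in c)
    print(text)
    return text

with ThreadPoolExecutor(max_workers=MSIZE) as pool:
    # submit() returns immediately; the add tasks may run concurrently.
    futures = [pool.submit(add, A[i], B[i]) for i in range(MSIZE)]
    # Collecting the results is the synchronization point before display.
    C = [f.result() for f in futures]

shown = display(C)
```

In COMPSs the runtime derives this ordering from the parameter directions, so the programmer does not write the explicit wait.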
EPFL: Multi-objective RM policies
• Power/performance/accuracy-aware runtime resource management policies
• Automatic selection of the most efficient resources
• Adding one new axis: accuracy!
• Heuristics, ML-based and hyper-heuristic RM policies (algorithms)
• Single-node: selection of accelerators (allocation), DVFS settings
• Multiple nodes (Global RM of MANGO)
• Integrated with DeepHealth SW stack
• MANGO API + COMPSs + Slurm
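One common way to handle several objectives at once, including the added accuracy axis, is Pareto dominance: a configuration is kept only if no other one is at least as good on every axis and strictly better on one. The sketch below is an illustration of that idea, not the EPFL policies themselves; the `Config` type, the configuration names and all measurements are invented.

```python
from typing import NamedTuple

class Config(NamedTuple):
    name: str
    power_w: float    # lower is better
    time_s: float     # lower is better
    accuracy: float   # higher is better

def dominates(x, y):
    """True if x is no worse than y on every axis and better on one."""
    no_worse = (x.power_w <= y.power_w and x.time_s <= y.time_s
                and x.accuracy >= y.accuracy)
    better = (x.power_w < y.power_w or x.time_s < y.time_s
              or x.accuracy > y.accuracy)
    return no_worse and better

def pareto_front(configs):
    """Keep the configurations that no other configuration dominates."""
    return [c for c in configs
            if not any(dominates(o, c) for o in configs)]

# Invented measurements, for illustration only.
candidates = [
    Config("cpu-fp32", 150.0, 40.0, 0.92),
    Config("gpu-fp32", 250.0, 10.0, 0.92),
    Config("fpga-int8", 40.0, 25.0, 0.89),
    Config("gpu-fp32-alt", 260.0, 12.0, 0.92),
]
front = pareto_front(candidates)
print([c.name for c in front])
```

A runtime policy would then pick one point from the front according to the current power budget or accuracy requirement.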
Data Parallelism
• Training batch distribution
• Gradient collection and weights distribution
• AllReduce, Broadcast support to be exploited
• Different strategies will be implemented and evaluated
• Synchronization primitives (relaxed models)
[Figure: CPU, GPU and FPGA nodes joined by an interconnect, annotated "High pressure on the interconnect"]
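The bullets on this slide describe synchronous data-parallel training: the batch is split across workers, each worker computes a local gradient, the gradients are averaged (AllReduce), and every replica applies the same update. A minimal single-process sketch of that loop is given below, assuming a two-parameter linear model and equal-sized shards. The function names and the synthetic data are hypothetical, and the workers are simulated with a list comprehension instead of real devices.

```python
def local_gradient(w, shard):
    """Mean-squared-error gradient of y ~ w[0]*x + w[1] on one shard."""
    g0 = g1 = 0.0
    for x, y in shard:
        err = w[0] * x + w[1] - y
        g0 += 2 * err * x
        g1 += 2 * err
    n = len(shard)
    return [g0 / n, g1 / n]

def allreduce_mean(grads):
    """Stand-in for AllReduce: every worker ends up with the average."""
    k = len(grads)
    return [sum(g[i] for g in grads) / k for i in range(len(grads[0]))]

def make_shards(batch, workers):
    size = len(batch) // workers   # equal shards keep the average exact
    return [batch[i * size:(i + 1) * size] for i in range(workers)]

def train(batch, workers=4, steps=200, lr=0.4):
    w = [0.0, 0.0]                 # weights replicated on every worker
    shards = make_shards(batch, workers)
    for _ in range(steps):
        grads = [local_gradient(w, s) for s in shards]  # parallel in practice
        g = allreduce_mean(grads)
        w = [wi - lr * gi for wi, gi in zip(w, g)]      # same update everywhere
    return w

# Synthetic data drawn from y = 3x + 1: 8 samples spread over 4 workers.
batch = [(x / 4.0, 3 * (x / 4.0) + 1) for x in range(8)]
w = train(batch)
print([round(v, 3) for v in w])
```

With equal shards the averaged gradient matches the full-batch gradient, so the cost of distribution is the AllReduce traffic in every step, which is the pressure on the interconnect that the figure points at.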
Netlist partitioning (CEA)
• Use a multi-FPGA platform as a single virtual large FPGA
• For very large inference networks that do not fit into a single FPGA
• Direct IO-to-IO connection between FPGAs
• Optimized partitioning of the netlist into several netlists
• Combinatorial optimization model, taking into account critical paths and resource quantities in each FPGA
• Several state-of-the-art optimization methods, from Kernighan-Lin to simulated annealing
• Execution of the design on the multi-FPGA platform
• Multiplexing of signals to deal with the limited interconnection
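To give a feel for the partitioning step, the sketch below applies simulated annealing (one of the methods the slide names) to a toy problem: split a small graph of cells across two FPGAs so that few connections cross between them. It is a simplification and not the CEA tool: the objective counts cut edges only and ignores critical paths and per-FPGA resource limits, and the graph, the cooling schedule and every parameter value are invented.

```python
import math
import random

def cut_size(edges, side):
    """Number of connections whose endpoints sit on different FPGAs."""
    return sum(1 for u, v in edges if side[u] != side[v])

def anneal_bipartition(n_nodes, edges, steps=2000, t0=2.0, seed=0):
    rng = random.Random(seed)
    # Balanced start: even-numbered cells on FPGA 0, odd ones on FPGA 1.
    side = [i % 2 for i in range(n_nodes)]
    cost = cut_size(edges, side)
    best, best_cost = side[:], cost
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9    # linear cooling schedule
        a = rng.choice([i for i in range(n_nodes) if side[i] == 0])
        b = rng.choice([i for i in range(n_nodes) if side[i] == 1])
        side[a], side[b] = 1, 0                  # swap keeps both sides balanced
        new_cost = cut_size(edges, side)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost                      # accept the move
            if cost < best_cost:
                best, best_cost = side[:], cost
        else:
            side[a], side[b] = 0, 1              # reject: undo the swap
    return best, best_cost

# Toy netlist: two fully connected groups of 4 cells joined by one wire.
group_a = [(i, j) for i in range(4) for j in range(i + 1, 4)]
group_b = [(i, j) for i in range(4, 8) for j in range(i + 1, 8)]
edges = group_a + group_b + [(3, 4)]

initial_cut = cut_size(edges, [i % 2 for i in range(8)])
best, best_cost = anneal_bipartition(8, edges)
print(initial_cut, best_cost, best)
```

Swapping one cell from each side keeps the two partitions the same size, a crude stand-in for the resource constraint of each FPGA.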
Heterogeneous Computing
• Deep Learning and Computer Vision kernels to be deployed on:
• CPU
• Math processing routines (MKL, Eigen)
• GPU
• CUDA vs OpenCL programming
• FPGA
• OpenCL vs HLS vs RTL programming
• Intel/Altera vs Xilinx platforms
HPC Things to Explore in DeepHealth
• Communication impact
• Will the network become the bottleneck?
• Use case sizes
• Accuracy vs performance trade-off
• FPGA suitability for training (floating-point precision requirement)
• Will FPGAs be energy efficient for such a large challenge?
• Which FPGA devices will perform better (accuracy vs. energy trade-off)?
• Scalability of the solution (EDDL/ECVL)
• Will it perform well on any end-user HPC-like platform?
• … so, a challenging future lies ahead for DeepHealth HPC teams!
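A back-of-the-envelope way to approach the network-bottleneck question is the bandwidth term of a ring AllReduce, in which each of N workers sends and receives 2(N-1)/N times the gradient size per step. The sketch below applies it with the 40 Gbps link speed quoted for the MANGO prototype. The model size and compute time are hypothetical, per-message latency is ignored, and no overlap of communication with computation is assumed.

```python
def ring_allreduce_seconds(n_workers, payload_bytes, link_gbps):
    """Bandwidth-only estimate; per-message latency is ignored."""
    if n_workers < 2:
        return 0.0
    bytes_per_second = link_gbps * 1e9 / 8
    traffic = 2 * (n_workers - 1) / n_workers * payload_bytes
    return traffic / bytes_per_second

def communication_share(n_workers, payload_bytes, link_gbps, compute_seconds):
    """Fraction of a training step spent in AllReduce (no overlap assumed)."""
    comm = ring_allreduce_seconds(n_workers, payload_bytes, link_gbps)
    return comm / (comm + compute_seconds)

# Hypothetical model: 25M float32 parameters (100 MB), 50 ms compute per step.
payload = 25_000_000 * 4
for n in (2, 8, 32):
    t = ring_allreduce_seconds(n, payload, link_gbps=40)
    share = communication_share(n, payload, 40, compute_seconds=0.05)
    print(f"{n:>2} workers: {t * 1e3:.1f} ms allreduce, {share:.0%} of the step")
```

Under these assumptions the per-worker traffic approaches twice the gradient size as N grows, so the communication share depends mostly on model size, link speed and compute time per step.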
José Flich (jflich@disca.upv.es)
Mónica Caballero (monica.caballero.galeote@everis.com)
Thank you!
