DReAMS
High Performance Reconfigurable Computing at NECSTLab
Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano
Lorenzo Di Tucci, Marco Rabozzi, Marco D. Santambrogio
05/31/2018
MSR, Mountain View - CA
{lorenzo.ditucci, marco.rabozzi, marco.santambrogio}@polimi.it
NECSTLab @ PoliMi
NECSTLab is a laboratory in the DEIB department (Dipartimento di Elettronica,
Informazione e Bioingegneria) of Politecnico di Milano.
NECSTLab in a nutshell
System Security Research Line
• MaTA
Malware and Threat Analysis
• FraudSec
Frauds Analysis and Detection
• MoSec
Mobile Security
• CyPhy
Security of Cyber-physical systems
Prof. S. Zanero
stefano.zanero@polimi.it
Computer Architecture Research Line
DReAMS
• To discover the world of FPGA-based systems
• Design and implementation of reconfigurable computing:
from architectural aspects to CAD development
• How to use CS to “speed up/improve” biomedical applications
ORCA
• Unleashed computer architecture and operating systems
• From embedded to HPC computing systems, focusing on
computer architectures, OS and monitoring infrastructures
STeEL
• To make smart environments come true!
• How to make heterogeneous components coexist to
improve quality of life and comfort while minimizing power
and energy consumption
• Emotional and physical comfort
• Biometric human recognition
Prof. D. Sciuto
donatella.sciuto@polimi.it
Prof. M. Santambrogio
marco.santambrogio@polimi.it
DReAMS
Hardware for hUman Genomics
Genomic research
Recent advances in genomic research make it possible to perform multiple analyses on
DNA, with an impact on different fields
However, extracting biological meaning from secondary genome analysis requires a
complex process
Genome sequencing
Given a biological sample, genome sequencing is the process of determining the
precise order of nucleotides within a DNA molecule
This process produces short DNA fragments which need to be assembled to
reconstruct the original sequence
ACGTAGCTCGGACCATAGCA
CCGCCGTAGCTCGGACCATAGCACATG
AGTTTTGGGGGACCATAGCACATGGACACATGC
GGACCATAGCACATGGACACATGC
GGTCAAAAATAGCACATGGACACATGC
ATTGTATCGGACCATATTGCTTAGCATGTATTTGC
CATGGACACATGC
CGTAACCATAGCACATGGACACATGC
TTTTAGGTAATTGCCATAGCACATGGACACAT
Genome assembly
Genome assembly: reconstruct a genome from a set of shorter reads
Reference-based assembly
ACGTAGCTCGGACCATAGCA
GGACCATAGCACATGGACACATGC
ACGTAGCTCGGACCATAGCAGGACCATAGCACATGGACATGGACACATGCTTA
CATGGACACATGC
De novo assembly
ACGTAGCTCGGACCATAGCAGGACCATAGCACATGGACATGGACACATGCTTA
Applications of reference-based assembly are limited to species with available reference genomes
De novo assembly
Genomics algorithms are usually:
• compute-intensive
• working on massive amounts of data
• fast-changing
Issue:
• General-purpose architectures are inefficient
Solution:
• In this scenario, hardware accelerators have proved effective at optimizing the
performance-to-power-consumption ratio
Hardware architectures
Learning curve for multiple architectures
Objective
Advanced support for genomic research that exploits
heterogeneous hardware architectures
Genomics Hardware Pipeline
[Figure: pipeline stages: pipeline creation, data upload, processing, data visualization. User (yes branch): "HUG has exactly what I need!"]
Scientific Data Visualization
HUG Today
[1] [2] [3]
[1] Lorenzo Di Tucci, Giulia Guidi, Sara Notargiacomo, Luca Cerina, Alberto Scolari, and Marco D. Santambrogio. "HUGenomics: A support to personalized medicine research." In Research and Technologies for Society and Industry (RTSI), 2017 IEEE 3rd International Forum on, pp. 1-5. IEEE, 2017.
[2] Lorenzo Di Tucci, Davide Conficconi, Alessandro Comodi, Steven Hofmeyr, David Donofrio, and Marco Domenico Santambrogio. "A Parallel, Energy Efficient Hardware Architecture for the merAligner on FPGA using Chisel HDL." In Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2018 IEEE International, t.b.p. IEEE, 2018.
[3] Davide Conficconi, Alessandro Comodi, Alberto Scolari, and Marco Domenico Santambrogio. "TiReX: a Tiled Regular Expression Matching Architecture." In Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2018 IEEE International, t.b.p. IEEE, 2018.
Smith-Waterman
● Dynamic programming algorithm
● Performs local sequence alignment between two strings (DNA, RNA, ...)
● Guaranteed to find the optimal local alignment with respect to the
scoring system used
• To increase system performance, the state of the art is full of
implementations based on heuristics
• Highly compute-intensive
Heuristics trade a speedup in computation for a decrease in algorithm
precision
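A minimal software sketch of the Smith-Waterman recurrence may help fix ideas. It assumes a linear gap penalty and illustrative scoring values (match=2, mismatch=-1, gap=-1); these values and the function name are assumptions for illustration, not the scoring system or the implementation used in the accelerators presented here. It returns only the best local score and keeps a single previous row of the matrix.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between strings a and b.

    Scoring values are illustrative assumptions (linear gap penalty).
    """
    # prev holds the previous row of the dynamic-programming matrix.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score of aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Every pair of positions (i, j) requires one cell update, so the work grows with len(a) * len(b); this is the quantity counted by the GCUPS metric.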
Smith-Waterman
Platform | Performance [GCUPS] | Power Efficiency [GCUPS/W]
AWS-VU9P (3 queries in parallel) | 110.0 | 4.400
Tesla K20 | 45.0 | 0.200
ADM-PCIE-KU3 | 42.5 | 1.699
Nvidia GeForce GTX 295 | 30.0 | 0.104
Xtreme Data XD1000 | 25.6 | 0.430
Altera Stratix V on Nallatech PCIe-385 | 24.7 | 0.988
Nvidia GeForce GTX 295 | 16.1 | 0.056
ADM-PCIE-7V3 | 14.8 | 0.594
Dual-core Nvidia 9800 GX2 | 14.5 | 0.074
Nvidia GeForce GTX 280 | 9.7 | 0.041
Xtreme Data XD2000i | 9.0 | 0.150
2x Nvidia GeForce 8800 | 3.6 | 0.017
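The two metrics in the table are related by simple arithmetic, sketched below. GCUPS is taken here in its usual sense of billions of dynamic-programming cell updates per second; the function names are hypothetical, and the power figure is only what the two table columns imply, not a value reported in the slides.

```python
def gcups(query_len, db_len, seconds):
    """Giga cell updates per second for one query/database alignment."""
    return query_len * db_len / seconds / 1e9


def implied_power_watts(gcups_value, gcups_per_watt):
    """Power implied by a performance and a power-efficiency figure."""
    return gcups_value / gcups_per_watt
```

For the first row, 110.0 GCUPS at 4.400 GCUPS/W implies about 25 W, while the Tesla K20 row (45.0 GCUPS at 0.200 GCUPS/W) implies about 225 W.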
HUG Today
Pipeline stages: RAW READ, CONTIGGING, SCAFFOLDING, RE-SCAFFOLDING, ANNOTATION
Sequence Alignment via Smith-Waterman Algorithm [2]
TiREX
Customized processor for pattern matching
Flexible Instruction Set Architecture (ISA) representing the regular
expression
Regular expression: ACGT(A|C)*TT
Regular Expression Compiler output (instruction set):
1 & ACGT
2 JIM offset
3 (
4 |)* AC
5 & TT
Data:
ACGTCGGGGCGTGCAAATGCCCCGTGCGATTTGCGTGACGTCGGGGCGTGCAAATGC
CCCGTGCGATTTGCGTGACGTCGGGGCGTGCAAATGCCCCGTGCGATTTGCGTGACG
TCGGGGCGTGCAAATGCCCCGTGCGATTTGCGTGCGTGCGATTTGCGTGACGTCGGG
GCGTGCAAACGTGCGATTTGCGTGACGTCGGGGCGTGCAAAGCTCGATCGATCGATC
GA…
Output: match results
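As a software point of reference, the regular expression ACGT(A|C)*TT can be matched with Python's standard re module. This sketch only shows what the pattern accepts; it does not model the TiREX instruction set or its processor, and the helper name is hypothetical.

```python
import re

# ACGT, then any run of A or C (possibly empty), then TT.
PATTERN = re.compile(r"ACGT(?:A|C)*TT")


def find_matches(data):
    """Return (start offset, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in PATTERN.finditer(data)]
```

For instance, "ACGTACCATT" matches as a whole, whereas "ACGTGTT" does not match because G is not allowed between ACGT and TT.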
Pattern Matching via TiREX
Regular Expression | Flex* | 16-core† (VC707) | Speedup
ACCGTGGA | 271 µs | 2.07 µs | 130.90X
(TTT)+CT | 121 µs | 4.54 µs | 26.65X
(CAGT)|(GGGG)|(TTGG)TGCA(C|G)+ | 263 µs | 3.36 µs | 78.27X
* running on an Intel i7 with a peak frequency of 2.8 GHz
† running at 130 MHz
HUG Today
Pipeline stages: RAW READ, CONTIGGING, SCAFFOLDING, RE-SCAFFOLDING, ANNOTATION
Pattern Matching to identify gene motifs during Gene Annotation
[1] [2] [3]
NOMICA
HUG Today
Pipeline stages: RAW READ, CONTIGGING, SCAFFOLDING, RE-SCAFFOLDING, ANNOTATION
PairHMM for gene prediction/finding in the Gene Annotation phase
[1] [2] [3]
Preliminary Analysis
Roofline Model
Performance model that depicts the relation between attainable
performance and operational intensity
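In its classic form, the roofline bound is the minimum of the compute peak and the memory bandwidth multiplied by the operational intensity. The sketch below assumes that classic form; the function and parameter names are illustrative, not taken from any tool mentioned in this talk.

```python
def attainable_performance(peak_ops_per_s, mem_bandwidth_bytes_per_s,
                           operational_intensity):
    """Classic roofline bound in ops/s.

    operational_intensity is expressed in operations per byte moved.
    """
    memory_bound = mem_bandwidth_bytes_per_s * operational_intensity
    return min(peak_ops_per_s, memory_bound)


def ridge_point(peak_ops_per_s, mem_bandwidth_bytes_per_s):
    """Operational intensity above which a kernel becomes compute-bound."""
    return peak_ops_per_s / mem_bandwidth_bytes_per_s
```

With a 100 Gops/s peak and 10 GB/s of bandwidth, a kernel at 2 ops/byte is memory-bound at 20 Gops/s, and the ridge point sits at 10 ops/byte.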
Roofline Model for FPGAs
• Estimate performance before implementing the kernel
• Compare performance on different FPGA boards
• FPGAs have no fixed architecture
- The roofline model needs to be generated for each kernel
- and for each target FPGA board
Proposed solution: an automatic tool to generate Roofline models
Roofline Model for FPGAs
• Based on the Vivado Design Suite
• Given high-level code (now RTL too), it generates the bitstream for the
target board
• Resembles the GPU design flow
[Figure: Host (OpenCL runtime & APIs) connected over PCIe to the Accelerator (C, C++, OpenCL, RTL)]
Genomics Hardware Pipeline on the AWS cloud
Custom Code Integration
[Figure: a user tells HUG, running on a heterogeneous architecture, "I’d like to integrate my own algorithm". Pipeline stages: RAW READ, CONTIGGING, SCAFFOLDING, RE-SCAFFOLDING, ANNOTATION]
Genomics Hardware Pipeline
[Figure: decision flow. Is the algorithm available on HUG? Yes: pipeline creation or integration, data upload, processing, data visualization. No: fast prototyping of a custom hardware algorithm.]
The CAOS framework aims to:
● guide the application designer in the implementation of
efficient hardware-software solutions for high performance
FPGA-based systems
● promote open research by allowing external researchers to
easily integrate and benchmark their algorithms
The proposed CAOS framework
…
CAOS infrastructure
…
CAOS architectural templates
• SST (Single Streaming Timestep)
– Scientific applications with a regular execution flow in which
an N-dimensional data structure is updated over time
– Examples: heat flow simulation, Gauss-Seidel linear systems solver
• Dataflow
– Applications with a static compute graph where the same
computation is repeated for multiple input items
– Examples: Asian option pricing, image processing filters
• Master / Slave
– More general applications with a regular execution flow whose
working set can be effectively tiled
– Example: N-body physics simulation
SST background
• Iterative Stencil Loop (ISL) Algorithm
• FPGA-based Stencil Time-Step (SST) Architecture [1]
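To make the Iterative Stencil Loop pattern concrete, here is a small software sketch using one-dimensional heat diffusion as the example. The update rule, the coefficient alpha=0.25 and the function names are assumptions chosen for illustration; the sketch shows only the algorithmic pattern (the same stencil applied to the whole grid at every time step), not the FPGA architecture.

```python
def stencil_timestep(grid, alpha=0.25):
    """Apply one time step of a 3-point stencil; boundary cells stay fixed."""
    new = list(grid)
    for i in range(1, len(grid) - 1):
        # Each interior cell is updated from its own value and its neighbours.
        new[i] = grid[i] + alpha * (grid[i - 1] - 2 * grid[i] + grid[i + 1])
    return new


def iterative_stencil_loop(grid, timesteps, alpha=0.25):
    """Repeat the stencil update for a number of time steps."""
    for _ in range(timesteps):
        grid = stencil_timestep(grid, alpha)
    return grid
```

Each call to stencil_timestep reads only the previous grid, so consecutive time steps can in principle be chained as a stream, which is the property a time-step-based hardware module can exploit.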
Design Space Exploration
• Problem:
– Identify an optimal implementation of
an SST-based design on a target
FPGA
• Proposed approach [1]: leverage floorplanning to:
– Determine the maximum number of SSTs that can be instantiated
– Optimize SST placement to ease timing closure
Experimental result
Supported code & dataflow IR
void foo(type_1* in_1, type_2* in_2, scalar_type_1* v1, type_1* out_1){
    for(int i = offs; i < I_SIZE; i++){
        S1: …statements…
        for(int j = …; j < 15; j++){
            S2: …statements…
        }
        S3: …statements…
        for(int j = … ){
            S4: …statements…
        }
    }
    for(int i = offs_3; … ){
        S5: …statements…
    }
}
[Figure label: outer streaming loops]
[Figure labels on the dataflow IR: outer streaming loops, nested loop nodes, loop-carried dependency arc]
Design optimizations
Which optimization to apply?
Asian Option Pricing
CAOS experimental results
30 15.83 1.0
15 31.22 1.0
10 45.95 0.1
8 56.91 1.0
6 74.35 0.1
5 88.10 0.1
39.06 37.65 21.86 37.15
55.08 42.09 24.47 40.71
29.30 41.50 31.55 52.32
87.11 54.30 29.60 47.73
41.02 59.13 39.51 64.31
46.88 63.96 43.31 70.09
CAOS vs hand-tuned implementation: 1.78 s vs 1.25 s
Testbench parameters:
• Averaging points window: 30
• Asian Options in portfolio: 10000
• Market scenarios: 5000
Master / Slave background
• Supports a larger class of source codes, ideally close
to the set of codes supported by High-Level
Synthesis tools (e.g. Vivado HLS)
• Tiled computation:
– The application's memory is transferred in
blocks from DDR to local FPGA
memory
– Computation is performed on one block of
data at a time
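The tiled execution scheme can be sketched in software as follows. The list slice stands in for the DDR-to-local-memory transfer and the kernel callback for the accelerated computation; both, along with the function name, are illustrative assumptions rather than the CAOS implementation.

```python
def process_tiled(data, tile_size, kernel):
    """Process data one tile at a time and concatenate the results."""
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    out = []
    for start in range(0, len(data), tile_size):
        # Stand-in for copying one block from DDR into local FPGA memory.
        tile = data[start:start + tile_size]
        # Computation only ever sees the current block.
        out.extend(kernel(tile))
    return out
```

The last tile may be shorter than tile_size, so a kernel used this way has to accept partial blocks.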
Master / Slave architectural template
• Current implementation
– Relies on Vivado HLS for resource and performance estimation
– Explores potential optimizations by testing HLS directives
(e.g. #pragma HLS UNROLL, #pragma HLS PIPELINE, …)
– Quite accurate but slow (HLS synthesis time may be high)
• Ongoing work:
– Leverage and extend the classic roofline model for CPUs to
predict performance and potential optimizations
– Potentially less accurate but faster, allowing a larger
optimization space to be explored
Adapting the roofline model for FPGA
PROBLEM:
• Theoretical peak performance depends on the types of
operations performed, their bitwidth and technology mapping
IDEA:
• Estimate peak performance for each combination of:
– Target device
– Operation type
– Operand bitwidth
– Target frequency
• Analyze the source code and derive a unified roofline according to
the operation counts within the code
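One plausible way to combine per-operation peaks into a single compute roof is a count-weighted harmonic mean. This is an assumption made for illustration and not necessarily the formula used by the authors: it supposes that device resources are split among operation types so that all of them finish together, and that throughput scales linearly with the resources assigned. The function name and dictionary layout are hypothetical.

```python
def unified_peak(op_counts, peak_per_op):
    """Combine per-operation peaks (ops/s) into one peak, weighted by counts.

    op_counts:   operations of each type found in the code, e.g. {"add": 2}
    peak_per_op: peak ops/s if the whole device ran only that operation type
    """
    total_ops = sum(op_counts.values())
    # Time to execute all operations under the stated assumptions.
    total_time = sum(n / peak_per_op[op] for op, n in op_counts.items())
    return total_ops / total_time
```

Under these assumptions, a kernel with as many multiplications as additions lands closer to the slower operation's peak than a plain average would suggest.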
Roofline model for FPGA
[Figure callouts, one per build step:]
• Instruction count from static code analysis
• HW operators for ideal performance (within resource constraints)
• Estimated HW resources breakdown
• Peak performance (Ops / sec) with fully pipelined operators @ target frequency
• Operational intensity derived from static code analysis
We presented DReAMS
Thanks!
Lorenzo Di Tucci, Marco Rabozzi, Marco D. Santambrogio
{lorenzo.ditucci, marco.rabozzi, marco.santambrogio}@polimi.it

More Related Content

What's hot

Learning biologically relevant features using convolutional neural networks f...
Learning biologically relevant features using convolutional neural networks f...Learning biologically relevant features using convolutional neural networks f...
Learning biologically relevant features using convolutional neural networks f...Wesley De Neve
 
Shallow introduction for Deep Learning Retinal Image Analysis
Shallow introduction for Deep Learning Retinal Image AnalysisShallow introduction for Deep Learning Retinal Image Analysis
Shallow introduction for Deep Learning Retinal Image AnalysisPetteriTeikariPhD
 
Towards Reliable AI-Powered Vision for Autonomous Systems
Towards Reliable AI-Powered Vision for Autonomous Systems Towards Reliable AI-Powered Vision for Autonomous Systems
Towards Reliable AI-Powered Vision for Autonomous Systems KTN
 
IRJET- A Novel High Capacity Reversible Data Hiding in Encrypted Domain u...
IRJET-  	  A Novel High Capacity Reversible Data Hiding in Encrypted Domain u...IRJET-  	  A Novel High Capacity Reversible Data Hiding in Encrypted Domain u...
IRJET- A Novel High Capacity Reversible Data Hiding in Encrypted Domain u...IRJET Journal
 
Alison Lowndes, Artificial Intelligence DevRel, Nvidia – Fueling the Artifici...
Alison Lowndes, Artificial Intelligence DevRel, Nvidia – Fueling the Artifici...Alison Lowndes, Artificial Intelligence DevRel, Nvidia – Fueling the Artifici...
Alison Lowndes, Artificial Intelligence DevRel, Nvidia – Fueling the Artifici...Techsylvania
 
Hybrid Artificial Intelligence System for Cyber Security
Hybrid Artificial Intelligence System for Cyber SecurityHybrid Artificial Intelligence System for Cyber Security
Hybrid Artificial Intelligence System for Cyber SecurityKonstantinos Demertzis
 
OCT Monte Carlo & Deep Learning
OCT Monte Carlo & Deep LearningOCT Monte Carlo & Deep Learning
OCT Monte Carlo & Deep LearningPetteriTeikariPhD
 
Secure IoT Systems Monitor Framework using Probabilistic Image Encryption
Secure IoT Systems Monitor Framework using Probabilistic Image EncryptionSecure IoT Systems Monitor Framework using Probabilistic Image Encryption
Secure IoT Systems Monitor Framework using Probabilistic Image EncryptionIJAEMSJORNAL
 
Dov Nimratz, Roman Chobik "Embedded artificial intelligence"
Dov Nimratz, Roman Chobik "Embedded artificial intelligence"Dov Nimratz, Roman Chobik "Embedded artificial intelligence"
Dov Nimratz, Roman Chobik "Embedded artificial intelligence"Lviv Startup Club
 
Structural Biology in the Clouds: A Success Story of 10 years
Structural Biology in the Clouds: A Success Story of 10 yearsStructural Biology in the Clouds: A Success Story of 10 years
Structural Biology in the Clouds: A Success Story of 10 yearsAlexandreBonvin2
 
Multispectral Purkinje Imaging
 Multispectral Purkinje Imaging Multispectral Purkinje Imaging
Multispectral Purkinje ImagingPetteriTeikariPhD
 
NIPS - Deep learning @ Edge using Intel's NCS
NIPS - Deep learning @ Edge using Intel's NCSNIPS - Deep learning @ Edge using Intel's NCS
NIPS - Deep learning @ Edge using Intel's NCSgeetachauhan
 
組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステムShinnosuke Furuya
 
"Enabling Automated Design of Computationally Efficient Deep Neural Networks,...
"Enabling Automated Design of Computationally Efficient Deep Neural Networks,..."Enabling Automated Design of Computationally Efficient Deep Neural Networks,...
"Enabling Automated Design of Computationally Efficient Deep Neural Networks,...Edge AI and Vision Alliance
 
Design of lighting systems for animal experiments
Design of lighting systems for animal experimentsDesign of lighting systems for animal experiments
Design of lighting systems for animal experimentsPetteriTeikariPhD
 
Lighting design for Startup Offices
Lighting design for Startup OfficesLighting design for Startup Offices
Lighting design for Startup OfficesPetteriTeikariPhD
 
A COST-EFFECTIVE MINUTIAE DISK CODE FOR FINGERPRINT RECOGNITION AND ITS IMPL...
A COST-EFFECTIVE MINUTIAE DISK CODE FOR FINGERPRINT RECOGNITION AND  ITS IMPL...A COST-EFFECTIVE MINUTIAE DISK CODE FOR FINGERPRINT RECOGNITION AND  ITS IMPL...
A COST-EFFECTIVE MINUTIAE DISK CODE FOR FINGERPRINT RECOGNITION AND ITS IMPL...Nexgen Technology
 

What's hot (17)

Learning biologically relevant features using convolutional neural networks f...
Learning biologically relevant features using convolutional neural networks f...Learning biologically relevant features using convolutional neural networks f...
Learning biologically relevant features using convolutional neural networks f...
 
Shallow introduction for Deep Learning Retinal Image Analysis
Shallow introduction for Deep Learning Retinal Image AnalysisShallow introduction for Deep Learning Retinal Image Analysis
Shallow introduction for Deep Learning Retinal Image Analysis
 
Towards Reliable AI-Powered Vision for Autonomous Systems
Towards Reliable AI-Powered Vision for Autonomous Systems Towards Reliable AI-Powered Vision for Autonomous Systems
Towards Reliable AI-Powered Vision for Autonomous Systems
 
IRJET- A Novel High Capacity Reversible Data Hiding in Encrypted Domain u...
IRJET-  	  A Novel High Capacity Reversible Data Hiding in Encrypted Domain u...IRJET-  	  A Novel High Capacity Reversible Data Hiding in Encrypted Domain u...
IRJET- A Novel High Capacity Reversible Data Hiding in Encrypted Domain u...
 
Alison Lowndes, Artificial Intelligence DevRel, Nvidia – Fueling the Artifici...
Alison Lowndes, Artificial Intelligence DevRel, Nvidia – Fueling the Artifici...Alison Lowndes, Artificial Intelligence DevRel, Nvidia – Fueling the Artifici...
Alison Lowndes, Artificial Intelligence DevRel, Nvidia – Fueling the Artifici...
 
Hybrid Artificial Intelligence System for Cyber Security
Hybrid Artificial Intelligence System for Cyber SecurityHybrid Artificial Intelligence System for Cyber Security
Hybrid Artificial Intelligence System for Cyber Security
 
OCT Monte Carlo & Deep Learning
OCT Monte Carlo & Deep LearningOCT Monte Carlo & Deep Learning
OCT Monte Carlo & Deep Learning
 
Secure IoT Systems Monitor Framework using Probabilistic Image Encryption
Secure IoT Systems Monitor Framework using Probabilistic Image EncryptionSecure IoT Systems Monitor Framework using Probabilistic Image Encryption
Secure IoT Systems Monitor Framework using Probabilistic Image Encryption
 
Dov Nimratz, Roman Chobik "Embedded artificial intelligence"
Dov Nimratz, Roman Chobik "Embedded artificial intelligence"Dov Nimratz, Roman Chobik "Embedded artificial intelligence"
Dov Nimratz, Roman Chobik "Embedded artificial intelligence"
 
Structural Biology in the Clouds: A Success Story of 10 years
Structural Biology in the Clouds: A Success Story of 10 yearsStructural Biology in the Clouds: A Success Story of 10 years
Structural Biology in the Clouds: A Success Story of 10 years
 
Multispectral Purkinje Imaging
 Multispectral Purkinje Imaging Multispectral Purkinje Imaging
Multispectral Purkinje Imaging
 
NIPS - Deep learning @ Edge using Intel's NCS
NIPS - Deep learning @ Edge using Intel's NCSNIPS - Deep learning @ Edge using Intel's NCS
NIPS - Deep learning @ Edge using Intel's NCS
 
組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム
 
"Enabling Automated Design of Computationally Efficient Deep Neural Networks,...
"Enabling Automated Design of Computationally Efficient Deep Neural Networks,..."Enabling Automated Design of Computationally Efficient Deep Neural Networks,...
"Enabling Automated Design of Computationally Efficient Deep Neural Networks,...
 
Design of lighting systems for animal experiments
Design of lighting systems for animal experimentsDesign of lighting systems for animal experiments
Design of lighting systems for animal experiments
 
Lighting design for Startup Offices
Lighting design for Startup OfficesLighting design for Startup Offices
Lighting design for Startup Offices
 
A COST-EFFECTIVE MINUTIAE DISK CODE FOR FINGERPRINT RECOGNITION AND ITS IMPL...
A COST-EFFECTIVE MINUTIAE DISK CODE FOR FINGERPRINT RECOGNITION AND  ITS IMPL...A COST-EFFECTIVE MINUTIAE DISK CODE FOR FINGERPRINT RECOGNITION AND  ITS IMPL...
A COST-EFFECTIVE MINUTIAE DISK CODE FOR FINGERPRINT RECOGNITION AND ITS IMPL...
 

Similar to High Performance Reconfigurable Computing for Genomics Research

An FPGA-based acceleration methodology and performance model for iterative st...
An FPGA-based acceleration methodology and performance model for iterative st...An FPGA-based acceleration methodology and performance model for iterative st...
An FPGA-based acceleration methodology and performance model for iterative st...NECST Lab @ Politecnico di Milano
 
FPGA-based soft-processors: 6G nodes and post-quantum security in space
 FPGA-based soft-processors: 6G nodes and post-quantum security in space FPGA-based soft-processors: 6G nodes and post-quantum security in space
FPGA-based soft-processors: 6G nodes and post-quantum security in spaceFacultad de Informática UCM
 
Lecture_IIITD.pptx
Lecture_IIITD.pptxLecture_IIITD.pptx
Lecture_IIITD.pptxachakracu
 
Panel: NRP Science Impacts​
Panel: NRP Science Impacts​Panel: NRP Science Impacts​
Panel: NRP Science Impacts​Larry Smarr
 
20594-39025-1-PB.pdf
20594-39025-1-PB.pdf20594-39025-1-PB.pdf
20594-39025-1-PB.pdfIjictTeam
 
Emmbeded .pdf
Emmbeded .pdfEmmbeded .pdf
Emmbeded .pdfLeviIuo
 
Introduction to SeqAn, an Open-source C++ Template Library
Introduction to SeqAn, an Open-source C++ Template LibraryIntroduction to SeqAn, an Open-source C++ Template Library
Introduction to SeqAn, an Open-source C++ Template LibraryCan Ozdoruk
 
1-bit semantic segmentation
1-bit semantic segmentation1-bit semantic segmentation
1-bit semantic segmentationJeonghoonKim30
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)inventionjournals
 
The CAOS framework: democratize the acceleration of compute intensive applica...
The CAOS framework: democratize the acceleration of compute intensive applica...The CAOS framework: democratize the acceleration of compute intensive applica...
The CAOS framework: democratize the acceleration of compute intensive applica...NECST Lab @ Politecnico di Milano
 
Mixed Scanning and DFT Techniques for Arithmetic Core
Mixed Scanning and DFT Techniques for Arithmetic CoreMixed Scanning and DFT Techniques for Arithmetic Core
Mixed Scanning and DFT Techniques for Arithmetic CoreIJERA Editor
 
© 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal ...
© 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal ...© 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal ...
© 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal ...IRJET Journal
 
NVIDIA GTC 2018 Presentation
NVIDIA GTC 2018 PresentationNVIDIA GTC 2018 Presentation
NVIDIA GTC 2018 PresentationTomasz Bednarz
 
Revealing AES Encryption Device Key on 328P Microcontrollers with Differentia...
Revealing AES Encryption Device Key on 328P Microcontrollers with Differentia...Revealing AES Encryption Device Key on 328P Microcontrollers with Differentia...
Revealing AES Encryption Device Key on 328P Microcontrollers with Differentia...IJECEIAES
 
Reservoir computing fast deep learning for sequences
Reservoir computing   fast deep learning for sequencesReservoir computing   fast deep learning for sequences
Reservoir computing fast deep learning for sequencesClaudio Gallicchio
 
Klessydra t - designing vector coprocessors for multi-threaded edge-computing...
Klessydra t - designing vector coprocessors for multi-threaded edge-computing...Klessydra t - designing vector coprocessors for multi-threaded edge-computing...
Klessydra t - designing vector coprocessors for multi-threaded edge-computing...RISC-V International
 
Hardware Implementation of Algorithm for Cryptanalysis
Hardware Implementation of Algorithm for CryptanalysisHardware Implementation of Algorithm for Cryptanalysis
Hardware Implementation of Algorithm for Cryptanalysisijcisjournal
 
REVIEW ON OBJECT DETECTION WITH CNN
REVIEW ON OBJECT DETECTION WITH CNNREVIEW ON OBJECT DETECTION WITH CNN
REVIEW ON OBJECT DETECTION WITH CNNIRJET Journal
 
Harnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligenceHarnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligenceAlison B. Lowndes
 

High Performance Reconfigurable Computing for Genomics Research

  • 1. DReAMS: High Performance Reconfigurable Computing at NECSTLab. Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano. Lorenzo Di Tucci, Marco Rabozzi, Marco D. Santambrogio. 05/31/2018, MSR, Mountain View, CA. {lorenzo.ditucci, marco.rabozzi, marco.santambrogio}@polimi.it
  • 2. NECSTLab @ PoliMi: NECSTLab is a laboratory within the DEIB department (Dipartimento di Elettronica, Informazione e Bioingegneria) of Politecnico di Milano.
  • 3. NECSTLab in a nutshell
  • 5. System Security Research Line (Prof. S. Zanero, stefano.zanero@polimi.it)
      - MaTA: Malware and Threat Analysis
      - FraudSec: Fraud Analysis and Detection
      - MoSec: Mobile Security
      - CyPhy: Security of Cyber-Physical Systems
  • 6. Computer Architecture Research Line (Prof. D. Sciuto, donatella.sciuto@polimi.it; Prof. M. Santambrogio, marco.santambrogio@polimi.it)
      - DReAMS: discovering the world of FPGA-based systems; design and implementation of reconfigurable computing, from architectural aspects to CAD development; how to use CS to speed up and improve biomedical applications
      - ORCA: unleashed computer architecture and operating systems; from embedded to HPC computing systems, focusing on computer architectures, OS and monitoring infrastructures
      - STeEL: making smart environments come true; how to make heterogeneous components coexist to improve quality of life and comfort while minimizing power and energy consumption; emotional and physical comfort; biometric human recognition
  • 10. Genomic research: Recent advances in genomic research make it possible to perform multiple analyses on DNA, with impact on several fields. However, extracting biological meaning from secondary genome analysis requires a complex process.
  • 11. Genome sequencing: Given a biological sample, genome sequencing is the process of determining the precise order of nucleotides within a DNA molecule. This process produces short DNA fragments, which need to be assembled to reconstruct the original sequence. Example fragments shown on the slide:
      ACGTAGCTCGGACCATAGCA
      CCGCCGTAGCTCGGACCATAGCACATG
      AGTTTTGGGGGACCATAGCACATGGACACATGC
      GGACCATAGCACATGGACACATGC
      GGTCAAAAATAGCACATGGACACATGC
      ATTGTATCGGACCATATTGCTTAGCATGTATTTGC
      CATGGACACATGC
      CGTAACCATAGCACATGGACACATGC
      TTTTAGGTAATTGCCATAGCACATGGACACAT
  • 12. Genome assembly: reconstruct a genome from a set of shorter reads. Reference-based assembly: the reads are placed against a known reference sequence.
      Reads shown: ACGTAGCTCGGACCATAGCA, GGACCATAGCACATGGACACATGC, CATGGACACATGC
      Reference shown: ACGTAGCTCGGACCATAGCAGGACCATAGCACATGGACATGGACACATGCTTA
  • 13. Genome assembly: reconstruct a genome from a set of shorter reads. De novo assembly: the sequence (ACGTAGCTCGGACCATAGCAGGACCATAGCACATGGACATGGACACATGCTTA on the slide) is reconstructed from the reads alone. This matters because applications of reference-based assembly are limited to species with available reference genomes.
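To make the idea of de novo assembly concrete, the following is a minimal sketch of a greedy overlap-merge assembler. It is an illustration only and not the method used in the talk: the function names, the `min_len` threshold, and the three toy reads (overlapping fragments cut from the first read on the sequencing slide) are assumptions, and real assemblers must also handle sequencing errors, repeats, and reverse complements.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest suffix/prefix overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: leave the contigs separate
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical reads: overlapping 10-base windows of ACGTAGCTCGGACCATAGCA.
toy_reads = ["GACCATAGCA", "ACGTAGCTCG", "GCTCGGACCA"]
contigs = greedy_assemble(toy_reads)
```

The all-pairs overlap search is quadratic in the number of reads, which hints at why assembly on real data sets is compute-intensive.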
  • 14. De novo assembly: Genomics algorithms are usually compute-intensive, operate on massive amounts of data, and change quickly.
      - Issue: general-purpose architectures are inefficient.
      - Solution: in this scenario, hardware accelerators have proved effective in optimizing the ratio of performance to power consumption.
  • 18. Hardware architectures: learning curve for multiple architectures
  • 19. Objective: advanced support for genomic research that exploits heterogeneous hardware architectures.
  • 20. Genomics Hardware Pipeline. Stages: Pipeline creation, Data upload, Processing, Data visualization. "Yes" line: "HUG has exactly what I need!"
  • 25. HUG Today [1] [2] [3]
      [1] Lorenzo Di Tucci, Giulia Guidi, Sara Notargiacomo, Luca Cerina, Alberto Scolari, and Marco D. Santambrogio. "HUGenomics: A support to personalized medicine research." In Research and Technologies for Society and Industry (RTSI), 2017 IEEE 3rd International Forum on, pp. 1-5. IEEE, 2017.
      [2] Lorenzo Di Tucci, Davide Conficconi, Alessandro Comodi, Steven Hofmeyr, David Donofrio, and Marco Domenico Santambrogio. "A Parallel, Energy Efficient Hardware Architecture for the merAligner on FPGA using Chisel HDL." In Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2018 IEEE International, t.b.p. IEEE, 2018.
      [3] Davide Conficconi, Alessandro Comodi, Alberto Scolari, and Marco Domenico Santambrogio. "TiReX: a Tiled Regular Expression Matching Architecture." In Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2018 IEEE International, t.b.p. IEEE, 2018.
  • 26. Smith-Waterman
      - Dynamic programming algorithm
      - Performs local sequence alignment between two strings (DNA, RNA, ...)
      - Guaranteed to find the optimal local alignment with respect to the scoring system used
      - Highly compute-intensive
      - To increase performance, many state-of-the-art implementations rely on heuristics, which trade a speedup in computation for a decrease in algorithm precision
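As a point of reference for the properties listed above, here is a minimal software sketch of Smith-Waterman with a linear gap penalty. The scoring values (`match=2`, `mismatch=-1`, `gap=-1`) and the function name are assumptions chosen for illustration; the slides do not state the scoring scheme used by the hardware implementations.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best score, aligned part of a, aligned part of b) for a local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best score of an alignment ending at a[i-1], b[j-1]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, aligned_a, aligned_b = smith_waterman("TTTACGTTT", "GGACGGG")
```

Every cell of the `len(a) × len(b)` matrix is updated once, which is why throughput for this algorithm is usually reported in cell updates per second. Cells on the same anti-diagonal do not depend on each other, which is one common reason the algorithm maps well to parallel hardware.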
  • 27. Smith-Waterman: performance and power efficiency by platform
      Platform                                 Performance [GCUPS]   Power Efficiency [GCUPS/W]
      AWS-VU9P (3 queries in parallel)         110.0                 4.400
      Tesla K20                                45.0                  0.200
      ADM-PCIE-KU3                             42.5                  1.699
      Nvidia GeForce GTX 295                   30.0                  0.104
      Xtreme Data XD1000                       25.6                  0.430
      Altera Stratix V on Nallatech PCIe-385   24.7                  0.988
      Nvidia GeForce GTX 295                   16.1                  0.056
      ADM-PCIE-7V3                             14.8                  0.594
      Dual-core Nvidia 9800 GX2                14.5                  0.074
      Nvidia GeForce GTX 280                   9.7                   0.041
      Xtreme Data XD2000i                      9.0                   0.150
      2X Nvidia GeForce 8800                   3.6                   0.017
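The units in the table can be unpacked with a little arithmetic. GCUPS stands for giga cell updates per second, that is, the number of dynamic-programming matrix cells updated per second divided by 10^9. The sketch below assumes that definition; the helper names are hypothetical, and the wattage it derives is only what the two table columns imply together, not a figure reported in the slides.

```python
def gcups(query_len, database_len, seconds):
    """Giga cell updates per second for one query aligned against a database."""
    return query_len * database_len / seconds / 1e9


def implied_power_watts(performance_gcups, efficiency_gcups_per_watt):
    """Power implied by dividing performance by power efficiency."""
    return performance_gcups / efficiency_gcups_per_watt


# Values copied from the table: (performance [GCUPS], power efficiency [GCUPS/W]).
rows = {
    "AWS-VU9P (3 queries in parallel)": (110.0, 4.400),
    "Tesla K20": (45.0, 0.200),
    "ADM-PCIE-KU3": (42.5, 1.699),
}
implied = {name: implied_power_watts(p, e) for name, (p, e) in rows.items()}
efficiency_ratio = rows["AWS-VU9P (3 queries in parallel)"][1] / rows["Tesla K20"][1]
```

Read this way, the AWS-VU9P row corresponds to roughly 25 W against roughly 225 W for the Tesla K20 row, so the FPGA entry is about 22 times more power-efficient on this metric.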
  • 30. HUG Today [1]: Raw read → Contigging → Scaffolding → Re-scaffolding → Annotation. Sequence alignment via the Smith-Waterman algorithm [2].
  • 31. TiReX
      - A Regular Expression Compiler translates a regular expression (on the slide: ACGT(A|C)*TT) into a program for a flexible Instruction Set Architecture (ISA) representing the regular expression
      - Instruction set listing shown on the slide: 1 & ACGT; 2 JIM offset; 3 (; 4 |)* AC; 5 & TT
      - A customized processor for pattern matching runs the program over the input data (ACGTCGGGGCGTGCAAATGCCCCGTGCGATTTGCGTG…) and produces the match results
  • 32. Pattern Matching via TiReX
      Regular Expression                 Flex*     16-core† (VC707)   Speedup
      ACCGTGGA                           271 µs    2.07 µs            130.90X
      (TTT)+CT                           121 µs    4.54 µs            26.65X
      (CAGT)|(GGGG)|(TTGG)TGCA(C|G)+     263 µs    3.36 µs            78.27X
      * running on an Intel i7 with a peak frequency of 2.8 GHz
      † running at 130 MHz
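For readers who want to try the benchmark patterns in software, the sketch below runs them with Python's standard `re` module and recomputes the speedup column from the reported times. This is not how TiReX or Flex evaluates the expressions; the helper names and the short test strings are assumptions, and Python reads the third pattern as three alternatives (CAGT, GGGG, or TTGGTGCA followed by one or more C or G), which may or may not match the grouping intended on the slide.

```python
import re

# Patterns copied from the table, plus the example expression from the TiReX slide.
PATTERNS = ["ACCGTGGA", "(TTT)+CT", "(CAGT)|(GGGG)|(TTGG)TGCA(C|G)+", "ACGT(A|C)*TT"]


def find_matches(pattern, text):
    """Return the (start, end) spans of non-overlapping matches of pattern in text."""
    return [m.span() for m in re.finditer(pattern, text)]


def speedup(baseline_us, accelerated_us):
    """Ratio between the baseline time and the accelerated time."""
    return baseline_us / accelerated_us


# Hypothetical input strings, chosen so that each contains exactly one match.
span_example = find_matches("ACGT(A|C)*TT", "GGACGTACCATTGG")
span_repeat = find_matches("(TTT)+CT", "ATTTTTTCTA")

# Times in microseconds from the table: (Flex, 16-core TiReX).
recomputed = [round(speedup(f, t), 2) for f, t in [(271, 2.07), (121, 4.54), (263, 3.36)]]
```

Recomputing from the rounded times reproduces 26.65 and 78.27 exactly and gives about 130.9 for the first row, consistent with the reported 130.90X up to rounding of the measured times.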
  • 33. HUG Today [1] [2] [3]: Raw read → Contigging → Scaffolding → Re-scaffolding → Annotation. Pattern matching to identify gene motifs during gene annotation.
  • 36. HUG Today [1] [2] [3]: Raw read → Contigging → Scaffolding → Re-scaffolding → Annotation. PairHMM for gene prediction/finding in the gene annotation phase.
  • 40. Roofline Model: a performance model that depicts the relation between attainable performance and operational intensity.
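In its standard form, the Roofline model bounds attainable performance by the smaller of the peak compute rate and the product of memory bandwidth and operational intensity. The sketch below encodes that relation; the function names and the board figures in the example are hypothetical and are not taken from the slides.

```python
def attainable_performance(peak_gops, bandwidth_gbs, operational_intensity):
    """Roofline bound: min(peak compute, memory bandwidth * operational intensity).

    peak_gops: peak compute throughput in Gop/s
    bandwidth_gbs: peak memory bandwidth in GB/s
    operational_intensity: operations performed per byte moved (op/byte)
    """
    return min(peak_gops, bandwidth_gbs * operational_intensity)


def ridge_point(peak_gops, bandwidth_gbs):
    """Operational intensity at which a kernel stops being memory-bound."""
    return peak_gops / bandwidth_gbs


# Hypothetical accelerator: 100 Gop/s peak compute, 10 GB/s memory bandwidth.
memory_bound = attainable_performance(100.0, 10.0, 2.0)    # limited by bandwidth
compute_bound = attainable_performance(100.0, 10.0, 20.0)  # limited by peak compute
ridge = ridge_point(100.0, 10.0)
```

A kernel whose operational intensity lies below the ridge point gains nothing from extra compute units until its memory traffic is reduced, which is the kind of question the model helps answer before implementation.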
  • 41. Roofline Model for FPGAs
      - Estimate performance before implementing the kernel
      - Compare performance on different FPGA boards
      - FPGAs have no fixed architecture: the architecture needs to be generated for each kernel and for each target FPGA board
      - Proposed solution: an automatic tool to generate Roofline models
  • 42. Roofline Model for FPGAs
      - Based on the Vivado Design Suite
      - Given high-level code (now RTL too), it generates the bitstream for the target board
      - Resembles the GPU design flow: a host (OpenCL runtime & APIs) connected over PCIe to an accelerator (C, C++, OpenCL, RTL)
  • 45. Custom Code Integration: "I'd like to integrate my own algorithm" into HUG, on a heterogeneous architecture. Pipeline: Raw read → Contigging → Scaffolding → Re-scaffolding → Annotation.
  • 46. Genomics Hardware Pipeline. Decision: is the algorithm available on HUG? ("Yes" line / "No" line). Stages shown: Fast prototyping, Custom hardware algorithm, Pipeline creation or integration, Data upload, Processing, Data visualization.
  • 48. The CAOS framework
      - Guides the application designer in the implementation of efficient hardware-software solutions for high-performance FPGA-based systems
      - Promotes open research by allowing external researchers to easily integrate and benchmark their algorithms
  • 49. The proposed CAOS framework …
  • 53. CAOS architectural templates • SST (Single Streaming Timestep) – Scientific applications with a regular execution flow in which an N-dimensional data structure is updated over time – Example: heat flow simulation, Gauss-Seidel linear systems solver • Dataflow – Applications with a static compute graph where the same computation is repeated for multiple input items – Example: Asian option pricing, image processing filters • Master / Slave – More general applications with a regular execution flow whose working set can be effectively tiled – Example: N-body physics simulation
  • 55. SST background • Iterative Stencil Loop (ISL) algorithm • FPGA-based Stencil Time-Step (SST) architecture [1]
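
For readers unfamiliar with the term, an iterative stencil loop repeatedly applies the same neighbourhood update to a grid, one time-step after another. The sketch below is a plain software illustration with a hypothetical 3-point averaging stencil on a 1D array; it is not the SST hardware architecture from the talk, and the function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* One time-step of a 3-point stencil: each interior cell becomes the
 * average of itself and its two neighbours; boundary cells are kept fixed. */
static void stencil_step(const double *in, double *out, size_t n)
{
    out[0] = in[0];
    out[n - 1] = in[n - 1];
    for (size_t i = 1; i + 1 < n; i++)
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0;
}

/* Iterative stencil loop: apply the same stencil for `steps` time-steps,
 * ping-ponging between two buffers of n elements each.
 * Returns the buffer that holds the final result (a or b). */
double *isl_run(double *a, double *b, size_t n, unsigned steps)
{
    assert(n >= 2);
    double *src = a, *dst = b;
    for (unsigned t = 0; t < steps; t++) {
        stencil_step(src, dst, n);
        double *tmp = src;  /* swap roles for the next time-step */
        src = dst;
        dst = tmp;
    }
    return src;
}
```

Each time-step depends only on the previous one, which is the property that lets a hardware design chain one module per time-step and stream the grid through them.
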
  • 56. Design Space Exploration • Problem: – Identify an optimal implementation of an SST-based design on a target FPGA • Proposed approach [1]: leverage floorplanning to: – Determine the maximum number of SSTs that can be instantiated – Optimize SST placement to ease timing closure
  • 60. Supported code & dataflow IR • Diagram labels: OUTER STREAMING LOOPS, NESTED LOOP NODES, LOOP CARRIED DEPENDENCY ARC • Code:
      void foo(type_1* in_1, type_2* in_2, scalar_type_1* v1, type_1* out_1){
        for(int i = offs; i < I_SIZE; i++){
          S1: …statements…
          for(int j = …; j < 15; j++){
            S2: …statements…
          }
          S3: …statements…
          for(int j = … ){
            S4: …statements…
          }
        }
        for(int i = offs_3; … ){
          S5: …statements…
        }
      }
  • 64. Which optimization to apply?
  • 67. CAOS experimental results 30 15.83 1.0 15 31.22 1.0 10 45.95 0.1 8 56.91 1.0 6 74.35 0.1 5 88.10 0.1 39.06 37.65 21.86 37.15 55.08 42.09 24.47 40.71 29.30 41.50 31.55 52.32 87.11 54.30 29.60 47.73 41.02 59.13 39.51 64.31 46.88 63.96 43.31 70.09
  • 68. CAOS vs hand-tuned implementation 1.78s 1.25s • Testbench parameters: – Averaging points window: 30 – Asian Options in portfolio: 10000 – Market scenarios: 5000
  • 71. Master / Slave background • Supports a larger class of source codes, ideally close to the set of codes supported by High Level Synthesis tools (e.g. Vivado HLS) • Tiled computation: – Application data is transferred in blocks from DDR to local FPGA memory – Computation is performed on one block of data at a time
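
The tiled computation described on this slide can be illustrated in plain C. This is a sketch under stated assumptions, not the CAOS implementation: `scale_tiled` and `TILE_SIZE` are hypothetical names, the two pointer arguments stand in for off-chip DDR, and the small stack array stands in for on-chip FPGA memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TILE_SIZE 4  /* hypothetical tile size; on an FPGA it is bounded by on-chip memory */

/* Scales n floats by `factor`, one tile at a time.
 * ddr_in / ddr_out model off-chip DDR; `local` models on-chip memory. */
void scale_tiled(const float *ddr_in, float *ddr_out, size_t n, float factor)
{
    float local[TILE_SIZE];
    assert(ddr_in != NULL && ddr_out != NULL);
    for (size_t base = 0; base < n; base += TILE_SIZE) {
        /* The last tile may be shorter than TILE_SIZE. */
        size_t len = (n - base < TILE_SIZE) ? n - base : TILE_SIZE;
        memcpy(local, ddr_in + base, len * sizeof(float));   /* DDR -> local memory */
        for (size_t i = 0; i < len; i++)                     /* compute on the tile */
            local[i] *= factor;
        memcpy(ddr_out + base, local, len * sizeof(float));  /* local memory -> DDR */
    }
}
```

Keeping the compute loop confined to the local buffer is what makes the working set small enough to fit in on-chip memory, at the cost of explicit block transfers.
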
  • 72. Master / Slave architectural template • Current implementation – Relies on Vivado HLS for resource and performance estimation – Explores potential optimizations by testing HLS directives (e.g. #pragma HLS UNROLL, #pragma HLS PIPELINE, …) – Quite accurate but slow (HLS synthesis time may be high) • Ongoing work: – Leverage and extend the classic CPU roofline model to predict performance and potential optimizations – Potentially less accurate but faster, making it possible to explore a larger optimization space
  • 73. Adapting the roofline model for FPGA • PROBLEM: – Theoretical peak performance depends on the types of operations performed, their bitwidth and technology mapping • IDEA: – Estimate peak performance for each combination of: target device, operation type, operand bitwidth, target frequency – Analyze the source code and derive a unified roofline according to the operation counts within the code
  • 75. Roofline model for FPGA: instruction count from static code analysis
  • 76. Roofline model for FPGA: HW operators for ideal performance (within resource constraints)
  • 78. Roofline model for FPGA: peak performance (Ops / sec) with fully pipelined operators @ target frequency
  • 79. Roofline model for FPGA: operational intensity derived from static code analysis
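
The steps on these slides (count operations, fit operators within the resource budget, multiply by the target frequency, derive operational intensity) can be sketched roughly as follows. This is only an illustration of the idea, not the authors' tool: the `op_type` struct, the single LUT budget and every cost figure are hypothetical simplifications, and a real device model would also account for DSPs, BRAM and operand bitwidth.

```c
#include <assert.h>

/* Hypothetical description of one operation type found by static analysis. */
typedef struct {
    unsigned count_per_iter;  /* occurrences in one iteration of the kernel */
    unsigned luts_per_unit;   /* assumed LUT cost of one pipelined operator */
} op_type;

/* Peak ops/s when the kernel's operator mix is replicated as many times as
 * the LUT budget allows, each operator fully pipelined (one result per cycle). */
double fpga_peak_ops_per_s(const op_type *ops, unsigned n_types,
                           unsigned lut_budget, double freq_hz)
{
    unsigned luts_per_copy = 0, ops_per_copy = 0;
    for (unsigned i = 0; i < n_types; i++) {
        luts_per_copy += ops[i].count_per_iter * ops[i].luts_per_unit;
        ops_per_copy  += ops[i].count_per_iter;
    }
    assert(luts_per_copy > 0 && freq_hz > 0.0);
    unsigned copies = lut_budget / luts_per_copy;  /* whole copies that fit */
    return (double)copies * ops_per_copy * freq_hz;
}

/* Operational intensity: operations per byte of off-chip traffic. */
double operational_intensity(unsigned ops_per_iter, unsigned bytes_per_iter)
{
    assert(bytes_per_iter > 0);
    return (double)ops_per_iter / bytes_per_iter;
}
```

For example, a kernel with two additions (100 LUTs each, an assumed cost) and one multiplication (300 LUTs, also assumed) fits four times in a 2000-LUT budget, giving 12 operations per cycle at the target frequency.
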
  • 81. We presented DReAMS. Thanks! Lorenzo Di Tucci, Marco Rabozzi, Marco D. Santambrogio {lorenzo.ditucci, marco.rabozzi, marco.santambrogio}@polimi.it