POLITECNICO DI MILANO
Master of Science in Engineering of Computing Systems
Department of Electronics, Informatics and Bioengineering
Master thesis of:
Alberto Zeni - 884540
A Methodology for Automatic GPU Kernel
Optimization
Advisor:
Ing. Marco D. Santambrogio
Co-advisor:
Dott. Ing. Lorenzo Di Tucci
December 18th 2019
Room 5.0.2 - Politecnico di Milano
Context Definition
2
[Figure: 40 Years of Microprocessor Trend Data, 1980 to 2020, logarithmic axis from 10^2 to 10^7; annotation: 1000x by 2025]
Context Definition
3
Thesis Contributions
4
● We propose a methodology that guides the user in developing highly optimized GPU kernels
● We demonstrate the usefulness of our methodology by implementing it in a semi-automatic tool for kernel optimization
● We show the results of applying our methodology to two highly computationally intensive algorithms
Methodology application
5
[Diagram: Optimization Flow, from unoptimized source code to optimized source code. Components: Roofline Generator, Roofline and Performance Analyzer, Source Code Parser, Compiler Optimizer]
Roofline Generator
7
Roofline Model Adaptation
8
● Model built on the characteristics of the GPU and of the executed algorithm, independent of the algorithm implementation
Quantities used by the model:
● number of iterations
● GPU core frequency
● number of operations to be computed at iteration i
● number of blocks
● number of scheduled threads per block
● number of integer cores
● number of streaming multiprocessors
● maximum number of blocks per streaming multiprocessor
[Symbols and formulas were images on the slide and are not recoverable]
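For reference, the classic roofline bound caps attainable throughput at the minimum of the compute roof and the memory roof (bandwidth times operational intensity). The sketch below illustrates only that general bound, not the adapted model of the thesis. The function names and the V100-like figures (80 streaming multiprocessors, 64 integer cores each, 1.53 GHz, 900 GB/s) are assumptions chosen for the example.

```python
def peak_int_ops_per_s(num_sms, int_cores_per_sm, freq_hz):
    """Peak integer throughput, assuming one operation per core per cycle."""
    return num_sms * int_cores_per_sm * freq_hz


def attainable_ops_per_s(peak_ops, bandwidth_bytes_per_s, ops_per_byte):
    """Classic roofline: the lower of the compute roof and the memory roof."""
    return min(peak_ops, bandwidth_bytes_per_s * ops_per_byte)


# Illustrative, V100-like numbers (assumptions, not taken from the slides).
PEAK = peak_int_ops_per_s(num_sms=80, int_cores_per_sm=64, freq_hz=1.53e9)
BANDWIDTH = 900e9  # bytes per second

# Operational intensity at which the two roofs meet (the "ridge point").
RIDGE = PEAK / BANDWIDTH

if __name__ == "__main__":
    for intensity in (0.5, 4.0, RIDGE, 32.0):
        gops = attainable_ops_per_s(PEAK, BANDWIDTH, intensity) / 1e9
        print(f"{intensity:6.2f} ops/byte -> {gops:8.1f} Gops/s")
```

A kernel whose operational intensity lies left of the ridge point is memory bound under this bound; to the right of it, compute bound.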
Source Code Parser
11
● Automatically unrolls loops if possible
● Automatically changes the memory hierarchy
● Automatically changes the number of scheduled
threads
● Automatically changes the number of scheduled
blocks
● Creates a report of the optimizations that can be
applied manually
Smith-Waterman Algorithm
14
● Optimal algorithm for local sequence alignment
● Execution time scales with the length of the aligned sequences
Score matrix:
        A  C  C  T  A  G  G  A
     0  0  0  0  0  0  0  0  0
  A  0  1  0  0  0  1  0  0  1
  G  0  0  0  0  0  0  2  1  0
  G  0  0  0  0  0  0  1  3  1
  G  0  0  0  0  0  0  1  2  2
  T  0  0  0  0  1  0  0  0  1
  C  0  0  1  1  0  0  0  0  0
  A  0  1  0  0  0  1  0  0  1
  A  0  1  0  0  0  1  0  0  1
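The score matrix can be reproduced with a few lines of Python. This is a minimal sketch of the textbook Smith-Waterman recurrence with a linear gap penalty. The slide does not state the scoring scheme; the values used here (match +1, mismatch -1, gap -2) are an assumption chosen because they reproduce the matrix for ACCTAGGA and AGGGTCAA.

```python
def smith_waterman(rows_seq, cols_seq, match=1, mismatch=-1, gap=-2):
    """Return the local-alignment score matrix H and its maximum score."""
    n, m = len(rows_seq), len(cols_seq)
    # Row 0 and column 0 are the zero border of the matrix.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if rows_seq[i - 1] == cols_seq[j - 1] else mismatch
            H[i][j] = max(
                0,                      # local alignment: never go below zero
                H[i - 1][j - 1] + sub,  # match or mismatch
                H[i - 1][j] + gap,      # gap in the column sequence
                H[i][j - 1] + gap,      # gap in the row sequence
            )
            best = max(best, H[i][j])
    return H, best


if __name__ == "__main__":
    H, best = smith_waterman("AGGGTCAA", "ACCTAGGA")
    for row in H:
        print(*row)
    print("best local score:", best)
```

Both loops run over the full matrix, which is why the cost grows with the lengths of both sequences.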
X-Drop Algorithm
15
Partial score matrix (cells outside the explored region are set to -∞):
        A   C   C   T   A   G   G   A
     0  -1  -2  -∞
  A -1   1  -1  -∞
  G -2  -1   0  -∞
  G -∞  -∞  -∞
  G
  T
  C
  A
  A
● X-Drop termination offers a good trade-off between speed and accuracy
● Execution time scales with the length of the sequences
● Very efficient when the two sequences do not align
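The pruning idea can be sketched in Python. This is a simplified, row-by-row illustration of X-drop termination, not the GPU implementation from the thesis: a cell is discarded (set to -∞) when its score falls more than X below the best score seen so far, and the extension stops when a whole row has been discarded. The function name, the scoring values and the row-wise order are assumptions made for the example.

```python
NEG_INF = float("-inf")


def xdrop_extend(query, target, x_drop, match=1, mismatch=-1, gap=-1):
    """Extend an alignment from the start of both sequences.

    Returns (best_score, rows_computed).
    """
    n = len(target)
    best = 0
    # Row 0: prefixes of target aligned against the empty query prefix.
    prev = [j * gap if j * gap >= best - x_drop else NEG_INF
            for j in range(n + 1)]
    rows = 0
    for i in range(1, len(query) + 1):
        rows = i
        cur = [NEG_INF] * (n + 1)
        cur[0] = prev[0] + gap
        for j in range(1, n + 1):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap)
        best = max(best, max(cur))
        # X-drop: discard cells that fell too far below the best score.
        cur = [s if s >= best - x_drop else NEG_INF for s in cur]
        if all(s == NEG_INF for s in cur):
            break  # nothing left to extend
        prev = cur
    return best, rows


if __name__ == "__main__":
    print(xdrop_extend("ACGT", "ACGT", x_drop=2))
    print(xdrop_extend("A" * 50, "T" * 50, x_drop=2))
```

For two sequences that do not align, the loop stops after a handful of rows regardless of the sequence length, which is the behaviour the last bullet describes.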
GPU implemented optimizations
16
● The two algorithms follow the same computational pattern
● We started from a simple implementation of the algorithms that uses a single thread and a single block
● We followed our methodology, with the help of our tool, to optimize the algorithms at different levels and introduce inter- and intra-level parallelism
[Figure: partially filled score matrices for the sequences ACGG and ATTC]
Intra Level Parallelism
17
● Parallel computation of the anti-diagonals
● Each GPU thread is assigned to compute a single cell, as our methodology suggested
● Anti-diagonals are split into segments so that sequences of any length can be aligned
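Cells on the same anti-diagonal depend only on the two previous anti-diagonals, so they can be computed independently. The Python sketch below emulates this on the CPU: each anti-diagonal is processed in segments, and every cell in a segment is computed from earlier diagonals only. The `segment` parameter stands in for the number of threads available per anti-diagonal. The function names and the scoring values (match +1, mismatch -1, gap -2) are assumptions for the example.

```python
def cell(H, a, b, i, j, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman recurrence for one cell."""
    sub = match if a[i - 1] == b[j - 1] else mismatch
    return max(0, H[i - 1][j - 1] + sub, H[i - 1][j] + gap, H[i][j - 1] + gap)


def fill_row_major(a, b):
    """Reference fill, one row after the other."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            H[i][j] = cell(H, a, b, i, j)
    return H


def fill_antidiagonal(a, b, segment=4):
    """Fill the matrix one anti-diagonal (i + j == d) at a time."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for d in range(2, n + m + 1):
        cells = [(i, d - i) for i in range(max(1, d - m), min(n, d - 1) + 1)]
        # Split the anti-diagonal into segments of at most `segment` cells.
        for start in range(0, len(cells), segment):
            chunk = cells[start:start + segment]
            # Each value reads earlier diagonals only, so the cells of a
            # chunk could be computed by independent threads.
            values = [cell(H, a, b, i, j) for i, j in chunk]
            for (i, j), value in zip(chunk, values):
                H[i][j] = value
    return H


if __name__ == "__main__":
    a, b = "AGGGTCAA", "ACCTAGGA"
    print(fill_antidiagonal(a, b) == fill_row_major(a, b))
```

Because the segment size is independent of the sequence length, the same loop handles anti-diagonals longer than the number of available threads.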
Inter Level parallelism
18
● Parallel execution of
the alignments with
multiple blocks
● Each block is assigned one alignment
GPU memory optimizations
19
● To ensure coalesced memory access, one of the sequences is stored in reverse order on the GPU
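The effect of the reversed copy can be checked with index arithmetic. In the sketch below, consecutive threads process consecutive cells (i, j) of one anti-diagonal, with i + j == d. The slide does not say which sequence is reversed or how threads are indexed, so the mapping (thread to row index i, target sequence reversed) and the function name are assumptions.

```python
def antidiagonal_reads(d, n, m):
    """Indices read by consecutive threads on anti-diagonal d.

    Cells are 0-based (i, j) with i + j == d, i < n, j < m.
    Returns (query indices, target indices, indices into the reversed target).
    """
    i_lo, i_hi = max(0, d - (m - 1)), min(n - 1, d)
    query_idx = list(range(i_lo, i_hi + 1))
    target_idx = [d - i for i in query_idx]
    # Position of target[j] inside the reversed copy of the target.
    target_rev_idx = [(m - 1) - j for j in target_idx]
    return query_idx, target_idx, target_rev_idx


if __name__ == "__main__":
    q, t, t_rev = antidiagonal_reads(d=5, n=8, m=8)
    print("query       :", q)
    print("target      :", t)
    print("target (rev):", t_rev)
```

With the reversed copy, the indices read from both sequences increase by one from each thread to the next.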
Evaluation Settings
20
Benchmarked Applications:
● SeqAn: state-of-the-art library that includes a highly optimized version of X-Drop, run on 176 threads
● ksw2: state-of-the-art CPU SIMD implementation of Z-drop, run on 80 threads
● Bowtie2: state-of-the-art Smith-Waterman implementation, run on 64 threads
● CUDASW++ 3.0: state-of-the-art Smith-Waterman GPU + CPU SIMD implementation
Evaluation Settings
21
Platforms:
● Intel Haswell nodes: 2 Intel® Xeon® E5-2698 processors (64 threads) with 128 GB of RAM
● Intel Skylake nodes: 2 Intel® Xeon® Gold 6148 processors (80 threads) with 384 GB of RAM
● IBM POWER9 nodes: 2 IBM POWER9 processors (168 threads) with 512 GB of RAM
● GPU: NVIDIA Tesla V100 GPUs with 16 GB of HBM2 memory
Smith-Waterman Unoptimized
Roofline
22
Smith-Waterman Optimized
Roofline
23
Smith-Waterman Comparison
24
[Bar chart: speed-up labels 34x, 3x, 1x, 11x, 1x]
X-drop Unoptimized Roofline
25
X-drop Optimized Roofline
26
X-drop GPU and SeqAn Comparison
27
[Bar chart: speed-up labels 2x, 6x]
X-drop GPU and ksw2 Comparison
28
[Bar chart: speed-up labels 1.5x, 120x]
Conclusions
29
A methodology for automatic GPU kernel optimization and its implementation in a tool for automatic kernel optimization
We applied our methodology to two highly computationally intensive algorithms
Optimized GPU X-drop Implementation with:
● More than 6.6x speed-up with respect to SeqAn
● More than 120x speed-up with respect to ksw2
Optimized GPU Smith-Waterman Implementation with:
● More than 34x speed-up with respect to Bowtie2
● More than 3x speed-up with respect to CUDASW++ 3.0
Thank you for your attention
Algorithm Computation
31
● Computation Flow of the Smith-Waterman and
X-drop algorithms
Intra Level Parallelism
33
● Parallel reduction to find
the max of the
antidiagonal using warp
instructions
-1 2 5 0 -1 -2 3 1
-1 2 5 1
5 2
5
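The reduction steps shown above follow a halving-offset pattern: at each step, lane k keeps the maximum of its own value and the value held by lane k + offset. The Python sketch below emulates that pattern on a list. On the GPU this kind of step is typically written with warp shuffle instructions; the function name and the power-of-two restriction are assumptions of this sketch.

```python
def tree_max(values):
    """Maximum by halving offsets; returns (maximum, intermediate steps).

    The number of values must be a power of two, as in a full or half warp.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("number of values must be a power of two")
    lanes = list(values)
    steps = [list(lanes)]
    offset = n // 2
    while offset > 0:
        # Lane k combines its value with the one held by lane k + offset.
        lanes = [max(lanes[k], lanes[k + offset]) for k in range(offset)]
        steps.append(list(lanes))
        offset //= 2
    return lanes[0], steps


if __name__ == "__main__":
    maximum, steps = tree_max([-1, 2, 5, 0, -1, -2, 3, 1])
    for step in steps:
        print(*step)
    print("max:", maximum)
```

Eight values need three steps instead of seven sequential comparisons.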
Berkeley Roofline Model
34
Methodology Overview
35
[Diagram: Analysis Flow. Unoptimized source code; plot roofline and collect performance metrics; analyze kernel performance and generate optimizations; apply optimizations to the kernel; optimized source code]
GPU Architecture
39
● Thousands of cores that operate in a SIMD fashion, in contrast to the multiple sequential cores of a CPU
GPU Programming Model
40
[Figure: the host machine launches a kernel on the GPU as a grid of blocks, Block (0,0) to Block (1,6); each block contains threads, Thread (0,0) to Thread (4,2)]
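In this programming model, a thread usually derives the element it works on from its block index, the block size and its index within the block. The Python sketch below emulates the common one-dimensional case. The function names and sizes are hypothetical, and the bounds check mirrors the guard normally placed at the top of a kernel.

```python
def grid_size(n_items, block_dim):
    """Number of blocks needed so that every item gets a thread."""
    return (n_items + block_dim - 1) // block_dim


def global_thread_id(block_idx, block_dim, thread_idx):
    """Linear index of a thread within the whole grid."""
    return block_idx * block_dim + thread_idx


def emulate_launch(n_items, block_dim):
    """Return the item indices touched by a one-dimensional launch."""
    touched = []
    for block_idx in range(grid_size(n_items, block_dim)):
        for thread_idx in range(block_dim):
            gid = global_thread_id(block_idx, block_dim, thread_idx)
            if gid < n_items:  # threads past the end do nothing
                touched.append(gid)
    return touched


if __name__ == "__main__":
    print(grid_size(1000, 256))
    print(emulate_launch(10, 4))
```

The last block is only partly used when the item count is not a multiple of the block size, which is why the bounds check is needed.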
X-Drop Algorithm
41
● Optimized to follow the alignment of the two
sequences and stop when the two do not align
