A Parallel, Energy Efficient Hardware Architecture
for the merAligner on FPGA using Chisel HCL
Lorenzo Di Tucci, Marco Santambrogio {lorenzo.ditucci, marco.santambrogio}@polimi.it
Alessandro Comodi, Davide Conficconi {alessandro.comodi,davide.conficconi}@mail.polimi.it
Steven Hofmeyr, David Donofrio {shofmeyr, ddonofrio}@lbl.gov
RAW @ JW Marriott, Vancouver
May 22 2018
speaker: Alessandro Comodi
Context
Large amounts of genomic data
Algorithm complexity
In such a scenario, efficient solutions are needed in terms of both performance and power consumption
Sequence Alignment
Sequence alignment algorithms are among the most compute-intensive
Pure software solution
Poor performance
High power consumption
merAligner
To overcome performance issues, Lawrence Berkeley National Laboratory and UC Berkeley have proposed the merAligner
High performance
Low power efficiency (90 kW per cabinet)
More than 15,000 cores
Contributions
• The design and development of a hardware architecture for the Smith-Waterman algorithm on FPGA, using Chisel HCL
• The development of a wrapper written in Chisel, used to integrate RTL cores into the Xilinx SDAccel Framework
Smith-Waterman
The main bottleneck of the merAligner tool is its Smith-Waterman algorithm implementation
Highly parallel computation
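
To make the "highly parallel" point concrete, here is a minimal software sketch of Smith-Waterman scoring that walks the score matrix one anti-diagonal at a time. The scoring values (match, mismatch, linear gap penalty) and the function name are assumptions chosen for illustration; the slides do not state the scoring scheme used in the merAligner.

```python
def smith_waterman_score(read, ref, match=2, mismatch=-1, gap=-1):
    """Best local alignment score, computed anti-diagonal by anti-diagonal.

    The scoring parameters are illustrative assumptions, not taken from the slides.
    """
    rows, cols = len(read) + 1, len(ref) + 1
    # H[i][j] = best score of a local alignment ending at read[i-1], ref[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    # Cells with the same i + j form one anti-diagonal. Each cell reads only
    # H[i-1][j-1], H[i-1][j] and H[i][j-1], which all lie on earlier
    # anti-diagonals, so every cell of one anti-diagonal is independent of
    # the others and could be updated in the same hardware cycle.
    for d in range(2, rows + cols - 1):
        first_i = max(1, d - cols + 1)
        last_i = min(rows - 1, d - 1)
        for i in range(first_i, last_i + 1):
            j = d - i
            s = match if read[i - 1] == ref[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + s,  # extend with a match or mismatch
                H[i - 1][j] + gap,    # gap in the reference
                H[i][j - 1] + gap,    # gap in the read
            )
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman_score("GGTT", "AATT"))
```

The inner loop over `i` is the part that hardware can unroll: its iterations never read each other's results.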
Architecture
Systolic array-based design
Each processing element is fed with the result of the previous one
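
The chaining of processing elements can be sketched as a cycle-by-cycle software simulation. This is only a behavioural model under stated assumptions: one processing element (PE) per read character, reference characters streamed in one per cycle, and linear gap scoring. The slides do not detail the PE internals, and the names below are hypothetical.

```python
def systolic_sw_score(read, ref, match=2, mismatch=-1, gap=-1):
    """Simulate a linear systolic array; PE k holds read[k] (an assumption)."""
    n = len(read)
    left = [0] * n     # per-PE register: this PE's previous output, H[i][j-1]
    diag = [0] * n     # per-PE register: previous upstream value, H[i-1][j-1]
    pipe = [None] * n  # pipe[k] = (ref char, score) emitted by PE k last cycle
    best = 0
    # PE k handles reference character j at cycle j + k, so the whole matrix
    # is covered in len(ref) + len(read) - 1 cycles.
    for cycle in range(len(ref) + n - 1):
        new_pipe = [None] * n
        for k in range(n):
            if k == 0:
                # The first PE is fed from the reference stream; row 0 is all zeros.
                inp = (ref[cycle], 0) if cycle < len(ref) else None
            else:
                # Every other PE is fed with the result of the previous one.
                inp = pipe[k - 1]
            if inp is None:
                continue  # pipeline bubble: nothing to do this cycle
            char, up = inp
            s = match if char == read[k] else mismatch
            h = max(0, diag[k] + s, up + gap, left[k] + gap)
            diag[k] = up   # becomes H[i-1][j-1] for the next column
            left[k] = h    # becomes H[i][j-1] for the next column
            best = max(best, h)
            new_pipe[k] = (char, h)
        pipe = new_pipe
    return best


if __name__ == "__main__":
    print(systolic_sw_score("GGTT", "AATT"))
```

In this model the active PEs in a given cycle sit on one anti-diagonal of the score matrix, which is why the array needs a number of cycles proportional to the sum of the two sequence lengths rather than to their product.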
Results
Read    Reference  Frequency  Performance  Speed-up       Efficiency  Speed-up
                   [MHz]      [GCUPS]      (performance)  [GCUPS/W]   (power efficiency)
128[*]  1024       -          3.87         -              0.0165      -
128     1024       150        3.542        0.91X          0.141       8.54X
128     2048       140        5.616        1.45X          0.224       14.35X
128     4096       180        6.529        1.68X          0.261       15.81X
128     16384      110        11.443       2.84X          0.457       27.69X
256     1024       160        6.123        1.58X          0.244       14.78X
256     2048       160        8.393        2.16X          0.335       20.30X
256     4096       130        15.225       3.93X          0.609       36.90X
256     16384      140        27.312       7.05X          1.092       66.18X
[*] State of the Art Smith-Waterman software implementation
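
The speed-up columns can be cross-checked with simple arithmetic. The sketch below assumes that each speed-up is the ratio of the FPGA value to the software baseline row marked [*], and that dividing GCUPS by GCUPS/W gives the implied power draw; the slides do not spell out either definition, so both are assumptions. The numbers are copied from the table.

```python
# Software baseline (row marked [*] in the table).
BASELINE_GCUPS = 3.87
BASELINE_GCUPS_PER_W = 0.0165

# (read, reference, frequency MHz, GCUPS, GCUPS/W) for each FPGA configuration.
FPGA_ROWS = [
    (128, 1024, 150, 3.542, 0.141),
    (128, 2048, 140, 5.616, 0.224),
    (128, 4096, 180, 6.529, 0.261),
    (128, 16384, 110, 11.443, 0.457),
    (256, 1024, 160, 6.123, 0.244),
    (256, 2048, 160, 8.393, 0.335),
    (256, 4096, 130, 15.225, 0.609),
    (256, 16384, 140, 27.312, 1.092),
]


def derived_metrics(gcups, gcups_per_w):
    """Return (performance speed-up, efficiency speed-up, implied watts)."""
    return (
        gcups / BASELINE_GCUPS,
        gcups_per_w / BASELINE_GCUPS_PER_W,
        gcups / gcups_per_w,
    )


if __name__ == "__main__":
    for read, ref, _mhz, gcups, eff in FPGA_ROWS:
        perf_x, eff_x, watts = derived_metrics(gcups, eff)
        print(f"{read:>4} x {ref:<6} perf {perf_x:5.2f}X  "
              f"eff {eff_x:6.2f}X  implied power {watts:5.1f} W")
```

Under these assumptions the implied power of every FPGA configuration comes out at roughly 25 W, against roughly 235 W for the software baseline. Most recomputed ratios agree with the table to within rounding; two entries (2.84X for 128 x 16384 and 14.35X for 128 x 2048) recompute to about 2.96 and 13.58, and the slides do not explain the difference.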
Concluding Remarks
Thank you for your attention!
Hardware architecture for the acceleration of the
Smith-Waterman step of the merAligner on FPGA
using Chisel HCL