HUG + Nomica: a scalable FPGA-based architecture for variant-calling

Intel PSG, San Jose - CA
HUGenomics
A Support to Genomics Research
Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano
Lorenzo Di Tucci, Giulia Guidi, Marco Rabozzi, Sara Notargiacomo,
Marco D. Santambrogio
06/01/2018
lorenzo.ditucci@polimi.it

Genomic research
Recent advances in genomic research make it possible to perform multiple analyses on DNA, affecting many different fields
However, extracting biological meaning from secondary genome analysis requires a complex process

Genome sequencing
Given a biological sample, genome sequencing is the process of determining the precise order of nucleotides within a DNA molecule
This process produces short DNA fragments (reads), which need to be assembled to reconstruct the original sequence
ACGTAGCTCGGACCATAGCA
CCGCCGTAGCTCGGACCATAGCACATG
AGTTTTGGGGGACCATAGCACATGGACACATGC
GGACCATAGCACATGGACACATGC
GGTCAAAAATAGCACATGGACACATGC
ATTGTATCGGACCATATTGCTTAGCATGTATTTGC
CATGGACACATGC
CGTAACCATAGCACATGGACACATGC
TTTTAGGTAATTGCCATAGCACATGGACACAT

Genome assembly
Genome assembly: reconstruct a genome from a set of shorter reads
Reference-based assembly
ACGTAGCTCGGACCATAGCA
GGACCATAGCACATGGACACATGC
ACGTAGCTCGGACCATAGCAGGACCATAGCACATGGACATGGACACATGCTTA
CATGGACACATGC
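Reference-based assembly places each read where it best matches the reference. A minimal sketch, assuming an ungapped scan that keeps the leftmost position with the fewest mismatches (Hamming distance); `map_read` and `hamming` are illustrative names, and the reference string is the long sequence shown in the figure. Real mappers index the reference and allow gaps: the middle read in the figure does not occur verbatim in that sequence, so this toy scan only places the first and last reads exactly.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_read(reference: str, read: str) -> tuple[int, int]:
    """Return (best_position, mismatches) for an ungapped placement of `read`."""
    best_pos, best_mm = -1, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        mm = hamming(reference[pos:pos + len(read)], read)
        if mm < best_mm:  # strict '<' keeps the leftmost best placement
            best_pos, best_mm = pos, mm
    return best_pos, best_mm


# Long sequence from the figure, used here as the reference (illustrative only).
REFERENCE = "ACGTAGCTCGGACCATAGCAGGACCATAGCACATGGACATGGACACATGCTTA"

print(REFERENCE)
for read in ("ACGTAGCTCGGACCATAGCA", "CATGGACACATGC"):
    pos, mm = map_read(REFERENCE, read)
    print(" " * pos + read, f"(pos={pos}, mismatches={mm})")
```

The printout shifts each read under the reference at its mapped position; the two reads land at positions 0 and 37.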

De novo assembly
ACGTAGCTCGGACCATAGCAGGACCATAGCACATGGACATGGACACATGCTTA
Reference-based applications are limited to species with an available reference genome
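De novo assembly has no reference to map against, so reads are joined through their overlaps. A minimal sketch of greedy overlap merging, assuming exact suffix-prefix overlaps of at least three bases; the reads are hypothetical fragments of the first 16 bases of the sequence above, and the function names are illustrative. Production assemblers rely on overlap or de Bruijn graphs instead, because greedy merging can go wrong on repeats such as the GGACCATAGCA stretch that occurs twice in that sequence.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is a prefix of `b` (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the ordered pair of contigs with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:  # no overlaps left: return the separate contigs
            break
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = ["ACGTAGCT", "CGGACCAT", "AGCTCGGA"]  # hypothetical reads, shuffled order
print(greedy_assemble(reads))
```

The three shuffled reads collapse into the single contig ACGTAGCTCGGACCAT.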

De novo assembly
Genomics algorithms are usually:
• compute-intensive
• data-intensive (massive amounts of data)
• fast-changing
Issue:
• General-purpose architectures are inefficient
Solution:
• In such a scenario, hardware accelerators have proven effective at optimizing the performance-to-power-consumption ratio

Hardware architectures
Learning curve for multiple architectures

Objective
An advanced support for genomic research that exploits heterogeneous hardware architectures

Genomics Hardware Pipeline
PIPELINE CREATION → DATA UPLOAD → PROCESSING → DATA VISUALIZATION
"HUG has exactly what I need!"

Scientific Data Visualization

Custom Code Integration
HETEROGENEOUS ARCHITECTURE
HUG
"I’d like to integrate my own algorithm"
RAW READ → CONTIGGING → SCAFFOLDING → RE-SCAFFOLDING → ANNOTATION

Genomics Hardware Pipeline
Is the algorithm available on HUG?
• YES: PIPELINE CREATION OR INTEGRATION → DATA UPLOAD → PROCESSING → DATA VISUALIZATION
• NO: FAST PROTOTYPING of a CUSTOM HARDWARE ALGORITHM, then PIPELINE CREATION OR INTEGRATION

HUG Today
Smith-Waterman
[1] Lorenzo Di Tucci, Giulia Guidi, Sara Notargiacomo, Luca Cerina, Alberto Scolari, and Marco D. Santambrogio. "HUGenomics: A support to personalized medicine research." In Research and Technologies for Society and Industry (RTSI), 2017 IEEE 3rd International Forum on, pp. 1-5. IEEE, 2017.
[2] Lorenzo Di Tucci, Davide Conficconi, Alessandro Comodi, Steven Hofmeyr, David Donofrio, and Marco D. Santambrogio. "A Parallel, Energy Efficient Hardware Architecture for the merAligner on FPGA using Chisel HDL." In Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2018 IEEE International, to be published. IEEE, 2018.
[1]
[2]

HUG Today
[1]
RAW READ CONTIGGING SCAFFOLDING RE-SCAFFOLDING ANNOTATION
[2]
Sequence Alignment via Smith-Waterman Algorithm [2]

Smith-Waterman
Platform Performance [GCUPS] Power Efficiency [GCUPS/W]
AWS-VU9P (3 queries in parallel) 110.0 4.400
Tesla K20 45.0 0.200
ADM-PCIE-KU3 42.5 1.699
Nvidia GeForce GTX 295 30.0 0.104
Xtreme Data XD1000 25.6 0.430
Altera Stratix V on Nallatech PCIe-385 24.7 0.988
ADM-PCIE-7V3 14.8 0.594
Dual-core Nvidia 9800 GX2 14.5 0.074
Xtreme Data XD2000i 9.0 0.150
2x Nvidia GeForce 8800 3.6 0.017
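GCUPS counts how many dynamic-programming cells of the Smith-Waterman matrix a platform updates per second, and the efficiency column divides that by power. A minimal score-only sketch, assuming a linear gap penalty and illustrative scores (match 2, mismatch -1, gap -1); `smith_waterman`, `gcups` and `implied_power_watts` are hypothetical helpers, not code from [1] or [2]. Dividing the two table columns gives the power each platform was credited with, for example about 25 W for the AWS-VU9P row.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score with a linear gap penalty (score only, two rows of memory)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # local alignment: a score never drops below zero
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def gcups(cells: int, seconds: float) -> float:
    """Giga cell updates per second: DP cells computed / elapsed time / 1e9."""
    return cells / seconds / 1e9


def implied_power_watts(perf_gcups: float, eff_gcups_per_w: float) -> float:
    """Power implied by the Performance and Power Efficiency columns."""
    return perf_gcups / eff_gcups_per_w


query, target = "ACGT", "TTACGTTT"
print(smith_waterman(query, target))
print(gcups(len(query) * len(target), 1e-6))  # one cell per (query, target) base pair
print(round(implied_power_watts(110.0, 4.400), 1))
```

Accelerators compute many cells of the same anti-diagonal in parallel; this sequential loop only pins down the recurrence being accelerated.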

HUG Today
Pattern Matching to identify gene motifs during Gene Annotation
[1]
[2]

Pattern Matching via TiREX
Regular Expression Flex* 16-core† (VC707) Speedup
ACCGTGGA 271 µs 2.07 µs 130.90X
(TTT)+CT 121 µs 4.54 µs 26.65X
(CAGT)|(GGGG)|(TTGG)TGCA(C|G)+ 263 µs 3.36 µs 78.27X
* running on an Intel i7 with a peak frequency of 2.8 GHz
† running at 130 MHz+
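A software sketch of the motif search with Python's `re` module, using the three expressions from the table verbatim on a hypothetical toy sequence. Python applies leftmost, non-overlapping matching, which we assume is close enough to illustrate what is being searched for; it is not the Flex baseline that was measured, nor TiREX itself. The `speedup` helper just divides the two reported times: it reproduces 26.65X and 78.27X, and gives about 130.92X for the first row, which the table lists as 130.90X.

```python
import re

# Motifs from the TiREX table, kept verbatim. "|" binds loosest, so the third
# pattern means CAGT, or GGGG, or TTGGTGCA followed by one or more C/G.
PATTERNS = ["ACCGTGGA", "(TTT)+CT", "(CAGT)|(GGGG)|(TTGG)TGCA(C|G)+"]

# Reported times in microseconds: (Flex software baseline, 16-core TiREX on VC707).
TIMES_US = {
    "ACCGTGGA": (271.0, 2.07),
    "(TTT)+CT": (121.0, 4.54),
    "(CAGT)|(GGGG)|(TTGG)TGCA(C|G)+": (263.0, 3.36),
}


def find_motifs(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """Leftmost, non-overlapping matches as (start, matched text)."""
    return [(m.start(), m.group(0)) for m in re.finditer(pattern, sequence)]


def speedup(pattern: str) -> float:
    """Baseline time divided by accelerated time, from the reported numbers."""
    cpu_us, fpga_us = TIMES_US[pattern]
    return cpu_us / fpga_us


seq = "AACCGTGGA" + "TTTTTTCT" + "CAGT" + "TTGGTGCACCG"  # hypothetical toy sequence
for p in PATTERNS:
    print(p, find_motifs(p, seq), f"{speedup(p):.2f}X")
```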

HUG Today
[1]
[2]
PairHMM for gene prediction/finding in the Gene Annotation phase

Davide Sampietro
Chiara Crippa
Lorenzo Di Tucci
Emanuele Del Sozzo
{davide.sampietro, chiara2.crippa}@mail.polimi.it
{lorenzo.ditucci, emanuele.delsozzo, marco.santambrogio}@polimi.it

Existing workflows (GATK)
Haplotype Caller
MuTect2
[Figure: the PairHMM Forward Algorithm (FA), with states M, X and Y, inside both workflows]

PairHMM
A statistical model that allows one to establish the correlation between two strings
It serves as a means to account for sequencing errors
Observable process (pair of words):
Read: ATACGCCACT
Hap: ATCCTGACT
Underlying model (PairHMM)

Some alignments look very unlikely:
ATACGC--CCACT
AT-C-CTGA--CT
Others make much more sense, even though they are not perfect:
ATAC-GCCACT
ATCCTG--ACT
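The intuition can be made concrete with a toy three-state pair HMM, assuming constant gap-open, gap-extension and error probabilities chosen only for illustration. The hypothetical `alignment_log_prob` scores one fixed alignment: both alignments above contain exactly one mismatch, but the first opens four gaps instead of two, so it comes out far less likely.

```python
import math


def alignment_log_prob(top: str, bottom: str, err: float = 0.01,
                       gap_open: float = 0.001, gap_ext: float = 0.1) -> float:
    """Natural-log probability of ONE gapped alignment under a toy 3-state pair HMM.

    Column types: letter/letter -> M, letter/'-' -> X, '-'/letter -> Y.
    Gap columns carry no emission term here (a simplification).
    """
    trans = {
        ("M", "M"): 1.0 - 2.0 * gap_open, ("M", "X"): gap_open, ("M", "Y"): gap_open,
        ("X", "X"): gap_ext, ("X", "M"): 1.0 - gap_ext,
        ("Y", "Y"): gap_ext, ("Y", "M"): 1.0 - gap_ext,
    }
    log_p, prev = 0.0, "M"  # assume the alignment starts from the match state
    for a, b in zip(top, bottom):
        state = "X" if b == "-" else ("Y" if a == "-" else "M")
        t = trans.get((prev, state))
        if t is None:  # direct X <-> Y jumps are not modelled
            return -math.inf
        log_p += math.log(t)
        if state == "M":  # emission: correct base vs. sequencing error
            log_p += math.log(1.0 - err if a == b else err / 3.0)
        prev = state
    return log_p


unlikely = alignment_log_prob("ATACGC--CCACT", "AT-C-CTGA--CT")
better = alignment_log_prob("ATAC-GCCACT", "ATCCTG--ACT")
print(f"{unlikely:.2f} < {better:.2f}")
```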

PairHMM Forward Algorithm
Takes into account every possible alignment
Serves as a measure of the correlation between two strings

The problem is reformulated through dynamic programming
States: M (match), X and Y (gaps); transitions MM, MX, MY, XX, YY and GM (gap to match)
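A minimal forward-algorithm sketch, assuming constant transition probabilities (GATK derives them per read base from quality scores) and a simple match/mismatch prior. The updates follow the FA datapath, read with i walking the haplotype and j walking the read, which is our interpretation of its indices; `pairhmm_forward` and its parameters are hypothetical. Summing M and X over the last read column adds up every alignment instead of keeping only the best one.

```python
def pairhmm_forward(hap: str, read: str, err: float = 0.01,
                    gap_open: float = 0.001, gap_ext: float = 0.1) -> float:
    """Likelihood of `read` given `hap`, summed over every alignment (toy parameters)."""
    H, R = len(hap), len(read)
    # Toy transition probabilities, identical for every read position j.
    mm, mx, my = 1.0 - 2.0 * gap_open, gap_open, gap_open
    gm, xx, yy = 1.0 - gap_ext, gap_ext, gap_ext
    M = [[0.0] * (R + 1) for _ in range(H + 1)]
    X = [[0.0] * (R + 1) for _ in range(H + 1)]
    Y = [[0.0] * (R + 1) for _ in range(H + 1)]
    for i in range(H + 1):
        Y[i][0] = 1.0 / H  # the read may start at any haplotype position
    for i in range(1, H + 1):        # i walks the haplotype
        for j in range(1, R + 1):    # j walks the read
            prior = 1.0 - err if hap[i - 1] == read[j - 1] else err / 3.0
            M[i][j] = prior * (mm * M[i - 1][j - 1] + gm * (X[i - 1][j - 1] + Y[i - 1][j - 1]))
            X[i][j] = mx * M[i][j - 1] + xx * X[i][j - 1]
            Y[i][j] = my * M[i - 1][j] + yy * Y[i - 1][j]
    # The whole read must be consumed; it may end anywhere on the haplotype.
    return sum(M[i][R] + X[i][R] for i in range(1, H + 1))


hap = "ACGTACGT"
print(pairhmm_forward(hap, "GTAC"), pairhmm_forward(hap, "TTTT"))
```

The read GTAC, which occurs in the haplotype, scores orders of magnitude higher than TTTT.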

FA – Data dependencies
[Figure: data dependencies in the read × haplotype matrix]
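Each cell needs its left, upper and upper-left neighbours, so all cells on the same anti-diagonal (i + j constant) are independent; this is what lets a systolic array work on a whole anti-diagonal per step. A small sketch, with hypothetical helper names, that builds that wavefront order and checks it against the three dependencies.

```python
def antidiagonals(rows: int, cols: int) -> list[list[tuple[int, int]]]:
    """Group the DP cells (i, j), 1-based, by wavefront step d = i + j."""
    waves = []
    for d in range(2, rows + cols + 1):
        lo, hi = max(1, d - cols), min(rows, d - 1)
        waves.append([(i, d - i) for i in range(lo, hi + 1)])
    return waves


def respects_dependencies(waves: list[list[tuple[int, int]]]) -> bool:
    """True if every cell's inputs (i-1, j-1), (i, j-1), (i-1, j) finished in an earlier wave."""
    done = set()
    for wave in waves:
        for i, j in wave:
            for di, dj in ((i - 1, j - 1), (i, j - 1), (i - 1, j)):
                boundary = di == 0 or dj == 0  # row/column 0 holds initial values
                if not boundary and (di, dj) not in done:
                    return False
        done.update(wave)
    return True


waves = antidiagonals(3, 4)  # hypothetical 3 x 4 matrix
for step, wave in enumerate(waves):
    print(step, wave)
print(respects_dependencies(waves))
```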

FA – datapath
M[i][j] = prior · (mm[j] · M[i-1][j-1] + gm[j] · (X[i-1][j-1] + Y[i-1][j-1]))
X[i][j] = mx[j] · M[i][j-1] + xx[j] · X[i][j-1]
Y[i][j] = my[j] · M[i-1][j] + yy[j] · Y[i-1][j]

Tree height reduction
[S. Huang et al., Hardware Acceleration of the Pair-HMM Algorithm for DNA Variant Calling]
M[i][j] = prior · (mm[j] · M[i-1][j-1] + gm[j] · (X[i-1][j-1] + Y[i-1][j-1]))

XY[i][j] = gm[j] · (X[i][j-1] + Y[i][j-1])
M[i][j] = prior · (mm[j] · M[i-1][j-1] + XY[i-1][j])

XY[i][j] = gm[j] · (X[i][j-1] + Y[i][j-1])
MMM[i][j] = mm[j] · M[i][j-1]
M[i][j] = prior · (MMM[i-1][j] + XY[i-1][j])
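The rewrite can be checked numerically. The sketch below, assuming the same toy constant probabilities as a plain forward pass, precomputes XY[i][j] = gm · (X[i][j-1] + Y[i][j-1]) and MMM[i][j] = mm · M[i][j-1] as soon as their inputs exist, then compares the match matrix against the unreduced update; `forward_M` is a hypothetical name. In software both versions perform the same arithmetic in the same order, so the matrices match exactly; the gain is a shorter critical path in hardware, since M[i][j] then needs only one addition and one multiplication by prior.

```python
def forward_M(hap: str, read: str, reduced: bool, err: float = 0.01,
              gap_open: float = 0.001, gap_ext: float = 0.1) -> list[list[float]]:
    """Match matrix of a toy PairHMM forward pass, with or without tree height reduction."""
    H, R = len(hap), len(read)
    mm, mx, my = 1.0 - 2.0 * gap_open, gap_open, gap_open
    gm, xx, yy = 1.0 - gap_ext, gap_ext, gap_ext

    def grid() -> list[list[float]]:
        return [[0.0] * (R + 2) for _ in range(H + 1)]  # extra column for the j + 1 writes

    M, X, Y, XY, MMM = grid(), grid(), grid(), grid(), grid()
    for i in range(H + 1):
        Y[i][0] = 1.0 / H
        XY[i][1] = gm * (X[i][0] + Y[i][0])  # ready before column 1 starts
        MMM[i][1] = mm * M[i][0]
    for i in range(1, H + 1):
        for j in range(1, R + 1):
            prior = 1.0 - err if hap[i - 1] == read[j - 1] else err / 3.0
            if reduced:
                # one addition and one multiplication by prior on the M path
                M[i][j] = prior * (MMM[i - 1][j] + XY[i - 1][j])
            else:
                M[i][j] = prior * (mm * M[i - 1][j - 1] + gm * (X[i - 1][j - 1] + Y[i - 1][j - 1]))
            X[i][j] = mx * M[i][j - 1] + xx * X[i][j - 1]
            Y[i][j] = my * M[i - 1][j] + yy * Y[i - 1][j]
            # eagerly prepare the partial terms that cell (i + 1, j + 1) will need
            XY[i][j + 1] = gm * (X[i][j] + Y[i][j])
            MMM[i][j + 1] = mm * M[i][j]
    return M


hap, read = "ACGTACGTTA", "GTACGA"
print(forward_M(hap, read, reduced=True) == forward_M(hap, read, reduced=False))
```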

Limitations of Systolic Arrays
Systolic array size tuning
▪ Longer reads cannot be processed
▪ Shorter reads leave the PEs idle
Solution: instantiate a fixed-size PE array
▪ Longer reads are chunked
▪ Upper bound on idle PEs

Our flavour: PE Chunks
e.g. 2 PEs instead of 5
[Figure: the read and the haplotype processed by a chunk of PEs]
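A sketch of the chunking arithmetic, assuming each PE handles one read base per pass and that values leaving the last PE are buffered and fed back to the first PE for the next pass (an assumption about the mechanism, not taken from the slides). With 2 PEs a 5-base read needs three passes and leaves one PE idle in the last pass; in general fewer than P PEs are ever idle. Helper names are hypothetical.

```python
def chunk_read(read_len: int, num_pes: int) -> list[list[int]]:
    """Split read positions 0..read_len-1 into passes of at most num_pes (one position per PE)."""
    return [list(range(start, min(start + num_pes, read_len)))
            for start in range(0, read_len, num_pes)]


def idle_pes_last_pass(read_len: int, num_pes: int) -> int:
    """PEs left without work in the final pass; always smaller than num_pes."""
    return (-read_len) % num_pes


# "e.g. 2 PEs instead of 5": a 5-base read on a 2-PE array
print(chunk_read(5, 2), idle_pes_last_pass(5, 2))
```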

Results
Platform                    #PEs  Frequency (MHz)  Time (ms)  Speedup
Java on CPU                 N/A   N/A              10800      1X
Intel FPGA Arria 10         128   180              89         121X
Nvidia K40                  N/A   N/A              70         154X
Stratix V D8 (Intel FPGA)   64    207              8.3        1301X
PE Ring (Stratix V)         64    200              5.3        2038X
Arria 10 (Intel FPGA)       208   213              2.8        3857X
PE Ring (Arria 10)          128   231              2.6        4154X
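Speedup in the Results table is the Java baseline time divided by each platform's time. The sketch recomputes it from the Time column (the dictionary simply restates the table; names are illustrative) and rounds to the nearest integer, which reproduces every reported value.

```python
CPU_BASELINE_MS = 10800.0  # "Java on CPU" row

# Time (ms) and reported speedup, restated from the Results table.
RESULTS = {
    "Intel FPGA Arria 10": (89.0, 121),
    "Nvidia K40": (70.0, 154),
    "Stratix V D8 (Intel FPGA)": (8.3, 1301),
    "PE Ring (Stratix V)": (5.3, 2038),
    "Arria 10 (Intel FPGA)": (2.8, 3857),
    "PE Ring (Arria 10)": (2.6, 4154),
}


def speedup(time_ms: float, baseline_ms: float = CPU_BASELINE_MS) -> float:
    """Speedup over the software baseline: baseline time / accelerated time."""
    return baseline_ms / time_ms


for platform, (time_ms, reported) in RESULTS.items():
    print(f"{platform:28s} {speedup(time_ms):8.1f}X (reported {reported}X)")
```

The same division gives about 1.08x between the two Arria 10 designs (2.8 ms vs 2.6 ms), with the PE Ring using 128 PEs instead of 208.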

Limitations of Systolic Arrays
Pipeline filling latency
▪ Dead times when switching to the next haplotype
Solution: concatenate several haplotypes
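A back-of-the-envelope cycle model of the dead times, assuming a linear array of P PEs in which each PE starts one cycle after its predecessor and one haplotype base enters per cycle, so a haplotype of H bases keeps the array busy for H + P - 1 cycles. Concatenating haplotypes pays the P - 1 fill latency only once, assuming PEs can reset their state at haplotype boundaries. Function names and sizes are hypothetical.

```python
def cycles_one_by_one(hap_lens: list[int], num_pes: int) -> int:
    """Each haplotype refills and drains the PE pipeline: H + P - 1 cycles apiece."""
    return sum(h + num_pes - 1 for h in hap_lens)


def cycles_concatenated(hap_lens: list[int], num_pes: int) -> int:
    """Haplotypes streamed back to back pay the P - 1 fill latency only once."""
    return sum(hap_lens) + num_pes - 1


def utilization(hap_lens: list[int], cycles: int) -> float:
    """Fraction of PE-cycles doing useful work, assuming the read chunk fills every PE."""
    return sum(hap_lens) / cycles


haps, pes = [100, 100, 100], 64  # hypothetical sizes
for name, fn in (("one by one", cycles_one_by_one), ("concatenated", cycles_concatenated)):
    c = fn(haps, pes)
    print(f"{name:12s} {c} cycles, utilization {utilization(haps, c):.2f}")
```

For three 100-base haplotypes on 64 PEs, utilization rises from about 0.61 to about 0.83.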

Dead times in computation
[Figure: the PE array processing a read against a haplotype]

Fine-grained Parallelism
[Figure: the PE array processing a read against haplotype 1 and haplotype 2]

Our Solution
[Figure: our solution integrated into the GATK workflow (Java) through a C++ Library, from input data to output data]

Thanks for your attention
Lorenzo Di Tucci, Giulia Guidi, Marco Rabozzi, Davide Sampietro,
Emanuele Del Sozzo, Chiara Crippa, Sara Notargiacomo,
lorenzo.ditucci@polimi.it
davide.sampietro@mail.polimi.it
