In the coming years, genome research will likely transform medical and agricultural practices. The unique genetic profile of a species is leading to the development of customized treatments, from personalized medicine to agrigenomics, but the exponential growth of available genomic data requires a computational effort that may limit the progress of these fields. Within this context, we propose HUGenomics, a novel hardware and software framework to support genomics research. The framework aims to facilitate the genome assembly process by means of both hardware-accelerated algorithms and scientific data visualization tools. The system raises the level of abstraction, allowing users to easily integrate custom algorithms into the hardware pipeline without any knowledge of the underlying architecture. In the context of this tool we also present Nomica, an accelerator for variant-calling pipelines, which make it possible to spot potentially harmful variants in a target genome. In particular, these pipelines employ the PairHMM Forward Algorithm, which emerged as the bottleneck of the procedure as well as the step that would benefit the most from parallelization. The goal of Nomica is to explore the paradigm of Heterogeneous System Architectures (HSA) and to integrate an FPGA-based hardware accelerator into GATK, a state-of-the-art software solution that supports research in the genomic field.
HUG + Nomica: a scalable FPGA-based architecture for variant-calling
1. Intel PSG, San Jose - CA
HUGenomics
A Support to Genomics Research
Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano
Lorenzo Di Tucci, Giulia Guidi, Marco Rabozzi, Sara Notargiacomo,
Marco D. Santambrogio
06/01/2018
lorenzo.ditucci@polimi.it
2. Genomic research
Recent advances in genomic research make it possible to perform multiple
analyses on DNA, with an impact on different fields
However, extracting biological meaning from secondary genome
analysis requires a complex process
3. Genome sequencing
Given a biological sample, genome sequencing is the process of
determining the precise order of nucleotides within a DNA molecule
This process produces short DNA fragments which need to be assembled
to reconstruct the original sequence
ACGTAGCTCGGACCATAGCA
CCGCCGTAGCTCGGACCATAGCACATG
AGTTTTGGGGGACCATAGCACATGGACACATGC
GGACCATAGCACATGGACACATGC
GGTCAAAAATAGCACATGGACACATGC
ATTGTATCGGACCATATTGCTTAGCATGTATTTGC
CATGGACACATGC
CGTAACCATAGCACATGGACACATGC
TTTTAGGTAATTGCCATAGCACATGGACACAT
4. Genome assembly
Genome assembly: reconstruct a genome from a set of shorter reads
Reference-based assembly
ACGTAGCTCGGACCATAGCA
GGACCATAGCACATGGACACATGC
ACGTAGCTCGGACCATAGCAGGACCATAGCACATGGACATGGACACATGCTTA
CATGGACACATGC
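As a minimal sketch of the reference-based idea, the snippet below places each read at the first position where it occurs exactly in a reference string. The function name `place_reads` and the toy sequences are hypothetical and are not the ones shown on the slide; real aligners also tolerate mismatches and gaps, which this sketch does not.

```python
def place_reads(reference: str, reads: list[str]) -> dict[str, int]:
    """Map each read to the first position where it occurs exactly in the reference.

    Reads that do not occur exactly are reported with position -1.
    """
    return {read: reference.find(read) for read in reads}


# Toy data (hypothetical, not the sequences shown on the slide).
reference = "ACGTAGCTCGGACCATAGCA"
reads = ["ACGTAG", "GGACCA", "TAGCA", "TTTT"]
placements = place_reads(reference, reads)
for read, pos in placements.items():
    print(f"{read:>8} -> {pos}")
```

Exact matching keeps the example short; it only illustrates why a reference genome must be available before reads can be placed.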
5. Genome assembly
Genome assembly: reconstruct a genome from a set of shorter reads
De novo assembly
ACGTAGCTCGGACCATAGCAGGACCATAGCACATGGACATGGACACATGCTTA
Reference-based assembly is limited to species with available reference genomes; de novo assembly needs no reference
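Without a reference, reads can only be joined through the overlaps they share with each other. The sketch below shows that basic step under simplifying assumptions (exact overlaps, no sequencing errors); `overlap` and `merge` are hypothetical helper names, and production assemblers rely on far more elaborate graph-based methods.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge(a: str, b: str) -> str:
    """Merge two reads, collapsing their longest suffix/prefix overlap."""
    return a + b[overlap(a, b):]


# Toy reads (hypothetical): the suffix "TAGC" of the first is a prefix of the second.
contig = merge("ACGTAGC", "TAGCTCGG")
print(contig)
```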
6. De novo assembly
Genomics algorithms are usually:
• compute-intensive
• data-intensive (massive amounts of data)
• fast-changing
Issue:
• General-purpose architectures are inefficient for these workloads
Solution:
• In this scenario, hardware accelerators have proved effective at
optimizing the performance-to-power-consumption ratio
14. Custom Code Integration
HETEROGENEOUS ARCHITECTURE
HUG
I’d like to integrate my own algorithm
RAW READ CONTIGGING SCAFFOLDING RE-SCAFFOLDING ANNOTATION
16. HUG Today
Smith-Waterman
[1] Lorenzo Di Tucci, Giulia Guidi, Sara Notargiacomo, Luca Cerina, Alberto Scolari, and Marco D. Santambrogio. "HUGenomics: A support to personalized medicine research." In Research and Technologies for Society and Industry (RTSI), 2017 IEEE 3rd International Forum on, pp. 1-5. IEEE, 2017.
[2] Lorenzo Di Tucci, Davide Conficconi, Alessandro Comodi, Steven Hofmeyr, David Donofrio, and Marco Domenico Santambrogio. "A Parallel, Energy Efficient Hardware Architecture for the merAligner on FPGA using Chisel HDL." In Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2018 IEEE International, t.b.p. IEEE, 2018.
17. HUG Today
RAW READ CONTIGGING SCAFFOLDING RE-SCAFFOLDING ANNOTATION
Sequence Alignment via Smith-Waterman Algorithm [2]
18. Smith-Waterman
Platform                                  Performance [GCUPS]   Power Efficiency [GCUPS/W]
AWS-VU9P (3 queries in parallel)          110.0                 4.400
Tesla K20                                 45.0                  0.200
ADM-PCIE-KU3                              42.5                  1.699
Nvidia GeForce GTX 295                    30.0                  0.104
Xtreme Data XD1000                        25.6                  0.430
Altera Stratix V on Nallatech PCIe-385    24.7                  0.988
Nvidia GeForce GTX 295                    16.1                  0.056
ADM-PCIE-7V3                              14.8                  0.594
Dual-core Nvidia 9800 GX2                 14.5                  0.074
Nvidia GeForce GTX 280                    9.7                   0.041
Xtreme Data XD2000i                       9.0                   0.150
2XNvidia GeForce 8800                     3.6                   0.017
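To make the GCUPS metric concrete, here is a minimal software sketch of Smith-Waterman local alignment scoring, together with the cell-update count behind the metric. The scoring values (match 2, mismatch -1, linear gap -1) and the function names are assumptions chosen for illustration; the accelerators in the table may use different scoring schemes and are not reproduced here.

```python
def smith_waterman_score(query: str, target: str,
                         match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score with a linear gap penalty (illustrative scoring)."""
    prev = [0] * (len(target) + 1)
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            diag = prev[j - 1] + (match if query[i - 1] == target[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def gcups(query_len: int, target_len: int, seconds: float) -> float:
    """Giga cell updates per second: matrix cells filled divided by runtime."""
    return query_len * target_len / seconds / 1e9


print(smith_waterman_score("ACGT", "ACGT"))
print(gcups(1_000, 1_000_000, 1.0))
```

Every cell of the query-by-target matrix is updated once, which is why throughput is reported in cell updates per second regardless of the platform.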
21. HUG Today
RAW READ CONTIGGING SCAFFOLDING RE-SCAFFOLDING ANNOTATION
Pattern Matching to identify gene motifs during Gene Annotation
22. Pattern Matching via TiREX
Regular Expression               Flex*     16-core† (VC707)   Speedup
ACCGTGGA                         271 µs    2.07 µs            130.90X
(TTT)+CT                         121 µs    4.54 µs            26.65X
(CAGT)|(GGGG)|(TTGG)TGCA(C|G)+   263 µs    3.36 µs            78.27X
* running on an Intel i7 with a peak frequency of 2.8 GHz
† running at 130 MHz+
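For readers who want to try the three expressions in software, the sketch below runs them with Python's standard `re` module. The test sequence is hypothetical and was built so that each pattern matches exactly once; `find_motifs` is an illustrative helper and has nothing to do with how TiREX or Flex evaluate the expressions.

```python
import re

# The three expressions from the table, written in Python's `re` syntax.
PATTERNS = [
    r"ACCGTGGA",
    r"(TTT)+CT",
    r"(CAGT)|(GGGG)|(TTGG)TGCA(C|G)+",
]


def find_motifs(sequence: str) -> dict[str, list[tuple[int, int]]]:
    """Return the (start, end) span of every non-overlapping match of each pattern."""
    return {p: [m.span() for m in re.finditer(p, sequence)] for p in PATTERNS}


# Hypothetical DNA fragment, assembled piece by piece for readability.
sequence = "AACCGTGGA" + "TTTTTT" + "CT" + "GG" + "TTGGTGCA" + "CG" + "A"
hits = find_motifs(sequence)
for pattern, spans in hits.items():
    print(pattern, spans)
```

Note that in the third expression the alternation binds loosest, so it reads as `CAGT`, or `GGGG`, or `TTGGTGCA` followed by one or more `C`/`G`.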
23. HUG Today
RAW READ CONTIGGING SCAFFOLDING RE-SCAFFOLDING ANNOTATION
PairHMM for gene prediction/finding in the Gene Annotation phase
24. Davide Sampietro
Chiara Crippa
Lorenzo Di Tucci
Emanuele Del Sozzo
Marco D. Santambrogio
{davide.sampietro, chiara2.crippa}@mail.polimi.it
{lorenzo.ditucci, emanuele.delsozzo, marco.santambrogio}@polimi.it
30. PairHMM
A statistical model that makes it possible to establish the correlation between two strings.
It serves as a means to correct sequencing errors
Observable process
(pair of words)
Read: ATACGCCACT
Hap: ATCCTGACT
Underlying model
(PairHMM)
31. PairHMM
Some alignments look very unlikely:
ATACGC--CCACT
AT-C-CTGA--CT
32. PairHMM
Some others make much more sense:
ATAC-GCCACT
ATCCTG--ACT
33. PairHMM
Even the more plausible alignments are not perfect:
ATAC-GCCACT
ATCCTG--ACT
35. PairHMM Forward Algorithm
Serves as a measure of the correlation between two strings
Takes into account every possible alignment
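The idea of summing over every possible alignment can be sketched as a three-matrix dynamic program. What follows is a simplified illustration, not the GATK or Nomica implementation: it assumes constant error and gap probabilities (`base_error`, `gap_open`, `gap_extend` are made-up values), whereas production pipelines derive them from per-base quality scores. The matrix names `M`, `X`, `Y` and the coefficients `mm`, `gm`, `prior` are chosen to echo the labels that appear later in the deck, and their exact meaning there is an assumption.

```python
def pairhmm_forward(read: str, hap: str,
                    base_error: float = 0.01,
                    gap_open: float = 0.001,
                    gap_extend: float = 0.1) -> float:
    """Likelihood of `read` given `hap`, summed over every alignment.

    Simplified sketch: constant probabilities (assumed values) replace the
    per-base quality scores used in production pipelines.
    """
    mm = 1.0 - 2.0 * gap_open   # match -> match
    gm = 1.0 - gap_extend       # gap (insertion or deletion) -> match
    rows, cols = len(read) + 1, len(hap) + 1
    M = [[0.0] * cols for _ in range(rows)]  # read base aligned to haplotype base
    X = [[0.0] * cols for _ in range(rows)]  # insertion: read base, no haplotype base
    Y = [[0.0] * cols for _ in range(rows)]  # deletion: haplotype base, no read base
    # The read may start at any haplotype position with equal probability.
    for j in range(cols):
        Y[0][j] = 1.0 / len(hap)
    for i in range(1, rows):
        for j in range(1, cols):
            prior = 1.0 - base_error if read[i - 1] == hap[j - 1] else base_error / 3.0
            M[i][j] = prior * (mm * M[i - 1][j - 1]
                               + gm * (X[i - 1][j - 1] + Y[i - 1][j - 1]))
            X[i][j] = gap_open * M[i - 1][j] + gap_extend * X[i - 1][j]
            Y[i][j] = gap_open * M[i][j - 1] + gap_extend * Y[i][j - 1]
    # Sum over all haplotype positions where the read can end.
    return sum(M[rows - 1][j] + X[rows - 1][j] for j in range(1, cols))


# Read and haplotype from the PairHMM example in this deck.
print(pairhmm_forward("ATACGCCACT", "ATCCTGACT"))
```

Each cell depends only on its upper, left, and upper-left neighbours, so all cells on the same anti-diagonal can be computed in parallel, which is what makes the algorithm a natural fit for arrays of processing elements.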
44. Tree height reduction
[S. Huang et al., Hardware Acceleration of the Pair-HMM Algorithm for DNA Variant Calling]
[Figure: datapath computing M[i][j] from X[i-1][j-1], Y[i-1][j-1] and M[i-1][j-1], with coefficients gm[j], mm[j] and prior]
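Tree height reduction, in general, rebalances an expression so that fewer operations depend on each other in sequence. The sketch below is a generic illustration on a plain sum, not the specific transformation applied to the M[i][j] datapath in the cited work: it compares the dependency depth of a left-to-right chain with that of a balanced pairwise reduction. Both helper names are hypothetical.

```python
def chain_sum(terms):
    """Left-to-right accumulation: n - 1 dependent additions."""
    total, depth = terms[0], 0
    for t in terms[1:]:
        total, depth = total + t, depth + 1
    return total, depth


def tree_sum(terms):
    """Pairwise (balanced) reduction: about log2(n) levels of additions."""
    level, depth = list(terms), 0
    while len(level) > 1:
        # Additions within one level are independent and can run in parallel.
        level = [sum(level[k:k + 2]) for k in range(0, len(level), 2)]
        depth += 1
    return level[0], depth


terms = list(range(1, 9))
print(chain_sum(terms))
print(tree_sum(terms))
```

In hardware, a shallower tree shortens the critical path of the datapath; with floating-point values the reassociation can change rounding slightly, which integers in this example avoid.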
47. Limitations of Systolic Arrays
Systolic Array size tuning
▪ Longer reads cannot be processed
▪ Shorter reads leave the PEs idle
Solution: instantiate fixed size PE Array
▪ Longer reads are chunked
▪ Upper bound on idle PEs
48. Our flavour: PE Chunks
e.g. 2 PEs instead of 5
[Figure: read, haplotype, and two PEs]
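The chunking arithmetic behind the "2 PEs instead of 5" example can be sketched as follows. The sketch assumes that each PE handles one read base per chunk, which is a common arrangement but is not stated on the slide; `chunk_read` is a hypothetical helper.

```python
import math


def chunk_read(read_len: int, num_pes: int) -> tuple[int, int]:
    """Return (number of chunks, idle PEs in the last chunk) for a fixed-size PE array.

    Assumes one read base per PE per chunk (an assumption of this sketch).
    """
    chunks = math.ceil(read_len / num_pes)
    idle = chunks * num_pes - read_len
    return chunks, idle


# A read of 5 bases on an array of 2 PEs: 3 chunks, 1 idle PE in the last one.
print(chunk_read(5, 2))
```

Because only the last chunk can be partially filled, the number of idle PEs is always smaller than the array size, which is the upper bound mentioned among the limitations of systolic arrays.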
56. Results
Platform                    #PEs   Frequency (MHz)   Time (ms)   Speedup
Java on CPU                 N/A    N/A               10800       1X
Intel FPGA Arria 10         128    180               89          121X
Nvidia K40                  N/A    N/A               70          154X
Stratix V D8 (Intel FPGA)   64     207               8.3         1301X
PE Ring (Stratix V)         64     200               5.3         2038X
Arria 10 (Intel FPGA)       208    213               2.8         3857X
PE Ring (Arria 10)          128    231               2.6         4154X
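The speedup column follows directly from the execution times, taking the Java CPU run as the baseline. A short check, using only the numbers reported in the table:

```python
BASELINE_MS = 10800  # Java on CPU, from the table

times_ms = {
    "Intel FPGA Arria 10": 89,
    "Nvidia K40": 70,
    "Stratix V D8 (Intel FPGA)": 8.3,
    "PE Ring (Stratix V)": 5.3,
    "Arria 10 (Intel FPGA)": 2.8,
    "PE Ring (Arria 10)": 2.6,
}

# Speedup = baseline time / platform time, rounded to the nearest integer.
speedups = {name: round(BASELINE_MS / t) for name, t in times_ms.items()}
for name, s in speedups.items():
    print(f"{name:<28} {s}X")
```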
57. Limitations of Systolic Arrays
Pipeline filling latency
▪ Dead times when switching to the next haplotype
Solution: concatenate several haplotypes
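The benefit of concatenating haplotypes can be estimated with an idealised pipeline model. This is a back-of-the-envelope sketch, not a measurement of the actual design: it assumes a linear pipeline of `num_pes` stages that needs `length + num_pes - 1` cycles to stream `length` haplotype bases, and both function names are hypothetical.

```python
def cycles_separate(hap_lens, num_pes):
    """Each haplotype pays the pipeline fill latency on its own."""
    return sum(h + num_pes - 1 for h in hap_lens)


def cycles_concatenated(hap_lens, num_pes):
    """Haplotypes streamed back to back pay the fill latency only once."""
    return sum(hap_lens) + num_pes - 1


# Hypothetical workload: three haplotypes of 100 bases on a 64-PE array.
haps = [100, 100, 100]
print(cycles_separate(haps, 64), cycles_concatenated(haps, 64))
```

Under this model the saving grows with the number of haplotypes and with the array size, since each avoided switch removes one fill phase.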
75. Thanks for your attention
Lorenzo Di Tucci, Giulia Guidi, Marco Rabozzi, Davide Sampietro,
Emanuele Del Sozzo, Chiara Crippa, Sara Notargiacomo,
Marco D. Santambrogio
lorenzo.ditucci@polimi.it
davide.sampietro@mail.polimi.it