Hardware acceleration of a genetic variant caller
NOMICA
Lorenzo Di Tucci, Giulia Guidi, Chiara Crippa, Sara Notargiacomo, Marco D. Santambrogio
Xilinx - 8th June 2017
MOTIVATION
• Many human diseases have a genetic component
• Identifying the causal variants allows specific treatments to be recommended
• Large-scale databases of known variants are used for comparison
• Even a modestly complex algorithm would take a prohibitively long time to complete
ACCELERATION OF VARIANT CALLING
DNA is extracted from a blood sample of the individual and loaded onto a sequencing machine.
Sequencing → FASTQ → Variant calling (GATK Best Practices) → VCF
GATK Best Practices:
• Data Pre-processing
• Variant Discovery
• Callset Refinement
PAIRHMM
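• Pairwise alignment of each read against each candidate haplotype
• The likelihood of each read-haplotype pair is computed
• Computationally intensive: a full dynamic-programming matrix (read length × haplotype length) is filled for every read-haplotype pair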
Figure: Pair-HMM state diagram with states Match (M), Insert (I) and Delete (D); transition probabilities amm, ami, amd, aim, aii, adm, add.
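The following minimal Python sketch of the Pair-HMM forward algorithm shows how the state diagram turns into a computation. It rests on several assumptions. a_xy is read as the transition probability from state x to state y, so ami is Match→Insert. Emission priors come from Phred base qualities. As in GATK's implementation, the read may start anywhere on the haplotype. The function names, dictionary keys and gap penalties are hypothetical, and this is not the authors' accelerated kernel. It also works in plain probability space, while production code typically guards against underflow with log-space or scaled arithmetic.

```python
def base_prior(read_base: str, hap_base: str, qual: int) -> float:
    """Emission probability of a read base given a haplotype base (Phred quality)."""
    err = 10.0 ** (-qual / 10.0)
    return 1.0 - err if read_base == hap_base else err / 3.0


def pairhmm_forward(read: str, quals: list, hap: str, a: dict) -> float:
    """P(read | haplotype) via the forward algorithm over Match/Insert/Delete.

    a["xy"] is the (assumed) transition probability from state x to state y.
    """
    m, n = len(read), len(hap)
    # f_X[i][j]: probability of being in state X after consuming
    # i read bases and j haplotype bases.
    fM = [[0.0] * (n + 1) for _ in range(m + 1)]
    fI = [[0.0] * (n + 1) for _ in range(m + 1)]
    fD = [[0.0] * (n + 1) for _ in range(m + 1)]
    # The read may start at any haplotype offset: uniform mass on row 0.
    for j in range(n + 1):
        fD[0][j] = 1.0 / n
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            prior = base_prior(read[i - 1], hap[j - 1], quals[i - 1])
            # Match: consumes one read base and one haplotype base.
            fM[i][j] = prior * (a["mm"] * fM[i - 1][j - 1]
                                + a["im"] * fI[i - 1][j - 1]
                                + a["dm"] * fD[i - 1][j - 1])
            # Insert: consumes a read base only.
            fI[i][j] = a["mi"] * fM[i - 1][j] + a["ii"] * fI[i - 1][j]
            # Delete: consumes a haplotype base only.
            fD[i][j] = a["md"] * fM[i][j - 1] + a["dd"] * fD[i][j - 1]
    # The whole read must be consumed; it may end at any haplotype position.
    return sum(fM[m][j] + fI[m][j] for j in range(1, n + 1))


# Hypothetical gap penalties: Phred 45 gap open, 0.1 gap extension.
GAP_OPEN = 10.0 ** (-45 / 10.0)
GAP_EXT = 0.1
TRANSITIONS = {
    "mm": 1.0 - 2 * GAP_OPEN, "mi": GAP_OPEN, "md": GAP_OPEN,
    "im": 1.0 - GAP_EXT, "ii": GAP_EXT,
    "dm": 1.0 - GAP_EXT, "dd": GAP_EXT,
}

if __name__ == "__main__":
    quals = [30, 30, 30, 30]
    # Haplotype containing the read exactly vs. one with a single mismatch.
    print(pairhmm_forward("ACGT", quals, "TTACGTTT", TRANSITIONS))
    print(pairhmm_forward("ACGT", quals, "TTAGGTTT", TRANSITIONS))
```

Each cell depends only on its left, upper and upper-left neighbours. As a result, cells on the same anti-diagonal are independent, which is the kind of parallelism hardware accelerators for this kernel commonly exploit.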
PROFILING
Stage                     Time (s)   Runtime (%)
Assembly                     2,598      13
Pair-HMM                    14,225      70
Traversal + Genotyping       3,379      17
Biological sample: NA12878, 80x whole-genome sequencing (WGS); Mauricio et al., Broad Institute
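The percentages in the profiling table follow directly from the stage times. The short Python sketch below recomputes them from the reported seconds. Assuming, for illustration only, that the Pair-HMM stage alone is accelerated, it then applies Amdahl's law to bound the end-to-end speedup. The dictionary and function names are hypothetical; the times are the ones in the table.

```python
# Stage runtimes in seconds, as reported in the profiling table.
STAGES = {
    "Assembly": 2598,
    "Pair-HMM": 14225,
    "Traversal + Genotyping": 3379,
}


def runtime_shares(stages: dict) -> dict:
    """Fraction of total runtime spent in each stage."""
    total = sum(stages.values())
    return {name: t / total for name, t in stages.items()}


def amdahl_speedup(fraction: float, accel: float) -> float:
    """Overall speedup when only `fraction` of the runtime is sped up by `accel`."""
    return 1.0 / ((1.0 - fraction) + fraction / accel)


if __name__ == "__main__":
    shares = runtime_shares(STAGES)
    for name, share in shares.items():
        print(f"{name}: {share:.0%}")
    p = shares["Pair-HMM"]
    # Upper bound: the Pair-HMM time shrinks to zero, the other stages are unchanged.
    print(f"Bound with infinitely fast Pair-HMM: {amdahl_speedup(p, float('inf')):.2f}x")
```

Under that assumption, the untouched 30% of the runtime caps the overall speedup at about 3.4x, however fast the Pair-HMM stage becomes.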
Thanks for your attention!
Lorenzo Di Tucci, Giulia Guidi, Chiara Crippa, Sara Notargiacomo, Marco D. Santambrogio
{giulia.guidi, chiara2.crippa}@mail.polimi.it
{lorenzo.ditucci, sara.notargiacomo, marco.santambrogio}@polimi.it