1. HUGenomics
Hardware for hUman Genomics
e-Novia
15/11/2017
Lorenzo Di Tucci, Giulia Guidi, Chiara Crippa, Davide Sampietro,
Alessandro Comodi, Davide Conficconi, Sara Notargiacomo, Marco D. Santambrogio
2. 2
Personalized medicine
Responds to a standard dose
Responds to a higher dose
Responds to a lower dose
Responds to an alternative
medication
• Thousands of DNA variants are associated with different diseases
• Possibility to develop a more precise and personalized medicine for
each individual patient
3. 3
Challenges
a. Development of new architectures to efficiently process the huge
amount of data generated over the years
b. Development of technologies to efficiently and clearly visualize
the computed data
4. 4
Proposed solution
HUG is an integrated hardware and software system that aims
at:
• becoming an advanced support for bioinformatics
• facilitating the identification of correlations, making it easier
to identify biological meaning
17. 7
Applications
About 80% of rare diseases
are of genetic origin
By identifying gene changes,
specific treatments
may be recommended
Finding the changes implies
a comparison against
huge databases
of known variants
Even modestly complex
algorithms would require a
prohibitively long time to
complete
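The comparison against known variants can be pictured with a minimal sketch. It assumes a variant is identified by chromosome, position, reference allele and alternate allele, and that the database fits in an in-memory dictionary; the `Variant` type, the `annotate` helper and the toy entries are hypothetical and only illustrate the lookup, not the HUG implementation. Real databases are far larger, which is where the processing-time problem comes from.

```python
from typing import Dict, List, NamedTuple, Tuple


class Variant(NamedTuple):
    """Hypothetical variant key: chromosome, 1-based position, ref and alt alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def annotate(sample: List[Variant],
             known: Dict[Variant, str]) -> Tuple[Dict[Variant, str], List[Variant]]:
    """Split the sample's variants into known ones (with their annotation) and novel ones."""
    hits = {v: known[v] for v in sample if v in known}
    novel = [v for v in sample if v not in known]
    return hits, novel


if __name__ == "__main__":
    # Toy database and sample, made up for illustration only.
    known_db = {Variant("chr1", 100, "A", "G"): "toy annotation"}
    sample_variants = [Variant("chr1", 100, "A", "G"), Variant("chr2", 5, "C", "T")]
    hits, novel = annotate(sample_variants, known_db)
    print("known:", hits)
    print("novel:", novel)
```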
18. 8
https://macarthurlab.org
1. DNA is extracted from a blood sample of the individual and
loaded onto a sequencing machine
2. Variant Discovery
GATK Best Practices (FASTQ → VCF):
• Data Pre-processing
• Variant Discovery
• Callset Refinement
21. 9
Workflows
Workflows are specific to the variant type:
• Germline SNPs & Indels
• Somatic SNVs & Indels
PairHMM
https://software.broadinstitute.org/gatk/
24. 11
Pair-HMM
• Pairwise alignment of each read against each candidate haplotype
• The likelihood of each read-haplotype pair is calculated
• Every possible alignment is taken into account for the cumulative
probability
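The cumulative probability over every possible alignment is what the forward algorithm of a pair HMM computes. The sketch below is a simplified illustration, not the GATK code: it assumes three states (match, insertion, deletion), a single fixed base error rate and constant gap penalties, whereas a production implementation would derive these from per-base quality scores. The function name and all parameter values are assumptions chosen for the example.

```python
def pair_hmm_likelihood(read: str, hap: str,
                        base_error: float = 0.01,
                        gap_open: float = 0.001,
                        gap_ext: float = 0.1) -> float:
    """Forward algorithm: sum the probability of every alignment of read to hap."""
    n, m = len(read), len(hap)
    if n == 0 or m == 0:
        raise ValueError("read and haplotype must be non-empty")

    # Transition probabilities (illustrative constants, not taken from the slides).
    t_mm = 1.0 - 2.0 * gap_open      # match -> match
    t_mi = t_md = gap_open           # match -> insertion / deletion
    t_im = t_dm = 1.0 - gap_ext      # insertion / deletion -> match
    t_ii = t_dd = gap_ext            # gap extension

    # One (n+1) x (m+1) matrix per state.
    M = [[0.0] * (m + 1) for _ in range(n + 1)]
    I = [[0.0] * (m + 1) for _ in range(n + 1)]
    D = [[0.0] * (m + 1) for _ in range(n + 1)]

    # The read may start at any haplotype position with equal prior weight.
    for j in range(m + 1):
        D[0][j] = 1.0 / m

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Emission: probability of observing read[i-1] given hap[j-1].
            emit = 1.0 - base_error if read[i - 1] == hap[j - 1] else base_error / 3.0
            M[i][j] = emit * (t_mm * M[i - 1][j - 1]
                              + t_im * I[i - 1][j - 1]
                              + t_dm * D[i - 1][j - 1])
            I[i][j] = t_mi * M[i - 1][j] + t_ii * I[i - 1][j]
            D[i][j] = t_md * M[i][j - 1] + t_dd * D[i][j - 1]

    # The read may end at any haplotype position: sum the last row.
    return sum(M[n][j] + I[n][j] for j in range(1, m + 1))


if __name__ == "__main__":
    haplotype = "ACGTACGT"
    print(pair_hmm_likelihood("ACGT", haplotype))  # read matching the haplotype
    print(pair_hmm_likelihood("ACTT", haplotype))  # read with one mismatch
```

Every cell depends only on its upper, left and upper-left neighbours, so cells on the same anti-diagonal are independent of each other; this regular dependency pattern is what makes the computation a natural candidate for hardware acceleration.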
43. 22
From single to multicore
• Tiled Architecture
• Two ways to operate:
– Single Instruction Multiple Data (SIMD)
– Multiple Instructions Single Data (MISD)
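One way to read the two operating modes, sketched in software: in SIMD mode every core runs the same regular expression on a different chunk of the input, while in MISD mode every core runs a different regular expression on the same input. This mapping is an assumption made for illustration, and the loops below only simulate the cores sequentially with Python's `re` module; the function names are hypothetical.

```python
import re
from typing import List


def simd_mode(pattern: str, chunks: List[str]) -> List[bool]:
    """Same pattern (instruction) on several data chunks, one chunk per core."""
    rx = re.compile(pattern)
    return [rx.search(chunk) is not None for chunk in chunks]


def misd_mode(patterns: List[str], data: str) -> List[bool]:
    """Different patterns (instructions) on the same data, one pattern per core."""
    return [re.search(p, data) is not None for p in patterns]


if __name__ == "__main__":
    print(simd_mode("(TTTT)+CT", ["AATTTTCTGG", "ACGTACGT"]))
    print(misd_mode(["ACCGTGGA", "(TTTT)+CT"], "GGACCGTGGATT"))
```

Note that splitting the input into chunks can miss a match that straddles a chunk boundary, so a SIMD-style split would need overlapping chunks or some other boundary handling.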
44. 23
Results: Quad-Core implementation
Performance *

Regular Expression                Time Flex               Time TiReX              Speedup
                                  (Intel i7, 2.80 GHz)    (Quad-Core, 100 MHz)
ACCGTGGA                          271 µs                  10 µs                   27.1x
(TTTT)+CT                         121 µs                  6.23 µs                 19.42x
(CAGT)|(GGGG)|(TTGG)TGCA(C|G)+    263 µs                  24 µs                   10.9x

* Size of the test file: 16 KB
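The speedup column is the ratio of the Flex time to the TiReX time. A short check, using only the timings reported on this slide (the `speedup` helper is a name introduced for the example):

```python
def speedup(t_baseline_us: float, t_accelerated_us: float) -> float:
    """Speedup = baseline execution time / accelerated execution time."""
    return t_baseline_us / t_accelerated_us


# (regular expression, Flex time in microseconds, TiReX time in microseconds)
ROWS = [
    ("ACCGTGGA", 271.0, 10.0),
    ("(TTTT)+CT", 121.0, 6.23),
    ("(CAGT)|(GGGG)|(TTGG)TGCA(C|G)+", 263.0, 24.0),
]

if __name__ == "__main__":
    for regex, t_flex, t_tirex in ROWS:
        print(f"{regex}: {speedup(t_flex, t_tirex):.2f}x")
```

The third ratio works out to roughly 10.96, so the reported 10.9x appears to be truncated rather than rounded.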
48. Thanks for your attention
e-Novia
15/11/2017
Lorenzo Di Tucci, Giulia Guidi, Chiara Crippa, Davide Sampietro,
Alessandro Comodi, Davide Conficconi, Sara Notargiacomo, Marco D. Santambrogio
{giulia.guidi, chiara2.crippa, davide.sampietro, alessandro.comodi, davide.conficconi}@mail.polimi.it
{lorenzo.ditucci, sara.notargiacomo, marco.santambrogio}@polimi.it