In the coming years, genome research will likely transform medical and agricultural practices. Knowledge of the unique genetic profile of an individual or a species is driving the development of customized treatments, from personalized medicine to agrigenomics, but the exponential growth of available genomic data requires a computational effort that may limit progress in these fields. In this context, we propose HUGenomics, a novel hardware and software framework to support genomics research. The framework aims to facilitate the genome assembly process by means of both hardware-accelerated algorithms and scientific data visualization tools. The system raises the level of abstraction, allowing users to easily integrate custom algorithms into the hardware pipeline without any knowledge of the underlying architecture.
HUG: Hardware for Genomics
1. HUGenomics
A Support to Genomics Research
Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano
Lorenzo Di Tucci, Giulia Guidi, Sara Notargiacomo, Marco D. Santambrogio
05/23/2018
lorenzo.ditucci@polimi.it
Wozniak Lounge, Soda Hall, University of California at Berkeley
2. Genomic research
Recent advancements in genomic research make it possible to perform multiple analyses on DNA, affecting different fields
However, extracting biological meaning from secondary genome analysis requires a complex process
3. Genome sequencing
Given a biological sample, genome sequencing is the process of determining
the precise order of nucleotides within a DNA molecule
This process produces short DNA fragments which need to be assembled to
reconstruct the original sequence
ACGTAGCTCGGACCATAGCA
CCGCCGTAGCTCGGACCATAGCACATG
AGTTTTGGGGGACCATAGCACATGGACACATGC
GGACCATAGCACATGGACACATGC
GGTCAAAAATAGCACATGGACACATGC
ATTGTATCGGACCATATTGCTTAGCATGTATTTGC
CATGGACACATGC
CGTAACCATAGCACATGGACACATGC
TTTTAGGTAATTGCCATAGCACATGGACACAT
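The fragmentation described on this slide can be imitated with a minimal sketch. It assumes error-free reads of a fixed length sampled uniformly from a toy sequence; the function name `simulate_reads` and its parameters are hypothetical and are not part of HUG. Real sequencers produce reads of varying length and quality.

```python
import random


def simulate_reads(genome, n_reads, read_len, seed=0):
    """Sample n_reads error-free substrings of length read_len from genome."""
    if read_len > len(genome):
        raise ValueError("read_len must not exceed the genome length")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    reads = []
    for _ in range(n_reads):
        # randint is inclusive on both ends, so the last valid start is allowed
        start = rng.randint(0, len(genome) - read_len)
        reads.append(genome[start:start + read_len])
    return reads


# Toy sequence used only for illustration
genome = "ACGTAGCTCGGACCATAGCAGGACCATAGCACATGGACATGGACACATGCTTA"
for read in simulate_reads(genome, n_reads=5, read_len=13):
    print(read)
```

Each printed read is a contiguous piece of the toy sequence; assembling such pieces back into the original is the problem the following slides address.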
4. Genome assembly
Genome assembly: reconstruct a genome from a set of shorter reads
Reference-based assembly
ACGTAGCTCGGACCATAGCA
GGACCATAGCACATGGACACATGC
ACGTAGCTCGGACCATAGCAGGACCATAGCACATGGACATGGACACATGCTTA
CATGGACACATGC
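A minimal sketch of the reference-based idea, assuming the longest sequence on the slide is the reference and that reads are placed by exact substring match only. The helper `place_reads` is hypothetical; production aligners tolerate mismatches and gaps, which this sketch does not.

```python
def place_reads(reference, reads):
    """Map each read to the 0-based offset of its first exact match, or None."""
    placements = {}
    for read in reads:
        pos = reference.find(read)  # str.find returns -1 when there is no match
        placements[read] = pos if pos != -1 else None
    return placements


reference = "ACGTAGCTCGGACCATAGCAGGACCATAGCACATGGACATGGACACATGCTTA"
reads = [
    "ACGTAGCTCGGACCATAGCA",
    "GGACCATAGCACATGGACACATGC",
    "CATGGACACATGC",
]
for read, pos in place_reads(reference, reads).items():
    print(pos, read)
```

With these inputs the first and third reads land at exact offsets, while the second has no exact match, which is why practical tools rely on approximate alignment.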
5. Genome assembly
Genome assembly: reconstruct a genome from a set of shorter reads
De novo assembly
ACGTAGCTCGGACCATAGCAGGACCATAGCACATGGACATGGACACATGCTTA
Reference-based applications are limited to species with available reference genomes; de novo assembly needs no reference
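De novo assembly can be illustrated with a greedy overlap-merge sketch. This is a textbook simplification chosen for brevity, not the method used in HUG; the function names, the toy reads, and the `min_len` threshold are all assumptions.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest suffix-prefix overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: leave the contigs separate
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


print(greedy_assemble(["ACGTAG", "TAGCTC", "CTCGGA"]))
```

The all-pairs comparison grows quadratically with the number of reads, which gives a feel for why assembly is a natural target for acceleration.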
6. De novo assembly
Issue:
• General-purpose architectures are inefficient
Solution:
• In such a scenario, hardware accelerators have proved effective at optimizing the performance-to-power-consumption ratio
Genomics algorithms are usually:
• compute-intensive
• data-intensive, processing massive amounts of data
• fast-changing
15. Genomics Hardware Pipeline
[Flowchart: "Is the algorithm available on HUG?" (yes/no branches); stages: fast prototyping of a custom hardware algorithm, pipeline creation or integration, data upload, processing, data visualization]
17. HUG Today
[1] Lorenzo Di Tucci, Giulia Guidi, Sara Notargiacomo, Luca Cerina, Alberto Scolari, and Marco D. Santambrogio. "HUGenomics: A support to personalized medicine research." In Research and Technologies for Society and Industry (RTSI), 2017 IEEE 3rd International Forum on, pp. 1-5. IEEE, 2017.
[2] Lorenzo Di Tucci, Davide Conficconi, Alessandro Comodi, Steven Hofmeyr, David Donofrio, and Marco Domenico Santambrogio. "A Parallel, Energy Efficient Hardware Architecture for the merAligner on FPGA using Chisel HDL." In Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2018 IEEE International, t.b.p. IEEE, 2018.
RAW READ → CONTIGGING → SCAFFOLDING → RE-SCAFFOLDING → ANNOTATION
Works marked on the pipeline: [1], [2]
Sequence Alignment via Smith-Waterman Algorithm [2]
18. Smith-Waterman
Platform                                  Performance [GCUPS]   Power Efficiency [GCUPS/W]
AWS-VU9P (3 queries in parallel)          110.0                 4.400
Tesla K20                                 45.0                  0.200
ADM-PCIE-KU3                              42.5                  1.699
Nvidia GeForce GTX 295                    30.0                  0.104
Xtreme Data XD1000                        25.6                  0.430
Altera Stratix V on Nallatech PCIe-385    24.7                  0.988
Nvidia GeForce GTX 295                    16.1                  0.056
ADM-PCIE-7V3                              14.8                  0.594
Dual-core Nvidia 9800 GX2                 14.5                  0.074
Nvidia GeForce GTX 280                    9.7                   0.041
Xtreme Data XD2000i                       9.0                   0.150
2x Nvidia GeForce 8800                    3.6                   0.017
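For readers unfamiliar with the metric, the sketch below shows a plain software Smith-Waterman scoring routine and how GCUPS (giga cell updates per second) is derived from the size of the dynamic-programming matrix. The scoring values (match +3, mismatch -3, linear gap -2) and the function names are assumptions chosen for illustration; the slides do not state the scoring scheme used by the accelerated versions.

```python
def smith_waterman_score(a, b, match=3, mismatch=-3, gap=-2):
    """Best local alignment score of a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def gcups(query_len, db_len, seconds):
    """Giga cell updates per second: one update per DP matrix cell."""
    return query_len * db_len / seconds / 1e9


print(smith_waterman_score("TGTTACGG", "GGTTGACTA"))
print(gcups(query_len=1_000, db_len=1_000_000, seconds=0.5))
```

Dividing a GCUPS figure by the board power in watts gives the power-efficiency column of the table; for example, 110.0 GCUPS at 4.400 GCUPS/W corresponds to roughly 25 W.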
19. HUG Today
RAW READ → CONTIGGING → SCAFFOLDING → RE-SCAFFOLDING → ANNOTATION
Works marked on the pipeline: [1], [2], [3]
Pattern Matching to identify gene motifs during Gene Annotation
[3] Davide Conficconi, Alessandro Comodi, Alberto Scolari, and Marco Domenico Santambrogio. "TiReX: a Tiled Regular Expression Matching Architecture." In Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2018 IEEE International, t.b.p. IEEE, 2018.
20. Pattern Matching via TiReX
Regular Expression                 Flex*     16-core† (VC707)   Speedup
ACCGTGGA                           271 µs    2.07 µs            130.90x
(TTT)+CT                           121 µs    4.54 µs            26.65x
(CAGT)|(GGGG)|(TTGG)TGCA(C|G)+     263 µs    3.36 µs            78.27x
* running on an Intel i7 with a peak frequency of 2.8 GHz
† running at 130 MHz
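To make the benchmark patterns concrete, the sketch below runs the three regular expressions from the table over a toy DNA string using Python's standard `re` module. This only illustrates what the patterns match; `re` is a backtracking software engine, unrelated to Flex or to the TiReX architecture, and the helper `find_motifs` and the test sequence are assumptions.

```python
import re

# The three patterns listed in the table above
PATTERNS = [
    r"ACCGTGGA",
    r"(TTT)+CT",
    r"(CAGT)|(GGGG)|(TTGG)TGCA(C|G)+",
]


def find_motifs(pattern, sequence):
    """Return (start, end) spans of non-overlapping matches, left to right."""
    return [m.span() for m in re.finditer(pattern, sequence)]


# Toy sequence built only for this illustration
sequence = "GGACCGTGGATTTTTTCTTTGGTGCACCG"
for pattern in PATTERNS:
    print(pattern, find_motifs(pattern, sequence))
```

Spans are half-open, so a span `(2, 10)` covers positions 2 through 9 of the sequence.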
21. HUG Today
RAW READ → CONTIGGING → SCAFFOLDING → RE-SCAFFOLDING → ANNOTATION
Works marked on the pipeline: [1], [2], [3]
PairHMM for gene prediction/finding in the Gene Annotation phase
22. Thanks for your attention