SlideShare a Scribd company logo
1 of 12
Martin Prete
Alexander Predeus
01-jun-2021
Bioinformatics on GPU:
why,
how,
and what
to expect
in the future
Graphics processing units (GPUs)
● Specialized processing chips, initially for graphics
● Thousands of cores for parallel computations
● Software development for GPUs streamlined by Nvidia’s CUDA
● Became heavily used with the rise of neural networks
● Other architectures like FPGA, TPU/NPU are available
GPUs in bioinformatics
● Alignment (blast, short read aligners)
● Simulations of biomolecules (e.g. proteins)
● Any task that can use deep learning:
○ Variant calling with DeepVariant
○ Oxford Nanopore basecalling
○ Cell type inference in scRNA-seq
○ Many more every day...
When is GPU useful?
● Amount of biological data grows
(at least) exponentially
● Sequences, cells, tracker data, etc
● However:
○ Appropriate type of
computation is needed
○ RAM requirements and
transfer I/O must not
outweigh the benefits
DeepVariant
● Accurate variant calling is a very
hard problem
● Deep Variant = variant caller
using neural networks
● Converts pileup (alignment) data
into image “channels”
● Takes advantage of CNN image
expertise at Google
DeepVariant
● Variant calling is much faster on a
GPU, even though the team made
effort to optimize on CPU
● Model training is dramatically faster
(and nearly impossible on CPU)
DeepVariant
● Variant calling is much faster on a
GPU, even though the team made
effort to optimize on CPU
● Model training is dramatically faster
(and nearly impossible on CPU)
● Huge demand for this task in
medical genomics!
● Illumina is offering DRAGEN, an
FPGA-based solution that
streamlines human variant calling
Nanopore Basecalling
● Lots of errors (up to 30%!) initially
● Basecalling is hard for Nanopore-derived reads
because signal is continuous (homopolymers are
particularly an issue)
● However, NN-based methods improve every year
● Algorithms developed for speech-recognition
technology are hugely useful (CTC)
● Results in median accuracy over 98%, some reads
are perfect over 1000s of bps
Basecalling on CPU vs GPU
● Nanopore basecalling is virtually useless on CPU (big run = days on 64 cores)
● FPGA or GPU-based calling is 1-2 orders of magnitude faster
Interactive GPU usage
● Model “zoos” become increasingly standard in genomics and bioinformatics
● Scientists start using high level NN-based tools on a daily basis
● Effective usage of such modelling requires constant parameter adjustment
scVI-tools for scRNA-seq
● Huge cell atlases are being
generated and annotated at
Sanger and worldwide
● Millions of cells will soon
become billions
● Efficient label transfer and
bias removal become
incredibly important
● Difference becomes dramatic
as datasets size increases
Thank you for your attention!

More Related Content

What's hot

Workshop NGS data analysis - 1
Workshop NGS data analysis - 1Workshop NGS data analysis - 1
Workshop NGS data analysis - 1Maté Ongenaert
 
ECCB 2010 Next-gen sequencing Tutorial
ECCB 2010 Next-gen sequencing TutorialECCB 2010 Next-gen sequencing Tutorial
ECCB 2010 Next-gen sequencing TutorialThomas Keane
 
Next-generation sequencing data format and visualization with ngs.plot 2015
Next-generation sequencing data format and visualization with ngs.plot 2015Next-generation sequencing data format and visualization with ngs.plot 2015
Next-generation sequencing data format and visualization with ngs.plot 2015Li Shen
 
Valladolid final-septiembre-2010
Valladolid final-septiembre-2010Valladolid final-septiembre-2010
Valladolid final-septiembre-2010TELECOM I+D
 
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...EMC
 
BioChain Next Generation Sequencing Products
BioChain Next Generation Sequencing ProductsBioChain Next Generation Sequencing Products
BioChain Next Generation Sequencing Productsbiochain
 
Making powerful science: an introduction to NGS data analysis
Making powerful science: an introduction to NGS data analysisMaking powerful science: an introduction to NGS data analysis
Making powerful science: an introduction to NGS data analysisAdamCribbs1
 
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...EMC
 
Combining PacBio with short read technology for improved de novo genome assembly
Combining PacBio with short read technology for improved de novo genome assemblyCombining PacBio with short read technology for improved de novo genome assembly
Combining PacBio with short read technology for improved de novo genome assemblyLex Nederbragt
 
Introduction to second generation sequencing
Introduction to second generation sequencingIntroduction to second generation sequencing
Introduction to second generation sequencingDenis C. Bauer
 
NGS - Basic principles and sequencing platforms
NGS - Basic principles and sequencing platformsNGS - Basic principles and sequencing platforms
NGS - Basic principles and sequencing platformsAnnelies Haegeman
 
Toolbox for bacterial population analysis using NGS
Toolbox for bacterial population analysis using NGSToolbox for bacterial population analysis using NGS
Toolbox for bacterial population analysis using NGSMirko Rossi
 
Rnaseq basics ngs_application1
Rnaseq basics ngs_application1Rnaseq basics ngs_application1
Rnaseq basics ngs_application1Yaoyu Wang
 
Next-generation sequencing and quality control: An Introduction (2016)
Next-generation sequencing and quality control: An Introduction (2016)Next-generation sequencing and quality control: An Introduction (2016)
Next-generation sequencing and quality control: An Introduction (2016)Sebastian Schmeier
 
2011 jeroen vanhoudt_ngs
2011 jeroen vanhoudt_ngs2011 jeroen vanhoudt_ngs
2011 jeroen vanhoudt_ngsDin Apellidos
 
Next Generation Sequencing (NGS)
Next Generation Sequencing (NGS)Next Generation Sequencing (NGS)
Next Generation Sequencing (NGS)LOGESWARAN KA
 

What's hot (20)

Workshop NGS data analysis - 1
Workshop NGS data analysis - 1Workshop NGS data analysis - 1
Workshop NGS data analysis - 1
 
ECCB 2010 Next-gen sequencing Tutorial
ECCB 2010 Next-gen sequencing TutorialECCB 2010 Next-gen sequencing Tutorial
ECCB 2010 Next-gen sequencing Tutorial
 
Ngs intro_v6_public
 Ngs intro_v6_public Ngs intro_v6_public
Ngs intro_v6_public
 
Next-generation sequencing data format and visualization with ngs.plot 2015
Next-generation sequencing data format and visualization with ngs.plot 2015Next-generation sequencing data format and visualization with ngs.plot 2015
Next-generation sequencing data format and visualization with ngs.plot 2015
 
Valladolid final-septiembre-2010
Valladolid final-septiembre-2010Valladolid final-septiembre-2010
Valladolid final-septiembre-2010
 
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
 
BioChain Next Generation Sequencing Products
BioChain Next Generation Sequencing ProductsBioChain Next Generation Sequencing Products
BioChain Next Generation Sequencing Products
 
Making powerful science: an introduction to NGS data analysis
Making powerful science: an introduction to NGS data analysisMaking powerful science: an introduction to NGS data analysis
Making powerful science: an introduction to NGS data analysis
 
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
 
Curso de Genómica - UAT (VHIR) 2012 - Análisis de datos de NGS
Curso de Genómica - UAT (VHIR) 2012 - Análisis de datos de NGSCurso de Genómica - UAT (VHIR) 2012 - Análisis de datos de NGS
Curso de Genómica - UAT (VHIR) 2012 - Análisis de datos de NGS
 
Ngs part i 2013
Ngs part i 2013Ngs part i 2013
Ngs part i 2013
 
Combining PacBio with short read technology for improved de novo genome assembly
Combining PacBio with short read technology for improved de novo genome assemblyCombining PacBio with short read technology for improved de novo genome assembly
Combining PacBio with short read technology for improved de novo genome assembly
 
Introduction to second generation sequencing
Introduction to second generation sequencingIntroduction to second generation sequencing
Introduction to second generation sequencing
 
NGS - Basic principles and sequencing platforms
NGS - Basic principles and sequencing platformsNGS - Basic principles and sequencing platforms
NGS - Basic principles and sequencing platforms
 
Toolbox for bacterial population analysis using NGS
Toolbox for bacterial population analysis using NGSToolbox for bacterial population analysis using NGS
Toolbox for bacterial population analysis using NGS
 
Rnaseq basics ngs_application1
Rnaseq basics ngs_application1Rnaseq basics ngs_application1
Rnaseq basics ngs_application1
 
Next-generation sequencing and quality control: An Introduction (2016)
Next-generation sequencing and quality control: An Introduction (2016)Next-generation sequencing and quality control: An Introduction (2016)
Next-generation sequencing and quality control: An Introduction (2016)
 
Genome Assembly 2018
Genome Assembly 2018Genome Assembly 2018
Genome Assembly 2018
 
2011 jeroen vanhoudt_ngs
2011 jeroen vanhoudt_ngs2011 jeroen vanhoudt_ngs
2011 jeroen vanhoudt_ngs
 
Next Generation Sequencing (NGS)
Next Generation Sequencing (NGS)Next Generation Sequencing (NGS)
Next Generation Sequencing (NGS)
 

Similar to Bioinformatics on GPU

HPC_June2011
HPC_June2011HPC_June2011
HPC_June2011cfloare
 
DigitRecognition.pptx
DigitRecognition.pptxDigitRecognition.pptx
DigitRecognition.pptxruvex
 
Early Application experiences on Summit
Early Application experiences on Summit Early Application experiences on Summit
Early Application experiences on Summit Ganesan Narayanasamy
 
Accelerating stochastic gradient descent using adaptive mini batch size3
Accelerating stochastic gradient descent using adaptive mini batch size3Accelerating stochastic gradient descent using adaptive mini batch size3
Accelerating stochastic gradient descent using adaptive mini batch size3muayyad alsadi
 
GPU Computing for Data Science
GPU Computing for Data Science GPU Computing for Data Science
GPU Computing for Data Science Domino Data Lab
 
Presentación GPUs MAEB 2012
Presentación GPUs MAEB 2012Presentación GPUs MAEB 2012
Presentación GPUs MAEB 2012gustavo_romero
 
In datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitIn datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitJinwon Lee
 
Deep learning at the edge: 100x Inference improvement on edge devices
Deep learning at the edge: 100x Inference improvement on edge devicesDeep learning at the edge: 100x Inference improvement on edge devices
Deep learning at the edge: 100x Inference improvement on edge devicesNumenta
 
MN-3, MN-Core and HPL - SC21 Green500 BOF
MN-3, MN-Core and HPL - SC21 Green500 BOFMN-3, MN-Core and HPL - SC21 Green500 BOF
MN-3, MN-Core and HPL - SC21 Green500 BOFPreferred Networks
 
Neurosynaptic chips
Neurosynaptic chipsNeurosynaptic chips
Neurosynaptic chipsJeffrey Funk
 
Parallelization Strategies for Implementing Nbody Codes on Multicore Architec...
Parallelization Strategies for Implementing Nbody Codes on Multicore Architec...Parallelization Strategies for Implementing Nbody Codes on Multicore Architec...
Parallelization Strategies for Implementing Nbody Codes on Multicore Architec...Filipo Mór
 
Msc presentation Bioinformatics
Msc presentation BioinformaticsMsc presentation Bioinformatics
Msc presentation BioinformaticsPavel Kartashev
 
Mauricio breteernitiz hpc-exascale-iscte
Mauricio breteernitiz hpc-exascale-iscteMauricio breteernitiz hpc-exascale-iscte
Mauricio breteernitiz hpc-exascale-isctembreternitz
 
APSys Presentation Final copy2
APSys Presentation Final copy2APSys Presentation Final copy2
APSys Presentation Final copy2Junli Gu
 
Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUsfcassier
 

Similar to Bioinformatics on GPU (20)

Dl2 computing gpu
Dl2 computing gpuDl2 computing gpu
Dl2 computing gpu
 
Lrz kurs: big data analysis
Lrz kurs: big data analysisLrz kurs: big data analysis
Lrz kurs: big data analysis
 
HPC_June2011
HPC_June2011HPC_June2011
HPC_June2011
 
DigitRecognition.pptx
DigitRecognition.pptxDigitRecognition.pptx
DigitRecognition.pptx
 
Early Application experiences on Summit
Early Application experiences on Summit Early Application experiences on Summit
Early Application experiences on Summit
 
Accelerating stochastic gradient descent using adaptive mini batch size3
Accelerating stochastic gradient descent using adaptive mini batch size3Accelerating stochastic gradient descent using adaptive mini batch size3
Accelerating stochastic gradient descent using adaptive mini batch size3
 
GPU Computing for Data Science
GPU Computing for Data Science GPU Computing for Data Science
GPU Computing for Data Science
 
Presentación GPUs MAEB 2012
Presentación GPUs MAEB 2012Presentación GPUs MAEB 2012
Presentación GPUs MAEB 2012
 
In datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitIn datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unit
 
Deep learning at the edge: 100x Inference improvement on edge devices
Deep learning at the edge: 100x Inference improvement on edge devicesDeep learning at the edge: 100x Inference improvement on edge devices
Deep learning at the edge: 100x Inference improvement on edge devices
 
MN-3, MN-Core and HPL - SC21 Green500 BOF
MN-3, MN-Core and HPL - SC21 Green500 BOFMN-3, MN-Core and HPL - SC21 Green500 BOF
MN-3, MN-Core and HPL - SC21 Green500 BOF
 
Iwann2011 gpus
Iwann2011 gpusIwann2011 gpus
Iwann2011 gpus
 
Deep learning with FPGA
Deep learning with FPGADeep learning with FPGA
Deep learning with FPGA
 
Neurosynaptic chips
Neurosynaptic chipsNeurosynaptic chips
Neurosynaptic chips
 
Parallelization Strategies for Implementing Nbody Codes on Multicore Architec...
Parallelization Strategies for Implementing Nbody Codes on Multicore Architec...Parallelization Strategies for Implementing Nbody Codes on Multicore Architec...
Parallelization Strategies for Implementing Nbody Codes on Multicore Architec...
 
Msc presentation Bioinformatics
Msc presentation BioinformaticsMsc presentation Bioinformatics
Msc presentation Bioinformatics
 
Mauricio breteernitiz hpc-exascale-iscte
Mauricio breteernitiz hpc-exascale-iscteMauricio breteernitiz hpc-exascale-iscte
Mauricio breteernitiz hpc-exascale-iscte
 
APSys Presentation Final copy2
APSys Presentation Final copy2APSys Presentation Final copy2
APSys Presentation Final copy2
 
Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUs
 
19-7960-01.pptx
19-7960-01.pptx19-7960-01.pptx
19-7960-01.pptx
 

Recently uploaded

RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxFarihaAbdulRasheed
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxyaramohamed343013
 
Forest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are importantForest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are importantadityabhardwaj282
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
Gas_Laws_powerpoint_notes.ppt for grade 10
Gas_Laws_powerpoint_notes.ppt for grade 10Gas_Laws_powerpoint_notes.ppt for grade 10
Gas_Laws_powerpoint_notes.ppt for grade 10ROLANARIBATO3
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555kikilily0909
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaPraksha3
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Welcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work DayWelcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work DayZachary Labe
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxSwapnil Therkar
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
 
Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2John Carlo Rollon
 
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzohaibmir069
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...lizamodels9
 

Recently uploaded (20)

RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 
Forest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are importantForest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are important
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
Gas_Laws_powerpoint_notes.ppt for grade 10
Gas_Laws_powerpoint_notes.ppt for grade 10Gas_Laws_powerpoint_notes.ppt for grade 10
Gas_Laws_powerpoint_notes.ppt for grade 10
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Welcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work DayWelcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work Day
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2
 
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistan
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 

Bioinformatics on GPU

  • 1. Martin Prete Alexander Predeus 01-jun-2021 Bioinformatics on GPU: why, how, and what to expect in the future
  • 2. Graphics processing units (GPUs) ● Specialized processing chips, initially for graphics ● Thousands of cores for parallel computations ● Software development for GPUs streamlined by Nvidia’s CUDA ● Became heavily used with the rise of neural networks ● Other architectures like FPGA, TPU/NPU are available
  • 3. GPUs in bioinformatics ● Alignment (blast, short read aligners) ● Simulations of biomolecules (e.g. proteins) ● Any task that can use deep learning: ○ Variant calling with DeepVariant ○ Oxford Nanopore basecalling ○ Cell type inference in scRNA-seq ○ Many more every day...
  • 4. When is GPU useful? ● Amount of biological data grows (at least) exponentially ● Sequences, cells, tracker data, etc ● However: ○ Appropriate type of computation is needed ○ RAM requirements and transfer I/O must not outweigh the benefits
  • 5. DeepVariant ● Accurate variant calling is a very hard problem ● Deep Variant = variant caller using neural networks ● Converts pileup (alignment) data into image “channels” ● Takes advantage of CNN image expertise at Google
  • 6. DeepVariant ● Variant calling is much faster on a GPU, even though the team made effort to optimize on CPU ● Model training is dramatically faster (and nearly impossible on CPU)
  • 7. DeepVariant ● Variant calling is much faster on a GPU, even though the team made effort to optimize on CPU ● Model training is dramatically faster (and nearly impossible on CPU) ● Huge demand for this task in medical genomics! ● Illumina is offering DRAGEN, an FPGA-based solution that streamlines human variant calling
  • 8. Nanopore Basecalling ● Lots of errors (up to 30%!) initially ● Basecalling is hard for Nanopore-derived reads because signal is continuous (homopolymers are particularly an issue) ● However, NN-based methods improve every year ● Algorithms developed for speech-recognition technology are hugely useful (CTC) ● Results in median accuracy over 98%, some reads are perfect over 1000s of bps
  • 9. Basecalling on CPU vs GPU ● Nanopore basecalling is virtually useless on CPU (big run = days on 64 cores) ● FPGA or GPU-based calling is 1-2 orders of magnitude faster
  • 10. Interactive GPU usage ● Model “zoos” become increasingly standard in genomics and bioinformatics ● Scientists start using high level NN-based tools on a daily basis ● Effective usage of such modelling requires constant parameter adjustment
  • 11. scVI-tools for scRNA-seq ● Huge cell atlases are being generated and annotated at Sanger and worldwide ● Millions of cells will soon become billions ● Efficient label transfer and bias removal become incredibly important ● Difference becomes dramatic as datasets size increases
  • 12. Thank you for your attention!