Bioinformatics on GPU

Alex Predeus

Presentation at WSI - June 2021

Martin Prete
Alexander Predeus
01-jun-2021
Bioinformatics on GPU: why, how, and what to expect in the future

Graphics processing units (GPUs)
● Specialized processing chips, initially for graphics
● Thousands of cores for parallel computations
● Software development for GPUs streamlined by Nvidia’s CUDA
● Became heavily used with the rise of neural networks
● Other accelerator architectures, such as FPGAs and TPUs/NPUs, are also available

GPUs in bioinformatics
● Alignment (BLAST, short-read aligners)
● Simulations of biomolecules (e.g. proteins)
● Any task that can use deep learning:
○ Variant calling with DeepVariant
○ Oxford Nanopore basecalling
○ Cell type inference in scRNA-seq
○ Many more every day...

When is GPU useful?
● The amount of biological data grows (at least) exponentially
● Sequences, cells, tracker data, etc.
● However:
○ The computation must be of a type that suits the GPU
○ RAM requirements and data transfer (I/O) costs must not outweigh the benefits
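The trade-off in the last bullet can be put into rough numbers. The sketch below is a back-of-the-envelope model, not a benchmark: the function name, the 50x kernel speedup, the 10 GB payload and the 10 GB/s link speed are all hypothetical values chosen for illustration.

```python
def gpu_offload_speedup(cpu_seconds, kernel_speedup, bytes_moved, link_bytes_per_second):
    """Estimate the end-to-end speedup of moving a job from CPU to GPU.

    Every argument is a hypothetical figure supplied by the caller:
    the CPU runtime, how much faster the GPU runs the computation itself,
    and how much data has to cross the host-to-device link.
    """
    transfer_seconds = bytes_moved / link_bytes_per_second
    gpu_seconds = cpu_seconds / kernel_speedup + transfer_seconds
    return cpu_seconds / gpu_seconds


# Compute-heavy job: 1000 s on CPU, 10 GB moved over a 10 GB/s link.
heavy = gpu_offload_speedup(1000.0, 50.0, 10e9, 10e9)

# Transfer-bound job: only 0.5 s of CPU work, but the same 10 GB to move.
light = gpu_offload_speedup(0.5, 50.0, 10e9, 10e9)

print(f"compute-heavy job: {heavy:.1f}x, transfer-bound job: {light:.2f}x")
```

In this toy model the compute-heavy job keeps most of the kernel speedup, while the transfer-bound job ends up slower on the GPU than on the CPU, because moving the data takes longer than the computation it replaces.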

DeepVariant
● Accurate variant calling is a very hard problem
● DeepVariant = variant caller using neural networks
● Converts pileup (alignment) data into image “channels”
● Takes advantage of Google's expertise in CNN-based image classification
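The idea of turning a pileup into image “channels” can be sketched in a few lines. This is a simplified illustration, not DeepVariant's actual encoding: the function name, the four channels chosen here, the intensity values and the toy reads are all assumptions, and indels are ignored.

```python
# Hypothetical intensity assigned to each base in the "base" channel.
BASE_INTENSITY = {"A": 0.25, "C": 0.5, "G": 0.75, "T": 1.0}


def pileup_to_channels(reference, reads):
    """Encode gap-free aligned reads as 2-D channels (rows = reads, columns = positions).

    reads is a list of (start, sequence, base_qualities, is_reverse) tuples.
    Positions a read does not cover stay at 0.0 in every channel.
    """
    width = len(reference)
    channels = {"base": [], "quality": [], "strand": [], "mismatch": []}
    for start, sequence, qualities, is_reverse in reads:
        rows = {name: [0.0] * width for name in channels}
        for offset, (base, quality) in enumerate(zip(sequence, qualities)):
            pos = start + offset
            if not 0 <= pos < width:
                continue  # read hangs over the edge of the window
            rows["base"][pos] = BASE_INTENSITY.get(base, 0.0)
            rows["quality"][pos] = min(quality, 40) / 40.0  # cap at Q40, scale to 0..1
            rows["strand"][pos] = 1.0 if is_reverse else 0.5
            rows["mismatch"][pos] = 1.0 if base != reference[pos] else 0.0
        for name in channels:
            channels[name].append(rows[name])
    return channels


reference = "ACGTAC"
reads = [
    (0, "ACGT", [30, 30, 30, 30], False),  # matches the reference
    (2, "GAAC", [40, 20, 40, 40], True),   # carries an A where the reference has T
]
image = pileup_to_channels(reference, reads)
print(image["mismatch"])
```

Each channel is a small matrix with one row per read, so stacking the channels gives a multi-channel "image" of the kind a CNN can classify.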

DeepVariant
● Variant calling is much faster on a GPU, even though the team made an effort to optimize for CPU
● Model training is dramatically faster (and nearly impossible on CPU)

Nanopore Basecalling
● Initially very error-prone (error rates up to 30%!)
● Basecalling is hard for Nanopore-derived reads because the signal is continuous (homopolymers are a particular issue)
● However, NN-based methods improve every year
● Algorithms developed for speech recognition, such as connectionist temporal classification (CTC), are hugely useful
● Result: median accuracy over 98%; some reads are perfect over thousands of bases
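The CTC idea borrowed from speech recognition can be illustrated with its decoding rule: take the best symbol per signal frame, merge consecutive repeats, then drop the blank symbol. The sketch below shows only that collapse step with greedy decoding. The function names, the "-ACGT" alphabet and the toy one-hot frames are assumptions, and real basecallers feed in scores from a trained neural network and may use more elaborate decoders.

```python
BLANK = "-"


def ctc_greedy_decode(frame_scores, alphabet="-ACGT"):
    """Greedy CTC decoding: best symbol per frame, merge repeats, drop blanks."""
    best_path = [
        alphabet[max(range(len(scores)), key=scores.__getitem__)]
        for scores in frame_scores
    ]
    decoded = []
    previous = None
    for symbol in best_path:
        # Emit a base only when it starts a new run and is not the blank.
        if symbol != previous and symbol != BLANK:
            decoded.append(symbol)
        previous = symbol
    return "".join(decoded)


def one_hot_frames(path, alphabet="-ACGT"):
    """Build toy per-frame scores that put all the weight on the given path."""
    return [[1.0 if s == symbol else 0.0 for s in alphabet] for symbol in path]


# A blank between the two runs of A keeps both bases.
called = ctc_greedy_decode(one_hot_frames("AA-AAC--G"))
# Without the blank, the run of A collapses into a single base.
collapsed = ctc_greedy_decode(one_hot_frames("AAAAC--G"))
print(called, collapsed)
```

The second call shows why homopolymers are difficult: unless the network emits a blank between identical bases, a run of the same base is indistinguishable from one long-lasting base.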

Basecalling on CPU vs GPU
● Nanopore basecalling is impractical on CPU (a big run takes days on 64 cores)
● FPGA- or GPU-based basecalling is 1-2 orders of magnitude faster

Interactive GPU usage
● Model “zoos” are becoming increasingly standard in genomics and bioinformatics
● Scientists are starting to use high-level NN-based tools on a daily basis
● Effective use of such models requires constant parameter adjustment

scVI-tools for scRNA-seq
● Huge cell atlases are being generated and annotated at Sanger and worldwide
● Millions of cells will soon become billions
● Efficient label transfer and bias removal are becoming incredibly important
● The performance difference becomes dramatic as dataset size increases
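Label transfer can be pictured as a nearest-neighbour vote in a shared low-dimensional embedding of cells. The sketch below is a generic k-nearest-neighbour illustration under that assumption; it is not the scVI-tools API or its model, and the function name, the 2-D coordinates and the cell type labels are made up for the example.

```python
import math
from collections import Counter


def transfer_labels(reference_embeddings, reference_labels, query_embeddings, k=3):
    """Give each query cell the majority label of its k nearest reference cells."""
    predictions = []
    for query in query_embeddings:
        # Brute force: rank every reference cell by Euclidean distance to the query.
        ranked = sorted(
            range(len(reference_embeddings)),
            key=lambda i: math.dist(query, reference_embeddings[i]),
        )
        votes = Counter(reference_labels[i] for i in ranked[:k])
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Toy annotated atlas: two well-separated clusters in a 2-D latent space.
reference_embeddings = [(0.0, 0.0), (0.3, 0.1), (0.1, 0.4), (5.0, 5.0), (5.2, 4.9), (4.8, 5.3)]
reference_labels = ["T cell", "T cell", "T cell", "B cell", "B cell", "B cell"]

# Unannotated cells embedded in the same space.
query_embeddings = [(0.2, 0.1), (4.9, 5.1)]
predicted = transfer_labels(reference_embeddings, reference_labels, query_embeddings)
print(predicted)
```

The brute-force search costs one distance computation per query and reference pair, so the work grows with the product of the two dataset sizes. That kind of regular, massively parallel arithmetic is what GPUs handle well once atlases reach millions of cells.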

DeepVariant
● Huge demand for this task in medical genomics!
● Illumina is offering DRAGEN, an FPGA-based solution that streamlines human variant calling

Thank you for your attention!
