Nvidia in bioinformatics

Presentation by Jonathan Cohen & Mark Berger at a bioinformatics conference, July 2013. It covers:
- GPU Programming in 10 slides
- GPUs in Bioinformatics
- Porting SeqAn to CUDA
- Resources for developers and bioinformatics professionals

Published in: Technology


  1. GPU ACCELERATION OF BIOINFORMATICS PIPELINES. Jonathan Cohen and Mark Berger, NVIDIA
  2. Agenda
     - GPU Programming in 10 slides – Cohen (10 minutes)
     - GPUs for Bioinformatics – Cohen (10 minutes)
     - Experiences porting SeqAn to CUDA – Siragusa (15 minutes)
     - Resources – Berger (5 minutes)
     - Discussion, Q&A – All (20 minutes)
  3. GPU Programming in Ten Slides
  4. CUDA – Programming for Throughput
     - CPU (host) executes functions. CPU threads: large amount of memory per thread; full-featured instruction set; 1-16 execute simultaneously. Run few threads, each one very fast.
     - GPU (device) executes kernels. CUDA threads: lightweight footprint; full-featured instruction set; 10,000 execute simultaneously. Run many threads, each one slow, so total throughput is high.
  5. CUDA Kernels: Parallel Threads
     - A kernel is an array of threads, executed in parallel
     - All threads execute the same code
     - Each thread has an ID, used to select input/output data and to make control decisions

         float x = input[threadID];
         float y = func(x);
         output[threadID] = y;
  6. CUDA Kernels: Subdivide into Blocks
  7. CUDA Kernels: Subdivide into Blocks. Threads are grouped into blocks.
  8. CUDA Kernels: Subdivide into Blocks. Threads are grouped into blocks. Blocks are grouped into a grid.
  9. CUDA Kernels: Subdivide into Blocks. Threads are grouped into blocks. Blocks are grouped into a grid. A kernel is executed as a grid of blocks of threads.
  10. CUDA Kernels: Subdivide into Blocks. Threads are grouped into blocks. Blocks are grouped into a grid. A kernel is executed as a grid of blocks of threads. Diagram label: GPU.
  11. Accelerated Computing: Multi-core plus Many-cores
      - CPU: optimized for serial tasks
      - GPU accelerator: optimized for many parallel tasks
      - 3-10x+ compute throughput, 7x memory bandwidth, 5x energy efficiency
  12. How GPU Acceleration Works
      - Application code is split between GPU and CPU
      - GPU: compute-intensive functions (5% of code)
      - CPU: rest of sequential CPU code
  13. Hello World in CUDA

          #include <cstdio>

          __global__ void parallel_hello_world() {
              printf("Hello, world. This is thread %d, block %d!\n", threadIdx.x, blockIdx.x);
          }

          int main() {
              parallel_hello_world<<<128,128>>>();
              cudaDeviceSynchronize();  // wait for the kernel so its output is flushed
              return 0;
          }

          > nvcc -o hello_world -arch=sm_30 main.cu
          > ./hello_world
          Hello, world. This is thread 0, block 0!
          Hello, world. This is thread 1, block 0!
          ...
  14. GPUs for Bioinformatics
  15. Life Technologies Ion Proton: 3 GPUs per device. S3229 - GPU Accelerated Signal Processing in Ion Proton Whole Genome Sequencer. Mohit Gupta (Life Technologies), Jakob Siegel (Life Technologies). https://registration.gputechconf.com/form/session-listing
  16. BGI & NVIDIA Joint Innovation Lab: SOAP3 Aligner. S3257 - Tackling Big Data in Genomics with GPU. BingQiang Wang (Beijing Genomics Institute). https://registration.gputechconf.com/form/session-listing
  17. CUDASW++, from Bertil Schmidt’s group: http://cudasw.sourceforge.net/homepage.htm
      - Y. Liu, A. Wirawan, B. Schmidt: "CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions". BMC Bioinformatics, 2013, 14:117.
      - Performance comparisons on the Swiss-Prot database: “On GTX680 (GTX690), CUDASW++ 3.0 yields an average performance of 109.4 (169.7) GCUPS, with a maximum of 119.0 (185.6) GCUPS.”
  18. NVIDIA GPU Life Science Focus
      - Molecular Dynamics: all codes are available. AMBER, CHARMM, DESMOND, DL_POLY, GROMACS, LAMMPS, NAMD. Great multi-GPU performance. GPU codes: Abalone, ACEMD, HOOMD-Blue. Focus: scaling to large numbers of GPUs.
      - Quantum Chemistry: key codes ported or optimizing. Active GPU acceleration projects: VASP, NWChem, Gaussian, GAMESS, ABINIT, Quantum Espresso, BigDFT, CP2K, GPAW, etc. GPU code: TeraChem.
      - Analytical and Medical Imaging Instruments
  19. NVBIO: a GPU-based C++ framework for High Throughput Sequence Analysis
      - Short & Long Read Alignment, Variant Calling, Compression, …
      - Overall design:
        - flexibility & customizability: a templated library
        - parallelism at every level
        - optimize throughput, server-like design
        - optimize the whole pipeline, not just a single component (e.g. including data transfers, SAM, BAM, CRAM I/O, …)
  20. A modular library
      - Tries: FM-index, Suffix Trie, Radix Tree, Sorted Dictionary
      - DP Alignment: Edit Distance, Smith-Waterman, Needleman-Wunsch, Gotoh, Banded/Full DP
      - Text Search: Exact Search, Backtracking
      - Sequence I/O: FASTQ, FASTA
      - Alignment I/O: SAM, BAM, CRAM
      - Support Tools: HTML report generators
      - GPU: O(1k-10k) threads; CPU: O(10-100) threads
  21. nvBowtie2 - Real Datasets
      - Ion Proton, 100M x 175bp (8-350), end-to-end: speedup 4.3x, alignment rate +0.5%, disagreement 0.002%
      - Illumina Genome Analyzer II, 10M x 100bp x 2, end-to-end, ERR161544: speedup 2.4x, alignment rate = (unchanged), disagreement 0.006%
      - Ion Proton, 100M x 175bp (8-350), local: speedup 7.6x, alignment rate -0.6%, disagreement 0.03%
      - Illumina Genome Analyzer II, 10M x 100bp x 2, local, ERR161544: speedup 2.6x, alignment rate = (unchanged), disagreement 0.022%
  22. TT32 NVBIO: efficient sequences analysis on GPUs. Jacopo Pantaleoni, Tuesday 2:10 pm, Hall 9
  23. GPU Technology Conference
      - https://registration.gputechconf.com/form/session-listing (tag: “Bioinformatics and Genomics”)
      - http://www.gputechconf.com/page/home.html (Google: “GPU Technology Conference”)
  24. Resources
  25. 3 Ways to Accelerate Applications
      - Libraries: “drop-in” acceleration
      - Programming languages: maximum flexibility
      - OpenACC directives: easily accelerate applications
  26. GPU Accelerated Libraries: “Drop-in” Acceleration for your Applications
      - Linear Algebra (FFT, BLAS, SPARSE, Matrix)
      - Numerical & Math (RAND, Statistics)
      - Data Struct. & AI (Sort, Scan, Zero Sum)
      - Visual Processing (Image & Video)
      - Libraries shown: NVIDIA cuFFT, cuBLAS, cuSPARSE; NVIDIA Math Lib; NVIDIA cuRAND; NVIDIA NPP; NVIDIA Video Encode; GPU AI – Board Games; GPU AI – Path Finding
  27. OpenACC: Open, Simple, Portable
      - Open standard
      - Easy, compiler-driven approach
      - Portable on GPUs and Xeon Phi
      - Example (the pragma is a compiler hint):

            main() {
              … <serial code> …
              #pragma acc kernels
              {
                <compute intensive code>
              }
              …
            }

      - CAM-SE Climate: 6x faster on GPU, 2x faster on CPU only; top kernel: 50% of runtime
      - Available from:
  28. GPU Programming Languages
      - Fortran: OpenACC, CUDA Fortran
      - C: OpenACC, CUDA C
      - C++: Thrust, CUDA C++
      - Python: PyCUDA, Anaconda Accelerate
      - C#: GPU.NET
      - Numerical analytics: R, MATLAB, Mathematica, LabVIEW
  29. Reaching New Developers - CUDA Python
      - Python productivity + GPU performance
      - Easy to learn, powerful libraries
      - Popular in: new developers, HPC & data analytics
      - Data from CodeEval.com, based on 100k+ code samples
  30. Easiest Way to Learn CUDA
      - 50K registered, 127 countries
      - Learn from the best
      - Anywhere, any time
      - It’s free!
      - Engage with an active community
  31. Feedback/Discussion
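
Slides 5 to 10 describe a kernel as a grid of blocks of threads, where each thread uses its ID to select the data it works on. A minimal sketch of that pattern follows, assuming a one-dimensional grid. The body of `func` (2x + 1) is a hypothetical stand-in for the unspecified function on slide 5, and the names `apply_func` and `run_apply_func`, along with the array and block sizes, are illustrative choices rather than anything from the presentation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical per-element function; slide 5 does not say what func() does.
__device__ float func(float x) { return 2.0f * x + 1.0f; }

// Every thread runs the same code and handles one element,
// selected by its global thread ID within the grid.
__global__ void apply_func(const float* input, float* output, int n) {
    int threadID = blockIdx.x * blockDim.x + threadIdx.x;
    if (threadID < n) {  // the last block may have more threads than elements
        float x = input[threadID];
        float y = func(x);
        output[threadID] = y;
    }
}

// Host-side helper (illustrative): copy in, launch the grid, copy out.
bool run_apply_func(const float* h_in, float* h_out, int n) {
    const size_t bytes = n * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    const int threadsPerBlock = 128;
    // Round up so that every element is covered by some thread.
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    apply_func<<<blocks, threadsPerBlock>>>(d_in, d_out, n);
    cudaError_t err = cudaGetLastError();  // catches launch configuration errors

    // This copy waits for the kernel to finish before reading the results.
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess;
}

int main() {
    const int n = 1000;
    static float h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = (float)i;

    if (!run_apply_func(h_in, h_out, n)) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    printf("output[10] = %.1f\n", h_out[10]);  // func(10) = 2*10 + 1 = 21.0
    return 0;
}
```

With 1000 elements and 128 threads per block, the launch uses 8 blocks, and the bounds check keeps the spare threads in the last block from touching memory outside the arrays.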
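
Slides 25, 26 and 28 name libraries as the "drop-in" route to acceleration and list Thrust for C++ and sorting among the accelerated primitives. As a rough sketch of that route, the example below sorts a handful of integer keys on the GPU with Thrust, which ships with the CUDA toolkit, without writing a kernel. The helper name `gpu_sort` and the key values are hypothetical and chosen only for illustration.

```cuda
#include <cstdio>
#include <vector>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>

// Illustrative helper: sort keys on the GPU and return them to the host.
std::vector<unsigned int> gpu_sort(const std::vector<unsigned int>& keys) {
    // Constructing a device_vector from host iterators copies the data to the GPU.
    thrust::device_vector<unsigned int> d_keys(keys.begin(), keys.end());

    // The sort runs on the device because the iterators refer to device memory.
    thrust::sort(d_keys.begin(), d_keys.end());

    // Copy the sorted keys back to host memory.
    std::vector<unsigned int> sorted(keys.size());
    thrust::copy(d_keys.begin(), d_keys.end(), sorted.begin());
    return sorted;
}

int main() {
    std::vector<unsigned int> keys = {42, 7, 19, 3, 25};  // arbitrary example keys
    std::vector<unsigned int> sorted = gpu_sort(keys);
    for (unsigned int k : sorted) printf("%u ", k);  // 3 7 19 25 42
    printf("\n");
    return 0;
}
```

The file compiles with nvcc like any other `.cu` source; older toolkits may need `-std=c++11` for the initializer list and range-based loop.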
