Heshan Lin: Accelerating Short Read Mapping, Local Realignment, and a Discovery on a Graphics Processing Unit (GPU)
 

Heshan Lin's talk at the 1st Earth Microbial Project meeting in Shenzhen, June 15th 2011.


    Presentation Transcript

    • Accelerating Sequence Analysis on the Graphics Processing Unit (GPU)
      Wu Feng and Heshan Lin
      Department of Computer Science, Virginia Tech
    • NGS Democratizing DNA Sequencing
      Sequencing available to the masses in the near future
      Source: www.genome.gov
    • Bottleneck Shift -> Computation
      ChIP-Seq …
      Transcriptome Sequencing
      Complete Genome Re-sequencing
      Metagenomics
      BIG Data
    • Traditional HPC Resources
      [Diagram: HPC users have access to clusters and supercomputers; what about the masses?]
    • Graphics Processing Unit (GPU)
      Graphics & gaming -> general purpose computing
      Ubiquitously available: Desktop, laptop, iPad
    • “Personalized Supercomputer”
      • ~10x faster than a CPU
      • 512 cores
      • 10^12 FLOPS
      • On par with the power of a supercomputer from 2004
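As a rough back-of-the-envelope check (not on the slide; assuming a Fermi-class card such as the GTX 580 with a roughly 1.5 GHz shader clock and one fused multiply-add per core per cycle), the 10^12 FLOPS figure follows directly from the core count:

```latex
512~\text{cores} \times 1.5\times10^{9}~\tfrac{\text{cycles}}{\text{s}} \times 2~\tfrac{\text{FLOPs}}{\text{cycle}} \approx 1.5\times10^{12}~\text{FLOPS}
```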
    • Traditional CPU Cores
      Optimized for single-thread performance
      [Diagram: a single fat core with control (fetch/decode), out-of-order control logic, a branch predictor, a memory prefetcher, one ALU, an execution context (registers), and a large data cache]
      Courtesy of K. Fatahalian
    • [Slide figure] Source: Borkar and De, Intel
    • GPU: Optimized for Throughput
      Use much simpler cores
      Use vectorization to replicate simple cores
      [Diagram: two fetch/decode control units, each driving a wide array of ALUs with per-lane execution contexts (registers) and a shared execution context]
      Courtesy of K. Fatahalian
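To make the throughput-oriented model concrete, here is a minimal CUDA sketch (mine, not from the talk): every element gets its own lightweight thread, and the hardware runs the threads in lockstep groups that share a fetch/decode unit, which is the replication-by-vectorization the slide describes. All names below are illustrative.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread handles one element; the GPU schedules thousands of these
// lightweight threads across its simple cores to hide memory latency.
__global__ void scale_add(const float *a, const float *b, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
    if (i < n)
        out[i] = 2.0f * a[i] + b[i];                 // same instruction, different data (SIMT)
}

int main()
{
    const int n = 1 << 20;                           // ~1M elements
    size_t bytes = n * sizeof(float);

    // Host data
    float *ha = (float *)malloc(bytes), *hb = (float *)malloc(bytes), *hout = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Device copies
    float *da, *db, *dout;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dout, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale_add<<<blocks, threads>>>(da, db, dout, n);

    cudaMemcpy(hout, dout, bytes, cudaMemcpyDeviceToHost);
    printf("out[0] = %f\n", hout[0]);                // expect 4.0

    cudaFree(da); cudaFree(db); cudaFree(dout);
    free(ha); free(hb); free(hout);
    return 0;
}
```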
    • Take with a Grain of Salt
      Raw compute power != application performance
      Not all applications are suitable for GPUs
      Developing fully optimized GPU code is non-trivial and requires rethinking the computation
      A GPU core is MUCH SLOWER than a CPU core
      Need a lot of parallelism to hide memory latency
      Reduce branching as much as possible (see the sketch after this slide)
      Think of an army of synchronized snails
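A small CUDA sketch of the branching advice (illustrative only, not from the talk): threads in a warp execute in lockstep, so when an if/else splits a warp both paths run one after the other, while a simple select lets the compiler predicate the operation and keep the warp converged.

```cuda
// Divergent version: threads in the same warp that disagree on the
// predicate execute both branch paths one after the other.
__global__ void clamp_divergent(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        if (x[i] < 0.0f)
            x[i] = 0.0f;
        else
            x[i] = 0.5f * x[i];
    }
}

// Branch-reduced version: the conditional becomes a select, which the
// compiler can lower to predicated instructions, so the warp stays on
// a single instruction stream.
__global__ void clamp_uniform(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = x[i];
        x[i] = (v < 0.0f) ? 0.0f : 0.5f * v;
    }
}
```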
    • GPU Potential for Sequence Alignment
      Why sequence alignment?
      Fundamental in sequence analysis
      Computationally intensive
      Preliminary study
    • Lessons Learnt
      CPU-optimized code may be difficult to accelerate on GPUs
      BLASTP 6.5x vs. Smith-Waterman 30x
      Requires rethinking algorithm design
      A scalable but less optimal algorithm can be better
      Example: RMAP
      Originally uses a hash table to find matches (O(n))
      Switched to a slower binary search (O(n log n)); see the sketch after this slide
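A hedged sketch of the RMAP-style trade-off described above (the data layout and function names are my own illustration, not RMAP's actual code): a sorted array of seeds is compact, read-only, and searched with uniform control flow, which suits the GPU better than a pointer-chasing hash table even though each lookup is O(log n) rather than O(1).

```cuda
// Illustrative only: each GPU thread looks up one read's seed in a
// sorted array of reference seeds via binary search.  The regular,
// read-only array layout maps onto GPU memory far better than a
// pointer-based hash table, despite the extra O(log n) per lookup.
__device__ int find_seed(const unsigned long long *sorted_seeds, int n,
                         unsigned long long key)
{
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        unsigned long long v = sorted_seeds[mid];
        if (v == key)      return mid;    // index of the matching seed
        else if (v < key)  lo = mid + 1;
        else               hi = mid - 1;
    }
    return -1;                            // no match
}

__global__ void map_reads(const unsigned long long *read_seeds, int num_reads,
                          const unsigned long long *ref_seeds, int num_ref,
                          int *hits)
{
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r < num_reads)
        hits[r] = find_seed(ref_seeds, num_ref, read_seeds[r]);
}
```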
    • Opportunities
      [Chart: alignment algorithms plotted by time vs. accuracy, including Smith-Waterman, Needleman-Wunsch, BLAST, BWA, Bowtie, and a "next-gen algorithm?"]
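For context on the accuracy end of the chart (not on the slide, and using a simple linear gap penalty g rather than the affine gaps most tools use): Smith-Waterman fills an m-by-n score matrix with the recurrence below, so exact local alignment costs O(mn) per pair of sequences, which is the work that the faster seeded heuristics (BLAST, Bowtie, BWA) try to avoid.

```latex
H_{i,0} = H_{0,j} = 0, \qquad
H_{i,j} = \max\{\, 0,\; H_{i-1,j-1} + s(a_i, b_j),\; H_{i-1,j} - g,\; H_{i,j-1} - g \,\}
```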
    • Compute the Cure Initiative
      Partnership between NVIDIA and VT
      Goal: Leverage GPU power to fight cancer
      Current focus: GPU accelerated sequence alignment framework
      http://www.nvidia.com/object/compute-the-cure.html
    • Conclusion
      Democratizing DNA sequencing requires more accessible HPC resources
      GPUs present both opportunities and challenges
      Initial results are promising
      For more information
      Synergy website – http://synergy.cs.vt.edu
    • Acknowledgement
      Collaborators
      David Mittelman, Virginia Bioinformatics Institute
      Students
      Ashwin Aji
      Shucai Xiao
      Funding
      NVIDIA Compute the Cure Program
      NSF Center for High-Performance Reconfigurable Computing