Introduction to CloudBurst


    Presentation Transcript

    • CloudBurst
      • CloudBurst: Highly Sensitive Short Read Mapping with MapReduce
      • A new parallel read-mapping algorithm optimized for mapping next-generation sequencing (NGS) data to the human genome and other reference genomes
      • Intended uses: SNP discovery, genotyping, and personal genomics
    • CloudBurst
      • Modeled after the short read mapping program RMAP
      • Reports either all alignments or the unambiguous best alignment for each read, with any number of mismatches or differences
      • This level of sensitivity could be prohibitively time-consuming, so CloudBurst uses the open-source Hadoop implementation of MapReduce to parallelize execution across multiple compute nodes
    • CloudBurst
      • Running time
        • scales linearly with the number of reads mapped
        • shows near-linear speedup as the number of processors increases
      • CloudBurst reduces the running time from hours to mere minutes for typical jobs that map millions of short reads to the human genome
    • Algorithm Overview
      • CloudBurst uses seed-and-extend algorithms to map reads to a reference genome
      • Seed: if a read of length r aligns with at most k differences, the alignment must contain a region of length s = r/(k+1), called a seed, that exactly matches the reference (k differences cannot fall in all k+1 non-overlapping segments of the read)
      • Extend: CloudBurst attempts to extend each seed match into an end-to-end alignment with at most k mismatches or differences
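The seed-and-extend idea on this slide can be illustrated with a minimal single-machine sketch. It is not CloudBurst's implementation: it handles mismatches only (no insertions or deletions), reads the seed length as s = floor(r / (k + 1)), and finds seed hits with a plain string search. The function names `seed_length`, `read_seeds`, `extend`, and `align_read` are hypothetical and chosen for illustration.

```python
from typing import List, Optional, Tuple


def seed_length(read_len: int, k: int) -> int:
    """Seed length s = floor(r / (k + 1)) for a read of length r and at most k differences."""
    s = read_len // (k + 1)
    if s == 0:
        raise ValueError("read is too short for this many differences")
    return s


def read_seeds(read: str, k: int) -> List[Tuple[int, str]]:
    """Split a read into non-overlapping length-s seeds, returned as (offset, seed)."""
    s = seed_length(len(read), k)
    return [(i, read[i:i + s]) for i in range(0, len(read) - s + 1, s)]


def extend(reference: str, read: str, ref_start: int, k: int) -> Optional[int]:
    """Return the mismatch count of the end-to-end alignment at ref_start.

    Returns None if the read would run off the reference or needs more than k mismatches.
    """
    if ref_start < 0 or ref_start + len(read) > len(reference):
        return None
    window = reference[ref_start:ref_start + len(read)]
    mismatches = sum(a != b for a, b in zip(window, read))
    return mismatches if mismatches <= k else None


def align_read(reference: str, read: str, k: int) -> List[Tuple[int, int]]:
    """Return sorted (reference_start, mismatches) pairs for all alignments with <= k mismatches."""
    hits = set()
    for offset, seed in read_seeds(read, k):
        pos = reference.find(seed)
        while pos != -1:
            # An exact seed hit at pos implies a candidate alignment starting at pos - offset.
            mismatches = extend(reference, read, pos - offset, k)
            if mismatches is not None:
                hits.add((pos - offset, mismatches))
            pos = reference.find(seed, pos + 1)
    return sorted(hits)


if __name__ == "__main__":
    toy_reference = "GATTACAGATTTCAGG"  # made-up sequence, for illustration only
    print(align_read(toy_reference, "GATTTC", k=1))
```

With k = 1 the six-base read is cut into two seeds of length 3; because one mismatch can damage only one of them, every alignment with at most one mismatch is found through at least one exact seed hit.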
    • Algorithm Overview
      • CloudBurst uses the Hadoop implementation of MapReduce to catalog and extend the seeds
      • Map phase emits
        • all length-s k-mers from the reference sequences
        • all non-overlapping length-s k-mers from the reads
      • Shuffle phase: read and reference k-mers that share the same sequence are brought together
      • Reduce phase: the seeds are extended into end-to-end alignments
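The three phases above can be mimicked in plain Python to show how the data flows. This is a toy in-memory simulation, not Hadoop code and not CloudBurst's implementation. It assumes all reads have the same length, counts mismatches only, and lets the reducer look up full sequences from dictionaries, whereas a distributed job would have to make the surrounding sequence available to each reducer some other way. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Value = Tuple[str, str, int]  # (source: "ref" or "read", sequence name, offset)


def map_reference_kmers(name: str, seq: str, s: int) -> Iterator[Tuple[str, Value]]:
    """Map: emit every (overlapping) length-s k-mer of a reference sequence."""
    for i in range(len(seq) - s + 1):
        yield seq[i:i + s], ("ref", name, i)


def map_read_seeds(name: str, seq: str, s: int) -> Iterator[Tuple[str, Value]]:
    """Map: emit only the non-overlapping length-s k-mers of a read."""
    for i in range(0, len(seq) - s + 1, s):
        yield seq[i:i + s], ("read", name, i)


def shuffle(pairs: Iterable[Tuple[str, Value]]) -> Dict[str, List[Value]]:
    """Shuffle: group all values that share the same k-mer key."""
    groups: Dict[str, List[Value]] = defaultdict(list)
    for kmer, value in pairs:
        groups[kmer].append(value)
    return groups


def reduce_seed(values: List[Value], references: Dict[str, str],
                reads: Dict[str, str], k: int) -> Iterator[Tuple[str, str, int, int]]:
    """Reduce: extend every reference/read pair sharing a seed into an end-to-end alignment."""
    ref_hits = [v for v in values if v[0] == "ref"]
    read_hits = [v for v in values if v[0] == "read"]
    for _, ref_name, ref_pos in ref_hits:
        for _, read_name, read_offset in read_hits:
            read = reads[read_name]
            start = ref_pos - read_offset
            window = references[ref_name][start:start + len(read)] if start >= 0 else ""
            if len(window) != len(read):
                continue  # the read would run off either end of the reference
            mismatches = sum(a != b for a, b in zip(window, read))
            if mismatches <= k:
                yield read_name, ref_name, start, mismatches


def run_cloudburst_sketch(references: Dict[str, str], reads: Dict[str, str],
                          k: int) -> List[Tuple[str, str, int, int]]:
    """Run map, shuffle, and reduce in memory; returns (read, reference, start, mismatches)."""
    read_len = len(next(iter(reads.values())))  # assumption: all reads share one length
    s = read_len // (k + 1)
    if s == 0:
        raise ValueError("reads are too short for this many mismatches")
    pairs: List[Tuple[str, Value]] = []
    for name, seq in references.items():
        pairs.extend(map_reference_kmers(name, seq, s))
    for name, seq in reads.items():
        pairs.extend(map_read_seeds(name, seq, s))
    alignments = set()  # the same alignment can be reached through several seeds
    for values in shuffle(pairs).values():
        alignments.update(reduce_seed(values, references, reads, k))
    return sorted(alignments)


if __name__ == "__main__":
    toy_refs = {"chr": "GATTACAGATTTCAGG"}  # made-up data, for illustration only
    toy_reads = {"r1": "GATTTC", "r2": "ACAGAT"}
    for alignment in run_cloudburst_sketch(toy_refs, toy_reads, k=1):
        print(alignment)
```

Keying both record types by the k-mer itself is what lets the shuffle do the seed lookup: any reducer group that contains both a reference entry and a read entry is an exact seed match waiting to be extended.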
    • Algorithm Overview
    • Demo
      • See Getting Started.docx
    • Related Tools
      • Bowtie: Ultrafast short read alignment
      • SOAPsnp: Accurate SNP/consensus calling
      • TopHat: RNA-Seq splice junction mapper
      • Cufflinks: Isoform assembly, quantitation
      • Hadoop: Open Source MapReduce
      • CloudBurst: Sensitive MapReduce alignment
      • Crossbow: Read Mapping and SNP calling in the clouds
      • Jnomics: Cloud-Scale Sequence Analysis
      • Contrail: Cloud-based de novo assembly
      • Myrna: Cloud-Scale differential expression of RNA-Seq
    • Q&A
    • Figure 1: A MapReduce approach for detecting genetic variants from high-throughput genome sequencing. Source: http://www.nature.com/nbt/journal/v30/n3/fig_tab/nbt.2134_F1.html