CloudBurst
• CloudBurst: Highly Sensitive Short Read
  Mapping with MapReduce

• New parallel read-mapping algorithm
  optimized for mapping NGS data to the
  human genome and other reference
  genomes

• Intended for analyses such as SNP discovery,
  genotyping, and personal genomics
CloudBurst
• It is modeled after the short read mapping
  program RMAP

• Reports either all alignments or the unambiguous
  best alignment for each read with any number of
  mismatches or differences

• This level of sensitivity could be prohibitively time
  consuming, but CloudBurst uses the open-source
  Hadoop implementation of MapReduce to
  parallelize execution using multiple compute
  nodes.
CloudBurst
• Running time
  – scales linearly with the number of reads mapped
  – shows near-linear speedup as the number of
    processors increases.


• CloudBurst reduces the running time from
  hours to mere minutes for typical jobs
  involving mapping of millions of short reads to
  the human genome.
Algorithm Overview
• CloudBurst uses seed-and-extend algorithms to
  map reads to a reference genome.

• Seed
  – For an r-bp read to align with at most k differences,
    the alignment must contain a region of length
    s = r/(k+1), called a seed, that exactly matches the
    reference.

• Extend
  – CloudBurst attempts to extend the alignment into an
    end-to-end alignment with at most k mismatches or
    differences
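
The seed-and-extend idea above can be sketched in a few lines of Python. This is a minimal illustration, not CloudBurst's actual code: the function names (`seed_length`, `non_overlapping_seeds`, `count_mismatches`) are hypothetical, the seed length is rounded down to an integer, and the extension step counts mismatches only (handling differences with insertions and deletions would need a dynamic-programming alignment instead).

```python
from typing import List, Optional, Tuple


def seed_length(read_len: int, k: int) -> int:
    """Seed length s = r / (k + 1), rounded down (rounding is an assumption)."""
    if read_len <= 0 or k < 0:
        raise ValueError("read_len must be positive and k non-negative")
    s = read_len // (k + 1)
    if s == 0:
        raise ValueError("read is too short for k differences")
    return s


def non_overlapping_seeds(read: str, k: int) -> List[Tuple[int, str]]:
    """Split the read into k + 1 non-overlapping seeds as (offset, seed) pairs.

    With at most k differences spread over k + 1 disjoint seeds, at least
    one seed must match the reference exactly (pigeonhole argument).
    """
    s = seed_length(len(read), k)
    return [(i * s, read[i * s:(i + 1) * s]) for i in range(k + 1)]


def count_mismatches(read: str, ref: str, ref_start: int, limit: int) -> Optional[int]:
    """Extend to an end-to-end alignment of the read at ref_start.

    Returns the mismatch count, or None if the read runs off the
    reference or the count exceeds `limit`.
    """
    if ref_start < 0:
        return None
    window = ref[ref_start:ref_start + len(read)]
    if len(window) < len(read):
        return None
    mismatches = 0
    for a, b in zip(read, window):
        if a != b:
            mismatches += 1
            if mismatches > limit:
                return None
    return mismatches
```

For example, a 36-bp read with k = 3 gives four seeds of length 9; any alignment with at most three mismatches leaves at least one of those seeds untouched.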
Algorithm Overview
• CloudBurst uses the Hadoop implementation of
  MapReduce to catalog and extend the seeds

• Map phase emits
   – all length-s k-mers from the reference sequences
   – all non-overlapping length-s k-mers from the reads

• Shuffle phase
   – read and reference k-mers that share the same sequence
     are brought together

• Reduce phase
   – the seeds are extended into end-to-end alignments
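
The three phases above can be imitated in memory with plain Python. This is a sketch under stated assumptions and does not use Hadoop or CloudBurst's real record format: all function names are hypothetical, every read is assumed to have the same length, only mismatches are counted, and the reducer looks sequences up in dictionaries rather than receiving them with the emitted seeds.

```python
from collections import defaultdict


def map_reference(ref_id, ref, s):
    """Map: emit every length-s k-mer of the reference, keyed by its sequence."""
    for pos in range(len(ref) - s + 1):
        yield ref[pos:pos + s], ("ref", ref_id, pos)


def map_read(read_id, read, s, k):
    """Map: emit the k + 1 non-overlapping length-s k-mers of a read."""
    for i in range(k + 1):
        offset = i * s
        yield read[offset:offset + s], ("read", read_id, offset)


def shuffle(pairs):
    """Shuffle: group emitted values by their k-mer key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_seed(values, refs, reads, k):
    """Reduce: extend each shared seed into an end-to-end alignment."""
    ref_hits = [v for v in values if v[0] == "ref"]
    read_hits = [v for v in values if v[0] == "read"]
    for _, ref_id, ref_pos in ref_hits:
        for _, read_id, offset in read_hits:
            start = ref_pos - offset  # where the whole read would begin
            read = reads[read_id]
            if start < 0:
                continue
            window = refs[ref_id][start:start + len(read)]
            if len(window) != len(read):
                continue
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= k:
                yield (read_id, ref_id, start, mismatches)


def map_reads_sketch(refs, reads, k):
    """Run map, shuffle, and reduce in sequence on small in-memory inputs."""
    if not reads:
        return []
    r = len(next(iter(reads.values())))  # assumes equal-length reads
    s = r // (k + 1)
    pairs = []
    for ref_id, ref in refs.items():
        pairs.extend(map_reference(ref_id, ref, s))
    for read_id, read in reads.items():
        pairs.extend(map_read(read_id, read, s, k))
    alignments = set()  # the same alignment can be found from several seeds
    for values in shuffle(pairs).values():
        alignments.update(reduce_seed(values, refs, reads, k))
    return sorted(alignments)
```

Calling `map_reads_sketch({"chr": "TTACGTACGGTT"}, {"r1": "ACGTACGG"}, 1)` finds the read at reference position 2 through either of its two seeds; the set removes the duplicate hit.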
Algorithm Overview
Demo



See Getting Started.docx
Related Tools
•   Bowtie: Ultrafast short read alignment
•   SoapSNP: Accurate SNP/consensus calling
•   Tophat: RNA-Seq splice junction mapper
•   Cufflinks: Isoform assembly, quantitation
•   Hadoop: Open Source MapReduce
•   CloudBurst: Sensitive MapReduce alignment
•   Crossbow: Read Mapping and SNP calling in the clouds
•   Jnomics: Cloud-Scale Sequence Analysis
•   Contrail: Cloud-based de novo assembly
•   Myrna: Cloud-Scale differential expression of RNAseq
Q&A
Figure 1: A MapReduce approach for detecting genetic variants from high-throughput genome sequencing.



Source: http://www.nature.com/nbt/journal/v30/n3/fig_tab/nbt.2134_F1.html
