SlideShare a Scribd company logo
1 of 11
CloudBurst
• CloudBurst : Highly Sensitive Short Read
  Mapping with MapReduce

• New parallel read-mapping algorithm
  optimized for mapping NGS data to the
  human genome and other reference
  genomes

• SNP discovery, genotyping, and personal
  genomics
CloudBurst
• It is modeled after the short read mapping
  program RMAP

• Reports either all alignments or the unambiguous
  best alignment for each read with any number of
  mismatches or differences

• This level of sensitivity could be prohibitively time
  consuming, but CloudBurst uses the open-source
  Hadoop implementation of MapReduce to
  parallelize execution using multiple compute
  nodes.
CloudBurst
• Running time
  – scales linearly with the number of reads mapped
  – with near linear speedup as the number of
    processors increases.


• CloudBurst reduces the running time from
  hours to mere minutes for typical jobs
  involving mapping of millions of short reads to
  the human genome.
Algorithm Overview
• CloudBurst uses seed-and-extend algorithms to
  map reads to a reference genome.

• Seed
  – k differences : the alignment must have a region of
    length s=r/k+1 called a seed that exactly matches the
    reference.

• Extend
  – CloudBurst attempts to extend the alignment into an
    end-to-end alignment with at most k mismatches or
    differences
Algorithm Overview
• CloudBurst uses the Hadoop implementation of
  MapReduce to catalog and extend the seeds

• Map phase emits
   – all length-s k-mers from the reference sequences
   – all non-overlapping length-s kmers from the reads

• Shuffle phase
   – read and reference kmers are brought together

• Reduce phase
   – the seeds are extended into end-to-end alignments
Algorithm Overview
Demo



Getting Started.docx 참고
Related Tools
•   Bowtie: Ultrafast short read alignment
•   SoapSNP: Accurate SNP/consensus calling
•   Tophat: RNA-Seq splice junction mapper
•   Cufflinks: Isoform assembly, quantitation
•   Hadoop: Open Source MapReduce
•   CloudBurst: Sensitive MapReduce alignment
•   Crossbow: Read Mapping and SNP calling in the clouds
•   Jnomics: Cloud-Scale Sequence Analysis
•   Contrail: Cloud-based de novo assembly
•   Myrna: Cloud-Scale differential expression of RNAseq
Q&A
Figure 1: A MapReduce approach for detecting genetic variants from high-throughput genome sequencing.



                                                       출처 : http://www.nature.com/nbt/journal/v30/n3/fig_tab/nbt.2134_F1.html

More Related Content

What's hot

Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Deanna Kosaraju
 
Introduction to Hadoop part 2
Introduction to Hadoop part 2Introduction to Hadoop part 2
Introduction to Hadoop part 2Giovanna Roda
 
データ解析技術入門(Hadoop編)
データ解析技術入門(Hadoop編)データ解析技術入門(Hadoop編)
データ解析技術入門(Hadoop編)Takumi Asai
 
Hadoop for Scientific Workloads__HadoopSummit2010
Hadoop for Scientific Workloads__HadoopSummit2010Hadoop for Scientific Workloads__HadoopSummit2010
Hadoop for Scientific Workloads__HadoopSummit2010Yahoo Developer Network
 
Topology Aware Resource Allocation
Topology Aware Resource AllocationTopology Aware Resource Allocation
Topology Aware Resource AllocationSujith Jay Nair
 
Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014Ryu Kobayashi
 
HPTS talk on micro-sharding with Katta
HPTS talk on micro-sharding with KattaHPTS talk on micro-sharding with Katta
HPTS talk on micro-sharding with KattaTed Dunning
 
MapReduce: A useful parallel tool that still has room for improvement
MapReduce: A useful parallel tool that still has room for improvementMapReduce: A useful parallel tool that still has room for improvement
MapReduce: A useful parallel tool that still has room for improvementKyong-Ha Lee
 
Application of MapReduce in Cloud Computing
Application of MapReduce in Cloud ComputingApplication of MapReduce in Cloud Computing
Application of MapReduce in Cloud ComputingMohammad Mustaqeem
 
Performance Analysis of MapReduce Implementations on High Performance Homolog...
Performance Analysis of MapReduce Implementations on High Performance Homolog...Performance Analysis of MapReduce Implementations on High Performance Homolog...
Performance Analysis of MapReduce Implementations on High Performance Homolog...Koichi Shirahata
 
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduceBIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduceMahantesh Angadi
 
Beyond Hadoop 1.0: A Holistic View of Hadoop YARN, Spark and GraphLab
Beyond Hadoop 1.0: A Holistic View of Hadoop YARN, Spark and GraphLabBeyond Hadoop 1.0: A Holistic View of Hadoop YARN, Spark and GraphLab
Beyond Hadoop 1.0: A Holistic View of Hadoop YARN, Spark and GraphLabVijay Srinivas Agneeswaran, Ph.D
 
Data mining-2011-09
Data mining-2011-09Data mining-2011-09
Data mining-2011-09Ted Dunning
 

What's hot (20)

ACM 2013-02-25
ACM 2013-02-25ACM 2013-02-25
ACM 2013-02-25
 
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
 
Introduction to Hadoop part 2
Introduction to Hadoop part 2Introduction to Hadoop part 2
Introduction to Hadoop part 2
 
JOSA TechTalks - Big Data on Hadoop
JOSA TechTalks - Big Data on HadoopJOSA TechTalks - Big Data on Hadoop
JOSA TechTalks - Big Data on Hadoop
 
データ解析技術入門(Hadoop編)
データ解析技術入門(Hadoop編)データ解析技術入門(Hadoop編)
データ解析技術入門(Hadoop編)
 
myHadoop 0.30
myHadoop 0.30myHadoop 0.30
myHadoop 0.30
 
Hadoop for Scientific Workloads__HadoopSummit2010
Hadoop for Scientific Workloads__HadoopSummit2010Hadoop for Scientific Workloads__HadoopSummit2010
Hadoop for Scientific Workloads__HadoopSummit2010
 
Topology Aware Resource Allocation
Topology Aware Resource AllocationTopology Aware Resource Allocation
Topology Aware Resource Allocation
 
Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014
 
HPTS talk on micro-sharding with Katta
HPTS talk on micro-sharding with KattaHPTS talk on micro-sharding with Katta
HPTS talk on micro-sharding with Katta
 
MapReduce: A useful parallel tool that still has room for improvement
MapReduce: A useful parallel tool that still has room for improvementMapReduce: A useful parallel tool that still has room for improvement
MapReduce: A useful parallel tool that still has room for improvement
 
Application of MapReduce in Cloud Computing
Application of MapReduce in Cloud ComputingApplication of MapReduce in Cloud Computing
Application of MapReduce in Cloud Computing
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
Performance Analysis of MapReduce Implementations on High Performance Homolog...
Performance Analysis of MapReduce Implementations on High Performance Homolog...Performance Analysis of MapReduce Implementations on High Performance Homolog...
Performance Analysis of MapReduce Implementations on High Performance Homolog...
 
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduceBIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
 
Eg4301808811
Eg4301808811Eg4301808811
Eg4301808811
 
Beyond Hadoop 1.0: A Holistic View of Hadoop YARN, Spark and GraphLab
Beyond Hadoop 1.0: A Holistic View of Hadoop YARN, Spark and GraphLabBeyond Hadoop 1.0: A Holistic View of Hadoop YARN, Spark and GraphLab
Beyond Hadoop 1.0: A Holistic View of Hadoop YARN, Spark and GraphLab
 
Data mining-2011-09
Data mining-2011-09Data mining-2011-09
Data mining-2011-09
 
Map Reduce basics
Map Reduce basicsMap Reduce basics
Map Reduce basics
 

Viewers also liked (20)

Natural Disaster - Cloud Burst
Natural Disaster - Cloud BurstNatural Disaster - Cloud Burst
Natural Disaster - Cloud Burst
 
Weather winds, air masses, air pressures, fronts
Weather   winds, air masses, air pressures, frontsWeather   winds, air masses, air pressures, fronts
Weather winds, air masses, air pressures, fronts
 
2010 leh cloud bursting
2010 leh cloud bursting2010 leh cloud bursting
2010 leh cloud bursting
 
Портрет бабушки
Портрет бабушкиПортрет бабушки
Портрет бабушки
 
Laporan
LaporanLaporan
Laporan
 
турнир вежливых ребят
турнир вежливых ребяттурнир вежливых ребят
турнир вежливых ребят
 
Indira's New York Food tour
Indira's New York Food tourIndira's New York Food tour
Indira's New York Food tour
 
El aquí y el ahora
El aquí y el ahoraEl aquí y el ahora
El aquí y el ahora
 
Larisa
LarisaLarisa
Larisa
 
Shira
ShiraShira
Shira
 
Google AdWords 101
Google AdWords 101Google AdWords 101
Google AdWords 101
 
Tabitha 2012 power point
Tabitha 2012 power pointTabitha 2012 power point
Tabitha 2012 power point
 
Azul
AzulAzul
Azul
 
Física y ciencia
Física y cienciaFísica y ciencia
Física y ciencia
 
Обо мне
Обо мнеОбо мне
Обо мне
 
ADHD In The Workplace
ADHD In The WorkplaceADHD In The Workplace
ADHD In The Workplace
 
Cорокин Захар Артёмович
Cорокин Захар АртёмовичCорокин Захар Артёмович
Cорокин Захар Артёмович
 
конёк горбунок
конёк горбунокконёк горбунок
конёк горбунок
 
WaterHackathonTelAviv - instructions
WaterHackathonTelAviv - instructionsWaterHackathonTelAviv - instructions
WaterHackathonTelAviv - instructions
 
Встреча с поэтом Е.Н. Ткач
Встреча с поэтом Е.Н. ТкачВстреча с поэтом Е.Н. Ткач
Встреча с поэтом Е.Н. Ткач
 

Similar to Highly Sensitive Cloud-Based Read Mapping with CloudBurst

Rich Data Graphs for MapReduce
Rich Data Graphs for MapReduceRich Data Graphs for MapReduce
Rich Data Graphs for MapReduceScott Cinnamond
 
Scheduling scheme for hadoop clusters
Scheduling scheme for hadoop clustersScheduling scheme for hadoop clusters
Scheduling scheme for hadoop clustersAmjith Singh
 
Target Holding - Big Dikes and Big Data
Target Holding - Big Dikes and Big DataTarget Holding - Big Dikes and Big Data
Target Holding - Big Dikes and Big DataFrens Jan Rumph
 
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...Databricks
 
Energy Efficient Routing Approaches in Ad-hoc Networks
                Energy Efficient Routing Approaches in Ad-hoc Networks                Energy Efficient Routing Approaches in Ad-hoc Networks
Energy Efficient Routing Approaches in Ad-hoc NetworksKishan Patel
 
A sdn based application aware and network provisioning
A sdn based application aware and network provisioningA sdn based application aware and network provisioning
A sdn based application aware and network provisioningStanley Wang
 
Coupled Layer-wise Graph Convolution for Transportation Demand Prediction
Coupled Layer-wise Graph Convolution for Transportation Demand PredictionCoupled Layer-wise Graph Convolution for Transportation Demand Prediction
Coupled Layer-wise Graph Convolution for Transportation Demand Predictionivaderivader
 
PhillyDB Talk - Beyond Batch
PhillyDB Talk - Beyond BatchPhillyDB Talk - Beyond Batch
PhillyDB Talk - Beyond Batchboorad
 
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...DataStax Academy
 
Challenges and Opportunities of Big Data Genomics
Challenges and Opportunities of Big Data GenomicsChallenges and Opportunities of Big Data Genomics
Challenges and Opportunities of Big Data GenomicsYasin Memari
 
Hybrid networking and distribution
Hybrid networking and distribution Hybrid networking and distribution
Hybrid networking and distribution vivek pratap singh
 
PEARC 17: Spark On the ARC
PEARC 17: Spark On the ARCPEARC 17: Spark On the ARC
PEARC 17: Spark On the ARCHimanshu Bedi
 
Apache Spark Overview part1 (20161107)
Apache Spark Overview part1 (20161107)Apache Spark Overview part1 (20161107)
Apache Spark Overview part1 (20161107)Steve Min
 

Similar to Highly Sensitive Cloud-Based Read Mapping with CloudBurst (20)

NGBT_poster_v0.4
NGBT_poster_v0.4NGBT_poster_v0.4
NGBT_poster_v0.4
 
Rich Data Graphs for MapReduce
Rich Data Graphs for MapReduceRich Data Graphs for MapReduce
Rich Data Graphs for MapReduce
 
Scheduling scheme for hadoop clusters
Scheduling scheme for hadoop clustersScheduling scheme for hadoop clusters
Scheduling scheme for hadoop clusters
 
Target Holding - Big Dikes and Big Data
Target Holding - Big Dikes and Big DataTarget Holding - Big Dikes and Big Data
Target Holding - Big Dikes and Big Data
 
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
 
Energy Efficient Routing Approaches in Ad-hoc Networks
                Energy Efficient Routing Approaches in Ad-hoc Networks                Energy Efficient Routing Approaches in Ad-hoc Networks
Energy Efficient Routing Approaches in Ad-hoc Networks
 
Map reducecloudtech
Map reducecloudtechMap reducecloudtech
Map reducecloudtech
 
Wireless Sensor
Wireless SensorWireless Sensor
Wireless Sensor
 
A sdn based application aware and network provisioning
A sdn based application aware and network provisioningA sdn based application aware and network provisioning
A sdn based application aware and network provisioning
 
Coupled Layer-wise Graph Convolution for Transportation Demand Prediction
Coupled Layer-wise Graph Convolution for Transportation Demand PredictionCoupled Layer-wise Graph Convolution for Transportation Demand Prediction
Coupled Layer-wise Graph Convolution for Transportation Demand Prediction
 
PhillyDB Talk - Beyond Batch
PhillyDB Talk - Beyond BatchPhillyDB Talk - Beyond Batch
PhillyDB Talk - Beyond Batch
 
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...
 
CCI DAY PRESENTATION
CCI DAY PRESENTATIONCCI DAY PRESENTATION
CCI DAY PRESENTATION
 
Manet
ManetManet
Manet
 
9517cnc06
9517cnc069517cnc06
9517cnc06
 
Challenges and Opportunities of Big Data Genomics
Challenges and Opportunities of Big Data GenomicsChallenges and Opportunities of Big Data Genomics
Challenges and Opportunities of Big Data Genomics
 
Hybrid networking and distribution
Hybrid networking and distribution Hybrid networking and distribution
Hybrid networking and distribution
 
PEARC 17: Spark On the ARC
PEARC 17: Spark On the ARCPEARC 17: Spark On the ARC
PEARC 17: Spark On the ARC
 
Apache Spark Overview part1 (20161107)
Apache Spark Overview part1 (20161107)Apache Spark Overview part1 (20161107)
Apache Spark Overview part1 (20161107)
 
Fa25939942
Fa25939942Fa25939942
Fa25939942
 

More from 주영 송

5일차.map reduce 활용
5일차.map reduce 활용5일차.map reduce 활용
5일차.map reduce 활용주영 송
 
Regression & Classification
Regression & ClassificationRegression & Classification
Regression & Classification주영 송
 
MapReduce 실행 샘플 (K-mer Counting, K-means Clustering)
MapReduce 실행 샘플 (K-mer Counting, K-means Clustering)MapReduce 실행 샘플 (K-mer Counting, K-means Clustering)
MapReduce 실행 샘플 (K-mer Counting, K-means Clustering)주영 송
 
SNA & R (20121011)
SNA & R (20121011)SNA & R (20121011)
SNA & R (20121011)주영 송
 
Recommendation system 소개 (1)
Recommendation system 소개 (1)Recommendation system 소개 (1)
Recommendation system 소개 (1)주영 송
 
Cloud burst tutorial
Cloud burst tutorialCloud burst tutorial
Cloud burst tutorial주영 송
 
Mongo db 활용 가이드 ch7
Mongo db 활용 가이드 ch7Mongo db 활용 가이드 ch7
Mongo db 활용 가이드 ch7주영 송
 

More from 주영 송 (12)

R_datamining
R_dataminingR_datamining
R_datamining
 
Giraph
GiraphGiraph
Giraph
 
Mahout
MahoutMahout
Mahout
 
5일차.map reduce 활용
5일차.map reduce 활용5일차.map reduce 활용
5일차.map reduce 활용
 
Regression & Classification
Regression & ClassificationRegression & Classification
Regression & Classification
 
MapReduce 실행 샘플 (K-mer Counting, K-means Clustering)
MapReduce 실행 샘플 (K-mer Counting, K-means Clustering)MapReduce 실행 샘플 (K-mer Counting, K-means Clustering)
MapReduce 실행 샘플 (K-mer Counting, K-means Clustering)
 
SNA & R (20121011)
SNA & R (20121011)SNA & R (20121011)
SNA & R (20121011)
 
Recommendation system 소개 (1)
Recommendation system 소개 (1)Recommendation system 소개 (1)
Recommendation system 소개 (1)
 
Cloud burst tutorial
Cloud burst tutorialCloud burst tutorial
Cloud burst tutorial
 
Cuda intro
Cuda introCuda intro
Cuda intro
 
R intro
R introR intro
R intro
 
Mongo db 활용 가이드 ch7
Mongo db 활용 가이드 ch7Mongo db 활용 가이드 ch7
Mongo db 활용 가이드 ch7
 

Recently uploaded

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 

Recently uploaded (20)

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 

Highly Sensitive Cloud-Based Read Mapping with CloudBurst

  • 1. CloudBurst • CloudBurst : Highly Sensitive Short Read Mapping with MapReduce • New parallel read-mapping algorithm optimized for mapping NGS data to the human genome and other reference genomes • SNP discovery, genotyping, and personal genomics
  • 2. CloudBurst • It is modeled after the short read mapping program RMAP • Reports either all alignments or the unambiguous best alignment for each read with any number of mismatches or differences • This level of sensitivity could be prohibitively time consuming, but CloudBurst uses the open-source Hadoop implementation of MapReduce to parallelize execution using multiple compute nodes.
  • 3. CloudBurst • Running time – scales linearly with the number of reads mapped – with near linear speedup as the number of processors increases. • CloudBurst reduces the running time from hours to mere minutes for typical jobs involving mapping of millions of short reads to the human genome.
  • 4. Algorithm Overview • CloudBurst uses seed-and-extend algorithms to map reads to a reference genome. • Seed – k differences : the alignment must have a region of length s=r/k+1 called a seed that exactly matches the reference. • Extend – CloudBurst attempts to extend the alignment into an end-to-end alignment with at most k mismatches or differences
  • 5. Algorithm Overview • CloudBurst uses the Hadoop implementation of MapReduce to catalog and extend the seeds • Map phase emits – all length-s k-mers from the reference sequences – all non-overlapping length-s kmers from the reads • Shuffle phase – read and reference kmers are brought together • Reduce phase – the seeds are extended into end-to-end alignments
  • 8.
  • 9. Related Tools • Bowtie: Ultrafast short read alignment • SoapSNP: Accurate SNP/consensus calling • Tophat: RNA-Seq splice junction mapper • Cufflinks: Isoform assembly, quantitation • Hadoop: Open Source MapReduce • CloudBurst: Sensitive MapReduce alignment • Crossbow: Read Mapping and SNP calling in the clouds • Jnomics: Cloud-Scale Sequence Analysis • Contrail: Cloud-based de novo assembly • Myrna: Cloud-Scale differential expression of RNAseq
  • 10. Q&A
  • 11. Figure 1: A MapReduce approach for detecting genetic variants from high-throughput genome sequencing. 출처 : http://www.nature.com/nbt/journal/v30/n3/fig_tab/nbt.2134_F1.html