Base Calling Error Toleration in Reference Based
Assembly
Hadi Gharibi
Email: h_gharibi@ee.sharif.edu
Sharif University of Technology
Max Planck Institute for Molecular Genetics
May 2015
How Base Calling Error Can Be Tolerated in Next
Generation Sequencing (NGS)
Importance
Challenges
Our Hypothesis
Our Approach
• Deal with Large Amount of Data
• Impact on Sequencing Data Analysis Time and Accuracy
Researchers have developed many base calling algorithms;
however, none has resolved the tradeoff between
accuracy and time complexity.
• Required Accuracy
• Sequencing Data Analysis Execution Time
Base Calling Error Is Compensated in Downstream
Sequencing Steps
• Massive Data
• Diverse Algorithms
Importance: Base Calling Translates Noisy
Intensity Data Into Reads
© EMBO Conference, 2014 [1]
© Illumina Inc., 2011 [2]
Figure: sequencing pipeline (labels: Intensity, Image Processing, Base Calling, Read, Assembling, Genome)
Challenge: Base Calling Errors Are Always
Compared
© C. Ye, 2014 [3]
Figure: Per-cycle error rate of base callers on the
PhiX174 test data. The more accurate callers are
slower than the others. [3]
Fundamental Question:
Our Approach: Analytical Assumptions and
Method
Assumptions
• Random Genome
• Single Variations
• Mismatches << Read Length
• Uniform Substitution Error
• Equally Likely Base Errors
Method
• Variant Calling for Re-sequencing
• Derive Variant Calling Errors
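The "Uniform Substitution Error" and "Equally Likely Base Errors" assumptions can be illustrated with a minimal Python sketch. It is not taken from the slides: the function name `add_base_calling_errors`, the example read, and the 5% error rate are hypothetical choices for illustration. Each base is miscalled independently with the same probability, and a miscalled base becomes any of the three other bases with equal probability.

```python
import random

BASES = "ACGT"

def add_base_calling_errors(read, error_rate, rng):
    """Substitute each base independently with probability `error_rate`.

    A substituted base is replaced by one of the three other bases,
    each chosen with equal probability (assumed error model).
    """
    out = []
    for base in read:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

rng = random.Random(0)                      # fixed seed for repeatability
read = "ACGTACGTACGTACGTACGTACGTACGTAC"     # hypothetical 30 bp read
noisy = add_base_calling_errors(read, 0.05, rng)
mismatches = sum(a != b for a, b in zip(read, noisy))
print(noisy, mismatches)
```

Only substitutions are modeled here; insertions and deletions are outside the stated assumptions.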
Analytical Results: Base Calling Error Is
Tolerated by Mapping Mismatch
Figure: Variant Calling Error vs. Base Calling Error
• Random Genome
• Mismatches = {2, 5, 7, 9}
• Genome Size ~ 4 Mbp
• Read Length = 30 bp
• Variation Rate = 0.01
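A back-of-envelope calculation gives a feel for why the allowed mismatch count matters. This is not the derivation behind the figure, only a simplified sketch under two labeled assumptions: base calling errors are independent with a fixed per-base rate, and a read is lost when its errors alone exceed the mapper's mismatch limit (mismatches caused by true variants are ignored). The function name and the 5% error rate are hypothetical.

```python
from math import comb

def prob_exceeds_mismatches(read_length, error_rate, max_mismatches):
    """P(X > max_mismatches) for X ~ Binomial(read_length, error_rate)."""
    within = sum(
        comb(read_length, k)
        * error_rate ** k
        * (1 - error_rate) ** (read_length - k)
        for k in range(max_mismatches + 1)
    )
    return 1.0 - within

# Read length and mismatch limits taken from the slide; 0.05 is an example rate.
for m in (2, 5, 7, 9):
    print(m, prob_exceeds_mismatches(30, 0.05, m))
```

Under these assumptions the probability of losing a read falls quickly as the mismatch limit grows, which is the intuition behind tolerating base calling error at the mapping step.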
Simulation Method and Setup
• Generate Target Genome
• Simulate Reads [5]
• Add Base Calling Error
• Call Variants
• Calculate Variant Calling Error
Method Setup
© GemSIM, 2013 [5]
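The five steps above can be sketched end to end in Python. This is a toy stand-in, not the setup used in the slides: reads are drawn directly from known positions instead of being produced by a read simulator and aligned by a mapper, a read is simply discarded when it differs from the reference at more than `max_mismatches` positions, variants are called by majority vote per position, and "variant calling error" is assumed to mean the fraction of positions whose called base differs from the target genome. All names and default values are hypothetical.

```python
import random
from collections import Counter

BASES = "ACGT"

def substitute(base, rng):
    """Pick one of the three other bases with equal probability."""
    return rng.choice([b for b in BASES if b != base])

def mutate(seq, rate, rng):
    """Substitute each base of `seq` independently with probability `rate`."""
    return "".join(substitute(b, rng) if rng.random() < rate else b for b in seq)

def simulate(genome_size=2000, read_length=30, coverage=30,
             variation_rate=0.01, error_rate=0.02, max_mismatches=3, seed=0):
    rng = random.Random(seed)
    # 1. Generate a random reference and a target genome with single-base variations.
    reference = "".join(rng.choice(BASES) for _ in range(genome_size))
    target = mutate(reference, variation_rate, rng)
    pileup = [Counter() for _ in range(genome_size)]
    n_reads = coverage * genome_size // read_length
    for _ in range(n_reads):
        # 2. Simulate a read from the target; 3. add base calling error.
        start = rng.randrange(genome_size - read_length + 1)
        read = mutate(target[start:start + read_length], error_rate, rng)
        # Stand-in for mapping: drop reads with too many mismatches to the reference.
        window = reference[start:start + read_length]
        if sum(a != b for a, b in zip(read, window)) > max_mismatches:
            continue
        for offset, base in enumerate(read):
            pileup[start + offset][base] += 1
    # 4. Call the majority base per position (reference base if uncovered).
    wrong = 0
    for pos in range(genome_size):
        called = pileup[pos].most_common(1)[0][0] if pileup[pos] else reference[pos]
        # 5. Count positions where the call disagrees with the target genome.
        if called != target[pos]:
            wrong += 1
    return wrong / genome_size

for e in (0.0, 0.02, 0.05, 0.1):
    print(e, simulate(error_rate=e))
```

Because the true read position is used, this sketch cannot show mis-mapping in repeat regions; it only illustrates how discarded reads and majority voting interact with the base calling error rate.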
Simulation Results: Simulation Verifies Analysis
Predictions
• E. coli Genome [6]
• Mismatches = {3, 4, 5}
• Genome Size ~ 4 Mbp
• Read Length = 30 bp
• Variation Rate ~ 0.01
• Single-end Shotgun Run
• Map with SOAP [7]
Figure: Variant Calling Error vs.
Base Calling Error
© NCBI, 2014 [6]
© G. BGI, 2008 [7]
Simulation Results: Random Genome Obviates
Repeat Region Effect
• Genome Sizes ~ 4 Mbp
• Mismatches = 3
• Read Length = 30 bp
• Variation Rate ~ 0.01
• Single-end Shotgun Run
• Map with SOAP [7]
Figure: Random Genome vs.
E. coli Genome
© G. BGI, 2008 [7]
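One way to see why a random genome sidesteps the repeat region effect is to measure how many read-length substrings occur more than once. The sketch below is an illustration only, with hypothetical names and a deliberately small genome; the "repetitive" genome is an artificial construction (one segment duplicated), not E. coli, and reverse complements are ignored.

```python
import random
from collections import Counter

def repeated_kmer_fraction(genome, k):
    """Fraction of k-mer positions whose k-mer occurs more than once in `genome`."""
    total = len(genome) - k + 1
    counts = Counter(genome[i:i + k] for i in range(total))
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / total

rng = random.Random(0)
random_genome = "".join(rng.choice("ACGT") for _ in range(100_000))
# Artificial repeat structure: the same 50 kbp segment appears twice.
repetitive_genome = random_genome[:50_000] + random_genome[:50_000]

print(repeated_kmer_fraction(random_genome, 30))
print(repeated_kmer_fraction(repetitive_genome, 30))
```

In a random sequence almost every 30 bp window is unique, so a read has a single candidate location; in the duplicated sequence nearly every window has two, which is the kind of ambiguity that repeat regions introduce for a mapper.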
Conclusion
Simulation Results
• Confirm the Hypothesis
• Genome Repeat Regions Impair Accuracy
• Confirm the Hypothesis
• Higher Mismatches May Not Obey
Analytical Results
Next Steps
Simulation Steps
• Genome Having More Repeat Regions
• Develop Mapper with Higher Mismatches
• Genome Structure
• Paired-end Shotgun Sequencing
• Erasure Base Calling Error
• Other Variant Types
Analytical Steps
References
[1] EMBO Conference, “Human Evolution in the Genomic Era: Origins, Populations, and Phenotypes,” 2014. [Online]. Available: events.embo.org/14-human-evo
[2] Illumina Inc., “Theory of Operation, HCS 1.4/RTA 1.12,” 2011.
[3] C. Ye, C. Hsiao, and H. Corrada Bravo, “BlindCall: ultra-fast base-calling of high-throughput sequencing data by blind deconvolution,” Bioinformatics, 30(9), 1214–1219, 2014.
[4] C. Ledergerber and C. Dessimoz, “Base-calling for next-generation sequencing platforms,” Briefings in Bioinformatics, 2011.
[5] GemSIM, “GemSIM,” 2013. [Online]. Available: http://sourceforge.net/projects/gemsim
[6] NCBI, “Escherichia coli O157:H7 str. Sakai DNA, complete genome - Nucleotide - NCBI,” 2014. [Online]. Available: http://www.ncbi.nlm.nih.gov/nuccore/47118301?report=fasta
[7] G. BGI, “SOAP: Short Oligonucleotide Analysis Package,” 2008. [Online]. Available: http://soap.genomics.org.cn
Acknowledgement
Thank You for Your Patience, Time and Attention.