Base Calling Error Toleration in Reference-Based Assembly
Hadi Gharibi
Email: h_gharibi@ee.sharif.edu
Sharif University of Technology
Max Planck Institute for Molecular Genetics
May 2015
How Base Calling Error Can Be Tolerated in Next Generation Sequencing (NGS)
Importance
Challenges
Our Hypothesis
Our Approach
• Deal with Large Amount of Data
• Impact on Sequencing Data Analysis Time and Accuracy
Researchers have developed many base calling algorithms; however, none has resolved the tradeoff between accuracy and time complexity.
• Required Accuracy
• Sequencing Data Analysis Execution Time
Base Calling Error Is Compensated in Downstream Sequencing Steps
• Massive Data
• Diverse Algorithms
Importance: Base Calling Translates Noisy Intensity Data Into Reads
© EMBO Conference, 2014 [1]
© Illumina Inc., 2011 [2]
Pipeline stages: Intensity, Image Processing, Base Calling, Read, Assembling, Genome
Challenge: Base Calling Errors Are Always Compared
© C. Ye, 2014 [3]
Figure: Error rate per sequencing cycle for several base callers on the PhiX174 test data. The more accurate callers are slower than the others. [3]
Fundamental Question:
Our Approach: Analytical Assumptions and Method
Assumptions
• Random Genome
• Single Variations
• Mismatches << Read Length
• Uniform Substitution Error
• Equally Likely Base Errors
Method
• Variant Calling for Re-sequencing
• Derive Variant Calling Errors
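The two error assumptions above (uniform substitution error, equally likely base errors) can be illustrated with a minimal Python sketch. This is not the author's code: the function name `add_base_calling_errors`, the example read, and the 5% error rate are assumptions chosen only for illustration.

```python
import random

BASES = "ACGT"

def add_base_calling_errors(read, error_rate, rng):
    """Return a copy of `read` with simulated base calling errors.

    Uniform substitution error: every base is miscalled independently
    with the same probability `error_rate`.
    Equally likely base errors: a miscalled base becomes one of the
    three other bases, each with probability 1/3.
    """
    called = []
    for base in read:
        if rng.random() < error_rate:
            called.append(rng.choice([b for b in BASES if b != base]))
        else:
            called.append(base)
    return "".join(called)

rng = random.Random(0)                    # fixed seed for repeatability
read = "ACGTACGTACGTACGTACGTACGTACGTAC"   # hypothetical 30 bp read
noisy = add_base_calling_errors(read, 0.05, rng)
mismatches = sum(a != b for a, b in zip(read, noisy))
print(noisy, mismatches)
```

Only substitutions are modelled here, so the read length never changes; insertions and deletions are outside the stated assumptions.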
Analytical Results: Base Calling Error Is Tolerated by Mapping Mismatch
Figure: Variant Calling Error vs. Base Calling Error
Random Genome
Mismatches = {2, 5, 7, 9}
Genome Size ~ 4 Mbp
Read Length = 30 bp
Variation Rate = 0.01
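One way to see why a mapping mismatch budget absorbs base calling error: if each base is miscalled independently with probability p, the number of miscalled bases in a read of length L follows a Binomial(L, p) distribution, and base calling error alone pushes a read past the budget only when that count exceeds the allowed mismatches. The sketch below evaluates this tail probability for the read length and mismatch values listed above. It is an illustration under stated assumptions, not the derivation from the talk: it ignores true variations (which also consume mismatches), and the function name and the 5% error rate are hypothetical.

```python
from math import comb

def prob_read_exceeds_mismatches(read_length, error_rate, max_mismatches):
    """P(more than `max_mismatches` miscalled bases in one read),
    assuming independent errors with probability `error_rate` per base."""
    within_budget = sum(
        comb(read_length, k)
        * error_rate ** k
        * (1 - error_rate) ** (read_length - k)
        for k in range(max_mismatches + 1)
    )
    return 1.0 - within_budget

READ_LENGTH = 30     # from the slide
ERROR_RATE = 0.05    # hypothetical base calling error rate

for m in (2, 5, 7, 9):   # mismatch settings from the slide
    p = prob_read_exceeds_mismatches(READ_LENGTH, ERROR_RATE, m)
    print(f"mismatches={m}: P(read exceeds budget) = {p:.3g}")
```

Under this simplified model the tail probability shrinks quickly as the mismatch budget grows, which is the qualitative effect the slide title describes.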
Simulation Method and Setup
• Generate Target Genome
• Simulate Reads [5]
• Add Base Calling Error
• Call Variants
• Calculate Variant Calling Error
Method Setup
© GemSIM, 2013 [5]
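The first steps of the method (generate a target genome, simulate reads) can be sketched in plain Python. This does not use GemSIM and is not the author's setup: all function names are hypothetical, variations are limited to single-base substitutions, and the genome is shortened to 10 kbp so the example runs quickly.

```python
import random

BASES = "ACGT"

def random_genome(length, rng):
    """Random genome: each base drawn independently and uniformly."""
    return "".join(rng.choice(BASES) for _ in range(length))

def add_single_variations(reference, variation_rate, rng):
    """Build the target genome by substituting each position independently
    with probability `variation_rate`.
    Returns (target genome, {position: variant base})."""
    target = list(reference)
    variants = {}
    for pos, base in enumerate(reference):
        if rng.random() < variation_rate:
            alt = rng.choice([b for b in BASES if b != base])
            target[pos] = alt
            variants[pos] = alt
    return "".join(target), variants

def sample_reads(genome, read_length, n_reads, rng):
    """Single-end shotgun reads: uniform start positions, fixed length.
    Returns a list of (start, read) pairs."""
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_length + 1)
        reads.append((start, genome[start:start + read_length]))
    return reads

rng = random.Random(42)
reference = random_genome(10_000, rng)    # slides use ~4 Mbp
target, true_variants = add_single_variations(reference, 0.01, rng)
reads = sample_reads(target, 30, 1_000, rng)
print(len(true_variants), "variations;", len(reads), "reads")
```

Keeping the true start position with each read is a convenience for checking a mapper later; a real sequencing run does not provide it.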
Simulation Results: Simulation Verifies Analysis Predictions
• E. coli Genome [6]
• Mismatches = {3, 4, 5}
• Genome Size ~ 4 Mbp
• Read Length = 30 bp
• Variation Rate ~ 0.01
• Single-end Shotgun Run
• Map with SOAP [7]
Figure: Variant Calling Error vs. Base Calling Error
© NCBI, 2014 [6]
© G. BGI, 2008 [7]
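The role of the mismatch setting in mapping can be illustrated with a toy brute-force mapper that accepts any reference position whose Hamming distance to the read is within the budget. This is only a sketch: SOAP's actual index-based algorithm is not reproduced here, and the function names and sequences are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def map_read(read, reference, max_mismatches):
    """Return every start position where `read` aligns to `reference`
    with at most `max_mismatches` substitutions (no indels)."""
    n = len(read)
    return [
        start
        for start in range(len(reference) - n + 1)
        if hamming(read, reference[start:start + n]) <= max_mismatches
    ]

reference = "ACGTTGCAAGGCTTAACCGGTTAGC"
# reference[5:15] is "GCAAGGCTTA"; the read below carries one substitution.
read = "GCAAGGCTAA"

print(map_read(read, reference, 0))   # exact matching only
print(map_read(read, reference, 1))   # one mismatch allowed
```

With a budget of zero the read does not map at all, while a budget of one recovers its true position; the mismatching base is then a candidate for either a base calling error or a real variation.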
Simulation Results: Random Genome Obviates Repeat Region Effect
• Genome Sizes ~ 4 Mbp
• Mismatches = 3
• Read Length = 30 bp
• Variation Rate ~ 0.01
• Single-end Shotgun Run
• Map with SOAP [7]
Figure: Random Genome vs. E. coli Genome
© G. BGI, 2008 [7]
Conclusion
Simulation Results
• Confirm the Hypothesis
• Genome Repeat Regions Impair Accuracy
Analytical Results
• Confirm the Hypothesis
• Higher Mismatches May Not Obey
Next Steps
Simulation Steps
• Genome Having More Repeat Regions
• Develop Mapper with Higher Mismatches
• Genome Structure
• Paired-end Shotgun Sequencing
• Erasure Base Calling Error
• Other Variant Types
Analytical Steps
References
[1] EMBO Conference, “Human Evolution in the Genomic Era: Origins, Populations, and Phenotypes,” 2014. [Online]. Available: events.embo.org/14-human-evo
[2] Illumina Inc., “Theory of Operation, HCS 1.4/RTA 1.12,” 2011.
[3] C. Ye, C. Hsiao, and H. Corrada Bravo, “BlindCall: ultra-fast base-calling of high-throughput sequencing data by blind deconvolution,” Bioinformatics, 30(9), 1214–1219, 2014.
[4] C. Ledergerber and C. Dessimoz, “Base-calling for next-generation sequencing platforms,” Briefings in Bioinformatics, 2011.
[5] GemSIM, “GemSIM,” 2013. [Online]. Available: http://sourceforge.net/projects/gemsim
[6] NCBI, “Escherichia coli O157:H7 str. Sakai DNA, complete genome - Nucleotide - NCBI,” 2014. [Online]. Available: http://www.ncbi.nlm.nih.gov/nuccore/47118301?report=fasta
[7] G. BGI, “SOAP: Short Oligonucleotide Analysis Package,” 2008. [Online]. Available: http://soap.genomics.org.cn
Acknowledgement
Thank You for Your Patience, Time and Attention.
