New Strategy to detect SNPs
Miguel Galves
José Augusto Quitzau
Zanoni Dias
Scylla Bioinformatics – Brazil
{miguel,jquitzau,zanoni}@scylla.com.br
Agenda
 Introduction
 HIV Dataset
 Detection Strategy
 Trimming Procedure
 Base-Calling Strategies
 Filter Algorithm
 Consensus Algorithm
 Tests Protocol
 Results
 Discussion
Introduction
 Polymorphism: a base-pair locus at which
different alleles exist among the individuals of
a population
– The second most frequent allele must appear in
at least 1% of the individuals
 SNP: polymorphism in a single base pair
position
 SNP discovery is very important to
understand complex diseases
HIV Dataset
 HIV genetic sequences:
– 1302 bp
– Well-conserved region
 35 batches from 35 individuals:
– 6 PCR reads, with average size of 690bp
– 1 validated sequence, with manually annotated
SNPs
 HIV Reference Sequence
Detection Strategy: Survey
 Trimming Procedure
 Base-Calling Correction
 SNPs Filter
 Batch Consensus Algorithm
Trimming Procedure
 Low-quality ends filtering
 Converts phred’s quality sequence to an error
probability sequence:
⇒ Q = -10 × log10(p), i.e. p = 10^(-Q/10)
 Subtract each value from 0.05 (the error probability at Q=13)
 Maximum Score Subsequence Algorithm
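The trimming steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the sample quality values, and the choice to score each base as `cutoff - p` (so that a maximum-score subsequence selects the high-quality core of the read) are assumptions.

```python
def phred_to_error(q):
    """Invert Q = -10 * log10(p) to get the error probability p."""
    return 10 ** (-q / 10)


def trim_bounds(qualities, cutoff=0.05):
    """Return (start, end) of the maximum-score subsequence, end exclusive.

    Each base scores cutoff - p, so bases better than the cutoff
    (about Q = 13 for 0.05) score positive and worse bases score negative.
    """
    best_sum, best_start, best_end = 0.0, 0, 0
    run_sum, run_start = 0.0, 0
    for i, q in enumerate(qualities):
        run_sum += cutoff - phred_to_error(q)
        if run_sum <= 0:
            # A non-positive running total cannot help: restart after i.
            run_sum, run_start = 0.0, i + 1
        elif run_sum > best_sum:
            best_sum, best_start, best_end = run_sum, run_start, i + 1
    return best_start, best_end


# Hypothetical read: low-quality ends around a good core with one dip.
quals = [4, 6, 8, 30, 35, 40, 38, 9, 33, 36, 7, 5]
start, end = trim_bounds(quals)
trimmed = quals[start:end]
print(start, end, trimmed)
```

A single low-quality base inside the core (Q = 9 here) is kept because the good bases on both sides outweigh its penalty, while the poor ends are dropped.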
Base Calling: Area Ratio
 The base calling is made in 5 Steps:
1. Chromatogram area delimitation
2. Peak search
3. Choice of the nearest peaks
4. Calculation of the nearest peaks area
5. Calculation of the polymorphic/reference peak area ratio
 If the calculated ratio is above a certain threshold, the
point is considered a polymorphism.
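One possible reading of the five steps is sketched below. Everything concrete here is an assumption made for illustration: the trace layout (one intensity list per base), the window half-width, the way a peak's extent is found by walking out to the flanking valleys, and the threshold value, none of which the slides specify.

```python
def find_peaks(trace, lo, hi):
    """Step 2: indices in [lo, hi) that are local maxima of the trace."""
    return [i for i in range(max(lo, 1), min(hi, len(trace) - 1))
            if trace[i - 1] < trace[i] >= trace[i + 1]]


def peak_area(trace, peak, lo, hi):
    """Step 4: sum the samples around a peak, out to the flanking valleys."""
    left = peak
    while left > lo and trace[left - 1] < trace[left]:
        left -= 1
    right = peak
    while right < hi - 1 and trace[right + 1] < trace[right]:
        right += 1
    return sum(trace[left:right + 1])


def area_ratio(traces, center, ref_base, half_width=5):
    """Steps 1-5: ratio of the largest non-reference peak area to the reference one."""
    lo = max(center - half_width, 0)           # step 1: delimit the area
    areas = {}
    for base, trace in traces.items():
        hi = min(center + half_width + 1, len(trace))
        peaks = find_peaks(trace, lo, hi)
        if peaks:
            nearest = min(peaks, key=lambda i: abs(i - center))  # step 3
            areas[base] = peak_area(trace, nearest, lo, hi)
    ref = areas.get(ref_base, 0)
    others = [a for b, a in areas.items() if b != ref_base]
    if ref == 0 or not others:
        return 0.0
    return max(others) / ref


# Hypothetical chromatogram slice: a strong A peak with a weaker G peak under it.
traces = {
    "A": [0, 2, 10, 40, 90, 100, 90, 40, 10, 2, 0],
    "C": [0] * 11,
    "G": [0, 1, 5, 20, 45, 50, 45, 20, 5, 1, 0],
    "T": [0] * 11,
}
ratio = area_ratio(traces, center=5, ref_base="A")
THRESHOLD = 0.3  # hypothetical value; the slides only say "a certain threshold"
print(ratio, ratio > THRESHOLD)
```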
Base Calling: Area Delimitation
Base Calling: Peak Identification
Base Calling: Average Height Ratio
 Almost the same steps:
1. Chromatogram area delimitation
2. Peak search
3. Choice of the nearest peaks
4. Calculation of the nearest peaks average height
5. Calculation of the polymorphic/reference peak average
height ratio.
 Again, if the calculated ratio is above a certain
threshold, the point is considered a polymorphism.
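A comparable sketch for the average-height variant follows. It assumes that "average height" means the mean intensity over the samples belonging to the nearest peak; the slides do not define it, and the helper names, trace layout, and window half-width are likewise hypothetical.

```python
def nearest_peak_span(trace, center, half_width):
    """Return (left, right) sample bounds of the peak closest to center, or None."""
    lo = max(center - half_width, 0)
    hi = min(center + half_width + 1, len(trace))
    peaks = [i for i in range(max(lo, 1), min(hi, len(trace) - 1))
             if trace[i - 1] < trace[i] >= trace[i + 1]]
    if not peaks:
        return None
    left = right = min(peaks, key=lambda i: abs(i - center))
    while left > lo and trace[left - 1] < trace[left]:
        left -= 1
    while right < hi - 1 and trace[right + 1] < trace[right]:
        right += 1
    return left, right


def average_height(trace, center, half_width=5):
    """Mean intensity over the nearest peak's samples (0.0 if no peak)."""
    span = nearest_peak_span(trace, center, half_width)
    if span is None:
        return 0.0
    left, right = span
    samples = trace[left:right + 1]
    return sum(samples) / len(samples)


def height_ratio(traces, center, ref_base, half_width=5):
    """Largest non-reference average height divided by the reference one."""
    heights = {b: average_height(t, center, half_width) for b, t in traces.items()}
    ref = heights.get(ref_base, 0.0)
    others = [h for b, h in heights.items() if b != ref_base]
    if ref == 0 or not others:
        return 0.0
    return max(others) / ref


# Hypothetical traces: narrow tall A peak, wide low G peak.
traces = {
    "A": [0, 0, 0, 30, 90, 120, 90, 30, 0, 0, 0],
    "C": [0] * 11,
    "G": [0, 10, 20, 30, 40, 50, 40, 30, 20, 10, 0],
    "T": [0] * 11,
}
print(height_ratio(traces, center=5, ref_base="A"))
```

Because a wide peak is averaged over more samples, this ratio penalizes broad secondary peaks more than an area ratio would on the same traces.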
Base Calling: Peak Identification
Filter Algorithm
 Analyzes each sequence
 Uses a window based algorithm to eliminate
adjacent SNPs
– Window size: 11 bases
– Empirical scoring system applied to the polymorphisms
in the window
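The slides do not give the scoring rules, so the following is only a stand-in to show the shape of a window-based filter. It assumes each candidate SNP already carries a numeric score (hypothetical here) and keeps the best-scoring candidate among any that could fall inside the same 11-base window.

```python
def filter_adjacent(candidates, window=11):
    """Keep the best-scoring candidate among any that share a window.

    candidates: list of (position, score) pairs for one sequence.
    Two positions can share a window of `window` bases when they are
    fewer than `window` bases apart.
    """
    kept = []
    # Visit candidates from best to worst score (assumed ranking rule).
    for pos, score in sorted(candidates, key=lambda c: -c[1]):
        if all(abs(pos - k) >= window for k, _ in kept):
            kept.append((pos, score))
    return sorted(kept)


# Hypothetical candidates: a cluster near position 100 plus two isolated calls.
candidates = [(100, 30), (104, 18), (109, 25), (150, 12), (300, 40)]
print(filter_adjacent(candidates))
```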
Consensus Algorithm
 Rule-based algorithm
– Empirical rules
 Analyzes the whole cross section to define a
consensus
– Takes into account nucleotide frequencies and
qualities
 Does not create N symbols or tri-allelic
polymorphisms.
Consensus Algorithm: Example
Sequence 1   A25  C30  C18  C30  A21
Sequence 2   A30  C25  C15  C25  A16
Sequence 3   -    M18  A9   C30  -
Sequence 4   -    -    S12  G17  T18
Consensus    A    M    S    S    W
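The example can be reproduced with a small rule set. The rules below are assumptions chosen for illustration, not the authors' empirical rules: calls under a hypothetical quality cutoff are ignored, ambiguity codes lend their quality to both of their bases, and at most the two best-supported bases are kept so that no tri-allelic symbol or N is produced. The letters M, S, and W are standard IUPAC codes for A/C, C/G, and A/T.

```python
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT"}
CODE = {frozenset(bases): code for code, bases in IUPAC.items()}


def consensus_column(calls, min_quality=10):
    """calls: list of (symbol, quality) for one alignment column, gaps omitted."""
    support = {}
    for symbol, quality in calls:
        if quality < min_quality:
            continue  # assumed cutoff: ignore low-quality calls
        for base in IUPAC[symbol]:
            support[base] = support.get(base, 0) + quality
    if not support:
        return "-"  # assumed placeholder when nothing usable covers the column
    # Keep at most the two best-supported bases: never tri-allelic, never N.
    top = sorted(support, key=lambda b: (-support[b], b))[:2]
    return CODE[frozenset(top)]


# Columns of the example slide, read top to bottom (gaps left out).
columns = [
    [("A", 25), ("A", 30)],
    [("C", 30), ("C", 25), ("M", 18)],
    [("C", 18), ("C", 15), ("A", 9), ("S", 12)],
    [("C", 30), ("C", 25), ("C", 30), ("G", 17)],
    [("A", 21), ("A", 16), ("T", 18)],
]
consensus = "".join(consensus_column(col) for col in columns)
print(consensus)
```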
Tests Protocol: Third Party Packages
 Two external packages used to compare our results:
– Polybayes: SNP detection tool based on Bayesian
Methods
– Polyphred: SNP detection tool based on chromatogram
analysis
 ACE file (contig and consensus) created for each
batch using phrap
 ACE file analyzed by Polyphred and Polybayes
 Results viewed with consed
Tests Protocol: Our strategy
 Reads trimmed using Maximum
Score Subsequence Algorithm
 Base-calling analysis and correction using
algorithms described previously
 SNP filtering
 Multiple alignment
– Reference sequence as anchor
 Consensus creation
Third Party Results: Polybayes
 Polybayes detected SNPs in only 2 batches out of 35
Batch     Existing SNPs  Detected SNPs  Correct SNPs  False Positives  False Negatives
Batch 13  12             1              1             0                11
Batch 15  5              1              0             1                5
Third Party Results: Polyphred
 Polyphred detected SNPs in only 4 batches out of 35
Batch     Existing SNPs  Detected SNPs  Correct SNPs  False Positives  False Negatives
Batch 07  10             1              0             1                10
Batch 14  4              3              0             3                4
Batch 32  26             1              0             1                26
Batch 35  15             8              1             7                14
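The columns of the Polybayes and Polyphred tables are consistent with a simple relationship, sketched here as a check. The formulas are inferred from the table rows rather than stated in the slides: false positives as detected minus correct, and false negatives as existing minus correct.

```python
def error_counts(existing, detected, correct):
    """Derive false positives and false negatives from the three counts."""
    return detected - correct, existing - correct


# (existing, detected, correct) -> (false positives, false negatives) as reported.
reported = {
    "Polybayes batch 13": ((12, 1, 1), (0, 11)),
    "Polybayes batch 15": ((5, 1, 0), (1, 5)),
    "Polyphred batch 07": ((10, 1, 0), (1, 10)),
    "Polyphred batch 14": ((4, 3, 0), (3, 4)),
    "Polyphred batch 32": ((26, 1, 0), (1, 26)),
    "Polyphred batch 35": ((15, 8, 1), (7, 14)),
}
for name, (counts, expected) in reported.items():
    print(name, error_counts(*counts) == expected)
```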
Trimming Results
 Reads average size:
– Before trimming: 690.15bp
– After trimming: 374.74bp
– Reduction of 45%
 Reference sequence average base coverage
– Before trimming: 2.69
– After trimming: 1.77
Results: True Positive (%) x batch
Results: False Negative (%) x batch
Results: False Positive (%) x batch
Results: Summary
      Polybayes      Polyphred      Area             Avg. Height
      Avg    SD      Avg    SD      Avg     SD       Avg     SD
TP    0.3    1.4     0.2    1.1     75.4    19.2     52.6    21.5
FN    99.7   1.4     99.8   1.1     23.2    18.4     45.6    21.7
DP    0.0    0.0     0.0    0.0     1.4     4.3      1.8     4.0
FP    2.9    16.9    11.1   31.3    393.9   312.3    554.4   511.3
TP + FN + DP = 100%
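As a quick arithmetic check of that identity, the average TP, FN, and DP percentages from the summary table can be summed per method. The values are copied from the table; rounding to one decimal place is an assumption that matches the precision shown.

```python
# (TP, FN, DP) average percentages per method, from the summary table.
summary = {
    "Polybayes": (0.3, 99.7, 0.0),
    "Polyphred": (0.2, 99.8, 0.0),
    "Area": (75.4, 23.2, 1.4),
    "Avg. Height": (52.6, 45.6, 1.8),
}
totals = {name: round(tp + fn + dp, 1) for name, (tp, fn, dp) in summary.items()}
print(totals)
```

FP is left out of the sum: it is reported on a separate scale and exceeds 100% for both of the new strategies.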
Discussion
 Polybayes and Polyphred need large sets of data to
produce good results
 Our algorithm produces quite satisfactory results
taking into account data characteristics:
– Low average coverage
– High amount of low quality bases
– High amount of polymorphisms (virus DNA)
 Area Ratio strategy produces better results than
Average Height strategy
Future Work
 Test the algorithms with larger batches,
with higher average coverage, to improve the
consensus algorithm
 Reproduce the experiments using genetic
sequences of more conserved life forms,
such as mammals
Acknowledgments
23PH301 - Optics - Optical Lenses.pptx
 
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
 
Anti-Universe And Emergent Gravity and the Dark Universe
Anti-Universe And Emergent Gravity and the Dark UniverseAnti-Universe And Emergent Gravity and the Dark Universe
Anti-Universe And Emergent Gravity and the Dark Universe
 
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
 
IMPORTANCE OF ALGAE AND ITS BENIFITS.pptx
IMPORTANCE OF ALGAE  AND ITS BENIFITS.pptxIMPORTANCE OF ALGAE  AND ITS BENIFITS.pptx
IMPORTANCE OF ALGAE AND ITS BENIFITS.pptx
 
JAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDS
JAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDSJAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDS
JAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDS
 
Clinical periodontology and implant dentistry 2003.pdf
Clinical periodontology and implant dentistry 2003.pdfClinical periodontology and implant dentistry 2003.pdf
Clinical periodontology and implant dentistry 2003.pdf
 
BIRDS DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptx
BIRDS  DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptxBIRDS  DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptx
BIRDS DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptx
 
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
 

New Strategy to detect SNPs

  • 1. New Strategy to detect SNPs
      Miguel Galves, José Augusto Quitzau, Zanoni Dias
      Scylla Bioinformatics – Brazil
      {miguel,jquitzau,zanoni}@scylla.com.br
  • 2. Agenda
      ▪ Introduction
      ▪ HIV Dataset
      ▪ Detection Strategy
      ▪ Trimming Procedure
      ▪ Base-Calling Strategies
      ▪ Filter Algorithm
      ▪ Consensus Algorithm
      ▪ Tests Protocol
      ▪ Results
      ▪ Discussion
  • 3. Introduction
      ▪ Polymorphism: a base-pair locus at which different alleles exist among the individuals of a population
          – The second most frequent allele must appear in at least 1% of the individuals
      ▪ SNP: a polymorphism at a single base-pair position
      ▪ SNP discovery is very important for understanding complex diseases
  • 4. HIV Dataset
      ▪ HIV genetic sequences:
          – 1302 bp
          – Well-conserved region
      ▪ 35 batches from 35 individuals:
          – 6 PCR reads, with an average size of 690 bp
          – 1 validated sequence, with manually annotated SNPs
      ▪ HIV Reference Sequence
  • 5. Detection Strategy: Survey
      ▪ Trimming Procedure
      ▪ Base-Calling Correction
      ▪ SNPs Filter
      ▪ Batch Consensus Algorithm
  • 6. Trimming Procedure
      ▪ Low-quality ends filtering
      ▪ Converts phred's quality sequence to an error probability sequence:
          – Q = -10 × log10(p), i.e. p = 10^(-Q/10)
      ▪ Subtract 0.05 from all values (p = 0.05 corresponds to Q = 13)
      ▪ Maximum Score Subsequence Algorithm
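The trimming steps can be sketched as follows. This is a minimal sketch, not the authors' code. It reads the "subtract 0.05" step as scoring each base with 0.05 minus its error probability, so that bases better than roughly Q13 score positive and the maximum-score subsequence is the high-quality core of the read. The function names and the sample quality values are hypothetical.

```python
def error_probability(q):
    """Invert Q = -10 * log10(p) to obtain the error probability p."""
    return 10 ** (-q / 10)


def trim_window(qualities, cutoff=0.05):
    """Return (start, end) of the maximum-score subsequence, end exclusive.

    Assumption: each base scores cutoff - p, so low-quality bases are negative.
    """
    best_sum, best_start, best_end = 0.0, 0, 0
    run_sum, run_start = 0.0, 0
    for i, q in enumerate(qualities):
        run_sum += cutoff - error_probability(q)
        if run_sum <= 0:
            # A non-positive prefix can never help; restart after this base.
            run_sum, run_start = 0.0, i + 1
        elif run_sum > best_sum:
            best_sum, best_start, best_end = run_sum, run_start, i + 1
    return best_start, best_end


quals = [4, 6, 8, 30, 35, 40, 12, 38, 33, 7, 5]  # made-up phred qualities
start, end = trim_window(quals)
print(start, end, quals[start:end])
```

In this made-up read the single Q12 base inside the good region is kept, because dropping it would cost more score than it loses.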
  • 7. Base Calling: Area Ratio
      ▪ The base calling is made in 5 steps:
          1. Chromatogram area delimitation
          2. Peak search
          3. Choice of the nearest peaks
          4. Calculation of the nearest peaks' areas
          5. Calculation of the polymorphic/reference peak area ratio
      ▪ If the calculated ratio is above a certain threshold, the point is considered a polymorphism.
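The five steps can be illustrated with a small sketch. The slides do not say how peaks are delimited or which threshold is used, so everything here is an assumption: a peak is a local maximum, its area is the sum of samples between the neighbouring valleys, and the threshold value and all names are hypothetical.

```python
def nearest_peak(trace, lo, hi, center):
    """Index of the local maximum in trace[lo:hi] closest to center, or None."""
    peaks = [i for i in range(max(lo, 1), min(hi, len(trace) - 1))
             if trace[i - 1] < trace[i] >= trace[i + 1]]
    return min(peaks, key=lambda i: abs(i - center)) if peaks else None


def peak_area(trace, peak):
    """Sum of samples from the valley left of the peak to the valley right of it."""
    left = peak
    while left > 0 and trace[left - 1] < trace[left]:
        left -= 1
    right = peak
    while right < len(trace) - 1 and trace[right + 1] < trace[right]:
        right += 1
    return sum(trace[left:right + 1])


def area_ratio(ref_trace, alt_trace, lo, hi, center):
    """Polymorphic/reference peak area ratio inside the delimited window."""
    ref_peak = nearest_peak(ref_trace, lo, hi, center)
    alt_peak = nearest_peak(alt_trace, lo, hi, center)
    if ref_peak is None or alt_peak is None:
        return 0.0
    ref_area = peak_area(ref_trace, ref_peak)
    return peak_area(alt_trace, alt_peak) / ref_area if ref_area else 0.0


ref = [0, 5, 40, 90, 40, 5, 0, 0]   # made-up trace of the called base
alt = [0, 2, 20, 45, 20, 2, 0, 0]   # made-up trace of a second base
THRESHOLD = 0.3                      # hypothetical cutoff
ratio = area_ratio(ref, alt, lo=1, hi=7, center=3)
print(ratio, ratio > THRESHOLD)
```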
  • 8. Base Calling: Area Delimitation
  • 9. Base Calling: Peak Identification
  • 10. Base Calling: Average Height Ratio
      ▪ Almost the same steps:
          1. Chromatogram area delimitation
          2. Peak search
          3. Choice of the nearest peaks
          4. Calculation of the nearest peaks' average heights
          5. Calculation of the polymorphic/reference peak average height ratio
      ▪ Again, if the calculated ratio is above a certain threshold, the point is considered a polymorphism.
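A sketch of the average-height variant, under the same kind of assumptions: the slides do not define "average height", so it is taken here as the mean sample value between the valleys on either side of the peak. All names and trace values are hypothetical.

```python
def peak_span(trace, lo, hi, center):
    """(left, right) valley bounds of the local maximum nearest to center, or None."""
    peaks = [i for i in range(max(lo, 1), min(hi, len(trace) - 1))
             if trace[i - 1] < trace[i] >= trace[i + 1]]
    if not peaks:
        return None
    left = right = min(peaks, key=lambda i: abs(i - center))
    while left > 0 and trace[left - 1] < trace[left]:
        left -= 1
    while right < len(trace) - 1 and trace[right + 1] < trace[right]:
        right += 1
    return left, right


def average_height(trace, span):
    """Mean sample value over the peak span (assumed definition)."""
    left, right = span
    return sum(trace[left:right + 1]) / (right - left + 1)


def height_ratio(ref_trace, alt_trace, lo, hi, center):
    """Polymorphic/reference average peak height ratio."""
    ref_span = peak_span(ref_trace, lo, hi, center)
    alt_span = peak_span(alt_trace, lo, hi, center)
    if ref_span is None or alt_span is None:
        return 0.0
    ref_height = average_height(ref_trace, ref_span)
    return average_height(alt_trace, alt_span) / ref_height if ref_height else 0.0


ref = [0, 5, 40, 90, 40, 5, 0, 0]   # broad made-up peak
alt = [0, 0, 0, 60, 0, 0, 0, 0]     # narrow made-up peak
ratio = height_ratio(ref, alt, lo=1, hi=7, center=3)
print(ratio)
```

With these made-up traces the narrow peak holds a third of the reference area (60 against 180) but about 0.78 of its average height, which shows how the two criteria can disagree on the same position.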
  • 11. Base Calling: Peak Identification
  • 12. Filter Algorithm
      ▪ Analyzes each sequence
      ▪ Uses a window-based algorithm to eliminate adjacent SNPs
          – Window size: 11 bases
          – Empirical score system assigned to the polymorphisms in the window
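The slides give the window size but not the empirical scoring system, so the sketch below substitutes a deliberately simple stand-in: a candidate SNP is dropped when more than `max_neighbors` other candidates fall inside the 11-base window centred on it. The function name, the `max_neighbors` parameter and the positions are hypothetical.

```python
def filter_snps(positions, window=11, max_neighbors=1):
    """Keep candidate SNPs that are not part of a dense cluster.

    Stand-in scoring: count the other candidates within window // 2 bases.
    """
    half = window // 2
    positions = sorted(positions)
    kept = []
    for p in positions:
        neighbors = sum(1 for q in positions if q != p and abs(q - p) <= half)
        if neighbors <= max_neighbors:
            kept.append(p)
    return kept


calls = [12, 40, 42, 44, 45, 90, 94]  # made-up candidate positions in one read
print(filter_snps(calls))             # the cluster at 40-45 is removed
```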
  • 13. Consensus Algorithm
      ▪ Rule-based algorithm
          – Empirical rules
      ▪ Analyzes the whole cross section to define a consensus
          – Takes into account nucleotide frequencies and qualities
      ▪ Does not create N symbols, nor tri-allelic polymorphisms.
  • 14. Consensus Algorithm: Example

      Sequence 1   A25   C30   C18   C30   A21
      Sequence 2   A30   C25   C15   C25   A16
      Sequence 3   -     M18   A9    C30   -
      Sequence 4   -     -     S12   G17   T18
      Consensus    A     M     S     S     W
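The example uses IUPAC ambiguity codes (M = A/C, S = C/G, W = A/T). The actual empirical rules are not given on the slides, so the following is only a sketch under assumptions: calls below a hypothetical quality of 10 are ignored, ambiguity codes lend their quality to each base they stand for, and at most the two best-supported bases are reported. With these assumed rules the sketch happens to reproduce the consensus row of the example.

```python
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AC"): "M", frozenset("AG"): "R", frozenset("AT"): "W",
    frozenset("CG"): "S", frozenset("CT"): "Y", frozenset("GT"): "K",
}
EXPAND = {code: bases for bases, code in IUPAC.items()}


def consensus_column(column, min_quality=10):
    """column: list of (symbol, quality) calls, with None marking a gap."""
    support = {}
    for call in column:
        if call is None:
            continue
        symbol, quality = call
        if quality < min_quality:      # assumed cutoff, not from the slides
            continue
        for base in EXPAND[symbol]:
            support[base] = support.get(base, 0) + quality
    if not support:
        return "-"                     # assumption: uncovered column stays a gap
    top_two = sorted(support, key=lambda b: (-support[b], b))[:2]
    return IUPAC[frozenset(top_two)]   # never N, never more than two alleles


reads = [
    [("A", 25), ("C", 30), ("C", 18), ("C", 30), ("A", 21)],
    [("A", 30), ("C", 25), ("C", 15), ("C", 25), ("A", 16)],
    [None, ("M", 18), ("A", 9), ("C", 30), None],
    [None, None, ("S", 12), ("G", 17), ("T", 18)],
]
consensus = "".join(consensus_column(col) for col in zip(*reads))
print(consensus)
```

Restricting the result to the two best-supported bases is what keeps the third column from becoming tri-allelic once the low-quality A9 call is discarded.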
  • 15. Tests Protocol: Third Party Packages
      ▪ Two external packages used to compare our results:
          – Polybayes: SNP detection tool based on Bayesian methods
          – Polyphred: SNP detection tool based on chromatogram analysis
      ▪ ACE file (contig and consensus) created for each batch using phrap
      ▪ ACE file analyzed by Polyphred and Polybayes
      ▪ Results viewed with consed
  • 16. Tests Protocol: Our strategy
      ▪ Reads trimmed using the Maximum Score Subsequence Algorithm
      ▪ Base-calling analysis and correction using the algorithms described previously
      ▪ SNP filtering
      ▪ Multiple alignment
          – Reference sequence as anchor
      ▪ Consensus creation
  • 17. Third Party Results: Polybayes
      ▪ Polybayes detected SNPs in only 2 batches out of 35

      Batch      Existing SNPs  Detected SNPs  Correct SNPs  False Positives  False Negatives
      Batch 13   12             1              1             0                11
      Batch 15   5              1              0             1                5
  • 18. Third Party Results: Polyphred
      ▪ Polyphred detected SNPs in only 4 batches out of 35

      Batch      Existing SNPs  Detected SNPs  Correct SNPs  False Positives  False Negatives
      Batch 07   10             1              0             1                10
      Batch 14   4              3              0             3                4
      Batch 32   26             1              0             1                26
      Batch 35   15             8              1             7                14
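In both third-party tables the columns are tied together: false positives equal detected minus correct, and false negatives equal existing minus correct. A minimal sketch of that bookkeeping is shown below, assuming a SNP counts as correct when its position matches an annotated one. The function name and the positions are made up; they are chosen only so that the counts match the Batch 35 row.

```python
def score_batch(existing, detected):
    """Compare detected SNP positions with the annotated ones for one batch."""
    existing, detected = set(existing), set(detected)
    correct = existing & detected
    return {
        "existing": len(existing),
        "detected": len(detected),
        "correct": len(correct),
        "false_positives": len(detected - existing),
        "false_negatives": len(existing - detected),
    }


# Made-up positions: 15 annotated SNPs, 8 calls, only one call is annotated.
annotated = range(100, 115)
called = [100, 300, 310, 320, 330, 340, 350, 360]
row = score_batch(annotated, called)
print(row)
```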
  • 19. Trimming Results
      ▪ Average read size:
          – Before trimming: 690.15 bp
          – After trimming: 374.74 bp
          – Reduction of 45.7%
      ▪ Average base coverage of the reference sequence:
          – Before trimming: 2.69
          – After trimming: 1.77
  • 20. Results: True Positive (%) x batch
  • 21. Results: False Negative (%) x batch
  • 22. Results: False Positive (%) x batch
  • 23. Results: Summary

                Polybayes       Polyphred       Area            Avg. Height
                Avg     SD      Avg     SD      Avg     SD      Avg     SD
      TP        0.3     1.4     0.2     1.1     75.4    19.2    52.6    21.5
      FN        99.7    1.4     99.8    1.1     23.2    18.4    45.6    21.7
      DP        0.0     0.0     0.0     0.0     1.4     4.3     1.8     4.0
      FP        2.9     16.9    11.1    31.3    393.9   312.3   554.4   511.3

      TP + FN + DP = 100%
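The Avg and SD columns are per-batch percentages summarised over the 35 batches. As a hedged illustration, the sketch below recomputes the Polyphred TP entry from the per-batch Polyphred table on slide 18. It assumes that TP% is correct SNPs divided by existing SNPs, that the 31 batches with no detections contribute 0%, and that SD is the sample standard deviation. None of these definitions are stated on the slides.

```python
from statistics import mean, stdev

N_BATCHES = 35

# Polyphred rows from slide 18: batch -> (existing SNPs, correct SNPs)
polyphred = {"07": (10, 0), "14": (4, 0), "32": (26, 0), "35": (15, 1)}

# Assumed definition: TP% = 100 * correct / existing, per batch.
tp_percent = [100 * correct / existing for existing, correct in polyphred.values()]
# Assumption: batches without any detected SNP have TP% = 0.
tp_percent += [0.0] * (N_BATCHES - len(tp_percent))

avg = round(mean(tp_percent), 1)
sd = round(stdev(tp_percent), 1)
print(avg, sd)
```

Under these assumptions the result agrees with the 0.2 and 1.1 reported for Polyphred in the TP row.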
  • 24. Discussion
      ▪ Polybayes and Polyphred need large sets of data to produce good results
      ▪ Our algorithm produces quite satisfactory results, taking into account the data characteristics:
          – Low average coverage
          – High amount of low-quality bases
          – High amount of polymorphisms (virus DNA)
      ▪ The Area Ratio strategy produces better results than the Average Height strategy
  • 25. Future Work
      ▪ Test the algorithms with larger batches, with higher average coverage, to improve the consensus algorithm
      ▪ Reproduce the experiments using genetic sequences of more conserved life forms, such as mammals