Source:
Little DP. DNA barcode sequence identification incorporating taxonomic
hierarchy and within taxon variability. PLoS One. 2011;6(8):e20552.
Raunak Shrestha
13th Oct. 2011
What is DNA Barcoding?
DNA barcoding is a standardized approach to identifying plants and animals from short DNA sequences, called DNA barcodes.
DNA barcode: a short DNA sequence, from a standardized location in the genome, used to identify species.
DNA Barcoding developments
[Timeline figure: milestones marked at 2003, 2005, 2007, 2008 and 2009; only the 2009 text survives.]
2009
• Multi-locus gene approach for plant DNA barcoding
• Chloroplast genes matK + rbcL recommended as the barcode regions
[Figure labels: COI 1560 bp; BARCODE 648 bp; MINI-COI 186 bp]
Problems with conventional Sequence Identification Engines (SIDEs)
Source: Dr. F. Brinkman. Lecture slide-4 MBB741, 2011
SIDEs such as BLAST do not consider taxonomic hierarchy information.
[Figure: blastp results]
Problems with conventional Sequence Identification Engines (SIDEs) (cont.)
Character-based identification
• Even a single-nucleotide difference can have a significant impact on DNA barcoding interpretation.
• SIDEs such as BLAST and FASTA “correct” such differences to compensate for sampling bias.
• For closely related species, SIDEs such as BLAST and FASTA usually cannot diagnose the organisms as separate species or place them at different levels of the taxonomic hierarchy.
Problems with conventional Sequence Identification Engines (SIDEs) (cont.)
Phylogenetic method-based identification
• On a large dataset, parsimony-based tree building can generate a large number of possible solutions, even for a small number of terminals
• “Computationally expensive”
• Character-based phylogenetic methods require multiple sequence alignment (MSA).
• Several MSA tools may not be able to align barcode sequences efficiently
• Barcode sequence requirements:
• Inter-species variation > intra-species variation
• Conserved enough to be amplified with ‘universal PCR primers’.
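To put the "large number of possible solutions" in perspective: the number of distinct unrooted, fully bifurcating trees on n labelled terminals is (2n-5)!!, a standard combinatorial result that is not stated on the slide. A minimal Python sketch follows (the function name is illustrative only):

```python
def count_unrooted_trees(n_terminals: int) -> int:
    """Number of unrooted, fully bifurcating trees on n labelled terminals.

    Uses the double factorial (2n-5)!! = 3 * 5 * ... * (2n-5).
    """
    if n_terminals < 3:
        return 1  # one or two terminals admit a single topology
    count = 1
    for odd in range(3, 2 * n_terminals - 4, 2):  # 3, 5, ..., 2n-5
        count *= odd
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 20):
        print(n, count_unrooted_trees(n))
```

Ten terminals already give more than two million candidate topologies, which is why exhaustive parsimony searches become impractical on barcode-scale reference sets.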
BRONX algorithm
• BRONX (Barcode Recognition Obtained with Nucleotide eXposés) is designed to:
• use an uncorrected character-based measure of similarity,
• work with difficult-to-align markers,
• capitalize upon knowledge of hierarchic evolutionary relationships,
• indicate ambiguous classification assignments, and
• account for within-taxon variation.
BRONX algorithm (cont…)
• Reduces the reference sequences into a series of characters
defined by flanking context (‘pretext’ and ‘postext’)
The size of the pretext/postext used, and the range of
text sizes stored, may vary by implementation.
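The idea of a character defined by its flanking context can be illustrated with a small Python sketch. This is a simplified illustration, not the BRONX implementation: the function name, the context length of 3 and the text sizes of 1-2 bases are all assumptions chosen for the example.

```python
from collections import defaultdict


def context_characters(seq, context_len=3, text_sizes=(1, 2)):
    """Map each (pretext, postext) pair to the set of texts it flanks in seq.

    context_len and text_sizes are arbitrary illustrative values.
    """
    chars = defaultdict(set)
    for start in range(len(seq) - context_len + 1):
        pretext = seq[start:start + context_len]
        text_start = start + context_len
        for size in text_sizes:
            post_start = text_start + size
            postext = seq[post_start:post_start + context_len]
            if len(postext) < context_len:
                continue  # not enough sequence left for a full postext
            chars[(pretext, postext)].add(seq[text_start:post_start])
    return dict(chars)


if __name__ == "__main__":
    for context, texts in context_characters("ACGTTGCA").items():
        print(context, sorted(texts))
```

Because each character is keyed by its neighbours rather than by an alignment column, two sequences can be compared without first building a multiple sequence alignment.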
BRONX algorithm (cont…)
• Uses an exhaustive tree construction algorithm
• Then compares the sequences of each terminal
• Matches the pretext and the postext of the paired sequences
• If both the pretext and the postext match:
• Score each combination shared by the paired sequences
• If there is no match:
• Determine all possible postext combinations downstream of the matched pretext
• Choose the postext match nearest to the postext and align the sequences accordingly
• Choose the next postext and align the sequences
• Score all the alignments
• The alignment(s) with the highest final score are taken as the identification
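The score-and-report step can be sketched in Python as below. This is a heavily simplified stand-in under stated assumptions: it uses fixed pretext, text and postext lengths (so it reduces to counting shared short substrings), it omits the realignment of unmatched postexts and the taxonomic hierarchy, and the function and taxon names are hypothetical. It does show one point from the slides: all top-scoring references are returned, so an ambiguous assignment is visible as a tie.

```python
def characters(seq, context_len=3, text_len=1):
    """Set of (pretext, text, postext) triples observed in seq."""
    span = 2 * context_len + text_len
    return {
        (
            seq[i:i + context_len],
            seq[i + context_len:i + context_len + text_len],
            seq[i + context_len + text_len:i + span],
        )
        for i in range(len(seq) - span + 1)
    }


def identify(query, references):
    """Return (best_score, names of every reference tied at that score)."""
    query_chars = characters(query)
    scores = {
        name: len(query_chars & characters(ref))
        for name, ref in references.items()
    }
    best = max(scores.values())
    return best, sorted(name for name, s in scores.items() if s == best)


if __name__ == "__main__":
    refs = {  # hypothetical reference sequences
        "taxon_A": "ACGTACGTTGCA",
        "taxon_B": "ACGTACGATGCA",
    }
    print(identify("ACGTACGTTGCA", refs))
```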
Objective of the paper
To test the accuracy of BRONX sequence
identification against leading published
SIDEs.
Dataset
• DNA barcode sequences of matK and rbcL from databases
• Sequences chosen only if both matK and rbcL were obtained from the same individual (voucher specimen)
• Global multiple sequence alignment
• Alignment refined with MUSCLE
• Sequences trimmed to the region amplified by the following PCR primers:
• matK 3F (5’-CGTACAGTACTTTTGTGTTTACGAG-3’)
• matK 1R (5’-ACCCAGTCCATCTGGAAATCTTGGTTC-3’)
• rbcL aF (5’-ATGTCACCACAAACAGAGACTAAAGC-3’)
• rbcL aR (5’-GAAACGGTCTCTCCAACGCAT-3’)
• Final dataset: 2083 sequences of each marker, representing 990 genera and 1745 species
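Trimming a sequence to a primer-delimited region might look like the Python sketch below. It rests on assumptions the slides do not confirm: primer sites are located by exact string matching on an unaligned sequence, the primer sites themselves are excluded from the result, and the function names are illustrative. The paper worked from aligned sequences, so this shows only the general idea.

```python
def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def trim_to_amplicon(seq, forward_primer, reverse_primer):
    """Return the region between the two primer sites, or None if a site is missing.

    The reverse primer anneals to the opposite strand, so its reverse
    complement is what appears in the forward-strand sequence.
    """
    fwd_at = seq.find(forward_primer)
    if fwd_at == -1:
        return None
    start = fwd_at + len(forward_primer)
    rev_at = seq.find(reverse_complement(reverse_primer), start)
    if rev_at == -1:
        return None
    return seq[start:rev_at]


if __name__ == "__main__":
    fwd = "ATGTCACCACAAACAGAGACTAAAGC"  # rbcL aF, from the slide
    rev = "GAAACGGTCTCTCCAACGCAT"       # rbcL aR, from the slide
    toy = "GG" + fwd + "TTTACCCGGG" + reverse_complement(rev) + "AA"
    print(trim_to_amplicon(toy, fwd, rev))
```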
Dataset
• Mini-barcodes:
• Each of the 2083 sequences was reduced to a 100-200 base subsequence to serve as a mini-barcode.
• The position of each mini-barcode was chosen at random.
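A minimal Python sketch of how such mini-barcodes could be cut from full-length sequences is given below. The slides give only the 100-200 base range and the random position; drawing the length uniformly, the seeded generator and the function name are assumptions made for the example.

```python
import random


def random_mini_barcode(seq, rng, min_len=100, max_len=200):
    """Cut one randomly placed subsequence of min_len..max_len bases from seq."""
    if len(seq) < min_len:
        raise ValueError("sequence is shorter than the minimum mini-barcode length")
    length = rng.randint(min_len, min(max_len, len(seq)))  # inclusive bounds
    start = rng.randint(0, len(seq) - length)
    return seq[start:start + length]


if __name__ == "__main__":
    rng = random.Random(2011)  # fixed seed so the example is repeatable
    full_length = "".join(rng.choice("ACGT") for _ in range(600))  # toy barcode
    mini = random_mini_barcode(full_length, rng)
    print(len(mini), mini[:20] + "...")
```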
Benchmarking
• Benchmark of 11 different algorithms for both DNA barcodes
and mini-barcodes
1. B = BRONX;
2. C = CAOS;
3. D = DNA–BAR/degenbar;
4. F = forced (constrained) tree–search;
5. J = SAP neighbor joining;
6. L = pairwise matching (local alignment);
7. N = NCBI-BLAST;
8. P = pairwise matching (global alignment);
9. S = SAP Barcoder;
10. T = de novo tree–search;
11. W = WU-BLAST.
Results
[Figure: Tests of identification using full-length barcode queries. Panels: genus-level identification; weak test of species-level identification; strong test of species-level identification; all tests of species-level identification.]
Results
• Genus-level identification was highly successful (>99%) for BRONX, DNA-BAR/degenbar, NCBI-BLAST and pairwise matching using full-length matK data
• rbcL was not variable enough to distinguish between genera (~97% success)
• DNA-BAR/degenbar outperformed all other SIDEs in species-level identification
• BRONX, however, was significantly better in genus-level identification
• BRONX should be preferred over other SIDEs for genus-level identification queries.
Results
[Figure: Tests of identification using mini-barcode queries. Panels: genus-level identification; weak test of species-level identification; strong test of species-level identification; all tests of species-level identification.]
Results
• For mini-barcode queries, identification success was lower than for full-length queries
Identification success for the strong test with combined matK and rbcL:
Query type          SIDE                Success
Full-length query   DNA-BAR/degenbar    91 %
Mini-barcode query  BRONX               47 %
• Performance of DNA-BAR/degenbar was similar to other SIDEs for mini-barcode queries (11.24% success)
• Performance of BRONX for mini-barcode queries was better than that of all other SIDEs
Results
Similarity of SIDE performance measured by Fleiss' index of interrater agreement (κ)
• Moderate agreement among SIDEs for full-length queries (κ = 0.487-0.633)
• Little agreement among SIDEs for mini-barcode queries (κ = 0.137-0.191)
• Identification success did not improve with combined matK and rbcL data.
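For readers unfamiliar with the agreement statistic, Fleiss' kappa can be computed as in the Python sketch below. The formula is the standard one. Treating each query as a subject, each SIDE as a rater and "correct"/"incorrect" as the categories is an assumption about how the outcomes were coded, and the toy counts are invented purely for illustration.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for counts[i][j] = raters assigning subject i to category j.

    Every subject must be rated by the same number of raters (at least two).
    Undefined (division by zero) when all ratings fall in a single category.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number of ratings")
    # Mean observed agreement across subjects.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects
    # Agreement expected by chance from the overall category proportions.
    n_categories = len(counts[0])
    proportions = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in proportions)
    return (p_bar - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Toy data: 4 queries, 11 SIDEs, columns = (correct, incorrect).
    toy = [[11, 0], [9, 2], [5, 6], [1, 10]]
    print(round(fleiss_kappa(toy), 3))
```

Values near 1 mean the engines succeed and fail on the same queries; values near 0 mean their agreement is no better than chance.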
Conclusion
• BRONX is to be preferred over other SIDEs when
• genus-level identification is desired
• mini-barcodes are used for identification
• DNA-BAR/degenbar exhibits superior performance in species-level identification with full-length queries
• Because of their inconsistent performance, tree-based methods should not be used for barcode sequence identification
• BLAST is a rapid means of sequence identification, but other SIDEs provide better accuracy and consistency
Critique
• Quality of sequence data in public databases -> GIGO (garbage in, garbage out)
• DNA barcode data depend on the primers selected to amplify the sequence
• Only a single primer set was used for each locus
• Does this mimic a real-world dataset?
• It would have been even better if performance had also been measured in terms of the computing time required for analysis.
• To date, no algorithm appears to handle both full-length and mini-barcode query sequences while giving high identification success at both the genus and species levels.
Questions
?
Thank you
