The BRONX algorithm outperforms other sequence identification engines (SIDEs) such as BLAST and FASTA in identifying plant species from DNA barcode sequences, especially at the genus level and when using mini-barcode sequences. The study evaluated 11 SIDEs, including BRONX, BLAST, and DNA-BAR, on their ability to correctly identify species and genera of plants from DNA barcode sequences of the matK and rbcL genes. BRONX performed best for genus-level identification of full-length sequences and for all levels of identification using mini-barcodes. While DNA-BAR worked best on species identification with full-length barcodes, BRONX was comparable and improved on mini-barcode queries, demonstrating its ability
DNA barcode sequence identification incorporating taxonomic hierarchy and within taxon variability
1. Source:
Little DP. DNA barcode sequence identification incorporating taxonomic
hierarchy and within taxon variability. PLoS One. 2011;6(8):e20552.
Raunak Shrestha
13th Oct. 2011
2. What is DNA Barcoding?
Barcoding is a standardized
approach to identifying plants and
animals by minimal sequences of
DNA, called DNA barcodes.
DNA Barcode: A short DNA sequence, from
a uniform locality on the genome, used for
identifying species.
5. DNA Barcoding developments (cont.)
2008
2009
• Multi-locus gene approach for plant DNA barcoding
• Chloroplast genes matK + rbcL recommended as the barcode regions
[Figure labels: COI 1560 bp; BARCODE 648 bp; MINI-COI 186 bp]
7. Problems with conventional Sequence Identification Engines (SIDEs) (cont.)
Character based Identification
• Even a single-nucleotide difference can have a significant impact on the interpretation of DNA barcodes
• SIDEs such as BLAST and FASTA “correct” for such differences to overcome sampling bias
• For closely related species, SIDEs such as BLAST and FASTA usually cannot diagnose the organisms as separate species or as belonging to different levels of the taxonomic hierarchy
8. Problems with conventional Sequence Identification Engines (SIDEs) (cont.)
Phylogenetic Method based Identification
• In a huge dataset, a parsimony tree-building method can generate a large number of possible solutions even for a small number of terminals
• “Computationally Expensive”
• Character-based phylogenetic methods require multiple-sequence alignment (MSA).
• Several MSA tools may not be able to align the barcode sequences efficiently
• Barcode sequence:
• Inter-species variation > intra-species variation
• Conserved enough that it can be amplified with ‘universal PCR primers’.
9. BRONX algorithm
• BRONX (Barcode Recognition Obtained with Nucleotide eXposés) is designed to:
• use an uncorrected character-based measure of similarity,
• work with difficult-to-align markers,
• capitalize upon knowledge of hierarchic evolutionary relationships,
• indicate ambiguous classification assignments, and
• account for within-taxon variation.
10. BRONX algorithm (cont…)
• Reduces the reference sequences to a series of characters defined by their flanking context (‘pretext’ and ‘postext’)
• The size of the pretext/postext used, and the range of text sizes stored, may vary by implementation.
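The reduction of a sequence to context-defined characters can be illustrated with a minimal Python sketch. This is not the published BRONX code: the function name `context_characters`, the fixed flank size of 4, and the text sizes of 0 to 2 bases are assumptions chosen only for illustration, since the slide notes that these sizes vary by implementation.

```python
from typing import Iterator, NamedTuple


class ContextCharacter(NamedTuple):
    """One character: a short 'text' plus the context flanking it."""
    pretext: str
    text: str
    postext: str


def context_characters(seq: str, flank: int = 4, max_text: int = 2) -> Iterator[ContextCharacter]:
    """Yield every (pretext, text, postext) triple found in seq.

    flank and max_text are illustrative values, not those of the paper.
    """
    seq = seq.upper()
    for start in range(flank, len(seq) - flank + 1):
        for text_len in range(max_text + 1):
            end = start + text_len
            if end + flank > len(seq):
                break  # not enough sequence left for a full postext
            yield ContextCharacter(
                pretext=seq[start - flank:start],
                text=seq[start:end],
                postext=seq[end:end + flank],
            )


if __name__ == "__main__":
    for ch in context_characters("ACGTACGTAC"):
        print(ch)
```

Because each character carries its own flanking context, two sequences can be compared character by character without first building a multiple-sequence alignment.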
11. BRONX algorithm (cont…)
• Uses an exhaustive tree construction algorithm
• Then compares the sequences of each terminal
• Matches the pretext and the postext of the paired sequences
• If both the pretext and the postext match
• Score each combination shared by the paired sequences
• If there is no match
• Determine all possible postext combinations downstream of the matched pretext
• Choose the nearest postext match and align the sequences accordingly
• Choose the next postext and align the sequences
• Score all of the alignments
• The alignment(s) with the highest final score is (are) taken as the identification
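The matching and scoring steps can be sketched roughly as follows. This is a deliberately simplified illustration, not the published algorithm: it scores only exact pretext/text/postext matches and omits the fallback search for the nearest postext. The names `characters` and `identify`, the flank size, and the toy taxa are all hypothetical. Pooling the characters of every reference sequence in a taxon is one simple way to account for within-taxon variation, and returning all top-scoring taxa flags ambiguous assignments.

```python
def characters(seq, flank=4, max_text=2):
    """Set of (pretext, text, postext) triples in seq (sizes are assumed)."""
    seq = seq.upper()
    found = set()
    for start in range(flank, len(seq) - flank + 1):
        for text_len in range(max_text + 1):
            end = start + text_len
            if end + flank > len(seq):
                break
            found.add((seq[start - flank:start], seq[start:end], seq[end:end + flank]))
    return found


def identify(query, references, flank=4, max_text=2):
    """Score a query against each taxon and return (winners, scores).

    references maps a taxon name to a list of its reference sequences.
    Every taxon tied for the best non-zero score is returned.
    """
    query_chars = characters(query, flank, max_text)
    scores = {}
    for taxon, seqs in references.items():
        pooled = set()
        for seq in seqs:  # pool variation observed within the taxon
            pooled |= characters(seq, flank, max_text)
        scores[taxon] = len(query_chars & pooled)
    best = max(scores.values(), default=0)
    winners = sorted(t for t, s in scores.items() if s == best and best > 0)
    return winners, scores


if __name__ == "__main__":
    refs = {
        "Genus_a": ["ACGTACGTACGT", "ACGTACCTACGT"],
        "Genus_b": ["TTTTGGGGCCCCAAAA"],
    }
    print(identify("ACGTACCTACGT", refs))
```

A query that ties across several taxa comes back with more than one winner, which a caller could report as an identification to a higher taxonomic rank only.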
12. Objective of the paper
To test the accuracy of BRONX sequence
identification against leading published
SIDEs.
13. Dataset
• DNA barcode sequences of matK and rbcL were taken from databases
• Sequences were chosen only if both the matK and rbcL sequences were obtained from the same individual (voucher specimen)
• Global multiple sequence alignment
• Alignment refined with MUSCLE
• Sequences trimmed to the region amplified by the following PCR primers
• matK 3F (5’-CGTACAGTACTTTTGTGTTTACGAG-3’)
• matK 1R (5’-ACCCAGTCCATCTGGAAATCTTGGTTC-3’)
• rbcL aF (5’-ATGTCACCACAAACAGAGACTAAAGC-3’)
• rbcL aR (5’-GAAACGGTCTCTCCAACGCAT-3’)
• Final dataset: 2083 sequences of each marker representing
990 genera and 1745 species
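The primer-based trimming step can be illustrated with a short Python sketch. It assumes exact primer matches on the forward strand and drops the primer sites by default; the slides do not say how mismatches were handled or whether primer sites were retained, and the function names here are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def trim_to_amplicon(seq, forward, reverse, keep_primers=False):
    """Return the region between the two primer sites, or None if absent.

    Exact matching only (an assumption); real primers tolerate mismatches.
    """
    seq = seq.upper()
    fwd = forward.upper()
    rev_site = reverse_complement(reverse)  # reverse primer as seen on the forward strand
    i = seq.find(fwd)
    if i == -1:
        return None
    j = seq.find(rev_site, i + len(fwd))
    if j == -1:
        return None
    if keep_primers:
        return seq[i:j + len(rev_site)]
    return seq[i + len(fwd):j]


if __name__ == "__main__":
    MATK_3F = "CGTACAGTACTTTTGTGTTTACGAG"
    MATK_1R = "ACCCAGTCCATCTGGAAATCTTGGTTC"
    # Toy sequence built around the primers, not real matK data.
    toy = "GGG" + MATK_3F + "AAACCCGGGTTT" + reverse_complement(MATK_1R) + "TTT"
    print(trim_to_amplicon(toy, MATK_3F, MATK_1R))
```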
14. Dataset
• Mini-barcodes:
• Each of the 2083 sequences was reduced to a 100-200 base sequence to serve as a mini-barcode.
• The positions of the mini-barcodes were chosen at random
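Mini-barcode generation of this kind could be sketched as below. The uniform choice of length and start position is an assumption, as the slides say only that the positions were random; `mini_barcode` is a hypothetical helper name.

```python
import random


def mini_barcode(seq, rng, min_len=100, max_len=200):
    """Cut a random window of min_len..max_len bases out of seq.

    Length and start are drawn uniformly (an assumption, not from the paper).
    """
    if len(seq) < min_len:
        raise ValueError("sequence is shorter than the minimum mini-barcode length")
    length = rng.randint(min_len, min(max_len, len(seq)))
    start = rng.randint(0, len(seq) - length)
    return seq[start:start + length]


if __name__ == "__main__":
    rng = random.Random(0)       # fixed seed so the sampling is reproducible
    full_length = "ACGT" * 200   # toy 800-base sequence, not real data
    mini = mini_barcode(full_length, rng)
    print(len(mini), mini[:20])
```

Passing the random generator in explicitly keeps the sampling reproducible, which matters when the same mini-barcodes are given to every SIDE in a benchmark.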
15. Benchmarking
• Benchmark of 11 different algorithms for both DNA barcodes
and mini-barcodes
1. B = BRONX;
2. C = CAOS;
3. D = DNA–BAR/degenbar;
4. F = forced (constrained) tree–search;
5. J = SAP neighbor joining;
6. L = pairwise matching (local alignment);
7. N = NCBI-BLAST;
8. P = pairwise matching (global alignment);
9. S = SAP Barcoder;
10. T = de novo tree–search;
11. W = WU-BLAST.
17. Results
• Genus-level identification was highly successful (>99%) for BRONX, DNA-BAR/degenbar, NCBI-BLAST and pairwise matching using full-length matK data
• rbcL was not variable enough to distinguish between genera (~97% success)
• DNA-BAR/degenbar outperformed all other SIDEs in species-level identification
• but BRONX was significantly better in genus-level identification
• BRONX should be preferred over other SIDEs for genus-level identification queries.
18. Results
[Figure: Tests of identification using mini-barcode queries. Panels: genus-level identification; weak test of species-level identification; strong test of species-level identification; all tests of species-level identification.]
19. Results
• For mini-barcode queries, identification success was lower than for full-length queries
Identification success for the strong test with combined matK and rbcL:
• Full-length query (DNA-BAR/degenbar): 91%
• Mini-barcode query (BRONX): 47%
• Performance of DNA-BAR/degenbar was similar to that of other SIDEs for mini-barcode queries (11.24% success)
• Performance of BRONX for mini-barcode queries was better than that of all other SIDEs
20. Results
Similarity of SIDE performance measured by Fleiss' index of interrater agreement (κ)
• Moderate agreement among SIDEs for full-length queries (κ = 0.487-0.633)
• Little agreement among SIDEs for mini-barcode queries (κ = 0.137-0.191)
• Identification success did not improve with combined data of matK and rbcL.
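For readers unfamiliar with Fleiss' index, the following Python sketch computes it from a table of counts using the standard formula. How the paper mapped its results onto this table is not stated in the slides; here it is assumed that each row is one query sequence, each "rater" is one SIDE, and the categories are outcomes such as correct or incorrect identification.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of counts.

    counts[i][j] is the number of raters that put subject i in category j.
    Every row must sum to the same number of raters (at least 2).
    Undefined (division by zero) if all ratings fall in a single category.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject must be rated by the same number of raters")
    n_categories = len(counts[0])

    # Proportion of all assignments that went to each category.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement among raters for each subject.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects    # mean observed agreement
    p_e = sum(p * p for p in p_j)    # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Toy table: 4 queries, 3 SIDEs, categories = [correct, incorrect].
    print(fleiss_kappa([[3, 0], [0, 3], [3, 0], [0, 3]]))
```

A value of 1 means the engines always agree, while values near 0 mean agreement is no better than expected by chance.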
21. Conclusion
• BRONX is to be preferred over other SIDEs when
• Genus-level identification is desired
• Mini-barcodes are used for identification
• DNA-BAR/degenbar exhibits superior performance in species-level identification with full-length queries
• Due to inconsistent performance, no tree-based method should be used for barcode sequence identification
• BLAST is a rapid means of sequence identification, but other SIDEs provide better accuracy and consistency
22. Critique
• Quality of sequence data in public databases -> GIGO (garbage in, garbage out)
• DNA barcode data depend on the primers selected to amplify the sequence
• Only a single primer set was used for each locus
• Does this mimic a real-world dataset?
• It would have been even better if performance had also been measured in terms of the computing time required for analysis.
• To date, no algorithm seems to be available that can handle both full-length and mini-barcode query sequences and give high identification success at both the genus and species levels.