Johannes Bergsten lecture on Thursday, Sept 17, 2009, for the Biodiversity Informatics Course, a Swedish Taxonomy Initiative (Svenska Artprojektet) course at the Swedish Natural History Museum, Stockholm, supported by the Swedish Species Service (ArtDatabanken) and the Swedish GBIF node.
Johannes Bergsten: DNA Barcoding
1. DNA Barcoding
Johannes Bergsten
Swedish Museum of Natural History
Department of Entomology
E-mail: johannes.bergsten@nrm.se
Biodiversity Informatics Course, 14-24 September, 2009
Swedish Museum of Natural History, Stockholm, Sweden
Image credit: Barcoding Institute of Ontario
2. How it all started in 2003
Hebert et al. (2003) proposed a CO1-based (~650 bp of the 5’ end)
global identification system for animals,
and showed 96.4-100% success in assigning
test specimens to the correct phylum, order and species
(Lepidoptera from Guelph) through a CO1 profile.
98% of congeneric species pairs in 11
animal phyla showed
>2% sequence divergence in CO1.
3. What is DNA Barcoding?
• A way of identifying
samples to species based
on a short standardised
gene-region
• Keywords:
• Identify
• Samples
• Species
• Gene
• Short
• Standardised
4. 2 main uses of DNA Barcoding
• identify specimens – a global identification system
• discover new species – aid and speed up the discovery of the
remaining biodiversity
6. Why DNA Barcoding?
-the applications
• Identification of all life stages: eggs, larvae, nymphs, pupae, adults
• Identification of fragments or products of organisms
• Identification of stomach contents to trace ecological food chains
• Identification of cryptic look-alike species
• Food control
• Customs control
• Invasive species control
• Disease vector control
• Police
• Agriculture
• Forestry
• Conservation
• Education
• Etc
7. Examples
What is the fillet
served on your plate,
at a market or
in a package?
What are the eggs
or molt in the ballast water
of ships? Are they non-
native invasive species?
9. Why DNA Barcoding?
The biodiversity-taxonomy crisis
• The Biodiversity crisis
• We have yet to discover and describe maybe 90% of the biodiversity
• Humans are responsible for a mass extinction that is going fast!
• Traditional taxonomy is too slow!
• Taxonomic expertise is vanishing and training new taxonomists is
too expensive
• Democratizing taxonomic knowledge
13. “- Mum is this a grizzly bear
or a black bear?”
“- Well Johnnie why don’t
you go poke your barcoder
into it and find out.”
(Cameron et al., Syst. Biol., 2006)
Criticism
14. The Barcoding Movement
• CBOL: a consortium of 200 member
institutions/organizations from 50
countries that promote and standardize
DNA Barcoding
• iBOL: an alliance of 16 nations trying to
get the big bucks to do the job.
15. The chosen gene for Metazoans
• Cytochrome c oxidase subunit I (CO1)
• Mitochondrial
• Easy to amplify
• Relatively fast
evolving
Credit: iBOL
16. The chosen genes for plants
Plastid genes rbcL and matK form a 2-locus plant barcode
33. DNA Barcode standards
• The standards include three components:
1) Creation of a reserved keyword
(“BARCODE”). NCBI and its
collaborators will add the BARCODE ‘flag’
to new submissions that meet the
standards established in consultation with
CBOL. Data records that meet these
criteria will be known as BARCODE
records in INSDC (BRIs);
34. Required data elements
• 2) Required data elements.
• These provide the user community with
reliable, retrievable and verifiable
information concerning the barcode
sequence itself, the specimen from which
it was obtained, and the species name
that was applied by the submitter.
35. Data on the specimen
• a) Include a link to a voucher specimen using a
structured field* specified by CBOL and NCBI,
and to the metadata associated with that
specimen and contained in the public database
of the voucher specimen’s repository.
• b) Include a link to a documented species name
found in one of the sources specified by CBOL
and NCBI;
• c) Include Country-Code, using the controlled
vocabulary used by GenBank;
*(institution|collection|item) e.g. NHRS:ENT-LEPI:AA008745
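The structured voucher field in the footnote can be split mechanically. The following is a minimal Python sketch, assuming the three parts are colon-separated exactly as in the example; the names `Voucher` and `parse_voucher` are hypothetical, and real records may use variants this does not handle.

```python
from typing import NamedTuple


class Voucher(NamedTuple):
    """Hypothetical container for the institution:collection:item triplet."""
    institution: str
    collection: str
    item: str


def parse_voucher(field: str) -> Voucher:
    """Split a voucher string into its three colon-separated parts.

    Assumption: exactly three non-empty parts, as in the slide's example.
    """
    parts = field.strip().split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected institution:collection:item, got {field!r}")
    return Voucher(*parts)


voucher = parse_voucher("NHRS:ENT-LEPI:AA008745")
print(voucher.institution, voucher.collection, voucher.item)
```

Rejecting anything that is not a clean triplet keeps malformed voucher links from slipping through silently.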
36. The Barcode region
• d) Come from a gene region accepted by CBOL
as an effective barcode. Initially, only
cytochrome c oxidase 1 is approved as a
barcode region, defined relative to the mouse
mitochondrial genome as the 648 bp region that
starts at position 58 and stops at position 705.
• (For plants, matK and rbcL are expected to get the
same status very soon)
• CBOL has procedures for applying for other
gene regions to be given barcode status
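The 648 bp figure follows from the coordinates when both end positions are counted. A short Python sketch of that arithmetic, assuming 1-based inclusive positions; the helper names and the toy reference string are hypothetical placeholders, not the mouse mitochondrial sequence.

```python
BARCODE_START = 58   # 1-based, inclusive (from the slide)
BARCODE_END = 705    # 1-based, inclusive (from the slide)


def region_length(start: int, end: int) -> int:
    """Length of a region given 1-based inclusive coordinates."""
    return end - start + 1


def extract_region(reference: str,
                   start: int = BARCODE_START,
                   end: int = BARCODE_END) -> str:
    """Cut the region out of a reference string (hypothetical helper)."""
    if start < 1 or end > len(reference) or start > end:
        raise ValueError("region lies outside the reference sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return reference[start - 1:end]


print(region_length(BARCODE_START, BARCODE_END))  # 648

toy_reference = "ACGT" * 200  # 800 bp placeholder, not real data
print(len(extract_region(toy_reference)))  # 648
```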
37. Quality of sequence
• e) Include at least 500 contiguous unambiguous base-pairs from
bidirectional sequencing within the approved barcode region.
However, if requested, GenBank could assign the BARCODE flag to
records with shorter sequences
• f) Include no more than 1% ambiguous sites for the entire submitted
sequence;
• g) Include the name of the gene region used;
• h) Be associated with trace files submitted to the NCBI Trace Archive
or the Ensembl Trace Server;
• i) Include the sequences of all forward and reverse primers used.
For records in which the contiguous sequence was assembled from
more than one amplicon or when a cocktail of multiple primers was
used for amplification, multiple sets of primer pairs must be
provided. In addition, submission of the names of the forward and
reverse primers with the primer sequences is strongly
recommended.
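Criteria e) and f) are easy to screen for automatically. The sketch below is a minimal Python illustration, assuming that every character other than A, C, G or T counts as ambiguous; the function names and test sequences are hypothetical, and the check does not cover bidirectional sequencing or whether the run lies within the approved barcode region.

```python
import re


def longest_unambiguous_run(seq: str) -> int:
    """Length of the longest stretch made up only of A, C, G, T."""
    runs = re.findall(r"[ACGT]+", seq.upper())
    return max((len(run) for run in runs), default=0)


def ambiguous_fraction(seq: str) -> float:
    """Share of sites that are not A, C, G or T."""
    if not seq:
        raise ValueError("empty sequence")
    ambiguous = sum(1 for base in seq.upper() if base not in "ACGT")
    return ambiguous / len(seq)


def meets_quality_criteria(seq: str,
                           min_run: int = 500,
                           max_ambiguous: float = 0.01) -> bool:
    """Criterion e) >= 500 contiguous unambiguous bp; f) <= 1% ambiguous."""
    return (longest_unambiguous_run(seq) >= min_run
            and ambiguous_fraction(seq) <= max_ambiguous)


# Hypothetical sequences built only to exercise the two criteria.
good = "ACGT" * 150 + "N" + "ACGT" * 12       # 600 bp run, 1 of 649 ambiguous
short = "ACGT" * 100 + "N" + "ACGT" * 50      # longest run only 400 bp
noisy = "ACGT" * 130 + "N" * 10 + "ACGT" * 30  # 520 bp run, 10 of 650 ambiguous

for name, seq in [("good", good), ("short", short), ("noisy", noisy)]:
    print(name, meets_quality_criteria(seq))
```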
38. Strongly recommended
data elements.
• Strongly recommended data elements. The
following data elements have been added to the
INSDC at CBOL’s request for validation of the
voucher specimen, and will be strongly
recommended but not required:
• j) Latitude and longitude;
• k) Name of the identifier;
• l) Name of the collector;
• m) Date of collection
39. Governance rules.
• 3) Governance rules. The INSDC provides an archive of records
that can only be changed by the submitter. In the case of BRIs, the
following modifications are implemented:
• CBOL can allow <500bp sequences to get barcode status (e.g.
types, extinct spp.)
• CBOL maintains a process by which alternative gene regions can
attain barcode status
• BRIs submitted via BOLD are jointly submitted by the researcher
and BOLD and can be edited by both.
• CBOL can recommend the BARCODE status to be removed from
sequences submitted to INSDC by an individual researcher.
• A system for attaching third-party comments, criticism and
suggested corrections to BRIs will be installed.
50. Barcoding rests on the idea that between-species genetic distance
is larger than within-species variation.
[Figure: the barcoding gap along a genetic distance axis; 1% marked]
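One way to make the gap concrete is to compare the largest within-species distance with the smallest between-species distance. The Python sketch below assumes pre-aligned sequences of equal length and uses the uncorrected p-distance (proportion of differing sites); the function names and the four toy sequences are hypothetical, and their divergences are far larger than real CO1 data would show.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of sites that differ between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("empty sequences")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def barcoding_gap(specimens):
    """Return (max intraspecific, min interspecific, gap).

    specimens: list of (species name, aligned sequence) tuples.
    A positive gap means every between-species distance exceeds
    every within-species distance in this sample.
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(specimens, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    if not intra or not inter:
        raise ValueError("need both within- and between-species pairs")
    return max(intra), min(inter), min(inter) - max(intra)


# Hypothetical 20 bp toy alignment, two specimens per species.
specimens = [
    ("species A", "ACGTACGTACGTACGTACGT"),
    ("species A", "ACGTACGTACGTACGTACGA"),
    ("species B", "TCGAACGTTCGTACGAACGT"),
    ("species B", "TCGAACGTTCGTACGAACGC"),
]

max_intra, min_inter, gap = barcoding_gap(specimens)
print(f"max within: {max_intra:.2f}, min between: {min_inter:.2f}, gap: {gap:.2f}")
```

Using the extremes rather than the means is a deliberately strict choice: a gap computed from means can look wide even when individual distances overlap.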
51. Supporting data for the Barcoding Gap
Organism | Distribution | Geographical sampling | Species | Sampled | Prop. | Ind./sp. | Intrasp. var. | Intersp. div. | Id. success | Paper
Spiders | World | Local (Canada) | 40,000 | 168 | 0.0042 | 3 | 1.40% | 16.40% | 100% | Barrett & Hebert (2005)
Birds | World | Regional (N. Am.) | 9,000 | 260 | 0.028 | 2 | 0.43% | 7.93% | 100% | Hebert et al. (2004)
Lepidoptera (3 superfamilies) | World | Local (Guelph) | 91,700 | 200 | 0.0022 | 1.7 | 0.25% | 6.80% | 100% | Hebert et al. (2003)
Mayflies | World | Regional (N. Am.) | 2,500 | 80 | 0.032 | 1.9 | 1.10% | 18.10% | 99.00% | Ball et al. (2005)
Intraspecific variation and interspecific divergence differ by more than an order of magnitude = Barcoding Gap
Critique:
Well sampled?
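The order-of-magnitude claim can be checked from the intraspecific and interspecific percentages reported for the four studies. A small Python sketch; the dictionary layout is only an illustrative choice, and the numbers are those from the slide.

```python
# (intraspecific variation %, interspecific divergence %) from the slide
studies = {
    "Spiders": (1.40, 16.40),
    "Birds": (0.43, 7.93),
    "Lepidoptera": (0.25, 6.80),
    "Mayflies": (1.10, 18.10),
}

# Ratio of between-species divergence to within-species variation.
ratios = {name: inter / intra for name, (intra, inter) in studies.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
# Spiders: 11.7x
# Birds: 18.4x
# Lepidoptera: 27.2x
# Mayflies: 16.5x
```

All four ratios exceed 10, but the critique stands: with about two to three individuals per species and well under 5% of the world's species sampled, the intraspecific figures may be underestimates and the interspecific ones overestimates.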
52. Sister species vs congeners
Panthera leo (lion)
Panthera tigris (tiger)
Motacilla flava (yellow wagtail)
Motacilla alba (white wagtail)
Carabus nitens (Swedish: guldlöpare)
Carabus coriaceus (Swedish: läderlöpare)
Salix herbacea (dwarf willow)
Salix caprea (goat willow)
Sister species vs congeners
Agabus elongatus
A. congener A. lapponicus
A. thomsoni
A. moestus
A. levanderi
A. clypealis
A. pseudoclypealis
Sylvia minula (desert lesser whitethroat)
Sylvia curruca (lesser whitethroat)
Eupeodes luniger
Eupeodes latilunulatus
Sister species vs congeners
Carex rostrata (bottle sedge)
Carex vesicaria (bladder sedge)
Pipistrellus pipistrellus (common pipistrelle)
Pipistrellus pygmaeus (soprano pipistrelle)
55. How DNA barcodes should not
be used
• “It is expected that DNA barcodes will
contribute to the discovery and formal
recognition of new species. However,
DNA barcodes should not be used as
the sole criterion for description of new
species, which instead require analysis of
diverse data, including morphology,
ecology, and behavior, as well as
genetics.”
From draft conference report: Taxonomy, DNA, and the Barcode of Life, 2003
56. How not to be used
• “We were interested to see whether Xus exemplaris would be
considered a species under standard DNA barcoding protocol”
• “Using the DNA Barcoding protocol … therefore under a 3%
threshold and a 10x mean intraspecific threshold Xus exemplaris
would be considered a good species.”
• “However if we use the smallest among-species divergence as
recommended by Meier et al (2008) Xus exemplaris would not be
considered a good species under the protocol.”
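The quotes show how the verdict flips with the choice of criterion, which is why a threshold alone makes a poor species test. The Python sketch below illustrates this with invented numbers, assuming the first verdict rests on the mean divergence to congeners and the second on the smallest; all function names and values are hypothetical.

```python
from statistics import mean


def exceeds_fixed_threshold(divergence: float, threshold: float = 0.03) -> bool:
    """Fixed-percentage rule, here 3%."""
    return divergence > threshold


def exceeds_10x_rule(divergence: float, mean_intraspecific: float) -> bool:
    """Divergence must exceed ten times the mean intraspecific variation."""
    return divergence > 10 * mean_intraspecific


# Hypothetical divergences between the candidate species and three congeners.
divergences_to_congeners = [0.012, 0.045, 0.060]
mean_intraspecific = 0.003  # hypothetical

mean_div = mean(divergences_to_congeners)
min_div = min(divergences_to_congeners)

for label, value in [("mean", mean_div), ("smallest", min_div)]:
    verdict = (exceeds_fixed_threshold(value)
               and exceeds_10x_rule(value, mean_intraspecific))
    print(f"{label} divergence {value:.3f}: passes both rules = {verdict}")
```

With the same data, the candidate passes both rules on the mean divergence and fails both on the smallest one.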
57. Barcodes are very useful for
species discovery
• For poorly known groups DNA delimitation
can be a good starting point for species
discovery
• There are alternatives to an artificial 1, 2 or
3% sequence divergence as a threshold
• E.g. the GMYC (Generalized Mixed Yule
Coalescent) method (Pons et al., 2006)