Johannes Bergsten Dna Barcoding


Johannes Bergsten's lecture on Thursday, Sept 17, 2009, for the Biodiversity Informatics Course, a Swedish Taxonomy Initiative (Svenska Artprojektet) course at the Swedish Museum of Natural History, Stockholm, supported by the Swedish Species Service (ArtDatabanken) and the Swedish GBIF node.

Published in: Education, Technology


  1. 1. DNA Barcoding. Johannes Bergsten, Swedish Museum of Natural History, Department of Entomology. Biodiversity Informatics Course, 14-24 September, 2009, Swedish Museum of Natural History, Stockholm, Sweden. Image credit: Barcoding Institute of Ontario
  2. 2. How it all started in 2003: Hebert et al. proposed a CO1-based (~650 bp of the 5’ end) global identification system for animals, and showed the success (96.4-100%) of assigning test specimens to the correct phylum, order and species (Lepidoptera from Guelph) through a CO1 profile. 98% of congeneric species pairs in 11 animal phyla showed >2% sequence divergence in CO1.
  3. 3. What is DNA Barcoding? • A way of identifying samples to species based on a short standardised gene-region • Keywords: • Identify • Samples • Species • Gene • Short • Standardised
  4. 4. 2 main uses of DNA Barcoding • identify specimens – a global identification system • discover new species – aid and speed up the discovery of the remaining biodiversity
  5. 5. Why DNA Barcoding? The applications • Identification of all life stages: eggs, larvae, nymphs, pupae, adults • Identification of fragments or products of organisms • Identification of stomach contents, tracing ecological food chains • Identification of cryptic look-alike species • Food control • Customs control • Invasive species control • Disease vector control • Police • Agriculture • Forestry • Conservation • Education • Etc
  6. 6. Examples What is the fillet served on your plate, sold at a market or in a package? What are the eggs or molts in the ballast water of ships? Are they non-native invasive species?
  7. 7. Further examples Illegally traded bushmeat, shark fins, skins: do the products come from protected or banned-for-trade species?
  8. 8. Why DNA Barcoding? The biodiversity-taxonomy crisis • The Biodiversity crisis • We have yet to discover and describe maybe 90% of the biodiversity • Humans are responsible for a mass extinction that is going fast! • Traditional taxonomy is too slow! • Taxonomic expertise is vanishing and training new taxonomists is too expensive • Democratizing taxonomic knowledge
  9. 9. The crisis illustrated: This is where we stand today! Credit: David E. Schindel
  10. 10. Sequencing is getting cheap
  11. 11. The Vision Image credit: Barcoding Institute of Ontario
  12. 12. Criticism: “- Mum is this a grizzly bear or a black bear?” “- Well Johnnie why don’t you go poke your barcoder into it and find out.” (Cameron et al., Syst. Biol., 2006)
  13. 13. The Barcoding Movement • CBOL: a consortium of 200 member institutions/organizations from 50 countries that promote and standardize DNA Barcoding • iBOL: an alliance of 16 nations trying to get the big bucks to do the job.
  14. 14. The chosen gene for Metazoans • Cytochrome Oxidase subunit I • Mitochondrial • Easy to amplify • Relatively fast evolving Credit: iBOL
  15. 15. The chosen genes for plants Plastid genes rbcL and matK form a 2-locus plant barcode
  16. 16. What are you waiting for? Credit: iBOL
  17. 17. BOLD - project management
  18. 18. Projects
  19. 19. BOLD – identification engine
  20. 20. No match
  21. 21. Read Publication on BOLD
  22. 22. DNA Barcode standards • The standards include three components: 1) Creation of a reserved keyword (”BARCODE”). NCBI and its collaborators will add the BARCODE ’Flag’ to new submissions that meet the standards established in consultation with CBOL. Data records that meet these criteria will be known as BARCODE records in INSDC (BRIs);
  23. 23. Required data elements • 2) Required data elements. • These are meant to provide the user community with reliable, retrievable and verifiable information concerning the barcode sequence itself, the specimen from which it was obtained, and the species name that was applied by the submitter.
  24. 24. Data on the specimen • a) Include a link to a voucher specimen using a structured field* specified by CBOL and NCBI, and to the metadata associated with that specimen and contained in the public database of the voucher specimen’s repository. • b) Include a link to a documented species name found in one of the sources specified by CBOL and NCBI; • c) Include Country-Code, using the controlled vocabulary used by GenBank; *(institution|collection|item) e.g. NHRS:ENT-LEPI:AA008745
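The structured voucher field on this slide can be split into its three parts with a few lines of Python. This is only a minimal sketch, assuming the parts are colon-separated exactly as in the slide's example; `Voucher` and `parse_voucher` are hypothetical names for illustration and are not part of any CBOL, NCBI or GenBank tool.

```python
from typing import NamedTuple


class Voucher(NamedTuple):
    """Hypothetical container for the (institution|collection|item) triplet."""
    institution: str
    collection: str
    item: str


def parse_voucher(field: str) -> Voucher:
    """Split an 'institution:collection:item' string into its three parts."""
    parts = [p.strip() for p in field.split(":")]
    # Assumption: all three parts are present and non-empty.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected institution:collection:item, got {field!r}")
    return Voucher(*parts)


# The example identifier is the one given on the slide.
voucher = parse_voucher("NHRS:ENT-LEPI:AA008745")
print(voucher.institution, voucher.collection, voucher.item)
```

Keeping the three parts separate makes it straightforward to look up the specimen in the repository's own database, which is the purpose of requirement a).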
  25. 25. The Barcode region • d) Come from a gene region accepted by CBOL as an effective barcode. Initially, only cytochrome c oxidase 1 is approved as a barcode region, defined relative to the mouse mitochondrial genome as the 648 bp region that starts at position 58 and stops at position 705. • (For plants, matK and rbcL are expected to get the same status very soon) • CBOL has procedures for applying for other gene regions to be given barcode status
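The coordinates on this slide are consistent with a 648 bp region if both endpoints are counted: 705 - 58 + 1 = 648. The sketch below checks that arithmetic and shows how 1-based inclusive coordinates map onto a Python slice. The function names are hypothetical, and the reference string is a placeholder, not the real mouse sequence.

```python
BARCODE_START = 58   # 1-based, inclusive (from the slide)
BARCODE_END = 705    # 1-based, inclusive (from the slide)


def region_length(start: int = BARCODE_START, end: int = BARCODE_END) -> int:
    """Length of a region given 1-based inclusive coordinates."""
    return end - start + 1


def extract_region(reference: str, start: int = BARCODE_START,
                   end: int = BARCODE_END) -> str:
    """Return reference[start..end] using 1-based inclusive coordinates."""
    if not 1 <= start <= end <= len(reference):
        raise ValueError("coordinates fall outside the reference sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1 and end.
    return reference[start - 1:end]


placeholder_reference = "ACGT" * 200  # 800 placeholder bases, not real data
print(region_length())
print(len(extract_region(placeholder_reference)))
```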
  26. 26. Quality of sequence • e) Include at least 500 contiguous unambiguous base-pairs from bidirectional sequencing within the approved barcode region. However, if requested, GenBank could assign the BARCODE flag to records with shorter sequences • f) Include no more than 1% ambiguous sites for the entire submitted sequence; • g) Include the name of the gene region used; • h) Be associated with trace files submitted to the NCBI Trace Archive or the Ensembl Trace Server; • i) Include the sequences of all forward and reverse primers used. For records in which the contiguous sequence was assembled from more than one amplicon or when a cocktail of multiple primers was used for amplification, multiple sets of primer pairs must be provided. In addition, submission of the names of the forward and reverse primers with the primer sequences is strongly recommended.
  27. 27. Strongly recommended data elements • The following data elements have been added to the INSDC at CBOL’s request for validation of the voucher specimen, and will be strongly recommended but not required: • j) Latitude and longitude; • k) Name of the identifier; • l) Name of the collector; • m) Date of collection
  28. 28. Governance rules. • 3) Governance rules. The INSDC provides an archive of records that can only be changed by the submitter. In the case of BRIs, the following modifications are implemented: • CBOL can allow <500bp sequences to get barcode status (e.g. types, extinct spp.) • CBOL maintains a process by which alternative gene regions can attain barcode status • BRIs submitted via BOLD are jointly submitted by the researcher and BOLD and can be edited by both. • CBOL can recommend the BARCODE status to be removed from sequences submitted to INSDC by an individual researcher. • A system for attaching third-party comments, criticism and suggested corrections to BRIs will be installed.
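Criteria e) and f) are simple enough to check mechanically. The following is a rough sketch, not an official validator: it assumes that any character other than A, C, G or T counts as ambiguous, it does not verify bidirectional sequencing or that the sequence lies within the approved region, and all function names are hypothetical.

```python
import re


def longest_unambiguous_run(seq: str) -> int:
    """Length of the longest stretch made up only of A, C, G and T."""
    runs = re.findall(r"[ACGT]+", seq.upper())
    return max((len(run) for run in runs), default=0)


def ambiguous_fraction(seq: str) -> float:
    """Fraction of sites that are not A, C, G or T (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base not in "ACGT") / len(seq)


def meets_sequence_criteria(seq: str, min_run: int = 500,
                            max_ambiguous: float = 0.01) -> bool:
    """Check criterion e) (contiguous run) and f) (ambiguous sites)."""
    return (longest_unambiguous_run(seq) >= min_run
            and ambiguous_fraction(seq) <= max_ambiguous)


# Invented test sequence: 500 clean bases, 5 ambiguous, 100 clean.
example = "A" * 500 + "N" * 5 + "C" * 100
print(meets_sequence_criteria(example))
```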
  29. 29. Credit for slide: David E. Schindel
  30. 30. Voucher repository linkout from GenBank
  31. 31. Linkout from GenBank to taxonomy databases
  32. 32. BOLD linkout from GenBank
  33. 33. Trace archives
  34. 34. Recommended data elements
  35. 35. How to submit data
  36. 36. Will DNA Barcoding work? Image credit: Barcoding Institute of Ontario
  37. 37. Barcoding rests on the idea that the genetic distance between species is larger than the variation within species. [Figure: genetic distance axis, with "The Barcoding gap" and 1% marked]
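The idea on this slide can be made concrete by comparing the largest within-species distance to the smallest between-species distance. The sketch below assumes aligned sequences of equal length and uses the uncorrected p-distance, which is a simplification chosen for illustration (the slide does not say which distance measure is meant). The sequences are invented toy data and the function names are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    # Only compare sites where both sequences have an unambiguous base.
    sites = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def barcoding_gap(records):
    """records: list of (species, aligned_sequence) pairs.

    Returns (largest within-species distance, smallest between-species distance).
    """
    within, between = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        (within if sp1 == sp2 else between).append(p_distance(s1, s2))
    if not within or not between:
        raise ValueError("need both within- and between-species pairs")
    return max(within), min(between)


toy_records = [  # invented 10 bp sequences, far shorter than a real barcode
    ("sp_A", "ACGTACGTAC"),
    ("sp_A", "ACGTACGTAT"),
    ("sp_B", "TCGAACGGAC"),
    ("sp_B", "TCGAACGGAT"),
]
max_within, min_between = barcoding_gap(toy_records)
print("gap present:", min_between > max_within)
```

Comparing the extremes rather than the means is the stricter reading of the gap, since a single overlapping pair is enough to make identification ambiguous.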
  38. 38. Supporting data for the Barcoding Gap
      Columns: Organism | Distribution | Geographical sampling | Species | Sampled | Prop. | Ind./sp. | Intrasp. var. | Intersp. div. | Id. success | Paper
      Spiders | World | Local (Canada) | 40,000 | 168 | 0.004 | 2 3 | 1.40% | 16.40% | 100% | Barrett & Hebert (2005)
      Birds | World | Regional (N. Am.) | 9,000 | 260 | 0.028 | 2 | 0.43% | 7.93% | 100% | Hebert et al (2004)
      Lepidoptera, 3 superfamilies | World | Local (Guelph) | 91,700 | 200 | 0.002 | 2 1.7 | 0.25% | 6.80% | 100% | Hebert et al (2003)
      Mayflies | World | Regional (N. Am.) | 2,500 | 80 | 0.032 | 1.9 | 1.10% | 18.10% | 99.00% | Ball et al (2005)
      Intraspecific variation and interspecific divergence differ by more than an order of magnitude = Barcoding Gap
      Critique: Well sampled?
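The "order of magnitude" claim and the slide's own critique can both be checked from the numbers in this table. A small sketch, using only the figures listed on the slide; the dictionary layout and function names are assumptions made for illustration.

```python
# organism: (species in group, species sampled, intrasp. var. %, intersp. div. %)
TABLE = {
    "Spiders": (40000, 168, 1.40, 16.40),
    "Birds": (9000, 260, 0.43, 7.93),
    "Lepidoptera": (91700, 200, 0.25, 6.80),
    "Mayflies": (2500, 80, 1.10, 18.10),
}


def divergence_ratio(intra_pct: float, inter_pct: float) -> float:
    """How many times larger the interspecific divergence is."""
    return inter_pct / intra_pct


def proportion_sampled(total_species: int, sampled: int) -> float:
    """Share of the group's species that the study included."""
    return sampled / total_species


for organism, (total, sampled, intra, inter) in TABLE.items():
    print(f"{organism}: ratio {divergence_ratio(intra, inter):.1f}x, "
          f"sampled {proportion_sampled(total, sampled):.3f} of species")
```

Every ratio exceeds 10, which matches the slide's statement, while no study sampled more than about 3% of the species in its group, which is what the "Well sampled?" critique points at.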
  39. 39. Sisterspecies vs congeners • Panthera leo (lion), Panthera tigris (tiger) • Motacilla flava (yellow wagtail), Motacilla alba (white wagtail) • Carabus nitens (Swedish: guldlöpare), Carabus coriaceus (Swedish: läderlöpare) • Salix herbacea (dwarf willow), Salix caprea (goat willow) • Agabus elongatus, A. congener, A. lapponicus, A. thomsoni, A. moestus, A. levanderi, A. clypealis, A. pseudoclypealis • Sylvia minula (desert whitethroat), Sylvia curruca (lesser whitethroat) • Eupeodes luniger, Eupeodes latilunulatus • Carex rostrata (bottle sedge), Carex vesicaria (bladder sedge) • Pipistrellus pipistrellus (common pipistrelle), Pipistrellus pygmaeus (soprano pipistrelle)
  40. 40. Overlap in cowries Meyer and Paulay, PLoS Biology (2006)
  41. 41. Overlap: the reality
  42. 42. How DNA barcodes should not be used • “It is expected that DNA barcodes will contribute to the discovery and formal recognition of new species. However, DNA barcodes should not be used as the sole criterion for description of new species, which instead require analysis of diverse data, including morphology, ecology, and behavior, as well as genetics.” From draft conference report: Taxonomy, DNA, and the Barcode of Life, 2003
  43. 43. How not to be used • “We were interested to see whether Xus exemplaris would be considered a species under standard DNA barcoding protocol” • “Using the DNA Barcoding protocol…..therefore under a 3% threshold and a 10x mean intraspecific threshold Xus exemplaris would be considered a good species.” • “However if we use the smallest among-species divergence as recommended by Meier et al (2008) Xus exemplaris would not be considered a good species under the protocol.”
  44. 44. Barcodes are very useful for species discovery • For poorly known groups DNA delimitation can be a good starting point for species discovery • There are alternatives to an artificial 1, 2 or 3% sequence divergence as a threshold • E.g. the Generalized Mixed Yule Coalescent (GMYC) method (Pons et al, 2006)
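The quoted reasoning shows how fragile a purely threshold-based verdict is: the same data can pass or fail depending on whether the mean or the smallest among-species divergence is used. The sketch below reproduces that effect with invented numbers. It illustrates the pitfall the slide warns about and is not a recommended protocol; all names and values are hypothetical.

```python
def passes_thresholds(divergence: float, mean_intraspecific: float,
                      fixed_threshold: float = 0.03,
                      factor: float = 10.0) -> bool:
    """True if divergence exceeds both the 3% and the 10x-intraspecific rule."""
    return (divergence > fixed_threshold
            and divergence > factor * mean_intraspecific)


def verdicts(divergences_to_other_species, mean_intraspecific):
    """Return the verdict using the mean and using the smallest divergence."""
    mean_div = sum(divergences_to_other_species) / len(divergences_to_other_species)
    min_div = min(divergences_to_other_species)
    return (passes_thresholds(mean_div, mean_intraspecific),
            passes_thresholds(min_div, mean_intraspecific))


# Invented values: divergences from the focal species to three congeners.
toy_divergences = [0.02, 0.05, 0.08]
toy_mean_intraspecific = 0.004
by_mean, by_smallest = verdicts(toy_divergences, toy_mean_intraspecific)
print("good species by mean divergence:", by_mean)
print("good species by smallest divergence:", by_smallest)
```

With these toy values the focal species passes when the mean is used and fails when the smallest divergence is used, which mirrors the contradiction in the quoted passage.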
  45. 45. GMYC model (Pons et al, 2006) [Figure: tree with Aulonogyrus cristatus, Aulonogyrus goudoti, Gyrinus madagascariensis, Gyrinus ignitus, Dineutes subspinosus, Dineutes sinuosipennis, Dineutes proximus, Orectogyrus cyanicollis, Orectogyrus pallidocinctus, Orectogyrus vestitus and Orectogyrus sedilloti; localities Andasibe, Ranomafana, Mont. D’Ambre and Antsabe; likelihood plot, P<0.01]
  46. 46. Large inventories of the unknown