The Encyclopedia of Life: How realistic is it?



Discussion seminar for the ENTO681 course.

Starting points were:
Wilson, 2003
Mallet & Willmott, 2003
Godfray, 2002
Doctorow, 2001

Author: Ana Dal Molin

***This is shared under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License***



  1. The Encyclopedia of Life: How realistic is it? Ana Dal Molin, ENTO681 Seminar, Texas A&M University, 23 Feb 2009. This work is licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. To view a copy of this license, visit or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
  2. 18(2) 2003
  3. Why?
  4. “Imagine an electronic page for each species of organism on Earth, available everywhere by single access on command.” (Edward O. Wilson)
  5. E. O. Wilson’s idea
      • Entries with genome, proteome, morphology, geographical distribution, habitat, phylogenetic position, ecological relationships and practical importance
      • Communicate with other DBs
      • Content peer-reviewed
      • Taxonomy is underfunded for the size of the enterprise, and there are too few taxonomists
      • E-types
      • “accelerate as traditional taxonomic procedures (…) are replaced by high-resolution digital photography, nucleic acid sequencing and Internet publications”
      • Three overlapping phases:
          • The Catalog of Life (collaborative effort of sp2000, ITIS, CBD and GBIF)
          • Inventories (All Species Foundation)
          • Expand the EOL over the Catalog of Life
  6. Just a matter of organizing existing information? [Diagram labels: species, images, general information, description, genetics, museums, classif., literature, copyright, format, IUCN Red List, BHL]
  7. “Many Internet taxonomy initiatives exist, perhaps too many” (J. Mallet & K. Willmott)
  8. [Figure annotations:]
      • ~483,000 names (Jan 2009)
      • 1.1 million names (includes LSIDs) (Dec 2008)
      • Compiles several databases, including ITIS, GBIF, sp2000, CBD
      • Redundancy of tools?
      • Focus on searches
      • From the Rio ’92 Earth Summit (UN)
      • Several databases (separate programs)
      • “iSpecies is a test of E O Wilson's idea of a web page for each species”
      • Results from independent initiatives that use specific software: site inactive!
      • 10,207 titles; 10,000,000 pages (Nov 2008)
      • ~171,400,000 occurrence records (v. 1.2.3)
  9. Overlap among multiple initiatives continues for:
      • Keys
      • Regional inventories / faunistic databases
      • Taxon-specific information
      • Museum-specific information (types, holdings)
      • Literature databases
      • Catalogs
      • Tools
      • Etc.
  10. Are we lacking funds?
      • NSF: Biodiversity Surveys and Inventories (BS&I), including support for Planetary Biodiversity Inventories (PBI): Mission to an (almost) unknown planet
      • NSF: PEET
      • All Species Foundation Summit (Harvard, 2001)
      • Earth Summit (CBD, 1992; Rio+10, 2002)
      “Important people jet frequently to international biodiversity conferences in expensive locales, while few improvements in taxonomy are yet evident” (Mallet & Willmott)
      [Figure: C. Hine’s copy of the House of Lords report “What on Earth”, with flags marking mentions of information and communication technologies (from “Systematics as Cyberscience”, MIT, 2008)]
  11. Mallet & Willmott’s points
      • Biologists need to seek consensus
      • Do not fragment information
      • Unitary taxonomy, DNA taxonomy and the PhyloCode all argue that existing rules of nomenclature are inadequate / inefficient
      • Is it sensible to add another requirement to the already slow process of describing new taxa?
      • ICBN and ICZN rejected central registries in 1999
      • The taxonomic impediment exists
          • Not for lack of money
          • Not for lack of purpose
          • Not for lack of structure
          • For lack of basic work
  12. “(…) a unitary organization (…) and web taxonomy should replace printed taxonomy”
      “Taxonomists lack goals that are both realistic and relevant.” (C. J. Godfray)
      Int J Syst Evol Microbiol + LPSN; a.k.a. Index Kewensis
  13. Dreams of consumption: GenBank. GenBank is often cited as a model of what taxonomists should be doing…
  14. However, it is neither an exclusive/central resource nor free from redundancy with other DBs. Solution: synchronization. “Taxonomic information could become much more unitary even under existing codes. GenBank and EMBL did not become primary sources of DNA sequence information by decree.” (Mallet & Willmott)
  15. Dreams of consumption: PubMed
  16. Is this possible? [Diagram components: Metadata, Data, Metadata repository, Name Index, Occurrence Index, Yellow Pages, Regional Atlas, Annotation Tools, Biosecurity Portal, Analysis Tools, Products] Source: LaSalle, 2008. Atlas of Living Australia, ICE2008 presentation.
  17.
  18. Cory Doctorow’s obstacles to reliable metadata:
      • People lie
      • People are lazy
      • People are stupid
      • Mission Impossible: know thyself
      • Schemas aren't neutral
      • Metrics influence results
      • There's more than one way to describe something
  19. The fragility of metadata is an important concern because technologies such as the semantic web rely on data-markup conventions becoming widely adopted and used with care, which, according to Doctorow, will not and cannot happen.
      • Ex. AY281248 - Australia: Gubbata, NSW (GPS: 33 38' 07'', 146 33' 12''
      • GenBank instructions: degrees latitude and longitude in format "d[d.dd] N|S d[dd.dd] W|E"
      • Translating (examples from Page, R.): -33° 38' 7.08", +146° 33' 10.80" IS in Australia
      • Ex. DQ502492 - Nicaragua: Rio San Juan, Near Isla de Diamante (ca. 15 km SE El Castillo on Rio San Juan), 10deg56'N
      • Ex. DQ226041 - /lat_lon="6 28.06'N; 58 37.16'W"
  20. Current criticisms of such initiatives
      • Difficulty of inventorying everything (Wilson)
      • Incongruent species concepts across taxa (Wilson)
      • Quality control (Wilson)
      • Information overload (Wilson)
      • Lack of cooperation: competing proposals, organizations and websites abound (Mallet & Willmott)
      • They have had no significant impact on the taxonomic process (Mallet & Willmott)
      • Metadata are not reliable (Doctorow)
      • To that, add:
      • Enabling people to obtain LSIDs (or whichever identifier is required)
      • Getting people to use LSIDs (or whichever identifier is required)
      • Making tools communicate
      • Recently, even the format of such central encyclopedias is in question: the proposal that they should be “wikis”
  21. The biodiversity information pipeline
      • The capacity to deliver biodiversity information
      • How we are inputting biodiversity information
      Source: LaSalle, 2008. Overcoming the taxonomic impediment. ICE2008 presentation.
  22. Questions
      1. How realistic is it to have a web page for every species, including an image database that can ultimately be used in fingerprint-like fashion?
      2. What exactly are the objectives behind the EOL, GBIF, and the other initiatives? Are they in fact overlapping?
      3. Is this collaboration, or is it:
          3a. an unnecessary split of resources?
          3b. adding to the mess of linked data without actual information?
      4. Can we learn from the example of other areas? Is our situation that different from astronomy or molecular databases, for example?
      5. Do we need to change the way taxonomy is being done?
      6. Do we need to change the way we deliver information? What are we doing wrong?
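The GenBank coordinate examples quoted under Doctorow's metadata critique (AY281248, DQ502492, DQ226041) store one field in three notations. A minimal Python sketch of what normalizing them involves is given here. It assumes only the notations that appear in those examples (degrees with optional minutes and seconds, an optional +/- sign, an optional N/S/E/W letter) and targets the "d[d.dd] N|S d[dd.dd] W|E" pattern quoted from the GenBank instructions, with two decimals. The function names are hypothetical, not part of any GenBank tool, and the parser deliberately refuses to guess when a hemisphere or the longitude is missing.

```python
import re

# Hypothetical helper for illustration only; it assumes just the notations seen
# in the AY281248, DQ502492 and DQ226041 examples. One coordinate component is:
# optional sign, degrees, optional minutes, optional seconds, optional hemisphere.
# "deg" or "°" may follow the degrees, and either '' or " may mark seconds.
COMPONENT = re.compile(
    r"""(?P<sign>[-+])?
        (?P<deg>\d+(?:\.\d+)?)\s*(?:deg|°)?\s*
        (?:(?P<min>\d+(?:\.\d+)?)\s*'\s*)?
        (?:(?P<sec>\d+(?:\.\d+)?)\s*(?:''|")\s*)?
        (?P<hemi>[NSEW])?""",
    re.VERBOSE,
)


def parse_components(text):
    """Return (decimal_degrees, hemisphere_letter_or_None, sign_or_None) per component."""
    parts = []
    for m in COMPONENT.finditer(text):
        value = float(m["deg"]) + float(m["min"] or 0) / 60 + float(m["sec"] or 0) / 3600
        parts.append((value, m["hemi"], m["sign"]))
    return parts


def _hemisphere(letter, sign, positive, negative):
    """Prefer an explicit letter; otherwise derive one from a +/- sign; else None."""
    if letter:
        return letter
    if sign == "+":
        return positive
    if sign == "-":
        return negative
    return None


def to_genbank_lat_lon(text):
    """Rewrite a free-text coordinate as 'd.dd N|S d.dd W|E'.

    Raises ValueError instead of guessing when the longitude or a hemisphere is missing.
    """
    parts = parse_components(text)
    if len(parts) != 2:
        raise ValueError(f"expected latitude and longitude, found {len(parts)} value(s)")
    (lat, lat_letter, lat_sign), (lon, lon_letter, lon_sign) = parts
    lat_h = _hemisphere(lat_letter, lat_sign, "N", "S")
    lon_h = _hemisphere(lon_letter, lon_sign, "E", "W")
    if lat_h not in ("N", "S") or lon_h not in ("E", "W"):
        raise ValueError(f"missing or misplaced hemisphere in {text!r}")
    return f"{lat:.2f} {lat_h} {lon:.2f} {lon_h}"


if __name__ == "__main__":
    for raw in ["6 28.06'N; 58 37.16'W",
                "-33° 38' 7.08\", +146° 33' 10.80\"",
                "33 38' 07'', 146 33' 12''",
                "10deg56'N"]:
        try:
            print(f"{raw!r} -> {to_genbank_lat_lon(raw)}")
        except ValueError as err:
            print(f"{raw!r} -> rejected: {err}")
```

On these inputs, the DQ226041 string becomes "6.47 N 58.62 W" and Page's signed translation of AY281248 becomes "33.64 S 146.55 E", whereas the original AY281248 text (no hemisphere letters) and the DQ502492 text (latitude only) are rejected. A parser that silently defaulted missing letters to N and E would instead place the Gubbata record in the North Pacific, which is exactly the kind of quiet metadata error Doctorow warns about.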