iEvoBio Keynote: Frontiers of discovery with Encyclopedia of Life -- TRAITBANK

Talk presented at the iEvoBio 2014 conference in Raleigh, North Carolina. Although it shares a similar title and some overlap with the talk I posted last week, it includes new material geared especially toward an informatics audience familiar with the tools and technology.

  1. Frontiers of discovery with Encyclopedia of Life: TraitBank research and other case studies. Cyndy Parr, Smithsonian Institution, National Museum of Natural History. parrc@si.edu, @cydparr, http://www.slideshare.net/csparr
  2. Central challenges: • What are all the organisms on the planet? • What do we know about them? • How can we build new knowledge about them?
  3. GenBank: 60 million DNA sequence records, 900,000 species, 4,000 genomes. How are these related to traits?
  4. Phenomes: the next frontier. In Phenoscape, 57 publications had 565,158 anatomical trait descriptions for 2,527 kinds of organisms, about 224 traits per organism. In ZFIN, 38,189 trait descriptions for 4,727 zebrafish genes. With 1.9 million species on the planet, that means LOTS OF TRAITS.
  5. Outline: • How EOL is different • How EOL gets used • Introducing TraitBank • Loading up TraitBank • EOL & TraitBank in research • Future of TraitBank
  6. eol.org
  7. How EOL is different: third-party applications
  8. How EOL gets used: http://www.notesfromnature.org/
  9. http://www.onezoom.org/, http://yanwong.me/
  10. PhyloTiler: http://viburnum.peabody.yale.edu/~piel/Tree_4_color/. Links and images... but what about research?
  11. Search groups for “EOL papers” at Mendeley.com
  12. Anatolia Zooarchaeology Case Study, led by the Alexandria Archive Institute: 1. 14 different sites; 2. 34+ zooarchaeologists; 3. Decoding, cleanup, and metadata documentation; 4. 220,000+ specimens; 5. 450 entities linked to 143 EOL taxon concepts; 6. Anatomical entities linked to Uberon.org; 7. Biometrics linked to a measurement ontology; 8. Collaborative analysis. http://opencontext.org/. Kansa, E., Kansa, S. W., & Arbuckle, B. (2014). Publishing and pushing: Mixing models for communicating research data in archaeology. International Journal of Digital Curation, 9.
  13. BioNames.org (Rod Page). Page, R. D. M. (2013). BioNames: linking taxonomy, texts, and trees. PeerJ, 1, e190. doi:10.7717/peerj.190
  14. But can we do more? Introducing TraitBank
  15. TraitBank structured data: data summaries on EOL taxon pages, search & download, data sources, and a JSON-LD API. Example questions: Which plants grow well in acidic soil? What do water bears eat? What is the biggest species of whale?
  16. TraitBank, released January 2014 (http://eol.org/traitbank): • Numeric data (measurements) • Categorical data (controlled vocabulary) • Species interactions • Mostly summaries for populations and species • Individual specimens • Higher taxa
  17. TraitBank: quick facts
  18. TraitBank: Data tab
  19. TraitBank: metadata
  20. TraitBank: search & download
  21. TraitBank: search & download
  22. TraitBank: data glossary, http://eol.org/data_glossary
  23. Download
  24. TraitBank: data model
  25. TraitBank: uploading Darwin Core Archives (Common names | Taxa | References | MeasurementsOrFacts | Associations | Events | Occurrences)
  26. Term URIs from existing ontologies (bioportal.bioontologies.org):
      Subject area | Ontology | Example terms
      Statistics | Semanticscience Integrated Ontology (SIO) | mean, minimal value, standard deviation
      Units of measure | Units of Measurement Ontology (UO) | meter, years, degree Celsius
      Habitat information | Environment Ontology (EnvO) | wetland, desert, snow field
      Attributes of organisms | Phenotype Quality Ontology (PATO) | aerobic, conical, evergreen
      Plant attributes | Plant Trait Ontology | flower color, life cycle habit, salt tolerance
      Animal attributes | Vertebrate Trait Ontology | body mass, total life span, onset of fertility
      Animal natural history | Animal Natural History and Life History Ontology (ETHAN) | nocturnal, oviparous, scavenger
  27. Term URIs from existing ontologies: • Where necessary, request new terms • Last resort: create provisional terms with http://eol.org/schema/terms/xxxx • Still to do: create “equivalentTo” or “similarTo” relations; more sophisticated inference
  28. JSON-LD, e.g. http://eol.org/api/traits/1045608?cache_ttl=2419200 • Google Knowledge Graph
  29. Loading up TraitBank: data sources. Sources include: • Databases (OBIS, AnAge, PaleoDB, Phenoscape) • Literature (Dryad, Ecological Archives, data tables) • Natural history collections (label data) • Legacy/unpublished data
  30. TraitBank: ~7 million records, 326 traits, 1.2 million taxa, 40+ datasets. http://eol.org/collections/97700
  31. Text mining: Environments-EOL. Evangelos Pafilis, Hellenic Centre for Marine Research (HCMR), Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Crete, Greece. 491,616 habitat terms for 136,548 taxa
  32. Text mining: automated annotation and manual annotation
  33. Morphological data from the NMNH catalog (Abi Nishimura). Project: clean up morphological data from the NMNH KE EMu catalog and publish it to TraitBank. Goal: make this valuable morphological data easier to access and analyze. Raw data from a database search for the spectral tarsier, Tarsius tarsier (image: Sakurai Midori, http://eol.org/data_objects/26918624)
  34. Results: • Primate data published (320 taxa) • Comprehensive mammal data to be published soon (4,662 taxa) • Bird catalog currently being mined (image: Wan Hong, http://eol.org/data_objects/29203274)
  35. Mineralization of tissue in marine organisms (Jen Hammock with Steve Cairns), for modeling the impacts of ocean acidification: 143,000 records for 119,000 species and subspecies of micro- and macroalgae, cnidarians, polychaetes, bryozoans, brachiopods, sponges, mollusks, echinoderms, and arthropods. Mineralized tissue = • Biogenic silica • Calcium carbonate (calcite and/or aragonite)
  36. EOL & TraitBank research by the 2013-14 EOL Rubenstein Fellows: 1. EnvO habitat terms (Pafilis et al.) 2. Altitude specificity of flower coloration (Wright & Seltmann) 3. Morphological impacts of extinction risk in fish (Chang) 4. Butterfly-host plant associations (Ferrer-Paris et al.) 5. Taxon Tree Tool (Lin) 6. Global Biotic Interactions (GLoBI; Poelen, Mungall et al.), http://www.globalbioticinteractions.org/ 7. Reol: an R interface for EOL (Banbury & O’Meara). Banbury, B. L., & O’Meara, B. C. (2014). Ecology and Evolution, 4(12). doi:10.1002/ece3.1109
  37. Crowdsourcing: Jonathan Chang, UCLA, http://jonathanchang.org/, using Amazon Mechanical Turk
  38. EOL-BHL Research Sprint
  39. NESCent-EOL-BHL Research Sprint projects: 1. Character displacement across the Tree of Life 2. Illuminating the Dark Parts of the Tree of Life 3. Evolution in the usage of anatomical concepts in the biodiversity literature 4. Planning for global change: using species interactions in conservation 5. No place like home: defining “habitat” for biodiversity science 6. Assessing the risk status of Mexican amphibians 7. Quantifying color from digital imagery: color may determine species’ responses to habitat edges and to climate change 8. More is less: identifying global trends in species’ niche width 9. Identifying key species traits associated with climate change vulnerability
  40. Quantifying color from digital imagery (Elise Larsen, Yan Wong): 1. Automate processing of almost 300k images (of EOL’s 2.4 million) 2. Identify pinned specimen images 3. Process these for color and pattern information 4. Put this information into TraitBank
  41. Illuminating the Dark Parts of the Tree of Life (Jessica Oswald, Karen Cranston, Gordon Burleigh, Cyndy Parr): 1. Query EOL, GBIF, and GenBank for the number of records 2. Create a score for the amount of information available 3. Map the score onto a phylogeny
  42. Global Genome Initiative Data Portal. For every family: • Use TraitBank to assemble counts of records in repositories • Compute a score (percentile) to assess the knowledge available relative to other families • Make it easy to browse and find families that require effort. Beta launch at the end of June
  43. TraitBank future plans: • Decorate trees with traits • NSF Genealogy of Life • NSF Big Data • NSF ABI Isotopes and Interactions • Microsoft/WCMC Global Ecosystem Models
  44. Leveraging social networks. Ahn, J., et al. (2012). Visually exploring social participation in Encyclopedia of Life. In 2012 International Conference on Social Informatics (pp. 149–156). IEEE. Rotman, D., et al. (2014). Motivations affecting initial and long-term participation in citizen science projects in three countries. In iConference 2014 Proceedings (pp. 110–124). http://biotracker.umd.edu • A motivation model for citizen scientists • International attitudes of scientists and citizens toward working together • Factors that increase curation network activity • Currently working on the motivations of EOL content partners
  45. Annotation of a specimen record: ovary size and reproductive state, age markers, fat status, body mass and other size attributes
  46. Annotation of an observation record
  47. For more information: • See & cite Parr et al. 2014, Biodiversity Data Journal • See our TraitBank paper (in review), http://www.semantic-web-journal.net/content/traitbank-practical • Open source code: https://github.com/EOL/ • APIs at http://eol.org/api • Become an EOL curator
  48. Take-home messages: • EOL can be useful for research • TraitBank is already awesome • Mutualism between collections, EOL, and citizen science • Let’s collaborate
  49. Acknowledgments: Atlas of Living Australia • Biodiversity Heritage Library Consortium • Chinese Academy of Sciences • La Comisión Nacional para el Conocimiento y Uso de la Biodiversidad (CONABIO) • The Field Museum • Harvard University • El Instituto Nacional de Biodiversidad (INBio) • Marine Biological Laboratory • Missouri Botanical Garden • Muséum national d’Histoire naturelle • Naturalis Netherlands • New Library of Alexandria • Smithsonian Institution • South African National Biodiversity Institute • All of our content providers and curators. Steve Cairns • John Keltner • Katie Barker • Jonathan Coddington • Sean Brady • Tom Orrell • Chris Meyers • Yan Wong • Jon Norenburg • Torsten Dikow • Yurong He • Jenny Preece and others on the BioTracker team • Pensoft Publishing • EOL Science Advisory Board. Katja Schulz, Jen Hammock, Marie Studer, Jeff Holmes, Nathan Wilson, Patrick Leary, Jeremy Rice, Lisa Walley, Bob Corrigan, Erick Mata, Dmitry Mozzherin, Abi Nishimura • Sarah Miller • Anthony Goddard, Mark Westneat and former BioSynC staff. http://eol.org, @cydparr, parrc@si.edu. Major funding for TraitBank was provided by the Alfred P. Sloan Foundation; the Fellows program was supported by Daniel M. Rubenstein, and the Research Sprint by the Richard Lounsbery Foundation.
  50. Challenges: 1. Terms are not in any existing ontology, e.g., seawater oxygen saturation, eutrophic pond, north-facing bluff 2. Synonyms are not included, e.g., vernal pond/intermittent pond 3. Standard classifications should be mapped, e.g., NatureServe, NOAA 4. Environment estimates vs. well-documented niche parameters, e.g., text mining results vs. NatureServe habitats, OBIS data vs. niche analyses
  51. GLoBI (Jorrit Poelen, Chris Mungall, James Simon; GoMexSI), http://globalbioticinteractions.wordpress.com/: 14 datasets with 25k taxa and 422k interactions for 3k locations • Alpha version of ingestion, normalization, and aggregation • Alpha version of the web API • Alpha version of data exports
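
Slides 25 to 27 describe uploading Darwin Core Archives whose measurement terms are URIs from existing ontologies, with provisional terms only as a last resort. The Python below is a minimal sketch of that lookup-then-fallback policy applied to a simplified, tab-delimited measurement table. The column set, the example.org URIs, and the helper names (term_uri, measurement_rows, to_tsv) are hypothetical placeholders, not EOL's actual archive schema or term identifiers.

```python
import csv
import io

# Hypothetical vocabulary: label -> term URI. In practice these would come from
# ontologies such as VT or UO; the example.org URIs are placeholders only.
KNOWN_TERMS = {
    "body mass": "http://example.org/vt/body_mass",
    "kilogram": "http://example.org/uo/kilogram",
}

# Placeholder namespace standing in for a provisional-term namespace (hypothetical).
PROVISIONAL_NS = "http://example.org/provisional/"

# Illustrative input: (taxon ID, trait label, value, unit label or "").
RECORDS = [
    ("taxon-1", "body mass", "12.5", "kilogram"),
    ("taxon-2", "seawater oxygen saturation", "85", ""),
]


def term_uri(label: str) -> str:
    """Return an existing term URI, or mint a provisional one as a last resort."""
    key = label.strip().lower()
    if key in KNOWN_TERMS:
        return KNOWN_TERMS[key]
    return PROVISIONAL_NS + key.replace(" ", "_")


def measurement_rows(records):
    """Yield simplified measurement rows keyed by Darwin Core term names."""
    for taxon_id, trait, value, unit in records:
        yield {
            "taxonID": taxon_id,
            "measurementType": term_uri(trait),
            "measurementValue": value,
            "measurementUnit": term_uri(unit) if unit else "",
        }


def to_tsv(records) -> str:
    """Serialize the rows as a tab-delimited table with a header line."""
    fields = ["taxonID", "measurementType", "measurementValue", "measurementUnit"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(measurement_rows(records))
    return buf.getvalue()


if __name__ == "__main__":
    print(to_tsv(RECORDS))
```

The second record shows the fallback: "seawater oxygen saturation", one of the terms listed on slide 50 as missing from existing ontologies, receives a provisional URI instead of failing the load.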
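
Slide 28 mentions serving TraitBank records as JSON-LD. As a rough illustration of what JSON-LD adds, the sketch below uses a document's @context to expand prefixed keys such as dwc:measurementType into full term URIs. The sample document is hypothetical and does not reproduce the real response format of the EOL traits API. The code handles only prefix-style compact IRIs and is not a conforming JSON-LD processor: for example, it does not wrap values in arrays or @value objects as full expansion does. A complete implementation such as PyLD would be the practical choice.

```python
import json

# Hypothetical, simplified JSON-LD trait record (not the real EOL API response format).
SAMPLE = """
{
  "@context": {
    "dwc": "http://rs.tdwg.org/dwc/terms/",
    "ex": "http://example.org/terms/"
  },
  "@id": "ex:trait-1",
  "dwc:taxonID": "taxon-1",
  "dwc:measurementType": {"@id": "ex:body_mass"},
  "dwc:measurementValue": 12.5
}
"""


def expand_iri(value, prefixes):
    """Expand a compact IRI 'prefix:suffix' if its prefix is declared in the context."""
    if isinstance(value, str) and ":" in value:
        prefix, suffix = value.split(":", 1)
        # A suffix starting with '//' means an absolute IRI (e.g. http://...), not a compact one.
        if prefix in prefixes and not suffix.startswith("//"):
            return prefixes[prefix] + suffix
    return value


def expand_keys(doc):
    """Tiny subset of JSON-LD expansion: prefixed keys, @id values and {"@id": ...} objects."""
    prefixes = {k: v for k, v in doc.get("@context", {}).items() if isinstance(v, str)}
    expanded = {}
    for key, value in doc.items():
        if key == "@context":
            continue
        if key == "@id":
            value = expand_iri(value, prefixes)
        elif isinstance(value, dict) and "@id" in value:
            value = {"@id": expand_iri(value["@id"], prefixes)}
        # Plain string values stay literals, as in JSON-LD without type coercion.
        expanded[expand_iri(key, prefixes)] = value
    return expanded


def parse_trait(text):
    """Load a JSON-LD string and expand its compact IRIs."""
    return expand_keys(json.loads(text))


if __name__ == "__main__":
    for key, value in parse_trait(SAMPLE).items():
        print(key, "=>", value)
```

Once keys are full URIs, records from different sources that share a term (here Darwin Core measurementType) can be merged without agreeing on local column names.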
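
Slides 41 and 42 describe counting records per family across repositories and turning the counts into a percentile score relative to other families. The slides do not give the exact formula, so the sketch below assumes the simplest version: sum each family's record counts across sources, then report the percentage of families whose total is at or below its own. The family names, counts, and the 25th-percentile cutoff in least_known are made-up placeholders.

```python
from bisect import bisect_right

# Hypothetical record counts per family and repository (illustrative numbers only).
COUNTS = {
    "family_a": {"EOL": 2, "GBIF": 5, "GenBank": 3},
    "family_b": {"EOL": 5, "GBIF": 10, "GenBank": 5},
    "family_c": {"EOL": 10, "GBIF": 15, "GenBank": 5},
    "family_d": {"EOL": 10, "GBIF": 20, "GenBank": 10},
}


def total_records(counts_by_repo: dict[str, int]) -> int:
    """Sum a family's record counts across all repositories."""
    return sum(counts_by_repo.values())


def percentile_scores(counts: dict[str, dict[str, int]]) -> dict[str, float]:
    """Percentile rank of each family's total: % of families with a total <= its own."""
    totals = {family: total_records(c) for family, c in counts.items()}
    ordered = sorted(totals.values())
    n = len(ordered)
    # bisect_right gives how many totals in the sorted list are <= t.
    return {family: 100.0 * bisect_right(ordered, t) / n for family, t in totals.items()}


def least_known(counts, threshold: float = 25.0) -> list[str]:
    """Families at or below the threshold percentile, lowest score first."""
    scores = percentile_scores(counts)
    return sorted((f for f, s in scores.items() if s <= threshold), key=scores.get)


if __name__ == "__main__":
    print(percentile_scores(COUNTS))
    print(least_known(COUNTS))
```

Summing raw counts lets the largest repository dominate the score; computing a percentile per repository and then averaging would be one alternative if sources differ greatly in size.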
