Behavior ontology workshop princeton


Presented at 2014 Animal Behavior Ontology workshop at Princeton, NJ in association with the Animal Behavior Society annual meeting.

Published in: Science
  • Also the work of Katja Schulz and Jen Hammock, and 5 contractors
  • Cost model….
    Up to a year you can restrict access
  • Before I talk about TraitBank, let’s think a little bit about GenBank. We are in the midst of a genomics revolution and most people would agree this has been a major advance in our ability to construct new knowledge about organisms. The cost to generate a full genome sequence is dropping more or less daily.

    What is all this genetic information DOING? How does it relate to what we can see and measure about organisms, their phenotypes, or their traits? How does DNA interact with the environment to result in both normal and abnormal development? How did it evolve? How fast do DNA changes make a difference in the lives of organisms?

  • Last year I did some calculations. These may be a bit out of date but should still work for scale. Phenoscape is a database that is looking at anatomical traits in fishes. Looking just at 57 publications they have more than 500K descriptions for 2500 kinds of organisms.

    ZFIN is a model organism database for zebrafish, a common model organism for developmental biologists. In just this one species they have captured nearly 40,000 traits, and that is for ONE very well-studied SPECIES.

    Lots of traits AND LOTS OF WAYS TO DESCRIBE THEM, scattered across stovepiped projects, formats, and vocabularies.
  • This January we released the first version of our TraitBank platform. If you want to find TraitBank you can go to this URL or look in the footer of every EOL page.
    Things like body mass, metabolic rate, lifespan, number of offspring per year
    Things like “habitat” keywords, vine/shrub/tree, “is it invasive in a place”
  • TraitBank data are ingested, standardized as much as possible using ontologies, and managed as triples in a Virtuoso triple store. The trait information for each taxon is displayed in a Data tab on EOL taxon pages. There is search and download and a JSON-LD interface.
  • TraitBank currently holds data from X different data sets which provide about 7 million records for 326 traits across over 300,000 taxa.
    Here’s an overview of all the traits we have, weighted by the number of records represented in our triple store.

    TraitBank covers all kinds of traits, including geographic distribution, morphology, ecology, and life history, but we currently have a focus on marine data and data of use to conservation science.

  • You can also go to almost any page on EOL with content and you will see a quick facts box on the overview tab. This is a selection of traits. These particular traits, things like growth habit, invasive listings, and habitat keywords, are all controlled vocabulary terms. If you go to “See all” or to the Data tab you’ll see…

  • We also have numeric data with units and life stage identifiers. If you hover over a term you can get its definition. Different sources may provide the same or similar data (PanTHERIA and AnAge both provide body mass and weight). We try not to lose any information but group things together (point to tabs) so you can see related measurements.

  • Each record is annotated with rich metadata, including provenance, citation, information about methods etc.

  • TraitBank has a search interface, which is currently limited to queries for a single trait, with the option to restrict the search to a particular clade or a range of values.
  • Search results can be explored through the EOL interface or downloaded in a CSV file that contains all metadata for a given record, including the URI mappings to ontologies.
  • We get most of these data from the Environments-EOL project, led by Vangelis Pafilis, which mines the taxon descriptions in the EOL text collection for environment descriptive terms and maps them to the vocabulary of the ENVO ontology.

    Currently Environments-EOL provides us with almost half a million habitat keywords for over 100,000 taxa.

  • Also, BioCubes
  • Use BHL or EOL and other sources to tackle biological questions
    Matched each awardee with informatics expert
    4-7 February 2014, Durham, NC
    organized by Cynthia Parr and Craig McClain
    Funded in part by Richard Lounsbery Foundation
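
The notes above describe searching TraitBank for a single trait, optionally restricted to a range of values, and downloading the results as CSV. As a rough sketch of working with such a download offline, the Python below filters rows for one trait by a numeric range. The column names (`taxon`, `trait`, `value`, `units`, `source`) and the sample rows are placeholders invented for illustration; a real TraitBank export may use different headers and carries more metadata per record.

```python
import csv
import io

# Hypothetical column names and placeholder rows, not real TraitBank data.
SAMPLE_CSV = """taxon,trait,value,units,source
Taxon A,body mass,12.5,g,Dataset 1
Taxon B,body mass,340.0,g,Dataset 1
Taxon C,body mass,98.2,g,Dataset 2
Taxon D,lifespan,4.0,years,Dataset 2
"""


def filter_trait(csv_text, trait, min_value=None, max_value=None):
    """Return rows for one trait whose numeric value lies in [min_value, max_value]."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["trait"] != trait:
            continue
        value = float(row["value"])
        if min_value is not None and value < min_value:
            continue
        if max_value is not None and value > max_value:
            continue
        rows.append(row)
    return rows


# Single-trait query restricted to a value range, as in the search interface.
matches = filter_trait(SAMPLE_CSV, "body mass", min_value=50, max_value=500)
for row in matches:
    print(row["taxon"], row["value"], row["units"])
```

The filter assumes every row for the chosen trait uses the same unit; mixed units would need converting before the range comparison.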

    1. Why should we care? Cyndy Parr, EOL Chief Scientist. Photo by Rita Willaert
    2. US Federal Agencies Must Have Open Data. February 22, 2013: Office of Science and Technology Policy Memorandum, “Increasing Access to the Results of Federally Funded Scientific Research”. May 09, 2013: Executive Order, “Making Open and Machine Readable the New Default for Government Information”. “As one vital benefit of open government, making information resources easy to find, accessible, and usable can fuel entrepreneurship, innovation, and scientific discovery that improves Americans' lives and contributes significantly to job creation.”
    3. Journal data sharing requirements for supplementary data
    4. Even better reasons. A researcher may use an ontology to: 1) expand a query to discover all studies or other resources (videos, images, web discussions); 2) annotate papers with standardized terms describing results related to her hypotheses; 3) standardize datasets so that they can be assembled for re-analysis, meta-analysis or phylogenetic analysis
    5. Standardize datasets for re-analysis, meta-analysis or phylogenetic analysis
       • Cross-species, cross-domain: Evolution of development of lateralization in both brain and behavior
       • Within-species, cross-researcher, over time: Changes in reproductive behavior in red-winged blackbirds due to global warming.
       • Cross-species, cross-domain: Understanding parasite manipulation of host behaviors (e.g. by Toxoplasma gondii)
    6. 2008 text, media, literature all species, genera, etc. names infrastructure data curation human/machine interfaces 5 million visitors per year
    7. GenBank: 60 million DNA sequence records, 900,000 species, 4,000 genomes. How are these related to traits?
    8. Why TraitBank. In Phenoscape, 57 publications had 565,158 anatomical trait descriptions for 2,527 kinds of organisms = 223 traits/organism. In ZFIN, 38,189 trait descriptions for 4,727 genes for Zebrafish. 1.9 million species on the planet = LOTS OF TRAITS & no central repository
    9. TraitBank data sources launched January 2014. Numeric data (measurements). Categorical data (controlled vocabulary). Species interactions. Mostly summaries. From: Databases, Literature, Natural History Collections, Legacy/unpublished data
    10. Search & Download Data Sources Data Summaries on EOL Taxon Pages. Which plants grow well in acidic soil? What do water bears eat? What is the biggest species of whale? Structured Data TraitBank JSON-LD API
    11. TraitBank: ~7 million records, 326 traits, 1.2 million taxa, 40+ datasets
    12. TraitBank Quick facts
    13. TraitBank Data tab
    14. TraitBank Metadata
    15. TraitBank Search & download
    16. TraitBank Search & download
    17. TraitBank Data glossary
    18. Download
    19. TraitBank Uploading Darwin Core Archives: Common names | Taxa | References | MeasurementsOrFacts | Associations | Events | Occurrences
    20. Term URIs from existing ontologies
        Subject area | Ontology | Example terms
        Statistics | Semanticscience Integrated Ontology (SIO) | mean, minimal value, standard deviation
        Units of measure | Units of Measurement Ontology (UO) | meter, years, degree Celsius
        Habitat information | Environments Ontology (EnvO) | wetland, desert, snow field
        Attributes of organisms | Phenotype Quality Ontology (PATO) | aerobic, conical, evergreen
        Plant attributes | Plant Trait Ontology | flower color, life cycle habit, salt tolerance
        Animal attributes | Vertebrate Trait Ontology | body mass, total life span, onset of fertility
        Animal natural history | Animal Natural History and Life History Ontology (ETHAN) | nocturnal, oviparous, scavenger
    21. Text mining: Environments-EOL. Evangelos Pafilis, Hellenic Centre for Marine Research (HCMR), Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Crete, Greece. 491,616 habitat terms for 136,548 taxa
    22. Annotation of an observation record
    23. EOL-BHL Research Sprint
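
The talk describes trait records as triples in a triple store, with terms identified by URIs from existing ontologies and exposed through a JSON-LD interface. The Python sketch below shows the general idea only: one measurement written as (subject, predicate, object) triples and then gathered into a JSON-LD-style node. Every URI here is a placeholder under example.org and the property names are assumptions made for illustration; this is not the actual TraitBank schema or API output.

```python
import json

# Placeholder URIs under example.org; real records would use term URIs
# from ontologies such as those listed in the talk.
TAXON = "http://example.org/taxon/1"
BODY_MASS = "http://example.org/term/body_mass"
GRAM = "http://example.org/unit/gram"
MEASUREMENT = "http://example.org/measurement/1"

# One measurement expressed as (subject, predicate, object) triples.
# The predicate URIs are hypothetical.
triples = [
    (MEASUREMENT, "http://example.org/prop/taxon", TAXON),
    (MEASUREMENT, "http://example.org/prop/measurementType", BODY_MASS),
    (MEASUREMENT, "http://example.org/prop/measurementValue", "12.5"),
    (MEASUREMENT, "http://example.org/prop/measurementUnit", GRAM),
]


def to_jsonld(subject, triples):
    """Collect all triples about one subject into a JSON-LD-style node."""
    node = {"@id": subject}
    for s, p, o in triples:
        if s != subject:
            continue
        # URI objects become node references; everything else stays a literal.
        node[p] = {"@id": o} if o.startswith("http://") else o
    return node


record = to_jsonld(MEASUREMENT, triples)
print(json.dumps(record, indent=2))
```

Using URIs for the trait and the unit, rather than free-text labels, is what would let records from different sources be grouped when they refer to the same term.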