Also the work of Katja Schulz and Jen Hammock, and 5 contractors
Cost model… Up to a year, you can restrict access.
Before I talk about TraitBank, let’s think a little bit about GenBank. We are in the midst of a genomics revolution and most people would agree this has been a major advance in our ability to construct new knowledge about organisms. The cost to generate a full genome sequence is dropping more or less daily.
What is all this genetic information DOING? How does it relate to what we can see and measure about organisms, their phenotypes, or their traits? How does DNA interact with the environment to result in both normal and abnormal development? How did it evolve? How fast do DNA changes make a difference in the lives of organisms?
Last year I did some calculations. These may be a bit out of date, but they should still give a sense of scale. Phenoscape is a database of anatomical traits in fishes. From just 57 publications, they have more than 500K descriptions for 2,500 kinds of organisms.
ZFIN is the model organism database for zebrafish, a species widely used by developmental biologists. They have captured nearly 40,000 traits for just ONE very well-studied SPECIES.
Lots of traits, AND LOTS OF WAYS TO DESCRIBE THEM, spread across stovepiped projects, formats, and vocabularies.
This January we released the first version of our TraitBank platform. You can find TraitBank at this URL or in the footer of every EOL page. It covers things like body mass, metabolic rate, lifespan, and number of offspring per year, as well as things like “habitat” keywords, growth habit (vine/shrub/tree), and whether a taxon is invasive in a given place.
TraitBank data are ingested, standardized as much as possible using ontologies, and managed as triples in a Virtuoso triple store. The trait information for each taxon is displayed in a Data tab on EOL taxon pages. TraitBank also offers search, download, and a JSON-LD interface.
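To make the triple and JSON-LD idea concrete, here is a minimal sketch of one trait record stored as subject-predicate-object triples and then gathered into a JSON-LD node object. The Darwin Core term namespace is real; the `example.org` URIs for the record, taxon, trait, and unit are hypothetical placeholders, not TraitBank's actual URI scheme, and `to_jsonld` is an illustrative helper, not part of any EOL API.

```python
import json

# Hypothetical base URI for illustration only; not EOL's actual URI scheme.
EX = "http://example.org/"
# Darwin Core terms namespace.
DWC = "http://rs.tdwg.org/dwc/terms/"

# One trait record expressed as subject-predicate-object triples.
triples = [
    (EX + "measurement/1", DWC + "taxonID", EX + "taxon/42"),
    (EX + "measurement/1", DWC + "measurementType", EX + "term/body_mass"),
    (EX + "measurement/1", DWC + "measurementValue", "3500"),
    (EX + "measurement/1", DWC + "measurementUnit", EX + "term/gram"),
]


def to_jsonld(triples, subject):
    """Collect all triples for one subject into a JSON-LD node object."""
    node = {"@context": {"dwc": DWC}, "@id": subject}
    for s, p, o in triples:
        if s != subject:
            continue
        # Compact Darwin Core predicates to the "dwc:" prefix declared in @context.
        key = "dwc:" + p[len(DWC):] if p.startswith(DWC) else p
        # IRIs become {"@id": ...} references; everything else stays a plain literal.
        node[key] = {"@id": o} if o.startswith("http") else o
    return node


doc = to_jsonld(triples, EX + "measurement/1")
print(json.dumps(doc, indent=2))
```

This simplification keeps one value per predicate; a real triple store allows repeated predicates on the same subject.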
TraitBank currently holds data from X different data sets which provide about 7 million records for 326 traits across over 300,000 taxa. Here’s an overview of all the traits we have, weighted by the number of records represented in our triple store.
TraitBank covers all kinds of traits, including geographic distribution, morphology, ecology, and life history, but we currently have a focus on marine data and data of use to conservation science.
You can also go to almost any EOL page with content and see a quick facts box on the overview tab. It shows a selection of traits. These particular traits (growth habit, invasive listings, habitat keywords) are all controlled vocabulary terms. If you go to “See all” or to the Data tab, you’ll see…
We also have numeric data with units and life stage identifiers. If you hover over a term, you get its definition. Different sources may provide the same or similar data (for example, PanTHERIA and AnAge provide body mass and weight). We try not to lose any information, but we group related terms together on tabs so you can see related measurements side by side.
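One simple way to group differently labeled measurements without discarding anything is a lookup from (source, label) pairs to a shared term, keeping each original record intact inside its group. This is a sketch under our own assumptions: the lookup table, the labels, and the values below are made up for illustration and are not TraitBank's actual mappings or data.

```python
from collections import defaultdict

# Hypothetical mapping from source-specific labels to one shared trait term.
SHARED_TERM = {
    ("PanTHERIA", "body mass"): "body mass",
    ("AnAge", "weight"): "body mass",
}

# Made-up example records; note the differing labels and units.
records = [
    {"source": "PanTHERIA", "label": "body mass", "value": 3500.0, "unit": "g"},
    {"source": "AnAge", "label": "weight", "value": 3.2, "unit": "kg"},
]


def group_by_shared_term(records):
    """Group records under a shared term while keeping each original record intact."""
    groups = defaultdict(list)
    for rec in records:
        # Fall back to the source's own label when no mapping is known.
        term = SHARED_TERM.get((rec["source"], rec["label"]), rec["label"])
        groups[term].append(rec)  # original label, value and unit are preserved
    return dict(groups)


grouped = group_by_shared_term(records)
for term, recs in grouped.items():
    print(term, [(r["source"], r["value"], r["unit"]) for r in recs])
```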
Each record is annotated with rich metadata, including provenance, citation, and information about methods.
TraitBank has a search interface, which is currently limited to queries for a single trait, with the option to restrict the search to a particular clade or a range of values.
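The search constraints described above (one trait, an optional clade, an optional value range) can be sketched as a simple filter. This is not TraitBank's query code; the record layout, the taxon names, and the values are hypothetical, and it assumes all values for a trait are already in one unit.

```python
# Toy records with made-up taxa and values; each record carries its taxon's
# lineage so that a clade restriction is a simple membership test.
# Assumption: all body mass values are already in the same unit (grams).
RECORDS = [
    {"taxon": "Taxon A", "lineage": ["Mammalia", "Cetacea"], "trait": "body mass", "value": 100000.0},
    {"taxon": "Taxon B", "lineage": ["Mammalia", "Rodentia"], "trait": "body mass", "value": 30.0},
    {"taxon": "Taxon C", "lineage": ["Aves"], "trait": "body mass", "value": 20.0},
    {"taxon": "Taxon D", "lineage": ["Mammalia", "Cetacea"], "trait": "lifespan", "value": 50.0},
]


def search(records, trait, clade=None, min_value=None, max_value=None):
    """Return records for one trait, optionally limited to a clade and a value range."""
    hits = []
    for rec in records:
        if rec["trait"] != trait:
            continue
        if clade is not None and clade not in rec["lineage"]:
            continue
        if min_value is not None and rec["value"] < min_value:
            continue
        if max_value is not None and rec["value"] > max_value:
            continue
        hits.append(rec)
    return hits


print([r["taxon"] for r in search(RECORDS, "body mass", clade="Mammalia", min_value=10)])
# ['Taxon A', 'Taxon B']
```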
Search results can be explored through the EOL interface or downloaded as a CSV file that contains all metadata for each record, including the URI mappings to ontologies.
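Carrying ontology URIs alongside human-readable labels in a flat file can look like the following sketch, which writes a row to CSV and reads it back. The column names and URIs are hypothetical; the actual TraitBank download layout may differ.

```python
import csv
import io

# Hypothetical column layout: a label column paired with its ontology URI column.
FIELDS = ["taxon", "trait", "trait_uri", "value", "unit", "unit_uri", "source"]

rows = [
    {
        "taxon": "Taxon A",
        "trait": "body mass",
        "trait_uri": "http://example.org/term/body_mass",
        "value": "3500",
        "unit": "g",
        "unit_uri": "http://example.org/term/gram",
        "source": "Dataset 1",
    }
]


def to_csv(rows):
    """Serialize result rows, including their URI columns, to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


text = to_csv(rows)
# A consumer can read the file back and recover the URI mappings per record.
parsed = list(csv.DictReader(io.StringIO(text)))
print(parsed[0]["trait_uri"])
```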
We get most of these data from the Environments-EOL project, led by Vangelis Pafilis, which mines the taxon descriptions in the EOL text collection for terms describing environments and maps them to the Environment Ontology (ENVO).
Currently Environments-EOL provides us with almost half a million habitat keywords for over 100,000 taxa.
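To illustrate the general idea of mapping free text to ontology terms, here is a deliberately simple dictionary matcher. It is not the Environments-EOL pipeline, which is considerably more sophisticated; the lexicon is a toy built from the example EnvO terms on the slides, and a real system would map to ENVO identifiers and use a much larger synonym list.

```python
import re

# Toy lexicon: surface forms mapped to ENVO class labels (labels only, no IDs).
LEXICON = {
    "wetland": "wetland",
    "wetlands": "wetland",
    "desert": "desert",
    "deserts": "desert",
    "snow field": "snow field",
    "snowfield": "snow field",
}

# One regex alternation, longest forms first so longer terms win over their prefixes;
# word boundaries stop matches inside longer words (e.g. "deserted").
PATTERN = re.compile(
    r"\b(" + "|".join(sorted(map(re.escape, LEXICON), key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def tag_habitats(text):
    """Return the set of ENVO labels whose surface forms occur in a taxon description."""
    return {LEXICON[m.group(1).lower()] for m in PATTERN.finditer(text)}


desc = "Breeds in coastal wetlands; winters along desert margins."
print(sorted(tag_habitats(desc)))  # ['desert', 'wetland']
```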
Use BHL or EOL and other sources to tackle biological questions
Matched each awardee with an informatics expert
4-7 February 2014, Durham, NC
Organized by Cynthia Parr and Craig McClain
Funded in part by the Richard Lounsbery Foundation
Behavior ontology workshop, Princeton
Why should we care?
EOL Chief Scientist
Photo by Rita Willaert
US Federal Agencies Must Have Open Data
February 22, 2013
Office of Science and Technology Policy Memorandum
Increasing Access to the Results of Federally Funded Scientific Research
May 09, 2013
Making Open and Machine Readable the New Default for Government Information
“As one vital benefit of open government, making information resources easy to find, accessible, and usable can fuel entrepreneurship, innovation, and scientific discovery that improves Americans' lives and contributes significantly to job creation.”
Journal data sharing requirements for
Even better reasons
A researcher may use an ontology to:
1) expand a query to discover all studies or other resources (videos, images, web
2) annotate papers with standardized terms describing results related to her hypotheses
3) standardize datasets so that they can be assembled for re-analysis, meta-analysis or phylogenetic analysis
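Query expansion (use 1 above) amounts to walking the ontology's is_a hierarchy: a search for a general term also returns resources annotated with any of its more specific descendants. The sketch below uses a breadth-first walk over a toy hierarchy; the term labels and study annotations are invented for illustration and are not taken from any real behavior ontology.

```python
from collections import deque

# Hypothetical is_a hierarchy: parent term -> more specific child terms.
CHILDREN = {
    "reproductive behavior": ["courtship behavior", "parental behavior"],
    "courtship behavior": ["courtship song"],
    "parental behavior": ["nest building", "brood care"],
}

# Hypothetical annotated studies: study id -> ontology terms used to tag it.
STUDIES = {
    "study-1": {"courtship song"},
    "study-2": {"brood care"},
    "study-3": {"foraging"},
}


def expand(term):
    """Return the term plus all of its descendants (breadth-first walk of is_a links)."""
    seen = {term}
    queue = deque([term])
    while queue:
        for child in CHILDREN.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def find_studies(term):
    """Find studies tagged with the query term or any more specific term."""
    terms = expand(term)
    return sorted(sid for sid, tags in STUDIES.items() if tags & terms)


print(find_studies("reproductive behavior"))  # ['study-1', 'study-2']
```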
Standardize datasets for re-analysis, meta-analysis or phylogenetic analysis
• Cross-species, cross-domain: evolution of development of lateralization in both brain and behavior
• Within-species, cross-researcher, over time: changes in reproductive behavior in red-winged blackbirds due to global warming
• Cross-species, cross-domain: understanding parasite manipulation of host behaviors (e.g. by Toxoplasma gondii)
text, media, literature
all species, genera, etc.
5 million visitors per year
60 million DNA sequence records
How are these related to traits?
57 publications had 565,158 anatomical trait descriptions for 2,527 kinds of organisms
≈ 224 traits/organism
38,189 trait descriptions for 4,727 genes for
1.9 million species on the planet
= LOTS OF TRAITS & no central repository
TraitBank data sources http://eol.org/traitbank
launched January 2014
Natural History Collections
Search & Download
Data Summaries on EOL Taxon Pages
Which plants grow well in
What do water bears eat?
What is the biggest species of whale?
~7 million records
1.2 million taxa
TraitBank: Uploading Darwin Core Archives
Common names | Taxa | References | MeasurementsOrFacts | Associations | Events
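A Darwin Core Archive bundles delimited text files like those listed above. As a rough sketch, the measurements file could be written as tab-delimited text using Darwin Core MeasurementOrFact term names as column headers. Using `taxonID` as the link back to the taxa file is an assumption about how the archive is wired, the URIs and values are placeholders, and a complete archive also needs a meta.xml descriptor, which is not shown.

```python
import csv
import io

# Column headers are Darwin Core terms; taxonID as the link column is an assumption.
MOF_FIELDS = [
    "measurementID",
    "taxonID",
    "measurementType",
    "measurementValue",
    "measurementUnit",
    "measurementMethod",
]

# Placeholder row: hypothetical IDs, term URIs, and value.
measurements = [
    {
        "measurementID": "m1",
        "taxonID": "t1",
        "measurementType": "http://example.org/term/body_mass",
        "measurementValue": "3500",
        "measurementUnit": "http://example.org/term/gram",
        "measurementMethod": "field weighing",
    }
]


def write_mof(rows):
    """Serialize measurement rows as tab-delimited text for a MeasurementsOrFacts file."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=MOF_FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


print(write_mof(measurements))
```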
Term URIs from existing ontologies
Subject Area | Ontology | Example terms
 |  | mean, minimal value,
 | Units of Measurement Ontology | meter, years, degree
 | Environments Ontology (EnvO) | wetland, desert, snow field
 | Phenotype Quality Ontology (PATO) | aerobic, conical, evergreen
Plant attributes | Plant Trait Ontology | flower color, life cycle habit,
Animal attributes | Vertebrate Trait Ontology | body mass, total life span, onset of fertility
 | Animal Natural History and Life History Ontology (ETHAN) | 
Evangelos Pafilis, Hellenic Centre for Marine Research (HCMR), Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Crete, Greece
491,616 habitat terms for 136,548 taxa