Practical interoperability across semantic stores of data for ecological, taxonomic, phylogenetic, and metagenomics research

Presented at the Biodiversity Information Standards (Taxonomic Databases Working Group) 2013 meeting in Florence, Italy on 31 October 2013. Essentially, an introduction to aspects of the back end of the new trait repository of Encyclopedia of Life.

Usage Rights

CC Attribution License

  • EOL's TraitBank™ aggregates and manages attribute (trait) data across the tree of life in a Virtuoso triple store. Attributes of organisms include morphological descriptors, life history characteristics, habitat preferences, and interactions with other organisms. In this talk we focus on how we add to and improve the semantics of both data and metadata in order to improve interoperability across the domains of morphology, ecology, and genomics.
    At least initially, most data aggregated by TraitBank will not have been "born semantic." Wherever possible, for each dataset, staff will select Uniform Resource Identifiers (URIs) for terms in existing ontologies (e.g. those registered in bioportal.bioontologies.org) to anchor the type of the attribute (e.g. habitat from the Environment Ontology). We also use terms from ontologies or other controlled vocabularies for the values of attributes (e.g. a particular type of habitat) as well as for most metadata describing the context of the measurement (e.g. life stage, geographic scope). As large datasets are ingested, we will propose new terms to the managers of existing ontologies where needed. Using a customized interface, we make sure that terms lacking good definitions and labels receive them, and we can share those definitions and labels. We also use this interface to promote good practice when others choose URIs for directly added data. However, we will remain flexible and allow new community-generated terms. We anticipate iterative processes to relate new terms to each other and to existing ontologies.
    Our use of semantic reasoning will initially be quite light, limited to unit conversion and inverse relationships. Eventually it could be expanded to infer values based on phylogeny.
    A prime example of the approach of reusing ontologies is the Global Biotic Interactions group (GLoBI, http://globalbioticinteractions.wordpress.com/), which reuses and extends classes and relations from existing biomedical and genomic ontologies. In particular, Globi.owl draws interaction processes from the Gene Ontology, taxonomic ranks from the Open Biomedical Ontology (OBO) taxrank ontology, relations from the OBO Relations Ontology, life cycle stages and body parts from UBERON, observation and specimen terms from various ontologies, behaviors from NeuroBehaviorOntology, and habitat keywords from the Environment Ontology. GLoBI standardizes data and then passes it to EOL.
    Though challenges remain, the ultimate goal is to expose semantically annotated, contextualized data so that it can contribute to 1) phylogenetic analyses aimed at understanding evolutionary responses and evolutionary history, 2) facilitation of new species discovery, 3) metagenomic analyses aimed at integrated understanding of ecosystem processes, and 4) global biotic models.
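The abstract names unit conversion as one of the two light reasoning tasks. A minimal sketch of what normalizing mass values to grams could look like is given here; the function name, the unit table, and the sample records are all hypothetical and are not taken from TraitBank itself.

```python
# Hypothetical conversion table: grams per one unit of each supported unit.
GRAMS_PER_UNIT = {"g": 1.0, "kg": 1000.0, "mg": 0.001}


def to_grams(value, unit):
    """Convert a mass value to grams, rejecting units with no known factor."""
    try:
        factor = GRAMS_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"no conversion known for unit {unit!r}") from None
    return value * factor


# Illustrative records only: (taxon label, value, unit).
records = [("taxon A", 3.5, "kg"), ("taxon B", 250.0, "g")]

# Every record ends up expressed in the same unit, so values are comparable.
normalized = [(name, to_grams(value, unit), "g") for name, value, unit in records]
```

Raising an error for unknown units, rather than passing the value through, keeps unconverted numbers from being silently compared with converted ones.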
  • Starting with marine data. In the simplest view, we’ll be storing triples. This data will be organized on a data tab, sorted into the 35 or so “topics” that we currently have text chapters for, and we will also allow powerful downloading and searching. Finally, we’ll be setting up ways for other applications to grab the data and do interesting things with it. We already have a tool for making field guides. The approach here builds on our innovations for EOL and adds some proven technology called the “semantic web” to our domain. The next step takes this chain of innovation even further.
  • Note that we can set the inverse of association types.
  • Information Visualization: http://globalbioticinteractions.files.wordpress.com/2013/08/puerto_rico_interactions.png

Presentation Transcript

  • Practical interoperability across semantic stores of data for blah blah blah eol.org @eol @cydparr
  • The road to TraitBank
    In second year of 2-year project: Marine Expert Audience Conservation science
    Virtuoso triple store:
      <EOL taxon id> <hasAvgBodyMass in g> <value>
      <EOL taxon id> <preysOn> <scientific name>
    Beta testing NOW for public launch early 2014
    21 datasets with 2.8 million data records for 520,000 taxa
    Harvest, display, curate, search, download
    MOST DATA NOT BORN SEMANTIC
    • From text mining
    • From literature tables
    • From data papers
    • From databases
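The two triple patterns on this slide can be illustrated with a small in-memory sketch. It is not how Virtuoso stores or queries data; the identifiers, predicate spellings, and the `match` helper are hypothetical and serve only to show the subject/predicate/object shape.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers and values, for illustration only.
triples = [
    Triple("eol:taxon/1001", "hasAvgBodyMass_g", "3500"),
    Triple("eol:taxon/1001", "preysOn", "Clupea harengus"),
    Triple("eol:taxon/1002", "preysOn", "Clupea harengus"),
]


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every field that is not None."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Which taxa are recorded as preying on Clupea harengus?
predators = [t.subject for t in match(triples, predicate="preysOn", obj="Clupea harengus")]
```

Leaving a field as `None` plays the role of a variable in a triple pattern, which is the basic idea behind querying a triple store.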
  • Term URIs from existing ontologies, e.g. those registered in bioportal.bioontologies.org
    • Statistics from Semantic Science Integrated Ontology
    • Units Ontology
    • Environment Ontology (EnvO)
    • Gene Ontology
    • ETHAN (Natural history, with Joel Sachs)
    • Vertebrate Trait Ontology
    • Plant Trait Ontology
    Where necessary: request terms
    Last resort: create provisional terms with http://eol.org/schema/terms/xxxx
    Of course, also using unique EOL taxon identifiers, which we’ve mapped to identifiers of other projects
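The fallback order on this slide (reuse an existing term, otherwise fall back to a provisional one) might be sketched as follows. Only the base `http://eol.org/schema/terms/` comes from the slide; the slug scheme for the `xxxx` suffix, the function names, and the example.org lookup table are assumptions made for illustration. The intermediate step of requesting a term from an ontology's managers is a human process and is not modeled.

```python
import re

# Base taken from the slide; how the "xxxx" suffix is formed is an assumption here.
PROVISIONAL_BASE = "http://eol.org/schema/terms/"


def provisional_uri(label):
    """Build a provisional URI from a label using a hypothetical slug scheme."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", label.strip()).strip("_")
    return PROVISIONAL_BASE + slug


def choose_term_uri(label, known_terms):
    """Prefer a URI from an existing ontology; otherwise mint a provisional one."""
    key = label.strip().lower()
    if key in known_terms:
        return known_terms[key], "existing"
    return provisional_uri(label), "provisional"


# Hypothetical lookup table standing in for terms found in existing ontologies.
known_terms = {"habitat": "http://example.org/ontology/habitat"}

habitat = choose_term_uri("Habitat", known_terms)
wing_span = choose_term_uri("wing span", known_terms)
```

Returning the status alongside the URI makes it easy to list provisional terms later, when they are candidates for requests to ontology managers.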
  • Known URIs tool
    Only light reasoning so far: just to infer inverse relationships like “eats” and “is eaten by”
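Inferring inverse relationships, the one kind of reasoning this slide mentions, can be sketched in a few lines. The predicate spellings `eats` and `isEatenBy`, the taxon labels, and the function name are hypothetical; a triple store would normally do this through declared inverse properties rather than application code.

```python
# Hypothetical predicate names; each key is declared the inverse of its value.
INVERSE_OF = {"eats": "isEatenBy"}


def with_inverses(triples, inverse_of):
    """Return the input triples plus the inverse of every invertible statement."""
    # Make the table symmetric so either direction can be inferred.
    both = dict(inverse_of)
    both.update({v: k for k, v in inverse_of.items()})

    inferred = set(triples)
    for subject, predicate, obj in triples:
        if predicate in both:
            inferred.add((obj, both[predicate], subject))
    return inferred


asserted = {("taxon A", "eats", "taxon B")}
closed = with_inverses(asserted, INVERSE_OF)
```

With the inverse materialized, a search for everything that "is eaten by" taxon A works even though only "eats" statements were ever ingested.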
  • GLoBI
    http://globalbioticinteractions.wordpress.com/
    Jorrit Poelen, Chris Mungall, James Simon GoMexSi
    14 datasets with 25k taxa, 422k interactions, for 3k locations
    • Alpha version of ingestion, normalization, aggregation
    • Alpha version of web API
    • Alpha version of data exports
  • GLoBI ontology work
    https://github.com/jhpoelen/eol-globidata/tree/master/eol-globi-ontology
    • Interaction processes from Gene Ontology
    • Relations from OBO Relations Ontology
    • Life cycle stages and body parts from UBERON
    • Observation and specimen terms from various ontologies
    • Behaviors from NeuroBehaviorOntology
    • Habitat keywords from Environment Ontology
    New terms: /eats, /interactsWith, /preysUpon, /hasHost, /hosts, /parasitizes
  • Adding data
  • To do
    • Term evaluation and recommendations
    • Map similar terms
    • Map terms to an upper ontology like Species Profile Model
    • Leverage reasoning for data validation
    To access the beta test, happening NOW, send your EOL login to: @cydparr parrc@si.edu