2016 bmdid-mappings

Bio2RDF is an open-source project that offers a large and connected knowledge graph of Life Science Linked Data. Each dataset is expressed using its own vocabulary, which hinders integrating, searching, querying, and browsing across similar or identical types of data. With growth and content changes in source data, a manual approach to maintaining mappings has proven untenable. The aim of this work is to develop a (semi)automated procedure to generate high quality mappings between Bio2RDF and SIO using BioPortal ontologies. Our preliminary results demonstrate that our approach is promising in that it can find new mappings using a transitive closure between ontology mappings. Further development of the methodology, coupled with improvements in the ontology, will offer a better-integrated view of the Life Science Linked Data.

  1. ONTOLOGY MAPPING FOR LIFE SCIENCE LINKED DATA. Amrapali Zaveri and Michel Dumontier, Stanford Center for Biomedical Informatics Research, Stanford University. ISWC2016:::BMDID::Dumontier
  2. Large and growing network of Linked Data. Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/
  3. Linked Data for the Life Sciences. Bio2RDF is an open source project to unify the representation and interlinking of biological data using RDF. Coverage: chemicals/drugs/formulations; genomes/genes/proteins, domains; interactions, complexes & pathways; animal models and phenotypes; disease, genetic markers, treatments; terminologies & publications.
     • 11B+ interlinked statements from 35 biomedical datasets and 400+ ontologies
     • Dataset description, provenance & statistics
     • A growing interoperable ecosystem with the EBI, NCBI, DBCLS, NCBO, OpenPHACTS, and commercial tool providers
  4. Biomedical Linked Data
  5. The lack of coordination to a global schema makes Linked Data chaotic and unwieldy
  6. Federated queries require intimate knowledge of each dataset schema. Example: get all protein catabolic processes (and more specific GO terms) in BioModels.
     SELECT ?go ?label (COUNT(DISTINCT ?x) AS ?count)
     WHERE {
       SERVICE <http://bioportal.bio2rdf.org/sparql> {
         ?go rdfs:label ?label .
         ?go rdfs:subClassOf+ ?tgo .
         ?tgo rdfs:label ?tlabel .
         FILTER regex(?tlabel, "^protein catabolic process")
       }
       SERVICE <http://biomodels.bio2rdf.org/sparql> {
         ?x <http://bio2rdf.org/biopax_vocabulary:identical-to> ?go .
         ?x a <http://www.biopax.org/release/biopax-level3.owl#BiochemicalReaction> .
       }
     }
     GROUP BY ?go ?label
  7. Previous work involved manual mappings between Bio2RDF types and relations and the Semanticscience Integrated Ontology (SIO). [Diagram with "dataset", "ontology" and "Knowledge Base" labels, linking uniprot:P05067, pharmgkb:PA30917 and omim:189931 through "is a" edges to uniprot:Protein, refseq:Protein, omim:Gene, pharmgkb:Gene and sio:gene.] Reference: Querying Bio2RDF Linked Open Data with a Global Schema. Alison Callahan, José Cruz-Toledo and Michel Dumontier. Bio-ontologies 2012.
  8. Semanticscience Integrated Ontology (SIO): an effective upper level ontology. 1500+ classes, 207 object properties (inc. inverses), 1 datatype property.
  9. Bio2RDF and SIO powered SPARQL federated query: find chemicals (from CTD) and proteins (from SGD) that participate in the same process (from GOA).
     SELECT ?chem ?prot ?proc
     FROM <http://bio2rdf.org/ctd>
     WHERE {
       SERVICE <http://ctd.bio2rdf.org/sparql> {
         ?chemical a sio:chemical-entity .
         ?chemical rdfs:label ?chem .
         ?chemical sio:is-participant-in ?process .
         ?process rdfs:label ?proc .
         FILTER regex(str(?process), "http://bio2rdf.org/go:")
       }
       SERVICE <http://sgd.bio2rdf.org/sparql> {
         ?protein a sio:protein .
         ?protein sio:is-participant-in ?process .
         ?protein rdfs:label ?prot .
       }
     }
  10. Many vocabularies, ontologies and community-based standards are now available
  11. PubChem uses multiple terminologies
  12. Existing limitations with Bio2RDF mappings
      • New datasets have been added
      • Existing datasets have changed
      • The target ontology (SIO) has changed
      • The target ontology (SIO) is incomplete and there may be better ontologies to use
      • These ontologies are evolving; today's mappings may be invalid or imprecise tomorrow
      • The manual process is not easy and not reproducible, so it must be automated
  13. Goal: develop a semi-automated procedure to generate high quality mappings between Bio2RDF and SIO.
  14. Approach. [Diagram labels: Automated, Manual; distance metrics, graph-based, instance-based, BioPortal, crowdsourcing; previous work*, our work.]
  15. Idea: create mappings between SIO and Bio2RDF using ontologies in BioPortal. [Diagram labels: Bio2RDF, NCBO Annotator/Recommender, SIO.]
  16. Bio2RDF-SIO mappings via transitive closure through BioPortal ontologies. [Diagram labels: Bio2RDF, SIO, super class, mapped class, match.]
  17. Results
      • Bio2RDF: 319 (of 6093) classes retained after pruning (removing blank nodes, general resources, OWL vocabulary & non-Bio2RDF types/relations)
      • Step 1, NCBO Annotator: 174 Bio2RDF classes matched directly and exactly to SIO
      • Step 2, NCBO Recommender: 94 Bio2RDF classes matched to BioPortal ontologies
  18. Results
      • SIO: 1500 classes
      • BioPortal: 475 ontologies
      • Step 3: 393 BioPortal ontologies matched to SIO
  19. Results
      • Step 4: traverse hierarchy
      • Bio2RDF: 319 classes; SIO: 1500 classes
      • 94 Bio2RDF classes matched to BioPortal ontologies
      • 393 BioPortal ontologies matched to SIO
  20. Results
      • Step 4: traverse hierarchy
      • Bio2RDF: 319 classes; SIO: 1500 classes
      • 94 Bio2RDF classes matched to BioPortal ontologies
      • 393 BioPortal ontologies matched to SIO
      • 71 matches found by traversing from the mapped class to its super class
  21. Results: example
      • Bio2RDF class: clinicaltrials:Clinical-Study
      • Mapped class: edda:clinical_trial
      • Super class: edda:Study_Design
      • SIO class: sio:001041 (study design)
      • Relation: skos:broader
  22. Mappings often occurred to more than one class. Example: sider:Drug-Indication-Association mapped to sio:010038 (drug), sio:010299 (disease) and sio:000897 (association).
  23. Manual validation of mappings. All results available at https://goo.gl/eiijmQ
      Bio2RDF class | SIO class | Annotation
      drugbank:Biotech | (none) | no match
      clinicaltrials:Organization | sio:00012 (organization) | exact
      drugbank:toxicity | sio:001008 (toxicity) | exact
      sgd:GlycineCount | sio:000794 (count) | partial (is-a)
      wormbase:Genetic-Interaction | sio:010035 (gene) | partial (part-of)
      clinicaltrials:Serious-Event | sio:000614 (attribute) | incorrect
      drugbank:Source | sio:000510 (model) | incorrect
  24. Conclusion
      • Developed a semi-automated methodology to map Bio2RDF classes to SIO via BioPortal ontologies
      • 245 of 319 Bio2RDF classes matched to SIO
  25. Limitations
      • Unmatched classes: neither SIO nor other ontologies have complete coverage
      • Overly general concepts: semantically incompatible classes
      • Incorrect mappings: matches to part of the class
      • Mappings are insufficient to precisely retrieve data across different datasets
  26. Future Work
      • Extend SIO to include classes that are ultimately not found
      • Explore mid-level portion of SIO to eliminate root level mappings
      • Scalable validation via crowdsourcing
      • Pursue query rewriting
  27. Contact: michel.dumontier@stanford.edu. Website: http://dumontierlab.com
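
The transitive-closure step outlined in slides 16 and 19 to 21 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`ancestors`, `map_to_sio`), the dictionary-based inputs, and the breadth-first traversal order are all hypothetical choices made for this example, and the data is limited to the single example given in slide 21.

```python
from collections import deque


def ancestors(cls, parents):
    """Yield cls, then its superclasses, nearest first (breadth-first)."""
    seen = {cls}
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        yield current
        for parent in parents.get(current, ()):
            if parent not in seen:  # guard against cycles and repeats
                seen.add(parent)
                queue.append(parent)


def map_to_sio(bio2rdf_class, bio2rdf_to_bioportal, parents, bioportal_to_sio):
    """Return (mapped class, matched class, SIO class), or None if no match.

    The Bio2RDF class is first resolved to a BioPortal class; the BioPortal
    hierarchy is then climbed until a class with a known SIO mapping is found.
    """
    for mapped in bio2rdf_to_bioportal.get(bio2rdf_class, ()):
        for candidate in ancestors(mapped, parents):
            sio_class = bioportal_to_sio.get(candidate)
            if sio_class is not None:
                return mapped, candidate, sio_class
    return None


# Hypothetical inputs reproducing only the example from slide 21.
BIO2RDF_TO_BIOPORTAL = {"clinicaltrials:Clinical-Study": ["edda:clinical_trial"]}
PARENTS = {"edda:clinical_trial": ["edda:Study_Design"]}
BIOPORTAL_TO_SIO = {"edda:Study_Design": "sio:001041"}

result = map_to_sio(
    "clinicaltrials:Clinical-Study",
    BIO2RDF_TO_BIOPORTAL,
    PARENTS,
    BIOPORTAL_TO_SIO,
)
print(result)
```

Because the match is found on a super class rather than on the mapped class itself, the resulting link is broader than an exact match, which is consistent with the skos:broader relation shown in slide 21. A class with no BioPortal mapping, such as drugbank:Biotech in slide 23, yields `None` in this sketch.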