• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
Using SPARQL to Query BioPortal Ontologies and Metadata
 

Using SPARQL to Query BioPortal Ontologies and Metadata

on

  • 1,373 views

BioPortal is a repository of biomedical ontologies—the largest such repository, with more than 300 ontologies to date. This set includes ontologies that were developed in OWL, OBO and other ...

BioPortal is a repository of biomedical ontologies—the largest such repository, with more than 300 ontologies to date. This set includes ontologies that were developed in OWL, OBO and other languages, as well as a large number of medical terminologies that the US National Li- brary of Medicine distributes in its own proprietary format. We have published the RDF based serializations of all these ontologies and their meta- data at sparql.bioontology.org.

Statistics

Views

Total Views
1,373
Views on SlideShare
1,368
Embed Views
5

Actions

Likes
4
Downloads
30
Comments
0

3 Embeds 5

https://twitter.com 2
http://www.slashdocs.com 2
http://localhost 1

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Using SPARQL to Query BioPortal Ontologies and Metadata Using SPARQL to Query BioPortal Ontologies and Metadata Presentation Transcript

    • Using SPARQL to Query BioPortal Ontologies and Metadata Manuel Salvadores, Matthew Horridge, Paul R. Alexander, Ray W. Fergerson, Mark A. Musen, and Natasha F. Noy Stanford Center for Biomedical Informatics Research (BMIR) Stanford University ISWC 2012 sparql.bioontology.org Boston, US 1Tuesday, November 13, 12
    • 2Tuesday, November 13, 12
    • The main entry point to BioPortal data are the REST APIs. 3Tuesday, November 13, 12
    • The main entry point to BioPortal data are the REST APIs. Via REST services we cannot offer answers to queries that require fine access to the data. 3Tuesday, November 13, 12
    • The main entry point to BioPortal data are the REST APIs. Via REST services we cannot offer answers to queries that require fine access to the data. SPARQL finer data access 3Tuesday, November 13, 12
    • The main entry point to BioPortal data are the REST APIs. Via REST services we cannot offer answers to queries that require fine access to the data. SPARQL finer data access Challenges, opportunities and lessons learnt with sparql.bioontology.org 3Tuesday, November 13, 12
    • 4Tuesday, November 13, 12
    • Our SPARQL endpoint is different from others because our data are primarily ontologies themselves and not data about individuals. Still lessons learnt apply to other domains, we have to deal with: performance scalability heterogeneity query articulation 5Tuesday, November 13, 12
    • Challenges, opportunities and lessons learn with sparql.bioontology.org 6Tuesday, November 13, 12
    • Challenges, opportunities and lessons learn with sparql.bioontology.org 1. Retrieval of common attributes and how simple reasoning can help. 6Tuesday, November 13, 12
    • Challenges, opportunities and lessons learn with sparql.bioontology.org 1. Retrieval of common attributes and how simple reasoning can help. 2. Complexity of query articulation when targeting OWL complex constructions. 6Tuesday, November 13, 12
    • Challenges, opportunities and lessons learn with sparql.bioontology.org 1. Retrieval of common attributes and how simple reasoning can help. 2. Complexity of query articulation when targeting OWL complex constructions. 3. Best practices in using an open shared endpoint: 6Tuesday, November 13, 12
    • Challenges, opportunities and lessons learn with sparql.bioontology.org 1. Retrieval of common attributes and how simple reasoning can help. 2. Complexity of query articulation when targeting OWL complex constructions. 3. Best practices in using an open shared endpoint: I. Selective queries work better. 6Tuesday, November 13, 12
    • Challenges, opportunities and lessons learn with sparql.bioontology.org 1. Retrieval of common attributes and how simple reasoning can help. 2. Complexity of query articulation when targeting OWL complex constructions. 3. Best practices in using an open shared endpoint: I. Selective queries work better. II. Careful with large result sets. Paginate. 6Tuesday, November 13, 12
    • Challenges, opportunities and lessons learn with sparql.bioontology.org 1. Retrieval of common attributes and how simple reasoning can help. 2. Complexity of query articulation when targeting OWL complex constructions. 3. Best practices in using an open shared endpoint: I. Selective queries work better. II. Careful with large result sets. Paginate. III. How the client reads matter. 6Tuesday, November 13, 12
    • BioPortal Data • Ontology Content • OBO Format • Rich Release Format (RRF) • OWL • Ontology Metadata • Mapping Data 7Tuesday, November 13, 12
    • BioPortal Data • Ontology Content • OBO Format • Rich Release Format (RRF) RDF • OWL • Ontology Metadata • Mapping Data 7Tuesday, November 13, 12
    • BioPortal Data • Ontology Content • OBO Format • Rich Release Format (RRF) RDF • OWL • Ontology Metadata • Mapping Data Triple Store 7Tuesday, November 13, 12
    • BioPortal Data • Ontology Content • OBO Format • Rich Release Format (RRF) RDF • OWL • Ontology Metadata SPARQL • Mapping Data Triple Store 7Tuesday, November 13, 12
    • RDF - Ontology Metadata omv:Ontology meta:VirtualOntology version name date ontology meta:hasVersion ontology format /1353 /46896 meta:hasVersion (....) ontology meta:hasVersion /46116 meta:hasDataGraph ontology <http://bioportal.bioontology.org/ontologies/SNOMED> /42122 8Tuesday, November 13, 12
    • RDF - Mappings <http://purl.bioontology.org/mapping/2767e8e0-001b-012e-749f-005056bd0010> maps:has_process_info <.../procinfo/2008-04-23-38138> ; maps:comment "Manual mappings between Mouse anatomy and NCIT." ; maps:relation skos:closeMatch ; maps:target <http://purl.org/obo/owl/MA#MA_0001096> ; maps:source <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#Olfactory_Nerve> ; maps:source_ontology_id <http://bioportal.bioontology.org/ontologies/1032> ; maps:target_ontology_id <http://bioportal.bioontology.org/ontologies/1000> ; a maps:One_To_One_Mapping . Mapping <http://purl.bioontology.org/mapping/nonloom/procinfo/2008-04-23-38138> maps:date "2008-04-23T19:21:45Z"^^xsd:dateTime ; maps:mapping_source "Organization" ; maps:mapping_source_contact_info "http://www.nlm.nih.gov" ; maps:mapping_source_name "NLM" ; maps:mapping_source_site <http://www.nlm.nih.gov> ; maps:mapping_type "Manual" ; maps:submitted_by 38138 . Noy, N.F., Griffith, N., Musen, M.A.: Collecting community-based mappings in an ontology repository. In: International Semantic Web Conference. pp. 371–386 (2008) 9Tuesday, November 13, 12
    • Challenges, opportunities and lessons learn with sparql.bioontology.org 1. Retrieval of common attributes and how simple reasoning can help. 2. Complexity of query articulation when targeting OWL complex constructions. 3. Best practices in using an open shared endpoint: I. Selective queries work better. II. Careful with large result sets. Paginate. III. How the client reads matter. 10Tuesday, November 13, 12
    • Common Attributes in BP Ontologies taxonomies preferred labels synonyms definitions 11Tuesday, November 13, 12
    • Common Attributes in BP Ontologies taxonomies rdfs:subClassOf preferred labels synonyms definitions 11Tuesday, November 13, 12
    • Common Attributes in BP Ontologies taxonomies rdfs:subClassOf preferred labels synonyms rdfs:subPropertyOf definitions 11Tuesday, November 13, 12
    • Common Attributes in BP Ontologies taxonomies rdfs:subClassOf preferred labels synonyms rdfs:subPropertyOf definitions 12Tuesday, November 13, 12
    • BP Taxonomies Almost every ontology in BioPortal uses rdfs:subClassOf to record class hierarchies. We offer rdfs:subClassOf reasoning to collect hierarchy closures. 13Tuesday, November 13, 12
    • BP Taxonomies Almost every ontology in BioPortal uses rdfs:subClassOf to record class hierarchies. We offer rdfs:subClassOf reasoning to collect hierarchy closures. backward-chain off by default 13Tuesday, November 13, 12
    • BP Taxonomies 2 use cases and their challenges hierarchies with mappings partial traversal 14Tuesday, November 13, 12
    • hierarchies and mappings With mappings one can continue browsing a taxonomy beyond the boundaries of a certain ontology. "malignant hyperthermia" Human Disease Ontology 15Tuesday, November 13, 12
    • hierarchies and mappings With mappings one can continue browsing a taxonomy beyond the boundaries of a certain ontology. 15Tuesday, November 13, 12
    • hierarchies and mappings With mappings one can continue browsing a taxonomy beyond the boundaries of a certain ontology. 15Tuesday, November 13, 12
    • hierarchies and mappings With mappings one can continue browsing a taxonomy beyond the boundaries of a certain ontology. 15Tuesday, November 13, 12
    • partial traversal Some applications need to traverse the hierarchy for a fixed number of steps. 16Tuesday, November 13, 12
    • partial traversal Some applications need to traverse the hierarchy for a fixed number of steps. 16Tuesday, November 13, 12
    • Common Attributes in BP Ontologies taxonomies rdfs:subClassOf preferred labels synonyms rdfs:subPropertyOf definitions 17Tuesday, November 13, 12
    • Common Attributes in BP Ontologies taxonomies rdfs:subClassOf preferred labels synonyms rdfs:subPropertyOf definitions 18Tuesday, November 13, 12
    • BP preferred labels, synonyms and definitions 34 ontologies record preferred labels, synonyms and definitions using their own predicates. When ontology authors upload ontologies into BioPortal they have to choose what are the predicates that represent these attributes. 19Tuesday, November 13, 12
    • rdfs:label SKOS skos:prefLabel skos:altLabel skos:definition pref. label alt. label definition user predicates predicates predicates defined We provide uniform access to these proper ties by linking these different properties to the standard SKOS properties using rdfs:subPropertyOf.Tuesday, November 13, 12
    • By including the rdfs:subPropertyOf links in “globals” we do not need to know what property is used in NIF-RTH to retrieve preferred labels. 21Tuesday, November 13, 12
    • By including the rdfs:subPropertyOf links in “globals” we do not need to know what property is used in NIF-RTH to retrieve preferred labels. 21Tuesday, November 13, 12
    • Challenges, opportunities and lessons learn with sparql.bioontology.org 1. Retrieval of common attributes and how simple reasoning can help. 2. Complexity of query articulation when targeting OWL complex constructions. 3. Best practices in using an open shared endpoint: I. Selective queries work better. II. Careful with large result sets. Paginate. III. How the client reads matter. 22Tuesday, November 13, 12
    • Challenges, opportunities and lessons learn with sparql.bioontology.org 1. Retrieval of common attributes and how simple reasoning can help. 2. Complexity of query articulation when targeting OWL complex constructions. 3. Best practices in using an open shared endpoint: I. Selective queries work better. II. Careful with large result sets. Paginate. III. How the client reads matter. 22Tuesday, November 13, 12
    • Challenges, opportunities and lessons learn with sparql.bioontology.org 1. Retrieval of common attributes and how simple reasoning can help. 2. Complexity of query articulation when targeting OWL complex constructions. 3. Best practices in using an open shared endpoint: I. Selective queries work better. II. Careful with large result sets. Paginate. III. How the client reads matter. 22Tuesday, November 13, 12
    • Complex Query Articulation EquivalentClasses( :x ObjectUnionOf( :Class1 :Class2 :Class3 ) ) . Functional Syntax :x owl:equivalentClass [ owl:Class; RDF Turtle Serialization owl:unionOf ( :Class0 :Class1 :Class2 ) ] . x RDF Model Representation owl:Class owl:equivalentClass rdf:type Class0 Class1 Class2 Anon0 rdf:first rdf:first rdf:first owl:unionOf Anon1 Anon2 Anon3 rdf:nil rdf:rest rdf:rest rdf:rest 23Tuesday, November 13, 12
    • obo:VO_0000001 a owl:Class ; rdfs:label "vaccine" ; rdfs:seeAlso "MeSH: D014612" ; obo:IAO_0000115 "A vaccine is a processed (...) " ; obo:IAO_0000116 "Many vaccines are developed (...) " ; obo:IAO_0000117 "YH, BP, BS, MC, LC, XZ, RS" ; rdfs:subClassOf obo:OBI_0000047 ; owl:equivalentClass [ a owl:Class ; owl:intersectionOf (obo:OBI_0000047 [ Example of a a owl:Restriction ; owl:onProperty obo:BFO_0000085 ; relatively complex owl:someValuesFrom [ a owl:Class ; OWL construction owl:intersectionOf (obo:VO_0000278 [ a owl:Restriction ; from the Vaccine owl:onProperty obo:BFO_0000054 ; owl:someValuesFrom obo:VO_0000494 Ontology ] ) ] ] [ a owl:Restriction ; owl:onProperty obo:OBI_0000312 ; owl:someValuesFrom obo:VO_0000590 ] ) ]. 24Tuesday, November 13, 12
    • Challenges, opportunities and lessons learn with sparql.bioontology.org 1. Retrieval of common attributes and how simple reasoning can help. 2. Complexity of query articulation when targeting OWL complex constructions. 3. Best practices in using an open shared endpoint: I. Selective queries work better. II. Careful with large result sets. Paginate. III. How the client reads matter. 25Tuesday, November 13, 12
    • Challenges, opportunities and lessons learn with sparql.bioontology.org 1. Retrieval of common attributes and how simple reasoning can help. 2. Complexity of query articulation when targeting OWL complex constructions. 3. Best practices in using an open shared endpoint: I. Selective queries work better. II. Careful with large result sets. Paginate. III. How the client reads matter. 25Tuesday, November 13, 12
    • Challenges, opportunities and lessons learn with sparql.bioontology.org 1. Retrieval of common attributes and how simple reasoning can help. 2. Complexity of query articulation when targeting OWL complex constructions. 3. Best practices in using an open shared endpoint: I. Selective queries work better. II. Careful with large result sets. Paginate. III. How the client reads matter. 25Tuesday, November 13, 12
    • Best practices in using a shared SPARQL endpoint selective queries work better control size of result sets - pagination how clients read matters 26Tuesday, November 13, 12
    • selective queries work better 27Tuesday, November 13, 12
    • selective queries work better 27Tuesday, November 13, 12
    • selective queries work better for each ?p 27Tuesday, November 13, 12
    • control size of result sets - pagination 28Tuesday, November 13, 12
    • control size of result sets - pagination while len(results) == LIMIT OFFSET += LIMIT 28Tuesday, November 13, 12
    • Use libraries that parse the result set on demand how clients read matters 60 retrieval of all preferred labels from NCBI Taxonomy (500K solutions) 45 130.0 30 97.5 65.0 15 32.5 0 0 JSON+Python JSON+CJSON XML+Jena ARQ XML+Sesame XML JSON output size in MB parsing time in seconds 29Tuesday, November 13, 12
    • Undefined 1 (2009) 1–5 1 IOS Press Using SPARQL to Query BioPortal Ontologies and Metadata BioPortal as a Dataset of Linked Biomedical Ontologies and Terminologies in RDF. Manuel Salvadores, Matthew Horridge, Paul R. Alexander, Ray W. Fergerson, Mark A. Musen, and Natalya F. Noy Manuel Salvadores, a,⇤ Paul R. Alexander, a Mark A. Musen a and Natalya F. Noy a a Stanford Center for Biomedical Informatics Research Stanford Center for Biomedical Informatics Research Stanford University, US Stanford University, US E-mail: {manuelso, palexander, musen, noy}@stanford.edu, {manuelso,matthew.horridge,palexander,ray.fergerson,musen,noy}@stanford. edu Abstract. BioPortal is a repository of biomedical ontologies—the largest such repository, with more than 300 ontologies to Abstract. BioPortal is a repository of biomedical ontologies—the largest date. This set includes ontologies that were developed in OWL, OBO and other formats, as well as a large number of medical such repository, with more than 300 ontologies to date. This set includes terminologies that the US National Library of Medicine distributes in its own proprietary format. We have published the RDF ontologies that were developed in OWL, OBO and other languages, as version of all these ontologies at http://sparql.bioontology.org. This dataset contains 190M triples, representing well as a large number of medical terminologies that the US National Li- both metadata and content for the 300 ontologies. We use the metadata that the ontology authors provide and simple RDFS brary of Medicine distributes in its own proprietary format. We have pub- reasoning in order to provide dataset users with uniform access to key properties of the ontologies, such as lexical properties for lished the RDF based serializations of all these ontologies and their meta- the class names and provenance data. The dataset also contains 9.8M cross-ontology mappings of different types, generated both data at sparql.bioontology.org. This dataset contains 203M triples, manually and automatically, which come with their own metadata. representing both content and metadata for the 300+ ontologies; and 9M Keywords: biomedical ontologies, BioPortal, RDF, linked data mappings between terms. This endpoint can be queried with SPARQL which opens new usage scenarios for the biomedical domain. This paper presents lessons learned from having redesigned several applications that today use this SPARQL endpoint to consume ontological data. 1. Introduction numerous requests from users for the SPARQL end- In our laboratory, we have developed BioPortal, a point, which would enable them to query and analyze community-based ontology repository for biomedical the data in much more precise and application-specific Keywords: Ontologies, SPARQL, RDF, Biomedical, Linked Data ontologies [20,1]. Users can publish their ontologies ways than our set of REST APIs allowed. to BioPortal, submit new versions, browse the ontolo- This paper describes the Linked Data aspects of the 1 SPARQL In Use In BioPortal: gies, and access the ontologies and their components BioPortal’s ecosystem and the structure of our linked through a set of REST services, SPARQL and de- datasets in RDF. In addition, we describe the process Overview of Opportunities and Challenges that we used to transform different ontology formats referenceable URIs. Over the past four years, as BioPortal grew in popu- into RDF and the mappings between ontologies. We Ontology repositories act as a gateway for users who need to find ontologies for larity, research institutions and corporations have used describe several issues with using the shared SPARQL their applications. Ontology developers submit their ontologies to these reposi- our REST APIs extensively. The use of the REST ser- endpoint elsewhere [10]. This discussion includes the tories in order to promote their vocabularies and to encourage inter-operation. vices has experienced outstanding growth in 2011. The details on retrieving common attributes from multi- In biomedicine, cultural heritage, and other domains, many of the ontologies and average number of hits per month grew from 3M hits ple ontologies, articulating complex queries, and the vocabularies are extremely large, with tens of thousands of classes. in 2010 to 122M hits in 2011.Our users have incorpo- lessons that we have learned on the best practices of In our laboratory, we have developed BioPortal, a community-based ontology rated these services in applications that perform drug using a shared SPARQL endpoint. repository for biomedical ontologies [11]. Users can publish their ontologies to surveillance, gene annotation, enrichment and clas- 2. Biomedical Ontologies in BioPortal BioPortal, submit new versions, browse the ontologies, and access the ontologies sification of scientific literature, and other tasks. In December 2011, we released a public SPARQL end- Researchers and practitioners in the Semantic Web and their components through a set of REST services. BioPortal provides search normally deal with two types of data: (1) ontologies, across all ontologies in its collection, a repository of automatically and manually point, http://sparql.bioontology.org, to provide direct access to our datasets in RDF. We had vocabularies or TBoxes; and (2) instance data or sim- generated mappings between classes in di↵erent ontologies, ontology reviews, ply data. It is important to clarify that BioPortal’s con- new term requests, and discussions generated by the ontology users in the com- tent is almost exclusively ontologies and related arti- munity. BioPortal contains metadata about each ontology and its versions as * Corresponding author. E-mail: manuelso@stanford.edu. facts. By contrast, most other datasets of the Linked well as mappings between terms in di↵erent ontologies. 0000-0000/09/$00.00 c 2009 – IOS Press and the authors. All rights reserved 30Tuesday, November 13, 12
    • Conclusions • Our use of SPARQL is different from many other use cases because our data are primarily ontologies themselves and not data about individuals. • SPARQL and a small amount of reasoning can be particularly powerful in providing easy access to common attributes. • Exposing OWL through a SPARQL endpoint poses a number of challenges. • There are challenges in running a shared open SPARQL endpoint. We can overcome these challenges if we encourage developers to conform to a set of simple best practices. 31Tuesday, November 13, 12
    • Thank you Questions 32Tuesday, November 13, 12