BioPortal is a repository of biomedical ontologies, the largest such repository, with more than 300 ontologies to date. This set includes ontologies that were developed in OWL, OBO and other languages, as well as a large number of medical terminologies that the US National Library of Medicine distributes in its own proprietary format. We have published the RDF-based serializations of all these ontologies and their metadata at sparql.bioontology.org.
Using SPARQL to Query BioPortal Ontologies and Metadata
1. Using SPARQL to Query BioPortal
Ontologies and Metadata
Manuel Salvadores, Matthew Horridge, Paul R. Alexander,
Ray W. Fergerson, Mark A. Musen, and Natasha F. Noy
Stanford Center for Biomedical Informatics Research (BMIR)
Stanford University
ISWC 2012, Boston, US
sparql.bioontology.org
1
Tuesday, November 13, 12
3. The main entry points to BioPortal data are
the REST APIs.
Via REST services we cannot answer
queries that require fine-grained access to the
data.
SPARQL offers finer data access.
Challenges, opportunities and lessons
learnt with sparql.bioontology.org
8. Our SPARQL endpoint is different from others
because our data are primarily ontologies
themselves and not data about individuals.
Still, the lessons learnt apply to other domains. We have
to deal with:
performance
scalability
heterogeneity
query articulation
10. Challenges, opportunities and lessons learnt with
sparql.bioontology.org
1. Retrieval of common attributes and how simple
reasoning can help.
2. Complexity of query articulation when targeting
complex OWL constructions.
3. Best practices in using an open shared endpoint:
I. Selective queries work better.
II. Be careful with large result sets. Paginate.
III. How the client reads matters.
16. BioPortal Data
• Ontology Content
• OBO Format
• Rich Release Format (RRF)
• OWL
• Ontology Metadata
• Mapping Data
All of the above → RDF → Triple Store → SPARQL
20. RDF - Ontology Metadata
[Diagram: nodes of type meta:VirtualOntology (ontology/1353) and omv:Ontology (ontology/46896, ontology/46116, ontology/42122, ...) with the attributes version, name, date and format; meta:hasVersion edges from the virtual ontology to its versions; a meta:hasDataGraph edge to the graph <http://bioportal.bioontology.org/ontologies/SNOMED>.]
21. RDF - Mappings
<http://purl.bioontology.org/mapping/2767e8e0-001b-012e-749f-005056bd0010>
maps:has_process_info <.../procinfo/2008-04-23-38138> ;
maps:comment "Manual mappings between Mouse anatomy and NCIT." ;
maps:relation skos:closeMatch ;
maps:target <http://purl.org/obo/owl/MA#MA_0001096> ;
maps:source <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#Olfactory_Nerve> ;
maps:source_ontology_id <http://bioportal.bioontology.org/ontologies/1032> ;
maps:target_ontology_id <http://bioportal.bioontology.org/ontologies/1000> ;
a maps:One_To_One_Mapping .
<http://purl.bioontology.org/mapping/nonloom/procinfo/2008-04-23-38138>
maps:date "2008-04-23T19:21:45Z"^^xsd:dateTime ;
maps:mapping_source "Organization" ;
maps:mapping_source_contact_info "http://www.nlm.nih.gov" ;
maps:mapping_source_name "NLM" ;
maps:mapping_source_site <http://www.nlm.nih.gov> ;
maps:mapping_type "Manual" ;
maps:submitted_by 38138 .
Noy, N.F., Griffith, N., Musen, M.A.: Collecting community-based mappings in an ontology
repository. In: International Semantic Web Conference. pp. 371–386 (2008)
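The mapping triples above can be queried by their source term. The following is a minimal sketch, not the authors' code: it only builds the query text, the helper name build_mapping_query is hypothetical, and the namespace bound to the maps: prefix is an assumption because the slide does not spell it out.

```python
from string import Template

# Assumed namespace for the maps: prefix; the slide does not spell it out.
MAPS_NS = "http://protege.stanford.edu/ontologies/mappings/mappings.rdfs#"

MAPPING_QUERY = Template("""\
PREFIX maps: <$maps_ns>
SELECT ?mapping ?target ?relation
WHERE {
  ?mapping maps:source <$source> ;
           maps:target ?target ;
           maps:relation ?relation .
}
LIMIT $limit""")


def build_mapping_query(source_iri, limit=100, maps_ns=MAPS_NS):
    """Return a SPARQL query listing the mappings whose source is source_iri."""
    # Reject characters that cannot appear inside an <...> IRI reference.
    if any(ch in source_iri for ch in '<>"{} \n'):
        raise ValueError("source_iri does not look like a plain IRI")
    return MAPPING_QUERY.substitute(
        maps_ns=maps_ns, source=source_iri, limit=int(limit))
```

Called with the NCI Thesaurus IRI from the slide, the function returns a query that selects each mapping together with its target term and its relation (for example skos:closeMatch).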
23. Common Attributes in BP Ontologies
taxonomies: rdfs:subClassOf
preferred labels, synonyms, definitions: rdfs:subPropertyOf
27. BP Taxonomies
Almost every ontology in BioPortal uses
rdfs:subClassOf to record class hierarchies.
We offer rdfs:subClassOf reasoning to collect
hierarchy closures.
The reasoning is backward-chained and off by default.
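To make "hierarchy closure" concrete: backward chaining generally means that entailed triples are derived when a query asks for them rather than stored in advance. The sketch below is an in-memory illustration of that idea, not the endpoint's implementation; the function name and the toy class names in the usage note are hypothetical.

```python
from collections import defaultdict


def superclass_closure(cls, subclass_of):
    """Return every ancestor of cls reachable through subClassOf edges.

    subclass_of is an iterable of (child, parent) pairs. The closure is
    computed on demand for one class; nothing is materialised up front.
    """
    parents = defaultdict(set)
    for child, parent in subclass_of:
        parents[child].add(parent)

    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen
```

For the edges [("C", "B"), ("B", "A")], superclass_closure("C", ...) collects both "B" and "A", which is what a closure-aware rdfs:subClassOf pattern would be expected to match.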
29. BP Taxonomies
Two use cases and their challenges:
hierarchies with mappings
partial traversal
30. hierarchies and mappings
With mappings one can continue browsing a taxonomy
beyond the boundaries of a certain ontology.
Example: "malignant hyperthermia" in the Human Disease Ontology.
34. partial traversal
Some applications need to traverse the
hierarchy for a fixed number of steps.
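One way to express a fixed number of steps is to chain one rdfs:subClassOf triple pattern per step. The sketch below only builds the query text; the function name is hypothetical, and it assumes each pattern matches asserted (direct) links, that is, the subClassOf reasoning is left off.

```python
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"


def build_ancestors_query(class_iri, steps):
    """Build a query for the classes exactly `steps` subClassOf hops above class_iri."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    lines = [
        f"PREFIX rdfs: <{RDFS_NS}>",
        f"SELECT DISTINCT ?c{steps}",
        "WHERE {",
    ]
    previous = f"<{class_iri}>"
    for i in range(1, steps + 1):
        # One triple pattern per step: previous node -> next ancestor variable.
        lines.append(f"  {previous} rdfs:subClassOf ?c{i} .")
        previous = f"?c{i}"
    lines.append("}")
    return "\n".join(lines)
```

With steps=2 the WHERE clause holds two chained patterns, and only the variable for the last hop is selected.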
38. BP preferred labels,
synonyms and definitions
34 ontologies record preferred labels,
synonyms and definitions using their
own predicates.
When ontology authors upload ontologies into BioPortal, they have
to choose which predicates represent these attributes.
39. [Diagram: user-defined pref. label, alt. label and definition predicates link to the SKOS properties skos:prefLabel, skos:altLabel and skos:definition; rdfs:label also appears in the diagram.]
We provide uniform access to these
properties by linking these different
properties to the standard SKOS properties
using rdfs:subPropertyOf.
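The effect of these rdfs:subPropertyOf links can be illustrated in memory. This is a minimal sketch, assuming triples are plain Python tuples; the function name and the ex: identifiers in the usage note are hypothetical, and the endpoint itself applies the equivalent inference inside the triple store.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"


def values_via_subproperties(subject, target_property, triples, sub_property_of):
    """Return the values of target_property for subject, including values
    asserted with any property declared (transitively) as its sub-property.

    triples: iterable of (subject, predicate, object) tuples.
    sub_property_of: iterable of (sub_property, super_property) pairs.
    """
    links = list(sub_property_of)
    wanted = {target_property}
    changed = True
    while changed:  # fixed point over the rdfs:subPropertyOf links
        changed = False
        for sub, sup in links:
            if sup in wanted and sub not in wanted:
                wanted.add(sub)
                changed = True
    return sorted(o for s, p, o in triples if s == subject and p in wanted)
```

If an ontology-specific predicate such as ex:label is declared a sub-property of skos:prefLabel, asking for skos:prefLabel also returns the values recorded with ex:label, so the caller never needs to know the ontology's own predicate.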
40. By including the
rdfs:subPropertyOf links in
“globals”, we do not need to
know which property NIF-RTH
uses for preferred labels in
order to retrieve them.
46. Example of a relatively complex OWL construction from the Vaccine Ontology
obo:VO_0000001
    a owl:Class ;
    rdfs:label "vaccine" ;
    rdfs:seeAlso "MeSH: D014612" ;
    obo:IAO_0000115 "A vaccine is a processed (...) " ;
    obo:IAO_0000116 "Many vaccines are developed (...) " ;
    obo:IAO_0000117 "YH, BP, BS, MC, LC, XZ, RS" ;
    rdfs:subClassOf obo:OBI_0000047 ;
    owl:equivalentClass [
        a owl:Class ;
        owl:intersectionOf (obo:OBI_0000047
            [
                a owl:Restriction ;
                owl:onProperty obo:BFO_0000085 ;
                owl:someValuesFrom [
                    a owl:Class ;
                    owl:intersectionOf (obo:VO_0000278
                        [
                            a owl:Restriction ;
                            owl:onProperty obo:BFO_0000054 ;
                            owl:someValuesFrom obo:VO_0000494
                        ]
                    )
                ]
            ]
            [
                a owl:Restriction ;
                owl:onProperty obo:OBI_0000312 ;
                owl:someValuesFrom obo:VO_0000590
            ]
        )
    ].
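To give a sense of why such constructions are hard to query, the sketch below builds a query for just the top-level someValuesFrom restrictions inside a class's owl:equivalentClass intersection. It is an illustration under assumptions, not a query from the talk: the function name is hypothetical, the query relies on SPARQL 1.1 property paths (rdf:rest*/rdf:first) to walk the RDF list, and whether the endpoint supports them is not stated in the slides.

```python
PREFIXES = """\
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
"""


def build_restriction_query(class_iri):
    """Build a query for the existential restrictions that appear directly
    in the intersection of a class's owl:equivalentClass definition."""
    return PREFIXES + f"""\
SELECT ?property ?filler
WHERE {{
  <{class_iri}> owl:equivalentClass ?definition .
  ?definition owl:intersectionOf ?list .
  ?list rdf:rest*/rdf:first ?member .
  ?member a owl:Restriction ;
          owl:onProperty ?property ;
          owl:someValuesFrom ?filler .
}}"""
```

Even this query reaches only one level of nesting. Restrictions nested inside a filler, as in the vaccine example, need the same pattern repeated for each further level.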
50. Best practices in using a
shared SPARQL endpoint
selective queries work better
control size of result sets - pagination
how clients read matters
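55. control size of result sets - pagination
results = run_query(LIMIT, OFFSET)  # run_query: placeholder for the call that executes the query
while len(results) == LIMIT:
    OFFSET += LIMIT
    results = run_query(LIMIT, OFFSET)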
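The loop on this slide can be written as a small generator. This is a sketch under assumptions: run_query is a hypothetical callable that sends a query string to the endpoint and returns one page of solutions as a list, and base_query is expected to carry its own ORDER BY if a stable page order matters.

```python
def paginate(run_query, base_query, limit=1000):
    """Yield every solution of base_query, fetching `limit` rows per request.

    run_query is a caller-supplied function (hypothetical here) that takes
    a SPARQL query string and returns a list of solutions.
    """
    offset = 0
    while True:
        page = run_query(f"{base_query}\nLIMIT {limit} OFFSET {offset}")
        yield from page
        if len(page) < limit:  # a short page means there is nothing left
            break
        offset += limit
```

Because it is a generator, the caller can stop early without requesting the remaining pages, which keeps the load on a shared endpoint down.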
56. how clients read matters
Use libraries that parse the
result set on demand.
[Chart: retrieval of all preferred labels from NCBI Taxonomy (500K solutions). Bars compare JSON+Python, JSON+CJSON, XML+Jena ARQ and XML+Sesame by output size in MB (XML vs. JSON) and parsing time in seconds.]
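Parsing on demand can be sketched with Python's standard library. This is not one of the client stacks measured on the slide; it uses xml.etree.ElementTree.iterparse over the SPARQL XML results format, and the function name iter_solutions is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C SPARQL Query Results XML Format.
SR = "{http://www.w3.org/2005/sparql-results#}"


def iter_solutions(stream):
    """Yield one dict per <result> element as soon as it has been read."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == SR + "result":
            row = {}
            for binding in elem.findall(SR + "binding"):
                value = binding[0]  # the <uri>, <literal> or <bnode> child
                row[binding.get("name")] = value.text
            yield row
            elem.clear()  # drop this solution's children once consumed
```

The stream can be any binary file-like object, such as an HTTP response body, so a client can start processing solutions before the whole document has arrived.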
57. Using SPARQL to Query BioPortal Ontologies and Metadata
Manuel Salvadores, Matthew Horridge, Paul R. Alexander, Ray W. Fergerson, Mark A. Musen, and Natalya F. Noy
Stanford Center for Biomedical Informatics Research, Stanford University, US
{manuelso,matthew.horridge,palexander,ray.fergerson,musen,noy}@stanford.edu
Abstract. BioPortal is a repository of biomedical ontologies, the largest such repository, with more than 300 ontologies to date. This set includes ontologies that were developed in OWL, OBO and other languages, as well as a large number of medical terminologies that the US National Library of Medicine distributes in its own proprietary format. We have published the RDF based serializations of all these ontologies and their metadata at sparql.bioontology.org. This dataset contains 203M triples, representing both content and metadata for the 300+ ontologies; and 9M mappings between terms. This endpoint can be queried with SPARQL which opens new usage scenarios for the biomedical domain. This paper presents lessons learned from having redesigned several applications that today use this SPARQL endpoint to consume ontological data.
Keywords: Ontologies, SPARQL, RDF, Biomedical, Linked Data
1 SPARQL In Use In BioPortal: Overview of Opportunities and Challenges
Ontology repositories act as a gateway for users who need to find ontologies for their applications. Ontology developers submit their ontologies to these repositories in order to promote their vocabularies and to encourage inter-operation. In biomedicine, cultural heritage, and other domains, many of the ontologies and vocabularies are extremely large, with tens of thousands of classes.
In our laboratory, we have developed BioPortal, a community-based ontology repository for biomedical ontologies [11]. Users can publish their ontologies to BioPortal, submit new versions, browse the ontologies, and access the ontologies and their components through a set of REST services. BioPortal provides search across all ontologies in its collection, a repository of automatically and manually generated mappings between classes in different ontologies, ontology reviews, new term requests, and discussions generated by the ontology users in the community. BioPortal contains metadata about each ontology and its versions as well as mappings between terms in different ontologies.

Undefined 1 (2009) 1–5, IOS Press
BioPortal as a Dataset of Linked Biomedical Ontologies and Terminologies in RDF.
Manuel Salvadores (a,*), Paul R. Alexander (a), Mark A. Musen (a) and Natalya F. Noy (a)
(a) Stanford Center for Biomedical Informatics Research, Stanford University, US
E-mail: {manuelso, palexander, musen, noy}@stanford.edu
Abstract. BioPortal is a repository of biomedical ontologies, the largest such repository, with more than 300 ontologies to date. This set includes ontologies that were developed in OWL, OBO and other formats, as well as a large number of medical terminologies that the US National Library of Medicine distributes in its own proprietary format. We have published the RDF version of all these ontologies at http://sparql.bioontology.org. This dataset contains 190M triples, representing both metadata and content for the 300 ontologies. We use the metadata that the ontology authors provide and simple RDFS reasoning in order to provide dataset users with uniform access to key properties of the ontologies, such as lexical properties for the class names and provenance data. The dataset also contains 9.8M cross-ontology mappings of different types, generated both manually and automatically, which come with their own metadata.
Keywords: biomedical ontologies, BioPortal, RDF, linked data
1. Introduction
In our laboratory, we have developed BioPortal, a community-based ontology repository for biomedical ontologies [20,1]. Users can publish their ontologies to BioPortal, submit new versions, browse the ontologies, and access the ontologies and their components through a set of REST services, SPARQL and de-referenceable URIs.
Over the past four years, as BioPortal grew in popularity, research institutions and corporations have used our REST APIs extensively. The use of the REST services has experienced outstanding growth in 2011. The average number of hits per month grew from 3M hits in 2010 to 122M hits in 2011. Our users have incorporated these services in applications that perform drug surveillance, gene annotation, enrichment and classification of scientific literature, and other tasks. In December 2011, we released a public SPARQL endpoint, http://sparql.bioontology.org, to provide direct access to our datasets in RDF. We had numerous requests from users for the SPARQL endpoint, which would enable them to query and analyze the data in much more precise and application-specific ways than our set of REST APIs allowed.
This paper describes the Linked Data aspects of the BioPortal’s ecosystem and the structure of our linked datasets in RDF. In addition, we describe the process that we used to transform different ontology formats into RDF and the mappings between ontologies. We describe several issues with using the shared SPARQL endpoint elsewhere [10]. This discussion includes the details on retrieving common attributes from multiple ontologies, articulating complex queries, and the lessons that we have learned on the best practices of using a shared SPARQL endpoint.
2. Biomedical Ontologies in BioPortal
Researchers and practitioners in the Semantic Web normally deal with two types of data: (1) ontologies, vocabularies or TBoxes; and (2) instance data or simply data. It is important to clarify that BioPortal’s content is almost exclusively ontologies and related artifacts. By contrast, most other datasets of the Linked [...]
* Corresponding author. E-mail: manuelso@stanford.edu.
0000-0000/09/$00.00 © 2009 – IOS Press and the authors. All rights reserved
58. Conclusions
• Our use of SPARQL is different from many other use cases
because our data are primarily ontologies themselves and
not data about individuals.
• SPARQL and a small amount of reasoning can be
particularly powerful in providing easy access to common
attributes.
• Exposing OWL through a SPARQL endpoint poses a
number of challenges.
• There are challenges in running a shared open SPARQL
endpoint. We can overcome these challenges if we
encourage developers to conform to a set of simple best
practices.
59. Thank you
Questions