The document discusses using SKOS (Simple Knowledge Organization System) to simplify complex biomedical ontologies modeled in OWL (Web Ontology Language) for practical use by information architects and taxonomists. While OWL enables sophisticated modeling, SKOS provides a simpler framework for representing hierarchies and term properties. The Semantic Web allows both OWL and SKOS to coexist and be linked. Code is presented that extracts hierarchical relationships from an OWL ontology to generate equivalent SKOS properties, satisfying needs of both machine reasoning and human interaction with terminology.
2. • The Working Taxonomist and the need for reusing terms in a
vocabulary network.
• The Semantic Web to the rescue!
– SKOS: Yeah!
– OWL: uh oh!
• The false either/or.
– SKOS: too simple!
– OWL: too complicated!
• The Semantic Web is designed to allow for both the
sophistication of OWL and the practicality of SKOS to coexist.
• Here’s how.
Overview
4. Working Taxonomist
Preferred Name: Cisplatin
A drug used to treat
many types of cancer.
Cisplatin contains the
metal platinum. It kills
cancer cells by
damaging their DNA and
stopping them from
dividing. Cisplatin is a
type of alkylating agent.
Abiplatin
Blastolem
Briplatin
CDDP
Cis-diammine-dichloroplatinum
Cis-platinum
Cis-platinum II Diamine Dichloride
Cismaplat
Cisplatina
Cisplatinum
Cisplatyl
Citoplatino
Cysplatyna
DDP
Lederplatin
Metaplatin
Neoplatin
Peyrone's Chloride
Placis
Plastistil
Platamine
Platiblastin
Platinex
Platinol
Platinol-AQ
6. Working Taxonomist
Cisplatin
A drug used to treat many types of cancer.
Cisplatin contains the metal platinum. It kills
cancer cells by damaging their DNA and
stopping them from dividing. Cisplatin is a type
of alkylating agent.
Cismaplat
Lederplatin
WHAT’S WRONG WITH THIS PICTURE?
8. Semantic Web—a solution?
“SKOS…provides a means for representing knowledge organization systems (including controlled vocabularies, thesauri, taxonomies, and folksonomies) in a distributed and linkable way. SKOS vocabularies provide a cornerstone for linking information on the web. … Publishing vocabularies in SKOS allows the concepts they define to be referenced on a global scale.”
Remember: The “S” is for Simple!
9. SKOS in a nutshell
skos:definition
skos:prefLabel
skos:altLabel
skos:broader
skos:broader
10. SKOS in a nutshell
Written in code, it would look like this:
ncit:C2901 a skos:Concept .
ncit:C2901 skos:prefLabel "Bladder Neoplasm" .
ncit:C2901 skos:altLabel "Tumor of Bladder",
    "Urinary Bladder Tumor" .
ncit:C2901 skos:definition "A benign or malignant…" .
ncit:C2901 skos:broader ncit:C2900 .
ncit:C2901 skos:broader ncit:C3431 .
Simple. Not complicated.
11. Semantic Web—a solution?
PROBLEM SOLVED!!
Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/
13. 1. Semantic Web is about more than just Taxonomies.
2. Biomedical Ontologies are using the Semantic Web
to model the processes of life, and OWL provides
that framework.
3. OWL allows you to model anything. SKOS explicitly
models taxonomies and thesauri.
4. To the Working Taxonomist trying to improve
search, navigation, document management and
build KM solutions, OWL is very complicated.
Semantic Web – what I learned
14. Semantic Web—too complicated?
ncit:C2901 a owl:Thing .
ncit:C2901 rdfs:label "Bladder Neoplasm" .
ncit:C2901 ncit:FULL_SYN "Tumor of Bladder",
    "Urinary Bladder Tumor" .
ncit:C2901 ncit:DEFINITION "A benign or malignant…" .
ncit:C2901 owl:equivalentClass
    [ rdf:type owl:Class ;
      owl:intersectionOf
        (ncit:C2900 ncit:C3431)
    ] .
“Bladder Neoplasm” as represented in OWL.
What happened to the hierarchy?
16. Semantic Web—too complicated?
This is the challenge. The NCI OWL file represents the “hierarchy” like this:
ncit:C2901 owl:equivalentClass
    [ rdf:type owl:Class ;
      owl:intersectionOf
        (ncit:C2900 ncit:C3431)
    ] .
When, as a Working Taxonomist, all I need to represent is this:
ncit:C2901 skos:broader ncit:C2900 .
ncit:C2901 skos:broader ncit:C3431 .
17. • Biomedical ontologies are modeled to describe scientific entities
with enough detail so that machines can make accurate
inferences.
• OWL is needed for that level of sophistication. How OWL is used
can vary greatly, even within the same knowledge domain.
• SKOS is only modeling terminology for the practical application
in knowledge management systems (the way taxonomies have
always been used). SKOS vocabularies can be used
interchangeably.
• OWL helps machines understand information. SKOS helps
humans interact with information.
OWL vs SKOS – either/or?
18. OWL and SKOS – both/and
Q: Can we use the Semantic Web to
allow the sophisticated OWL modeling of
the NCI Thesaurus, but still have our
hierarchy of terms and term properties in
SKOS?
A: Yes! It’s built into the fundamentals
of the Semantic Web.
20. OWL and SKOS – both/and
Graphically, our code will take this OWL:
[Diagram: ncit:C2901 “Bladder Neoplasm” connected through owl:equivalentClass and owl:intersectionOf to ncit:C2900 and ncit:C3431, shown with the labels “Bladder Disorder” and “Urinary System Neoplasm”.]
21. OWL and SKOS – both/and
…and create this SKOS:
[Diagram: ncit:C2901 “Bladder Neoplasm” connected by two skos:broader links to ncit:C2900 and ncit:C3431, shown with the labels “Bladder Disorder” and “Urinary System Neoplasm”.]
22. • Sophisticated ontologies that use OWL are
necessary for certain disciplines and applications.
• We can leverage the basics from these ontologies to
create SKOS properties.
– The capability is built into the Semantic Web.
– It does not have to change, convert, or otherwise
manipulate the original ontology.
– Publishers of ontologies can do this, too.
• We’re just making them more useful to the
Working Taxonomist.
Bottom line
24. Acknowledgements:
• My SmartLogic colleagues, especially Matthieu
Jonglez, Anne Lapkin, Evelyn Kent, and Stuart Laurie.
• My early Semantic Web mentors: Dean Allemang, Bob
Ducharme, Tom Plasterer, and Kerstin Forsberg.
• The Special Libraries Association Taxonomy
Community of Practice.
• The American Library Association’s Linked Library
Data Interest Group, especially Theodore Gerontakos
and Sarah Quimby.
Thank you!
26. SKOS in a nutshell
[Diagram: ncit:C2901 with skos:prefLabel “Bladder Neoplasm”, skos:altLabel “Tumor of Bladder” and “Urinary Bladder Tumor”, and skos:definition “A benign or malignant, primary or metastatic neoplasm of the bladder.” Two skos:broader links lead to parent concepts with skos:prefLabel “Bladder Disorder” and “Urinary System Neoplasm”.]
27. Semantic Web—too complicated?
[Diagram: ncit:C2901 with rdfs:label “Bladder Neoplasm”, ncit:FULL_SYN “Tumor of Bladder” and “Urinary Bladder Tumor”, and ncit:DEFINITION “A benign or malignant, primary or metastatic neoplasm of the bladder.” It is connected through owl:equivalentClass and owl:intersectionOf to ncit:C2900 and ncit:C3431, shown with rdfs:label “Bladder Disorder” and “Urinary System Neoplasm”.]
Editor's Notes
The Working Taxonomist – scenario showing the need for reusing terms in a vocabulary network.
The Semantic Web to the rescue!
SKOS: for taxonomies.
OWL: for ontological modeling.
Biomedical ontologies tend more towards OWL.
The false either/or.
Biomedical ontologies – focused on machine learning, modeling reality. Show complex NCIt modeling using OWL.
Biomedical taxonomies – focused on human learning, KM applications. Show SKOS modeling of same concept.
Semantic Web fundamentals are designed to allow for both to coexist.
Here’s how.
Here’s my story as a working taxonomist --
In managing an enterprise taxonomy for a pharma company, I would reference external vocabularies (I’m not a doctor, just a humble Librarian). What do librarians do when they don’t know something? They LOOK IT UP!
For example I might receive a request to add a new drug to the enterprise taxonomy.
ChEBI – Chemical Entities of Biological Interest, from the European Bioinformatics Institute.
NCIt as one example – looks like a good record for Cisplatin. I’ll use some of it.
And of course I’ll do the same for the hierarchy – and I may need to create more new terms, and copy those labels and definitions over, to complete the tree and relationships needed for this new concept.
Drag and Drop, Drag and Drop. Don’t I drag enough around all day!
If I’m writing a blog and want to reference another page, do I copy and paste the contents of the other page into my blog? No! I reference it using URLs and Links. Cutting and pasting data can feel just as crazy.
Yes, you could bring some of these vocabularies into your own Taxonomy management system, and then borrow terms as needed in a poly-hierarchical way. But:
* you’re still copying entire vocabularies, probably converting them to a proprietary format, and ingesting them into your system. You’re maintaining custom conversion code, etc. You may need a local copy of the vocabulary, but if the vocabularies themselves were available in SKOS, it would make this much easier.
Then I learned about the Semantic Web and SKOS
The Semantic Web was designed to solve the exact challenge I was facing. My first class in semantic web was given by Dean Allemang who literally wrote the book – probably the best book – on the subject. And he talks specifically about SKOS – read quote. That’s exactly what I needed! I started imagining how it would work with my favorite vocabularies – like the NCI Thesaurus:…
(Other SKOS advantages:
Plug-and-play vocabularies into the same system, without depending on custom code for each vocabulary.
Link concepts across vocabularies so you can search across disparate data sets with a rich synonym set, or be able to deep dive into each repository with its particular vocabulary.
)
The NCIT is presented on the web as a totally recognizable thesaurus – hierarchy and term properties and relationships – what we working taxonomists deal with every day. SKOS is designed for it. These SKOS properties and a few others are all that’s needed to create this presentation of information.
One quick look at what the actual code would look like. This is, basically, all SKOS is. If you write it like this, with the appropriate header information, any SKOS compliant system can understand it. That’s important – any system could understand it.
So the next step should be easy – if all the vocabularies we wanted were available in SKOS they would be easy to reuse and network. Where do we look for them? In the “cloud”…
(Note that “ncit:C2901” is shorthand for the actual URI of that concept, which is: http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C2901 )
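To make the shorthand concrete, here is a minimal Python sketch of prefix expansion. The `PREFIXES` table and the `expand` helper are hypothetical names for illustration; the ncit namespace is the one quoted in the note above, and the skos namespace is the standard SKOS core namespace.

```python
# Hypothetical prefix table for illustration. The ncit namespace comes from
# the note above; the skos namespace is the standard SKOS core namespace.
PREFIXES = {
    "ncit": "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name such as 'ncit:C2901' into a full URI."""
    prefix, _, local = prefixed_name.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix}")
    return PREFIXES[prefix] + local


print(expand("ncit:C2901"))
# prints http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C2901
```

Any SKOS-aware tool does this expansion for you; the point is only that the short form and the long form name the same concept.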
If you know about the Semantic Web, you know about the great Linked Open Data cloud diagram. All we need to do now is go into the cloud, pick out the SKOS taxonomies that are out there and presto – a vocabulary network! Let’s zero in on the biomedical/life sciences part of the cloud and take a look…
Hmmmm, well, this is interesting. I see MeSH listed, but in the context of Bio2RDF, whatever that is. I don’t see any explicitly SKOS sources. And Bio2RDF must be an aggregator, not the actual publisher of these sources.
Biomedical ontologies are oriented towards modeling diseases, chemicals, biological systems and processes. Not about taxonomies or “Simple” knowledge organization systems.
Here’s what I learned very quickly…
Semantic Web is not just about taxonomies… this is embarrassing, but it took me a while to realize that. After all, Semantic Web folks are talking all the time about “ontologies” and “vocabularies” – isn’t that what I, the Working Taxonomist, deal with? Well, not really.
Especially in the Biomedical domain – ontologies mean more than just taxonomies and thesauri; they’re modeling the actual processes of life along with interactions with things like drugs and the environment. It can get very complicated – believe me, just go to a BioOntology conference to find out. It will blow your mind.
But for the Working Taxonomist trying to implement practical KM solutions for our businesses, it is overly complicated.
I found this out very clearly when I found that my favorite thesaurus, the NCI Thesaurus, was available in Semantic Web format – but modeled in OWL. This is how our concept “Bladder Neoplasm” is modeled in the NCI OWL format:…
Here’s how our concept “Bladder neoplasms” is actually represented in the NCI Thesaurus OWL file, which is available free for download. Actually the labels, synonyms and definitions are modeled with additional XML within the properties, but I’ve removed that complexity for the sake of the example.
Again don’t worry about understanding the code. The main label, synonyms, and definition are somewhat the same, but focus on how the “hierarchy” is represented – what is going on here?
This took me a while to understand. What this is saying is that “Bladder Neoplasm” is equivalent to something that is BOTH a “Bladder Disorder” AND a “Urinary System Neoplasm”. It’s equal to the combination of both – not one or the other. So every “Bladder Neoplasm” is a “Urinary System Neoplasm” and also a “Bladder Disorder”, but the reverse does not hold: only something that is both at once counts as a “Bladder Neoplasm”. The parents are in there, but a machine has to work them out from the definition instead of reading them off a simple hierarchy.
**But honestly if you understand this modeling, you’re probably at the wrong conference! **
And this is a very simple OWL construct. You can imagine how sophisticated this modeling can get when you think about the nature of diseases, drugs, and biological processes. Even in the NCI Thesaurus OWL file this is a relatively simple structure.
**If you want to understand how this can be represented in a poly-hierarchy, then you’re at the right conference! **
One more look at this as code – remember we saw what the SKOS code would look like earlier. This is what the OWL code would look like:
When I first discovered that this was how they were modeling the “hierarchy” my mind was blown. Now I understand why, but all I wanted was to see my term in the tree – just as on the NCIt website.
*This is what people mean when they say SKOS isn’t sophisticated enough. And it’s what people mean when they say OWL is too complicated for taxonomies. *
Here’s my message: They’re both right. Here’s why.
OWL is used to model knowledge so that machines can understand. SKOS is modeling concepts to help Humans understand. Like the web presentation of the NCI Thesaurus – the OWL modeling of Bladder Neoplasms is not needed to present the NCI Thesaurus to the user in a meaningful way. SKOS, however, can do it very well. Better yet, the same presentation could be used for any SKOS-modeled vocabulary.
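As a rough illustration of why one presentation can serve any SKOS vocabulary, the Python sketch below builds an indented tree from nothing but skos:prefLabel and skos:broader statements. The triples are held as plain tuples, `render_tree` is a hypothetical helper, and pairing the two parent codes with their labels follows the slide order, which is an assumption. It also assumes the hierarchy has no cycles.

```python
from collections import defaultdict

# (subject, predicate, object) statements using the talk's prefixed names.
# Assumption: the label-to-code pairing of the two parents follows slide order.
TRIPLES = [
    ("ncit:C2901", "skos:prefLabel", "Bladder Neoplasm"),
    ("ncit:C2900", "skos:prefLabel", "Bladder Disorder"),
    ("ncit:C3431", "skos:prefLabel", "Urinary System Neoplasm"),
    ("ncit:C2901", "skos:broader", "ncit:C2900"),
    ("ncit:C2901", "skos:broader", "ncit:C3431"),
]


def render_tree(triples):
    """Return indented label lines, one subtree per top-level concept."""
    labels = {s: o for s, p, o in triples if p == "skos:prefLabel"}
    children = defaultdict(list)
    has_parent = set()
    for s, p, o in triples:
        if p == "skos:broader":
            children[o].append(s)  # o is the broader (parent) concept
            has_parent.add(s)
    concepts = set(labels) | set(children) | has_parent
    lines = []

    def walk(concept, depth):
        # Fall back to the identifier when a concept has no label.
        lines.append("  " * depth + labels.get(concept, concept))
        for child in sorted(children[concept]):
            walk(child, depth + 1)

    for root in sorted(c for c in concepts if c not in has_parent):
        walk(root, 0)
    return lines


print("\n".join(render_tree(TRIPLES)))
```

“Bladder Neoplasm” shows up under both parents, which is the poly-hierarchy a taxonomist expects to see. Nothing in the function is specific to NCIt.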
If the answer was NO, I wouldn’t be here!! Next I’m going to show you a bit of Semantic Web code. And the point is not to teach the code, but to provide concrete evidence that this actually works.
This is the heart of my presentation. This is not just an idea – it really works.
The point here is not to understand any of this syntax, or to teach OWL, or even SKOS. The point is that it’s possible, and it’s built into the Semantic Web standards. Because the basic building blocks of all semantic web data are so fundamental, you can do these transformations.
This is not a Perl script or any other traditional code used to transform data – it’s built into the same language that is used to define the data itself.
The same type of rule can be used to accommodate the custom NCIt relationship types, e.g. ncit:DEFINITION can be rendered as skos:definition.
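The property mapping can be sketched in a few lines of Python. This is only an illustration of the logic, not the rule used in the talk, which is written in the Semantic Web's own language. The `PROPERTY_MAP` table and `to_skos` helper are hypothetical, and the label and synonym mappings are inferred by comparing the SKOS and OWL slides.

```python
# Hypothetical mapping table. ncit:DEFINITION -> skos:definition is stated in
# the notes; the other two rows are inferred from comparing the two slides.
PROPERTY_MAP = {
    "rdfs:label": "skos:prefLabel",
    "ncit:FULL_SYN": "skos:altLabel",
    "ncit:DEFINITION": "skos:definition",
}

NCIT_TRIPLES = [
    ("ncit:C2901", "rdfs:label", "Bladder Neoplasm"),
    ("ncit:C2901", "ncit:FULL_SYN", "Tumor of Bladder"),
    ("ncit:C2901", "ncit:FULL_SYN", "Urinary Bladder Tumor"),
    ("ncit:C2901", "ncit:DEFINITION",
     "A benign or malignant, primary or metastatic neoplasm of the bladder."),
]


def to_skos(triples):
    """Return new SKOS statements; the original statements are not modified."""
    return [(s, PROPERTY_MAP[p], o) for s, p, o in triples if p in PROPERTY_MAP]


for statement in to_skos(NCIT_TRIPLES):
    print(statement)
```

The SKOS statements are added alongside the originals, so the source ontology stays exactly as published.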
Looking at this transformation graphically – here’s our OWL model. Our code will create SKOS from it.
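The slide with the actual rule is not part of this transcript, so the following is a plain-Python stand-in for the same logic, not the speaker's code. It assumes the OWL axiom is available as simple tuples, with the parenthesised list expanded into the rdf:first / rdf:rest chain that Turtle lists stand for. The blank-node names and the helper functions are made up for the sketch.

```python
# Stand-in for the OWL axiom on the slide. Turtle's "(ncit:C2900 ncit:C3431)"
# expands to an rdf:first / rdf:rest linked list; the blank-node names
# (_:def, _:l1, _:l2) are invented for this sketch.
OWL_TRIPLES = [
    ("ncit:C2901", "owl:equivalentClass", "_:def"),
    ("_:def", "rdf:type", "owl:Class"),
    ("_:def", "owl:intersectionOf", "_:l1"),
    ("_:l1", "rdf:first", "ncit:C2900"),
    ("_:l1", "rdf:rest", "_:l2"),
    ("_:l2", "rdf:first", "ncit:C3431"),
    ("_:l2", "rdf:rest", "rdf:nil"),
]


def objects(triples, subject, predicate):
    """All objects of statements matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def list_members(triples, head):
    """Walk an RDF list from its head node and return the members in order."""
    members, seen = [], set()
    while head != "rdf:nil" and head not in seen:
        seen.add(head)  # guard against a malformed, circular list
        members.extend(objects(triples, head, "rdf:first"))
        rest = objects(triples, head, "rdf:rest")
        if not rest:
            break
        head = rest[0]
    return members


def derive_broader(triples):
    """Create skos:broader statements from equivalentClass intersections."""
    derived = []
    for s, p, o in triples:
        if p != "owl:equivalentClass":
            continue
        for head in objects(triples, o, "owl:intersectionOf"):
            for member in list_members(triples, head):
                if not member.startswith("_:"):  # keep named classes only
                    derived.append((s, "skos:broader", member))
    return derived


for statement in derive_broader(OWL_TRIPLES):
    print(statement)
```

Run against the data above, this yields the two skos:broader statements from the earlier SKOS slide, and the OWL statements are left as they were.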
We’re adding VALUE to them by making them more widely and more easily applicable to a broader set of use cases and applications: Search, SharePoint, Knowledge Management, Document Management – the focus of this KM conference.
I encourage everyone to embrace SKOS: taxonomists, ontologists, software providers, and vocabulary publishers.
‘End’ slide
Let’s look at those properties graphically… Breaking out those term properties and relationships using SKOS modeling.
For clarity, this is both a simplification, and a combination of two OWL representations of NCIt. However, the OWL modeling is the same in both.
Labels, definitions, pretty straightforward. However, it’s the “hierarchy” that I want to focus on.
Let’s look at the OWL properties in more detail…