Ontologies: Necessary, but not Sufficient
Robert Stevens
School of Computer Science
The University of Manchester
Manchester
United Kingdom
M13 9PL
Robert.Stevens@Manchester.ac.UK
Knowing what
we’re talking about
• This is what ontologies are for
• For humans and machines
Number of PubMed papers per year for
1998 to 2016 (without normalisation)
PubMed search:
(ontology[All Fields] OR ontologies[All Fields])
Number of PubMed papers per year for
1998 to 2016 (with normalisation)
Number of PubMed papers per year for
1998 to 2016 (without normalisation)
PubMed search:
("gene ontology"[MeSH Terms] OR ("gene"[All Fields] AND
"ontology"[All Fields]) OR "gene ontology"[All Fields])
Number of PubMed papers per year for
1998 to 2016 (with normalisation)
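The slides do not say how the counts were normalised. A minimal sketch, assuming normalisation means dividing each year's hits by the total number of PubMed papers for that year. The function name and the numbers in the usage lines are placeholders for illustration, not real PubMed counts.

```python
def normalise(hits_per_year, totals_per_year):
    """Return each year's hits as a fraction of all papers for that year.

    Assumption: "normalisation" means hits / total papers in the same year.
    """
    normalised = {}
    for year, hits in hits_per_year.items():
        total = totals_per_year[year]
        if total <= 0:
            raise ValueError(f"no usable total for {year}")
        normalised[year] = hits / total
    return normalised


# Placeholder values, for illustration only (not real PubMed counts).
example_hits = {2000: 10, 2001: 30}
example_totals = {2000: 1000, 2001: 1500}
print(normalise(example_hits, example_totals))
```

Dividing by the yearly total separates real growth in ontology papers from the general growth of the literature.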
PubMed papers from
1998 to 2016:
“ontology OR ontologies”
Other GO ontology
Other: 25%
GO: 75%
Word cloud for all the citations of
the 1998 search
Total 35
citations
Text from all
fields
No data
cleaning
Make word cloud via https://www.jasondavies.com/wordcloud/
PubMed search: (ontology[All Fields] OR ontologies[All Fields])
AND ("1998/01/01"[PDAT] : "1998/12/31"[PDAT])
Word cloud for all the citations of
the 2005 search
First 200
citations of a
total of 516
for 2005
Text from all
fields
No data
cleaning
Make word cloud via https://www.jasondavies.com/wordcloud/
PubMed search: (ontology[All Fields] OR ontologies[All Fields])
AND ("2005/01/01"[PDAT] : "2005/12/31"[PDAT])
Word cloud for the first 200
citations of the 2016 search
First 200
citations of a
total of 2647
for 2016
Text from all
fields
No data
cleaning
Make word cloud via https://www.jasondavies.com/wordcloud/
PubMed search: (ontology[All Fields] OR ontologies[All Fields])
AND ("2016/01/01"[PDAT] : "2016/12/31"[PDAT])
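The three word cloud slides use the same search with a different year range. A minimal sketch of building that query for a given year and counting raw word frequencies from citation text. The function names and the sample citations are hypothetical. The counting step follows the slides' "no data cleaning" note by splitting on whitespace only.

```python
from collections import Counter

# Search string as shown on the slides.
BASE_QUERY = "(ontology[All Fields] OR ontologies[All Fields])"


def query_for_year(year):
    """Restrict the base search to one publication year via [PDAT]."""
    return f'{BASE_QUERY} AND ("{year}/01/01"[PDAT] : "{year}/12/31"[PDAT])'


def word_frequencies(citations, limit=200):
    """Count words in the first `limit` citations, with no data cleaning."""
    counts = Counter()
    for text in citations[:limit]:
        counts.update(text.split())  # whitespace split only, case preserved
    return counts


# Hypothetical citation text, for illustration only.
sample = ["gene ontology annotation", "ontology tools for biology"]
print(query_for_year(2016))
print(word_frequencies(sample).most_common(1))
```

The resulting counts could be pasted into a word cloud generator such as the one linked on the slides.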
Top ten mentioned resources
in PMC full text corpus
• R
• Gene Ontology
• GenBank
• BLAST
• PDB
• KEGG
• GEO
• Ensembl
• ABA
• Cluster
Duck et al., PLOS ONE 2016
The OBO Library
http://www.obofoundry.org
Anatomy & development
zfs, wbbt, wbls, uberon, tgma,
tads, spd, pdumdv, plana, poro,
opl, olatdv, oarcs, mmusdv,
mfmo, ma, xao, zfa, aeo, PO,
caro, ceph, cmf, cteno, ddanat,
ehdaa2, emap, emapa, fao,
fbbt, fbdv, fma, hao, hsapdv
How we do science
mro, xco, zeco, uo, stato,
swo, obcs, ms, mamo, kisao,
cheminf, chmo, mmo, sbo,
sep, sepio, obi, agro, bcgo,
cdao, cmo, duo, eaglei, fbbi,
fix
Phenotypes and disease
ogsf, ohd, wbphenotype, vt, oba,
to, micro, ncit, mondo, mfomd,
doid, ppo, upheno, miro, nbo,
sibo, omp, pato, geno, apo, bspo,
cvdo, ddpheno, dpo, flopo, hp,
mp, mpath, ido, idomal
Molecules, Macromolecules
GO MF & BP, pr, ncro, mirnao,
mod, rnao, mop, xl, rex, omit,
chebi, mi
Clinical
epo, ogms, omrse, ontoneo, oostt,
ovae, pdro, symp, vo, aero, dideo,
dinto, dron, exo, genepio, ico, oae
Data sources
fbcv, omiabis, bco,
cio, miapa, iao, obib
Cells and their parts
GO CC, cl, clo, bto
Environment
envo, eo, ero, geo
Mental
phenomena
mf, mfoem
Species and populations
tto, vto, ncbitaxon, pco, rs, taxran
Genes and Genomes
ogg, ogi, so, hom, vario
The GO Challenge
Computer scientists have made significant contributions
to linguistic formalisms and computational tools for
developing complex vocabulary systems using
reason-based structures, and we hope that our
ontologies will be useful in providing a
well-developed data set for this community
to test their systems.
Ashburner et al Nat. Genet. 2000
Ontologies don’t do Biology
• We have riches in annotations
• We should do more than Gene Expression
Analysis
• We need software that uses ontologies to
draw conclusions
• That can be automated reasoning
• …but also via levels of indirection
A rich description of
the common buttercup
Class: "Ranunculus Repens"
SubClassOf:
Flower
and (hasFlowerSymmetry some RadialSymmetry)
and (hasPart some
  (Androecium
  and (hasAndroecialFusion some Apostemonous)
  and (hasPart some
    (Stamen
    and (hasPart some Filament)
    and (hasPart some
      (Anther
      and (hasAntherAttachment some AdnateAntherAttachment)
      and (hasDehiscenceType some LongitudinalDehiscence)))))))
and (hasPart some
  (Gynoecium
  and (hasGynoecialFusion some Apocarpous)
  and (hasPart some
    (Pistil
    and (hasPart some Carpel)
    and (hasPart some Style)
    and (hasPart some
      (Stigma
      and (hasStickiness some Stickiness)
      and (hasStigmaShape some HookedStigmaShape)))
    and (hasPart only
      (Carpel
      or Stigma
      or Style))))
  and (hasSexualPartArrangement some SpiralArrangement)))
and (hasPart exactly 1 (Perianth
  and (hasPart some
    (Calyx
    and (hasPart exactly 5 (Sepal
      and (hasColour some Green)
      and (hasRegion some
        (BaseRegion
        and (hasForm some Truncate)))
      and (hasRegion some
        (MarginRegion
        and (hasSepalPetalFeature some Entire)
        and (hasSepalPetalFeature some Membranous)))
      and (hasRegion some
        (SurfaceRegion
        and (hasSepalPetalFeature some Pubescent)
        and (hasSurfaceSelector some LowerSurfaceSelector)))
      and (hasRegion some
        (SurfaceRegion
        and (hasSepalPetalFeature some Smooth)
        and (hasSurfaceSelector some UpperSurfaceSelector)))
      and (hasRegion some
        (TipRegion
        and (hasForm some Truncate)))
      and (hasSepalPetalFeature some PalmatelyNetted)
      and (hasSepalPetalShape some Ovate)
      and (hasSepalousity some Aposepalos)))))
  and (hasPart some
    (Corolla
    and (hasPart exactly 5 (Petal
      and (hasColour some Yellow)
      and (hasPetalousity some Apopetalos)
      and (hasRegion some
        (BaseRegion
        and (hasForm some Acute)))
      and (hasRegion some
        (MarginRegion
        and (hasSepalPetalFeature some Entire)))
      and (hasRegion some
        (TipRegion
        and (hasForm some Acute)))
      and (hasSepalPetalFeature some PalmatelyNetted)
      and (hasSepalPetalShape some Obovate)
      and (hasPart exactly 1 Nectary)))))
  and (hasPerianthArrangement some
    AlternatingPerianthArrangement)
  and (hasPart only
    (Calyx
    or Corolla))))
Ontology-driven
user interfaces
Ontology
Class: "Ranunculus Repens"
SubClassOf:
Flower
and (hasFlowerSymmetry some RadialSymmetry)
and (hasPart some
… … …
generate menus
Graphical User
Interface (GUI)
generate axioms
for a flower
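The loop on this slide (ontology generates menus, the GUI generates axioms) can be sketched as follows. This is an illustration under stated assumptions, not the tool shown in the talk: `MENU_SOURCE` is a hypothetical hand-written stand-in for what would be read from the ontology, and both function names are invented. The property and class names come from the buttercup description.

```python
# Hypothetical stand-in for property -> allowed fillers read from an ontology.
MENU_SOURCE = {
    "hasColour": ["Yellow", "Green"],
    "hasSepalPetalShape": ["Ovate", "Obovate"],
}


def generate_menus(source):
    """Turn each property into a sorted list of menu choices."""
    return {prop: sorted(fillers) for prop, fillers in source.items()}


def generate_axiom(base_class, selections, menus):
    """Build a class expression from the user's menu selections."""
    lines = [base_class]
    for prop, filler in selections.items():
        if filler not in menus.get(prop, []):
            raise ValueError(f"{filler!r} is not offered for {prop!r}")
        lines.append(f"and ({prop} some {filler})")
    return "\n".join(lines)


menus = generate_menus(MENU_SOURCE)
picked = {"hasColour": "Yellow", "hasSepalPetalShape": "Obovate"}
print(generate_axiom("Petal", picked, menus))
```

Because the menus only offer fillers the ontology knows about, the user never has to write the expression by hand.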
More axioms
• Better maintenance, better use
• More queries
• Moving away from vocabulary as the
sole deliverable
• Sampling across ontologies to deliver a
particular use case
• Still a challenge to reasoning tools
My favourite tenet of
Agile methods
Maximising the work not done
Going programmatic
Diagram labels: Ontology (manual creation, manual curation, visual inspection); Ontology (a program creates and updates it through software programmatic intervention, visual inspection)
Making Ontology Development
programmatic
• Making ontologies by hand is hard
• Pattern based development is the way
• Programmatic first, rather than
programmatic after
• Programmatic only
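A minimal sketch of "programmatic first": a table of rows is the source of truth, and one pattern is applied to every row to regenerate the class frames. The pattern, the row values and the class names `FivePetalledCorolla` and `FiveSepalledCalyx` are hypothetical. Only `hasPart`, `Corolla`, `Calyx`, `Petal` and `Sepal` appear in the buttercup example.

```python
# Hypothetical pattern: each row of the table becomes one class frame.
PATTERN = (
    "Class: {name}\n"
    "    SubClassOf:\n"
    "        {parent}\n"
    "        and (hasPart exactly {count} {part})"
)

# Hypothetical table; in practice this might come from a spreadsheet.
ROWS = [
    {"name": "FivePetalledCorolla", "parent": "Corolla", "count": 5, "part": "Petal"},
    {"name": "FiveSepalledCalyx", "parent": "Calyx", "count": 5, "part": "Sepal"},
]


def build_ontology(rows, pattern=PATTERN):
    """Apply the same pattern to every row and join the frames."""
    return "\n\n".join(pattern.format(**row) for row in rows)


print(build_ontology(ROWS))
```

Changing the pattern and rerunning the program updates every class at once, which is the maintenance gain over editing each class by hand.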
Views Over Ontologies
Coping with custom, practice and differing views
– Carbohydrates (Mungall et al JBI 2011)
– Genes/proteins; what matters about chemicals
(OpenPhacts, Batchelor et al, ISWC 2014)
Different answers for different communities; the
right answer is not always what people want
Navigation within Knowledge – constipation in the Read
codes
Using other forms of knowledge representation, such as
SKOS, RDF graphs, knowledge graphs, and so on; it’s
all knowledge in some form
Avoid the ontological hammer
Experimental Factor Ontology:
a view on the world's bio-ontologies
Applications
External ontologies
Disease BioAssays
Cell lines
Cell types
Small molecules
Evidence
Taxonomy
Drugs
Adverse events
Information
Gene function
Plant anatomy
Mouse anatomy
Phenotype
EVA
Expression Atlas
GWAS Catalog
ArrayExpress
1 million+
terms
20,000 terms
Applications
Reuse and request
import and update
entity request
Client ontology
Source ontologies
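One way to read the reuse-and-request diagram: a client ontology imports the terms it needs from source ontologies and sends an entity request for anything it cannot find. A minimal sketch under that assumption. The function name, the source names and the term labels are hypothetical.

```python
def plan_reuse(needed_terms, source_ontologies):
    """Split needed terms into imports per source and outstanding requests."""
    imports = {}
    requests = []
    for term in needed_terms:
        for source_name, terms in source_ontologies.items():
            if term in terms:
                imports.setdefault(source_name, []).append(term)
                break  # take the term from the first source that has it
        else:
            requests.append(term)  # no source has it: ask for a new entity
    return imports, requests


# Hypothetical sources and terms, for illustration only.
sources = {
    "anatomy_source": {"petal", "sepal"},
    "quality_source": {"yellow", "green"},
}
imports, requests = plan_reuse(["petal", "yellow", "nectar guide"], sources)
print(imports)
print(requests)
```

Running the same plan again after the sources are updated covers the "import and update" step, since fulfilled requests then move into the import list.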
Data Driven Ontology Content
• Ontologies represent the data we describe
• Our data should guide us as to what to
describe
• FCA, ML approaches to analysing data and KB
content
• Let our data help us improve our ontologies
• And let our ontologies improve our data mining
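The slide names formal concept analysis (FCA) as one way to let data suggest ontology content. A minimal sketch of FCA on a tiny, hypothetical context of objects and attributes. It enumerates every attribute subset, so it only suits very small examples. The function names and the data are assumptions for illustration.

```python
from itertools import combinations


def extent(context, attrs):
    """Objects that have every attribute in attrs."""
    return frozenset(obj for obj, has in context.items() if attrs <= has)


def intent(context, objs):
    """Attributes shared by every object in objs."""
    shared = set().union(*context.values())
    for obj in objs:
        shared &= context[obj]
    return frozenset(shared)


def formal_concepts(context):
    """Return all (extent, intent) pairs by brute-force closure."""
    all_attrs = sorted(set().union(*context.values()))
    concepts = set()
    for size in range(len(all_attrs) + 1):
        for combo in combinations(all_attrs, size):
            objs = extent(context, set(combo))
            concepts.add((objs, intent(context, objs)))
    return concepts


# Hypothetical context: object -> set of attributes.
context = {
    "petal": {"coloured", "flat"},
    "sepal": {"flat"},
    "stamen": {"coloured"},
}
for objs, attrs in sorted(formal_concepts(context), key=lambda c: len(c[0])):
    print(sorted(objs), sorted(attrs))
```

Each concept groups the objects that share a set of attributes, which makes it a candidate class to compare against what the ontology already contains.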
The rest of the world
• SNOMED, MeSH, ICD, UMLS
• A whole host of medical and clinical
vocabularies
• They will continue to exist
• We need to work with them
• Rather than just ontology, we should talk about
knowledge representations
OBOPedia entry for
Golgi apparatus
http://www.obopedia.org.uk
Ontology as Tutorial
• We have a huge amount of knowledge captured
in our ontologies
• Particularly rich with natural language definitions
and vocabulary
• It should be usable as a learning resource
What we need to do (at least)
• Make ontology development industrial
• Make our ontologies axiomatically rich
• Enable effective sampling of ontologies
• Enable differing views over knowledge
• (At some point) stop creating new ontologies
• Think about knowledge ecosystems and not
just ontologies
• Use ontologies to do some biology
Knowledge in Biology
• Bio-ontologies should be “Knowledge in Biology”
(thanks Phil)
• Knowledge in some kind of computational form
is vital
• Ontologies are not the only knowledge fruit
• …but they are a vital, necessary component
