Mungall keynote-biocurator-2017

Chris Mungall
Biocuration, Stanford, 2017
2017: AN ONTOLOGY
BIOCURATION ODYSSEY
chrismungall

Outline
• My path towards biocuration
• Ontologies past and future
• Some final thoughts on biocuration

Which path to AI? (circa 1990s)
[Diagram: Artificial Intelligence, with two goals (Narrow AI, Broad AI) and two paths]
• Knowledge-Based: logic, encoding, ‘knowing that’, cognitively inspired
• Knowledge-Free: statistics, learning, ‘knowing how’, biologically inspired

- All cats are mammals
- All dogs are mammals
- Mammals have fur
- Dogs like balls
- Fido is a dog
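
The statements above can be chained deductively. A minimal Python sketch of that knowledge-based style, assuming the facts are hand-encoded as subclass, class-level and instance assertions; the dictionaries and function names are hypothetical, not taken from the talk:

```python
# Facts from the slide, encoded by hand (this encoding is an assumption).
SUBCLASS_OF = {"cat": "mammal", "dog": "mammal"}
CLASS_FACTS = {"mammal": ["has fur"], "dog": ["likes balls"]}
INSTANCE_OF = {"Fido": "dog"}


def ancestors(cls):
    """Return cls followed by all of its superclasses, nearest first."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


def infer(individual):
    """Collect every statement that follows deductively for an individual."""
    inferred = []
    for cls in ancestors(INSTANCE_OF[individual]):
        inferred.append(f"is a {cls}")
        inferred.extend(CLASS_FACTS.get(cls, []))
    return inferred


print(infer("Fido"))
```

Nothing about Fido having fur was stated directly; it follows from the class hierarchy.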

From sequence to genome annotation
• Analysis pipeline
• Curation tools
• Annotation database

• Curation tools
• Annotation database
Chado
Mungall, C. J., Emmert, D. B., & FlyBase Consortium (2007). A Chado case study: an ontology-based modular schema for representing genome-associated biological information. Bioinformatics, 23(13), i337-346. http://doi.org/10.1093/bioinformatics/btm189
Generalized community tools

Genomes to function annotation?
• Curation tools
• Annotation database
• Functional annotation
What does it do?

Gene Ontology: tool for the unification of biology (2000)
• Organize generalized biological knowledge as a graph
• Attach genes to nodes
• Propagate across species
• Create gene lists
• Interpret high-throughput data
Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., … Sherlock, G. (2000). Gene ontology: tool for the
unification of biology. The Gene Ontology Consortium. Nat Genet, 25(1), 25–29. http://doi.org/10.1038/75556
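
Attaching genes to graph nodes and deriving gene lists can be sketched as follows. This is a toy illustration, assuming a hand-written fragment of an is_a graph with term labels instead of real GO identifiers and invented gene names; it is not the GO Consortium's implementation:

```python
from collections import defaultdict

# Toy graph: child term -> parent terms (illustrative labels only).
PARENTS = {
    "Notch signaling pathway": ["cell surface receptor signaling pathway"],
    "cell surface receptor signaling pathway": ["signal transduction"],
    "signal transduction": ["biological process"],
    "lactation": ["biological process"],
}

# Direct annotations: hypothetical gene -> terms it was curated to.
DIRECT = {
    "geneA": ["Notch signaling pathway"],
    "geneB": ["lactation"],
    "geneC": ["signal transduction"],
}


def closure(term):
    """Return the term together with all of its ancestors in the graph."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def gene_lists(annotations):
    """Propagate each annotation up the graph and group genes by term."""
    by_term = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            for ancestor in closure(term):
                by_term[ancestor].add(gene)
    return {term: sorted(genes) for term, genes in by_term.items()}


print(gene_lists(DIRECT)["signal transduction"])
```

A gene annotated to a specific term is counted under every more general term, which is what makes the resulting gene lists usable for interpreting high-throughput data.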

Ontologies as force amplifiers for data
[Diagram labels: domain knowledge, data, biocuration, experiment]

Don’t worship the monolith
PROBLEM: GO and other ontologies were becoming monolithic
- lots of implicit overlap with other ontologies, latent structure

Open Biological Ontologies (OBO)
http://obofoundry.org
1. Well-integrated modular ontologies
2. Provide technical and sociotechnological framework for cooperation
3. Provide tools, best practices and infrastructure for forging new ontologies
4. Allow us to curate all of the things
@obofoundry

OBO Library PURLs
• PURL: Persistent URL
• Consistent, predictable, stable and versioned URLs for ontology objects
• Can be shortened as compact URIs (CURIEs), e.g. GO:0008150
• Can be registered and viewed on OBO site
  • http://obofoundry.org
• Ontology PURLs
  • Main ontology, subsets
  • versionIRIs
• Ontology term PURLs
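
The CURIE shortening can be sketched in a few lines of Python. The base `http://purl.obolibrary.org/obo/` is the OBO Library PURL prefix; the helper names are hypothetical, and the sketch ignores ontologies that register non-default prefix mappings:

```python
OBO_BASE = "http://purl.obolibrary.org/obo/"


def curie_to_purl(curie):
    """Expand a CURIE such as GO:0008150 into its OBO term PURL."""
    prefix, local_id = curie.split(":", 1)
    return f"{OBO_BASE}{prefix}_{local_id}"


def purl_to_curie(purl):
    """Contract an OBO term PURL back into a CURIE."""
    if not purl.startswith(OBO_BASE):
        raise ValueError(f"not an OBO PURL: {purl}")
    # Split on the last underscore so the local ID is separated from the prefix.
    prefix, local_id = purl[len(OBO_BASE):].rsplit("_", 1)
    return f"{prefix}:{local_id}"


print(curie_to_purl("GO:0008150"))
```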

One ontology to bind them: the Relation Ontology (RO)
http://obofoundry.org/ontology/ro.html
[Diagram: fly eye example linking terms across ontologies. Nodes: compound eye, ommatidium, sense organ, eye disc, outer photoreceptor cell, lamina monopolar neuron L3, detection of light stimulus involved in visual perception (GO). Relations: is_a, part_of, develops from, capable of, synapsed by.]

Contributions to and uses of RO
>500 relations
Neurocellular (virtualflybrain.org; Osumi-Sutherland, D. (2012). doi:10.1093/bioinformatics/bts113)
• Has soma location
• Has synaptic terminal in
• Upstream in neural circuit with
• …
Biotic interaction (globalbioticinteractions.org)
• Eats
• Epiphyte of
• Parasite of
• Kleptoparasitizes
• Hyperparasitizes
Gene, drug, phenotype
• Is model of
• Has phenotype
• Molecularly controls
• Allosteric inhibitor of
• Causes or contributes to condition
• ...
David Osumi-Sutherland, Anne Thessen, Matt Brush, Greg Stupp

What happens when the pieces
don’t fit together?

Making the pieces fit together: GO and CHEBI
Hill, D. P., Adams, N., Bada, M., Batchelor, C., Berardini, T. Z., Dietze, H., … Lomax, J. (2013). Dovetailing biology and chemistry: integrating the Gene Ontology with the ChEBI chemical ontology. BMC Genomics, 14(1), 513. http://doi.org/10.1186/1471-2164-14-513
[Diagram: GO, CHEBI]
• Some relationships didn’t make sense
• E.g. nucleotide is_a carbohydrate
• Acids vs. conjugate bases
Harold Drabkin, David Hill, Jane Lomax, Tanya Berardini, Janna Hastings

Making the pieces fit together: GO and CHEBI
[Diagram: GO, CHEBI, + OWL reasoning, + Design Patterns]
• Fixed many is_a relationships
• E.g. nucleotide is_a carbohydrate
• Acids vs. conjugate bases
Harold Drabkin, David Hill, Jane Lomax, Tanya Berardini, Janna Hastings

Challenges of multi-species anatomy and phenotypes
[Figure: "lung" and related terms as modeled in Mammalian Phenotype, Mouse Anatomy, FMA and a human development ontology. Terms shown: lung, lobular organ, parenchymatous organ, solid organ, pleural sac, thoracic cavity organ, thoracic cavity, organ system, respiratory system, lower respiratory tract, alveolar sac, pulmonary acinus, lung alveolus, lung bud, respiratory primordium, pharyngeal region, abnormal lung morphology, abnormal respiratory system morphology, abnormal pulmonary acinus morphology, abnormal pulmonary alveolus morphology. Edge types: develops_from, part_of, is_a (SubClassOf), surrounded_by.]

The perils of mappings
Class A | Class B | Mapped? | Useful?
FMA: extensor retinaculum of wrist | MouseAnatomy: retina | Yes | No
Plant Ontology: Pith | MouseAnatomy: medulla | Yes | No
Fly Anat: femur | MouseAnatomy: femur | Yes | No*
ZfishAnat: hypophysis | MouseAnatomy: pituitary | No | Yes
TAO: fossa | AdverseReactions: depression | Yes | No
FMA: colon | GAZ: Colón, Panama | Yes | No
Quality: male | Chebi: maleate 2(-) | Yes | No
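
Several of the rows above look like what naive lexical matching would produce. The slide does not say how the mappings were generated, so the following Python sketch is an assumption: a hypothetical matcher that strips accents and punctuation and accepts any word that is a prefix of a word in the other label.

```python
import re
import unicodedata


def tokens(label):
    """Lowercase, strip accents and punctuation, and split into words."""
    decomposed = unicodedata.normalize("NFKD", label.lower())
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.findall(r"[a-z]+", plain)


def naive_match(label_a, label_b, min_len=4):
    """True if a word of one label is a prefix of a word in the other label."""
    words_a, words_b = tokens(label_a), tokens(label_b)
    for a in words_a:
        for b in words_b:
            shorter, longer = sorted((a, b), key=len)
            if len(shorter) >= min_len and longer.startswith(shorter):
                return True
    return False


print(naive_match("extensor retinaculum of wrist", "retina"))  # spurious hit
print(naive_match("colon", "Colón, Panama"))                   # spurious hit
print(naive_match("male", "maleate 2(-)"))                     # spurious hit
print(naive_match("hypophysis", "pituitary"))                  # useful pair missed
```

String similarity alone both over-generates (retinaculum/retina) and under-generates (hypophysis/pituitary), which is why a mapping needs ontology context as well as labels.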

Uberon
http://uberon.org
• Initial phase
  • Bottom-up
  • Create groupings of terms
  • Light curation
• Next phase
  • Top-down
  • 14k classes
  • Design patterns
  • Periodic alignment and feeding back to curators

Uberon for gene expression curation
http://bgee.org/

dinosaurs, sponges, comb jellies and cephalopods, oh my
Thacker, R. W., et al. (2014). The Porifera Ontology (PORO): enhancing sponge systematics with an anatomy ontology. Journal of Biomedical Semantics, 5(1), 39. http://doi.org/10.1186/2041-1480-5-39
Graphic courtesy Nizar Ibrahim, Paul Sereno, et al.
Phenotype RCN
Wasila Dahdul, Bob Thacker
obofoundry.org/ontology/ceph.html
obofoundry.org/ontology/cteno.html

Phenotype and Disease Ontologies
• Problem: Many ontologies, vocabularies and condition/phenotype lists:
  • HP, MP, WBPhenotype, FBcv, TO, VT, FYPO, APO, SNOMED
  • OMIM, Orphanet, DO, NCIT, MESH, ICD, UMLS, MEDGEN …
  • ZFIN, Phenoscape: EQ
Köhler, S.. (2013).. F1000Research, 1–12. http://doi.org/10.3410/f1000research.2-
Standardized Design Patterns + OWL Reasoning
Bayesian OWL Ontology Merging (BOOM)
Mungall, C.J et al (2016) kBOOM. bioRxiv 10.1101/048843
Monarch merged ‘upheno’ ontology
MonDO
Elvira Mitraka, Sue Bello, Nicole Vasilevsky

[Diagram: exome prioritization workflow]
• Variant branch: Whole exome → Remove off-target and common variants → Variant score based on allele frequency and pathological impact → Mendelian filters
• Phenotype branch: Whole or partial phenome (HPO) → OwlSim → Gene phenotype scores, drawing on curated phenotype data, the Monarch integrated KB, upheno, and curated orthology, interaction, .. data
• Output: Combined score
• + GENOMISER

[Figure labelled "lite": environment classes: animal-associated, soil, marine, plant-associated, sediment, aquatic, hot spring, food, cultured, freshwater, hydrothermal vent, terrestrial, sludge, waste water, extreme, organism-associated, air, microbial mat]
http://obofoundry.org/ontology/envo.html
Ramona Walls, Pier Luigi Buttigieg

Environments: generalizing beyond microbes
https://github.com/cmungall/environmental-conditions

Biological knowledge and curation QC
Deegan, J., Dimmer, E., & Mungall, C. J. (2010). Formalization of taxon-based constraints to detect inconsistencies in annotation and ontology development. BMC Bioinformatics, 11(1), 530. http://doi.org/10.1186/1471-2105-11-530
Annotation errors can arise for different reasons:
- machine error (inappropriate propagation)
- human error
Previous versions of the GO had various unusual annotations:
• Genes in chicken responsible for lactation
• Genes in slime mold responsible for dorsal fin development

Solution: Taxon constraints
Encode taxon constraints as OWL rules in the ontology
• only in taxon
• never in taxon
Can be propagated across ontologies
E.g.
• dorsal fin only in vertebrata (uberon)
• dorsal fin never in tetrapod (uberon)
• lactation only in mammals (go)
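
A check of annotations against such constraints can be sketched in Python. This is an illustration, not the OWL-based method of the cited paper: the taxonomy is a simplified hand-written fragment, the constraints are assumed to have already been propagated from the Uberon structure to the GO term "dorsal fin development", and all names are hypothetical.

```python
# Toy taxonomy: taxon -> parent taxon (simplified, not NCBITaxon itself).
TAXON_PARENT = {
    "Gallus gallus": "Tetrapoda",
    "Mus musculus": "Mammalia",
    "Mammalia": "Tetrapoda",
    "Tetrapoda": "Vertebrata",
    "Danio rerio": "Vertebrata",
    "Vertebrata": "Eukaryota",
    "Dictyostelium discoideum": "Eukaryota",
}

# Constraints keyed by annotated term (assumed already propagated to GO terms).
ONLY_IN = {"dorsal fin development": "Vertebrata", "lactation": "Mammalia"}
NEVER_IN = {"dorsal fin development": "Tetrapoda"}


def lineage(taxon):
    """Return the taxon followed by all of its ancestors."""
    chain = [taxon]
    while chain[-1] in TAXON_PARENT:
        chain.append(TAXON_PARENT[chain[-1]])
    return chain


def violations(term, taxon):
    """List the taxon constraints that an annotation (term, taxon) breaks."""
    taxa = lineage(taxon)
    problems = []
    only = ONLY_IN.get(term)
    if only is not None and only not in taxa:
        problems.append(f"{term} only in {only}")
    never = NEVER_IN.get(term)
    if never is not None and never in taxa:
        problems.append(f"{term} never in {never}")
    return problems


print(violations("lactation", "Gallus gallus"))
print(violations("dorsal fin development", "Dictyostelium discoideum"))
```

Both of the unusual annotations mentioned earlier (chicken lactation, slime mold dorsal fin development) are flagged, while a zebrafish dorsal fin annotation passes.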

Hi, ROBOT
• How can we package things up and make them easier to use in ontology/curation QC pipelines?
• Enter ROBOT
• Design Patterns
• Continuous Integration

Next steps for ontology annotation
• Existing ontology annotation model:
  • Bag of terms
[Diagram: a gene linked to eight unconnected terms]

All GO annotations for (human) beta-catenin (Molecular Function branch)

Next generation ontology annotation in Noctua
http://noctua.berkeleybop.org/

Generalization to phenotypes
http://noctua.berkeleybop.org/

Intelligent Concept Assistant
https://github.com/INCATools

Take homes
• Knowledge is a force multiplier
  • Applies to all biocuration work
  • But pinpoints need for QC
• Design for generality
  • But acknowledge difficulties
  • Better support required
• Biological knowledge is multifaceted and nuanced
  • Computer scientists have a tendency towards hubris
  • Biology is our nemesis
  • Collaborative approach is vital

http://hoodline.com/2016/12/caught-on-camera-self-driving-uber-runs-red-light-in-soma

Acknowledgments
• Monarch Initiative: Jeremy Nguyen-Xuan, Kent Shefchek, Matt Brush, Tom Conlin, Lilly Winfree, Eric Douglass, Jules Jacobsen, Craig McLachan, Suzanna Lewis, Julie McMurry, Dan Keith, Nicole Washington, Nicole Vasilevsky, Nathan Dunn, Harry Hochheiser, William Bone, Neal Boerkel, Damian Smedley, Tudor Groza, Sebastian Koehler, Melissa Haendel, Peter Robinson
• GO: Michael Ashburner, David Hill, Paola Roncaglia, David Osumi-Sutherland, Tanya Berardini, Jen Deegan, Jane Lomax, Karen Christie, Pascale Gaudet, Monica Munoz-Torres, Seth Carbon, Eric Douglass, Heiko Dietze, Ruth Lovering, Rachael Huntley, Midori Harris, Harold Drabkin, Kimberley Van Auken, Marc Feuermann, Petra Fey, Jim Hu, Debbie Siegel, Helen Parkinson, Tony Sawford, Stacia Engel, Sylvain Poux, Melanie Courtot, Becky Foulger, Emily Dimmer, Huaiyu Mi, Judy Blake, Paul Sternberg, Mike Cherry, Suzi Lewis, Paul Thomas
• OBO: Michael Ashburner, Suzanna Lewis, Barry Smith, Richard Scheuermann, Chris Stoeckert, Jie Zheng, Melanie Courtot, Simon Jupp, Ramona Walls, Darren Natale, Melissa Haendel, Lynn Schriml, Alan Ruttenberg, Seth Carbon, James Overton, Bjoern Peters, + all contributors
• Planteome: Pankaj Jaiswal, Dennis Stevenson, Laurel Cooper, Austin Meier, Marie Angelique Laporte, Elizabeth Arnaud
• Uberon: David Osumi-Sutherland, Paula Mabee, Jim Balhoff, Wasila Dahdul, Alex Dececchi, Nizar Ibrahim, Paul Sereno, Frederic Bastian, Ann Niknejad, Marc Robinson-Rechavi, David Blackburn, Terry Hayamizu, Yvonne Bradford, Ceri Van Slyke, Alex Diehl, Terry Meehan, Robert Druzinsky, Melissa Haendel
• ALL OF THE BIOCURATORS
Funding: NIH ORIP R24OD011883, NHGRI U41HG 002273, NSF DEB-0956049, DOE DE-AC02-05CH11231, NSF IOS 1340112, NSF DBI 1062404

Give me a place to stand and with a lever I
will move the whole world

Uncovering latent meaning in ontologies
Mungall, C. J. (2004). Obol: Integrating Language and Meaning in Bio-Ontologies. Comparative and Functional Genomics, 5(7), 509–520.
Term: regulation of Notch signaling pathway involved in heart induction
[Labels on the term's segments: relation, pathway, relation, anatomic]
OWL expression: ≡ ∃regulates (NSP ⊓ ∃ part-of HI), where NSP = Notch signaling pathway and HI = heart induction

Open Biological Ontologies (OBO)
http://obofoundry.org
• To provide modular building blocks
• Not just functional annotation of genes and gene products
• Framework, tools and infrastructure for cooperation and harmonization
Smith, B., Ashburner, M., Rosse, C., Bard, J., Bug, W., Ceusters, W., … Lewis, S. (2007). The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol, 25(11), 1251–1255.
[Diagram: building blocks Function (GO), Anatomy, Environment, Chemicals (CHEBI), Phenotype and Disease, Genes (SO, GENO), …; example linking relation: Occurs in]

OBO: Modularity
[Diagram: modules Function (GO), Gross Anatomy, Cell Types, Chemicals (CHEBI), Sequence, Abnormal Phenotype and Disease, connected by "Imported into" arrows]

Relations: the glue that holds it together
• RO 2005 paper
  • 10 relations
• Current RO
  • >500 relations
  • Molecular biology
  • Neurobiology
  • Biotic interactions
  • …
• Many rules on how relations compose together
• Working with Wikidata

Beyond the GO
Field | Ontology | What it links
Functional Genomics: Gene function | Gene Ontology | Links genes to what they do
Transcriptomics: Gene expression | Anatomy and Stage Ontology | Links genes to where they are expressed
Phenomics: Effects of gene mutations | Phenotype and Trait Ontology | Links genes to what happens when they are disrupted or when they vary
Also shown: Disease Ontology, Environment Ontology

Uberon bridges anatomy ontologies
http://uberon.org
Mungall, C. J., Torniai, C., Gkoutos, G. V, Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative multi-species anatomy ontology. Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5
[Figure: Uberon classes (anatomical structure, organ, organ part, respiration organ, lung, lung bud, lung primordium, respiratory primordium, foregut, endoderm of foregut, endoderm, alveolus, alveolus of lung, alveolar sac, pulmonary acinus, swim bladder) linked to species-specific classes (FMA:lung, MA:lung, EHDAA: lung bud, MA:lung alveolus, FMA: pulmonary alveolus), to GO: respiratory gaseous exchange, and to taxa (NCBITaxon: Mammalia, NCBITaxon: Actinopterygii). Edge types: is_a (taxon equivalent), is_a (SubClassOf), part_of, develops_from, capable_of, only_in_taxon.]

Uberon for comparative Gene Expression
http://bgee.org/

Uberon Core
Extensions to other animals…
Non-model/human extension: Porifera Ontology, Ctenophore Ontology, Cephalopod Ontology, Arthropod Ontology
Thacker, R. W., Díaz, M. C., Kerner, A., Vignes-Lebbe, R., Segerdell, E., Haendel, M. A., & Mungall, C. J. (2014). The Porifera Ontology (PORO): enhancing sponge systematics with an anatomy ontology. Journal of Biomedical Semantics, 5(1), 39
http://phenotypercn.org
https://github.com/obophenotype/cephalopod-ontology
https://github.com/obophenotype/ctenophore-ontology
https://github.com/obophenotype/porifera-ontology
https://github.com/obophenotype/uberon

PhenoGrid: visualizing phenotype matches
http://monarchinitiative.org/analyze/phenotypes/

The Undiagnosed Disease Patient (UDP) Use Case
[Diagram: Clinical Phenotyping (HPO/PhenoTips) + Exome Sequencing → Causative Variant?]

https://www.sanger.ac.uk/resources/databases/exomiser/query/exomiser2
Robinson, P., et al . (2013). Improved exome prioritization of
disease genes through cross species phenotype comparison.
Genome Research. doi:10.1101/gr.160325.113

TODO DEPRECATED The need for modularization
• Growing pains of GO
  • Terms were added as-needed for curation
  • Hard to maintain
  • Scope: Encompassing all of biology is hard
    • Biochemistry, cell biology, plants, animal development and physiology, …
  • We needed to modularize
• Meanwhile
  • Other ontologies in the ‘style’ of GO were popping up, for annotating other kinds of data
  • Challenge: how were we going to coordinate this?

QC
• Taxon constraints
  • CONCRETE EXAMPLE HERE
• Intersection rules
  • (see Seth’s talk)

Knowledge-Based
• Ice cream derived-from dairy
• Ice cream is yummy

Uberon/CL applications and users
• Ontology Modularization
  • GO
  • CLO
  • Pheno Ontologies (EQ definitions)
  • ENVO
• Transcriptomics and genome annotation
  • ENCODE
  • FANTOM5
  • LINCS
  • BgeeDb
• Phenomics
  • Human and Mammalian Phenotype Ontologies
  • Phenotype comparison algorithms
• Evolutionary Phenotypes: Phenoscape
http://uberon.github.io/about/adopters.html

The path to AI, 1990s
• Two goals
  • Broad AI
  • Narrow AI
• What path to get there?
  • Knowledge-Based
    • Explicit encoding of knowledge about the world
    • Analytic or deductive reasoning
    • Mathematical logic vs cognitively inspired (neats vs scruffs)
    • ‘Knowing that’
  • Knowledge-Free
    • Machine learning, neural networks
    • Statistics
    • Pattern recognition
    • Biologically inspired
    • ‘Knowing how’

Opposites
Koehler et al, bioRxiv https://doi.org/10.1101/108977

