2. Outline
• What the Gene Ontology is used for
– GO structure
– Limitations of text definitions
• Cross-product extensions to the GO
– Logical computable definitions
• Results and Examples
– Chemical entities, proteins, cells
– Anatomy and development
– Relations
– Reasoning
• Release Plan
• Conclusions
3. A brief introduction to the GO
• Nearing 11th birthday
• 3 ontologies, 28k classes
– Molecular Function (MF)
– Biological Process (BP)
– Cellular Component (CC)
• Annotations
– 42m statements assigning function or localization to genes across 187k species
• Standard uses of GO annotation:
– Navigating and querying functional annotations for genes
– Discovery; term enrichment; semantic similarity
– >50 tools for performing high-throughput analysis using GO
• Most uses require a simple, lightly axiomatized graph
– is_a
– part_of
– Definitions are textual
5. Solution: normalization + reasoning
• Prior work
– Rector et al
– Hill et al
• Retrospective normalization
– GO preceded OBO
• How?
– GONG, Wroe et al
– Ogren et al
– Obol
[Figure: cross product of a process hierarchy (metabolism, biosynthesis) and a chemical hierarchy (sulfur amino acid, cysteine), yielding composed classes such as sulfur amino acid metabolism and cysteine biosynthesis]
6. Assigning logical definitions to GO classes
• Logical definition structure
– An X is a G that D
• X : defined term
• G : genus (parent) term
• D : differentia(e) – discriminating relationships
– Necessary and sufficient conditions
– Computable definition should mirror text definition
• Simple formalism, limited expressivity
– Equivalence axioms between named classes and positive conjunctions of a named class and one or more existential restrictions
• OBO principle of Positivity
– General template:
• EquivalentClasses(NamedClass intersectionOf(NamedGenus [someValuesFrom(NamedObjectProperty NamedDifferentiaClass)]+))
7. Example: mitochondrial translation
• ‘mitochondrial translation’ =def ‘translation’ that occurs_in ‘mitochondrion’
– (current relationships in GO are necessary conditions only)
• OBO
    id: GO:0032543
    name: mitochondrial translation
    intersection_of: GO:0006412 ! translation
    intersection_of: occurs_in GO:0005739 ! mitochondrion
• FOL
    X instance_of ‘mitochondrial translation’ <->
    X instance_of translation &
    exists C,t [ C instance_of mitochondrion at t & X occurs_in C at t ]
• OWL (Manchester syntax)
    Class: ‘mitochondrial translation’
    EquivalentTo: translation AND occurs_in SOME mitochondrion
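The OBO and Manchester forms on this slide carry the same information, so one can be derived from the other. The following is a minimal sketch of that conversion, not the tooling used by the GO project: `LogicalDefinition`, `parse_stanza` and `to_manchester` are hypothetical names introduced here, and the parser handles only the four tags shown on the slide.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class LogicalDefinition:
    defined: str                               # X: the class being defined
    genus: str                                 # G: the parent class
    differentiae: Tuple[Tuple[str, str], ...]  # D: (relation, filler) pairs


def parse_stanza(stanza: str) -> Tuple[LogicalDefinition, Dict[str, str]]:
    """Read id, name and intersection_of tags from one OBO-style stanza."""
    defined: Optional[str] = None
    name: Optional[str] = None
    genus: Optional[str] = None
    differentiae: List[Tuple[str, str]] = []
    labels: Dict[str, str] = {}
    for raw in stanza.strip().splitlines():
        tag, _, rest = raw.strip().partition(": ")
        value, _, comment = rest.partition(" ! ")
        tokens = value.split()
        if tag == "id":
            defined = value.strip()
        elif tag == "name":
            name = value.strip()
        elif tag == "intersection_of" and tokens:
            if len(tokens) == 1:          # bare class ID: the genus
                class_id = tokens[0]
                genus = class_id
            else:                         # relation + class ID: a differentia
                class_id = tokens[1]
                differentiae.append((tokens[0], class_id))
            if comment.strip():           # the "! label" comment, if present
                labels[class_id] = comment.strip()
    if defined is None or genus is None or not differentiae:
        raise ValueError("need an id, a genus and at least one differentia")
    labels[defined] = name or defined
    return LogicalDefinition(defined, genus, tuple(differentiae)), labels


def quote(label: str) -> str:
    """Quote labels that contain spaces."""
    return f"'{label}'" if " " in label else label


def to_manchester(ld: LogicalDefinition, labels: Dict[str, str]) -> str:
    """Render in the slide's notation (uppercase AND / SOME)."""
    parts = [quote(labels.get(ld.genus, ld.genus))]
    parts += [f"{rel} SOME {quote(labels.get(filler, filler))}"
              for rel, filler in ld.differentiae]
    head = quote(labels.get(ld.defined, ld.defined))
    return f"Class: {head}\n    EquivalentTo: " + " AND ".join(parts)


STANZA = """
id: GO:0032543
name: mitochondrial translation
intersection_of: GO:0006412 ! translation
intersection_of: occurs_in GO:0005739 ! mitochondrion
"""

definition, labels = parse_stanza(STANZA)
print(to_manchester(definition, labels))
```

The genus is recognised as the `intersection_of` line with a single class ID, and every line with a relation in front becomes a differentia, which mirrors the "X is a G that D" structure.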
8. Cross Product (XP) Sets
• GO has ~28k classes
– Retrospective assignment of logical definitions is a lot of work
– Divide work according to ontologies directly used
• Cross Product partitions
– X <O1 x O2 x .. x On>
• typically n=2
• Genus taken from O1
• Differentiae taken from O2..n
– Example: BP:cysteine_biosynthesis <BP x CHEBI>
• BP:biosynthesis that has_output CHEBI:cysteine
– Each XP set has one or more templates
• Obol grammars
– http://wiki.geneontology.org/index.php/Category:Cross_Products
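As an illustration of the partitioning rule (genus from O1, differentiae from O2..n), the sketch below assigns definitions to XP sets. It assumes the slide's own prefixed names such as `BP:biosynthesis`, where the prefix names the ontology; real GO identifiers all share the `GO:` prefix, so a real implementation would need a namespace lookup. The function names and the second example's prefixed names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def ontology_of(class_name: str) -> str:
    """Take the ontology from the prefix, e.g. 'CHEBI:cysteine' -> 'CHEBI'."""
    return class_name.split(":", 1)[0]


def xp_set(genus: str, differentiae: Iterable[Tuple[str, str]]) -> str:
    """Name the XP set: genus ontology first, then the differentia ontologies."""
    o1 = ontology_of(genus)
    others = sorted({ontology_of(filler) for _, filler in differentiae})
    return " x ".join([o1] + others)


# Logical definitions written with the slide's prefixed names.
definitions = {
    "BP:cysteine_biosynthesis": ("BP:biosynthesis", [("has_output", "CHEBI:cysteine")]),
    "BP:mitochondrial_translation": ("BP:translation", [("occurs_in", "CC:mitochondrion")]),
}

# Tally how many definitions fall into each XP set.
counts = Counter(xp_set(genus, diffs) for genus, diffs in definitions.values())
for name, n in sorted(counts.items()):
    print(name, n)
```

Tallying definitions per XP set in this way produces the kind of per-partition counts that a results table for the XP sets would report.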
9. Results: Logical definitions per XP set
• 13k classes have provisional logical definitions (46% of classes)
Differentia ontology | Counts (genus columns: MF, BP, CC)
MF | 103 241 148
BP | 4046 27
CC | 634 289
cell | 541 25
anatomy | 692
chemical | 7278 3072
protein | 37
quality | 0
sequence | 66
RNA | 0
10. GO Class | Logical Definition | Genus Ontology | Differentia ontology(s)
S phase of mitotic cell cycle | S phase and part_of mitosis | BP | BP
mitochondrial translation | translation and occurs_in mitochondrion | BP | CC
Oocyte differentiation | cell differentiation and results_in_acquisition_of_features_of oocyte | BP | CL
Neural plate formation | anatomical structure formation and results_in_formation_of neural plate | BP | anatomy
Interleukin-1 biosynthesis | biosynthetic process and has_output interleukin-1 | BP | PRO
L-cysteine catabolic process to taurine | catabolic process and has_input L-cysteine and has_output taurine | BP | CHEBI
group I intron catabolic process | catabolic process and has_input group I intron | BP | SO/RNAO
11. GO Class | Logical Definition | Genus Ontology | Differentia ontology(s)
histone deacetylase complex | protein complex and has_function histone deacetylase activity | CC | MF
acrosomal membrane | membrane and surrounds acrosome | CC | CC
neuron projection | cell projection and part_of neuron | CC | CL
virion transport vesicle | transport vesicle and realizes vesicle transport | CC | BP
snoRNP binding | binding and results_in_binding_of snoRNP | MF | CC
methionine synthase activity | catalytic activity and has_input 5-methyltetrahydrofolate and has_input L-homocysteine and has_output tetrahydrofolate and has_output L-methionine | MF | CHEBI
12. Nested logical definitions
• Multiple differentiae and nested descriptions allowed
– Only named classes used
– Spans XP sets
GO Class | Logical Definition | Genus Ontology | Differentia ontology(s)
negative regulation of RNA metabolic process | biological process and has_participant RNA metabolic process | BP | BP
RNA metabolic process | metabolic process and has_participant RNA | BP | CHEBI
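Because differentiae refer only to named classes, a nested description is obtained by unfolding each named class that has its own logical definition. A minimal sketch of that unfolding follows, using the two rows from this slide as data; the `expand` function and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Tuple

Definition = Tuple[str, List[Tuple[str, str]]]  # (genus, [(relation, filler), ...])

# The two definitions from the slide's table.
DEFS: Dict[str, Definition] = {
    "negative regulation of RNA metabolic process": (
        "biological process",
        [("has_participant", "RNA metabolic process")],
    ),
    "RNA metabolic process": (
        "metabolic process",
        [("has_participant", "RNA")],
    ),
}


def expand(name: str, defs: Dict[str, Definition], _seen: Tuple[str, ...] = ()) -> str:
    """Recursively replace defined classes with their genus-differentia form."""
    if name not in defs:
        return name  # primitive class: nothing to unfold
    if name in _seen:
        raise ValueError(f"cyclic definition involving {name!r}")
    genus, differentiae = defs[name]
    seen = _seen + (name,)
    parts = [expand(genus, defs, seen)]
    parts += [f"{rel} some {expand(filler, defs, seen)}" for rel, filler in differentiae]
    return "(" + " and ".join(parts) + ")"


nested = expand("negative regulation of RNA metabolic process", DEFS)
print(nested)
```

The unfolded expression reaches from a BP x BP definition down into a BP x CHEBI one, which is what "spans XP sets" amounts to in practice.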
13. Development and anatomy
• Neural plate formation = anatomical structure formation and results_in_formation_of neural plate
– GO annotations to Xenopus, zebrafish, mouse
• Where is neural plate declared?
– Developmental structures not in scope of FMA
– Other choices:
• EHDAA – mouse (TS1‐26)
• ZFA ‐ zebrafish
• TAO ‐ teleost
• XAO ‐ xenopus
– Gross anatomical ontologies are species‐or‐taxon‐centric
14. Uberon: a multi-species anatomy ontology
• GO contains an implicit anatomy ontology spanning multiple species
– GO:0007423 ! sensory organ development
• GO:0001654 ! eye development
– GO:0043010 ! camera-type eye development
– GO:0048749 ! compound eye development
• Normalized to form Uberon
– Alignments with species-centric AOs
– 3000 classes
– See Poster
• Current XP partitioning:
– Uberon [most metazoa]
– PO [plants]
– Others
• Fungal anatomy ontology
• Dictyostelium anatomy ontology
[Figure: sensory organ development > eye development > compound eye development, camera-type eye development]
15. Additional relations are required for full XP set
• Core RO
– part_of, has_participant
• Spatial relations (CC x {CC,CL})
– membranes, pores
– adjacent_to, surrounds, perforates
• Participation relation subtypes
– has_input, has_output
– ‘macro’ defined relations
– E.g. results_in_transport_{of,to,from}
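The brace shorthand on this slide stands for several names at once. As a small aid for readers, the sketch below expands that notation; `expand_braces` is a hypothetical helper and only illustrates the shorthand, not how the relations themselves are defined.

```python
import re
from typing import List


def expand_braces(pattern: str) -> List[str]:
    """Expand 'a_{x,y}' style shorthand into the list of names it abbreviates."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]  # no shorthand left to expand
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded: List[str] = []
    for alternative in match.group(1).split(","):
        expanded.extend(expand_braces(head + alternative.strip() + tail))
    return expanded


relations = expand_braces("results_in_transport_{of,to,from}")
partitions = expand_braces("CC x {CC,CL}")
print(relations)
print(partitions)
```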
16. Reasoning
• Reasoning used as part of ontology development cycle
– batch mode
– interactive in OBO-Edit2
– pre-reasoned: inferred relationships are asserted
• Scalability
– GO + XPs + Referenced ontologies = 130k classes
– In-memory reasoners do not scale
– http://wiki.geneontology.org/index.php/OBO-Edit:Reasoner_Benchmarks
– Solutions:
• Segmentation by XP set
• CHEBI slim
• RDBMS based reasoning
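The slide does not say how the RDBMS-based reasoning works. One plausible reading, sketched here purely as an assumption, is to let the database compute the transitive closure of is_a so that inferred links can be stored alongside asserted ones. The example uses SQLite through Python's standard `sqlite3` module, a hypothetical `link` table, and the eye development hierarchy from slide 14 (treating the nesting shown there as is_a).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE link (child TEXT, rel TEXT, parent TEXT)")
conn.executemany(
    "INSERT INTO link VALUES (?, ?, ?)",
    [
        ("camera-type eye development", "is_a", "eye development"),
        ("compound eye development", "is_a", "eye development"),
        ("eye development", "is_a", "sensory organ development"),
    ],
)

# Recursive common table expression: start from asserted is_a links and
# keep following is_a upwards until no new (child, ancestor) pairs appear.
CLOSURE_SQL = """
WITH RECURSIVE closure(child, parent) AS (
    SELECT child, parent FROM link WHERE rel = 'is_a'
    UNION
    SELECT c.child, l.parent
    FROM closure AS c
    JOIN link AS l ON l.child = c.parent
    WHERE l.rel = 'is_a'
)
SELECT child, parent FROM closure
"""

rows = sorted(conn.execute(CLOSURE_SQL).fetchall())
asserted = set(conn.execute("SELECT child, parent FROM link").fetchall())
inferred = [row for row in rows if row not in asserted]
for child, parent in inferred:
    print(f"{child} is_a {parent}  (inferred)")
```

Writing the `inferred` rows back into the table would correspond to the "pre-reasoned" mode, where tools read inferred relationships as if they had been asserted.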
18. BP x CHEBI example
• carbohydrate transport =def transport and results_in_movement_of carbohydrate
• nucleotide transport =def transport and results_in_movement_of nucleotide
[Figure: GO is_a hierarchy (transport; carbohydrate transport; nucleotide, nucleobase or nucleoside transport; nucleotide transport) shown alongside a CHEBI is_a hierarchy (carbohydrate; carbohydrate phosphates; nucleoside phosphates; nucleotides)]
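With both transport classes defined by the same genus and relation, a reasoner can place one under the other using only the CHEBI hierarchy. The sketch below shows a simple structural subsumption check for definitions of this restricted form. It is not the OBO-Edit reasoner, and the CHEBI is_a chain in it is an assumed reading of the slide's figure.

```python
from typing import Dict, List, Set, Tuple

Definition = Tuple[str, List[Tuple[str, str]]]  # (genus, [(relation, filler), ...])

# Assumed is_a chain, read off the figure; the real CHEBI links may differ.
IS_A: Dict[str, List[str]] = {
    "nucleotide": ["nucleoside phosphate"],
    "nucleoside phosphate": ["carbohydrate phosphate"],
    "carbohydrate phosphate": ["carbohydrate"],
}

# The two logical definitions given on the slide.
DEFS: Dict[str, Definition] = {
    "carbohydrate transport": ("transport", [("results_in_movement_of", "carbohydrate")]),
    "nucleotide transport": ("transport", [("results_in_movement_of", "nucleotide")]),
}


def ancestors(cls: str, is_a: Dict[str, List[str]]) -> Set[str]:
    """Reflexive, transitive closure of is_a starting from cls."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in is_a.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subsumed_by(child: str, parent: str) -> bool:
    """True if the child's definition satisfies every condition of the parent's."""
    child_genus, child_diffs = DEFS[child]
    parent_genus, parent_diffs = DEFS[parent]
    if parent_genus not in ancestors(child_genus, IS_A):
        return False
    return all(
        any(c_rel == p_rel and p_filler in ancestors(c_filler, IS_A)
            for c_rel, c_filler in child_diffs)
        for p_rel, p_filler in parent_diffs
    )


print(subsumed_by("nucleotide transport", "carbohydrate transport"))
print(subsumed_by("carbohydrate transport", "nucleotide transport"))
```

Under the assumed chain, nucleotide transport is inferred to be a kind of carbohydrate transport, and not the reverse. Whether that inference is biologically desirable depends on the CHEBI classification, which is why reviewing inferred links is part of the development cycle.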
19. Release plan: basic and extended releases
• GO is currently available in two versions
– gene_ontology: “standard”
• is_a, part_of, intra-ontology regulates
• intended for basic tools
– gene_ontology_ext: “extended”
• http://www.geneontology.org/GO.ontology-ext.relations.shtml
• standard + other relations and axioms
– disjoint_from
– has_part (Aug 1 2009)
• XP sets currently available as separate bridge files
– http://wiki.geneontology.org/index.php/Category:Cross_Products
– will gradually migrate into gene_ontology_ext
20. Pre vs post composition
• Compose class descriptions
– During ontology development cycle?
– At the time of annotation?
• Logically equivalent…
– Given computable definitions, reasoners can determine equivalency
• …but very different from a practical point of view
• GO guidelines
– pre-compose classes for any type for which scientific generalizations can be made
• Yes: mitochondrial translation
• Yes: oocyte nucleus
• No: nucleus of epithelium of left ear
– Use post-composition to extend at annotation time
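To make the equivalence point concrete, the sketch below checks whether a description composed at annotation time matches the logical definition of an existing pre-composed class. It compares definitions syntactically only, which is far weaker than a real reasoner. The `part_of` definitions for the two nucleus examples are assumptions made for illustration, since the slide does not give them.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

Signature = Tuple[str, FrozenSet[Tuple[str, str]]]


def signature(genus: str, differentiae: Iterable[Tuple[str, str]]) -> Signature:
    """Order-independent key for a genus plus its differentiae."""
    return (genus, frozenset(differentiae))


# Pre-composed classes indexed by their logical definition.
# The oocyte nucleus definition is an assumption, not taken from the slides.
PRECOMPOSED: Dict[Signature, str] = {
    signature("translation", [("occurs_in", "mitochondrion")]): "mitochondrial translation",
    signature("nucleus", [("part_of", "oocyte")]): "oocyte nucleus",
}


def resolve(genus: str, differentiae: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Return the equivalent pre-composed class, or None if there is none."""
    return PRECOMPOSED.get(signature(genus, differentiae))


hit = resolve("translation", [("occurs_in", "mitochondrion")])
miss = resolve("nucleus", [("part_of", "epithelium of left ear")])
print(hit)   # an existing class can be used directly
print(miss)  # no such class: the description stays post-composed
```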
21. Related work: weaving the fabric of the OBO Foundry
• Ontology for Biomedical Investigations (OBI)
• Phenotype Ontologies
– Mammalian Phenotype
– Human Phenotype
– Worm Phenotype
– Plant trait
• Environment ontology
• FMA
• Fly anatomy ontology
– Neuronal subtype and sense organ logical definitions using CHEBI and GO
22. Future applications of cross-product sets
• Demonstrated utility as part of ontology development cycle
– How do we evaluate?
– but what about actual applications?
• How can logical definitions (and additional axiomatisation in general) help:
– Search and discovery
– Visualization and presentation to users
– Curation
– Improve function prediction
– Database integration
• E.g. pathway databases
– Term enrichment
– Semantic similarity
• Need to educate tool developers
23. Conclusions
• Normalizing retrospectively is hard
– Prospective approach recommended
– But redundancy in effort from alternative perspective can yield valuable information
• Many of the challenges are sociotechnological
– What if the referenced ontology
• does not yet exist?
• exists but is unfunded?
• is constructed according to different principles?
• is incomplete?
• ..or there is a choice of two competing ontologies?
– The OBO Foundry process is crucial
• Grant challenge: more applications needed
24. Acknowledgments
• GO Ontology Developers
– Midori Harris
– Jane Lomax
– Jen Deegan
– Amelia Ireland
– Tanya Berardini
– David Hill
• OBO Ontology developers
– Alex Diehl (GO, Cell)
– Janna Hastings (CHEBI)
– Paula de Matos (CHEBI)
– David Osumi-Sutherland (Fly)
– Melissa Haendel (Zebrafish)
– Darren Natale (PRO)
– Karen Eilbeck (SO)
• Also
– Mike Bada
– Colin Batchelor
• OBO-Edit
– Amina Abdulla
– Nomi Harris
– John Day-Richter
• OBO
– Alan Ruttenberg
– Barry Smith
– Richard Scheuermann
• GO PIs
– Suzanna Lewis
– Mike Cherry
– Michael Ashburner
– Judith Blake