Cross Product Extensions to the Gene Ontology

See the conference proceedings for the 4-page paper: http://precedings.nature.com/documents/3496/version/1

  • 1. Cross Product Extensions to the Gene Ontology
     Chris Mungall
     Gene Ontology Consortium
     http://www.geneontology.org

  • 2. Outline
     • What the Gene Ontology is used for
       – GO structure
       – Limitations of text definitions
     • Cross-product extensions to the GO
       – Logical computable definitions
     • Results and Examples
       – Chemical entities, proteins, cells
       – Anatomy and development
       – Relations
       – Reasoning
     • Release Plan
     • Conclusions

  • 3. A brief introduction to the GO
     • Nearing its 11th birthday
     • 3 ontologies, 28k classes
       – Molecular Function (MF)
       – Biological Process (BP)
       – Cellular Component (CC)
     • Annotations
       – 42m statements assigning function or localization to genes across 187k species
     • Standard uses of GO annotation:
       – Navigating and querying functional annotations for genes
       – Discovery; term enrichment; semantic similarity
       – >50 tools for performing high-throughput analysis using GO
     • Most uses require a simple, lightly axiomatized graph
       – is_a
       – part_of
       – Definitions are textual
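
Most of the standard uses listed above only need the graph itself. As a minimal sketch of that graph-only view, the Python below propagates gene annotations upward along is_a and part_of, the way many enrichment tools treat GO. The term names, edges and the gene name geneA are toy values chosen for illustration, not GO content.

```python
from collections import deque

# Toy graph: child -> [(relation, parent)]. Edges are illustrative only.
EDGES = {
    "mitochondrial translation": [("is_a", "translation")],
    "translation": [("is_a", "biological process")],
    "mitochondrial ribosome": [("part_of", "mitochondrion")],
    "mitochondrion": [("is_a", "cellular component")],
}

# Relations along which annotations are propagated in this sketch.
PROPAGATING = {"is_a", "part_of"}


def ancestors(term, edges=EDGES, relations=PROPAGATING):
    """Return every term reachable from `term` by following `relations` upward."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for rel, parent in edges.get(current, []):
            if rel in relations and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def propagate(annotations):
    """Map each gene to its directly annotated terms plus all their ancestors."""
    return {
        gene: set(terms).union(*(ancestors(t) for t in terms))
        for gene, terms in annotations.items()
    }


print(sorted(propagate({"geneA": {"mitochondrial translation"}})["geneA"]))
# ['biological process', 'mitochondrial translation', 'translation']
```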

  • 4. Problems and limitations
     • Maintenance and errors
       – Combinatorial terms
       – Tangled polyhierarchies
     • Denormalized
       – Redundancy
       – Lack of reuse

  • 5. Solution: normalization + reasoning
     • Prior work
       – Rector et al
       – Hill et al
     • Retrospective normalization
       – GO preceded OBO
     • How?
       – GONG, Wroe et al
       – Ogren et al
       – Obol
     [Figure: {sulfur amino acid, cysteine} x {metabolism, biosynthesis} = sulfur amino acid metabolism, with cysteine biosynthesis beneath it]

  • 6. Assigning logical definitions to GO classes
     • Logical definition structure
       – An X is a G that D
         • X : defined term
         • G : genus (parent) term
         • D : differentia(e), i.e. discriminating relationships
       – Necessary and sufficient conditions
       – Computable definition should mirror text definition
     • Simple formalism, limited expressivity
       – Equivalence axioms between named classes and positive conjunctions of a named class and one or more existential restrictions
     • OBO principle of Positivity
       – General template:
         • EquivalentClasses(NamedClass intersectionOf(NamedGenus [someValuesFrom(NamedObjectProperty NamedDifferentiaClass)]+))
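
The template above can be sketched as a small data structure. The Python below is a hedged illustration, not GO or OBO-Edit code: the class name LogicalDefinition and its methods are hypothetical, names are bare placeholders rather than IRIs, and the functional-syntax output uses the OWL 2 constructor names (ObjectIntersectionOf, ObjectSomeValuesFrom) that the slide abbreviates as intersectionOf and someValuesFrom.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LogicalDefinition:
    """'An X is a G that D': defined class X, genus G, differentiae D."""
    defined: str
    genus: str
    differentiae: Tuple[Tuple[str, str], ...]  # (object property, filler class)

    def __post_init__(self):
        # The template allows one or more existential restrictions, never zero.
        if not self.differentiae:
            raise ValueError("at least one differentia is required")

    def to_functional(self) -> str:
        """Render with OWL 2 functional-syntax constructor names."""
        restrictions = " ".join(
            f"ObjectSomeValuesFrom({prop} {filler})"
            for prop, filler in self.differentiae
        )
        return (f"EquivalentClasses({self.defined} "
                f"ObjectIntersectionOf({self.genus} {restrictions}))")

    def to_manchester(self) -> str:
        """Render as a Manchester-syntax equivalence axiom."""
        parts = [self.genus] + [f"{p} some {f}" for p, f in self.differentiae]
        return f"Class: {self.defined} EquivalentTo: " + " and ".join(parts)


mt = LogicalDefinition("mitochondrial_translation", "translation",
                       (("occurs_in", "mitochondrion"),))
print(mt.to_functional())
print(mt.to_manchester())
```

Requiring at least one differentia also keeps the output valid OWL 2, since ObjectIntersectionOf takes two or more operands.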

  • 7. Example: mitochondrial translation
     • 'mitochondrial translation' =def 'translation' that occurs_in 'mitochondrion'
       – (current relationships in GO are necessary conditions only)
     OBO:
       id: GO:0032543
       name: mitochondrial translation
       intersection_of: GO:0006412 ! translation
       intersection_of: occurs_in GO:0005739 ! mitochondrion
     FOL:
       X instance_of 'mitochondrial translation' <->
         X instance_of translation &
         exists C,t [ C instance_of mitochondrion at t & X occurs_in C at t ]
     OWL (Manchester syntax):
       Class: 'mitochondrial translation'
         EquivalentTo: translation and occurs_in some mitochondrion
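
To show how the OBO form carries the genus and differentia, here is a simplified reader for the stanza above, written as a sketch under stated assumptions: it handles only the id and intersection_of tags, treats a one-token intersection_of value as the genus and a two-token value as a relation plus class, and ignores the quoting, escapes and trailing qualifiers a full OBO 1.2 parser would handle. The function name parse_logical_definition is hypothetical.

```python
def parse_logical_definition(stanza: str) -> dict:
    """Extract id, genus and differentiae from an OBO stanza's intersection_of tags.

    One-token values are the genus; two-token values are (relation, class).
    Text after '!' is an OBO comment and is dropped.
    """
    result = {"id": None, "genus": None, "differentiae": []}
    for raw in stanza.splitlines():
        line = raw.split("!", 1)[0].strip()  # drop trailing '! comment'
        tag, sep, value = line.partition(":")
        if not sep:
            continue  # blank or malformed line
        tag, tokens = tag.strip(), value.split()
        if tag == "id":
            result["id"] = value.strip()
        elif tag == "intersection_of":
            if len(tokens) == 1:
                if result["genus"] is not None:
                    raise ValueError("more than one genus")
                result["genus"] = tokens[0]
            elif len(tokens) == 2:
                result["differentiae"].append((tokens[0], tokens[1]))
            else:
                raise ValueError(f"unexpected intersection_of value: {value!r}")
    return result


STANZA = """\
id: GO:0032543
name: mitochondrial translation
intersection_of: GO:0006412 ! translation
intersection_of: occurs_in GO:0005739 ! mitochondrion
"""
print(parse_logical_definition(STANZA))
# {'id': 'GO:0032543', 'genus': 'GO:0006412', 'differentiae': [('occurs_in', 'GO:0005739')]}
```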

  • 8. Cross Product (XP) Sets
     • GO has ~28k classes
       – Retrospective assignment of logical definitions is a lot of work
       – Divide work according to the ontologies directly used
     • Cross product partitions
       – X <O1 x O2 x .. x On>
         • typically n=2
         • Genus taken from O1
         • Differentiae taken from O2..On
       – Example: BP:cysteine_biosynthesis <BP x CHEBI>
         • BP:biosynthesis that has_output CHEBI:cysteine
       – Each XP set has one or more templates
         • Obol grammars
       – http://wiki.geneontology.org/index.php/Category:Cross_Products
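
Partitioning definitions into XP sets amounts to looking at which ontologies the genus and differentia classes come from. The sketch below assumes, as the slide's own notation does (BP:biosynthesis, CHEBI:cysteine), that every class name carries its ontology as a prefix; real GO IDs do not encode BP, MF or CC, so a real partitioner would need a namespace lookup instead. Sorting the differentia ontologies alphabetically, and the function names, are also assumptions.

```python
from collections import defaultdict


def ontology_of(name: str) -> str:
    """Return the ontology prefix of a name such as 'BP:biosynthesis'."""
    prefix, sep, _ = name.partition(":")
    if not sep:
        raise ValueError(f"not a prefixed name: {name!r}")
    return prefix


def xp_set(genus: str, differentiae) -> str:
    """Label <O1 x O2 x .. x On>: genus ontology first, then the distinct
    differentia ontologies in sorted order."""
    others = sorted({ontology_of(filler) for _, filler in differentiae})
    return " x ".join([ontology_of(genus)] + others)


def partition(definitions):
    """Group defined classes by the XP set their logical definition falls in."""
    sets = defaultdict(list)
    for defined, genus, differentiae in definitions:
        sets[xp_set(genus, differentiae)].append(defined)
    return dict(sets)


DEFINITIONS = [
    ("BP:cysteine_biosynthesis", "BP:biosynthesis",
     [("has_output", "CHEBI:cysteine")]),
    ("BP:mitochondrial_translation", "BP:translation",
     [("occurs_in", "CC:mitochondrion")]),
]
print(partition(DEFINITIONS))
# {'BP x CHEBI': ['BP:cysteine_biosynthesis'], 'BP x CC': ['BP:mitochondrial_translation']}
```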

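  • 9. Results: Logical definitions per XP set
     • 13k classes have provisional logical definitions (46% of classes)
     • Number of logical definitions by differentia ontology (genus columns MF, BP, CC; rows with fewer than three counts had empty cells):
       – MF: 103, 241, 148
       – BP: 4046, 27
       – CC: 634, 289
       – cell: 541, 25
       – anatomy: 692
       – chemical: 7278, 3072
       – protein: 37
       – quality: 0
       – sequence: 66
       – RNA: 0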

  • 10. GO Class | Logical Definition | Genus Ontology | Differentia Ontology(s)
     • S phase of mitotic cell cycle | S phase and part_of mitosis | BP | BP
     • mitochondrial translation | translation and occurs_in mitochondrion | BP | CC
     • Oocyte differentiation | cell differentiation and results_in_acquisition_of_features_of oocyte | BP | CL
     • Neural plate formation | anatomical structure formation and results_in_formation_of neural plate | BP | anatomy
     • Interleukin-1 biosynthesis | biosynthetic process and has_output interleukin-1 | BP | PRO
     • L-cysteine catabolic process to taurine | catabolic process and has_input L-cysteine and has_output taurine | BP | CHEBI
     • group I intron catabolic process | catabolic process and has_input group I intron | BP | SO/RNAO


  • 11. GO Class | Logical Definition | Genus Ontology | Differentia Ontology(s)
     • histone deacetylase complex | protein complex and has_function histone deacetylase activity | CC | MF
     • acrosomal membrane | membrane and surrounds acrosome | CC | CC
     • neuron projection | cell projection and part_of neuron | CC | CL
     • transport vesicle | vesicle and realizes vesicle transport | CC | BP
     • snoRNP binding | binding and results_in_binding_of snoRNP | MF | CC
     • methionine synthase activity | catalytic activity and has_input 5-methyltetrahydrofolate and has_input L-homocysteine and has_output tetrahydrofolate and has_output L-methionine | MF | CHEBI

  • 12. Nested logical definitions
     • Multiple differentiae and nested descriptions allowed
       – Only named classes used
       – Spans XP sets
     GO Class | Logical Definition | Genus Ontology | Differentia Ontology(s)
     • negative regulation of RNA metabolic process | biological process and has_participant RNA metabolic process | BP | BP
     • RNA metabolic process | metabolic process and has_participant RNA | BP | CHEBI
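
One way to see what nesting buys is to unfold a definition: each filler that itself has a logical definition is replaced by that definition. In the sketch below the RNA metabolic process entry follows the table above, while the negative regulation entry and its relation name negatively_regulates are assumptions made for illustration; the function name expand is hypothetical, and names are left unquoted, so the output is only Manchester-style.

```python
from typing import Dict, List, Tuple

# defined class -> (genus, [(relation, filler)])
Definitions = Dict[str, Tuple[str, List[Tuple[str, str]]]]

DEFS: Definitions = {
    "RNA metabolic process": ("metabolic process", [("has_participant", "RNA")]),
    # Hypothetical entry; the relation name is assumed for illustration.
    "negative regulation of RNA metabolic process": (
        "biological process",
        [("negatively_regulates", "RNA metabolic process")],
    ),
}


def expand(name: str, defs: Definitions = DEFS, _seen: Tuple[str, ...] = ()) -> str:
    """Unfold `name` into a Manchester-style expression, recursively replacing
    each differentia filler that has its own logical definition."""
    if name in _seen:
        raise ValueError(f"cyclic definition involving {name!r}")
    if name not in defs:
        return name  # leaf: a named class with no logical definition
    genus, differentiae = defs[name]
    parts = [genus]
    for rel, filler in differentiae:
        inner = expand(filler, defs, _seen + (name,))
        if filler in defs:
            inner = f"({inner})"  # parenthesise nested descriptions
        parts.append(f"{rel} some {inner}")
    return " and ".join(parts)


print(expand("negative regulation of RNA metabolic process"))
# biological process and negatively_regulates some (metabolic process and has_participant some RNA)
```

Keeping the stored definitions in terms of named classes, as the slide recommends, means the unfolded form can always be regenerated on demand rather than maintained by hand.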

  • 13. Development and anatomy
     • Neural plate formation = anatomical structure formation and results_in_formation_of neural plate
       – GO annotations to Xenopus, zebrafish, mouse
     • Where is neural plate declared?
       – Developmental structures not in scope of FMA
       – Other choices:
         • EHDAA - mouse (TS1-26)
         • ZFA - zebrafish
         • TAO - teleost
         • XAO - Xenopus
       – Gross anatomical ontologies are species- or taxon-centric


  • 14. Uberon: a multi-species anatomy ontology
     • GO contains an implicit anatomy ontology spanning multiple species
       – GO:0007423 ! sensory organ development
         • GO:0001654 ! eye development
           – GO:0043010 ! camera-type eye development
           – GO:0048749 ! compound eye development
     • Normalized to form Uberon
       – Alignments with species-centric AOs
       – 3000 classes
       – See Poster
     • Current XP partitioning:
       – Uberon [most metazoa]
       – PO [plants]
       – Others
         • Fungal anatomy ontology
         • Dictyostelium anatomy ontology

  • 15. Additional relations are required for the full XP set
     • Core RO
       – part_of, has_participant
     • Spatial relations (CC x {CC,CL})
       – membranes, pores
       – adjacent_to, surrounds, perforates
     • Participation relation subtypes
       – has_input, has_output
       – 'macro' defined relations
       – E.g. results_in_transport_{of,to,from}

  • 16. Reasoning
     • Reasoning used as part of ontology development cycle
       – batch mode
       – interactive in OBO-Edit2
       – pre-reasoned: inferred relationships are asserted
     • Scalability
       – GO + XPs + referenced ontologies = 130k classes
       – In-memory reasoners do not scale
       – http://wiki.geneontology.org/index.php/OBO-Edit:Reasoner_Benchmarks
       – Solutions:
         • Segmentation by XP set
         • CHEBI slim
         • RDBMS-based reasoning
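
Segmentation by XP set and a CHEBI slim rest on the same idea: a reasoner only needs the classes an XP set actually references, plus their ancestors. The Python below is a minimal sketch of that idea under toy assumptions (a plain child-to-parents map, made-up classes, hypothetical function names); it is not the GO pipeline, and principled module-extraction methods are more careful about which axioms must be kept.

```python
from collections import deque


def closure(seeds, parents):
    """Seed classes plus every ancestor reachable through `parents`."""
    keep, queue = set(), deque(seeds)
    while queue:
        cls = queue.popleft()
        if cls not in keep:
            keep.add(cls)
            queue.extend(parents.get(cls, ()))
    return keep


def segment(definitions, parents):
    """Classes needed to reason over one XP set: the defined classes, their
    genera and differentia fillers, and all of their ancestors."""
    seeds = set()
    for defined, genus, differentiae in definitions:
        seeds.update((defined, genus))
        seeds.update(filler for _, filler in differentiae)
    return closure(seeds, parents)


# Toy hierarchy (child -> parents); illustrative, not taken from GO or CHEBI.
PARENTS = {
    "cysteine": ["sulfur amino acid"],
    "sulfur amino acid": ["amino acid"],
    "glucose": ["carbohydrate"],
    "biosynthesis": ["metabolism"],
}
XP_SET = [("cysteine biosynthesis", "biosynthesis", [("has_output", "cysteine")])]
print(sorted(segment(XP_SET, PARENTS)))
```

The unreferenced branch (glucose, carbohydrate) is left out, which is what keeps each segment small enough for an in-memory reasoner.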

  • 17. Reasoner results
     • 1000s of links fixed over a number of years
     • Inconsistencies internal to GO fixed immediately
       – Fix hierarchy of defined class
       – Fix hierarchy of referenced class
     • Abductive reasoning (Bada et al, OWLED 2008)
       – Fix logical definition
     • Inconsistencies external to GO take longer to be resolved
       – CL
       – CHEBI

  • 18. BP x CHEBI example
     • carbohydrate transport =def transport and results_in_movement_of carbohydrate
     • nucleotide transport =def transport and results_in_movement_of nucleotide
     [Figure: is_a hierarchies linking transport classes (carbohydrate transport, nucleotide transport and related nucleobase/nucleoside transport classes) with the CHEBI classes they move (carbohydrate, carbohydrate phosphates, nucleoside, nucleotides)]
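
The example can be reproduced with a deliberately simple structural check: one defined class is subsumed by another if its genus is the same or narrower and each differentia of the more general class is matched by one with the same relation and an equal-or-narrower filler. The sketch assumes the chemical links nucleotide is_a carbohydrate phosphate is_a carbohydrate, as the figure appears to show; the function names are hypothetical, and full OWL reasoners do considerably more (relation hierarchies, general axioms).

```python
def is_a_star(child, ancestor, hierarchy):
    """True if `child` equals `ancestor` or reaches it via is_a links."""
    stack, seen = [child], set()
    while stack:
        node = stack.pop()
        if node == ancestor:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(hierarchy.get(node, ()))
    return False


def subsumes(general, specific, hierarchy):
    """Structural check that `specific` is_a `general` for definitions of the form
    (genus, [(relation, filler), ...]); relation hierarchies are ignored."""
    g_genus, g_diffs = general
    s_genus, s_diffs = specific
    if not is_a_star(s_genus, g_genus, hierarchy):
        return False
    # Every restriction of the general class must be matched by a restriction of
    # the specific class with the same relation and an equal-or-narrower filler.
    return all(
        any(s_rel == g_rel and is_a_star(s_fill, g_fill, hierarchy)
            for s_rel, s_fill in s_diffs)
        for g_rel, g_fill in g_diffs
    )


def infer_is_a(defs, hierarchy):
    """All (sub, super) pairs between defined classes implied by the check above."""
    return sorted(
        (sub, sup) for sub in defs for sup in defs
        if sub != sup and subsumes(defs[sup], defs[sub], hierarchy)
    )


# Assumed CHEBI-style links, mirroring what the slide's figure appears to show.
HIERARCHY = {
    "nucleotide": ["carbohydrate phosphate"],
    "carbohydrate phosphate": ["carbohydrate"],
}
DEFS = {
    "carbohydrate transport": ("transport", [("results_in_movement_of", "carbohydrate")]),
    "nucleotide transport": ("transport", [("results_in_movement_of", "nucleotide")]),
}
print(infer_is_a(DEFS, HIERARCHY))
# [('nucleotide transport', 'carbohydrate transport')]
```

The inferred link is only as sound as the referenced chemical hierarchy, which is how issues originating outside GO, for example in CHEBI, can surface through the reasoner.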

  • 19. Release plan: basic and extended releases
     • GO is currently available in two versions
       – gene_ontology: "standard"
         • is_a, part_of, intra-ontology regulates
         • intended for basic tools
       – gene_ontology_ext: "extended"
         • http://www.geneontology.org/GO.ontology-ext.relations.shtml
         • standard + other relations and axioms
           – disjoint_from
           – has_part (Aug 1 2009)
     • XP sets currently available as separate bridge files
       – http://wiki.geneontology.org/index.php/Category:Cross_Products
       – will gradually migrate into gene_ontology_ext
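
The difference between the two releases can be pictured as a filter over relationships. The sketch below encodes one reading of the slide, namely that the standard file keeps is_a and part_of plus regulates only between classes in the same ontology; the function name, the edge list and the namespace map are hypothetical toy data, not the actual GO release process.

```python
STANDARD = {"is_a", "part_of", "regulates"}


def standard_subset(edges, namespace):
    """Keep edges a 'standard' file would carry under this sketch's reading of
    the slide: is_a, part_of, and regulates only when both ends share an ontology."""
    kept = []
    for child, rel, parent in edges:
        if rel not in STANDARD:
            continue  # e.g. has_part, disjoint_from: extended release only
        if rel == "regulates" and namespace[child] != namespace[parent]:
            continue  # inter-ontology regulates is left to the extended release
        kept.append((child, rel, parent))
    return kept


# Toy data for illustration only.
NAMESPACE = {
    "mitochondrial translation": "BP", "translation": "BP",
    "regulation of translation": "BP", "regulation of kinase activity": "BP",
    "kinase activity": "MF", "organelle": "CC", "membrane": "CC",
}
EDGES = [
    ("mitochondrial translation", "is_a", "translation"),
    ("regulation of translation", "regulates", "translation"),
    ("regulation of kinase activity", "regulates", "kinase activity"),
    ("organelle", "has_part", "membrane"),
]
print(standard_subset(EDGES, NAMESPACE))
```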


  • 20. Pre vs post composition
     • Compose class descriptions
       – During ontology development cycle?
       – At the time of annotation?
     • Logically equivalent…
       – Given computable definitions, reasoners can determine equivalency
     • ..but very different from a practical point of view
     • GO guidelines
       – pre-compose classes for any type for which scientific generalizations can be made
         • Yes: mitochondrial translation
         • Yes: oocyte nucleus
         • No: nucleus of epithelium of left ear
       – Use post-composition to extend at annotation time


  • 21. Related work: weaving the fabric of the OBO Foundry
     • Ontology for Biomedical Investigations (OBI)
     • Phenotype Ontologies
       – Mammalian Phenotype
       – Human Phenotype
       – Worm Phenotype
       – Plant trait
     • Environment ontology
     • FMA
     • Fly anatomy ontology
       – Neuronal subtype and sense organ logical definitions using CHEBI and GO

  • 22. Future applications of cross-product sets
     • Demonstrated utility as part of ontology development cycle
       – How do we evaluate?
       – but what about actual applications?
     • How can logical definitions (and additional axiomatisation in general) help:
       – Search and discovery
       – Visualization and presentation to users
       – Curation
       – Improve function prediction
       – Database integration
         • E.g. pathway databases
       – Term enrichment
       – Semantic similarity
     • Need to educate tool developers

  • 23. Conclusions
     • Normalizing retrospectively is hard
       – Prospective approach recommended
       – But redundancy in effort from an alternative perspective can yield valuable information
     • Many of the challenges are sociotechnological
       – What if the referenced ontology
         • does not yet exist?
         • exists but is unfunded?
         • is constructed according to different principles?
         • is incomplete?
         • ..or there is a choice of two competing ontologies?
       – The OBO Foundry process is crucial
     • Grand challenge: more applications needed

  • 24. Acknowledgments
     • GO Ontology Developers
     • OBO Ontology developers
       – Midori Harris
       – Alex Diehl (GO, Cell)
       – Janna Hastings (CHEBI)
       – Jane Lomax
       – Paula de Matos (CHEBI)
       – Jen Deegan
       – David Osumi-Sutherland (Fly)
       – Amelia Ireland
       – Melissa Haendel (Zebrafish)
       – Tanya Berardini
       – Darren Natale (PRO)
       – David Hill
       – Karen Eilbeck (SO)
     • Also
       – Mike Bada
       – Colin Batchelor
     • OBO-Edit
       – Amina Abdulla
       – Nomi Harris
       – John Day-Richter
     • OBO
     • GO PIs
       – Alan Ruttenberg
       – Suzanna Lewis
       – Barry Smith
       – Mike Cherry
       – Richard Scheuermann
       – Michael Ashburner
       – Judith Blake