Semantic integration of orthology resources: beyond the OGO experience
Jesualdo Tomás Fernández Breis
Universidad de Murcia, IMIB-Arrixaca
jfernand@um.es
http://webs.um.es/jfernand
Why?
• Applications able to combine multiple orthology databases with other sources (e.g., diseases)
• Answer queries like
  – “Rat genes whose orthologs are associated with diseases with phenotype Autosomal dominant inheritance”
  – “Rat genes whose orthologs participate in transcription repression activities and that are also related to genes involved in lung cancer”
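The first query above can be sketched over a handful of toy triples. This is only an illustration in plain Python: the identifiers are invented, and the direction assumed for hasOrtholog, causedBy, hasPhenotype and fromSpecies is a guess based on the property names, not taken from the OGO ontology itself.

```python
# Toy (subject, predicate, object) triples. Every identifier here is made up,
# and the direction of each property is an assumption for illustration.
TRIPLES = {
    ("gene:rat1", "fromSpecies", "taxon:rat"),
    ("gene:rat2", "fromSpecies", "taxon:rat"),
    ("gene:hum1", "fromSpecies", "taxon:human"),
    ("gene:hum2", "fromSpecies", "taxon:human"),
    ("gene:rat1", "hasOrtholog", "gene:hum1"),
    ("gene:rat2", "hasOrtholog", "gene:hum2"),
    ("disease:d1", "causedBy", "gene:hum1"),
    ("disease:d1", "hasPhenotype", "phenotype:autosomal_dominant"),
    ("disease:d2", "causedBy", "gene:hum2"),
    ("disease:d2", "hasPhenotype", "phenotype:other"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the store."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the store."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def genes_with_ortholog_phenotype(species, phenotype):
    """Genes of `species` with an ortholog that causes a disease showing `phenotype`."""
    disease_genes = set()
    for disease in subjects("hasPhenotype", phenotype):
        disease_genes |= objects(disease, "causedBy")
    hits = set()
    for gene in subjects("fromSpecies", species):
        if objects(gene, "hasOrtholog") & disease_genes:
            hits.add(gene)
    return sorted(hits)


rat_hits = genes_with_ortholog_phenotype("taxon:rat", "phenotype:autosomal_dominant")
print(rat_hits)
```

In a real deployment the same join would more likely be written as a SPARQL query against the linked dataset; the point here is only that the question spans orthology, disease and phenotype sources at once.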
State of the art in orthology resources
State of the art in orthology resources
Pros
• Open
• Many community resources available
• Increasing interest in orthologs
Cons
• Many different formats
• Local meaning of fields
• Trees and clusters with different meanings
• Difficult to share and reuse data
Our original effort
• OGOLOD: An orthology linked dataset
• Resources
  – KOG
  – Inparanoid
  – Homologene
  – OrthoMCL
  – OMIM
  – NCBI Taxonomy
  – GO
  – Human Phenotype
OGOLOD: ontological issues
[Figure: OGO ontology diagram; only its labels survive]
• Classes: ClusterOrthologs, Gene, Protein, GeneticDisease, PubmedArticle, Method, Resource
• Attributes: Name, Identifier (PubmedArticle, GeneticDisease, Protein, Gene); Location (GeneticDisease)
• Properties: hasResource, hasOrtholog, causedBy, connectedTo, hasMethod, hasPhenotype, relatedPubmedArticle, participates_in, located_in, isTranslatedTo, encodedBy, fromSpecies, evidenceCode
• External ontology terms: ECO_0000000, NCBITaxon_1, GO_0008150, GO_0003674, GO_0005575, HP_0000001
• Reused ontologies: Gene Ontology, Human Phenotype Ontology, Evidence Code Ontology, NCBI Taxonomy, Relations Ontology
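OGOLOD: the process
[Figure: pipeline from data schemas to a linked dataset]
• Schema mapping: data schemas + ontology → mapping file
• Data transformation: data + mapping file → RDF data
• Data enrichment: RDF data + Linked Data sets → linked dataset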
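The three stages of the process can be sketched roughly as follows. This is a minimal illustration, not the OGOLOD implementation: the mapping structure, the example.org namespace, the sample row and the use of owl:sameAs for enrichment are all assumptions made for the example.

```python
BASE = "http://example.org/ogolod/"  # placeholder namespace, not the real one

# Hypothetical "mapping file": which class a source row becomes and how its
# columns map to ontology properties.
MAPPING = {
    "class": "Gene",
    "id_field": "gene_id",
    "datatype_properties": {"symbol": "Name", "gene_id": "Identifier"},
    "object_properties": {"taxon": ("fromSpecies", "NCBITaxon_")},
}


def transform(rows, mapping):
    """Data transformation: turn source rows into triples using the mapping."""
    triples = []
    for row in rows:
        subject = BASE + mapping["class"].lower() + "/" + row[mapping["id_field"]]
        triples.append((subject, "rdf:type", mapping["class"]))
        for column, prop in mapping["datatype_properties"].items():
            triples.append((subject, prop, row[column]))
        for column, (prop, prefix) in mapping["object_properties"].items():
            triples.append((subject, prop, prefix + row[column]))
    return triples


def enrich(triples, external_links):
    """Data enrichment: link individuals to entries in other linked datasets."""
    extra = [
        (s, "owl:sameAs", external_links[o])
        for s, p, o in triples
        if p == "Identifier" and o in external_links
    ]
    return triples + extra


rows = [{"gene_id": "G1", "symbol": "Abc1", "taxon": "10116"}]  # invented row
rdf = transform(rows, MAPPING)
linked = enrich(rdf, {"G1": "http://example.org/other/G1"})
for triple in linked:
    print(triple)
```

Keeping the mapping as data rather than code is what allows the same transformation step to be reused for each source database.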
Linked Open Data Cloud 
(http://lod-cloud.net/)
OGOLOD: linking data
Quest For Orthologs 
(http://questfororthologs.com)
OrthoXML (http://orthoxml.org) 
• Intended community standard
• Not many databases use it so far
• OrthoXML2OGO ontology mapping
OrthoXML2OGO Mapping Examples
• Mapping genes
• Mapping clusters of orthologs
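The two mappings named above (genes and clusters of orthologs) might look roughly like this. The sketch assumes the OrthoXML 0.3 layout (species, database, genes, groups, orthologGroup, geneRef) and uses invented gene identifiers and short prefixes; the choice of ClusterOrthologs, hasOrtholog and fromSpecies as targets follows the OGO labels but is a guess at the actual OrthoXML2OGO rules.

```python
import xml.etree.ElementTree as ET

NS = {"ox": "http://orthoXML.org/2011/"}  # namespace assumed for OrthoXML 0.3

# Tiny invented OrthoXML document, for illustration only.
ORTHOXML = """<orthoXML xmlns="http://orthoXML.org/2011/" version="0.3" origin="demo" originVersion="1">
  <species name="Rattus norvegicus" NCBITaxId="10116">
    <database name="demo" version="1">
      <genes><gene id="1" geneId="RatGeneA"/></genes>
    </database>
  </species>
  <species name="Homo sapiens" NCBITaxId="9606">
    <database name="demo" version="1">
      <genes><gene id="2" geneId="HumanGeneA"/></genes>
    </database>
  </species>
  <groups>
    <orthologGroup id="cluster1">
      <geneRef id="1"/>
      <geneRef id="2"/>
    </orthologGroup>
  </groups>
</orthoXML>"""


def orthoxml_to_triples(xml_text):
    root = ET.fromstring(xml_text)
    triples = []
    local_to_gene = {}  # OrthoXML-internal gene id -> gene individual

    # Mapping genes: each <gene> becomes a Gene linked to its species.
    for species in root.iterfind("ox:species", NS):
        taxon = "NCBITaxon_" + species.get("NCBITaxId")
        for gene in species.iterfind(".//ox:gene", NS):
            gene_uri = "gene:" + gene.get("geneId")
            local_to_gene[gene.get("id")] = gene_uri
            triples.append((gene_uri, "rdf:type", "Gene"))
            triples.append((gene_uri, "fromSpecies", taxon))

    # Mapping clusters: each top-level <orthologGroup> becomes a ClusterOrthologs.
    # Nested groups are ignored in this sketch.
    for group in root.iterfind("ox:groups/ox:orthologGroup", NS):
        cluster_uri = "cluster:" + group.get("id")
        triples.append((cluster_uri, "rdf:type", "ClusterOrthologs"))
        for ref in group.iterfind("ox:geneRef", NS):
            triples.append((cluster_uri, "hasOrtholog", local_to_gene[ref.get("id")]))
    return triples


triples = orthoxml_to_triples(ORTHOXML)
for triple in triples:
    print(triple)
```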
SWIT (http://sele.inf.um.es/swit)
• Ontology-driven transformation and integration of data (Relational Databases + XML Schemas)
• Output: 4-star datasets
• Mapping and identity rules
  – Entity2Class
  – Entity2ObjectProperty
  – Entity2DatatypeProperty
  – Complex transformation patterns coded in OPPL2
• Only logically consistent content is transformed
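How identity rules and the consistency filter interact can be sketched as below. This is not SWIT's code: a real tool would delegate the check to an OWL reasoner, whereas this example stands in a single hand-written disjointness axiom (Gene versus Protein, an assumption) and invented records.

```python
# Assumed axiom for the sketch: nothing can be both a Gene and a Protein.
DISJOINT = {frozenset({"Gene", "Protein"})}


def identity_key(record, key_fields):
    """Identity rule: records with equal key fields denote the same individual."""
    return tuple(record[field] for field in key_fields)


def integrate(records, key_fields):
    """Merge records into individuals, rejecting those that would be inconsistent."""
    individuals = {}  # identity key -> set of asserted classes
    rejected = []
    for record in records:
        key = identity_key(record, key_fields)
        classes = individuals.get(key, set()) | {record["class"]}
        # Stand-in for a reasoner: refuse content that violates disjointness.
        if any(pair <= classes for pair in DISJOINT):
            rejected.append(record)
            continue
        individuals[key] = classes
    return individuals, rejected


records = [  # invented input
    {"class": "Gene", "identifier": "X1"},
    {"class": "Gene", "identifier": "X1"},     # duplicate, merged by identity rule
    {"class": "Protein", "identifier": "X1"},  # clashes with the Gene above
    {"class": "Protein", "identifier": "P9"},
]
individuals, rejected = integrate(records, ["identifier"])
print(individuals)
print(rejected)
```

The design point is that rejection happens per record during transformation, so the output dataset never contains content that would make the ontology inconsistent.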
SWIT 
(http://sele.inf.um.es/swit)
Next challenges... Biohackathon 2014?
• Need for a standard ontology for orthology
• OGO, Ortho, Homology Ontology, Comparative Data Analysis Ontology
• The evolutionary relation between two genes may differ in two different contexts
• Need for getting more orthology resources in standardized formats
Acknowledgements
• Mari Carmen Legaz García, José Antonio Miñarro Giménez, Mikel Egaña Aranguren, Marisa Madrid
• Biohackathon 2014 organizers for the invitation
TIN2010-21388-C02-02, 15295/PI/10
Thank you for your attention
Questions, comments...
Jesualdo Tomás Fernández Breis
Universidad de Murcia, IMIB-Arrixaca
jfernand@um.es
http://webs.um.es/jfernand
