Enriching the Gene Ontology via the Dissection of Labels using the Ontology Pre-Processor Language

Jesualdo Tomás Fernández-Breis, Luigi Iannone, Ignazio Palmisano, Alan L. Rector, and Robert Stevens

October 12th 2010, Lisbon, Portugal
  
Motivation

• Biomedical ontologies
  – The OBO Foundry
    • More than 200 biomedical ontologies
    • Some properties
      – Delineated content
      – Reuse of existing ontologies
      – Textual definitions
      – Systematic naming convention
    • Limited explicit semantics

Gene Ontology Consortium
  
Enrichment of GO Molecular Function

[Figure: Original GO MF → Dissection of the Ontology → Analysis of Labels → Identification of Linguistic Patterns → Design of Knowledge Patterns → Execution of the Knowledge Patterns → Enriched GO MF]
  
Dissection of the ontology into its semantic axes

• Normalization
• Analysis of the labels
  – Biochemical substances
  – Biological processes
  – Cellular components
• Reuse and combination of existing ontologies
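The dissection step can be pictured with a minimal Python sketch: normalize a label, then look its words up in vocabularies standing in for the auxiliary ontologies, grouping the hits by semantic axis. The normalization rule (underscores to spaces, lowercasing) and the three tiny vocabularies are assumptions made for illustration; the slides do not say how normalization or lookup were implemented, and real lookups would go against ontologies such as CHEBI or GO.

```python
import re

# Hypothetical mini-vocabularies standing in for the auxiliary ontologies
# (substances, biological processes, cellular components). Invented for
# illustration; real lookups would query ontologies such as CHEBI or GO.
AXES = {
    "substance": {"amino acid", "atp", "dopamine"},
    "process": {"translation", "transcription"},
    "component": {"ribosome", "nucleosome"},
}


def normalize(label: str) -> str:
    """Assumed normalization: underscores to spaces, lowercase, single spaces."""
    return re.sub(r"\s+", " ", label.replace("_", " ").lower()).strip()


def dissect(label: str) -> dict[str, list[str]]:
    """Group the vocabulary terms found in a label by semantic axis.

    Greedy longest match over word spans, so a multi-word term such as
    'amino acid' is found as a whole.
    """
    words = normalize(label).split()
    found: dict[str, list[str]] = {axis: [] for axis in AXES}
    i = 0
    while i < len(words):
        for j in range(len(words), i, -1):  # longest span starting at word i first
            phrase = " ".join(words[i:j])
            axis = next((a for a, terms in AXES.items() if phrase in terms), None)
            if axis is not None:
                found[axis].append(phrase)
                i = j  # continue after the matched term
                break
        else:
            i += 1  # no vocabulary term starts at word i
    return found
```

For example, `dissect("translation regulator activity")` returns `{"substance": [], "process": ["translation"], "component": []}`. Longest-match is chosen so that multi-word terms are not split into shorter, wrong hits.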
  
MyAuxiliarOntology

[Figure: MyAuxiliarOntology combines Biological Process, Cellular Component, MySubstances, Relations Ontology, FMA, CHEBI, MyProtein, Aminoacid, Biochemical Complex and EC-Primitive]
  
Design of linguistic patterns from labels

• Manual analysis of the structure of the labels, taxonomy by taxonomy
• Some linguistic patterns
  – “X binding”
  – “X codon amino acid adaptor activity”
  – “base pairing with X”
  – “translation X factor activity”
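Each linguistic pattern can be read as a regular expression in which X becomes a capture group. The Python sketch below is our own translation of the four patterns on the slide, not the authors' code; it returns every pattern a label fits, together with the captured X.

```python
import re

# The four linguistic patterns from the slide, with X as a named group.
# These regex translations are a sketch, not the authors' definitions.
LINGUISTIC_PATTERNS = {
    "X binding": re.compile(r"(?P<x>.+) binding"),
    "X codon amino acid adaptor activity": re.compile(
        r"(?P<x>.+) codon amino acid adaptor activity"
    ),
    "base pairing with X": re.compile(r"base pairing with (?P<x>.+)"),
    "translation X factor activity": re.compile(
        r"translation (?P<x>.+) factor activity"
    ),
}


def match_label(label: str) -> list[tuple[str, str]]:
    """Return (pattern, X) for every linguistic pattern the whole label fits."""
    hits = []
    for name, regex in LINGUISTIC_PATTERNS.items():
        m = regex.fullmatch(label)  # the whole label must fit the pattern
        if m is not None:
            hits.append((name, m.group("x")))
    return hits
```

For instance, `match_label("translation initiation factor activity")` yields `[("translation X factor activity", "initiation")]`, while a label that fits no pattern yields an empty list.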
  
Design of knowledge patterns

• Some knowledge patterns

    binding =
        molecular_function and enables some
        (binds some chemical_substance or binds some cellular_component)

    triplet_codon_amino_acid_adaptor_activity =
        molecular_function
        and enables some (adapts some (amino_acid and recognizes some triplet))
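A knowledge pattern can be read as a Manchester OWL Syntax template with a slot for X. The Python sketch below assumes that reading: it generalizes the triplet pattern by turning `triplet` into a `{x}` placeholder. The `to_identifier` naming rule is hypothetical, introduced only to turn an extracted phrase into a class name.

```python
import re

# Generalized form of the triplet pattern from the slide, in Manchester
# OWL Syntax; "{x}" is the slot for the entity extracted from the label.
ADAPTOR_TEMPLATE = (
    "molecular_function and enables some "
    "(adapts some (amino_acid and recognizes some {x}))"
)


def to_identifier(text: str) -> str:
    """Hypothetical naming rule: lowercase, runs of non-word chars become '_'."""
    return re.sub(r"\W+", "_", text.strip().lower()).strip("_")


def instantiate(template: str, x: str) -> str:
    """Fill the pattern's {x} slot with the entity named in the label."""
    return template.format(x=to_identifier(x))
```

Filling the slot with `triplet` reproduces the class expression shown on the slide. Keeping the template separate from the extraction step is what lets one pattern be reused for every label that fits it.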
  
Execution of the knowledge patterns

• OPPL version 2
  – http://oppl2.sourceforge.net/
• Bulk manipulation of OWL ontologies
  – Enrichment, verification, patterns
  – Manchester OWL Syntax
• Declarative
  – OWL axioms, variables, regular expressions
  
OPPL use case

[Figure: Values + OPPL Script; Lean → Rich; OWL axioms]

Egaña et al. OWLED 2008 & EKAW 2008, Iannone ESWC 2009
  
A pattern as an OPPL script

    ?y:CLASS=Match("((\w+))_codon_amino_acid_adaptor_activity"),
    ?x:CLASS=create(?y.GROUPS(1))
    SELECT ?y subClassOf Thing
    WHERE ?y Match("((\w+))_codon_amino_acid_adaptor_activity")
    BEGIN
    ADD ?y subClassOf molecular_function,
    ADD ?y subClassOf enables some
        (adapts some (amino_acid and recognizes some ?x))
    END;
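To make the script's behaviour explicit, here is a rough Python analogue that works on plain class-name strings rather than on an OWL ontology: it selects names matching the regular expression, creates a class from the first capture group (as `create(?y.GROUPS(1))` does), and records the two `ADD` axioms. Treating the match as a whole-name match and representing axioms as `(subclass, superclass)` string pairs are assumptions of this sketch; OPPL itself operates on a loaded OWL ontology.

```python
import re

# Hypothetical Python analogue of the OPPL script: it works on class-name
# strings, whereas OPPL rewrites a loaded OWL ontology.
# Assumption: the regular expression must match the whole class name.
PATTERN = re.compile(r"((\w+))_codon_amino_acid_adaptor_activity")


def run_script(classes: set[str]) -> tuple[set[str], list[tuple[str, str]]]:
    """Return (classes after the run, added SubClassOf axioms as (sub, super))."""
    result = set(classes)
    axioms: list[tuple[str, str]] = []
    for y in sorted(classes):  # SELECT ?y ... WHERE ?y Match(...)
        m = PATTERN.fullmatch(y)
        if m is None:
            continue
        x = m.group(1)  # ?x:CLASS=create(?y.GROUPS(1))
        result.add(x)
        # ADD ?y subClassOf molecular_function
        axioms.append((y, "molecular_function"))
        # ADD ?y subClassOf enables some (adapts some (amino_acid and recognizes some ?x))
        axioms.append(
            (y, f"enables some (adapts some (amino_acid and recognizes some {x}))")
        )
    return result, axioms


if __name__ == "__main__":
    names = {"triplet_codon_amino_acid_adaptor_activity", "atp_binding"}
    new_classes, new_axioms = run_script(names)
    print(sorted(new_classes))
    for sub, sup in new_axioms:
        print(f"{sub} SubClassOf {sup}")
```

Classes that do not match, such as `atp_binding`, pass through unchanged; this is the "bulk" aspect, since one script is applied to every class in the ontology.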
  
Results: Scope

• The “source” Gene Ontology
  – Version 1550
  – 8548 classes, 5 object properties (OP), 5 data properties (DP) and 9954 subclass axioms
  – Classification time: < 1 sec (FaCT++)
• Scope of this study (approx. 18% of GO MF)
  – binding
  – structural molecule activity
  – chaperone activity
  – proteasome regulator activity
  – electron carrier activity
  – enzyme regulator activity
  – translation regulator activity
• Complete results: http://miuras.inf.um.es/~mfoppl/
  
Results: Effectiveness

– 1567 descendant classes of binding
– Knowledge patterns:
  • Binding: 1228 / 1567 (78%)
  • Base pairing: 6 / 84
    – Molecular adaptor activity (71 / 72)
  • Triplet codon amino acid activity (64 / 64)
  • All 7 binding patterns: 1336 / 1567 (85%)
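The percentages on this slide are coverage ratios: classes matched by a pattern over classes in scope. The small Python helper below recomputes them to one decimal place from the counts on the slide.

```python
# Matched / in-scope class counts, as reported on the slide.
RESULTS = {
    "Binding": (1228, 1567),
    "Base pairing": (6, 84),
    "Molecular adaptor activity": (71, 72),
    "Triplet codon amino acid activity": (64, 64),
    "All 7 binding patterns": (1336, 1567),
}


def coverage(matched: int, total: int) -> float:
    """Share of in-scope classes matched by a pattern, in percent (1 decimal)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * matched / total, 1)


if __name__ == "__main__":
    for name, (matched, total) in RESULTS.items():
        print(f"{name}: {matched}/{total} = {coverage(matched, total)}%")
```

It gives 78.4% for binding and 85.3% for all seven patterns, which round to the 78% and 85% shown, and leaves 1567 - 1336 = 231 binding descendants not covered by any of the seven patterns.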
  
Results: Enrichment (I)

[Figure: the ontology “Before” and “After” enrichment]
  
Results: Enrichment (II)

• The enriched GO MF
  – 58624 classes, 254 OP, 16 DP, 107631 subclass axioms, 264 equivalent class axioms and 488 disjoint class axioms
  – Classification time: approx. 2 minutes (FaCT++)
  – Due to the patterns:
    • 584 new classes
      – Suboptimal auxiliary ontologies (e.g. D1 Dopamine)
      – Use of abbreviated forms in GO MF (e.g. MAPK, IgX)
    • 13 new OP
    • 3608 new subclass axioms
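Comparing these figures with the “source” counts on the Scope slide shows what share of the growth the slide attributes to the patterns. The Python sketch below only does that subtraction and ratio; treating the source counts as the baseline is an assumption, and the slides do not say where the rest of the growth comes from.

```python
# Baseline counts from the "Results: Scope" slide and enriched counts from
# this slide; FROM_PATTERNS lists what the slide attributes to the patterns.
BEFORE = {"classes": 8548, "object properties": 5, "subclass axioms": 9954}
AFTER = {"classes": 58624, "object properties": 254, "subclass axioms": 107631}
FROM_PATTERNS = {"classes": 584, "object properties": 13, "subclass axioms": 3608}


def growth(metric: str) -> int:
    """Absolute increase of a metric between the source and enriched ontology."""
    return AFTER[metric] - BEFORE[metric]


def pattern_share(metric: str) -> float:
    """Percentage of the increase that the slide attributes to the patterns."""
    return round(100 * FROM_PATTERNS[metric] / growth(metric), 1)


if __name__ == "__main__":
    for metric in BEFORE:
        print(f"{metric}: +{growth(metric)}, {pattern_share(metric)}% from patterns")
```

By this arithmetic the patterns account for about 1.2% of the 50076 additional classes, 5.2% of the 249 additional object properties and 3.7% of the 97677 additional subclass axioms.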
  	
  	
  
Results: Querying (III)

• We can make queries that were not possible with the original ontology:
  – Example: molecular functions that bind substances that play a chemical role
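Once the axioms are explicit, a query like this amounts to following `binds` links to substances and checking their roles, including subroles; that inference is what a reasoner performs over the enriched ontology. The Python sketch below imitates it on invented toy facts. Every function, substance and role name in it is hypothetical, and the single-parent, acyclic role hierarchy is a simplification.

```python
# Invented toy facts; every name here is hypothetical and not taken from
# the enriched GO MF.
# BINDS: molecular function -> substances it binds ("binds some X").
BINDS = {
    "mf_a_binding": {"substance_a"},
    "mf_b_binding": {"substance_b"},
    "mf_c_binding": {"substance_c"},
}
# HAS_ROLE: substance -> roles it plays (CHEBI-style roles, simplified).
HAS_ROLE = {
    "substance_a": {"role_x"},
    "substance_b": {"biological_role"},
    "substance_c": {"chemical_role"},
}
# PARENT_ROLE: role -> its single parent role (assumed acyclic).
PARENT_ROLE = {"role_x": "chemical_role"}


def role_ancestors(role: str) -> set[str]:
    """The role plus all of its ancestors in the role taxonomy."""
    result = {role}
    while role in PARENT_ROLE:
        role = PARENT_ROLE[role]
        result.add(role)
    return result


def functions_binding_role(target: str) -> set[str]:
    """Functions that bind some substance playing `target` or one of its subroles."""
    return {
        mf
        for mf, substances in BINDS.items()
        if any(
            target in role_ancestors(role)
            for s in substances
            for role in HAS_ROLE.get(s, set())
        )
    }
```

Here `mf_a_binding` is found for `chemical_role` only through the subrole link, which is the kind of answer that needs explicit axioms rather than label text.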
  
Results: Findings (II)

• We can make queries that were not possible with the original ontology:
  – Example: molecular functions that bind substances that play a chemical role
  
Results: Time (IV)

• Execution time of the binding patterns
  
Conclusions

• Patterns and OPPL are useful for supporting ontology enrichment processes
• The structure of the labels in biomedical ontologies embeds knowledge that can be extracted
• Benefits of encoding knowledge into patterns: modularity, maintenance and evolution
• Critical factor: the auxiliary ontologies
  
Further work

• Bio-evaluation of the patterns
• Identification of linguistic patterns using text mining techniques
• Application to the rest of GO MF and the other GO ontologies
• Alignment with the efforts of the GO Consortium
  
Acknowledgements

Thanks for your attention!

Jesualdo Tomás Fernández Breis
jfernand@um.es
http://webs.um.es/jfernand
  

