Unifying ontology services for functional genomic annotations


  1. Unifying ontology services for functional genomic annotations. Tomasz Adamusiak MD PhD (7omasz), postdoc at LHC CgSB since 10/2011. EBI is an Outstation of the European Molecular Biology Laboratory.
  2. The European Molecular Biology Laboratory, a “European NIH” for molecular biology. Sites: Heidelberg (basic research in molecular biology, administration, EMBO), Hamburg (structural biology), Hinxton (bioinformatics), Grenoble (structural biology), Monterotondo (mouse biology) • 1500 staff • >60 nationalities
  3. EMBL-EBI external funding • Sources of external funding as of December 2010 • And no taxes!
  4. Focus on providing database services to the bioinformatics community: literature and ontologies (CiteXplore, GO); genomes (Ensembl, Ensembl Genomes, EGA); nucleotide sequence (ENA); functional genomics (ArrayExpress, Expression Atlas, EFO); protein families, motifs and domains (InterPro); macromolecular structures (PDBe); protein activity (IntAct, PRIDE); pathways (Reactome); protein sequences (UniProt); chemical entities (ChEBI); systems (BioModels); BioSamples; chemogenomics (ChEMBL)
  5. ArrayExpress is the 2nd largest resource for public transcriptomics data (CIBEX < AE < GEO). [Diagram: variant annotations such as ‘blood cancer’, ‘hematological neoplasm’, ‘haematological neoplasm’, ‘lymphoma/leukemia’, ‘leukaemia’ and ‘haematological cancer’ correspond to the EFO term lymphoid neoplasm; EFO links the Archive (25k experiments) to the Atlas (2.6k experiments).]
  6. Experimental Factor Ontology (EFO) • Models the experimental factors currently in the Archive: species, diseases, cell lines, etc. • About 30% of what EFO captures is not in UMLS • Scope is determined by Atlas, Ensembl, external requests (UPenn) and EBI site-wide search
  7. Developed a process to automatically import metadata from reference ontologies and validate changes. [Chart: number of classes and synonyms in EFO over time, Aug-08 to Aug-11, y-axis 0 to 20,000; series SYNONYMS and CLASSES.]
  8. Step 1: xrefs are acquired by fuzzy lexical matching to domain ontologies. [Diagram: the EFO class acute lymphoblastic leukemia (http://www.ebi.ac.uk/efo/EFO_0000220) is matched by Perl mapping scripts (EBI::FGPT::FuzzyRecogniser, OWL::Simple::Parser) against the Disease Ontology (acute lymphocytic leukemia, DOID:9952) and the NCI Thesaurus (Acute Lymphoblastic Leukemia, NCIt:C3167); potential synonymy is recorded on the EFO class as xref: DOID:9952 and xref: NCIt:C3167.]
  9. Did not evaluate Norm in this context • Production requirements (Perl, OWL) • Improvement (n-grams) over legacy code • Primary use case is mapping EFO against AE annotations, e.g. • 2-deoxy-5-azacytidine to 5-aza-2-deoxycytidine (CHEBI:50131) • Barrett's Esophagus to Barretts esophagus • MetaMap is difficult to use on non-UMLS ontologies
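The n-gram improvement mentioned above can be illustrated with a character-trigram matcher scored by the Dice coefficient. This is a minimal sketch under stated assumptions, not the Perl EBI::FGPT::FuzzyRecogniser. The normalisation rule, trigram size, Dice scoring and 0.7 threshold are all choices made here for illustration.

```python
import re
from typing import Optional


def normalise(label: str) -> str:
    # Lower-case and drop everything except letters and digits, so
    # "Barrett's Esophagus" and "Barretts esophagus" become identical.
    return re.sub(r"[^a-z0-9]", "", label.lower())


def ngrams(label: str, n: int = 3) -> set:
    # Set of character n-grams of the normalised label.
    text = normalise(label)
    if len(text) < n:
        return {text}
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def dice(a: str, b: str, n: int = 3) -> float:
    # Dice coefficient 2|A & B| / (|A| + |B|) over the two n-gram sets.
    ga, gb = ngrams(a, n), ngrams(b, n)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def best_match(query: str, labels: dict, threshold: float = 0.7) -> Optional[tuple]:
    # Best-scoring (term_id, score) pair, or None if nothing reaches threshold.
    if not labels:
        return None
    term_id, score = max(((tid, dice(query, lab)) for tid, lab in labels.items()),
                         key=lambda pair: pair[1])
    return (term_id, score) if score >= threshold else None
```

Because character trigrams ignore word order, the reordered name 2-deoxy-5-azacytidine still shares 12 of its 16 trigrams with 5-aza-2-deoxycytidine, giving a Dice score of 0.75.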
  10. Step 2: definitions and synonyms are pulled in from reference ontologies via NCBO BioPortal. [Diagram: the EFO class acute lymphoblastic leukemia (http://www.ebi.ac.uk/efo/EFO_0000220) carries xrefs DOID:9952, SNOMEDCT:91857003 and NCIt:C3167; IDs are translated and BioPortal is queried (fetch) for NCIt:C3167, returning synonym: Acute lymphoid leukaemia, disease and definition: Leukemia with an acute onset [...]. Each imported value is stored with provenance, e.g. bioportal_provenance: Acute Lymphocytic Leukaemia [accessedResource: NCIt:C3167] [accessDate: 05-04-2011] and bioportal_provenance: Leukemia with an acute onset [...] [accessedResource: NCIt:C3167] [accessDate: 05-04-2011].]
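The fetch-and-annotate part of Step 2 can be sketched as follows. The fetch callable is a hypothetical stand-in for a BioPortal lookup; no real BioPortal client or endpoint is assumed. The record fields only mirror the accessedResource and accessDate annotations on the slide. Dates are stored in ISO 8601 here because the slide's 05-04-2011 does not say which field is the month.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List, Tuple

# Hypothetical fetcher: maps a reference-ontology ID (e.g. "NCIt:C3167")
# to (kind, text) pairs, where kind is "synonym" or "definition".
# In practice this would wrap a BioPortal request; here it is any callable.
Fetcher = Callable[[str], List[Tuple[str, str]]]


@dataclass(frozen=True)
class ImportedValue:
    kind: str               # "synonym" or "definition"
    text: str               # value copied from the reference ontology
    accessed_resource: str  # provenance: which external ID supplied it
    access_date: str        # provenance: when it was fetched (ISO 8601 here)


def import_with_provenance(xrefs: List[str], fetch: Fetcher,
                           today: date) -> List[ImportedValue]:
    """Fetch synonyms/definitions for every xref and tag each with provenance."""
    stamp = today.isoformat()
    imported = []
    for xref in xrefs:
        for kind, text in fetch(xref):
            imported.append(ImportedValue(kind, text, xref, stamp))
    return imported
```

Keeping provenance on every value is what later lets a regression check report which external term a synonym or definition came from.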
  11. Step 3: a regression testing package produces a report for manual verification of the import • 13 different tests • Shared xrefs, e.g. NCIt:C17459 (Hispanic or Latino) on both • Hispanic (EFO_0003169) • Latino (EFO_0003166) • Shared synonyms, e.g. head kidney (ZFA:0000669) on both • pronephros (EFO_0000927) • bone marrow (EFO_0000868) • Changes in external sources (11/2010 vs. 5/2010): • synonym Spinocerebellar Ataxias (EFO_0002624) no longer in DOID:1441 • definition "Organ with organ cavity which connects the cavity of the urinary bladder to the exterior. […]" (EFO_0000931) no longer in FMAID:1966
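Two of the checks listed above reduce to simple set operations: finding xrefs or synonyms shared between classes, and finding values that disappeared from an external source between releases. This is a minimal sketch, assuming each class's xrefs (or synonyms) are already loaded into a dictionary. It is not the actual regression package.

```python
from collections import defaultdict
from typing import Dict, List, Set


def shared_values(classes: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Values (xrefs or synonyms) attached to more than one class.

    classes maps a class ID to its set of values; the result maps each
    shared value to the sorted list of class IDs that carry it.
    """
    owners = defaultdict(set)
    for class_id, values in classes.items():
        for value in values:
            owners[value].add(class_id)
    return {value: sorted(ids) for value, ids in owners.items() if len(ids) > 1}


def dropped_values(before: Dict[str, Set[str]],
                   after: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Values an external term had in an older release but lacks in a newer one."""
    dropped = {}
    for term_id, old_values in before.items():
        missing = old_values - after.get(term_id, set())
        if missing:
            dropped[term_id] = missing
    return dropped
```

Both functions only flag candidates. As on the slide, the resulting report is meant for manual verification, since a shared xref can be legitimate.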
  12. EFO has a unique XSLT-based web presence: http://www.ebi.ac.uk/efo/overview
  13. EFO URIs are readable by humans and computers
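  14. Content negotiation is an alternative approach, e.g. with Tuckey's server-side UrlRewriteFilter (the condition value is a regular expression, so the "+" must be escaped to match literally):
      <rule>
        <!-- Clients asking for RDF/XML at the root are redirected to the OWL file -->
        <condition name="Accept" type="header">application/rdf\+xml</condition>
        <from>^/$</from>
        <to type="redirect">/efo/efo.owl</to>
      </rule>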
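The same decision can be written out in a few lines of Python, which also makes q-value handling explicit; the UrlRewriteFilter rule above only pattern-matches the header. This is a minimal sketch. The root path and the /efo/efo.owl target come from the rule. The tie-breaking against text/html and */* is an assumption added here. Linked-data guidance often recommends 303 See Other for this kind of redirect; 302 is used because that is what a plain servlet redirect sends.

```python
from typing import Dict, Tuple


def parse_accept(header: str) -> Dict[str, float]:
    """Parse an HTTP Accept header into {media_type: q-value}."""
    prefs = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs[media] = q
    return prefs


def negotiate(path: str, accept: str) -> Tuple[int, str]:
    """Redirect RDF-preferring clients at the root to the OWL file.

    Returns (302, location) for a redirect, or (200, "text/html")
    when the human-readable page should be served instead.
    """
    prefs = parse_accept(accept)
    rdf_q = prefs.get("application/rdf+xml", 0.0)
    html_q = max(prefs.get("text/html", 0.0), prefs.get("*/*", 0.0))
    if path == "/" and rdf_q > 0 and rdf_q >= html_q:
        return 302, "/efo/efo.owl"
    return 200, "text/html"
```

A typical browser header such as text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 never names application/rdf+xml, so browsers keep getting the XSLT-rendered page.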
  15. "The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries" (W3C). "If you want to put something on the web there are three rules: 1. All kinds of conceptual things, they have names now that start with HTTP. 2. If I take one of these HTTP names and I look it up [...] I fetch the data using the HTTP protocol from the web, I will get back some data in a standard format. 3. It's got relationships [...] the other thing that it's related to is given one of those names that starts HTTP. So, I can go ahead and look that thing up." Sir Tim Berners-Lee on the next Web (TED2009)
  16. The RDF triple is the core concept underpinning the semantic web: subject <http://www.example.com/index.html>, predicate <http://purl.org/dc/elements/1.1/creator>, object "John Smith" (abbreviated: example:index.html dc:creator "John Smith"). It is an Entity Attribute Value (EAV) model with well-defined semantics.
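Read as EAV, the triple above is one row: entity = subject, attribute = predicate, value = object. A graph is then a set of such rows queried by pattern. This is a toy sketch in plain Python. A real application would use an RDF library that keeps URIs and literals apart, which this sketch does not.

```python
from typing import List, NamedTuple, Optional, Set


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # a URI or a literal; this sketch does not distinguish them


DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

graph: Set[Triple] = {
    Triple("http://www.example.com/index.html", DC_CREATOR, "John Smith"),
}


def match(graph: Set[Triple], s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t.subject == s)
            and (p is None or t.predicate == p)
            and (o is None or t.obj == o)]
```

The "well-defined semantics" part is what plain EAV lacks. Here the predicate is a dereferenceable URI whose meaning is fixed by the Dublin Core vocabulary rather than by a local column name.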
  17. Open linked data lacks central URI reconciliation • Responsibility for URIs: http://bio2rdf.org/mesh:68009154, http://bio2rdf.org/pubmed:11992264, http://bio2rdf.org/go:0016458, http://purl.org/obo/owl/GO#GO_0016458 • Versioning: http://sig.uw.edu/fma#Anatomical_entity (FMA 3.1), http://sig.biostr.washington.edu/fma3.0#Anatomical_entity (FMA 3.0), http://purl.obolibrary.org/obo/GO_0016458 (Foundry-compliant URI) • Requires institutional support • Would be great to have a public UMLS in RDF
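Without central reconciliation, each consumer ends up writing mappings like the one below. It folds the three GO URI variants listed above onto the Foundry-compliant form. This is a minimal sketch: the regular expressions are assumptions that cover only these examples and are not a published mapping table.

```python
import re
from typing import Optional

# Patterns for the GO URI variants listed above (assumed, not exhaustive).
GO_PATTERNS = [
    re.compile(r"^http://bio2rdf\.org/go:(\d{7})$"),
    re.compile(r"^http://purl\.org/obo/owl/GO#GO_(\d{7})$"),
    re.compile(r"^http://purl\.obolibrary\.org/obo/GO_(\d{7})$"),
]


def to_foundry_go(uri: str) -> Optional[str]:
    """Rewrite a known GO URI variant to the Foundry-compliant form, else None."""
    for pattern in GO_PATTERNS:
        m = pattern.match(uri)
        if m:
            return f"http://purl.obolibrary.org/obo/GO_{m.group(1)}"
    return None
```

Every such table has to be maintained by whoever writes it, which is why the slide argues that this needs institutional support.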
  18. Common Ontology Application Tasks (OntoCAT). [Diagram repeated from slide 5: variant annotations mapped to EFO: lymphoid neoplasm, linking the Archive (25k experiments) and the Atlas (2.6k experiments).]
  19. There is no single ontology resource that covers all the use cases: local ontologies in OWL/OBO, NCBO BioPortal, EBI Ontology Lookup Service ... and no huffing and puffing will blow all of them down ... (illustration: Leonard Leslie Brooke, 1904)
  20. EBI Ontology Lookup Service • 82 ontologies • OBO ontologies • SOAP web services/Java client • First out there. Cote RG, Jones P, Apweiler R, Hermjakob H. The ontology lookup service, a lightweight cross-platform tool for controlled vocabulary queries. BMC Bioinformatics. 2006 Feb 28;7(1):97.
  21. NCBO BioPortal • 267 ontologies and growing • Both OWL and OBO • REST web services • Rich in functionality. Noy NF, Shah NH, Whetzel PL, Dai B, Dorf M, Griffith N, Jonquet C, Rubin DL, Storey MA, Chute CG, Musen MA. BioPortal: ontologies and integrated data resources at the click of a mouse. Nucleic Acids Res. 2009 Jul 1;37(Web Server issue):W170-3.
  22. OLS vs. BioPortal (July 2010)
  23. OWL API • Reference implementation for manipulating and serialising OWL 2 • Multiple parsers (incl. OBO) • Reasoner interfaces • Low-level access. Bechhofer S, Lord P, Volz R. Cooking the Semantic Web with the OWL API. 2nd International Semantic Web Conference (ISWC), Sanibel Island, Florida, October 2003.
  24. We wanted to annotate data with ontology terms within the MOLGENIS framework: an ontology browser. [Diagram: EFO (loaded via the OWL API) and BioPortal feed an import step into the Ontology Browser.]
  25. Integration is hard
  26. A simple facade to ontology resources, providing the set of functions most common to ontology APIs (e.g. HL7 CTS2, UMLS API) under a single interface: http://www.ontocat.org. [Diagram: BioPortal, EBI OLS, OWL and OBO sources (and more?) behind one interface exposing searchAll(), searchOntology(), getChildren(), getParents(), getSynonyms(), getDefinitions(), getAllParents(), getAllChildren(), getRelations(), ...]
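The facade idea can be sketched in a few lines. OntoCAT itself is Java; the Python class and method names below are hypothetical snake_case echoes of the functions on the slide, not OntoCAT's API. The sketch shows two points. A derived operation such as get_all_parents can be written once on top of a primitive like get_parents. A composite can present several backends as one service.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Term:
    ontology: str
    accession: str
    label: str


class OntologyService(ABC):
    """Hypothetical common interface; method names loosely follow the slide."""

    @abstractmethod
    def search_all(self, query: str) -> List[Term]: ...

    @abstractmethod
    def get_parents(self, term: Term) -> List[Term]: ...

    def get_all_parents(self, term: Term) -> List[Term]:
        # Transitive closure built only on get_parents, so every backend
        # inherits it without extra code.
        seen, result, stack = set(), [], list(self.get_parents(term))
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.add(parent)
                result.append(parent)
                stack.extend(self.get_parents(parent))
        return result


class InMemoryService(OntologyService):
    """Toy backend standing in for a local OWL/OBO file or a web service."""

    def __init__(self, terms: List[Term], parents: Dict[str, List[Term]]):
        self.terms = terms
        self.parents = parents  # accession -> direct parents

    def search_all(self, query: str) -> List[Term]:
        q = query.lower()
        return [t for t in self.terms if q in t.label.lower()]

    def get_parents(self, term: Term) -> List[Term]:
        return self.parents.get(term.accession, [])


class CompositeService(OntologyService):
    """Presents several backends as one by concatenating their answers."""

    def __init__(self, *services: OntologyService):
        self.services = services

    def search_all(self, query: str) -> List[Term]:
        return [t for s in self.services for t in s.search_all(query)]

    def get_parents(self, term: Term) -> List[Term]:
        return [p for s in self.services for p in s.get_parents(term)]
```

Client code written against OntologyService does not change when a local file is swapped for a remote service, which is the point of the facade.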
  27. There are many ways you could use OntoCAT • Store data and annotate it with ontology terms: OntoCAT database and browser • Work with ontologies in R: Bioconductor ontocat R package • Integrate a number of ontologies in a local repository: OntoCAT REST server • Add ontology support to your GWT web application: OntoCAT GoogleApp. http://www.ontocat.org/wiki/OntocatDownload
  28. The curious case of OntoFox, OntoBee, and OntoCAT
  29. Developed for internal and external use cases. Example 11 at ontocat.org • Automatically obtain CUIs from UMLS sources for extracted terms via BioPortal • Shamim Mollah, Bleeding History Phenotype Ontology, Rockefeller University Center for Clinical and Translational Science, New York, NY: 1. Get all terms from BHP 2. Search for corresponding UMLS terms (also MetaMap) 3. Obtain CUIs for mapped terms through BioPortal
  30. ontocat R is first on Google
  31. Use case: explore beyond subsumption. Example 16 at ontocat.org • Requested by a reviewer for partonomy in GO • Easy in OBO, hard in OWL • Computationally intensive (starting from the root node): 1. classify all children of inverse_relation some class 2. repeat step 1 on all new nodes 3. finish when all nodes have been seen • The OWL API is not thread safe
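The three-step procedure above is a breadth-first walk with a visited set. In the sketch below, the callable direct_parts is a hypothetical stand-in for the expensive reasoner query (classes that are part_of some given class). The walk itself is generic. It runs sequentially, which fits the note that the OWL API is not thread safe.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def explore_partonomy(root: str,
                      direct_parts: Callable[[str], Iterable[str]]) -> Dict[str, List[str]]:
    """Walk a part_of hierarchy breadth-first from root.

    direct_parts(whole) stands in for the expensive reasoner query
    "subclasses of (part_of some whole)". Each class is queried once,
    and the walk ends when no unseen classes remain.
    """
    tree: Dict[str, List[str]] = {}
    seen = {root}
    queue = deque([root])
    while queue:
        whole = queue.popleft()
        parts = sorted(direct_parts(whole))  # step 1: one query per node
        tree[whole] = parts
        for part in parts:                   # step 2: schedule new nodes
            if part not in seen:
                seen.add(part)
                queue.append(part)
    return tree                              # step 3: every node seen
```

The visited set guarantees termination even if the asserted partonomy contains a cycle, and it keeps the number of reasoner calls equal to the number of distinct classes reached.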
  32. Reasoning is fundamental to exploring the hierarchies of more expressive ontologies. [Diagram relating Heart, Heart Component, Left Heart and Mitral Valve via partOf and is_a.]
  33. When ontologies classify as inconsistent it is often not obvious why (Open World Assumption) • Mary is_a CitizenOfFrance. Is Paul a citizen of France? Closed world (e.g. SQL databases): NO. Ontologies: ? (unknown) • OWL is more expressive: classes, individuals, closure axioms, value partitions, cardinality restrictions, property chains; disjoint, reflexive, irreflexive, symmetric and asymmetric, inverse or transitive properties • Explanation in OWL (http://owl.cs.manchester.ac.uk/explanation/)
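The Mary/Paul example can be spelled out as two lookup functions. This is a toy illustration of the two assumptions, not a reasoner. Real OWL reasoning derives negative facts from axioms such as disjointness, rather than reading them from a set as done here.

```python
from enum import Enum
from typing import Set, Tuple

Fact = Tuple[str, str]  # (individual, class)


class Answer(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "unknown"


def closed_world(facts: Set[Fact], individual: str, cls: str) -> Answer:
    # SQL-style: anything not asserted is taken to be false.
    return Answer.YES if (individual, cls) in facts else Answer.NO


def open_world(facts: Set[Fact], negated: Set[Fact],
               individual: str, cls: str) -> Answer:
    # OWL-style: absence of a fact proves nothing; only an explicit
    # negative fact (which a reasoner would derive from axioms) gives NO.
    if (individual, cls) in facts:
        return Answer.YES
    if (individual, cls) in negated:
        return Answer.NO
    return Answer.UNKNOWN
```

With only "Mary is a citizen of France" asserted, the closed-world lookup answers NO for Paul while the open-world lookup answers UNKNOWN. That gap is why missing information rarely makes an OWL ontology inconsistent, and why an inconsistency, when it does occur, is often hard to trace.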
  34. The extra information is used in QC of EFO, but not in query expansion. [Diagram: ventricular cardiomyopathy, atrial fibrillation, myocardium, ventricular myocardium, atrial myocardium, cardiac ventricle, atrium and heart linked by subClassOf, part_of and has_disease_location. Is each disease a heart disease?]
  35. Google Analytics for ontocat.org
  36. OntoCAT is enterprise-grade, low-maintenance, headache-free, zero-configuration software • Java 6, Maven and Ant support • Open source (LGPL v3) • 137 unit tests • Hudson daily builds (screenshots: tests passed, daily builds) • Flexibility through design patterns: decorators, proxies, composites
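As an example of the decorator pattern listed above, a caching wrapper can sit in front of any slow remote ontology service without the caller noticing. This is a hypothetical Python sketch, not OntoCAT's Java class. It assumes only that the wrapped object exposes a get_parents(term_id) method. Every other attribute is passed through unchanged.

```python
from typing import Dict, List


class CachingService:
    """Decorator: memoises get_parents() of a wrapped ontology service.

    Repeated lookups against a slow remote service are made only once;
    every other attribute is delegated unchanged to the wrapped object.
    """

    def __init__(self, inner):
        self._inner = inner
        self._cache: Dict[str, List[str]] = {}

    def get_parents(self, term_id: str) -> List[str]:
        if term_id not in self._cache:
            self._cache[term_id] = list(self._inner.get_parents(term_id))
        return list(self._cache[term_id])  # copy so callers cannot corrupt the cache

    def __getattr__(self, name):
        # Only called for attributes not found on the decorator itself.
        return getattr(self._inner, name)
```

Because the wrapper exposes the same methods as the object it wraps, decorators like this can be stacked (e.g. caching plus logging) and combined with a composite of several backends.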
  37. Semantic Web Atlas of Gene Expression. [Diagram repeated from slide 5: variant annotations mapped to EFO: lymphoid neoplasm, linking the Archive (25k experiments) and the Atlas (2.6k experiments).]
  38. The inferred EFO is_a hierarchy defines how experiments are aggregated in Atlas for re-analysis: http://www.ebi.ac.uk/gxa
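Aggregation along the inferred hierarchy means that an experiment annotated with a specific term also counts towards every superclass of that term. The sketch below assumes the reasoner has already produced the inferred is_a map. The experiment IDs in the usage check are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All superclasses of term in an (already inferred) is_a map."""
    seen: Set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parents.get(parent, []))
    return seen


def aggregate(annotations: Iterable[Tuple[str, str]],
              parents: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each term to the experiments annotated with it or any subclass."""
    by_term: Dict[str, Set[str]] = defaultdict(set)
    for experiment, term in annotations:
        by_term[term].add(experiment)
        for ancestor in ancestors(term, parents):
            by_term[ancestor].add(experiment)
    return dict(by_term)
```

Using the inferred rather than the asserted hierarchy matters here. A superclass that is only entailed by the reasoner would otherwise miss experiments that belong under it.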
  39. It is possible to infer diseases of the heart computationally rather than asserting this information directly. [Diagram as on slide 34.] $\text{heart disease} \equiv \exists\,\text{has\_disease\_location}.(\text{heart} \sqcup \exists\,\text{part\_of}.\text{heart})$
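The right-hand side of this definition can be checked over a plain graph. A disease qualifies if one of its locations is the heart or is, transitively, part_of the heart. This is a closed-world approximation of what a reasoner does by classification, not a substitute for one. The example edges are assumptions loosely based on the terms in the diagram, and the hepatitis entry is an added negative control.

```python
from typing import Dict, List, Set


def wholes(entity: str, part_of: Dict[str, List[str]]) -> Set[str]:
    """Everything entity is part of, following part_of transitively."""
    seen: Set[str] = set()
    stack = list(part_of.get(entity, []))
    while stack:
        whole = stack.pop()
        if whole not in seen:
            seen.add(whole)
            stack.extend(part_of.get(whole, []))
    return seen


def is_organ_disease(disease: str, location_of: Dict[str, List[str]],
                     part_of: Dict[str, List[str]], organ: str = "heart") -> bool:
    """True if some has_disease_location filler is organ or part_of organ."""
    return any(loc == organ or organ in wholes(loc, part_of)
               for loc in location_of.get(disease, []))


# Illustrative data (assumed edges, see lead-in).
PART_OF = {"atrial myocardium": ["atrium"], "atrium": ["heart"],
           "ventricular myocardium": ["cardiac ventricle"],
           "cardiac ventricle": ["heart"]}
LOCATION_OF = {"atrial fibrillation": ["atrial myocardium"],
               "ventricular cardiomyopathy": ["ventricular myocardium"],
               "hepatitis": ["liver"]}
```

No disease in this data is asserted to be a heart disease. The classification follows entirely from has_disease_location and the part_of chain, which is the point of the slide.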
  40. One RDF graph per experiment accession; context-specific gene expression is grouped with blank nodes. [Diagram: RDF graph for experiment gxa:E-AFMX-1 (experiment accession) with nodes Homo sapiens, liver, PRDX2 (ensembl:ENSG00000167815), NONDE and 1.0E30, and predicates/classes rdf:type, is_about (IAO_0000136), organism (OBI_0100026), experimental factor (EFO_0000001), discretized differential expression (EFO_0004034), p value (OBI_0000175), gene (efo:EFO_0002606) and efo:EFO_0004033.] W3C Note on RDF Approach to Gene Expression Data (in progress), Semantic Web for Health Care and Life Sciences Interest Group, BioRDF task force
  41. One RDF graph per experiment accession; context-specific gene expression is grouped with blank nodes. [Diagram: as on slide 40, with a further context, approximately 14 weeks, grouping another NONDE gene (efo:EFO_0002606) result with p value 1.0E30 under its own blank node.] W3C Note on RDF Approach to Gene Expression Data (in progress), Semantic Web for Health Care and Life Sciences Interest Group, BioRDF task force
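The grouping idea can be sketched as plain triples. There is one named graph per experiment accession, and each context-specific result hangs off its own blank node, so a gene's values for different contexts (e.g. liver vs. approximately 14 weeks) do not get mixed. The ex: predicates are hypothetical placeholders, not the vocabulary of the W3C note, and the example values are taken from the diagram.

```python
from itertools import count
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def experiment_graph(accession: str,
                     results: Iterable[Tuple[str, str, str, str]]) -> Dict[str, Set[Triple]]:
    """Build one named graph per experiment accession.

    Each (gene, factor_value, expression_call, p_value) result is grouped
    under its own blank node. The ex: predicates are hypothetical.
    """
    experiment = f"gxa:{accession}"
    blank_ids = count()
    triples: Set[Triple] = set()
    for gene, factor_value, call, p_value in results:
        node = f"_:b{next(blank_ids)}"  # fresh blank node for this context
        triples |= {
            (experiment, "ex:hasResult", node),
            (node, "ex:gene", gene),
            (node, "ex:factorValue", factor_value),
            (node, "ex:expression", call),
            (node, "ex:pValue", p_value),
        }
    return {experiment: triples}
```

Blank nodes keep the grouping cheap because no URI has to be minted for each gene-in-context result. The price is that such results cannot be referenced from outside the experiment's graph.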
  42. The Sesame triplestore provided the shortest time-to-market. [Architecture diagram: REST + XSLT = RDF; WAR; RDF; Jun Zhao @ Oxford; tc-test-3, tomcat-7, tomcat-8; load balancer; Jena TDB; Milarq; wwwdev; www.ebi.ac.uk; www.open-biomed.org.uk]
  43. The Semantic Web is unlikely to take over the web, but has the potential to unify all of bioinformatics. [Diagram: OntoCAT, EFO, Semantic Atlas; image source: http://gigaom.com/broadband/the-storage-vs-bandwidth-debate/]
  44. Acknowledgments • Morris A. Swertz's group at the Genomics Coordination Center (GCC), University of Groningen: K Joeri van der Velde, Despoina Antonakaki, Dasha Zhernakova • James Malone • Helen Parkinson • FuzzyRecogniser: Emma Hastings • Niran Abeygunawardena • Ele Holloway • Tim Rayner • Zooma: Tony Burdett • Bioconductor/R package: Natalja Kurbatova, Pavel Kurnosov, Misha Kapushesky. This work was supported by the European Community's Seventh Framework Programme GEN2PHEN [grant number 200754], SLING [grant number 226073], and SYBARIS [grant number 242220], the European Molecular Biology Laboratory, the Netherlands Organisation for Scientific Research [NWO/Rubicon grant number 825.09.008], and the Netherlands Bioinformatics Centre [BioAssist/Biobanking platform and BioRange grant SP1.2.3]. OntoCAT logo courtesy of Eamonn Maguire. Special thanks go to the NCBO BioPortal and EBI OLS support teams for all the comprehensive help they provide.
  45. Slides @ www.slideshare.net/adamusiak. Thank you!
