
Using ontologies to do integrative systems biology


To really get ahead with complex health problems like cancer and diabetes, we need to become better at combining different types of studies, including large-scale genomics and genetics studies, and we need to learn to better combine such studies with the biological knowledge we already have. Typically that leads to questions like “I did this study with a high-fat versus low-fat diet comparison in mice and looked at the transcriptomics results in liver, fat and muscle. Did somebody else maybe do a study like that and publish the data, maybe for proteomics? Could I find that in one of these open data repositories?”. Or, “I did that, can I find which biological pathways are affected most and whether any of the proteins in those pathways is a known target for an existing drug?”. Or even “I did that study, could I find another study that yielded the same kind of biological results even if it was from a different research field with a completely different result?”.
To answer these kinds of questions we need to describe studies and study results, structure knowledge, allow mapping of “equal” things with different identifier schemes, and essentially do a lot of mapping to and between ontologies. More and more of this is becoming real, and I will try to describe some of that.

Homepage for this webinar is here: http://www.bioontology.org/ontologies-in-integrative-systems-biology

It is part of this series: http://www.bioontology.org/webinar-series

Published in: Education

Using ontologies to do integrative systems biology

  1. 1. Using ontologies to do integrative systems biology. Chris Evelo, Department of Bioinformatics - BiGCaT, Maastricht University. @Chris_Evelo, chris.evelo@maastrichtuniversity.nl
  2. 2. Typically we want to: • Find studies. • Process data. • Integrate. • Evaluate. • Combine with yet other data. (Faculty of Health, Medicine and Life Sciences)
  3. 3. Systems Biology Issues: • Environment • Multi-compartment • Different levels of gene expression cascade (multi-omics). Needs: • Link information from different analysis techniques • Combine many studies (store study design)
  4. 4. Using ISA to be able to find studies. http://dx.doi.org/10.1038/ng.1054
  5. 5. Why a study capturing application? New studies can be performed based on old data. Translational comparisons (mouse, human, rat etc.). Structured storage. Facilitate collaborations between groups: data sharing on a joint project, starting a collaboration.
  6. 6. What do we need to accomplish this? Acceptance: using standards (e.g. ISA-TAB & MAGE-TAB), user friendly (interface via web browser), open source, examples. Collaboration: ontologies, security of data (log-in and store data locally), open source (make own module).
  7. 7. dbXP: a total study capturing solution. [Diagram components: Web input, Study capturing module, Feature layer, Web output; Simple assay module, Metabolomics module, Transcriptomics module, Any new module]
  8. 8. dbNP Architecture. [Diagram components: GSCF (Templates; Subjects, Groups, Events, Protocols, Samples, Assays). Simple Assay module (body weight, BMI, etc.). Transcriptomics module (raw data: cell files; clean data: gene expression; result data: p-values, z-values). Epigenetics module (raw data: Nimblegen, Illumina; clean data: CPG island data; resulting Genome Feature data). Query module (pathways, GO, metabolite profiles; full-text querying, structured querying, profile-based analysis, study comparison). Web user interface.]
  9. 9. Generic Study Capture Framework: data input / output. [Diagram components: GSCF (Templates; Subjects, Groups, Events, Protocols, Samples, Assays). Data import: xls, csv, text. Ontologies via the NCBO web interface. Connected to: custom programs, custom dbs, Molgenis, EBI repository.]
  10. 10. Used in European Projects: Food4me (Dublin); NU-AGE (UNIBO, Bologna); Bioclaims (UIB, Palma); Nutritech (TNO, Zeist); EuroDish (WUR, Wageningen); ITFoM (proposed for metabolic syndrome studies)
  11. 11. Process the data…
  12. 12. Epigenetics DNA Methylation Pipeline. [Diagram: three inputs, each with its own QC and processing step: raw data Nimblegen (R QC, processing); raw data Illumina (R QC, processing); raw sequencing data MeDIP, BIS-Seq (sequence QC, processing). These yield clean DNA methylation data, followed by statistical analysis, giving result data with p-values in Genome Feature Format (GFF).]
  13. 13. Connecting to Pathways: 1) Prepare data for pathway analysis. 2) Connect processing pipelines: PathVisioRPC used from arrayanalysis.org, see: http://pathvisiorpc.wordpress.com 3) Store pathway profiles as vectors, using pathways themselves as a vocabulary (C Evelo, K van Bochove & J Saito. Answering biological questions - querying a systems biology database for nutrigenomics. Genes Nutr (2011) 6: 81-87). 4) Allow queries for studies with same outcome.
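Points 3 and 4 of this slide can be illustrated with a minimal sketch. It assumes each study's pathway profile is stored as a vector of per-pathway scores over one shared, ordered pathway vocabulary. The study names, the scores and the choice of cosine similarity are all hypothetical; the slides do not say which similarity measure is used.

```python
import math

# Shared pathway vocabulary: position i in every profile refers to PATHWAYS[i].
PATHWAYS = ["Fatty_Acid_Beta-Oxidation", "Electron_Transport_Chain", "Oxidative_Stress"]

# Hypothetical studies with made-up per-pathway scores (e.g. z-scores).
PROFILES = {
    "study_A": [2.4, 4.3, 2.1],
    "study_B": [2.0, 3.9, 1.8],
    "study_C": [-1.5, 0.2, -2.2],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equally long score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero profile is treated as similar to nothing
    return dot / (norm_u * norm_v)


def most_similar(query, profiles):
    """Rank all other studies by similarity to the query study's profile."""
    ranked = [
        (cosine_similarity(profiles[query], vector), name)
        for name, vector in profiles.items()
        if name != query
    ]
    return sorted(ranked, reverse=True)
```

Calling `most_similar("study_A", PROFILES)` ranks the studies whose pathway-level outcome points in the same direction first, independent of which platform or identifier scheme produced the underlying data.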
  14. 14. Integrate. Example WikiPathways pathway: pathway on glycolysis, using modern systems biology annotation, with genes and metabolites connected to major databases.
  15. 15. Find the pathways: Biological processes in duodenal mucosa affected by glutamine administration (columns give the number of genes)
      Pathway                                             Changed  Up  Down  Measured  Total  Z Score
      Hs_Mitochondrial_fatty_acid_betaoxidation                 6   6     0        16     16    4.456
      Hs_Electron_Transport_Chain                              17  17     0        85    105    4.278
      Hs_Fatty_Acid_Synthesis                                   5   5     0        21     22    2.757
      Hs_Fatty_Acid_Beta-Oxidation                              6   6     0        31     32    2.424
      Hs_mRNA_processing_Reactome                              16   6    10       118    127    2.402
      Hs_Unsaturated_Fatty_Acid_Beta_Oxidation                  2   2     0         6      6    2.342
      Hs_HSP70_and_Apoptosis                                    4   4     0        18     18    2.299
      Hs_Oxidative_Stress                                       5   5     0        27     28    2.097
      Hs_Fatty_Acid_Omega_Oxidation                             3   3     0        14     15    1.915
      Hs_Proteasome_Degradation                                 8   8     0        60     61    1.629
      Hs_RNA_transcription_Reactome                             5   5     0        38     40    1.25
      Hs_Irinotecan_pathway_PharmGKB                            2   1     1        12     12    1.154
      Hs_Synthesis_and_Degradation_of_Ketone_Bodies_KEGG        1   1     0         5      5    1.023
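The slide does not state how the Z Score column was computed. The sketch below assumes the MAPPFinder-style z-score that pathway statistics tools commonly use: the number of changed genes in a pathway is compared with the number expected from the overall fraction of changed genes, with a finite-population correction. The dataset totals `R` and `N` are not given on the slide, so the values in the usage note are hypothetical and the sketch does not try to reproduce the table.

```python
import math


def pathway_z_score(r, n, R, N):
    """z-score for r changed genes among n measured in a pathway,
    given R changed genes among N measured in the whole dataset."""
    if not (0 <= r <= n <= N and 0 <= R <= N and N > 1):
        raise ValueError("counts are inconsistent")
    p = R / N  # overall fraction of changed genes
    # Variance of the changed-gene count when n genes are drawn without replacement.
    variance = n * p * (1 - p) * (1 - (n - 1) / (N - 1))
    if variance == 0:
        return 0.0
    return (r - n * p) / math.sqrt(variance)
```

For example, with hypothetical totals of 1000 changed genes out of 10000 measured, a pathway with 6 changed out of 16 measured genes gets `pathway_z_score(6, 16, 1000, 10000)`, a clearly positive score because only 1.6 changed genes would be expected by chance.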
  16. 16. Connecting to other data. We both need study capturing.
  17. 17. If the mountain will not come to Mahomet, Mahomet must go to the mountain. Other repositories (like dbXP!) have better study descriptions. Integrate in Sage Synapse. Pathway visualisation missing: integrate PathVisio in Synapse (started).
  18. 18. PathVisio, www.pathvisio.org • Data modeling and visualization on biological pathways • Uses gene expression, proteomics and metabolomics data • Can identify significantly changed processes. Martijn P van Iersel, Thomas Kelder, Alexander R Pico, Kristina Hanspers, Susan Coort, Bruce R Conklin, Chris Evelo (2008) Presenting and exploring biological pathways with PathVisio. BMC Bioinformatics 9: 399
  19. 19. Understanding genomics. Example WikiPathways pathway: pathway on glycolysis, using modern systems biology (MIM) annotation, with genes and metabolites connected to major databases.
  21. 21. Adding data = adding colour. Example PathVisio result showing proteomics and transcriptomics results on the glycolysis pathway in mouse liver after starvation. [Data from Kaatje Lenaerts and Milka Sokolovic, analysis by Martijn van Iersel]
  22. 22. Download Pathways Web services SPARQL endpoint
  23. 23. How to do data visualization?
  24. 24. Connect to Genome Databases
  25. 25. Backpages link to databases
  26. 26. BridgeDb, http://dx.doi.org/10.1186/1471-2105-11-5 Martijn van Iersel, BiGCaT Maastricht
  27. 27. Problem: Identifier Mapping. How do we get from Entrez Gene 3643 to Agilent probeset A65_P12450?
  28. 28. Solution: Built-in Mapping • Generic bioinformatics platforms should have identifier mapping built-in: BioConductor, PathVisio, Cytoscape, ... Batteries included.
  29. 29. Problem: Which mapping service? • Ensembl Biomart • Synergizer • CRONOS • DAVID • AliasServer • MatchMiner • OntoTranslate, or • Local database
  30. 30. BridgeDb: Abstraction Layer. The interface IDMapper is implemented by class IDMapperRdb (relational database), class IDMapperFile (tab-delimited text) and class IDMapperBiomart (web service). The BridgeDb Framework: Standardized Access to Gene, Protein and Metabolite Identifier Mapping Services. Martijn P van Iersel, Alexander R Pico, Thomas Kelder, Jianjiong Gao, Isaac Ho, Kristina Hanspers, Bruce R Conklin, Chris T Evelo. BMC Bioinformatics 2010, 11: 5.
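The abstraction layer idea can be sketched in Python: tools program against one mapper interface, and the backend (database, text file, web service) is interchangeable. This is an illustrative analogue only, not BridgeDb's Java API; the class names `TableMapper` and `TabFileMapper`, the method `map_id`, the four-column file layout and the placeholder Ensembl-style identifier are all assumptions made for the example.

```python
from abc import ABC, abstractmethod
import csv
import io


class IDMapper(ABC):
    """Common interface: tools call map_id() and never see the backend."""

    @abstractmethod
    def map_id(self, source, identifier, target):
        """Return the set of `target` identifiers equivalent to (source, identifier)."""


class TableMapper(IDMapper):
    """Backend over in-memory rows of (source, source_id, target, target_id)."""

    def __init__(self, rows):
        self._index = {}
        for source, source_id, target, target_id in rows:
            # Index both directions so a mapping can be followed either way.
            self._index.setdefault((source, source_id, target), set()).add(target_id)
            self._index.setdefault((target, target_id, source), set()).add(source_id)

    def map_id(self, source, identifier, target):
        return set(self._index.get((source, identifier, target), ()))


class TabFileMapper(TableMapper):
    """Backend that loads the same four columns from tab-delimited text."""

    def __init__(self, handle):
        rows = [row for row in csv.reader(handle, delimiter="\t") if len(row) == 4]
        super().__init__(rows)


# Hypothetical mapping file content; the Ensembl-style ID is a placeholder.
EXAMPLE_FILE = io.StringIO("Entrez Gene\t3643\tEnsembl\tENSG_PLACEHOLDER\n")
mapper = TabFileMapper(EXAMPLE_FILE)
```

Code that only receives an `IDMapper` keeps working when the file-backed mapper is swapped for another backend, which is the point of the layer.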
  31. 31. [Diagram: tools built on BridgeDb: PathVisio, WikiPathways, Cytoscape plugins (CyThesaurus, Network Merge). Mapping services behind BridgeDb: internet web services (BioMart, PICR, BridgeDb-REST), local database, tab-delimited text files.]
  32. 32. BridgeDb interface. 1: Java interface. 2: REST interface.
  33. 33. API Overview: BridgeDb.connect(...), IDMapper.mapID(...), Xref.getUrl(), DataSource.getUrl()
  34. 34. Easy & Flexible Code
  35. 35. REST API. http://webservice.bridgedb.org/Human/xrefs/L/1234 returns:
      ILMN_1713029      Illumina
      3255967           Affy
      NP_001025186      RefSeq
      IPI00005930       IPI
      GO:0042752        GeneOntology
      NM_033282         RefSeq
      3255968           Affy
      94233             Entrez Gene
      ENSG00000122375   Ensembl Human
      234226_at         Affy
      A6NEB4            Uniprot/TrEMBL
      0001780601        Illumina
      GO:0008020        GeneOntology
      606665            OMIM
      A_23_P24234       Agilent
      14449             HUGO
  36. 36. REST API. URL pattern: http://<Base URL>/<Species>/<function> [ /<argument> ... ]
      http://webservice.bridgedb.org/Human/xrefs/L/1234
      http://webservice.bridgedb.org/Human/search/ENSG00000122375
      http://webservice.bridgedb.org/Human/attributeSet
      http://webservice.bridgedb.org/Human/properties
      http://webservice.bridgedb.org/Human/targetDataSources
      http://webservice.bridgedb.org/Human/attributes/L/3643
      http://localhost:8183/Human/xrefs/L/3643
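A small sketch of how a client could assemble URLs following the pattern on this slide and read an xrefs response. The helper names `build_url` and `parse_xrefs` are hypothetical, and the parser assumes the response is plain text with one tab-separated "identifier, data source" pair per line; the slides show the pairs but not the separator. No request is sent here.

```python
from urllib.parse import quote

BASE_URL = "http://webservice.bridgedb.org"


def build_url(species, function, *arguments, base_url=BASE_URL):
    """Assemble <Base URL>/<Species>/<function>[/<argument> ...]."""
    parts = [base_url.rstrip("/"), quote(species), quote(function)]
    parts.extend(quote(str(argument), safe="") for argument in arguments)
    return "/".join(parts)


def parse_xrefs(body):
    """Parse lines of 'identifier<TAB>data source' into (identifier, source) pairs."""
    xrefs = []
    for line in body.splitlines():
        if not line.strip():
            continue  # skip blank lines
        identifier, _, source = line.partition("\t")
        xrefs.append((identifier.strip(), source.strip()))
    return xrefs


# The same URL as the first example on the slide.
url = build_url("Human", "xrefs", "L", "1234")
# Two pairs taken from the response shown on the previous slide.
pairs = parse_xrefs("94233\tEntrez Gene\nENSG00000122375\tEnsembl Human\n")
```

Passing `base_url="http://localhost:8183"` gives the local-service form shown in the last example URL.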
  37. 37. R Example
  38. 38. Problem: Custom Microarrays ? Custom probe #QXZCY!34
  39. 39. Solution: Stacking. EnsMart + custom table.
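One reading of stacking is that a small custom table maps the custom probe to a well-known identifier, and a general mapping source takes it from there. The sketch below chains mapping functions in that way. It is not BridgeDb's implementation; `stack`, `table_mapper` and all identifiers except the probe name from the previous slide are hypothetical.

```python
def table_mapper(table):
    """Wrap a dict of identifier -> list of identifiers as a mapping function."""
    return lambda identifier: set(table.get(identifier, ()))


def stack(*mappers):
    """Compose mapping functions: the output IDs of one feed the next."""
    def mapped(identifier):
        current = {identifier}
        for mapper in mappers:
            following = set()
            for item in current:
                following |= mapper(item)
            current = following
        return current
    return mapped


# Hypothetical tables: custom probe -> gene ID, then gene ID -> another scheme.
custom_table = table_mapper({"#QXZCY!34": ["GENE_PLACEHOLDER"]})
general_table = table_mapper({"GENE_PLACEHOLDER": ["3643"]})
stacked = stack(custom_table, general_table)
```

`stacked("#QXZCY!34")` follows both steps, while an identifier unknown to the custom table yields an empty set.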
  40. 40. CyThesaurus
  41. 41. MIRIAM and Identifiers.org • Regular expression for autodetection • Pattern for generating URLs • Link to documentation
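The first two points can be sketched as a small registry in which each data source carries a regular expression and a URL template. The patterns below are simplified stand-ins written for this example, not copied from the MIRIAM registry, and the registry layout and function names are assumptions.

```python
import re

# Illustrative registry: one regex and one URL template per data source.
REGISTRY = {
    "ncbigene": {
        "pattern": re.compile(r"^\d+$"),
        "url": "https://identifiers.org/ncbigene:{id}",
    },
    "ensembl": {
        "pattern": re.compile(r"^ENSG\d{11}$"),  # human gene IDs only
        "url": "https://identifiers.org/ensembl:{id}",
    },
    "go": {
        "pattern": re.compile(r"^GO:\d{7}$"),
        "url": "https://identifiers.org/{id}",
    },
}


def detect(identifier):
    """Return every data source whose pattern matches the identifier."""
    return [name for name, entry in REGISTRY.items()
            if entry["pattern"].match(identifier)]


def make_url(source, identifier):
    """Fill the URL template of a data source with the identifier."""
    return REGISTRY[source]["url"].format(id=identifier)
```

`detect` returns a list rather than a single answer because patterns overlap in practice: a purely numeric identifier could belong to several databases, so autodetection can only suggest candidates.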
  42. 42. Availability: BMC Bioinformatics. 2010 Jan 4;11(1):5. www.bridgedb.org, www.helixsoft.nl/blog, bridgedb-discuss@googlegroups.com
  43. 43. Innovate using BridgeDb. Data, metabolite, flux: visualizing fluxes on metabolic pathways
  44. 44. Integrating it all: visualizing fluxes, data and annotation
  45. 45. Extending pathways, how to do it?
  46. 46. Network approaches to extend pathways. E.g. most pathways don’t have miRNAs
  47. 47. Adding miRNAs
  48. 48. Pathway Loom, weaving pathways
  50. 50. Adding miRNAs clutters
  51. 51. PathVisio RI plugin provides backpage info. microRNAs in pathway analysis: the Regulatory Interaction plugin offers a suitable middle ground between not including any miRNAs in pathways, which misses this regulatory information, and including all validated miRNA-target interactions, which clutters the pathway. After loading interaction file(s), selecting a pathway element shows the interaction partners of this element and their expression in a side panel. This allows for the detection of potentially active regulatory mechanisms in the study at hand. http://www.bigcat.unimaas.nl/wiki/images/f/f6/VanHelden-poster-nbic2012.pdf
  52. 52. Or consider pathway as a network
  53. 53. GPML Cytoscape Plugin, http://www.pathvisio.org/wiki/Cytoscape_plugin
  54. 54. Cytoscape visualization used to group. [Figure labels: PPS1, Liver, All pathways] Pathways with high z-score grouped together. Explains why there are relatively few significant genes, but many pathways with high z-score. Robert Caesar et al (2010) A combined transcriptomics and lipidomics analysis of subcutaneous, epididymal and mesenteric adipose tissue reveals marked functional differences. PLoS One 5: 7. e11525 http://dx.doi.org/doi:10.1371/journal.pone.0011525
  55. 55. Explore pathway interactions. Thomas Kelder, Lars Eijssen, Robert Kleemann, Marjan van Erk, Teake Kooistra, Chris Evelo (2011) Exploring pathway interactions in insulin resistant mouse liver. BMC Systems Biology 5: 127 (Aug). http://dx.doi.org/doi:10.1186/1752-0509-5-127
  56. 56. What we used: non-redundant shortest paths in a weighted graph. 1. A set of pathways. 2. An interaction network. 3. Weight value for all edges = experimental expression of connected genes.
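The core step, finding a shortest path through a weighted interaction network, can be sketched with Dijkstra's algorithm. This is only the basic building block under stated assumptions: the network and its weights are hypothetical, how expression values are turned into edge weights is not specified on the slide, and the "non-redundant" filtering of paths is not implemented here.

```python
import heapq


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm on {node: {neighbor: weight}} with non-negative weights.

    Returns (total_weight, path) or (inf, []) when the goal is unreachable.
    """
    queue = [(0.0, start, [start])]
    best = {start: 0.0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor, path + [neighbor]))
    return float("inf"), []


# Hypothetical network: a gene in pathway 1, a gene in pathway 2, and
# intermediate proteins x, y, z; low weight stands for a strongly supported edge.
NETWORK = {
    "pathway1_gene": {"x": 1.0, "y": 4.0},
    "x": {"z": 1.0},
    "y": {"pathway2_gene": 1.0},
    "z": {"pathway2_gene": 1.0},
}
```

In this toy network the longer route through `x` and `z` wins over the two-step route through `y` because its summed weight is lower, which is how expression-derived weights can steer paths through the proteins that matter in a given experiment.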
  57. 57. Pathway interactions and what causes them
  58. 58. An indirect interaction between the Axon Guidance and Insulin Signaling pathways in the network for the comparison between HF and LF diet at t = 0. Left: Network representation of the identified path between the two pathways, consisting of three proteins Gsk3b, Sgk3 and Tsc1. Right: The location of these proteins in the KEGG pathway diagrams. The newly found indirect interactions have been added in red.
  59. 59. Pathway interactions and detailed network visualization for the interactions with three apoptosis related pathways for the comparison between HF and LF diet at t = 0. A: Subgraph of the pathway interaction network, based on incoming interactions to three stress response and apoptosis pathways with the highest in-degree. Pathway nodes with a thick border are significantly enriched (p < 0.05) with differentially expressed genes. B: The protein interactions that compose the interactions between the three apoptosis related pathways and their neighbors in the subgraph as shown in box A (see inset, included interactions are colored orange). Protein nodes have a thick border when their encoding genes are significantly differentially expressed (q < 0.05).
  60. 60. We tried to make it easier with the CyTargetLinker Cytoscape plugin: extending pathways on the fly. Databases provided with the plugin: • miRNAs with targets • Transcription factors with targets • Drug-target interactions • ENCODE derived databases. Extend with your own.
  61. 61. miRNAs of interest: miRNA target information from miRTarBase
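The idea of extending a pathway on the fly can be sketched as a filter over a regulator-target table: keep every interaction whose target is already in the pathway. This is not CyTargetLinker code; the function name, the tuple layout and all gene, miRNA and drug names are hypothetical.

```python
def extend_pathway(pathway_genes, interactions, kinds=None):
    """Return regulator -> target edges whose target is in the pathway.

    interactions: iterable of (regulator, target, kind) tuples.
    kinds: optional set of interaction kinds to keep, e.g. {"miRNA"}.
    """
    genes = set(pathway_genes)
    edges = []
    for regulator, target, kind in interactions:
        if target in genes and (kinds is None or kind in kinds):
            edges.append((regulator, target, kind))
    return edges


# Hypothetical interaction table mixing several kinds of regulators.
INTERACTIONS = [
    ("miR-x", "GeneA", "miRNA"),
    ("miR-y", "GeneB", "miRNA"),
    ("TF-1", "GeneA", "transcription factor"),
    ("Drug-1", "GeneZ", "drug"),
]
PATHWAY = ["GeneA", "GeneB", "GeneC"]
```

Restricting `kinds` lets a user add only one layer of regulation at a time, which helps keep the extended pathway readable.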
  62. 62. miRTarBase as a target interaction network Collection of miRNA-target gene interactions in the miRTarBase database with 1,715 genes, 286 miRNAs and 2,817 interactions.
  63. 63. miRNAs associated with colorectal cancer extended with validated target genes
  64. 64. Human ErbB signaling pathway extended with validated microRNA regulation
  65. 65. OPS Framework architecture, Dec 2011. [Diagram components: OPS GUI; App Framework; Web Service API (SPARQL, web services); OPS Data Model; Identity & Vocabulary Management; Semantic Data Workflow Engine; RDF Data Cache; Chemistry Normalisation & Registration; Nanopubs; Public Vocabularies; Descriptors and RDF 1-4 on top of Data 1-4.] Feed in WikiPathways relationships, use BioPAX to create the RDF.
  66. 66. And then we have linked data?
  67. 67. Well yes, for Open PHACTS we do… [Diagram components: OPS Data Model; Identity & Vocabulary Management; Semantic Data Workflow Engine; Chemistry Normalisation & Registration; Public Vocabularies; Descriptors and RDF 1-2 on top of Data 1-2]
  68. 68. But really…, what about federated SPARQL queries? [Diagram: two Descriptor / RDF / Data stacks (1 and 2), with Public Vocabularies and Other Public Vocabularies]
  69. 69. Most often partly… If the vocabularies used are different, linking just database IDs is not good enough. We need full mappings of ontologies and identification of overlapping modules. And maybe… suggestions for ontologies to use in a specific field. [Diagram: Identity Mapping across two Descriptor / RDF / Data stacks (1 and 2), with Public Vocabularies and Other Public Vocabularies]
  70. 70. Thanks! WikiPathways team: • Martijn van Iersel (PathVisio, BridgeDb) • Thomas Kelder (WikiPathways, networks) • Alex Pico (US team leader) • Bruce Conklin (former US team leader) • Kristina Hanspers (US curation) • Martina Kutmon (CyTargetLinker) • Susan Coort (Regulatory plugins) • Lars Eijssen (Data pipelines) • Anwesha Dutta (Flux visualisation) • Andra Waagmeester (LOOM) • Egon Willighagen (Open Phacts) Funding. Dutch: IOP, NBIC, NuGO, NCSB. Regional: Transnational University. EU: NuGO and Microgennet, IMI: Open Phacts + Agilent thought leader grant and NIH.
  72. 72. Analyzing GO representation in pathways using an independent library for ontology analysis. Combining efforts and information to increase biological understanding
  73. 73. Structuring biological data • Gene Ontology (GO) – Protein function or localization – Hierarchically structured terms – 3 topics (namespaces) • Biological process • Molecular function • Cellular component – Disadvantage • No information on interactions
  74. 74. Structuring biological data • Pathways – Network of interactions – Structural overview of elements in the pathway – Disadvantages: • Missing structure of interacting pathways • Overlap and abundance in pathways
  75. 75. Analysis based on structures • Uses: – Better overview of the data – Increased biological understanding • Challenges in the field: – Difficulty comparing algorithms – Good work may be overlooked – Redundant efforts – Out-of-date algorithms used – Comparison extremely difficult
  76. 76. Goals: • Develop an independent library for ontology analysis in which efforts can be combined • Increase biological understanding by combining knowledge on pathways and gene ontology.
  77. 77. Independent library for ontology analysis • Open source: – Collaboration – Clear view of the algorithm – Free use – Minimizing redundant efforts • Usable for multiple ontologies and identifiers
  78. 78. Combining Pathways and GO • Display information on the function of the pathway • Make a comparison between pathways • Quality control – Single pathway – List of pathways
  79. 79. Materials • PathVisio – Open source tool for visualizing and analyzing pathway data • BridgeDb – ID mapping framework for bioinformatics • WikiPathways – Community curated pathway data source
  80. 80. Independent Library • Manager input: 1. Ontology terms (file) 2. Map of term with identifier 3. Method selection
  81. 81. Methods
                              IDs linked to GO   Genes not linked to GO   Total
      IDs in pathway          a                  b                        a+b
      IDs not in pathway      c                  d                        c+d
      Total                   a+c                b+d                      n
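The slide gives the 2×2 table but not the statistic computed from it. One common choice for such a table is a one-sided Fisher exact (hypergeometric) test for over-representation, sketched below as an assumption rather than as the library's actual method. The function name is hypothetical; `a`, `b`, `c`, `d` follow the cell labels of the table.

```python
from math import comb


def enrichment_p_value(a, b, c, d):
    """One-sided Fisher exact test on the 2x2 table.

    a: IDs in the pathway linked to the GO term
    b: IDs in the pathway not linked to it
    c: IDs outside the pathway linked to it
    d: IDs outside the pathway not linked to it
    Returns the probability of seeing at least `a` linked IDs in the pathway
    if pathway membership and GO annotation were independent.
    """
    n = a + b + c + d
    in_pathway = a + b
    linked = a + c
    total = comb(n, in_pathway)  # ways to choose the pathway members
    upper = min(in_pathway, linked)
    tail = sum(
        comb(linked, k) * comb(n - linked, in_pathway - k)
        for k in range(a, upper + 1)
    )
    return tail / total
```

A small p-value suggests the GO term is over-represented in the pathway; when many terms are tested, a multiple-testing correction would normally follow.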
  82. 82. Plug-in • Panel for the analysis of a single pathway – Display GO terms in a table with score – Highlight matches – Save results • Menu item for analyzing a list of pathways – Select a folder containing pathway files – Individual result files – File containing all results with extra info
  83. 83. Single Pathway analysis
  84. 84. Single Pathway analysis • Regulation of blood pressure • Angiogenesis • Others: – G-protein coupled receptor – proteolysis
      Homo sapiens (name, score):
      G-protein coupled receptor signaling pathway            35%
      regulation of cell proliferation                        29%
      proteolysis                                             29%
      regulation of blood pressure                            29%
      response to drug                                        29%
      regulation of vasoconstriction                          29%
      positive regulation of apoptotic process                29%
      negative regulation of cell growth                      23%
      kidney development                                      23%
      elevation of cytosolic calcium ion concentration        23%
      Mus musculus (name, score):
      kidney development                                      50%
      G-protein coupled receptor signaling pathway            50%
      response to drug                                        37%
      negative regulation of cell proliferation               37%
      positive regulation of apoptotic process                37%
      regulation of blood pressure                            37%
      response to salt stress                                 25%
      regulation of systemic arterial blood pressure by circulatory renin-angiotensin   25%
      arachidonic acid secretion                              25%
      blood vessel development                                25%
  85. 85. Multiple Pathway analysis
  86. 86. Multiple Pathway analysis. [Bar charts] Biological Process (12 of 105 terms): signal transduction; xenobiotic metabolic process; oxidation-reduction process; metabolic process; G-protein coupled receptor signaling pathway; gene expression; nerve growth factor receptor signaling pathway; apoptotic process; synaptic transmission; DNA repair; mitotic cell cycle; innate immune response. Cellular Component (12 of 26 terms): cytoplasm; cytosol; nucleus; plasma membrane; membrane; integral to membrane; mitochondrion; nucleoplasm; endoplasmic reticulum membrane; extracellular region; endoplasmic reticulum; integral to plasma membrane; microsome; extracellular space.
  87. 87. Goals: • Develop an independent library for ontology analysis in which efforts can be combined • Increase biological understanding by combining knowledge on pathways and gene ontology.
  88. 88. Independent library • Reads GO terms from file • Mapping from term to identifier • Analysis on sample data • Framework enables more methods to be added
  89. 89. Combining Pathways and GO • Single Pathway: – More information on pathway – Quality control possible • Pathway List: – Separate results for every pathway – Enables structuring possibilities – Quality control possible
