Using ontologies to do integrative systems biology

Chris Evelo
Department of Bioinformatics - BiGCaT
Maastricht University
@Chris_Evelo
chris.evelo@maastrichtuniversity.nl

Typically we want to:

• Find studies.
• Process data.
• Integrate.
• Evaluate.
• Combine with yet other data.

Faculty of Health, Medicine and Life Sciences

Systems Biology Issues:
• Environment
• Multi-compartment
• Different levels of gene expression cascade
(multi-omics)

Needs:
• Link information from different analysis
techniques
• Combine many studies (store study design)


Using ISA to
be able to
find studies
http://dx.doi.org/10.1038/ng.1054


Why a study capturing application?

New studies can be performed based on old data

Translational comparisons (mouse, human, rat etc)

Structured storage

Facilitate collaborations between groups
- Data sharing on joint projects
- Start a collaboration

What do we need to accomplish this

Acceptance
- Using standards (e.g. ISA-TAB & MAGE-TAB)
- User friendly (interface via web browser)
- Open source
- Examples

Collaboration
- Ontologies
- Security of data (log-in and store data locally)
- Open source (make own module)

dbXP: a total study capturing solution

Simple assay module Metabolomics module

Web input Study capturing module Web output

Feature layer
Transcriptomics module Any new module

dbNP Architecture
[Diagram: modules behind a shared web user interface]
• GSCF: templates for subjects, groups, events, protocols, samples and assays
• Simple Assay module: body weight, BMI, etc.
• Transcriptomics module: raw data (cell files), clean data (gene expression), result data (p-values, z-values)
• Epigenetics module: raw data (Nimblegen, Illumina), clean data (CpG island data), resulting data (genome feature data)
• Query module: full-text querying, structured querying, profile-based analysis (pathways, GO, metabolite profiles), study comparison


Generic Study Capture Framework
Data input / output
[Diagram: GSCF at the centre with its connections]
• GSCF: templates for subjects, groups, events, protocols, samples and assays
• Data import: xls, csv, text
• Ontologies: NCBO web interface
• Connected to: custom programs, custom dbs, Molgenis, EBI repository

Used in European Projects

Food4me (Dublin)

NU-AGE (UNIBO, Bologna)

Bioclaims (UIB, Palma)

Nutritech (TNO, Zeist)

EuroDish (WUR, Wageningen)

ITFoM (proposed for metabolic syndrome studies)

Process the data…


Epigenetics DNA Methylation Pipeline
[Diagram: three raw-data routes converging on one analysis]
• Raw data Nimblegen: R QC, processing
• Raw data Illumina: R QC, processing
• Raw sequencing data (MeDIP, BIS-Seq): Sequence QC, processing
• All routes yield clean DNA methylation data (Genome Feature Format, GFF)
• Statistical analysis yields result data with p-values

Connecting to Pathways:
1) Prepare data for pathway analysis

2) Connect processing pipelines
PathVisioRPC used from arrayanalysis.org
see: http://pathvisiorpc.wordpress.com

3) Store pathway profiles as vectors,
using pathways themselves as a vocabulary
C Evelo, K van Bochove & J Saito (2011) Answering biological questions - querying a systems biology database for nutrigenomics. Genes Nutr 6: 81-87

4) Allow queries for studies with same outcome
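
A minimal sketch of steps 3 and 4, assuming each study is stored as a vector of pathway scores indexed by a fixed list of pathway names (the vocabulary). The study names, the scores and the choice of cosine similarity are illustrative assumptions; the slides do not say which similarity measure dbNP uses.

```python
from math import sqrt

# The pathway names act as the vocabulary: one vector position per pathway.
VOCABULARY = [
    "Hs_Electron_Transport_Chain",
    "Hs_Fatty_Acid_Synthesis",
    "Hs_Oxidative_Stress",
]


def profile_vector(profile, vocabulary=VOCABULARY):
    """Turn a {pathway: score} mapping into a fixed-order vector (missing = 0)."""
    return [profile.get(name, 0.0) for name in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_similar_studies(query, studies):
    """Return (study name, similarity) pairs, most similar first."""
    q = profile_vector(query)
    scored = [(name, cosine_similarity(q, profile_vector(p))) for name, p in studies.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical stored studies and query profile (made-up scores).
stored = {
    "study_A": {"Hs_Electron_Transport_Chain": 4.0, "Hs_Fatty_Acid_Synthesis": 2.5},
    "study_B": {"Hs_Oxidative_Stress": 3.0},
}
query = {"Hs_Electron_Transport_Chain": 4.3, "Hs_Fatty_Acid_Synthesis": 2.8}
ranking = rank_similar_studies(query, stored)
```

Because every study uses the same pathway vocabulary, a query for "studies with the same outcome" reduces to a nearest-neighbour search over these vectors.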


Integrate

Example
WikiPathways Pathway
Pathway on glycolysis.
Using modern systems
biology annotation.

And genes and
metabolites connected
to major databases.


Find the pathways:
Biological processes in duodenal mucosa affected by glutamine administration

Columns Changed, Up, Down, Measured and Total give numbers of genes.

Pathway                                             Changed  Up  Down  Measured  Total  Z Score
Hs_Mitochondrial_fatty_acid_betaoxidation                 6   6     0        16     16    4.456
Hs_Electron_Transport_Chain                              17  17     0        85    105    4.278
Hs_Fatty_Acid_Synthesis                                   5   5     0        21     22    2.757
Hs_Fatty_Acid_Beta-Oxidation                              6   6     0        31     32    2.424
Hs_mRNA_processing_Reactome                              16   6    10       118    127    2.402
Hs_Unsaturated_Fatty_Acid_Beta_Oxidation                  2   2     0         6      6    2.342
Hs_HSP70_and_Apoptosis                                    4   4     0        18     18    2.299
Hs_Oxidative_Stress                                       5   5     0        27     28    2.097
Hs_Fatty_Acid_Omega_Oxidation                             3   3     0        14     15    1.915
Hs_Proteasome_Degradation                                 8   8     0        60     61    1.629
Hs_RNA_transcription_Reactome                             5   5     0        38     40     1.25
Hs_Irinotecan_pathway_PharmGKB                            2   1     1        12     12    1.154
Hs_Synthesis_and_Degradation_of_Ketone_Bodies_KEGG        1   1     0         5      5    1.023
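
The slide does not state how the Z Score column is computed. A common choice for this kind of pathway table is the MAPPFinder-style z-score, which compares the number of changed genes in a pathway with the number expected from the whole dataset. The sketch below assumes that formula; the dataset-wide totals used in the example (1000 measured genes, 100 changed) are hypothetical, because the table only reports per-pathway counts.

```python
from math import sqrt


def pathway_z_score(changed_in_pathway, measured_in_pathway, changed_total, measured_total):
    """MAPPFinder-style z-score (assumed formula, not confirmed by the slides).

    changed_in_pathway  (r): genes in the pathway meeting the criterion
    measured_in_pathway (n): genes in the pathway that were measured
    changed_total       (R): genes in the whole dataset meeting the criterion
    measured_total      (N): genes measured in the whole dataset
    """
    r, n, R, N = changed_in_pathway, measured_in_pathway, changed_total, measured_total
    if N <= 1 or n <= 0:
        raise ValueError("need at least one measured gene in the pathway and N > 1")
    fraction = R / N
    expected = n * fraction
    # Variance of a hypergeometric draw of n genes out of N.
    variance = n * fraction * (1 - fraction) * (1 - (n - 1) / (N - 1))
    if variance <= 0:
        raise ValueError("z-score undefined when the variance is zero")
    return (r - expected) / sqrt(variance)


# Per-pathway counts from the first table row; dataset totals are hypothetical.
z = pathway_z_score(changed_in_pathway=6, measured_in_pathway=16,
                    changed_total=100, measured_total=1000)
```

A positive value means more changed genes than expected by chance; the result depends strongly on the dataset-wide totals, so the example will not reproduce the 4.456 in the table.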

Connecting to
other data

We both need
Study Capturing


If the mountain will not
come to Mahomet,
Mahomet must go to
the mountain.

Other repositories (like
dbXP!) have better
study descriptions.
Integrate in Sage
Synapse.

Pathway visualisation
missing: integrate
PathVisio in Synapse
(started).


PathVisio
www.pathvisio.org

• Data modeling and visualization on biological pathways
• Uses gene expression, proteomics and metabolomics data
• Can identify significantly changed processes
Martijn P van Iersel, Thomas Kelder, Alexander R Pico, Kristina Hanspers, Susan Coort, Bruce R Conklin, Chris
Evelo (2008) Presenting and exploring biological pathways with PathVisio. BMC Bioinformatics 9: 399

Understanding
genomics

Example
WikiPathways Pathway
Pathway on glycolysis.
Using modern systems
biology (MIM) annotation.

And genes and metabolites
connected to major
databases.


adding data =
adding colour

Example
PathVisio result
Showing proteomics
and transcriptomics
results on the glycolysis
pathway in mouse liver
after starvation.
[Data from Kaatje
Lenaerts and Milka
Sokolovic, analysis by
Martijn van Iersel]


Download Pathways
Web services

SPARQL endpoint

Backpages link to databases


BridgeDb
http://dx.doi.org/10.1186/1471-2105-11-5

Martijn van Iersel
BiGCaT Maastricht

Problem: Identifier Mapping
Entrez Gene
3643

?
Agilent probeset
A65_P12450

Solution: Built-in Mapping
• Generic
bioinformatics
platforms should
have identifier
mapping built-in.

BioConductor
PathVisio
Cytoscape
...
Batteries
Included

Problem: Which mapping service?

• Ensembl Biomart
• Synergizer
• CRONOS
• DAVID
• AliasServer
• MatchMiner
• OntoTranslate
or
• Local database

BridgeDb: Abstraction Layer
interface IDMapper, implemented by:
• class IDMapperRdb: relational database
• class IDMapperFile: tab-delimited text
• class IDMapperBiomart: web service

The BridgeDb Framework: Standardized Access to Gene, Protein and Metabolite Identifier
Mapping Services. Martijn P van Iersel, Alexander R Pico, Thomas Kelder, Jianjiong Gao, Isaac Ho,
Kristina Hanspers, Bruce R Conklin, Chris T Evelo. BMC Bioinformatics 2010, 11: 5.
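
The abstraction layer can be illustrated with a small sketch: client code talks to one interface and the backend (database, text file, web service) is interchangeable. This is a Python analogy, not the BridgeDb Java API; the class names TableMapper and TabFileMapper, the method name map_id and the three-column file layout are assumptions made for the example. The identifier pair comes from the "Problem: Identifier Mapping" slide and has not been checked against a real mapping database.

```python
import csv
import io
from abc import ABC, abstractmethod


class IDMapper(ABC):
    """Common interface: every backend answers the same question."""

    @abstractmethod
    def map_id(self, identifier, target_system):
        """Return the set of identifiers in target_system linked to identifier."""


class TableMapper(IDMapper):
    """Backend over in-memory rows; stands in for a relational database."""

    def __init__(self, rows):
        # rows: iterable of (source_id, target_system, target_id)
        self._index = {}
        for source_id, target_system, target_id in rows:
            self._index.setdefault((source_id, target_system), set()).add(target_id)

    def map_id(self, identifier, target_system):
        return set(self._index.get((identifier, target_system), set()))


class TabFileMapper(TableMapper):
    """Backend over tab-delimited text with three columns per line."""

    def __init__(self, handle):
        reader = csv.reader(handle, delimiter="\t")
        super().__init__(tuple(row) for row in reader if len(row) == 3)


def lookup(mapper, identifier, target_system):
    """Client code depends only on the IDMapper interface."""
    return sorted(mapper.map_id(identifier, target_system))


from_table = TableMapper([("3643", "Agilent", "A65_P12450")])
from_file = TabFileMapper(io.StringIO("3643\tAgilent\tA65_P12450\n"))
```

Both mappers give the same answer to lookup(), which is the point of the layer: a tool written against the interface does not need to know where the mappings are stored.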

[Diagram: BridgeDb as a layer between tools and mapping services]
• Tools: CyThesaurus, Network Merge, PathVisio, WikiPathways, Cytoscape plugins
• BridgeDb
• Mapping services: internet webservices (BioMart, PICR, BridgeDb REST), local database, tab-delimited text files

BridgeDb interface
1: Java interface 2: REST interface

API Overview
BridgeDb.connect(...)
IDMapper.mapID(...)

Xref.getUrl()
DataSource.getUrl()

REST API
http://webservice.bridgedb.org/Human/xrefs/L/1234

ILMN_1713029 Illumina
3255967 Affy
NP_001025186 RefSeq
IPI00005930 IPI
GO:0042752 GeneOntology
NM_033282 RefSeq
3255968 Affy
94233 Entrez Gene
ENSG00000122375 Ensembl Human
234226_at Affy
A6NEB4 Uniprot/TrEMBL
0001780601 Illumina
GO:0008020 GeneOntology
606665 OMIM
A_23_P24234 Agilent
14449 HUGO

REST API
http://<Base URL>/<Species>/<function> [ /<argument> ... ]

http://webservice.bridgedb.org/Human/xrefs/L/1234
http://webservice.bridgedb.org/Human/search/ENSG00000122375
http://webservice.bridgedb.org/Human/attributeSet
http://webservice.bridgedb.org/Human/properties
http://webservice.bridgedb.org/Human/targetDataSources
http://webservice.bridgedb.org/Human/attributes/L/3643
http://localhost:8183/Human/xrefs/L/3643
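
The URL pattern above can be sketched as a small helper that assembles request URLs and parses a response. The sketch makes no network call. It assumes the response body is one cross-reference per line with the identifier and the data source separated by a tab, which matches the two-column listing on the earlier slide but is not specified there. The function names are hypothetical.

```python
from urllib.parse import quote


def bridgedb_url(base_url, species, function, *arguments):
    """Build <Base URL>/<Species>/<function>[/<argument> ...]."""
    parts = [base_url.rstrip("/"), quote(species), quote(function)]
    # Arguments are path segments, so "/" inside an argument must be escaped.
    parts.extend(quote(str(argument), safe="") for argument in arguments)
    return "/".join(parts)


def parse_xrefs(body):
    """Parse a response body into (identifier, data source) pairs.

    Assumes one tab-separated pair per line; blank lines are skipped.
    """
    xrefs = []
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        identifier, _, source = line.partition("\t")
        xrefs.append((identifier, source))
    return xrefs


url = bridgedb_url("http://webservice.bridgedb.org", "Human", "xrefs", "L", "1234")
sample_body = "94233\tEntrez Gene\nNM_033282\tRefSeq\n"
pairs = parse_xrefs(sample_body)
```

Pointing base_url at http://localhost:8183 instead gives the URLs for a locally running service, as in the last example line of the slide.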

Problem: Custom Microarrays

?
Custom probe
#QXZCY!34

Solution: Stacking

EnsMart
Custom
table
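
A minimal sketch of stacking, assuming it means querying several mapping sources together and following links from one source into the next. Here a custom table links the custom probe to a public identifier, and a second table (standing in for EnsMart) links that identifier onward. All identifiers except the probe name from the slide are placeholders, and the hop limit is an assumption of this sketch.

```python
def map_with_stack(mappers, identifier, max_hops=2):
    """Collect identifiers reachable from `identifier` through stacked mapping tables.

    mappers: list of dicts, each mapping an identifier to a set of linked identifiers.
    max_hops: how many mapping steps may be chained (1 = direct mappings only).
    """
    found = set()
    frontier = {identifier}
    for _ in range(max_hops):
        next_frontier = set()
        for current in frontier:
            for table in mappers:
                for mapped in table.get(current, ()):
                    if mapped != identifier and mapped not in found:
                        found.add(mapped)
                        next_frontier.add(mapped)
        if not next_frontier:
            break
        frontier = next_frontier
    return found


# Placeholder identifiers; only the probe name is taken from the slide.
custom_table = {"#QXZCY!34": {"ENSEMBL_PLACEHOLDER"}}
ensmart_table = {"ENSEMBL_PLACEHOLDER": {"ENTREZ_PLACEHOLDER"}}

direct_only = map_with_stack([custom_table, ensmart_table], "#QXZCY!34", max_hops=1)
stacked = map_with_stack([custom_table, ensmart_table], "#QXZCY!34", max_hops=2)
```

With one hop the custom probe reaches only the identifier in the custom table; with two hops it also reaches everything the public source knows about that identifier.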

MIRIAM and Identifiers.org

Regular
expression for
autodetection Pattern for
generating URLs

Link to
documentation
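
The two registry fields named on this slide, a regular expression for autodetection and a pattern for generating URLs, can be sketched as follows. The regular expressions are simplified stand-ins written for this example, not copied from the MIRIAM registry, and the entry list is deliberately tiny.

```python
import re

# Simplified, illustrative registry entries (assumptions, not registry content).
REGISTRY = [
    {
        "name": "Ensembl",
        "pattern": re.compile(r"^ENSG\d{11}$"),
        "url": "https://identifiers.org/ensembl:{id}",
    },
    {
        "name": "Entrez Gene",
        "pattern": re.compile(r"^\d+$"),
        "url": "https://identifiers.org/ncbigene:{id}",
    },
]


def detect_systems(identifier):
    """Return the names of all registry entries whose pattern matches."""
    return [entry["name"] for entry in REGISTRY if entry["pattern"].match(identifier)]


def url_for(system_name, identifier):
    """Fill the URL pattern of the named entry; raise KeyError if unknown."""
    for entry in REGISTRY:
        if entry["name"] == system_name:
            return entry["url"].format(id=identifier)
    raise KeyError(system_name)


detected = detect_systems("ENSG00000122375")
link = url_for("Entrez Gene", "3643")
```

Autodetection is only a hint: purely numeric identifiers match several systems in practice, so a tool should let the user confirm the detected system.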

Availability

BMC Bioinformatics. 2010 Jan 4;11(1):5.
www.bridgedb.org

www.helixsoft.nl/blog bridgedb-discuss@googlegroups.com

Innovate using BridgeDb

Data

Metabolite

Flux

Visualizing fluxes on metabolic pathways

Integrating it all
Visualizing fluxes, data and annotation

Extending pathways, how to do it?


Network approaches to extend pathways
E.g. most pathways don’t have miRNAs

Pathway Loom, weaving pathways


PathVisio RI plugin provides backpage info

microRNAs in pathway analysis. The Regulatory Interaction plugin offers a suitable middle-ground between not including any
miRNAs in pathways, which misses this regulatory information, and including all validated miRNA-target interactions, which
clutters the pathway. After loading interaction file(s), selecting a pathway element shows the interaction partners of this
element and their expressions in a side panel. This allows for the detection of potential active regulatory mechanisms in the
study at hand.
http://www.bigcat.unimaas.nl/wiki/images/f/f6/VanHelden-poster-nbic2012.pdf

Or consider pathway as a network


GPML Cytoscape Plugin
http://www.pathvisio.org/wiki/Cytoscape_plugin

Cytoscape visualization used to group

PPS1
Liver
All pathways
Pathways with high z-score
grouped together.

Explains why there are
relatively few significant
genes, but many pathways
with high z-score.

Robert Caesar et al (2010) A combined transcriptomics and lipidomics analysis of subcutaneous,
epididymal and mesenteric adipose tissue reveals marked functional differences. PLoS One 5: 7. e11525
http://dx.doi.org/doi:10.1371/journal.pone.0011525

Explore pathway interactions

Thomas Kelder, Lars Eijssen, Robert Kleemann, Marjan van Erk, Teake Kooistra, Chris Evelo
(2011) Exploring pathway interactions in insulin resistant mouse liver BMC Systems Biology 5: 127
Aug. http://dx.doi.org/doi:10.1186/1752-0509-5-127

What we used
Non-redundant shortest paths in a weighted
graph.

1. A set of pathways
2. An interaction network
3. Weight value for all edges
= experimental expression of connected
genes.
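
The three ingredients can be combined in a short sketch: Dijkstra's algorithm finds the cheapest path from any protein in one pathway to any protein in another. It covers only the shortest-path part; removing redundant paths is not shown. The edge weights below are toy values, and treating a lower weight as "stronger expression change" is an assumption of the sketch, since the slide does not say how expression is turned into weights. The protein names come from the example figure; "Other" is a made-up node.

```python
import heapq


def shortest_path_between(graph, sources, targets):
    """Cheapest path from any source node to any target node.

    graph: dict mapping node -> dict of neighbour -> non-negative edge weight.
    Returns (cost, path) or None when no target is reachable.
    """
    targets = set(targets)
    queue = [(0.0, node, [node]) for node in sorted(sources)]
    heapq.heapify(queue)
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node in settled:
            continue
        settled.add(node)
        if node in targets:
            return cost, path
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in settled:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return None


# Toy interaction network with made-up weights.
network = {
    "Gsk3b": {"Sgk3": 1.0, "Other": 0.5},
    "Sgk3": {"Tsc1": 1.0},
    "Other": {"Tsc1": 5.0},
}
pathway_a = {"Gsk3b"}  # proteins belonging to the first pathway
pathway_b = {"Tsc1"}   # proteins belonging to the second pathway
result = shortest_path_between(network, pathway_a, pathway_b)
```

Starting the search from all proteins of one pathway at once gives the cheapest connection between the two pathways without running one search per protein pair.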

Pathway interactions and what causes them

An indirect interaction between the Axon Guidance and Insulin Signaling pathways in the network for
the comparison between HF and LF diet at t = 0. Left: Network representation of the identified path
between the two pathways, consisting of three proteins Gsk3b, Sgk3 and Tsc1. Right: The location of these
proteins in the KEGG pathway diagrams. The newly found indirect interactions have been added in red.

Pathway interactions and
detailed network visualization
for the interactions with three
apoptosis related pathways for
the comparison between HF and
LF diet at t = 0. A: Subgraph of the
pathway interaction network, based
on incoming interactions to three
stress response and apoptosis
pathways with the highest in-
degree. Pathway nodes with a thick
border are significantly enriched (p
< 0.05) with differentially expressed
genes. B: The protein interactions
that compose the interactions
between the three apoptosis
related pathways and their
neighbors in the subgraph as
shown in box A (see inset, included
interactions are colored orange).
Protein nodes have a thick border
when their encoding genes are
significantly differentially expressed
(q < 0.05).

We tried to make it easier with

The CyTargetLinker Cytoscape Plugin
Extending pathways on the fly.

Provided databases with the plugin:
• miRNAs with targets
• Transcription Factors with targets
• Drug – Target Interactions
• ENCODE derived databases

Extend with your own.
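
The idea of extending a pathway on the fly can be sketched as a join between the pathway's nodes and a table of regulator-target pairs. This illustrates the concept only and is not the CyTargetLinker implementation; the gene and miRNA names are placeholders.

```python
def extend_network(nodes, edges, regulatory_interactions):
    """Add regulators whose target is already in the network.

    nodes: iterable of node names in the pathway.
    edges: iterable of (source, target) pairs already in the pathway.
    regulatory_interactions: iterable of (regulator, target) pairs,
        for example rows of a miRNA-target table.
    Returns the extended (nodes, edges) as two sets.
    """
    original_nodes = set(nodes)
    extended_nodes = set(original_nodes)
    extended_edges = set(edges)
    for regulator, target in regulatory_interactions:
        # Only interactions that touch the existing pathway are added.
        if target in original_nodes:
            extended_nodes.add(regulator)
            extended_edges.add((regulator, target))
    return extended_nodes, extended_edges


# Placeholder pathway and interaction table.
pathway_nodes = {"GENE_A", "GENE_B"}
pathway_edges = {("GENE_A", "GENE_B")}
interaction_table = [
    ("miR-example-1", "GENE_A"),
    ("miR-example-2", "GENE_X"),  # target not in the pathway, so it is ignored
]
new_nodes, new_edges = extend_network(pathway_nodes, pathway_edges, interaction_table)
```

Swapping the interaction table for a transcription factor or drug-target table extends the same pathway with a different kind of regulator.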

MiRNAs of Interest
miRNA target information from miRTarBase

miRTarBase as a target interaction network

Collection of miRNA-target gene interactions in the miRTarBase database with 1,715 genes,
286 miRNAs and 2,817 interactions.

miRNAs associated with colorectal cancer
extended with validated target genes

human ErbB signaling pathway extended
with validated microRNA regulation

OPS Framework
OPS GUI Architecture. Dec 2011

[Diagram: layered architecture, top to bottom]
• App Framework
• Web Service API, SPARQL Web Services
• OPS Data Model; Identity & Vocabulary Management; Semantic Data Workflow Engine
• RDF Data Cache
• Chemistry Normalisation & Registration; Descriptors; Nanopubs
• RDF 1 to RDF 4 over Data 1 to Data 4; Public Vocabularies
Feed in WikiPathways relationships, use BioPAX to create the RDF

Well yes, for Open PHACTS we do…

[Diagram: OPS components over two data sources]
• OPS Data Model; Identity & Vocabulary Management; Semantic Data Workflow Engine
• Chemistry Normalisation & Registration; Descriptors
• RDF 1 and RDF 2 over Data 1 and Data 2; Public Vocabularies

But really…,
what about federated SPARQL queries?


[Diagram: RDF 1 over Data 1 using Public Vocabularies; RDF 2 over Data 2 using Other Public Vocabularies]

Most often partly…
If the vocabularies used are different, linking just database IDs is not good enough.

We need full mappings of ontologies.
Identification of overlapping modules.

And maybe… suggestions for ontologies to use in a specific field.

[Diagram: Identity Mapping between RDF 1 over Data 1 (Public Vocabularies) and RDF 2 over Data 2 (Other Public Vocabularies)]

Thanks!
WikiPathways team:
• Martijn van Iersel (PathVisio, BridgeDb)
• Thomas Kelder (WikiPathways, networks)
• Alex Pico (US team leader)
• Bruce Conklin (former US team leader)
• Kristina Hanspers (US curation)
• Martina Kutmon (CyTargetLinker)
• Susan Coort (Regulatory plugins)
• Lars Eijssen (Data pipelines)
• Anwesha Dutta (Flux visualisation)
• Andra Waagmeester (LOOM)
• Egon Willighagen (Open PHACTS)

Funding. Dutch: IOP, NBIC, NuGO, NCSB. Regional:
Transnational University. EU: NuGO and Microgennet,
IMI: Open PHACTS + Agilent thought leader grant and
NIH.

Analyzing GO representation in
pathways using an independent
library for ontology analysis

Combining efforts and information to
increase biological understanding

Structuring biological data
• Gene Ontology (GO)
– Protein function or
localization
– Hierarchically structured
terms
– 3 topics (namespaces)
• Biological process
• Molecular function
• Cellular component

– Disadvantage
• No information on interactions

Structuring biological data
• Pathways
– Network of interactions
– Structural overview of elements in the
pathway
– Disadvantages:
• Missing structure
of interacting
pathways
• Overlap and
abundance in
pathways

Analysis based on structures
• Uses:
– Better overview of the data
– Increased biological understanding

• Challenges in the field:
– Difficulty comparing algorithms
– Good work may be overlooked
– Redundant efforts
– Out-of-date algorithms used
– Comparison extremely difficult

Goals:
• Develop an independent library for ontology
analysis in which efforts can be combined

• Increase biological understanding by
combining knowledge on pathways and gene
ontology.

Independent library for ontology
analysis
• Open source:
– Collaboration
– Clear view of the algorithm
– Free use
– Minimizing redundant efforts
• Usable for multiple ontologies and identifiers

Combining Pathways and GO
• Display information on the function of the
pathway
• Make a comparison between pathways
• Quality control
– Single pathway
– List of pathways

Materials
• PathVisio
– Open source tool for visualizing and analyzing pathway data
• BridgeDb
– ID mapping framework for bioinformatics
• WikiPathways
– Community curated pathway data source

Independent Library
• Manager input:
1. Ontology Terms
(File)
2. Map of term with
identifier
3. Method Selection

Methods

                       IDs linked to GO   Genes not linked to GO   Total
IDs in pathway                a                     b               a+b
IDs not in pathway            c                     d               c+d
Total                        a+c                   b+d               n
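
Two scores that can be computed from this 2x2 table are sketched below. The slide does not say which methods the library implements, so both are assumptions: a simple coverage fraction a/(a+b), and a one-sided hypergeometric (Fisher-type) p-value for seeing at least a GO-linked IDs in the pathway.

```python
from math import comb


def coverage(a, b):
    """Fraction of the pathway's IDs that are linked to the GO term."""
    if a + b == 0:
        raise ValueError("pathway has no IDs")
    return a / (a + b)


def enrichment_p_value(a, b, c, d):
    """One-sided hypergeometric p-value: P(at least `a` linked IDs in the pathway).

    a: in pathway, linked to GO        b: in pathway, not linked
    c: not in pathway, linked to GO    d: not in pathway, not linked
    """
    n = a + b + c + d
    in_pathway = a + b
    linked = a + c
    upper = min(in_pathway, linked)
    total = comb(n, in_pathway)
    # Sum the probabilities of every outcome at least as extreme as the observed one.
    tail = sum(
        comb(linked, k) * comb(n - linked, in_pathway - k)
        for k in range(a, upper + 1)
    )
    return tail / total


share = coverage(7, 13)                # 7 of 20 pathway IDs linked
p_small = enrichment_p_value(2, 0, 0, 2)
```

The coverage fraction is easy to read as a percentage per term, while the p-value also accounts for how common the term is outside the pathway.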

Plug-in
• Panel for the analysis of a single pathway
– Display GO terms in a table with score
– Highlight matches
– Save results

• Menu Item for analyzing a list of pathways
– Select a folder containing pathway files
– Individual result files
– File containing all results with extra info

Single Pathway analysis
• Regulation of blood pressure
• Angiogenesis
• Others:
– G-protein coupled receptor
– proteolysis
Homo sapiens:
name  score
G-protein coupled receptor signaling pathway  35%
regulation of cell proliferation  29%
proteolysis  29%
regulation of blood pressure  29%
response to drug  29%
regulation of vasoconstriction  29%
positive regulation of apoptotic process  29%
negative regulation of cell growth  23%
kidney development  23%
elevation of cytosolic calcium ion concentration  23%

Mus musculus:
name  score
kidney development  50%
G-protein coupled receptor signaling pathway  50%
response to drug  37%
negative regulation of cell proliferation  37%
positive regulation of apoptotic process  37%
regulation of blood pressure  37%
response to salt stress  25%
regulation of systemic arterial blood pressure by circulatory renin-angiotensin  25%
arachidonic acid secretion  25%
blood vessel development  25%

Multiple Pathway analysis
[Bar charts; only the term labels are recoverable]

Biological Process (12 of 105 terms):
• signal transduction
• xenobiotic metabolic process
• oxidation-reduction process
• metabolic process
• G-protein coupled receptor signaling pathway
• gene expression
• nerve growth factor receptor signaling pathway
• apoptotic process
• synaptic transmission
• DNA repair
• mitotic cell cycle
• innate immune response

Cellular Component (12 of 26 terms):
• cytoplasm
• cytosol
• nucleus
• plasma membrane
• membrane
• integral to membrane
• mitochondrion
• nucleoplasm
• endoplasmic reticulum membrane
• extracellular region
• endoplasmic reticulum
• integral to plasma membrane
• microsome
• extracellular space

Independent library
• Reads GO terms from file
• Mapping from term to identifier
• Analysis on sample data
• Framework enables more methods to be
added

Combining Pathways and GO
• Single Pathway:
– More information on pathway
– Quality control possible
• Pathway List:
– Separate results for every pathway
– Enables structuring possibilities
– Quality control possible
