WikiPathways: how open source and open data can make omics technology more useful

@Chris_Evelo
Department of Bioinformatics - BiGCaT
Maastricht University

Recon2: a Google map of human metabolism

Collaborating Systems Biology and Metabolomics groups: 2013

http://www.wikipathways.org/index.php/Pathway:WP430

http://www.wikipathways.org/index.php/Pathway:WP1601

• XML format (BioPAX)
• RESTful web services
• HTML embed code
• RDF and SPARQL endpoint
http://www.wikipathways.org
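
As a rough illustration of the RDF and SPARQL access route listed above, here is a minimal Python sketch that builds a query for pathway titles and encodes it as a request URL. The endpoint URL, the `wp:` and `dc:` vocabulary terms, and the helper names are assumptions that do not appear on the slides; check the current WikiPathways documentation before relying on them. Nothing is sent over the network here.

```python
from urllib.parse import urlencode

# Assumed endpoint URL; the slide only names "RDF and SPARQL endpoint".
SPARQL_ENDPOINT = "https://sparql.wikipathways.org/sparql"


def build_pathway_query(organism: str, limit: int = 10) -> str:
    """Return a SPARQL query listing pathway titles for one organism.

    The wp:/dc: terms are assumptions about the WikiPathways RDF vocabulary.
    """
    return (
        "PREFIX wp: <http://vocabularies.wikipathways.org/wp#>\n"
        "PREFIX dc: <http://purl.org/dc/elements/1.1/>\n"
        "SELECT DISTINCT ?pathway ?title WHERE {\n"
        "  ?pathway a wp:Pathway ;\n"
        "           dc:title ?title ;\n"
        f'           wp:organismName "{organism}" .\n'
        "}\n"
        f"LIMIT {limit}"
    )


def build_request_url(query: str) -> str:
    """Encode the query as a GET URL using the standard 'query' parameter."""
    return f"{SPARQL_ENDPOINT}?{urlencode({'query': query})}"


if __name__ == "__main__":
    # Print the URL; fetch it yourself with an Accept header such as
    # application/sparql-results+json to get machine-readable results.
    print(build_request_url(build_pathway_query("Homo sapiens", limit=5)))
```

Keeping query construction separate from the HTTP call makes it easy to inspect the query before pointing it at a live endpoint.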

Set Early Milestones
• Online (Mar ’07) Success!
• First unknown user (Jan ’08)

[Chart: number of human pathways and number of unique human genes over time]
Over 1 million pageviews
by 280,000 unique visitors
~22%

Don’t Try to Change the World
Work with (not against) established:
• Models
• Communities
• Tools and pipelines
• Publishing models

Go Ahead, Change the World
• Tweak established models
• Grow communities
• Change perspectives
• everyone is a curator
• knowledge should be open
• New attribution systems
• redefine “publication”
• redefine “productive”
• New analysis pipelines
• connect with other community-curated resources

Professional Open Source
• Subversion source repository
• License!
• Development web site
• Bug tracker
• Mailing lists
• Development and Release plans
• Modular (plugins, OSGi)

Key Academic Innovation Partners
About 100 active university collaborations annually

U of Washington
U of Calgary

UC Davis
UCSF
UC Berkeley
Stanford

Technical U of Denmark
Karolinska Inst., Sweden

U of Michigan
CU Boulder

UC San Diego
Baylor U

Yale
MIT
Princeton

U of Kent, UK

Harvard

U of Illinois

Arizona State

Karlsruhe U, Germany

NIBRT, Ireland

Johns Hopkins
U of Maryland

U of Twente, NL

Maastricht U, NL

U of Manchester, UK

Tsinghua U, China

Charles U, Czech Republic

U of Oviedo, Spain

Technion, Israel

Seoul Nat’l U, Korea
Chungnam Nat’l U, Korea
Tohoku U, Japan

U of Alicante, Spain

Hamner Inst.
U of Texas
U of Houston

Southeast U, China

U Teknologi MARA, Malaysia
NUS, Singapore
NTU, Singapore

U of São Paulo, Brazil

U of Queensland

UNICAMP, Brazil
[Map legend: Electronics, Chemical Analysis, Life Sciences]

U of Pretoria, S. Africa

Macquarie U

Curtin U.
U of W. Australia
U of Sydney
U Tech, Sydney
RMIT U

Monash U
U of Tasmania

26/11/13

Agilent Thought Leader Program
http://www.agilent.com/univ_relation/TLP/index.shtml

TL Network

Thought Leader Awards
• Promote fundamental advances in life science through contributions to the research of thought leaders
• Align societal trends, academic research and rapidly advancing Agilent measurement platforms: Synthetic Biology, Structural Biology, OMICS & Integrated Biology, in vitro Toxicology, and Environmental & Food Safety
• Candidates selected based on scientific leadership, productivity, project significance (invitational program)
• High-level executive sponsorship and active support throughout Agilent enable breakthrough research

Early Career Professor Award
• Establish strong collaborative relationships with highly promising early career professors
• 2013 focus: contributions to cancer diagnostics


Agilent’s Platform for Multi-Omics Data Analysis
• LC/MS, GC/MS: MassHunter Qual/Quant, ChemStation AMDIS
• Microarrays: Feature Extraction
• NGS: Alignment to Reference Genome
• GeneSpring Platform: Biological Pathways
• Data types: mRNA, miRNA, Exon arrays; GWAS, CNV via SNP arrays; SureSelect Target Enrichment; Whole Genome Sequencing; Proteomics; Metabolomics

GeneSpring modules
• GX (Gene Expression): mRNA, microRNA, QPCR, Alternative Splicing, GWAS & CNV via SNP arrays
• NGS (Next-Gen Sequencing): DNA-Seq, RNA-Seq, Methyl-Seq, ChIP-Seq, small RNA-Seq
• MPP (Mass Profiler Pro): Proteomics, Metabolomics
• PA (Pathway Analysis)

NGS Analysis Workflow:
1. Align data
2. Load BAM/SAM into GS NGS
3. Measure Gene Expression, Find variants, methyl calls
4. Biological Contextualization (Integrated Genomics, GO, Pathways)

• Import, store, and visualize Agilent Metabolomics & Proteomics data (LC/MS, GC/MS)
• Generic file import
• Statistical analysis
• ID Browser for compound identification


Multi-Omic Analysis
Canonical Pathways
Network Discovery


• Map and visualize data from one or two types of -omic data on pathways
• Metabolite data overlay
• Search, browse and filter pathways
• Supports pathways from:
•WikiPathways
•BioCyc
• Supported pathway formats:
•BioPAX 3 – Pathway Commons, Reactome, NCI Nature Pathway
•GPML – PathVisio – custom drawing
• List of all pathway entities, dynamically linked to pathway selection
• Export compound list from pathways
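
To make the "export compound list" idea concrete outside GeneSpring, the following Python sketch pulls the metabolite entries out of a GPML document. The sample XML is a hypothetical, heavily simplified fragment written for this example (real GPML files carry graphics, interactions and more), and the function name is illustrative. Tags are matched by local name so the sketch does not depend on one particular GPML namespace version.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified GPML fragment for illustration only.
SAMPLE_GPML = """
<Pathway xmlns="http://pathvisio.org/GPML/2013a" Name="Example" Organism="Homo sapiens">
  <DataNode TextLabel="HK1" GraphId="a1" Type="GeneProduct">
    <Xref Database="Entrez Gene" ID="3098"/>
  </DataNode>
  <DataNode TextLabel="glucose" GraphId="a2" Type="Metabolite">
    <Xref Database="HMDB" ID="HMDB0000122"/>
  </DataNode>
  <DataNode TextLabel="pyruvate" GraphId="a3" Type="Metabolite">
    <Xref Database="HMDB" ID="HMDB0000243"/>
  </DataNode>
</Pathway>
""".strip()


def _local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def export_entities(gpml_text, node_type=None):
    """Return (label, database, identifier) for each DataNode.

    If node_type is given (e.g. "Metabolite"), only nodes of that type are kept.
    """
    rows = []
    for elem in ET.fromstring(gpml_text).iter():
        if _local(elem.tag) != "DataNode":
            continue
        if node_type is not None and elem.get("Type") != node_type:
            continue
        # Each DataNode may carry one Xref child with the external identifier.
        xref = next((c for c in elem if _local(c.tag) == "Xref"), None)
        database = xref.get("Database", "") if xref is not None else ""
        identifier = xref.get("ID", "") if xref is not None else ""
        rows.append((elem.get("TextLabel", ""), database, identifier))
    return rows


if __name__ == "__main__":
    for label, database, identifier in export_entities(SAMPLE_GPML, "Metabolite"):
        print(f"{label}\t{database}\t{identifier}")
```

A compound list in this form could then seed a targeted follow-up, for example a custom metabolite database.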

[Figure panels: Proteomics; Genomics; Proteomics and Genomics]

• Import of WikiPathways
• Select WikiPathways for analysis
• View overlaid data on WikiPathways
• Examine experimental data
• Export pathway entities

Propose new experiments based on pathway analysis
• Re-examine acquired untargeted metabolomics data based on pathway analysis
• Design new experiments (metabolites, proteins or genes) based on interpretation of pathway results
• Build custom metabolite database: PCDL
• Custom microarray or NGS design: eArray
• Targeted MS/MS: Spectrum Mill

Typical workflow used for identification of a relevant pathway using GeneSpring
• Identification of differential expression
• Statistical analysis and filtering
• Curated pathway analysis using WikiPathways
• Network analysis using NLP to identify interaction of pathways

Identification of Candidate Genes

Step 1) Identification of differentially expressed genes via hierarchical cluster analysis

Step 2) Volcano plot showing genes significantly differentially expressed between two conditions
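
The selection behind a volcano plot like the one in Step 2 can be sketched in a few lines of Python: each gene gets a log2 fold change (x-axis) and a -log10 p-value (y-axis), and is called "up" or "down" only if it passes both cutoffs. The thresholds, gene names and numbers below are hypothetical and chosen for illustration; the slides do not state the cutoffs used in GeneSpring.

```python
import math


def classify_genes(rows, min_abs_log2fc=1.0, max_p=0.05):
    """Compute volcano-plot coordinates and a call for each gene.

    rows: iterable of (gene, mean_condition_a, mean_condition_b, p_value).
    Returns {gene: (log2_fold_change, minus_log10_p, label)} where label is
    "up", "down", or "ns" (not significant).
    """
    calls = {}
    for gene, mean_a, mean_b, p_value in rows:
        log2fc = math.log2(mean_b / mean_a)      # x-axis of the volcano plot
        minus_log10_p = -math.log10(p_value)     # y-axis of the volcano plot
        if p_value <= max_p and log2fc >= min_abs_log2fc:
            label = "up"
        elif p_value <= max_p and log2fc <= -min_abs_log2fc:
            label = "down"
        else:
            label = "ns"
        calls[gene] = (log2fc, minus_log10_p, label)
    return calls


if __name__ == "__main__":
    # Hypothetical expression means and p-values, for illustration only.
    example = [
        ("GENE_A", 100.0, 400.0, 0.001),  # large change, significant
        ("GENE_B", 200.0, 50.0, 0.01),    # large decrease, significant
        ("GENE_C", 100.0, 110.0, 0.5),    # small change, not significant
        ("GENE_D", 100.0, 800.0, 0.2),    # large change, but not significant
    ]
    for gene, (x, y, label) in classify_genes(example).items():
        print(f"{gene}: log2FC={x:.2f}, -log10(p)={y:.2f}, call={label}")
```

Requiring both cutoffs is what separates a volcano plot from a plain fold-change filter: GENE_D changes eightfold but is still left out.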

From Differential Expression to Pathways

Step 3) Significantly changed
pathways in Müller cells identified
using pathway analysis in
GeneSpring
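
The slides do not say which statistic GeneSpring uses to call a pathway "significantly changed". One common approach, shown here purely as an assumed stand-in, is over-representation analysis with a hypergeometric test: how likely is it to see at least this many changed genes in a pathway by chance? The pathway names, gene sets and function names below are hypothetical.

```python
from math import comb


def enrichment_p_value(n_universe, n_pathway, n_changed, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: genes measured; n_pathway: of those, genes in the pathway;
    n_changed: genes called changed; n_overlap: changed genes in the pathway.
    """
    upper = min(n_pathway, n_changed)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_changed - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_universe, n_changed)


def rank_pathways(pathways, changed, n_universe):
    """Return (name, overlap, p_value) tuples sorted by ascending p-value.

    Assumes every gene in `pathways` and `changed` is part of the universe.
    """
    results = []
    for name, genes in pathways.items():
        overlap = len(genes & changed)
        p_value = enrichment_p_value(n_universe, len(genes), len(changed), overlap)
        results.append((name, overlap, p_value))
    return sorted(results, key=lambda row: row[2])


if __name__ == "__main__":
    # Hypothetical toy gene sets, for illustration only.
    toy_pathways = {
        "Pathway 1": {"a", "b", "c", "d"},
        "Pathway 2": {"e", "f"},
    }
    for name, overlap, p_value in rank_pathways(toy_pathways, {"a", "b", "x"}, 10):
        print(f"{name}: overlap={overlap}, p={p_value:.3f}")
```

In practice the p-values would also be corrected for testing many pathways at once, which this sketch leaves out.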

Step 4) Protein-protein interaction analysis was then performed in GeneSpring to identify direct interactions between these gene products

Pathway analysis showed significant changes in MAPK signaling under both conditions.
Network analysis shows interaction of MAPK with other gene products.
Compare network analysis/extension in Cytoscape.

Combine further with
• Open Knowledge
e.g. IMI semantic web project Open PHACTS
Pathway content and extension
• Open Data
e.g. ISA-Tab-based study capturing in phenotype database (dbNP)
Pathway analysis and profiling

OPS Framework Architecture, Dec 2011
[Architecture diagram. Components: OPS GUI; App Framework; Web Service API; Identity & Vocabulary Management; SPARQL; OPS Data Model; Semantic Data Workflow Engine; RDF Data Cache; Chemistry Normalisation & Registration; Public Vocabularies; Descriptors and Nanopubs over RDF 1 to 4, built from Data 1 to 4; Web Services]
Feed in WikiPathways relationships, use BioPAX to create the RDF

Data capturing
Using ISA to connect to the rest of the world

Faculty of Health, Medicine and Life Sciences

Generic Study Capture Framework (GSCF)
[Diagram. Study components: Templates, Subjects, Groups, Events, Protocols, Samples, Assays. Data input / output: web interface; data import (xls, csv, text); Molgenis; NCBO Ontologies; EBI repository; custom programs; custom dbs]

dbNP Architecture
[Diagram. Components:]
• GSCF: Subjects, Groups, Events, Protocols, Samples, Assays, Templates
• Transcriptomics module: raw data (CEL files); clean data (gene expression); result data (p-values, z-values)
• Epigenetics module: raw data (Nimblegen, Illumina); clean CpG island data; resulting genome feature data
• Simple Assay module; body weight, BMI, etc.
• Pathways, GO, metabolite profiles
• Query module: full-text querying, structured querying, profile-based analysis, study comparison
• Web user interface
Use PathVisioRPC to use WikiPathways content


Acknowledgements
Thomas Kelder
Martijn van Iersel
Kristina Hanspers
Martina Kutmon
Andra Waagmeester
Chris Evelo
Bruce Conklin

nrnb.org
wikipathways.org
