This document summarizes a presentation about the EnVisioning Pathways project. It discusses:
1) The EnCORE integration platform developed by the ENFIN Network of Excellence to enable mining data across different biological domains, sources, formats and types through a standardized XML format and web services.
2) Examples of EnCORE services that retrieve interaction data from databases like IntAct and Pride, map between identifiers, and represent results in biological pathways in databases like Reactome.
3) Efforts to adapt EnCORE to utilize standards and create a federated system to integrate information from different biological domains. This includes building predefined and user-selected workflows between EnCORE services.
Accessing small molecule data using ChEBI, by Duncan Hull
This document summarizes a presentation about accessing chemical data using the ChEBI database. It introduces ChEBI as a manually annotated database and ontology of small chemical entities. It covers searching and browsing ChEBI, understanding the ChEBI ontology structure, and methods for programmatic access including downloads of the ChEBI data in different file formats and via a web service API.
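The programmatic-access route described above boils down to fetching an entity record and parsing the returned XML. The Python sketch below illustrates that general pattern; the XML snippet and its element names are simplified assumptions for illustration, not the exact ChEBI web service schema.

```python
import xml.etree.ElementTree as ET

# Illustrative record in the spirit of a ChEBI entity response;
# element names here are assumptions, not the verbatim ChEBI payload.
SAMPLE = """<entity>
  <chebiId>CHEBI:15377</chebiId>
  <chebiAsciiName>water</chebiAsciiName>
  <formula>H2O</formula>
</entity>"""

def parse_entity(xml_text):
    """Extract identifier, name and formula from an entity record."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.findtext("chebiId"),
        "name": root.findtext("chebiAsciiName"),
        "formula": root.findtext("formula"),
    }

entity = parse_entity(SAMPLE)
print(entity["id"], entity["name"], entity["formula"])
```

In a real client the `SAMPLE` string would be replaced by the body of an HTTP response from the ChEBI web service; the parsing step stays the same.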
This document discusses several data integration tools: DAS, PSICQUIC, EnFIN, EnCORE, and Biomart. DAS is a distributed annotation system that allows uniform access to biological data from multiple repositories. PSICQUIC integrates molecular interaction data based on the PSI-MI standard. EnFIN, EnCORE, and EnVISION provide data integration across various domains, sources, formats and types by standardizing data in an EnXML format and developing web services. Biomart allows federated querying of biological data across different databases through a common query interface.
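PSICQUIC's value comes from the shared REST convention: a MIQL query is URL-encoded into the request path, so the same client code works against any compliant service. A minimal sketch (the base URL below is a placeholder, not a specific live service):

```python
from urllib.parse import quote

def psicquic_query_url(base_url, miql):
    """Build a PSICQUIC-style REST query URL from a MIQL expression.
    The /search/query/ path follows the common PSICQUIC REST pattern;
    the base URL is supplied by whichever service you target."""
    return base_url.rstrip("/") + "/search/query/" + quote(miql, safe="")

# Placeholder host; any PSICQUIC-compliant service takes the same query.
url = psicquic_query_url(
    "https://example.org/psicquic/webservices/current",
    'identifier:P12345 AND species:"9606"',
)
print(url)
```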
Data integration in Proteomics through EnVision and EnCore Web Services, by Rafael C. Jimenez
The document discusses EnCore, a platform developed by the ENFIN Network of Excellence to enable data integration across various biological domains and data sources. EnCore provides a set of web services and the EnVision interface to allow users to query multiple databases and analyze the results in a standardized format. It is moving towards a new approach based on standards and federation to access more external sources in an interoperable way and to add value to the original data.
This document provides a tutorial for using EnCORE, a tool that allows biologists to analyze biological data and receive outputs from multiple databases and web services. It describes the EnCORE interface and how to perform searches, view and analyze results from tools like PICR, Pride, Reactome, IntAct, CellMint and BioModels. The tutorial explains how to create new queries, select input types, submit jobs, view results overview pages and dataset logs, download XML files, and manage saved datasets. It also demonstrates how to combine datasets and view combined results.
This document provides an overview of data integration in biology, including why it is needed, common problems, and popular approaches. It discusses the many different biological data sources and standards that have been developed for integration. Different architectures for data integration are described, including data warehousing, federation, and view integration. Key variables that affect integration like scope, domain, and interfaces are outlined. Important standards, ontologies, guidelines and tools that support integration are also reviewed.
The document summarizes a presentation given by Rafael Jimenez on March 25, 2010 about EnCORE and EnVision. EnCORE is a platform developed by the ENFIN Network of Excellence to enable mining data across different sources and formats by integrating databases and analysis tools. It uses a standardized EnXML format. EnVision is a user interface that allows querying multiple data sources simultaneously and visualizing the results. The presentation discusses how EnCORE and EnVision can adopt standards and enable federation to improve data integration and provide additional value to source data.
This document discusses data integration in bioinformatics. It begins by explaining why data integration is needed due to the large number of specialized databases and diversity of data types. It then defines data integration as combining data from different sources into a unified view. Some of the challenges of data integration mentioned include different data schemas, interfaces and vocabularies between databases. Several common approaches to data integration are described, including data centralization, federated databases and view integration. Important variables that affect integration approaches are also outlined, such as the domain, architecture and query interface. Finally, some examples of commonly used tools for tasks like workflow management, web services and format standards are provided.
This document provides an overview of data integration and uses the Distributed Annotation System (DAS) as an example. It discusses how DAS allows for the combining of biological sequence data from different sources through a standardized protocol. It then describes how to use a card game to intuitively teach students about key DAS concepts like data distribution, service-oriented architecture, and standardization. The game has players take on roles within the DAS system like client, source, and registry to reinforce these concepts through collaborative play.
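The DAS concepts taught by the card game map directly onto the protocol itself: a client sends a plain HTTP request to a source and gets XML back. A hedged sketch of building a `features` request and parsing a response; the server name and the response fragment are illustrative, not a real DAS payload:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

def das_features_url(server, source, segment):
    """Build a DAS 'features' request URL (server name is a placeholder)."""
    query = urlencode({"segment": segment})
    return f"{server.rstrip('/')}/das/{source}/features?{query}"

# Illustrative DASGFF-style response fragment; attributes are
# simplified assumptions, not a verbatim server payload.
SAMPLE = """<DASGFF><GFF><SEGMENT id="chr1" start="100" stop="200">
  <FEATURE id="f1" label="exon"/>
  <FEATURE id="f2" label="repeat"/>
</SEGMENT></GFF></DASGFF>"""

def feature_ids(xml_text):
    """Collect the ids of all FEATURE elements in a response."""
    root = ET.fromstring(xml_text)
    return [f.get("id") for f in root.iter("FEATURE")]

print(das_features_url("https://example.org", "hg18", "chr1:100,200"))
print(feature_ids(SAMPLE))
```

Because every source answers the same request shape, a DAS client can aggregate annotations from many servers with no per-source code, which is exactly the standardization point the card game is built around.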
Presentation on pathway extensions using knowledge integration and network approaches, given at the Systems Biology Institute in Luxembourg on November 28, 2012.
The European Bioinformatics Institute (EBI) is a center for bioinformatics research and services located in Hinxton, UK. EBI grew out of EMBL's work providing public biological databases and offers major databases on DNA, RNA, proteins, pathways, and more. EBI's website provides access to these databases as well as a variety of bioinformatics tools for sequence analysis, proteomics, microarrays, and more through different channels on their site.
The document discusses bioinformatics tools used for analyzing biological data. It begins with an introduction to bioinformatics and then describes several categories of tools: biological databases for storing genomic and protein data; homology tools for sequence alignment and comparison; protein function analysis tools; structural analysis tools; and sequence manipulation and analysis tools. Common tools discussed include BLAST, FASTA, ClustalW, and databases like GenBank. The document concludes by covering applications of bioinformatics in areas like molecular modeling, medicine, and computation.
Proteomics repositories integration using EUDAT resources, by Rafael C. Jimenez
This document discusses plans to integrate proteomics data repositories using resources from the EUDAT data infrastructure. It describes replicating data from the ELIXIR repository PRIDE to EUDAT data centers for backup and access. This will test using EUDAT services like B2SAFE for replication and assigning persistent identifiers (PIDs) to datasets and files. The current status describes installing necessary software at participating sites and initial testing of replication from PRIDE to the Swedish National Bioinformatics Infrastructure data center. Future plans include syncing data changes and exploring data push/pull models between repositories.
Presentation for NetBio SIG 2013 by Robin Haw, Scientific Associate and Outreach Coordinator, Ontario Institute for Cancer Research: “Reactome Knowledgebase and Functional Interaction (FI) Cytoscape Plugin”.
GiTools is a software tool for analyzing and visualizing genomic data. It allows users to analyze data, visualize results, and integrate new analysis features over time. The tool has been used in case studies such as analyzing the binding targets of the RBP2 protein during cell differentiation. Future work will improve the tool's statistical tests, integration with other bioinformatics software, and user experience. The GiTools team continues to develop and validate the software.
Analysis and visualization of microarray experiment data integrating Pipeline..., by Vladimir Morozov
This document summarizes the analysis and visualization of microarray experiment data using Pipeline Pilot, Spotfire and R. Key points:
- More than 30 public and proprietary microarray experiments were analyzed using in-house software workflows in Pipeline Pilot.
- Pipeline Pilot workflows retrieve gene annotation from NCBI and produce visualizations of differential expression statistics and biological pathway regulation in Spotfire.
- The gene expression values are analyzed via custom R scripts and plotted using the R connector. Results are integrated into the company's knowledge platform.
An Overview of the iMicrobe Project and available tools in the iPlant Cyberinfrastructure. This talk was given at a workshop at ASLO in Granada, Spain focused on applications in Oceanography and Limnology.
Exploiting technical replicate variance in omics data analysis (RepExplore), by Enrico Glaab
High-throughput omics datasets often contain technical replicates included to account for technical sources of noise in the measurement process. Although summarizing these replicate measurements by using robust averages may help to reduce the influence of noise on downstream data analysis, the information on the variance across the replicate measurements is lost in the averaging process and therefore typically disregarded in subsequent statistical analyses.
We introduce RepExplore, a web service dedicated to exploiting the information captured in technical replicate variance to provide more reliable and informative differential expression and abundance statistics for omics datasets. The software builds on previously published statistical methods, which have been applied successfully to biomedical omics data but are difficult to use without prior experience in programming or scripting. RepExplore facilitates the analysis by providing fully automated data processing and interactive ranking tables, whisker plots, heat maps and principal component analysis visualizations to interpret omics data and derived statistics.
Availability and implementation: Freely available at http://www.repexplore.tk
Journal publication: Glaab, E., & Schneider, R. (2015). RepExplore: addressing technical replicate variance in proteomics and metabolomics data analysis. Bioinformatics, 31(13), 2235-2237. http://bioinformatics.oxfordjournals.org/content/31/13/2235.long
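The core idea above, carrying replicate variance forward instead of discarding it in an average, can be illustrated with plain inverse-variance weighting. This is a generic sketch of the principle, not RepExplore's published method:

```python
def inverse_variance_mean(means, variances):
    """Combine per-sample replicate means, weighting each by the
    reciprocal of its technical-replicate variance, so that noisier
    measurements contribute less to the summary value."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    return sum(m * w for m, w in zip(means, weights)) / total

# Invented example: two samples agree (10.0), one noisy sample
# disagrees (16.0) and also has the largest replicate variance.
means = [10.0, 10.0, 16.0]
variances = [0.5, 0.5, 4.0]
print(inverse_variance_mean(means, variances))
```

A plain average of these values would be 12.0; the variance-aware summary sits much closer to the two low-noise measurements, which is the behaviour the abstract argues is lost when replicates are simply averaged and the variance discarded.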
Event: Plant and Animal Genomes conference 2012
Speaker: Sandra Orchard
InterPro is an open-source protein resource used for the automatic annotation of proteins, and is scalable to the analysis of entire new genomes through the use of a downloadable version of InterProScan, which can be incorporated into an existing local pipeline. InterPro integrates protein signatures from 11 major signature databases (CATH-Gene3D, HAMAP, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY, and TIGRFAMs) into a single resource, taking advantage of the different areas of specialization of each to produce a resource that provides protein classification on multiple levels: protein families, structural superfamilies and functionally close subfamilies, as well as functional domains, repeats and important sites. The InterPro website has been improved, following extensive community consultation and a new version of InterProScan promises improved speed, ease of implementation as well as additional functionalities.
NGS Management And Analysis: From Sample To Molecular And Network Biology, by Arnaud Céol
The Genomic Unit of the Center for Genomic Science of IIT@SEMM processes thousands of samples on Next Generation Sequencing platforms. We will briefly present how we manage the experimental flow and data with our dedicated LIMS and facilitate primary and secondary analyses with HTS-flow, a workflow management system that has been standardized and made easily accessible to both dry and wet lab scientists. Finally, we will show how we are extending genome visualization tools to enable the integration of NGS data with molecular, network and structural biology.
Presented at the "Giornata Milanese di NGS", 2016 Apr 8, University of Milan Bicocca
Using biological network approaches for dynamic extension of micronutrient re..., by Chris Evelo
This document discusses using biological network approaches to dynamically extend pathways with regulatory information such as microRNAs (miRNAs). It describes tools like PathVisio that can integrate gene expression, proteomics and metabolomics data onto pathways to identify significantly changed processes. WikiPathways is introduced as a public pathway resource that can be contributed to and curated by researchers. The document outlines approaches for visualizing regulatory interactions on pathways using plugins, exploring pathway interactions through network analysis, and integrating other data types such as SNPs, fluxes and gene annotations to build a more comprehensive understanding of biological systems.
Metabolic pathway mapping against KEGG, Reactome, HMDB and CPDB, by Dinesh Barupal
This document describes various approaches for mapping detected metabolites to metabolic pathways using online databases and tools. It discusses obtaining KEGG identifiers for metabolites and using KEGG, Reactome, MetaboAnalyst and ConsensusPathDB to map those identifiers to pathways and to visualize pathways with overlays of the mapped metabolites. It notes that some metabolites may lack identifiers or may not map to any pathway, and emphasizes that enrichment analysis can account for more of the identified compounds than appear on the pathway maps themselves.
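The enrichment analysis mentioned above typically amounts to a hypergeometric over-representation test: does a pathway contain more of the detected metabolites than chance predicts? A self-contained sketch with invented numbers:

```python
from math import comb

def hypergeom_pvalue(total, in_pathway, detected, overlap):
    """P(X >= overlap) when drawing `detected` metabolites without
    replacement from a background of `total`, of which `in_pathway`
    belong to the pathway being tested."""
    denom = comb(total, detected)
    p = 0.0
    for k in range(overlap, min(in_pathway, detected) + 1):
        p += comb(in_pathway, k) * comb(total - in_pathway, detected - k) / denom
    return p

# Invented example: 1000 background metabolites, 40 on a pathway,
# 50 detected in the experiment, 8 of which fall on the pathway
# (versus about 2 expected by chance).
print(hypergeom_pvalue(1000, 40, 50, 8))
```

A small p-value here says the pathway is over-represented among the detected compounds even if only a few of them are drawn on the pathway map itself.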
Ondex is a data integration and visualization platform used to integrate large amounts of biological data from multiple sources. It transforms the data into a graph of biological concepts and relationships. Ondex allows users to integrate data, perform semantic alignment of concepts, and visualize the integrated network. Filters and annotators can then be used to highlight specific areas of interest within the large integrated network. Ondex has been applied to problems such as candidate gene prioritization, pathway mapping, and analysis of quantitative trait loci regions in plants.
OVium Bio-Information Solutions uses forefront algorithms to analyze key data resources such as NCBI, EMBL and PDB to develop cell signalling pathways.
OVium employs cloud and MPP computing solutions with homology and signal network mapping to develop chemical and protein pathways for discovery research.
Introduction to Cytoscape talk given in March 2010 at the CRUK CRI, Cambridge, UK.
It was designed to give a broad introduction to the features available in Cytoscape for wet lab researchers.
This document provides an overview of downstream analyses that can be performed after variant identification and filtering in a typical variant calling pipeline. It discusses visualization of variant data in each gene to identify potential causative variants. It also mentions association studies as another type of downstream analysis where variants are tested for association with disease phenotypes. The goal of downstream analyses is to help prioritize variants for further investigation.
The document discusses the development of computational analysis tools for natural products research and metabolomics. It introduces NetPathMiner, a software tool for network path mining through gene expression data. NetPathMiner allows mining of active pathways from biological networks, handles different network formats and representations, and provides visualization of pathways and networks. It also introduces NMRPro, a tool for interactive online processing of NMR spectra, which aims to address current limitations in NMR spectral processing and sharing.
The document summarizes a workshop aimed at integrating resources between several bioinformatics standards registries. The workshop agenda includes presentations from Identifiers.org, BioSharing, BMB Service Registry, EDAM ontology, and the BMB standards registry. Breakout sessions will identify overlaps and potential synergies between the registries, and define areas for collaboration. The goal is to reduce duplication of efforts and develop a common integration and development strategy across registries.
ELIXIR aims to establish a pan-European infrastructure for biological information to support life sciences research. It will do this by coordinating nodes that provide services and resources, establishing standards, and closing skills gaps. Key challenges include sustaining data and services, ensuring interoperability, and dealing with increasingly large datasets. ELIXIR is working on pilots and task forces to address issues like cloud computing, storage, authentication and authorization.
ELIXIR is a European research infrastructure for biological information that aims to support life science research. It brings together major bioinformatics providers and is supported by 17 EU member states. ELIXIR works to safeguard biological data and build sustainable data services. It establishes a distributed infrastructure to handle the large growth of data and provides tools, services, and platforms to facilitate access and analysis of data. ELIXIR also develops standards and provides training to support computational biology. Key activities include establishing national nodes, technical task forces, and pilot projects in areas like cloud resources, data transfer, and linking distributed databases.
The document summarizes discussions from the Technical Coordinator Group (TCG) meeting. The TCG is an advisory body to the Heads of Nodes Committee and consists of technical experts from each ELIXIR Node. They discuss technical and scientific aspects of ELIXIR and identify best practices. The summary outlines the members of TCG, describes short term working groups led by technical coordinators on specific technical efforts, and provides updates on various ELIXIR task forces focusing on areas such as cloud, storage, authentication and authorization, service registry, and training.
- ELIXIR is a European research infrastructure for biological information that aims to facilitate life sciences research. It brings together over 100 bioinformatics service providers from 17 EU member states.
- The large increase in biological data from sources like DNA sequencing and mass spectrometry is outpacing storage capabilities and transfer speeds. This "data deluge" threatens to overwhelm existing infrastructure for data sharing and analysis in life sciences.
- Cloud computing provides potential solutions like more storage, data compression, keeping data close to computation, and provisioning researchers directly with storage and tools. ELIXIR and Google Cloud Platform UK discussed collaborating to host processed data, provide joint solutions for large data producers, and leverage Google Cloud capabilities to help
The European life-science data infrastructure: Data, Computing and Services ...Rafael C. Jimenez
The document provides an update on the European Life Sciences Infrastructure for Biological Information (ELIXIR). ELIXIR aims to establish a distributed infrastructure to handle the growing volume of life science data. It coordinates several national nodes that provide bioinformatics resources and services. Key recent activities include establishing legal agreements with member states, developing a technical coordinator network, and running pilot projects to test solutions and foster collaboration between nodes. Moving forward, priorities include further establishing the infrastructure and community, providing visible and useful services to users, and ensuring sustainable data management.
ELIXIR is a European research infrastructure for biological information that aims to facilitate life sciences research across Europe. It brings together life science resources from member countries to build a robust infrastructure for biological data. Individual organizations or countries cannot achieve this alone. ELIXIR establishes national nodes that leverage local strengths and priorities to deliver shared services through a distributed network. This allows the infrastructure to scale effectively with increasing data challenges. ELIXIR also works to improve data integration and interoperability across distributed resources through activities like developing standards and linking related communities.
The document discusses ELIXIR, the European Life Sciences Infrastructure for Biological Information. It provides information on ELIXIR's governance structure, member countries, and nodes. The nodes work with the central ELIXIR hub to develop and deliver bioinformatics services, resources, training, and more. The goal is to support life science research through integrated, interoperable data and tools.
This document discusses standards and data integration in life sciences databases. It notes that there are many diverse and dispersed databases in molecular biology. Standards facilitate data sharing, integration, and reuse by defining common data representation and description formats. However, integrating data across different sources is challenging due to variables like different interfaces, data types, and levels of information. Initiatives like ELIXIR and HUPO PSI aim to improve interoperability between life sciences resources through defining community standards and best practices.
The European Life Sciences Infrastructure for Biological Information (ELIXIR) coordinates biological data resources across Europe. It has several task forces working on key issues. This document summarizes discussions from the Technical Coordinators Group (TCG) meeting and provides updates from 7 task forces: Cloud, Storage, Authentication and Authorization (AAI), Service Registry, Metrics and Monitoring, Communication, and Website. Each section briefly describes the task force's goals, current work, and plans to coordinate with other groups to develop technical strategies for ELIXIR.
This document provides an introduction to programmatic access and web services for querying biological data resources. It discusses different types of query interfaces including graphical user interfaces, application programming interfaces, and web services. It then focuses on describing web services, including REST and SOAP web services. Examples are given of using PSICQUIC REST and SOAP services to query molecular interaction data. The document also introduces workflows and workflow management systems like Taverna and myExperiment that allow sharing and reusing workflows that combine multiple services.
Life science requirements from e-infrastructure:initial results from a joint...Rafael C. Jimenez
This document summarizes a workshop on life science requirements from e-infrastructure held by BioMedBridges. It discusses how big data is affecting challenges like data growth outpacing storage and transfer speeds. Potential solutions proposed include improving storage, compression, networking, partitioning data, and computing approaches like clouds. The workshop concluded that e-infrastructures need to better understand research infrastructure problems, evaluate bottlenecks, discuss solutions, and define requirements as big data will change current approaches to data sharing and management.
ELIXIR is a European research infrastructure that aims to facilitate life sciences research by coordinating the development of sustainable bioinformatics services and tools across Europe. It is working on several pilot projects to improve data integration, access to cloud computing resources, and authentication and authorization processes for sensitive data. The document discusses ELIXIR's goals and various technical work streams and task forces focused on developing strategies and standards to address challenges in integrating distributed biological data resources.
Data submissions and archiving raw data in life sciences. A pilot with Proteo...Rafael C. Jimenez
European Life Sciences Infrastructure for Biological Information aims to provide data infrastructure for biological information sharing. It is running a pilot project with proteomics data to enable standardized submission and dissemination of data between major proteomics resources like PRIDE and PeptideAtlas. The pilot allows direct archiving of raw proteomics data in PRIDE for the first time. It uses the EUDAT program for data storage and access and ProteomeXchange as a framework to link proteomics databases together. The goal is to prepare for the rapid growth of life sciences data and keep up with processing and storing the large volumes of raw data being generated.
This document discusses challenges in life sciences data management and services provided by ELIXIR to address these challenges. ELIXIR aims to facilitate life sciences research by building a sustainable infrastructure for biological data in Europe. It coordinates several nodes across member states that provide specialized data services. ELIXIR is also running pilot projects to test integration of services, including providing cloud access to reference data and distributed authentication and access to clinical archives. Future challenges include sustaining funding and scaling to handle exponentially growing data volumes.
SASI, A lightweight standard for exchanging course informationRafael C. Jimenez
The document describes SASI (Scientific Announcement Standards Initiative), a lightweight standard for exchanging course information between life science organizations. It discusses problems with the current redundant and inconsistent annotation and distribution of announcements. The proposed solution is a centralized registry for annotation using agreed-upon standards, with decentralized distribution of announcements. This would allow automatic exchange of standardized announcements to reduce effort and improve discoverability.
This document provides an overview of the European Life Sciences Infrastructure for Biological Information (ELIXIR). ELIXIR aims to build a sustainable infrastructure for biological data by coordinating existing life science resources across Europe. It will provide services for data, tools, computing, standards, and training. ELIXIR is establishing pilot projects to test integration of these services and address challenges like data access and scale. A Technical Coordination Group leads implementation and coordinates task forces to develop each area of the infrastructure. ELIXIR also partners with e-infrastructures to utilize high-performance computing and networking resources. Its goal is to support life science research and translation to areas like medicine, the environment, and bioindustry.
2. Molecular Biology Database resources
Breakdown of the ~1440 resources by category:
• Genomics Databases, non-vertebrate: 19%
• Human Genes and Diseases: 13%
• Protein sequence databases: 13%
• Nucleotide Sequence Databases: 9%
• Structure Databases: 9%
• Metabolic and Signaling Pathways: 9%
• Human and other Vertebrate Genomes: 8%
• Plant databases: 7%
• RNA sequence databases: 5%
• Other Molecular Biology Databases: 3%
• Immunological databases: 2%
• Organelle databases: 2%
• Proteomics Resources: 1%
Source: Nucleic Acids Research annual Database Issue and the NAR online Molecular Biology Database Collection in 2009. MY Galperin, GR Cochrane - Nucleic Acids Research, 2008.
3. Molecular Biology Database resources
• Metabolic and Signaling Pathways (~122 resources):
– Protein-protein Interactions: 62%
– Metabolic pathways: 21%
– Enzymes and enzyme nomenclature: 12%
– Signaling pathways: 5%
Source: Nucleic Acids Research annual Database Issue and the NAR online Molecular Biology Database Collection in 2009. MY Galperin, GR Cochrane - Nucleic Acids Research.
4. Biological pathway resources
Pathguide (http://www.pathguide.org), ~303 resources by category:
• Protein-Protein Interactions: 34%
• Metabolic Pathways: 20%
• Transcription Factors / Gene Regulatory Networks: 15%
• Protein-Compound Interactions: 11%
• Pathway Diagrams: 10%
• Protein Sequence Focused: 6%
• Other: 4%
5. Centralized databases VS In-house databases
[Diagram] Centralized database: many annotators (A) feed one database (DB), exposed through a single graphical user interface (GUI), application programming interface (API) and set of web services (WS). In-house databases: each annotator maintains its own DB, each with its own GUI, API and WS.
Legend: A = annotator, DB = database, GUI = graphical user interface, API = application programming interface, WS = web services, SP = standard protocol.
7. Many databases VS Federation
[Diagram] Many databases: the user connects separately to each database through its own GUI, API and WS. Federation: each database additionally exposes a standard protocol (SP), so a single client can query all of them uniformly.
9. Data integration
• Combining data residing in different sources …
• … providing users with a unified view of these data.
Main objective:
• Share, compare and unify
– Data from the same domain
– Data from different domains
Requires:
• Federated systems
• Standard formats
• Mapping tools
• Ontologies
10. Data integration
• Federated systems
– DAS
– PSICQUIC
– …
• Standard formats
– DAS
– PSI-MI
– BioPAX
– SBML
– CellML
– …
• Ontologies
– OLS
– …
• Mapping tools
– PICR
– Uniprot API
– Ensembl API
– DAS
– Biomart
– …
• Integration systems
– Biomart
– EnCORE
– …
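Mapping tools such as PICR exist to translate between identifier spaces so that results from different sources can be joined. A toy illustration of that translation step (the mapping table below is invented, standing in for the kind of lookup a service like Enfin-Affy2UniProt performs):

```python
# Invented mapping table: Affymetrix probe set IDs to UniProt accessions.
# Real mapping services hold millions of such cross-references.
affy_to_uniprot = {
    "203085_s_at": "P01137",
    "220406_at": "P37173",
}

def map_ids(ids, table):
    """Translate identifiers into a common space, keeping unmapped ones visible."""
    mapped = {i: table[i] for i in ids if i in table}
    unmapped = [i for i in ids if i not in table]
    return mapped, unmapped

mapped, unmapped = map_ids(["220406_at", "999999_at"], affy_to_uniprot)
```

Keeping the unmapped identifiers separate matters in practice: silently dropping them is a common source of apparent "missing" results after integration.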
11. Standards development – international collaborations
• Genome annotation: www.geneontology.org
• Microarray and Gene Expression Data (MGED): www.mged.org
• Protein sequence: www.uniprot.org
• HUPO Proteomics Standards Initiative (PSI): psidev.sf.net
• Protein structure: www.wwpdb.org
• Cheminformatics: www.ebi.ac.uk/chebi
• Pathways: www.reactome.org, www.biopax.org
• Systems modelling standards: www.sbml.org
• Metabolomics Standards Initiative (MSI): www.metabolomicssociety.org
• Genomics Standards Consortium (GSC): gensc.org
• Nucleotide sequence: www.insdc.org
12. The Distributed Annotation System
Dowell et al., BMC Bioinformatics. 2001; 2: 7. Published online 2001 October 10.
DAS, Architectural Overview (illustration)
14. DAS servers and data types
Reference, annotation and alignment servers together cover data types including: genome sequence, protein sequence, sequence alignments, protein structure, protein-protein interaction, 2D gels, mass spectrometry, EMAP, 3DM, epigenetics, phenotype, functional genomics and structural genomics.
19. ENFIN Network of Excellence
• Brings together experimentalists and computational biologists to develop the next generation of informatics resources for systems biology
• Funded by the European Commission within its FP6 programme under the thematic area ‘Life sciences, genomics and biotechnology for health’
• 20 partners in 13 countries
• www.enfin.org
21. Diverse service world
External data sources come in different formats (XML, CSV, plain text, JSON, …) behind different access interfaces (SOAP, REST, Java API, Perl API, FTP, GUI, …). For the user, integration means:
• Multiple manual connections
• Multiple technologies
• Multiple result files which have to be combined manually
• Much work to reproduce
22. Standardised EnCORE world
[Diagram] The heterogeneous external world (external data sources) is wrapped by EnCORE services, whose input and output use the standard EnXML format. Users reach this standardised EnCORE world through the EnVISION pages or API/WS access.
24. EnCORE services
From inputs to outputs: an EnCORE web service takes an input/query as an EnCORE dataset and returns output/results, split into positive and negative, in the same format.
EnCORE web services:
• Enfin-IntAct
• Enfin-PRIDE
• Enfin-Affy2UniProt
• Enfin-PICR
• Enfin-Reactome
• Enfin-ArrayExpress
• Enfin-UniProt
• Enfin-BioModels
• Enfin-KEGG
• Enfin-G:GOSt
• Enfin-CellMINT
• Enfin-DOMAINATION
Inputs: database IDs or sequences.
Result structure:
• Experiment: identifies the result
• Sets: contain the structure of the result
• Molecules: include the results
• Features: describe details of the result
25. EnCORE services
Example: querying the Enfin-IntAct web service with a database ID (UniProt ID) P37173 returns:
• Experiment: ID4
• Sets: (1) EBI-296235, (2) EBI-1033040, (3) EBI-902913, EBI-902937, (4) EBI-296166, EBI-296246, (5) EBI-902913
• Molecules: (1) O35613, (2) P10600, (3) P07200, (4) Q9UER7, (5) Q99K41
• Features: no features
26. EnCORE services
Example (result as a table), for the query P37173 through Enfin-IntAct:

    Interactor A   Interactor B   Interaction IDs
 1  P37173         O35613         EBI-296235
 2  P37173         P10600         EBI-1033040
 3  P37173         P07200         EBI-902913, EBI-902937
 4  P37173         Q9UER7         EBI-296166, EBI-296246
 5  P37173         Q99K41         EBI-902913
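The reshaping from sets and molecules into such a table is mechanical. A minimal Python sketch, where a plain dictionary stands in for the EnXML result (the layout is illustrative, not the actual EnXML schema):

```python
# Illustrative stand-in for the Enfin-IntAct result for query P37173:
# each interaction partner (molecule) maps to its set of interaction IDs.
query = "P37173"
interaction_sets = {
    "O35613": ["EBI-296235"],
    "P10600": ["EBI-1033040"],
    "P07200": ["EBI-902913", "EBI-902937"],
    "Q9UER7": ["EBI-296166", "EBI-296246"],
    "Q99K41": ["EBI-902913"],
}

# One table row per set: interactor A, interactor B, interaction IDs.
rows = [(query, partner, ", ".join(ids))
        for partner, ids in interaction_sets.items()]
```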
33. Adapting EnCORE to Standards and Federation
[Diagram] External data sources per domain (Domain 1, …) are accessed through federated systems and standards; an EnCORE wrapper exposes them as web services (WS) behind the EnVISION pages and web interface.
35. Adapting EnCORE to Standards and Federation
• Integration of sources across domains (Domain 1, Domain 2, Domain 3, Domain 4, Domain 5, …)
• Filtering redundancy (whenever possible)
• Interconnecting results
36. Predefined workflows (EnVISION, EnVISION2)
• Run different services on the same input
• Use the output of one service as an input of another service
37. Predefined workflows and automated workflows
• The “Semantic Web” promises to use data sources and analysis tools to automatically build workflows that make sense to satisfy users’ requests.
• The “Semantic Web” is still at an early stage, so it is not a practical solution to apply to our workflows.
• Useful workflows require users to go through each step of the workflow.
• Our problems using predefined workflows:
– Explosion of results.
– Workflow configuration is subjective.
– We could come up with multiple predefined combinations.
– Limitations in defining their configuration.
38. User selection based workflow
[Diagram] (1) A query runs through an EnCORE WS, producing positive and negative results. (2) The user selects a subset of those results. (3) The selection becomes the query for the next EnCORE WS, and so on.
39. Biological pathway resources
Pathguide: data access methods offered by the ~300 resources, from most to least common (on a 0–250 scale): browsing / canned queries, keyword searches, download in other format, download in BioPAX format, download in PSI format, download in SBML format, SQL queries, download in CellML format. (BioPAX, PSI, SBML and CellML are the standard formats.)
40. Conclusions
• Data integration
– Adopting standard formats
– Building a federated system of sources
– Describing data with ontologies
– Using standard identifiers
– Mapping references from different domains
Integration of biological data of various types and development of adapted bioinformatics tools represent critical objectives to enable research at the systems level. The European Network of Excellence ENFIN is engaged in developing an adapted infrastructure to connect databases and platforms, to enable both generation of new bioinformatics tools and experimental validation of computational predictions. Beyond the use of common standards to format individual datasets, there is a need for sophisticated informatics platforms to enable mining data across various domains, sources, formats and types. The aim of the EnCORE project is to integrate, across different disciplines, an extensive list of database resources and analysis tools in a computationally accessible and extensible manner, facilitating automated data retrieval and processing with a special focus on systems biology. The EnCORE platform is available as a collection of web services with a common standard format, easy to integrate into workflow management software such as Taverna. Additionally, EnCORE services are also accessible through EnVISION, a web graphical user interface providing elaborate information such as molecular interactions, biological pathways and computational models of pathways.
EBI has a comprehensive collection of databases.
We are not alone.
The latest “Molecular Biology Database Collection” published in NAR describes more than 1400 database resources.
Of these 1400, more than 100 are pathway databases.
Pathguide lists more than 300 pathway resources.
In general we can say there are lots of pathway resources available in different databases.
As a biologist I would prefer to see all the information in one single database.
Centralized databases have this mission:
they aim to collect all the information for one specific domain.
However …
Medium-size labs and organizations are capable of producing large amounts of data.
It then becomes harder to submit data to centralized repositories.
Moreover, data producers like to control and structure their own databases, developing their own GUIs and access protocols.
For us, the users, it becomes harder to access the information.
For one specific domain we might find different databases using different GUIs. We might end up downloading data in different formats, complicating the integration of results. After integration we might face a problem of high redundancy in our results.
This integration problem is well defined by this chart.
In bioinformatics before we didn’t have to much data available to help biologist
Now we have the data but it is not very useful if it is difficult to find and difficult to access.
Data producers have good reasons to have their own database.
However, all of us have to think about ways to share our data and make it easily available to users.
Federation provides an easy way to integrate data resources.
It is fully compatible with database providers continuing to work with their own database structure, GUI, ...
…
Mapping tools allow everyone to work in the same identification space.
…
A protocol to exchange data.
A network of biological resources
A standard XML format.
A federated system.
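The identifier-mapping component mentioned above can be sketched as a simple lookup into one canonical identification space. This is a minimal illustration, not a real mapping service; the namespaces, accessions and table contents below are examples chosen for the sketch.

```python
# Minimal sketch of an identifier-mapping step, so that records from
# different databases can be merged in one identification space.
# The lookup table is illustrative only, not a real mapping service.
ID_MAP = {
    # (source namespace, identifier) -> canonical accession
    ("ensembl", "ENSP00000269305"): "P04637",
    ("refseq",  "NP_000537"):       "P04637",
    ("uniprot", "P04637"):          "P04637",
}

def to_canonical(namespace, identifier):
    """Map an identifier from any supported namespace to a canonical one."""
    return ID_MAP.get((namespace.lower(), identifier))

# Records from two databases that actually refer to the same protein
records = [("ensembl", "ENSP00000269305"), ("refseq", "NP_000537")]
canonical = {to_canonical(ns, acc) for ns, acc in records}
print(canonical)  # both records collapse to one canonical accession
```

Once everything lives in one identification space, the redundancy problem described earlier largely disappears: duplicates become literal duplicates and can be filtered with a set.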
Different distributed databases install the DAS protocol.
A client can then send the same query to all of these databases,
and all of them will return their results in the same standard XML format over the internet.
This makes it easy for the client to put all the annotations together.
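The benefit of the shared response format can be sketched as follows: because every DAS source answers in the same XML layout, one parser handles every server and the annotations pool naturally. The fragments below are simplified, illustrative DASGFF-style responses, not real server output.

```python
# Sketch of merging DAS responses: every source answers the same query
# with the same XML layout, so one parser handles all servers and the
# annotations can simply be concatenated.
import xml.etree.ElementTree as ET

# Two example responses, as if returned by two different DAS servers
# (simplified, illustrative DASGFF-style fragments).
RESPONSES = [
    """<DASGFF><GFF><SEGMENT id="P04637">
         <FEATURE id="f1"><TYPE id="domain">DNA-binding</TYPE></FEATURE>
       </SEGMENT></GFF></DASGFF>""",
    """<DASGFF><GFF><SEGMENT id="P04637">
         <FEATURE id="f2"><TYPE id="site">Phosphosite</TYPE></FEATURE>
       </SEGMENT></GFF></DASGFF>""",
]

def parse_features(xml_text):
    """Extract (feature id, type) pairs; works for any conforming server."""
    root = ET.fromstring(xml_text)
    return [(f.get("id"), f.find("TYPE").text) for f in root.iter("FEATURE")]

# One parser, many servers: just concatenate the parsed annotations.
annotations = [feat for resp in RESPONSES for feat in parse_features(resp)]
print(annotations)  # [('f1', 'DNA-binding'), ('f2', 'Phosphosite')]
```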
EnVISION2 (an ENFIN tool) queries for molecular interactions using PSICQUIC.
It connects to the PSICQUIC registry to find out which servers are available,
and queries them for molecular interactions for a list of protein accessions (in this case two proteins interacting with each other).
It merges the results, filtering out redundancy, and displays them in a table and in an interaction network.
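The redundancy-filtering step can be sketched like this. PSICQUIC services return interactions as tab-separated MITAB rows whose first two columns identify the interactors; the same pair reported by two services (possibly in swapped order) should count once. The MITAB lines below are simplified examples, not real service output.

```python
# Sketch of merging PSICQUIC results: build an order-independent key
# from the first two MITAB columns and deduplicate with a set.
# The MITAB lines are simplified, illustrative examples.
SERVICE_RESULTS = [
    # as if returned by service 1
    ["uniprotkb:P04637\tuniprotkb:Q00987\tpsi-mi:..."],
    # as if returned by service 2, reporting the same pair swapped
    ["uniprotkb:Q00987\tuniprotkb:P04637\tpsi-mi:..."],
]

def interaction_key(mitab_line):
    """Order-independent key for an interacting pair (MITAB columns 1-2)."""
    a, b = mitab_line.split("\t")[:2]
    return tuple(sorted((a, b)))

unique = {interaction_key(line)
          for result in SERVICE_RESULTS for line in result}
print(len(unique))  # 1: the swapped duplicate is filtered out
```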
Let me talk about what we do in ENFIN regarding data integration.
ENFIN is a project that brings together experimentalists and computational biologists to help each other and to develop bioinformatics resources for systems biology.
The idea behind EnCORE is simplified in this picture.
The input (our query) is contained in a standard XML format called EnXML.
We can run different services over this input,
and we get results contained in the same EnXML format.
The outputs can then be used as inputs to other services.
We are exposed to a very diverse world of services.
EnCORE provides an easy way to build workflows, since inputs and outputs share the same standard format.
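Why a shared input/output format makes workflows trivial can be sketched as plain function composition: every service consumes and produces the same kind of document, so chaining is just feeding one output into the next call. The document structure and service behaviour below are stand-ins for illustration, not the real EnXML schema or EnCORE services.

```python
# Sketch: when every service maps an EnXML-like document to another
# such document, a workflow is just function composition.
# The dict structure and service logic are illustrative stand-ins.
def map_identifiers(doc):
    """Toy 'mapping' service: normalize protein accessions."""
    doc = dict(doc)
    doc["proteins"] = [p.upper() for p in doc["proteins"]]
    return doc

def fetch_interactions(doc):
    """Toy 'interaction' service: attach a pretend partner per protein."""
    doc = dict(doc)
    doc["interactions"] = [(p, p + "_partner") for p in doc["proteins"]]
    return doc

def run_workflow(doc, services):
    for service in services:
        doc = service(doc)  # output of one service feeds the next
    return doc

result = run_workflow({"proteins": ["p04637"]},
                      [map_identifiers, fetch_interactions])
print(result["interactions"])  # [('P04637', 'P04637_partner')]
```

Because each step both accepts and returns the same document shape, reordering or extending the pipeline needs no glue code, which is exactly the property the shared EnXML format buys.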
This is a generic example of how an EnCORE service works.
A specific example:
The query is a protein accession.
We run the IntAct service,
and we get the interaction results described in EnXML terminology.
The same results are shown in a table.
EnCORE facilitates building workflows.
EnVISION is an interface to EnCORE.
With just one click, users can run different services and get a quick overview of a dataset.
This example shows results for …
Here is an example of the potential of EnVISION.
In this example we used a dataset of more than 300 protein accessions.
In this screenshot, EnVISION was able to find more than 500 pathways for this dataset.
EnVISION is capable of linking and displaying positive results in a pathway map.
TOP: Reactions present in our dataset.
MIDDLE: Heatmap.
BOTTOM: Proteins from our dataset found in Reactome reactions.
Heatmap displaying the pathways represented in our dataset.
Color identifies the best hits:
red means more proteins from our dataset are present in the reaction.
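The colouring rule can be sketched as a simple coverage score: for each reaction, the fraction of its proteins that come from our dataset, scaled to a red intensity. The reaction memberships and accessions below are illustrative, not real Reactome content.

```python
# Sketch of the heatmap colouring: coverage of a reaction by our
# dataset, scaled to a red channel value (0-255).
# Reaction memberships below are illustrative, not real Reactome data.
DATASET = {"P04637", "Q00987", "P38398"}
REACTIONS = {
    "reaction_A": {"P04637", "Q00987"},           # fully covered
    "reaction_B": {"P38398", "P12345", "P67890"}, # partially covered
}

def red_intensity(reaction_proteins, dataset):
    """Fraction of the reaction's proteins found in the dataset, as 0-255."""
    coverage = len(reaction_proteins & dataset) / len(reaction_proteins)
    return int(round(coverage * 255))

for name, proteins in sorted(REACTIONS.items()):
    print(name, red_intensity(proteins, DATASET))
```

A fully covered reaction gets the deepest red, and a reaction with no dataset proteins stays at zero, matching the "red means better hit" reading of the heatmap.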
EnVISION results are nice, but do not forget our initial integration problem.
For one domain (protein interactions, pathways, protein sequences …) we might have several databases providing data.
EnCORE provides a great solution; however, it is not complete if it cannot include more resources.
It is not feasible for EnCORE to develop and maintain so many wrappers.
Nonetheless, EnCORE can overcome this problem using standards and federated systems.
Right now, EnCORE workflows are predefined.
There are two types of workflow.
They are static and not very easy to adapt to user needs.
Semantic web technologies seem to be the solution for building intelligent workflows.
However, the semantic web is at an early stage,
and molecular biology may be too complicated for it.
I personally think “user-selection-based workflows” are a better solution for developers and users, as long as we keep them as simple as they are in EnCORE.
To conclude …
I would like to see more database providers and users using standards.
Just 20% of the databases described in Pathguide use standards.
Data integration and proper representation of pathway information will be possible if …