EBI is an Outstation of the European Molecular Biology Laboratory.
08/23/18
EMBL-EBI Proteomics data resources and services
Rafael JIMENEZ (EBI, Hinxton, UK)
4th Annual Forum for SMEs
Munich, October 18th-19th 2010
Context
Integration, standards and dissemination
Uniprot
Protein Sequences
Reactome
Pathways
IntAct
Interactions
PRIDE
Mass Spec
DAS
PSICQUIC
EnCore
Annotation
Archive
sequence databases
(INSDC)
EMBL
DDBJ
NCBI
interactions
IMEx
IntAct
BIND
DIP
MINT
…
mass spec
ProteomeXchange
PRIDE
PeptideAtlas
GPMDB
Tranche
…
Sharing infrastructures
• Multiple repositories in a particular field
Collaboration and data exchange
More data coverage
• Proteomics Standards Initiative
• Work group of the Human Proteome Organization
• Defines community standards for data in proteomics
• … facilitating data comparison, exchange and verification
PSI
http://www.psidev.info/
• MIAPE: The Minimum Information About a Proteomics Experiment
• Data and metadata from proteomics experiments
• Data: results
• Metadata: data about the data
• Where the samples came from
• How the analyses were performed
PSI-MI
Data format
Data distribution
Controlled vocabulary
Data submission
Website
Standard format
Tools
PSICQUIC
PSI-MI CV
Reporting guideline MIMIx
Tools
PSI-MI XML
PSI-MITAB
XML Java API
MITAB Java API
XMLMakerFlattener
XML Validator
MIF25_view.xsl
MIF25_compact.xsl
MIF25_expand.xsl
PSI-MI XML files
PSI Excel Sheet
PSI Web Form
Data
Servers
Registry
Clients
• Work group of the Proteomics Standards Initiative
• Community coordination to ensure deposition of data in
public repositories
• Concentrating on …
• Annotation and representation of published MI data
• Accessibility of MI data to the user community
PSI - Molecular Interactions
Data format
Data distribution
Controlled vocabulary
MIAPE
Reporting guideline
PSI-MI XML
PSI-MITAB
PSICQUIC
MIMIx
PSI-MI CV
http://www.psidev.info/MI
Scoring
PSISCORE
PSI-MI format
• Community standard for Molecular Interactions
• Jointly developed by major data providers: BIND,
CellZome, DIP, GSK, HPRD, Hybrigenics, IntAct, MINT, MIPS, Serono,
U. Bielefeld, U. Bordeaux, U. Cambridge, and others
• Collecting and combining data from different sources
has become easier
• Standardized annotation through PSI-MI ontologies
• Tools from different organizations can be chained, e.g.
IntAct data in Cytoscape.
psi-mi/xml25 psi-mi/tab25
PSI-MI Controlled vocabulary
• Ontology browser: http://www.ebi.ac.uk/ontology-lookup
MIMIx
• MIAPE document guideline for molecular interactions
• 1. Manuscript information
• 2. Experiment
• 3. Interaction
• 4. Confidence
Data distribution: PSICQUIC
• Proteomics Standards Initiative Common QUery InterfaCe.
• Community effort to standardise the way to access and retrieve data
from Molecular Interaction databases.
• Widely implemented by independent interaction data resources.
• Based on the PSI standard formats (PSI-MI XML and MITAB)
• Not limited to protein-protein interactions, also e.g.
• Drug-target interactions
• Simplified pathway data
• A registry listing resources implementing PSICQUIC
• Documentation: http://psicquic.googlecode.com
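Since PSICQUIC results can come back as MITAB, a short sketch of reading one line may help. It assumes the 15-column, tab-separated PSI-MITAB 2.5 layout, with "-" for an empty column and "|" between multiple values. The short column keys and the sample line are made up for illustration and are not a real database record.

```python
# Short keys for the 15 PSI-MITAB 2.5 columns (labels chosen for this sketch).
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_id_a", "alt_id_b", "alias_a", "alias_b",
    "detection_method", "first_author", "publication_id",
    "taxid_a", "taxid_b", "interaction_type", "source_db",
    "interaction_id", "confidence",
]


def parse_mitab_line(line):
    """Split one MITAB 2.5 line into a dict of value lists."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(MITAB25_COLUMNS):
        raise ValueError("expected at least 15 tab-separated columns")
    record = {}
    for name, value in zip(MITAB25_COLUMNS, fields):
        # "-" marks an empty column; "|" separates multiple values.
        # Simplification: pipes inside quoted text are not handled here.
        record[name] = [] if value == "-" else value.split("|")
    return record


# Illustrative line only, not taken from any database.
sample = "\t".join([
    "uniprotkb:P04637", "uniprotkb:Q00987", "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)', "-", "pubmed:0000000",
    "taxid:9606(human)", "taxid:9606(human)", "-", "-", "-", "-",
])
record = parse_mitab_line(sample)
print(record["id_a"], record["detection_method"])
```

A production parser would more likely use the MITAB Java API listed among the PSI-MI tools; this only shows the shape of the data.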
PSICQUIC implementation
PSICQUIC PSICQUIC PSICQUIC
Sample
Observation error
Interaction databases
Publications
PSICQUIC services
Annotation error
User
PSICQUIC
Registry
PSICQUIC client
Service
broker
Service
consumer
Service
provider
Service
Contract
...
...
Interact
Publish
Find
Service Oriented Architecture
PSI-MI
...
...
...
PSICQUIC
Registry
DAS Clients
PSICQUIC
Clients
Format
PSICQUIC
sources
PSICQUIC
sources
PSICQUIC
sources
PSICQUIC implementation
• PSICQUIC Server (SOAP/REST web service)
• PSICQUIC Registry
• PSICQUIC Clients
• PSICQUIC view
• Cytoscape
• Envision2
• …
PSICQUIC
Registry
• 13 sources
• 14,665,530 interactions
http://www.ebi.ac.uk/Tools/webservices/psicquic/registry/registry?action=STATUS
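The registry lets a client find services and send the same query to each of them. A minimal sketch of that fan-out follows. It uses an in-memory dictionary in place of the real registry response, and the "active" flags and the dictionary layout are assumptions of this example. The two base URLs are taken from the REST examples in this deck.

```python
from urllib.parse import quote

# Stand-in for a registry listing; layout and "active" flags are hypothetical.
REGISTRY = {
    "MINT": {
        "rest_url": "http://mint.bio.uniroma2.it/mint/psicquic/webservices/current/search/",
        "active": True,
    },
    "IntAct": {
        "rest_url": "http://www.ebi.ac.uk/Tools/webservices/psicquic/intact/webservices/current/search/",
        "active": True,
    },
    "OfflineExample": {
        "rest_url": "http://example.org/psicquic/webservices/current/search/",
        "active": False,
    },
}


def query_urls(registry, miql):
    """Build one REST query URL per active service in the registry."""
    return {
        name: entry["rest_url"] + "query/" + quote(miql)
        for name, entry in sorted(registry.items())
        if entry["active"]
    }


urls = query_urls(REGISTRY, "p53")
for name, url in urls.items():
    print(name, url)
```

Fetching each URL and merging the MITAB output is left out, because the availability of the services is not something this sketch can assume.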
PSICQUIC example: REST queries
Bruno Aranda (baranda@ebi.ac.uk)
http://mint.bio.uniroma2.it/mint/psicquic/webservices/current/search/query/p53
http://www.ebi.ac.uk/Tools/webservices/psicquic/intact/webservices/current/search/query/p53
http://www.ebi.ac.uk/Tools/webservices/psicquic/chembl/webservices/current/search/interactor/p53
1
2
3
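The three URLs above share one pattern: a service base URL, then the method ("query" or "interactor"), then the search term. A small sketch of composing such URLs follows. The function name and the restriction to these two methods are assumptions of this example, since only these two appear on the slide.

```python
from urllib.parse import quote

INTACT_BASE = (
    "http://www.ebi.ac.uk/Tools/webservices/psicquic/intact/"
    "webservices/current/search"
)


def psicquic_url(base, method, term):
    """Compose a PSICQUIC REST URL of the form <base>/<method>/<term>."""
    if method not in ("query", "interactor"):
        raise ValueError("this sketch only covers 'query' and 'interactor'")
    # Percent-encode spaces and other reserved characters in the term,
    # keeping ':' readable for field:value queries.
    return f"{base.rstrip('/')}/{method}/{quote(term, safe=':')}"


print(psicquic_url(INTACT_BASE, "query", "p53"))
print(psicquic_url(INTACT_BASE, "query", "brca AND rpa1"))
```

The resulting string can be passed to any HTTP client; the response is expected to be MITAB text.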
PSICQUIC example: MIQL
Bruno Aranda (baranda@ebi.ac.uk)
• Molecular Interaction Query Language
PSICQUIC example: MIQL
…/query/species:rat
…/query/brca AND rpa1
• Terms
• Fields
• Operands
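A MIQL query is built from terms, optional field prefixes and operands, so it can be assembled with two small helpers. The sketch below is illustrative: the helper names are made up, and the field name "species" is used only as an example, so check the service documentation for the fields a given server indexes.

```python
def field(name, value):
    """Return a field-restricted term such as species:rat."""
    # Values containing whitespace are quoted so they stay one term.
    if any(ch.isspace() for ch in value):
        value = f'"{value}"'
    return f"{name}:{value}"


def combine(operand, *clauses):
    """Join terms or field clauses with AND / OR."""
    if operand not in ("AND", "OR"):
        raise ValueError("this sketch only supports AND and OR")
    return f" {operand} ".join(clauses)


print(combine("AND", "brca", "rpa1"))
print(combine("AND", field("species", "rat"), "brca"))
```

The resulting string goes after …/query/ in a REST call, percent-encoded as needed.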
PSICQUIC client
PSICQUIC clustering
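The slide does not spell out how clustering works. One plausible reading, assumed here, is that records describing the same pair of interactors are grouped across sources. The sketch below shows only that grouping step, with hypothetical source names and identifiers.

```python
from collections import defaultdict


def cluster_by_pair(records):
    """Group (source, id_a, id_b) records by unordered interactor pair."""
    clusters = defaultdict(list)
    for source, id_a, id_b in records:
        # A-B and B-A describe the same binary interaction.
        key = tuple(sorted((id_a, id_b)))
        clusters[key].append(source)
    return dict(clusters)


# Hypothetical records, not real database content.
records = [
    ("source1", "P1", "P2"),
    ("source2", "P2", "P1"),
    ("source2", "P1", "P3"),
]
clusters = cluster_by_pair(records)
print(clusters)
```

A real clustering step would first map interactor identifiers to a common space, which is the problem PICR addresses.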
PSISCORE
PSISCORE
Scoring algorithms
offered by
PSISCORE servers
PSISCORE
Scoring algorithm
description, provided
by scoring server /
registry
Exemplary visualization
of a scoring algorithm
with a 0-1 range
Scoring algorithms
offered by
PSISCORE servers
IMEx website http://www.imexconsortium.org/
IMEx: The International Molecular Exchange Consortium
• Group of major public interaction data providers sharing
curation effort: DIP, IntAct, MINT, MPact, MatrixDB, MPIDB and BioGRID
• Independent molecular interaction resources
• Common curation standards for detailed curation
• Common data formats (PSI-MI XML, PSI-MITAB, PSICQUIC)
• Common accession number space
• Coordinated & non-redundant curation
• In production mode since February 2010
• Since 3/2009 supported by the European Commission
under PSIMEx, contract number FP7-HEALTH-2007-223411, with additional partners Vital-IT, Nature,
Wiley, BiaCore (GE), U. Maryland, CSIC, TU Munich, MIPS, SCBIT (Shanghai)
Imex.sf.net
IntAct
• Freely available, open-source database system
• Public repository of molecular interactions
• Interactions manually curated and reviewed by experts
• Interactions derived from literature or direct user submissions
• Topic-centric datasets (e.g. cancer, chromatin, MSD…)
• Analysis tools for interaction data
• EBI database (part of the IMEx consortium and the PSI-MI)
• Data updated every week: ftp://ftp.ebi.ac.uk/pub/databases/intact
• Data formats available:
http://www.ebi.ac.uk/intact
IntAct statistics
IntAct statistics
• Interactions by identification method
• ~70% Y2H
• ~25% Affinity purification
• ~3% Physical data
• ~2% Other methods
IntAct statistics
IntAct: Search and results
Export
Custom columns
Filters
More results
(PSICQUIC)
IntAct
PSI-MSS PSI-MS
PSI-PI
Data format
Tools
Standard format
Reporting guideline MIAPE-MS
mzML
TraML
- ProDaC
- OpenMS/TOPP
- ProteoWizard
- Proteios
- TPP
- X!Tandem
- Myrimatch
- InSilicoSpectro
- NCBI C++ toolkit
- Mascot
Validation, analysis, exporters, viewers, ...
- Phenyx
- PEAKS
- mzML_Exporter
- CompassXport
- Insilicos Viewer
- jmzML
- Pride Inspector
- Pride Converter
…
Controlled vocabulary PSI-MS
Data format
Tools
Standard format
Reporting guideline MIAPE-MSI
mzIdentML
mzQuantML
- mzIdentML validator
- Mascot
- OMSSA
- Peaks
- Phenyx
- PLGS
- ProCon
- ProteinPilot
- ProteinScape
- SEQUEST
Validation, analysis, exporters, viewers, ...
- SpectraST
- Spectrum Mill
- X!Tandem
- OpenMS/TOPP
- Scaffold
- TPP
- Mascot Integra
- MIAPE MSI exporter
- CSV exporter
…
Tools
Data
Website
Pride Inspector
Pride Converter
Pride Biomart
Pride QProjects
PICR
OLS
• Work group of the Proteomics Standards Initiative
• Community coordination to ensure deposition of data in
public repositories
• Concentrating on …
• Annotation and representation of published MS data
• Accessibility of MS data to the user community
PSI - Mass Spectrometry Standards
Individual
proteins
Peptides
Protein
mixture
Peptide
Mass
Separation 2D-SDS-PAGE
Spot Cutting
Digestion
Trypsin
Mass Spectrometry MALDI-TOF
Database
search
mzML
mzIdentML
Protein
identification
Quantification
mzQuantML
Protein
quantification
mzXML
mzData
analysisXML
PSI-MS Controlled vocabulary
• Shared by PSI-MSS and PSI-PI
• Ontology browser: http://www.ebi.ac.uk/ontology-lookup
MIAPE
PSI-MS PSI-PI
ProteomeXchange website
http://www.proteomeexchange.com
ProteomeXchange:
Enhancing Cooperation of Proteomics Data Repositories
• Group of major public Mass Spec data providers
• Single point of submission to proteomics repositories
• Encourage data exchange
• Common data formats (mzML, mzIdentML, mzQuantML)
• Common accession number space
• Coordinated & non-redundant data
• Since 2010 supported by the European Commission
Secondary resources
Data reprocessing and notification
Journals
Wiley
Proteomics
NBT
JPR
MCP
Standards Local data management systems
mzQuantML
Release 1 Release 2 Release 3
ProHITS
MS-Lims
ProCon
Phenyx
OmicsHub
Other
LIMS
Pride Converter
Repositories
Pride
Metadata,
Results
mzML
mzIdentML
Peptide
Atlas
Uniprot
NIST
Spectrum
libraries
……
Implemented in
Data submission
RSS
feed
Central
Dataset
Look-up
Service
MIAPE
validation
Accession
Number/
reviewer login
Notification
Reprocessing notification
Tranche
Raw
data
Peptidome
Metadata,
Results
xref xref
Data release / publications
Proposal structure
http://www.ebi.ac.uk/pride
The Proteomics Identifications Database
• Centralized, standards compliant, public data repository
for proteomics identifications
• Open source
• Open data
• > 100,000,000 spectra
• ~ 4,000,000 protein identifications
• Detailed annotation of meta-data
• Vizcaíno JA, Côté R, Reisinger F, Foster JM, Mueller M, Rameseder J, Hermjakob H, Martens L.
A guide to the Proteomics Identifications Database proteomics data repository.
Proteomics. 2009 Sep;9(18):4276-83.
PMID: 19662629
PRIDE data content
Release of PRIDE Converter
Protein IDs Peptide IDs
PRIDE data content
PRIDE Website
PART_OF
Search by
• Experiment
• Protein id
• Ontology
PRIDE Website
• Results
• Peptide IDs
• Protein IDs
• Mass spectra as peak lists
• Metadata - experiment
• Analysis
BioMart – System Overview
ATGCTGTTGTGC
ATGCTGGACTGG
ATGGCCCGATGG
ATGCTGTTGTGC
ATGCTGGACTGG
ATGGCCCGATGG
Source data
(MySQL, Oracle, Postgres)
DB
Mart
Bert Overduin
PRIDE Biomart
1. Filter 2. Attributes
3. Results
http://www.ebi.ac.uk/pride/prideMart.do
http://www.ebi.ac.uk/pride/biomart/martservice?query= XML
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName = "default" formatter = "TSV" header = "0"
uniqueRows = "0" count = "" datasetConfigVersion = "0.6" >
<Dataset name = "pride" interface = "default" >
<Filter name = "experiment_ac" value = "1632"/>
<Attribute name = "submitted_accession" />
</Dataset>
</Query>
Easy programmatic access!
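One way to use the query above programmatically is to percent-encode the XML and append it to the martservice URL as the query parameter. The sketch below only builds that URL. The function name is an assumption, and whether the 2010 endpoint still answers is not something this example can confirm.

```python
from urllib.parse import quote

MARTSERVICE = "http://www.ebi.ac.uk/pride/biomart/martservice"

# The query XML exactly as shown on the slide.
QUERY_XML = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName = "default" formatter = "TSV" header = "0"
uniqueRows = "0" count = "" datasetConfigVersion = "0.6" >
<Dataset name = "pride" interface = "default" >
<Filter name = "experiment_ac" value = "1632"/>
<Attribute name = "submitted_accession" />
</Dataset>
</Query>"""


def martservice_url(base, query_xml):
    """Return base?query=<percent-encoded XML> with whitespace collapsed."""
    compact = " ".join(query_xml.split())
    return f"{base}?query={quote(compact)}"


url = martservice_url(MARTSERVICE, QUERY_XML)
print(url)
```

With formatter = "TSV", the response should be tab-separated text, one row per result.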
What is OLS?
• A unified, single point of query for over 69 ontologies
(updated daily) and upwards of 850,000 terms.
• A tool that offers online and programmatic access to query
ontologies about:
• Term names
• Synonyms
• Relationships
• Annotations
• Cross-references
• Reusable code components to integrate such functionality
in other projects
http://www.ebi.ac.uk/ontology-lookup/
Richard Cote
Ontology browsing
Why do you need ID mapping
• Merging datasets to a common identifier space
• Finding all aliases/synonyms for an identifier
• (data integration – submissions!)
• Mapping from secondary IDs to more recent primary IDs
• (data “freshness”)
• Preparing data sets for specific tools
• Querying in various primary databases
• (data format requirements)
Richard Cote
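Two of the use cases above, mapping secondary IDs to primary IDs and merging datasets into a common identifier space, can be sketched with a plain lookup table. The accessions and the mapping below are invented for illustration. In practice the table would come from a mapping service such as PICR.

```python
def to_primary(accession, secondary_to_primary):
    """Follow secondary -> primary links until a primary ID is reached."""
    seen = set()
    while accession in secondary_to_primary and accession not in seen:
        seen.add(accession)  # guards against circular mappings
        accession = secondary_to_primary[accession]
    return accession


def merge_datasets(datasets, secondary_to_primary):
    """Map every accession to its primary ID and record which sources have it."""
    merged = {}
    for source, accessions in datasets.items():
        for accession in accessions:
            primary = to_primary(accession, secondary_to_primary)
            merged.setdefault(primary, set()).add(source)
    return merged


# Hypothetical identifiers and mapping.
mapping = {"OLD_1": "MID_1", "MID_1": "NEW_1"}
datasets = {"lab_a": ["OLD_1", "NEW_2"], "lab_b": ["NEW_1"]}
merged = merge_datasets(datasets, mapping)
print(merged)
```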
Protein identifier mapping is hard
• The basic problem: the same protein sequence is referred to by
multiple accession numbers assigned by multiple databases.
• No universal identifier scheme
• Redundant databases – multiple identifiers for the same sequence in
the same database
• Unstable identifiers (e.g. gi numbers)
• Obsolete and deleted identifiers (hypothetical proteins)
• Different production cycles for major databases
• Tools exist, but are limited in their database and
species coverage and in their usability and availability.
Richard Cote
PICR Home Page
Submit accessions
OR sequences
(FASTA) with 500
entry interactive limit
(no batch limit)
Select output format
Select one or
many databases
to map to in one
request
Limit search by
taxonomy
(pessimistic)
Choose to return
all mappings or
only active ones
Run
search
Richard Cote
PICR Result Page – simple view
Logical xref
(hyperlinked)
Inactive xref
Secondary
Identifier
Active xref
(hyperlinked)
Richard Cote
Pride inspector
• Open mzML and PRIDE XML files
• Browse the PRIDE database
• Facilitate publication reviews
DAS, The Distributed Annotation System
The Distributed Annotation System is…
• A network of biological data sources
• A Service Oriented Architecture (SOA)
• RESTful web service
• An example of federation
• Uniform access to multiple repositories of biological data.
• Repositories distributed in different geographical locations.
The DAS Protocol is…
• An integration platform
• A client-server protocol
• An agreed standard for web services
Andy Jenkinson
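Because DAS is a RESTful client-server protocol, a request is just a URL. A minimal sketch of building a "features" request for one segment follows. It assumes the usual <server>/<source>/features?segment=<id> form, and the server address, source name and accession are placeholders rather than real DAS sources.

```python
from urllib.parse import urlencode


def das_features_url(server, source, segment, start=None, stop=None):
    """Build a DAS features request for a segment, optionally a sub-range."""
    if (start is None) != (stop is None):
        raise ValueError("start and stop must be given together")
    if start is not None:
        segment = f"{segment}:{start},{stop}"
    # Keep ':' and ',' readable; everything else is percent-encoded.
    query = urlencode({"segment": segment}, safe=":,")
    return f"{server.rstrip('/')}/{source}/features?{query}"


# Placeholder server and source names.
print(das_features_url("http://example.org/das", "mysource", "P12345"))
print(das_features_url("http://example.org/das", "mysource", "P12345", 1, 100))
```

The response is an XML document listing the annotations (features) for the requested segment.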
74 Protein DAS sources!
PRIDE
DAS 1.6
Pride & Dasty3
Protein client
At present, PRIDE is a data repository to support publications
- We rely 100% on data submitted by researchers
- It is not possible to determine which data are of high quality and
which are not
- As a result, data in PRIDE cannot be used as
supporting evidence for protein existence in UniProt
- Data cannot be reused by other resources either…
PRIDE-Q: Why?
Project funded by the Wellcome Trust (5 years):
Added value: High-quality data will go to a new resource: PRIDE-Q
Data exports:
•Links, DAS track for all PRIDE data
•Quality controlled, e.g. “Protein Existence”, Expression Atlas from PRIDE-Q
PRIDE-Q *
Curation
Automated rules,
Curator override
PRIDE-Q
Reactome
•Human pathway knowledgebase
•Manually curated
•Open source, open data
•Collaboration between EBI, OICR and
NYU
•Online since 2003
•Matthews L, et al: Reactome knowledgebase of
human biological pathways and processes. Nucleic
Acids Res. 2008 Nov 3.
http://www.ebi.ac.uk/pride
Stats
New site! Coming soon …
http://reactome.oicr.on.ca
Main
text
Navigation bar
The Pathway Browser
Species selector
Search &
Analyze bar
Sidebar
Pathway Diagram Panel
Details Panel (hidden)
Thumbnail
The Pathway Browser - Pathway Diagrams
Boxes are proteins, sets or complexes.
Ovals are small molecules.
Green boxes are proteins or sets, blue are complexes.
Catalyst
Input
Outputs
Compartment
Reaction node
Transition Binding Dissociation Omitted Uncertain
Regulation
+ve -ve
Pathway Analysis – Overrepresentation
‘Top-level’
Reveal next level
P-val, In set/In pathway
Species Comparison II
Yellow = human/rat
Blue = human only
Grey = not relevant
Black = Complex
Expression Analysis II ‘Hot’ = high
‘Cold’ = low
Molecular Interaction Overlay
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName = "default" formatter = "TSV" header = "0"
uniqueRows = "0" count = "" datasetConfigVersion = "0.6" >
<Dataset name = "pathway" interface = "default" >
<Filter name = "referencepeptidesequence_uniprot_id_list"
value = "P25205"/>
<Attribute name = "stableidentifier_identifier" />
<Attribute name = "pathway_db_id" />
</Dataset>
</Query>
BioMart
1. Filter
2. Attributes
3. Results
http://www.reactome.org:5555/biomart/martservice?query= XML
Easy programmatic access!
http://www.reactome.org:5555/biomart/martview
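The same three steps (filter, attributes, results) can be turned into query XML from code instead of being typed by hand. The sketch below generates XML shaped like the Reactome query above with the Python standard library. The function name and argument layout are assumptions of this example, and the dataset, filter and attribute names are copied from the slide.

```python
import xml.etree.ElementTree as ET


def build_mart_query(dataset, filters, attributes):
    """Build BioMart query XML from a dataset, filters and attributes."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default", "formatter": "TSV", "header": "0",
        "uniqueRows": "0", "count": "", "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():      # 1. Filter
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:                  # 2. Attributes
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


xml_query = build_mart_query(
    "pathway",
    {"referencepeptidesequence_uniprot_id_list": "P25205"},
    ["stableidentifier_identifier", "pathway_db_id"],
)
print(xml_query)
```

The string would then be percent-encoded and sent as the query parameter of the martservice URL to obtain step 3, the results.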
Acknowledgments …
• EU:
• ProDaC (to 03/2009)
• ProteomeBinders
• BioSapiens
• Felics
• LipidomicNet
• APO-SYS
• PSIMEx (since 03/2009)
• EMBL
• Wellcome Trust
• NIH
The Funding
?

Data submissions and archiving raw data in life sciences. A pilot with Proteo...
 
SASI, A lightweight standard for exchanging course information
SASI, A lightweight standard for exchanging course informationSASI, A lightweight standard for exchanging course information
SASI, A lightweight standard for exchanging course information
 
Introduction to the BioJS project
Introduction to the BioJS projectIntroduction to the BioJS project
Introduction to the BioJS project
 
ELIXIR . Technical Coordinator
ELIXIR. Technical CoordinatorELIXIR. Technical Coordinator
ELIXIR . Technical Coordinator
 
BioJS introduction
BioJS introductionBioJS introduction
BioJS introduction
 
Java tutorial: Programmatic Access to Molecular Interactions
Java tutorial: Programmatic Access to Molecular InteractionsJava tutorial: Programmatic Access to Molecular Interactions
Java tutorial: Programmatic Access to Molecular Interactions
 

Recently uploaded

Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2John Carlo Rollon
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
Cytokinin, mechanism and its application.pptx
Cytokinin, mechanism and its application.pptxCytokinin, mechanism and its application.pptx
Cytokinin, mechanism and its application.pptxVarshiniMK
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555kikilily0909
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...lizamodels9
 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaPraksha3
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzohaibmir069
 
Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)DHURKADEVIBASKAR
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |aasikanpl
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxyaramohamed343013
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 
Temporomandibular joint Muscles of Mastication
Temporomandibular joint Muscles of MasticationTemporomandibular joint Muscles of Mastication
Temporomandibular joint Muscles of Masticationvidulajaib
 

Recently uploaded (20)

Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
Cytokinin, mechanism and its application.pptx
Cytokinin, mechanism and its application.pptxCytokinin, mechanism and its application.pptx
Cytokinin, mechanism and its application.pptx
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistan
 
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort ServiceHot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
 
Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
Temporomandibular joint Muscles of Mastication
Temporomandibular joint Muscles of MasticationTemporomandibular joint Muscles of Mastication
Temporomandibular joint Muscles of Mastication
 

EMBL-EBI Proteomics data resources and services

  • 1. EBI is an Outstation of the European Molecular Biology Laboratory. 08/23/18 EMBL-EBI Proteomics data resources and services Rafael JIMENEZ (EBI, Hinxton, UK) 4th Annual Forum for SMEs Munich, October 18th-19th 2010
  • 2. Context Integration, standards and dissemination Uniprot Protein Sequences Reactome Pathways IntAct Interactions PRIDE Mass Spec DAS PSICQUIC EnCore Annotation Archive
  • 3. sequence databases (INSDC) EMBL DDBJ NCBI interactions IMEx IntAct BIND DIP MINT … mass spec ProteomeXchange PRIDE PeptideAtlas GPMDB Tranche … Sharing infrastructures • Multiple repositories in a particular field Collaboration and data exchange More data coverage
  • 4. • Proteomics Standards Initiative • Work group of the Human Proteome Organization • Defines community standards for data in proteomics • … facilitating data comparison, exchange and verification PSI 4 http://www.psidev.info/
  • 5. • Proteomics Standards Initiative • Work group of the Human Proteome Organization • Defines community standards for data in proteomics • … facilitating data comparison, exchange and verification PSI 5 • MIAPE: The Minimum Information About a Proteomics Experiment • Data and metadata from proteomics experiments • Data: results • Metadata: data about the data • Where the samples came from • How the analyses were performed
  • 6. • Proteomics Standards Initiative • Work group of the Human Proteome Organization • Defines community standards for data in proteomics • … facilitating data comparison, exchange and verification PSI 6 http://www.psidev.info/
  • 7. 7 PSI-MI Data format Data distribution Control vocabulary Data submission Website Standard format Tools PSICQUIC PSI-MI CV Reporting guideline MIMIx Tools PSI-MI XML PSI-MITAB XML Java API MITAB Java API XMLMakerFlattener XML Validator MIF25_view.xsl MIF25_compact.xsl MIF25_expand.xsl PSI-MI XML files PSI Excel Sheet PSI Web Form Data Servers Registry Clients
  • 8. • Work group of the Proteomics Standards Initiative • Community coordination to ensure deposition of data in public repositories • Concentrating on … • Annotation and representation of published MI data • Accessibility of MI data to the user community PSI - Molecular Interactions Data format Data distribution Control vocabulary MIAPE Reporting guideline PSI-MI XML PSI-MITAB PSICQUIC MIMIx PSI-MI CV http://www.psidev.info/MI Scoring PSISCORE
  • 9. PSI-MI format • Community standard for Molecular Interactions • Jointly developed by major data providers: BIND, CellZome, DIP, GSK, HPRD, Hybrigenics, IntAct, MINT, MIPS, Serono, U. Bielefeld, U. Bordeaux, U. Cambridge, and others • Collecting and combining data from different sources has become easier • Standardized annotation through PSI-MI ontologies • Tools from different organizations can be chained, e.g. IntAct data in Cytoscape. psi-mi/xml25 psi-mi/tab25
  • 10. PSI-MI Control vocabulary • Ontology browser: http://www.ebi.ac.uk/ontology-lookup
  • 11. MIMIx • MIAPE document guideline for molecular interactions • 1. Manuscript information • 2. Experiment • 3. Interaction • 4. Confidence
  • 12. Data distribution: PSICQUIC • Proteomics Standards Initiative Common QUery InterfaCe. • Community effort to standardise the way to access and retrieve data from Molecular Interaction databases. • Widely implemented by independent interaction data resources. • Based on the PSI standard formats (PSI-MI XML and MITAB) • Not limited to protein-protein interactions, also e.g. • Drug-target interactions • Simplified pathway data • A registry listing resources implementing PSICQUIC • Documentation: http://psicquic.googlecode.com
  • 13. PSICQUIC implementation ….…. …..... ….…. …..... PSICQUIC PSICQUIC PSICQUIC Sample Observation error Interaction databases Publications PSICQUIC services Annotation error User PSICQUIC Registry PSICQUIC client
  • 14. Service broker Service consumer Service provider Service Contract ... ... Interact Publish Find Service Oriented Architecture PSI-MI ... ... ... PSICQUIC Registry DAS Clients PSICQUIC Clients Format PSICQUIC sources PSICQUIC sources PSICQUIC sources PSICQUIC implementation • PSICQUIC Server (SOAP/REST web service) • PSICQUIC Registry • PSICQUIC Clients • PSICQUIC view • Cytoscape • Envision2 • …
  • 15. PSICQUIC Registry • 13 sources • 14,665,530 interactions http://www.ebi.ac.uk/Tools/webservices/psicquic/registry/registry?action=STATUS
  • 16. PSICQUIC example: REST queries Bruno Aranda (baranda@ebi.ac.uk) http://mint.bio.uniroma2.it/mint/psicquic/webservices/current/search/query/p53 http://www.ebi.ac.uk/Tools/webservices/psicquic/intact/webservices/current/search/query/p53 http://www.ebi.ac.uk/Tools/webservices/psicquic/chembl/webservices/current/search/interactor/p53 1 2 3
  • 17. PSICQUIC example: MIQL Bruno Aranda (baranda@ebi.ac.uk) • Molecular Interaction Query Language
  • 18. PSICQUIC example: MIQL …/query/species:rat …/query/brca AND rpa1 • Terms • Fields • Operands
  • 23. PSISCORE Scoring algorithm description, provided by scoring server / registry Exemplary visualization of a scoring algorithm with a 0-1 range Scoring algorithms offered by PSISCORE servers
  • 25. IMEx: The International Molecular Exchange Consortium • Group of major public interaction data providers sharing curation effort: DIP, IntAct, MINT, MPact, MatrixDB, MPIDB and BioGRID • Independent molecular interaction resources • Common curation standards for detailed curation • Common data formats (PSI-MI XML, PSI-MITAB, PSICQUIC) • Common accession number space • Coordinated & non-redundant curation • In production mode since February 2010 • Since 3/2009 supported by the European Commission under PSIMEx, contract number FP7-HEALTH-2007-223411, with additional partners Vital-IT, Nature, Wiley, BiaCore (GE), U. Maryland, CSIC, TU Munich, MIPS, SCBIT (Shanghai) Imex.sf.net
  • 26. IntAct • Freely available, open-source database system • Public repository of molecular interactions • Interactions manually curated and reviewed by experts • Interactions derived from literature or direct user submissions • Topic-centric datasets (e.g. Cancer, Chromatin, MSD…) • Analysis tools for interaction data • EBI database (part of the IMEx consortium and the PSI-MI) • Data updated every week: ftp://ftp.ebi.ac.uk/pub/databases/intact • Data formats available: http://www.ebi.ac.uk/intact
  • 29. • Interactions by identification method • ~70% Y2H • ~25% Affinity purification • ~3% Physical data • ~2% Other methods IntAct statistics
  • 30. IntAct: Search and results Export Custom columns Filters More results (PSICQUIC)
  • 32. PSI-MSS PSI-MS PSI-PI Data format Tools Standard format Reporting guideline MIAPE-MS mzML TraML - ProDaC - OpenMS/TOPP - ProteoWizard - Proteios - TPP - X!Tandem - Myrimatch - InSilicoSpectro - NCBI C++ toolkit - Mascot Validation, analysis, exporters, viewers, ... - Phenyx - PEAKS - mzML_Exporter - CompassXport - Insilicos Viewer - Jmzml - Pride Inspector - Pride Converter … Control vocabulary PSI-MS Data format Tools Standard format Reporting guideline MIAPE-MSI mzIdentML mzQuantML - mzIdentML validator - Mascot - OMSSA - Peaks - Phenyx - PLGS - ProCon - ProteinPilot - ProteinScape - SEQUEST Validation, analysis, exporters, viewers, ... - SpectraST - Spectrum Mill - X!Tandem - OpenMS/TOPP - Scaffold - TPP - Mascot Integra - MIAPE MSI exporter - CSV exporter … Tools Data Website Pride Inspector Pride Converter Pride Biomart Pride QProjects PICR OLS
  • 33. • Work group of the Proteomics Standards Initiative • Community coordination to ensure deposition of data in public repositories • Concentrating on … • Annotation and representation of published MS data • Accessibility of MS data to the user community PSI - Mass Spectrometry Standards Individual proteins Peptides Protein mixture Peptide Mass Separation 2D-SDS-PAGE Spot Cutting Digestion Trypsin Mass Spectroscopy MALDI-TOF Database search mzML mzIdentML Protein identification Quantification mzQuantML Protein quantification mzXML mzData analysisXML
  • 34. PSI-MS Controlled vocabulary • Shared by PSI-MSS and PSI-PI • Ontology browser: http://www.ebi.ac.uk/ontology-lookup
  • 37. ProteomeXchange: Enhancing Cooperation of Proteomics Data Repositories • Group of major public Mass Spec data providers • Single point of submission to proteomics repositories • Encourage data exchange • Common data formats (mzML, mzIdentML, mzQuantML) • Common accession number space • Coordinated & non-redundant data • Since 2010 supported by the European Commission
  • 38. Secondary resources Data reprocessing and notification Journals Wiley Proteomics NBT JPR MCP Standards Local data management systems mzQuantML Release 1 Release 2 Release 3 ProHITS MS-Lims ProCon Phenyx OmicsHub Other LIMS Pride Converter Repositories Pride Metadata, Results mzML mzIdentML Peptide Atlas Uniprot NIST Spectrum libraries …… Implemented in Data submission RSS feed Central Dataset Look-up Service MIAPE validation Accession Number/ reviewer login Notification Reprocessing notification Tranche Raw data Peptidome Metadata, Results xref xref Data release / publications Proposal structure
  • 39. http://www.ebi.ac.uk/pride The Proteomics Identifications Database • Centralized, standards compliant, public data repository for proteomics identifications • Open source • Open data • > 100,000,000 spectra • ~ 4,000,000 protein identifications • Detailed annotation of meta-data • Vizcaíno JA, Côté R, Reisinger F, Foster JM, Mueller M, Rameseder J, Hermjakob H, Martens L. A guide to the Proteomics Identifications Database proteomics data repository. Proteomics. 2009 Sep;9(18):4276-83. PMID: 19662629
  • 40. PRIDE data content 40 Release of PRIDE Converter
  • 41. Protein IDs Peptide IDs PRIDE data content
  • 42. PRIDE Website PART_OF Search by • Experiment • Protein id • Ontology
  • 43. PRIDE Website • Results • Peptide IDs • Protein IDs • Mass spectra as peak lists • Metadata - experiment • Analysis
  • 44. BioMart – System Overview ATGCTGTTGTGC ATGCTGGACTGG ATGGCCCGATGG ATGCTGTTGTGC ATGCTGGACTGG ATGGCCCGATGG Source data (MySQL, Oracle, Postgres) DB Mart Bert Overduin
  • 45. 45 PRIDE Biomart 1. Filter 2. Attributes 3. Results http://www.ebi.ac.uk/pride/prideMart.do http://www.ebi.ac.uk/pride/biomart/martservice?query= XML <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE Query> <Query virtualSchemaName = "default" formatter = "TSV" header = "0" uniqueRows = "0" count = "" datasetConfigVersion = "0.6" > <Dataset name = "pride" interface = "default" > <Filter name = "experiment_ac" value = "1632"/> <Attribute name = "submitted_accession" /> </Dataset> </Query> Easy programmatic access!
  • 46. What is OLS? • A unified, single point of query for over 69 ontologies (updated daily) and upwards of 850,000 terms. • A tool that offers online and programmatic access to query ontologies about: • Term names • Synonyms • Relationships • Annotations • Cross-references • Reusable code components to integrate such functionality in other projects http://www.ebi.ac.uk/ontology-lookup/ Richard Cote
  • 48. Why do you need ID mapping • Merging datasets to a common identifier space • Finding all aliases/synonyms for an identifier • (data integration – submissions!) • Mapping from secondary IDs to more recent primary IDs • (data “freshness”) • Preparing data sets for specific tools • Querying in various primary databases • (data format requirements) Richard Cote
  • 49. Protein identifier mapping is hard • The basic problem: the same protein sequence is referred to by multiple accession numbers assigned by multiple databases. • No universal identifier scheme • Redundant databases – multiple identifiers for the same sequence in the same database • Unstable identifiers (e.g. gi numbers) • Obsolete and deleted identifiers (hypothetical proteins) • Different production cycles for major databases • Tools exist, but are limited in their database and species coverage and in their usability and availability. Richard Cote
  • 50. PICR Home Page Submit accessions OR sequences (FASTA) with 500 entry interactive limit (no batch limit) Select output format Select one or many databases to map to in one request Limit search by taxonomy (pessimistic) Choose to return all mappings or only active ones Run search Richard Cote
  • 51. PICR Result Page – simple view Logical xref (hyperlinked) Inactive xref Secondary Identifier Active xref (hyperlinked) Richard Cote
  • 52. Pride inspector • Open mzML and PRIDE XML files • Browse the PRIDE database • Facilitate publication reviews
  • 53. DAS, The Distributed Annotation System The Distributed Annotation System is… • A network of biological data sources • A Service Oriented Architecture (SOA) • RESTful web service • An example of federation • Uniform access to multiple repositories of biological data. • Repositories distributed in different geographical locations. The DAS Protocol is… • An integration platform • A client-server protocol • An agreed standard for web services Andy Jenkinson
  • 54. 54 74 Protein DAS sources! PRIDE DAS 1.6 Pride & Dasty3 Protein client
  • 55. At present, PRIDE is a data repository to support publications - We rely 100% on data submitted by researchers - It is not possible to determine which data are good and which are not - As a result, data in PRIDE cannot be used as supporting evidence for protein existence in UniProt - Data cannot be reused by other resources either… PRIDE-Q: Why?
  • 56. Project funded by the Wellcome Trust (5 years): Added value: High-quality data will go to a new resource: PRIDE-Q Data exports: •Links, DAS track for all PRIDE data •Quality controlled, e.g. “Protein Existence”, Expression Atlas from PRIDE-Q PRIDE-Q * Curation Automated rules, Curator override PRIDE-Q
  • 57. Reactome • Human pathway knowledgebase • Manually curated • Open source, open data • Collaboration between EBI, OICR and NYU • Online since 2003 • Matthews L, et al: Reactome knowledgebase of human biological pathways and processes. Nucleic Acids Res. 2008 Nov 3. http://www.ebi.ac.uk/pride
  • 59. New site! Coming soon … http://reactome.oicr.on.ca Main text Navigation bar
  • 60. The Pathway Browser Species selector Search & Analyze bar Sidebar Pathway Diagram Panel Details Panel (hidden) Thumbnail
  • 61. The Pathway Browser - Pathway Diagrams Boxes are proteins, sets or complexes. Ovals are small molecules. Green boxes are proteins or sets, blue are complexes. Catalyst Input Outputs Compartment Reaction node Transition Binding Dissociation Omitted Uncertain Regulation +ve -ve
  • 62. Pathway Analysis – Overrepresentation ‘Top-level’ Reveal next level P-val, In set/In pathway
  • 63. Species Comparison II Yellow = human/rat Blue = human only Grey = not relevant Black = Complex
  • 64. Expression Analysis II ‘Hot’ = high ‘Cold’ = low
  • 66. <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE Query> <Query virtualSchemaName = "default" formatter = "TSV" header = "0" uniqueRows = "0" count = "" datasetConfigVersion = "0.6" > <Dataset name = "pathway" interface = "default" > <Filter name = "referencepeptidesequence_uniprot_id_list" value = "P25205"/> <Attribute name = "stableidentifier_identifier" /> <Attribute name = "pathway_db_id" /> </Dataset> </Query> BioMart 1. Filter 2. Attributes 3. Results http://www.reactome.org:5555/biomart/martservice?query= XML Easy programmatic access! http://www.reactome.org:5555/biomart/martview
  • 67. Acknowledgments … • EU: • ProDaC (to 03/2009) • ProteomeBinders • BioSapiens • Felics • LipidomicNet • APO-SYS • PSIMEx (since 03/2009) • EMBL • Wellcome Trust • NIH The Funding
  • 68. 68 ?
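
The REST and MIQL examples on slides 16 to 18 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not an official client: the base URL is copied from slide 16 and dates from 2010, so it may no longer resolve; `psicquic_url` and `parse_mitab_row` are hypothetical helper names; and the sample row is invented. The only format fact relied on is that MITAB is tab-separated, with the identifiers of interactors A and B in the first two columns and multiple values in a column separated by `|`.

```python
from urllib.parse import quote

# Base URL as shown on slide 16 (2010); an assumption here, it may be outdated.
INTACT_BASE = (
    "http://www.ebi.ac.uk/Tools/webservices/psicquic/intact/"
    "webservices/current/search"
)


def psicquic_url(base, miql, method="query"):
    """Build a PSICQUIC REST URL; `method` is 'query' or 'interactor'."""
    # Percent-encode spaces so that 'brca AND rpa1' is a valid path segment,
    # but keep ':' readable for field queries such as 'species:rat'.
    return f"{base}/{method}/{quote(miql, safe=':')}"


def parse_mitab_row(row):
    """Return the identifier lists of interactors A and B from one MITAB row."""
    columns = row.rstrip("\n").split("\t")
    return columns[0].split("|"), columns[1].split("|")


if __name__ == "__main__":
    print(psicquic_url(INTACT_BASE, "brca AND rpa1"))
    # Invented two-column row, for illustration only.
    print(parse_mitab_row("uniprotkb:P04637\tuniprotkb:Q00987"))
```

The same helper can point at any service listed in the PSICQUIC Registry by swapping the base URL, which is the main benefit of the common query interface.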

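
The BioMart queries on slides 45 and 66 share one XML shape, so building them programmatically might look like the sketch below. The `Query` attributes, dataset, filter and attribute names are copied from slide 45, and the endpoint is the 2010 URL from that slide, which may no longer be live. The function names `biomart_query` and `martservice_url` are hypothetical.

```python
from urllib.parse import quote
from xml.sax.saxutils import quoteattr

# Endpoint as printed on slide 45 (2010); an assumption, it may be outdated.
PRIDE_MART = "http://www.ebi.ac.uk/pride/biomart/martservice?query="


def biomart_query(dataset, filters, attributes):
    """Build a BioMart query document in the shape shown on the slides."""
    parts = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        "<!DOCTYPE Query>",
        '<Query virtualSchemaName="default" formatter="TSV" header="0" '
        'uniqueRows="0" count="" datasetConfigVersion="0.6">',
        f'<Dataset name={quoteattr(dataset)} interface="default">',
    ]
    # quoteattr escapes the value and adds the surrounding quotes.
    for name, value in filters.items():
        parts.append(f"<Filter name={quoteattr(name)} value={quoteattr(value)}/>")
    for name in attributes:
        parts.append(f"<Attribute name={quoteattr(name)}/>")
    parts += ["</Dataset>", "</Query>"]
    return "".join(parts)


def martservice_url(endpoint, xml):
    """Append the percent-encoded query document to the martservice endpoint."""
    return endpoint + quote(xml, safe="")


if __name__ == "__main__":
    xml = biomart_query("pride", {"experiment_ac": "1632"}, ["submitted_accession"])
    print(martservice_url(PRIDE_MART, xml))
```

For the Reactome query on slide 66, the same call would take the dataset `pathway`, the filter `referencepeptidesequence_uniprot_id_list` and the two attributes listed there.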
Editor's Notes

  1. There are many different approaches to confidence-scoring individual molecular interactions. They can, for instance, be based on structural information, network topology, experimental conditions, similarity of the functional annotations of the interactors, or evolutionary conservation. It is therefore infeasible for a single, central scoring approach to combine all these methods. The obvious solution is decentralization. In a decentralized setting, individual research groups focus on specific scoring approaches; for instance, a group with a focus on protein structures can provide confidence scores for the mutual interaction interfaces.
  2. Each line is a binary interaction evidence reported by a scientific publication. Evidence is grouped by molecule pairs (allowing for subsequent filtering should you need it). Data can be downloaded in standard formats (see table header). Only 30 interactions are shown per page for speedy loading. One can customize the list of columns by clicking on “Change Column Displayed”.
  3. An integration platform for biological data: a way of bringing together data from different providers. Federation unifies data sources that are different from each other.
  4. Species selector: you need to have a pathway selected for this to do anything. The sidebar in the Pathway Browser displays the pathway hierarchy on the Pathways tab. Analyze, Annotate & Upload, i.e. a control panel, contains tools and configuration options. Search: not used in the exercises; takes some getting used to. The Pathway Diagram panel is where diagrams appear when selected from the hierarchy or a search result. Details panel: does what it says and responds to the selected object. N.B. Hidden by default!
  5. Results: explain that the best result is at the top, but that only ‘top-level’ pathways are shown. Colour gives significance; the order is by best-scoring pathway OR subpathway, which explains why the colours sometimes seem out of order. Use twisties to see subpathways, or Open All. The numbers after the name are the p-value, then the number of proteins from the submitted set that match the pathway / the total in the pathway. Matching values twisty: lists the proteins from the dataset that matched the pathway.
  6. Results list all pathways; pick one and it looks a bit like this, with colours overlaid on the pathway diagram. The box shows the result of right-clicking a complex to expand it.
  7. Results form: click View to see a pathway. Colour indicates expression level; refer to the spectrum bar for values. Right-click to zoom into complexes. The Experiment Browser toolbar steps through multiple columns of expression data (if provided).
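
Note 5 reports a p-value next to the number of submitted proteins found in a pathway and the pathway's total size. The slides do not say which statistical test Reactome applies, so the sketch below assumes a one-sided hypergeometric test, a common choice for over-representation analysis; the function name and the example numbers are hypothetical.

```python
from math import comb


def overrepresentation_p(total, in_pathway, submitted, matched):
    """P(X >= matched) when drawing `submitted` proteins without replacement
    from `total` proteins, of which `in_pathway` belong to the pathway.

    Assumption: a hypergeometric model, not confirmed by the slides.
    """
    upper = min(in_pathway, submitted)
    denominator = comb(total, submitted)
    numerator = sum(
        comb(in_pathway, k) * comb(total - in_pathway, submitted - k)
        for k in range(matched, upper + 1)
    )
    return numerator / denominator


if __name__ == "__main__":
    # Made-up numbers: 10 proteins overall, 5 in the pathway,
    # 2 submitted, both of which match the pathway.
    print(overrepresentation_p(total=10, in_pathway=5, submitted=2, matched=2))
```

A small p-value means that the overlap between the submitted set and the pathway is unlikely to arise by chance under this model.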