PRIDE resources and ProteomeXchange
Dr. Juan Antonio Vizcaíno
PRIDE Group Coordinator
Proteomics Services Team
EMBL-EBI
Hinxton, Cambridge, UK
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2015
Hinxton, 10 December 2015
Data resources at EMBL-EBI
• Genes, genomes & variation: RNA Central, Ensembl, Ensembl Genomes, GWAS Catalog, Metagenomics portal, European Nucleotide Archive, European Variation Archive, European Genome-phenome Archive
• Gene, protein & metabolite expression: ArrayExpress, Expression Atlas, MetaboLights, PRIDE
• Protein sequences, families & motifs: InterPro, Pfam, UniProt
• Molecular structures: Protein Data Bank in Europe, Electron Microscopy Data Bank
• Chemical biology: ChEMBL, ChEBI
• Reactions, interactions & pathways: IntAct, Reactome, MetaboLights
• Systems: BioModels, Enzyme Portal, BioSamples
• Literature & ontologies: Europe PubMed Central, Gene Ontology, Experimental Factor Ontology
• PRIDE Archive (in the context of ProteomeXchange
and the PSI standards)
• How to submit data to PRIDE: PRIDE tools
• How to access data in PRIDE Archive
• PRIDE Cluster and PRIDE Proteomes
Overview
ProteomeXchange Consortium
• Goal: Development of a framework to allow
standard data submission and dissemination
pipelines between the main existing proteomics
repositories.
• Includes PeptideAtlas (ISB, Seattle), PRIDE
(Cambridge, UK) and (very recently) MassIVE
(UCSD, San Diego).
• Common identifier space (PXD identifiers)
• Two supported data workflows: MS/MS and SRM.
• Main objective: Make life easier for researchers
http://www.proteomexchange.org
• PRIDE stores mass spectrometry (MS)-
based proteomics data:
• Peptide and protein expression data
(identification and quantification)
• Post-translational modifications
• Mass spectra (raw data and peak
lists)
• Technical and biological metadata
• Any other related information
• Full support for tandem MS approaches
PRIDE (PRoteomics IDEntifications) database
http://www.ebi.ac.uk/pride/archive
Martens et al., Proteomics, 2005
Vizcaíno et al., NAR, 2013
PRIDE Mission
• To archive all types of proteomics mass
spectrometry data for the purpose of supporting
reproducible research, allowing the application of
quality control metrics and enabling the reuse of
these data by other researchers.
• To integrate MS-based data in a protein-centric
manner to provide information on protein variants,
modifications, and expression.
• To provide mass spectrometry based expression
data to the Expression Atlas.
Data content in PRIDE Archive
• Submission driven resource
• PRIDE is split into datasets (groups of assays)
• An assay represents one MS run (in most cases).
• No data reprocessing at present. PRIDE aims to represent
the author’s view on the data
• Supported formats: PRIDE XML and mzIdentML.
• Raw data is also now stored
What is a proteomics publication in 2015?
• Proteomics studies generate potentially large amounts of
data and results.
• Ideally, a proteomics publication needs to:
• Summarize the results of the study
• Provide supporting information for reliability of any
results reported
• Information in a publication:
• Manuscript
• Supplementary material
• Associated data submitted to a public repository
Journal Submission Recommendations
• Journal guidelines recommend submission to proteomics
repositories:
• Proteomics
• Nature Biotechnology
• Nature Methods
• Molecular and Cellular Proteomics
• Funding agencies are enforcing public deposition of data
to maximize the value of the funds provided.
PRIDE: Source of MS proteomics data
• PRIDE Archive already provides or
will soon provide MS proteomics
data to other EMBL-EBI resources
such as UniProt, Ensembl and the
Expression Atlas.
http://www.ebi.ac.uk/pride
ProteomeXchange data workflow
[Figure: ProteomeXchange data workflow. Labels: submitted data (metadata / manuscript, raw data*, results); receiving repositories PRIDE (MS/MS data), MassIVE (MS/MS data) and PASSEL (SRM data); ProteomeCentral; journals, UniProt/neXtProt, PeptideAtlas, GPMDB and other DBs; outputs (researcher’s results, reprocessed results, raw data*, metadata).]
Vizcaíno et al., Nat Biotechnol, 2014
• PRIDE Archive (in the context of ProteomeXchange
and the PSI standards)
• How to submit data to PRIDE: PRIDE tools
• How to access data in PRIDE Archive
• A sneak peek at other PRIDE resources
Overview
Complete vs Partial submissions: processed results
[Figure panels: Complete, Partial]
For complete submissions, the spectra can be linked to the processed identification results, and both can be visualized.
Complete vs Partial submissions: experimental metadata
[Figure panels: Complete, Partial]
General experimental metadata about the projects is similar.
However, assay-level information is less detailed in partial submissions.
How to perform a complete PX submission to PRIDE
• Decide between a complete/partial submission.
• File conversion/export to PRIDE XML or mzIdentML
• File check before submission (PRIDE Inspector)
• Experimental annotation and actual file submission (PX
submission tool)
• Post-submission steps
PX Data workflow for MS/MS data
1. Mass spectrometer output files: raw data (binary files) or
peak list spectra in a standardized format (mzML, mzXML).
2. Result files:
a. Complete submissions: Result files can be converted to
PRIDE XML or the mzIdentML data standard.
b. Partial submissions: For workflows not yet supported by
PRIDE, search engine output files will be stored and
provided in their original form.
3. Metadata: Sufficiently detailed description of sample origin,
workflow, instrumentation, submitter.
4. Other files: Optional files (the list can be extended):
a. QUANT: Quantification related results
b. PEAK: Peak list files
c. GEL: Gel images
d. OTHER: Any other file type
e. FASTA
f. SP_LIBRARY
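As a rough illustration of the file categories on this slide, the sketch below sorts file names into a few PX file types and reports whether a complete submission looks feasible. The extension-to-type mapping and the names `classify` and `submission_type` are assumptions made for this example; this is not the actual logic of the PX submission tool, and the rule used (a RESULT file plus at least one spectra file) is a simplification.

```python
from pathlib import Path

# Hypothetical extension-to-type mapping, for illustration only.
RESULT_EXT = {".mzid", ".xml"}      # assumed: mzIdentML or PRIDE XML
RAW_EXT = {".raw", ".wiff", ".d"}   # assumed: vendor binary formats
PEAK_EXT = {".mgf", ".mzml", ".mzxml", ".pkl", ".ms2", ".dta", ".apl"}


def classify(filename: str) -> str:
    """Map a file name to one of the PX file types named on the slide."""
    ext = Path(filename).suffix.lower()
    if ext in RESULT_EXT:
        return "RESULT"
    if ext in RAW_EXT:
        return "RAW"
    if ext in PEAK_EXT:
        return "PEAK"
    if ext == ".fasta":
        return "FASTA"
    return "OTHER"


def submission_type(filenames) -> str:
    """Return 'complete' if a RESULT file and spectra are present, else 'partial'."""
    types = {classify(name) for name in filenames}
    has_spectra = "RAW" in types or "PEAK" in types
    return "complete" if "RESULT" in types and has_spectra else "partial"


print(submission_type(["run1.raw", "run1.mzid", "run1.mgf"]))
print(submission_type(["run1.raw", "results.dat"]))
```

In this sketch a search engine output file in its original form (here the made-up `results.dat`) falls through to OTHER, so the second call reports a partial submission.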
PRIDE Components: Submission Process (step 1)
[Figure: PRIDE Converter 2, PRIDE Inspector and PX Submission Tool; result formats mzIdentML and PRIDE XML]
Before: only file conversion to PRIDE XML
[Figure: original data files (search output files, spectra files) → ‘RESULT’ file generation by file conversion (PRIDE Converter; other tools, e.g. hEIDI) → final ‘RESULT’ file (PRIDE XML)]
Barsnes et al., Nat Biotechnol, 2009
Cote et al., MCP, 2012
Now: native file export to mzIdentML
[Figure: tools (Mascot, ProteinPilot, Scaffold, PEAKS, MSGF+, others) → ‘RESULT’ file generation by native file export → final ‘RESULT’ file (mzIdentML), together with spectra files (mzML, mzXML, mzData, mgf, pkl, ms2, dta, apl)]
Complete submissions
[Figure: search engines → search engine results + MS files → mzIdentML]
An increasing number of tools support export to mzIdentML 1.1:
- Mascot
- MSGF+
- MyriMatch and related tools from D. Tabb’s lab
- OpenMS
- PEAKS
- PeptideShaker
- ProCon (ProteomeDiscoverer, Sequest)
- Scaffold
- TPP via the idConvert tool (ProteoWizard)
- ProteinPilot (from version 5.0)
- X!Tandem native conversion (Beta, PILEDRIVER)
- Crux
- Others: library for X!Tandem conversion, lab internal pipelines, …
Referenced spectral files need to be submitted as well (all open formats are supported).
Updated list: http://www.psidev.info/tools-implementing-mzIdentML#.
PRIDE Components: Submission Process (step 2)
[Figure: PRIDE Converter 2, PRIDE Inspector and PX Submission Tool; result formats mzIdentML and PRIDE XML]
PRIDE Inspector Toolsuite
Wang et al., Nat. Biotechnology, 2012
Perez-Riverol et al., MCP, 2016, in press
PRIDE Inspector
PRIDE Inspector 2 supports:
- PRIDE XML
- mzIdentML + all types of spectra files
- mzML
- mzTab identification and Quantification
https://github.com/PRIDE-Toolsuite/
PRIDE Inspector 2
https://github.com/PRIDE-Toolsuite/
New visualisation functionality for Protein Groups
PRIDE Components: Submission Process (step 3)
[Figure: PRIDE Converter 2, PRIDE Inspector and PX Submission Tool; result formats mzIdentML and PRIDE XML]
PX submission tool
• Captures the mappings between the different types of files.
• Makes the file upload process straightforward for the submitter (it transfers all the files using Aspera or FTP).
• Command line alternative: using the Aspera file transfer protocol.
http://www.proteomexchange.org/submission
PX submission tool: screenshots
Fast file transfer with Aspera
- Aspera is the default file transfer protocol to PRIDE:
- PX Submission tool
- Command line
- Up to 50X faster than FTP
File transfer speed should
not be a problem!!
Manuscript published detailing the process
Ternent et al., Proteomics, 2014
http://www.proteomexchange.org/submission
Example dataset:
PXD000764
- Title: “Discovery of new CSF biomarkers for meningitis in children”
- 12 runs: 4 controls and 8 infected samples
- Identification and quantification data
PRIDE Archive: Number of submitted datasets in 2015
[Figure: number of datasets submitted to PRIDE Archive per month, as of 1 November 2015; y-axis from 0 to 200]
ProteomeXchange: 2,774 datasets up until 1st September, 2015
Type:
1681 PRIDE partial
813 PRIDE complete
173 MassIVE
84 PeptideAtlas/PASSEL complete
23 Reprocessed
Publicly Accessible:
1372 datasets, 49% of all
90% PRIDE
6% PASSEL
4% MassIVE
Data volume:
Total: ~150 TB
Number of all files: ~400,000
PXD000320-324: ~ 4 TB
PXD002319-26 ~2.4 TB
PXD001471 ~1.6 TB
Datasets/year:
2012: 102
2013: 527
2014: 963
2015: 1182
Top Species studied by at least 20
datasets:
1080 Homo sapiens
335 Mus musculus
110 Saccharomyces cerevisiae
98 Arabidopsis thaliana
75 Rattus norvegicus
58 Escherichia coli
29 Bos taurus
23 Glycine max
20 Caenorhabditis elegans
20 Oryza sativa
~ 500 species in total
Origin:
714 USA
313 Germany
252 United Kingdom
163 China
146 France
121 Netherlands
108 Switzerland
103 Canada
81 Denmark
73 Spain
68 Japan
67 Australia
63 Sweden
57 Belgium
43 Austria
39 India
34 Taiwan
33 Norway
26 Italy
24 Ireland
24 Finland
21 Republic of Korea
20 Brazil
20 Russia
18 Israel
18 Singapore …
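The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below only re-uses numbers printed above (datasets by type, datasets per year, and the count of publicly accessible datasets); the variable names are illustrative.

```python
# Dataset counts copied from the slide (status: 1 September 2015).
by_type = {
    "PRIDE partial": 1681,
    "PRIDE complete": 813,
    "MassIVE": 173,
    "PeptideAtlas/PASSEL complete": 84,
    "Reprocessed": 23,
}
by_year = {2012: 102, 2013: 527, 2014: 963, 2015: 1182}
public = 1372

total = sum(by_type.values())
public_share = 100 * public / total  # percentage of all datasets that are public

print(total, sum(by_year.values()), round(public_share))
```

Both breakdowns add up to the 2,774 datasets in the slide title, and the public share rounds to the 49% quoted.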
Public data release: when does it happen?
• When the author tells us to do it (the authors can do it by
themselves)
• When we find out that a dataset has been published
• We look for PXD identifiers in PubMed abstracts.
• If your PXD identifier is not in the abstract, a paper may have
been published and the data is still private. Let us know!
• A new web form on the PRIDE website facilitates this process
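Looking for PXD identifiers in abstract text can be sketched with a regular expression. This is a minimal illustration, assuming identifiers follow the "PXD" plus six digits form seen in this deck (e.g. PXD000764); the function name and the sample sentence are made up for the example, and this is not the script PRIDE actually runs.

```python
import re

# Assumed identifier shape: "PXD" followed by exactly six digits.
PXD_PATTERN = re.compile(r"\bPXD\d{6}\b")


def find_pxd_identifiers(text):
    """Return the distinct PXD identifiers in text, in order of first appearance."""
    found = []
    for match in PXD_PATTERN.findall(text):
        if match not in found:
            found.append(match)
    return found


# Hypothetical abstract sentence, written for this example.
abstract = "Data are available via ProteomeXchange with identifier PXD000764."
print(find_pxd_identifiers(abstract))
```

An abstract that never mentions the identifier yields an empty list, which is the case where the submitter needs to notify PRIDE.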
Partial submissions can be used to store
other data types
• Everything can be stored, not only MS/MS data: very flexible
mechanism to be able to capture all types of datasets
• PRIDE does not store SRM data (it goes to PASSEL)
• Top down proteomics datasets.
• Mass Spectrometry Imaging datasets.
• Data independent acquisition techniques: e.g. SWATH-MS datasets.
A public repository for mass spectrometry imaging data
Römpp et al., 2015
From original publication [13]:
1. Thermo RAW data / UDP
2. Mirion Software (JLU)
Reconstructed ProteomeXchange data:
1. Thermo RAW data / UDP
2. Convert to imzML
3. Upload to PRIDE (EBI, Cambridge, UK)
4. Download from PRIDE
5. Display in MSiReader
- Vendor-independent data format
- Freely available software (open source)
- ‘open data’: free to reuse
- Anybody can do this!
• PRIDE Archive (in the context of ProteomeXchange
and the PSI standards)
• How to submit data to PRIDE: PRIDE tools
• How to access data in PRIDE Archive
• PRIDE Cluster and PRIDE Proteomes
Overview
Data access to PRIDE Archive
• Look for particular datasets of interest:
• For data reuse: which particular proteins and peptides
(including PTMs) have been detected.
• Data reinterpretation or re-analysis.
• Validation of the experimental results reported.
• Specific use cases for proteomics: spectral libraries,
fragmentation models, SRM transitions,…
ProteomeCentral: Portal for all PX datasets
http://proteomecentral.proteomexchange.org/cgi/GetDataset
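A link to a single dataset page can be built from the portal address above and a PXD identifier. The sketch below assumes the dataset is selected with an `ID` query parameter; that parameter name is not stated on the slide, so treat it as an assumption and check it against the portal. The helper name `dataset_url` is illustrative.

```python
from urllib.parse import urlencode

# Base address taken from the slide.
BASE = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"


def dataset_url(pxd_id: str) -> str:
    """Build a ProteomeCentral link for one dataset (the 'ID' parameter is assumed)."""
    return f"{BASE}?{urlencode({'ID': pxd_id})}"


print(dataset_url("PXD000764"))
```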
RSS feed for public datasets
http://groups.google.com/group/proteomexchange/feed/rss_v2_0_msgs.xml
Ways to access data in PRIDE Archive
• PRIDE web interface
• File repository
• REST web service
• PRIDE Inspector tool
PRIDE Archive web interface
PRIDE Archive web interface (2)
CompOmics Open Source Analysis Pipeline
SearchGUI: Vaudel M, Barsnes H, Berven FS, Sickmann A, Martens L. Proteomics 2011;11(5):996-9.
https://github.com/compomics/searchgui
PeptideShaker: Vaudel M, Burkhart J, Zahedi RP, Berven FS, Sickmann A, Martens L, Barsnes H. Nature Biotechnology 2015;33(1):22-24.
https://github.com/compomics/peptide-shaker
Reshake PRIDE data!
Find the desired PRIDE project …
… inspect the project details …
… and start re-analyzing the data!
• PRIDE Archive (in the context of ProteomeXchange
and the PSI standards)
• How to submit data to PRIDE: PRIDE tools
• How to access data in PRIDE Archive
• PRIDE Cluster and PRIDE Proteomes
Overview
PRIDE resources
[Figure: relationship between the PRIDE resources. Labels: PRIDE Archive (original submissions, reprocessed datasets); basic QC checks for PSMs; aggregation; PRIDE Cluster; PRIDE Proteomes; link to the original evidence for original results]
Sneak peek
• Provide an aggregated and QC-filtered peptide-centric and protein-centric view on PRIDE Archive data.
http://www.ebi.ac.uk/pride/cluster/
http://wwwdev.ebi.ac.uk/pride/proteomes/
PRIDE Cluster - Concept
• Use spectral clustering to reliably group spectra coming
from the same peptide
• Infer reliable identifications by comparing submitted
identifications of spectra within a cluster
• Increases quality through data increase (taking
advantage of the wealth of data in PRIDE).
• Inherently adapts to new (labelling) techniques
Griss et al., Nat Methods, 2013
PRIDE Cluster - Concept
[Figure: originally submitted identified spectra (e.g. NMMAACDPR, PPECPDFDPPR) are grouped into a cluster with a consensus spectrum]
Threshold: At least 10 spectra in a cluster and ratio >70%.
Griss et al., Nat Methods, 2013
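The threshold on this slide can be sketched as a small filter. This is an illustration only, assuming that "ratio" means the fraction of spectra in a cluster that carry the most frequent submitted peptide identification; the function name, parameter names and the example cluster are made up, and the clustering of spectra itself is not shown.

```python
from collections import Counter


def reliable_identification(peptide_ids, min_size=10, min_ratio=0.7):
    """Return the majority peptide if the cluster passes both thresholds, else None.

    peptide_ids: submitted peptide identifications of the spectra in one cluster.
    """
    if len(peptide_ids) < min_size:
        return None  # fewer than 10 spectra in the cluster
    peptide, count = Counter(peptide_ids).most_common(1)[0]
    ratio = count / len(peptide_ids)  # assumed meaning of "ratio"
    return peptide if ratio > min_ratio else None


# Hypothetical cluster built from the two sequences shown on the slide.
cluster = ["NMMAACDPR"] * 8 + ["PPECPDFDPPR"] * 2
print(reliable_identification(cluster))
```

Here 8 of 10 spectra agree (ratio 0.8), so the cluster passes; a cluster with only seven spectra, or one with exactly 70% agreement, would be rejected.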
PRIDE Cluster Home page
http://www.ebi.ac.uk/pride/cluster/#/
PRIDE Cluster: result of searches
http://www.ebi.ac.uk/pride/cluster/#/
A couple of examples …
Examples: one perfect cluster
- 880 PSMs give the same peptide ID
- 4 species
- 28 datasets
- Same instruments
Examples: one perfect cluster (2)
Examples: one perfect cluster (3)
What does that peptide sequence correspond to?
Examples: very good cluster
Examples: very good cluster (2)
PRIDE Cluster – Spectral libraries
http://www.ebi.ac.uk/pride/cluster/#/libraries
PRIDE Proteomes: reusing PRIDE Cluster data
• Condensed and cross-dataset view of PRIDE Archive for
identification data:
• Data filtering of PSMs is performed at the level of the
submitted data.
• PSMs are grouped as peptide sequences.
• The peptide sequences are remapped to a recent version of
UniProtKB (at present UniProtKB “complete proteome”).
• Linked to the original supporting evidence.
• “PRIDE Cluster” used as an extra evidence for the PSMs.
http://wwwdev.ebi.ac.uk/pride/proteomes/
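The step "PSMs are grouped as peptide sequences" can be sketched as below. This is a simplified illustration that keeps a link from each peptide back to its supporting datasets; the data layout, the function name and the second dataset identifier are assumptions, and the filtering and UniProtKB remapping steps are not shown.

```python
from collections import defaultdict


def group_psms(psms):
    """Group PSMs by peptide sequence.

    psms: iterable of (peptide_sequence, dataset_id) pairs.
    Returns {sequence: {"psm_count": int, "datasets": set of dataset ids}}.
    """
    grouped = defaultdict(lambda: {"psm_count": 0, "datasets": set()})
    for sequence, dataset_id in psms:
        grouped[sequence]["psm_count"] += 1
        grouped[sequence]["datasets"].add(dataset_id)  # link to original evidence
    return dict(grouped)


# Hypothetical PSMs; "PXD999999" is a placeholder identifier.
psms = [
    ("NMMAACDPR", "PXD000764"),
    ("NMMAACDPR", "PXD999999"),
    ("PPECPDFDPPR", "PXD000764"),
]
peptides = group_psms(psms)
print(peptides["NMMAACDPR"]["psm_count"], sorted(peptides["NMMAACDPR"]["datasets"]))
```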
PRIDE: Using it for giving reliability to IDs
Link to PRIDE Cluster web
http://wwwdev.ebi.ac.uk/pride/proteomes/
• Main characteristics of PRIDE Archive and
ProteomeXchange
• PX/PRIDE submission workflow for MS/MS data
• PRIDE Inspector
• PX submission tool
• PRIDE/ProteomeXchange has become the de facto
standard for data submission and data availability in
proteomics
• PRIDE Proteomes and PRIDE Cluster: new resources
Conclusions
Do you want to know a bit more…?
http://www.slideshare.net/JuanAntonioVizcaino
Acknowledgements: People
Attila Csordas
Tobias Ternent
Noemi del Toro
Johannes Griss
Yasset Perez-Riverol
Henning Hermjakob
All past team members, especially
Rui Wang, Florian Reisinger and
Jose A. Dianes
All ProteomeXchange partners,
especially Eric Deutsch and Nuno
Bandeira
Acknowledgements: The PRIDE Team and collaborators
Questions?

An overview of the PRIDE ecosystem of resources and computational tools for m...An overview of the PRIDE ecosystem of resources and computational tools for m...
An overview of the PRIDE ecosystem of resources and computational tools for m...Juan Antonio Vizcaino
 
Mining the hidden proteome using hundreds of public proteomics datasets
Mining the hidden proteome using hundreds of public proteomics datasetsMining the hidden proteome using hundreds of public proteomics datasets
Mining the hidden proteome using hundreds of public proteomics datasetsJuan Antonio Vizcaino
 
Proteomics public data resources: enabling "big data" analysis in proteomics
Proteomics public data resources: enabling "big data" analysis in proteomicsProteomics public data resources: enabling "big data" analysis in proteomics
Proteomics public data resources: enabling "big data" analysis in proteomicsJuan Antonio Vizcaino
 
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...Juan Antonio Vizcaino
 
Public proteomics data: a (mostly unexploited) gold mine for computational re...
Public proteomics data: a (mostly unexploited) gold mine for computational re...Public proteomics data: a (mostly unexploited) gold mine for computational re...
Public proteomics data: a (mostly unexploited) gold mine for computational re...Juan Antonio Vizcaino
 
Mass Spectrometry Informatics formats in progress
Mass Spectrometry Informatics formats in progressMass Spectrometry Informatics formats in progress
Mass Spectrometry Informatics formats in progressJuan Antonio Vizcaino
 

Similar to PRIDE-ProteomeXchange (20)

ProteomeXchange_and_PRIDE_Semmeting_2015
ProteomeXchange_and_PRIDE_Semmeting_2015ProteomeXchange_and_PRIDE_Semmeting_2015
ProteomeXchange_and_PRIDE_Semmeting_2015
 
Pride and ProteomeXchange
Pride and ProteomeXchangePride and ProteomeXchange
Pride and ProteomeXchange
 
Proteomics repositories
Proteomics repositoriesProteomics repositories
Proteomics repositories
 
PRIDE and ProteomeXchange: Training webinar
PRIDE and ProteomeXchange: Training webinarPRIDE and ProteomeXchange: Training webinar
PRIDE and ProteomeXchange: Training webinar
 
PRIDE and ProteomeXchange
PRIDE and ProteomeXchangePRIDE and ProteomeXchange
PRIDE and ProteomeXchange
 
Reuse of public proteomics data
Reuse of public proteomics dataReuse of public proteomics data
Reuse of public proteomics data
 
PRIDE and ProteomeXchange: supporting the cultural change in proteomics publi...
PRIDE and ProteomeXchange: supporting the cultural change in proteomics publi...PRIDE and ProteomeXchange: supporting the cultural change in proteomics publi...
PRIDE and ProteomeXchange: supporting the cultural change in proteomics publi...
 
Data volumes in proteomics data resources: PRIDE and ProteomeXchange
Data volumes in proteomics data resources: PRIDE and ProteomeXchangeData volumes in proteomics data resources: PRIDE and ProteomeXchange
Data volumes in proteomics data resources: PRIDE and ProteomeXchange
 
PRIDE resources and ProteomeXchange
PRIDE resources and ProteomeXchangePRIDE resources and ProteomeXchange
PRIDE resources and ProteomeXchange
 
Proteomics repositories
Proteomics repositoriesProteomics repositories
Proteomics repositories
 
Proteomics and the "big data" trend: challenges and new possibilitites (Talk ...
Proteomics and the "big data" trend: challenges and new possibilitites (Talk ...Proteomics and the "big data" trend: challenges and new possibilitites (Talk ...
Proteomics and the "big data" trend: challenges and new possibilitites (Talk ...
 
Human microbiome project
Human microbiome projectHuman microbiome project
Human microbiome project
 
Proteomics data standards
Proteomics data standardsProteomics data standards
Proteomics data standards
 
An overview of the PRIDE ecosystem of resources and computational tools for m...
An overview of the PRIDE ecosystem of resources and computational tools for m...An overview of the PRIDE ecosystem of resources and computational tools for m...
An overview of the PRIDE ecosystem of resources and computational tools for m...
 
Mining the hidden proteome using hundreds of public proteomics datasets
Mining the hidden proteome using hundreds of public proteomics datasetsMining the hidden proteome using hundreds of public proteomics datasets
Mining the hidden proteome using hundreds of public proteomics datasets
 
Proteomics public data resources: enabling "big data" analysis in proteomics
Proteomics public data resources: enabling "big data" analysis in proteomicsProteomics public data resources: enabling "big data" analysis in proteomics
Proteomics public data resources: enabling "big data" analysis in proteomics
 
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
 
Public proteomics data: a (mostly unexploited) gold mine for computational re...
Public proteomics data: a (mostly unexploited) gold mine for computational re...Public proteomics data: a (mostly unexploited) gold mine for computational re...
Public proteomics data: a (mostly unexploited) gold mine for computational re...
 
Proteomics repositories
Proteomics repositoriesProteomics repositories
Proteomics repositories
 
Mass Spectrometry Informatics formats in progress
Mass Spectrometry Informatics formats in progressMass Spectrometry Informatics formats in progress
Mass Spectrometry Informatics formats in progress
 

More from Juan Antonio Vizcaino

Reusing and integrating public proteomics data to improve our knowledge of th...
Reusing and integrating public proteomics data to improve our knowledge of th...Reusing and integrating public proteomics data to improve our knowledge of th...
Reusing and integrating public proteomics data to improve our knowledge of th...Juan Antonio Vizcaino
 
Introduction to the PSI standard data formats
Introduction to the PSI standard data formatsIntroduction to the PSI standard data formats
Introduction to the PSI standard data formatsJuan Antonio Vizcaino
 
Introduction to the Proteomics Bioinformatics Course 2018
Introduction to the Proteomics Bioinformatics Course 2018Introduction to the Proteomics Bioinformatics Course 2018
Introduction to the Proteomics Bioinformatics Course 2018Juan Antonio Vizcaino
 
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...Juan Antonio Vizcaino
 
A proteomics data “gold mine” at your disposal: Now that the data is there, w...
A proteomics data “gold mine” at your disposal: Now that the data is there, w...A proteomics data “gold mine” at your disposal: Now that the data is there, w...
A proteomics data “gold mine” at your disposal: Now that the data is there, w...Juan Antonio Vizcaino
 
The ProteomeXchange Consoritum: 2017 update
The ProteomeXchange Consoritum: 2017 updateThe ProteomeXchange Consoritum: 2017 update
The ProteomeXchange Consoritum: 2017 updateJuan Antonio Vizcaino
 
How to run and maintain a popular biological data repository?
How to run and maintain a popular biological data repository?How to run and maintain a popular biological data repository?
How to run and maintain a popular biological data repository?Juan Antonio Vizcaino
 
Introduction to the Proteomics Bioinformatics Course 2017
Introduction to the Proteomics Bioinformatics Course 2017Introduction to the Proteomics Bioinformatics Course 2017
Introduction to the Proteomics Bioinformatics Course 2017Juan Antonio Vizcaino
 
Is it feasible to identify novel biomarkers by mining public proteomics data?
Is it feasible to identify novel biomarkers by mining public proteomics data?Is it feasible to identify novel biomarkers by mining public proteomics data?
Is it feasible to identify novel biomarkers by mining public proteomics data?Juan Antonio Vizcaino
 
PRIDE and ProteomeXchange: A golden age for working with public proteomics data
PRIDE and ProteomeXchange: A golden age for working with public proteomics dataPRIDE and ProteomeXchange: A golden age for working with public proteomics data
PRIDE and ProteomeXchange: A golden age for working with public proteomics dataJuan Antonio Vizcaino
 
The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...
The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...
The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...Juan Antonio Vizcaino
 

More from Juan Antonio Vizcaino (20)

Reusing and integrating public proteomics data to improve our knowledge of th...
Reusing and integrating public proteomics data to improve our knowledge of th...Reusing and integrating public proteomics data to improve our knowledge of th...
Reusing and integrating public proteomics data to improve our knowledge of th...
 
Introduction to the PSI standard data formats
Introduction to the PSI standard data formatsIntroduction to the PSI standard data formats
Introduction to the PSI standard data formats
 
Reuse of public proteomics data
Reuse of public proteomics dataReuse of public proteomics data
Reuse of public proteomics data
 
Introduction to the Proteomics Bioinformatics Course 2018
Introduction to the Proteomics Bioinformatics Course 2018Introduction to the Proteomics Bioinformatics Course 2018
Introduction to the Proteomics Bioinformatics Course 2018
 
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
 
PSI-Proteome Informatics update
PSI-Proteome Informatics updatePSI-Proteome Informatics update
PSI-Proteome Informatics update
 
ProteomeXchange update
ProteomeXchange updateProteomeXchange update
ProteomeXchange update
 
The ELIXIR Proteomics community
The ELIXIR Proteomics community The ELIXIR Proteomics community
The ELIXIR Proteomics community
 
The ELIXIR Proteomics Community
The ELIXIR Proteomics CommunityThe ELIXIR Proteomics Community
The ELIXIR Proteomics Community
 
A proteomics data “gold mine” at your disposal: Now that the data is there, w...
A proteomics data “gold mine” at your disposal: Now that the data is there, w...A proteomics data “gold mine” at your disposal: Now that the data is there, w...
A proteomics data “gold mine” at your disposal: Now that the data is there, w...
 
The ProteomeXchange Consoritum: 2017 update
The ProteomeXchange Consoritum: 2017 updateThe ProteomeXchange Consoritum: 2017 update
The ProteomeXchange Consoritum: 2017 update
 
How to run and maintain a popular biological data repository?
How to run and maintain a popular biological data repository?How to run and maintain a popular biological data repository?
How to run and maintain a popular biological data repository?
 
Reuse of public proteomics data
Reuse of public proteomics dataReuse of public proteomics data
Reuse of public proteomics data
 
Proteomics repositories
Proteomics repositoriesProteomics repositories
Proteomics repositories
 
Proteomics data standards
Proteomics data standardsProteomics data standards
Proteomics data standards
 
Introduction to the Proteomics Bioinformatics Course 2017
Introduction to the Proteomics Bioinformatics Course 2017Introduction to the Proteomics Bioinformatics Course 2017
Introduction to the Proteomics Bioinformatics Course 2017
 
Is it feasible to identify novel biomarkers by mining public proteomics data?
Is it feasible to identify novel biomarkers by mining public proteomics data?Is it feasible to identify novel biomarkers by mining public proteomics data?
Is it feasible to identify novel biomarkers by mining public proteomics data?
 
PRIDE and ProteomeXchange: A golden age for working with public proteomics data
PRIDE and ProteomeXchange: A golden age for working with public proteomics dataPRIDE and ProteomeXchange: A golden age for working with public proteomics data
PRIDE and ProteomeXchange: A golden age for working with public proteomics data
 
The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...
The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...
The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...
 
ProteomeXchange update 2017
ProteomeXchange update 2017ProteomeXchange update 2017
ProteomeXchange update 2017
 

Recently uploaded

Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsssuserddc89b
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxAleenaTreesaSaji
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCEPRINCE C P
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Nistarini College, Purulia (W.B) India
 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaPraksha3
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
Recombination DNA Technology (Microinjection)
Recombination DNA Technology (Microinjection)Recombination DNA Technology (Microinjection)
Recombination DNA Technology (Microinjection)Jshifa
 
Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)DHURKADEVIBASKAR
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 

Recently uploaded (20)

Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physics
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptx
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
Recombination DNA Technology (Microinjection)
Recombination DNA Technology (Microinjection)Recombination DNA Technology (Microinjection)
Recombination DNA Technology (Microinjection)
 
Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 

PRIDE-ProteomeXchange

  • 1. PRIDE resources and ProteomeXchange Dr. Juan Antonio Vizcaíno PRIDE Group Coordinator Proteomics Services Team EMBL-EBI Hinxton, Cambridge, UK
  • 2. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 Data resources at EMBL-EBI (diagram). Categories: Genes, genomes & variation; Gene, protein & metabolite expression; Protein sequences, families & motifs; Chemical biology; Molecular structures; Reactions, interactions & pathways; Systems; Literature & ontologies. Resources shown: RNA Central, ArrayExpress, Expression Atlas, MetaboLights, PRIDE, InterPro, Pfam, UniProt, ChEMBL, ChEBI, Protein Data Bank in Europe, Electron Microscopy Data Bank, European Nucleotide Archive, European Variation Archive, European Genome-phenome Archive, IntAct, Reactome, BioModels, Enzyme Portal, BioSamples, Ensembl, Ensembl Genomes, GWAS Catalog, Metagenomics portal, Europe PubMed Central, Gene Ontology, Experimental Factor Ontology
  • 3. Overview • PRIDE Archive (in the context of ProteomeXchange and the PSI standards) • How to submit data to PRIDE: PRIDE tools • How to access data in PRIDE Archive • PRIDE Cluster and PRIDE Proteomes
  • 4. Overview • PRIDE Archive (in the context of ProteomeXchange and the PSI standards) • How to submit data to PRIDE: PRIDE tools • How to access data in PRIDE Archive • PRIDE Cluster and PRIDE Proteomes
  • 5. ProteomeXchange Consortium • Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories. • Includes PeptideAtlas (ISB, Seattle), PRIDE (Cambridge, UK) and (very recently) MassIVE (UCSD, San Diego). • Common identifier space (PXD identifiers) • Two supported data workflows: MS/MS and SRM. • Main objective: Make life easier for researchers http://www.proteomexchange.org
  • 6. PRIDE (PRoteomics IDEntifications) database • PRIDE stores mass spectrometry (MS)-based proteomics data: • Peptide and protein expression data (identification and quantification) • Post-translational modifications • Mass spectra (raw data and peak lists) • Technical and biological metadata • Any other related information • Full support for tandem MS approaches http://www.ebi.ac.uk/pride/archive Martens et al., Proteomics, 2005; Vizcaíno et al., NAR, 2013
  • 7. PRIDE Mission • To archive all types of proteomics mass spectrometry data for the purpose of supporting reproducible research, allowing the application of quality control metrics and enabling the reuse of these data by other researchers. • To integrate MS-based data in a protein-centric manner to provide information on protein variants, modifications, and expression. • To provide mass spectrometry-based expression data to the Expression Atlas.
  • 8. PRIDE Mission • To archive all types of proteomics mass spectrometry data for the purpose of supporting reproducible research, allowing the application of quality control metrics and enabling the reuse of these data by other researchers. • To integrate MS-based data in a protein-centric manner to provide information on protein variants, modifications, and expression. • To provide mass spectrometry-based expression data to the Expression Atlas.
  • 9. Data content in PRIDE Archive • Submission-driven resource • PRIDE is split into datasets (groups of assays) • An assay represents one MS run (in most cases). • No data reprocessing at present. PRIDE aims to represent the author’s view of the data • Supported formats: PRIDE XML and mzIdentML. • Raw data is now also stored
  • 10. What is a proteomics publication in 2015? • Proteomics studies generate potentially large amounts of data and results. • Ideally, a proteomics publication needs to: • Summarize the results of the study • Provide supporting information for reliability of any results reported • Information in a publication: • Manuscript • Supplementary material • Associated data submitted to a public repository
  • 11. Journal Submission Recommendations • Journal guidelines recommend submission to proteomics repositories: Proteomics, Nature Biotechnology, Nature Methods, Molecular and Cellular Proteomics • Funding agencies are enforcing public deposition of data to maximize the value of the funds provided.
  • 12. PRIDE: Source of MS proteomics data • PRIDE Archive already provides or will soon provide MS proteomics data to other EMBL-EBI resources such as UniProt, Ensembl and the Expression Atlas. http://www.ebi.ac.uk/pride
  • 13. ProteomeXchange Consortium • Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories. • Includes PeptideAtlas (ISB, Seattle), PRIDE (Cambridge, UK) and (very recently) MassIVE (UCSD, San Diego). • Common identifier space (PXD identifiers) • Two supported data workflows: MS/MS and SRM. • Main objective: Make life easier for researchers http://www.proteomexchange.org
  • 14. ProteomeXchange data workflow (diagram; Vizcaíno et al., Nat Biotechnol, 2014). Diagram labels: Researcher’s results; Raw data*; Metadata / Manuscript; Receiving repositories: PRIDE (MS/MS data), MassIVE (MS/MS data), PASSEL (SRM data), Other DBs; ProteomeCentral (Metadata); Reprocessed results; PeptideAtlas; GPMDB; UniProt/neXtProt; Journals; Other DBs
  • 15. Overview • PRIDE Archive (in the context of ProteomeXchange and the PSI standards) • How to submit data to PRIDE: PRIDE tools • How to access data in PRIDE Archive • A sneak peek at other PRIDE resources
  • 16. ProteomeXchange data workflow (diagram repeated from slide 14; Vizcaíno et al., Nat Biotechnol, 2014)
  • 17. Complete vs Partial submissions: processed results (screenshots: Complete, Partial). For complete submissions, the spectra can be connected to the processed identification results and visualized.
  • 18. Complete vs Partial submissions: experimental metadata (screenshots: Complete, Partial). General experimental metadata about the projects is similar. However, at the assay level, the information in partial submissions is less detailed.
  • 19. How to perform a complete PX submission to PRIDE • Decide between a complete and a partial submission. • File conversion/export to PRIDE XML or mzIdentML. • File check before submission (PRIDE Inspector). • Experimental annotation and actual file submission (PX submission tool). • Post-submission steps.
  • 20. PX Data workflow for MS/MS data (diagram labels: Published, Raw Files, Other files)
      1. Mass spectrometer output files: raw data (binary files) or peak list spectra in a standardized format (mzML, mzXML).
      2. Result files:
         a. Complete submissions: Result files can be converted to PRIDE XML or the mzIdentML data standard.
         b. Partial submissions: For workflows not yet supported by PRIDE, search engine output files will be stored and provided in their original form.
      3. Metadata: Sufficiently detailed description of sample origin, workflow, instrumentation, submitter.
      4. Other files (optional):
         a. QUANT: Quantification related results
         b. PEAK: Peak list files
         c. GEL: Gel images
         d. OTHER: Any other file type
         e. FASTA
         f. SP_LIBRARY
  • 21. PX Data workflow for MS/MS data (repeat of slide 20, with one addition: the list of optional file types can be extended).
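The file categories on slides 20 and 21 can be illustrated with a small sketch that sorts a list of file names into submission categories. The category names QUANT, PEAK, GEL, OTHER, FASTA and SP_LIBRARY come from the slide; the `RAW` and `RESULT` labels, the extension table and the function names are assumptions made only for illustration, not the behaviour of the PX submission tool.

```python
from pathlib import PurePath

# Hypothetical extension-to-category table. The extensions chosen here are
# illustrative assumptions, not rules taken from the slides.
EXTENSION_TO_CATEGORY = {
    ".raw": "RAW",       # vendor binary output (assumed label)
    ".mzid": "RESULT",   # mzIdentML result file (assumed label and extension)
    ".mgf": "PEAK",      # peak list file
    ".fasta": "FASTA",   # sequence database
    ".png": "GEL",       # gel image
}


def categorize(filename):
    """Return the assumed category for one file, defaulting to OTHER."""
    suffix = PurePath(filename).suffix.lower()
    return EXTENSION_TO_CATEGORY.get(suffix, "OTHER")


def group_by_category(filenames):
    """Group file names by category, preserving input order within a group."""
    groups = {}
    for name in filenames:
        groups.setdefault(categorize(name), []).append(name)
    return groups


def has_raw_and_result(groups):
    """Rough check: a complete submission needs both raw and result files."""
    return bool(groups.get("RAW")) and bool(groups.get("RESULT"))


files = ["run1.raw", "run1.mzid", "run1.mgf", "db.fasta", "notes.txt"]
groups = group_by_category(files)
```

A real submission also needs the mapping between result files and the spectra they reference, which this sketch does not attempt to capture.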
  • 22. PRIDE Components: Submission Process (diagram, marker 1; labels: PRIDE Converter 2, PRIDE Inspector, PX Submission Tool, mzIdentML, PRIDE XML)
  • 23. ‘RESULT’ file generation. Before: only file conversion to PRIDE XML. Diagram labels: Original data files (Search output files, Spectra files); File conversion (PRIDE Converter; other tools, e.g. hEIDI); Final ‘RESULT’ file (PRIDE XML). Barsnes et al., Nat Biotechnol, 2009; Cote et al., MCP, 2012
  • 24. ‘RESULT’ file generation. Now: native file export to mzIdentML. Diagram labels: Tools (Mascot, ProteinPilot, Scaffold, PEAKS, MSGF+, others); Native file export; Final ‘RESULT’ file (mzIdentML); Spectra files (mzML, mzXML, mzData, mgf, pkl, ms2, dta, apl).
  • 25. Complete submissions: Search Engine Results + MS files. An increasing number of tools support export to mzIdentML 1.1:
      - Mascot
      - MSGF+
      - MyriMatch and related tools from D. Tabb’s lab
      - OpenMS
      - PEAKS
      - PeptideShaker
      - ProCon (ProteomeDiscoverer, Sequest)
      - Scaffold
      - TPP via the idConvert tool (ProteoWizard)
      - ProteinPilot (from version 5.0)
      - X!Tandem native conversion (Beta, PILEDRIVER)
      - Crux
      - Others: library for X!Tandem conversion, lab internal pipelines, …
      Referenced spectral files need to be submitted as well (all open formats are supported). Updated list: http://www.psidev.info/tools-implementing-mzIdentML#
  • 26. PRIDE Components: Submission Process (diagram, marker 2; labels: PRIDE Converter 2, PRIDE Inspector, PX Submission Tool, mzIdentML, PRIDE XML)
  • 27. PRIDE Inspector Toolsuite. PRIDE Inspector 2 supports: - PRIDE XML - mzIdentML + all types of spectra files - mzML - mzTab identification and quantification. https://github.com/PRIDE-Toolsuite/ Wang et al., Nat. Biotechnology, 2012; Perez-Riverol et al., MCP, 2016, in press
  • 28. PRIDE Inspector 2: new visualisation functionality for protein groups. https://github.com/PRIDE-Toolsuite/
  • 29. PRIDE Components: Submission Process (diagram, marker 3; labels: PRIDE Converter 2, PRIDE Inspector, PX Submission Tool, mzIdentML, PRIDE XML)
  • 30. PX submission tool • Captures the mappings between the different types of files. • Makes the file upload process straightforward for the submitter (it transfers all the files using Aspera or FTP). • Command line alternative: using the Aspera file transfer protocol. http://www.proteomexchange.org/submission (diagram labels: Published, Raw, Other files)
  • 31. PX submission tool: screenshots
  • 32. Fast file transfer with Aspera - Aspera is the default file transfer protocol to PRIDE: - PX Submission tool - Command line - Up to 50X faster than FTP. File transfer speed should not be a problem!
  • 33. Manuscript published detailing the process: Ternent et al., Proteomics, 2014. http://www.proteomexchange.org/submission Example dataset: PXD000764 - Title: “Discovery of new CSF biomarkers for meningitis in children” - 12 runs: 4 controls and 8 infected samples - Identification and quantification data
  • 34. PRIDE Archive: Number of submitted datasets in 2015 (bar chart: number of datasets submitted to PRIDE Archive per month, as of November 1st 2015; y-axis from 0 to 200)
  • 35. ProteomeXchange: 2,774 datasets up until 1st September, 2015
      Type: 1681 PRIDE partial; 813 PRIDE complete; 173 MassIVE; 84 PeptideAtlas/PASSEL complete; 23 Reprocessed
      Publicly accessible: 1372 datasets, 49% of all (90% PRIDE, 6% PASSEL, 4% MassIVE)
      Data volume: total ~150 TB; number of all files ~400,000; PXD000320-324 ~4 TB; PXD002319-26 ~2.4 TB; PXD001471 ~1.6 TB
      Datasets/year: 2012: 102; 2013: 527; 2014: 963; 2015: 1182
      Top species studied by at least 20 datasets (~500 species in total): 1080 Homo sapiens; 335 Mus musculus; 110 Saccharomyces cerevisiae; 98 Arabidopsis thaliana; 75 Rattus norvegicus; 58 Escherichia coli; 29 Bos taurus; 23 Glycine max; 20 Caenorhabditis elegans; 20 Oryza sativa
      Origin: 714 USA; 313 Germany; 252 United Kingdom; 163 China; 146 France; 121 Netherlands; 108 Switzerland; 103 Canada; 81 Denmark; 73 Spain; 68 Japan; 67 Australia; 63 Sweden; 57 Belgium; 43 Austria; 39 India; 34 Taiwan; 33 Norway; 26 Italy; 24 Ireland; 24 Finland; 21 Republic of Korea; 20 Brazil; 20 Russia; 18 Israel; 18 Singapore; …
  • 36. Public data release: when does it happen? • When the author tells us to do it (authors can also do it themselves). • When we find out that a dataset has been published: we look for PXD identifiers in PubMed abstracts. • If your PXD identifier is not in the abstract, the paper may have been published while the data is still private. Let us know! • New web form on the PRIDE website to facilitate the process.
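Looking for PXD identifiers in abstract text can be sketched with a regular expression. This is a minimal illustration, assuming identifiers have the form `PXD` followed by six digits (as in PXD000764); the function name is hypothetical and this is not the PRIDE team's actual pipeline.

```python
import re

# Assumed identifier shape: "PXD" plus exactly six digits, e.g. PXD000764.
PXD_PATTERN = re.compile(r"\bPXD\d{6}\b")


def find_pxd_identifiers(text):
    """Return the distinct PXD identifiers in text, in order of appearance."""
    found = []
    for identifier in PXD_PATTERN.findall(text):
        if identifier not in found:
            found.append(identifier)
    return found


abstract = (
    "The data have been deposited to ProteomeXchange with identifier "
    "PXD000764 (see PXD000764 for details)."
)
identifiers = find_pxd_identifiers(abstract)
```

The sketch also shows why the slide asks authors to get in touch: an identifier that appears only in the full text or supplementary material is invisible to a search over abstracts.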
  • 37. Partial submissions can be used to store other data types • Everything can be stored, not only MS/MS data: a very flexible mechanism that can capture all types of datasets. • PRIDE does not store SRM data (it goes to PASSEL). • Top-down proteomics datasets. • Mass spectrometry imaging datasets. • Data-independent acquisition techniques, e.g. SWATH-MS datasets.
  • 38. A public repository for mass spectrometry imaging data (Römpp et al., 2015). Figure panels C and D. From original publication [13]: 1. Thermo RAW data / UDP; 2. Mirion Software (JLU). Reconstructed ProteomeXchange data: 1. Thermo RAW data / UDP; 2. Convert to imzML; 3. Upload to PRIDE (PRIDE database, European Bioinformatics Institute, Cambridge, UK); 4. Download from PRIDE; 5. Display in MSiReader. - Vendor-independent data format - Freely available software (open source) - ‘Open data’: free to reuse - Anybody can do this!
  • 39. Overview • PRIDE Archive (in the context of ProteomeXchange and the PSI standards) • How to submit data to PRIDE: PRIDE tools • How to access data in PRIDE Archive • PRIDE Cluster and PRIDE Proteomes
  • 40. Data access to PRIDE Archive • Look for particular datasets of interest: • For data reuse: which particular proteins and peptides (including PTMs) have been detected. • Data reinterpretation or re-analysis. • Validation of the experimental results reported. • Specific use cases for proteomics: spectral libraries, fragmentation models, SRM transitions, …
  • 41. ProteomeXchange data workflow (diagram repeated from slide 14; Vizcaíno et al., Nat Biotechnol, 2014)
  • 42. ProteomeCentral: Portal for all PX datasets http://proteomecentral.proteomexchange.org/cgi/GetDataset
  • 43. RSS feed for public datasets http://groups.google.com/group/proteomexchange/feed/rss_v2_0_msgs.xml
  • 44. Ways to access data in PRIDE Archive • PRIDE web interface • File repository • REST web service • PRIDE Inspector tool
  • 45. PRIDE Archive web interface
  • 46. PRIDE Archive web interface (2)
  • 47. CompOmics Open Source Analysis Pipeline. SearchGUI: Vaudel M, Barsnes H, Berven FS, Sickmann A, Martens L: Proteomics 2011;11(5):996-9. https://github.com/compomics/searchgui PeptideShaker: Vaudel M, Burkhart J, Zahedi RP, Berven FS, Sickmann A, Martens L, Barsnes H: Nature Biotechnology 2015;33(1):22-24. https://github.com/compomics/peptide-shaker
  • 48. Reshake PRIDE data! Find the desired PRIDE project … inspect the project details … and start re-analyzing the data!
  • 49. Overview • PRIDE Archive (in the context of ProteomeXchange and the PSI standards) • How to submit data to PRIDE: PRIDE tools • How to access data in PRIDE Archive • PRIDE Cluster and PRIDE Proteomes
  • 50. PRIDE resources
  • 51. PRIDE resources (diagram labels): PRIDE Archive (Original Submissions, Reprocessed datasets); Aggregation; Basic QC checks for PSMs; PRIDE Cluster; PRIDE Proteomes; Link to the original evidence; For original results.
  • 52. Sneak peek • Provide an aggregated and QC-filtered peptide-centric and protein-centric view of PRIDE Archive data. http://www.ebi.ac.uk/pride/cluster/ http://wwwdev.ebi.ac.uk/pride/proteomes/
  • 53. PRIDE Cluster - Concept (Griss et al., Nat Methods, 2013) • Use spectral clustering to reliably group spectra coming from the same peptide. • Infer reliable identifications by comparing the submitted identifications of the spectra within a cluster. • Quality increases as the amount of data increases (taking advantage of the wealth of data in PRIDE). • Inherently adapts to new (labelling) techniques.
  • 54. PRIDE Cluster - Concept (Griss et al., Nat Methods, 2013). Diagram: originally submitted identified spectra (labelled NMMAACDPR, NMMAACDPR, PPECPDFDPPR, NMMAACDPR, PPECPDFDPPR, NMMAACDPR, NMMAACDPR) and a consensus spectrum. Threshold: at least 10 spectra in a cluster and ratio >70%.
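The threshold on slide 54 can be made concrete with a short sketch. It assumes that "ratio" means the fraction of spectra in a cluster that carry the most frequent submitted peptide identification; that reading, the function name and the example clusters are assumptions for illustration, not code from PRIDE Cluster.

```python
from collections import Counter

MIN_CLUSTER_SIZE = 10  # "at least 10 spectra in a cluster" (from the slide)
MIN_RATIO = 0.70       # "ratio >70%" (from the slide)


def assess_cluster(peptide_ids):
    """Return (top_peptide, ratio, reliable) for one cluster.

    peptide_ids holds one submitted peptide identification per spectrum.
    """
    if not peptide_ids:
        return None, 0.0, False
    top_peptide, top_count = Counter(peptide_ids).most_common(1)[0]
    ratio = top_count / len(peptide_ids)
    reliable = len(peptide_ids) >= MIN_CLUSTER_SIZE and ratio > MIN_RATIO
    return top_peptide, ratio, reliable


# Hypothetical cluster: 11 spectra, 8 of them identified as NMMAACDPR.
large = ["NMMAACDPR"] * 8 + ["PPECPDFDPPR"] * 3
# Hypothetical cluster with a similar ratio but only 7 spectra.
small = ["NMMAACDPR"] * 5 + ["PPECPDFDPPR"] * 2

large_result = assess_cluster(large)
small_result = assess_cluster(small)
```

Under these assumptions the first cluster passes both conditions, while the second is rejected on size alone even though its ratio (5/7) is above 70%.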
  • 55. PRIDE Cluster Home page http://www.ebi.ac.uk/pride/cluster/#/
  • 56. PRIDE Cluster: result of searches http://www.ebi.ac.uk/pride/cluster/#/ A couple of examples …
  • 57. Examples: one perfect cluster - 880 PSMs give the same peptide ID - 4 species - 28 datasets - Same instruments
  • 58. Examples: one perfect cluster (2)
  • 59. Examples: one perfect cluster (3) What does that peptide sequence correspond to?
  • 60. Examples: very good cluster
  • 61. Examples: very good cluster (2)
  • 62. Examples: one perfect cluster (3) What does that peptide sequence correspond to?
  • 63. PRIDE Cluster – Spectral libraries http://www.ebi.ac.uk/pride/cluster/#/libraries
  • 64. PRIDE Proteomes: reusing PRIDE Cluster data • Condensed, cross-dataset view of PRIDE Archive for identification data: • Data filtering of PSMs is performed at the level of the submitted data. • PSMs are grouped as peptide sequences. • The peptide sequences are remapped to a recent version of UniProtKB (at present UniProtKB “complete proteome”). • Linked to the original supporting evidence. • “PRIDE Cluster” is used as additional evidence for the PSMs. http://wwwdev.ebi.ac.uk/pride/proteomes/
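The grouping and remapping steps on slide 64 can be sketched as follows. This is a simplified illustration only: the PSM record layout, the accessions `P1` and `P2`, the protein sequences and the use of exact substring matching are all assumptions, and the real PRIDE Proteomes remapping may work differently.

```python
from collections import defaultdict


def group_psms_by_peptide(psms):
    """Group PSM records (dicts with a 'peptide' key) by peptide sequence."""
    grouped = defaultdict(list)
    for psm in psms:
        grouped[psm["peptide"]].append(psm)
    return dict(grouped)


def remap_peptides(peptides, proteins):
    """Map each peptide to the sorted accessions whose sequence contains it.

    proteins maps accession -> protein sequence. Exact substring matching
    is a simplification chosen for this sketch.
    """
    return {
        peptide: sorted(acc for acc, seq in proteins.items() if peptide in seq)
        for peptide in peptides
    }


# Hypothetical PSMs from two datasets; the dataset key keeps the link
# back to the original supporting evidence.
psms = [
    {"peptide": "NMMAACDPR", "dataset": "PXD000001"},
    {"peptide": "NMMAACDPR", "dataset": "PXD000002"},
    {"peptide": "PPECPDFDPPR", "dataset": "PXD000001"},
]
# Made-up protein sequences, for illustration only.
proteins = {
    "P1": "MKNMMAACDPRLLK",
    "P2": "MAPPECPDFDPPRGGNMMAACDPRK",
}

by_peptide = group_psms_by_peptide(psms)
mapping = remap_peptides(by_peptide, proteins)
```

Keeping the dataset accession on every PSM is what allows a condensed peptide view to stay linked to the submissions it was derived from.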
  • 65. PRIDE: Using it for giving reliability to IDs. Link to PRIDE Cluster web. http://wwwdev.ebi.ac.uk/pride/proteomes/
  • 66. Examples: one perfect cluster - 880 PSMs give the same peptide ID - 4 species - 28 datasets - Same instruments
  • 67. Conclusions • Main characteristics of PRIDE Archive and ProteomeXchange • PX/PRIDE submission workflow for MS/MS data • PRIDE Inspector • PX submission tool • PRIDE/ProteomeXchange has become the de facto standard for data submission and data availability in proteomics • PRIDE Proteomes and PRIDE Cluster: new resources
  • 68. Do you want to know a bit more…? http://www.slideshare.net/JuanAntonioVizcaino
  • 69. Acknowledgements: The PRIDE Team and collaborators. People: Attila Csordas, Tobias Ternent, Noemi del Toro, Johannes Griss, Yasset Perez-Riverol, Henning Hermjakob. All past team members, especially Rui Wang, Florian Reisinger and Jose A. Dianes. All ProteomeXchange partners, especially Eric Deutsch and Nuno Bandeira.
  • 70. Questions?