PRIDE and ProteomeXchange: Training webinar
Dr. Juan Antonio Vizcaíno
PRIDE Group Coordinator
Proteomics Services Team
EMBL-EBI
Hinxton, Cambridge, UK
juan@ebi.ac.uk
Juan A. Vizcaíno
juan@ebi.ac.uk
Training webinar
25 November 2015
Welcome - webinar instructions
• GoToTraining works best in Chrome or Internet Explorer; avoid Firefox due to audio issues with Macs.
• To access the full features of GoToTraining, use the desktop version by clicking “switch to desktop version”.
• All microphones will be muted whilst the trainer is speaking.
• If you have a question during the webinar or at the end, please use the chat box at the bottom of the GoToTraining panel.
• Please complete the feedback survey, which will launch at the end of the webinar.
Data resources at EMBL-EBI
• Genes, genomes & variation: European Nucleotide Archive, European Variation Archive, European Genome-phenome Archive, Ensembl, Ensembl Genomes, GWAS Catalog, Metagenomics portal, RNAcentral
• Gene, protein & metabolite expression: ArrayExpress, Expression Atlas, MetaboLights, PRIDE
• Protein sequences, families & motifs: InterPro, Pfam, UniProt
• Molecular structures: Protein Data Bank in Europe, Electron Microscopy Data Bank
• Chemical biology: ChEMBL, ChEBI
• Reactions, interactions & pathways: IntAct, Reactome, MetaboLights
• Systems: BioModels, Enzyme Portal, BioSamples
• Literature & ontologies: Europe PubMed Central, Gene Ontology, Experimental Factor Ontology
Overview
• PRIDE Archive (in the context of ProteomeXchange and the PSI standards)
• How to submit data to PRIDE: PRIDE tools
• How to access data in PRIDE Archive
• A sneak peek at other PRIDE resources
Mass Spectrometry (MS)-based proteomics
• Many different workflows.
• Discovery mode:
  • Bottom-up proteomics
    • Data-dependent acquisition
    • Data-independent acquisition
  • Top-down proteomics
• Targeted mode:
  • SRM (Selected Reaction Monitoring)
MS proteomics: tandem MS (bottom-up)
MS/MS matching identifies
peptides, not proteins.
Proteins are inferred from the
peptide sequences.
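Because MS/MS matching yields peptides, a protein list has to be inferred from them. A common simplification is the parsimony principle: report the smallest set of proteins that explains all observed peptides. The sketch below approximates this with a greedy set-cover heuristic. It is an illustration only, not the inference algorithm used by PRIDE or by any particular search engine, and the function name and the protein and peptide identifiers are hypothetical.

```python
from typing import Dict, List, Set


def infer_proteins(protein_to_peptides: Dict[str, Set[str]],
                   observed: Set[str]) -> List[str]:
    """Greedy parsimony sketch: repeatedly select the protein that explains
    the largest number of still-unexplained observed peptides."""
    # Keep only the peptides of each protein that were actually observed.
    candidates = {prot: peps & observed
                  for prot, peps in protein_to_peptides.items()}
    # Observed peptides that match no protein are simply ignored.
    unexplained: Set[str] = set().union(*candidates.values())
    selected: List[str] = []
    while unexplained:
        # max() keeps the first maximum it sees, so ties go to the
        # alphabetically first protein accession.
        best = max(sorted(candidates),
                   key=lambda prot: len(candidates[prot] & unexplained))
        selected.append(best)
        unexplained -= candidates[best]
    return selected
```

With proteins PROT_A {PEPTIDEA, PEPTIDEB, PEPTIDEC}, PROT_B {PEPTIDEB} and PROT_C {PEPTIDEC, PEPTIDED}, and all four peptides observed, the sketch returns ['PROT_A', 'PROT_C']: PROT_B is dropped because its only peptide is shared. Real tools often report such shared-peptide cases as protein groups instead of discarding them.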
PRIDE (PRoteomics IDEntifications) database
http://www.ebi.ac.uk/pride/archive
• PRIDE stores mass spectrometry (MS)-based proteomics data:
  • Peptide and protein expression data (identification and quantification)
  • Post-translational modifications
  • Mass spectra (raw data and peak lists)
  • Technical and biological metadata
  • Any other related information
• Full support for tandem MS approaches
Martens et al., Proteomics, 2005
Vizcaíno et al., NAR, 2013
PRIDE Mission
• To archive all types of proteomics mass spectrometry data for the purpose of supporting reproducible research, allowing the application of quality control metrics and enabling the reuse of these data by other researchers.
• To integrate MS-based data in a protein-centric manner to provide information on protein variants, modifications and expression.
• To provide mass spectrometry-based expression data to the Expression Atlas.
What is a proteomics publication in 2015?
• Proteomics studies generate potentially large amounts of data and results.
• Ideally, a proteomics publication needs to:
  • Summarize the results of the study
  • Provide supporting information for the reliability of any results reported
• Information in a publication:
  • Manuscript
  • Supplementary material
  • Associated data submitted to a public repository
Journal Submission Recommendations
• The guidelines of several journals recommend submission to proteomics repositories, for example:
  • Proteomics
  • Nature Biotechnology
  • Nature Methods
  • Molecular and Cellular Proteomics
• Funding agencies are enforcing public deposition of data to maximize the value of the funds provided.
PRIDE: Source of MS proteomics data
• PRIDE Archive already provides, or will soon provide, MS proteomics data to other EMBL-EBI resources such as UniProt, Ensembl and the EBI Expression Atlas.
http://www.ebi.ac.uk/pride/archive
Data content in PRIDE Archive
• Dataset submission-driven resource.
• PRIDE is organised in datasets (groups of assays).
• An assay represents one MS run (in most cases).
• No data reprocessing at present: PRIDE aims to represent the author’s view of the data.
• Main supported formats: PRIDE XML and mzIdentML.
• Raw data are now also stored.
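The dataset/assay organisation can be pictured as a simple containment hierarchy. The sketch below is a hypothetical data model for illustration only: the class and field names, the placeholder accession PXD000000 and the file names are assumptions and do not reflect PRIDE Archive's internal schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    """One assay; in most cases this corresponds to a single MS run."""
    assay_id: str       # hypothetical identifier
    ms_run_file: str    # raw or peak list file of the run
    result_file: str    # e.g. a PRIDE XML or mzIdentML file


@dataclass
class Dataset:
    """A submitted dataset: a group of assays under one PXD accession."""
    accession: str
    assays: List[Assay] = field(default_factory=list)

    def ms_runs(self) -> List[str]:
        """Files of the MS runs covered by this dataset, in assay order."""
        return [assay.ms_run_file for assay in self.assays]
```

For example, a dataset holding two assays built from control_01.raw and infected_01.raw reports those two files from ms_runs(), one per MS run.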
ProteomeXchange Consortium
• Goal: development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.
• Includes PeptideAtlas (ISB, Seattle), PRIDE (Cambridge, UK) and (very recently) MassIVE (UCSD, San Diego).
• Common identifier space (PXD identifiers).
• Two supported data workflows: MS/MS and SRM.
• Main objective: make life easier for researchers.
http://www.proteomexchange.org
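Because all ProteomeXchange datasets share one identifier space, accessions can be validated and extracted mechanically. The sketch below assumes accessions follow the pattern of the example dataset PXD000764: the prefix "PXD" followed by exactly six digits. That pattern is inferred from the example, not quoted from the ProteomeXchange specification, and the function names are illustrative.

```python
import re
from typing import List

# Assumption: "PXD" followed by exactly six digits, as in PXD000764.
PXD_ACCESSION = re.compile(r"PXD\d{6}")
PXD_IN_TEXT = re.compile(r"\bPXD\d{6}\b")


def is_pxd_accession(text: str) -> bool:
    """True if the whole string (ignoring surrounding spaces) is an accession."""
    return PXD_ACCESSION.fullmatch(text.strip()) is not None


def find_pxd_accessions(text: str) -> List[str]:
    """All accession-like tokens in free text, in order of appearance."""
    return PXD_IN_TEXT.findall(text)
```

For instance, find_pxd_accessions("available via ProteomeXchange with identifier PXD000764.") returns ['PXD000764']. The word boundaries stop a longer digit run such as PXD0007640 from being reported as a valid accession.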
ProteomeXchange data workflow
Figure (Vizcaíno et al., Nat Biotechnol, 2014). Labelled elements: researcher’s results, raw data* and metadata/manuscript; receiving repositories PRIDE (MS/MS data), MassIVE (MS/MS data) and PASSEL (SRM data); ProteomeCentral; PeptideAtlas, GPMDB and other DBs; reprocessed results; journals; UniProt/neXtProt and other DBs.
Overview, part 2: How to submit data to PRIDE (PRIDE tools)
PX Data workflow for MS/MS data
1. Mass spectrometer output files: raw data (binary files) or peak list spectra in a standardized format (mzML, mzXML).
2. Result files:
   a. Complete submissions: result files can be converted to PRIDE XML or the mzIdentML data standard.
   b. Partial submissions: for workflows not yet supported by PRIDE, search engine output files will be stored and provided in their original form.
3. Metadata: a sufficiently detailed description of sample origin, workflow, instrumentation and submitter.
4. Other files (optional; the list can be extended):
   a. QUANT: quantification-related results
   b. PEAK: peak list files
   c. GEL: gel images
   d. OTHER: any other file type
   e. FASTA
   f. SP_LIBRARY
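When preparing a submission it can help to pre-sort files into these categories. The sketch below guesses a type from the file extension. The mapping is a hypothetical illustration (many formats, such as mzML, could reasonably be classed in more than one way), and the submitter remains responsible for the final type assigned to each file.

```python
from pathlib import PurePath

# Hypothetical extension-to-type mapping, for illustration only.
EXTENSION_TO_PX_TYPE = {
    ".raw": "RAW",          # e.g. Thermo raw files
    ".wiff": "RAW",         # e.g. SCIEX raw files
    ".mzid": "RESULT",      # mzIdentML
    ".mgf": "PEAK",         # Mascot generic format peak list
    ".fasta": "FASTA",
    ".msp": "SP_LIBRARY",   # NIST-style spectral library
}
COMPRESSION_SUFFIXES = (".gz", ".zip")


def guess_px_file_type(filename: str) -> str:
    """Guess a PX file type from the extension, ignoring compression."""
    name = filename.lower()
    for suffix in COMPRESSION_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    # Anything not in the mapping falls back to the OTHER category.
    return EXTENSION_TO_PX_TYPE.get(PurePath(name).suffix, "OTHER")
```

Falling back to OTHER mirrors the catch-all category in the list above, so every file gets some type.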
Complete vs Partial submissions: processed results
Screenshot panels: Complete, Partial
For complete submissions, the spectra can be connected to the processed identification results, and these can be visualized.
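The distinction follows the result-file rule of the MS/MS workflow: a submission is complete when its result files are in PRIDE XML or mzIdentML, and partial otherwise. A minimal sketch of that rule, assuming as a simplification that a single unsupported result file makes the whole submission partial; the function name and format labels are hypothetical.

```python
from typing import Iterable

# Result formats that, per the PX MS/MS workflow, allow a complete submission.
SUPPORTED_RESULT_FORMATS = {"PRIDE XML", "mzIdentML"}


def submission_type(result_formats: Iterable[str]) -> str:
    """Return "COMPLETE" or "PARTIAL" for the given result file formats."""
    formats = list(result_formats)
    # Assumption: every PX MS/MS submission includes at least one result file.
    if not formats:
        raise ValueError("at least one result file is required")
    # Simplifying assumption: every result file must be in a supported format.
    if all(fmt in SUPPORTED_RESULT_FORMATS for fmt in formats):
        return "COMPLETE"
    return "PARTIAL"
```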
PRIDE Components: Submission Process
Components: PRIDE Converter 2, PRIDE Inspector and the PX Submission Tool (formats: mzIdentML, PRIDE XML). Step 1: PRIDE Converter 2.
Before: only file conversion to PRIDE XML
Original data files (search output files, spectra files) → ‘RESULT’ file generation by file conversion (PRIDE Converter; other tools, e.g. hEIDI) → final ‘RESULT’ file (PRIDE XML)
PX Data workflow for MS/MS data
Search engine results + MS files → PRIDE Converter 2 → PRIDE XML
PRIDE Converter 2 (Coté & Griss et al., MCP, 2012)
https://github.com/PRIDE-Toolsuite/pride-converter-2
- ‘Bulk’ conversion possible: command line mode
- Virtually no limit on file size
Other tools available:
- PRIDE Converter
- PLGS (Waters)
- Proteios
- EasyProt
- hEIDI
- OmicsHub (Integromics)
- PeptideShaker (Compomics)
Now: native file export to mzIdentML
Tools (Mascot, ProteinPilot, Scaffold, PEAKS, MSGF+, others) → ‘RESULT’ file generation by native file export → final ‘RESULT’ file (mzIdentML)
Spectra files: mzML, mzXML, mzData, mgf, pkl, ms2, dta, apl
Complete submissions
Search engine results + MS files → mzIdentML
An increasing number of tools support export to mzIdentML 1.1:
- Mascot
- MSGF+
- MyriMatch and related tools from D. Tabb’s lab
- OpenMS
- PEAKS
- PeptideShaker
- ProCon (ProteomeDiscoverer, Sequest)
- Scaffold
- TPP via the idConvert tool (ProteoWizard)
- ProteinPilot (from version 5.0)
- X!Tandem native conversion (Beta, PILEDRIVER)
- Crux
- Others: library for X!Tandem conversion, lab internal pipelines, …
Referenced spectral files need to be submitted as well (all open formats are supported).
Updated list: http://www.psidev.info/tools-implementing-mzIdentML#
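Because an mzIdentML file references its spectra files rather than embedding them, a quick pre-submission check is to list those references and compare them with the files being uploaded. The sketch below reads the `location` attribute of each `SpectraData` element using only the Python standard library. The sample document is a hypothetical, heavily abbreviated fragment (a valid mzIdentML 1.1 file contains many more required elements), and the function names are illustrative.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath
from typing import Iterable, List
from urllib.parse import urlparse

# Hypothetical, abbreviated mzIdentML fragment for illustration.
SAMPLE_MZID = """<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1" id="example" version="1.1.0">
  <DataCollection>
    <Inputs>
      <SearchDatabase id="DB_1" location="uniprot_human.fasta"/>
      <SpectraData id="SD_1" location="file:///data/control_01.mgf"/>
      <SpectraData id="SD_2" location="infected_01.mzML"/>
    </Inputs>
  </DataCollection>
</MzIdentML>"""


def referenced_spectra_files(mzid_xml: str) -> List[str]:
    """File names of all SpectraData locations, in document order."""
    root = ET.fromstring(mzid_xml)
    names = []
    for elem in root.iter():
        # Compare local names so the check works whatever the namespace.
        if elem.tag.rsplit("}", 1)[-1] == "SpectraData":
            location = elem.get("location", "")
            # Locations may be file URIs or plain paths; keep the file name.
            # (Windows-style paths are not handled in this sketch.)
            names.append(PurePosixPath(urlparse(location).path).name)
    return names


def missing_spectra_files(mzid_xml: str, uploaded: Iterable[str]) -> List[str]:
    """Referenced spectra files that are not in the upload list."""
    uploaded_set = set(uploaded)
    return [name for name in referenced_spectra_files(mzid_xml)
            if name not in uploaded_set]
```

Calling missing_spectra_files(SAMPLE_MZID, ["control_01.mgf"]) flags infected_01.mzML as the spectra file still to be added.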
PRIDE Components: Submission Process
Components: PRIDE Converter 2, PRIDE Inspector and the PX Submission Tool (formats: mzIdentML, PRIDE XML). Step 2: PRIDE Inspector.
PRIDE Inspector Toolsuite
https://github.com/PRIDE-Toolsuite/
Wang et al., Nat. Biotechnology, 2012
Perez-Riverol et al., MCP, 2016, in press
PRIDE Inspector Toolsuite supports:
- PRIDE XML
- mzIdentML + all types of spectra files
- mzML
- mzTab (identification and quantification) + all types of spectra files
PRIDE Inspector Toolsuite
https://github.com/PRIDE-Toolsuite/
New visualisation functionality for protein groups
PRIDE Inspector Toolsuite
Private review of files submitted to PRIDE
https://github.com/PRIDE-Toolsuite/
PRIDE Components: Submission Process
Components: PRIDE Converter 2, PRIDE Inspector and the PX Submission Tool (formats: mzIdentML, PRIDE XML). Step 3: PX Submission Tool.
PX submission tool
http://www.proteomexchange.org/submission
• Captures the mappings between the different types of files.
• Makes the file upload process straightforward for the submitter (it transfers all the files using Aspera or FTP).
• Command line alternative: using the Aspera file transfer protocol.
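The file mappings link each result file to the spectra or raw files it was derived from. The sketch below shows the kind of consistency check such mappings make possible; the data structure, function name and messages are hypothetical illustrations, not the PX submission tool's actual validation rules.

```python
from typing import Dict, List, Set


def check_file_mappings(mappings: Dict[str, List[str]],
                        uploaded: Set[str]) -> List[str]:
    """Report problems in hypothetical RESULT-to-spectra/raw file mappings."""
    problems: List[str] = []
    # Sort for a stable, reproducible report order.
    for result_file, linked_files in sorted(mappings.items()):
        if result_file not in uploaded:
            problems.append(f"{result_file}: result file is not being uploaded")
        if not linked_files:
            problems.append(f"{result_file}: no spectra or raw files mapped")
        for name in linked_files:
            if name not in uploaded:
                problems.append(
                    f"{result_file}: mapped file {name} is not being uploaded")
    return problems
```

An empty list means every result file is uploaded and linked to at least one uploaded spectra or raw file.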
PX submission tool: step by step
PX submission tool: screenshots
Manuscript published detailing the process: Ternent et al., Proteomics, 2014
http://www.proteomexchange.org/submission
Example dataset: PXD000764
- Title: “Discovery of new CSF biomarkers for meningitis in children”
- 12 runs: 4 controls and 8 infected samples
- Identification and quantification data
PRIDE Archive submitted datasets up until 1st November 2015
• 1,259 submitted datasets by November 1st
• 923 submitted datasets in 2014
• In the last 6 months, an average of 155 submitted datasets per month
• Size: ~160 TB
PRIDE: Size comparison with other EBI resources (May 2015)
Figure: log-scale chart of data accumulation by resource, in bytes (1E+07 to 1E+17), against date (2004 to 2016), for Metabolites, PRIDE, EGA, ENA (less AE) and AE. Chart generated by Guy Cochrane.
Overview, part 3: How to access data in PRIDE Archive
Data access to PRIDE Archive
• Look for particular datasets of interest:
  • For data reuse: which particular proteins and peptides (including PTMs) have been detected.
  • Data reinterpretation or re-analysis.
  • Validation of the experimental results reported.
  • Specific use cases for proteomics: spectral libraries, fragmentation models, SRM transitions, …
RSS feed for public datasets
http://groups.google.com/group/proteomexchange/feed/rss_v2_0_msgs.xml
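An RSS 2.0 feed is plain XML, so new announcements can be scanned with the standard library. The sketch below assumes, hypothetically, that each announcement's title mentions the dataset's PXD accession (PXD followed by six digits); the sample feed is made up, and the live feed, if still available, would be fetched separately (for example with `urllib.request.urlopen`).

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Assumption: accessions look like PXD000764 ("PXD" + six digits).
PXD_IN_TEXT = re.compile(r"\bPXD\d{6}\b")

# Hypothetical sample feed for illustration.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>ProteomeXchange announcements (sample)</title>
    <item><title>New public dataset PXD000764</title></item>
    <item><title>Unrelated message</title></item>
  </channel>
</rss>"""


def announced_accessions(rss_xml: str) -> List[str]:
    """PXD accessions found in item titles, in feed order."""
    root = ET.fromstring(rss_xml)
    found: List[str] = []
    # Only <item> titles are scanned; the channel title is skipped.
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        found.extend(PXD_IN_TEXT.findall(title))
    return found
```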
Ways to access data in PRIDE Archive
• PRIDE web interface
• File repository
• REST web service
• PRIDE Inspector tool
PRIDE Archive web interface
PRIDE Archive web interface (2)
ProteomeCentral: Portal for all PX datasets
http://proteomecentral.proteomexchange.org/cgi/GetDataset
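Dataset pages on the portal can be linked programmatically. The sketch below assumes that the `GetDataset` script accepts the accession in an `ID` query parameter and that accessions are "PXD" plus six digits; treat both as assumptions to verify against the ProteomeCentral documentation. The function only builds the link and makes no network request.

```python
import re
from urllib.parse import urlencode

GET_DATASET = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"


def proteomecentral_url(accession: str) -> str:
    """Build a ProteomeCentral link for one PXD accession."""
    # Assumption: "PXD" + six digits, as in PXD000764.
    if not re.fullmatch(r"PXD\d{6}", accession):
        raise ValueError(f"not a PXD accession: {accession!r}")
    # Assumption: the script reads the accession from an "ID" parameter.
    return f"{GET_DATASET}?{urlencode({'ID': accession})}"
```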
Overview, part 4: A sneak peek at other PRIDE resources
2015 overview of PRIDE resources
PRIDE Proteomes and PRIDE Cluster
• Provide an aggregated, QC-filtered, peptide-centric and protein-centric view of PRIDE Archive data.
http://www.ebi.ac.uk/pride/cluster/
http://wwwdev.ebi.ac.uk/pride/proteomes/
Conclusions
• Main characteristics of PRIDE Archive and ProteomeXchange (PX)
• PX/PRIDE submission workflow for MS/MS data
• PRIDE Inspector
• PX submission tool
• PRIDE/ProteomeXchange has become the de facto standard for data submission and data availability in proteomics
Do you want to know a bit more…?
http://www.slideshare.net/JuanAntonioVizcaino
Acknowledgements: People
The PRIDE Team:
Attila Csordas
Tobias Ternent
Noemi del Toro
Gerhard Mayer (Bochum, de.NBI)
Johannes Griss
Yasset Perez-Riverol
Henning Hermjakob
Former team members: Rui Wang, Florian Reisinger and Jose A. Dianes
Future webinars:
• 9 December – UniProt website updates
• 16 December – Ensembl release 83
All webinars at 4:00 pm GMT unless stated otherwise.
For details see: http://www.ebi.ac.uk/training/webinars

PRIDE and ProteomeXchange: Training webinar

  • 1.
    PRIDE and ProteomeXchange:Training webinar Dr. Juan Antonio Vizcaíno PRIDE Group Coordinator Proteomics Services Team EMBL-EBI Hinxton, Cambridge, UK juan@ebi.ac.uk
  • 2.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 Welcome - webinar instructions • Gototraining works best in Chrome or IE – avoid Firefox due to audio issues with Macs. • To access the full features of Gototraining, use the desktop version by clicking “switch to desktop version”. • All microphones will be muted whilst the trainer is speaking. • If you have a question during this time or at the end, please use the chat box at the bottom of the gototraining box. • Please complete the feedback survey which will launch at the end of the webinar.
  • 3.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 Data resources at EMBL-EBI Genes, genomes & variation RNA Central Array Express Expression Atlas Metabolights PRIDE InterPro Pfam UniProt ChEMBL ChEBI Molecular structures Protein Data Bank in Europe Electron Microscopy Data Bank European Nucleotide Archive European Variation Archive European Genome-phenome Archive Gene, protein & metabolite expression Protein sequences, families & motifs Chemical biology Reactions, interactions & pathways IntAct Reactome MetaboLights Systems BioModels Enzyme Portal BioSamples Ensembl Ensembl Genomes GWAS Catalog Metagenomics portal Europe PubMed Central Gene Ontology Experimental Factor Ontology Literature & ontologies
  • 4.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 Data resources at EMBL-EBI Genes, genomes & variation RNA Central Array Express Expression Atlas Metabolights PRIDE InterPro Pfam UniProt ChEMBL ChEBI Molecular structures Protein Data Bank in Europe Electron Microscopy Data Bank European Nucleotide Archive European Variation Archive European Genome-phenome Archive Gene, protein & metabolite expression Protein sequences, families & motifs Chemical biology Reactions, interactions & pathways IntAct Reactome MetaboLights Systems BioModels Enzyme Portal BioSamples Ensembl Ensembl Genomes GWAS Catalog Metagenomics portal Europe PubMed Central Gene Ontology Experimental Factor Ontology Literature & ontologies
  • 5.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 • PRIDE Archive (in the context of ProteomeXchange and the PSI standards) • How to submit data to PRIDE: PRIDE tools • How to access data in PRIDE Archive • A sneak peak to other PRIDE resources Overview
  • 6.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 • PRIDE Archive (in the context of ProteomeXchange and the PSI standards) • How to submit data to PRIDE: PRIDE tools • How to access data in PRIDE Archive • A sneak peak to other PRIDE resources Overview
  • 7.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 Mass Spectrometry (MS)-based proteomics 7 • Many different workflows. • Discovery mode: • Bottom-up proteomics • Data dependent acquisition • Data independent acquisition • Top down proteomics • Targeted mode: • SRM (Selected Reaction Monitoring)
  • 8.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 Mass Spectrometry (MS)-based proteomics 8 • Many different workflows. • Discovery mode: • Bottom-up proteomics • Data dependent acquisition • Data independent acquisition • Top down proteomics • Targeted mode: • SRM (Selected Reaction Monitoring)
  • 9.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 MS proteomics: tandem MS (bottom-up) MS/MS matching identifies peptides, not proteins. Proteins are inferred from the peptide sequences.
  • 10.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 • PRIDE stores mass spectrometry (MS)- based proteomics data: • Peptide and protein expression data (identification and quantification) • Post-translational modifications • Mass spectra (raw data and peak lists) • Technical and biological metadata • Any other related information • Full support for tandem MS approaches PRIDE (PRoteomics IDEntifications) database http://www.ebi.ac.uk/pride/archive Martens et al., Proteomics, 2005 Vizcaíno et al., NAR, 2013
  • 11.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 PRIDE Mission • To archive all types of proteomics mass spectrometry data for the purpose of supporting reproducible research, allowing the application of quality control metrics and enabling the reuse of these data by other researchers. • To integrate MS-based data in a protein-centric manner to provide information on protein variants, modifications, and expression. • To provide mass spectrometry based expression data to the Expression Atlas.
  • 12.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 PRIDE Mission • To archive all types of proteomics mass spectrometry data for the purpose of supporting reproducible research, allowing the application of quality control metrics and enabling the reuse of these data by other researchers. • To integrate MS-based data in a protein-centric manner to provide information on protein variants, modifications, and expression. • To provide mass spectrometry based expression data to the Expression Atlas.
  • 13.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 What is a proteomics publication in 2015? • Proteomics studies generate potentially large amounts of data and results. • Ideally, a proteomics publication needs to: • Summarize the results of the study • Provide supporting information for reliability of any results reported • Information in a publication: • Manuscript • Supplementary material • Associated data submitted to a public repository
  • 14.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 Journal Submission Recommendations • Journal guidelines recommend submission to proteomics repositories:  Proteomics  Nature Biotechnology  Nature Methods  Molecular and Cellular Proteomics • Funding agencies are enforcing public deposition of data to maximize the value of the funds provided.
  • 15.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 PRIDE: Source of MS proteomics data • PRIDE Archive already provides or will soon provide MS proteomics data to other EMBL-EBI resources such as UniProt, Ensembl and the EBI Expression Atlas. http://www.ebi.ac.uk/pride/archive
  • 16.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 Data content in PRIDE Archive • Dataset submission driven resource. • PRIDE is organised in datasets (group of assays). • An assay represents one MS run (in most cases). • No data reprocessing at present. PRIDE aims to represent the author’s view on the data. • Main supported formats: PRIDE XML and mzIdentML. • Raw data is also now stored.
  • 17.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 ProteomeXchange Consortium • Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories. • Includes PeptideAtlas (ISB, Seattle), PRIDE (Cambridge, UK) and (very recently) MassIVE (UCSD, San Diego). • Common identifier space (PXD identifiers) • Two supported data workflows: MS/MS and SRM. • Main objective: Make life easier for researchers http://www.proteomexchange.org
  • 18.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 ProteomeCentral Metadata / Manuscript Raw Data* Results Journals UniProt/ neXtProt Peptide Atlas Other DBs Receiving repositories PASSEL (SRM data) PRIDE (MS/MS data) Other DBs GPMDB Researcher’s results Reprocessed results Raw data* Metadata MassIVE (MS/MS data) Vizcaíno et al., Nat Biotechnol, 2014 ProteomeXchange data workflow
  • 19.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 • PRIDE Archive (in the context of ProteomeXchange and the PSI standards) • How to submit data to PRIDE: PRIDE tools • How to access data in PRIDE Archive • A sneak peak to other PRIDE resources Overview
  • 20.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 ProteomeCentral Metadata / Manuscript Raw Data* Results Journals UniProt/ neXtProt Peptide Atlas Other DBs Receiving repositories PASSEL (SRM data) PRIDE (MS/MS data) Other DBs GPMDB Researcher’s results Reprocessed results Raw data* Metadata MassIVE (MS/MS data) Vizcaíno et al., Nat Biotechnol, 2014 ProteomeXchange data workflow
  • 21.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 PX Data workflow for MS/MS data 1. Mass spectrometer output files: raw data (binary files) or peak list spectra in a standardized format (mzML, mzXML). 2. Result files: a. Complete submissions: Result files can be converted to PRIDE XML or the mzIdentML data standard. b. Partial submissions: For workflows not yet supported by PRIDE, search engine output files will be stored and provided in their original form. 3. Metadata: Sufficiently detailed description of sample origin, workflow, instrumentation, submitter. 4. Other files: Optional files: a. QUANT: Quantification related results e. FASTA b. PEAK: Peak list files f. SP_LIBRARY c. GEL: Gel images d. OTHER: Any other file type Published Raw Files Other files
  • 22.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 Complete Partial Complete vs Partial submissions: processed results For complete submissions, it is possible to connect the spectra with the identification processed results and they can be visualized.
  • 23.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 PX Data workflow for MS/MS data 1. Mass spectrometer output files: raw data (binary files) or peak list spectra in a standardized format (mzML, mzXML). 2. Result files: a. Complete submissions: Result files can be converted to PRIDE XML or the mzIdentML data standard. b. Partial submissions: For workflows not yet supported by PRIDE, search engine output files will be stored and provided in their original form. 3. Metadata: Sufficiently detailed description of sample origin, workflow, instrumentation, submitter. 4. Other files: Optional files (the list can be extended): a. QUANT: Quantification related results e. FASTA b. PEAK: Peak list files f. SP_LIBRARY c. GEL: Gel images d. OTHER: Any other file type Published Raw Files Other files
  • 24.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 PRIDE Components: Submission Process PRIDE Converter 2 PRIDE Inspector PX Submission Tool mzIdentML PRIDE XML 1
  • 25.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 Search output files Spectra files Original data files ‘RESULT’ file generation Final ‘RESULT’ file PRIDE XML ‘RESULT’ Before: only file conversion to PRIDE XML File conversion PRIDE Converter Other tools, e.g. hEIDI
  • 26.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 PX Data workflow for MS/MS data Search Engine Results + MS files PRIDE Converter 2 PRIDE XML Coté & Griss et al., MCP, 2012 Other tools available: - PRIDE Converter - PLGS (Waters) - Proteios - EasyProt - hEIDI - OmicsHub (Integromics) - PeptideShaker (Compomics) PRIDE Converter 2 https://github.com/PRIDE-Toolsuite/pride-converter-2 - ‘Bulk’ conversion possible: Command Line mode - Virtually no limit in file sizes.
  • 27.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 Tools ‘RESULT’ file generation Final ‘RESULT’ file mzIdentML ‘RESULT’ Now: native file export to mzIdentML Spectra files (mzML, mzXML, mzData, mgf, pkl, ms2, dta, apl) Mascot ProteinPilo t Scaffold PEAKS MSGF+ Others Native File export
  • 28.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 Complete submissions Search Engine Results + MS files Search engines mzIdentML - Mascot - MSGF+ - MyriMatch and related tools from D. Tabb’s lab - OpenMS - PEAKS - PeptideShaker - ProCon (ProteomeDiscoverer, Sequest) - Scaffold - TPP via the idConvert tool (ProteoWizard) - ProteinPilot (from version 5.0) - X!Tandem native conversion (Beta, PILEDRIVER) - Others: library for X!Tandem conversion, lab internal pipelines, … - Crux An increasing number of tools support export to mzIdentML 1.1 - Referenced spectral files need to be submitted as well (all open formats are supported). Updated list: http://www.psidev.info/tools-implementing- mzIdentML#.
  • 29.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 PRIDE Components: Submission Process PRIDE Converter 2 PRIDE Inspector PX Submission Tool mzIdentML PRIDE XML 2
  • 30.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 PRIDE Inspector Toolsuite Wang et al., Nat. Biotechnology, 2012 Perez-Riverol et al., MCP, 2016, in press PRIDE Inspector PRIDE Inspector Toolsuite supports: - PRIDE XML - mzIdentML + all types of spectra files - mzML - mzTab identification and Quantification + all types of spectra files https://github.com/PRIDE-Toolsuite/
  • 31.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 PRIDE Inspector Toolsuite https://github.com/PRIDE-Toolsuite/ New visualisation functionality for Protein Groups PRIDE Inspector Toolsuite
  • 32.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 PRIDE Inspector Toolsuite PRIDE Inspector Toolsuite Private review of files submitted to PRIDE https://github.com/PRIDE-Toolsuite/
  • 33.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 PRIDE Components: Submission Process PRIDE Converter 2 PRIDE Inspector PX Submission Tool mzIdentML PRIDE XML 3
  • 34.
    Juan A. Vizcaíno juan@ebi.ac.uk Trainingwebinar 25 November 2015 • Capture the mappings between the different types of files. • Make the file upload process straightforward to the submitter (It transfers all the files using Aspera or FTP). PX submission tool Published Raw Other files http://www.proteomexchange.org/submission PX submission tool • Command line alternative: Using the Aspera file transfer protocol.
  • 35.
    PX submission tool: step by step
  • 36.
    PX submission tool: screenshots
  • 37.
    Manuscript published detailing the process: Ternent et al., Proteomics, 2014
    http://www.proteomexchange.org/submission
    Example dataset: PXD000764
    - Title: “Discovery of new CSF biomarkers for meningitis in children”
    - 12 runs: 4 controls and 8 infected samples
    - Identification and quantification data
  • 38.
    PRIDE Archive submitted datasets up until 1 November 2015
    • 1,259 submitted datasets by November 1st
    • 923 submitted datasets in 2014
    • In the last 6 months, 155 submitted datasets per month
    • Size: ~160 TB
  • 39.
    PRIDE: Size comparison with other EBI resources (May 2015)
    [Chart: Data accumulation by resource, 2004 to 2016; x-axis: date; y-axis: bytes (log scale); series: Metabolites, PRIDE, EGA, ENA (less AE), AE. Chart generated by Guy Cochrane.]
  • 40.
    Overview
    • PRIDE Archive (in the context of ProteomeXchange and the PSI standards)
    • How to submit data to PRIDE: PRIDE tools
    • How to access data in PRIDE Archive
    • A sneak peek at other PRIDE resources
  • 41.
    Data access to PRIDE Archive
    • Look for particular datasets of interest:
      - For data reuse: which particular proteins and peptides (including PTMs) have been detected.
      - Data reinterpretation or re-analysis.
      - Validation of the experimental results reported.
      - Specific use cases for proteomics: spectral libraries, fragmentation models, SRM transitions, …
  • 42.
    RSS feed for public datasets: http://groups.google.com/group/proteomexchange/feed/rss_v2_0_msgs.xml
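The feed can be read programmatically to track newly public datasets. This is a minimal sketch, assuming the feed is standard RSS 2.0 (`<item>` elements with `<title>` and `<link>`) and that announcements mention accessions of the form `PXD` plus six digits, as in PXD000764; the feed address is the one from the slide (2015) and may no longer be served.

```python
import re
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "http://groups.google.com/group/proteomexchange/feed/rss_v2_0_msgs.xml"
# ProteomeXchange accessions such as PXD000764 (assumed: "PXD" + six digits).
PXD_RE = re.compile(r"\bPXD\d{6}\b")


def parse_feed(xml_bytes):
    """Return (title, link, accessions) for each RSS 2.0 <item>."""
    root = ET.fromstring(xml_bytes)
    entries = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        entries.append((title, link, PXD_RE.findall(title)))
    return entries


def fetch_feed(url=FEED_URL, timeout=30):
    """Download the feed and parse it (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_feed(resp.read())
```

Keeping parsing separate from downloading makes it easy to test `parse_feed` on a saved copy of the feed.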
  • 43.
    Ways to access data in PRIDE Archive
    • PRIDE web interface
    • File repository
    • REST web service
    • PRIDE Inspector tool
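For scripted access through the REST web service, a client only needs to build a URL and decode the JSON reply. In this sketch the base URL `https://www.ebi.ac.uk/pride/ws/archive` and the `project/{accession}` path are assumptions (the endpoint layout has changed over time, so check the current PRIDE web service documentation), and `project_url`/`get_project` are hypothetical names.

```python
import json
import re
import urllib.request

# Assumed base URL of the PRIDE Archive REST web service; verify against
# the current documentation, as endpoint paths have changed over time.
PRIDE_WS = "https://www.ebi.ac.uk/pride/ws/archive"
ACCESSION_RE = re.compile(r"PXD\d{6}")


def project_url(accession, base=PRIDE_WS):
    """Build the (assumed) URL of a project record."""
    if not ACCESSION_RE.fullmatch(accession):
        raise ValueError(f"not a PX accession: {accession!r}")
    return f"{base}/project/{accession}"


def get_project(accession, base=PRIDE_WS, timeout=30):
    """Fetch a project record and decode it as JSON (needs network access)."""
    req = urllib.request.Request(project_url(accession, base),
                                 headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Passing `base` explicitly lets the same helper target a different API version without code changes.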
  • 44.
    PRIDE Archive web interface
  • 45.
    PRIDE Archive web interface (2)
  • 46.
    ProteomeXchange data workflow (Vizcaíno et al., Nat Biotechnol, 2014)
    [Diagram components: ProteomeCentral; receiving repositories PRIDE (MS/MS data), MassIVE (MS/MS data), PASSEL (SRM data) and other DBs; researcher's results, reprocessed results, raw data*, metadata/manuscript; journals; UniProt/neXtProt, PeptideAtlas, GPMDB and other DBs.]
  • 47.
    ProteomeCentral: portal for all PX datasets
    http://proteomecentral.proteomexchange.org/cgi/GetDataset
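Because ProteomeCentral lists every PX dataset whichever repository received it, a single link pattern serves PRIDE, MassIVE and PASSEL accessions alike. This sketch assumes the `GetDataset` page accepts the accession in an `ID` query parameter; `dataset_url` is a hypothetical helper that also normalises stray whitespace and lower-case input.

```python
import re
from urllib.parse import urlencode

GET_DATASET = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"


def dataset_url(accession):
    """Build a ProteomeCentral link (the ID parameter name is an assumption)."""
    acc = accession.strip().upper()  # tolerate " pxd000764 " style input
    if not re.fullmatch(r"PXD\d{6}", acc):
        raise ValueError(f"not a PX accession: {accession!r}")
    return f"{GET_DATASET}?{urlencode({'ID': acc})}"
```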
  • 48.
    Overview
    • PRIDE Archive (in the context of ProteomeXchange and the PSI standards)
    • How to submit data to PRIDE: PRIDE tools
    • How to access data in PRIDE Archive
    • A sneak peek at other PRIDE resources
  • 49.
    2015 overview of PRIDE resources
  • 50.
    PRIDE Proteomes and PRIDE Cluster
    • Provide an aggregated, QC-filtered, peptide-centric and protein-centric view of PRIDE Archive data.
    http://www.ebi.ac.uk/pride/cluster/
    http://wwwdev.ebi.ac.uk/pride/proteomes/
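As a toy illustration of what an aggregated, peptide-centric view means, the sketch below collapses identifications from many datasets onto each peptide sequence and applies a simple filter: keep peptides reported in at least `min_datasets` datasets. This is an assumed stand-in for a QC step, not the PRIDE Cluster or PRIDE Proteomes algorithm, and the input tuple layout is hypothetical.

```python
from collections import defaultdict


def peptide_centric_view(psms, min_datasets=2):
    """Aggregate (dataset, peptide, protein) tuples into a per-peptide view.

    Peptides seen in fewer than `min_datasets` datasets are dropped
    (a simple illustrative filter, not PRIDE's QC criteria).
    """
    datasets = defaultdict(set)
    proteins = defaultdict(set)
    for dataset, peptide, protein in psms:
        datasets[peptide].add(dataset)
        proteins[peptide].add(protein)
    return {
        pep: {"datasets": sorted(ds), "proteins": sorted(proteins[pep])}
        for pep, ds in datasets.items()
        if len(ds) >= min_datasets
    }
```

A protein-centric view follows the same pattern with the protein accession as the key.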
  • 51.
    Conclusions
    • Main characteristics of PRIDE Archive and ProteomeXchange (PX)
    • PX/PRIDE submission workflow for MS/MS data
      - PRIDE Inspector
      - PX submission tool
    • PRIDE/ProteomeXchange has become the de facto standard for data submission and data availability in proteomics
  • 52.
    Do you want to know a bit more…? http://www.slideshare.net/JuanAntonioVizcaino
  • 53.
    Acknowledgements: The PRIDE Team
    People: Attila Csordas, Tobias Ternent, Noemi del Toro, Gerhard Mayer (Bochum, de.NBI), Johannes Griss, Yasset Perez-Riverol, Henning Hermjakob
    Former team members: Rui Wang, Florian Reisinger and Jose A. Dianes
  • 54.
    Future webinars:
    • 9 December – UniProt website updates
    • 16 December – Ensembl release 83
    All webinars at 4:00 pm GMT unless stated otherwise.
    For details see: http://www.ebi.ac.uk/training/webinars