PRIDE and ProteomeXchange: Training webinar

Dr. Juan Antonio Vizcaíno
PRIDE Group Coordinator
Proteomics Services Team
EMBL-EBI
Hinxton, Cambridge, UK
juan@ebi.ac.uk

Juan A. Vizcaíno
juan@ebi.ac.uk
Training webinar
25 November 2015
Welcome - webinar instructions
• GoToTraining works best in Chrome or Internet Explorer; avoid Firefox due to audio issues with Macs.
• To access the full features of GoToTraining, use the desktop version by clicking “switch to desktop version”.
• All microphones will be muted whilst the trainer is speaking.
• If you have a question during the talk or at the end, please use the chat box at the bottom of the GoToTraining window.
• Please complete the feedback survey, which will launch at the end of the webinar.

Data resources at EMBL-EBI
• Literature & ontologies: Europe PubMed Central, Gene Ontology, Experimental Factor Ontology
• Genes, genomes & variation: European Nucleotide Archive, European Variation Archive, European Genome-phenome Archive, Ensembl, Ensembl Genomes, GWAS Catalog, Metagenomics portal
• Gene, protein & metabolite expression: RNAcentral, ArrayExpress, Expression Atlas, MetaboLights, PRIDE
• Protein sequences, families & motifs: InterPro, Pfam, UniProt
• Molecular structures: Protein Data Bank in Europe, Electron Microscopy Data Bank
• Chemical biology: ChEMBL, ChEBI
• Reactions, interactions & pathways: IntAct, Reactome, MetaboLights
• Systems: BioModels, Enzyme Portal, BioSamples

Overview
• PRIDE Archive (in the context of ProteomeXchange and the PSI standards)
• How to submit data to PRIDE: PRIDE tools
• How to access data in PRIDE Archive
• A sneak peek at other PRIDE resources

Mass Spectrometry (MS)-based proteomics
• Many different workflows.
• Discovery mode:
  • Bottom-up proteomics
    • Data-dependent acquisition
    • Data-independent acquisition
  • Top-down proteomics
• Targeted mode:
  • SRM (Selected Reaction Monitoring)

MS proteomics: tandem MS (bottom-up)
MS/MS spectrum matching identifies peptides, not proteins. Proteins are inferred from the identified peptide sequences.
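Because proteins are inferred rather than observed directly, peptides shared between several proteins make the protein list ambiguous. The sketch below illustrates one common simplification, greedy parsimony: report the smallest set of proteins that explains all identified peptides. It is a minimal illustration under stated assumptions, not the method used by PRIDE or by any particular search engine; the input mapping (peptide sequence to candidate protein accessions) and the function name infer_proteins are hypothetical.

```python
from collections import defaultdict


def infer_proteins(peptide_to_proteins):
    """Greedy parsimony: a small protein list that explains every identified peptide.

    peptide_to_proteins: dict mapping a peptide sequence to the set of protein
    accessions whose sequence contains it (hypothetical input format).
    """
    # Invert the mapping: protein -> peptides it could explain.
    protein_to_peptides = defaultdict(set)
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            protein_to_peptides[protein].add(peptide)

    unexplained = set(peptide_to_proteins)
    inferred = []
    while unexplained:
        # Pick the protein explaining most of the remaining peptides;
        # sorting first makes ties resolve alphabetically (max keeps the first maximum).
        best = max(sorted(protein_to_peptides),
                   key=lambda p: len(protein_to_peptides[p] & unexplained),
                   default=None)
        if best is None or not (protein_to_peptides[best] & unexplained):
            break  # remaining peptides map to no protein
        inferred.append(best)
        unexplained -= protein_to_peptides[best]
    return inferred
```

For example, with PEPA shared by P1 and P2, PEPB unique to P1 and PEPC unique to P3, the sketch reports P1 and P3; P2 is dropped because its only peptide is already explained by P1. Real tools usually report such cases as protein groups instead of silently discarding P2.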

PRIDE (PRoteomics IDEntifications) database
http://www.ebi.ac.uk/pride/archive
• PRIDE stores mass spectrometry (MS)-based proteomics data:
  • Peptide and protein expression data (identification and quantification)
  • Post-translational modifications
  • Mass spectra (raw data and peak lists)
  • Technical and biological metadata
  • Any other related information
• Full support for tandem MS approaches
Martens et al., Proteomics, 2005
Vizcaíno et al., NAR, 2013

PRIDE Mission
• To archive all types of proteomics mass spectrometry data in order to support reproducible research, allow the application of quality control metrics and enable the reuse of these data by other researchers.
• To integrate MS-based data in a protein-centric manner, providing information on protein variants, modifications and expression.
• To provide mass spectrometry-based expression data to the Expression Atlas.

What is a proteomics publication in 2015?
• Proteomics studies can generate large amounts of data and results.
• Ideally, a proteomics publication needs to:
  • Summarize the results of the study
  • Provide supporting information for the reliability of any reported results
• Information in a publication:
  • Manuscript
  • Supplementary material
  • Associated data submitted to a public repository

Journal Submission Recommendations
• The guidelines of several journals recommend submission to proteomics repositories:
  • Proteomics
  • Nature Biotechnology
  • Nature Methods
  • Molecular and Cellular Proteomics
• Funding agencies are enforcing public deposition of data to maximize the value of the funds provided.

PRIDE: Source of MS proteomics data
• PRIDE Archive already provides, or will soon provide, MS proteomics data to other EMBL-EBI resources such as UniProt, Ensembl and the EBI Expression Atlas.
http://www.ebi.ac.uk/pride/archive

Data content in PRIDE Archive
• Dataset submission-driven resource.
• PRIDE is organised in datasets (groups of assays).
• An assay represents one MS run (in most cases).
• No data reprocessing at present: PRIDE aims to represent the authors' view of the data.
• Main supported formats: PRIDE XML and mzIdentML.
• Raw data are now also stored.

ProteomeXchange Consortium
• Goal: development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.
• Includes PeptideAtlas (ISB, Seattle), PRIDE (Cambridge, UK) and (very recently) MassIVE (UCSD, San Diego).
• Common identifier space (PXD identifiers).
• Two supported data workflows: MS/MS and SRM.
• Main objective: make life easier for researchers.
http://www.proteomexchange.org
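Since all ProteomeXchange datasets share one identifier space, a script that handles PX data can validate accessions before using them. The following is a minimal sketch assuming that accessions are "PXD" followed by six digits (as in the example dataset PXD000764) and that the ProteomeCentral GetDataset page accepts the accession via an ID query parameter; both the pattern and the parameter name are assumptions, as are the helper names is_px_accession and proteomecentral_link.

```python
import re

# Assumption: PX dataset accessions look like "PXD" + six digits, e.g. PXD000764.
PXD_PATTERN = re.compile(r"PXD\d{6}")

# ProteomeCentral dataset page; the "ID" query parameter is an assumption.
PROTEOMECENTRAL_URL = "http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID={}"


def is_px_accession(text):
    """True if text (ignoring surrounding spaces and case) looks like a PXD accession."""
    return PXD_PATTERN.fullmatch(text.strip().upper()) is not None


def proteomecentral_link(accession):
    """Normalise an accession and build a (hypothetical) ProteomeCentral link for it."""
    accession = accession.strip().upper()
    if not is_px_accession(accession):
        raise ValueError(f"not a PX accession: {accession!r}")
    return PROTEOMECENTRAL_URL.format(accession)
```

Normalising case and whitespace first means user input such as " pxd000764 " still produces a canonical link, while malformed identifiers fail early with a clear error.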

ProteomeXchange data workflow
[Figure labels: ProteomeCentral; receiving repositories: PRIDE (MS/MS data), MassIVE (MS/MS data), PASSEL (SRM data); data types: metadata/manuscript, raw data*, results, researcher's results, reprocessed results; other resources: journals, UniProt/neXtProt, PeptideAtlas, GPMDB, other DBs.]
Vizcaíno et al., Nat Biotechnol, 2014

PX Data workflow for MS/MS data
1. Mass spectrometer output files: raw data (binary files) or peak list spectra in a standardized format (mzML, mzXML).
2. Result files:
a. Complete submissions: result files can be converted to PRIDE XML or to the mzIdentML data standard.
b. Partial submissions: for workflows not yet supported by PRIDE, search engine output files will be stored and provided in their original form.
3. Metadata: a sufficiently detailed description of sample origin, workflow, instrumentation and submitter.
4. Other files: optional files (the list can be extended):
a. QUANT: quantification-related results
b. PEAK: peak list files
c. GEL: gel images
d. OTHER: any other file type
e. FASTA: sequence database files
f. SP_LIBRARY: spectral library files
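The four groups of files above can be read as a pre-submission checklist. The minimal Python sketch below encodes it, assuming the submitter has already labelled each file with one of the type names from the slide, and treating RAW and RESULT files plus metadata as mandatory (points 1 to 3) and the remaining types as optional. The function check_submission is hypothetical and is not the validation performed by the PX submission tool.

```python
# Type labels from the slide; which ones are mandatory is an assumption (points 1-3).
REQUIRED_TYPES = {"RAW", "RESULT"}
OPTIONAL_TYPES = {"QUANT", "PEAK", "GEL", "OTHER", "FASTA", "SP_LIBRARY"}


def check_submission(files, has_metadata):
    """Return a list of problems for a hypothetical submission (empty list = looks complete).

    files: list of (file_name, file_type) pairs.
    has_metadata: whether sample origin, workflow, instrumentation and submitter are described.
    """
    problems = []
    types_present = {file_type for _, file_type in files}
    # Mandatory file types that are absent from the submission.
    for missing in sorted(REQUIRED_TYPES - types_present):
        problems.append(f"missing {missing} file(s)")
    # File types that are neither mandatory nor in the optional list.
    for name, file_type in files:
        if file_type not in REQUIRED_TYPES | OPTIONAL_TYPES:
            problems.append(f"{name}: unknown file type {file_type!r}")
    if not has_metadata:
        problems.append("missing metadata")
    return problems
```

Returning a list of problems, rather than stopping at the first one, lets a submitter fix everything in one pass.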

Complete vs Partial submissions: processed results
[Figure panels: Complete | Partial]
For complete submissions, the spectra can be linked to the processed identification results, which can then be visualised.
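Whether a submission is complete or partial depends on the format of its result files: PRIDE XML or mzIdentML make it complete, while other search engine outputs make it partial. The minimal sketch below expresses that rule. It assumes file formats can be recognised from name suffixes (".mzid" for mzIdentML and a hypothetical ".pride.xml" convention for PRIDE XML), and it does not check that the spectra referenced by the results are also submitted. The function submission_type is hypothetical.

```python
from pathlib import PurePath

# Assumption: formats are recognised by suffix; ".pride.xml" is a hypothetical naming convention.
COMPLETE_RESULT_SUFFIXES = (".mzid", ".pride.xml")


def submission_type(result_files):
    """Classify a submission as COMPLETE or PARTIAL from its RESULT file names."""
    if not result_files:
        raise ValueError("at least one RESULT file is required")
    names = [PurePath(f).name.lower() for f in result_files]
    # Complete only if every result file is in a format PRIDE can process.
    if all(name.endswith(COMPLETE_RESULT_SUFFIXES) for name in names):
        return "COMPLETE"
    return "PARTIAL"
```

A single unsupported result file (for example a native Mascot .dat file) is enough to make the whole submission partial under this rule.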

PRIDE Components: Submission Process
[Figure: submission process with PRIDE Converter 2, mzIdentML, PRIDE XML, PRIDE Inspector and PX Submission Tool; step 1 highlighted]

Before: only file conversion to PRIDE XML
[Figure: original data files (search output files, spectra files) → ‘RESULT’ file generation by file conversion (PRIDE Converter; other tools, e.g. hEIDI) → final ‘RESULT’ file (PRIDE XML)]

PRIDE Converter 2
https://github.com/PRIDE-Toolsuite/pride-converter-2
[Figure: search engine results + MS files → PRIDE Converter 2 → PRIDE XML]
Coté & Griss et al., MCP, 2012
- ‘Bulk’ conversion is possible in command-line mode.
- Virtually no limit on file size.
Other tools available:
- PRIDE Converter
- PLGS (Waters)
- Proteios
- EasyProt
- hEIDI
- OmicsHub (Integromics)
- PeptideShaker (Compomics)

Now: native file export to mzIdentML
[Figure: tools (Mascot, ProteinPilot, Scaffold, PEAKS, MSGF+, others) → native file export → final ‘RESULT’ file (mzIdentML), plus spectra files (mzML, mzXML, mzData, mgf, pkl, ms2, dta, apl)]

Complete submissions: mzIdentML
[Figure: search engine results + MS files → search engines → mzIdentML]
An increasing number of tools support export to mzIdentML 1.1:
- Mascot
- MSGF+
- MyriMatch and related tools from D. Tabb’s lab
- OpenMS
- PEAKS
- PeptideShaker
- ProCon (ProteomeDiscoverer, Sequest)
- Scaffold
- TPP via the idConvert tool (ProteoWizard)
- ProteinPilot (from version 5.0)
- X!Tandem native conversion (Beta, PILEDRIVER)
- Crux
- Others: library for X!Tandem conversion, lab internal pipelines, …
Referenced spectral files need to be submitted as well (all open formats are supported).
Updated list: http://www.psidev.info/tools-implementing-mzIdentML
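Since the spectra referenced by an mzIdentML file must be submitted too, it helps to list those references before uploading. In mzIdentML, each referenced spectra file is declared by a SpectraData element (inside Inputs) whose location attribute points to the file. The minimal sketch below collects those locations with the Python standard library and reports referenced files missing from the submission. It compares bare file names only, which is a simplifying assumption, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote


def referenced_spectra_files(source):
    """Collect file names of the spectra files referenced by an mzIdentML document.

    source: a path or binary file object.
    """
    names = set()
    for _, elem in ET.iterparse(source):  # default: 'end' events only
        # Compare the local tag name so the XML namespace does not matter.
        if elem.tag.rsplit("}", 1)[-1] == "SpectraData":
            location = elem.get("location", "")
            if location:
                # Keep the last path component of file:// URIs or Windows/POSIX paths.
                names.add(unquote(location.replace("\\", "/").rsplit("/", 1)[-1]))
        elem.clear()  # drop element content we no longer need
    return names


def missing_spectra_files(source, submitted_names):
    """Referenced spectra file names that are not in the submitted file list."""
    return sorted(referenced_spectra_files(source) - set(submitted_names))
```

Streaming with iterparse avoids holding a large mzIdentML document in memory at once, which matters for big search results.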

[Figure: submission process (PRIDE Converter 2, mzIdentML, PRIDE XML); step 2 highlighted]

PRIDE Inspector Toolsuite
https://github.com/PRIDE-Toolsuite/
Wang et al., Nat Biotechnol, 2012
Perez-Riverol et al., MCP, 2016, in press
PRIDE Inspector Toolsuite supports:
- PRIDE XML
- mzIdentML + all types of spectra files
- mzML
- mzTab identification and quantification + all types of spectra files

New visualisation functionality for protein groups

Private review of files submitted to PRIDE

[Figure: submission process (PRIDE Converter 2, mzIdentML, PRIDE XML); step 3 highlighted]

PX submission tool
http://www.proteomexchange.org/submission
[Figure: Published, Raw and Other files → PX submission tool]
• Captures the mappings between the different types of files.
• Makes the file upload process straightforward for the submitter (it transfers all the files using Aspera or FTP).
• Command-line alternative: using the Aspera file transfer protocol.
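Capturing file mappings means recording, for each result file, which spectra files it refers to. A quick consistency check before a large upload can catch mistakes. The minimal sketch below assumes the mappings are held as a plain dictionary (RESULT file name to mapped spectra file names) and that the upload selection is a set of file names; both representations and the function check_file_mappings are hypothetical and do not reflect the PX submission tool's internal format.

```python
def check_file_mappings(uploaded, mappings):
    """Check hypothetical RESULT -> spectra file mappings against the upload selection.

    uploaded: set of file names selected for upload.
    mappings: dict mapping each RESULT file name to the spectra file names it references.
    Returns a sorted list of problems (empty if the mappings look consistent).
    """
    problems = []
    for result_file, spectra_files in mappings.items():
        if result_file not in uploaded:
            problems.append(f"{result_file}: RESULT file not selected for upload")
        if not spectra_files:
            problems.append(f"{result_file}: no spectra files mapped")
        # Every mapped spectra file must also be uploaded.
        for spectra_file in spectra_files:
            if spectra_file not in uploaded:
                problems.append(f"{result_file}: mapped file {spectra_file} not selected for upload")
    return sorted(problems)
```

Running such a check locally is cheap, whereas discovering a missing file after transferring many gigabytes is not.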

PX submission tool: step by step

PX submission tool: screenshots

Manuscript published detailing the process: Ternent et al., Proteomics, 2014
http://www.proteomexchange.org/submission
Example dataset: PXD000764
- Title: “Discovery of new CSF biomarkers for meningitis in children”
- 12 runs: 4 controls and 8 infected samples
- Identification and quantification data

PRIDE Archive: datasets submitted up to 1 November 2015
• 1,259 datasets submitted by 1 November
• 923 datasets submitted in 2014
• In the last 6 months: on average 155 datasets submitted per month
• Size: ~160 TB

PRIDE: Size comparison with other EBI resources (May 2015)
[Chart: “Data accumulation by resource”, bytes on a log scale (1E+07 to 1E+17) against date (2004 to 2016) for Metabolites, PRIDE, EGA, ENA (less AE) and AE. Chart generated by Guy Cochrane.]

Data access to PRIDE Archive
• Look for particular datasets of interest:
  • For data reuse: which particular proteins and peptides (including PTMs) have been detected.
  • Data reinterpretation or re-analysis.
  • Validation of the experimental results reported.
  • Specific use cases for proteomics: spectral libraries, fragmentation models, SRM transitions, …

RSS feed for public datasets
http://groups.google.com/group/proteomexchange/feed/rss_v2_0_msgs.xml

Ways to access data in PRIDE Archive
• PRIDE web interface
• File repository
• REST web service
• PRIDE Inspector tool
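For the file repository route, scripts typically need the location of a dataset's files. The minimal sketch below assumes that public PRIDE Archive datasets are laid out on the FTP server as ftp://ftp.pride.ebi.ac.uk/pride/data/archive/<publication year>/<publication month>/<accession>/; this layout, and the helper name dataset_ftp_url, are assumptions to verify against the current PRIDE documentation.

```python
import re
from datetime import date

# Assumption: <base>/<publication year>/<two-digit month>/<accession>/
FTP_BASE = "ftp://ftp.pride.ebi.ac.uk/pride/data/archive"


def dataset_ftp_url(accession: str, publication_date: date) -> str:
    """Build the assumed FTP directory URL of a public PX dataset in PRIDE Archive."""
    if re.fullmatch(r"PXD\d{6}", accession) is None:
        raise ValueError(f"not a PX accession: {accession!r}")
    return f"{FTP_BASE}/{publication_date.year:04d}/{publication_date.month:02d}/{accession}/"
```

With network access, the standard-library ftplib module could then list that directory; that step is not shown here.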

PRIDE Archive web interface

PRIDE Archive web interface (2)

ProteomeCentral: Portal for all PX datasets
http://proteomecentral.proteomexchange.org/cgi/GetDataset

2015 overview of PRIDE resources

PRIDE Proteomes and PRIDE Cluster
• Provide an aggregated, quality-control-filtered, peptide-centric and protein-centric view of PRIDE Archive data.
http://www.ebi.ac.uk/pride/cluster/
http://wwwdev.ebi.ac.uk/pride/proteomes/

Conclusions
• Main characteristics of PRIDE Archive and ProteomeXchange (PX)
• PX/PRIDE submission workflow for MS/MS data
• PRIDE Inspector
• PX submission tool
• PRIDE/ProteomeXchange has become the de facto standard for data submission and data availability in proteomics

Do you want to know a bit more…?
http://www.slideshare.net/JuanAntonioVizcaino

Acknowledgements: The PRIDE Team
Attila Csordas
Tobias Ternent
Noemi del Toro
Gerhard Mayer (Bochum, de.NBI)
Johannes Griss
Yasset Perez-Riverol
Henning Hermjakob
Former team members: Rui Wang, Florian Reisinger and Jose A. Dianes

Future webinars:
• 9 December – UniProt website updates
• 16 December – Ensembl release 83
All webinars at 4:00 pm GMT unless otherwise stated.
For details see: http://www.ebi.ac.uk/training/webinars
