PRIDE resources and ProteomeXchange
1. PRIDE resources and ProteomeXchange
Dr. Juan Antonio Vizcaíno
EMBL-EBI
Hinxton, Cambridge, UK
2. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2018
Hinxton, 19 July 2018
Data resources at EMBL-EBI
• Genes, genomes & variation: Ensembl, Ensembl Genomes, GWAS Catalog, Metagenomics portal, European Nucleotide Archive, European Variation Archive, European Genome-phenome Archive
• Gene & protein expression: ArrayExpress, Expression Atlas, PRIDE
• Protein sequences, families & motifs: InterPro, Pfam, UniProt
• Molecular structures: Protein Data Bank in Europe, Electron Microscopy Data Bank
• Chemical biology: ChEMBL, ChEBI
• Reactions, interactions & pathways: IntAct, Reactome, MetaboLights
• Systems: BioModels, Enzyme Portal, BioSamples
• Literature & ontologies: Europe PubMed Central, Gene Ontology, Experimental Factor Ontology
• Cross-domain resources
3. Juan A. Vizcaíno
Overview
• PRIDE Archive (in the context of ProteomeXchange and the PSI standards)
• How to submit data to PRIDE: PRIDE tools
• How to access data in PRIDE Archive
4. Juan A. Vizcaíno
PRIDE (PRoteomics IDEntifications) database
http://www.ebi.ac.uk/pride/archive
• PRIDE stores mass spectrometry (MS)-based proteomics data:
  • Peptide and protein expression data (identification and quantification)
  • Post-translational modifications
  • Mass spectra (raw data and peak lists)
  • Technical and biological metadata
  • Any other related information
• Full support for tandem MS approaches
• Any type of data can be stored
• An ELIXIR core resource since July 2017
Martens et al., Proteomics, 2005
Vizcaíno et al., NAR, 2016
5. Juan A. Vizcaíno
ProteomeXchange: a global, distributed proteomics database
http://www.proteomexchange.org
• Framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.
• Member repositories: PRIDE (MS/MS data), MassIVE (MS/MS data), jPOST (MS/MS data), iProX (MS/MS data) and PASSEL (SRM data)
• Mandatory data deposition: raw data (Raw), identification/quantification results (ID/Q) and metadata (Meta)
Vizcaíno et al., Nat Biotechnol, 2014
Deutsch et al., NAR, 2017
6. Juan A. Vizcaíno
PRIDE data submissions and data growth
• Datasets submitted per month: May 2018 (320 datasets) was again a record month.
• Datasets submitted per year: > 2,400 datasets submitted in 2017.
• PRIDE contains >85% of all ProteomeXchange datasets.
• Dataset PXD010000 was reached on June 1st.
7. Juan A. Vizcaíno
Data content in PRIDE Archive
• Data submission-driven resource
• PRIDE is organised in datasets (groups of assays)
• An assay represents one MS run (in most cases).
• PRIDE aims to store the author’s original view of the data
• No data reprocessing at present (but this will change).
• Supported formats for the results: mzIdentML and mzTab (and PRIDE XML, historically).
8. Juan A. Vizcaíno
ProteomeXchange data workflow
[Figure: research groups submit DATASETS (researcher’s results, raw data, metadata) to the receiving repositories: PRIDE, MassIVE, jPOST and iProX for MS/MS data (as complete submissions) and any other workflow (mainly partial submissions), and PASSEL for SRM data. Other labels in the figure: ProteomeCentral (metadata / manuscript, raw data, results), journals, and reanalysis of datasets by PeptideAtlas and MassIVE (reprocessed results).]
9. Juan A. Vizcaíno
ProteomeXchange data workflow
[Figure: extended view of the ProteomeXchange data workflow. Receiving repositories: PRIDE, MassIVE, jPOST, iProX (MS/MS data as complete submissions, any other workflow mainly as partial submissions) and PASSEL (SRM data). The DATASETS are additionally reused by UniProt/neXtProt, PeptideAtlas, GPMDB, proteomicsDB, MassIVE and other DBs (reanalysis of datasets, reprocessed results), and by OmicsDI (integration with other omics datasets). Other labels: ProteomeCentral (metadata / manuscript, raw data, results), journals, research groups.]
12. Juan A. Vizcaíno
Complete vs Partial submissions: processed results
[Screenshots: a complete and a partial submission]
For complete submissions, the spectra can be linked to the processed identification results, and both can be visualised.
13. Juan A. Vizcaíno
Complete vs Partial submissions: experimental metadata
[Screenshots: a complete and a partial submission]
General experimental metadata about the projects is similar. However, the assay-level information in partial submissions is less detailed.
14. Juan A. Vizcaíno
How to perform a complete PX submission to PRIDE
• Decide between a complete and a partial submission
• File conversion/export to mzIdentML (or mzTab; PRIDE XML is no longer recommended)
• File check before submission (PRIDE Inspector)
• Experimental annotation and actual file submission (PX submission tool)
• Post-submission steps
16. Juan A. Vizcaíno
PX Data workflow for MS/MS data
1. Mass spectrometer output files: raw data (binary files) or peak list spectra in a standardized format (mzML, mzXML).
2. Result files:
   a. Complete submissions: Result files can be converted to the mzIdentML (or mzTab) data standards.
   b. Partial submissions: For workflows not yet supported by PRIDE, search engine output files will be stored and provided in their original form.
3. Metadata: Sufficiently detailed description of sample origin, workflow, instrumentation, submitter.
4. Other files: Optional files (the list can be extended):
   a. QUANT: Quantification related results
   b. PEAK: Peak list files
   c. GEL: Gel images
   d. OTHER: Any other file type
   e. FASTA
   f. SP_LIBRARY
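As a rough illustration of the file categories listed above, the following Python sketch groups file names by category and guesses whether a set of files would form a complete or a partial submission. The extension map, the RAW and RESULT labels and both function names are assumptions made for this example; the PX submission tool applies its own rules.

```python
from pathlib import Path

# Hypothetical extension-to-category map, for illustration only.
# QUANT, GEL and SP_LIBRARY files have no fixed extension, so they are
# not guessed here and would have to be labelled by hand.
EXTENSION_TO_CATEGORY = {
    ".raw": "RAW",
    ".mzid": "RESULT",
    ".mztab": "RESULT",
    ".mgf": "PEAK",
    ".fasta": "FASTA",
}


def categorise(file_names):
    """Group file names by assumed category; unknown extensions go to OTHER."""
    groups = {}
    for name in file_names:
        suffix = Path(name).suffix.lower()
        category = EXTENSION_TO_CATEGORY.get(suffix, "OTHER")
        groups.setdefault(category, []).append(name)
    return groups


def submission_type(file_names):
    """Return 'complete' if a standard result file is present, else 'partial'."""
    return "complete" if "RESULT" in categorise(file_names) else "partial"
```

For example, `submission_type(["run1.raw", "run1.mzid", "run1.mgf"])` evaluates to `"complete"`, while a file set containing only raw data and native search engine output is classified as `"partial"`.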
17. Juan A. Vizcaíno
PRIDE Components: Data Submission Process
[Figure: mzIdentML (mzTab), PRIDE Inspector, PX Submission Tool]
In addition to PRIDE Archive, the PRIDE team develops and maintains different tools and software libraries to facilitate the handling and visualisation of MS proteomics data and the submission process.
18. Juan A. Vizcaíno
Native file export to mzIdentML
• Tools: Mascot, ProteinPilot, Scaffold, PEAKS, MSGF+, others
• ‘RESULT’ file generation: native file export
• Final ‘RESULT’ file: mzIdentML, together with the spectra files (mzML, mzXML, mzData, mgf, pkl, etc.)
19. Juan A. Vizcaíno
An increasing number of tools support export to mzIdentML 1.1
[Figure: complete submissions consist of search engine results (mzIdentML) plus MS files]
- Mascot
- MSGF+
- MyriMatch and related tools from D. Tabb’s lab
- OpenMS
- PEAKS
- PeptideShaker
- ProCon (ProteomeDiscoverer, Sequest)
- Scaffold
- TPP via the idConvert tool (ProteoWizard)
- ProteinPilot (from version 5.0)
- X!Tandem native conversion (Beta, PILEDRIVER)
- Crux
- Others: library for X!Tandem conversion, lab internal pipelines, …
- Work in progress: ProteomeDiscoverer (Thermo), PLGS
Referenced spectral files need to be submitted as well (all open formats are supported).
Updated list: http://www.psidev.info/tools-implementing-mzIdentML#.
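Because the referenced spectral files have to be submitted together with the mzIdentML file, it can help to list them before starting a submission. The sketch below reads the `location` attribute of the `SpectraData` elements defined by mzIdentML; the function names and the tiny sample document (which is not schema-valid) are assumptions made for illustration, and PRIDE Inspector remains the intended way to check files.

```python
import io
import xml.etree.ElementTree as ET

# Minimal, not schema-valid sample used only to demonstrate the functions.
SAMPLE_MZID = b"""<?xml version="1.0" encoding="UTF-8"?>
<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1" id="demo" version="1.1.0">
  <DataCollection>
    <Inputs>
      <SpectraData id="SD_1" location="file:///C:/data/run1.mgf"/>
      <SpectraData id="SD_2" location="D:\\data\\run2.mzML"/>
    </Inputs>
  </DataCollection>
</MzIdentML>"""


def referenced_spectra_files(mzid_source):
    """Return the base names of the files referenced by SpectraData elements.

    mzid_source is a path or a binary file object.
    """
    names = []
    # iterparse streams the document, which helps with large mzIdentML files.
    for _, elem in ET.iterparse(mzid_source, events=("end",)):
        # Compare the local tag name so that the namespace version is ignored.
        if elem.tag.rsplit("}", 1)[-1] == "SpectraData":
            location = elem.get("location", "")
            names.append(location.replace("\\", "/").rsplit("/", 1)[-1])
    return names


def missing_spectra_files(mzid_source, submitted_files):
    """Return referenced spectra files that are absent from submitted_files."""
    submitted = {name.lower() for name in submitted_files}
    return [name for name in referenced_spectra_files(mzid_source)
            if name.lower() not in submitted]


if __name__ == "__main__":
    print(missing_spectra_files(io.BytesIO(SAMPLE_MZID), ["run1.mgf"]))
```

Matching on the base name only is a simplification: it ignores directories and would not distinguish two files with the same name in different folders.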
20. Juan A. Vizcaíno
Support for mzTab
• Tools: Mascot, MaxQuant*, OpenMS
• ‘RESULT’ file generation: native file export
• Final ‘RESULT’ file: mzTab, together with the spectra files (mzML, mzXML, mzData, mgf, pkl, etc.)
* Work in progress
22. Juan A. Vizcaíno
PRIDE Inspector Toolsuite
https://github.com/PRIDE-Toolsuite/
PRIDE Inspector 2 supports:
- PRIDE XML
- mzIdentML + all types of spectra files
- mzML
- mzTab identification and quantification (+ all types of spectra files)
Wang et al., Nat. Biotechnology, 2012
Perez-Riverol et al., MCP, 2016
23. Juan A. Vizcaíno
PRIDE Inspector Toolsuite
https://github.com/PRIDE-Toolsuite/
PRIDE Inspector 2: visualisation functionality for protein groups
25. Juan A. Vizcaíno
PX Submission Tool
https://www.ebi.ac.uk/pride/help/archive/submission
• Desktop application for data submissions to ProteomeXchange via PRIDE
  • Implemented in Java 7
  • Streamlines the submission process
  • Captures mappings between files
  • Retains metadata
  • Fast file transfer with Aspera (FASP® transfer technology); FTP also available
  • Command line option
[Screenshot: submission tool]
28. Juan A. Vizcaíno
Workflow
[Figure: raw data → your own workflow (MS/MS data processing, peak list generation) → mgf → PeptideShaker → PS output → mzIdentML → PRIDE Inspector → PX submission tool → PX/PRIDE]
29. Juan A. Vizcaíno
In the first part of the exercise…
• You will use PeptideShaker to export the results of an analysis to mzIdentML v1.1 (the format required for the submission)
• You will learn how to use PRIDE Inspector (visualization and analysis of proteomics data)
• You will learn how to use the PX submission tool to perform data submissions to PRIDE.
31. Juan A. Vizcaíno
Public data release: when does it happen?
• When the authors tell us to do it (authors can also release the data themselves)
• When we find out that a dataset has been published
  • We look for PXD identifiers in PubMed abstracts.
  • If your PXD identifier is not in the abstract, the paper may have been published while the data are still private. Let us know!
  • New web form on the PRIDE website to facilitate the process
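The search for PXD identifiers in abstracts can be pictured with a small Python sketch. It assumes that an identifier is the prefix PXD followed by exactly six digits, as in PXD010000; the pattern and the function name are illustrative and do not describe the procedure actually used by the PRIDE team.

```python
import re

# Assumption: identifiers look like "PXD" followed by six digits (e.g. PXD010000).
PXD_PATTERN = re.compile(r"\bPXD\d{6}\b")


def find_pxd_identifiers(text):
    """Return the distinct PXD identifiers in text, in order of first appearance."""
    seen = []
    for match in PXD_PATTERN.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


if __name__ == "__main__":
    abstract = ("Data are available via ProteomeXchange with identifier "
                "PXD010000.")
    print(find_pxd_identifiers(abstract))
```

An abstract that mentions no identifier yields an empty list, which corresponds to the case where a dataset can stay private after publication unless the authors get in touch.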
33. Juan A. Vizcaíno
Ways to access data in PRIDE Archive
• PRIDE web interface
• File repository
• REST web service
• PRIDE Inspector tool
36. Juan A. Vizcaíno
Partial submissions can be used to store other data types
93.6% of the datasets come from DDA/shotgun approaches
38. Juan A. Vizcaíno
ProteomeCentral: Centralised portal for all PX datasets
http://proteomecentral.proteomexchange.org/cgi/GetDataset
39. Juan A. Vizcaíno
Public datasets from different omics: OmicsDI
http://www.omicsdi.org/
• Aims to integrate ‘omics’ datasets (proteomics, transcriptomics, metabolomics and genomics at present).
• Sources: PRIDE, MassIVE, jPOST, PASSEL, GPMDB, ArrayExpress, Expression Atlas, MetaboLights, Metabolomics Workbench, GNPS, EGA …and others
Perez-Riverol et al., Nat Biotechnol, 2017
41. Juan A. Vizcaíno
Second part of the exercise…
• Choice of 2 exercises:
1. PRIDE Archive web interface (search/browse PRIDE data on the web; beginner)
2. PRIDE Archive API web service (search/browse PRIDE data programmatically; intermediate)
42. Juan A. Vizcaíno
PRIDE Archive web interface: Get familiarized with it
43. Juan A. Vizcaíno
PRIDE Archive API
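import urllib.request

# Example accession, assumed for illustration; replace with your own projects
projects = ['PXD000001']
for project in projects:
    # Set the request URL
    url = 'http://www.ebi.ac.uk:80/pride/ws/archive/file/list/project/' + project
    # Create the request
    req = urllib.request.Request(url)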
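The snippet above stops once the request object has been created. A possible continuation is sketched below, under two assumptions that are not confirmed by the slides: that the endpoint returns JSON, and that the file records sit in a top-level "list" array with "fileName" and "downloadLink" fields. The function names are hypothetical, and the URL is the one shown on the slide, which may have changed since the course.

```python
import json
import urllib.request

# Endpoint as shown on the slide (2018); it may no longer be served.
BASE_URL = "http://www.ebi.ac.uk:80/pride/ws/archive/file/list/project/"


def file_list_url(project):
    """Build the file-list URL for one project accession."""
    return BASE_URL + project


def fetch_json(url, timeout=30):
    """Send the request and decode the body as JSON (assumed response format)."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def list_project_files(project, fetch=fetch_json):
    """Return (file name, download link) pairs for a project.

    Assumption: records are stored under a top-level "list" key and carry
    "fileName" and "downloadLink" fields.
    """
    payload = fetch(file_list_url(project))
    return [(record.get("fileName"), record.get("downloadLink"))
            for record in payload.get("list", [])]
```

Passing the fetch function as a parameter keeps the parsing logic testable without network access; calling `list_project_files("PXD000001")` with the default would contact the server.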
45. Juan A. Vizcaíno
Conclusions of the exercise
• You have learnt:
  • How to export results to mzIdentML with PeptideShaker
  • How to use PRIDE Inspector
  • How to use the PX submission tool
  • The requirements for a PRIDE complete submission
  • How to use the PRIDE Archive web interface
  • How to use the PRIDE API
46. Juan A. Vizcaíno
Conclusions
• Main characteristics of PRIDE Archive and ProteomeXchange
• PX/PRIDE submission workflow for MS/MS data
  • PRIDE Inspector
  • PX submission tool
• PRIDE/ProteomeXchange has become the de facto standard for data submission and data availability in proteomics
47. Juan A. Vizcaíno
Do you want to know a bit more…?
http://www.slideshare.net/JuanAntonioVizcaino
48. Juan A. Vizcaíno
Acknowledgements: People
Yasset Perez-Riverol
Johannes Griss
Suresh Hewapathirana
Tobias Ternent
Jingwen Bai
Attila Csordas
Deepti Jaiswal
Andrew Jarnuczak
Mathias Walzer
Gerhard Mayer (de.NBI)
Former team members, especially Manuel Bernal-Linares & Henning Hermjakob
All data submitters!
@pride_ebi
@proteomexchange