PRIDE and ProteomeXchange:
supporting the cultural change in
proteomics public data deposition
Dr. Juan Antonio Vizcaíno
Proteomics Team Leader
EMBL-EBI
Hinxton, Cambridge, UK
Juan A. Vizcaíno
juan@ebi.ac.uk
De.NBI Summer School 2016
Dagstuhl, 27 September 2016
Data resources at EMBL-EBI
• Genes, genomes & variation: Ensembl, Ensembl Genomes, GWAS Catalog, Metagenomics portal, European Nucleotide Archive, European Variation Archive, European Genome-phenome Archive
• Gene, protein & metabolite expression: ArrayExpress, Expression Atlas, MetaboLights, PRIDE
• Protein sequences, families & motifs: InterPro, Pfam, UniProt
• Molecular structures: Protein Data Bank in Europe, Electron Microscopy Data Bank
• Chemical biology: ChEMBL, ChEBI
• Reactions, interactions & pathways: IntAct, Reactome, MetaboLights
• Systems: BioModels, Enzyme Portal, BioSamples
• Literature & ontologies: Europe PubMed Central, Gene Ontology, Experimental Factor Ontology
• PRIDE and ProteomeXchange
• How to submit data to PRIDE: PRIDE tools
• How to access data in PRIDE Archive
• Some examples of public data reuse
Overview
What is a proteomics publication in 2016?
• Proteomics studies generate potentially large amounts of
data and results.
• Ideally, a proteomics publication needs to:
• Summarize the results of the study
• Provide supporting information for reliability of any
results reported
• Information in a publication:
• Manuscript
• Supplementary material
• Associated data submitted to a public repository
PRIDE (PRoteomics IDEntifications) database
• PRIDE stores mass spectrometry (MS)-based proteomics data:
• Peptide and protein expression data (identification and quantification)
• Post-translational modifications
• Mass spectra (raw data and peak lists)
• Technical and biological metadata
• Any other related information
• Full support for tandem MS approaches
http://www.ebi.ac.uk/pride/archive
Martens et al., Proteomics, 2005
Vizcaíno et al., NAR, 2016
Journal Submission Recommendations
• Journal guidelines recommend submission to proteomics
repositories:
- Proteomics (dataset briefs)
- JPR (HPP papers)
- Molecular and Cellular Proteomics
- Journals from the Nature group
- Journals from the PLOS group
• Funding agencies are enforcing public deposition of data
to maximize the value of the funds provided.
PRIDE: Source of MS proteomics data
• PRIDE Archive already provides or
will soon provide MS proteomics
data to other EMBL-EBI resources
such as UniProt, Ensembl and the
Expression Atlas.
http://www.ebi.ac.uk/pride
ProteomeXchange: A Global, distributed proteomics
database
PASSEL
(SRM data)
PRIDE
(MS/MS data)
MassIVE
(MS/MS data)
Raw
ID/Q
Meta
Mandatory raw data deposition
since July 2015
• Goal: Development of a framework to allow standard data submission and
dissemination pipelines between the main existing proteomics repositories.
http://www.proteomexchange.org
Vizcaíno et al., Nat Biotechnol, 2014
ProteomeXchange: A Global, distributed proteomics
database
PASSEL
(SRM data)
PRIDE
(MS/MS data)
MassIVE
(MS/MS data)
Raw
ID/Q
Meta
jPOST
(MS/MS data)
Mandatory raw data deposition
since July 2015
• Goal: Development of a framework to allow standard data submission and
dissemination pipelines between the main existing proteomics repositories.
http://www.proteomexchange.org
New in 2016
Vizcaíno et al., Nat Biotechnol, 2014
ProteomeXchange data workflow
[Diagram: research groups submit DATASETS (researcher's results, raw data, metadata) to the receiving repositories. MS/MS data (as complete submissions) and any other workflow (mainly partial submissions): PRIDE, MassIVE, jPOST. SRM data: PASSEL. ProteomeCentral: metadata / manuscript, raw data, results, journals. Reanalysis of datasets (reprocessed results): PeptideAtlas, MassIVE.]
ProteomeXchange data workflow
[Diagram: research groups submit DATASETS (researcher's results, raw data, metadata) to the receiving repositories. MS/MS data (as complete submissions) and any other workflow (mainly partial submissions): PRIDE, MassIVE, jPOST. SRM data: PASSEL. ProteomeCentral: metadata / manuscript, raw data, results, journals. Reanalysis of datasets (reprocessed results): PeptideAtlas, GPMDB, proteomicsDB, MassIVE. Other consumers: UniProt/neXtProt, other DBs. OmicsDI: integration with other omics datasets.]
ProteomeXchange: 4,534 datasets up until 31st July, 2016
Type:
4067 PRIDE (~90%)
339 MassIVE
115 PeptideAtlas/PASSEL
13 jPOST
Publicly Accessible:
2597 datasets, 57% of all
2334 PRIDE
135 MassIVE
115 PASSEL
13 jPOST
Datasets/year:
2012: 102
2013: 527
2014: 963
2015: 1758
2016 (till end of July): 1184
Countries with at least 100 datasets:
1105 USA
546 Germany
411 United Kingdom
356 China
229 France
188 Netherlands
178 Canada
150 Switzerland
125 Australia
123 Spain
123 Denmark
117 Japan
101 Sweden
Top Species studied by at least 100 datasets:
2010 Homo sapiens
604 Mus musculus
191 Saccharomyces cerevisiae
140 Arabidopsis thaliana
127 Rattus norvegicus
936 reported taxa in total
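The percentages quoted on this slide can be re-derived from the counts. A minimal sketch in Python, using only the per-repository numbers given here (the variable and function names are illustrative):

```python
# Dataset counts per receiving repository, copied from the slide.
submitted = {"PRIDE": 4067, "MassIVE": 339, "PeptideAtlas/PASSEL": 115, "jPOST": 13}
public = {"PRIDE": 2334, "MassIVE": 135, "PeptideAtlas/PASSEL": 115, "jPOST": 13}

total_submitted = sum(submitted.values())  # all ProteomeXchange datasets
total_public = sum(public.values())        # publicly accessible datasets


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


pride_share = share(submitted["PRIDE"], total_submitted)
public_share = share(total_public, total_submitted)

if __name__ == "__main__":
    print(f"Total datasets: {total_submitted}")
    print(f"PRIDE share of submissions: {pride_share}%")
    print(f"Publicly accessible: {total_public} ({public_share}%)")
```

The totals agree with the headline figures (4,534 datasets, 2,597 of them public), which is a quick consistency check on the breakdown.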
• PRIDE Archive (in the context of ProteomeXchange
and the PSI standards)
• How to submit data to PRIDE: PRIDE tools
• How to access data in PRIDE Archive
• Some examples of public data reuse
Overview
Complete
Partial
Complete vs Partial submissions: processed results
For complete submissions, it is possible to connect the spectra with the identification
processed results (results can be parsed) and they can be visualized.
Complete vs Partial submissions: experimental metadata
Complete Partial
General experimental metadata about the projects is similar.
However, at the assay level, the information in partial submissions is less detailed.
Partial submissions can be used to store
other data types
• Everything can be stored, not only MS/MS data: very flexible
mechanism to be able to capture all types of datasets
• PRIDE does not store SRM data (it goes to PASSEL)
• Top down proteomics datasets.
• Mass Spectrometry Imaging datasets.
• Data independent acquisition techniques: e.g. SWATH-MS datasets.
How to perform a complete PX submission to PRIDE
• Decide between a complete/partial submission.
• File conversion/export to mzIdentML (or PRIDE XML)
• File check before submission (PRIDE Inspector)
• Experimental annotation and actual file submission (PX
submission tool)
• Post-submission steps
PX Data workflow for MS/MS data
1. Mass spectrometer output files: raw data (binary files) or
peak list spectra in a standardized format (mzML, mzXML).
2. Result files:
a. Complete submissions: Result files can be converted to
the mzIdentML data standard (also PRIDE XML).
b. Partial submissions: For workflows not yet supported by
PRIDE, search engine output files will be stored and
provided in their original form.
3. Metadata: Sufficiently detailed description of sample origin,
workflow, instrumentation, submitter.
4. Other files: Optional files:
a. QUANT: Quantification related results
b. PEAK: Peak list files
c. GEL: Gel images
d. OTHER: Any other file type
e. FASTA
f. SP_LIBRARY
[Diagram labels: Published, Raw files, Other files]
PX Data workflow for MS/MS data
1. Mass spectrometer output files: raw data (binary files) or
peak list spectra in a standardized format (mzML, mzXML).
2. Result files:
a. Complete submissions: Result files can be converted to
the mzIdentML data standard (also PRIDE XML).
b. Partial submissions: For workflows not yet supported by
PRIDE, search engine output files will be stored and
provided in their original form.
3. Metadata: Sufficiently detailed description of sample origin,
workflow, instrumentation, submitter.
4. Other files: Optional files (the list can be extended):
a. QUANT: Quantification related results
b. PEAK: Peak list files
c. GEL: Gel images
d. OTHER: Any other file type
e. FASTA
f. SP_LIBRARY
[Diagram labels: Published, Raw files, Other files]
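The distinction between complete and partial submissions can be illustrated with a small check over a file list. This is only a sketch of the rule described on the slide, not the logic of the PX submission tool: the labels 'RAW' and 'SEARCH' and the function name are assumptions introduced for the example, while 'RESULT' and the optional labels are taken from the slides.

```python
from collections import Counter
from typing import Dict

# 'RESULT' and the optional labels appear on the slides; 'RAW' and 'SEARCH'
# are hypothetical labels introduced for this sketch.
RESULT, RAW, SEARCH = "RESULT", "RAW", "SEARCH"
OPTIONAL_TYPES = {"QUANT", "PEAK", "GEL", "OTHER", "FASTA", "SP_LIBRARY"}


def classify_submission(files: Dict[str, str]) -> str:
    """Classify a file list as a 'complete' or 'partial' submission.

    `files` maps each file name to its type label.
    """
    counts = Counter(files.values())
    unknown = set(counts) - OPTIONAL_TYPES - {RESULT, RAW, SEARCH}
    if unknown:
        raise ValueError(f"Unknown file type(s): {sorted(unknown)}")
    if counts[RAW] == 0:
        raise ValueError("Mass spectrometer output files are required")
    if counts[RESULT] > 0:
        # Result files in mzIdentML (or PRIDE XML) make the submission complete.
        return "complete"
    if counts[SEARCH] > 0:
        # Search engine output kept in its original form: partial submission.
        return "partial"
    raise ValueError("No result or search engine output files found")
```

For example, `classify_submission({"run1.raw": "RAW", "run1.mzid": "RESULT"})` yields "complete", while a list with only raw files and search engine output yields "partial".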
PRIDE Components: Submission Process
PRIDE Converter 2
PRIDE Inspector PX Submission Tool
mzIdentML
PRIDE XML
1
‘RESULT’ file generation. Now: native file export to mzIdentML
• Tools with native file export: Mascot, ProteinPilot, Scaffold, PEAKS, MSGF+, PLGS, others
• Final ‘RESULT’ file: mzIdentML
• Spectra files: mzML, mzXML, mzData, mgf, pkl, ms2, dta, apl
Complete submissions
[Diagram labels: search engines, search engine results + MS files, mzIdentML]
An increasing number of tools support export to mzIdentML 1.1:
- Mascot
- MSGF+
- MyriMatch and related tools from D. Tabb’s lab
- OpenMS
- PEAKS
- PeptideShaker
- ProCon (ProteomeDiscoverer, Sequest)
- Scaffold
- TPP via the idConvert tool (ProteoWizard)
- ProteinPilot (from version 5.0)
- X!Tandem native conversion (Beta, PILEDRIVER)
- Crux
- Others: library for X!Tandem conversion, lab internal pipelines, …
Referenced spectral files need to be submitted as well (all open formats are supported).
Updated list: http://www.psidev.info/tools-implementing-mzIdentML
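Because the referenced spectral files must accompany an mzIdentML file, it can help to list them before submitting. The sketch below reads the `location` attribute of each `SpectraData` element with Python's standard library. It is an illustration only, not part of the PRIDE tools; the function name and the tiny example document (a fragment, not a schema-valid file) are made up for the purpose.

```python
import xml.etree.ElementTree as ET
from typing import List


def referenced_spectra_files(mzid_text: str) -> List[str]:
    """Return the 'location' of every SpectraData element in an mzIdentML document."""
    root = ET.fromstring(mzid_text)
    locations = []
    for element in root.iter():
        # Compare local names so the sketch does not depend on the namespace URI.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "SpectraData" and "location" in element.attrib:
            locations.append(element.attrib["location"])
    return locations


# Hypothetical, heavily trimmed example used only to exercise the function.
EXAMPLE = """<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1">
  <DataCollection>
    <Inputs>
      <SpectraData id="SD_1" location="file:///data/run1.mgf"/>
      <SpectraData id="SD_2" location="file:///data/run2.mzML"/>
    </Inputs>
  </DataCollection>
</MzIdentML>"""

if __name__ == "__main__":
    for location in referenced_spectra_files(EXAMPLE):
        print(location)
```

Each listed location corresponds to a spectra file that should be part of the same submission.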
Complete submissions: which tools are missing
[Diagram labels: search engines, search engine results + MS files, mzIdentML]
- MaxQuant: Export to mzTab (work in progress)
- Proteome Discoverer (Thermo): Work in progress
An increasing number of tools support export to mzIdentML 1.1
Updated list: http://www.psidev.info/tools-implementing-mzIdentML
PRIDE Components: Submission Process
PRIDE Converter 2
PRIDE Inspector PX Submission Tool
mzIdentML
PRIDE XML
2
PRIDE Inspector Toolsuite
Wang et al., Nat. Biotechnology, 2012
Perez-Riverol et al., Bioinformatics, 2015
Perez-Riverol et al., MCP, 2016
• PRIDE Inspector: standalone tool to enable visualisation and validation of MS data.
• Built on top of ms-data-core-api: open source algorithms and libraries for computational proteomics.
• Supported file formats: mzIdentML, mzML, mzTab (PSI standards), and PRIDE XML.
• Broad functionality.
https://github.com/PRIDE-Utilities/ms-data-core-api
https://github.com/PRIDE-Toolsuite/pride-inspector
PRIDE Inspector Functionality
[Screenshots: summary and QC charts; peptide spectra annotation and visualisation; protein groups inference]
• Protein view containing protein inference information
• Quantification view
• Multiple export options (.mgf, protein/peptide tables, mzTab file)
• Direct access to PRIDE datasets
• Summary and QC charts (Delta m/z, precursor charges, etc.)
• Spectra view (fragmentation table, ion series annotation)
• Protein inference algorithm and protein groups visualisation
PRIDE Components: Submission Process
PRIDE Converter 2
PRIDE Inspector PX Submission Tool
mzIdentML
PRIDE XML
3
PX submission tool
• Captures the mappings between the different types of files.
• Makes the file upload process straightforward for the submitter (it transfers all the files using Aspera or FTP).
• Command line alternative: using the Aspera file transfer protocol.
http://www.proteomexchange.org/submission
PX submission tool: screenshots
Fast file transfer with Aspera
- Aspera is the default file transfer protocol to PRIDE:
- PX Submission tool
- Command line
- Up to 50X faster than FTP
File transfer speed should
not be a problem!!
Manuscript published detailing the process
Ternent et al., Proteomics, 2014
http://www.proteomexchange.org/submission
Example dataset:
PXD000764
- Title: “Discovery of new CSF biomarkers for meningitis in children”
- 12 runs: 4 controls and 8 infected samples
- Identification and quantification data
Public data release: when does it happen?
• When the author tells us to do it (the authors can do it by
themselves)
• When we find out that a dataset has been published
• We look for PXD identifiers in PubMed abstracts.
• If your PXD identifier is not in the abstract, a paper may have
been published and the data is still private. Let us know!
• New web form in the PRIDE web to facilitate the process
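Looking for PXD identifiers in abstracts amounts to a simple pattern search. The sketch below is an illustration, not the PRIDE team's actual pipeline; it assumes identifiers follow the form of the example PXD000764 ('PXD' followed by six digits), and the function name is hypothetical.

```python
import re
from typing import List

# Assumed identifier form, based on the example PXD000764: 'PXD' + six digits.
PXD_PATTERN = re.compile(r"\bPXD\d{6}\b")


def find_pxd_identifiers(text: str) -> List[str]:
    """Return the distinct PXD identifiers in `text`, in order of first appearance."""
    seen: List[str] = []
    for match in PXD_PATTERN.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


if __name__ == "__main__":
    abstract = "Data are available via ProteomeXchange with identifier PXD000764."
    print(find_pxd_identifiers(abstract))
```

An abstract that omits the identifier returns an empty list, which is exactly the case where the authors need to notify PRIDE themselves.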
• PRIDE Archive (in the context of ProteomeXchange
and the PSI standards)
• How to submit data to PRIDE: PRIDE tools
• How to access data in PRIDE Archive
• PRIDE Cluster and PRIDE Proteomes
Overview
ProteomeCentral: Centralised portal for all PX
datasets
http://proteomecentral.proteomexchange.org/cgi/GetDataset
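A link to a single dataset can be built from the portal address above. This is a minimal sketch: the base URL is the one on the slide, while the `ID` query parameter is an assumption about how the portal addresses individual datasets and should be checked against ProteomeCentral itself. The function name is illustrative.

```python
from urllib.parse import urlencode

# Base URL taken from the slide.
GET_DATASET_URL = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"


def dataset_url(accession: str) -> str:
    """Build a ProteomeCentral link for one dataset accession.

    Assumption: the accession is passed in an 'ID' query parameter.
    """
    return f"{GET_DATASET_URL}?{urlencode({'ID': accession})}"


if __name__ == "__main__":
    print(dataset_url("PXD000764"))
```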
RSS and Twitter feeds for public datasets
http://groups.google.com/group/proteomexchange/feed/rss_v2_0_msgs.xml
@proteomexchange
Ways to access data in PRIDE Archive
• PRIDE web interface
• File repository
• REST web service
• PRIDE Inspector tool
CompOmics Open Source Analysis Pipeline
https://github.com/compomics/searchgui
Vaudel M, Barsnes H, Berven FS, Sickmann A, Martens L: Proteomics 2011;11(5):996-9.
https://github.com/compomics/peptide-shaker
Vaudel M, Burkhart J, Zahedi RP, Berven FS, Sickmann A, Martens L, Barsnes H: Nature Biotechnology 2015; 33(1):22-24.
Reshake PRIDE data!
Find the desired PRIDE project …
… inspect the project details …
… and start re-analyzing the data!
• PRIDE Archive (in the context of ProteomeXchange
and the PSI standards)
• How to submit data to PRIDE: PRIDE tools
• How to access data in PRIDE Archive
• Some examples of public data reuse
Overview
Datasets are being reused more and more….
Data download volume for PRIDE in 2015: ~ 200 TB
Vaudel et al., Proteomics, 2016
Challenges for data reuse in proteomics
• Insufficient technical and biological metadata.
• Large computational infrastructure may be needed (e.g. when analysing many datasets together).
• Shortage of expertise (people).
• Lack of standardisation in the field.
Data sharing in Proteomics
PRIDE Cluster
• Provide an aggregated peptide centric view of PRIDE Archive
• Hypothesis: same peptide will generate similar MS/MS spectra across
experiments
• New spectral clustering algorithm to reliably group spectra coming from the
same peptide
• Infer reliable identifications by comparing submitted identifications of
spectra within a cluster
• After clustering, a representative spectrum is built for all peptides consistently identified across different datasets
Griss et al., Nat. Methods, 2013
Griss et al., Nat. Methods, 2016
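The idea of inferring a reliable identification from the submitted identifications within a cluster can be sketched as a majority vote. This is a simplified illustration, not the PRIDE Cluster algorithm: the function names and both thresholds are assumptions chosen for the example, and the spectral clustering step itself is not shown.

```python
from collections import Counter
from typing import List, Tuple


def consensus_identification(peptides: List[str]) -> Tuple[str, float]:
    """Return the most frequent peptide in a cluster and the fraction of PSMs supporting it."""
    if not peptides:
        raise ValueError("A cluster must contain at least one identification")
    peptide, count = Counter(peptides).most_common(1)[0]
    return peptide, count / len(peptides)


def is_reliable(peptides: List[str], min_size: int = 3, min_ratio: float = 0.7) -> bool:
    """Flag a cluster as reliable; both thresholds are illustrative assumptions."""
    _, ratio = consensus_identification(peptides)
    return len(peptides) >= min_size and ratio >= min_ratio


if __name__ == "__main__":
    # Hypothetical cluster: nine PSMs agree, one disagrees.
    cluster = ["PEPTIDER"] * 9 + ["PEPTLDER"]
    print(consensus_identification(cluster), is_reliable(cluster))
```

A cluster in which every PSM reports the same peptide has a ratio of 1.0, the situation shown in the "perfect cluster" example.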
Examples: one perfect cluster
- 880 PSMs give the same peptide ID
- 4 species
- 28 datasets
- Same instruments
Examples: one perfect cluster (2)
PRIDE Cluster as a Public Data Mining Resource
• http://www.ebi.ac.uk/pride/cluster
• Spectral libraries for 16 species.
• All clustering results, as well as specific subsets of interest available.
• Source code (open source) and Java API
Reprocess
• Data are reprocessed with the intention of obtaining
new knowledge or to provide an updated view on the
results.
• It mainly serves the same purpose as the original experiment.
• For instance, a shot-gun dataset can be reprocessed
with a different algorithm or an updated sequence
database.
Reprocessing repositories
• These resources collect MS raw data and reprocess it using one given analysis pipeline and an up-to-date protein sequence database.
• Main resources: GPMDB and PeptideAtlas (ISB, Seattle).
PeptideAtlas builds
Examples of builds:
- Human
- Human plasma
- Human urine
- Drosophila
- Mouse
- Mouse plasma
- Cow
- Yeast
…
Draft Human proteome papers published in 2014
Wilhelm et al., Nature, 2014 Kim et al., Nature, 2014
• Two independent groups claimed to have produced the first complete draft of the human proteome by MS.
• Some of their findings are controversial and need further validation… but generated a lot of discussion and put proteomics in the spotlight.
• They used many different tissues.
Nature cover 29 May 2014
Draft Human proteome papers published in 2014
Wilhelm et al., Nature, 2014
• Around 60% of the data used for the analysis comes from previous experiments, most of them stored in proteomics repositories such as PRIDE/ProteomeXchange, PASSEL or MassIVE.
• They complement that data with “exotic” tissues.
OmicsDI: Portal for omics datasets
http://www.ebi.ac.uk/Tools/omicsdi/
• Aims to integrate ‘omics’ datasets (proteomics, transcriptomics, metabolomics and genomics at present).
PRIDE
MassIVE
jPOST
PASSEL
GPMDB
ArrayExpress
Expression Atlas
MetaboLights
Metabolomics Workbench
GNPS
EGA
Perez-Riverol et al., 2016, bioRxiv
Conclusions
• Main characteristics of PRIDE and ProteomeXchange
• PX/PRIDE submission workflow for MS/MS data
• PRIDE Inspector
• PX submission tool
• PRIDE/ProteomeXchange has become the de facto standard for data submission and data availability in proteomics
• Reuse/reanalysis of proteomics data -> Many possible applications
Acknowledgements: People
Attila Csordas
Tobias Ternent
Gerhard Mayer (de.NBI)
Johannes Griss
Yasset Perez-Riverol
Manuel Bernal-Llinares
Andrew Jarnuczak
Former team members,
especially Rui Wang, Florian
Reisinger, Noemi del Toro, Jose
A. Dianes & Henning Hermjakob
Acknowledgements: The PRIDE Team
All data submitters !!!
PSI Spring Meeting 2017
Beijing Proteome Research Center, China
April 24-26, 2017
April 23: 2nd PHOENIX Mini-Symposium
on Frontiers of Proteomics
April 27: Hiking the Great Wall
Focus topics:
• Quality control: qcML
• Proteogenomics formats
• proXI: proteomics eXpression Interface
• Privacy and Proteomics Data

PRIDE and ProteomeXchange: supporting the cultural change in proteomics public data deposition

  • 1.
    PRIDE and ProteomeXchange: supportingthe cultural change in proteomics public data deposition Dr. Juan Antonio Vizcaíno Proteomics Team Leader EMBL-EBI Hinxton, Cambridge, UK
  • 2.
    Juan A. Vizcaíno juan@ebi.ac.uk De.NBISummer School 2016 Dagstuhl, 27 September 2016 Data resources at EMBL-EBI Genes, genomes & variation ArrayExpress Expression Atlas MetaboLights PRIDE InterPro Pfam UniProt ChEMBL ChEBI Molecular structures Protein Data Bank in Europe Electron Microscopy Data Bank European Nucleotide Archive European Variation Archive European Genome-phenome Archive Gene, protein & metabolite expression Protein sequences, families & motifs Chemical biology Reactions, interactions & pathways IntAct Reactome MetaboLights Systems BioModels Enzyme Portal BioSamples Ensembl Ensembl Genomes GWAS Catalog Metagenomics portal Europe PubMed Central Gene Ontology Experimental Factor Ontology Literature & ontologies
  • 3.
    Juan A. Vizcaíno juan@ebi.ac.uk De.NBISummer School 2016 Dagstuhl, 27 September 2016 • PRIDE and ProteomeXchange • How to submit data to PRIDE: PRIDE tools • How to access data in PRIDE Archive • Some examples of public data reuse Overview
  • 4.
    Juan A. Vizcaíno juan@ebi.ac.uk De.NBISummer School 2016 Dagstuhl, 27 September 2016 What is a proteomics publication in 2016? • Proteomics studies generate potentially large amounts of data and results. • Ideally, a proteomics publication needs to: • Summarize the results of the study • Provide supporting information for reliability of any results reported • Information in a publication: • Manuscript • Supplementary material • Associated data submitted to a public repository
  • 5.
    Juan A. Vizcaíno juan@ebi.ac.uk De.NBISummer School 2016 Dagstuhl, 27 September 2016 • PRIDE stores mass spectrometry (MS)- based proteomics data: • Peptide and protein expression data (identification and quantification) • Post-translational modifications • Mass spectra (raw data and peak lists) • Technical and biological metadata • Any other related information • Full support for tandem MS approaches PRIDE (PRoteomics IDEntifications) database http://www.ebi.ac.uk/pride/archive Martens et al., Proteomics, 2005 Vizcaíno et al., NAR, 2016
  • 6.
    Juan A. Vizcaíno juan@ebi.ac.uk De.NBISummer School 2016 Dagstuhl, 27 September 2016 Journal Submission Recommendations • Journal guidelines recommend submission to proteomics repositories:  Proteomics (dataset briefs)  JPR (HPP papers)  Molecular and Cellular Proteomics  Journals from the Nature group  Journals from the PLOS group • Funding agencies are enforcing public deposition of data to maximize the value of the funds provided.
  • 7.
    Juan A. Vizcaíno juan@ebi.ac.uk De.NBISummer School 2016 Dagstuhl, 27 September 2016 PRIDE: Source of MS proteomics data • PRIDE Archive already provides or will soon provide MS proteomics data to other EMBL-EBI resources such as UniProt, Ensembl and the Expression Atlas. http://www.ebi.ac.uk/pride
  • 8.
    Juan A. Vizcaíno juan@ebi.ac.uk De.NBISummer School 2016 Dagstuhl, 27 September 2016 ProteomeXchange: A Global, distributed proteomics database PASSEL (SRM data) PRIDE (MS/MS data) MassIVE (MS/MS data) Raw ID/Q Meta Mandatory raw data deposition since July 2015 • Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories. http://www.proteomexchange.org Vizcaíno et al., Nat Biotechnol, 2014
  • 9.
    Juan A. Vizcaíno juan@ebi.ac.uk De.NBISummer School 2016 Dagstuhl, 27 September 2016 ProteomeXchange: A Global, distributed proteomics database PASSEL (SRM data) PRIDE (MS/MS data) MassIVE (MS/MS data) Raw ID/Q Meta jPOST (MS/MS data) Mandatory raw data deposition since July 2015 • Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories. http://www.proteomexchange.org New in 2016 Vizcaíno et al., Nat Biotechnol, 2014
  • 10.
    Juan A. Vizcaíno juan@ebi.ac.uk De.NBISummer School 2016 Dagstuhl, 27 September 2016 ProteomeCentral Metadata / Manuscript Raw Data Results Journals Peptide Atlas Receiving repositories PRIDE Researcher’s results Raw data Metadata PASSEL Research groups Reanalysis of datasets MassIVE jPOST MS/MS data (as complete submissions) Any other workflow (mainly partial submissions) DATASETS SRM data Reprocessed results MassIVE ProteomeXchange data workflow
  • 11.
    Juan A. Vizcaíno juan@ebi.ac.uk De.NBISummer School 2016 Dagstuhl, 27 September 2016 ProteomeCentral Metadata / Manuscript Raw Data Results Journals Peptide Atlas Receiving repositories PRIDE Researcher’s results Raw data Metadata PASSEL Research groups Reanalysis of datasets MassIVE jPOST MS/MS data (as complete submissions) Any other workflow (mainly partial submissions) DATASETS SRM data Reprocessed results MassIVE ProteomeXchange data workflow
  • 12.
    Juan A. Vizcaíno juan@ebi.ac.uk De.NBISummer School 2016 Dagstuhl, 27 September 2016 ProteomeCentral Metadata / Manuscript Raw Data Results Journals UniProt/ neXtProtPeptide Atlas Other DBs Receiving repositories PRIDE GPMDBResearcher’s results Raw data Metadata PASSEL proteomicsDB Research groups Reanalysis of datasets MassIVE jPOST MS/MS data (as complete submissions) Any other workflow (mainly partial submissions) DATASETS OmicsDI Integration with other omics datasets SRM data Reprocessed results MassIVE ProteomeXchange data workflow
  • 13.
    Juan A. Vizcaíno juan@ebi.ac.uk De.NBISummer School 2016 Dagstuhl, 27 September 2016 Countries with at least 100 datasets: 1105 USA 546 Germany 411 United Kingdom 356 China 229 France 188 Netherlands 178 Canada 150 Switzerland 125 Australia 123 Spain 123 Denmark 117 Japan 101 Sweden ProteomeXchange: 4,534 datasets up until 31st July, 2016 Type: 4067 PRIDE (~90%) 339 MassIVE 115 PeptideAtlas/PASSEL 13 jPOST Publicly Accessible: 2597 datasets, 57% of all 2334 PRIDE 135 MassIVE 115 PASSEL 13 jPOST Datasets/year: 2012: 102 2013: 527 2014: 963 2015: 1758 2016 (till end of July): 1184 Top Species studied by at least 100 datasets: 2010 Homo sapiens 604 Mus musculus 191 Saccharomyces cerevisiae 140 Arabidopsis thaliana 127 Rattus norvegicus 936 reported taxa in total
  • 14.
    Juan A. Vizcaíno juan@ebi.ac.uk De.NBISummer School 2016 Dagstuhl, 27 September 2016 • PRIDE Archive (in the context of ProteomeXchange and the PSI standards) • How to submit data to PRIDE: PRIDE tools • How to access data in PRIDE Archive • Some examples of public data reuse Overview
  • 15.
    Juan A. Vizcaíno juan@ebi.ac.uk De.NBISummer School 2016 Dagstuhl, 27 September 2016 ProteomeCentral Metadata / Manuscript Raw Data Results Journals UniProt/ neXtProtPeptide Atlas Other DBs Receiving repositories PRIDE GPMDBResearcher’s results Raw data Metadata PASSEL proteomicsDB Research groups Reanalysis of datasets MassIVE jPOST MS/MS data (as complete submissions) Any other workflow (mainly partial submissions) DATASETS OmicsDI Integration with other omics datasets SRM data Reprocessed results MassIVE ProteomeXchange data workflow
  • 16.
    Juan A. Vizcaíno juan@ebi.ac.uk De.NBISummer School 2016 Dagstuhl, 27 September 2016 Complete Partial Complete vs Partial submissions: processed results For complete submissions, it is possible to connect the spectra with the identification processed results (results can be parsed) and they can be visualized.
  • 17.
    Juan A. Vizcaíno juan@ebi.ac.uk De.NBISummer School 2016 Dagstuhl, 27 September 2016 Complete vs Partial submissions: experimental metadata Complete Partial General experimental metadata about the projects is similar. However, at the assay level information in partial submissions is not so detailed
  • 18.
    Juan A. Vizcaíno juan@ebi.ac.uk De.NBISummer School 2016 Dagstuhl, 27 September 2016 Partial submissions can be used to store other data types • Everything can be stored, not only MS/MS data: very flexible mechanism to be able to capture all types of datasets • PRIDE does not store SRM data (it goes to PASSEL) • Top down proteomics datasets. • Mass Spectrometry Imaging datasets. • Data independent acquisition techniques: e.g. SWATH-MS datasets.
  • 19.
    Juan A. Vizcaíno juan@ebi.ac.uk De.NBISummer School 2016 Dagstuhl, 27 September 2016 How to perform a complete PX submission to PRIDE • Decide between a complete/partial submission. • File conversion/export to mzIdentML (or PRIDE XML) • File check before submission (PRIDE Inspector) • Experimental annotation and actual file submission (PX submission tool) • Post-submission steps
  • 20.
    Juan A. Vizcaíno juan@ebi.ac.uk De.NBISummer School 2016 Dagstuhl, 27 September 2016 PX Data workflow for MS/MS data 1. Mass spectrometer output files: raw data (binary files) or peak list spectra in a standardized format (mzML, mzXML). 2. Result files: a. Complete submissions: Result files can be converted to the mzIdentML data standard (also PRIDE XML). b. Partial submissions: For workflows not yet supported by PRIDE, search engine output files will be stored and provided in their original form. 3. Metadata: Sufficiently detailed description of sample origin, workflow, instrumentation, submitter. 4. Other files: Optional files: a. QUANT: Quantification related results e. FASTA b. PEAK: Peak list files f. SP_LIBRARY c. GEL: Gel images d. OTHER: Any other file type Published Raw Files Other files
  • 21.
    PX data workflow for MS/MS data
    1. Mass spectrometer output files: raw data (binary files) or peak list spectra in a standardized format (mzML, mzXML).
    2. Result files:
       a. Complete submissions: result files can be converted to the mzIdentML data standard (or PRIDE XML).
       b. Partial submissions: for workflows not yet supported by PRIDE, search engine output files are stored and provided in their original form.
    3. Metadata: sufficiently detailed description of sample origin, workflow, instrumentation and submitter.
    4. Other files (optional; the list can be extended):
       a. QUANT: quantification-related results
       b. PEAK: peak list files
       c. GEL: gel images
       d. OTHER: any other file type
       e. FASTA
       f. SP_LIBRARY
    Diagram labels: Published; Raw files; Other files.
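The distinction between complete and partial submissions in the workflow above can be sketched in a few lines of Python. This is only an illustration, not the logic of the PX submission tool: the `RAW` and `SEARCH` labels, the `FileCategory` enum and the `submission_type` function are hypothetical names, and the sketch assumes that every submission includes mass spectrometer output files.

```python
from enum import Enum
from typing import Iterable


class FileCategory(Enum):
    """File categories named on the slides, plus two assumed labels."""
    RESULT = "RESULT"          # identifications in mzIdentML or PRIDE XML
    SEARCH = "SEARCH"          # assumed label: search engine output in its original form
    RAW = "RAW"                # assumed label: mass spectrometer output files
    PEAK = "PEAK"
    QUANT = "QUANT"
    GEL = "GEL"
    FASTA = "FASTA"
    SP_LIBRARY = "SP_LIBRARY"
    OTHER = "OTHER"


def submission_type(categories: Iterable[FileCategory]) -> str:
    """Return "complete" or "partial" for a collection of file categories."""
    present = set(categories)
    # Assumption: every submission includes mass spectrometer output files.
    if FileCategory.RAW not in present:
        raise ValueError("no mass spectrometer output files")
    # Standard-format result files make the submission complete.
    if FileCategory.RESULT in present:
        return "complete"
    # Otherwise the original search engine output makes it partial.
    if FileCategory.SEARCH in present:
        return "partial"
    raise ValueError("no result or search engine output files")


print(submission_type([FileCategory.RAW, FileCategory.RESULT, FileCategory.QUANT]))
print(submission_type([FileCategory.RAW, FileCategory.SEARCH]))
```

The optional categories (QUANT, PEAK, GEL, FASTA, SP_LIBRARY, OTHER) do not influence the outcome in this sketch; only the kind of result file does.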
  • 22.
    PRIDE components: submission process
    Diagram labels: PRIDE Converter 2; PRIDE Inspector; PX Submission Tool; mzIdentML; PRIDE XML. Step marker: 1.
  • 23.
    ‘RESULT’ file generation
    Now: native file export to mzIdentML.
    • Tools with native file export: Mascot, ProteinPilot, Scaffold, PEAKS, MSGF+, PLGS, others.
    • Final ‘RESULT’ file: mzIdentML.
    • Spectra files: mzML, mzXML, mzData, mgf, pkl, ms2, dta, apl.
  • 24.
    Complete submissions: search engine results + MS files
    Diagram labels: Search engines; mzIdentML.
    An increasing number of tools support export to mzIdentML 1.1:
    • Mascot
    • MSGF+
    • MyriMatch and related tools from D. Tabb’s lab
    • OpenMS
    • PEAKS
    • PeptideShaker
    • ProCon (ProteomeDiscoverer, Sequest)
    • Scaffold
    • TPP, via the idConvert tool (ProteoWizard)
    • ProteinPilot (from version 5.0)
    • X!Tandem native conversion (beta, PILEDRIVER)
    • Crux
    • Others: library for X!Tandem conversion, lab internal pipelines, …
    Referenced spectral files need to be submitted as well (all open formats are supported).
    Updated list: http://www.psidev.info/tools-implementing-mzIdentML#.
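Because the referenced spectral files must accompany the mzIdentML file, a submitter may want to list them before uploading. The following is a minimal sketch using only the Python standard library. It assumes that the references are stored in the `location` attribute of `SpectraData` elements, as in mzIdentML 1.1; the function names and the example document are hypothetical, and this is not how PRIDE Inspector or the PX submission tool performs the check.

```python
import posixpath
import xml.etree.ElementTree as ET
from typing import List, Set
from urllib.parse import urlparse


def referenced_spectra_files(mzid_text: str) -> List[str]:
    """Return the location of every SpectraData element in an mzIdentML document."""
    root = ET.fromstring(mzid_text)
    locations = []
    for element in root.iter():
        # Drop the "{namespace}" prefix so the element name can be compared directly.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "SpectraData" and "location" in element.attrib:
            locations.append(element.attrib["location"])
    return locations


def missing_spectra_files(mzid_text: str, submitted_names: Set[str]) -> List[str]:
    """Return referenced file names that are absent from the submitted file names."""
    missing = []
    for location in referenced_spectra_files(mzid_text):
        # Compare by file name only; locations are often absolute file URIs.
        name = posixpath.basename(urlparse(location).path)
        if name not in submitted_names:
            missing.append(name)
    return missing


# Hypothetical, heavily truncated mzIdentML document for illustration.
EXAMPLE = """<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1">
  <DataCollection>
    <Inputs>
      <SpectraData id="SD_1" location="file:///data/run01.mgf"/>
      <SpectraData id="SD_2" location="file:///data/run02.mgf"/>
    </Inputs>
  </DataCollection>
</MzIdentML>"""

print(missing_spectra_files(EXAMPLE, {"run01.mgf"}))
```

The sketch handles file URIs and plain relative names; Windows-style paths with backslashes would need extra handling.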
  • 25.
    Complete submissions: which tools are missing
    Diagram labels: Search engines; mzIdentML.
    • MaxQuant: export to mzTab (work in progress).
    • Proteome Discoverer (Thermo): work in progress.
    An increasing number of tools support export to mzIdentML 1.1.
    Updated list: http://www.psidev.info/tools-implementing-mzIdentML#.
  • 26.
    PRIDE components: submission process
    Diagram labels: PRIDE Converter 2; PRIDE Inspector; PX Submission Tool; mzIdentML; PRIDE XML. Step marker: 2.
  • 27.
    PRIDE Inspector Toolsuite
    • PRIDE Inspector: standalone tool for visualisation and validation of MS data.
    • Built on top of ms-data-core-api: open source algorithms and libraries for computational proteomics.
    • Supported file formats: mzIdentML, mzML, mzTab (PSI standards), and PRIDE XML.
    • Broad functionality.
    https://github.com/PRIDE-Utilities/ms-data-core-api
    https://github.com/PRIDE-Toolsuite/pride-inspector
    References: Wang et al., Nat. Biotechnology, 2012; Perez-Riverol et al., Bioinformatics, 2015; Perez-Riverol et al., MCP, 2016.
  • 28.
    PRIDE Inspector functionality
    • Protein view containing protein inference information
    • Quantification view
    • Multiple export options (.mgf, protein/peptide tables, mzTab file)
    • Direct access to PRIDE datasets
    • Summary and QC charts (delta m/z, precursor charges, etc.)
    • Spectra view (fragmentation table, ion series annotation)
    • Protein inference algorithm and protein groups visualisation
    Panel captions: Summary and QC charts; Peptide spectra annotation and visualisation; Protein groups inference.
  • 29.
    PRIDE components: submission process
    Diagram labels: PRIDE Converter 2; PRIDE Inspector; PX Submission Tool; mzIdentML; PRIDE XML. Step marker: 3.
  • 30.
    PX submission tool
    • Captures the mappings between the different types of files.
    • Makes the file upload process straightforward for the submitter (it transfers all the files using Aspera or FTP).
    • Command line alternative: using the Aspera file transfer protocol.
    http://www.proteomexchange.org/submission
    Diagram labels: Published; Raw; Other files.
  • 31.
    PX submission tool: screenshots
  • 32.
    Fast file transfer with Aspera
    • Aspera is the default file transfer protocol to PRIDE:
      - PX submission tool
      - Command line
    • Up to 50X faster than FTP.
    File transfer speed should not be a problem!
  • 33.
    Manuscript published detailing the process
    Ternent et al., Proteomics, 2014
    http://www.proteomexchange.org/submission
    Example dataset: PXD000764
    • Title: “Discovery of new CSF biomarkers for meningitis in children”
    • 12 runs: 4 controls and 8 infected samples
    • Identification and quantification data
  • 34.
    Public data release: when does it happen?
    • When the authors tell us to do it (the authors can also do it themselves).
    • When we find out that a dataset has been published:
      - We look for PXD identifiers in PubMed abstracts.
      - If your PXD identifier is not in the abstract, the paper may have been published while the data are still private. Let us know!
    • A new web form on the PRIDE website facilitates the process.
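Looking for PXD identifiers in abstract text can be illustrated with a regular expression. This is a hedged sketch, not the procedure the PRIDE team actually runs: the pattern assumes that identifiers consist of "PXD" followed by six digits, as in the example dataset PXD000764, and the function name and the abstract text are made up for illustration.

```python
import re
from typing import List

# Assumption: identifiers are "PXD" followed by exactly six digits (e.g. PXD000764).
PXD_PATTERN = re.compile(r"\bPXD\d{6}\b")


def find_pxd_identifiers(text: str) -> List[str]:
    """Return the distinct PXD identifiers in the text, in order of first appearance."""
    found = []
    for identifier in PXD_PATTERN.findall(text):
        if identifier not in found:
            found.append(identifier)
    return found


# Hypothetical abstract text.
abstract = (
    "The mass spectrometry data have been deposited to the ProteomeXchange "
    "Consortium via PRIDE with the dataset identifier PXD000764."
)
print(find_pxd_identifiers(abstract))
```

An abstract without an identifier returns an empty list, which corresponds to the case where the dataset stays private until the authors report the publication.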
  • 35.
    Overview
    • PRIDE Archive (in the context of ProteomeXchange and the PSI standards)
    • How to submit data to PRIDE: PRIDE tools
    • How to access data in PRIDE Archive
    • PRIDE Cluster and PRIDE Proteomes
  • 36.
    ProteomeXchange data workflow
    Diagram labels: Researcher’s results; Raw data; Metadata; Receiving repositories (PRIDE, MassIVE, jPOST, PASSEL); MS/MS data (as complete submissions); Any other workflow (mainly partial submissions); SRM data; ProteomeCentral; DATASETS; Metadata / Manuscript; Journals; Raw data; Results; Research groups; Reanalysis of datasets; Reprocessed results; UniProt/neXtProt; PeptideAtlas; GPMDB; proteomicsDB; MassIVE; Other DBs; OmicsDI; Integration with other omics datasets.
  • 37.
    ProteomeCentral: centralised portal for all PX datasets
    http://proteomecentral.proteomexchange.org/cgi/GetDataset
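A link to a single dataset page in ProteomeCentral can be built from its accession. The base URL is the one given on the slide; the `ID` query parameter is an assumption that should be checked against the live portal, and the `dataset_url` helper is a hypothetical name.

```python
import re
from urllib.parse import urlencode

# Base URL as given on the slide.
PROTEOMECENTRAL_URL = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"


def dataset_url(accession: str) -> str:
    """Build a ProteomeCentral dataset URL for a PXD accession."""
    # Assumption: accessions are "PXD" followed by six digits.
    if not re.fullmatch(r"PXD\d{6}", accession):
        raise ValueError(f"not a PXD accession: {accession!r}")
    # Assumption: the dataset is selected with an "ID" query parameter.
    return f"{PROTEOMECENTRAL_URL}?{urlencode({'ID': accession})}"


print(dataset_url("PXD000764"))
```

Validating the accession before building the URL keeps malformed input from reaching the portal.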
  • 38.
    RSS and Twitter feeds for public datasets
    http://groups.google.com/group/proteomexchange/feed/rss_v2_0_msgs.xml
    @proteomexchange
  • 39.
    Ways to access data in PRIDE Archive
    • PRIDE web interface
    • File repository
    • REST web service
    • PRIDE Inspector tool
  • 40.
    CompOmics Open Source Analysis Pipeline
    • https://github.com/compomics/searchgui: Vaudel M, Barsnes H, Berven FS, Sickmann A, Martens L: Proteomics 2011;11(5):996-9.
    • https://github.com/compomics/peptide-shaker: Vaudel M, Burkhart J, Zahedi RP, Berven FS, Sickmann A, Martens L, Barsnes H: Nature Biotechnology 2015;33(1):22-24.
  • 41.
    Reshake PRIDE data!
    Find the desired PRIDE project … inspect the project details … and start re-analyzing the data!
  • 42.
    Overview
    • PRIDE Archive (in the context of ProteomeXchange and the PSI standards)
    • How to submit data to PRIDE: PRIDE tools
    • How to access data in PRIDE Archive
    • Some examples of public data reuse
  • 43.
    Datasets are being reused more and more…
    Data download volume for PRIDE in 2015: ~200 TB.
    Vaudel et al., Proteomics, 2016
  • 44.
    Challenges for data reuse in proteomics
    • Insufficient technical and biological metadata.
    • Large computational infrastructure may be needed (e.g. when analysing many datasets together).
    • Shortage of expertise (people).
    • Lack of standardisation in the field.
  • 45.
    Data sharing in Proteomics
  • 46.
    Data sharing in Proteomics
  • 47.
    PRIDE Cluster
    • Provides an aggregated, peptide-centric view of PRIDE Archive.
    • Hypothesis: the same peptide will generate similar MS/MS spectra across experiments.
    • New spectral clustering algorithm to reliably group spectra coming from the same peptide.
    • Infers reliable identifications by comparing the submitted identifications of spectra within a cluster.
    • After clustering, a representative spectrum is built for all peptides consistently identified across different datasets.
    Griss et al., Nat. Methods, 2013; Griss et al., Nat. Methods, 2016
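Comparing the submitted identifications within a cluster can be illustrated by taking the most frequent peptide and the fraction of spectra that agree with it. This is a simplified sketch, not the PRIDE Cluster implementation: the function name and the peptide sequences are invented, and how such a fraction would be thresholded is left open.

```python
from collections import Counter
from typing import List, Tuple


def consensus_identification(peptides: List[str]) -> Tuple[str, float]:
    """Return the most frequent peptide in a cluster and the fraction of spectra supporting it."""
    if not peptides:
        raise ValueError("cluster contains no identified spectra")
    counts = Counter(peptides)
    # most_common(1) returns the single (peptide, count) pair with the highest count.
    peptide, count = counts.most_common(1)[0]
    return peptide, count / len(peptides)


# Hypothetical cluster in which every spectrum carries the same identification.
print(consensus_identification(["ELVISLIVESK"] * 880))

# Hypothetical cluster with one conflicting identification out of four.
print(consensus_identification(["ELVISLIVESK"] * 3 + ["PEPTIDEK"]))
```

A fraction of 1.0 corresponds to a cluster in which all PSMs give the same peptide identification; lower values flag clusters whose submitted identifications disagree.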
  • 48.
    Examples: one perfect cluster
    • 880 PSMs give the same peptide ID
    • 4 species
    • 28 datasets
    • Same instruments
  • 49.
    Examples: one perfect cluster (2)
  • 50.
    PRIDE Cluster as a public data mining resource
    • http://www.ebi.ac.uk/pride/cluster
    • Spectral libraries for 16 species.
    • All clustering results, as well as specific subsets of interest, are available.
    • Source code (open source) and Java API.
  • 51.
    Data sharing in Proteomics
  • 52.
    Reprocess
    • Data are reprocessed with the intention of obtaining new knowledge or providing an updated view on the results.
    • It mainly serves the same purpose as the original experiment.
    • For instance, a shotgun dataset can be reprocessed with a different algorithm or an updated sequence database.
  • 53.
    Reprocessing repositories
    • These resources collect MS raw data and reprocess them using one given analysis pipeline and an up-to-date protein sequence database.
    • Main resources: GPMDB and PeptideAtlas (ISB, Seattle).
  • 54.
    PeptideAtlas builds
    Examples of builds:
    • Human
    • Human plasma
    • Human urine
    • Drosophila
    • Mouse
    • Mouse plasma
    • Cow
    • Yeast
    • …
  • 55.
    Draft human proteome papers published in 2014
    Wilhelm et al., Nature, 2014; Kim et al., Nature, 2014
    • Two independent groups claimed to have produced the first complete draft of the human proteome by MS.
    • Some of their findings are controversial and need further validation, but they generated a lot of discussion and put proteomics in the spotlight.
    • They used many different tissues.
    Nature cover, 29 May 2014.
  • 56.
    Draft human proteome papers published in 2014
    Wilhelm et al., Nature, 2014
    • Around 60% of the data used for the analysis comes from previous experiments, most of them stored in proteomics repositories such as PRIDE/ProteomeXchange, PASSEL or MassIVE.
    • They complement those data with “exotic” tissues.
  • 57.
    OmicsDI: Portal for omics datasets
    http://www.ebi.ac.uk/Tools/omicsdi/
    • Aims to integrate ‘omics’ datasets (proteomics, transcriptomics, metabolomics and genomics at present).
    • Resources shown: PRIDE, MassIVE, jPOST, PASSEL, GPMDB, ArrayExpress, Expression Atlas, MetaboLights, Metabolomics Workbench, GNPS, EGA.
    Perez-Riverol et al., 2016, bioRxiv
  • 58.
    OmicsDI: Portal for omics datasets
    Perez-Riverol et al., 2016, bioRxiv
  • 59.
    Conclusions
    • Main characteristics of PRIDE and ProteomeXchange
    • PX/PRIDE submission workflow for MS/MS data
      - PRIDE Inspector
      - PX submission tool
    • PRIDE/ProteomeXchange has become the de facto standard for data submission and data availability in proteomics
    • Reuse/reanalysis of proteomics data: many possible applications
  • 60.
    Acknowledgements: The PRIDE Team
    Acknowledgements: People
    • Attila Csordas
    • Tobias Ternent
    • Gerhard Mayer (de.NBI)
    • Johannes Griss
    • Yasset Perez-Riverol
    • Manuel Bernal-Llinares
    • Andrew Jarnuczak
    • Former team members, especially Rui Wang, Florian Reisinger, Noemi del Toro, Jose A. Dianes & Henning Hermjakob
    All data submitters!
  • 61.
    PSI Spring Meeting 2017
    Beijing Proteome Research Center, China, April 24-26, 2017
    • April 23: 2nd PHOENIX Mini-Symposium on Frontiers of Proteomics
    • April 27: Hiking the Great Wall
    Focus topics:
    • Quality control: qcML
    • Proteogenomics formats
    • proXI: proteomics eXpression Interface
    • Privacy and proteomics data