ProteomeXchange update HUPO 2016

The ProteomeXchange Consortium: 2016 update
Dr. Juan Antonio Vizcaíno
Proteomics Team Leader
EMBL-European Bioinformatics Institute
Hinxton, Cambridge, UK

Juan A. Vizcaíno
juan@ebi.ac.uk
HUPO 2016 World Conference
Taipei, 20 September 2016
PSI Spring Meeting 2017
Beijing Proteome Research Center, China
April 24-26, 2017
April 23: 2nd PHOENIX Mini-Symposium on Frontiers of Proteomics
April 27: Hiking the Great Wall
Focus topics:
• Quality control: qcML
• Proteogenomics formats
• proXI: proteomics eXpression Interface
• Privacy and Proteomics Data

Overview
• General introduction to ProteomeXchange
• Overall submission statistics
• Updated HPP guidelines
• Specifics about MassIVE (Nuno)

ProteomeXchange: A global, distributed proteomics database
• PASSEL (SRM data)
• PRIDE (MS/MS data)
• MassIVE (MS/MS data)
• Data types: raw data (Raw), identification and quantification results (ID/Q), metadata (Meta)
• Mandatory raw data deposition since July 2015
• Goal: Development of a framework to allow standard data submission and
dissemination pipelines between the main existing proteomics repositories.
http://www.proteomexchange.org

ProteomeXchange: A global, distributed proteomics database
• PASSEL (SRM data)
• PRIDE (MS/MS data)
• MassIVE (MS/MS data)
• jPOST (MS/MS data): new in 2016
• Data types: raw data (Raw), identification and quantification results (ID/Q), metadata (Meta)
• Mandatory raw data deposition since July 2015
• Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.
http://www.proteomexchange.org

ProteomeXchange data workflow
[Figure: workflow diagram. Research groups submit researcher's results, raw data and metadata to the receiving repositories: PRIDE, MassIVE and jPOST for MS/MS data (as complete submissions) and any other workflow (mainly partial submissions), and PASSEL for SRM data. The resulting datasets are listed in ProteomeCentral, with links to journals (metadata / manuscript) and PeptideAtlas. Reanalysis of datasets produces reprocessed results (MassIVE).]

ProteomeCentral: Centralised portal for all PX datasets
http://proteomecentral.proteomexchange.org/cgi/GetDataset
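A link to a single dataset record can be built from the GetDataset URL above. The sketch below is illustrative only: the `ID` query parameter, the identifier pattern (`PXD` followed by six digits), the example accession and the helper name `dataset_url` are assumptions that do not appear on the slide. It builds the link without contacting the server.

```python
import re
from urllib.parse import urlencode

# Base URL as given on the slide.
BASE_URL = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"

# Assumed identifier pattern: "PXD" followed by six digits.
PXD_PATTERN = re.compile(r"PXD\d{6}")


def dataset_url(accession: str) -> str:
    """Build a ProteomeCentral link for one dataset (hypothetical helper)."""
    accession = accession.strip().upper()
    if not PXD_PATTERN.fullmatch(accession):
        raise ValueError(f"not a PXD-style accession: {accession!r}")
    # The "ID" parameter name is an assumption, not taken from the slide.
    return f"{BASE_URL}?{urlencode({'ID': accession})}"


# Example accession used purely for illustration.
print(dataset_url("PXD000001"))
```

Validating the accession before building the link keeps malformed identifiers from producing URLs that look plausible but resolve to nothing.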

ProteomeXchange data workflow
[Figure: extended workflow diagram. Research groups submit researcher's results, raw data and metadata to the receiving repositories: PRIDE, MassIVE and jPOST for MS/MS data (as complete submissions) and any other workflow (mainly partial submissions), and PASSEL for SRM data. The resulting datasets are listed in ProteomeCentral, with links to journals (metadata / manuscript), UniProt/neXtProt, PeptideAtlas, GPMDB, proteomicsDB and other DBs. OmicsDI provides integration with other omics datasets. Reanalysis of datasets produces reprocessed results (MassIVE).]

OmicsDI: Portal for omics datasets
http://www.ebi.ac.uk/Tools/omicsdi/
• Aims to integrate ‘omics’ datasets (proteomics, transcriptomics, metabolomics and genomics at present).
• Resources included: PRIDE, MassIVE, jPOST, PASSEL, GPMDB, ArrayExpress, Expression Atlas, MetaboLights, Metabolomics Workbench, GNPS, EGA
Perez-Riverol et al., 2016, bioRxiv

ProteomeXchange: 4,534 datasets up to 31 July 2016
Datasets by repository:
• PRIDE: 4,067
• MassIVE: 339
• PeptideAtlas/PASSEL: 115
• jPOST: 13
Publicly accessible: 2,597 datasets (57% of all):
• PRIDE: 2,334
• MassIVE: 135
• PASSEL: 115
• jPOST: 13
Datasets per year:
• 2012: 102
• 2013: 527
• 2014: 963
• 2015: 1,758
• 2016 (to the end of July): 1,184
Countries with at least 100 datasets:
• USA: 1,105
• Germany: 546
• United Kingdom: 411
• China: 356
• France: 229
• Netherlands: 188
• Canada: 178
• Switzerland: 150
• Australia: 125
• Spain: 123
• Denmark: 123
• Japan: 117
• Sweden: 101
Species studied in at least 100 datasets (936 reported taxa in total):
• Homo sapiens: 2,010
• Mus musculus: 604
• Saccharomyces cerevisiae: 191
• Arabidopsis thaliana: 140
• Rattus norvegicus: 127
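The totals on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts shown above; the variable names are illustrative, and the per-repository shares it prints are derived values that are not reported on the slide.

```python
# Counts copied from the slide (status as of 31 July 2016).
total = 4534
by_repository = {"PRIDE": 4067, "MassIVE": 339, "PASSEL": 115, "jPOST": 13}
public_by_repository = {"PRIDE": 2334, "MassIVE": 135, "PASSEL": 115, "jPOST": 13}
by_year = {2012: 102, 2013: 527, 2014: 963, 2015: 1758, 2016: 1184}

# Both breakdowns should add up to the headline total.
assert sum(by_repository.values()) == total
assert sum(by_year.values()) == total

# Overall share of publicly accessible datasets.
public_total = sum(public_by_repository.values())
public_share = 100 * public_total / total
print(f"public: {public_total} of {total} ({public_share:.0f}%)")

# Derived (not on the slide): public share within each repository.
for repo, n_public in public_by_repository.items():
    share = 100 * n_public / by_repository[repo]
    print(f"{repo}: {n_public} of {by_repository[repo]} public ({share:.0f}%)")
```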

Datasets are increasingly being reused
Data download volume for PRIDE in 2015: ~ 200 TB
Vaudel et al., Proteomics, 2016

HPP guidelines version 2.1

Complete vs Partial submissions: processed results
For complete submissions, the spectra can be connected to the processed identification results and visualized.

Complete vs Partial submissions: experimental metadata
General experimental metadata about the projects is similar for complete and partial submissions.
However, assay-level information is less detailed in partial submissions.

iProX: an observer of the ProteomeXchange Consortium
• Proteome data sharing platform in China
• Focus:
  • Collection and sharing of proteome experiment raw data
  • Standardized metadata of proteome experiments
  • Visualization of proteome datasets
• Provides:
  • A user-friendly data submission pipeline
  • Structured management of datasets
  • An effective user authority system
  • Standardized metadata collection
  • Powerful computing, storage, and network resources to support the pipeline
  • Remote data backup and synchronous update
www.iprox.org

MassIVE update
Mingxun Wang (1,2,4), Jeremy Carver (1,4), Nuno Bandeira (1-4)
(1) Center for Computational Mass Spectrometry
(2) Computer Science and Engineering
(3) Skaggs School of Pharmacy and Pharmaceutical Sciences
(4) University of California, San Diego
http://massive.ucsd.edu

http://massive.ucsd.edu
http://proteomics.ucsd.edu
MassIVE Interactivity
• MassIVE = Mass spectrometry Interactive Virtual Environment

MassIVE reanalysis
• Community knowledge requires reproducible, well-characterized results
• MS-GF+ standard database search
• Reanalyzed 15 TB of human data with ~185M MS/MS spectra
• 79 million new FDR-controlled PSMs
• 3.6 million modified versions of 2.8 million unique peptide sequences
• CPTAC colon cancer dataset available with 5 different result sets:
  • [Original] Imported CPTAC results: 6.9M PSMs
  • [Reanalysis] MS-GF+ database search: 8.9M PSMs, 70k mod variants (169k total)
  • [Reanalysis] Spectral library search (MSPLIT): 10M PSMs, including 387k mixture spectra
  • [Reanalysis] Proteogenomics searches of TCGA transcriptomics sequences (Enosi): 6.8M total PSMs, 19,728 proteogenomic events
  • [Reanalysis] Blind modification search (MODa): 7.8M PSMs, 2.8M PSMs for 221k mod variants (306k total), 203k new mod variants (unique modified peptides)
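The PSM counts listed above for the five CPTAC colon cancer result sets can be put on a common scale by dividing each by the originally imported count. The sketch below does only that arithmetic, using the rounded figures from the slide; the dictionary labels are shortened for illustration. The ratios are a rough summary rather than a benchmark, since the searches differ in strategy and search space.

```python
# PSM counts in millions, as listed on the slide (rounded values).
psms_millions = {
    "Original (imported CPTAC results)": 6.9,
    "MS-GF+ database search": 8.9,
    "Spectral library search (MSPLIT)": 10.0,
    "Proteogenomics search (Enosi)": 6.8,
    "Blind modification search (MODa)": 7.8,
}

# Use the originally imported results as the reference point.
baseline = psms_millions["Original (imported CPTAC results)"]
relative = {name: count / baseline for name, count in psms_millions.items()}

for name, ratio in relative.items():
    print(f"{name}: {psms_millions[name]:.1f}M PSMs, {ratio:.2f}x original")
```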

MassIVE: Do it yourself
1. MS-GF+ - Database search engine
2. MSPLIT - Spectral library search engine
3. ENOSI - Proteogenomic search engine
4. MODa - Multi-blind modification database search engine
5. Spectral Networks - Spectral alignment-based analysis and propagation of identifications
6. Multi-pass - MSPLIT, MSGFDB, MODa cascade search workflow
7. MSGFDB - Database search engine
8. MSPLIT-DIA - Spectral library search for SWATH
9. Upload your own! (mzIdentML, mzTab, TSV)

Check what others think the spectrum is: MassIVE Search
One-stop search across tens of millions of PSMs
• Find peptides, proteins, PTMs
• Agreement in spectrum identification?
• Original and reanalysis results

What can you do?
• How can the community work together to reveal the whole human proteome?
• Mass spectrometrists → share Data
  • At least: partial submissions with raw mass spectrometry data and enough metadata to allow for reanalysis
  • Especially useful: rare tissues/conditions or very deep acquisition
• Biologists → share Knowledge
  • At least: complete submissions with FDR-filtered results in open format (mzIdentML or mzTab)
  • Especially useful: human-curated knowledge of proteins, PTMs, endogenous peptides, etc.
• Bioinformaticians → share Reanalyses
  • At least: FDR-filtered results in open format (mzIdentML or mzTab)
  • Especially useful: algorithms that identify new types of PSMs (e.g., PTM-specific, mixtures)

Acknowledgements: People
Attila Csordas
Tobias Ternent
Gerhard Mayer (de.NBI)
Yasset Perez-Riverol
Manuel Bernal-Llinares
Andrew Jarnuczak
Former team members, especially:
Rui Wang
Florian Reisinger
Noemi del Toro
Jose A. Dianes
Henning Hermjakob
Acknowledgements: The PRIDE Team and all PX partners
All data submitters!
Eric Deutsch
Zhi Sun
David Campbell
Nuno Bandeira
Mingxun Wang
Jeremy Carver
Yasushi Ishihama
Shujiro Okuda
Shin Kawano
Follow new datasets @proteomexchange
