PRIDE and ProteomeXchange – Making 
proteomics data accessible and reusable 
Dr. Yasset Perez-Riverol 
Twitter: @ypriverol 
Github: ypriverol 
Bioinformatician - PRIDE Group 
Proteomics Services Team 
EMBL-EBI 
Hinxton, Cambridge, UK
Proteomics Services, EMBL-EBI
yperez@ebi.ac.uk
BioHackathon 2014
Miyagi, Japan (Nov 9-14, 2014)
Figure: EMBL-EBI resources: UniProt (protein sequences), IntAct (interactions), PRIDE (MS/MS data), Reactome (pathways), BioModels.
Overview 
• The ProteomeXchange (PX) consortium 
• PRIDE and ProteomeXchange 
• PRIDE Components. 
• Current and future developments.
ProteomeXchange Consortium 
• Goal: Development of a framework to allow 
standard data submission and dissemination 
pipelines between the main existing proteomics 
repositories. 
• Includes PeptideAtlas (ISB, Seattle), PRIDE 
(Cambridge, UK) and MassIVE (UCSD, San Diego). 
• Common identifier space (PXD identifiers) 
• Two supported data workflows: MS/MS and SRM. 
• Main objective: Make data available and 
reusable. 
http://www.proteomexchange.org 
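The PXD accessions quoted later in this deck (for example PXD000561 and PXD000865) share a fixed shape. Here is a minimal Python sketch of a format check. It assumes, from those examples rather than from a specification, that an accession is "PXD" followed by exactly six digits. `parse_pxd` is a hypothetical helper name.

```python
import re

# Assumed format, inferred from accessions such as PXD000561:
# the literal prefix "PXD" followed by exactly six digits.
PXD_PATTERN = re.compile(r"PXD(\d{6})")


def parse_pxd(accession: str) -> int:
    """Return the numeric part of a PXD accession or raise ValueError."""
    match = PXD_PATTERN.fullmatch(accession.strip().upper())
    if match is None:
        raise ValueError(f"not a PXD accession: {accession!r}")
    return int(match.group(1))


if __name__ == "__main__":
    for acc in ["PXD000561", "pxd000865", "PXD12"]:
        try:
            print(acc, "->", parse_pxd(acc))
        except ValueError as err:
            print(err)
```

The sketch normalises case and whitespace before matching. A stricter validator could reject lowercase input instead.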
ProteomeXchange data workflow
Figure elements: receiving repositories PRIDE (MS/MS data), MassIVE (MS/MS data) and PASSEL (SRM data); inputs: researcher's results, raw data*, metadata/manuscript; other components: ProteomeCentral, journals, UniProt/neXtProt, PeptideAtlas, GPMDB, other DBs, reprocessed results.
Vizcaíno et al., Nat Biotechnol, 2014
MassIVE (UCSD)
http://proteomics.ucsd.edu/service/massive/
• Joined ProteomeXchange recently, in June 2014.
PASSEL: repository for SRM data
http://www.peptideatlas.org/passel/
• Suitable for SRM assays.
• Part of the PeptideAtlas set of resources.
Farrah et al., Proteomics, 2012
PRIDE: PRoteomics IDEntifications database
http://www.ebi.ac.uk/pride/archive/
Vizcaíno et al., Nucleic Acids Res, 2014
PX submission workflow for MS/MS data
1. Mass spectrometer output files: raw data (binary files) or peak list spectra in a standardized format (mzML, mzXML).
2. Result files:
a. Complete submissions: result files can be converted to PRIDE XML or to the mzIdentML data standard.
b. Partial submissions: for workflows not yet supported by PRIDE, search engine output files are stored and provided in their original form.
3. Metadata: a sufficiently detailed description of sample origin, workflow, instrumentation and submitter, annotated with ontologies and controlled vocabularies.
4. Other files (optional):
a. QUANT: quantification-related results
b. PEAK: peak list files
c. OTHER: any other file type
d. FASTA
Ternent et al., Proteomics, 2014
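The file categories above can be turned into a first-pass check of an upload. Here is a minimal Python sketch. The suffix-to-type rules are hypothetical, and the PX submission tool's actual rules may differ. A submission is treated as complete only if it contains at least one standard result file (mzIdentML or PRIDE XML). Otherwise it is treated as partial.

```python
from typing import Iterable

# Hypothetical (suffix, type) rules, checked in order; not the PX tool's real table.
RULES = [
    (".mzid", "RESULT"),        # mzIdentML
    (".pride.xml", "RESULT"),   # assumed naming convention for PRIDE XML files
    (".pep.xml", "SEARCH"),     # native search engine output, kept as-is
    (".dat", "SEARCH"),         # e.g. Mascot result files
    (".msf", "SEARCH"),         # e.g. Proteome Discoverer result files
    (".mzml", "PEAK"),
    (".mzxml", "PEAK"),
    (".mgf", "PEAK"),
    (".raw", "RAW"),
    (".wiff", "RAW"),
    (".fasta", "FASTA"),
]


def classify(filename: str) -> str:
    """Return the first matching file type, or OTHER if no rule applies."""
    name = filename.lower()
    for suffix, file_type in RULES:
        if name.endswith(suffix):
            return file_type
    return "OTHER"


def submission_kind(filenames: Iterable[str]) -> str:
    """Complete if any standard result file is present, otherwise partial."""
    types = {classify(f) for f in filenames}
    return "complete" if "RESULT" in types else "partial"
```

For example, `submission_kind(["a.raw", "b.mzid", "c.mgf"])` gives `"complete"`, while a set with only `a.raw` and a Mascot `b.dat` gives `"partial"`. QUANT files have no rule here and fall through to OTHER.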
Complete submissions using mzIdentML
An increasing number of tools support export to mzIdentML 1.1.
Figure: search engine results + MS files → mzIdentML.
Search engines:
- Mascot
- MSGF+
- MyriMatch and related tools from D. Tabb’s lab
- OpenMS
- PEAKS
- ProCon (Proteome Discoverer, Sequest)
- Scaffold
- TPP via the idConvert tool (ProteoWizard)
- ProteinPilot (planned by the end of 2014)
- Others: library for X!Tandem conversion, lab internal pipelines, …
Referenced spectra files need to be submitted as well (all open formats are supported).
Updated list: http://www.psidev.info/tools-implementing-mzIdentML
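Referenced spectra files must accompany an mzIdentML file. A submitter may therefore want to check that every file the document points to is actually in the upload. Here is a minimal Python sketch using only the standard library. It assumes the references are the `location` attributes of `SpectraData` elements, as in mzIdentML 1.1, and that matching by file name alone is good enough. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def _basename(location: str) -> str:
    # Handle both Windows-style and URI/POSIX-style paths.
    return location.replace("\\", "/").rstrip("/").rsplit("/", 1)[-1]


def referenced_spectra(mzid_xml: str) -> List[str]:
    """Collect the location attribute of every SpectraData element."""
    root = ET.fromstring(mzid_xml)
    locations = []
    for elem in root.iter():
        # Compare local tag names so the check works with or without a namespace.
        if elem.tag.rsplit("}", 1)[-1] == "SpectraData":
            location = elem.get("location")
            if location:
                locations.append(location)
    return locations


def missing_spectra(mzid_xml: str, submitted_files: List[str]) -> List[str]:
    """Return referenced spectra locations whose file name was not submitted."""
    submitted = {_basename(f) for f in submitted_files}
    return [loc for loc in referenced_spectra(mzid_xml)
            if _basename(loc) not in submitted]
```

Matching on file names tolerates the absolute paths that many tools write into `location`. However, it cannot tell apart two different files that share the same name.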
mzTab
http://mztab.googlecode.com
• Metadata (key-value pairs): basic information about experiment and sample
• Protein (table-based): basic information about protein identifications
• Peptide (table-based): information about quantified peptides
• PSM (table-based): information about identified spectra
• Small Molecule (table-based): basic information about identified small molecules
J. Griss et al., MCP, 2014
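In an mzTab file, every line begins with a three-letter prefix that identifies the section it belongs to. Here is a minimal Python sketch of a reader. It assumes the prefixes of the mzTab 1.0 specification and tab-separated fields. The prefixes are MTD for metadata, plus PRH/PRT, PEH/PEP, PSH/PSM and SMH/SML for the header and data rows of the four tables. This is an illustration, not the PRIDE or jmzTab implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Header-line prefix -> data-row prefix for the four table sections.
TABLE_PREFIXES = {"PRH": "PRT", "PEH": "PEP", "PSH": "PSM", "SMH": "SML"}
ROW_PREFIXES = set(TABLE_PREFIXES.values())


def parse_mztab(text: str) -> Tuple[Dict[str, str], Dict[str, List[Dict[str, str]]]]:
    """Split an mzTab document into a metadata dict and per-section row dicts."""
    metadata: Dict[str, str] = {}
    headers: Dict[str, List[str]] = {}
    tables: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        prefix = fields[0]
        if prefix == "MTD" and len(fields) >= 3:
            metadata[fields[1]] = fields[2]            # key-value pair
        elif prefix in TABLE_PREFIXES:
            headers[TABLE_PREFIXES[prefix]] = fields[1:]  # column names
        elif prefix in ROW_PREFIXES:
            columns = headers.get(prefix)
            if columns is None:
                raise ValueError(f"{prefix} row appears before its header")
            tables[prefix].append(dict(zip(columns, fields[1:])))
        # COM (comment) lines and unknown prefixes are ignored.
    return metadata, dict(tables)
```

Keeping each row as a column-name dictionary mirrors the table-based layout of the Protein, Peptide, PSM and Small Molecule sections. For large files, a streaming reader would be preferable.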
PRIDE Components: Submission Process 
• PRIDE Converter
• PRIDE Inspector
• PX Submission Tool
PRIDE Components: PX submission tool 
• Captures the mappings between the different types of files.
• Adds the mandatory metadata annotation.
• Makes the file upload process straightforward for the submitter: all files are transferred using Aspera or FTP.
• A command-line alternative is available, but some scripting is needed.
http://www.proteomexchange.org/submission
PRIDE Inspector 2.0
Available for complete submissions
PRIDE Inspector 2.0 supports:
- PRIDE XML
- mzIdentML + all types of spectra files
- mzML
- mzTab Quantitation (work in progress)
https://github.com/PRIDE-Toolsuite/
Wang et al., Nat Biotechnol, 2012
PRIDE Components: Pipelines and Visualization
Submission validation pipeline
• QC of submitted files.
• Metadata check.
Submission pipeline
• Add project to database (file locations, general statistics, metadata).
Publication pipeline
• Conversion of files to mzTab.
• Conversion of spectra peak lists to MGF.
• Indexing of the information in a Solr server.
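One step of the publication pipeline converts spectra peak lists to MGF. Here is a minimal Python sketch of an MGF writer. It assumes a simple in-memory `Spectrum` record, which is a hypothetical type and not a PRIDE class. It also writes only the common TITLE, PEPMASS and CHARGE header keys.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Spectrum:
    title: str
    precursor_mz: float
    charge: Optional[int] = None  # omitted from the output when unknown
    peaks: List[Tuple[float, float]] = field(default_factory=list)  # (m/z, intensity)


def to_mgf(spectra: List[Spectrum]) -> str:
    """Serialise spectra as MGF: one BEGIN IONS ... END IONS block per spectrum."""
    lines = []
    for s in spectra:
        lines.append("BEGIN IONS")
        lines.append(f"TITLE={s.title}")
        lines.append(f"PEPMASS={s.precursor_mz}")
        if s.charge is not None:
            sign = "-" if s.charge < 0 else "+"
            lines.append(f"CHARGE={abs(s.charge)}{sign}")
        for mz, intensity in s.peaks:
            lines.append(f"{mz} {intensity}")
        lines.append("END IONS")
        lines.append("")  # blank line between spectra
    return "\n".join(lines)
```

MGF keeps only peak lists and a few header fields, which makes it a simple common format for downstream tools. Richer metadata stays in the mzTab and source files.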
PRIDE Components: Services & Web components
ProteomeCentral: Portal for all PX datasets 
http://proteomecentral.proteomexchange.org/cgi/GetDataset 
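ProteomeCentral serves one page per dataset. Here is a minimal Python sketch of building that page address. It assumes, although the slide does not state it, that the GetDataset script takes the PXD accession as an `ID` query parameter. `dataset_url` and `fetch_dataset_page` are hypothetical names.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"


def dataset_url(accession: str) -> str:
    # Assumption: the accession is passed as the "ID" query parameter.
    return f"{BASE_URL}?{urlencode({'ID': accession})}"


def fetch_dataset_page(accession: str, timeout: float = 30.0) -> str:
    """Download the dataset page as text (requires network access)."""
    with urlopen(dataset_url(accession), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, `dataset_url("PXD000561")` produces the address of the "A draft map of the human proteome" dataset page, under the parameter assumption above.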
ProteomeXchange: 1329 datasets up until October 2014 
Origin: 
271 USA 
166 Germany 
115 United Kingdom 
73 Switzerland 
70 China 
68 Netherlands 
67 France 
55 Canada 
44 Spain 
42 Belgium 
33 Sweden 
31 Australia 
31 Denmark 
31 Japan 
20 India 
20 Norway 
19 Taiwan 
17 Ireland 
16 Austria 
14 Finland 
14 Italy 
12 Republic of Korea 
11 Brazil 
9 Russia 
8 Israel 
7 Singapore … 
Type: 
437 PRIDE complete 
792 PRIDE partial 
63 PeptideAtlas/PASSEL complete 
14 MassIVE 
23 reprocessed 
Publicly Accessible: 
691 datasets, 52% of all 
86% PRIDE 
12% PASSEL 
2% MassIVE 
Top Species studied by at least 10 
datasets: 
577 Homo sapiens 
165 Mus musculus 
56 Saccharomyces cerevisiae 
53 Arabidopsis thaliana 
29 Rattus norvegicus 
22 Escherichia coli 
17 Bos taurus 
16 Mycobacterium tuberculosis 
13 Oryza sativa 
13 Drosophila melanogaster 
13 Glycine max 
~ 290 species in total 
Data volume: 
Total: ~55 TB 
Number of all files: ~131,000 
PXD000320-324: ~ 5 TB 
PXD000065: ~ 1.4TB 
Datasets/year: 
2012: 102 
2013: 527 
2014: 700
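The breakdowns on this slide can be cross-checked against the total of 1329 datasets. Here is a short Python check that uses only the numbers above. The 2014 count runs only up to October.

```python
TOTAL = 1329
BY_TYPE = {
    "PRIDE complete": 437,
    "PRIDE partial": 792,
    "PeptideAtlas/PASSEL complete": 63,
    "MassIVE": 14,
    "reprocessed": 23,
}
BY_YEAR = {2012: 102, 2013: 527, 2014: 700}
PUBLIC = 691


def summary() -> dict:
    pride = BY_TYPE["PRIDE complete"] + BY_TYPE["PRIDE partial"]
    return {
        "type_sum": sum(BY_TYPE.values()),   # should equal TOTAL
        "year_sum": sum(BY_YEAR.values()),   # should equal TOTAL
        "public_pct": round(100 * PUBLIC / TOTAL),
        "pride_pct": round(100 * pride / TOTAL),
    }


print(summary())
# {'type_sum': 1329, 'year_sum': 1329, 'public_pct': 52, 'pride_pct': 92}
```

Both breakdowns add up to 1329, and the 691 public datasets match the quoted 52%. PRIDE complete and partial submissions together account for about 92% of all datasets, which is consistent with the conclusion that PRIDE holds most of the MS/MS datasets.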
Journals and Data Deposition
Figure: number of submissions per journal.
Data Access?
Figure: total numbers.
Future developments
• Make the data reusable.
• Integration of different protein expression resources:
  • PRIDE
  • PeptideAtlas
  • ProteomicsDB
  • Human Proteome Map
PXD identifier   Hits     Dataset title
PXD000561        153512   A draft map of the human proteome
PXD000865        51639    Mass spectrometry based draft of the human proteome
Web Services: PROXI
Figure: PROXI clients, PROXI web services, registry, repositories & databases, data.
Perez-Riverol Y, Proteomics, 2014
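The PROXI diagram shows clients reaching several repositories through a registry. Here is a minimal Python sketch of that fan-out pattern: the client looks up every registered repository and builds the same dataset request for each. The registry entries, the base URLs and the `/datasets/{accession}` path are hypothetical placeholders, not the PROXI specification.

```python
from typing import Dict
from urllib.parse import quote

# Hypothetical registry: repository name -> base URL of its web service.
REGISTRY: Dict[str, str] = {
    "repoA": "https://repo-a.example.org/proxi/v1",
    "repoB": "https://repo-b.example.org/proxi/v1/",
}


def dataset_queries(accession: str, registry: Dict[str, str] = REGISTRY) -> Dict[str, str]:
    """Build one dataset request URL per registered repository."""
    return {
        # Hypothetical path; trailing slashes on base URLs are normalised.
        name: f"{base.rstrip('/')}/datasets/{quote(accession, safe='')}"
        for name, base in registry.items()
    }
```

Placing the repository list in a registry means a client never needs a hard-coded list of repositories. A new consortium member only has to be added to the registry.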
Conclusions
• ProteomeXchange is widely used.
• PRIDE contains most of the MS/MS datasets.
• ProteomeXchange now has a new consortium member: MassIVE (UCSD).
• Around half of the datasets are already public.
• Several open-source tools are available to facilitate the submission process.
• File transfer speed should not be a problem (Aspera support).
• Data deposition enables and promotes data reuse.
• ProteomeXchange is open to new members.
Acknowledgements 
PRIDE Team 
Juan A. Vizcaino (Group Leader) 
Attila Csordas 
Rui Wang 
Florian Reisinger 
Jose A. Dianes 
Tobias Ternent 
Yasset Perez-Riverol 
Noemi del Toro 
Henning Hermjakob 
PeptideAtlas Team (ISB, Seattle) 
Eric Deutsch 
Terry Farrah 
Zhi Sun 
MassIVE
Nuno Bandeira 
And many other PX partners and 
stakeholders 
Questions?

Environmental Biotechnology Topic:- Microbial Biosensorsonawaneprad
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringPrajakta Shinde
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxEran Akiva Sinbar
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx023NiWayanAnggiSriWa
 
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024Jene van der Heide
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxBerniceCayabyab1
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPirithiRaju
 

Recently uploaded (20)

ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTXALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptx
 
basic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomybasic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomy
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
Thermodynamics ,types of system,formulae ,gibbs free energy .pptx
Thermodynamics ,types of system,formulae ,gibbs free energy .pptxThermodynamics ,types of system,formulae ,gibbs free energy .pptx
Thermodynamics ,types of system,formulae ,gibbs free energy .pptx
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technology
 
Let’s Say Someone Did Drop the Bomb. Then What?
Let’s Say Someone Did Drop the Bomb. Then What?Let’s Say Someone Did Drop the Bomb. Then What?
Let’s Say Someone Did Drop the Bomb. Then What?
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx
 
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdf
 

PRIDE and ProteomeXchange – Making proteomics data accessible and reusable

  • 1. PRIDE and ProteomeXchange – Making proteomics data accessible and reusable
    Dr. Yasset Perez-Riverol (Twitter: @ypriverol; GitHub: ypriverol)
    Bioinformatician, PRIDE Group, Proteomics Services Team, EMBL-EBI, Hinxton, Cambridge, UK
  • 2. Proteomics Services, EMBL-EBI
    Yasset Perez-Riverol (yperez@ebi.ac.uk), BioHackathon 2014, Miyagi, Japan (Nov 9-14, 2014)
    Resources: UniProt (protein sequences), IntAct (interactions), PRIDE (MS/MS data), Reactome (pathways), BioModels
  • 3. Overview
    • The ProteomeXchange (PX) consortium
    • PRIDE and ProteomeXchange
    • PRIDE components
    • Current and future developments
  • 4. ProteomeXchange Consortium (http://www.proteomexchange.org)
    • Goal: develop a framework for standard data submission and dissemination pipelines across the main existing proteomics repositories.
    • Members: PeptideAtlas (ISB, Seattle), PRIDE (Cambridge, UK) and MassIVE (UCSD, San Diego).
    • Common identifier space (PXD identifiers).
    • Two supported data workflows: MS/MS and SRM.
    • Main objective: make data available and reusable.
  • 5. ProteomeXchange data workflow
    [Diagram components: researcher's results, raw data* and metadata/manuscript; receiving repositories PRIDE (MS/MS data), MassIVE (MS/MS data) and PASSEL (SRM data); ProteomeCentral; journals; UniProt/neXtProt, PeptideAtlas, GPMDB and other DBs; reprocessed results.]
    Vizcaíno et al., Nat Biotechnol, 2014
  • 6. MassIVE (UCSD) http://proteomics.ucsd.edu/service/massive/
    • Joined ProteomeXchange in June 2014.
  • 7. PASSEL: repository for SRM data (http://www.peptideatlas.org/passel/)
    • Suitable for SRM assays.
    • Part of the PeptideAtlas set of resources.
    Farrah et al., Proteomics, 2012
  • 8. PRIDE: PRoteomics IDEntifications database (http://www.ebi.ac.uk/pride/archive/)
    Vizcaíno et al., Nucleic Acids Research, 2014
  • 9. PX submission workflow for MS/MS data
    1. Mass spectrometer output files: raw data (binary files) or peak list spectra in a standardized format (mzML, mzXML).
    2. Result files:
       a. Complete submissions: result files can be converted to PRIDE XML or to the mzIdentML data standard.
       b. Partial submissions: for workflows not yet supported by PRIDE, search engine output files are stored and provided in their original form.
    3. Metadata: a sufficiently detailed description of sample origin, workflow, instrumentation and submitter, based on ontologies and controlled vocabularies.
    4. Other (optional) files: QUANT (quantification-related results), PEAK (peak list files), FASTA, and OTHER (any other file type).
    Ternent et al., Proteomics, 2014
  • 10. Complete submissions using mzIdentML
    An increasing number of tools support export of search engine results (plus MS files) to mzIdentML 1.1:
      - Mascot
      - MSGF+
      - MyriMatch and related tools from D. Tabb's lab
      - OpenMS
      - PEAKS
      - ProCon (Proteome Discoverer, SEQUEST)
      - Scaffold
      - TPP via the idConvert tool (ProteoWizard)
      - ProteinPilot (planned by the end of 2014)
      - Others: a library for X!Tandem conversion, lab-internal pipelines, …
    The referenced spectra files need to be submitted as well (all open formats are supported).
    Updated list: http://www.psidev.info/tools-implementing-mzIdentML#
  • 11. mzTab (http://mztab.googlecode.com)
    • Metadata (key-value pairs): basic information about the experiment and sample
    • Protein (table-based): basic information about protein identifications
    • Peptide (table-based): information about quantified peptides
    • PSM (table-based): information about identified spectra
    • Small Molecule (table-based): basic information about identified small molecules
    J. Griss et al., MCP, 2014
  • 12. PRIDE components: submission process
    PRIDE Converter, PRIDE Inspector, PX Submission Tool
  • 13. PRIDE components: PX Submission Tool (http://www.proteomexchange.org/submission)
    • Captures the mappings between the different types of files.
    • Adds the mandatory metadata annotation.
    • Makes the file upload straightforward for the submitter (it transfers all files using Aspera or FTP).
    • Command-line alternative: some scripting is needed.
  • 14. PRIDE Inspector 2.0 (available for complete submissions; https://github.com/PRIDE-Toolsuite/)
    PRIDE Inspector 2.0 supports:
      - PRIDE XML
      - mzIdentML + all types of spectra files
      - mzML
      - mzTab Quantitation (work in progress)
    Wang et al., Nat. Biotechnology, 2012
  • 15. PRIDE components: pipelines and visualization
    • Submission validation pipeline: QC of the submitted files; metadata check.
    • Submission pipeline: add the project to the database (file locations, general statistics, metadata).
    • Publication pipeline: conversion of result files to mzTab; conversion of spectrum peak lists to MGF; indexing of the information in a Solr server.
  • 16. PRIDE components: services and web components
  • 17. ProteomeCentral: portal for all PX datasets (http://proteomecentral.proteomexchange.org/cgi/GetDataset)
  • 18. ProteomeXchange: 1329 datasets up to October 2014
    • Origin: 271 USA, 166 Germany, 115 United Kingdom, 73 Switzerland, 70 China, 68 Netherlands, 67 France, 55 Canada, 44 Spain, 42 Belgium, 33 Sweden, 31 Australia, 31 Denmark, 31 Japan, 20 India, 20 Norway, 19 Taiwan, 17 Ireland, 16 Austria, 14 Finland, 14 Italy, 12 Republic of Korea, 11 Brazil, 9 Russia, 8 Israel, 7 Singapore, …
    • Type: 437 PRIDE complete, 792 PRIDE partial, 63 PeptideAtlas/PASSEL complete, 14 MassIVE, 23 reprocessed
    • Publicly accessible: 691 datasets (52% of all): 86% PRIDE, 12% PASSEL, 2% MassIVE
    • Top species (studied by at least 10 datasets): 577 Homo sapiens, 165 Mus musculus, 56 Saccharomyces cerevisiae, 53 Arabidopsis thaliana, 29 Rattus norvegicus, 22 Escherichia coli, 17 Bos taurus, 16 Mycobacterium tuberculosis, 13 Oryza sativa, 13 Drosophila melanogaster, 13 Glycine max; ~290 species in total
    • Data volume: ~55 TB in total; ~131,000 files; PXD000320-324: ~5 TB; PXD000065: ~1.4 TB
    • Datasets per year: 2012: 102; 2013: 527; 2014: 700
  • 19. Journals and data deposition
    [Chart: number of submissions per journal]
  • 20. Data access
    [Chart: total numbers]
  • 21. Future developments
    • Make the data reusable.
    • Integration of different protein expression resources: PRIDE, PeptideAtlas, ProteomicsDB, Human Proteome Map.
      PXD identifier | Hits   | Dataset title
      PXD000561      | 153512 | A draft map of the human proteome
      PXD000865      | 51639  | Mass spectrometry based draft of the human proteome
  • 22. Web services: PROXI
    [Diagram components: PROXI web services, PROXI clients, repositories & databases, registry, data.]
    Perez-Riverol Y, Proteomics, 2014
  • 23. Conclusions
    • ProteomeXchange is widely used.
    • PRIDE contains most of the MS/MS datasets.
    • It now has a new consortium member: MassIVE (UCSD).
    • Around half of the datasets are already public.
    • Different open source tools are available to facilitate the process:
      - File transfer speed should not be a problem (Aspera support).
    • Data deposition enables and promotes data reuse.
    • ProteomeXchange is open to new members.
  • 24. Acknowledgements
    PRIDE Team: Juan A. Vizcaíno (Group Leader), Attila Csordas, Rui Wang, Florian Reisinger, Jose A. Dianes, Tobias Ternent, Yasset Perez-Riverol, Noemi del Toro, Henning Hermjakob
    PeptideAtlas Team (ISB, Seattle): Eric Deutsch, Terry Farrah, Zhi Sun
    MassIVE: Nuno Bandeira
    And many other PX partners and stakeholders
  • 25. Questions?
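Slides 4 and 17 mention the common PXD identifier space and the ProteomeCentral GetDataset portal. The following is a minimal Python sketch for checking an accession and building a portal link. It assumes two things the slides do not state: that a PXD identifier is "PXD" followed by six digits (inferred from examples such as PXD000561 on slide 21), and that GetDataset takes the accession in an `ID` query parameter. Both should be checked against the ProteomeXchange documentation.

```python
import re
from urllib.parse import urlencode

# Assumption: "PXD" + six digits, inferred from example IDs such as PXD000561.
PXD_PATTERN = re.compile(r"PXD\d{6}")

# Portal URL from slide 17.
GET_DATASET_URL = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"


def is_pxd_identifier(accession: str) -> bool:
    """Return True if the accession matches the assumed PXD pattern."""
    return PXD_PATTERN.fullmatch(accession.strip()) is not None


def dataset_url(accession: str) -> str:
    """Build a ProteomeCentral link; the `ID` parameter name is an assumption."""
    accession = accession.strip()
    if not is_pxd_identifier(accession):
        raise ValueError(f"not a PXD identifier: {accession!r}")
    return f"{GET_DATASET_URL}?{urlencode({'ID': accession})}"


if __name__ == "__main__":
    print(dataset_url("PXD000561"))
```

Validating before building the URL keeps malformed accessions from producing links that look plausible but resolve to nothing.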
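The submission workflow on slide 9 separates complete from partial submissions based on the format of the result files. This is a hypothetical Python sketch of that decision. The `SubmissionFile` record and its free-text `fmt` label are illustrative and are not part of the PX Submission Tool. The sketch also assumes a rule that the slide leaves open: a submission counts as complete only when every result file is PRIDE XML or mzIdentML.

```python
from dataclasses import dataclass
from typing import Iterable

# File categories named on slide 9 (RAW and RESULT plus the optional types).
KNOWN_TYPES = {"RAW", "RESULT", "PEAK", "QUANT", "FASTA", "OTHER"}
# Mass spectrometer output: raw data or peak list spectra (slide 9, item 1).
SPECTRA_TYPES = {"RAW", "PEAK"}
# Result formats that make a submission "complete" (slide 9, item 2a).
COMPLETE_FORMATS = {"PRIDE XML", "mzIdentML"}


@dataclass(frozen=True)
class SubmissionFile:
    """Hypothetical record for one submitted file."""
    name: str
    file_type: str  # one of KNOWN_TYPES
    fmt: str        # free-text format label, e.g. "mzIdentML" or "Mascot DAT"


def classify_submission(files: Iterable[SubmissionFile]) -> str:
    """Return "complete" or "partial"; raise ValueError if required files are missing."""
    files = list(files)
    unknown = {f.file_type for f in files} - KNOWN_TYPES
    if unknown:
        raise ValueError(f"unknown file types: {sorted(unknown)}")
    if not any(f.file_type in SPECTRA_TYPES for f in files):
        raise ValueError("missing mass spectrometer output (raw data or peak lists)")
    results = [f for f in files if f.file_type == "RESULT"]
    if not results:
        raise ValueError("missing result files")
    # Assumed rule: complete only if every result file is PRIDE XML or mzIdentML.
    if all(f.fmt in COMPLETE_FORMATS for f in results):
        return "complete"
    # Otherwise the search engine output is kept in its original form (partial).
    return "partial"
```

With this rule, adding a single unsupported search engine output to an otherwise complete submission makes the whole submission partial.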
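Slide 10 notes that the spectra files referenced by an mzIdentML file must also be submitted. The following Python sketch lists those references and reports any that are missing. It assumes that each referenced file appears as the `location` attribute of a `SpectraData` element, which is how mzIdentML 1.1 records its spectra inputs. It matches elements by local name only, so the namespace URI is never hard-coded. Comparing base names is a simplification, since two directories could contain files with the same name.

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterable, List


def referenced_spectra_files(mzid_source) -> List[str]:
    """Return base names of files referenced by <SpectraData location="...">.

    `mzid_source` is a path or binary file object accepted by ET.iterparse.
    Matching is on the local element name, ignoring the XML namespace.
    """
    names = []
    for _event, elem in ET.iterparse(mzid_source, events=("start",)):
        if elem.tag.rsplit("}", 1)[-1] == "SpectraData":
            location = elem.get("location", "")
            # Keep only the base name; locations may use / or \ separators.
            names.append(re.split(r"[\\/]", location)[-1])
    return names


def missing_spectra_files(mzid_source, submitted: Iterable[str]) -> List[str]:
    """List referenced spectra files that are absent from the submitted file names."""
    submitted_set = set(submitted)
    return [n for n in referenced_spectra_files(mzid_source) if n not in submitted_set]
```

This version still builds the full tree in memory. For very large files, clear elements as they are processed.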
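Slide 11 describes mzTab as key-value metadata plus four tab-separated tables. The following Python sketch splits a document into those parts. It assumes the mzTab 1.0 line prefixes: MTD for metadata; PRH/PRT, PEH/PEP, PSH/PSM and SMH/SML for the header and row lines of each table; and COM for comments. It is an illustration only and is not a substitute for a validating mzTab library.

```python
from typing import Dict, List

# Assumed mzTab 1.0 prefixes: header prefix -> (row prefix, section name).
SECTIONS = {
    "PRH": ("PRT", "protein"),
    "PEH": ("PEP", "peptide"),
    "PSH": ("PSM", "psm"),
    "SMH": ("SML", "small_molecule"),
}


def parse_mztab(text: str) -> Dict[str, object]:
    """Split mzTab text into a metadata dict and one list of row dicts per section."""
    metadata: Dict[str, str] = {}
    tables: Dict[str, List[Dict[str, str]]] = {name: [] for _, name in SECTIONS.values()}
    headers: Dict[str, List[str]] = {}
    row_to_header = {row: hdr for hdr, (row, _) in SECTIONS.items()}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        prefix = fields[0]
        if prefix == "MTD" and len(fields) >= 3:
            metadata[fields[1]] = fields[2]  # key-value metadata
        elif prefix in SECTIONS:
            headers[prefix] = fields[1:]  # column names for this section
        elif prefix in row_to_header:
            hdr = row_to_header[prefix]
            if hdr not in headers:
                raise ValueError(f"{prefix} row before its {hdr} header")
            tables[SECTIONS[hdr][1]].append(dict(zip(headers[hdr], fields[1:])))
        # COM (comment) lines and unknown prefixes are ignored.
    return {"metadata": metadata, **tables}
```

Because each table has its own header line, one file can carry protein, peptide, PSM and small-molecule results side by side.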
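One step in the publication pipeline on slide 15 converts spectrum peak lists to MGF. Below is a minimal Python sketch of an MGF writer that uses the commonly used block structure: BEGIN IONS, then TITLE, PEPMASS and CHARGE, then "m/z intensity" lines, then END IONS. The `Spectrum` class is hypothetical, and this is not the PRIDE pipeline code.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Spectrum:
    """Hypothetical in-memory spectrum."""
    title: str
    precursor_mz: float
    charge: Optional[int]
    peaks: List[Tuple[float, float]]  # (m/z, intensity) pairs


def to_mgf(spectra: Iterable[Spectrum]) -> str:
    """Serialise spectra as MGF text, one BEGIN IONS ... END IONS block each."""
    lines = []
    for s in spectra:
        lines.append("BEGIN IONS")
        lines.append(f"TITLE={s.title}")
        lines.append(f"PEPMASS={s.precursor_mz}")
        if s.charge is not None:
            # MGF writes the sign after the number, e.g. "2+".
            sign = "+" if s.charge > 0 else "-"
            lines.append(f"CHARGE={abs(s.charge)}{sign}")
        for mz, intensity in s.peaks:
            lines.append(f"{mz} {intensity}")
        lines.append("END IONS")
        lines.append("")  # blank line between spectra
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_mgf([Spectrum("scan=1", 445.12, 2, [(100.5, 20.0), (200.25, 55.0)])]))
```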
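As a quick consistency check on the figures in slide 18, the Python snippet below sums the per-type and per-year breakdowns and recomputes the share of public datasets. All numbers are copied from the slide.

```python
# Figures copied from slide 18 (ProteomeXchange, up to October 2014).
TOTAL = 1329
BY_TYPE = {
    "PRIDE complete": 437,
    "PRIDE partial": 792,
    "PeptideAtlas/PASSEL complete": 63,
    "MassIVE": 14,
    "reprocessed": 23,
}
BY_YEAR = {2012: 102, 2013: 527, 2014: 700}
PUBLIC = 691


def check_totals() -> dict:
    """Cross-check the breakdowns against the stated total."""
    return {
        "type_sum": sum(BY_TYPE.values()),
        "year_sum": sum(BY_YEAR.values()),
        "public_pct": round(100 * PUBLIC / TOTAL),
    }


if __name__ == "__main__":
    print(check_totals())  # {'type_sum': 1329, 'year_sum': 1329, 'public_pct': 52}
```

Both breakdowns add up to 1329, and 691/1329 is about 52%, which matches the "Publicly accessible" line.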
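Slide 22 presents PROXI as a common web-service layer, so that one client can query several repositories in the same way. The following Python sketch shows only the fan-out step. It assumes, as an illustration, that every repository exposes a `datasets/{identifier}` resource under its own base URL. That path is an assumption and is not taken from the slide, and the base URLs are hypothetical placeholders rather than real endpoints.

```python
from typing import Dict
from urllib.parse import quote

# Hypothetical base URLs: placeholders, not real PROXI endpoints.
REPOSITORIES = {
    "repo_a": "https://repo-a.example.org/proxi/v1",
    "repo_b": "https://repo-b.example.org/proxi/v1/",
}


def dataset_requests(identifier: str, repositories: Dict[str, str]) -> Dict[str, str]:
    """Build one dataset URL per repository, assuming a shared `datasets/{id}` path."""
    urls = {}
    for name, base in repositories.items():
        # Normalise the trailing slash so every base URL joins the same way.
        urls[name] = f"{base.rstrip('/')}/datasets/{quote(identifier, safe='')}"
    return urls


if __name__ == "__main__":
    for repo, url in dataset_requests("PXD000561", REPOSITORIES).items():
        print(repo, url)
```

A real client would then send a GET request to each URL (for example with `urllib.request.urlopen`) and merge the responses. Because the interface is shared, adding a repository only requires adding one more base URL.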