Published in: Science

Do we need to make public our proteomics data?

  1. Do we need to make public our proteomics data? Dr. Yasset Perez-Riverol (Twitter: @ypriverol; GitHub: ypriverol), Bioinformatician, PRIDE Group, Proteomics Services Team, EMBL-EBI, Hinxton, Cambridge, UK
  2. Yasset Perez-Riverol, yperez@ebi.ac.uk. BRPROT 2014, Búzios, Brazil (Dec 7-10, 2014). I believe in open data, source, access … "An Integrated, Directed Mass Spectrometric Approach for In-depth Characterization of Complex Peptide Mixtures". Mol Cell Proteomics. Nov 2008; 7(11): 2138–2150. 1 dataset (no cost) => 4 papers and 3 new algorithms.
  3. I believe in open data, source, access policies…
  4. Overview
      • Proteomics data deposition, bad practices and experiences.
      • PRIDE and ProteomeXchange.
      • PRIDE components.
      • Ongoing and future work.
  5. Proteomics data deposition, bad practices, experiences. Figure labels: Protein Expression Databases; Processed Data; RAW Data.
  6. ProteomeXchange Consortium (http://www.proteomexchange.org)
      • Goal: development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.
      • Includes PeptideAtlas (ISB, Seattle), PRIDE (Cambridge, UK) and MassIVE (UCSD, San Diego).
      • Common identifier space (PXD identifiers).
      • Two supported data workflows: MS/MS and SRM.
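The common identifier space can be illustrated with a small format check. This is only a minimal sketch, assuming that a PXD accession is the prefix `PXD` followed by six digits (as in PXD000561, cited later in this talk); the function name `is_pxd_accession` is hypothetical and not part of any ProteomeXchange tool.

```python
import re

# Assumption: a ProteomeXchange accession is "PXD" followed by exactly six digits.
PXD_PATTERN = re.compile(r"PXD[0-9]{6}")


def is_pxd_accession(text: str) -> bool:
    """Return True if `text` looks like a ProteomeXchange dataset accession."""
    return PXD_PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["PXD000561", "PXD561", "PRD000561"]:
        print(candidate, is_pxd_accession(candidate))
```

A check like this only validates the shape of the identifier; it does not confirm that the dataset exists or is public.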
  7. ProteomeXchange data workflow (Vizcaíno et al., Nat Biotechnol, 2014). Figure labels: Researcher’s results; Metadata / Manuscript; Raw Data*; Results; Journals; ProteomeCentral; Receiving repositories: PASSEL (SRM data), PRIDE (MS/MS data), MassIVE (MS/MS data); UniProt/neXtProt; PeptideAtlas; GPMDB; Other DBs; Reprocessed results; Raw data*; Metadata.
  8. ProteomeXchange Partners: MassIVE (UCSD). http://proteomics.ucsd.edu/service/massive/
      • Joined ProteomeXchange in June 2014.
  9. ProteomeXchange Partners: PASSEL for SRM data. http://www.peptideatlas.org/passel/ (Farrah et al., Proteomics, 2012)
      • Suitable for SRM assays.
      • Part of the PeptideAtlas set of resources.
  10. ProteomeXchange Partners: PRIDE. http://www.ebi.ac.uk/pride/archive/ (Vizcaíno et al., Nucleic Acids Research, 2014)
  11. Current status of databases & repositories (Perez-Riverol Y, et al. Proteomics. 2014). Figure labels: Protein resources; Protein Expression Databases; Processed Data & RAW Data. Resources shown: PRIDE, PASSEL, Chorus, MassIVE, PeptideAtlas, GPMDB, proteomicsDB, PaxDb, Human Proteinpedia, MaxQB, Human Proteome Map, MOPED, UniProt, neXtProt.
  12. Data rescue
  13. PX Submission workflow for MS/MS data (Ternent et al., Proteomics, 2014)
      1. Mass spectrometer output files: raw data (binary files) or peak list spectra in a standardized format (mzML, mzXML).
      2. Result files:
         a. Partial submissions: for workflows not yet supported by PRIDE, search engine output files will be stored and provided in their original form.
         b. Complete submissions: result files can be converted to PRIDE XML or the mzIdentML data standard.
      3. Metadata: sufficiently detailed description of sample origin, workflow, instrumentation and submitter, based on ontologies and controlled vocabularies.
      4. Other files (optional):
         a. QUANT: quantification-related results
         b. PEAK: peak list files
         c. OTHER: any other file type
         e. FASTA
      Figure labels: Published; Raw Files; Other files.
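The distinction between partial and complete submissions in the workflow above can be sketched in a few lines. This is an illustration only, not the logic of the PX submission tool: the `SubmissionFile` class, the category labels and the `submission_type` function are all hypothetical names, and the only rule taken from the slide is that complete submissions provide result files as PRIDE XML or mzIdentML.

```python
from dataclasses import dataclass

# Result formats the slide names for complete submissions.
COMPLETE_RESULT_FORMATS = {"PRIDE XML", "mzIdentML"}


@dataclass
class SubmissionFile:
    name: str
    category: str  # hypothetical labels: "RAW", "RESULT", "QUANT", "PEAK", "OTHER"
    fmt: str = ""  # declared format of the file, e.g. "mzIdentML"


def submission_type(files):
    """Classify a file set as 'complete' or 'partial' (illustrative rule only)."""
    results = [f for f in files if f.category == "RESULT"]
    if not results:
        # The slide lists result files as a required part of a submission.
        raise ValueError("a submission needs at least one result file")
    if all(f.fmt in COMPLETE_RESULT_FORMATS for f in results):
        return "complete"
    return "partial"


if __name__ == "__main__":
    files = [
        SubmissionFile("run1.raw", "RAW"),
        SubmissionFile("run1.mzid", "RESULT", "mzIdentML"),
    ]
    print(submission_type(files))
```

The format is passed in explicitly rather than guessed from the file extension, because an extension such as `.xml` does not identify the format on its own.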
  14. Complete submissions (mzIdentML): search engine results + MS files. An increasing number of tools support export to mzIdentML 1.1:
      - Mascot
      - MS-GF+
      - MyriMatch and related tools from D. Tabb’s lab
      - OpenMS
      - PEAKS
      - ProCon (Proteome Discoverer, Sequest)
      - Scaffold
      - TPP via the idconvert tool (ProteoWizard)
      - ProteinPilot (planned by the end of 2014)
      - Others: library for X!Tandem conversion, lab internal pipelines, …
      Referenced spectral files need to be submitted as well (all open formats are supported). Updated list: http://www.psidev.info/tools-implementing-mzIdentML
  15. Universal file format (mzTab). http://mztab.googlecode.com (J. Griss et al., MCP, 2014)
      • Metadata: basic information about experiment and sample; key-value pairs.
      • Protein: basic information about protein identifications; table-based.
      • Peptide: information about quantified peptides; table-based.
      • PSM: information about identified spectra; table-based.
      • Small Molecule: basic information about identified small molecules; table-based.
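The section layout of mzTab can be made concrete with a short reader. The sketch below assumes the tab-separated line prefixes of mzTab 1.0 (`MTD` for metadata; `PRH`/`PRT`, `PEH`/`PEP`, `PSH`/`PSM` and `SMH`/`SML` for the header and data rows of the four tables); the function name `split_mztab_sections` and the toy input are hypothetical, and the toy input is not a valid mzTab file.

```python
from collections import defaultdict

# Assumed mzTab 1.0 line prefixes, mapped to the section they belong to.
SECTION_BY_PREFIX = {
    "MTD": "metadata",
    "PRH": "protein", "PRT": "protein",
    "PEH": "peptide", "PEP": "peptide",
    "PSH": "psm", "PSM": "psm",
    "SMH": "small_molecule", "SML": "small_molecule",
}


def split_mztab_sections(text):
    """Group the tab-separated fields of each line by mzTab section."""
    sections = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between sections
        fields = line.split("\t")
        section = SECTION_BY_PREFIX.get(fields[0])
        if section is not None:  # ignores comment lines and unknown prefixes
            sections[section].append(fields[1:])
    return dict(sections)


# Toy input for illustration only; real files carry many more columns.
example = "\n".join([
    "MTD\tmzTab-version\t1.0.0",
    "MTD\tdescription\ttoy example",
    "",
    "PSH\tsequence\tPSM_ID",
    "PSM\tNMMAACDPR\t1",
])

sections = split_mztab_sections(example)
print(sections["metadata"])
print(sections["psm"])
```

For table sections the first stored row is the header line, so a caller could zip it with the following rows to obtain one dictionary per record.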
  16. PRIDE Components: Submission Process. PRIDE Converter; PRIDE Inspector; PX Submission Tool.
  17. PRIDE Components: PX submission tool. http://www.proteomexchange.org/submission
      • Capture the mappings between the different types of files.
      • Add the mandatory metadata annotation.
      • Make the file upload process straightforward for the submitter (it transfers all the files using Aspera or FTP).
      • Command line alternative: some scripting is needed.
      Figure labels: Published; Raw; Other files.
  18. PRIDE Inspector 2.0 (Wang et al., Nat. Biotechnology, 2012). Available for complete submissions. https://github.com/PRIDE-Toolsuite/
      PRIDE Inspector 2.0 supports:
      - PRIDE XML
      - mzIdentML + all types of spectra files
      - mzML
      - mzTab Quantitation (work in progress)
  19. PRIDE Components: Services & Web components
  20. ProteomeXchange: 1329 datasets up until October 2014
      Origin: 293 USA, 184 Germany, 143 UK, 83 France, 82 Netherlands, 78 China, 62 Switzerland, 46 Spain, 45 Belgium, 45 Canada, 42 Denmark, 37 Australia, 37 Japan, 34 Sweden, 26 Austria, 21 Brazil, 21 Taiwan, 21 India, 20 Norway, 19 Finland, 17 Ireland, 14 Italy, 12 Republic of Korea, 8 Israel, 9 Singapore, 8 Russia.
      Type: 437 PRIDE complete, 792 PRIDE partial, 63 PeptideAtlas/PASSEL complete, 14 MassIVE, 23 reprocessed.
      Publicly accessible: 691 datasets, 52% of all. 86% PRIDE, 12% PASSEL, 2% MassIVE.
      Data volume: total ~55 TB; number of all files ~131,000; PXD000320-324: ~5 TB; PXD000065: ~1.4 TB.
      Top species studied by at least 10 datasets: 577 Homo sapiens, 165 Mus musculus, 56 Saccharomyces cerevisiae, 53 Arabidopsis thaliana, 29 Rattus norvegicus, 22 Escherichia coli, 17 Bos taurus, 16 Mycobacterium tuberculosis, 13 Oryza sativa, 13 Drosophila melanogaster, 13 Glycine max. ~290 species in total.
      Datasets/year: 2012: 102; 2013: 527; 2014: 700.
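The figures on this slide are internally consistent, which a few lines of arithmetic confirm. The numbers below are copied from the slide; only the variable names are introduced here.

```python
# Dataset counts by type and by year, as quoted on the slide.
by_type = {
    "PRIDE complete": 437,
    "PRIDE partial": 792,
    "PeptideAtlas/PASSEL complete": 63,
    "MassIVE": 14,
    "reprocessed": 23,
}
by_year = {2012: 102, 2013: 527, 2014: 700}

total = 1329   # total datasets quoted on the slide
public = 691   # publicly accessible datasets quoted on the slide

total_by_type = sum(by_type.values())
total_by_year = sum(by_year.values())
public_share = round(100 * public / total)

print(total_by_type)   # 1329
print(total_by_year)   # 1329
print(public_share)    # 52
```

Both breakdowns add up to the stated total of 1329, and 691 of 1329 rounds to the 52% given for public datasets.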
  21. Brazil Submissions: 21 projects (11 PXD public, 10 PXD private). Main contributors: Martins-de-Souza D. PhD (6), Domont G. Prof (4), Carvalho PC. (4).
  22. Journals and Data Deposition. Chart axes: Journal; Number of Submissions.
  23. Data Access: Total Numbers
      PXD Identifier | Hits | Dataset title
      PXD000561 | 153512 | A draft map of the human proteome
      PXD000865 | 51639 | Mass spectrometry based draft of the human proteome
  24. Ongoing and future work.
      • Quality assessment of complete submissions.
      • Make the data available and reusable.
      • Integration of different protein expression resources:
        • PRIDE
        • PeptideAtlas
        • ProteomicsDB
        • Human Proteome Map
  25. QC with PRIDE Inspector
  26. QC with PRIDE Inspector
  27. QC with PRIDE Inspector
  28. QC PRIDE Inspector and Quantitation
  29. Validation of controversial data
      • Analysis of Tyrannosaurus rex fossils: controversial presence of collagen (is it a contamination of the sample?)
      Asara et al. (2007) Science 316: 280-5. Asara et al. (2007) Science 316: 1324-5. Bern et al. (2009) JPR 9: 4328-32. PRIDE assay accession 8633.
  30. Quality control: PRIDE Cluster (Griss et al., Nature Methods, 2012)
      • Data integration across many experiments before filtering.
      • Assumption: the same peptide will generate the same MS/MS spectrum in many experiments.
      • Cluster all spectra in PRIDE.
      • Those clusters which contain only/mainly one peptide are considered reliable.
      Figure labels: NMMAACDPR; NMMAACDPR; PPECPDFDPPR; NMMAACDPR; Consensus; PPECPDFDPPR.
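The reliability criterion on this slide (a cluster is trusted when it contains only or mainly one peptide) can be sketched as a simple majority ratio. This is not the PRIDE Cluster implementation: the spectrum clustering itself is not shown, the function names are hypothetical, and the 0.7 threshold is an arbitrary assumption chosen for illustration.

```python
from collections import Counter


def dominant_peptide(cluster_peptides):
    """Return the most frequent peptide in a cluster and its share of the cluster."""
    counts = Counter(cluster_peptides)
    peptide, count = counts.most_common(1)[0]
    return peptide, count / len(cluster_peptides)


def is_reliable(cluster_peptides, min_ratio=0.7):
    """Treat a cluster as reliable if one peptide reaches `min_ratio` (assumed value)."""
    _, ratio = dominant_peptide(cluster_peptides)
    return ratio >= min_ratio


# Peptide identifications of the spectra in one cluster, taken from the slide's figure.
cluster = ["NMMAACDPR", "NMMAACDPR", "PPECPDFDPPR", "NMMAACDPR"]

peptide, ratio = dominant_peptide(cluster)
print(peptide, ratio)        # NMMAACDPR 0.75
print(is_reliable(cluster))  # True
```

With three of four spectra identified as NMMAACDPR, the ratio is 0.75, so this toy cluster passes the assumed threshold while a cluster split evenly between two peptides would not.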
  31. PRIDE Cluster
  32. Spectral libraries
  33. Sneak peek of the new PRIDE Cluster web
  34. Make data available and reusable.
      • Around 60% of the data used for the analysis comes from previous experiments, most of them stored in proteomics repositories such as PRIDE/ProteomeXchange, PASSEL or MassIVE. (Perez-Riverol Y, et al. Proteomics. 2014)
  35. CompOmics Open Source Analysis Pipeline. http://searchgui.googlecode.com http://peptide-shaker.googlecode.com
      Vaudel M, Barsnes H, Berven FS, Sickmann A, Martens L: Proteomics 2011;11(5):996-9.
      Vaudel M, Burkhart J, Zahedi RP, Berven FS, Sickmann A, Martens L, Barsnes H: Nature Biotechnology (in press).
  36. Reshake PRIDE data! Find the desired PRIDE project … inspect the project details … and start re-analyzing the data!
  37. Current status of databases & repositories (Perez-Riverol Y, et al. Proteomics. 2014). Figure labels: Protein resources; Protein Expression Databases; Processed Data & RAW Data. Resources shown: PRIDE, PASSEL, Chorus, MassIVE, PeptideAtlas, GPMDB, proteomicsDB, PaxDb, Human Proteinpedia, MaxQB, Human Proteome Map, MOPED, UniProt, neXtProt.
  38. Integration of different protein expression resources (Perez-Riverol Y, Proteomics, 2014). Figure labels: PROXI Clients; Repositories & Databases; Web Services; PROXI; Registry; Data.
  39. Conclusions
      • ProteomeXchange is widely used.
      • PRIDE contains most of the MS/MS datasets.
      • It now has a new consortium member: MassIVE (UCSD).
      • Around half of the datasets are already public.
      • Different open source tools are available to facilitate the process:
        • File transfer speed should not be a problem (Aspera support).
      • Data deposition enables and promotes data reuse.
      • ProteomeXchange is open to new members.
  40. Acknowledgements
      PRIDE Team: Juan A. Vizcaíno (Group Leader), Attila Csordas, Rui Wang, Florian Reisinger, Jose A. Dianes, Tobias Ternent, Noemi del Toro, Henning Hermjakob.
      PeptideAtlas Team (ISB, Seattle): Eric Deutsch, Terry Farrah, Zhi Sun.
      MassIVE: Nuno Bandeira.
      And many other PX partners and stakeholders.
  41. Questions?
