The force of computational mass spectrometry awakens in the EBI
Juan Antonio Vizcaíno
Reza Salek
Ken Haug
2. HRF Talk
17 December 2015
Overview
• Very short intro to Mass Spectrometry
• PRIDE and MetaboLights
• ProteomeXchange and MetabolomeXchange
• Data standards
• OmicsDI
How does mass spectrometry work?
1. Ionisation
2. Separation of ions by m/z
3. Detection
"Mass spectrometry has been described as the smallest scale in the world, not because of the mass spectrometer’s size but because of the size of what it weighs." (Gary Siuzdak, Head of the Scripps Center for Metabolomics and Mass Spectrometry, La Jolla, USA)
Ion cannon
Adapted from Cheng Lu, "Introduction to Mass Spectrometer"; figures from various sources (Liebler, Introduction to Proteomics: Tools for the New Biology, Humana Press, 2002; Scripps; whatisms).
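Step 2 separates ions by their mass-to-charge ratio (m/z), not by mass alone, so the same molecule can appear at several positions in a spectrum. A minimal sketch, assuming positive-mode ionisation where every charge comes from an added proton (1.007276 Da): an ion with neutral monoisotopic mass M and charge z appears at m/z = (M + z × 1.007276) / z. The function name `mz_from_neutral_mass` is hypothetical.

```python
# Mass of a proton in daltons (Da).
PROTON_MASS = 1.007276


def mz_from_neutral_mass(neutral_mass: float, charge: int) -> float:
    """Return the m/z of an [M + zH]z+ ion (positive mode, protonation assumed)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    # Add one proton per charge, then divide the total mass by the charge.
    return (neutral_mass + charge * PROTON_MASS) / charge


if __name__ == "__main__":
    # The same 1000 Da molecule lands at different m/z values per charge state.
    for z in (1, 2, 3):
        print(z, round(mz_from_neutral_mass(1000.0, z), 4))
```

Higher charge states compress large molecules into a lower m/z range, which is why multiply charged peptide ions are common in electrospray spectra.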
MS proteomics: tandem MS (bottom-up)
MS/MS matching identifies peptides, not proteins. Proteins are inferred from the peptide sequences.
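Protein inference is needed because one peptide can map to several proteins. One common strategy is parsimony: report the smallest set of proteins that explains all identified peptides. The sketch below uses a greedy set-cover approximation of that idea; it is an illustration under that assumption, not the inference method used by PRIDE or any particular search engine, and `infer_proteins` plus the protein and peptide names are hypothetical.

```python
def infer_proteins(peptide_to_proteins: dict[str, set[str]]) -> list[str]:
    """Greedy parsimony: pick proteins until every peptide is explained."""
    # Invert the mapping: protein -> peptides it could explain.
    protein_to_peptides: dict[str, set[str]] = {}
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            protein_to_peptides.setdefault(protein, set()).add(peptide)

    unexplained = set(peptide_to_proteins)
    selected: list[str] = []
    while unexplained:
        # Protein covering the most still-unexplained peptides; sorting the
        # candidates first makes ties resolve alphabetically.
        best = max(
            sorted(protein_to_peptides),
            key=lambda p: len(protein_to_peptides[p] & unexplained),
            default=None,
        )
        if best is None or not protein_to_peptides[best] & unexplained:
            break  # remaining peptides map to no known protein
        selected.append(best)
        unexplained -= protein_to_peptides[best]
    return selected


if __name__ == "__main__":
    evidence = {
        "PEPTIDEAK": {"PROT_A", "PROT_B"},  # shared peptide
        "PEPTIDEBR": {"PROT_A"},
        "PEPTIDECK": {"PROT_C"},
    }
    print(infer_proteins(evidence))
```

PROT_B is not reported because its only peptide is already explained by PROT_A; real pipelines usually report such proteins as members of a protein group rather than silently dropping them.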
Data size of Mass Spectrometry data at the EBI (May 2015)
[Figure: Data accumulation by platform (sequence, array, MS). Y-axis: bytes, log scale from 10^7 to 10^17; x-axis: date, 2004 to 2016. Chart generated by Guy Cochrane.]
Overview
• Very short intro to Mass Spectrometry
• PRIDE and MetaboLights
• ProteomeXchange and MetabolomeXchange
• Data standards and tools
• OmicsDI
PRIDE (PRoteomics IDEntifications) Archive
http://www.ebi.ac.uk/pride
Martens et al., Proteomics, 2005
Vizcaíno et al., NAR, 2016, in press
• PRIDE Archive stores MS-based proteomics data:
  • Peptide and protein expression data (identification & quantification)
  • Post-translational modifications
  • Mass spectra (raw data and peak lists)
  • Technical and biological metadata
  • Any other related information
• Focused on MS/MS approaches, but data from any type of proteomics workflow can be stored.
• For each dataset, PRIDE stores at least the raw data and the processed results.
PRIDE: Source of MS proteomics data
• PRIDE Archive already provides MS proteomics data to other EMBL-EBI resources such as UniProt, Ensembl and the Expression Atlas.
http://www.ebi.ac.uk/pride
Ways to access data in PRIDE Archive
• PRIDE web interface
• File repository
• REST web service
• PRIDE Inspector tool
PRIDE Archive: datasets submitted up until 1st November, 2015
• 1,259 datasets submitted to PRIDE Archive by November 1st
• 923 datasets were submitted in 2014
• 155 datasets submitted per month in the last 6 months
• Size: ~160 TB
PRIDE Tools: Submission Process
[Figure, step 2: submission workflow involving PRIDE Converter 2, PRIDE Inspector and the PX Submission Tool, with mzIdentML and PRIDE XML files.]
PRIDE Inspector Toolsuite: Visualisation tool
Wang et al., Nat Biotechnol, 2012
Perez-Riverol et al., MCP, 2016, in press
PRIDE Inspector Toolsuite supports:
- PRIDE XML
- mzIdentML + all types of spectra files
- mzML
- mzTab (identification and quantification) + all types of spectra files
https://github.com/PRIDE-Toolsuite/
PRIDE Tools: Submission Process
[Figure, step 3: submission workflow involving PRIDE Converter 2, PRIDE Inspector and the PX Submission Tool, with mzIdentML and PRIDE XML files.]
PX submission tool
• It selects and captures the mappings between the different types of files included in the submission.
• It transfers all the files using Aspera (default) or FTP.
[Figure: the PX submission tool handling result, raw and other files.]
http://www.proteomexchange.org/submission
• Version 2.3.0 released in August 2015 (several refinements and improvements).
• An alternative command-line method is also available for groups with bioinformatics support.
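The file-mapping step can be pictured as a small data structure. The sketch below is hypothetical and is not the PX submission tool's internal model or file format: each file gets an ID, one of the slide's three categories (results, raw, other), and a list of IDs it is mapped to. The check assumes, for illustration only, that every result file should be linked to at least one raw file; `SubmissionFile` and `unmapped_results` are invented names.

```python
from dataclasses import dataclass, field


@dataclass
class SubmissionFile:
    file_id: int
    file_type: str  # hypothetical categories: "RESULT", "RAW" or "OTHER"
    path: str
    mapped_ids: list[int] = field(default_factory=list)


def unmapped_results(files: list[SubmissionFile]) -> list[str]:
    """Return paths of result files not mapped to any raw file (assumed rule)."""
    by_id = {f.file_id: f for f in files}
    problems: list[str] = []
    for f in files:
        if f.file_type != "RESULT":
            continue
        # A mapping only counts if the target exists and is a raw file.
        has_raw = any(
            i in by_id and by_id[i].file_type == "RAW" for i in f.mapped_ids
        )
        if not has_raw:
            problems.append(f.path)
    return problems


if __name__ == "__main__":
    files = [
        SubmissionFile(1, "RESULT", "search.mzid", [2]),
        SubmissionFile(2, "RAW", "run01.raw"),
        SubmissionFile(3, "RESULT", "orphan.mzid"),
        SubmissionFile(4, "OTHER", "methods.pdf"),
    ]
    print(unmapped_results(files))
```

Keeping mappings as IDs rather than nested objects makes it easy to describe many-to-many links, for example one result file drawing on several raw files.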
MetaboLights – Logical components
• Data Submission
  • ISAcreator
  • Online data deposition
• Repository
  • Complete metabolomics experiments
  • Open data access
• Metabolite References
  • Metabolite annotation
• Analysis
  • Integrated data analysis
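ISAcreator, listed under data submission, is part of the ISA tools, which describe experiments in the ISA-Tab format. ISA-Tab uses a file-naming convention: `i_` for the investigation file, `s_` for study files and `a_` for assay files. As a minimal sketch relying only on that naming convention (not on MetaboLights' own upload checks), the hypothetical `classify_isatab` function sorts the files of a submission into those three roles.

```python
from pathlib import PurePath

# ISA-Tab naming convention: i_ = investigation, s_ = study, a_ = assay.
ISA_PREFIXES = {"i_": "investigation", "s_": "study", "a_": "assay"}


def classify_isatab(filenames: list[str]) -> dict[str, list[str]]:
    """Group file paths by ISA-Tab role, based only on the file-name prefix."""
    groups: dict[str, list[str]] = {kind: [] for kind in ISA_PREFIXES.values()}
    groups["other"] = []
    for name in filenames:
        base = PurePath(name).name  # ignore any directory part
        kind = next(
            (k for prefix, k in ISA_PREFIXES.items() if base.startswith(prefix)),
            "other",
        )
        groups[kind].append(name)
    return groups


if __name__ == "__main__":
    print(classify_isatab(["i_Investigation.txt", "s_study.txt", "a_assay_nmr.txt", "data/readme.txt"]))
```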
MetaboLights – Submission Pipeline
Share private prepublication studies with reviewers and other trusted parties.
Study upload
Overview
• Very short intro to Mass Spectrometry
• PRIDE and MetaboLights
• ProteomeXchange and MetabolomeXchange
• Data standards
• OmicsDI
ProteomeXchange Consortium
• Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.
• Includes PeptideAtlas (ISB, Seattle), PRIDE (Cambridge, UK) and (very recently) MassIVE (UCSD, San Diego).
• Common identifier space (PXD identifiers)
• Two supported data workflows: MS/MS and SRM.
• Main objective: Make life easier for researchers
http://www.proteomexchange.org
Vizcaíno et al., Nat Biotechnol, 2014
ProteomeXchange data workflow
[Figure: ProteomeXchange data workflow. Elements shown: ProteomeCentral; receiving repositories PRIDE (MS/MS data), MassIVE (MS/MS data) and PASSEL (SRM data); metadata/manuscript, raw data* and researcher’s results; journals, UniProt/neXtProt, PeptideAtlas, GPMDB and other DBs; reprocessed results.]
ProteomeCentral: Portal for all PX datasets
http://proteomecentral.proteomexchange.org/cgi/GetDataset
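Because all ProteomeXchange datasets share the PXD identifier space, a dataset page can be addressed from its accession alone. A minimal sketch under two stated assumptions: accessions are "PXD" followed by six digits (the pattern of the accessions quoted in this talk, such as PXD001471), and the GetDataset page takes the accession in an `ID` query parameter. `dataset_url` and `PXD_PATTERN` are hypothetical names.

```python
import re
from urllib.parse import urlencode

BASE_URL = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"
# Assumed accession pattern: "PXD" followed by six digits.
PXD_PATTERN = re.compile(r"PXD\d{6}")


def dataset_url(accession: str) -> str:
    """Validate a PXD accession and build its ProteomeCentral URL."""
    accession = accession.strip().upper()
    if not PXD_PATTERN.fullmatch(accession):
        raise ValueError(f"not a PXD accession: {accession!r}")
    # Assumption: the portal reads the accession from an "ID" query parameter.
    return f"{BASE_URL}?{urlencode({'ID': accession})}"


if __name__ == "__main__":
    print(dataset_url("PXD001471"))
```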
ProteomeXchange: 2,774 datasets up until 1st September, 2015
Type:
1681 PRIDE partial
813 PRIDE complete
173 MassIVE
84 PeptideAtlas/PASSEL complete
23 Reprocessed
Publicly accessible:
1372 datasets, 49% of all
90% PRIDE
6% PASSEL
4% MassIVE
Data volume:
Total: ~150 TB
Number of all files: ~400,000
PXD000320-324: ~4 TB
PXD002319-26: ~2.4 TB
PXD001471: ~1.6 TB
Datasets/year:
2012: 102
2013: 527
2014: 963
2015: 1182
Top species studied by at least 20 datasets:
1080 Homo sapiens
335 Mus musculus
110 Saccharomyces cerevisiae
98 Arabidopsis thaliana
75 Rattus norvegicus
58 Escherichia coli
29 Bos taurus
23 Glycine max
20 Caenorhabditis elegans
20 Oryza sativa
~ 500 species in total
Origin:
714 USA
313 Germany
252 United Kingdom
163 China
146 France
121 Netherlands
108 Switzerland
103 Canada
81 Denmark
73 Spain
68 Japan
67 Australia
63 Sweden
57 Belgium
43 Austria
39 India
34 Taiwan
33 Norway
26 Italy
24 Ireland
24 Finland
21 Republic of Korea
20 Brazil
20 Russia
18 Israel
18 Singapore …
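The largest datasets above are written in a compact range notation (PXD000320-324, PXD002319-26). Assuming the digits after the hyphen replace the trailing digits of the first accession, the notation can be expanded into individual accessions, for example to count the datasets behind each volume figure. `expand_pxd_range` is a hypothetical helper that relies on that reading.

```python
import re

# "PXD" + six digits, optionally followed by "-" and replacement trailing digits.
RANGE_PATTERN = re.compile(r"PXD(\d{6})(?:-(\d{1,6}))?")


def expand_pxd_range(spec: str) -> list[str]:
    """Expand e.g. 'PXD002319-26' into PXD002319 ... PXD002326."""
    match = RANGE_PATTERN.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"unrecognised accession range: {spec!r}")
    first_digits, tail = match.groups()
    start = int(first_digits)
    if tail is None:
        return [f"PXD{start:06d}"]
    # Replace the last len(tail) digits of the first accession with the tail.
    end = int(first_digits[: len(first_digits) - len(tail)] + tail)
    if end < start:
        raise ValueError(f"range end precedes start: {spec!r}")
    return [f"PXD{n:06d}" for n in range(start, end + 1)]


if __name__ == "__main__":
    for spec in ("PXD000320-324", "PXD002319-26", "PXD001471"):
        print(spec, len(expand_pxd_range(spec)))
```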
MetabolomeXchange Consortium
• Global network for exchange of metabolomics data
• Includes study as well as reference data
Overview
• Very short intro to Mass Spectrometry
• PRIDE and MetaboLights
• ProteomeXchange and MetabolomeXchange
• Data standards
• OmicsDI
Current PSI Proteomics Standard File Formats for MS
• mzTab: final results
• TraML: SRM
• mzQuantML: quantitation
• mzIdentML: identification
• mzML: MS data
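mzTab, the final-results format, is tab-separated text in which the first field of each line is a prefix: MTD for metadata; PRH/PRT, PEH/PEP, PSH/PSM and SMH/SML for the header and data rows of the protein, peptide, PSM and small-molecule sections; COM for comments. The minimal reader below is a sketch that assumes well-formed mzTab 1.0 input and performs no validation; `read_mztab` is a hypothetical name, not part of any mzTab library.

```python
# Header prefix -> row prefix for the tabular sections of mzTab 1.0.
HEADER_TO_ROW = {"PRH": "PRT", "PEH": "PEP", "PSH": "PSM", "SMH": "SML"}


def read_mztab(text: str) -> tuple[dict[str, str], dict[str, list[dict[str, str]]]]:
    """Split mzTab text into metadata key/value pairs and per-section rows."""
    metadata: dict[str, str] = {}
    headers: dict[str, list[str]] = {}  # row prefix -> column names
    sections: dict[str, list[dict[str, str]]] = {}
    for line in text.splitlines():
        fields = line.split("\t")
        prefix = fields[0]
        if prefix == "MTD" and len(fields) >= 3:
            metadata[fields[1]] = fields[2]
        elif prefix in HEADER_TO_ROW:
            headers[HEADER_TO_ROW[prefix]] = fields[1:]
        elif prefix in headers:
            # Data row: pair each value with the column name from its header.
            sections.setdefault(prefix, []).append(dict(zip(headers[prefix], fields[1:])))
        # Blank lines, COM comments and rows without a header are skipped.
    return metadata, sections


if __name__ == "__main__":
    example = "\n".join([
        "MTD\tmzTab-version\t1.0.0",
        "COM\thypothetical example",
        "PSH\tsequence\tPSM_ID\taccession",
        "PSM\tPEPTIDEK\t1\tPROT_A",
    ])
    meta, rows = read_mztab(example)
    print(meta, rows["PSM"])
```

Keying rows by header name rather than column position keeps the reader tolerant of optional columns, which mzTab allows in each section.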
Current Metabolomics Standard File Formats for MS
• mzTab: final results
• TraML*: SRM
• mzQuantML*: quantitation
• mzIdentML: identification
• mzML: MS data
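mzML, listed for MS data in both tables, stores each spectrum's m/z and intensity arrays as base64-encoded binary: little-endian 32- or 64-bit floats, optionally zlib-compressed, with precision and compression declared by controlled-vocabulary terms on each array. The helpers below are a minimal sketch of that encoding only; they do not parse the surrounding XML or read the CV terms, so the caller passes compression and precision explicitly. The function names are hypothetical.

```python
import base64
import struct
import zlib

# struct type codes for the two float precisions used in mzML binary arrays.
_FLOAT_CODES = {32: "f", 64: "d"}


def _float_code(bits: int) -> str:
    if bits not in _FLOAT_CODES:
        raise ValueError("bits must be 32 or 64")
    return _FLOAT_CODES[bits]


def encode_binary_array(values: list[float], *, zlib_compressed: bool, bits: int = 64) -> str:
    # "<" packs little-endian values.
    raw = struct.pack(f"<{len(values)}{_float_code(bits)}", *values)
    if zlib_compressed:
        raw = zlib.compress(raw)
    return base64.b64encode(raw).decode("ascii")


def decode_binary_array(encoded: str, *, zlib_compressed: bool, bits: int = 64) -> list[float]:
    code = _float_code(bits)
    raw = base64.b64decode(encoded)
    if zlib_compressed:
        raw = zlib.decompress(raw)
    width = bits // 8
    if len(raw) % width:
        raise ValueError("byte length is not a multiple of the value width")
    return list(struct.unpack(f"<{len(raw) // width}{code}", raw))


if __name__ == "__main__":
    mz = [100.0, 200.5, 1234.5678]
    blob = encode_binary_array(mz, zlib_compressed=True)
    print(blob, decode_binary_array(blob, zlib_compressed=True))
```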
Data exchange standards in MS
Neumann (IPB-Halle), Proteomics and HUPO-PSI community
NMR data exchange standards
nmrTab
Neumann and D Schober (IPB-Halle), M Wilson and D Wishart (U Alberta, Canada), L Figueiredo and R Salek (EMBL-EBI), D Jacob and C Deborde (Centre INRA de Bordeaux), P Rocca-Serra (University of Oxford e-Research Centre), T Ebbels (Imperial College), C Ludwig and J Easton (University of Birmingham), A Moing (Centre INRA de Bordeaux), L Tenori and A Rosato (University of Florence), I Lewis (Princeton) and many more
NMR data management facilitation via nmrML
http://nmrml.org
Overview
• Very short intro to Mass Spectrometry
• PRIDE and MetaboLights
• ProteomeXchange and MetabolomeXchange
• Data standards and tools
• OmicsDI
OmicsDI: Portal for omics datasets
http://www.ebi.ac.uk/Tools/omicsdi/
• Aims to integrate ‘omics’ datasets (genomics, proteomics and metabolomics at present). Resources from outside the EBI are also included.
Resources shown: PRIDE Archive, MassIVE, PASSEL, GPMDB, MetaboLights, Metabolomics Workbench, GNPS, EGA
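The core idea of a discovery portal like OmicsDI is that dataset metadata from many resources is brought into one searchable index. The toy in-memory inverted index below is only a sketch of that idea: `DatasetRecord`, `DatasetIndex`, the record fields and the example accessions are hypothetical, and the code does not reflect OmicsDI's actual implementation or web service.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetRecord:
    resource: str     # e.g. "PRIDE Archive", "MetaboLights"
    accession: str
    omics_type: str   # "genomics", "proteomics" or "metabolomics"
    description: str


def tokenize(text: str) -> set[str]:
    """Lower-case alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class DatasetIndex:
    def __init__(self) -> None:
        # token -> records whose description contains that token
        self._postings: defaultdict[str, set[DatasetRecord]] = defaultdict(set)

    def add(self, record: DatasetRecord) -> None:
        for token in tokenize(record.description):
            self._postings[token].add(record)

    def search(self, query: str, omics_type: Optional[str] = None) -> list[DatasetRecord]:
        """Records matching every query token, optionally filtered by omics type."""
        tokens = tokenize(query)
        if not tokens:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in tokens))
        if omics_type is not None:
            hits = {r for r in hits if r.omics_type == omics_type}
        return sorted(hits, key=lambda r: (r.resource, r.accession))


if __name__ == "__main__":
    index = DatasetIndex()
    index.add(DatasetRecord("PRIDE Archive", "EXAMPLE-P1", "proteomics", "Human liver proteome"))
    index.add(DatasetRecord("MetaboLights", "EXAMPLE-M1", "metabolomics", "Human urine metabolite profiling"))
    print([r.accession for r in index.search("human")])
```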
OmicsDI: Functionality in the home page
[Figure: home-page functionality, panels (a) to (f).]
Acknowledgements: The PRIDE Team
People:
Attila Csordas
Tobias Ternent
Noemi del Toro
Gerhard Mayer (Bochum, de.NBI)
Johannes Griss
Yasset Perez-Riverol
Henning Hermjakob
Former team members: Rui Wang, Florian Reisinger and Jose A. Dianes
Other EBI teams involved in the development of OmicsDI
MetaboLights – The team
Kenneth Haug, Reza Salek, Jose Ramon Macias, Mark Williams, Kalai Jayaseelan, Namrata Kale, Venkata Chandrasekhar
Christoph Steinbeck, Jules Griffin (UC & MRC), Xuefei Li (MRC)
Previous: Pablo Conesa, Paula de Matos, Mark Rijnbeek, Tejasvi Mahendraker, Xinzhu Wang (UC)