The mzTab data standard format for reporting MS-based peptide, protein and small molecule identification and quantification results
1. mzTab - Reporting MS-based Proteomics and Metabolomics Results
Dr. Juan A. Vizcaíno on behalf of
Dr. Johannes Griss
Proteomics Services Team
EMBL-EBI
Hinxton, Cambridge, UK
Division of Immunology, Allergy and
Infectious Diseases
Department of Dermatology
Medical University of Vienna, Austria
Johannes Griss
jgriss@ebi.ac.uk
HUPO 2014
2. Overview
• Need for mzTab
• Details about the data format (mzTab 1.0)
• Existing software implementations
• Extension of mzTab 1.0 for metabolomics
3. HUPO Proteomics Standards Initiative
• Develops data format standards for proteomics.
• Both data representation and annotation standards.
• Involves data producers, database providers, software
producers, publishers, …
• Active Workgroups: MI, MS, PI, Mod, (Protein Separation).
• Inter-group activities: MIAPE and Controlled Vocabularies.
• Started in 2002, so some experience already…
www.psidev.info
4. PSI-MS/PI Standard File Formats before mzTab
• Quantitation: mzQuantML
• Identification: mzIdentML
• MS data: mzML
• SRM: TraML
5. Reasons for an additional file format (mzTab)
• mzIdentML and mzQuantML necessarily focus on the
complete representation of proteomics results
• Complex XML-based file formats
• Specialised software required for visualisation
• In-depth bioinformatics understanding required to create and
use files
• No simple method to communicate final results to non-proteomics
experts
• No simple method to utilise files through scripting
languages and standard statistical software
6. mzTab – Aims
• Store the final results of an MS-based experiment in a single file
• Quantitation data
• Identification data
• Small Molecule data
• Reduce complexity to make data accessible to researchers who are
not proteomics or bioinformatics experts
• Be easily accessible using “standard” software
7. mzTab – Aims
• What the format does NOT aim at:
• Replace mzIdentML or mzQuantML for proteomics
approaches
• Contain the complete data of an MS-based experiment
• Provide fully detailed evidence for the data
• Allow a researcher to recreate the process which led to the
results
8. Why a tab-delimited file?
• Using XML-based formats requires sophisticated
bioinformatics expertise
• Many researchers are still used to using MS Excel to “look”
at or exchange their data.
• Standard tab-delimited file formats for transcriptomics
(MAGE-TAB) and molecular interactions (MI-TAB) data
were already successful
10. mzTab - Sections
• Metadata (key-value pairs): basic information about experiment and sample
• Protein (table-based): basic information about protein identifications
• Peptide (table-based): information about quantified peptides
• PSM (table-based): information about identified spectra
• Small Molecule (table-based): basic information about identified small molecules
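Because every line of an mzTab file begins with a short code naming its section, the sections listed above can be separated with a few lines of scripting. The sketch below is illustrative only. It assumes the three-letter line prefixes of the mzTab 1.0 specification (MTD for metadata; PRH/PRT, PEH/PEP, PSH/PSM and SMH/SML for the header and data rows of the protein, peptide, PSM and small molecule tables). The function name `split_mztab_sections` is hypothetical, and the sample rows are made up and heavily truncated, so they do not form a valid mzTab file.

```python
# Header prefix -> data-row prefix for each table-based section
# (prefixes assumed from the mzTab 1.0 specification).
TABLE_PREFIXES = {"PRH": "PRT", "PEH": "PEP", "PSH": "PSM", "SMH": "SML"}


def split_mztab_sections(text):
    """Split mzTab text into a metadata dict and one list of row dicts per table."""
    metadata = {}
    headers = {}  # data-row prefix -> list of column names
    tables = {row_prefix: [] for row_prefix in TABLE_PREFIXES.values()}

    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines between sections
        fields = line.split("\t")
        prefix, rest = fields[0], fields[1:]
        if prefix == "MTD" and len(rest) >= 2:
            metadata[rest[0]] = rest[1]  # key-value pair
        elif prefix in TABLE_PREFIXES:
            headers[TABLE_PREFIXES[prefix]] = rest  # table header row
        elif prefix in tables and prefix in headers:
            tables[prefix].append(dict(zip(headers[prefix], rest)))
        # any other line (for example a COM comment line) is ignored
    return metadata, tables


# Made-up, truncated sample: real files carry many more mandatory columns.
example = "\n".join([
    "MTD\tmzTab-version\t1.0.0",
    "MTD\tmzTab-mode\tSummary",
    "MTD\tmzTab-type\tIdentification",
    "",
    "PRH\taccession\tdescription",
    "PRT\tP12345\tHypothetical example protein",
    "",
    "PSH\tsequence\tPSM_ID\taccession",
    "PSM\tPEPTIDEK\t1\tP12345",
])

metadata, tables = split_mztab_sections(example)
```

Each table comes back as a list of plain dictionaries keyed by column name, which is the shape most spreadsheet-style and statistical tools can consume directly.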
12. mzTab – Modes and Types
• Modes (depending on the level of detail):
• ‘Summary’: only the ‘final results’.
• ‘Complete’: detailed information for each individual assay or
replicate is provided.
• Types:
• ‘Identification’: Only identification results.
• ‘Quantification’: Quantification results; can also contain identification results.
• Overall, 4 different file “flavors” are possible, which makes the
design very flexible.
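A file declares its mode and type in the metadata section. As a rough sketch, assuming the `mzTab-mode` and `mzTab-type` metadata keys of the mzTab 1.0 specification (the helper name `file_flavor` is hypothetical), the four flavors can be enumerated and a file's flavor checked as follows:

```python
from itertools import product

MODES = ("Summary", "Complete")
TYPES = ("Identification", "Quantification")

# The four possible "flavors" are all mode/type combinations.
FLAVORS = list(product(MODES, TYPES))


def file_flavor(lines):
    """Return the (mode, type) pair declared in the MTD lines of an mzTab file."""
    found = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if (
            len(fields) >= 3
            and fields[0] == "MTD"
            and fields[1] in ("mzTab-mode", "mzTab-type")
        ):
            found[fields[1]] = fields[2]
    flavor = (found.get("mzTab-mode"), found.get("mzTab-type"))
    if flavor not in FLAVORS:
        raise ValueError(f"unrecognised mzTab mode/type: {flavor}")
    return flavor
```

For example, `file_flavor(open(path, encoding="utf-8"))` could be called on a file before deciding whether per-assay quantification columns should be expected.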
17. mzTab – Current implementations
• jmzTab (Java API): version 3.0 is now stable. Manuscript
published in the journal Proteomics.
• mzTab Validator, PRIDE XML to mzTab converter (PRIDE team).
• mzIdentML and mzQuantML to mzTab converters (Andy Jones
group).
• MaxQuant: a beta version of the exporter is available.
• OpenMS (version 1.10).
• R/Bioconductor package MSnbase (L. Gatto, Cambridge University).
• LipidDataAnalyzer (J. Hartler, University of Graz, see next talk).
• MetaboLights (EBI).
18. mzTab – ongoing development
• More detailed modelling of MS metabolomics data
• Led by S. Neumann (COSMOS EU FP7 project).
• Extension from one to three sections.
Example file exists at
https://github.com/sneumann/mtbls2/faahKO.mzTab
http://www.cosmos-fp7.eu/
19. mzTab format related publications
J. Griss et al., MCP, 2014
http://code.google.com/p/mztab/
Q.W. Xu et al., Proteomics, 2014
21. Current PSI-MS/PI Standard File Formats
• Final Results: mzTab
• Quantitation: mzQuantML
• Identification: mzIdentML
• MS data: mzML
• SRM: TraML
22. Acknowledgements
Johannes Griss
Qing-Wei Xu
Henning Hermjakob
Timo Sachsenberg
Mathias Walzer
Oliver Kohlbacher
http://mztab.googlecode.com
Andy Jones
S. Neumann and other COSMOS
partners
PSI editor and reviewers
… and many others have
also contributed
BBSRC PROCESS grant
BBSRC ProteoSuite grant