The mzTab data standard format for reporting MS-based peptide, protein and small molecule identification and quantification results

mzTab - Reporting MS-based Proteomics and Metabolomics Results
Dr. Juan A. Vizcaíno on behalf of
Dr. Johannes Griss
Proteomics Services Team
EMBL-EBI
Hinxton, Cambridge, UK
Division of Immunology, Allergy and
Infectious Diseases
Department of Dermatology
Medical University of Vienna, Austria

Johannes Griss
jgriss@ebi.ac.uk
HUPO 2014
Overview
• Need for mzTab
• Details about the data format (mzTab 1.0)
• Existing software implementations
• Extension of mzTab 1.0 for metabolomics

HUPO Proteomics Standards Initiative
• Develops data format standards for proteomics.
• Both data representation and annotation standards.
• Involves data producers, database providers, software producers, publishers, …
• Active Workgroups: MI, MS, PI, Mod, (Protein Separation).
• Inter-group activities: MIAPE and Controlled Vocabularies.
• Started in 2002, so some experience already…
www.psidev.info

PSI-MS/PI Standard File Formats before mzTab
Quantitation • mzQuantML
Identification • mzIdentML
MS data • mzML
SRM • TraML

Reasons for an additional file format (mzTab)
• mzIdentML and mzQuantML (necessarily) focus on the
complete representation of proteomics results
• Complex XML-based file formats
• Specialised software required for visualisation
• In-depth bioinformatics understanding required to create and
use files
• No simple method to communicate final results to non-proteomics
experts
• No simple method to utilise files through scripting
languages and standard statistical software

mzTab – Aims
• Store final results of an MS-based experiment in a single file
• Quantitation data
• Identification data
• Small Molecule data
• Reduce complexity to make data accessible to non-proteomics
/ bioinformatics experts
• Be easily accessible using “standard” software

mzTab – Aims
• What the format does NOT aim at:
• Replace mzIdentML or mzQuantML for proteomics
approaches
• Contain the complete data of an MS-based experiment
• Provide fully detailed evidence for the data
• Allow a researcher to recreate the process which led to the
results

Why a tab-delimited file?
• Using XML-based formats requires sophisticated
bioinformatics expertise
• Many researchers are still used to using MS Excel to “look”
at or exchange their data.
• Standard tab-delimited file formats for transcriptomics
(MAGE-TAB) and molecular interactions (MI-TAB) data
were already successful

mzTab format
http://mztab.googlecode.com

mzTab - Sections
• Metadata (key-value pairs): basic information about experiment and sample
• Protein (table-based): basic information about protein identifications
• Peptide (table-based): information about quantified peptides
• PSM (table-based): information about identified spectra
• Small Molecule (table-based): basic information about identified small molecules
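
The section layout can be illustrated with a short script. This is a minimal sketch, assuming the three-letter line prefixes of the mzTab 1.0 specification (MTD for metadata; PRH/PRT, PEH/PEP, PSH/PSM and SMH/SML for the header and data rows of the Protein, Peptide, PSM and Small Molecule tables; COM for comments). The function name `split_mztab_sections` and the sample lines are hypothetical and heavily abbreviated; they do not form a valid mzTab file.

```python
import csv
import io

# Header-row prefix -> (section name, data-row prefix), per mzTab 1.0.
TABLES = {
    "PRH": ("protein", "PRT"),
    "PEH": ("peptide", "PEP"),
    "PSH": ("psm", "PSM"),
    "SMH": ("small_molecule", "SML"),
}
ROW_TO_SECTION = {row: name for name, row in TABLES.values()}


def split_mztab_sections(text):
    """Split mzTab text into a metadata dict and one list of row dicts per table."""
    metadata = {}
    headers = {}
    tables = {name: [] for name, _ in TABLES.values()}
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields or fields[0] in ("", "COM"):
            continue  # skip empty lines and comment lines
        prefix = fields[0]
        if prefix == "MTD" and len(fields) >= 3:
            # Metadata: key in the second column, value in the third.
            metadata[fields[1]] = fields[2]
        elif prefix in TABLES:
            # Table header row: remember the column names for this section.
            headers[TABLES[prefix][0]] = fields[1:]
        elif prefix in ROW_TO_SECTION:
            # Table data row: pair each value with its column name.
            name = ROW_TO_SECTION[prefix]
            tables[name].append(dict(zip(headers.get(name, []), fields[1:])))
    return metadata, tables


# Hypothetical, abbreviated input for illustration only.
SAMPLE = (
    "COM\tHypothetical example, not taken from a real data set\n"
    "MTD\tmzTab-version\t1.0.0\n"
    "MTD\tmzTab-mode\tSummary\n"
    "MTD\tmzTab-type\tIdentification\n"
    "\n"
    "PRH\taccession\tdescription\n"
    "PRT\tP12345\tExample protein\n"
)

metadata, tables = split_mztab_sections(SAMPLE)
print(metadata["mzTab-mode"])
print(tables["protein"][0]["accession"])
```

Only the Python standard library is used here, which reflects the aim of keeping the format accessible from scripting languages; for real files, the implementations listed later in this talk are the safer choice.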

Metadata section - Example

mzTab – Modes and Types
• Modes (depending on the level of detail):
• ‘Summary’: only the ‘final results’.
• ‘Complete’: detailed information for each individual assay or
replicate is provided.
• Types:
• ‘Identification’: only identification results.
• ‘Quantification’: quantification results; these files can also contain identification results.
• Overall, 4 different file “flavors” are possible, so the design is very
flexible.
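
The four flavors follow from combining the two modes with the two types. The following is a minimal sketch, assuming the mode and type are recorded under the metadata keys `mzTab-mode` and `mzTab-type` as in the mzTab 1.0 specification; the function name `file_flavor` and the input lines are hypothetical.

```python
from itertools import product

MODES = ("Summary", "Complete")
TYPES = ("Identification", "Quantification")

# Every mode can be combined with every type: 2 x 2 = 4 flavors.
FLAVORS = list(product(MODES, TYPES))


def file_flavor(metadata_lines):
    """Return (mode, type) read from an iterable of tab-separated MTD lines."""
    found = {}
    for line in metadata_lines:
        fields = line.rstrip("\n").split("\t")
        is_mtd = len(fields) >= 3 and fields[0] == "MTD"
        if is_mtd and fields[1] in ("mzTab-mode", "mzTab-type"):
            found[fields[1]] = fields[2]
    flavor = (found.get("mzTab-mode"), found.get("mzTab-type"))
    if flavor not in FLAVORS:
        raise ValueError(f"unrecognised mode/type combination: {flavor}")
    return flavor


print(len(FLAVORS))
print(file_flavor(["MTD\tmzTab-mode\tComplete", "MTD\tmzTab-type\tQuantification"]))
```

Raising an error for a missing or unknown value is a design choice of this sketch, not a statement about how any particular validator behaves.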

Protein Section (label-free)

Peptide Section (label-free)
• Only used in “Quantification” files.

PSM section (identification data)

mzTab – Current implementations
• jmzTab (Java API): version 3.0 is now stable. Manuscript
published in the journal Proteomics.
• mzTab Validator, PRIDE XML to mzTab converter (PRIDE team).
• mzIdentML and mzQuantML to mzTab converters (Andy Jones
group).
• MaxQuant: an exporter is available in beta.
• OpenMS (version 1.10).
• R/Bioconductor package MSnbase (L. Gatto, Cambridge University).
• LipidDataAnalyzer (J. Hartler, University of Graz, see next talk).
• MetaboLights (EBI).

mzTab – ongoing development
• More detailed modelling of MS metabolomics data
• Led by S. Neumann (COSMOS EU FP7 project).
• Extension from one to three sections.
Example file exists at
https://github.com/sneumann/mtbls2/faahKO.mzTab
http://www.cosmos-fp7.eu/

mzTab format related publications
J. Griss et al., MCP, 2014
http://code.google.com/p/mztab/
Q.W. Xu et al., Proteomics, 2014

Current PSI-MS/PI Standard File Formats
Final Results • mzTab
Quantitation • mzQuantML
Identification • mzIdentML
MS data • mzML
SRM • TraML

Acknowledgements
Johannes Griss
Qing-Wei Xu
Henning Hermjakob
Timo Sachsenberg
Mathias Walzer
Oliver Kohlbacher
http://mztab.googlecode.com
Andy Jones
S. Neumann and other COSMOS partners
PSI editor and reviewers
… and many others have also contributed
BBSRC PROCESS grant
BBSRC ProteoSuite grant
