2. PSI-PI group structure
Roles and current incumbents:
Chair: Juan Antonio Vizcaíno
Co-chairs: Martin Eisenacher; Andy Jones
MIAPE Co-ordinator: Pierre-Alain Binz
Ontology Co-ordinator: Gerhard Mayer
Editor: Gerhard Mayer
Secretary: (none listed)
Website: Da Qi
Mailing List: psidev-pi-dev@lists.sourceforge.net
Of course, a very long list of people involved!
3. PSI Charter
Focus and Purpose
… The main goal of the PSI-PI working group is to define community data formats and associated controlled vocabulary terms, facilitating data exchange and archiving of:
the downstream results of proteomics analysis by mass spectrometry, including the identification and quantification of peptides and proteins by software, and the output of integrative analysis of proteomics data with other omics technologies (e.g. proteogenomics analysis).
4. PSI-PI working group
• mzIdentML
• mzQuantML
• mzTab
• Proteogenomics data formats: proBed and proBAM
• MIAPE guidelines
• Joint efforts:
– PSI MS CV
– mzTab extension for metabolomics data (Metabolomics Group)
– PSI MS group
5. mzIdentML 1.1 and mzIdentML 1.2
Data standard for peptide and protein identification data
• Timeline: mzIdentML 1.1 (2011), then development of mzIdentML 1.2 (2012-2017)
• mzIdentML 1.2 adds new support for:
– Cross-linking approaches
– Peptide-level scores
– Modification localization scores
– Proteogenomics approaches
• Improved support for:
– Protein inference
– Pre-fractionation
– De novo sequencing
– Spectral library searches
• Increasingly supported by the most-used proteomics software and databases, e.g. jmzIdentML, mzid Library, ms-data-core-api, MyriMatch, PIA, ProCon
6. mzIdentML
• Overview
– XML-based data standard for peptide and protein identifications (e.g. following database search and protein inference)
– Sections for all PSMs, proteins/protein groups, protocols/parameters etc.
• Timeline:
– Original 1.0 version in Aug 2009
– Version 1.1 stable (Aug 2011); Original manuscript published in MCP in 2012*
– Widely supported in open-source and commercial software
– Fully supported by ProteomeXchange resources
– 2012 onwards (mzIdentML 1.2): extended use cases
• Better support for protein grouping. Manuscript published in Proteomics **
– 2017: mzIdentML 1.2 released; manuscript published in MCP***
• Open issues
– Implementation plan to move software from 1.1 support to 1.2?
* Jones, A. R., Eisenacher, M., Mayer, G., Kohlbacher, O., et al., The mzIdentML data standard for mass spectrometry-based proteomics results. Molecular & Cellular Proteomics 2012, 11, M111.014381.
** Seymour, S. L., Farrah, T., Binz, P. A., Chalkley, R. J., et al., A standardized framing for reporting protein identifications in mzIdentML 1.2. Proteomics 2014, 14, 2389-2399.
*** Vizcaíno, J. A., Mayer, G., Perkins, S., Barsnes, H., et al., The mzIdentML Data Standard Version 1.2, Supporting Advances in Proteome Informatics. Molecular & Cellular Proteomics 2017, 16, 1275-1285.
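To make the "sections for all PSMs" point concrete, here is a minimal Python sketch that pulls PSMs out of an mzIdentML file using only the standard-library ElementTree. It relies on mzIdentML element and attribute names (SpectrumIdentificationResult, SpectrumIdentificationItem, spectrumID, peptide_ref, chargeState, rank, passThreshold, cvParam), which should be checked against the official schema. The embedded document is a hand-written, abridged fragment that is not schema-valid, and its cvParam accession is a placeholder, not a real PSI-MS CV term. For real files, dedicated libraries such as jmzIdentML are the safer choice.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment in the style of mzIdentML 1.2. Deliberately abridged:
# it is NOT schema-valid (real files also need Inputs, SequenceCollection,
# AnalysisProtocolCollection, ...). The cvParam accession is a placeholder.
EXAMPLE = """<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.2" version="1.2.0">
  <DataCollection>
    <AnalysisData>
      <SpectrumIdentificationList id="SIL_1">
        <SpectrumIdentificationResult id="SIR_1" spectrumID="index=5" spectraData_ref="SD_1">
          <SpectrumIdentificationItem id="SII_1_1" rank="1" chargeState="2"
              experimentalMassToCharge="523.77" calculatedMassToCharge="523.78"
              peptide_ref="PEP_1" passThreshold="true">
            <cvParam cvRef="PSI-MS" accession="MS:XXXXXXX" name="placeholder score" value="0.0012"/>
          </SpectrumIdentificationItem>
        </SpectrumIdentificationResult>
      </SpectrumIdentificationList>
    </AnalysisData>
  </DataCollection>
</MzIdentML>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_psms(xml_text):
    """Return one dict per SpectrumIdentificationItem (i.e. per PSM)."""
    root = ET.fromstring(xml_text)
    psms = []
    for result in root.iter():
        if local_name(result.tag) != "SpectrumIdentificationResult":
            continue
        for item in result:
            if local_name(item.tag) != "SpectrumIdentificationItem":
                continue
            # Collect cvParam children (e.g. search engine scores) by name.
            scores = {
                p.get("name"): p.get("value")
                for p in item
                if local_name(p.tag) == "cvParam"
            }
            psms.append({
                "spectrumID": result.get("spectrumID"),
                "peptide_ref": item.get("peptide_ref"),
                "rank": int(item.get("rank")),
                "charge": int(item.get("chargeState")),
                # xs:boolean allows both "true" and "1".
                "passThreshold": item.get("passThreshold") in ("true", "1"),
                "scores": scores,
            })
    return psms


for psm in extract_psms(EXAMPLE):
    print(psm["spectrumID"], psm["peptide_ref"], psm["charge"], psm["scores"])
```

Matching on local tag names keeps the sketch indifferent to whether a file declares the 1.1 or the 1.2 namespace.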
8. mzQuantML status
Overview
• XML-based standard for quantification data. Can report tables of data (QuantLayers) whose columns are StudyVariables, Assays or Ratios and whose rows are ProteinGroups, Proteins or Peptides
• Can also capture 2D coordinates of quantified regions in LC-MS (Features)
Timeline
• Work started in Oct 2011, and progressed at various PSI meetings
• Completed PSI process in Feb 2013 – version 1.0 release
– Supports label-free (intensity), label-free (spectral counting), MS2 tag techniques (e.g. iTRAQ) and MS1 label techniques (e.g. SILAC)*
• Updated in 2013-2014 to support SRM as a new technique**; mzqLibrary software library***
• 2015, mzQuantML 1.0.1 – minor update with SRM included
Open issues
• Not widely supported by software and no active development. Efforts are being put into mzTab support instead
* Walzer et al., Molecular & Cellular Proteomics 2013, 12, 2332-2340. doi: 10.1074/mcp.O113.028506
** Qi et al., Proteomics 2015, 15, 3152-3162.
*** Qi et al., Proteomics 2015, 15, 2592-2596.
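The QuantLayer idea (a table whose columns are StudyVariables, Assays or Ratios and whose rows are ProteinGroups, Proteins or Peptides) can be pictured with a small in-memory model. The sketch below is hypothetical: the QuantLayer class and the ratio_layer function are illustrative names, not part of mzQuantML or mzqLibrary, and the XML serialisation is not shown. It derives a one-column ratio layer from two assay columns.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class QuantLayer:
    """Hypothetical in-memory view of one mzQuantML QuantLayer.

    column_refs: ids of the columns (e.g. Assays, StudyVariables or Ratios)
    rows: object id (e.g. a Protein or Peptide id) -> one value per column
    """
    column_refs: List[str]
    rows: Dict[str, List[Optional[float]]] = field(default_factory=dict)

    def value(self, object_ref: str, column_ref: str) -> Optional[float]:
        return self.rows[object_ref][self.column_refs.index(column_ref)]


def ratio_layer(layer: QuantLayer, numerator: str, denominator: str,
                ratio_id: str) -> QuantLayer:
    """Build a one-column layer holding numerator/denominator for every row.

    Rows with a missing value or a zero denominator get None instead of a ratio.
    """
    out = QuantLayer(column_refs=[ratio_id])
    for object_ref in layer.rows:
        num = layer.value(object_ref, numerator)
        den = layer.value(object_ref, denominator)
        out.rows[object_ref] = [num / den if num is not None and den else None]
    return out


# Illustrative assay-level values for two proteins (invented numbers).
assays = QuantLayer(
    column_refs=["assay_light", "assay_heavy"],
    rows={"prot_1": [200.0, 100.0], "prot_2": [50.0, 0.0]},
)
ratios = ratio_layer(assays, "assay_light", "assay_heavy", "ratio_light_heavy")
print(ratios.rows)  # {'prot_1': [2.0], 'prot_2': [None]}
```

Keeping one value list per row, indexed by the column ids, mirrors the "rows are objects, columns are references" shape described on the slide.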
9. mzTab status
• Overview
– Tab-delimited data standard for peptide, protein and small molecule identification and quantification.
– Metadata section (key-value pairs).
– Proteomics: Protein, peptide and PSM section.
– Metabolomics: Small Molecule section (in version 1.0)
• Timeline:
– Version 1.0 release in June 2014.
– Manuscript published in 2014*; Java API (jmzTab)**
– 1.0 is stable and implemented in PRIDE/ProteomeXchange (for ID data).
• Implementations (Proteomics)
– Much potential interest, but still few implementations (e.g. Mascot export)
– MaxQuant (still work in progress, J. Cox)
* Griss, J., Jones, A. R., Sachsenberg, T., Walzer, M., et al., The mzTab data exchange format: communicating MS-based proteomics and metabolomics results to a wider audience. Molecular & Cellular Proteomics 2014, 13, 2765-2775.
** Xu, Q. W., Griss, J., Wang, R., et al., jmzTab: a Java interface to the mzTab data standard. Proteomics 2014, 14, 1328-1332.
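The mzTab layout (metadata key-value pairs plus row-oriented sections) is driven by a prefix in the first column of every line. A minimal Python reader, assuming the mzTab 1.0 prefixes (MTD, COM, PRH/PRT, PEH/PEP, PSH/PSM, SMH/SML), could look like the sketch below. It does no type conversion or validation, and the example file is abridged (mandatory columns omitted); for real files, the jmzTab Java API is the more complete option.

```python
from typing import Dict, List, Tuple

# Header-line prefix -> prefix of the data rows it describes (mzTab 1.0).
HEADER_TO_ROW = {"PRH": "PRT", "PEH": "PEP", "PSH": "PSM", "SMH": "SML"}


def parse_mztab(text: str) -> Tuple[Dict[str, str], Dict[str, List[Dict[str, str]]]]:
    """Split an mzTab file into metadata key-value pairs and section tables."""
    metadata: Dict[str, str] = {}
    sections: Dict[str, List[Dict[str, str]]] = {row: [] for row in HEADER_TO_ROW.values()}
    headers: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        prefix = fields[0]
        if prefix == "MTD":
            # Metadata line: MTD <tab> key <tab> value
            metadata[fields[1]] = fields[2] if len(fields) > 2 else ""
        elif prefix in HEADER_TO_ROW:
            # Header line: remember the column names for the following rows.
            headers[HEADER_TO_ROW[prefix]] = fields[1:]
        elif prefix in sections:
            if prefix not in headers:
                raise ValueError(f"{prefix} row appears before its header line")
            sections[prefix].append(dict(zip(headers[prefix], fields[1:])))
        # COM (comment) lines and unknown prefixes are skipped.
    return metadata, sections


# Abridged, illustrative file: a real PSM section has many more columns.
EXAMPLE = "\n".join("\t".join(f) for f in [
    ["MTD", "mzTab-version", "1.0.0"],
    ["MTD", "mzTab-mode", "Summary"],
    ["MTD", "mzTab-type", "Identification"],
    ["COM", "abridged example; mandatory columns omitted"],
    ["PSH", "sequence", "PSM_ID", "accession", "charge"],
    ["PSM", "PEPTIDEK", "1", "P12345", "2"],
])

meta, sections = parse_mztab(EXAMPLE)
print(meta["mzTab-version"], sections["PSM"])
```

Because every line carries its own prefix, sections can be read in a single pass without tracking block boundaries.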
10. mzTab status: mzTab-M 1.1
• There is a need for a format for reporting qualitative and quantitative results for MS-based metabolomics data.
• It was decided to separate the development of mzTab-M and mzTab-P. The changes made for metabolomics are not backwards compatible with the proteomics part.
• mzTab-M 1.1 has been in active development (e.g. meeting at EBI, August 2017). The plan is to finish it at this meeting.
• Changes to the proteomics version are currently being explored:
– The required changes need to be documented well.
– Start work on implementing mzTab for DIA approaches
• There is interest in a tailored version for lipidomics data (mzTab-L)
11. Proteogenomics related formats
• Two formats have been developed: proBed and proBAM. The data standards have been formalised and the manuscript has just been published*
• Same overall objective: to map identified peptides to genome
coordinates for visualisation and annotation
• Different level of detail:
– proBed is tab-delimited and simpler, based on the original BED format. Lower level of detail.
– proBAM is based on the original SAM/BAM formats, widely used in genomics. Much higher level of detail.
• Near future: Focus on Adoption -> Software implementations
* Menschaert, G., Wang, X., Jones, A. R., Ghali, F., et al., The proBAM and proBed standard formats: enabling a seamless integration of genomics and proteomics data. Genome Biology 2018, 19, 12.
13. New initiative: proVCF
• A generic format for representing genetic variation at the
protein level
• Developed based on the Variant Call Format (VCF), widely used by the genomics and transcriptomics community
• It is a tab-delimited file with a header and a data section
• Stores protein polymorphisms such as single amino acid polymorphisms, insertions and deletions
• Provides a toolset to process proVCF files
• Shyama will present the current status to gather interest.
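As a rough illustration of how a VCF-like layout could carry protein-level variants, the sketch below reads the fixed VCF columns and labels each variant. It is built on assumptions, not on the proVCF specification: it supposes that CHROM holds a protein accession, POS a residue position and REF/ALT amino acid sequences, with VCF-style anchoring of insertions and deletions. The parse_provcf and classify names are hypothetical, and the proVCF toolset itself is not reproduced.

```python
from typing import List, Tuple

# The eight fixed VCF data columns. ASSUMPTION: a proVCF-style file reuses
# them with a protein accession in CHROM, a residue position in POS and
# amino acid sequences in REF/ALT.
VCF_FIXED = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def classify(ref: str, alt: str) -> str:
    """Label a protein-level variant from its REF/ALT residues (VCF-style anchoring)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SAAP"  # single amino acid polymorphism
    if len(alt) > len(ref) and alt.startswith(ref):
        return "insertion"
    if len(ref) > len(alt) and ref.startswith(alt):
        return "deletion"
    return "complex"


def parse_provcf(text: str) -> List[Tuple[str, int, str, str, str]]:
    """Return (protein, position, ref, alt, label) for every ALT allele."""
    records = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            # Header section: '##' meta-information lines and the '#CHROM' line.
            continue
        rec = dict(zip(VCF_FIXED, line.split("\t")))
        for alt in rec["ALT"].split(","):  # VCF allows several ALT alleles
            records.append((rec["CHROM"], int(rec["POS"]), rec["REF"], alt,
                            classify(rec["REF"], alt)))
    return records


# Invented, VCF-shaped example with one variant of each kind.
EXAMPLE = "\n".join([
    "##comment=hypothetical proVCF-style example",
    "#" + "\t".join(VCF_FIXED),
    "\t".join(["P12345", "42", ".", "A", "V", ".", "PASS", "."]),
    "\t".join(["P12345", "100", ".", "K", "KR", ".", "PASS", "."]),
    "\t".join(["P12345", "150", ".", "GSE", "G", ".", "PASS", "."]),
])

for record in parse_provcf(EXAMPLE):
    print(record)
```

Reusing the VCF column layout would let existing VCF tooling skip or filter records without understanding protein-level semantics, which is presumably part of the appeal of basing the format on VCF.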
14. MIAPE documents
• Originally one MIAPE document:
– MIAPE Mass Spectrometry Informatics (MSI), containing both identification and quantification guidelines
– Moved to a split: MIAPE MSI (identification only) and MIAPE Quant
MIAPE MSI (Mass Spec Informatics) status
– MIAPE MSI 1.1 published back in 2008
– Working group (2011-2012): minor updates to requirements and removal of the quantification parts (moved to the new MIAPE Quant document)… but the process has not been completed for MIAPE MSI
MIAPE QUANT
– Work started in Dec 2010 by ProteoRed groups
– Major revision accepted in October 2012; publication: Martínez-Bartolomé, S., et al., Journal of Proteomics 2013, 95, 84-88.
Open issues
• MIAPE documents have not been actively worked on for some years, and it is not obvious that there is great demand for updating them
16. Main plans for meeting
• mzTab
– Discuss status of mzTab-P and start work on the encoding of DIA data.
– Work towards finishing mzTab-M 1.1
– Discuss mzTab-L
• Discuss status of proBed/proBAM and related
software.
• Discuss proVCF format -> Capture genetic variation
data at the protein level (based on VCF) (Shyama)
• Discuss future plans for the other formats