PSI-Proteome Informatics update
Juan Antonio Vizcaíno
PSI Meeting Heidelberg 2018
PSI-PI group structure
Role Current Incumbent
Chair Juan Antonio Vizcaíno
Co-chair Martin Eisenacher; Andy Jones
MIAPE Co-ordinator Pierre-Alain Binz
Ontology Co-ordinator Gerhard Mayer
Editor Gerhard Mayer
Secretary
Website Da Qi
Mailing List: psidev-pi-dev@lists.sourceforge.net
Of course, a very long list of people involved!
PSI Charter
Focus and Purpose
… The main goal of the PSI-PI working group is to
define community data formats and associated
controlled vocabulary terms, facilitating data
exchange and archiving of:
the downstream results of proteomics analysis by
mass spectrometry, including the identification and
quantification of peptides and proteins by
software, and the output of integrative analysis of
proteomics data with other omics technologies
(e.g. proteogenomics analysis).
PSI PI Working group
• mzIdentML
• mzQuantML
• mzTab
• Proteogenomics data formats: proBed and proBAM
• MIAPE guidelines
• Joint efforts:
– PSI MS CV
– mzTab extension for metabolomics data (Metabolomics Group)
– PSI MS group
mzIdentML: data standard for peptide and protein identification data
mzIdentML 1.1 (2011-2012)
mzIdentML 1.2 (2017)
New support for:
- Cross-linking approaches
- Peptide level scores
- Modification localization scores
- Proteogenomics approaches
Improved support for:
- Protein inference
- Pre-fractionation
- de-novo sequencing
- Spectral library searches
Increasingly supported by the most-used proteomics software and databases
jmzIdentML
mzid Library
ms-data-core-api
MyriMatch
PIA
ProCon
mzIdentML
• Overview
– XML-based data standard for peptide and protein identifications e.g. following
database search and protein inference
– Sections for all PSMs, proteins/protein groups, protocols/parameters etc.
• Timeline:
– Original 1.0 version in Aug 2009
– Version 1.1 stable (Aug 2011); Original manuscript published in MCP in 2012*
– Well supported in lots of open source and commercial software
– Fully supported by ProteomeXchange resources
– 2012 onwards (mzIdentML 1.2): extended use cases
• Better support for protein grouping. Manuscript published in Proteomics **
– 2017 mzIdentML 1.2 release; manuscript published in MCP***
• Open issues
– Implementation plan to move software from 1.1 support to 1.2?
* Jones, A. R., Eisenacher, M., Mayer, G., Kohlbacher, O., et al., The mzIdentML data standard for mass spectrometry-
based proteomics results. Molecular & Cellular Proteomics 2012, 11, M111.014381.
** Seymour, S. L., Farrah, T., Binz, P. A., Chalkley, R. J., et al., A standardized framing for reporting protein identifications
in mzIdentML 1.2. Proteomics 2014, 14, 2389-2399.
*** Vizcaíno, J. A., Mayer G., Perkins S., Barsnes H., et al., The mzIdentML Data Standard Version 1.2, Supporting
advances in Proteome Informatics. Molecular & Cellular Proteomics 2017, 16, 1275-1285.
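As a rough illustration of the "XML-based, with sections for PSMs" point in the overview above, the following sketch reads peptide-spectrum matches from a tiny mzIdentML-like fragment using only the Python standard library. The fragment is hypothetical and heavily trimmed (a real file carries many more required elements and attributes), and `passing_psms` is an illustrative helper, not part of any PSI library.

```python
import xml.etree.ElementTree as ET

# Namespace URI of mzIdentML 1.1 documents (1.2 files use a ".../1.2" URI).
NS = {"mzid": "http://psidev.info/psi/pi/mzIdentML/1.1"}

# Hypothetical, heavily trimmed fragment for illustration only.
SAMPLE = """<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1">
  <DataCollection>
    <AnalysisData>
      <SpectrumIdentificationList id="SIL_1">
        <SpectrumIdentificationResult id="SIR_1" spectrumID="index=0">
          <SpectrumIdentificationItem id="SII_1" peptide_ref="PEP_1"
              chargeState="2" rank="1" passThreshold="true"/>
          <SpectrumIdentificationItem id="SII_2" peptide_ref="PEP_2"
              chargeState="2" rank="2" passThreshold="false"/>
        </SpectrumIdentificationResult>
      </SpectrumIdentificationList>
    </AnalysisData>
  </DataCollection>
</MzIdentML>"""


def passing_psms(xml_text):
    """Return (spectrumID, peptide_ref, rank) for PSMs flagged as passing."""
    root = ET.fromstring(xml_text)
    hits = []
    for result in root.iterfind(".//mzid:SpectrumIdentificationResult", NS):
        for item in result.iterfind("mzid:SpectrumIdentificationItem", NS):
            if item.get("passThreshold") == "true":
                hits.append(
                    (result.get("spectrumID"),
                     item.get("peptide_ref"),
                     int(item.get("rank")))
                )
    return hits


print(passing_psms(SAMPLE))
```

For production use, the dedicated libraries named earlier in the deck (e.g. jmzIdentML) are the more appropriate route; this only shows the shape of the data.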
mzQuantML status
Overview
• XML-based standard for quantification data
• Can report tables of data (QuantLayers): columns are StudyVariables, Assays or
Ratios; rows are ProteinGroups, Proteins or Peptides
• Can also capture 2D coordinates of quantified regions in LC-MS (Features)
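The QuantLayer idea described above (a table whose columns are StudyVariables, Assays or Ratios and whose rows are ProteinGroups, Proteins or Peptides) can be sketched as a small in-memory structure. This is a conceptual model only: the class, field and identifier names below are assumptions made for illustration and do not reproduce the mzQuantML XML schema.

```python
from dataclasses import dataclass, field


@dataclass
class QuantLayer:
    """Conceptual stand-in for one mzQuantML QuantLayer table."""
    row_type: str       # "ProteinGroup", "Protein" or "Peptide"
    column_type: str    # "StudyVariable", "Assay" or "Ratio"
    columns: list       # column identifiers, in order
    rows: dict = field(default_factory=dict)  # row id -> list of values

    def add_row(self, row_id, values):
        # Every row must supply exactly one value per column.
        if len(values) != len(self.columns):
            raise ValueError("expected %d values, got %d"
                             % (len(self.columns), len(values)))
        self.rows[row_id] = list(values)

    def value(self, row_id, column):
        # Look up a single cell by row id and column id.
        return self.rows[row_id][self.columns.index(column)]


# Hypothetical identifiers and intensities.
layer = QuantLayer("Protein", "Assay", ["assay_1", "assay_2"])
layer.add_row("prot_A", [10.0, 12.5])
print(layer.value("prot_A", "assay_2"))
```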
Timeline
• Work started in Oct 2011, and progressed at various PSI meetings
• Completed PSI process in Feb 2013 – version 1.0 release
– Supports label-free (intensity), label-free (spectral counting), MS2 tag techniques
(e.g. iTRAQ) and MS1 label techniques e.g. SILAC*
• Updated in 2013-2014 to support SRM as a new technique**; mzqLibrary***
• 2015, mzQuantML 1.0.1 – minor update with SRM included
Open issues
• Not widely supported by software. No live development. Efforts are being put
into mzTab support instead
*Walzer et al. MCP 2013 Aug;12(8):2332-40. doi: 10.1074/mcp.O113.028506
**Qi et al. PROTEOMICS, 2015, 15(18):3152-62
*** Qi et al PROTEOMICS 2015, 15, 2592-2596.
mzTab status
• Overview
– Tab delimited data standard for peptide, protein and small molecule
identifications and quantification.
– Metadata section (key-value pairs).
– Proteomics: Protein, peptide and PSM section.
– Metabolomics: Small Molecule section (in version 1.0)
• Timeline:
– Version 1.0 release in June 2014.
– Manuscript published in 2014* and Java API**
– 1.0 is stable and implemented in PRIDE/ProteomeXchange (for ID data).
• Implementations (Proteomics)
– A lot of potential interest, but still few implementations (Mascot export)
– MaxQuant (still work in progress, J. Cox)
* Griss, J, Jones, A. R., Sachsenberg T, Walzer M, et al., The mzTab data exchange format: communicating MS-based proteomics and
metabolomics results to a wider audience. Molecular & Cellular Proteomics 2014, 13(10), 2765-75.
**Xu, QW, Griss J, Wang R, et al., jmzTab: a Java interface to the mzTab data standard. Proteomics 2014, 14(11), 1328-1332.
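To make the mzTab layout from the overview above concrete (a key-value metadata section followed by tab-delimited protein, peptide, PSM and small molecule sections), here is a minimal reader. It relies on the mzTab 1.0 line prefixes (MTD, PRH/PRT, PEH/PEP, PSH/PSM, SMH/SML, COM); the sample content is hypothetical and omits most mandatory columns, and `parse_mztab` is an illustrative helper rather than the jmzTab API.

```python
from collections import defaultdict

# Hypothetical, trimmed example; real files have many more columns.
SAMPLE = "\n".join([
    "MTD\tmzTab-version\t1.0.0",
    "MTD\tdescription\tHypothetical example",
    "PRH\taccession\tdescription",
    "PRT\tP12345\tExample protein",
    "PSH\tsequence\tPSM_ID\taccession",
    "PSM\tPEPTIDEK\t1\tP12345",
])

# Header-line prefix -> data-line prefix for each table section.
SECTIONS = {"PRH": "PRT", "PEH": "PEP", "PSH": "PSM", "SMH": "SML"}


def parse_mztab(text):
    """Split mzTab text into a metadata dict and per-section row dicts."""
    metadata = {}
    headers = {}
    tables = defaultdict(list)
    for line in text.splitlines():
        if not line.strip() or line.startswith("COM"):
            continue  # skip blank lines and comment lines
        prefix, *fields = line.split("\t")
        if prefix == "MTD":
            metadata[fields[0]] = fields[1]
        elif prefix in SECTIONS:
            headers[SECTIONS[prefix]] = fields
        elif prefix in headers:
            tables[prefix].append(dict(zip(headers[prefix], fields)))
    return metadata, dict(tables)


metadata, tables = parse_mztab(SAMPLE)
print(metadata["mzTab-version"], tables["PSM"][0]["sequence"])
```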
mzTab status: mzTab-M 1.1
• Need to implement a format for reporting qualitative and quantitative
results for MS metabolomics data.
• It was decided to separate development between mzTab-M and mzTab-P.
Changes made for metabolomics are not backwards compatible
with the proteomics part.
• mzTab-M 1.1 has been in active development (e.g. meeting at EBI, August
2017). The plan is to finish it at this meeting.
• Exploring changes in the proteomics version at present:
– We need to document well the needed changes.
– Start work on implementing mzTab for DIA approaches
• There is interest in a tailored version for lipidomics data (mzTab-L)
Proteogenomics related formats
• Two formats have been developed: proBed and
proBAM. The data standards have been formalised and the manuscript has just
been published*
• Same overall objective: to map identified peptides to genome
coordinates for visualisation and annotation
• Different level of detail:
– proBed is tab-delimited and simpler, based on the original
BED format. Lower level of detail.
– proBAM is based on the original SAM/BAM formats, widely
used in genomics. Much higher level of detail.
• Near future: Focus on Adoption -> Software implementations
* Menschaert G., Wang X., Jones A.R., Ghali F. et al. The proBAM and proBed standard formats: enabling a seamless integration
of genomics and proteomics data. Genome Biol. 2018, 19, 12.
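Since proBed builds on the BED format, mapping a peptide to genome coordinates starts from the standard BED leading columns. The sketch below parses only those six standard BED fields (chrom, chromStart, chromEnd, name, score, strand) and returns any further columns untouched; it does not attempt to model the proteomics-specific proBed columns. The sample line and the `BedCore` and `parse_bed_core` names are hypothetical.

```python
from typing import NamedTuple


class BedCore(NamedTuple):
    chrom: str
    chrom_start: int  # 0-based start position
    chrom_end: int    # end position, exclusive
    name: str
    score: int
    strand: str


def parse_bed_core(line):
    """Parse the first six BED columns; return (core, remaining_fields)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError("expected at least 6 tab-separated fields")
    chrom, start, end, name, score, strand = fields[:6]
    core = BedCore(chrom, int(start), int(end), name, int(score), strand)
    return core, fields[6:]


# Hypothetical record: a 30-base span on the forward strand of chr1.
core, extra = parse_bed_core("chr1\t1000\t1030\tpeptide_1\t900\t+\textra_field")
print(core.chrom, core.chrom_end - core.chrom_start, extra)
```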
TrackHubs in Genome Browsers
New initiative: proVCF
• A generic format for representing genetic variation at the
protein level
• Developed based on Variant Calling Format (VCF) widely used
by the genomics and transcriptomics community
• It's a tab-delimited file with a header and data section
• Stores protein polymorphisms such as Single Amino Acid
polymorphisms, Insertions, and Deletions
• Provides a toolset to process proVCF files
• Shyama will present the current status, to gather interest.
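The "header plus data section" layout mentioned above can be sketched by following the VCF convention that header lines start with "#". Everything specific in this example is an assumption: the proVCF column set is not given in these slides, so the column names and the record below are hypothetical and merely VCF-like.

```python
# Hypothetical VCF-like content; not taken from a proVCF specification.
SAMPLE = "\n".join([
    "##fileformat=hypothetical-proVCF",
    "#CHROM\tPOS\tID\tREF\tALT",
    "P12345\t42\t.\tA\tV",
])


def split_header_and_data(text):
    """Separate '#'-prefixed header lines from tab-delimited data rows."""
    header, data = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("#"):
            header.append(line)
        else:
            data.append(line.split("\t"))
    return header, data


header, data = split_header_and_data(SAMPLE)
print(len(header), data)
```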
MIAPE documents
• Originally one MIAPE document:
– MIAPE Mass spectrometry information (MSI) containing both identification guidelines and
quant guidelines
– Move to split: MIAPE MSI (ident only) and MIAPE Quant
MIAPE MSI (Mass Spec Informatics) status
– MIAPE MSI 1.1 published back in 2008
– Working group 2011-2012 minor updates to requirements and removal of quant parts (new
MIAPE Quant doc)… but process has not been completed for MIAPE MSI
MIAPE QUANT
– Work started in Dec 2010 by ProteoRed groups
– Major revision accepted in October 2012; Publication: Martínez-Bartolomé, S., et al., Journal
of Proteomics, 2013, Dec 16;95:84-8.
Open issues
• MIAPE documents have not been actively worked on for some years; it is not obvious
that there is great demand for updating them
Publications since last meeting
Main plans for meeting
• mzTab
– Discuss status of mzTab-P and start working on the
encoding of DIA data.
– Work towards finishing mzTab-M 1.1
– Discuss mzTab-L
• Discuss status of proBed/proBAM and related
software.
• Discuss proVCF format -> Capture genetic variation
data at the protein level (based on VCF) (Shyama)
• Discuss future plans for the other formats
PSI-Proteome Informatics update

More Related Content

Similar to PSI-Proteome Informatics update

Introduction to the PSI standard data formats
Introduction to the PSI standard data formatsIntroduction to the PSI standard data formats
Introduction to the PSI standard data formatsJuan Antonio Vizcaino
 
The mzTab data standard format for reporting MS-based peptide, protein and sm...
The mzTab data standard format for reporting MS-based peptide, protein and sm...The mzTab data standard format for reporting MS-based peptide, protein and sm...
The mzTab data standard format for reporting MS-based peptide, protein and sm...Juan Antonio Vizcaino
 
Mass Spectrometry Informatics formats in progress
Mass Spectrometry Informatics formats in progressMass Spectrometry Informatics formats in progress
Mass Spectrometry Informatics formats in progressJuan Antonio Vizcaino
 
Experiences to learn from the MS proteomics field
Experiences to learn from the MS proteomics fieldExperiences to learn from the MS proteomics field
Experiences to learn from the MS proteomics fieldJuan Antonio Vizcaino
 
Data volumes in proteomics data resources: PRIDE and ProteomeXchange
Data volumes in proteomics data resources: PRIDE and ProteomeXchangeData volumes in proteomics data resources: PRIDE and ProteomeXchange
Data volumes in proteomics data resources: PRIDE and ProteomeXchangeJuan Antonio Vizcaino
 
ProteomeXchange_and_PRIDE_Semmeting_2015
ProteomeXchange_and_PRIDE_Semmeting_2015ProteomeXchange_and_PRIDE_Semmeting_2015
ProteomeXchange_and_PRIDE_Semmeting_2015Juan Antonio Vizcaino
 
ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)Piet J.H. Daas
 
NRNB Annual Report 2018
NRNB Annual Report 2018NRNB Annual Report 2018
NRNB Annual Report 2018Alexander Pico
 
EMBL-EBI Proteomics data resources and services
EMBL-EBI Proteomics data resources and servicesEMBL-EBI Proteomics data resources and services
EMBL-EBI Proteomics data resources and servicesRafael C. Jimenez
 
Nataly Zhukova - Conceptual Model for Routine Measurements Analyses in Seman...
Nataly Zhukova - Conceptual Model for Routine Measurements Analyses  in Seman...Nataly Zhukova - Conceptual Model for Routine Measurements Analyses  in Seman...
Nataly Zhukova - Conceptual Model for Routine Measurements Analyses in Seman...AIST
 
KU_Big_Data_3_25_2015a
KU_Big_Data_3_25_2015aKU_Big_Data_3_25_2015a
KU_Big_Data_3_25_2015avonmcconnell
 
tranSMART Community Meeting 5-7 Nov 13 - Session 3: transmart’s application t...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: transmart’s application t...tranSMART Community Meeting 5-7 Nov 13 - Session 3: transmart’s application t...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: transmart’s application t...David Peyruc
 

Similar to PSI-Proteome Informatics update (20)

Proteomics data standards
Proteomics data standardsProteomics data standards
Proteomics data standards
 
Introduction to the PSI standard data formats
Introduction to the PSI standard data formatsIntroduction to the PSI standard data formats
Introduction to the PSI standard data formats
 
The mzTab data standard format for reporting MS-based peptide, protein and sm...
The mzTab data standard format for reporting MS-based peptide, protein and sm...The mzTab data standard format for reporting MS-based peptide, protein and sm...
The mzTab data standard format for reporting MS-based peptide, protein and sm...
 
Mass Spectrometry Informatics formats in progress
Mass Spectrometry Informatics formats in progressMass Spectrometry Informatics formats in progress
Mass Spectrometry Informatics formats in progress
 
Proteomics data standards
Proteomics data standardsProteomics data standards
Proteomics data standards
 
Experiences to learn from the MS proteomics field
Experiences to learn from the MS proteomics fieldExperiences to learn from the MS proteomics field
Experiences to learn from the MS proteomics field
 
Data volumes in proteomics data resources: PRIDE and ProteomeXchange
Data volumes in proteomics data resources: PRIDE and ProteomeXchangeData volumes in proteomics data resources: PRIDE and ProteomeXchange
Data volumes in proteomics data resources: PRIDE and ProteomeXchange
 
ProteomeXchange_and_PRIDE_Semmeting_2015
ProteomeXchange_and_PRIDE_Semmeting_2015ProteomeXchange_and_PRIDE_Semmeting_2015
ProteomeXchange_and_PRIDE_Semmeting_2015
 
ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)
 
Proteomics data standards
Proteomics data standardsProteomics data standards
Proteomics data standards
 
Proteomics data standards
Proteomics data standardsProteomics data standards
Proteomics data standards
 
Proteomics repositories
Proteomics repositoriesProteomics repositories
Proteomics repositories
 
NRNB Annual Report 2018
NRNB Annual Report 2018NRNB Annual Report 2018
NRNB Annual Report 2018
 
EMBL-EBI Proteomics data resources and services
EMBL-EBI Proteomics data resources and servicesEMBL-EBI Proteomics data resources and services
EMBL-EBI Proteomics data resources and services
 
Nataly Zhukova - Conceptual Model for Routine Measurements Analyses in Seman...
Nataly Zhukova - Conceptual Model for Routine Measurements Analyses  in Seman...Nataly Zhukova - Conceptual Model for Routine Measurements Analyses  in Seman...
Nataly Zhukova - Conceptual Model for Routine Measurements Analyses in Seman...
 
KU_Big_Data_3_25_2015a
KU_Big_Data_3_25_2015aKU_Big_Data_3_25_2015a
KU_Big_Data_3_25_2015a
 
PRIDE-ProteomeXchange
PRIDE-ProteomeXchangePRIDE-ProteomeXchange
PRIDE-ProteomeXchange
 
Pride and ProteomeXchange
Pride and ProteomeXchangePride and ProteomeXchange
Pride and ProteomeXchange
 
PRIDE and ProteomeXchange
PRIDE and ProteomeXchangePRIDE and ProteomeXchange
PRIDE and ProteomeXchange
 
tranSMART Community Meeting 5-7 Nov 13 - Session 3: transmart’s application t...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: transmart’s application t...tranSMART Community Meeting 5-7 Nov 13 - Session 3: transmart’s application t...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: transmart’s application t...
 

More from Juan Antonio Vizcaino

Reusing and integrating public proteomics data to improve our knowledge of th...
Reusing and integrating public proteomics data to improve our knowledge of th...Reusing and integrating public proteomics data to improve our knowledge of th...
Reusing and integrating public proteomics data to improve our knowledge of th...Juan Antonio Vizcaino
 
Introduction to the Proteomics Bioinformatics Course 2018
Introduction to the Proteomics Bioinformatics Course 2018Introduction to the Proteomics Bioinformatics Course 2018
Introduction to the Proteomics Bioinformatics Course 2018Juan Antonio Vizcaino
 
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...Juan Antonio Vizcaino
 
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...Juan Antonio Vizcaino
 
A proteomics data “gold mine” at your disposal: Now that the data is there, w...
A proteomics data “gold mine” at your disposal: Now that the data is there, w...A proteomics data “gold mine” at your disposal: Now that the data is there, w...
A proteomics data “gold mine” at your disposal: Now that the data is there, w...Juan Antonio Vizcaino
 
The ProteomeXchange Consoritum: 2017 update
The ProteomeXchange Consoritum: 2017 updateThe ProteomeXchange Consoritum: 2017 update
The ProteomeXchange Consoritum: 2017 updateJuan Antonio Vizcaino
 
Public proteomics data: a (mostly unexploited) gold mine for computational re...
Public proteomics data: a (mostly unexploited) gold mine for computational re...Public proteomics data: a (mostly unexploited) gold mine for computational re...
Public proteomics data: a (mostly unexploited) gold mine for computational re...Juan Antonio Vizcaino
 
How to run and maintain a popular biological data repository?
How to run and maintain a popular biological data repository?How to run and maintain a popular biological data repository?
How to run and maintain a popular biological data repository?Juan Antonio Vizcaino
 
Introduction to the Proteomics Bioinformatics Course 2017
Introduction to the Proteomics Bioinformatics Course 2017Introduction to the Proteomics Bioinformatics Course 2017
Introduction to the Proteomics Bioinformatics Course 2017Juan Antonio Vizcaino
 
Is it feasible to identify novel biomarkers by mining public proteomics data?
Is it feasible to identify novel biomarkers by mining public proteomics data?Is it feasible to identify novel biomarkers by mining public proteomics data?
Is it feasible to identify novel biomarkers by mining public proteomics data?Juan Antonio Vizcaino
 
PRIDE and ProteomeXchange: A golden age for working with public proteomics data
PRIDE and ProteomeXchange: A golden age for working with public proteomics dataPRIDE and ProteomeXchange: A golden age for working with public proteomics data
PRIDE and ProteomeXchange: A golden age for working with public proteomics dataJuan Antonio Vizcaino
 
The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...
The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...
The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...Juan Antonio Vizcaino
 

More from Juan Antonio Vizcaino (20)

Reusing and integrating public proteomics data to improve our knowledge of th...
Reusing and integrating public proteomics data to improve our knowledge of th...Reusing and integrating public proteomics data to improve our knowledge of th...
Reusing and integrating public proteomics data to improve our knowledge of th...
 
Reuse of public proteomics data
Reuse of public proteomics dataReuse of public proteomics data
Reuse of public proteomics data
 
PRIDE resources and ProteomeXchange
PRIDE resources and ProteomeXchangePRIDE resources and ProteomeXchange
PRIDE resources and ProteomeXchange
 
Proteomics repositories
Proteomics repositoriesProteomics repositories
Proteomics repositories
 
Introduction to the Proteomics Bioinformatics Course 2018
Introduction to the Proteomics Bioinformatics Course 2018Introduction to the Proteomics Bioinformatics Course 2018
Introduction to the Proteomics Bioinformatics Course 2018
 
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
 
ProteomeXchange update
ProteomeXchange updateProteomeXchange update
ProteomeXchange update
 
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
 
The ELIXIR Proteomics community
The ELIXIR Proteomics community The ELIXIR Proteomics community
The ELIXIR Proteomics community
 
The ELIXIR Proteomics Community
The ELIXIR Proteomics CommunityThe ELIXIR Proteomics Community
The ELIXIR Proteomics Community
 
A proteomics data “gold mine” at your disposal: Now that the data is there, w...
A proteomics data “gold mine” at your disposal: Now that the data is there, w...A proteomics data “gold mine” at your disposal: Now that the data is there, w...
A proteomics data “gold mine” at your disposal: Now that the data is there, w...
 
The ProteomeXchange Consoritum: 2017 update
The ProteomeXchange Consoritum: 2017 updateThe ProteomeXchange Consoritum: 2017 update
The ProteomeXchange Consoritum: 2017 update
 
Public proteomics data: a (mostly unexploited) gold mine for computational re...
Public proteomics data: a (mostly unexploited) gold mine for computational re...Public proteomics data: a (mostly unexploited) gold mine for computational re...
Public proteomics data: a (mostly unexploited) gold mine for computational re...
 
How to run and maintain a popular biological data repository?
How to run and maintain a popular biological data repository?How to run and maintain a popular biological data repository?
How to run and maintain a popular biological data repository?
 
Reuse of public proteomics data
Reuse of public proteomics dataReuse of public proteomics data
Reuse of public proteomics data
 
Proteomics repositories
Proteomics repositoriesProteomics repositories
Proteomics repositories
 
Introduction to the Proteomics Bioinformatics Course 2017
Introduction to the Proteomics Bioinformatics Course 2017Introduction to the Proteomics Bioinformatics Course 2017
Introduction to the Proteomics Bioinformatics Course 2017
 
Is it feasible to identify novel biomarkers by mining public proteomics data?
Is it feasible to identify novel biomarkers by mining public proteomics data?Is it feasible to identify novel biomarkers by mining public proteomics data?
Is it feasible to identify novel biomarkers by mining public proteomics data?
 
PRIDE and ProteomeXchange: A golden age for working with public proteomics data
PRIDE and ProteomeXchange: A golden age for working with public proteomics dataPRIDE and ProteomeXchange: A golden age for working with public proteomics data
PRIDE and ProteomeXchange: A golden age for working with public proteomics data
 
The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...
The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...
The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...
 

Recently uploaded

Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsssuserddc89b
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
Module 4: Mendelian Genetics and Punnett Square
Module 4:  Mendelian Genetics and Punnett SquareModule 4:  Mendelian Genetics and Punnett Square
Module 4: Mendelian Genetics and Punnett SquareIsiahStephanRadaza
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Recombination DNA Technology (Microinjection)
Recombination DNA Technology (Microinjection)Recombination DNA Technology (Microinjection)
Recombination DNA Technology (Microinjection)Jshifa
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxSwapnil Therkar
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzohaibmir069
 
Work, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE PhysicsWork, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE Physicsvishikhakeshava1
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 sciencefloriejanemacaya1
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)DHURKADEVIBASKAR
 

Recently uploaded (20)

Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physics
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
Module 4: Mendelian Genetics and Punnett Square
Module 4:  Mendelian Genetics and Punnett SquareModule 4:  Mendelian Genetics and Punnett Square
Module 4: Mendelian Genetics and Punnett Square
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Recombination DNA Technology (Microinjection)
Recombination DNA Technology (Microinjection)Recombination DNA Technology (Microinjection)
Recombination DNA Technology (Microinjection)
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistan
 
Work, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE PhysicsWork, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE Physics
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 science
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 

PSI-Proteome Informatics update

  • 1. PSI-Proteome Informatics update Juan Antonio Vizcaíno PSI Meeting Heidelberg 2018
  • 2. PSI-PI group structure (Role: Current Incumbent) Chair: Juan Antonio Vizcaíno; Co-chairs: Martin Eisenacher, Andy Jones; MIAPE Co-ordinator: Pierre-Alain Binz; Ontology Co-ordinator: Gerhard Mayer; Editor: Gerhard Mayer; Secretary: (not listed); Website: Da Qi; Mailing List: psidev-pi-dev@lists.sourceforge.net. Of course, a very long list of people involved!
  • 3. PSI Charter Focus and Purpose … The main goal of the PSI-PI working group is to define community data formats and associated controlled vocabulary terms, facilitating data exchange and archiving of: the downstream results of proteomics analysis by mass spectrometry, including the identification and quantification of peptides and proteins by software, and the output of integrative analysis of proteomics data with other omics technologies (e.g. proteogenomics analysis).
  • 4. PSI PI Working group • mzIdentML • mzQuantML • mzTab • Proteogenomics data formats: proBed and proBAM • MIAPE guidelines • Joint efforts: – PSI MS CV – mzTab extension for metabolomics data (Metabolomics Group) – PSI MS group
  • 5. mzIdentML 1.1 (2011-2012) to mzIdentML 1.2 (2017): data standard for peptide and protein identification data. New support for: - Cross-linking approaches - Peptide-level scores - Modification localization scores - Proteogenomics approaches. Improved support for: - Protein inference - Pre-fractionation - De novo sequencing - Spectral library searches. Increasingly supported by the most-used proteomics software and databases, including: jmzIdentML, mzid Library, ms-data-core-api, MyriMatch, PIA, ProCon
  • 6. mzIdentML • Overview – XML-based data standard for peptide and protein identifications, e.g. following database search and protein inference – Sections for all PSMs, proteins/protein groups, protocols/parameters etc. • Timeline: – Original 1.0 version in Aug 2009 – Version 1.1 stable (Aug 2011); original manuscript published in MCP in 2012* – Well supported in many open source and commercial software packages – Fully supported by ProteomeXchange resources – 2012 onwards (mzIdentML 1.2): extended use cases • Better support for protein grouping; manuscript published in Proteomics** – 2017: mzIdentML 1.2 release; manuscript published in MCP*** • Open issues – Implementation plan to move software from 1.1 support to 1.2? * Jones, A. R., Eisenacher, M., Mayer, G., Kohlbacher, O., et al., The mzIdentML data standard for mass spectrometry-based proteomics results. Molecular & Cellular Proteomics 2012, 11, M111.014381. ** Seymour, S. L., Farrah, T., Binz, P. A., Chalkley, R. J., et al., A standardized framing for reporting protein identifications in mzIdentML 1.2. Proteomics 2014, 14, 2389-2399. *** Vizcaíno, J. A., Mayer, G., Perkins, S., Barsnes, H., et al., The mzIdentML Data Standard Version 1.2, Supporting advances in Proteome Informatics. Molecular & Cellular Proteomics 2017, 16, 1275-1285.
  • 8. mzQuantML status Overview • XML-based standard for quantification data • Can report tables of data (QuantLayers): columns are StudyVariables, Assays or Ratios; rows are ProteinGroups, Proteins or Peptides • Can also capture 2D coordinates of quantified regions in LC-MS (Features) Timeline • Work started in Oct 2011, and progressed at various PSI meetings • Completed PSI process in Feb 2013 – version 1.0 release* – Supports label-free (intensity), label-free (spectral counting), MS2 tag techniques (e.g. iTRAQ) and MS1 label techniques (e.g. SILAC) • Updated in 2013-2014 to support SRM as a new technique**; mzqLibrary*** • 2015: mzQuantML 1.0.1 – minor update with SRM included Open issues • Not widely supported by software. No live development. Efforts are being put into mzTab support instead * Walzer et al. MCP 2013 Aug;12(8):2332-40. doi: 10.1074/mcp.O113.028506 ** Qi et al. PROTEOMICS 2015, 15(18):3152-62 *** Qi et al. PROTEOMICS 2015, 15, 2592-2596.
  • 9. mzTab status • Overview – Tab-delimited data standard for peptide, protein and small molecule identifications and quantification. – Metadata section (key-value pairs). – Proteomics: Protein, Peptide and PSM sections. – Metabolomics: Small Molecule section (in version 1.0) • Timeline: – Version 1.0 release in June 2014. – Manuscript* and Java API** published in 2014 – 1.0 is stable and implemented in PRIDE/ProteomeXchange (for ID data). • Implementations (Proteomics) – A lot of potential interest, but still few implementations (Mascot export) – MaxQuant (still work in progress, J. Cox) * Griss, J., Jones, A. R., Sachsenberg, T., Walzer, M., et al., The mzTab data exchange format: communicating MS-based proteomics and metabolomics results to a wider audience. Molecular & Cellular Proteomics 2014, 13(10), 2765-75. ** Xu, Q. W., Griss, J., Wang, R., et al., jmzTab: a Java interface to the mzTab data standard. Proteomics 2014, 1328-1332.
  • 10. mzTab status: mzTab-M 1.1 • A format is needed for reporting qualitative and quantitative results for MS metabolomics data. • It was decided to separate the development of mzTab-M and mzTab-P, because the changes made for metabolomics are not backwards compatible with the proteomics part. • mzTab-M 1.1 has been in active development (e.g. meeting at EBI, August 2017). The plan is to finish it at this meeting. • Changes to the proteomics version are being explored at present: – The required changes need to be documented well. – Start work on implementing mzTab for DIA approaches • There is interest in a tailored version for lipidomics data (mzTab-L)
  • 11. Proteogenomics-related formats • Two formats have been developed: proBed and proBAM. The data standards have been formalised and the manuscript has just been published* • Same overall objective: to map identified peptides to genome coordinates for visualisation and annotation • Different level of detail: – proBed is tab-delimited and simpler, based on the original BED format. Lower level of detail. – proBAM is based on the original SAM/BAM formats, widely used in genomics. Much higher level of detail. • Near future: focus on adoption -> software implementations * Menschaert G., Wang X., Jones A.R., Ghali F. et al. The proBAM and proBed standard formats: enabling a seamless integration of genomics and proteomics data. Genome Biol. 2018, 19, 12.
  • 13. New initiative: proVCF • A generic format for representing genetic variation at the protein level • Based on the Variant Call Format (VCF), widely used by the genomics and transcriptomics community • It is a tab-delimited file with a header section and a data section • Stores protein polymorphisms such as single amino acid polymorphisms, insertions and deletions • Provides a toolset to process proVCF files • Shyama will present the current status, to gather interest.
  • 14. MIAPE documents • Originally one MIAPE document: – MIAPE Mass Spectrometry Informatics (MSI), containing both identification guidelines and quant guidelines – Move to split: MIAPE MSI (ident only) and MIAPE Quant MIAPE MSI (Mass Spectrometry Informatics) status – MIAPE MSI 1.1 published back in 2008 – Working group 2011-2012: minor updates to requirements and removal of quant parts (new MIAPE Quant doc)… but the process has not been completed for MIAPE MSI MIAPE Quant – Work started in Dec 2010 by ProteoRed groups – Major revision accepted in October 2012; Publication: Martínez-Bartolomé, S., et al. Journal of Proteomics, 2013. Dec 16;95:84-8. Open issues • MIAPE documents have not been actively worked on for some years, and it is not obvious that there is great demand for updating them
  • 16. Main plans for meeting • mzTab – Discuss status of mzTab-P and start working on the encoding of DIA data. – Work towards finishing mzTab-M 1.1 – Discuss mzTab-L • Discuss status of proBed/proBAM and related software. • Discuss proVCF format -> capture genetic variation data at the protein level (based on VCF) (Shyama) • Discuss future plans for the other formats
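
Illustrative sketch (not part of the original slides): the mzTab 1.0 layout described in slide 9, a metadata section of key-value pairs followed by tab-delimited Protein, Peptide, PSM and Small Molecule sections, can be read with very little code. The function name `parse_mztab` and the toy input are hypothetical and exist only for illustration; the three-letter line prefixes (MTD, PRH/PRT, PEH/PEP, PSH/PSM, SMH/SML, COM) are the ones used by mzTab 1.0. This is a minimal reader under those assumptions, not a validating parser, and it does not cover mzTab-M.

```python
from collections import defaultdict

# mzTab 1.0 line prefixes: each section has a header line (e.g. PSH)
# followed by data rows (e.g. PSM). MTD is metadata, COM is a comment.
HEADER_TO_ROW = {"PRH": "PRT", "PEH": "PEP", "PSH": "PSM", "SMH": "SML"}


def parse_mztab(text):
    """Split mzTab text into a metadata dict and per-section lists of row dicts."""
    metadata = {}
    headers = {}                      # row prefix -> list of column names
    sections = defaultdict(list)      # row prefix -> list of {column: value}
    for line in text.splitlines():
        if not line.strip():
            continue                  # skip blank separator lines
        fields = line.split("\t")
        prefix = fields[0]
        if prefix == "COM":
            continue                  # comments carry no data
        if prefix == "MTD" and len(fields) >= 3:
            metadata[fields[1]] = fields[2]
        elif prefix in HEADER_TO_ROW:
            headers[HEADER_TO_ROW[prefix]] = fields[1:]
        elif prefix in headers:
            sections[prefix].append(dict(zip(headers[prefix], fields[1:])))
    return metadata, dict(sections)


# Toy input, invented for illustration; a real file has many more
# mandatory metadata keys and columns.
EXAMPLE = "\n".join([
    "COM\ttoy example, not a complete or validated mzTab file",
    "MTD\tmzTab-version\t1.0.0",
    "MTD\tmzTab-mode\tSummary",
    "PSH\tsequence\tPSM_ID\taccession",
    "PSM\tPEPTIDEK\t1\tP12345",
    "PSM\tSAMPLER\t2\tP67890",
])

metadata, sections = parse_mztab(EXAMPLE)
print(metadata["mzTab-version"])
print([row["sequence"] for row in sections["PSM"]])
```

Keying each data row by its section's header line keeps the sketch independent of column order, which matters because optional columns can differ between files.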