The document discusses the Proteomics Standards Initiative (PSI), which develops data format standards for proteomics to facilitate data sharing and reproducibility. It notes that PSI has developed several standard file formats for mass spectrometry-based proteomics data, including mzML for MS data, mzIdentML for identification data, and mzTab for final results. It also maintains related controlled vocabularies and specifies minimum reporting guidelines. The document outlines PSI's process for developing and reviewing standards and lists its current objectives to improve adoption, extend standards to other omics fields, and facilitate reproducible analysis pipelines.
1. The Proteomics Standards Initiative (PSI)
Dr. Juan Antonio Vizcaíno
Proteomics Team Leader
co-Chair PSI Proteomics Informatics Working Group
EMBL-European Bioinformatics Institute (EMBL-EBI)
Hinxton, Cambridge, UK
2. Juan A. Vizcaíno
juan@ebi.ac.uk
SmartLab Exchange Europe 2017
Berlin, 8 February 2017
Data resources at EMBL-EBI
• Genes, genomes & variation: Ensembl, Ensembl Genomes, GWAS Catalog, Metagenomics portal, European Nucleotide Archive, European Variation Archive, European Genome-phenome Archive
• Gene & protein expression: ArrayExpress, Expression Atlas, PRIDE
• Protein sequences, families & motifs: InterPro, Pfam, UniProt
• Chemical biology: ChEMBL, ChEBI
• Molecular structures: Protein Data Bank in Europe, Electron Microscopy Data Bank
• Reactions, interactions & pathways: IntAct, Reactome, MetaboLights
• Systems: BioModels, Enzyme Portal, BioSamples
• Literature & ontologies: Europe PubMed Central, Gene Ontology, Experimental Factor Ontology
3. HUPO Proteomics Standards Initiative
• Develops data format standards for proteomics.
• Covers both data representation and annotation standards.
• Involves data producers, database providers, software producers, publishers, … as many interested parties as possible.
• Started in 2002, so some experience already…
• One annual meeting in March-April, plus regular phone calls.
http://www.psidev.info
4. PSI Deliverables
•Formats: Usually an XML schema (but also tab-delimited files)
•Controlled vocabularies: Usually an OBO-style hierarchical controlled
vocabulary precisely defining the metadata that are encoded in the
formats.
•Minimum information (MIAPE) specifications: Format-independent
specification of minimum information guidelines.
•Databases and Tools: Foster software implementations to make the
standards truly useful.
•Community interaction to ensure deposition of data in public
repositories.
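To make the "tab-delimited files" deliverable more concrete, here is a minimal sketch of reading a file in the style of mzTab. It assumes the line prefixes MTD (metadata), PRH (protein header) and PRT (protein row) as I understand them from the mzTab format; the sample content, accessions and the function name `read_mztab_like` are invented for illustration and this is not a full or validated mzTab reader.

```python
import csv
import io

# Illustrative mzTab-like content; all values are made up for this sketch.
SAMPLE = (
    "MTD\tmzTab-version\t1.0.0\n"
    "MTD\tdescription\tToy example\n"
    "PRH\taccession\tdescription\tbest_search_engine_score[1]\n"
    "PRT\tP12345\tExample protein A\t0.001\n"
    "PRT\tP67890\tExample protein B\t0.020\n"
)


def read_mztab_like(text):
    """Split mzTab-style lines into a metadata dict and a list of protein rows."""
    metadata = {}
    header = None
    proteins = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        prefix, rest = fields[0], fields[1:]
        if prefix == "MTD":
            # Metadata lines: key in the second column, value in the third.
            metadata[rest[0]] = rest[1]
        elif prefix == "PRH":
            # The protein header names the columns of the PRT rows.
            header = rest
        elif prefix == "PRT" and header is not None:
            proteins.append(dict(zip(header, rest)))
    return metadata, proteins


metadata, proteins = read_mztab_like(SAMPLE)
print(metadata["mzTab-version"], len(proteins))
```

The point of the prefix-per-line design is that a file like this stays readable in a spreadsheet while still being easy to split into sections programmatically.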
5. Mass Spectrometry (MS)-based proteomics
• Many different workflows -> Many different data types -> Need for several data standards.
• Discovery mode:
  • Bottom-up proteomics
    • Data dependent acquisition (DDA)
    • Data independent acquisition (DIA)
  • Top-down proteomics
• Targeted mode:
  • SRM (Selected Reaction Monitoring)
6. Current PSI Standard File Formats for MS
• mzML: MS data
• mzIdentML: Identification
• mzQuantML: Quantitation
• mzTab: Final Results
• TraML: SRM
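As an illustration of how an XML format such as mzML pairs structure with controlled-vocabulary annotation, the sketch below reads the MS level of each spectrum from a small embedded fragment. The fragment is simplified and not schema-valid; the namespace `http://psi.hupo.org/ms/mzml` and the accession `MS:1000511` ("ms level") are used as I understand them from mzML and the PSI-MS vocabulary, and the function name `ms_levels` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified fragment in the style of mzML (not schema-valid; for illustration only).
SAMPLE = """<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <run id="toy_run">
    <spectrumList count="2">
      <spectrum index="0" id="scan=1" defaultArrayLength="0">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1"/>
      </spectrum>
      <spectrum index="1" id="scan=2" defaultArrayLength="0">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
      </spectrum>
    </spectrumList>
  </run>
</mzML>"""

NS = {"mz": "http://psi.hupo.org/ms/mzml"}  # assumed mzML namespace
MS_LEVEL = "MS:1000511"  # assumed accession for "ms level"


def ms_levels(xml_text):
    """Return a dict mapping spectrum id to its MS level, read from cvParam elements."""
    root = ET.fromstring(xml_text)
    levels = {}
    for spectrum in root.iterfind(".//mz:spectrum", NS):
        for param in spectrum.iterfind("mz:cvParam", NS):
            # Match on the accession rather than the human-readable name.
            if param.get("accession") == MS_LEVEL:
                levels[spectrum.get("id")] = int(param.get("value"))
    return levels


levels = ms_levels(SAMPLE)
print(levels)
```

Matching on the accession instead of the term name is the design choice worth noting: names can be reworded in a vocabulary release, while accessions are meant to stay stable.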
8. Data formats for mass spectra data
• Binary data
• XML-based files: mzData, mzXML, mzML
• Peak lists: .dta, .pkl, .mgf, .ms2
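For contrast with the XML-based formats, a peak list is little more than text. The sketch below parses a tiny .mgf-style example, assuming the common layout of `BEGIN IONS` / `END IONS` blocks with `KEY=value` parameter lines followed by "m/z intensity" pairs. The sample values and the function name `parse_mgf` are invented for illustration, and real files may carry additional conventions that this sketch ignores.

```python
# Toy peak list in the style of an .mgf file; the values are made up.
SAMPLE = """BEGIN IONS
TITLE=toy spectrum 1
PEPMASS=500.25
CHARGE=2+
100.1 10.0
200.2 20.0
END IONS
"""


def parse_mgf(text):
    """Parse MGF-style text into a list of {'params': dict, 'peaks': list} spectra."""
    spectra = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line:
                # Parameter line such as TITLE=... or CHARGE=2+
                key, value = line.split("=", 1)
                current["params"][key] = value
            else:
                # Peak line: m/z followed by intensity
                mz, intensity = line.split()[:2]
                current["peaks"].append((float(mz), float(intensity)))
    return spectra


spectra = parse_mgf(SAMPLE)
print(len(spectra), spectra[0]["params"]["CHARGE"], spectra[0]["peaks"])
```

The simplicity is also the limitation: a peak list like this carries almost no instrument or processing metadata, which is the gap the XML-based formats with controlled-vocabulary terms are meant to fill.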
9. PSI document process
• Every data standard has to undergo a thorough review process…
• In practice, two review processes happen in parallel: the PSI review and, usually, a manuscript review as well.
10. PSI MS Controlled Vocabulary
Mayer et al., Database, 2013
~2,600 terms by February 2017
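To show what an OBO-style hierarchical controlled vocabulary looks like to software, here is a minimal sketch that reads `[Term]` stanzas and walks the `is_a` hierarchy. The two terms use a hypothetical `EX:` prefix and made-up identifiers rather than real PSI-MS accessions, and the helper names `parse_obo_terms` and `ancestors` are assumptions for this example; the real OBO format has many more tags than are handled here.

```python
# Two hypothetical terms written as OBO-style stanzas (identifiers are made up).
SAMPLE = """[Term]
id: EX:0000001
name: spectrum attribute
def: "Hypothetical parent term." []

[Term]
id: EX:0000002
name: ms level
is_a: EX:0000001 ! spectrum attribute
"""


def parse_obo_terms(text):
    """Collect [Term] stanzas into a dict keyed by term id."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are collected here.
            current = {"is_a": []} if line == "[Term]" else None
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            if tag == "is_a":
                # Drop the trailing "! name" comment and keep the parent id.
                current["is_a"].append(value.split(" ! ")[0])
            else:
                current[tag] = value
            if tag == "id":
                terms[value] = current
    return terms


def ancestors(terms, term_id):
    """Return all ancestors of term_id reachable through is_a links."""
    seen = []
    stack = list(terms[term_id]["is_a"])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(terms.get(parent, {"is_a": []})["is_a"])
    return seen


terms = parse_obo_terms(SAMPLE)
print(terms["EX:0000002"]["name"], ancestors(terms, "EX:0000002"))
```

The hierarchy is what lets a validator accept any child of an allowed parent term, so new terms can be added to the vocabulary without changing the file format's schema.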
11. Objectives of the PSI in the next 5 years
• Improve adoption/support, especially in vendor software.
• Extend current proteomics data standards to MS
metabolomics
• Facilitate reproducible analysis pipelines.
• Finalize compatible formats with genomics data e.g.
proBed & proBAM (applicable for proteogenomics studies)
• Start working with the structural biology community (since
MS proteomics is being increasingly used in that context)
12. HUPO Proteomics Standards Initiative (PSI)
Lennart Martens (U Ghent)
Darren Kessner (CSHS)
Matt Chambers (Vanderbilt)
Jim Shofstahl (Thermo)
Fredrik Levander (U Lund)
Steffen Neumann (IPB Halle)
Juan Antonio Vizcaíno (EBI)
Florian Reisinger (EBI)
Luisa Montecchi-Palazzi (EBI)
Randy Julian (Indigo)
Natalie Tasman (Insilicos)
Jari Häkkinen (U Lund)
Brian Pratt (Insilicos)
Erik Nilsson (Insilicos)
Mike Coleman (Stowers)
Luis Mendoza (ISB)
David Shteynberg (ISB)
Lars Nilse (Manchester)
David Tabb (Stellenbosch U)
Mathias Walzer (U Tübingen)
Gerben Menschaert (U Gent)
Xiaojing Wang (Baylor)
Chris Taylor (EBI)
Patrick Pedrioli (ETHZ)
Sean Seymour (AB Sciex)
David Creasy (Matrix Science)
Angel Pizarro (U Penn)
Phil Jones (EBI)
Jimmy Eng (U Washington)
Kent Laursen (Indigo)
Howard Read (Waters)
Jim Langridge (Waters)
Benito Cañas (Madrid)
Lola Gutierrez (Madrid)
Alberto Medina (Madrid)
Trish Whetzel (U Penn)
Eva Duchoslav (MDS Sciex)
Jayson Falkner (U Michigan)
David Horn (Agilent)
Henning Hermjakob (EBI)
Andy Jones (U Liverpool)
Sandra Orchard (EBI)
Andreas Römpp (U Giessen)
Marc Sturm (U Tübingen)
Parag Mallick (Stanford)
Norman Paton (Manchester)
Ron Beavis (UBC)
Ruedi Aebersold (ETHZ)
Wilfred Tang (ABI)
Rune Philosof
David Sparkman (U Pacific)
Marius Kallhardt (Bruker)
Neil Swainston (Manchester)
Ruth McNally (Cardiff)
Chris Allen
Paul Rudnick (NIST)
Steve Stein (NIST)
Mi-Youn Brusniak (ISB)
Dave Campbell (ISB)
Role        Name
Chair       Eric Deutsch, Institute for Systems Biology
Co-chairs   Henning Hermjakob, European Bioinformatics Institute
            Andy Jones, University of Liverpool
Secretary   Sandra Orchard, European Bioinformatics Institute
MIAPE       Pierre-Alain Binz, Swiss Institute for Bioinformatics
Ontology    Gerhard Mayer, Medizinisches Proteom-Center Bochum
Editors     Martin Eisenacher, Medizinisches Proteom-Center Bochum
            Andy Jones, University of Liverpool
            Sylvie Ricard-Blum, University of Lyon
WG chairs   Sandra Orchard (Molecular Interactions)
            Eric Deutsch (Mass Spectrometry)
            Andy Jones (Proteomics Informatics)
            David Tabb (Quality Control)