The EMBL-EBI ELIXIR Node
Dr. Juan Antonio Vizcaíno
Proteomics Team Leader
EMBL-EBI
Hinxton, Cambridge, UK
Juan A. Vizcaíno
juan@ebi.ac.uk
ELIXIR meeting
Tuebingen, 1 March 2017
EBI activities split by the current ELIXIR platforms
• Data
• Tools
• Interoperability
• Compute
• Training
ELIXIR platform: Data
PRIDE (PRoteomics IDEntifications) Archive
http://www.ebi.ac.uk/pride/archive
• PRIDE stores mass spectrometry (MS)-based proteomics data:
  • Peptide and protein expression data (identification and quantification)
  • Post-translational modifications
  • Mass spectra (raw data and peak lists)
  • Technical and biological metadata
  • Any other related information
• Full support for tandem MS approaches
• Any data workflow is now supported.
Martens et al., Proteomics, 2005
Vizcaíno et al., NAR, 2016
PRIDE is leading the global ProteomeXchange Consortium
http://www.proteomexchange.org
Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.
• PASSEL (SRM data)
• PRIDE (MS/MS data)
• MassIVE (MS/MS data)
• jPOST (MS/MS data), new in 2016
[Diagram labels: Raw, ID/Q, Meta]
Mandatory raw data deposition since July 2015
Vizcaíno et al., Nat Biotechnol, 2014
Deutsch et al., NAR, 2017
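The slide states that raw data deposition has been mandatory since July 2015. As a hedged illustration of the kind of check a submission pipeline might run, the Python sketch below rejects a file list that contains no raw files and labels the rest. The extension lists, the "complete"/"partial" labels and the rule that a standard results file (mzIdentML or PRIDE XML) makes a submission complete are all assumptions for illustration, not the official ProteomeXchange validation logic.

```python
# Hypothetical file-extension rules, for illustration only (not the official PX checks).
RAW_SUFFIXES = (".raw", ".wiff")
STANDARD_RESULT_SUFFIXES = (".mzid", ".mzid.gz", ".pride.xml", ".pride.xml.gz")


def classify_submission(filenames):
    """Return 'complete' or 'partial' for a list of submitted file names.

    Raises ValueError when no raw file is present, mirroring the
    mandatory raw data deposition rule.
    """
    names = [name.lower() for name in filenames]
    if not any(name.endswith(RAW_SUFFIXES) for name in names):
        raise ValueError("raw data deposition is mandatory")
    # Assumption: a results file in a standard format makes the submission 'complete'.
    if any(name.endswith(STANDARD_RESULT_SUFFIXES) for name in names):
        return "complete"
    return "partial"


print(classify_submission(["sample1.RAW", "results.mzid"]))       # complete
print(classify_submission(["sample1.raw", "search_output.txt"]))  # partial
```

Matching on lower-cased suffixes keeps the sketch short; a real validator would inspect file contents rather than trust extensions.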
PRIDE Archive: ~6,000 datasets from over 51 countries and >2,000 groups
Data volume:
• Total: ~280 TB
• Number of all files: ~560,000
• PXD000320-324: ~4 TB
• PXD002319-26: ~2.4 TB
• PXD001471: ~1.6 TB
• >50% of all datasets are publicly accessible
• ~90% of all ProteomeXchange datasets are in PRIDE
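The dataset sizes above refer to ProteomeXchange accessions (the prefix PXD followed by six digits) and use a compact range notation. A minimal Python sketch that expands it, assuming "PXD000320-324" means PXD000320 through PXD000324 and that the digits after the hyphen replace the trailing digits of the first accession; `expand_px_range` is a hypothetical helper, not part of any PRIDE library.

```python
import re

# ProteomeXchange dataset accession, optionally followed by "-<trailing digits>".
_PX_RANGE = re.compile(r"^PXD(\d{6})(?:-(\d{1,6}))?$")


def expand_px_range(shorthand):
    """Expand e.g. 'PXD000320-324' into PXD000320 ... PXD000324 (hypothetical helper)."""
    match = _PX_RANGE.match(shorthand.strip())
    if match is None:
        raise ValueError(f"not a PXD accession or range: {shorthand!r}")
    start_digits, end_suffix = match.groups()
    if end_suffix is None:
        return [f"PXD{start_digits}"]
    # Assumed convention: the suffix replaces the last len(suffix) digits of the start.
    end_digits = start_digits[: len(start_digits) - len(end_suffix)] + end_suffix
    start, end = int(start_digits), int(end_digits)
    if end < start:
        raise ValueError(f"range end precedes start: {shorthand!r}")
    return [f"PXD{n:06d}" for n in range(start, end + 1)]


for entry in ["PXD000320-324", "PXD002319-26", "PXD001471"]:
    print(entry, "->", len(expand_px_range(entry)), "dataset(s)")
```

Under this reading, the three entries cover 5, 8 and 1 datasets respectively.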
[Chart: PRIDE Archive growth, submissions per year (all submissions vs. complete submissions)]
In 2016: 1,979 submitted datasets (a record), ~165 datasets per month
Main organisms represented:
• Homo sapiens (~50% of datasets)
• Mus musculus
• Saccharomyces cerevisiae
• Arabidopsis thaliana
• Rattus norvegicus
• >900 reported taxa in total
Public proteomics datasets are being increasingly reused…
Martens & Vizcaíno, Trends Biochem Sci, 2017
Data download in 2016: 243 TB
[Chart: PRIDE Archive data downloads per year in TB, 2013 to 2016]
PRIDE and ELIXIR
• PRIDE is by far the world’s largest proteomics data repository.
• PRIDE has submitted an application to become a core ELIXIR resource.
• PRIDE is not only an EMBL-EBI activity; ELIXIR-DE is also involved in PRIDE activities:
  • G. Mayer (Bochum) helps with data submissions.
• Federated PRIDE in the future?
ELIXIR platform: Tools
PRIDE Components: Data Submission Process
[Diagram: PRIDE Inspector, PX Submission Tool, mzIdentML, mzTab]
In addition to PRIDE Archive, the PRIDE team develops and maintains tools and software libraries that facilitate the handling and visualisation of MS proteomics data, as well as the submission process.
ELIXIR platform: Interoperability
HUPO Proteomics Standards Initiative
http://www.psidev.info
• Develops data standards for proteomics.
• Both data representation and annotation standards.
• Involves data producers, database providers, software producers, publishers, and everyone who wants to be involved…
• Inter-group activities: MIAPE and Controlled Vocabularies.
• Started in 2002, so there is already considerable experience.
• One annual meeting in March-April, plus regular phone calls.
• Closer interaction with the metabolomics community (MSI).
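Controlled vocabularies let PSI formats attach machine-readable meaning to values: each term carries a stable accession from the PSI-MS CV (for example MS:1000511, "ms level"). A minimal Python sketch, using only the standard library, that reads cvParam terms from a hand-written, mzML-style fragment; the fragment is illustrative and is not a complete, valid mzML file.

```python
import xml.etree.ElementTree as ET

# Namespace used by mzML documents.
MZML_NS = {"mz": "http://psi.hupo.org/ms/mzml"}

# Hand-written, illustrative fragment in the style of an mzML <spectrum> element.
SPECTRUM_XML = """\
<spectrum xmlns="http://psi.hupo.org/ms/mzml" index="0" id="scan=1" defaultArrayLength="0">
  <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
  <cvParam cvRef="MS" accession="MS:1000127" name="centroid spectrum" value=""/>
</spectrum>"""


def cv_params(element):
    """Map each CV accession on an element's cvParam children to (name, value)."""
    return {
        param.get("accession"): (param.get("name"), param.get("value"))
        for param in element.findall("mz:cvParam", MZML_NS)
    }


spectrum = ET.fromstring(SPECTRUM_XML)
terms = cv_params(spectrum)
ms_level = int(terms["MS:1000511"][1])   # 2
is_centroided = "MS:1000127" in terms    # True
```

Looking terms up by accession rather than by their human-readable name is what keeps CV-annotated files robust when labels are edited.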
Current PSI Standard File Formats for MS
• mzML: MS data
• mzIdentML: identification
• mzQuantML: quantitation
• mzTab: final results
• TraML: SRM
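mzTab is the tab-delimited format in this list: each line starts with a prefix such as MTD (metadata) or COM (comment), or belongs to a header/row pair such as PSH/PSM for peptide-spectrum matches (PRH/PRT, PEH/PEP and SMH/SML cover the protein, peptide and small-molecule sections). A minimal Python sketch that groups lines by those prefixes, assuming a well-formed file; the example content and the column subset are made up, and production code should rely on a dedicated mzTab library with validation.

```python
from collections import defaultdict

# mzTab 1.0 header prefixes and the data-row prefixes they describe.
HEADER_TO_ROW = {"PRH": "PRT", "PEH": "PEP", "PSH": "PSM", "SMH": "SML"}


def parse_mztab(text):
    """Split mzTab text into a metadata dict and per-section lists of row dicts."""
    metadata = {}
    columns = {}                # row prefix -> column names from its header line
    tables = defaultdict(list)  # row prefix -> list of {column: value}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        prefix = fields[0]
        if prefix == "MTD":
            metadata[fields[1]] = fields[2] if len(fields) > 2 else ""
        elif prefix in HEADER_TO_ROW:
            columns[HEADER_TO_ROW[prefix]] = fields[1:]
        elif prefix in columns:
            tables[prefix].append(dict(zip(columns[prefix], fields[1:])))
        # COM (comment) lines, and rows seen before their header, are skipped
    return metadata, dict(tables)


# Made-up example with a small subset of PSM columns.
EXAMPLE = "\n".join([
    "MTD\tmzTab-version\t1.0.0",
    "MTD\tmzTab-mode\tSummary",
    "COM\tillustrative file",
    "PSH\tsequence\tPSM_ID\taccession",
    "PSM\tPEPTIDER\t1\tP12345",
    "PSM\tACDEFGHIK\t2\tP67890",
])

metadata, tables = parse_mztab(EXAMPLE)
print(metadata["mzTab-version"], len(tables["PSM"]))  # 1.0.0 2
```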
PRIDE Inspector Toolsuite
• PRIDE Inspector: standalone tool to enable visualisation and validation of MS data.
• Built on top of ms-data-core-api: open source algorithms and libraries for computational proteomics.
• Supported file formats: mzIdentML, mzML, mzTab (PSI standards), and PRIDE XML.
• Broad functionality.
https://github.com/PRIDE-Utilities/ms-data-core-api
https://github.com/PRIDE-Toolsuite/pride-inspector
[Screenshots: Summary and QC charts; Peptide spectra annotation and visualisation]
Wang et al., Nat Biotechnol, 2012
Perez-Riverol et al., Bioinformatics, 2015
Perez-Riverol et al., MCP, 2016
Public datasets from different omics: OmicsDI
http://www.ebi.ac.uk/Tools/omicsdi/
• Aims to integrate 'omics' datasets (proteomics, transcriptomics, metabolomics and genomics at present).
Integrated resources: PRIDE, MassIVE, jPOST, PASSEL, GPMDB, ArrayExpress, Expression Atlas, MetaboLights, Metabolomics Workbench, GNPS, EGA
Perez-Riverol et al., Nat Biotechnol, in press
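The core idea behind integrating datasets from many repositories is mapping each resource's metadata onto one shared record, so that a single query covers them all. A minimal Python sketch of that idea; the `DatasetRecord` schema, the `search` function and the example records are hypothetical and do not reflect OmicsDI's actual data model or API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical shared schema for a dataset from any omics repository."""
    accession: str
    repository: str
    omics_type: str
    title: str
    keywords: frozenset = field(default_factory=frozenset)


def search(records, term):
    """Case-insensitive match on the title or an exact keyword, across all repositories."""
    needle = term.lower()
    return [
        record for record in records
        if needle in record.title.lower()
        or any(needle == keyword.lower() for keyword in record.keywords)
    ]


# Made-up records standing in for entries harvested from different resources.
RECORDS = [
    DatasetRecord("EXAMPLE-1", "PRIDE", "proteomics", "Liver tissue proteome", frozenset({"liver"})),
    DatasetRecord("EXAMPLE-2", "MetaboLights", "metabolomics", "Plasma metabolite profiling", frozenset({"plasma"})),
    DatasetRecord("EXAMPLE-3", "ArrayExpress", "transcriptomics", "Gene expression in hepatocytes", frozenset({"Liver"})),
]

hits = search(RECORDS, "liver")
print([(r.repository, r.omics_type) for r in hits])
```

Here the transcriptomics record is found only through its keyword, which is the kind of cross-resource match a shared schema makes possible.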
Summary
ELIXIR platform: Compute
Compute
• Scalable, fully reproducible and freely available pipelines are needed, e.g. for certification purposes.
• They are needed for all the main proteomics analysis approaches, and for multi-omics techniques (e.g. proteogenomics).
• It should be possible to deploy them in different computing set-ups (e.g. cloud environments).
• They are needed to tackle larger studies (e.g. in a clinical context).
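One ingredient of a "fully reproducible" pipeline is recording exactly which inputs, tool versions and parameters produced a result. A minimal Python sketch, assuming a run is described by input file checksums, a tool-to-version map and a parameter dictionary; `build_manifest` and the example tool and parameter names are hypothetical.

```python
import hashlib
import json


def sha256_of_file(path, chunk_size=1 << 20):
    """Checksum a (possibly large) input file without loading it all into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(input_checksums, tool_versions, parameters):
    """Serialise a run description; identical runs give identical JSON text."""
    manifest = {
        "inputs": input_checksums,
        "tools": tool_versions,
        "parameters": parameters,
    }
    # sort_keys fixes key order (recursively) so the output is byte-for-byte stable
    return json.dumps(manifest, sort_keys=True, indent=2)


# Hypothetical example run; the checksum is computed from in-memory bytes for brevity.
checksums = {"sample1.mzML": hashlib.sha256(b"example spectra").hexdigest()}
tools = {"hypothetical-search-engine": "1.0.0"}
params = {"precursor_tolerance_ppm": 10, "psm_fdr": 0.01}
print(build_manifest(checksums, tools, params))
```

Because the serialisation is deterministic, two manifests can be compared as plain strings to decide whether a rerun used the same inputs and settings.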
ELIXIR Implementation Project
• 1-year project, just started. Led by EMBL-EBI (Vizcaíno) and ELIXIR-Germany (Kohlbacher, Eisenacher).
• Aim: development of reproducible data analysis pipelines for shotgun proteomics using the OpenMS framework.
• Deployment in the EMBL-EBI Embassy Cloud as a proof of concept:
  • Facilitate deployment in other cloud environments.
  • Direct connection with public datasets in PRIDE.
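Shotgun identification pipelines commonly control the false discovery rate (FDR) of peptide-spectrum matches (PSMs) with a target-decoy search. As a plain-Python illustration of that technique, not the OpenMS implementation used in the project, the sketch below estimates FDR as decoys divided by targets above each score threshold (one common estimator) and converts it to q-values.

```python
def target_decoy_qvalues(psms):
    """Estimate q-values for PSMs by target-decoy counting.

    psms: list of (score, is_decoy) tuples, where a higher score is better.
    Returns q-values in the same order as the input.
    """
    # Rank PSMs from best to worst score (ties are not grouped in this sketch).
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)

    # FDR at each threshold: decoys / targets accepted so far, capped at 1.
    fdrs = []
    targets = decoys = 0
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(min(1.0, decoys / targets) if targets else 1.0)

    # q-value: the lowest FDR at which this PSM would still be accepted,
    # i.e. the minimum FDR over this rank and all worse-scoring ranks.
    qvalues = [0.0] * len(psms)
    running_min = 1.0
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        qvalues[order[rank]] = running_min
    return qvalues


# Toy scores: three targets and two decoys.
psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
q = target_decoy_qvalues(psms)
accepted = [i for i, qv in enumerate(q) if qv <= 0.01]  # PSMs passing 1% FDR
```

With these toy scores only the two best-scoring PSMs get a q-value of 0 and pass a 1% FDR threshold; the third and fourth share a q-value of 1/3.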
ELIXIR platform: Training
Annual WT Proteomics Bioinformatics Course
• Co-organised by L. Martens and myself
• Running for 10 years
• It includes many relevant resources and tools
• It has been sponsored by EuPA
• Other shorter training activities (e.g. EMBL-EBI e-learning platform)
• More coordination of training activities is needed
Conclusions
• Data -> PRIDE database
• Tools -> PRIDE Inspector / PX Submission Tool
• Interoperability -> Data standards / PRIDE and other resources
• Compute -> Starting to work on data analysis pipelines
• Training -> More coordination is needed
Acknowledgements: People
Attila Csordas
Tobias Ternent
Gerhard Mayer (de.NBI)
Johannes Griss
Yasset Perez-Riverol
Manuel Bernal-Llinares
Andrew Jarnuczak
Enrique Perez
Former team members, especially Rui Wang, Florian Reisinger, Noemi del Toro, Jose A. Dianes & Henning Hermjakob
Alvis Brazma, Ugis Sarkans & Robert Petryszak
Acknowledgements: The PRIDE Team
@pride_ebi
@proteomexchange

Introduction to EBI for Proteomics in ELIXIR
