ProteomeXchange update
Dr. Juan Antonio Vizcaíno
(on behalf of all ProteomeXchange partners)
EMBL-European Bioinformatics Institute
Hinxton, Cambridge, UK
Juan A. Vizcaíno
juan@ebi.ac.uk
PSI Meeting 2018
Heidelberg, 18 April 2018
Overview
• Introduction
• Some usage statistics
• Guidelines: Handling of reprocessed datasets
• New prospective member: Panorama Public
• Miscellaneous
ProteomeXchange: A global, distributed proteomics database
• Framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.
• Repositories: PRIDE (MS/MS data), MassIVE (MS/MS data), jPOST (MS/MS data), PASSEL (SRM data)
• Data types: Raw, ID/Q, Meta
• Mandatory data deposition
http://www.proteomexchange.org
Vizcaíno et al., Nat Biotechnol, 2014
Deutsch et al., NAR, 2017
ProteomeXchange: A global, distributed proteomics database
• Framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.
• Repositories: PRIDE (MS/MS data), MassIVE (MS/MS data), jPOST (MS/MS data), PASSEL (SRM data)
• New in 2017: iProX (MS/MS data)
• Data types: Raw, ID/Q, Meta
• Mandatory data deposition
http://www.proteomexchange.org
Vizcaíno et al., Nat Biotechnol, 2014
Deutsch et al., NAR, 2017
iProX: the integrated proteome resources in China
Cloud platform architecture with high availability
http://www.iprox.org
• VIP
• Load balance servers 1 and 2: nginx, keepalived, CentOS
• Application servers 1 and 2: SpringMVC, MyBatis, Tomcat, Java, CentOS
• Database server (master) and database server (slave): MySQL, CentOS
• Data storage servers 1 and 3: nginx, keepalived, Aspera, CentOS
• Data storage server 2: nginx, CentOS
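The pairing of two load balance servers with keepalived suggests a failover setup in which a shared address (read here as the "VIP") is held by whichever balancer is currently preferred and healthy. The toy model below only illustrates that idea. The priority values and server names are hypothetical, and the slide does not describe the actual configuration.

```python
from dataclasses import dataclass


@dataclass
class LoadBalancer:
    name: str
    priority: int          # hypothetical value; higher priority is preferred
    healthy: bool = True


def holder_of_vip(balancers):
    """Return the name of the healthy balancer with the highest priority.

    Returns None when no balancer is healthy.
    """
    candidates = [b for b in balancers if b.healthy]
    if not candidates:
        return None
    return max(candidates, key=lambda b: b.priority).name


# Two balancers, as on the slide; names and priorities are illustrative only.
lb1 = LoadBalancer("load-balance-server-1", priority=100)
lb2 = LoadBalancer("load-balance-server-2", priority=90)

active_before = holder_of_vip([lb1, lb2])   # server 1 is preferred
lb1.healthy = False                          # simulate a failure of server 1
active_after = holder_of_vip([lb1, lb2])    # server 2 takes over
```

In this sketch the second server only becomes active once the first is marked unhealthy, which is the behaviour a high-availability pair is meant to provide.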
Deployment of iProX
• BPRC & NCPSB (Beijing): main location of deployment and the only submission site
• Three offsite data backups:
• CNIC (Beijing, north China)
• SCBIT (Shanghai, east China)
• NSCC (Hunan, south China)
• All four sites will provide download services simultaneously, coordinated by the load balancer.
• By the end of March 2018, 374 datasets had been submitted, totalling 60 TB.
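How the load balancer distributes downloads across the four sites is not stated. As a minimal sketch, assuming a plain round-robin policy (an assumption, not the documented iProX behaviour) and using hypothetical request labels:

```python
from itertools import cycle

# The four sites named on the slide.
SITES = [
    "BPRC & NCPSB (Beijing)",
    "CNIC (Beijing)",
    "SCBIT (Shanghai)",
    "NSCC (Hunan)",
]


def assign_downloads(requests, sites=SITES):
    """Assign each download request to a site in round-robin order."""
    rotation = cycle(sites)
    return {request: next(rotation) for request in requests}


# Hypothetical request labels, for illustration only.
assignment = assign_downloads(["req-1", "req-2", "req-3", "req-4", "req-5"])
```

With five requests and four sites, the fifth request wraps around to the first site again.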
jPOST status
https://jpostdb.org/
• The repository is going well.
• The database part has just opened.
• The re-analysis part is under development.
• Funding has just been renewed for the next 5 years!
ProteomeCentral: Portal for all PX datasets
http://proteomecentral.proteomexchange.org/cgi/GetDataset
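As a small illustration of how a single dataset page might be addressed through this portal, the sketch below builds a link from a PXD identifier. It assumes that the endpoint accepts an `ID` query parameter and that identifiers take the form `PXD` followed by six digits; neither detail appears on the slide, so both should be checked against the ProteomeCentral documentation.

```python
import re
from urllib.parse import urlencode

BASE_URL = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"

# Assumed identifier shape: "PXD" followed by six digits.
PXD_PATTERN = re.compile(r"PXD\d{6}")


def dataset_url(pxd_id: str) -> str:
    """Build a ProteomeCentral link for one dataset (assumes an `ID` parameter)."""
    if not PXD_PATTERN.fullmatch(pxd_id):
        raise ValueError(f"not a PXD identifier: {pxd_id!r}")
    return f"{BASE_URL}?{urlencode({'ID': pxd_id})}"


example_url = dataset_url("PXD000001")
```

Validating the identifier before building the link keeps malformed accessions from reaching the portal.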
Public datasets from different omics: OmicsDI
http://www.omicsdi.org/
• Aims to integrate ‘omics’ datasets (proteomics, transcriptomics, metabolomics and genomics at present).
• Integrated resources: PRIDE, MassIVE, jPOST, PASSEL, GPMDB, ArrayExpress, Expression Atlas, MetaboLights, Metabolomics Workbench, GNPS, EGA …and others
Perez-Riverol et al., Nat Biotechnol, 2017
Overview
• Introduction
• Some usage statistics
• Guidelines: Handling of reprocessed datasets
• New prospective member: Panorama Public
• Miscellaneous
Data content per resource (PXD identifiers)
[Pie chart: shares of 84.9%, 11.5%, 1.8%, 1.5% and 0.3%]
PRIDE data submissions and data growth
> 2,400 datasets submitted in 2017
In March 2018, submissions reached 300 datasets in a single month for the first time.
[Charts: datasets submitted per month; datasets submitted per year]
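To put the two figures side by side, the short calculation below compares March 2018 with the 2017 monthly average. It reads the 300 figure as a monthly count, which the "datasets submitted per month" chart suggests, and uses 2,400 as a lower bound because the slide reports "> 2,400".

```python
DATASETS_2017 = 2400   # lower bound from the slide ("> 2,400")
MARCH_2018 = 300       # datasets submitted in March 2018

# Average number of datasets per month in 2017 (at least this many).
monthly_average_2017 = DATASETS_2017 / 12

# March 2018 relative to that average (at most this ratio, given the lower bound).
ratio = MARCH_2018 / monthly_average_2017

print(f"2017 monthly average: at least {monthly_average_2017:.0f}")
print(f"March 2018 vs. 2017 average: at most {ratio:.2f}x")
```

Since the 2017 total is a lower bound, the true monthly average is somewhat above 200 and the true ratio somewhat below 1.5.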
Data re-use in proteomics is increasing
Data download volume for PRIDE Archive in 2017: 295 TB
[Bar chart: downloads in TB per year, 2013 to 2017]
Overview
• Introduction
• Some usage statistics
• Guidelines: Handling of reprocessed datasets
• New prospective member: Panorama Public
• Miscellaneous
Guidelines developed
• Initial implementation in MassIVE
Other guidelines developed during the last year
• Retraction of datasets (“Re-calling”)
• Support for alternative location of datasets (alternative URLs)
• Try to get external datasets into PX (e.g. CPTAC)
Overview
• Introduction
• Some usage statistics
• Guidelines: Handling of reprocessed datasets
• New prospective member: Panorama Public
• Miscellaneous
Panorama Public
• Panorama Public is designed for sharing data generated
through Skyline-based targeted proteomics workflows such as
SRM and PRM or targeted DDA and DIA.
• Led by Brendan MacLean & Mike MacCoss group
• Processed results are stored in the Skyline XML format
• Interested in joining ProteomeXchange as a repository for targeted
proteomics workflows.
https://panoramaweb.org/
Overview
• Introduction
• Some usage statistics
• Guidelines: Handling of reprocessed datasets
• New prospective member: Panorama Public
• Miscellaneous
PRIDE has become an ELIXIR core data resource
• ELIXIR coordinates, integrates and sustains bioinformatics
resources across Europe and enables users in academia and
industry to access services that are vital for their research
• First list of core resources announced in July 2017.
• PRIDE included in the initial list.
https://www.elixir-europe.org/platforms/data/core-data-resources
PRIDE as a “pillar” of the ELIXIR Proteomics Community
• The goal of the ELIXIR proteomics community is to develop and maintain sustainable proteomics tools and data resources.
• An essential part of the development will also be the ‘FAIRification’ of the resources (i.e. making the resources FAIR).
• Integrate proteomics bioinformatics activities in ELIXIR.
Main plans for the meeting
• General update
• Panorama Public application to join PX
• Do we need more formalised guidelines for several topics?
• Short report on GDPR guidelines
• Two related projects (NIH “data standards” grant):
• Universal Spectrum Identifier (USI)
• PROXI
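The USI format was still under discussion at the time of this talk, so the slide gives no syntax. The sketch below assumes the colon-separated form that PSI later published (`mzspec:<collection>:<run>:<index type>:<index>`, optionally followed by an interpretation) and further assumes, for simplicity, that the run name contains no colons. The `USI` tuple and `parse_usi` function are illustrative names, not part of any existing library.

```python
from typing import NamedTuple

# Index types assumed from the later published USI format.
INDEX_TYPES = {"scan", "index", "nativeId"}


class USI(NamedTuple):
    collection: str   # e.g. a ProteomeXchange dataset identifier
    ms_run: str       # name of the MS run (file) within the collection
    index_type: str   # how the spectrum is addressed within the run
    index: str        # the spectrum index itself


def parse_usi(usi: str) -> USI:
    """Split a USI into its first four components (interpretation is ignored)."""
    parts = usi.split(":")
    if len(parts) < 5 or parts[0] != "mzspec":
        raise ValueError(f"not a USI: {usi!r}")
    _, collection, ms_run, index_type, index = parts[:5]
    if index_type not in INDEX_TYPES:
        raise ValueError(f"unknown index type: {index_type!r}")
    return USI(collection, ms_run, index_type, index)


parsed = parse_usi("mzspec:PXD000561:Adult_Frontalcortex_bRP_Elite_85_f09:scan:17555")
```

The point of such an identifier is that one string names a single spectrum across all ProteomeXchange repositories, which is what makes it useful alongside a common API such as PROXI.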
Acknowledgements: People
Yasset Perez-Riverol
Attila Csordas
Tobias Ternent
Gerhard Mayer (de.NBI)
Andrew Jarnuczak
Mathias Walzer
Suresh Hewapathirana
Jingwen Bai
Former team members, especially:
Henning Hermjakob
Acknowledgements: All ProteomeXchange partners
All data submitters!!!
Eric Deutsch
Zhi Sun
David Campbell
Nuno Bandeira
Mingxun Wang
Jeremy Carver
Yasushi Ishihama
Shin Kawano
Yunping Zhu
Masheng Li
Follow new datasets @proteomexchange
