ProteomeXchange update
1. ProteomeXchange update
Dr. Juan Antonio Vizcaíno
(on behalf of all ProteomeXchange partners)
EMBL-European Bioinformatics Institute
Hinxton, Cambridge, UK
2. Juan A. Vizcaíno
juan@ebi.ac.uk
PSI Meeting 2018
Heidelberg, 18 April 2018
Overview
• Introduction
• Some usage statistics
• Guidelines: Handling of reprocessed datasets
• New prospective member: Panorama Public
• Miscellaneous
3. ProteomeXchange: A Global, distributed proteomics database
• Framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.
• Repositories: PASSEL (SRM data), PRIDE (MS/MS data), MassIVE (MS/MS data), jPOST (MS/MS data)
• Data types shown in the figure: Raw, ID/Q, Meta
• Mandatory data deposition
http://www.proteomexchange.org
Vizcaíno et al., Nat Biotechnol, 2014
Deutsch et al., NAR, 2017
4. ProteomeXchange: A Global, distributed proteomics database
• New in 2017: iProX (MS/MS data), alongside PASSEL, PRIDE, MassIVE and jPOST
5. iProX: the integrated proteome resources in China
Cloud platform architecture with High Availability
• VIP
• Load balance servers 1 and 2: nginx, keepalived, CentOS
• Application servers 1 and 2: SpringMVC, MyBatis, Tomcat, Java, CentOS
• Database servers (master and slave): MySQL, CentOS
• Data storage servers 1 and 3: nginx, keepalived, aspera, CentOS
• Data storage server 2: nginx, CentOS
http://www.iprox.org
6. Deployment of iProX
• BPRC & NCPSB (Beijing): main location of deployment and the only submission site
• Three offsite data backups:
• CNIC (Beijing, north China)
• SCBIT (Shanghai, east China)
• NSCC (Hunan, south China)
• All four sites will provide a download service at the same time, coordinated by the load balancer.
• By the end of March 2018, 374 datasets had been submitted, totalling 60 TB.
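The coordinated download service described for the four iProX sites can be pictured as a round-robin selection that skips any site currently marked unavailable. The following is only a minimal sketch of that idea, not the iProX implementation (the slides say the real deployment relies on nginx and keepalived). The site labels reuse the names from the slide; `make_picker`, `pick_site` and the health flags are hypothetical.

```python
from itertools import cycle

# Site labels taken from the slide; everything else here is illustrative.
SITES = [
    "BPRC & NCPSB (Beijing)",
    "CNIC (Beijing)",
    "SCBIT (Shanghai)",
    "NSCC (Hunan)",
]


def make_picker(sites, is_up):
    """Return a function that yields the next available site in round-robin order."""
    ring = cycle(sites)

    def pick_site():
        # Try each site at most once per call so the loop always terminates.
        for _ in range(len(sites)):
            site = next(ring)
            if is_up(site):
                return site
        raise RuntimeError("no download site available")

    return pick_site


# Hypothetical health state: pretend the Shanghai mirror is down.
down = {"SCBIT (Shanghai)"}
pick_site = make_picker(SITES, lambda site: site not in down)
picks = [pick_site() for _ in range(6)]
print(picks)
```

In this sketch, requests rotate over the three healthy sites and the unavailable one is simply skipped until its flag changes.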
8. jPOST status
https://jpostdb.org/
• The repository is going well.
• The database part has just opened.
• The re-analysis part is under development.
• Funding has just been renewed for the next 5 years!
9. ProteomeCentral: Portal for all PX datasets
http://proteomecentral.proteomexchange.org/cgi/GetDataset
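As a rough illustration of how a single dataset page might be addressed through this portal, the sketch below builds a GetDataset URL from a PXD identifier. The base URL is the one on the slide. The `ID` query parameter, the "PXD plus six digits" pattern and the example accession are assumptions made for illustration and should be checked against the ProteomeCentral documentation; `dataset_url` is a hypothetical helper.

```python
import re
from urllib.parse import urlencode

# Base URL as given on the slide.
BASE_URL = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"

# Assumed identifier shape: "PXD" followed by six digits.
PXD_PATTERN = re.compile(r"PXD\d{6}")


def dataset_url(accession):
    """Build a GetDataset URL for one accession (the `ID` parameter is an assumption)."""
    if not PXD_PATTERN.fullmatch(accession):
        raise ValueError(f"not a PXD identifier: {accession!r}")
    return f"{BASE_URL}?{urlencode({'ID': accession})}"


# Illustrative accession only; no network request is made here.
url = dataset_url("PXD000001")
print(url)
```

Validating the identifier before building the URL keeps malformed accessions from reaching the portal.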
10. Public datasets from different omics: OmicsDI
http://www.omicsdi.org/
• Aims to integrate ‘omics’ datasets (proteomics, transcriptomics, metabolomics and genomics at present).
PRIDE
MassIVE
jPOST
PASSEL
GPMDB
ArrayExpress
Expression Atlas
MetaboLights
Metabolomics Workbench
GNPS
EGA
…and others
Perez-Riverol et al., Nat Biotechnol, 2017
12. Data content per resource (PXD identifiers)
[Pie chart: shares of PXD identifiers per resource: 84.9%, 11.5%, 1.8%, 1.5%, 0.3%; resource labels not preserved in the text]
13. PRIDE data submissions and data growth
> 2,400 datasets submitted in 2017
In March 2018 we reached 300 submitted datasets in a single month for the first time
[Charts: Datasets submitted per month; Datasets submitted per year]
14. Data re-use in proteomics is increasing
Data download volume for PRIDE Archive in 2017: 295 TB
[Chart: Downloads in TBs per year, 2013 to 2017]
18. Other guidelines developed during the last year
• Retraction of datasets (“Re-calling”)
• Support for alternative location of datasets (alternative URLs)
• Try to get external datasets into PX (e.g. CPTAC)
20. Panorama Public
• Panorama Public is designed for sharing data generated
through Skyline-based targeted proteomics workflows such as
SRM and PRM or targeted DDA and DIA.
• Led by Brendan MacLean & Mike MacCoss group
• Processed results are stored in the Skyline XML format
• Interested in joining ProteomeXchange as a repository for targeted
proteomics workflows.
23. PRIDE has become an ELIXIR core data resource
• ELIXIR coordinates, integrates and sustains bioinformatics
resources across Europe and enables users in academia and
industry to access services that are vital for their research
• First list of core resources announced in July 2017.
• PRIDE included in the initial list.
https://www.elixir-europe.org/platforms/data/core-data-resources
24. PRIDE as a “pillar” of the ELIXIR Proteomics Community
• The goal of the ELIXIR proteomics community is to develop and maintain sustainable proteomics tools and data resources
• An essential part of the development will also be the ‘FAIRification’ of the resources (i.e. making the resources FAIR)
• Integrate proteomics bioinformatics activities in ELIXIR
25. Main plans for meeting
• General update
• Panorama Public application to join PX
• Do we need more formalised guidelines for several topics?
• Short Report about GDPR guidelines
• Two related projects (NIH “data standards” grant):
• Universal Spectrum Identifier (USI)
• PROXI
26. Acknowledgements: People
Yasset Perez-Riverol
Attila Csordas
Tobias Ternent
Gerhard Mayer (de.NBI)
Andrew Jarnuczak
Mathias Walzer
Suresh Hewapathirana
Jingwen Bai
Former team members, especially:
Henning Hermjakob
Acknowledgements: All ProteomeXchange partners
Eric Deutsch
Zhi Sun
David Campbell
Nuno Bandeira
Mingxun Wang
Jeremy Carver
Yasushi Ishihama
Shin Kawano
Yunping Zhu
Masheng Li
All data submitters!
Follow new datasets @proteomexchange