Reusing and integrating public proteomics data to improve our knowledge of the human proteome
Dr. Juan Antonio Vizcaíno
Proteomics Team Leader
EMBL-European Bioinformatics Institute (EMBL-EBI)
Hinxton, Cambridge, UK
Juan A. Vizcaíno
juan@ebi.ac.uk
19th C-HPP Symposium
Santiago, 16 June 2018
Overview
• Short introduction to PRIDE and ProteomeXchange
• Reuse of public proteomics data
• Integration of proteomics and genomics data
• Open analysis pipelines for proteomics data
PRIDE (PRoteomics IDEntifications) database
http://www.ebi.ac.uk/pride/archive
• PRIDE stores mass spectrometry (MS)-based proteomics data:
  • Peptide and protein expression data (identification and quantification)
  • Post-translational modifications
  • Mass spectra (raw data and peak lists)
  • Technical and biological metadata
  • Any other related information
• Full support for tandem MS approaches
• Any type of data can be stored
• From July 2017, an ELIXIR core resource
Martens et al., Proteomics, 2005
Vizcaíno et al., NAR, 2016
ProteomeXchange: a global, distributed proteomics database
http://www.proteomexchange.org
• Framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.
• Repositories shown in the diagram: PRIDE (MS/MS data), MassIVE (MS/MS data), jPOST (MS/MS data), iProX (MS/MS data), PASSEL (SRM data)
• Data types shown in the diagram: Raw, ID/Q, Meta
• Mandatory data deposition
Vizcaíno et al., Nat Biotechnol, 2014
Deutsch et al., NAR, 2017
PRIDE data submissions and data growth
• May 2018 (320 datasets) was again a record month in terms of datasets submitted
• > 2,400 datasets submitted in 2017
• PRIDE contains >85% of all ProteomeXchange datasets
• Dataset PXD010000 reached on June 1st
[Charts: datasets submitted per month; datasets submitted per year]
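Dataset accessions such as PXD010000 can be checked and generated programmatically. A minimal sketch, assuming the accession format is "PXD" followed by six digits (as in the accession above); the helper names parse_pxd and format_pxd are hypothetical and not part of any PRIDE or ProteomeXchange tool:

```python
import re

# Assumed format: the literal prefix "PXD" followed by exactly six digits.
PXD_PATTERN = re.compile(r"^PXD(\d{6})$")


def parse_pxd(accession: str) -> int:
    """Return the numeric part of a PXD accession, or raise ValueError."""
    match = PXD_PATTERN.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"not a PXD accession: {accession!r}")
    return int(match.group(1))


def format_pxd(number: int) -> str:
    """Format a positive integer as a zero-padded PXD accession."""
    if not 0 < number <= 999_999:
        raise ValueError("number out of range for a six-digit accession")
    return f"PXD{number:06d}"
```

For example, parse_pxd("PXD010000") gives the integer 10000, and format_pxd(10000) gives back "PXD010000".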
HPP statistics (HPP tags)
[Bar chart; y-axis from 0 to 180. Categories on the x-axis:]
• Human Proteome Project
• …ease-Driven Human Proteome Project (B/D-…
• Chromosome-centric Human Proteome Project (C-HPP)
• Cancer (B/D-HPP)
• Human Immuno-Peptidome Project (HUPO-HIPP)
• Liver (B/D-HPP)
• Protein Misfolding and Aggregation (B/D-HPP)
• Extreme Conditions (B/D-HPP)
• Human Brain Proteome Project (HUPO_HBPP) (B/D-HPP)
• Diabetes (B/D-HPP)
• Food and Nutrition (B/D-HPP)
• Glycoproteomics (B/D-HPP)
• Kidney Urine (B/D-HPP)
• Cardiovascular (B/D-HPP)
• Epigenetic Chromatin (B/D-HPP)
• EyeOME (B/D-HPP)
• China Human Proteome Project (CNHPP)
HPP statistics (Countries)
[Bar chart; y-axis from 0 to 60. Countries on the x-axis:]
China, USA, Spain, Canada, France, Germany, Republic of Korea, Switzerland, Australia, Italy, Japan, Netherlands, Brazil, India, Norway, South Korea, Thailand, Austria, Belgium, Denmark, Finland, Israel, Pakistan, Russia, Sweden, Taiwan
Overview
• Short introduction to PRIDE and ProteomeXchange
• Reuse of public proteomics data
• Integration of proteomics and genomics data
• Open analysis pipelines for proteomics data
Data re-use in proteomics keeps increasing
Data download volume for PRIDE in 2017: 295 TB
[Bar chart: downloads in TB per year, 2013 to 2017; y-axis from 0 to 350]
Data sharing in Proteomics
Vaudel et al., Proteomics, 2016
Public data re-analysis -> Data repurposing
• Individual authors (third parties) can re-analyze MS
proteomics raw data with new hypotheses in mind
(not taken into account by the original authors).
• Proteogenomics studies
• Meta-analysis studies
Examples of repurposing datasets: proteogenomics
Data in public resources can be used for genome annotation purposes ->
Discovery of short ORFs, translated lncRNAs, etc
Examples of repurposing datasets: proteogenomics
Some studies have also been performed in model organisms (mouse, rat, Drosophila) and in other organisms (Mycobacterium tuberculosis, Helicobacter pylori, rice, …)
Public datasets from different omics: OmicsDI
http://www.omicsdi.org/
• Aims to integrate ‘omics’ datasets (proteomics, transcriptomics, metabolomics and genomics at present).
PRIDE
MassIVE
jPOST
PASSEL
GPMDB
ArrayExpress
Expression Atlas
MetaboLights
Metabolomics Workbench
GNPS
EGA
…and others
Perez-Riverol et al., Nat Biotechnol, 2017
OmicsDI: Portal for omics datasets
Public data re-analysis -> Data repurposing
• Individual authors (third parties) can re-analyze MS
proteomics raw data with new hypotheses in mind
(not taken into account by the original authors).
• Proteogenomics studies
• Meta-analysis studies -> Analysing together a
large number of datasets
Reuse of public proteomics data is on the rise!
Martens & Vizcaíno, Trends Biochem Sci, 2017
Vaudel et al., Proteomics, 2016
My talk on Monday afternoon: “The functional landscape of human phosphorylation”
1. Consistent re-analysis of PRIDE public datasets
2. Constructing a functional score for those phospho-sites (ML)
3. Validation of the score (in silico and in vivo)
Collaboration with Pedro Beltrao’s group
• Largest MS-based phosphoproteomics atlas to date
• Fully annotated at dataset level
• 101 cell lines/tissues (120 PXD datasets)
• 6,801 raw files (~5.2 TB)
• Running time: ~2 months
• ~120k highly confident phosphopeptide identifications (<0.01 FDR, Ascore & ∆score filtered)
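The slide does not say how the <0.01 FDR threshold was applied. As an illustration only, the sketch below assumes a simple target-decoy estimate (FDR = decoys / targets among the top-ranked matches); the function name and the (score, is_decoy) input layout are hypothetical, and the Ascore and ∆score filters are not covered:

```python
from typing import List, Tuple

# One peptide-spectrum match: (search score, True if it hit a decoy sequence).
Psm = Tuple[float, bool]


def filter_by_fdr(psms: List[Psm], threshold: float = 0.01) -> List[Psm]:
    """Keep target PSMs above the deepest rank whose estimated FDR <= threshold.

    The FDR at each rank is estimated as (#decoys) / (#targets) among all
    PSMs scoring at least as well as that rank (higher score = better).
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = 0
    decoys = 0
    best_cut = 0  # number of top-ranked PSMs that may be kept
    for rank, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= threshold:
            best_cut = rank
    # Report only target hits; decoys exist purely to estimate the error rate.
    return [psm for psm in ranked[:best_cut] if not psm[1]]
```

Real pipelines usually work with q-values and handle score ties explicitly; this version only shows the counting logic behind a threshold such as 0.01.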
Overview
• Short introduction to PRIDE and ProteomeXchange
• Reuse of public proteomics data
• Integration of proteomics and genomics data
• Open analysis pipelines for proteomics data
ProteoGenomics data integration in PRIDE
Automatically connecting proteomics data from original data submissions to PRIDE to genome browsers (Ensembl, UCSC browser)
[Workflow diagram, numbered steps:]
1. PX Submission Tool
2. PRIDE submission pipelines
3. PRIDE web and API
4. TrackHub Registry
Data in HUPO-PSI standard formats: mzIdentML, mzTab
Proteogenomics related formats
Mapping peptides to the genome: PoGo
(PoGo: Schlaffner CN, Pirklbauer G, Bender A, Choudhary JS, Cell Systems, 5(2):152-156.e4)
For each .pogo file:
• PTMs are standardized to a common representation using the PRIDE-Mod library.
• Each peptide references an assay URL in PRIDE.
• Each PoGo file is generated automatically by the PRIDE pipeline.
chr1 1314335 1314365 VLIPVFALGR 1000 - 1314335 1314335 0,0,0 1 30 0
chr1 1454464 1454488 ITVLEALR 1000 + 1454464 1454464 128,128,128 1 24 0
chr1 1456317 1456344 LFDWANTSR 1000 + 1456317 1456317 128,128,128 1 27 0
chr1 1459184 1459211 ATLNAFLYR 1000 + 1459184 1459184 128,128,128 1 27 0
chr1 1462609 1462633 LAQFDYGR 1000 + 1462609 1462609 128,128,128 1 24 0
chr1 1485135 1485159 ITVLEALR 1000 + 1485135 1485135 128,128,128 1 24 0
chr1 1486636 1486663 LFDWANTSR 1000 + 1486636 1486636 128,128,128 1 27 0
chr1 1490572 1490596 LAQFDYGR 1000 + 1490572 1490572 128,128,128 1 24 0
chr1 1522863 1522887 ITVLEALR 1000 + 1522863 1522863 128,128,128 1 24 0
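The rows above have twelve whitespace-separated columns, which matches the standard BED12 layout (chrom, start, end, name, score, strand, thickStart, thickEnd, itemRgb, blockCount, blockSizes, blockStarts). A minimal parsing sketch under that assumption; the class and function names are hypothetical and this is not PoGo's own code. It also counts how many loci each peptide maps to, which relates to the peptide-uniqueness question:

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class PeptideFeature:
    chrom: str
    start: int                    # chromStart column
    end: int                      # chromEnd column
    peptide: str                  # BED "name" column holds the peptide sequence
    strand: str
    block_sizes: Tuple[int, ...]  # one entry per aligned block


def parse_bed_line(line: str) -> PeptideFeature:
    """Parse one 12-column BED row into a PeptideFeature."""
    fields = line.split()
    if len(fields) != 12:
        raise ValueError(f"expected 12 columns, got {len(fields)}")
    sizes = tuple(int(size) for size in fields[10].split(",") if size)
    return PeptideFeature(fields[0], int(fields[1]), int(fields[2]),
                          fields[3], fields[5], sizes)


def covers_whole_codons(feature: PeptideFeature) -> bool:
    """True if the aligned blocks span exactly three bases per residue."""
    return sum(feature.block_sizes) == 3 * len(feature.peptide)


def loci_per_peptide(features: Iterable[PeptideFeature]) -> Counter:
    """Count genomic loci per peptide sequence (1 means a unique mapping)."""
    return Counter(feature.peptide for feature in features)


# Three rows copied from the slide.
SAMPLE_LINES = [
    "chr1 1314335 1314365 VLIPVFALGR 1000 - 1314335 1314335 0,0,0 1 30 0",
    "chr1 1454464 1454488 ITVLEALR 1000 + 1454464 1454464 128,128,128 1 24 0",
    "chr1 1485135 1485159 ITVLEALR 1000 + 1485135 1485135 128,128,128 1 24 0",
]
FEATURES = [parse_bed_line(line) for line in SAMPLE_LINES]
```

In the rows shown, ITVLEALR appears at several positions on chr1, so loci_per_peptide flags it as a multi-mapping peptide.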
Future challenges:
• BED information can be extended with more information about transcript reliability:
  • Peptide uniqueness
  • Reliability score
• Native bigBed should be provided to remove the customization of new pipelines, etc.
• What to do with the unmapped peptides (which are long lists)
• Maintainability
TrackHub creation and Publication
https://www.trackhubregistry.org/
UCSC Viewer
http://genome.ucsc.edu/cgi-bin/hgTracks?db=mm10&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr12%3A20021505-100107519&hgsid=644445905_BScrPMQymPnGtl9O7jeSv1jS5Rm0
Visualization in IGV
Overview
• Short introduction to PRIDE and ProteomeXchange
• Reuse of public proteomics data
• Integration of proteomics and genomics data
• Open analysis pipelines for proteomics data
Reproducible Science
http://www.nature.com/nature/focus/reproducibility/
How to make data analysis pipelines reproducible
• That means using:
  • Exactly the same software (including the same version) in the same order.
  • The same protein sequence database (including the same version).
• If we use the same files as input to the software, we will get EXACTLY the same results.
• If that’s not the case, something has gone wrong.
• Computers are much more reliable than people.
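One way to make these conditions checkable is to record them in a manifest and compare manifests between runs. A minimal sketch using only the Python standard library; the manifest layout and the function names are assumptions for illustration, not the format used by the PRIDE pipelines:

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List, Sequence, Tuple


def sha256_of(path: Path) -> str:
    """Checksum a file in chunks so large raw files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(steps: Sequence[Tuple[str, str]],
                   database: Path,
                   inputs: Iterable[Path]) -> Dict[str, object]:
    """Record software (name, version) in run order, plus file checksums."""
    return {
        "steps": [{"tool": name, "version": version} for name, version in steps],
        "database": {"name": database.name, "sha256": sha256_of(database)},
        "inputs": {path.name: sha256_of(path) for path in sorted(inputs)},
    }


def differing_sections(first: Dict[str, object],
                       second: Dict[str, object]) -> List[str]:
    """List the manifest sections that differ between two runs."""
    return [key for key in ("steps", "database", "inputs")
            if first.get(key) != second.get(key)]
```

If differing_sections returns an empty list for two runs and their results still differ, the cause lies outside what the manifest records (for example, non-deterministic software or a different environment).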
Developing pipelines in the cloud -> DDA data
Develop exemplary proteomics data analysis workflows and deploy them in the EMBL-EBI “Embassy Cloud”:
(1) Standard identification workflow
(2) Identification workflow for PTMs
(3) Quantification (label-free/label-based approaches)
(4) Quality Control (to aid data set interpretation/reanalysis evaluation)
(5) Versions of quantification approaches (including PTMs)
-> Connected to public proteomics data from
Cloud based infrastructure
Open analysis pipelines: DIA and proteogenomics
• Pipelines for DIA approaches.
• In collaboration with the Stoller Center (Manchester) (co-PIs Graham,
Hubbard & Townsend)
• Pipelines for proteogenomics approaches (project just started).
• In collaboration with J. Choudhary (Institute of Cancer Research, London)
• Additional DDA pipelines (ELIXIR Proteomics Community).
Vision: total transparency and reproducibility
Analysis pipelines: Input data -> Data analysis -> Results
Summary
• Public proteomics datasets are on the rise! Reliable (widely used) infrastructure now exists: PRIDE and ProteomeXchange.
• Many possibilities are open for reuse of this data.
  • New purposes: proteogenomics, novel PTMs,...
• New infrastructure to integrate proteomics and genomics data
• Developing open and reproducible analysis pipelines.
  • Supporting reproducible science
  • Aim: to make them available to everyone in the community in the future
Acknowledgements: People
Yasset Perez-Riverol
Johannes Griss
Suresh Hewapathirana
Tobias Ternent
Jingwen Bai
Attila Csordas
Deepti Jaiswal
Andrew Jarnuczak
Mathias Walzer
Gerhard Mayer (de.NBI)
Former team members, especially Manuel Bernal-Linares & Henning Hermjakob
Acknowledgements: all data submitters!
@pride_ebi
@proteomexchange
Juan A. Vizcaíno
juan@ebi.ac.uk
19th
C-HPP Symposium
Santiago, 16 June 2018

More Related Content

What's hot

Introduction to the Proteomics Bioinformatics Course 2016
Introduction to the Proteomics Bioinformatics Course 2016Introduction to the Proteomics Bioinformatics Course 2016
Introduction to the Proteomics Bioinformatics Course 2016Juan Antonio Vizcaino
 
Is it feasible to identify novel biomarkers by mining public proteomics data?
Is it feasible to identify novel biomarkers by mining public proteomics data?Is it feasible to identify novel biomarkers by mining public proteomics data?
Is it feasible to identify novel biomarkers by mining public proteomics data?Juan Antonio Vizcaino
 
PubChem for drug discovery in the age of big data and artificial intelligence
PubChem for drug discovery in the age of big data and artificial intelligencePubChem for drug discovery in the age of big data and artificial intelligence
PubChem for drug discovery in the age of big data and artificial intelligenceSunghwan Kim
 
FAIR Agronomy, where are we? The KnetMiner Use Case
FAIR Agronomy, where are we? The KnetMiner Use CaseFAIR Agronomy, where are we? The KnetMiner Use Case
FAIR Agronomy, where are we? The KnetMiner Use CaseRothamsted Research, UK
 
Bioinformatic tools in Pheromone technology
Bioinformatic tools in Pheromone technologyBioinformatic tools in Pheromone technology
Bioinformatic tools in Pheromone technologyTHILAKAR MANI
 
Public proteomics data: a (mostly unexploited) gold mine for computational re...
Public proteomics data: a (mostly unexploited) gold mine for computational re...Public proteomics data: a (mostly unexploited) gold mine for computational re...
Public proteomics data: a (mostly unexploited) gold mine for computational re...Juan Antonio Vizcaino
 
Interoperable Data for KnetMiner and DFW Use Cases
Interoperable Data for KnetMiner and DFW Use CasesInteroperable Data for KnetMiner and DFW Use Cases
Interoperable Data for KnetMiner and DFW Use CasesRothamsted Research, UK
 
Exploiting PubChem for Drug Discovery
Exploiting PubChem for Drug DiscoveryExploiting PubChem for Drug Discovery
Exploiting PubChem for Drug DiscoverySunghwan Kim
 
How can you access PubChem programmatically?
How can you access PubChem programmatically?How can you access PubChem programmatically?
How can you access PubChem programmatically?Sunghwan Kim
 
Toxicological information in PubChem
Toxicological information in PubChemToxicological information in PubChem
Toxicological information in PubChemSunghwan Kim
 
PubChem for chemical information literacy training
PubChem for chemical information literacy trainingPubChem for chemical information literacy training
PubChem for chemical information literacy trainingSunghwan Kim
 
Searching for patent information in PubChem
Searching for patent information in PubChem Searching for patent information in PubChem
Searching for patent information in PubChem Sunghwan Kim
 
PubChem and its application for cheminformatics education
PubChem and its application for cheminformatics educationPubChem and its application for cheminformatics education
PubChem and its application for cheminformatics educationSunghwan Kim
 
KnetMiner - Knowledge Network Miner
KnetMiner - Knowledge Network MinerKnetMiner - Knowledge Network Miner
KnetMiner - Knowledge Network MinerKeywan Hassani-Pak
 
Bioinformatics resources and search tools - report on summer training proj...
Bioinformatics   resources and search tools -  report on summer training proj...Bioinformatics   resources and search tools -  report on summer training proj...
Bioinformatics resources and search tools - report on summer training proj...Sapan Anand
 
AgriSchemas: Sharing Agrifood data with Bioschemas
AgriSchemas: Sharing Agrifood data with BioschemasAgriSchemas: Sharing Agrifood data with Bioschemas
AgriSchemas: Sharing Agrifood data with BioschemasRothamsted Research, UK
 
Introduction to Gene Mining Part A: BLASTn-off!
Introduction to Gene Mining Part A: BLASTn-off!Introduction to Gene Mining Part A: BLASTn-off!
Introduction to Gene Mining Part A: BLASTn-off!adcobb
 
PubChem: a public chemical information resource for big data chemistry
PubChem: a public chemical information resource for big data chemistryPubChem: a public chemical information resource for big data chemistry
PubChem: a public chemical information resource for big data chemistrySunghwan Kim
 

What's hot (20)

PRIDE and ProteomeXchange
PRIDE and ProteomeXchangePRIDE and ProteomeXchange
PRIDE and ProteomeXchange
 
Introduction to the Proteomics Bioinformatics Course 2016
Introduction to the Proteomics Bioinformatics Course 2016Introduction to the Proteomics Bioinformatics Course 2016
Introduction to the Proteomics Bioinformatics Course 2016
 
Is it feasible to identify novel biomarkers by mining public proteomics data?
Is it feasible to identify novel biomarkers by mining public proteomics data?Is it feasible to identify novel biomarkers by mining public proteomics data?
Is it feasible to identify novel biomarkers by mining public proteomics data?
 
PubChem for drug discovery in the age of big data and artificial intelligence
PubChem for drug discovery in the age of big data and artificial intelligencePubChem for drug discovery in the age of big data and artificial intelligence
PubChem for drug discovery in the age of big data and artificial intelligence
 
FAIR Agronomy, where are we? The KnetMiner Use Case
FAIR Agronomy, where are we? The KnetMiner Use CaseFAIR Agronomy, where are we? The KnetMiner Use Case
FAIR Agronomy, where are we? The KnetMiner Use Case
 
Bioinformatic tools in Pheromone technology
Bioinformatic tools in Pheromone technologyBioinformatic tools in Pheromone technology
Bioinformatic tools in Pheromone technology
 
Public proteomics data: a (mostly unexploited) gold mine for computational re...
Public proteomics data: a (mostly unexploited) gold mine for computational re...Public proteomics data: a (mostly unexploited) gold mine for computational re...
Public proteomics data: a (mostly unexploited) gold mine for computational re...
 
Interoperable Data for KnetMiner and DFW Use Cases
Interoperable Data for KnetMiner and DFW Use CasesInteroperable Data for KnetMiner and DFW Use Cases
Interoperable Data for KnetMiner and DFW Use Cases
 
Exploiting PubChem for Drug Discovery
Exploiting PubChem for Drug DiscoveryExploiting PubChem for Drug Discovery
Exploiting PubChem for Drug Discovery
 
How can you access PubChem programmatically?
How can you access PubChem programmatically?How can you access PubChem programmatically?
How can you access PubChem programmatically?
 
Toxicological information in PubChem
Toxicological information in PubChemToxicological information in PubChem
Toxicological information in PubChem
 
PubChem for chemical information literacy training
PubChem for chemical information literacy trainingPubChem for chemical information literacy training
PubChem for chemical information literacy training
 
Searching for patent information in PubChem
Searching for patent information in PubChem Searching for patent information in PubChem
Searching for patent information in PubChem
 
PubChem and its application for cheminformatics education
PubChem and its application for cheminformatics educationPubChem and its application for cheminformatics education
PubChem and its application for cheminformatics education
 
KnetMiner - Knowledge Network Miner
KnetMiner - Knowledge Network MinerKnetMiner - Knowledge Network Miner
KnetMiner - Knowledge Network Miner
 
Bioinformatics resources and search tools - report on summer training proj...
Bioinformatics   resources and search tools -  report on summer training proj...Bioinformatics   resources and search tools -  report on summer training proj...
Bioinformatics resources and search tools - report on summer training proj...
 
AgriSchemas: Sharing Agrifood data with Bioschemas
AgriSchemas: Sharing Agrifood data with BioschemasAgriSchemas: Sharing Agrifood data with Bioschemas
AgriSchemas: Sharing Agrifood data with Bioschemas
 
KnetMiner - EBI Workshop 2017
KnetMiner - EBI Workshop 2017KnetMiner - EBI Workshop 2017
KnetMiner - EBI Workshop 2017
 
Introduction to Gene Mining Part A: BLASTn-off!
Introduction to Gene Mining Part A: BLASTn-off!Introduction to Gene Mining Part A: BLASTn-off!
Introduction to Gene Mining Part A: BLASTn-off!
 
PubChem: a public chemical information resource for big data chemistry
PubChem: a public chemical information resource for big data chemistryPubChem: a public chemical information resource for big data chemistry
PubChem: a public chemical information resource for big data chemistry
 

Similar to Reusing and integrating public proteomics data to improve our knowledge of the human proteome

An overview of the PRIDE ecosystem of resources and computational tools for m...
An overview of the PRIDE ecosystem of resources and computational tools for m...An overview of the PRIDE ecosystem of resources and computational tools for m...
An overview of the PRIDE ecosystem of resources and computational tools for m...Juan Antonio Vizcaino
 
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...Juan Antonio Vizcaino
 
Mining the hidden proteome using hundreds of public proteomics datasets
Mining the hidden proteome using hundreds of public proteomics datasetsMining the hidden proteome using hundreds of public proteomics datasets
Mining the hidden proteome using hundreds of public proteomics datasetsJuan Antonio Vizcaino
 
Proteomics and the "big data" trend: challenges and new possibilitites (Talk ...
Proteomics and the "big data" trend: challenges and new possibilitites (Talk ...Proteomics and the "big data" trend: challenges and new possibilitites (Talk ...
Proteomics and the "big data" trend: challenges and new possibilitites (Talk ...Juan Antonio Vizcaino
 
PRIDE and ProteomeXchange: supporting the cultural change in proteomics publi...
PRIDE and ProteomeXchange: supporting the cultural change in proteomics publi...PRIDE and ProteomeXchange: supporting the cultural change in proteomics publi...
PRIDE and ProteomeXchange: supporting the cultural change in proteomics publi...Juan Antonio Vizcaino
 
PRIDE and ProteomeXchange: A golden age for working with public proteomics data
PRIDE and ProteomeXchange: A golden age for working with public proteomics dataPRIDE and ProteomeXchange: A golden age for working with public proteomics data
PRIDE and ProteomeXchange: A golden age for working with public proteomics dataJuan Antonio Vizcaino
 
Introduction to the Proteomics Bioinformatics Course 2018
Introduction to the Proteomics Bioinformatics Course 2018Introduction to the Proteomics Bioinformatics Course 2018
Introduction to the Proteomics Bioinformatics Course 2018Juan Antonio Vizcaino
 
Experiences to learn from the MS proteomics field
Experiences to learn from the MS proteomics fieldExperiences to learn from the MS proteomics field
Experiences to learn from the MS proteomics fieldJuan Antonio Vizcaino
 
A proteomics data “gold mine” at your disposal: Now that the data is there, w...
A proteomics data “gold mine” at your disposal: Now that the data is there, w...A proteomics data “gold mine” at your disposal: Now that the data is there, w...
A proteomics data “gold mine” at your disposal: Now that the data is there, w...Juan Antonio Vizcaino
 
Proteomics public data resources: enabling "big data" analysis in proteomics
Proteomics public data resources: enabling "big data" analysis in proteomicsProteomics public data resources: enabling "big data" analysis in proteomics
Proteomics public data resources: enabling "big data" analysis in proteomicsJuan Antonio Vizcaino
 

Similar to Reusing and integrating public proteomics data to improve our knowledge of the human proteome (20)

An overview of the PRIDE ecosystem of resources and computational tools for m...
An overview of the PRIDE ecosystem of resources and computational tools for m...An overview of the PRIDE ecosystem of resources and computational tools for m...
An overview of the PRIDE ecosystem of resources and computational tools for m...
 
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
 
ProteomeXchange update HUPO 2016
ProteomeXchange update HUPO 2016ProteomeXchange update HUPO 2016
ProteomeXchange update HUPO 2016
 
Mining the hidden proteome using hundreds of public proteomics datasets
Mining the hidden proteome using hundreds of public proteomics datasetsMining the hidden proteome using hundreds of public proteomics datasets
Mining the hidden proteome using hundreds of public proteomics datasets
 
Reuse of public data in proteomics
Reuse of public data in proteomicsReuse of public data in proteomics
Reuse of public data in proteomics
 
Proteomics and the "big data" trend: challenges and new possibilitites (Talk ...
Proteomics and the "big data" trend: challenges and new possibilitites (Talk ...Proteomics and the "big data" trend: challenges and new possibilitites (Talk ...
Proteomics and the "big data" trend: challenges and new possibilitites (Talk ...
 
PRIDE and ProteomeXchange: supporting the cultural change in proteomics publi...
PRIDE and ProteomeXchange: supporting the cultural change in proteomics publi...PRIDE and ProteomeXchange: supporting the cultural change in proteomics publi...
PRIDE and ProteomeXchange: supporting the cultural change in proteomics publi...
 
Reuse of public proteomics data
Reuse of public proteomics dataReuse of public proteomics data
Reuse of public proteomics data
 
Pride and ProteomeXchange
Pride and ProteomeXchangePride and ProteomeXchange
Pride and ProteomeXchange
 
PRIDE and ProteomeXchange: A golden age for working with public proteomics data
PRIDE and ProteomeXchange: A golden age for working with public proteomics dataPRIDE and ProteomeXchange: A golden age for working with public proteomics data
PRIDE and ProteomeXchange: A golden age for working with public proteomics data
 
Proteomics repositories
Proteomics repositoriesProteomics repositories
Proteomics repositories
 
ProteomeXchange update
ProteomeXchange updateProteomeXchange update
ProteomeXchange update
 
Introduction to the Proteomics Bioinformatics Course 2018
Introduction to the Proteomics Bioinformatics Course 2018Introduction to the Proteomics Bioinformatics Course 2018
Introduction to the Proteomics Bioinformatics Course 2018
 
Pride cluster presentation
Pride cluster presentation Pride cluster presentation
Pride cluster presentation
 
ProteomeXchange update
ProteomeXchange updateProteomeXchange update
ProteomeXchange update
 
PRIDE-ProteomeXchange
PRIDE-ProteomeXchangePRIDE-ProteomeXchange
PRIDE-ProteomeXchange
 
Pride Cluster 062016 Update
Pride Cluster 062016 UpdatePride Cluster 062016 Update
Pride Cluster 062016 Update
 
Experiences to learn from the MS proteomics field
Experiences to learn from the MS proteomics fieldExperiences to learn from the MS proteomics field
Experiences to learn from the MS proteomics field
 
A proteomics data “gold mine” at your disposal: Now that the data is there, w...
A proteomics data “gold mine” at your disposal: Now that the data is there, w...A proteomics data “gold mine” at your disposal: Now that the data is there, w...
A proteomics data “gold mine” at your disposal: Now that the data is there, w...
 
Proteomics public data resources: enabling "big data" analysis in proteomics
Proteomics public data resources: enabling "big data" analysis in proteomicsProteomics public data resources: enabling "big data" analysis in proteomics
Proteomics public data resources: enabling "big data" analysis in proteomics
 

More from Juan Antonio Vizcaino

ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...Juan Antonio Vizcaino
 
The ProteomeXchange Consoritum: 2017 update
The ProteomeXchange Consoritum: 2017 updateThe ProteomeXchange Consoritum: 2017 update
The ProteomeXchange Consoritum: 2017 updateJuan Antonio Vizcaino
 
How to run and maintain a popular biological data repository?
How to run and maintain a popular biological data repository?How to run and maintain a popular biological data repository?
How to run and maintain a popular biological data repository?Juan Antonio Vizcaino
 
The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...
The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...
The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...Juan Antonio Vizcaino
 
Enabling automated processing and analysis of large-scale proteomics data

Reusing and integrating public proteomics data to improve our knowledge of the human proteome

  • 1. Reusing and integrating public proteomics data to improve our knowledge of the human proteome. Dr. Juan Antonio Vizcaíno, Proteomics Team Leader, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK.
  • 2. Overview (Juan A. Vizcaíno, juan@ebi.ac.uk; 19th C-HPP Symposium, Santiago, 16 June 2018)
      • Short introduction to PRIDE and ProteomeXchange
      • Reuse of public proteomics data
      • Integration of proteomics and genomics data
      • Open analysis pipelines for proteomics data
  • 3. PRIDE (PRoteomics IDEntifications) database: http://www.ebi.ac.uk/pride/archive
      • PRIDE stores mass spectrometry (MS)-based proteomics data:
          • Peptide and protein expression data (identification and quantification)
          • Post-translational modifications
          • Mass spectra (raw data and peak lists)
          • Technical and biological metadata
          • Any other related information
      • Full support for tandem MS approaches
      • Any type of data can be stored
      • From July 2017, an ELIXIR core resource
      • References: Martens et al., Proteomics, 2005; Vizcaíno et al., NAR, 2016
  • 4. ProteomeXchange: a global, distributed proteomics database (http://www.proteomexchange.org)
      • Framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.
      • Repositories shown: PASSEL (SRM data), PRIDE (MS/MS data), MassIVE (MS/MS data), jPOST (MS/MS data), iProX (MS/MS data)
      • Figure labels: Raw, ID/Q, Meta; "Mandatory data deposition"
      • References: Vizcaíno et al., Nat Biotechnol, 2014; Deutsch et al., NAR, 2017
  • 5. PRIDE data submissions and data growth
      • Datasets submitted per month: May 2018 (320 datasets) was again a record month in terms of datasets submitted
      • Datasets submitted per year: > 2,400 datasets submitted in 2017
      • PRIDE contains >85% of all ProteomeXchange datasets
      • Dataset PXD010000 reached on June 1st
  • 6. HPP statistics (HPP tags)
      • [Bar chart, axis from 0 to 180; bar values are not recoverable from the transcript.]
      • Tags shown: Human Proteome Project; …ease-Driven Human Proteome Project (B/D-…; Chromosome-centric Human Proteome Project (C-HPP); Cancer (B/D-HPP); Human Immuno-Peptidome Project (HUPO-HIPP); Liver (B/D-HPP); Protein Misfolding and Aggregation (B/D-HPP); Extreme Conditions (B/D-HPP); Human Brain Proteome Project (HUPO_HBPP) (B/D-HPP); Diabetes (B/D-HPP); Food and Nutrition (B/D-HPP); Glycoproteomics (B/D-HPP); KidneyUrine (B/D-HPP); Cardiovascular (B/D-HPP); EpigeneticChromatin (B/D-HPP); EyeOME (B/D-HPP); China Human Proteome Project (CNHPP)
  • 7. HPP statistics (Countries)
      • [Bar chart, axis from 0 to 60; bar values are not recoverable from the transcript.]
      • Countries shown: China, USA, Spain, Canada, France, Germany, Republic of Korea, Switzerland, Australia, Italy, Japan, Netherlands, Brazil, India, Norway, South Korea, Thailand, Austria, Belgium, Denmark, Finland, Israel, Pakistan, Russia, Sweden, Taiwan
  • 8. Overview
      • Short introduction to PRIDE and ProteomeXchange
      • Reuse of public proteomics data
      • Integration of proteomics and genomics data
      • Open analysis pipelines for proteomics data
  • 9. Data re-use in proteomics keeps increasing
      • Data download volume for PRIDE in 2017: 295 TB
      • [Bar chart: downloads in TB per year, 2013 to 2017; axis from 0 to 350.]
  • 10. Data sharing in proteomics (Vaudel et al., Proteomics, 2016)
  • 11. Public data re-analysis -> Data repurposing
      • Individual authors (third parties) can re-analyze MS proteomics raw data with new hypotheses in mind (not taken into account by the original authors).
      • Proteogenomics studies
      • Meta-analysis studies
  • 12. Public data re-analysis -> Data repurposing
      • Individual authors (third parties) can re-analyze MS proteomics raw data with new hypotheses in mind (not taken into account by the original authors).
      • Proteogenomics studies
      • Meta-analysis studies
  • 13. Examples of repurposing datasets: proteogenomics
      • Data in public resources can be used for genome annotation purposes -> discovery of short ORFs, translated lncRNAs, etc.
  • 14. Examples of repurposing datasets: proteogenomics
      • Some studies have also been performed in model organisms (mouse, rat, Drosophila) and in other organisms (Mycobacterium tuberculosis, Helicobacter pylori, rice, …)
  • 15. Public datasets from different omics: OmicsDI (http://www.omicsdi.org/)
      • Aims to integrate ‘omics’ datasets (proteomics, transcriptomics, metabolomics and genomics at present).
      • Resources shown: PRIDE, MassIVE, jPOST, PASSEL, GPMDB, ArrayExpress, Expression Atlas, MetaboLights, Metabolomics Workbench, GNPS, EGA, …and others
      • Reference: Perez-Riverol et al., Nat Biotechnol, 2017
  • 16. OmicsDI: Portal for omics datasets
  • 17. Public data re-analysis -> Data repurposing
      • Individual authors (third parties) can re-analyze MS proteomics raw data with new hypotheses in mind (not taken into account by the original authors).
      • Proteogenomics studies
      • Meta-analysis studies -> Analysing together a large number of datasets
  • 18. Reuse of public proteomics data is on the rise! (Martens & Vizcaíno, Trends Biochem Sci, 2017; Vaudel et al., Proteomics, 2016)
  • 19. My talk on Monday afternoon: “The functional landscape of human phosphorylation” (collaboration with Pedro Beltrao’s group)
      • Steps:
          • 1. Consistent re-analysis of PRIDE public datasets
          • 2. Constructing a functional score for those phospho-sites (ML)
          • 3. Validation of the score (in silico and in vivo)
      • Largest MS-based phosphoproteomics atlas to date
      • Fully annotated at dataset level
      • 101 cell lines/tissues (120 PXD datasets)
      • 6,801 raw files (~5.2 TB)
      • Running time ~ 2 months
      • ~120k highly confident phospho-peptide identifications (<0.01 FDR, Ascore & ∆score filtered)
  • 20. Overview
      • Short introduction to PRIDE and ProteomeXchange
      • Reuse of public proteomics data
      • Integration of proteomics and genomics data
      • Open analysis pipelines for proteomics data
  • 21. ProteoGenomics data integration in PRIDE
      • Automatically connects proteomics data from original PRIDE submissions to genome browsers (Ensembl, UCSC browser)
      • Workflow components (numbered 1 to 4 in the figure): PX Submission Tool, PRIDE submission pipelines, PRIDE web and API, TrackHub Registry
      • Data in HUPO-PSI standard formats: mzIdentML, mzTab
  • 22. Proteogenomics-related formats
  • 23. Mapping peptides to the genome: PoGo (Schlaffner CN, Pirklbauer G, Bender A, Choudhary JS, PoGo, Cell Systems, 5(2):152-156.e4)
      • For each .pogo file:
          • PTMs are standardized to a common representation using the PRIDE-Mod library.
          • Each peptide references an assay URL in PRIDE.
          • Each PoGo file is generated automatically by the PRIDE pipeline.
      • Example BED rows:
          chr1 1314335 1314365 VLIPVFALGR 1000 - 1314335 1314335 0,0,0 1 30 0
          chr1 1454464 1454488 ITVLEALR 1000 + 1454464 1454464 128,128,128 1 24 0
          chr1 1456317 1456344 LFDWANTSR 1000 + 1456317 1456317 128,128,128 1 27 0
          chr1 1459184 1459211 ATLNAFLYR 1000 + 1459184 1459184 128,128,128 1 27 0
          chr1 1462609 1462633 LAQFDYGR 1000 + 1462609 1462609 128,128,128 1 24 0
          chr1 1485135 1485159 ITVLEALR 1000 + 1485135 1485135 128,128,128 1 24 0
          chr1 1486636 1486663 LFDWANTSR 1000 + 1486636 1486636 128,128,128 1 27 0
          chr1 1490572 1490596 LAQFDYGR 1000 + 1490572 1490572 128,128,128 1 24 0
          chr1 1522863 1522887 ITVLEALR 1000 + 1522863 1522863 128,128,128 1 24 0
      • Future challenges:
          • BED information can be extended with more information about transcript reliability.
          • Peptide uniqueness.
          • Reliability score.
          • Native bigBed should be provided to remove the need to customize new pipelines, etc.
          • What to do with the unmapped peptides (which are long lists).
          • Maintainability.
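The BED rows on the PoGo slide can be read with a few lines of Python. The sketch below is only an illustration, not the PRIDE pipeline: it assumes the rows follow the standard 12-column BED layout, the names `BedPeptide`, `parse_bed_line` and `loci_per_peptide` are hypothetical, and it looks at the "peptide uniqueness" challenge by listing peptides that map to more than one locus.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BedPeptide:
    """One peptide-to-genome mapping (subset of the 12 BED columns)."""
    chrom: str
    start: int
    end: int
    peptide: str
    strand: str
    block_sizes: tuple


def parse_bed_line(line):
    """Parse one whitespace-separated BED12 row (assumed column layout)."""
    fields = line.split()
    if len(fields) != 12:
        raise ValueError(f"expected 12 BED columns, got {len(fields)}")
    sizes = tuple(int(x) for x in fields[10].split(",") if x)
    return BedPeptide(fields[0], int(fields[1]), int(fields[2]),
                      fields[3], fields[5], sizes)


def loci_per_peptide(records):
    """Group genomic loci by peptide sequence."""
    loci = defaultdict(list)
    for rec in records:
        loci[rec.peptide].append((rec.chrom, rec.start, rec.strand))
    return dict(loci)


# Rows copied from the slide.
ROWS = """\
chr1 1314335 1314365 VLIPVFALGR 1000 - 1314335 1314335 0,0,0 1 30 0
chr1 1454464 1454488 ITVLEALR 1000 + 1454464 1454464 128,128,128 1 24 0
chr1 1456317 1456344 LFDWANTSR 1000 + 1456317 1456317 128,128,128 1 27 0
chr1 1459184 1459211 ATLNAFLYR 1000 + 1459184 1459184 128,128,128 1 27 0
chr1 1462609 1462633 LAQFDYGR 1000 + 1462609 1462609 128,128,128 1 24 0
chr1 1485135 1485159 ITVLEALR 1000 + 1485135 1485135 128,128,128 1 24 0
chr1 1486636 1486663 LFDWANTSR 1000 + 1486636 1486636 128,128,128 1 27 0
chr1 1490572 1490596 LAQFDYGR 1000 + 1490572 1490572 128,128,128 1 24 0
chr1 1522863 1522887 ITVLEALR 1000 + 1522863 1522863 128,128,128 1 24 0
"""

records = [parse_bed_line(row) for row in ROWS.splitlines() if row.strip()]
loci = loci_per_peptide(records)

# Peptides mapping to more than one genomic locus.
multi_mapped = sorted(p for p, hits in loci.items() if len(hits) > 1)
print(multi_mapped)  # ['ITVLEALR', 'LAQFDYGR', 'LFDWANTSR']

# Sanity check: each block covers three nucleotides per amino acid.
print(all(sum(r.block_sizes) == 3 * len(r.peptide) for r in records))  # True
```

In these rows, three of the five distinct peptides map to several loci on chr1, which shows why a uniqueness or reliability annotation would be useful in the BED output.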
  • 24. TrackHub creation and publication: https://www.trackhubregistry.org/
  • 25. TrackHub creation and publication
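For readers unfamiliar with track hubs, a hub describes each track in a `trackDb.txt` stanza of key-value lines. The sketch below shows what a stanza for a bigBed peptide track might look like. It is not taken from the slides: the function name `trackdb_stanza`, the track name, the labels and the `example.org` URL are all hypothetical, and the name check is a deliberately strict rule chosen for this illustration.

```python
import re


def trackdb_stanza(track, big_data_url, short_label, long_label, bed_fields=12):
    """Build one trackDb.txt stanza for a bigBed track (illustrative only)."""
    # Strict naming rule assumed for this sketch: letter first,
    # then letters, digits or underscores.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", track):
        raise ValueError(f"unsuitable track name: {track!r}")
    lines = [
        f"track {track}",
        f"bigDataUrl {big_data_url}",
        f"shortLabel {short_label}",
        f"longLabel {long_label}",
        f"type bigBed {bed_fields}",
    ]
    return "\n".join(lines) + "\n"


stanza = trackdb_stanza(
    track="example_peptides",                           # hypothetical name
    big_data_url="https://example.org/hub/peptides.bb",  # placeholder URL
    short_label="Example peptides",
    long_label="Peptides mapped to the genome (illustrative track)",
)
print(stanza)
```

A hub built this way would also need `hub.txt` and `genomes.txt` files before it could be registered or loaded in a genome browser.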
  • 26. UCSC Viewer: http://genome.ucsc.edu/cgi-bin/hgTracks?db=mm10&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr12%3A20021505-100107519&hgsid=644445905_BScrPMQymPnGtl9O7jeSv1jS5Rm0
  • 27. Visualization in IGV
  • 28. Overview
      • Short introduction to PRIDE and ProteomeXchange
      • Reuse of public proteomics data
      • Integration of proteomics and genomics data
      • Open analysis pipelines for proteomics data
  • 29. Reproducible Science: http://www.nature.com/nature/focus/reproducibility/
  • 30. How to make data analysis pipelines reproducible
      • That means using:
          • Exactly the same software (including the same version) in the same order.
          • The same protein sequence database (including the same version).
      • If we use the same files as input to the software, we will get EXACTLY the same results.
      • If that’s not the case, something has gone wrong.
      • Computers are much more reliable than people.
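One way to make the conditions on this slide checkable is to record them in a manifest and compare fingerprints between runs. The sketch below is a minimal illustration under stated assumptions, not the approach used by the PRIDE team: the helper names, tool names, version strings and database label are all hypothetical, and a short byte string stands in for a real raw file.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(steps, database, inputs):
    """Record ordered software steps, database version and input checksums."""
    return {
        # A list, so the order of the steps is part of the record.
        "steps": [{"software": name, "version": version}
                  for name, version in steps],
        "database": {"name": database[0], "version": database[1]},
        "inputs": {name: sha256_hex(content)
                   for name, content in inputs.items()},
    }


def manifest_digest(manifest) -> str:
    """Fingerprint of a manifest via canonical JSON serialisation."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return sha256_hex(canonical.encode("utf-8"))


# Hypothetical tools, versions and data, for illustration only.
steps = [("converter", "1.0.0"), ("search-engine", "2.3.1")]
database = ("human-proteins", "2018_05")
inputs = {"run01.raw": b"placeholder bytes standing in for a raw file"}

original = manifest_digest(build_manifest(steps, database, inputs))
repeat = manifest_digest(build_manifest(steps, database, inputs))
upgraded = manifest_digest(build_manifest(
    [("converter", "1.0.0"), ("search-engine", "2.4.0")], database, inputs))
reordered = manifest_digest(build_manifest(
    list(reversed(steps)), database, inputs))

print(original == repeat)     # True: same software, order, database, inputs
print(original == upgraded)   # False: one software version changed
print(original == reordered)  # False: same tools, different order
```

Matching fingerprints only show that two runs started from the same recorded conditions; confirming that the results are exactly the same would still require comparing checksums of the output files.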
  • 31. Developing pipelines in the cloud -> DDA data
      • Develop exemplary proteomics data analysis workflows and deploy them in the EMBL-EBI "Embassy Cloud":
          • (1) Standard identification workflow
          • (2) Identification workflow for PTMs
          • (3) Quantification (label-free/label-based approaches)
          • (4) Quality Control (to aid data set interpretation/reanalysis evaluation)
          • (5) Versions of quantification approaches (including PTMs)
      • -> Connected to public proteomics data from [name missing in transcript]
  • 32. Cloud-based infrastructure
  • 33. Open analysis pipelines: DIA and proteogenomics
      • Pipelines for DIA approaches.
          • In collaboration with the Stoller Center (Manchester) (co-PIs Graham, Hubbard & Townsend)
      • Pipelines for proteogenomics approaches (project just started).
          • In collaboration with J. Choudhary (Institute of Cancer Research, London)
      • Additional DDA pipelines (ELIXIR Proteomics Community).
  • 34. Vision: total transparency and reproducibility
      • Figure labels: Analysis pipelines, Input data, Data analysis, Results
  • 35. Summary
      • Public proteomics datasets are on the rise! Reliable (widely used) infrastructure now exists: PRIDE and ProteomeXchange.
      • Many possibilities are open for reuse of this data.
          • New purposes: proteogenomics, novel PTMs, ...
      • New infrastructure to integrate proteomics and genomics data.
      • Developing open and reproducible analysis pipelines.
          • Supporting reproducible science.
          • Aim: to make them available to everyone in the community in the future.
  • 36. Acknowledgements
      • People: Yasset Perez-Riverol, Johannes Griss, Suresh Hewapathirana, Tobias Ternent, Jingwen Bai, Attila Csordas, Deepti Jaiswal, Andrew Jarnuczak, Mathias Walzer, Gerhard Mayer (de.NBI)
      • Former team members, especially Manuel Bernal-Linares & Henning Hermjakob
      • All data submitters!
      • @pride_ebi @proteomexchange