Introduction to the PRIDE database for the
Human Microbiome Project
Dr. Juan Antonio Vizcaíno
PRIDE Group Coordinator
Proteomics Services Team
EMBL-EBI
Hinxton, Cambridge, UK
Juan A. Vizcaíno
juan@ebi.ac.uk
Human Microbiome Project
11 December 2015
• What is PRIDE?
• The ProteomeXchange Consortium
• Submission process and PRIDE tools
• Miscellaneous
Overview
Data resources at EMBL-EBI
Genes, genomes & variation
RNA Central
ArrayExpress
Expression Atlas
Metabolights
PRIDE
InterPro Pfam UniProt
ChEMBL ChEBI
Molecular structures
Protein Data Bank in Europe
Electron Microscopy Data Bank
European Nucleotide Archive
European Variation Archive
European Genome-phenome Archive
Gene, protein & metabolite expression
Protein sequences, families & motifs
Chemical biology
Reactions, interactions & pathways
IntAct Reactome MetaboLights
Systems
BioModels Enzyme Portal BioSamples
Ensembl
Ensembl Genomes
GWAS Catalog
Metagenomics portal
Europe PubMed Central
Gene Ontology
Experimental Factor Ontology
Literature & ontologies
PRIDE (PRoteomics IDEntifications) Archive
http://www.ebi.ac.uk/pride
• PRIDE Archive stores mass spectrometry
(MS)-based proteomics data:
• Peptide and protein expression data
(identification and quantification)
• Post-translational modifications
• Mass spectra (raw data and peak lists)
• Technical and biological metadata
• Any other related information
• Full support for tandem MS approaches
Martens et al., Proteomics, 2005
Vizcaíno et al., NAR, 2016, in press
Data content in PRIDE Archive
• PRIDE is organised in datasets (groups of assays).
• An assay represents one MS run (in most cases).
• Focused on MS/MS approaches, but any type of proteomics workflow can be stored.
• For each dataset, PRIDE stores at least the raw data and the processed results.
Ways to access data in PRIDE Archive
• PRIDE web interface
• File repository
• REST web service
• PRIDE Inspector tool
PRIDE Archive submitted datasets up until 1st November, 2015
• 1,259 datasets submitted to PRIDE Archive by November 1st
• 923 datasets were submitted in 2014
• In the last 6 months, 155 datasets submitted per month
• Size: ~ 160 TB
Access statistics: PRIDE Archive via the file repository
Data download figures:
153.5 TB downloaded so far in 2015 by FTP
22.6 TB so far in 2015 by Aspera
So far, 176 TB in 2015 (by mid November)
156 TB in 2014
FTP access for 2015, split per month
ProteomeXchange Consortium
• Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.
• Includes PeptideAtlas (ISB, Seattle), PRIDE (Cambridge, UK) and (very recently) MassIVE (UCSD, San Diego).
• Common identifier space (PXD identifiers)
• Two supported data workflows: MS/MS and SRM.
• Main objective: Make life easier for researchers
http://www.proteomexchange.org
Vizcaíno et al., Nat Biotechnol, 2014
ProteomeXchange data workflow
[Figure: diagram of the ProteomeXchange data workflow. Labels: researcher's results, raw data* and metadata; receiving repositories: PRIDE (MS/MS data), MassIVE (MS/MS data), PASSEL (SRM data); ProteomeCentral: metadata / manuscript; other labels: journals, UniProt/neXtProt, PeptideAtlas, GPMDB, other DBs, reprocessed results.]
Vizcaíno et al., Nat Biotechnol, 2014
ProteomeCentral: Portal for all PX datasets
http://proteomecentral.proteomexchange.org/cgi/GetDataset
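A minimal sketch of how a PXD identifier could be turned into a ProteomeCentral link. The base URL is the one on the slide. The `ID` query parameter and the "PXD plus six digits" pattern are assumptions, inferred from the accessions quoted in this deck (e.g. PXD000764) and from how ProteomeCentral links are usually written; the helper name `dataset_url` is hypothetical.

```python
import re
from urllib.parse import urlencode

# Base URL as given on the slide.
PROTEOMECENTRAL_BASE = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"

# Assumption: accessions look like "PXD" followed by six digits,
# as in the examples quoted in this deck (PXD000164, PXD000764, ...).
PXD_PATTERN = re.compile(r"^PXD\d{6}$")


def dataset_url(accession: str) -> str:
    """Build a ProteomeCentral URL for one PXD accession (hypothetical helper)."""
    normalised = accession.strip().upper()
    if not PXD_PATTERN.match(normalised):
        raise ValueError(f"not a PXD accession: {accession!r}")
    # Assumption: the dataset is selected with an 'ID' query parameter.
    return f"{PROTEOMECENTRAL_BASE}?{urlencode({'ID': normalised})}"


print(dataset_url("PXD000764"))
```

Validating the accession before building the link keeps typos from turning into silent "dataset not found" pages.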
ProteomeXchange: 2,774 datasets up until 1st September, 2015
Type:
1681 PRIDE partial
813 PRIDE complete
173 MassIVE
84 PeptideAtlas/PASSEL complete
23 Reprocessed
Publicly Accessible:
1372 datasets, 49% of all
90% PRIDE
6% PASSEL
4% MassIVE
Data volume:
Total: ~150 TB
Number of all files: ~400,000
PXD000320-324: ~ 4 TB
PXD002319-26 ~2.4 TB
PXD001471 ~1.6 TB
Datasets/year:
2012: 102
2013: 527
2014: 963
2015: 1182
Top Species studied by at least 20 datasets:
1080 Homo sapiens
335 Mus musculus
110 Saccharomyces cerevisiae
98 Arabidopsis thaliana
75 Rattus norvegicus
58 Escherichia coli
29 Bos taurus
23 Glycine max
20 Caenorhabditis elegans
20 Oryza sativa
~ 500 species in total
Origin:
714 USA
313 Germany
252 United Kingdom
163 China
146 France
121 Netherlands
108 Switzerland
103 Canada
81 Denmark
73 Spain
68 Japan
67 Australia
63 Sweden
57 Belgium
43 Austria
39 India
34 Taiwan
33 Norway
26 Italy
24 Ireland
24 Finland
21 Republic of Korea
20 Brazil
20 Russia
18 Israel
18 Singapore …
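The headline figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below only re-uses numbers quoted above; the variable names are illustrative.

```python
# Figures copied from the slide (ProteomeXchange, up until 1st September 2015).
total_datasets = 2774
public_datasets = 1372

by_type = {
    "PRIDE partial": 1681,
    "PRIDE complete": 813,
    "MassIVE": 173,
    "PeptideAtlas/PASSEL complete": 84,
    "Reprocessed": 23,
}
by_year = {2012: 102, 2013: 527, 2014: 963, 2015: 1182}

# Both breakdowns should add up to the headline total.
type_total = sum(by_type.values())
year_total = sum(by_year.values())

# Share of publicly accessible datasets, rounded to a whole percentage.
public_share = round(100 * public_datasets / total_datasets)

print(type_total, year_total, public_share)
```

Both breakdowns add up to 2,774, and the public share rounds to the 49% quoted on the slide.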
PX Data workflow for MS/MS data
1. Mass spectrometer output files: raw data (binary files) or peak list spectra in a standardized format (mzML, mzXML).
2. Result files:
a. Complete submissions: Result files can be converted to PRIDE XML or the mzIdentML data standard.
b. Partial submissions: For workflows not yet supported by PRIDE, search engine output files will be stored and provided in their original form.
3. Metadata: Sufficiently detailed description of sample origin, workflow, instrumentation, submitter.
4. Other files: Optional files (the list can be extended):
a. QUANT: Quantification related results
b. PEAK: Peak list files
c. GEL: Gel images
d. OTHER: Any other file type
e. FASTA
f. SP_LIBRARY
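The distinction between complete and partial submissions can be sketched as a small check over the file-type labels of a submission. This is an illustration under stated assumptions, not the logic of any PRIDE tool: the labels QUANT, PEAK, GEL, OTHER, FASTA, SP_LIBRARY and RESULT appear in this deck, while RAW and SEARCH are assumed labels for raw files and search engine output files, and `classify_submission` is a hypothetical helper.

```python
from collections import Counter


def classify_submission(file_types):
    """Classify a submission as 'complete' or 'partial' from its file-type labels.

    Assumed rule, following the workflow described above: mass spectrometer
    output (RAW or PEAK) must be present; a RESULT file (mzIdentML or
    PRIDE XML) makes the submission complete; otherwise search engine
    output (SEARCH) makes it partial.
    """
    counts = Counter(label.upper() for label in file_types)
    if counts["RAW"] == 0 and counts["PEAK"] == 0:
        raise ValueError("no mass spectrometer output files (RAW or PEAK)")
    if counts["RESULT"] > 0:
        return "complete"
    if counts["SEARCH"] > 0:
        return "partial"
    raise ValueError("no processed results (RESULT or SEARCH)")


print(classify_submission(["RAW", "RESULT", "PEAK", "FASTA"]))
print(classify_submission(["RAW", "SEARCH", "QUANT"]))
```

Optional file types (QUANT, GEL, FASTA, ...) do not change the classification in this sketch; they are simply carried along.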
Complete vs Partial submissions: processed results
[Figure panels: Complete, Partial]
For complete submissions, it is possible to connect the spectra with the processed identification results, and they can be visualized.
Processed results are available in an open data standard: mzIdentML, or in the older PRIDE XML format.
Complete vs Partial submissions: experimental metadata
[Figure panels: Complete, Partial]
General experimental metadata about the projects is similar.
However, at the assay level, the information in partial submissions is less detailed.
Public metaproteomics data in PRIDE Archive so far…
28 public datasets:
- 10 Complete
- 15 Partial
(3 before PX)
PRIDE Tools: Submission Process
PRIDE Converter 2
PRIDE Inspector PX Submission Tool
mzIdentML
PRIDE XML
1
Now: native file export
[Figure columns: Tools; ‘RESULT’ file generation; Final ‘RESULT’ file. Tools shown: Mascot, ProteinPilot, Scaffold, PEAKS, MSGF+, others. Generation step: native file export. Final ‘RESULT’ file: mzIdentML, with spectra files.]
Complete submissions
[Figure labels: search engine results + MS files; search engines; mzIdentML]
An increasing number of tools support export to mzIdentML 1.1:
- Mascot
- MSGF+
- MyriMatch and related tools from D. Tabb’s lab
- OpenMS
- PEAKS
- PeptideShaker
- ProCon (ProteomeDiscoverer, Sequest)
- Scaffold
- TPP via the idConvert tool (ProteoWizard)
- ProteinPilot (from version 5.0)
- X!Tandem native conversion (Beta, PILEDRIVER)
- Others: library for X!Tandem conversion, lab internal pipelines, …
- Crux
Referenced spectral files need to be submitted as well (all open formats are supported).
Updated list: http://www.psidev.info/tools-implementing-mzIdentML
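Because the referenced spectral files have to be submitted together with the mzIdentML file, it can help to list them before submitting. The sketch below reads the `location` attribute of the `SpectraData` elements defined by mzIdentML 1.1 and reports files that are not in the submission. The function names and the sample document are illustrative, and comparing by file name only is an assumption.

```python
import xml.etree.ElementTree as ET


def referenced_spectra_files(mzid_xml: str) -> list:
    """Return the file names referenced by SpectraData elements in an mzIdentML document."""
    root = ET.fromstring(mzid_xml)
    names = []
    for elem in root.iter():
        # Compare the local tag name so any mzIdentML namespace version is accepted.
        if elem.tag.rsplit("}", 1)[-1] == "SpectraData":
            location = elem.get("location", "")
            # Keep only the file name, for both URI-style and Windows-style paths.
            names.append(location.replace("\\", "/").rsplit("/", 1)[-1])
    return names


def missing_spectra_files(mzid_xml: str, submitted: list) -> list:
    """Return referenced spectra files that are absent from the submitted file names."""
    submitted_names = set(submitted)
    return [n for n in referenced_spectra_files(mzid_xml) if n not in submitted_names]


# Tiny illustrative document; real files contain many more elements.
SAMPLE = """<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1" version="1.1.0">
  <DataCollection>
    <Inputs>
      <SpectraData id="SD_1" location="file:///data/run01.mgf"/>
      <SpectraData id="SD_2" location="run02.mzML"/>
    </Inputs>
  </DataCollection>
</MzIdentML>"""

print(missing_spectra_files(SAMPLE, ["run01.mgf"]))
```

For large result files, a streaming parser such as `ET.iterparse` would avoid loading the whole document into memory.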
Before: only file conversion to PRIDE XML
[Figure columns: Original data files; ‘RESULT’ file generation; Final ‘RESULT’ file. Original data files: search output files and spectra files. Generation step: file conversion with PRIDE Converter. Final ‘RESULT’ file: PRIDE XML.]
PX Data workflow for MS/MS data
[Figure labels: search engine results + MS files; PRIDE Converter 2; PRIDE XML]
PRIDE Converter 2
Côté & Griss et al., MCP, 2012
https://github.com/PRIDE-Toolsuite/pride-converter-2
- ‘Bulk’ conversion possible: Command Line mode
- Virtually no limit in file sizes.
Other tools available:
- PRIDE Converter
- PLGS (Waters)
- Proteios
- EasyProt
- hEIDI
- OmicsHub (Integromics)
- PeptideShaker (Compomics)
PRIDE Tools: Submission Process
PRIDE Converter 2
PRIDE Inspector PX Submission Tool
mzIdentML
PRIDE XML
2
PRIDE Inspector Toolsuite: Visualisation tool
Wang et al., Nat Biotechnol, 2012
Perez-Riverol et al., MCP, 2016, in press
PRIDE Inspector Toolsuite supports:
- PRIDE XML
- mzIdentML + all types of spectra files
- mzML
- mzTab identification and quantification + all types of spectra files
https://github.com/PRIDE-Toolsuite/
Example visualisation: PXD000164
Lassek et al., MCP, 2015
PRIDE Tools: Submission Process
PRIDE Converter 2
PRIDE Inspector PX Submission Tool
mzIdentML
PRIDE XML
3
PX submission tool
• It selects and captures the mappings between the different types of files included in the submission.
• It transfers all the files using Aspera (default) or FTP.
• Version 2.3.0 released in August 2015 (several refinements and improvements).
• Alternative command line method also available for groups with bioinformatics support.
[Figure labels: results, raw and other files; PX submission tool]
http://www.proteomexchange.org/submission
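The idea of capturing mappings between file types can be illustrated with a plain dictionary from each result file to the raw or peak list files it was derived from. This is a hypothetical data structure for illustration only; it is not the format used by the PX submission tool, and `check_mappings` is an invented helper.

```python
def check_mappings(mappings, submitted_files):
    """Return a list of human-readable problems found in result-to-spectra mappings.

    mappings: dict of result file name -> list of mapped raw/peak list file names.
    submitted_files: iterable of all file names included in the submission.
    """
    submitted = set(submitted_files)
    problems = []
    for result_file, related in sorted(mappings.items()):
        if result_file not in submitted:
            problems.append(f"{result_file}: result file not in submission")
        if not related:
            problems.append(f"{result_file}: no raw or peak list file mapped")
        for name in related:
            if name not in submitted:
                problems.append(f"{result_file}: mapped file {name} not in submission")
    return problems


# Hypothetical file names, for illustration only.
mappings = {"run01.mzid": ["run01.raw"], "run02.mzid": []}
submitted = ["run01.mzid", "run01.raw", "run02.mzid"]
print(check_mappings(mappings, submitted))
```

An empty list means every result file is present and points to at least one submitted spectra file.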
PX submission tool: screenshots
Fast file transfer with Aspera
- Aspera is the default file transfer protocol to PRIDE:
- PX Submission tool
- Command line
- Up to 50X faster than FTP
File transfer speed should not be a problem!
Manuscript published detailing the process
Ternent et al., Proteomics, 2014
http://www.proteomexchange.org/submission-proteomexchange-pride
Example dataset:
PXD000764
- Title: “Discovery of new CSF biomarkers for meningitis in children”
- 12 runs: 4 controls and 8 infected samples
- Identification and quantification data
Partial submissions can be used to store other data workflows
• Everything can be stored, not only MS/MS data (~90% of datasets): a very flexible mechanism that can capture all types of datasets.
• PRIDE Archive does not store SRM data (it goes to PASSEL, a PX partner).
• Top down proteomics datasets (19 public datasets).
• Mass Spectrometry Imaging datasets (1 public dataset).
• Data independent acquisition techniques: e.g. SWATH-MS (22 public datasets), MSE (5 public datasets), HDMSE (2 public datasets).
Linking of datasets from other omics fields
• Sample IDs can be included during the submission process (e.g. in the PX submission tool), but they are not linked at present.
• Ongoing general EBI approach to improve the current situation, working closely with the EBI BioSamples database.
• Better integration of IDs from the NCBI Biosamples DB should also be possible.
• So far, we have not had an “example” project and dedicated funding to do this.
Sneak peek at other PRIDE resources
• Provide an aggregated and QC-filtered peptide-centric and protein-centric view on PRIDE Archive data.
http://www.ebi.ac.uk/pride/cluster/
http://wwwdev.ebi.ac.uk/pride/proteomes/
Griss et al., Nat Methods, 2013
Do you want to know a bit more…?
http://www.slideshare.net/JuanAntonioVizcaino
Acknowledgements: People
Attila Csordas
Tobias Ternent
Noemi del Toro
Gerhard Mayer (Bochum, de.NBI)
Johannes Griss
Yasset Perez-Riverol
Henning Hermjakob
Former team members: Rui Wang,
Florian Reisinger and Jose A. Dianes
The PRIDE Team
Discussion
Questions?

Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 


Introduction to PRIDE Database for Human Microbiome

  • 1. Introduction to the PRIDE database for the Human Microbiome Project. Dr. Juan Antonio Vizcaíno, PRIDE Group Coordinator, Proteomics Services Team, EMBL-EBI, Hinxton, Cambridge, UK
  • 2. Overview (Juan A. Vizcaíno, juan@ebi.ac.uk; Human Microbiome Project, 11 December 2015)
      • What is PRIDE?
      • The ProteomeXchange Consortium
      • Submission process and PRIDE tools
      • Miscellaneous
  • 3. Overview: What is PRIDE?; The ProteomeXchange Consortium; Submission process and PRIDE tools; Miscellaneous
  • 4. Data resources at EMBL-EBI
      • Categories shown: Genes, genomes & variation; Gene, protein & metabolite expression; Protein sequences, families & motifs; Molecular structures; Chemical biology; Reactions, interactions & pathways; Systems; Literature & ontologies
      • Resources shown: RNA Central, ArrayExpress, Expression Atlas, MetaboLights, PRIDE, InterPro, Pfam, UniProt, ChEMBL, ChEBI, Protein Data Bank in Europe, Electron Microscopy Data Bank, European Nucleotide Archive, European Variation Archive, European Genome-phenome Archive, IntAct, Reactome, BioModels, Enzyme Portal, BioSamples, Ensembl, Ensembl Genomes, GWAS Catalog, Metagenomics portal, Europe PubMed Central, Gene Ontology, Experimental Factor Ontology
  • 5. PRIDE (PRoteomics IDEntifications) Archive (http://www.ebi.ac.uk/pride)
      • PRIDE Archive stores mass spectrometry (MS)-based proteomics data:
          • Peptide and protein expression data (identification and quantification)
          • Post-translational modifications
          • Mass spectra (raw data and peak lists)
          • Technical and biological metadata
          • Any other related information
      • Full support for tandem MS approaches
      Martens et al., Proteomics, 2005; Vizcaíno et al., NAR, 2016, in press
  • 6. Data content in PRIDE Archive
      • PRIDE is organised into datasets (groups of assays).
      • An assay represents one MS run (in most cases).
      • Focused on MS/MS approaches, but any type of proteomics workflow can be stored.
      • For each dataset, PRIDE stores at least the raw data and the processed results.
  • 7. Ways to access data in PRIDE Archive
      • PRIDE web interface
      • File repository
      • REST web service
      • PRIDE Inspector tool
  • 8. Datasets submitted to PRIDE Archive up until 1st November, 2015
      • 1,259 datasets submitted to PRIDE Archive by November 1st
      • 923 datasets were submitted in 2014
      • In the last 6 months, 155 datasets submitted per month
      • Size: ~160 TB
  • 9. Access statistics: PRIDE Archive via the file repository
      • Data download figures for 2015 (by mid November): 153.5 TB by FTP and 22.6 TB by Aspera, 176 TB in total
      • 156 TB in 2014
      [Figure: FTP access for 2015, split per month]
  • 10. Overview: What is PRIDE?; The ProteomeXchange Consortium; Submission process and PRIDE tools; Miscellaneous
  • 11. ProteomeXchange Consortium (http://www.proteomexchange.org)
      • Goal: development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.
      • Includes PeptideAtlas (ISB, Seattle), PRIDE (Cambridge, UK) and (very recently) MassIVE (UCSD, San Diego).
      • Common identifier space (PXD identifiers)
      • Two supported data workflows: MS/MS and SRM.
      • Main objective: make life easier for researchers
      Vizcaíno et al., Nat Biotechnol, 2014
  • 12. ProteomeXchange data workflow
      [Figure labels: ProteomeCentral; Metadata / Manuscript; Raw data*; Results; Journals; UniProt/neXtProt; PeptideAtlas; GPMDB; Other DBs; Receiving repositories: PASSEL (SRM data), PRIDE (MS/MS data), MassIVE (MS/MS data); Researcher's results; Reprocessed results]
      Vizcaíno et al., Nat Biotechnol, 2014
  • 13. ProteomeCentral: Portal for all PX datasets (http://proteomecentral.proteomexchange.org/cgi/GetDataset)
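The common PXD identifier space and the ProteomeCentral portal can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the six-digit accession pattern is inferred from the accessions quoted in these slides (for example PXD000764), the `ID` query parameter is assumed rather than taken from the slides, and both function names are hypothetical.

```python
import re

# Assumption: accessions look like "PXD" followed by six digits, as in the
# examples quoted in the slides (PXD000164, PXD000764, PXD001471).
PXD_PATTERN = re.compile(r"PXD[0-9]{6}")

# Base URL as given on slide 13; the "ID" query parameter is an assumption.
PROTEOMECENTRAL_BASE = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"


def is_pxd_accession(accession: str) -> bool:
    """Return True if the string matches the assumed PXD accession pattern."""
    return PXD_PATTERN.fullmatch(accession) is not None


def proteomecentral_url(accession: str) -> str:
    """Build a (hypothetical) ProteomeCentral dataset URL for an accession."""
    if not is_pxd_accession(accession):
        raise ValueError(f"not a PXD accession: {accession!r}")
    return f"{PROTEOMECENTRAL_BASE}?ID={accession}"


if __name__ == "__main__":
    print(proteomecentral_url("PXD000764"))
```

Validating the accession before building the URL keeps malformed identifiers from reaching the portal; no network request is made here.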
  • 14. ProteomeXchange: 2,774 datasets up until 1st September, 2015
      • Type: 1681 PRIDE partial; 813 PRIDE complete; 173 MassIVE; 84 PeptideAtlas/PASSEL complete; 23 reprocessed
      • Publicly accessible: 1372 datasets, 49% of all (90% PRIDE, 6% PASSEL, 4% MassIVE)
      • Data volume: ~150 TB in total; ~400,000 files; PXD000320-324: ~4 TB; PXD002319-26: ~2.4 TB; PXD001471: ~1.6 TB
      • Datasets/year: 2012: 102; 2013: 527; 2014: 963; 2015: 1182
      • Top species, studied by at least 20 datasets (~500 species in total): 1080 Homo sapiens; 335 Mus musculus; 110 Saccharomyces cerevisiae; 98 Arabidopsis thaliana; 75 Rattus norvegicus; 58 Escherichia coli; 29 Bos taurus; 23 Glycine max; 20 Caenorhabditis elegans; 20 Oryza sativa
      • Origin: 714 USA; 313 Germany; 252 United Kingdom; 163 China; 146 France; 121 Netherlands; 108 Switzerland; 103 Canada; 81 Denmark; 73 Spain; 68 Japan; 67 Australia; 63 Sweden; 57 Belgium; 43 Austria; 39 India; 34 Taiwan; 33 Norway; 26 Italy; 24 Ireland; 24 Finland; 21 Republic of Korea; 20 Brazil; 20 Russia; 18 Israel; 18 Singapore; …
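The slide 14 figures can be cross-checked with a few lines of Python. The numbers below are copied from the slide; the variable names are illustrative only. The sketch confirms that the per-type and per-year counts both add up to the stated total, and that the public share rounds to the quoted 49%.

```python
# Counts copied from slide 14; variable names are illustrative.
total_datasets = 2774
public_datasets = 1372

by_type = {
    "PRIDE partial": 1681,
    "PRIDE complete": 813,
    "MassIVE": 173,
    "PeptideAtlas/PASSEL complete": 84,
    "Reprocessed": 23,
}
by_year = {2012: 102, 2013: 527, 2014: 963, 2015: 1182}

# Both breakdowns should partition the same total.
type_sum = sum(by_type.values())
year_sum = sum(by_year.values())

# Share of publicly accessible datasets, rounded to a whole percentage.
public_share = round(100 * public_datasets / total_datasets)

if __name__ == "__main__":
    print(type_sum, year_sum, public_share)
```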
  • 15. Overview: What is PRIDE?; The ProteomeXchange Consortium; Submission process and PRIDE tools; Miscellaneous
  • 16. PX Data workflow for MS/MS data
      1. Mass spectrometer output files: raw data (binary files) or peak list spectra in a standardized format (mzML, mzXML).
      2. Result files:
          a. Complete submissions: result files can be converted to PRIDE XML or the mzIdentML data standard.
          b. Partial submissions: for workflows not yet supported by PRIDE, search engine output files will be stored and provided in their original form.
      3. Metadata: sufficiently detailed description of sample origin, workflow, instrumentation, submitter.
      4. Other files: optional files (the list can be extended):
          a. QUANT: quantification-related results
          b. PEAK: peak list files
          c. GEL: gel images
          d. OTHER: any other file type
          e. FASTA
          f. SP_LIBRARY
      [Figure labels: Published; Raw Files; Other files]
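The distinction between complete and partial submissions in slide 16 can be sketched as a small classification rule. This is a minimal illustration, not the logic of the PX submission tool: the labels QUANT to SP_LIBRARY come from the slide, while `RAW`, `RESULT`, `SEARCH`, `SubmissionFile` and `classify_submission` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from enum import Enum


class FileType(Enum):
    # QUANT, PEAK, GEL, OTHER, FASTA and SP_LIBRARY are the labels on the slide.
    # RAW, RESULT and SEARCH are illustrative names chosen for this sketch.
    RAW = "RAW"                # mass spectrometer output
    RESULT = "RESULT"          # results in PRIDE XML or mzIdentML
    SEARCH = "SEARCH"          # search engine output kept in its original form
    QUANT = "QUANT"
    PEAK = "PEAK"
    GEL = "GEL"
    OTHER = "OTHER"
    FASTA = "FASTA"
    SP_LIBRARY = "SP_LIBRARY"


@dataclass(frozen=True)
class SubmissionFile:
    name: str
    file_type: FileType


def classify_submission(files: list) -> str:
    """Return "complete" if standard-format results are present, else "partial"."""
    types = {f.file_type for f in files}
    if FileType.RESULT in types:
        return "complete"
    if FileType.SEARCH in types:
        return "partial"
    raise ValueError("no result or search engine output files found")


if __name__ == "__main__":
    files = [
        SubmissionFile("run01.raw", FileType.RAW),
        SubmissionFile("run01.mzid", FileType.RESULT),
    ]
    print(classify_submission(files))
```

The rule deliberately looks only at the result files, since that is the only difference between the two submission types that the slide describes.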
  • 17. Complete vs Partial submissions: processed results
      • For complete submissions, the spectra can be connected with the processed identification results, and both can be visualized.
      • Processed results are available in an open data standard (mzIdentML) or in the older PRIDE XML format.
      [Figure panels: Complete; Partial]
  • 18. Complete vs Partial submissions: experimental metadata
      • General experimental metadata about the projects is similar. However, at the assay level, the information in partial submissions is less detailed.
      [Figure panels: Complete; Partial]
  • 19. Public metaproteomics data in PRIDE Archive so far…
      • 28 public datasets: 10 Complete; 15 Partial (3 before PX)
  • 20. PRIDE Tools: Submission Process (1)
      [Figure labels: PRIDE Converter 2; PRIDE Inspector; PX Submission Tool; mzIdentML; PRIDE XML]
  • 21. 'RESULT' file generation. Now: native file export
      [Figure labels: Tools (Mascot, ProteinPilot, Scaffold, PEAKS, MSGF+, others); Native file export; Spectra files; mzIdentML 'RESULT'; Final 'RESULT' file]
  • 22. Complete submissions: Search Engine Results + MS files
      • An increasing number of tools support export to mzIdentML 1.1:
          - Mascot
          - MSGF+
          - MyriMatch and related tools from D. Tabb's lab
          - OpenMS
          - PEAKS
          - PeptideShaker
          - ProCon (ProteomeDiscoverer, Sequest)
          - Scaffold
          - TPP via the idConvert tool (ProteoWizard)
          - ProteinPilot (from version 5.0)
          - X!Tandem native conversion (Beta, PILEDRIVER)
          - Crux
          - Others: library for X!Tandem conversion, lab internal pipelines, …
      • Referenced spectral files need to be submitted as well (all open formats are supported).
      • Updated list: http://www.psidev.info/tools-implementing-mzIdentML#.
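The requirement that referenced spectral files be submitted alongside mzIdentML results can be checked before submission. The sketch below assumes the mzIdentML 1.1 namespace and its `SpectraData` element with a `location` attribute; the helper names and the trimmed example document (not a schema-valid mzIdentML file) are hypothetical and are not part of any PRIDE tool.

```python
import xml.etree.ElementTree as ET

# Namespace of mzIdentML 1.1 documents.
MZID_NS = "{http://psidev.info/psi/pi/mzIdentML/1.1}"

# Trimmed, illustrative fragment: only the elements needed for this check.
EXAMPLE_MZID = """<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1" id="example" version="1.1.0">
  <DataCollection>
    <Inputs>
      <SpectraData id="SD_1" location="file:///data/run01.mgf"/>
      <SpectraData id="SD_2" location="C:\\data\\run02.mzML"/>
    </Inputs>
  </DataCollection>
</MzIdentML>"""


def referenced_spectra_files(mzid_text: str) -> list:
    """Return the file names (without directories) of all SpectraData locations."""
    root = ET.fromstring(mzid_text)
    names = []
    for element in root.iter(f"{MZID_NS}SpectraData"):
        location = element.get("location")
        if location:
            # Accept both URL-style and Windows-style separators.
            names.append(location.replace("\\", "/").rsplit("/", 1)[-1])
    return names


def missing_spectra_files(mzid_text: str, submitted: list) -> list:
    """Return referenced spectra files that are not among the submitted names."""
    submitted_names = set(submitted)
    return [n for n in referenced_spectra_files(mzid_text) if n not in submitted_names]


if __name__ == "__main__":
    print(missing_spectra_files(EXAMPLE_MZID, ["run01.mgf"]))
```

Comparing by file name only is a simplification: two spectra files with the same name in different directories would not be distinguished.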
  • 23. 'RESULT' file generation. Before: only file conversion to PRIDE XML
      [Figure labels: Original data files (search output files, spectra files); File conversion (PRIDE Converter); PRIDE XML 'RESULT'; Final 'RESULT' file]
  • 24. PX Data workflow for MS/MS data: Search Engine Results + MS files to PRIDE XML
      • PRIDE Converter 2 (https://github.com/PRIDE-Toolsuite/pride-converter-2); Coté & Griss et al., MCP, 2012
          - 'Bulk' conversion possible: command line mode
          - Virtually no limit on file sizes
      • Other tools available:
          - PRIDE Converter
          - PLGS (Waters)
          - Proteios
          - EasyProt
          - hEIDI
          - OmicsHub (Integromics)
          - PeptideShaker (Compomics)
  • 25. PRIDE Tools: Submission Process (2)
      [Figure labels: PRIDE Converter 2; PRIDE Inspector; PX Submission Tool; mzIdentML; PRIDE XML]
  • 26. PRIDE Inspector Toolsuite: Visualisation tool (https://github.com/PRIDE-Toolsuite/)
      • PRIDE Inspector Toolsuite supports:
          - PRIDE XML
          - mzIdentML + all types of spectra files
          - mzML
          - mzTab identification and quantification + all types of spectra files
      Wang et al., Nat. Biotechnology, 2012; Perez-Riverol et al., MCP, 2016, in press
  • 27. Example visualisation: PXD000164 (Lassek et al., MCP, 2015)
  • 28. PRIDE Inspector Toolsuite: Visualisation tool
  • 29. PRIDE Inspector Toolsuite: Visualisation tool
  • 30. PRIDE Tools: Submission Process (3)
      [Figure labels: PRIDE Converter 2; PRIDE Inspector; PX Submission Tool; mzIdentML; PRIDE XML]
  • 31. PX submission tool (http://www.proteomexchange.org/submission)
      • It selects and captures the mappings between the different types of files included in the submission.
      • It transfers all the files using Aspera (default) or FTP.
      • Version 2.3.0 released in August 2015 (several refinements and improvements).
      • Alternative command line method also available for groups with bioinformatics support.
      [Figure labels: Results; Raw; Other files]
  • 32. PX submission tool: screenshots
  • 33. Fast file transfer with Aspera
      • Aspera is the default file transfer protocol to PRIDE:
          - PX Submission tool
          - Command line
      • Up to 50X faster than FTP
      • File transfer speed should not be a problem!
  • 34. Manuscript published detailing the process: Ternent et al., Proteomics, 2014 (http://www.proteomexchange.org/submission-proteomexchange-pride)
      • Example dataset: PXD000764
          - Title: "Discovery of new CSF biomarkers for meningitis in children"
          - 12 runs: 4 controls and 8 infected samples
          - Identification and quantification data
  • 35. Overview: What is PRIDE?; The ProteomeXchange Consortium; Submission process and PRIDE tools; Miscellaneous
  • 36. Partial submissions can be used to store other data workflows
      • Everything can be stored, not only MS/MS data (~90% of datasets): a very flexible mechanism that can capture all types of datasets.
      • PRIDE Archive does not store SRM data (it goes to PASSEL, a PX partner).
      • Top-down proteomics datasets (19 public datasets).
      • Mass Spectrometry Imaging datasets (1 public dataset).
      • Data-independent acquisition techniques: e.g. SWATH-MS (22 public datasets), MSE (5 public datasets), HDMSE (2 public datasets).
  • 37. Linking of datasets from other omics fields
      • Sample IDs can be included during the submission process (e.g. in the PX submission tool), but they are not linked at present.
      • Ongoing general EBI approach to improve the current situation, working closely with the EBI BioSamples database.
      • Better integration of IDs from the NCBI BioSamples DB should also be possible.
      • So far, we have not had an "example" project and dedicated funding to do this.
  • 38. Sneak peek at other PRIDE resources
      • They provide an aggregated and QC-filtered peptide-centric and protein-centric view of PRIDE Archive data.
      • http://www.ebi.ac.uk/pride/cluster/
      • http://wwwdev.ebi.ac.uk/pride/proteomes/
      Griss et al., Nat Methods, 2013
  • 39. Do you want to know a bit more…? http://www.slideshare.net/JuanAntonioVizcaino
  • 40. Acknowledgements: People
      • The PRIDE Team: Attila Csordas, Tobias Ternent, Noemi del Toro, Gerhard Mayer (Bochum, de.NBI), Johannes Griss, Yasset Perez-Riverol, Henning Hermjakob
      • Former team members: Rui Wang, Florian Reisinger and Jose A. Dianes
  • 41. Discussion
  • 42. Questions?