PRIDE Resource Team
PRIDE ProteoGenomics
Moving millions of peptide evidences into EBI protein resources
SAB Meeting
EMBL-EBI, November 2018
Moving peptidoforms to ENSEMBL
• Increasing interest in viewing peptide MS/MS evidence in a genomic context, with a special focus on:
  • Post-translational modifications.
  • Single amino acid variants.
• Interest in expression information and its correlation with gene expression.
PRIDE Peptidome
[Figure: clustering workflow. PX complete submissions (1 … n) → PRIDE Archive import → mgf (annotated spectra) → clustering on the Hadoop cluster → clustering files.]
QC metrics shown in the figure:
• PX submissions successfully converted
• New peptides/PTMs
• Number of identified and non-identified spectra
• Number of new clusters
• PRIDE Cluster score distribution
• Number of clusters by modification
Example peptide list:
VLIPVFALGR 0.98
ITVLEALR 0.95
LFDWANTSR 0.89
ATLNAFLYR 1.00
LAQFDYGR 0.75
Johannes Griss (Visitor Postdoc)
Griss J, Perez-Riverol Y, et al. Nature Methods, 2016
http://wwwdev.ebi.ac.uk/pride/peptidome
ENSEMBL Track Hub Registry
Trackhub search
Search results for the keyword “proteome”
Filter by species (organism) / assembly version
Attached hub (Ensembl)
Mapping peptides to ENSEMBL
GitHub Tool: https://github.com/bigbio/pgatk/tree/master/PepGenome
For each .pogo file:
• PTMs are standardised to a common representation using the PRIDE-Mod library.
• Each peptide references an Assay URL in PRIDE.
• Each Pogo file is generated automatically by the PRIDE pipeline.
chr1 1314335 1314365 VLIPVFALGR 1000 - 1314335 1314335 0,0,0 1 30 0
chr1 1454464 1454488 ITVLEALR 1000 + 1454464 1454464 128,128,128 1 24 0
chr1 1456317 1456344 LFDWANTSR 1000 + 1456317 1456317 128,128,128 1 27 0
chr1 1459184 1459211 ATLNAFLYR 1000 + 1459184 1459184 128,128,128 1 27 0
chr1 1462609 1462633 LAQFDYGR 1000 + 1462609 1462609 128,128,128 1 24 0
chr1 1485135 1485159 ITVLEALR 1000 + 1485135 1485135 128,128,128 1 24 0
Columns (BED order): chromosome, start, end, feature (peptide), score, strand, thickStart, thickEnd, itemRgb, blockCount, blockSizes, blockStarts
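To make the column layout of the mapped peptides concrete, here is a minimal Python sketch that reads one of the BED lines shown on this slide. The names `BedPeptide` and `parse_bed_line` are hypothetical and not part of PepGenome; the sketch assumes whitespace-separated columns in standard BED order.

```python
from typing import NamedTuple, Tuple


class BedPeptide(NamedTuple):
    """Subset of BED columns relevant to a mapped peptide (hypothetical type)."""
    chrom: str
    start: int
    end: int
    peptide: str
    score: int
    strand: str
    item_rgb: Tuple[int, int, int]


def parse_bed_line(line: str) -> BedPeptide:
    """Parse one whitespace-separated BED line into a BedPeptide."""
    fields = line.split()
    if len(fields) < 9:
        raise ValueError(f"expected at least 9 BED columns, got {len(fields)}")
    # Column 9 (itemRgb) is a comma-separated R,G,B triple.
    r, g, b = (int(v) for v in fields[8].split(","))
    return BedPeptide(
        chrom=fields[0],
        start=int(fields[1]),
        end=int(fields[2]),
        peptide=fields[3],
        score=int(fields[4]),
        strand=fields[5],
        item_rgb=(r, g, b),
    )


# First example line from the slide.
example = "chr1 1314335 1314365 VLIPVFALGR 1000 - 1314335 1314335 0,0,0 1 30 0"
record = parse_bed_line(example)

# For a peptide mapped as a single block, the span is three bases per residue.
span = record.end - record.start
```

For this line the span is 30 bases for a 10-residue peptide. Peptides that cross exon boundaries would be split into several blocks, so the simple span check would not hold for them.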
PRIDE Peptidome Pipeline
[Figure: pipeline diagram. PX complete submissions (1 … n) → PRIDE Archive import → mgf (annotated spectra) → clustering on the Hadoop cluster → clustering files → peptide export → peptide tables (Pogo file) → tracks by taxonomy → TrackHub generation → TrackHub Registry.]
QC metrics shown in the figure:
• PX submissions successfully converted
• New peptides/PTMs
• Number of identified and non-identified spectra
• Number of new clusters
• PRIDE Cluster score distribution
• Number of clusters by modification
Johannes Griss (Visitor Postdoc)
• Automatic update with each new ENSEMBL release.
• Supports more than 19 species from ENSEMBL.
• Update whenever new data are released in PRIDE Peptidome.
• Supports highlighting of PTMs and of transcript and gene uniqueness.
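As an illustration of the TrackHub generation step, the sketch below writes one `trackDb.txt` stanza in the UCSC track hub format. It assumes the peptide BED files are converted to bigBed before registration, which the slides do not state. The function name, track name, labels and URL are hypothetical.

```python
def trackdb_stanza(track: str, big_data_url: str,
                   short_label: str, long_label: str) -> str:
    """Build a single trackDb.txt stanza for a bigBed peptide track."""
    lines = [
        f"track {track}",
        f"bigDataUrl {big_data_url}",
        f"shortLabel {short_label}",
        f"longLabel {long_label}",
        "type bigBed 12",    # 12-column BED, as in the mapped peptide lines
        "itemRgb on",        # honour the per-feature colour column
        "visibility dense",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical values, for illustration only.
stanza = trackdb_stanza(
    track="pride_peptides_hg38",
    big_data_url="https://example.org/hub/hg38/pride_peptides.bb",
    short_label="PRIDE peptides",
    long_label="Peptide evidences from PRIDE mapped to genome coordinates",
)
```

Setting `itemRgb on` is what lets a genome browser display the colour stored in each BED record, which is one way the PTM and uniqueness highlighting could be carried through to the track.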
Annotating peptide evidences from PRIDE Archive projects
• 1% FDR at PSM level (combined results)
• 1% FDR at peptide level (combined results)
Filters (HPP):
• > 8 AA
• 1% FDR at transcript level (inference needed)
[Figure: workflow diagram with numbered steps. Recoverable labels:]
• PRIDE submission pipelines: PX complete submission assays (mzid and peak lists).
• Convert to mztab/mgf and filter out evidences that do not pass the threshold reported in the mzid.
• Storage of the project metadata, peptide sequences and protein identifiers in Solr and MongoDB.
• Assay → peptide Pogo file → tracks by taxonomy → TrackHub generation → TrackHub Registry.
• The TrackHub Registry can search tracks by ShortLabel, LongLabel and OmicsType (Proteomics, Genomics, Transcriptomics).
http://ftp.pride.ebi.ac.uk/pride/data/proteogenomics/latest/archive/
Generating reliable peptide tables
Current filter options:
• 1% FDR at PSM level (combined results)
• 1% FDR at peptide level (combined results)
Possible filters (HPP):
• > 8 AA
• 1% FDR at transcript level (inference needed)
Combined PSM score:
- Same spectrum and peptide
- Different search engines
Combined peptide score:
- Same peptide
- Different PSMs
Experiment Peptide PSMs Quant
<a href="http://www.ebi.ac.uk/pride/archive/assays/34642">Assay 34642</a> APPLLEGAPFR 1 1.000000
<a href="http://www.ebi.ac.uk/pride/archive/assays/34644">Assay 34644</a> APPLLEGAPFR 1 1.000000
<a href="http://www.ebi.ac.uk/pride/archive/assays/34642">Assay 34642</a> THTQDAVPLTLGQEFSGYVQQVQYAM(oxidation)VR 1 1.000000
<a href="http://www.ebi.ac.uk/pride/archive/assays/34645">Assay 34645</a> KKQVM(oxidation)EK 1 1.000000
<a href="http://www.ebi.ac.uk/pride/archive/assays/34645">Assay 34645</a> VGSGDTNNFPYLEK 2 2.000000
<a href="http://www.ebi.ac.uk/pride/archive/assays/34645">Assay 34645</a> SLTYLSILR 3 3.000000
<a href="http://www.ebi.ac.uk/pride/archive/assays/34642">Assay 34642</a> LPFTPLSYIQGLSHR 8 8.000000
Audain, Enrique, et al. Journal of Proteomics, 2017
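The peptide table above can be read programmatically. The following Python sketch parses rows in the layout shown (Experiment, Peptide, PSMs, Quant) and applies the "> 8 AA" filter. It assumes that modifications are written in parentheses and are stripped before counting residues, and that "> 8" means strictly more than eight. All function and type names are hypothetical.

```python
import re
from typing import NamedTuple

# Row layout as shown in the table: HTML assay link, peptide, PSM count, quant value.
ROW = re.compile(
    r'<a href="(?P<url>[^"]+)">Assay (?P<assay>\d+)</a>\s+'
    r'(?P<peptide>\S+)\s+(?P<psms>\d+)\s+(?P<quant>[\d.]+)'
)
MOD = re.compile(r"\([^)]*\)")  # e.g. "(oxidation)"


class PeptideRow(NamedTuple):
    assay: int
    url: str
    peptide: str
    psms: int
    quant: float


def parse_row(line: str) -> PeptideRow:
    """Parse one table row; raise ValueError if the layout does not match."""
    m = ROW.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"unrecognised row: {line!r}")
    return PeptideRow(int(m["assay"]), m["url"], m["peptide"],
                      int(m["psms"]), float(m["quant"]))


def bare_sequence(peptide: str) -> str:
    """Remove parenthesised modification names, leaving only residues."""
    return MOD.sub("", peptide)


def passes_length_filter(peptide: str, min_exclusive: int = 8) -> bool:
    """True if the unmodified sequence has more than `min_exclusive` residues."""
    return len(bare_sequence(peptide)) > min_exclusive


lines = [
    '<a href="http://www.ebi.ac.uk/pride/archive/assays/34645">Assay 34645</a> KKQVM(oxidation)EK 1 1.000000',
    '<a href="http://www.ebi.ac.uk/pride/archive/assays/34645">Assay 34645</a> SLTYLSILR 3 3.000000',
]
rows = [parse_row(line) for line in lines]
kept = [row for row in rows if passes_length_filter(row.peptide)]
```

With these two rows, KKQVM(oxidation)EK has seven residues and is dropped, while SLTYLSILR has nine and is kept.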
Complete Submissions (Human)
PSMs = 4,374,055
MOD-PSMs = 1,225,565
http://ftp.pride.ebi.ac.uk/pride/data/proteogenomics/latest/
Human (hg38) Mouse (mm10)
Track colours:
• Black (all identified peptides)
• Cyan (oxidation)
• Orange (acetyl)
• Red (phospho)
Data mapped:
• 182 PRIDE public datasets.
• 163 from Homo sapiens.
• 15 from Mus musculus.
• 4 from Rattus norvegicus and 2 from Bos taurus.
• 4 million peptidoforms, including PTMs.
http://ftp.pride.ebi.ac.uk/pride/data/proteogenomics/latest/
ENSEMBL TrackHub Visualization
(Also mapping data from other ProteomeXchange partners)
We have mapped more than 1 million peptides from PeptideAtlas to ENSEMBL genome coordinates.
Conclusions
• Increase the number of submissions mapped to ENSEMBL coordinates.
• Explore the possibility of linking from the peptide evidence to the corresponding spectrum viewer in PRIDE.
• Provide more information about disease, tissue and cell type as that information improves in PRIDE.
• Develop pipelines to move intensity-based quantitative data into ENSEMBL.
• Reuse the generated data to improve ENSEMBL annotations.
PRIDE Developer Team
@pride_ebi
@proteomexchange
Manuel Bernal-Llinares (track-hub creator)
Johannes Griss (PRIDE Cluster pipelines)
Christoph Schlaffner (PoGo tool)
Jyoti Choudhary (PI)
Alessandro Vullo (TrackHub Registry)
ENSEMBL Team
Sanger Team

Mapping millions of peptidoforms to Genome Coordinates


Editor's Notes

  • #8 Try to merge KNIME information into slide 7.
  • #11 Try to merge KNIME information into slide 7.