Mapping millions of peptidoforms to Genome Coordinates
1. PRIDE Resource Team
PRIDE ProteoGenomics
Moving millions of Peptide Evidences into EBI Protein
Resources.
2. SAB Meeting
EMBL-EBI, November 2018
Moving peptidoforms to ENSEMBL
• Increasing interest in viewing peptide MS/MS evidence in a genomic context, with a
special focus on:
• Post-translational modifications.
• Single amino acid variants.
• Interest in expression information and its correlation with gene expression.
3. SAB Meeting
PRIDE Peptidome
[Pipeline diagram: PX complete submissions (1 … n); Hadoop cluster; PRIDE Archive import; mgf (annotated spectra); clustering files.]
QC: PX successfully converted; new peptides/PTMs; number of identified and non-identified spectra.
QC: number of new clusters; PRIDE Cluster score distribution; number of clusters by modification.
Johannes Griss (Visitor Postdoc)
Griss J, Perez-Riverol Y, et al. Nature Methods, 2016
http://wwwdev.ebi.ac.uk/pride/peptidome
VLIPVFALGR 0.98
ITVLEALR 0.95
LFDWANTSR 0.89
ATLNAFLYR 1.00
LAQFDYGR 0.75
…
7. SAB Meeting
Mapping peptides to ENSEMBL
GitHub Tool: https://github.com/bigbio/pgatk/tree/master/PepGenome
For each .pogo file:
• PTMs are standardized to a common
representation using the PRIDE-Mod library.
• Each peptide references an assay URL in
PRIDE.
• Each Pogo file is generated automatically by
the PRIDE Pipeline.
Columns: chromosome, start, end, feature, score, strand, thickStart, thickEnd, itemRgb, blockCount, blockSizes, blockStarts
chr1 1314335 1314365 VLIPVFALGR 1000 - 1314335 1314335 0,0,0 1 30 0
chr1 1454464 1454488 ITVLEALR 1000 + 1454464 1454464 128,128,128 1 24 0
chr1 1456317 1456344 LFDWANTSR 1000 + 1456317 1456317 128,128,128 1 27 0
chr1 1459184 1459211 ATLNAFLYR 1000 + 1459184 1459184 128,128,128 1 27 0
chr1 1462609 1462633 LAQFDYGR 1000 + 1462609 1462609 128,128,128 1 24 0
chr1 1485135 1485159 ITVLEALR 1000 + 1485135 1485135 128,128,128 1 24 0
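The rows above have twelve whitespace-separated fields, which matches the standard 12-column BED layout. As a minimal sketch (not the PoGo or PepGenome implementation), the following Python parses such a row and checks that the mapped span is three nucleotides per amino acid. The class and function names are hypothetical, and the check assumes the feature name is a plain peptide sequence without modification annotations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeptideFeature:
    chrom: str
    start: int          # 0-based start, as in BED
    end: int            # exclusive end
    peptide: str        # BED "name" column, assumed to be the bare sequence
    score: int
    strand: str
    item_rgb: str
    block_sizes: tuple  # sizes of the mapped blocks (exons), in nucleotides


def parse_bed12(line: str) -> PeptideFeature:
    """Parse one whitespace-separated BED12 row into a PeptideFeature."""
    f = line.split()
    if len(f) != 12:
        raise ValueError(f"expected 12 BED fields, got {len(f)}")
    sizes = tuple(int(s) for s in f[10].split(",") if s)
    return PeptideFeature(f[0], int(f[1]), int(f[2]), f[3],
                          int(f[4]), f[5], f[8], sizes)


def covers_peptide(feat: PeptideFeature) -> bool:
    """True when the summed block sizes equal three nucleotides per residue."""
    return sum(feat.block_sizes) == 3 * len(feat.peptide)


rows = [
    "chr1 1314335 1314365 VLIPVFALGR 1000 - 1314335 1314335 0,0,0 1 30 0",
    "chr1 1454464 1454488 ITVLEALR 1000 + 1454464 1454464 128,128,128 1 24 0",
]
features = [parse_bed12(r) for r in rows]
```

A check like `covers_peptide` is a cheap sanity test before a file is turned into a track; peptides spanning several exons would carry more than one block size, which the sum handles.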
8. SAB Meeting
PRIDE Peptidome Pipeline
[Pipeline diagram: PX complete submissions (1 … n); Hadoop cluster; PRIDE Archive import; mgf (annotated spectra); clustering files; peptide export to peptide tables (Pogo file); TrackHub generation by taxonomy and track; TrackHub Registry.]
QC: PX successfully converted; new peptides/PTMs; number of identified and non-identified spectra.
QC: number of new clusters; PRIDE Cluster score distribution; number of clusters by modification.
Johannes Griss (Visitor Postdoc)
• Automatic update with each new ENSEMBL release.
• Supports more than 19 species from ENSEMBL.
• Updated with each new data release in PRIDE Peptidome.
• Supports highlighting of PTMs and of transcript and gene uniqueness.
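To make the TrackHub generation step more concrete, here is a small sketch of how one trackDb stanza per track could be rendered. It assumes the BED12 files are converted to bigBed before publication, which the slides do not state. The function name, track name and URL are hypothetical; the setting names (`track`, `bigDataUrl`, `shortLabel`, `longLabel`, `type`, `itemRgb`, `visibility`) are standard track hub settings.

```python
def trackdb_stanza(track: str, big_data_url: str,
                   short_label: str, long_label: str) -> str:
    """Render one trackDb stanza for a BED12 track stored as bigBed."""
    settings = [
        ("track", track),
        ("bigDataUrl", big_data_url),
        ("shortLabel", short_label),
        ("longLabel", long_label),
        ("type", "bigBed 12"),     # 12-column BED, as in the .pogo output
        ("itemRgb", "on"),         # honour the per-feature colour column
        ("visibility", "dense"),
    ]
    return "\n".join(f"{key} {value}" for key, value in settings) + "\n"


# Hypothetical track; the name and URL are placeholders, not real PRIDE paths.
stanza = trackdb_stanza(
    track="pride_human_peptides",
    big_data_url="https://example.org/hub/hg38/peptides.bb",
    short_label="PRIDE peptides",
    long_label="Peptide evidences from PRIDE complete submissions (human)",
)
```

Keeping the stanza a pure function of a few labels makes it straightforward to regenerate every track when a new ENSEMBL or PRIDE Peptidome release arrives.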
9. SAB Meeting
Annotating peptide evidences from PRIDE Archive projects
• 1% FDR at PSM level (combined results)
• 1% FDR at peptide level (combined results)
Filters (HPP):
• Peptides longer than 8 amino acids (> 8 AA)
• 1% FDR at transcript level (inference needed)
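The thresholds above can be sketched as a simple predicate over peptide evidences. This is an illustration under assumptions, not the PRIDE pipeline code: the record type, field names and q-values below are invented for the example, and the transcript-level FDR is left out because it requires an inference step that the slide only mentions.

```python
from typing import NamedTuple


class PeptideEvidence(NamedTuple):
    sequence: str
    psm_qvalue: float       # hypothetical field: q-value at PSM level
    peptide_qvalue: float   # hypothetical field: q-value at peptide level


def passes_filters(ev: PeptideEvidence, max_fdr: float = 0.01,
                   min_length_exclusive: int = 8) -> bool:
    """Keep evidence longer than 8 residues and within 1% FDR at both levels."""
    return (
        len(ev.sequence) > min_length_exclusive
        and ev.psm_qvalue <= max_fdr
        and ev.peptide_qvalue <= max_fdr
    )


# Sequences come from the slides; the q-values are illustrative only.
evidences = [
    PeptideEvidence("VLIPVFALGR", 0.001, 0.002),  # 10 residues, passes
    PeptideEvidence("ITVLEALR", 0.001, 0.002),    # 8 residues, fails "> 8 AA"
    PeptideEvidence("LFDWANTSR", 0.004, 0.05),    # fails peptide-level FDR
]
kept = [ev.sequence for ev in evidences if passes_filters(ev)]
```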
TrackHub Registry can search tracks by:
• shortLabel, longLabel.
• OmicsType: Proteomics, Genomics, Transcriptomics.
[Workflow diagram]
• PRIDE submission pipelines: PX complete submission assays (mzid files and peak lists).
• Convert to mztab/mgf and filter out evidences that do not pass the reported mzid threshold, giving PX complete submission assays as mztab files and mgf lists.
• Storage of the project metadata, peptide sequences, and protein identifiers in Solr and MongoDB.
• Assay peptides are exported to a Pogo file for each PX complete submission.
• TrackHub generation by taxonomy and track, followed by the TrackHub Registry.
http://ftp.pride.ebi.ac.uk/pride/data/proteogenomics/latest/archive/
11. SAB Meeting
Complete Submissions (Human)
PSMs = 4,374,055
MOD-PSMs = 1,225,565
http://ftp.pride.ebi.ac.uk/pride/data/proteogenomics/latest/
12. SAB Meeting
Human (hg38) Mouse (mm10)
• Black (all identified peptides)
• Cyan (oxidation)
• Orange (acetyl)
• Red (phospho)
• 182 PRIDE public datasets.
• 163 from Homo sapiens.
• 15 from Mus musculus.
• 4 from Rattus norvegicus and 2 from Bos taurus.
• 4 million peptidoforms, including PTMs.
http://ftp.pride.ebi.ac.uk/pride/data/proteogenomics/latest/
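The colour legend above maps naturally onto the itemRgb column of a BED track. The sketch below shows one way to encode it. The RGB triplets are assumptions chosen to match the colour names (the slides give only the names), and the rule that an unrecognised or missing modification falls back to black is also an assumption.

```python
from typing import Optional

# Colour names are from the legend; the exact RGB values are assumed.
PTM_COLOURS = {
    "oxidation": "0,255,255",  # cyan
    "acetyl": "255,165,0",     # orange
    "phospho": "255,0,0",      # red
}
DEFAULT_COLOUR = "0,0,0"       # black, used for all identified peptides


def item_rgb(ptm: Optional[str]) -> str:
    """Return the itemRgb string for a modification name (case-insensitive)."""
    if ptm is None:
        return DEFAULT_COLOUR
    return PTM_COLOURS.get(ptm.strip().lower(), DEFAULT_COLOUR)


colours = [item_rgb(name) for name in ("Phospho", "Acetyl", "Oxidation", None)]
```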
15. SAB Meeting
(Also mapping data from other ProteomeXchange partners)
We have mapped more than 1 million peptides from
PeptideAtlas to ENSEMBL genome coordinates.
16. SAB Meeting
Conclusions
Increase the number of submissions mapped to ENSEMBL coordinates.
Explore the possibility of linking from the peptide evidence to the corresponding spectrum visualizer in PRIDE.
Provide more information about disease, tissue, and cell type as this information improves in PRIDE.
Develop pipelines to move intensity-based quantitative data into ENSEMBL.
Reuse the generated data to improve ENSEMBL annotations.
17. SAB Meeting
PRIDE Developer Team
@pride_ebi
@proteomexchange
Manuel Bernal-Llinares (track-hub creator)
Johannes Griss (pride cluster pipelines)
Christoph Schlaffner (pogo tool)
Jyoti Choudhary (PI)
Alessandro Vullo (trackhub registry)
ENSEMBL Team
Sanger Team