Introduction to Proteogenomics
1. Dr. Yasset Perez-Riverol
Twitter/github: @ypriverol
Proteomics Project Leader
EMBL-EBI
Hinxton, Cambridge, UK
Proteogenomics
Integration of proteomics data in Ensembl.
2. Proteomics Bioinformatics Course
EMBL-EBI, July 2019
Outline
I. Why multi-omics approaches are so important.
II. What is proteogenomics.
III. Proteogenomics by integrating proteomics and
genomics resources.
IV. Ensembl/UCSC Trackhubs.
V. Integration of Proteomics data into Ensembl
trackhubs.
Ritchie, Marylyn D., et al. "Methods of integrating data to uncover genotype-phenotype interactions." Nature
Reviews Genetics 16.2 (2015): 85.
What is proteogenomics
Proteogenomics is a field of biological research that utilizes a combination of proteomics,
genomics, and transcriptomics to aid in the discovery and identification/quantification of
peptides and proteins. Proteogenomics is used to identify new peptides by comparing
MS/MS spectra against a protein database that has been derived from genomic and
transcriptomics information.
• Gene annotations
• Identification of novel peptides
• Prokaryotic organisms: frameshifts, N-terminal methionine excision, signal peptides, and other post-translational modifications.
• Multi-omics analysis: correlation between genomics and proteomics sequence events (genetic mutations, post-translational modifications).
Proteogenomics
• Custom protein sequence database: six-frame / three-frame translation, RNA-seq data, exome data; with FDR analysis, filtering rules, and downstream analysis.
• Mapping to a reference system: high-quality evidences (FDR analysis, filtering rules), downstream analysis (correlations between features), genome coordinates, and trackhubs.
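The six-frame translation mentioned on this slide can be illustrated with a minimal sketch. It assumes the standard genetic code and a plain DNA string; the function names (`translate`, `six_frame_translation`) are hypothetical and are not taken from any tool named in these slides.

```python
from itertools import product

# Standard genetic code, built with the bases in TCAG order (an assumption:
# real pipelines may need alternative codon tables for some organisms).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate one reading frame; unknown codons become 'X', stops become '*'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna):
    """Return the three forward and three reverse-complement translations."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


frames = six_frame_translation("ATGGCCAAGTAA")
for name, protein in frames.items():
    print(name, protein)
```

In a real search database, each frame would usually be split at stop codons and short fragments discarded before being written to FASTA; that step is left out here.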
90% of peptides
Most of these peptides are localized within
an exon, and the remaining peptides,
typically less than twenty percent, span an
exon-exon junction.
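The distinction between within-exon and junction-spanning peptides can be sketched as a simple interval check. This is only an illustration: it assumes half-open genomic intervals on a single chromosome and strand, and the function name and labels are hypothetical.

```python
def classify_peptide(pep_start, pep_end, exons):
    """Classify a peptide's genomic span against a list of exon intervals.

    pep_start, pep_end: half-open genomic span covered by the peptide
    (for a junction peptide this span includes the intron).
    exons: list of (start, end) half-open intervals of one transcript.
    """
    overlapping = [(s, e) for s, e in exons if pep_start < e and pep_end > s]
    if not overlapping:
        return "no exon overlap"
    if len(overlapping) == 1:
        start, end = overlapping[0]
        if start <= pep_start and pep_end <= end:
            return "within exon"
        return "partial exon overlap"
    return "exon-exon junction"


exons = [(100, 200), (300, 400)]
print(classify_peptide(120, 150, exons))  # within exon
print(classify_peptide(190, 310, exons))  # exon-exon junction
```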
PGATK: ProteoGenomics Analysis Toolkit
https://pgatk.readthedocs.io/en/latest/pypgatk.html
Extend database for cancer-oriented studies
All MS experiments with two filters: Taxonomy, Tissues
Experimental Design
• COSMIC cancer mutations
• cBioPortal cancer mutations
Database Construction
• VCF variants from Ensembl, including germline variations
• VCF variants from Ensembl, including somatic mutations (filter by tissues)
• Cancer mutation protein databases based on COSMIC and cBioPortal: somatic mutations (filter by tissues)
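The idea behind a cancer mutation protein database can be illustrated by applying a single missense mutation to a reference protein sequence and writing it as a FASTA entry. This is a minimal sketch, not the PGATK implementation: the `p.V3E`-style notation, the function names, and the FASTA header layout are all assumptions made for the example.

```python
import re


def apply_missense(sequence, mutation):
    """Apply a one-letter missense mutation such as 'p.V3E' to a protein."""
    match = re.fullmatch(r"p\.([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unsupported mutation notation: {mutation}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    # Positions are 1-based; the reference residue must match the sequence.
    if pos < 1 or pos > len(sequence) or sequence[pos - 1] != ref:
        raise ValueError(f"reference residue mismatch for {mutation}")
    return sequence[:pos - 1] + alt + sequence[pos:]


def fasta_entry(accession, mutation, sequence, width=60):
    """Format a mutated protein as FASTA (hypothetical header layout)."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return f">{accession}:{mutation}\n" + "\n".join(lines) + "\n"


mutated = apply_missense("MKVLA", "p.V3E")
print(fasta_entry("EXAMPLE_PROTEIN", "p.V3E", mutated), end="")
```

Checking the reference residue before substituting guards against applying a mutation to the wrong isoform or to an outdated sequence version.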
Integration of genomics / proteomics
resources
The number of multi-omics datasets is growing, but there are still not
enough of them. An alternative option is to correlate and complement
your proteomics data with existing public genomics data or with other
proteomics datasets in the PRIDE database.
• Multi-omics analysis: correlation between genomics and proteomics sequence events (genetic mutations, post-translational modifications).
• Mapping to a reference system: high-quality evidences (FDR analysis, filtering rules), downstream analysis (correlations between features), genome coordinates, and trackhubs.
• Features: gene features, transcript features, protein features, protein family / structures, ...
PRIDE datasets to Ensembl coordinates
Workflow: (1) PX Submission Tool, (2) PRIDE submission pipelines, PRIDE Archive, (3) PRIDE Archive Web and API, (4) TrackHub Registry.
1. PX Submission Tool. A PX submission can be Partial or Complete:
• Partial submission: RAW data, search results and peak lists.
• Complete submission: RAW data, result files and peak lists, search results.
2. PRIDE submission pipelines, applied to the assays of each PX complete submission:
• Convert mzid and peak lists to mztab/mgf, and filter out evidences that do not pass the reported mzid threshold.
• Store the project metadata, peptide sequences, and protein identifiers in Solr and MongoDB.
• TrackHub generation: assay peptides are written to a Pogo file, turned into tracks grouped by taxonomy, and registered in the TrackHub Registry.
3. PRIDE Archive Web and API. Each PX submission can be searched by:
• Title, metadata, description, tissue, taxonomy, PTMs.
• Peptide sequence or protein identifier.
4. TrackHub Registry. Tracks can be searched by:
• ShortLabel, LongLabel.
• OmicsType: Proteomics, Genomics, Transcriptomics.
Generating reliable peptide tables
Current filter options:
• 1% FDR at PSM level (Combine Results)
• 1% FDR at peptide level (Combine Results)
Possible filters (HPP):
• > 8 AA
• 1% FDR at transcript level (inference needed)
Combine PSM Score:
- Same spectra, peptide
- Different search engine
Combine Peptide Score:
- Same peptide
- Different PSMs
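A 1% FDR filter at PSM level can be sketched as follows. The slides do not say how FDR is estimated, so this example assumes a target-decoy strategy with q-values, where a higher score is better; the function name and the tuple layout are hypothetical.

```python
def filter_by_fdr(psms, threshold=0.01):
    """Keep target PSMs whose q-value is at or below the threshold.

    psms: list of (peptide, score, is_decoy) tuples; higher score is better.
    FDR at each rank is estimated as decoys / targets (an assumption).
    """
    ranked = sorted(psms, key=lambda psm: psm[1], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # q-value: the smallest FDR at this rank or at any lower-scoring rank.
    q_values = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q_values.append(running_min)
    q_values.reverse()
    return [psm for psm, q in zip(ranked, q_values) if not psm[2] and q <= threshold]


psms = [("A", 10, False), ("B", 9, False), ("D", 8, True), ("C", 7, False)]
print([psm[0] for psm in filter_by_fdr(psms)])
```

The same function could be reused at peptide level by first keeping only the best-scoring PSM per peptide sequence.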
Experiment    Peptide                                    PSMs  Quant
Assay 34642   APPLLEGAPFR                                1     1.000000
Assay 34644   APPLLEGAPFR                                1     1.000000
Assay 34642   THTQDAVPLTLGQEFSGYVQQVQYAM(oxidation)VR    1     1.000000
Assay 34645   KKQVM(oxidation)EK                         1     1.000000
Assay 34645   VGSGDTNNFPYLEK                             2     2.000000
Assay 34645   SLTYLSILR                                  3     3.000000
Assay 34642   LPFTPLSYIQGLSHR                            8     8.000000
Each assay links to http://www.ebi.ac.uk/pride/archive/assays/ followed by the assay number (for example http://www.ebi.ac.uk/pride/archive/assays/34642).
[Figure: Protein Inference Toolkit. Items A to J and proteins P1 to P5 are combined into protein groups (PR1).]
Audain, Enrique, et al. "In-depth analysis of protein inference algorithms using multiple search engines and
well-defined metrics." Journal of Proteomics 150 (2017): 170-182.
Mapping peptides to ENSEMBL
https://pgatk.readthedocs.io/en/latest/pepgenome.html
For each .pogo file:
• PTMs are standardized to a common representation using the PRIDE-Mod library.
• Each peptide references an assay URL in PRIDE.
• Each Pogo file is generated automatically by the PRIDE pipeline.
Example BED output (columns labelled on the slide: chromosome, start, end, feature, score, strand, itemRgb):
chr1 1314335 1314365 VLIPVFALGR 1000 - 1314335 1314335 0,0,0 1 30 0
chr1 1454464 1454488 ITVLEALR 1000 + 1454464 1454464 128,128,128 1 24 0
chr1 1456317 1456344 LFDWANTSR 1000 + 1456317 1456317 128,128,128 1 27 0
chr1 1459184 1459211 ATLNAFLYR 1000 + 1459184 1459184 128,128,128 1 27 0
chr1 1462609 1462633 LAQFDYGR 1000 + 1462609 1462609 128,128,128 1 24 0
chr1 1485135 1485159 ITVLEALR 1000 + 1485135 1485135 128,128,128 1 24 0
Challenges for the future:
• BED information can be extended with more information about transcript reliability.
• Peptide uniqueness.
• Reliability score.
• Native bigBed output should be provided to avoid customizing new pipelines, etc.
• What to do with the unmapped peptides (which are long lists).
• Maintainability.
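The BED lines shown on this slide can be read with a few lines of Python. This sketch assumes the standard 12-column BED layout (chrom, start, end, name, score, strand, thickStart, thickEnd, itemRgb, blockCount, blockSizes, blockStarts) and a peptide name without modification annotations; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PeptideFeature:
    chrom: str
    start: int
    end: int
    peptide: str
    score: int
    strand: str
    item_rgb: str
    block_sizes: list


def parse_bed_line(line):
    """Parse one whitespace-separated BED12 line into a PeptideFeature."""
    f = line.split()
    return PeptideFeature(
        chrom=f[0],
        start=int(f[1]),
        end=int(f[2]),
        peptide=f[3],
        score=int(f[4]),
        strand=f[5],
        item_rgb=f[8],
        block_sizes=[int(x) for x in f[10].split(",") if x],
    )


def covers_whole_peptide(feature):
    """Check that the mapped blocks span three nucleotides per residue."""
    return sum(feature.block_sizes) == 3 * len(feature.peptide)


feature = parse_bed_line(
    "chr1 1314335 1314365 VLIPVFALGR 1000 - 1314335 1314335 0,0,0 1 30 0"
)
print(feature.peptide, feature.strand, covers_whole_peptide(feature))
```

The length check is a cheap sanity test: VLIPVFALGR has 10 residues and its single block is 30 nucleotides long.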
Introduction to Genome Browsers
• Browse genes in their genomic context.
• See features in and around a specific gene.
• Investigate genome organization and explore larger chromosome regions.
Ensembl Genome Browser: http://www.ensembl.org/
UCSC Genome Browser: http://genome.ucsc.edu/
What are Track hubs?
• Internet-accessible collections of genome annotations.
• Transfer on demand: hub annotations are stored at the
remote site.
• Client-side caching.
Trackhub structure
• hub.txt: defines the labels used to describe the hub
• genomes.txt: describes the assemblies supported by the
hub
• trackDb.txt: describes the data files and defines their
display attributes
  • complex format, a collection of stanzas
  • defines the display and configuration properties
Track database definition document:
http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html
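The three files described above can be sketched with a small script that writes a minimal hub to disk. The hub name, labels, email address, assembly (hg38), and bigBed file name are placeholders invented for this example; only the setting names follow the UCSC track hub conventions.

```python
from pathlib import Path

# All values below are illustrative placeholders, not a real hub.
HUB_TXT = """hub example_proteomics_hub
shortLabel Example peptides
longLabel Example peptide evidences mapped to genome coordinates
genomesFile genomes.txt
email someone@example.org
"""

GENOMES_TXT = """genome hg38
trackDb hg38/trackDb.txt
"""

TRACKDB_TXT = """track example_peptides
bigDataUrl example_peptides.bb
shortLabel Peptides
longLabel Peptide evidences from one example assay
type bigBed 12
"""


def write_hub(root):
    """Write hub.txt, genomes.txt and hg38/trackDb.txt under root."""
    root = Path(root)
    (root / "hg38").mkdir(parents=True, exist_ok=True)
    (root / "hub.txt").write_text(HUB_TXT)
    (root / "genomes.txt").write_text(GENOMES_TXT)
    (root / "hg38" / "trackDb.txt").write_text(TRACKDB_TXT)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.txt"))


if __name__ == "__main__":
    print(write_hub("example_hub"))
```

Each stanza in trackDb.txt describes one track; a hub with several assays would repeat the stanza with a different `track` name and `bigDataUrl`.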
PRIDE team.
Johannes Griss
(pride cluster pipelines)
Boston Children Hospital
Christoph Schlaffner
(PepGenome tool)
Juan A. Vizcaino
(PI)
Alessandro Vullo
(trackhub registry)
ENSEMBL Team
Chakradhar Reddy Bandla
(PepGenome tool)
Karolinska Institute
Husen Umer
(PyPGATK tool)
Rui Branca
(PyPGATK tool)
Yafeng Zhu
(PyPGATK tool)