Introduction to proteomics
Why proteomics?
● Proteins carry out most cellular functions
● Valuable publicly available resources
● Post-translational modifications may influence epigenetics
● Many recent applications of machine learning in:
○ Preprocessing raw files
○ Analysis
○ Interpreting
● Proteogenomics
○ Proteoepigenomics?
What does a peptide look like?
How is proteomics data generated?
Image from: http://www.premierbiosoft.com/tech_notes/mass-spectrometry.html
Using trypsin to limit possibilities
Trypsin cleaves peptide chains mainly at the carboxyl side of the amino acids
lysine or arginine, except when either is followed by proline.
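The cleavage rule above can be sketched as a small in silico digestion. This is a minimal illustration, not the method used by any particular search engine; the function name `trypsin_digest` and the `missed_cleavages` parameter are assumptions introduced here.

```python
def trypsin_digest(protein: str, missed_cleavages: int = 0) -> list:
    """Split a protein sequence after K or R, except when followed by P.

    `missed_cleavages` (hypothetical parameter) also returns peptides that
    span up to that many uncut sites, since real digestion is imperfect.
    """
    if not protein:
        return []

    # Collect cut positions: index just after each K/R not followed by P.
    sites = [0]
    for i, residue in enumerate(protein[:-1]):
        if residue in "KR" and protein[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(protein))

    # Join neighbouring fragments to model missed cleavages.
    peptides = []
    for start in range(len(sites) - 1):
        for extra in range(missed_cleavages + 1):
            end = start + 1 + extra
            if end < len(sites):
                peptides.append(protein[sites[start]:sites[end]])
    return peptides


print(trypsin_digest("AKRPGRLK"))
```

In the toy sequence `AKRPGRLK`, the R at position 3 is followed by proline, so it is not cut and `RPGR` stays intact.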
1st dimension of data: time
Other dimensions
Slide from Andrei Drabovich
Possible precursor fragmentation
Slide from Thomas Kislinger
In summary
Slide from Thomas Kislinger
How are proteomics experiments designed?
A. Labeling cysteine after obtaining the peptide mixture
B. Labeling with heavy amino acids while growing cells or animals
C. Reaction with iTRAQ tags
Slide from Thomas Kislinger
What are the features for each peptide?
1. Time of elution
2. m/z and intensity of the precursor ion (tryptic peptide)
3. m/z and intensity of the fragment ions (mono-oligo peptides) from each precursor ion
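One way to picture the three features is as a single record per precursor ion. The class and field names below are assumptions made for illustration, and the numbers in the usage lines are made up; real data would come from the instrument's raw files.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FragmentPeak:
    mz: float          # mass-to-charge ratio of one fragment ion
    intensity: float   # signal intensity of that fragment ion


@dataclass
class PeptideFeatures:
    retention_time_min: float      # 1. time of elution
    precursor_mz: float            # 2. m/z of the precursor ion
    precursor_intensity: float     # 2. intensity of the precursor ion
    fragments: List[FragmentPeak] = field(default_factory=list)  # 3. fragment ions

    def base_peak(self) -> Optional[FragmentPeak]:
        """Return the most intense fragment peak, or None if there are none."""
        return max(self.fragments, key=lambda p: p.intensity, default=None)


# Illustrative values only (not measured data).
example = PeptideFeatures(
    retention_time_min=42.5,
    precursor_mz=523.28,
    precursor_intensity=1.2e6,
    fragments=[FragmentPeak(175.12, 3.0e4), FragmentPeak(262.15, 8.5e4)],
)
print(example.base_peak())
```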
How to analyze the data?
1. Peptides expected from a FASTA file of protein sequences
2. Publicly available data from known proteins
3. De novo sequencing:
a. What are the possible peptides for a given precursor ion m/z?
b. Which of these peptides are better supported by the m/z of the observed b/y ions?
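The two questions in step 3 can be sketched numerically: compute the precursor m/z of a candidate peptide, then count how many of its theoretical singly charged b/y ions appear in the observed fragment list. This is a simplified sketch; the function names, the 0.02 m/z tolerance, and the simple match-count score are assumptions, and real tools use more elaborate scoring. The residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added to the intact peptide and to y ions
PROTON = 1.007276   # mass of the charge carrier


def precursor_mz(peptide: str, charge: int) -> float:
    """m/z of the intact peptide carrying `charge` protons."""
    neutral = sum(MONO[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def by_ions(peptide: str):
    """Singly charged b and y ion m/z values for every backbone cleavage."""
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(MONO[aa] for aa in peptide[:i]) + PROTON)
        y_ions.append(sum(MONO[aa] for aa in peptide[-i:]) + WATER + PROTON)
    return b_ions, y_ions


def match_count(peptide: str, observed_mz, tol: float = 0.02) -> int:
    """Number of theoretical b/y ions found among the observed m/z values."""
    b_ions, y_ions = by_ions(peptide)
    return sum(
        any(abs(theo - obs) <= tol for obs in observed_mz)
        for theo in b_ions + y_ions
    )


# Two candidates with the same precursor mass, separated by fragment support.
observed = [58.03, 90.05]
print(match_count("GA", observed), match_count("AG", observed))
```

`GA` and `AG` have identical precursor m/z, so only the fragment ions can tell them apart, which is the point of step 3b.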
How to analyze the data?
Zolg, Daniel P., et al. "Building ProteomeTools based on a complete synthetic human proteome." Nature Methods 14.3 (2017): 259.
How to infer post-translational modifications
Chick, Joel M., et al. "A mass-tolerant database search identifies a large proportion of unassigned spectra in shotgun proteomics as modified peptides." Nature Biotechnology 33.7 (2015): 743.
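The idea behind a mass-tolerant search can be illustrated with a toy example: allow a wide precursor mass window, then check whether the difference between observed and theoretical peptide mass matches a known modification. This is a sketch under stated assumptions and not the pipeline from the cited paper; the function name, the short modification table, and the 0.01 Da tolerance are all chosen here for illustration. The mass shifts are standard monoisotopic values.

```python
# A few common modifications and their monoisotopic mass shifts (Da).
KNOWN_SHIFTS = {
    "methylation": 14.01565,
    "oxidation": 15.99491,
    "acetylation": 42.01057,
    "carbamidomethylation": 57.02146,
    "phosphorylation": 79.96633,
}


def annotate_mass_shift(observed_mass: float, theoretical_mass: float,
                        tol: float = 0.01) -> list:
    """Name the modifications whose mass shift explains the observed delta.

    Returns ["unmodified"] when the masses agree within `tol`, and an empty
    list when the delta matches nothing in the (deliberately short) table.
    """
    delta = observed_mass - theoretical_mass
    if abs(delta) <= tol:
        return ["unmodified"]
    return [name for name, shift in KNOWN_SHIFTS.items()
            if abs(delta - shift) <= tol]


# Illustrative masses only: a peptide observed about 79.966 Da heavier than expected.
print(annotate_mass_shift(1079.96633, 1000.0))
```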
Bioconductor packages
Publicly available cancer proteomics resources
● The Clinical Proteomic Tumor Analysis Consortium (CPTAC)
○ Breast cancer
○ Ovarian cancer
○ Colorectal cancer
○ Rectal adenocarcinoma
○ Skin carcinoma
○ Oral squamous cell carcinoma
○ Treated and resistant triple negative breast cancer xenografts

Editor's Notes

  • #6 Trypsin cleaves peptide chains mainly at the carboxyl side of the amino acids lysine or arginine, except when either is followed by proline.
  • #8 SCX: strong cation exchange; C18: hydrophobic column; HPLC: high pressure liquid chromatography; Acetonitrile (ACN): a polar aprotic solvent; Blue: reverse phase gradient; Green: salt pulse and reverse phase
  • #9 SCX: strong cation exchange; C18: hydrophobic column; HPLC: high pressure liquid chromatography; Acetonitrile (ACN): a polar aprotic solvent; Blue: reverse phase gradient; Green: salt pulse and reverse phase
  • #12 ICAT: isotope-coded affinity tags; SILAC: stable isotope labeling by amino acids in cell culture; iTRAQ: isobaric tags for relative and absolute quantitation