2. Understanding the Basics
What are peptides and why are they important?
● Proteins are made up of chains of amino
acids
● A peptide is a short chain of two or more amino
acids
● Studies have shown that peptides can help
design novel enzymes, drugs, and vaccines
● Mutant proteins can cause genetic diseases
Accurately identifying proteins is one of the
main objectives of proteomic research
What is mass spectrometry (MS)?
● Identifies proteins by mass
● Proteins are digested into peptides and
injected into an instrument called a liquid
chromatography-tandem MS (LC-MS/MS) system
- Detect and quantify molecules
- Produce a spectrum graph that
contains two main features: mass-to-
charge ratio (m/z) and intensity
(sometimes called relative abundance).
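
As a rough illustration of the m/z feature, the sketch below computes the monoisotopic mass of a peptide and its mass-to-charge ratio at a given charge state. The function names are hypothetical, and the sketch assumes an unmodified peptide ionized by added protons; the residue masses are the standard monoisotopic values.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O accounts for the free N- and C-termini
PROTON = 1.007276   # mass of one proton (assumed charge carrier)


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def mz(sequence, charge):
    """Mass-to-charge ratio of the peptide carrying `charge` protons."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


if __name__ == "__main__":
    for z in (1, 2, 3):
        print(z, round(mz("PEPTIDE", z), 4))
```

The same peptide appears at a different m/z for each charge state, which is why the x-axis of a spectrum is m/z rather than mass.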
3. Sequence Databases and Spectral Libraries
● Use a sequence database
to link mass spectra to
peptides
● A spectral library contains
a curated collection of
LC-MS/MS peptide
spectra with their
corresponding protein
sequences
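
One way to picture the sequence database lookup is the sketch below: each protein is digested in silico, and peptides whose mass matches an observed precursor mass become candidates. This is a simplified sketch, not a full search engine. It assumes trypsin-like cleavage (after K or R, not before P), no missed cleavages and no modifications; the toy database, function names, and 10 ppm tolerance are illustrative assumptions. Real pipelines also score the fragment spectrum against each candidate.

```python
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def tryptic_peptides(protein):
    """Cleave after K or R unless the next residue is P (assumed rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def candidates(database, observed_mass, tol_ppm=10.0):
    """Return (protein_id, peptide) pairs within tol_ppm of observed_mass."""
    hits = []
    for protein_id, protein in database.items():
        for pep in tryptic_peptides(protein):
            mass = peptide_mass(pep)
            if abs(mass - observed_mass) / mass * 1e6 <= tol_ppm:
                hits.append((protein_id, pep))
    return hits


if __name__ == "__main__":
    toy_db = {"P1": "AAKGGSR", "P2": "VLKAAK"}  # made-up sequences
    print(candidates(toy_db, peptide_mass("AAK")))
```

In the toy database the peptide AAK occurs in both proteins, so a single spectrum can point to more than one protein.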
4. Main Objective
To improve data quality control and reduce
computational effort, deep learning models are
integrated into the database search pipeline to
make searching more efficient
6. Retention Time Prediction
Basic Idea
● The retention time is the total time a peptide
spends in the stationary and mobile phases of the
chromatography column before it elutes
● Predicted retention times help improve and evaluate
the quality of peptide identifications during
database searching
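
A minimal sketch of how a predicted retention time could be used to evaluate identifications: each peptide-spectrum match is flagged as plausible only if its observed retention time lies close to the prediction. The function name, the 2-minute tolerance, and all numbers below are made up for illustration.

```python
def check_retention_times(psms, predicted_rt, tolerance_min=2.0):
    """Compare observed and predicted retention times (in minutes).

    psms: list of (peptide, observed_rt) pairs.
    predicted_rt: dict mapping peptide -> predicted retention time.
    Returns a list of (peptide, delta, plausible) tuples.
    """
    results = []
    for peptide, observed_rt in psms:
        delta = observed_rt - predicted_rt[peptide]
        results.append((peptide, round(delta, 2), abs(delta) <= tolerance_min))
    return results


if __name__ == "__main__":
    # Illustrative values only, not measured data.
    psms = [("PEPTIDEK", 21.4), ("AAGLLR", 35.0)]
    predicted = {"PEPTIDEK": 20.9, "AAGLLR": 12.3}
    for row in check_retention_times(psms, predicted):
        print(row)
```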
Deep Learning Aspect
● Popular DL architectures: CNNs, RNNs, and hybrid
networks
● RNNs can capture long-range interactions within
a sequence, which makes them well suited to
modeling sequential protein data
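
To make the recurrent idea concrete, here is a minimal sketch of the forward pass of a plain (Elman-style) RNN that reads a peptide one residue at a time and emits a single number standing in for a retention time. The weights are random and untrained, so the output is meaningless as a prediction; the layer size, parameter names, and linear read-out are assumptions for illustration, not the architecture of any published tool.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(aa):
    """20-dimensional indicator vector for one amino acid."""
    return [1.0 if aa == a else 0.0 for a in AMINO_ACIDS]


def init_params(hidden_size=8, seed=0):
    """Small random weights (untrained, for illustration only)."""
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_xh": rand(hidden_size, len(AMINO_ACIDS)),  # input -> hidden
        "W_hh": rand(hidden_size, hidden_size),       # hidden -> hidden
        "w_out": [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)],
    }


def predict_rt(sequence, params):
    """Run the RNN over the sequence and read out one scalar."""
    hidden_size = len(params["w_out"])
    h = [0.0] * hidden_size
    for aa in sequence:
        x = one_hot(aa)
        # The new hidden state depends on the current residue and on the
        # previous state, so earlier residues influence later steps.
        h = [
            math.tanh(
                sum(w * xi for w, xi in zip(params["W_xh"][j], x))
                + sum(w * hi for w, hi in zip(params["W_hh"][j], h))
            )
            for j in range(hidden_size)
        ]
    return sum(w * hi for w, hi in zip(params["w_out"], h))


if __name__ == "__main__":
    params = init_params()
    print(predict_rt("PEPTIDE", params))
```

Because the hidden state is carried from residue to residue, the output depends on the order of the amino acids and not only on the composition, which is the property the slide attributes to RNNs.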
9. MS/MS Spectrum Prediction
Basic Idea
● Peptides are fragmented and the fragments detected
by tandem mass spectrometry (MS/MS), which uses two
mass analyzers
● Figure: example mass spectrum for pentane
Deep Learning Aspect
● Generating spectral libraries is computationally
expensive and time-consuming
● Use DL to predict spectra for peptide sequences to
help build and expand the coverage of protein databases
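
For context, the m/z positions of the expected fragments follow directly from the sequence; it is the peak intensities that a deep learning model has to predict. The sketch below computes singly charged b- and y-ion m/z values for an unmodified peptide. Restricting the example to b and y ions, and the function name, are assumptions for illustration.

```python
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_mz(sequence):
    """Singly charged b- and y-ion m/z values for each cleavage site.

    Entry k (0-based) of both lists describes the break after residue
    k + 1: the b ion keeps the N-terminal part, the y ion the rest.
    """
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    b_ions, y_ions = [], []
    for i in range(1, len(sequence)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[i:]) + WATER + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_mz("PEPTIDE")
    for site, (b_mz, y_mz) in enumerate(zip(b, y), start=1):
        print(site, round(b_mz, 4), round(y_mz, 4))
```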
10. Deep Learning Tools
Input - one-hot encoded peptide sequence
Output - intensities of different fragment ion types at each position along the input peptide sequence
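
A minimal sketch of this input and output layout, assuming the 20 standard amino acids, zero-padding to a fixed length, and b and y ions as the fragment types. The maximum length of 30 and the function names are hypothetical choices, not taken from any specific tool.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence, max_len=30):
    """Return a max_len x 20 matrix; rows past the peptide stay all zero."""
    if len(sequence) > max_len:
        raise ValueError("peptide longer than max_len")
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(sequence):
        matrix[pos][INDEX[aa]] = 1
    return matrix


def output_shape(sequence, ion_types=("b", "y")):
    """One intensity per cleavage site and per fragment ion type."""
    return (len(sequence) - 1, len(ion_types))


if __name__ == "__main__":
    encoded = one_hot_encode("PEPTIDE", max_len=10)
    print(len(encoded), len(encoded[0]))   # rows, columns of the input
    print(output_shape("PEPTIDE"))         # layout of the predicted intensities
```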
11. De Novo Peptide Sequencing
Basic Idea
● Directly identify the peptide sequence from the
MS/MS spectrum without using a sequence
database
● Resembles image captioning: the MS/MS
spectrum plays the role of the image and the
peptide sequence the role of the caption
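
The underlying principle can be sketched without any learning: in an idealized ladder of consecutive fragment peaks of one ion type, the mass difference between neighbouring peaks equals the mass of one residue. The example below reads residues from such a ladder. It is a classical, simplified sketch and not the deep learning approach itself; it assumes a complete, noise-free series of singly charged b ions, and the function name and 0.01 Da tolerance are illustrative.

```python
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276


def read_ladder(peak_mzs, tol=0.01):
    """Map each gap between neighbouring peaks to matching residues.

    Residues with the same mass (L and I) cannot be told apart and are
    reported together; gaps that match nothing are reported as "?".
    """
    peaks = sorted(peak_mzs)
    residues = []
    for low, high in zip(peaks, peaks[1:]):
        diff = high - low
        matches = [aa for aa, m in RESIDUE_MASS.items() if abs(m - diff) <= tol]
        residues.append("/".join(matches) if matches else "?")
    return residues


if __name__ == "__main__":
    # Build an idealized b-ion ladder for a known peptide, then read it back.
    ladder, running = [], PROTON
    for aa in "PEPTIDE"[:-1]:
        running += RESIDUE_MASS[aa]
        ladder.append(running)
    print(read_ladder(ladder))
```

Real spectra have missing peaks, noise, and mixed ion types, which is where learned models that "caption" the whole spectrum come in.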
12. Conclusion
● More work and improvement are needed in model
development and application
● Limited datasets constrain retention time and MS/MS
spectrum prediction
● Issues of generalization in transfer learning
● The data science community needs a better understanding
of the mechanisms behind proteomic problems
13. References
Read more about other major applications of deep learning in proteomics in the article:
● Deep Learning in Proteomics
https://analyticalsciencejournals.onlinelibrary.wiley.com/doi/epdf/10.1002/pmic.201900335
For a deeper understanding of the fundamentals:
● Fundamentals of Biological Mass Spectrometry and Proteomics
https://www.broadinstitute.org/files/shared/proteomics/Fundamentals_of_Biological_MS_and_Proteomics_Carr_5_15.pdf
● How Does Mass Spectroscopy Work?
https://bitesizebio.com/6016/how-does-mass-spec-work/