A Gentle Introduction to Deep Learning in Proteomics
Medium Short Story - Haley Feng
Understanding the Basics
What are peptides and why are they important?
● Proteins are made up of chains of amino acids
● A peptide is a short chain of two or more amino acids
● Studies have shown that peptides can help in designing novel enzymes, drugs, and vaccines
● Mutant proteins can cause genetic disease
Accurately identifying proteins is one of the main objectives of proteomic research
What is mass spectrometry (MS)?
● Identifies proteins by mass
● Proteins are digested into peptides and injected into an instrument called a liquid chromatography-tandem MS (LC-MS/MS) system
- Detects and quantifies molecules
- Produces a spectrum that contains two main features: mass-to-charge ratio (m/z) and intensity (sometimes called relative abundance)
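To make the m/z axis concrete, here is a minimal sketch of how a peptide's monoisotopic mass and its m/z at a given charge state can be computed from the sequence. The function names are illustrative, and the sketch assumes an unmodified peptide carrying only protons as charge carriers.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide (free N- and C-termini)
PROTON = 1.007276   # mass of the charge carrier


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mass_to_charge(peptide, charge):
    """m/z of the peptide carrying `charge` protons."""
    return (peptide_mass(peptide) + charge * PROTON) / charge


if __name__ == "__main__":
    for z in (1, 2, 3):
        print(z, round(mass_to_charge("PEPTIDE", z), 4))
```

The same peptide appears at a different m/z for each charge state, which is why the instrument reports m/z rather than mass directly.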
Sequence databases and Spectral Libraries
● A sequence database is used to link mass spectra to a peptide
● A spectral library contains a curated collection of LC-MS/MS peptide spectra with their corresponding peptide sequences
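As an illustration of how a spectrum might be compared against a library entry, the sketch below bins two peak lists and scores them with cosine similarity. This is one common scoring choice, not necessarily what any particular search engine uses, and the 1.0 m/z bin width and function names are assumptions.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities of (m/z, intensity) peaks that fall in the same bin."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Cosine of the angle between two binned spectra (0 = no overlap, 1 = same shape)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Made-up peak lists, for illustration only.
    query = [(98.06, 10.0), (227.10, 55.0), (324.16, 30.0)]
    library_entry = [(98.06, 12.0), (227.10, 50.0), (324.16, 33.0)]
    print(round(cosine_similarity(query, library_entry), 3))
```

Because the score depends only on the relative peak pattern, scaling every intensity in one spectrum by a constant leaves it unchanged.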
Main Objective
To improve data quality control and reduce computational effort, deep learning models are
integrated into the database search pipeline to help optimize search efficiency
Encoding Methods
Retention Time Prediction
Basic Idea
● Retention time is the time a peptide takes to pass through the chromatography column, determined by how it partitions between the stationary and mobile phases
● Predicted retention times help evaluate and improve the quality of peptide identifications during database searching
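One simple way predicted retention times can support quality control is to flag identifications whose observed retention time is far from the prediction. The sketch below assumes a fixed tolerance in minutes; the record layout, the tolerance, and the values are hypothetical and chosen only for illustration.

```python
def flag_rt_outliers(psms, tolerance_min=2.0):
    """Return peptides whose observed RT deviates from the predicted RT
    by more than `tolerance_min` minutes."""
    return [
        psm["peptide"]
        for psm in psms
        if abs(psm["observed_rt"] - psm["predicted_rt"]) > tolerance_min
    ]


if __name__ == "__main__":
    # Hypothetical peptide-spectrum matches with made-up retention times.
    psms = [
        {"peptide": "PEPTIDEK", "observed_rt": 21.4, "predicted_rt": 20.9},
        {"peptide": "LGEYGFQNALIVR", "observed_rt": 55.0, "predicted_rt": 41.3},
    ]
    print(flag_rt_outliers(psms))
```

In practice the deviation is more often passed to a rescoring step as one feature among several than used as a hard cutoff.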
Deep Learning Aspect
● Popular DL architectures: CNN, RNN, and hybrid networks
● RNNs can capture long-range interactions within a sequence, which makes them well suited to modeling sequential protein data
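To show how an RNN consumes a peptide one residue at a time, here is a minimal forward pass of a plain (Elman-style) recurrent network in pure Python. The weights are random and untrained, so the output is meaningless as a retention time; the class name, hidden size, and weight ranges are assumptions made only to illustrate the recurrence.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


class TinyRNN:
    """Untrained single-layer RNN that maps a peptide to one number."""

    def __init__(self, hidden=8, seed=0):
        rng = random.Random(seed)
        n = len(AMINO_ACIDS)
        self.hidden = hidden
        # Input-to-hidden, hidden-to-hidden, and output weights.
        self.w_xh = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(hidden)]
        self.w_hh = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]

    def predict(self, peptide):
        h = [0.0] * self.hidden
        for aa in peptide:
            x = AMINO_ACIDS.index(aa)  # a one-hot input selects one weight column
            # The new hidden state depends on the current residue AND the previous state.
            h = [
                math.tanh(
                    self.w_xh[j][x]
                    + sum(self.w_hh[j][k] * h[k] for k in range(self.hidden))
                )
                for j in range(self.hidden)
            ]
        # A linear readout of the final hidden state gives the scalar prediction.
        return sum(w * hj for w, hj in zip(self.w_out, h))


if __name__ == "__main__":
    model = TinyRNN()
    print(model.predict("PEPTIDE"))
```

The hidden state `h` is what lets information from early residues influence the final output; a real model would learn the weights from peptides with measured retention times.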
Deep Learning tools
Prosit
Deep Learning tools
List of DL-based retention time prediction tools in recent years
MS/MS Spectrum Prediction
Basic Idea
● Peptides are fragmented and the fragments are detected by tandem mass spectrometry (MS/MS), which uses two stages of mass analysis
● Example: mass spectrum of pentane (figure not reproduced here)
Deep Learning Aspect
● Generating spectral libraries is computationally expensive and time-consuming
● DL is used to predict spectra from peptide sequences, helping to build libraries and expand coverage of protein databases
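A predicted spectrum pairs intensities with fragment m/z values, and the m/z values themselves can be computed directly from the sequence. The sketch below lists singly charged b- and y-ion m/z values for an unmodified peptide; restricting to b/y ions at charge 1 is a simplifying assumption, and the function name is illustrative.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(peptide):
    """Return (b_ions, y_ions): singly charged m/z lists, b1..b(n-1) and y1..y(n-1)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    prefix = 0.0
    for m in masses[:-1]:            # N-terminal fragments
        prefix += m
        b_ions.append(prefix + PROTON)
    suffix = 0.0
    for m in reversed(masses[1:]):   # C-terminal fragments keep the water
        suffix += m
        y_ions.append(suffix + WATER + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("PEPTIDE")
    print([round(v, 4) for v in b])
    print([round(v, 4) for v in y])
```

What cannot be computed this way is how intense each fragment peak will be, and that is the part a spectrum prediction model has to learn.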
Deep Learning tools
Input - one-hot encoded peptide sequence
Output - intensities of different fragment ion types at each position along the input peptide sequence
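The one-hot input described above can be sketched as follows: each position becomes a vector with a single 1 marking the amino acid, and shorter peptides are zero-padded to a fixed length. The maximum length of 30 and the alphabet order are assumptions for illustration; a given tool may use different choices.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(peptide, max_len=30):
    """Encode a peptide as a max_len x 20 matrix of 0/1 values, zero-padded."""
    if len(peptide) > max_len:
        raise ValueError("peptide longer than max_len")
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for position, aa in enumerate(peptide):
        matrix[position][AA_INDEX[aa]] = 1
    return matrix


if __name__ == "__main__":
    encoded = one_hot_encode("PEPTIDE")
    print(len(encoded), len(encoded[0]))  # rows = positions, columns = amino acids
    print(encoded[0])                     # first residue, P
```

A fixed input shape lets the model process peptides of different lengths in one batch, and the per-position output can then be read as one set of fragment intensities for each position in the sequence.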
De Novo Peptide Sequencing
Basic Idea
● Directly identifies the peptide sequence from the MS/MS spectrum without using a sequence database
● Resembles image captioning: the MS/MS spectrum plays the role of the image and the peptide sequence the role of the caption
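The core intuition behind de novo sequencing can be sketched without any deep learning: the mass difference between consecutive fragment peaks in an ion ladder corresponds to one amino acid residue. The example below assumes an idealized, complete, noise-free ladder of singly charged peaks, which real spectra rarely provide; handling missing and noisy peaks is where learned models come in. The function name and tolerance are illustrative.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "I": 113.08406,
    "L": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(ladder_mz, tolerance=0.01):
    """Turn consecutive m/z differences into residues.

    Isobaric residues (I and L) are reported together; unmatched gaps become '?'.
    """
    residues = []
    for lower, upper in zip(ladder_mz, ladder_mz[1:]):
        delta = upper - lower
        matches = [aa for aa, mass in RESIDUE_MASS.items()
                   if abs(mass - delta) <= tolerance]
        residues.append("/".join(matches) if matches else "?")
    return residues


if __name__ == "__main__":
    # Build an idealized ladder of prefix masses for a known peptide.
    ladder = [1.007276]
    for aa in "PEPTIDE":
        ladder.append(ladder[-1] + RESIDUE_MASS[aa])
    print(read_ladder(ladder))
```

Note that isoleucine and leucine have identical masses, so mass differences alone cannot tell them apart.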
Conclusion
● More work and improvement are needed in model development and application
● Retention time and MS/MS spectrum prediction are constrained by the available datasets
● Transfer learning raises generalization issues
● The data science community should gain a better understanding of the mechanisms behind proteomic problems
References
Read more about other major applications of deep learning in proteomics in this article:
● Deep Learning in Proteomics
https://analyticalsciencejournals.onlinelibrary.wiley.com/doi/epdf/10.1002/pmic.201900335
For a deeper understanding of the fundamentals:
● Fundamentals of Biological Mass Spectrometry and Proteomics
https://www.broadinstitute.org/files/shared/proteomics/Fundamentals_of_Biological_MS_and_Proteomics_Carr_5_15.pdf
● How Does Mass Spectroscopy Work? https://bitesizebio.com/6016/how-does-mass-spec-work/
