Ensemble Learning Model for MS/MS

Ensemble Learning Model to Estimate the Accuracy of
Peptide Identifications Made by MS/MS

Course project for B529

Published in: Education, Technology

Ensemble Learning Model for MS/MS

  1. Ensemble Learning Model to Estimate the Accuracy of Peptide Identifications Made by MS/MS. Qiang Kou, Shan Xiao, Xiaohui Yao, Zongliang Yue. Contact: Qiang Kou (qkou@umail.iu.edu). April 29, 2014.
  2. Background
     • Mass spectrometry has become the most widely used tool for characterizing proteins.
     • Many database search programs and algorithms have been developed, including SEQUEST, MASCOT, X!Tandem, InsPecT, and MS-Align+.
     • The scores of correct and incorrect identifications always overlap significantly.
  3. Background. [Figure: histogram of the number of spectra in each bin against the discriminant score (D), for "Correct" and "Incorrect" identifications. From Brian C. Searle.]
  4. Background: Available Software
     • PeptideProphet [1]: $F(x_1, x_2, \ldots, x_n) = c_0 + \sum_{i=1}^{n} c_i x_i$, with $p(+ \mid F) = \frac{p(F \mid +)\,p(+)}{p(F \mid +)\,p(+) + p(F \mid -)\,p(-)}$
     • Percolator [2, 3]: $F(x) = \sum_i w_i h_i(x) + b$, where $h_k(x) = \tanh((w_k)^T x + b_k)$; sigmoid loss function: $L(F(x), y) = \frac{1}{1 + \exp(y F(x))}$
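
The two PeptideProphet formulas on this slide can be traced with a few lines of Python. This is a minimal sketch, not PeptideProphet's implementation: the function names, coefficients, likelihoods, and prior are all hypothetical, and the likelihoods p(F|+) and p(F|-) are passed in directly rather than estimated from score distributions.

```python
def discriminant_score(features, coefficients, intercept):
    """F(x_1, ..., x_n) = c_0 + sum_i c_i * x_i."""
    return intercept + sum(c * x for c, x in zip(coefficients, features))


def posterior_correct(lik_pos, lik_neg, prior_pos):
    """p(+|F) by Bayes' rule, given p(F|+), p(F|-) and p(+), with p(-) = 1 - p(+)."""
    numerator = lik_pos * prior_pos
    denominator = numerator + lik_neg * (1.0 - prior_pos)
    return numerator / denominator


# Hypothetical numbers, for illustration only.
F = discriminant_score(features=[2.0, 0.5], coefficients=[1.0, -2.0], intercept=0.5)
p = posterior_correct(lik_pos=0.3, lik_neg=0.1, prior_pos=0.25)
print(F, p)
```

With these made-up inputs the correct class is three times as likely under the score but three times rarer a priori, so the posterior lands at one half.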
  5. Background: Trans-Proteomic Pipeline. [Diagram with the labels: PeptideProphet, mzXML, X!Tandem, Percolator, ProteinProphet, Proteins, Ensemble Learning.]
  6. Ensemble Learning
  7. Ensemble Learning
     • Homogeneous: learners from the same category (boosting, bagging, random forest)
     • Heterogeneous: learners from different categories
  8. Ensemble Learning: Example of Ensemble Learning
     • Two real variables
     • Two random pseudo variables
     • Three methods: linear model, SVM, and random forest
  9. Ensemble Learning: Results of Three Methods
  10. Ensemble Learning: Average of Three Methods
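
Averaging the outputs of several models, as in the example above, can be sketched as follows. The prediction values are toy numbers invented for illustration (the slides' own data are not in the transcript), and the helper names are hypothetical. For squared error, the averaged prediction is never worse than the mean error of the individual models, although a single strong model can still beat it.

```python
def average_predictions(*model_predictions):
    """Element-wise mean of several equally long prediction lists."""
    n_models = len(model_predictions)
    return [sum(values) / n_models for values in zip(*model_predictions)]


def mean_squared_error(predicted, truth):
    return sum((p - t) ** 2 for p, t in zip(predicted, truth)) / len(truth)


# Toy data (assumed): true labels and the outputs of three hypothetical models.
truth = [1.0, 0.0, 1.0, 0.0]
linear = [0.9, 0.4, 0.5, 0.1]
svm = [0.6, 0.1, 0.9, 0.4]
forest = [0.8, 0.3, 0.7, 0.0]

ensemble = average_predictions(linear, svm, forest)
individual_errors = [mean_squared_error(m, truth) for m in (linear, svm, forest)]
ensemble_error = mean_squared_error(ensemble, truth)
print(individual_errors, ensemble_error)
```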
  11. Ensemble Strategy
  12. Ensemble Strategy: Non-negative Least Squares and Logistic Regression
     • Non-negative least squares regression: $f_e(X) = \sum_{i=1}^{k} \alpha_i f_i(X)$, with $\sum_{i=1}^{k} \alpha_i = 1$ and $\alpha_i \ge 0$
     • Non-negative logistic regression: $f_e(X) = \frac{1}{1 + \exp\left(-\sum_{i=1}^{k} \alpha_i f_i(X)\right)}$, with $\alpha_i \ge 0$
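
The two combiners on this slide can be evaluated directly once the weights are known. The sketch below only applies given weights and checks the constraints; it does not fit them, and the function names and example numbers are hypothetical.

```python
import math


def check_weights(alphas, require_unit_sum):
    """Enforce alpha_i >= 0 and, for the least-squares combiner, sum(alpha_i) = 1."""
    if any(a < 0 for a in alphas):
        raise ValueError("weights must be non-negative")
    if require_unit_sum and abs(sum(alphas) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")


def combine_linear(alphas, base_outputs):
    """f_e(X) = sum_i alpha_i * f_i(X)."""
    check_weights(alphas, require_unit_sum=True)
    return sum(a * f for a, f in zip(alphas, base_outputs))


def combine_logistic(alphas, base_outputs):
    """f_e(X) = 1 / (1 + exp(-sum_i alpha_i * f_i(X)))."""
    check_weights(alphas, require_unit_sum=False)
    z = sum(a * f for a, f in zip(alphas, base_outputs))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical outputs of three base models for one spectrum.
linear_score = combine_linear([0.5, 0.3, 0.2], [0.9, 0.6, 0.2])
logistic_score = combine_logistic([1.0, 2.0], [0.0, 0.0])
print(linear_score, logistic_score)
```

The non-negativity constraint keeps any base model from entering the ensemble with a flipped sign, which is one reason the combined score stays interpretable.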
  13. Ensemble Strategy: Greedy Strategy
     1. Start with the empty ensemble.
     2. Add the model that maximizes the ensemble's classification performance on the training dataset.
     3. Repeat Step 2 for a fixed number of iterations.
     4. Return the final ensemble.
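
The four greedy steps can be sketched as below. Several details are assumptions, since the slide does not specify them: members are combined by averaging their scores, a model may be added more than once, performance is measured as accuracy at a 0.5 threshold, and ties go to the alphabetically first model. The model names and predictions are made up.

```python
def accuracy(scores, labels, threshold=0.5):
    """Fraction of items whose thresholded score matches the 0/1 label."""
    hits = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return hits / len(labels)


def ensemble_scores(members, predictions):
    """Average the predictions of the listed members (repeats count twice)."""
    columns = zip(*(predictions[name] for name in members))
    return [sum(column) / len(members) for column in columns]


def greedy_ensemble(predictions, labels, n_iterations):
    ensemble = []                                  # Step 1: empty ensemble
    for _ in range(n_iterations):                  # Step 3: fixed number of rounds
        best = max(                                # Step 2: best addition on training data
            sorted(predictions),
            key=lambda name: accuracy(
                ensemble_scores(ensemble + [name], predictions), labels
            ),
        )
        ensemble.append(best)
    return ensemble                                # Step 4: final ensemble


# Toy training data (assumed): no single model classifies all four items correctly.
labels = [1, 1, 0, 0]
predictions = {
    "glm": [0.9, 0.4, 0.6, 0.1],
    "svm": [0.6, 0.9, 0.2, 0.7],
    "rf": [0.8, 0.3, 0.1, 0.2],
}
selected = greedy_ensemble(predictions, labels, n_iterations=2)
final_accuracy = accuracy(ensemble_scores(selected, predictions), labels)
print(selected, final_accuracy)
```

Because the selection criterion is computed on the training data itself, this procedure can overfit; scoring candidates on a held-out set is a common variation.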
  14. Application in MS/MS
  15. Application in MS/MS: Available Features
     Symbol         Description
     mass           precursor neutral mass
     time           retention time
     ∆M             mass difference
     #match         number of matched ions
     pepLen         peptide length
     charge         charge state
     exp            E-value
     #missed        number of missed cleavages
     enzN           whether preceded by an enzymatic site
     enzC           whether there is an enzymatic C-terminus
     #consistent    number of peptide termini consistent with cleavage
     #ions          number of fragment ions predicted for the peptide
     #proteins      number of proteins containing the peptide
     Arg, ..., Val  count of each kind of amino acid
     Hyperscore, Nextscore, BScore, YScore    scoring functions in X!Tandem
  16. Application in MS/MS: Weights in Regularized Generalized Linear Model
     Description    Weight
     #missed        -1.923
     Arg            -1.321
     charge          1.246
     Cys            -1.062
     Lys            -0.990
     His             0.790
     Trp             0.726
     #consistent    -0.494
     Pro             0.407
     Asp             0.388
     Met            -0.369
     Val             0.350
     bscore         -0.347
     Tyr             0.238
     #ions           0.210
  17. Application in MS/MS: Model Used
     Algorithm       Description            R Package
     glm             linear model           stats
     randomForest    random forest          randomForest
     knn             k-nearest neighbour    stats
     glmnet          elastic net            glmnet
     svm             SVM                    e1071
     step            stepwise glm           stats
  18. Application in MS/MS: Training and Testing Dataset. Paola Picotti, et al. Nature 494:266-270, 2013 [4]
  19. Application in MS/MS: ROC Curves. [Figure: true positive rate against false positive rate, both from 0.0 to 1.0. Legend: Ensemble Learning 0.873, Percolator 0.821, PeptideProphet 0.789.]
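
If the three numbers in the ROC legend are areas under the curve (an assumption; the slide does not label them), they can be computed without drawing the curve: the AUC equals the probability that a randomly chosen correct identification scores higher than a randomly chosen incorrect one. A minimal sketch with made-up scores and a hypothetical function name:

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy scores (assumed): 1 marks a correct identification, 0 an incorrect one.
auc = roc_auc([0.9, 0.8, 0.7, 0.4, 0.3], [1, 1, 0, 1, 0])
print(auc)
```

The pairwise loop is quadratic in the number of spectra; a rank-based formula gives the same value faster on large datasets.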
  20. Application in MS/MS: Relation between FDR and Ensemble Score with LOESS. [Figure: scatter plot with a LOESS fit; axes are Ensemble Score and FDR, both from 0.00 to 1.00.]
  22. Application in MS/MS: Number of Correct/Incorrect Identifications with 0.05 FDR. [Figure: bar chart of the number of correct and incorrect identifications (axis from 0 to 1000) for three methods: PeptideProphet, Percolator, and Ensemble.]
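
Counting identifications at a fixed FDR can be sketched as follows. This assumes each identification has a known correct/incorrect label (for example from a reference dataset) and that FDR is the fraction of incorrect identifications among those accepted; the slides do not spell out the procedure, so the function name, the handling of ties, and the toy data are all hypothetical.

```python
def accept_at_fdr(scores, is_correct, max_fdr=0.05):
    """Return (n_correct, n_incorrect) for the largest top-scoring set with FDR <= max_fdr."""
    ranked = sorted(zip(scores, is_correct), key=lambda pair: pair[0], reverse=True)
    best = (0, 0)
    n_correct = 0
    n_incorrect = 0
    for _, correct in ranked:
        if correct:
            n_correct += 1
        else:
            n_incorrect += 1
        fdr = n_incorrect / (n_correct + n_incorrect)
        if fdr <= max_fdr:
            best = (n_correct, n_incorrect)   # remember the deepest acceptable cut
    return best


# Toy data (assumed): 30 identifications with strictly decreasing scores.
scores = [1.0 - i / 30 for i in range(30)]
is_correct = [True] * 25 + [False, False] + [True] * 3
accepted = accept_at_fdr(scores, is_correct, max_fdr=0.05)
print(accepted)
```

A method that separates correct from incorrect identifications better lets the cut go deeper before the FDR limit is reached, which is what a per-method bar chart of accepted counts compares.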
  25. Application in MS/MS: Some Conclusions
     • Ensemble learning methods often give better results.
     • It is very easy to overfit the training data.
     • Model training is time-consuming.
  26. Application in MS/MS: References
     [1] Andrew Keller, Alexey I. Nesvizhskii, Eugene Kolker, and Ruedi Aebersold. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Analytical Chemistry, 74(20), 2002.
     [2] Lukas Käll, Jesse D. Canterbury, Jason Weston, William Stafford Noble, and Michael J. MacCoss. Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nature Methods, 4(11), 2007.
     [3] Marina Spivak, Jason Weston, Léon Bottou, Lukas Käll, and William Stafford Noble. Improvements to the Percolator algorithm for peptide identification from shotgun proteomics data sets. Journal of Proteome Research, 8(7):3737–3745, 2009.
     [4] Paola Picotti, Mathieu Clément-Ziza, Henry Lam, David S. Campbell, Alexander Schmidt, Eric W. Deutsch, Hannes Röst, Zhi Sun, Oliver Rinner, Lukas Reiter, Qin Shen, Jacob J. Michaelson, Andreas Frei, Simon Alberti, Ulrike Kusebauch, Bernd Wollscheid, Robert L. Moritz, Andreas Beyer, and Ruedi Aebersold. A complete mass-spectrometric map of the yeast proteome applied to quantitative trait analysis. Nature, 494(7436), 2013.
  27. Thank you! http://qkou.info/sl.pdf
