This document summarizes a presentation on using machine learning to predict rare cancer-causing mutations in protein kinases. The researchers identified 29 features of mutations and used feature selection and multiple classifiers to predict whether mutations were oncogenic drivers or benign passengers. They applied their model to rare mutations in the epidermal growth factor receptor, identifying two mutations as highly likely to be oncogenic based on their ability to increase EGFR activity and cell proliferation in experimental validation. Kinase group and family were the most important predictors of oncogenic mutations.
Seminar: U et al. 2014 PLoS Comp. Biol. 10(4):e1003545
1. Prediction and prioritization of rare oncogenic mutations in the cancer kinome using novel features and multiple classifiers
Presentation by Rosemary McCloskey
ManChon U (1), Eric Talevich (2), Samiksha Katiyar (3), Khaled Rasheed (1), Natarajan Kannan (3,4)
(1) Department of Computer Science, University of Georgia
(2) Department of Dermatology, University of California San Francisco
(3) Department of Biochemistry and Molecular Biology, University of Georgia
(4) Institute of Bioinformatics, University of Georgia
September 4, 2014
U et al. Oncogenic kinome mutations September 4, 2014 1 / 11
10. Driver vs. passenger mutations
[Figure: mutations labelled "driver" and "passengers".]
How to distinguish driver from passenger mutations?
11. Protein kinases
Enzymes that transfer a phosphate group from ATP to a protein.
Regulate cellular processes.
Kinase mutations implicated in cancer.
518 human protein kinase genes (Manning et al. 2002).
8 groups (plus other and atypical), 133 families.
11 structurally conserved regions among all kinases (subdomains).
17. Training and validation data
[Figure: supervised learning illustration. Labelled examples ("birds", "people") are used to label an unknown example ("what is this?") as "bird".]
Supervised machine learning approach.
Known oncogenic: Catalogue of Somatic Mutations in Cancer (COSMIC) database.
  - Filtered for protein kinases.
  - Two sets, one excluding mutations reported only once.
Known benign: protein kinase SNPs (SNP@Domain).
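The construction of the two labelled training sets described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the gene names, mutation labels, and the `build_training_set` helper are hypothetical and are not taken from COSMIC, SNP@Domain, or the authors' pipeline.

```python
from collections import Counter

# Hypothetical records: one (gene, mutation) pair per report, as they might
# look after filtering a cancer mutation catalogue for protein kinases.
# These are made-up placeholders, not real data.
oncogenic_reports = [
    ("KINASE_A", "A100V"),
    ("KINASE_A", "A100V"),
    ("KINASE_A", "G200D"),
    ("KINASE_B", "R50H"),
    ("KINASE_B", "R50H"),
    ("KINASE_B", "R50H"),
]
benign_snps = [("KINASE_A", "S10T"), ("KINASE_B", "P75L")]


def build_training_set(reports, benign, min_reports=1):
    """Label mutations 1 (oncogenic) or 0 (benign).

    Oncogenic mutations are kept only if they were reported at least
    `min_reports` times.
    """
    counts = Counter(reports)
    positives = sorted(m for m, n in counts.items() if n >= min_reports)
    negatives = sorted(set(benign))
    return [(m, 1) for m in positives] + [(m, 0) for m in negatives]


# Set 1: every reported mutation. Set 2: excludes mutations reported once.
full_set = build_training_set(oncogenic_reports, benign_snps)
recurrent_set = build_training_set(oncogenic_reports, benign_snps, min_reports=2)
```

Keeping both sets makes it possible to check whether singly reported mutations, which are more likely to be noise, change what the classifiers learn.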
39. […]cation.
Conservation of position within all kinases, family, and group.
Amino acid properties: charge, polarity, mass, etc.
Structural and functional: kinase sub-domain, binding site, posttranslational modification.
Which of these is important?
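As one illustration of an amino acid property feature, the sketch below parses a mutation label such as L861R and computes the change in side-chain charge. The function names are hypothetical, and treating histidine as neutral is a simplifying assumption; the features actually used in the paper may be encoded differently.

```python
import re

# Side-chain charge at physiological pH. Histidine is treated as neutral
# here, which is a simplification made for this sketch.
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parse_mutation(label):
    """Split a label such as 'L861R' into (wild type, position, mutant)."""
    pattern = rf"([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])"
    match = re.fullmatch(pattern, label)
    if match is None:
        raise ValueError(f"unrecognised mutation label: {label!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def charge_change(label):
    """Charge of the mutant residue minus charge of the wild-type residue."""
    wild_type, _, mutant = parse_mutation(label)
    return CHARGE.get(mutant, 0) - CHARGE.get(wild_type, 0)


for label in ["L861R", "G724S", "E746K"]:
    print(label, parse_mutation(label), charge_change(label))
```

Other properties listed on the slide (polarity, mass) could be added the same way, with one lookup table per property.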
44. Feature selection
5 feature selection algorithms.
10-fold cross-validation for each algorithm.
  - Ran 10 times, each time excluding 10% of data.
  - Averaged results over all 10 folds.
The 17 features chosen by 3 of the 5 selectors were retained.
Kinase family and group were most important by a wide margin.
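The voting step above can be sketched as a simple tally over selectors. This assumes "3/5" means "at least 3 of the 5", which the slide does not state explicitly; the feature names and the `majority_vote` helper are illustrative only.

```python
from collections import Counter


def majority_vote(selections, min_votes):
    """Return features picked by at least `min_votes` of the selectors."""
    votes = Counter(f for chosen in selections for f in set(chosen))
    return sorted(f for f, n in votes.items() if n >= min_votes)


# Hypothetical output of five feature selectors (names are illustrative).
selections = [
    {"kinase_group", "kinase_family", "charge"},
    {"kinase_group", "kinase_family", "subdomain"},
    {"kinase_group", "conservation", "charge"},
    {"kinase_family", "kinase_group", "mass"},
    {"kinase_group", "kinase_family", "charge"},
]

retained = majority_vote(selections, min_votes=3)
print(retained)
```

Requiring agreement between several selectors reduces the chance that a feature is kept only because of one algorithm's bias.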
47. Training and cross-validation
11 machine learning methods.
10-fold cross-validation for each (train on 90%, test on 10%).
Quantify accuracy of each method by
$$F\text{-measure} = \frac{2 \cdot \mathrm{recall} \cdot \mathrm{precision}}{\mathrm{recall} + \mathrm{precision}},$$
where precision = identi[…]
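A minimal sketch of the evaluation scheme: splitting the data into 10 folds and scoring predictions with the F-measure. It uses the standard definitions precision = TP / (TP + FP) and recall = TP / (TP + FN), with 1 marking an oncogenic mutation. The helper names are hypothetical, and the unshuffled, interleaved fold assignment is a simplification rather than the authors' procedure.

```python
def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * recall * precision / (recall + precision)


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    for fold in range(k):
        test = indices[fold::k]  # every k-th sample, starting at `fold`
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


# With 20 samples and k=10, each fold trains on 18 and tests on 2.
for train, test in k_fold_indices(20):
    print(len(train), test)
```

In practice the per-fold F-measures would be averaged to give one score per method, so that the methods can be compared on the same footing.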
58. […] classifiers to analyse rare mutations in epidermal growth factor receptor (EGFR).
L861R, G724S most likely oncogenic.
T725M, L858Q middle-ranked but unknown functional impact.
E746K low ranked.
64. […]cation
Effect of each mutation on EGFR autophosphorylation and Akt phosphorylation.
L861R, T725M, E746K showed increased EGFR autophosphorylation.
  - Up-regulation of cell proliferation.
G724S, L858Q showed increased Akt (protein kinase B) phosphorylation.
  - Blocks apoptosis.
69. Conclusions
Used machine learning to classify rare EGFR mutations.
Kinase group and family were most important predictors.
Identified T725M, L861R as likely cancer-associated with an obvious mechanism (activating EGFR).
L858Q, G724S also likely oncogenic, but less obvious mechanism (Akt?).