Machine learning, biomarker accuracy and best practices


  1. Machine learning, biomarker accuracy and best practices. Pradeep Reddy Raamana, crossinvalidation.com
  2-6. Singular goal of the workshop: understand • machine learning • support vector machines • classification accuracy • cross-validation. [Figure: accuracy distributions for models 1-6.]
  7-10. What is machine learning? • “giving computers the ability to learn without being explicitly programmed” • i.e., building algorithms that learn patterns in data, automatically.
  11-16. Examples. [Image-only slides; images from various sites on the internet.]
  17-24. Clinical applications of ML include: • computer-aided diagnosis • clinical decision support • personalized medicine • treatment and monitoring • better care and service-delivery systems (reducing length of hospitalization, optimizing resource redistribution, etc.). Today the focus is on biomarkers: more on how to assess their utility, less on how to identify, build and tune them.
  25. Types of machine learning: supervised (data are labelled) vs. unsupervised (data are not labelled).
  26-27. Unsupervised learning: discover hidden patterns in the data.
  28-30. Unsupervised examples: • clustering • blind source separation (PCA, ICA). [Images from wikipedia.com and gerfficient.com.] A minimal sketch follows below.
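A minimal sketch of the two unsupervised examples above, not from the slides: it assumes Python with scikit-learn, and the toy data (make_blobs) are my choice, not the presenter's.

    # Hypothetical illustration: clustering and PCA on toy data with scikit-learn
    from sklearn.datasets import make_blobs
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA

    X, _ = make_blobs(n_samples=150, centers=3, n_features=4, random_state=0)

    # Clustering: discover groups without using any labels
    cluster_ids = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

    # Dimensionality-reduction flavour of blind source separation: project onto 2 principal components
    X_2d = PCA(n_components=2).fit_transform(X)
    print(cluster_ids[:10], X_2d.shape)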
  31-33. Supervised learning: classification (e.g., the iris species setosa, versicolor and virginica) and regression.
  34-37. Supervised examples: • a linear classifier separating classes A and B • a support vector machine • a decision tree (e.g., split on “is x1 < 1.5?”, branching to A or B on yes/no).
  38-39. Focus today: classification (among clustering, classification and regression).
  40-44. Terminology (iris example, with sepal and petal measurements): the data form a table whose rows are samples (also called observations or data points, numbered 1 to N) and whose columns are features (also called variables or dimensions): sepal width, sepal length, petal width, petal length. The feature columns form the matrix X; the class column (setosa, versicolor, virginica) is the label vector y. (A minimal sketch follows below.)
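A minimal sketch of this terminology in code, assuming Python with scikit-learn (the slides do not prescribe a toolbox): the iris data as a feature matrix X of shape (N samples, 4 features) and a label vector y.

    # The iris table from the slide, as X (features) and y (class labels)
    from sklearn.datasets import load_iris

    iris = load_iris()
    X, y = iris.data, iris.target      # X: shape (150, 4); y: shape (150,)
    print(iris.feature_names)          # sepal length/width, petal length/width
    print(iris.target_names)           # ['setosa' 'versicolor' 'virginica']
    print(X.shape, y.shape)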
  45-48. Classification: build the classifier on the training data, then map new test data to the known classes.
  49-54. Support vector machine (SVM): • a popular classification technique • at its core it is binary (it separates two classes) and linear (the boundary is a line in 2-D, a hyperplane in n-D) • its power lies in finding a boundary between classes that are difficult to separate.
  55-65. How does SVM work? [Figures in feature space (x1, x2): several candidate separating lines L1, L2, L3 are drawn; SVM keeps the one with the largest margin (here L3), and the training points closest to the boundary that define that margin are the support vectors.]
  66-73. Harder problem (classes are not linearly separable): in feature space (x1, x2), boundary L1 gives fewer errors but a smaller margin, while L2 gives more errors but a larger margin. There is a tradeoff between error and margin; the parameter C sets the penalty for misclassification. (A minimal sketch follows below.)
  74-76. Even harder problems! [Figure: 1-D data along x1 where no single threshold separates the two classes.]
  77-83. Transform to higher dimensions: map x1 to (x1, x2 = x1^2). A boundary that is linear in the transformed space corresponds to a nonlinear boundary in the original space; this trick is achieved via kernel functions. (A minimal sketch follows below.)
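A minimal sketch of the explicit x2 = x1^2 map, assuming Python with scikit-learn; the 1-D toy data are made up for illustration.

    # One class sits near the origin, the other on either side: no single threshold on x1 works,
    # but in the (x1, x1^2) plane a straight line separates them.
    import numpy as np
    from sklearn.svm import SVC

    x1 = np.array([-3.0, -2.5, -2.0, 2.0, 2.5, 3.0, -1.0, -0.5, 0.0, 0.5, 1.0])
    y  = np.array([   1,    1,    1,   1,   1,   1,    0,    0,   0,   0,   0])

    X_mapped = np.column_stack([x1, x1 ** 2])    # explicit transform to 2-D
    clf = SVC(kernel='linear').fit(X_mapped, y)
    print(clf.score(X_mapped, y))                # (near-)perfect training accuracy after the transform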
  84-85. Fancier kernels exist! [Figure: in the original (x1, x2) space, a nonlinear kernel yields a curved decision boundary.] (A minimal sketch follows below.)
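A minimal sketch of a nonlinear kernel doing the transformation implicitly, assuming Python with scikit-learn; the concentric-circles toy data (make_circles) are my choice.

    # Linear vs. RBF kernel on data a straight line cannot separate
    from sklearn.datasets import make_circles
    from sklearn.svm import SVC

    X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

    linear = SVC(kernel='linear').fit(X, y)
    rbf = SVC(kernel='rbf', gamma='scale').fit(X, y)
    print('linear:', linear.score(X, y))    # struggles on concentric circles
    print('rbf:   ', rbf.score(X, y))       # close to 1.0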
  86-89. Recap, SVM: • a linear classifier at its core • finds the boundary with the maximum margin • input data can be transformed to higher dimensions to achieve better separation.
  90. Classifier performance: how do you evaluate how well the classifier works? Feed it unseen data with known labels (the ground truth), make predictions with the previously trained classifier, and compute the percentage of predictions that match the ground truth: the classification accuracy.
  91-93. Classifier performance: Accuracy = %(P == GT), the fraction of cases where the prediction P matches the ground truth GT. (A minimal sketch follows below.)
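A minimal sketch of the accuracy computation, assuming Python with NumPy/scikit-learn; the label vectors are made up.

    # Accuracy = fraction of predictions P matching the ground truth GT
    import numpy as np
    from sklearn.metrics import accuracy_score

    ground_truth = np.array([0, 1, 1, 0, 1, 0, 1, 1])   # GT: known labels
    predicted    = np.array([0, 1, 0, 0, 1, 1, 1, 1])   # P: classifier output

    print(np.mean(predicted == ground_truth))        # 0.75
    print(accuracy_score(ground_truth, predicted))   # same value: 0.75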
  94-99. What is generalizability? All we have is the available data (a sample, in the statistical sense); what we want is accuracy on unseen data (the population), i.e. out-of-sample predictions. To get there, we must avoid overfitting.
  100-104. Overfitting. [Figure: the same data fit three ways: underfit, good fit, overfit.]
  105. 50 shades of overfitting. [Figure (© MathWorks): a population-vs-decade curve extrapolated with an overfit model, absurdly predicting human annihilation.]
  106-107. “Clever forms of overfitting” (from http://hunch.net/?p=22).
  108-112. Cross-validation: • what is cross-validation? • how do you perform it? • what are the effects of different CV choices (negative bias, unbiased, positive bias)? [Diagram: the dataset repeatedly split into a training set and a test set.]
  113-116. CV helps quantify generalizability.
  117-124. Why cross-validate? A bigger training set means better learning; a bigger test set means better testing. Key: the train and test sets must be disjoint, and the dataset (sample size) is fixed, so the two sets grow at the expense of each other. Cross-validate to make the most of both.
  125-130. Use cases: “When setting aside data for parameter estimation and validation of results can not be afforded, cross-validation (CV) is typically used.” CV is used: • to estimate generalizability (test accuracy) • to pick optimal parameters (model selection) • to compare performance (model comparison). [Figure: accuracy distribution (%) from repetitions of CV.] (A minimal sketch follows below.)
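A minimal sketch of such an accuracy distribution from repeated CV, assuming Python with scikit-learn and the iris data; the 5-fold-times-20-repeats scheme is my choice, not the slides'.

    # Accuracy distribution from 20 repetitions of 5-fold CV (100 values)
    from sklearn.datasets import load_iris
    from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=20, random_state=0)
    scores = cross_val_score(SVC(kernel='linear'), X, y, cv=cv)
    print(scores.mean(), scores.std())   # summarize (or plot) the distribution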
  131-138. Key aspects of CV: 1. How you split the dataset into train/test: maximal independence between the training and test sets is desired. The split could be over samples (rows, e.g. individual diagnosis: healthy vs. disease) or over time (columns, e.g. task prediction in fMRI). 2. How often you repeat the randomized splits: repeat as many times as you can (e.g. 100), to expose the classifier to the full variability of the data. (A minimal sketch follows below.)
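A minimal sketch of repeated, class-stratified randomized splits over samples (e.g. 100 repetitions), assuming Python with scikit-learn and the iris data.

    # 100 repeated random train/test splits that preserve class proportions
    from sklearn.datasets import load_iris
    from sklearn.model_selection import StratifiedShuffleSplit

    X, y = load_iris(return_X_y=True)
    splitter = StratifiedShuffleSplit(n_splits=100, test_size=0.3, random_state=0)
    for train_idx, test_idx in splitter.split(X, y):
        # train on X[train_idx], evaluate on X[test_idx]; within a split the two never overlap
        pass
    print(len(train_idx), len(test_idx))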
  139-150. Validation set: goodness of fit measured on the training set is biased toward (overfit to) the training set. The test set is then used to optimize parameters, so performance on it becomes biased toward the test set. To evaluate generalization, a validation set independent of both the training and test sets is needed. The whole dataset is therefore split into a training set and a test set inside an inner loop (for tuning) and a validation set in an outer loop (for reporting). (A minimal sketch follows below.)
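A minimal sketch of the inner/outer-loop idea as nested CV, assuming Python with scikit-learn; the grid of C values and the 5-fold loops are my choices.

    # Inner loop tunes the hyperparameter C; outer loop reports accuracy on data the tuning never saw
    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    inner = GridSearchCV(SVC(kernel='rbf'), {'C': [0.1, 1, 10, 100]}, cv=5)   # inner loop
    outer_scores = cross_val_score(inner, X, y, cv=5)                         # outer loop
    print(outer_scores.mean())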
  151-155. Terminology of data splits:
       • Training set. Purpose (do’s): train the model to learn its core parameters. Don’ts (invalid use): don’t report the training error as the test error! Alternative name: training set (no confusion).
       • Testing set. Purpose (do’s): optimize hyperparameters. Don’ts (invalid use): don’t do feature selection or anything else supervised on it to learn or optimize the model itself. Alternative names: validation set (or tweaking, tuning, optimization set).
       • Validation set. Purpose (do’s): evaluate the fully-optimized classifier to report performance. Don’ts (invalid use): don’t use it in any way to train the classifier or optimize parameters. Alternative name: test set (more accurately, reporting set).
  156-164. K-fold CV: the data are split into k folds; in each trial (1, 2, …, k) one fold (e.g. the 4th) is held out as the test set and the remaining folds are used for training. The test sets in different trials are mutually disjoint. Note: the folds need not be contiguous blocks of the data. (A minimal sketch follows below.)
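A minimal sketch of k-fold splitting, assuming Python with scikit-learn and the iris data; shuffle=True makes the folds non-contiguous, and the k test folds are mutually disjoint.

    # 5-fold CV: each trial holds out a different, disjoint fold for testing
    from sklearn.datasets import load_iris
    from sklearn.model_selection import KFold

    X, y = load_iris(return_X_y=True)
    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    for trial, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
        print(trial, len(train_idx), len(test_idx), test_idx[:5])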
  165-170. Repeated holdout CV: in each trial (1, 2, …, n), set aside an independent subsample of the whole dataset (e.g. 30%) for testing and train on the rest. Note: test sets from different trials can overlap, hence a large n is recommended. (A minimal sketch follows below.)
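A minimal sketch of repeated holdout, assuming Python with scikit-learn and the iris data; unlike k-fold, test sets from different trials may overlap, hence the many repetitions.

    # 100 trials, each holding out an independent 30% subsample for testing
    from sklearn.datasets import load_iris
    from sklearn.model_selection import ShuffleSplit, cross_val_score
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    cv = ShuffleSplit(n_splits=100, test_size=0.3, random_state=0)
    scores = cross_val_score(SVC(kernel='linear'), X, y, cv=cv)
    print(scores.mean(), scores.std())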
  171. Typical workflow (for each CV repetition i of n): randomly split the whole dataset into a training set (with labels) and a test set (labels withheld); on the training data only, perform feature extraction, feature selection and parameter optimization, and train the classifier; apply the same feature extraction and select the same features on the test set; evaluate on the test set; pool the predictions over repetitions to obtain an accuracy distribution. (A minimal sketch follows below.)
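A minimal sketch of keeping feature selection and scaling inside the training portion of each split, assuming Python with scikit-learn; the particular steps (StandardScaler, SelectKBest, linear SVC) are illustrative choices, not the slides' prescription.

    # Scaling and feature selection are fitted on the training fold only,
    # because the whole Pipeline is what gets cross-validated
    from sklearn.datasets import load_iris
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    pipe = Pipeline([('scale', StandardScaler()),
                     ('select', SelectKBest(f_classif, k=2)),
                     ('clf', SVC(kernel='linear'))])
    print(cross_val_score(pipe, X, y, cv=5).mean())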
  172. Software: there is a free machine-learning toolbox in every major language! Check these for the latest techniques and toolboxes: http://www.jmlr.org/mloss/ or http://mloss.org/software/
  173. neuropredict: easy and comprehensive predictive analysis. Docs: http://neuropredict.readthedocs.io, code: github.com/raamana/neuropredict, twitter: @raamana_. Input features: • each feature set can be any set of numbers estimated from a sample by itself (intrinsic, not group-wise) • designed to seamlessly compare many feature sets (n > 1), as long as they all come from the same samples belonging to the same classes • supports many input formats and plugs directly into outputs from popular software like Freesurfer. What neuropredict does: • performs cross-validation in a way that increases the power of later statistical comparisons • tracks misclassification rates, class- and subject-wise, for each feature set • measures feature importance • statistically compares predictive performance • produces intuitive visualizations (accuracy distributions, confusion matrices, comparisons of misclassification rates, feature importance) • streamlines comparison across a large number of feature sets.
  174-180. neuropredict features: • auto-reading of neuroimaging features • auto-evaluation of predictive accuracy • auto-comparison of performance. Notice the repeated word: auto. Being automatic is important; without it, the analysis becomes hard and error-prone.
  181-185. Sample outputs: accuracy distributions, confusion matrices, intuitive comparison of misclassification rates, and feature importance.
  186. Now, it’s time to neuropredict! xkcd .com/raamana
  187. Model selection: Friedman, J., Hastie, T., & Tibshirani, R. (2008). The Elements of Statistical Learning. Springer Series in Statistics. Berlin: Springer.
