Machine Learning Applications in Medicine (Olga Senyukova)



How can beautiful algorithmic findings be helpful in our everyday life? One of the answers to this question lies in the area of healthcare applications. Nowadays machine learning methods are becoming more and more useful in medicine. They are able not only to assist medical specialists in processing large amounts of data, but also to help in diagnostics and patient follow-up.

This course is devoted to the discussion of some interesting applications of machine learning methods to automatically analyse medical images and physiologic signals. Medical images acquired by means of special equipment represent internal structures of the human body and/or processes in it. The most modern technologies for acquisition of such images are magnetic resonance imaging (MRI) and computed tomography (CT). Physiologic signals usually refer to cardiologic time series such as electrocardiograms (ECG), but can also represent other physiological data, for example, stride intervals of human gait.

Several important problems will be highlighted, along with successful machine learning solutions, including examples both from worldwide practice and from the author's own research. A description of the basic principles of the algorithms used will provide a good opportunity to strengthen the knowledge acquired in the school's other courses.

Published in: Science


  1. MACHINE LEARNING APPLICATIONS IN MEDICINE. Olga Senyukova, Graphics & Media Lab, Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University
  2. Medical data • Medical images • Physiologic signals • Other: narrative, textual, numerical, etc.
  3. Medical data • Medical images • Physiologic signals • Other: narrative, textual, numerical, etc.
  4. Medical images • X-ray • MRI • CT • Ultrasound
  5. Computed tomography (CT) • 1972, Sir Godfrey Hounsfield • X-rays are computer-processed to produce tomographic images
  6. Computed tomography (CT)
  7. Magnetic resonance imaging (MRI) • 1973, Paul C. Lauterbur and Peter Mansfield • Allows localizing the image by slices. Source: K. Toennies
  8. Magnetic resonance imaging (MRI)
  9. Electrocardiography (ECG) • 1901, Einthoven • Recording of the electrical activity of the heart by electrodes placed on the body
  10. RR time series • RR time series (lengths of the interbeat intervals) are widely used for ECG analysis
  11. Human gait time series
  12. Analysis: what for? • Normal or diseased? • Where is the diseased area? • What changes occur over time (especially after treatment)? • Does a specific condition occur (e.g. overtraining in an athlete)? • …
  13. Main tasks: images • Detection (e.g. an aneurysm) • Segmentation • Matching (registration)
  14. Main tasks: physiologic signals • Diagnostics: healthy, disease XXX or disease YYY? • Condition ZZZ? • Template matching: the same or not?
  15. Machine learning in medical imaging: challenges • Images are often 3D or 4D: the number of voxels and the number of extracted features are very large • The number of images for training is often limited: "large" datasets typically mean 100 to 1000 images (the "small sample size problem"). Slide by D. Rueckert
  16. Machine learning in medical imaging: challenges • Training data is expensive: annotation of images is resource-intensive (manpower, cost, time); it is sometimes possible to augment training with unlabelled images • Training data is sometimes imperfect: it may be wrongly labelled, e.g. diseases such as Alzheimer's require confirmation through pathology, which is difficult and costly to obtain. Slide by D. Rueckert
  17. The InnerEye project • Measuring brain tumors • Localizing and identifying vertebrae • Kinect for surgery. Source: A. Criminisi & the InnerEye team @ M
  18. Anatomy localization via regression forests. A. Criminisi et al., Medical Image Analysis, 2013
  19. Decision forests • Leo Breiman, 2001 • A. Criminisi, J. Shotton (eds.). Decision Forests for Computer Vision and Medical Image Analysis // Advances in Computer Vision and Pattern Recognition, 2013 • A decision forest consists of decision trees…
  20. Decision tree • Each internal node: a split (test) function • Each leaf: a class label (predictor). Source: A. Konushin
  21. Regression tree (input value → continuous label) • Green: high uncertainty • Red: low uncertainty • Thickness: the number of samples from the training set. Source: A. Criminisi, J. Shotton
  22. Regression tree: training • $S_0$: the whole training set • $S_j$: the part of the training set reaching the $j$th node • $p(y \mid x) \sim \mathcal{N}(y; \bar{y}(x), \sigma_y^2(x))$. Source: A. Criminisi, J. Shotton
  23. Regression tree: training • The split-function parameters at the $j$th node maximize the information gain: $\theta_j^* = \arg\max_{\theta_j} I(S_j, \theta_j)$ • In each part ($L$, $R$): fit a line to the points (e.g. by least squares), so that for each $x$ we have $p(y \mid x) \sim \mathcal{N}(y; \bar{y}(x), \sigma_y^2(x))$, where $\bar{y}$ is the green line • $I(S_j, \theta_j) = \sum_{(x,y) \in S_j} \log \sigma_y(x) - \sum_{i \in \{L,R\}} \sum_{(x,y) \in S_j^i} \log \sigma_y(x)$. Source: A. Criminisi, J. Shotton
  24. Example. Source: A. Criminisi, J. Shotton
  25. Example. Source: A. Criminisi, J. Shotton
  26. Different models • Predictor models: constant; polynomial and linear; probabilistic linear • Weak learners (split functions): axis-aligned; generic oriented hyperplane; conic section. Source: A. Criminisi, J. Shotton
  27. Regression forest: input $\mathbf{v} = (x_1, \dots, x_d) \in \mathbb{R}^d$. Source: A. Criminisi, J. Shotton
  28. Randomness • Bagging: each tree is learned on a subset of the whole training set
  29. Randomness • Randomized node optimization: the split function at the $j$th node is optimized w.r.t. a small random subset $\mathcal{T}_j \subset \mathcal{T}$ of parameter values, i.e. $\theta_j^* = \arg\max_{\theta_j \in \mathcal{T}_j} I(S_j, \theta_j)$ instead of $\arg\max_{\theta_j \in \mathcal{T}} I(S_j, \theta_j)$ • $\theta_j = (\phi_j, \psi_j, \tau_j)$, where $\phi_j$ selects features from the whole feature set, $\psi_j$ is the weak learner type (axis-aligned, linear, etc.) and $\tau_j$ is a set of splitting thresholds. Source: A. Criminisi, J. Shotton
  30. Forest vs tree. Source: A. Criminisi, J. Shotton
  31. The labeled database. Source: A. Criminisi, J. Shotton
  32. Anatomy localization • Key idea: all voxels in the image vote for the position of the organ • Each organ $c \in \mathcal{C}$ is defined by its 3D axis-aligned bounding box $\mathbf{b}_c = (b_c^L, b_c^R, b_c^A, b_c^P, b_c^H, b_c^F)$, where $\mathcal{C}$ = {liver, spleen, kidneyL, kidneyR, …}. Source: A. Criminisi, J. Shotton
  33. Anatomy localization • For each input voxel $\mathbf{v} = (v_x, v_y, v_z)$ we obtain a distribution of relative displacements to the organ bounding box, $\mathbf{d}_c(\mathbf{v}) = (d_c^L, d_c^R, d_c^A, d_c^P, d_c^H, d_c^F)$ • $f(\mathbf{v}; \theta)$: feature response. Source: A. Criminisi, J. Shotton
  34. Context-rich features • Features: mean intensity in randomly displaced boxes. Source: A. Criminisi, J. Shotton
  35. Features for CT and MRI • CT: we can rely on absolute intensity • MRI: only intensity differences make sense. Source: A. Criminisi, J. Shotton
  36. Learning clinically useful information from medical images • Daniel Rueckert • Biomedical Image Analysis Group, Department of Computing
  37. Segmentation using registration. Slide by D. Rueckert
  38. Multi-atlas segmentation using classifier fusion
  39. Multi-atlas segmentation using classifier fusion and selection
  40. Selection of atlases • How do we select the atlases most similar to our image? • Atlases should be clustered by disease/population • Manifold learning is used to discover such clusters efficiently
  41. Manifold learning • Find a manifold • Embed the data in the manifold (project it to a lower-dimensional space). Source: D. Rueckert
  42. Manifold learning: Laplacian eigenmaps • Given a graph $G = (V, E)$ • Each vertex $v_i$ corresponds to an image • Each edge weight $w_{ij}$ defines the similarity between images $i$ and $j$ • Define the diagonal matrix $T$ containing the degree sums of the vertices: $t_{ii} = \sum_j w_{ij}$. Slide by D. Rueckert
  43. Manifold learning: Laplacian eigenmaps • Normalized graph Laplacian: $L = T^{-1/2}(T - W)T^{-1/2}$ • Objective: $\min \sum_{i,j} W_{ij} \|y_i - y_j\|^2$ • The eigendecomposition of $L$ provides manifold coordinates $y_i$ for each image. Source: D. Rueckert
  44. Manifold learning for multi-atlas segmentation • We have two sets of images: labeled (atlases) and unlabeled • We want to label all the unlabeled images • We can do it iteratively: label part of the unlabeled images using the most similar already labeled ones; these images can then be used as atlases in the next iteration
  45. Manifold learning for multi-atlas segmentation. Wolz et al., NeuroImage, 2010
  46. Example. Wolz et al., NeuroImage, 2010
  47. Segmentation of brain lesions in MRI • Olga V. Senyukova, "Segmentation of blurred objects by classification of isolabel contours". Pattern Recognition, 2014 • Data was provided by the Children's Clinical and Research Institute of Emergency Surgery and Trauma
  48. The proposed algorithm • Each MRI slice is processed separately • To improve speed and robustness, the regions containing lesions can be specified manually • Lesions inside these regions are segmented automatically
  49. Algorithm overview: input region → isolabel contours $I(x,y) = \mathrm{const}$ → closed isolabel contours → nonlinear SVM classification
  50. Isolabel contours • In geography, each isolabel contour (one color) has constant height: $f(x,y) = h$ • In image processing, each isolabel contour (one color) has constant intensity: $f(x,y) = I$
  51. How to distinguish lesion contours? • Visually we can do it easily! • Let's use the same set of features for automatic classification of isolabel contours
  52. Features of isolabel contours • To distinguish the isolabel contours delineating lesions, we propose 4 features: $I_{mean}$; $I_{mean}$ inside the contour / $I_{mean}$ inside the bounding box; $I_{max} - I_{min}$; $I_{variance}$
  53. Labeled training base • Various regions on many images: a user clicks on lesion contours, which get the label "lesion"; all other isolabel contours automatically get the label "non-lesion" • Each contour is described by a feature vector $[\phi_1, \phi_2, \phi_3, \phi_4]$, e.g. $[\phi_1, \phi_2, \phi_3, \phi_4] \to$ lesion, $[\phi_1, \phi_2, \phi_3, \phi_4] \to$ non-lesion, …
  54. Binary classification via SVM • We have a binary classification task: each isolabel contour belongs to one of two classes, lesions or non-lesions • A well-established classifier for such tasks is the SVM (Support Vector Machine): the original linear SVM is due to Vladimir Vapnik and Alexey Chervonenkis (1963); applying the kernel trick results in a nonlinear SVM (Bernhard Boser, Isabelle Guyon, Vladimir Vapnik, 1992)
  55. Linear SVM • Positive samples ($y_i = 1$): $\mathbf{x}_i \cdot \mathbf{w} + b \ge 1$; negative samples ($y_i = -1$): $\mathbf{x}_i \cdot \mathbf{w} + b \le -1$; the support vectors lie on these margin boundaries • The margin is $2/\|\mathbf{w}\|$. Maximizing it, we solve a quadratic optimization problem: minimize $\frac{1}{2}\mathbf{w}^T\mathbf{w}$ subject to $y_i(\mathbf{x}_i \cdot \mathbf{w} + b) \ge 1$ • The solution is a hyperplane: $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$, giving the decision function $\sum_i \alpha_i y_i\, \mathbf{x}_i \cdot \mathbf{x} + b$, where $\mathbf{x}_i$ are the support vectors and $\alpha_i$ are the learned weights
  56. Nonlinear SVM • For linearly separable data, a linear SVM is excellent • What about data that is not linearly separable? • We can make it linearly separable by mapping it to a higher-dimensional space
  57. Nonlinear SVM: kernel trick • Instead of $\sum_i \alpha_i y_i\, \mathbf{x}_i \cdot \mathbf{x} + b$ we have $\sum_i \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b$, where $K(\mathbf{x}_i, \mathbf{x}_j) = \varphi(\mathbf{x}_i) \cdot \varphi(\mathbf{x}_j)$ • For classification of isolabel contours, I use a nonlinear SVM with the RBF (radial basis function) kernel $K(\mathbf{x}_i, \mathbf{x}_j) = \exp(-\gamma \|\mathbf{x}_i - \mathbf{x}_j\|^2)$
  58. Ensemble-based analysis of RR and gait • Olga Senyukova • Valeriy Gavrishchaka, Department of Physics, West Virginia University • Springer, 2013, 2015
  59. RR and gait time series • Gait: normal? Huntington's disease? Parkinson's disease? … • RR: normal? Arrhythmia? Congestive heart failure? …
  60. Ensemble learning techniques • An ensemble can work better than a single classifier: e.g. base classifiers 1, 2, …, N with accuracies 0.61, 0.73, …, 0.65 are combined into an ensemble of classifiers with accuracy 0.9
  61. AdaBoost • Freund and Schapire, 1997 • On each iteration, it focuses on the hardest-to-classify samples
  62. AdaBoost • $\mathbf{x}_i, i = 1, \dots, N$: training data; $y_i \in \{-1, +1\}$: labels • Initial weights of all $N$ items: $w_i^{(0)} = 1/N$ • $M$ iterations, from $m = 1$ to $M$: find $T_m = \arg\min_{T_j} \epsilon_j$, where $\epsilon_j = \sum_i w_i^{(m-1)} [y_i \ne T_j(\mathbf{x}_i)]$, and let $\epsilon_m$ be the weighted error of $T_m$; if $\epsilon_m \ge 1/2$, stop; set $\alpha_m = \frac{1}{2} \log \frac{1 - \epsilon_m}{\epsilon_m}$; update $w_i^{(m)} = w_i^{(m-1)} \exp(-\alpha_m y_i T_m(\mathbf{x}_i)) / Z_m$, where $Z_m$ normalizes the weights to sum to 1 • Classifier output: $H(\mathbf{x}) = \operatorname{sign}\left(\sum_{m=1}^{M} \alpha_m T_m(\mathbf{x})\right)$
  63. Good classifier example
  64. Iteration 1 of 3: $T_1$
  65. Iteration 2 of 3: $T_2$
  66. Iteration 3 of 3: $T_3$, STOP
  67. Final model: $H(\mathbf{x}) = \operatorname{sign}[0.42\,T_1(\mathbf{x}) + 0.70\,T_2(\mathbf{x}) + 0.72\,T_3(\mathbf{x})]$
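  68. Ensemble decomposition learning • We apply an ensemble-based classifier $H(\mathbf{x}) = \sum_{m=1}^{M} \alpha_m T_m(\mathbf{x})$ to a vector $\mathbf{x}$ • Each $\mathbf{x}$ can be described by its ensemble decomposition vector (EDL vector) $D(\mathbf{x}) = [\alpha_1 T_1(\mathbf{x}), \alpha_2 T_2(\mathbf{x}), \dots, \alpha_M T_M(\mathbf{x})]$ • We can classify data points by comparing their EDL vectors
  69. EDL: learning • Building a general classifier: all available data are labeled "normal/abnormal"; AdaBoost is trained on indicators from nonlinear dynamics (e.g. MSE, multiscale entropy, and DFA, detrended fluctuation analysis), producing an ensemble of classifiers $\alpha_1 \cdot \mathrm{MSE}_1 + \alpha_2 \cdot \mathrm{DFA}_2 + \dots + \alpha_M \cdot \mathrm{MSE}_M$, where each base classifier outputs +1 (normal) or −1 (abnormal) • Applying the ensemble to a training example $\mathbf{x}$ gives its EDL vector, e.g. $D(\mathbf{x}) = [\alpha_1 \cdot 1, \alpha_2 \cdot (-1), \dots, \alpha_M \cdot (-1)]$
  70. EDL: testing • Applying the ensemble to a testing example $\mathbf{y}$ gives its EDL vector, e.g. $D(\mathbf{y}) = [\alpha_1 \cdot (-1), \alpha_2 \cdot 1, \dots, \alpha_M \cdot 1]$ • Is $D(\mathbf{x}) \approx D(\mathbf{y})$? Yes: $\mathbf{x} = \mathbf{y}$; no: $\mathbf{x} \ne \mathbf{y}$ • In a multi-class classification problem, the class of $\mathbf{y}$ is the class of the training example with the closest EDL vector: $C(\mathbf{y}) = C_k$, where $\mathbf{x}_k \in C_k$ and $\|D(\mathbf{x}_k) - D(\mathbf{y})\| = \min_i \|D(\mathbf{x}_i) - D(\mathbf{y})\|$
  71. Results • CHF/arrhythmia classification • Real data from
  72. Thank you for your attention!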
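Slide 10 describes RR time series as the lengths of interbeat intervals. A minimal sketch of how such a series is obtained, assuming the R-peak times have already been detected in the ECG (the detector is not shown, and the peak times below are hypothetical):

```python
def rr_intervals(r_peak_times):
    """RR intervals (s) from a sorted sequence of R-peak times (s)."""
    # Each interval is the time between two consecutive R peaks (heartbeats).
    return [t1 - t0 for t0, t1 in zip(r_peak_times, r_peak_times[1:])]


def mean_heart_rate_bpm(rr):
    """Mean heart rate in beats per minute, from RR intervals in seconds."""
    return 60.0 / (sum(rr) / len(rr))


# Hypothetical R-peak times, as if produced by an R-peak detector.
peaks = [0.0, 0.8, 1.6, 2.5, 3.3]
rr = rr_intervals(peaks)
print([round(v, 3) for v in rr], round(mean_heart_rate_bpm(rr), 1))  # [0.8, 0.8, 0.9, 0.8] 72.7
```

The resulting sequence, not the raw ECG waveform, is what the RR-based analyses in the later slides operate on.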
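The split criterion on slide 23 can be sketched for a one-dimensional input with axis-aligned thresholds. This is a simplification, not the forest implementation from the cited work: it assumes σ_y(x) is a single residual standard deviation per node, so each inner sum reduces to |S| log σ, and the helper names are hypothetical.

```python
import numpy as np


def line_fit_log_sigma(x, y):
    """Least-squares line fit; returns log of the residual std (assumed constant in x)."""
    A = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sigma = max(resid.std(), 1e-6)  # floor avoids log(0) for a perfect fit
    return np.log(sigma)


def information_gain(x, y, threshold):
    """I = sum_{S_j} log sigma - sum_{i in {L,R}} sum_{S_j^i} log sigma."""
    left = x < threshold
    right = ~left
    if left.sum() < 2 or right.sum() < 2:
        return -np.inf  # a line needs at least 2 points on each side
    gain = len(x) * line_fit_log_sigma(x, y)
    gain -= left.sum() * line_fit_log_sigma(x[left], y[left])
    gain -= right.sum() * line_fit_log_sigma(x[right], y[right])
    return gain


def best_threshold(x, y):
    """Exhaustive search over axis-aligned thresholds (no randomization here)."""
    candidates = np.unique(x)[1:]
    gains = [information_gain(x, y, t) for t in candidates]
    return candidates[int(np.argmax(gains))]


x = np.arange(20, dtype=float)
y = np.where(x < 10, x, 50.0 - 3.0 * x)  # two linear pieces joined at x = 10
print(best_threshold(x, y))  # 10.0
```

Because the entropy of a Gaussian is log σ plus a constant, maximizing this gain favors splits after which each side is well explained by its own line.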
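Slides 28 and 29 name two sources of randomness. A toy regression forest of depth-1 trees (stumps) can illustrate both: bagging via bootstrap samples, and randomized node optimization via a small random subset of candidate thresholds. Assumptions: the stumps use constant leaf predictors and a squared-error split criterion as a stand-in for the information gain, and all names are hypothetical.

```python
import numpy as np


def fit_stump(x, y, rng, n_candidates=5):
    """Depth-1 regression tree; only a random subset of thresholds is evaluated."""
    values = np.unique(x)
    candidates = rng.choice(values, size=min(n_candidates, len(values)), replace=False)
    best = None
    for t in candidates:
        left, right = y[x < t], y[x >= t]
        if len(left) == 0 or len(right) == 0:
            continue
        # Squared error around each leaf mean (stand-in for the information gain).
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    if best is None:  # no valid split among the sampled thresholds
        return np.inf, y.mean(), y.mean()
    return best[1:]


def predict_stump(stump, x):
    t, left_value, right_value = stump
    return np.where(x < t, left_value, right_value)


def fit_forest(x, y, n_trees=20, seed=0):
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        # Bagging: each tree is trained on a bootstrap sample of the training set.
        idx = rng.integers(0, len(x), size=len(x))
        forest.append(fit_stump(x[idx], y[idx], rng))
    return forest


def predict_forest(forest, x):
    # The forest output is the average of the individual tree outputs.
    return np.mean([predict_stump(s, x) for s in forest], axis=0)


x = np.arange(40, dtype=float)
y = (x >= 20).astype(float)  # a step function
forest = fit_forest(x, y)
print(predict_forest(forest, np.array([0.0, 39.0])))
```

Each individual stump is noisy because it sees a resampled dataset and only a few thresholds; averaging many of them smooths the prediction, which is the point of the forest on slide 30.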
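The context-rich features of slides 34 and 35 need many box means per voxel; a standard way to make each one O(1) is a summed-area table (integral image). This 2D sketch assumes boxes lie fully inside the image; in 3D the same idea extends to integral volumes, and the random offsets would be sampled during training. The difference feature illustrates why only relative intensities are used for MRI: it is unchanged by a global intensity shift. All function names are hypothetical.

```python
import numpy as np


def integral_image(img):
    """Summed-area table with a leading row and column of zeros."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii


def box_mean(ii, top, left, height, width):
    """Mean of img[top:top+height, left:left+width] from four table lookups."""
    b, r = top + height, left + width
    total = ii[b, r] - ii[top, r] - ii[b, left] + ii[top, left]
    return total / (height * width)


def box_feature(ii, p, offset, size):
    """Mean intensity in a box displaced by `offset` from pixel p (CT-style feature)."""
    (py, px), (dy, dx), (h, w) = p, offset, size
    return box_mean(ii, py + dy, px + dx, h, w)


def box_difference_feature(ii, p, offset1, offset2, size):
    """Difference of two displaced box means (MRI-style: relative intensity only)."""
    return box_feature(ii, p, offset1, size) - box_feature(ii, p, offset2, size)


img = np.arange(36, dtype=float).reshape(6, 6)
ii = integral_image(img)
print(box_mean(ii, 1, 1, 2, 2))  # mean of img[1:3, 1:3] → 10.5
```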
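Slides 32 and 33 describe voxels voting for an organ's bounding box. A simplified aggregation sketch, assuming each vote is converted to a box estimate via b_c = v̂ − d_c(v) with v̂ = (v_x, v_x, v_y, v_y, v_z, v_z), and that the most confident fraction of votes (lowest predicted variance) is simply averaged. The cited paper's actual aggregation may differ, and `keep_fraction` is a hypothetical parameter.

```python
import numpy as np


def vote_bounding_box(voxels, displacements, variances, keep_fraction=0.5):
    """Aggregate per-voxel votes into one box b_c = (L, R, A, P, H, F).

    voxels:        (N, 3) voxel coordinates (x, y, z)
    displacements: (N, 6) predicted displacements d_c(v), one row per voxel
    variances:     (N,) predicted uncertainty of each vote (lower = more confident)
    """
    # Duplicate coordinates so they line up with (L, R, A, P, H, F).
    v_hat = np.repeat(voxels, 2, axis=1)  # (x, x, y, y, z, z)
    votes = v_hat - displacements         # each voxel's estimate of b_c
    # Keep only the most confident votes, then average them.
    n_keep = max(1, int(keep_fraction * len(votes)))
    confident = np.argsort(variances)[:n_keep]
    return votes[confident].mean(axis=0)


rng = np.random.default_rng(0)
voxels = rng.uniform(0, 100, size=(10, 3))
true_box = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
displacements = np.repeat(voxels, 2, axis=1) - true_box
displacements[5:] += 50.0                         # five unreliable votes...
variances = np.r_[np.zeros(5), np.ones(5)]        # ...flagged by high variance
print(vote_bounding_box(voxels, displacements, variances))
```

Filtering by predicted uncertainty lets a few distant or ambiguous voxels be ignored, which is why a regression forest that outputs a distribution, not just a point estimate, is useful here.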
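Slide 38 mentions classifier fusion for multi-atlas segmentation. The simplest fusion rule is per-voxel majority voting over the atlas label maps; the sketch below assumes the atlases have already been registered to the target image (registration is not shown) and is not necessarily the fusion rule used in the cited work.

```python
import numpy as np


def majority_vote_fusion(warped_labels):
    """Fuse atlas segmentations by per-voxel majority voting.

    warped_labels: (n_atlases, *image_shape) integer label maps, each already
    warped to the target image.
    """
    warped_labels = np.asarray(warped_labels)
    n_labels = warped_labels.max() + 1
    # Votes for every label at every voxel: shape (n_labels, *image_shape).
    votes = np.stack([(warped_labels == k).sum(axis=0) for k in range(n_labels)])
    return votes.argmax(axis=0)  # ties go to the smaller label index


atlases = [[0, 1, 1],
           [0, 0, 1],
           [1, 1, 1]]  # three hypothetical 1D label maps
fused = majority_vote_fusion(atlases)
print(fused.tolist())  # [0, 1, 1]
```

Atlas selection (slide 39) fits naturally in front of this step: voting only among the atlases most similar to the target reduces the influence of poorly matching ones.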
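The Laplacian eigenmaps construction from slides 42 and 43 maps directly onto a few lines of NumPy. Assumptions: W is a symmetric similarity matrix with positive degree sums, and the embedding is taken from the eigenvectors with the smallest non-zero eigenvalues, rescaled by T^(-1/2) (a common convention). The example W, with two groups of mutually similar images, is hypothetical.

```python
import numpy as np


def laplacian_eigenmaps(W, n_components=2):
    """Embed graph vertices (images) using the normalized graph Laplacian.

    W: symmetric (n, n) matrix of non-negative pairwise similarities w_ij.
    """
    t = W.sum(axis=1)                     # degree sums t_ii = sum_j w_ij
    T_inv_sqrt = np.diag(1.0 / np.sqrt(t))
    L = T_inv_sqrt @ (np.diag(t) - W) @ T_inv_sqrt
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    # Skip the trivial eigenvector (eigenvalue 0); the next ones give y_i.
    Y = eigvecs[:, 1:n_components + 1]
    return T_inv_sqrt @ Y


# Two hypothetical groups of 3 images: strong similarity inside, weak across.
W = np.full((6, 6), 0.01)
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)
y1 = laplacian_eigenmaps(W, n_components=1)[:, 0]
print(np.sign(y1))  # the first coordinate separates the two groups
```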
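The iterative scheme on slide 44 can be sketched as a propagation order in the manifold coordinates: at each step, the unlabeled images closest to an already labeled image are labeled next and then act as atlases. This is a simplified, hypothetical helper: it uses Euclidean distance in the embedding and omits the actual segmentation step.

```python
import numpy as np


def propagation_order(Y, labeled, n_per_step=1):
    """Return the groups of image indices labeled at each iteration.

    Y:       (n, k) manifold coordinates of all images
    labeled: indices of the initially labeled images (atlases)
    """
    labeled = list(labeled)
    unlabeled = [i for i in range(len(Y)) if i not in labeled]
    steps = []
    while unlabeled:
        # Distance from each unlabeled image to its nearest labeled image.
        diffs = Y[unlabeled][:, None, :] - Y[labeled][None, :, :]
        d = np.linalg.norm(diffs, axis=2).min(axis=1)
        chosen = [unlabeled[j] for j in np.argsort(d)[:n_per_step]]
        steps.append(chosen)
        labeled += chosen  # newly labeled images become atlases
        unlabeled = [i for i in unlabeled if i not in chosen]
    return steps


Y = np.array([[0.0], [1.0], [2.0], [10.0]])  # hypothetical 1D embedding
print(propagation_order(Y, [0]))  # [[1], [2], [3]]
```

Labeling in small steps means each image is segmented with atlases that are close to it on the manifold, even when it is far from all of the original atlases.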
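Slide 52 lists the four features of an isolabel contour. A sketch computing them from the pixels enclosed by a closed contour, assuming the contour has already been filled into a boolean mask and that I_max − I_min and the variance are also measured inside the contour (the slide does not state this explicitly). The function name is hypothetical.

```python
import numpy as np


def contour_features(image, inside_mask):
    """Feature vector [phi_1, phi_2, phi_3, phi_4] of one closed isolabel contour.

    image:       2D intensity array (the input region)
    inside_mask: boolean array, True for pixels inside the contour
    """
    vals = image[inside_mask]
    rows, cols = np.nonzero(inside_mask)
    # Bounding box of the contour's interior.
    bbox = image[rows.min():rows.max() + 1, cols.min():cols.max() + 1]
    return np.array([
        vals.mean(),                # phi_1: I_mean inside the contour
        vals.mean() / bbox.mean(),  # phi_2: I_mean inside / I_mean inside bbox
        vals.max() - vals.min(),    # phi_3: I_max - I_min
        vals.var(),                 # phi_4: I_variance
    ])


image = np.zeros((5, 5))
image[1:4, 1:4] = 10.0
image[2, 2] = 20.0
print(contour_features(image, image > 0))
```

Feature vectors like this one, paired with "lesion"/"non-lesion" labels, form the training base described on slide 53.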
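Slides 55 and 57 give the SVM decision function with a linear and an RBF kernel. The sketch below only evaluates f(x) = Σ α_i y_i K(x_i, x) + b for given support vectors; training (solving the quadratic program for α_i and b) is left to a solver such as LIBSVM, and the α values, b and γ below are hypothetical.

```python
import numpy as np


def linear_kernel(a, b):
    return float(np.dot(a, b))


def rbf_kernel(a, b, gamma=1.0):
    """K(a, b) = exp(-gamma * ||a - b||^2)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-gamma * np.sum(diff ** 2)))


def svm_decision(x, support_vectors, labels, alphas, b, kernel):
    """f(x) = sum_i alpha_i y_i K(x_i, x) + b; the predicted class is sign(f(x))."""
    return sum(a * y * kernel(sv, x)
               for sv, y, a in zip(support_vectors, labels, alphas)) + b


support_vectors = [[0.0, 0.0], [2.0, 0.0]]
labels = [1, -1]
alphas = [0.5, 0.5]  # hypothetical "learned" weights
print(svm_decision([3.0, 1.0], support_vectors, labels, alphas, 0.0, linear_kernel))  # -3.0
print(svm_decision([0.0, 0.0], support_vectors, labels, alphas, 0.0, rbf_kernel))
```

With the linear kernel the sum collapses to w · x + b, where w = Σ α_i y_i x_i = (−1, 0) here; with the RBF kernel no explicit w exists, and only kernel values against the support vectors are needed.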
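The AdaBoost steps on slide 62 can be sketched with decision stumps (a threshold on one feature) as the weak classifiers T_m; the choice of stumps is an assumption, since the slide does not fix the weak learner. The toy data, positive only on an interval, cannot be separated by any single stump.

```python
import numpy as np


def train_adaboost(X, y, M=10):
    """AdaBoost with decision stumps; y must contain only -1 and +1."""
    N, d = X.shape
    w = np.full(N, 1.0 / N)  # w_i^(0) = 1/N
    ensemble = []            # list of (alpha_m, feature, threshold, polarity)
    for _ in range(M):
        best = None
        # Find the stump T_m with the smallest weighted error eps_m.
        for f in range(d):
            for t in np.unique(X[:, f]):
                for s in (1, -1):
                    pred = np.where(X[:, f] >= t, s, -s)
                    eps = w[pred != y].sum()
                    if best is None or eps < best[0]:
                        best = (eps, f, t, s, pred)
        eps, f, t, s, pred = best
        if eps >= 0.5:
            break                            # no better than chance: stop
        eps = max(eps, 1e-10)                # avoid division by zero for a perfect stump
        alpha = 0.5 * np.log((1 - eps) / eps)
        ensemble.append((alpha, f, t, s))
        w = w * np.exp(-alpha * y * pred)    # misclassified samples gain weight
        w /= w.sum()                         # normalization by Z_m
    return ensemble


def predict_adaboost(ensemble, X):
    """H(x) = sign(sum_m alpha_m T_m(x))."""
    score = np.zeros(len(X))
    for alpha, f, t, s in ensemble:
        score += alpha * np.where(X[:, f] >= t, s, -s)
    return np.sign(score)


X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.array([-1, -1, -1, 1, 1, 1, 1, -1, -1, -1])
ensemble = train_adaboost(X, y, M=3)
print(predict_adaboost(ensemble, X))
```

On these data, three rounds suffice: the first stump has weighted error 0.3, so α_1 = ½ ln(0.7/0.3) ≈ 0.42, and the later stumps receive larger weights because they are trained on the re-weighted, harder samples.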
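Slides 68 to 70 describe EDL vectors and nearest-neighbor matching. A minimal sketch, assuming the weak classifiers' ±1 outputs are already available and that "closest" means Euclidean distance; the weights, outputs and class names below are hypothetical.

```python
import numpy as np


def edl_vector(alphas, weak_outputs):
    """D(x) = [alpha_1 T_1(x), ..., alpha_M T_M(x)] from the weak classifiers' +/-1 outputs."""
    return np.asarray(alphas, dtype=float) * np.asarray(weak_outputs, dtype=float)


def edl_classify(D_train, train_classes, D_test):
    """Return the class of the training example whose EDL vector is closest."""
    dists = np.linalg.norm(np.asarray(D_train) - np.asarray(D_test), axis=1)
    return train_classes[int(np.argmin(dists))]


alphas = [0.5, 1.0, 0.8]
train_outputs = [[1, 1, 1], [-1, 1, -1], [-1, -1, -1]]
train_classes = ["normal", "CHF", "arrhythmia"]
D_train = [edl_vector(alphas, o) for o in train_outputs]
D_test = edl_vector(alphas, [-1, 1, 1])
print(edl_classify(D_train, train_classes, D_test))  # normal
```

Summing an EDL vector recovers the ensemble score H(x), so the EDL vector keeps strictly more information than the binary normal/abnormal output: two examples with the same sign of H(x) can still have very different decompositions.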