
idalab seminar #2: Interpretation of multivariate machine learning models

Besides accurate prediction/description of the data, one goal in multivariate machine learning is to gain insights into the problem domain by analyzing which aspects of the data are most relevant for the model to achieve its performance. However, such analyses may give rise to misinterpretation in the sense that data features that are, for example, relevant for a classification task are not necessarily informative about any of the classes themselves.

Haufe’s talk points out this problem and its implications in the context of linear “decoding” applications in neuroimaging, demonstrating that it is caused by correlated additive noise, and proposing a simple transformation of linear methods into a representation from which the class-specific features of actual interest can be easily read off.

Another domain in which correlated noise can lead to misinterpretation is the analysis of interactions between time series. Using again an example from neuroimaging, the presentation points out how linear mixing of noise or signal components in the data can lead to spurious detection of interaction for some of the most established interaction measures. To deal with this problem, a number of “robust” measures will be introduced and their advantages demonstrated on simulated data.



  1. Dr. Stefan Haufe | idalab – Agency for Data Science (machine learning & AI, mathematical modelling, data strategy) | Interpretation of multivariate machine learning models | idalab seminar #2 | May 20th 2016
  2. Interpretation of multivariate machine learning models. Stefan Haufe (stefan.haufe@tu-berlin.de), Machine Learning Group, TU Berlin. Talk at idalab, Berlin, May 20th 2016.
  3. Multivariate data. Neuroimaging: ~10,000 brain voxels imaged in parallel using fMRI. Other examples: financial, genomic, climate, molecule descriptors, … Dataset x(t) ∈ R^M, t = 1, …, T, with M features and T samples.
  4. Goal: find latent signal components s_i(t) with interesting properties in the data. Supervised: components predictive of one or more target variables y_i(t) • Regression, classification • Targets: stimulus property, experimental condition, behavioral response, weather, drug efficacy, shoe size, … Unsupervised: • Other properties: large variance, mutual independence, stationarity • Factor models, (blind) source separation
  6. Subgoals. 1. Reconstruction: accurately estimate the interesting components. Purpose: approximate the target well. Example: decode somebody’s cognitive state from brain signals. 2. Interpretation: identify data features that are related to a component. Purpose: gain insight into the problem. Example: find out where in the brain a cognitive state is represented. Possible to have both at the same time?
  7. Mass-univariate analysis. Example: a network of brain regions is modulated by the experimental condition. [Figure: correlation with the condition, r = 0.77.]
  8. Mass-univariate analysis. Example: a network of brain regions is modulated by the experimental condition. In real data: additive noise (other brain activity and artifacts). Correlating single features (brain voxels) with a single target variable (condition) leads to suboptimal reconstruction and interpretation; a small simulation illustrating this is sketched after the slide transcript. [Figure: r = 0.77 in the noise-free case vs. a maximum of r = 0.32 per voxel with noise.]
  9. Linear forward model (a.k.a. generative model). Idea: express each feature as a linear superposition of a set of components: x(t) = A s(t) + ε(t), where s(t) are the components, the columns of A are loading vectors, or activation patterns, and ε(t) ∈ R^M is noise.
  10. Supervised case: set s(t) = y(t), where y(t) comprises all targets and known noise variables. This allows one to correlate a single target y_i(t) with the data without interference from ‘nuisance’ (other target or noise) variables y_{j≠i}(t) → better reconstruction. A.k.a.: GLM, ANOVA, partial correlation. [Figure from Monti, 2011.]
  11. Limitations: • Still mass-univariate w.r.t. the data x(t); doesn’t combine information from different channels. • Only the influence of known ‘nuisance’ variables can be removed. → Suboptimal reconstruction.
  12. Linear backward model (a.k.a. discriminative model). Idea: express latent components as a linear combination of channels: ŝ(t) = W^T x(t), where the columns of W are called extraction filters. Supervised case: optimize W such that ŝ(t) ≈ y(t). Examples: linear classifiers and regression models (SVM, LDA, lasso, OLS, ridge regression, …). (A fitting sketch appears after the slide transcript.)
  13. Advantages: • Only the target variable of interest needs to be known. • Everything else is considered noise and gets suppressed by linearly combining features. → Improved reconstruction. The reconstructed components are given implicitly as the projected data ŝ(t) = W^T x(t). → Possible to make predictions on new data using W.
  14. Interpretation? Patterns and filters can be visualized in the same way.
  15. Interpretation? Patterns and filters can be visualized in the same way. However, their meanings are different.
  16. Interpretation. Filters: tell how to weight features to extract a (target-related) component. • Depend on signal and noise → non-zero weights possible for features statistically independent of the target → zero weights possible for features related to the target → not interpretable, e.g., cannot be used to localize brain function.
  17. Interpretation. Filters: tell how to weight features to extract a (target-related) component. • Depend on signal and noise → non-zero weights possible for features statistically independent of the target → zero weights possible for features related to the target → not interpretable, e.g., cannot be used to localize brain function. Patterns: tell how strongly a component is expressed in each feature. • Depend only on signal → non-zero values only for features related to the target → indicate sign/strength of the signal in each feature → interpretable, e.g., can be used to localize brain activity to features.
  18. Example: decoding with correlated noise. A condition-specific brain region overlaps with the ‘default-mode network’ (DMN). [Figure from Norton et al., 2012, Neurology; regions labeled ‘DMN’ and ‘condition-specific’.]
  19. Example: decoding with correlated noise. A condition-specific brain region overlaps with the ‘default-mode network’ (DMN). Consider two voxels: x1(t), which picks up the condition-specific activity as well as DMN activity, and x2(t), which measures only DMN activity. [Figure from Norton et al., 2012, Neurology.]
  20. Example: decoding with correlated noise. The target can be perfectly reconstructed by taking the difference x1(t) − x2(t), i.e., with filter weights w1 = 1 and w2 = −1. → Despite purely measuring DMN activity, x2 gets a nonzero weight in the filter w.
  21. Example: decoding with correlated noise. The target can be perfectly reconstructed by taking the difference x1(t) − x2(t), i.e., with filter weights w1 = 1 and w2 = −1. → Despite purely measuring DMN activity, x2 gets a nonzero weight in the filter w. The forward model is given by x(t) = a y(t) + ε(t) with pattern a = (1, 0)^T and noise ε(t) = (d(t), d(t))^T, where d(t) is the DMN activity. → x2 gets zero weight in the pattern a. (A numeric version of this example is sketched after the slide transcript.)
  22. Another example: noise correlation strongly affects the filter w.
  23. 23. "By using a sparse model (e.g., lasso), we can make sure that the features with nonzero weights are related to the target." Can regularization make backward models interpretable? 22
  24. 24. "By using a sparse model (e.g., lasso), we can make sure that the features with nonzero weights are related to the target." Answer: no Nevertheless, regularization may be necessary to prevent overfitting. Can regularization make backward models interpretable? True activation 23
  25. Can backward models be made interpretable at all? Answer: yes, by transforming them into forward models (backward → forward).
  26. Can backward models be made interpretable at all? Answer: yes, by transforming them into forward models (backward → forward). For square, invertible W: A = (W^T)^{-1}.
  27. Can backward models be made interpretable at all? Answer: yes, by transforming them into forward models (backward → forward). For square, invertible W: A = (W^T)^{-1}. For general W: A = Σ_x W Σ_ŝ^{-1}, where Σ_x and Σ_ŝ are the covariance matrices of x(t) and ŝ(t).
  28. Can backward models be made interpretable at all? Answer: yes, by transforming them into forward models (backward → forward). For square, invertible W: A = (W^T)^{-1}. For general W: A = Σ_x W Σ_ŝ^{-1}, where Σ_x and Σ_ŝ are the covariance matrices of x(t) and ŝ(t). For a single component, or for uncorrelated components ŝ(t): a_i ∝ Σ_x w_i. (A sketch of this transformation appears after the slide transcript.)
  29. Can backward models be made interpretable at all? Regularization of the filters does not influence the structure of the patterns. [Figure: true activation; activation patterns obtained via the transformation above.]
  30. Summary: • Parameters of forward models (e.g., GLMs) are interpretable (can be used to identify features that are related to a target). • Backward models (e.g., SVMs) predict better than forward models by optimally combining features. • However, their parameters are not interpretable. • Regularization/sparsification does not improve interpretation. • Backward models can be transformed into forward models.
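
The following sketches are not from the talk; they are minimal Python illustrations of the ideas above, with all variable names, dimensions, and parameter values chosen ad hoc. The first one, referenced from slides 8–9, generates data from a linear forward model x(t) = A s(t) + ε(t) with a shared (correlated) noise source and shows that single-feature correlations with the target are much weaker than the correlation achieved by a multivariate (least-squares) filter.

```python
import numpy as np

rng = np.random.default_rng(0)
T, M = 10_000, 5                 # samples and features (e.g. voxels); values are arbitrary

# Forward model x(t) = a*y(t) + eps(t): only the first two features carry signal
y = rng.standard_normal(T)                      # target component y(t)
a = np.array([1.0, 0.8, 0.0, 0.0, 0.0])         # true activation pattern

# Correlated additive noise: one shared noise source projected into all features
d = rng.standard_normal(T)
noise_pattern = np.array([1.0, 0.8, 1.0, 0.7, 0.5])
eps = 2.0 * np.outer(d, noise_pattern) + 0.1 * rng.standard_normal((T, M))

X = np.outer(y, a) + eps                        # data matrix, shape (T, M)

# Mass-univariate analysis: correlate each feature with the target separately
r_single = [np.corrcoef(X[:, m], y)[0, 1] for m in range(M)]
print("single-feature correlations:", np.round(r_single, 2))

# Multivariate combination: a least-squares filter w recovers the target far better,
# because the noise-only features can be used to cancel the shared noise
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print("correlation of w^T x with y:", round(float(np.corrcoef(X @ w, y)[0, 1]), 2))
```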
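
Referenced from slide 12: a minimal sketch of fitting a linear backward model. Here an LDA classifier (one of the examples named on the slide) is trained on simulated data; its weight vector plays the role of the extraction filter w. The data-generating choices are assumptions made for the sketch, not taken from the talk.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
T, M = 2_000, 5                  # arbitrary sizes for the sketch

# Binary experimental condition and an assumed true activation pattern
y = rng.integers(0, 2, size=T)                  # condition label y(t) in {0, 1}
a = np.array([1.0, 0.5, 0.0, 0.0, 0.0])         # only features 0 and 1 carry signal

# Shared (correlated) noise source plus independent sensor noise
d = rng.standard_normal(T)
X = (np.outer(y, a)
     + np.outer(d, [1.0, 1.0, 1.0, 0.5, 0.2])
     + 0.2 * rng.standard_normal((T, M)))

# Backward model: a linear classifier; its weight vector is the extraction filter w
lda = LinearDiscriminantAnalysis().fit(X, y)
w = lda.coef_.ravel()
print("extraction filter w:", np.round(w, 2))   # nonzero weights also on noise-only features
print("training accuracy  :", lda.score(X, y))
```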
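
Referenced from slides 20–21: a numeric version of the two-voxel example, under the assumption (as reconstructed above) that voxel x1 measures the condition-related activity plus DMN activity and voxel x2 measures only DMN activity. A least-squares filter ends up close to w = (1, −1), while the pattern computed from the extracted component is close to a = (1, 0).

```python
import numpy as np

rng = np.random.default_rng(2)
T = 50_000

y = rng.standard_normal(T)          # condition-related activity (the target)
d = 3.0 * rng.standard_normal(T)    # strong 'default-mode network' background activity

x1 = y + d                          # voxel 1: condition-specific region, also picks up DMN
x2 = d                              # voxel 2: measures only DMN activity
X = np.column_stack([x1, x2])

# Backward model: least-squares filter w such that X @ w ~ y
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print("filter w :", np.round(w, 2))             # ~ [ 1, -1]: x2 gets a large nonzero weight

# Corresponding forward-model pattern: covariance of each voxel with the extracted
# component, divided by the component's variance (single-component case)
s_hat = X @ w
a = (X - X.mean(0)).T @ (s_hat - s_hat.mean()) / T / s_hat.var()
print("pattern a:", np.round(a, 2))             # ~ [ 1,  0]: x2 is correctly not 'active'
```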
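
Referenced from slide 24: the same two-voxel setup fitted with a lasso (sparse) model. The penalty value is an arbitrary choice for the sketch; the point is only that the noise-only voxel keeps a clearly nonzero weight, so sparsity does not make the filter interpretable.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
T = 50_000

y = rng.standard_normal(T)            # target
d = 3.0 * rng.standard_normal(T)      # strong correlated noise (e.g. DMN activity)
X = np.column_stack([y + d, d])       # voxel 1: signal + noise, voxel 2: noise only

# Sparse backward model: lasso with a mild, hand-picked penalty
lasso = Lasso(alpha=0.01).fit(X, y)
print("lasso weights:", np.round(lasso.coef_, 2))
# Both weights stay clearly nonzero: the noise-only voxel is needed to cancel the noise,
# so a nonzero (sparse) weight does not imply that the feature is target-related.
```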
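
Referenced from slides 26–28: the transformation of a backward model into a forward model via A = Σ_x W Σ_ŝ^{-1}, here in the single-component form a ∝ Σ_x w. The simulated ground-truth pattern and noise structure are assumptions of the sketch; the recovered pattern should be roughly proportional to the true one, while the raw filter weights are not.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(4)
T, M = 20_000, 6

# Assumed ground truth: x(t) = a_true*y(t) + correlated noise + sensor noise
y = rng.standard_normal(T)
a_true = np.array([1.0, 0.6, 0.0, 0.0, 0.0, 0.0])
d = rng.standard_normal(T)
X = (np.outer(y, a_true)
     + np.outer(d, [1.0, 1.0, 0.8, 0.8, 0.4, 0.2])
     + 0.2 * rng.standard_normal((T, M)))

# Backward model: a regularized linear regression; its coefficients are the filter w
w = Ridge(alpha=1.0).fit(X, y).coef_

# Transformation into a forward model, single-component case: a_hat = Sigma_x w / var(s_hat)
s_hat = X @ w                                        # extracted component
Sigma_x = np.cov(X, rowvar=False)                    # M x M data covariance
a_hat = Sigma_x @ w / s_hat.var()

print("filter w     :", np.round(w, 2))       # nonzero weights also on noise-only features
print("pattern a_hat:", np.round(a_hat, 2))   # roughly proportional to a_true
print("true pattern :", a_true)
```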
