Brain reading: Compressive sensing, fMRI, and statistical learning in Python                Ga¨l Varoquaux                ...
1 Brain reading: predictive models   2 Sparse recovery with correlated de-     signs   3 Having an impact: softwareG Varoq...
1 Brain reading: predictive     models   Functional brain imaging:     Study of human cognitionG Varoquaux                ...
1 Brain imaging  fMRI data > 50 000 voxels   stimuliG Varoquaux                             4
1 Brain imaging  fMRI data > 50 000 voxels                     stimuli                   Standard analysis              De...
1 Brain reading  Predicting the object category viewed   [Haxby 2001, Distributed and Overlapping   Representations of Fac...
1 Brain reading  Predicting the object category viewed   [Haxby 2001, Distributed and Overlapping   Representations of Fac...
1 Linear model for fMRI                sign(X w + e) = y                                      Target      Design      matr...
1 Estimation: statistical learning Inverse problem       Minimize an error term:                            w = argmin l(y...
1 Estimation: statistical learning Inverse problem                  Minimize an error term:                               ...
1 TV-penalization to promote regions                                                [Haxby 2001]    Neuroscientists think ...
1 Prediction with logistic regression - TV              w = argmin l(y − X w) + p(w)              ˆ                       ...
1 Prediction with logistic regression - TV              wStandard l(y − X w) + p(w)              ˆ = argmin analysis      ...
1 Standard analysis or predictive modeling?  Predicting the object category viewed   [Haxby 2001, Distributed and Overlapp...
1 Standard analysis or predictive modeling?         Recovery rather than predictionG Varoquaux                            ...
1 Good prediction = good recovery                                   Ground truth  Simulations    Ground truth  Lasso    Pr...
1 Brain mapping: a statistical perspective              Small sample linear model estimation              Random correlate...
1 Brain mapping: a statistical perspective               Small sample linear model estimation               Random correla...
1 Brain mapping as a sparse recovery task   Recovering brain regionsG Varoquaux                                  14
1 Brain mapping as a sparse recovery task   Recovering k non-zero coefficients     nmin ∼ 2 k log p     Restricted-isometry-...
1 Randomized sparsity                  [Meinshausen and Buhlmann 2010, Bach 2008]    Perturb the design matrix:     Subsam...
2 Sparse recovery with     correlated designs     Not enough samples: nmin ∼ 2 k log p     Spatial correlationsG Varoquaux...
2 Sparse recovery with     correlated designs                    Combining                      Clustering                ...
2 Brain parcellations    Spatially-connected hierarchical clustering    ⇒ reduces voxel numbers [Michel Pat Rec 2011]    R...
2 Brain parcellations + sparsity   Hypothesis: clustering compatible with support(w)   Benefits of clustering    Reduced k ...
2 Randomized parcellations + sparsity                                 Randomization                                 + Stab...
2 Algorithm 1 set n clusters and sparsity by cross-validation 2 loop: perturb randomly data 3     clustering to form reduc...
2 Simulations     p = 2048, k = 64, n = 256          (nmin > 1000)     Weights w: patches of varying size     Design matri...
2 When can we recover patches?   Smoothness helps (reduces noise degrees of freedom)   Small patches are hard to recoverG ...
2 What is the best method for patch recovery?   For small patches: elastic net   For large patches: randomized-clustered s...
2 Randomizing clusters matters!   Non-random (Ward) clustering inefficient   Fully-random performs quite well   Randomized W...
2 Randomizing clusters matters!                Degenerate family of cluster                assignements   Non-random (Ward...
2 fMRI: face vs house discrimination   [Haxby 2001]        F-scores                                             L         ...
2 fMRI: face vs house discrimination   [Haxby 2001]        1   Logistic                                             L     ...
2 fMRI: face vs house discrimination   [Haxby 2001]        Randomized   1   Logistic                                      ...
2 fMRI: face vs house discrimination       [Haxby 2001]        Randomized Clustered   1   Logistic                        ...
2 fMRI: face vs house discrimination   [Haxby 2001]        F-scores                                             L         ...
2 Predictive model on selected features   Object recognition [Haxby 2001]    Using recovered features improves predictionG...
Small-sample brain mapping              Sparse recovery of patches on              spatially-correlated designs   Ingredie...
3 Having an impact: software              How to we reach our target              audience (neuroscientists)?             ...
3 Python as a scientific environment     General purpose     Easy, readable syntax     Interactive (ipython)     Great scie...
3 Growing a software stack     Code lines are costly    ⇒ Open source + community driven      Need quality and impact     ...
3 scikit-learn: machine learning in PythonTechnical choices Prefer Python or Cython, focus on readability  Documentation a...
3 scikit-learn: machine learning in PythonAPI Inputs are numpy arrays  Learn a model from the data:   estimator.fit(X trai...
3 scikit-learn: machine learning in PythonComputational performance          scikit-learn mlpy pybrain pymvpa mdp shogunSV...
3 scikit-learn: machine learning in PythonCommunity 163 contributors since 2008, 397 github forks  25 contributors in late...
3 Research code = software library              Factor 10 in time investment     Corner cases in algorithm (numerical stab...
3 Research code = software library              Factor 10 in time investment     CornerTechnical + scientific tradeoffs     ...
Statistical learning to study brain function   Spatial regularization for   predictive models   Total variation    Compres...
Bibliography[Michel TMI 2011] V. Michel, et al., Total variation regularization forfMRI-based prediction of behaviour, IEE...
Upcoming SlideShare
Loading in …5
×

Brain reading, compressive sensing, fMRI and statistical learning in Python

1,900
-1

Published on

Talk given at Gipsa-lab on using machine learning to learn from fMRI brain patterns and regions related to behavior. This talks focuses on the signal and inverse-problem aspects of the equation, as well as on the software.

Published in: Technology
0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,900
On Slideshare
0
From Embeds
0
Number of Embeds
5
Actions
Shares
0
Downloads
78
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide

Brain reading, compressive sensing, fMRI and statistical learning in Python

  1. 1. Brain reading: Compressive sensing, fMRI, and statistical learning in Python Ga¨l Varoquaux e INRIA/Parietal
  2. 2. 1 Brain reading: predictive models 2 Sparse recovery with correlated de- signs 3 Having an impact: softwareG Varoquaux 2
  3. 3. 1 Brain reading: predictive models Functional brain imaging: Study of human cognitionG Varoquaux 3
  4. 4. 1 Brain imaging fMRI data > 50 000 voxels stimuliG Varoquaux 4
  5. 5. 1 Brain imaging fMRI data > 50 000 voxels stimuli Standard analysis Detect voxels that correlate to the stimuliG Varoquaux 4
  6. 6. 1 Brain reading Predicting the object category viewed [Haxby 2001, Distributed and Overlapping Representations of Faces and Objects in Ventral Temporal Cortex ] Supervised learning taskG Varoquaux 5
  7. 7. 1 Brain reading Predicting the object category viewed [Haxby 2001, Distributed and Overlapping Representations of Faces and Objects in Ventral Temporal Cortex ]combinations of voxels to Find predict the stimuli Supervised learning task Multi-variate statisticsG Varoquaux 5
  8. 8. 1 Linear model for fMRI sign(X w + e) = y Target Design matrix × Coefficients = Problem size: p > 50 000 n∼100 per categoryG Varoquaux 6
  9. 9. 1 Estimation: statistical learning Inverse problem Minimize an error term: w = argmin l(y − X w) ˆ w Ill-posed: X is not full rank Inject prior: regularize w = argmin l(y − X w) + p(w) ˆ wG Varoquaux 7
  10. 10. 1 Estimation: statistical learning Inverse problem Minimize an error term: w = argmin l(y − X w) ˆ w Example: Lasso = sparseis not full rank Ill-posed: X regression w = argmin y − X w 2 + 1 (w) ˆ 2 w 1 (w) = i |wi |Inject prior: regularize w = argmin l(y − X w) + p(w) ˆ wG Varoquaux 7
  11. 11. 1 TV-penalization to promote regions [Haxby 2001] Neuroscientists think in terms of brain regions Total-variation penalization Impose sparsity on the gradient of the image: p(w) = 1( w) [Michel TMI 2011]G Varoquaux 8
  12. 12. 1 Prediction with logistic regression - TV w = argmin l(y − X w) + p(w) ˆ w l: least-square or logistic-regression p: TV Optimization: proximal gradient (FISTA) - Gradient descent on l (smooth term) - Projections on TV Prediction performance: Feature screening + SVC 0.77 Sparse regression 0.78 Total Variation 0.84 (explained variance)G Varoquaux 9
  13. 13. 1 Prediction with logistic regression - TV wStandard l(y − X w) + p(w) ˆ = argmin analysis w l: least-square or logistic-regression p: TV Optimization: proximal gradient (FISTA) - Gradient descent on l (smooth term) - Projections on TV Prediction performance: Feature screening + SVC 0.77 Sparse regression 0.78 Total Variation 0.84 (explained variance)G Varoquaux 9
  14. 14. 1 Standard analysis or predictive modeling? Predicting the object category viewed [Haxby 2001, Distributed and Overlapping Representations of Faces and Objects in Ventral Temporal Cortex ] Take home message: brain regions, not predictionG Varoquaux 10
  15. 15. 1 Standard analysis or predictive modeling? Recovery rather than predictionG Varoquaux 11
  16. 16. 1 Good prediction = good recovery Ground truth Simulations Ground truth Lasso Prediction: 0.78 Recovery: 0.429 SVM Prediction: 0.71 Recovery: 0.486 Need a method suited for recoveryG Varoquaux 12
  17. 17. 1 Brain mapping: a statistical perspective Small sample linear model estimation Random correlated design × = Problem size: p > 50 000 n∼100 per categoryG Varoquaux 13
  18. 18. 1 Brain mapping: a statistical perspective Small sample linear model estimation Random correlated design Estimation strategy Standard approach: univariate statistics × Multiple comparisons problem ⇒ statistical power ∝ 1/p = We want sub-linear sample complexity ⇒ non-rotationally-invariant estimators e.g. 1 penalization [ Ng, 2004 Feature selection, 1 vs. 2 regularization, and rotational invariance ]G Varoquaux 13
  19. 19. 1 Brain mapping as a sparse recovery task Recovering brain regionsG Varoquaux 14
  20. 20. 1 Brain mapping as a sparse recovery task Recovering k non-zero coefficients nmin ∼ 2 k log p Restricted-isometry-like property: The design matrix is well-conditioned [Candes 2006] on sub-matrices of size > k [Tropp 2004] [Wainwright 2009] Mutual incoherence: Relevant features S and irrelevant ones S are not too correlated Violated by spatial correlations in our design lasso: 23 non-zerosG Varoquaux 14
  21. 21. 1 Randomized sparsity [Meinshausen and Buhlmann 2010, Bach 2008] Perturb the design matrix: Subsample the data Randomly rescale features + Run sparse estimator Keep features that are often selected ⇒ Good recovery without mutual incoherence But RIP-like condition Cannot recover large correlated groups For m correlated features, selection frequency divided by mG Varoquaux 15
  22. 22. 2 Sparse recovery with correlated designs Not enough samples: nmin ∼ 2 k log p Spatial correlationsG Varoquaux 16
  23. 23. 2 Sparse recovery with correlated designs Combining Clustering Sparsity [Varoquaux ICML 2012]G Varoquaux 16
  24. 24. 2 Brain parcellations Spatially-connected hierarchical clustering ⇒ reduces voxel numbers [Michel Pat Rec 2011] Replace features by corresponding cluster average + Use a supervised learner on reduced problem Cluster choice sub-optimal for regressionG Varoquaux 17
  25. 25. 2 Brain parcellations + sparsity Hypothesis: clustering compatible with support(w) Benefits of clustering Reduced k and p ⇒ n > nmin : good side of the “sharp threshold” Cluster together correlated features ⇒ Improves RIP-like conditions Recovery possible on reduced featuresG Varoquaux 18
  26. 26. 2 Randomized parcellations + sparsity Randomization + Stability scores Marginalize the cluster choice Relaxes mutual incoherence requirementG Varoquaux 19
  27. 27. 2 Algorithm 1 set n clusters and sparsity by cross-validation 2 loop: perturb randomly data 3 clustering to form reduced features 4 sparse linear model on reduced features 5 accumulate non-zero features 6 threshold map of apparition countsG Varoquaux 20
  28. 28. 2 Simulations p = 2048, k = 64, n = 256 (nmin > 1000) Weights w: patches of varying size Design matrix: 2D Gaussian random images of varying smoothness Estimators Randomized lasso Our approach Elastic Net Univariate F test Parameters set by cross-validation Performance metric Recovery seen as a 2-class problem ⇒ Report AUC of the precision-recall curveG Varoquaux 21
  29. 29. 2 When can we recover patches? Smoothness helps (reduces noise degrees of freedom) Small patches are hard to recoverG Varoquaux 22
  30. 30. 2 What is the best method for patch recovery? For small patches: elastic net For large patches: randomized-clustered sparsity Large patches and very smooth images: F-testG Varoquaux 23
  31. 31. 2 Randomizing clusters matters! Non-random (Ward) clustering inefficient Fully-random performs quite well Randomized Ward gives an extra gainG Varoquaux 24
  32. 32. 2 Randomizing clusters matters! Degenerate family of cluster assignements Non-random (Ward) clustering inefficient Fully-random performs quite well Randomized Ward gives an extra gainG Varoquaux 24
  33. 33. 2 fMRI: face vs house discrimination [Haxby 2001] F-scores L R L Ry=-31 x=17 z=-17 G Varoquaux 25
  34. 34. 2 fMRI: face vs house discrimination [Haxby 2001] 1 Logistic L R L Ry=-31 x=17 z=-17 G Varoquaux 25
  35. 35. 2 fMRI: face vs house discrimination [Haxby 2001] Randomized 1 Logistic L R L Ry=-31 x=17 z=-17 G Varoquaux 25
  36. 36. 2 fMRI: face vs house discrimination [Haxby 2001] Randomized Clustered 1 Logistic L R L Ry=-31 x=17 z=-17 G Varoquaux 25
  37. 37. 2 fMRI: face vs house discrimination [Haxby 2001] F-scores L R L Ry=-31 x=17 z=-17 G Varoquaux 25
  38. 38. 2 Predictive model on selected features Object recognition [Haxby 2001] Using recovered features improves predictionG Varoquaux 26
  39. 39. Small-sample brain mapping Sparse recovery of patches on spatially-correlated designs Ingredients: Clustering + Randomization ⇒ Reduced feature set compatible with recovery: matches sparsity pattern + recovery conditions Compressive sensing questions Can we recover k > n, in the case of large patches? When do we loose sub-linear sample-complexity?G Varoquaux 27
  40. 40. 3 Having an impact: software How to we reach our target audience (neuroscientists)? How do we disseminate our ideas? How do we facilitate new ideas?G Varoquaux 28
  41. 41. 3 Python as a scientific environment General purpose Easy, readable syntax Interactive (ipython) Great scientific libraries (numpy, scipy, matplotlib...)G Varoquaux 29
  42. 42. 3 Growing a software stack Code lines are costly ⇒ Open source + community driven Need quality and impact ⇒ Focus on the general purpose libraries first Scikit-learn: machine learning in Python http://scikit-learn.orgG Varoquaux 30
  43. 43. 3 scikit-learn: machine learning in PythonTechnical choices Prefer Python or Cython, focus on readability Documentation and examples are paramount Little object-oriented design. Opt for simplicity Prefer algorithms to framework Code quality: consistency and testingG Varoquaux 31
  44. 44. 3 scikit-learn: machine learning in PythonAPI Inputs are numpy arrays Learn a model from the data: estimator.fit(X train, Y train) Predict using learned model estimator.predict(X test) Test goodness of fit estimator.score(X test, y test) Apply change of representation estimator.transform(X, y)G Varoquaux 32
  45. 45. 3 scikit-learn: machine learning in PythonComputational performance scikit-learn mlpy pybrain pymvpa mdp shogunSVM 5.2 9.47 17.5 11.52 40.48 5.63LARS 1.17 105.3 - 37.35 - -Elastic Net 0.52 73.7 - 1.44 - -kNN 0.57 1.41 - 0.56 0.58 1.36PCA 0.18 - - 8.93 0.47 0.33k-Means 1.34 0.79 ∞ - 35.75 0.68 Algorithms rather than low-level optimization convex optimization + machine learning Avoid memory copiesG Varoquaux 33
  46. 46. 3 scikit-learn: machine learning in PythonCommunity 163 contributors since 2008, 397 github forks 25 contributors in latest release (3 months span)Why this success? Trendy topic? Low barrier of entry Friendly and very skilled mailing list Credit to peopleG Varoquaux 34
  47. 47. 3 Research code = software library Factor 10 in time investment Corner cases in algorithm (numerical stability) Multiple platforms and library versions (Blas ) Documentation Making it simpler (and get less educated users) User and developer support ( ∼ 100 mails/day) Exhausting, but has impact on science and societyG Varoquaux 35
  48. 48. 3 Research code = software library Factor 10 in time investment CornerTechnical + scientific tradeoffs cases in algorithm (numerical stability) Multiple platforms andof use rather than speed Ease of install/ease library versions (Blas ) Documentation science” Focus on “old Making it simpler (and get less educatednot Nice publications and theorems are users) a recipe for useful code User and developer support ( ∼ 100 mails/day) Exhausting, but has impact on science and societyG Varoquaux 35
  49. 49. Statistical learning to study brain function Spatial regularization for predictive models Total variation Compressive-sensing approach Sparsity + randomized clustering for correlated designs Machine learning in Python Huge impact Post-doc positions availableG Varoquaux 36
  50. 50. Bibliography[Michel TMI 2011] V. Michel, et al., Total variation regularization forfMRI-based prediction of behaviour, IEEE Transactions in medicalimaging (2011)http://hal.inria.fr/inria-00563468/en[Varoquaux ICML 2012] G. Varoquaux, A. Gramfort, B. ThirionSmall-sample brain mapping: sparse recovery on spatially correlateddesigns with randomization and clustering, ICML (2012)http://hal.inria.fr/hal-00705192/en[Michel Pat Rec 2011] V. Michel, et al., A supervised clustering approachfor fMRI-based inference of brain states, Pattern Recognition (2011)http://hal.inria.fr/inria-00589201/en[Pedregosa ICML 2011] F. Pedregosa, el al., Scikit-learn: machinelearning in Python, JMRL (2011)http://hal.inria.fr/hal-00650905/enG Varoquaux 37

×