Python for brain mining:(neuro)science with state of the art machine learningand data visualization                     Ga...
1 Brain mining    Learning models of brain functionGa¨l Varoquaux  e                                     2
1 Imaging neuroscience         Brain                      Models of        images                      function           ...
1 Imaging neuroscience         Brain                  Models of        images                  function                 Da...
1 Brain functional data                                 Rich data                                  50 000 voxels per      ...
1 Brain functional data                                Rich data                                 50 000 voxels per        ...
1 Statistics: the curse of dimensionality       y function of x1Ga¨l Varoquaux  e                                         ...
1 Statistics: the curse of dimensionality       y function of x1      y function of x1 and x2                 More fit para...
1 Statistics: the curse of dimensionality       y function of x1      y function of x1 and x2                 More fit para...
1 Brain reading    Predict from brain images the object viewedCorrelation analysisGa¨l Varoquaux  e                       ...
1 Brain reading    Predict from brain images the object viewed  Inverse problem                                Inject prio...
1 Brain reading    Predict from brain images the object viewed  Inverse problem                                Inject prio...
1 Brain reading    Predict from brain images the object viewed  Inverse problem                               Inject prior...
1 On-going/spontaneous activity                                      95% of the                                       acti...
1 Learning regions from spontaneous activity                                                   Spatial                    ...
1 Graphical models: interactions between regions                        Estimate covariance structure                     ...
1 Graphical models: interactions between regions                         Estimate covariance structure                    ...
2 My data-science software stack    Mayavi, scikit-learn, joblibGa¨l Varoquaux  e                                   10
2 Mayavi: 3D data visualizationRequirements              Solution large 3D data              VTK: C++ data visualization i...
2 scikit-learn: statistical learningVision Address non-machine-learning experts  Simplify but don’t dumb down  Performance...
2 scikit-learn: statistical learningTechnical choices Prefer Python or Cython, focus on readability  Documentation and exa...
2 scikit-learn: statistical learningAPI Inputs are numpy arrays  Learn a model from the data:   estimator.fit(X train, Y t...
2 scikit-learn: statistical learningComputational performance          scikit-learn mlpy pybrain pymvpa    mdp shogunSVM  ...
2 scikit-learn: statistical learningCommunity 35 contributors since 2008, 103 github forks  25 contributors in latest rele...
2 joblib: Python functions on steroids We keep recomputing the same things  Nested loops with overlapping sub-problems    ...
2 joblib: Python functions on steroids Philosophy  Simple                    don’t change your code   Minimal             ...
2 joblib  Lazy recomputing>>>   from j o b l i b import Memory>>>   mem = Memory ( c a c h e d i r = ’/ tmp / joblib ’)>>>...
Conclusion                 Data-driven science will need                 machine learning because of the                 c...
Upcoming SlideShare
Loading in …5
×

Python for brain mining: (neuro)science with state of the art machine learning and data visualization

39,418 views

Published on

Talk given at scipy 2011

Published in: Technology, Education
2 Comments
19 Likes
Statistics
Notes
  • Amazing Work , Amusing and informative..Thanks
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Wow! Nice presentation. Thanks.
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
No Downloads
Views
Total views
39,418
On SlideShare
0
From Embeds
0
Number of Embeds
388
Actions
Shares
0
Downloads
226
Comments
2
Likes
19
Embeds 0
No embeds

No notes for slide

Python for brain mining: (neuro)science with state of the art machine learning and data visualization

  1. Python for brain mining:(neuro)science with state of the art machine learningand data visualization Ga¨l Varoquaux e 1. Data-driven science “Brain mining” 2. Data mining in Python Mayavi, scikit-learn, joblib
  2. 1 Brain mining Learning models of brain functionGa¨l Varoquaux e 2
  3. 1 Imaging neuroscience Brain Models of images function Cognitive tasksGa¨l Varoquaux e 3
  4. 1 Imaging neuroscience Brain Models of images function Data-driven science ∂ i Cognitive= HΨ Ψ tasks ∂tGa¨l Varoquaux e 3
  5. 1 Brain functional data Rich data 50 000 voxels per frame Complex underlying dynamics Few observations ∼ 100 Drawing scientific conclusions? Ill-posed statistical problemGa¨l Varoquaux e 4
  6. 1 Brain functional data Rich data 50 000 voxels per frame Modern complex system studies: Complex underlying from strong hypothesizes to rich data dynamics Few observations ∼ 100 Drawing scientific conclusions? Ill-posed statistical problemGa¨l Varoquaux e 4
  7. 1 Statistics: the curse of dimensionality y function of x1Ga¨l Varoquaux e 5
  8. 1 Statistics: the curse of dimensionality y function of x1 y function of x1 and x2 More fit parameters? ⇒ need exponentially more dataGa¨l Varoquaux e 5
  9. 1 Statistics: the curse of dimensionality y function of x1 y function of x1 and x2 More fit parameters? ⇒ need exponentially more data y function of 50 000 voxels? Expert knowledge Machine learning (pick the right ones)Ga¨l Varoquaux e 5
  10. 1 Brain reading Predict from brain images the object viewedCorrelation analysisGa¨l Varoquaux e 6
  11. 1 Brain reading Predict from brain images the object viewed Inverse problem Inject prior: regularize Observations Spatial code Sparse regressionCorrelation analysis = compressive sensingGa¨l Varoquaux e 6
  12. 1 Brain reading Predict from brain images the object viewed Inverse problem Inject prior: regularize Extract brain regions Observations Spatial code Total variationCorrelation analysis regression [Michel, Trans Med Imag 2011]Ga¨l Varoquaux e 6
  13. 1 Brain reading Predict from brain images the object viewed Inverse problem Inject prior: regularize Cast the problem in Extract brain regions a prediction task: Observations supervised Total variation Spatial code learning.Correlation analysis a model-selection metric Prediction is regression [Michel, Trans Med Imag 2011]Ga¨l Varoquaux e 6
  14. 1 On-going/spontaneous activity 95% of the activity is unrelated to taskGa¨l Varoquaux e 7
  15. 1 Learning regions from spontaneous activity Spatial Time series map Multi-subject dictionary learning Sparsity + spatial continuity + spatial variability ⇒ Individual maps + functional regions atlas [Varoquaux, Inf Proc Med Imag 2011]Ga¨l Varoquaux e 8
  16. 1 Graphical models: interactions between regions Estimate covariance structure Many parameters to learn Regularize: conditional independence = sparsity on inverse covariance [Varoquaux NIPS 2010]Ga¨l Varoquaux e 9
  17. 1 Graphical models: interactions between regions Estimate covariance structure Many parameters to learn Regularize: conditional Find structure via a density estimation: independence unsupervised learning. = sparsity on inverse Model selection: likelihood of new data covariance [Varoquaux NIPS 2010]Ga¨l Varoquaux e 9
  18. 2 My data-science software stack Mayavi, scikit-learn, joblibGa¨l Varoquaux e 10
  19. 2 Mayavi: 3D data visualizationRequirements Solution large 3D data VTK: C++ data visualization interactive visualization UI (traits) easy scripting + pylab-inspired API Black-box solutions don’t yield new intuitionsLimitations hard to install Tragedy of the clunky & complex commons or C++ leaking through niche product? 3D visualization doesn’t pay in academiaGa¨l Varoquaux e 11
  20. 2 scikit-learn: statistical learningVision Address non-machine-learning experts Simplify but don’t dumb down Performance: be state of the art Ease of installationGa¨l Varoquaux e 12
  21. 2 scikit-learn: statistical learningTechnical choices Prefer Python or Cython, focus on readability Documentation and examples are paramount Little object-oriented design. Opt for simplicity Prefer algorithms to framework Code quality: consistency and testingGa¨l Varoquaux e 13
  22. 2 scikit-learn: statistical learningAPI Inputs are numpy arrays Learn a model from the data: estimator.fit(X train, Y train) Predict using learned model estimator.predict(X test) Test goodness of fit estimator.score(X test, y test) Apply change of representation estimator.transform(X, y)Ga¨l Varoquaux e 14
  23. 2 scikit-learn: statistical learningComputational performance scikit-learn mlpy pybrain pymvpa mdp shogunSVM 5.2 9.47 17.5 11.52 40.48 5.63LARS 1.17 105.3 - 37.35 - -Elastic Net 0.52 73.7 - 1.44 - -kNN 0.57 1.41 - 0.56 0.58 1.36PCA 0.18 - - 8.93 0.47 0.33k-Means 1.34 0.79 ∞ - 35.75 0.68 Algorithms rather than low-level optimization convex optimization + machine learning Avoid memory copiesGa¨l Varoquaux e 15
  24. 2 scikit-learn: statistical learningCommunity 35 contributors since 2008, 103 github forks 25 contributors in latest release (3 months span)Why this success? Trendy topic? Low barrier of entry Friendly and very skilled mailing list Credit to peopleGa¨l Varoquaux e 16
  25. 2 joblib: Python functions on steroids We keep recomputing the same things Nested loops with overlapping sub-problems Varying parameters I/O Standard solution: pipelines Challenges Dependencies modeling Parameter trackingGa¨l Varoquaux e 17
  26. 2 joblib: Python functions on steroids Philosophy Simple don’t change your code Minimal no dependencies Performant big data Robust never fail joblib’s solution = lazy recomputation: Take an MD5 hash of function arguments, Store outputs to diskGa¨l Varoquaux e 18
  27. 2 joblib Lazy recomputing>>> from j o b l i b import Memory>>> mem = Memory ( c a c h e d i r = ’/ tmp / joblib ’)>>> import numpy a s np>>> a = np . v a n d e r ( np . a r a n g e (3) )>>> s q u a r e = mem. c a c h e ( np . s q u a r e )>>> b = square (a)[ Memory ] C a l l i n g s q u a r e ...s q u a r e ( a r r a y ([[0 , 0 , 1] , [1 , 1 , 1] , [4 , 2 , 1]]) ) s q u a r e - 0.0 s>>> c = s q u a r e ( a )>>> # No recomputationGa¨l Varoquaux e 19
  28. Conclusion Data-driven science will need machine learning because of the curse of dimensionality Scikit-learn and joblib: focus on large-data performance and easy of use Cannot develop software and science separatelyGa¨l Varoquaux e 20

×