Python for brain mining: (neuro)science with state of the art machine learning and data visualization
Upcoming SlideShare
Loading in...5
×
 

Like this? Share it with your network

Share

Python for brain mining: (neuro)science with state of the art machine learning and data visualization

on

  • 36,136 views

Talk given at scipy 2011

Talk given at scipy 2011

Statistics

Views

Total Views
36,136
Views on SlideShare
35,791
Embed Views
345

Actions

Likes
14
Downloads
194
Comments
2

27 Embeds 345

http://daviddeboer.blogspot.com 76
http://www.claudepenland.com 49
http://era.pkr 42
https://twitter.com 35
http://paper.li 34
http://twitter.com 17
http://localhost:60459 15
http://teste.l3go.com.br 15
http://presentworth.blogspot.com 13
http://www.daviddeboer.blogspot.com 10
http://edicolaeuropea.blogspot.com 6
http://gael-varoquaux.info 4
http://daviddeboer.blogspot.in 3
http://daviddeboer.blogspot.ca 3
http://era.test6.smartcommerce.pl 3
http://www.daviddeboer.blogspot.be 3
http://tweetedtimes.com 2
http://a0.twimg.com 2
http://blog.l3go.com.br 2
http://www.daviddeboer.blogspot.nl 2
http://besoselectricos.blogspot.com 2
http://pinterest.com 2
https://si0.twimg.com 1
http://edicolaeuropea.blogspot.in 1
http://edicolaeuropea.blogspot.com.au 1
http://daviddeboer.blogspot.nl 1
http://test.l3go.com.br 1
More...

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
  • Amazing Work , Amusing and informative..Thanks
    Are you sure you want to
    Your message goes here
    Processing…
  • Wow! Nice presentation. Thanks.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Python for brain mining: (neuro)science with state of the art machine learning and data visualization Presentation Transcript

  • 1. Python for brain mining:(neuro)science with state of the art machine learningand data visualization Ga¨l Varoquaux e 1. Data-driven science “Brain mining” 2. Data mining in Python Mayavi, scikit-learn, joblib
  • 2. 1 Brain mining Learning models of brain functionGa¨l Varoquaux e 2
  • 3. 1 Imaging neuroscience Brain Models of images function Cognitive tasksGa¨l Varoquaux e 3
  • 4. 1 Imaging neuroscience Brain Models of images function Data-driven science ∂ i Cognitive= HΨ Ψ tasks ∂tGa¨l Varoquaux e 3
  • 5. 1 Brain functional data Rich data 50 000 voxels per frame Complex underlying dynamics Few observations ∼ 100 Drawing scientific conclusions? Ill-posed statistical problemGa¨l Varoquaux e 4
  • 6. 1 Brain functional data Rich data 50 000 voxels per frame Modern complex system studies: Complex underlying from strong hypothesizes to rich data dynamics Few observations ∼ 100 Drawing scientific conclusions? Ill-posed statistical problemGa¨l Varoquaux e 4
  • 7. 1 Statistics: the curse of dimensionality y function of x1Ga¨l Varoquaux e 5
  • 8. 1 Statistics: the curse of dimensionality y function of x1 y function of x1 and x2 More fit parameters? ⇒ need exponentially more dataGa¨l Varoquaux e 5
  • 9. 1 Statistics: the curse of dimensionality y function of x1 y function of x1 and x2 More fit parameters? ⇒ need exponentially more data y function of 50 000 voxels? Expert knowledge Machine learning (pick the right ones)Ga¨l Varoquaux e 5
  • 10. 1 Brain reading Predict from brain images the object viewedCorrelation analysisGa¨l Varoquaux e 6
  • 11. 1 Brain reading Predict from brain images the object viewed Inverse problem Inject prior: regularize Observations Spatial code Sparse regressionCorrelation analysis = compressive sensingGa¨l Varoquaux e 6
  • 12. 1 Brain reading Predict from brain images the object viewed Inverse problem Inject prior: regularize Extract brain regions Observations Spatial code Total variationCorrelation analysis regression [Michel, Trans Med Imag 2011]Ga¨l Varoquaux e 6
  • 13. 1 Brain reading Predict from brain images the object viewed Inverse problem Inject prior: regularize Cast the problem in Extract brain regions a prediction task: Observations supervised Total variation Spatial code learning.Correlation analysis a model-selection metric Prediction is regression [Michel, Trans Med Imag 2011]Ga¨l Varoquaux e 6
  • 14. 1 On-going/spontaneous activity 95% of the activity is unrelated to taskGa¨l Varoquaux e 7
  • 15. 1 Learning regions from spontaneous activity Spatial Time series map Multi-subject dictionary learning Sparsity + spatial continuity + spatial variability ⇒ Individual maps + functional regions atlas [Varoquaux, Inf Proc Med Imag 2011]Ga¨l Varoquaux e 8
  • 16. 1 Graphical models: interactions between regions Estimate covariance structure Many parameters to learn Regularize: conditional independence = sparsity on inverse covariance [Varoquaux NIPS 2010]Ga¨l Varoquaux e 9
  • 17. 1 Graphical models: interactions between regions Estimate covariance structure Many parameters to learn Regularize: conditional Find structure via a density estimation: independence unsupervised learning. = sparsity on inverse Model selection: likelihood of new data covariance [Varoquaux NIPS 2010]Ga¨l Varoquaux e 9
  • 18. 2 My data-science software stack Mayavi, scikit-learn, joblibGa¨l Varoquaux e 10
  • 19. 2 Mayavi: 3D data visualizationRequirements Solution large 3D data VTK: C++ data visualization interactive visualization UI (traits) easy scripting + pylab-inspired API Black-box solutions don’t yield new intuitionsLimitations hard to install Tragedy of the clunky & complex commons or C++ leaking through niche product? 3D visualization doesn’t pay in academiaGa¨l Varoquaux e 11
  • 20. 2 scikit-learn: statistical learningVision Address non-machine-learning experts Simplify but don’t dumb down Performance: be state of the art Ease of installationGa¨l Varoquaux e 12
  • 21. 2 scikit-learn: statistical learningTechnical choices Prefer Python or Cython, focus on readability Documentation and examples are paramount Little object-oriented design. Opt for simplicity Prefer algorithms to framework Code quality: consistency and testingGa¨l Varoquaux e 13
  • 22. 2 scikit-learn: statistical learningAPI Inputs are numpy arrays Learn a model from the data: estimator.fit(X train, Y train) Predict using learned model estimator.predict(X test) Test goodness of fit estimator.score(X test, y test) Apply change of representation estimator.transform(X, y)Ga¨l Varoquaux e 14
  • 23. 2 scikit-learn: statistical learningComputational performance scikit-learn mlpy pybrain pymvpa mdp shogunSVM 5.2 9.47 17.5 11.52 40.48 5.63LARS 1.17 105.3 - 37.35 - -Elastic Net 0.52 73.7 - 1.44 - -kNN 0.57 1.41 - 0.56 0.58 1.36PCA 0.18 - - 8.93 0.47 0.33k-Means 1.34 0.79 ∞ - 35.75 0.68 Algorithms rather than low-level optimization convex optimization + machine learning Avoid memory copiesGa¨l Varoquaux e 15
  • 24. 2 scikit-learn: statistical learningCommunity 35 contributors since 2008, 103 github forks 25 contributors in latest release (3 months span)Why this success? Trendy topic? Low barrier of entry Friendly and very skilled mailing list Credit to peopleGa¨l Varoquaux e 16
  • 25. 2 joblib: Python functions on steroids We keep recomputing the same things Nested loops with overlapping sub-problems Varying parameters I/O Standard solution: pipelines Challenges Dependencies modeling Parameter trackingGa¨l Varoquaux e 17
  • 26. 2 joblib: Python functions on steroids Philosophy Simple don’t change your code Minimal no dependencies Performant big data Robust never fail joblib’s solution = lazy recomputation: Take an MD5 hash of function arguments, Store outputs to diskGa¨l Varoquaux e 18
  • 27. 2 joblib Lazy recomputing>>> from j o b l i b import Memory>>> mem = Memory ( c a c h e d i r = ’/ tmp / joblib ’)>>> import numpy a s np>>> a = np . v a n d e r ( np . a r a n g e (3) )>>> s q u a r e = mem. c a c h e ( np . s q u a r e )>>> b = square (a)[ Memory ] C a l l i n g s q u a r e ...s q u a r e ( a r r a y ([[0 , 0 , 1] , [1 , 1 , 1] , [4 , 2 , 1]]) ) s q u a r e - 0.0 s>>> c = s q u a r e ( a )>>> # No recomputationGa¨l Varoquaux e 19
  • 28. Conclusion Data-driven science will need machine learning because of the curse of dimensionality Scikit-learn and joblib: focus on large-data performance and easy of use Cannot develop software and science separatelyGa¨l Varoquaux e 20