SlideShare a Scribd company logo
1 of 28
Download to read offline
Python for brain mining:
(neuro)science with state of the art machine learning
and data visualization
                     Ga¨l Varoquaux
                       e


                           1. Data-driven science
                            “Brain mining”

                           2. Data mining in Python
                            Mayavi, scikit-learn, joblib
1 Brain mining
    Learning models of brain function




Ga¨l Varoquaux
  e                                     2
1 Imaging neuroscience



         Brain                      Models of
        images                      function




                  Cognitive tasks




Ga¨l Varoquaux
  e                                             3
1 Imaging neuroscience



         Brain                  Models of
        images                  function
                 Data-driven science
                        ∂
                    i Cognitive= HΨ
                           Ψ tasks
                        ∂t




Ga¨l Varoquaux
  e                                         3
1 Brain functional data


                                 Rich data
                                  50 000 voxels per
                                  frame
                                  Complex underlying
                                  dynamics

                                 Few observations
                                  ∼ 100


                  Drawing scientific conclusions?
                   Ill-posed statistical problem
Ga¨l Varoquaux
  e                                                    4
1 Brain functional data


                                Rich data
                                 50 000 voxels per
                                 frame
          Modern complex system studies:
                                 Complex underlying
       from strong hypothesizes to rich data
                                 dynamics

                                 Few observations
                                  ∼ 100


                  Drawing scientific conclusions?
                   Ill-posed statistical problem
Ga¨l Varoquaux
  e                                                   4
1 Statistics: the curse of dimensionality
       y function of x1




Ga¨l Varoquaux
  e                                           5
1 Statistics: the curse of dimensionality
       y function of x1      y function of x1 and x2




                 More fit parameters?
             ⇒ need exponentially more data




Ga¨l Varoquaux
  e                                                    5
1 Statistics: the curse of dimensionality
       y function of x1      y function of x1 and x2




                 More fit parameters?
             ⇒ need exponentially more data

             y function of 50 000 voxels?
     Expert knowledge
                                Machine learning
    (pick the right ones)

Ga¨l Varoquaux
  e                                                    5
1 Brain reading
    Predict from brain images the object viewed




Correlation analysis




Ga¨l Varoquaux
  e                                               6
1 Brain reading
    Predict from brain images the object viewed
  Inverse problem
                                Inject prior: regularize



  Observations   Spatial code
                                Sparse regression
Correlation analysis            = compressive sensing




Ga¨l Varoquaux
  e                                                        6
1 Brain reading
    Predict from brain images the object viewed
  Inverse problem
                                Inject prior: regularize


                                Extract brain regions
  Observations   Spatial code
                                Total variation
Correlation analysis            regression




                                 [Michel, Trans Med Imag 2011]
Ga¨l Varoquaux
  e                                                              6
1 Brain reading
    Predict from brain images the object viewed
  Inverse problem
                               Inject prior: regularize

     Cast the problem in Extract brain regions
                             a prediction task:
 Observations  supervised Total variation
               Spatial code learning.
Correlation analysis a model-selection metric
     Prediction is          regression




                                [Michel, Trans Med Imag 2011]
Ga¨l Varoquaux
  e                                                             6
1 On-going/spontaneous activity




                                      95% of the
                                       activity is
                                    unrelated to task




Ga¨l Varoquaux
  e                                                     7
1 Learning regions from spontaneous activity
                                                   Spatial
                               Time series             map
  Multi-subject dictionary
  learning

     Sparsity + spatial continuity + spatial variability

 ⇒ Individual maps      + functional regions atlas




                            [Varoquaux, Inf Proc Med Imag 2011]
Ga¨l Varoquaux
  e                                                               8
1 Graphical models: interactions between regions

                        Estimate covariance structure
                          Many parameters to learn

                             Regularize: conditional
                             independence
                             = sparsity on inverse
                               covariance



                                 [Varoquaux NIPS 2010]



Ga¨l Varoquaux
  e                                                      9
1 Graphical models: interactions between regions

                         Estimate covariance structure
                           Many parameters to learn

                             Regularize: conditional
      Find structure via a density estimation:
                             independence
               unsupervised learning.
                               = sparsity on inverse
      Model selection: likelihood of new data
                                 covariance



                                  [Varoquaux NIPS 2010]



Ga¨l Varoquaux
  e                                                       9
2 My data-science software stack
    Mayavi, scikit-learn, joblib




Ga¨l Varoquaux
  e                                   10
2 Mayavi: 3D data visualization
Requirements              Solution
 large 3D data              VTK: C++ data visualization
 interactive visualization UI (traits)
 easy scripting            + pylab-inspired API
        Black-box solutions don’t yield new intuitions
Limitations
  hard to install                     Tragedy of the
  clunky & complex                     commons or
  C++ leaking through                 niche product?
  3D visualization doesn’t
  pay in academia
Ga¨l Varoquaux
  e                                                      11
2 scikit-learn: statistical learning
Vision
 Address non-machine-learning experts
  Simplify but don’t dumb down
  Performance: be state of the art
  Ease of installation




Ga¨l Varoquaux
  e                                     12
2 scikit-learn: statistical learning
Technical choices
 Prefer Python or Cython, focus on readability
  Documentation and examples are paramount
  Little object-oriented design. Opt for simplicity
  Prefer algorithms to framework
  Code quality: consistency and testing




Ga¨l Varoquaux
  e                                                   13
2 scikit-learn: statistical learning
API
 Inputs are numpy arrays
  Learn a model from the data:
   estimator.fit(X train, Y train)
  Predict using learned model
   estimator.predict(X test)
  Test goodness of fit
   estimator.score(X test, y test)
  Apply change of representation
   estimator.transform(X, y)


Ga¨l Varoquaux
  e                                     14
2 scikit-learn: statistical learning
Computational performance
          scikit-learn mlpy pybrain pymvpa    mdp shogun
SVM           5.2       9.47 17.5     11.52   40.48 5.63
LARS         1.17      105.3    -     37.35      -    -
Elastic Net 0.52        73.7    -      1.44      -    -
kNN           0.57      1.41    -     0.56     0.58 1.36
PCA          0.18         -     -      8.93    0.47 0.33
k-Means       1.34      0.79    ∞        -    35.75 0.68
  Algorithms rather than low-level optimization
  convex optimization + machine learning

  Avoid memory copies

Ga¨l Varoquaux
  e                                                   15
2 scikit-learn: statistical learning
Community
 35 contributors since 2008, 103 github forks
  25 contributors in latest release (3 months span)


Why this success?
 Trendy topic?
  Low barrier of entry
  Friendly and very skilled mailing list
  Credit to people


Ga¨l Varoquaux
  e                                                   16
2 joblib: Python functions on steroids
 We keep recomputing the same things
  Nested loops with overlapping sub-problems
    Varying parameters                         I/O


      Standard solution:
          pipelines

 Challenges
  Dependencies modeling
    Parameter tracking


Ga¨l Varoquaux
  e                                                  17
2 joblib: Python functions on steroids
 Philosophy
  Simple                    don’t change your code
   Minimal                         no dependencies
   Performant                              big data
   Robust                                  never fail

     joblib’s solution = lazy recomputation:
        Take an MD5 hash of function arguments,
        Store outputs to disk



Ga¨l Varoquaux
  e                                                     18
2 joblib
  Lazy recomputing
>>>   from j o b l i b import Memory
>>>   mem = Memory ( c a c h e d i r = ’/ tmp / joblib ’)
>>>   import numpy a s np
>>>   a = np . v a n d e r ( np . a r a n g e (3) )
>>>   s q u a r e = mem. c a c h e ( np . s q u a r e )
>>>   b = square (a)

[ Memory ] C a l l i n g s q u a r e ...
s q u a r e ( a r r a y ([[0 , 0 , 1] ,
              [1 , 1 , 1] ,
              [4 , 2 , 1]]) )
                                           s q u a r e - 0.0 s
>>> c = s q u a r e ( a )
>>> # No recomputation
Ga¨l Varoquaux
  e                                                              19
Conclusion

                 Data-driven science will need
                 machine learning because of the
                 curse of dimensionality

                 Scikit-learn and joblib: focus on
                 large-data performance and easy of
                 use

  Cannot develop software and science separately
Ga¨l Varoquaux
  e                                                   20

More Related Content

What's hot

Estimating Functional Connectomes: Sparsity’s Strength and Limitations
Estimating Functional Connectomes: Sparsity’s Strength and LimitationsEstimating Functional Connectomes: Sparsity’s Strength and Limitations
Estimating Functional Connectomes: Sparsity’s Strength and LimitationsGael Varoquaux
 
Dirty data science machine learning on non-curated data
Dirty data science machine learning on non-curated dataDirty data science machine learning on non-curated data
Dirty data science machine learning on non-curated dataGael Varoquaux
 
Representation learning in limited-data settings
Representation learning in limited-data settingsRepresentation learning in limited-data settings
Representation learning in limited-data settingsGael Varoquaux
 
A hand-waving introduction to sparsity for compressed tomography reconstruction
A hand-waving introduction to sparsity for compressed tomography reconstructionA hand-waving introduction to sparsity for compressed tomography reconstruction
A hand-waving introduction to sparsity for compressed tomography reconstructionGael Varoquaux
 
Inter-site autism biomarkers from resting state fMRI
Inter-site autism biomarkers from resting state fMRIInter-site autism biomarkers from resting state fMRI
Inter-site autism biomarkers from resting state fMRIGael Varoquaux
 
Building a cutting-edge data processing environment on a budget
Building a cutting-edge data processing environment on a budgetBuilding a cutting-edge data processing environment on a budget
Building a cutting-edge data processing environment on a budgetGael Varoquaux
 
On the code of data science
On the code of data scienceOn the code of data science
On the code of data scienceGael Varoquaux
 
Similarity encoding for learning on dirty categorical variables
Similarity encoding for learning on dirty categorical variablesSimilarity encoding for learning on dirty categorical variables
Similarity encoding for learning on dirty categorical variablesGael Varoquaux
 
Processing biggish data on commodity hardware: simple Python patterns
Processing biggish data on commodity hardware: simple Python patternsProcessing biggish data on commodity hardware: simple Python patterns
Processing biggish data on commodity hardware: simple Python patternsGael Varoquaux
 
Succeeding in academia despite doing good_software
Succeeding in academia despite doing good_softwareSucceeding in academia despite doing good_software
Succeeding in academia despite doing good_softwareGael Varoquaux
 
Data-driven Hypothesis Management
Data-driven Hypothesis ManagementData-driven Hypothesis Management
Data-driven Hypothesis Managementbgoncalves2
 
Computational practices for reproducible science
Computational practices for reproducible scienceComputational practices for reproducible science
Computational practices for reproducible scienceGael Varoquaux
 
Connectomics: Parcellations and Network Analysis Methods
Connectomics: Parcellations and Network Analysis MethodsConnectomics: Parcellations and Network Analysis Methods
Connectomics: Parcellations and Network Analysis MethodsGael Varoquaux
 
Open Source Scientific Software
Open Source Scientific SoftwareOpen Source Scientific Software
Open Source Scientific SoftwareGael Varoquaux
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Universitat Politècnica de Catalunya
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 

What's hot (18)

Estimating Functional Connectomes: Sparsity’s Strength and Limitations
Estimating Functional Connectomes: Sparsity’s Strength and LimitationsEstimating Functional Connectomes: Sparsity’s Strength and Limitations
Estimating Functional Connectomes: Sparsity’s Strength and Limitations
 
Dirty data science machine learning on non-curated data
Dirty data science machine learning on non-curated dataDirty data science machine learning on non-curated data
Dirty data science machine learning on non-curated data
 
Representation learning in limited-data settings
Representation learning in limited-data settingsRepresentation learning in limited-data settings
Representation learning in limited-data settings
 
A hand-waving introduction to sparsity for compressed tomography reconstruction
A hand-waving introduction to sparsity for compressed tomography reconstructionA hand-waving introduction to sparsity for compressed tomography reconstruction
A hand-waving introduction to sparsity for compressed tomography reconstruction
 
Inter-site autism biomarkers from resting state fMRI
Inter-site autism biomarkers from resting state fMRIInter-site autism biomarkers from resting state fMRI
Inter-site autism biomarkers from resting state fMRI
 
Building a cutting-edge data processing environment on a budget
Building a cutting-edge data processing environment on a budgetBuilding a cutting-edge data processing environment on a budget
Building a cutting-edge data processing environment on a budget
 
On the code of data science
On the code of data scienceOn the code of data science
On the code of data science
 
Similarity encoding for learning on dirty categorical variables
Similarity encoding for learning on dirty categorical variablesSimilarity encoding for learning on dirty categorical variables
Similarity encoding for learning on dirty categorical variables
 
Processing biggish data on commodity hardware: simple Python patterns
Processing biggish data on commodity hardware: simple Python patternsProcessing biggish data on commodity hardware: simple Python patterns
Processing biggish data on commodity hardware: simple Python patterns
 
Succeeding in academia despite doing good_software
Succeeding in academia despite doing good_softwareSucceeding in academia despite doing good_software
Succeeding in academia despite doing good_software
 
Data-driven Hypothesis Management
Data-driven Hypothesis ManagementData-driven Hypothesis Management
Data-driven Hypothesis Management
 
Computational practices for reproducible science
Computational practices for reproducible scienceComputational practices for reproducible science
Computational practices for reproducible science
 
Connectomics: Parcellations and Network Analysis Methods
Connectomics: Parcellations and Network Analysis MethodsConnectomics: Parcellations and Network Analysis Methods
Connectomics: Parcellations and Network Analysis Methods
 
Open Source Scientific Software
Open Source Scientific SoftwareOpen Source Scientific Software
Open Source Scientific Software
 
2018 Modern Math Workshop - Foundations of Statistical Learning Theory: Quint...
2018 Modern Math Workshop - Foundations of Statistical Learning Theory: Quint...2018 Modern Math Workshop - Foundations of Statistical Learning Theory: Quint...
2018 Modern Math Workshop - Foundations of Statistical Learning Theory: Quint...
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
 

Viewers also liked

Neuroplasticity lesson pwpt
Neuroplasticity lesson pwptNeuroplasticity lesson pwpt
Neuroplasticity lesson pwptsarahbousquet
 
Neuroplasticity & technology
Neuroplasticity & technologyNeuroplasticity & technology
Neuroplasticity & technologyFelix Morgan
 
Train your brain: Neuroplasticity and Brain Fitness
Train your brain: Neuroplasticity and Brain FitnessTrain your brain: Neuroplasticity and Brain Fitness
Train your brain: Neuroplasticity and Brain FitnessjessicaeldridgeASU
 
Brain Science Applying Neuroplasticity Principles To Higher Education
Brain Science Applying Neuroplasticity Principles To Higher EducationBrain Science Applying Neuroplasticity Principles To Higher Education
Brain Science Applying Neuroplasticity Principles To Higher Educationsmarkbarnes
 
Neuroplasticity Psychology IB
Neuroplasticity Psychology IBNeuroplasticity Psychology IB
Neuroplasticity Psychology IBMette Morell
 
Machine Learning in 5 Minutes— Classification
Machine Learning in 5 Minutes— ClassificationMachine Learning in 5 Minutes— Classification
Machine Learning in 5 Minutes— ClassificationBrian Lange
 
Neuroplasticity
NeuroplasticityNeuroplasticity
NeuroplasticityLMRF
 
Neuroplasticity
NeuroplasticityNeuroplasticity
NeuroplasticitySerphaty
 
Introduccion a Machine Learning
Introduccion a Machine LearningIntroduccion a Machine Learning
Introduccion a Machine LearningStratebi
 
Neuronal plasticity
Neuronal plasticityNeuronal plasticity
Neuronal plasticitySyed Nadir
 
Neuroplasticity and the Science of Habit Formation, Case Study ZenFriend.com
Neuroplasticity and the Science of Habit Formation, Case Study ZenFriend.comNeuroplasticity and the Science of Habit Formation, Case Study ZenFriend.com
Neuroplasticity and the Science of Habit Formation, Case Study ZenFriend.comRemo Uherek
 
Neuroplasticity presentation
Neuroplasticity presentationNeuroplasticity presentation
Neuroplasticity presentationneandergal
 
Introduction to Neuroplasticity & its application in neuro rehabilitation
Introduction to Neuroplasticity & its application in neuro rehabilitationIntroduction to Neuroplasticity & its application in neuro rehabilitation
Introduction to Neuroplasticity & its application in neuro rehabilitationPhinoj K Abraham
 
Neurology & neuroplasticity
Neurology & neuroplasticityNeurology & neuroplasticity
Neurology & neuroplasticityFernando Molina L.
 
Neuroplasticity: The Brain That Changes Itself
Neuroplasticity: The Brain That Changes ItselfNeuroplasticity: The Brain That Changes Itself
Neuroplasticity: The Brain That Changes ItselfS'eclairer
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningRahul Jain
 

Viewers also liked (20)

Mind, Brain, and Education: How Cognitive & Neuro Science Inform Educational ...
Mind, Brain, and Education: How Cognitive & Neuro Science Inform Educational ...Mind, Brain, and Education: How Cognitive & Neuro Science Inform Educational ...
Mind, Brain, and Education: How Cognitive & Neuro Science Inform Educational ...
 
Neuroplasticity lesson pwpt
Neuroplasticity lesson pwptNeuroplasticity lesson pwpt
Neuroplasticity lesson pwpt
 
Neuroplasticity
NeuroplasticityNeuroplasticity
Neuroplasticity
 
Neuroplasticity & technology
Neuroplasticity & technologyNeuroplasticity & technology
Neuroplasticity & technology
 
Neuroplasticity
NeuroplasticityNeuroplasticity
Neuroplasticity
 
Train your brain: Neuroplasticity and Brain Fitness
Train your brain: Neuroplasticity and Brain FitnessTrain your brain: Neuroplasticity and Brain Fitness
Train your brain: Neuroplasticity and Brain Fitness
 
Brain Science Applying Neuroplasticity Principles To Higher Education
Brain Science Applying Neuroplasticity Principles To Higher EducationBrain Science Applying Neuroplasticity Principles To Higher Education
Brain Science Applying Neuroplasticity Principles To Higher Education
 
Neuroplasticity Psychology IB
Neuroplasticity Psychology IBNeuroplasticity Psychology IB
Neuroplasticity Psychology IB
 
Machine Learning in 5 Minutes— Classification
Machine Learning in 5 Minutes— ClassificationMachine Learning in 5 Minutes— Classification
Machine Learning in 5 Minutes— Classification
 
Neuroplasticity
NeuroplasticityNeuroplasticity
Neuroplasticity
 
Neuroplasticity
NeuroplasticityNeuroplasticity
Neuroplasticity
 
Introduccion a Machine Learning
Introduccion a Machine LearningIntroduccion a Machine Learning
Introduccion a Machine Learning
 
Neuronal plasticity
Neuronal plasticityNeuronal plasticity
Neuronal plasticity
 
Neuroplasticity and the Science of Habit Formation, Case Study ZenFriend.com
Neuroplasticity and the Science of Habit Formation, Case Study ZenFriend.comNeuroplasticity and the Science of Habit Formation, Case Study ZenFriend.com
Neuroplasticity and the Science of Habit Formation, Case Study ZenFriend.com
 
Neuroplasticity presentation
Neuroplasticity presentationNeuroplasticity presentation
Neuroplasticity presentation
 
15. neuroplasticity
15. neuroplasticity15. neuroplasticity
15. neuroplasticity
 
Introduction to Neuroplasticity & its application in neuro rehabilitation
Introduction to Neuroplasticity & its application in neuro rehabilitationIntroduction to Neuroplasticity & its application in neuro rehabilitation
Introduction to Neuroplasticity & its application in neuro rehabilitation
 
Neurology & neuroplasticity
Neurology & neuroplasticityNeurology & neuroplasticity
Neurology & neuroplasticity
 
Neuroplasticity: The Brain That Changes Itself
Neuroplasticity: The Brain That Changes ItselfNeuroplasticity: The Brain That Changes Itself
Neuroplasticity: The Brain That Changes Itself
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 

Similar to Python for brain mining: (neuro)science with state of the art machine learning and data visualization

Brain reading, compressive sensing, fMRI and statistical learning in Python
Brain reading, compressive sensing, fMRI and statistical learning in PythonBrain reading, compressive sensing, fMRI and statistical learning in Python
Brain reading, compressive sensing, fMRI and statistical learning in PythonGael Varoquaux
 
Can we recover meaning full spatial information from multivariate pattern ana...
Can we recover meaning full spatial information from multivariate pattern ana...Can we recover meaning full spatial information from multivariate pattern ana...
Can we recover meaning full spatial information from multivariate pattern ana...Gael Varoquaux
 
Towards psychoinformatics with machine learning and brain imaging
Towards psychoinformatics with machine learning and brain imagingTowards psychoinformatics with machine learning and brain imaging
Towards psychoinformatics with machine learning and brain imagingGael Varoquaux
 
Recent advances of AI for medical imaging : Engineering perspectives
Recent advances of AI for medical imaging : Engineering perspectivesRecent advances of AI for medical imaging : Engineering perspectives
Recent advances of AI for medical imaging : Engineering perspectivesNamkug Kim
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learningkkkc
 
Machine Learning Live
Machine Learning LiveMachine Learning Live
Machine Learning LiveMike Anderson
 
Puneet Singla
Puneet SinglaPuneet Singla
Puneet Singlapsingla
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learningbutest
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learningbutest
 
Generalizing Scientific Machine Learning and Differentiable Simulation Beyond...
Generalizing Scientific Machine Learning and Differentiable Simulation Beyond...Generalizing Scientific Machine Learning and Differentiable Simulation Beyond...
Generalizing Scientific Machine Learning and Differentiable Simulation Beyond...Chris Rackauckas
 
Neural networks and deep learning
Neural networks and deep learningNeural networks and deep learning
Neural networks and deep learningRADO7900
 
The Science of Cyber Security Experimentation: The DETER Project
The Science of Cyber Security Experimentation: The DETER ProjectThe Science of Cyber Security Experimentation: The DETER Project
The Science of Cyber Security Experimentation: The DETER ProjectDETER-Project
 
Short presentation about my thesis
Short presentation about my thesis Short presentation about my thesis
Short presentation about my thesis fourthplanet
 
Problems with CNNs and Introduction to capsule neural networks
Problems with CNNs and Introduction to capsule neural networksProblems with CNNs and Introduction to capsule neural networks
Problems with CNNs and Introduction to capsule neural networksVipul Vaibhaw
 
An introduc on to Machine Learning
An introduc on to Machine LearningAn introduc on to Machine Learning
An introduc on to Machine Learningbutest
 
Bessensap2014_Boris Reuderink
Bessensap2014_Boris ReuderinkBessensap2014_Boris Reuderink
Bessensap2014_Boris ReuderinkNWOBessensap
 
Neural Networks-introduction_with_prodecure.pptx
Neural Networks-introduction_with_prodecure.pptxNeural Networks-introduction_with_prodecure.pptx
Neural Networks-introduction_with_prodecure.pptxRatuRumana3
 
Multivariate analyses & decoding
Multivariate analyses & decodingMultivariate analyses & decoding
Multivariate analyses & decodingkhbrodersen
 

Similar to Python for brain mining: (neuro)science with state of the art machine learning and data visualization (20)

Brain reading, compressive sensing, fMRI and statistical learning in Python
Brain reading, compressive sensing, fMRI and statistical learning in PythonBrain reading, compressive sensing, fMRI and statistical learning in Python
Brain reading, compressive sensing, fMRI and statistical learning in Python
 
Can we recover meaning full spatial information from multivariate pattern ana...
Can we recover meaning full spatial information from multivariate pattern ana...Can we recover meaning full spatial information from multivariate pattern ana...
Can we recover meaning full spatial information from multivariate pattern ana...
 
Towards psychoinformatics with machine learning and brain imaging
Towards psychoinformatics with machine learning and brain imagingTowards psychoinformatics with machine learning and brain imaging
Towards psychoinformatics with machine learning and brain imaging
 
Lecture11 - neural networks
Lecture11 - neural networksLecture11 - neural networks
Lecture11 - neural networks
 
Recent advances of AI for medical imaging : Engineering perspectives
Recent advances of AI for medical imaging : Engineering perspectivesRecent advances of AI for medical imaging : Engineering perspectives
Recent advances of AI for medical imaging : Engineering perspectives
 
Lecture7 - IBk
Lecture7 - IBkLecture7 - IBk
Lecture7 - IBk
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Machine Learning Live
Machine Learning LiveMachine Learning Live
Machine Learning Live
 
Puneet Singla
Puneet SinglaPuneet Singla
Puneet Singla
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Generalizing Scientific Machine Learning and Differentiable Simulation Beyond...
Generalizing Scientific Machine Learning and Differentiable Simulation Beyond...Generalizing Scientific Machine Learning and Differentiable Simulation Beyond...
Generalizing Scientific Machine Learning and Differentiable Simulation Beyond...
 
Neural networks and deep learning
Neural networks and deep learningNeural networks and deep learning
Neural networks and deep learning
 
The Science of Cyber Security Experimentation: The DETER Project
The Science of Cyber Security Experimentation: The DETER ProjectThe Science of Cyber Security Experimentation: The DETER Project
The Science of Cyber Security Experimentation: The DETER Project
 
Short presentation about my thesis
Short presentation about my thesis Short presentation about my thesis
Short presentation about my thesis
 
Problems with CNNs and Introduction to capsule neural networks
Problems with CNNs and Introduction to capsule neural networksProblems with CNNs and Introduction to capsule neural networks
Problems with CNNs and Introduction to capsule neural networks
 
An introduc on to Machine Learning
An introduc on to Machine LearningAn introduc on to Machine Learning
An introduc on to Machine Learning
 
Bessensap2014_Boris Reuderink
Bessensap2014_Boris ReuderinkBessensap2014_Boris Reuderink
Bessensap2014_Boris Reuderink
 
Neural Networks-introduction_with_prodecure.pptx
Neural Networks-introduction_with_prodecure.pptxNeural Networks-introduction_with_prodecure.pptx
Neural Networks-introduction_with_prodecure.pptx
 
Multivariate analyses & decoding
Multivariate analyses & decodingMultivariate analyses & decoding
Multivariate analyses & decoding
 

More from Gael Varoquaux

Evaluating machine learning models and their diagnostic value
Evaluating machine learning models and their diagnostic valueEvaluating machine learning models and their diagnostic value
Evaluating machine learning models and their diagnostic valueGael Varoquaux
 
Measuring mental health with machine learning and brain imaging
Measuring mental health with machine learning and brain imagingMeasuring mental health with machine learning and brain imaging
Measuring mental health with machine learning and brain imagingGael Varoquaux
 
Machine learning with missing values
Machine learning with missing valuesMachine learning with missing values
Machine learning with missing valuesGael Varoquaux
 
Better neuroimaging data processing: driven by evidence, open communities, an...
Better neuroimaging data processing: driven by evidence, open communities, an...Better neuroimaging data processing: driven by evidence, open communities, an...
Better neuroimaging data processing: driven by evidence, open communities, an...Gael Varoquaux
 
Functional-connectome biomarkers to meet clinical needs?
Functional-connectome biomarkers to meet clinical needs?Functional-connectome biomarkers to meet clinical needs?
Functional-connectome biomarkers to meet clinical needs?Gael Varoquaux
 
Atlases of cognition with large-scale human brain mapping
Atlases of cognition with large-scale human brain mappingAtlases of cognition with large-scale human brain mapping
Atlases of cognition with large-scale human brain mappingGael Varoquaux
 
Simple representations for learning: factorizations and similarities
Simple representations for learning: factorizations and similarities Simple representations for learning: factorizations and similarities
Simple representations for learning: factorizations and similarities Gael Varoquaux
 
Coding for science and innovation
Coding for science and innovationCoding for science and innovation
Coding for science and innovationGael Varoquaux
 
Scientist meets web dev: how Python became the language of data
Scientist meets web dev: how Python became the language of dataScientist meets web dev: how Python became the language of data
Scientist meets web dev: how Python became the language of dataGael Varoquaux
 
Scikit-learn: the state of the union 2016
Scikit-learn: the state of the union 2016Scikit-learn: the state of the union 2016
Scikit-learn: the state of the union 2016Gael Varoquaux
 
Scikit-learn: apprentissage statistique en Python. Créer des machines intelli...
Scikit-learn: apprentissage statistique en Python. Créer des machines intelli...Scikit-learn: apprentissage statistique en Python. Créer des machines intelli...
Scikit-learn: apprentissage statistique en Python. Créer des machines intelli...Gael Varoquaux
 

More from Gael Varoquaux (11)

Evaluating machine learning models and their diagnostic value
Evaluating machine learning models and their diagnostic valueEvaluating machine learning models and their diagnostic value
Evaluating machine learning models and their diagnostic value
 
Measuring mental health with machine learning and brain imaging
Measuring mental health with machine learning and brain imagingMeasuring mental health with machine learning and brain imaging
Measuring mental health with machine learning and brain imaging
 
Machine learning with missing values
Machine learning with missing valuesMachine learning with missing values
Machine learning with missing values
 
Better neuroimaging data processing: driven by evidence, open communities, an...
Better neuroimaging data processing: driven by evidence, open communities, an...Better neuroimaging data processing: driven by evidence, open communities, an...
Better neuroimaging data processing: driven by evidence, open communities, an...
 
Functional-connectome biomarkers to meet clinical needs?
Functional-connectome biomarkers to meet clinical needs?Functional-connectome biomarkers to meet clinical needs?
Functional-connectome biomarkers to meet clinical needs?
 
Atlases of cognition with large-scale human brain mapping
Atlases of cognition with large-scale human brain mappingAtlases of cognition with large-scale human brain mapping
Atlases of cognition with large-scale human brain mapping
 
Simple representations for learning: factorizations and similarities
Simple representations for learning: factorizations and similarities Simple representations for learning: factorizations and similarities
Simple representations for learning: factorizations and similarities
 
Coding for science and innovation
Coding for science and innovationCoding for science and innovation
Coding for science and innovation
 
Scientist meets web dev: how Python became the language of data
Scientist meets web dev: how Python became the language of dataScientist meets web dev: how Python became the language of data
Scientist meets web dev: how Python became the language of data
 
Scikit-learn: the state of the union 2016
Scikit-learn: the state of the union 2016Scikit-learn: the state of the union 2016
Scikit-learn: the state of the union 2016
 
Scikit-learn: apprentissage statistique en Python. Créer des machines intelli...
Scikit-learn: apprentissage statistique en Python. Créer des machines intelli...Scikit-learn: apprentissage statistique en Python. Créer des machines intelli...
Scikit-learn: apprentissage statistique en Python. Créer des machines intelli...
 

Recently uploaded

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Python for brain mining: (neuro)science with state of the art machine learning and data visualization

  • 1. Python for brain mining: (neuro)science with state of the art machine learning and data visualization Ga¨l Varoquaux e 1. Data-driven science “Brain mining” 2. Data mining in Python Mayavi, scikit-learn, joblib
  • 2. 1 Brain mining Learning models of brain function Ga¨l Varoquaux e 2
  • 3. 1 Imaging neuroscience Brain Models of images function Cognitive tasks Ga¨l Varoquaux e 3
  • 4. 1 Imaging neuroscience Brain Models of images function Data-driven science ∂ i Cognitive= HΨ Ψ tasks ∂t Ga¨l Varoquaux e 3
  • 5. 1 Brain functional data Rich data 50 000 voxels per frame Complex underlying dynamics Few observations ∼ 100 Drawing scientific conclusions? Ill-posed statistical problem Ga¨l Varoquaux e 4
  • 6. 1 Brain functional data Rich data 50 000 voxels per frame Modern complex system studies: Complex underlying from strong hypothesizes to rich data dynamics Few observations ∼ 100 Drawing scientific conclusions? Ill-posed statistical problem Ga¨l Varoquaux e 4
  • 7. 1 Statistics: the curse of dimensionality y function of x1 Ga¨l Varoquaux e 5
  • 8. 1 Statistics: the curse of dimensionality y function of x1 y function of x1 and x2 More fit parameters? ⇒ need exponentially more data Ga¨l Varoquaux e 5
  • 9. 1 Statistics: the curse of dimensionality y function of x1 y function of x1 and x2 More fit parameters? ⇒ need exponentially more data y function of 50 000 voxels? Expert knowledge Machine learning (pick the right ones) Ga¨l Varoquaux e 5
  • 10. 1 Brain reading Predict from brain images the object viewed Correlation analysis Ga¨l Varoquaux e 6
  • 11. 1 Brain reading Predict from brain images the object viewed Inverse problem Inject prior: regularize Observations Spatial code Sparse regression Correlation analysis = compressive sensing Ga¨l Varoquaux e 6
  • 12. 1 Brain reading Predict from brain images the object viewed Inverse problem Inject prior: regularize Extract brain regions Observations Spatial code Total variation Correlation analysis regression [Michel, Trans Med Imag 2011] Ga¨l Varoquaux e 6
  • 13. 1 Brain reading Predict from brain images the object viewed Inverse problem Inject prior: regularize Cast the problem in Extract brain regions a prediction task: Observations supervised Total variation Spatial code learning. Correlation analysis a model-selection metric Prediction is regression [Michel, Trans Med Imag 2011] Ga¨l Varoquaux e 6
  • 14. 1 On-going/spontaneous activity 95% of the activity is unrelated to task Ga¨l Varoquaux e 7
  • 15. 1 Learning regions from spontaneous activity Spatial Time series map Multi-subject dictionary learning Sparsity + spatial continuity + spatial variability ⇒ Individual maps + functional regions atlas [Varoquaux, Inf Proc Med Imag 2011] Ga¨l Varoquaux e 8
  • 16. 1 Graphical models: interactions between regions Estimate covariance structure Many parameters to learn Regularize: conditional independence = sparsity on inverse covariance [Varoquaux NIPS 2010] Ga¨l Varoquaux e 9
  • 17. 1 Graphical models: interactions between regions Estimate covariance structure Many parameters to learn Regularize: conditional Find structure via a density estimation: independence unsupervised learning. = sparsity on inverse Model selection: likelihood of new data covariance [Varoquaux NIPS 2010] Ga¨l Varoquaux e 9
  • 18. 2 My data-science software stack Mayavi, scikit-learn, joblib Ga¨l Varoquaux e 10
  • 19. 2 Mayavi: 3D data visualization Requirements Solution large 3D data VTK: C++ data visualization interactive visualization UI (traits) easy scripting + pylab-inspired API Black-box solutions don’t yield new intuitions Limitations hard to install Tragedy of the clunky & complex commons or C++ leaking through niche product? 3D visualization doesn’t pay in academia Ga¨l Varoquaux e 11
  • 20. 2 scikit-learn: statistical learning Vision Address non-machine-learning experts Simplify but don’t dumb down Performance: be state of the art Ease of installation Ga¨l Varoquaux e 12
  • 21. 2 scikit-learn: statistical learning Technical choices Prefer Python or Cython, focus on readability Documentation and examples are paramount Little object-oriented design. Opt for simplicity Prefer algorithms to framework Code quality: consistency and testing Ga¨l Varoquaux e 13
  • 22. 2 scikit-learn: statistical learning API Inputs are numpy arrays Learn a model from the data: estimator.fit(X train, Y train) Predict using learned model estimator.predict(X test) Test goodness of fit estimator.score(X test, y test) Apply change of representation estimator.transform(X, y) Ga¨l Varoquaux e 14
  • 23. 2 scikit-learn: statistical learning Computational performance scikit-learn mlpy pybrain pymvpa mdp shogun SVM 5.2 9.47 17.5 11.52 40.48 5.63 LARS 1.17 105.3 - 37.35 - - Elastic Net 0.52 73.7 - 1.44 - - kNN 0.57 1.41 - 0.56 0.58 1.36 PCA 0.18 - - 8.93 0.47 0.33 k-Means 1.34 0.79 ∞ - 35.75 0.68 Algorithms rather than low-level optimization convex optimization + machine learning Avoid memory copies Ga¨l Varoquaux e 15
  • 24. 2 scikit-learn: statistical learning Community 35 contributors since 2008, 103 github forks 25 contributors in latest release (3 months span) Why this success? Trendy topic? Low barrier of entry Friendly and very skilled mailing list Credit to people Ga¨l Varoquaux e 16
  • 25. 2 joblib: Python functions on steroids We keep recomputing the same things Nested loops with overlapping sub-problems Varying parameters I/O Standard solution: pipelines Challenges Dependencies modeling Parameter tracking Ga¨l Varoquaux e 17
  • 26. 2 joblib: Python functions on steroids Philosophy Simple don’t change your code Minimal no dependencies Performant big data Robust never fail joblib’s solution = lazy recomputation: Take an MD5 hash of function arguments, Store outputs to disk Ga¨l Varoquaux e 18
  • 27. 2 joblib Lazy recomputing >>> from j o b l i b import Memory >>> mem = Memory ( c a c h e d i r = ’/ tmp / joblib ’) >>> import numpy a s np >>> a = np . v a n d e r ( np . a r a n g e (3) ) >>> s q u a r e = mem. c a c h e ( np . s q u a r e ) >>> b = square (a) [ Memory ] C a l l i n g s q u a r e ... s q u a r e ( a r r a y ([[0 , 0 , 1] , [1 , 1 , 1] , [4 , 2 , 1]]) ) s q u a r e - 0.0 s >>> c = s q u a r e ( a ) >>> # No recomputation Ga¨l Varoquaux e 19
  • 28. Conclusion Data-driven science will need machine learning because of the curse of dimensionality Scikit-learn and joblib: focus on large-data performance and easy of use Cannot develop software and science separately Ga¨l Varoquaux e 20