LIGHTNING, A LIBRARY FOR
LARGE-SCALE MACHINE
LEARNING IN PYTHON
Fabian Pedregosa (1), Mathieu Blondel (2)
(1) Chaire Havas-Dauphine / INRIA, Paris, France
(2) NTT Communication Science Laboratories, Kyoto, Japan
SCIKIT-LEARN: WITH GREAT CODE
COMES GREAT RESPONSIBILITY
[Figure: number of lines of code in scikit-learn]
Very selective for new algorithms/models.
LIGHTNING
Incorporate recent progress in large-scale optimization.
scikit-learn compatible.
Scalable on large datasets.
Support for dense and sparse input.
Emphasis on structured sparsity penalties.
Dependencies: Python + Cython + scikit-learn.
SCIKIT-LEARN COMPATIBLE
Mix lightning with scikit-learn Pipeline, GridSearchCV, etc.
FROM LARGE DATA TO LARGE
OPTIMIZATION
Big data comes in different flavors.
[Figure: data matrix with n rows (samples) and p columns (features)]
Large sample (n): computer vision, advertising, etc.
Large dimension (p): biology, neuroscience, etc.
LEARNING FROM LARGE SAMPLES
Usual methods (gradient descent, BFGS, etc.):
Pass through the data at each iteration.
Prohibitive for large datasets.
Back to simple methods:
Stochastic gradient descent (Robbins and Monro, 1951).
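The contrast above can be sketched on a toy least-squares problem: a full-gradient method reads all n samples per step, whereas stochastic gradient descent updates the weights from one randomly chosen sample at a time. This is a pure-Python illustration, not lightning's implementation; the names `make_data` and `sgd_least_squares`, the step-size schedule, and the toy data are assumptions made for the example.

```python
import random


def make_data(n, w_true, noise=0.01, seed=0):
    """Toy regression data: y = <w_true, x> + small Gaussian noise."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in w_true] for _ in range(n)]
    y = [sum(wj * xj for wj, xj in zip(w_true, x)) + rng.gauss(0.0, noise)
         for x in X]
    return X, y


def sgd_least_squares(X, y, n_epochs=20, step0=0.1, seed=0):
    """SGD on the terms 0.5 * (x_i . w - y_i)^2, one sample per update."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    w = [0.0] * p
    t = 0
    for _ in range(n_epochs * n):
        i = rng.randrange(n)               # pick a single sample at random
        t += 1
        step = step0 / (1.0 + t / n)       # slowly decaying step size
        residual = sum(wj * xj for wj, xj in zip(w, X[i])) - y[i]
        # the gradient of the i-th term is residual * x_i
        w = [wj - step * residual * xj for wj, xj in zip(w, X[i])]
    return w


X, y = make_data(500, [1.0, -2.0, 0.5])
w = sgd_least_squares(X, y)                # close to [1.0, -2.0, 0.5]
```

Each update costs O(p) regardless of n, which is what makes the method attractive when n is large.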
LEARNING FROM LARGE SAMPLES
lightning example, n = 100,000
In the last 5 years, a flurry of new stochastic methods:
Stochastic Variance-Reduced Gradient (SVRG)
Stochastic Dual Coordinate Ascent (SDCA)
Stochastic Average Gradient (SAG/SAGA)
They are all in lightning!
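To give a feel for how these variance-reduced methods differ from plain SGD, here is a minimal pure-Python sketch of the SAGA update on least squares. It keeps the last gradient seen for every sample and corrects each stochastic step with their running average. It is an illustration only, not lightning's code; the function name `saga_least_squares` and the default step size 1/(3L), with L the largest squared row norm, are assumptions of this sketch.

```python
import random


def saga_least_squares(X, y, step=None, n_epochs=30, seed=0):
    """SAGA for (1/n) * sum_i 0.5 * (x_i . w - y_i)^2.

    For a linear model the i-th gradient is residual_i * x_i, so one
    stored scalar per sample is enough to remember past gradients.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    if step is None:
        # assumed default: 1 / (3 L), L = largest squared row norm
        step = 1.0 / (3.0 * max(sum(v * v for v in x) for x in X))
    w = [0.0] * p
    memory = [0.0] * n         # last residual seen for each sample
    avg_grad = [0.0] * p       # average of the stored gradients
    for _ in range(n_epochs * n):
        i = rng.randrange(n)
        x = X[i]
        residual = sum(wj * xj for wj, xj in zip(w, x)) - y[i]
        delta = residual - memory[i]
        # direction = new gradient - stored gradient + average of stored
        w = [wj - step * (delta * xj + gj)
             for wj, xj, gj in zip(w, x, avg_grad)]
        # keep the running average and the memory consistent
        avg_grad = [gj + delta * xj / n for gj, xj in zip(avg_grad, x)]
        memory[i] = residual
    return w
```

Unlike SGD, the step size stays constant: the correction term removes the gradient noise as the iterates approach the solution.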
LEARNING FROM LARGE FEATURES
Iterate through the columns.
Coordinate Descent-like algorithms.
Very efficient for sparse models.
(Blondel et al. 2013), multiclass classification with group-lasso penalty
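As a rough illustration of iterating through the columns, the sketch below runs cyclic coordinate descent on the plain Lasso (a simpler problem than the multiclass group-lasso setting cited above). Each coordinate update has a closed form via soft-thresholding, and coefficients that stay at zero cost almost nothing. The function names and the objective scaling are assumptions of this example, not lightning's API.

```python
def soft_threshold(z, t):
    """Proximal operator of t * |.|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, alpha, n_sweeps=100):
    """Cyclic coordinate descent for 0.5 * ||y - Xw||^2 + alpha * ||w||_1."""
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]  # column access
    sq_norms = [sum(v * v for v in col) for col in cols]
    w = [0.0] * p
    residual = list(y)                    # y - Xw, kept up to date
    for _ in range(n_sweeps):
        for j in range(p):
            if sq_norms[j] == 0.0:
                continue                  # empty column: nothing to fit
            col = cols[j]
            # correlation of column j with the residual that ignores w_j
            rho = (sum(c * r for c, r in zip(col, residual))
                   + sq_norms[j] * w[j])
            w_new = soft_threshold(rho, alpha) / sq_norms[j]
            if w_new != w[j]:
                diff = w_new - w[j]
                residual = [r - diff * c for r, c in zip(residual, col)]
                w[j] = w_new
    return w


w = lasso_cd([[1.0, 0.0], [0.0, 2.0]], [3.0, 1.0], alpha=2.5)
# the second coefficient is thresholded exactly to zero
```

The residual is only recomputed when a coefficient actually changes, which is why this family of methods works well when the solution is sparse.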
STRUCTURED SPARSITY
There's so much more than the Lasso ...
Group sparse penalty.
Total variation.
Trace norm (low rank).
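To make two of these penalties concrete, the sketch below evaluates the group-sparse penalty and a one-dimensional total variation, and applies the proximal operator of the group penalty (block soft-thresholding), which zeroes out whole groups at once. The function names and the non-overlapping-groups assumption are choices made for this illustration, not lightning's interface. The trace norm is left out because its proximal operator needs a singular value decomposition.

```python
import math


def group_lasso_penalty(w, groups):
    """Sum of the Euclidean norms of the coefficient groups."""
    return sum(math.sqrt(sum(w[j] ** 2 for j in g)) for g in groups)


def total_variation_1d(w):
    """Sum of absolute differences between neighbouring coefficients."""
    return sum(abs(b - a) for a, b in zip(w, w[1:]))


def prox_group_lasso(w, groups, t):
    """Block soft-thresholding: prox of t * group_lasso_penalty.

    Each group is shrunk toward zero as a whole; a group whose norm is
    at most t becomes exactly zero. Groups are assumed not to overlap.
    """
    out = list(w)
    for g in groups:
        norm = math.sqrt(sum(w[j] ** 2 for j in g))
        scale = max(0.0, 1.0 - t / norm) if norm > 0.0 else 0.0
        for j in g:
            out[j] = scale * w[j]
    return out


shrunk = prox_group_lasso([3.0, 4.0, 0.1, -0.1], [[0, 1], [2, 3]], t=1.0)
# first group is scaled by 0.8, second group is removed entirely
```

Compared with the Lasso, which selects individual coefficients, the group penalty selects or discards whole blocks of them.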
API
Similarities and differences with scikit-learn
scikit-learn:
LogisticRegression(penalty='l1', solver='liblinear')
  LogisticRegression: loss function
  solver='liblinear': algorithm
lightning:
CDClassifier(penalty='l1', loss='log')
  CDClassifier: algorithm
  loss='log': loss function
API based on algorithms, not models.
EXTENSIBILITY
Typical losses and penalties available.
Possible to pass a custom loss or penalty function:
clf = FistaClassifier(
    loss=my_loss,
    penalty=my_penalty)
(available for Fista* and SAGA*)
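The idea behind pluggable losses and penalties can be sketched with a small accelerated proximal gradient (FISTA-style) loop that only needs two callables: the gradient of the smooth loss and the proximal operator of the penalty. This is a hypothetical stand-alone illustration; `fista`, `my_loss_grad`, and `my_penalty_prox` are names invented here and do not reproduce the interface that lightning expects for custom losses or penalties.

```python
import math


def fista(grad, prox, w0, step, n_iter=200):
    """Accelerated proximal gradient with user-supplied callables.

    grad(w)       -> gradient of the smooth loss at w
    prox(v, step) -> proximal operator of step * penalty at v
    """
    w = list(w0)
    z = list(w0)               # extrapolated point
    t = 1.0
    for _ in range(n_iter):
        g = grad(z)
        w_next = prox([zj - step * gj for zj, gj in zip(z, g)], step)
        t_next = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * t * t))
        momentum = (t - 1.0) / t_next
        z = [wn + momentum * (wn - wo) for wn, wo in zip(w_next, w)]
        w, t = w_next, t_next
    return w


# Illustrative custom pieces: squared distance to a target, L1 penalty.
target = [3.0, -0.2, 0.0]
alpha = 0.5


def my_loss_grad(w):
    # gradient of 0.5 * ||w - target||^2
    return [wj - tj for wj, tj in zip(w, target)]


def my_penalty_prox(v, step):
    # soft-thresholding: prox of step * alpha * ||.||_1
    thr = step * alpha
    return [math.copysign(max(abs(vj) - thr, 0.0), vj) for vj in v]


w = fista(my_loss_grad, my_penalty_prox, [0.0, 0.0, 0.0], step=0.5)
# approaches [2.5, 0.0, 0.0], the soft-thresholded target
```

Because the solver never inspects the loss or the penalty beyond these two calls, swapping in a different model only means supplying different callables.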
FUTURE CHALLENGES
Parallel stochastic methods (Leblond, Pedregosa, Lacoste-Julien 2016).
Out of core (scale beyond computer memory).
SCIKIT-LEARN-CONTRIB
lightning is just the beginning.
Welcome projects that are:
scikit-learn compatible.
Documented.
Test coverage > 80%.
THANKS FOR YOUR ATTENTION
http://contrib.scikit-learn.org/lightning/
(We're hiring!)
