Scikit-learn: an incomplete yearly review
Gaël Varoquaux
scikit-learn: machine learning in Python
Trends with
1 The library
2 The community
G Varoquaux 2
1 The library
1 In 0.18 oldies but goodies
New cross-validation objects V.R. Rajagopalan
Before 0.18 (sklearn.cross_validation):
from sklearn.cross_validation import StratifiedKFold
cv = StratifiedKFold(y, n_folds=2)
for train, test in cv:
    X_train = X[train]
    y_train = y[train]
From 0.18 (sklearn.model_selection):
from sklearn.model_selection import StratifiedKFold
cv = StratifiedKFold(n_splits=2)
for train, test in cv.split(X, y):
    X_train = X[train]
    y_train = y[train]
⇒ better nested-CV
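Because the new splitters no longer take the data in their constructor, the same object can be reused on any subset of the data, which is what nested cross-validation needs. A minimal sketch, assuming a synthetic dataset and a logistic regression with an arbitrary grid over C (neither comes from the slides):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# The splitters are data-independent: they only receive X and y in .split().
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)

# Inner loop: hyper-parameter selection. Outer loop: performance estimate.
search = GridSearchCV(
    LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}, cv=inner_cv
)
scores = cross_val_score(search, X, y, cv=outer_cv)
print(scores.mean())
```

The inner splitter is applied to each outer training fold, which was not possible when the splitter was bound to a fixed y at construction time.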
PCA == Randomized PCA G. Patrini
Heuristic to switch PCA to randomized linear algebra
Fights global warming
Huge speed gains for biggish data
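A small sketch of what this looks like from the user side, assuming the svd_solver parameter of PCA and a synthetic low-rank matrix (the data and sizes are illustrative assumptions). With svd_solver="auto", PCA chooses the solver from the data shape and n_components; the exact policy depends on the scikit-learn version.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# Rank-5 signal plus a little noise: a case where a randomized solver does well.
X = rng.randn(1000, 5) @ rng.randn(5, 600) + 0.01 * rng.randn(1000, 600)

# Let PCA choose the solver, or force a specific one for comparison.
pca_auto = PCA(n_components=5, svd_solver="auto", random_state=0).fit(X)
pca_full = PCA(n_components=5, svd_solver="full").fit(X)
pca_rand = PCA(n_components=5, svd_solver="randomized", random_state=0).fit(X)

# On this data the exact and randomized solvers agree closely.
same = np.allclose(
    pca_full.explained_variance_ratio_,
    pca_rand.explained_variance_ratio_,
    rtol=1e-2,
)
print(same)
```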
1 Coming soon Merged in master
Memory in pipeline: G. Lemaitre
make_pipeline(PCA(), LinearSVC(), memory='/tmp/joe')
Limits recomputation (eg in grid search)
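A minimal sketch of the grid-search case, assuming a temporary cache directory instead of the slide's path and a synthetic dataset. Only the classifier's C varies, so the fitted PCA step can be read back from the cache rather than recomputed for every candidate.

```python
from shutil import rmtree
from tempfile import mkdtemp

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Fitted transformers are cached on disk in this directory.
cache_dir = mkdtemp()
pipe = make_pipeline(PCA(n_components=5), LinearSVC(), memory=cache_dir)

# The PCA parameters are fixed; only the final estimator changes.
param_grid = {"linearsvc__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, param_grid, cv=3).fit(X, y)
print(search.best_params_)

# Remove the cache once it is no longer needed.
rmtree(cache_dir)
```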
New solver for logistic regression: SAGA A. Mensch
linear_model.LogisticRegression(solver='saga')
Fast linear model on biggish data
[Figure: training objective for SAGA vs. Liblinear on RCV1]
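A minimal usage sketch of the SAGA solver, assuming a small synthetic dataset in place of RCV1. The features are standardized first, since this kind of solver tends to converge faster on features with similar scales.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a larger dataset.
X, y = make_classification(n_samples=2000, n_features=50, random_state=0)
X = StandardScaler().fit_transform(X)

# Select the SAGA solver by name.
clf = LogisticRegression(solver="saga", max_iter=1000, random_state=0).fit(X, y)
print(clf.score(X, y))
```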
Quantile transformer: G. Lemaitre
[Figure: two scatter plots of number of households vs. median income, color-mapped by the value of y: on the original scale, and with both axes rescaled to the 0–1 range]
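A minimal sketch of the quantile transformer, assuming skewed synthetic features rather than the housing data shown in the figure. Each feature is mapped through its empirical quantiles, so the output is spread over [0, 1] whatever the input scale.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

rng = np.random.RandomState(0)
# Heavy-tailed features, loosely mimicking income-like data.
X = rng.lognormal(size=(1000, 2))

qt = QuantileTransformer(
    n_quantiles=100, output_distribution="uniform", random_state=0
)
X_t = qt.fit_transform(X)

# Transformed values lie in [0, 1].
print(X_t.min(), X_t.max())
```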
Local outlier factor: N. Goix
[Figure: points labeled normal vs. abnormal]
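A minimal sketch of the local outlier factor, assuming a synthetic cluster with three isolated points added by hand. fit_predict returns 1 for points considered normal and -1 for points considered abnormal.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(0)
# A tight cluster of normal points plus three isolated points.
X_inliers = 0.3 * rng.randn(100, 2)
X_outliers = np.array([[4.0, 4.0], [-4.0, 4.0], [4.0, -4.0]])
X = np.vstack([X_inliers, X_outliers])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)  # 1: normal, -1: abnormal

# The last three rows are the isolated points.
print(labels[-3:])
```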
Memory savings
Avoid casting (work with float32) J. Massich, A. Imbert
T-SNE (in progress) T. Moreau
1 To come Maybe
ColumnsTransformer: J. Van den Bossche
Pandas in ... feature engineering ... array out
transformer = make_column_transformer({
    StandardScaler(): ['age'],
    OneHotEncoder(): ['company']
})
array = transformer.fit_transform(data_frame)
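The snippet above is the API as proposed at the time of the talk. In released scikit-learn versions the feature landed as ColumnTransformer in sklearn.compose, and make_column_transformer takes (transformer, columns) tuples rather than a dict. A minimal sketch under that released API, with a made-up data frame:

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative data frame; the column names follow the slide.
data_frame = pd.DataFrame(
    {"age": [25, 32, 47, 51], "company": ["a", "b", "a", "c"]}
)

# Released API: one (transformer, columns) tuple per group of columns.
transformer = make_column_transformer(
    (StandardScaler(), ["age"]),
    (OneHotEncoder(), ["company"]),
)

# Pandas in, array out: 1 scaled column + 3 one-hot columns.
array = transformer.fit_transform(data_frame)
print(array.shape)
```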
Faster trees, forests & boosting:
V.R. Rajagopalan, G. Lemaitre
Lessons from XGBoost, LightGBM:
bin features into discrete values
depth-first tree, for access locality
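The binning idea can be sketched with NumPy alone. This is an illustration of the principle, not the scikit-learn, XGBoost, or LightGBM implementation; the helper name bin_feature and the choice of quantile-based edges are assumptions. Once a feature is reduced to a few small integer codes, a tree only has to consider one split candidate per bin edge instead of one per distinct value.

```python
import numpy as np


def bin_feature(values, n_bins=16):
    """Hypothetical helper: map a continuous feature to integer bin codes."""
    # Interior quantiles serve as bin edges, so bins hold similar sample counts.
    quantiles = np.linspace(0, 1, n_bins + 1)[1:-1]
    edges = np.unique(np.quantile(values, quantiles))
    # Each value is replaced by the index of the bin it falls into.
    codes = np.searchsorted(edges, values, side="right").astype(np.uint8)
    return codes, edges


rng = np.random.RandomState(0)
x = rng.lognormal(size=10_000)
codes, edges = bin_feature(x, n_bins=16)

# At most n_bins distinct codes, stored in one byte each.
print(codes.min(), codes.max(), len(edges))
```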
1 Scaling out Infrastructure
Using many computers: cloud, elastic computing
Orchestration, data distribution
Integration in corporate infrastructure
Hadoop, queues, services
joblib backends
Parallel computing
Loky (robust single-machine process pool)
Distributed (Yarn, dask, CMFActivity)
Storage (S3, HDFS)
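A minimal sketch of selecting a joblib backend by name, assuming a recent joblib where "loky" is available. The task (a square root) is a placeholder; the point is that the calling code stays the same when the backend changes.

```python
from math import sqrt

from joblib import Parallel, delayed

# Run the tasks in a pool of 2 worker processes managed by loky.
results = Parallel(n_jobs=2, backend="loky")(
    delayed(sqrt)(i ** 2) for i in range(10)
)
print(results)
```

Results come back in the order of the inputs, whichever worker computed them.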
1 Continuous integration
Testing under numpy & scipy dev A. Mueller
1 Scikit-learn-contrib
Scaling the scikit-learn universe quicker
https://github.com/scikit-learn-contrib
py-earth: multivariate adaptive regression splines
imbalanced-learn: under-sampling and over-sampling
lightning: fast linear models
polylearn: factorization machines and polynomial networks
hdbscan: high-performance clustering
forest-confidence-interval: confidence intervals for forests
boruta_py: Boruta feature selection
sklearn.utils.estimator_checks.check_estimator
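A minimal sketch of how a contrib-style estimator might be run through check_estimator. MajorityClassifier is a hypothetical toy class written for this example, and nothing here claims that it passes every check: the set of checks varies between scikit-learn versions, so the outcome is reported rather than assumed.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.utils.estimator_checks import check_estimator
from sklearn.utils.multiclass import unique_labels
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class MajorityClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical toy classifier: always predicts the most frequent class."""

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        self.classes_ = unique_labels(y)
        values, counts = np.unique(y, return_counts=True)
        self.majority_ = values[np.argmax(counts)]
        self.n_features_in_ = X.shape[1]
        return self

    def predict(self, X):
        check_is_fitted(self, "majority_")
        X = check_array(X)
        return np.full(X.shape[0], self.majority_)


# Basic usage: class 1 is the most frequent label in the training data.
clf = MajorityClassifier().fit([[0], [1], [2]], [1, 1, 0])
print(clf.predict([[5], [6]]))

# Run the scikit-learn API conformance checks and report the outcome.
try:
    check_estimator(MajorityClassifier())
    print("all checks passed")
except Exception as exc:  # the first failing check raises
    print("a check failed:", exc)
```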
2 The community
Users & developers
2 User base
350 000 returning users 5 000 citations
[Figure: pie charts of users by OS (Windows, Mac, Linux: 50%, 20%, 30%) and by employer (Industry, Academia, Other: 63%, 3%, 34%), in extraction order]
[Figure: number of PyPI downloads over time, June 2016 to June 2017; the second version of the plot compares numpy, pandas, scikit-learn, django, and flask]
2 In the Python ecosystem
[Figure: number of PyPI downloads (10^4 to 10^9) vs. package rank (1 to 10000), with numpy, scikit-learn, joblib, simplejson, six, and setuptools marked]
2 Core software is infrastructure
Everybody uses it every day
In industry, education, & research
“Roads and Bridges”: Ford Foundation report
Excellent talk by Heather Miller
https://www.youtube.com/watch?v=17yy5BwIiTw
2 Community-based development in scikit-learn
Active development team
[Figure: monthly contributors, 2010 to 2016 (axis from 0 to 50)]
https://www.openhub.net/p/scikit-learn
2 Funding & spending 2015 & 2016
New York A. Mueller
$ 350 000 Moore-Sloan grant
A. Mueller (full time). Students: M. Kumar, V. Birodkar
Telecom ParisTech A. Gramfort
200 000 € WendelinIA grant + 12 000 € CDS
Programmers: T. Guillemot, T. Dupré
Students: M. Kumar, D. Sullivan, V.R. Rajagopalan, N. Goix
Inria Parietal G. Varoquaux
120 000 € Inria + 100 000 € WendelinIA
+ 50 000 € ANR + 30 000 € CDS
Programmers: O. Grisel, L. Esteve (programmer), G. Lemaitre, J. Van den Bossche
Students: A. Mensch, J. Schreiber, G. Patrini
> 400 000 €/yr
2 Sustainability
Educating decision makers
Not funding your infrastructure is a risk
A foundation
Danger: governance, focus on features for the rich
We need partners, good ones
@GaelVaroquaux
Scikit-learn
Machine learning for everyone
– from beginner to expert
Ongoing progress
Faster models (algorithmics, float32)
Easier usage (better pandas integration)
Coupling to infrastructure (via joblib)
Thinking about sustainability & partnership

More Related Content

What's hot

PyCon FR 2016 - Et si on recodait Google en Python ?
PyCon FR 2016 - Et si on recodait Google en Python ?PyCon FR 2016 - Et si on recodait Google en Python ?
PyCon FR 2016 - Et si on recodait Google en Python ?Sylvain Zimmer
 
ArnoCandelScalabledatascienceanddeeplearningwithh2o_gotochg
ArnoCandelScalabledatascienceanddeeplearningwithh2o_gotochgArnoCandelScalabledatascienceanddeeplearningwithh2o_gotochg
ArnoCandelScalabledatascienceanddeeplearningwithh2o_gotochgSri Ambati
 
Object detection with Tensorflow Api
Object detection with Tensorflow ApiObject detection with Tensorflow Api
Object detection with Tensorflow ApiArwinKhan1
 
carrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-APIcarrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-APIYoni Davidson
 
On the code of data science
On the code of data scienceOn the code of data science
On the code of data scienceGael Varoquaux
 
Coding for science and innovation
Coding for science and innovationCoding for science and innovation
Coding for science and innovationGael Varoquaux
 
Arno candel scalabledatascienceanddeeplearningwithh2o_odsc_boston2015
Arno candel scalabledatascienceanddeeplearningwithh2o_odsc_boston2015Arno candel scalabledatascienceanddeeplearningwithh2o_odsc_boston2015
Arno candel scalabledatascienceanddeeplearningwithh2o_odsc_boston2015Sri Ambati
 
Accelerating NLP with Dask on Saturn Cloud: A case study with CORD-19
Accelerating NLP with Dask on Saturn Cloud: A case study with CORD-19Accelerating NLP with Dask on Saturn Cloud: A case study with CORD-19
Accelerating NLP with Dask on Saturn Cloud: A case study with CORD-19Sujit Pal
 
Succeeding in academia despite doing good_software
Succeeding in academia despite doing good_softwareSucceeding in academia despite doing good_software
Succeeding in academia despite doing good_softwareGael Varoquaux
 
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...Databricks
 
Productive Use of the Apache Spark Prompt with Sam Penrose
Productive Use of the Apache Spark Prompt with Sam PenroseProductive Use of the Apache Spark Prompt with Sam Penrose
Productive Use of the Apache Spark Prompt with Sam PenroseDatabricks
 
FireWorks overview
FireWorks overviewFireWorks overview
FireWorks overviewAnubhav Jain
 
Profiling PyTorch for Efficiency & Sustainability
Profiling PyTorch for Efficiency & SustainabilityProfiling PyTorch for Efficiency & Sustainability
Profiling PyTorch for Efficiency & Sustainabilitygeetachauhan
 
Real-Time Big Data Stream Analytics
Real-Time Big Data Stream AnalyticsReal-Time Big Data Stream Analytics
Real-Time Big Data Stream AnalyticsAlbert Bifet
 
High Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2OHigh Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2OSri Ambati
 

What's hot (20)

PyCon FR 2016 - Et si on recodait Google en Python ?
PyCon FR 2016 - Et si on recodait Google en Python ?PyCon FR 2016 - Et si on recodait Google en Python ?
PyCon FR 2016 - Et si on recodait Google en Python ?
 
ArnoCandelScalabledatascienceanddeeplearningwithh2o_gotochg
ArnoCandelScalabledatascienceanddeeplearningwithh2o_gotochgArnoCandelScalabledatascienceanddeeplearningwithh2o_gotochg
ArnoCandelScalabledatascienceanddeeplearningwithh2o_gotochg
 
Object detection with Tensorflow Api
Object detection with Tensorflow ApiObject detection with Tensorflow Api
Object detection with Tensorflow Api
 
carrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-APIcarrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-API
 
MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)
MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)
MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)
 
On the code of data science
On the code of data scienceOn the code of data science
On the code of data science
 
Coding for science and innovation
Coding for science and innovationCoding for science and innovation
Coding for science and innovation
 
Arno candel scalabledatascienceanddeeplearningwithh2o_odsc_boston2015
Arno candel scalabledatascienceanddeeplearningwithh2o_odsc_boston2015Arno candel scalabledatascienceanddeeplearningwithh2o_odsc_boston2015
Arno candel scalabledatascienceanddeeplearningwithh2o_odsc_boston2015
 
MAVRL Workshop 2014 - pymatgen-db & custodian
MAVRL Workshop 2014 - pymatgen-db & custodianMAVRL Workshop 2014 - pymatgen-db & custodian
MAVRL Workshop 2014 - pymatgen-db & custodian
 
Accelerating NLP with Dask on Saturn Cloud: A case study with CORD-19
Accelerating NLP with Dask on Saturn Cloud: A case study with CORD-19Accelerating NLP with Dask on Saturn Cloud: A case study with CORD-19
Accelerating NLP with Dask on Saturn Cloud: A case study with CORD-19
 
Succeeding in academia despite doing good_software
Succeeding in academia despite doing good_softwareSucceeding in academia despite doing good_software
Succeeding in academia despite doing good_software
 
Big Data com Python
Big Data com PythonBig Data com Python
Big Data com Python
 
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
 
Productive Use of the Apache Spark Prompt with Sam Penrose
Productive Use of the Apache Spark Prompt with Sam PenroseProductive Use of the Apache Spark Prompt with Sam Penrose
Productive Use of the Apache Spark Prompt with Sam Penrose
 
FireWorks overview
FireWorks overviewFireWorks overview
FireWorks overview
 
Profiling PyTorch for Efficiency & Sustainability
Profiling PyTorch for Efficiency & SustainabilityProfiling PyTorch for Efficiency & Sustainability
Profiling PyTorch for Efficiency & Sustainability
 
Real-Time Big Data Stream Analytics
Real-Time Big Data Stream AnalyticsReal-Time Big Data Stream Analytics
Real-Time Big Data Stream Analytics
 
High Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2OHigh Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2O
 
The Materials API
The Materials APIThe Materials API
The Materials API
 
PHM 2013
PHM 2013PHM 2013
PHM 2013
 

Viewers also liked

Intro to scikit-learn
Intro to scikit-learnIntro to scikit-learn
Intro to scikit-learnAWeber
 
Introduction to Machine Learning with Python and scikit-learn
Introduction to Machine Learning with Python and scikit-learnIntroduction to Machine Learning with Python and scikit-learn
Introduction to Machine Learning with Python and scikit-learnMatt Hagy
 
Machine learning in production with scikit-learn
Machine learning in production with scikit-learnMachine learning in production with scikit-learn
Machine learning in production with scikit-learnJeff Klukas
 
Tree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptionsTree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptionsGilles Louppe
 
Numerical tour in the Python eco-system: Python, NumPy, scikit-learn
Numerical tour in the Python eco-system: Python, NumPy, scikit-learnNumerical tour in the Python eco-system: Python, NumPy, scikit-learn
Numerical tour in the Python eco-system: Python, NumPy, scikit-learnArnaud Joly
 
Exploring Machine Learning in Python with Scikit-Learn
Exploring Machine Learning in Python with Scikit-LearnExploring Machine Learning in Python with Scikit-Learn
Exploring Machine Learning in Python with Scikit-LearnKan Ouivirach, Ph.D.
 
Intro to machine learning with scikit learn
Intro to machine learning with scikit learnIntro to machine learning with scikit learn
Intro to machine learning with scikit learnYoss Cohen
 
Think machine-learning-with-scikit-learn-chetan
Think machine-learning-with-scikit-learn-chetanThink machine-learning-with-scikit-learn-chetan
Think machine-learning-with-scikit-learn-chetanChetan Khatri
 
Machine learning with scikit-learn
Machine learning with scikit-learnMachine learning with scikit-learn
Machine learning with scikit-learnQingkai Kong
 
Data Science and Machine Learning Using Python and Scikit-learn
Data Science and Machine Learning Using Python and Scikit-learnData Science and Machine Learning Using Python and Scikit-learn
Data Science and Machine Learning Using Python and Scikit-learnAsim Jalis
 
Intro to scikit learn may 2017
Intro to scikit learn may 2017Intro to scikit learn may 2017
Intro to scikit learn may 2017Francesco Mosconi
 
Authorship Attribution and Forensic Linguistics with Python/Scikit-Learn/Pand...
Authorship Attribution and Forensic Linguistics with Python/Scikit-Learn/Pand...Authorship Attribution and Forensic Linguistics with Python/Scikit-Learn/Pand...
Authorship Attribution and Forensic Linguistics with Python/Scikit-Learn/Pand...PyData
 
Realtime predictive analytics using RabbitMQ & scikit-learn
Realtime predictive analytics using RabbitMQ & scikit-learnRealtime predictive analytics using RabbitMQ & scikit-learn
Realtime predictive analytics using RabbitMQ & scikit-learnAWeber
 
Machine Learning with scikit-learn
Machine Learning with scikit-learnMachine Learning with scikit-learn
Machine Learning with scikit-learnodsc
 
Converting Scikit-Learn to PMML
Converting Scikit-Learn to PMMLConverting Scikit-Learn to PMML
Converting Scikit-Learn to PMMLVillu Ruusmann
 
Accelerating Random Forests in Scikit-Learn
Accelerating Random Forests in Scikit-LearnAccelerating Random Forests in Scikit-Learn
Accelerating Random Forests in Scikit-LearnGilles Louppe
 
Scikit-learn for easy machine learning: the vision, the tool, and the project
Scikit-learn for easy machine learning: the vision, the tool, and the projectScikit-learn for easy machine learning: the vision, the tool, and the project
Scikit-learn for easy machine learning: the vision, the tool, and the projectGael Varoquaux
 
Text Classification/Categorization
Text Classification/CategorizationText Classification/Categorization
Text Classification/CategorizationOswal Abhishek
 
A Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-LearnA Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-LearnSarah Guido
 

Viewers also liked (20)

Intro to scikit-learn
Intro to scikit-learnIntro to scikit-learn
Intro to scikit-learn
 
Introduction to Machine Learning with Python and scikit-learn
Introduction to Machine Learning with Python and scikit-learnIntroduction to Machine Learning with Python and scikit-learn
Introduction to Machine Learning with Python and scikit-learn
 
Machine learning in production with scikit-learn
Machine learning in production with scikit-learnMachine learning in production with scikit-learn
Machine learning in production with scikit-learn
 
Tree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptionsTree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptions
 
Numerical tour in the Python eco-system: Python, NumPy, scikit-learn
Numerical tour in the Python eco-system: Python, NumPy, scikit-learnNumerical tour in the Python eco-system: Python, NumPy, scikit-learn
Numerical tour in the Python eco-system: Python, NumPy, scikit-learn
 
Exploring Machine Learning in Python with Scikit-Learn
Exploring Machine Learning in Python with Scikit-LearnExploring Machine Learning in Python with Scikit-Learn
Exploring Machine Learning in Python with Scikit-Learn
 
Intro to machine learning with scikit learn
Intro to machine learning with scikit learnIntro to machine learning with scikit learn
Intro to machine learning with scikit learn
 
Think machine-learning-with-scikit-learn-chetan
Think machine-learning-with-scikit-learn-chetanThink machine-learning-with-scikit-learn-chetan
Think machine-learning-with-scikit-learn-chetan
 
Machine learning with scikit-learn
Machine learning with scikit-learnMachine learning with scikit-learn
Machine learning with scikit-learn
 
Data Science and Machine Learning Using Python and Scikit-learn
Data Science and Machine Learning Using Python and Scikit-learnData Science and Machine Learning Using Python and Scikit-learn
Data Science and Machine Learning Using Python and Scikit-learn
 
Clustering: A Scikit Learn Tutorial
Clustering: A Scikit Learn TutorialClustering: A Scikit Learn Tutorial
Clustering: A Scikit Learn Tutorial
 
Intro to scikit learn may 2017
Intro to scikit learn may 2017Intro to scikit learn may 2017
Intro to scikit learn may 2017
 
Authorship Attribution and Forensic Linguistics with Python/Scikit-Learn/Pand...
Authorship Attribution and Forensic Linguistics with Python/Scikit-Learn/Pand...Authorship Attribution and Forensic Linguistics with Python/Scikit-Learn/Pand...
Authorship Attribution and Forensic Linguistics with Python/Scikit-Learn/Pand...
 
Realtime predictive analytics using RabbitMQ & scikit-learn
Realtime predictive analytics using RabbitMQ & scikit-learnRealtime predictive analytics using RabbitMQ & scikit-learn
Realtime predictive analytics using RabbitMQ & scikit-learn
 
Machine Learning with scikit-learn
Machine Learning with scikit-learnMachine Learning with scikit-learn
Machine Learning with scikit-learn
 
Converting Scikit-Learn to PMML
Converting Scikit-Learn to PMMLConverting Scikit-Learn to PMML
Converting Scikit-Learn to PMML
 
Accelerating Random Forests in Scikit-Learn
Accelerating Random Forests in Scikit-LearnAccelerating Random Forests in Scikit-Learn
Accelerating Random Forests in Scikit-Learn
 
Scikit-learn for easy machine learning: the vision, the tool, and the project
Scikit-learn for easy machine learning: the vision, the tool, and the projectScikit-learn for easy machine learning: the vision, the tool, and the project
Scikit-learn for easy machine learning: the vision, the tool, and the project
 
Text Classification/Categorization
Text Classification/CategorizationText Classification/Categorization
Text Classification/Categorization
 
A Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-LearnA Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-Learn
 

Similar to Pyparis2017 / Scikit-learn - an incomplete yearly review, by Gael Varoquaux

Building a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
Building a Cutting-Edge Data Process Environment on a Budget by Gael VaroquauxBuilding a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
Building a Cutting-Edge Data Process Environment on a Budget by Gael VaroquauxPyData
 
Simple big data, in Python
Simple big data, in PythonSimple big data, in Python
Simple big data, in PythonGael Varoquaux
 
Scikit-learn and nilearn: Democratisation of machine learning for brain imaging
Scikit-learn and nilearn: Democratisation of machine learning for brain imagingScikit-learn and nilearn: Democratisation of machine learning for brain imaging
Scikit-learn and nilearn: Democratisation of machine learning for brain imagingGael Varoquaux
 
Better neuroimaging data processing: driven by evidence, open communities, an...
Better neuroimaging data processing: driven by evidence, open communities, an...Better neuroimaging data processing: driven by evidence, open communities, an...
Better neuroimaging data processing: driven by evidence, open communities, an...Gael Varoquaux
 
Reactive programming with RxJS - Taiwan
Reactive programming with RxJS - TaiwanReactive programming with RxJS - Taiwan
Reactive programming with RxJS - Taiwanmodernweb
 
Group3-Gravitation.pdf
Group3-Gravitation.pdfGroup3-Gravitation.pdf
Group3-Gravitation.pdfVidhanSingh11
 
Decentralized Evolution and Consolidation of RDF Graphs
Decentralized Evolution and Consolidation of RDF GraphsDecentralized Evolution and Consolidation of RDF Graphs
Decentralized Evolution and Consolidation of RDF GraphsAksw Group
 
Detecting Lateral Movement with a Compute-Intense Graph Kernel
Detecting Lateral Movement with a Compute-Intense Graph KernelDetecting Lateral Movement with a Compute-Intense Graph Kernel
Detecting Lateral Movement with a Compute-Intense Graph KernelData Works MD
 
Scio - Moving to Google Cloud, A Spotify Story
 Scio - Moving to Google Cloud, A Spotify Story Scio - Moving to Google Cloud, A Spotify Story
Scio - Moving to Google Cloud, A Spotify StoryNeville Li
 
JavaFest. Cedrick Lunven. Build APIS with SpringBoot - REST, GRPC, GRAPHQL wh...
JavaFest. Cedrick Lunven. Build APIS with SpringBoot - REST, GRPC, GRAPHQL wh...JavaFest. Cedrick Lunven. Build APIS with SpringBoot - REST, GRPC, GRAPHQL wh...
JavaFest. Cedrick Lunven. Build APIS with SpringBoot - REST, GRPC, GRAPHQL wh...FestGroup
 
Computational practices for reproducible science
Computational practices for reproducible scienceComputational practices for reproducible science
Computational practices for reproducible scienceGael Varoquaux
 
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...MLconf
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Spark Summit
 
IPT Reactive Java IoT Demo - BGOUG 2018
IPT Reactive Java IoT Demo - BGOUG 2018IPT Reactive Java IoT Demo - BGOUG 2018
IPT Reactive Java IoT Demo - BGOUG 2018Trayan Iliev
 
Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...Dmytro Mishkin
 
CloudStack news
CloudStack newsCloudStack news
CloudStack newsShapeBlue
 
PyParis2018 - Python tooling for continuous deployment
PyParis2018 - Python tooling for continuous deploymentPyParis2018 - Python tooling for continuous deployment
PyParis2018 - Python tooling for continuous deploymentArthur Lutz
 
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Flink Forward
 

Similar to Pyparis2017 / Scikit-learn - an incomplete yearly review, by Gael Varoquaux (20)

Building a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
Building a Cutting-Edge Data Process Environment on a Budget by Gael VaroquauxBuilding a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
Building a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
 
Simple big data, in Python
Simple big data, in PythonSimple big data, in Python
Simple big data, in Python
 
Scikit-learn and nilearn: Democratisation of machine learning for brain imaging
Scikit-learn and nilearn: Democratisation of machine learning for brain imagingScikit-learn and nilearn: Democratisation of machine learning for brain imaging
Scikit-learn and nilearn: Democratisation of machine learning for brain imaging
 
Better neuroimaging data processing: driven by evidence, open communities, an...
Better neuroimaging data processing: driven by evidence, open communities, an...Better neuroimaging data processing: driven by evidence, open communities, an...
Better neuroimaging data processing: driven by evidence, open communities, an...
 
Reactive programming with RxJS - Taiwan
Reactive programming with RxJS - TaiwanReactive programming with RxJS - Taiwan
Reactive programming with RxJS - Taiwan
 
Group3-Gravitation.pdf
Group3-Gravitation.pdfGroup3-Gravitation.pdf
Group3-Gravitation.pdf
 
Decentralized Evolution and Consolidation of RDF Graphs
Decentralized Evolution and Consolidation of RDF GraphsDecentralized Evolution and Consolidation of RDF Graphs
Decentralized Evolution and Consolidation of RDF Graphs
 
Detecting Lateral Movement with a Compute-Intense Graph Kernel
Detecting Lateral Movement with a Compute-Intense Graph KernelDetecting Lateral Movement with a Compute-Intense Graph Kernel
Detecting Lateral Movement with a Compute-Intense Graph Kernel
 
Scio - Moving to Google Cloud, A Spotify Story
 Scio - Moving to Google Cloud, A Spotify Story Scio - Moving to Google Cloud, A Spotify Story
Scio - Moving to Google Cloud, A Spotify Story
 
JavaFest. Cedrick Lunven. Build APIS with SpringBoot - REST, GRPC, GRAPHQL wh...
JavaFest. Cedrick Lunven. Build APIS with SpringBoot - REST, GRPC, GRAPHQL wh...JavaFest. Cedrick Lunven. Build APIS with SpringBoot - REST, GRPC, GRAPHQL wh...
JavaFest. Cedrick Lunven. Build APIS with SpringBoot - REST, GRPC, GRAPHQL wh...
 
Computational practices for reproducible science
Computational practices for reproducible scienceComputational practices for reproducible science
Computational practices for reproducible science
 
Lecture_2_v2_qc.pptx
Lecture_2_v2_qc.pptxLecture_2_v2_qc.pptx
Lecture_2_v2_qc.pptx
 
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
 
New directions for mahout
New directions for mahoutNew directions for mahout
New directions for mahout
 
IPT Reactive Java IoT Demo - BGOUG 2018
IPT Reactive Java IoT Demo - BGOUG 2018IPT Reactive Java IoT Demo - BGOUG 2018
IPT Reactive Java IoT Demo - BGOUG 2018
 
Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...
 
CloudStack news
CloudStack newsCloudStack news
CloudStack news
 
PyParis2018 - Python tooling for continuous deployment
PyParis2018 - Python tooling for continuous deploymentPyParis2018 - Python tooling for continuous deployment
PyParis2018 - Python tooling for continuous deployment
 
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
 

Pyparis2017 / Scikit-learn - an incomplete yearly review, by Gael Varoquaux

  • 1. Scikit-learn: an incomplete yearly review Gaël Varoquaux scikit machine learning in Python
  • 2. Trends with 1 The library 2 The community G Varoquaux 2
  • 3. 1 The library scikit machine learning in Python G Varoquaux 3
  • 4. 1 In 0.18 oldies but goodies G Varoquaux 4
  • 5. 1 In 0.18 oldies but goodies New cross-validation objects V.R. Rajagopalan from sklearn.cross_validation import StratifiedKFold cv = StratifiedKFold(y, n_folds=2) for train, test in cv: X_train = X[train] y_train = y[train] G Varoquaux 4
  • 6. 1 In 0.18 oldies but goodies New cross-validation objects V.R. Rajagopalan from sklearn.model_selection import StratifiedKFold cv = StratifiedKFold(n_splits=2) for train, test in cv.split(X, y): X_train = X[train] y_train = y[train] ⇒ better nested-CV G Varoquaux 4
  • 7. 1 In 0.18 oldies but goodies New cross-validation objects V.R. Rajagopalan PCA == Randomized PCA G. Patrini Heuristic to switch PCA to random linear algebra Fights global warming Huge speed gains for biggish data G Varoquaux 4
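The cross-validation change on the slides above can be sketched as follows. This is a minimal illustration, not the speaker's code: the toy arrays `X` and `y` and the `fold_class_counts` list are assumptions made for the example. The point is that the splitter is now built without the data, and the data is passed to `split`.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy data (assumed for illustration): 8 samples, 2 balanced classes.
X = np.arange(16).reshape(8, 2)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# The splitter no longer receives y at construction time.
cv = StratifiedKFold(n_splits=2)

test_indices = []
fold_class_counts = []
for train, test in cv.split(X, y):
    X_train, y_train = X[train], y[train]
    # Stratification keeps the class proportions in every test fold.
    fold_class_counts.append(np.bincount(y[test]).tolist())
    test_indices.extend(test.tolist())
```

Because the splitter is independent of the data, the same `cv` object can be reused on an inner training set, which is what makes nested cross-validation easier to write.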
  • 8. 1 Coming soon Merged in master Memory in pipeline: G. Lemaitre make_pipeline(PCA(), LinearSVC(), memory='/tmp/joe') Limits recomputation (e.g. in grid search) G Varoquaux 5
  • 9. 1 Coming soon Merged in master Memory in pipeline G. Lemaitre New solver for logistic regression: SAGA A. Mensch linear_model.LogisticRegression(solver='saga') Fast linear model on biggish data [Figure: training objective for SAGA and Liblinear on RCV1] G Varoquaux 5
  • 10. 1 Coming soon Merged in master Memory in pipeline G. Lemaitre New solver for logistic regression: SAGA A. Mensch Quantile transformer: G. Lemaitre [Figure: number of households vs. median income, color-mapped by values of y] G Varoquaux 5
  • 11. 1 Coming soon Merged in master Memory in pipeline G. Lemaitre New solver for logistic regression: SAGA A. Mensch Quantile transformer: G. Lemaitre [Figures: number of households vs. median income, color-mapped by values of y, on the original axes and on axes rescaled to about 0 to 1] G Varoquaux 5
  • 12. 1 Coming soon Merged in master Memory in pipeline G. Lemaitre New solver for logistic regression: SAGA A. Mensch Quantile transformer G. Lemaitre Local outlier factor: N. Goix normal abnormal G Varoquaux 5
  • 13. 1 Coming soon Merged in master Memory in pipeline G. Lemaitre New solver for logistic regression: SAGA A. Mensch Quantile transformer G. Lemaitre Local outlier factor N. Goix Memory savings Avoid casting (work with float32) J. Massich, A. Imbert T-SNE (in progress) T. Moreau G Varoquaux 5
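As a hedged sketch of the "memory in pipeline" item from this group of slides: the `memory` argument of `make_pipeline` caches fitted transformers, so a grid search over the final estimator's parameters does not refit the PCA for every candidate on the same training fold. The synthetic data, the temporary cache directory, and the grid over `C` are assumptions chosen for the example, not taken from the talk.

```python
import tempfile

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Synthetic data, assumed for illustration only.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# A throwaway cache directory stands in for the '/tmp/joe' path on the slide.
with tempfile.TemporaryDirectory() as cache_dir:
    pipe = make_pipeline(PCA(n_components=5), LinearSVC(), memory=cache_dir)
    # Only the classifier's C varies, so the PCA fitted on a given
    # training fold can be loaded from the cache for the other candidates.
    param_grid = {"linearsvc__C": [0.1, 1.0, 10.0]}
    search = GridSearchCV(pipe, param_grid, cv=3)
    search.fit(X, y)
    best_C = search.best_params_["linearsvc__C"]
```

The saving grows with the cost of the cached steps. Here the PCA is cheap, so the example only shows the wiring.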
  • 14. 1 To come Maybe ColumnsTransformer: J. Van den Bossche Pandas in ... feature engineering ... array out transformer = make_column_transformer({ StandardScaler(): ['age'], OneHotEncoder(): ['company'] }) array = transformer.fit_transform(data_frame) G Varoquaux 6
  • 15. 1 To come Maybe ColumnsTransformer J. Van den Bossche Faster trees, forests & boosting: V.R. Rajagopalan, G. Lemaitre Lessons from XGBoost, LightGBM: bin features for discrete values; depth-first tree, for access locality G Varoquaux 6
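The "pandas in, array out" idea from the ColumnsTransformer slide can be sketched with the `ColumnTransformer` class that later shipped in `sklearn.compose`. Note the assumptions: the released API takes a list of `(name, transformer, columns)` tuples rather than the dictionary shown on the slide, and the toy data frame and step names below are invented for the example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data frame (assumed for illustration).
data_frame = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "company": ["a", "b", "a", "c"],
})

transformer = ColumnTransformer(
    [
        ("scale", StandardScaler(), ["age"]),       # numeric column
        ("onehot", OneHotEncoder(), ["company"]),   # categorical column
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

# One scaled column plus one indicator column per company value.
array = transformer.fit_transform(data_frame)
```

Each transformer only sees the columns it is given, and the outputs are stacked side by side in the order the steps are listed.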
  • 16. 1 Scaling out Infrastructure Using many computers: cloud, elastic computing Orchestration, data distribution Integration in corporate infrastructure Hadoop, queues, services joblib backends Parallel computing Loky (robust single-machine process pool) Distributed (Yarn, dask, CMFActivity) Storage (S3, HDFS) G Varoquaux 7
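A minimal sketch of the joblib layer mentioned on the slide above, assuming the single-machine `loky` backend and a hypothetical `square` function as the workload. A distributed backend would keep the same calling code and change only the backend choice.

```python
from joblib import Parallel, delayed


def square(x):
    # Stand-in workload (hypothetical); any picklable function works.
    return x * x


# Run the calls in a pool of two worker processes managed by loky.
# Results come back in the order the tasks were submitted.
results = Parallel(n_jobs=2, backend="loky")(
    delayed(square)(i) for i in range(5)
)
```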
  • 17. 1 Continuous integration Testing under numpy & scipy dev A. Mueller G Varoquaux 8
  • 18. 1 Scikit-learn-contrib Scaling the scikit-learn universe quicker https://github.com/scikit-learn-contrib py-earth: multivariate adaptive regression splines; imbalanced-learn: under-sampling and over-sampling; lightning: fast linear models; polylearn: factorization machines and polynomial networks; hdbscan: high-performance clustering; forest-confidence-interval: confidence interval for forests; boruta_py: boruta feature selection G Varoquaux 9
  • 19. 1 Scikit-learn-contrib Scaling the scikit-learn universe quicker https://github.com/scikit-learn-contrib py-earth: multivariate adaptive regression splines; imbalanced-learn: under-sampling and over-sampling; lightning: fast linear models; polylearn: factorization machines and polynomial networks; hdbscan: high-performance clustering; forest-confidence-interval: confidence interval for forests; boruta_py: boruta feature selection sklearn.utils.estimator_checks.check_estimator G Varoquaux 9
  • 20. 2 The community Users & developers G Varoquaux 10
  • 21. 2 User base 350 000 returning users 5 000 citations G Varoquaux 11
  • 22. 2 User base 350 000 returning users 5 000 citations OS Employer Windows Mac Linux Industry Academia Other 50% 20% 30% 63% 3% 34% G Varoquaux 11
  • 23. 2 User base [Figure: number of PyPI downloads by month, Jun through Jun 2017; y-axis 0 to 40000] G Varoquaux 12
  • 24. 2 User base [Figure: number of PyPI downloads by month, Jun through Jun 2017; y-axis 0 to 100000; series: numpy, pandas, scikit-learn, django, flask] G Varoquaux 12
  • 25. 2 In the Python ecosystem [Figure: number of PyPI downloads (10^4 to 10^9) against package rank (1 to 10000)] G Varoquaux 13
  • 26. 2 In the Python ecosystem [Figure: number of PyPI downloads (10^4 to 10^9) against package rank (1 to 10000); labeled packages: numpy, scikit-learn, joblib, simplejson, six, setuptools] G Varoquaux 13
  • 27. 2 Core software is infrastructure Everybody uses it every day In industry, education, & research “Roads and Bridges”: Ford Foundation report Excellent talk by Heather Miller https://www.youtube.com/watch?v=17yy5BwIiTw G Varoquaux 14
  • 28. 2 Community-based development in scikit-learn Active development team [Figure: monthly contributors (0 to 50), 2010 to 2016] https://www.openhub.net/p/scikit-learn G Varoquaux 15
  • 29. 2 Funding & spending 2015 & 2016 New York A. Mueller $ 350 000 Moore-Sloan grant A. Mueller (full time). Students: M. Kumar, V. Birodkar Telecom ParisTech A. Gramfort 200 000 € WendelinIA grant + 12 000 € CDS Programmers: T. Guillemot, T. Dupré Students: M. Kumar, D. Sullivan, V.R. Rajagopalan, N. Goix Inria Parietal G. Varoquaux 120 000 € Inria + 100 000 € WendelinIA + 50 000 € ANR + 30 000 € CDS Programmers: O. Grisel, L. Esteve (programmer), G. Lemaitre, J. Van den Bossche Students: A. Mensch, J. Schreiber, G. Patrini > 400 000 €/yr G Varoquaux 16
  • 32. 2 Sustainability Educating decision makers Not funding your infrastructure is a risk A foundation Danger: governance, focus on features for the rich We need partners, good ones G Varoquaux 17
  • 33. @GaelVaroquaux Scikit-learn Machine learning for everyone – from beginner to expert Ongoing progress Faster models (algorithmics, float32) Easier usage (better pandas integration) Coupling to infrastructure (via joblib) Thinking about sustainability & partnership