SlideShare a Scribd company logo
Scikit-learn The state of the union
Ga¨el Varoquaux Open Source Innovation Spring
2016
Personal point of view, as an opening to scikit-learn days 2016 in Paris
1 Some history
Scikit-learn canal historique
G Varoquaux 2
1 scikit-learn growth: users
Website users (weekly): Google analytics
Debian popcon: ∼ 1% of the Debian users
G Varoquaux 3
1 scikit-learn growth: users
Website users (weekly): Google analytics
Debian popcon: ∼ 1% of the Debian users
Web searches: Google trends
G Varoquaux 3
1 scikit-learn growth: lines of code
Lines of code:
Huge feature set
https://www.openhub.net/p/scikit-learn
G Varoquaux 4
1 scikit-learn growth: contributors
Contributors:
759 contributors
https://www.openhub.net/p/scikit-learn
G Varoquaux 5
1 Started as David Cournapeau’s failed PhD project
David then preferred
improving numpy/scipy
That’s David sprinting in 2011
G Varoquaux 6
1 2009: We (Inria Parietal) need machine learning
My team takes over the
development
Hire a young guy
(Fabian Pedregosa)
Put post-docs and PhDs
(Alexandre Gramfort, Vincent Michel...)
Work in the open
Pythonic, fast, documented
G Varoquaux 7
1 2010: ICML MLOSS workshop
Machine Learning Open Source Software
“The examples in the
tutorial are pretty, but
not particularly useful
for the serious user.”
“For the sustainability of
the project it might be bet-
ter to narrow the focus...”
G Varoquaux 8
1 2011: NIPS sprint
People that I didn’t know
were solving my problems
G Varoquaux 9
1 2011: NIPS sprint
People that I didn’t know
were solving my problems
The project took off because of the community...
G Varoquaux 9
2 Upcoming cool stuff
Upcoming 0.18 release
G Varoquaux 10
2 Less code:
Lines of code:
G Varoquaux 11
2 Less code: Cython no longer embedded
Lines of code:
Generated C no longuer embedded in git
⇒ opens the door to fused-types (polymorphism)
⇒ multiple dtypes support in algorithm
= memory saver
Arthur MenschG Varoquaux 11
2 Faster code: better algorithmics
RandomizedPCA → PCA
Automatic choice randomized linear algebra
power iteration (arpack) full (lapack)
For large data: up to 20× speed up
https://github.com/scikit-learn/scikit-learn/issues/5243
Giorgio Patrini
G Varoquaux 12
2 Faster code: better algorithmics
RandomizedPCA → PCA
Automatic choice randomized linear algebra
power iteration (arpack) full (lapack)
For large data: up to 20× speed up
https://github.com/scikit-learn/scikit-learn/issues/5243
Giorgio Patrini
Elkan’s K means
For large data: ∼ 2× speed up.
https://github.com/scikit-learn/scikit-learn/pull/5414
Andreas M¨uller
G Varoquaux 12
2 New cross-validation objects
from s k l e a r n . c r o s s v a l i d a t i o n
import S t r a t i f i e d K F o l d
cv = S t r a t i f i e d K F o l d (y , n f o l d s =2)
for t r a i n , t e s t in cv :
X t r a i n = X[ t r a i n ]
y t a i n = y[ t r a i n ]
Data-independent nested-CV possible
https://github.com/scikit-learn/scikit-learn/pull/4294
Raghav R V
G Varoquaux 13
2 New cross-validation objects
from s k l e a r n . m o d e l s e l e c t i o n
import S t r a t i f i e d K F o l d
cv = S t r a t i f i e d K F o l d ( n f o l d s =2)
for t r a i n , t e s t in cv . s p l i t (X, y):
X t r a i n = X[ t r a i n ]
y t a i n = y[ t r a i n ]
Data-independent ⇒ nested-CV possible
https://github.com/scikit-learn/scikit-learn/pull/4294
Raghav R V
G Varoquaux 13
2 Sequential / Bayesian search CV
See hyper-parameter selection as a Bayesian
optimization / noisy fit problem.
⇒ choose hyper-parameters cleverly, not on a grid
Pull request stalled
https://github.com/scikit-learn/scikit-learn/pull/5491
Fabian Pedregosa, Sebastien Dubois, & Manoj Kumar
G Varoquaux 14
3 Vision(s): the future
G Varoquaux 15
Mission statement
Enable progress via data science
Lower the costs,
less technicalities
Machine learning
for everybody and
for everything
G Varoquaux 16
Mission statement
Enable progress via data science
Lower the costs,
less technicalities
Machine learning
for everybody and
for everything
Small hardware,
medium data
G Varoquaux 16
3 Deep learning
sklearn.neural network.MLPClassifier
architecture-specification language
GPUs unbound technicality
G Varoquaux 17
3 Deep learning
sklearn.neural network.MLPClassifier
architecture-specification language
GPUs unbound technicality
keras, caffe...
G Varoquaux 17
3 AutoML
Automatic model selection
Better hyper-parameter selection
Better description and uniformization of estimators
Integrate feedback from auto-sklearn
G Varoquaux 18
3 Better, faster, stronger
Faster models
From lightning, back to sklearn
Inspiration from XGBoost the paper is out!
G Varoquaux 19
3 Better, faster, stronger
Faster models
From lightning, back to sklearn
Inspiration from XGBoost the paper is out!
Larger data
More partial fit online forests?
Less copies
G Varoquaux 19
3 Scaling up (out?)
I don’t want java/scala
Less fluid prototyping
Cross-VM debugging hard
Numerics in java slowers than Lapack
Need C somewhere
G Varoquaux 20
3 Scaling up (out?)
I don’t want java/scala
They have:
Coupling distributed store to computation
Distributed job management
Create new stack? Ride on this one?
G Varoquaux 20
3 Scaling up (out?)
I don’t want java/scala
They have:
Coupling distributed store to computation
Distributed job management
Create new stack? Ride on this one?
Blaze, Ibis, dask: require rewrite of algorithms
dask promising for ETL
New backends for joblib parallel and storage
distributed, ssh
G Varoquaux 20
Sustainable growth
Reviewing is the bottleneck
User support drowns core devs
Users need stability (Airbus)
Coding is not the only thing
sprint, GSOC management, tutorials...
G Varoquaux 21
Sustainable growth
Reviewing is the bottleneck
User support drowns core devs
Users need stability (Airbus)
Coding is not the only thing
sprint, GSOC management, tutorials...
Structure & stability
How to organize funding and governance?
process/meetings/reports/funding proposal...
= work on project
Passionate coders get a lot done
unless they get drowned by meetings
G Varoquaux 21
@GaelVaroquaux
Funding: Inria, Nexedi, Paris-Saclay CDS, NYU CDS, GSoC

More Related Content

What's hot

Open & reproducible research - What can we do in practice?
Open & reproducible research - What can we do in practice?Open & reproducible research - What can we do in practice?
Open & reproducible research - What can we do in practice?Felix Z. Hoffmann
 
FireWorks overview
FireWorks overviewFireWorks overview
FireWorks overviewAnubhav Jain
 
PyTorch Python Tutorial | Deep Learning Using PyTorch | Image Classifier Usin...
PyTorch Python Tutorial | Deep Learning Using PyTorch | Image Classifier Usin...PyTorch Python Tutorial | Deep Learning Using PyTorch | Image Classifier Usin...
PyTorch Python Tutorial | Deep Learning Using PyTorch | Image Classifier Usin...Edureka!
 
Reproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter NotebookReproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter NotebookKeiichiro Ono
 
IPython: A Modern Vision of Interactive Computing (PyData SV 2013)
IPython: A Modern Vision of Interactive Computing (PyData SV 2013)IPython: A Modern Vision of Interactive Computing (PyData SV 2013)
IPython: A Modern Vision of Interactive Computing (PyData SV 2013)PyData
 
Transfer learning, active learning using tensorflow object detection api
Transfer learning, active learning  using tensorflow object detection apiTransfer learning, active learning  using tensorflow object detection api
Transfer learning, active learning using tensorflow object detection api설기 김
 
Scaling PyData Up and Out
Scaling PyData Up and OutScaling PyData Up and Out
Scaling PyData Up and OutTravis Oliphant
 
Scaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUsScaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUsTravis Oliphant
 
Building Interpretable & Secure AI Systems using PyTorch
Building Interpretable & Secure AI Systems using PyTorchBuilding Interpretable & Secure AI Systems using PyTorch
Building Interpretable & Secure AI Systems using PyTorchgeetachauhan
 
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...Databricks
 
PyData Barcelona Keynote
PyData Barcelona KeynotePyData Barcelona Keynote
PyData Barcelona KeynoteTravis Oliphant
 
Some "challenges" on the open-source/open-data front
Some "challenges" on the open-source/open-data frontSome "challenges" on the open-source/open-data front
Some "challenges" on the open-source/open-data frontGreg Landrum
 
How You Can Use Open Source Materials to Learn Python & Data Science - EuroPy...
How You Can Use Open Source Materials to Learn Python & Data Science - EuroPy...How You Can Use Open Source Materials to Learn Python & Data Science - EuroPy...
How You Can Use Open Source Materials to Learn Python & Data Science - EuroPy...Kamila Stępniowska
 
Scipy 2011 Time Series Analysis in Python
Scipy 2011 Time Series Analysis in PythonScipy 2011 Time Series Analysis in Python
Scipy 2011 Time Series Analysis in PythonWes McKinney
 
"The OpenCV Open Source Computer Vision Library: What’s New and What’s Coming...
"The OpenCV Open Source Computer Vision Library: What’s New and What’s Coming..."The OpenCV Open Source Computer Vision Library: What’s New and What’s Coming...
"The OpenCV Open Source Computer Vision Library: What’s New and What’s Coming...Edge AI and Vision Alliance
 
High Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2OHigh Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2OSri Ambati
 
Jay Yagnik at AI Frontiers : A History Lesson on AI
Jay Yagnik at AI Frontiers : A History Lesson on AIJay Yagnik at AI Frontiers : A History Lesson on AI
Jay Yagnik at AI Frontiers : A History Lesson on AIAI Frontiers
 

What's hot (20)

Open & reproducible research - What can we do in practice?
Open & reproducible research - What can we do in practice?Open & reproducible research - What can we do in practice?
Open & reproducible research - What can we do in practice?
 
FireWorks overview
FireWorks overviewFireWorks overview
FireWorks overview
 
Big Data com Python
Big Data com PythonBig Data com Python
Big Data com Python
 
H20: A platform for big math
H20: A platform for big math H20: A platform for big math
H20: A platform for big math
 
PyTorch Python Tutorial | Deep Learning Using PyTorch | Image Classifier Usin...
PyTorch Python Tutorial | Deep Learning Using PyTorch | Image Classifier Usin...PyTorch Python Tutorial | Deep Learning Using PyTorch | Image Classifier Usin...
PyTorch Python Tutorial | Deep Learning Using PyTorch | Image Classifier Usin...
 
Reproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter NotebookReproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter Notebook
 
IPython: A Modern Vision of Interactive Computing (PyData SV 2013)
IPython: A Modern Vision of Interactive Computing (PyData SV 2013)IPython: A Modern Vision of Interactive Computing (PyData SV 2013)
IPython: A Modern Vision of Interactive Computing (PyData SV 2013)
 
Transfer learning, active learning using tensorflow object detection api
Transfer learning, active learning  using tensorflow object detection apiTransfer learning, active learning  using tensorflow object detection api
Transfer learning, active learning using tensorflow object detection api
 
Scaling PyData Up and Out
Scaling PyData Up and OutScaling PyData Up and Out
Scaling PyData Up and Out
 
Scaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUsScaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUs
 
Python and Sage
Python and SagePython and Sage
Python and Sage
 
Building Interpretable & Secure AI Systems using PyTorch
Building Interpretable & Secure AI Systems using PyTorchBuilding Interpretable & Secure AI Systems using PyTorch
Building Interpretable & Secure AI Systems using PyTorch
 
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
 
PyData Barcelona Keynote
PyData Barcelona KeynotePyData Barcelona Keynote
PyData Barcelona Keynote
 
Some "challenges" on the open-source/open-data front
Some "challenges" on the open-source/open-data frontSome "challenges" on the open-source/open-data front
Some "challenges" on the open-source/open-data front
 
How You Can Use Open Source Materials to Learn Python & Data Science - EuroPy...
How You Can Use Open Source Materials to Learn Python & Data Science - EuroPy...How You Can Use Open Source Materials to Learn Python & Data Science - EuroPy...
How You Can Use Open Source Materials to Learn Python & Data Science - EuroPy...
 
Scipy 2011 Time Series Analysis in Python
Scipy 2011 Time Series Analysis in PythonScipy 2011 Time Series Analysis in Python
Scipy 2011 Time Series Analysis in Python
 
"The OpenCV Open Source Computer Vision Library: What’s New and What’s Coming...
"The OpenCV Open Source Computer Vision Library: What’s New and What’s Coming..."The OpenCV Open Source Computer Vision Library: What’s New and What’s Coming...
"The OpenCV Open Source Computer Vision Library: What’s New and What’s Coming...
 
High Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2OHigh Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2O
 
Jay Yagnik at AI Frontiers : A History Lesson on AI
Jay Yagnik at AI Frontiers : A History Lesson on AIJay Yagnik at AI Frontiers : A History Lesson on AI
Jay Yagnik at AI Frontiers : A History Lesson on AI
 

Viewers also liked

Intro to machine learning with scikit learn
Intro to machine learning with scikit learnIntro to machine learning with scikit learn
Intro to machine learning with scikit learnYoss Cohen
 
Intro to scikit learn may 2017
Intro to scikit learn may 2017Intro to scikit learn may 2017
Intro to scikit learn may 2017Francesco Mosconi
 
Data Science and Machine Learning Using Python and Scikit-learn
Data Science and Machine Learning Using Python and Scikit-learnData Science and Machine Learning Using Python and Scikit-learn
Data Science and Machine Learning Using Python and Scikit-learnAsim Jalis
 
Realtime predictive analytics using RabbitMQ & scikit-learn
Realtime predictive analytics using RabbitMQ & scikit-learnRealtime predictive analytics using RabbitMQ & scikit-learn
Realtime predictive analytics using RabbitMQ & scikit-learnAWeber
 
Machine learning in production with scikit-learn
Machine learning in production with scikit-learnMachine learning in production with scikit-learn
Machine learning in production with scikit-learnJeff Klukas
 
Introduction to Machine Learning with Python and scikit-learn
Introduction to Machine Learning with Python and scikit-learnIntroduction to Machine Learning with Python and scikit-learn
Introduction to Machine Learning with Python and scikit-learnMatt Hagy
 
Authorship Attribution and Forensic Linguistics with Python/Scikit-Learn/Pand...
Authorship Attribution and Forensic Linguistics with Python/Scikit-Learn/Pand...Authorship Attribution and Forensic Linguistics with Python/Scikit-Learn/Pand...
Authorship Attribution and Forensic Linguistics with Python/Scikit-Learn/Pand...PyData
 
Tree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptionsTree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptionsGilles Louppe
 
Intro to scikit-learn
Intro to scikit-learnIntro to scikit-learn
Intro to scikit-learnAWeber
 
Machine learning with scikit-learn
Machine learning with scikit-learnMachine learning with scikit-learn
Machine learning with scikit-learnQingkai Kong
 
Numerical tour in the Python eco-system: Python, NumPy, scikit-learn
Numerical tour in the Python eco-system: Python, NumPy, scikit-learnNumerical tour in the Python eco-system: Python, NumPy, scikit-learn
Numerical tour in the Python eco-system: Python, NumPy, scikit-learnArnaud Joly
 
Machine Learning with scikit-learn
Machine Learning with scikit-learnMachine Learning with scikit-learn
Machine Learning with scikit-learnodsc
 
Think machine-learning-with-scikit-learn-chetan
Think machine-learning-with-scikit-learn-chetanThink machine-learning-with-scikit-learn-chetan
Think machine-learning-with-scikit-learn-chetanChetan Khatri
 
Exploring Machine Learning in Python with Scikit-Learn
Exploring Machine Learning in Python with Scikit-LearnExploring Machine Learning in Python with Scikit-Learn
Exploring Machine Learning in Python with Scikit-LearnKan Ouivirach, Ph.D.
 
Accelerating Random Forests in Scikit-Learn
Accelerating Random Forests in Scikit-LearnAccelerating Random Forests in Scikit-Learn
Accelerating Random Forests in Scikit-LearnGilles Louppe
 
Converting Scikit-Learn to PMML
Converting Scikit-Learn to PMMLConverting Scikit-Learn to PMML
Converting Scikit-Learn to PMMLVillu Ruusmann
 
Text Classification/Categorization
Text Classification/CategorizationText Classification/Categorization
Text Classification/CategorizationOswal Abhishek
 
Scikit-learn for easy machine learning: the vision, the tool, and the project
Scikit-learn for easy machine learning: the vision, the tool, and the projectScikit-learn for easy machine learning: the vision, the tool, and the project
Scikit-learn for easy machine learning: the vision, the tool, and the projectGael Varoquaux
 
Gradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnGradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnDataRobot
 

Viewers also liked (20)

Intro to machine learning with scikit learn
Intro to machine learning with scikit learnIntro to machine learning with scikit learn
Intro to machine learning with scikit learn
 
Intro to scikit learn may 2017
Intro to scikit learn may 2017Intro to scikit learn may 2017
Intro to scikit learn may 2017
 
Data Science and Machine Learning Using Python and Scikit-learn
Data Science and Machine Learning Using Python and Scikit-learnData Science and Machine Learning Using Python and Scikit-learn
Data Science and Machine Learning Using Python and Scikit-learn
 
Realtime predictive analytics using RabbitMQ & scikit-learn
Realtime predictive analytics using RabbitMQ & scikit-learnRealtime predictive analytics using RabbitMQ & scikit-learn
Realtime predictive analytics using RabbitMQ & scikit-learn
 
Machine learning in production with scikit-learn
Machine learning in production with scikit-learnMachine learning in production with scikit-learn
Machine learning in production with scikit-learn
 
Introduction to Machine Learning with Python and scikit-learn
Introduction to Machine Learning with Python and scikit-learnIntroduction to Machine Learning with Python and scikit-learn
Introduction to Machine Learning with Python and scikit-learn
 
Authorship Attribution and Forensic Linguistics with Python/Scikit-Learn/Pand...
Authorship Attribution and Forensic Linguistics with Python/Scikit-Learn/Pand...Authorship Attribution and Forensic Linguistics with Python/Scikit-Learn/Pand...
Authorship Attribution and Forensic Linguistics with Python/Scikit-Learn/Pand...
 
Tree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptionsTree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptions
 
Intro to scikit-learn
Intro to scikit-learnIntro to scikit-learn
Intro to scikit-learn
 
Machine learning with scikit-learn
Machine learning with scikit-learnMachine learning with scikit-learn
Machine learning with scikit-learn
 
Clustering: A Scikit Learn Tutorial
Clustering: A Scikit Learn TutorialClustering: A Scikit Learn Tutorial
Clustering: A Scikit Learn Tutorial
 
Numerical tour in the Python eco-system: Python, NumPy, scikit-learn
Numerical tour in the Python eco-system: Python, NumPy, scikit-learnNumerical tour in the Python eco-system: Python, NumPy, scikit-learn
Numerical tour in the Python eco-system: Python, NumPy, scikit-learn
 
Machine Learning with scikit-learn
Machine Learning with scikit-learnMachine Learning with scikit-learn
Machine Learning with scikit-learn
 
Think machine-learning-with-scikit-learn-chetan
Think machine-learning-with-scikit-learn-chetanThink machine-learning-with-scikit-learn-chetan
Think machine-learning-with-scikit-learn-chetan
 
Exploring Machine Learning in Python with Scikit-Learn
Exploring Machine Learning in Python with Scikit-LearnExploring Machine Learning in Python with Scikit-Learn
Exploring Machine Learning in Python with Scikit-Learn
 
Accelerating Random Forests in Scikit-Learn
Accelerating Random Forests in Scikit-LearnAccelerating Random Forests in Scikit-Learn
Accelerating Random Forests in Scikit-Learn
 
Converting Scikit-Learn to PMML
Converting Scikit-Learn to PMMLConverting Scikit-Learn to PMML
Converting Scikit-Learn to PMML
 
Text Classification/Categorization
Text Classification/CategorizationText Classification/Categorization
Text Classification/Categorization
 
Scikit-learn for easy machine learning: the vision, the tool, and the project
Scikit-learn for easy machine learning: the vision, the tool, and the projectScikit-learn for easy machine learning: the vision, the tool, and the project
Scikit-learn for easy machine learning: the vision, the tool, and the project
 
Gradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnGradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learn
 

Similar to Scikit-learn: the state of the union 2016

Building a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
Building a Cutting-Edge Data Process Environment on a Budget by Gael VaroquauxBuilding a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
Building a Cutting-Edge Data Process Environment on a Budget by Gael VaroquauxPyData
 
Processing biggish data on commodity hardware: simple Python patterns
Processing biggish data on commodity hardware: simple Python patternsProcessing biggish data on commodity hardware: simple Python patterns
Processing biggish data on commodity hardware: simple Python patternsGael Varoquaux
 
Computational practices for reproducible science
Computational practices for reproducible scienceComputational practices for reproducible science
Computational practices for reproducible scienceGael Varoquaux
 
Building collaborative workflows for scientific data
Building collaborative workflows for scientific dataBuilding collaborative workflows for scientific data
Building collaborative workflows for scientific dataBruno Vieira
 
Microservices Application Tracing Standards and Simulators - Adrians at OSCON
Microservices Application Tracing Standards and Simulators - Adrians at OSCONMicroservices Application Tracing Standards and Simulators - Adrians at OSCON
Microservices Application Tracing Standards and Simulators - Adrians at OSCONAdrian Cockcroft
 
Scikit-learn and nilearn: Democratisation of machine learning for brain imaging
Scikit-learn and nilearn: Democratisation of machine learning for brain imagingScikit-learn and nilearn: Democratisation of machine learning for brain imaging
Scikit-learn and nilearn: Democratisation of machine learning for brain imagingGael Varoquaux
 
Simple big data, in Python
Simple big data, in PythonSimple big data, in Python
Simple big data, in PythonGael Varoquaux
 
carrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-APIcarrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-APIYoni Davidson
 
Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm
Genomic Computation at Scale with Serverless, StackStorm and Docker SwarmGenomic Computation at Scale with Serverless, StackStorm and Docker Swarm
Genomic Computation at Scale with Serverless, StackStorm and Docker SwarmDmitri Zimine
 
Better neuroimaging data processing: driven by evidence, open communities, an...
Better neuroimaging data processing: driven by evidence, open communities, an...Better neuroimaging data processing: driven by evidence, open communities, an...
Better neuroimaging data processing: driven by evidence, open communities, an...Gael Varoquaux
 
H2O at Poznan R Meetup
H2O at Poznan R MeetupH2O at Poznan R Meetup
H2O at Poznan R MeetupJo-fai Chow
 
Deep learning with Keras
Deep learning with KerasDeep learning with Keras
Deep learning with KerasQuantUniversity
 
Scientific Computing @ Fred Hutch
Scientific Computing @ Fred HutchScientific Computing @ Fred Hutch
Scientific Computing @ Fred HutchDirk Petersen
 
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...Jason Dai
 
Bootiful Reactive Testing - Mario Gray
Bootiful Reactive Testing - Mario GrayBootiful Reactive Testing - Mario Gray
Bootiful Reactive Testing - Mario GrayVMware Tanzu
 
OpenCon2014 - Sumatra as an Open Science tool
OpenCon2014 - Sumatra as an Open Science toolOpenCon2014 - Sumatra as an Open Science tool
OpenCon2014 - Sumatra as an Open Science toolFelix Z. Hoffmann
 
Scikit-Learn: Machine Learning in Python
Scikit-Learn: Machine Learning in PythonScikit-Learn: Machine Learning in Python
Scikit-Learn: Machine Learning in PythonMicrosoft
 
Package a PyApp as a Flatpak Package: An HTTP Server for Example @ PyCon APAC...
Package a PyApp as a Flatpak Package: An HTTP Server for Example @ PyCon APAC...Package a PyApp as a Flatpak Package: An HTTP Server for Example @ PyCon APAC...
Package a PyApp as a Flatpak Package: An HTTP Server for Example @ PyCon APAC...Jian-Hong Pan
 

Similar to Scikit-learn: the state of the union 2016 (20)

Building a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
Building a Cutting-Edge Data Process Environment on a Budget by Gael VaroquauxBuilding a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
Building a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
 
Processing biggish data on commodity hardware: simple Python patterns
Processing biggish data on commodity hardware: simple Python patternsProcessing biggish data on commodity hardware: simple Python patterns
Processing biggish data on commodity hardware: simple Python patterns
 
Computational practices for reproducible science
Computational practices for reproducible scienceComputational practices for reproducible science
Computational practices for reproducible science
 
NYC_2016_slides
NYC_2016_slidesNYC_2016_slides
NYC_2016_slides
 
Building collaborative workflows for scientific data
Building collaborative workflows for scientific dataBuilding collaborative workflows for scientific data
Building collaborative workflows for scientific data
 
Microservices Application Tracing Standards and Simulators - Adrians at OSCON
Microservices Application Tracing Standards and Simulators - Adrians at OSCONMicroservices Application Tracing Standards and Simulators - Adrians at OSCON
Microservices Application Tracing Standards and Simulators - Adrians at OSCON
 
Scikit-learn and nilearn: Democratisation of machine learning for brain imaging
Scikit-learn and nilearn: Democratisation of machine learning for brain imagingScikit-learn and nilearn: Democratisation of machine learning for brain imaging
Scikit-learn and nilearn: Democratisation of machine learning for brain imaging
 
Simple big data, in Python
Simple big data, in PythonSimple big data, in Python
Simple big data, in Python
 
ConSol_IBM_webcast_quarkus_the_blue_hedgehog_of_java_web_frameworks
ConSol_IBM_webcast_quarkus_the_blue_hedgehog_of_java_web_frameworksConSol_IBM_webcast_quarkus_the_blue_hedgehog_of_java_web_frameworks
ConSol_IBM_webcast_quarkus_the_blue_hedgehog_of_java_web_frameworks
 
carrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-APIcarrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-API
 
Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm
Genomic Computation at Scale with Serverless, StackStorm and Docker SwarmGenomic Computation at Scale with Serverless, StackStorm and Docker Swarm
Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm
 
Better neuroimaging data processing: driven by evidence, open communities, an...
Better neuroimaging data processing: driven by evidence, open communities, an...Better neuroimaging data processing: driven by evidence, open communities, an...
Better neuroimaging data processing: driven by evidence, open communities, an...
 
H2O at Poznan R Meetup
H2O at Poznan R MeetupH2O at Poznan R Meetup
H2O at Poznan R Meetup
 
Deep learning with Keras
Deep learning with KerasDeep learning with Keras
Deep learning with Keras
 
Scientific Computing @ Fred Hutch
Scientific Computing @ Fred HutchScientific Computing @ Fred Hutch
Scientific Computing @ Fred Hutch
 
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
 
Bootiful Reactive Testing - Mario Gray
Bootiful Reactive Testing - Mario GrayBootiful Reactive Testing - Mario Gray
Bootiful Reactive Testing - Mario Gray
 
OpenCon2014 - Sumatra as an Open Science tool
OpenCon2014 - Sumatra as an Open Science toolOpenCon2014 - Sumatra as an Open Science tool
OpenCon2014 - Sumatra as an Open Science tool
 
Scikit-Learn: Machine Learning in Python
Scikit-Learn: Machine Learning in PythonScikit-Learn: Machine Learning in Python
Scikit-Learn: Machine Learning in Python
 
Package a PyApp as a Flatpak Package: An HTTP Server for Example @ PyCon APAC...
Package a PyApp as a Flatpak Package: An HTTP Server for Example @ PyCon APAC...Package a PyApp as a Flatpak Package: An HTTP Server for Example @ PyCon APAC...
Package a PyApp as a Flatpak Package: An HTTP Server for Example @ PyCon APAC...
 

More from Gael Varoquaux

Evaluating machine learning models and their diagnostic value
Evaluating machine learning models and their diagnostic valueEvaluating machine learning models and their diagnostic value
Evaluating machine learning models and their diagnostic valueGael Varoquaux
 
Measuring mental health with machine learning and brain imaging
Measuring mental health with machine learning and brain imagingMeasuring mental health with machine learning and brain imaging
Measuring mental health with machine learning and brain imagingGael Varoquaux
 
Machine learning with missing values
Machine learning with missing valuesMachine learning with missing values
Machine learning with missing valuesGael Varoquaux
 
Dirty data science machine learning on non-curated data
Dirty data science machine learning on non-curated dataDirty data science machine learning on non-curated data
Dirty data science machine learning on non-curated dataGael Varoquaux
 
Representation learning in limited-data settings
Representation learning in limited-data settingsRepresentation learning in limited-data settings
Representation learning in limited-data settingsGael Varoquaux
 
Functional-connectome biomarkers to meet clinical needs?
Functional-connectome biomarkers to meet clinical needs?Functional-connectome biomarkers to meet clinical needs?
Functional-connectome biomarkers to meet clinical needs?Gael Varoquaux
 
Atlases of cognition with large-scale human brain mapping
Atlases of cognition with large-scale human brain mappingAtlases of cognition with large-scale human brain mapping
Atlases of cognition with large-scale human brain mappingGael Varoquaux
 
Similarity encoding for learning on dirty categorical variables
Similarity encoding for learning on dirty categorical variablesSimilarity encoding for learning on dirty categorical variables
Similarity encoding for learning on dirty categorical variablesGael Varoquaux
 
Machine learning for functional connectomes
Machine learning for functional connectomesMachine learning for functional connectomes
Machine learning for functional connectomesGael Varoquaux
 
Towards psychoinformatics with machine learning and brain imaging
Towards psychoinformatics with machine learning and brain imagingTowards psychoinformatics with machine learning and brain imaging
Towards psychoinformatics with machine learning and brain imagingGael Varoquaux
 
Simple representations for learning: factorizations and similarities
Simple representations for learning: factorizations and similarities Simple representations for learning: factorizations and similarities
Simple representations for learning: factorizations and similarities Gael Varoquaux
 
A tutorial on Machine Learning, with illustrations for MR imaging
A tutorial on Machine Learning, with illustrations for MR imagingA tutorial on Machine Learning, with illustrations for MR imaging
A tutorial on Machine Learning, with illustrations for MR imagingGael Varoquaux
 
Estimating Functional Connectomes: Sparsity’s Strength and Limitations
Estimating Functional Connectomes: Sparsity’s Strength and LimitationsEstimating Functional Connectomes: Sparsity’s Strength and Limitations
Estimating Functional Connectomes: Sparsity’s Strength and LimitationsGael Varoquaux
 
Scientist meets web dev: how Python became the language of data
Scientist meets web dev: how Python became the language of dataScientist meets web dev: how Python became the language of data
Scientist meets web dev: how Python became the language of dataGael Varoquaux
 
Machine learning and cognitive neuroimaging: new tools can answer new questions
Machine learning and cognitive neuroimaging: new tools can answer new questionsMachine learning and cognitive neuroimaging: new tools can answer new questions
Machine learning and cognitive neuroimaging: new tools can answer new questionsGael Varoquaux
 
Social-sparsity brain decoders: faster spatial sparsity
Social-sparsity brain decoders: faster spatial sparsitySocial-sparsity brain decoders: faster spatial sparsity
Social-sparsity brain decoders: faster spatial sparsityGael Varoquaux
 
Inter-site autism biomarkers from resting state fMRI
Inter-site autism biomarkers from resting state fMRIInter-site autism biomarkers from resting state fMRI
Inter-site autism biomarkers from resting state fMRIGael Varoquaux
 
Brain maps from machine learning? Spatial regularizations
Brain maps from machine learning? Spatial regularizationsBrain maps from machine learning? Spatial regularizations
Brain maps from machine learning? Spatial regularizationsGael Varoquaux
 
Scikit-learn: apprentissage statistique en Python. Créer des machines intelli...
Scikit-learn: apprentissage statistique en Python. Créer des machines intelli...Scikit-learn: apprentissage statistique en Python. Créer des machines intelli...
Scikit-learn: apprentissage statistique en Python. Créer des machines intelli...Gael Varoquaux
 
Scikit learn: apprentissage statistique en Python
Scikit learn: apprentissage statistique en PythonScikit learn: apprentissage statistique en Python
Scikit learn: apprentissage statistique en PythonGael Varoquaux
 

More from Gael Varoquaux (20)

Evaluating machine learning models and their diagnostic value
Evaluating machine learning models and their diagnostic valueEvaluating machine learning models and their diagnostic value
Evaluating machine learning models and their diagnostic value
 
Measuring mental health with machine learning and brain imaging
Measuring mental health with machine learning and brain imagingMeasuring mental health with machine learning and brain imaging
Measuring mental health with machine learning and brain imaging
 
Machine learning with missing values
Machine learning with missing valuesMachine learning with missing values
Machine learning with missing values
 
Dirty data science machine learning on non-curated data
Dirty data science machine learning on non-curated dataDirty data science machine learning on non-curated data
Dirty data science machine learning on non-curated data
 
Representation learning in limited-data settings
Representation learning in limited-data settingsRepresentation learning in limited-data settings
Representation learning in limited-data settings
 
Functional-connectome biomarkers to meet clinical needs?
Functional-connectome biomarkers to meet clinical needs?Functional-connectome biomarkers to meet clinical needs?
Functional-connectome biomarkers to meet clinical needs?
 
Atlases of cognition with large-scale human brain mapping
Atlases of cognition with large-scale human brain mappingAtlases of cognition with large-scale human brain mapping
Atlases of cognition with large-scale human brain mapping
 
Similarity encoding for learning on dirty categorical variables
Similarity encoding for learning on dirty categorical variablesSimilarity encoding for learning on dirty categorical variables
Similarity encoding for learning on dirty categorical variables
 
Machine learning for functional connectomes
Machine learning for functional connectomesMachine learning for functional connectomes
Machine learning for functional connectomes
 
Towards psychoinformatics with machine learning and brain imaging
Towards psychoinformatics with machine learning and brain imagingTowards psychoinformatics with machine learning and brain imaging
Towards psychoinformatics with machine learning and brain imaging
 
Simple representations for learning: factorizations and similarities
Simple representations for learning: factorizations and similarities Simple representations for learning: factorizations and similarities
Simple representations for learning: factorizations and similarities
 
A tutorial on Machine Learning, with illustrations for MR imaging
A tutorial on Machine Learning, with illustrations for MR imagingA tutorial on Machine Learning, with illustrations for MR imaging
A tutorial on Machine Learning, with illustrations for MR imaging
 
Estimating Functional Connectomes: Sparsity’s Strength and Limitations
Estimating Functional Connectomes: Sparsity’s Strength and LimitationsEstimating Functional Connectomes: Sparsity’s Strength and Limitations
Estimating Functional Connectomes: Sparsity’s Strength and Limitations
 
Scientist meets web dev: how Python became the language of data
Scientist meets web dev: how Python became the language of dataScientist meets web dev: how Python became the language of data
Scientist meets web dev: how Python became the language of data
 
Machine learning and cognitive neuroimaging: new tools can answer new questions
Machine learning and cognitive neuroimaging: new tools can answer new questionsMachine learning and cognitive neuroimaging: new tools can answer new questions
Machine learning and cognitive neuroimaging: new tools can answer new questions
 
Social-sparsity brain decoders: faster spatial sparsity
Social-sparsity brain decoders: faster spatial sparsitySocial-sparsity brain decoders: faster spatial sparsity
Social-sparsity brain decoders: faster spatial sparsity
 
Inter-site autism biomarkers from resting state fMRI
Inter-site autism biomarkers from resting state fMRIInter-site autism biomarkers from resting state fMRI
Inter-site autism biomarkers from resting state fMRI
 
Brain maps from machine learning? Spatial regularizations
Brain maps from machine learning? Spatial regularizationsBrain maps from machine learning? Spatial regularizations
Brain maps from machine learning? Spatial regularizations
 
Scikit-learn: apprentissage statistique en Python. Créer des machines intelli...
Scikit-learn: apprentissage statistique en Python. Créer des machines intelli...Scikit-learn: apprentissage statistique en Python. Créer des machines intelli...
Scikit-learn: apprentissage statistique en Python. Créer des machines intelli...
 
Scikit learn: apprentissage statistique en Python
Scikit learn: apprentissage statistique en PythonScikit learn: apprentissage statistique en Python
Scikit learn: apprentissage statistique en Python
 

Recently uploaded

Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backElena Simperl
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsPaul Groth
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlPeter Udo Diehl
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutesconfluent
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Julian Hyde
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCzechDreamin
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIES VE
 
The architecture of Generative AI for enterprises.pdf
The architecture of Generative AI for enterprises.pdfThe architecture of Generative AI for enterprises.pdf
The architecture of Generative AI for enterprises.pdfalexjohnson7307
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomCzechDreamin
 
UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1DianaGray10
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...Product School
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...Product School
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekCzechDreamin
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...Product School
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...CzechDreamin
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀DianaGray10
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2DianaGray10
 

Recently uploaded (20)

Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
The architecture of Generative AI for enterprises.pdf
The architecture of Generative AI for enterprises.pdfThe architecture of Generative AI for enterprises.pdf
The architecture of Generative AI for enterprises.pdf
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří Karpíšek
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2
 

Scikit-learn: the state of the union 2016

  • 1. Scikit-learn The state of the union Ga¨el Varoquaux Open Source Innovation Spring 2016 Personal point of view, as an opening to scikit-learn days 2016 in Paris
  • 2. 1 Some history Scikit-learn canal historique G Varoquaux 2
  • 3. 1 scikit-learn growth: users Website users (weekly): Google analytics Debian popcon: ∼ 1% of the Debian users G Varoquaux 3
  • 4. 1 scikit-learn growth: users Website users (weekly): Google analytics Debian popcon: ∼ 1% of the Debian users Web searches: Google trends G Varoquaux 3
  • 5. 1 scikit-learn growth: lines of code Lines of code: Huge feature set https://www.openhub.net/p/scikit-learn G Varoquaux 4
  • 6. 1 scikit-learn growth: contributors Contributors: 759 contributors https://www.openhub.net/p/scikit-learn G Varoquaux 5
  • 7. 1 Started as David Cournapeau’s failed PhD project David then preferred improving numpy/scipy That’s David sprinting in 2011 G Varoquaux 6
  • 8. 1 2009: We (Inria Parietal) need machine learning My team takes over the development Hire a young guy (Fabian Pedregosa) Put post-docs and PhDs (Alexandre Gramfort, Vincent Michel...) Work in the open Pythonic, fast, documented G Varoquaux 7
  • 9. 1 2010: ICML MLOSS workshop Machine Learning Open Source Software “The examples in the tutorial are pretty, but not particularly useful for the serious user.” “For the sustainability of the project it might be bet- ter to narrow the focus...” G Varoquaux 8
  • 10. 1 2011: NIPS sprint People that I didn’t know were solving my problems G Varoquaux 9
  • 11. 1 2011: NIPS sprint People that I didn’t know were solving my problems The project took off because of the community... G Varoquaux 9
  • 12. 2 Upcoming cool stuff Upcoming 0.18 release G Varoquaux 10
  • 13. 2 Less code: Lines of code: G Varoquaux 11
  • 14. 2 Less code: Cython no longer embedded Lines of code: Generated C no longuer embedded in git ⇒ opens the door to fused-types (polymorphism) ⇒ multiple dtypes support in algorithm = memory saver Arthur MenschG Varoquaux 11
  • 15. 2 Faster code: better algorithmics RandomizedPCA → PCA Automatic choice randomized linear algebra power iteration (arpack) full (lapack) For large data: up to 20× speed up https://github.com/scikit-learn/scikit-learn/issues/5243 Giorgio Patrini G Varoquaux 12
  • 16. 2 Faster code: better algorithmics RandomizedPCA → PCA Automatic choice randomized linear algebra power iteration (arpack) full (lapack) For large data: up to 20× speed up https://github.com/scikit-learn/scikit-learn/issues/5243 Giorgio Patrini Elkan’s K means For large data: ∼ 2× speed up. https://github.com/scikit-learn/scikit-learn/pull/5414 Andreas M¨uller G Varoquaux 12
  • 17. 2 New cross-validation objects from s k l e a r n . c r o s s v a l i d a t i o n import S t r a t i f i e d K F o l d cv = S t r a t i f i e d K F o l d (y , n f o l d s =2) for t r a i n , t e s t in cv : X t r a i n = X[ t r a i n ] y t a i n = y[ t r a i n ] Data-independent nested-CV possible https://github.com/scikit-learn/scikit-learn/pull/4294 Raghav R V G Varoquaux 13
  • 18. 2 New cross-validation objects from s k l e a r n . m o d e l s e l e c t i o n import S t r a t i f i e d K F o l d cv = S t r a t i f i e d K F o l d ( n f o l d s =2) for t r a i n , t e s t in cv . s p l i t (X, y): X t r a i n = X[ t r a i n ] y t a i n = y[ t r a i n ] Data-independent ⇒ nested-CV possible https://github.com/scikit-learn/scikit-learn/pull/4294 Raghav R V G Varoquaux 13
  • 19. 2 Sequential / Bayesian search CV See hyper-parameter selection as a Bayesian optimization / noisy fit problem. ⇒ choose hyper-parameters cleverly, not on a grid Pull request stalled https://github.com/scikit-learn/scikit-learn/pull/5491 Fabian Pedregosa, Sebastien Dubois, & Manoj Kumar G Varoquaux 14
  • 20. 3 Vision(s): the future G Varoquaux 15
  • 21. Mission statement Enable progress via data science Lower the costs, less technicalities Machine learning for everybody and for everything G Varoquaux 16
  • 22. Mission statement Enable progress via data science Lower the costs, less technicalities Machine learning for everybody and for everything Small hardware, medium data G Varoquaux 16
  • 23. 3 Deep learning sklearn.neural network.MLPClassifier architecture-specification language GPUs unbound technicality G Varoquaux 17
  • 24. 3 Deep learning sklearn.neural network.MLPClassifier architecture-specification language GPUs unbound technicality keras, caffe... G Varoquaux 17
  • 25. 3 AutoML Automatic model selection Better hyper-parameter selection Better description and uniformization of estimators Integrate feedback from auto-sklearn G Varoquaux 18
  • 26. 3 Better, faster, stronger Faster models From lightning, back to sklearn Inspiration from XGBoost the paper is out! G Varoquaux 19
  • 27. 3 Better, faster, stronger Faster models From lightning, back to sklearn Inspiration from XGBoost the paper is out! Larger data More partial fit online forests? Less copies G Varoquaux 19
  • 28. 3 Scaling up (out?) I don’t want java/scala Less fluid prototyping Cross-VM debugging hard Numerics in java slowers than Lapack Need C somewhere G Varoquaux 20
  • 29. 3 Scaling up (out?) I don’t want java/scala They have: Coupling distributed store to computation Distributed job management Create new stack? Ride on this one? G Varoquaux 20
  • 30. 3 Scaling up (out?) I don’t want java/scala They have: Coupling distributed store to computation Distributed job management Create new stack? Ride on this one? Blaze, Ibis, dask: require rewrite of algorithms dask promising for ETL New backends for joblib parallel and storage distributed, ssh G Varoquaux 20
  • 31. Sustainable growth Reviewing is the bottleneck User support drowns core devs Users need stability (Airbus) Coding is not the only thing sprint, GSOC management, tutorials... G Varoquaux 21
  • 32. Sustainable growth Reviewing is the bottleneck User support drowns core devs Users need stability (Airbus) Coding is not the only thing sprint, GSOC management, tutorials... Structure & stability How to organize funding and governance? process/meetings/reports/funding proposal... = work on project Passionate coders get a lot done unless they get drowned by meetings G Varoquaux 21
  • 33. @GaelVaroquaux Funding: Inria, Nexedi, Paris-Saclay CDS, NYU CDS, GSoC