Introduction to scikit-learn
Predictive modeling in Python
Olivier Grisel
Slides: ogrisel.github.io/decks/2019_intro_sklearn
Agenda
Machine Learning refresher
Where do predictive models fit?
Scikit-learn
The open-source development process
Predictive Modeling 101
Make predictions about the outcomes of repeated events
Extract the structure of historical records
Statistical tools to summarize the training data into an executable model
Alternative to hard-coded rules written by experts
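As a rough illustration of the contrast between a hard-coded rule and a model fitted on historical records, here is a minimal sketch. The toy data, the `expert_rule` function, and its threshold of 50 are all hypothetical, and a depth-1 decision tree stands in for "statistical tools" in general.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical historical records: one numeric feature per past event
# and the observed outcome (0 or 1) for that event.
X_history = np.array([[5.0], [12.0], [18.0], [25.0], [31.0], [40.0], [47.0], [55.0]])
y_history = np.array([0, 0, 0, 0, 1, 1, 1, 1])


def expert_rule(value):
    """Hard-coded rule: the threshold is picked by hand, not learned."""
    return int(value > 50.0)


# The fitted model summarizes the records into an executable predictor:
# a depth-1 tree learns a single threshold from the data.
model = DecisionTreeClassifier(max_depth=1, random_state=0)
model.fit(X_history, y_history)

X_new = np.array([[20.0], [45.0]])
rule_predictions = [expert_rule(v) for v in X_new[:, 0]]
model_predictions = model.predict(X_new).tolist()
```

On this toy data the learned threshold falls between 25 and 31, so the tree and the hand-written rule disagree on the second new event.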
Where do predictive models fit?
Scikit-learn
Library of Machine Learning algorithms
Open Source project
Python / NumPy / SciPy / Cython
Simple fit / predict / transform API
Model Assessment, Selection, Ensembles
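The fit / predict / transform API and the model assessment tools listed above can be sketched as follows. This is only an illustration on synthetic data: the dataset parameters and the choice of `StandardScaler` with `LogisticRegression` are assumptions, not something the slides prescribe.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data, used only for illustration.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# transform: fit() learns the per-feature mean and scale,
# transform() applies the rescaling.
scaler = StandardScaler()
X_scaled = scaler.fit(X).transform(X)

# fit / predict: a pipeline chains the transformer and the classifier
# behind the same estimator interface.
pipeline = make_pipeline(StandardScaler(), LogisticRegression())
pipeline.fit(X, y)
predictions = pipeline.predict(X[:5])

# Model assessment: 5-fold cross-validated F1 score of the whole pipeline.
scores = cross_val_score(pipeline, X, y, cv=5, scoring="f1")
print(scores.mean())
```

Cross-validating the pipeline rather than the bare classifier means the scaler is re-fitted on each training fold, so no statistics from the held-out fold leak into training.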
Core team: Australia, China, France, Germany, USA
Users: Scientific/Academic and Business/Industry
Linear Classifier
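from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# penalty='l1' needs a solver that supports it, such as liblinear
model = LogisticRegression(C=1, penalty='l1', solver='liblinear')
model.fit(X_train, y_train)
y_predicted = model.predict(X_test)

# F1 score on the held-out test set (binary targets by default)
f1_score(y_test, y_predicted)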
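The snippet above assumes that `X_train`, `X_test`, `y_train` and `y_test` already exist. One possible way to obtain them, sketched here with synthetic data from `make_classification` (the dataset sizes and the 75/25 split are arbitrary choices for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification problem with a few informative features.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)

# Hold out 25% of the samples to measure generalization.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# liblinear is one of the solvers that supports the L1 penalty.
model = LogisticRegression(C=1, penalty="l1", solver="liblinear")
model.fit(X_train, y_train)
y_predicted = model.predict(X_test)
score = f1_score(y_test, y_predicted)

# The L1 penalty tends to drive some coefficients to exactly zero.
n_zero_coefficients = int((model.coef_ == 0).sum())
print(score, n_zero_coefficients)
```

Counting the zero coefficients is a quick way to see the sparsity that the L1 penalty encourages.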
Support Vector Machine
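from sklearn.metrics import f1_score
from sklearn.svm import SVC

# RBF-kernel SVM; C and gamma usually need tuning for each dataset
model = SVC(kernel="rbf", C=1.0, gamma=1e-4)
model.fit(X_train, y_train)
y_predicted = model.predict(X_test)

# F1 score on the held-out test set
f1_score(y_test, y_predicted)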
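The values `C=1.0` and `gamma=1e-4` above are fixed by hand. A minimal sketch of selecting them by cross-validated grid search instead, assuming synthetic data and an arbitrary, purely illustrative parameter grid:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic binary classification data, used only for illustration.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Candidate hyperparameter values (an arbitrary grid for this sketch).
param_grid = {"C": [0.1, 1.0, 10.0], "gamma": [1e-4, 1e-3, 1e-2, 1e-1]}

# Each combination is scored by 3-fold cross-validation on the training set;
# the best one is then refitted on the full training set.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, scoring="f1", cv=3)
search.fit(X_train, y_train)

best_params = search.best_params_
test_score = search.score(X_test, y_test)  # F1 of the refitted best model
print(best_params, test_score)
```

The fitted `GridSearchCV` object is itself an estimator, so it can be used with `predict` exactly like the plain `SVC`.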
Random Forest
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

# Ensemble of 200 randomized decision trees
model = RandomForestClassifier(n_estimators=200)
model.fit(X_train, y_train)
y_predicted = model.predict(X_test)

# F1 score on the held-out test set
f1_score(y_test, y_predicted)
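The three snippets above differ only in the line that constructs the model. Because every estimator shares the same fit / predict interface, they can be compared in a single loop. The sketch below assumes synthetic data; `solver="liblinear"` and `random_state=0` are additions made here for compatibility and reproducibility.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary classification data, used only for illustration.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Same hyperparameters as in the individual snippets.
models = {
    "logistic regression": LogisticRegression(
        C=1, penalty="l1", solver="liblinear"
    ),
    "svm": SVC(kernel="rbf", C=1.0, gamma=1e-4),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Identical training and evaluation code for every model.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: F1 = {score:.3f}")
```

The scores on such a toy dataset say little about the relative merits of the algorithms; the point is only that swapping models does not change the surrounding code.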
Scikit-learn development
Conclusion
Secrets of the success of Python (& R) in Data Science
Iterative exploration with built-in plotting tools
Low latency of single-host, in-memory computing
Easy to install, easy to teach: no sysadmin required
Rich ecosystem of libraries
Thank you for your attention!
https://scikit-learn.org
Slides: ogrisel.github.io/decks/2019_intro_sklearn
@ogrisel on Twitter
Background image credits
https://www.flickr.com/photos/jemimus/8533890844/
https://www.flickr.com/photos/antcaz/2249694239/
https://www.flickr.com/photos/benjamine-s/14004414605
https://www.flickr.com/photos/a-herzog/9026372290

#OSSPARIS19: Introduction to scikit-learn - Olivier Grisel, Inria