Machine Learning

(comic from xkcd)
Machine learning using scikit-learn

Qingkai Kong

2017-06-28

http://seismo.berkeley.edu/qingkaikong/
  
https://github.com/qingkaikong/20170628_ML_sklearn
  
After downloads
  
Workshop time

Learning curve

Gentle Introduction

•  What’s ML

•  Types of ML

ML tours

•  Classification

•  Regression

•  Clustering

•  Dim. reduction

Common Practices

What is machine learning?

Pipeline of training a machine learning model:
data (examples) → tunable model → optimization algorithm → trained model
Self-driving car

Voice recognition 

…

Not always working
Common types of machine learning

•  Supervised learning
•  Unsupervised learning

Supervised learning

•  Regression
•  Classification
Regression (figure: fitting a line to X-Y data)

Classification (figure: separating the data into classes)
Unsupervised learning

•  Clustering
•  Dimensionality reduction

Clustering (figure: grouping unlabeled points)
Dimensionality reduction
	
  
Figure from: https://stats.stackexchange.com/questions/183236/what-is-the-relation-between-k-means-clustering-and-pca
Pipeline of training a machine learning model (revisited):
data (examples) → tunable model → optimization algorithm → trained model
Representation of data

Raw data (documents, images, numbers, sounds) is arranged into:

•  Feature matrix (X): n_samples × n_features
•  Target (y): n_samples
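As a concrete sketch of this layout, here is how scikit-learn's built-in iris dataset (used purely for illustration; the workshop notebooks may use other data) exposes X and y:

```python
from sklearn.datasets import load_iris

# Load a small example dataset: 150 flowers, 4 measurements each
iris = load_iris()
X = iris.data    # feature matrix, shape (n_samples, n_features) = (150, 4)
y = iris.target  # target vector, shape (n_samples,) = (150,)

print(X.shape)  # (150, 4)
print(y.shape)  # (150,)
```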
Optimization

How do we optimize? A simple example (figure: X-Y data to be fit)
Measuring error (figure: a candidate line through the X-Y data; the gap between each point and the line is the error)

Each parameter value gives a different error:

Parameter   Error
ω1          e1
ω2          e2
ω3          e3
Plotting the error e against the parameter ω gives the cost function (figure: error curve over ω).

Optimization (figure: starting from an initial start on the cost curve and stepping downhill toward the target minimum)
Of course, there are many different types of cost functions and
optimization algorithms, but the general idea is very similar.
Ideal cost function vs. real-world cost function (figures: a single smooth minimum vs. a bumpy curve with many local minima)
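A minimal NumPy sketch of this idea, assuming a one-parameter model and a squared-error cost (the slides don't fix a specific algorithm; this is plain gradient descent):

```python
import numpy as np

# Toy data: y is roughly 3*x, so the best weight is near ω = 3
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 5.9, 9.2, 11.8])

w = 0.0      # initial start
lr = 0.01    # learning rate (step size)
for _ in range(200):
    error = w * x - y              # residuals of the current model
    grad = 2 * np.mean(error * x)  # d/dw of the mean squared error
    w -= lr * grad                 # step downhill on the cost curve

print(w)  # ~3.0, close to the target minimum
```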
  
Support Vector Machine

 Artificial Neural Network

K-means

Naïve Bayes

Decision Tree

Random Forest

Ridge Regression

Self-Organizing Map

Logistic Regression

Linear Regression

Stepwise Regression

Nearest Neighbor

Hierarchical clustering

Principal Component Analysis 

Gaussian Mixture Model

AdaBoost

 Boosting

Linear Discriminant Analysis

Quadratic Discriminant Analysis

Convolutional Neural Network

Bayesian Network

LASSO

Ordinary Least Squares Regression

Recurrent Neural Network

Generative Adversarial Networks

Radial Basis Function Network
http://scikit-learn.org/stable/
Go to notebook 01
Classification (supervised learning)

•  Support Vector Machine
•  Artificial Neural Network
Which one is better? (figure: two candidate decision boundaries separating the classes, then a new data point)

The SVM prefers the wide margin over the narrow margin; the data points that define the margin are the support vectors.
ANN (figure: a single neuron with inputs feature1, feature2, feature3 plus a constant 1, weights ω0, ω1, ω2, ω3, a summation Σ, and an activation g producing the output y)

X – input data
y – output target
ωi – weights
Σ – summation
g – activation function
Blue circle – bias

Z = Σ = ω0x0 + ω1x1 + ω2x2 + ω3x3 + … + ωnxn

y = g(ω0x0 + ω1x1 + ω2x2 + ω3x3 + … + ωnxn)
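A tiny NumPy sketch of this single neuron; the sigmoid used for g here is an assumption, since the slides leave the activation unspecified:

```python
import numpy as np

def g(z):
    # Sigmoid activation (one common choice for g; an assumption here)
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 0.2, 0.7, 0.5])   # x0 = 1 is the bias input
w = np.array([0.1, 0.4, -0.3, 0.8])  # weights ω0..ω3

z = np.dot(w, x)  # Z = Σ ωi·xi
y = g(z)          # output after the activation
print(y)
```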
Stacking the same building block gives a network (figure: inputs feature1, feature2, feature3 plus a bias 1 feed four hidden units Hidden1 to Hidden4, each computing Σ | g; their outputs, with another bias, feed a final Σ and g to produce y)

X – input data
y – output target
Σ – summation
g – activation function
blue circle – bias
Intuitive Artificial Neural Network (figure: facial features as inputs, weights w1, w2, …, wn, and output F(eye×w1 + nose×w2 + … + mouth×wn) answering "Sheldon Cooper?")

The error between the network's output and the true answer is fed back to adjust the weights, and the cycle of prediction, error, and feedback repeats.
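A hedged scikit-learn sketch of such a network (the layer size and iteration count are arbitrary illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

iris = load_iris()
X, y = iris.data, iris.target

# One hidden layer of 10 units; training repeats the forward pass
# plus error-feedback loop sketched above
clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy
```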
Go to notebook 02
Regression (supervised learning)
Simple linear regression (figure: data points (x1, y1), (x2, y2), …, the fitted line y = a + bx, and the residual y2-ŷ2 between the point y2 and its prediction ŷ2)

Minimize:  Σ (yi-ŷi)²  for i = 1, …, n
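A minimal least-squares sketch in scikit-learn, using made-up 1-D data for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data roughly following y = 1 + 2x plus noise
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

reg = LinearRegression()          # minimizes Σ (yi-ŷi)²
reg.fit(X, y)
print(reg.intercept_, reg.coef_)  # ≈ a and b of the line y = a + bx
```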
Random Forest

http://scikit-learn.org/stable/modules/tree.html#tree

An ensemble of decision trees (figure); the final prediction is the majority vote over the trees.
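A hedged scikit-learn sketch (iris data and 100 trees are illustrative assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
X, y = iris.data, iris.target

# 100 trees, each trained on a random view of the data;
# predictions are the majority vote across the trees
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
print(clf.predict(X[:5]))
```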
Go to notebook 03
Unsupervised learning

•  Principal component analysis
•  K-means
Figure from: https://shapeofdata.wordpress.com/2013/04/09/principle-component-analysis/
PCA
Figure from: http://setosa.io/ev/principal-component-analysis/
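A minimal PCA sketch in scikit-learn, reducing iris from 4 features to 2 (the component count is an illustrative choice):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data       # (150, 4)

pca = PCA(n_components=2)  # keep the 2 directions of largest variance
X2 = pca.fit_transform(X)  # (150, 2)
print(pca.explained_variance_ratio_)  # variance captured per component
```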
  
K-means
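A matching K-means sketch (3 clusters is an illustrative choice, matching the number of iris species):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data

km = KMeans(n_clusters=3, random_state=0)  # group points into 3 clusters
labels = km.fit_predict(X)                 # cluster index for each sample
print(labels[:10])
print(km.cluster_centers_.shape)           # (3, 4)
```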
Go to notebook 04
More on common practices
Machine learning is all about generalization.
  
Which one is better? (figure: three candidate fits to the same data points, each marked with a "?")

Clearly, black is better (figure: the wiggliest fit passes through every point)

Underfitting / Just right / Overfitting (figure: the three fits labeled)
Train/test dataset split (figure: the dataset divided into 80% train and 20% test)
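A standard scikit-learn sketch of this split; the 20% test fraction matches the slide, while the seed is arbitrary:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target

# Hold out 20% of the data for testing, train on the remaining 80%
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
```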
Mean error vs. model complexity (figure: underfitting at low complexity, overfitting at high complexity, and "just right" in between)
5-fold cross-validation (figure: the data split into 5 folds; each fold takes one turn as the test set while the remaining folds train the model)
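A matching sketch with scikit-learn's cross_val_score (5 folds as on the slide; the SVM model is an illustrative choice):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

iris = load_iris()

# Each of the 5 folds serves once as the test set
scores = cross_val_score(SVC(kernel='linear'), iris.data, iris.target, cv=5)
print(scores)         # one accuracy score per fold
print(scores.mean())  # average accuracy
```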
Pre-processing data
Age    Salary
24     $110,000
36     $130,000
38     $80,000
44     $420,000
27     $420,000
43     $12,000,000
…      …

Why do we need it? Features like age and salary live on very different scales, so without pre-processing the large salary values would dominate many algorithms.
Check out: http://scikit-learn.org/stable/modules/preprocessing.html
Many ways:

•  Standardization
   o  Zero mean and unit variance
•  Scale to a range
   o  e.g. (0, 1)
•  Normalization
   o  Unit norm
•  …
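A hedged sketch applying two of these transforms to the age/salary table above:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# [age, salary] rows copied from the table above
X = np.array([[24, 110000],
              [36, 130000],
              [38, 80000],
              [44, 420000],
              [27, 420000],
              [43, 12000000]], dtype=float)

# Standardization: each column gets zero mean and unit variance
print(StandardScaler().fit_transform(X))

# Scale to a range: each column squeezed into (0, 1)
print(MinMaxScaler().fit_transform(X))
```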
Go to notebook 05
More resources
Interactive version: http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
Conclude with notebook 06

Machine learning with scikit-learn