CS-E3210 Machine Learning: Basic Principles
Lecture 5: Classification I
slides by Alexander Jung, 2017
Department of Computer Science
Aalto University, School of Science
Autumn (Period I) 2017
Today’s Motto
similar features give similar labels
Material
this lecture is inspired by
video lectures of Andrew Ng
https://www.youtube.com/watch?v=-la3q9d7AKQ
https://www.youtube.com/watch?v=7F-CuXdTQ5k
lecture notes
http://cs229.stanford.edu/notes/cs229-notes1.pdf
Ch. 2.2 of the tutorial “Kernel Methods in Computer Vision” by Ch. Lampert: https://pub.ist.ac.at/~chl/papers/lampert-fnt2009.pdf
lecture notes: http://www.robots.ox.ac.uk/~az/lectures/ml/lect2.pdf
In A Nutshell
today we consider classification problems
consider data points z with features x and label y
we want to learn a classifier h(·) that predicts y via h(x)
today we consider parametric classifiers h^{(w,b)}
a classifier is represented by its parameters w, b
we learn/find optimal parameters w, b using training data X
once we have learnt the optimal parameters, we can discard the training data!
Outline
1 A Classification Problem
2 Logistic Regression
3 Support Vector Classification
4 Wrap Up
Ski Resort Marketing
you are working for the marketing agency of a ski resort
a hard disk full of webcam snapshots (gigabytes of data)
we want to group them into “winter” and “summer” images
you have only a few hours for this task ...
Webcam Snapshots
Labeled Webcam Snapshots
create a dataset X by randomly selecting N = 6 snapshots
manually categorise/label them (y^{(i)} = 1 for summer)
Towards an ML Problem
we have a few labeled snapshots in X
we need an algorithm/method/software app to automatically label all snapshots as either “winter” or “summer”
each snapshot is several megabytes in size
computational/time constraints force us to use a more compact representation (features)
what are good features of a snapshot for classifying summer vs. winter?
Redness, Greenness and Blueness
summer images are expected to be more colourful
winter images of the Alps tend to contain much “white” (snow)
let's use redness x_r, greenness x_g and blueness x_b:
x_r := \sum_{j ∈ pixels} ( r[j] − (1/2)(g[j] + b[j]) )
x_g := \sum_{j ∈ pixels} ( g[j] − (1/2)(r[j] + b[j]) )
x_b := \sum_{j ∈ pixels} ( b[j] − (1/2)(r[j] + g[j]) )
r[j], g[j], b[j] denote the red/green/blue intensity of pixel j (a NumPy sketch follows below)
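A minimal NumPy sketch of these features, assuming a snapshot already loaded as an H × W × 3 RGB array; the function name is illustrative, not part of the lecture.

```python
import numpy as np

def rgb_features(image):
    """Redness, greenness and blueness of an H x W x 3 RGB image array.

    Each feature sums, over all pixels j, that channel's intensity
    minus the average of the other two channels, as defined above.
    """
    img = image.astype(np.float64)
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    x_r = np.sum(r - 0.5 * (g + b))
    x_g = np.sum(g - 0.5 * (r + b))
    x_b = np.sum(b - 0.5 * (r + g))
    return np.array([x_r, x_g, x_b])
```

In practice one would probably also divide by the number of pixels so that snapshots of different resolutions yield comparable feature values.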
A Classification Problem
labeled dataset X = {(x^{(i)}, y^{(i)})}_{i=1}^{N}
feature vector x^{(i)} = (x_r^{(i)}, x_g^{(i)}, x_b^{(i)})^T ∈ R^3
label y^{(i)} = 1 for summer and y^{(i)} = 0 for winter
find a classifier h(·) : R^3 → {0, 1} with y ≈ h(x)
which hypothesis space H and loss L(z, h(·)) should we use?
Linear Regression Classifier
let's first try to recycle ideas from linear regression
use H = {h^{(w)}(x) = w^T x, for w ∈ R^d} and squared error loss
two shortcomings of this approach:
the classifier output h^{(w)}(x) can be any real number, while y ∈ {0, 1}
squared error loss penalizes even correct decisions (e.g., h^{(w)}(x) = 5 for y = 1 incurs a large loss although the decision is right)
Outline
1 A Classification Problem
2 Logistic Regression
3 Support Vector Classification
4 Wrap Up
Taking Label Space Into Account
let's exploit that the labels y take only the values 0 or 1
use a predictor h(·) with h(x) ∈ [0, 1]
one such choice is
h^{(w,b)}(x) = g(w^T x + b) with g(z) := 1/(1 + exp(−z))
g(z) is known as the logistic or sigmoid function
the classifier is parametrized by the weight vector w and offset b
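A NumPy sketch of g(z). The split into two algebraically equivalent forms is a standard numerical-stability trick (it keeps exp() from overflowing for large |z|), an implementation detail the slides do not require.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + exp(-z))."""
    z = np.asarray(z, dtype=np.float64)
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))   # safe for z >= 0
    ez = np.exp(z[~pos])                       # z < 0: exp(z) cannot overflow
    out[~pos] = ez / (1.0 + ez)                # equivalent form of g(z)
    return out
```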
The Sigmoid Function
A Probabilistic Interpretation
LogReg predicts y ∈ {0, 1} by h(x) = g(w^T x + b) ∈ [0, 1]
let's model the label y and the features x as random variables
the features x are given/observed/measured
we care about the conditional probabilities P{y = 1|x} and P{y = 0|x}
estimate P{y = 1|x} by h^{(w,b)}(x)
this yields the following relation:
P{y|x} = h^{(w,b)}(x)^y (1 − h^{(w,b)}(x))^{1−y}
Logistic Regression
maximum likelihood: max_{w,b} P{y|x} = h^{(w,b)}(x)^y (1 − h^{(w,b)}(x))^{1−y}
maximizing P{y|x} is equivalent to minimizing the logistic loss
L((x, y), h^{(w,b)}(·)) := − log P{y|x} = −y log h^{(w,b)}(x) − (1−y) log(1 − h^{(w,b)}(x))
choose w and b via empirical risk minimisation:
min_{w,b} E{h^{(w,b)}(·)|X} = (1/N) \sum_{i=1}^{N} L((x^{(i)}, y^{(i)}), h(·))
= (1/N) \sum_{i=1}^{N} −y^{(i)} log h(x^{(i)}) − (1−y^{(i)}) log(1 − h(x^{(i)}))
= (1/N) \sum_{i=1}^{N} −y^{(i)} log g(w^T x^{(i)} + b) − (1−y^{(i)}) log(1 − g(w^T x^{(i)} + b))
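The final expression above, transcribed into NumPy as a sketch; it uses the `sigmoid` function from the earlier snippet, and the small `eps` clipping is a floating-point guard not needed on the slide.

```python
import numpy as np  # assumes sigmoid() from the sketch above

def logistic_risk(w, b, X, y, eps=1e-12):
    """Empirical logistic risk for an N x d feature matrix X and 0/1 labels y."""
    h = sigmoid(X @ w + b)           # h(x^(i)) = g(w^T x^(i) + b)
    h = np.clip(h, eps, 1.0 - eps)   # keep log() finite
    return np.mean(-y * np.log(h) - (1.0 - y) * np.log(1.0 - h))
```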
ID Card of Logistic Regression
input/feature space X = R^d
label space Y = {0, 1} (predictions h(x) ∈ [0, 1])
loss function L((x, y), h(·)) = −y log h(x) − (1−y) log(1 − h(x))
hypothesis space H = {h^{(w,b)}(x) = g(w^T x + b), with w ∈ R^d, b ∈ R}
classify y = 1 if h^{(w,b)}(x) ≥ 0.5 and y = 0 otherwise
Classifying with Logistic Regression
logistic regression problem:
min_{w,b} (1/N) \sum_{i=1}^{N} −y^{(i)} log g(w^T x^{(i)} + b) − (1−y^{(i)}) log(1 − g(w^T x^{(i)} + b))
denote the optimal point by w_0 and b_0
evaluate h(x) = g(w_0^T x + b_0) for a new data point
h(x) is an estimate of P(y = 1|x)
let us classify y = 1 if h(x) ≥ 1/2 and y = 0 otherwise
this partitions X into R_1 = {x : h(x) ≥ 1/2} and R_0 = {x : h(x) < 1/2}
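Since g is strictly increasing with g(0) = 1/2, the rule h(x) ≥ 1/2 is equivalent to w_0^T x + b_0 ≥ 0, so the boundary between R_1 and R_0 is the hyperplane w_0^T x + b_0 = 0 (shown on the next slide). A two-line sketch of the resulting decision rule:

```python
def predict_logreg(w, b, X):
    """Predict 0/1 labels: h(x) >= 1/2 is equivalent to w^T x + b >= 0."""
    return (X @ w + b >= 0).astype(int)
```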
The Decision Boundary of Logistic Regression
Learning a Logistic Regression Model
logistic regression problem:
min_{w,b} (1/N) \sum_{i=1}^{N} −y^{(i)} log g(w^T x^{(i)} + b) − (1−y^{(i)}) log(1 − g(w^T x^{(i)} + b))
in contrast to linear regression, there is no closed-form solution here
however, we can use gradient descent (GD)!
A Learning Algorithm for Classification
input: labeled dataset X, step size (learning rate) α
output: classifier h^{(w,b)}(x) = g(w^T x + b)
initialize: k := 0, w^{(0)} := 0 and b^{(0)} := 0
repeat until a stopping criterion is satisfied:
(w^{(k+1)}, b^{(k+1)}) := (w^{(k)}, b^{(k)}) − α ∇_{w,b} E{h^{(w^{(k)},b^{(k)})}(·)|X}
k := k + 1
set w := w^{(k)}, b := b^{(k)}
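A sketch of this algorithm for the logistic-regression objective. The slides leave the gradient and the stopping criterion unspecified; here we use the standard gradient of the logistic risk, ∇_w E = (1/N) \sum_i (h(x^{(i)}) − y^{(i)}) x^{(i)} and ∇_b E = (1/N) \sum_i (h(x^{(i)}) − y^{(i)}), and a fixed iteration budget in place of a stopping criterion.

```python
import numpy as np  # assumes sigmoid() from the earlier sketch

def learn_logreg(X, y, alpha=0.1, num_iters=1000):
    """Gradient descent for logistic regression; X is N x d, y holds 0/1 labels."""
    N, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(num_iters):           # fixed budget instead of a stopping criterion
        err = sigmoid(X @ w + b) - y     # h(x^(i)) - y^(i) for every data point
        w -= alpha * (X.T @ err) / N     # gradient step in w
        b -= alpha * np.sum(err) / N     # gradient step in b
    return w, b
```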
Outline
1 A Classification Problem
2 Logistic Regression
3 Support Vector Classification
4 Wrap Up
Binary Linear Classifiers
logistic regression delivers a linear classifier
a linear classifier is specified by a normal vector w and an offset b
let us from now on code the binary labels as +1 and −1
output of the linear classifier: ŷ = 1 if h^{(w,b)}(x) > 0 and ŷ = −1 otherwise, with the linear predictor h^{(w,b)}(x) = w^T x + b
we can use different loss functions for learning w and b!
as we have seen, squared error loss is not well suited to binary labels
Minimizing Error Probability
eventually, we aim at a low error probability P{ŷ ≠ y}
using the 0/1-loss L((x, y), h(·)) = I(ŷ ≠ y) we can approximate
P{ŷ ≠ y} ≈ (1/N) \sum_{i=1}^{N} L((x^{(i)}, y^{(i)}), h(·))
the optimal classifier is then obtained by
min_{h(·) ∈ H} \sum_{i=1}^{N} L((x^{(i)}, y^{(i)}), h(·))
a non-convex, non-smooth optimization problem! (there is a work-around, as we see next :-)
The 0/1 Loss
The Hinge Loss
The Hinge Loss (y = 1)
The Hinge Loss (y = −1)
Learning Linear Classifier via Hinge Loss
linear classifier h^{(w,b)}(x) = w^T x + b
choose w and b by minimizing the hinge loss
L((x, y), h^{(w,b)}) = max{0, 1 − y · h^{(w,b)}(x)} = max{0, 1 − y · (w^T x + b)}
learn the optimal classifier via empirical risk minimization (see the sketch below):
min_{w,b} E(h^{(w,b)}|X) := (1/N) \sum_{i=1}^{N} L((x^{(i)}, y^{(i)}), h^{(w,b)}(·)) = (1/N) \sum_{i=1}^{N} max{0, 1 − y^{(i)}(w^T x^{(i)} + b)}
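The hinge loss is convex but not differentiable at the kink, so plain GD becomes subgradient descent. A sketch under the same conventions as the logistic-regression algorithm, with labels in {−1, +1}; the subgradient used here (−y^{(i)} x^{(i)} for margin-violating points, 0 otherwise) is the standard one, not derived on the slides.

```python
import numpy as np

def learn_svc(X, y, alpha=0.01, num_iters=1000):
    """Subgradient descent on the average hinge loss; y holds -1/+1 labels."""
    N, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(num_iters):
        margins = y * (X @ w + b)
        viol = margins < 1                         # points violating the margin
        w -= alpha * (-(X[viol].T @ y[viol]) / N)  # subgradient step in w
        b -= alpha * (-np.sum(y[viol]) / N)        # subgradient step in b
    return w, b
```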
SVC Maximizes Margin
we can rewrite the hinge loss as
L((x, y), h^{(w,b)}) = max{0, 1 − y · (w^T x + b)} = min_{ξ≥0} ξ s.t. ξ ≥ 1 − y · (w^T x + b)
the slack variable ξ measures how far a point violates the margin
minimizing the hinge loss means maximizing the margin:
min_{w,b} E(h^{(w,b)}|X) = (1/N) \sum_{i=1}^{N} max{0, 1 − y^{(i)}(w^T x^{(i)} + b)}
= (1/N) min_{ξ^{(i)}≥0} \sum_{i=1}^{N} ξ^{(i)} s.t. ξ^{(i)} ≥ 1 − y^{(i)} · (w^T x^{(i)} + b)
SVC Maximizes Margin
ID Card of Support Vector Classifier
input/feature space X = R^d
label space Y = {−1, 1}
loss function L((x, y), h(·)) = max{0, 1 − y · h^{(w,b)}(x)}
hypothesis space H = {h^{(w,b)}(x) = w^T x + b, with w ∈ R^d, b ∈ R}
classify y = 1 if h^{(w,b)}(x) ≥ 0 and y = −1 otherwise (see the scikit-learn sketch below)
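Both ID cards correspond to off-the-shelf estimators; a quick illustrative comparison with scikit-learn. The six random feature vectors merely stand in for the ski-resort data, and note that scikit-learn adds regularization by default, which the plain ERM formulations above do not have.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))              # stand-in for (x_r, x_g, x_b) features
y = np.array([1, 1, 1, 0, 0, 0])         # 1 = summer, 0 = winter

logreg = LogisticRegression().fit(X, y)  # logistic loss
svc = LinearSVC().fit(X, y)              # hinge loss; -1/+1 coding handled internally

print(logreg.predict_proba(X)[:, 1])     # estimates of P(y = 1 | x)
print(svc.decision_function(X))          # values of w^T x + b
```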
Outline
1 A Classification Problem
2 Logistic Regression
3 Support Vector Classification
4 Wrap Up
What We Learned Today
how to formalize a classification problem
different loss functions yield different classification methods
LogReg uses the logistic loss and amounts to maximum likelihood
SVC uses the hinge loss and amounts to maximum margin
LogReg and SVC are both parametric, linear classifiers
Logistic Regression at a Glance
uses the hypothesis space of linear classifiers
uses a probabilistic interpretation of its predictions
tailored to a particular likelihood model (Bernoulli, via the sigmoid)
ERM amounts to a SMOOTH convex problem
Support Vector Classifier (Machine) at a Glance
uses the hypothesis space of linear classifiers
based on geometry (maximum margin between classes)
can be extended via feature maps (kernel methods)
ERM amounts to a NON-SMOOTH convex optimization problem
What Happens Next?
next lecture: two further classification methods (decision trees and naive Bayes)
read Sec. 9.2–9.2.3 of https://web.stanford.edu/~hastie/Papers/ESLII.pdf
fill out the post-lecture questionnaire in MyCourses (contributes to the grade!)