Using support vector machine with a hybrid feature selection method to the st... (lolokikipipi)
This document discusses using a support vector machine (SVM) with a hybrid feature selection method to predict stock trends. It proposes using F-score filtering followed by a wrapper method called Supported Sequential Forward Search (SSFS) to select optimal features for the SVM. An experiment applies this approach to NASDAQ index data, reducing 30 features to 17 using F_SSFS and achieving a classification accuracy of 81.7% with the SVM, outperforming a backpropagation neural network. The hybrid approach helps address overfitting issues while improving the SVM's prediction performance.
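The filter stage of such a pipeline can be sketched with scikit-learn's ANOVA F-score selector. This is a generic illustration on synthetic data, not the paper's F_SSFS code; the SSFS wrapper stage is omitted, and the 30→17 reduction merely mirrors the counts reported above:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the paper's 30 stock indicators.
X, y = make_classification(n_samples=200, n_features=30, n_informative=8,
                           random_state=0)

# Filter stage: rank features by ANOVA F-score and keep the top 17.
selector = SelectKBest(f_classif, k=17)
X_sel = selector.fit_transform(X, y)

# Evaluate an RBF-kernel SVM on the reduced feature set.
score = cross_val_score(SVC(kernel="rbf"), X_sel, y, cv=5).mean()
print(X_sel.shape, round(score, 3))
```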
This document summarizes a machine learning workshop on feature selection. It discusses typical feature selection methods such as single-feature evaluation using metrics like mutual information and the Gini index. It also covers subset selection techniques such as sequential forward selection and sequential backward selection. Examples show how feature selection improves the performance of logistic regression on datasets with more features than samples. The document outlines the workshop agenda and details when and why feature selection is important for machine learning models.
Feature selection is the process of selecting a subset of relevant features for model construction. It reduces complexity and can improve or maintain model accuracy. The curse of dimensionality means that as the number of features increases, the amount of data needed to maintain accuracy also increases exponentially. Feature selection methods include filter methods (statistical tests for correlation), wrapper methods (using the model to select features), and embedded methods (combining filter and wrapper approaches). Common filter methods include linear discriminant analysis, analysis of variance, chi-square tests, and Pearson correlation. Wrapper methods use techniques like forward selection, backward elimination, and recursive feature elimination. Embedded methods dynamically select features based on inferences from previous models.
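As a concrete illustration of a wrapper method, here is a minimal greedy forward-selection sketch: it wraps a scikit-learn logistic regression and, at each step, adds the feature that most improves cross-validated accuracy. This is an illustrative sketch, not a reference implementation:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def forward_select(X, y, n_keep):
    """Greedy wrapper: repeatedly add the feature that most improves
    the wrapped model's cross-validated accuracy."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_keep:
        scores = {f: cross_val_score(LogisticRegression(max_iter=500),
                                     X[:, selected + [f]], y, cv=5).mean()
                  for f in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected

X, y = load_iris(return_X_y=True)
print(forward_select(X, y, n_keep=2))
```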
This document discusses feature selection algorithms, specifically the branch and bound and beam search algorithms. It provides an overview of feature selection and its fundamentals and objectives, then explains in detail how branch and bound works, including pseudocode, a flowchart, and an example. It also discusses beam search and compares branch and bound to other algorithms.
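A beam search over feature subsets can be sketched in a few lines of Python. The scoring function below is a made-up toy criterion standing in for a real separability measure, not anything from the slides:

```python
def beam_search(score, n_features, k, beam=3):
    """Grow subsets one feature at a time, keeping only the `beam`
    highest-scoring subsets at each size."""
    frontier = [()]
    for _ in range(k):
        candidates = {tuple(sorted(s + (f,)))
                      for s in frontier
                      for f in range(n_features) if f not in s}
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return frontier[0]

# Toy criterion: reward the (pretend-)informative features 1, 3 and 5,
# with a tiny size penalty; stands in for a real separability measure.
target = {1, 3, 5}
score = lambda s: len(target & set(s)) - 0.01 * len(s)

print(beam_search(score, n_features=8, k=3))  # -> (1, 3, 5)
```

Unlike branch and bound, beam search gives no optimality guarantee: pruning to a fixed width can discard the prefix of the true optimum.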
A Review on Feature Selection Methods For Classification Tasks (Editor IJCATR)
In recent years, the application of feature selection methods to medical datasets has greatly increased. The challenging task in feature selection is obtaining an optimal subset of relevant, non-redundant features that yields an optimal solution without increasing the complexity of the modeling task. There is thus a need to make practitioners aware of feature selection methods that have been successfully applied to medical datasets and to highlight future trends in this area. The findings indicate that most existing feature selection methods depend on univariate ranking, which does not take into account interactions between variables; overlook the stability of the selection algorithms; and, where they produce good accuracy, employ a larger number of features. Developing a universal method that achieves the best classification accuracy with fewer features remains an open research area.
An introduction to variable and feature selection (Marco Meoni)
Presentation of an influential 2003 paper by Isabelle Guyon (Clopinet) and André Elisseeff (Max Planck Institute) that outlines the main techniques for feature selection and model validation in machine learning systems.
This document discusses feature selection concepts and methods. It defines features as attributes that determine which class an instance belongs to. Feature selection aims to select a relevant subset of features by removing irrelevant, redundant and unnecessary data. This improves learning accuracy, model performance and interpretability. The document categorizes feature selection algorithms as filter, wrapper or embedded methods based on how they evaluate feature subsets. It also discusses concepts like feature relevance, search strategies, successor generation and evaluation measures used in feature selection algorithms.
Evolving Reinforcement Learning Algorithms, JD. Co-Reyes et al, 2021 (Chris Ohk)
A summary of the Evolving Reinforcement Learning Algorithms paper, presented at an RL paper review study group. The paper designs a language for expressing the loss functions of value-based, model-free RL agents and proposes loss functions that outperform the one used by DQN. I hope many people find it helpful.
In this talk, I explained feature selection and extraction with an emphasis on image processing. Methods such as Principal Component Analysis and Canonical Analysis are explained with numerical examples.
This document introduces algorithms and programming basics for Key Stage 3 students. It defines an algorithm as a set of step-by-step instructions to complete a task and notes they are not computer programs. Algorithms help design computer code by using flowcharts or pseudocode to visualize steps. Programming involves writing code in a language computers understand, using concepts like sequence, selection, and iteration. Examples show designing algorithms for everyday tasks and writing a simple program that declares variables and uses conditional selection and iteration.
This document provides an introduction to machine learning concepts including supervised learning, models for supervised learning such as decision trees, k-nearest neighbors, naive bayes, logistic regression, artificial neural networks, and support vector machines. It discusses evaluation metrics, choosing suitable models, and challenges such as finding a 100% accurate model. It also provides a case study example of predictive demographic modeling.
Optimization Technique for Feature Selection and Classification Using Support... (IJTET Journal)
Abstract— Classification problems often have a large number of features in their data sets, but only some of them are useful for classification. Irrelevant and redundant features reduce data mining performance. Feature selection aims to choose a small number of relevant features that achieve similar or even better classification performance than using all features. It has two main objectives: maximizing classification performance and minimizing the number of features. Moreover, existing feature selection algorithms treat the task as a single-objective problem. Attributes are selected by combining an attribute evaluator and a search method in the WEKA machine learning tool. The SVM classification algorithm is then used to automatically classify the data using the selected features on different standard datasets.
Feature Selection Techniques for Software Fault Prediction (Summary) (SungdoGu)
This document discusses feature selection techniques for software fault prediction. It begins by motivating the need for feature selection when building defect prediction models using large sets of software metrics. It then describes common feature selection techniques like filter and wrapper methods. It provides examples of widely used software metrics like CK and McCabe & Halstead metrics. The document also analyzes threshold-based feature selection and evaluates its stability. Finally, it proposes a hybrid feature selection model and demonstrates its effectiveness on a dataset from the Eclipse project.
Network Based Intrusion Detection System using Filter Based Feature Selection... (IRJET Journal)
This document proposes a mutual information-based feature selection algorithm to select optimal features for network intrusion detection classification. The algorithm aims to handle dependent data features better than previous methods. It evaluates the effectiveness of the algorithm on network intrusion detection cases. Most previous methods suffer from low detection rates and high false alarm rates. The proposed approach uses feature selection, filtering, clustering, and clustering ensemble techniques in a hybrid data mining method to achieve high accuracy for intrusion detection systems.
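A generic mutual-information ranking step, using scikit-learn on synthetic data as a stand-in for the paper's own algorithm and its intrusion-detection traffic features, might look like:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

# Synthetic stand-in for network-traffic features and attack/normal labels.
X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           random_state=0)

mi = mutual_info_classif(X, y, random_state=0)  # MI of each feature vs label
ranking = np.argsort(mi)[::-1]                  # most informative first
print(ranking[:3])
```

Note that plain MI ranking is univariate; handling dependent features, as the paper sets out to do, requires a criterion that also penalizes redundancy between selected features.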
Continuous Control with Deep Reinforcement Learning, Lillicrap et al, 2015 (Chris Ohk)
The paper introduces Deep Deterministic Policy Gradient (DDPG), a model-free reinforcement learning algorithm for problems with continuous action spaces. DDPG combines actor-critic methods with experience replay and target networks similar to DQN. It uses a replay buffer to minimize correlations between samples and target networks to provide stable learning targets. The algorithm was able to solve challenging control problems with high-dimensional observation and action spaces, demonstrating the ability of deep reinforcement learning to handle complex, continuous control tasks.
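The target-network idea can be sketched as a soft (Polyak) update, with plain NumPy arrays standing in for network parameters:

```python
import numpy as np

def soft_update(target, online, tau=0.005):
    """theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return [(1 - tau) * t + tau * o for t, o in zip(target, online)]

online = [np.ones(3)]    # pretend these are the online network's weights
target = [np.zeros(3)]   # target network starts elsewhere
for _ in range(1000):
    target = soft_update(target, online)
print(target[0])         # has crept most of the way toward the online weights
```

Because the target parameters change slowly, the bootstrapped learning targets stay nearly stationary between updates, which is what stabilizes training.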
Feature Selection for Document Ranking (Andrea Gigli)
Feature selection for machine learning applied to document ranking (also known as L2R, LtR, or LETOR). Contains empirical results on publicly available Yahoo! and Bing Web search engine data.
Adversarially Guided Actor-Critic, Y. Flet-Berliac et al, 2021 (Chris Ohk)
A summary of the Adversarially Guided Actor-Critic paper, presented at an RL paper review study group. AGAC combines actor-critic with GAN-inspired methods and shows excellent performance in environments with sparse rewards and difficult exploration. I hope many people find it helpful.
This document discusses the fundamentals of fuzzy logic control systems. It begins by defining fuzzy logic as a problem-solving control system methodology that uses linguistic variables and fuzzy rules to map inputs to outputs. It then outlines the typical elements of a fuzzy logic system, including fuzzy sets, linguistic variables, fuzzy rules, fuzzy inference, and defuzzification. Finally, it provides an example of applying fuzzy logic to control the temperature in a simple heating/cooling system.
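A minimal Mamdani-style sketch of such a temperature controller, with made-up triangular membership breakpoints and centroid defuzzification (all numbers here are illustrative assumptions, not from the document):

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership: rises over [a, b], falls over [b, c]."""
    return np.maximum(0.0, np.minimum((x - a) / (b - a), (c - x) / (c - b)))

power = np.linspace(0.0, 100.0, 401)   # heater power universe (%)

def heater_output(t):
    # Fuzzification of the temperature reading (deg C).
    mu_cold = float(tri(np.array([t]), -10.0, 0.0, 20.0)[0])
    mu_hot = float(tri(np.array([t]), 20.0, 40.0, 50.0)[0])
    # Rules: IF cold THEN power high; IF hot THEN power low (min-clipping).
    high = np.minimum(tri(power, 50.0, 100.0, 150.0), mu_cold)
    low = np.minimum(tri(power, -50.0, 0.0, 50.0), mu_hot)
    agg = np.maximum(high, low)                      # rule aggregation
    return float(np.sum(agg * power) / np.sum(agg))  # centroid defuzzification

print(heater_output(5.0), heater_output(35.0))
```

A cold reading drives the heater toward high power and a hot reading toward low power, with the linguistic rules doing the mapping.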
This document discusses feature selection techniques for classification problems. It begins by outlining class separability measures like divergence, Bhattacharyya distance, and scatter matrices. It then discusses feature subset selection approaches, including scalar feature selection which treats features individually, and feature vector selection which considers feature sets and correlations. Examples are provided to demonstrate calculating class separability measures for different feature combinations on sample datasets. Exhaustive search and suboptimal techniques like forward, backward, and floating selection are discussed for choosing optimal feature subsets. The goal of feature selection is to select a subset of features that maximizes class separation.
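The within-class and between-class scatter matrices mentioned above can be computed directly with NumPy; the trace ratio below is one simple separability criterion built from them (the two-Gaussian data is an illustrative assumption):

```python
import numpy as np

def scatter_matrices(X, y):
    """Return the within-class (Sw) and between-class (Sb) scatter matrices."""
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)
    return Sw, Sb

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)),    # class 0
               rng.normal(4.0, 1.0, (50, 2))])   # class 1, shifted mean
y = np.repeat([0, 1], 50)
Sw, Sb = scatter_matrices(X, y)
print(np.trace(Sb) / np.trace(Sw))   # larger ratio = better-separated classes
```

Feature subset search then amounts to maximizing such a criterion over candidate feature combinations.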
The document discusses hyperparameters and hyperparameter tuning in deep learning models. It defines hyperparameters as parameters that govern how the model parameters (weights and biases) are determined during training, in contrast to model parameters which are learned from the training data. Important hyperparameters include the learning rate, number of layers and units, and activation functions. The goal of training is for the model to perform optimally on unseen test data. Model selection, such as through cross-validation, is used to select the optimal hyperparameters. Training, validation, and test sets are also discussed, with the validation set used for model selection and the test set providing an unbiased evaluation of the fully trained model.
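A minimal model-selection loop of this kind, sketched with scikit-learn's grid search; the dataset and hyperparameter grid are illustrative choices, not from the document:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Candidate hyperparameter values are scored by 3-fold cross-validation on
# the training set; the held-out test set gives the unbiased final score.
search = GridSearchCV(SVC(), {"C": [0.1, 1.0, 10.0],
                              "gamma": ["scale", 0.001]}, cv=3)
search.fit(X_train, y_train)
print(search.best_params_, round(search.score(X_test, y_test), 3))
```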
Kaggle Higgs Boson Machine Learning Challenge (Bernard Ong)
What it took to score in the top 2% on the Higgs Boson Machine Learning Challenge: a journey into advanced machine learning model ensembles and stacking methods.
The document discusses query processing and optimization. It describes the basic concepts including query processing, query optimization, and the phases of query processing. It also explains relational algebra operations like selection, projection, joins, and additional operations. The document then covers topics like query decomposition, analysis, normalization, simplification, and restructuring during query optimization. It discusses cost estimation and algorithms for implementing relational algebra operations and file organization.
Using FME for Topographical Data Generalization at Natural Resources Canada (Safe Software)
To meet increasing and diversified user needs for geographic information, Natural Resources Canada (NRCan) must produce and maintain geographic data at multiple scales. To automate the generalization process, NRCan is using an approach based on FME and MetaAlgorithms.
Meetup_Consumer_Credit_Default_Vers_2_All (Bernard Ong)
The document discusses a team's approach to a Kaggle challenge to predict consumer credit default. It outlines their goals, dataset details, modeling strategy using an agile process, and key results. Their parallel modeling approach included feature analysis, single/ensemble models, stacking, voting classifiers, and Bayesian optimization. Their top-scoring model achieved an AUC of 0.88912, placing first on the private Kaggle leaderboard. Lessons included the importance of cross-validation, hyperparameter tuning, and following an agile process.
Optimal feature selection from VMware ESXi 5.1 feature set (ijccmsjournal)
A study of a VMware ESXi 5.1 server has been carried out to find the optimal set of parameters that indicate usage of the server's different resources. Feature selection algorithms have been used to extract the optimum set of parameters from data obtained from the VMware ESXi 5.1 server using the esxtop command. Multiple virtual machines (VMs) run on the server. The K-means algorithm is used to cluster the VMs, and the goodness of each cluster is measured with the Davies-Bouldin index and the Dunn index. The best cluster is identified by these indices, and its features are taken as the set of optimal parameters.
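The clustering-and-scoring step described here can be sketched with scikit-learn; the synthetic blobs stand in for the per-VM esxtop measurements, which are not reproduced:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

# Synthetic stand-in for the per-VM resource-usage measurements.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

scores = {}
for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = davies_bouldin_score(X, labels)  # lower is better

best_k = min(scores, key=scores.get)
print(best_k, {k: round(v, 2) for k, v in scores.items()})
```

The Dunn index used alongside Davies-Bouldin in the study has no scikit-learn implementation, so it is omitted here.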
Scaling out logistic regression with Spark (Barak Gitsis)
This document discusses scaling out logistic regression with Apache Spark. It describes the need to classify a large number of websites using machine learning. Several approaches to logistic regression were tried, from a single-machine Java implementation to Spark for better scalability. Spark's L-BFGS algorithm was chosen as an out-of-the-box distributed logistic regression solution. Challenges of implementing logistic regression at large scale, such as overfitting, are discussed, along with the methods used to address them: L2 regularization, cross-validation to select the regularization parameter, and extensions made to Spark's L-BFGS implementation.
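An analogous recipe in scikit-learn terms (Spark MLlib's API differs, so this is only a sketch of the idea, not the project's code): L2-penalized logistic regression with the penalty strength chosen by cross-validation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# L2-penalized logistic regression; the penalty strength is picked from a
# grid of 10 candidate C values by 5-fold cross-validation.
model = make_pipeline(StandardScaler(),
                      LogisticRegressionCV(Cs=10, cv=5, penalty="l2",
                                           max_iter=1000))
model.fit(X, y)
print(round(model.score(X, y), 3))
```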
This document discusses advanced processes and operators in RapidMiner including feature selection, splitting processes, OLAP operators, post processing operators, and preprocessing operators. Feature selection uses the backward elimination algorithm to test which attributes are relevant for building a better model. Processes can be split into learning and applying sections. OLAP operators support tasks like grouping, aggregation, and pivoting for multidimensional analysis. Post processing operators perform actions after modeling like cost-sensitive threshold selection. Preprocessing operators generate new features or clean data by imputing missing values.
Ensemble methods combine multiple learners to create more reliable predictions than single learners. Bagging assigns bootstrap data to classifiers and averages predictions, while boosting assigns weak learners and updates weights to correct errors. XGBoost is an optimized gradient boosting algorithm that offers excellent performance, faster execution than GBM, and various utilities like early stopping and regularization. It has Python and scikit-learn wrappers for easy integration and hyperparameter tuning.
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S... (DB Tsai)
Nonlinear methods are widely used because they produce higher performance than linear methods; however, nonlinear methods are generally more expensive in model size, training time, and scoring time. With proper feature engineering techniques like polynomial expansion, linear methods can be as competitive as nonlinear methods. In the process of mapping the data to a higher-dimensional space, linear methods become subject to overfitting and instability of coefficients, which can be addressed by penalization methods including Lasso and Elastic-Net. Finally, we'll show how to train linear models with Elastic-Net regularization using MLlib.
Several learning algorithms such as kernel methods, decision trees, and random forests are nonlinear approaches that are widely used because they perform better than linear methods. However, with feature engineering techniques like polynomial expansion, which maps the data into a higher-dimensional space, the performance of linear methods can be competitive with nonlinear methods. As a result, linear methods remain very useful: their training time is significantly faster than that of nonlinear methods, and the model is just a small vector, which makes the prediction step very efficient and easy. However, by mapping the data into a higher-dimensional space, linear methods become subject to overfitting and instability of coefficients, and those issues can be successfully addressed by penalization methods including Lasso and Elastic-Net. The Lasso method with an L1 penalty tends to shrink many coefficients exactly to zero while leaving a few others with comparatively little shrinkage. An L2 penalty tends to result in all coefficients being small but non-zero. Combining the L1 and L2 penalties is called the Elastic-Net method, which tends to give a result in between. In the first part of the talk, we'll give an overview of linear methods, including commonly used formulations and optimization techniques such as L-BFGS and OWL-QN. In the second part, we will talk about how to train linear models with Elastic-Net using our recent contribution to Spark MLlib. We'll also talk about how linear models are practically applied to big datasets, and how polynomial expansion can be used to dramatically increase performance.
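As a rough illustration of the coefficient behaviour described above, the following sketch fits Lasso (L1), Ridge (L2), and Elastic-Net on synthetic data where only three of twenty features matter. scikit-learn stands in for the Spark MLlib implementation the talk actually covers, and all data and penalty values are illustrative assumptions.

```python
# Illustrative sketch only: scikit-learn stands in for Spark MLlib here;
# the data, alpha, and l1_ratio values are assumptions.
import numpy as np
from sklearn.linear_model import Lasso, Ridge, ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
# Only the first three of twenty features actually carry signal.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)                      # L1: many exact zeros
ridge = Ridge(alpha=0.1).fit(X, y)                      # L2: all small, non-zero
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)    # in between

print("non-zero coefficients:",
      np.count_nonzero(lasso.coef_),
      np.count_nonzero(ridge.coef_),
      np.count_nonzero(enet.coef_))
```

With these settings the Lasso keeps only a handful of non-zero weights, the Ridge fit keeps all twenty, and the Elastic-Net lands in between, matching the qualitative claims in the abstract.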
DB Tsai is an Apache Spark committer and a Senior Research Engineer at Netflix. He has recently been working with the Apache Spark community to add several new algorithms, including Linear Regression and Binary Logistic Regression with Elastic-Net (L1/L2) regularization, Multinomial Logistic Regression, and the L-BFGS optimizer. Prior to joining Netflix, DB was a Lead Machine Learning Engineer at Alpine Data Labs, where he developed innovative large-scale distributed linear algorithms and contributed them back to the open-source Apache Spark project.
This document discusses sparse linear models and Bayesian variable selection. It introduces the spike and slab model for Bayesian variable selection, which uses a binary vector γ to indicate whether features are relevant or not. Computing the posterior p(γ|D) involves calculating the marginal likelihood p(D|γ). Greedy search and stochastic search methods are discussed to approximate the posterior over models. L1 regularization, also known as lasso, is introduced as an optimization technique since computing the posterior for discrete γ is difficult. Lasso replaces the discrete priors with continuous priors to encourage sparsity. Coordinate descent is discussed as an algorithm to optimize the lasso objective function.
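The coordinate descent algorithm mentioned above has a closed-form per-coordinate update built on the soft-thresholding operator. A minimal numpy sketch, with notation and constants of my own choosing rather than taken from the document:

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xw||^2 + lam * ||w||_1."""
    n, d = X.shape
    w = np.zeros(d)
    col_msq = (X ** 2).sum(axis=0) / n        # mean square of each column
    for _ in range(n_iter):
        for j in range(d):
            r_j = y - X @ w + X[:, j] * w[j]  # residual excluding feature j
            rho = X[:, j] @ r_j / n
            w[j] = soft_threshold(rho, lam) / col_msq[j]
    return w

# Toy check: only feature 0 is relevant; the lasso zeroes out the rest.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = 2.0 * X[:, 0] + rng.normal(scale=0.05, size=100)
w = lasso_cd(X, y, lam=0.1)
print(np.round(w, 2))
```

Note how the penalty slightly shrinks even the relevant coefficient (it comes out a bit below 2.0), which is the shrinkage behaviour the chapter discusses.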
This document discusses Bayesian global optimization and its application to tuning machine learning models. It begins by outlining some of the challenges of tuning ML models, such as the non-intuitive nature of the task. It then introduces Bayesian global optimization as an approach to efficiently search the hyperparameter space to find optimal configurations. The key aspects of Bayesian global optimization are described, including using Gaussian processes to build models of the objective function from sampled points and finding the next best point to sample via expected improvement. Several examples are provided demonstrating how Bayesian global optimization outperforms standard tuning methods in optimizing real-world ML tasks.
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016 (MLconf)
Using Bayesian Optimization to Tune Machine Learning Models: In this talk we briefly introduce Bayesian Global Optimization as an efficient way to optimize machine learning model parameters, especially when evaluating different parameters is time-consuming or expensive. We will motivate the problem and give example applications.
We will also talk about our development of a robust benchmark suite for our algorithms including test selection, metric design, infrastructure architecture, visualization, and comparison to other standard and open source methods. We will discuss how this evaluation framework empowers our research engineers to confidently and quickly make changes to our core optimization engine.
We will end with an in-depth example of using these methods to tune the features and hyperparameters of a real world problem and give several real world applications.
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T... (Spark Summit)
This document discusses using regularized linear models like logistic regression with feature engineering techniques like polynomial expansion to solve classification problems in a scalable way. It describes how polynomial expansion can make nonlinear relationships linear by transforming features into higher dimensions. It also explains how Elastic Net regularization, which combines L1 and L2 penalties, can select important features and scale to large datasets using Apache Spark. Experiments on several datasets show logistic regression with degree-2 polynomial features performs comparably to nonlinear kernels while training faster.
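As a toy illustration of how polynomial expansion can make a nonlinear relationship linear, the sketch below uses scikit-learn rather than Spark, on an XOR-style target that a plain linear model cannot fit. Everything here is an illustrative assumption, not taken from the talk.

```python
# Degree-2 polynomial expansion turns an XOR-like problem linearly separable:
# the expanded features include the x1*x2 cross term, whose sign determines y.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 2))
y = ((X[:, 0] * X[:, 1]) > 0).astype(int)       # XOR-style target

plain = LogisticRegression().fit(X, y).score(X, y)               # near chance
expanded = PolynomialFeatures(degree=2).fit_transform(X)         # adds x1*x2
poly = LogisticRegression().fit(expanded, y).score(expanded, y)  # near perfect
print(round(plain, 2), round(poly, 2))
```

The linear model hovers near chance accuracy, while the same model on degree-2 expanded features is nearly perfect, which is the effect the summary describes.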
https://telecombcn-dl.github.io/2018-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both an algorithmic and computational perspectives.
Implementation of linear regression and logistic regression on Spark (Dalei Li)
This presentation was developed for a course project at the Technical University of Madrid. The course, Massively Parallel Machine Learning, was supervised by Alberto Mozo and Bruno Ordozgoiti.
This document outlines course material for a phylogenetics and sequence analysis course. It discusses building phylogenetic trees using distance, parsimony, and maximum likelihood methods. It also covers statistical methods like Bayesian phylogenetics for calculating trees. Software for building trees and summarizing results are presented, including MrBayes, BEAST, and DendroPy. The document provides guidance on evaluating convergence and summarizing Bayesian analyses. Model selection using programs like jModelTest and proper formatting of input sequence data are also covered.
2013.09.10 Giraph at London Hadoop Users Group (Nitay Joffe)
1) The document discusses scaling Apache Giraph, an open source graph computation engine. It outlines several problems that arise when scaling Giraph to large graphs, such as worker crashes and master crashes.
2) Solutions proposed to address these problems include checkpointing to handle worker crashes, using ZooKeeper for master queue handling to address master crashes, and using byte arrays and unsafe serialization to reduce object overhead.
3) Test results show Giraph can scale to graphs with billions of vertices and edges on a cluster of 50 workers, achieving speedups of 20x CPU and 100x elapsed time compared to Hive for similar graph computations.
WorkflowSim is a toolkit for simulating scientific workflows in distributed environments. It models workflow overhead, failures, and the hierarchical nature of workflows with tasks and jobs. WorkflowSim extends CloudSim to be workflow-aware and supports modeling diverse overhead distributions, failure models, and fault tolerant techniques like reclustering and job retry. It helps researchers evaluate workflow optimization techniques more accurately. Validation experiments show WorkflowSim can accurately simulate overhead and failures and their impact on workflow scheduling heuristics and fault tolerant clustering approaches.
Heuristic design of experiments w meta gradient search (Greg Makowski)
Once you have started learning about predictive algorithms, and the basic knowledge discovery in databases process, what is the next level of detail to learn for a consulting project?
* Give examples of the many model training parameters
* Track results in a "model notebook"
* Use a model metric that combines both accuracy and generalization to rank models
* How to strategically search over the model training parameters - use a gradient descent approach
* One way to describe an arbitrarily complex predictive system is by using sensitivity analysis
The document summarizes scaling Apache Giraph, an open source graph processing system. It discusses several problems that arise when scaling Giraph to large graphs, such as worker crashes, master crashes, primitive data structures causing overhead, and too many objects causing garbage collection issues. For each problem, it provides the solution Giraph uses, such as checkpointing, ZooKeeper for master coordination, using more efficient data structures like byte arrays instead of objects, and sharding aggregators. It also discusses optimization techniques like using Netty for networking and JVM profiling tools. The final result is Giraph can now process the entire Facebook graph in minutes instead of days.
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Va... (Dongmin Lee)
I reviewed the PEARL paper.
PEARL (Probabilistic Embeddings for Actor-critic RL) is an off-policy meta-RL algorithm designed to achieve both meta-training and adaptation efficiency. It performs probabilistic filtering of latent task variables with an encoder, which enables posterior sampling for structured and efficient exploration.
Outline
- Abstract
- Introduction
- Probabilistic Latent Context
- Off-Policy Meta-Reinforcement Learning
- Experiments
Link: https://arxiv.org/abs/1903.08254
Thank you!
Hyperparameter optimization (HPO) is the technique of tuning parameters to optimize a learning algorithm's performance on an independent dataset.
Wiki Definition: https://en.wikipedia.org/wiki/Hyperparameter_optimization
In this presentation, we highlight how one can use VW (Vowpal Wabbit) and the Spark framework to build generalized models in a distributed way.
https://github.com/telecombcn-dl/dlmm-2017-dcu
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pptx (Seungeon Baek)
1) The document introduces a method called variBAD for Bayes-adaptive deep reinforcement learning via meta-learning. variBAD aims to solve the exploration-exploitation dilemma by learning a posterior distribution over task embeddings through meta-learning.
2) variBAD models the reward and transition functions conditioned on a task embedding variable, and infers the posterior over this variable given the agent's experience. This allows planning in the lower-dimensional embedding space instead of the full model space.
3) variBAD is trained end-to-end to maximize a model learning objective and policy gradient objective. Experiments show variBAD achieves better sample efficiency than prior methods on gridworld and MuJoCo tasks.
The document provides an overview of convex optimization problems, including linear programming (LP), quadratic programming (QP), quadratically constrained quadratic programming (QCQP), second-order cone programming (SOCP), and geometric programming. It discusses how these problems can be transformed into equivalent convex optimization problems to help solve them. Local optima are guaranteed to be global optima for convex problems. Optimality criteria are presented for problems with differentiable objectives.
AlphaGo Zero is a Go-playing program that was trained solely through self-play reinforcement learning without using any human data. It uses a single neural network that takes in raw board positions and outputs both a policy distribution over moves and a value estimate of each position. AlphaGo Zero incorporates lookahead search through Monte Carlo tree search that relies entirely on the neural network, without performing any rollouts. This allows it to achieve superior performance compared to previous AlphaGo versions, requiring less training time while developing novel strategies not seen in human experts.
The document discusses Deep Q-Network (DQN), which combines Q-learning with deep neural networks to allow for function approximation and solving problems with large state/action spaces. DQN uses experience replay and a separate target network to stabilize training. It has led to many successful variants, including Double DQN which reduces overestimation, prioritized experience replay which replays important transitions more frequently, and dueling networks which separate value and advantage estimation.
This is the deck for a Hulu internal machine learning workshop, which introduces the background, theory, and applications of the expectation propagation method.
This is the fourth slide deck for the machine learning workshop at Hulu. Machine learning methods are summarized at the beginning of the deck, and boosting trees are introduced afterwards. You are recommended to try boosting trees when the number of features is not too large (<1000).
16. L2 vs. L1
• L2 regularization
  – Almost all weights are not equal to zero
  – Not suitable when training samples are scarce
• L1 regularization
  – Produces sparse parameter vectors
  – More suitable when most features are irrelevant
  – Could handle scarce training samples better
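The L1-vs-L2 contrast on this slide can be sketched with scikit-learn's LogisticRegression (not the workshop's own implementation). On data where most of the 50 features are irrelevant, the L1 model zeroes most weights while the L2 model leaves nearly all of them non-zero; all constants here are illustrative assumptions.

```python
# Stand-in sketch using scikit-learn; the data and C value are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))
# Only 5 of the 50 features carry signal, mimicking "most features irrelevant".
logits = X[:, :5] @ np.array([2.0, -2.0, 1.5, -1.5, 1.0])
y = (logits + rng.logistic(size=500) > 0).astype(int)

l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
l2 = LogisticRegression(penalty="l2", solver="liblinear", C=0.1).fit(X, y)

print("L1 non-zero weights:", np.count_nonzero(l1.coef_))   # sparse vector
print("L2 non-zero weights:", np.count_nonzero(l2.coef_))   # almost all non-zero
```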
17. Experiments
• Dataset
  – Goal: gender prediction
  – Dataset: train samples (431k), test samples (167k)
• Comparison algorithms
  – A: gradient descent with L1 regularization
  – B: gradient descent with L2 regularization
  – C: OWL-QN (L-BFGS-based optimization with L1 regularization)
• Parameter choices
  – Regularization value
  – Step (learning speed)
  – Decay ratio
  – Iteration stopping condition: max iteration times (50) || AUC change <= 0.0005
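The parameter choices above (an initial step, a decay ratio applied each pass, and a stopping rule of max iterations or a small metric change) can be sketched as a plain gradient descent loop. The constants mirror the slide where possible, but the L2 loss formulation and the use of mean log-loss in place of AUC for the stopping check are my assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_lr(X, y, step=0.05, decay=0.85, l2=0.001, max_iter=50, tol=5e-4):
    """Gradient descent for L2-regularized logistic regression with a decaying
    step size and the slide's stopping rule (max iterations OR small change in
    the tracked metric; mean log-loss stands in for AUC here)."""
    n, d = X.shape
    w = np.zeros(d)
    prev = np.inf
    for _ in range(max_iter):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / n + l2 * w
        w -= step * grad
        step *= decay                      # decay the learning rate each pass
        loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
        if abs(prev - loss) <= tol:        # metric barely changed: stop early
            break
        prev = loss
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = (X @ np.array([1.0, -1.0, 0.5, 0.0]) + rng.logistic(size=1000) > 0).astype(int)
w = train_lr(X, y)
print(np.round(w, 2))
```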
18. Experiments (cont.)
• Experiment results

  Parameters and metrics            GD with L1    GD with L2     OWL-QN
  'Best' regularization term        0.001~0.005   0.0002~0.001   1
  Best step                         0.05          0.02~0.05      -
  Best decay ratio                  0.85          0.85           -
  Iteration times                   26            20~26          48
  Non-zero features / all features  10492/10938   10938/10938    6629/10938
  AUC                               0.8470        0.8463         0.8467
20. More link functions
• Inference by maximizing likelihood
• Link function
• Link functions for the binomial distribution
  – Logit function
  – Probit function
  – Log-log function
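The three binomial link functions above can be made concrete through their inverses, each of which maps the linear predictor eta = w·x into a probability in (0, 1). The complementary log-log is used here as the usual concrete form of the log-log family, and this stdlib-only sketch is my own, not taken from the deck:

```python
import math

# Inverse link functions for a binomial response: each maps a real-valued
# linear predictor eta into a probability in (0, 1).
def inv_logit(eta):            # logistic function (inverse of the logit link)
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):           # standard normal CDF (inverse of the probit link)
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def inv_cloglog(eta):          # inverse of the complementary log-log link
    return 1.0 - math.exp(-math.exp(eta))

for eta in (-2.0, 0.0, 2.0):
    print(eta, round(inv_logit(eta), 3), round(inv_probit(eta), 3),
          round(inv_cloglog(eta), 3))
```

The logit and probit links are symmetric about eta = 0 (both give probability 0.5 there), while the complementary log-log is asymmetric, which is why it is sometimes preferred for rare-event responses.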
21. Generalized linear model
• What is a GLM
  – Generalization of linear regression
  – Connects the linear model with the response variable via a link function
  – More distributions for the response variable
• Typical GLMs
• Overview
  – Linear regression, logistic regression, Poisson regression
22. Applications
• Yahoo
  – "Personalized Click Prediction in Sponsored Search", WSDM'10
• Microsoft
  – "Scalable Training of L1-Regularized Log-Linear Models", ICML'07
• Baidu
  – Contextual ads CTR prediction
  – http://www.docin.com/p-376254439.html
• Hulu
  – Demographic targeting
  – Other ad-targeting projects
  – Customer churn prediction
  – More…
23. References
• "Scalable Training of L1-Regularized Log-Linear Models", ICML'07
  – http://www.docin.com/p-376254439.html#
• "Generative and discriminative classifiers: Naïve Bayes and logistic regression" by Mitchell
• "Feature selection, L1 vs. L2 regularization, and rotational invariance", ICML'04
24. Recommended resources
• Machine Learning open class by Andrew Ng
  – //10.20.0.130/TempShare/Machine-Learning Open Class
• http://www.cnblogs.com/vivounicorn/archive/2012/02/24/2365328.html
• Logistic regression implementation [link]
  – //10.20.0.130/TempShare/guodong/Logistic regression Implementation/
  – Supports binomial and multinomial LR with L1 and L2 regularization
• OWL-QN
  – //10.20.0.130/TempShare/guodong/OWL-QN/
Unsupervised learning (clustering, dimensionality reduction, topic models): learn structure from unlabeled data. Closely related to density estimation; summarizes the data. Semi-supervised learning: use both labeled and unlabeled samples for training; collecting many labels is sometimes costly, so use both.
Logistic regression is one of the most popular classifiers. Advantages: 1. easy to understand and implement; 2. reasonable performance; 3. lightweight, with little time needed for training and prediction (it can handle large datasets); 4. easy to parallelize. Value to attendees: know what logistic regression is, its advantages and disadvantages, and what kinds of problems it suits; understand L1 and L2 regularization; know how to do inference through maximum likelihood with gradient descent, and how to implement it.
For a generalized linear model, if the response variable follows a binomial or multinomial distribution and the logit function is chosen as the link function, the model is logistic regression. The logistic function is the inverse of the logit function.
Link function: (1) a key component of the generalized linear model, extending linear regression to the generalized linear model; (2) the domain of the inverse link function is (-∞, +∞), and if y follows a binomial distribution the response lies in [0, 1]. The inverse of any continuous cumulative distribution function (CDF) can be used as the link since the CDF's range is [0, 1].
Generalized linear models are linear models in a broad sense: each has a basic linear unit w·x (as in linear regression) connected through a link function to a response variable of some distribution. They include linear regression (normal distribution), logistic regression (binomial/multinomial distribution), and Poisson regression (Poisson distribution). For a binomial/multinomial distribution, link functions other than the logit can also be chosen (logistic regression in the broad sense).