2. Logistic Regression (Intuition)
Logistic regression mainly solves binary classification problems: it finds a logistic function that predicts whether an event will happen or not, e.g. "does a person have the disease or not", "does a company go bankrupt or not".
Example: the probability of a successful reaction at a certain temperature.
[Figure: real observed values (success / fail) plotted against temperature]
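As a minimal sketch of the idea, the model maps temperature to a probability of success through the sigmoid function. The coefficients below are illustrative only, not fitted to any data:

```python
import math

def sigmoid(t):
    # Logistic (sigmoid) function: maps any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-t))

# Hypothetical coefficients: probability of a successful reaction
# rises with temperature (beta0, beta1 are illustrative, not fitted)
beta0, beta1 = -5.0, 0.1

def p_success(temperature):
    return sigmoid(beta0 + beta1 * temperature)

print(p_success(20))  # low temperature -> probability well below 0.5
print(p_success(80))  # high temperature -> probability well above 0.5
```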
5. Logistic Regression (Requirement)
1. It can handle non-normally distributed data
Logistic regression can handle data that are not normally distributed, because it applies a non-linear log transformation.
2. There must be no collinearity
Collinearity means one independent variable can be expressed as a linear combination of the other independent variables.
Example: for two independent variables X1 and X2, complete collinearity means X1 = a + bX2.
3. The dependent variable (Y) must be binary
Example: (0, 1)
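The collinearity requirement can be checked with a simple correlation test between pairs of predictors. A minimal pure-Python sketch, using the slide's X1 = a + bX2 example (the data values are illustrative):

```python
import math

def pearson_r(xs, ys):
    # Sample Pearson correlation; |r| close to 1 signals collinearity
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# X1 = a + b*X2: complete collinearity, as in the slide's example
x2 = [1.0, 2.0, 3.0, 4.0, 5.0]
x1 = [3.0 + 2.0 * v for v in x2]
print(pearson_r(x1, x2))  # correlation of 1 up to floating point
```

For more than two predictors, the same idea generalizes to the variance inflation factor (VIF).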
6. Logistic Regression (Solution)
Logistic regression uses the logit/sigmoid function σ(t) together with maximum likelihood to find the coefficients of the logistic function.
Purpose: after putting the training samples into the function, find the coefficients that minimize the residual sum, i.e.
the predicted probability for a case-0 sample is as close to 0 as possible, and
the predicted probability for a case-1 sample is as close to 1 as possible.
Y = σ(β0 + β1X1 + β2X2 + β3X3 + … + βiXi), where σ(t) = 1 / (1 + e^(−t))
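A minimal sketch of what maximum likelihood compares, on a hypothetical toy dataset: coefficients that push case-0 probabilities toward 0 and case-1 probabilities toward 1 receive a higher log-likelihood:

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def log_likelihood(beta0, beta1, data):
    # Sum of log P(y_i | x_i) under the logistic model
    ll = 0.0
    for x, y in data:
        p = sigmoid(beta0 + beta1 * x)
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return ll

# Hypothetical training sample (x, y); y = 1 when x is large
data = [(-2, 0), (-1, 0), (0, 0), (1, 1), (2, 1), (3, 1)]

# The steeper fit separates the two cases better,
# so it gets the higher (less negative) log-likelihood
print(log_likelihood(0.0, 0.5, data))
print(log_likelihood(-0.5, 2.0, data))
```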
7. Example
Y(x) = 1 / (1 + e^(−(β0 + β1X1)))
R1: Y(x) = 1 / (1 + e^(−(1 + X1)))
Normal residuals: 0.88 + 0.92 + 0.94 + 0.95 + 0.97 + 0.98 = 5.64
Bankruptcy residuals: 6 − (1 + 1 + 1 + 1 + 1 + 1) = 0
Sum: 5.64
R2: Y(x) = 1 / (1 + e^(−(−298 + 81X1)))
Normal residuals: 0 + 0 + 0 + 0 + 0 + 0 = 0
Bankruptcy residuals: 6 − (1 + 1 + 1 + 1 + 1 + 1) = 0
Sum: 0
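The slide does not give the exact X1 values, so the sketch below uses hypothetical financial-ratio values that reproduce the same pattern: with the gentle coefficients R1 = (1, 1) the normal firms get large residuals, while the steep coefficients R2 = (−298, 81) drive all residuals to essentially zero:

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def residual_sum(beta0, beta1, data):
    # Residual per case: |y - predicted probability|
    return sum(abs(y - sigmoid(beta0 + beta1 * x)) for x, y in data)

# Hypothetical X1 values: normal firms (y=0) have low X1,
# bankrupt firms (y=1) have high X1
data = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1), (5.0, 1), (6.0, 1)]

print(residual_sum(1.0, 1.0, data))      # R1: large residuals on normal firms
print(residual_sum(-298.0, 81.0, data))  # R2: residual sum essentially 0
```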
8. Discriminant Analysis (Intuition)
In discriminant analysis we already know the classified groups; when a new sample arrives, we choose a classification criterion to determine which group the new sample should be assigned to.
Example: classifying patients by medical disease, or classifying companies by financial status.
9. Binary Logistic regression (BLR) vs
Linear Discriminant analysis (with 2 groups: also known as
Fisher's LDA)
BLR
1. Not demanding about the measurement level of the predictors or the form of their distribution
2. Not so sensitive to outliers
LDA
1. Predictors desirably at the interval level, with a multivariate normal distribution
2. Quite sensitive to outliers.
10. Linear Discriminant Analysis vs. Logistic Regression (example_1)
The sample data consist of two classes; both classes are normally distributed.
Class_1 (blue dots): -2, -0.90909, 0.181818, 1.272727, 2.363636, 3.454545, 4.545455, 5.636364, 6.727273, 7.818182, 8.909091, 10
Class_2 (red dots): -10, -8.90909, -7.81818, -6.72727, -5.63636, -4.54545, -3.45455, -2.36364, -1.27273, -0.18182, 0.909091, 2
Class_1 is 12 evenly spaced points from -2 to 10.
Class_2 is 12 evenly spaced points from -10 to 2.
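Since these classes are one-dimensional with equal spread, Fisher's LDA boundary reduces to the midpoint of the two class means. A sketch reconstructing the slide's points:

```python
# Reconstruct the two classes from the slide: 12 evenly spaced points each
class_1 = [-2 + i * 12 / 11 for i in range(12)]   # from -2 to 10
class_2 = [-10 + i * 12 / 11 for i in range(12)]  # from -10 to 2

mean_1 = sum(class_1) / len(class_1)
mean_2 = sum(class_2) / len(class_2)

# With equal variances and priors, the 1-D LDA boundary is the
# midpoint between the two class means
boundary = (mean_1 + mean_2) / 2.0
print(mean_1, mean_2, boundary)  # means near 4 and -4, boundary near 0
```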
14. Linear Discriminant Analysis vs. Logistic Regression (example_3)
The sample data consist of two normally distributed classes; one of them contains an outlier.
Class_1 (blue dots): -2, -0.8, 0.4, 1.6, 2.8, 4, 5.2, 6.4, 7.6, 8.8, 10, 20 (outlier)
Class_2 (red dots): -10, -8.90909, -7.81818, -6.72727, -5.63636, -4.54545, -3.45455, -2.36364, -1.27273, -0.18182, 0.909091, 2
Class_1 is 11 evenly spaced points from -2 to 10, plus an outlier at 20.
Class_2 is 12 evenly spaced points from -10 to 2.
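Repeating the midpoint computation with the outlier shows LDA's sensitivity: the single point at 20 drags the Class_1 mean, and with it the boundary, away from 0, even though the bulk of the data has not moved (a sketch reconstructing the slide's points):

```python
# Class_1: 11 evenly spaced points from -2 to 10, plus the outlier at 20
class_1 = [-2 + i * 12 / 10 for i in range(11)] + [20]
class_2 = [-10 + i * 12 / 11 for i in range(12)]  # from -10 to 2

mean_1 = sum(class_1) / len(class_1)
mean_2 = sum(class_2) / len(class_2)

# The outlier pulls the Class_1 mean up from 4 to about 5.33,
# shifting the midpoint boundary from 0 to about 0.67
boundary = (mean_1 + mean_2) / 2.0
print(boundary)
```

Logistic regression, being less sensitive to outliers, moves its decision boundary far less in this situation.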
Logistic regression can handle all sorts of relationships, because it applies a non-linear log transformation
Assumptions:
The data Y1, Y2, ..., Yn are independently distributed, i.e., cases are independent.
Distribution of Yi is Bin(ni, πi), i.e., binary logistic regression model assumes binomial distribution of the response. The dependent variable does NOT need to be normally distributed, but it typically assumes a distribution from an exponential family (e.g. binomial, Poisson, multinomial, normal,...)
Does NOT assume a linear relationship between the dependent variable and the independent variables, but it does assume linear relationship between the logit of the response and the explanatory variables; logit(π) = β0 + βX.
Independent (explanatory) variables can be even the power terms or some other nonlinear transformations of the original independent variables.
The homogeneity of variance does NOT need to be satisfied. In fact, it is not even possible in many cases given the model structure.
Errors need to be independent but NOT normally distributed.
It uses maximum likelihood estimation (MLE) rather than ordinary least squares (OLS) to estimate the parameters, and thus relies on large-sample approximations.
Goodness-of-fit measures rely on sufficiently large samples, where a heuristic rule is that not more than 20% of the expected cell counts are less than 5.
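As a minimal illustration of the MLE point, a sketch that maximizes the log-likelihood by plain gradient ascent on hypothetical toy data (real software such as R's glm uses Newton-type iteratively reweighted least squares instead):

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_logistic(data, lr=0.05, steps=10000):
    # Gradient ascent on the log-likelihood of the logistic model;
    # the gradient in each coordinate is sum((y - p) * feature)
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        g0 = sum(y - sigmoid(b0 + b1 * x) for x, y in data)
        g1 = sum((y - sigmoid(b0 + b1 * x)) * x for x, y in data)
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1

# Hypothetical binary data: y tends to 1 as x grows (with some overlap,
# so the maximum likelihood estimate is finite)
data = [(-3, 0), (-2, 0), (-1, 0), (0, 1), (1, 0), (2, 1), (3, 1), (4, 1)]
b0, b1 = fit_logistic(data)
print(b0, b1)  # b1 > 0: probability of y=1 increases with x
```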