2. Linear Regression
• In statistical modeling, linear regression is a statistical process used to estimate the relationship between a dependent variable and one or more independent variables. (Wikipedia)
• In other words, it analyzes the influence of one or more independent variables on a dependent variable.
• Dependent variable — the attribute that is to be predicted.
• Independent variables — the factors under consideration that influence the prediction of the dependent variable.
4. [Figure: scatter plot of Expenditure (₹) against Income (₹)]
If the relationship between Y and X is believed to be linear, then the equation for a line may be appropriate: Y = β1 + β2X, where β1 is an intercept term and β2 is a slope coefficient.
5. • y is termed the dependent or study variable and X is termed the independent or explanatory variable.
• The terms β1 and β2 are the parameters of the model; they are usually called regression coefficients.
• The unobservable error component accounts for the difference between the true and observed realization of y.
• For each point, the difference between the predicted value and the actual observation is the residual.
• For simple linear regression we choose the sum of squared errors (SSE): SSE = Σᵢ (predictedᵢ − actualᵢ)² = Σᵢ (residualᵢ)². Thus, we find the line which minimizes the sum of the squared residuals (i.e., least squares), as sketched below.
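As a quick sketch of how these least-squares parameters can be computed directly (assuming NumPy; the x and y values are illustrative, not from the exercise datasets):

import numpy as np

# Illustrative data (hypothetical values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form least squares for Y = β1 + β2X:
# β2 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², β1 = ȳ - β2·x̄
beta2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta1 = y.mean() - beta2 * x.mean()

# The fitted line minimizes the sum of squared residuals (SSE)
residuals = y - (beta1 + beta2 * x)
print(beta1, beta2, np.sum(residuals ** 2))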
6. Obtain Linear Regression for the following dataset
Finding Learning parameters of
Linear Regression line
https://news.vidyaacademy.ac.in/wp-content/uploads/2018/10/NotesOnMachineLearningForBTech-1.pdf
7. Obtain Linear Regression for Data
https://news.vidyaacademy.ac.in/wp-content/uploads/2018/10/NotesOnMachineLearningForBTech-1.pdf
11. Logistic Regression
• Logistic regression is a form of regression analysis in which the dependent or outcome variable is binary or dichotomous.
• In practice it is the form of regression used for label classification rather than for predicting a continuous value.
12. Logistic Regression?
What is the “Logistic” component?
• The term “Logistic” comes from the logit function used in this method of classification. The logit function is the log of the odds; its inverse is the sigmoid function, which maps any real value to a probability between zero and one.
Sigmoid function: σ(z) = exp(z) / (1 + exp(z))
What is the “Regression” component?
• Methods used to quantify the association between an outcome and predictor variables.
So, logistic regression is a technique which, instead of modeling the outcome (Y) directly, models the log odds of Y using the logistic function.
LOGIT(p) = ln(p / (1 − p)) = z
p = exp(z) / (1 + exp(z))
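A minimal sketch of the logit and its inverse, the sigmoid (NumPy assumed; the function names are illustrative):

import numpy as np

def logit(p):
    # log odds: maps a probability in (0, 1) to any real value z
    return np.log(p / (1 - p))

def sigmoid(z):
    # inverse of the logit: maps any real z back into (0, 1)
    return np.exp(z) / (1 + np.exp(z))

p = 0.8
z = logit(p)       # ln(0.8 / 0.2) = ln(4) ≈ 1.386
print(sigmoid(z))  # recovers 0.8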
13. Odds of an Event
• Odds are used to describe the chance of an event occurring.
• The odds are the ratio that compares the number of ways the event can occur with the number of ways the event cannot occur.
• The odds of an event are the ratio of the probability of the event to the probability of its complement.
• In other words, they are the ratio of favorable outcomes to unfavorable outcomes.
• We say the odds are "3 to 2," which means 3 favorable outcomes to every
2 unfavorable outcomes, and we write 3 : 2.
14. LOGIT(p) = ln(p / (1 − p)) = z
p = exp(z) / (1 + exp(z))
The Logistic Curve
[Figure: the logistic curve, plotting p (probability) against z (log odds)]
16. The Logistic Regression Model
Logistic Regression: ln(P(Y) / (1 − P(Y))) = β0 + β1X1 + β2X2 + ⋯ + βKXK
Linear Regression: Y = β0 + β1X1 + β2X2 + ⋯ + βKXK
17. The Logistic Regression Model
ln(P(Y) / (1 − P(Y))) = β0 + β1X1 + β2X2 + ⋯ + βKXK
• X1 … XK are the independent variables.
• Y is the binary outcome; ln(P(Y) / (1 − P(Y))) is the log(odds) of the outcome.
18. The Logistic Regression Model
ln(P(Y) / (1 − P(Y))) = β0 + β1X1 + β2X2 + ⋯ + βKXK
• β0 … βK are the regression coefficients.
• ln(P(Y) / (1 − P(Y))) is the log(odds) of the outcome.
19. Form for Predicted Probabilities
ln(P(Y) / (1 − P(Y))) = β0 + β1X1 + β2X2 + ⋯ + βKXK
⇕
P(Y) = exp(β0 + β1X1 + β2X2 + ⋯ + βKXK) / (1 + exp(β0 + β1X1 + β2X2 + ⋯ + βKXK))
In this latter form, the logistic regression model directly relates the
probability of Y to the predictor variables.
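A sketch of this predicted-probability form in code (the coefficients below are hypothetical):

import numpy as np

def predicted_probability(beta0, betas, x):
    # P(Y) = exp(β0 + β1x1 + ... + βKxK) / (1 + exp(β0 + β1x1 + ... + βKxK))
    z = beta0 + np.dot(betas, x)
    return np.exp(z) / (1 + np.exp(z))

# Hypothetical model with two predictors
print(predicted_probability(-1.5, np.array([0.8, 0.3]), np.array([2.0, 1.0])))  # ≈ 0.60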
20. Relationship between Odds & Probability
Odds(event) = Probability(event) / (1 − Probability(event))
Probability(event) = Odds(event) / (1 + Odds(event))
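These two conversions in code (a minimal sketch):

def odds_from_probability(p):
    return p / (1 - p)

def probability_from_odds(odds):
    return odds / (1 + odds)

print(odds_from_probability(0.6))  # the 3 : 2 example -> 1.5
print(probability_from_odds(1.5))  # back to 0.6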
21. The Logistic Regression Model
ln(Pr(Y) / (1 − Pr(Y))) = 2.67 − 0.13 · X1
⇕
Pr(Y) = exp(2.67 − 0.13 · X1) / (1 + exp(2.67 − 0.13 · X1))
What is the effect of X1 on Y?
Odds ratio = exp(β1) = exp(−0.13) = 0.88
This implies that for every 1 unit increase in X1, the odds of Y decrease by 12%.
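Verifying the numbers on this slide (a small sketch; X1 = 5 is an arbitrary illustrative value):

import math

odds_ratio = math.exp(-0.13)  # ≈ 0.878: each unit of X1 multiplies the odds by 0.88
print(1 - odds_ratio)         # ≈ 0.12, i.e. a 12% decrease in the odds per unit increase

z = 2.67 - 0.13 * 5                     # log odds at X1 = 5
print(math.exp(z) / (1 + math.exp(z)))  # Pr(Y) ≈ 0.88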
22. Regression of Log Odds

Medication Dosage | # Cured | Total Patients | Probability (# Cured / Total) | Odds (p / (1 − p) = # cured / # not cured) | Log Odds (ln(Odds))
20 | 1 | 5 | .20 | .25 | −1.39
30 | 2 | 6 | .33 | .50 | −0.69
40 | 4 | 6 | .67 | 2.0 | 0.69
50 | 6 | 7 | .86 | 6.0 | 1.79
[Figure: log odds (−2 to +2) plotted against dosage (0 to 60), with the fitted regression line]
• y = .11x − 3.8 (the regression equation fitted to the log odds)
• We transform a log-odds value back to a probability: p = e^logit(x) / (1 + e^logit(x))
• For example, assume we want p for dosage = 10:
Logit(10) = .11(10) − 3.8 = −2.7
p(10) = e^(−2.7) / (1 + e^(−2.7)) = .06
[Figure: probability cured (0 to 1) against dosage, following the logistic curve]
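The procedure on this slide can be reproduced by fitting a straight line to the log-odds column (a sketch assuming NumPy; np.polyfit gives coefficients close to the slide's rounded y = .11x − 3.8):

import numpy as np

dosage   = np.array([20, 30, 40, 50])
log_odds = np.array([-1.39, -0.69, 0.69, 1.79])

# Least-squares line through the log odds
slope, intercept = np.polyfit(dosage, log_odds, 1)
print(slope, intercept)  # roughly 0.11 and -3.7

# Transform the predicted log odds back to a probability, e.g. for dosage = 10
z = slope * 10 + intercept
print(np.exp(z) / (1 + np.exp(z)))  # ≈ 0.07 (the slide's rounded equation gives .06)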
23. Linear Regression Model Implementation using sklearn
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Hold out 30% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
linreg = LinearRegression()
linreg.fit(X_train, y_train)     # learn the coefficients on the training split
y_pred = linreg.predict(X_test)  # predict on the held-out split
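A self-contained usage example (the synthetic data here is illustrative):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Synthetic data following y ≈ 3x + 2 plus noise (hypothetical)
rng = np.random.RandomState(0)
X = rng.rand(100, 1) * 10
y = 3 * X.ravel() + 2 + rng.randn(100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
linreg = LinearRegression().fit(X_train, y_train)
print(linreg.intercept_, linreg.coef_)  # close to 2 and [3]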
24. Logistic Regression Model Implementation using sklearn
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
logreg = LogisticRegression()
logreg.fit(X_train, y_train)     # learn the coefficients on the training split
y_pred = logreg.predict(X_test)  # predicted class labels for the held-out split
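Because the model works on log odds, sklearn also exposes the probabilities and log-odds scores directly, e.g.:

probs = logreg.predict_proba(X_test)          # P(Y=0) and P(Y=1) for each test row
log_odds = logreg.decision_function(X_test)   # β0 + β1x1 + ... + βKxK for each row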
26. from sklearn import metrics
import numpy as np
print('Mean Absolute Error', metrics.mean_absolute_error(y_test, y_pred))
print('Mean Squared Error', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
27. Confusion Matrix
A confusion matrix is a table that is often used to describe the performance
of a classification model on a set of test data for which the true values are
known.
• True Positives (TP) - These are the
correctly predicted positive values which
means that the value of actual class is yes
and the value of predicted class is also
yes.
• True Negatives (TN) - These are the
correctly predicted negative values which
means that the value of actual class is no
and value of predicted class is also no.
• False Positives (FP), Type 1 error –
when the actual class is no and the
predicted class is yes.
• False Negatives (FN), Type 2 error –
when the actual class is yes but the
predicted class is no.
https://blog.exsilio.com/all/accuracy-precision-recall-f1-score-interpretation-of-performance-measures/
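A minimal sklearn sketch (the labels below are illustrative):

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels, ravel() unpacks the 2x2 matrix as TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 3, 1, 1, 3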
28. Evaluation Metrics of Classification Models
• Accuracy, Precision, and Recall
• Accuracy is the proportion of true results among the total number of cases examined.
• Accuracy = (TP + TN) / (TP + FP + FN + TN)
• Accuracy is a valid choice of evaluation metric for classification problems which have no class imbalance.
29. Recall (Sensitivity)
What proportion of actual positives is correctly classified?
Recall = TP / (TP + FN)
Any recall value above 0.5 is generally considered good.
Precision
Precision is the ratio of correctly predicted positive observations to the total predicted positive observations.
High precision relates to a low false positive rate.
Precision = TP / (TP + FP)
F1 Score: the harmonic mean of precision and recall:
F1 = 2 · (Precision · Recall) / (Precision + Recall)
The F1 score is a number between 0 and 1.
Always aim for a model with both good precision and recall.
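All three metrics are available in sklearn (a sketch reusing the illustrative labels from the confusion-matrix example):

from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(precision_score(y_true, y_pred))  # TP / (TP + FP) = 3/4
print(recall_score(y_true, y_pred))     # TP / (TP + FN) = 3/4
print(f1_score(y_true, y_pred))         # harmonic mean = 0.75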
31. • The accuracy for the model turns out to be 96%.
• 50% of the cases predicted as positive turned out to be actual positives (precision), whereas 75% of the actual positives were successfully identified by the model (recall).
https://www.analyticsvidhya.com/blog/2020/04/confusion-matrix-machine-learning/
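One confusion matrix consistent with these numbers (the counts are hypothetical, chosen to reproduce the percentages) is TP = 30, FP = 30, FN = 10, TN = 930:

tp, fp, fn, tn = 30, 30, 10, 930
print((tp + tn) / (tp + tn + fp + fn))  # accuracy  = 960/1000 = 0.96
print(tp / (tp + fp))                   # precision = 30/60 = 0.50
print(tp / (tp + fn))                   # recall    = 30/40 = 0.75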
33. Calculate the F1 score for binary prediction problems using:
from sklearn.metrics import f1_score
y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 0, 1, 0, 0, 1]
f1_score(y_true, y_pred)  # TP = 2, FP = 0, FN = 2 -> precision 1.0, recall 0.5, F1 ≈ 0.67