BITS Pilani
Pilani Campus
Machine Learning
AIML CZG565
Linear Models for Classification
Dr. Sugata Ghosal
sugata.ghosal@pilani.bits-pilani.ac.in
BITS Pilani, Pilani Campus
Agenda
• Discriminant Functions
• Probabilistic Generative Classifiers
• Probabilistic Discriminative Classifiers
• Logistic Regression
• Applications: Text classification model, Image classification
BITS Pilani
Decision Theory
&
Classification Models
BITS Pilani, Pilani Campus
Inductive Learning Hypothesis: Interpretation
• Target Concept : t
• Discrete : f(x) ∈ {Yes, No, Maybe}
• Continuous : f(x) ∈ [20, 100]
• Probability Estimation : f(x) ∈ [0, 1]
Classification
Regression
BITS Pilani, Pilani Campus
Decision Theory
• Target Concept : t
• Discrete : f(x) ∈ {Yes, No}, i.e., t ∈ {0, 1}
• Continuous : f(x) ∈ [20, 100]
• Probability Estimation : f(x) ∈ [0, 1]
Binary Classification
BITS Pilani, Pilani Campus
Decision Theory:
The decision problem: given x, predict t according to a probabilistic model p(x, t)
• Target Concept : t
• Discrete : f(x) ∈ {Yes, No}, i.e., t ∈ {0, 1}
• Continuous : f(x) ∈ [20, 100]
• Probability Estimation : f(x) ∈ [0, 1]
X = < , , , , , > → P(Ck | X)
p(x, Ck) is the (central!) inference problem
BITS Pilani, Pilani Campus
Classification Problem: Stages
Induction/Inference step: a learning algorithm learns a model for p(x, Ck) from the training set.
Deduction/Decision step: the model is applied to the test set to find the optimal t.
BITS Pilani, Pilani Campus
Decision Region
Induction/Inference step: a learning algorithm learns a model for p(x, Ck) from the training set.
The model divides the input space into regions Rk called decision regions, one for each class, such that all points in Rk are assigned to class Ck.
A mistake occurs when an input vector belonging to class C1 is assigned to class C2.
BITS Pilani, Pilani Campus
Misclassification Rate
To minimize p(mistake), each x is assigned to whichever class has the smaller value of the integrand.
The minimum probability of making a mistake is obtained if each value of x is assigned to the class for which the posterior probability p(Ck | x) is largest.
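The slide refers to "the integrand" without showing the formula. For reference, the standard two-class expression for the misclassification probability (as given in Bishop, Pattern Recognition and Machine Learning) is:

```latex
p(\text{mistake}) = p(\mathbf{x} \in \mathcal{R}_1, \mathcal{C}_2) + p(\mathbf{x} \in \mathcal{R}_2, \mathcal{C}_1)
                  = \int_{\mathcal{R}_1} p(\mathbf{x}, \mathcal{C}_2)\,\mathrm{d}\mathbf{x}
                  + \int_{\mathcal{R}_2} p(\mathbf{x}, \mathcal{C}_1)\,\mathrm{d}\mathbf{x}
```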
BITS Pilani, Pilani Campus
Decision Theory - Summary
BITS Pilani
Linear Models for Classification
BITS Pilani, Pilani Campus
Types of Classification
Inductive Learning Hypothesis: Interpretation
• Target Concept
• Discrete : f(x) ∈ {Yes, No, Maybe}
• Continuous : f(x) ∈ [20, 100]
• Probability Estimation : f(x) ∈ [0, 1]
Classification
Regression
BITS Pilani, Pilani Campus
Types of Classification
Output Labels
• Target Concept
BITS Pilani, Pilani Campus
Types of Classification
Output Labels
• Target Concept
BITS Pilani, Pilani Campus
Prediction – Multi class Classification
One-vs-All (one-vs-rest) strategy
Train one binary classifier per class in the (x1, x2) feature space: Class 1 vs. rest, Class 2 vs. rest, Class 3 vs. rest.
hθ^(i)(x) = P(y = i | x; θ), for i = 1, 2, 3
For an input x, predict: max_i hθ^(i)(x)
Note: Scikit-Learn detects when you try to use a binary classification
algorithm for a multi‐class classification task, and it automatically runs
OvA (except for SVM classifiers for which it uses OvO)
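As a hedged illustration of these wrapper strategies in scikit-learn (the one-vs-one variant is covered on the next slide), here is a minimal sketch; the synthetic dataset, sample sizes, and max_iter value are assumptions for illustration only.

```python
# Minimal sketch of OvR and OvO multiclass strategies in scikit-learn.
# The dataset is synthetic (make_classification) and purely illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, n_classes=3, random_state=0)

# One-vs-Rest: one binary classifier per class (3 classifiers for 3 classes).
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# One-vs-One: one classifier per pair of classes (N*(N-1)/2 = 3 here).
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ovr.estimators_), len(ovo.estimators_))  # 3 and 3 for N = 3 classes
```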
BITS Pilani, Pilani Campus
Prediction – Multi class Classification
One-vs-One strategy
Train one binary classifier for every pair of classes in the (x1, x2) feature space: N × (N − 1) / 2 classifiers in total.
hθ^(i)(x) = P(y = i | x; θ), for i = 1, 2, 3
For an input x, predict: max_i hθ^(i)(x)
BITS Pilani, Pilani Campus
Sky AirTemp Humidity Wind Forecast Enjoy Sport?
Sunny Warm Normal Strong Same Yes
Sunny Warm High Strong Same No
Rainy Cold High Strong Change No
Sunny Warm Normal Breeze Same Yes
Sunny Hot Normal Breeze Same No
Rainy Cold High Strong Change No
Sunny Warm High Strong Change Yes
Rainy Warm Normal Breeze Same Yes
Types of Classification
Decision Theory: Interpretation → Model Building
[Scatter plot of the training data in the (x1, x2) feature space]
Generative:
P(Y | X1, X2, …, Xd) = P(X1, X2, …, Xd | Y) P(Y) / P(X1, X2, …, Xd)
Known as generative models, because by sampling from them it is possible to generate synthetic data points in the input space.
E.g., Gaussians, Naïve Bayes, mixtures of multinomials, mixtures of Gaussians, Bayesian networks
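As a hedged sketch of a generative classifier, the example below fits Gaussian naive Bayes, which models p(x | Ck) and P(Ck) and classifies via Bayes' rule; the 2-D data is synthetic, and the theta_/var_ attribute names assume a recent scikit-learn version.

```python
# Sketch of a generative classifier: Gaussian naive Bayes models p(x | Ck) and P(Ck),
# then classifies via Bayes' rule. The 2-D data below is synthetic/illustrative.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X0 = rng.normal(loc=[0, 0], scale=1.0, size=(100, 2))   # class 0
X1 = rng.normal(loc=[3, 3], scale=1.0, size=(100, 2))   # class 1
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

gnb = GaussianNB().fit(X, y)
print(gnb.predict_proba([[1.5, 1.5]]))   # posterior P(Ck | x) for a new point

# Because the class-conditional densities are modelled explicitly, we can also
# sample synthetic points for class 1 from the fitted per-feature Gaussians
# (theta_ = class means, var_ = class variances in recent scikit-learn versions).
synthetic = rng.normal(loc=gnb.theta_[1], scale=np.sqrt(gnb.var_[1]), size=(5, 2))
print(synthetic)
```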
BITS Pilani, Pilani Campus
Sky AirTemp Humidity Wind Forecast Enjoy Sport?
Sunny Warm Normal Strong Same Yes
Sunny Warm High Strong Same No
Rainy Cold High Strong Change No
Sunny Warm Normal Breeze Same Yes
Sunny Hot Normal Breeze Same No
Rainy Cold High Strong Change No
Sunny Warm High Strong Change Yes
Rainy Warm Normal Breeze Same Yes
Types of Classification
Decision Theory: Interpretation → Model Building
Discriminative: logistic regression, SVMs, tree-based classifiers (e.g., decision trees), traditional neural networks, nearest neighbor
[Scatter plot in the (x1, x2) feature space with a linear decision boundary separating y = 1 from y = 0]
Predict y = 1 where θ0 + Σi θi xi ≥ 0
Predict y = 0 where θ0 + Σi θi xi < 0
BITS Pilani, Pilani Campus
Sky AirTemp Humidity Wind Forecast Enjoy Sport?
Sunny Warm Normal Strong Same Yes
Sunny Warm High Strong Same No
Rainy Cold High Strong Change No
Sunny Warm Normal Breeze Same Yes
Sunny Hot Normal Breeze Same No
Rainy Cold High Strong Change No
Sunny Warm High Strong Change Yes
Rainy Warm Normal Breeze Same Yes
Types of Classification
Decision Theory: Interpretation → Model Building
Discriminative: logistic regression, SVMs, tree-based classifiers (e.g., decision trees), traditional neural networks, nearest neighbor
IF OUTLOOK = Overcast THEN PLAY = Yes
ELSE IF OUTLOOK = Rain AND WIND = Strong THEN PLAY = No
BITS Pilani, Pilani Campus
BITS Pilani
Logistic Regression
BITS Pilani, Pilani Campus
Logistic Regression vs Least Squares Regression
If h(x) ≥ 0.5, predict "y = 1"
If h(x) < 0.5, predict "y = 0"
[Plot: Malignant? (Yes = 1, No = 0) vs. Tumor Size]
• Can we solve the problem using linear regression? E.g., fit a straight line and define a threshold at 0.5
• Threshold the classifier output h(x) at 0.5
A discriminant function f(x) directly maps an input to a class label.
In a two-class problem, f(·) is binary valued.
BITS Pilani, Pilani Campus
Decision Rules
• Classifier: f(x, w) = w0 + wᵀx (linear discriminant function)
• Decision rule, mathematically: y = sign(w0 + wᵀx)
• This specifies a linear classifier: it has a linear boundary (hyperplane) w0 + wᵀx = 0
A discriminant is a function that takes an input vector x and assigns it
to one of K classes, denoted Ck.
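A minimal sketch of this sign-based decision rule, with a made-up weight vector and input point for illustration:

```python
# Minimal sketch of the linear discriminant decision rule y = sign(w0 + w^T x).
# The weights and input point below are hypothetical.
import numpy as np

def linear_discriminant(x, w, w0):
    """Return +1 or -1 depending on which side of the hyperplane w0 + w^T x = 0 x lies."""
    return 1 if w0 + np.dot(w, x) >= 0 else -1

w = np.array([2.0, -1.0])   # hypothetical weight vector
w0 = -0.5                   # hypothetical bias
print(linear_discriminant(np.array([1.0, 0.5]), w, w0))   # +1, since w0 + w^T x = 1.0 >= 0
```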
BITS Pilani, Pilani Campus
Learning model parameters
• Training set:
• m examples
• How to choose parameters (feature weights) ?
BITS Pilani, Pilani Campus
Logistic Regression
• hθ(x) = g(θ0 + θ1x1 + θ2x2)
– e.g., θ0 = −3, θ1 = 1, θ2 = 1
• Predict "y = 1" if −3 + x1 + x2 ≥ 0
• At decision boundary output of logistic regression is 0.5
Tumor Size
Age
Decision
boundary
Slide credit: Andrew Ng
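A small sketch of this decision boundary example: with θ = (−3, 1, 1), the model predicts y = 1 exactly when −3 + x1 + x2 ≥ 0, i.e. when the sigmoid output reaches 0.5. The test points are illustrative.

```python
# Sketch of the slide's decision boundary example: h_theta(x) = g(-3 + x1 + x2).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([-3.0, 1.0, 1.0])           # [theta0, theta1, theta2] from the slide

def predict(x1, x2):
    z = theta @ np.array([1.0, x1, x2])      # theta0 + theta1*x1 + theta2*x2
    return int(sigmoid(z) >= 0.5)            # boundary is exactly where sigmoid = 0.5

print(predict(1, 1))   # z = -1 -> probability ~0.27 -> predict 0
print(predict(2, 2))   # z = +1 -> probability ~0.73 -> predict 1
```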
BITS Pilani, Pilani Campus
Logistic Regression
Slide credit: Andrew Ng
BITS Pilani, Pilani Campus
Logistic Regression
• Training set:
• How to choose parameters (feature weights)?
• m examples
• n features
BITS Pilani, Pilani Campus
Logistic Regression
• Training set:
• How to choose parameters (feature weights)?
With the squared-error cost of linear regression applied to the sigmoid hypothesis, J(θ) is "non-convex"; the cross-entropy cost introduced next is "convex".
BITS Pilani, Pilani Campus
Logistic regression cost function (cross entropy)
If y = 1: Cost(hθ(x), y) = −log hθ(x)
[Plot: cost vs. hθ(x) on [0, 1]; the cost is 0 when hθ(x) = 1 and grows without bound as hθ(x) → 0]
BITS Pilani, Pilani Campus
Logistic regression cost function
If y = 0: Cost(hθ(x), y) = −log(1 − hθ(x))
[Plot: cost vs. hθ(x) on [0, 1]]
Cost = 0 if y = 0 and hθ(x) = 0
BITS Pilani, Pilani Campus
Cost function
Output: hθ(x), the estimated probability that y = 1 for input x
To fit parameters θ: apply the gradient descent algorithm
To make a prediction given a new x: output hθ(x)
BITS Pilani, Pilani Campus
Gradient Descent Algorithm
J(θ) = −(1/m) Σ_{i=1}^{m} [ y^(i) log hθ(x^(i)) + (1 − y^(i)) log(1 − hθ(x^(i))) ]
Goal: min_θ J(θ)
Repeat {
  θ_j := θ_j − α ∂J(θ)/∂θ_j
}
∂J(θ)/∂θ_j = (1/m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i)) x_j^(i)
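A small NumPy sketch of batch gradient descent following the update rule above; the toy data, learning rate, and iteration count are illustrative assumptions.

```python
# Batch gradient descent for logistic regression, following the slide's update rule.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, n_iters=1000):
    """X: (m, n) feature matrix WITHOUT the bias column; y: (m,) labels in {0, 1}."""
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])     # prepend x0 = 1 for the intercept
    theta = np.zeros(n + 1)
    for _ in range(n_iters):
        h = sigmoid(Xb @ theta)              # h_theta(x^(i)) for all i
        grad = (Xb.T @ (h - y)) / m          # (1/m) * sum_i (h - y) * x_j^(i)
        theta -= alpha * grad                # simultaneous update of all theta_j
    return theta

# Toy usage: two well-separated 2-D clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(gradient_descent(X, y))
```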
BITS Pilani, Pilani Campus
Gradient Descent Algorithm
Linear Regression:
Repeat {
  θ_j := θ_j − α (1/m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i)) x_j^(i)
}
with hθ(x) = θᵀx
Logistic Regression:
Repeat {
  θ_j := θ_j − α (1/m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i)) x_j^(i)
}
with hθ(x) = 1 / (1 + e^(−θᵀx))
The update rule is identical in form; only the hypothesis hθ(x) changes.
Slide credit: Andrew Ng
BITS Pilani, Pilani Campus
Example: Sentiment Analysis
Sentiment Features
BITS Pilani, Pilani Campus
Classifying sentiment using logistic regression
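The slides' actual sentiment feature set is not reproduced here, so the following is only a hedged sketch using a plain bag-of-words representation and a tiny made-up corpus.

```python
# Hedged sketch of sentiment classification with logistic regression,
# using bag-of-words features and a made-up four-document corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = ["great movie, loved it", "terrible plot and bad acting",
        "what a wonderful film", "boring and awful"]
labels = [1, 0, 1, 0]                      # 1 = positive, 0 = negative

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(docs, labels)
print(model.predict(["loved the wonderful acting"]))   # likely [1] on this toy corpus
```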
BITS Pilani, Pilani Campus
Logistic Regression – Fit a Model
CGPA IQ IQ Job Offered
5.5 6.7 100 1
5 7 105 0
8 6 90 1
9 7 105 1
6 8 120 0
7.5 7.3 110 0
Hyperparameters:
Learning rate α = 0.3
Initial weights = (0.5, 0.5, 0.5)
Regularization constant = 0
θᵀx = 0.5 + 0.5·CGPA + 0.5·IQ
hθ(x) = 1 / (1 + e^(−θᵀx)) = 1 / (1 + e^(−(0.5 + 0.5·CGPA + 0.5·IQ)))
θ_CGPA := θ_CGPA − 0.3 · (1/6) Σ_{i=1}^{6} (hθ(x^(i)) − y^(i)) x_CGPA^(i)
θ_IQ := θ_IQ − 0.3 · (1/6) Σ_{i=1}^{6} (hθ(x^(i)) − y^(i)) x_IQ^(i)
θ_0 := θ_0 − 0.3 · (1/6) Σ_{i=1}^{6} (hθ(x^(i)) − y^(i)) · 1
Approx. new weights after this update:
θ_0 = 0.4
θ_1 = θ_CGPA = −0.4
θ_2 = θ_IQ = −0.6
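The sketch below reproduces this single batch gradient-descent step on the toy table, assuming the second column (IQ on the roughly 6-8 scale) is the IQ feature used with the initial weights; under that assumption the result lands close to the slide's approximate values.

```python
# One batch gradient-descent step on the toy CGPA/IQ table, assuming the
# second column is the IQ feature. Expected result is close to the slide's
# approximate weights (theta0 ~ 0.4, theta_CGPA ~ -0.4, theta_IQ ~ -0.6).
import numpy as np

X = np.array([[5.5, 6.7], [5, 7], [8, 6], [9, 7], [6, 8], [7.5, 7.3]])  # CGPA, IQ
y = np.array([1, 0, 1, 1, 0, 0])
Xb = np.hstack([np.ones((6, 1)), X])          # add x0 = 1 for the intercept
theta = np.array([0.5, 0.5, 0.5])             # initial weights from the slide
alpha = 0.3                                   # learning rate from the slide

h = 1.0 / (1.0 + np.exp(-(Xb @ theta)))       # h_theta(x^(i)) for every row
theta = theta - alpha * (Xb.T @ (h - y)) / 6  # one simultaneous update of all theta_j
print(np.round(theta, 2))                     # approx [ 0.35 -0.42 -0.61]
```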
BITS Pilani, Pilani Campus
Logistic Regression – Inference & Interpretation
CGPA IQ IQ Job Offered
5.5 6.7 100 1
5 7 105 0
8 6 90 1
9 7 105 1
6 8 120 0
7.5 7.3 110 0
Assume the fitted model: θᵀx = 0.4 + 0.3·CGPA − 0.45·IQ
Predict "Job Offered" for a candidate with (CGPA, IQ) = (5, 6):
h(x) = 0.31
Y-Predicted = 0 / No
Note:
The exponential of a regression coefficient (e^θ_CGPA) is the odds ratio associated with a one-unit increase in CGPA: the odds of being offered a job increase by a factor of e^0.3 ≈ 1.35 for every unit increase in CGPA [np.exp(model.params)].
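The sketch below verifies these numbers directly from the assumed coefficients (it does not refit the model; the np.exp(model.params) call in the slide refers to a fitted statsmodels model).

```python
# Verify the slide's inference and odds-ratio numbers for the assumed model
# theta^T x = 0.4 + 0.3*CGPA - 0.45*IQ.
import numpy as np

theta = np.array([0.4, 0.3, -0.45])           # [intercept, CGPA, IQ] from the slide

def predict_proba(cgpa, iq):
    z = theta @ np.array([1.0, cgpa, iq])
    return 1.0 / (1.0 + np.exp(-z))

p = predict_proba(5, 6)
print(round(p, 2), int(p >= 0.5))             # 0.31, 0  -> "Job Offered = No"

# Odds-ratio interpretation: exponentiating a coefficient gives the factor by
# which the odds change per one-unit increase in that feature.
print(round(np.exp(theta[1]), 2))             # exp(0.3) ~ 1.35 per unit of CGPA
```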
BITS Pilani, Pilani Campus
Logistic regression (Classification)
• Model
  hθ(x) = P(Y = 1 | X1, X2, …, Xn) = 1 / (1 + e^(−θᵀx))
• Cost function
  J(θ) = (1/m) Σ_{i=1}^{m} Cost(hθ(x^(i)), y^(i))
  Cost(hθ(x), y) = −log hθ(x) if y = 1; −log(1 − hθ(x)) if y = 0
• Learning
  Gradient descent: Repeat { θ_j := θ_j − α (1/m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i)) x_j^(i) }
• Inference
  Y = hθ(x_test) = 1 / (1 + e^(−θᵀx_test))
Note:
• σ(t) < 0.5 when t < 0, and σ(t) ≥ 0.5 when t ≥ 0, so a logistic model predicts 1 if xᵀθ is positive and 0 if it is negative
• logit(p) = log(p / (1 − p)) is the inverse of the logistic function. Indeed, if you compute the logit of the estimated probability p, you will find that the result is t. The logit is also called the log-odds
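A tiny sketch showing that the logit recovers the input of the sigmoid, i.e. it is the inverse (log-odds) function described above:

```python
# The logit (log-odds) is the inverse of the sigmoid (logistic) function.
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def logit(p):
    return np.log(p / (1.0 - p))     # the log-odds

t = 1.2
p = sigmoid(t)
print(p >= 0.5)        # True, since t >= 0
print(logit(p))        # recovers t = 1.2 (up to floating-point error)
```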
BITS Pilani, Pilani Campus
Overfitting vs Underfitting
Overfitting
• Fitting the data too well
– Features are noisy / uncorrelated to
concept
Underfitting
• Learning too little of the true
concept
– Features don’t capture concept
– Too much bias in model
BITS Pilani, Pilani Campus
BITS Pilani
Regularization
Note: This topic was already covered in Module 3. A refresher and a few additional points are included here.
BITS Pilani, Pilani Campus
Regularization
BITS Pilani, Pilani Campus
Ways to Control Overfitting
• Regularization
Loss(S) = Σ_{i=1}^{n} Loss(ŷ_i, y_i) + α Σ_{j=1}^{#Weights} |θ_j|
Note:
The hyperparameter controlling the regularization strength of a Scikit-Learn LogisticRegression model is not
alpha (as in other linear models), but its inverse: C. The higher the value of C, the less the model is
regularized.
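A hedged sketch of this inverse relationship: smaller C means stronger regularization and typically smaller weights. The synthetic dataset and the particular C values are assumptions for illustration.

```python
# C in scikit-learn's LogisticRegression is the inverse regularization strength:
# smaller C -> stronger regularization -> typically smaller weight magnitudes.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = LogisticRegression(C=C, max_iter=1000).fit(X, y)
    print(C, np.round(np.abs(clf.coef_).sum(), 2))   # weight magnitude typically grows with C
```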
BITS Pilani, Pilani Campus
Regularization
BITS Pilani, Pilani Campus
Understanding Regularization
BITS Pilani, Pilani Campus
BITS Pilani, Pilani Campus
Regularization
Ridge Regression / Tikhonov regularization
BITS Pilani, Pilani Campus
BITS Pilani, Pilani Campus
Regularization
Lasso Regression (Least Absolute Shrinkage and Selection Operator Regression)
BITS Pilani, Pilani Campus
Thank you!
Next Session Plan:
Module 5 – Decision Tree Classifier
Required reading for the completed session:
T1 – Chapter 6 (Tom M. Mitchell, Machine Learning)
R1 – Chapters 3 and 4 (Christopher M. Bishop, Pattern Recognition & Machine Learning), and refresh your MFDS course basics