Module 2
Regression
• In machine learning, a regression problem is the problem of
predicting the value of a numeric output variable based on observed
values of the input variables.
• The value of the output variable may be a number, such as an integer
or a floating point value.
• These are often quantities, such as amounts and sizes. The input
variables may be discrete or real-valued.
• Different regression models
• There are various types of regression techniques available to make
predictions.
• These techniques differ mainly in three aspects, namely, the number
and type of independent variables, the type of dependent variable,
and the shape of the regression line.
• Simple linear regression: There is only one continuous independent
variable x, and the assumed relation between the independent variable
and the dependent variable y is y = a + bx.
• Multivariate linear regression: There is more than one independent
variable, say x1, . . . , xn, and the assumed relation between the
independent variables and the dependent variable is y = a0 + a1x1 + ⋯
+ anxn.
• Polynomial regression: There is only one continuous independent
variable x, and the assumed model is y = a0 + a1x + ⋯ + anx^n.
• Logistic regression: The dependent variable is binary, that is, a
variable which takes only the values 0 and 1. The assumed model
involves certain probability distributions.
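For simple linear regression, the coefficients a and b can be estimated by ordinary least squares. A minimal sketch in pure Python (the data points below are made up for illustration):

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = a + b*x.

    b = cov(x, y) / var(x), a = mean(y) - b * mean(x).
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # The 1/n factors of covariance and variance cancel in the ratio.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    b = cov / var
    a = my - b * mx
    return a, b

# Points lying exactly on y = 2 + 3x recover a = 2, b = 3.
a, b = fit_simple_linear([0, 1, 2, 3], [2, 5, 8, 11])
```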
Errors in Machine Learning
•Irreducible errors are errors which will always be present in a machine learning model, because of
unknown variables, and whose values cannot be reduced.
•Reducible errors are those errors whose values can be further reduced to improve a model. They arise
because our model’s output function does not match the desired output function, and they can be
reduced by optimizing the model.
What is Bias?
• To make predictions, our model will analyze our data and find patterns in it.
Using these patterns, we can make generalizations about certain instances in
our data. After training, our model learns these patterns and applies them to
the test set to make predictions.
• Bias is the error introduced by the simplifying assumptions our model makes
about the data in order to predict new data; it shows up as a systematic
difference between the actual and predicted values.
• When the bias is high, the assumptions made by our model are too basic, and
the model can’t capture the important features of our data.
• This means that our model hasn’t captured patterns in the training data and
hence cannot perform well on the testing data too.
• If this is the case, our model cannot perform on new data and cannot be sent
into production.
• This instance, where the model cannot find patterns in our training set and
hence fails for both seen and unseen data, is called Underfitting.
• The below figure shows an example of Underfitting. As we can see, the model
has found no patterns in our data and the line of best fit is a straight line that
does not pass through any of the data points. The model has failed to train
properly on the data given and cannot predict new data either.
Variance
• Variance is, in a sense, the opposite of bias. During training, the model is
allowed to ‘see’ the data a certain number of times to find patterns in it.
• If it does not work on the data for long enough, it will not find patterns and bias
occurs. On the other hand, if our model is allowed to view the data too many
times, it will learn very well, but only for that data.
• It will capture most patterns in the data, but it will also learn from the
unnecessary data present, or from the noise.
• We can define variance as the model’s sensitivity to fluctuations in the data.
Our model may learn from noise. This will cause our model to consider trivial
features as important.
• In the above figure, we can see that our model has learned extremely well from
our training data, which has taught it to identify cats.
• But when given new data, such as the picture of a fox, our model predicts it as
a cat, because that is what it has learned. This happens when the variance is
high: the model captures all the features of the data given to it, including the
noise, tunes itself to that data, and predicts it very well, but when given new
data it cannot predict on it, because it is too specific to the training data.
Bias-Variance Tradeoff
• For any model, we have to find the perfect balance between Bias and
Variance.
• This ensures that we capture the essential patterns in our data while
ignoring the noise present in it. This is called the Bias-Variance Tradeoff. It helps
optimize the error in our model and keeps it as low as possible.
• An optimized model will be sensitive to the patterns in our data, but at the
same time will be able to generalize to new data. In this, both the bias and
variance should be low so as to prevent overfitting and underfitting.
• We can see that when bias is high, the error in both the training and testing
sets is also high.
• If we have high variance, the model performs well on the training set, where we
can see that the error is low, but gives a high error on the testing set.
• We can see that there is a region in the middle where the error in both the
training and testing sets is low and the bias and variance are in perfect balance.
The best fit is when the data is concentrated in the center, i.e., at the bull’s eye. We can see that as we get
farther away from the center, the error in our model increases. The best model is one where bias and
variance are both low.
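The tradeoff can be illustrated with a small experiment in pure Python. The synthetic data and the three models below (a constant predictor, a fitted line, and a 1-nearest-neighbour memorizer) are chosen here just to stand in for high bias, a balanced fit, and high variance:

```python
import random

random.seed(0)  # reproducible synthetic data

def make_data(n):
    # Noisy samples from the true relation y = 2x + 1.
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + 1 + random.gauss(0, 1) for x in xs]
    return xs, ys

train_x, train_y = make_data(30)
test_x, test_y = make_data(30)

def mse(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# High bias: ignore x and always predict the training mean (underfits).
mean_y = sum(train_y) / len(train_y)
def constant(x):
    return mean_y

# Balanced: ordinary least squares line y = a + b*x.
mean_x = sum(train_x) / len(train_x)
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
     / sum((x - mean_x) ** 2 for x in train_x))
a = mean_y - b * mean_x
def line(x):
    return a + b * x

# High variance: memorize the training set (1-nearest neighbour; overfits).
def nearest(x):
    return min(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[1]

for name, model in [("constant", constant), ("line", line), ("1-NN", nearest)]:
    print(f"{name:8s} train MSE {mse(model, train_x, train_y):8.3f}"
          f"  test MSE {mse(model, test_x, test_y):8.3f}")
```

The constant model has high error on both sets (underfitting); the 1-NN model has zero training error but a larger test error (overfitting); the fitted line has low error on both.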
Theorem of total probability
• Let B1, B2, …, BN be mutually exclusive events whose union equals the
sample space S. We refer to these sets as a partition of S.
• An event A can be represented as:
A = (A ∩ B1) ∪ (A ∩ B2) ∪ … ∪ (A ∩ BN)
• Since B1, B2, …, BN are mutually exclusive, then
P(A) = P(A ∩ B1) + P(A ∩ B2) + … + P(A ∩ BN)
• And therefore
P(A) = P(A|B1)*P(B1) + P(A|B2)*P(B2) + … + P(A|BN)*P(BN)
= Σi P(A | Bi) * P(Bi)
(This is also called exhaustive conditionalization, or marginalization.)
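As a numeric sketch of total probability (the two-machine factory scenario is made up for illustration): suppose machine B1 produces 60% of all parts with a 1% defect rate, and machine B2 produces 40% with a 5% defect rate. The overall defect probability is then:

```python
# P(defect) = P(defect|B1)*P(B1) + P(defect|B2)*P(B2)
priors = {"B1": 0.6, "B2": 0.4}        # P(Bi): which machine made the part
defect_rate = {"B1": 0.01, "B2": 0.05}  # P(defect | Bi)

p_defect = sum(defect_rate[b] * priors[b] for b in priors)
# 0.01*0.6 + 0.05*0.4 = 0.006 + 0.020 = 0.026
```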
Bayes theorem
• P(A ∩ B) = P(B) * P(A | B) = P(A) * P(B | A)
=> P(B | A) = P(A | B) * P(B) / P(A)
where P(B | A) is the posterior probability, P(A | B) is the conditional
probability (likelihood), P(B) is the prior of B, and P(A) is the prior of A
(the normalizing constant).
This is known as Bayes Theorem or Bayes Rule, and is (one of) the most useful
relations in probability and statistics
Bayes Theorem is definitely the fundamental relation in Statistical Pattern
Recognition
Bayes theorem (cont’d)
• Given B1, B2, …, BN, a partition of the sample space S.
Suppose that event A occurs; what is the probability of
event Bj?
• P(Bj | A) = P(A | Bj) * P(Bj) / P(A)
= P(A | Bj) * P(Bj) / Σj P(A | Bj) * P(Bj)
Bj: different models / hypotheses
On observing A, should you choose the model that maximizes P(Bj | A)
or the one that maximizes P(A | Bj)? That depends on how much you know about the Bj!
Posterior probability
Likelihood Prior of Bj
Normalizing constant
(theorem of total probabilities)
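The posterior computation above can be written as a small helper (a sketch; the hypothesis names are arbitrary):

```python
def posteriors(priors, likelihoods):
    """P(Bj | A) for each hypothesis Bj, given priors P(Bj) and
    likelihoods P(A | Bj), via Bayes' rule.

    The denominator P(A) comes from the theorem of total probability.
    """
    p_a = sum(likelihoods[b] * priors[b] for b in priors)  # normalizing constant
    return {b: likelihoods[b] * priors[b] / p_a for b in priors}

# Two hypotheses with equal priors: the one with the higher likelihood
# gets the proportionally higher posterior.
post = posteriors({"B1": 0.5, "B2": 0.5}, {"B1": 0.9, "B2": 0.1})
```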
Another example
• We’ve talked about the box of casino dice: 99% are fair, 1% are loaded
(a loaded die shows a six 50% of the time)
• We said that if we randomly pick a die and roll it, we have a 17% chance of
getting a six
• If we get three sixes in a row, what’s the chance that the die is loaded?
• How about five sixes in a row?
• P(loaded | 666)
= P(666 | loaded) * P(loaded) / P(666)
= 0.5^3 * 0.01 / (0.5^3 * 0.01 + (1/6)^3 * 0.99)
≈ 0.21
• P(loaded | 66666)
= P(66666 | loaded) * P(loaded) / P(66666)
= 0.5^5 * 0.01 / (0.5^5 * 0.01 + (1/6)^5 * 0.99)
≈ 0.71
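These numbers are easy to check numerically; this sketch applies Bayes' rule directly:

```python
def p_loaded(n_sixes, prior_loaded=0.01, p_six_loaded=0.5):
    """Posterior probability that the die is loaded after n sixes in a row."""
    prior_fair = 1 - prior_loaded
    like_loaded = p_six_loaded ** n_sixes  # P(data | loaded)
    like_fair = (1 / 6) ** n_sixes         # P(data | fair)
    return (like_loaded * prior_loaded
            / (like_loaded * prior_loaded + like_fair * prior_fair))

three = p_loaded(3)  # ≈ 0.21
five = p_loaded(5)   # ≈ 0.71
```

Note how quickly the tiny 1% prior is overwhelmed by the likelihood ratio as the run of sixes grows.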
Classification : Bayes’ decision theory