Important Classification
and Regression Metrics
By Chode Amarnath
Important Links referred
1) https://www.analyticsvidhya.com/blog/2020/09/precision-recall-machine-learning/
2) https://www.javatpoint.com/confusion-matrix-in-machine-learning
3) https://medium.com/analytics-vidhya/confusion-matrix-accuracy-precision-recall-f1-score-ade299cf63cd
4) https://www.freecodecamp.org/news/evaluation-metrics-for-regression-problems-machine-learning/
Why do we use different evaluation metrics?
There are plenty of ways to measure the quality of an algorithm, and each company
decides for itself
→ what the most appropriate measure is for its particular problem.
Example:
Let’s say an online shop is trying to maximize the effectiveness of its website.
→ We need to formalize what “effectiveness” means.
→ We need to define a metric for how effectiveness is measured.
→ It could be the number of times the website was visited, or the number of times
something was ordered through the website.
→ So the company usually decides for itself which quantity is most important.
When assessing how well a model fits a dataset, RMSE is used more often because
it is measured in the same units as the response variable.
Regression & Classification Metrics
1) Regression
a) MSE
b) RMSE
c) R-squared
d) MAE
e) RMSPE, MAPE
2) Classification
a) Confusion Matrix
b) Accuracy
c) Precision
d) Recall
e) F1 Score
f) AUC
Regression Metrics - Mean Squared Error (MSE)
MSE is the mean (average) of the squared differences between the actual and predicted values:
MSE = (1/n) * Σ(actual − predicted)²
A high MSE means that the model is not performing well,
whereas an MSE of 0 would mean a perfect model that predicts the target without any error.
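As a rough illustration, here is a minimal NumPy sketch of the calculation; the sample arrays are made up for demonstration:

```python
import numpy as np

# Hypothetical actual and predicted values, chosen only for illustration
actual = np.array([3.0, 5.0, 2.5, 7.0])
predicted = np.array([2.5, 5.0, 4.0, 8.0])

# MSE: mean of the squared differences between actual and predicted values
mse = np.mean((actual - predicted) ** 2)
print(mse)  # 0.875
```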
Why do we square the difference?
Example : Model Comparison
When we compare Model A with Model B, where Model B has a few extreme errors,
squaring the differences penalizes Model B's large errors far more heavily.
Advantages & Disadvantages
Advantages of using MSE
Easy to calculate in Python
Simple calculation for end users to understand
Designed to punish large errors
Disadvantages of using MSE
The error is not expressed in the units of the target
Difficult to interpret
Not comparable across use cases
RMSE
RMSE is the square root of the mean of the squared errors.
→ RMSE penalizes large errors more heavily, so it can be more
appropriate in some cases.
→ On the other hand, one distinct advantage of RMSE over MAE is that RMSE
avoids taking the absolute value, which keeps the error function smoothly differentiable.
Example :
Let’s understand the above statement with two examples:
Case 1: Actual Values = [2, 4, 6, 8], Predicted Values = [4, 6, 8, 10]
Case 2: Actual Values = [2, 4, 6, 8], Predicted Values = [4, 6, 8, 12]
MAE for Case 1 = 2.0, RMSE for Case 1 = 2.0
MAE for Case 2 = 2.5, RMSE for Case 2 ≈ 2.65
From the above example,
→ we can see that RMSE penalizes the last prediction more heavily than
MAE. Generally, RMSE will be higher than or equal to MAE.
→ The only case where RMSE equals MAE is when all the absolute differences are equal
(true for Case 1, where the difference between actual and predicted is 2 for all
observations).
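The two cases above can be verified with a short NumPy sketch:

```python
import numpy as np

def mae(actual, predicted):
    # Mean of the absolute differences
    return np.mean(np.abs(actual - predicted))

def rmse(actual, predicted):
    # Square root of the mean of the squared differences
    return np.sqrt(np.mean((actual - predicted) ** 2))

actual = np.array([2, 4, 6, 8])
case1 = np.array([4, 6, 8, 10])  # every error is exactly 2
case2 = np.array([4, 6, 8, 12])  # one larger error of 4

print(mae(actual, case1), rmse(actual, case1))  # 2.0 2.0
print(mae(actual, case2), rmse(actual, case2))  # 2.5 2.6457...
```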
Mean Absolute Error (MAE)
MAE is the average of the absolute differences between the predicted values and the
observed values.
→ All the individual differences are weighted equally in the average.
What are the disadvantages of using mean absolute error?
It doesn't tell you whether your model tends to overestimate or underestimate,
→ since any direction information is destroyed by taking the absolute value.
MAE is the mean of the absolute differences between actual and predicted values. It doesn’t
consider the direction, that is, positive or negative.
→ When we consider directions as well, the measure is called Mean Bias Error (MBE),
which is the mean of the signed errors (differences).
So which one should you choose, and why?
MAE is easy to understand and interpret because it directly averages the
offsets,
whereas RMSE penalizes larger differences more than MAE.
Residuals
→ Residuals are the differences between the actual and predicted values; you can
think of a residual as a distance.
→ The closer the residuals are to zero, the better the model performs in making its
predictions.
R2 Score
The R2 score is a statistical measure that tells us how well our model is making
predictions, typically on a scale of 0 to 1.
→ It relates the residuals to the total variance of the data: R2 = 1 − SS_res / SS_tot.
R-Squared
R-squared is a goodness-of-fit measure for linear regression models. This statistic
indicates the percentage of the variance in the dependent variable that the
independent variables explain collectively.
When to use the R2 score
You can use the R2 score to express your model's goodness of fit on a percentage
scale, that is 0 - 100, loosely analogous to accuracy in a classification model.
Adjusted R2
Adjusted R2 is the better measure when you compare models that have different
numbers of variables.
→ The logic behind it is that R2 always increases when the number of variables
increases, meaning that even if you add a useless variable to your model, your R2
will still increase. To balance that out, you should always compare models with
different numbers of independent variables using adjusted R2.
→ Adjusted R2 only increases if the new variable improves the model more than
would be expected by chance.
→ When you compare models, use adjusted R2. When you only look at one model,
report R2, as it is the unadjusted measure of how much variance is explained by
your model.
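A minimal sketch of both measures, assuming scikit-learn is available; the data and the assumed number of predictors p are made up for illustration:

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical observed and predicted values
y_true = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.4, 10.6])

r2 = r2_score(y_true, y_pred)

# Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1),
# where n is the number of observations and p the number of predictors
n, p = len(y_true), 2  # p = 2 is an assumed model size
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(r2, adj_r2)
```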
Classification Metrics
→ Confusion Matrix
→ Accuracy
→ Precision
→ Recall
→ F1 score
→ AUC (Area Under the ROC Curve)
TP, TN, FP, FN
We represent predictions as Positive (P) or Negative (N) and truth values as True (T) or
False (F).
→ Representing truth and predicted values together, we get True Positive (TP), True
Negative (TN), False Positive (FP), and False Negative (FN).
Examples : True Positive (TP), True Negative (TN), False Positive (FP), False Negative (FN)
Confusion Matrix
The confusion matrix is used to determine the performance of a classification model.
→ It can only be determined if the true values for the test data are known.
→ It shows the errors in the model's performance in the form of a matrix.
Need for a confusion matrix
→ It evaluates the performance of a classification model when it makes
predictions on test data, and tells how good the model is.
→ With the help of the confusion matrix we can calculate further metrics of the
model, such as Accuracy, Precision, and Recall.
Example :
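A minimal sketch using scikit-learn's confusion_matrix on made-up labels:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and model predictions (1 = positive, 0 = negative)
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[4 1]
#  [1 4]]
tn, fp, fn, tp = cm.ravel()
print(tp, tn, fp, fn)  # 4 4 1 1
```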
Accuracy
Accuracy is the quintessential classification metric. It is easy to understand and
suits binary as well as multiclass classification problems.
Accuracy = (TP+TN)/(TP+FP+FN+TN)
Accuracy is the proportion of true results among the total number of cases examined.
When to use?
Accuracy is a valid choice of evaluation metric for classification problems that are well
balanced, not skewed, and have no class imbalance.
Accuracy
"What percentage of my predictions are correct?"
True Positives (TP): should be TRUE, you predicted TRUE. These are cases in
which we predicted yes (they have the disease), and they do have the disease.
True Negatives (TN): should be FALSE, you predicted FALSE. We predicted no,
and they don't have the disease.
False Positives (FP): should be FALSE, you predicted TRUE. We predicted yes,
but they don't actually have the disease. (Also known as a "Type I error.")
False Negatives (FN): should be TRUE, you predicted FALSE. We predicted no,
but they actually do have the disease. (Also known as a "Type II error.")
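These four counts plug directly into the accuracy formula above; a small sketch with made-up disease-screening counts:

```python
# Hypothetical counts from a disease-screening confusion matrix
tp, tn, fp, fn = 40, 45, 5, 10

# Accuracy = (TP + TN) / (TP + FP + FN + TN)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # 0.85
```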
Caveats
Let us say that our target class is very sparse. Do we want accuracy as a metric of our
model's performance? What if we are predicting whether an asteroid will hit the earth? Just say
"No" all the time, and you will be 99% accurate. Such a model can be reasonably accurate, but
not at all valuable.
Example :
→ When a search engine returns 30 pages, only 20 of which are relevant, while
failing to return 40 additional relevant pages, its precision is 20/30 = 2/3,
→ which tells us how valid the results are, while its recall is 20/60 = 1/3, which tells
us how complete the results are.
Precision
Let’s start with precision, which answers the following question: what proportion of
predicted Positives is truly Positive?
Precision = (TP)/(TP+FP)
What is the precision of your model?
→ If precision is 0.843, then when the model predicts that a patient has heart disease,
it is correct around 84% of the time.
When to use?
Precision is a valid choice of evaluation metric when we want to be very sure of our
prediction.
For example:
If we are building a system to predict if we should decrease the credit limit on
a particular account, we want to be very sure about our prediction or it may result in
customer dissatisfaction.
Caveats
Being very precise means our model will leave a lot of credit defaulters untouched and
hence lose money.
Recall
Another very useful measure is recall, which answers a different question: what
proportion of actual Positives is correctly classified?
If Recall = 0.86, the model correctly identifies 86% of the truly positive cases; recall
measures how completely your model is able to identify the relevant data.
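Both metrics follow directly from the confusion-matrix counts; a minimal scikit-learn sketch with made-up labels:

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical true labels and model predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# Precision = TP / (TP + FP); Recall = TP / (TP + FN)
print(precision_score(y_true, y_pred))  # 0.8
print(recall_score(y_true, y_pred))     # 0.666...
```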
Precision
"Of the points that I predicted TRUE, how many are actually TRUE?"
Good for multi-label / multi-class classification and information retrieval
Good for unbalanced datasets
Recall
"Of all the points that are actually TRUE, how many did I correctly
predict?"
Good for multi-label / multi-class classification and information retrieval
Good for unbalanced datasets
Precision / Recall
Let’s say we are evaluating a classifier on the test set for a binary classification problem.
→ The actual class of each example in the test set is either “1” or “0”.
→ High precision would be good: when the model predicts “1”, it is usually right.
→ High recall would also be good: the model finds most of the actual “1”s.
True Positive
Your algorithm predicted positive (1) and in reality the example is
positive.
True Negative
Your learning algorithm predicted the negative class “0” and the
actual class is “0”; this is called a true negative.
False Positive
Our learning algorithm predicts that the class is positive (1) but the actual
class is negative (0). That is called a false positive.
False Negative
The algorithm predicted negative (0), but the actual class is positive (1).
Suppose we want to predict that a patient has cancer only if we’re very confident that
they really do.
→ So maybe we want to tell someone that we think they have cancer only if we are
very confident.
One way to do this would be to modify the algorithm so that the prediction
threshold is raised from 0.5 to 0.7.
→ Then you’re predicting someone has cancer only when you’re more
confident.
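A sketch of the idea, assuming a fitted scikit-learn classifier named clf and held-out data X_test (both hypothetical): instead of the default 0.5 cutoff, predict positive only when the predicted probability exceeds 0.7:

```python
# clf is assumed to be any fitted scikit-learn classifier exposing predict_proba;
# X_test is assumed held-out feature data
proba_positive = clf.predict_proba(X_test)[:, 1]  # probability of class 1

# The default decision corresponds to a 0.5 threshold;
# raising it to 0.7 trades recall for higher precision
y_pred_default = (proba_positive >= 0.5).astype(int)
y_pred_strict = (proba_positive >= 0.7).astype(int)
```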
How to compare precision/recall numbers?
When we try to compare Algorithm 1, Algorithm 2, and Algorithm 3 using precision and
recall separately, we don’t have a single real-number evaluation metric.
→ If we had a single real-number evaluation metric, a number that just tells us
whether algorithm 1 or algorithm 2 is better,
→ that would help us decide much more quickly which algorithm to go with.
F1 Score
The F1 score is a single metric that balances precision and recall: it is their harmonic mean,
F1 = 2 * (Precision * Recall) / (Precision + Recall).
→ Gives equal weight to precision and recall
→ Good for unbalanced datasets
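A minimal sketch, reusing the made-up labels from the precision/recall example above:

```python
from sklearn.metrics import f1_score

# Same hypothetical labels as above: precision = 0.8, recall = 0.666...
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# F1 = 2 * (precision * recall) / (precision + recall)
print(f1_score(y_true, y_pred))  # 0.727...
```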
What is the AUC - ROC Curve?
The AUC - ROC curve is a performance measurement for classification problems at various
threshold settings.
→ It tells how capable the model is of distinguishing between classes.
→ The higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s.
ROC Curve
The Receiver Operating Characteristic curve is a probability graph that shows the
performance of a classification model at different threshold levels by plotting:
1) True Positive Rate (TPR)
2) False Positive Rate (FPR)
An excellent model has an AUC near 1, which means it has a good measure of
separability.
A poor model has an AUC near 0, which means it has the worst measure of separability;
in fact, it is reciprocating the result:
→ it is predicting 0s as 1s and 1s as 0s.
→ And when AUC is 0.5, the model has no class-separation capacity
whatsoever.
https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5
As we know, the ROC curve is built from predicted probabilities, so let’s plot the
distributions of those probabilities.
→ The red distribution curve is for the positive class and the green distribution curve is for
the negative class.
Examples : AUC = 0.7, AUC = 0.5, AUC = 0
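In code, a minimal sketch using scikit-learn, with made-up labels and scores:

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical true labels and predicted probabilities for the positive class
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.3, 0.9]

print(roc_auc_score(y_true, y_score))  # 0.9375

# The full curve: FPR and TPR at every threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)
```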
When to Use ROC vs. Precision-Recall Curves?
Generally, the use of ROC curves and precision-recall curves is as follows:
● ROC curves should be used when there are roughly equal numbers of observations for each class.
● Precision-Recall curves should be used when there is a moderate to large class imbalance.
The reason for this recommendation is that ROC curves present an optimistic picture of the model on datasets with a class
imbalance.
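As a sketch of the comparison, scikit-learn provides both curves; average_precision_score summarizes the precision-recall curve the way roc_auc_score summarizes the ROC curve (the imbalanced data below is made up):

```python
from sklearn.metrics import (average_precision_score, precision_recall_curve,
                             roc_auc_score)

# Hypothetical imbalanced labels (positives are rare) and predicted scores
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_score = [0.1, 0.2, 0.15, 0.3, 0.25, 0.4, 0.35, 0.6, 0.55, 0.9]

# On imbalanced data the two summaries can tell different stories
print(roc_auc_score(y_true, y_score))            # optimistic on imbalanced data
print(average_precision_score(y_true, y_score))  # summarizes the PR curve

# The full precision-recall curve at every threshold
precision, recall, thresholds = precision_recall_curve(y_true, y_score)
```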
