Subject - Machine Learning IT 312
Topic - Classification Performance
Sanjivani Rural Education Society’s
Sanjivani College of Engineering, Kopargaon-423603
(An Autonomous Institute Affiliated to Savitribai Phule Pune University, Pune)
NAAC ‘A’ Grade Accredited, ISO 9001:2015 Certified
Department of Information Technology
(NBA Accredited)
Dr.R.D.Chintamani
Confusion Matrix
• Confusion Matrix: True/False Positives and Negatives
• Accuracy, Precision, Recall, Specificity, F1-score
• AUC-ROC evaluation metric
Reference
https://www.youtube.com/watch?v=IImptOIqllo
Confusion Matrix
• The confusion matrix is a table used to evaluate the performance of a classification model on a given set of test data. It can only be computed when the true labels of the test data are known.
• Because it shows the errors in the model's predictions in matrix form, it is also known as an error matrix.
• For a classifier with 2 prediction classes the matrix is a 2×2 table, for 3 classes a 3×3 table, and so on.
• The matrix has two dimensions, the predicted values and the actual values, along with the total number of predictions (see the sketch below).
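A minimal sketch of building such a matrix in Python, assuming scikit-learn is available (the labels below are made up for illustration):

    from sklearn.metrics import confusion_matrix

    # Actual and predicted labels for a small two-class example (1 = Yes, 0 = No)
    y_actual    = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
    y_predicted = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]

    # With the default label ordering [0, 1], scikit-learn returns [[TN, FP], [FN, TP]]
    cm = confusion_matrix(y_actual, y_predicted)
    print(cm)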
Performance Metrics
• True Negative (TN): the model predicted No and the actual value was also No.
• True Positive (TP): the model predicted Yes and the actual value was also Yes.
• False Negative (FN): the model predicted No but the actual value was Yes; this is also called a Type-II error.
• False Positive (FP): the model predicted Yes but the actual value was No; this is also called a Type-I error (all four outcomes are counted in the sketch below).
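These four outcomes can also be counted directly from the labels, as in this illustrative sketch (the data here is hypothetical):

    # Count TP, TN, FP, FN by comparing each prediction with its actual label (1 = Yes, 0 = No)
    y_actual    = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
    y_predicted = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]

    TP = sum(1 for a, p in zip(y_actual, y_predicted) if a == 1 and p == 1)
    TN = sum(1 for a, p in zip(y_actual, y_predicted) if a == 0 and p == 0)
    FP = sum(1 for a, p in zip(y_actual, y_predicted) if a == 0 and p == 1)  # Type-I error
    FN = sum(1 for a, p in zip(y_actual, y_predicted) if a == 1 and p == 0)  # Type-II error
    print(TP, TN, FP, FN)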
Performance Metrics
• The table below is for a two-class classifier with the two predictions "Yes" and "No". Here, Yes means the patient has the disease and No means the patient does not have the disease.
• The classifier has made a total of 100 predictions. Out of these 100 predictions, 89 are correct and 11 are incorrect.
• The model predicted "Yes" 32 times and "No" 68 times, whereas the actual value was "Yes" 27 times and "No" 73 times (the individual cell values are derived below).
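The four cell values can be recovered from these totals (a worked check, not given on the original slide): TP + FP = 32 (predicted Yes), TP + FN = 27 (actual Yes), TP + TN = 89 (correct predictions), and TP + FP + FN + TN = 100. Solving these gives TP = 24, FP = 8, FN = 3 and TN = 65, which sum to 100 as required.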
Calculations using Confusion Matrix:
A) Accuracy: the ratio of the number of correct predictions made by the classifier to the total number of predictions made by the classifier.
Accuracy = (TP + TN) / (TP + TN + FP + FN)

                              Actual
                              Positive (1)   Negative (0)
Prediction   Positive (1)         TP             FP
             Negative (0)         FN             TN
Calculations using Confusion Matrix:
B) Precision: out of all the instances the model predicted as positive, how many were actually positive.
Precision = TP / (TP + FP)

                              Actual
                              Positive (1)   Negative (0)
Prediction   Positive (1)         TP             FP
             Negative (0)         FN             TN
Calculations using Confusion Matrix:
C) Recall: out of all the actual positive instances, how many the model predicted correctly. The recall should be as high as possible.
Recall = TP / (TP + FN)

                              Actual
                              Positive (1)   Negative (0)
Prediction   Positive (1)         TP             FP
             Negative (0)         FN             TN
Calculations using Confusion Matrix:

                              Actual
                              Positive (1)   Negative (0)
Prediction   Positive (1)         TP             FP
             Negative (0)         FN             TN

D) Specificity: out of all the actual negative instances, how many the model predicted correctly.
Specificity = TN / (TN + FP)
E) F1-Score = 2 * (Precision * Recall) / (Precision + Recall)
(A sketch computing all of these measures follows.)
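A minimal Python sketch, using the TP/FP/FN/TN counts derived in the worked disease example above:

    # Counts from the worked disease example (TP = 24, FP = 8, FN = 3, TN = 65)
    TP, FP, FN, TN = 24, 8, 3, 65

    accuracy    = (TP + TN) / (TP + TN + FP + FN)   # 89/100 = 0.89
    precision   = TP / (TP + FP)                    # 24/32  = 0.75
    recall      = TP / (TP + FN)                    # sensitivity / true positive rate
    specificity = TN / (TN + FP)                    # true negative rate
    f1_score    = 2 * precision * recall / (precision + recall)

    print(accuracy, precision, recall, specificity, f1_score)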
ROC - receiver operating characteristic curve
• Sensitivity = Recall = (True Positive) / (True Positive + False Negative): the proportion of the positive class that is correctly classified.
• Specificity = (True Negative) / (True Negative + False Positive): the proportion of the negative class that is correctly classified.
• ROC curve (receiver operating characteristic curve): the ROC is a graph displaying a classifier's performance for all possible thresholds. The graph plots the true positive rate (on the y-axis) against the false positive rate (on the x-axis).
ROC - receiver operating characteristic curve
AUC-ROC is a performance measurement for classification problems at various threshold settings. ROC is a probability curve and AUC represents the degree or measure of separability: it tells how well the model can distinguish between classes. The higher the AUC, the better the model is at predicting 0 classes as 0 and 1 classes as 1. By analogy, the higher the AUC, the better the model is at distinguishing between patients with the disease and patients without the disease.
The ROC curve is plotted with TPR against FPR, where TPR is on the y-axis and FPR is on the x-axis.
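A minimal sketch of plotting the ROC curve and computing AUC with scikit-learn (the labels and scores below are placeholders; in practice y_score would be the model's predicted probability for the positive class):

    import matplotlib.pyplot as plt
    from sklearn.metrics import roc_curve, roc_auc_score

    y_true  = [0, 0, 1, 1, 0, 1, 0, 1, 1, 0]                        # actual 0/1 labels
    y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.3, 0.65, 0.7, 0.5]  # predicted probabilities

    fpr, tpr, thresholds = roc_curve(y_true, y_score)   # FPR and TPR at every threshold
    auc = roc_auc_score(y_true, y_score)

    plt.plot(fpr, tpr, label="AUC = %.2f" % auc)
    plt.plot([0, 1], [0, 1], linestyle="--")            # chance line (AUC = 0.5)
    plt.xlabel("False Positive Rate")
    plt.ylabel("True Positive Rate")
    plt.legend()
    plt.show()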
ROC- receiver operating characteristic curve
Where are Sensitivity and Specificity used?
• An excellent model has an AUC near 1, which means it has a good measure of separability. A poor model has an AUC near 0, which means it has the worst measure of separability; in effect it reverses the result, predicting 0s as 1s and 1s as 0s. When the AUC is 0.5, the model has no class separation capacity whatsoever.
• Sensitivity (the true positive rate) and Specificity (whose complement is the false positive rate) are used to plot the ROC curve.
• The area under the ROC curve (AUC) is then used to summarise the model's performance.
Multiclass classification performance

                                Predicted
                 A        B        C        D        E
Actual   A     TP_A     E_AB     E_AC     E_AD     E_AE
         B     E_BA     TP_B     E_BC     E_BD     E_BE
         C     E_CA     E_CB     TP_C     E_CD     E_CE
         D     E_DA     E_DB     E_DC     TP_D     E_DE
         E     E_EA     E_EB     E_EC     E_ED     TP_E

TP_A = true positives for class A: an input of class A correctly identified as A
E_AB = actually class A but classified as class B
Continue..
Calculate Performance Metrics
A) True Positive - the TP of a class is taken directly from its diagonal cell, i.e. the number of correctly identified inputs for that class.
B) False Negative - the FN of a class is the sum of all values in that class's row, excluding its TP.
C) False Positive - the FP of a class is the sum of all values in that class's column, excluding its TP.
D) True Negative - the TN of a class is the sum of all values outside that class's row and column (see the NumPy sketch below).
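A sketch of these rules with NumPy, assuming the confusion matrix stores actual classes as rows and predicted classes as columns:

    import numpy as np

    def per_class_counts(cm, k):
        # TP, FN, FP, TN for class index k of a multiclass confusion matrix
        TP = cm[k, k]                    # correctly identified inputs of class k
        FN = cm[k, :].sum() - TP         # rest of the row: actual k, predicted something else
        FP = cm[:, k].sum() - TP         # rest of the column: predicted k, actually something else
        TN = cm.sum() - TP - FN - FP     # everything outside class k's row and column
        return TP, FN, FP, TN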
Example

                 Predicted
               A      B      C
Actual   A    30      5      2
         B     2     40      5
         C     2      0     20

Calculate Accuracy, Precision for class A, and Recall/Sensitivity for class B.
Answers
A) Accuracy = (30+40+20) / (30+5+2+2+40+5+2+0+20) = 90/106 ≈ 0.849
B) Precision for A = 30/(30+2+2) = 30/34 ≈ 0.882
C) Recall/Sensitivity for B = 40/(2+40+5) = 40/47 ≈ 0.851
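These answers can be checked with a short NumPy sketch (class indices A = 0, B = 1, C = 2 are assumed):

    import numpy as np

    cm = np.array([[30,  5,  2],
                   [ 2, 40,  5],
                   [ 2,  0, 20]])

    accuracy    = np.trace(cm) / cm.sum()     # (30+40+20)/106 ≈ 0.849
    precision_A = cm[0, 0] / cm[:, 0].sum()   # 30/(30+2+2)    ≈ 0.882
    recall_B    = cm[1, 1] / cm[1, :].sum()   # 40/(2+40+5)    ≈ 0.851
    print(accuracy, precision_A, recall_B)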
Thank You !!!
