Subject - Machine Learning IT 312
Topic - Classification Performance
Sanjivani Rural Education Society’s
Sanjivani College of Engineering, Kopargaon-423603
(An Autonomous Institute Affiliated to Savitribai Phule Pune University, Pune)
NAAC ‘A’ Grade Accredited, ISO 9001:2015 Certified
Department of Information Technology
(NBA Accredited)
Dr.R.D.Chintamani
Confusion Matrix
• Confusion Matrix: True/False Positives and Negatives
• Accuracy, Precision, Recall, Specificity, F1-score
• AUC-ROC evaluation metric
Reference
https://www.youtube.com/watch?v=IImptOIqllo
Confusion Matrix
• The confusion matrix is a table used to evaluate the performance of a classification model on a given set of test data. It can only be computed when the true labels of the test data are known.
• Because it shows the errors in the model's predictions in matrix form, it is also known as an error matrix.
• For a classifier with 2 prediction classes the matrix is a 2×2 table, for 3 classes a 3×3 table, and so on.
• The matrix has two dimensions, the predicted values and the actual values, along with the total number of predictions (see the sketch below).
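A minimal sketch of building such a matrix in Python, assuming scikit-learn is available (the labels below are made up for illustration):

    from sklearn.metrics import confusion_matrix

    # Actual and predicted labels for a small two-class example (1 = Yes, 0 = No)
    y_actual    = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
    y_predicted = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]

    # With the default label ordering [0, 1], scikit-learn returns [[TN, FP], [FN, TP]]
    cm = confusion_matrix(y_actual, y_predicted)
    print(cm)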
Performance Metrics
• True Negative (TN): the model predicted No and the actual value was also No.
• True Positive (TP): the model predicted Yes and the actual value was also Yes.
• False Negative (FN): the model predicted No but the actual value was Yes; this is also called a Type-II error.
• False Positive (FP): the model predicted Yes but the actual value was No; this is also called a Type-I error (all four outcomes are counted in the sketch below).
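These four outcomes can also be counted directly from the labels, as in this illustrative sketch (the data here is hypothetical):

    # Count TP, TN, FP, FN by comparing each prediction with its actual label (1 = Yes, 0 = No)
    y_actual    = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
    y_predicted = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]

    TP = sum(1 for a, p in zip(y_actual, y_predicted) if a == 1 and p == 1)
    TN = sum(1 for a, p in zip(y_actual, y_predicted) if a == 0 and p == 0)
    FP = sum(1 for a, p in zip(y_actual, y_predicted) if a == 0 and p == 1)  # Type-I error
    FN = sum(1 for a, p in zip(y_actual, y_predicted) if a == 1 and p == 0)  # Type-II error
    print(TP, TN, FP, FN)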
Performance Metrics
• The table below is for a two-class classifier with the two predictions "Yes" and "No". Here, Yes means the patient has the disease and No means the patient does not have the disease.
• The classifier has made a total of 100 predictions. Out of these 100 predictions, 89 are correct and 11 are incorrect.
• The model predicted "Yes" 32 times and "No" 68 times, whereas the actual value was "Yes" 27 times and "No" 73 times (the individual cell values are derived below).
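The four cell values can be recovered from these totals (a worked check, not given on the original slide): TP + FP = 32 (predicted Yes), TP + FN = 27 (actual Yes), TP + TN = 89 (correct predictions), and TP + FP + FN + TN = 100. Solving these gives TP = 24, FP = 8, FN = 3 and TN = 65, which sum to 100 as required.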
Calculations using Confusion Matrix:
A) Accuracy: the ratio of the number of correct predictions made by the classifier to the total number of predictions made by the classifier.
Accuracy = (TP + TN) / (TP + TN + FP + FN)

                              Actual
                              Positive (1)   Negative (0)
Prediction   Positive (1)         TP             FP
             Negative (0)         FN             TN
Calculations using Confusion Matrix:
B) Precision: out of all the instances the model predicted as positive, how many were actually positive.
Precision = TP / (TP + FP)

                              Actual
                              Positive (1)   Negative (0)
Prediction   Positive (1)         TP             FP
             Negative (0)         FN             TN
Calculations using Confusion Matrix:
C) Recall: out of all the actual positive instances, how many the model predicted correctly. The recall should be as high as possible.
Recall = TP / (TP + FN)

                              Actual
                              Positive (1)   Negative (0)
Prediction   Positive (1)         TP             FP
             Negative (0)         FN             TN
Calculations using Confusion Matrix:

                              Actual
                              Positive (1)   Negative (0)
Prediction   Positive (1)         TP             FP
             Negative (0)         FN             TN

D) Specificity: out of all the actual negative instances, how many the model predicted correctly.
Specificity = TN / (TN + FP)
E) F1-Score = 2 * (Precision * Recall) / (Precision + Recall)
(A sketch computing all of these measures follows.)
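A minimal Python sketch, using the TP/FP/FN/TN counts derived in the worked disease example above:

    # Counts from the worked disease example (TP = 24, FP = 8, FN = 3, TN = 65)
    TP, FP, FN, TN = 24, 8, 3, 65

    accuracy    = (TP + TN) / (TP + TN + FP + FN)   # 89/100 = 0.89
    precision   = TP / (TP + FP)                    # 24/32  = 0.75
    recall      = TP / (TP + FN)                    # sensitivity / true positive rate
    specificity = TN / (TN + FP)                    # true negative rate
    f1_score    = 2 * precision * recall / (precision + recall)

    print(accuracy, precision, recall, specificity, f1_score)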
ROC - receiver operating characteristic curve
• Sensitivity = Recall = (True Positive) / (True Positive + False Negative): the proportion of the positive class that is correctly classified.
• Specificity = (True Negative) / (True Negative + False Positive): the proportion of the negative class that is correctly classified.
• ROC curve (receiver operating characteristic curve): the ROC is a graph displaying a classifier's performance for all possible thresholds. The graph plots the true positive rate (on the y-axis) against the false positive rate (on the x-axis).
ROC - receiver operating characteristic curve
AUC-ROC is a performance measurement for classification problems at various threshold settings. ROC is a probability curve and AUC represents the degree or measure of separability: it tells how well the model can distinguish between classes. The higher the AUC, the better the model is at predicting 0 classes as 0 and 1 classes as 1. By analogy, the higher the AUC, the better the model is at distinguishing between patients with the disease and patients without the disease.
The ROC curve is plotted with TPR against FPR, where TPR is on the y-axis and FPR is on the x-axis.
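A minimal sketch of plotting the ROC curve and computing AUC with scikit-learn (the labels and scores below are placeholders; in practice y_score would be the model's predicted probability for the positive class):

    import matplotlib.pyplot as plt
    from sklearn.metrics import roc_curve, roc_auc_score

    y_true  = [0, 0, 1, 1, 0, 1, 0, 1, 1, 0]                        # actual 0/1 labels
    y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.3, 0.65, 0.7, 0.5]  # predicted probabilities

    fpr, tpr, thresholds = roc_curve(y_true, y_score)   # FPR and TPR at every threshold
    auc = roc_auc_score(y_true, y_score)

    plt.plot(fpr, tpr, label="AUC = %.2f" % auc)
    plt.plot([0, 1], [0, 1], linestyle="--")            # chance line (AUC = 0.5)
    plt.xlabel("False Positive Rate")
    plt.ylabel("True Positive Rate")
    plt.legend()
    plt.show()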
ROC- receiver operating characteristic curve
Where are Sensitivity and Specificity used?
• An excellent model has an AUC near 1, which means it has a good measure of separability. A poor model has an AUC near 0, which means it has the worst measure of separability; in effect it reverses the result, predicting 0s as 1s and 1s as 0s. When the AUC is 0.5, the model has no class separation capacity whatsoever.
• Sensitivity (the true positive rate) and Specificity (whose complement is the false positive rate) are used to plot the ROC curve.
• The area under the ROC curve (AUC) is then used to summarise the model's performance.
Multiclass classification performance

                                Predicted
                 A        B        C        D        E
Actual   A     TP_A     E_AB     E_AC     E_AD     E_AE
         B     E_BA     TP_B     E_BC     E_BD     E_BE
         C     E_CA     E_CB     TP_C     E_CD     E_CE
         D     E_DA     E_DB     E_DC     TP_D     E_DE
         E     E_EA     E_EB     E_EC     E_ED     TP_E

TP_A = true positives for class A: an input of class A correctly identified as A
E_AB = actually class A but classified as class B
Continue..
Calculate Performance Metrics
A) True Positive - the TP of a class is taken directly from its diagonal cell, i.e. the number of correctly identified inputs for that class.
B) False Negative - the FN of a class is the sum of all values in that class's row, excluding its TP.
C) False Positive - the FP of a class is the sum of all values in that class's column, excluding its TP.
D) True Negative - the TN of a class is the sum of all values outside that class's row and column (see the NumPy sketch below).
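A sketch of these rules with NumPy, assuming the confusion matrix stores actual classes as rows and predicted classes as columns:

    import numpy as np

    def per_class_counts(cm, k):
        # TP, FN, FP, TN for class index k of a multiclass confusion matrix
        TP = cm[k, k]                    # correctly identified inputs of class k
        FN = cm[k, :].sum() - TP         # rest of the row: actual k, predicted something else
        FP = cm[:, k].sum() - TP         # rest of the column: predicted k, actually something else
        TN = cm.sum() - TP - FN - FP     # everything outside class k's row and column
        return TP, FN, FP, TN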
Example

                 Predicted
               A      B      C
Actual   A    30      5      2
         B     2     40      5
         C     2      0     20

Calculate Accuracy, Precision for class A, and Recall/Sensitivity for class B.
Answers
A) Accuracy = (30+40+20) / (30+5+2+2+40+5+2+0+20) = 90/106 ≈ 0.849
B) Precision for A = 30/(30+2+2) = 30/34 ≈ 0.882
C) Recall/Sensitivity for B = 40/(2+40+5) = 40/47 ≈ 0.851
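These answers can be checked with a short NumPy sketch (class indices A = 0, B = 1, C = 2 are assumed):

    import numpy as np

    cm = np.array([[30,  5,  2],
                   [ 2, 40,  5],
                   [ 2,  0, 20]])

    accuracy    = np.trace(cm) / cm.sum()     # (30+40+20)/106 ≈ 0.849
    precision_A = cm[0, 0] / cm[:, 0].sum()   # 30/(30+2+2)    ≈ 0.882
    recall_B    = cm[1, 1] / cm[1, :].sum()   # 40/(2+40+5)    ≈ 0.851
    print(accuracy, precision_A, recall_B)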
Thank You !!!
