CS8080_IRT_UNIT - III T10 ACCURACY AND ERROR.pdf

UNIT – III: CLASSIFICATION
Topic 10: ACCURACY AND ERROR
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SEMESTER – VIII
PROFESSIONAL ELECTIVE – IV
CS8080- INFORMATION RETRIEVAL TECHNIQUES

UNIT III : TEXT CLASSIFICATION AND CLUSTERING
1. A Characterization of Text Classification
2. Unsupervised Algorithms: Clustering
3. Naïve Text Classification
4. Supervised Algorithms
5. Decision Tree
6. k-NN Classifier
7. SVM Classifier
8. Feature Selection or Dimensionality Reduction
9. Evaluation metrics
10. Accuracy and Error
11. Organizing the classes
12. Indexing and Searching
13. Inverted Indexes
14. Sequential Searching
15. Multi-dimensional Indexing

ACCURACY AND ERROR
• Accuracy
• Accuracy is the proportion of the time that the predicted class
equals the actual class, usually expressed as a percentage.
• Its meaning is straightforward, but it may obscure important
differences in the costs associated with different errors.
• The classic example of such costs is the medical diagnostic situation,
in which one can err by either:
• 1. keeping a healthy patient in the hospital (low cost), or
• 2. sending home a sick patient (very high cost).
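The point about hidden costs can be sketched in a few lines. The two hypothetical models below make one mistake each, so their accuracy is identical, yet the cost of their mistakes differs sharply. The cost values (1 for keeping a healthy patient, 50 for sending a sick patient home), the labels, and the function names are illustrative assumptions, not figures from the lecture.

```python
from typing import Sequence


def accuracy(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Proportion of cases where the predicted class equals the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def total_cost(actual: Sequence[str], predicted: Sequence[str],
               cost_keep_healthy: float = 1.0,
               cost_send_sick_home: float = 50.0) -> float:
    """Sum of error costs; both cost values are assumed for illustration."""
    cost = 0.0
    for a, p in zip(actual, predicted):
        if a == "healthy" and p == "sick":
            cost += cost_keep_healthy      # healthy patient kept in hospital
        elif a == "sick" and p == "healthy":
            cost += cost_send_sick_home    # sick patient sent home
    return cost


actual = ["sick", "sick", "healthy", "healthy", "healthy"]
model_a = ["sick", "sick", "sick", "healthy", "healthy"]        # keeps one healthy patient
model_b = ["sick", "healthy", "healthy", "healthy", "healthy"]  # sends one sick patient home

acc_a, acc_b = accuracy(actual, model_a), accuracy(actual, model_b)
cost_a, cost_b = total_cost(actual, model_a), total_cost(actual, model_b)
```

Both models score the same accuracy, so only the cost totals separate them.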

ACCURACY AND ERROR
• Classifiers that output class probabilities need to be checked on two counts:
o the accuracy of their probabilities: do cases predicted to have a 5% (30%, 80%, etc.)
probability really belong to the target class 5% (30%, 80%, etc.) of the time?
o their ability to separate the classes in question.
• The accuracy of the probabilities can be measured using many of the same metrics
used to evaluate numerical models (MSE, MAE, etc.).
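One possible way to run the probability check described above is sketched here. It computes the MSE of the predicted probabilities against 0/1 outcomes (commonly called the Brier score) and a simple binned comparison of mean predicted probability against the observed fraction of target-class cases. The function names, the equal-width binning, and the sample data are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """MSE between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def calibration_table(probs: Sequence[float], outcomes: Sequence[int],
                      n_bins: int = 5) -> List[Tuple[float, float, int]]:
    """Group cases into equal-width probability bins.

    Returns one row per non-empty bin:
    (mean predicted probability, observed fraction in target class, count).
    """
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, o))
    rows = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            observed = sum(o for _, o in members) / len(members)
            rows.append((mean_p, observed, len(members)))
    return rows


# Illustrative data: low predicted probabilities for class 0, high for class 1.
probs = [0.1, 0.1, 0.9, 0.9]
outcomes = [0, 0, 1, 1]
table = calibration_table(probs, outcomes, n_bins=2)
score = brier_score(probs, outcomes)
```

In a well-calibrated model the first two values of each row are close to each other.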

Accuracy and Error Measures
• All models must be assessed somehow.
• Despite the existence of a bewildering array of performance
measures, much commercial modeling software provides a
surprisingly limited range of options.

Mean Squared Error (MSE)
• Mean Squared Error (MSE) is by far the most common measure of
numerical model performance.
• It is simply the average of the squares of the differences between
the predicted and actual values.
• It is a reasonably good measure of performance, though it could be
argued that it overemphasizes the importance of larger errors.
• Many modeling procedures directly minimize it.
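A minimal sketch of the MSE computation, with made-up numbers chosen to show how squaring emphasizes larger errors: both prediction sets below have the same total absolute error (8), but the set with a single large error gets a much higher MSE. The function name and data are illustrative assumptions.

```python
from typing import Sequence


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of the squared differences between predicted and actual values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)


actual = [10.0, 10.0, 10.0, 10.0]
even_errors = [12.0, 12.0, 12.0, 12.0]     # four errors of size 2
one_big_error = [10.0, 10.0, 10.0, 18.0]   # one error of size 8

mse_even = mean_squared_error(actual, even_errors)   # (4 + 4 + 4 + 4) / 4
mse_big = mean_squared_error(actual, one_big_error)  # (0 + 0 + 0 + 64) / 4
```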

Mean Absolute Error (MAE)
• Mean Absolute Error (MAE) is similar to the Mean Squared Error, but it
uses absolute values instead of squaring.
• This measure is not as popular as MSE, though its meaning is more
intuitive.
• Bias is the average of the differences between the predicted and actual
values.
• With this measure, positive errors cancel out negative ones.
• Bias is intended to assess how much higher or lower predictions are, on
average, than actual values.
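The difference between MAE and bias can be seen in a short sketch. The data are invented so that positive and negative errors cancel exactly: bias comes out as zero even though every prediction is off. Function names and values are illustrative assumptions.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of the absolute differences between predicted and actual values."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def bias(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average signed difference (predicted minus actual); errors can cancel."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)


actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]   # errors: +2, -2, +3, -3

mae_value = mean_absolute_error(actual, predicted)   # (2 + 2 + 3 + 3) / 4
bias_value = bias(actual, predicted)                 # (2 - 2 + 3 - 3) / 4
```

A bias near zero therefore says nothing about the size of individual errors; it only says predictions are not systematically high or low.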

Mean Absolute Percent Error (MAPE)
• Mean Absolute Percent Error (MAPE) is the average of the absolute
errors, as a percentage of the actual values.
• This is a relative measure of error, which is useful when larger errors
are more acceptable on larger actual values.
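A minimal sketch of MAPE, using invented values where the absolute errors differ by a factor of ten but the relative errors are the same. The function name and the choice to reject zero actual values (where the percentage is undefined) are assumptions for illustration.

```python
from typing import Sequence


def mean_absolute_percent_error(actual: Sequence[float],
                                predicted: Sequence[float]) -> float:
    """Average of |predicted - actual| / |actual|, expressed as a percentage."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs(p - a) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


actual = [100.0, 1000.0]
predicted = [110.0, 1100.0]   # absolute errors 10 and 100, both 10% of actual

mape_value = mean_absolute_percent_error(actual, predicted)
```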

Classification Accuracy: Estimating Error Rates
• Partition: Training-and-testing
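o Use two independent data sets, e.g., training set (2/3), test set (1/3)
o Used for data sets with a large number of samples
• Cross-validation
o Divide the data set into k subsamples
o Use k-1 subsamples as training data and one subsample as test data, rotating the test subsample so that each is used once (k-fold cross-validation)
o For data sets of moderate size
• Bootstrapping, leave-one-out
o Bootstrapping draws training samples with replacement; leave-one-out is k-fold cross-validation with k equal to the number of samples
o For data sets of small size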
• Confusion Matrix:
o This matrix shows how well the classifier predicts each of the different classes, not only its overall accuracy
o It relates the actual classes to the detected classes:
                     Detected
                     Positive             Negative
Actual  Positive     A: True Positive     B: False Negative
        Negative     C: False Positive    D: True Negative
• The recall (or true positive rate) and the precision (or positive predictive value) can
be derived from the confusion matrix as follows:
o Recall = A / (A + B)
o Precision = A / (A + C)
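The k-fold procedure and the confusion-matrix formulas from this slide can be combined in one sketch: each fold is held out once, a model is trained on the rest, and the A, B, C, D counts are accumulated over all held-out cases before recall and precision are computed. The threshold "learner", the data, and all function names are hypothetical stand-ins for a real classifier; the learner assumes every training split contains both classes.

```python
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split indices 0..n-1 into k folds of near-equal size (no shuffling)."""
    return [list(range(i, n, k)) for i in range(k)]


def train_threshold(xs: Sequence[float], ys: Sequence[bool]) -> float:
    """Hypothetical learner: midpoint between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y]
    neg = [x for x, y in zip(xs, ys) if not y]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def confusion_counts(actual: Sequence[bool],
                     detected: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Return (A, B, C, D) = (true pos, false neg, false pos, true neg)."""
    a = sum(1 for y, p in zip(actual, detected) if y and p)
    b = sum(1 for y, p in zip(actual, detected) if y and not p)
    c = sum(1 for y, p in zip(actual, detected) if not y and p)
    d = sum(1 for y, p in zip(actual, detected) if not y and not p)
    return a, b, c, d


def recall(a: int, b: int) -> float:
    return a / (a + b) if (a + b) else 0.0


def precision(a: int, c: int) -> float:
    return a / (a + c) if (a + c) else 0.0


def cross_validate(xs: Sequence[float], ys: Sequence[bool],
                   k: int) -> Tuple[int, int, int, int]:
    """Accumulate confusion counts over k held-out folds."""
    n = len(xs)
    totals = [0, 0, 0, 0]
    for test_idx in k_fold_indices(n, k):
        held_out = set(test_idx)
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        threshold = train_threshold(train_x, train_y)
        detected = [xs[i] >= threshold for i in test_idx]
        actual = [ys[i] for i in test_idx]
        for j, count in enumerate(confusion_counts(actual, detected)):
            totals[j] += count
    return totals[0], totals[1], totals[2], totals[3]


# Illustrative one-feature data set (values invented for this sketch).
XS = [0.9, 0.8, 0.2, 0.1, 0.7, 0.3, 0.85, 0.15, 0.6, 0.4]
YS = [True, True, False, False, True, False, True, False, True, False]

A, B, C, D = cross_validate(XS, YS, k=5)
recall_value = recall(A, B)
precision_value = precision(A, C)
```

Setting k equal to the number of samples turns the same loop into leave-one-out.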

Classifier Accuracy Measures
• Accuracy of a classifier M, acc(M):
• percentage of test set tuples that are correctly classified by the model M
• Error rate (misclassification rate) of M = 1 – acc(M)
• Given m classes, CM[i, j], an entry in a confusion matrix, indicates the number of tuples in class i that are
labeled by the classifier as class j
                      predicted class
actual class          buy_computer = yes   buy_computer = no   total    recognition (%)
buy_computer = yes    6954                 46                  7000     99.34
buy_computer = no     412                  2588                3000     86.27
total                 7366                 2634                10000    95.42

                      predicted C1         predicted C2
actual C1             True positive        False negative
actual C2             False positive       True negative
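As a check on the definitions of acc(M) and the error rate, the sketch below computes them from an m-class confusion matrix stored as a list of rows, where row i is the actual class and column j the assigned class. It uses the four buy_computer cell counts; the diagonal sums to 6954 + 2588 = 9542 out of 10000 tuples. Function names are illustrative assumptions.

```python
from typing import List, Sequence


def accuracy(cm: Sequence[Sequence[int]]) -> float:
    """acc(M): correctly classified tuples (the diagonal) over all tuples."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def error_rate(cm: Sequence[Sequence[int]]) -> float:
    """Misclassification rate, 1 - acc(M)."""
    return 1.0 - accuracy(cm)


def recognition_rates(cm: Sequence[Sequence[int]]) -> List[float]:
    """Per-class recognition: CM[i][i] divided by the number of tuples in class i."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]


# Rows: actual buy_computer = yes / no; columns: predicted yes / no.
cm = [[6954, 46],
      [412, 2588]]

acc = accuracy(cm)
err = error_rate(cm)
per_class = recognition_rates(cm)
```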

Classifier Accuracy Measures
• Alternative accuracy measures (e.g., for cancer diagnosis)
sensitivity = t-pos/pos /* true positive recognition rate */
specificity = t-neg/neg /* true negative recognition rate */
precision = t-pos/(t-pos + f-pos)
accuracy = sensitivity * pos/(pos + neg) + specificity * neg/(pos + neg)
• This model can also be used for cost-benefit analysis
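The alternative measures can be worked through on the buy_computer counts. In this sketch pos and neg are taken to be the numbers of actual positive and actual negative tuples (7000 and 3000), an interpretation consistent with the formulas above; the function name and the returned dictionary are illustrative assumptions. The weighted form of accuracy reduces to (t-pos + t-neg) / (pos + neg).

```python
from typing import Dict


def accuracy_measures(t_pos: int, f_neg: int, f_pos: int, t_neg: int) -> Dict[str, float]:
    """Sensitivity, specificity, precision and accuracy from the four counts."""
    pos = t_pos + f_neg   # all actual positives
    neg = f_pos + t_neg   # all actual negatives
    sensitivity = t_pos / pos
    specificity = t_neg / neg
    precision = t_pos / (t_pos + f_pos)
    accuracy = (sensitivity * pos / (pos + neg)
                + specificity * neg / (pos + neg))
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
    }


# Counts from the buy_computer confusion matrix.
measures = accuracy_measures(t_pos=6954, f_neg=46, f_pos=412, t_neg=2588)
```

Sensitivity and specificity match the per-class recognition rates (about 99.34% and 86.27%), while precision is 6954 / 7366.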

Any Questions?