CS8080_IRT_UNIT - III T10 ACCURACY AND ERROR.pdf
UNIT – III: CLASSIFICATION
Topic 10: ACCURACY AND ERROR
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SEMESTER – VIII
PROFESSIONAL ELECTIVE – IV
CS8080- INFORMATION RETRIEVAL TECHNIQUES
2. UNIT III : TEXT CLASSIFICATION AND CLUSTERING
1. A Characterization of Text Classification
2. Unsupervised Algorithms: Clustering
3. Naïve Text Classification
4. Supervised Algorithms
5. Decision Tree
6. k-NN Classifier
7. SVM Classifier
8. Feature Selection or Dimensionality Reduction
9. Evaluation metrics
10. Accuracy and Error
11. Organizing the classes
12. Indexing and Searching
13. Inverted Indexes
14. Sequential Searching
15. Multi-dimensional Indexing
3. ACCURACY AND ERROR
4. ACCURACY AND ERROR
• Accuracy
• Accuracy is the proportion of cases in which the predicted class
equals the actual class, usually expressed as a percentage.
• Its meaning is straightforward, but it may obscure important
differences in the costs associated with different errors.
• The classic example of such costs is the medical diagnostic situation,
in which one can err by either:
• 1. keeping a healthy patient in the hospital (low cost), or
• 2. sending home a sick patient (very high cost).
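The point about unequal costs can be made concrete with a small sketch. The counts and the two cost values below are hypothetical, chosen only so that both classifiers reach the same accuracy; the function names are illustrative as well.

```python
def accuracy(tp, fn, fp, tn):
    """Proportion of cases where the predicted class equals the actual class."""
    return (tp + tn) / (tp + fn + fp + tn)


def total_cost(fn, fp, cost_fn, cost_fp):
    """Total cost of the errors, weighting each error type by its own cost."""
    return fn * cost_fn + fp * cost_fp


# Assumed costs (hypothetical units): sending home a sick patient is taken
# to be 100 times as costly as keeping a healthy patient in the hospital.
COST_SEND_HOME_SICK = 100   # false negative
COST_KEEP_HEALTHY = 1       # false positive

# Two hypothetical classifiers evaluated on 10 sick and 90 healthy patients.
classifier_a = {"tp": 9, "fn": 1, "fp": 9, "tn": 81}
classifier_b = {"tp": 5, "fn": 5, "fp": 5, "tn": 85}

acc_a = accuracy(**classifier_a)
acc_b = accuracy(**classifier_b)
cost_a = total_cost(classifier_a["fn"], classifier_a["fp"],
                    COST_SEND_HOME_SICK, COST_KEEP_HEALTHY)
cost_b = total_cost(classifier_b["fn"], classifier_b["fp"],
                    COST_SEND_HOME_SICK, COST_KEEP_HEALTHY)

print(acc_a, acc_b)    # identical accuracy
print(cost_a, cost_b)  # very different error costs
```

Both classifiers are right 90% of the time, yet under these assumed costs classifier B is far more expensive because it sends home more sick patients.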
5. ACCURACY AND ERROR
• Classifiers that output class probabilities need to be checked for both
the accuracy of those probabilities (do cases predicted to have a 5%,
30% or 80% probability really belong to the target class 5%, 30% or
80% of the time?) and their ability to separate the classes in question.
• The accuracy of the probabilities can be measured using many of the same
metrics used to evaluate numerical models (MSE, MAE, etc.).
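One way to run the check described above is to group cases by predicted probability and compare each group's average prediction with the fraction that really belongs to the target class. The sketch below does that and also computes the MSE of the probabilities (commonly called the Brier score, a name not used in the slides). The data, the bin count, and the function names are assumptions for illustration.

```python
def brier_score(probs, outcomes):
    """MSE between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def calibration_table(probs, outcomes, n_bins=5):
    """Return (mean predicted probability, observed fraction, count) per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # Equal-width bins on [0, 1]; p == 1.0 goes into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    rows = []
    for members in bins:
        if members:  # skip empty bins
            mean_p = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            rows.append((mean_p, observed, len(members)))
    return rows


# Hypothetical predictions: five cases at 20% and five cases at 80%.
probs = [0.2] * 5 + [0.8] * 5
outcomes = [1, 0, 0, 0, 0, 1, 1, 1, 1, 0]  # 1 = belongs to the target class

rows = calibration_table(probs, outcomes)
score = brier_score(probs, outcomes)
for mean_p, observed, count in rows:
    print(f"predicted {mean_p:.2f}  observed {observed:.2f}  n={count}")
print(f"MSE of probabilities: {score:.2f}")
```

In this toy data the 20% group contains one target case in five and the 80% group contains four in five, so the probabilities are well calibrated even though individual predictions are still wrong.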
6. Accuracy and Error Measures
• All models must be assessed somehow.
• Despite the existence of a bewildering array of performance
measures, much commercial modeling software provides a
surprisingly limited range of options.
7. Mean Squared Error (MSE)
• Mean Squared Error (MSE) is by far the most common measure of
numerical model performance.
• It is simply the average of the squares of the differences between
the predicted and actual values.
• It is a reasonably good measure of performance, though it could be
argued that it overemphasizes the importance of larger errors.
• Many modeling procedures directly minimize it.
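A minimal sketch of the MSE calculation, using hypothetical predicted and actual values; the function name is illustrative. The example is chosen so that a single large error dominates the result, which is the overemphasis mentioned above.

```python
def mean_squared_error(predicted, actual):
    """Average of the squared differences between predicted and actual values."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical data: three small errors (1, 0, 0) and one large error (-4).
predicted = [2, 4, 6, 8]
actual = [1, 4, 6, 12]

mse = mean_squared_error(predicted, actual)
print(mse)  # (1 + 0 + 0 + 16) / 4
```

Here the one error of size 4 contributes 16 of the 17 squared units, so it accounts for almost all of the MSE.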
8. Mean Absolute Error (MAE)
• Mean Absolute Error (MAE) is similar to the Mean Squared Error, but it
uses absolute values instead of squaring.
• This measure is not as popular as MSE, though its meaning is more
intuitive.
• Bias
• Bias is the average of the differences between the predicted and actual
values.
• With this measure, positive errors cancel out negative ones.
• Bias is intended to assess how much higher or lower predictions are, on
average, than actual values.
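The difference between MAE and bias is easiest to see side by side. The sketch below uses hypothetical values and illustrative function names; errors are taken as predicted minus actual, so a negative bias means the predictions are too low on average.

```python
def mean_absolute_error(predicted, actual):
    """Average of the absolute differences between predicted and actual values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def bias(predicted, actual):
    """Average signed difference (predicted - actual); errors can cancel out."""
    return sum(p - a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical data with signed errors +1, 0, 0 and -4.
predicted = [2, 4, 6, 8]
actual = [1, 4, 6, 12]

mae = mean_absolute_error(predicted, actual)
b = bias(predicted, actual)
print(mae)  # (1 + 0 + 0 + 4) / 4
print(b)    # (1 + 0 + 0 - 4) / 4
```

The MAE reports the typical size of an error, while the bias is smaller in magnitude because the positive error partly cancels the negative one.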
9. Mean Absolute Percent Error (MAPE)
• Mean Absolute Percent Error (MAPE) is the average of the absolute
errors, as a percentage of the actual values.
• This is a relative measure of error, which is useful when larger errors
are more acceptable on larger actual values.
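A minimal sketch of MAPE on hypothetical values; the function name is illustrative. Because each error is divided by the actual value, the measure is undefined when an actual value is zero, and the sketch raises an error in that case (a design choice of this example, not something the slides specify).

```python
def mean_absolute_percent_error(predicted, actual):
    """Average absolute error expressed as a percentage of the actual values."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs(p - a) / abs(a) for p, a in zip(predicted, actual)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical data: absolute errors of 10, 20 and 5 units,
# each equal to 10% of the corresponding actual value.
actual = [100, 200, 50]
predicted = [110, 180, 55]

mape = mean_absolute_percent_error(predicted, actual)
print(f"{mape:.1f}%")
```

The absolute errors differ by a factor of four, yet each counts equally because each is 10% of its actual value.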
10. Classification Accuracy: Estimating Error Rates
• Partition: Training-and-testing
o Use two independent data sets, e.g., training set (2/3), test set (1/3)
o Used for data sets with a large number of samples
• Cross-validation
o Divide the data set into k subsamples
o Use k-1 subsamples as training data and one subsample as test data (k-fold cross-validation), rotating the test subsample
o For data sets of moderate size
• Bootstrapping and leave-one-out
o Bootstrapping resamples the data with replacement; leave-one-out is k-fold cross-validation with k equal to the number of samples
o For small data sets
• Confusion Matrix:
o This matrix shows not only how well the classifier predicts the different classes, but also which kinds of errors it makes
o It cross-tabulates the actual classes against the detected classes.
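The k-fold procedure listed above can be sketched as an index generator. This is a minimal illustration with a hypothetical function name; it assigns samples to folds in order without shuffling, whereas in practice the data is usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Fold i takes every k-th sample starting at i, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 10 samples, 5 folds: each round trains on 8 samples and tests on 2.
splits = list(k_fold_indices(10, 5))
for train, test in splits:
    print("train:", train, "test:", test)

# Setting k equal to the number of samples gives leave-one-out.
loo_splits = list(k_fold_indices(4, 4))
```

Every sample is used for testing exactly once, and the error rate is estimated by averaging the test results over the k rounds.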
11. Classification Accuracy: Estimating Error Rates
                      Detected
                      Positive             Negative
Actual   Positive     A: True positive     B: False negative
         Negative     C: False positive    D: True negative
• The recall (or true positive rate) and the precision (or positive predictive value) can
be derived from the confusion matrix as follows:
o Recall = A / (A + B)
o Precision = A / (A + C)
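The two formulas translate directly into code. The cell counts below are hypothetical, and the variable names a, b, c, d follow the cell labels of the confusion matrix above.

```python
def recall(a, b):
    """True positives divided by all actual positives: A / (A + B)."""
    return a / (a + b)


def precision(a, c):
    """True positives divided by all detected positives: A / (A + C)."""
    return a / (a + c)


# Hypothetical confusion-matrix cells.
a = 40  # A: true positives
b = 10  # B: false negatives
c = 20  # C: false positives
d = 30  # D: true negatives (used by neither formula)

r = recall(a, b)
p = precision(a, c)
print(f"recall = {r:.3f}, precision = {p:.3f}")
```

Neither measure uses cell D, which is why recall and precision are often preferred to accuracy when true negatives vastly outnumber the other cells. The sketch does not guard against a zero denominator, which occurs when there are no actual or no detected positives.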
12. Classifier Accuracy Measures
• Accuracy of a classifier M, acc(M):
o percentage of test set tuples that are correctly classified by the model M
• Error rate (misclassification rate) of M = 1 - acc(M)
• Given m classes, CM(i, j), an entry in a confusion matrix, indicates the number of tuples in
class i that are labeled by the classifier as class j
Rows are actual classes; columns are predicted classes.

classes               buy_computer = yes   buy_computer = no   total    recognition (%)
buy_computer = yes    6954                 46                  7000     99.34
buy_computer = no     412                  2588                3000     86.27
total                 7366                 2634                10000    95.42

                      C1                   C2
C1                    True positive        False negative
C2                    False positive       True negative
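For a confusion matrix with any number of classes, acc(M) is the sum of the diagonal entries divided by the total number of tuples. The sketch below applies this to the buy_computer counts from the table; the function names are illustrative.

```python
def accuracy(cm):
    """acc(M): correctly classified tuples (the diagonal) over all tuples."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def recognition_rates(cm):
    """Per-class recognition rate: CM(i, i) divided by the total of row i."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]


# cm[i][j] = number of tuples of actual class i labeled as class j.
# Class order: buy_computer = yes, buy_computer = no.
cm = [
    [6954, 46],
    [412, 2588],
]

acc = accuracy(cm)
error_rate = 1 - acc
rates = recognition_rates(cm)
print(f"acc(M) = {acc:.2%}, error rate = {error_rate:.2%}")
print([f"{r:.2%}" for r in rates])
```

The diagonal holds 6954 + 2588 = 9542 of the 10000 tuples, so the overall recognition rate computed this way is 95.42%.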
13. Classifier Accuracy Measures
• Alternative accuracy measures (e.g., for cancer diagnosis)
sensitivity = t-pos/pos /* true positive recognition rate */
specificity = t-neg/neg /* true negative recognition rate */
precision = t-pos/(t-pos + f-pos)
accuracy = sensitivity * pos/(pos + neg) + specificity * neg/(pos + neg)
• This model can also be used for cost-benefit analysis
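The alternative measures can be checked numerically with the buy_computer counts, treating buy_computer = yes as the positive class (an assumption of this sketch). The variable names mirror the t-pos, f-pos, pos and neg notation above.

```python
# Counts from the buy_computer confusion matrix, with "yes" as the positive class.
t_pos = 6954   # actual yes, predicted yes
f_neg = 46     # actual yes, predicted no
f_pos = 412    # actual no, predicted yes
t_neg = 2588   # actual no, predicted no

pos = t_pos + f_neg   # all actual positives
neg = f_pos + t_neg   # all actual negatives

sensitivity = t_pos / pos              # true positive recognition rate
specificity = t_neg / neg              # true negative recognition rate
precision = t_pos / (t_pos + f_pos)

# Accuracy as the class-weighted combination of sensitivity and specificity.
accuracy = (sensitivity * pos / (pos + neg)
            + specificity * neg / (pos + neg))

# The same quantity computed directly from the counts.
accuracy_direct = (t_pos + t_neg) / (pos + neg)

print(f"sensitivity = {sensitivity:.4f}")
print(f"specificity = {specificity:.4f}")
print(f"precision   = {precision:.4f}")
print(f"accuracy    = {accuracy:.4f}")
```

The weighted formula and the direct count agree, which shows that accuracy is simply sensitivity and specificity averaged in proportion to the class sizes.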
14. Any Questions?