SlideShare a Scribd company logo
1 of 14
Download to read offline
P1WU
UNIT – III: CLASSIFICATION
Topic 10: ACCURACY AND ERROR
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SEMESTER – VIII
PROFESSIONAL ELECTIVE – IV
CS8080- INFORMATION RETRIEVAL TECHNIQUES
UNIT III : TEXT CLASSIFICATION AND CLUSTERING
1.A Characterization of Text
Classification
2. Unsupervised Algorithms:
Clustering
3. Naïve Text Classification
4. Supervised Algorithms
5. Decision Tree
6. k-NN Classifier
7. SVM Classifier
8. Feature Selection or Dimensionality
Reduction
9. Evaluation metrics
10. Accuracy and Error
11. Organizing the classes
12. Indexing and Searching
13. Inverted Indexes
14. Sequential Searching
15. Multi-dimensional Indexing
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SEMESTER – VIII
PROFESSIONAL ELECTIVE – IV
CS8080- INFORMATION RETRIEVAL TECHNIQUES
ACCURACY AND ERROR
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SEMESTER – VIII
PROFESSIONAL ELECTIVE – IV
CS8080- INFORMATION RETRIEVAL TECHNIQUES
ACCURACY AND ERROR
• Accuracy
• Accuracy is the proportion of the time that the predicted class
equals the actual class, usually expressed as a percentage.
• It's meaning is straightforward, but may obscure important
differences in costs associated with different errors.
• The classic example of such costs is the medical diagnostic situation,
in which one can err be either:
• 1. keeping a healthy patient in the hospital (low cost), or
• 2. sending home a sick patient (very high cost).
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SEMESTER – VIII
PROFESSIONAL ELECTIVE – IV
CS8080- INFORMATION RETRIEVAL TECHNIQUES
ACCURACY AND ERROR
• These classifiers need to be checked for both the accuracy of their
probabilities (Do cases predicted to have a 5% (30%, 80%, etc.)
probability really belong to the target class 5% (30%, 80%, etc.) of
the time?) and their ability to separate the classes in question.
Accuracy can be measured using many of the same metrics used to
evaluate numerical models (MSE, MAE, etc.).
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SEMESTER – VIII
PROFESSIONAL ELECTIVE – IV
CS8080- INFORMATION RETRIEVAL TECHNIQUES
Accuracy and Error Measures
• All models must be assessed somehow.
• Despite the existence of a bewildering array of performance
measures, much commercial modeling software provides a
surprisingly limited range of options.
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SEMESTER – VIII
PROFESSIONAL ELECTIVE – IV
CS8080- INFORMATION RETRIEVAL TECHNIQUES
Mean Squared Error (MSE)
• Mean Squared Error (MSE) is by far the most common measure of
numerical model performance.
• It is simply the average of the squares of the differences between
the predicted and actual values.
• It is a reasonably good measure of performance, though it could be
argued that it overemphasizes the importance of larger errors.
• Many modeling procedures directly minimize it.
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SEMESTER – VIII
PROFESSIONAL ELECTIVE – IV
CS8080- INFORMATION RETRIEVAL TECHNIQUES
Mean Absolute Error (MAE)
• Mean Absolute Error (MAE) is similar to the Mean Squared Error, but it
uses absolute values instead of squaring.
• This measure is not as popular as MSE, though its meaning is more
intuitive .
•
• Bias is the average of the differences between the predicted and actual
values.
• With this measure, positive errors cancel out negative ones.
• Bias is intended to assess how much higher or lower predictions are, on
average, than actual values.
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SEMESTER – VIII
PROFESSIONAL ELECTIVE – IV
CS8080- INFORMATION RETRIEVAL TECHNIQUES
Mean Absolute Percent Error (MAPE)
• Mean Absolute Percent Error (MAPE) is the average of the absolute
errors, as a percentage of the actual values.
• This is a relative measure of error, which is useful when larger errors
are more acceptable on larger actual values.
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SEMESTER – VIII
PROFESSIONAL ELECTIVE – IV
CS8080- INFORMATION RETRIEVAL TECHNIQUES
Classification Accuracy: Estimating Error Rates
• Partition: Training-and-testing
o Use two independent data sets, e.g., training set (2/3), test set(1/3)
o Used for data set with large number of samples
• Cross-validation
o Divide the data set into k subsamples
o Use k-1 subsamples as training data and one sub-sample as test data—k-fold cross-validation
o For data set with moderate size
• Bootstrapping (leave-one-out)
o For small size data
• Confusion Matrix:
o This matrix shows not only how well the classifier predicts different classes
o It describes information about actual and detected classes:
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SEMESTER – VIII
PROFESSIONAL ELECTIVE – IV
CS8080- INFORMATION RETRIEVAL TECHNIQUES
Classification Accuracy: Estimating Error Rates
Detected
Positive Negative
Actual Positive A: True positive B: False Negative
Negative C: False Positive D: True Negative
• The recall (or the true positive rate) and the precision (or the positive predictive rate) can
be derived from the confusion matrix as follows:
• o Recall = A / A+B
• o Precision = A / A+ C
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SEMESTER – VIII
PROFESSIONAL ELECTIVE – IV
CS8080- INFORMATION RETRIEVAL TECHNIQUES
Classifier Accuracy Measures
• Accuracy of a classifier M, acc(M):
• percentage of test set tuples that are correctly classified by the model M
• Error rate (misclassification rate) of M = 1 – acc(M)
• Given m classes, CMi,j, an entry in a confusion matrix, indicates # of tuples in class i that are
labeled by the classifier as class j
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SEMESTER – VIII
PROFESSIONAL ELECTIVE – IV
CS8080- INFORMATION RETRIEVAL TECHNIQUES
classes buy_computer = yes buy_computer = no total recognition(%)
buy_computer = yes 6954 46 7000 99.34
buy_computer = no 412 2588 3000 86.27
total 7366 2634 10000 95.52
C1 C2
C1 True positive False negative
C2 False positive True negative
Classifier Accuracy Measures
• Alternative accuracy measures (e.g., for cancer diagnosis)
sensitivity = t-pos/pos /* true positive recognition rate */
specificity = t-neg/neg /* true negative recognition rate */
precision = t-pos/(t-pos + f-pos)
accuracy = sensitivity * pos/(pos + neg) + specificity * neg/(pos + neg)
• This model can also be used for cost-benefit analysis
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SEMESTER – VIII
PROFESSIONAL ELECTIVE – IV
CS8080- INFORMATION RETRIEVAL TECHNIQUES
classes buy_computer = yes buy_computer = no total recognition(%)
buy_computer = yes 6954 46 7000 99.34
buy_computer = no 412 2588 3000 86.27
total 7366 2634 10000 95.52
C1 C2
C1 True positive False negative
C2 False positive True negative
Any Questions?
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SEMESTER – VIII
PROFESSIONAL ELECTIVE – IV
CS8080- INFORMATION RETRIEVAL TECHNIQUES

More Related Content

Similar to CS8080_IRT_UNIT - III T10 ACCURACY AND ERROR.pdf

Need of Quality Engineering and Failure analysis Techniques
Need of Quality Engineering and Failure analysis Techniques  Need of Quality Engineering and Failure analysis Techniques
Need of Quality Engineering and Failure analysis Techniques
Greeshma S
 

Similar to CS8080_IRT_UNIT - III T10 ACCURACY AND ERROR.pdf (20)

CS8080_IRT_UNIT - III T13 INVERTED INDEXES.pdf
CS8080_IRT_UNIT - III T13 INVERTED  INDEXES.pdfCS8080_IRT_UNIT - III T13 INVERTED  INDEXES.pdf
CS8080_IRT_UNIT - III T13 INVERTED INDEXES.pdf
 
CS8080_IRT_UNIT - III T5 DECISION TREES.pdf
CS8080_IRT_UNIT - III T5  DECISION TREES.pdfCS8080_IRT_UNIT - III T5  DECISION TREES.pdf
CS8080_IRT_UNIT - III T5 DECISION TREES.pdf
 
CS8080_IRT_UNIT - III T6 K-NN CLASSIFIER.pdf
CS8080_IRT_UNIT - III T6 K-NN CLASSIFIER.pdfCS8080_IRT_UNIT - III T6 K-NN CLASSIFIER.pdf
CS8080_IRT_UNIT - III T6 K-NN CLASSIFIER.pdf
 
CS8080_IRT_UNIT - III T14 SEQUENTIAL SEARCHING.pdf
CS8080_IRT_UNIT - III T14 SEQUENTIAL SEARCHING.pdfCS8080_IRT_UNIT - III T14 SEQUENTIAL SEARCHING.pdf
CS8080_IRT_UNIT - III T14 SEQUENTIAL SEARCHING.pdf
 
CS8080_IRT_UNIT - III T15 MULTI-DIMENSIONAL INDEXING.pdf
CS8080_IRT_UNIT - III T15 MULTI-DIMENSIONAL INDEXING.pdfCS8080_IRT_UNIT - III T15 MULTI-DIMENSIONAL INDEXING.pdf
CS8080_IRT_UNIT - III T15 MULTI-DIMENSIONAL INDEXING.pdf
 
20220914-MBT-Experiences-SB1-final.pptx
20220914-MBT-Experiences-SB1-final.pptx20220914-MBT-Experiences-SB1-final.pptx
20220914-MBT-Experiences-SB1-final.pptx
 
machine-learning-development-audit-framework-assessment-and-inspection-of-ris...
machine-learning-development-audit-framework-assessment-and-inspection-of-ris...machine-learning-development-audit-framework-assessment-and-inspection-of-ris...
machine-learning-development-audit-framework-assessment-and-inspection-of-ris...
 
M.tech. mechanical engineering 2016 17
M.tech. mechanical engineering 2016 17M.tech. mechanical engineering 2016 17
M.tech. mechanical engineering 2016 17
 
Machine Learning Aided Breast Cancer Classification
Machine Learning Aided Breast Cancer ClassificationMachine Learning Aided Breast Cancer Classification
Machine Learning Aided Breast Cancer Classification
 
CS8080_IRT_UNIT - III T2 UNSUPERVISED ALGORITHMS -CLUSTERING.pdf
CS8080_IRT_UNIT - III T2 UNSUPERVISED ALGORITHMS -CLUSTERING.pdfCS8080_IRT_UNIT - III T2 UNSUPERVISED ALGORITHMS -CLUSTERING.pdf
CS8080_IRT_UNIT - III T2 UNSUPERVISED ALGORITHMS -CLUSTERING.pdf
 
AI Class Topic 2: Step-by-step Process for AI development
AI Class Topic 2: Step-by-step Process for AI developmentAI Class Topic 2: Step-by-step Process for AI development
AI Class Topic 2: Step-by-step Process for AI development
 
CHAPTER 4 SQC.pptx
CHAPTER 4 SQC.pptxCHAPTER 4 SQC.pptx
CHAPTER 4 SQC.pptx
 
Ds for finance day 3
Ds for finance day 3Ds for finance day 3
Ds for finance day 3
 
IRJET - Breast Cancer Risk and Diagnostics using Artificial Neural Network(ANN)
IRJET - Breast Cancer Risk and Diagnostics using Artificial Neural Network(ANN)IRJET - Breast Cancer Risk and Diagnostics using Artificial Neural Network(ANN)
IRJET - Breast Cancer Risk and Diagnostics using Artificial Neural Network(ANN)
 
Configuration Navigation Analysis Model for Regression Test Case Prioritization
Configuration Navigation Analysis Model for Regression Test Case PrioritizationConfiguration Navigation Analysis Model for Regression Test Case Prioritization
Configuration Navigation Analysis Model for Regression Test Case Prioritization
 
Post Graduate Admission Prediction System
Post Graduate Admission Prediction SystemPost Graduate Admission Prediction System
Post Graduate Admission Prediction System
 
Testing Machine Learning-enabled Systems: A Personal Perspective
Testing Machine Learning-enabled Systems: A Personal PerspectiveTesting Machine Learning-enabled Systems: A Personal Perspective
Testing Machine Learning-enabled Systems: A Personal Perspective
 
Me8th
Me8thMe8th
Me8th
 
Me8th
Me8thMe8th
Me8th
 
Need of Quality Engineering and Failure analysis Techniques
Need of Quality Engineering and Failure analysis Techniques  Need of Quality Engineering and Failure analysis Techniques
Need of Quality Engineering and Failure analysis Techniques
 

More from AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING

More from AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING (16)

JAVA PROGRAM CONSTRUCTS OR LANGUAGE BASICS.pptx
JAVA PROGRAM CONSTRUCTS OR LANGUAGE BASICS.pptxJAVA PROGRAM CONSTRUCTS OR LANGUAGE BASICS.pptx
JAVA PROGRAM CONSTRUCTS OR LANGUAGE BASICS.pptx
 
INTRO TO PROGRAMMING.ppt
INTRO TO PROGRAMMING.pptINTRO TO PROGRAMMING.ppt
INTRO TO PROGRAMMING.ppt
 
CS3391 OOP UT-I T4 JAVA BUZZWORDS.pptx
CS3391 OOP UT-I T4 JAVA BUZZWORDS.pptxCS3391 OOP UT-I T4 JAVA BUZZWORDS.pptx
CS3391 OOP UT-I T4 JAVA BUZZWORDS.pptx
 
CS3391 OOP UT-I T1 OVERVIEW OF OOP
CS3391 OOP UT-I T1 OVERVIEW OF OOPCS3391 OOP UT-I T1 OVERVIEW OF OOP
CS3391 OOP UT-I T1 OVERVIEW OF OOP
 
CS3391 OOP UT-I T3 FEATURES OF OBJECT ORIENTED PROGRAMMING
CS3391 OOP UT-I T3 FEATURES OF OBJECT ORIENTED PROGRAMMINGCS3391 OOP UT-I T3 FEATURES OF OBJECT ORIENTED PROGRAMMING
CS3391 OOP UT-I T3 FEATURES OF OBJECT ORIENTED PROGRAMMING
 
CS3391 OOP UT-I T2 OBJECT ORIENTED PROGRAMMING PARADIGM.pptx
CS3391 OOP UT-I T2 OBJECT ORIENTED PROGRAMMING PARADIGM.pptxCS3391 OOP UT-I T2 OBJECT ORIENTED PROGRAMMING PARADIGM.pptx
CS3391 OOP UT-I T2 OBJECT ORIENTED PROGRAMMING PARADIGM.pptx
 
CS3391 -OOP -UNIT – V NOTES FINAL.pdf
CS3391 -OOP -UNIT – V NOTES FINAL.pdfCS3391 -OOP -UNIT – V NOTES FINAL.pdf
CS3391 -OOP -UNIT – V NOTES FINAL.pdf
 
CS3391 -OOP -UNIT – IV NOTES FINAL.pdf
CS3391 -OOP -UNIT – IV NOTES FINAL.pdfCS3391 -OOP -UNIT – IV NOTES FINAL.pdf
CS3391 -OOP -UNIT – IV NOTES FINAL.pdf
 
CS3391 -OOP -UNIT – III NOTES FINAL.pdf
CS3391 -OOP -UNIT – III  NOTES FINAL.pdfCS3391 -OOP -UNIT – III  NOTES FINAL.pdf
CS3391 -OOP -UNIT – III NOTES FINAL.pdf
 
CS3391 -OOP -UNIT – II NOTES FINAL.pdf
CS3391 -OOP -UNIT – II  NOTES FINAL.pdfCS3391 -OOP -UNIT – II  NOTES FINAL.pdf
CS3391 -OOP -UNIT – II NOTES FINAL.pdf
 
CS3391 -OOP -UNIT – I NOTES FINAL.pdf
CS3391 -OOP -UNIT – I  NOTES FINAL.pdfCS3391 -OOP -UNIT – I  NOTES FINAL.pdf
CS3391 -OOP -UNIT – I NOTES FINAL.pdf
 
CS3251-_PIC
CS3251-_PICCS3251-_PIC
CS3251-_PIC
 
CS8080 IRT UNIT I NOTES.pdf
CS8080 IRT UNIT I  NOTES.pdfCS8080 IRT UNIT I  NOTES.pdf
CS8080 IRT UNIT I NOTES.pdf
 
CS8080_IRT_UNIT - III T11 ORGANIZING THE CLASSES.pdf
CS8080_IRT_UNIT - III T11 ORGANIZING THE CLASSES.pdfCS8080_IRT_UNIT - III T11 ORGANIZING THE CLASSES.pdf
CS8080_IRT_UNIT - III T11 ORGANIZING THE CLASSES.pdf
 
CS8080_IRT_UNIT - III T12 INDEXING AND SEARCHING.pdf
CS8080_IRT_UNIT - III T12 INDEXING AND SEARCHING.pdfCS8080_IRT_UNIT - III T12 INDEXING AND SEARCHING.pdf
CS8080_IRT_UNIT - III T12 INDEXING AND SEARCHING.pdf
 
CS8080_IRT_UNIT - III T11 ORGANIZING THE CLASSES.pdf
CS8080_IRT_UNIT - III T11 ORGANIZING THE CLASSES.pdfCS8080_IRT_UNIT - III T11 ORGANIZING THE CLASSES.pdf
CS8080_IRT_UNIT - III T11 ORGANIZING THE CLASSES.pdf
 

Recently uploaded

Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 

Recently uploaded (20)

Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
Vivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design SpainVivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design Spain
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 

CS8080_IRT_UNIT - III T10 ACCURACY AND ERROR.pdf

  • 1. P1WU UNIT – III: CLASSIFICATION Topic 10: ACCURACY AND ERROR AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING SEMESTER – VIII PROFESSIONAL ELECTIVE – IV CS8080- INFORMATION RETRIEVAL TECHNIQUES
  • 2. UNIT III : TEXT CLASSIFICATION AND CLUSTERING 1.A Characterization of Text Classification 2. Unsupervised Algorithms: Clustering 3. Naïve Text Classification 4. Supervised Algorithms 5. Decision Tree 6. k-NN Classifier 7. SVM Classifier 8. Feature Selection or Dimensionality Reduction 9. Evaluation metrics 10. Accuracy and Error 11. Organizing the classes 12. Indexing and Searching 13. Inverted Indexes 14. Sequential Searching 15. Multi-dimensional Indexing AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING SEMESTER – VIII PROFESSIONAL ELECTIVE – IV CS8080- INFORMATION RETRIEVAL TECHNIQUES
  • 3. ACCURACY AND ERROR AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING SEMESTER – VIII PROFESSIONAL ELECTIVE – IV CS8080- INFORMATION RETRIEVAL TECHNIQUES
  • 4. ACCURACY AND ERROR • Accuracy • Accuracy is the proportion of the time that the predicted class equals the actual class, usually expressed as a percentage. • It's meaning is straightforward, but may obscure important differences in costs associated with different errors. • The classic example of such costs is the medical diagnostic situation, in which one can err be either: • 1. keeping a healthy patient in the hospital (low cost), or • 2. sending home a sick patient (very high cost). AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING SEMESTER – VIII PROFESSIONAL ELECTIVE – IV CS8080- INFORMATION RETRIEVAL TECHNIQUES
  • 5. ACCURACY AND ERROR • These classifiers need to be checked for both the accuracy of their probabilities (Do cases predicted to have a 5% (30%, 80%, etc.) probability really belong to the target class 5% (30%, 80%, etc.) of the time?) and their ability to separate the classes in question. Accuracy can be measured using many of the same metrics used to evaluate numerical models (MSE, MAE, etc.). AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING SEMESTER – VIII PROFESSIONAL ELECTIVE – IV CS8080- INFORMATION RETRIEVAL TECHNIQUES
  • 6. Accuracy and Error Measures • All models must be assessed somehow. • Despite the existence of a bewildering array of performance measures, much commercial modeling software provides a surprisingly limited range of options. AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING SEMESTER – VIII PROFESSIONAL ELECTIVE – IV CS8080- INFORMATION RETRIEVAL TECHNIQUES
  • 7. Mean Squared Error (MSE) • Mean Squared Error (MSE) is by far the most common measure of numerical model performance. • It is simply the average of the squares of the differences between the predicted and actual values. • It is a reasonably good measure of performance, though it could be argued that it overemphasizes the importance of larger errors. • Many modeling procedures directly minimize it. AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING SEMESTER – VIII PROFESSIONAL ELECTIVE – IV CS8080- INFORMATION RETRIEVAL TECHNIQUES
  • 8. Mean Absolute Error (MAE) • Mean Absolute Error (MAE) is similar to the Mean Squared Error, but it uses absolute values instead of squaring. • This measure is not as popular as MSE, though its meaning is more intuitive . • • Bias is the average of the differences between the predicted and actual values. • With this measure, positive errors cancel out negative ones. • Bias is intended to assess how much higher or lower predictions are, on average, than actual values. AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING SEMESTER – VIII PROFESSIONAL ELECTIVE – IV CS8080- INFORMATION RETRIEVAL TECHNIQUES
  • 9. Mean Absolute Percent Error (MAPE) • Mean Absolute Percent Error (MAPE) is the average of the absolute errors, as a percentage of the actual values. • This is a relative measure of error, which is useful when larger errors are more acceptable on larger actual values. AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING SEMESTER – VIII PROFESSIONAL ELECTIVE – IV CS8080- INFORMATION RETRIEVAL TECHNIQUES
  • 10. Classification Accuracy: Estimating Error Rates • Partition: Training-and-testing o Use two independent data sets, e.g., training set (2/3), test set(1/3) o Used for data set with large number of samples • Cross-validation o Divide the data set into k subsamples o Use k-1 subsamples as training data and one sub-sample as test data—k-fold cross-validation o For data set with moderate size • Bootstrapping (leave-one-out) o For small size data • Confusion Matrix: o This matrix shows not only how well the classifier predicts different classes o It describes information about actual and detected classes: AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING SEMESTER – VIII PROFESSIONAL ELECTIVE – IV CS8080- INFORMATION RETRIEVAL TECHNIQUES
  • 11. Classification Accuracy: Estimating Error Rates Detected Positive Negative Actual Positive A: True positive B: False Negative Negative C: False Positive D: True Negative • The recall (or the true positive rate) and the precision (or the positive predictive rate) can be derived from the confusion matrix as follows: • o Recall = A / A+B • o Precision = A / A+ C AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING SEMESTER – VIII PROFESSIONAL ELECTIVE – IV CS8080- INFORMATION RETRIEVAL TECHNIQUES
  • 12. Classifier Accuracy Measures • Accuracy of a classifier M, acc(M): • percentage of test set tuples that are correctly classified by the model M • Error rate (misclassification rate) of M = 1 – acc(M) • Given m classes, CMi,j, an entry in a confusion matrix, indicates # of tuples in class i that are labeled by the classifier as class j AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING SEMESTER – VIII PROFESSIONAL ELECTIVE – IV CS8080- INFORMATION RETRIEVAL TECHNIQUES classes buy_computer = yes buy_computer = no total recognition(%) buy_computer = yes 6954 46 7000 99.34 buy_computer = no 412 2588 3000 86.27 total 7366 2634 10000 95.52 C1 C2 C1 True positive False negative C2 False positive True negative
  • 13. Classifier Accuracy Measures • Alternative accuracy measures (e.g., for cancer diagnosis) sensitivity = t-pos/pos /* true positive recognition rate */ specificity = t-neg/neg /* true negative recognition rate */ precision = t-pos/(t-pos + f-pos) accuracy = sensitivity * pos/(pos + neg) + specificity * neg/(pos + neg) • This model can also be used for cost-benefit analysis AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING SEMESTER – VIII PROFESSIONAL ELECTIVE – IV CS8080- INFORMATION RETRIEVAL TECHNIQUES classes buy_computer = yes buy_computer = no total recognition(%) buy_computer = yes 6954 46 7000 99.34 buy_computer = no 412 2588 3000 86.27 total 7366 2634 10000 95.52 C1 C2 C1 True positive False negative C2 False positive True negative
  • 14. Any Questions? AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING SEMESTER – VIII PROFESSIONAL ELECTIVE – IV CS8080- INFORMATION RETRIEVAL TECHNIQUES