SlideShare a Scribd company logo
1 of 93
Classification
CSC504.3: Identify appropriate data mining algorithms to
solve real world problems.
CONTENT
● Basic Concepts
● Decision Tree Induction
● Naïve Bayesian Classification
● Accuracy and Error measures
● Evaluating the Accuracy of a Classifier:
○ Holdout & Random Subsampling
○ Cross Validation
○ Bootstrap
Regression Analyses
• Regression: technique concerned with predicting
some variables by knowing others
• The process of predicting variable Y using
variable X
Regression
⮚ Uses a variable (x) to predict some outcome
variable (y)
⮚ Tells you how values in y change as a function
of changes in values of x
Correlation and Regression
⮚ Correlation describes the strength of a linear
relationship between two variables
⮚ Linear means “straight line”
⮚ Regression tells us how to draw the straight line
described by the correlation
Regression
⮚ Calculates the “best-fit” line for a certain set of data
The regression line makes the sum of the squares of
the residuals smaller than for any other line
Regression minimizes residuals
By using the least squares method (a procedure
that minimizes the vertical deviations of plotted
points surrounding a straight line) we are
able to construct a best fitting straight line to the
scatter diagram points and then formulate a
regression equation in the form of:
b
Regression Equation
⮚ Regression equation
describes the
regression line
mathematically
■ Intercept
■ Slope
Linear Equations
Hours studying and grades
Regressing grades on hours
Predicted final grade in class =
59.95 + 3.17*(number of hours you study per week)
Predict the final grade of…
■ Someone who studies for 12 hours
■ Final grade = 59.95 + (3.17*12)
■ Final grade = 97.99
■ Someone who studies for 1 hour:
■ Final grade = 59.95 + (3.17*1)
Predicted final grade in class = 59.95 + 3.17*(hours of study)
Exercise
A sample of 6 persons was selected the
value of their age ( x variable) and their
weight is demonstrated in the following
table. Find the regression equation and
what is the predicted weight when age is
8.5 years.
Serial no. Age (x) Weight (y)
1
2
3
4
5
6
7
6
8
5
6
9
12
8
12
10
11
13
Answer
Serial no. Age (x) Weight (y) xy X2 Y2
1
2
3
4
5
6
7
6
8
5
6
9
12
8
12
10
11
13
84
48
96
50
66
117
49
36
64
25
36
81
144
64
144
100
121
169
Total 41 66 461 291 742
Regression equation
we create a regression line by plotting two
estimated values for y against their X component,
then extending the line right and left.
EVALUATION OF CLASSIFIERS
● A few tools have been designed to evaluate the performance of a classifier.
● A confusion matrix is a specific table layout that allows visualization of the performance of a
classifier.
● Table shows the confusion matrix for a two-class classifier.
● True positives (TP) are the number of positive instances the classifier correctly identified as
positive.
● False positives (FP) are the number of instances in which the classifier identified as positive
but in reality are negative.
● True negatives (TN) are the number of negative instances the classifier correctly identified
as negative.
● False negatives (FN) are the number of instances classified as negative but in reality are
positive.
● In a two-class classification, a preset threshold may be used to separate positives from
negatives.
● TP and TN are the correct guesses. A good classifier should have large TP and TN and
small (ideally zero) numbers for FP and FN.
The accuracy {or the overall success rate) is a metric defining the rate at which a model has classified
the records correctly. It is defined as the sum of TP and TN divided by the total number of instances, as
shown in Equation
A good model should have a high accuracy score, but having a high accuracy score alone does not
guarantee the model is well established. The following measures can be introduced to better evaluate the
performance of a classifier.
The true positive rate (TPR) shows what percent of positive instances the
classifier
correctly identified. It's also illustrated in Equation
The false negative rate (FNR) shows what percent of positives the classifier
marked as negatives.
It is also known as the miss rate or type II error rate and is shown in Equation.
Note that the sum of TPR and FNR is 1.
A well-performed model should have a high TPR that is ideally 1 and a low FPR
and FNR that are ideally 0.
In reality, it's rare to have TPR = 1, FPR = 0, and FNR = 0, but these measures
are useful to compare the performance of multiple models that are designed for
solving the same problem.
Precision and recall are accuracy metrics used by the information retrieval
community, but they can be used to characterize classifiers in general.
Precision is the percentage of instances marked positive that really are positive,
as shown in Equation
Recall is the percentage of positive instances that were correctly identified. Recall
is equivalent to the TPR.
ROC CURVE
the ROC curve, which is a common tool to evaluate classifiers.
The abbreviation stands for receiver operating characteristic, a term used in signal
detection to characterize the trade-off between hit rate and false-alarm rate over a
noisy channel.
A ROC curve evaluates the performance of a classifier based on the TP and FP,
regardless of other factors such as class distribution and error costs.
The vertical axis is the True Positive Rate (TPR), and the horizontal axis is the
False Positive Rate (FPR).
AREA UNDER CURVE
● Related to the ROC curve is the area under the curve (AUC).
● The AUC is calculated by measuring the area under the
ROC curve.
● Higher AUC scores mean the classifier performs better.
● The score can range from 0.5 (for the diagonal line
TPR=FPR) to 1.0 (with ROC passing through the top-left
corner).
Holdout Method and Random Subsampling
The holdout method is what we have alluded to so far in our discussions about
accuracy. In this method, the given data are randomly partitioned into two independent
sets, a training set and a test set. Typically, two-thirds of the data are allocated to the
training set, and the remaining one-third is allocated to the test set. The training set is
used to derive the model. The model’s accuracy is then estimated with the test set.The
estimate is pessimistic because only a portion of the initial data is used to derive the
model.
Random subsampling is a variation of the holdout method in which the holdout
method is repeated k times. The overall accuracy estimate is taken as the average of
the
accuracies obtained from each iteration.

More Related Content

What's hot

Classification and prediction in data mining
Classification and prediction in data miningClassification and prediction in data mining
Classification and prediction in data miningEr. Nawaraj Bhandari
 
K-Means, its Variants and its Applications
K-Means, its Variants and its ApplicationsK-Means, its Variants and its Applications
K-Means, its Variants and its ApplicationsVarad Meru
 
data mining
data miningdata mining
data mininguoitc
 
Data Integration and Transformation in Data mining
Data Integration and Transformation in Data miningData Integration and Transformation in Data mining
Data Integration and Transformation in Data miningkavitha muneeshwaran
 
Decision tree, softmax regression and ensemble methods in machine learning
Decision tree, softmax regression and ensemble methods in machine learningDecision tree, softmax regression and ensemble methods in machine learning
Decision tree, softmax regression and ensemble methods in machine learningAbhishek Vijayvargia
 
Decision tree lecture 3
Decision tree lecture 3Decision tree lecture 3
Decision tree lecture 3Laila Fatehy
 
Logistic regression in Machine Learning
Logistic regression in Machine LearningLogistic regression in Machine Learning
Logistic regression in Machine LearningKuppusamy P
 
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsDatamining Tools
 
Data mining Basics and complete description
Data mining Basics and complete description Data mining Basics and complete description
Data mining Basics and complete description Sulman Ahmed
 
similarity measure
similarity measure similarity measure
similarity measure ZHAO Sam
 
Association rule mining and Apriori algorithm
Association rule mining and Apriori algorithmAssociation rule mining and Apriori algorithm
Association rule mining and Apriori algorithmhina firdaus
 
Support Vector Machines- SVM
Support Vector Machines- SVMSupport Vector Machines- SVM
Support Vector Machines- SVMCarlo Carandang
 

What's hot (20)

Classification and prediction in data mining
Classification and prediction in data miningClassification and prediction in data mining
Classification and prediction in data mining
 
K-Means, its Variants and its Applications
K-Means, its Variants and its ApplicationsK-Means, its Variants and its Applications
K-Means, its Variants and its Applications
 
data mining
data miningdata mining
data mining
 
Data Integration and Transformation in Data mining
Data Integration and Transformation in Data miningData Integration and Transformation in Data mining
Data Integration and Transformation in Data mining
 
Decision tree, softmax regression and ensemble methods in machine learning
Decision tree, softmax regression and ensemble methods in machine learningDecision tree, softmax regression and ensemble methods in machine learning
Decision tree, softmax regression and ensemble methods in machine learning
 
Decision tree lecture 3
Decision tree lecture 3Decision tree lecture 3
Decision tree lecture 3
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 
data mining
data miningdata mining
data mining
 
Logistic regression in Machine Learning
Logistic regression in Machine LearningLogistic regression in Machine Learning
Logistic regression in Machine Learning
 
Feature scaling
Feature scalingFeature scaling
Feature scaling
 
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlations
 
Data mining Basics and complete description
Data mining Basics and complete description Data mining Basics and complete description
Data mining Basics and complete description
 
Decision tree
Decision treeDecision tree
Decision tree
 
Support vector machine-SVM's
Support vector machine-SVM'sSupport vector machine-SVM's
Support vector machine-SVM's
 
Pca ppt
Pca pptPca ppt
Pca ppt
 
similarity measure
similarity measure similarity measure
similarity measure
 
Association rule mining and Apriori algorithm
Association rule mining and Apriori algorithmAssociation rule mining and Apriori algorithm
Association rule mining and Apriori algorithm
 
Support Vector Machines- SVM
Support Vector Machines- SVMSupport Vector Machines- SVM
Support Vector Machines- SVM
 
Decision tree
Decision treeDecision tree
Decision tree
 
Clustering
ClusteringClustering
Clustering
 

Similar to Data Mining Algorithms and Classification Accuracy Evaluation

Performance of the classification algorithm
Performance of the classification algorithmPerformance of the classification algorithm
Performance of the classification algorithmHoopeer Hoopeer
 
regression-linearandlogisitics-220524024037-4221a176 (1).pdf
regression-linearandlogisitics-220524024037-4221a176 (1).pdfregression-linearandlogisitics-220524024037-4221a176 (1).pdf
regression-linearandlogisitics-220524024037-4221a176 (1).pdflisow86669
 
Simple lin regress_inference
Simple lin regress_inferenceSimple lin regress_inference
Simple lin regress_inferenceKemal İnciroğlu
 
Classification assessment methods
Classification assessment methodsClassification assessment methods
Classification assessment methodsAlaa Tharwat
 
Parametric & non parametric
Parametric & non parametricParametric & non parametric
Parametric & non parametricANCYBS
 
ML-ChapterFour-ModelEvaluation.pptx
ML-ChapterFour-ModelEvaluation.pptxML-ChapterFour-ModelEvaluation.pptx
ML-ChapterFour-ModelEvaluation.pptxbelay41
 
2014 IIAG Imputation Assessments
2014 IIAG Imputation Assessments2014 IIAG Imputation Assessments
2014 IIAG Imputation AssessmentsDr Lendy Spires
 
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?Smarten Augmented Analytics
 
An introduction to ROC analysis
An introduction to ROC analysisAn introduction to ROC analysis
An introduction to ROC analysisDr. Volkan OBAN
 
Parameter Optimisation for Automated Feature Point Detection
Parameter Optimisation for Automated Feature Point DetectionParameter Optimisation for Automated Feature Point Detection
Parameter Optimisation for Automated Feature Point DetectionDario Panada
 
Unit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptxUnit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptxAnusuya123
 
Machine learning session4(linear regression)
Machine learning   session4(linear regression)Machine learning   session4(linear regression)
Machine learning session4(linear regression)Abhimanyu Dwivedi
 
Classification Assessment Methods.pptx
Classification Assessment  Methods.pptxClassification Assessment  Methods.pptx
Classification Assessment Methods.pptxRiadh Al-Haidari
 
Practice test ch 10 correlation reg ch 11 gof ch12 anova
Practice test ch 10 correlation reg ch 11 gof ch12 anovaPractice test ch 10 correlation reg ch 11 gof ch12 anova
Practice test ch 10 correlation reg ch 11 gof ch12 anovaLong Beach City College
 
Advanced Econometrics L5-6.pptx
Advanced Econometrics L5-6.pptxAdvanced Econometrics L5-6.pptx
Advanced Econometrics L5-6.pptxakashayosha
 
FSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docx
FSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docxFSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docx
FSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docxbudbarber38650
 

Similar to Data Mining Algorithms and Classification Accuracy Evaluation (20)

Performance of the classification algorithm
Performance of the classification algorithmPerformance of the classification algorithm
Performance of the classification algorithm
 
regression-linearandlogisitics-220524024037-4221a176 (1).pdf
regression-linearandlogisitics-220524024037-4221a176 (1).pdfregression-linearandlogisitics-220524024037-4221a176 (1).pdf
regression-linearandlogisitics-220524024037-4221a176 (1).pdf
 
Linear and Logistics Regression
Linear and Logistics RegressionLinear and Logistics Regression
Linear and Logistics Regression
 
working with python
working with pythonworking with python
working with python
 
Simple lin regress_inference
Simple lin regress_inferenceSimple lin regress_inference
Simple lin regress_inference
 
Classification assessment methods
Classification assessment methodsClassification assessment methods
Classification assessment methods
 
Parametric & non parametric
Parametric & non parametricParametric & non parametric
Parametric & non parametric
 
Building the Professional of 2020: An Approach to Business Change Process Int...
Building the Professional of 2020: An Approach to Business Change Process Int...Building the Professional of 2020: An Approach to Business Change Process Int...
Building the Professional of 2020: An Approach to Business Change Process Int...
 
X bar and-r_charts
X bar and-r_chartsX bar and-r_charts
X bar and-r_charts
 
ML-ChapterFour-ModelEvaluation.pptx
ML-ChapterFour-ModelEvaluation.pptxML-ChapterFour-ModelEvaluation.pptx
ML-ChapterFour-ModelEvaluation.pptx
 
2014 IIAG Imputation Assessments
2014 IIAG Imputation Assessments2014 IIAG Imputation Assessments
2014 IIAG Imputation Assessments
 
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?
 
An introduction to ROC analysis
An introduction to ROC analysisAn introduction to ROC analysis
An introduction to ROC analysis
 
Parameter Optimisation for Automated Feature Point Detection
Parameter Optimisation for Automated Feature Point DetectionParameter Optimisation for Automated Feature Point Detection
Parameter Optimisation for Automated Feature Point Detection
 
Unit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptxUnit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptx
 
Machine learning session4(linear regression)
Machine learning   session4(linear regression)Machine learning   session4(linear regression)
Machine learning session4(linear regression)
 
Classification Assessment Methods.pptx
Classification Assessment  Methods.pptxClassification Assessment  Methods.pptx
Classification Assessment Methods.pptx
 
Practice test ch 10 correlation reg ch 11 gof ch12 anova
Practice test ch 10 correlation reg ch 11 gof ch12 anovaPractice test ch 10 correlation reg ch 11 gof ch12 anova
Practice test ch 10 correlation reg ch 11 gof ch12 anova
 
Advanced Econometrics L5-6.pptx
Advanced Econometrics L5-6.pptxAdvanced Econometrics L5-6.pptx
Advanced Econometrics L5-6.pptx
 
FSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docx
FSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docxFSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docx
FSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docx
 

More from nikshaikh786

Module 2_ Divide and Conquer Approach.pptx
Module 2_ Divide and Conquer Approach.pptxModule 2_ Divide and Conquer Approach.pptx
Module 2_ Divide and Conquer Approach.pptxnikshaikh786
 
Module 1_ Introduction.pptx
Module 1_ Introduction.pptxModule 1_ Introduction.pptx
Module 1_ Introduction.pptxnikshaikh786
 
Module 1_ Introduction to Mobile Computing.pptx
Module 1_  Introduction to Mobile Computing.pptxModule 1_  Introduction to Mobile Computing.pptx
Module 1_ Introduction to Mobile Computing.pptxnikshaikh786
 
Module 2_ GSM Mobile services.pptx
Module 2_  GSM Mobile services.pptxModule 2_  GSM Mobile services.pptx
Module 2_ GSM Mobile services.pptxnikshaikh786
 
Module 2_ Cyber offenses & Cybercrime.pptx
Module 2_ Cyber offenses & Cybercrime.pptxModule 2_ Cyber offenses & Cybercrime.pptx
Module 2_ Cyber offenses & Cybercrime.pptxnikshaikh786
 
Module 1- Introduction to Cybercrime.pptx
Module 1- Introduction to Cybercrime.pptxModule 1- Introduction to Cybercrime.pptx
Module 1- Introduction to Cybercrime.pptxnikshaikh786
 
MODULE 5- EDA.pptx
MODULE 5- EDA.pptxMODULE 5- EDA.pptx
MODULE 5- EDA.pptxnikshaikh786
 
MODULE 4-Text Analytics.pptx
MODULE 4-Text Analytics.pptxMODULE 4-Text Analytics.pptx
MODULE 4-Text Analytics.pptxnikshaikh786
 
Module 3 - Time Series.pptx
Module 3 - Time Series.pptxModule 3 - Time Series.pptx
Module 3 - Time Series.pptxnikshaikh786
 
Module 2_ Regression Models..pptx
Module 2_ Regression Models..pptxModule 2_ Regression Models..pptx
Module 2_ Regression Models..pptxnikshaikh786
 
MODULE 1_Introduction to Data analytics and life cycle..pptx
MODULE 1_Introduction to Data analytics and life cycle..pptxMODULE 1_Introduction to Data analytics and life cycle..pptx
MODULE 1_Introduction to Data analytics and life cycle..pptxnikshaikh786
 
MAD&PWA VIVA QUESTIONS.pdf
MAD&PWA VIVA QUESTIONS.pdfMAD&PWA VIVA QUESTIONS.pdf
MAD&PWA VIVA QUESTIONS.pdfnikshaikh786
 
VIVA QUESTIONS FOR DEVOPS.pdf
VIVA QUESTIONS FOR DEVOPS.pdfVIVA QUESTIONS FOR DEVOPS.pdf
VIVA QUESTIONS FOR DEVOPS.pdfnikshaikh786
 
Mad&pwa practical no. 1
Mad&pwa practical no. 1Mad&pwa practical no. 1
Mad&pwa practical no. 1nikshaikh786
 

More from nikshaikh786 (20)

Module 2_ Divide and Conquer Approach.pptx
Module 2_ Divide and Conquer Approach.pptxModule 2_ Divide and Conquer Approach.pptx
Module 2_ Divide and Conquer Approach.pptx
 
Module 1_ Introduction.pptx
Module 1_ Introduction.pptxModule 1_ Introduction.pptx
Module 1_ Introduction.pptx
 
Module 1_ Introduction to Mobile Computing.pptx
Module 1_  Introduction to Mobile Computing.pptxModule 1_  Introduction to Mobile Computing.pptx
Module 1_ Introduction to Mobile Computing.pptx
 
Module 2_ GSM Mobile services.pptx
Module 2_  GSM Mobile services.pptxModule 2_  GSM Mobile services.pptx
Module 2_ GSM Mobile services.pptx
 
TCS MODULE 6.pdf
TCS MODULE 6.pdfTCS MODULE 6.pdf
TCS MODULE 6.pdf
 
Module 2_ Cyber offenses & Cybercrime.pptx
Module 2_ Cyber offenses & Cybercrime.pptxModule 2_ Cyber offenses & Cybercrime.pptx
Module 2_ Cyber offenses & Cybercrime.pptx
 
Module 1- Introduction to Cybercrime.pptx
Module 1- Introduction to Cybercrime.pptxModule 1- Introduction to Cybercrime.pptx
Module 1- Introduction to Cybercrime.pptx
 
MODULE 5- EDA.pptx
MODULE 5- EDA.pptxMODULE 5- EDA.pptx
MODULE 5- EDA.pptx
 
MODULE 4-Text Analytics.pptx
MODULE 4-Text Analytics.pptxMODULE 4-Text Analytics.pptx
MODULE 4-Text Analytics.pptx
 
Module 3 - Time Series.pptx
Module 3 - Time Series.pptxModule 3 - Time Series.pptx
Module 3 - Time Series.pptx
 
Module 2_ Regression Models..pptx
Module 2_ Regression Models..pptxModule 2_ Regression Models..pptx
Module 2_ Regression Models..pptx
 
MODULE 1_Introduction to Data analytics and life cycle..pptx
MODULE 1_Introduction to Data analytics and life cycle..pptxMODULE 1_Introduction to Data analytics and life cycle..pptx
MODULE 1_Introduction to Data analytics and life cycle..pptx
 
IOE MODULE 6.pptx
IOE MODULE 6.pptxIOE MODULE 6.pptx
IOE MODULE 6.pptx
 
MAD&PWA VIVA QUESTIONS.pdf
MAD&PWA VIVA QUESTIONS.pdfMAD&PWA VIVA QUESTIONS.pdf
MAD&PWA VIVA QUESTIONS.pdf
 
VIVA QUESTIONS FOR DEVOPS.pdf
VIVA QUESTIONS FOR DEVOPS.pdfVIVA QUESTIONS FOR DEVOPS.pdf
VIVA QUESTIONS FOR DEVOPS.pdf
 
IOE MODULE 5.pptx
IOE MODULE 5.pptxIOE MODULE 5.pptx
IOE MODULE 5.pptx
 
Mad&pwa practical no. 1
Mad&pwa practical no. 1Mad&pwa practical no. 1
Mad&pwa practical no. 1
 
Ioe module 4
Ioe module 4Ioe module 4
Ioe module 4
 
Ioe module 3
Ioe module 3Ioe module 3
Ioe module 3
 
Ioe module 2
Ioe module 2Ioe module 2
Ioe module 2
 

Recently uploaded

Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAbhinavSharma374939
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZTE
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...RajaP95
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 

Recently uploaded (20)

Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog Converter
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 

Data Mining Algorithms and Classification Accuracy Evaluation

  • 1. Classification CSC504.3: Identify appropriate data mining algorithms to solve real world problems.
  • 2. CONTENT ● Basic Concepts ● Decision Tree Induction ● Naïve Bayesian Classification ● Accuracy and Error measures ● Evaluating the Accuracy of a Classifier: ○ Holdout & Random Subsampling ○ Cross Validation ○ Bootstrap
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40.
  • 41.
  • 42.
  • 43.
  • 44.
  • 45.
  • 46.
  • 47.
  • 48.
  • 49.
  • 50.
  • 51.
  • 52.
  • 53.
  • 54.
  • 55.
  • 56.
  • 57.
  • 58.
  • 59. Regression Analyses • Regression: technique concerned with predicting some variables by knowing others • The process of predicting variable Y using variable X
  • 60. Regression ⮚ Uses a variable (x) to predict some outcome variable (y) ⮚ Tells you how values in y change as a function of changes in values of x
  • 61. Correlation and Regression ⮚ Correlation describes the strength of a linear relationship between two variables ⮚ Linear means “straight line” ⮚ Regression tells us how to draw the straight line described by the correlation
  • 62. Regression ⮚ Calculates the “best-fit” line for a certain set of data The regression line makes the sum of the squares of the residuals smaller than for any other line Regression minimizes residuals
  • 63. By using the least squares method (a procedure that minimizes the vertical deviations of plotted points surrounding a straight line) we are able to construct a best fitting straight line to the scatter diagram points and then formulate a regression equation in the form of: b
  • 64. Regression Equation ⮚ Regression equation describes the regression line mathematically ■ Intercept ■ Slope
  • 67. Regressing grades on hours Predicted final grade in class = 59.95 + 3.17*(number of hours you study per week)
  • 68. Predict the final grade of… ■ Someone who studies for 12 hours ■ Final grade = 59.95 + (3.17*12) ■ Final grade = 97.99 ■ Someone who studies for 1 hour: ■ Final grade = 59.95 + (3.17*1) Predicted final grade in class = 59.95 + 3.17*(hours of study)
  • 69. Exercise A sample of 6 persons was selected the value of their age ( x variable) and their weight is demonstrated in the following table. Find the regression equation and what is the predicted weight when age is 8.5 years.
  • 70. Serial no. Age (x) Weight (y) 1 2 3 4 5 6 7 6 8 5 6 9 12 8 12 10 11 13
  • 71. Answer Serial no. Age (x) Weight (y) xy X2 Y2 1 2 3 4 5 6 7 6 8 5 6 9 12 8 12 10 11 13 84 48 96 50 66 117 49 36 64 25 36 81 144 64 144 100 121 169 Total 41 66 461 291 742
  • 73.
  • 74. we create a regression line by plotting two estimated values for y against their X component, then extending the line right and left.
  • 75.
  • 76.
  • 77.
  • 78.
  • 79.
  • 80. EVALUATION OF CLASSIFIERS ● A few tools have been designed to evaluate the performance of a classifier. ● A confusion matrix is a specific table layout that allows visualization of the performance of a classifier. ● Table shows the confusion matrix for a two-class classifier. ● True positives (TP) are the number of positive instances the classifier correctly identified as positive. ● False positives (FP) are the number of instances in which the classifier identified as positive but in reality are negative. ● True negatives (TN) are the number of negative instances the classifier correctly identified as negative. ● False negatives (FN) are the number of instances classified as negative but in reality are positive. ● In a two-class classification, a preset threshold may be used to separate positives from negatives. ● TP and TN are the correct guesses. A good classifier should have large TP and TN and small (ideally zero) numbers for FP and FN.
  • 81.
  • 82.
  • 83. The accuracy {or the overall success rate) is a metric defining the rate at which a model has classified the records correctly. It is defined as the sum of TP and TN divided by the total number of instances, as shown in Equation A good model should have a high accuracy score, but having a high accuracy score alone does not guarantee the model is well established. The following measures can be introduced to better evaluate the performance of a classifier.
  • 84. The true positive rate (TPR) shows what percent of positive instances the classifier correctly identified. It's also illustrated in Equation
  • 85. The false negative rate (FNR) shows what percent of positives the classifier marked as negatives. It is also known as the miss rate or type II error rate and is shown in Equation. Note that the sum of TPR and FNR is 1.
  • 86. A well-performed model should have a high TPR that is ideally 1 and a low FPR and FNR that are ideally 0. In reality, it's rare to have TPR = 1, FPR = 0, and FNR = 0, but these measures are useful to compare the performance of multiple models that are designed for solving the same problem.
  • 87. Precision and recall are accuracy metrics used by the information retrieval community, but they can be used to characterize classifiers in general. Precision is the percentage of instances marked positive that really are positive, as shown in Equation
  • 88. Recall is the percentage of positive instances that were correctly identified. Recall is equivalent to the TPR.
  • 89.
  • 90. ROC CURVE the ROC curve, which is a common tool to evaluate classifiers. The abbreviation stands for receiver operating characteristic, a term used in signal detection to characterize the trade-off between hit rate and false-alarm rate over a noisy channel. A ROC curve evaluates the performance of a classifier based on the TP and FP, regardless of other factors such as class distribution and error costs. The vertical axis is the True Positive Rate (TPR), and the horizontal axis is the False Positive Rate (FPR).
  • 91.
  • 92. AREA UNDER CURVE ● Related to the ROC curve is the area under the curve (AUC). ● The AUC is calculated by measuring the area under the ROC curve. ● Higher AUC scores mean the classifier performs better. ● The score can range from 0.5 (for the diagonal line TPR=FPR) to 1.0 (with ROC passing through the top-left corner).
  • 93. Holdout Method and Random Subsampling The holdout method is what we have alluded to so far in our discussions about accuracy. In this method, the given data are randomly partitioned into two independent sets, a training set and a test set. Typically, two-thirds of the data are allocated to the training set, and the remaining one-third is allocated to the test set. The training set is used to derive the model. The model’s accuracy is then estimated with the test set.The estimate is pessimistic because only a portion of the initial data is used to derive the model. Random subsampling is a variation of the holdout method in which the holdout method is repeated k times. The overall accuracy estimate is taken as the average of the accuracies obtained from each iteration.