SlideShare a Scribd company logo
1 of 9
Dr. Oner Celepcikay
CS 4319
CS 4319
Machine Learning
Week 6
Data Science Tool I – Classification Part II
Header – dark yellow 24 points Arial Bold
Body text – white 20 points Arial Bold, dark yellow highlights
Bullets – dark yellow
Copyright – white 12 points Arial
Size:
Height: 7.52"
Width: 10.02"
Scale: 70%
Position on slide:
Horizontal - 0"
Vertical - 0"
Tree InductionGreedy strategy.Split the records based on an
attribute test that optimizes certain criterion.
IssuesDetermine how to split the recordsHow to specify the
attribute test condition?How to determine the best
split?Determine when to stop splitting
Stopping Criteria for Tree InductionStop expanding a node
when all the records belong to the same class
Stop expanding a node when all the records have similar
attribute values
Early termination (to be discussed later)
Practical Issues of ClassificationUnderfitting and Overfitting
Missing Values
Costs of Classification
Underfitting and Overfitting
Overfitting
Underfitting: when model is too simple, both training and test
errors are large
Overfitting due to Noise
Decision boundary is distorted by noise point
Overfitting due to Noise
* Bats and Whales are misclassified; non-mammals instead of
mammals.
Overfitting due to Noise
Decision boundary is distorted by noise point
Both humans and dolphins were misclassified as n0n-mammals
b/c Body Temp, Gives_Birth and Four-legged values are
identical to mislabeled records in training set.
Spiny anteaters represent an exceptional case (every warm-
blooded with no gives_birth is non-mammal in TR_Set
Decision tree perfectly fits training data (training error=0)
But error rate on test data is 30%.
Overfitting due to Noise
Estimating Generalization ErrorsRe-substitution errors: error on
Methods for estimating generalization errors:Optimistic
approach: e’(t) = e(t)Pessimistic approach: For each leaf
node: e’(t) = (e(
(N: number of leaf nodes) For a tree with 30 leaf nodes and 10
errors on training
(out of 1000 instances):
Training error = 10/1000 = 1%
2.5%Reduced error pruning (REP): uses validation data set to
estimate generalization
error
Occam’s RazorGiven two models of similar generalization
errors, one should prefer the simpler model over the more
complex model
For complex models, there is a greater chance that it was fitted
accidentally by errors in data
Therefore, one should include model complexity when
evaluating a model
How to Address OverfittingPre-Pruning (Early Stopping
Rule)Stop the algorithm before it becomes a fully-grown
treeTypical stopping conditions for a node: Stop if all instances
belong to the same class Stop if all the attribute values are the
sameMore restrictive conditions: Stop if number of instances is
less than some user-specified threshold Stop if class distribution
of instances are independent of the available features (e.g.,
improve impurity
measures (e.g., Gini or information gain).
How to Address Overfitting…Post-pruningGrow decision tree to
its entiretyTrim the nodes of the decision tree in a bottom-up
fashionIf generalization error improves after trimming, replace
sub-tree by a leaf node.Class label of leaf node is determined
from majority class of instances in the sub-tree
Example of Post-Pruning
Training Error (Before splitting) = 10/30
Pessimistic error = (10 + 0.5)/30 = 10.5/30
Training Error (After splitting) = 7/30
Pessimistic error (After splitting)
PRUNE OR DO NOT PRUNEClass = Yes20Class =
No10Error = ?Class = Yes8Class = No4Class = Yes2Class =
No5Class = Yes6Class = No1Class = Yes4Class = No0
Handling Missing Attribute ValuesMissing values affect
decision tree construction in three different ways:Affects how
impurity measures are computedAffects how to distribute
instance with missing value to child nodesAffects how a test
instance with missing value is classified
Model EvaluationMetrics for Performance EvaluationHow to
evaluate the performance of a model?
Methods for Performance EvaluationHow to obtain reliable
estimates?
Methods for Model ComparisonHow to compare the relative
performance among competing models?
Model EvaluationMetrics for Performance EvaluationHow to
evaluate the performance of a model?
Methods for Performance EvaluationHow to obtain reliable
estimates?
Methods for Model ComparisonHow to compare the relative
performance among competing models?
Metrics for Performance EvaluationFocus on the predictive
capability of a modelRather than how fast it takes to classify or
build models, scalability, etc.Confusion Matrix:
a: TP (true positive)
b: FN (false negative)
c: FP (false positive)
d: TN (true negative)PREDICTED CLASS
ACTUAL CLASSClass=YesClass=NoClass=YesabClass=Nocd
Metrics for Performance Evaluation…
Most widely-used metric:PREDICTED CLASS
ACTUAL CLASSClass=YesClass=NoClass=Yesa (TP)b
(FN)Class=Noc (FP)d (TN)
Limitation of AccuracyConsider a 2-class problemNumber of
Class 0 examples = 9990Number of Class 1 examples = 10
If model predicts everything to be class 0, accuracy is
9990/10000 = 99.9 %Accuracy is misleading because model
does not detect any class 1 example
Cost Matrix
C(i|j): Cost of misclassifying class j example as class i
PREDICTED CLASS
ACTUAL
CLASSC(i|j)Class=YesClass=NoClass=YesC(Yes|Yes)C(No|Yes
)Class=NoC(Yes|No)C(No|No)
Computing Cost of Classification
Accuracy = 80%
Cost = 3910
Accuracy = 90%
Cost = 4255Cost MatrixPREDICTED CLASS ACTUAL
CLASSC(i|j)+-+-1100-10Model M1PREDICTED CLASS
ACTUAL CLASS+-+15040-60250Model M2PREDICTED
CLASS ACTUAL CLASS+-+25045-5200
Cost vs AccuracyCountPREDICTED CLASS
ACTUAL
CLASSClass=YesClass=NoClass=YesabClass=NocdCostPREDI
CTED CLASS
ACTUAL CLASSClass=YesClass=NoClass=YespqClass=Noqp
Model EvaluationMetrics for Performance EvaluationHow to
evaluate the performance of a model?
Methods for Performance EvaluationHow to obtain reliable
estimates?
Methods for Model ComparisonHow to compare the relative
performance among competing models?
Methods for Performance EvaluationHow to obtain a reliable
estimate of performance?
Performance of a model may depend on other factors besides the
learning algorithm:Class distributionCost of
misclassificationSize of training and test sets
Methods of EstimationHoldoutReserve 2/3 for training and 1/3
for testing Random subsamplingRepeated holdoutCross
validationPartition data into k disjoint subsetsk-fold: train on k-
1 partitions, test on the remaining oneLeave-one-out:
k=nStratified sampling oversampling vs
undersamplingBootstrapSampling with replacement
Header – dark yellow 24 points Arial Bold
Body text – white 20 points Arial Bold, dark yellow highlights
Bullets – dark yellow
Copyright – white 12 points Arial
Size:
Height: 7.52"
Width: 10.02"
Scale: 70%
Position on slide:
Horizontal - 0"
Vertical - 0"
A?
A1
A2
A3
A4
FN
FP
TN
TP
TN
TP
d
c
b
a
d
a
+
+
+
+
=
+
+
+
+
=
Accuracy

More Related Content

Similar to Dr. Oner CelepcikayCS 4319CS 4319Machine LearningW.docx

Textmining Predictive Models
Textmining Predictive ModelsTextmining Predictive Models
Textmining Predictive Modelsguest0edcaf
 
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESIMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESVikash Kumar
 
Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Researchjim
 
Data Mining in Market Research
Data Mining in Market ResearchData Mining in Market Research
Data Mining in Market Researchbutest
 
Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Researchkevinlan
 
Machine learning for_finance
Machine learning for_financeMachine learning for_finance
Machine learning for_financeStefan Duprey
 
Kaggle Gold Medal Case Study
Kaggle Gold Medal Case StudyKaggle Gold Medal Case Study
Kaggle Gold Medal Case StudyAlon Bochman, CFA
 
2.2 decision tree
2.2 decision tree2.2 decision tree
2.2 decision treeKrish_ver2
 
classification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdfclassification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdf321106410027
 
Machine Learning
Machine LearningMachine Learning
Machine Learningbutest
 
Huong dan cu the svm
Huong dan cu the svmHuong dan cu the svm
Huong dan cu the svmtaikhoan262
 
A Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of DiseasesA Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of Diseasesijsrd.com
 
NEURAL Network Design Training
NEURAL Network Design  TrainingNEURAL Network Design  Training
NEURAL Network Design TrainingESCOM
 
Machine learning session6(decision trees random forrest)
Machine learning   session6(decision trees random forrest)Machine learning   session6(decision trees random forrest)
Machine learning session6(decision trees random forrest)Abhimanyu Dwivedi
 

Similar to Dr. Oner CelepcikayCS 4319CS 4319Machine LearningW.docx (20)

Textmining Predictive Models
Textmining Predictive ModelsTextmining Predictive Models
Textmining Predictive Models
 
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESIMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
 
Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Research
 
Data Mining in Market Research
Data Mining in Market ResearchData Mining in Market Research
Data Mining in Market Research
 
Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Research
 
Machine learning for_finance
Machine learning for_financeMachine learning for_finance
Machine learning for_finance
 
Kaggle Gold Medal Case Study
Kaggle Gold Medal Case StudyKaggle Gold Medal Case Study
Kaggle Gold Medal Case Study
 
2.2 decision tree
2.2 decision tree2.2 decision tree
2.2 decision tree
 
decisiontrees (3).ppt
decisiontrees (3).pptdecisiontrees (3).ppt
decisiontrees (3).ppt
 
decisiontrees.ppt
decisiontrees.pptdecisiontrees.ppt
decisiontrees.ppt
 
decisiontrees.ppt
decisiontrees.pptdecisiontrees.ppt
decisiontrees.ppt
 
classification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdfclassification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdf
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Guide
GuideGuide
Guide
 
Huong dan cu the svm
Huong dan cu the svmHuong dan cu the svm
Huong dan cu the svm
 
A Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of DiseasesA Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of Diseases
 
NEURAL Network Design Training
NEURAL Network Design  TrainingNEURAL Network Design  Training
NEURAL Network Design Training
 
LR2. Summary Day 2
LR2. Summary Day 2LR2. Summary Day 2
LR2. Summary Day 2
 
deep CNN vs conventional ML
deep CNN vs conventional MLdeep CNN vs conventional ML
deep CNN vs conventional ML
 
Machine learning session6(decision trees random forrest)
Machine learning   session6(decision trees random forrest)Machine learning   session6(decision trees random forrest)
Machine learning session6(decision trees random forrest)
 

More from madlynplamondon

. According to your textbook, Contrary to a popular misconception.docx
. According to your textbook, Contrary to a popular misconception.docx. According to your textbook, Contrary to a popular misconception.docx
. According to your textbook, Contrary to a popular misconception.docxmadlynplamondon
 
-How did artwork produced in America from 1945 to 1960 compare to ar.docx
-How did artwork produced in America from 1945 to 1960 compare to ar.docx-How did artwork produced in America from 1945 to 1960 compare to ar.docx
-How did artwork produced in America from 1945 to 1960 compare to ar.docxmadlynplamondon
 
-Just thoughts and opinion on the reading-Consent and compen.docx
-Just thoughts and opinion on the reading-Consent and compen.docx-Just thoughts and opinion on the reading-Consent and compen.docx
-Just thoughts and opinion on the reading-Consent and compen.docxmadlynplamondon
 
. The Questioned Documents Unit (QDU) provides forensic support .docx
. The Questioned Documents Unit (QDU) provides forensic support .docx. The Questioned Documents Unit (QDU) provides forensic support .docx
. The Questioned Documents Unit (QDU) provides forensic support .docxmadlynplamondon
 
.  What is it about the fundamental nature and structure of the Olym.docx
.  What is it about the fundamental nature and structure of the Olym.docx.  What is it about the fundamental nature and structure of the Olym.docx
.  What is it about the fundamental nature and structure of the Olym.docxmadlynplamondon
 
-Learning objectives for presentation-Brief background o.docx
-Learning objectives for presentation-Brief background o.docx-Learning objectives for presentation-Brief background o.docx
-Learning objectives for presentation-Brief background o.docxmadlynplamondon
 
-You will need to play a phone game Angry Birds (any version) to mak.docx
-You will need to play a phone game Angry Birds (any version) to mak.docx-You will need to play a phone game Angry Birds (any version) to mak.docx
-You will need to play a phone game Angry Birds (any version) to mak.docxmadlynplamondon
 
. EDU 571 Week 5 Discussion 1 -Data Collection Please respond .docx
. EDU 571 Week 5 Discussion 1 -Data Collection Please respond .docx. EDU 571 Week 5 Discussion 1 -Data Collection Please respond .docx
. EDU 571 Week 5 Discussion 1 -Data Collection Please respond .docxmadlynplamondon
 
. What were the causes of World War II Explain how and why the Unit.docx
. What were the causes of World War II Explain how and why the Unit.docx. What were the causes of World War II Explain how and why the Unit.docx
. What were the causes of World War II Explain how and why the Unit.docxmadlynplamondon
 
. Complete the prewriting for the progress reportPrewriting p.docx
. Complete the prewriting for the progress reportPrewriting p.docx. Complete the prewriting for the progress reportPrewriting p.docx
. Complete the prewriting for the progress reportPrewriting p.docxmadlynplamondon
 
-in Filomena by Roberta Fernandez the author refers to the Mexican r.docx
-in Filomena by Roberta Fernandez the author refers to the Mexican r.docx-in Filomena by Roberta Fernandez the author refers to the Mexican r.docx
-in Filomena by Roberta Fernandez the author refers to the Mexican r.docxmadlynplamondon
 
-Write about a violent religious event in history.(Ex. Muslim ex.docx
-Write about a violent religious event in history.(Ex. Muslim ex.docx-Write about a violent religious event in history.(Ex. Muslim ex.docx
-Write about a violent religious event in history.(Ex. Muslim ex.docxmadlynplamondon
 
-This project is an opportunity to demonstrate the ability to analyz.docx
-This project is an opportunity to demonstrate the ability to analyz.docx-This project is an opportunity to demonstrate the ability to analyz.docx
-This project is an opportunity to demonstrate the ability to analyz.docxmadlynplamondon
 
-7 Three men are trapped in a cave with no hope of rescue and no foo.docx
-7 Three men are trapped in a cave with no hope of rescue and no foo.docx-7 Three men are trapped in a cave with no hope of rescue and no foo.docx
-7 Three men are trapped in a cave with no hope of rescue and no foo.docxmadlynplamondon
 
-1. Are the three main elements of compensation systems—internal.docx
-1. Are the three main elements of compensation systems—internal.docx-1. Are the three main elements of compensation systems—internal.docx
-1. Are the three main elements of compensation systems—internal.docxmadlynplamondon
 
- What are the key differences between national health service (.docx
- What are the key differences between national health service (.docx- What are the key differences between national health service (.docx
- What are the key differences between national health service (.docxmadlynplamondon
 
--Describe and analyze the ways in which Alfons Heck’s participation.docx
--Describe and analyze the ways in which Alfons Heck’s participation.docx--Describe and analyze the ways in which Alfons Heck’s participation.docx
--Describe and analyze the ways in which Alfons Heck’s participation.docxmadlynplamondon
 
------ Watch an online speechpresentation of 20 minutes or lo.docx
------ Watch an online speechpresentation of 20 minutes or lo.docx------ Watch an online speechpresentation of 20 minutes or lo.docx
------ Watch an online speechpresentation of 20 minutes or lo.docxmadlynplamondon
 
) Florida National UniversityNursing DepartmentBSN.docx
) Florida National UniversityNursing DepartmentBSN.docx) Florida National UniversityNursing DepartmentBSN.docx
) Florida National UniversityNursing DepartmentBSN.docxmadlynplamondon
 
- Please answer question 2 at the end of the case.- cita.docx
- Please answer question 2 at the end of the case.- cita.docx- Please answer question 2 at the end of the case.- cita.docx
- Please answer question 2 at the end of the case.- cita.docxmadlynplamondon
 

More from madlynplamondon (20)

. According to your textbook, Contrary to a popular misconception.docx
. According to your textbook, Contrary to a popular misconception.docx. According to your textbook, Contrary to a popular misconception.docx
. According to your textbook, Contrary to a popular misconception.docx
 
-How did artwork produced in America from 1945 to 1960 compare to ar.docx
-How did artwork produced in America from 1945 to 1960 compare to ar.docx-How did artwork produced in America from 1945 to 1960 compare to ar.docx
-How did artwork produced in America from 1945 to 1960 compare to ar.docx
 
-Just thoughts and opinion on the reading-Consent and compen.docx
-Just thoughts and opinion on the reading-Consent and compen.docx-Just thoughts and opinion on the reading-Consent and compen.docx
-Just thoughts and opinion on the reading-Consent and compen.docx
 
. The Questioned Documents Unit (QDU) provides forensic support .docx
. The Questioned Documents Unit (QDU) provides forensic support .docx. The Questioned Documents Unit (QDU) provides forensic support .docx
. The Questioned Documents Unit (QDU) provides forensic support .docx
 
.  What is it about the fundamental nature and structure of the Olym.docx
.  What is it about the fundamental nature and structure of the Olym.docx.  What is it about the fundamental nature and structure of the Olym.docx
.  What is it about the fundamental nature and structure of the Olym.docx
 
-Learning objectives for presentation-Brief background o.docx
-Learning objectives for presentation-Brief background o.docx-Learning objectives for presentation-Brief background o.docx
-Learning objectives for presentation-Brief background o.docx
 
-You will need to play a phone game Angry Birds (any version) to mak.docx
-You will need to play a phone game Angry Birds (any version) to mak.docx-You will need to play a phone game Angry Birds (any version) to mak.docx
-You will need to play a phone game Angry Birds (any version) to mak.docx
 
. EDU 571 Week 5 Discussion 1 -Data Collection Please respond .docx
. EDU 571 Week 5 Discussion 1 -Data Collection Please respond .docx. EDU 571 Week 5 Discussion 1 -Data Collection Please respond .docx
. EDU 571 Week 5 Discussion 1 -Data Collection Please respond .docx
 
. What were the causes of World War II Explain how and why the Unit.docx
. What were the causes of World War II Explain how and why the Unit.docx. What were the causes of World War II Explain how and why the Unit.docx
. What were the causes of World War II Explain how and why the Unit.docx
 
. Complete the prewriting for the progress reportPrewriting p.docx
. Complete the prewriting for the progress reportPrewriting p.docx. Complete the prewriting for the progress reportPrewriting p.docx
. Complete the prewriting for the progress reportPrewriting p.docx
 
-in Filomena by Roberta Fernandez the author refers to the Mexican r.docx
-in Filomena by Roberta Fernandez the author refers to the Mexican r.docx-in Filomena by Roberta Fernandez the author refers to the Mexican r.docx
-in Filomena by Roberta Fernandez the author refers to the Mexican r.docx
 
-Write about a violent religious event in history.(Ex. Muslim ex.docx
-Write about a violent religious event in history.(Ex. Muslim ex.docx-Write about a violent religious event in history.(Ex. Muslim ex.docx
-Write about a violent religious event in history.(Ex. Muslim ex.docx
 
-This project is an opportunity to demonstrate the ability to analyz.docx
-This project is an opportunity to demonstrate the ability to analyz.docx-This project is an opportunity to demonstrate the ability to analyz.docx
-This project is an opportunity to demonstrate the ability to analyz.docx
 
-7 Three men are trapped in a cave with no hope of rescue and no foo.docx
-7 Three men are trapped in a cave with no hope of rescue and no foo.docx-7 Three men are trapped in a cave with no hope of rescue and no foo.docx
-7 Three men are trapped in a cave with no hope of rescue and no foo.docx
 
-1. Are the three main elements of compensation systems—internal.docx
-1. Are the three main elements of compensation systems—internal.docx-1. Are the three main elements of compensation systems—internal.docx
-1. Are the three main elements of compensation systems—internal.docx
 
- What are the key differences between national health service (.docx
- What are the key differences between national health service (.docx- What are the key differences between national health service (.docx
- What are the key differences between national health service (.docx
 
--Describe and analyze the ways in which Alfons Heck’s participation.docx
--Describe and analyze the ways in which Alfons Heck’s participation.docx--Describe and analyze the ways in which Alfons Heck’s participation.docx
--Describe and analyze the ways in which Alfons Heck’s participation.docx
 
------ Watch an online speechpresentation of 20 minutes or lo.docx
------ Watch an online speechpresentation of 20 minutes or lo.docx------ Watch an online speechpresentation of 20 minutes or lo.docx
------ Watch an online speechpresentation of 20 minutes or lo.docx
 
) Florida National UniversityNursing DepartmentBSN.docx
) Florida National UniversityNursing DepartmentBSN.docx) Florida National UniversityNursing DepartmentBSN.docx
) Florida National UniversityNursing DepartmentBSN.docx
 
- Please answer question 2 at the end of the case.- cita.docx
- Please answer question 2 at the end of the case.- cita.docx- Please answer question 2 at the end of the case.- cita.docx
- Please answer question 2 at the end of the case.- cita.docx
 

Recently uploaded

The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024Janet Corral
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 

Recently uploaded (20)

The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 

Dr. Oner CelepcikayCS 4319CS 4319Machine LearningW.docx

  • 1. Dr. Oner Celepcikay CS 4319 CS 4319 Machine Learning Week 6 Data Science Tool I – Classification Part II Header – dark yellow 24 points Arial Bold Body text – white 20 points Arial Bold, dark yellow highlights Bullets – dark yellow Copyright – white 12 points Arial Size: Height: 7.52" Width: 10.02" Scale: 70% Position on slide: Horizontal - 0" Vertical - 0" Tree InductionGreedy strategy.Split the records based on an attribute test that optimizes certain criterion. IssuesDetermine how to split the recordsHow to specify the attribute test condition?How to determine the best split?Determine when to stop splitting Stopping Criteria for Tree InductionStop expanding a node
  • 2. when all the records belong to the same class Stop expanding a node when all the records have similar attribute values Early termination (to be discussed later) Practical Issues of ClassificationUnderfitting and Overfitting Missing Values Costs of Classification Underfitting and Overfitting Overfitting Underfitting: when model is too simple, both training and test errors are large Overfitting due to Noise Decision boundary is distorted by noise point Overfitting due to Noise * Bats and Whales are misclassified; non-mammals instead of mammals. Overfitting due to Noise Decision boundary is distorted by noise point
  • 3. Both humans and dolphins were misclassified as n0n-mammals b/c Body Temp, Gives_Birth and Four-legged values are identical to mislabeled records in training set. Spiny anteaters represent an exceptional case (every warm- blooded with no gives_birth is non-mammal in TR_Set Decision tree perfectly fits training data (training error=0) But error rate on test data is 30%. Overfitting due to Noise Estimating Generalization ErrorsRe-substitution errors: error on Methods for estimating generalization errors:Optimistic approach: e’(t) = e(t)Pessimistic approach: For each leaf node: e’(t) = (e( (N: number of leaf nodes) For a tree with 30 leaf nodes and 10 errors on training (out of 1000 instances): Training error = 10/1000 = 1% 2.5%Reduced error pruning (REP): uses validation data set to estimate generalization error Occam’s RazorGiven two models of similar generalization errors, one should prefer the simpler model over the more complex model For complex models, there is a greater chance that it was fitted accidentally by errors in data Therefore, one should include model complexity when
  • 4. evaluating a model How to Address OverfittingPre-Pruning (Early Stopping Rule)Stop the algorithm before it becomes a fully-grown treeTypical stopping conditions for a node: Stop if all instances belong to the same class Stop if all the attribute values are the sameMore restrictive conditions: Stop if number of instances is less than some user-specified threshold Stop if class distribution of instances are independent of the available features (e.g., improve impurity measures (e.g., Gini or information gain). How to Address Overfitting…Post-pruningGrow decision tree to its entiretyTrim the nodes of the decision tree in a bottom-up fashionIf generalization error improves after trimming, replace sub-tree by a leaf node.Class label of leaf node is determined from majority class of instances in the sub-tree Example of Post-Pruning Training Error (Before splitting) = 10/30 Pessimistic error = (10 + 0.5)/30 = 10.5/30 Training Error (After splitting) = 7/30 Pessimistic error (After splitting) PRUNE OR DO NOT PRUNEClass = Yes20Class = No10Error = ?Class = Yes8Class = No4Class = Yes2Class = No5Class = Yes6Class = No1Class = Yes4Class = No0
  • 5. Handling Missing Attribute ValuesMissing values affect decision tree construction in three different ways:Affects how impurity measures are computedAffects how to distribute instance with missing value to child nodesAffects how a test instance with missing value is classified Model EvaluationMetrics for Performance EvaluationHow to evaluate the performance of a model? Methods for Performance EvaluationHow to obtain reliable estimates? Methods for Model ComparisonHow to compare the relative performance among competing models? Model EvaluationMetrics for Performance EvaluationHow to evaluate the performance of a model? Methods for Performance EvaluationHow to obtain reliable estimates? Methods for Model ComparisonHow to compare the relative performance among competing models? Metrics for Performance EvaluationFocus on the predictive capability of a modelRather than how fast it takes to classify or build models, scalability, etc.Confusion Matrix: a: TP (true positive) b: FN (false negative) c: FP (false positive) d: TN (true negative)PREDICTED CLASS
  • 6. ACTUAL CLASSClass=YesClass=NoClass=YesabClass=Nocd Metrics for Performance Evaluation… Most widely-used metric:PREDICTED CLASS ACTUAL CLASSClass=YesClass=NoClass=Yesa (TP)b (FN)Class=Noc (FP)d (TN) Limitation of AccuracyConsider a 2-class problemNumber of Class 0 examples = 9990Number of Class 1 examples = 10 If model predicts everything to be class 0, accuracy is 9990/10000 = 99.9 %Accuracy is misleading because model does not detect any class 1 example Cost Matrix C(i|j): Cost of misclassifying class j example as class i PREDICTED CLASS ACTUAL CLASSC(i|j)Class=YesClass=NoClass=YesC(Yes|Yes)C(No|Yes )Class=NoC(Yes|No)C(No|No) Computing Cost of Classification
  • 7. Accuracy = 80% Cost = 3910 Accuracy = 90% Cost = 4255Cost MatrixPREDICTED CLASS ACTUAL CLASSC(i|j)+-+-1100-10Model M1PREDICTED CLASS ACTUAL CLASS+-+15040-60250Model M2PREDICTED CLASS ACTUAL CLASS+-+25045-5200 Cost vs AccuracyCountPREDICTED CLASS ACTUAL CLASSClass=YesClass=NoClass=YesabClass=NocdCostPREDI CTED CLASS ACTUAL CLASSClass=YesClass=NoClass=YespqClass=Noqp Model EvaluationMetrics for Performance EvaluationHow to evaluate the performance of a model? Methods for Performance EvaluationHow to obtain reliable estimates? Methods for Model ComparisonHow to compare the relative performance among competing models? Methods for Performance EvaluationHow to obtain a reliable estimate of performance? Performance of a model may depend on other factors besides the learning algorithm:Class distributionCost of misclassificationSize of training and test sets
  • 8. Methods of EstimationHoldoutReserve 2/3 for training and 1/3 for testing Random subsamplingRepeated holdoutCross validationPartition data into k disjoint subsetsk-fold: train on k- 1 partitions, test on the remaining oneLeave-one-out: k=nStratified sampling oversampling vs undersamplingBootstrapSampling with replacement Header – dark yellow 24 points Arial Bold Body text – white 20 points Arial Bold, dark yellow highlights Bullets – dark yellow Copyright – white 12 points Arial Size: Height: 7.52" Width: 10.02" Scale: 70% Position on slide: Horizontal - 0" Vertical - 0" A? A1 A2 A3 A4 FN FP TN TP TN TP d c b a d a +