SlideShare a Scribd company logo
1 of 12
Classification Using Decision
Trees and Rules
Chapter 5
Introduction
• Decision tree learners use a tree structure to model the
relationships
among the features and the potential outcomes.
• a structure of branching decisions into a final predicted class
value
• Decision begins at the root node, then passed through decision
nodes
that require choices.
• Choices split the data across branches that indicate potential
outcomes of a decision
• Tree is terminated by leaf nodes that denote the action to be
taken as
the result of the series of decisions.
Decision Tree Example
Benefits
• Flowchart-like tree structure is not necessarily exclusively for
the
learner's internal use.
• Resulting structure in a human-readable format.
• Provides insight into how and why the model works or doesn't
work well for a
particular task.
• Useful where classification mechanism needs to be transparent
for legal reasons, or in
case the results need to be shared with others in order to inform
future business
practices
• Credit scoring models where criteria that causes an applicant
to be rejected need to be clearly
documented and free from bias
• Marketing studies of customer behavior such as satisfaction or
churn, which will be shared
with management or advertising agencies
• Diagnosis of medical conditions based on laboratory
measurements, symptoms, or the rate of
disease progression
Applicability
• Widely used machine learning technique
• Can be applied to model almost any type of data with
excellent
results
• Does not fit task where the data has a large number of nominal
features with many levels or it has a large number of numeric
features.
• Result in large number of decisions and an overly complex
tree.
• Tendency of decision trees to overfit data, though this can be
overcome by
adjusting some simple parameters
Divide and Conquer
• Decision trees are built using a heuristic called recursive
partitioning.
• Divide and conquer because it splits the data into subsets,
which are then
split repeatedly into even smaller subsets,
• Stops when the data within the subsets are sufficiently
homogenous, or
another stopping criterion has been met.
• Root node represents the entire dataset
• Algorithm must choose a feature to split upon
• Choose the feature most predictive of the target class.
• Algorithm continues to divide and conquer the data, choosing
the best
candidate feature each time to create another decision node,
until a stopping
criterion is reached.
Divide and Conquer
• Stopping Conditions
• All (or nearly all) of the examples at the node have the same
class
• There are no remaining features to distinguish among the
examples
• The tree has grown to a predefined size limit
Example
• Finding potential for a movie- Box Office Bust, Mainstream
Hit, Critical Success
• Diagonal lines might have split the data even more cleanly.
• Limitation of the decision tree's knowledge representation,
which uses axis-parallel splits.
• Each split considers one feature at a time prevents the
decision tree from forming more
complex decision boundaries
Decision Tree
Predict: Will John Play Tennis?
Predict on D15 Rain High Weak ?
Vs. k-NN: Lookup
• Difficult to Interpret
• Understanding of when John plays
• Divide and conquer
• Split into subsets
• Is it pure?
• Yes: Stop
• No: Repeat
• Understand which subset the
data falls into
Credit Example From V. Lavrenko
Predict: Will John Play Tennis?
Pure Subset
Pure Subset
4 yes/0 No
Split Further
2 yes/3 No
Split Further
3 yes/2 No
Will John Play Tennis?
Will John Play Tennis?
Will John Play Tennis?
• Output of the Decision Tree
• Decision Labels at the Leaf
• Balanced Tree not required
ID3
• Ross Quinlan (ID3, C4.5)
• Breimanetal (CaRT)
Choosing the Best Split
• Choice of Attribute critically determines quality of the
Decision Tree
• Outlook : Pure Subset
• Wind: Weak better than Strong
• Strong completely uncertain
• Need subset strongly biased towards positive or negative.
• Certainty over Uncertainty: 4/0 as pure as 0/4
Entropy
• Quantifies the randomness, or disorder, within a set of class
values.
• High entropy sets are diverse
• Find splits that reduce entropy, increasing homogeneity within
the groups.
• If there are two classes, entropy values can range from 0 to 1.
• For n classes, entropy ranges from 0 to log2 (n)
• Minimum value indicates that the set is completely
homogenous
• Maximum value indicates that the data are as diverse as
possible, and no
group has even a small plurality.
Entropy
• Interpretation: If X is in S, how many bits are used to
represent
whether X is positive or negative?
• Impure (3 pos/ 3 neg)
H(S) = -3/6 log(3/6) – 3/6 log(3/6) = 1/2+1/2 = 1
• Pure (4 pos/0 neg)
H(S) = -4/4 log (4/4) – 0/4 log(0/4) = 0
Information Gain
• Objective of Splits is to produce pure sets
• Information Gain arises out of a split (Reduction in Entropy).
Gain quantifies
the Certainty obtained after a split
• Information Gain performed recursively
Overfitting in Decision Trees
• Always classifies perfectly
• Use of singleton classes
• Classification of test data problematic
• To build an effective decision tree, stop
its growth before it becomes more
specific
Prevent Overfitting
• Avoid splitting when statistically insignificant
• 2 samples vs. 100 samples
• Sub-Tree replacement
• Construct validation set out of training set
• Remove sub-trees (iterate over intermediate nodes) and test
validation set
• Greedy approach. Take sub-tree that gives the best result
• Pick the tree that gives best results on validation set
• Remove sub-trees while validation set result improves
Problems with Information Gain
• Biased towards attributes with many values
• Will not work with new data
• Use a Gain Ratio
Limitations of ID3
C5.0 and Overfitting
• It prunes using reasonable defaults.
• Strategy is to post-prune the tree.
• First grows a large tree that overfits the training data.
• Nodes and branches that have little effect on the classification
errors are
removed.
• Entire branches are moved further up the tree or replaced by
simpler decisions.
C5.0 Boosting Accuracy
• Adaptive boosting: Many decision trees are built and the trees
vote
on the best class for each example
• Combining a number of weak performing learners create a
team that is much
stronger than any of the learners alone.
• Each learner has strengths and weaknesses and they may be
better or worse in solving
certain problems. Using a combination of several learners with
complementary strengths
and weaknesses can therefore dramatically improve the
accuracy of a classifier.
C5.0 Making Mistakes Costly
• Allows us to assign a penalty to different types of errors, in
order to
discourage a tree from making more costly mistakes.
• Penalties are designated in a cost matrix
• Predicted and actual values take two values, yes or no
• Describe a 2 x 2 matrix, using a list of two vectors, each with
two values.
• Create Error Matrix
error_cost <- matrix(c(0, 1, 4, 0), nrow = 2, dimnames =
matrix_dimensions)
• Usage
> credit_cost <- C5.0(credit_train[-17], credit_train$default,
costs = error_cost)
C5.0 Output
Classification Rules
• Classification rules represent knowledge in the form of logical
if-else
statements that assign a class to unlabeled examples.
• Conists of an antecedent (combinations of feature values ) and
a consequent (class
value)
"if X, then Y"
• Example
"if the hard drive is making a clicking sound, then it is about to
fail."
• Rule learners used similar to decision tree learners
• Like decision trees, used for applications that generate
knowledge for
future action
• Identifying conditions that lead to a hardware failure in
mechanical devices
• Describing the key characteristics of groups of people for
customer segmentation
• Finding conditions that precede large drops or increases in the
prices of shares on
the stock market

More Related Content

Similar to Classification Using Decision Trees and RulesChapter 5.docx

Decision Tree Classification Algorithm.pptx
Decision Tree Classification Algorithm.pptxDecision Tree Classification Algorithm.pptx
Decision Tree Classification Algorithm.pptxPriyadharshiniG41
 
MACHINE LEARNING - ENTROPY & INFORMATION GAINpptx
MACHINE LEARNING - ENTROPY & INFORMATION GAINpptxMACHINE LEARNING - ENTROPY & INFORMATION GAINpptx
MACHINE LEARNING - ENTROPY & INFORMATION GAINpptxVijayalakshmi171563
 
Lecture 5 Decision tree.pdf
Lecture 5 Decision tree.pdfLecture 5 Decision tree.pdf
Lecture 5 Decision tree.pdfssuser4c50a9
 
Mini datathon - Bengaluru
Mini datathon - BengaluruMini datathon - Bengaluru
Mini datathon - BengaluruKunal Jain
 
Machine Learning Unit-5 Decesion Trees & Random Forest.pdf
Machine Learning Unit-5 Decesion Trees & Random Forest.pdfMachine Learning Unit-5 Decesion Trees & Random Forest.pdf
Machine Learning Unit-5 Decesion Trees & Random Forest.pdfAdityaSoraut
 
How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?Tuan Yang
 
Hyperparameter Tuning
Hyperparameter TuningHyperparameter Tuning
Hyperparameter TuningJon Lederman
 
MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...
MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...
MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...DurgaDevi310087
 
classification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdfclassification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdf321106410027
 
Top 10 Data Science Practioner Pitfalls - Mark Landry
Top 10 Data Science Practioner Pitfalls - Mark LandryTop 10 Data Science Practioner Pitfalls - Mark Landry
Top 10 Data Science Practioner Pitfalls - Mark LandrySri Ambati
 
Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017StampedeCon
 
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsTop 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsSri Ambati
 
Performance Issue? Machine Learning to the rescue!
Performance Issue? Machine Learning to the rescue!Performance Issue? Machine Learning to the rescue!
Performance Issue? Machine Learning to the rescue!Maarten Smeets
 
To bag, or to boost? A question of balance
To bag, or to boost? A question of balanceTo bag, or to boost? A question of balance
To bag, or to boost? A question of balanceAlex Henderson
 
Causal Random Forest
Causal Random ForestCausal Random Forest
Causal Random ForestBong-Ho Lee
 

Similar to Classification Using Decision Trees and RulesChapter 5.docx (20)

Decision Tree Classification Algorithm.pptx
Decision Tree Classification Algorithm.pptxDecision Tree Classification Algorithm.pptx
Decision Tree Classification Algorithm.pptx
 
MACHINE LEARNING - ENTROPY & INFORMATION GAINpptx
MACHINE LEARNING - ENTROPY & INFORMATION GAINpptxMACHINE LEARNING - ENTROPY & INFORMATION GAINpptx
MACHINE LEARNING - ENTROPY & INFORMATION GAINpptx
 
Lecture4.ppt
Lecture4.pptLecture4.ppt
Lecture4.ppt
 
Lecture 5 Decision tree.pdf
Lecture 5 Decision tree.pdfLecture 5 Decision tree.pdf
Lecture 5 Decision tree.pdf
 
Mini datathon - Bengaluru
Mini datathon - BengaluruMini datathon - Bengaluru
Mini datathon - Bengaluru
 
Machine Learning Unit-5 Decesion Trees & Random Forest.pdf
Machine Learning Unit-5 Decesion Trees & Random Forest.pdfMachine Learning Unit-5 Decesion Trees & Random Forest.pdf
Machine Learning Unit-5 Decesion Trees & Random Forest.pdf
 
Mini datathon
Mini datathonMini datathon
Mini datathon
 
How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Hyperparameter Tuning
Hyperparameter TuningHyperparameter Tuning
Hyperparameter Tuning
 
Data discretization
Data discretizationData discretization
Data discretization
 
MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...
MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...
MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...
 
classification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdfclassification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdf
 
Top 10 Data Science Practioner Pitfalls - Mark Landry
Top 10 Data Science Practioner Pitfalls - Mark LandryTop 10 Data Science Practioner Pitfalls - Mark Landry
Top 10 Data Science Practioner Pitfalls - Mark Landry
 
Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017
 
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsTop 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner Pitfalls
 
Performance Issue? Machine Learning to the rescue!
Performance Issue? Machine Learning to the rescue!Performance Issue? Machine Learning to the rescue!
Performance Issue? Machine Learning to the rescue!
 
Chapter 4.pdf
Chapter 4.pdfChapter 4.pdf
Chapter 4.pdf
 
To bag, or to boost? A question of balance
To bag, or to boost? A question of balanceTo bag, or to boost? A question of balance
To bag, or to boost? A question of balance
 
Causal Random Forest
Causal Random ForestCausal Random Forest
Causal Random Forest
 

More from monicafrancis71118

1. Discuss Blockchains potential application in compensation system.docx
1. Discuss Blockchains potential application in compensation system.docx1. Discuss Blockchains potential application in compensation system.docx
1. Discuss Blockchains potential application in compensation system.docxmonicafrancis71118
 
1. Describe the characteristics of the aging process. Explain how so.docx
1. Describe the characteristics of the aging process. Explain how so.docx1. Describe the characteristics of the aging process. Explain how so.docx
1. Describe the characteristics of the aging process. Explain how so.docxmonicafrancis71118
 
1. Dis. 7Should we continue to collect data on race and .docx
1. Dis. 7Should we continue to collect data on race and .docx1. Dis. 7Should we continue to collect data on race and .docx
1. Dis. 7Should we continue to collect data on race and .docxmonicafrancis71118
 
1. Differentiate crisis intervention from other counseling therapeut.docx
1. Differentiate crisis intervention from other counseling therapeut.docx1. Differentiate crisis intervention from other counseling therapeut.docx
1. Differentiate crisis intervention from other counseling therapeut.docxmonicafrancis71118
 
1. Despite our rational nature, our ability to reason well is ofte.docx
1. Despite our rational nature, our ability to reason well is ofte.docx1. Despite our rational nature, our ability to reason well is ofte.docx
1. Despite our rational nature, our ability to reason well is ofte.docxmonicafrancis71118
 
1. Describe the ethical challenges faced by organizations operating .docx
1. Describe the ethical challenges faced by organizations operating .docx1. Describe the ethical challenges faced by organizations operating .docx
1. Describe the ethical challenges faced by organizations operating .docxmonicafrancis71118
 
1. Describe in your own words the anatomy of a muscle.  This sho.docx
1. Describe in your own words the anatomy of a muscle.  This sho.docx1. Describe in your own words the anatomy of a muscle.  This sho.docx
1. Describe in your own words the anatomy of a muscle.  This sho.docxmonicafrancis71118
 
1. Describe how your attitude of including aspects of health literac.docx
1. Describe how your attitude of including aspects of health literac.docx1. Describe how your attitude of including aspects of health literac.docx
1. Describe how your attitude of including aspects of health literac.docxmonicafrancis71118
 
1. Choose a behavior (such as overeating, shopping, Internet use.docx
1. Choose a behavior (such as overeating, shopping, Internet use.docx1. Choose a behavior (such as overeating, shopping, Internet use.docx
1. Choose a behavior (such as overeating, shopping, Internet use.docxmonicafrancis71118
 
1. Case 3-4 Franklin Industries’ Whistleblowing (a GVV Case)Natali.docx
1. Case 3-4 Franklin Industries’ Whistleblowing (a GVV Case)Natali.docx1. Case 3-4 Franklin Industries’ Whistleblowing (a GVV Case)Natali.docx
1. Case 3-4 Franklin Industries’ Whistleblowing (a GVV Case)Natali.docxmonicafrancis71118
 
1. Cryptography is used to protect confidential data in many areas. .docx
1. Cryptography is used to protect confidential data in many areas. .docx1. Cryptography is used to protect confidential data in many areas. .docx
1. Cryptography is used to protect confidential data in many areas. .docxmonicafrancis71118
 
1. Compare and contrast steganography and cryptography.2. Why st.docx
1. Compare and contrast steganography and cryptography.2. Why st.docx1. Compare and contrast steganography and cryptography.2. Why st.docx
1. Compare and contrast steganography and cryptography.2. Why st.docxmonicafrancis71118
 
1. Date September 13, 2017 – September 15, 2017 2. Curr.docx
1. Date September 13, 2017 – September 15, 2017 2. Curr.docx1. Date September 13, 2017 – September 15, 2017 2. Curr.docx
1. Date September 13, 2017 – September 15, 2017 2. Curr.docxmonicafrancis71118
 
1. compare and contrast predictive analytics with prescriptive and d.docx
1. compare and contrast predictive analytics with prescriptive and d.docx1. compare and contrast predictive analytics with prescriptive and d.docx
1. compare and contrast predictive analytics with prescriptive and d.docxmonicafrancis71118
 
1. Creating and maintaining relationships between home and schoo.docx
1. Creating and maintaining relationships between home and schoo.docx1. Creating and maintaining relationships between home and schoo.docx
1. Creating and maintaining relationships between home and schoo.docxmonicafrancis71118
 
1. Compare and contrast Strategic and Tactical Analysis and its .docx
1. Compare and contrast Strategic and Tactical Analysis and its .docx1. Compare and contrast Strategic and Tactical Analysis and its .docx
1. Compare and contrast Strategic and Tactical Analysis and its .docxmonicafrancis71118
 
1. Coalition ProposalVaccination Policy for Infectious Disease P.docx
1. Coalition ProposalVaccination Policy for Infectious Disease P.docx1. Coalition ProposalVaccination Policy for Infectious Disease P.docx
1. Coalition ProposalVaccination Policy for Infectious Disease P.docxmonicafrancis71118
 
1. Company Description and Backgrounda. Weight Watchers was cr.docx
1. Company Description and Backgrounda. Weight Watchers was cr.docx1. Company Description and Backgrounda. Weight Watchers was cr.docx
1. Company Description and Backgrounda. Weight Watchers was cr.docxmonicafrancis71118
 
1. Come up with TWO movie ideas -- as in for TWO screenplays that .docx
1. Come up with TWO movie ideas -- as in for TWO screenplays that .docx1. Come up with TWO movie ideas -- as in for TWO screenplays that .docx
1. Come up with TWO movie ideas -- as in for TWO screenplays that .docxmonicafrancis71118
 
1. Choose a case for the paper that interests you. Most choose a .docx
1. Choose a case for the paper that interests you.  Most choose a .docx1. Choose a case for the paper that interests you.  Most choose a .docx
1. Choose a case for the paper that interests you. Most choose a .docxmonicafrancis71118
 

More from monicafrancis71118 (20)

1. Discuss Blockchains potential application in compensation system.docx
1. Discuss Blockchains potential application in compensation system.docx1. Discuss Blockchains potential application in compensation system.docx
1. Discuss Blockchains potential application in compensation system.docx
 
1. Describe the characteristics of the aging process. Explain how so.docx
1. Describe the characteristics of the aging process. Explain how so.docx1. Describe the characteristics of the aging process. Explain how so.docx
1. Describe the characteristics of the aging process. Explain how so.docx
 
1. Dis. 7Should we continue to collect data on race and .docx
1. Dis. 7Should we continue to collect data on race and .docx1. Dis. 7Should we continue to collect data on race and .docx
1. Dis. 7Should we continue to collect data on race and .docx
 
1. Differentiate crisis intervention from other counseling therapeut.docx
1. Differentiate crisis intervention from other counseling therapeut.docx1. Differentiate crisis intervention from other counseling therapeut.docx
1. Differentiate crisis intervention from other counseling therapeut.docx
 
1. Despite our rational nature, our ability to reason well is ofte.docx
1. Despite our rational nature, our ability to reason well is ofte.docx1. Despite our rational nature, our ability to reason well is ofte.docx
1. Despite our rational nature, our ability to reason well is ofte.docx
 
1. Describe the ethical challenges faced by organizations operating .docx
1. Describe the ethical challenges faced by organizations operating .docx1. Describe the ethical challenges faced by organizations operating .docx
1. Describe the ethical challenges faced by organizations operating .docx
 
1. Describe in your own words the anatomy of a muscle.  This sho.docx
1. Describe in your own words the anatomy of a muscle.  This sho.docx1. Describe in your own words the anatomy of a muscle.  This sho.docx
1. Describe in your own words the anatomy of a muscle.  This sho.docx
 
1. Describe how your attitude of including aspects of health literac.docx
1. Describe how your attitude of including aspects of health literac.docx1. Describe how your attitude of including aspects of health literac.docx
1. Describe how your attitude of including aspects of health literac.docx
 
1. Choose a behavior (such as overeating, shopping, Internet use.docx
1. Choose a behavior (such as overeating, shopping, Internet use.docx1. Choose a behavior (such as overeating, shopping, Internet use.docx
1. Choose a behavior (such as overeating, shopping, Internet use.docx
 
1. Case 3-4 Franklin Industries’ Whistleblowing (a GVV Case)Natali.docx
1. Case 3-4 Franklin Industries’ Whistleblowing (a GVV Case)Natali.docx1. Case 3-4 Franklin Industries’ Whistleblowing (a GVV Case)Natali.docx
1. Case 3-4 Franklin Industries’ Whistleblowing (a GVV Case)Natali.docx
 
1. Cryptography is used to protect confidential data in many areas. .docx
1. Cryptography is used to protect confidential data in many areas. .docx1. Cryptography is used to protect confidential data in many areas. .docx
1. Cryptography is used to protect confidential data in many areas. .docx
 
1. Compare and contrast steganography and cryptography.2. Why st.docx
1. Compare and contrast steganography and cryptography.2. Why st.docx1. Compare and contrast steganography and cryptography.2. Why st.docx
1. Compare and contrast steganography and cryptography.2. Why st.docx
 
1. Date September 13, 2017 – September 15, 2017 2. Curr.docx
1. Date September 13, 2017 – September 15, 2017 2. Curr.docx1. Date September 13, 2017 – September 15, 2017 2. Curr.docx
1. Date September 13, 2017 – September 15, 2017 2. Curr.docx
 
1. compare and contrast predictive analytics with prescriptive and d.docx
1. compare and contrast predictive analytics with prescriptive and d.docx1. compare and contrast predictive analytics with prescriptive and d.docx
1. compare and contrast predictive analytics with prescriptive and d.docx
 
1. Creating and maintaining relationships between home and schoo.docx
1. Creating and maintaining relationships between home and schoo.docx1. Creating and maintaining relationships between home and schoo.docx
1. Creating and maintaining relationships between home and schoo.docx
 
1. Compare and contrast Strategic and Tactical Analysis and its .docx
1. Compare and contrast Strategic and Tactical Analysis and its .docx1. Compare and contrast Strategic and Tactical Analysis and its .docx
1. Compare and contrast Strategic and Tactical Analysis and its .docx
 
1. Coalition ProposalVaccination Policy for Infectious Disease P.docx
1. Coalition ProposalVaccination Policy for Infectious Disease P.docx1. Coalition ProposalVaccination Policy for Infectious Disease P.docx
1. Coalition ProposalVaccination Policy for Infectious Disease P.docx
 
1. Company Description and Backgrounda. Weight Watchers was cr.docx
1. Company Description and Backgrounda. Weight Watchers was cr.docx1. Company Description and Backgrounda. Weight Watchers was cr.docx
1. Company Description and Backgrounda. Weight Watchers was cr.docx
 
1. Come up with TWO movie ideas -- as in for TWO screenplays that .docx
1. Come up with TWO movie ideas -- as in for TWO screenplays that .docx1. Come up with TWO movie ideas -- as in for TWO screenplays that .docx
1. Come up with TWO movie ideas -- as in for TWO screenplays that .docx
 
1. Choose a case for the paper that interests you. Most choose a .docx
1. Choose a case for the paper that interests you.  Most choose a .docx1. Choose a case for the paper that interests you.  Most choose a .docx
1. Choose a case for the paper that interests you. Most choose a .docx
 

Recently uploaded

How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Science lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonScience lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonJericReyAuditor
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerunnathinaik
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxsocialsciencegdgrohi
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfadityarao40181
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 

Recently uploaded (20)

How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Science lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonScience lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lesson
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developer
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdf
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 

Classification Using Decision Trees and RulesChapter 5.docx

  • 1. Classification Using Decision Trees and Rules Chapter 5 Introduction • Decision tree learners use a tree structure to model the relationships among the features and the potential outcomes. • a structure of branching decisions into a final predicted class value • Decision begins at the root node, then passed through decision nodes that require choices. • Choices split the data across branches that indicate potential outcomes of a decision • Tree is terminated by leaf nodes that denote the action to be taken as the result of the series of decisions. Decision Tree Example
  • 2. Benefits • Flowchart-like tree structure is not necessarily exclusively for the learner's internal use. • Resulting structure in a human-readable format. • Provides insight into how and why the model works or doesn't work well for a particular task. • Useful where classification mechanism needs to be transparent for legal reasons, or in case the results need to be shared with others in order to inform future business practices • Credit scoring models where criteria that causes an applicant to be rejected need to be clearly documented and free from bias • Marketing studies of customer behavior such as satisfaction or churn, which will be shared with management or advertising agencies • Diagnosis of medical conditions based on laboratory measurements, symptoms, or the rate of disease progression Applicability • Widely used machine learning technique • Can be applied to model almost any type of data with
  • 3. excellent results • Does not fit task where the data has a large number of nominal features with many levels or it has a large number of numeric features. • Result in large number of decisions and an overly complex tree. • Tendency of decision trees to overfit data, though this can be overcome by adjusting some simple parameters Divide and Conquer • Decision trees are built using a heuristic called recursive partitioning. • Divide and conquer because it splits the data into subsets, which are then split repeatedly into even smaller subsets, • Stops when the data within the subsets are sufficiently homogenous, or another stopping criterion has been met. • Root node represents the entire dataset • Algorithm must choose a feature to split upon • Choose the feature most predictive of the target class. • Algorithm continues to divide and conquer the data, choosing the best candidate feature each time to create another decision node,
  • 4. until a stopping criterion is reached. Divide and Conquer • Stopping Conditions • All (or nearly all) of the examples at the node have the same class • There are no remaining features to distinguish among the examples • The tree has grown to a predefined size limit Example • Finding potential for a movie- Box Office Bust, Mainstream Hit, Critical Success • Diagonal lines might have split the data even more cleanly. • Limitation of the decision tree's knowledge representation, which uses axis-parallel splits. • Each split considers one feature at a time prevents the decision tree from forming more complex decision boundaries Decision Tree
  • 5. Predict: Will John Play Tennis? Predict on D15 Rain High Weak ? Vs. k-NN: Lookup • Difficult to Interpret • Understanding of when John plays • Divide and conquer • Split into subsets • Is it pure? • Yes: Stop • No: Repeat • Understand which subset the data falls into Credit Example From V. Lavrenko Predict: Will John Play Tennis? Pure Subset Pure Subset 4 yes/0 No Split Further 2 yes/3 No Split Further 3 yes/2 No
  • 6. Will John Play Tennis? Will John Play Tennis? Will John Play Tennis? • Output of the Decision Tree • Decision Labels at the Leaf • Balanced Tree not required ID3 • Ross Quinlan (ID3, C4.5) • Breimanetal (CaRT) Choosing the Best Split • Choice of Attribute critically determines quality of the Decision Tree • Outlook : Pure Subset • Wind: Weak better than Strong • Strong completely uncertain • Need subset strongly biased towards positive or negative. • Certainty over Uncertainty: 4/0 as pure as 0/4
  • 7. Entropy • Quantifies the randomness, or disorder, within a set of class values. • High entropy sets are diverse • Find splits that reduce entropy, increasing homogeneity within the groups. • If there are two classes, entropy values can range from 0 to 1. • For n classes, entropy ranges from 0 to log2 (n) • Minimum value indicates that the set is completely homogenous • Maximum value indicates that the data are as diverse as possible, and no group has even a small plurality. Entropy • Interpretation: If X is in S, how many bits are used to represent whether X is positive or negative? • Impure (3 pos/ 3 neg) H(S) = -3/6 log(3/6) – 3/6 log(3/6) = 1/2+1/2 = 1 • Pure (4 pos/0 neg) H(S) = -4/4 log (4/4) – 0/4 log(0/4) = 0
  • 8. Information Gain • Objective of Splits is to produce pure sets • Information Gain arises out of a split (Reduction in Entropy). Gain quantifies the Certainty obtained after a split • Information Gain performed recursively Overfitting in Decision Trees • Always classifies perfectly • Use of singleton classes • Classification of test data problematic • To build an effective decision tree, stop its growth before it becomes more specific Prevent Overfitting • Avoid splitting when statistically insignificant • 2 samples vs. 100 samples • Sub-Tree replacement • Construct validation set out of training set • Remove sub-trees (iterate over intermediate nodes) and test validation set
  • 9. • Greedy approach. Take sub-tree that gives the best result • Pick the tree that gives best results on validation set • Remove sub-trees while validation set result improves Problems with Information Gain • Biased towards attributes with many values • Will not work with new data • Use a Gain Ratio Limitations of ID3 C5.0 and Overfitting • It prunes using reasonable defaults. • Strategy is to post-prune the tree. • First grows a large tree that overfits the training data. • Nodes and branches that have little effect on the classification errors are removed. • Entire branches are moved further up the tree or replaced by simpler decisions.
  • 10. C5.0 Boosting Accuracy • Adaptive boosting: Many decision trees are built and the trees vote on the best class for each example • Combining a number of weak performing learners create a team that is much stronger than any of the learners alone. • Each learner has strengths and weaknesses and they may be better or worse in solving certain problems. Using a combination of several learners with complementary strengths and weaknesses can therefore dramatically improve the accuracy of a classifier. C5.0 Making Mistakes Costly • Allows us to assign a penalty to different types of errors, in order to discourage a tree from making more costly mistakes. • Penalties are designated in a cost matrix • Predicted and actual values take two values, yes or no • Describe a 2 x 2 matrix, using a list of two vectors, each with two values. • Create Error Matrix error_cost <- matrix(c(0, 1, 4, 0), nrow = 2, dimnames = matrix_dimensions)
  • 11. • Usage > credit_cost <- C5.0(credit_train[-17], credit_train$default, costs = error_cost) C5.0 Output Classification Rules • Classification rules represent knowledge in the form of logical if-else statements that assign a class to unlabeled examples. • Conists of an antecedent (combinations of feature values ) and a consequent (class value) "if X, then Y" • Example "if the hard drive is making a clicking sound, then it is about to fail." • Rule learners used similar to decision tree learners • Like decision trees, used for applications that generate knowledge for future action • Identifying conditions that lead to a hardware failure in mechanical devices • Describing the key characteristics of groups of people for customer segmentation • Finding conditions that precede large drops or increases in the
  • 12. prices of shares on the stock market