The document discusses decision trees and their construction. It begins by presenting the weather dataset with various attribute values and classifications. It then shows the decision tree constructed for this dataset, with internal nodes testing attributes and leaf nodes assigning a 'play' classification. The document goes on to explain the top-down induction of decision trees, considering attributes, stopping criteria, and how the best splitting attribute is determined using measures like information gain and information gain ratio to select the attribute that provides the most information about the target classification. It also addresses handling of numerical attributes through discretization.
Slides for the 2016/2017 edition of the Data Mining and Text Mining Course at the Politecnico di Milano. The course is also part of the joint program with the University of Illinois at Chicago.
3. Prof. Pier Luca Lanzi
The Decision Tree for the Weather Dataset

[Tree: the root tests outlook; the sunny branch leads to a test on humidity (high → No, normal → Yes); the overcast branch is a Yes leaf; the rain branch leads to a test on windy (true → No, false → Yes)]
4. Prof. Pier Luca Lanzi
What is a Decision Tree?
• An internal node is a test on an attribute
• A branch represents an outcome of the test, e.g., outlook = sunny
• A leaf node represents a class label or a class label distribution
• At each node, one attribute is chosen to split the training examples into classes that are as distinct as possible
• A new case is classified by following the matching path down to a leaf node
11. Prof. Pier Luca Lanzi
Computing Decision Trees
• Top-down tree construction
  - Initially, all the training examples are at the root
  - The examples are then recursively partitioned, by choosing one attribute at a time
• Bottom-up tree pruning
  - Remove subtrees or branches, in a bottom-up manner, to improve the estimated accuracy on new cases
12. Prof. Pier Luca Lanzi
Top Down Induction of Decision Trees
function TDIDT(S)   // S: a set of labeled examples
  Tree = new empty node
  if all examples in S have the same class c, or no further splitting is possible then
    // new leaf, labeled with the majority class c
    Label(Tree) = c
  else
    // new decision node
    (A, T) = FindBestSplit(S)   // A: splitting attribute, T: the tests on A
    foreach test t in T do
      St = all examples that satisfy t
      Node_t = TDIDT(St)
      AddEdge(Tree -> Node_t)
    endfor
  endif
  return Tree
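A minimal runnable sketch of this scheme in R, for categorical attributes only; the names (tdidt, info_gain, the "Play" class column) are our own assumptions, and FindBestSplit is instantiated here with the information gain introduced in the following slides:

# Hedged R sketch of TDIDT for categorical attributes (not the official course code)

entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]
  -sum(p * log2(p))                       # entropy in bits
}

# information gain obtained by splitting S on the categorical attribute a
info_gain <- function(S, a, class = "Play") {
  parts <- split(S, S[[a]], drop = TRUE)
  after <- sum(sapply(parts, function(p) nrow(p) / nrow(S) * entropy(p[[class]])))
  entropy(S[[class]]) - after
}

tdidt <- function(S, attrs, class = "Play") {
  labels <- S[[class]]
  # stopping criteria: pure node, or no attributes left -> majority-class leaf
  if (length(unique(labels)) == 1 || length(attrs) == 0)
    return(names(which.max(table(labels))))
  # FindBestSplit: pick the attribute with the largest information gain
  gains <- sapply(attrs, function(a) info_gain(S, a, class))
  if (max(gains) <= 0)                    # nothing to gain by splitting
    return(names(which.max(table(labels))))
  best <- attrs[which.max(gains)]
  # one edge (recursive call) per value of the chosen attribute
  branches <- lapply(split(S, S[[best]], drop = TRUE),
                     function(St) tdidt(St, setdiff(attrs, best), class))
  list(split_on = best, branches = branches)
}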
13. Prof. Pier Luca Lanzi
When Should Building Stop?
• There are several possible stopping criteria
  - All samples at a given node belong to the same class
  - No attributes remain for further partitioning; in this case, majority voting is employed
  - There are no samples left
  - There is nothing to gain by splitting
14. Prof. Pier Luca Lanzi
How is the Splitting Attribute Determined?
15. Prof. Pier Luca Lanzi
What Do We Need?

[Plot: examples arranged along the numerical attribute salary; a threshold k separates the two classes, suggesting the rule IF salary < k THEN not repaid]
16. Prof. Pier Luca Lanzi
Which Attribute for Splitting?
• At each node, the available attributes are evaluated on the basis of how well they separate the classes of the training examples
• A purity (or impurity) measure is used for this purpose
• Information gain: increases with the average purity of the subsets that an attribute produces
• Splitting strategy: choose the attribute that results in the greatest information gain
• Typical goodness functions: information gain (ID3), information gain ratio (C4.5), Gini index (CART)
18. Prof. Pier Luca Lanzi
Computing Information
• Information is measured in bits
  - Given a probability distribution, the information required to predict an event is the distribution's entropy
  - Entropy gives the required information in bits (this can involve fractions of bits!)
• Formula for computing the entropy:

  entropy(p1, …, pn) = −p1·log2(p1) − … − pn·log2(pn)
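For instance, a short R version of the formula (a sketch; the function name is ours), checked against class distributions that appear later in the deck:

# entropy, in bits, of a distribution given as a vector of counts
entropy_counts <- function(counts) {
  p <- counts / sum(counts)
  p <- p[p > 0]
  -sum(p * log2(p))
}

entropy_counts(c(9, 5))   # 0.940 bits: class entropy of the weather data
entropy_counts(c(4, 0))   # 0 bits: a pure node
entropy_counts(c(7, 7))   # 1 bit: the maximum for two classes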
19. Prof. Pier Luca Lanzi
Entropy

[Plot: entropy of a two-class (0/1) distribution as a function of the percentage of elements of class 0; it is 0 when the node is pure (0% or 100%) and maximal, 1 bit, at 50%]
21. Prof. Pier Luca Lanzi
The Attribute "outlook"
• "outlook" = "sunny": info([2,3]) = 0.971 bits
• "outlook" = "overcast": info([4,0]) = 0 bits
• "outlook" = "rainy": info([3,2]) = 0.971 bits
• Expected information for the attribute:

  info([2,3], [4,0], [3,2]) = 5/14 × 0.971 + 4/14 × 0 + 5/14 × 0.971 = 0.693 bits
22. Prof. Pier Luca Lanzi
Information Gain
• Difference between the information before the split and the information after the split:

  gain(A) = info(D) − info_A(D)

• The information before the split, info(D), is the entropy of the class distribution in D
• The information after the split on attribute A is computed as the weighted sum of the entropies of the splits; given n splits,

  info_A(D) = Σ_{j=1..n} |D_j|/|D| × info(D_j)
23. Prof. Pier Luca Lanzi
Information Gain
• Difference between the information before the split and the information after the split
• Information gain for the attributes of the weather data (checked in the R sketch below):
  - gain("outlook") = 0.247 bits
  - gain("temperature") = 0.029 bits
  - gain("humidity") = 0.152 bits
  - gain("windy") = 0.048 bits
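These values can be verified with a short R sketch (the column names are our assumptions, matching the dataset shown later in the deck):

# outlook and play columns of the weather dataset
weather <- data.frame(
  Outlook = c("Sunny","Sunny","Overcast","Rainy","Rainy","Rainy","Overcast",
              "Sunny","Sunny","Rainy","Sunny","Overcast","Overcast","Rainy"),
  Play    = c("No","No","Yes","Yes","Yes","No","Yes",
              "No","Yes","Yes","Yes","Yes","Yes","No"))

entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]
  -sum(p * log2(p))
}

info_gain <- function(S, a, class = "Play") {
  parts <- split(S, S[[a]], drop = TRUE)
  after <- sum(sapply(parts, function(p) nrow(p) / nrow(S) * entropy(p[[class]])))
  entropy(S[[class]]) - after
}

info_gain(weather, "Outlook")   # 0.247 bits, as above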
31. Prof. Pier Luca Lanzi
Another Version of the Weather Dataset
ID Code Outlook Temp Humidity Windy Play
A Sunny Hot High False No
B Sunny Hot High True No
C Overcast Hot High False Yes
D Rainy Mild High False Yes
E Rainy Cool Normal False Yes
F Rainy Cool Normal True No
G Overcast Cool Normal True Yes
H Sunny Mild High False No
I Sunny Cool Normal False Yes
J Rainy Mild Normal False Yes
K Sunny Mild Normal True Yes
L Overcast Mild High True Yes
M Overcast Hot Normal False Yes
N Rainy Mild High True No
32. Prof. Pier Luca Lanzi
Decision Tree for the New Dataset
• Entropy for splitting using “ID Code” is zero, since each leaf node is “pure”
• Information Gain is thus maximal for ID code
33. Prof. Pier Luca Lanzi
Highly-Branching Attributes
• Attributes with a large number of values are usually problematic
• Examples: IDs, primary keys, or almost-primary-key attributes
• Subsets are likely to be pure if there is a large number of values
• Information gain is biased towards choosing attributes with a large number of values
• This may result in overfitting (the selection of an attribute that is non-optimal for prediction)
35. Prof. Pier Luca Lanzi
Information Gain Ratio
• A modification of the information gain that reduces its bias toward highly-branching attributes
• The information gain ratio should be
  - large when data is evenly spread across the branches
  - small when all data belong to one branch
• The information gain ratio takes the number and size of branches into account when choosing an attribute
• It corrects the information gain by taking the intrinsic information of a split into account
36. Prof. Pier Luca Lanzi
Information Gain Ratio and Intrinsic Information
• The intrinsic information computes the entropy of the distribution of instances into branches: for a split of S into S_1, …, S_n,

  intrinsic_info(S, A) = −Σ_{i=1..n} |S_i|/|S| × log2(|S_i|/|S|)

• The information gain ratio normalizes the information gain by the intrinsic information:

  gain_ratio(S, A) = gain(S, A) / intrinsic_info(S, A)
37. Prof. Pier Luca Lanzi
Computing the Information Gain Ratio
• The intrinsic information for "ID code" is

  intrinsic_info([1,1,…,1]) = 14 × (−1/14 × log2(1/14)) = 3.807 bits

• The importance of an attribute decreases as its intrinsic information gets larger
• The information gain ratio of "ID code" is 0.940/3.807 ≈ 0.247 (checked below)
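The same computation as a short R check (a sketch; entropy_counts is the helper used earlier):

entropy_counts <- function(counts) {
  p <- counts / sum(counts)
  p <- p[p > 0]
  -sum(p * log2(p))
}

intrinsic <- entropy_counts(rep(1, 14))   # ID code: 14 singleton branches -> 3.807 bits
gain_id   <- entropy_counts(c(9, 5))      # gain is 0.940 bits, since every branch is pure
gain_id / intrinsic                       # gain ratio ~ 0.247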
38. Prof. Pier Luca Lanzi
Information Gain Ratio for the Weather Data

Attribute    Info   Gain                   Split info             Gain ratio
Outlook      0.693  0.940 − 0.693 = 0.247  info([5,4,5]) = 1.577  0.247/1.577 = 0.156
Temperature  0.911  0.940 − 0.911 = 0.029  info([4,6,4]) = 1.362  0.029/1.362 = 0.021
Humidity     0.788  0.940 − 0.788 = 0.152  info([7,7]) = 1.000    0.152/1.000 = 0.152
Windy        0.892  0.940 − 0.892 = 0.048  info([8,6]) = 0.985    0.048/0.985 = 0.049
39. Prof. Pier Luca Lanzi
More on Information Gain Ratio
• "outlook" still comes out on top; however, "ID code" has a greater information gain ratio
• The standard fix is an ad hoc test that prevents splitting on that type of attribute
• First, consider only the attributes with greater-than-average information gain; then, compare them using the information gain ratio
• The information gain ratio may overcompensate and choose an attribute just because its intrinsic information is very low
41. Prof. Pier Luca Lanzi
The Weather Dataset (Numerical)
Outlook Temp Humidity Windy Play
Sunny 85 85 False No
Sunny 80 90 True No
Overcast 83 78 False Yes
Rainy 70 96 False Yes
Rainy 68 80 False Yes
Rainy 65 70 True No
Overcast 64 65 True Yes
Sunny 72 95 False No
Sunny 69 70 False Yes
Rainy 75 80 False Yes
Sunny 75 70 True Yes
Overcast 72 90 True Yes
Overcast 81 75 False Yes
Rainy 71 80 True No
42. Prof. Pier Luca Lanzi
The Temperature Attribute
• First, sort the temperature values, including the class labels:

  64  65  68  69  70  71  72  72  75  75  80  81  83  85
  Yes No  Yes Yes Yes No  No  Yes Yes Yes No  Yes Yes No

• Then, check all the cut points and choose the one with the best information gain
• E.g., temperature < 71.5 yields 4 yes / 2 no, and temperature ≥ 71.5 yields 5 yes / 3 no:

  info([4,2], [5,3]) = 6/14 × info([4,2]) + 8/14 × info([5,3]) = 0.939 bits

• Split points are placed halfway between values
• All split points can be evaluated in one pass (see the sketch below)
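A hedged R sketch of this one-pass evaluation (variable and function names are ours):

temp <- c(64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85)
play <- c("Yes","No","Yes","Yes","Yes","No","No","Yes","Yes","Yes","No","Yes","Yes","No")

entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# expected information of the binary split: temperature < t vs. temperature >= t
split_info_at <- function(t) {
  left <- play[temp < t]; right <- play[temp >= t]
  (length(left) * entropy(left) + length(right) * entropy(right)) / length(play)
}

split_info_at(71.5)   # 0.939 bits, as computed above

# candidate cut points: halfway between consecutive distinct values
v <- unique(temp)
cuts <- head(v, -1) + diff(v) / 2
cuts[which.min(sapply(cuts, split_info_at))]   # best cut point for temperature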
43. Prof. Pier Luca Lanzi
The Information Gain for Humidity

  Humidity:  65  70  70  70  75  78  80  80  80  85  90  90  95  96
  Play:      Yes No  Yes Yes Yes Yes Yes Yes No  No  No  Yes No  Yes

First sort the attribute values, then compute the information gain for every possible split point (what is the information gain if we split here?)
46. Prof. Pier Luca Lanzi
Discretization
• The process of converting an interval/numerical variable into a finite (discrete) set of elements (labels)
• A discretization algorithm computes a series of cut points which define a set of intervals; the intervals are then mapped into labels
• Motivation: many methods (also in mathematics and computer science) cannot deal with numerical variables; in these cases, discretization is required to be able to use them, despite the loss of information
• Several approaches
  - supervised vs. unsupervised
  - static vs. dynamic
47. Prof. Pier Luca Lanzi
Unsupervised vs. Supervised Discretization
• Unsupervised discretization uses only the attribute values; for instance, discretize humidity as <70, 70–79, ≥80 (see the R sketch after this slide)
• Supervised discretization also uses the class attribute, to generate intervals with lower entropy
• Example using the temperature values:

  64   65   68  69  70   71  72   72   75  75   80   81  83   85
  Yes | No | Yes Yes Yes | No No | Yes | Yes Yes | No | Yes Yes | No

  The values alone may suggest intervals such as <60, 60–70; considering the class value might suggest different intervals, obtained by grouping values so as to maintain information about the class attribute
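For instance, the unsupervised discretization of humidity into <70, 70–79, ≥80 can be sketched in base R with cut(), using the humidity values from the numerical weather dataset above:

humidity <- c(85, 90, 78, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 80)

# unsupervised: fixed cut points, the class labels play no role
cut(humidity, breaks = c(-Inf, 70, 80, Inf), right = FALSE,
    labels = c("<70", "70-79", ">=80"))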
48. Prof. Pier Luca Lanzi
Example of Unsupervised and Supervised Discretization for Humidity

[Figure: side-by-side comparison of an unsupervised and a supervised discretization of the humidity values]
50. Prof. Pier Luca Lanzi
Another Splitting Criterion: The Gini Index
• The Gini index for a data set T containing examples from n classes is defined as

  gini(T) = 1 − Σ_{j=1..n} p_j²

  where p_j is the relative frequency of class j in T
• gini(T) is minimized if the classes in T are skewed
52. Prof. Pier Luca Lanzi
The Gini Index
• If a data set D is split on A into two subsets D1 and D2, then

  gini_A(D) = |D1|/|D| × gini(D1) + |D2|/|D| × gini(D2)

• The reduction of impurity is defined as

  Δgini(A) = gini(D) − gini_A(D)

• The attribute that provides the smallest gini_A(D) (i.e., the largest reduction in impurity) is chosen to split the node; this requires enumerating all the possible splitting points for each attribute
53. Prof. Pier Luca Lanzi
The Gini Index: Example
• D has 9 tuples labeled "yes" and 5 labeled "no", so

  gini(D) = 1 − (9/14)² − (5/14)² = 0.459

• Suppose the attribute income partitions D into D1, with 10 tuples (branching on {low, medium}), and D2, with 4 tuples; gini_{low,medium}(D) is then the weighted sum of the Gini indices of D1 and D2 (sketched below)
• gini_{medium,high} is 0.30 and is thus the best split, since it is the lowest
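A hedged R sketch of these formulas; note that the class counts inside D1 and D2 are not given on the slide, so the 7/3 and 2/2 used below are illustrative assumptions:

gini <- function(counts) {   # counts: number of tuples per class
  p <- counts / sum(counts)
  1 - sum(p^2)
}

gini(c(9, 5))                # gini(D) = 0.459

# assumed class counts per subset: D1 = 7 yes / 3 no, D2 = 2 yes / 2 no
d1 <- c(7, 3); d2 <- c(2, 2)
gini_split <- sum(d1) / 14 * gini(d1) + sum(d2) / 14 * gini(d2)   # gini_A(D)
gini(c(9, 5)) - gini_split   # reduction of impurity for this split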
57. Prof. Pier Luca Lanzi
Avoiding Overfitting in Decision Trees
• The generated tree may overfit the training data
  - Too many branches, some of which may reflect anomalies due to noise or outliers
  - The result is poor accuracy on unseen samples
• Two approaches to avoid overfitting
  - pre-pruning
  - post-pruning
58. Prof. Pier Luca Lanzi
Pre-pruning vs Post-pruning
• Pre-pruning
  - Halt tree construction early
  - Do not split a node if this would result in the goodness measure falling below a threshold
  - It is difficult to choose an appropriate threshold
• Post-pruning
  - Remove branches from a "fully grown" tree
  - Get a sequence of progressively pruned trees
  - Use a set of data different from the training data to decide which is the "best pruned tree"
59. Prof. Pier Luca Lanzi
Pre-Pruning
• Based on a statistical significance test
• Stop growing the tree when there is no statistically significant association between any attribute and the class at a particular node
• Problem with early stopping (halting)
  - No individual attribute may exhibit any interesting information about the class
  - while the structure is only visible in the fully expanded tree
  - in such cases, pre-pruning won't even expand the root node
60. Prof. Pier Luca Lanzi
Post-Pruning
• First, build the full tree; then, prune it
• The fully-grown tree shows all attribute interactions
• Problem: some subtrees might be due to chance effects
• Two pruning operations
  - subtree raising
  - subtree replacement
• Possible strategies
  - error estimation
  - significance testing
  - MDL principle
61. Prof. Pier Luca Lanzi
Subtree Raising
• Delete node
• Redistribute instances
• Slower than subtree replacement
62. Prof. Pier Luca Lanzi
Subtree Replacement
• Works bottom-up
• Consider replacing a tree only after considering all its subtrees
63. Prof. Pier Luca Lanzi
Estimating Error Rates
• Prune only if pruning reduces the estimated error
• The error on the training data is NOT a useful estimator (why? because it would result in very little pruning)
• A hold-out set might be kept for pruning ("reduced-error pruning")
• Example: C4.5's method
  - Derive a confidence interval from the training data
  - Use a heuristic limit, derived from it, for pruning
  - Standard Bernoulli-process-based method
  - Shaky statistical assumptions (since they are based on the training data)
64. Prof. Pier Luca Lanzi
Mean and Variance
• Mean and variance for a Bernoulli trial with success probability p: p and p(1−p)
• The observed error rate f = S/N has mean p and variance p(1−p)/N
• For large enough N, f follows a normal distribution
• The c% confidence interval [−z ≤ X ≤ z] for a random variable X with 0 mean is given by

  Pr[−z ≤ X ≤ z] = c

• With a symmetric distribution,

  Pr[−z ≤ X ≤ z] = 1 − 2 × Pr[X ≥ z]
65. Prof. Pier Luca Lanzi
Confidence Limits
• Confidence limits for the normal distribution with 0 mean and a variance of 1:

  Pr[X ≥ z]   z
  0.1%        3.09
  0.5%        2.58
  1%          2.33
  5%          1.65
  10%         1.28
  20%         0.84
  25%         0.69
  40%         0.25

• Thus, for example,

  Pr[−1.65 ≤ X ≤ 1.65] = 90%

• To use this, we have to reduce our random variable f to have 0 mean and unit variance
66. Prof. Pier Luca Lanzi
C4.5’s Pruning Method
• Given the error f on the training data, the upper bound of the error estimate for a node is computed as

  e = ( f + z²/(2N) + z × √( f/N − f²/N + z²/(4N²) ) ) / ( 1 + z²/N )

• If c = 25%, then z = 0.69 (from the normal distribution)
  - f is the error on the training data
  - N is the number of instances covered by the leaf
67. Prof. Pier Luca Lanzi
[Pruning example: a subtree whose three leaves have training errors f = 0.33 (e = 0.47), f = 0.5 (e = 0.72), and f = 0.33 (e = 0.47); combining the leaf estimates using the ratios 6:2:6 gives 0.51. The parent node has f = 5/14 and e = 0.46; since e < 0.51, the subtree is pruned.]
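A minimal R sketch of the bound from the previous slide, reproducing the numbers of this example (the function name is ours):

# upper bound of the error estimate, with confidence c = 25% (z = 0.69)
err_upper <- function(f, N, z = 0.69) {
  (f + z^2 / (2 * N) + z * sqrt(f / N - f^2 / N + z^2 / (4 * N^2))) /
    (1 + z^2 / N)
}

e1 <- err_upper(2/6, 6)            # 0.47
e2 <- err_upper(1/2, 2)            # 0.72
e3 <- err_upper(2/6, 6)            # 0.47
(6 * e1 + 2 * e2 + 6 * e3) / 14    # combined with ratios 6:2:6 -> 0.51
err_upper(5/14, 14)                # ~0.45 (the slides report 0.46); below 0.51, so prune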
72. Prof. Pier Luca Lanzi
Regression Trees for Prediction
• Decision trees can also be used to predict the value of a numerical target variable
• Regression trees work similarly to decision trees: several splits are attempted, and the one that minimizes impurity is chosen
• Differences from decision trees
  - The prediction is computed as the average of the numerical target variable in the subspace (instead of the majority vote)
  - Impurity is measured by the sum of squared deviations from the leaf mean (instead of information-based measures)
  - The split that produces the greatest separation in [y − E(y)]² is chosen, i.e., the nodes with minimal within-node variance
  - Performance is measured by the RMSE (root mean squared error)
74. Prof. Pier Luca Lanzi
Mining Regression Trees in R
# Regression tree example on the CPU performance dataset
library(rpart)
library(MASS)
data(cpus)

# grow the tree
fit <- rpart(perf ~ syct + mmin + mmax + cach + chmin + chmax,
             method = "anova", data = cpus)

# create a PostScript plot of the tree
post(fit, file = "CPUPerformanceRegressionTree.ps",
     title = "Regression Tree for CPU Performance Dataset")
77. Prof. Pier Luca Lanzi
Summary
• Decision tree construction is a recursive procedure involving
  - the selection of the best splitting attribute
  - (and thus) the selection of an adequate purity measure (information gain, information gain ratio, Gini index)
  - pruning to avoid overfitting
• Information gain is biased toward highly-branching attributes
• The information gain ratio takes the number of splits that an attribute induces into account, but might not be enough
78. Prof. Pier Luca Lanzi
Suggested Homework
• Try building the tree for the nominal version of the weather dataset using the Gini index
• Compute the best upper bound you can for the computational complexity of decision tree building
• Check the problems on the use of information gain, gain ratio, and Gini index given in previous years' exams; solve them and check the results using Weka