
- 1. An Algorithm for Building Decision Trees (C4.5)
1. Let T be the set of training instances.
2. Choose the attribute that best differentiates the instances in T.
3. Create a tree node whose value is the chosen attribute.
   - Create child links from this node, where each link represents a unique value of the chosen attribute.
   - Use the child link values to further subdivide the instances into subclasses.
4. For each subclass created in step 3:
   - If the instances in the subclass satisfy predefined criteria, or if the set of remaining attribute choices for this path is null, specify the classification for new instances following this decision path.
   - If the subclass does not satisfy the criteria and there is at least one attribute with which to further subdivide the path of the tree, let T be the current set of subclass instances and return to step 2.
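The steps above can be sketched in Python. This is a minimal illustration, not Quinlan's full C4.5: it splits on plain information gain rather than gain ratio, and handles only categorical attributes; all function names are my own.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels: sum over classes of -p * log2(p)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    """Step 2: pick the attribute with the highest information gain."""
    base = entropy(labels)
    def gain(attr):
        remainder = 0.0
        for value in set(row[attr] for row in rows):
            subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return base - remainder
    return max(attributes, key=gain)

def build_tree(rows, labels, attributes):
    """Steps 1-4: recursively subdivide T until a leaf criterion holds."""
    # Step 4 stopping criteria: pure subclass, or no attributes left.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]   # leaf: majority class
    attr = best_attribute(rows, labels, attributes)   # step 2
    node = {attr: {}}                                 # step 3: node for the attribute
    remaining = [a for a in attributes if a != attr]
    for value in set(row[attr] for row in rows):      # one child link per value
        sub_rows = [r for r in rows if r[attr] == value]
        sub_labels = [lab for r, lab in zip(rows, labels) if r[attr] == value]
        node[attr][value] = build_tree(sub_rows, sub_labels, remaining)
    return node
```

The tree comes back as nested dicts, `{attribute: {value: subtree}}`, with class-label strings at the leaves.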
- 2. Entropy Example
Given a set R of objects,
Entropy(R) = Σ (-p(I) log2 p(I))
where p(I) is the proportion of set R that belongs to class I, and the sum runs over the classes I.
An example: if set R is a collection of 14 objects, 9 of which belong to class A and 5 to class B, then
Entropy(R) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
For a two-class problem, the range of entropy is from 0 (perfectly classified) to 1 (totally random).
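The arithmetic above can be checked directly; this is a small sketch, and the helper name `entropy` is my own.

```python
import math

def entropy(proportions):
    """Entropy(R) = sum over classes I of -p(I) * log2 p(I)."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)

# 14 objects: 9 in class A (p = 9/14), 5 in class B (p = 5/14)
print(f"{entropy([9/14, 5/14]):.3f}")  # 0.940
```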
- 3. Information Gain Example
An actual example:
Suppose there are 14 objects in set R: 9 belong to the class Evil, 5 to the class Good.
Suppose that each object has an attribute Size, and Size can be either Big or Small.
Of these 14 objects, 8 have Size = Big and 6 have Size = Small.
Of the 8 objects with Size = Big, 6 are Evil and 2 are Good.
Of the 6 objects with Size = Small, 3 are Evil and 3 are Good.
Entropy(R_Big) = -(6/8) log2(6/8) - (2/8) log2(2/8) = 0.811
Entropy(R_Small) = -(3/6) log2(3/6) - (3/6) log2(3/6) = 1.00
Then the information gain due to splitting R by attribute Size is:
Gain(R, Size) = Entropy(R) - (8/14) * Entropy(R_Big) - (6/14) * Entropy(R_Small)
              = 0.940 - (8/14) * 0.811 - (6/14) * 1.00
              = 0.048
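The same calculation, working from raw class counts (a sketch; the helper name `entropy` is my own):

```python
import math

def entropy(counts):
    """Entropy from raw class counts, e.g. [9, 5]."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

e_R     = entropy([9, 5])  # whole set: 9 Evil, 5 Good    -> ~0.940
e_big   = entropy([6, 2])  # Size = Big: 6 Evil, 2 Good   -> ~0.811
e_small = entropy([3, 3])  # Size = Small: 3 Evil, 3 Good -> 1.0

gain = e_R - (8/14) * e_big - (6/14) * e_small
print(f"{gain:.3f}")  # 0.048
```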
- 4. Which attribute to use as split point for a node in decision tree?
At the node, calculate the information gain for each attribute.
Choose the attribute that has the highest information gain, and use that as the split point.
In the preceding example, the attribute Size has only two possible values.
Often, an attribute can have more than two possible values, and we'd have to adapt the formula accordingly: the weighted entropy sum simply runs over all of the attribute's values.
- 5. A Decision Tree Example
The weather data example.
- 6. Information Gained by Knowing the Result of a Decision
In the weather data example, there are 9 instances for which the decision to play is "yes" and 5 instances for which the decision to play is "no". The information gained by knowing the result of the decision is then
-(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940 bits.
- 7. Information Further Required If "Outlook" Is Placed at the Root
Outlook
  - sunny: yes, yes, no, no, no
  - overcast: yes, yes, yes, yes
  - rainy: yes, yes, yes, no, no
- 8. Information Gained by Placing Each of the 4 Attributes
Gain(outlook) = 0.940 bits - 0.693 bits = 0.247 bits
Gain(temperature) = 0.029 bits
Gain(humidity) = 0.152 bits
Gain(windy) = 0.048 bits
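All four gains can be reproduced in a few lines. The table below is assumed to be Quinlan's classic 14-instance play-tennis data, which matches the yes/no counts on these slides:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

# Quinlan's weather data: (outlook, temperature, humidity, windy, play)
data = [
    ("sunny",    "hot",  "high",   False, "no"),
    ("sunny",    "hot",  "high",   True,  "no"),
    ("overcast", "hot",  "high",   False, "yes"),
    ("rainy",    "mild", "high",   False, "yes"),
    ("rainy",    "cool", "normal", False, "yes"),
    ("rainy",    "cool", "normal", True,  "no"),
    ("overcast", "cool", "normal", True,  "yes"),
    ("sunny",    "mild", "high",   False, "no"),
    ("sunny",    "cool", "normal", False, "yes"),
    ("rainy",    "mild", "normal", False, "yes"),
    ("sunny",    "mild", "normal", True,  "yes"),
    ("overcast", "mild", "high",   True,  "yes"),
    ("overcast", "hot",  "normal", False, "yes"),
    ("rainy",    "mild", "high",   True,  "no"),
]
attrs = ["outlook", "temperature", "humidity", "windy"]
rows = [dict(zip(attrs + ["play"], r)) for r in data]

def gain(attr):
    """Information gain of splitting the whole set by `attr`."""
    labels = [r["play"] for r in rows]
    g = entropy(labels)
    for v in set(r[attr] for r in rows):
        subset = [r["play"] for r in rows if r[attr] == v]
        g -= len(subset) / len(rows) * entropy(subset)
    return g

for a in attrs:
    print(f"Gain({a}) = {gain(a):.3f} bits")
# Gain(outlook) = 0.247, Gain(temperature) = 0.029,
# Gain(humidity) = 0.152, Gain(windy) = 0.048
```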
- 9. The Strategy for Selecting an Attribute to Place at a Node
Select the attribute that gives us the largest information gain.
In this example, it is the attribute "Outlook".
Outlook
  - sunny: 2 "yes", 3 "no"
  - overcast: 4 "yes"
  - rainy: 3 "yes", 2 "no"
- 10. The Recursive Procedure for Constructing a Decision Tree
The operation discussed above is applied to each branch recursively to construct the decision tree.
For example, for the branch "Outlook = sunny", we evaluate the information gained by applying each of the remaining 3 attributes:
Gain(Outlook=sunny; Temperature) = 0.971 - 0.4 = 0.571
Gain(Outlook=sunny; Humidity) = 0.971 - 0 = 0.971
Gain(Outlook=sunny; Windy) = 0.971 - 0.951 = 0.02
- 11. Similarly, we also evaluate the information gained by applying each of the remaining 3 attributes for the branch "Outlook = rainy":
Gain(Outlook=rainy; Temperature) = 0.971 - 0.951 = 0.02
Gain(Outlook=rainy; Humidity) = 0.971 - 0.951 = 0.02
Gain(Outlook=rainy; Windy) = 0.971 - 0 = 0.971
- 12. Over-fitting and Pruning
If we recursively build the decision tree on our training set until every leaf is totally classified, we have most likely over-fitted the data.
To avoid over-fitting, we need to set aside part of the training data to test the decision tree, and prune (delete) the branches that give poor predictions.
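One concrete way to do this is reduced-error pruning: walk the tree bottom-up and replace a subtree with a majority-class leaf whenever that does not lower accuracy on the held-out set. A sketch, assuming the nested-dict tree shape `{attribute: {value: subtree}}`; the helper names are my own:

```python
from collections import Counter

def predict(tree, row, default="?"):
    """Follow the decision path; a tree is a leaf label or {attr: {value: subtree}}."""
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))
        tree = branches.get(row.get(attr), default)
    return tree

def accuracy(tree, rows, labels):
    return sum(predict(tree, r) == lab for r, lab in zip(rows, labels)) / len(labels)

def prune(tree, rows, labels):
    """Reduced-error pruning against a held-out set (rows, labels)."""
    if not isinstance(tree, dict) or not rows:
        return tree
    attr, branches = next(iter(tree.items()))
    # Prune children first (bottom-up).
    for value, subtree in branches.items():
        sub_rows = [r for r in rows if r.get(attr) == value]
        sub_labels = [lab for r, lab in zip(rows, labels) if r.get(attr) == value]
        branches[value] = prune(subtree, sub_rows, sub_labels)
    # Try collapsing this node into a single majority-class leaf.
    leaf = Counter(labels).most_common(1)[0][0]
    leaf_acc = sum(lab == leaf for lab in labels) / len(labels)
    if leaf_acc >= accuracy(tree, rows, labels):
        return leaf
    return tree
```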
- 13. The Over-fitting Issue
Over-fitting is caused by creating decision rules that work accurately on the training set but are based on an insufficient number of samples.
As a result, these decision rules may not work well in more general cases.
- 15. Evaluation
Training accuracy
  - How many training instances can be correctly classified based on the available data?
  - It is high when the tree is deep/large, or when there is little conflict among the training instances.
  - However, higher training accuracy does not imply good generalization.
Testing accuracy
  - Given a number of new instances, how many of them can we correctly classify?
Cross validation
- 16. A partial decision tree with root node = income range<br />
- 17. A partial decision tree with root node = credit card insurance<br />
- 18. A three-node decision tree for the credit card database<br />
