Decision Tree Induction Using Frequency Tables for Attribute Selection


- 1. Decision Tree Induction: Using Frequency Tables for Attribute Selection
  Nguyễn Dương Trung Dũng
- 2. Content
  1. Calculating Entropy in Practice
  2. Gini Index of Diversity
  3. Inductive Bias
  4. Using Gain Ratio for Attribute Selection
- 3. Calculating Entropy in Practice
  Training Set 1 (age = 1) for lens24
- 4. Calculating Entropy in Practice
  Frequency Table for Attribute age for lens24
  The cells of this table show the number of occurrences of each combination of class and attribute value in the training set.
- 5. Calculating Entropy in Practice
  The value of E_new can be calculated from the frequency table as follows:
  - For every non-zero value V in the main body of the table, subtract V * log2(V).
  - For every non-zero value S in the column sum row, add S * log2(S).
  - Divide the total by N, the number of instances.
  E_new = [(-2*log2(2) - 1*log2(1) - 1*log2(1) - 2*log2(2) - 2*log2(2) - 1*log2(1) - 4*log2(4) - 5*log2(5) - 6*log2(6)) + (8*log2(8) + 8*log2(8) + 8*log2(8))] / 24 = 1.2867
  This agrees with the value calculated previously.
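As a minimal sketch of the frequency-table shortcut described above (the function name and the list-of-columns data layout are my own choices, not from the slides):

```python
import math

def entropy_after_split(freq):
    """E_new from a frequency table; freq is a list of columns,
    one per attribute value, each holding the class counts.
    Shortcut: E_new = (sum of S*log2(S) over the column sums
                       - sum of V*log2(V) over the body) / N."""
    n = sum(sum(col) for col in freq)
    body = sum(v * math.log2(v) for col in freq for v in col if v > 0)
    col_sums = sum(s * math.log2(s) for s in map(sum, freq) if s > 0)
    return (col_sums - body) / n

# Frequency table for attribute age on lens24
# (columns: age = 1, 2, 3; rows: hard, soft, none)
age = [[2, 2, 4], [1, 2, 5], [1, 1, 6]]
print(round(entropy_after_split(age), 4))  # 1.2867
```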
- 6. Gini Index of Diversity
- 7. Gini Index of Diversity
  Training Set 1 (age = 1) for lens24
- 8. Gini Index of Diversity
  If there are K classes, with the probability of the ith class being p_i, the Gini Index is defined as:
  G_start = 1 - sum_{i=1..K} p_i^2
  lens24 dataset: 24 instances, 3 classes (hard: 4, soft: 5, none: 15)
  p_1 = 4/24; p_2 = 5/24; p_3 = 15/24
  G_start = 0.5382
- 9. Gini Index of Diversity
  We can now calculate the new value of the Gini Index as follows:
  - For each non-empty column, form the sum of the squares of the values in the body of the table and divide by the column sum.
  - Add the values obtained for all the columns and divide by N (the number of instances).
  - Subtract the total from 1.
- 10. Gini Index of Diversity
  Frequency Table for Attribute age for lens24
  Age = 1: (2^2 + 2^2 + 4^2)/8 = 3
  Age = 2: (1^2 + 2^2 + 5^2)/8 = 3.75
  Age = 3: (1^2 + 1^2 + 6^2)/8 = 4.75
  G_new = 1 - (3 + 3.75 + 4.75)/24 = 0.5208
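The G_start and G_new calculations above can be sketched as follows (the function names and the list-of-columns layout are my own, not from the slides):

```python
def gini_start(class_counts):
    """Initial Gini Index: 1 - sum of squared class probabilities."""
    n = sum(class_counts)
    return 1 - sum((c / n) ** 2 for c in class_counts)

def gini_after_split(freq):
    """G_new from a frequency table: for each column take
    (sum of squared counts) / (column sum), add these up,
    divide by N, and subtract the result from 1."""
    n = sum(sum(col) for col in freq)
    per_col = sum(sum(v * v for v in col) / sum(col) for col in freq if sum(col) > 0)
    return 1 - per_col / n

# lens24: 4 hard, 5 soft, 15 none
print(round(gini_start([4, 5, 15]), 4))    # 0.5382
# columns: age = 1, 2, 3; rows: hard, soft, none
age = [[2, 2, 4], [1, 2, 5], [1, 1, 6]]
print(round(gini_after_split(age), 4))     # 0.5208
```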
- 11. Gini Index of Diversity
  specRx: G_new = 0.5278, so the reduction is 0.5382 - 0.5278 = 0.0104
  astig: G_new = 0.4653, so the reduction is 0.5382 - 0.4653 = 0.0729
  tears: G_new = 0.3264, so the reduction is 0.5382 - 0.3264 = 0.2118
  The largest reduction in the Gini Index is for attribute tears. This is the same attribute that was selected using entropy.
- 12. Inductive Bias
  Find the next term in the sequence
  1, 4, 9, 16, ?
  Most readers will probably have chosen the answer 25, but this is misguided. The correct answer is 20:
  nth term = (-5n^4 + 50n^3 - 151n^2 + 250n - 120)/24
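The quartic above really does fit the first four terms and only then departs from the squares, which a quick check confirms (the function name is my own):

```python
def nth_term(n):
    # Fits 1, 4, 9, 16 exactly, but gives 20 (not 25) for n = 5
    return (-5 * n**4 + 50 * n**3 - 151 * n**2 + 250 * n - 120) // 24

print([nth_term(n) for n in range(1, 6)])  # [1, 4, 9, 16, 20]
```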
- 13. Inductive Bias
  Inductive bias:
  - A preference for one choice rather than another
  - Determined by external factors, such as our preference for simplicity or familiarity
  - Any formula we use for attribute selection introduces an inductive bias
- 16. Using Gain Ratio for Attribute Selection
  Gain Ratio is used to reduce the effect of the bias resulting from the use of Information Gain.
  Information Gain = E_start - E_new
  Gain Ratio = Information Gain / Split Information
  Split Information is a value based on the column sums: each non-zero column sum s contributes -(s/N)*log2(s/N) to the Split Information.
- 17. Using Gain Ratio for Attribute Selection
  Frequency Table for Attribute age for lens24
  Split Information = -(8/24)*log2(8/24) - (8/24)*log2(8/24) - (8/24)*log2(8/24) = 1.5850
  Gain Ratio = 0.0394/1.5850 = 0.0249
  The Gain Ratios for splitting on attributes specRx, astig, and tears are 0.0395, 0.3770, and 0.5488.
  The largest value is for attribute tears, so in this case Gain Ratio selects the same attribute as entropy.
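A sketch of the Split Information and Gain Ratio calculations above, using the column sums for age and the Information Gain of 0.0394 given in the slides (the function names are my own):

```python
import math

def split_information(col_sums):
    """Each non-zero column sum s contributes -(s/N)*log2(s/N)."""
    n = sum(col_sums)
    return -sum((s / n) * math.log2(s / n) for s in col_sums if s > 0)

def gain_ratio(information_gain, col_sums):
    """Gain Ratio = Information Gain / Split Information."""
    return information_gain / split_information(col_sums)

# Attribute age on lens24: column sums 8, 8, 8; Information Gain 0.0394
print(round(split_information([8, 8, 8]), 4))   # 1.585
print(round(gain_ratio(0.0394, [8, 8, 8]), 4))  # 0.0249
```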
- 18. The end
