Data Applied: Decision

- 1. Data-Applied.com: Decision
- 2. Introduction
  - Decision trees let you construct decision models.
  - They can be used for forecasting, classification, or decision making.
  - At each branch the data is split on a particular field.
  - Decision trees are constructed using divide-and-conquer techniques.
- 3. Divide-and-Conquer: Constructing Decision Trees
  - Steps to construct a decision tree recursively:
    - Select an attribute to place at the root node and make one branch for each possible value.
    - Repeat the process recursively at each branch, using only those instances that reach the branch.
    - If at any time all instances at a node have the same classification, stop developing that part of the tree.
  - Problem: how to decide which attribute to split on.
- 4. Divide-and-Conquer: Constructing Decision Trees
  - Steps to find the attribute to split on:
    - Consider every available attribute as a candidate and branch on each of its possible values.
    - For each branch, calculate the information, then find the information gain for each candidate attribute.
    - Select the attribute that gives the maximum information gain.
    - Repeat until each branch ends in a node whose information is 0, that is, a node whose instances all share one class.
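The recursive procedure and the attribute-selection steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not Data Applied's implementation: the names `info`, `gain`, and `build_tree` are hypothetical, the five-row `toy` table is invented for the example, and information is measured in bits (base-2 logarithm) as defined on the following slide. The sketch also falls back to the majority class when no attributes remain, a stopping rule the slides do not mention.

```python
from collections import Counter
from math import log2


def info(labels):
    """Information (entropy) in bits of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def gain(rows, attr, target):
    """Information before the split minus the weighted information after it."""
    before = info([r[target] for r in rows])
    after = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        after += len(subset) / len(rows) * info(subset)
    return before - after


def build_tree(rows, attrs, target):
    """Return a class label (leaf) or a nested dict {attribute: {value: subtree}}."""
    labels = [r[target] for r in rows]
    # Stop when every instance has the same class, or (assumption) when no
    # attributes are left, in which case the majority class is returned.
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Split on the attribute with the maximum information gain.
    best = max(attrs, key=lambda a: gain(rows, a, target))
    remaining = [a for a in attrs if a != best]
    return {
        best: {
            value: build_tree([r for r in rows if r[best] == value], remaining, target)
            for value in sorted({r[best] for r in rows})
        }
    }


# Hypothetical toy data, invented for this sketch.
toy = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
tree = build_tree(toy, ["outlook", "windy"], "play")
print(tree)
```

On this toy table, `outlook` has the larger gain and becomes the root; only the `rainy` branch is impure, so it is split again on `windy`.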
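- 5. Divide-and-Conquer: Constructing Decision Trees
  - Calculation of information and gain:
    - For class proportions $(P_1, P_2, \dots, P_n)$ such that $P_1 + P_2 + \dots + P_n = 1$:
    - $\text{Information}(P_1, P_2, \dots, P_n) = -P_1 \log_2 P_1 - P_2 \log_2 P_2 - \dots - P_n \log_2 P_n$ (in bits)
    - Gain = information before division - information after division, where the information after division is the average over the branches, weighted by the number of instances in each branch.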
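The information and gain formulas on this slide can be sketched in Python as follows. This is a minimal illustration under assumptions: the function names `information` and `gain` are hypothetical, logarithms are base 2 so results come out in bits, a zero proportion is taken to contribute nothing (the usual 0 log 0 = 0 convention), and the information after division is the instance-weighted average over the branches.

```python
from math import log2


def information(proportions):
    """Information in bits for class proportions that sum to 1."""
    # Skip zero proportions: 0 * log(0) is taken as 0 by convention.
    return sum(-p * log2(p) for p in proportions if p > 0)


def gain(before, branches):
    """Information before a split minus the weighted information after it.

    before   -- class proportions at the node before the split
    branches -- list of (weight, proportions) pairs, one per branch, where
                weight is the fraction of the node's instances in that branch
    """
    after = sum(weight * information(props) for weight, props in branches)
    return information(before) - after


# A two-class node split evenly between the classes holds 1 bit of information.
print(information([0.5, 0.5]))
# Dividing it into two pure branches removes all of that information.
print(gain([0.5, 0.5], [(0.5, [1.0, 0.0]), (0.5, [0.0, 1.0])]))
```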
- 6. Divide-and-Conquer: Constructing Decision Trees
  - Example:
    - Each attribute is considered individually.
    - Each is divided into branches according to its possible values.
    - Below each branch, the number of instances of each class is marked.
- 7. Divide-and-Conquer: Constructing Decision Trees
  - Calculations:
    - Using the formula for information, the initial values are:
    - Number of instances with class = Yes: 9
    - Number of instances with class = No: 5
    - So $P_1 = 9/14$ and $P_2 = 5/14$.
    - $\text{Info}([9/14, 5/14]) = -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} = 0.940$ bits
    - Next, consider the Outlook attribute as an example.
- 8. Divide-and-Conquer: Constructing Decision Trees
  - Example, continued:
    - Gain from using Outlook for division = info([9,5]) - info([2,3],[4,0],[3,2]) = 0.940 - 0.693 = 0.247 bits
    - Gain(Outlook) = 0.247 bits
    - Gain(Temperature) = 0.029 bits
    - Gain(Humidity) = 0.152 bits
    - Gain(Windy) = 0.048 bits
    - Since Outlook gives the maximum gain, we use it for the division.
    - We repeat the steps for Outlook = Sunny and Outlook = Rainy, and stop for Outlook = Overcast because its information is 0.
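The Outlook calculation can be checked with a few lines of Python. This is a sketch, assuming base-2 logarithms and a weighted average over the branches; the helper names `info` and `info_after_split` are hypothetical, and the class counts are the ones quoted on the slide.

```python
from math import log2


def info(counts):
    """Information in bits for a list of per-class instance counts."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)


def info_after_split(branches):
    """Average information over the branches, weighted by branch size."""
    total = sum(sum(counts) for counts in branches)
    return sum(sum(counts) / total * info(counts) for counts in branches)


before = [9, 5]                      # 9 Yes, 5 No in the full data set
outlook = [[2, 3], [4, 0], [3, 2]]   # class counts per Outlook branch

info_before = info(before)
info_after = info_after_split(outlook)
gain_outlook = info_before - info_after

print(f"before={info_before:.3f} after={info_after:.3f} gain={gain_outlook:.3f}")
```

The computed information after the split is about 0.6935 bits, which prints as 0.694 when rounded to three places; the slide's 0.693 appears to be truncated. The gain agrees with the slide at 0.247 bits.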
- 9. Divide-and-Conquer: Constructing Decision Trees
  - Highly branching attributes: the problem
    - The method described above is biased toward attributes with a large number of branches.
    - In the extreme case it favors an attribute that has a different value for every instance, such as an identification code.
- 10. Divide-and-Conquer: Constructing Decision Trees
  - Highly branching attributes: the problem
    - The information after splitting on such an attribute is 0:
    - info([0,1]) + info([0,1]) + info([0,1]) + ... + info([0,1]) = 0
    - It therefore has the maximum gain and is chosen for branching.
    - But such an attribute is no good for predicting the class of an unknown instance, nor does it tell us anything about the structure of the division.
    - So we use the gain ratio to compensate for this.
- 11. Divide-and-Conquer: Constructing Decision Trees
  - Highly branching attributes: gain ratio
    - Gain ratio = gain / split info
    - To calculate split info, we consider only the number of instances covered by each attribute value, irrespective of class.
    - For the identification code, with 14 different values:
    - $\text{info}([1,1,\dots,1]) = 14 \times \left(-\tfrac{1}{14}\log_2\tfrac{1}{14}\right) = 3.807$ bits
    - For Outlook, the split info is:
    - $\text{info}([5,4,5]) = -\tfrac{5}{14}\log_2\tfrac{5}{14} - \tfrac{4}{14}\log_2\tfrac{4}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} = 1.577$ bits
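The split info and gain ratio figures can be reproduced with the short Python sketch below. The helper names `info` and `gain_ratio` are hypothetical, logarithms are assumed to be base 2, and the gains passed in (0.247 bits for Outlook, and 0.940 bits for the identification code, whose information after splitting is 0) are taken from the earlier slides.

```python
from math import log2


def info(counts):
    """Information in bits for a list of instance counts."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)


def gain_ratio(gain, branch_sizes):
    """Gain divided by split info; split info ignores the class labels."""
    return gain / info(branch_sizes)


split_outlook = info([5, 4, 5])   # instances per Outlook value
split_id = info([1] * 14)         # one instance per identification code

ratio_outlook = gain_ratio(0.247, [5, 4, 5])
ratio_id = gain_ratio(0.940, [1] * 14)

print(f"split info: outlook={split_outlook:.3f} id={split_id:.3f}")
print(f"gain ratio: outlook={ratio_outlook:.3f} id={ratio_id:.3f}")
```

With these numbers the identification code's lead shrinks from 0.940 against 0.247 (plain gain) to roughly 0.247 against 0.157 (gain ratio), so in this example its advantage is reduced rather than eliminated.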
- 12. Decision using Data Applied’s web interface
- 13. Step 1: Selection of data
- 14. Step 2: Selecting Decision
- 15. Step 3: Result
- 16. Visit more self-help tutorials
  - Pick a tutorial of your choice and browse through it at your own pace.
  - The tutorials section is free, self-guiding and will not involve any additional support.
  - Visit us at www.dataminingtools.net
