Data Applied: Decision

  1. Data-Applied.com: Decision
  2. Introduction
     • Decision trees let you construct decision models.
     • They can be used for forecasting, classification, or decision making.
     • At each branch, the data is split based on a particular field of the data.
     • Decision trees are constructed using a divide-and-conquer technique.
  3. Divide-and-Conquer: Constructing Decision Trees
     Steps to construct a decision tree recursively:
     • Select an attribute to be placed at the root node and make one branch for each of its possible values.
     • Repeat the process recursively at each branch, using only those instances that actually reach the branch.
     • If at any time all instances at a node have the same classification, stop developing that part of the tree.
     Problem: how do we decide which attribute to split on? (A sketch of the recursion follows below.)
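
The recursion can be sketched in Python. This is a rough illustration, not code from the slides: the list-of-dicts instance format, the "class" key, and the `choose` parameter (which stands in for the attribute-selection step developed on the next slides) are all assumptions of this sketch.

```python
from collections import Counter

def build_tree(instances, attributes, choose):
    """Divide-and-conquer decision-tree construction (illustrative sketch).

    instances  -- list of dicts, each carrying a "class" key (assumed format)
    attributes -- attribute names still available for splitting
    choose     -- choose(instances, attributes) picks the split attribute,
                  e.g. by maximum information gain (next slides)
    """
    classes = Counter(inst["class"] for inst in instances)
    if len(classes) == 1 or not attributes:   # pure node, or nothing left to split on
        return classes.most_common(1)[0][0]   # leaf labelled with the (majority) class
    attr = choose(instances, attributes)      # the attribute-selection step
    node = {"split_on": attr, "branches": {}}
    for value in {inst[attr] for inst in instances}:  # one branch per observed value
        subset = [inst for inst in instances if inst[attr] == value]
        rest = [a for a in attributes if a != attr]
        node["branches"][value] = build_tree(subset, rest, choose)
    return node
```
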
  4. Divide-and-Conquer: Constructing Decision Trees
     Steps to find the attribute to split on:
     • Consider every attribute as a candidate and branch the data according to its possible values.
     • For each candidate attribute, calculate the information of the resulting branches, and from it the information gain of the split.
     • Select for the division the attribute that gives the maximum information gain.
     • Repeat until each branch terminates at a node whose information is 0.
  5. Divide-and-Conquer: Constructing Decision Trees
     Calculation of information and gain:
     • For class proportions (P1, P2, ..., Pn) such that P1 + P2 + ... + Pn = 1:
       Information(P1, P2, ..., Pn) = -P1 log2(P1) - P2 log2(P2) - ... - Pn log2(Pn), measured in bits
     • Gain = information before the division - information after the division
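
The two formulas translate directly into Python. A minimal sketch, working from class counts rather than proportions and using base-2 logarithms (consistent with the 0.940-bit figure on the next slides); the helper names `info` and `gain` are mine:

```python
import math

def info(counts):
    """Information (in bits) of a class distribution given as counts, e.g. [9, 5]."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def gain(before, branches):
    """Information gain of a split: info before the division minus the
    size-weighted average info of the branches after it."""
    total = sum(before)
    after = sum(sum(b) / total * info(b) for b in branches)
    return info(before) - after
```
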
  6. Divide-and-Conquer: Constructing Decision Trees
     Example:
     • Here we consider each attribute individually.
     • Each attribute is divided into branches according to its possible values.
     • Below each branch, the number of instances of each class is marked.
  7. Divide-and-Conquer: Constructing Decision Trees
     Calculations:
     Using the formula for information, we initially have:
     • Number of instances with class = Yes: 9
     • Number of instances with class = No: 5
     • So P1 = 9/14 and P2 = 5/14
     • info([9,5]) = -9/14 log2(9/14) - 5/14 log2(5/14) = 0.940 bits
     As an example, consider the Outlook attribute. We observe the following:
  8. Divide-and-Conquer: Constructing Decision Trees
     Example (contd.):
     • Gain from dividing on Outlook = info([9,5]) - info([2,3],[4,0],[3,2]), where the second term is the information of the branches averaged by branch size:
       5/14 x info([2,3]) + 4/14 x info([4,0]) + 5/14 x info([3,2]) = 0.693 bits
     • So Gain(Outlook) = 0.940 - 0.693 = 0.247 bits
     • Gain(Temperature) = 0.029 bits
     • Gain(Humidity) = 0.152 bits
     • Gain(Windy) = 0.048 bits
     • Since Outlook gives the maximum gain, we use it for the division.
     • We then repeat the steps for Outlook = Sunny and Outlook = Rainy, and stop for Outlook = Overcast, since its information is 0.
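
The slide's Outlook numbers can be reproduced with the `info` and `gain` helpers sketched above; the per-branch class counts [2,3], [4,0], [3,2] are the ones given on this slide:

```python
print(round(info([9, 5]), 3))                            # 0.94 bits before the split
print(round(info([2, 3]), 3))                            # 0.971 bits in the Sunny branch
print(round(gain([9, 5], [[2, 3], [4, 0], [3, 2]]), 3))  # 0.247 bits = Gain(Outlook)
```
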
  9. Divide-and-Conquer: Constructing Decision Trees
     Highly branching attributes: the problem
     • If we follow the previously described method, it will always favor the attribute with the largest number of branches.
     • In the extreme case, it will favor an attribute that takes a different value for each instance, such as an identification code.
  10. Divide-and-Conquer: Constructing Decision Trees
      Highly branching attributes: the problem
      • The information after splitting on such an attribute is 0:
        info([0,1]) + info([0,1]) + ... + info([0,1]) = 0
      • It therefore has the maximum gain and will be chosen for branching.
      • But such an attribute is of no use for predicting the class of an unknown instance, nor does it tell us anything about the structure of the division.
      • So we use the gain ratio to compensate for this.
  11. Divide-and-Conquer: Constructing Decision Trees
      Highly branching attributes: gain ratio
      • Gain ratio = gain / split info
      • To calculate the split info, we consider only the number of instances reaching each branch, irrespective of their class.
      • For the identification code with 14 different values:
        split info = info([1,1,...,1]) = 14 x (-1/14 log2(1/14)) = 3.807 bits
      • For Outlook, the split info is:
        info([5,4,5]) = -5/14 log2(5/14) - 4/14 log2(4/14) - 5/14 log2(5/14) = 1.577 bits
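
Continuing the sketch, the split info is just `info` applied to the class-blind branch sizes; the `gain_ratio` helper below is mine:

```python
def gain_ratio(before, branches):
    """Gain ratio = information gain / split info."""
    split_info = info([sum(b) for b in branches])  # branch sizes only, classes ignored
    return gain(before, branches) / split_info

print(round(info([1] * 14), 3))                               # 3.807 bits: identification code
print(round(info([5, 4, 5]), 3))                              # 1.577 bits: Outlook's split info
print(round(gain_ratio([9, 5], [[2, 3], [4, 0], [3, 2]]), 3)) # 0.156 = 0.247 / 1.577 for Outlook
```
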
  12. Decision using Data Applied's web interface
  13. Step 1: Selection of data
  14. Step 2: Selecting Decision
  15. Step 3: Result
  16. Visit more self-help tutorials
      • Pick a tutorial of your choice and browse through it at your own pace.
      • The tutorials section is free, self-guiding, and does not involve any additional support.
      • Visit us at www.dataminingtools.net
