Data-Applied.com: Decision Trees
Introduction
Decision trees let you construct decision models. They can be used for forecasting, classification, or decision making. At each branch, the data is split based on a particular field of the data. Decision trees are constructed using divide-and-conquer techniques.
Divide-and-Conquer: Constructing Decision Trees
Steps to construct a decision tree recursively:
Select an attribute to place at the root node and make one branch for each possible value.
Repeat the process recursively at each branch, using only those instances that reach the branch.
If at any time all instances at a node have the same classification, stop developing that part of the tree.
Problem: how do we decide which attribute to split on?
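A minimal sketch of this recursion in Python, assuming each instance is a dict of attribute values with its class label under a "class" key; the choose argument stands in for the attribute-selection rule developed on the next slides:

```python
from collections import Counter

def build_tree(rows, attributes, choose, target="class"):
    """Recursively build a decision tree (divide and conquer).

    rows:       list of dicts mapping attribute name -> value
    attributes: attribute names still available for splitting
    choose:     function (rows, attributes, target) -> attribute to split on
    target:     key holding the class label
    """
    classes = [row[target] for row in rows]
    if len(set(classes)) == 1:
        # All instances at this node have the same classification: stop.
        return classes[0]
    if not attributes:
        # Nothing left to split on: predict the majority class.
        return Counter(classes).most_common(1)[0][0]
    best = choose(rows, attributes, target)   # attribute placed at this node
    branches = {}
    # One branch per observed value, built from the instances that reach it.
    for value in {row[best] for row in rows}:
        subset = [row for row in rows if row[best] == value]
        remaining = [a for a in attributes if a != best]
        branches[value] = build_tree(subset, remaining, choose, target)
    return {best: branches}
```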
Divide-and-Conquer: Constructing Decision Trees
Steps to find the attribute to split on:
Consider every available attribute as a candidate and branch the data according to its possible values.
For each candidate attribute, calculate the Information of the resulting branches, and from it the Information gain for that attribute.
Select for division the attribute that gives the maximum Information gain.
Repeat until each branch terminates at a node with Information = 0.
Divide-and-Conquer: Constructing Decision Trees
Calculation of Information and Gain:
For class proportions (P1, P2, ..., Pn) such that P1 + P2 + ... + Pn = 1:
Information(P1, P2, ..., Pn) = -P1 log P1 - P2 log P2 - ... - Pn log Pn
Gain = Information before division - Information after division
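A minimal Python sketch of these formulas (log base 2, so values are in bits; the function names are ours, and the information after division is taken as the branch-size-weighted average, which reproduces the 0.693 figure used two slides on):

```python
import math

def information(proportions):
    """Information(P1, ..., Pn) = -P1 log P1 - ... - Pn log Pn, in bits."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)

def info_from_counts(counts):
    """Information of a node from raw class counts, e.g. [9, 5]."""
    total = sum(counts)
    return information([c / total for c in counts])

def gain(parent_counts, child_counts_list):
    """Information before division minus the weighted information after."""
    total = sum(parent_counts)
    after = sum(sum(child) / total * info_from_counts(child)
                for child in child_counts_list)
    return info_from_counts(parent_counts) - after
```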
Divide-and-Conquer: Constructing Decision Trees
Example:
Here we consider each attribute individually. Each is divided into branches according to its different possible values, and below each branch the number of instances of each class is marked.
Divide-and-Conquer: Constructing Decision Trees
Calculations:
Using the formula for Information, initially we have:
Number of instances with class = Yes: 9
Number of instances with class = No: 5
So we have P1 = 9/14 and P2 = 5/14.
Info(9/14, 5/14) = -9/14 log(9/14) - 5/14 log(5/14) = 0.940 bits
Now, as an example, consider the Outlook attribute, whose branches give the class counts [2,3] (sunny), [4,0] (overcast), and [3,2] (rainy).
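This initial value can be checked with the information helper sketched above:

```python
>>> print(f"{info_from_counts([9, 5]):.3f} bits")   # P1 = 9/14, P2 = 5/14
0.940 bits
```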
Divide-and-Conquer: Constructing Decision Trees
Example contd.
Gain from using Outlook for division = info([9,5]) - info([2,3],[4,0],[3,2]) = 0.940 - 0.693 = 0.247 bits
Gain(outlook) = 0.247 bits
Gain(temperature) = 0.029 bits
Gain(humidity) = 0.152 bits
Gain(windy) = 0.048 bits
Since Outlook gives the maximum gain, we use it for division.
We then repeat the steps for Outlook = Sunny and Outlook = Rainy, and stop for Overcast, since its Information = 0.
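The Outlook figure can be reproduced with the earlier helpers, and the same computation yields a choose function for the build_tree sketch above (branch_counts and choose_attribute are our own names):

```python
from collections import Counter

def branch_counts(rows, attr, target="class"):
    """Class counts within each branch produced by splitting on attr."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], Counter())[row[target]] += 1
    return [list(c.values()) for c in groups.values()]

def choose_attribute(rows, attributes, target="class"):
    """Selection rule for build_tree: maximize information gain."""
    parent = list(Counter(row[target] for row in rows).values())
    return max(attributes,
               key=lambda a: gain(parent, branch_counts(rows, a, target)))

# The Outlook figure, computed directly from its branch counts:
print(f"{gain([9, 5], [[2, 3], [4, 0], [3, 2]]):.3f} bits")  # 0.247 bits
```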
Divide-and-Conquer: Constructing Decision Trees
Highly branching attributes: the problem
If we follow the previously described method, it will always favor the attribute with the largest number of branches. In extreme cases it will favor an attribute that has a different value for each instance, such as an identification code.
Divide-and-Conquer: Constructing Decision Trees
Highly branching attributes: the problem
The Information after dividing on such an attribute is 0:
info([0,1]) + info([0,1]) + ... + info([0,1]) = 0
It will therefore have the maximum gain and will be chosen for branching. But such an attribute is of no use for predicting the class of an unknown instance, nor does it tell us anything about the structure of the division. So we use the gain ratio to compensate for this.
Divide-and-Conquer: Constructing Decision Trees
Highly branching attributes: gain ratio
Gain ratio = gain / split info
To calculate the split info, we consider only the number of instances covered by each attribute value, irrespective of their classes. For an identification code with 14 distinct values we have:
info([1,1,...,1]) = -1/14 log(1/14) x 14 = 3.807
For Outlook, with branch sizes 5, 4, and 5, the split info is:
info([5,4,5]) = -5/14 log(5/14) - 4/14 log(4/14) - 5/14 log(5/14) = 1.577
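Both split-info values, and the resulting gain ratio for Outlook, can be checked with a small extension of the earlier sketch (split_info and gain_ratio are our own names):

```python
def split_info(branch_sizes):
    """Information computed over branch sizes alone, ignoring class labels."""
    total = sum(branch_sizes)
    return information([n / total for n in branch_sizes])

def gain_ratio(parent_counts, child_counts_list):
    """Gain divided by split info, penalizing highly branching attributes."""
    sizes = [sum(child) for child in child_counts_list]
    return gain(parent_counts, child_counts_list) / split_info(sizes)

print(f"{split_info([1] * 14):.3f}")    # identification code: 3.807
print(f"{split_info([5, 4, 5]):.3f}")   # Outlook: 1.577
print(f"{gain_ratio([9, 5], [[2, 3], [4, 0], [3, 2]]):.3f}")  # Outlook: 0.156
```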
Decision Trees using Data Applied's web interface
Step 1: Selection of data
Step 2: Selecting Decision Trees
Step 3: Result
Visit more self-help tutorials
Pick a tutorial of your choice and browse through it at your own pace.

The tutorials section is free, self-guiding, and will not involve any additional support.
Visit us at www.dataminingtools.net