Data Applied: Decision

Transcript

  • 1. Data-Applied.com: Decision
  • 2. Introduction
    Decision trees let you construct decision models
    They can be used for forecasting, classification, or decision making
    At each branch, the data is split based on a particular field (attribute) of the data
    Decision trees are constructed using divide-and-conquer techniques
  • 3. Divide-and-Conquer: Constructing Decision Trees
    Steps to construct a decision tree recursively:
    Select an attribute to be placed at the root node and make one branch for each of its possible values
    Repeat the process recursively at each branch, using only those instances that reach that branch
    If at any time all instances at a node have the same classification, stop developing that part of the tree
    Problem: how do we decide which attribute to split on? (a minimal sketch of the recursion is given below)
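    The following is a minimal Python sketch of this divide-and-conquer recursion, under assumed conventions (each instance is a dict of attribute values plus a "class" key); the attribute-selection step is passed in as a function, since the gain-based choice is only introduced on the next slides. It is an illustration, not Data Applied's implementation.

      # Sketch: recursive divide-and-conquer construction of a decision tree.
      # Assumes each instance is a dict mapping attribute -> value, plus a "class" key.
      from collections import Counter

      def build_tree(instances, attributes, choose_attribute):
          classes = [inst["class"] for inst in instances]
          # Stop: all instances at this node share the same classification.
          if len(set(classes)) == 1:
              return classes[0]
          # Stop: no attributes left to split on; return the majority class.
          if not attributes:
              return Counter(classes).most_common(1)[0][0]
          # Place the selected attribute at the root of this subtree.
          attr = choose_attribute(instances, attributes)
          node = {"attribute": attr, "branches": {}}
          remaining = [a for a in attributes if a != attr]
          # One branch per possible value, built from the instances that reach it.
          for value in set(inst[attr] for inst in instances):
              subset = [inst for inst in instances if inst[attr] == value]
              node["branches"][value] = build_tree(subset, remaining, choose_attribute)
          return node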
  • 4. Divide-and-Conquer: Constructing Decision Trees
    Steps to find the attribute to split on:
    We consider each possible attribute as a candidate and branch the data according to its different possible values
    For each candidate attribute we calculate the information of the resulting branches and then compute its information gain
    Select for the division the attribute that gives the maximum information gain
    Repeat this until each branch terminates at a node whose information is 0 (a sketch of this selection step follows below)
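    Below is a hedged Python sketch of this selection step (the function names and the dict-based data layout are assumptions for illustration); it can serve as the choose_attribute function used in the earlier sketch.

      # Sketch: choosing the split attribute by maximum information gain.
      from collections import Counter
      from math import log2

      def information(class_counts):
          # Information (entropy) in bits for a list of class counts, e.g. [9, 5].
          total = sum(class_counts)
          return -sum(c / total * log2(c / total) for c in class_counts if c > 0)

      def info_gain(instances, attr):
          # Information before the division minus the weighted information after it.
          before = information(list(Counter(i["class"] for i in instances).values()))
          after = 0.0
          for value in set(i[attr] for i in instances):
              subset = [i for i in instances if i[attr] == value]
              counts = list(Counter(i["class"] for i in subset).values())
              after += len(subset) / len(instances) * information(counts)
          return before - after

      def choose_attribute(instances, attributes):
          # Pick the attribute whose division gives the maximum information gain.
          return max(attributes, key=lambda a: info_gain(instances, a))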
  • 5. Divide-and-Conquer: Constructing Decision Trees
    Calculation of Information and Gain:
    For data (P1, P2, P3, ..., Pn) such that P1 + P2 + P3 + ... + Pn = 1
    Information(P1, P2, ..., Pn) = -P1 log P1 - P2 log P2 - P3 log P3 - ... - Pn log Pn (logs are base 2, so the result is measured in bits; for example, an even two-way split gives 1 bit)
    Gain = information before division - information after division
  • 6. Divide-and-Conquer: Constructing Decision Trees
    Example:
    Here we consider each attribute individually
    Each attribute is divided into branches according to its different possible values
    Below each branch, the number of instances of each class is marked
  • 7. Divide-and-Conquer: Constructing Decision Trees
    Calculations:
    Using the formula for information, initially we have:
    Number of instances with class = Yes is 9
    Number of instances with class = No is 5
    So we have P1 = 9/14 and P2 = 5/14
    Info([9/14, 5/14]) = -(9/14) log(9/14) - (5/14) log(5/14) = 0.940 bits (reproduced by the short check below)
    Now, as an example, consider the Outlook attribute; its branches give the class counts used on the next slide
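    The initial information value can be reproduced with a short check (math.log2 gives the base-2 logarithm used throughout):

      # Check: information of the full data set with 9 "yes" and 5 "no" instances.
      from math import log2

      p_yes, p_no = 9 / 14, 5 / 14
      info = -p_yes * log2(p_yes) - p_no * log2(p_no)
      print(round(info, 3))  # 0.94 bits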
  • 8. Divide-and-Conquer: Constructing Decision Trees
    Example Contd.
    Gain by using Outlook for division = info([9,5]) – info([2,3],[4,0],[3,2])
    = 0.940 – 0.693 = 0.247 bits
    Gain (outlook) = 0.247 bits
    Gain (temperature) = 0.029 bits
    Gain (humidity) = 0.152 bits
    Gain (windy) = 0.048 bits
    Since Outlook gives the maximum gain, we will use it for the division (the short check below reproduces this value)
    We then repeat the steps for Outlook = Sunny and Outlook = Rainy, and stop for Overcast since its information is 0
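    A quick check of the Outlook gain, with the weighted information after the division taken over the Sunny, Overcast and Rainy branches:

      # Check: gain of splitting on Outlook = info([9,5]) - weighted info of the branches.
      from math import log2

      def information(counts):
          total = sum(counts)
          return -sum(c / total * log2(c / total) for c in counts if c > 0)

      branches = [[2, 3], [4, 0], [3, 2]]    # class counts for Sunny, Overcast, Rainy
      total = sum(sum(b) for b in branches)  # 14 instances in total
      # Weighted information after the division, roughly 0.69 bits.
      after = sum(sum(b) / total * information(b) for b in branches)
      gain = information([9, 5]) - after
      print(round(gain, 3))  # 0.247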
  • 9. Divide-and-Conquer: Constructing Decision Trees
    Highly branching attributes: The problem
    If we follow the previously described method, it will always favor an attribute with the largest number of branches
    In extreme cases it will favor an attribute that has a different value for each instance, such as an identification code
  • 10. Divide-and-Conquer: Constructing Decision Trees
    Highly branching attributes: The problem
    Information for such an attribute is 0
    info([0,1]) + info([0,1]) + info([0,1]) + ... + info([0,1]) = 0
    It will hence have the maximum gain and will be chosen for branching
    But such an attribute is not good for predicting the class of an unknown instance, nor does it tell us anything about the structure of the division
    So we use the gain ratio to compensate for this
  • 11. Divide-and-Conquer: Constructing Decision Trees
    Highly branching attributes: Gain ratio
    Gain ratio = gain/split info
    To calculate the split info, we consider only the number of instances covered by each attribute value, irrespective of the class
    So for the identification code, with 14 different values, we have:
    info([1,1,1,...,1]) = -(1/14) x log(1/14) x 14 = 3.807
    For Outlook the split info is:
    info([5,4,5]) = -(5/14) x log(5/14) - (4/14) x log(4/14) - (5/14) x log(5/14) = 1.577 (both values are reproduced by the short check below)
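    A short check of the split info values and of the resulting gain ratio for Outlook, using the gain of 0.247 bits from the earlier slide:

      # Check: split info and gain ratio for the identification code and Outlook splits.
      from math import log2

      def information(counts):
          total = sum(counts)
          return -sum(c / total * log2(c / total) for c in counts if c > 0)

      split_info_id = information([1] * 14)        # 14 branches with one instance each
      split_info_outlook = information([5, 4, 5])  # 5 Sunny, 4 Overcast, 5 Rainy instances
      print(round(split_info_id, 3))               # 3.807
      print(round(split_info_outlook, 3))          # 1.577
      # Gain ratio = gain / split info, here for Outlook.
      print(round(0.247 / split_info_outlook, 3))  # 0.157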
  • 12. Decision using Data Applied’s web interface
  • 13. Step 1: Selection of data
  • 14. Step 2: Selecting Decision
  • 15. Step 3: Result
  • 16. Visit more self-help tutorials
    Pick a tutorial of your choice and browse through it at your own pace.
  • 17. The tutorials section is free, self-guiding and will not involve any additional support.
  • 18. Visit us at www.dataminingtools.net
