CART – Classification & Regression Trees
1. ACTIVE LEARNING ASSIGNMENT
FOR THE SUBJECT
“DATA MINING & BUSINESS INTELLIGENCE”
Guided By:
Mitali Sonar
Prepared By:
Hemant H. Chetwani
(130410107010 LY CE-II)
5. Classification
Classification is a data mining technique that assigns
data records to predefined groups or classes and can
reveal new patterns in the data.
For example, you may wish to use classification to
predict whether the weather on a particular day will
be “sunny”, “rainy”, or “cloudy”.
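The weather example above can be sketched with scikit-learn's `DecisionTreeClassifier` (a CART-style implementation). The feature values here (humidity %, pressure hPa) and labels are invented for illustration, not taken from the slides.

```python
# Illustrative sketch: a classification tree predicting the weather
# class from two made-up features [humidity %, pressure hPa].
from sklearn.tree import DecisionTreeClassifier

X = [[85, 1002], [90, 998], [40, 1020], [35, 1018], [60, 1008], [65, 1005]]
y = ["rainy", "rainy", "sunny", "sunny", "cloudy", "cloudy"]

clf = DecisionTreeClassifier(criterion="gini", random_state=0)
clf.fit(X, y)

# A dry, high-pressure day should land in the "sunny" leaf.
print(clf.predict([[38, 1019]]))
```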
6. Regression
Used to predict for individuals on the basis of information
gained from a previous sample of similar individuals.
For example, a person who wants to plan savings for the future
can fit a linear regression to his current and past savings
values and use the fitted line to predict his future savings.
It may also be used to model the effect of doses in medicine
or agriculture, the response of a customer to a mailing, or
the risk that a client will not pay back a loan taken from
the bank.
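The savings example can be sketched with an ordinary least-squares line via `numpy.polyfit`; the yearly figures below are invented for illustration.

```python
# Hypothetical sketch: predicting future savings from past values
# with a least-squares line fitted by numpy.polyfit.
import numpy as np

years = np.array([2019, 2020, 2021, 2022, 2023])
savings = np.array([1000.0, 1500.0, 2000.0, 2500.0, 3000.0])  # +500 per year

slope, intercept = np.polyfit(years, savings, 1)  # degree-1 fit
forecast_2024 = slope * 2024 + intercept
print(round(forecast_2024))  # the linear trend continues: 3500
```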
7. What is CART?
Classification And Regression Trees
Developed by Breiman, Friedman, Olshen, and Stone in the early 1980s.
It introduced tree-based modeling into the statistical mainstream:
a rigorous approach that uses cross-validation to select the optimal
tree.
One of many tree-based modeling techniques.
CART -- the classic
CHAID
C5.0
Software package variants (SAS, S-Plus, R…)
8. Philosophy
“Data analysis can be done from a number of different
viewpoints. Tree structured regression offers an interesting
alternative for looking at regression type problems. It has
sometimes given clues to data structure not apparent from a
linear regression analysis. Like any tool, its greatest benefit lies
in its intelligent and sensible application.”
-- Breiman, Friedman, Olshen, Stone
10. When & What ?
If the dependent variable is categorical, CART produces a
classification tree; if it is continuous, CART produces a
regression tree.
11. THE KEY IDEA
Recursive Partitioning
Take all of your data.
Consider all possible values of all variables.
Select the variable and split value (X = t1) that produce the greatest
“separation” in the target.
(X = t1) is called a “split”.
If X < t1, send the data point to the “left”; otherwise, send it to
the “right”.
Now repeat the same process on these two “nodes”.
You get a “tree”
Note: CART only uses binary splits.
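The recursive-partitioning idea above can be sketched in a few dozen lines of plain Python. This is a toy version (Gini impurity, numeric features, splits of the form “x < t”, recursion until a node is pure), not a production CART implementation.

```python
# Minimal sketch of recursive binary partitioning on toy data.
def gini(labels):
    # Gini impurity of a node: 0 when pure, larger when mixed.
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(X, y):
    # Try every (feature, threshold) pair; keep the one with the
    # lowest weighted child impurity. Returns None if no valid split.
    best = None  # (weighted_impurity, feature_index, threshold)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[j] < t]
            right = [y[i] for i, row in enumerate(X) if row[j] >= t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best

def grow(X, y):
    # Stop when the node is pure or no split is possible; else recurse.
    if gini(y) == 0.0 or best_split(X, y) is None:
        return {"leaf": max(set(y), key=y.count)}
    _, j, t = best_split(X, y)
    li = [i for i, row in enumerate(X) if row[j] < t]
    ri = [i for i, row in enumerate(X) if row[j] >= t]
    return {"feature": j, "threshold": t,
            "left": grow([X[i] for i in li], [y[i] for i in li]),
            "right": grow([X[i] for i in ri], [y[i] for i in ri])}

tree = grow([[1], [2], [8], [9]], ["a", "a", "b", "b"])
print(tree)
```

On this tiny one-feature dataset, the best binary split is at x < 8, yielding two pure leaves — exactly the “greatest separation” rule stated above.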
13. STEP 1
Starting with the first variable, CART splits it at every
possible split point. Each candidate split divides the sample
into two child nodes: cases with a “yes” response to the
question posed are sent to the left node, and “no” responses
are sent to the right node.
It is also possible to define these splits based on linear
combinations of variables.
14. STEP 2
CART then applies its goodness-of-split criterion to each split
point and evaluates the reduction in impurity, or
heterogeneity, due to the split.
This is based on the “split criterion”, which works in the
following fashion:
Suppose the dependent variable is categorical, taking on
the values 1 and 2.
The probabilities of these classes at a given node t are
p(1|t) and p(2|t), respectively.
15. STEP 2
A measure of heterogeneity, or impurity, at node t, written i(t), is a
function of these probabilities:
i(t) = φ( p(1|t), p(2|t) ),
where φ is a generic impurity function (for example, the Gini index).
In the case of categorical dependent variables, CART allows
for a number of specifications of this function.
The objective is to choose the split that maximizes the
reduction in the impurity i(t).
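One common choice for the generic impurity function is the Gini index, a minimal sketch of which for a two-class node (with assumed class probabilities) is:

```python
# Gini impurity for a two-class node: 1 minus the sum of squared
# class probabilities. 0 for a pure node, 0.5 for a 50/50 mix.
def gini_impurity(p1, p2):
    return 1.0 - (p1 ** 2 + p2 ** 2)

print(gini_impurity(1.0, 0.0))  # pure node: 0.0
print(gini_impurity(0.5, 0.5))  # maximally mixed node: 0.5
```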
16. STEPS 3, 4 & 5
It selects the best split on the variable as that split for which
reduction in impurity is the highest, as described in step 2.
Steps 1-3 are repeated for each of the remaining variables at
the root node. CART then ranks all the “best” splits on each
variable according to the reduction in impurity achieved by
each split.
It selects the variable and split point that most reduce the
impurity of the root, or parent, node.
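The ranking quantity in these steps is the impurity reduction Δi = i(parent) − p_left·i(left) − p_right·i(right). A small sketch with invented node sizes and Gini values:

```python
# Reduction in impurity achieved by a candidate split, weighting each
# child's impurity by its share of the parent's cases.
def impurity_reduction(i_parent, n_left, i_left, n_right, i_right):
    n = n_left + n_right
    return i_parent - (n_left / n) * i_left - (n_right / n) * i_right

# A parent node with Gini 0.5, split into a pure left child (40 cases)
# and a mixed right child (60 cases, Gini 0.32). Invented figures.
print(impurity_reduction(0.5, 40, 0.0, 60, 0.32))
```

CART computes this quantity for the best split on every variable and picks the variable/split pair with the largest value.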
17. STEPS 6 & 7
CART then assigns classes to these nodes according to a rule
that minimizes misclassification costs. Although all
classification tree procedures will generate some errors, there
are algorithms within CART designed to minimize these.
Steps 1-6 are repeatedly applied to each non-terminal child
node at each of the successive stages.
18. STEP 8
CART continues the splitting process and builds a large tree.
The largest tree is reached when the splitting process
continues until every observation constitutes its own terminal
node. Such a tree has a large number of terminal nodes that
are either pure or very small in content.
Having grown a large tree, CART then prunes it using
cross-validation, creating a sequence of nested trees. This
also produces a cross-validation error rate for each tree, from
which the optimal tree is selected.
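The prune-back step is exposed in scikit-learn as minimal cost-complexity pruning (`ccp_alpha`). A hedged sketch on synthetic data (model selection via cross-validation over the alphas is omitted here; one mid-range alpha is picked just to show the effect):

```python
# Grow a full tree, compute the nested pruning sequence, then refit
# with a larger complexity penalty to obtain a smaller, pruned tree.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X, y)
path = full.cost_complexity_pruning_path(X, y)  # nested subtree sequence

alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]  # a mid-range penalty
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X, y)
print(full.get_n_leaves(), "leaves before pruning,",
      pruned.get_n_leaves(), "after")
```

In practice the alpha would be chosen by cross-validated error, matching the optimal-tree selection described above.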
19. Simple Example
Goal: classify a record as “owner” or “not owner”.
A rule might be: if lot size < 19 and income > 84.75, then class =
“owner”.
Recursive partitioning
Repeatedly split the records into two parts so as to achieve
maximum homogeneity within the new parts
Pruning the tree
Simplify the tree by pruning peripheral branches to avoid overfitting.
20. Impurity
Obtain overall impurity measure (weighted avg. of individual
rectangles).
At each successive stage, compare this measure across all
possible splits in all variables.
Choose the split that reduces impurity the most.
Chosen split points become nodes on the tree.
24. Summary
Classification and Regression Trees are an easily
understandable and transparent method for predicting or
classifying new records.
A tree is a graphical representation of a set of rules.
Trees must be pruned to avoid over-fitting of the training
data.
As trees do not make any assumptions about the data
structure, they usually require large samples.