3. Steps in Data Mining
a. Exploration.
b. Model Building and Validation.
c. Deployment.
4. Techniques Used In Data Mining
Association analysis.
Decision trees.
Neural networks.
Statistical methods in general.
5. Decision Trees
"A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
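The quote above can be made concrete with a hand-rolled sketch (my own illustration, not from the slides): each internal node tests one property of the input, and each leaf is a yes/no decision, so the tree as a whole computes a Boolean function.

```python
# Toy decision tree as a Boolean function. The attributes and the
# "wait for a restaurant table" scenario are illustrative assumptions.

def wait_for_table(patrons: str, hungry: bool) -> bool:
    """Decide yes/no from two properties of the situation."""
    if patrons == "none":
        return False           # leaf: no
    elif patrons == "some":
        return True            # leaf: yes
    else:                      # patrons == "full": test a second property
        return hungry          # leaf: yes if hungry, else no

print(wait_for_table("some", hungry=False))  # True
print(wait_for_table("full", hungry=False))  # False
```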
6. Decision Trees
Are Also Known As Classification Trees.
Regression Trees: A Variant of Decision Trees.
7. Classification versus Regression Trees
As with all regression techniques, we assume the existence of a single response (target) variable and one or more predictor variables. If the response variable is categorical then classification (decision) trees are created, and if the response variable is continuous then regression trees are produced.
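The rule above can be sketched as a small helper (the function name and the type-based heuristic are my own assumptions, not from the slides): the type of the response variable decides which kind of tree to grow.

```python
# Minimal sketch: pick the tree type from the response variable.
# Treating "all numeric, non-boolean" as continuous is a simplifying
# assumption; a numeric-coded categorical response would need a flag.

def tree_kind(response_values) -> str:
    """Return 'regression' for a continuous (numeric) response,
    'classification' for a categorical one."""
    if all(isinstance(v, (int, float)) and not isinstance(v, bool)
           for v in response_values):
        return "regression"
    return "classification"

print(tree_kind(["yes", "no", "yes"]))  # classification
print(tree_kind([3.2, 1.7, 2.9]))       # regression
```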
8. Target variable is Group (G) with a binary Response.
A, C & D are Continuous Predictors and B is Categorical
9. Flexibility of Classification Trees
Classification trees can examine the effects of the predictor variables one at a time.
10. How To Split?
With a Categorical Predictor Having N Levels There Can Be 2^(N-1) - 1 Candidate Splits.
With a Continuous Predictor Having N Distinct Values There Can Be N - 1 Candidate Splits.
All Levels of All Predictors Can Be Equally Likely Candidates For Splitting.
We Have To Choose a Value Which Decreases The Misclassification.
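These counts can be checked with two one-line functions (a sketch of the standard binary-split counting argument, not code from the slides): a categorical predictor with N levels splits by choosing a subset of levels versus its complement, and a continuous predictor splits at a threshold between adjacent sorted values.

```python
# Candidate-split counts for binary splits in a decision tree.

def categorical_splits(n_levels: int) -> int:
    # 2**n subsets of the levels; halve because a subset and its
    # complement give the same split, then drop the trivial split
    # that sends every level to one side.
    return 2 ** (n_levels - 1) - 1

def continuous_splits(n_distinct: int) -> int:
    # One candidate threshold between each pair of adjacent
    # distinct sorted values.
    return n_distinct - 1

print(categorical_splits(4))   # 7
print(continuous_splits(10))   # 9
```

Note how quickly the categorical count grows: a 10-level predictor already admits 511 candidate splits, which is why some implementations restrict or pre-order the levels.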
11. Splitting Continued
Keep splitting as long as the misclassification in the terminal nodes keeps decreasing.
Splitting beyond a certain depth does not decrease the misclassification.
In certain cases, splitting beyond a certain depth may even increase the misclassification.
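The stopping rule above can be sketched as follows (a toy construction of my own, not from the slides): evaluate every candidate threshold on one continuous predictor and split only if some threshold strictly decreases the node's misclassification; when no split helps, the node stays terminal.

```python
# Greedy single-node split under the misclassification criterion.

def misclassification(labels):
    """Errors made by predicting the majority label of this node."""
    if not labels:
        return 0
    majority = max(set(labels), key=labels.count)
    return sum(1 for y in labels if y != majority)

def best_split(xs, ys):
    """Return (threshold, error_after) for the best split, or None
    if no candidate split reduces the node's misclassification."""
    base = misclassification(ys)
    best = None
    for t in sorted(set(xs))[:-1]:   # N distinct values -> N-1 candidates
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        err = misclassification(left) + misclassification(right)
        if err < base and (best is None or err < best[1]):
            best = (t, err)
    return best

print(best_split([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"]))  # (2.0, 0)
print(best_split([1.0, 2.0, 3.0], ["a", "a", "a"]))            # None
```

The second call returns `None` because the node is already pure: its misclassification is zero, so no further split can decrease it, which is exactly the point at which splitting should stop.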