2. CORRELATION
• Correlation is the degree of inter-relatedness
between two or more variables.
• Correlation analysis is a process to find out
the degree of relationship between two or
more variables by applying various statistical
tools and techniques.
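One widely used measure of this degree of relationship is the Pearson coefficient of correlation. The slides do not name a specific measure, so the choice of Pearson here is an assumption, and the function name and sample data below are hypothetical. A minimal sketch:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 72]
r = pearson_r(hours, scores)
print(round(r, 3))
```

The sign of r gives the direction of the relationship and its magnitude (between 0 and 1) gives the degree.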
3. THREE STAGES TO SOLVE
CORRELATION PROBLEM
• Determine whether a relationship exists;
if it does, measure it.
• Test the significance of the correlation.
• Establish the cause-and-effect
relationship, if any.
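For the significance stage, one common approach (an assumption here, since the slides do not name a test) is a t test on the sample coefficient r with n - 2 degrees of freedom. The function name and the sample values below are hypothetical. A minimal sketch of the test statistic:

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for the null hypothesis of no linear correlation.

    r is the sample correlation coefficient and n the number of paired
    observations; the statistic has n - 2 degrees of freedom.
    """
    if n < 3:
        raise ValueError("need at least 3 observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Hypothetical sample: r = 0.6 observed on 27 pairs
t = correlation_t_statistic(0.6, 27)
print(round(t, 2))
```

The resulting value is compared with the critical value from a t table at the chosen significance level. A significant result still says nothing about cause and effect, which is the separate third stage.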
4. USES OF CORRELATION
ANALYSIS
• It is used in deriving the degree and direction
of relationship between the variables.
• It is used in reducing the range of
uncertainty in prediction.
• It is used in presenting the average
relationship between any two variables
through a single value of coefficient of
correlation.
6. DECISION TREE
• Classification is one of the most familiar and
most popular data mining techniques.
• Classification applications include image and
pattern recognition, loan approval, and fault
detection in industrial applications.
• All approaches to performing classification
assume some knowledge of the data.
• A training set is used to develop the specific
parameters required by the technique.
7. DECISION TREE ALGORITHM
• INPUT
T (decision tree)
D (input database)
• OUTPUT
M (model prediction)
• DT Proc algorithm:
for each t ∈ D do
    n = root node of T;
    while n is not a leaf node do
        Obtain answer to question on n applied to t;
        Identify arc from n which contains correct answer;
        n = node at end of this arc;
    Make prediction for t based on labeling of n;
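The traversal above can be sketched in Python. The tree representation (nested dictionaries with "question", "arcs", and "label" keys), the function name dt_proc, and the loan-approval tree and tuples are all hypothetical choices made for illustration, not part of the original algorithm statement.

```python
def dt_proc(tree, database):
    """Return one predicted label per tuple by walking the tree from root to leaf."""
    predictions = []
    for t in database:
        n = tree  # start at the root node of T
        while "label" not in n:  # n is not a leaf node
            answer = n["question"](t)  # answer to the question at n, applied to t
            n = n["arcs"][answer]      # follow the arc that carries this answer
        predictions.append(n["label"])  # prediction comes from the leaf's labeling
    return predictions

# Hypothetical loan-approval tree: internal nodes ask a question, leaves hold a label
tree = {
    "question": lambda t: t["income"] >= 50000,
    "arcs": {
        True: {"label": "approve"},
        False: {
            "question": lambda t: t["credit"],
            "arcs": {
                "good": {"label": "approve"},
                "poor": {"label": "reject"},
            },
        },
    },
}

# Hypothetical input database D
database = [
    {"income": 72000, "credit": "poor"},
    {"income": 30000, "credit": "good"},
    {"income": 30000, "credit": "poor"},
]

predictions = dt_proc(tree, database)
print(predictions)
```

Each tuple costs at most one question per tree level, so prediction time grows with the depth of the tree rather than with its total size.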
8. ALGORITHM DEFINITION
• The decision tree approach is most useful in
classification problems. With this technique, a
tree is constructed to model the
classification process.
• Once the tree is built, it is applied to each
tuple in the database and results in a
classification for that tuple.
• There are two basic steps in this technique:
building the tree and applying the tree to
the database.
10. DISADVANTAGES OF DECISION
TREE
• May suffer from overfitting.
• Classifies by rectangular
partitioning.
• Does not easily handle nonnumeric
data.
• Can be quite large; pruning is
necessary.
11. EXAMPLE
• The classification of an unknown input vector
is done by traversing the tree from the root
node to a leaf node.
• E.g.: given outlook=rain, temp=70, humidity=65, and
weather=true, find the value of the class
attribute.
12. TREE CONSTRUCTION
PRINCIPLE
• Splitting Attribute
• Splitting Criterion
• 3 main phases:
• Construction phase
• Pruning phase
• Processing the pruned tree to improve its
understandability.
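During the construction phase, the splitting criterion decides which splitting attribute to use at each node. The slides do not say which criterion is meant; the sketch below assumes information gain (entropy reduction), one common choice. The function names and the small loan data set are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    partitions = {}
    for row in rows:
        # Group the class labels by the value the row takes on the attribute
        partitions.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return base - remainder

def best_splitting_attribute(rows, attributes, target):
    """Pick the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))

# Hypothetical training set: "credit" separates the classes, "employed" does not
rows = [
    {"credit": "good", "employed": "yes", "decision": "approve"},
    {"credit": "good", "employed": "no", "decision": "approve"},
    {"credit": "poor", "employed": "yes", "decision": "reject"},
    {"credit": "poor", "employed": "no", "decision": "reject"},
]

best = best_splitting_attribute(rows, ["credit", "employed"], "decision")
print(best)
```

Applying the same selection recursively to each partition grows the tree; the pruning phase then removes branches that add size without improving the classification.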