Classification: Basic Concepts and
Techniques
Classification: Definition
 Classification is the task of assigning objects to one of several predefined categories.
 Classification is the task of learning a target function f that maps each attribute set x to one of the predefined class labels y.
 The target function is also known informally as a
classification model.
 A classification model is useful for the following
purposes:
Descriptive Modeling
Predictive Modeling
Given a collection of records (training set):
 Each record is characterized by a tuple (x, y), where x is the attribute set and y is the class label
 x: attribute, predictor, independent variable, input
 y: class, response, dependent variable, output
General Approach for Building Classification Model
A learning algorithm is applied to the Training Set to learn a model (induction); the model is then applied to the Test Set (deduction).

Training Set:
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set:
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?
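As a concrete illustration of this workflow, here is a minimal sketch (assuming scikit-learn is available; the integer encodings of the categorical attributes are arbitrary choices made only for this example):

```python
from sklearn.tree import DecisionTreeClassifier

# Encode the slide's training records: Attrib1 (Yes/No), Attrib2 (size), Attrib3 (income, in $1000s).
yes_no = {"Yes": 1, "No": 0}
size = {"Small": 0, "Medium": 1, "Large": 2}

train = [
    ("Yes", "Large", 125, "No"), ("No", "Medium", 100, "No"),
    ("No", "Small", 70, "No"),   ("Yes", "Medium", 120, "No"),
    ("No", "Large", 95, "Yes"),  ("No", "Medium", 60, "No"),
    ("Yes", "Large", 220, "No"), ("No", "Small", 85, "Yes"),
    ("No", "Medium", 75, "No"),  ("No", "Small", 90, "Yes"),
]
X_train = [[yes_no[a], size[b], c] for a, b, c, _ in train]
y_train = [label for *_, label in train]

model = DecisionTreeClassifier().fit(X_train, y_train)   # induction: learn the model

test = [("No", "Small", 55), ("Yes", "Medium", 80), ("Yes", "Large", 110),
        ("No", "Small", 95), ("No", "Large", 67)]
X_test = [[yes_no[a], size[b], c] for a, b, c in test]
print(model.predict(X_test))                              # deduction: apply the model
```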
Evaluation of the performance of
a classification model
Performance Metrics
Decision Tree Induction
 Consider a simpler version of the vertebrate
classification problem described in the previous
section
 Instead of classifying the vertebrates into five distinct
groups of species, we assign them to two categories:
mammals and non-mammals
Decision Tree Induction
 A root node that has no incoming edges and zero or
more outgoing edges.
 Internal nodes, each of which has exactly one
incoming edge and two or more outgoing edges.
 Leaf or terminal nodes, each of which has exactly
one incoming edge and no outgoing edges. In a
decision tree, each leaf node is assigned a class label.
Classifying an unlabeled vertebrate
Decision Tree Induction
 Many Algorithms:
 Hunt’s Algorithm (one of the earliest)
 CART
 ID3, C4.5
 SLIQ,SPRINT
Hunt’s Algorithm
Let Dt be the set of training records that are associated with node t and y = {y1, y2, . . . , yc} be the class labels. The following is a recursive definition of Hunt’s algorithm.
 Step 1: If all the records in Dt belong to the same class yt, then t is a leaf node labeled as yt.
 Step 2: If Dt contains records that belong to more than one class, an attribute test condition is selected to partition the records into smaller subsets. A child node is created for each outcome of the test condition and the records in Dt are distributed to the children based on the outcomes. The algorithm is then recursively applied to each child node.
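The recursion can be sketched in a few lines of Python (a simplified illustration, not the book's implementation: it splits on every distinct value of an attribute and ignores the empty-child and identical-attribute corner cases that real implementations must handle):

```python
from collections import Counter

def hunt(records, attributes, label):
    """Recursive sketch of Hunt's algorithm; records are dicts, attributes a list of keys."""
    labels = [r[label] for r in records]
    # Step 1: all records belong to the same class -> leaf node with that label
    if len(set(labels)) == 1:
        return labels[0]
    if not attributes:                      # no test left: fall back to the majority class
        return Counter(labels).most_common(1)[0][0]
    # Step 2: select an attribute test condition (here simply the next attribute),
    # create a child for each outcome, distribute the records, and recurse.
    attr, rest = attributes[0], attributes[1:]
    children = {}
    for r in records:
        children.setdefault(r[attr], []).append(r)
    return {attr: {value: hunt(subset, rest, label)
                   for value, subset in children.items()}}

# Tiny illustration with four of the loan records from the next slide:
records = [
    {"HomeOwner": "Yes", "Marital": "Single",   "Defaulted": "No"},
    {"HomeOwner": "No",  "Marital": "Married",  "Defaulted": "No"},
    {"HomeOwner": "No",  "Marital": "Divorced", "Defaulted": "Yes"},
    {"HomeOwner": "No",  "Marital": "Single",   "Defaulted": "Yes"},
]
print(hunt(records, ["HomeOwner", "Marital"], "Defaulted"))
```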
Hunt’s Algorithm

Training set (Defaulted Borrower is the class label):
ID  Home Owner  Marital Status  Annual Income  Defaulted Borrower
1   Yes         Single          125K           No
2   No          Married         100K           No
3   No          Single          70K            No
4   Yes         Married         120K           No
5   No          Divorced        95K            Yes
6   No          Married         60K            No
7   Yes         Divorced        220K           No
8   No          Single          85K            Yes
9   No          Married         75K            No
10  No          Single          90K            Yes

The tree is grown in four steps (class counts shown as (No, Yes); the original figures are not reproduced):
(a) A single leaf node labeled Defaulted = No, covering all records (7, 3).
(b) Split on Home Owner: Yes → Defaulted = No (3, 0); No → Defaulted = No (4, 3).
(c) The Home Owner = No node is split on Marital Status: Married → Defaulted = No (3, 0); Single, Divorced → Defaulted = Yes (1, 3).
(d) The Single/Divorced node is split on Annual Income: < 80K → Defaulted = No (1, 0); >= 80K → Defaulted = Yes (0, 3).
Apply Model to Test Data

The induced decision tree: the root tests Home Owner (Yes → leaf NO); for Home Owner = No, Marital Status is tested (Married → leaf NO); for Single or Divorced, Annual Income is tested (< 80K → leaf NO, > 80K → leaf YES).

Test Data:
Home Owner  Marital Status  Annual Income  Defaulted Borrower
No          Married         80K            ?

Start from the root of the tree.
Home Owner = No leads to the Marital Status test, and Marital Status = Married leads to the leaf labeled NO: assign Defaulted Borrower = “No”.
Design Issues of Decision Tree Induction
 How should training records be split?
 Method for specifying test condition
 depending on attribute types
 Measure for evaluating the goodness of a test condition
 How should the splitting procedure stop?
 Stop splitting if all the records belong to the same class
or have identical attribute values
 Early termination
Methods for Expressing Test Conditions
 Depends on attribute types
 Binary
 Nominal
 Ordinal
 Continuous
Binary Attributes:
Test Condition for Nominal Attributes
 Multi-way split:
 Use as many partitions as distinct values, e.g. Marital Status → Single / Divorced / Married.
 Binary split:
 Divides values into two subsets, e.g. {Married} vs {Single, Divorced}, {Single} vs {Married, Divorced}, or {Single, Married} vs {Divorced}.
Test Condition for Ordinal Attributes
 Multi-way split:
 Use as many partitions as distinct values, e.g. Shirt Size → Small / Medium / Large / Extra Large.
 Binary split:
 Divides values into two subsets while preserving the order property among attribute values, e.g. {Small} vs {Medium, Large, Extra Large} or {Small, Medium} vs {Large, Extra Large}.
 A grouping such as {Small, Large} vs {Medium, Extra Large} violates the order property.
Test Condition for Continuous Attributes
 (i) Binary split, e.g. Annual Income > 80K? (Yes / No)
 (ii) Multi-way split, e.g. Annual Income in < 10K, [10K, 25K), [25K, 50K), [50K, 80K), > 80K
How to Determine the Best Split
 Greedy approach:
 Nodes with purer class distribution are preferred
 Need a measure of node impurity:
 e.g. a node with class counts C0: 5, C1: 5 has a high degree of impurity; a node with C0: 9, C1: 1 has a low degree of impurity
Measures for Selecting the Best Split
 Gini Index: $GINI(t) = 1 - \sum_{j} [\,p(j \mid t)\,]^{2}$
 Entropy: $Entropy(t) = -\sum_{j} p(j \mid t)\,\log_{2} p(j \mid t)$
 Misclassification error: $Error(t) = 1 - \max_{i} P(i \mid t)$
Examples of computing the different impurity measures
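The worked examples on this slide were an image; as a hedged substitute, here is a small sketch that evaluates the three impurity measures on a few illustrative two-class count distributions (the distributions themselves are arbitrary):

```python
import math

def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def entropy(counts):
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def classification_error(counts):
    n = sum(counts)
    return 1.0 - max(counts) / n

# Each tuple is a node's class distribution: (count of class 0, count of class 1).
for dist in [(0, 6), (1, 5), (3, 3)]:
    print(dist, gini(dist), entropy(dist), classification_error(dist))
```

A pure node such as (0, 6) scores 0 on all three measures, while the maximally impure (3, 3) scores 0.5, 1.0 and 0.5 respectively.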
Comparison among Impurity Measures
For a 2-class problem:
Finding the Best Split for Binary Attributes
If attribute A is chosen to split the data, the Gini index for node N1 is 0.4898 and for node N2 it is 0.480. The weighted average of the Gini index for the descendant nodes is (7/12) × 0.4898 + (5/12) × 0.480 = 0.486. Similarly, we can show that the weighted average of the Gini index for attribute B is 0.375.
Categorical Attributes: Computing Gini Index
 For each distinct value, gather counts for each class in the
dataset
 Use the count matrix to make decisions
Multi-way split:
  CarType:  Family | Sports | Luxury
  C1:       1      | 8      | 1
  C2:       3      | 0      | 7
  Gini = 0.163

Two-way split (find the best partition of values):
  CarType: {Sports, Luxury} | {Family}    →  C1: 9 | 1,  C2: 7 | 3,   Gini = 0.468
  CarType: {Sports} | {Family, Luxury}    →  C1: 8 | 2,  C2: 0 | 10,  Gini = 0.167

Which of these is the best?
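A short sketch that reproduces these weighted Gini values from the count matrices (each row is one partition's class counts; the printed values should match the slide up to rounding):

```python
def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def weighted_gini(count_matrix):
    """Weighted average of the children's Gini indices for a candidate split."""
    total = sum(sum(row) for row in count_matrix)
    return sum(sum(row) / total * gini(row) for row in count_matrix)

print(weighted_gini([(9, 7), (1, 3)]))           # {Sports, Luxury} vs {Family}  -> ~0.468
print(weighted_gini([(8, 0), (2, 10)]))          # {Sports} vs {Family, Luxury}  -> ~0.167
print(weighted_gini([(1, 3), (8, 0), (1, 7)]))   # Family / Sports / Luxury      -> ~0.163
```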
Continuous Attributes: Computing Gini Index
 For efficient computation: for each attribute,
– Sort the attribute on values
– Linearly scan these values, each time updating the count matrix and computing the Gini index
– Choose the split position that has the least Gini index

Sorted values (Annual Income): 60 70 75 85 90 95 100 120 125 220, with Cheat = No No No Yes Yes Yes No No No No
Candidate split positions: 55 65 72 80 87 92 97 110 122 172 230
Gini index at each split: 0.420 0.400 0.375 0.343 0.417 0.400 0.300 0.343 0.375 0.400 0.420 (best split: 97, Gini = 0.300)
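The scan can be sketched as follows (an illustrative implementation assuming two class labels "Yes"/"No"; split positions are taken as midpoints between adjacent sorted values, so the best position prints as 97.5 rather than the rounded 97 shown above):

```python
def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def best_split(values, labels):
    """Sort on the attribute, then scan candidate split positions once,
    updating the class counts on each side and computing the weighted Gini index."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    total_yes = sum(1 for _, y in pairs if y == "Yes")
    best_pos, best_gini = None, float("inf")
    yes_left = no_left = 0
    for i in range(n - 1):
        value, label = pairs[i]
        if label == "Yes":
            yes_left += 1
        else:
            no_left += 1
        position = (value + pairs[i + 1][0]) / 2.0        # midpoint between adjacent values
        left = (yes_left, no_left)
        right = (total_yes - yes_left, (n - total_yes) - no_left)
        g = sum(sum(side) / n * gini(side) for side in (left, right))
        if g < best_gini:
            best_pos, best_gini = position, g
    return best_pos, best_gini

incomes = [60, 70, 75, 85, 90, 95, 100, 120, 125, 220]
cheat   = ["No", "No", "No", "Yes", "Yes", "Yes", "No", "No", "No", "No"]
print(best_split(incomes, cheat))   # splits between 95 and 100 with Gini = 0.300
```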
Consider the training examples shown in the table (not reproduced here) for a binary classification problem.
(a) Compute the Gini index for the overall collection of
training examples.
(b) Compute the Gini index for the Customer ID attribute.
(c) Compute the Gini index for the Gender attribute.
(d) Compute the Gini index for the Car Type attribute using
multiway split.
(e) Compute the Gini index for the Shirt Size attribute using
multiway split.
(f) Which attribute is better, Gender, Car Type, or Shirt Size?
Algorithm for Decision Tree Induction
Characteristics of Decision Tree Induction
 Decision tree induction is a nonparametric
approach for building classification models.
 Finding an optimal decision tree is an NP-complete
problem.
 Techniques developed for constructing decision trees
are computationally inexpensive
 Decision trees, especially smaller-sized trees, are
relatively easy to interpret.
 Decision trees provide an expressive representation for learning discrete-valued functions.
 Decision tree algorithms are quite robust to the presence of noise, especially when methods for avoiding overfitting are employed.
 The presence of redundant attributes does not adversely affect the accuracy of decision trees.
 Since most decision tree algorithms employ a top-down,
recursive partitioning approach, the number of records
becomes smaller as we traverse down the tree.
 A subtree can be replicated multiple times in a decision
tree
Consider the training examples shown in Table 4.2 for a binary classification
problem.
(a) What is the entropy of this collection of training examples with respect to the positive
class?
(b) What are the information gains of a1 and a2 relative to these training examples?
(c) For a3, which is a continuous attribute, compute the information gain for every possible
split.
(d)What is the best split (among a1, a2, and a3) according to the information gain?
(e) What is the best split (between a1 and a2) according to the classification error rate?
(f) What is the best split (between a1 and a2) according to the Gini index?
Rule-Based Classifier
 Classify records by using a collection of “if…then…”
rules
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
 The left-hand side of the rule is called the rule antecedent or precondition.
 It contains a conjunction of attribute tests:
Condition_i = (A1 op v1) ∧ (A2 op v2) ∧ . . . ∧ (Ak op vk)
 where (Aj, vj) is an attribute-value pair and op is a logical operator chosen from the set {=, ≠, <, >, ≤, ≥}. Each attribute test (Aj op vj) is known as a conjunct.
 The right-hand side of the rule is called the rule consequent, which contains the predicted class yi.
Rule Representation
Rule Coverage and Accuracy
 Coverage of a rule:
 Fraction of records that satisfy the antecedent of a rule
 Accuracy of a rule:
 Fraction of the records that satisfy the antecedent that also satisfy the consequent of the rule

Tid  Refund  Marital Status  Taxable Income  Class
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Rule: (Status = Single) → No
Coverage = 40%, Accuracy = 50%
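A minimal sketch of how these two quantities are computed for the rule above (representing the records as Python dicts is an arbitrary choice for illustration):

```python
def coverage_and_accuracy(records, antecedent, consequent):
    """Coverage: fraction of all records satisfying the antecedent.
    Accuracy: fraction of those covered records that also satisfy the consequent."""
    covered = [r for r in records if antecedent(r)]
    coverage = len(covered) / len(records)
    accuracy = sum(1 for r in covered if consequent(r)) / len(covered)
    return coverage, accuracy

records = [
    {"Status": "Single",   "Class": "No"},  {"Status": "Married", "Class": "No"},
    {"Status": "Single",   "Class": "No"},  {"Status": "Married", "Class": "No"},
    {"Status": "Divorced", "Class": "Yes"}, {"Status": "Married", "Class": "No"},
    {"Status": "Divorced", "Class": "No"},  {"Status": "Single",  "Class": "Yes"},
    {"Status": "Married",  "Class": "No"},  {"Status": "Single",  "Class": "Yes"},
]
cov, acc = coverage_and_accuracy(records,
                                 lambda r: r["Status"] == "Single",
                                 lambda r: r["Class"] == "No")
print(cov, acc)   # 0.4 0.5 -> Coverage = 40%, Accuracy = 50%
```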
Characteristics of Rule-Based Classifier
 Mutually exclusive rules
 Classifier contains mutually exclusive rules if the rules
are independent of each other
 Every record is covered by at most one rule
 Exhaustive rules
 Classifier has exhaustive coverage if it accounts for every
possible combination of attribute values
 Each record is covered by at least one rule
Effect of Rule Simplification
 Rules are no longer mutually exclusive
 A record may trigger more than one rule
 Solution?
 Ordered rule set
 Unordered rule set – use voting schemes
 Rules are no longer exhaustive
 A record may not trigger any rules
 Solution?
 Use a default class
Ordered Rule Set
 Rules are rank ordered according to their priority
 An ordered rule set is known as a decision list
 When a test record is presented to the classifier
 It is assigned to the class label of the highest ranked rule it has
triggered
 If none of the rules fired, it is assigned to the default class
Rule Ordering Schemes
 Rule-based ordering
 Individual rules are ranked based on their quality
 Class-based ordering
 Rules that belong to the same class appear together
How to Build a Rule-Based Classifier
 Direct Method:
 Extract rules directly from data
 e.g.: RIPPER, CN2, Holte’s 1R
 Indirect Method:
 Extract rules from other classification models (e.g.
decision trees, neural networks, etc).
 e.g: C4.5rules
Direct Method: Sequential Covering
1. Start from an empty rule
2. Grow a rule using the Learn-One-Rule function
3. Remove training records covered by the rule
4. Repeat Step (2) and (3) until stopping criterion is
met
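A skeleton of this loop (a sketch only: Learn-One-Rule is passed in as a callable, and a rule is represented as a dict with a "covers" predicate; both are illustrative assumptions rather than a fixed API):

```python
def sequential_covering(records, learn_one_rule, max_rules=10):
    """Grow one rule at a time (Step 2), remove the training records it covers
    (Step 3), and repeat until the stopping criterion is met (Step 4)."""
    rules, remaining = [], list(records)
    while remaining and len(rules) < max_rules:      # stopping criterion
        rule = learn_one_rule(remaining)             # Step 2: grow a rule
        if rule is None:                             # no acceptable rule could be grown
            break
        rules.append(rule)
        remaining = [r for r in remaining
                     if not rule["covers"](r)]       # Step 3: remove covered records
    return rules
```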
Example of Sequential Covering
(i) Original data; (ii) Step 1; (iii) Step 2: rule R1 has been extracted; (iv) Step 3: rules R1 and R2 have been extracted. (Figures not reproduced.)
Rule Growing
 Two common strategies
Rule Evaluation
 An evaluation metric is needed to determine which conjunct should be added (or removed) during the rule-growing process.
 Accuracy is an obvious choice because it explicitly measures the fraction of training examples classified correctly by the rule.
 A potential limitation of accuracy is that it does not take into account the rule’s coverage.
For example, consider a training set that contains 60 positive
examples and 100 negative examples. Suppose we are given the
following two candidate rules:
Rule r1: covers 50 positive examples and 5 negative examples,
Rule r2: covers 2 positive examples and no negative examples.
The accuracies for r1 and r2 are 90.9% and 100%, respectively. However, r1
is the better rule despite its lower accuracy. The high accuracy for r2 is
potentially spurious because the coverage of the rule is too low.
Approaches to Handle This Problem
1. Likelihood ratio statistic:
2. Laplace Measure
3. FOIL’s information gain
 Likelihood ratio statistic
where k is the number of classes, fi is the observed frequency of
class i examples that are covered by the rule, and ei is the
expected frequency of a rule that makes random predictions
The likelihood ratio for r1 is
 R(r1) = 2 × [50 × log2(50/20.625) + 5 × log2(5/34.375)]
= 99.9.
The likelihood ratio statistic for r2 is
 R(r2) = 2 × [2 × log2(2/0.75) + 0 × log2(0/1.25)] = 5.66.
This statistic therefore suggests that r1 is a better rule than r2.
Laplace Measure
where n is the number of examples covered by the rule, f+ is
the number of positive examples covered by the rule, k is the
total number of classes, and p+ is the prior probability for the
positive class.
The Laplace measure for r1 is 51/57 = 89.47%, which
is quite close to its accuracy. Conversely, the Laplace
measure for r2 (75%) is significantly lower than its
accuracy because r2 has a much lower coverage.
 FOIL’s information gain
 Foil’s Information Gain
 R0: {} => class (initial rule)
 R1: {A} => class (rule after adding conjunct)
Gain(R0, R1) = t × [ log2(p1/(p1+n1)) − log2(p0/(p0+n0)) ]
where t: number of positive instances covered by
both R0 and R1
p0: number of positive instances covered by R0
n0: number of negative instances covered by R0
p1: number of positive instances covered by R1
n1: number of negative instances covered by R1
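These three measures can be sketched directly from their definitions (the Laplace form (f+ + 1)/(n + k) and the use of t = p1 when R1 refines R0 are standard conventions assumed here, not taken from the slide):

```python
import math

def likelihood_ratio(observed, expected):
    """R = 2 * sum_i f_i * log2(f_i / e_i); terms with f_i = 0 contribute nothing."""
    return 2 * sum(f * math.log2(f / e) for f, e in zip(observed, expected) if f > 0)

def laplace(f_pos, n, k=2):
    """Laplace measure in its usual form: (f+ + 1) / (n + k)."""
    return (f_pos + 1) / (n + k)

def foil_gain(p0, n0, p1, n1):
    """FOIL's information gain for extending R0 to R1 (t = p1 since R1 refines R0)."""
    t = p1
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

def expected_counts(m, pos=60, neg=100):
    """Expected class frequencies if a rule covering m examples predicted at random."""
    return (m * pos / (pos + neg), m * neg / (pos + neg))

# The running example: r1 covers 50 positive / 5 negative, r2 covers 2 positive / 0 negative.
print(likelihood_ratio((50, 5), expected_counts(55)))   # ~99.9 for r1
print(likelihood_ratio((2, 0), expected_counts(2)))     # ~5.66 for r2
print(laplace(50, 55), laplace(2, 2))                   # ~0.895 for r1, 0.75 for r2
```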
(a) Rule accuracy
(b) FOIL’s information gain.
(c) likelihood ratio
(d) The Laplace measure.
(e) m-estimate measure (with k = 2 and p+ = 0.2).
Direct Method: RIPPER
 For 2-class problem, choose one of the classes as positive class,
and the other as negative class
 Learn rules for negative class
 positive class will be default class
 For multi-class problem
 Order the classes according to increasing class prevalence
(fraction of instances that belong to a particular class)
 Learn the rule set for smallest class first, treat the rest as
negative class
 Repeat with next smallest class as positive class
Direct Method: RIPPER
 Growing a rule:
 Start from empty rule
 Add conjuncts as long as they improve FOIL’s information gain
 Stop when rule no longer covers negative examples
 Prune the rule immediately using incremental reduced error
pruning
 Measure for pruning: v = (p-n)/(p+n)
 p: number of positive examples covered by the rule in
the validation set
 n: number of negative examples covered by the rule in
the validation set
 Pruning method: delete any final sequence of conditions that
maximizes v
RIPPER
2. Indirect Methods for Rule Extraction
Characteristics of Rule-Based Classifiers
 As highly expressive as decision trees
 Easy to interpret
 Easy to generate
 Can classify new instances rapidly
Nearest Neighbor Classifiers
 Basic idea:
 If it walks like a duck, quacks like a duck, then it’s
probably a duck
Given the training records and a test record: compute the distance from the test record to the training records, then choose k of the “nearest” records.
Nearest-Neighbor Classifiers
 A nearest-neighbour classifier assumes similarity between the new case and the available cases, and puts the new case into the category that is most similar to the available categories.
 Example: suppose we have an image of a creature that looks similar to both a cat and a dog, and we want to know whether it is a cat or a dog. For this identification we can use the KNN algorithm, since it works on a similarity measure. Our KNN model will find the features of the new image that are most similar to the cat and dog images and, based on the most similar features, will place it in either the cat or the dog category.
Suppose there are two categories, Category A and Category B, and we have a new data point x1; which of these categories does it belong to? To solve this type of problem, we need a K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular data point. Consider the diagram below (not reproduced):
How does K-NN work?
How K-NN works can be explained with the following algorithm:
 Step-1: Select the number K of the neighbors
 Step-2: Calculate the Euclidean distance of new data
point with all other data points.
 Step-3: Take the K nearest neighbors as per the calculated
Euclidean distance.
 Step-4: Among these k neighbors, count the number of
the data points in each category.
 Step-5: Assign the new data points to that category for
which the number of the neighbor is maximum.
 Step-6: Our model is ready.
 Suppose we have a new data point and we need to
put it in the required category. Consider the
below image:
 Firstly, we will choose the number of neighbors, so
we will choose the k=5.
 Next, we will calculate the Euclidean
distance between the data points. The Euclidean
distance is the distance between two points, which
we have already studied in geometry. It can be
calculated as:
 By calculating the Euclidean distance we got the nearest
neighbors, as three nearest neighbors in category A and two
nearest neighbors in category B. Consider the below image:
• As we can see the 3 nearest neighbors are from category A, hence this new data point must
belong to category A.
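A minimal sketch of Steps 1 to 5 on a tiny two-class dataset (the points and the choice of k are arbitrary illustrations):

```python
import math
from collections import Counter

def knn_predict(train, query, k=5):
    """train: list of (feature_vector, label); query: feature vector."""
    # Step 2: Euclidean distance from the query to every training point
    distances = [(math.dist(x, query), label) for x, label in train]
    # Step 3: take the k nearest neighbours
    nearest = sorted(distances)[:k]
    # Steps 4-5: count labels among the neighbours and assign the majority category
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((1.0, 1.2), "A"), ((0.8, 0.9), "A"), ((1.1, 0.7), "A"),
         ((3.0, 3.2), "B"), ((2.8, 3.1), "B")]
print(knn_predict(train, (1.0, 1.0), k=3))   # -> "A"
```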
Nearest-Neighbor Classifiers
 Requires three things
– The set of stored records
– Distance Metric to compute
distance between records
– The value of k, the number of
nearest neighbors to retrieve
 To classify an unknown record:
– Compute distance to other
training records
– Identify k nearest neighbors
– Use class labels of nearest
neighbors to determine the
class label of unknown record
(e.g., by taking majority vote)
Unknown record
Definition of Nearest Neighbor
(a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor (figures not reproduced)
The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.
Nearest Neighbor Classification
 Compute distance between two points:
 Euclidean distance: $d(p, q) = \sqrt{\sum_{i} (p_i - q_i)^2}$
 Determine the class from the nearest neighbor list
 take the majority vote of class labels among the k-nearest neighbors
 Weigh the vote according to distance
 weight factor, w = 1/d²
Characteristics of Nearest-Neighbor Classifiers
 Uses specific training instances to make predictions without having to maintain an abstraction (or model) derived from data.
 Lazy learners such as nearest-neighbor classifiers are quite expensive at classification time.
 Make their predictions based on local information.
 Produce arbitrarily shaped decision boundaries.
 Produce wrong predictions unless the appropriate proximity measure and data preprocessing steps are taken.
Bayesian Classifiers
 Bayes theorem, a statistical principle for combining
prior knowledge of the classes with new evidence
gathered from data.
 Two Implementations of Bayesian classifiers:
 Naive Bayes
 Bayesian belief network.
Bayes Theorem
 Let X and Y be a pair of random variables. Their joint
probability, P(X=x, Y=y), refers to the probability that
variable X will take on the value x and variable Y will take
on the value y.
 A conditional probability is the probability that a
random variable will take on a particular value given that
the outcome for another random variable is known.
 For example, The conditional probability P(Y=y | X=x)
refers to the probability that the variable Y will take on the
value y, given that the variable X is observed to have the
value x.
 The joint and conditional probabilities for X and Y are
related in the following way:
P(X, Y) = P(Y | X) × P(X) = P(X | Y) × P(Y)
 Bayes theorem:
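The formula itself did not survive extraction; the standard statement, consistent with the relation between joint and conditional probabilities above, is

$$P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)}$$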
 Consider a football game between two rival teams: Team0
and Team1 .
 Suppose Team0 wins 65% of the time and Team1 wins the remaining matches.
 Among the games won by Team0, only 30% of them come
from playing on Team1 's football field.
 On the other hand,75% of the victories for Team1 are
obtained while playing at home.
 If Team1 is to host the next match between the two teams,
which team will most likely emerge as the winner?
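The slide's derivation was an image; reconstructing it from the probabilities stated above (with X the host team and Y the winner, so P(Y=1) = 0.35, P(Y=0) = 0.65, P(X=1 | Y=1) = 0.75, P(X=1 | Y=0) = 0.30), Bayes theorem gives

$$P(Y{=}1 \mid X{=}1) = \frac{P(X{=}1 \mid Y{=}1)\,P(Y{=}1)}{P(X{=}1 \mid Y{=}1)\,P(Y{=}1) + P(X{=}1 \mid Y{=}0)\,P(Y{=}0)} = \frac{0.75 \times 0.35}{0.75 \times 0.35 + 0.30 \times 0.65} \approx 0.5738$$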
P(Y = 0 | X = 1) = 1 − P(Y = 1 | X = 1) = 0.4262.
Since P(Y = 1 | X = 1) > P(Y = 0 | X = 1), Team 1 has a better chance than Team 0 of winning the next match.
Using the Bayes Theorem for Classification
 Let X denote the attribute set and Y denote the class
variable. If the class variable has a non-deterministic
relationship with the attributes
 Then we can treat X and Y as random variables and capture
their relationship probabilistically using P(Y|X).
 This conditional probability is also known as the posterior
probability for Y, as opposed to its prior probability, P(Y).
 The Bayes theorem is useful because it allows us to express the posterior probability in terms of the prior probability P(Y), the class-conditional probability P(X|Y), and the evidence, P(X).
 To estimate the class-conditional probabilities P(X|Y), we present two implementations of Bayesian classification methods:
the naive Bayes classifier
the Bayesian belief network.
Naive Bayes Classifier
 A naive Bayes classifier estimates the class-conditional
probability by assuming that the attributes are
conditionally independent, given the class label Y.
 The conditional independence assumption can be
formally stated as follows:
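The formula was an image on the slide; the assumption is standardly written as

$$P(X \mid Y = y) = \prod_{i=1}^{d} P(X_i \mid Y = y)$$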
where each attribute set X = {X1, X2, ..., Xd} consists of d attributes.
P(Play=Yes) = 9/14 P(Play=No) = 5/14
Bayesian Belief Networks
 A Bayesian belief network (BBN), or simply, Bayesian
network, provides a graphical representation of the
probabilistic relationships among a set of random
variables.
 There are two key elements of a Bayesian network:
1. A directed acyclic graph (dag) encoding the dependence
relationships among a set of variables.
2. A probability table associating each node to its
immediate parent nodes.
Model Building
Model building in Bayesian networks involves two steps:
(1) creating the structure of the network, and
(2) estimating the probability values in the tables associated with each node.
More Related Content

Similar to data mining Module 4.ppt

Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the ...
Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the ...Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the ...
Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the ...DataStax Academy
 
Dr. Oner CelepcikayITS 632ITS 632Week 4Classification
Dr. Oner CelepcikayITS 632ITS 632Week 4ClassificationDr. Oner CelepcikayITS 632ITS 632Week 4Classification
Dr. Oner CelepcikayITS 632ITS 632Week 4ClassificationDustiBuckner14
 
Hierarchical clustering
Hierarchical clusteringHierarchical clustering
Hierarchical clusteringishmecse13
 
Business analytics course in delhi
Business analytics course in delhiBusiness analytics course in delhi
Business analytics course in delhibhuvan8999
 
data science course in delhi
data science course in delhidata science course in delhi
data science course in delhidevipatnala1
 
business analytics course in delhi
business analytics course in delhibusiness analytics course in delhi
business analytics course in delhidevipatnala1
 
data science training in hyderabad
data science training in hyderabaddata science training in hyderabad
data science training in hyderabaddevipatnala1
 
Data science certification
Data science certificationData science certification
Data science certificationprathyusha1234
 
Best data science training, best data science training institute in hyderabad.
 Best data science training, best data science training institute in hyderabad. Best data science training, best data science training institute in hyderabad.
Best data science training, best data science training institute in hyderabad.Data Analytics Courses in Pune
 
Best data science training, best data science training institute in hyderabad.
 Best data science training, best data science training institute in hyderabad. Best data science training, best data science training institute in hyderabad.
Best data science training, best data science training institute in hyderabad.hrhrenurenu
 
Best data science training, best data science training institute in hyderabad.
 Best data science training, best data science training institute in hyderabad. Best data science training, best data science training institute in hyderabad.
Best data science training, best data science training institute in hyderabad.Data Analytics Courses in Pune
 
Data scientist course in hyderabad
Data scientist course in hyderabadData scientist course in hyderabad
Data scientist course in hyderabadprathyusha1234
 
Data scientist training in bangalore
Data scientist training in bangaloreData scientist training in bangalore
Data scientist training in bangaloreprathyusha1234
 
Data science course in chennai (3)
Data science course in chennai (3)Data science course in chennai (3)
Data science course in chennai (3)prathyusha1234
 
data science course in chennai
data science course in chennaidata science course in chennai
data science course in chennaidevipatnala1
 
Best institute for data science in hyderabad
Best institute for data science in hyderabadBest institute for data science in hyderabad
Best institute for data science in hyderabadprathyusha1234
 
Data science online course
Data science online courseData science online course
Data science online courseprathyusha1234
 
data science institute in bangalore
data science institute in bangaloredata science institute in bangalore
data science institute in bangaloredevipatnala1
 
Best data science training, best data science training institute in hyderabad.
Best data science training, best data science training institute in hyderabad.Best data science training, best data science training institute in hyderabad.
Best data science training, best data science training institute in hyderabad.sripadojwarumavilas
 

Similar to data mining Module 4.ppt (20)

Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the ...
Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the ...Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the ...
Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the ...
 
Dr. Oner CelepcikayITS 632ITS 632Week 4Classification
Dr. Oner CelepcikayITS 632ITS 632Week 4ClassificationDr. Oner CelepcikayITS 632ITS 632Week 4Classification
Dr. Oner CelepcikayITS 632ITS 632Week 4Classification
 
Hierarchical clustering
Hierarchical clusteringHierarchical clustering
Hierarchical clustering
 
Business analytics course in delhi
Business analytics course in delhiBusiness analytics course in delhi
Business analytics course in delhi
 
data science course in delhi
data science course in delhidata science course in delhi
data science course in delhi
 
business analytics course in delhi
business analytics course in delhibusiness analytics course in delhi
business analytics course in delhi
 
data science training in hyderabad
data science training in hyderabaddata science training in hyderabad
data science training in hyderabad
 
Data science certification
Data science certificationData science certification
Data science certification
 
Best data science training, best data science training institute in hyderabad.
 Best data science training, best data science training institute in hyderabad. Best data science training, best data science training institute in hyderabad.
Best data science training, best data science training institute in hyderabad.
 
Best data science training, best data science training institute in hyderabad.
 Best data science training, best data science training institute in hyderabad. Best data science training, best data science training institute in hyderabad.
Best data science training, best data science training institute in hyderabad.
 
Best data science training, best data science training institute in hyderabad.
 Best data science training, best data science training institute in hyderabad. Best data science training, best data science training institute in hyderabad.
Best data science training, best data science training institute in hyderabad.
 
Data scientist course in hyderabad
Data scientist course in hyderabadData scientist course in hyderabad
Data scientist course in hyderabad
 
Data scientist training in bangalore
Data scientist training in bangaloreData scientist training in bangalore
Data scientist training in bangalore
 
Data science course in chennai (3)
Data science course in chennai (3)Data science course in chennai (3)
Data science course in chennai (3)
 
data science course in chennai
data science course in chennaidata science course in chennai
data science course in chennai
 
Best institute for data science in hyderabad
Best institute for data science in hyderabadBest institute for data science in hyderabad
Best institute for data science in hyderabad
 
Data science training
Data science trainingData science training
Data science training
 
Data science online course
Data science online courseData science online course
Data science online course
 
data science institute in bangalore
data science institute in bangaloredata science institute in bangalore
data science institute in bangalore
 
Best data science training, best data science training institute in hyderabad.
Best data science training, best data science training institute in hyderabad.Best data science training, best data science training institute in hyderabad.
Best data science training, best data science training institute in hyderabad.
 

Recently uploaded

Working Principle of Echo Sounder and Doppler Effect.pdf
Working Principle of Echo Sounder and Doppler Effect.pdfWorking Principle of Echo Sounder and Doppler Effect.pdf
Working Principle of Echo Sounder and Doppler Effect.pdfSkNahidulIslamShrabo
 
Software Engineering Practical File Front Pages.pdf
Software Engineering Practical File Front Pages.pdfSoftware Engineering Practical File Front Pages.pdf
Software Engineering Practical File Front Pages.pdfssuser5c9d4b1
 
What is Coordinate Measuring Machine? CMM Types, Features, Functions
What is Coordinate Measuring Machine? CMM Types, Features, FunctionsWhat is Coordinate Measuring Machine? CMM Types, Features, Functions
What is Coordinate Measuring Machine? CMM Types, Features, FunctionsVIEW
 
Adsorption (mass transfer operations 2) ppt
Adsorption (mass transfer operations 2) pptAdsorption (mass transfer operations 2) ppt
Adsorption (mass transfer operations 2) pptjigup7320
 
Passive Air Cooling System and Solar Water Heater.ppt
Passive Air Cooling System and Solar Water Heater.pptPassive Air Cooling System and Solar Water Heater.ppt
Passive Air Cooling System and Solar Water Heater.pptamrabdallah9
 
Artificial Intelligence in due diligence
Artificial Intelligence in due diligenceArtificial Intelligence in due diligence
Artificial Intelligence in due diligencemahaffeycheryld
 
electrical installation and maintenance.
electrical installation and maintenance.electrical installation and maintenance.
electrical installation and maintenance.benjamincojr
 
Independent Solar-Powered Electric Vehicle Charging Station
Independent Solar-Powered Electric Vehicle Charging StationIndependent Solar-Powered Electric Vehicle Charging Station
Independent Solar-Powered Electric Vehicle Charging Stationsiddharthteach18
 
Dynamo Scripts for Task IDs and Space Naming.pptx
Dynamo Scripts for Task IDs and Space Naming.pptxDynamo Scripts for Task IDs and Space Naming.pptx
Dynamo Scripts for Task IDs and Space Naming.pptxMustafa Ahmed
 
The Entity-Relationship Model(ER Diagram).pptx
The Entity-Relationship Model(ER Diagram).pptxThe Entity-Relationship Model(ER Diagram).pptx
The Entity-Relationship Model(ER Diagram).pptxMANASINANDKISHORDEOR
 
15-Minute City: A Completely New Horizon
15-Minute City: A Completely New Horizon15-Minute City: A Completely New Horizon
15-Minute City: A Completely New HorizonMorshed Ahmed Rahath
 
Insurance management system project report.pdf
Insurance management system project report.pdfInsurance management system project report.pdf
Insurance management system project report.pdfKamal Acharya
 
Fuzzy logic method-based stress detector with blood pressure and body tempera...
Fuzzy logic method-based stress detector with blood pressure and body tempera...Fuzzy logic method-based stress detector with blood pressure and body tempera...
Fuzzy logic method-based stress detector with blood pressure and body tempera...IJECEIAES
 
Maximizing Incident Investigation Efficacy in Oil & Gas: Techniques and Tools
Maximizing Incident Investigation Efficacy in Oil & Gas: Techniques and ToolsMaximizing Incident Investigation Efficacy in Oil & Gas: Techniques and Tools
Maximizing Incident Investigation Efficacy in Oil & Gas: Techniques and Toolssoginsider
 
NEWLETTER FRANCE HELICES/ SDS SURFACE DRIVES - MAY 2024
NEWLETTER FRANCE HELICES/ SDS SURFACE DRIVES - MAY 2024NEWLETTER FRANCE HELICES/ SDS SURFACE DRIVES - MAY 2024
NEWLETTER FRANCE HELICES/ SDS SURFACE DRIVES - MAY 2024EMMANUELLEFRANCEHELI
 
Autodesk Construction Cloud (Autodesk Build).pptx
Autodesk Construction Cloud (Autodesk Build).pptxAutodesk Construction Cloud (Autodesk Build).pptx
Autodesk Construction Cloud (Autodesk Build).pptxMustafa Ahmed
 
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdfInvolute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdfJNTUA
 
Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Ramkumar k
 
Research Methodolgy & Intellectual Property Rights Series 1
Research Methodolgy & Intellectual Property Rights Series 1Research Methodolgy & Intellectual Property Rights Series 1
Research Methodolgy & Intellectual Property Rights Series 1T.D. Shashikala
 
Diploma Engineering Drawing Qp-2024 Ece .pdf
Diploma Engineering Drawing Qp-2024 Ece .pdfDiploma Engineering Drawing Qp-2024 Ece .pdf
Diploma Engineering Drawing Qp-2024 Ece .pdfJNTUA
 

Recently uploaded (20)

Working Principle of Echo Sounder and Doppler Effect.pdf
Working Principle of Echo Sounder and Doppler Effect.pdfWorking Principle of Echo Sounder and Doppler Effect.pdf
Working Principle of Echo Sounder and Doppler Effect.pdf
 
Software Engineering Practical File Front Pages.pdf
Software Engineering Practical File Front Pages.pdfSoftware Engineering Practical File Front Pages.pdf
Software Engineering Practical File Front Pages.pdf
 
What is Coordinate Measuring Machine? CMM Types, Features, Functions
What is Coordinate Measuring Machine? CMM Types, Features, FunctionsWhat is Coordinate Measuring Machine? CMM Types, Features, Functions
What is Coordinate Measuring Machine? CMM Types, Features, Functions
 
Adsorption (mass transfer operations 2) ppt
Adsorption (mass transfer operations 2) pptAdsorption (mass transfer operations 2) ppt
Adsorption (mass transfer operations 2) ppt
 
Passive Air Cooling System and Solar Water Heater.ppt
Passive Air Cooling System and Solar Water Heater.pptPassive Air Cooling System and Solar Water Heater.ppt
Passive Air Cooling System and Solar Water Heater.ppt
 
Artificial Intelligence in due diligence
Artificial Intelligence in due diligenceArtificial Intelligence in due diligence
Artificial Intelligence in due diligence
 
electrical installation and maintenance.
electrical installation and maintenance.electrical installation and maintenance.
electrical installation and maintenance.
 
Independent Solar-Powered Electric Vehicle Charging Station
Independent Solar-Powered Electric Vehicle Charging StationIndependent Solar-Powered Electric Vehicle Charging Station
Independent Solar-Powered Electric Vehicle Charging Station
 
Dynamo Scripts for Task IDs and Space Naming.pptx
Dynamo Scripts for Task IDs and Space Naming.pptxDynamo Scripts for Task IDs and Space Naming.pptx
Dynamo Scripts for Task IDs and Space Naming.pptx
 
The Entity-Relationship Model(ER Diagram).pptx
The Entity-Relationship Model(ER Diagram).pptxThe Entity-Relationship Model(ER Diagram).pptx
The Entity-Relationship Model(ER Diagram).pptx
 
15-Minute City: A Completely New Horizon
15-Minute City: A Completely New Horizon15-Minute City: A Completely New Horizon
15-Minute City: A Completely New Horizon
 
Insurance management system project report.pdf
Insurance management system project report.pdfInsurance management system project report.pdf
Insurance management system project report.pdf
 
Fuzzy logic method-based stress detector with blood pressure and body tempera...
Fuzzy logic method-based stress detector with blood pressure and body tempera...Fuzzy logic method-based stress detector with blood pressure and body tempera...
Fuzzy logic method-based stress detector with blood pressure and body tempera...
 
Maximizing Incident Investigation Efficacy in Oil & Gas: Techniques and Tools
Maximizing Incident Investigation Efficacy in Oil & Gas: Techniques and ToolsMaximizing Incident Investigation Efficacy in Oil & Gas: Techniques and Tools
Maximizing Incident Investigation Efficacy in Oil & Gas: Techniques and Tools
 
NEWLETTER FRANCE HELICES/ SDS SURFACE DRIVES - MAY 2024
NEWLETTER FRANCE HELICES/ SDS SURFACE DRIVES - MAY 2024NEWLETTER FRANCE HELICES/ SDS SURFACE DRIVES - MAY 2024
NEWLETTER FRANCE HELICES/ SDS SURFACE DRIVES - MAY 2024
 
Autodesk Construction Cloud (Autodesk Build).pptx
Autodesk Construction Cloud (Autodesk Build).pptxAutodesk Construction Cloud (Autodesk Build).pptx
Autodesk Construction Cloud (Autodesk Build).pptx
 
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdfInvolute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
 
Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)
 
Research Methodolgy & Intellectual Property Rights Series 1
Research Methodolgy & Intellectual Property Rights Series 1Research Methodolgy & Intellectual Property Rights Series 1
Research Methodolgy & Intellectual Property Rights Series 1
 
Diploma Engineering Drawing Qp-2024 Ece .pdf
Diploma Engineering Drawing Qp-2024 Ece .pdfDiploma Engineering Drawing Qp-2024 Ece .pdf
Diploma Engineering Drawing Qp-2024 Ece .pdf
 

data mining Module 4.ppt

  • 2. Classification: Definition  Classification which is the task of assigning objects to one of several predefined categories. 1/9/2023 Introduction to Data Mining, 2nd Edition 2
  • 3. 1/9/2023 Introduction to Data Mining, 2nd Edition 3  Classification is the task of learning a target function f that maps each attribute set x to one of the predefined class y.  The target function is also known informally as a classification model.  A classification model is useful for the following purposes: Descriptive Modeling Predictive Modeling
  • 4. 1/9/2023 Introduction to Data Mining, 2nd Edition 4
  • 5. Given a collection of records (training set )  Each record is by characterized by a tuple (x,y), where x is the attribute set and y is the class label  x: attribute, predictor, independent variable, input  y: class, response, dependent variable, output 1/9/2023 Introduction to Data Mining, 2nd Edition 5
  • 6. General Approach for Building Classification Model Apply Model Induction Deduction Learn Model Model Tid Attrib1 Attrib2 Attrib3 Class 1 Yes Large 125K No 2 No Medium 100K No 3 No Small 70K No 4 Yes Medium 120K No 5 No Large 95K Yes 6 No Medium 60K No 7 Yes Large 220K No 8 No Small 85K Yes 9 No Medium 75K No 10 No Small 90K Yes 10 Tid Attrib1 Attrib2 Attrib3 Class 11 No Small 55K ? 12 Yes Medium 80K ? 13 Yes Large 110K ? 14 No Small 95K ? 15 No Large 67K ? 10 Test Set Learning algorithm Training Set 1/9/2023 Introduction to Data Mining, 2nd Edition 6
  • 7. Evaluation of the performance of a classification model 1/9/2023 Introduction to Data Mining, 2nd Edition 7
  • 8. Performation Metrics 1/9/2023 Introduction to Data Mining, 2nd Edition 8
  • 9. Decision Tree Induction  Consider a simpler version of the vertebrate classification problem described in the previous section  Instead of classifying the vertebrates into five distinct groups of species, we assign them to two categories: mammals and non-mammals 1/9/2023 Introduction to Data Mining, 2nd Edition 9
  • 10. 1/9/2023 Introduction to Data Mining, 2nd Edition 10
  • 11. Decision Tree Induction  A root node that has no incoming edges and zero or more outgoing edges.  Internal nodes, each of which has exactly one incoming edge and two or more outgoing edges.  Leaf or terminal nodes, each of which has exactly one incoming edge and no outgoing edges. In a decision tree, each leaf node is assigned a class label. 1/9/2023 Introduction to Data Mining, 2nd Edition 11
  • 12. 1/9/2023 Introduction to Data Mining, 2nd Edition 12
  • 13. Classifying unlabeled vertebrate 1/9/2023 Introduction to Data Mining, 2nd Edition 13
  • 14. Decision Tree Induction  Many Algorithms:  Hunt’s Algorithm (one of the earliest)  CART  ID3, C4.5  SLIQ,SPRINT 1/9/2023 Introduction to Data Mining, 2nd Edition 14
  • 15. Hunt’s AlgoritHm Let Dt be the set of training records that are associated with node t and y = {y1, y2, . . . , yc} be the class labels. The following is a recursive definition of Hunt‘s algorithm.  Step 1: If all the records in Data belong to the same class yt, then t is a leaf node labeled as yt.  Step 2: If Data contains records that belong to more than one class, an attribute test condition is selected to partition the records into smaller subsets. A child node is created for each outcome of the test condition and the records in Dt are distributed to the children based on the outcomes. The algorithm is then recursively applied to each child node 1/9/2023 Introduction to Data Mining, 2nd Edition 15
  • 16. Hunt’s Algorithm ID Home Owner Marital Status Annual Income Defaulted Borrower 1 Yes Single 125K No 2 No Married 100K No 3 No Single 70K No 4 Yes Married 120K No 5 No Divorced 95K Yes 6 No Married 60K No 7 Yes Divorced 220K No 8 No Single 85K Yes 9 No Married 75K No 10 No Single 90K Yes 10 (a) (b) (c) Defaulted = No Home Owner Yes No Defaulted = No Defaulted = No Yes No Marital Status Single, Divorced Married (d) Yes No Marital Status Single, Divorced Married Annual Income < 80K >= 80K Home Owner Defaulted = No Defaulted = No Defaulted = Yes Home Owner Defaulted = No Defaulted = No Defaulted = No Defaulted = Yes 1/9/2023 Introduction to Data Mining, 2nd Edition 16 (3,0) (4,3) (3,0) (1,3) (3,0) (3,0) (1,0) (0,3) (3,0) (7,3) Defaulted barrower
  • 17. Hunt’s Algorithm ID Home Owner Marital Status Annual Income Defaulted Borrower 1 Yes Single 125K No 2 No Married 100K No 3 No Single 70K No 4 Yes Married 120K No 5 No Divorced 95K Yes 6 No Married 60K No 7 Yes Divorced 220K No 8 No Single 85K Yes 9 No Married 75K No 10 No Single 90K Yes 10 (a) (b) (c) Defaulted = No Home Owner Yes No Defaulted = No Defaulted = No Yes No Marital Status Single, Divorced Married (d) Yes No Marital Status Single, Divorced Married Annual Income < 80K >= 80K Home Owner Defaulted = No Defaulted = No Defaulted = Yes Home Owner Defaulted = No Defaulted = No Defaulted = No Defaulted = Yes 1/9/2023 Introduction to Data Mining, 2nd Edition 17 (3,0) (4,3) (3,0) (1,3) (3,0) (3,0) (1,0) (0,3) (3,0) (7,3)
  • 18. Hunt’s Algorithm ID Home Owner Marital Status Annual Income Defaulted Borrower 1 Yes Single 125K No 2 No Married 100K No 3 No Single 70K No 4 Yes Married 120K No 5 No Divorced 95K Yes 6 No Married 60K No 7 Yes Divorced 220K No 8 No Single 85K Yes 9 No Married 75K No 10 No Single 90K Yes 10 (a) (b) (c) Defaulted = No Home Owner Yes No Defaulted = No Defaulted = No Yes No Marital Status Single, Divorced Married (d) Yes No Marital Status Single, Divorced Married Annual Income < 80K >= 80K Home Owner Defaulted = No Defaulted = No Defaulted = Yes Home Owner Defaulted = No Defaulted = No Defaulted = No Defaulted = Yes 1/9/2023 Introduction to Data Mining, 2nd Edition 18 (3,0) (4,3) (3,0) (1,3) (3,0) (3,0) (1,0) (0,3) (3,0) (7,3)
  • 19. Hunt’s Algorithm (a) (b) (c) Defaulted = No Home Owner Yes No Defaulted = No Defaulted = No Yes No Marital Status Single, Divorced Married (d) Yes No Marital Status Single, Divorced Married Annual Income < 80K >= 80K Home Owner Defaulted = No Defaulted = No Defaulted = Yes Home Owner Defaulted = No Defaulted = No Defaulted = No Defaulted = Yes 1/9/2023 Introduction to Data Mining, 2nd Edition 19 (3,0) (4,3) (3,0) (1,3) (3,0) (3,0) (1,0) (0,3) (3,0) (7,3)
  • 20. Apply Model to Test Data 1/9/2023 Introduction to Data Mining, 2nd Edition 20 Home Owner MarSt Income YES NO NO NO Yes No Married Single, Divorced < 80K > 80K Home Owner Marital Status Annual Income Defaulted Borrower No Married 80K ? 10 Test Data Start from the root of tree.
  • 21. Apply Model to Test Data 1/9/2023 Introduction to Data Mining, 2nd Edition 21 MarSt Income YES NO NO NO Yes No Married Single, Divorced < 80K > 80K Home Owner Marital Status Annual Income Defaulted Borrower No Married 80K ? 10 Test Data Home Owner
  • 22. Apply Model to Test Data 1/9/2023 Introduction to Data Mining, 2nd Edition 22 MarSt Income YES NO NO NO Yes No Married Single, Divorced < 80K > 80K Home Owner Marital Status Annual Income Defaulted Borrower No Married 80K ? 10 Test Data Home Owner
  • 23. Apply Model to Test Data 1/9/2023 Introduction to Data Mining, 2nd Edition 23 MarSt Income YES NO NO NO Yes No Married Single, Divorced < 80K > 80K Home Owner Marital Status Annual Income Defaulted Borrower No Married 80K ? 10 Test Data Home Owner
  • 24. Apply Model to Test Data 1/9/2023 Introduction to Data Mining, 2nd Edition 24 MarSt Income YES NO NO NO Yes No Married Single, Divorced < 80K > 80K Home Owner Marital Status Annual Income Defaulted Borrower No Married 80K ? 10 Test Data Home Owner
  • 25. Apply Model to Test Data 1/9/2023 Introduction to Data Mining, 2nd Edition 25 MarSt Income YES NO NO NO Yes No Married Single, Divorced < 80K > 80K Home Owner Marital Status Annual Income Defaulted Borrower No Married 80K ? 10 Test Data Assign Defaulted to “No” Home Owner
  • 26. Design Issues of Decision Tree Induction  How should training records be split?  Method for specifying test condition  depending on attribute types  Measure for evaluating the goodness of a test condition  How should the splitting procedure stop?  Stop splitting if all the records belong to the same class or have identical attribute values  Early termination 1/9/2023 Introduction to Data Mining, 2nd Edition 26
  • 27. Methods for Expressing Test Conditions  Depends on attribute types  Binary  Nominal  Ordinal  Continuous 1/9/2023 Introduction to Data Mining, 2nd Edition 27
  • 28. Binary Attributes: 1/9/2023 Introduction to Data Mining, 2nd Edition 28
  • 29. Test Condition for Nominal Attributes  Multi-way split:  Use as many partitions as distinct values.  Binary split:  Divides values into two subsets Marital Status Single Divorced Married {Single} {Married, Divorced} Marital Status {Married} {Single, Divorced} Marital Status OR 1/9/2023 Introduction to Data Mining, 2nd Edition 29 OR {Single, Married} Marital Status {Divorced}
  • 30. Test Condition for Ordinal Attributes  Multi-way split:  Use as many partitions as distinct values  Binary split:  Divides values into two subsets  Preserve order property among attribute values Large Shirt Size Medium Extra Large Small {Medium, Large, Extra Large} Shirt Size {Small} {Large, Extra Large} Shirt Size {Small, Medium} 1/9/2023 Introduction to Data Mining, 2nd Edition 30 {Medium, Extra Large} Shirt Size {Small, Large} This grouping violates order property
  • 31. Test Condition for Continuous Attributes Annual Income > 80K? Yes No Annual Income? (i) Binary split (ii) Multi-way split < 10K [10K,25K) [25K,50K) [50K,80K) > 80K 1/9/2023 Introduction to Data Mining, 2nd Edition 31
  • 32. How to determine the Best Split 1/9/2023 Introduction to Data Mining, 2nd Edition 32  Greedy approach:  Nodes with purer class distribution are preferred  Need a measure of node impurity: e.g., a node with class counts C0: 5, C1: 5 has a high degree of impurity, while a node with C0: 9, C1: 1 has a low degree of impurity
  • 33. Measures for selecting the Best Split 1/9/2023 Introduction to Data Mining, 2nd Edition 33  Gini Index: GINI(t) = 1 - Σ_j [p(j|t)]²  Entropy: Entropy(t) = - Σ_j p(j|t) log₂ p(j|t)  Misclassification error: Error(t) = 1 - max_i P(i|t)
  • 34. 1/9/2023 Introduction to Data Mining, 2nd Edition 34 Examples of computing the different impurity measures (see the sketch below)
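The worked examples that belong on this slide did not survive extraction, so here is a minimal Python sketch (not from the textbook) that evaluates the three impurity measures on a node's class counts; the distributions (5, 5), (9, 1) and (10, 0) are purely illustrative.

import math

def gini(counts):
    # GINI(t) = 1 - sum_j p(j|t)^2
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def entropy(counts):
    # Entropy(t) = -sum_j p(j|t) log2 p(j|t), with 0 log 0 taken as 0
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def classification_error(counts):
    # Error(t) = 1 - max_i P(i|t)
    n = sum(counts)
    return 1.0 - max(counts) / n

for counts in [(5, 5), (9, 1), (10, 0)]:
    print(counts, round(gini(counts), 3), round(entropy(counts), 3),
          round(classification_error(counts), 3))
# (5, 5) is maximally impure; (10, 0) is perfectly pure under all three measures.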
  • 35. Comparison among Impurity Measures 1/9/2023 Introduction to Data Mining, 2nd Edition 35 For a 2-class problem:
  • 36. Finding the Best Split for Binary attributes 1/9/2023 Introduction to Data Mining, 2nd Edition 36
  • 37. 1/9/2023 Introduction to Data Mining, 2nd Edition 37 If attribute A is chosen to split the data, the Gini index for node N1 is 0.4898 and for node N2 it is 0.480. The weighted average of the Gini index for the descendant nodes is (7/12) x 0.4898 + (5/12) x 0.480 = 0.486. Similarly, we can show that the weighted average of the Gini index for attribute B is 0.375.
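A small sketch of the weighted-average computation above. The child class counts (4, 3) for N1 and (3, 2) for N2 are illustrative values chosen only because they reproduce the quoted Gini indices 0.4898 and 0.480; they are not taken from the missing figure.

def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def weighted_gini(children):
    # Weight each child's Gini index by its share of the records.
    total = sum(sum(c) for c in children)
    return sum(sum(c) / total * gini(c) for c in children)

n1, n2 = (4, 3), (3, 2)                    # assumed class counts for nodes N1 and N2
print(round(gini(n1), 4))                  # 0.4898
print(round(gini(n2), 4))                  # 0.48
print(round(weighted_gini([n1, n2]), 3))   # 0.486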
  • 38. Categorical Attributes: Computing Gini Index  For each distinct value, gather counts for each class in the dataset  Use the count matrix to make decisions 1/9/2023 Introduction to Data Mining, 2nd Edition 38
Multi-way split (CarType = Family / Sports / Luxury): C1 = 1 / 8 / 1, C2 = 3 / 0 / 7, Gini = 0.163
Two-way split, CarType {Sports, Luxury} vs. {Family}: C1 = 9 / 1, C2 = 7 / 3, Gini = 0.468
Two-way split, CarType {Sports} vs. {Family, Luxury}: C1 = 8 / 2, C2 = 0 / 10, Gini = 0.167
Which of these is the best? (For a two-way split, find the best partition of values.)
  • 39. Continuous Attributes: Computing Gini Index...  For efficient computation: for each attribute, – Sort the attribute on values – Linearly scan these values, each time updating the count matrix and computing the Gini index – Choose the split position that has the least Gini index 1/9/2023 Introduction to Data Mining, 2nd Edition 39
Sorted values (Annual Income): 60, 70, 75, 85, 90, 95, 100, 120, 125, 220, with class labels (Cheat): No, No, No, Yes, Yes, Yes, No, No, No, No
Candidate split positions: 55, 65, 72, 80, 87, 92, 97, 110, 122, 172, 230, with Gini index 0.420, 0.400, 0.375, 0.343, 0.417, 0.400, 0.300, 0.343, 0.375, 0.400, 0.420; the best split is at 97 (Gini = 0.300)
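A simplified sketch of the scan described above, run on the ten records from this slide. For clarity it recomputes the class counts at every candidate split rather than updating the count matrix incrementally as the slide recommends, and it uses exact midpoints (97.5 instead of the rounded 97 shown above).

incomes = [60, 70, 75, 85, 90, 95, 100, 120, 125, 220]
cheat = ['No', 'No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'No', 'No', 'No']

def gini(labels):
    if not labels:
        return 0.0
    p = labels.count('Yes') / len(labels)
    return 1.0 - p ** 2 - (1.0 - p) ** 2

records = sorted(zip(incomes, cheat))
# Candidate split positions: midpoints between adjacent sorted values.
candidates = [(records[i][0] + records[i + 1][0]) / 2 for i in range(len(records) - 1)]

best = None
for split in candidates:
    left = [c for v, c in records if v <= split]
    right = [c for v, c in records if v > split]
    w = (len(left) * gini(left) + len(right) * gini(right)) / len(records)
    if best is None or w < best[1]:
        best = (split, w)

print(best)   # (97.5, 0.3): the split near 97 with Gini 0.300 quoted above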
  • 40. Consider the training examples shown in the table for a binary classification problem. 1/9/2023 Introduction to Data Mining, 2nd Edition 40
  • 41. (a) Compute the Gini index for the overall collection of training examples. (b) Compute the Gini index for the Customer ID attribute. (c) Compute the Gini index for the Gender attribute. (d) Compute the Gini index for the Car Type attribute using multiway split. (e) Compute the Gini index for the Shirt Size attribute using multiway split. (f) Which attribute is better, Gender, Car Type, or Shirt Size? 1/9/2023 Introduction to Data Mining, 2nd Edition 41
  • 42. Algorithm for Decision Tree Induction 1/9/2023 Introduction to Data Mining, 2nd Edition 42
  • 43. Characteristics of Decision Tree Induction  Decision tree induction is a nonparametric approach for building classification models.  Finding an optimal decision tree is an NP-complete problem.  Techniques developed for constructing decision trees are computationally inexpensive  Decision trees, especially smaller-sized trees, are relatively easy to interpret. Introduction to Data Mining, 2nd Edition 43
  • 44.  Decision trees provide an expressive representation for learning discrete-valued functions  Decision tree algorithms are quite robust to the presence of noise, especially when methods for avoiding overfitting are employed  The presence of redundant attributes does not adversely affect the accuracy of decision trees. 1/9/2023 Introduction to Data Mining, 2nd Edition 44
  • 45.  Since most decision tree algorithms employ a top-down, recursive partitioning approach, the number of records becomes smaller as we traverse down the tree.  A subtree can be replicated multiple times in a decision tree 1/9/2023 Introduction to Data Mining, 2nd Edition 45 Characteristics of Decision Tree Induction
  • 46. Consider the training examples shown in Table 4.2 for a binary classification problem. (a) What is the entropy of this collection of training examples with respect to the positive class? (b) What are the information gains of a1 and a2 relative to these training examples? (c) For a3, which is a continuous attribute, compute the information gain for every possible split. (d)What is the best split (among a1, a2, and a3) according to the information gain? (e) What is the best split (between a1 and a2) according to the classification error rate? (f) What is the best split (between a1 and a2) according to the Gini index?
  • 47. Rule-Based Classifier  Classify records by using a collection of “if…then…” rules R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles R5: (Live in Water = sometimes) → Amphibians
  • 48.  The left-hand side of the rule is called the rule antecedent or precondition.  It contains a conjunction of attribute tests: Conditioni = (A1 op v1) ∧ (A2 op v2) ∧ . . . (Ak op vk)  where (Aj , vj) is an attribute-value pair and op is a logical operator chosen from the set {=, ≠,<,>,≤,≥}. Each attribute test (Aj op vj) is known as a conjunct.  The right-hand side of the rule is called the rule consequent, which contains the predicted class yi. 1/9/2023 Introduction to Data Mining, 2nd Edition 48 Rule Representation
  • 51. Rule Coverage and Accuracy  Coverage of a rule:  Fraction of records that satisfy the antecedent of a rule  Accuracy of a rule:  Fraction of records that satisfy both the antecedent and consequent of a rule
Tid | Refund | Marital Status | Taxable Income | Class
1 | Yes | Single | 125K | No
2 | No | Married | 100K | No
3 | No | Single | 70K | No
4 | Yes | Married | 120K | No
5 | No | Divorced | 95K | Yes
6 | No | Married | 60K | No
7 | Yes | Divorced | 220K | No
8 | No | Single | 85K | Yes
9 | No | Married | 75K | No
10 | No | Single | 90K | Yes
Rule: (Status = Single) → No; Coverage = 40%, Accuracy = 50%
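A minimal sketch that reproduces the coverage and accuracy quoted above for the rule (Status = Single) → No on the ten records in this table.

records = [
    ("Yes", "Single", 125, "No"),   ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"),     ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"),  ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"), ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"),    ("No", "Single", 90, "Yes"),
]   # (Refund, Marital Status, Taxable Income in K, Class)

covered = [r for r in records if r[1] == "Single"]             # antecedent: Status = Single
coverage = len(covered) / len(records)                         # 4/10 = 0.40
accuracy = sum(r[3] == "No" for r in covered) / len(covered)   # 2/4  = 0.50
print(coverage, accuracy)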
  • 53. Characteristics of Rule-Based Classifier  Mutually exclusive rules  Classifier contains mutually exclusive rules if the rules are independent of each other  Every record is covered by at most one rule  Exhaustive rules  Classifier has exhaustive coverage if it accounts for every possible combination of attribute values  Each record is covered by at least one rule
  • 54. Effect of Rule Simplification  Rules are no longer mutually exclusive  A record may trigger more than one rule  Solution?  Ordered rule set  Unordered rule set – use voting schemes  Rules are no longer exhaustive  A record may not trigger any rules  Solution?  Use a default class
  • 55. Ordered Rule Set  Rules are rank ordered according to their priority  An ordered rule set is known as a decision list  When a test record is presented to the classifier  It is assigned to the class label of the highest ranked rule it has triggered  If none of the rules fired, it is assigned to the default class
  • 56. Rule Ordering Schemes  Rule-based ordering  Individual rules are ranked based on their quality  Class-based ordering  Rules that belong to the same class appear together
  • 58. How to Build a Rule-Based Classifier  Direct Method:  Extract rules directly from data  e.g.: RIPPER, CN2, Holte’s 1R  Indirect Method:  Extract rules from other classification models (e.g. decision trees, neural networks, etc).  e.g: C4.5rules
  • 59. Direct Method: Sequential Covering 1. Start from an empty rule 2. Grow a rule using the Learn-One-Rule function 3. Remove training records covered by the rule 4. Repeat Steps (2) and (3) until the stopping criterion is met (see the sketch below)
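A pseudocode-style sketch of this loop, under stated assumptions: learn_one_rule is a hypothetical greedy rule grower (for example, guided by FOIL's information gain) and each rule object exposes a matches(record) predicate; neither is defined here.

def sequential_covering(records, target_class, learn_one_rule, min_coverage=1):
    rules = []
    remaining = list(records)
    while remaining:
        rule = learn_one_rule(remaining, target_class)       # Step 2: grow one rule
        covered = [r for r in remaining if rule.matches(r)]
        if len(covered) < min_coverage:                       # Step 4: stopping criterion
            break
        rules.append(rule)
        # Step 3: remove the training records covered by the rule.
        remaining = [r for r in remaining if not rule.matches(r)]
    return rules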
  • 61. Example of Sequential Covering [Figure: (i) Original Data, (ii) Step 1]
  • 62. Example of Sequential Covering… [Figure: (iii) Step 2, rule R1; (iv) Step 3, rules R1 and R2]
  • 63. Rule Growing  Two common strategies: general-to-specific (start from an empty rule and greedily add conjuncts) and specific-to-general (start from a rule covering a positive example and greedily remove conjuncts)
  • 64. Rule Evaluation  An evaluation metric is needed to determine which conjunct should be added (or removed) during the rule-growing process.  Accuracy is an obvious choice because it explicitly measures the fraction of training examples classified correctly by the rule.  A potential limitation of accuracy is that it does not take into account the rule’s coverage. 1/9/2023 Introduction to Data Mining, 2nd Edition 64
  • 65. For example, consider a training set that contains 60 positive examples and 100 negative examples. Suppose we are given the following two candidate rules: Rule r1: covers 50 positive examples and 5 negative examples, Rule r2: covers 2 positive examples and no negative examples. The accuracies for r1 and r2 are 90.9% and 100%, respectively. However, r1 is the better rule despite its lower accuracy. The high accuracy for r2 is potentially spurious because the coverage of the rule is too low. 1/9/2023 Introduction to Data Mining, 2nd Edition 65
  • 66. Approaches to handle this problem: 1. Likelihood ratio statistic 2. Laplace measure 3. FOIL’s information gain 1/9/2023 Introduction to Data Mining, 2nd Edition 66
  • 67.  Likelihood ratio statistic: R = 2 Σ_{i=1..k} f_i log₂(f_i / e_i), where k is the number of classes, f_i is the observed frequency of class i examples that are covered by the rule, and e_i is the expected frequency of a rule that makes random predictions 1/9/2023 Introduction to Data Mining, 2nd Edition 67
  • 68. The likelihood ratio for r1 is  R(r1) = 2 × [50 × log2(50/20.625) + 5 × log2(5/34.375)] = 99.9. The likelihood ratio statistic for r2 is  R(r2) = 2 × [2 × log2(2/0.75) + 0 × log2(0/1.25)] = 5.66. This statistic therefore suggests that r1 is a better rule than r2. 1/9/2023 Introduction to Data Mining, 2nd Edition 68
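A small sketch that reproduces the two likelihood ratio values above (60 positive and 100 negative training examples overall; a class with observed frequency 0 contributes nothing to the sum).

import math

def likelihood_ratio(observed, expected):
    return 2 * sum(f * math.log2(f / e) for f, e in zip(observed, expected) if f > 0)

pos, neg = 60, 100
total = pos + neg

def expected_counts(covered):
    # Expected class frequencies for a rule that makes random predictions.
    return [covered * pos / total, covered * neg / total]

print(round(likelihood_ratio([50, 5], expected_counts(55)), 1))   # r1: ~99.9
print(round(likelihood_ratio([2, 0], expected_counts(2)), 2))     # r2: ~5.66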
  • 69. Laplace Measure: Laplace = (f₊ + 1) / (n + k), where n is the number of examples covered by the rule, f₊ is the number of positive examples covered by the rule, and k is the total number of classes. A related measure, the m-estimate = (f₊ + k·p₊) / (n + k), also uses p₊, the prior probability for the positive class. 1/9/2023 Introduction to Data Mining, 2nd Edition 69
  • 70. The Laplace measure for r1 is 51/57 = 89.47%, which is quite close to its accuracy. Conversely, the Laplace measure for r2 (75%) is significantly lower than its accuracy because r2 has a much lower coverage. 1/9/2023 Introduction to Data Mining, 2nd Edition 70
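A one-line sketch of the Laplace measure for the same two candidate rules (k = 2 classes).

def laplace(f_pos, n, k=2):
    return (f_pos + 1) / (n + k)

print(round(laplace(50, 55), 4))   # r1: 51/57 = 0.8947
print(round(laplace(2, 2), 2))     # r2: 3/4  = 0.75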
  • 71.  FOIL’s information gain 1/9/2023 Introduction to Data Mining, 2nd Edition 71
  • 72.  FOIL’s Information Gain  R0: {} => class (initial rule)  R1: {A} => class (rule after adding conjunct) Gain(R0, R1) = p1 x [ log2(p1/(p1+n1)) – log2(p0/(p0+n0)) ] where t: number of positive instances covered by both R0 and R1 (equal to p1 when R1 is obtained by adding a conjunct to R0), p0: number of positive instances covered by R0, n0: number of negative instances covered by R0, p1: number of positive instances covered by R1, n1: number of negative instances covered by R1
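A sketch of FOIL's information gain as defined above, comparing the earlier candidate rules r1 (50 positive, 5 negative) and r2 (2 positive, 0 negative) against an initial empty rule that covers all 60 positive and 100 negative examples; the resulting values (about 63.9 vs. 2.8) are computed here, not quoted from the slides.

import math

def foil_gain(p0, n0, p1, n1):
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

print(round(foil_gain(60, 100, 50, 5), 2))   # r1: ~63.88, the preferred rule
print(round(foil_gain(60, 100, 2, 0), 2))    # r2: ~2.83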
  • 73. (a) Rule accuracy (b) FOIL’s information gain. (c) likelihood ratio (d) The Laplace measure. (e) m-estimate measure (with k = 2 and p+ = 0.2). 1/9/2023 Introduction to Data Mining, 2nd Edition 73
  • 74. Direct Method: RIPPER  For 2-class problem, choose one of the classes as the positive class, and the other as the negative class  Learn rules for the positive class  The negative class will be the default class  For multi-class problem  Order the classes according to increasing class prevalence (fraction of instances that belong to a particular class)  Learn the rule set for the smallest class first, treat the rest as the negative class  Repeat with the next smallest class as the positive class
  • 75. Direct Method: RIPPER  Growing a rule:  Start from empty rule  Add conjuncts as long as they improve FOIL’s information gain  Stop when rule no longer covers negative examples  Prune the rule immediately using incremental reduced error pruning  Measure for pruning: v = (p-n)/(p+n)  p: number of positive examples covered by the rule in the validation set  n: number of negative examples covered by the rule in the validation set  Pruning method: delete any final sequence of conditions that maximizes v
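A tiny sketch of the pruning metric v = (p - n) / (p + n); the validation-set counts are invented for illustration, and the point is only that dropping a trailing conjunct is accepted when it increases v.

def pruning_metric(p, n):
    return (p - n) / (p + n)

v_full = pruning_metric(10, 4)     # full rule: 10 pos / 4 neg on the validation set
v_pruned = pruning_metric(14, 5)   # after dropping the last conjunct: 14 pos / 5 neg
print(round(v_full, 3), round(v_pruned, 3))   # 0.429 < 0.474, so keep the pruned rule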
  • 87. RIPPER 1/9/2023 Introduction to Data Mining, 2nd Edition 87
  • 88. 2. Indirect Methods for Rule Extraction 1/9/2023 Introduction to Data Mining, 2nd Edition 88
  • 90. Characteristics of Rule-Based Classifiers  As highly expressive as decision trees  Easy to interpret  Easy to generate  Can classify new instances rapidly
  • 91. Nearest Neighbor Classifiers  Basic idea:  If it walks like a duck and quacks like a duck, then it’s probably a duck [Figure: compute the distance from a test record to the training records and choose k of the “nearest” records]
  • 92. Nearest-Neighbor Classifiers  The nearest-neighbor classifier assumes similarity between a new case and the available cases, and assigns the new case to the category whose stored examples are most similar to it.
  • 93.  Example: Suppose we have an image of a creature that looks similar to both a cat and a dog, and we want to know whether it is a cat or a dog. We can use the KNN algorithm for this identification, since it works on a similarity measure. The KNN model will find the features of the new image that are most similar to those of the cat and dog images and, based on the most similar features, place it in either the cat or the dog category.
  • 94. Suppose there are two categories, Category A and Category B, and we have a new data point x1; which of these categories does it belong to? To solve this type of problem, we need a K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular data point. Consider the diagram below:
  • 95. How does K-NN work? The K-NN working can be explained on the basis of the below algorithm:  Step-1: Select the number K of the neighbors  Step-2: Calculate the Euclidean distance of new data point with all other data points.  Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.  Step-4: Among these k neighbors, count the number of the data points in each category.  Step-5: Assign the new data points to that category for which the number of the neighbor is maximum.  Step-6: Our model is ready.
  • 96.  Suppose we have a new data point and we need to put it in the required category. Consider the below image:
  • 97.  Firstly, we will choose the number of neighbors, so we choose k = 5.  Next, we will calculate the Euclidean distance between the data points. The Euclidean distance is the distance between two points, which we have already studied in geometry. For two points (x1, y1) and (x2, y2) it can be calculated as d = √((x2 - x1)² + (y2 - y1)²).
  • 98.  By calculating the Euclidean distances we obtain the nearest neighbors: three nearest neighbors in Category A and two nearest neighbors in Category B. Consider the image below: • Since the majority (3 of the 5) of the nearest neighbors are from Category A, the new data point is assigned to Category A.
  • 99. Nearest-Neighbor Classifiers  Requires three things – The set of stored records – Distance metric to compute distance between records – The value of k, the number of nearest neighbors to retrieve  To classify an unknown record: – Compute distance to other training records – Identify k nearest neighbors – Use class labels of nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote)
  • 100. Definition of Nearest Neighbor [Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor of a record x] The k-nearest neighbors of a record x are the data points that have the k smallest distances to x
  • 101. Nearest Neighbor Classification  Compute the distance between two points:  Euclidean distance: d(p, q) = √( Σ_i (p_i - q_i)² )  Determine the class from the nearest-neighbor list  Take the majority vote of class labels among the k-nearest neighbors  Weigh the vote according to distance  weight factor, w = 1/d²
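A compact sketch of a k-nearest-neighbor classifier with Euclidean distance, supporting both a plain majority vote and the distance-weighted vote w = 1/d² mentioned above; the 2-D toy points are made up for illustration.

import math
from collections import Counter

def euclidean(p, q):
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_predict(train, x, k=3, weighted=False):
    # train is a list of (point, label) pairs; x is the record to classify.
    neighbors = sorted(train, key=lambda rec: euclidean(rec[0], x))[:k]
    votes = Counter()
    for point, label in neighbors:
        d = euclidean(point, x)
        votes[label] += (1.0 / (d ** 2 + 1e-9)) if weighted else 1.0
    return votes.most_common(1)[0][0]

train = [((1, 1), 'A'), ((1, 2), 'A'), ((2, 1), 'A'),
         ((6, 6), 'B'), ((6, 7), 'B')]
print(knn_predict(train, (2, 2), k=3))                 # 'A' (majority vote)
print(knn_predict(train, (5, 5), k=3, weighted=True))  # 'B' (distance-weighted vote)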
  • 103. Characteristics of Nearest-Neighbor Classifiers  Uses specific training instances to make predictions without having to maintain an abstraction (or model) derived from data.  Lazy learners such as nearest-neighbor classifiers are quite expensive at classification time, because each test example must be compared against the training records  Make their predictions based on local information  Produce arbitrarily shaped decision boundaries.  Can produce wrong predictions unless the appropriate proximity measure and data preprocessing steps are taken. 1/9/2023 Introduction to Data Mining, 2nd Edition 103
  • 104. Bayesian Classifiers  Bayes theorem, a statistical principle for combining prior knowledge of the classes with new evidence gathered from data.  Two Implementations of Bayesian classifiers:  Naive Bayes  Bayesian belief network. 1/9/2023 Introduction to Data Mining, 2nd Edition 104
  • 105. Bayes Theorem  Let X and Y be a pair of random variables. Their joint probability, P(X=x, Y=y), refers to the probability that variable X will take on the value x and variable Y will take on the value y.  A conditional probability is the probability that a random variable will take on a particular value given that the outcome for another random variable is known.  For example, The conditional probability P(Y=y | X=x) refers to the probability that the variable Y will take on the value y, given that the variable X is observed to have the value x. 1/9/2023 Introduction to Data Mining, 2nd Edition 105
  • 106.  The joint and conditional probabilities for X and Y are related in the following way: P(X, Y) = P(Y|X) x P(X) = P(X|Y) x P(Y).  Bayes theorem: P(Y|X) = P(X|Y) x P(Y) / P(X) 1/9/2023 Introduction to Data Mining, 2nd Edition 106
  • 107.  Consider a football game between two rival teams: Team 0 and Team 1.  Suppose Team 0 wins 65% of the time and Team 1 wins the remaining matches.  Among the games won by Team 0, only 30% of them come from playing on Team 1’s football field.  On the other hand, 75% of the victories for Team 1 are obtained while playing at home.  If Team 1 is to host the next match between the two teams, which team will most likely emerge as the winner? 1/9/2023 Introduction to Data Mining, 2nd Edition 107
  • 108. 1/9/2023 Introduction to Data Mining, 2nd Edition 108 Let X = 1 denote that Team 1 hosts the match and let Y be the winning team. By Bayes theorem, P(Y = 1 | X = 1) = P(X = 1 | Y = 1) P(Y = 1) / P(X = 1) = (0.75 x 0.35) / (0.75 x 0.35 + 0.30 x 0.65) = 0.5738, so P(Y = 0 | X = 1) = 1 - P(Y = 1 | X = 1) = 0.4262. Since P(Y = 1 | X = 1) > P(Y = 0 | X = 1), Team 1 has a better chance than Team 0 of winning the next match.
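A short sketch that reproduces this computation directly from the probabilities stated on the previous slide.

p_y1 = 0.35            # Team 1 wins 35% of matches
p_y0 = 0.65            # Team 0 wins 65% of matches
p_x1_given_y1 = 0.75   # 75% of Team 1's wins are at home (X = 1: Team 1 hosts)
p_x1_given_y0 = 0.30   # 30% of Team 0's wins are on Team 1's field

p_x1 = p_x1_given_y1 * p_y1 + p_x1_given_y0 * p_y0   # law of total probability
p_y1_given_x1 = p_x1_given_y1 * p_y1 / p_x1          # Bayes theorem
print(round(p_y1_given_x1, 4))        # 0.5738
print(round(1 - p_y1_given_x1, 4))    # 0.4262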
  • 109. Using the Bayes Theorem for Classification  Let X denote the attribute set and Y denote the class variable. If the class variable has a non-deterministic relationship with the attributes  Then we can treat X and Y as random variables and capture their relationship probabilistically using P(Y|X).  This conditional probability is also known as the posterior probability for Y, as opposed to its prior probability, P(Y). 1/9/2023 Introduction to Data Mining, 2nd Edition 109
  • 110.  The Bayes theorem is useful because it allows us to express the posterior probability in terms of the prior probability P(Y), the class-conditional probability P(X|Y), and the evidence, P(X).  To estimate the class-conditional probabilities P(X|Y), we present two implementations of Bayesian classification methods: the naïve Bayes classifier and the Bayesian belief network. 1/9/2023 Introduction to Data Mining, 2nd Edition 110
  • 111. Naïve Bayes Classifier  A naïve Bayes classifier estimates the class-conditional probability by assuming that the attributes are conditionally independent, given the class label Y.  The conditional independence assumption can be formally stated as follows: P(X | Y = y) = Π_{i=1..d} P(X_i | Y = y), where each attribute set X = {X1, X2, ..., Xd} consists of d attributes. 1/9/2023 Introduction to Data Mining, 2nd Edition 111
  • 113. 1/9/2023 Introduction to Data Mining, 2nd Edition 113 P(Play=Yes) = 9/14 P(Play=No) = 5/14
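A minimal sketch of a categorical naïve Bayes classifier: estimate the prior P(Y) and each class-conditional P(X_i | Y) from counts, multiply them under the conditional independence assumption, and pick the class with the largest product. The tiny weather-style dataset below is hypothetical (it is not the 14-record table behind the 9/14 and 5/14 priors above), and no smoothing is applied.

from collections import Counter, defaultdict

def train_nb(records, target):
    priors = Counter(r[target] for r in records)
    cond = defaultdict(Counter)            # (attribute, class) -> value counts
    for r in records:
        for attr, val in r.items():
            if attr != target:
                cond[(attr, r[target])][val] += 1
    return priors, cond, len(records)

def predict_nb(x, priors, cond, n):
    scores = {}
    for y, ny in priors.items():
        p = ny / n                              # prior P(Y = y)
        for attr, val in x.items():
            p *= cond[(attr, y)][val] / ny      # class-conditional P(X_i = val | Y = y)
        scores[y] = p
    return max(scores, key=scores.get), scores

records = [
    {"Outlook": "Sunny", "Windy": "No", "Play": "No"},
    {"Outlook": "Sunny", "Windy": "Yes", "Play": "No"},
    {"Outlook": "Rain", "Windy": "No", "Play": "Yes"},
    {"Outlook": "Rain", "Windy": "Yes", "Play": "No"},
    {"Outlook": "Overcast", "Windy": "No", "Play": "Yes"},
    {"Outlook": "Overcast", "Windy": "Yes", "Play": "Yes"},
]
priors, cond, n = train_nb(records, "Play")
print(predict_nb({"Outlook": "Rain", "Windy": "No"}, priors, cond, n))   # predicts "Yes"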
  • 117. Bayesian Belief Networks  A Bayesian belief network (BBN), or simply, Bayesian network, provides a graphical representation of the probabilistic relationships among a set of random variables.  There are two key elements of a Bayesian network: 1. A directed acyclic graph (dag) encoding the dependence relationships among a set of variables. 2. A probability table associating each node to its immediate parent nodes. 1/9/2023 Introduction to Data Mining, 2nd Edition 117
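A toy sketch of the two elements listed above, using an invented three-node network A -> C <- B (it is not the network from the book's figures). The DAG is implicit in the factorization P(A, B, C) = P(A) P(B) P(C | A, B), and the probability table for C is indexed by its parent nodes.

p_a = {True: 0.3, False: 0.7}          # prior table for node A
p_b = {True: 0.6, False: 0.4}          # prior table for node B
p_c_given_ab = {                       # table for node C given its parents (A, B)
    (True, True): 0.9, (True, False): 0.7,
    (False, True): 0.5, (False, False): 0.1,
}

def joint(a, b, c):
    pc = p_c_given_ab[(a, b)]
    return p_a[a] * p_b[b] * (pc if c else 1 - pc)

# Example query: P(C = True), obtained by summing the joint over A and B.
p_c = sum(joint(a, b, True) for a in (True, False) for b in (True, False))
print(round(p_c, 3))   # 0.484 with these invented numbers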
  • 121. 1/9/2023 Introduction to Data Mining, 2nd Edition 121 Model Building: Model building in Bayesian networks involves two steps: (1) creating the structure of the network, and (2) estimating the probability values in the tables associated with each node.