Chapter 6
UNIVERSITY OF INFORMATION TECHNOLOGY
Faculty of Information Systems
CLASSIFICATION
Cao Thi Nhan
1. Introduction
2. Decision Tree
3. Bayes Classification Methods
4. Neural network
5. K - Nearest Neighbor Classifier
6. Support Vector Machine
CONTENT
INTRODUCTION
Introduction
Supervised vs. Unsupervised Learning
Supervised Learning (classification)
Supervision: The training data (observations,
measurements,…) are accompanied by labels
indicating the class of the observations
New data is classified based on the training set
Unsupervised Learning (Clustering)
The class labels of training data are unknown
Given a set of measurements, observations, etc.
with the aim of establishing the existence of classes
or clusters in the data
Introduction
Classification
predicts categorical class labels
classifies data (constructs a model) based on the training set and
the values (class labels) of a classifying attribute, and uses the
model to classify new data
Model:
Training: use the training data to identify the classifier F(X);
each training example is a pair (xi, yi), where xi is the i-th object and yi is its class label
Testing: a new object x is classified (its class label is predicted) by the learned classifier
Introduction
Training
Introduction
Testing
Binary Classification
Introduction
Multiclass classification
Introduction
K-fold cross-validation: Evaluating Classifier Accuracy
Randomly partition the data into k mutually exclusive subsets
D1, …, Dk, each of approximately equal size
At the i-th iteration, use Di as the test set and the remaining
subsets as the training set
Typically k = 10
Leave-one-out: k folds where k = # of tuples, for
small-sized data
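For illustration (not part of the original slides), a minimal scikit-learn sketch of 10-fold cross-validation; the iris data and the decision-tree classifier are just stand-ins for any data set and classifier:

```python
# Hypothetical sketch: 10-fold cross-validation of a classifier's accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)             # stand-in feature matrix and labels
clf = DecisionTreeClassifier(random_state=0)  # stand-in classifier

# Randomly partition the data into k = 10 mutually exclusive folds; each fold
# is used once as the test set while the remaining folds form the training set.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
print("fold accuracies:", scores.round(3))
print("mean accuracy:  ", round(scores.mean(), 3))
```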
Introduction
Confusion matrix: given m classes, the entry CMi,j of the
confusion matrix indicates the number of tuples of class i that
were labeled by the classifier as class j
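A small illustrative sketch (the labels and predictions below are made up) showing how such a matrix is produced and read with scikit-learn:

```python
# Hypothetical sketch: building and reading a confusion matrix.
from sklearn.metrics import confusion_matrix

y_true = ["Yes", "Yes", "No", "No", "Yes", "No", "Yes", "No"]   # actual classes
y_pred = ["Yes", "No",  "No", "Yes", "Yes", "No", "Yes", "No"]  # classifier output

# Entry CM[i, j] counts tuples of true class i that were labeled as class j.
cm = confusion_matrix(y_true, y_pred, labels=["Yes", "No"])
print(cm)
# [[3 1]   3 "Yes" tuples labeled Yes, 1 "Yes" tuple labeled No
#  [1 3]]  1 "No" tuple labeled Yes, 3 "No" tuples labeled No
```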
Introduction
Classification:
1. Decision tree
2. Bayes classification methods
3. Neural network
4. Rough set
5. Regression
6. K- nearest neighbor (k-nn)
7. Support vector machine (SVM)
8. Fuzzy
…
DECISION TREE
Decision Tree
Introduction
Construct tree
Measure
Conclusion
❑ Duration: 1 min.
❑ Question:
Why did you decide to study this course “Data mining”?
Question
Should we play baseball today?
fwind : {weak, strong}
ftemperature : {hot, mild, cool}
fhumidity : {high, normal}
foutlook : {sunny, overcast, rainy}
Instance to classify: {sunny, mild, normal, strong}
[Figure: decision tree with Outlook at the root; Sunny → Humidity (Normal: Yes, High: No); Overcast → Yes; Rainy → Wind (Weak: Yes, Strong: No)]
Playball = {Yes, No}
{foutlook, ftemperature, fhumidity, fwind} → flearning
Should we play baseball today?
Conditions: {Outlook = Sunny, Temperature = Hot,
Humidity = Normal, Wind = Strong}
[Figure: the same decision tree; following Sunny → Humidity = Normal leads to the leaf Yes]
The answer: Yes, today we should play baseball.
Description: a decision tree is a tree consisting of a root node,
branch nodes (each representing a choice among alternatives),
and leaf nodes (each representing a decision).
[Figure: the same decision tree, annotated: Outlook is the root node, Humidity and Wind are branch nodes, the Yes/No outcomes are leaf nodes, and the labeled edges (Sunny, Overcast, Rainy, Weak, Strong, Normal, High) are branches]
Decision tree
Algorithm for Decision Tree
Basic algorithm (a greedy algorithm)
Tree is constructed in a top-down recursive divide-and-
conquer manner
At start, all the training examples are at the root
Attributes are categorical (if continuous-valued, they are
discretized in advance)
Examples are partitioned recursively based on selected
attributes
Test attributes are selected on the basis of a heuristic or
statistical measure (e.g., information gain, Gini index…)
Conditions for stopping partitioning
All samples for a given node belong to the same class
There are no remaining attributes for further partitioning –
majority voting is employed for classifying the leaf
There are no samples left
Algorithm for Decision Tree
Generate rules based on decision tree
IF (condition1) [AND (condition2) AND …] THEN Conclusion
IF outlook = sunny AND humidity = high THEN playball = no
IF outlook = overcast THEN playball = yes
IF outlook = rainy AND wind = weak THEN playball = yes
[Figure: the decision tree from the previous slides]
❑ Members: 3-5 students; Duration: 10 mins.
❑ Question: Is the data below ready for the decision tree
algorithm to be applied? Why? Propose your solution.
Group Discussion
Entropy
Entropy
A measure of uncertainty associated with a random variable
Entropy is used to build the tree
Calculation: entropy of a set S:
E(S) = − Σj=1..N FS(Aj) × log2 FS(Aj)
S: sample set
N: number of different class values among the samples in S
Aj: number of samples corresponding to class value j
FS(Aj): ratio of Aj to |S|
Example: S is a 14-sample set with 9 samples in class Yes and 5
samples in class No: E(S) = −(9/14) log2(9/14) − (5/14) log2(5/14) ≈ 0.940
Entropy
Example:
A class has 35 students: 25 students do their homework and 10 students do not.
E = −(25/35) log2(25/35) − (10/35) log2(10/35) ≈ 0.863
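A minimal Python sketch (not from the slides) of the entropy calculation, applied to both examples above:

```python
# Hypothetical sketch: entropy of a labelled sample set from per-class counts.
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

print(round(entropy([9, 5]), 3))    # play-ball set (9 Yes, 5 No)        -> ~0.94
print(round(entropy([25, 10]), 3))  # homework example (25 do, 10 don't) -> ~0.863
```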
Information Gain
Information gain of a sample set S based on attribute A:
G(S, A) = E(S) − Σi=1..m FS(Ai) × E(SAi)
G(S,A): information gain of set S based on attribute A
E(S): entropy of S
m: number of different values of attribute A
Ai: number of samples corresponding to value i of attribute A
FS(Ai): ratio of Ai to S
SAi: subset of S including all samples having value Ai
Information Gain
Day Outlook Temperature Humidity Wind Play ball
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rainy Mild High Weak Yes
D5 Rainy Cool Normal Weak Yes
D6 Rainy Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rainy Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rainy Mild High Strong No
Information Gain
G(S,Wind) = ?
S has 14 samples and 2 classes: 9 Yes, 5 No → E(S) ≈ 0.940
Wind has 2 different values: Weak, Strong
Wind = Weak (8 samples: 6 Yes, 2 No); Wind = Strong (6 samples: 3 Yes, 3 No)
With E(SWeak) ≈ 0.811 and E(SStrong) = 1.0:
G(S, Wind) = 0.940 − (8/14) × 0.811 − (6/14) × 1.0 ≈ 0.048
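A short self-contained sketch (not from the slides) that reproduces G(S, Wind); the attributes in the group discussion below can be computed the same way:

```python
# Hypothetical sketch: information gain of Wind on the play-ball data.
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

E_S      = entropy([9, 5])   # whole set S: 9 Yes, 5 No
E_weak   = entropy([6, 2])   # Wind = Weak  : 6 Yes, 2 No
E_strong = entropy([3, 3])   # Wind = Strong: 3 Yes, 3 No

gain_wind = E_S - (8 / 14) * E_weak - (6 / 14) * E_strong
print(round(gain_wind, 3))   # ~0.048
```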
❑ Members: 3-5 students; Duration: 10 mins.
❑ Compute:
1. G(S, Outlook)
2. G(S, Temperature)
3. G(S, Humidity)
Group Discussion
Decision Tree
Outlook is the root (has maximal Information Gain)
Decision Tree
Outlook has 3 different values: sunny, overcast, and
rainy → The root has 3 branches
Which attribute should be chosen at Sunny branch?
(Outlook, Humidity, Temperature, Wind)
➢ Ssunny = {D1, D2, D8, D9, D11}, i.e., the 5 samples with
Outlook = Sunny
➢ Gain(Ssunny, Humidity) = 0.970
➢ Gain(Ssunny, Temperature) = 0.570
➢ Gain(Ssunny, Wind) = 0.019
➢ Select Humidity
Keep doing until all samples are classified or there are
no remaining attributes for further partitioning
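As a hedged, illustrative sketch (not part of the slides), the code below assumes pandas and scikit-learn, fits a tree to the play-ball table above, and prints the induced rules; scikit-learn uses binary tests on one-hot encoded attributes, so the printed rules are equivalent to, but not literally identical with, the multiway tree shown earlier:

```python
# Hypothetical sketch: fit a decision tree to the play-ball data and print its rules.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame({
    "Outlook":     ["Sunny","Sunny","Overcast","Rainy","Rainy","Rainy","Overcast",
                    "Sunny","Sunny","Rainy","Sunny","Overcast","Overcast","Rainy"],
    "Temperature": ["Hot","Hot","Hot","Mild","Cool","Cool","Cool",
                    "Mild","Cool","Mild","Mild","Mild","Hot","Mild"],
    "Humidity":    ["High","High","High","High","Normal","Normal","Normal",
                    "High","Normal","Normal","Normal","High","Normal","High"],
    "Wind":        ["Weak","Strong","Weak","Weak","Weak","Strong","Strong",
                    "Weak","Weak","Weak","Strong","Strong","Weak","Strong"],
    "Play":        ["No","No","Yes","Yes","Yes","No","Yes",
                    "No","Yes","Yes","Yes","Yes","Yes","No"],
})

X = pd.get_dummies(data.drop(columns="Play"))   # one-hot encode the categorical attributes
y = data["Play"]

# criterion="entropy" corresponds to the information-gain measure above.
tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```

Using criterion="gini" instead would correspond to the Gini-index measure introduced below.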
Information Gain
Day Outlook Temperature Humidity Wind Play ball
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rainy Mild High Weak Yes
D5 Rainy Cool Normal Weak Yes
D6 Rainy Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rainy Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rainy Mild High Strong No
Gini index
Gini index of data D:
Gini(D) = 1 − Σj pj(D)²
With pj(D): the relative frequency of class j in D
Example: with the data set above:
14 samples: 9 Yes, 5 No
Gini(D) = 1 − (9/14)² − (5/14)² = 0.459
Gini index
If a data set D is split on A into k subsets D1, D2, …, Dk,
the Gini index GiniA(D) is defined as:
GiniA(D) = Σi=1..k (ni / N) × Gini(Di)
With:
➢ ni: #samples in subset Di (node i)
➢ N: #samples in D (the node being split on A)
Select the attribute with the minimal Gini index for
partitioning
Gini index
Day Outlook Temperature Humidity Wind Play ball
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rainy Mild High Weak Yes
D5 Rainy Cool Normal Weak Yes
D6 Rainy Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rainy Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rainy Mild High Strong No
Gini index
Gini(D) = 1 − (9/14)² − (5/14)² = 0.459 (9 Yes, 5 No)
1. GiniOutlook(D)
= 5/14 × Gini(Ssunny) + 4/14 × Gini(Sovercast) + 5/14 × Gini(Srainy)
= 0.343
With:
Gini(Ssunny) = 0.48 // 2 Yes, 3 No
Gini(Sovercast) = 0 // 4 Yes, 0 No
Gini(Srainy) = 0.48 // 3 Yes, 2 No
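A corresponding sketch (not from the slides) for the Gini computation above; the attributes in the group discussion below follow the same pattern:

```python
# Hypothetical sketch: Gini index of D and of the split on Outlook.
def gini(counts):
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

gini_D = gini([9, 5])                       # whole set: 9 Yes, 5 No
gini_outlook = (5 / 14) * gini([2, 3]) \
             + (4 / 14) * gini([4, 0]) \
             + (5 / 14) * gini([3, 2])      # Sunny, Overcast, Rainy subsets
print(round(gini_D, 3), round(gini_outlook, 3))   # 0.459 0.343
```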
❑ Members: 3-5 students; Duration: 5 mins.
❑ Compute:
1. Gini Temperature (D)
2. Gini Humidity (D)
3. Gini Wind (D)
Group Discussion
Gini index
Gini(D) = 1 − (9/14)² − (5/14)² = 0.459
1. GiniOutlook(D)= 0.343
2. GiniTemperature(D) = 0.440
3. GiniHumidity(D) = 0.367
4. GiniWind(D) = 0.428
→ Outlook is selected as the root (Gini index is the
minimal value)
Decision Tree
Conclusion:
Easy to understand and interpret
Requires data preprocessing (e.g., discretizing continuous attributes)
Scalability to very large (big) data sets can be an issue
1. Introduction
2. Decision Tree
3. Bayes Classification Method
4. Neural network
CONTENT
BAYES CLASSIFICATION
METHOD
Bayes classification
1. Introduction
2. Bayes classification
3. Comments
Bayes classification
Introduction
A statistical classifier: performs probabilistic
prediction, i.e., predicts class membership
probabilities
Foundation: Based on Bayes’ Theorem (1763)
Incremental: Each training example can
incrementally increase/decrease the probability that
a hypothesis is correct — prior knowledge can be
combined with observed data
Bayes’ Theorem: Basics
Let X be a data sample (“evidence”): class label is unknown
Let H be a hypothesis that X belongs to class C
Classification is to determine P(H|X), (i.e., posteriori
probability): the probability that the hypothesis holds given
the observed data sample X
P(H) (prior probability): the initial probability of H,
e.g., that X will play baseball, regardless of humidity, wind, outlook, …
P(X) (prior probability): the probability that the sample data X is
observed
P(H|X) = P(X|H) × P(H) / P(X)
Bayes’ Theorem: Basics
P(X|H) (likelihood): the probability of observing the sample X,
given that the hypothesis holds
Informally, this can be viewed as:
posterior = likelihood × prior / evidence
Predicts X belongs to Ci iff the probability P(Ci|X) is the highest
among all the P(Ck|X) for all the k classes
P(H|X) = P(X|H) × P(H) / P(X)
Bayes’ Theorem: Basics
Naïve Bayes Classifier: attributes are conditionally
independent (i.e., no dependence relation among attributes)
P(X|H): X=(x1, x2,…, xk)
P(x1,…,xk|H) = P(x1|H)·…·P(xk|H)
P(H|X) = P(X|H) × P(H) / P(X)
Bayes’ Classifier – Example
Outlook Temperature Humidity Wind Play ball
Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rainy Mild High Weak Yes
Rainy Cool Normal Weak Yes
Rainy Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rainy Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rainy Mild High Strong No
Bayes’ Classifier – Example
Let X = (Outlook = Rainy, Temp = Cool, Humidity =
Normal, Wind = Weak) → X belongs to class Yes or No?
Compute → Predict
1. P(Play=Yes) *P(X|Play=Yes) = P(Play=Yes) *
P(Outlook=Rainy|Play=Yes)*P(Temp=Cool|Play=Yes)*
P(Humidity=Normal|Play=Yes)* P(Wind=Weak|Play=Yes)
2. P(Play=No) *P(X|Play=No) = P(Play=No) *
P(Outlook=Rainy|Play=No)*P(Temp=Cool|Play=No)*
P(Humidity=Normal|Play=No)* P(Wind=Weak|Play=No)
P(H|X) = P(X|H) × P(H) / P(X)
Bayes’ Classifier – Example
Let X = (Outlook = Rainy, Temp = Cool, Humidity =
Normal, Wind = Weak) → X belongs to class Yes or No?
Compute:
✓ P (Play=Yes) = 9/14; P(Play=No) = 5/14
✓ P(Outlook=Rainy|Play=Yes) = 3/9;
✓ P(Outlook=Rainy|Play=No) = 2/5;
❑ Members: 3-5 students; Duration: 5 mins.
❑ Compute:
1. P(Temp=Cool|Play=Yes) =
2. P(Temp=Cool|Play=No) =
3. P(Humidity=Normal|Play=Yes) =
4. P(Humidity=Normal|Play=No) =
5. P(Wind=Weak|Play=Yes) =
6. P(Wind=Weak|Play=No) =
Group Discussion
Bayes’ Classifier – Example
1. P(Temp=Cool|Play=Yes) = 3/9
2. P(Temp=Cool|Play=No) = 1/5
3. P(Humidity=Normal|Play=Yes) = 6/9
4. P(Humidity=Normal|Play=No) = 1/5
5. P(Wind=Weak|Play=Yes) = 6/9
6. P(Wind=Weak|Play=No) = 2/5
Bayes’ Classifier – Example
1. P(Play=Yes) *P(X|Play=Yes) = (9/14) *
(3/9) * (3/9) * (6/9) * (6/9) = 0.032
2. P(Play=No) *P(X|Play=No) = (5/14) *
(2/5) * (1/5) * (1/5) * (2/5) = 0.002
Conclusion: X = (Rainy, Cool, Normal, Weak) belongs to
class Play = Yes
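A brief sketch (not from the slides) that reproduces the two unnormalized scores by multiplying the probabilities read off the play-ball table:

```python
# Hypothetical sketch: naive Bayes scores for X = (Rainy, Cool, Normal, Weak).
# Each factor is a relative frequency taken from the play-ball table above.
p_yes = (9/14) * (3/9) * (3/9) * (6/9) * (6/9)   # prior * Outlook * Temp * Humidity * Wind
p_no  = (5/14) * (2/5) * (1/5) * (1/5) * (2/5)

print(round(p_yes, 3), round(p_no, 3))           # 0.032 0.002
print("predicted class:", "Yes" if p_yes > p_no else "No")
```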
❑ Members: 3-5 students;
❑ Duration: 5 mins.
❑ Let X = (Outlook = Sunny, Temp = Hot, Humidity = High,
Wind = Weak), predict X.
Group Discussion
❑ Members: 3-5 students;
❑ Duration: 5 mins.
❑ Let X = (Outlook = Overcast, Temp = Hot, Humidity =
High, Wind = Weak), predict X.
Group Discussion
Naïve Bayesian prediction requires each conditional
probability to be non-zero.
Bayes’ Classifier
Need to avoid the Zero-Probability Problem
Use the Laplacian correction (or Laplace estimator)
P(Ci) = (|Ci,D| + 1) / (|D| + m)
P(xk|Ci) = (#tuples of Ci,D with value xk + 1) / (|Ci,D| + r)
With:
- m: #classes
- r: #different values of the attribute
Bayes’ Classifier
Let X = (Outlook = Overcast, Temp = Hot, Humidity =
High, Wind = Weak) using Laplacian correction.
Compute:
✓ P(Play=Yes) = (9+1)/(14+2) = 10/16
✓ P(Play=No) = (5+1)/(14+2) = 6/16
✓ P(Outlook=Overcast|Play=Yes)=(4+1)/(9+3)=5/12
✓ P(Outlook=Overcast|Play=No) = 1/8
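A small sketch (not from the slides) of these Laplacian estimates; the counts come straight from the play-ball table:

```python
# Hypothetical sketch: Laplacian (add-one) estimates for Outlook = Overcast.
def laplace(count, class_size, r):
    """(count + 1) / (class_size + r); r = number of distinct attribute values."""
    return (count + 1) / (class_size + r)

# 14 tuples, m = 2 classes; Outlook has r = 3 values (Sunny, Overcast, Rainy).
p_yes = (9 + 1) / (14 + 2)               # 10/16
p_no  = (5 + 1) / (14 + 2)               # 6/16
p_overcast_given_yes = laplace(4, 9, 3)  # (4+1)/(9+3) = 5/12
p_overcast_given_no  = laplace(0, 5, 3)  # (0+1)/(5+3) = 1/8
print(p_overcast_given_yes, p_overcast_given_no)
```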
Comments
Advantages
Easy to implement
Good results obtained in most of the cases
Disadvantages
Assumption: class conditional independence, therefore loss of
accuracy
Practically, dependencies exist among variables
E.g., hospital patients: profile (age, family history, etc.),
symptoms (fever, cough, etc.), disease (lung cancer,
diabetes, etc.)
Dependencies among these cannot be modeled by Naïve
Bayes Classifier
1. Introduction
2. Decision Tree
3. Bayes Classification Method
4. Neural network
CONTENT
NEURAL NETWORK
Neural network
1. Introduction
2. Neural network
3. Comments
Neural network
Nervous system
Neural network
Neuron: Soma, Dendrite, Axon
https://science.howstuffworks.com/life/inside-the-mind/human-
Neural network
Artificial Neuron Model
McCulloch-Pitts neuron (1943)
Weights: wij
Net input: neti = Σj wij × xj
Activation function f
Threshold (θ)
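A tiny sketch (not from the slides) of a single McCulloch-Pitts-style neuron with a step (threshold) activation; the weights and threshold are made-up values:

```python
# Hypothetical sketch: a single neuron with net input sum_j(w_j * x_j)
# and a step (threshold) activation function.
def neuron(x, w, theta):
    net = sum(wj * xj for wj, xj in zip(w, x))   # net input
    return 1 if net >= theta else 0              # fire if the net input reaches the threshold

# Made-up weights/threshold: the neuron fires only when both inputs are 1 (logical AND).
print(neuron([1, 1], w=[0.6, 0.6], theta=1.0))   # 1
print(neuron([1, 0], w=[0.6, 0.6], theta=1.0))   # 0
```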
Neural network
Artificial Neuron Model
Activation functions
Neural network
Artificial Neuron Model
Activation functions
Neural network
Artificial Neuron Model
Comments
Advantages
High tolerance to noisy data
Ability to classify untrained patterns
Well-suited for continuous-valued inputs and
outputs
Successful on an array of real-world data, e.g.,
hand-written letters, …
Techniques have recently been developed for very
complicated topics
Comments
Disadvantages
Long training time
Require a number of parameters typically best
determined empirically, e.g., the network topology
or “structure.”
Poor interpretability: Difficult to interpret the
symbolic meaning behind the learned weights and
of “hidden units” in the network
1. Introduction
2. Decision Tree
3. Bayes Classification Methods
4. Neural network
5. K - Nearest Neighbor Classifier
6. Support Vector Machine
CONTENT
K - NEAREST NEIGHBOR CLASSIFIER
K - Nearest Neighbor Classifier
1. Introduction
2. K - Nearest Neighbor Classifier
3. Comments
Introduction
The k-nearest-neighbor method was first described in
the early 1950s
The idea is to search for the closest match(es) of the
test data in the feature space.
All instances correspond to points in the n-D space
The nearest neighbors are defined by a distance function
dist(X1, X2) (e.g., Euclidean distance)
K - Nearest Neighbor Classifier
The training tuples are described by n attributes
Each tuple represents a point in an n-dimensional
space → all the training tuples are stored in an n-
dimensional pattern space
When given an unknown tuple: a k-nearest-neighbor
classifier searches the pattern space for the k training
tuples that are closest to the unknown tuple
These k training tuples are the k “nearest neighbors” of
the unknown tuple.
K - Nearest Neighbor Classifier
“Closeness” is defined in terms of a distance metric
(such as Euclidean distance)
For X1 = (x11, x12, …, x1n) and X2 = (x21, x22, …, x2n): dist(X1, X2) = sqrt(Σi (x1i − x2i)²)
For a discrete-valued target, k-NN returns the most common
value (majority class) among the k training examples nearest to the query xq
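A minimal sketch (not from the slides) of k-NN with Euclidean distance and a majority vote; the training points and the query are made up:

```python
# Hypothetical sketch: k-NN classification with Euclidean distance and majority vote.
from collections import Counter
from math import dist   # Euclidean distance (Python 3.8+)

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
         ((4.0, 4.2), "B"), ((3.8, 4.0), "B")]   # made-up labelled points

def knn_predict(xq, train, k=3):
    # Take the k training points closest to the query xq ...
    neighbors = sorted(train, key=lambda t: dist(xq, t[0]))[:k]
    # ... and return the most common class label among them.
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

print(knn_predict((1.1, 0.9), train, k=3))   # -> A
```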
K - Nearest Neighbor Classifier
K = 1, 3, 4, or 7?
K should be an odd number (to avoid ties)
Should all neighbours have equal importance?
Weighted kNN: each neighbour is weighted according to
its distance to the new-comer:
https://docs.opencv.org/3.4/d5/d26/tutorial_py_knn_understanding.html
w = 1 / d(xq, xi)²
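A hedged variant of the previous sketch with distance weighting, w = 1/d(xq, xi)²; again the points are made up:

```python
# Hypothetical sketch: distance-weighted k-NN vote with weight w = 1 / d(xq, xi)^2.
from collections import defaultdict
from math import dist

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
         ((4.0, 4.2), "B"), ((3.8, 4.0), "B")]   # made-up labelled points

def weighted_knn_predict(xq, train, k=3, eps=1e-12):
    neighbors = sorted(train, key=lambda t: dist(xq, t[0]))[:k]
    votes = defaultdict(float)
    for point, label in neighbors:
        votes[label] += 1.0 / (dist(xq, point) ** 2 + eps)   # closer neighbours count more
    return max(votes, key=votes.get)

print(weighted_knn_predict((2.5, 2.5), train, k=3))
```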
Comments
How to choose k?
Extremely slow when classifying test tuples:
“learning” involves only memorizing (storing) the data;
all computation is deferred to testing and classification (lazy learning)
Which distance metric to use?
Robust to noisy data
1. Introduction
2. Decision Tree
3. Bayes Classification Methods
4. Neural network
5. K - Nearest Neighbor Classifier
6. Support Vector Machine
CONTENT
Support Vector Machine
1. Introduction
2. Support Vector Machine
3. Comments
Introduction
A relatively new classification method for both linear
and nonlinear data
It uses a nonlinear mapping to transform the original
training data into a higher dimension
With the new dimension, it searches for the linear
optimal separating hyperplane (i.e., “decision
boundary”)
With an appropriate nonlinear mapping to a
sufficiently high dimension, data from two classes can
always be separated by a hyperplane
SVM finds this hyperplane using support vectors
(“essential” training tuples) and margins (defined by
the support vectors)
SVM—History and Applications
Vapnik and colleagues (1992)—groundwork from Vapnik
& Chervonenkis’ statistical learning theory in 1960s
Features: training can be slow but accuracy is high owing
to their ability to model complex nonlinear decision
boundaries (margin maximization)
Used for: classification and numeric prediction
Applications:
handwritten digit recognition, object recognition,
speaker identification, benchmarking time-series
prediction tests
SVM—General Philosophy
Support Vectors
Small Margin Large Margin
SVM—Margins and Support Vectors
SVM—When Data Is Linearly Separable
Let the data D be (X1, y1), …, (X|D|, y|D|), where each Xi is a training tuple
and yi is its class label
There are infinitely many lines (hyperplanes) separating the two classes, but we want to
find the best one (the one that minimizes classification error on unseen data)
SVM searches for the hyperplane with the largest margin, i.e., the maximum
marginal hyperplane (MMH)
SVM—Linearly Separable
◼ A separating hyperplane can be written as
W ● X + b = 0
where W={w1, w2, …, wn} is a weight vector and b a scalar (bias)
◼ For 2-D it can be written as
w0 + w1 x1 + w2 x2 = 0
◼ The hyperplanes defining the sides of the margin:
H1: w0 + w1 x1 + w2 x2 ≥ 1 for yi = +1, and
H2: w0 + w1 x1 + w2 x2 ≤ – 1 for yi = –1
◼ Any training tuples that fall on hyperplanes H1 or H2 (i.e., the
sides defining the margin) are support vectors
◼ This becomes a constrained (convex) quadratic optimization
problem: Quadratic objective function and linear constraints →
Quadratic Programming (QP) → Lagrangian multipliers
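An illustrative scikit-learn sketch (not from the slides): fit a linear SVM on a toy linearly separable set and read off the hyperplane W, b and the support vectors; a large C approximates the hard-margin case:

```python
# Hypothetical sketch: a maximum-margin linear SVM on toy 2-D data.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 1], [1, 2],     # class -1
              [4, 4], [5, 4], [4, 5]])    # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # very large C ~ (almost) hard margin
print("W:", clf.coef_[0])                     # weight vector of W*X + b = 0
print("b:", clf.intercept_[0])                # bias
print("support vectors:\n", clf.support_vectors_)
```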
Why Is SVM Effective on High Dimensional Data?
◼ The complexity of the trained classifier is characterized by the # of
support vectors rather than the dimensionality of the data
◼ The support vectors are the essential or critical training examples —
they lie closest to the decision boundary (MMH)
◼ If all other training examples are removed and the training is
repeated, the same separating hyperplane would be found
◼ The number of support vectors found can be used to compute an
(upper) bound on the expected error rate of the SVM classifier, which
is independent of the data dimensionality
◼ Thus, an SVM with a small number of support vectors can have good
generalization, even when the dimensionality of the data is high
SVM: Different Kernel Functions
◼ Instead of computing the dot product on the transformed
data, it is mathematically equivalent to apply a kernel function
K(Xi, Xj) to the original data, i.e., K(Xi, Xj) = Φ(Xi)·Φ(Xj)
◼ Typical kernel functions: polynomial, Gaussian radial basis function (RBF), sigmoid
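A hedged sketch of the kernel trick: requesting kernel='rbf' makes scikit-learn apply the Gaussian kernel K(Xi, Xj) = exp(−gamma·||Xi − Xj||²) internally; the XOR-like toy data is made up:

```python
# Hypothetical sketch: a nonlinear SVM via the Gaussian (RBF) kernel,
# K(Xi, Xj) = exp(-gamma * ||Xi - Xj||^2), applied implicitly by scikit-learn.
import numpy as np
from sklearn.svm import SVC

# XOR-like toy data: not linearly separable in the original 2-D space.
X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="rbf", gamma=2.0, C=10.0).fit(X, y)
print(clf.predict([[0.1, 0.1], [0.9, 0.1]]))   # expected: [0 1]
```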
SVM Related Links
SVM Website: http://www.kernel-machines.org/
SVM practical guide: library for SVM
Representative implementations
LIBSVM: an efficient implementation of SVM, multi-class
classifications, nu-SVM, one-class SVM, including also
various interfaces with java, python, etc.
SVM-light: simpler but performance is not better than
LIBSVM, support only binary classification and only in C
SVM-torch: another recent implementation also written
in C
1. Introduction
2. Decision Tree
3. Bayes Classification Methods
4. Neural network
5. K - Nearest Neighbor Classifier
6. Support Vector Machine
CONTENT