Topic 7
Rule-Based Machine Learning, Clustering,
and Association Rules
Dr. Sunu Wibirama
Lecture Module for the course Kecerdasan Buatan (Artificial Intelligence)
Course code: UGMx 001001132012
June 28, 2022
1 Course Learning Outcomes
This topic addresses CPMK 5 (course learning outcome 5): the ability to define several
classical machine learning techniques (linear regression, rule-based machine learning,
probabilistic machine learning, clustering) as well as the basic concepts of deep learning
and its implementation in image recognition (convolutional neural networks).
The indicators that this outcome has been achieved are: understanding the basic concept
of decision trees, being able to compute entropy and information gain, and understanding
the concepts of clustering and association rules.
2 Scope of the Material
The material in this topic covers the following:
a) Introduction to Decision Tree: this part discusses the concept of generalization in
machine learning. A fundamental task in machine learning is to derive rules from a
collection of data. These rules are then used to predict the class or category of data
that the system has never seen before. In some cases, however, there is more than one
way to perform the classification, and the tree generated from the rules can take several
possible forms. This is where information theory comes in, to determine the role and
position of each attribute in building the tree.
b) Entropy and Information Gain: this part discusses information theory as a way to
measure the degree of disorder in a dataset. The larger the entropy of an attribute, the
smaller its contribution to separating the classes or categories; the smaller the entropy,
the larger its contribution to separating the classes or categories of the data at hand.
The attribute that gives the largest reduction in entropy, i.e., the highest information
gain, is the attribute placed at the root node of the decision tree.
c) Clustering: this part discusses one of the best-known unsupervised learning techniques,
clustering. Clustering is commonly used to find patterns or shared characteristics in
data. With clustering, data can be divided into several categories, and clustering is
often the first step toward labeling the data.
d) Association Rules: this unsupervised learning technique is frequently used to give
recommendations to e-commerce customers based on how often a product appears in
transactions, or how often two products are bought together. This part discusses three
metrics commonly used in association rules: support, confidence, and lift.
Sunu Wibirama
sunu@ugm.ac.id
Department of Electrical and Information Engineering
Faculty of Engineering
Universitas Gadjah Mada
INDONESIA
Introduction to Decision Tree (Part 01)
Kecerdasan Buatan | Artificial Intelligence
Version: January 2022
Review: Key concepts of AI
• Machines learn from experience…
• Through examples, analogy or discovery
• Adapting…
• Changes in response to interaction
• Generalizing…
• To use experience to form a response to
novel situations (i.e., unseen data)
• Machine learning is the branch of Artificial
Intelligence concerned with building
systems that generalize from examples
Can a machine learn?
• From a limited set of examples, you
should be able to generalize.
• Humans can easily generalize if
the data are simple enough.
• If the data are complicated, human
ability is limited and prone to error.
• That is when we get a machine to do the task.
What rules can you extract to predict the outcome?
J.D. Kelleher, B.M. Namee, A. D’Arcy, Fundamentals of Machine Learning for Predictive Data Analytics, MIT Press, 2015
Credit scoring dataset
What rules can you extract to predict the outcome?
J.D. Kelleher, B.M. Namee, A. D’Arcy, Fundamentals of Machine Learning for Predictive Data Analytics, MIT Press, 2015
Can you extract the rules manually?
More complicated credit scoring dataset
What rules can you extract to predict the outcome?
Not as easy as the first case, right?
That’s when machine learning helps us!
Introduction to Decision Tree (Part 02)
Goal of machine learning
Using a set of training data, find the best rule (model) that
generalizes well (labels or predicts the outcome correctly)
on new data (unseen data).
Four types of classification techniques
Rule-based Machine Learning: Decision Trees
• A map of the reasoning process, good at solving classification
problems (Negnevitsky, 2005)
• A decision tree represents a number of different attributes and
values
• Nodes represent attributes*
• Branches represent values of the attributes
• A path through the tree represents a decision
• A tree can be associated with rules
Note: attributes = features
Why Decision Trees?
• Decision Trees (DT) are one of the most popular data mining
tools (with linear and logistic regression)
• They are:
• Easy to understand
• Easy to implement
• Computationally cheap
• Almost all data mining packages include DT
• They have advantages for model comprehensibility, which is
important for:
• model evaluation
• communication to non-technical stakeholders
Example: Ice-cream
Outlook Temperature Holiday Season Result
Overcast Mild Yes Don’t Sell
Sunny Mild Yes Sell
Sunny Hot No Sell
Overcast Hot No Don’t Sell
Sunny Cold No Don’t Sell
Overcast Cold Yes Don’t Sell
*overcast = cloudy
Example: Ice-cream
• When should an ice-cream seller
attempt to sell ice-cream?
• Could you write a set of rules?
• How would you acquire the
knowledge?
• You might learn by experience:
• For example, experience of:
• ‘Outlook’: Overcast or Sunny
• ‘Temperature’: Hot, Mild or Cold
• ‘Holiday Season’: Yes or No
Generalisation
• What should the seller do when:
• 'Outlook': Sunny, 'Temperature': Hot, 'Holiday Season': Yes -> Sell
• What about:
• 'Outlook': Overcast, 'Temperature': Hot, 'Holiday Season': Yes -> Sell
Let's visualize the rules
Example 1: Ice-cream (the rules visualized as a decision tree)
Root node and internal nodes = attributes; branches = values of the attributes; leaves = decisions.

Outlook = Sunny -> Temperature
    Temperature = Hot  -> Sell
    Temperature = Mild -> Holiday Season
        Holiday Season = Yes -> Sell
        Holiday Season = No  -> Don't Sell
    Temperature = Cold -> Don't Sell
Outlook = Overcast -> Holiday Season
    Holiday Season = No  -> Don't Sell
    Holiday Season = Yes -> Temperature
        Temperature = Hot  -> Sell
        Temperature = Mild -> Don't Sell
        Temperature = Cold -> Don't Sell
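The tree can be read as a small set of nested rules. Below is a minimal Python sketch of those rules; the function name and string values are illustrative, and the Overcast branch follows the layout reconstructed above:

def predict_ice_cream(outlook, temperature, holiday_season):
    # Follows the decision tree sketched above; returns "Sell" or "Don't Sell".
    if outlook == "Sunny":
        if temperature == "Hot":
            return "Sell"
        if temperature == "Mild":
            return "Sell" if holiday_season == "Yes" else "Don't Sell"
        return "Don't Sell"                      # Cold
    else:                                        # Overcast
        if holiday_season == "No":
            return "Don't Sell"
        return "Sell" if temperature == "Hot" else "Don't Sell"

# The two generalisation queries from the previous slide:
print(predict_ice_cream("Sunny", "Hot", "Yes"))      # Sell
print(predict_ice_cream("Overcast", "Hot", "Yes"))   # Sell (a case not present in the table)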
Construction
• Concept learning:
• Inducing concepts from examples
• Different algorithms used to construct a
tree based upon the examples
• The most popular is Iterative Dichotomizer 3
(ID3, proposed by Quinlan, 1986)
• But:
• Different trees can be constructed from
the same set of examples
• Real-life data are noisy and often contradictory
Introduction to Decision Tree (Part 03)
Ambiguous Trees
Consider the following data (pewarna = food coloring, pengawet = preservatives):

Item   X      Y      Class
1      False  False  +
2      True   False  +
3      False  True   -
4      True   True   -
Ambiguous Trees (1st option): Y as root node
Y = True  -> {3, 4} : Negative
Y = False -> {1, 2} : Positive
Ambiguous Trees (2nd option): X as root node
X = True  -> Y ({2, 4})
    Y = False -> {2} : Positive
    Y = True  -> {4} : Negative
X = False -> Y ({1, 3})
    Y = False -> {1} : Positive
    Y = True  -> {3} : Negative

Different trees can be constructed from the same set of examples.
Which tree is the best? It depends on the choice of attribute at each node of the tree.
Information Theory in Decision Tree (Part 01)
Information Theory
• We can use information theory to help us understand:
• Which attribute is the best to choose for a particular node of the tree
• This is the node that is the best at separating the required
predictions, and hence which leads to the best (or at least a good)
tree
• 'Information theory addresses both the limitations and the possibilities of
communication' (MacKay, 2003:16):
• Measuring information content
• Probability (measure of chance) and entropy (measure of disorder)
MacKay, D.J.C. (2003). Information Theory, Inference, and Learning Algorithms. Cambridge, UK: Cambridge University Press.
Choosing attributes
• Entropy:
• A measure of disorder / unexpectedness / uncertainty / surprise / randomness
• High entropy means the data have high variance and thus contain a lot of
information and/or noise
• For c classification categories:
• an attribute a that has value v
• the probability of an example with a = v being in category i is p_i
• the entropy is E(a = v) = -sum over i of p_i log2(p_i)
Information Theory in Decision Tree (Part 02)
The dataset (whether to locate a new bar at a given location).
The attributes of each example are:
1. City or town: Yes or No
2. Has a university nearby: Yes or No
3. Type of nearby housing estate: None, Small, Medium, Large
4. Quality of public transport: Good, Average, Poor
5. The number of schools nearby: Small, Medium, Large
Class: + (Yes, locate the new bar)  /  - (No, don't locate the new bar)
Entropy example
• Choice of attributes:
• City/Town, University, Housing Estate,
Industrial Estate, Transport and
Schools
• Let’s compute entropy of City/Town
• City/Town: is either Y or N
• For Y: 7 positive examples, 3 negative
• For N: 4 positive examples, 6 negative
For Y (Yes):
7 positives, 3 negatives
For N (No):
4 positives, 6 negatives
Information Theory in Decision Tree (Part 03)
Entropy example
• City/Town as root node:
• For c=2 (positive and negative)
classification categories
• Attribute a=City/Town that has value v=Y
• Probability of v=Y being in category positive
= 7/10
• Probability of v=Y being in category
negative
= 3/10
For Y (Yes):
7 positives, 3 negatives
Entropy example
• City/Town as root node:
• For c=2 (positive and negative) classification
categories
• Attribute a=City/Town that has value v=Y
• Entropy E is:
E (City/Town = Y)
= (-7/10 x log2 7/10) + (- 3/10 x log2 3/10)
= -[0.7 x -0.51 + 0.3 x -1.74 ]
= 0.881
For Y (Yes):
7 positives, 3 negatives
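These two entropy values can be checked with a few lines of Python (a minimal sketch; the helper name is illustrative):

import math

def entropy(counts):
    # E = -sum(p_i * log2(p_i)) for a list of class counts, e.g. [positives, negatives].
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(round(entropy([7, 3]), 3))   # 0.881 -> E(City/Town = Y)
print(round(entropy([4, 6]), 3))   # 0.971 -> E(City/Town = N)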
Information Theory in Decision Tree (Part 04)
Entropy Example
• City/Town as root node:
• For c=2 (positive and negative)
classification categories
• Attribute a=City/Town that has value v=N
• Probability of v=N being in category
positive
=4/10
• Probability of v=N being in category
negative
= 6/10
For N (No):
4 positives, 6 negatives
Entropy Example
• City/Town as root node:
• For c=2 (positive and negative)
classification categories
• Attribute a=City/Town that has value v=N
• Entropy E is:
E (City/Town = N)
= (-4/10 x log2 4/10) + (- 6/10 x log2 6/10)
= 0.971
For N (No):
4 positives, 6 negatives
Entropy Example
• If the purity of the instances increases, the entropy decreases
• High entropy means high disorder / uncertainty
• In the example below, the split with 7 (+) and 3 (-) has lower entropy (0.881)
because the purity of the instances is higher (City/Town = Y tends toward the (+) class)
E(City/Town = Y) = 0.881 -> 7 (+) and 3 (-)
E(City/Town = N) = 0.971 -> 4 (+) and 6 (-)
Entropy
Ten instances consist of two classes : + and -
Source: F. Provost and T. Fawcett, Data Science for Business, O’Reilly Media, 2013
High entropy
means high
disorder /
uncertainty
Choosing attributes
• Information gain:
• The expected reduction in entropy (high is good)
• The entropy of the whole example set T is E(T)
• The examples for which attribute a takes value v_j form the subset T_j, with entropy E(T_j)
• The gain is: Gain(T, a) = E(T) - sum over j of ( |T_j| / |T| ) x E(T_j)
• |T| = total number of samples = 20
• |T_j| = number of samples for which a takes value v_j (the Y/N values, for example)
Root of tree
• For root of tree, there are 20 examples:
• For c=2 (positive and negative)
classification categories
• Probability of being positive class with
11 examples
=11/20
• Probability of being negative with
9 examples
= 9/20
Information gain example
• For root of tree there are 20
examples:
• For c=2 (positive and negative)
classification categories
• Entropy of all training examples E(T) is:
|T | = 20
E(T) = (-11/20 x log2 11/20) +
(- 9/20 x log2 9/20)
= 0.993
Entropy example
E(City/Town = Y) = 0.881 —> 7 (+) and 3 (-)
E(City/Town = N) = 0.971 —> 4 (+) and 6 (-)
Total sample for E(City/Town = Y) = 10
Total sample for E(City/Town = N) = 10
Information gain example
• City/Town as root node:
• 10 examples for a=City/Town and value v=Y
• |Tj=Y | = 10 E(Tj=Y) = 0.881
• 10 examples for a=City/Town and value v=N
• |Tj=N | = 10 E(Tj=N) = 0.971
Information Theory in Decision Tree (Part 05)
Now we compute the information gain of another attribute: Transport
E(T) = 0.993
Compute the entropy of transport
• Transport:
• For c=2 (positive and negative) classification
categories
• Attribute a=Transport that has value v=G
• Probability of v=G being in category positive = 5/5 = 1
• Probability of v=G being in category negative = 0/5 = 0
E(Transport = G) = (-5/5 x log2 5/5) + (0) = 0
Quality of public transport:
Good, Average, Poor
Compute the entropy of transport
• Transport:
• For c=2 (positive and negative) classification
categories
• Attribute a=Transport that has value v=A
• Probability of v=A being in category positive
= 3/7 = 0.429
• Probability of v=A being in category negative
= 4/7 = 0.571
E(Transport = A) = (-3/7 x log2 3/7) + (-4/7 x log2 4/7)
= -[3/7 x -1.22 + 4/7 x -0.808]
= 0.524 + 0.461 = 0.985
Quality of public transport:
Good, Average, Poor
Compute the entropy of transport
• Transport:
• For c=2 (positive and negative) classification
categories
• Attribute a=Transport that has value v=P
• Probability of v=P being in category positive
= 3/8 = 0.375
• Probability of v=P being in category negative
= 5/8 = 0.625
E(Transport = P) = (-3/8 x log2 3/8) + (-5/8 x log2 5/8)
= - [3/8 x -1.415 + 5/8 x -0.678]
= 0.530 + 0.424 = 0.954
Quality of public transport:
Good, Average, Poor
Information gain of transport
Gain(T, Transport) = 0.993 – ((5/20 x 0) + (7/20 x 0.985) + (8/20 x 0.954))
= 0.993 - (0.345 + 0.382)
= 0.266
A = average
P = poor
G = good
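The same calculation in Python (a minimal sketch; the class counts come from the slides above and the helper names are illustrative):

import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def gain(total_counts, subsets):
    # Gain(T, a) = E(T) - sum over j of |Tj|/|T| * E(Tj), where the subsets Tj
    # are the groups of examples induced by the values of attribute a.
    n = sum(total_counts)
    return entropy(total_counts) - sum(sum(s) / n * entropy(s) for s in subsets)

# The whole training set has 11 positive and 9 negative examples.
# Transport splits it into G: 5+/0-, A: 3+/4-, P: 3+/5-.
print(round(gain([11, 9], [[5, 0], [3, 4], [3, 5]]), 3))   # 0.266
# City/Town splits it into Y: 7+/3-, N: 4+/6-.
print(round(gain([11, 9], [[7, 3], [4, 6]]), 3))           # 0.067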
Information gain: City/Town vs. Transport
Gain(T, City/Town) = 0.993 - (10/20 x 0.881 + 10/20 x 0.971) = 0.067
Gain(T, Transport) = 0.266
Transport reduces the entropy far more than City/Town, so it is the better choice for the root node.
Choosing attributes
• Choose as the root node the attribute that
gives the highest Information Gain
• In this case the attribute Transport,
with IG = 0.266
• The branches from the root node then become
the values associated with that attribute
• Recursively calculate the IG of the remaining
attributes/nodes
• Filter the examples by attribute value
Recursive example for selecting
the next attribute
• Example, with Transport as the root
node:
• Select the examples where Transport is
Average: (1, 3, 6, 8, 11, 15, 17)
• Use only these examples to construct
this branch of the tree
• Select the next attribute by computing the
Information Gain (IG) of each remaining
attribute and taking the highest one
• Repeat for each value of Transport
(Poor, Good); a compact sketch of this procedure follows below
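A compact sketch of this recursive, ID3-style construction in Python. The data representation (a list of dicts with a 'class' key) and the function names are assumptions made for illustration, not Quinlan's original formulation. Applied to the small X/Y example from the earlier slides, it selects Y as the root, matching the first of the two trees shown there:

import math
from collections import Counter

def entropy(examples):
    counts = Counter(e["class"] for e in examples)
    total = len(examples)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def information_gain(examples, attribute):
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e for e in examples if e[attribute] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(examples) - remainder

def id3(examples, attributes):
    classes = {e["class"] for e in examples}
    if len(classes) == 1:                       # pure subset: return a leaf
        return classes.pop()
    if not attributes:                          # no attributes left: majority vote
        return Counter(e["class"] for e in examples).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, a))
    tree = {best: {}}
    for value in {e[best] for e in examples}:   # one branch per value of the chosen attribute
        subset = [e for e in examples if e[best] == value]
        tree[best][value] = id3(subset, [a for a in attributes if a != best])
    return tree

data = [
    {"X": False, "Y": False, "class": "+"},
    {"X": True,  "Y": False, "class": "+"},
    {"X": False, "Y": True,  "class": "-"},
    {"X": True,  "Y": True,  "class": "-"},
]
print(id3(data, ["X", "Y"]))   # {'Y': {False: '+', True: '-'}} (branch order may vary)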
Final tree (Callan 2003:243), with Transport as the root node:
• Transport = Good: examples {7, 12, 16, 19, 20} -> Positive (leaf)
• Transport = Poor: examples {2, 4, 5, 9, 10, 13, 14, 18} -> split on Industrial Estate (Y/N):
    {5, 9, 14} : Positive;  {2, 4, 10, 13, 18} : Negative
• Transport = Average: examples {1, 3, 6, 8, 11, 15, 17} -> split on Housing Estate (L/M/S/N):
    two branches end directly in leaves: {6} Negative and {8} Negative;
    the branch containing {11, 17} splits further on Industrial Estate: {11} Positive, {17} Negative;
    the branch containing {1, 3, 15} splits further on University: {1, 3} Positive, {15} Negative

Reminder of the attributes:
1. City or town: Yes or No
2. Has a university nearby: Yes or No
3. Type of nearby housing estate: None, Small, Medium, Large
4. Quality of public transport: Good, Average, Poor
5. The number of schools nearby: Small, Medium, or Large
Clustering (Part 01)
Unsupervised learning
• Learning without a teacher: there is no label / class in the data
• Typically used:
• when we want to explore unlabeled data for the
first time (to create a training set for later prediction
or classification)
• when we want to profile voters based on their
characteristics and online activities (useful for a
political campaign)
• when we want to find associations between
customer products (recommender systems in e-commerce)
Clustering analysis
The daily expenditures on food (X1) and clothing (X2) of five persons
are shown in a table below
Clustering analysis
• The numbers are fictitious and not at all realistic, but the example will
help us explain the essential features of cluster analysis as simply as
possible. The data in the table are plotted in the figure below.
Clustering analysis
• Inspection of the figure suggests that the five
observations form two clusters.
• The first consists of persons a and d, and the second of
b, c and e.
• It can be noted that the observations in each cluster are
similar to one another with respect to expenditures on
food (X1) and clothing (X2), and that the two clusters
are quite distinct from each other.
• This inspection was possible because only two
variables were involved in grouping the observations.
The question is: Can a procedure be devised for similarly
grouping observations when there are more than two
variables or attributes?
Measures of distances for variables
• Clustering methods require a more precise
definition of "similarity" ("closeness",
"proximity") of observations and clusters.
• When the grouping is based on variables,
it is natural to employ the familiar concept of
distance.
• Consider the right figure as a map showing
two points, i and j, with coordinates (X1i,X2i)
and (X1j ,X2j), respectively.
Euclidean distance
• The Euclidean distance between the two
points is the hypotenuse of the triangle ABC:
D(i, j) = sqrt( (X1i - X1j)^2 + (X2i - X2j)^2 )
• An observation i is declared to be closer
(more similar) to j than to observation k if
D(i, j) < D(i, k).
• An alternative measure is the squared
Euclidean distance. In the figure, the squared
distance between the two points i and j is
D^2(i, j) = (X1i - X1j)^2 + (X2i - X2j)^2
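Both measures are one-liners in Python (a minimal sketch; the function names are illustrative):

import math

def euclidean(p, q):
    # p and q are observations given as (X1, X2) tuples.
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def squared_euclidean(p, q):
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))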
Clustering methods
• Nearest neighbor (or single linkage) method
• Furthest neighbor (or complete linkage) method
• K-means method
Nearest neighbor method
• One of the simplest methods is to treat the distance between the two
nearest observations, one from each cluster, as the distance between
the two clusters.
• This is known as the nearest neighbor (or single linkage) method.
Nearest neighbor method
Let us suppose that Euclidean distance is the appropriate measure of proximity.
We begin with each of the five observations forming its own cluster.
The distance between each pair of observations is shown in the figure below.
Nearest neighbor method
• For example, the distance between a and b is
D(a, b) = sqrt( (X1a - X1b)^2 + (X2a - X2b)^2 ).
• Observations b and e are nearest (most similar)
and, as shown in figure (b), are grouped in the
same cluster.
• Assuming the nearest neighbor method is used, the
distance between the cluster (be) and another
observation is the smaller of the distances between that
observation, on the one hand, and b and e, on the
other. For example, D(a, (be)) = min{ D(a, b), D(a, e) }.
Nearest neighbor method
• Observations a and d are nearest, with distance 1.414
• We arbitrarily select (a, d) as the new cluster
• The distances between (be) and (ad), and between c and (ad),
are then computed in the same way, taking the smallest pairwise
distance in each case
Nearest neighbor method
• We finally merge (be) with c to form the cluster (bce), shown below
Nearest neighbor method
• The grouping of these two clusters, it will be noted, occurs at a distance of
6.325, a much greater distance than that at which the earlier groupings took
place.
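The whole single-linkage procedure fits in a short Python sketch. The coordinates below are an assumption, not the values from the original table (which appears only as a figure); they were chosen to be consistent with the distances quoted in the slides, e.g. D(a, d) = 1.414 and the final merge at 6.325:

import math

# Assumed daily expenditures (X1 = food, X2 = clothing), for illustration only.
points = {"a": (2, 4), "b": (8, 2), "c": (9, 3), "d": (1, 5), "e": (8.5, 1)}

def dist(p, q):
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def single_linkage(c1, c2):
    # Nearest neighbor distance between two clusters (sets of observation labels).
    return min(dist(points[i], points[j]) for i in c1 for j in c2)

clusters = [{name} for name in points]          # start: each observation is its own cluster
while len(clusters) > 1:
    # find the pair of clusters with the smallest single-linkage distance
    i, j = min(((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
               key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]]))
    d = single_linkage(clusters[i], clusters[j])
    print("merge", sorted(clusters[i]), "+", sorted(clusters[j]), "at distance", round(d, 3))
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [clusters[i] | clusters[j]]

# Under these assumed coordinates the merges are: (b, e) at 1.118, (a, d) at 1.414,
# c joins (b, e) at 1.414, and the two remaining clusters merge at 6.325.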
Nearest neighbor method
• The groupings and the distance at
which these took place are also shown
in the tree diagram (dendrogram)
Nearest neighbor method
• One usually searches the dendrogram for
large jumps in the grouping distance as
guidance in arriving at the number of
groups.
• In this illustration, it is clear that the elements
in each of the clusters (ad) and (bce) are close
(they were merged at a small distance)
• However, the clusters are distant (the
distance at which they merge is large).
• Thus, we conclude that there are two clusters
instead of one big cluster.
Furthest neighbor method
• Under the furthest neighbor (or complete linkage) method, the
distance between two clusters is the distance between their most distant members.
Furthest neighbor method
• The distances between all pairs of observations shown
in the figure are the same as with the nearest neighbor
method.
• Therefore, the furthest neighbor method also calls for
grouping b and e at Step 1.
• However, the distances between (be), on the one hand,
and the clusters (a), (c), and (d), on the other, are
different, because they are now measured to the furthest
member of (be) rather than to the nearest.
Furthest neighbor method
• The four clusters remaining at Step 2 and the distances between these
clusters are shown below
• The nearest clusters are (a) and (d), which are now grouped into the
cluster (ad). The remaining steps are similarly executed.
Clustering (Part 02)
K-means clustering
• k-means clustering is a technique used to uncover
categories.
• In the retail sector, it can be used to categorize both products
and customers.
• k represents the number of categories identified, with each
category’s average (mean) characteristics being appreciably
different from that of other categories.
Determining cluster membership
• Specify the number of clusters arbitrarily.
• We can then determine cluster membership.
This involves a simple iterative process.
• We will illustrate this process with a 2-cluster
example:
Step 1: Start by making a guess on where the
central points of each cluster are. Let’s call
these pseudo-centers, since we do not yet
know if they are actually at the center of their
clusters.
Determining cluster membership
Step 2: Assign each data point to the
nearest pseudo-center (measured by
Euclidean distance).
By doing so, we have just formed
clusters, with each cluster comprising
all data points associated with its
pseudo-center.
Determining cluster membership
Step 3: Update the location of each
cluster’s pseudo-center, such that it is
now indeed in the center of all its
members (cluster’s centroid).
NOTE: The cluster centroid is the point with
coordinates equal to the average values of
the variables for the observations in that
cluster.
Determining cluster membership
Step 4: Repeat the steps of re-assigning
cluster members (Step 2) and re-locating
cluster centers (Step 3), until there are
no more changes to cluster membership.
A minimal sketch of the whole procedure is given below.
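A minimal Python sketch of these four steps (the function and variable names are illustrative; in practice one would typically use a library implementation such as scikit-learn's KMeans):

import math

def kmeans(points, centers, max_iter=100):
    # points: list of (x, y) tuples; centers: initial pseudo-centers (Step 1).
    for _ in range(max_iter):
        # Step 2: assign each data point to its nearest pseudo-center
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            clusters[nearest].append(p)
        # Step 3: move each center to the centroid (mean) of its members
        new_centers = [tuple(sum(coord) / len(coord) for coord in zip(*members))
                       if members else centers[k]
                       for k, members in enumerate(clusters)]
        # Step 4: stop when the centers (and hence the memberships) no longer change
        if new_centers == centers:
            break
        centers = new_centers
    return clusters, centers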
K-means method
Suppose two clusters are to be formed for the observations listed in
a table below, showing the daily expenditures on food (X1)
and clothing (X2) of five persons
K-means method
• Step 1: we begin by arbitrarily assigning a, b and d to Cluster 1, and
c and e to Cluster 2. The cluster centroids are calculated as shown
in the table.
K-means method
• The cluster centroid is the point with
coordinates equal to the average values of
the variables for the observations in that
cluster.
• Thus, the centroid of Cluster 1 is the point
(X1 = 3.67, X2 = 3.67), and that of Cluster
2 the point (8.75, 2). The two centroids are
marked by C1 and C2.
• The cluster's centroid, therefore, can be
considered the center of the observations
in the cluster.
K-means method
• We now calculate the distance between a
and the two centroids, C1 and C2.
• Observe that a is closer to the centroid of
Cluster 1, to which it is currently assigned.
Therefore, a is not reassigned.
• Next, we calculate the distance between b
and the two cluster centroids in the same way.
K-means method
• Step 2: since b is closer to Cluster 2's centroid than to that of
Cluster 1, it is reassigned to Cluster 2. The new cluster centroids
are calculated as shown in figure (a).
K-means method
• The new centroids are plotted, and the distances of the observations from the new cluster
centroids are computed (the nearest centroid is marked with an asterisk in the original table).
• Every observation now belongs to the cluster whose centroid it is nearest to, and the k-means
method stops. The elements of the two clusters are shown in the table.
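As a quick check of the assignment step, the coordinates below are an assumption chosen to reproduce the centroids quoted above (C1 = (3.67, 3.67) for the initial cluster {a, b, d} and C2 = (8.75, 2) for {c, e}); they are not taken from the original table:

import math

pts = {"a": (2, 4), "b": (8, 2), "c": (9, 3), "d": (1, 5), "e": (8.5, 1)}
c1 = (3.67, 3.67)   # centroid of the initial Cluster 1 = {a, b, d}
c2 = (8.75, 2.0)    # centroid of the initial Cluster 2 = {c, e}
for name, p in pts.items():
    d1, d2 = math.dist(p, c1), math.dist(p, c2)
    print(name, "-> Cluster", 1 if d1 < d2 else 2)
# a and d stay in Cluster 1; b is reassigned to Cluster 2, joining c and e,
# after which recomputing the centroids produces no further changes.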
Association Rules (Part 01)
Supermarket’s problem
• When we go grocery shopping, we often
have a standard list of things to buy.
• Each shopper has a distinctive list,
depending on one’s needs and
preferences.
• A housewife might buy healthy ingredients
for a family dinner, while a bachelor might
buy fruits and chips.
• Understanding these buying patterns can
help to increase sales in several ways.
Supermarket’s problem
If there is a pair of items, X and Y, that are
frequently bought together:
• Both X and Y can be placed on the same
shelf, so that buyers of one item would be
prompted to buy the other.
• Promotional discounts could be applied
to just one out of the two items.
• Advertisements on X could be targeted at
buyers who purchase Y.
• X and Y could be combined into a new
product, such as having Y in flavors of X.
While we may know that certain items are
frequently bought together, the question is, how
do we uncover these associations?
Association rules (1/3)
Table 1. Example Transactions
• Association rules analysis is a technique to uncover how items are
associated to each other. There are three common ways to
measure association.
• Measure 1: Support. This says how popular an itemset is, as
measured by the proportion of transactions in which an itemset
appears.
• In Table 1, the support of {apple} is 4 out of 8, or 50%. Itemsets
can also contain multiple items. For instance, the support of {apple,
beer, rice} is 2 out of 8, or 25%.
• If you discover that sales of items beyond a certain proportion tend
to have a significant impact on your profits, you might consider
using that proportion as your support threshold.
• You may then identify itemsets with support values above this
threshold as significant itemsets.
Association rules (2/3)
• Measure 2: Confidence. This says how likely item Y is purchased
when item X is purchased, expressed as {X -> Y}.
• This is measured by the proportion of transactions with item X, in
which item Y also appears.
In Table 1, the confidence of {apple -> beer} is 3 out of 4, or 75%.
• One drawback of the confidence measure is that it might misrepresent
the importance of an association.
• This is because it only accounts for how popular apples are, but not
beers.
• If beers are also very popular in general, there will be a higher
chance that a transaction containing apples will also contain
beers, thus inflating the confidence measure.
• To account for the base popularity of both constituent items, we use a
third measure called lift.
Table 1. Example Transactions
Support {apple, beer} : 3 / 8
Support {apple} : 4 / 8
Confidence {apple -> beer} : (3/8) ÷ (4/8) = 3/4
Association rules (3/3)
Measure 3: Lift. This says how likely item Y is
purchased when item X is purchased, while
controlling for how popular item Y is.
In Table 1, the lift of {apple -> beer} is 1,
which implies no association between items.
A lift value greater than 1:
item Y is likely to be bought if item X is bought
A lift value less than 1:
item Y is unlikely to be bought if item X is bought.
Table 1. Example Transactions
Support {apple, beer} : 3/8
Support {apple} : 4/8
Support {beer} : 6/8
Support {apple} x Support {beer} : 24/64
Lift {apple -> beer} : (3/8) ÷ (24/64) = 1
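All three measures can be computed from a list of transactions in a few lines of Python. The transactions below are a made-up toy list for illustration (they are not the contents of Table 1, which appears only as a figure), and the function names are illustrative:

def support(transactions, itemset):
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, lhs, rhs):
    # How often rhs appears among the transactions that contain lhs.
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)

def lift(transactions, lhs, rhs):
    # Confidence corrected for how popular rhs is on its own.
    return confidence(transactions, lhs, rhs) / support(transactions, rhs)

transactions = [
    {"apple", "beer", "rice"},
    {"apple", "beer"},
    {"apple", "milk"},
    {"beer", "milk"},
]
print(support(transactions, {"apple"}))               # 0.75
print(confidence(transactions, {"apple"}, {"beer"}))  # 0.666...
print(lift(transactions, {"apple"}, {"beer"}))        # 0.888..., i.e. slightly below 1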
Illustration of association rules
The network graph shows associations between selected items in
a supermarket.
Larger circles imply higher support, while red circles imply
higher lift. Several purchase patterns can be observed:
• The most popular transaction was of pip and tropical fruits (#1)
• Another popular transaction was of onions and other
vegetables (#2)
• If someone buys meat spreads, he is likely to have bought
yogurt as well (#3)
• Relatively many people buy sausage along with sliced cheese
(#4)
• If someone buys tea, he is likely to have bought fruit as well,
possibly inspiring the production of fruit-flavored tea (#5)
Association Rules (Part 02)
How to use support, confidence, and lift
• The {beer -> soda} rule has the highest
confidence at 20% (see Table 3)
• However, both beer and soda appear
frequently across all transactions (see
Table 2), so their association could simply
be a fluke.
• This is confirmed by the lift value of
{beer -> soda}, which is 1, implying no
association between beer and soda.
Table 2. Support of individual items
Table 3. Association measures for beer-related rules
How to use support, confidence, and lift
• On the other hand, the {beer -> male cosmetics}
rule has a low confidence, due to few purchases
of male cosmetics in general (see Table 3)
• However, whenever someone does buy male
cosmetics, he is very likely to buy beer as well,
as inferred from a high lift value of 2.6 (see
Table 3)
• The converse is true for {beer -> berries}.
• With a lift value below 1, we may conclude that
if someone buys berries, he would likely be
averse to beer.
Table 2. Support of individual items
Table 3. Association measures for beer-related rules
Apriori algorithm
• The apriori principle can reduce the number of itemsets we need to
examine.
• Put simply, the apriori principle states that if an itemset is infrequent,
then all its subsets must also be infrequent.
• This means that if {beer} was found to be infrequent, we can expect
{beer, pizza} to be equally or even more infrequent.
• So in consolidating the list of popular itemsets, we need not consider
{beer, pizza}, nor any other itemset configuration that contains beer
Apriori algorithm
Using the apriori principle, the number of itemsets that
have to be examined can be pruned, and the list of
popular itemsets can be obtained in the following steps (a minimal sketch follows the list):
Step 0. Start with itemsets containing just a single item,
such as {apple} and {pear}.
Step 1. Determine the support for itemsets. Keep the
itemsets that meet your minimum support threshold, and
remove itemsets that do not.
Step 2. Using the itemsets you have kept from
Step 1, generate all the possible itemset configurations.
Step 3. Repeat Steps 1 & 2 until there are no more new
itemsets.
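A minimal Python sketch of these steps (the toy transactions and the threshold are for illustration only; real projects usually rely on an existing implementation such as the apriori function in the mlxtend package):

from itertools import chain

def apriori(transactions, min_support):
    # Returns every itemset whose support meets min_support, built level by level.
    n = len(transactions)

    def sup(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Steps 0-1: single-item itemsets that meet the threshold
    items = set(chain.from_iterable(transactions))
    frequent = {frozenset([i]) for i in items if sup(frozenset([i])) >= min_support}
    result, k = set(frequent), 2
    while frequent:
        # Step 2: build k-item candidates from the surviving (k-1)-item sets
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Step 1 again: keep only the candidates that meet the threshold
        frequent = {c for c in candidates if sup(c) >= min_support}
        result |= frequent
        k += 1                                   # Step 3: repeat until nothing new appears
    return result

transactions = [frozenset(t) for t in
                ({"beer", "chips"}, {"beer", "chips", "pizza"}, {"chips", "pizza"}, {"beer"})]
for itemset in sorted(apriori(transactions, min_support=0.5), key=len):
    print(set(itemset))   # beer, chips, pizza, then {beer, chips} and {chips, pizza}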
Apriori Algorithm
• In the accompanying illustration, {apple} was determined to
have low support; hence it was removed, and all
other itemset configurations that contain apple need
not be considered.
• This reduced the number of itemsets to consider by
more than half.
• Note that the support threshold that you pick in Step
1 could be based on formal analysis or past
experience.
• If you discover that sales of items beyond a certain
proportion tend to have a significant impact on your
profits, you might consider using that proportion as
your support threshold.
Finding item rules with high confidence or lift
• We have seen how the apriori algorithm can be
used to identify itemsets with high support.
• The same principle can also be used to identify
item associations with high confidence or lift.
• Finding rules with high confidence or lift is less
computationally taxing once high-support
itemsets have been identified, because
confidence and lift values are calculated using
support values.
Finding item rules with high confidence or lift
• Take for example the task of finding high-confidence
rules.
• If the rule {beer, chips -> apple} has low confidence, all
other rules with the same constituent items and with
apple on the right hand side would have low confidence
too.
• Specifically, the rules
{beer -> apple, chips}
{chips -> apple, beer}
would have low confidence as well.
• As before, lower level candidate item rules can be
pruned using the apriori algorithm, so that fewer
candidate rules need to be examined.
Limitations
• Computationally Expensive.
• Even though the apriori algorithm reduces the number of candidate itemsets to consider,
this number could still be huge when store inventories are large or when the support
threshold is low.
• However, an alternative solution would be to reduce the number of comparisons by using
advanced data structures, to sort candidate itemsets more efficiently.
• Spurious (fake) Associations.
• Analysis of large inventories would involve more itemset configurations, and the support
threshold might have to be lowered to detect certain associations.
• However, lowering the support threshold might also increase the number of spurious
associations detected.
Modul Topik 8 - Kecerdasan BuatanModul Topik 8 - Kecerdasan Buatan
Modul Topik 8 - Kecerdasan Buatan
 
Modul Topik 5 - Kecerdasan Buatan
Modul Topik 5 - Kecerdasan BuatanModul Topik 5 - Kecerdasan Buatan
Modul Topik 5 - Kecerdasan Buatan
 
Modul Topik 3 - Kecerdasan Buatan
Modul Topik 3 - Kecerdasan BuatanModul Topik 3 - Kecerdasan Buatan
Modul Topik 3 - Kecerdasan Buatan
 
Pengantar Mata Kuliah Kecerdasan Buatan.pdf
Pengantar Mata Kuliah Kecerdasan Buatan.pdfPengantar Mata Kuliah Kecerdasan Buatan.pdf
Pengantar Mata Kuliah Kecerdasan Buatan.pdf
 
Introduction to Artificial Intelligence - Pengenalan Kecerdasan Buatan
Introduction to Artificial Intelligence - Pengenalan Kecerdasan BuatanIntroduction to Artificial Intelligence - Pengenalan Kecerdasan Buatan
Introduction to Artificial Intelligence - Pengenalan Kecerdasan Buatan
 
Mengenal Eye Tracking (Introduction to Eye Tracking Research)
Mengenal Eye Tracking (Introduction to Eye Tracking Research)Mengenal Eye Tracking (Introduction to Eye Tracking Research)
Mengenal Eye Tracking (Introduction to Eye Tracking Research)
 

Recently uploaded

(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAbhinavSharma374939
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝soniya singh
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1
 
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...Call Girls in Nagpur High Profile
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZTE
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLDeelipZope
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 

Recently uploaded (20)

(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog Converter
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCL
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 

Machine Learning Techniques

  • 4. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 3 Can a machine learn? • From a limited set of examples, you should be able to generalize. • Humans can easily generalize if the data are simple enough • If the data are complicated, human ability is limited and prone to error. • We get a machine to do this task. 3 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 4 What rules can you extract to predict the outcome? 4 J.D. Kelleher, B.M. Namee, A. D’Arcy, Fundamentals of Machine Learning for Predictive Data Analytics, MIT Press, 2015 Credit scoring dataset
  • 5. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 5 What rules can you extract to predict the outcome? 5 J.D. Kelleher, B.M. Namee, A. D’Arcy, Fundamentals of Machine Learning for Predictive Data Analytics, MIT Press, 2015 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 6 Can you extract the rules manually? 6 More complicated credit scoring dataset
  • 6. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 7 What rules can you extract to predict the outcome? 7 Not as easy as the first case, right? That’s when machine learning helps us! sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 8 8 End of File
  • 7. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 1 Sunu Wibirama sunu@ugm.ac.id Department of Electrical and Information Engineering Faculty of Engineering Universitas Gadjah Mada INDONESIA Introduction to Decision Tree (Part 02) Kecerdasan Buatan | Artificial Intelligence Version: January 2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 2 What rules can you extract to predict the outcome? 2 J.D. Kelleher, B.M. Namee, A. D’Arcy, Fundamentals of Machine Learning for Predictive Data Analytics, MIT Press, 2015 Credit scoring dataset
  • 8. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 3 Goal of machine learning? 3 Using a set of training data to find the best rule (model) that generalizes well (does a good labeling task / predicts the outcome) on new data (unseen data) training data new data sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 4 Four types of classification techniques
  • 9. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 5 Rule-based Machine Learning: Decision Trees • A map of the reasoning process, good at solving classification problems (Negnevitsky, 2005) • A decision tree represents a number of different attributes and values • Nodes represent attributes* • Branches represent values of the attributes • Path through a tree represents a decision • Tree can be associated with rules 5 Note: attributes = features sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 6 Why Decision Trees? • Decision Trees (DT) are one of the most popular data mining tools (with linear and logistic regression) • They are: • Easy to understand • Easy to implement • Computationally cheap • Almost all data mining packages include DT • They have advantages for model comprehensibility, which is important for: • model evaluation • communication to non-technical stakeholders 6
  • 10. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 7 Example: Ice-cream 7
      Outlook    Temperature   Holiday Season   Result
      Overcast   Mild          Yes              Don’t Sell
      Sunny      Mild          Yes              Sell
      Sunny      Hot           No               Sell
      Overcast   Hot           No               Don’t Sell
      Sunny      Cold          No               Don’t Sell
      Overcast   Cold          Yes              Don’t Sell
      *overcast = cloudy
      sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 8 Example: Ice-cream • When should an ice-cream seller attempt to sell ice-cream? • Could you write a set of rules? • How would you acquire the knowledge? • You might learn by experience: • For example, experience of: • ‘Outlook’: Overcast or Sunny • ‘Temperature’: Hot, Mild or Cold • ‘Holiday Season’: Yes or No 8
  • 11. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 9 Generalisation • What should the seller do when: • ‘Outlook’: Sunny • ‘Temperature’: Hot • ‘Holiday Season’: Yes • What about: • ‘Outlook’: Overcast • ‘Temperature’: Hot • ‘Holiday Season’: Yes 9 Sell Sell Let’s visualize the rules sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 10 Example 1: Ice-cream 10 [Decision tree figure] Root node: Outlook. Sunny branch -> Temperature: Hot -> Sell; Mild -> Holiday Season (Yes -> Sell, No -> Don’t Sell); Cold -> Don’t Sell. Overcast branch -> Holiday Season: No -> Don’t Sell; Yes -> Temperature (Hot -> Sell, Mild -> Don’t Sell, Cold -> Don’t Sell). Figure labels: Root node, Branch (value of attribute), Leaf node.
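To make the tree-to-rules mapping concrete, here is a minimal sketch (Python, added for illustration; the function name is my own) that encodes the ice-cream tree above as nested if/else rules. The leaf for the unseen Overcast/Hot/Yes case follows the answer shown on the Generalisation slide.

```python
# Hedged sketch: the ice-cream decision tree written as nested if/else rules.
# Attribute and value names follow the slide's table; the Overcast/Hot/Yes leaf
# ("Sell") is taken from the Generalisation slide rather than the training table.
def should_sell(outlook, temperature, holiday_season):
    if outlook == "Sunny":
        if temperature == "Hot":
            return "Sell"
        if temperature == "Mild":
            return "Sell" if holiday_season == "Yes" else "Don't Sell"
        return "Don't Sell"                                  # Cold
    # Overcast branch
    if holiday_season == "No":
        return "Don't Sell"
    return "Sell" if temperature == "Hot" else "Don't Sell"  # Mild/Cold -> Don't Sell

print(should_sell("Sunny", "Hot", "Yes"))      # Sell
print(should_sell("Overcast", "Hot", "Yes"))   # Sell
```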
  • 12. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 11 Construction • Concept learning: • Inducing concepts from examples • Different algorithms used to construct a tree based upon the examples • Most popular Iterative Dichotomizer 3 (called ID3, proposed by Quinlan, 1986) • But: • Different trees can be constructed from the same set of examples • Real-life is noisy and often contradictory sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 12 12 End of File
  • 13. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 1 Sunu Wibirama sunu@ugm.ac.id Department of Electrical and Information Engineering Faculty of Engineering Universitas Gadjah Mada INDONESIA Introduction to Decision Tree (Part 03) Kecerdasan Buatan | Artificial Intelligence Version: January 2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 2 Example: Ice-cream 2 [Decision tree figure, repeated from Part 02] Root node: Outlook. Sunny branch -> Temperature: Hot -> Sell; Mild -> Holiday Season (Yes -> Sell, No -> Don’t Sell); Cold -> Don’t Sell. Overcast branch -> Holiday Season: No -> Don’t Sell; Yes -> Temperature (Hot -> Sell, Mild -> Don’t Sell, Cold -> Don’t Sell). Figure labels: Root node, Branch (value of attribute), Leaf node.
  • 14. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 3 Construction • Concept learning: • Inducing concepts from examples • Different algorithms used to construct a tree based upon the examples • Most popular Iterative Dichotomizer 3 (called ID3, proposed by Quinlan, 1986) • But: • Different trees can be constructed from the same set of examples • Real-life is noisy and often contradictory sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 4 Ambiguous Trees 4 Item X Y Class 1 False False + 2 True False + 3 False True - 4 True True - Consider the following data: pengawet pewarna Pewarna = food coloring Pengawet = preservatives
  • 15. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 5 Ambiguous Trees 5 Y {3,4} Negative {1,2} Positive True False 1st option: Y as root node sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 6 Ambiguous Trees 6 X {2,4} Y {1,3} Y {2} Positive {4} Negative True False True False {1} Positive {3} Negative True False Different trees can be constructed from the same set of examples. Which tree is the best? Based upon choice of attributes at each node in the tree 2nd option: X as root node
  • 16. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 7 7 End of File
  • 17. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 1 Sunu Wibirama sunu@ugm.ac.id Department of Electrical and Information Engineering Faculty of Engineering Universitas Gadjah Mada INDONESIA Information Theory in Decision Tree (Part 01) Kecerdasan Buatan | Artificial Intelligence Version: January 2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 2 Ambiguous Trees 2 Item X Y Class 1 False False + 2 True False + 3 False True - 4 True True - Consider the following data: pengawet pewarna Pewarna = food coloring Pengawet = preservatives
  • 18. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 3 Ambiguous Trees 3 Y {3,4} Negative {1,2} Positive True False 1st option: Y as root node sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 4 Ambiguous Trees 4 X {2,4} Y {1,3} Y {2} Positive {4} Negative True False True False {1} Positive {3} Negative True False Different trees can be constructed from the same set of examples. Which tree is the best? Based upon choice of attributes at each node in the tree 2nd option: X as root node
  • 19. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 5 Information Theory • We can use information theory to help us understand: • Which attribute is the best to choose for a particular node of the tree • This is the node that is the best at separating the required predictions, and hence which leads to the best (or at least a good) tree • ‘Information Theory addresses both the limitations and the possibilities of communication’ (MacKay, 2003:16): • Measuring information content • Probability (measure of chance) and entropy (measure of disorder) 5 MacKay, D.J.C. (2003). Information Theory, Inference, and Learning Algorithms. Cambridge, UK: Cambridge University Press. sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 6 Choosing attributes • Entropy: • Measure of disorder / unexpectedness / uncertainty / surprise / randomness • High entropy means the data has high variance and thus contains a lot of information and/or noise. • For c classification categories: • Attribute a that has value v • Probability of being in category i is pi • Entropy is: E(a = v) = - Σ (pi x log2 pi), summed over the c categories 6
  • 20. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 7 7 End of File
  • 21. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 1 Sunu Wibirama sunu@ugm.ac.id Department of Electrical and Information Engineering Faculty of Engineering Universitas Gadjah Mada INDONESIA Information Theory in Decision Tree (Part 02) Kecerdasan Buatan | Artificial Intelligence Version: January 2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 2 Choosing attributes • Entropy: • Measure of disorder / unexpectedness / uncertainty / surprise / randomness • High entropy means the data has high variance and thus contains a lot of information and/or noise. • For c classification categories: • Attribute a that has value v • Probability of being in category i is pi • Entropy is: E(a = v) = - Σ (pi x log2 pi), summed over the c categories 2
  • 22. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 3 3 The attributes define whether the example is a: 1. City or town: Yes or No 2. Has a university nearby: Yes or No 3. Type of nearby housing estate: None, Small, Medium, Large 4. Quality of public transport: Good, Average, Poor 5. The number of school nearby: Small, Medium, Large Class : + (Yes, locate new bar) - ( No, don’t locate new bar) sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 4 Entropy example • Choice of attributes: • City/Town, University, Housing Estate, Industrial Estate, Transport and Schools • Let’s compute entropy of City/Town • City/Town: is either Y or N • For Y: 7 positive examples, 3 negative • For N: 4 positive examples, 6 negative 4
  • 23. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 5 5 For Y (Yes): 7 positives, 3 negatives sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 6 6 For N (No): 4 positives, 6 negatives
  • 24. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 7 7 End of File
  • 25. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 1 Sunu Wibirama sunu@ugm.ac.id Department of Electrical and Information Engineering Faculty of Engineering Universitas Gadjah Mada INDONESIA Information Theory in Decision Tree (Part 03) Kecerdasan Buatan | Artificial Intelligence Version: January 2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 2 2 The attributes define whether the example is a: 1. City or town: Yes or No 2. Has a university nearby: Yes or No 3. Type of nearby housing estate: None, Small, Medium, Large 4. Quality of public transport: Good, Average, Poor 5. The number of school nearby: Small, Medium, Large Class : + (Yes, locate new bar) - ( No, don’t locate new bar)
  • 26. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 3 Entropy example • Choice of attributes: • City/Town, University, Housing Estate, Industrial Estate, Transport and Schools • Let’s compute entropy of City/Town • City/Town: is either Y or N • For Y: 7 positive examples, 3 negative • For N: 4 positive examples, 6 negative 3 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 4 4 For Y (Yes): 7 positives, 3 negatives
  • 27. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 5 5 For N (No): 4 positives, 6 negatives sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 6 Entropy example • City/Town as root node: • For c=2 (positive and negative) classification categories • Attribute a=City/Town that has value v=Y • Probability of v=Y being in category positive = 7/10 • Probability of v=Y being in category negative = 3/10 6 For Y (Yes): 7 positives, 3 negatives
  • 28. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 7 Entropy example • City/Town as root node: • For c=2 (positive and negative) classification categories • Attribute a=City/Town that has value v=Y • Entropy E is: E (City/Town = Y) = (-7/10 x log2 7/10) + (- 3/10 x log2 3/10) = -[0.7 x -0.51 + 0.3 x -1.74 ] = 0.881 7 For Y (Yes): 7 positives, 3 negatives sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 8 8 End of File
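As a quick arithmetic check of the entropy values above, here is a small sketch (Python, added for illustration; not part of the original slides) of the two-class entropy formula.

```python
import math

def entropy(pos, neg):
    """Entropy of a two-class node holding `pos` positive and `neg` negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:                      # treat 0 * log2(0) as 0
            p = count / total
            result -= p * math.log2(p)
    return result

print(round(entropy(7, 3), 3))   # 0.881 -> E(City/Town = Y)
print(round(entropy(4, 6), 3))   # 0.971 -> E(City/Town = N)
```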
  • 29. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 1 Sunu Wibirama sunu@ugm.ac.id Department of Electrical and Information Engineering Faculty of Engineering Universitas Gadjah Mada INDONESIA Information Theory in Decision Tree (Part 04) Kecerdasan Buatan | Artificial Intelligence Version: January 2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 2 Entropy example • Choice of attributes: • City/Town, University, Housing Estate, Industrial Estate, Transport and Schools • City/Town: is either Y or N • For Y: 7 positive examples, 3 negative • For N: 4 positive examples, 6 negative 2
  • 30. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 3 3 For Y (Yes): 7 positives, 3 negatives sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 4 4 For N (No): 4 positives, 6 negatives
  • 31. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 5 Entropy Example • City/Town as root node: • For c=2 (positive and negative) classification categories • Attribute a=City/Town that has value v=N • Probability of v=N being in category positive =4/10 • Probability of v=N being in category negative = 6/10 5 For N (No): 4 positives, 6 negatives sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 6 Entropy Example • City/Town as root node: • For c=2 (positive and negative) classification categories • Attribute a=City/Town that has value v=N • Entropy E is: E (City/Town = N) = (-4/10 x log2 4/10) + (- 6/10 x log2 6/10) = 0.971 6 For N (No): 4 positives, 6 negatives
  • 32. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 7 Entropy Example • If the purity of the instances increases, the entropy decreases • High entropy means high disorder / uncertainty • In the example below, 7(+) and 3(-) has lower entropy (0.881) because the purity of the instances is higher (7 City/Town = Y tends to show (+) class) 7 E(City/Town = Y) = 0.881 —> 7 (+) and 3 (-) E(City/Town = N) = 0.971 —> 4 (+) and 6 (-) sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 8 Entropy 8 Ten instances consist of two classes : + and - Source: F. Provost and T. Fawcett, Data Science for Business, O’Reilly Media, 2013 High entropy means high disorder / uncertainty
  • 33. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 9 Choosing attributes • Information gain: • Expected reduction in entropy (high is good) • Entropy of the whole example set T is E(T) • Examples where attribute a takes value vj form the subset Tj, with entropy E(Tj) • Gain is: Gain(T, a) = E(T) - Σj (|Tj| / |T|) x E(Tj) • |T| = total samples = 20 • |Tj| = number of samples with value vj (Y/N values) 9 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 10 Root of tree • For root of tree, there are 20 examples: • For c=2 (positive and negative) classification categories • Probability of being positive class with 11 examples = 11/20 • Probability of being negative with 9 examples = 9/20 10
  • 34. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 11 Information gain example • For root of tree there are 20 examples: • For c=2 (positive and negative) classification categories • Entropy of all training examples E(T) is: |T | = 20 E(T) = (-11/20 x log2 11/20) + (- 9/20 x log2 9/20) = 0.993 11 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 12 Entropy example 12 E(City/Town = Y) = 0.881 —> 7 (+) and 3 (-) E(City/Town = N) = 0.971 —> 4 (+) and 6 (-) Total sample for E(City/Town = Y) = 10 Total sample for E(City/Town = N) = 10
  • 35. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 13 Information gain example • City/Town as root node: • 10 examples for a=City/Town and value v=Y • |Tj=Y | = 10 E(Tj=Y) = 0.881 • 10 examples for a=City/Town and value v=N • |Tj=N | = 10 E(Tj=N) = 0.971 13 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 14 14 End of File
  • 36. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 1 Sunu Wibirama sunu@ugm.ac.id Department of Electrical and Information Engineering Faculty of Engineering Universitas Gadjah Mada INDONESIA Information Theory in Decision Tree (Part 05) Kecerdasan Buatan | Artificial Intelligence Version: January 2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 2 Choosing attributes • Information gain: • Expected reduction in entropy (high is good) • Entropy of the whole example set T is E(T) • Examples where attribute a takes value vj form the subset Tj, with entropy E(Tj) • Gain is: Gain(T, a) = E(T) - Σj (|Tj| / |T|) x E(Tj) • |T| = total samples = 20 • |Tj| = number of samples with value vj (Y/N values) 2
  • 37. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 3 Information gain example • For root of tree there are 20 examples: • For c=2 (positive and negative) classification categories • Entropy of all training examples E(T) is: |T | = 20 E(T) = (-11/20 x log2 11/20) + (- 9/20 x log2 9/20) = 0.993 3 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 4 Information gain example • City/Town as root node: • 10 examples for a=City/Town and value v=Y • |Tj=Y | = 10 E(Tj=Y) = 0.881 • 10 examples for a=City/Town and value v=N • |Tj=N | = 10 E(Tj=N) = 0.971 4
  • 38. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 5 Now, we compute information gain of another attribute: Transport sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 6 6 The attributes define whether the example is a: 1. City or town: Yes or No 2. Has a university nearby: Yes or No 3. Type of nearby housing estate: None, Small, Medium, Large 4. Quality of public transport: Good, Average, Poor 5. The number of school nearby: Small, Medium, or Large E(T) = 0.993
  • 39. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 7 Compute the entropy of transport • Transport: • For c=2 (positive and negative) classification categories • Attribute a=Transport that has value v=G • Probability of v=G being in category positive = 5/5 = 1 • Probability of v=G being in category negative = 0/5 = 0 7 E(Transport = G) = (-5/5 x log2 5/5) + (0) = 0 Quality of public transport: Good, Average, Poor sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 8 Compute the entropy of transport • Transport: • For c=2 (positive and negative) classification categories • Attribute a=Transport that has value v=A • Probability of v=A being in category positive = 3/7 = 0.429 • Probability of v=A being in category negative = 4/7 = 0.571 8 E(Transport = A) = (-3/7 x log2 3/7) + (-4/7 x log2 4/7) = -[3/7 x -1.22 + 4/7 x -0.808] = 0.524 + 0.461 = 0.985 Quality of public transport: Good, Average, Poor
  • 40. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 9 Compute the entropy of transport • Transport: • For c=2 (positive and negative) classification categories • Attribute a=Transport that has value v=P • Probability of v=P being in category positive = 3/8 = 0.375 • Probability of v=P being in category negative = 5/8 = 0.625 9 E(Transport = P) = (-3/8 x log2 3/8) + (-5/8 x log2 5/8) = - [3/8 x -1.415 + 5/8 x -0.678] = 0.530 + 0.424 = 0.954 Quality of public transport: Good, Average, Poor sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 10 Information gain of transport 10 Gain(T, Transport) = 0.993 – ((5/20 x 0) + (7/20 x 0.985) + (8/20 x 0.954)) = 0.993 - (0.345 + 0.382) = 0.266 A = average P = poor G = good
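The same calculation can be scripted. Below is a hedged sketch (Python, illustrative only; function names are my own) that recomputes Gain(T, Transport) from the class counts read off the slides (parent 11(+)/9(-); Good 5/0, Average 3/4, Poor 3/5) and, for comparison, the gain of City/Town.

```python
import math

def entropy(pos, neg):
    total = pos + neg
    return -sum(c / total * math.log2(c / total) for c in (pos, neg) if c)

def information_gain(parent, splits):
    # Gain(T, a) = E(T) - sum_j (|Tj| / |T|) * E(Tj)
    total = sum(parent)
    return entropy(*parent) - sum((p + n) / total * entropy(p, n) for p, n in splits)

# Transport: Good 5(+)/0(-), Average 3(+)/4(-), Poor 3(+)/5(-); parent 11(+)/9(-).
print(round(information_gain((11, 9), [(5, 0), (3, 4), (3, 5)]), 3))   # 0.266
# City/Town: Yes 7(+)/3(-), No 4(+)/6(-).
print(round(information_gain((11, 9), [(7, 3), (4, 6)]), 3))           # 0.067
```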
  • 41. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 11 Information gain: City/Town vs. Transport 11 Gain(T, City/Town) = 0.993 - ((10/20 x 0.881) + (10/20 x 0.971)) = 0.067, while Gain(T, Transport) = 0.266 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 12 Choosing attributes • Choose the root node as the attribute that gives the highest Information Gain • In this case attribute Transport, IG = 0.266 • Branches from the root node then become the values associated with the attribute • Recursive calculation of IG of attributes/nodes • Filter examples by attribute value 12
  • 42. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 13 Recursive example for selecting the next attribute • Example: with Transport as the root node: • Select examples where Transport is Average: (1, 3, 6, 8, 11, 15, 17) • Use only these examples to construct this branch of the tree. • Select other attributes by computing the Information Gain (IG), get the highest one. • Repeat for each value of Transport (Poor, Good) 13 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 14 Final tree 14 Transport {7,12,16,19,20} Positive A P G {8} Negative {6} Negative {1,3,6,8,11,15,17} Housing Estate L M S N {11,17} Industrial Estate {17} Negative {11} Positive Y N {1,3,15} University {15} Negative {1,3} Positive Y N Callan 2003:243 {5,9,14} Positive {2,4,10,13,18} Negative {2,4,5,9,10,13,14,18} Industrial Estate Y N The attributes define whether the example is a: 1. City or town: Yes or No 2. Has a university nearby: Yes or No 3. Type of nearby housing estate: None, Small, Medium, Large 4. Quality of public transport: Good, Average, Poor 5. The number of school nearby: Small, Medium, or Large
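ID3 itself is only described in outline on the slides (pick the attribute with the highest information gain, split, and recurse on each branch). The following compact sketch is my own illustrative implementation of that idea, not the lecturer's code; run on the small X/Y "Ambiguous Trees" example, it places Y at the root, matching the earlier discussion of the two candidate trees.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    remainder = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

def id3(rows, labels, attributes):
    if len(set(labels)) == 1:                      # pure node -> leaf
        return labels[0]
    if not attributes:                             # nothing left -> majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = id3([rows[i] for i in idx],
                                [labels[i] for i in idx],
                                [a for a in attributes if a != best])
    return tree

# The X/Y "Ambiguous Trees" example: attribute index 0 = X (pengawet), 1 = Y (pewarna).
rows = [("False", "False"), ("True", "False"), ("False", "True"), ("True", "True")]
labels = ["+", "+", "-", "-"]
print(id3(rows, labels, attributes=[0, 1]))   # Y (index 1) ends up at the root
```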
  • 43. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 15 15 End of File
  • 44. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 1 Sunu Wibirama sunu@ugm.ac.id Department of Electrical and Information Engineering Faculty of Engineering Universitas Gadjah Mada INDONESIA Clustering (Part 01) Kecerdasan Buatan | Artificial Intelligence Version: January 2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 2 Unsupervised learning • Learning without a teacher: no labels / classes in the data • Normally used: • when we want to explore unlabeled data for the first time (to create a training set for later prediction or classification) • when we want to do voter profiling based on their characteristics and online activities (useful for political campaigns) • when we want to find associations between customer products (recommender systems in e-commerce).
  • 45. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 3 Clustering analysis The daily expenditures on food (X1) and clothing (X2) of five persons are shown in a table below sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 4 Clustering analysis • The numbers are fictitious and not at all realistic, but the example will help us explain the essential features of cluster analysis as simply as possible. The data in the table are plotted in the figure below.
  • 46. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 5 Clustering analysis • Inspection of the figure suggests that the five observations form two clusters. • The first consists of persons a and d, and the second of b, c and e. • It can be noted that the observations in each cluster are similar to one another with respect to expenditures on food (X1) and clothing (X2), and that the two clusters are quite distinct from each other. • This inspection was possible because only two variables were involved in grouping the observations. The question is: Can a procedure be devised for similarly grouping observations when there are more than two variables or attributes? sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 6 Measures of distances for variables • Clustering methods require a more precise definition of "similarity" ("closeness", "proximity") of observations and clusters. • When the grouping is based on variables, it is natural to employ the familiar concept of distance. • Consider the right figure as a map showing two points, i and j, with coordinates (X1i,X2i) and (X1j ,X2j), respectively.
  • 47. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 7 Euclidean distance • The Euclidean distance between the two points is the hypotenuse of the triangle ABC: D(i,j) = sqrt((X1i - X1j)^2 + (X2i - X2j)^2) • An observation i is declared to be closer (more similar) to j than to observation k if D(i,j) < D(i,k). • An alternative measure is the squared Euclidean distance. In the figure, the squared distance between the two points i and j is D2(i,j) = (X1i - X1j)^2 + (X2i - X2j)^2 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 8 Clustering methods • Nearest neighbor (or single linkage) method • Furthest neighbor (or complete linkage) method • K-means method
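A minimal sketch of the two distance measures (Python, added for illustration). The sample points are hypothetical; the slide's actual expenditure values are not reproduced in this transcript.

```python
import math

def euclidean(p, q):
    """Straight-line distance between 2-D points p = (X1, X2) and q = (X1, X2)."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def squared_euclidean(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

# Two hypothetical expenditure points (food, clothing), not the slide's table values.
i, j = (2.0, 4.0), (1.0, 5.0)
print(round(euclidean(i, j), 3))   # 1.414
print(squared_euclidean(i, j))     # 2.0
```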
  • 48. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 9 Nearest neighbor method • One of the simplest methods is to treat the distance between the two nearest observations, one from each cluster, as the distance between the two clusters. • This is known as the nearest neighbor (or single linkage) method. sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 10 Nearest neighbor method Let us suppose that Euclidean distance is the appropriate measure of proximity. We begin with each of the five observations forming its own cluster. The distance between each pair of observations is shown in the figure below.
  • 49. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 11 Nearest neighbor method • For example, the distance between a and b is • Observations b and e are nearest (most similar) and, as shown in figure (b), are grouped in the same cluster. • Assuming the nearest neighbor method is used, the distance between the cluster (be) and another observation is the smaller of the distances between that observation, on the one hand, and b and e, on the other. For example, sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 12 Nearest neighbor method • Observations a and d are the nearest, with distance 1.414 • We arbitrarily select (a,d) as the new cluster • Then, the distance between (be) and (ad) is • while that between c and (ad) is
  • 50. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 13 Nearest neighbor method • We finally merge (be) with c to form the cluster (bce) shown below sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 14 Nearest neighbor method • The grouping of these two clusters, it will be noted, occurs at a distance of 6.325, a much greater distance than that at which the earlier groupings took place.
  • 51. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 15 Nearest neighbor method • The groupings and the distance at which these took place are also shown in the tree diagram (dendrogram) sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 16 Nearest neighbor method • One usually searches the dendrogram for large jumps in the grouping distance as guidance in arriving at the number of groups. • In this illustration, it is clear that the elements in each of the clusters (ad) and (bce) are close (they were merged at a small distance) • However, the clusters are distant (the distance at which they merge is large). • Thus, we conclude that there are two clusters instead of one big cluster. Largest jumps
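For readers who want to reproduce this kind of dendrogram, here is a hedged sketch using SciPy's hierarchical clustering. The five 2-D points are hypothetical stand-ins for the expenditure table (chosen so that {a, d} and {b, c, e} come out as the two groups); "single" linkage is the nearest neighbor method, and "complete" linkage is the furthest neighbor method discussed next.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical (food, clothing) expenditures for persons a..e, not the slide's values.
points = np.array([[2.0, 4.0],    # a
                   [8.0, 2.0],    # b
                   [9.0, 3.0],    # c
                   [1.0, 5.0],    # d
                   [8.5, 1.0]])   # e

single = linkage(points, method="single")      # nearest neighbor (single linkage)
complete = linkage(points, method="complete")  # furthest neighbor (complete linkage)

# Cut each tree into two clusters and print the membership of a..e.
print(fcluster(single, t=2, criterion="maxclust"))
print(fcluster(complete, t=2, criterion="maxclust"))
```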
  • 52. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 17 Furthest neighbor method • Under the furthest neighbor (or complete linkage) method, the distance between two clusters is the distance between their two most distant members, one from each cluster. sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 18 Furthest neighbor method • The distances between all pairs of observations shown in the figure are the same as with the nearest neighbor method. • Therefore, the furthest neighbor method also calls for grouping b and e at Step 1. • However, the distances between (be), on the one hand, and the clusters (a), (c), and (d), on the other, are different:
  • 53. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 19 Furthest neighbor method • The four clusters remaining at Step 2 and the distances between these clusters are shown below • The nearest clusters are (a) and (d), which are now grouped into the cluster (ad). The remaining steps are similarly executed. sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 20 20 End of File
  • 54. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 1 Sunu Wibirama sunu@ugm.ac.id Department of Electrical and Information Engineering Faculty of Engineering Universitas Gadjah Mada INDONESIA Clustering (Part 02) Kecerdasan Buatan | Artificial Intelligence Version: January 2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 2 K-means clustering • k-means clustering is a technique used to uncover categories. • In the retail sector, it can be used to categorize both products and customers. • k represents the number of categories identified, with each category’s average (mean) characteristics being appreciably different from that of other categories.
  • 55. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 3 Determining cluster membership • Specify the number of clusters arbitrarily. • We can then determine cluster membership. This involves a simple iterative process. • We will illustrate this process with a 2-cluster example: Step 1: Start by making a guess on where the central points of each cluster are. Let’s call these pseudo-centers, since we do not yet know if they are actually at the center of their clusters. sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 4 Determining cluster membership Step 2: Assign each data point to the nearest pseudo-center (measured by euclidean distance). By doing so, we have just formed clusters, with each cluster comprising all data points associated with its pseudo-center.
  • 56. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 5 Determining cluster membership Step 3: Update the location of each cluster’s pseudo-center, such that it is now indeed in the center of all its members (cluster’s centroid). NOTE: The cluster centroid is the point with coordinates equal to the average values of the variables for the observations in that cluster. sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 6 Determining cluster membership Step 4: Repeat the steps of re-assigning cluster members (Step 2) and re-locating cluster centers (Step 3), until there are no more changes to cluster membership.
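The four steps above translate almost directly into code. Below is a minimal k-means sketch (Python, illustrative; names and data are my own, not the slides') that initializes pseudo-centers, assigns points to the nearest center, recomputes centroids, and stops when membership no longer changes.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)                    # Step 1: initial pseudo-centers
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:                               # Step 2: assign to nearest center
            j = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[j].append(p)
        new_centers = [                                # Step 3: move centers to centroids
            tuple(sum(c) / len(c) for c in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:                     # Step 4: stop when nothing changes
            break
        centers = new_centers
    return centers, clusters

# Hypothetical (food, clothing) expenditures for five persons.
points = [(2, 4), (8, 2), (9, 3), (1, 5), (8.5, 1)]
centers, clusters = kmeans(points, k=2)
print(centers)
print(clusters)
```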
  • 57. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 7 K-means method Suppose two clusters are to be formed for the observations listed in a table below, showing the daily expenditures on food (X1) and clothing (X2) of five persons sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 8 K-means method • Step 1: we begin by arbitrarily assigning a, b and d to Cluster 1, and c and e to Cluster 2. The cluster centroids are calculated as shown in the table.
  • 58. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 9 K-means method • The cluster centroid is the point with coordinates equal to the average values of the variables for the observations in that cluster. • Thus, the centroid of Cluster 1 is the point (X1 = 3.67, X2 = 3.67), and that of Cluster 2 the point (8.75, 2). The two centroids are marked by C1 and C2. • The cluster's centroid, therefore, can be considered the center of the observations in the cluster. sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 10 K-means method • We now calculate the distance between a and the two centroids: • Observe that a is closer to the centroid of Cluster 1, to which it is currently assigned. Therefore, a is not reassigned. • Next, we calculate the distance between b and the two cluster centroids:
  • 59. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 11 K-means method • Step 2: since b is closer to Cluster 2's centroid than to that of Cluster 1, it is reassigned to Cluster 2. The new cluster centroids are calculated as shown in figure (a). sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 12 K-means method • The new centroids are plotted. The distances of the observations from the new cluster centroids are as follows (an asterisk indicates the nearest centroid): • Every observation belongs to the cluster to the centroid of which it is nearest, and the k- means method stops. The elements of the two clusters are shown in the table.
  • 60. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 13 13 End of File
  • 61. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 1 Sunu Wibirama sunu@ugm.ac.id Department of Electrical and Information Engineering Faculty of Engineering Universitas Gadjah Mada INDONESIA Association Rules (Part 01) Kecerdasan Buatan | Artificial Intelligence Version: January 2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 2 Supermarket’s problem • When we go grocery shopping, we often have a standard list of things to buy. • Each shopper has a distinctive list, depending on one’s needs and preferences. • A housewife might buy healthy ingredients for a family dinner, while a bachelor might buy fruits and chips. • Understanding these buying patterns can help to increase sales in several ways.
  • 62. 27/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 3 Supermarket’s problem If there is a pair of items, X and Y, that are frequently bought together: • Both X and Y can be placed on the same shelf, so that buyers of one item would be prompted to buy the other. • Promotional discounts could be applied to just one out of the two items. • Advertisements on X could be targeted at buyers who purchase Y. • X and Y could be combined into a new product, such as having Y in flavors of X. While we may know that certain items are frequently bought together, the question is, how do we uncover these associations? sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 4 Association rules (1/3) Table 1. Example Transactions • Association rules analysis is a technique to uncover how items are associated to each other. There are three common ways to measure association. • Measure 1: Support. This says how popular an itemset is, as measured by the proportion of transactions in which an itemset appears. • In Table 1, the support of {apple} is 4 out of 8, or 50%. Itemsets can also contain multiple items. For instance, the support of {apple, beer, rice} is 2 out of 8, or 25%. • If you discover that sales of items beyond a certain proportion tend to have a significant impact on your profits, you might consider using that proportion as your support threshold. • You may then identify itemsets with support values above this threshold as significant itemsets.
  • 63. Association rules (2/3) (Table 1. Example Transactions)
    • Measure 2: Confidence. This says how likely item Y is purchased when item X is purchased, expressed as {X -> Y}.
    • It is measured by the proportion of transactions containing item X in which item Y also appears. In Table 1, the confidence of {apple -> beer} is 3 out of 4, or 75%.
    • One drawback of the confidence measure is that it might misrepresent the importance of an association.
    • This is because it only accounts for how popular apples are, but not beers.
    • If beers are also very popular in general, there will be a higher chance that a transaction containing apples will also contain beers, thus inflating the confidence measure.
    • To account for the base popularity of both constituent items, we use a third measure called lift.
    • From Table 1: support {apple, beer} = 3/8; support {apple} = 4/8; confidence {apple -> beer} = (3/8) / (4/8) = 3/4.
    Association rules (3/3) (Table 1. Example Transactions)
    • Measure 3: Lift. This says how likely item Y is purchased when item X is purchased, while controlling for how popular item Y is.
    • In Table 1, the lift of {apple -> beer} is 1, which implies no association between the items.
    • A lift value greater than 1 means item Y is likely to be bought if item X is bought; a lift value less than 1 means item Y is unlikely to be bought if item X is bought.
    • From Table 1: support {apple, beer} = 3/8; support {apple} = 4/8; support {beer} = 6/8; support {apple} × support {beer} = 24/64; lift {apple -> beer} = (3/8) / (24/64) = 1. A worked check of these formulas is given below.
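As a quick check of the two formulas, the fractions quoted from Table 1 can be plugged in directly; this is just the slide's arithmetic restated, with no new data.

```python
from fractions import Fraction

sup_apple      = Fraction(4, 8)   # support {apple}
sup_beer       = Fraction(6, 8)   # support {beer}
sup_apple_beer = Fraction(3, 8)   # support {apple, beer}

# Confidence {apple -> beer} = support{apple, beer} / support{apple}
confidence = sup_apple_beer / sup_apple
print(confidence)                 # 3/4, i.e. 75%

# Lift {apple -> beer} = support{apple, beer} / (support{apple} * support{beer})
lift = sup_apple_beer / (sup_apple * sup_beer)
print(lift)                       # 1 -> no association between apple and beer
```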
  • 64. Illustration of association rules
    The network graph shows associations between selected items in a supermarket. Larger circles imply higher support, while red circles imply higher lift. Several purchase patterns can be observed:
    • The most popular transaction was of pip and tropical fruits (#1).
    • Another popular transaction was of onions and other vegetables (#2).
    • If someone buys meat spreads, he is likely to have bought yogurt as well (#3).
    • Relatively many people buy sausage along with sliced cheese (#4).
    • If someone buys tea, he is likely to have bought fruit as well, possibly inspiring the production of fruit-flavored tea (#5).
    End of File
  • 65. Association Rules (Part 02)
    Sunu Wibirama, sunu@ugm.ac.id
    Department of Electrical and Information Engineering, Faculty of Engineering, Universitas Gadjah Mada, INDONESIA
    Kecerdasan Buatan | Artificial Intelligence. Version: January 2022
    How to use support, confidence, and lift
    • The {beer -> soda} rule has the highest confidence at 20% (see Table 3).
    • However, both beer and soda appear frequently across all transactions (see Table 2), so their association could simply be a fluke.
    • This is confirmed by the lift value of {beer -> soda}, which is 1, implying no association between beer and soda.
    (Table 2. Support of individual items; Table 3. Association measures for beer-related rules)
  • 66. How to use support, confidence, and lift
    • On the other hand, the {beer -> male cosmetics} rule has a low confidence, due to few purchases of male cosmetics in general (see Table 3).
    • However, whenever someone does buy male cosmetics, he is very likely to buy beer as well, as inferred from the high lift value of 2.6 (see Table 3).
    • The converse is true for {beer -> berries}.
    • With a lift value below 1, we may conclude that if someone buys berries, he is likely to be averse to beer.
    (Table 2. Support of individual items; Table 3. Association measures for beer-related rules)
    Apriori algorithm
    • The apriori principle can reduce the number of itemsets we need to examine.
    • Put simply, the apriori principle states that if an itemset is infrequent, then all its supersets must also be infrequent.
    • This means that if {beer} was found to be infrequent, we can expect {beer, pizza} to be equally or even more infrequent.
    • So, in consolidating the list of popular itemsets, we need not consider {beer, pizza}, nor any other itemset configuration that contains beer.
  • 67. Apriori algorithm
    Using the apriori principle, the number of itemsets that have to be examined can be pruned, and the list of popular itemsets can be obtained in these steps (a short sketch follows this list):
    Step 0. Start with itemsets containing just a single item, such as {apple} and {pear}.
    Step 1. Determine the support for the itemsets. Keep the itemsets that meet your minimum support threshold, and remove those that do not.
    Step 2. Using the itemsets kept from Step 1, generate all the possible larger itemset configurations.
    Step 3. Repeat Steps 1 and 2 until there are no more new itemsets.
    • As seen in the animation, {apple} was determined to have low support, hence it was removed, and all other itemset configurations that contain apple need not be considered.
    • This reduced the number of itemsets to consider by more than half.
    • Note that the support threshold you pick in Step 1 could be based on formal analysis or past experience.
    • If you discover that sales of items beyond a certain proportion tend to have a significant impact on your profits, you might consider using that proportion as your support threshold.
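A minimal sketch of this level-by-level procedure, reusing the hypothetical transaction list from the support example above; the 25% threshold and the helper names are illustrative, not from the slides.

```python
from itertools import combinations

# Hypothetical transactions (same stand-in as in the support example above).
transactions = [
    {"apple", "beer", "rice"}, {"apple", "beer", "rice"}, {"apple", "beer"},
    {"apple"}, {"beer"}, {"beer"}, {"beer"}, {"rice"},
]
min_support = 0.25  # illustrative threshold

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

# Step 0: start with itemsets containing just a single item.
items = sorted({item for t in transactions for item in t})
level = [frozenset([i]) for i in items]
frequent = []

while level:
    # Step 1: keep only the itemsets that meet the support threshold.
    kept = [s for s in level if support(s) >= min_support]
    frequent.extend(kept)
    # Step 2: combine surviving itemsets into candidates one item larger.
    size = len(level[0]) + 1
    candidates = {a | b for a, b in combinations(kept, 2) if len(a | b) == size}
    # Apriori pruning: a candidate is viable only if all its subsets survived.
    kept_set = set(kept)
    level = [c for c in candidates
             if all(frozenset(sub) in kept_set for sub in combinations(c, size - 1))]
    # Step 3: the loop repeats until no new itemsets are generated.

for s in frequent:
    print(set(s), support(s))
```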
  • 68. Finding item rules with high confidence or lift
    • We have seen how the apriori algorithm can be used to identify itemsets with high support.
    • The same principle can also be used to identify item associations with high confidence or lift.
    • Finding rules with high confidence or lift is less computationally taxing once high-support itemsets have been identified, because confidence and lift values are calculated using support values.
    • Take, for example, the task of finding high-confidence rules.
    • If the rule {beer, chips -> apple} has low confidence, then all other rules with the same constituent items and with apple on the right-hand side would have low confidence too.
    • Specifically, the rules {beer -> apple, chips} and {chips -> apple, beer} would have low confidence as well.
    • As before, lower-level candidate item rules can be pruned using the apriori principle, so that fewer candidate rules need to be examined. A sketch of this rule-generation step follows.
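The sketch below generates every rule X -> Y from one frequent itemset and filters by confidence. This is a brute-force illustration of the rule-generation step rather than the pruned search described above; the support values come from the same hypothetical transactions used earlier, and the 60% confidence threshold is illustrative.

```python
from itertools import combinations

# Hypothetical transactions (same stand-in as before).
transactions = [
    {"apple", "beer", "rice"}, {"apple", "beer", "rice"}, {"apple", "beer"},
    {"apple"}, {"beer"}, {"beer"}, {"beer"}, {"rice"},
]

def support(itemset):
    return sum(set(itemset) <= t for t in transactions) / len(transactions)

def rules_from(itemset, min_confidence=0.6):
    """Generate rules X -> Y from a frequent itemset, keeping only those whose
    confidence = support(itemset) / support(X) meets the threshold."""
    itemset = frozenset(itemset)
    kept = []
    for r in range(1, len(itemset)):
        for lhs in combinations(itemset, r):
            lhs = frozenset(lhs)
            rhs = itemset - lhs
            confidence = support(itemset) / support(lhs)
            if confidence >= min_confidence:
                kept.append((set(lhs), set(rhs), confidence))
    return kept

for lhs, rhs, conf in rules_from({"apple", "beer", "rice"}):
    print(f"{lhs} -> {rhs}: confidence = {conf:.2f}")
```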
  • 69. Limitations
    • Computationally expensive. Even though the apriori algorithm reduces the number of candidate itemsets to consider, this number can still be huge when store inventories are large or when the support threshold is low. An alternative is to reduce the number of comparisons by using advanced data structures to sort candidate itemsets more efficiently.
    • Spurious (fake) associations. Analysis of large inventories involves more itemset configurations, and the support threshold might have to be lowered to detect certain associations. However, lowering the support threshold might also increase the number of spurious associations detected.
    End of File