Topic 6
Uncertainty Problems and Probabilistic Machine Learning
Dr. Sunu Wibirama
Lecture Module: Artificial Intelligence (Kecerdasan Buatan)
Course code: UGMx 001001132012
June 22, 2022
1 Course Learning Outcomes
This topic fulfills course learning outcome (CPMK) 5: the ability to describe several classic machine learning techniques (linear regression, rule-based machine learning, probabilistic machine learning, clustering) and the basic concepts of deep learning and its implementation in image recognition (convolutional neural network).
The indicators that this outcome has been achieved are: understanding basic probability theory (random variables, probability distributions, joint probability, conditional probability, marginal probability), understanding the terminology of uncertainty theory and probability theory, and understanding Bayes' theorem and its implementation.
2 Topic Coverage
This topic covers the following material:
a) Introduction to Uncertainty: explains the causes of uncertainty in data and the weaknesses of decision-tree-based (rule-based) machine learning.
b) Probability for Machine Learning: explains the foundations of probability theory, including random variables, probability distributions, joint probability, conditional probability, and marginal probability.
c) Bayesian Rules: explains the basic concepts and implementation of conditional, joint, and marginal probability in the Bayesian rule, also known as Bayesian reasoning.
d) Implementation of the Bayesian Rule: explains the implementation of the Bayesian rule in several cases, such as predicting gender, estimating the probability of being infected with Covid-19, and its use in Computer Aided Diagnosis to predict several diseases based on a variety of symptoms.
17/06/2022
sunu@ugm.ac.id
Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 1
Sunu Wibirama
sunu@ugm.ac.id
Department of Electrical and Information Engineering
Faculty of Engineering
Universitas Gadjah Mada
INDONESIA
Introduction to Uncertainty (Part 01)
Kecerdasan Buatan | Artificial Intelligence
Version: January 2022
Four types of classification technique: the neural network model, the geometric model, the logical model (rule-based model), and the probabilistic model.
Geometric model (e.g., SVM, linear discriminant analysis, KNN)
[Figure: house price (USD) plotted against distance from the city hospital, with a boundary separating exclusive housing from government-supported housing.]
Logical model / rule-based model (e.g., a decision tree)
Classifying spam email is easy if you know the "features": 'Viagra' and 'lottery' are two important features of spam email.
[Figure: example emails labeled by class (spam or ham), including one reading "SUBSCRIBE CHEAP LOTTERY COUPON FOR ONLY $19. TO BE REMOVED FROM FUTURE MAILINGS, SIMPLY REPLY TO THIS MESSAGE AND PUT 'REMOVE' IN THE SUBJECT."]
What about sentiment analysis?
There is no definite rule for judging whether a tweet is "positive" or "negative".
Sentiment analysis and uncertainty of political view
• When analyzing Twitter or Facebook data, you observe the sentiment (tone) of the tweet/status.
• There is no exact rule for determining political view; the best you can do is minimize the uncertainty of the tweet/status tone.
• How do we minimize the uncertainty?
• Gather more evidence (in the case of sentiment analysis: words). There are too many words to enumerate by hand, so use a language corpus.
• Measure the probability of occurrence.
• Categorize the tweet/status based on that probability.
• This is a case where the logical machine learning model does not work well, so we use the probabilistic model instead.
End of File
Introduction to Uncertainty (Part 02)
Uncertainty and probabilistic model
• Information can be incomplete, inconsistent, uncertain, or all three. In other words, information is often unsuitable for solving a problem or making an inference.
• Uncertainty is defined as the lack of the exact knowledge that would enable us to reach a perfectly reliable conclusion.
• Classical logic (the logical model) permits only exact reasoning. It assumes that perfect knowledge always exists and that the law of the excluded middle (every proposition is either true or not true) can always be applied:
IF A is true THEN A is not false
IF A is false THEN A is not true
Sources of uncertain knowledge (1)
• Weak implications: domain experts and knowledge engineers have the painful task of establishing concrete correlations between the IF (condition) and THEN (action) parts of the rules.
• Therefore, expert systems need the ability to handle vague (unclear) associations.
• One approach is to accept degrees of correlation as numerical certainty factors (e.g., a strong correlation between two variables represents more certainty).
[Photo caption] Marvin Minsky with a blocks-vision robot at MIT. In 1946, he entered Harvard University after returning from service in the U.S. Navy during World War II. After graduating from Harvard in 1950, he attended Princeton University, earning his Ph.D. in mathematics in 1954. In 1958, Minsky joined the faculty of MIT's Department of Electrical Engineering and Computer Science. A year later, he co-founded the Artificial Intelligence Laboratory.
Sources of uncertain knowledge (2)
• Imprecise language. Our natural language is ambiguous and imprecise. We describe facts with terms such as often and sometimes, frequently and hardly ever.
• As a result, it can be difficult to express knowledge in the precise IF-THEN form of production rules.
• However, if the meaning of the facts is quantified, it can be used in expert systems.
• In 1944, Ray Simpson asked 355 high school and college students to place 20 terms such as "often" on a scale between 1 and 100.
• In 1968, Milton Hakel repeated this experiment.
Ray H. Simpson (1944), "The specific meanings of certain terms indicating differing degrees of frequency," Quarterly Journal of Speech, 30:3, 328-330
Sources of Uncertain Knowledge (2): imprecise language

Term                      Mean value,       Mean value,
                          Simpson (1944)    Hakel (1968)
Always                    99                100
Very often                88                87
Usually                   85                79
Often                     78                74
Generally                 78                74
Frequently                73                72
Rather often              65                72
About as often as not     50                50
Now and then              20                34
Sometimes                 20                29
Occasionally              20                28
Once in a while           15                22
Not often                 13                16
Usually not               10                16
Seldom                    10                9
Hardly ever               7                 8
Very seldom               6                 7
Rarely                    5                 5
Almost never              3                 2
Never                     0                 0
Sources of uncertain knowledge (3)
• Unknown data. When the data is incomplete or missing, the only solution is to accept the value "unknown" and proceed to approximate reasoning with this value.
• Combining the views of different experts. Large expert systems usually combine the knowledge and expertise of a number of experts.
• Unfortunately, experts often have contradictory opinions and produce conflicting rules.
• To resolve the conflict, the knowledge engineer has to attach a weight to each expert and then calculate the composite conclusion. But no systematic method exists to obtain these weights.
End of File
Probability for Machine Learning (Part 01)
Basic statistical measures
• The mean of a vector, usually denoted x̄, is the mean of its elements, that is, the sum of the components divided by the number of components.
• The variance describes how the data is spread around the mean. A dataset with a large variance has data points spread far from the mean; a dataset with a small variance has data points grouped closely around the mean.
• The standard deviation is simply the square root of the variance.
• The covariance between two variables tells whether large values in one variable are associated with large values in the other, and vice versa.
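The four measures above can be sketched in a few lines of pure Python (the sample data is illustrative, and dividing by n, the population convention, is an assumption):

```python
# Basic statistical measures for a small sample dataset.
# Population formulas (divide by n) are used for simplicity.

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def std_dev(xs):
    return variance(xs) ** 0.5

def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

prices = [100, 120, 140, 160, 180]   # illustrative house prices
distances = [10, 8, 6, 4, 2]         # illustrative distances from the hospital

m = mean(prices)                     # 140.0
v = variance(prices)                 # 800.0
s = std_dev(prices)
c = covariance(prices, distances)    # negative: price rises as distance falls
```

A negative covariance here matches the geometric-model intuition from earlier: price and distance move in opposite directions.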
Basic probability theory
• The concept of probability has a long history
that goes back thousands of years when words
like “probably”, “likely”, “maybe”, “perhaps” and
“possibly” were introduced into spoken
languages.
• However, the mathematical theory of probability
was formulated only in the 17th century.
• The probability of an event is the proportion of
cases in which the event occurs.
• Probability can also be defined as a scientific
measure of chance.
Basic probability theory
• Probability can be expressed mathematically as a numerical index ranging from zero (an absolute impossibility) to unity (an absolute certainty).
• Most events have a probability index strictly between 0 and 1, which means that each event has at least two possible outcomes: a favourable outcome (success) and an unfavourable outcome (failure).

P(success) = (the number of successes) / (the number of possible outcomes)
P(failure) = (the number of failures) / (the number of possible outcomes)
Basic probability theory
• If s is the number of times success can occur, and f is the number of times failure can occur, then

p(success) = p = s / (s + f)
p(failure) = q = f / (s + f)
p + q = 1

• If we throw a coin, the probability of getting a head equals the probability of getting a tail. In a single throw, s = f = 1, s + f = 2, and therefore the probability of getting a head (or a tail) is 0.5.
Geometric representation of events
[Figure: Venn diagrams. Mutually exclusive: circles A and B (with complements ¬A, ¬B) do not overlap. Non-mutually exclusive: circles A and B overlap in the region A ∩ B.]
Non-mutually exclusive: there is a chance that events A and B occur together. Example: the probability of getting a tail and a 6 when we roll a die and then flip a coin.
Geometric representation of events
Given two events A and B, the probability of A or B is:

p(A ∪ B) = p(A) + p(B) − p(A ∩ B)    if A and B are non-mutually exclusive
p(A ∪ B) = p(A) + p(B)               if A and B are mutually exclusive
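The union rule can be checked by enumerating the die-then-coin sample space from the example above (the event encoding is an illustrative choice):

```python
from fractions import Fraction

# Joint sample space: roll a six-sided die, then flip a coin.
space = [(die, coin) for die in range(1, 7) for coin in ("head", "tail")]

def prob(event):
    """Probability of an event = favourable outcomes / all outcomes."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

A = lambda o: o[1] == "tail"   # event A: the coin shows tail
B = lambda o: o[0] == 6        # event B: the die shows 6

p_union = prob(lambda o: A(o) or B(o))
p_sum = prob(A) + prob(B) - prob(lambda o: A(o) and B(o))
# Both equal 1/2 + 1/6 - 1/12 = 7/12
```

Because A and B are non-mutually exclusive, the overlap 1/12 must be subtracted once.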
End of File
Probability for Machine Learning (Part 02)
Random variables
• A random experiment, or simply an experiment, is a process that gives uncertain results, such as a coin flip. The outcome of a random experiment is the result you obtain.
• A random variable takes a value corresponding to the outcome of a random experiment. We use a capital letter to denote a random variable and the corresponding lowercase letter for one of its values.
• As shown in the figure, if you flip a coin, the two possible outcomes are 'heads' and 'tails'. An example of a random variable would map 'heads' to 0 and 'tails' to 1.
• The event A corresponds to the set of outcomes {'heads'}. This means that the probability that the outcome is 'heads' can be denoted as:

P(X = 0) = P('heads') = P(A)
Sample space
• Discrete sample space:
  • A sample space containing a finite number of possibilities, or an unending sequence with as many elements as there are whole numbers.
  • The variable is called a "discrete random variable".
  • Its values can be counted.
  • Example: the number of people in the room with red shoes on.
• Continuous sample space:
  • A sample space containing an infinite number of possibilities, equal to the number of points on a line segment.
  • Non-discrete: the variable is a "continuous random variable".
  • Its values cannot be counted, but they can be measured.
  • Example: heights of children.
Richard Holzer, Patrick Wüchner, Hermann de Meer,
“Modeling of Self-Organizing Systems: An Overview”,
Universitätsbibliothek TU Berlin, 2010
Discrete probability distributions
• A discrete random variable has a certain probability of equaling each of its possible values.
• Example: tossing a coin 3 times
  • X = number of heads (H)
  • Sample space S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
  • For x = 2, P(X = 2) = 3/8
• Using the counting formula:
  • P(X = x) = C(3, x) / 8
  • P(3) = P(X = 3) = 1/8
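The coin-toss table above can be reproduced by brute-force enumeration, which also confirms the counting formula C(3, x)/8 (a minimal sketch, not part of the original slides):

```python
from fractions import Fraction
from itertools import product
from math import comb

# Toss a fair coin three times; X = number of heads.
outcomes = list(product("HT", repeat=3))   # the 8 equally likely outcomes

pmf = {}
for x in range(4):
    count = sum(1 for o in outcomes if o.count("H") == x)
    pmf[x] = Fraction(count, len(outcomes))

# Enumeration matches the counting formula f(x) = C(3, x) / 8.
assert all(pmf[x] == Fraction(comb(3, x), 8) for x in range(4))

p2 = pmf[2]                 # 3/8
total = sum(pmf.values())   # the probabilities sum to 1
```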
Probability distribution for discrete variables
• The probability distribution of a random variable is a function that takes the sample space as input and returns probabilities: f(x) is the probability that the random variable X takes the value x.
• The set of ordered pairs (x, f(x)) is a probability function, probability mass function, or probability distribution of the discrete random variable X if:

f(x) ≥ 0
Σₓ f(x) = 1
P(X = x) = f(x)
End of File
Probability for Machine Learning (Part 03)
Probability mass functions (1)
• Another example: say you are running a dice-rolling experiment.
• X is the random variable corresponding to this experiment. Assuming the die is fair, each outcome is equiprobable.
• That is, if you run the experiment a large number of times, you will get each outcome approximately the same number of times.
[Figures: the probability mass function of the random variable X for rolling a six-sided die, estimated from 20 rolls and from 100,000 rolls.]
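The two estimates described in the figures can be simulated directly; the seed and roll counts are illustrative choices:

```python
import random
from collections import Counter

# Estimate the pmf of a fair six-sided die from simulated rolls.
random.seed(0)   # fixed seed for reproducibility (an arbitrary choice)

def estimate_pmf(n_rolls):
    counts = Counter(random.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}

rough = estimate_pmf(20)         # noisy estimate, like the 20-roll figure
smooth = estimate_pmf(100_000)   # close to the true value 1/6 for every face
```

With more rolls, each estimated probability settles near 1/6, exactly as the second figure shows.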
Probability mass functions (2)
• A stockroom clerk returns three safety helmets at random to three steel mill employees who had previously checked them.
• If Smith (S), Jones (J), and Brown (B), in that order, each receive one of the three helmets:
  • list the sample points for the possible orders of returning the helmets;
  • find the values of the random variable M that represents the number of correct matches.
Cumulative distribution for a discrete random variable
For the random variable M, the number of correct matches in the previous example, the probability mass function is f(0) = 1/3, f(1) = 1/2, f(3) = 1/6 (two matches are impossible). The cumulative distribution function for M is:

F(m) = 0      for m < 0
F(m) = 1/3    for 0 ≤ m < 1
F(m) = 5/6    for 1 ≤ m < 3
F(m) = 1      for m ≥ 3
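The helmet example and its cumulative distribution can be verified by enumerating all return orders (the owner labels are from the example; the helper names are illustrative):

```python
from fractions import Fraction
from itertools import permutations

# Three helmets are returned at random to Smith, Jones, and Brown.
# M = number of employees who get their own helmet back.
owners = ("S", "J", "B")
orders = list(permutations(owners))   # the 6 equally likely return orders

def matches(order):
    return sum(1 for got, owner in zip(order, owners) if got == owner)

pmf = {}
for m in (0, 1, 2, 3):
    count = sum(1 for order in orders if matches(order) == m)
    pmf[m] = Fraction(count, len(orders))
# pmf: {0: 1/3, 1: 1/2, 2: 0, 3: 1/6} — exactly two matches is impossible

def cdf(m):
    """F(m) = P(M <= m), the running sum of the pmf."""
    return sum(p for value, p in pmf.items() if value <= m)
```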
End of File
Probability for Machine Learning (Part 04)
Another case…
• Statement 1: What is the probability that tomorrow morning's temperature equals exactly 10° Celsius?
• Statement 2: What is the probability that tomorrow morning's temperature is less than or greater than 10° Celsius?
What is the answer to each statement? Which statement is more appropriate to use?
Continuous probability distributions
• Probability density function (PDF)
• A continuous random variable has probability P = 0 of being exactly any single value.
• Example: being exactly 175 cm tall
  • P(height = 175.000) = 0
• However, we can find the probability that height lies within some range. Example: P(height ≥ 175) or P(160 ≤ height ≤ 175)
Remember: the probability of a continuous random variable must be described over a range.
[Figure: the probability of drawing a number between 0 and 0.2 is the highlighted area under the curve.]
Continuous distributions
For a continuous random variable X with density function f(x):

P(a < X < b) = ∫[a to b] f(x) dx
∫[−∞ to ∞] f(x) dx = 1    (the total area under f(x) must equal unity)
P(a < X < b) must be non-negative
P(a ≤ X ≤ b) = P(a < X < b)    (single points contribute zero probability)
Example
• Given a random variable X with density function:
  • f(x) = 2x, for 0 < x < 1
  • f(x) = 0, for all other x
• Verify that the area under the curve equals 1.0:

∫[−∞ to ∞] f(x) dx = ∫[−∞ to 0] 0 dx + ∫[0 to 1] 2x dx
                   = x² evaluated from 0 to 1
                   = 1² − 0² = 1.0
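Both integrals in these examples can be checked numerically with a midpoint Riemann sum (the step count is an arbitrary accuracy choice, not from the slides):

```python
# Numerically verify the density f(x) = 2x on (0, 1) with a midpoint
# Riemann sum, using no external libraries.

def f(x):
    return 2 * x if 0 < x < 1 else 0.0

def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

area = integrate(f, -1, 2)     # total area under the curve, close to 1.0
p = integrate(f, -0.25, 0.5)   # P(-1/4 < X < 1/2), close to 0.25
```

The second value matches the analytic answer of Example (2) below: (1/2)² − 0² = 1/4.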
Example (2)
• Given a random variable X with density function:
  • f(x) = 2x, for 0 < x < 1
  • f(x) = 0, for all other x
• What is the probability that −¼ < X < ½?

P(−¼ < X < ½) = ∫[−¼ to 0] 0 dx + ∫[0 to ½] 2x dx
              = x² evaluated from 0 to ½
              = (½)² − 0² = ¼
Cumulative distributions for continuous random variables
• The cumulative distribution F(a) of a continuous random variable X with density function f(x) is:

F(a) = P(X ≤ a) = ∫[−∞ to a] f(x) dx

P(a < X < b) = F(b) − F(a)
             = ∫[−∞ to b] f(x) dx − ∫[−∞ to a] f(x) dx
             = ∫[a to b] f(x) dx
Cumulative distributions for continuous random variables
Differentiating F(x) lets you determine f(x) from F(x):

f(x) = dF(x)/dx
Example
• Suppose the error in the reaction temperature for an experiment is a continuous random variable X having the cumulative distribution function:
  • F(x) = (x³ + 1)/9, for −1 < x < 2
  • F(x) = 0 for x ≤ −1, and F(x) = 1 for x ≥ 2
• Thus, the density function f(x) for −1 < x < 2 is:

f(x) = dF(x)/dx = 3x²/9 = x²/3
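The relation f(x) = dF(x)/dx can be checked numerically with a central difference (the step size and test points are arbitrary choices; the additive constant in F makes F(−1) = 0 and F(2) = 1 and does not affect the derivative):

```python
# Check numerically that f(x) = dF(x)/dx on (-1, 2).

def F(x):
    return (x ** 3 + 1) / 9     # cumulative distribution function

def f(x):
    return x ** 2 / 3           # claimed density: 3x^2 / 9 = x^2 / 3

h = 1e-6                        # finite-difference step (illustrative)
points = (-0.5, 0.0, 0.5, 1.0, 1.5)
max_err = max(abs((F(x + h) - F(x - h)) / (2 * h) - f(x)) for x in points)
# max_err is tiny: the central difference agrees with x^2 / 3
```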
End of File
Probability Distribution
Discrete uniform distribution
If the random variable X assumes the values x1, x2, …, xk with equal probabilities, then the discrete uniform distribution is given by

f(x; k) = 1/k,  for x = x1, x2, …, xk

• When a light bulb is selected at random from a box that contains a 40-watt bulb, a 60-watt bulb, a 75-watt bulb, and a 100-watt bulb, each element of the sample space S = {40, 60, 75, 100} occurs with probability 1/4. Therefore, we have a uniform distribution with

f(x; 4) = 1/4,  for x = 40, 60, 75, 100
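The light-bulb example reduces to a one-line pmf (a minimal sketch; the function name is illustrative):

```python
from fractions import Fraction

# Discrete uniform pmf: f(x; k) = 1/k for each of the k possible values.
def uniform_pmf(values):
    k = len(values)
    return {x: Fraction(1, k) for x in values}

bulbs = uniform_pmf([40, 60, 75, 100])   # the light-bulb sample space
# Every wattage has probability 1/4, and the probabilities sum to 1.
```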
Poisson distribution
Properties of the Poisson process:
1. The number of outcomes occurring in one time interval or specified region is independent of the number that occurs in any other disjoint time interval or region of space. Thus, the Poisson process has no memory.
2. The probability that a single outcome will occur during a very short time interval or in a small region is proportional to the length of the time interval or the size of the region, and does not depend on the number of outcomes occurring outside this time interval or region.
3. The probability that more than one outcome will occur in such a short time interval or fall in such a small region is negligible.
Poisson distribution
• A customer care center receives 100 calls per hour, 8 hours a day.
• The calls are independent of each other.
• The number of calls per minute has a Poisson probability distribution.
• There can be any number of calls per minute, irrespective of the number of calls received in the previous minute.
Source: https://www.cuemath.com/data/poisson-distribution/
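The call-center example can be sketched with the standard Poisson pmf, P(X = k) = e^(−λ) λ^k / k!; converting 100 calls/hour to a per-minute rate is an interpretation of the example, not stated on the slide:

```python
from math import exp, factorial

# Poisson pmf for the call-center example: 100 calls/hour corresponds
# to an average rate of lam = 100/60 calls per minute.

def poisson_pmf(k, lam):
    return exp(-lam) * lam ** k / factorial(k)

lam = 100 / 60
p0 = poisson_pmf(0, lam)   # probability of a minute with no calls at all
total = sum(poisson_pmf(k, lam) for k in range(100))   # sums to ~1.0
```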
Binomial distribution
• An experiment often consists of repeated trials, each with two possible outcomes that may be labeled success or failure.
• The most obvious application is the testing of items as they come off an assembly line, where each test or trial may indicate a defective or a non-defective item (or a coin, with head or tail).
• We may choose to define either outcome as a success. The process is referred to as a Bernoulli process, and each trial is called a Bernoulli trial.
Binomial distribution
1. The experiment consists of n identical trials.
2. There are only 2 possible outcomes on each trial. We denote one outcome by S (for Success) and the other by F (for Failure).
3. The probability of S remains the same from trial to trial. This probability is denoted by p, and the probability of F is denoted by q (q = 1 − p).
4. The trials are independent.
5. The binomial random variable X is the number of S's in n trials.
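The five conditions above yield the standard binomial pmf, P(X = x) = C(n, x) p^x q^(n−x); as a sketch, it reproduces the earlier three-coin-toss table:

```python
from fractions import Fraction
from math import comb

# Binomial pmf: probability of x successes in n independent Bernoulli
# trials, each with success probability p.
def binomial_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Tossing a fair coin 3 times (success = head):
p2 = binomial_pmf(2, 3, Fraction(1, 2))   # P(X = 2) = 3/8, as before
total = sum(binomial_pmf(x, 3, Fraction(1, 2)) for x in range(4))
```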
Normal distribution
• The most important continuous probability distribution.
• Its graph is called the "normal curve" (bell-shaped).
• The total area under the curve = 1.
• Derived by De Moivre and Gauss; hence, it is also called the "Gaussian" distribution.
• Describes many phenomena in nature, industry, and research.
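The unit-area property can be checked numerically for the standard Gaussian density (the integration range and step count are illustrative accuracy choices):

```python
from math import exp, pi, sqrt

# Normal (Gaussian) density with mean mu and standard deviation sigma.
def normal_pdf(x, mu=0.0, sigma=1.0):
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

# Midpoint Riemann sum over [-10, 10]; the tails beyond that are negligible.
n, a, b = 100_000, -10.0, 10.0
h = (b - a) / n
area = sum(normal_pdf(a + (i + 0.5) * h) for i in range(n)) * h
# area is very close to 1.0, the total probability under the bell curve
```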
Other probability distributions
Source: https://www.kdnuggets.com/2020/02/probability-distributions-data-science.html
Why probability distributions?
• Some machine learning models work best under certain distributional assumptions.
• For example, these algorithms and functions assume a normal distribution: Linear Discriminant Analysis (LDA), Gaussian Naive Bayes, Logistic Regression, Linear Regression, and the sigmoid function.
• When you work with a dataset, you are dealing with a sample instead of the population. Probability distributions help you make predictions about the whole population.
• Understanding probability distributions helps you choose appropriate data transformation methods and feature extraction techniques.
Source: https://www.kdnuggets.com/2020/02/probability-distributions-data-science.html
End of File
Conditional Probability and Bayesian Rule
Conditional probability
• Let A be an event in the world and B be another event. Suppose that events A and B are not mutually exclusive, but occur conditionally on the occurrence of each other.
• The probability that event A will occur if event B occurs is called the conditional probability.
• Conditional probability is denoted mathematically as p(A|B), in which the vertical bar represents "given", and the complete probability expression is interpreted as: the conditional probability of event A occurring given that event B has occurred.

p(A|B) = (the number of times A and B can occur) / (the number of times B can occur)
Conditional probability
• The number of times A and B can occur, or the probability that both A and B will occur, is called the joint probability of A and B. It is represented mathematically as p(A ∩ B).
• The number of ways B can occur is the probability of B: p(B).
• The probability of an event A, given that an event B has occurred, is called the conditional probability of A given B, denoted p(A|B):

p(A|B) = p(A ∩ B) / p(B)

• The probability of an event B, given that an event A has occurred, is called the conditional probability of B given A, denoted p(B|A):

p(B|A) = p(A ∩ B) / p(A)
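These definitions can be checked by enumeration on the die-then-coin sample space used earlier; as a sketch, it also illustrates independence, since knowing the coin result does not change the probability of the die showing 6:

```python
from fractions import Fraction

# Conditional probability from the joint: p(A|B) = p(A ∩ B) / p(B).
space = [(die, coin) for die in range(1, 7) for coin in ("head", "tail")]

def prob(event):
    return Fraction(sum(1 for o in space if event(o)), len(space))

def cond_prob(A, B):
    """p(A|B), computed from the joint and marginal probabilities."""
    return prob(lambda o: A(o) and B(o)) / prob(B)

A = lambda o: o[0] == 6        # event A: the die shows 6
B = lambda o: o[1] == "tail"   # event B: the coin shows tail

p_A_given_B = cond_prob(A, B)          # 1/6
independent = p_A_given_B == prob(A)   # True: the coin tells us nothing
```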
Conditional probability and independence
Two events A and B are independent if and only if

p(A|B) = p(A)
p(B|A) = p(B)

Otherwise, A and B are dependent.
Independent: events A and B are non-mutually exclusive (there is a chance both occur together), but the occurrence of A does not affect B, and vice versa. Example: getting a 6 on the die and a tail on the coin when we roll a die, then flip a coin.
Bayesian rule
From the definition of conditional probability,

p(A|B) = p(A ∩ B) / p(B), hence p(A ∩ B) = p(A|B) × p(B)

and

p(B|A) = p(A ∩ B) / p(A), hence p(A ∩ B) = p(B|A) × p(A)

Substituting the second expression for p(A ∩ B) into the first equation yields the Bayesian rule (developed by the statistician Thomas Bayes):

p(A|B) = [p(B|A) × p(A)] / p(B)

Thomas Bayes (1701-1762)
Bayesian rule
• If the occurrence of event A depends on only two mutually exclusive events (B and NOT B), we obtain the marginal probability:

p(A) = p(A|B) × p(B) + p(A|¬B) × p(¬B)

where ¬ is the logical function NOT. Similarly:

p(B) = p(B|A) × p(A) + p(B|¬A) × p(¬A)

• Substituting this expression for p(B) into the Bayesian rule

p(A|B) = [p(B|A) × p(A)] / p(B)

yields:

p(A|B) = [p(B|A) × p(A)] / [p(B|A) × p(A) + p(B|¬A) × p(¬A)]
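The expanded rule can be sketched as a small function. The diagnostic numbers below are illustrative assumptions, not from the slides; they show the classic effect that a rare condition keeps the posterior low even with an accurate test:

```python
# Bayesian rule with the marginal in the denominator:
# p(B|A-style posterior) = p(A|B) p(B) / (p(A|B) p(B) + p(A|¬B) p(¬B)).
# Illustrative reading: B = "person is infected", A = "test is positive".

def bayes(p_A_given_B, p_B, p_A_given_not_B):
    p_not_B = 1 - p_B
    numerator = p_A_given_B * p_B
    marginal = numerator + p_A_given_not_B * p_not_B   # p(A)
    return numerator / marginal

# Assumed values: 95% sensitivity, 1% prevalence, 5% false-positive rate.
posterior = bayes(p_A_given_B=0.95, p_B=0.01, p_A_given_not_B=0.05)
# posterior ≈ 0.161: most positive tests come from the large healthy group
```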
End of File
Implementation of Bayesian Reasoning (Part 01)
Bayesian rule
• If the occurrence of event A depends on only two mutually exclusive events (B and NOT B), we obtain:

p(B) = p(B|A) × p(A) + p(B|¬A) × p(¬A)

where ¬ is the logical function NOT.
• Substituting this expression into the Bayesian rule yields:

p(A|B) = [p(B|A) × p(A)] / [p(B|A) × p(A) + p(B|¬A) × p(¬A)]
Dilemma at the movies
• This person dropped their ticket
in the hallway.
• Do you call out “Excuse me,
ma’am!” or “Excuse me, sir!”
• You have to make a guess.
Courtesy of Brandon Rohrer, 2019
• What if they’re standing in line
for the men’s restroom?
• Bayesian reasoning (a.k.a
Bayesian inference) is a way to
capture common sense.
• It helps you use what you know
to make better guesses.
Dilemma at the movies
Courtesy of Brandon Rohrer, 2019
Put numbers to our dilemma
• Out of 100 men at the movies, 4 have long hair and 96 have short hair.
• Out of 100 women at the movies, 50 have long hair and 50 have short hair.
Put numbers to our dilemma
• About 12 times more women have long hair than men (50 out of 100 women vs. 4 out of 100 men).
Put numbers to our dilemma
• But there are 98 men and 2 women in line for the men's restroom.
• Out of the 98 men in line, 4 have long hair and 94 have short hair.
• Out of the 2 women in line, 1 has long hair and 1 has short hair.
Put numbers to our dilemma
• In the line, 4 times more men have long hair than women (4 men vs. 1 woman).
Out of 100 people at the movies:
• 50 are men: 2 have long hair, 48 have short hair.
• 50 are women: 25 have long hair, 25 have short hair.
Out of 100 people in line for the men’s restroom:
• 98 are men: 4 have long hair, 94 have short hair.
• 2 are women: 1 has long hair, 1 has short hair.
Translate to math
P(something) = # something / # everything

P(woman) = probability that a person is a woman
         = # women / # people = 50 / 100 = .5
P(man)   = probability that a person is a man
         = # men / # people = 50 / 100 = .5

(Out of 100 people at the movies: 50 are men, 50 are women.)
Translate to math
P(woman) = # women / # people = 2 / 100 = .02
P(man)   = # men / # people = 98 / 100 = .98

(Out of 100 people in line for the men’s restroom: 98 are men, 2 are women.)
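The counting definition of probability on these two slides can be written directly in code; a small sketch using the slides' numbers:

```python
# P(something) = # something / # everything
def p(count, total):
    return count / total

# At the movies: 50 men and 50 women out of 100 people.
print(p(50, 100))   # P(woman) = .5
print(p(50, 100))   # P(man) = .5
# In line for the men's restroom: 98 men and 2 women out of 100 people.
print(p(2, 100))    # P(woman) = .02
print(p(98, 100))   # P(man) = .98
```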
End of File
Sunu Wibirama
sunu@ugm.ac.id
Department of Electrical and Information Engineering
Faculty of Engineering
Universitas Gadjah Mada
INDONESIA
Implementation of Bayesian Reasoning (Part 02)
Kecerdasan Buatan | Artificial Intelligence
Version: January 2022
Conditional probabilities
• P(long hair | woman): if I know that a person is a woman, what is the
probability that person has long hair?
• P(long hair | woman) = # women with long hair / # women = 25 / 50 = .5
(Out of 100 people at the movies, 50 are women: 25 have long hair, 25 have short hair.)
Conditional probabilities
• If I know that a person is a man, what is the probability that person has long hair?
• P(long hair | man) = # men with long hair / # men = 2 / 50 = .04
(Out of 100 people at the movies, 50 are men: 2 have long hair, 48 have short hair.)
Conditional probabilities
• P(A | B) is the probability of A, given B.
• “If I know B is the case, what is the
probability that A is also the case?”
• P(A | B) is not the same as P(B | A).
• P(cute | puppy) is not the same as P(puppy | cute)
• If I know the thing I’m holding is a puppy, what is the probability that it is cute?
• If I know the thing I’m holding is cute, what is the probability that it is a puppy?
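The asymmetry between P(A | B) and P(B | A) is easy to see with the theater counts; a short sketch:

```python
# Conditional probability from counts: P(A | B) = #(A and B) / #B.
# Movie-theater counts from the slides: 50 women (25 with long hair),
# 50 men (2 with long hair).
women, women_long = 50, 25
men, men_long = 50, 2

p_long_given_woman = women_long / women        # P(long hair | woman) = .5
p_long_given_man = men_long / men              # P(long hair | man)   = .04

# Conditioning the other way asks a different question:
long_haired = women_long + men_long            # 27 people with long hair
p_woman_given_long = women_long / long_haired  # P(woman | long hair), about .93
print(p_long_given_woman, p_long_given_man, round(p_woman_given_long, 2))
```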
Joint probabilities
What is the probability that a person is both a woman and has short hair?

P(woman with short hair) = P(woman) * P(short hair | woman) = .5 * .5 = .25

(Out of a probability of 1: P(man) = .5, P(woman) = .5, P(woman with short hair) = .25.)
Joint probabilities
P(woman with long hair) = P(woman) * P(long hair | woman) = .5 * .5 = .25
Joint probabilities
P(man with short hair) = P(man) * P(short hair | man) = .5 * .96 = .48
Joint probabilities
P(man with long hair) = P(man) * P(long hair | man) = .5 * .04 = .02

(Out of a probability of 1: P(woman with short hair) = .25, P(woman with long hair) = .25,
P(man with short hair) = .48, P(man with long hair) = .02.)
Joint probabilities
If P(man) = .98 and P(woman) = .02, then the answers change:

P(man with long hair) = P(man) * P(long hair | man) = .98 * .04 ≈ .04
Joint probabilities
P(woman with long hair) = P(woman) * P(long hair | woman) = .02 * .5 = .01

(Out of a probability of 1: P(woman with short hair) = .01, P(woman with long hair) = .01,
P(man with short hair) = .94, P(man with long hair) = .04.)
Joint probabilities
• P(A and B) is the probability that both A and B
are the case.
• Also written P(A, B) or P(A ∩ B)
• P(A and B) is the same as P(B and A)
• The probability that I am having a jelly donut with
my milk is the same as the probability that I am
having milk with my jelly donut.
• P(donut and milk) = P(milk and donut)
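The four joint probabilities on the preceding slides can be generated with the chain rule P(A and B) = P(B) * P(A | B); a quick sketch with the at-the-movies numbers:

```python
# Joint probabilities at the movies via the chain rule.
p_woman, p_man = 0.5, 0.5
p_long_given_woman, p_long_given_man = 0.5, 0.04

joint = {
    ("woman", "short"): p_woman * (1 - p_long_given_woman),  # .25
    ("woman", "long"):  p_woman * p_long_given_woman,        # .25
    ("man", "short"):   p_man * (1 - p_long_given_man),      # .48
    ("man", "long"):    p_man * p_long_given_man,            # .02
}
# The four joints partition the whole probability space, so they sum to 1.
print(sum(joint.values()))
```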
Marginal probabilities
P(long hair) = P(woman with long hair) + P(man with long hair) = .01 + .04 = .05
Marginal probabilities
P(short hair) = P(woman with short hair) + P(man with short hair) = .01 + .94 = .95

(With P(man) = .98 and P(woman) = .02, the four joint probabilities are
P(woman with short hair) = .01, P(woman with long hair) = .01,
P(man with short hair) = .94, P(man with long hair) = .04.)
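Marginalizing just sums the joint probabilities over the variable being dropped; a sketch with the restroom-line numbers:

```python
# Marginal probabilities from the joint table (restroom-line numbers).
joint = {
    ("woman", "long"): 0.01, ("woman", "short"): 0.01,
    ("man", "long"): 0.04,   ("man", "short"): 0.94,
}
p_long = sum(v for (gender, hair), v in joint.items() if hair == "long")
p_short = sum(v for (gender, hair), v in joint.items() if hair == "short")
print(round(p_long, 2), round(p_short, 2))   # 0.05 0.95
```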
What we really care about
• We know the person has long hair.
Are they a man or a woman?
• P(man | long hair)
• We don’t know this answer yet, but
we already learned about joint
probabilities and marginal
probabilities.
Thomas Bayes noticed something cool
• P(man with long hair) = P(long hair) * P(man | long hair)
• P(long hair and man) = P(man) * P(long hair | man)
• Because P(man and long hair) = P(long hair and man):
• P(long hair) * P(man | long hair) = P(man) * P(long hair | man)
• P(man | long hair) = P(man) * P(long hair | man) / P(long hair)
• In general: P(A | B) = P(B | A) * P(A) / P(B)
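The identity derived above is a one-liner in code; a sketch that checks it with the theater numbers (P(long hair) = .25 + .02 = .27 at the movies):

```python
# P(A | B) = P(B | A) * P(A) / P(B)
def posterior(p_b_given_a, p_a, p_b):
    return p_b_given_a * p_a / p_b

# At the movies: P(long hair | man) = .04, P(man) = .5, P(long hair) = .27.
p_man_given_long = posterior(0.04, 0.5, 0.27)
print(round(p_man_given_long, 2))   # 0.07
```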
Back to the movie theater, this time with Bayes

P(man | long hair) = P(man) * P(long hair | man) / P(long hair)

Given: P(man) = .5, P(woman) = .5, P(long hair | man) = .04, P(long hair | woman) = .5,
P(woman with long hair) = .25, P(man with long hair) = .02.
Back to the movie theater, this time with Bayes

P(man | long hair) = P(man) * P(long hair | man) / P(long hair)
                   = P(man) * P(long hair | man) / [P(woman with long hair) + P(man with long hair)]
Back to the movie theater, this time with Bayes

P(man | long hair) = P(man) * P(long hair | man) / [P(woman with long hair) + P(man with long hair)]
                   = .5 * .04 / (.25 + .02) = .02 / .27 ≈ .07
Now, knowing that they are in line for the men’s restroom changes the
probability P(man | long hair).
Back to the movie theater, this time with Bayes

P(man | long hair) = P(man) * P(long hair | man) / [P(woman with long hair) + P(man with long hair)]

Given: P(man) = .98, P(woman) = .02, P(long hair | man) = .04, P(long hair | woman) = .5,
P(woman with long hair) = .01, P(man with long hair) = .04.
Back to the movie theater, this time with Bayes

P(man | long hair) = P(man) * P(long hair | man) / [P(woman with long hair) + P(man with long hair)]
                   = .98 * .04 / (.01 + .04) = .04 / .05 = .80
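Both theater calculations follow the same formula and differ only in the priors; a compact sketch:

```python
# P(man | long hair) with P(long hair) expanded into the two joint terms.
def p_man_given_long(p_man, p_long_given_man, p_woman, p_long_given_woman):
    numerator = p_man * p_long_given_man
    return numerator / (numerator + p_woman * p_long_given_woman)

# At the movies: uniform priors.
print(round(p_man_given_long(0.5, 0.04, 0.5, 0.5), 2))    # 0.07
# In line for the men's restroom: priors shift to .98 / .02.
print(round(p_man_given_long(0.98, 0.04, 0.02, 0.5), 2))  # 0.8
```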
Dilemma at the movies
This person dropped their ticket in front of the men’s restroom. Seeing the
long hair, you can confidently call out “Excuse me, sir!”: the probability
that this person is a man is 0.80.
End of File
Implementation of Bayesian Reasoning (Part 03)
Kecerdasan Buatan | Artificial Intelligence
Version: January 2022
Bayesian reasoning
The Bayesian rule expressed in terms of hypothesis (H) and evidence (E) looks like this:

p(H | E) = p(E | H) × p(H) / [p(E | H) × p(H) + p(E | ¬H) × p(¬H)]

where:
p(H | E) is the posterior probability that hypothesis H is true given evidence E;
p(H) is the prior probability of hypothesis H being true;
p(E | H) is the likelihood: the probability that hypothesis H being true will result in evidence E;
p(¬H) is the prior probability of hypothesis H being false;
p(E | ¬H) is the probability of finding evidence E even when hypothesis H is false.
The denominator is the marginal probability of observing evidence E.

Thomas Bayes (1701-1762)
Computer Aided Diagnosis for Covid-19 detection
Abraham, Bejoy, and Madhu S. Nair. "Computer-aided detection of COVID-19 from
X-ray images using multi-CNN and Bayesnet classifier." Biocybernetics and
biomedical engineering 40, no. 4 (2020): 1436-1445.
Computer Aided Diagnosis for Covid-19 detection
• Suppose you recently decided to have a CT-scan test for Covid-19.
If the test is positive, what is the probability you are infected?
• Suppose you are told the test has a sensitivity of 80%: if you are
infected by Covid-19, the test will be positive with probability 0.8.
In other words,

p(x = 1 | y = 1) = 0.8

where x = 1 is the event the test is positive, and y = 1 is
the event you are infected by Covid-19.
Note:
x is evidence (E)
y is hypothesis (H)
Source: https://spectrum.ieee.org/hospitals-deploy-ai-tools-detect-covid19-chest-scans
• Many people conclude they are therefore 80% likely to be infected by
Covid-19. But this is false! It ignores the prior probability of having
Covid-19, p(y = 1): the prior probability that someone is infected by
Covid-19, which fortunately was quite low in June 2022.
• We also need to take into account the fact that the test may produce a
false positive or false alarm, p(x = 1 | y = 0): the probability that the
test result is positive although you aren’t infected. Unfortunately, such
false positives are quite likely with current screening technology.
Computer Aided Diagnosis for Covid-19 detection
Source: https://spectrum.ieee.org/hospitals-deploy-ai-tools-detect-covid19-chest-scans
• Combining these three terms using Bayes’ rule, we can compute the correct
answer as follows:

p(y=1 | x=1) = p(x=1 | y=1) × p(y=1) / [p(x=1 | y=1) × p(y=1) + p(x=1 | y=0) × p(y=0)]

where p(y = 0) = 1 − p(y = 1) = 0.996.
In other words, if the test’s result is positive, you only have about a 3%
chance of actually being infected by Covid-19!

Note (x is evidence E, y is hypothesis H):
p(x=1 | y=1): the probability that the test is positive given that the person is infected by Covid-19
p(y=1): the probability that someone is infected by Covid-19
p(y=0): the probability that someone is NOT infected by Covid-19
p(x=1 | y=0): false alarm; the person is not infected, but the test says they are positive for Covid-19
Computer Aided Diagnosis for Covid-19 detection
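The 3% result can be reproduced as follows. The sensitivity (0.8) and the prior (p(y=1) = 1 − 0.996 = 0.004) come from the slides; the false-positive rate p(x=1 | y=0) = 0.1 is not visible in the extracted text and is an assumed value, chosen because it reproduces the slide's "about 3%" answer:

```python
# Posterior probability of Covid-19 infection given a positive test.
sensitivity = 0.8   # p(x=1 | y=1), from the slides
prior = 0.004       # p(y=1); p(y=0) = 0.996, from the slides
fpr = 0.1           # p(x=1 | y=0), ASSUMED (not shown in the extracted text)

posterior = sensitivity * prior / (
    sensitivity * prior + fpr * (1 - prior))
print(round(posterior, 3))   # 0.031, i.e. about 3%
```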
• The Computer Aided Diagnosis (CAD) system can then determine whether you are
infected by Covid-19 using a threshold (e.g. a corona score) on the probability value.
• For example:
• IF p(y = 1 | x = 1) > threshold (e.g. 0.80)
THEN the CAD will notify the medical doctor that you are infected by Covid-19.
Computer Aided Diagnosis for Covid-19 detection
Source: https://spectrum.ieee.org/hospitals-deploy-ai-tools-detect-covid19-chest-scans
End of File
Implementation of Bayesian Reasoning (Part 04)
Kecerdasan Buatan | Artificial Intelligence
Version: January 2022
Bayesian reasoning with multiple hypotheses and evidences
• We can take into account both multiple hypotheses H1, H2, ..., Hm and
multiple evidences E1, E2, ..., En. The hypotheses as well as the
evidences must be mutually exclusive and exhaustive.
• Single evidence E and multiple hypotheses follow:

p(Hi | E) = p(E | Hi) × p(Hi) / Σ(k=1..m) p(E | Hk) × p(Hk)

• Multiple evidences and multiple hypotheses follow:

p(Hi | E1 E2 ... En) = p(E1 E2 ... En | Hi) × p(Hi) / Σ(k=1..m) p(E1 E2 ... En | Hk) × p(Hk)

In both formulas, p(Hi) is the prior probability, p(... | Hi) is the
likelihood, p(Hi | ...) is the posterior probability, and the denominator is
the marginal probability.
• However, this method requires obtaining the conditional probabilities of
all possible combinations of evidences for all hypotheses, which places an
enormous burden on the expert.
• Therefore, in Bayesian reasoning with multiple hypotheses and evidences,
conditional independence among different evidences is assumed.
• Thus, instead of the unworkable equation, we obtain:

p(Hi | E1 E2 ... En) = p(E1 | Hi) × p(E2 | Hi) × ... × p(En | Hi) × p(Hi)
                       / Σ(k=1..m) p(E1 | Hk) × p(E2 | Hk) × ... × p(En | Hk) × p(Hk)

Again, p(Hi) is the prior probability, the product of p(Ej | Hi) terms is the
likelihood, p(Hi | E1 E2 ... En) is the posterior probability, and the
denominator is the marginal probability.
• Let us consider a simple example:
• Suppose a CAD (computer aided diagnosis) system is given three conditionally
independent evidences E1, E2, E3, considers three mutually exclusive and
exhaustive hypotheses H1, H2, H3, and is provided with prior probabilities
for these hypotheses: p(H1), p(H2) and p(H3).

Hypotheses (diseases): H1 = dengue, H2 = flu, H3 = malaria
Evidences (symptoms): E1 = headache, E2 = cough, E3 = fever

• The CAD system also determines the conditional probabilities of observing
each evidence for all possible hypotheses.
The prior and conditional probabilities
• H1 = dengue, H2 = flu, H3 = malaria
• E1 = headache, E2 = cough, E3 = fever

Probability  | i = 1 | i = 2 | i = 3
p(Hi)        | 0.40  | 0.35  | 0.25
p(E1 | Hi)   | 0.3   | 0.8   | 0.5
p(E2 | Hi)   | 0.9   | 0.0   | 0.7
p(E3 | Hi)   | 0.6   | 0.7   | 0.9

p(Hi): the probability of a case of dengue, flu, or malaria
p(E1 | Hi): the probability of headache (E1) in a patient with dengue, flu, or malaria
p(E2 | Hi): the probability of cough (E2) in a patient with dengue, flu, or malaria
p(E3 | Hi): the probability of fever (E3) in a patient with dengue, flu, or malaria
Assume that we first observe evidence E3 (fever) in a patient. The CAD system
computes the posterior probability of each hypothesis as:

p(Hi | E3) = p(E3 | Hi) × p(Hi) / Σ(k=1..3) p(E3 | Hk) × p(Hk),   i = 1, 2, 3

thus:

p(H1 | E3) = 0.6 × 0.40 / (0.6 × 0.40 + 0.7 × 0.35 + 0.9 × 0.25) ≈ 0.34
p(H2 | E3) = 0.7 × 0.35 / (0.6 × 0.40 + 0.7 × 0.35 + 0.9 × 0.25) ≈ 0.34
p(H3 | E3) = 0.9 × 0.25 / (0.6 × 0.40 + 0.7 × 0.35 + 0.9 × 0.25) ≈ 0.32

After evidence E3 is observed, belief in hypothesis H2 decreases and becomes
equal to belief in hypothesis H1 (p(H2): from 0.35 to 0.34). Belief in
hypothesis H3 increases and nearly reaches the beliefs in hypotheses H1 and
H2 (p(H3): from 0.25 to 0.32).
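The single-evidence update can be reproduced numerically from the table; a short sketch:

```python
# Posterior over the three diseases after observing only E3 (fever).
priors = {"dengue": 0.40, "flu": 0.35, "malaria": 0.25}   # p(Hi)
p_fever = {"dengue": 0.6, "flu": 0.7, "malaria": 0.9}     # p(E3 | Hi)

numerators = {h: p_fever[h] * priors[h] for h in priors}
marginal = sum(numerators.values())                       # p(E3) = 0.71
post = {h: numerators[h] / marginal for h in priors}
print({h: round(v, 3) for h, v in post.items()})
```

The exact values are about 0.338, 0.345 and 0.317, which the slides report rounded as 0.34, 0.34 and 0.32.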
Suppose now that we also observe evidence E1 (headache) in the patient.
The posterior probabilities are calculated as follows:

p(Hi | E1 E3) = p(E1 | Hi) × p(E3 | Hi) × p(Hi) / Σ(k=1..3) p(E1 | Hk) × p(E3 | Hk) × p(Hk),   i = 1, 2, 3

hence:

p(H1 | E1 E3) = 0.3 × 0.6 × 0.40 / (0.3 × 0.6 × 0.40 + 0.8 × 0.7 × 0.35 + 0.5 × 0.9 × 0.25) ≈ 0.19
p(H2 | E1 E3) = 0.8 × 0.7 × 0.35 / (0.3 × 0.6 × 0.40 + 0.8 × 0.7 × 0.35 + 0.5 × 0.9 × 0.25) ≈ 0.52
p(H3 | E1 E3) = 0.5 × 0.9 × 0.25 / (0.3 × 0.6 × 0.40 + 0.8 × 0.7 × 0.35 + 0.5 × 0.9 × 0.25) ≈ 0.29

Hypothesis H2 (flu) has now become the most likely one (0.52 compared with
0.19 and 0.29).
After observing evidence E2 (cough), the final posterior probabilities for
all hypotheses are calculated:

p(Hi | E1 E2 E3) = p(E1 | Hi) × p(E2 | Hi) × p(E3 | Hi) × p(Hi)
                   / Σ(k=1..3) p(E1 | Hk) × p(E2 | Hk) × p(E3 | Hk) × p(Hk),   i = 1, 2, 3

hence:

p(H1 | E1 E2 E3) = 0.3 × 0.9 × 0.6 × 0.40 / (0.3 × 0.9 × 0.6 × 0.40 + 0.8 × 0.0 × 0.7 × 0.35 + 0.5 × 0.7 × 0.9 × 0.25) ≈ 0.45
p(H2 | E1 E2 E3) = 0.8 × 0.0 × 0.7 × 0.35 / (0.3 × 0.9 × 0.6 × 0.40 + 0.8 × 0.0 × 0.7 × 0.35 + 0.5 × 0.7 × 0.9 × 0.25) = 0
p(H3 | E1 E2 E3) = 0.5 × 0.7 × 0.9 × 0.25 / (0.3 × 0.9 × 0.6 × 0.40 + 0.8 × 0.0 × 0.7 × 0.35 + 0.5 × 0.7 × 0.9 × 0.25) ≈ 0.55

Although the initial ranking was H1, H2 and H3, only hypotheses H1 and H3
remain under consideration after all evidences (E1, E2 and E3) were observed.
In the end, the doctor should then decide whether the patient has dengue or
malaria (malaria is more likely, as its probability is higher than dengue’s:
H3 > H1).
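The whole three-step diagnosis can be sketched as a small naive-Bayes routine over the table's priors and likelihoods; observing all of E1, E2, E3 reproduces the final posteriors:

```python
# Naive-Bayes posterior over diseases given a set of observed symptoms.
priors = {"dengue": 0.40, "flu": 0.35, "malaria": 0.25}
likelihood = {                          # p(Ej | Hi) from the table
    "dengue":  {"E1": 0.3, "E2": 0.9, "E3": 0.6},
    "flu":     {"E1": 0.8, "E2": 0.0, "E3": 0.7},
    "malaria": {"E1": 0.5, "E2": 0.7, "E3": 0.9},
}

def diagnose(evidences):
    scores = {}
    for h, prior in priors.items():
        score = prior
        for e in evidences:
            score *= likelihood[h][e]   # conditional independence assumed
        scores[h] = score
    marginal = sum(scores.values())
    return {h: s / marginal for h, s in scores.items()}

final = diagnose(["E1", "E2", "E3"])
print({h: round(v, 2) for h, v in final.items()})
# dengue 0.45, flu 0.0, malaria 0.55
```

Calling `diagnose(["E3"])` or `diagnose(["E1", "E3"])` reproduces the intermediate posteriors from the earlier slides as well.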
What about sentiment analysis using social media?
You can do a similar computation, given that:
H1 = neutral
H2 = positive sentiment
H3 = negative sentiment
E1 = occurrence of the word “illegal”
E2 = occurrence of the word “reform”
E3 = occurrence of the word “crime”
Ek = occurrence of the word .......
The engineer should provide the likelihood and prior probability information.
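The same routine carries over to sentiment analysis: hypotheses become sentiment classes and evidences become word occurrences. Every probability below is a made-up placeholder; in practice the engineer estimates likelihoods and priors from labeled data:

```python
# Naive-Bayes sentiment sketch; ALL probabilities are invented placeholders.
priors = {"neutral": 0.5, "positive": 0.2, "negative": 0.3}
word_likelihood = {   # p(word occurs | class), assumed values
    "neutral":  {"illegal": 0.05, "reform": 0.10, "crime": 0.05},
    "positive": {"illegal": 0.01, "reform": 0.30, "crime": 0.01},
    "negative": {"illegal": 0.40, "reform": 0.05, "crime": 0.35},
}

observed = ["illegal", "crime"]            # words found in a post
scores = dict(priors)
for word in observed:
    for c in scores:
        scores[c] *= word_likelihood[c][word]
total = sum(scores.values())
post = {c: round(scores[c] / total, 2) for c in scores}
print(post)   # "negative" dominates for these words
```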
End of File
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 

Recently uploaded (20)

Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developer
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docx
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 

Modul Topik 6 - Kecerdasan Buatan.pdf

Introduction to Uncertainty (Part 01)

Four types of classification technique:
• Neural network model
• Geometric model
• Logical model / rule-based model
• Probabilistic model
Geometric model (e.g., SVM, linear discriminant analysis, KNN)
• Example: houses plotted by price (USD) against distance from the city hospital separate into two classes: exclusive housing and government-supported housing.

Logical model / rule-based model (e.g., decision tree)
• Classifying spam email is easy if you know the "features": 'viagra' and 'lottery' are two important features of spam email.
• Example (class: spam): "SUBSCRIBE CHEAP LOTTERY COUPON FOR ONLY $19. TO BE REMOVED FROM FUTURE MAILINGS, SIMPLY REPLY TO THIS MESSAGE AND PUT 'REMOVE' IN THE SUBJECT." An ordinary message without these features is classified as ham.
What about sentiment analysis? There is no definite rule to judge a "positive" or "negative" tweet.

Sentiment analysis and uncertainty of political view
• When analyzing Twitter or Facebook data, you observe the sentiment (tone) of the tweet or status.
• There is no exact rule for determining political view; the best you can do is minimize the uncertainty of the tweet's or status's tone.
• How to minimize the uncertainty?
  • Gather more evidence (in the case of sentiment analysis: words). But there are so many words? Use a language corpus!
  • Measure the probability of the occurrence.
  • Categorize the tweet or status based on that probability.
• This is a case where the logical (rule-based) machine learning model does not work well. We will use the probabilistic model instead.
Introduction to Uncertainty (Part 02)

Uncertainty and probabilistic model
• Information can be incomplete, inconsistent, uncertain, or all three. In other words, information is often unsuitable for solving a problem or making an inference.
• Uncertainty is defined as the lack of the exact knowledge that would enable us to reach a perfectly reliable conclusion.
• Classical logic (the logical model) permits only exact reasoning. It assumes that perfect knowledge always exists and that the law of the excluded middle (every proposition is either true or not true) can always be applied:
  IF A is true THEN A is not false
  IF A is false THEN A is not true
Sources of uncertain knowledge (1)
• Weak implications: domain experts and engineers have the painful task of establishing concrete correlations between the IF (condition) and THEN (action) parts of the rules.
• Therefore, expert systems need the ability to handle vague (lacking clarity) associations.
• For example, by accepting the degree of correlation as a numerical certainty factor (i.e., a strong correlation between two variables represents more certainty).

(Photo caption: Marvin Minsky with the Blocks Vision Robot at MIT. In 1946, he entered Harvard University after returning from service in the U.S. Navy during World War II. After graduating from Harvard in 1950, he attended Princeton University, earning his Ph.D. in mathematics in 1954. In 1958, Minsky joined the faculty of MIT's Department of Electrical Engineering and Computer Science. A year later, he co-founded the Artificial Intelligence Laboratory.)

Sources of uncertain knowledge (2)
• Imprecise language. Our natural language is ambiguous and imprecise. We describe facts with terms such as often and sometimes, frequently and hardly ever (= almost never).
• As a result, it can be difficult to express knowledge in the precise IF-THEN form of production rules.
• However, if the meaning of the facts is quantified, it can be used in expert systems.
• In 1944, Ray Simpson asked 355 high school and college students to place 20 terms such as "often" on a scale between 1 and 100. In 1968, Milton Hakel repeated this experiment.

Reference: Ray H. Simpson (1944), "The specific meanings of certain terms indicating differing degrees of frequency", Quarterly Journal of Speech, 30:3, 328-330.
Sources of uncertain knowledge (2): imprecise language

Mean value assigned to each frequency term (scale of 1-100):

Term                      Simpson (1944)   Hakel (1968)
Always                          99             100
Very often                      88              87
Usually                         85              79
Often                           78              74
Generally                       78              74
Frequently                      73              72
Rather often                    65              72
About as often as not           50              50
Now and then                    20              34
Sometimes                       20              29
Occasionally                    20              28
Once in a while                 15              22
Not often                       13              16
Usually not                     10              16
Seldom                          10               9
Hardly ever                      7               8
Very seldom                      6               7
Rarely                           5               5
Almost never                     3               2
Never                            0               0

Sources of uncertain knowledge (3)
• Unknown data. When the data is incomplete or missing, the only solution is to accept the value "unknown" and proceed to approximate reasoning with this value.
• Combining the views of different experts. Large expert systems usually combine the knowledge and expertise of a number of experts. Unfortunately, experts often hold contradictory opinions and produce conflicting rules. To resolve the conflict, the engineer has to attach a weight to each expert and then calculate the composite conclusion. But no systematic method exists to obtain these weights.
Probability for Machine Learning (Part 01)

Basic statistical measures
• The mean of a vector, usually denoted x̄, is the mean of its elements: the sum of the components divided by the number of components.
• The variance describes how the data is spread around the mean. A dataset with a large variance has data points spread far away from the mean; a dataset with a small variance has data points grouped closely around the mean.
• The standard deviation is simply the square root of the variance.
• The covariance between two variables tells whether large values in one variable are associated with large values in the other, and vice versa.
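These four measures can be computed directly with Python's standard library; the numbers below are illustrative, not from the module:

```python
import statistics

data  = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
other = [1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 5.0, 6.0]

mean = statistics.mean(data)       # sum of the components / number of components
var  = statistics.pvariance(data)  # average squared deviation from the mean
std  = statistics.pstdev(data)     # square root of the variance

# Covariance: do large values of one variable go with large values of the other?
m2 = statistics.mean(other)
cov = sum((x - mean) * (y - m2) for x, y in zip(data, other)) / len(data)

print(mean, var, std, cov)   # 5.0 4.0 2.0 3.0
```

A positive covariance (here 3.0) means the two samples tend to rise and fall together.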
Basic probability theory
• The concept of probability has a long history that goes back thousands of years, when words like "probably", "likely", "maybe", "perhaps" and "possibly" were introduced into spoken languages. However, the mathematical theory of probability was formulated only in the 17th century.
• The probability of an event is the proportion of cases in which the event occurs. Probability can also be defined as a scientific measure of chance.
• Probability can be expressed mathematically as a numerical index ranging from zero (an absolute impossibility) to unity (an absolute certainty).
• Most events have a probability index strictly between 0 and 1, which means that each event has at least two possible outcomes: a favourable outcome (success) and an unfavourable outcome (failure):

  P(success) = the number of successes / the number of possible outcomes
  P(failure) = the number of failures / the number of possible outcomes
• If s is the number of times success can occur, and f is the number of times failure can occur, then

  P(success) = p = s / (s + f)
  P(failure) = q = f / (s + f)
  p + q = 1

• If we throw a coin, the probability of getting a head equals the probability of getting a tail. In a single throw, s = f = 1, s + f = 2, and therefore the probability of getting a head (or a tail) is 0.5.

Geometric representation of events
(Figure: Venn diagrams of mutually exclusive events A and B, drawn as disjoint regions, and non-mutually exclusive events, whose overlapping region is A ∩ B.)
• Non-mutually exclusive: there is a chance that events A and B occur together. Example: the probability of getting a tail and the number 6 when we roll a die, then flip a coin.
• Given two events, A and B, we define the probability of A or B as follows:

  P(A ∪ B) = P(A) + P(B) − P(A ∩ B)   if A and B are non-mutually exclusive
  P(A ∪ B) = P(A) + P(B)              if A and B are mutually exclusive
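The union rule P(A ∪ B) = P(A) + P(B) − P(A ∩ B) can be checked by brute-force enumeration of the die-then-coin example; this sketch is illustrative and not part of the original slides:

```python
from fractions import Fraction

# Sample space for rolling a die then flipping a coin: 12 equally likely outcomes.
space = [(die, coin) for die in range(1, 7) for coin in ("head", "tail")]

A = {o for o in space if o[1] == "tail"}   # event A: the coin shows tail
B = {o for o in space if o[0] == 6}        # event B: the die shows 6

p = lambda event: Fraction(len(event), len(space))

# Non-mutually exclusive events: subtract the overlap once.
p_union = p(A) + p(B) - p(A & B)
assert p_union == p(A | B)                 # matches direct counting of A or B
print(p(A), p(B), p(A & B), p_union)       # 1/2 1/6 1/12 7/12
```

The overlap {(6, 'tail')} would otherwise be counted twice, which is exactly why the − P(A ∩ B) term is needed.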
Probability for Machine Learning (Part 02)

Random variables
• A random experiment, or simply experiment, describes a process that gives uncertain results, for instance a coin flip. The outcome of a random experiment is the result you obtain.
• A random variable takes a value corresponding to the outcome of a random experiment. Use a capital letter to denote a random variable and the corresponding small letter for one of its values.
• If you flip a coin, the two possible outcomes are 'heads' and 'tails'. An example of a random variable X would map 'heads' to 0 and 'tails' to 1.
• The event A corresponds to the set of outcomes {'heads'}. The probability that the outcome is 'heads' can therefore be denoted as

  P(X = 0) = P(outcome = 'heads') = P(A)
Sample space
• Discrete sample space: a sample space containing a finite number of possibilities, or an unending sequence with as many elements as there are whole numbers. The variable is called a discrete random variable; its values can be counted. Example: the number of people in the room with red shoes on.
• Continuous sample space: a sample space containing an infinite number of possibilities, equal to the number of points on a line segment. The variable is a continuous random variable; its values cannot be counted but can be measured. Example: heights of children.

(Reference: Richard Holzer, Patrick Wüchner, Hermann de Meer, "Modeling of Self-Organizing Systems: An Overview", Universitätsbibliothek TU Berlin, 2010.)

Discrete probability distributions
• A discrete random variable has a certain probability of equaling each of its possible values.
• Example: tossing a coin 3 times, with X = number of heads (H).
  Sample space: {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
  For x = 2: P(X = 2) = 3/8
  Using the formula f(x) = P(X = x): f(3) = P(X = 3) = 1/8
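The coin-tossing PMF above can be reproduced by enumerating the sample space; a minimal sketch, not part of the original slides:

```python
from itertools import product
from fractions import Fraction

# All 8 equally likely outcomes of tossing a fair coin 3 times.
outcomes = list(product("HT", repeat=3))

# PMF of X = number of heads: count the outcomes with each head count.
pmf = {x: Fraction(sum(1 for o in outcomes if o.count("H") == x), len(outcomes))
       for x in range(4)}
print(pmf[2], pmf[3])   # 3/8 1/8, matching the slide
```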
Probability distribution for discrete variables
• The probability distribution of a random variable is a function that takes the sample space as input and returns probabilities: f(x) is the probability that the random variable takes the value x.
• The set of ordered pairs (x, f(x)) is a probability function, probability mass function, or probability distribution of the discrete random variable X if:

  f(x) ≥ 0
  Σₓ f(x) = 1
  P(X = x) = f(x)
Probability for Machine Learning (Part 03)

Probability mass functions (1)
• Another example: say you are running a dice-rolling experiment. X is the random variable corresponding to this experiment. Assuming that the die is fair, each outcome is equiprobable.
• That is, if you run the experiment a large number of times, you will get each outcome approximately the same number of times.

(Figures: the probability mass function of X for a six-sided die, estimated from 20 rolls and from 100,000 rolls; the larger sample is much closer to the uniform value of 1/6 per face.)
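The two figures can be reproduced by simulation; this sketch (with an arbitrary seed for reproducibility) shows how the estimated PMF converges to 1/6 per face as the number of rolls grows:

```python
import random
from collections import Counter

random.seed(0)

def estimate_pmf(n_rolls):
    """Estimate the PMF of a fair six-sided die from n_rolls simulated rolls."""
    counts = Counter(random.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}

small, large = estimate_pmf(20), estimate_pmf(100_000)
# With many rolls, every relative frequency approaches the true value 1/6.
print(max(abs(p - 1 / 6) for p in large.values()))
```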
Probability mass functions (2)
• A stockroom clerk returns three safety helmets at random to three steel mill employees who had previously checked them.
• If Smith (S), Jones (J), and Brown (B), in that order, receive one of the three hats:
  • list the sample points for the possible orders of returning the helmets;
  • find the value of the random variable M that represents the number of correct matches.
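The sample points and the PMF of the number of correct matches can be found by enumerating all return orders; a small sketch, not part of the original slides:

```python
from itertools import permutations
from fractions import Fraction

owners = ("S", "J", "B")   # Smith, Jones, Brown, in checking order

# Each of the 3! = 6 orders of handing the helmets back is equally likely.
orders = list(permutations(owners))
matches = lambda order: sum(got == owner for got, owner in zip(order, owners))

# PMF of M = number of correct matches.
pmf = {m: Fraction(sum(1 for o in orders if matches(o) == m), len(orders))
       for m in range(4)}
print(pmf)   # M = 2 is impossible: fixing two helmets fixes the third
```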
Cumulative distributions for discrete random variables
• For the random variable M, the number of correct matches in the previous example, the cumulative distribution function is F(x) = P(M ≤ x): sum the probability mass function over all values up to x. Below the smallest value of M, F(x) = 0.
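Accumulating the helmet-example PMF gives the discrete CDF as a step function; an illustrative sketch, not from the original slides:

```python
from fractions import Fraction

# PMF of M, the number of correct helmet matches (M = 2 has probability 0).
pmf = {0: Fraction(1, 3), 1: Fraction(1, 2), 3: Fraction(1, 6)}

def cdf(x):
    """F(x) = P(M <= x): sum the probability mass of every value up to x."""
    return sum(p for m, p in pmf.items() if m <= x)

# F steps up at 0, 1 and 3 and is flat at 2, where no mass lies.
print([str(cdf(x)) for x in (0, 1, 2, 3)])   # ['1/3', '5/6', '5/6', '1']
```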
Probability for Machine Learning (Part 04)

Another case...
• Statement 1: what is the probability that tomorrow morning's temperature equals exactly 10° Celsius?
• Statement 2: what is the probability that tomorrow morning's temperature is less or more than 10° Celsius?
• What is the answer to each statement? Which statement is more appropriate to use?
Continuous probability distributions
• Probability density function (PDF): a continuous random variable has probability P = 0 of being exactly any single value. Example: being exactly 175 cm tall, P(height = 175.000) = 0.
• However, we can find the probability that the value lies within some range, e.g. P(height ≥ 175) or P(160 ≤ height ≤ 175).
• Remember: the probability of a continuous random variable must be described over a range. (Figure: the probability of drawing a number between 0 and 0.2 is the highlighted area under the curve.)

Continuous distributions

  P(a < X < b) = ∫ₐᵇ f(x) dx
  ∫₋∞^∞ f(x) dx = 1   (the total area under f(x) must equal unity)
  P(a < X < b) must be non-negative
  P(a ≤ X ≤ b) = P(a < X < b)
Example
• Given a random variable X with density function
  f(x) = 2x for 0 < x < 1, and f(x) = 0 for all other x,
  verify that the area under the curve equals 1.0:

  ∫₋∞^∞ f(x) dx = ∫₀¹ 2x dx = x² |₀¹ = 1² − 0² = 1.0

Example (2)
• Given the same density function, what is the probability that −¼ < x < ½?
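The unit-area check can also be done numerically; a sketch using a simple midpoint rule (an assumption of this example, not a method from the slides):

```python
def f(x):
    # Density from the example: f(x) = 2x on (0, 1), zero elsewhere.
    return 2.0 * x if 0.0 < x < 1.0 else 0.0

def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

area = integrate(f, -1.0, 2.0)
print(round(area, 4))   # 1.0: the density integrates to unity
```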
• Solution:

  P(−¼ < x < ½) = ∫₋¼⁰ 0 dx + ∫₀^½ 2x dx = x² |₀^½ = (½)² − 0² = ¼

Cumulative distributions for continuous random variables
• The cumulative distribution F(a) of a continuous random variable X with density function f(x) is

  F(a) = P(X ≤ a) = ∫₋∞ᵃ f(x) dx

• Hence

  P(a < x < b) = F(b) − F(a) = ∫₋∞ᵇ f(x) dx − ∫₋∞ᵃ f(x) dx = ∫ₐᵇ f(x) dx
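For f(x) = 2x on (0, 1) the CDF is F(x) = x², so the same probability drops out of F(b) − F(a); an illustrative sketch:

```python
def F(x):
    """CDF of the density f(x) = 2x on (0, 1): F(x) = x**2 inside the interval."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return x * x

# P(a < X < b) = F(b) - F(a); the region below 0 carries no probability.
p = F(0.5) - F(-0.25)
print(p)   # 0.25
```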
• Differentiation of F(x) lets you determine f(x) from F(x):

  f(x) = dF(x)/dx

Example
• Suppose the error in the reaction temperature for an experiment is a continuous random variable X with cumulative distribution function
  F(x) = x³/9 for −1 < x < 2, F(x) = 0 for x ≤ −1, and F(x) = 1 for x ≥ 2
  (strictly, F(x) = (x³ + 1)/9 on (−1, 2), so that F(−1) = 0; the derivative is unchanged).
• Thus, the expression for the density function f(x) for −1 < x < 2 is

  f(x) = dF(x)/dx = d(x³/9)/dx = x²/3
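The relation f(x) = dF(x)/dx can be sanity-checked numerically with a central difference; this sketch uses F(x) = (x³ + 1)/9, the constant shifted so that F(−1) = 0, which leaves the derivative unaffected:

```python
def F(x):
    # CDF from the example, written so that F(-1) = 0 and F(2) = 1.
    return (x ** 3 + 1) / 9

def f(x):
    # Density obtained by differentiating F: f(x) = x**2 / 3 on (-1, 2).
    return x ** 2 / 3

# A central difference approximates dF/dx; it should match f inside (-1, 2).
h = 1e-6
errors = [abs((F(x + h) - F(x - h)) / (2 * h) - f(x)) for x in (-0.5, 0.3, 1.0, 1.8)]
print(max(errors))   # ~0: the numerical derivative agrees with the density
```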
Probability Distribution

Discrete uniform distribution
• If the random variable X assumes the values x₁, x₂, ..., xₖ with equal probabilities, the discrete uniform distribution is given by

  f(x; k) = 1/k,  for x = x₁, x₂, ..., xₖ

• When a light bulb is selected at random from a box that contains a 40-watt bulb, a 60-watt bulb, a 75-watt bulb, and a 100-watt bulb, each element of the sample space S = {40, 60, 75, 100} occurs with probability 1/4. Therefore, we have a uniform distribution with f(x; 4) = 1/4 for x = 40, 60, 75, 100.
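The light-bulb example in code; a minimal sketch of f(x; k) = 1/k:

```python
from fractions import Fraction

def discrete_uniform_pmf(values):
    """f(x; k) = 1/k: each of the k listed values is equally likely."""
    return {x: Fraction(1, len(values)) for x in values}

# One bulb drawn at random from four wattages, S = {40, 60, 75, 100}.
pmf = discrete_uniform_pmf([40, 60, 75, 100])
print(pmf[60])   # 1/4
```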
Poisson distribution
Properties of a Poisson process:
1. The number of outcomes occurring in one time interval or specified region is independent of the number that occurs in any other disjoint time interval or region of space. Thus, the Poisson process has no memory.
2. The probability that a single outcome will occur during a very short time interval or in a small region is proportional to the length of the time interval or the size of the region, and does not depend on the number of outcomes occurring outside this time interval or region.
3. The probability that more than one outcome will occur in such a short time interval or fall in such a small region is negligible.

Example:
• A customer care center receives 100 calls per hour, 8 hours a day.
• The calls are independent of each other, so the number of calls per minute has a Poisson probability distribution.
• There can be any number of calls per minute, irrespective of the number of calls received in the previous minute.
(Source: https://www.cuemath.com/data/poisson-distribution/)
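For the call-center example the per-minute rate is λ = 100/60, and the Poisson PMF P(X = k) = λᵏ e^(−λ) / k! gives the chance of each call count per minute; an illustrative sketch:

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean rate lam per interval."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Call-center example: 100 calls per hour is a rate of 100/60 calls per minute.
lam = 100 / 60
probs = {k: poisson_pmf(k, lam) for k in range(5)}
print(probs[0], probs[1])   # chance of 0 calls, 1 call in a given minute
```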
Binomial distribution
• An experiment often consists of repeated trials, each with two possible outcomes that may be labeled success or failure.
• The most obvious application deals with the testing of items as they come off an assembly line, where each test or trial may indicate a defective or a non-defective item (or a coin, with head or tail).
• We may choose to define either outcome as a success. The process is referred to as a Bernoulli process; each trial is called a Bernoulli trial.

A binomial experiment satisfies:
1. The experiment consists of n identical trials.
2. There are only 2 possible outcomes on each trial. We will denote one outcome by S (for Success) and the other by F (for Failure).
3. The probability of S remains the same from trial to trial. This probability will be denoted by p, and the probability of F will be denoted by q (q = 1 − p).
4. The trials are independent.
5. The binomial random variable is the number of S's in n trials.
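The binomial PMF, P(X = x) = C(n, x) pˣ q^(n−x), follows directly from the conditions above; the assembly-line numbers below (10 items, P(defective) = 0.1) are illustrative assumptions, not from the module:

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x): probability of x successes in n independent Bernoulli trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# Assembly-line sketch: probability of exactly 2 defectives among 10 items.
prob_two_defective = binomial_pmf(2, 10, 0.1)
print(round(prob_two_defective, 4))   # 0.1937
```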
  • 30. Normal distribution
• The most important continuous probability distribution.
• Its graph, called the "normal curve", is bell-shaped, and the total area under the curve equals 1.
• Derived by De Moivre and Gauss; hence it is also called the "Gaussian" distribution.
• Describes many phenomena in nature, industry, and research.
Other probability distributions: https://www.kdnuggets.com/2020/02/probability-distributions-data-science.html
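The claim that the total area under the normal curve equals 1 can be verified with a crude numerical integration. This is a sketch with an assumed helper name `normal_pdf`; a simple Riemann sum over [−6, 6] captures essentially all of the mass of the standard normal.

```python
from math import exp, pi, sqrt

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of the normal (Gaussian) distribution at x."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

# Riemann-sum check that the area under the standard bell curve is close to 1.
step = 0.001
area = sum(normal_pdf(-6 + i * step) * step for i in range(int(12 / step)))
```

The peak of the standard normal sits at x = 0 with height 1/sqrt(2π) ≈ 0.399.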
  • 31. Why probability distributions?
• Some machine learning models work best under certain distribution assumptions.
• For example, these algorithms and functions assume a normal distribution: Linear Discriminant Analysis (LDA), Gaussian Naive Bayes, Logistic Regression, Linear Regression, the Sigmoid function.
• When you work with a dataset, you are dealing with a sample instead of the population. Probability distributions help you make predictions about the whole population.
• Understanding probability distributions helps you choose appropriate data transformation methods and feature extraction techniques.
Source: https://www.kdnuggets.com/2020/02/probability-distributions-data-science.html
  • 32. Conditional Probability and Bayesian Rule
Kecerdasan Buatan | Artificial Intelligence (Version: January 2022)
Conditional probability
• Let A be an event in the world and B be another event. Suppose that events A and B are not mutually exclusive, but occur conditionally on the occurrence of the other.
• The probability that event A will occur if event B occurs is called the conditional probability.
• Conditional probability is denoted mathematically as p(A|B), in which the vertical bar represents "given"; the complete expression is read as: the conditional probability of event A occurring given that event B has occurred.
(Figure: Venn diagram of overlapping events A and B.)
  • 33. Conditional probability
• The number of times A and B can occur together, or the probability that both A and B will occur, is called the joint probability of A and B. It is represented mathematically as p(A ∩ B).
• The number of ways B can occur is the probability of B: p(B).
• The probability of an event A, given that an event B has occurred, is called the conditional probability of A given B:
p(A|B) = p(A ∩ B) / p(B)
• The probability of an event B, given that an event A has occurred, is called the conditional probability of B given A:
p(B|A) = p(A ∩ B) / p(A)
Conditional probability and independence
Two events A and B are independent if and only if
p(A|B) = p(A) and p(B|A) = p(B)
Otherwise, A and B are dependent.
Independent: events A and B are non-mutually exclusive (there is a chance both occur together), but the occurrence of A does not affect B, and vice versa. Example: rolling a 6 on a die and getting a tail on a coin, when we roll a die and then flip a coin.
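The die-and-coin example above can be verified by enumerating the joint sample space. This is a sketch (the helper name `prob` is my own); it checks that p(A ∩ B) = p(A) × p(B) for the two independent events.

```python
from itertools import product
from fractions import Fraction

# Sample space of (die, coin) outcomes, all 12 equally likely.
outcomes = list(product(range(1, 7), ["head", "tail"]))
total = len(outcomes)

def prob(event) -> Fraction:
    """Exact probability of an event, given as a predicate over outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), total)

p_a = prob(lambda o: o[0] == 6)                       # A: die shows 6
p_b = prob(lambda o: o[1] == "tail")                  # B: coin shows tail
p_ab = prob(lambda o: o[0] == 6 and o[1] == "tail")   # A and B together
# Independence: p(A ∩ B) == p(A) * p(B), equivalently p(A | B) == p(A).
```

Using `Fraction` keeps the comparison exact, avoiding floating-point round-off.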
  • 34. Bayesian rule
From the two forms of the conditional-probability definition:
p(A ∩ B) = p(A|B) × p(B)
p(A ∩ B) = p(B|A) × p(A)
Equating the two yields the Bayesian rule (developed by the statistician Thomas Bayes, 1701-1762):
p(A|B) = p(B|A) × p(A) / p(B)
• If the occurrence of event A depends on only two mutually exclusive events (B and NOT B), we obtain the marginal probability:
p(A) = p(A|B) × p(B) + p(A|¬B) × p(¬B)
where ¬ is the logical NOT. Similarly,
p(B) = p(B|A) × p(A) + p(B|¬A) × p(¬A)
• Substituting this expansion into the Bayesian rule yields:
p(B|A) = p(A|B) × p(B) / [p(A|B) × p(B) + p(A|¬B) × p(¬B)]
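The final formula above, with the marginal expanded in the denominator, translates directly into code. A minimal sketch (the function name `bayes` is my own):

```python
def bayes(p_a_given_b: float, p_b: float, p_a_given_not_b: float) -> float:
    """p(B | A) via Bayes' rule, expanding p(A) with the total-probability rule."""
    p_not_b = 1.0 - p_b
    p_a = p_a_given_b * p_b + p_a_given_not_b * p_not_b  # marginal p(A)
    return p_a_given_b * p_b / p_a

# Sanity check: if A carries no information about B (same likelihood either way),
# the posterior equals the prior.
posterior = bayes(p_a_given_b=0.5, p_b=0.3, p_a_given_not_b=0.5)
```

A second sanity check: if A can only occur when B is true, observing A makes B certain.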
  • 36. Implementation Bayesian Reasoning (Part 01)
Kecerdasan Buatan | Artificial Intelligence (Version: January 2022)
Bayesian rule (recap)
• If the occurrence of event A depends on only two mutually exclusive events (B and NOT B), we obtain:
p(A) = p(A|B) × p(B) + p(A|¬B) × p(¬B)
where ¬ is the logical NOT. Similarly,
p(B) = p(B|A) × p(A) + p(B|¬A) × p(¬A)
• Substituting this equation into the Bayesian rule yields:
p(B|A) = p(A|B) × p(B) / [p(A|B) × p(B) + p(A|¬B) × p(¬B)]
  • 37. Dilemma at the movies
• This person dropped their ticket in the hallway.
• Do you call out "Excuse me, ma'am!" or "Excuse me, sir!"?
• You have to make a guess.
• What if they're standing in line for the men's restroom?
• Bayesian reasoning (a.k.a. Bayesian inference) is a way to capture common sense.
• It helps you use what you know to make better guesses.
Courtesy of Brandon Rohrer, 2019
  • 38. Put numbers to our dilemma
Out of 100 men at the movies: 4 have long hair, 96 have short hair.
Out of 100 women at the movies: 50 have long hair, 50 have short hair.
• About 12 times more women than men have long hair.
Courtesy of Brandon Rohrer, 2019
  • 39. Put numbers to our dilemma
• But there are 98 men and 2 women in line for the men's restroom.
Out of 98 men in line: 4 have long hair, 94 have short hair.
Out of 2 women in line: 1 has long hair, 1 has short hair.
• In the line, 4 times more men than women have long hair.
Courtesy of Brandon Rohrer, 2019
  • 40. Out of 100 people at the movies: 50 are men (2 men have long hair, 48 have short hair) and 50 are women (25 women have long hair, 25 have short hair).
Out of 100 people in line for the men's restroom: 98 are men (4 have long hair, 94 have short hair) and 2 are women (one has long hair, one has short hair).
Courtesy of Brandon Rohrer, 2019
  • 41. Translate to math
P(something) = # something / # everything
At the movies (out of 100 people):
P(woman) = probability that a person is a woman = # women / # people = 50 / 100 = .5
P(man) = probability that a person is a man = # men / # people = 50 / 100 = .5
In line for the men's restroom (out of 100 people):
P(woman) = # women / # people = 2 / 100 = .02
P(man) = # men / # people = 98 / 100 = .98
Courtesy of Brandon Rohrer, 2019
  • 43. Implementation Bayesian Reasoning (Part 02)
Kecerdasan Buatan | Artificial Intelligence (Version: January 2022)
Conditional probabilities
• If I know that a person is a woman, what is the probability that person has long hair?
• P(long hair | woman) = # women with long hair / # women = 25 / 50 = .5
Courtesy of Brandon Rohrer, 2019
  • 44. Conditional probabilities
• If I know that a person is a man, what is the probability that person has long hair?
• P(long hair | man) = # men with long hair / # men = 2 / 50 = .04
• P(A | B) is the probability of A, given B: "If I know B is the case, what is the probability that A is also the case?"
• P(A | B) is not the same as P(B | A). For example, P(cute | puppy) is not the same as P(puppy | cute):
• If I know the thing I'm holding is a puppy, what is the probability that it is cute?
• If I know the thing I'm holding is cute, what is the probability that it is a puppy?
Courtesy of Brandon Rohrer, 2019
  • 45. Joint probabilities
What is the probability that a person is both a woman and has short hair?
P(woman with short hair) = P(woman) * P(short hair | woman) = .5 * .5 = .25
P(woman with long hair) = P(woman) * P(long hair | woman) = .5 * .5 = .25
Courtesy of Brandon Rohrer, 2019
  • 46. Joint probabilities
P(man with short hair) = P(man) * P(short hair | man) = .5 * .96 = .48
P(man with long hair) = P(man) * P(long hair | man) = .5 * .04 = .02
Courtesy of Brandon Rohrer, 2019
  • 47. Joint probabilities
If P(man) = .98 and P(woman) = .02 (in line for the men's restroom), then the answers change:
P(man with long hair) = P(man) * P(long hair | man) = .98 * .04 ≈ .04
P(man with short hair) = .98 * .96 ≈ .94
P(woman with long hair) = P(woman) * P(long hair | woman) = .02 * .5 = .01
P(woman with short hair) = .02 * .5 = .01
Courtesy of Brandon Rohrer, 2019
  • 48. Joint probabilities
• P(A and B) is the probability that both A and B are the case.
• Also written P(A, B) or P(A ∩ B).
• P(A and B) is the same as P(B and A): the probability that I am having a jelly donut with my milk is the same as the probability that I am having milk with my jelly donut, so P(donut and milk) = P(milk and donut).
Marginal probabilities
In line for the men's restroom:
P(long hair) = P(woman with long hair) + P(man with long hair) = .01 + .04 = .05
Courtesy of Brandon Rohrer, 2019
  • 49. Marginal probabilities
P(short hair) = P(woman with short hair) + P(man with short hair) = .01 + .94 = .95
What we really care about
• We know the person has long hair. Are they a man or a woman? That is, P(man | long hair).
• We don't know this answer yet, but we have already learned about joint probabilities and marginal probabilities.
Courtesy of Brandon Rohrer, 2019
  • 50. Thomas Bayes noticed something cool
• P(man and long hair) = P(long hair) * P(man | long hair)
• P(long hair and man) = P(man) * P(long hair | man)
• Because P(man and long hair) = P(long hair and man),
• P(long hair) * P(man | long hair) = P(man) * P(long hair | man)
Courtesy of Brandon Rohrer, 2019
  • 51. Thomas Bayes noticed something cool
• P(long hair) * P(man | long hair) = P(man) * P(long hair | man)
• Dividing both sides by P(long hair):
• P(man | long hair) = P(man) * P(long hair | man) / P(long hair)
• In general: P(A | B) = P(B | A) * P(A) / P(B)
Courtesy of Brandon Rohrer, 2019
  • 52. Back to the movie theater, this time with Bayes
With P(man) = .5, P(woman) = .5, P(long hair | man) = .04, P(long hair | woman) = .5, P(woman with long hair) = .25, P(man with long hair) = .02:
P(man | long hair) = P(man) * P(long hair | man) / P(long hair)
= P(man) * P(long hair | man) / [P(woman with long hair) + P(man with long hair)]
Courtesy of Brandon Rohrer, 2019
  • 53. Back to the movie theater, this time with Bayes
P(man | long hair) = (.5 * .04) / (.25 + .02) = .02 / .27 ≈ .07
• Now, knowing that they are in line for the men's restroom changes the probability P(man | long hair).
Courtesy of Brandon Rohrer, 2019
  • 54. Back to the movie theater, this time with Bayes
With P(man) = .98, P(woman) = .02, P(long hair | man) = .04, P(long hair | woman) = .5, P(woman with long hair) = .01, P(man with long hair) = .04:
P(man | long hair) = P(man) * P(long hair | man) / [P(woman with long hair) + P(man with long hair)]
= (.98 * .04) / (.01 + .04) ≈ .04 / .05 ≈ .80
Courtesy of Brandon Rohrer, 2019
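Both movie-theater computations can be reproduced with one small function. This is a sketch using the slides' numbers; the function name `p_man_given_long_hair` is my own.

```python
def p_man_given_long_hair(p_man: float, p_lh_given_man: float,
                          p_lh_given_woman: float) -> float:
    """Posterior that a long-haired person is a man (two mutually exclusive classes)."""
    p_woman = 1.0 - p_man
    # Marginal: P(long hair) = P(man with long hair) + P(woman with long hair)
    p_long_hair = p_lh_given_man * p_man + p_lh_given_woman * p_woman
    return p_lh_given_man * p_man / p_long_hair

at_the_movies = p_man_given_long_hair(0.5, 0.04, 0.5)   # ≈ 0.07
in_mens_line = p_man_given_long_hair(0.98, 0.04, 0.5)   # ≈ 0.80 (the slides round intermediate values)
```

Only the prior P(man) changes between the two scenarios; the likelihoods stay the same, yet the posterior flips from "almost certainly a woman" to "almost certainly a man".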
  • 55. Dilemma at the movies
This person dropped their ticket in front of the men's restroom, so you can confidently call out: "Excuse me, sir!" Given their long hair, the probability that this person is a man is about 0.80.
Courtesy of Brandon Rohrer, 2019
  • 56. Implementation Bayesian Reasoning (Part 03)
Kecerdasan Buatan | Artificial Intelligence (Version: January 2022)
Bayesian reasoning
The Bayesian rule expressed in terms of hypothesis (H) and evidence (E) looks like this:
p(H|E) = p(E|H) × p(H) / [p(E|H) × p(H) + p(E|¬H) × p(¬H)]
where:
p(H|E) is the posterior probability that hypothesis H is true given evidence E;
p(H) is the prior probability of hypothesis H being true;
p(E|H) is the likelihood: the probability that hypothesis H being true will result in evidence E;
p(¬H) is the prior probability of hypothesis H being false;
p(E|¬H) is the probability of finding evidence E even when hypothesis H is false;
the denominator is the marginal probability of the evidence.
Thomas Bayes (1701-1762)
  • 57. Computer Aided Diagnosis for Covid-19 detection
Abraham, Bejoy, and Madhu S. Nair. "Computer-aided detection of COVID-19 from X-ray images using multi-CNN and Bayesnet classifier." Biocybernetics and Biomedical Engineering 40, no. 4 (2020): 1436-1445.
• Recently you decide to have a CT-scan test for Covid-19. If the test is positive, what is the probability you are infected?
• Suppose you are told the test has a sensitivity of 80%: if you are infected by Covid-19, the test will be positive with probability 0.8. In other words,
p(x = 1 | y = 1) = 0.8
where x = 1 is the event that the test is positive, and y = 1 is the event that you are infected by Covid-19. Note: x is the evidence (E) and y is the hypothesis (H).
Source: https://spectrum.ieee.org/hospitals-deploy-ai-tools-detect-covid19-chest-scans
  • 58. Computer Aided Diagnosis for Covid-19 detection
• Many people conclude they are therefore 80% likely to be infected by Covid-19. But this is false! It ignores the prior probability of having Covid-19, which fortunately is quite low in June 2022:
p(y = 1) = 0.004 (the prior probability that someone is infected by Covid-19)
• We also need to take into account the fact that the test may produce a false positive, or false alarm. Unfortunately, such false positives are quite likely with current screening technology:
p(x = 1 | y = 0) = 0.1 (the probability that the test result is positive although you are not infected)
• Combining these three terms using the Bayes rule, we can compute the correct answer as follows:
p(y = 1 | x = 1) = p(x = 1 | y = 1) × p(y = 1) / [p(x = 1 | y = 1) × p(y = 1) + p(x = 1 | y = 0) × p(y = 0)]
= (0.8 × 0.004) / (0.8 × 0.004 + 0.1 × 0.996) ≈ 0.031
where p(y = 0) = 1 − p(y = 1) = 0.996. In other words, if the test result is positive, you only have about a 3% chance of actually being infected by Covid-19!
Notes (x is the evidence E, y is the hypothesis H):
p(x = 1 | y = 1): the probability that an infected person tests positive
p(y = 1): the probability that someone is infected by Covid-19
p(y = 0): the probability that someone is NOT infected by Covid-19
p(x = 1 | y = 0): false alarm; the person is not infected, but the test says they are positive
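The surprising ~3% result can be reproduced with a few lines. This sketch uses sensitivity 0.8 and a prior of 0.004 (implied by the slides' p(y = 0) = 0.996); the false-positive rate of 0.1 is the value consistent with the stated ≈3% answer, and the function name is my own.

```python
def posterior_infected(sensitivity: float, prior: float, false_pos: float) -> float:
    """p(y = 1 | x = 1): probability of infection given a positive test result."""
    # Marginal p(x = 1): positives among the infected plus false alarms among the healthy.
    evidence = sensitivity * prior + false_pos * (1.0 - prior)
    return sensitivity * prior / evidence

p = posterior_infected(sensitivity=0.8, prior=0.004, false_pos=0.1)  # ≈ 0.031
```

The low prior dominates: even a reasonably sensitive test cannot overcome how rare the disease is, so most positives are false alarms.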
  • 59. Computer Aided Diagnosis for Covid-19 detection
• The Computer Aided Diagnosis (CAD) system can then determine whether you are infected by Covid-19 using a threshold (e.g. a corona score) on the probability value.
• For example: IF p(y = 1 | x = 1) > 0.80 THEN the CAD will notify the medical doctor that you are infected by Covid-19.
Source: https://spectrum.ieee.org/hospitals-deploy-ai-tools-detect-covid19-chest-scans
  • 60. Implementation Bayesian Reasoning (Part 04)
Kecerdasan Buatan | Artificial Intelligence (Version: January 2022)
Bayesian reasoning with multiple hypotheses and evidences
• We can take into account both multiple hypotheses H1, H2, ..., Hm and multiple evidences E1, E2, ..., En. The hypotheses as well as the evidences must be mutually exclusive and exhaustive.
• A single evidence E and multiple hypotheses follow:
p(Hi | E) = p(E | Hi) × p(Hi) / Σ(k=1..m) p(E | Hk) × p(Hk)
• Multiple evidences and multiple hypotheses follow:
p(Hi | E1 E2 ... En) = p(E1 E2 ... En | Hi) × p(Hi) / Σ(k=1..m) p(E1 E2 ... En | Hk) × p(Hk)
(posterior = likelihood × prior / marginal)
  • 61. Bayesian reasoning with multiple hypotheses and evidences
• However, this method requires obtaining the conditional probabilities of all possible combinations of evidences for all hypotheses, which places an enormous burden on the expert.
• Therefore, in Bayesian reasoning with multiple hypotheses and evidences, conditional independence among different evidences is assumed.
• Thus, instead of the unworkable equation, we obtain:
p(Hi | E1 E2 ... En) = p(E1 | Hi) × p(E2 | Hi) × ... × p(En | Hi) × p(Hi) / Σ(k=1..m) p(E1 | Hk) × p(E2 | Hk) × ... × p(En | Hk) × p(Hk)
• Let us consider a simple example. Suppose a CAD (computer aided diagnosis) system is given three conditionally independent evidences E1, E2, E3, creates three mutually exclusive and exhaustive hypotheses H1, H2, H3, and provides prior probabilities for these hypotheses: p(H1), p(H2), and p(H3), respectively.
Diseases (hypotheses): H1 = dengue, H2 = flu, H3 = malaria
Symptoms (evidences): E1 = headache, E2 = cough, E3 = fever
• The CAD system also determines the conditional probabilities of observing each evidence for all possible hypotheses.
  • 62. The prior and conditional probabilities
H1 = dengue, H2 = flu, H3 = malaria; E1 = headache, E2 = cough, E3 = fever.

Probability    i = 1   i = 2   i = 3
p(Hi)          0.40    0.35    0.25
p(E1 | Hi)     0.3     0.8     0.5
p(E2 | Hi)     0.9     0.0     0.7
p(E3 | Hi)     0.6     0.7     0.9

p(Hi): the prior probability of dengue, flu, and malaria cases.
p(E1 | Hi): the probability of headache in patients with dengue, flu, and malaria.
p(E2 | Hi): the probability of cough in patients with dengue, flu, and malaria.
p(E3 | Hi): the probability of fever in patients with dengue, flu, and malaria.

Assume that we first observe evidence E3 (fever) in a patient. The CAD system computes the posterior probability of each hypothesis as:
p(Hi | E3) = p(E3 | Hi) × p(Hi) / Σ(k=1..3) p(E3 | Hk) × p(Hk), for i = 1, 2, 3
thus
p(H1 | E3) = (0.6 × 0.40) / (0.6 × 0.40 + 0.7 × 0.35 + 0.9 × 0.25) = 0.34
p(H2 | E3) = (0.7 × 0.35) / (0.6 × 0.40 + 0.7 × 0.35 + 0.9 × 0.25) = 0.34
p(H3 | E3) = (0.9 × 0.25) / (0.6 × 0.40 + 0.7 × 0.35 + 0.9 × 0.25) = 0.32
After evidence E3 is observed, belief in hypothesis H2 decreases and becomes equal to belief in hypothesis H1 (p(H2): from 0.35 to 0.34), while belief in hypothesis H3 increases and nearly reaches the beliefs in hypotheses H1 and H2 (p(H3): from 0.25 to 0.32).
  • 63. Suppose now that we also observe evidence E1 (headache) in the patient. The posterior probabilities are calculated as follows:
p(Hi | E1 E3) = p(E1 | Hi) × p(E3 | Hi) × p(Hi) / Σ(k=1..3) p(E1 | Hk) × p(E3 | Hk) × p(Hk), for i = 1, 2, 3
hence
p(H1 | E1 E3) = (0.3 × 0.6 × 0.40) / (0.3 × 0.6 × 0.40 + 0.8 × 0.7 × 0.35 + 0.5 × 0.9 × 0.25) = 0.19
p(H2 | E1 E3) = (0.8 × 0.7 × 0.35) / (0.3 × 0.6 × 0.40 + 0.8 × 0.7 × 0.35 + 0.5 × 0.9 × 0.25) = 0.52
p(H3 | E1 E3) = (0.5 × 0.9 × 0.25) / (0.3 × 0.6 × 0.40 + 0.8 × 0.7 × 0.35 + 0.5 × 0.9 × 0.25) = 0.29
Hypothesis H2 (flu) has now become the most likely one (0.52 compared with 0.19 and 0.29).
After observing evidence E2 (cough), the final posterior probabilities for all hypotheses are calculated:
p(Hi | E1 E2 E3) = p(E1 | Hi) × p(E2 | Hi) × p(E3 | Hi) × p(Hi) / Σ(k=1..3) p(E1 | Hk) × p(E2 | Hk) × p(E3 | Hk) × p(Hk), for i = 1, 2, 3
hence
p(H1 | E1 E2 E3) = (0.3 × 0.9 × 0.6 × 0.40) / (0.3 × 0.9 × 0.6 × 0.40 + 0.8 × 0.0 × 0.7 × 0.35 + 0.5 × 0.7 × 0.9 × 0.25) = 0.45
p(H2 | E1 E2 E3) = (0.8 × 0.0 × 0.7 × 0.35) / (0.3 × 0.9 × 0.6 × 0.40 + 0.8 × 0.0 × 0.7 × 0.35 + 0.5 × 0.7 × 0.9 × 0.25) = 0
p(H3 | E1 E2 E3) = (0.5 × 0.7 × 0.9 × 0.25) / (0.3 × 0.9 × 0.6 × 0.40 + 0.8 × 0.0 × 0.7 × 0.35 + 0.5 × 0.7 × 0.9 × 0.25) = 0.55
Although the initial ranking was H1, H2, and H3, only hypotheses H1 and H3 remain under consideration after all evidences (E1, E2, and E3) were observed. In the end, the doctor should decide whether the patient has dengue or malaria (malaria is more likely, since its probability is higher than dengue's: H3 > H1).
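The whole dengue/flu/malaria calculation above can be replayed in a few lines under the same naive (conditional independence) assumption; the data structure and function name below are my own choices, the numbers are the lecture's.

```python
# Priors and likelihoods from the lecture's dengue/flu/malaria example.
priors = {"dengue": 0.40, "flu": 0.35, "malaria": 0.25}
likelihood = {                       # p(E | H) for E1=headache, E2=cough, E3=fever
    "dengue":  {"headache": 0.3, "cough": 0.9, "fever": 0.6},
    "flu":     {"headache": 0.8, "cough": 0.0, "fever": 0.7},
    "malaria": {"headache": 0.5, "cough": 0.7, "fever": 0.9},
}

def posteriors(evidences):
    """p(H | E1 ... En) for each hypothesis, assuming conditionally independent evidences."""
    scores = {}
    for h, prior in priors.items():
        score = prior
        for e in evidences:          # multiply in each observed evidence's likelihood
            score *= likelihood[h][e]
        scores[h] = score
    total = sum(scores.values())     # normalizing constant (marginal probability)
    return {h: s / total for h, s in scores.items()}

after_all = posteriors(["fever", "headache", "cough"])   # flu drops to exactly 0
```

Note how a single zero likelihood (flu patients never cough in this table) eliminates a hypothesis outright, which is why experts should be careful assigning hard zeros.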
  • 64. What about sentiment analysis using social media?
You can do a similar computation, given that:
H1 = neutral, H2 = positive sentiment, H3 = negative sentiment
E1 = occurrence of the word "illegal", E2 = occurrence of the word "reform", E3 = occurrence of the word "crime", Ek = occurrence of the word .......
The engineer should provide the likelihoods and prior probabilities.