3. • Example: Soccer Game
• You are off to soccer, and love being the Goalkeeper, but that depends on who is the Coach today:
• with Coach Sam the probability of being Goalkeeper is 0.5
• with Coach Alex the probability of being Goalkeeper is 0.3
• Sam is Coach more often ... about 6 out of every 10 games (a probability of 0.6).
• So, what is the probability you will be a Goalkeeper today?
• Let's build the tree diagram. First we show the two possible coaches: Sam or Alex:
4. • The probability of getting Sam is 0.6, so the probability of Alex must be 0.4 (together the probabilities sum to 1).
• Now, if you get Sam, there is a 0.5 probability of being Goalie (and 0.5 of not being Goalie).
• If you get Alex, there is a 0.3 probability of being Goalie (and 0.7 of not being Goalie).
5. • When we take the 0.6 chance of Sam being Coach and include the 0.5 chance that Sam will let you be Goalkeeper, we end up with a 0.3 chance (0.6 x 0.5 = 0.3).
• But we are not done yet! We haven't included Alex as Coach:
• A 0.4 chance of Alex as Coach, followed by a 0.3 chance of being Goalkeeper, gives 0.12 (0.4 x 0.3 = 0.12).
• Now we add the column:
• 0.3 + 0.12 = 0.42 probability of being a Goalkeeper today
• (That is a 42% chance)
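The tree-diagram calculation above can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the names `coaches`, `branches` and `p_goalkeeper` are assumptions made for the sketch, while the numbers come from the example.

```python
# Law of total probability for the soccer example.
# Each entry maps a coach to (P(coach), P(Goalkeeper | coach)).
coaches = {
    "Sam": (0.6, 0.5),
    "Alex": (0.4, 0.3),
}

# Multiply along each branch of the tree, then add the branch results.
branches = {
    name: p_coach * p_goalie
    for name, (p_coach, p_goalie) in coaches.items()
}
p_goalkeeper = sum(branches.values())

for name, p in branches.items():
    print(f"{name} branch: {p:.2f}")
print(f"P(Goalkeeper) = {p_goalkeeper:.2f}")
```

Each branch value is one path through the tree, and the sum over branches is the "add the column" step.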
12. Fundamentals for Bayes Theorem
● 0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
● P(A) = 0 indicates that event A is impossible (it is certain not to occur).
● P(A) = 1 indicates that event A is certain to occur.
● Event: Each possible outcome of a variable is called an event.
● Sample space: The collection of all possible outcomes is called the sample space.
● Random variables: Random variables are used to represent the events and objects in the real world.
● Prior probability: The prior probability of an event is the probability computed before observing new information.
● Posterior probability: The probability that is calculated after all evidence or information has been taken into account. It is a combination of the prior probability and the new information.
13. ● Bayes' theorem, also known as Bayes' rule or Bayes' law, is the basis of Bayesian reasoning: it determines the probability of an event under uncertain knowledge.
● In probability theory, it relates the conditional and marginal probabilities of two random events.
● Bayes' theorem is named after the British mathematician Thomas Bayes. Bayesian inference is an application of Bayes' theorem and is fundamental to Bayesian statistics.
● It is a way to calculate the value of P(B|A) with the
knowledge of P(A|B).
14. • Bayes' Theorem is a way of finding a probability when we know certain other
probabilities. The formula is:
• P(A|B) = P(A) P(B|A)/P(B)
• Which tells us: how often A happens given that B happens, written P(A|B),
• When we know: how often B happens given that A happens, written P(B|A)
• and how likely A is on its own, written P(A)
• and how likely B is on its own, written P(B)
15. Bayes' theorem is expressed mathematically by the following equation:
$$P(X \mid Y) = \frac{P(Y \mid X)\, P(X)}{P(Y)}$$
where X and Y are events and P(Y) ≠ 0.
P(X|Y) is the conditional probability that event X occurs given that Y is true.
P(Y|X) is the conditional probability that event Y occurs given that X is true.
P(X) and P(Y) are the probabilities of observing X and Y independently of each other. These are known as the marginal probabilities.
Let us say P(Fire) means how often there is fire, and P(Smoke) means how often we see
smoke, then:
P(Fire|Smoke) means how often there is fire when we can see smoke
P(Smoke|Fire) means how often we can see smoke when there is fire
So the formula kind of tells us "forwards" P(Fire|Smoke) when we know "backwards"
P(Smoke|Fire)
16. Example:
dangerous fires are rare (1%)
but smoke is fairly common (10%) due to barbecues,
and 90% of dangerous fires make smoke
We can then discover the probability of dangerous Fire when there is Smoke:
P(Fire|Smoke) = P(Fire) P(Smoke|Fire)/P(Smoke)
= (1% x 90%) / 10%
= 9%
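The same calculation can be checked with a small helper. This is a minimal sketch: the function name `bayes` and its argument names are hypothetical, chosen only to mirror the formula P(A|B) = P(A) P(B|A) / P(B).

```python
def bayes(prior, likelihood, evidence):
    """Return P(A|B) = P(A) * P(B|A) / P(B)."""
    if evidence <= 0:
        raise ValueError("P(B) must be greater than 0")
    return prior * likelihood / evidence


# Numbers from the fire example.
p_fire = 0.01              # P(Fire): dangerous fires are rare
p_smoke = 0.10             # P(Smoke): smoke is fairly common
p_smoke_given_fire = 0.90  # P(Smoke|Fire)

p_fire_given_smoke = bayes(p_fire, p_smoke_given_fire, p_smoke)
print(f"P(Fire|Smoke) = {p_fire_given_smoke:.0%}")
```

The guard on `evidence` reflects the requirement that P(B) must not be zero.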
17.
18. ● Bayes' theorem can be derived using the product rule and the conditional probability of event A with known event B:
● From the product rule we can write: P(A ⋀ B) = P(A|B) P(B)
● Similarly, for the probability of event B with known event A: P(A ⋀ B) = P(B|A) P(A)
● Equating the two right-hand sides and dividing by P(B) gives the equation for Bayes' rule or Bayes' theorem:
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \qquad \text{(a)}$$
● P(A|B) is known as the posterior, which we need to calculate. It is read as the probability of hypothesis A given that evidence B has occurred.
● P(B|A) is called the likelihood: assuming that the hypothesis is true, it is the probability of the evidence.
● P(A) is called the prior probability: the probability of the hypothesis before considering the evidence.
● P(B) is called the marginal probability: the probability of the evidence on its own.
● In equation (a), in general, we can write P(B) = Σ_j P(A_j) P(B|A_j), hence Bayes' rule can be written as:
$$P(A_i \mid B) = \frac{P(A_i)\, P(B \mid A_i)}{\sum_{j=1}^{k} P(A_j)\, P(B \mid A_j)}$$
where A1, A2, A3, ..., Ak is a set of mutually exclusive and exhaustive events.
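The general form can be sketched as a function that takes all the priors and likelihoods at once. The name `posteriors` and the list-based interface are assumptions for illustration. The usage example reuses the soccer numbers from the earlier slides, with A1 = Sam is Coach, A2 = Alex is Coach and B = you are Goalkeeper.

```python
def posteriors(priors, likelihoods):
    """Given priors[i] = P(A_i) and likelihoods[i] = P(B|A_i),
    return the list of posteriors P(A_i|B)."""
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1 (exclusive and exhaustive events)")
    # Numerators P(A_i) * P(B|A_i), one per event.
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Denominator: P(B) as the sum over all events.
    p_b = sum(joint)
    if p_b == 0:
        raise ValueError("P(B) is 0, so the posterior is undefined")
    return [j / p_b for j in joint]


# Soccer example: who was Coach, given that you were Goalkeeper?
post = posteriors([0.6, 0.4], [0.5, 0.3])
print([round(p, 3) for p in post])
```

The posteriors always sum to 1, because the denominator is exactly the sum of the numerators.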
19. Example: In a particular pain clinic, 10% of patients are prescribed narcotic painkillers. Overall, 5% of the clinic's patients are addicted to narcotics (including painkillers and illegal substances). Out of all the people prescribed pain pills, 8% are addicts. If a patient is an addict, what is the probability that they will be prescribed pain pills?
A: being prescribed pain pills, P(A) = 0.10
B: being an addict, P(B) = 0.05
B|A: being an addict given a pain pill prescription, P(B|A) = 0.08
P(A|B) = P(B|A) P(A) / P(B)
P(A|B) = (0.08 * 0.1) / 0.05 = 0.16
That means:
If a patient is an addict, there is a 16% chance that they will be prescribed pain pills.
20.
21. ● According to the given statement, the statistics are as follows:
● The prior probability of having cancer in the population is P(Cancer) = 0.008.
● The prior probability of not having cancer in the population is P(¬Cancer) = 1 - 0.008 = 0.992.
● The test returns a correct positive result for 98% of people with cancer, P(+|Cancer) = 0.98, so 2% of their results are negative: P(-|Cancer) = 1 - 0.98 = 0.02.
● The probability of a positive result without cancer is P(+|¬Cancer) = 1 - 0.97 = 0.03.
● The probability of a negative result without cancer is P(-|¬Cancer) = 0.97.
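The slide stating the question is not part of this text, so the following is a sketch under an assumption: that the quantity of interest is P(Cancer|+), the probability of cancer given a positive test. It takes P(Cancer) = 0.008, the value consistent with P(¬Cancer) = 0.992. The variable names are illustrative.

```python
# Assumed target: P(Cancer | +), which the slides do not state explicitly.
p_cancer = 0.008                  # prior, consistent with P(not Cancer) = 0.992
p_no_cancer = 1 - p_cancer
p_pos_given_cancer = 0.98         # P(+|Cancer)
p_pos_given_no_cancer = 1 - 0.97  # P(+|not Cancer) = 0.03

# P(+) by the law of total probability.
p_pos = p_pos_given_cancer * p_cancer + p_pos_given_no_cancer * p_no_cancer

# Bayes' rule.
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos
print(f"P(+) = {p_pos:.4f}")
print(f"P(Cancer|+) = {p_cancer_given_pos:.3f}")
```

Under these numbers a positive result still leaves cancer the less likely explanation, because the disease is rare.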
22. Bayes’ Rule
Let S1 , S2 , S3 ,..., Sk be mutually exclusive and
exhaustive events with prior probabilities P(S1),
P(S2),…,P(Sk). If an event A occurs, the posterior
probability of Si, given that A occurred is
$$P(S_i \mid A) = \frac{P(S_i)\, P(A \mid S_i)}{\sum_{j=1}^{k} P(S_j)\, P(A \mid S_j)} \quad \text{for } i = 1, 2, \ldots, k$$
Proof
$$P(A \mid S_i) = \frac{P(A S_i)}{P(S_i)} \;\Rightarrow\; P(A S_i) = P(S_i)\, P(A \mid S_i)$$
$$P(S_i \mid A) = \frac{P(A S_i)}{P(A)} = \frac{P(S_i)\, P(A \mid S_i)}{\sum_{j=1}^{k} P(S_j)\, P(A \mid S_j)}$$
23. Example
From a previous example, we know that 49% of the population are female. Of the female patients, 8% are high risk for heart attack, while 12% of the male patients are high risk. A single person is selected at random and found to be high risk. What is the probability that it is a male? Define H: high risk, F: female, M: male.
We know:
P(F) = .49
P(M) = .51
P(H|F) = .08
P(H|M) = .12
$$P(M \mid H) = \frac{P(M)\, P(H \mid M)}{P(M)\, P(H \mid M) + P(F)\, P(H \mid F)} = \frac{.51(.12)}{.51(.12) + .49(.08)} = .61$$
24. Example
Suppose a rare disease infects one out of every
1000 people in a population. And suppose that
there is a good, but not perfect, test for this
disease: if a person has the disease, the test
comes back positive 99% of the time. On the other
hand, the test also produces some false positives:
2% of uninfected people also test positive.
Someone has just tested positive. What are his
chances of having this disease?
25. Example
Define A: has the disease, B: tests positive.
We know:
P(A) = .001, P(A^c) = .999
P(B|A) = .99, P(B|A^c) = .02
We want to know P(A|B) = ?
$$P(A \mid B) = \frac{P(A)\, P(B \mid A)}{P(A)\, P(B \mid A) + P(A^c)\, P(B \mid A^c)} = \frac{.001(.99)}{.001(.99) + .999(.02)} = .0472$$
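One way to see why the answer is so small is to count people instead of multiplying probabilities. The sketch below assumes a hypothetical population of 1,000,000. That size is an assumption chosen so that every count is a whole number, and the variable names are illustrative.

```python
# Natural-frequency view of the rare disease example.
population = 1_000_000  # hypothetical population size (assumption)

infected = population // 1000       # 1 in 1000 has the disease
uninfected = population - infected

true_positives = infected * 99 // 100    # 99% of infected test positive
false_positives = uninfected * 2 // 100  # 2% of uninfected test positive

# Among everyone who tests positive, what fraction is actually infected?
p_disease_given_positive = true_positives / (true_positives + false_positives)

print(f"true positives:  {true_positives}")
print(f"false positives: {false_positives}")
print(f"P(A|B) = {p_disease_given_positive:.4f}")
```

The false positives outnumber the true positives by roughly twenty to one, which is why a positive test still leaves the disease unlikely.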
26. Example
A survey of job satisfaction² of teachers was taken, giving the following results:

Job Satisfaction by Level
Level        Satisfied  Unsatisfied  Total
College             74           43    117
High School        224          171    395
Elementary         126          140    266
Total              424          354    778

² “Psychology of the Scientist: Work Related Attitudes of U.S. Scientists” (Psychological Reports (1991): 443 – 450).
27. Example
If all the cells are divided by the total number surveyed, 778, the resulting table is a table of empirically derived probabilities.

Job Satisfaction by Level
Level        Satisfied  Unsatisfied  Total
College          0.095        0.055  0.150
High School      0.288        0.220  0.508
Elementary       0.162        0.180  0.342
Total            0.545        0.455  1.000
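The division by 778 can be reproduced as follows. This is a small sketch: the dictionary `counts` and the name `probs` are assumptions for illustration, and the counts are the survey results from the previous slide.

```python
# Survey counts per level: (satisfied, unsatisfied).
counts = {
    "College": (74, 43),
    "High School": (224, 171),
    "Elementary": (126, 140),
}

# Total number of teachers surveyed.
total = sum(s + u for s, u in counts.values())

# Divide every cell, and each row total, by the grand total.
probs = {
    level: (s / total, u / total, (s + u) / total)
    for level, (s, u) in counts.items()
}

print(f"{'Level':<12}{'Satisfied':>10}{'Unsatisfied':>13}{'Total':>8}")
for level, (p_sat, p_unsat, p_row) in probs.items():
    print(f"{level:<12}{p_sat:>10.3f}{p_unsat:>13.3f}{p_row:>8.3f}")
```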
28. Example
For convenience, let C stand for the event that the teacher teaches college, S stand for the teacher being satisfied, and so on. Let's look at some probabilities and what they mean.
P(C) = 0.150 is the proportion of teachers who are college teachers.
P(S) = 0.545 is the proportion of teachers who are satisfied with their job.
P(C ∩ S) = 0.095 is the proportion of teachers who are college teachers and who are satisfied with their job.
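29. Example
$$P(C \mid S) = \frac{P(C \cap S)}{P(S)} = \frac{0.095}{0.545} \approx 0.175$$
is the proportion of teachers who are college teachers given they are satisfied. Restated: this is the proportion of satisfied teachers that are college teachers.
$$P(S \mid C) = \frac{P(S \cap C)}{P(C)} = \frac{P(C \cap S)}{P(C)} = \frac{0.095}{0.150} \approx 0.632$$
is the proportion of teachers who are satisfied given they are college teachers. Restated: this is the proportion of college teachers that are satisfied.
The values 0.175 and 0.632 match the raw counts, 74/424 and 74/117; the rounded table entries alone give 0.174 and 0.633.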
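The two conditional probabilities can also be read directly from the survey counts, which avoids the rounding in the probability table. This is a sketch with illustrative variable names, and the counts are those from the survey table.

```python
# Counts from the survey table.
satisfied_college = 74  # college teachers who are satisfied
college_total = 117     # all college teachers
satisfied_total = 424   # all satisfied teachers

# P(C|S): restrict attention to satisfied teachers.
p_c_given_s = satisfied_college / satisfied_total

# P(S|C): restrict attention to college teachers.
p_s_given_c = satisfied_college / college_total

print(f"P(C|S) = {p_c_given_s:.3f}")
print(f"P(S|C) = {p_s_given_c:.3f}")
```

Conditioning simply changes the denominator: the same 74 teachers are compared with a different reference group each time.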
30. Example
Are C and S independent events?
$$P(C) = 0.150 \quad \text{and} \quad P(C \mid S) = \frac{P(C \cap S)}{P(S)} = \frac{0.095}{0.545} \approx 0.175$$
P(C|S) ≠ P(C), so C and S are dependent events.
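An equivalent check compares P(C ∩ S) with the product P(C) P(S), since the two are equal exactly when the events are independent. The sketch below uses the table values. The tolerance of 0.005 is an assumption, chosen only to absorb the three-decimal rounding of the table.

```python
import math

# Values from the probability table.
p_c = 0.150        # P(C)
p_s = 0.545        # P(S)
p_c_and_s = 0.095  # P(C and S)

# Under independence, P(C and S) would equal P(C) * P(S).
product = p_c * p_s

# abs_tol is an assumed allowance for rounding in the table.
independent = math.isclose(p_c_and_s, product, abs_tol=0.005)

print(f"P(C) * P(S) = {product:.5f}")
print(f"P(C and S)  = {p_c_and_s:.3f}")
print("independent" if independent else "dependent")
```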
31. Example
P(C ∪ S)?
P(C) = 0.150, P(S) = 0.545 and P(C ∩ S) = 0.095, so
P(C ∪ S) = P(C) + P(S) - P(C ∩ S)
= 0.150 + 0.545 - 0.095
= 0.600
32. Example
Tom and Dick are going to take a driver's test at the nearest DMV office. Tom estimates that his chances to pass the test are 70% and Dick estimates his as 80%. Tom and Dick take their tests independently.
Define D = {Dick passes the driving test}
T = {Tom passes the driving test}
T and D are independent.
P(T) = 0.7, P(D) = 0.8
33. Example
What is the probability that at most one of the two friends will pass the test?
P(at most one person passes)
= P(D^c ∩ T^c) + P(D^c ∩ T) + P(D ∩ T^c)
= (1 - 0.8)(1 - 0.7) + (0.7)(1 - 0.8) + (0.8)(1 - 0.7)
= .44
Alternatively:
P(at most one person passes)
= 1 - P(both pass) = 1 - 0.8 x 0.7 = .44
34. Example
What is the probability that at least one of the two friends will pass the test?
P(at least one person passes)
= P(D ∪ T)
= 0.8 + 0.7 - 0.8 x 0.7
= .94
Alternatively:
P(at least one person passes)
= 1 - P(neither passes) = 1 - (1 - 0.8) x (1 - 0.7) = .94
35. Example
Suppose we know that only one of the two friends passed the test. What is the probability that it was Dick?
P(D | exactly one person passed)
= P(D ∩ exactly one person passed) / P(exactly one person passed)
= P(D ∩ T^c) / (P(D ∩ T^c) + P(D^c ∩ T))
= 0.8 x (1 - 0.7) / (0.8 x (1 - 0.7) + (1 - 0.8) x 0.7)
= .63
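All three Tom and Dick answers can be checked by listing the four possible outcomes and their probabilities. This is a sketch under the stated independence assumption. The names `outcomes`, `at_most_one`, `at_least_one` and `exactly_one` are illustrative.

```python
from itertools import product

p_pass = {"Tom": 0.7, "Dick": 0.8}

# Probability of each (tom_passes, dick_passes) outcome.
# Independence lets us multiply the individual probabilities.
outcomes = {}
for tom, dick in product([True, False], repeat=2):
    p_tom = p_pass["Tom"] if tom else 1 - p_pass["Tom"]
    p_dick = p_pass["Dick"] if dick else 1 - p_pass["Dick"]
    outcomes[(tom, dick)] = p_tom * p_dick

# True counts as 1 and False as 0, so tom + dick is the number of passes.
at_most_one = sum(p for (t, d), p in outcomes.items() if t + d <= 1)
at_least_one = sum(p for (t, d), p in outcomes.items() if t + d >= 1)
exactly_one = sum(p for (t, d), p in outcomes.items() if t + d == 1)

# P(D | exactly one passed): Dick passes and Tom fails, over exactly one pass.
dick_given_exactly_one = outcomes[(False, True)] / exactly_one

print(f"P(at most one passes)      = {at_most_one:.2f}")
print(f"P(at least one passes)     = {at_least_one:.2f}")
print(f"P(Dick | exactly one pass) = {dick_given_exactly_one:.2f}")
```

Enumerating the outcomes makes the conditional probability concrete: it is the share of the "exactly one passed" outcomes in which Dick is the one who passed.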