STAB52 - An Introduction to Probability
Week 2 Lecture Notes
Danny Cao
Here are four more combinatorics problems to further your understanding of the topic:
Example (5 card hands)
Suppose we are dealt five cards from an ordinary 52-card deck. What is the probability that
(a) we get all four aces, plus the king of spades?
(b) all five cards are spades?
(c) we get no pairs (i.e. all five cards are different values)?
(d) we get a full house (i.e. three cards of a kind, plus a different pair)?
Solution: (a) If we ignore the order of the hands (i.e. (A♦, A♣, A♥, A♠, K♠) and (A♣, A♥, A♦, K♠, A♠) are the same hand), then the desired probability is
$$\frac{1}{\binom{52}{5}}.$$
(b) There are $\binom{52}{5}$ possible 5-card hands (ignoring order) and 13 spades to choose from. Thus, the desired probability is
$$\frac{\binom{13}{5}}{\binom{52}{5}}.$$
(c) If we do count order, then the number of possible 5-card sequences is 52 · 51 · 50 · 49 · 48, since subsequent cards cannot repeat any previous ones. On the other hand, there are 52 · 48 · 44 · 40 · 36 such sequences which contain no pairs (i.e. each subsequent card cannot repeat the face value of any previous card). Combining these results, the probability of obtaining no pairs is
$$\frac{52 \cdot 48 \cdot 44 \cdot 40 \cdot 36}{52 \cdot 51 \cdot 50 \cdot 49 \cdot 48}.$$
(d) The size of the sample space of all 5-card hands (ignoring order) is $\binom{52}{5}$. To count the total number of full house hands, we must first pick a face value which will be our triple (there are $\binom{13}{1}$ different face values). Then for each of these face values, we must pick a triple from the four possible suits of the given face (i.e. $\binom{4}{3}$ possible triples). Now, for each of these triples, we must choose, from the $\binom{12}{1}$ remaining face values, a new face value that will be our pair. For each of these face values, we must choose 2 of the 4 possible suits (i.e. $\binom{4}{2}$ choices). By the multiplication principle, the probability of a full house is then
$$\frac{\binom{13}{1}\binom{4}{3}\binom{12}{1}\binom{4}{2}}{\binom{52}{5}}.$$
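These counting arguments are easy to sanity-check by simulation. Below is a minimal Monte Carlo sketch in R (the helper name is_full_house is our own invention; choose() is base R) that deals random hands and compares the observed full-house frequency with the exact count above:

set.seed(1)
n_trials <- 200000

# A hand is 5 distinct cards drawn from 0..51; face value = card index mod 13.
is_full_house <- function(hand) {
  counts <- sort(table(hand %% 13), decreasing = TRUE)
  length(counts) == 2 && counts[1] == 3 && counts[2] == 2
}

hits <- sum(replicate(n_trials, is_full_house(sample(0:51, 5))))
cat("simulated:", hits / n_trials, "\n")

# Exact probability from the counting argument above (about 0.00144).
cat("exact:    ", choose(13, 1) * choose(4, 3) * choose(12, 1) * choose(4, 2) / choose(52, 5), "\n")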
1.5 Probability: Conditional Probability and Independence
We motivate the study of conditional probability with the following infamous problem:
Example (Monty Hall)
Suppose you’re on a game show and you’re given the choice of three doors: Behind one door is a car; behind
the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens
another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to
your advantage to switch your choice?
Solution: If you do not switch your guess, the probability of winning the car is 1/3. If you always switch your guess, then you win provided that your initial pick is a goat; an event which occurs with probability 2/3.
Thus, we see how introducing information (sometimes very subtly) can change the probability of an event.
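This is simple to verify empirically. Here is a minimal R sketch (variable names are our own) comparing the two strategies; it relies on the fact that, since the host always opens a goat door, switching wins exactly when the initial pick was wrong:

set.seed(42)
n <- 100000
car  <- sample(1:3, n, replace = TRUE)   # door hiding the car
pick <- sample(1:3, n, replace = TRUE)   # contestant's initial pick

cat("P(win | stay)   =", mean(pick == car), "\n")   # about 1/3
cat("P(win | switch) =", mean(pick != car), "\n")   # about 2/3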
In the previous example, we saw how being given additional information or “conditioning” on certain
events can lead to different probabilities. Once the host opened a door revealing a goat, the probability of winning rose from 1/3 to 2/3 upon switching. We now give a rigorous
definition to this idea.
Definition 1.5.1 Given two events A and B, with P(B) > 0, the conditional probability of A given B is equal to
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$
We can think of P(A|B) as the proportion of the times that B occurs in which A also occurs.
Example (Conditional coins)
Suppose we flip three fair coins. What is the probability that all three coins show heads? What is the
probability that all three coins show heads given that the first two show heads?
Solution: The probability that all three coins show heads is (1/2)^3 = 1/8. For the conditional probability, let A be the event that the first two coins show heads and let B be the event that all three coins show heads. Since B ⊆ A, we have A ∩ B = B, and we compute
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{P(B)}{P(A)} = \frac{1/8}{2/8} = \frac{1}{2}.$$
Is this result surprising? Or could we have guessed it without invoking the definition of conditional probability?
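For those who prefer to see it numerically, a quick R sketch (names are illustrative) estimates this conditional probability directly:

set.seed(2)
n <- 1e6
flips <- matrix(rbinom(3 * n, 1, 0.5), ncol = 3)   # each row: three fair coins, 1 = heads
first_two <- flips[, 1] == 1 & flips[, 2] == 1
all_three <- first_two & flips[, 3] == 1
cat("P(all three | first two) ≈", sum(all_three) / sum(first_two), "\n")   # about 1/2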
Using the definition of conditional probability, we may rewrite our original law of total probability (Theorem 1.3.1) in terms of conditional probabilities:
Theorem 1.5.1 (Law of total probability, conditioned version) Let A1, A2, ... be events that form a partition of the sample space S, each of positive probability. Let B be any event. Then
$$P(B) = \sum_{n=1}^{\infty} P(A_n)\, P(B \mid A_n).$$
Proof: By the ordinary law of total probability, we have
$$P(B) = \sum_{n=1}^{\infty} P(A_n \cap B).$$
Making the substitution P(An)P(B|An) = P(An ∩ B) then yields the result.
This version of the law of total probability is very useful, as it is often much simpler to compute probabilities once we condition on the correct choice of partitioning events. This is illustrated in the following example.
Example (A die and three coins)
Suppose we roll a fair six-sided die and flip three fair coins. What is the probability that the total number
of heads is equal to the number showing on the die?
Solution: Let X denote the value that appears on the die and let N be the total number of heads obtained (we write N rather than S to avoid clashing with the sample space). Then the events {X = k}, k = 1, ..., 6, partition the sample space, and so by the law of total probability (conditioned version), we compute
$$P(N = X) = \sum_{k=1}^{6} P(N = X \mid X = k)\, P(X = k) = \frac{1}{6} \sum_{k=1}^{6} P(N = X \mid X = k) = \frac{1}{6} \sum_{k=1}^{6} P(N = k) = \frac{1}{6}\left(\frac{3}{8} + \frac{3}{8} + \frac{1}{8} + 0 + 0 + 0\right) = \frac{7}{48},$$
where P(N = X | X = k) = P(N = k) because the die and the coins are independent.
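Again, a small R simulation (a sketch; the sample size is an arbitrary choice) agrees with this answer:

set.seed(7)
n <- 1e6
die   <- sample(1:6, n, replace = TRUE)
heads <- rbinom(n, size = 3, prob = 0.5)   # number of heads among 3 fair coins
cat("simulated:", mean(heads == die), "\n")
cat("exact:    ", 7/48, "\n")              # about 0.1458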
We conclude this discussion of conditional probability with a handy little formula for reversing the order of conditioning, i.e. finding P(A|B) from P(B|A) and vice versa.
Theorem 1.5.2 (Bayes' theorem) Let A and B be two events, each of positive probability. Then
$$P(A \mid B) = \frac{P(A)}{P(B)}\, P(B \mid A).$$
Proof: From the definition of conditional probability, we know
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \qquad \text{and} \qquad P(B \mid A) = \frac{P(A \cap B)}{P(A)}.$$
Rewriting the second identity as P(A ∩ B) = P(A)P(B|A) and substituting this into the first identity then gives Bayes' theorem.
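To see Theorems 1.5.1 and 1.5.2 working together, here is a short R computation with entirely hypothetical numbers (a diagnostic test with hit rate P(B|A) = 0.95, false-positive rate P(B|A^c) = 0.10, and prevalence P(A) = 0.02):

p_A      <- 0.02    # P(A): prevalence (hypothetical)
p_B_A    <- 0.95    # P(B|A): true-positive rate (hypothetical)
p_B_notA <- 0.10    # P(B|A^c): false-positive rate (hypothetical)

# Law of total probability (Theorem 1.5.1) gives P(B):
p_B <- p_A * p_B_A + (1 - p_A) * p_B_notA

# Bayes' theorem (Theorem 1.5.2) then reverses the conditioning:
cat("P(A|B) =", p_A / p_B * p_B_A, "\n")   # about 0.162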
One special case of conditional probability is when conditioning on one event has no effect on the probability of the other, i.e. P(A|B) = P(A). In this case, knowing that event B occurred provides no further information about the likelihood of A occurring. One might describe such events as being “independent”.
This discussion motivates the following definition:
Definition 1.5.2 Two events A and B are independent if
P(A ∩ B) = P(A) · P(B).
More generally, for multiple events, we define...
Definition 1.5.3 A collection of events $\{A_n\}_{n=1}^{\infty}$ is independent if
$$P(A_{i_1} \cap \cdots \cap A_{i_j}) = P(A_{i_1}) \cdots P(A_{i_j})$$
for any finite subcollection $A_{i_1}, \ldots, A_{i_j}$ of distinct events.
In particular, we note that if A and B are independent events such that P(B) > 0, then
$$P(A \mid B) = \frac{P(A)P(B)}{P(B)} = P(A).$$
Thus, conditioning on an independent event has no influence.
Example (Bad luck)
Suppose that in a city which is prone to natural disasters, the probability of a flood in any given month is
0.5 while the probability of a tornado is 0.4. Furthermore, suppose that the probability of either a flood or
a tornado is 0.7. Are the tornadoes and floods which occur in the city independent?
Solution: Let T be the event that a tornado occurs and F be the event that a flood occurs. Then notice
by the inclusion-exclusion principle that
$$P(T \cap F) = P(T) + P(F) - P(T \cup F) = 0.4 + 0.5 - 0.7 = 0.2 = P(T) \cdot P(F).$$
Thus, the occurrence of floods and tornadoes are independent in the city.
Example (0 to 100)
Suppose we flip a fair coin 100 times and record the values showing. What is the probability model here? Let A be the event that the first 99 flips all show heads and let B be the event that the 100th flip is a head. Are A and B independent?
Solution: The probability model here is the uniform distribution on the sample space $S = \{H, T\}^{100}$ of all sequences of 100 flips, so each sequence has probability $2^{-100}$. Notice that $P(A) = 2^{-99}$ and $P(B) = 1/2$, so $P(A)P(B) = 2^{-100}$. On the other hand,
$$P(A \cap B) = P(\text{all 100 coins are H}) = \frac{1}{2^{100}}.$$
Since P(A ∩ B) = P(A)P(B), the events A and B are independent: even after 99 heads in a row, the 100th flip shows heads with probability exactly 1/2.
Exercise (“Due to win”)
Test your conclusion from the above example using an R simulation (maybe for 10 flips rather than 100, for the sake of limited computing power). For example, you can generate a very large number of coin flips and then record the proportion of heads which appear after 9 consecutive heads. Does this proportion approach 1/2? Interpret this result as advice that you could give to an addicted gambler, e.g. someone who repeatedly bets on the same outcome believing that it is “due” to appear soon.
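One possible starting point for this exercise (a sketch only; stats::filter is the base-R moving sum used here to detect runs):

set.seed(123)
flips <- sample(c(0, 1), 1e7, replace = TRUE)   # 1 = heads

# Moving sum of the last 9 flips; equals 9 exactly where a run of 9 heads ends.
run9 <- stats::filter(flips, rep(1, 9), sides = 1) == 9
idx  <- which(run9[-length(run9)])               # keep runs that end before the final flip

cat("P(heads | previous 9 were heads) ≈", mean(flips[idx + 1]), "\n")   # close to 1/2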
Example (Independence of complements)
If A and B are independent events, show that $A^c$ and B are independent as well.
Solution: If A and B are independent, then P(A ∩ B) = P(A) · P(B). Using this and the law of total
probability, we compute
$$P(A^c \cap B) = P(B) - P(A \cap B) = P(B) - P(A) \cdot P(B) = P(B)(1 - P(A)) = P(B) \cdot P(A^c),$$
and thus $A^c$ and B are independent.
Exercises
Evans & Rosenthal 1.5.1, 1.5.2, 1.5.4, 1.5.5, 1.5.7, 1.5.9, 1.5.14
1.6 Continuity of P
Recall from calculus that given a sequence of real numbers $\{a_n\}_{n=1}^{\infty}$, the limit of this sequence equals L (denoted $\lim_{n\to\infty} a_n = L$) if for every ε > 0, there exists an N ∈ ℕ such that for all n > N, we have |a_n − L| < ε. This is just a fancy way of saying that {a_n} gets arbitrarily close to L and stays arbitrarily close. In set theory, there are analogous ways of considering a “limit of sets”. Namely, given a sequence of events $\{A_n\}_{n=1}^{\infty}$, we can speak of
$$\bigcup_{n=1}^{\infty} A_n \quad \text{and} \quad \bigcap_{n=1}^{\infty} A_n,$$
which are again sets (or events in a probabilistic context). Calculating the probabilities of such events (under some minor assumptions) requires the “continuity of probability”.
Theorem 1.6.1 Let $\{A_n\}_{n=1}^{\infty}$ be events such that either $A_n \subseteq A_{n+1}$ for all n ∈ ℕ (resp. $A_{n+1} \subseteq A_n$ for all n). Then
$$\lim_{n\to\infty} P(A_n) = P\left(\bigcup_{n=1}^{\infty} A_n\right) \qquad \left(\text{resp. } \lim_{n\to\infty} P(A_n) = P\left(\bigcap_{n=1}^{\infty} A_n\right)\right).$$
So the probability of an infinite union or intersection of events is the limit of their probabilities provided
they are “growing” or “shrinking”. This is a rather abstract idea so let us examine two concrete examples.
Example (Uniform distribution)
Let S = [0, 1] and let P be a probability measure such that P([a, b]) = b − a for all 0 ≤ a < b ≤ 1 (this is called the uniform distribution on the unit interval). Prove that P({1/2}) = 0, first using continuity of probability and then using monotonicity. Does this imply that the value 1/2 can never occur?
Solution: Let $A_n = [1/2 - 1/n,\, 1/2 + 1/n]$ for n ≥ 2. Then $A_{n+1} \subseteq A_n$ for all n and $\bigcap_{n=2}^{\infty} A_n = \{1/2\}$. Moreover,
$$P(A_n) = P\left(\left[\frac{1}{2} - \frac{1}{n},\ \frac{1}{2} + \frac{1}{n}\right]\right) = \frac{2}{n}.$$
Thus, by the continuity of probability, we have
$$P(\{1/2\}) = \lim_{n\to\infty} P(A_n) = \lim_{n\to\infty} \frac{2}{n} = 0.$$
Alternatively, for every ε > 0 we may define $A_\varepsilon = [1/2 - \varepsilon,\, 1/2 + \varepsilon]$. Then $\{1/2\} \subseteq A_\varepsilon$, so using monotonicity and probability axiom 1, we compute
$$0 \le P(\{1/2\}) \le P(A_\varepsilon) = 2\varepsilon.$$
Since this inequality holds for arbitrary ε > 0, we conclude that P({1/2}) = 0.
Just because the probability of an event is 0 does NOT mean that it can never occur. It just means that 0 is the only meaningful and non-contradictory probability the event can have.
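A quick empirical check in R (a sketch; the sample size and interval widths are arbitrary choices) shows $P(A_\varepsilon) \approx 2\varepsilon$ shrinking toward 0:

set.seed(1)
x <- runif(1e6)   # uniform draws on [0, 1]
for (eps in c(0.1, 0.01, 0.001)) {
  cat("eps =", eps,
      " P(|X - 1/2| <= eps) ≈", mean(abs(x - 0.5) <= eps),
      " exact:", 2 * eps, "\n")
}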
Example (The Cantor set)
The standard Cantor set is created by repeatedly deleting the open middle third of a set of line segments.
One starts by deleting the open middle third (1/3, 2/3) from the interval [0, 1], leaving two line segments:
[0, 1/3]∪[2/3, 1]. Next, the open middle third of each of these remaining line segments is deleted, leaving four
new line segments: [0, 1/9] ∪ [2/9, 1/3] ∪ [2/3, 7/9] ∪ [8/9, 1]. This process is repeated infinitely many times, and what remains of the unit interval is called the Cantor set. More explicitly, the initial set is C0 = [0, 1] and the nth iterate is defined as
$$C_n = \frac{C_{n-1}}{3} \cup \left(\frac{2}{3} + \frac{C_{n-1}}{3}\right).$$
If we impose the uniform distribution on the unit interval, what is P(C)?
Solution: If we let C denote the Cantor set, then we may write
$$C = \bigcap_{n=0}^{\infty} C_n,$$
where $C_{n+1} \subseteq C_n$ for all n ≥ 0 and, since each iteration keeps two thirds of the remaining length, $P(C_n) = (2/3)^n$. Thus $\{C_n\} \downarrow C$, and so we may use continuity of probability to compute
$$P(C) = \lim_{n\to\infty} P(C_n) = \lim_{n\to\infty} \left(\frac{2}{3}\right)^n = 0.$$
Amazingly, it can be shown that C is uncountable (math majors should try to prove this). Recall that
[0, 1] is also uncountable... This means that even though C is the same “size” as the unit interval, it has
probability 0... What a crazy result!
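We can also watch P(Cn) shrink by simulation. The sketch below (the helper in_Cn is our own; it checks whether x survives n rounds of middle-third deletion by examining its ternary digits) estimates P(Cn) from uniform draws:

# TRUE if x in [0, 1] survives the first n middle-third deletions.
in_Cn <- function(x, n) {
  for (i in seq_len(n)) {
    x <- 3 * x
    d <- floor(x)
    if (d == 1) return(FALSE)   # fell into a removed middle third
    x <- x - d
  }
  TRUE
}

set.seed(3)
pts <- runif(1e5)
for (n in 1:6) {
  cat("n =", n,
      " simulated P(Cn) =", mean(sapply(pts, in_Cn, n = n)),
      " exact =", (2/3)^n, "\n")
}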
As the two examples above illustrate, continuity of probability allows us to indirectly evaluate the probability of events which we may not be able to compute directly from our given probability measure. In the second example, we were able to compute the probability of the Cantor set even though we have almost no idea what it looks like. This is an extremely powerful tool.
Exercises
Evans & Rosenthal 1.6.1, 1.6.3, 1.6.7, 1.6.10