STAB52 - An Introduction to Probability
Week 2 Lecture Notes
Danny Cao
Here are four more combinatorics problems to further your understanding of the topic:
Example (5 card hands)
Suppose we are dealt five cards from an ordinary 52-card deck. What is the probability that
(a) we get all four aces, plus the king of spades?
(b) all five cards are spades?
(c) we get no pairs (i.e. all five cards are different values)?
(d) we get a full house (i.e. three cards of a kind, plus a different pair)?
Solution: (a) If we ignore the order of the hands (i.e. (A♦, A♣, A♥, A♠, K♠) and (A♣, A♥, A♦, K♠, A♠) are the same hand), then the desired probability is
$$\frac{1}{\binom{52}{5}}.$$
(b) There are $\binom{52}{5}$ possible 5-card hands (ignoring order) and 13 spades to choose from. Thus, the desired probability is
$$\frac{\binom{13}{5}}{\binom{52}{5}}.$$
(c) If we do count order, then the number of possible 5-card sequences is 52 · 51 · 50 · 49 · 48, since subsequent cards cannot repeat any previous ones. On the other hand, there are 52 · 48 · 44 · 40 · 36 such sequences which contain no pairs (i.e. each subsequent card cannot repeat the face value of any previous card). Combining these results, the probability of obtaining no pairs is
$$\frac{52 \cdot 48 \cdot 44 \cdot 40 \cdot 36}{52 \cdot 51 \cdot 50 \cdot 49 \cdot 48}.$$
(d) The size of the sample space of all 5-card hands (ignoring order) is $\binom{52}{5}$. To count the total number of full house hands, we must first pick a face value which will be our triple (there are $\binom{13}{1}$ different face values). Then for each of these face values, we must pick a triple from the four possible suits of the given face (i.e. $\binom{4}{3}$ possible triples). Now, for each of these triples, we must choose, from the $\binom{12}{1}$ remaining face values, a new face value that will be our pair. For each of these face values, we must choose 2 of the 4 possible suits (i.e. $\binom{4}{2}$ choices). By the multiplication principle, the probability of a full house is then
$$\frac{\binom{13}{1}\binom{4}{3}\binom{12}{1}\binom{4}{2}}{\binom{52}{5}}.$$
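These counting arguments are easy to sanity-check by simulation. Below is a minimal Monte Carlo sketch in R (the helper name is_full_house is our own invention; choose() is base R) that deals random hands and compares the observed full-house frequency with the exact count above:

set.seed(1)
n_trials <- 200000

# A hand is 5 distinct cards drawn from 0..51; face value = card index mod 13.
is_full_house <- function(hand) {
  counts <- sort(table(hand %% 13), decreasing = TRUE)
  length(counts) == 2 && counts[1] == 3 && counts[2] == 2
}

hits <- sum(replicate(n_trials, is_full_house(sample(0:51, 5))))
cat("simulated:", hits / n_trials, "\n")

# Exact probability from the counting argument above (about 0.00144).
cat("exact:    ", choose(13, 1) * choose(4, 3) * choose(12, 1) * choose(4, 2) / choose(52, 5), "\n")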
1.5 Probability: Conditional Probability and Independence
We motivate the study of conditional probability with the following infamous problem:
Example (Monty Hall)
Suppose you’re on a game show and you’re given the choice of three doors: Behind one door is a car; behind
the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens
another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to
your advantage to switch your choice?
Solution: If you do not switch your guess, the probability of winning the car is 1/3. If you always switch your guess, then you win provided that your initial pick is a goat; an event which occurs with probability 2/3.
Thus, we see how introducing information (sometimes very subtly) can change the probability of an event.
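This is simple to verify empirically. Here is a minimal R sketch (variable names are our own) comparing the two strategies; it relies on the fact that, since the host always opens a goat door, switching wins exactly when the initial pick was wrong:

set.seed(42)
n <- 100000
car  <- sample(1:3, n, replace = TRUE)   # door hiding the car
pick <- sample(1:3, n, replace = TRUE)   # contestant's initial pick

cat("P(win | stay)   =", mean(pick == car), "\n")   # about 1/3
cat("P(win | switch) =", mean(pick != car), "\n")   # about 2/3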
In the previous example, we saw how being given additional information or “conditioning” on certain
events can lead to different probabilities. Once the host opened a door revealing a goat, the probability of winning rose from 1/3 to 2/3 upon switching. We now give a rigorous
definition to this idea.
Definition 1.5.1 Given two events A and B, with P(B) > 0, the conditional probability of A given B is equal to
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$
We can think of P(A|B) as the proportion of the times that B occurs in which A also occurs.
Example (Conditional coins)
Suppose we flip three fair coins. What is the probability that all three coins show heads? What is the
probability that all three coins show heads given that the first two show heads?
Solution: The probability that all three coins show heads is (1/2)^3 = 1/8. For the conditional probability, let A be the event that the first two coins show heads and let B be the event that all three coins show heads. Since B ⊆ A, we have A ∩ B = B, and we compute
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{P(B)}{P(A)} = \frac{1/8}{2/8} = \frac{1}{2}.$$
Is this result surprising? Or could we have guessed it without invoking the definition of conditional probability?
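For those who prefer to see it numerically, a quick R sketch (names are illustrative) estimates this conditional probability directly:

set.seed(2)
n <- 1e6
flips <- matrix(rbinom(3 * n, 1, 0.5), ncol = 3)   # each row: three fair coins, 1 = heads
first_two <- flips[, 1] == 1 & flips[, 2] == 1
all_three <- first_two & flips[, 3] == 1
cat("P(all three | first two) ≈", sum(all_three) / sum(first_two), "\n")   # about 1/2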
Using the definition of conditional probability, we may rewrite our original law of total probability (Theorem 1.3.1) in terms of conditional probabilities:
Theorem 1.5.1 (Law of total probability, conditioned version) Let A1, A2, ... be events that form a partition of the sample space S, each of positive probability. Let B be any event. Then
$$P(B) = \sum_{n=1}^{\infty} P(A_n)\, P(B \mid A_n).$$
Proof: By the ordinary law of total probability, we have
$$P(B) = \sum_{n=1}^{\infty} P(A_n \cap B).$$
Making the substitution P(An)P(B|An) = P(An ∩ B) then yields the result.
This version of the law of total probability is very useful, as it is often much simpler to compute probabilities once we condition on the correct choice of partitioning events. This is illustrated in the following example.
Example (A die and three coins)
Suppose we roll a fair six-sided die and flip three fair coins. What is the probability that the total number
of heads is equal to the number showing on the die?
Solution: Let X denote the value that appears on the die and let N be the total number of heads obtained (we write N rather than S to avoid clashing with the sample space). Then the events {X = k}, k = 1, ..., 6, partition the sample space, and so by the law of total probability (conditioned version), we compute
$$P(N = X) = \sum_{k=1}^{6} P(N = X \mid X = k)\, P(X = k) = \frac{1}{6} \sum_{k=1}^{6} P(N = X \mid X = k) = \frac{1}{6} \sum_{k=1}^{6} P(N = k) = \frac{1}{6}\left(\frac{3}{8} + \frac{3}{8} + \frac{1}{8} + 0 + 0 + 0\right) = \frac{7}{48},$$
where P(N = X | X = k) = P(N = k) because the die and the coins are independent.
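Again, a small R simulation (a sketch; the sample size is an arbitrary choice) agrees with this answer:

set.seed(7)
n <- 1e6
die   <- sample(1:6, n, replace = TRUE)
heads <- rbinom(n, size = 3, prob = 0.5)   # number of heads among 3 fair coins
cat("simulated:", mean(heads == die), "\n")
cat("exact:    ", 7/48, "\n")              # about 0.1458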
We conclude this discussion of conditional probability with a handy little formula for reversing the order of conditioning, i.e. finding P(A|B) from P(B|A) and vice versa.
Theorem 1.5.2 (Bayes' theorem) Let A and B be two events, each of positive probability. Then
$$P(A \mid B) = \frac{P(A)}{P(B)}\, P(B \mid A).$$
Proof: From the definition of conditional probability, we know
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \qquad \text{and} \qquad P(B \mid A) = \frac{P(A \cap B)}{P(A)}.$$
Rewriting the second identity as P(A ∩ B) = P(A)P(B|A) and substituting this into the first identity then gives Bayes' theorem.
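To see Theorems 1.5.1 and 1.5.2 working together, here is a short R computation with entirely hypothetical numbers (a diagnostic test with hit rate P(B|A) = 0.95, false-positive rate P(B|A^c) = 0.10, and prevalence P(A) = 0.02):

p_A      <- 0.02    # P(A): prevalence (hypothetical)
p_B_A    <- 0.95    # P(B|A): true-positive rate (hypothetical)
p_B_notA <- 0.10    # P(B|A^c): false-positive rate (hypothetical)

# Law of total probability (Theorem 1.5.1) gives P(B):
p_B <- p_A * p_B_A + (1 - p_A) * p_B_notA

# Bayes' theorem (Theorem 1.5.2) then reverses the conditioning:
cat("P(A|B) =", p_A / p_B * p_B_A, "\n")   # about 0.162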
One special case of conditional probability is when conditioning on one event has no effect on the probability of the other, i.e. P(A|B) = P(A). In this case, knowing that event B occurred provides no further information about the likelihood of A occurring. One might describe such events as being “independent”.
This discussion motivates the following definition:
Definition 1.5.2 Two events A and B are independent if
P(A ∩ B) = P(A) · P(B).
More generally, for multiple events, we define...
Definition 1.5.3 A collection of events $\{A_n\}_{n=1}^{\infty}$ is independent if
$$P(A_{i_1} \cap \cdots \cap A_{i_j}) = P(A_{i_1}) \cdots P(A_{i_j})$$
for any finite subcollection $A_{i_1}, \ldots, A_{i_j}$ of distinct events.
In particular, we note that if A and B are independent events such that P(B) > 0, then
$$P(A \mid B) = \frac{P(A)P(B)}{P(B)} = P(A).$$
Thus, conditioning on an independent event has no influence.
Example (Bad luck)
Suppose that in a city which is prone to natural disasters, the probability of a flood in any given month is
0.5 while the probability of a tornado is 0.4. Furthermore, suppose that the probability of either a flood or
a tornado is 0.7. Are the tornadoes and floods which occur in the city independent?
Solution: Let T be the event that a tornado occurs and F be the event that a flood occurs. Then notice
by the inclusion-exclusion principle that
$$P(T \cap F) = P(T) + P(F) - P(T \cup F) = 0.4 + 0.5 - 0.7 = 0.2 = P(T) \cdot P(F).$$
Thus, the occurrence of floods and tornadoes are independent in the city.
Example (0 to 100)
Suppose we flip a fair coin 100 times and record the values showing. What is the probability model here? Let A be the event that the first 99 flips all show heads and let B be the event that the 100th flip is a head. Are A and B independent?
Solution: The probability model here is the uniform distribution on the sample space $S = \{H, T\}^{100}$ of all sequences of 100 flips, so each sequence has probability $2^{-100}$. Notice that $P(A) = 2^{-99}$ and $P(B) = 1/2$, so $P(A)P(B) = 2^{-100}$. On the other hand,
$$P(A \cap B) = P(\text{all 100 coins are H}) = \frac{1}{2^{100}}.$$
Since P(A ∩ B) = P(A)P(B), the events A and B are independent: even after 99 heads in a row, the 100th flip shows heads with probability exactly 1/2.
Exercise (“Due to win”)
Test your conclusion from the above example using an R simulation (maybe for 10 flips rather than 100, for the sake of limited computing power). For example, you can generate a very large number of coin flips and then record the proportion of heads which appear after 9 consecutive heads. Does this proportion approach 1/2? Interpret this result as advice that you could give to an addicted gambler, e.g. someone who repeatedly bets on the same outcome believing that it is “due” to appear soon.
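One possible starting point for this exercise (a sketch only; stats::filter is the base-R moving sum used here to detect runs):

set.seed(123)
flips <- sample(c(0, 1), 1e7, replace = TRUE)   # 1 = heads

# Moving sum of the last 9 flips; equals 9 exactly where a run of 9 heads ends.
run9 <- stats::filter(flips, rep(1, 9), sides = 1) == 9
idx  <- which(run9[-length(run9)])               # keep runs that end before the final flip

cat("P(heads | previous 9 were heads) ≈", mean(flips[idx + 1]), "\n")   # close to 1/2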
Example (Independence of complements)
If A and B are independent events, show that $A^c$ and B are independent as well.
Solution: If A and B are independent, then P(A ∩ B) = P(A) · P(B). Using this and the law of total
probability, we compute
$$P(A^c \cap B) = P(B) - P(A \cap B) = P(B) - P(A) \cdot P(B) = P(B)(1 - P(A)) = P(B) \cdot P(A^c),$$
and thus $A^c$ and B are independent.
Exercises
Evans & Rosenthal 1.5.1, 1.5.2, 1.5.4, 1.5.5, 1.5.7, 1.5.9, 1.5.14
1.6 Continuity of P
Recall from calculus that given a sequence of real numbers $\{a_n\}_{n=1}^{\infty}$, the limit of this sequence equals L (denoted $\lim_{n\to\infty} a_n = L$) if for every ε > 0, there exists an N ∈ ℕ such that for all n > N, we have |a_n − L| < ε. This is just a fancy way of saying that {a_n} gets arbitrarily close to L and stays arbitrarily close. In set theory, there are analogous ways of considering a “limit of sets”. Namely, given a sequence of events $\{A_n\}_{n=1}^{\infty}$, we can speak of
$$\bigcup_{n=1}^{\infty} A_n \quad \text{and} \quad \bigcap_{n=1}^{\infty} A_n,$$
which are again sets (or events in a probabilistic context). Calculating the probabilities of such events (under some minor assumptions) requires the “continuity of probability”.
Theorem 1.6.1 Let $\{A_n\}_{n=1}^{\infty}$ be events such that either $A_n \subseteq A_{n+1}$ for all n ∈ ℕ (resp. $A_{n+1} \subseteq A_n$ for all n). Then
$$\lim_{n\to\infty} P(A_n) = P\left(\bigcup_{n=1}^{\infty} A_n\right) \qquad \left(\text{resp. } \lim_{n\to\infty} P(A_n) = P\left(\bigcap_{n=1}^{\infty} A_n\right)\right).$$
So the probability of an infinite union or intersection of events is the limit of their probabilities provided
they are “growing” or “shrinking”. This is a rather abstract idea so let us examine two concrete examples.
Example (Uniform distribution)
Let S = [0, 1] and let P be a probability measure such that P([a, b]) = b − a for all 0 ≤ a < b ≤ 1 (this is called the uniform distribution on the unit interval). Prove that P({1/2}) = 0, first using continuity of probability and then using monotonicity. Does this imply that the value 1/2 can never occur?
Solution: Let $A_n = [1/2 - 1/n,\, 1/2 + 1/n]$ for n ≥ 2. Then $A_{n+1} \subseteq A_n$ for all n and $\bigcap_{n=2}^{\infty} A_n = \{1/2\}$. Moreover,
$$P(A_n) = P\left(\left[\frac{1}{2} - \frac{1}{n},\ \frac{1}{2} + \frac{1}{n}\right]\right) = \frac{2}{n}.$$
Thus, by the continuity of probability, we have
$$P(\{1/2\}) = \lim_{n\to\infty} P(A_n) = \lim_{n\to\infty} \frac{2}{n} = 0.$$
Alternatively, for every ε > 0 we may define $A_\varepsilon = [1/2 - \varepsilon,\, 1/2 + \varepsilon]$. Then $\{1/2\} \subseteq A_\varepsilon$, so using monotonicity and probability axiom 1, we compute
$$0 \le P(\{1/2\}) \le P(A_\varepsilon) = 2\varepsilon.$$
Since this inequality holds for arbitrary ε > 0, we conclude that P({1/2}) = 0.
Just because the probability of an event is 0 does NOT mean that it can never occur. It just means that 0 is the only meaningful and non-contradictory probability the event can have.
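A quick empirical check in R (a sketch; the sample size and interval widths are arbitrary choices) shows $P(A_\varepsilon) \approx 2\varepsilon$ shrinking toward 0:

set.seed(1)
x <- runif(1e6)   # uniform draws on [0, 1]
for (eps in c(0.1, 0.01, 0.001)) {
  cat("eps =", eps,
      " P(|X - 1/2| <= eps) ≈", mean(abs(x - 0.5) <= eps),
      " exact:", 2 * eps, "\n")
}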
Example (The Cantor set)
The standard Cantor set is created by repeatedly deleting the open middle third of a set of line segments.
One starts by deleting the open middle third (1/3, 2/3) from the interval [0, 1], leaving two line segments:
[0, 1/3]∪[2/3, 1]. Next, the open middle third of each of these remaining line segments is deleted, leaving four
new line segments: [0, 1/9] ∪ [2/9, 1/3] ∪ [2/3, 7/9] ∪ [8/9, 1]. This process is repeated infinitely many times, and what remains of the unit interval is called the Cantor set. More explicitly, the initial set is C0 = [0, 1] and the nth iterate is defined as
$$C_n = \frac{C_{n-1}}{3} \cup \left(\frac{2}{3} + \frac{C_{n-1}}{3}\right).$$
If we impose the uniform distribution on the unit interval, what is P(C)?
Solution: If we let C denote the Cantor set, then we may write
$$C = \bigcap_{n=0}^{\infty} C_n,$$
where $C_{n+1} \subseteq C_n$ for all n ≥ 0 and, since each iteration keeps two thirds of the remaining length, $P(C_n) = (2/3)^n$. Thus $\{C_n\} \downarrow C$, and so we may use continuity of probability to compute
$$P(C) = \lim_{n\to\infty} P(C_n) = \lim_{n\to\infty} \left(\frac{2}{3}\right)^n = 0.$$
Amazingly, it can be shown that C is uncountable (math majors should try to prove this). Recall that
[0, 1] is also uncountable... This means that even though C is the same “size” as the unit interval, it has
probability 0... What a crazy result!
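We can also watch P(Cn) shrink by simulation. The sketch below (the helper in_Cn is our own; it checks whether x survives n rounds of middle-third deletion by examining its ternary digits) estimates P(Cn) from uniform draws:

# TRUE if x in [0, 1] survives the first n middle-third deletions.
in_Cn <- function(x, n) {
  for (i in seq_len(n)) {
    x <- 3 * x
    d <- floor(x)
    if (d == 1) return(FALSE)   # fell into a removed middle third
    x <- x - d
  }
  TRUE
}

set.seed(3)
pts <- runif(1e5)
for (n in 1:6) {
  cat("n =", n,
      " simulated P(Cn) =", mean(sapply(pts, in_Cn, n = n)),
      " exact =", (2/3)^n, "\n")
}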
As the two examples above illustrate, continuity of probability allows us to indirectly evaluate the probability of events which we may not be able to compute directly from our given probability measure. In the second example, we were able to compute the probability of the Cantor set even though we have almost no idea what it looks like. This is an extremely powerful tool.
Exercises
Evans & Rosenthal 1.6.1, 1.6.3, 1.6.7, 1.6.10