4. Probability
The sample space S is the set of outcomes of an experiment.
Roll a die: S = {1, 2, 3, 4, 5, 6}.
EXAMPLES
5. Probability
The sample space S is the set of outcomes of an experiment.
Roll a die: S = {1, 2, 3, 4, 5, 6}.
EXAMPLES
Flip a coin: S = {H, T}.
6. Probability
The sample space S is the set of outcomes of an experiment.
Roll a die: S = {1, 2, 3, 4, 5, 6}.
EXAMPLES
Flip a coin: S = {H, T}.
Amount of money you can win when playing some lottery:
S = {£0, £10, £100, £1000, £10, 000, £100, 000}.
7. Probability
The sample space S is the set of outcomes of an experiment.
Roll a die: S = {1, 2, 3, 4, 5, 6}.
EXAMPLES
Flip a coin: S = {H, T}.
Amount of money you can win when playing some lottery:
S = {£0, £10, £100, £1000, £10, 000, £100, 000}.
For x ∈ S, the probability of x, written Pr(x),
such that x∈S Pr(x) = 1.
is a real number between 0 and 1,
8. Probability
The sample space S is the set of outcomes of an experiment.
Roll a die: S = {1, 2, 3, 4, 5, 6}.
EXAMPLES
Flip a coin: S = {H, T}.
Amount of money you can win when playing some lottery:
S = {£0, £10, £100, £1000, £10, 000, £100, 000}.
For x ∈ S, the probability of x, written Pr(x),
such that x∈S Pr(x) = 1.
is a real number between 0 and 1,
Pr is ‘just’ a function which maps each x ∈ S to Pr(x) ∈ [0, 1]
9. Probability
The sample space S is the set of outcomes of an experiment.
For x ∈ S, the probability of x, written Pr(x),
such that x∈S Pr(x) = 1.
is a real number between 0 and 1,
Pr is ‘just’ a function which maps each x ∈ S to Pr(x) ∈ [0, 1]
10. Probability
The sample space S is the set of outcomes of an experiment.
For x ∈ S, the probability of x, written Pr(x),
such that x∈S Pr(x) = 1.
is a real number between 0 and 1,
Pr is ‘just’ a function which maps each x ∈ S to Pr(x) ∈ [0, 1]
Roll a die: S = {1, 2, 3, 4, 5, 6}.
EXAMPLE
Pr(1) = Pr(2) = Pr(3) = Pr(4) = Pr(5) = Pr(6) = 1
6 .
11. Probability
The sample space S is the set of outcomes of an experiment.
For x ∈ S, the probability of x, written Pr(x),
such that x∈S Pr(x) = 1.
is a real number between 0 and 1,
Pr is ‘just’ a function which maps each x ∈ S to Pr(x) ∈ [0, 1]
12. Probability
The sample space S is the set of outcomes of an experiment.
For x ∈ S, the probability of x, written Pr(x),
such that x∈S Pr(x) = 1.
is a real number between 0 and 1,
Pr is ‘just’ a function which maps each x ∈ S to Pr(x) ∈ [0, 1]
Flip a coin: S = {H, T}.
Pr(H) = Pr(T) = 1
2 .
EXAMPLE
13. Probability
The sample space S is the set of outcomes of an experiment.
For x ∈ S, the probability of x, written Pr(x),
such that x∈S Pr(x) = 1.
is a real number between 0 and 1,
Pr is ‘just’ a function which maps each x ∈ S to Pr(x) ∈ [0, 1]
14. Probability
The sample space S is the set of outcomes of an experiment.
For x ∈ S, the probability of x, written Pr(x),
such that x∈S Pr(x) = 1.
is a real number between 0 and 1,
Pr is ‘just’ a function which maps each x ∈ S to Pr(x) ∈ [0, 1]
Amount of money you can win when playing some lottery:
Pr(£0) = 0.9, Pr(£10) = 0.08, . . . , Pr(£100, 000) = 0.0001.
EXAMPLE
S = {£0, £10, £100, £1000, £10, 000, £100, 000}.
17. Probability
The sample space is not necessarily finite.
Flip a coin until first tail shows up:
EXAMPLE
S = {T, HT, HHT, HHHT, HHHHT, HHHHHT, . . . }.
18. Probability
The sample space is not necessarily finite.
Flip a coin until first tail shows up:
Pr(“It takes n coin flips”) = 1
2
n
, and
EXAMPLE
S = {T, HT, HHT, HHHT, HHHHT, HHHHHT, . . . }.
19. Probability
The sample space is not necessarily finite.
Flip a coin until first tail shows up:
Pr(“It takes n coin flips”) = 1
2
n
, and
EXAMPLE
S = {T, HT, HHT, HHHT, HHHHT, HHHHHT, . . . }.
∞
n=1
1
2
n
20. Probability
The sample space is not necessarily finite.
Flip a coin until first tail shows up:
Pr(“It takes n coin flips”) = 1
2
n
, and
EXAMPLE
S = {T, HT, HHT, HHHT, HHHHT, HHHHHT, . . . }.
∞
n=1
1
2
n
= 1
2 + 1
4 + 1
8 + 1
16 . . .
21. Probability
The sample space is not necessarily finite.
Flip a coin until first tail shows up:
Pr(“It takes n coin flips”) = 1
2
n
, and
EXAMPLE
S = {T, HT, HHT, HHHT, HHHHT, HHHHHT, . . . }.
∞
n=1
1
2
n
= 1
2 + 1
4 + 1
8 + 1
16 . . . = 1
23. Event
An event is a subset V of the sample space S.
The probability of event V happening, denoted Pr(V ), is
Pr(V ) =
x∈V
Pr(x).
24. Event
An event is a subset V of the sample space S.
EXAMPLE
The probability of event V happening, denoted Pr(V ), is
Pr(V ) =
x∈V
Pr(x).
Flip a coin 3 times: S = {TTT, TTH, THT, HTT, HHT, HTH, THH, HHH}
For each x ∈ S, Pr(x) = 1
8
25. Event
An event is a subset V of the sample space S.
EXAMPLE
The probability of event V happening, denoted Pr(V ), is
Pr(V ) =
x∈V
Pr(x).
Flip a coin 3 times: S = {TTT, TTH, THT, HTT, HHT, HTH, THH, HHH}
For each x ∈ S, Pr(x) = 1
8
Define V to be the event “the first and last coin flips are the same”
26. Event
An event is a subset V of the sample space S.
EXAMPLE
The probability of event V happening, denoted Pr(V ), is
Pr(V ) =
x∈V
Pr(x).
Flip a coin 3 times: S = {TTT, TTH, THT, HTT, HHT, HTH, THH, HHH}
For each x ∈ S, Pr(x) = 1
8
Define V to be the event “the first and last coin flips are the same”
in other words, V = {HHH, HTH, THT, TTT}
27. Event
An event is a subset V of the sample space S.
EXAMPLE
The probability of event V happening, denoted Pr(V ), is
Pr(V ) =
x∈V
Pr(x).
Flip a coin 3 times: S = {TTT, TTH, THT, HTT, HHT, HTH, THH, HHH}
For each x ∈ S, Pr(x) = 1
8
Define V to be the event “the first and last coin flips are the same”
in other words, V = {HHH, HTH, THT, TTT}
What is Pr(V )?
28. Event
An event is a subset V of the sample space S.
Pr(V ) = Pr(HHH) + Pr(HTH) + Pr(THT) + Pr(TTT) = 4 × 1
8 = 1
2 .
EXAMPLE
The probability of event V happening, denoted Pr(V ), is
Pr(V ) =
x∈V
Pr(x).
Flip a coin 3 times: S = {TTT, TTH, THT, HTT, HHT, HTH, THH, HHH}
For each x ∈ S, Pr(x) = 1
8
Define V to be the event “the first and last coin flips are the same”
in other words, V = {HHH, HTH, THT, TTT}
What is Pr(V )?
29. Random variable
A random variable (r.v.) Y over sample space S is a function S → R
i.e. it maps each outcome x ∈ S to some real number Y (x).
30. Random variable
The probability of Y taking value y is
{x ∈ S st. Y(x) = y}
A random variable (r.v.) Y over sample space S is a function S → R
i.e. it maps each outcome x ∈ S to some real number Y (x).
Pr(Y = y) = Pr(x).
31. Random variable
The probability of Y taking value y is
{x ∈ S st. Y(x) = y}
Two coin flips.
H H
H T
T H
T T
2
1
5
2
S Y
EXAMPLE
A random variable (r.v.) Y over sample space S is a function S → R
i.e. it maps each outcome x ∈ S to some real number Y (x).
Pr(Y = y) = Pr(x).
32. Random variable
The probability of Y taking value y is
{x ∈ S st. Y(x) = y}
Two coin flips.
H H
H T
T H
T T
2
1
5
2
S Y
EXAMPLE
A random variable (r.v.) Y over sample space S is a function S → R
i.e. it maps each outcome x ∈ S to some real number Y (x).
Pr(Y = y) = Pr(x).
sum over all values of x such that Y (x) = y
33. Random variable
The probability of Y taking value y is
{x ∈ S st. Y(x) = y}
Two coin flips.
H H
H T
T H
T T
2
1
5
2
S Y
EXAMPLE
A random variable (r.v.) Y over sample space S is a function S → R
i.e. it maps each outcome x ∈ S to some real number Y (x).
Pr(Y = y) = Pr(x).
sum over all values of x such that Y (x) = y
What is Pr(Y = 2)?
34. Random variable
The probability of Y taking value y is
{x ∈ S st. Y(x) = y}
Two coin flips.
H H
H T
T H
T T
2
1
5
2
S Y
EXAMPLE
A random variable (r.v.) Y over sample space S is a function S → R
i.e. it maps each outcome x ∈ S to some real number Y (x).
Pr(Y = y) = Pr(x).
sum over all values of x such that Y (x) = y
What is Pr(Y = 2)?
Pr(Y = 2) =
x∈{HH,TT}
Pr(x) =
1
4
+
1
4
=
1
2
35. Random variable
The probability of Y taking value y is
{x ∈ S st. Y(x) = y}
Two coin flips.
H H
H T
T H
T T
2
1
5
2
S Y
EXAMPLE
A random variable (r.v.) Y over sample space S is a function S → R
i.e. it maps each outcome x ∈ S to some real number Y (x).
Pr(Y = y) = Pr(x).
sum over all values of x such that Y (x) = y
What is Pr(Y = 2)?
Pr(Y = 2) =
x∈{HH,TT}
Pr(x) =
1
4
+
1
4
=
1
2
36. Random variable
The probability of Y taking value y is
{x ∈ S st. Y(x) = y}
Two coin flips.
H H
H T
T H
T T
2
1
5
2
S Y
EXAMPLE
A random variable (r.v.) Y over sample space S is a function S → R
i.e. it maps each outcome x ∈ S to some real number Y (x).
Pr(Y = y) = Pr(x).
sum over all values of x such that Y (x) = y
What is Pr(Y = 2)?
Pr(Y = 2) =
x∈{HH,TT}
Pr(x) =
1
4
+
1
4
=
1
2
37. Random variable
The probability of Y taking value y is
{x ∈ S st. Y(x) = y}
Two coin flips.
H H
H T
T H
T T
2
1
5
2
S Y
Pr(Y = 2) = 1
2
EXAMPLE
A random variable (r.v.) Y over sample space S is a function S → R
i.e. it maps each outcome x ∈ S to some real number Y (x).
Pr(Y = y) = Pr(x).
sum over all values of x such that Y (x) = y
What is Pr(Y = 2)?
Pr(Y = 2) =
x∈{HH,TT}
Pr(x) =
1
4
+
1
4
=
1
2
38. Random variable
The probability of Y taking value y is
{x ∈ S st. Y(x) = y}
Two coin flips.
H H
H T
T H
T T
2
1
5
2
S Y
Pr(Y = 2) = 1
2
EXAMPLE
A random variable (r.v.) Y over sample space S is a function S → R
i.e. it maps each outcome x ∈ S to some real number Y (x).
Pr(Y = y) = Pr(x).
39. Random variable
The probability of Y taking value y is
{x ∈ S st. Y(x) = y}
Two coin flips.
H H
H T
T H
T T
2
1
5
2
S Y
Pr(Y = 2) = 1
2
The expected value (the mean) of a r.v. Y ,
EXAMPLE
denoted E(Y ), is
E(Y ) =
x∈S
Y (x)·Pr(x).
A random variable (r.v.) Y over sample space S is a function S → R
i.e. it maps each outcome x ∈ S to some real number Y (x).
Pr(Y = y) = Pr(x).
40. Random variable
The probability of Y taking value y is
{x ∈ S st. Y(x) = y}
Two coin flips.
H H
H T
T H
T T
2
1
5
2
S Y
Pr(Y = 2) = 1
2
The expected value (the mean) of a r.v. Y ,
E(Y ) = 2 · 1
2 + 1 · 1
4 + 5 · 1
4 = 7
2
EXAMPLE
denoted E(Y ), is
E(Y ) =
x∈S
Y (x)·Pr(x).
A random variable (r.v.) Y over sample space S is a function S → R
i.e. it maps each outcome x ∈ S to some real number Y (x).
Pr(Y = y) = Pr(x).
41. Linearity of expectation
Let Y1, Y2, . . . , Yk be k random variables. Then
E
k
i=1
Yi =
k
i=1
E(Yi)
THEOREM (Linearity of expectation)
42. Linearity of expectation
Let Y1, Y2, . . . , Yk be k random variables. Then
E
k
i=1
Yi =
k
i=1
E(Yi)
Linearity of expectation always holds,
THEOREM (Linearity of expectation)
(regardless of whether the random variables are independent or not.)
43. Linearity of expectation
Let Y1, Y2, . . . , Yk be k random variables. Then
E
k
i=1
Yi =
k
i=1
E(Yi)
Linearity of expectation always holds,
THEOREM (Linearity of expectation)
EXAMPLE
(regardless of whether the random variables are independent or not.)
Roll two dice. Let the r.v. Y be the sum of the values.
44. Linearity of expectation
Let Y1, Y2, . . . , Yk be k random variables. Then
E
k
i=1
Yi =
k
i=1
E(Yi)
Linearity of expectation always holds,
THEOREM (Linearity of expectation)
EXAMPLE
(regardless of whether the random variables are independent or not.)
Roll two dice. Let the r.v. Y be the sum of the values.
random variable
45. Linearity of expectation
Let Y1, Y2, . . . , Yk be k random variables. Then
E
k
i=1
Yi =
k
i=1
E(Yi)
Linearity of expectation always holds,
THEOREM (Linearity of expectation)
EXAMPLE
(regardless of whether the random variables are independent or not.)
Roll two dice. Let the r.v. Y be the sum of the values.
46. Linearity of expectation
Let Y1, Y2, . . . , Yk be k random variables. Then
E
k
i=1
Yi =
k
i=1
E(Yi)
Linearity of expectation always holds,
THEOREM (Linearity of expectation)
EXAMPLE
(regardless of whether the random variables are independent or not.)
Roll two dice. Let the r.v. Y be the sum of the values.
What is E(Y )?
47. Linearity of expectation
Let Y1, Y2, . . . , Yk be k random variables. Then
E
k
i=1
Yi =
k
i=1
E(Yi)
Linearity of expectation always holds,
THEOREM (Linearity of expectation)
EXAMPLE
(regardless of whether the random variables are independent or not.)
Roll two dice. Let the r.v. Y be the sum of the values.
What is E(Y )?
Approach 1: (without the theorem)
48. Linearity of expectation
Let Y1, Y2, . . . , Yk be k random variables. Then
E
k
i=1
Yi =
k
i=1
E(Yi)
Linearity of expectation always holds,
THEOREM (Linearity of expectation)
EXAMPLE
(regardless of whether the random variables are independent or not.)
Roll two dice. Let the r.v. Y be the sum of the values.
What is E(Y )?
Approach 1: (without the theorem)
The sample space S = {(1, 1), (1, 2), (1, 3) . . . (6, 6)} (36 outcomes)
49. Linearity of expectation
Let Y1, Y2, . . . , Yk be k random variables. Then
E
k
i=1
Yi =
k
i=1
E(Yi)
Linearity of expectation always holds,
E(Y ) = x∈S Y (x) · Pr(x) = 1
36 x∈S Y (x) =
THEOREM (Linearity of expectation)
EXAMPLE
(regardless of whether the random variables are independent or not.)
Roll two dice. Let the r.v. Y be the sum of the values.
What is E(Y )?
Approach 1: (without the theorem)
The sample space S = {(1, 1), (1, 2), (1, 3) . . . (6, 6)} (36 outcomes)
50. Linearity of expectation
Let Y1, Y2, . . . , Yk be k random variables. Then
E
k
i=1
Yi =
k
i=1
E(Yi)
Linearity of expectation always holds,
E(Y ) = x∈S Y (x) · Pr(x) = 1
36 x∈S Y (x) =
THEOREM (Linearity of expectation)
EXAMPLE
(regardless of whether the random variables are independent or not.)
Roll two dice. Let the r.v. Y be the sum of the values.
What is E(Y )?
Approach 1: (without the theorem)
The sample space S = {(1, 1), (1, 2), (1, 3) . . . (6, 6)} (36 outcomes)
51. Linearity of expectation
Let Y1, Y2, . . . , Yk be k random variables. Then
E
k
i=1
Yi =
k
i=1
E(Yi)
Linearity of expectation always holds,
E(Y ) = x∈S Y (x) · Pr(x) = 1
36 x∈S Y (x) =
THEOREM (Linearity of expectation)
EXAMPLE
(regardless of whether the random variables are independent or not.)
Roll two dice. Let the r.v. Y be the sum of the values.
What is E(Y )?
Approach 1: (without the theorem)
The sample space S = {(1, 1), (1, 2), (1, 3) . . . (6, 6)} (36 outcomes)
1
36 (1 · 2 + 2 · 3 + 3 · 4 + · · · + 1 · 12) = 7
52. Linearity of expectation
Let Y1, Y2, . . . , Yk be k random variables. Then
E
k
i=1
Yi =
k
i=1
E(Yi)
Linearity of expectation always holds,
E(Y ) = x∈S Y (x) · Pr(x) = 1
36 x∈S Y (x) =
THEOREM (Linearity of expectation)
EXAMPLE
(regardless of whether the random variables are independent or not.)
Roll two dice. Let the r.v. Y be the sum of the values.
What is E(Y )?
Approach 1: (without the theorem)
The sample space S = {(1, 1), (1, 2), (1, 3) . . . (6, 6)} (36 outcomes)
1
36 (1 · 2 + 2 · 3 + 3 · 4 + · · · + 1 · 12) = 7
53. Linearity of expectation
Let Y1, Y2, . . . , Yk be k random variables. Then
E
k
i=1
Yi =
k
i=1
E(Yi)
Linearity of expectation always holds,
THEOREM (Linearity of expectation)
EXAMPLE
(regardless of whether the random variables are independent or not.)
Roll two dice. Let the r.v. Y be the sum of the values.
What is E(Y )?
54. Linearity of expectation
Let Y1, Y2, . . . , Yk be k random variables. Then
E
k
i=1
Yi =
k
i=1
E(Yi)
Linearity of expectation always holds,
THEOREM (Linearity of expectation)
EXAMPLE
(regardless of whether the random variables are independent or not.)
Roll two dice. Let the r.v. Y be the sum of the values.
What is E(Y )?
Approach 2: (with the theorem)
55. Linearity of expectation
Let Y1, Y2, . . . , Yk be k random variables. Then
E
k
i=1
Yi =
k
i=1
E(Yi)
Linearity of expectation always holds,
THEOREM (Linearity of expectation)
EXAMPLE
(regardless of whether the random variables are independent or not.)
Roll two dice. Let the r.v. Y be the sum of the values.
What is E(Y )?
Approach 2: (with the theorem)
Let the r.v. Y1 be the value of the first die and Y2 the value of the second
56. Linearity of expectation
Let Y1, Y2, . . . , Yk be k random variables. Then
E
k
i=1
Yi =
k
i=1
E(Yi)
Linearity of expectation always holds,
THEOREM (Linearity of expectation)
EXAMPLE
(regardless of whether the random variables are independent or not.)
Roll two dice. Let the r.v. Y be the sum of the values.
What is E(Y )?
Approach 2: (with the theorem)
Let the r.v. Y1 be the value of the first die and Y2 the value of the second
E(Y1) = E(Y2) = 3.5
57. Linearity of expectation
Let Y1, Y2, . . . , Yk be k random variables. Then
E
k
i=1
Yi =
k
i=1
E(Yi)
Linearity of expectation always holds,
THEOREM (Linearity of expectation)
EXAMPLE
(regardless of whether the random variables are independent or not.)
Roll two dice. Let the r.v. Y be the sum of the values.
What is E(Y )?
Approach 2: (with the theorem)
Let the r.v. Y1 be the value of the first die and Y2 the value of the second
E(Y1) = E(Y2) = 3.5
so E(Y ) = E(Y1 + Y2) = E(Y1) + E(Y2) = 7
58. Indicator random variables
An indicator random variable is a r.v. that can only be 0 or 1.
(usually referred to by the letter I)
59. Indicator random variables
An indicator random variable is a r.v. that can only be 0 or 1.
(usually referred to by the letter I)
Fact: E(I) = 0 · Pr(I = 0) + 1 · Pr(I = 1) = Pr(I = 1).
60. Indicator random variables
An indicator random variable is a r.v. that can only be 0 or 1.
(usually referred to by the letter I)
Fact: E(I) = Pr(I = 1).
61. Indicator random variables
An indicator random variable is a r.v. that can only be 0 or 1.
Often an indicator r.v. I is associated with an event such that
(usually referred to by the letter I)
Fact: E(I) = Pr(I = 1).
I = 1 if the event happens (and I = 0 otherwise).
62. Indicator random variables
An indicator random variable is a r.v. that can only be 0 or 1.
Often an indicator r.v. I is associated with an event such that
Indicator random variables and linearity of expectation work great together!
(usually referred to by the letter I)
Fact: E(I) = Pr(I = 1).
I = 1 if the event happens (and I = 0 otherwise).
63. Indicator random variables
An indicator random variable is a r.v. that can only be 0 or 1.
Roll a die n times.
Often an indicator r.v. I is associated with an event such that
Indicator random variables and linearity of expectation work great together!
EXAMPLE
(usually referred to by the letter I)
Fact: E(I) = Pr(I = 1).
I = 1 if the event happens (and I = 0 otherwise).
64. Indicator random variables
An indicator random variable is a r.v. that can only be 0 or 1.
Roll a die n times.
Often an indicator r.v. I is associated with an event such that
Indicator random variables and linearity of expectation work great together!
EXAMPLE
(usually referred to by the letter I)
Fact: E(I) = Pr(I = 1).
I = 1 if the event happens (and I = 0 otherwise).
How many rolls do we expect to show a value
that is at least the value of the previous roll?
65. Indicator random variables
An indicator random variable is a r.v. that can only be 0 or 1.
Roll a die n times.
Often an indicator r.v. I is associated with an event such that
Indicator random variables and linearity of expectation work great together!
EXAMPLE
(usually referred to by the letter I)
Fact: E(I) = Pr(I = 1).
I = 1 if the event happens (and I = 0 otherwise).
How many rolls do we expect to show a value
that is at least the value of the previous roll?
For j ∈ {2, . . . , n}, let indicator r.v. Ij = 1 if the value of the jth roll
is at least the value of the previous roll (and Ij = 0 otherwise)
66. Indicator random variables
An indicator random variable is a r.v. that can only be 0 or 1.
Roll a die n times.
Pr(Ij = 1) = 21
36 = 7
12 . (by counting the outcomes)
Often an indicator r.v. I is associated with an event such that
Indicator random variables and linearity of expectation work great together!
EXAMPLE
(usually referred to by the letter I)
Fact: E(I) = Pr(I = 1).
I = 1 if the event happens (and I = 0 otherwise).
How many rolls do we expect to show a value
that is at least the value of the previous roll?
For j ∈ {2, . . . , n}, let indicator r.v. Ij = 1 if the value of the jth roll
is at least the value of the previous roll (and Ij = 0 otherwise)
67. Indicator random variables
An indicator random variable is a r.v. that can only be 0 or 1.
Roll a die n times.
Pr(Ij = 1) = 21
36 = 7
12 . (by counting the outcomes)
Often an indicator r.v. I is associated with an event such that
Indicator random variables and linearity of expectation work great together!
EXAMPLE
(usually referred to by the letter I)
Fact: E(I) = Pr(I = 1).
I = 1 if the event happens (and I = 0 otherwise).
How many rolls do we expect to show a value
that is at least the value of the previous roll?
For j ∈ {2, . . . , n}, let indicator r.v. Ij = 1 if the value of the jth roll
is at least the value of the previous roll (and Ij = 0 otherwise)
E
n
j=2
Ij =
n
j=2
E(Ij) =
n
j=2
Pr(Ij = 1) = (n − 1) ·
7
12
68. Indicator random variables
An indicator random variable is a r.v. that can only be 0 or 1.
Roll a die n times.
Pr(Ij = 1) = 21
36 = 7
12 . (by counting the outcomes)
Often an indicator r.v. I is associated with an event such that
Indicator random variables and linearity of expectation work great together!
EXAMPLE
(usually referred to by the letter I)
Fact: E(I) = Pr(I = 1).
I = 1 if the event happens (and I = 0 otherwise).
How many rolls do we expect to show a value
that is at least the value of the previous roll?
For j ∈ {2, . . . , n}, let indicator r.v. Ij = 1 if the value of the jth roll
is at least the value of the previous roll (and Ij = 0 otherwise)
E
n
j=2
Ij =
n
j=2
E(Ij) =
n
j=2
Pr(Ij = 1) = (n − 1) ·
7
12
Linearity of Expectation
Let Y1, Y2, . . . , Yk be k random variables. Then
E
k
i=1
Yi =
k
i=1
E(Yi)
69. Indicator random variables
An indicator random variable is a r.v. that can only be 0 or 1.
Roll a die n times.
Pr(Ij = 1) = 21
36 = 7
12 . (by counting the outcomes)
Often an indicator r.v. I is associated with an event such that
Indicator random variables and linearity of expectation work great together!
EXAMPLE
(usually referred to by the letter I)
Fact: E(I) = Pr(I = 1).
I = 1 if the event happens (and I = 0 otherwise).
How many rolls do we expect to show a value
that is at least the value of the previous roll?
For j ∈ {2, . . . , n}, let indicator r.v. Ij = 1 if the value of the jth roll
is at least the value of the previous roll (and Ij = 0 otherwise)
E
n
j=2
Ij =
n
j=2
E(Ij) =
n
j=2
Pr(Ij = 1) = (n − 1) ·
7
12
70. Indicator random variables
An indicator random variable is a r.v. that can only be 0 or 1.
Roll a die n times.
Pr(Ij = 1) = 21
36 = 7
12 . (by counting the outcomes)
Often an indicator r.v. I is associated with an event such that
Indicator random variables and linearity of expectation work great together!
EXAMPLE
(usually referred to by the letter I)
Fact: E(I) = Pr(I = 1).
I = 1 if the event happens (and I = 0 otherwise).
How many rolls do we expect to show a value
that is at least the value of the previous roll?
For j ∈ {2, . . . , n}, let indicator r.v. Ij = 1 if the value of the jth roll
is at least the value of the previous roll (and Ij = 0 otherwise)
E
n
j=2
Ij =
n
j=2
E(Ij) =
n
j=2
Pr(Ij = 1) = (n − 1) ·
7
12
71. Indicator random variables
An indicator random variable is a r.v. that can only be 0 or 1.
Roll a die n times.
Pr(Ij = 1) = 21
36 = 7
12 . (by counting the outcomes)
Often an indicator r.v. I is associated with an event such that
Indicator random variables and linearity of expectation work great together!
EXAMPLE
(usually referred to by the letter I)
Fact: E(I) = Pr(I = 1).
I = 1 if the event happens (and I = 0 otherwise).
How many rolls do we expect to show a value
that is at least the value of the previous roll?
For j ∈ {2, . . . , n}, let indicator r.v. Ij = 1 if the value of the jth roll
is at least the value of the previous roll (and Ij = 0 otherwise)
E
n
j=2
Ij =
n
j=2
E(Ij) =
n
j=2
Pr(Ij = 1) = (n − 1) ·
7
12
73. Markov’s inequality
It then follows that at most
EXAMPLE
Suppose that the average (mean) speed on the motorway is 60 mph.
74. Markov’s inequality
It then follows that at most
EXAMPLE
Suppose that the average (mean) speed on the motorway is 60 mph.
1
2 of all cars drive at least 120 mph,
75. Markov’s inequality
It then follows that at most
EXAMPLE
. . . otherwise the mean must be higher than 60 mph. (a contradiction)
Suppose that the average (mean) speed on the motorway is 60 mph.
1
2 of all cars drive at least 120 mph,
76. Markov’s inequality
It then follows that at most
EXAMPLE
. . . otherwise the mean must be higher than 60 mph. (a contradiction)
Suppose that the average (mean) speed on the motorway is 60 mph.
2
3 of all cars drive at least 90 mph,
77. Markov’s inequality
It then follows that at most
If X is a non-negative r.v., then for all a > 0,
Pr(X ≥ a) ≤
E(X)
a
.
THEOREM (Markov’s inequality)
EXAMPLE
. . . otherwise the mean must be higher than 60 mph. (a contradiction)
Suppose that the average (mean) speed on the motorway is 60 mph.
2
3 of all cars drive at least 90 mph,
78. Markov’s inequality
It then follows that at most
If X is a non-negative r.v., then for all a > 0,
Pr(X ≥ a) ≤
E(X)
a
.
From the example above:
Pr(speed of a random car ≥ 120 mph) ≤ 60
120 = 1
2 ,
Pr(speed of a random car ≥ 90mph) ≤ 60
90 = 2
3 .
EXAMPLE
THEOREM (Markov’s inequality)
EXAMPLE
. . . otherwise the mean must be higher than 60 mph. (a contradiction)
Suppose that the average (mean) speed on the motorway is 60 mph.
2
3 of all cars drive at least 90 mph,
80. Markov’s inequality
EXAMPLE
n people go to a party, leaving their hats at the door.
Each person leaves with a random hat.
How many people leave with their own hat?
81. Markov’s inequality
For j ∈ {1, . . . , n}, let indicator r.v. Ij = 1 if the jth person gets their own hat,
EXAMPLE
n people go to a party, leaving their hats at the door.
Each person leaves with a random hat.
How many people leave with their own hat?
otherwise Ij = 0.
82. Markov’s inequality
For j ∈ {1, . . . , n}, let indicator r.v. Ij = 1 if the jth person gets their own hat,
EXAMPLE
n people go to a party, leaving their hats at the door.
Each person leaves with a random hat.
How many people leave with their own hat?
E
n
j=1
Ij =
n
j=1
E(Ij) =
n
j=1
Pr(Ij = 1) = n·
1
n
= 1.
otherwise Ij = 0.
By linearity of expectation. . .
83. Markov’s inequality
For j ∈ {1, . . . , n}, let indicator r.v. Ij = 1 if the jth person gets their own hat,
EXAMPLE
n people go to a party, leaving their hats at the door.
Each person leaves with a random hat.
How many people leave with their own hat?
E
n
j=1
Ij =
n
j=1
E(Ij) =
n
j=1
Pr(Ij = 1) = n·
1
n
= 1.
otherwise Ij = 0.
By linearity of expectation. . . Fact: E(I) = Pr(I = 1).
84. Markov’s inequality
For j ∈ {1, . . . , n}, let indicator r.v. Ij = 1 if the jth person gets their own hat,
EXAMPLE
n people go to a party, leaving their hats at the door.
Each person leaves with a random hat.
How many people leave with their own hat?
E
n
j=1
Ij =
n
j=1
E(Ij) =
n
j=1
Pr(Ij = 1) = n·
1
n
= 1.
otherwise Ij = 0.
By linearity of expectation. . .
85. Markov’s inequality
For j ∈ {1, . . . , n}, let indicator r.v. Ij = 1 if the jth person gets their own hat,
By Markov’s inequality (recall: Pr(X ≥ a) ≤
E(X)
a ),
EXAMPLE
n people go to a party, leaving their hats at the door.
Each person leaves with a random hat.
How many people leave with their own hat?
E
n
j=1
Ij =
n
j=1
E(Ij) =
n
j=1
Pr(Ij = 1) = n·
1
n
= 1.
otherwise Ij = 0.
By linearity of expectation. . .
86. Markov’s inequality
For j ∈ {1, . . . , n}, let indicator r.v. Ij = 1 if the jth person gets their own hat,
By Markov’s inequality (recall: Pr(X ≥ a) ≤
E(X)
a ),
EXAMPLE
n people go to a party, leaving their hats at the door.
Each person leaves with a random hat.
How many people leave with their own hat?
E
n
j=1
Ij =
n
j=1
E(Ij) =
n
j=1
Pr(Ij = 1) = n·
1
n
= 1.
otherwise Ij = 0.
By linearity of expectation. . .
Pr(5 or more people leaving with their own hats) ≤ 1
5 ,
87. Markov’s inequality
For j ∈ {1, . . . , n}, let indicator r.v. Ij = 1 if the jth person gets their own hat,
By Markov’s inequality (recall: Pr(X ≥ a) ≤
E(X)
a ),
EXAMPLE
n people go to a party, leaving their hats at the door.
Each person leaves with a random hat.
How many people leave with their own hat?
E
n
j=1
Ij =
n
j=1
E(Ij) =
n
j=1
Pr(Ij = 1) = n·
1
n
= 1.
otherwise Ij = 0.
By linearity of expectation. . .
Pr(5 or more people leaving with their own hats) ≤ 1
5 ,
Pr(at least 1 person leaving with their own hat) ≤ 1
1 = 1.
88. Markov’s inequality
For j ∈ {1, . . . , n}, let indicator r.v. Ij = 1 if the jth person gets their own hat,
By Markov’s inequality (recall: Pr(X ≥ a) ≤
E(X)
a ),
(sometimes Markov’s inequality is not particularly informative)
EXAMPLE
n people go to a party, leaving their hats at the door.
Each person leaves with a random hat.
How many people leave with their own hat?
E
n
j=1
Ij =
n
j=1
E(Ij) =
n
j=1
Pr(Ij = 1) = n·
1
n
= 1.
otherwise Ij = 0.
By linearity of expectation. . .
Pr(5 or more people leaving with their own hats) ≤ 1
5 ,
Pr(at least 1 person leaving with their own hat) ≤ 1
1 = 1.
89. Markov’s inequality
For j ∈ {1, . . . , n}, let indicator r.v. Ij = 1 if the jth person gets their own hat,
By Markov’s inequality (recall: Pr(X ≥ a) ≤
E(X)
a ),
(sometimes Markov’s inequality is not particularly informative)
EXAMPLE
In fact, here it can be shown that as n → ∞, the probability that at least
one person leaves with their own hat is 1 − 1
e .
n people go to a party, leaving their hats at the door.
Each person leaves with a random hat.
How many people leave with their own hat?
E
n
j=1
Ij =
n
j=1
E(Ij) =
n
j=1
Pr(Ij = 1) = n·
1
n
= 1.
otherwise Ij = 0.
By linearity of expectation. . .
Pr(5 or more people leaving with their own hats) ≤ 1
5 ,
Pr(at least 1 person leaving with their own hat) ≤ 1
1 = 1.
90. Markov’s inequality
If X is a non-negative r.v. that only takes integer values, then
Pr(X > 0) = Pr(X ≥ 1) ≤ E(X) .
COROLLARY
For an indicator r.v. I, the bound is tight (=), as Pr(I > 0) = E(I).
91. Union bound
Let V1, . . . , Vk be k events. Then
Pr
k
i=1
Vi ≤
k
i=1
Pr(Vi).
THEOREM (union bound)
92. Union bound
Let V1, . . . , Vk be k events. Then
Pr
k
i=1
Vi ≤
k
i=1
Pr(Vi).
THEOREM (union bound)
This is the probability at least one of the events happens
93. Union bound
Let V1, . . . , Vk be k events. Then
Pr
k
i=1
Vi ≤
k
i=1
Pr(Vi).
THEOREM (union bound)
94. Union bound
Let V1, . . . , Vk be k events. Then
Pr
k
i=1
Vi ≤
k
i=1
Pr(Vi).
This bound is tight (=) when the events are all disjoint.
THEOREM (union bound)
(Vi and Vj are disjoint iff Vi ∩ Vj is empty)
95. Union bound
Let V1, . . . , Vk be k events. Then
Pr
k
i=1
Vi ≤
k
i=1
Pr(Vi).
This bound is tight (=) when the events are all disjoint.
THEOREM (union bound)
PROOF
(Vi and Vj are disjoint iff Vi ∩ Vj is empty)
96. Union bound
Let V1, . . . , Vk be k events. Then
Pr
k
i=1
Vi ≤
k
i=1
Pr(Vi).
This bound is tight (=) when the events are all disjoint.
Define indicator r.v. Ij to be 1 if event Vj happens, otherwise Ij = 0.
THEOREM (union bound)
PROOF
(Vi and Vj are disjoint iff Vi ∩ Vj is empty)
97. Union bound
Let V1, . . . , Vk be k events. Then
Pr
k
i=1
Vi ≤
k
i=1
Pr(Vi).
This bound is tight (=) when the events are all disjoint.
Define indicator r.v. Ij to be 1 if event Vj happens, otherwise Ij = 0.
THEOREM (union bound)
PROOF
(Vi and Vj are disjoint iff Vi ∩ Vj is empty)
Let the r.v. X = k
j=1 Ij be the number of events that happen.
98. Union bound
Let V1, . . . , Vk be k events. Then
Pr
k
i=1
Vi ≤
k
i=1
Pr(Vi).
This bound is tight (=) when the events are all disjoint.
Define indicator r.v. Ij to be 1 if event Vj happens, otherwise Ij = 0.
THEOREM (union bound)
PROOF
(Vi and Vj are disjoint iff Vi ∩ Vj is empty)
Pr k
j=1 Vj = Pr(X >0) ≤ E(X) = E( k
j=1 Ij) = k
j=1 E(Ij)
Let the r.v. X = k
j=1 Ij be the number of events that happen.
= k
j=1 Pr(Vj)
99. Union bound
Let V1, . . . , Vk be k events. Then
Pr
k
i=1
Vi ≤
k
i=1
Pr(Vi).
This bound is tight (=) when the events are all disjoint.
by previous
Define indicator r.v. Ij to be 1 if event Vj happens, otherwise Ij = 0.
THEOREM (union bound)
PROOF
(Vi and Vj are disjoint iff Vi ∩ Vj is empty)
Pr k
j=1 Vj = Pr(X >0) ≤ E(X) = E( k
j=1 Ij) = k
j=1 E(Ij)
Let the r.v. X = k
j=1 Ij be the number of events that happen.
= k
j=1 Pr(Vj)
Markov corollary
100. Union bound
Let V1, . . . , Vk be k events. Then
Pr
k
i=1
Vi ≤
k
i=1
Pr(Vi).
This bound is tight (=) when the events are all disjoint.
by previous
Linearity of expectation
Define indicator r.v. Ij to be 1 if event Vj happens, otherwise Ij = 0.
THEOREM (union bound)
PROOF
(Vi and Vj are disjoint iff Vi ∩ Vj is empty)
Pr k
j=1 Vj = Pr(X >0) ≤ E(X) = E( k
j=1 Ij) = k
j=1 E(Ij)
Let the r.v. X = k
j=1 Ij be the number of events that happen.
= k
j=1 Pr(Vj)
Markov corollary
101. Union bound
Let V1, . . . , Vk be k events. Then
Pr
k
i=1
Vi ≤
k
i=1
Pr(Vi).
This bound is tight (=) when the events are all disjoint.
by previous
Linearity of expectation
Define indicator r.v. Ij to be 1 if event Vj happens, otherwise Ij = 0.
THEOREM (union bound)
PROOF
(Vi and Vj are disjoint iff Vi ∩ Vj is empty)
Pr k
j=1 Vj = Pr(X >0) ≤ E(X) = E( k
j=1 Ij) = k
j=1 E(Ij)
Let the r.v. X = k
j=1 Ij be the number of events that happen.
= k
j=1 Pr(Vj)
Markov corollary
E(I) = Pr(I = 1)
102. Union bound
Let V1, . . . , Vk be k events. Then
Pr
k
i=1
Vi ≤
k
i=1
Pr(Vi).
This bound is tight (=) when the events are all disjoint.
THEOREM (union bound)
(Vi and Vj are disjoint iff Vi ∩ Vj is empty)
103. Union bound
Let V1, . . . , Vk be k events. Then
Pr
k
i=1
Vi ≤
k
i=1
Pr(Vi).
This bound is tight (=) when the events are all disjoint.
THEOREM (union bound)
(Vi and Vj are disjoint iff Vi ∩ Vj is empty)
S = {1, . . . , 6} is the set of outcomes of a die roll.
EXAMPLE
104. Union bound
Let V1, . . . , Vk be k events. Then
Pr
k
i=1
Vi ≤
k
i=1
Pr(Vi).
This bound is tight (=) when the events are all disjoint.
THEOREM (union bound)
(Vi and Vj are disjoint iff Vi ∩ Vj is empty)
S = {1, . . . , 6} is the set of outcomes of a die roll.
EXAMPLE
We define two events: V1 = {3, 4}
V2 = {1, 2, 3}
105. Union bound
Let V1, . . . , Vk be k events. Then
Pr
k
i=1
Vi ≤
k
i=1
Pr(Vi).
This bound is tight (=) when the events are all disjoint.
THEOREM (union bound)
(Vi and Vj are disjoint iff Vi ∩ Vj is empty)
S = {1, . . . , 6} is the set of outcomes of a die roll.
Pr(V1 ∪ V2) ≤ Pr(V1) + Pr(V2) = 1
3 + 1
2 = 5
6
EXAMPLE
We define two events: V1 = {3, 4}
V2 = {1, 2, 3}
106. Union bound
Let V1, . . . , Vk be k events. Then
Pr
k
i=1
Vi ≤
k
i=1
Pr(Vi).
This bound is tight (=) when the events are all disjoint.
THEOREM (union bound)
(Vi and Vj are disjoint iff Vi ∩ Vj is empty)
S = {1, . . . , 6} is the set of outcomes of a die roll.
1
2
S
V1
V2
4
6
3
5
Pr(V1 ∪ V2) ≤ Pr(V1) + Pr(V2) = 1
3 + 1
2 = 5
6
EXAMPLE
We define two events: V1 = {3, 4}
V2 = {1, 2, 3}
107. Union bound
Let V1, . . . , Vk be k events. Then
Pr
k
i=1
Vi ≤
k
i=1
Pr(Vi).
This bound is tight (=) when the events are all disjoint.
THEOREM (union bound)
(Vi and Vj are disjoint iff Vi ∩ Vj is empty)
S = {1, . . . , 6} is the set of outcomes of a die roll.
1
2
S
V1
V2
4
6
3
5
Pr(V1 ∪ V2) ≤ Pr(V1) + Pr(V2) = 1
3 + 1
2 = 5
6
EXAMPLE
We define two events: V1 = {3, 4}
V2 = {1, 2, 3}
in fact, Pr(V1 ∪ V2) = 2
3
108. Union bound
Let V1, . . . , Vk be k events. Then
Pr
k
i=1
Vi ≤
k
i=1
Pr(Vi).
This bound is tight (=) when the events are all disjoint.
THEOREM (union bound)
(Vi and Vj are disjoint iff Vi ∩ Vj is empty)
S = {1, . . . , 6} is the set of outcomes of a die roll.
1
2
S
V1
V2
4
6
3
5
Pr(V1 ∪ V2) ≤ Pr(V1) + Pr(V2) = 1
3 + 1
2 = 5
6
EXAMPLE
We define two events: V1 = {3, 4}
V2 = {1, 2, 3}
in fact, Pr(V1 ∪ V2) = 2
3 (3 was ‘double counted’)
109. Union bound
Let V1, . . . , Vk be k events. Then
Pr
k
i=1
Vi ≤
k
i=1
Pr(Vi).
This bound is tight (=) when the events are all disjoint.
THEOREM (union bound)
(Vi and Vj are disjoint iff Vi ∩ Vj is empty)
S = {1, . . . , 6} is the set of outcomes of a die roll.
1
2
S
V1
V2
4
6
3
5
Pr(V1 ∪ V2) ≤ Pr(V1) + Pr(V2) = 1
3 + 1
2 = 5
6
EXAMPLE
We define two events: V1 = {3, 4}
V2 = {1, 2, 3}
in fact, Pr(V1 ∪ V2) = 2
3 (3 was ‘double counted’)
Typically the union bound is used when each Pr(Vi) is much smaller than k.
110. Summary
The sample space S is the set of outcomes of an experiment.
For x ∈ S, the probability of x, written Pr(x),
such that x∈S Pr(x) = 1.
is a real number between 0 and 1,
An event is a subset V of the sample space S, Pr(V ) = x∈V Pr(x)
The probability of Y taking value y is
{x ∈ S st. Y(x) = y}
A random variable (r.v.) Y is a function which maps x ∈ S to S(x) ∈ R
Pr(Y = y) = Pr(x).
The expected value (the mean) of Y is E(Y ) =
x∈S
Y (x)·Pr(x).
An indicator random variable is a r.v. that can only be 0 or 1.
Fact: E(I) = Pr(I = 1).
Let V1, . . . , Vk be k events then,
THEOREM (union bound)
Pr
k
i=1
Vi ≤
k
i=1
Pr(Vi).
If X is a non-negative r.v., then for all a > 0,
THEOREM (Markov’s inequality)
Pr(X ≥ a) ≤
E(X)
a
.
Let Y1,Y2,...,Yk be k random variables then,
E
k
i=1
Yi =
k
i=1
E(Yi)
THEOREM (Linearity of expectation)