Probability Recap

Advanced Algorithms – COMS31900
Probability recap.
(based on slides by Markus Jalsenius)
Benjamin Sach

Probability
The sample space S is the set of outcomes of an experiment.
EXAMPLES
Roll a die: S = {1, 2, 3, 4, 5, 6}.
Flip a coin: S = {H, T}.
Amount of money you can win when playing some lottery:
S = {£0, £10, £100, £1000, £10,000, £100,000}.
For x ∈ S, the probability of x, written Pr(x), is a real number between 0 and 1,
such that $\sum_{x \in S} \Pr(x) = 1$.
Pr is ‘just’ a function which maps each x ∈ S to Pr(x) ∈ [0, 1].

Probability
EXAMPLE
Roll a die: S = {1, 2, 3, 4, 5, 6}.
$\Pr(1) = \Pr(2) = \Pr(3) = \Pr(4) = \Pr(5) = \Pr(6) = \frac{1}{6}$.
EXAMPLE
Flip a coin: S = {H, T}.
$\Pr(H) = \Pr(T) = \frac{1}{2}$.
EXAMPLE
S = {£0, £10, £100, £1000, £10,000, £100,000}.
Pr(£0) = 0.9, Pr(£10) = 0.08, . . . , Pr(£100,000) = 0.0001.
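The idea that Pr is just a function from outcomes to numbers in [0, 1] can be sketched in a few lines of Python. The names `die`, `coin` and `is_distribution` are illustrative choices, not part of the slides, and exact fractions are used so the sum can be compared with 1 without rounding.

```python
from fractions import Fraction

# Pr as a plain mapping from each outcome x in S to Pr(x).
die = {x: Fraction(1, 6) for x in range(1, 7)}
coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}


def is_distribution(pr):
    """Check that every Pr(x) lies in [0, 1] and that the values sum to 1."""
    return all(0 <= p <= 1 for p in pr.values()) and sum(pr.values()) == 1


print(is_distribution(die), is_distribution(coin))
```

The lottery example is left out on purpose, because the slides elide some of its probabilities.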

Probability
The sample space is not necessarily finite.
EXAMPLE
Flip a coin until the first tail shows up:
S = {T, HT, HHT, HHHT, HHHHT, HHHHHT, . . . }.
$\Pr(\text{“It takes } n \text{ coin flips”}) = \left(\frac{1}{2}\right)^n$, and
$\sum_{n=1}^{\infty} \left(\frac{1}{2}\right)^n = \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \cdots = 1$.
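To see why the infinite sum equals 1, it may help to look at partial sums. The sketch below (the helper name `partial_sum` is an assumption made for illustration) computes them exactly and shows that the gap to 1 after N terms is (1/2)^N.

```python
from fractions import Fraction


def partial_sum(terms):
    """Sum of (1/2)^n for n = 1..terms, computed exactly."""
    return sum(Fraction(1, 2) ** n for n in range(1, terms + 1))


for terms in (1, 2, 4, 10):
    # The gap to 1 is exactly (1/2)^terms, so it halves with every extra term.
    print(terms, partial_sum(terms), 1 - partial_sum(terms))
```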

Event
An event is a subset V of the sample space S.
The probability of event V happening, denoted Pr(V), is
$\Pr(V) = \sum_{x \in V} \Pr(x)$.
EXAMPLE
Flip a coin 3 times: S = {TTT, TTH, THT, HTT, HHT, HTH, THH, HHH}.
For each x ∈ S, $\Pr(x) = \frac{1}{8}$.
Define V to be the event “the first and last coin flips are the same”,
in other words, V = {HHH, HTH, THT, TTT}.
What is Pr(V)?
$\Pr(V) = \Pr(HHH) + \Pr(HTH) + \Pr(THT) + \Pr(TTT) = 4 \times \frac{1}{8} = \frac{1}{2}$.
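The three-coin-flip event can be checked by enumerating the sample space. This is a minimal sketch under the slide's assumption of a fair coin; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space for three coin flips: all strings of length 3 over {H, T}.
S = ["".join(flips) for flips in product("HT", repeat=3)]
pr = {x: Fraction(1, len(S)) for x in S}

# Event V: the first and last coin flips are the same.
V = {x for x in S if x[0] == x[-1]}

# Pr(V) is the sum of Pr(x) over the outcomes x in V.
pr_V = sum(pr[x] for x in V)
print(sorted(V), pr_V)
```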

Random variable
A random variable (r.v.) Y over sample space S is a function S → ℝ,
i.e. it maps each outcome x ∈ S to some real number Y(x).
The probability of Y taking value y is
$\Pr(Y = y) = \sum_{\{x \in S \text{ s.t. } Y(x) = y\}} \Pr(x)$
(sum over all values of x such that Y(x) = y).
EXAMPLE
Two coin flips.
x ∈ S    Y(x)
HH       2
HT       1
TH       5
TT       2
What is Pr(Y = 2)?
$\Pr(Y = 2) = \sum_{x \in \{HH, TT\}} \Pr(x) = \frac{1}{4} + \frac{1}{4} = \frac{1}{2}$.
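The definition of Pr(Y = y) translates directly into a sum over outcomes. In the sketch below, the mapping `Y` follows the two-coin-flip example; which of HT and TH maps to 1 and which to 5 is read off the order in which the slide lists them, so treat that assignment as an assumption. The helper name `pr_equals` is hypothetical.

```python
from fractions import Fraction

# Two fair coin flips; each outcome has probability 1/4.
pr = {"HH": Fraction(1, 4), "HT": Fraction(1, 4),
      "TH": Fraction(1, 4), "TT": Fraction(1, 4)}

# The random variable Y from the example, as a mapping from S to the reals.
Y = {"HH": 2, "HT": 1, "TH": 5, "TT": 2}


def pr_equals(Y, pr, y):
    """Pr(Y = y): sum Pr(x) over all outcomes x with Y(x) = y."""
    return sum((pr[x] for x in pr if Y[x] == y), Fraction(0))


print(pr_equals(Y, pr, 2))
```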

Random variable
The expected value (the mean) of a r.v. Y, denoted E(Y), is
$E(Y) = \sum_{x \in S} Y(x) \cdot \Pr(x)$.
EXAMPLE
Two coin flips, with Y(HH) = 2, Y(HT) = 1, Y(TH) = 5, Y(TT) = 2 and $\Pr(Y = 2) = \frac{1}{2}$ as before.
$E(Y) = 2 \cdot \frac{1}{2} + 1 \cdot \frac{1}{4} + 5 \cdot \frac{1}{4} = \frac{5}{2}$.
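The expected value can be evaluated mechanically from its definition. The sketch below reuses the two-coin-flip random variable; the function name `expectation` is an illustrative choice, and the assignment of 1 and 5 to HT and TH is assumed from the slide's ordering (it does not affect the result).

```python
from fractions import Fraction

# Two fair coin flips and the random variable Y from the example.
pr = {x: Fraction(1, 4) for x in ("HH", "HT", "TH", "TT")}
Y = {"HH": 2, "HT": 1, "TH": 5, "TT": 2}


def expectation(Y, pr):
    """E(Y): sum of Y(x) * Pr(x) over all outcomes x in S."""
    return sum(Y[x] * pr[x] for x in pr)


print(expectation(Y, pr))
```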

Linearity of expectation
THEOREM (Linearity of expectation)
Let Y1, Y2, . . . , Yk be k random variables. Then
$E\left(\sum_{i=1}^{k} Y_i\right) = \sum_{i=1}^{k} E(Y_i)$.
Linearity of expectation always holds
(regardless of whether the random variables are independent or not).
EXAMPLE
Roll two dice. Let the r.v. Y be the sum of the values.
What is E(Y)?
Approach 1: (without the theorem)
The sample space S = {(1, 1), (1, 2), (1, 3), . . . , (6, 6)} (36 outcomes).
$E(Y) = \sum_{x \in S} Y(x) \cdot \Pr(x) = \frac{1}{36} \sum_{x \in S} Y(x) = \frac{1}{36}(1 \cdot 2 + 2 \cdot 3 + 3 \cdot 4 + \cdots + 1 \cdot 12) = 7$.
Approach 2: (with the theorem)
Let the r.v. Y1 be the value of the first die and Y2 the value of the second.
E(Y1) = E(Y2) = 3.5,
so E(Y) = E(Y1 + Y2) = E(Y1) + E(Y2) = 7.
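Both approaches to the two-dice example can be reproduced by brute force. The sketch below assumes fair dice; the last computation adds the first die to itself, a deliberately dependent pair that is not in the slides, to illustrate that linearity does not need independence.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die).
S = list(product(range(1, 7), repeat=2))
pr = Fraction(1, len(S))

# Approach 1: sum Y(x) * Pr(x) directly, where Y is the sum of the two values.
e_sum = sum((a + b) * pr for a, b in S)

# Approach 2: compute E(Y1) and E(Y2) separately and add them.
e_first = sum(a * pr for a, _ in S)
e_second = sum(b * pr for _, b in S)

# Dependent case: Y1 + Y1 still has expectation E(Y1) + E(Y1).
e_dependent = sum((a + a) * pr for a, _ in S)

print(e_sum, e_first + e_second, e_dependent)
```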

Indicator random variables
An indicator random variable is a r.v. that can only be 0 or 1
(usually referred to by the letter I).
Fact: E(I) = 0 · Pr(I = 0) + 1 · Pr(I = 1) = Pr(I = 1).
Often an indicator r.v. I is associated with an event such that
I = 1 if the event happens (and I = 0 otherwise).
Indicator random variables and linearity of expectation work great together!
EXAMPLE
Roll a die n times.
How many rolls do we expect to show a value
that is at least the value of the previous roll?
For j ∈ {2, . . . , n}, let indicator r.v. Ij = 1 if the value of the jth roll
is at least the value of the previous roll (and Ij = 0 otherwise).
$\Pr(I_j = 1) = \frac{21}{36} = \frac{7}{12}$ (by counting the outcomes).
By linearity of expectation,
$E\left(\sum_{j=2}^{n} I_j\right) = \sum_{j=2}^{n} E(I_j) = \sum_{j=2}^{n} \Pr(I_j = 1) = (n - 1) \cdot \frac{7}{12}$.
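The count of 21 favourable pairs and the formula (n − 1) · 7/12 can both be checked by exhaustive enumeration for small n. This is only a sanity check under the assumption of a fair die, not a proof; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def expected_non_decreasing_rolls(n):
    """Exact expected number of rolls j >= 2 whose value is at least that of
    roll j - 1, by brute force over all 6^n equally likely outcomes."""
    total = 0
    for rolls in product(range(1, 7), repeat=n):
        total += sum(1 for j in range(1, n) if rolls[j] >= rolls[j - 1])
    return Fraction(total, 6 ** n)


# Pr(I_j = 1): count pairs (previous, current) with current >= previous.
pairs = sum(1 for a, b in product(range(1, 7), repeat=2) if b >= a)
print(pairs, Fraction(pairs, 36))

for n in (2, 3, 4):
    # Brute-force value next to the value given by linearity of expectation.
    print(n, expected_non_decreasing_rolls(n), (n - 1) * Fraction(7, 12))
```

The brute force grows as 6^n, which is exactly why the indicator argument is the practical route for large n.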

Markov’s inequality
EXAMPLE
Suppose that the average (mean) speed on the motorway is 60 mph.
It then follows that at most $\frac{1}{2}$ of all cars drive at least 120 mph,
. . . otherwise the mean must be higher than 60 mph (a contradiction).
THEOREM (Markov’s inequality)
If X is a non-negative r.v., then for all a > 0,
$\Pr(X \geq a) \leq \frac{E(X)}{a}$.
EXAMPLE
From the example above:
$\Pr(\text{speed of a random car} \geq 120 \text{ mph}) \leq \frac{60}{120} = \frac{1}{2}$,
$\Pr(\text{speed of a random car} \geq 90 \text{ mph}) \leq \frac{60}{90} = \frac{2}{3}$.
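Markov’s inequality can be checked numerically on any non-negative random variable. The sketch below uses a fair die roll as X, an example chosen here for illustration rather than taken from the slides, and compares the exact tail probability with the bound E(X)/a for each threshold a.

```python
from fractions import Fraction

# X is the value of one fair die roll (a non-negative random variable).
pr = {x: Fraction(1, 6) for x in range(1, 7)}
e_x = sum(x * p for x, p in pr.items())


def tail(a):
    """Exact Pr(X >= a)."""
    return sum((p for x, p in pr.items() if x >= a), Fraction(0))


for a in range(1, 7):
    bound = e_x / a
    # Markov's inequality says the tail never exceeds the bound.
    print(a, tail(a), bound, tail(a) <= bound)
```

The bound is valid for every a but often loose: the printed tail values are well below E(X)/a.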

EXAMPLE
n people go to a party, leaving their hats at the door.
Each person leaves with a random hat.
How many people leave with their own hat?
For j ∈ {1, . . . , n}, let indicator r.v. Ij = 1 if the jth person gets their own hat,
otherwise Ij = 0.
By linearity of expectation and the fact E(I) = Pr(I = 1),
$E\left(\sum_{j=1}^{n} I_j\right) = \sum_{j=1}^{n} E(I_j) = \sum_{j=1}^{n} \Pr(I_j = 1) = n \cdot \frac{1}{n} = 1$.
By Markov’s inequality (recall: $\Pr(X \geq a) \leq \frac{E(X)}{a}$),
$\Pr(\text{5 or more people leaving with their own hats}) \leq \frac{1}{5}$,
$\Pr(\text{at least 1 person leaving with their own hat}) \leq \frac{1}{1} = 1$
(sometimes Markov’s inequality is not particularly informative).
In fact, here it can be shown that as n → ∞, the probability that at least
one person leaves with their own hat tends to $1 - \frac{1}{e}$.
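For small n the hat example can be verified exhaustively. The sketch below assumes that “a random hat” means a uniformly random permutation of the hats; the function name `hat_statistics` and the choice n = 7 are illustrative.

```python
from fractions import Fraction
from itertools import permutations
from math import e, factorial


def hat_statistics(n):
    """Enumerate all n! hat assignments and return, exactly:
    E(number of own hats), Pr(at least 1 own hat), Pr(at least 5 own hats)."""
    total_own = 0
    at_least_one = 0
    at_least_five = 0
    for perm in permutations(range(n)):
        own = sum(1 for person, hat in enumerate(perm) if person == hat)
        total_own += own
        at_least_one += own >= 1
        at_least_five += own >= 5
    count = factorial(n)
    return (Fraction(total_own, count),
            Fraction(at_least_one, count),
            Fraction(at_least_five, count))


expected, p_one, p_five = hat_statistics(7)
# Compare Pr(at least one) with 1 - 1/e, and Pr(at least five) with the bound 1/5.
print(expected, float(p_one), 1 - 1 / e, float(p_five))
```

Already at n = 7 the probability of at least one match is close to 1 − 1/e, while the true probability of five or more matches is far below the Markov bound of 1/5.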

COROLLARY
If X is a non-negative r.v. that only takes integer values, then
Pr(X > 0) = Pr(X ≥ 1) ≤ E(X).
For an indicator r.v. I, the bound is tight (=), as Pr(I > 0) = E(I).
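The corollary is Markov’s inequality with a = 1. Spelled out as one chain, where the first equality uses that X is integer-valued and non-negative:

```latex
\Pr(X > 0) = \Pr(X \geq 1) \leq \frac{E(X)}{1} = E(X)
```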

Union bound
THEOREM (union bound)
Let V1, . . . , Vk be k events. Then
$\Pr\left(\bigcup_{i=1}^{k} V_i\right) \leq \sum_{i=1}^{k} \Pr(V_i)$.
The left-hand side is the probability that at least one of the events happens.
This bound is tight (=) when the events are all disjoint
(Vi and Vj are disjoint iff Vi ∩ Vj is empty).
PROOF
Define indicator r.v. Ij to be 1 if event Vj happens, otherwise Ij = 0.
Let the r.v. $X = \sum_{j=1}^{k} I_j$ be the number of events that happen.
$\Pr\left(\bigcup_{j=1}^{k} V_j\right) = \Pr(X > 0) \leq E(X) = E\left(\sum_{j=1}^{k} I_j\right) = \sum_{j=1}^{k} E(I_j) = \sum_{j=1}^{k} \Pr(V_j)$,
where the inequality is by the previous Markov corollary, the next-to-last equality is linearity of expectation, and the last equality uses E(I) = Pr(I = 1).

Union bound
EXAMPLE
S = {1, . . . , 6} is the set of outcomes of a die roll.
We define two events: V1 = {3, 4} and V2 = {1, 2, 3}.
[Figure: Venn diagram of S with V1 and V2 overlapping at outcome 3; outcomes 5 and 6 lie outside both events.]
$\Pr(V_1 \cup V_2) \leq \Pr(V_1) + \Pr(V_2) = \frac{1}{3} + \frac{1}{2} = \frac{5}{6}$;
in fact, $\Pr(V_1 \cup V_2) = \frac{2}{3}$ (3 was ‘double counted’).
Typically the union bound is used when each Pr(Vi) is much smaller than 1/k, so that the sum of the k probabilities is still small.
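The die-roll example can be reproduced with Python sets. This is a small sketch assuming a fair die; the helper `pr` is an illustrative name. It also shows that the slack in the bound is exactly the probability of the double-counted outcome.

```python
from fractions import Fraction

S = set(range(1, 7))  # outcomes of one fair die roll
V1 = {3, 4}
V2 = {1, 2, 3}


def pr(event):
    """Probability of an event under the uniform distribution on S."""
    return Fraction(len(event), len(S))


exact = pr(V1 | V2)        # probability that at least one event happens
bound = pr(V1) + pr(V2)    # union bound
overlap = pr(V1 & V2)      # probability of the double-counted outcome 3

print(exact, bound, overlap)
```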

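Summary
An event is a subset V of the sample space S, $\Pr(V) = \sum_{x \in V} \Pr(x)$.
A random variable (r.v.) Y is a function which maps x ∈ S to Y(x) ∈ ℝ,
$\Pr(Y = y) = \sum_{\{x \in S \text{ s.t. } Y(x) = y\}} \Pr(x)$.
The expected value (the mean) of Y is $E(Y) = \sum_{x \in S} Y(x) \cdot \Pr(x)$.
For an indicator r.v. I, Fact: E(I) = Pr(I = 1).
Union bound: let V1, . . . , Vk be k events, then
$\Pr\left(\bigcup_{i=1}^{k} V_i\right) \leq \sum_{i=1}^{k} \Pr(V_i)$.
Markov’s inequality: if X is a non-negative r.v., then for all a > 0,
$\Pr(X \geq a) \leq \frac{E(X)}{a}$.
Linearity of expectation: let Y1, Y2, . . . , Yk be k random variables, then
$E\left(\sum_{i=1}^{k} Y_i\right) = \sum_{i=1}^{k} E(Y_i)$.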
