Review of probability calculus

June 11, 2017
Andreas Scheidegger
Eawag: Swiss Federal Institute of Aquatic Science and Technology

Random variables (RV)
“Mathematical machines that generate numbers”

Completely described by the cumulative probability distribution
function (cdf) or the probability distribution/density function
(pdf).
Some properties can be described by measures such as mean,
variance, mode, . . .
Andreas Scheidegger Univariate Random Variables 1

Probability Distribution/Density Function (pdf)
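Figure: probability distribution P_A of a discrete RV over the outputs z1, z2, ..., zn, and probability density f_B of a continuous RV over z.
Discrete RV: Probability to obtain a certain output.
Continuous RV: Proportional to the probability to obtain an output close to a certain value.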

Cumulative Distribution Function (cdf)
Figure: cumulative distribution functions F_A (discrete RV) and F_B (continuous RV), each increasing from 0 to 1.
Discrete and continuous RV: Probability to obtain an output equal to or smaller than a certain value.

cdf and pdf
Discrete RVs
Distribution function:
$F_A(z) = P(A \le z)$
Probability distribution:
$P_A(z_i)$ for $z_i \in \Omega_A$
Continuous RVs
Distribution function:
$F_B(z) = P(B \le z)$
Probability density:
$f_B(z) = \frac{d}{dz} F_B(z)$
$P(B \in [z_1, z_2]) = \int_{z_1}^{z_2} f_B(z) \, dz$
$P(B \in [z, z + \Delta]) \approx \Delta \cdot f_B(z)$
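The relations between cdf and pdf can be checked numerically. The following is a minimal sketch, assuming B is standard normal; the values of z1, z2, z and delta are arbitrary choices made for illustration.

```r
# Assumed example: B ~ N(0, 1); z1, z2, z and delta are arbitrary.
z1 <- -0.5
z2 <- 1.2

# P(B in [z1, z2]) as a difference of cdf values ...
p_cdf <- pnorm(z2) - pnorm(z1)
# ... and as the integral of the pdf over [z1, z2]
p_int <- integrate(dnorm, lower = z1, upper = z2)$value

# For a small interval, P(B in [z, z + delta]) is close to delta * f_B(z)
z <- 0.3
delta <- 1e-3
p_small  <- pnorm(z + delta) - pnorm(z)
p_approx <- delta * dnorm(z)

print(c(p_cdf = p_cdf, p_int = p_int))
print(c(p_small = p_small, p_approx = p_approx))
```

The two numbers in each printed pair should agree closely; the second pair differs by a term of order delta^2.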

Characteristics of Random Variables
Measures of Location
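Expected value:
$E[A] = \sum_{z \in \Omega_A} z \, P_A(z)$ , $E[B] = \int_{\Omega_B} z \, f_B(z) \, dz$
Median:
$\mathrm{Med}[Z] = Q_{0.5}[Z]$, i.e. $P(Z \le \mathrm{Med}[Z]) = P(Z > \mathrm{Med}[Z])$
Quantiles:
$Q_p[Z]$ : $P(Z \le Q_p[Z]) = p$ and $P(Z > Q_p[Z]) = 1 - p$
Mode:
$\mathrm{Mode}[A] = \arg\max_{z_i \in \Omega_A} P_A(z_i)$ , $\mathrm{Mode}[B] = \arg\max_{z \in \Omega_B} f_B(z)$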
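As an illustration of the measures of location, the sketch below evaluates them for one continuous RV. The choice B ~ Gamma(shape = 3, rate = 1) is an assumption made only for this example; for this distribution the mean is 3 and the mode is 2.

```r
# Assumed example: B ~ Gamma(shape = 3, rate = 1)
shape <- 3
rate  <- 1
f_B <- function(z) dgamma(z, shape = shape, rate = rate)

# Expected value: integral of z * f_B(z) over the support
expected <- integrate(function(z) z * f_B(z), lower = 0, upper = Inf)$value

# Median and 90% quantile from the quantile function
med <- qgamma(0.5, shape = shape, rate = rate)
q90 <- qgamma(0.9, shape = shape, rate = rate)

# Mode: numerical maximization of the density
mode_B <- optimize(f_B, interval = c(0, 10), maximum = TRUE)$maximum

print(c(mean = expected, median = med, q90 = q90, mode = mode_B))
```

For this right-skewed density the three location measures differ: mode < median < mean.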

Expected value of a function of a RV:
$E[g(A)] = \sum_{z \in \Omega_A} g(z) \, P_A(z)$
$E[g(B)] = \int_{\Omega_B} g(z) \, f_B(z) \, dz$
Attention! In general
$E[g(X)] \ne g(E[X])$
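That the expected value of g(X) generally differs from g evaluated at the expected value can be seen in a small numerical sketch. The distribution X ~ N(1, 2) and the function g(x) = x^2 are assumptions chosen for illustration.

```r
# Assumed example: X ~ N(mean = 1, sd = 2) and g(x) = x^2
mu    <- 1
sigma <- 2
g <- function(x) x^2

# E[X] and E[g(X)] by numerical integration against the density
E_X  <- integrate(function(x) x * dnorm(x, mu, sigma), -Inf, Inf)$value
E_gX <- integrate(function(x) g(x) * dnorm(x, mu, sigma), -Inf, Inf)$value

# g applied to the expected value
g_EX <- g(E_X)

print(c(E_gX = E_gX, g_EX = g_EX))
```

Here E[g(X)] = sigma^2 + mu^2 = 5, whereas g(E[X]) = mu^2 = 1.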

Measures of Extension
Variance:
$\mathrm{Var}[Z] = E\left[(Z - E[Z])^2\right]$
Standard Deviation:
$\mathrm{SD}[Z] = \sqrt{\mathrm{Var}[Z]}$
Inter-Quantile Range:
$\mathrm{QR}_p[Z] = Q_{(1+p)/2}[Z] - Q_{(1-p)/2}[Z]$

$E[aZ + b] = a \, E[Z] + b$
$E[Z_1 \pm Z_2] = E[Z_1] \pm E[Z_2]$
$\mathrm{Var}[Z] = E[Z^2] - E[Z]^2$
$\mathrm{Var}[aZ + b] = a^2 \, \mathrm{Var}[Z]$
Only if $Z_1$ and $Z_2$ are independent:
$\mathrm{Var}[Z_1 \pm Z_2] = \mathrm{Var}[Z_1] + \mathrm{Var}[Z_2]$
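These rules can be verified exactly for a small discrete RV. The sketch below assumes Z is a fair six-sided die and uses hypothetical helper functions E() and V(); the constants a and b are arbitrary.

```r
# Assumed example: Z is the outcome of a fair six-sided die
z <- 1:6
p <- rep(1 / 6, 6)

# Hypothetical helpers: expectation and variance of a discrete RV
E <- function(values, probs) sum(values * probs)
V <- function(values, probs) E(values^2, probs) - E(values, probs)^2

a <- 3
b <- -2

# Linear transformation: E[aZ + b] and Var[aZ + b]
lin_mean <- E(a * z + b, p)
lin_var  <- V(a * z + b, p)

# Difference of two independent dice: joint probabilities are products
joint     <- outer(p, p)
diff_vals <- outer(z, z, "-")
var_diff  <- sum(diff_vals^2 * joint) - sum(diff_vals * joint)^2

print(c(lin_mean = lin_mean, rule = a * E(z, p) + b))
print(c(lin_var = lin_var, rule = a^2 * V(z, p)))
print(c(var_diff = var_diff, rule = 2 * V(z, p)))
```

Note that the variance of the difference is the sum, not the difference, of the individual variances.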

Multivariate random variables


Joint distribution
Discrete RV:
$P_{A,B}(a, b) = P_{A|B}(a|b) \cdot P_B(b) = P_{B|A}(b|a) \cdot P_A(a)$
E.g.: $P_{A,B}(3, 1)$ : Probability to obtain a = 3 and b = 1.
Continuous RV:
$f_{A,B}(a, b) = f_{A|B}(a|b) \cdot f_B(b) = f_{B|A}(b|a) \cdot f_A(a)$
E.g.: $f_{A,B}(3, 1)$ : proportional to the probability to obtain a realization close to a = 3 and b = 1.

Conditional Distributions
Discrete RV:
$P_{A|B}(a|b) = \frac{P_{A,B}(a, b)}{P_B(b)}$
Continuous RV:
$f_{A|B}(a|b) = \frac{f_{A,B}(a, b)}{f_B(b)}$

Marginal distribution
Discrete random variables:
$P_A(a) = \sum_{b \in \Omega_B} P_{A,B}(a, b)$
Continuous random variables:
$f_A(a) = \int_{\Omega_B} f_{A,B}(a, b) \, db$
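For discrete RVs, joint, marginal and conditional distributions are simple table operations. The sketch below uses a hypothetical joint probability table (the numbers are invented for illustration) with A taking values 1, 2, 3 and B taking values 0, 1.

```r
# Hypothetical joint pmf P_{A,B}(a, b): rows are values of A, columns values of B
P_AB <- matrix(c(0.10, 0.20,
                 0.25, 0.15,
                 0.05, 0.25),
               nrow = 3, byrow = TRUE,
               dimnames = list(a = c("1", "2", "3"), b = c("0", "1")))

# Marginals: sum the joint over the other variable
P_A <- rowSums(P_AB)
P_B <- colSums(P_AB)

# Conditional P_{A|B}(a|b): divide each column by the marginal P_B(b)
P_A_given_B <- sweep(P_AB, 2, P_B, "/")

# joint = conditional x marginal recovers the original table
P_AB_again <- sweep(P_A_given_B, 2, P_B, "*")

print(P_A)
print(P_B)
print(P_A_given_B)
```

Each column of `P_A_given_B` is itself a probability distribution over A and sums to one.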

Independence
Definition:
$F_{A,B}(a, b) = F_A(a) \cdot F_B(b)$
Discrete random variables:
$P_{A,B}(a, b) = P_A(a) \cdot P_B(b)$
Continuous random variables:
$f_{A,B}(a, b) = f_A(a) \cdot f_B(b)$
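For a discrete joint table, independence can be tested by comparing the table with the product of its marginals. The following sketch uses a hypothetical helper `is_independent()` and two invented tables that share the same marginals.

```r
# Hypothetical marginals
P_A <- c(0.3, 0.4, 0.3)
P_B <- c(0.4, 0.6)

# Joint pmf of independent A and B: product of the marginals
P_indep <- outer(P_A, P_B)

# A hypothetical joint pmf with the same marginals, but with dependence
P_dep <- matrix(c(0.10, 0.20,
                  0.25, 0.15,
                  0.05, 0.25), nrow = 3, byrow = TRUE)

# Hypothetical helper: TRUE if the table equals the product of its marginals
is_independent <- function(P) {
  isTRUE(all.equal(P, outer(rowSums(P), colSums(P)),
                   check.attributes = FALSE))
}

print(is_independent(P_indep))
print(is_independent(P_dep))
```

Identical marginals therefore do not determine the joint distribution; the dependence structure is extra information.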

Bayes’ Theorem
Discrete random variables
Because
$P_{A|B}(a|b) \, P_B(b) = P_{B|A}(b|a) \, P_A(a)$
we can write
$P_{A|B}(a|b) = \frac{P_{B|A}(b|a) \, P_A(a)}{P_B(b)} = \frac{P_{B|A}(b|a) \, P_A(a)}{\sum_{a' \in \Omega_A} P_{B|A}(b|a') \, P_A(a')}$
Note: Bayes’ Theorem as we know it today was actually formulated by P. Laplace in 1774 and not by T. Bayes.
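A small worked example for the discrete case, as a sketch: A is a health state and B a test result. All probabilities below are hypothetical numbers chosen for illustration.

```r
# Hypothetical prior P_A(a) over the states of A
prior <- c(sick = 0.01, healthy = 0.99)

# Hypothetical likelihood P_{B|A}(b = "positive" | a) for each state a
lik_pos <- c(sick = 0.95, healthy = 0.05)

# Denominator P_B("positive"): sum over all states a'
evidence <- sum(lik_pos * prior)

# Posterior P_{A|B}(a | "positive")
posterior <- lik_pos * prior / evidence

print(evidence)
print(posterior)
```

With these numbers the posterior probability of "sick" is 0.0095 / 0.059, about 0.16, even though the test is positive, because the prior probability is small.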

Bayes’ Theorem
Continuous random variables
$f_{A|B}(a|b) = \frac{f_{B|A}(b|a) \, f_A(a)}{f_B(b)} = \frac{f_{B|A}(b|a) \, f_A(a)}{\int f_{B|A}(b|a') \, f_A(a') \, da'}$
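In the continuous case the denominator is an integral, which can be evaluated numerically in one dimension. The sketch below assumes a simple model, A ~ N(0, 1) and B | A = a ~ N(a, 1), with an observed value b = 1; for this model the posterior mean of A is known to be b/2.

```r
# Assumed model: A ~ N(0, 1), B | A = a ~ N(a, 1), observed b = 1
b <- 1
prior <- function(a) dnorm(a, mean = 0, sd = 1)   # f_A(a)
lik   <- function(a) dnorm(b, mean = a, sd = 1)   # f_{B|A}(b | a)

# Denominator f_B(b): integrate likelihood times prior over a
evidence <- integrate(function(a) lik(a) * prior(a), -Inf, Inf)$value

# Posterior density f_{A|B}(a | b)
posterior <- function(a) lik(a) * prior(a) / evidence

# Posterior mean and total probability as sanity checks
post_mean  <- integrate(function(a) a * posterior(a), -Inf, Inf)$value
post_total <- integrate(posterior, -Inf, Inf)$value

print(c(evidence = evidence, post_mean = post_mean, post_total = post_total))
```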

Dependencies
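Variance-Covariance Matrix:
$\mathrm{Var}[Z] = E\left[(Z - E[Z])(Z - E[Z])^T\right]$
Individual Covariances:
$\mathrm{Cov}[Z_i, Z_j] = E\left[(Z_i - E[Z_i])(Z_j - E[Z_j])\right] = \mathrm{Var}[Z]_{i,j}$
Correlation Matrix:
$\mathrm{Cor}[Z]_{i,j} = \frac{\mathrm{Cov}[Z_i, Z_j]}{\sqrt{\mathrm{Var}[Z_i] \cdot \mathrm{Var}[Z_j]}}$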
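The step from a variance-covariance matrix to a correlation matrix can be sketched as follows. The matrix `Sigma` is a hypothetical example; `cov2cor()` from base R is used only as a cross-check.

```r
# Hypothetical variance-covariance matrix of a 3-dimensional RV
Sigma <- matrix(c( 4.0, 1.2, -0.8,
                   1.2, 1.0,  0.3,
                  -0.8, 0.3,  2.25), nrow = 3, byrow = TRUE)

# Standard deviations are the square roots of the diagonal
sd_Z <- sqrt(diag(Sigma))

# Cor[Z]_{i,j} = Cov[Z_i, Z_j] / sqrt(Var[Z_i] * Var[Z_j])
Cor_Z <- Sigma / outer(sd_Z, sd_Z)

print(Cor_Z)
print(all.equal(Cor_Z, cov2cor(Sigma)))
```

The diagonal of the correlation matrix is one by construction, and for example Cor[Z]_{1,2} = 1.2 / (2 * 1) = 0.6.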

Correlation
Correlation measures only linear dependencies!
Figure: Several sets of (x, y) points, with the correlation coefficient of x and y for each set. Source: Wikipedia.
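A minimal sketch of this point: y below is a deterministic function of x, yet the correlation is zero because the dependence is not linear. The grid of x values is an arbitrary choice for illustration.

```r
# x is symmetric around zero; y depends on x deterministically but not linearly
x <- seq(-1, 1, length.out = 201)
y <- x^2

cor_nonlinear <- cor(x, y)          # essentially zero
cor_linear    <- cor(x, 2 * x + 1)  # exactly linear dependence

print(c(nonlinear = cor_nonlinear, linear = cor_linear))
```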

Short Notation
Function argument corresponds to RV
$P_A(a), P_{B|A}(b|a) \longleftrightarrow P(a), P(b|a)$
$f_B(b), f_{A|B}(a|b) \longleftrightarrow f(b), f(a|b)$ or $p(b), p(a|b)$
Example:
$f_{X_1|X_2,X_3}(x_1|x_2, x_3) = \frac{f_{X_2|X_1}(x_2|x_1) \, f_{X_1|X_3}(x_1|x_3)}{f_{X_2}(x_2)}$
$p(x_1|x_2, x_3) = \frac{p(x_2|x_1) \, p(x_1|x_3)}{p(x_2)}$

Directed Acyclic Graphs
Visualize independence structure of RV
Figure: directed acyclic graph with nodes A, B, C, D; each node is annotated with its conditional distribution given its parents:
p(A)
p(B | A)
p(C | A, B)
p(D | B)
E.g. A and D are conditionally independent given B.
Joint distribution:
$p(A, B, C, D) = p(A) \, p(B | A) \, p(C | A, B) \, p(D | B)$
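The factorization can be made concrete with binary variables. The sketch below builds the joint distribution from hypothetical conditional probability tables (all numbers invented for illustration) and checks that A and D are independent given B, but not marginally.

```r
# Hypothetical tables for binary A, B, C, D (each takes the values 1 and 2)
p_A   <- c(0.6, 0.4)                                   # p(A)
p_B_A <- matrix(c(0.7, 0.3,
                  0.2, 0.8), nrow = 2, byrow = TRUE)   # p(B | A), indexed [a, b]
p_C1  <- matrix(c(0.1, 0.5,
                  0.3, 0.9), nrow = 2, byrow = TRUE)   # p(C = 1 | A, B), [a, b]
p_C_AB <- array(c(p_C1, 1 - p_C1), dim = c(2, 2, 2))   # p(C | A, B), [a, b, c]
p_D_B <- matrix(c(0.9, 0.1,
                  0.5, 0.5), nrow = 2, byrow = TRUE)   # p(D | B), indexed [b, d]

# Joint distribution as the product of the node-wise conditionals
joint <- array(0, dim = c(2, 2, 2, 2))
for (a in 1:2) for (b in 1:2) for (k in 1:2) for (d in 1:2) {
  joint[a, b, k, d] <- p_A[a] * p_B_A[a, b] * p_C_AB[a, b, k] * p_D_B[b, d]
}

# Marginalize C, then check p(A, D | B = b) = p(A | B = b) p(D | B = b)
p_ABD <- apply(joint, c(1, 2, 4), sum)
cond_indep <- sapply(1:2, function(b) {
  p_AD_b <- p_ABD[, b, ] / sum(p_ABD[, b, ])
  isTRUE(all.equal(p_AD_b, outer(rowSums(p_AD_b), colSums(p_AD_b))))
})

# Without conditioning on B, A and D are dependent
p_AD <- apply(joint, c(1, 4), sum)
marg_indep <- isTRUE(all.equal(p_AD, outer(rowSums(p_AD), colSums(p_AD))))

print(sum(joint))
print(cond_indep)
print(marg_indep)
```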

Normal distribution

Central Limit Theorem
Let $X_1, X_2, \ldots$ be independent and identically distributed RVs with mean $\mu$ and a finite variance $\sigma^2$. Further we define $S_n = X_1 + X_2 + \ldots + X_n$, which has mean $n\mu$ and variance $n\sigma^2$.
Then the standardized RV
$Z_n = \frac{S_n - n\mu}{\sqrt{n} \, \sigma}$
converges in distribution to a standard normal RV for $n \to \infty$.
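The theorem can be illustrated by simulation. The sketch below assumes the X_i are Uniform(0, 1), for which mu = 0.5 and sigma^2 = 1/12; the seed, n and the number of replications are arbitrary choices.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only

n     <- 12                      # number of summands
reps  <- 10000                   # number of simulated sums
mu    <- 0.5                     # mean of Uniform(0, 1)
sigma <- sqrt(1 / 12)            # standard deviation of Uniform(0, 1)

# Each row holds one realization of (X_1, ..., X_n); row sums give S_n
S_n <- rowSums(matrix(runif(n * reps), nrow = reps, ncol = n))

# Standardize
Z_n <- (S_n - n * mu) / (sqrt(n) * sigma)

# Compare with the standard normal distribution
print(c(mean = mean(Z_n), sd = sd(Z_n)))
print(c(empirical = mean(Z_n <= 1), normal = pnorm(1)))
```

A histogram of `Z_n` with `curve(dnorm(x), add = TRUE)` overlaid gives a visual comparison.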

Central Limit Theorem Example
Figure: twelve panels showing the density of the standardized sum for n = 1, 2, ..., 12 (horizontal axis from −2 to 2, vertical axis: density).

Relationships of Univariate Distributions
Figure: Univariate distribution relationships.
From: Leemis, L. M. and McQueston, J. T. (2008) Univariate distribution relationships. The American Statistician, 62(1), 45–53.

Multivariate Normal Distribution
Density of a multivariate Normal distribution of dimension n with a mean vector $\mu$ and a variance-covariance matrix $\Sigma$:
$Z \sim N(\mu, \Sigma)$
$f_{N(\mu,\Sigma)}(z) = \frac{1}{(2\pi)^{n/2}} \frac{1}{|\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (z - \mu)^T \Sigma^{-1} (z - \mu)\right)$

Properties
All marginals are normally distributed:
$Z \sim N(\mu, \Sigma) \Rightarrow Z_i \sim N(\mu_i, \Sigma_{i,i})$
Linear transformation:
$Z \sim N(\mu, \Sigma) \Rightarrow AZ + b \sim N(A\mu + b, A \Sigma A^T)$
Conditional distribution:
$Z = \begin{pmatrix} X \\ Y \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix}, \begin{pmatrix} \Sigma_{X,X} & \Sigma_{X,Y} \\ \Sigma_{X,Y}^T & \Sigma_{Y,Y} \end{pmatrix} \right)$
$\Rightarrow X \mid Y = y \sim N\left( \mu_X + \Sigma_{X,Y} \Sigma_{Y,Y}^{-1} (y - \mu_Y), \; \Sigma_{X,X} - \Sigma_{X,Y} \Sigma_{Y,Y}^{-1} \Sigma_{X,Y}^T \right)$
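The linear-transformation and conditioning rules translate directly into matrix code. The sketch below uses a hypothetical two-dimensional example (X and Y both scalar); `mu`, `Sigma`, `A`, `b` and the observed `y` are invented for illustration.

```r
# Hypothetical example: Z = (X, Y) with scalar X and Y
mu    <- c(1, 2)                               # (mu_X, mu_Y)
Sigma <- matrix(c(4.0, 1.2,
                  1.2, 1.0), nrow = 2, byrow = TRUE)

# Linear transformation AZ + b
A <- matrix(c(1,  1,
              1, -1), nrow = 2, byrow = TRUE)
b <- c(0, 1)
lin_mean <- as.numeric(A %*% mu + b)           # A mu + b
lin_cov  <- A %*% Sigma %*% t(A)               # A Sigma A^T

# Conditional distribution of X given Y = y
ix <- 1
iy <- 2
y  <- 3
S_XX <- Sigma[ix, ix, drop = FALSE]
S_XY <- Sigma[ix, iy, drop = FALSE]
S_YY <- Sigma[iy, iy, drop = FALSE]
cond_mean <- as.numeric(mu[ix] + S_XY %*% solve(S_YY, y - mu[iy]))
cond_var  <- as.numeric(S_XX - S_XY %*% solve(S_YY, t(S_XY)))

print(lin_mean)
print(lin_cov)
print(c(cond_mean = cond_mean, cond_var = cond_var))
```

The index vectors `ix` and `iy` can hold several indices, so the same code covers vector-valued X and Y. Observing Y reduces the variance of X from 4 to 2.56 in this example.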

Further Generalization
one-dimensional

what’s next?

Discrete random process
“Random vectors with an infinitely large number of elements”
(0.11, 10.78, -10.24, -3.90, 5.91, ...)
(-1.11, -4.06, -8.64, -0.92, -2.27, ...)
(0.76, -8.54, 0.81, 2.03, 12.9, ...)

Continuous random processes
“Random functions”

What is a Probability?
Interpretation of probabilities
1. The probability for “head” is 1/2.
2. The probability that it rains tomorrow is 30%.
Frequentist
1. The frequency with which “head” occurs if the random experiment is repeated.
2. “Rain tomorrow” is not a repeatable experiment.
Subjective
1. Somebody’s belief that a coin toss results in “head”, given his/her experience.
2. Somebody’s belief that it rains tomorrow, given his/her experience.
Other probability interpretations:
→ http://www.webcitation.org/6YupVo9zG

Summary
joint = conditional x marginal
$f(a, b) = f(a|b) \, f(b) = f(b|a) \, f(a)$
Marginals:
$f(a) = \int f(a, b) \, db = \int f(a|b) \, f(b) \, db$
More information in Appendix A.2 – A.5.

Common distributions

Distributions implemented in R
For all distributions four functions are implemented:
d__(x, ...)   pdf evaluated at x
p__(x, ...)   cdf evaluated at x
q__(p, ...)   p-th quantile
r__(n, ...)   sample n random numbers
(Exception: for the multinomial distribution only dmultinom and rmultinom exist.)

Distribution      R name       Distribution        R name
beta              *beta        binomial            *binom
Cauchy            *cauchy      chi-squared         *chisq
exponential       *exp         F                   *f
gamma             *gamma       geometric           *geom
hypergeometric    *hyper       log-normal          *lnorm
multinomial       *multinom    negative binomial   *nbinom
normal            *norm        Poisson             *pois
Student’s t       *t           uniform             *unif
Weibull           *weibull
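A short sketch of the naming scheme, using the normal distribution (suffix `norm`) and the exponential distribution (suffix `exp`). The evaluation points, parameters and seed are arbitrary choices.

```r
x <- 1.5

# d: density, p: cdf, q: quantile function (inverse of the cdf)
dens <- dnorm(x, mean = 0, sd = 1)
prob <- pnorm(x, mean = 0, sd = 1)
back <- qnorm(prob, mean = 0, sd = 1)    # recovers x

# r: random numbers
set.seed(42)                             # arbitrary seed
draws <- rnorm(5, mean = 0, sd = 1)

# The same pattern holds for other distributions, e.g. the exponential
back_exp <- qexp(pexp(2, rate = 0.5), rate = 0.5)   # recovers 2

print(c(dens = dens, prob = prob, back = back))
print(draws)
print(back_exp)
```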

Normal Distribution
Density
$Z \sim N(\mu, \sigma)$ , $f_{N(\mu,\sigma)}(z) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{(z - \mu)^2}{2\sigma^2}\right)$
Figure: density f and cdf F of the normal distribution with mean = 0 for sd = 0.1, 0.25, 0.5, 1, 2, 4.

Normal Distribution
Properties
$E[N(\mu, \sigma)] = \mathrm{Mode}[N(\mu, \sigma)] = \mathrm{Med}[N(\mu, \sigma)] = \mu$
$\mathrm{SD}[N(\mu, \sigma)] = \sigma$
Central limit theorem:
Let $X_1, X_2, \ldots$ be independent and identically distributed RVs with mean $\mu$ and a finite variance $\sigma^2$. Further we define $S_n = X_1 + X_2 + \ldots + X_n$, which has mean $n\mu$ and variance $n\sigma^2$.
Then the standardized RV
$Z_n = \frac{S_n - n\mu}{\sqrt{n} \, \sigma}$
converges in distribution to a standard normal RV for $n \to \infty$.

Lognormal Distribution
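Definition:
$Z = \exp(X)$ , $X \sim N(m, s)$
Density:
$Z \sim LN(\mu, \sigma)$
$f_{LN(\mu,\sigma)}(z) = \begin{cases} \frac{1}{\sqrt{2\pi}} \frac{1}{s z} \exp\left( -\frac{1}{2} \frac{\left( \log\frac{z}{\mu} + \frac{s^2}{2} \right)^2}{s^2} \right) & \text{for } z > 0 \\ 0 & \text{for } z \le 0 \end{cases}$
with
$s = \sqrt{\log\left(1 + \frac{\sigma^2}{\mu^2}\right)}$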

Figure: density f and cdf F of the lognormal distribution with mean = 1 for sd = 0.1, 0.25, 0.5, 1, 2, 4.

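Properties
$E[LN(\mu, \sigma)] = \mu$
$\mathrm{Mode}[LN(\mu, \sigma)] = \frac{\mu}{\left(1 + \frac{\sigma^2}{\mu^2}\right)^{3/2}}$
$\mathrm{Med}[LN(\mu, \sigma)] = \frac{\mu}{\sqrt{1 + \frac{\sigma^2}{\mu^2}}}$
$\mathrm{SD}[LN(\mu, \sigma)] = \sigma$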

R implementation
Attention: The lognormal distribution in R is defined with m and s (the mean and standard deviation of X)!
The code below computes the arguments if mean µ and standard deviation σ are given:
## example values, assumed here for illustration
mu <- 1
sigma <- 0.5
## conversion, 'mu' and 'sigma' given
meanlog <- log(mu) - 0.5 * log(1 + (sigma / mu)^2)
sdlog <- sqrt(log(1 + sigma^2 / mu^2))
## generate 1000 random samples
rlnorm(1000, meanlog = meanlog, sdlog = sdlog)
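The conversion can be checked by computing the mean and standard deviation of the resulting lognormal distribution numerically. This is a sketch; the values mu = 2 and sigma = 0.5 are assumptions chosen for illustration.

```r
# Assumed target mean and standard deviation of Z
mu    <- 2
sigma <- 0.5

# Conversion to the parameters of X = log(Z) used by R
meanlog <- log(mu) - 0.5 * log(1 + (sigma / mu)^2)
sdlog   <- sqrt(log(1 + sigma^2 / mu^2))

# First and second moment of Z by numerical integration
m1 <- integrate(function(z) z * dlnorm(z, meanlog, sdlog), 0, Inf)$value
m2 <- integrate(function(z) z^2 * dlnorm(z, meanlog, sdlog), 0, Inf)$value

# Median from the quantile function and from the closed-form expression
med_q       <- qlnorm(0.5, meanlog, sdlog)
med_formula <- mu / sqrt(1 + sigma^2 / mu^2)

print(c(mean = m1, sd = sqrt(m2 - m1^2)))
print(c(med_q = med_q, med_formula = med_formula))
```

The printed mean and standard deviation should reproduce `mu` and `sigma` up to the accuracy of the numerical integration.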

χ² Distribution
Definition:
$Z = \sum_{i=1}^{n} X_i^2$ , $X_i \sim N(0, 1)$
Density:
$Z \sim \chi^2_n$ , $f_{\chi^2_n}(z) = \frac{z^{(n-2)/2} \exp(-z/2)}{2^{n/2} \, \Gamma(n/2)}$

Figure: density f and cdf F of the χ² distribution for df = 1, 2, 3, 4, 5, 10.

Properties
$E[\chi^2_n] = n$
$\mathrm{Mode}[\chi^2_n] = n - 2$ for $n \ge 2$
$\mathrm{SD}[\chi^2_n] = \sqrt{2n}$
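The density formula and the properties of the χ² distribution can be compared with R's built-in `dchisq()`. This is a sketch with n = 4 degrees of freedom chosen arbitrarily; `f_chisq` is a hypothetical name for the hand-written density.

```r
n <- 4   # degrees of freedom, arbitrary choice

# Density written out explicitly (valid for z > 0)
f_chisq <- function(z) {
  z^((n - 2) / 2) * exp(-z / 2) / (2^(n / 2) * gamma(n / 2))
}

# Compare with the built-in density at a few points
z <- c(0.5, 1, 3, 7)
same_density <- isTRUE(all.equal(f_chisq(z), dchisq(z, df = n)))

# Mean and standard deviation by numerical integration
m1 <- integrate(function(z) z * dchisq(z, df = n), 0, Inf)$value
m2 <- integrate(function(z) z^2 * dchisq(z, df = n), 0, Inf)$value
sd_num <- sqrt(m2 - m1^2)

# Mode by numerical maximization
mode_num <- optimize(function(z) dchisq(z, df = n),
                     interval = c(0, 20), maximum = TRUE)$maximum

print(same_density)
print(c(mean = m1, sd = sd_num, mode = mode_num))
```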

F Distribution
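Definition:
$Z = \frac{X / n}{Y / m}$ , $X \sim \chi^2_n$ , $Y \sim \chi^2_m$
Density:
$Z \sim F_{n,m}$ , $f_{F_{n,m}}(z) = \frac{\Gamma\left((n + m)/2\right) \, (n/m)^{n/2} \, z^{(n-2)/2}}{\Gamma(n/2) \, \Gamma(m/2) \, (1 + n z / m)^{(n+m)/2}}$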

Figure: density f and cdf F of the F distribution for (df1, df2) = (2, 10), (3, 10), (5, 10), (5, 100).

Properties
$E[F_{n,m}] = \frac{m}{m - 2}$ for $m > 2$
$\mathrm{Mode}[F_{n,m}] = \frac{m (n - 2)}{n (m + 2)}$ for $n > 2$
$\mathrm{SD}[F_{n,m}] = \sqrt{\frac{2 m^2 (n + m - 2)}{n (m - 2)^2 (m - 4)}}$ for $m > 4$
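The closed-form properties of the F distribution can be compared with numerical results based on R's `df()`. This is a sketch; the degrees of freedom n = 5 and m = 10 are arbitrary choices that satisfy n > 2 and m > 4.

```r
n <- 5    # numerator degrees of freedom
m <- 10   # denominator degrees of freedom

# Mean, standard deviation and mode computed numerically from the density
mean_num <- integrate(function(z) z * df(z, n, m), 0, Inf)$value
m2_num   <- integrate(function(z) z^2 * df(z, n, m), 0, Inf)$value
sd_num   <- sqrt(m2_num - mean_num^2)
mode_num <- optimize(function(z) df(z, n, m),
                     interval = c(0, 5), maximum = TRUE)$maximum

# Closed-form expressions
mean_formula <- m / (m - 2)
mode_formula <- m * (n - 2) / (n * (m + 2))
sd_formula   <- sqrt(2 * m^2 * (n + m - 2) / (n * (m - 2)^2 * (m - 4)))

print(c(mean_num = mean_num, mean_formula = mean_formula))
print(c(mode_num = mode_num, mode_formula = mode_formula))
print(c(sd_num = sd_num, sd_formula = sd_formula))
```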

t Distribution
Definition:
$Z = \frac{X}{\sqrt{Y / n}}$ , $X \sim N(0, 1)$ , $Y \sim \chi^2_n$
Density:
$Z \sim t_n$ , $f_{t_n}(z) = \frac{\Gamma\left((n + 1)/2\right)}{\sqrt{\pi n} \; \Gamma(n/2) \, (1 + z^2/n)^{(n+1)/2}}$
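The t density can be written out and compared with R's built-in `dt()`. This is a sketch with n = 5 degrees of freedom chosen arbitrarily; `f_t` is a hypothetical name for the hand-written density.

```r
n <- 5   # degrees of freedom, arbitrary choice

# Density of the t distribution written out explicitly
f_t <- function(z) {
  gamma((n + 1) / 2) /
    (sqrt(pi * n) * gamma(n / 2) * (1 + z^2 / n)^((n + 1) / 2))
}

# Compare with the built-in density at a few points
z <- c(-2, -0.5, 0, 1, 3)
same_density <- isTRUE(all.equal(f_t(z), dt(z, df = n)))

print(rbind(formula = f_t(z), builtin = dt(z, df = n)))
print(same_density)
```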
