Review of probability calculus
June 11, 2017
Andreas Scheidegger
Eawag: Swiss Federal Institute of Aquatic Science and Technology
Random variables (RV)
“Mathematical machines that generate numbers”
Completely described by the cumulative probability distribution
function (cdf) or the probability distribution/density function
(pdf).
Some properties can be described by measures such as mean,
variance, mode, . . .
Andreas Scheidegger Univariate Random Variables 1
Probability Distribution/Density Function (pdf)
[Figure: probability distribution $P_A$ of a discrete RV over the outputs $z_1, z_2, \ldots, z_n$, and probability density $f_B$ of a continuous RV over $z$.]
Discrete RV: probability of obtaining a certain output.
Continuous RV: proportional to the probability of obtaining an output close to a certain value.
Cumulative Distribution Function (cdf)
[Figure: cumulative distribution functions $F_A$ (discrete RV, outputs $z_1, z_2, \ldots, z_n$) and $F_B$ (continuous RV), each rising from 0 to 1.]
Discrete and continuous RV: probability of obtaining an output equal to or smaller than a certain value.
cdf and pdf
Discrete RVs
Distribution function:
$F_A(z) = P(A \le z)$
Probability distribution:
$P_A(z_i)$ for $z_i \in \Omega_A$
Continuous RVs
Distribution function:
$F_B(z) = P(B \le z)$
Probability density:
$f_B(z) = \frac{d}{dz} F_B(z)$
$P(B \in [z_1, z_2]) = \int_{z_1}^{z_2} f_B(z)\, dz$
$P(B \in [z, z + \Delta]) \approx \Delta \cdot f_B(z)$
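The relations between cdf and pdf can be checked numerically. The following is a minimal sketch in R; the standard normal distribution and the values of z1, z2, z and Delta are assumptions chosen only for illustration.

```r
## Standard normal RV B as a stand-in for a continuous RV (assumption).
z1 <- -0.5
z2 <- 1.2

## P(B in [z1, z2]) from the cdf: F_B(z2) - F_B(z1)
p_cdf <- pnorm(z2) - pnorm(z1)

## The same probability by integrating the pdf over [z1, z2]
p_int <- integrate(dnorm, lower = z1, upper = z2)$value

## Small-interval approximation: P(B in [z, z + Delta]) ~ Delta * f_B(z)
z <- 0.3
Delta <- 1e-3
p_exact  <- pnorm(z + Delta) - pnorm(z)
p_approx <- Delta * dnorm(z)

print(c(cdf = p_cdf, integral = p_int))
print(c(exact = p_exact, approx = p_approx))
```

The two numbers in each printed pair should agree closely; the approximation improves as Delta shrinks.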
Characteristics of Random Variables
Measures of Location
Expected value:
$E[A] = \sum_{z \in \Omega_A} z\, P_A(z), \quad E[B] = \int_{\Omega_B} z\, f_B(z)\, dz$
Median:
$\mathrm{Med}[Z] = Q_{0.5}[Z]$, i.e. $P(Z \le \mathrm{Med}[Z]) = P(Z > \mathrm{Med}[Z])$
Quantiles:
$Q_p[Z]$: $P(Z \le Q_p[Z]) = p$ and $P(Z > Q_p[Z]) = 1 - p$
Mode:
$\mathrm{Mode}[A] = \arg\max_{z_i \in \Omega_A} P_A(z_i), \quad \mathrm{Mode}[B] = \arg\max_{z \in \Omega_B} f_B(z)$
Characteristics of Random Variables
Measures of Location
Expected value of a function of a RV:
$E[g(A)] = \sum_{z \in \Omega_A} g(z)\, P_A(z)$
$E[g(B)] = \int_{\Omega_B} g(z)\, f_B(z)\, dz$
Attention! In general,
$E[g(X)] \neq g(E[X])$
(equality holds for linear $g$).
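A small discrete example makes the warning concrete: the expected value of g(A) generally differs from g applied to the expected value. The sketch below assumes a fair six-sided die for A and g(z) = z^2; both are illustrative choices, not taken from the slides.

```r
## Fair six-sided die as a discrete RV A (illustrative assumption)
z  <- 1:6
PA <- rep(1/6, 6)
g  <- function(z) z^2

E_A  <- sum(z * PA)       # E[A]    = sum of z * P_A(z)
E_gA <- sum(g(z) * PA)    # E[g(A)] = sum of g(z) * P_A(z)
g_EA <- g(E_A)            # g(E[A])

print(c(E_gA = E_gA, g_EA = g_EA))
```

Here E[g(A)] = 91/6 while g(E[A]) = 3.5^2 = 12.25, so the two quantities differ.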
Characteristics of Random Variables
Measures of Extension
Variance:
$\mathrm{Var}[Z] = E\left[ (Z - E[Z])^2 \right]$
Standard Deviation:
$\mathrm{SD}[Z] = \sqrt{\mathrm{Var}[Z]}$
Inter-Quantile Range:
$\mathrm{QR}_p[Z] = Q_{(1+p)/2}[Z] - Q_{(1-p)/2}[Z]$
Characteristics of Random Variables
$E[aZ + b] = a\, E[Z] + b$
$E[Z_1 \pm Z_2] = E[Z_1] \pm E[Z_2]$
$\mathrm{Var}[Z] = E[Z^2] - E[Z]^2$
$\mathrm{Var}[aZ + b] = a^2\, \mathrm{Var}[Z]$
Only if $Z_1$ and $Z_2$ are independent (or, more generally, uncorrelated):
$\mathrm{Var}[Z_1 \pm Z_2] = \mathrm{Var}[Z_1] + \mathrm{Var}[Z_2]$
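The two variance rules follow directly from the definition of the variance and the linearity of the expected value. A short derivation, written out as a sketch of the missing steps:

```latex
\begin{align*}
\mathrm{Var}[Z] &= E\left[(Z - E[Z])^2\right]
                 = E\left[Z^2 - 2 Z\, E[Z] + E[Z]^2\right] \\
                &= E[Z^2] - 2 E[Z]\, E[Z] + E[Z]^2
                 = E[Z^2] - E[Z]^2 \\[4pt]
\mathrm{Var}[aZ + b] &= E\left[(aZ + b - a E[Z] - b)^2\right]
                      = a^2\, E\left[(Z - E[Z])^2\right]
                      = a^2\, \mathrm{Var}[Z]
\end{align*}
```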
Multivariate random variables

Joint distribution
Discrete RV:
$P_{A,B}(a, b) = P_{A|B}(a|b) \cdot P_B(b) = P_{B|A}(b|a) \cdot P_A(a)$
E.g. $P_{A,B}(3, 1)$: probability of obtaining $a = 3$ and $b = 1$.
Continuous RV:
$f_{A,B}(a, b) = f_{A|B}(a|b) \cdot f_B(b) = f_{B|A}(b|a) \cdot f_A(a)$
E.g. $f_{A,B}(3, 1)$: proportional to the probability of obtaining a realization close to $a = 3$ and $b = 1$.
Conditional Distributions
Discrete RV:
$P_{A|B}(a|b) = \frac{P_{A,B}(a, b)}{P_B(b)}$
Continuous RV:
$f_{A|B}(a|b) = \frac{f_{A,B}(a, b)}{f_B(b)}$
Marginal distribution
Discrete random variables:
$P_A(a) = \sum_{b \in \Omega_B} P_{A,B}(a, b)$
Continuous random variables:
$f_A(a) = \int_{\Omega_B} f_{A,B}(a, b)\, db$
Independence
Definition:
$F_{A,B}(a, b) = F_A(a) \cdot F_B(b)$
Discrete random variables:
$P_{A,B}(a, b) = P_A(a) \cdot P_B(b)$
Continuous random variables:
$f_{A,B}(a, b) = f_A(a) \cdot f_B(b)$
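For discrete RVs the joint, marginal and conditional distributions are simply tables, so the relations above can be tried out directly. The sketch below uses a made-up joint distribution P_AB (a hypothetical example, not from the slides) and base R only.

```r
## Hypothetical joint distribution P_{A,B}(a, b);
## rows: a in {1, 2, 3}, columns: b in {0, 1}
P_AB <- matrix(c(0.10, 0.20,
                 0.25, 0.05,
                 0.15, 0.25),
               nrow = 3, byrow = TRUE,
               dimnames = list(a = c("1", "2", "3"), b = c("0", "1")))

P_A <- rowSums(P_AB)   # marginal of A: sum over b
P_B <- colSums(P_AB)   # marginal of B: sum over a

## Conditional P_{A|B}(a | b): divide each column by P_B(b)
P_A_given_B <- sweep(P_AB, 2, P_B, "/")

## joint = conditional x marginal recovers the joint table
P_AB_rebuilt <- sweep(P_A_given_B, 2, P_B, "*")

## A and B are independent only if the joint equals the product of the marginals
independent <- isTRUE(all.equal(P_AB, outer(P_A, P_B), check.attributes = FALSE))

print(P_A_given_B)
print(independent)
```

Each column of P_A_given_B sums to one, and for this particular table the independence check returns FALSE.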
Bayes’ Theorem [1]
Discrete random variables
Because
$P_{A|B}(a|b)\, P_B(b) = P_{B|A}(b|a)\, P_A(a)$
we can write
$P_{A|B}(a|b) = \frac{P_{B|A}(b|a)\, P_A(a)}{P_B(b)} = \frac{P_{B|A}(b|a)\, P_A(a)}{\sum_{a' \in \Omega_A} P_{B|A}(b|a')\, P_A(a')}$
[1] Bayes’ Theorem as we know it today was actually formulated by P. Laplace in 1774 and not by T. Bayes.
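A worked example of the discrete form of Bayes' theorem, as a minimal sketch. The scenario (a diagnostic test) and all probabilities are hypothetical numbers chosen for illustration.

```r
## Hypothetical example: A = true state, B = test result
P_A <- c(healthy = 0.99, ill = 0.01)             # prior P_A(a)
P_pos_given_A <- c(healthy = 0.05, ill = 0.90)   # P_{B|A}("positive" | a)

## Denominator: P_B(b) = sum over a' of P_{B|A}(b | a') * P_A(a')
P_pos <- sum(P_pos_given_A * P_A)

## Bayes' theorem: P_{A|B}(a | "positive")
P_A_given_pos <- P_pos_given_A * P_A / P_pos

print(P_A_given_pos)
```

The result is again a probability distribution over a (it sums to one); the denominator only acts as a normalizing constant.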
Bayes’ Theorem
Continuous random variables
$f_{A|B}(a|b) = \frac{f_{B|A}(b|a)\, f_A(a)}{f_B(b)} = \frac{f_{B|A}(b|a)\, f_A(a)}{\int f_{B|A}(b|a')\, f_A(a')\, da'}$
Characteristics of Random Variables
Dependencies
Variance-Covariance Matrix:
$\mathrm{Var}[Z] = E\left[ (Z - E[Z]) (Z - E[Z])^T \right]$
Individual Covariances:
$\mathrm{Cov}[Z_i, Z_j] = E\left[ (Z_i - E[Z_i]) (Z_j - E[Z_j]) \right] = \mathrm{Var}[Z]_{i,j}$
Correlation Matrix:
$\mathrm{Cor}[Z]_{i,j} = \frac{\mathrm{Cov}[Z_i, Z_j]}{\sqrt{\mathrm{Var}[Z_i] \cdot \mathrm{Var}[Z_j]}}$
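The correlation matrix is obtained from the variance-covariance matrix by dividing each covariance by the two corresponding standard deviations. A minimal sketch with a made-up matrix V (hypothetical values), compared against base R's `cov2cor()`:

```r
## Hypothetical variance-covariance matrix of a 3-dimensional RV Z
V <- matrix(c( 4.0, 1.2, -0.8,
               1.2, 1.0,  0.3,
              -0.8, 0.3,  2.25), nrow = 3, byrow = TRUE)

sd_Z <- sqrt(diag(V))        # SD[Z_i] = sqrt(Var[Z_i])
R <- V / outer(sd_Z, sd_Z)   # Cor[Z]_{i,j} = Cov[Z_i, Z_j] / (SD[Z_i] * SD[Z_j])

## base R provides the same conversion
R_builtin <- cov2cor(V)

print(R)
```

The diagonal of R is one by construction, and all off-diagonal entries lie between -1 and 1.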
Correlation
Correlation measures only linear dependencies!
Figure: Several sets of (x, y) points, with the correlation coefficient of x
and y for each set. Source: Wikipedia.
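A simple numerical illustration of this point, as a sketch: y below is a deterministic function of x, so the two are fully dependent, yet their correlation is essentially zero because the relationship is not linear. The grid of x values is an arbitrary choice.

```r
x <- seq(-1, 1, length.out = 201)   # values symmetric around zero
y <- x^2                            # y is completely determined by x

cor_xy  <- cor(x, y)        # close to zero despite full dependence
cor_abs <- cor(abs(x), y)   # on one branch the dependence is monotone

print(c(cor_xy = cor_xy, cor_abs = cor_abs))
```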
Short Notation
Function argument corresponds to RV
$P_A(a),\ P_{B|A}(b|a) \longleftrightarrow P(a),\ P(b|a)$
$f_B(b),\ f_{A|B}(a|b) \longleftrightarrow f(b),\ f(a|b)$ or $p(b),\ p(a|b)$
Example:
$f_{X_1|X_2,X_3}(x_1|x_2, x_3) = \frac{f_{X_2|X_1}(x_2|x_1)\, f_{X_1|X_3}(x_1|x_3)}{f_{X_2}(x_2)}$
$p(x_1|x_2, x_3) = \frac{p(x_2|x_1)\, p(x_1|x_3)}{p(x_2)}$
Directed Acyclic Graphs
Visualize independence structure of RV
[Figure: directed acyclic graph with the nodes A, B, C and D, annotated with $p(A)$, $p(B \mid A)$, $p(C \mid A, B)$ and $p(D \mid B)$.]
E.g. A and D are conditionally independent given B.
Joint distribution:
$p(A, B, C, D) = p(A)\, p(B \mid A)\, p(C \mid A, B)\, p(D \mid B)$
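The conditional independence of A and D given B can be read off the factorization of the joint distribution. A short derivation, assuming that factorization is the complete description of the joint distribution (for a continuous C the sum becomes an integral):

```latex
\begin{align*}
p(A, B, D) &= \sum_{C} p(A)\, p(B \mid A)\, p(C \mid A, B)\, p(D \mid B)
            = p(A)\, p(B \mid A)\, p(D \mid B) \\[4pt]
p(A, D \mid B) &= \frac{p(A, B, D)}{p(B)}
                = \frac{p(A)\, p(B \mid A)}{p(B)}\, p(D \mid B)
                = p(A \mid B)\, p(D \mid B)
\end{align*}
```

The first line uses that $p(C \mid A, B)$ sums to one over C; the second uses Bayes' theorem for $p(A \mid B)$.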
Normal distribution
Central Limit Theorem
Let $X_1, X_2, \ldots$ be independent and identically distributed RVs with mean $\mu$ and finite variance $\sigma^2$. Further, define $S_n = X_1 + X_2 + \ldots + X_n$, which has mean $n\mu$ and variance $n\sigma^2$.
Then the standardized RV
$Z_n = \frac{S_n - n\mu}{\sqrt{n}\, \sigma}$
converges in distribution to a standard normal RV for $n \to \infty$.
Central Limit Theorem Example
[Figure: twelve panels, one for each of n = 1, 2, ..., 12, each showing a density over the range −2 to 2.]
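The convergence stated by the central limit theorem can be explored by simulation. The sketch below is not the code behind the figure; it assumes uniformly distributed X_i on (0, 1), and the helper name `standardized_sum` and the number of replications are illustrative choices.

```r
set.seed(1)             # for reproducibility
n_rep <- 10000          # number of simulated sums per n
mu    <- 0.5            # mean of U(0, 1)
sigma <- sqrt(1 / 12)   # standard deviation of U(0, 1)

## Simulate n_rep realizations of the standardized sum Z_n
standardized_sum <- function(n) {
  X <- matrix(runif(n * n_rep), nrow = n_rep)   # each row holds X_1, ..., X_n
  (rowSums(X) - n * mu) / (sqrt(n) * sigma)
}

for (n in c(1, 2, 12)) {
  Z <- standardized_sum(n)
  ## compare an empirical probability with the standard normal value
  cat("n =", n, " P(Z_n <= 1):", mean(Z <= 1), " pnorm(1):", pnorm(1), "\n")
}
```

Plotting `hist(standardized_sum(n), freq = FALSE)` together with `curve(dnorm(x), add = TRUE)` for increasing n gives a visual comparison with the standard normal density.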
Relationships of Univariate Distributions
Figure 1. Univariate distribution relationships.
From: Leemis, L. M. and McQueston, J. T. (2008) Univariate distribution relationships. The American Statistician, 62(1), 45–53.
Multivariate Normal Distribution
Density of a multivariate Normal distribution of dimension $n$ with a mean vector $\mu$ and a variance-covariance matrix $\Sigma$:
$Z \sim N(\mu, \Sigma)$
$f_{N(\mu,\Sigma)}(z) = \frac{1}{(2\pi)^{n/2}}\, \frac{1}{|\Sigma|^{1/2}}\, \exp\left( -\frac{1}{2} (z - \mu)^T \Sigma^{-1} (z - \mu) \right)$
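The density formula translates almost literally into R. In the sketch below, `dmvn` is a hypothetical helper written for this example (it is not a base R function), and the parameter values are made up. For a diagonal variance-covariance matrix the components are independent, so the joint density must equal the product of the univariate normal densities, which gives a simple check.

```r
## Density of N(mu, Sigma) at z, written directly from the formula
dmvn <- function(z, mu, Sigma) {
  n <- length(mu)
  d <- z - mu
  quad <- drop(t(d) %*% solve(Sigma, d))   # (z - mu)^T Sigma^{-1} (z - mu)
  exp(-0.5 * quad) / ((2 * pi)^(n / 2) * sqrt(det(Sigma)))
}

mu    <- c(1, -2)
Sigma <- diag(c(4, 0.25))   # variances 4 and 0.25, no covariance
z     <- c(0.5, -1.5)

f_joint   <- dmvn(z, mu, Sigma)
## note: dnorm() takes standard deviations, i.e. sqrt(4) and sqrt(0.25)
f_product <- dnorm(z[1], mean = 1, sd = 2) * dnorm(z[2], mean = -2, sd = 0.5)

print(c(joint = f_joint, product = f_product))
```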
Multivariate Normal Distribution
Properties
All marginals are normally distributed:
$Z \sim N(\mu, \Sigma) \Rightarrow Z_i \sim N(\mu_i, \Sigma_{i,i})$
Linear transformation:
$Z \sim N(\mu, \Sigma) \Rightarrow AZ + b \sim N(A\mu + b,\ A \Sigma A^T)$
Conditional distribution:
$Z = \begin{pmatrix} X \\ Y \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix}, \begin{pmatrix} \Sigma_{X,X} & \Sigma_{X,Y} \\ \Sigma_{X,Y}^T & \Sigma_{Y,Y} \end{pmatrix} \right)$
$\Rightarrow X \mid Y = y \sim N\left( \mu_X + \Sigma_{X,Y} \Sigma_{Y,Y}^{-1} (y - \mu_Y),\ \Sigma_{X,X} - \Sigma_{X,Y} \Sigma_{Y,Y}^{-1} \Sigma_{X,Y}^T \right)$
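The conditional distribution can be computed with a few matrix operations. The sketch below wraps the formula in a hypothetical helper `cond_mvn` (name and argument layout are assumptions for this example) and applies it to a bivariate case with made-up numbers.

```r
## Mean and variance-covariance matrix of X | Y = y for a partitioned normal
cond_mvn <- function(mu_X, mu_Y, S_XX, S_XY, S_YY, y) {
  K <- S_XY %*% solve(S_YY)                  # Sigma_XY Sigma_YY^{-1}
  list(mean = drop(mu_X + K %*% (y - mu_Y)),
       var  = S_XX - K %*% t(S_XY))
}

## Bivariate example: all blocks are 1 x 1 matrices (hypothetical values)
res <- cond_mvn(mu_X = 1, mu_Y = -1,
                S_XX = matrix(4), S_XY = matrix(1.5), S_YY = matrix(1),
                y = 0.5)

print(res$mean)   # 1 + 1.5 * (0.5 - (-1)) = 3.25
print(res$var)    # 4 - 1.5 * 1.5 = 1.75
```

Observing Y shifts the mean of X and always reduces (or at best keeps) its variance; the conditional variance does not depend on the observed value y.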
Further Generalization
one-dimensional
n-dimensional

what’s next?
Discrete random process
“Random vectors with an infinitely large number of elements”
(0.11, 10.78, -10.24, -3.90, 5.91, ...)
(-1.11, -4.06, -8.64, -0.92, -2.27, ...)
(0.76, -8.54, 0.81, 2.03, 12.9, ...)
Continuous random processes
“Random functions”
What is a Probability?
Interpretation of probabilities
1. The probability for “head” is 1/2.
2. The probability that it rains tomorrow is 30%.
Frequentist
1. The frequency with which “head” occurs if the random experiment is repeated.
2. “Rain tomorrow” is not a repeatable experiment.
Subjective
1. Somebody’s belief that a coin toss results in “head”, given his/her experience.
2. Somebody’s belief that it rains tomorrow, given his/her experience.
Other probability interpretations:
→ http://www.webcitation.org/6YupVo9zG
Summary
joint = conditional × marginal
$f(a, b) = f(a|b)\, f(b) = f(b|a)\, f(a)$
Marginals:
$f(a) = \int f(a, b)\, db = \int f(a|b)\, f(b)\, db$
More information in Appendix A.2 – A.5.
Common distributions
Implemented distributions in R
For each distribution up to four functions are implemented (the multinomial has only d__ and r__):
d__(x, ...)   pdf evaluated at x
p__(x, ...)   cdf evaluated at x
q__(p, ...)   p-th quantile
r__(n, ...)   sample n random numbers

Distribution         R name
beta                 *beta
binomial             *binom
Cauchy               *cauchy
chi-squared          *chisq
exponential          *exp
F                    *f
gamma                *gamma
geometric            *geom
hypergeometric       *hyper
log-normal           *lnorm
multinomial          *multinom
negative binomial    *nbinom
normal               *norm
Poisson              *pois
Student’s t          *t
uniform              *unif
Weibull              *weibull
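As a brief usage sketch, the four function types applied to the normal distribution; the argument values are arbitrary. The quantile function inverts the cdf, which the example uses as a sanity check.

```r
x <- 1.5

d <- dnorm(x, mean = 0, sd = 1)   # pdf evaluated at x
p <- pnorm(x, mean = 0, sd = 1)   # cdf evaluated at x
q <- qnorm(p, mean = 0, sd = 1)   # p-th quantile; inverts the cdf, so q equals x

set.seed(42)
r <- rnorm(5, mean = 0, sd = 1)   # five random numbers

## the same naming pattern applies to the other distributions,
## e.g. the exponential distribution with rate 3
q_exp <- qexp(pexp(2, rate = 3), rate = 3)

print(c(d = d, p = p, q = q, q_exp = q_exp))
```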
Normal Distribution
Density
$Z \sim N(\mu, \sigma)$
$f_{N(\mu,\sigma)}(z) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(z - \mu)^2}{2\sigma^2} \right)$
[Figure: density f and cumulative distribution function F of the normal distribution with mean = 0, plotted against z from −3 to 3 for sd = 0.1, 0.25, 0.5, 1, 2 and 4.]
Normal Distribution
Properties
$E[N(\mu, \sigma)] = \mathrm{Mode}[N(\mu, \sigma)] = \mathrm{Med}[N(\mu, \sigma)] = \mu$
$\mathrm{SD}[N(\mu, \sigma)] = \sigma$
Central limit theorem:
Let $X_1, X_2, \ldots$ be independent and identically distributed RVs with mean $\mu$ and finite variance $\sigma^2$. Further, define $S_n = X_1 + X_2 + \ldots + X_n$, which has mean $n\mu$ and variance $n\sigma^2$.
Then the standardized RV
$Z_n = \frac{S_n - n\mu}{\sqrt{n}\, \sigma}$
converges in distribution to a standard normal RV for $n \to \infty$.
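Lognormal Distribution
Definition:
$Z = \exp(X), \quad X \sim N(m, s)$
Density:
$Z \sim LN(\mu, \sigma)$
$f_{LN(\mu,\sigma)}(z) = \begin{cases} \frac{1}{\sqrt{2\pi}}\, \frac{1}{s z}\, \exp\left( -\frac{1}{2}\, \frac{\left( \log\frac{z}{\mu} + \frac{s^2}{2} \right)^2}{s^2} \right) & \text{for } z > 0 \\ 0 & \text{for } z \le 0 \end{cases}$
with
$s = \sqrt{\log\left( 1 + \frac{\sigma^2}{\mu^2} \right)}$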
Lognormal Distribution
[Figure: density f and cumulative distribution function F of the lognormal distribution with mean = 1, plotted against z from 0 to 3 for sd = 0.1, 0.25, 0.5, 1, 2 and 4.]
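Lognormal Distribution
Properties
$E[LN(\mu, \sigma)] = \mu$
$\mathrm{Mode}[LN(\mu, \sigma)] = \frac{\mu}{\left( 1 + \frac{\sigma^2}{\mu^2} \right)^{3/2}}$
$\mathrm{Med}[LN(\mu, \sigma)] = \frac{\mu}{\sqrt{1 + \frac{\sigma^2}{\mu^2}}}$
$\mathrm{SD}[LN(\mu, \sigma)] = \sigma$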
Lognormal Distribution
R implementation
Attention: The lognormal distribution in R is parameterized with m and s (the mean and standard deviation of X = log(Z))!
The code below computes these arguments if the mean µ and standard deviation σ of Z are given:
## conversion, 'mu' and 'sigma' given (example values)
mu <- 1
sigma <- 0.5
meanlog <- log(mu) - 0.5*log(1 + (sigma/mu)^2)
sdlog <- sqrt(log(1 + sigma^2/(mu^2)))
## generate 1000 random samples
rlnorm(1000, meanlog=meanlog, sdlog=sdlog)
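The conversion can be verified without sampling: for Z = exp(X) with X normal, mean, standard deviation and median have closed forms in meanlog and sdlog. A minimal sketch; the values of mu and sigma are arbitrary examples.

```r
mu    <- 2      # desired mean of Z (example value)
sigma <- 0.7    # desired standard deviation of Z (example value)

meanlog <- log(mu) - 0.5 * log(1 + (sigma / mu)^2)
sdlog   <- sqrt(log(1 + sigma^2 / mu^2))

## closed-form mean and sd of exp(X), X ~ N(meanlog, sdlog)
mean_Z <- exp(meanlog + sdlog^2 / 2)
sd_Z   <- sqrt((exp(sdlog^2) - 1) * exp(2 * meanlog + sdlog^2))

## independent numerical check of the mean: integrate z * f(z) over z > 0
mean_num <- integrate(function(z) z * dlnorm(z, meanlog, sdlog), 0, Inf)$value

## the median is the 0.5 quantile
med_Z <- qlnorm(0.5, meanlog, sdlog)

print(c(mean = mean_Z, sd = sd_Z, mean_num = mean_num, median = med_Z))
```

The recovered mean and standard deviation should equal mu and sigma, and the median should equal mu / sqrt(1 + sigma^2 / mu^2).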
χ² Distribution
Definition:
$Z = \sum_{i=1}^{n} X_i^2, \quad X_i \sim N(0, 1)$ (independent)
Density:
$Z \sim \chi^2_n$
$f_{\chi^2_n}(z) = \frac{z^{(n-2)/2} \exp(-z/2)}{2^{n/2}\, \Gamma(n/2)}$
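Both the density formula and the definition can be checked in R. The sketch below compares a direct implementation of the formula (the helper name `dchisq_formula` is made up for this example) with the built-in `dchisq()`, and simulates the definition as a sum of squared independent standard normal RVs; n, the evaluation points and the sample size are arbitrary choices.

```r
## Density written directly from the formula
dchisq_formula <- function(z, n) {
  z^((n - 2) / 2) * exp(-z / 2) / (2^(n / 2) * gamma(n / 2))
}

n <- 4
z <- c(0.5, 1, 3, 7.5)
f_formula <- dchisq_formula(z, n)
f_builtin <- dchisq(z, df = n)
print(cbind(formula = f_formula, builtin = f_builtin))

## Definition check by simulation: sum of n squared standard normals
set.seed(123)
Z <- rowSums(matrix(rnorm(n * 20000), ncol = n)^2)
print(c(mean = mean(Z), sd = sd(Z)))   # expected to be near n and sqrt(2 * n)
```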
χ² Distribution
[Figure: density f and cumulative distribution function F of the χ² distribution, plotted against z from 0 to 14 for df = 1, 2, 3, 4, 5 and 10.]
χ² Distribution
Properties
$E[\chi^2_n] = n$
$\mathrm{Mode}[\chi^2_n] = n - 2$ for $n \ge 2$
$\mathrm{SD}[\chi^2_n] = \sqrt{2n}$
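F Distribution
Definition:
$Z = \frac{X / n}{Y / m}, \quad X \sim \chi^2_n, \quad Y \sim \chi^2_m$ (independent)
Density:
$Z \sim F_{n,m}$
$f_{F_{n,m}}(z) = \frac{\Gamma\left( (n + m)/2 \right) (n/m)^{n/2}\, z^{(n-2)/2}}{\Gamma(n/2)\, \Gamma(m/2)\, \left( 1 + \frac{n}{m} z \right)^{(n+m)/2}}$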
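As a check on the density, the sketch below implements the standard F density, including the factor (1 + n z / m)^(-(n + m) / 2), and compares it with R's built-in `df()`. The helper name `df_formula`, the degrees of freedom and the evaluation points are choices made for this example.

```r
## Standard F density with n and m degrees of freedom, written out explicitly
df_formula <- function(z, n, m) {
  gamma((n + m) / 2) / (gamma(n / 2) * gamma(m / 2)) *
    (n / m)^(n / 2) * z^((n - 2) / 2) * (1 + n * z / m)^(-(n + m) / 2)
}

n <- 5
m <- 10
z <- c(0.2, 1, 2.5)

f_formula <- df_formula(z, n, m)
f_builtin <- df(z, df1 = n, df2 = m)

print(cbind(formula = f_formula, builtin = f_builtin))
```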
F Distribution
[Figure: density f and cumulative distribution function F of the F distribution, plotted against z from 0 to 4 for (df1, df2) = (2, 10), (3, 10), (5, 10) and (5, 100).]
F Distribution
Properties
$E[F_{n,m}] = \frac{m}{m - 2}$ for $m > 2$
$\mathrm{Mode}[F_{n,m}] = \frac{m (n - 2)}{n (m + 2)}$ for $n > 2$
$\mathrm{SD}[F_{n,m}] = \sqrt{\frac{2 m^2 (n + m - 2)}{n (m - 2)^2 (m - 4)}}$ for $m > 4$
t Distribution
Definition:
$Z = \frac{X}{\sqrt{Y / n}}, \quad X \sim N(0, 1), \quad Y \sim \chi^2_n$ (independent)
Density:
$Z \sim t_n$
$f_{t_n}(z) = \frac{\Gamma\left( (n + 1)/2 \right)}{\sqrt{\pi n}\, \Gamma(n/2)\, (1 + z^2/n)^{(n+1)/2}}$

t Distribution
[Figure: density f and cumulative distribution function F of the t distribution, plotted against z from −6 to 6 for df = 1, 2, 4, 10 and 100.]
t Distribution
Properties
$E[t_n] = \mathrm{Mode}[t_n] = 0$ for $n > 1$
$\mathrm{SD}[t_n] = \sqrt{\frac{n}{n - 2}}$ for $n > 2$
Uniform Distribution
Density
$Z \sim U(z_{min}, z_{max})$
$f_{U(z_{min}, z_{max})}(z) = \frac{1}{z_{max} - z_{min}}$ for $z_{min} \le z \le z_{max}$
[Figure: density f and cumulative distribution function F of the uniform distribution with mean = 0, plotted against z from −3 to 3 for max = 1 and max = 2.]
Uniform Distribution
Properties
$E[U(z_{min}, z_{max})] = \frac{z_{min} + z_{max}}{2}$
$\mathrm{Med}[U(z_{min}, z_{max})] = \frac{z_{min} + z_{max}}{2}$
$\mathrm{SD}[U(z_{min}, z_{max})] = \frac{z_{max} - z_{min}}{2 \sqrt{3}}$