Deriving likelihood functions is often perceived as a daunting task. These slides show how the likelihood function is derived in the general case and demonstrate it for different models.
Part of the Eawag Summer School on System Analysis.
1. Formulation of model likelihood functions
The most useful representation of stochastic models
June 12, 2017
Andreas Scheidegger
Eawag: Swiss Federal Institute of Aquatic Science and Technology
2. Statistical models are stories about how the data
came to be – Dave Harris
Andreas Scheidegger Motivation 1
4. What is a likelihood function?
Definition
The likelihood function p(y1, . . . , yn|θ), or L(θ), is the joint
probability (density) of the observations {y1, . . . , yn} given a
stochastic model with parameter values θ.
Informal
If we simulate data with the stochastic model while setting the
parameters equal to θ, what is the probability (density) that we
obtain exactly our measurements {y1, . . . , yn}?
5. What do we need likelihood functions for?
Many parameter calibration and prediction techniques require that
the model is described by its likelihood function:
Frequentist statistics:
Maximum likelihood estimator (MLE), LR-tests, . . .
Bayesian statistics:
Parameter inference, uncertainty propagation,
predictions, model comparison, . . .
→ topic of this course
Note: The actual value of the likelihood function per se is usually not of
interest.
6. How to formulate likelihood functions?
Often, models are not described by the likelihood function.
A common description may rather look like this:
Yi = M(xi, θ) + εi,    εi ∼ N(0, σ²)
While this is a complete description of the stochastic model¹, it is
not directly useful for inference → we must translate such a
description into p(y|θ, x).
¹ M(xi, θ) is a deterministic function. The complete model, however,
is stochastic because we add a random error term εi.
7. Derivation of a likelihood function
1. Decompose the joint probability density:
p(y|θ) = p(y1, . . . , yn|θ) = p(y1|θ) p(y2|θ, y1) p(y3|θ, y1, y2) · · · p(yn|θ, y1, . . . , yn−1)
2. Formulate the conditional probabilities:
p(yi |θ, y1, . . . , yi−1)
If the observations are independent:
p(yi |θ, y1, . . . , yi−1) = p(yi |θ).
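As a sanity check of step 1, the decomposition can be verified numerically. The following sketch (all numbers illustrative) compares the explicit joint density of a bivariate normal with standard margins and correlation rho against the chain-rule product p(y1) p(y2|y1):

```r
## bivariate standard normal with correlation rho (illustrative values)
rho <- 0.6
y1 <- 0.3; y2 <- -1.1

## joint density written out explicitly
p.joint <- exp(-(y1^2 - 2*rho*y1*y2 + y2^2)/(2*(1 - rho^2))) /
  (2*pi*sqrt(1 - rho^2))

## the same value via the decomposition p(y1) p(y2|y1),
## using Y2|Y1=y1 ~ N(rho*y1, 1 - rho^2)
p.chain <- dnorm(y1) * dnorm(y2, mean=rho*y1, sd=sqrt(1 - rho^2))

all.equal(p.joint, p.chain)   # TRUE
```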
8. Some (informal) advice
• First formulate the likelihood in general terms, without specific
distributional assumptions.
• Think of p(x) (informally!) as Prob(X = x) and change sums to
integrals.
• In practice, a function that is proportional to the likelihood
function is sufficient.
• The logarithmic scale is preferred for computation.
• Don’t worry about identifiability of the parameters at this stage.
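The last two points can be illustrated with a small sketch (illustrative numbers): multiplying many individual densities quickly underflows to zero in floating point, while the log likelihood remains a harmless finite number.

```r
set.seed(1)
y <- rnorm(1000)          # 1000 standard normal "observations"

prod(dnorm(y))            # underflows to 0
sum(dnorm(y, log=TRUE))   # finite log likelihood (about -1400)
```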
9. Example 1: sex ratio
discrete data
Observed data y
The gender of n newborns.
Model description
We assume the probability of a
girl is θ and of a boy 1 − θ.
11. Example 1: sex ratio
discrete data
Probability for a single observation:
Prob(yi|θ) = θ if yi = girl,   1 − θ if yi = boy
Independence is a reasonable assumption:
Prob(y1, . . . , yn|θ) = ∏_{i=1}^{n} Prob(yi|θ) = θ^#girls (1 − θ)^#boys
12. Example 1: sex ratio
discrete data
R implementation as function:
logL <- function(theta, n.girls, n.boys) {
  LL <- n.girls*log(theta) + n.boys*log(1-theta)
  return(LL)
}
Call:
logL(theta=0.4, n.girls=10, n.boys=5)
> -11.717035
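A possible usage sketch: the MLE of θ can be obtained by maximizing logL numerically with optimize() (the logL definition is repeated so the snippet is self-contained; for this model the analytical MLE is #girls/n = 10/15).

```r
logL <- function(theta, n.girls, n.boys) {
  LL <- n.girls*log(theta) + n.boys*log(1-theta)
  return(LL)
}

## maximize the log likelihood over theta in (0, 1)
mle <- optimize(logL, interval=c(0, 1), maximum=TRUE,
                n.girls=10, n.boys=5)
mle$maximum   # close to 10/15
```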
14. Example 2: rating curve
continuous data
Observed data y, x
n pairs of water level xi and
run-off yi .
Model description
Water level x and run-off y are
related as
y = RC(x, θ) = θ1 (x − θ2)^θ3
Figure: Rating curve of Sluzew
Creek. Sikorska et al. (2013)
17. Example 2: rating curve
continuous data
A deterministic model?
→ We must make assumptions
about the error distribution. E.g.,
Yi = RC(xi, θ) + εi,    εi ∼ N(0, σ²)
or equivalently
Yi ∼ N(RC(xi, θ), σ²)
The RC model describes only the
expected value of an observation
for a given xi .
So the pdf for a single
observation is the density of a
normal distribution²
p(yi|xi, θ, σ) = (1/σ) φ((yi − RC(xi, θ))/σ)
Finally, assuming independent
observations
p(y1, . . . , yn|x1, . . . , xn, θ, σ) = ∏_{i=1}^{n} p(yi|xi, θ, σ)
² φ(x) = (1/√(2π)) exp{−x²/2}
20. Example 2: rating curve
continuous data
Figure: Rating curve RC(X, θ) with water level X and run-off Y.
Example of a non-linear regression.
21. Example 2: rating curve
continuous data
## deterministic rating curve model
RC <- function(x, theta) {
  y <- theta[1]*(x-theta[2])^theta[3]
  return(y)
}

## log likelihood with normally distributed errors
## sigma is included as theta[4]=sigma.
logL <- function(theta, y.data, x.data) {
  mean.y <- RC(x.data, theta[1:3])   # mean value for y
  LL <- sum(dnorm(y.data, mean=mean.y,
                  sd=theta[4], log=TRUE))
  return(LL)
}
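A possible usage sketch: fit the rating curve to synthetic data by maximizing the log likelihood with optim(). RC and logL are repeated so the snippet is self-contained; the x values, true parameters, and starting values are made up for illustration.

```r
## rating curve model and log likelihood (as above)
RC <- function(x, theta) theta[1]*(x-theta[2])^theta[3]
logL <- function(theta, y.data, x.data) {
  mean.y <- RC(x.data, theta[1:3])
  sum(dnorm(y.data, mean=mean.y, sd=theta[4], log=TRUE))
}

## synthetic data with made-up true parameters
set.seed(1)
theta.true <- c(2, 0.5, 1.5, 0.1)      # theta1..theta3 and sigma
x.data <- seq(1, 3, length.out=50)
y.data <- RC(x.data, theta.true[1:3]) + rnorm(50, sd=theta.true[4])

## optim() minimizes, so pass the negative log likelihood;
## invalid parameter values are penalized instead of returning NaN
neg.logL <- function(theta) {
  ll <- logL(theta, y.data, x.data)
  if (!is.finite(ll)) return(1e10)
  -ll
}
fit <- optim(par=c(1, 0.3, 1, 0.2), fn=neg.logL)
fit$par   # compare with theta.true
```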
22. Example 3: limit of quantification
censored data
Observed data y
lab 1 lab 2 lab 3 . . .
concentration y1 y2 n.d. . . .
Limit of quantification: LOQ
standard deviation of measurements: σ
Model description
“A model? I just want to calculate the
concentration.”
23. Example 3: limit of quantification
censored data
Figure: Left censored data.
Model description
The measurements are normally
distributed around the true mean
θ with standard deviation σ.
28. Example 3: limit of quantification
censored data
Likelihood for a single measured observation:
p(yi|θ, σ) = (1/σ) φ((yi − θ)/σ)
Likelihood for a single “not detected” observation:
Prob(n.d.|θ, σ) = Prob(yi < LOQ|θ, σ) = ∫_{−∞}^{LOQ} p(y|θ, σ) dy = Φ((LOQ − θ)/σ)
p(y1, . . . , yn|θ, σ) = Prob(yi < LOQ|θ, σ)^#censored · ∏_{¬censored} p(yi|θ, σ)
29. Example 3: limit of quantification
censored data
## data, left censored observations coded as "nd"
y <- c(y1=0.35, y2=0.45, y3="nd", y4="nd", y5=0.4)

## log likelihood
logL <- function(theta, y, sigma, LOQ) {
  ## number of censored observations
  n.censored <- sum(y=="nd")
  ## convert non-censored observations into type 'numeric'
  y.not.cen <- as.numeric(y[y!="nd"])
  ## log likelihood of the non-censored observations
  LL.not.cen <- sum(dnorm(y.not.cen, mean=theta, sd=sigma, log=TRUE))
  ## log likelihood of the left censored observations
  LL.left.cen <- n.censored * pnorm(LOQ, mean=theta, sd=sigma, log.p=TRUE)
  return(LL.not.cen + LL.left.cen)
}
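A possible usage sketch (the logL definition is repeated in compact form so the snippet is self-contained): estimate θ with optimize(), holding σ and LOQ fixed at assumed values. The censored observations pull the estimate below the mean of the measured values.

```r
## censored-data log likelihood (as above, compact form)
logL <- function(theta, y, sigma, LOQ) {
  n.censored <- sum(y=="nd")
  y.not.cen <- as.numeric(y[y!="nd"])
  sum(dnorm(y.not.cen, mean=theta, sd=sigma, log=TRUE)) +
    n.censored * pnorm(LOQ, mean=theta, sd=sigma, log.p=TRUE)
}

y <- c(y1=0.35, y2=0.45, y3="nd", y4="nd", y5=0.4)

## sigma and LOQ are assumed known (illustrative values)
est <- optimize(logL, interval=c(0, 1), maximum=TRUE,
                y=y, sigma=0.1, LOQ=0.3)
est$maximum   # MLE of theta, below mean(c(0.35, 0.45, 0.4))
```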
30. Example 4: auto-regressive model
auto-correlated data
Observed data y
equally spaced time series data y1, . . . , yn.
Model description
Classical AR(1) model:
yt+1 = θ yt + εt+1,    εt+1 ∼ N(0, σ²)
y0 = k
Figure: Annual water level of Lake Huron in feet.
Brockwell and Davis (1991)
32. Example 4: auto-regressive model
auto-correlated data
Each observation depends only on the immediately preceding
observation. Hence:
p(y1, . . . , yn|θ, σ, y0) = ∏_{i=1}^{n} p(yi|yi−1, θ, σ)
The conditional probabilities are all normal:
p(yt|y0, . . . , yt−1, θ, σ) = p(yt|yt−1, θ, σ) = (1/σ) φ((yt − θ yt−1)/σ)
LL <- dnorm(y[1], mean=theta*k, sd=sigma, log=TRUE) +
      sum(dnorm(y[2:n], mean=theta*y[1:(n-1)],
                sd=sigma, log=TRUE))
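A possible self-contained sketch wrapping the snippet above into a function and evaluating it on a simulated series. The initial value k, theta, sigma, and the series length are made up for illustration; note that p(y1|y0 = k) has mean θk.

```r
## AR(1) log likelihood with fixed initial value y0 = k
logL.ar1 <- function(theta, sigma, y, k) {
  n <- length(y)
  dnorm(y[1], mean=theta*k, sd=sigma, log=TRUE) +
    sum(dnorm(y[2:n], mean=theta*y[1:(n-1)],
              sd=sigma, log=TRUE))
}

## simulate a short AR(1) series starting from y0 = k
set.seed(1)
k <- 5; theta <- 0.8; sigma <- 0.5
n <- 30
y <- numeric(n)
y.prev <- k
for (t in 1:n) {
  y[t] <- theta*y.prev + rnorm(1, sd=sigma)
  y.prev <- y[t]
}

logL.ar1(0.8, sigma, y, k)   # log likelihood at the true theta
```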
33. Normality and “iid.”
Reality is normally not normally distributed
Typical statistical assumptions, such as
• normality
• independence
are often chosen from a computational viewpoint.
However, other distributional assumptions can be
incorporated easily in most cases.
34. Rating curve modified
Let’s assume we observe more extreme values than are compatible
with a normal distribution → try a t-distribution.
## log likelihood with t-distributed errors
## theta[4]=scale, theta[5]=degrees of freedom
logL <- function(theta, y.data, x.data) {
  mean.y <- RC(x.data, theta[1:3])           # mean value for y
  residuals <- (y.data - mean.y)/theta[4]    # scaling
  ## the -log(theta[4]) term is the Jacobian of the scaling;
  ## it matters here because the scale theta[4] is estimated
  LL <- sum(dt(residuals, df=theta[5], log=TRUE) - log(theta[4]))
  return(LL)
}
35. Summary
1. Decompose the joint probability density:
p(y|θ) = p(y1, . . . , yn|θ) = p(y1|θ) p(y2|θ, y1) p(y3|θ, y1, y2) · · · p(yn|θ, y1, . . . , yn−1)
2. Make assumptions to formulate the conditional probabilities:
p(yi|θ, y1, . . . , yi−1)
3. Make inference, check assumptions, revise if necessary.