This document presents an overview of maximum entropy approaches for operational risk modeling. It discusses challenges in modeling loss distributions with scarce data, including parameter uncertainty, poor tail fits, and an inability to separately fit the body and tails of distributions. The maximum entropy approach models the loss distribution by maximizing entropy subject to moment constraints derived from the empirical Laplace transform of the loss data. This provides a robust density reconstruction over the entire range of losses. The document provides examples of using maximum entropy to model univariate and multivariate loss distributions, including dependencies between different risk types.
MAXENTROPIC METHODS FOR OPERATIONAL RISK MODELING
1. MAXENTROPIC AND QUANTITATIVE METHODS IN OPERATIONAL RISK MODELING
Erika Gomes
epgomes@emp.uc3m.es
joint work with Silvia Mayoral and Henryk Gzyl
Department of Business Administration
Universidad Carlos III de Madrid
September 2016
2. Outline
Work Review
(P1) Two maxentropic approaches to determine the probability density of compound losses. Insurance: Mathematics and Economics, 2015.
(P2) Density reconstructions with errors in the data. Entropy, 2014.
(P3) Maxentropic approach to decompound aggregate risk losses. Insurance: Mathematics and Economics, 2015.
(P4) Loss data analysis: Analysis of the sample dependence in density reconstruction by maxentropic methods. Insurance: Mathematics and Economics, 2016.
(P5) Maximum entropy approach to the loss data aggregation problem. Journal of Operational Risk, 2016.
3. Outline
1 Introduction
Motivation
Methodology: Loss Distribution Approach
Univariate case
Multivariate case
2 Maximum Entropy Approach
Examples and Applications
Theory
3 Numerical Results
4 Conclusions
4-10. Introduction
Motivation
Banks have developed a conceptual framework to characterize and quantify risk, to set money aside to cover large-scale losses, and to ensure the stability of the financial system.
A similar problem appears in insurance, where premiums and optimal reinsurance levels must be set.
The difference between the two sectors lies in the availability of data: in operational risk the historical data sets are small, so the results may vary widely.
More precisely, we are interested in the calculation of regulatory/economic capital using advanced models (LDA: loss distribution approach) allowed by Basel II.
The problem is to calculate the amount of money needed in order to be hedged at a high level of confidence (VaR at 99.9%).
The regulation states that the allocated capital charge should correspond to a 1-in-1000-year worst loss event (the 0.999 quantile).
It is necessary to calculate the distribution of the losses, and the methodology used has to take into consideration challenges related to the size of the data sets, bimodality, heavy tails, and dependence, among others.
We propose to model the total losses by maximizing an entropy measure.
11. Introduction
Loss Distribution Approach (LDA): Univariate Case
Operational risk has to do with losses due to failures in processes, technology, people, etc.
Two variables play a role in operational risk:
Severity (X): lognormal, gamma, Weibull, subexponential distributions...
Frequency (N): Poisson, negative binomial, binomial distributions.
S = X_1 + X_2 + ... + X_N = \sum_{n=1}^{N} X_n
where S represents the aggregate claim amount in a fixed time period (typically one year) per risk event.
Approach used: fit parametric distributions to N and X and obtain f_S through recursive models or convolutions.
No single distribution fits well over the entire data set.
12. Introduction
Loss Distribution Approach (LDA): Multivariate Case
Each of these loss distributions is then summed over all types of risk to arrive at the total aggregate loss:
(S_1, ..., S_m) = (\sum_{i=1}^{N_1} X_{1i}, ..., \sum_{i=1}^{N_m} X_{mi})
S_T = \sum_{i=1}^{m} S_i = S_1 + S_2 + ... + S_m
where b = 1, ..., 8 (business lines), l = 1, ..., 7 (event types), and m = 8 × 7 is the number of types of risk in operational risk.
Dependence structure between the risks S_i, i.e. choice of a copula model.
13. Introduction
Loss Distribution Approach (LDA): Illustrative Example
Estimate parametric distributions for the frequency N and severity X of each individual risk (maximum-likelihood estimation, MLE).
Compound the distributions (Panjer, convolutions, Fourier, ...). Then we have f_{S_i} (univariate case).
The density f_{S_T} of the sum S_T = S_1 + ... + S_B (multivariate case) can then be obtained by a sequential convolution procedure:
1 Derive the distribution of the sum of a pair of values S_1 + S_2 from the joint density f_{S_1,S_2}(s_1, s_2) = f_{S_1}(s_1) f_{S_2}(s_2) c(F_{S_1}(s_1), F_{S_2}(s_2)), where c is the density of the copula model C.
2 Apply the convolution integral
f_{S_1+S_2}(\ell_{12}) = \int f_{S_1,S_2}(s_1, \ell_{12} - s_1) ds_1 = \int f_{S_1,S_2}(\ell_{12} - s_2, s_2) ds_2
Steps (1) and (2) are repeated for the remaining terms of the sum; a numerical sketch follows.
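A minimal numerical sketch of steps (1) and (2). Everything concrete here is an illustrative assumption of mine rather than the authors' setup: gamma marginals stand in for the fitted f_{S_1}, f_{S_2}, and a Gaussian copula with ρ = 0.5 stands in for the chosen copula model C.

```python
# Sequential convolution step: joint density from marginals and a Gaussian
# copula density, then f_{S1+S2} by discretized convolution on a grid.
import numpy as np
from scipy.stats import norm, gamma

rho = 0.5
f1, F1 = gamma(a=2.0).pdf, gamma(a=2.0).cdf   # stand-ins for fitted f_{S1}, F_{S1}
f2, F2 = gamma(a=3.0).pdf, gamma(a=3.0).cdf

def gaussian_copula_density(u, v):
    x, y = norm.ppf(u), norm.ppf(v)
    det = 1.0 - rho ** 2
    return np.exp(-(rho**2 * (x**2 + y**2) - 2*rho*x*y) / (2*det)) / np.sqrt(det)

s = np.linspace(1e-6, 30.0, 1200)             # common grid for s1 and s2
ds = s[1] - s[0]
u = np.clip(F1(s), 1e-9, 1 - 1e-9)
v = np.clip(F2(s), 1e-9, 1 - 1e-9)
joint = f1(s)[:, None] * f2(s)[None, :] * gaussian_copula_density(u[:, None], v[None, :])

# f_{S1+S2}(l_k) ~ sum_i f(s_i, l_k - s_i) * ds, staying on the same grid
idx = np.arange(len(s))
f_sum = np.array([joint[idx[:k+1], k - idx[:k+1]].sum() * ds for k in idx])
```

Repeating the same two steps with f_sum as the new marginal against f_{S_3}, and so on, yields f_{S_T}.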
14. Introduction
Illustrative Example
200 samples of size 10
Aggregation: 7 independent types of risks
What happens when data are scarce, as is common in banking?
Problems
Parameter uncertainty.
Poor fit in the tails.
Scarcity of data: impossible to fit tails and body separately.
Underestimation of the regulatory capital charge.
This methodology gives a poor fit even when re-sampling is an alternative.
15. Maximum entropy approach
Illustrative example - size and tail concern
[Figure: two panels, parametric approach vs. maxentropic approach; losses vs. density; legend: true density, average, reconstructions]
Maxentropic methodologies provide a density reconstruction over the entire range of values.
16. Maximum entropy approach
Illustrative example - bimodal concern
[Figure: two bimodal density reconstructions, panels (1) and (2)]
Error | (1)     | (2)
MAE   | 0.02652 | 0.01291
RMSE  | 0.03286 | 0.01647
Table: Errors.
The maxentropic approach is able to model asymmetries.
17. Maximum entropy approach
Dependencies
We use maxentropic methodologies to model dependencies between different types of risks in the framework of operational risk.
18. Maximum entropy approach
Find a probability distribution P on some measure space (Ω, F) which is absolutely continuous with respect to some (usually σ-finite) reference measure Q on (Ω, F):
\max_P H_Q(P) = -\int_Ω ρ(ξ) \ln ρ(ξ) dQ(ξ)
satisfying
{P << Q such that E_P[AX] = Y}
\int_Ω ρ(ξ) dQ(ξ) = 1
The method consists in finding the probability measure that best represents the current state of knowledge: the one with the largest information-theoretic entropy.
19. Maximum entropy approach
Jaynes, 1957
This concept was first used by Jaynes (1957) as a method of statistical inference for under-determined problems.
For example:
A six-sided die is rolled 1000 times and comes up with an average of 4.7 dots. We want to estimate, as best we can, the probability distribution of the faces.
There are infinitely many 6-tuples (p_1, ..., p_6) with p_i ≥ 0, \sum_i p_i = 1 and \sum_i i p_i = 4.7.
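A quick sketch of this example: the maxent solution has the exponential form p_i ∝ exp(-λ i), and λ is pinned down by the mean constraint, which can be solved by one-dimensional root finding.

```python
# Jaynes' die: maximize entropy subject to sum(p) = 1 and mean = 4.7.
# The solution is p_i = exp(-lam * i) / Z(lam); solve for lam numerically.
import numpy as np
from scipy.optimize import brentq

faces = np.arange(1, 7)

def mean_given_lam(lam):
    w = np.exp(-lam * faces)
    return (faces * w).sum() / w.sum()

lam = brentq(lambda l: mean_given_lam(l) - 4.7, -5.0, 5.0)  # mean > 3.5 => lam < 0
p = np.exp(-lam * faces)
p /= p.sum()
print(np.round(p, 4))   # maxent probabilities with mean 4.7
```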
20. Maximum Entropy Approach
General overview
The essence of the maxentropic method consists in transforming a problem of the type
AX = Y, X : Ω → C
into an optimization problem, by maximizing the entropy measure, where C is the constraint set of possible reconstructions.
Then we have a unique and robust solution.
21. Laplace Transform
In probability theory and statistics, the Laplace transform is defined as an expectation of a random variable:
ψ(α) = E[e^{-αS}] = \int_0^∞ e^{-αs} dF_S(s), S ∈ ℝ_+
If any two continuous functions have the same Laplace transform, then those functions must be identical.
The Laplace transforms of some pdfs are not easy to invert, and there is no completely general inversion method that works equally well for all possible transforms.
22. Laplace Transform
In probability theory and statistics, the Laplace transform is defined as an expectation of a random variable:
ψ(α) = E[e^{-αS}] = \int_0^∞ e^{-αs} dF_S(s), S ∈ ℝ_+
All the information about the problem can be compressed into a set of moments obtained from the Laplace transform, through a change of variables:
ψ(α) = E[e^{-αS}] = E[Y^α] = \int_0^1 y^α dF_Y(y) with Y = e^{-S} and Y ∈ (0, 1)
The moments should be selected so that only the most relevant or informative ones are used (Lin, 1992, and the entropy convergence theorem).
23. Laplace Transform
We want to model f_S with S > 0.
When N = 0 ⇒ S = 0, and we rewrite the Laplace transform as
ψ(α) = E[e^{-αS}] = P(S = 0) · E[e^{-αS} | S = 0] + P(S > 0) · E[e^{-αS} | S > 0]
= P(N = 0) · E[e^{-αS} | N = 0] + P(N > 0) · E[e^{-αS} | N > 0]
where P(S = 0) = P(N = 0) = p_0; then
ψ(α) = p_0 · 1 + (1 - p_0) · E[e^{-αS} | N > 0]
μ(α) = E[e^{-αS} | N > 0] = (ψ(α) - p_0) / (1 - p_0)
Both ψ(α) and p_0 have to be estimated from the data.
24. Input of the Methodology
Univariate Case
Thus, after a change of variables the problem becomes to determine f_S from the integral constraints
E[e^{-α_j S} | S > 0] = \int_0^1 y^{α_j} f_Y(y) dy = μ(α_j), j = 0, ..., K.
Analytical form
ψ(α_k) = E(e^{-α_k S}) = \sum_{n=0}^∞ (φ_X(α_k))^n p_n = G(φ_X(α_k)) with α_k = α_0/k
Numerical form
ψ(α_k) = (1/T) \sum_{i=1}^T e^{-α_k s_i} with α_k = α_0/k
where
α_0 = 1.5: a fractional value; k = 1, ..., K, the optimal number of moments.
φ_X(α_k): Laplace transform of X, α_k ∈ ℝ_+
G(·): probability generating function of the frequencies
ψ(α_k): Laplace transform of the total losses
T: sample size.
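A small sketch of the numerical input above, in my own notation: the fractional exponents α_k = α_0/k, the empirical Laplace transform, and the correction for the atom at zero from slide 23.

```python
# Empirical moment input for the maxent problem: alphas and
# mu(alpha_k) = (psi(alpha_k) - p0) / (1 - p0), with p0 = P(S = 0).
import numpy as np

def maxent_moments(s, K=8, alpha0=1.5):
    s = np.asarray(s, dtype=float)
    p0 = np.mean(s == 0.0)                     # estimate of P(N = 0)
    alphas = alpha0 / np.arange(1, K + 1)      # alpha_k = alpha0 / k
    psi = np.array([np.exp(-a * s).mean() for a in alphas])
    mu = (psi - p0) / (1.0 - p0)
    return alphas, mu, p0
```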
25. Input of the Methodology
Univariate Case
Thus, after a change of variables the problem becomes to determine f_S from the integral constraints
E[e^{-α_j S} | S > 0] = \int_0^1 y^{α_j} f_Y(y) dy = μ(α_j), j = 0, ..., K.
Analytical form
Fit the frequency and severity distributions parametrically and calculate the Laplace transform through the probability generating function.
Poisson-Gamma:
ψ(α_k) = exp(-λ (1 - b^a (α_k + b)^{-a})) with α_k = α_0/k
The quality of the results is linked to how well the data fit the chosen distributions.
It is not possible to find a closed form of ψ(α_k) for some pdfs. This is particularly true for long-tailed pdfs, for example the lognormal distribution.
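As a sanity check on the closed form above, one can compare it with a Monte Carlo estimate of E[e^{-αS}]; the parameter values below are my illustrative choices, small enough for the comparison to be numerically meaningful.

```python
# Closed-form psi(alpha) for a Poisson(lam)-Gamma(a, rate b) compound,
# checked against simulation. Parameters here are illustrative only.
import numpy as np

lam, a, b, alpha = 2.0, 2.0, 1.5, 0.75
phi = (b / (alpha + b)) ** a                     # Laplace transform of Gamma(a, b)
psi_closed = np.exp(-lam * (1.0 - phi))          # G(phi) for Poisson frequencies

rng = np.random.default_rng(1)
n = rng.poisson(lam, 50000)
s = np.array([rng.gamma(a, 1.0 / b, k).sum() for k in n])   # scale = 1/rate
print(psi_closed, np.exp(-alpha * s).mean())     # should agree closely
```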
27. Input of the Methodology
Multivariate Case: Dependencies
(1) We can add dependencies to our input, knowing each f_{S_i}:
ψ(α_k) = E[e^{-α_k (S_1 + S_2 + ... + S_B)}] = \sum_{i=1}^{N-1} e^{-(s_{1i} + s_{2i} + ... + s_{Bi}) α_k} f(s_{1i}, s_{2i}, ..., s_{Bi}) Δs_1 Δs_2 ... Δs_B
where N is the number of partitions used in the discretization and
f(s_1, s_2, ..., s_B) = c[F_1(s_1), ..., F_B(s_B)] \prod_{i=1}^B f_{S_i}(s_i)
is the joint distribution, c is the density of the copula model C, and f_{S_1}, ..., f_{S_B} are the marginal densities.
(2) Simply ψ(α_k) = (1/T) \sum_{i=1}^T e^{-α_k (s_{1i} + s_{2i} + ... + s_{Bi})}, where T is the sample size; a one-line sketch follows.
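Input (2) is a one-liner on joint samples; a sketch in my notation:

```python
# Empirical Laplace transform of the total loss from joint samples:
# samples has shape (T, B), one row per joint observation (s_1i, ..., s_Bi).
import numpy as np

def psi_total(samples, alpha_k):
    total = np.asarray(samples, dtype=float).sum(axis=1)   # S_1 + ... + S_B per row
    return np.exp(-alpha_k * total).mean()
```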
28. Maximum Entropy Methods
max H(f) = -\int_0^1 f_Y(y) \ln f_Y(y) dy
SME approach: find the probability density on [0, 1] such that
\int_0^1 y^{α_k} f_Y(y) dy = μ(α_k) with Y = e^{-S}
where μ(α_k) = (ψ(α_k) - P(N = 0)) / (1 - P(N = 0)).
MEM approach: extension of the SME approach that allows one to include a reference measure Q, which is a parametric distribution.
SMEE approach: extension of the SME approach for the case where the data are assumed to be noisy:
\int_0^1 y^{α_k} f_Y(y) dy ∈ C_k = [a_k, b_k] with Y = e^{-S}
These methods consist in finding the probability measure that best represents the current state of knowledge: the one with the largest information-theoretic entropy.
29. Standard Maximum Entropy Method (SME)
In general, the maximum entropy density is obtained by maximizing the entropy measure
max H(f) = -\int_0^1 f_Y(y) \ln f_Y(y) dy
satisfying
E(Y^{α_k}) = \int_0^1 y^{α_k} f_Y(y) dy = μ_{α_k}, k = 1, 2, ..., K with K = 8
\int_0^1 f_Y(y) dy = 1
where
μ_k: the k-th moment, a known positive value
K = 8: number of moments
fractional exponents: α_k = α_0/k, α_0 = 1.5
30. Standard Maximum Entropy Method (SME)
When the problem has a solution, it can be expressed in terms of the Lagrange multipliers as
f*_Y(y) = (1/Z(λ)) exp(-\sum_{k=1}^K λ_k y^{α_k}) = exp(-\sum_{k=0}^K λ_k y^{α_k})
where the normalization constant is determined by
Z(λ) = \int_Ω exp(-\sum_{k=1}^K λ_k y^{α_k}) dy
Then it is necessary to find λ*, the minimizer of the dual entropy, which is a function of the Lagrange multipliers λ and is given by
H(λ) = \ln Z(λ) + <λ, μ> = Σ(λ, μ)
Basically it is a problem of minimizing a convex function, reducing the step size as the iteration progresses (Barzilai-Borwein non-monotone gradient method):
f*_Y(y) = (1/Z(λ*)) exp(-\sum_{k=1}^K λ*_k y^{α_k}), y ∈ (0, 1)
31. Standard Maximum Entropy Method (SME)
1 The starting point is \int_0^∞ e^{-α_k s} dF_S(s) = μ_k, S ∈ (0, ∞).
2 Make a change of variables, setting Y = e^{-S}, Y ∈ (0, 1).
3 Find the minimizer of the dual entropy, a function of λ:
min_λ Σ(λ, μ) = \ln Z(λ) + <λ, μ>
where
Z(λ) = \int_0^1 e^{-\sum_{k=1}^K λ_k y^{α_k}} dy.
4 The solution is
f*_Y(y) = (1/Z(λ*)) e^{-\sum_{k=1}^K λ*_k y^{α_k}} = e^{-\sum_{k=0}^K λ*_k y^{α_k}}, y ∈ (0, 1)
5 Undo the change of variables:
f*_S(s) = e^{-s} f*_Y(e^{-s}), s ∈ (0, ∞)
A numerical sketch of these steps follows.
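A compact numerical sketch of steps 1-5, my implementation rather than the authors' code: quadrature on (0, 1) for Z(λ), and BFGS in place of the Barzilai-Borwein scheme named above (any convex minimizer of the dual will do).

```python
# SME: minimize the dual H(lam) = ln Z(lam) + <lam, mu> over lam, then
# recover f_Y on (0,1) and map back to f_S via f_S(s) = e^{-s} f_Y(e^{-s}).
import numpy as np
from scipy.optimize import minimize

def sme_density(mu, alphas, ngrid=2000):
    y = (np.arange(ngrid) + 0.5) / ngrid           # midpoint grid on (0, 1)
    dy = 1.0 / ngrid
    Ya = y[None, :] ** alphas[:, None]             # y^{alpha_k}, shape (K, ngrid)

    def dual(lam):
        return np.log(np.exp(-lam @ Ya).sum() * dy) + lam @ mu

    def grad(lam):                                 # mu_k - E_f[Y^{alpha_k}]
        w = np.exp(-lam @ Ya)
        return mu - (Ya * w).sum(axis=1) / w.sum()

    lam = minimize(dual, np.zeros(len(mu)), jac=grad, method="BFGS").x
    w = np.exp(-lam @ Ya)
    fY = w / (w.sum() * dy)                        # maxent density of Y
    return lambda s: np.exp(-s) * np.interp(np.exp(-s), y, fY), lam
```

Combined with maxent_moments from slide 24, `fS, lam = sme_density(mu, alphas)` gives the reconstructed aggregate-loss density.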
32. Extensions of the SME Approach: SMEE
Remember that μ_k = E[e^{-α_k S}] was estimated from observed values s_1, ..., s_N of S, but there is some measurement error ε.
Approach I: the input μ is an interval C_k = [a_k, b_k].
Find f*_S such that \int_Ω e^{-α_k s} dF_S(s) ∈ C_k
Approach II: we have two inputs, μ_k and an interval [a_k, b_k] for the errors, centered on zero.
Find f*_S and p*_k such that
\int_Ω e^{-α_k s} dF_S(s) + p_k a_k + (1 - p_k) b_k = μ_k
33. Extensions of the SME Approach (SMEE)
Approach II: we have two inputs, μ_k and an interval [a_k, b_k] for the errors, centered on zero:
μ_k = E[e^{-α_k S}] + ε, where ε ∈ [a_k, b_k]
max H(f, p) = -\int_0^1 f(y) \ln f(y) dy - \sum_{k=1}^K (p_k \ln p_k + (1 - p_k) \ln(1 - p_k))
such that
\int_0^1 y^{α_k} f_Y(y) dy + p_k a_k + (1 - p_k) b_k = μ_k
0 < p_k < 1, \int_0^1 f_Y(y) dy = 1
k = 1, ..., K with K = 8.
34. The solution can be expressed in terms of the Lagrange multipliers:
f*(y) = e^{-\sum_{k=1}^K λ_k y^{α_k}} / Z(λ)
p*_k = e^{-a_k λ_k} / (e^{-a_k λ_k} + e^{-b_k λ_k})
Here the normalization factor Z(λ) is as above. The vector λ* of Lagrange multipliers is found by minimizing the dual entropy
H(λ) = \ln Z(λ) + \sum_{k=1}^K \ln(e^{-a_k λ_k} + e^{-b_k λ_k}) + <λ, μ> = Σ(λ)
Once λ* is found, the estimator of the measurement error is given by
ε_k = (a_k e^{-a_k λ*_k} + b_k e^{-b_k λ*_k}) / (e^{-a_k λ*_k} + e^{-b_k λ*_k})
f*(y) = e^{-\sum_{k=1}^K λ*_k y^{α_k}} / Z(λ*), p*_k = e^{-a_k λ*_k} / (e^{-a_k λ*_k} + e^{-b_k λ*_k})
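The SMEE dual differs from the SME one only by the extra log-sum term for the error interval; a sketch of the modified objective follows (the symmetric interval a_k = -δ, b_k = +δ would be my illustrative choice in practice).

```python
# SMEE (Approach II) dual: ln Z(lam) + sum_k ln(e^{-a_k lam_k} + e^{-b_k lam_k})
# + <lam, mu>. After minimizing, p_k* has the closed form below.
import numpy as np

def smee_dual(lam, mu, alphas, a, b, ngrid=2000):
    y = (np.arange(ngrid) + 0.5) / ngrid
    Ya = y[None, :] ** alphas[:, None]
    logZ = np.log(np.exp(-lam @ Ya).mean())          # mean = integral on (0, 1)
    err = np.log(np.exp(-a * lam) + np.exp(-b * lam)).sum()
    return logZ + err + lam @ mu

def p_star(lam, a, b):
    return np.exp(-a * lam) / (np.exp(-a * lam) + np.exp(-b * lam))
```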
35. Numerical Results
1 Two Maxentropic Approaches to determine the probability density of compound
losses.
2 Density Reconstructions with Errors in the Data.
3 Maxentropic approach to decompound aggregate risk losses.
4 Loss data analysis: Analysis of the Sample Dependence in Density
Reconstruction by Maxentropic Methods.
5 Maximum entropy approach to the loss data aggregation problem.
36. Numerical Results
To test the methodology we consider different combinations of frequency and severity distributions.
We use a sample large enough that we need not worry about the effect of sample size on the results.
We use several methods to verify the quality of the results: L1 & L2 distances, MAE & RMSE distances, visual comparisons, and goodness-of-fit tests.
MAE = (1/T) \sum_{n=1}^T |F̂(x_n) - F_e(x_n)|
RMSE = \sqrt{(1/T) \sum_{n=1}^T (F̂(x_n) - F_e(x_n))^2}
RMSE is more sensitive to outliers, because it gives a relatively high weight to large errors. So the greater the difference between MAE and RMSE, the greater the variance of the individual errors in the sample.
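A direct sketch of the two error measures, comparing a fitted CDF F̂ with the empirical CDF at the order statistics:

```python
# MAE and RMSE between a fitted CDF and the empirical CDF of a sample.
import numpy as np

def mae_rmse(F_hat, sample):
    x = np.sort(np.asarray(sample, dtype=float))
    Fe = np.arange(1, len(x) + 1) / len(x)     # empirical CDF at order statistics
    d = F_hat(x) - Fe
    return np.abs(d).mean(), np.sqrt((d ** 2).mean())
```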
37. Simulation details
To test the methodology we consider different combinations of frequency and loss distributions.
S_bh | N_bh (frequency)                   | X_bh (severity)
S1   | Poisson(λ = 80)                    | Champernowne(α = 20, M = 85, c = 15)
S2   | Poisson(λ = 60)                    | Lognormal(μ = -0.01, σ = 2)
S3   | Binomial(n = 70, p = 0.5)          | Pareto(shape = 10, scale = 85)
S4   | Binomial(n = 62, p = 0.5)          | Champernowne(α = 10, M = 125, c = 45)
S5   | Binomial(n = 50, p = 0.5)          | Gamma(shape = 4500, rate = 15)
S6   | Binomial(n = 76, p = 0.5)          | Gamma(shape = 9000, rate = 35)
S7   | Negative Binomial(r = 80, p = 0.3) | Weibull(shape = 200, scale = 50)
Tail | Negative Binomial(r = 90, p = 0.8) | Pareto(shape = 5.5, scale = 5550)
Table: Inputs for the simulation of S
All the risks are independent.
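A sketch of how one row of the table can be simulated, taking S2 (Poisson frequency, lognormal severity) as the example; the other rows only change the two sampling calls.

```python
# Simulate T realizations of the compound loss S2: N ~ Poisson(60),
# X ~ Lognormal(mu = -0.01, sigma = 2), S = X_1 + ... + X_N.
import numpy as np

rng = np.random.default_rng(0)

def simulate_S2(T, lam=60.0, mu=-0.01, sigma=2.0):
    counts = rng.poisson(lam, size=T)
    return np.array([rng.lognormal(mu, sigma, n).sum() for n in counts])

s = simulate_S2(5000)    # one sample of aggregate losses
```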
38. Results
Approach | MAE      | RMSE
SMEE     | 0.005928 | 0.006836
SME      | 0.006395 | 0.009399
Table: MAE and RMSE for a sample size of 5000
MAE = (1/T) \sum_{n=1}^T |F̂(x_n) - F_e(x_n)|
RMSE = \sqrt{(1/T) \sum_{n=1}^T (F̂(x_n) - F_e(x_n))^2}
44. Computation of the regulatory capital
γ          | Empirical | SME    | SMEE
VaR  0.950 | 5.05      | 4.935  | 5.004
VaR  0.990 | 5.72      | 5.755  | 5.772
TVaR 0.950 | 5.45      | 5.443  | 5.461
TVaR 0.990 | 6.05      | 6.0207 | 6.014
Table: Comparison of VaR and TVaR at 95% and 99% for a single sample of size 5000
Size | VaR(95%) SME | SMEE  | TVaR(95%) SME | SMEE  | VaR(99%) SME | SMEE  | TVaR(99%) SME | SMEE
10   | 4.96         | 4.87  | 5.30          | 5.156 | 4.331        | 5.283 | 4.328         | 5.634
100  | 4.96         | 4.931 | 5.44          | 5.43  | 5.457        | 5.779 | 5.694         | 6.016
500  | 4.95         | 4.93  | 5.45          | 5.45  | 5.708        | 5.822 | 5.972         | 6.017
1000 | 4.95         | 4.95  | 5.45          | 5.45  | 5.729        | 5.828 | 5.977         | 6.064
Table: Mean and standard deviation of the VaR and TVaR for 200 samples of different sizes
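One way (my construction, not necessarily the authors' procedure) to read VaR and TVaR off a reconstructed density on a grid: integrate to the CDF, invert for the γ-quantile, and average the tail beyond it.

```python
# VaR and TVaR from a density callable fS, by cumulative quadrature.
import numpy as np

def var_tvar(fS, gamma, smax=20.0, n=20000):
    s = np.linspace(0.0, smax, n)
    f = fS(s)
    F = np.cumsum(f) * (s[1] - s[0])
    F /= F[-1]                                  # absorb truncation error
    i = np.searchsorted(F, gamma)
    var = s[i]
    tvar = (s[i:] * f[i:]).sum() / f[i:].sum()  # E[S | S > VaR]
    return var, tvar
```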
45. Conclusions
In this work we present an application of maxentropic methodologies to operational risk. We showed that this methodology can provide a good density reconstruction over the entire range, in cases of data scarcity, heavy tails, and asymmetries, using only eight moments as input.
This methodology makes it possible to obtain the density at different levels of aggregation and allows us to include dependencies between different types of risks:
we can join marginal densities obtained from any methodology and impose any dependence relation between them, or
we can obtain the joint distribution directly from the data and avoid bad estimations.
The estimation of the underlying loss process provides a starting point to design policies, set premiums and reserves, calculate optimal reinsurance levels, and calculate risk pressures for solvency purposes in insurance and risk management.
This is also useful in structural engineering to describe the accumulated damage of a structure, to mention one more possible application.
46. Conclusions
1 Two Maxentropic Approaches to determine the probability density of compound
losses.
2 Density Reconstructions with Errors in the Data.
3 Maxentropic approach to decompound aggregate risk losses.
4 Loss data analysis: Analysis of the Sample Dependence in Density
Reconstruction by Maxentropic Methods.
5 Maximum entropy approach to the loss data aggregation problem.
47. Conclusions
Here we work with four variants of the maxentropic methodology (SME, MEM, SMEE approach 1, SMEE approach 2). Two of them allow us to account for some uncertainty in the input by using an interval for the moments instead of the sample estimate. Additionally, maximum entropy in the mean (MEM) allows us to add a reference measure to the estimation to improve the results.
In general the SMEE method improves the quality of the results in terms of convergence and number of iterations.
This methodology can be used to estimate the distribution of the losses f_X at the first level of aggregation, when we have the distribution of the aggregated losses f_S and the frequency distribution p_n.
49. Maximum Entropy Method - General overview
The essence of the maxentropic method consists in transforming a problem of the type
AX = Y, X : Ω → C
into a convex optimization problem, by maximizing the entropy measure, where C is the constraint set of possible reconstructions (values of the random variable X) and Ω is a sample space.
Then we have a unique and robust solution.
Among those x ∈ C yielding similar reconstruction error, choose the one with the smallest norm.
50. Maximum Entropy Method - General overview
Find a probability distribution P on some measure space (Ω, F) which is absolutely continuous with respect to some (usually σ-finite) reference measure Q on (Ω, F):
max_P S_Q(P) = -\int_Ω ρ(ξ) \ln ρ(ξ) dQ(ξ) = -\int_Ω (dP/dQ) \ln(dP/dQ) dQ = -\int_Ω \ln(dP/dQ) dP   (1)
satisfying
{P << Q such that E_P[AX] = Y}   (2)
where
Q is the reference measure, which reflects the information that we have.
E_P[AX] = A E_P[X] = \int_Ω A ξ dP(ξ) = \int_Ω A ξ ρ(ξ) dQ(ξ) = y
\int_Ω ρ(ξ) dQ(ξ) = 1
dP(ξ) = ρ(ξ) dQ(ξ), dQ(ξ) = q dξ
Note that if such a measure P is found, then x_j = E_P[X_j].
51. We introduce Lagrange multipliers λ to obtain the result
dP(λ) = (exp(-<λ, Aξ>) / Z(λ)) dQ(ξ)
where the normalization constant is determined by
Z(λ) = \int_Ω e^{-<λ, Aξ>} dQ(ξ)   (3)
Then it is necessary to find λ*, the minimizer of the dual entropy, which is a function of λ and is given by
inf_λ S_Q(λ) = -sup_P(-S_Q(P))
S_Q(λ) = -\int_Ω (exp(-<λ, Aξ>) / Z(λ)) \ln(exp(-<λ, Aξ>) / Z(λ)) dQ(ξ)
min_λ Σ(λ, y) = \ln Z(λ) + <λ, y>
52. Basically it is a problem of minimizing a convex function that is very flat, so the step size has to be reduced as the iteration progresses:
min_λ Σ(λ, y) = min_λ {\ln Z(λ) + <λ, y>}   (4)
Z(λ) = \int_Ω e^{-<λ, Aξ>} dQ(ξ)   (5)
Barzilai-Borwein optimization method (BB method)
Optimal solution:
ρ*(ξ) = exp(-<λ*, Aξ>) / Z(λ*)
and
x*_j = exp(-(A^T λ*)_j) / z(λ*), with z(λ*) = exp(-λ_0)
53. Measures of Reference
The choice of the measure Q is up to the modeler, and it may be thought of as a first guess of the unknown distribution.
S_Q(P) = -\int_Ω ρ(ξ) \ln ρ(ξ) dQ(ξ)
S_Q(λ) = Σ(λ, μ) = \ln Z(λ) + <λ, μ>
Z(λ) = E_Q[e^{-<λ, Aξ>}] = \int_Ω e^{-<λ, Aξ>} dQ(ξ)
Uniform distribution ξ ~ U(0, 1), dQ(ξ) = dξ ⇒ SME
54. Lin (1992). Characterization of distributions via moments.
Theorem (Lin 1). Let F_Y be the distribution of a positive random variable Y. Let (α_n) be a sequence of positive and distinct numbers in (0, A) for some A, satisfying lim_{n→∞} α_n = α_0 < A. If E[Y^A] < ∞, the sequence of moments E[Y^{α_n}] characterizes F_Y.
Theorem (Lin 2). Let F_Y be the distribution of a positive random variable Y. Let (α_n) be a sequence of positive and distinct numbers satisfying lim_{n→∞} α_n = 0 and \sum_{n≥1} α_n = ∞. Then the sequence of moments E[Y^{α_n}] characterizes F_Y.
Both results hinge on the fact that an analytic function is determined by its values on a countable set having an accumulation point in its domain of analyticity. The connection between the two theorems is explained in Lin's paper.