This document presents an overview of maximum entropy approaches for operational risk modeling. It discusses challenges in modeling loss distributions with scarce data, including parameter uncertainty, poor tail fits, and an inability to separately fit the body and tails of distributions. The maximum entropy approach models the loss distribution by maximizing entropy subject to moment constraints derived from the empirical Laplace transform of the loss data. This provides a robust density reconstruction over the entire range of losses. The document provides examples of using maximum entropy to model univariate and multivariate loss distributions, including dependencies between different risk types.
MAXENTROPIC METHODS FOR OPERATIONAL RISK MODELING
1. MAXENTROPIC AND QUANTITATIVE METHODS IN OPERATIONAL RISK MODELING
Erika Gomes
epgomes@emp.uc3m.es
joint work with Silvia Mayoral and Henryk Gzyl
Department of Business Administration
Universidad Carlos III de Madrid
September 2016
2. Outline
Work Review
(P1) Two maxentropic approaches to determine the probability density of compound losses. Insurance: Mathematics and Economics, 2015.
(P2) Density reconstructions with errors in the data. Entropy, 2014.
(P3) Maxentropic approach to decompound aggregate risk losses. Insurance: Mathematics and Economics, 2015.
(P4) Loss data analysis: Analysis of the sample dependence in density reconstruction by maxentropic methods. Insurance: Mathematics and Economics, 2016.
(P5) Maximum entropy approach to the loss data aggregation problem. Journal of Operational Risk, 2016.
3. Outline
1 Introduction
Motivation
Methodology: Loss Distribution Approach
Univariate case
Multivariate case
2 Maximum Entropy Approach
Examples and Applications
Theory
3 Numerical Results
4 Conclusions
4-10. Introduction
Motivation
Banks have developed a conceptual framework to characterize and quantify risk, to set money aside to cover large-scale losses, and to ensure the stability of the financial system.
A similar problem appears in insurance, where premiums and optimal reinsurance levels must be set.
The difference between the two sectors lies in the availability of data: in operational risk the historical data sets are small, so the results may vary widely.
More precisely, we are interested in the calculation of regulatory/economic capital using advanced models (LDA: loss distribution approach) allowed by Basel II.
The problem is to calculate the amount of money needed in order to be hedged at a high level of confidence (VaR at 99.9%).
The regulation states that the allocated capital charge should correspond to a 1-in-1000-year worst loss event (the 0.999 quantile).
It is necessary to calculate the distribution of the losses, and the methodology used has to take into consideration challenges related to the size of the data sets, bimodality, heavy tails, and dependence, among others.
We propose to model the total losses by maximizing an entropy measure.
11. Introduction
Loss Distribution Approach (LDA): Univariate Case
Operational risk has to do with losses due to failures in processes, technology, people, etc.
Two variables play a role in operational risk:
Severity (X): lognormal, gamma, Weibull, subexponential distributions...
Frequency (N): Poisson, negative binomial, binomial distributions.
S = X_1 + X_2 + ... + X_N = \sum_{n=1}^{N} X_n
where S represents the aggregate claim amount in a fixed time period (typically one year) per risk event.
Approach used: fit parametric distributions to N and X and obtain f_S through recursive models or convolutions.
No single distribution fits well over the entire data set.
12. Introduction
Loss Distribution Approach (LDA): Multivariate Case
Each of these loss distributions is then summed over all types of risk to arrive at the total aggregate loss:
(S_1, ..., S_m) = (\sum_{i=1}^{N_1} X_{1i}, ..., \sum_{i=1}^{N_m} X_{mi})
S_T = \sum_{i=1}^{m} S_i = S_1 + S_2 + ... + S_m
where b = 1, ..., 8 (business lines), l = 1, ..., 7 (event types), and m = 8 × 7 is the number of types of risk in operational risk.
Dependence structure between the risks S_i, i.e. choice of a copula model.
13. Introduction
Loss Distribution Approach (LDA): Illustrative Example
Estimate parametric distributions for the frequency N and severity X of each individual risk (maximum-likelihood estimation, MLE).
Compound the distributions (Panjer, convolutions, Fourier, ...). Then we have f_{S_i} (univariate case).
The density f_{S_T} of the sum S_T = S_1 + ... + S_B (multivariate case) can then be obtained by a sequential convolution procedure:
1 Derive the distribution of the sum of a pair of values S_1 + S_2 from the joint density f_{S_1,S_2}(s_1, s_2) = f_{S_1}(s_1) f_{S_2}(s_2) c(F_{S_1}(s_1), F_{S_2}(s_2)), where c is the density of the copula model C.
2 Apply the convolution integral
f_{S_1+S_2}(\ell_{12}) = \int f_{S_1,S_2}(s_1, \ell_{12} - s_1) ds_1 = \int f_{S_1,S_2}(\ell_{12} - s_2, s_2) ds_2
Steps (1) and (2) are repeated for the remaining terms of the sum; a numerical sketch follows.
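A minimal numerical sketch of steps (1) and (2). Everything concrete here is an illustrative assumption of mine rather than the authors' setup: gamma marginals stand in for the fitted f_{S_1}, f_{S_2}, and a Gaussian copula with ρ = 0.5 stands in for the chosen copula model C.

```python
# Sequential convolution step: joint density from marginals and a Gaussian
# copula density, then f_{S1+S2} by discretized convolution on a grid.
import numpy as np
from scipy.stats import norm, gamma

rho = 0.5
f1, F1 = gamma(a=2.0).pdf, gamma(a=2.0).cdf   # stand-ins for fitted f_{S1}, F_{S1}
f2, F2 = gamma(a=3.0).pdf, gamma(a=3.0).cdf

def gaussian_copula_density(u, v):
    x, y = norm.ppf(u), norm.ppf(v)
    det = 1.0 - rho ** 2
    return np.exp(-(rho**2 * (x**2 + y**2) - 2*rho*x*y) / (2*det)) / np.sqrt(det)

s = np.linspace(1e-6, 30.0, 1200)             # common grid for s1 and s2
ds = s[1] - s[0]
u = np.clip(F1(s), 1e-9, 1 - 1e-9)
v = np.clip(F2(s), 1e-9, 1 - 1e-9)
joint = f1(s)[:, None] * f2(s)[None, :] * gaussian_copula_density(u[:, None], v[None, :])

# f_{S1+S2}(l_k) ~ sum_i f(s_i, l_k - s_i) * ds, staying on the same grid
idx = np.arange(len(s))
f_sum = np.array([joint[idx[:k+1], k - idx[:k+1]].sum() * ds for k in idx])
```

Repeating the same two steps with f_sum as the new marginal against f_{S_3}, and so on, yields f_{S_T}.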
14. Introduction
Illustrative Example
200 samples of size 10
Aggregation: 7 independent types of risks
What happens when data are scarce, as is common in banking?
Problems
Parameter uncertainty.
Poor fit in the tails.
Scarcity of data: impossible to fit tails and body separately.
Underestimation of the regulatory capital charge.
This methodology gives a poor fit even when re-sampling is an alternative.
15. Maximum entropy approach
Illustrative example - size and tail concern
[Figure: two panels, parametric approach vs. maxentropic approach; losses vs. density; legend: true density, average, reconstructions]
Maxentropic methodologies provide a density reconstruction over the entire range of values.
16. Maximum entropy approach
Illustrative example - bimodal concern
[Figure: two bimodal density reconstructions, panels (1) and (2)]
Error | (1)     | (2)
MAE   | 0.02652 | 0.01291
RMSE  | 0.03286 | 0.01647
Table: Errors.
The maxentropic approach is able to model asymmetries.
17. Maximum entropy approach
Dependencies
We use maxentropic methodologies to model dependencies between different types of risks in the framework of operational risk.
18. Maximum entropy approach
Find a probability distribution P on some measure space (Ω, F) which is absolutely continuous with respect to some (usually σ-finite) reference measure Q on (Ω, F):
\max_P H_Q(P) = -\int_Ω ρ(ξ) \ln ρ(ξ) dQ(ξ)
satisfying
{P << Q such that E_P[AX] = Y}
\int_Ω ρ(ξ) dQ(ξ) = 1
The method consists in finding the probability measure that best represents the current state of knowledge: the one with the largest information-theoretic entropy.
19. Maximum entropy approach
Jaynes, 1957
This concept was first used by Jaynes (1957) as a method of statistical inference for under-determined problems.
For example:
A six-sided die is rolled 1000 times and comes up with an average of 4.7 dots. We want to estimate, as best we can, the probability distribution of the faces.
There are infinitely many 6-tuples (p_1, ..., p_6) with p_i ≥ 0, \sum_i p_i = 1 and \sum_i i p_i = 4.7.
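A quick sketch of this example: the maxent solution has the exponential form p_i ∝ exp(-λ i), and λ is pinned down by the mean constraint, which can be solved by one-dimensional root finding.

```python
# Jaynes' die: maximize entropy subject to sum(p) = 1 and mean = 4.7.
# The solution is p_i = exp(-lam * i) / Z(lam); solve for lam numerically.
import numpy as np
from scipy.optimize import brentq

faces = np.arange(1, 7)

def mean_given_lam(lam):
    w = np.exp(-lam * faces)
    return (faces * w).sum() / w.sum()

lam = brentq(lambda l: mean_given_lam(l) - 4.7, -5.0, 5.0)  # mean > 3.5 => lam < 0
p = np.exp(-lam * faces)
p /= p.sum()
print(np.round(p, 4))   # maxent probabilities with mean 4.7
```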
20. Maximum Entropy Approach
General overview
The essence of the maxentropic method consists in transforming a problem of the type
AX = Y, X : Ω → C
into an optimization problem, by maximizing the entropy measure, where C is the constraint set of possible reconstructions.
Then we have a unique and robust solution.
21. Laplace Transform
In probability theory and statistics, the Laplace transform is defined as an expectation of a random variable:
ψ(α) = E[e^{-αS}] = \int_0^∞ e^{-αs} dF_S(s), S ∈ ℝ_+
If any two continuous functions have the same Laplace transform, then those functions must be identical.
The Laplace transforms of some pdfs are not easy to invert, and there is no completely general inversion method that works equally well for all possible transforms.
22. Laplace Transform
In probability theory and statistics, the Laplace transform is defined as an expectation of a random variable:
ψ(α) = E[e^{-αS}] = \int_0^∞ e^{-αs} dF_S(s), S ∈ ℝ_+
All the information about the problem can be compressed into a set of moments obtained from the Laplace transform, through a change of variables:
ψ(α) = E[e^{-αS}] = E[Y^α] = \int_0^1 y^α dF_Y(y) with Y = e^{-S} and Y ∈ (0, 1)
The moments should be selected so that only the most relevant or informative ones are used (Lin, 1992, and the entropy convergence theorem).
23. Laplace Transform
We want to model f_S with S > 0.
When N = 0 ⇒ S = 0, and we rewrite the Laplace transform as
ψ(α) = E[e^{-αS}] = P(S = 0) · E[e^{-αS} | S = 0] + P(S > 0) · E[e^{-αS} | S > 0]
= P(N = 0) · E[e^{-αS} | N = 0] + P(N > 0) · E[e^{-αS} | N > 0]
where P(S = 0) = P(N = 0) = p_0; then
ψ(α) = p_0 · 1 + (1 - p_0) · E[e^{-αS} | N > 0]
μ(α) = E[e^{-αS} | N > 0] = (ψ(α) - p_0) / (1 - p_0)
Both ψ(α) and p_0 have to be estimated from the data.
24. Input of the Methodology
Univariate Case
Thus, after a change of variables the problem becomes to determine f_S from the integral constraints
E[e^{-α_j S} | S > 0] = \int_0^1 y^{α_j} f_Y(y) dy = μ(α_j), j = 0, ..., K.
Analytical form
ψ(α_k) = E(e^{-α_k S}) = \sum_{n=0}^∞ (φ_X(α_k))^n p_n = G(φ_X(α_k)) with α_k = α_0/k
Numerical form
ψ(α_k) = (1/T) \sum_{i=1}^T e^{-α_k s_i} with α_k = α_0/k
where
α_0 = 1.5: a fractional value; k = 1, ..., K, the optimal number of moments.
φ_X(α_k): Laplace transform of X, α_k ∈ ℝ_+
G(·): probability generating function of the frequencies
ψ(α_k): Laplace transform of the total losses
T: sample size.
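A small sketch of the numerical input above, in my own notation: the fractional exponents α_k = α_0/k, the empirical Laplace transform, and the correction for the atom at zero from slide 23.

```python
# Empirical moment input for the maxent problem: alphas and
# mu(alpha_k) = (psi(alpha_k) - p0) / (1 - p0), with p0 = P(S = 0).
import numpy as np

def maxent_moments(s, K=8, alpha0=1.5):
    s = np.asarray(s, dtype=float)
    p0 = np.mean(s == 0.0)                     # estimate of P(N = 0)
    alphas = alpha0 / np.arange(1, K + 1)      # alpha_k = alpha0 / k
    psi = np.array([np.exp(-a * s).mean() for a in alphas])
    mu = (psi - p0) / (1.0 - p0)
    return alphas, mu, p0
```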
25. Input of the Methodology
Univariate Case
Thus, after a change of variables the problem becomes to determine f_S from the integral constraints
E[e^{-α_j S} | S > 0] = \int_0^1 y^{α_j} f_Y(y) dy = μ(α_j), j = 0, ..., K.
Analytical form
Fit the frequency and severity distributions parametrically and calculate the Laplace transform through the probability generating function.
Poisson-Gamma:
ψ(α_k) = exp(-λ (1 - b^a (α_k + b)^{-a})) with α_k = α_0/k
The quality of the results is linked to how well the data fit the chosen distributions.
It is not possible to find a closed form of ψ(α_k) for some pdfs. This is particularly true for long-tailed pdfs, for example the lognormal distribution.
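As a sanity check on the closed form above, one can compare it with a Monte Carlo estimate of E[e^{-αS}]; the parameter values below are my illustrative choices, small enough for the comparison to be numerically meaningful.

```python
# Closed-form psi(alpha) for a Poisson(lam)-Gamma(a, rate b) compound,
# checked against simulation. Parameters here are illustrative only.
import numpy as np

lam, a, b, alpha = 2.0, 2.0, 1.5, 0.75
phi = (b / (alpha + b)) ** a                     # Laplace transform of Gamma(a, b)
psi_closed = np.exp(-lam * (1.0 - phi))          # G(phi) for Poisson frequencies

rng = np.random.default_rng(1)
n = rng.poisson(lam, 50000)
s = np.array([rng.gamma(a, 1.0 / b, k).sum() for k in n])   # scale = 1/rate
print(psi_closed, np.exp(-alpha * s).mean())     # should agree closely
```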
27. Input of the Methodology
Multivariate Case: Dependencies
(1) We can add dependencies to our input, knowing each f_{S_i}:
ψ(α_k) = E[e^{-α_k (S_1 + S_2 + ... + S_B)}] = \sum_{i=1}^{N-1} e^{-(s_{1i} + s_{2i} + ... + s_{Bi}) α_k} f(s_{1i}, s_{2i}, ..., s_{Bi}) Δs_1 Δs_2 ... Δs_B
where N is the number of partitions used in the discretization and
f(s_1, s_2, ..., s_B) = c[F_1(s_1), ..., F_B(s_B)] \prod_{i=1}^B f_{S_i}(s_i)
is the joint distribution, c is the density of the copula model C, and f_{S_1}, ..., f_{S_B} are the marginal densities.
(2) Simply ψ(α_k) = (1/T) \sum_{i=1}^T e^{-α_k (s_{1i} + s_{2i} + ... + s_{Bi})}, where T is the sample size; a one-line sketch follows.
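Input (2) is a one-liner on joint samples; a sketch in my notation:

```python
# Empirical Laplace transform of the total loss from joint samples:
# samples has shape (T, B), one row per joint observation (s_1i, ..., s_Bi).
import numpy as np

def psi_total(samples, alpha_k):
    total = np.asarray(samples, dtype=float).sum(axis=1)   # S_1 + ... + S_B per row
    return np.exp(-alpha_k * total).mean()
```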
28. Maximum Entropy Methods
max H(f) = -\int_0^1 f_Y(y) \ln f_Y(y) dy
SME approach: find the probability density on [0, 1] such that
\int_0^1 y^{α_k} f_Y(y) dy = μ(α_k) with Y = e^{-S}
where μ(α_k) = (ψ(α_k) - P(N = 0)) / (1 - P(N = 0)).
MEM approach: extension of the SME approach that allows one to include a reference measure Q, which is a parametric distribution.
SMEE approach: extension of the SME approach for the case where the data are assumed to be noisy:
\int_0^1 y^{α_k} f_Y(y) dy ∈ C_k = [a_k, b_k] with Y = e^{-S}
These methods consist in finding the probability measure that best represents the current state of knowledge: the one with the largest information-theoretic entropy.
29. Standard Maximum Entropy Method (SME)
In general, the maximum entropy density is obtained by maximizing the entropy measure
max H(f) = -\int_0^1 f_Y(y) \ln f_Y(y) dy
satisfying
E(Y^{α_k}) = \int_0^1 y^{α_k} f_Y(y) dy = μ_{α_k}, k = 1, 2, ..., K with K = 8
\int_0^1 f_Y(y) dy = 1
where
μ_k: the k-th moment, a known positive value
K = 8: number of moments
fractional exponents: α_k = α_0/k, α_0 = 1.5
30. Standard Maximum Entropy Method (SME)
When the problem has a solution, it can be expressed in terms of the Lagrange multipliers as
f*_Y(y) = (1/Z(λ)) exp(-\sum_{k=1}^K λ_k y^{α_k}) = exp(-\sum_{k=0}^K λ_k y^{α_k})
where the normalization constant is determined by
Z(λ) = \int_Ω exp(-\sum_{k=1}^K λ_k y^{α_k}) dy
Then it is necessary to find λ*, the minimizer of the dual entropy, which is a function of the Lagrange multipliers λ and is given by
H(λ) = \ln Z(λ) + <λ, μ> = Σ(λ, μ)
Basically it is a problem of minimizing a convex function, reducing the step size as the iteration progresses (Barzilai-Borwein non-monotone gradient method):
f*_Y(y) = (1/Z(λ*)) exp(-\sum_{k=1}^K λ*_k y^{α_k}), y ∈ (0, 1)
31. Standard Maximum Entropy Method (SME)
1 The starting point is \int_0^∞ e^{-α_k s} dF_S(s) = μ_k, S ∈ (0, ∞).
2 Make a change of variables, setting Y = e^{-S}, Y ∈ (0, 1).
3 Find the minimizer of the dual entropy, a function of λ:
min_λ Σ(λ, μ) = \ln Z(λ) + <λ, μ>
where
Z(λ) = \int_0^1 e^{-\sum_{k=1}^K λ_k y^{α_k}} dy.
4 The solution is
f*_Y(y) = (1/Z(λ*)) e^{-\sum_{k=1}^K λ*_k y^{α_k}} = e^{-\sum_{k=0}^K λ*_k y^{α_k}}, y ∈ (0, 1)
5 Undo the change of variables:
f*_S(s) = e^{-s} f*_Y(e^{-s}), s ∈ (0, ∞)
A numerical sketch of these steps follows.
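A compact numerical sketch of steps 1-5, my implementation rather than the authors' code: quadrature on (0, 1) for Z(λ), and BFGS in place of the Barzilai-Borwein scheme named above (any convex minimizer of the dual will do).

```python
# SME: minimize the dual H(lam) = ln Z(lam) + <lam, mu> over lam, then
# recover f_Y on (0,1) and map back to f_S via f_S(s) = e^{-s} f_Y(e^{-s}).
import numpy as np
from scipy.optimize import minimize

def sme_density(mu, alphas, ngrid=2000):
    y = (np.arange(ngrid) + 0.5) / ngrid           # midpoint grid on (0, 1)
    dy = 1.0 / ngrid
    Ya = y[None, :] ** alphas[:, None]             # y^{alpha_k}, shape (K, ngrid)

    def dual(lam):
        return np.log(np.exp(-lam @ Ya).sum() * dy) + lam @ mu

    def grad(lam):                                 # mu_k - E_f[Y^{alpha_k}]
        w = np.exp(-lam @ Ya)
        return mu - (Ya * w).sum(axis=1) / w.sum()

    lam = minimize(dual, np.zeros(len(mu)), jac=grad, method="BFGS").x
    w = np.exp(-lam @ Ya)
    fY = w / (w.sum() * dy)                        # maxent density of Y
    return lambda s: np.exp(-s) * np.interp(np.exp(-s), y, fY), lam
```

Combined with maxent_moments from slide 24, `fS, lam = sme_density(mu, alphas)` gives the reconstructed aggregate-loss density.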
32. Extensions of the SME Approach: SMEE
Remember that μ_k = E[e^{-α_k S}] was estimated from observed values s_1, ..., s_N of S, but there is some measurement error ε.
Approach I: the input μ is an interval C_k = [a_k, b_k].
Find f*_S such that \int_Ω e^{-α_k s} dF_S(s) ∈ C_k
Approach II: we have two inputs, μ_k and an interval [a_k, b_k] for the errors, centered on zero.
Find f*_S and p*_k such that
\int_Ω e^{-α_k s} dF_S(s) + p_k a_k + (1 - p_k) b_k = μ_k
33. Extensions of the SME Approach (SMEE)
Approach II: we have two inputs, μ_k and an interval [a_k, b_k] for the errors, centered on zero:
μ_k = E[e^{-α_k S}] + ε, where ε ∈ [a_k, b_k]
max H(f, p) = -\int_0^1 f(y) \ln f(y) dy - \sum_{k=1}^K (p_k \ln p_k + (1 - p_k) \ln(1 - p_k))
such that
\int_0^1 y^{α_k} f_Y(y) dy + p_k a_k + (1 - p_k) b_k = μ_k
0 < p_k < 1, \int_0^1 f_Y(y) dy = 1
k = 1, ..., K with K = 8.
34. The solution can be expressed in terms of the Lagrange multipliers:
f*(y) = e^{-\sum_{k=1}^K λ_k y^{α_k}} / Z(λ)
p*_k = e^{-a_k λ_k} / (e^{-a_k λ_k} + e^{-b_k λ_k})
Here the normalization factor Z(λ) is as above. The vector λ* of Lagrange multipliers is found by minimizing the dual entropy
H(λ) = \ln Z(λ) + \sum_{k=1}^K \ln(e^{-a_k λ_k} + e^{-b_k λ_k}) + <λ, μ> = Σ(λ)
Once λ* is found, the estimator of the measurement error is given by
ε_k = (a_k e^{-a_k λ*_k} + b_k e^{-b_k λ*_k}) / (e^{-a_k λ*_k} + e^{-b_k λ*_k})
f*(y) = e^{-\sum_{k=1}^K λ*_k y^{α_k}} / Z(λ*), p*_k = e^{-a_k λ*_k} / (e^{-a_k λ*_k} + e^{-b_k λ*_k})
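The SMEE dual differs from the SME one only by the extra log-sum term for the error interval; a sketch of the modified objective follows (the symmetric interval a_k = -δ, b_k = +δ would be my illustrative choice in practice).

```python
# SMEE (Approach II) dual: ln Z(lam) + sum_k ln(e^{-a_k lam_k} + e^{-b_k lam_k})
# + <lam, mu>. After minimizing, p_k* has the closed form below.
import numpy as np

def smee_dual(lam, mu, alphas, a, b, ngrid=2000):
    y = (np.arange(ngrid) + 0.5) / ngrid
    Ya = y[None, :] ** alphas[:, None]
    logZ = np.log(np.exp(-lam @ Ya).mean())          # mean = integral on (0, 1)
    err = np.log(np.exp(-a * lam) + np.exp(-b * lam)).sum()
    return logZ + err + lam @ mu

def p_star(lam, a, b):
    return np.exp(-a * lam) / (np.exp(-a * lam) + np.exp(-b * lam))
```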
35. Numerical Results
1 Two Maxentropic Approaches to determine the probability density of compound
losses.
2 Density Reconstructions with Errors in the Data.
3 Maxentropic approach to decompound aggregate risk losses.
4 Loss data analysis: Analysis of the Sample Dependence in Density
Reconstruction by Maxentropic Methods.
5 Maximum entropy approach to the loss data aggregation problem.
36. Numerical Results
To test the methodology we consider different combinations of frequency and severity distributions.
We use a sample large enough that we need not worry about the effect of sample size on the results.
We use several methods to verify the quality of the results: L1 & L2 distances, MAE & RMSE distances, visual comparisons, and goodness-of-fit tests.
MAE = (1/T) \sum_{n=1}^T |F̂(x_n) - F_e(x_n)|
RMSE = \sqrt{(1/T) \sum_{n=1}^T (F̂(x_n) - F_e(x_n))^2}
RMSE is more sensitive to outliers, because it gives a relatively high weight to large errors. So the greater the difference between MAE and RMSE, the greater the variance of the individual errors in the sample.
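A direct sketch of the two error measures, comparing a fitted CDF F̂ with the empirical CDF at the order statistics:

```python
# MAE and RMSE between a fitted CDF and the empirical CDF of a sample.
import numpy as np

def mae_rmse(F_hat, sample):
    x = np.sort(np.asarray(sample, dtype=float))
    Fe = np.arange(1, len(x) + 1) / len(x)     # empirical CDF at order statistics
    d = F_hat(x) - Fe
    return np.abs(d).mean(), np.sqrt((d ** 2).mean())
```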
37. Simulation details
To test the methodology we consider different combinations of frequency and loss distributions.
S_bh | N_bh (frequency)                   | X_bh (severity)
S1   | Poisson(λ = 80)                    | Champernowne(α = 20, M = 85, c = 15)
S2   | Poisson(λ = 60)                    | Lognormal(μ = -0.01, σ = 2)
S3   | Binomial(n = 70, p = 0.5)          | Pareto(shape = 10, scale = 85)
S4   | Binomial(n = 62, p = 0.5)          | Champernowne(α = 10, M = 125, c = 45)
S5   | Binomial(n = 50, p = 0.5)          | Gamma(shape = 4500, rate = 15)
S6   | Binomial(n = 76, p = 0.5)          | Gamma(shape = 9000, rate = 35)
S7   | Negative Binomial(r = 80, p = 0.3) | Weibull(shape = 200, scale = 50)
Tail | Negative Binomial(r = 90, p = 0.8) | Pareto(shape = 5.5, scale = 5550)
Table: Inputs for the simulation of S
All the risks are independent.
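A sketch of how one row of the table can be simulated, taking S2 (Poisson frequency, lognormal severity) as the example; the other rows only change the two sampling calls.

```python
# Simulate T realizations of the compound loss S2: N ~ Poisson(60),
# X ~ Lognormal(mu = -0.01, sigma = 2), S = X_1 + ... + X_N.
import numpy as np

rng = np.random.default_rng(0)

def simulate_S2(T, lam=60.0, mu=-0.01, sigma=2.0):
    counts = rng.poisson(lam, size=T)
    return np.array([rng.lognormal(mu, sigma, n).sum() for n in counts])

s = simulate_S2(5000)    # one sample of aggregate losses
```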
38. Results
Approach | MAE      | RMSE
SMEE     | 0.005928 | 0.006836
SME      | 0.006395 | 0.009399
Table: MAE and RMSE for a sample size of 5000
MAE = (1/T) \sum_{n=1}^T |F̂(x_n) - F_e(x_n)|
RMSE = \sqrt{(1/T) \sum_{n=1}^T (F̂(x_n) - F_e(x_n))^2}
44. Computation of the regulatory capital
γ          | Empirical | SME    | SMEE
VaR  0.950 | 5.05      | 4.935  | 5.004
VaR  0.990 | 5.72      | 5.755  | 5.772
TVaR 0.950 | 5.45      | 5.443  | 5.461
TVaR 0.990 | 6.05      | 6.0207 | 6.014
Table: Comparison of VaR and TVaR at 95% and 99% for a single sample of size 5000
Size | VaR(95%) SME | SMEE  | TVaR(95%) SME | SMEE  | VaR(99%) SME | SMEE  | TVaR(99%) SME | SMEE
10   | 4.96         | 4.87  | 5.30          | 5.156 | 4.331        | 5.283 | 4.328         | 5.634
100  | 4.96         | 4.931 | 5.44          | 5.43  | 5.457        | 5.779 | 5.694         | 6.016
500  | 4.95         | 4.93  | 5.45          | 5.45  | 5.708        | 5.822 | 5.972         | 6.017
1000 | 4.95         | 4.95  | 5.45          | 5.45  | 5.729        | 5.828 | 5.977         | 6.064
Table: Mean and standard deviation of the VaR and TVaR for 200 samples of different sizes
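One way (my construction, not necessarily the authors' procedure) to read VaR and TVaR off a reconstructed density on a grid: integrate to the CDF, invert for the γ-quantile, and average the tail beyond it.

```python
# VaR and TVaR from a density callable fS, by cumulative quadrature.
import numpy as np

def var_tvar(fS, gamma, smax=20.0, n=20000):
    s = np.linspace(0.0, smax, n)
    f = fS(s)
    F = np.cumsum(f) * (s[1] - s[0])
    F /= F[-1]                                  # absorb truncation error
    i = np.searchsorted(F, gamma)
    var = s[i]
    tvar = (s[i:] * f[i:]).sum() / f[i:].sum()  # E[S | S > VaR]
    return var, tvar
```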
45. Conclusions
In this work we present an application of maxentropic methodologies to operational risk. We showed that this methodology can provide a good density reconstruction over the entire range, in cases of data scarcity, heavy tails, and asymmetries, using only eight moments as input.
This methodology makes it possible to obtain the density at different levels of aggregation and allows us to include dependencies between different types of risks:
we can join marginal densities obtained from any methodology and impose any dependence relation between them, or
we can obtain the joint distribution directly from the data and avoid bad estimations.
The estimation of the underlying loss process provides a starting point to design policies, set premiums and reserves, calculate optimal reinsurance levels, and calculate risk pressures for solvency purposes in insurance and risk management.
This is also useful in structural engineering to describe the accumulated damage of a structure, to mention one more possible application.
46. Conclusions
1 Two Maxentropic Approaches to determine the probability density of compound
losses.
2 Density Reconstructions with Errors in the Data.
3 Maxentropic approach to decompound aggregate risk losses.
4 Loss data analysis: Analysis of the Sample Dependence in Density
Reconstruction by Maxentropic Methods.
5 Maximum entropy approach to the loss data aggregation problem.
47. Conclusions
Here we work with four variants of the maxentropic methodology (SME, MEM, SMEE approach 1, SMEE approach 2). Two of them allow us to account for some uncertainty in the input by using an interval for the moments instead of the sample estimate. Additionally, maximum entropy in the mean (MEM) allows us to add a reference measure to the estimation to improve the results.
In general the SMEE method improves the quality of the results in terms of convergence and number of iterations.
This methodology can be used to estimate the distribution of the losses f_X at the first level of aggregation, when we have the distribution of the aggregated losses f_S and the frequency distribution p_n.
49. Maximum Entropy Method - General overview
The essence of the maxentropic method consists in transforming a problem of the type
AX = Y, X : Ω → C
into a convex optimization problem, by maximizing the entropy measure, where C is the constraint set of possible reconstructions (values of the random variable X) and Ω is a sample space.
Then we have a unique and robust solution.
Among those x ∈ C yielding similar reconstruction error, choose the one with the smallest norm.
50. Maximum Entropy Method - General overview
Find a probability distribution P on some measure space (Ω, F) which is absolutely continuous with respect to some (usually σ-finite) reference measure Q on (Ω, F):
max_P S_Q(P) = -\int_Ω ρ(ξ) \ln ρ(ξ) dQ(ξ) = -\int_Ω (dP/dQ) \ln(dP/dQ) dQ = -\int_Ω \ln(dP/dQ) dP   (1)
satisfying
{P << Q such that E_P[AX] = Y}   (2)
where
Q is the reference measure, which reflects the information that we have.
E_P[AX] = A E_P[X] = \int_Ω A ξ dP(ξ) = \int_Ω A ξ ρ(ξ) dQ(ξ) = y
\int_Ω ρ(ξ) dQ(ξ) = 1
dP(ξ) = ρ(ξ) dQ(ξ), dQ(ξ) = q dξ
Note that if such a measure P is found, then x_j = E_P[X_j].
51. We introduce Lagrange multipliers λ to obtain the result
dP(λ) = (exp(-<λ, Aξ>) / Z(λ)) dQ(ξ)
where the normalization constant is determined by
Z(λ) = \int_Ω e^{-<λ, Aξ>} dQ(ξ)   (3)
Then it is necessary to find λ*, the minimizer of the dual entropy, which is a function of λ and is given by
inf_λ S_Q(λ) = -sup_P(-S_Q(P))
S_Q(λ) = -\int_Ω (exp(-<λ, Aξ>) / Z(λ)) \ln(exp(-<λ, Aξ>) / Z(λ)) dQ(ξ)
min_λ Σ(λ, y) = \ln Z(λ) + <λ, y>
52. Basically it is a problem of minimizing a convex function that is very flat, so the step size has to be reduced as the iteration progresses:
min_λ Σ(λ, y) = min_λ {\ln Z(λ) + <λ, y>}   (4)
Z(λ) = \int_Ω e^{-<λ, Aξ>} dQ(ξ)   (5)
Barzilai-Borwein optimization method (BB method)
Optimal solution:
ρ*(ξ) = exp(-<λ*, Aξ>) / Z(λ*)
and
x*_j = exp(-(A^T λ*)_j) / z(λ*), with z(λ*) = exp(-λ_0)
53. Measures of Reference
The choice of the measure Q is up to the modeler, and it may be thought of as a first guess of the unknown distribution.
S_Q(P) = -\int_Ω ρ(ξ) \ln ρ(ξ) dQ(ξ)
S_Q(λ) = Σ(λ, μ) = \ln Z(λ) + <λ, μ>
Z(λ) = E_Q[e^{-<λ, Aξ>}] = \int_Ω e^{-<λ, Aξ>} dQ(ξ)
Uniform distribution ξ ~ U(0, 1), dQ(ξ) = dξ ⇒ SME
54. Lin (1992). Characterization of distributions via moments.
Theorem (Lin 1). Let F_Y be the distribution of a positive random variable Y. Let (α_n) be a sequence of positive and distinct numbers in (0, A) for some A, satisfying lim_{n→∞} α_n = α_0 < A. If E[Y^A] < ∞, the sequence of moments E[Y^{α_n}] characterizes F_Y.
Theorem (Lin 2). Let F_Y be the distribution of a positive random variable Y. Let (α_n) be a sequence of positive and distinct numbers satisfying lim_{n→∞} α_n = 0 and \sum_{n≥1} α_n = ∞. Then the sequence of moments E[Y^{α_n}] characterizes F_Y.
Both results hinge on the fact that an analytic function is determined by its values on a countable set having an accumulation point in its domain of analyticity. The connection between the two theorems is explained in Lin's paper.