A Maximum Entropy Approach to the Loss Data Aggregation Problem
1. MAXIMUM ENTROPY APPROACH TO THE LOSS DATA AGGREGATION PROBLEM
Erika Gomes
epgomes@emp.uc3m.es
joint work with
Silvia Mayoral and Henryk Gzyl
Department of Business Administration
Universidad Carlos III de Madrid
June, 2016
2. Outline
1 Introduction
Motivation
Methodology: Loss distribution Approach
Univariate case
Multivariate case
2 Maximum Entropy Approach
Examples and Applications
Theory
3 Numerical Results
Joint marginal density distributions
Joint vector of losses
4 Conclusions
3. Introduction
Motivation
Banks have developed a conceptual framework to characterize and quantify risk, to put money aside to cover large-scale losses, and to ensure the stability of the financial system.
A similar problem also appears in insurance, where premiums and optimal reinsurance levels must be set.
The difference between these two sectors lies in the availability of data: in operational risk the historical data sets are small, so the results may vary widely.
More precisely, we are interested in the calculation of regulatory/economic capital using the advanced models (LDA: loss distribution approach) allowed by Basel II.
The problem is to calculate the amount of money needed to be hedged at a high level of confidence (VaR at 99.9%). The regulation states that the allocated capital charge should correspond to a 1-in-1000-year (0.999 quantile) worst possible loss event.
It is necessary to estimate the distribution of the losses, and the methodology used has to take into consideration challenges related to the size of the data sets, bimodality, heavy tails, and dependence, among others.
We propose to model the total losses by maximizing an entropy measure.
10. Introduction
Loss distribution approach (LDA): univariate case
Operational risk concerns losses due to failures in processes, technology, people, etc.
Two variables play a role in operational risk:
Severity (X): lognormal, gamma, Weibull, subexponential distributions...
Frequency (N): Poisson, negative binomial, binomial distributions.
$$S = X_1 + X_2 + \dots + X_N = \sum_{n=1}^{N} X_n$$
where S represents the aggregate claim amount in a fixed time period (typically one year) per risk event.
Approach used: fit parametric distributions to N and X and obtain $f_S$ through recursive models or convolutions.
No single distribution fits well over the entire data set.
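A minimal Monte Carlo sketch of this compound sum (not from the slides; the Poisson frequency and lognormal severity parameters below are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_aggregate_loss(n_periods=100_000, lam=3.0, mu=0.0, sigma=1.0):
    """Simulate S = X_1 + ... + X_N per period, N ~ Poisson(lam), X ~ Lognormal(mu, sigma)."""
    counts = rng.poisson(lam, size=n_periods)                 # frequency N per period
    severities = rng.lognormal(mu, sigma, size=counts.sum())  # all individual losses X_n
    owner = np.repeat(np.arange(n_periods), counts)           # period each loss belongs to
    return np.bincount(owner, weights=severities, minlength=n_periods)

S = simulate_aggregate_loss()
print("P(S = 0) ~", (S == 0).mean())                # close to P(N = 0) = exp(-lam)
print("99.9% quantile of S ~", np.quantile(S, 0.999))
```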
11. Introduction
Loss distribution approach (LDA): multivariate case
Each of these loss distributions is then summed over all types of risk to arrive at the total aggregate loss:
$$(S_1, \dots, S_m) = \left( \sum_{i=1}^{N_1} X_{1i}, \dots, \sum_{i=1}^{N_m} X_{mi} \right)$$
$$S_T = \sum_{i=1}^{m} S_i = S_1 + S_2 + \dots + S_m$$
where b = 1, ..., 8 (business lines), l = 1, ..., 7 (event types), and m = 8 × 7 is the number of types of risk in operational risk.
A dependence structure between the risks $S_i$ must be specified, i.e. a copula model must be chosen.
12. Introduction
Loss distribution approach (LDA): illustrative example
Estimate parametric distributions for the frequency N and severity X of each individual risk (maximum-likelihood estimation, MLE).
Compound the distributions (Panjer recursion, convolutions, Fourier methods, ...). This yields $f_{S_i}$ (univariate case).
The density $f_{S_T}$ of the sum $S_T = S_1 + \dots + S_B$ (multivariate case) can then be obtained by a sequential convolution procedure:
1. Derive the distribution of the sum of a pair $S_1 + S_2$ from the joint density $f_{S_1,S_2}(s_1, s_2) = f_{S_1}(s_1) f_{S_2}(s_2) \, c(F_{S_1}(s_1), F_{S_2}(s_2))$, where c is the density of the copula model C.
2. Apply the (discretized) convolution
$$f_{S_1+S_2}(\ell_{12}) = \sum_{s_1} f_{S_1,S_2}(s_1, \ell_{12} - s_1) = \sum_{s_2} f_{S_1,S_2}(\ell_{12} - s_2, s_2)$$
Steps (1) and (2) are repeated for the rest of the sum.
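As a rough sketch of steps (1) and (2) for a single pair, under assumed lognormal marginals and a Gaussian copula (all parameters and grid sizes are illustrative, not from the slides):

```python
import numpy as np
from scipy import stats

rho = 0.3
m1 = stats.lognorm(s=0.5, scale=1.0)   # assumed marginal of S1
m2 = stats.lognorm(s=0.8, scale=2.0)   # assumed marginal of S2

s = np.linspace(1e-3, 25.0, 1500)      # common discretization grid
ds = s[1] - s[0]

def copula_density(u, v):
    """Gaussian copula density c(u, v) = phi_2(x, y; rho) / (phi(x) phi(y))."""
    u = np.clip(u, 1e-12, 1 - 1e-12)
    v = np.clip(v, 1e-12, 1 - 1e-12)
    x, y = stats.norm.ppf(u), stats.norm.ppf(v)
    phi2 = stats.multivariate_normal(cov=[[1.0, rho], [rho, 1.0]]).pdf(np.dstack([x, y]))
    return phi2 / (stats.norm.pdf(x) * stats.norm.pdf(y))

# Step 1: joint density f(s1, s2) = f1(s1) f2(s2) c(F1(s1), F2(s2)) on the grid.
U, V = np.meshgrid(m1.cdf(s), m2.cdf(s), indexing="ij")
joint = m1.pdf(s)[:, None] * m2.pdf(s)[None, :] * copula_density(U, V)

# Step 2: discretized convolution, f_{S1+S2}(l_k) = sum_{i+j=k} f(s_i, s_j) ds.
n = len(s)
f_sum = np.zeros(2 * n - 1)
for i in range(n):
    f_sum[i:i + n] += joint[i] * ds
l_grid = 2 * s[0] + np.arange(2 * n - 1) * ds   # grid of values of S1 + S2
```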
13. Introduction
Loss distribution Approach (LDA): Copulas
A copula is a method used to introduce dependence among random variables:
$$C(u_1, u_2, \dots, u_n) = P[U_1 \le u_1, U_2 \le u_2, \dots, U_n \le u_n]$$
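A short sketch of how a Gaussian copula introduces dependence in practice (assumed lognormal marginals and illustrative parameters): sample correlated normals, map them to uniforms, then to the marginals:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
rho = 0.5
cov = [[1.0, rho], [rho, 1.0]]

z = rng.multivariate_normal([0.0, 0.0], cov, size=10_000)  # correlated normals
u = stats.norm.cdf(z)                                      # Gaussian copula sample in (0,1)^2
s1 = stats.lognorm(s=0.5, scale=1.0).ppf(u[:, 0])          # map to marginal of S1
s2 = stats.lognorm(s=0.8, scale=2.0).ppf(u[:, 1])          # map to marginal of S2
```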
14. Introduction
Illustrative Example
200 samples of size 10.
Aggregation: 7 independent types of risk.
What happens when data are scarce, as is common in banking?
Problems
Parameter uncertainty.
Bad fit in the tails.
Scarcity of data: it becomes impossible to fit the tails and the body separately.
Underestimation of the regulatory capital charge.
This methodology gives a bad fit even when re-sampling is an alternative.
15. Maximum entropy approach
Illustrative example - size and tail concern
[Figure: parametric approach vs. maxent approach — density of the losses, comparing the true density, the average reconstruction, and the individual reconstructions.]
Maxent provides a density reconstruction over the entire range of values.
16. Maximum entropy approach
Illustrative example - bimodal concern
[Figure: two reconstructions, (1) and (2), of a bimodal loss density.]

Error   (1)       (2)
MAE     0.02652   0.01291
RMSE    0.03286   0.01647
Table: Errors.

Maxent is able to model asymmetries.
17. Maximum entropy approach
Question
How can we use maxent methodologies to model dependencies between
different types of risks in the framework of Operational risk?
18. Maximum entropy approach
Find a probability distribution P on some measure space (Ω, F) that is absolutely continuous with respect to some (usually σ-finite) reference measure Q on (Ω, F):
$$\max_P H_Q(P) = -\int_\Omega \rho(\xi) \ln \rho(\xi) \, dQ(\xi)$$
subject to
$$\{P \ll Q \text{ such that } E_P[AX] = Y\}, \qquad \int_\Omega \rho(\xi) \, dQ(\xi) = 1$$
The method consists in finding the probability measure that best represents the current state of knowledge, namely the one with the largest information-theoretic entropy.
19. Maximum entropy approach
Jaynes, 1957
This concept was first used by Jaynes (1957) as a method of statistical inference for under-determined problems.
For example: a six-sided die, rolled 1000 times, comes up with an average of 4.7 dots. We want to estimate, as best we can, the probability distribution of the faces.
There are infinitely many 6-tuples $(p_1, \dots, p_6)$ with $p_i \ge 0$, $\sum_i p_i = 1$ and $\sum_i i \, p_i = 4.7$.
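For illustration (not from the slides): the maxent solution of this die problem has the exponential-family form $p_i \propto e^{-\lambda i}$, with λ chosen so that the mean equals 4.7. A minimal sketch:

```python
import numpy as np
from scipy.optimize import brentq

faces = np.arange(1, 7)

def constrained_mean(lam):
    """Mean of the tilted distribution p_i proportional to exp(-lam * i)."""
    w = np.exp(-lam * faces)
    return (w / w.sum()) @ faces

# Solve for lam so that the mean of the faces equals 4.7.
lam = brentq(lambda l: constrained_mean(l) - 4.7, -5.0, 5.0)
p = np.exp(-lam * faces)
p /= p.sum()
print(np.round(p, 4), p @ faces)   # maxent probabilities; mean ~ 4.7
```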
20. Maximum entropy approach
Entropy in Finance
In the last two decades this concept, which has its origins in statistical physics, has found important applications in several fields, especially in finance:
It has been an important tool for portfolio selection, as a measure of the degree of diversification.
In asset pricing, it is used to tackle the problem of extracting asset probability distributions from limited data.
21. Maximum Entropy Approach
General overview
The essence of the maxent method consists in transforming a problem of the type
$$AX = Y, \qquad X : \Omega \to C$$
into a convex optimization problem by maximizing the entropy measure, where C is a constraint set of possible reconstructions.
We then have a unique and robust solution.
22. Laplace Transform
In probability theory and statistics, the Laplace transform is defined as an expectation of a random variable:
$$\psi(\alpha) = E[e^{-\alpha S}] = \int_0^\infty e^{-\alpha s} \, dF_S(s), \qquad S \in \mathbb{R}^+$$
If two continuous functions have the same Laplace transform, then those functions must be identical.
The Laplace transforms of some pdfs are not easy to invert, and there is no completely general inversion method that works equally well for all possible transforms.
23. Laplace Transform
All the information about the problem can be compressed into a set of moments obtained from the Laplace transform through a change of variables:
$$\psi(\alpha) = E[e^{-\alpha S}] = E[Y^\alpha] = \int_0^1 y^\alpha \, dF_Y(y), \qquad Y = e^{-S} \in (0, 1)$$
The moments should be selected so that only the most relevant and informative ones are used (Lin, 1992, and the Entropy Convergence Theorem).
24. Laplace Transform
We want to model $f_S$ with S > 0. When N = 0 we have S = 0, so we rewrite the Laplace transform as
$$\psi(\alpha) = E[e^{-\alpha S}] = P(S = 0) \, E[e^{-\alpha S} \mid S = 0] + P(S > 0) \, E[e^{-\alpha S} \mid S > 0]$$
where $P(S = 0) = P(N = 0) = p_0$. Then
$$\psi(\alpha) = p_0 \cdot 1 + (1 - p_0) \, E[e^{-\alpha S} \mid N > 0]$$
$$\mu(\alpha) = E[e^{-\alpha S} \mid N > 0] = \frac{\psi(\alpha) - p_0}{1 - p_0}$$
ψ(α) and $p_0$ have to be estimated from the data, or by some other procedure.
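A minimal sketch of estimating $\psi(\alpha_k)$, $p_0$, and $\mu(\alpha_k)$ from simulated data (a Poisson-lognormal setup with purely illustrative parameters):

```python
import numpy as np

rng = np.random.default_rng(3)
freq = rng.poisson(2.0, size=10_000)                            # N, sometimes 0
S = np.array([rng.lognormal(0.0, 1.0, n).sum() for n in freq])  # aggregate losses

alphas = 1.5 / np.arange(1, 9)            # alpha_k = alpha_0 / k, alpha_0 = 1.5, K = 8
p0 = (S == 0).mean()                      # estimate of P(N = 0)
psi = np.array([np.exp(-a * S).mean() for a in alphas])  # psi(alpha_k), numerical form
mu = (psi - p0) / (1 - p0)                # mu(alpha_k) = E[e^{-alpha_k S} | N > 0]
```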
25. Input of the Methodology
Univariate case
Thus, the problem becomes determining $f_S$ from the integral constraints, after a change of variables:
$$E[e^{-\alpha_j S} \mid S > 0] = \int_0^1 y^{\alpha_j} f_Y(y) \, dy = \mu(\alpha_j), \qquad j = 0, \dots, K$$
Analytical form
$$\psi(\alpha_k) = E(e^{-\alpha_k S}) = \sum_{n=0}^{\infty} (\phi_X(\alpha_k))^n p_n = G(\phi_X(\alpha_k)), \qquad \alpha_k = \alpha_0 / k$$
Numerical form
$$\psi(\alpha_k) = \frac{1}{T} \sum_{i=1}^{T} e^{-\alpha_k s_i}, \qquad \alpha_k = \alpha_0 / k$$
where
$\alpha_0 = 1.5$: a fractional value; k = 1, ..., K, the optimal number of moments;
$\phi_X(\alpha_k)$: Laplace transform of X, $\alpha_k \in \mathbb{R}^+$;
G(·): probability generating function of the frequencies;
$\psi(\alpha_k)$: Laplace transform of the total losses;
T: sample size.
26. Input of the Methodology
Univariate case
Thus, the problem becomes determining $f_S$ from the integral constraint after a change of variables:
$$E[e^{-\alpha_j S} \mid S > 0] = \int_0^1 y^{\alpha_j} f_Y(y) \, dy = \mu(\alpha_j), \qquad j = 0, \dots, K$$
Analytical form
Fit the frequency and severity distributions parametrically and compute the Laplace transform through the probability generating function.
Poisson(λ)-Gamma(a, b):
$$\psi(\alpha_k) = \exp\left(-\lambda \left(1 - b^a (\alpha_k + b)^{-a}\right)\right), \qquad \alpha_k = \alpha_0 / k$$
The quality of the results is linked to how well the data fit the chosen distributions.
It is not possible to find a closed form of $\psi(\alpha_k)$ for some pdfs. This is particularly true for long-tailed pdfs, for example the lognormal distribution.
28. Input of the Methodology
Multivariate case
Here we explore how the maxent methodologies can handle dependencies between different types of risk, when:
1. The marginal densities $f_{S_1}, \dots, f_{S_m}$ of each risk are modeled independently, and the input of the maxent methodology must be adapted to model the dependencies between risks through a copula model. To assess the quality of the results we compare with a sequential convolution procedure.
2. Choosing an incorrect copula can harm the fit of the distribution. When the losses are collected as a joint vector (so the dependencies are already included), the maxent techniques can find the density directly. We compare the results with a sequential convolution procedure under a correct and an incorrect copula to see the differences.
29. Input of the Methodology
Multivariate case
(1) We can incorporate the dependencies into our input, knowing each $f_{S_i}$:
$$\psi(\alpha_k) = E[e^{-\alpha_k (S_1 + S_2 + \dots + S_B)}] = \sum_{i=1}^{N-1} e^{-(s_{1i} + s_{2i} + \dots + s_{Bi}) \alpha_k} f(s_{1i}, s_{2i}, \dots, s_{Bi}) \, \Delta s_1 \Delta s_2 \cdots \Delta s_B$$
where N is the number of partitions used in the discretization and
$$f(s_1, s_2, \dots, s_B) = c[F_1(s_1), \dots, F_B(s_B)] \prod_{i=1}^{B} f_{S_i}(s_i)$$
is the joint density, c is the density of the copula model C, and $f_{S_1}, \dots, f_{S_B}$ are the marginal densities.
(2) Simply
$$\psi(\alpha_k) = \frac{1}{T} \sum_{i=1}^{T} e^{-\alpha_k (s_{1i} + s_{2i} + \dots + s_{Bi})}$$
where T is the sample size.
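A sketch of input form (2), which is just a sample average over the joint loss vectors (the function name is illustrative):

```python
import numpy as np

def psi_joint(losses, alphas):
    """losses: T x B array of simultaneous losses (s_1i, ..., s_Bi); returns psi(alpha_k)."""
    total = losses.sum(axis=1)            # s_1i + s_2i + ... + s_Bi per observation
    return np.array([np.exp(-a * total).mean() for a in alphas])
```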
30. Maximum Entropy Methods
$$\max H(f) = -\int_0^1 f_Y(y) \ln f_Y(y) \, dy$$
SME approach: find the probability density on [0, 1] satisfying
$$\int_0^1 y^{\alpha_k} f_Y(y) \, dy = \mu(\alpha_k), \qquad Y = e^{-S}$$
where $\mu(\alpha_k) = \dfrac{\psi(\alpha_k) - P(N=0)}{1 - P(N=0)}$.
MEM approach: an extension of the SME approach that allows the inclusion of a reference measure Q, which is a parametric distribution.
SMEE approach: an extension of the SME approach for when the data are assumed to be noisy:
$$\int_0^1 y^{\alpha_k} f(y) \, dy \in C_k = [a_k, b_k], \qquad Y = e^{-S}$$
These methods consist in finding the probability measure that best represents the current state of knowledge, namely the one with the largest information-theoretic entropy.
31. Standard Maximum Entropy Method (SME)
In general, the maximum entropy density is obtained by maximizing the entropy measure
$$\max H(f) = -\int_0^1 f_Y(y) \ln f_Y(y) \, dy$$
subject to
$$E(Y^{\alpha_k}) = \int_0^1 y^{\alpha_k} f_Y(y) \, dy = \mu_{\alpha_k}, \qquad k = 1, 2, \dots, K, \text{ with } K = 8$$
$$\int_0^1 f_Y(y) \, dy = 1$$
where
$\mu_k$: the k-th moment, a known positive value;
K = 8: the number of moments;
fractional exponents: $\alpha_k = \alpha_0 / k$, $\alpha_0 = 1.5$.
32. Standard Maximum Entropy Method (SME)
When the problem has a solution, it can be expressed in terms of the Lagrange multipliers as
$$f^*_Y(y) = \frac{1}{Z(\lambda)} \exp\left(-\sum_{k=1}^{K} \lambda_k y^{\alpha_k}\right) = \exp\left(-\sum_{k=0}^{K} \lambda_k y^{\alpha_k}\right)$$
where the normalization constant is determined by
$$Z(\lambda) = \int_\Omega \exp\left(-\sum_{k=1}^{K} \lambda_k y^{\alpha_k}\right) dy$$
It is then necessary to find $\lambda^*$, the minimizer of the dual entropy, which is a function of the Lagrange multipliers λ:
$$\Sigma(\lambda, \mu) = \ln Z(\lambda) + \langle \lambda, \mu \rangle$$
Basically this is a problem of minimizing a convex function, reducing the step size as the iteration progresses (Barzilai-Borwein non-monotone gradient method):
$$f^*_Y(y) = \frac{1}{Z(\lambda^*)} \exp\left(-\sum_{k=1}^{K} \lambda^*_k y^{\alpha_k}\right), \qquad y \in (0, 1)$$
33. Standard Maximum Entropy Method (SME)
1. The starting point is $\int_0^\infty e^{-\alpha s} \, dF_S(s) = \mu_k$, with $S \in (0, \infty)$.
2. Make the change of variables $Y = e^{-S}$, $Y \in (0, 1)$.
3. Find the minimum of the dual entropy, a function of λ:
$$\min_\lambda \Sigma(\lambda, \mu) = \ln Z(\lambda) + \langle \lambda, \mu \rangle, \qquad Z(\lambda) = \int_0^1 e^{-\sum_{k=1}^{K} \lambda_k y^{\alpha_k}} \, dy$$
4. The solution is
$$f^*_Y(y) = \frac{1}{Z(\lambda^*)} e^{-\sum_{k=1}^{K} \lambda^*_k y^{\alpha_k}} = e^{-\sum_{k=0}^{K} \lambda^*_k y^{\alpha_k}}, \qquad y \in (0, 1)$$
5. Undo the change of variables:
$$f^*_S(s) = e^{-s} f^*_Y(e^{-s}), \qquad s \in (0, \infty)$$
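A minimal numerical sketch of steps 1-5 (assumptions: a toy target density on (0,1) used only to generate input moments, trapezoidal quadrature on a grid, and scipy's BFGS substituted for the Barzilai-Borwein gradient method named above):

```python
import numpy as np
from scipy.optimize import minimize

alphas = 1.5 / np.arange(1, 9)             # alpha_k = alpha_0 / k, K = 8

# Toy target on (0, 1): f_Y(y) = 2y, used only to generate the input moments mu.
y = np.linspace(1e-6, 1 - 1e-6, 20_000)    # quadrature grid on (0, 1)
mu = np.array([np.trapz(y**a * (2 * y), y) for a in alphas])

Ypow = y[:, None] ** alphas                # precompute y^{alpha_k}

def dual(lam):
    """Dual entropy Sigma(lam, mu) = ln Z(lam) + <lam, mu> (step 3)."""
    Z = np.trapz(np.exp(-Ypow @ lam), y)
    return np.log(Z) + lam @ mu

lam_star = minimize(dual, np.zeros(len(alphas)), method="BFGS").x

# Step 4: the maxent density on (0, 1).
Z = np.trapz(np.exp(-Ypow @ lam_star), y)
fY = np.exp(-Ypow @ lam_star) / Z

# Step 5: undo the change of variables, f_S(s) = e^{-s} f_Y(e^{-s}).
s = np.linspace(0.0, 10.0, 1_000)
fS = np.exp(-s) * np.interp(np.exp(-s), y, fY)
```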
34. Numerical Results
Joint marginal density distributions
We return to setting (1) of slide 28: the marginal densities $f_{S_1}, \dots, f_{S_m}$ are modeled independently and coupled through a copula model, and the results are compared with a sequential convolution procedure.
35. Numerical Results
To test the methodology we consider different combinations of frequency and severity losses.
We use a sample large enough that we need not worry about the effect of sample size on the results.
We use several methods to verify the quality of the results: L1 and L2 distances, MAE and RMSE distances, visual comparisons, and goodness-of-fit tests.
$$MAE = \frac{1}{T} \sum_{n=1}^{T} \left| \hat{F}(x_n) - F_e(x_n) \right|$$
$$RMSE = \sqrt{\frac{1}{T} \sum_{n=1}^{T} \left( \hat{F}(x_n) - F_e(x_n) \right)^2}$$
RMSE is more sensitive to outliers, because it gives a relatively high weight to large errors. So the greater the difference between MAE and RMSE, the greater the variance of the individual errors in the sample.
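A small sketch of these two error measures against the empirical CDF (names are illustrative):

```python
import numpy as np

def mae_rmse(F_hat, x):
    """MAE and RMSE between a fitted CDF F_hat and the empirical CDF of sample x."""
    x = np.sort(x)
    F_emp = np.arange(1, len(x) + 1) / len(x)   # empirical CDF at the sorted sample
    d = F_hat(x) - F_emp
    return np.abs(d).mean(), np.sqrt((d ** 2).mean())
```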
36. Numerical Results
[Figure: densities of the losses in three panels, (a) S1, (b) S2, (c) S3.]
Figure: Losses for each line of activity, reconstructed by SME.

Error
S    MAE     RMSE
S1   0.0054  0.0072
S2   0.0241  0.0282
S3   0.0061  0.0071
39. Numerical Results
Joint vector of losses
We now turn to setting (2) of slide 28: the losses are collected as a joint vector, the maxent techniques estimate the density directly, and the results are compared with a sequential convolution procedure under a correct and an incorrect copula.
40. Gaussian copula with weak negative correlation
[Figure: density of the total loss — SME (Gaussian) vs. convolution (t-Student) vs. convolution (Gaussian).]

Error   SME        Convolution (Gaussian)   Convolution (t-Student)
MAE     0.007989   0.01071                  0.01430
RMSE    0.009605   0.01264                  0.01652
Table: Errors, SME approach.
41. Computation of the regulatory capital
VaR

        Approaches                    Errors                   95% conf. interval
γ       SME     Conv.   Empirical    SME err.   Conv. err.    VaR_inf   VaR_sup
0.900   7.237   7.293   7.236        0.001      0.057         7.212     7.263
0.950   7.399   7.293   7.365        0.034      0.072         7.309     7.389
0.990   7.682   7.569   7.658        0.024      0.089         7.516     7.689
0.995   7.803   7.707   7.719        0.084      0.012         7.595     7.856
0.999   8.175   8.534   8.601        0.426      0.067         7.689     8.926
Table: Comparison of VaR for the SME and convolution approaches.
42. Computation of the regulatory capital
$$TVaR_\gamma(S) = E[S \mid S > VaR_\gamma]$$

        Approaches                    Errors                   95% conf. interval
γ       SME     Conv.   Empirical    SME err.   Conv. err.    TVaR_inf  TVaR_sup
0.900   7.439   7.404   7.419        0.020      0.015         7.336     7.536
0.950   7.571   7.514   7.549        0.022      0.035         7.415     7.735
0.990   7.892   7.837   7.920        0.028      0.083         7.551     8.443
0.995   8.052   8.089   8.047        0.005      0.042         7.578     8.926
0.999   8.529   8.758   8.334        0.195      0.424         7.658     8.926
Table: Comparison of TVaR for the SME and convolution approaches.
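A short sketch of estimating VaR and TVaR at level γ from a sample of total losses (an illustrative sample-based helper, not the density-based computation used in the slides):

```python
import numpy as np

def var_tvar(S, gamma=0.999):
    """Sample VaR and TVaR_gamma(S) = E[S | S > VaR_gamma]; needs a large sample at high gamma."""
    var = np.quantile(S, gamma)
    return var, S[S > var].mean()
```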
43. Conclusions
In this work we presented an application of maxent methodologies to operational risk. We showed that this methodology can provide a good density reconstruction over the entire range of values in the presence of data scarcity, heavy tails, and asymmetries, using only eight moments as input.
The methodology makes it possible to obtain density distributions at different levels of aggregation and allows us to include dependencies between different types of risk:
we can join marginal densities obtained from any methodology and impose any dependence structure on them, or
we can obtain the joint distribution directly from the data and avoid bad estimations.
The estimation of the underlying loss process provides a starting point to design policies, set premiums and reserves, calculate optimal reinsurance levels, and calculate risk pressures for solvency purposes in insurance and risk management.
This is also useful in structural engineering to describe the accumulated damage of a structure, to mention one more possible application.