1. Pareto Models for Top Incomes
Arthur Charpentier
UQAM
and
Emmanuel Flachaire
Aix-Marseille Université
2. What model should be fitted to top incomes?
With heavy-tailed distributions, everybody agrees on fitting a
Pareto distribution to the upper tail
However, do people have the same distribution in mind?
For economists: Pareto in the tail = Pareto I
For statisticians: Pareto in the tail = GPD, or Pareto II
In this paper:
Key issue: threshold selection, with no good solution
Our approach: sensitivity to the threshold
We show that EPD ≻ GPD ≻ Pareto I
We discuss different types of bias encountered in practice
3. Overview
Strict Pareto Models
Pareto I and GPD distributions
Threshold sensitivity
Pareto-type Models
First- and second-order regular variation
Extended Pareto distribution
From Theory to Practice
Misspecification bias
Estimation bias
Sampling bias
Applications
Income distribution in South Africa in 2012
Wealth distribution in the U.S.A. in 2013
5. Pareto I and GPD
Pareto Type I distribution, bounded from below by $u > 0$,
$$F(x) = 1 - \left(\frac{x}{u}\right)^{-\alpha}, \quad \text{for } x \ge u \qquad (1)$$
Generalized Pareto distribution, bounded from below by $u > 0$,
$$F(x) = 1 - \left(1 + \frac{x - u}{\sigma}\right)^{-\alpha}, \quad \text{for } x \ge u \qquad (2)$$
Extreme value theory: Pickands-Balkema-de Haan theorem¹
$$F_u(x) \to \text{GPD (or Pareto II)} \quad \text{as } u \to +\infty$$
Pareto I is a special case of GPD, when $\sigma = u$
¹ $F_u(x)$ is the conditional (excess) distribution function of $X$ above a threshold $u$.
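A minimal R sketch of equations (1) and (2) makes the special case concrete. The helper names `ppareto1()` and `pgpd()` are hypothetical (not taken from the TopIncomes package) and the parameter values are illustrative: with $\sigma = u$ the GPD coincides with Pareto I, because $1 + (x - u)/u = x/u$, while with $\sigma \ne u$ the two CDFs differ.

```r
# Hypothetical helpers (not from TopIncomes): CDFs of Pareto I and GPD, zero below u
ppareto1 <- function(x, u, alpha) {
  ifelse(x >= u, 1 - (x / u)^(-alpha), 0)
}
pgpd <- function(x, u, sigma, alpha) {
  ifelse(x >= u, 1 - (1 + (x - u) / sigma)^(-alpha), 0)
}

x <- c(0.5, 1, 1.5, 2, 5, 10, 100)
u <- 1; alpha <- 2

# GPD with sigma = u reduces to Pareto I, since 1 + (x - u)/u = x/u
same <- all.equal(ppareto1(x, u, alpha), pgpd(x, u, sigma = u, alpha))
# With sigma != u the two CDFs differ above u
gap <- max(abs(ppareto1(x, u, alpha) - pgpd(x, u, sigma = 3, alpha)))
print(same)
print(gap)
```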
6. Pareto I and GPD in the tail: Is it really different?
“The Pareto I is as good as the Pareto II only at extremely high incomes,
beyond the range of thresholds usually considered” (Jenkins 2017)
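If the distribution is GPD for a fixed threshold $u$, it is also
GPD for a higher threshold $u + \eta$:
$$\text{if } X \sim \text{GPD}(u, \sigma, \alpha) \text{ for } X > u, \text{ then } X \sim \text{GPD}(u + \eta, \sigma + \eta, \alpha) \text{ for } X > u + \eta$$
$$\text{GPD} \approx \text{Pareto I} \quad \text{when } u + \eta \approx \sigma + \eta \qquad (3)$$
1. when $u = \sigma$, (3) holds for all $\eta \ge 0$
2. when $u \ne \sigma$, (3) holds only for very large values of $\eta$
GPD above a threshold ≈ Pareto I above a higher threshold,
which must be much higher the further $\sigma$ is from $u$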
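The threshold-stability property and equation (3) can be checked numerically. In this sketch (hypothetical helper `sgpd()`, illustrative parameters $u = 0.5$, $\sigma = 1.5$, $\alpha = 2$, $\eta = 3$), the conditional survival above $u + \eta$ matches GPD$(u + \eta, \sigma + \eta, \alpha)$. The ratio of scale to threshold, which must equal 1 for Pareto I, approaches 1 only as $\eta$ grows large.

```r
# Survival function of GPD(u, sigma, alpha) for x >= u (hypothetical helper)
sgpd <- function(x, u, sigma, alpha) (1 + (x - u) / sigma)^(-alpha)

u <- 0.5; sigma <- 1.5; alpha <- 2; eta <- 3
x <- seq(u + eta, 50, length.out = 100)

# Conditional survival of X given X > u + eta ...
cond <- sgpd(x, u, sigma, alpha) / sgpd(u + eta, u, sigma, alpha)
# ... equals the survival function of GPD(u + eta, sigma + eta, alpha)
shifted <- sgpd(x, u + eta, sigma + eta, alpha)
stable <- isTRUE(all.equal(cond, shifted))

# The shifted GPD is Pareto I only if its scale equals its threshold,
# so track (sigma + eta) / (u + eta) as the threshold rises
eta_grid <- c(0, 1, 10, 100, 1000)
ratio <- (sigma + eta_grid) / (u + eta_grid)
print(stable)
print(round(ratio, 4))
```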
7. Pareto I and GPD: Threshold sensitivity when $u \ne \sigma$
Figure: Pareto I and GPD tail parameter estimates as the threshold increases:
1,000 samples of 1,000 observations drawn from GPD($u = 0.5$, $\sigma = 1.5$, $\alpha = 2$)
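A scaled-down version of this experiment can be sketched in R with hypothetical helpers (`rgpd()`, `fit_pareto1()`, `fit_gpd()`; not the TopIncomes implementation). The sketch draws one GPD sample by inversion. It then fits Pareto I (closed-form MLE) and GPD (numerical MLE via `optim`) above increasing empirical thresholds. The sample size, seed and thresholds are illustrative assumptions. The Pareto I estimate is far below $\alpha = 2$ at low thresholds, whereas the GPD estimate is not.

```r
# Hypothetical helpers (not from TopIncomes)
rgpd <- function(n, u, sigma, alpha) {
  # inversion of 1 - F(x) = (1 + (x - u)/sigma)^(-alpha)
  u + sigma * (runif(n)^(-1 / alpha) - 1)
}
fit_pareto1 <- function(x, thr) {        # closed-form Pareto I MLE above thr
  y <- x[x > thr]
  length(y) / sum(log(y / thr))
}
fit_gpd <- function(x, thr) {            # numerical GPD MLE above thr, returns alpha
  y <- x[x > thr] - thr                  # excesses over the threshold
  nll <- function(par) {                 # par = (log sigma, log alpha)
    sigma <- exp(par[1]); alpha <- exp(par[2])
    -(length(y) * (log(alpha) - log(sigma)) - (alpha + 1) * sum(log1p(y / sigma)))
  }
  opt <- optim(c(log(mean(y)), 0), nll)  # Nelder-Mead by default
  exp(opt$par[2])
}

set.seed(1)
x <- rgpd(5000, u = 0.5, sigma = 1.5, alpha = 2)
thresholds <- quantile(x, c(0, 0.5, 0.9))
res <- t(sapply(thresholds, function(th) c(pareto1 = fit_pareto1(x, th),
                                          gpd = fit_gpd(x, th))))
print(round(res, 2))
```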
9. Pareto-type in the tail, rather than strictly Pareto
If the threshold is not extremely high, the tail may not be
strictly Pareto. We will assume it is Pareto-type.
Most heavy-tailed distributions are regularly varying:
$$1 - F(x) = x^{-\alpha} L(x)$$
$L(x)$ captures deviations from the strict Pareto model:²
if $L(x) \to \text{cst}$ quickly: "quickly" Pareto in the tail³
if $L(x) \to \text{cst}$ slowly: "slowly" Pareto in the tail
The optimal choice of $u$ depends on the rate of convergence of $L(x)$
² $L$ is a slowly varying function (at infinity): $\lim_{x \to \infty} L(tx)/L(x) = 1$ for all $t > 0$.
³ $L(x)$ can also converge to infinity (e.g., $L(x) = \log x$).
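The role of $L(x)$ can be illustrated numerically. This sketch assumes the Singh-Maddala parameterization $1 - F(x) = [1 + (x/b)^a]^{-q}$ (so that $\alpha = aq$) with illustrative values $a = 2$, $b = 1$, $q = 1.5$. Here $L(x) = x^{\alpha}(1 - F(x))$ tends to the constant $b^{\alpha} = 1$. By contrast, $L(x) = \log x$ is slowly varying but unbounded, and $L(2x)/L(x)$ approaches 1 only slowly.

```r
# Singh-Maddala survival function (assumed parameterization), tail index alpha = a * q
ssm <- function(x, a, b, q) (1 + (x / b)^a)^(-q)

a <- 2; b <- 1; q <- 1.5
alpha <- a * q
x <- 10^(1:4)

# L(x) = x^alpha * (1 - F(x)): converges to the constant b^alpha = 1
L_sm <- x^alpha * ssm(x, a, b, q)
# L(x) = log(x): slowly varying, L(2x)/L(x) -> 1, but L itself diverges
L_log_ratio <- log(2 * x) / log(x)

print(rbind(x = x, L_sm = L_sm, L_log_ratio = L_log_ratio))
```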
10. Extended Pareto distribution (EPD)
Based on an approximation of a class of Pareto-type distributions,⁴
Beirlant et al. (2009) proposed an Extended Pareto distribution
$$F(x) = 1 - \left[\frac{x}{u}\left(1 + \delta - \delta\left(\frac{x}{u}\right)^{\tau}\right)\right]^{-\alpha}, \quad \text{for } x \ge u \qquad (4)$$
It is a more flexible distribution:⁵
$\text{EPD}(u, 0, \tau, \alpha) = \text{Pareto I}(u, \alpha)$
$\text{EPD}(u, \delta, -1, \alpha) = \text{GPD}(u, u/(1 + \delta), \alpha)$
Mean over a threshold: no closed form → numerical methods
EPD can capture second-order regular variation
⁴ Hall class of distributions (Singh-Maddala, Student, Fréchet, Cauchy)
⁵ It also nests the Pareto Type III distribution, when $\alpha = 1$.
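A minimal R sketch of the EPD survival function $1 - F(x) = [(x/u)(1 + \delta - \delta (x/u)^{\tau})]^{-\alpha}$ checks the nesting relations. The helper names and parameter values are hypothetical. The sketch verifies that $\delta = 0$ gives Pareto I and that $\tau = -1$ gives a GPD with threshold $u$ and scale $u/(1 + \delta)$. It then computes the mean excess $E(X - u \mid X > u) = \int_u^\infty (1 - F(x))\,dx$ numerically with R's `integrate()`.

```r
# Hypothetical helpers: survival functions of EPD and GPD for x >= u
sepd <- function(x, u, delta, tau, alpha) {
  z <- x / u
  (z * (1 + delta - delta * z^tau))^(-alpha)
}
sgpd <- function(x, u, sigma, alpha) (1 + (x - u) / sigma)^(-alpha)

u <- 1; delta <- 0.5; alpha <- 3
x <- c(1, 2, 5, 10)

# delta = 0: Pareto I(u, alpha), whatever tau
nest_p1 <- isTRUE(all.equal(sepd(x, u, 0, -0.5, alpha), (x / u)^(-alpha)))
# tau = -1: GPD with threshold u and scale u / (1 + delta)
nest_gpd <- isTRUE(all.equal(sepd(x, u, delta, -1, alpha),
                             sgpd(x, u, u / (1 + delta), alpha)))

# Mean excess over u: integral of the survival function from u to infinity
mean_excess_epd <- function(u, delta, tau, alpha) {
  integrate(function(x) sepd(x, u, delta, tau, alpha), lower = u, upper = Inf)$value
}
me_gpd_case <- mean_excess_epd(u, delta, -1, alpha)    # GPD closed form: sigma/(alpha-1) = 1/3
me_general  <- mean_excess_epd(u, delta, -0.5, alpha)  # no closed form
print(c(nest_p1 = nest_p1, nest_gpd = nest_gpd))
print(c(me_gpd_case = me_gpd_case, me_general = me_general))
```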
11. Sensitivity to the threshold in large sample, n = 50,000
Figure: Boxplots of maximum likelihood estimates of the tail index $\hat\alpha$:
1,000 samples of 50,000 observations drawn from a Singh-Maddala
distribution, SM(2.07, 1.14, 1.75). From left to right, the x-axis
gives the threshold (percentile) used to fit the Pareto I (blue), GPD (green) and
EPD (red) models.
12. Sensitivity to the threshold in huge sample, n = 1 million
Figure: Boxplots of maximum likelihood estimates of the tail index $\hat\alpha$:
1,000 samples of 1,000,000 observations drawn from a Singh-Maddala
distribution, SM(2.07, 1.14, 1.75). From left to right, the x-axis
gives the threshold (percentile) used to fit the Pareto I (blue), GPD (green) and
EPD (red) models.
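The Pareto I part of this experiment can be sketched on a single sample. The sketch assumes the Singh-Maddala parameterization $1 - F(x) = [1 + (x/b)^a]^{-q}$ with illustrative values $a = 2$, $b = 1$, $q = 1.5$ (true tail index $aq = 3$), not those of the figures. Samples are drawn by inverting the CDF, and the Pareto I MLE is computed above several empirical percentiles. The helper names are hypothetical. The estimates stay below the true value at the lower thresholds.

```r
# Singh-Maddala quantile function, inverse of F(x) = 1 - (1 + (x/b)^a)^(-q)
qsm <- function(p, a, b, q) b * ((1 - p)^(-1 / q) - 1)^(1 / a)
rsm <- function(n, a, b, q) qsm(runif(n), a, b, q)   # sampling by inversion

fit_pareto1 <- function(x, thr) {                   # Pareto I MLE above threshold thr
  y <- x[x > thr]
  length(y) / sum(log(y / thr))
}

set.seed(123)
a <- 2; b <- 1; q <- 1.5                            # illustrative; true tail index a * q = 3
x <- rsm(1e5, a, b, q)
probs <- c(0.80, 0.90, 0.95, 0.99, 0.995)           # thresholds as percentiles
alpha_hat <- sapply(quantile(x, probs), function(thr) fit_pareto1(x, thr))
print(round(alpha_hat, 3))
```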
14. Pareto diagram
From a Pareto Type I distribution, we have:
$$\log(1 - F(x)) = c - \alpha \log x, \quad \text{with } c = \alpha \log u \qquad (5)$$
Pareto diagram: plot of the survival function against incomes (in logs)
$$\{\log x, \ \log(1 - F(x))\} \qquad (6)$$
It shows the % of the population with income x or more against x, in logs
If F is strictly Pareto: the Pareto diagram is a straight line
The slope of the line is minus the tail index: $-\alpha$
If F is Pareto in the tail: the Pareto diagram is ultimately linear
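An empirical Pareto diagram can be sketched by replacing $1 - F(x)$ with the (weighted) share of observations at or above each $x$. The helper `pareto_diagram_xy()` is hypothetical, not the TopIncomes implementation. On a simulated strict Pareto I sample ($u = 1$, $\alpha = 2$), an OLS line through the diagram has intercept close to $c = \alpha \log u = 0$ and slope close to $-\alpha$.

```r
# Hypothetical helper: Pareto diagram coordinates from a (weighted) sample
pareto_diagram_xy <- function(x, w = rep(1, length(x))) {
  o <- order(x, decreasing = TRUE)
  x <- x[o]; w <- w[o]
  surv <- cumsum(w) / sum(w)       # weighted share of observations with income >= x
  data.frame(logx = log(x), logS = log(surv))
}

set.seed(42)
u <- 1; alpha <- 2
x <- u * runif(1e4)^(-1 / alpha)   # strict Pareto I sample by inversion
d <- pareto_diagram_xy(x)

fit <- lm(logS ~ logx, data = d)   # straight line: intercept c, slope -alpha
print(coef(fit))
```

With survey data, the sample weights would be passed as `w`.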
15. Misspecification bias
Figure: Pareto diagram based on the true CDF (red), with two linear
approximations based on $\log x \ge 2$ and $\log x \ge 1$
Pareto diagrams based on the CDF are (ultimately) concave
A threshold that is too low leads to underestimating $\alpha$
A threshold that is too low leads to overestimating inequality
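This misspecification bias appears even without sampling noise. The sketch takes a Pareto-type survival function: the Singh-Maddala form $[1 + (x/b)^a]^{-q}$ with illustrative $a = 2$, $b = 1$, $q = 1.5$ and true $\alpha = 3$. It fits straight lines to the exact Pareto diagram above increasing thresholds on $\log x$ and reads off $\hat\alpha$ as minus the slope. Because the diagram is concave, lower thresholds give flatter lines and smaller $\hat\alpha$. The grid and thresholds are assumptions for illustration.

```r
ssm <- function(x, a, b, q) (1 + (x / b)^a)^(-q)   # Singh-Maddala survival (assumed form)
a <- 2; b <- 1; q <- 1.5                            # true tail index a * q = 3

logx <- seq(0, 6, by = 0.01)
logS <- log(ssm(exp(logx), a, b, q))               # exact Pareto diagram, no sampling

# alpha implied by an OLS line fitted to the diagram above a threshold on log x
alpha_above <- function(lo) {
  keep <- logx >= lo
  -unname(coef(lm(logS[keep] ~ logx[keep]))[2])
}
alpha_hat <- sapply(c(0, 1, 2), alpha_above)
print(round(alpha_hat, 3))
```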
16. Estimation bias
Figure: Pareto diagram based on a sample (black), with an artificial
increase of the largest observations (blue) and with topcoding (green).
In practice, the CDF is unknown and is replaced by the EDF
Erratic behavior on the right: the EDF fits the upper tail poorly
The presence of outliers leads to overestimating inequality
Topcoding and underreporting lead to underestimating inequality
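The direction of the two estimation biases can be checked on a simulated Pareto I sample ($u = 1$, $\alpha = 2$) with a Pareto I MLE above the 90th percentile (hypothetical helper `fit_pareto1()`). Topcoding at the 99th percentile pulls $\hat\alpha$ up (less inequality). Inflating the five largest incomes tenfold, as an artificial outlier scenario, pulls it down (more inequality). The topcoding level and inflation factor are illustrative assumptions.

```r
fit_pareto1 <- function(x, thr) {             # Pareto I MLE above threshold thr
  y <- x[x > thr]
  length(y) / sum(log(y / thr))
}

set.seed(7)
x <- runif(1e4)^(-1 / 2)                      # Pareto I(u = 1, alpha = 2)
thr <- unname(quantile(x, 0.90))              # common threshold for all fits

x_topcoded <- pmin(x, quantile(x, 0.99))      # values above the 99th percentile are capped
x_outliers <- x
top5 <- order(x, decreasing = TRUE)[1:5]
x_outliers[top5] <- 10 * x_outliers[top5]     # artificial increase of the largest values

est <- c(raw      = fit_pareto1(x, thr),
         topcoded = fit_pareto1(x_topcoded, thr),
         outliers = fit_pareto1(x_outliers, thr))
print(round(est, 3))
```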
17. Sampling bias
Surveys do not capture well the upper tail, because the rich
are harder to reach or more likely to refuse to participate
When some members of the population are more or less likely than
others to be included in the survey → sampling bias
To correct it, data producers provide sample weights
Pareto diagrams and ML estimation of GPD and EPD models
with weights are not provided by standard software
We developed R functions, available on GitHub:
https://github.com/freakonometrics/TopIncomes
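One way sample weights can enter the estimation is a weighted Pareto I MLE, $\hat\alpha_w = \sum_i w_i / \sum_i w_i \log(x_i/t)$ over observations above $t$. This is a pseudo-likelihood sketch with a hypothetical helper, not the TopIncomes code. The simulation assumes that incomes of 3 or more are included with probability 0.3 only. The unweighted estimate is then biased upward, and inverse-probability weights correct it.

```r
# Hypothetical helper: weighted Pareto I MLE above threshold thr,
# alpha_w = sum(w) / sum(w * log(x / thr)) over observations with x > thr
fit_pareto1_w <- function(x, w, thr) {
  keep <- x > thr
  sum(w[keep]) / sum(w[keep] * log(x[keep] / thr))
}

set.seed(2024)
pop <- runif(1e5)^(-1 / 2)                 # population: Pareto I(u = 1, alpha = 2)
p_incl <- ifelse(pop >= 3, 0.3, 1)         # assumption: incomes >= 3 respond with prob. 0.3
in_sample <- runif(length(pop)) < p_incl
x <- pop[in_sample]
w <- 1 / p_incl[in_sample]                 # sample weights = inverse inclusion probabilities

est <- c(unweighted = fit_pareto1_w(x, rep(1, length(x)), thr = 1.5),
         weighted   = fit_pareto1_w(x, w, thr = 1.5))
print(round(est, 3))
```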
18. Inequality measures
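EDF in the bottom $(1 - q)100\%$ + Pareto in the top $q100\%$
The top $p100\%$ income share is
$$TS^{(GPD)}_{p,q} = \begin{cases} \dfrac{p\left[\frac{\alpha}{\alpha - 1}\,\sigma\left(\frac{p}{q}\right)^{-1/\alpha} + u - \sigma\right]}{(1 - q)\bar{x}_q + q\sigma/(\alpha - 1) + qu} & \text{if } p \le q \\[2ex] 1 - \dfrac{(1 - p)\,\bar{x}_p}{(1 - q)\bar{x}_q + q\sigma/(\alpha - 1) + qu} & \text{if } p > q \end{cases} \qquad (7)$$
where $\bar{x}_q$ ($\bar{x}_p$) is the weighted mean of the $(1 - q)100\%$
($(1 - p)100\%$) smallest ordered observations.
$$TS^{(EPD)}_{p,q} = \begin{cases} \dfrac{p\,u' + q\,E_{u'}}{(1 - q)\bar{x}_q + q(u + E_u)} & \text{if } p \le q \\[2ex] 1 - \dfrac{(1 - p)\,\bar{x}_p}{(1 - q)\bar{x}_q + q(u + E_u)} & \text{if } p > q \end{cases} \qquad (8)$$
where $E_u$ and $E_{u'}$ are obtained by numerical integration.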
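Equations (7) and (8) can be sketched in R as follows. Function names are hypothetical, and $\bar x_q$ and $\bar x_p$ are passed in directly. For the EPD case, the sketch assumes that $u'$ is the EPD quantile with $1 - F(u') = p/q$ and that $E_v = \int_v^\infty (1 - F(x))\,dx$. Under these assumptions the numerator equals $p$ times the mean income above $u'$. With $\tau = -1$ the EPD is a GPD with scale $u/(1 + \delta)$, so both functions should agree, which serves as a consistency check. The inputs are illustrative.

```r
# Top p share with a GPD(u, sigma, alpha) tail fitted to the top q, eq. (7).
# xbar_q, xbar_p: (weighted) means of the bottom (1 - q) and (1 - p) observations.
top_share_gpd <- function(p, q, u, sigma, alpha, xbar_q, xbar_p = NA) {
  total <- (1 - q) * xbar_q + q * sigma / (alpha - 1) + q * u   # overall mean
  if (p <= q) {
    p * (alpha / (alpha - 1) * sigma * (p / q)^(-1 / alpha) + u - sigma) / total
  } else {
    1 - (1 - p) * xbar_p / total
  }
}

# EPD survival function for x >= u
sepd <- function(x, u, delta, tau, alpha) {
  z <- x / u
  (z * (1 + delta - delta * z^tau))^(-alpha)
}

# Top p share with an EPD tail, eq. (8); E(v) = integral of the survival function on [v, Inf)
top_share_epd <- function(p, q, u, delta, tau, alpha, xbar_q, xbar_p = NA) {
  S <- function(x) sepd(x, u, delta, tau, alpha)
  E <- function(v) integrate(S, lower = v, upper = Inf)$value
  total <- (1 - q) * xbar_q + q * (u + E(u))
  if (p <= q) {
    hi <- 2 * u                                  # bracket the quantile u' with S(u') = p/q
    while (S(hi) > p / q) hi <- 2 * hi
    u1 <- uniroot(function(x) S(x) - p / q, c(u, hi), tol = 1e-10)$root
    (p * u1 + q * E(u1)) / total
  } else {
    1 - (1 - p) * xbar_p / total
  }
}

u <- 1; delta <- 0.5; alpha <- 3
ts_gpd <- top_share_gpd(0.01, 0.10, u, u / (1 + delta), alpha, xbar_q = 0.4)
ts_epd <- top_share_epd(0.01, 0.10, u, delta, -1, alpha, xbar_q = 0.4)
print(c(gpd = ts_gpd, epd = ts_epd))
```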
20. Incomes in South Africa, 2012
The following table shows the tail index $\hat\alpha$ and the top 1% income share
for three thresholds, q90, q95 and q99, that is, when the Pareto distribution
is fitted to the top 10%, 5% and 1% of observations, respectively.
              Tail index                 Top 1% share
Threshold     q90     q95     q99        q90     q95     q99
Pareto I      1.742   1.881   2.492      0.192   0.171   0.146
GPD           2.689   2.935   19.249     0.142   0.141   0.139
EPD           2.236   2.255   4.198      0.148   0.149   0.139
Pareto I gives a heavier tail (smaller $\hat\alpha$) and more inequality (larger top 1% share)⁶
The EPD and GPD top 1% shares are more stable across thresholds
⁶ for a given threshold
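The stability claim can be quantified from the table itself, using the range of each model's estimates across the three thresholds. The numbers below are copied from the table, and the code only does the arithmetic.

```r
# Estimates copied from the table (rows: models; columns: thresholds q90, q95, q99)
alpha_hat <- rbind(pareto1 = c(1.742, 1.881, 2.492),
                   gpd     = c(2.689, 2.935, 19.249),
                   epd     = c(2.236, 2.255, 4.198))
top1 <- rbind(pareto1 = c(0.192, 0.171, 0.146),
              gpd     = c(0.142, 0.141, 0.139),
              epd     = c(0.148, 0.149, 0.139))

# Range (max - min) across thresholds for each model
range_alpha <- apply(alpha_hat, 1, function(v) diff(range(v)))
range_top1  <- apply(top1, 1, function(v) diff(range(v)))
print(round(rbind(range_alpha, range_top1), 3))
```

Across thresholds, the top 1% share moves by 4.6 percentage points with Pareto I, against 0.3 (GPD) and 1.0 (EPD). The GPD tail index itself, however, is far from stable at q99.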
21. Incomes in South Africa, 2012: Tail index estimates
[Figure: MLE estimates of the tail index; x-axis: k largest values; y-axis: tail index (alpha); curves: Pareto 1 (Hill estimator), GPD, EPD; vertical lines at q90, q95, q99]
Figure: Incomes in South Africa, 2012: plot of MLE estimates of the tail
index $\alpha$ of the Pareto I (blue), GPD (green) and EPD (red) models, as a
function of the number of largest observations used.
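The Pareto I curve in such plots is the Hill estimator computed for every number $k$ of largest observations, $\hat\alpha_k = k / \sum_{i=1}^{k} \log(x_{(i)}/x_{(k+1)})$ with $x_{(1)} \ge x_{(2)} \ge \dots$. A vectorized, unweighted sketch (hypothetical helper `hill_curve()`) uses cumulative sums of the log order statistics. It is run here on a simulated Pareto I sample with $\alpha = 2$.

```r
# Hill estimates of alpha for k = 1, ..., k_max largest observations
hill_curve <- function(x, k_max = length(x) - 1) {
  lx <- log(sort(x, decreasing = TRUE))
  k <- seq_len(k_max)
  # sum_{i <= k} log x_(i) - k * log x_(k+1)
  k / (cumsum(lx)[k] - k * lx[k + 1])
}

set.seed(1)
x <- runif(5000)^(-1 / 2)          # Pareto I(u = 1, alpha = 2), illustrative
alpha_k <- hill_curve(x, k_max = 1000)
print(round(alpha_k[c(50, 200, 1000)], 3))
```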
22. Incomes in South Africa, 2012: Top 1% shares
[Figure: Top 1% share; x-axis: k largest values; y-axis: share; curves: Pareto 1, GPD, EPD; vertical lines at q90, q95, q99]
Figure: Incomes in South Africa, 2012: top 1% share estimated with the Pareto I
(blue), GPD (green) and EPD (red) models, as a function of the number of largest observations used.
23. Incomes in South Africa, 2012: Pareto diagram
[Figure: Pareto diagram; x-axis: log(x); y-axis: log(1−F(x)); curves: Pareto 1, GPD, EPD; vertical lines at q90, q95, q99]
Figure: Incomes in South Africa, 2012: Pareto diagram, with Pareto I
(blue) and EPD (red) models fitted on the top 10% incomes.
24. R code
https://github.com/freakonometrics/TopIncomes
```r
library(TopIncomes)
df <- read.table("dataset.txt", header = TRUE)   # columns y (income) and w (sample weight)
cutoffs <- seq(0.20, 0.01, by = -0.001)          # fractions q of top observations used in the fit
r1 <- Top_Share(df$y, df$w, p = .01, q = cutoffs, method = "pareto1")
r2 <- Top_Share(df$y, df$w, p = .01, q = cutoffs, method = "gpd")
r3 <- Top_Share(df$y, df$w, p = .01, q = cutoffs, method = "epd")

# Figure of tail index
plot(r1$k, r1$alpha, col = "blue", main = "Tail index", type = "l")
lines(r2$k, r2$alpha, col = "green")
lines(r3$k, r3$alpha, col = "red")

# Figure of top share
plot(r1$k, r1$index, col = "blue", main = "Top share", type = "l")
lines(r2$k, r2$index, col = "green")
lines(r3$k, r3$index, col = "red")

# Pareto diagram:
Pareto_diagram(data = df$y, weights = df$w)
```
25. Conclusion
To date, researchers have invariably used the Pareto I model
Pareto I can lead to severe biases in $\hat\alpha$ and, thus, in inequality measures
Underestimation of $\hat\alpha$ … overestimation of inequality:
A threshold that is too low (misspecification bias; it does not decrease as n increases)
The presence of outliers in the sample (estimation bias)
Overestimation of $\hat\alpha$ … underestimation of inequality:
Topcoding, censoring, underreporting (estimation bias)
The Extended Pareto distribution can help reduce these biases
Pareto diagrams and plots of tail index estimates are useful for
checking the results. They should be used more often in practice
26. Beliefs on surveys
It is widely believed that $\hat\alpha$ is upward-biased in surveys
In a seminal paper, Atkinson et al. (2011) write: "The Pareto parameter
is estimated using the ratio of the top 5 percent income share to the top
decile income share (...). Because those top income shares are often
based on survey data (and not tax data), they likely underestimate the
magnitude of the changes at the very top."
This holds if F is Pareto I above the top decile and there are no outliers
Otherwise, $\hat\alpha$ may just as well be biased downward
→ The Pareto I tail index estimated on survey data is not necessarily
upward-biased; it can also be downward-biased
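Under a strict Pareto I tail above the top decile, the top $p$ share is proportional to $p^{1 - 1/\alpha}$. The ratio of the top 5% share to the top 10% share is then $0.5^{1 - 1/\alpha}$, and $\alpha$ can be backed out from it. The sketch below inverts this relation with a hypothetical helper and illustrative inputs. Inversion is exact for Pareto I shares. For the exact top shares of a Pareto-type Singh-Maddala distribution (assumed form $[1 + x^2]^{-1.5}$, true $\alpha = 3$), computed by numerical integration, the same inversion returns a value below 3, i.e. a downward-biased tail index.

```r
# Pareto I: top-p share proportional to p^(1 - 1/alpha), so
# S(0.05) / S(0.10) = 0.5^(1 - 1/alpha); invert for alpha (hypothetical helper)
alpha_from_shares <- function(s_top5, s_top10) {
  b <- log(s_top5 / s_top10) / log(0.05 / 0.10)   # b = 1 - 1/alpha
  1 / (1 - b)
}

# Check 1: shares generated by an exact Pareto I tail (alpha = 2.5) are inverted exactly
s10 <- 0.30                                        # illustrative top 10% share
s5  <- s10 * 0.5^(1 - 1 / 2.5)
alpha_exact <- alpha_from_shares(s5, s10)

# Check 2: exact top shares of a Singh-Maddala distribution with survival
# (1 + (x/b)^a)^(-q), a = 2, b = 1, q = 1.5 (true alpha = 3)
S <- function(x) (1 + x^2)^(-1.5)
top_share <- function(p) {
  xp <- sqrt(p^(-2 / 3) - 1)                       # quantile with S(xp) = p
  (p * xp + integrate(S, xp, Inf)$value) / integrate(S, 0, Inf)$value
}
alpha_sm <- alpha_from_shares(top_share(0.05), top_share(0.10))
print(c(alpha_exact = alpha_exact, alpha_sm = alpha_sm))
```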
27. Beliefs on tax data
It is widely believed that $\hat\alpha$ is much more reliable in tax data
No estimation bias: no topcoding or underreporting
But it is sensitive to misspecification bias: a threshold that is too low
may lead to severe underestimation of the tail index
(overestimation of inequality), even with millions of
observations
Jenkins (2017): threshold above the 99.5th percentile⁷
→ This suggests that Pareto models should be fitted to tax data
with caution
⁷ For the Pareto I model fitted to U.K. tax data for several years, 1995-2010