Pareto Models for Top Incomes
Arthur Charpentier
UQAM
and
Emmanuel Flachaire
Aix-Marseille Université
What model should be fitted to top incomes?
With heavy-tailed distributions, everybody agrees on fitting a Pareto distribution in the upper tail
However, do people have the same distribution in mind?
For economists: Pareto in the tail = Pareto I
For statisticians: Pareto in the tail = GPD, or Pareto II
In this paper:
Key issue: threshold selection, for which there is no good solution
Our approach: study the sensitivity of the estimates to the threshold
We show that EPD outperforms GPD, which outperforms Pareto I
We discuss different types of bias encountered in practice
Overview
Strict Pareto Models
Pareto I and GPD distributions
Threshold sensitivity
Pareto-type Models
First- and second-order regular variation
Extended Pareto distribution
From Theory to Practice
Misspecification bias
Estimation bias
Sampling bias
Applications
Income distribution in South Africa in 2012
Wealth distribution in U.S.A. in 2013
Strict Pareto Models
Pareto I and GPD
Pareto Type I distribution, bounded from below by u > 0:
$$F(x) = 1 - \left(\frac{x}{u}\right)^{-\alpha}, \quad x \ge u \qquad (1)$$
Generalized Pareto distribution, bounded from below by u > 0:
$$F(x) = 1 - \left(1 + \frac{x-u}{\sigma}\right)^{-\alpha}, \quad x \ge u \qquad (2)$$
Extreme value theory, Pickands-Balkema-de Haan theorem:¹ $F_u(x) \to$ GPD (or Pareto II) as $u \to +\infty$
Pareto I is a special case of GPD, when σ = u
¹ $F_u(x)$ is the conditional (excess) distribution function of X above a threshold u.
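The two CDFs can be coded directly from (1) and (2). This is a minimal base-R sketch written from the formulas above, not the TopIncomes implementation; the names `ppareto1` and `pgpd_tail` are hypothetical. It checks numerically that the GPD with σ = u is exactly the Pareto I distribution.

```r
# Pareto I CDF, equation (1): F(x) = 1 - (x/u)^(-alpha) for x >= u
ppareto1 <- function(x, u, alpha) {
  ifelse(x < u, 0, 1 - (x / u)^(-alpha))
}

# GPD CDF, equation (2): F(x) = 1 - (1 + (x - u)/sigma)^(-alpha) for x >= u
pgpd_tail <- function(x, u, sigma, alpha) {
  ifelse(x < u, 0, 1 - (1 + (x - u) / sigma)^(-alpha))
}

x <- c(1, 1.5, 2, 5, 10, 100)
# With sigma = u, the GPD reduces to Pareto I
all.equal(pgpd_tail(x, u = 1, sigma = 1, alpha = 2), ppareto1(x, u = 1, alpha = 2))
# With sigma != u, the two CDFs differ noticeably
max(abs(pgpd_tail(x, u = 1, sigma = 3, alpha = 2) - ppareto1(x, u = 1, alpha = 2)))
```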
Pareto I and GPD in the tail: is it really different?
“The Pareto I is as good as the Pareto II only at extremely high incomes, beyond the range of thresholds usually considered” (Jenkins 2017)
If the distribution is GPD above a fixed threshold u, it is also GPD above any higher threshold u + η:
$$X \sim \mathrm{GPD}(u, \sigma, \alpha) \text{ for } X > u \;\Longrightarrow\; X \sim \mathrm{GPD}(u+\eta, \sigma+\eta, \alpha) \text{ for } X > u+\eta$$
$$\mathrm{GPD} \approx \text{Pareto I} \quad \text{when} \quad u + \eta \approx \sigma + \eta \qquad (3)$$
1. When u = σ, (3) holds exactly for all η ≥ 0.
2. When u ≠ σ, (3) holds only for very large values of η, since (u + η)/(σ + η) → 1 as η grows.
GPD above a threshold ≈ Pareto I above a higher threshold, and the further u is from σ, the higher that threshold must be.
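The threshold-stability property can be checked numerically. This sketch, with a hypothetical helper `sgpd` for the GPD survival function, verifies that the conditional survival above u + η matches GPD(u + η, σ + η, α), then prints the ratio (u + η)/(σ + η) that governs approximation (3) for the parameters used in the next slide.

```r
# Survival function of GPD(u, sigma, alpha): S(x) = (1 + (x - u)/sigma)^(-alpha)
sgpd <- function(x, u, sigma, alpha) (1 + (x - u) / sigma)^(-alpha)

u <- 0.5; sigma <- 1.5; alpha <- 2; eta <- 3
x <- seq(u + eta, 50, length.out = 10)

# Conditional survival P(X > x | X > u + eta) under GPD(u, sigma, alpha) ...
cond <- sgpd(x, u, sigma, alpha) / sgpd(u + eta, u, sigma, alpha)
# ... coincides with the survival of GPD(u + eta, sigma + eta, alpha)
shifted <- sgpd(x, u + eta, sigma + eta, alpha)
all.equal(cond, shifted)

# Closeness to Pareto I in (3): ratio (u + eta) / (sigma + eta) approaches 1 slowly
etas <- c(0, 1, 10, 100, 1000)
round((u + etas) / (sigma + etas), 3)   # 0.333 0.600 0.913 0.990 0.999
```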
Pareto I and GPD: Threshold sensitivity when u ≠ σ
Figure (Pareto I vs. GPD): tail parameter estimation as the threshold increases, 1,000 samples of 1,000 observations drawn from GPD(u = 0.5, σ = 1.5, α = 2)
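A single-sample version of this experiment can be sketched in base R. Assumptions: draws are generated by inverting (2), one large sample (n = 100,000) replaces the 1,000 replications, and `hill` is a hypothetical name for the Pareto I maximum likelihood (Hill) estimator above a threshold t.

```r
set.seed(1)
n <- 1e5
u <- 0.5; sigma <- 1.5; alpha <- 2

# Draw from GPD(u, sigma, alpha) by inverting (2): x = u + sigma * (V^(-1/alpha) - 1), V ~ U(0, 1)
x <- u + sigma * (runif(n)^(-1 / alpha) - 1)

# Pareto I maximum likelihood (Hill) estimator above a threshold t
hill <- function(x, t) {
  y <- x[x > t]
  length(y) / sum(log(y / t))
}

# Thresholds: the lower bound u, then the 50th, 90th and 99th sample percentiles
thresholds <- c(u, quantile(x, c(0.5, 0.9, 0.99)))
alpha_hat <- sapply(thresholds, hill, x = x)
round(alpha_hat, 2)   # far below 2 at the lowest threshold, approaching 2 as it rises
```

Because σ ≠ u, the Pareto I fit is badly biased at low thresholds even with a very large sample: the bias comes from misspecification, not from sampling noise.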
Pareto-type Models
Pareto-type in the tail, rather than strictly Pareto
If the threshold is not extremely high, the tail may not be strictly Pareto. We will assume it is Pareto-type.
Most heavy-tailed distributions are regularly varying:
$$1 - F(x) = x^{-\alpha} L(x)$$
L(x) captures deviations from the strict Pareto model:²
if L(x) → constant quickly: "quickly" Pareto in the tail³
if L(x) → constant slowly: "slowly" Pareto in the tail
The optimal choice of u depends on the rate of convergence of L(x).
² L is a slowly varying function (at infinity): $\lim_{x \to \infty} L(tx)/L(x) = 1$ for all t > 0.
³ L(x) can also diverge to infinity (e.g., L(x) = log(x)).
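To make the definition concrete, the sketch below takes a Singh-Maddala survival function written as 1 − F(x) = (1 + (x/b)^a)^(−q). This parameterization and the values a = 2.5, b = 1, q = 0.8 are illustrative assumptions, not the SM(2.07, 1.14, 1.75) used in the simulations. Here α = aq and L(x) = x^α (1 − F(x)) converges to the constant b^(aq); the last line checks the slowly varying property with t = 2.

```r
a <- 2.5; b <- 1; q <- 0.8
alpha <- a * q                                 # tail index of this Singh-Maddala: alpha = a * q = 2

surv_sm <- function(x) (1 + (x / b)^a)^(-q)    # 1 - F(x)
L <- function(x) x^alpha * surv_sm(x)          # slowly varying part: 1 - F(x) = x^(-alpha) L(x)

x <- 10^(0:6)
signif(L(x), 4)              # converges to b^(a * q) = 1 as x grows
signif(L(2 * x) / L(x), 4)   # L(tx) / L(x) with t = 2 tends to 1
```

Here L(x) converges at rate x^(−a), so this example is "quickly" Pareto in the tail; with L(x) = log(x) the ratio L(2x)/L(x) would approach 1 much more slowly.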
Extended Pareto distribution (EPD)
Based on an approximation of a class of Pareto-type distributions,⁴ Beirlant et al. (2009) proposed an Extended Pareto distribution:
$$F(x) = 1 - \left[\frac{x}{u}\left(1 + \delta - \delta\left(\frac{x}{u}\right)^{\tau}\right)\right]^{-\alpha}, \quad x \ge u \qquad (4)$$
It is a more flexible distribution:⁵
EPD(u, 0, τ, α) = Pareto I(u, α)
EPD(u, δ, -1, α) = GPD(u, u/(1 + δ), α)
Mean over a threshold: no closed form → numerical methods
EPD can capture the second-order regular variation
⁴ Hall class of distributions (Singh-Maddala, Student, Fréchet, Cauchy)
⁵ It also nests the Pareto Type III distribution, when α = 1.
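A base-R sketch of (4), written from the formula rather than from the TopIncomes package (the names `sepd`, `sgpd` and `mean_excess_epd` are hypothetical). It checks the two nesting relations above and computes the mean excess over u by numerical integration of the survival function. It assumes the usual EPD constraints τ < 0 and δ > max(−1, 1/τ).

```r
# EPD survival from (4): 1 - F(x) = [ (x/u) * (1 + delta - delta * (x/u)^tau) ]^(-alpha), x >= u
sepd <- function(x, u, delta, tau, alpha) {
  z <- x / u
  (z * (1 + delta - delta * z^tau))^(-alpha)
}
# GPD survival from (2)
sgpd <- function(x, u, sigma, alpha) (1 + (x - u) / sigma)^(-alpha)

x <- c(1, 2, 5, 10, 50)
u <- 1; alpha <- 2.5; delta <- 0.5

# delta = 0 gives Pareto I(u, alpha)
all.equal(sepd(x, u, 0, -0.7, alpha), (x / u)^(-alpha))
# tau = -1 gives GPD(u, u / (1 + delta), alpha)
all.equal(sepd(x, u, delta, -1, alpha), sgpd(x, u, u / (1 + delta), alpha))

# Mean excess over u: integral of 1 - F(x) from u to Inf (no closed form in general)
mean_excess_epd <- function(u, delta, tau, alpha) {
  integrate(function(x) sepd(x, u, delta, tau, alpha), lower = u, upper = Inf)$value
}
mean_excess_epd(u, delta = 0.5, tau = -0.7, alpha = alpha)
# Sanity check: with delta = 0 (Pareto I) it equals u / (alpha - 1)
mean_excess_epd(u, delta = 0, tau = -0.7, alpha = alpha)   # about 0.667
```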
Sensitivity to the threshold in large samples, n = 50,000
Figure: Boxplots of maximum likelihood estimators of the tail index α̂: 1,000 samples of 50,000 observations drawn from a Singh-Maddala distribution, SM(2.07, 1.14, 1.75). From left to right, the x-axis is the threshold (percentile) used to fit the Pareto I (blue), GPD (green) and EPD (red) models.
Sensitivity to the threshold in huge samples, n = 1 million
Figure: Boxplots of maximum likelihood estimators of the tail index α̂: 1,000 samples of 1,000,000 observations drawn from a Singh-Maddala distribution, SM(2.07, 1.14, 1.75). From left to right, the x-axis is the threshold (percentile) used to fit the Pareto I (blue), GPD (green) and EPD (red) models.
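The GPD estimates in these figures come from maximum likelihood. A minimal sketch of a GPD fit with `optim` (Nelder-Mead) is given below; it is an illustration, not the estimator used for the figures, and `nll_gpd` and `fit_gpd` are hypothetical names. Parameters are optimized on the log scale to keep σ and α positive, and the data are simulated from the GPD(0.5, 1.5, 2) of the earlier slide so the Pareto I fit can be compared on the same sample.

```r
set.seed(2)
n <- 1e4
u <- 0.5; sigma <- 1.5; alpha <- 2
x <- u + sigma * (runif(n)^(-1 / alpha) - 1)   # GPD(u, sigma, alpha) sample

# Negative log-likelihood of GPD(u, sigma, alpha): density (alpha/sigma) * (1 + (y - u)/sigma)^(-alpha - 1)
nll_gpd <- function(par, y, u) {
  sigma <- exp(par[1]); alpha <- exp(par[2])
  -sum(log(alpha / sigma) - (alpha + 1) * log1p((y - u) / sigma))
}

fit_gpd <- function(x, u) {
  y <- x[x > u]
  start <- c(log(mean(y - u)), 0)   # start: sigma = mean excess, alpha = 1
  opt <- optim(start, nll_gpd, y = y, u = u)
  c(sigma = exp(opt$par[1]), alpha = exp(opt$par[2]))
}

fit_gpd(x, u)                                 # close to sigma = 1.5, alpha = 2
length(x[x > u]) / sum(log(x[x > u] / u))     # Pareto I MLE on the same data: far below 2
```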
From Theory to Practice
Pareto diagram
From a Pareto Type I distribution, we have:
$$\log(1 - F(x)) = c - \alpha \log x, \quad \text{with } c = \alpha \log u \qquad (5)$$
Pareto diagram: plot of the survival function against incomes (in logs):
$$\{\log x,\ \log(1 - F(x))\} \qquad (6)$$
It shows the share of the population with income x or more against x, in logs.
If F is strictly Pareto, the Pareto diagram is a straight line.
The slope of the line is minus the tail index: $-\alpha$.
If F is Pareto in the tail, the Pareto diagram is ultimately linear.
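A Pareto diagram from a weighted sample only needs the weighted share of observations at or above each income. This sketch (hypothetical `pareto_diagram_points`, not the `Pareto_diagram` function of the TopIncomes package) computes the points of (6) and, on a simulated Pareto I sample, checks that a fitted line has a slope close to −α.

```r
# Points of the Pareto diagram (6) from a weighted sample:
# log x against the log of the weighted share of observations with income >= x
pareto_diagram_points <- function(x, w = rep(1, length(x))) {
  o <- order(x)
  x <- x[o]; w <- w[o]
  surv <- rev(cumsum(rev(w))) / sum(w)   # weighted empirical P(X >= x_(i))
  data.frame(logx = log(x), logS = log(surv))
}

set.seed(3)
x <- runif(1e4)^(-1 / 2)                 # Pareto I sample with u = 1, alpha = 2
pts <- pareto_diagram_points(x)
plot(pts$logx, pts$logS, xlab = "log(x)", ylab = "log(1 - F(x))", pch = 20)

# For a strictly Pareto sample the points lie close to a line of slope -alpha
coef(lm(logS ~ logx, data = pts))[["logx"]]   # roughly -2
```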
Misspecification bias
Figure: Pareto diagram based on the true CDF (red), with two linear approximations fitted over log x ≥ 2 and log x ≥ 1
Pareto diagrams based on the CDF are (ultimately) concave
A threshold that is too low leads to an underestimation of α,
and hence to an overestimation of inequality
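The concavity can be seen from the local slope of the true Pareto diagram. For the GPD(0.5, 1.5, 2) used earlier, d log(1 − F)/d log x = −αx/(σ + x − u); the sketch below approximates it by central finite differences (the helper names are hypothetical).

```r
# log survival of GPD(u, sigma, alpha) as a function of log x
log_surv_gpd <- function(logx, u = 0.5, sigma = 1.5, alpha = 2) {
  -alpha * log1p((exp(logx) - u) / sigma)
}

# Local slope d log(1 - F) / d log x by central finite differences
local_slope <- function(logx, h = 1e-5) {
  (log_surv_gpd(logx + h) - log_surv_gpd(logx - h)) / (2 * h)
}

x_pts <- c(0.6, 1, 5, 50, 500)
round(local_slope(log(x_pts)), 3)   # -0.750 -1.000 -1.667 -1.961 -1.996
```

The slope steepens towards −α = −2 as x grows, so a line fitted from a low threshold is flatter than −α, which is exactly the underestimation of α described above.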
Estimation bias
Figure: Pareto diagram based on a sample (black), with an artificial increase of the largest observations (blue) and with topcoding (green).
In practice, the CDF is unknown and is replaced by the EDF
Erratic behavior on the right: the EDF fits the upper tail poorly
The presence of outliers leads to an overestimation of inequality;
topcoding and underreporting lead to an underestimation of inequality.
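Both effects can be reproduced on a simulated Pareto I sample with a fixed threshold. In this sketch the topcoding rule (cap at the 99th percentile), the outlier (largest value multiplied by 100) and the helper name `hill` are illustrative assumptions.

```r
hill <- function(x, t) {             # Pareto I MLE above threshold t
  y <- x[x > t]
  length(y) / sum(log(y / t))
}

set.seed(4)
x <- runif(5000)^(-1 / 2)            # Pareto I sample, u = 1, alpha = 2
t <- quantile(x, 0.9)[[1]]           # fixed threshold: top 10%

# Topcoding: incomes above the 99th percentile are replaced by that value
cap <- quantile(x, 0.99)[[1]]
x_topcoded <- pmin(x, cap)

# Outlier: the largest observation is artificially multiplied by 100
x_outlier <- x
x_outlier[which.max(x)] <- 100 * max(x)

c(raw = hill(x, t), topcoded = hill(x_topcoded, t), outlier = hill(x_outlier, t))
```

Topcoding shrinks the log-excesses and raises α̂ (less measured inequality); the outlier inflates them and lowers α̂ (more measured inequality).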
Sampling bias
Surveys do not capture the upper tail well, because the rich are harder to reach or more likely to refuse to participate
When some members of the population are more or less likely than others to be included in the survey → sampling bias
To correct for it, data producers provide sample weights
Pareto diagrams and ML estimation of GPD and EPD models with weights are not provided by standard software
We develop R functions, which we make available on GitHub:
https://github.com/freakonometrics/TopIncomes
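For the simplest case, the Pareto I model, the weighted ML estimator is Σw / Σ w log(x/t). The sketch below (hypothetical `hill_weighted`, not a TopIncomes function) simulates a survey in which, by assumption, incomes above 3 are five times less likely to be sampled, and compares unweighted and weighted estimates.

```r
# Weighted Pareto I MLE above threshold t: sum(w) / sum(w * log(x / t))
hill_weighted <- function(x, w, t) {
  keep <- x > t
  sum(w[keep]) / sum(w[keep] * log(x[keep] / t))
}

set.seed(5)
N <- 2e5
pop <- runif(N)^(-1 / 2)                   # population: Pareto I, u = 1, alpha = 2

# Hypothetical survey design: incomes above 3 are five times less likely to be sampled
incl_prob <- ifelse(pop > 3, 0.1, 0.5)
sampled <- runif(N) < incl_prob
x <- pop[sampled]
w <- 1 / incl_prob[sampled]                # design weights = inverse inclusion probabilities

c(unweighted = hill_weighted(x, rep(1, length(x)), t = 1),
  weighted   = hill_weighted(x, w, t = 1))
```

Ignoring the weights overestimates α (the rich are missing), while the weighted estimate is close to the true α = 2.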
Inequality measures
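EDF in the bottom (1 − q)100% + Pareto in the top q100%
The top p100% income share is
$$TS^{(\mathrm{GPD})}_{p,q} = \begin{cases} \dfrac{p\left[\frac{\alpha}{\alpha-1}\,\sigma\,(p/q)^{-1/\alpha} + u - \sigma\right]}{(1-q)\,\bar{x}_q + q\,\sigma/(\alpha-1) + q\,u} & \text{if } p \le q \\[1ex] 1 - \dfrac{(1-p)\,\bar{x}_p}{(1-q)\,\bar{x}_q + q\,\sigma/(\alpha-1) + q\,u} & \text{if } p > q \end{cases} \qquad (7)$$
where $\bar{x}_q$ ($\bar{x}_p$) is the weighted mean of the (1 − q)100% ((1 − p)100%) smallest ordered observations.
$$TS^{(\mathrm{EPD})}_{p,q} = \begin{cases} \dfrac{p\,u' + q\,E_{u'}}{(1-q)\,\bar{x}_q + q\,(u + E_u)} & \text{if } p \le q \\[1ex] 1 - \dfrac{(1-p)\,\bar{x}_p}{(1-q)\,\bar{x}_q + q\,(u + E_u)} & \text{if } p > q \end{cases} \qquad (8)$$
where $u'$ is the threshold of the top p100%, and $E_u$ and $E_{u'}$ are obtained by numerical integration.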
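Equation (7) translates into a short R function. This is a sketch with hypothetical names (`bottom_mean`, `top_share_gpd`), not the `Top_Share` function of TopIncomes; for illustration it plugs in known parameters rather than estimated ones. A useful check: if the whole distribution is Pareto I (q = 1, σ = u), (7) reduces to the closed form p^(1 − 1/α).

```r
# Weighted mean of the smallest share*100% of the observations
bottom_mean <- function(x, w, share) {
  o <- order(x)
  x <- x[o]; w <- w[o]
  keep <- cumsum(w) / sum(w) <= share
  sum(w[keep] * x[keep]) / sum(w[keep])
}

# Top p*100% share, equation (7): EDF below u, GPD(u, sigma, alpha) in the top q*100%
top_share_gpd <- function(x, w, p, q, u, sigma, alpha) {
  xbar_q <- if (q < 1) bottom_mean(x, w, 1 - q) else 0
  total <- (1 - q) * xbar_q + q * sigma / (alpha - 1) + q * u   # mean income (denominator)
  if (p <= q) {
    p * (alpha / (alpha - 1) * sigma * (p / q)^(-1 / alpha) + u - sigma) / total
  } else {
    1 - (1 - p) * bottom_mean(x, w, 1 - p) / total
  }
}

# Closed-form check: Pareto I everywhere, top 1% share = 0.01^(1 - 1/2)
top_share_gpd(x = 1, w = 1, p = 0.01, q = 1, u = 1, sigma = 1, alpha = 2)   # 0.1

# Simulated sample: Pareto I(1, 2); above its 90th percentile u it is GPD(u, u, 2)
set.seed(6)
x <- runif(1e4)^(-1 / 2)
w <- rep(1, length(x))
u <- quantile(x, 0.9)[[1]]
ts_top1 <- top_share_gpd(x, w, p = 0.01, q = 0.1, u = u, sigma = u, alpha = 2)
ts_top20 <- top_share_gpd(x, w, p = 0.20, q = 0.1, u = u, sigma = u, alpha = 2)   # p > q branch
c(ts_top1, ts_top20)   # near 0.1 and sqrt(0.2), about 0.447
```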
Applications
Incomes in South Africa, 2012
The following table shows the tail index α̂ and the top 1% share for three thresholds, q90, q95 and q99, that is, when the Pareto distribution is fitted on the top 10%, 5% and 1% of observations, respectively.

| Model | α̂ (q90) | α̂ (q95) | α̂ (q99) | Top 1% (q90) | Top 1% (q95) | Top 1% (q99) |
|---|---|---|---|---|---|---|
| Pareto I | 1.742 | 1.881 | 2.492 | 0.192 | 0.171 | 0.146 |
| GPD | 2.689 | 2.935 | 19.249 | 0.142 | 0.141 | 0.139 |
| EPD | 2.236 | 2.255 | 4.198 | 0.148 | 0.149 | 0.139 |

Heavier tail and more inequality with Pareto I⁶
EPD and GPD are more stable
⁶ For a given threshold
Incomes in South Africa, 2012: Tail index estimates
[Figure panel "MLE estimates of the tail index": x-axis k largest values, y-axis tail index (alpha); curves Pareto 1 (Hill estimator), GPD, EPD; thresholds q90, q95, q99 marked]
Figure: Incomes in South Africa 2012: plot of MLE estimates of the tail index α of the Pareto I (blue), GPD (green) and EPD (red) models, as a function of the number of largest observations used.
Incomes in South Africa, 2012: Top 1% shares
[Figure panel "Top 1% share": x-axis k largest values, y-axis share; curves Pareto 1, GPD, EPD; thresholds q90, q95, q99 marked]
Figure: Incomes in South Africa 2012: top 1% share estimated with the Pareto I (blue), GPD (green) and EPD (red) models, as a function of the number of largest observations used.
Incomes in South Africa, 2012: Pareto diagram
[Figure panel "Pareto diagram": x-axis log(x), y-axis log(1−F(x)); curves Pareto 1, GPD, EPD; thresholds q90, q95, q99 marked]
Figure: Incomes in South Africa 2012: Pareto diagram, with Pareto I (blue) and EPD (red) models fitted on the top 10% incomes.
R code
https://github.com/freakonometrics/TopIncomes
library(TopIncomes)
df <- read.table("dataset.txt", header = TRUE)   # columns: y = income, w = sample weight
cutoffs <- seq(0.20, 0.01, by = -0.001)          # share q of top observations fitted, from 20% down to 1%
r1 <- Top_Share(df$y, df$w, p = 0.01, q = cutoffs, method = "pareto1")
r2 <- Top_Share(df$y, df$w, p = 0.01, q = cutoffs, method = "gpd")
r3 <- Top_Share(df$y, df$w, p = 0.01, q = cutoffs, method = "epd")

# Figure of tail index
plot(r1$k, r1$alpha, col = "blue", main = "Tail index", type = "l")
lines(r2$k, r2$alpha, col = "green")
lines(r3$k, r3$alpha, col = "red")

# Figure of top share
plot(r1$k, r1$index, col = "blue", main = "Top share", type = "l")
lines(r2$k, r2$index, col = "green")
lines(r3$k, r3$index, col = "red")

# Pareto diagram
Pareto_diagram(data = df$y, weights = df$w)
Conclusion
To date, researchers have invariably used the Pareto I model
Pareto I can lead to severe biases in α̂ and, thus, in inequality measures
Underestimation of α̂ ... overestimation of inequality:
A threshold that is too low (misspecification bias, which does not decrease as n increases)
The presence of outliers in the sample (estimation bias)
Overestimation of α̂ ... underestimation of inequality:
Topcoding, censoring, underreporting (estimation bias)
The Extended Pareto distribution can help reduce these biases
Pareto diagrams and plots of tail index estimates are useful to check the results. They should be used more often in practice
Beliefs on surveys
It is widely believed that α̂ is upward-biased in surveys
In a seminal paper, Atkinson et al. (2011) write: "The Pareto parameter is estimated using the ratio of the top 5 percent income share to the top decile income share (...). Because those top income shares are often based on survey data (and not tax data), they likely underestimate the magnitude of the changes at the very top."
True if F is Pareto I above the top decile and there are no outliers
Otherwise, α̂ might just as well be biased downward
→ The estimation of the tail index of the Pareto I model on surveys is not necessarily upward-biased; it can also be downward-biased
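The ratio-of-shares estimator in the quotation can be made explicit, assuming F is Pareto I above the top decile (the case in which the quoted reasoning holds). Under that assumption the top p share is proportional to p^(1 − 1/α) for p ≤ 0.10, so the ratio of the top 5% to the top 10% share pins down α. The function name `alpha_from_shares` is hypothetical.

```r
# Under Pareto I above the top decile: S_5 / S_10 = (0.05 / 0.10)^(1 - 1/alpha)
# => alpha = 1 / (1 - log(S_5 / S_10) / log(0.5))
alpha_from_shares <- function(share_top5, share_top10) {
  1 / (1 - log(share_top5 / share_top10) / log(0.05 / 0.10))
}

alpha <- 2
alpha_from_shares(0.05^(1 - 1 / alpha), 0.10^(1 - 1 / alpha))   # 2
alpha_from_shares(0.35 * 0.5^(1 - 1 / alpha), 0.35)              # 2 as well: only the ratio matters
```

If F is only Pareto-type above the top decile, or the top observations contain outliers, the same formula returns a biased α̂, in either direction.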
Beliefs on tax data
It is widely believed that α̂ is much more reliable in tax data
No estimation bias: no topcoding or underreporting
But sensitive to misspecification bias: a threshold that is too low may lead to severe underestimation of the tail index (overestimation of inequality), even with millions of observations
Jenkins (2017): threshold higher than the 99.5th percentile⁷
→ This suggests that fitting a Pareto model on tax data should be done with caution
⁷ For the Pareto I model fitted on U.K. tax data for several years, 1995-2010
More Related Content

What's hot (20)

Side 2019 #4
Side 2019 #4Side 2019 #4
Side 2019 #4
 
Slides ensae 8
Slides ensae 8Slides ensae 8
Slides ensae 8
 
Slides amsterdam-2013
Slides amsterdam-2013Slides amsterdam-2013
Slides amsterdam-2013
 
Slides econ-lm
Slides econ-lmSlides econ-lm
Slides econ-lm
 
Graduate Econometrics Course, part 4, 2017
Graduate Econometrics Course, part 4, 2017Graduate Econometrics Course, part 4, 2017
Graduate Econometrics Course, part 4, 2017
 
Classification
ClassificationClassification
Classification
 
Slides simplexe
Slides simplexeSlides simplexe
Slides simplexe
 
Slides ensae 9
Slides ensae 9Slides ensae 9
Slides ensae 9
 
Side 2019, part 2
Side 2019, part 2Side 2019, part 2
Side 2019, part 2
 
Side 2019, part 1
Side 2019, part 1Side 2019, part 1
Side 2019, part 1
 
transformations and nonparametric inference
transformations and nonparametric inferencetransformations and nonparametric inference
transformations and nonparametric inference
 
Slides barcelona Machine Learning
Slides barcelona Machine LearningSlides barcelona Machine Learning
Slides barcelona Machine Learning
 
Slides risk-rennes
Slides risk-rennesSlides risk-rennes
Slides risk-rennes
 
Slides toulouse
Slides toulouseSlides toulouse
Slides toulouse
 
Slides lln-risques
Slides lln-risquesSlides lln-risques
Slides lln-risques
 
Lausanne 2019 #4
Lausanne 2019 #4Lausanne 2019 #4
Lausanne 2019 #4
 
Machine Learning in Actuarial Science & Insurance
Machine Learning in Actuarial Science & InsuranceMachine Learning in Actuarial Science & Insurance
Machine Learning in Actuarial Science & Insurance
 
Slides ads ia
Slides ads iaSlides ads ia
Slides ads ia
 
Side 2019 #7
Side 2019 #7Side 2019 #7
Side 2019 #7
 
Slides networks-2017-2
Slides networks-2017-2Slides networks-2017-2
Slides networks-2017-2
 

Similar to Pareto Models, Slides EQUINEQ

Big Data Analysis
Big Data AnalysisBig Data Analysis
Big Data AnalysisNBER
 
Advanced Statistics Homework Help
Advanced Statistics Homework HelpAdvanced Statistics Homework Help
Advanced Statistics Homework HelpExcel Homework Help
 
Machine Learning for Epidemiological Models (Enrico Meloni)
Machine Learning for Epidemiological Models (Enrico Meloni)Machine Learning for Epidemiological Models (Enrico Meloni)
Machine Learning for Epidemiological Models (Enrico Meloni)MeetupDataScienceRoma
 
Intro to Quant Trading Strategies (Lecture 10 of 10)
Intro to Quant Trading Strategies (Lecture 10 of 10)Intro to Quant Trading Strategies (Lecture 10 of 10)
Intro to Quant Trading Strategies (Lecture 10 of 10)Adrian Aley
 
Multinomial Model Simulations
Multinomial Model SimulationsMultinomial Model Simulations
Multinomial Model Simulationstim_hare
 
Memorization of Various Calculator shortcuts
Memorization of Various Calculator shortcutsMemorization of Various Calculator shortcuts
Memorization of Various Calculator shortcutsPrincessNorberte
 
Estimating a Population Standard Deviation or Variance
Estimating a Population Standard Deviation or VarianceEstimating a Population Standard Deviation or Variance
Estimating a Population Standard Deviation or VarianceLong Beach City College
 
3.3 Measures of relative standing and boxplots
3.3 Measures of relative standing and boxplots3.3 Measures of relative standing and boxplots
3.3 Measures of relative standing and boxplotsLong Beach City College
 
Advanced Econometrics L5-6.pptx
Advanced Econometrics L5-6.pptxAdvanced Econometrics L5-6.pptx
Advanced Econometrics L5-6.pptxakashayosha
 
Ijarcet vol-2-issue-4-1579-1582
Ijarcet vol-2-issue-4-1579-1582Ijarcet vol-2-issue-4-1579-1582
Ijarcet vol-2-issue-4-1579-1582Editor IJARCET
 
Interpreting Logistic Regression.pptx
Interpreting Logistic Regression.pptxInterpreting Logistic Regression.pptx
Interpreting Logistic Regression.pptxGairuzazmiMGhani
 
ders 3.2 Unit root testing section 2 .pptx
ders 3.2 Unit root testing section 2 .pptxders 3.2 Unit root testing section 2 .pptx
ders 3.2 Unit root testing section 2 .pptxErgin Akalpler
 
Assessing Model Performance - Beginner's Guide
Assessing Model Performance - Beginner's GuideAssessing Model Performance - Beginner's Guide
Assessing Model Performance - Beginner's GuideMegan Verbakel
 
Ali, Redescending M-estimator
Ali, Redescending M-estimator Ali, Redescending M-estimator
Ali, Redescending M-estimator Muhammad Ali
 

Similar to Pareto Models, Slides EQUINEQ (20)

ML MODULE 4.pdf
ML MODULE 4.pdfML MODULE 4.pdf
ML MODULE 4.pdf
 
Big Data Analysis
Big Data AnalysisBig Data Analysis
Big Data Analysis
 
Advanced Statistics Homework Help
Advanced Statistics Homework HelpAdvanced Statistics Homework Help
Advanced Statistics Homework Help
 
Machine Learning for Epidemiological Models (Enrico Meloni)
Machine Learning for Epidemiological Models (Enrico Meloni)Machine Learning for Epidemiological Models (Enrico Meloni)
Machine Learning for Epidemiological Models (Enrico Meloni)
 
Intro to Quant Trading Strategies (Lecture 10 of 10)
Intro to Quant Trading Strategies (Lecture 10 of 10)Intro to Quant Trading Strategies (Lecture 10 of 10)
Intro to Quant Trading Strategies (Lecture 10 of 10)
 
lecture8.ppt
lecture8.pptlecture8.ppt
lecture8.ppt
 
Lecture8
Lecture8Lecture8
Lecture8
 
Statistics
StatisticsStatistics
Statistics
 
Normal Distribution
Normal DistributionNormal Distribution
Normal Distribution
 
Multinomial Model Simulations
Multinomial Model SimulationsMultinomial Model Simulations
Multinomial Model Simulations
 
Memorization of Various Calculator shortcuts
Memorization of Various Calculator shortcutsMemorization of Various Calculator shortcuts
Memorization of Various Calculator shortcuts
 
Estimating a Population Standard Deviation or Variance
Estimating a Population Standard Deviation or VarianceEstimating a Population Standard Deviation or Variance
Estimating a Population Standard Deviation or Variance
 
3.3 Measures of relative standing and boxplots
3.3 Measures of relative standing and boxplots3.3 Measures of relative standing and boxplots
3.3 Measures of relative standing and boxplots
 
Advanced Econometrics L5-6.pptx
Advanced Econometrics L5-6.pptxAdvanced Econometrics L5-6.pptx
Advanced Econometrics L5-6.pptx
 
Cb36469472
Cb36469472Cb36469472
Cb36469472
 
Ijarcet vol-2-issue-4-1579-1582
Ijarcet vol-2-issue-4-1579-1582Ijarcet vol-2-issue-4-1579-1582
Ijarcet vol-2-issue-4-1579-1582
 
Interpreting Logistic Regression.pptx
Interpreting Logistic Regression.pptxInterpreting Logistic Regression.pptx
Interpreting Logistic Regression.pptx
 
ders 3.2 Unit root testing section 2 .pptx
ders 3.2 Unit root testing section 2 .pptxders 3.2 Unit root testing section 2 .pptx
ders 3.2 Unit root testing section 2 .pptx
 
Assessing Model Performance - Beginner's Guide
Assessing Model Performance - Beginner's GuideAssessing Model Performance - Beginner's Guide
Assessing Model Performance - Beginner's Guide
 
Ali, Redescending M-estimator
Ali, Redescending M-estimator Ali, Redescending M-estimator
Ali, Redescending M-estimator
 

More from Arthur Charpentier (20)

Family History and Life Insurance
Family History and Life InsuranceFamily History and Life Insurance
Family History and Life Insurance
 
ACT6100 introduction
ACT6100 introductionACT6100 introduction
ACT6100 introduction
 
Family History and Life Insurance (UConn actuarial seminar)
Family History and Life Insurance (UConn actuarial seminar)Family History and Life Insurance (UConn actuarial seminar)
Family History and Life Insurance (UConn actuarial seminar)
 
Control epidemics
Control epidemics Control epidemics
Control epidemics
 
STT5100 Automne 2020, introduction
STT5100 Automne 2020, introductionSTT5100 Automne 2020, introduction
STT5100 Automne 2020, introduction
 
Family History and Life Insurance
Family History and Life InsuranceFamily History and Life Insurance
Family History and Life Insurance
 
Reinforcement Learning in Economics and Finance
Reinforcement Learning in Economics and FinanceReinforcement Learning in Economics and Finance
Reinforcement Learning in Economics and Finance
 
Optimal Control and COVID-19
Optimal Control and COVID-19Optimal Control and COVID-19
Optimal Control and COVID-19
 
Slides OICA 2020
Slides OICA 2020Slides OICA 2020
Slides OICA 2020
 
Lausanne 2019 #3
Lausanne 2019 #3Lausanne 2019 #3
Lausanne 2019 #3
 
Lausanne 2019 #2
Lausanne 2019 #2Lausanne 2019 #2
Lausanne 2019 #2
 
Lausanne 2019 #1
Lausanne 2019 #1Lausanne 2019 #1
Lausanne 2019 #1
 
Side 2019 #10
Side 2019 #10Side 2019 #10
Side 2019 #10
 
Side 2019 #11
Side 2019 #11Side 2019 #11
Side 2019 #11
 
Side 2019 #12
Side 2019 #12Side 2019 #12
Side 2019 #12
 
Side 2019 #9
Side 2019 #9Side 2019 #9
Side 2019 #9
 
Side 2019 #8
Side 2019 #8Side 2019 #8
Side 2019 #8
 
Side 2019 #6
Side 2019 #6Side 2019 #6
Side 2019 #6
 
Side 2019 #3
Side 2019 #3Side 2019 #3
Side 2019 #3
 
Mutualisation et Segmentation
Mutualisation et SegmentationMutualisation et Segmentation
Mutualisation et Segmentation
 

Recently uploaded

Log your LOA pain with Pension Lab's brilliant campaign
Log your LOA pain with Pension Lab's brilliant campaignLog your LOA pain with Pension Lab's brilliant campaign
Log your LOA pain with Pension Lab's brilliant campaignHenry Tapper
 
Dharavi Russian callg Girls, { 09892124323 } || Call Girl In Mumbai ...
Dharavi Russian callg Girls, { 09892124323 } || Call Girl In Mumbai ...Dharavi Russian callg Girls, { 09892124323 } || Call Girl In Mumbai ...
Dharavi Russian callg Girls, { 09892124323 } || Call Girl In Mumbai ...Pooja Nehwal
 
05_Annelore Lenoir_Docbyte_MeetupDora&Cybersecurity.pptx
05_Annelore Lenoir_Docbyte_MeetupDora&Cybersecurity.pptx05_Annelore Lenoir_Docbyte_MeetupDora&Cybersecurity.pptx
05_Annelore Lenoir_Docbyte_MeetupDora&Cybersecurity.pptxFinTech Belgium
 
VIP Call Girls LB Nagar ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With Room...
VIP Call Girls LB Nagar ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With Room...VIP Call Girls LB Nagar ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With Room...
VIP Call Girls LB Nagar ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With Room...Suhani Kapoor
 
Call US 📞 9892124323 ✅ Kurla Call Girls In Kurla ( Mumbai ) secure service
Call US 📞 9892124323 ✅ Kurla Call Girls In Kurla ( Mumbai ) secure serviceCall US 📞 9892124323 ✅ Kurla Call Girls In Kurla ( Mumbai ) secure service
Call US 📞 9892124323 ✅ Kurla Call Girls In Kurla ( Mumbai ) secure servicePooja Nehwal
 
VIP Kolkata Call Girl Serampore 👉 8250192130 Available With Room
VIP Kolkata Call Girl Serampore 👉 8250192130  Available With RoomVIP Kolkata Call Girl Serampore 👉 8250192130  Available With Room
VIP Kolkata Call Girl Serampore 👉 8250192130 Available With Roomdivyansh0kumar0
 
The Economic History of the U.S. Lecture 17.pdf
The Economic History of the U.S. Lecture 17.pdfThe Economic History of the U.S. Lecture 17.pdf
The Economic History of the U.S. Lecture 17.pdfGale Pooley
 
Malad Call Girl in Services 9892124323 | ₹,4500 With Room Free Delivery
Malad Call Girl in Services  9892124323 | ₹,4500 With Room Free DeliveryMalad Call Girl in Services  9892124323 | ₹,4500 With Room Free Delivery
Malad Call Girl in Services 9892124323 | ₹,4500 With Room Free DeliveryPooja Nehwal
 
Instant Issue Debit Cards - School Designs
Instant Issue Debit Cards - School DesignsInstant Issue Debit Cards - School Designs
Instant Issue Debit Cards - School Designsegoetzinger
 
Dividend Policy and Dividend Decision Theories.pptx
Dividend Policy and Dividend Decision Theories.pptxDividend Policy and Dividend Decision Theories.pptx
Dividend Policy and Dividend Decision Theories.pptxanshikagoel52
 
The Economic History of the U.S. Lecture 22.pdf
The Economic History of the U.S. Lecture 22.pdfThe Economic History of the U.S. Lecture 22.pdf
The Economic History of the U.S. Lecture 22.pdfGale Pooley
 
VIP Kolkata Call Girl Jodhpur Park 👉 8250192130 Available With Room
VIP Kolkata Call Girl Jodhpur Park 👉 8250192130  Available With RoomVIP Kolkata Call Girl Jodhpur Park 👉 8250192130  Available With Room
VIP Kolkata Call Girl Jodhpur Park 👉 8250192130 Available With Roomdivyansh0kumar0
 
06_Joeri Van Speybroek_Dell_MeetupDora&Cybersecurity.pdf
06_Joeri Van Speybroek_Dell_MeetupDora&Cybersecurity.pdf06_Joeri Van Speybroek_Dell_MeetupDora&Cybersecurity.pdf
06_Joeri Van Speybroek_Dell_MeetupDora&Cybersecurity.pdfFinTech Belgium
 
Best VIP Call Girls Noida Sector 18 Call Me: 8448380779
Best VIP Call Girls Noida Sector 18 Call Me: 8448380779Best VIP Call Girls Noida Sector 18 Call Me: 8448380779
Best VIP Call Girls Noida Sector 18 Call Me: 8448380779Delhi Call girls
 
Russian Call Girls In Gtb Nagar (Delhi) 9711199012 💋✔💕😘 Naughty Call Girls Se...
Russian Call Girls In Gtb Nagar (Delhi) 9711199012 💋✔💕😘 Naughty Call Girls Se...Russian Call Girls In Gtb Nagar (Delhi) 9711199012 💋✔💕😘 Naughty Call Girls Se...
Russian Call Girls In Gtb Nagar (Delhi) 9711199012 💋✔💕😘 Naughty Call Girls Se...shivangimorya083
 
Independent Lucknow Call Girls 8923113531WhatsApp Lucknow Call Girls make you...
Independent Lucknow Call Girls 8923113531WhatsApp Lucknow Call Girls make you...Independent Lucknow Call Girls 8923113531WhatsApp Lucknow Call Girls make you...
Independent Lucknow Call Girls 8923113531WhatsApp Lucknow Call Girls make you...makika9823
 
Independent Call Girl Number in Kurla Mumbai📲 Pooja Nehwal 9892124323 💞 Full ...
Independent Call Girl Number in Kurla Mumbai📲 Pooja Nehwal 9892124323 💞 Full ...Independent Call Girl Number in Kurla Mumbai📲 Pooja Nehwal 9892124323 💞 Full ...
Independent Call Girl Number in Kurla Mumbai📲 Pooja Nehwal 9892124323 💞 Full ...Pooja Nehwal
 
The Economic History of the U.S. Lecture 21.pdf
The Economic History of the U.S. Lecture 21.pdfThe Economic History of the U.S. Lecture 21.pdf
The Economic History of the U.S. Lecture 21.pdfGale Pooley
 
Bladex Earnings Call Presentation 1Q2024
Bladex Earnings Call Presentation 1Q2024Bladex Earnings Call Presentation 1Q2024
Bladex Earnings Call Presentation 1Q2024Bladex
 
The Economic History of the U.S. Lecture 30.pdf
The Economic History of the U.S. Lecture 30.pdfThe Economic History of the U.S. Lecture 30.pdf
The Economic History of the U.S. Lecture 30.pdfGale Pooley
 

Recently uploaded (20)

Log your LOA pain with Pension Lab's brilliant campaign
Log your LOA pain with Pension Lab's brilliant campaignLog your LOA pain with Pension Lab's brilliant campaign
Log your LOA pain with Pension Lab's brilliant campaign
 
Dharavi Russian callg Girls, { 09892124323 } || Call Girl In Mumbai ...
Dharavi Russian callg Girls, { 09892124323 } || Call Girl In Mumbai ...Dharavi Russian callg Girls, { 09892124323 } || Call Girl In Mumbai ...
Dharavi Russian callg Girls, { 09892124323 } || Call Girl In Mumbai ...
 
05_Annelore Lenoir_Docbyte_MeetupDora&Cybersecurity.pptx
05_Annelore Lenoir_Docbyte_MeetupDora&Cybersecurity.pptx05_Annelore Lenoir_Docbyte_MeetupDora&Cybersecurity.pptx
05_Annelore Lenoir_Docbyte_MeetupDora&Cybersecurity.pptx
 
VIP Call Girls LB Nagar ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With Room...
VIP Call Girls LB Nagar ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With Room...VIP Call Girls LB Nagar ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With Room...
VIP Call Girls LB Nagar ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With Room...
 
Call US 📞 9892124323 ✅ Kurla Call Girls In Kurla ( Mumbai ) secure service
Call US 📞 9892124323 ✅ Kurla Call Girls In Kurla ( Mumbai ) secure serviceCall US 📞 9892124323 ✅ Kurla Call Girls In Kurla ( Mumbai ) secure service
Call US 📞 9892124323 ✅ Kurla Call Girls In Kurla ( Mumbai ) secure service
 
VIP Kolkata Call Girl Serampore 👉 8250192130 Available With Room
VIP Kolkata Call Girl Serampore 👉 8250192130  Available With RoomVIP Kolkata Call Girl Serampore 👉 8250192130  Available With Room
VIP Kolkata Call Girl Serampore 👉 8250192130 Available With Room
 
The Economic History of the U.S. Lecture 17.pdf
The Economic History of the U.S. Lecture 17.pdfThe Economic History of the U.S. Lecture 17.pdf
The Economic History of the U.S. Lecture 17.pdf
 
Malad Call Girl in Services 9892124323 | ₹,4500 With Room Free Delivery
Malad Call Girl in Services  9892124323 | ₹,4500 With Room Free DeliveryMalad Call Girl in Services  9892124323 | ₹,4500 With Room Free Delivery
Malad Call Girl in Services 9892124323 | ₹,4500 With Room Free Delivery
 
Instant Issue Debit Cards - School Designs
Instant Issue Debit Cards - School DesignsInstant Issue Debit Cards - School Designs
Instant Issue Debit Cards - School Designs
 
Dividend Policy and Dividend Decision Theories.pptx
Dividend Policy and Dividend Decision Theories.pptxDividend Policy and Dividend Decision Theories.pptx
Dividend Policy and Dividend Decision Theories.pptx
 
The Economic History of the U.S. Lecture 22.pdf
The Economic History of the U.S. Lecture 22.pdfThe Economic History of the U.S. Lecture 22.pdf
The Economic History of the U.S. Lecture 22.pdf
 
VIP Kolkata Call Girl Jodhpur Park 👉 8250192130 Available With Room
VIP Kolkata Call Girl Jodhpur Park 👉 8250192130  Available With RoomVIP Kolkata Call Girl Jodhpur Park 👉 8250192130  Available With Room
VIP Kolkata Call Girl Jodhpur Park 👉 8250192130 Available With Room
 
06_Joeri Van Speybroek_Dell_MeetupDora&Cybersecurity.pdf
06_Joeri Van Speybroek_Dell_MeetupDora&Cybersecurity.pdf06_Joeri Van Speybroek_Dell_MeetupDora&Cybersecurity.pdf
06_Joeri Van Speybroek_Dell_MeetupDora&Cybersecurity.pdf
 
Best VIP Call Girls Noida Sector 18 Call Me: 8448380779
Best VIP Call Girls Noida Sector 18 Call Me: 8448380779Best VIP Call Girls Noida Sector 18 Call Me: 8448380779
Best VIP Call Girls Noida Sector 18 Call Me: 8448380779
 
Russian Call Girls In Gtb Nagar (Delhi) 9711199012 💋✔💕😘 Naughty Call Girls Se...
Russian Call Girls In Gtb Nagar (Delhi) 9711199012 💋✔💕😘 Naughty Call Girls Se...Russian Call Girls In Gtb Nagar (Delhi) 9711199012 💋✔💕😘 Naughty Call Girls Se...
Russian Call Girls In Gtb Nagar (Delhi) 9711199012 💋✔💕😘 Naughty Call Girls Se...
 
Independent Lucknow Call Girls 8923113531WhatsApp Lucknow Call Girls make you...
Independent Lucknow Call Girls 8923113531WhatsApp Lucknow Call Girls make you...Independent Lucknow Call Girls 8923113531WhatsApp Lucknow Call Girls make you...
Independent Lucknow Call Girls 8923113531WhatsApp Lucknow Call Girls make you...
 
Independent Call Girl Number in Kurla Mumbai📲 Pooja Nehwal 9892124323 💞 Full ...
Independent Call Girl Number in Kurla Mumbai📲 Pooja Nehwal 9892124323 💞 Full ...Independent Call Girl Number in Kurla Mumbai📲 Pooja Nehwal 9892124323 💞 Full ...
Independent Call Girl Number in Kurla Mumbai📲 Pooja Nehwal 9892124323 💞 Full ...
 
The Economic History of the U.S. Lecture 21.pdf
The Economic History of the U.S. Lecture 21.pdfThe Economic History of the U.S. Lecture 21.pdf
The Economic History of the U.S. Lecture 21.pdf
 
Bladex Earnings Call Presentation 1Q2024
Bladex Earnings Call Presentation 1Q2024Bladex Earnings Call Presentation 1Q2024
Bladex Earnings Call Presentation 1Q2024
 
The Economic History of the U.S. Lecture 30.pdf
The Economic History of the U.S. Lecture 30.pdfThe Economic History of the U.S. Lecture 30.pdf
The Economic History of the U.S. Lecture 30.pdf
 

Pareto Models, Slides EQUINEQ

  • 1. Pareto Models for Top Incomes Arthur Charpentier UQAM and Emmanuel Flachaire Aix-Marseille Universit´e
  • 2. What model should be fitted to top incomes? With heavy-tailed distributions, everybody will agree to fit a Pareto distribution in the upper tail However, do people have the same distribution in mind? For economists: Pareto in the tail = Pareto I For statisticians: Pareto in the tail = GPD, or Pareto II In this paper: Key issue: threshold selection, with no good solution Our approach: sensitivity to the threshold We show that EPD GPD Pareto I We discuss di↵erent types of bias encountered in practice
  • 3. Overview Strict Pareto Models Pareto I and GPD distributions Threshold sensitivity Pareto-type Models First- and second-order regular variation Extended Pareto distribution From Theory to Practice Misspecification bias Estimation bias Sampling bias Applications Income distribution in South-Africa in 2012 Wealth distribution in U.S.A. in 2013
  • 5. Pareto I and GPD Pareto Type I distribution, bounded from below by > 0, F(x) = 1 ⇣x u ⌘ ↵ , for x u (1) Generalized Pareto distribution, bounded from below by u > 0, F(x) = 1  1 + ✓ x u ◆ ↵ , for x u (2) Extreme value theory: Pickands-Balkema-de Haan theorem.1 Fu(x) ! GPD (or Pareto II) as u ! +1 Pareto I is a special case of GPD, when = u 1 Fu(x) is the conditional (excess) distribution fct of X above a threshold u.
  • 6. Pareto I and GPD in the tail: Is it really di↵erent? “The Pareto I is as good as the Pareto II only at extremely high incomes, beyond the range of thresholds usually considered” (Jenkins 2017) If the distribution is GPD for a fixed threshold u, it is also GPD for a higher threshold u + : if X ⇠ GPD(u, , ↵) for X > u then X ⇠ GPD(u + , + , ↵) for X > u + GPD ⇡ Pareto I when u + ⇡ + (3) 1 when u = , (3) is true 8 0 2 when u 6= , (3) is true for very large value of only. GPD above a threshold ⇡ Pareto I above a higher threshold, much higher as u 6=
  • 7. Pareto I and GPD: Threshold sensitivity when u 6= Pareto I - GPD Tail parameter estimation as the threshold increases: 1000 samples of 1000 observations drawn from GPD(u = 0.5, = 1.5, ↵ = 2)
  • 9. Pareto-type in the tail, rather than strictly Pareto If the threshold is not extremely high, the tail may not be strictly Pareto. We will assume it is Pareto-type. Most heavy-tailed distributions are regularly varying: 1 F(x) = x ↵ L(x) L(x) captures deviations from the strict Pareto model:2 if L(x) ! quickly cst : ”quickly” Pareto in the tail3 if L(x) ! slowly cst : ”slowly” Pareto in the tail The optimal choice of u depends on the rate of cv. of L(x) 2 It is a slowly varying function (at infinity): lim L(tx)/L(x) = 1 as x ! 1. 3 L(x) can also converge to infinity (ex: L(x) = log(x)).
  • 10. Extended Pareto distribution (EPD). Based on an approximation of a class of Pareto-type distributions (the Hall class, which includes the Singh-Maddala, Student, Fréchet and Cauchy distributions), Beirlant et al. (2009) proposed an Extended Pareto distribution: $F(x) = 1 - \left[\frac{x}{u}\left(1 + \delta - \delta\left(\frac{x}{u}\right)^{\tau}\right)\right]^{-\alpha}$, for $x \ge u$ (4). It is a more flexible distribution: $\text{EPD}(u, 0, \tau, \alpha) = \text{Pareto I}(u, \alpha)$ and $\text{EPD}(u, \delta, -1, \alpha) = \text{GPD}(u, u/(1 + \delta), \alpha)$; it also nests the Pareto Type III distribution, when $\alpha = 1$. The mean over a threshold has no closed form, so numerical methods are needed. The EPD can capture the second-order regular variation.
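A minimal base-R sketch of the EPD survival function $1 - F(x) = [(x/u)(1 + \delta - \delta (x/u)^{\tau})]^{-\alpha}$, with hypothetical helper names and illustrative parameters. It checks the two nesting relations ($\delta = 0$ gives Pareto I; $\tau = -1$ gives a GPD with scale $u/(1 + \delta)$) and computes the mean above $u$ by numerical integration, using $E[X \mid X > u] = u + \int_u^\infty (1 - F(x))\,dx$.

```r
# Survival functions 1 - F(x), for x >= u
surv_pareto1 <- function(x, u, alpha) (x / u)^(-alpha)
surv_gpd <- function(x, u, sigma, alpha) (1 + (x - u) / sigma)^(-alpha)
surv_epd <- function(x, u, delta, tau, alpha) {
  ((x / u) * (1 + delta - delta * (x / u)^tau))^(-alpha)
}

u <- 1; alpha <- 2.5; delta <- 0.4   # illustrative values
x <- seq(u, 50, length.out = 500)

# delta = 0: Pareto I, whatever tau
nest_p1 <- isTRUE(all.equal(surv_epd(x, u, 0, -0.7, alpha), surv_pareto1(x, u, alpha)))
# tau = -1: GPD with sigma = u / (1 + delta)
nest_gpd <- isTRUE(all.equal(surv_epd(x, u, delta, -1, alpha),
                             surv_gpd(x, u, u / (1 + delta), alpha)))

# Mean above u by numerical integration of the survival function
mean_above <- function(u, delta, tau, alpha) {
  f <- function(x) surv_epd(x, u, delta, tau, alpha)
  u + integrate(f, u, Inf, rel.tol = 1e-8)$value
}
m_p1  <- mean_above(u, 0, -0.7, alpha)      # Pareto I case: equals u * alpha / (alpha - 1)
m_epd <- mean_above(u, delta, -0.7, alpha)  # general EPD case: numerical value only
print(c(nest_p1 = nest_p1, nest_gpd = nest_gpd))
print(c(m_p1 = m_p1, m_epd = m_epd))
```

The Pareto I case serves as a check of the integration, since its mean above $u$ is known in closed form.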
  • 11. Sensitivity to the threshold in a large sample, n = 50,000. Figure: Boxplots of maximum likelihood estimators of the tail index $\hat\alpha$: 1,000 samples of 50,000 observations drawn from a Singh-Maddala distribution, SM(2.07, 1.14, 1.75). From left to right, the x-axis gives the threshold (percentile) used to fit the Pareto I (blue), GPD (green) and EPD (red) models.
  • 12. Sensitivity to the threshold in a huge sample, n = 1 million. Figure: Boxplots of maximum likelihood estimators of the tail index $\hat\alpha$: 1,000 samples of 1,000,000 observations drawn from a Singh-Maddala distribution, SM(2.07, 1.14, 1.75). From left to right, the x-axis gives the threshold (percentile) used to fit the Pareto I (blue), GPD (green) and EPD (red) models.
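A minimal base-R sketch of one replication of this design for the Pareto I model only: a single sample of 50,000 Singh-Maddala draws, fitted above several percentiles. Two assumptions should be flagged. First, the three numbers of SM(2.07, 1.14, 1.75) are read here as $(a, b, q)$ in $F(x) = 1 - (1 + (x/b)^a)^{-q}$, so that the true tail index is $aq$; this ordering is our assumption. Second, `rsm` and `fit_pareto1` are hypothetical helper names.

```r
# Inverse-CDF sampler for Singh-Maddala: F(x) = 1 - (1 + (x/b)^a)^(-q)
rsm <- function(n, a, b, q) b * (runif(n)^(-1 / q) - 1)^(1 / a)

# Pareto I MLE above threshold t
fit_pareto1 <- function(x, t) {
  y <- x[x > t]
  length(y) / sum(log(y / t))
}

a <- 2.07; b <- 1.14; q <- 1.75   # assumed (a, b, q) ordering
set.seed(11)
x <- rsm(50000, a, b, q)
probs <- c(0.50, 0.90, 0.95, 0.99)
alpha_hat <- sapply(quantile(x, probs, names = FALSE), function(t) fit_pareto1(x, t))
print(round(rbind(percentile = probs, alpha_hat = alpha_hat), 3))
cat("true tail index a * q =", a * q, "\n")
```

Because the Singh-Maddala tail is only Pareto-type, the Pareto I estimate rises with the threshold and stays below $aq$; a larger sample reduces the spread of the estimates but not this gap.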
  • 13. From Theory to Practice
  • 14. Pareto diagram. For a Pareto Type I distribution, we have $\log(1 - F(x)) = c - \alpha \log x$, with $c = \alpha \log u$ (5). The Pareto diagram is the plot of the survival function against incomes, both in logs: $\{\log x, \log(1 - F(x))\}$ (6). It shows the percentage of the population with income $x$ or more against $x$, in logs. If $F$ is strictly Pareto, the Pareto diagram is a linear function whose slope is given by the tail index: $-\alpha$. If $F$ is Pareto in the tail, the Pareto diagram is ultimately linear.
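A minimal base-R sketch of the points of a Pareto diagram, allowing for sample weights: the survival at each sorted observation is the share of total weight at or above it. `pareto_diagram_points` is a hypothetical name (not the `Pareto_diagram` function of the authors' package). On a simulated strict Pareto I sample, an OLS line through the diagram has a slope close to $-\alpha$.

```r
# Points {log x, log(share of weight with income >= x)} of a (weighted) Pareto diagram
pareto_diagram_points <- function(x, w = rep(1, length(x))) {
  o <- order(x)
  x <- x[o]; w <- w[o]
  surv <- rev(cumsum(rev(w))) / sum(w)   # weighted share at or above each x
  data.frame(log_x = log(x), log_surv = log(surv))
}

set.seed(123)
alpha <- 2; u <- 1
x <- u * runif(10000)^(-1 / alpha)       # strict Pareto I sample (inverse CDF)
pd <- pareto_diagram_points(x)
slope <- unname(coef(lm(log_surv ~ log_x, data = pd))[2])
print(slope)                             # close to -alpha
# plot(pd$log_x, pd$log_surv) draws the diagram itself
```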
  • 15. Misspecification bias. Figure: Pareto diagram based on the true CDF (red), with two linear approximations fitted on $\log x \ge 2$ and on $\log x \ge 1$. Pareto diagrams based on the CDF are (ultimately) concave. A threshold that is too low leads to underestimating $\alpha$, and hence to overestimating inequality.
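Because this bias comes from the true CDF, it can be reproduced without any sampling noise. A minimal base-R sketch, assuming a hypothetical Singh-Maddala tail $\log(1 - F(x)) = -q \log(1 + (x/b)^a)$ with $a = 1$, $b = 1$, $q = 3$ (true $\alpha = 3$, slow convergence). It fits a straight line to the true Pareto diagram over $\log x \in [1, 8]$ and over $\log x \in [2, 8]$; `chord_alpha` is a hypothetical helper name.

```r
a <- 1; b <- 1; q <- 3                       # true tail index alpha = a * q = 3
log_surv <- function(lx) -q * log1p((exp(lx) / b)^a)

# Tail index implied by an OLS line through the true Pareto diagram on [lo, hi]
chord_alpha <- function(lo, hi = 8) {
  lx <- seq(lo, hi, length.out = 1000)
  -unname(coef(lm(log_surv(lx) ~ lx))[2])
}
alpha_fit <- c(from_log_x_1 = chord_alpha(1), from_log_x_2 = chord_alpha(2))
print(round(alpha_fit, 3))   # both below 3; the lower threshold gives the smaller value
```

Since no data are involved, increasing the sample size cannot remove this bias; only raising the threshold reduces it.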
  • 16. Estimation bias. Figure: Pareto diagram based on a sample (black), with an artificial increase of the largest observations (blue) and with topcoding (green). In practice, the CDF is unknown and is replaced by the EDF, which behaves erratically on the right: the EDF fits the upper tail poorly. The presence of outliers leads to overestimating inequality; topcoding and underreporting lead to underestimating inequality.
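A minimal base-R sketch of the two distortions in the figure, applied to one simulated Pareto I sample with the threshold held fixed at the top 5%. The sizes of the distortions (five largest values multiplied by 10, topcoding at the 99th percentile) are illustrative choices, and `fit_pareto1` is a hypothetical helper name. Since the Pareto I MLE is $n_t / \sum \log(x_i/t)$, inflating the largest values lowers $\hat\alpha$, while capping them raises it.

```r
fit_pareto1 <- function(x, t) {          # Pareto I MLE above threshold t
  y <- x[x > t]
  length(y) / sum(log(y / t))
}

set.seed(42)
x <- runif(20000)^(-1 / 2.5)             # Pareto I(u = 1, alpha = 2.5)
t <- quantile(x, 0.95, names = FALSE)    # same threshold in all scenarios

x_out <- x
top5 <- order(x, decreasing = TRUE)[1:5]
x_out[top5] <- 10 * x[top5]              # artificial increase of the 5 largest values

cap <- quantile(x, 0.99, names = FALSE)
x_tc <- pmin(x, cap)                     # topcoding at the 99th percentile

est <- c(sample = fit_pareto1(x, t), outliers = fit_pareto1(x_out, t),
         topcoded = fit_pareto1(x_tc, t))
print(round(est, 3))
```

A lower $\hat\alpha$ means a heavier tail and more measured inequality, so outliers push inequality up and topcoding pushes it down.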
  • 17. Sampling bias. Surveys do not capture the upper tail well, because the rich are harder to reach or more likely to refuse to participate. When some members of the population are more or less likely than others to be included in the survey, the result is sampling bias. To correct it, data producers provide sample weights. Pareto diagrams and ML estimation of GPD and EPD models with weights are not provided by standard software, so we developed R functions, which we make available on GitHub: https://github.com/freakonometrics/TopIncomes
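A minimal base-R sketch of why weights matter for the tail index, using the weighted Pareto I MLE $\hat\alpha = \sum w_i / \sum w_i \log(x_i/t)$. The response mechanism is hypothetical: the inclusion probability $x^{-1/2}$ falls with income, and weights are inverse inclusion probabilities. `fit_pareto1_w` is a hypothetical helper name, not a function of the TopIncomes package.

```r
# Weighted Pareto I MLE above threshold t
fit_pareto1_w <- function(x, w, t) {
  keep <- x > t
  sum(w[keep]) / sum(w[keep] * log(x[keep] / t))
}

set.seed(7)
N <- 200000; alpha <- 2
pop <- runif(N)^(-1 / alpha)             # population: Pareto I(u = 1, alpha = 2)
p_incl <- pop^(-0.5)                     # hypothetical response rate, falling with income
in_sample <- runif(N) < p_incl
x <- pop[in_sample]
w <- 1 / p_incl[in_sample]               # sample weights = inverse inclusion probabilities

est <- c(unweighted = fit_pareto1_w(x, rep(1, length(x)), 1),
         weighted = fit_pareto1_w(x, w, 1))
print(round(est, 3))
```

Under this mechanism the unweighted sample is itself Pareto with index 2.5, so ignoring the weights overstates $\alpha$; the weighted estimate recovers a value close to 2.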
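  • 18. Inequality measures. We use the EDF in the bottom $(1-q)100\%$ and a Pareto model in the top $q100\%$. With a GPD, the top $p100\%$ income share is
$$TS^{(GPD)}_{p,q} = \begin{cases} \dfrac{p\left[\frac{\alpha}{\alpha-1}\,\sigma\,(p/q)^{-1/\alpha} + u - \sigma\right]}{(1-q)\,\bar{x}_q + q\left[u + \sigma/(\alpha-1)\right]} & \text{if } p \le q \\[2ex] 1 - \dfrac{(1-p)\,\bar{x}_p}{(1-q)\,\bar{x}_q + q\left[u + \sigma/(\alpha-1)\right]} & \text{if } p > q \end{cases} \qquad (7)$$
where $\bar{x}_q$ ($\bar{x}_p$) is the weighted mean of the $(1-q)100\%$ ($(1-p)100\%$) smallest ordered observations. With an EPD, it is
$$TS^{(EPD)}_{p,q} = \begin{cases} \dfrac{p\,u' + q\,E_{u'}}{(1-q)\,\bar{x}_q + q\,(u + E_u)} & \text{if } p \le q \\[2ex] 1 - \dfrac{(1-p)\,\bar{x}_p}{(1-q)\,\bar{x}_q + q\,(u + E_u)} & \text{if } p > q \end{cases} \qquad (8)$$
where $u'$ is the threshold above which lie the top $p100\%$, and $E_u$ and $E_{u'}$ are obtained by numerical integration.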
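A minimal base-R sketch of the case $p \le q$, with hypothetical helper names and illustrative inputs (in particular, `xbar_q` is a made-up mean of the bottom 90%, not data). It evaluates the GPD closed form (7) and cross-checks it against a numerical route of the kind (8) requires. That route rests on our reading of (8): $u'$ solves $1 - F(u') = p/q$ for the distribution fitted above $u$, and $E_t = \int_t^\infty (1 - F(x))\,dx$, so that the income of the top $p$ equals $p\,u' + q\,E_{u'}$ in per-capita terms. Plugging in the GPD survival function should reproduce (7).

```r
# Closed form (7), case p <= q
top_share_gpd <- function(p, q, u, sigma, alpha, xbar_q) {
  stopifnot(p <= q, alpha > 1)
  denom <- (1 - q) * xbar_q + q * (u + sigma / (alpha - 1))
  p * (alpha / (alpha - 1) * sigma * (p / q)^(-1 / alpha) + u - sigma) / denom
}

# Numerical analogue of (8) for any survival function `surv` above u (surv(u) = 1)
top_share_numeric <- function(p, q, u, surv, xbar_q) {
  stopifnot(p <= q)
  E <- function(t) integrate(surv, t, Inf, rel.tol = 1e-8)$value   # E_t
  u_p <- uniroot(function(t) surv(t) - p / q, c(u, 1e6), tol = 1e-12)$root
  (p * u_p + q * E(u_p)) / ((1 - q) * xbar_q + q * (u + E(u)))
}

u <- 1; sigma <- 0.8; alpha <- 2.5; q <- 0.10; p <- 0.01
xbar_q <- 0.35                                # illustrative bottom-90% mean
s_gpd <- function(x) (1 + (x - u) / sigma)^(-alpha)
closed <- top_share_gpd(p, q, u, sigma, alpha, xbar_q)
num <- top_share_numeric(p, q, u, s_gpd, xbar_q)
print(c(closed = closed, numeric = num))
```

Swapping `s_gpd` for an EPD survival function gives the EPD share, which has no closed form.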
  • 20. Incomes in South Africa, 2012. The following table shows the tail index $\hat\alpha$ and the top 1% share for three thresholds, q90, q95 and q99, that is, when the Pareto distribution is fitted on the top 10%, 5% and 1% of observations, respectively.

| Model | $\hat\alpha$ (q90) | $\hat\alpha$ (q95) | $\hat\alpha$ (q99) | Top 1% (q90) | Top 1% (q95) | Top 1% (q99) |
|---|---|---|---|---|---|---|
| Pareto I | 1.742 | 1.881 | 2.492 | 0.192 | 0.171 | 0.146 |
| GPD | 2.689 | 2.935 | 19.249 | 0.142 | 0.141 | 0.139 |
| EPD | 2.236 | 2.255 | 4.198 | 0.148 | 0.149 | 0.139 |

For a given threshold, Pareto I gives a heavier tail and more inequality; EPD and GPD are more stable.
  • 21. Incomes in South Africa, 2012: Tail index estimates. Figure: Incomes in South Africa 2012: plot of MLE estimates of the tail index $\alpha$ of the Pareto I (blue, Hill estimator), GPD (green) and EPD (red) models, as a function of the number $k$ of largest observations used; the thresholds q90, q95 and q99 are marked.
  • 22. Incomes in South Africa, 2012: Top 1% shares. Figure: Incomes in South Africa 2012: top 1% share as a function of the number $k$ of largest observations used to fit the Pareto I, GPD and EPD models; the thresholds q90, q95 and q99 are marked.
  • 23. Incomes in South Africa, 2012: Pareto diagram. Figure: Incomes in South Africa 2012: Pareto diagram ($\log(1 - F(x))$ against $\log x$, thresholds q90, q95 and q99 marked), with Pareto I (blue) and EPD (red) models fitted on the top 10% incomes.
  • 24. R code, https://github.com/freakonometrics/TopIncomes

```r
library(TopIncomes)
df <- read.table("dataset.txt", header = TRUE)   # df$y: incomes, df$w: sample weights
cutoffs <- seq(0.20, 0.01, by = -0.001)           # q from the top 20% down to the top 1%
r1 <- Top_Share(df$y, df$w, p = .01, q = cutoffs, method = "pareto1")
r2 <- Top_Share(df$y, df$w, p = .01, q = cutoffs, method = "gpd")
r3 <- Top_Share(df$y, df$w, p = .01, q = cutoffs, method = "epd")

# Figure of tail index
plot(r1$k, r1$alpha, col = "blue", main = "Tail index", type = "l")
lines(r2$k, r2$alpha, col = "green")
lines(r3$k, r3$alpha, col = "red")

# Figure of top share
plot(r1$k, r1$index, col = "blue", main = "Top share", type = "l")
lines(r2$k, r2$index, col = "green")
lines(r3$k, r3$index, col = "red")

# Pareto diagram
Pareto_diagram(data = df$y, weights = df$w)
```
  • 25. Conclusion. To date, researchers have invariably used the Pareto I model, which can lead to severe biases in $\hat\alpha$ and, thus, in inequality measures. Under-estimation of $\alpha$ (over-estimation of inequality) comes from a threshold that is too low (misspecification bias, which does not decrease as $n$ increases) and from the presence of outliers in the sample (estimation bias). Over-estimation of $\alpha$ (under-estimation of inequality) comes from topcoding, censoring and underreporting (estimation bias). The Extended Pareto distribution can help to reduce these biases. Pareto diagrams and plots of tail index estimates are useful to check the results; they should be used more often in practice.
  • 26. Beliefs on surveys. It is widely believed that $\hat\alpha$ is upward-biased in surveys. In a seminal paper, Atkinson et al. (2011) write: "The Pareto parameter is estimated using the ratio of the top 5 percent income share to the top decile income share (...). Because those top income shares are often based on survey data (and not tax data), they likely underestimate the magnitude of the changes at the very top." This is true if $F$ is Pareto I above the top decile and if there are no outliers. Otherwise, $\hat\alpha$ might as well be biased downward: the estimation of the tail index of the Pareto I model on surveys is not necessarily upward-biased; it can also be downward-biased.
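Under the premise stated on this slide (strict Pareto I above the top decile), the top $p$ income share is proportional to $p^{1 - 1/\alpha}$, so the ratio of the top 5% share to the top 10% share determines $\alpha$ through $\alpha = 1/(1 - \log_2(s_{10}/s_5))$. A minimal base-R sketch of that inversion follows. The helper names are hypothetical, and this is one way to read the construction quoted above, not necessarily Atkinson et al.'s exact formula. The sketch then applies the same inversion when the tail above the top decile is instead a GPD with $\sigma \neq u$, using illustrative parameters and the numerator of (7) (the common denominator cancels in the ratio).

```r
alpha_from_shares <- function(s5, s10) 1 / (1 - log2(s10 / s5))

# Strict Pareto I: top-p income proportional to p^(1 - 1/alpha)
alpha <- 2.5
pareto1_top <- function(p) p^(1 - 1 / alpha)
a_p1 <- alpha_from_shares(pareto1_top(0.05), pareto1_top(0.10))

# GPD above the top decile with sigma != u (illustrative values)
u <- 1; sigma <- 3; q <- 0.10
gpd_top <- function(p) p * (alpha / (alpha - 1) * sigma * (p / q)^(-1 / alpha) + u - sigma)
a_gpd <- alpha_from_shares(gpd_top(0.05), gpd_top(0.10))
print(c(pareto1 = a_p1, gpd = a_gpd))
```

The first value recovers $\alpha = 2.5$; the second, computed from exact shares with no sampling error at all, falls well below 2.5. This is a downward bias of the kind this slide describes.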
  • 27. Beliefs on tax data. It is widely believed that $\hat\alpha$ is much more reliable in tax data. There is no estimation bias (no topcoding, no underreporting), but tax data remain sensitive to misspecification bias: a threshold that is too low may lead to severe under-estimation of the tail index (over-estimation of inequality), even with millions of observations. Jenkins (2017): a threshold higher than the 99.5th percentile (for the Pareto I model fitted on U.K. tax data for several years, 1995-2010). This suggests that fitting a Pareto model on tax data should be done with caution.