Executive summary
The central aim of this paper is to understand which factors are related to
income inequality. The Gini index was set as the response variable and several
explanatory variables were examined. In the end, only three of the explanatory
variables were used in the multiple regression to model the Gini index across
countries.
Introduction
The Gini index is a measure of the income distribution of a country's residents.
This number, which ranges between 0 and 100 and is based on residents' net
income, helps define the gap between the rich and the poor, with 0 representing
perfect equality and 100 representing perfect inequality. Previous research
(Sarel, 1997) has shown that the Gini index is significantly associated with
macroeconomic factors such as growth rate, income level, and investment rate.
This paper looks beyond macroeconomic factors and considers indicators such as
education, urban population, and unemployment rate. Based on cross-country data,
the goal is to find which explanatory factors are strongly related to income
inequality and how those factors influence it.
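To make the 0 to 100 scale concrete, the following is a minimal sketch of one common way to compute a Gini index from a list of incomes, using the mean absolute difference between all pairs of residents. The function name `gini_index` and the income values are illustrative assumptions; the World Bank estimates used in this paper are derived from survey data and are not computed by this code.

```python
def gini_index(incomes):
    """Gini index on a 0-100 scale from a list of non-negative incomes.

    Illustrative helper (not from the paper): uses the mean absolute
    difference between all ordered pairs of incomes.
    """
    n = len(incomes)
    total = sum(incomes)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive income")
    # Sum of |x_i - x_j| over all ordered pairs (i, j).
    abs_diff_sum = sum(abs(a - b) for a in incomes for b in incomes)
    mean_income = total / n
    return 100 * abs_diff_sum / (2 * n * n * mean_income)


# Hypothetical incomes, chosen only to show the two ends of the scale.
print(gini_index([30, 30, 30, 30]))  # everyone earns the same
print(gini_index([0, 0, 0, 120]))    # one resident holds all income
```

With a finite number of residents the largest value this formula can reach is 100(n - 1)/n, so the second call gives 75 rather than 100; the value approaches 100 as the population grows.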
Data
The 2010 Gini index by country was downloaded from the World Bank; where a 2010
value was missing, the 2011 or 2009 value was used in its place in order to have
more observations. Initially, I set “lending interest rate” and “expenditure on
education as percentage of GDP” as explanatory variables. Although these data
were also from 2010, so few countries had values for both that I had to drop
these two explanatory variables because they left too few observations. Combining
the following five explanatory variables with the Gini index yielded 84
observations. The variables “Population” and “Unemployment” were removed after
fitting the model because their p-values were large, which indicates that they
are not statistically significant.
Data description
Explanatory variable    Description
GDP per capita          Gross domestic product divided by midyear population
Population              Total population, which counts all residents regardless of legal status or citizenship
Unemployment            The share of the labor force that is without work but available for and seeking employment
Urban population        The percentage of people who live in urban areas
Enrollment ratio        Percentage of total enrollment in tertiary education (ISCED 5 to 8), regardless of age
Model 1 summary
Methods
Model 2 summary
The explanatory variables explain 34.85% of the variation in the response
variable, which is not high but acceptable. All the explanatory variables are
significant.
To compare models 1 and 2, we can do a partial F-test:
$H_0: \beta_{\text{population}} = \beta_{\text{unemployment}} = 0$
$H_a$: at least one of these two slopes is not 0
The F statistic is 0.1524 with 2 and 78 degrees of freedom (the residual degrees
of freedom are 80 for the smaller model and 78 for the larger model). The p-value
of 0.8589 is far larger than α = 0.05. Therefore we fail to reject the null
hypothesis and conclude that unemployment and population are not significant.
The individual t-tests for the slopes in the summary of the larger model agree:
these two variables are not significant. Therefore, the smaller model is
preferred.
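As a check on the partial F-test, the sketch below recomputes the degrees of freedom and the p-value from the numbers reported in the text. With 84 observations, the larger model (five slopes plus an intercept) leaves 78 residual degrees of freedom and the smaller model (three slopes plus an intercept) leaves 80, so the test has 2 numerator degrees of freedom. The function name is an illustrative assumption, and the closed-form tail probability used here is valid only when the numerator degrees of freedom equal 2.

```python
def partial_f_p_value_two_restrictions(f_stat, df_resid_full):
    """Upper-tail probability of an F(2, df_resid_full) distribution.

    For exactly 2 numerator degrees of freedom the survival function has
    the closed form (1 + 2F/d2) ** (-d2/2), so no statistics library is
    needed. Illustrative helper, not part of the original analysis.
    """
    return (1 + 2 * f_stat / df_resid_full) ** (-df_resid_full / 2)


n_obs = 84                        # observations reported in the Data section
df_resid_full = n_obs - 5 - 1     # larger model: 5 slopes + intercept
df_resid_reduced = n_obs - 3 - 1  # smaller model: 3 slopes + intercept
df_numerator = df_resid_reduced - df_resid_full

p_value = partial_f_p_value_two_restrictions(0.1524, df_resid_full)
print(df_numerator, df_resid_full, round(p_value, 4))  # 2 78 0.8589
```

The result reproduces the reported p-value of 0.8589, which confirms that the test is based on 2 and 78 degrees of freedom.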
Based on the scatter plot above, the percentage of urban population has a
positive linear relationship with the enrolment ratio in tertiary education;
that is, a high percentage of urban population is associated with a high
enrolment ratio. GDP per capita has a curved relationship with both the
percentage of urban population and the enrolment ratio. The Gini index has a
negative linear relationship with GDP per capita, enrolment ratio, and
percentage of urban population, and the correlation coefficients between the
Gini index and these three explanatory variables are all negative. In addition,
VIF values were computed for each explanatory variable in the model; they are
all small, so multicollinearity is not a problem for this model.
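For a model with three explanatory variables, the VIF of each variable can be obtained from the three pairwise correlations alone, because the VIFs are the diagonal entries of the inverse of the correlation matrix. The sketch below illustrates that calculation. The function name and the correlation values passed to it are hypothetical; the paper's actual correlations and VIF values are not reproduced here.

```python
def vifs_three_predictors(r12, r13, r23):
    """VIFs for three predictors from their pairwise correlations.

    Uses the diagonal of the inverse of the 3x3 correlation matrix:
    VIF_j = cofactor_jj / determinant. Illustrative helper only.
    """
    det = 1 + 2 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2
    if det <= 0:
        raise ValueError("correlation matrix is singular or invalid")
    return (
        (1 - r23 ** 2) / det,  # VIF of predictor 1
        (1 - r13 ** 2) / det,  # VIF of predictor 2
        (1 - r12 ** 2) / det,  # VIF of predictor 3
    )


# Hypothetical correlations: predictors 1 and 2 correlate at 0.6,
# predictor 3 is uncorrelated with both.
print(vifs_three_predictors(0.6, 0.0, 0.0))
```

In this toy case the first two VIFs equal 1/(1 - 0.6²), about 1.56, and the third equals 1. A common rule of thumb treats values above roughly 5 or 10 as a sign of problematic multicollinearity.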
In the residual plot for “GDP per capita”, the variability of the residuals
increases as the x value increases, so the residuals are heteroscedastic. The
residual plots for urban population and enrolment ratio show a patternless
cloud of points, so the residuals are homoscedastic with respect to these
variables. In the normal Q-Q plot, the points do not fall along a straight
line, which indicates that the assumption of normality of residuals is not
satisfied.
Results
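The estimated regression:
\[
\hat{y}_{\text{Gini}} = 0.355 - 0.000149\, x_{\text{GDP per capita}} + 0.227\, x_{\text{urban population}} - 0.2117\, x_{\text{enrolment ratio}}
\]
The intercept is in Gini index units, the slope for GDP per capita is in Gini
index units per dollar, and the slopes for urban population and enrolment ratio
are in Gini index units per %.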
Interpretation of y-intercept and slopes:
• Y-intercept: when GDP per capita, percentage of urban population and
enrolment ratio are all equal to 0, the predicted Gini index is 0.355. In
our data, the values of the three explanatory variables are far above 0,
so this is extrapolation. In practice, GDP per capita, percentage of
urban population and enrolment ratio cannot be 0, so the y-intercept has
no practical meaning.
• Partial slope for GDP per capita: holding the other explanatory variables
constant, a one dollar increase in GDP per capita is associated with a
0.000149 decrease in the Gini index on average.
• Partial slope for percentage of urban population: holding the other
explanatory variables constant, a one percentage point increase in the
percentage of urban population is associated with a 0.227 increase in
the Gini index on average.
• Partial slope for enrolment ratio: holding the other explanatory variables
constant, a one percentage point increase in the tertiary education
enrolment ratio is associated with a 0.2117 decrease in the Gini index
on average.
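The partial slopes can be read as "change in predicted Gini index per unit change in one variable, with the others held fixed". The sketch below applies that reading to the three estimated slopes. The dictionary keys, the function name, and the example changes (1,000 dollars, 10 percentage points) are illustrative assumptions, not values from the data.

```python
# Estimated partial slopes from the fitted model reported in this section.
SLOPES = {
    "gdp_per_capita": -0.000149,  # Gini index units per dollar
    "urban_population": 0.227,    # Gini index units per percentage point
    "enrolment_ratio": -0.2117,   # Gini index units per percentage point
}


def predicted_gini_change(**deltas):
    """Change in predicted Gini index for the given changes in predictors.

    Predictors that are not passed are held constant (change of 0).
    Illustrative helper, not part of the original analysis.
    """
    return sum(SLOPES[name] * delta for name, delta in deltas.items())


# Hypothetical changes, chosen only to illustrate the interpretation.
print(predicted_gini_change(gdp_per_capita=1000))   # about -0.149
print(predicted_gini_change(urban_population=10))   # about +2.27
print(predicted_gini_change(enrolment_ratio=10))    # about -2.117
```

Because the intercept cancels when two predictions are subtracted, these differences do not depend on the y-intercept.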
All three explanatory variables are significantly different from zero and
should be kept in the model. Improvement is still necessary: for example, a
variance-stabilizing transformation may be needed for the explanatory
variable “GDP per capita”. Alternatively, log() can be used to transform
“GDP per capita” to make its histogram look more symmetric.
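The effect of a log transformation on a right-skewed variable can be sketched as follows. The GDP per capita values are hypothetical and were chosen to double at each step, so their logarithms are evenly spaced; the helper `skewness` is an illustrative function, not part of the original analysis.

```python
import math


def skewness(values):
    """Sample skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical GDP per capita values (dollars), not taken from the data.
gdp_per_capita = [1000, 2000, 4000, 8000, 16000]
log_gdp_per_capita = [math.log(v) for v in gdp_per_capita]

print(skewness(gdp_per_capita))      # clearly positive (right-skewed)
print(skewness(log_gdp_per_capita))  # essentially zero (symmetric)
```

If the real variable behaves similarly, refitting the model with log(GDP per capita) in place of GDP per capita may also reduce the fanning pattern in the residual plot, though that would need to be checked against the data.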
References
World Bank. (2016). GINI index (World Bank estimate). [online] Available at:
http://data.worldbank.org/indicator/SI.POV.GINI [Accessed 14 Apr. 2016].
Sarel, M. (1997). How Macroeconomic Factors Affect Income Distribution:
The Cross-Country Evidence. IMF Working Papers, 97(152), p.1.