TABLE OF CONTENTS
SECTION 1
Introduction
SECTION 2
Literature Review
SECTION 3
Descriptive Analysis: Part 1
Descriptive Analysis: Part 2
SECTION 4
Concluding Remarks and Recommendations
REFERENCES
LIST OF TABLES
Table 1: Descriptive statistics in Labour Market
Table 2: Descriptive statistics in South African Labour Market
Table 3: Multiple Linear Regression (income)
Table 4: Test for Random Sampling in Race variable
Table 5: Test for Random Sampling in Gender variable
Table 6: Test for Random Sampling in Best age variable
Table 7: Test for Random Sampling in Education variable
Table 8: Test for Random Sampling in Final Economic Sector variable
Table 9: Test for Random Sampling in Final occupation variable
Table 10: Test for Random Sampling in Geographic area variable
Table 11: Collinearity Diagnostics
LIST OF FIGURES
Figure 1: Income distribution in SA Labour Market
Figure 2: An Example of What the Distribution of Income Should Look Like
Figure 3: Average Labour income by race in SA Labour Market
Figure 4: Average income by Gender in SA Labour Market
Figure 5: Average income by Age in South African Labour Market
Figure 6: Average income by location in South African Labour Market
Figure 7: Descriptive statistics in SA Labour market
Figure 8: Labour income across Location Type
Figure 9: Labour Income Distribution across Gender in South Africa
Figure 10: Labour Income Distribution across Age Cohorts in SA
Figure 11: Test for Normal distribution
Figure 12: Scatterplots of each independent variable plotted against residuals
Figure 13: Test for Zero Conditional Mean
Figure 14: Test for Homoscedasticity
INTRODUCTION
South Africa is the second largest economy in Africa, rich in natural resources and a leading
producer of platinum, chromium, iron and gold. From 2002 to 2007, South Africa grew at an
average of 4.5 percent year-on-year, which is reported to be its fastest expansion since the
demise of Apartheid in 1994 (Trade Economics, 2009). In recent years the ANC has failed to address
the structural problems and instabilities in the economy, such as the widening gap between rich
and poor, the high unemployment rate, the low-skilled labour force, crime and corruption. In fact,
amongst developing and transition nations, South Africa has a relatively high Gini coefficient,
showing that income distribution is markedly unequal (Bhorat et al., 2009). According to Van der
Berg (2010), wage inequality is deeply rooted in South Africa’s history and plays a central role
in overall income distribution. The Marikana incident is evidence that not all is well in the labour
market (Burger, 2015). It is therefore no surprise that we are interested in what
determines household income in the South African labour market. We investigate this by studying
the determinants of labour income and how these factors contribute to the income distribution.
This paper proceeds as follows: Section 1 is the introduction, Section 2 reviews and summarizes
the relevant literature about the studies of the distribution and determinants of income in the
Labour Market. Section 3 includes descriptive analyses of the output received from Stata as well
as a regression analysis and the interpretation thereof, and the final section will include
concluding remarks and policy recommendations.
LITERATURE REVIEW
On coming into power in 1994, the new government implemented various policies to redress the
inequality and discrimination that existed in the market at large. The instabilities caused by
Apartheid still haunt South Africa today, and the labour market is no exception.
Some jobs pay a lot more than others and are in greater demand than others. Generally, the more
skills, education and training one has, the higher one’s earnings. According to a Cambridge
University lecturer, Rana Sinha (2010), no matter how highly one prices oneself, it all boils
down to three factors: how others value what you do, how good you are at what you do and how
difficult it is to replace you. As the job market continually changes, the job skills that are
needed also evolve. Many jobs require specific skills, and to acquire those one has to go through
training and education.
Those skills include hard and soft skills. An example of a hard skill is plumbing a house, which
takes several months of training to learn. Soft skills include leadership, teamwork and
communication skills, as well as the ability to deal with difficult people and situations. To some
people these come naturally, whilst others acquire them through education and practice, in a
similar way to hard skills. Having both good soft skills and good hard skills improves one’s
potential earnings. The more education one attains, the more likely one is to earn a higher income.
According to the US Census Bureau (2000), people who have a bachelor’s degree or higher earn nearly
twice as much as people with only a high school diploma, with the difference amounting to almost
$1 million over a lifetime. The productivity of workers depends on their ability and on their
education: workers with higher innate ability are more productive, and acquiring education allows
access to the high-skilled sector, where the productivity per unit of ability is generally higher.
Jobs that were popular 20 years ago are no longer as popular today. As technology advances,
job skills also evolve. For example, typists were common in the nineties but are rare nowadays,
and jobs that require unique and new skills, such as website designer, have emerged.
Jobs in high demand are normally highly paid, as in any other market. Since 1994 the labour
market has been fundamentally reformed through a range of laws primarily aimed at protecting
workers’ rights and promoting workers from previously disadvantaged groups (blacks, disabled
people and women).
The Basic Conditions of Employment Act of 1997 provides for the improvement of labour market
standards through the setting of minimum wages by the Minister of Labour and bargaining councils.
The Labour Relations Act of 1995 promotes wage determination through collective bargaining.
Bargaining councils serve as industry-specific intermediaries between employers and employees
during wage negotiations. The minimum wages set are standardised across industries, regardless of
the size and physical location of the firm. According to Barker (2009), in the absence of
bargaining councils large firms would pay high wages whilst small firms would pay low wages. The
introduction of the minimum wage law standardises wages in the market and eliminates wage
differentials. Whether or not one is a member of a union also affects wage (income) levels.
Since the dramatic rise of unions in the 1980s, union members’ wage premiums have increased
(Woolard and Burger, 2005). Hofmeyr (2001) found a similar effect of unionisation: there is a
substantial wage premium attached to union membership, and it is rising over time, suggesting
that the unionised part of the workforce is able to insulate its wage levels against changing
labour market conditions. The geographical area in which one chooses to work also has an impact
on wages (income). Following the neoclassical assumption of a clearing labour market, workers who
are mobile and have an attractive area to move to will move. As more workers move to these areas,
labour supply there increases and wages are therefore depressed.
Employers may in turn set wages high in order to deter their workers from moving to another
area. Areas with highly skilled labour tend to have high wage levels as well. Therefore, a
sustained equilibrium with different wage rates across different areas can occur. Recent events
such as the labour unrest in South Africa have shown that a growing number of South Africans are
tired of the inequalities and are becoming restless.
Labour market problems and unrest reduce the output produced, which adversely affects the
economy. As the economic growth rate declines, it also puts downward pressure on wages. In his
article, Sinha (2010) also pointed out that if an employer knows that it would take months to
find someone as qualified and experienced as you to do your job as well as you do it, the
employer has an incentive to pay you the wage that you demand.
A person’s age affects the ‘reservation wage’ (i.e. the minimum wage one is willing to accept).
This topic has recently gained attention, with researchers studying how age affects the
reservation wage. Burger and Woolard (2005) are among the researchers who found a positive
relationship. Nattrass and Walker (2005) report a similar relationship, adding that it is found
not only among the unemployed but also among the employed, with the employed demanding a
relatively higher reservation wage than the unemployed. Older job seekers tend to be less willing
to work for a reduced wage (Ahn and Garcia-Pérez 2002).
Overall income inequality in South Africa has not improved and has remained extremely high for
the past decade, with a Gini coefficient of 0.65 (World Bank, 2011). The Gini coefficient measures
the gap between the rich and the poor citizens of a country. It lies between 0 and 1; the higher
the Gini coefficient, the wider the gap between the rich and the poor. Although inequality has
remained high, its nature has changed in South Africa. Historically, as a result of policies
implemented during apartheid, high levels of income inequality were driven by inequality
between racial groups. More recent studies, based on the 1999-2001 Census, have seen the
distribution between racial groups improving whilst income inequality within racial groups
worsens (Bhorat, et al., 2009).
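To make the Gini coefficient concrete, the sketch below computes it for a short list of incomes using the standard formula on sorted data. The function name `gini` and the sample incomes are illustrative assumptions; they are not taken from the NIDS data or from the sources cited above.

```python
def gini(incomes):
    """Gini coefficient of non-negative incomes (0 means perfect equality)."""
    values = sorted(incomes)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive income")
    # Standard formula on sorted data:
    # G = 2 * sum(rank_i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical incomes, chosen only to show the two extremes.
equal = gini([100, 100, 100, 100])   # everyone earns the same
unequal = gini([0, 0, 0, 400])       # one person earns everything
```

With a finite sample of n people the largest attainable value is (n - 1)/n, so the second example approaches 1 only as the group grows.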
According to Stiglitz (2013), income inequality in rich countries declined significantly around
the 1990s and 2000s, owing to an improvement in gender inequality, which had for a long time
contributed largely to their labour market inequality. This is not so in South Africa.
Although the labour market after 1994 is becoming more feminised, with women making up as much as
three-fifths of new labour market entrants, this change has not been strong enough to prevent
the labour market Gini coefficient from rising (Trade Economics, 2009). Inequality within races,
especially among blacks, contributes greatly to income inequality (Bhorat, 2009).
DESCRIPTIVE ANALYSIS: PART 1
Figure 1: Income distribution in SA Labour Market
The histogram shows that the data are skewed to the right and do not follow a normal
distribution, which is what we expect of income data. Most of the data points lie within the
first income bracket, although there also appears to be a spike between R40000 and R50000, which
could be due to outliers. This reflects the unequal distribution of income in South Africa: very
few people earn very large salaries, whereas the majority of people living in South Africa,
regardless of which race group they belong to, earn a very low wage. The income gap between rich
and poor is therefore very large.
[Figure 1: histogram of income; x-axis: income (0 to 50000), y-axis: Density]
Figure 2: An Example of What the Distribution of Income Should Look Like
This histogram is an example of what we expect the distribution of income to look like. We
obtain it by taking the logarithm of the income variable, because raw income is strongly
right-skewed. The logged variable is approximately normally distributed, which is what a more
even distribution of income would look like.
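The effect of logging income can be sketched numerically: taking logarithms pulls in the long right tail and lowers the skewness. The incomes below are hypothetical values chosen only to mimic a right-skewed distribution, and `skewness` is a helper written for this illustration, not a Stata command.

```python
import math
import statistics


def skewness(values):
    """Population skewness: mean of the cubed standardised deviations."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Hypothetical monthly incomes in rand: many low earners, a few very high ones.
incomes = [800, 1200, 1500, 2000, 2500, 3000, 4000, 6000, 12000, 45000]
log_incomes = [math.log(v) for v in incomes]  # analogue of the lincome variable

raw_skew = skewness(incomes)
log_skew = skewness(log_incomes)
```

In this made-up sample the logged series is still somewhat right-skewed, but far less so than the raw incomes.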
[Figure 2: histogram of lincome (log of income), titled “Distribution of Income When Logging Income Variable”; y-axis: Density]
Figure 3: Average Labour income by race in SA Labour Market
This graph shows the average income of the different race groups in South Africa. On average,
White people earn the highest income from the labour market and African people earn the lowest.
These differences can be traced to the apartheid era, when race groups were divided and
received unequal opportunities. White people had more opportunities to prosper than African
people, and the gap persists so many years after apartheid ended because the opportunities
White people received then still benefit them now, whereas African people are still trying to
catch up. From the graph, it can be said that the lower the bar, the fewer opportunities that
group received during apartheid.
[Figure 3: bar charts of mean income by race (panels: African, Coloured, Indian, White), titled “Mean Income By Race”; graphs by RECODE of w3_best_race (Best population group)]
Figure 4: Average income by Gender in SA Labour Market
The gap in average income between men and women is fairly large: on average, males earn a lot
more than females. Only in the last few decades have women come to be viewed as more and more
equal to men, and the graph shows that there is still a lot of progress to be made.
[Figure 4: bar charts of mean income by gender (panels: Male, Female), titled “Mean Income By Gender”; graphs by RECODE of w3_best_gen (Best gender)]
Figure 5: Average income by Age in South African Labour Market
This graph shows average income across the various age groups. On average, the age group
“55-59” has the highest income and the age group “80-84” has the lowest. It makes sense for the
age group “55-59” to be the highest on average, because at that stage of life people are assumed
to have been working for many years and to have enough job tenure and experience to earn a good
salary. What does not make sense is that the age groups “0-1”, “2-4”, “5-9”, “10-14” and “15-19”
have higher average incomes than some of the older age groups, because at those young ages people
are not expected to be working or earning any income from the labour market. Because these are
averages over all observations in each age group and the data have not been adjusted for such
cases, anomalies of this kind can occur.
[Figure 5: bar charts of mean income by age interval (panels: 0-1, 2-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-64, 65-69, 70-74, 75-79, 80-84, 85+), titled “Mean Income By Age Cohort”; graphs by RECODE of w3_age_intervals (Age Intervals)]
Figure 6: Average income by location in South African Labour Market
This graph depicts the average income across different geographic area types. People living in
urban areas clearly earn more, on average, than those living in traditional or farm areas. This
is to be expected, because urban areas tend to be close to the central business district (CBD),
where job opportunities are concentrated. Traditional and farm areas are far from the CBD and
from those opportunities, so people living there earn less than those living in urban areas.
[Figure 6: bar charts of mean income by geographic area type (panels: Traditional, Urban, Farm), titled “Mean Income By Geographic Area Type”; graphs by RECODE of w3_hhgeo2011 (GeoType (2011 Census))]
Table 1: Descriptive statistics in Labour Market
Source: Wave 3 Data: Southern Africa Labour and Development Research Unit. National
Income Dynamics Study 2012, Wave 3 [dataset]. Version 1.3. Cape Town: Southern Africa
Labour and Development Research Unit [producer], 2015. Cape Town: DataFirst [distributor],
2015
The table shows the key information needed to describe the data in the simplest way. For each
variable we see the number of observations, the mean, the standard deviation (which describes
the volatility of the variable) and the minimum and maximum values. These are the most important
values required for data analysis, and they can be used to describe an individual’s age,
education, race, gender and occupation, and the sector of the economy in which they work.
From the table we can conclude that the age variable is the most volatile, with the highest
standard deviation (20.50), and the gender variable is the least volatile (0.50). However, we do
see a large difference between the income earned by males and females when looking at the “mean
income by gender” graph above.
Variable        Obs      Mean       Std. Dev.   Min   Max
best_age        41,970   27.45149   20.49551    0     105
gender          42,050   1.543924   .4980729    1     2
race            42,050   1.269631   .6680649    1     4
best_educa~n    39,582   6.228942   4.646704    0     18
final_econ~r    5,401    6.277356   3.51116     1     11
final_occu~n    21,426   8.819378   1.941586    1     10
Figure 7: Labour income across Location Type
This pie chart represents the distribution of income across geographic areas in South Africa,
namely Traditional, Urban and Farms. The largest percentage of income is held by those who live
in urban areas, which is to be expected: they are closer to the central business district (CBD),
where the majority of job opportunities tend to be. Those who live on farms hold the smallest
share of overall income, because farms tend to lie beyond the outskirts of urban areas, where
very few people live and livelihoods depend largely on farming. In most cases these individuals
also farm for themselves, which is known as subsistence farming.
Figure 8: Income Distribution across Race Groups in South Africa
[Figure 8: pie chart titled “Income Across Race Groups”; legend: 1. African, 2. Coloured, 3. Asian/Indian, 4. White; slice labels: 50.59%, 10.72%, 6.522%, 32.17%]
This pie chart shows the distribution of total income across the race groups in South Africa:
African, Coloured, Asian/Indian and White. The African race group holds the largest percentage
of overall income because it is the majority race group in South Africa. It is not necessarily
the wealthiest race group, but being the largest it holds the largest share when the country’s
income is divided by race group. The Asian/Indian race group holds the smallest percentage of
overall income, which could be because it is a minority in South Africa; its members could all
be wealthy and the share would still be small. Because the chart does not take the size of each
race group into account, it cannot tell us whether the African race group holds the largest
share of income because its members are the wealthiest or because they are the most numerous.
That makes this pie chart of little use for assessing the distribution of income: it does not
tell us about the equality of income distribution among race groups.
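The weakness of a share-of-total-income chart can be shown with a small sketch that reports income share, population share and mean income side by side. The two groups and all figures below are hypothetical and are not drawn from the NIDS data; `summarise` is a helper name invented for this example.

```python
def summarise(groups):
    """Income share, population share and mean income for each group.

    `groups` maps a group name to (number_of_people, total_income).
    """
    total_people = sum(people for people, _ in groups.values())
    total_income = sum(income for _, income in groups.values())
    return {
        name: {
            "income_share": income / total_income,
            "population_share": people / total_people,
            "mean_income": income / people,
        }
        for name, (people, income) in groups.items()
    }


# Hypothetical groups: A is large with low incomes, B is small with high incomes.
result = summarise({"A": (900, 2_700_000), "B": (100, 1_800_000)})
```

Here group A holds 60% of all income only because it contains 90% of the people; its mean income is a sixth of group B’s. Comparing income share with population share, or reporting means, avoids the ambiguity described above.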
Figure 9: Labour Income Distribution across Gender in South Africa
[Figure 9: pie chart titled “Income Across Gender”; legend: Male, Female; slice labels: 70.58%, 29.42%]
This pie chart shows the distribution of income between men and women living in South Africa.
Males hold the larger percentage of income from the labour market and females hold the smaller
percentage. The income distribution between men and women is clearly far from equal, although
there have been some improvements in the last few decades, with women increasingly seen as
equal to men. These are rough estimates, because this pie chart is flawed in the same way as the
previous one: the split could simply reflect the fact that there are more men than women in the
labour force rather than the real distribution of income between them. The results would be more
accurate once the sizes of the working male and working female populations are taken into
account and the shares adjusted for these differences.
Figure 10: Labour Income Distribution across Age Cohorts in SA
This pie chart shows the income distribution across all age groups. It is to be expected that
those in the “0-1”, “2-4”, “5-9” and “10-14” age groups hold the smallest share of income,
because these are children who have not yet reached the legal age to start working and
receiving income from the labour market. A certain number of them nevertheless receive a
percentage of overall income, which could be due to a variety of reasons, including child labour.
The largest percentage of income is held by the age group “30-34”. At this stage of life one is
assumed either to have obtained some sort of qualification and been working for some time or,
without a qualification, to be well established in a job. Excluding the first five age groups,
income seems to be fairly evenly distributed among the different age groups.
DESCRIPTIVE ANALYSIS: PART 2
Table 2: Descriptive statistics in South African Labour Market
i.race                    _Irace_1-4                    (naturally coded; _Irace_1 omitted)
i.gender                  _Igender_1-2                  (naturally coded; _Igender_1 omitted)
i.best_age                _Ibest_age_1-5                (naturally coded; _Ibest_age_1 omitted)
i.best_education          _Ibest_education_1-2          (naturally coded; _Ibest_educ_1 omitted)
i.final_economic_sector   _Ifinal_economic_sector_1-2   (naturally coded; _Ifinal_eco_1 omitted)
i.final_occupation        _Ifinal_occupation_1-2        (naturally coded; _Ifinal_occ_1 omitted)
i.geographic_area         _Igeographic_area_1-3         (naturally coded; _Igeographi_1 omitted)
Table 3: Multiple Linear Regression (income)
Variable         Coefficient   p-value
_Irace_2         -.08176       0.287
_Irace_3         .7814         0.000
_Irace_4         .9395         0.000
_Igender_2       -.0980        0.090
_Ibest_age_2     -.1942        0.023
_Ibest_age_3     .0549         0.500
_Ibest_age_4     -.2264        0.110
_Ibest_age_5     .0348         0.925
_Ibest_educ_2    .2322         0.002
_Ifinal_eco_2    .3767         0.000
_Ifinal_occ_2    -.1863        0.131
_Igeographi_2    .2210         0.002
_Igeographi_3    -.0966        0.396
_cons            7.7714        0.000
Number of observations: 961
R-squared = 0.2069
Source: Wave 3 Data: Southern Africa Labour and Development Research Unit. National
Income Dynamics Study 2012, Wave 3 [dataset]. Version 1.3. Cape Town: Southern Africa Labour
and Development Research Unit [producer], 2015. Cape Town: DataFirst [distributor], 2015
For this regression analysis we will assume a perfect market with perfect information. Table 3
reports a regression on income. Income is the dependent variable; it had to be logged because
raw income is heavily skewed and its relationship with the explanatory variables is not linear.
The independent variables are race, gender, age, education, economic sector, occupation and
geographic area. They are used in the regression analysis to explain the variation in the
dependent variable (Wooldridge 2014: 57).
In order to do this analysis we will look at the R-squared, the coefficients, the standard
errors, the t-statistics and the p-values. This is a multiple regression equation whose general
form is y = β0 + β1X1 + β2X2 + u, where y is the dependent variable and X1 and X2 are the
independent variables; β0 is the intercept (a constant); β1 and β2 are the slope parameters of
X1 and X2; and u is the error term, which captures all the variables that we have not included
in the regression equation but that could affect the variation in the dependent variable
(Wooldridge 2014: 59 – 60).
The resulting equation from running the regression is: log(income) = 7.7714 – 0.0818(_Irace_2)
+ 0.7814(_Irace_3) + 0.9395(_Irace_4) – 0.0980(_Igender_2) – 0.1942(_Ibest_age_2) +
0.0549(_Ibest_age_3) – 0.2264(_Ibest_age_4) + 0.0348(_Ibest_age_5) + 0.2322(_Ibest_educ_2)
+ 0.3767(_Ifinal_eco_2) – 0.1863(_Ifinal_occ_2) + 0.2210(_Igeographi_2) –
0.0966(_Igeographi_3). The R-squared is 0.2069 and the number of observations is 961.
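As a rough illustration of how the fitted equation is used, the sketch below adds the intercept to the coefficients of whichever dummy variables equal 1 for a given person. The coefficients are copied from Table 3; the function name `predict_log_income` and the example profile are assumptions made for this illustration.

```python
import math

# Coefficients copied from Table 3; the dependent variable is log(income).
INTERCEPT = 7.7714
COEFFICIENTS = {
    "_Irace_2": -0.08176, "_Irace_3": 0.7814, "_Irace_4": 0.9395,
    "_Igender_2": -0.0980,
    "_Ibest_age_2": -0.1942, "_Ibest_age_3": 0.0549,
    "_Ibest_age_4": -0.2264, "_Ibest_age_5": 0.0348,
    "_Ibest_educ_2": 0.2322, "_Ifinal_eco_2": 0.3767,
    "_Ifinal_occ_2": -0.1863,
    "_Igeographi_2": 0.2210, "_Igeographi_3": -0.0966,
}


def predict_log_income(active_dummies):
    """Fitted log(income) for a person whose listed dummies equal 1."""
    return INTERCEPT + sum(COEFFICIENTS[name] for name in active_dummies)


# Base profile: every dummy is 0, so the fitted value is just the intercept.
base = predict_log_income([])
# Illustrative profile: education group 2 and urban area, otherwise base groups.
profile = predict_log_income(["_Ibest_educ_2", "_Igeographi_2"])
# Fitted income of the profile relative to the base profile.
ratio = math.exp(profile - base)
```

Exponentiating a fitted log value gives only a rough guide to the income level itself, so the ratio between two profiles is usually the safer quantity to read.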
Each of the above independent variables is categorical, so each observation falls within a
certain category of each variable. For the education variable, for example, an observation could
fall into the category of a grade 10 level of education, the category of a grade 12 education,
or the category of a bachelor’s degree or diploma. Each of the selected independent variables
has been divided into categories so that the regression output is easier to analyse.
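The categorical coding can be sketched as follows: one level of each variable serves as the base group and is left out, and every other level gets its own 0/1 indicator. The helper `dummy_code` is hypothetical and only mimics the naming seen in Table 2; it is not the Stata routine that produced the output.

```python
def dummy_code(prefix, value, levels):
    """Return 0/1 indicators for `value`, omitting the first level as the base."""
    if value not in levels:
        raise ValueError(f"{value!r} is not one of {levels}")
    return {f"{prefix}_{level}": int(value == level) for level in levels[1:]}


# Geographic area has three levels; level 1 (traditional) is the omitted base.
urban = dummy_code("_Igeographi", 2, [1, 2, 3])
traditional = dummy_code("_Igeographi", 1, [1, 2, 3])
```

A person in the base group has all indicators equal to 0, which is why every coefficient is read as a difference from that group.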
The regression is run by omitting the first group of each variable from the model and comparing
the other groups to this base group. It is also important to know the overall effect that each
categorical variable, taken as a whole, has on the dependent variable. The coefficient of each
independent variable indicates the effect that a change in that variable has on the dependent
variable, holding all other independent variables constant. It is important to test the
significance, or the ‘necessity’, of each independent variable when it comes to explaining the
variation in the dependent variable. For this analysis we can make use of the t-test as well as
the interpretation of the p-values given in the model. The t-statistic is calculated to test
whether one x-variable has a significant effect on the y-variable, holding all the other
x-variables in the regression equation constant (Wooldridge 2014: 98). The test starts by
stating a null hypothesis, which is the claim being tested, against an alternative hypothesis.
The null hypothesis is written as H0: βj = 0, where j corresponds to any of the independent
variables, and the alternative hypothesis is written as H1: βj > 0 for a one-sided test and
H1: βj ≠ 0 for a two-sided test. The t-statistic is defined as t ≡ β̂j / se(β̂j), where β̂j is
the estimate of βj and se(β̂j) is its standard error (Wooldridge 2014: 98).
At the end of this analysis we have included the raw regression output (Figure 1) so that the
overall significance of each variable, and how each one affects the variation in income, can be
seen. We will use the p-value approach: if the p-value is less than a specified significance
level, we reject the null hypothesis and conclude that the variable is significant.
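The testing procedure can be sketched in a few lines. Table 3 does not report standard errors, so the coefficient and standard error passed to `t_statistic` below are hypothetical; the p-value uses the standard normal distribution as a large-sample approximation to the t distribution, which is an assumption of this sketch rather than what Stata reports.

```python
import math


def t_statistic(coefficient, standard_error):
    """t = estimated coefficient divided by its standard error (H0: beta_j = 0)."""
    return coefficient / standard_error


def two_sided_p_normal(t):
    """Two-sided p-value from the standard normal (large-sample approximation)."""
    return math.erfc(abs(t) / math.sqrt(2))


def is_significant(p_value, level=0.05):
    """Reject H0 when the p-value is below the chosen significance level."""
    return p_value < level


# Hypothetical estimate and standard error, for illustration only.
t = t_statistic(0.25, 0.10)
p = two_sided_p_normal(t)
```

Applied to p-values reported in Table 3, `is_significant(0.023)` returns True at the 5% level, while `is_significant(0.090)` and `is_significant(0.131)` return False.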
Race variable: The race groups included in the model are Coloured, Indian and White, with
African as the omitted base group. Only the White and Indian coefficients are statistically
significant, both with a p-value of 0.000; the Coloured coefficient is not (p-value of 0.287).
Race therefore has a significant impact on income. As previous literature shows, income
inequality between races has improved since the demise of Apartheid, but inequality within
races, especially among non-Whites (Indians, Coloureds and Africans), now seems to widen the
income gap (Bhorat, et al., 2009). This could be an unintended consequence of government
policies such as Affirmative Action, which was implemented to redress past injustices but seems
to benefit only a small portion of “qualified blacks”.
Gender variable: The first gender category (male) is the base category, and the second category
(female) is included in the model. The coefficient is negative: being female is associated with
an income roughly 9.80% lower than that of males, holding the other variables constant. However,
it is not statistically significant at the 5% level, as its p-value equals 0.090.
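Reading a dummy coefficient in a log(income) model directly as a percentage, as with the 9.80% figure, is an approximation that works well only for small coefficients. A minimal sketch of the exact conversion, 100·(exp(b) − 1), is given below; the coefficients come from Table 3, while the function name is an assumption made for this illustration.

```python
import math


def exact_percent_change(coefficient):
    """Exact % difference in income implied by a dummy in a log(income) model."""
    return 100 * (math.exp(coefficient) - 1)


female = exact_percent_change(-0.0980)  # _Igender_2 from Table 3
white = exact_percent_change(0.9395)    # _Irace_4 from Table 3
```

For the gender coefficient the exact figure is close to the approximation (about 9.3% lower rather than 9.80%), but for a large coefficient such as _Irace_4 the two differ substantially, so the exact form is the safer one to quote.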
Best age variable: This variable has been divided into 5 categories: category 1 is 0-20 years
of age, category 2 is 21-40 years, category 3 is 41-60 years, category 4 is 61-80 years and
category 5 is 81-105 years. Categories 2-5 have been included in the model, but only category 2
is significant, with a p-value of 0.023. This category should affect income because it is the
age group in which people are developing their careers. Being in category 2 rather than the base
category (0-20 years) is associated with an income about 19.42% lower. Logically this does not
make sense, because we expect that as people get older they gain experience, are seen as more
competent and receive a higher income.
Best education variable: Education has been divided into 2 groups. Group 1 consists of education
received from grade 1 to grade 12, and group 2 consists of post-school education, other and no
schooling. Belonging to group 2 rather than group 1 is associated with an income about 23.22%
higher. This makes logical sense because more educated individuals are believed to be more
competent and will receive a higher income.
Final economic sector variable: This variable consists of 2 groups. Group 1: private households,
agriculture, fishing, forestry, mining and quarrying, manufacturing (e.g. clothing, food),
electricity, gas, water and construction. Group 2: wholesale/retail, transport, storage and
communication, finance, real estate, business services, community, social and personal services,
catering and accommodation, and other. Group 2 is in the model and is statistically significant,
with a p-value of 0.000. If an individual moves from a small sector to a large sector, he/she
will possibly earn more income and have better job opportunities.
Final occupation variable: This variable also consists of 2 groups. Group 1: managers,
professionals, technicians and associate professionals, clerical support workers, and service
and sales workers. Group 2: skilled agricultural, forestry and fishing workers, craft and
related trade workers, plant and machine operators, elementary occupations and never worked.
Group 2 of the final occupation variable is included in the model, but it is not statistically
significant, as it has a p-value of 0.131.
Geographic area variable: There are 3 groups within this variable. Group 1 is traditional, group
2 is urban and group 3 is farms. Only group 2 is statistically significant, with a p-value of
0.002. The assumption is that living in an urban area increases the prospect of finding a job
with a better income compared with living in a traditional or farm area. For example, someone
who moves from a traditional/rural area to an urban area, where more jobs tend to be, is able to
increase their income.
Overall, each independent variable’s effect on income can be explained, whether it increases or
decreases income.
If we look at figure 1, the raw Stata output regression model, the only variables that are overall
statistically significant are race, education and the economic sector.
The R-squared equals 0.2069. The R-squared measures the share of the variation in the dependent
variable that is explained by the independent variables, and it is referred to as a goodness-of-fit
measure. The overall significance of the model is tested with the F-statistic rather than with the
R-squared. The R-squared of this model is low: the independent variables explain only about 21%
of the variation in income. Another major value to look at in this model is the number of
observations. There is a drastic decrease between the number of observations in the data and the
number of observations in the model, from 52466 to 961. A drop of this size points to a problem
with the estimation sample, most likely observations being excluded because of missing values in
one or more variables, and it should be investigated before the regression output is relied on.
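As a minimal sketch of what the R-squared measures, the Python below computes it as one minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` and the toy numbers are illustrative assumptions; they are not taken from the Stata output.

```python
def r_squared(actual, fitted):
    """Share of the variation in `actual` explained by `fitted`: 1 - SSR/SST."""
    mean_y = sum(actual) / len(actual)
    # Residual sum of squares: variation left unexplained by the fitted values.
    ssr = sum((y - f) ** 2 for y, f in zip(actual, fitted))
    # Total sum of squares: variation of the dependent variable around its mean.
    sst = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - ssr / sst


# Hypothetical values, for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
fitted = [1.5, 1.5, 3.5, 3.5]
print(round(r_squared(actual, fitted), 4))
```

Read this way, an R-squared of 0.2069 says that roughly one fifth of the variation in income is accounted for by the regressors.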
When doing any type of multiple regression analysis there are certain assumptions that are made
and have to be satisfied in order for the regression model to be useful. These assumptions are as
follows:
1. Residuals should be normally distributed
2. Linear in parameters
3. Random sampling
4. No perfect collinearity
5. Zero conditional mean
6. Homoskedasticity
Each assumption will be tested using the first regression model.
1. Residuals should be normally distributed
• “The population error u is independent of the explanatory variables…and is normally
distributed with zero mean and variance…” (Wooldridge 2014: 94).
Figure 11: Test for Normal distribution
We use a histogram to represent the distribution of the residuals. The histogram in Figure 11
shows that the residuals are approximately normally distributed. The normal probability plot
likewise gives no indication of non-normality. Therefore, the regression model satisfies the first
condition, that the residuals are normally distributed.
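The normal probability plot in Figure 11 pairs an empirical probability, P[i] = i/(N+1), with the normal probability F[(r-m)/s] of each sorted residual. A minimal sketch of how those pairs could be computed follows; the function name and the residual values are hypothetical, and the sample standard deviation is assumed for s.

```python
from statistics import NormalDist, mean, stdev


def normal_probability_points(residuals):
    """Return (empirical, theoretical) probability pairs for a normal probability plot."""
    r_sorted = sorted(residuals)
    n = len(r_sorted)
    m, s = mean(r_sorted), stdev(r_sorted)  # assumption: s is the sample standard deviation
    std_normal = NormalDist()
    points = []
    for i, r in enumerate(r_sorted, start=1):
        empirical = i / (n + 1)                      # Empirical P[i] = i/(N+1)
        theoretical = std_normal.cdf((r - m) / s)    # Normal F[(r-m)/s]
        points.append((empirical, theoretical))
    return points


# Hypothetical residuals, for illustration only.
residuals = [-1.2, -0.4, 0.0, 0.3, 1.3]
for empirical, theoretical in normal_probability_points(residuals):
    print(f"{empirical:.3f}  {theoretical:.3f}")
```

If the residuals are close to normal, the pairs lie near the 45-degree line, which is what the plot is read for.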
2. Linear in parameters
• “The model in the population can be written as: y = β0 + β1X1 + β2X2 + … + βkXk + u,
where β0, β1, …, βk are the unknown parameters (constants) of interest and u is an
unobserved random error or disturbance term” (Wooldridge 2014: 71).
[Figure 11 panels: Histogram of Residuals (Density against Residuals); Normal Probability Plot (Normal F[(r-m)/s] against Empirical P[i] = i/(N+1))]
Figure 12: Scatter plots of each independent variable plotted against residuals
[Seven panels, each plotting Residuals against one variable: Race (RECODE of w3_best_race), Gender (RECODE of w3_best_gen), Best_Age (RECODE of w3_best_age_yrs), Best_Education (RECODE of w3_best_edu), Final_Economic_Sector, Final_Occupation and Geographic_Area (RECODE of w3_hhgeo2011, GeoType (2011 Census))]
• An augmented partial residual plot for each variable gives a clearer picture of whether the
relationship is linear.
• Looking at the augmented partial residual plots for race and best_age, there is no extreme
deviation from linearity, so we can say that they are linear.
Note: An augmented partial residual plot cannot be produced for gender, since gender^2 is
collinear with gender.
• Best education and final economic sector also show no extreme deviations from linearity,
and we can say that they are linear.
[Augmented partial residual plots (augmented component plus residual against each variable) for Race, Best_Age, Best_Education and Final_Economic_Sector]
• Final occupation does not show large deviations from linearity. Geographic area shows larger
deviations, but not large enough to call the relationship non-linear.
• Overall, every variable for which the plot could be produced satisfies the linearity condition.
The plot could not be produced for gender, but a variable that takes only two values enters the
model linearly by construction, so the model satisfies this condition.
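The note that gender^2 is collinear with gender can be checked directly: for a variable x that takes only two values a and b, (x - a)(x - b) = 0 for every observation, so x^2 = (a + b)x - ab is an exact linear function of x. The sketch below verifies this identity; the coding of gender as 1 and 2 is an assumption read off the axis range in Figure 12, and the function name is hypothetical.

```python
def square_is_linear_in_x(values):
    """Check x**2 == (a + b) * x - a * b for a variable with exactly two distinct values."""
    a, b = sorted(set(values))  # raises ValueError unless there are exactly two distinct values
    return all(x ** 2 == (a + b) * x - a * b for x in values)


# Hypothetical sample, assuming gender is coded 1 and 2.
gender = [1, 2, 2, 1, 2]
print(square_is_linear_in_x(gender))
```

Because the squared term carries no information beyond the variable itself, a plot that relies on adding that term cannot be drawn for a two-valued regressor.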
3. Random sampling
• “We have a random sample of n observations…following the population model…”
(Wooldridge 2014: 72).
• We use the “runtest” command in Stata to test for random order, using the median as the
threshold. The test compares the observed number of runs above and below the threshold
with the number expected under random order, and the resulting z statistic and p-value tell
us whether the hypothesis of random order can be rejected.
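The z statistic of a runs test can be reproduced from three counts: the observations at or below the threshold, the observations above it, and the number of runs. The sketch below uses the standard normal approximation without a continuity correction; the function name is hypothetical, and the counts passed in are the ones reported for race in Table 4.

```python
from math import sqrt
from statistics import NormalDist


def runs_test_z(n_below, n_above, n_runs):
    """Return (expected runs, z, two-sided p) for a runs test around a threshold.

    z and p are None when every observation falls on one side of the threshold,
    because the variance of the number of runs is then zero.
    """
    n = n_below + n_above
    cross = 2 * n_below * n_above
    expected = cross / n + 1
    variance = cross * (cross - n) / (n ** 2 * (n - 1))
    if variance == 0:
        return expected, None, None
    z = (n_runs - expected) / sqrt(variance)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return expected, z, p


# Counts reported for race in Table 4.
expected, z, p = runs_test_z(n_below=34374, n_above=7676, n_runs=3600)
print(round(expected, 1), round(z, 2), p)
```

A strongly negative z means far fewer runs than random order would produce, and a small p-value leads to rejecting random order. When all observations sit on one side of the threshold, as in the gender and final occupation outputs, no z can be computed.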
a) Table 4: Test for Random Sampling Race Variable
N(race <= 1) = 34374
N(race > 1)  = 7676
obs          = 42050
N(runs)      = 3600
z            = -146.26
Prob>|z|     = 0
o There are 3600 runs in these 42050 observations, fewer than would be expected under
random order (z = -146.26, Prob>|z| = 0). The hypothesis that the observations are in
random order is therefore rejected for race.
[Augmented partial residual plots (augmented component plus residual against each variable) for Final_Occupation and Geographic_Area]
Table 5: Test for Random Sampling in Gender variable
N(gender <= 2) = 42050
N(gender > 2)  = 0
obs            = 42050
N(runs)        = 1
z              = .
Prob>|z|       = .
o There is 1 run in 42050 observations. Every observation lies at or below the threshold of 2,
so the z statistic and p-value are missing and the test provides no evidence that gender is
in random order.
Table 6: Test for Random Sampling in Best_age variable
N(best_age <= 23) = 21695
N(best_age > 23)  = 20275
obs               = 41970
N(runs)           = 18935
z                 = -19.81
Prob>|z|          = 0
o There are 18935 runs in 41970 observations, fewer than expected under random order
(z = -19.81, Prob>|z| = 0), so the hypothesis of random order is rejected for best_age.
Table 7: Test for Random Sampling in education variable
N(best_educa~n <= 7) = 21336
N(best_educa~n > 7)  = 18246
obs                  = 39582
N(runs)              = 16562
z                    = -31.45
Prob>|z|             = 0
o There are 16562 runs in 39582 observations, fewer than expected under random order
(z = -31.45, Prob>|z| = 0), so the hypothesis of random order is rejected for education.
Table 8: Test for Random Sampling in Final Economic Sector variable
N(final_econ~r <= 7) = 2981
N(final_econ~r > 7)  = 2420
obs                  = 5401
N(runs)              = 2426
z                    = -6.78
Prob>|z|             = 0
o There are 2426 runs in 5401 observations, fewer than expected under random order
(z = -6.78, Prob>|z| = 0), so the hypothesis of random order is rejected for final economic
sector.
Table 9: Test for Random Sampling in Final occupation variable
N(final_occu~n <= 10) = 21426
N(final_occu~n > 10)  = 0
obs                   = 21426
N(runs)               = 1
z                     = .
Prob>|z|              = .
o Here we have only one run in 21426 observations. Every observation lies at or below the
threshold of 10, so the z statistic and p-value are missing and the test provides no evidence
that final occupation is in random order.
Table 10: Test for Random Sampling in Geographic area variable
N(geographic~a <= 2) = 45214
N(geographic~a > 2)  = 4593
obs                  = 49807
N(runs)              = 3137
z                    = -139.25
Prob>|z|             = 0
o Here we have 3137 runs in 49807 observations, fewer than expected under random order
(z = -139.25, Prob>|z| = 0), so the hypothesis of random order is rejected for geographic
area.
• The random sampling assumption has been violated. The hypothesis of random order is
rejected for race, best_age, education, final economic sector and geographic area
(Prob>|z| = 0 in each case), and the test could not be computed for gender and final
occupation.
4. No perfect collinearity
• “In the sample (and therefore in the population), none of the independent variables is
constant, and there are no exact linear relationships among the independent variables”
(Wooldridge 2014: 72).
Table 11: Collinearity Diagnostics
Variable                 VIF     SQRT VIF   Tolerance   R-squared
Race                     1.13    1.06       0.8883      0.1117
Gender                   1.00    1.00       0.9960      0.0040
Best_Age                 1.14    1.07       0.8776      0.1224
Best_Education           1.16    1.08       0.8631      0.1369
Final_Economic_Sector    1.10    1.05       0.9093      0.0907
Final_Occupation         1.06    1.03       0.9470      0.0530
Geographic_Area          1.11    1.05       0.8988      0.1012
Mean VIF: 1.10
Condition number: 28.7607
• After running the regression we can compute the VIF (variance inflation factor) for each
variable in order to test for multicollinearity. A VIF greater than 10 suggests that collinearity
could exist, and a tolerance value of less than 0.1 suggests that the variable could be a linear
combination of other independent variables (UCLA 2006).
• From Table 11 we can see that the VIF for each variable is less than 10 and the tolerance
for each variable is greater than 0.1. Therefore, there is no perfect collinearity in this model
and the assumption has been satisfied.
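The columns of Table 11 are linked by two identities: the tolerance is one minus the R-squared from regressing a variable on the other regressors, and the VIF is the reciprocal of the tolerance. The sketch below recomputes the VIF, its square root and the tolerance from the R-squared column of Table 11; the function and dictionary names are hypothetical.

```python
from math import sqrt


def collinearity_diagnostics(aux_r_squared):
    """Return (VIF, sqrt(VIF), tolerance) from the auxiliary-regression R-squared."""
    tolerance = 1 - aux_r_squared
    vif = 1 / tolerance
    return vif, sqrt(vif), tolerance


# R-squared values as reported in Table 11.
aux_r2 = {
    "Race": 0.1117,
    "Gender": 0.0040,
    "Best_Age": 0.1224,
    "Best_Education": 0.1369,
    "Final_Economic_Sector": 0.0907,
    "Final_Occupation": 0.0530,
    "Geographic_Area": 0.1012,
}

for name, r2 in aux_r2.items():
    vif, sqrt_vif, tolerance = collinearity_diagnostics(r2)
    flagged = vif > 10 or tolerance < 0.1  # rule of thumb quoted in the text
    print(f"{name:<22} {vif:.2f}  {sqrt_vif:.2f}  {tolerance:.4f}  flagged={flagged}")
```

None of the variables is flagged under the rule of thumb, in line with the reading of Table 11.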
5. Zero conditional mean
• “The error u has an expected value of zero given any values of the independent
variables” (Wooldridge 2014: 74).
Figure 13: Test for Zero Conditional Mean
• There does not seem to be any clear pattern in the residuals plotted against the fitted
values, so we conclude that the “zero conditional mean” assumption has been met.
[Figure 13 panel: Residual Versus Fitted Values (Residual Values against Fitted Values)]
6. Homoscedasticity
Figure 14: Test for Homoscedasticity
• “The error u has the same variance given any values of the explanatory variables”
(Wooldridge 2014: 81).
• The data points appear to get closer to each other towards the right of the graph, which
indicates heteroscedasticity, so the assumption fails.
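The visual reading of Figure 14 can be complemented with a simple numerical check. The sketch below is an informal comparison, not a formal test and not the method used in this project: it sorts the observations by fitted value, splits them into a lower and an upper half, and compares the variance of the residuals in the two halves. The function name and all values are hypothetical.

```python
from statistics import variance


def variance_ratio_by_fitted(fitted, residuals):
    """Residual variance in the upper half of fitted values divided by that in the lower half."""
    pairs = sorted(zip(fitted, residuals))      # order observations by fitted value
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]          # residuals at low fitted values
    high = [r for _, r in pairs[-half:]]        # residuals at high fitted values
    return variance(high) / variance(low)


# Hypothetical values, for illustration only.
fitted = [7.0, 7.5, 8.0, 8.5, 9.0, 9.5]
residuals = [2.0, -2.0, 1.5, -0.5, 0.5, -0.4]
print(round(variance_ratio_by_fitted(fitted, residuals), 3))
```

A ratio well below one means the residuals are less spread out at high fitted values, which is the narrowing pattern described for Figure 14; a ratio near one would be consistent with constant variance.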
Overall, regarding the assumptions we have seen that the random sampling and homoscedasticity
assumptions have been violated, therefore, this model is flawed and cannot be used for any type
of analysis.
[Figure 14 panel: Homoscedasticity of Residuals (Residuals against Fitted values)]
Concluding remarks and Recommendations
The South African labour market presents opportunities for participants to earn the income that
determines their welfare. These earning opportunities differ by gender, economic sector, location,
race, education and occupation. The main objective of this project is to identify and analyse the
determinants and distribution of income in the South African labour market. We find that, among
all the factors identified, race, education and economic sector are the most significant variables
for explaining the variation of income in our model. From the empirical analysis, whites hold the
largest share of income in the labour market. Age also affects the level of income. The literature
strongly suggests that older people are mostly skilled and would prefer to enjoy their retirement
at home, so the salary packages proposed to incentivise them to re-enter the labour market must
be very lucrative.
Gender has turned out to be an insignificant factor in our model. Although the average income
distribution between men and women looks unequal, men and women have been given equal
access to opportunities since the demise of Apartheid, the labour market is becoming more
feminised and income inequality based on gender is gradually decreasing. Income levels differ by
location: income levels in rural areas are lower than in urban areas. The outcome of the labour
market analysis in this study shows that income inequality is more pronounced within races. The
rich are getting richer and the poor even poorer. This contributes largely to South Africa's high
Gini coefficient. Education is another influential determinant of labour income.
When we tested our model against the multiple linear regression assumptions, it failed, meaning
that we cannot use our model to make any predictions. Income inequality can be effectively
addressed, and the labour market is the primary area where this should happen. In conclusion,
equitable policies directed at human capital development could be of the utmost help in decreasing
the unequal distribution of income within race groups. As we have seen, the further one steps up
the education ladder, the higher one's chances of being paid a higher wage. More equitable
educational opportunities should therefore be created, which would eventually contribute to a
more equitable distribution of labour income.
REFERENCES
Ahn, N. & Garcia-Pérez, J.I. 2002. Unemployment duration and workers' wage aspirations in Spain.
[Online] Available at http://www.econ.upf.edu/docs/papers/downloads/426.pdf [Accessed 18
August 2015]
Burger, P. 2015. Unpacking labor’s declining income share: manufacturing, mining and growing
inequality. [Online] Available at
http://www.econ3x3.org/article/unpacking-labour%E2%80%99s-declining-income-share-manufacturing-mining-and-growing-inequality#sthash.hY5pLzqE.dpuf
[Accessed 15 August 2015]
Bhorat, H. 2004. Labour Market Challenges in the Post-Apartheid South Africa. Working Paper
72 DPRU. Cape Town: Development Policy Research Unit, University of Cape Town. [Online]
Available at
http://www.sarpn.org/documents/d0000747/P832-Labour_Market_Challenges-Bhorat.pdf
[Accessed 13 August 2015]
Bhorat, H., Van der Westhuizen, C. & Jacobs, T. 2009. Income and Non-Income Inequality in
Post-Apartheid South Africa: What are the Drivers and Possible Policy Interventions? DPRU
Working Paper 09/138. Cape Town: Development Policy Research Unit, University of Cape
Town. [Online] Available at
http://www.dpru.uct.ac.za/sites/default/files/image_tool/images/36/DPRU%20WP09-138.In
[Accessed 13 August 2015]
Burger, R. & Woolard, I. 2005. The State of the Labour Market in South Africa after the first
decade of democracy. Working paper 133 DPR Cape Town: Development Policy Research Unit,
University of Cape Town. [Online] Available at
http://www.cssr.uct.ac.za/sites/cssr.uct.ac.za/files/pubs/wp133.pdf [Accessed 13 August 2015]
Hofmeyr, J.F. 2001. Segmentation in the South African Labour Market in 1999. Paper presented
at DPRU/FES Conference on Labour Markets and Poverty in South Africa, Johannesburg, 15-16
November. Quoted in Burger, R. & Woolard, I. 2005. The State of the Labour Market in South
Africa after the first decade of democracy. Working paper 133 DPR Cape Town: Development
Policy Research Unit, University of Cape Town.
Introduction to SAS. UCLA: Statistical Consulting Group. [Online] Available at
http://www.ats.ucla.edu/stat/sas/notes2/ [Accessed 29 September 2015]
Nattrass, N. & Walker, R. 2005. Unemployment and Reservation wages in Working-Class Cape
Town. South African Journal of Economics, 73(3), pp. 503-505. Available from
http://onlinelibrary.wiley.com/doi/10.1111/j.1813-6982.2005.00034.x/epdf [Accessed 13 August
2015]
OECD. 2011. Record inequality between rich and poor. Available from
https://youtu.be/ZaoGscbtPWU [Accessed 17 August 2015]
Sinha, R. 2010. What Factors Affect Your Income? Knoji Consumer Knowledge, 22 April.
Retrieved from
https://income-wages-earningpower.knoji.com/what-factors-affect-your-income/
[Accessed 07 August 2015]
Stiglitz, J.E. 2013. Inequality is a choice. The New York Times, 13 October 2013. Available from
http://opinionator.blogs.nytimes.com/2013/10/13/inequality-is-a-choice/ [Accessed 18 August
2015]
Trading Economics. 2009. GINI index in South Africa. [Online] Available from
http://www.tradingeconomics.com/south-africa/gini-index-wb-data.html [Accessed 10 August
2015]
The World Bank. 2011. GINI index (World Bank estimate). [Online] Available from
http://data.worldbank.org/indicator/SI.POV.GINI [Accessed 11 August 2015]
US Census Bureau. 2000. Earnings by Opportunities and Education. Available from
http://www.census.gov/hhes/www/income/data/earnings/call1usmale.html
[Accessed 12 August 2015]
Van Der Berg, S. 2010. Current poverty and income distribution in the context of South African
history. Stellenbosch Economic Working Papers: 22/10.
Wooldridge, J.M. 2014. Introduction to Econometrics. Europe, Middle East & Africa Edition.
Cengage.
Final Project 2

  • 1. 1 TABLE OF CONTENTS PAGE SECTION 1 Introduction……………………………………………………………………………………………………………………………………..2 SECTION 2 Literature review………………………………………………………………………………………………………………………………3 SECTION 3 Descriptive Analysis:Part1……………………………………………………………………………………………………………... 6 Descriptive Analysis:Part2…………………………………………………………………………………..............................18 SECTION ConcludingRemarksandRecommendations …………………………………………………………………………………..32 REFERENCES………………………………………………………………………………………………………………………………………….33 LIST OF TABLES Table 1: Descriptive statisticsinLabourMarket…………………………………………………………………………………12 Table 2: Descriptive statisticsinSouthAfricanLabourMarket…………………………………………………………..18 Table 3: Multiple LinearRegression(income)……………………………………………………………………………………18 Table 4: Testfor RandomSamplingRace Variable …………………………………………………………………………….26 Table 5: Testfor RandomSamplinginGendervariable………………………………………………………………………27 Table 6: Testfor RandomSamplinginBest age variable…………………………………………………………………….27 Table 7: Testfor RandomSamplinginEducationvariable………………………………………………………………….27 Table 8: Testfor RandomSamplinginFinal EconomicSectorvariable……………………………………………….28 Table 9: Testfor RandomSamplinginFinal occupationvariable………………………………………………………..29 Table 10: Test forRandom SamplinginGeographicareavariable………………………………………………………29 Table 11: CollinearityDiagnostics………………………………………………………………………………………………………29 LIST OF FIGURES Figure 1: Income distributioninSA Labour Market………………………………………………………………………….6 Figure 2: An Example of Whatthe Distributionof Income ShouldLookLike………………………………………..7 Figure 3: Average Labourincome byrace in SA Labour Market…………………………………………………………..8 Figure 4: Average income byGenderinSA Labour Market………………………………………………………………….9 Figure 5: Average income byAge inSouthAfricanLabour Market……………………………………………………..10 Figure 6: Average income bylocationinSouthAfricanLabourMarket………………………………………………11 Figure 7: Descriptive statisticsinSA Labourmarket……………………………………………………………………………..13 Figure 8: 
Labour income acrossLocation Type…………………………………………………………………………………..14 Figure 9: Labour Income DistributionacrossGenderinSouthAfrican……………………………………………….15 Figure 10: Labour Income DistributionacrossAge CohortsinSA………………………………………………………16 Figure 11: Testfor Normal distribution……………………………………………………………………………………………..23 Figure 12 Scatterplotsof eachindependentvariable plottedagainstresiduals…………………………………24 Figure13: Testfor ZeroConditional Mean………………………………………………………………………………………….30 Figure 14: Testfor Homoscedasticity…………………………………………………………………………………………………31
  • 2. 2 INTRODUCTION South Africa is the second largest economy in Africa, rich in natural resources and a leading producer of platinum, chromium, iron and gold. From 2002 to 2007, South Africa grew at an average of 4.5 percent year-on-year, which is reported to be its fastest expansion since the demise of Apartheid in 1994 (Trade Economics, 2009).The ANC in recent years failed to address the structural problems and instabilities in the economy such as the widening gap between rich and poor, high unemployment rate, low skilled labour force, crime rates and corruption. In fact, amongst developing and transition nations, South Africa has a relatively high Gini-coefficient showing that distribution is markedly unequal, (Bhorat et al., 2009). According to Van Der Berg(2010), wage inequality is deeply rooted in South Africa’s history and plays a central role in overall income distribution.The Marikana incident is evidence that not all is well in the labour market (Burger, 2015). It therefore comes as no surprise that we are interested in what determines household’s income in South African labour markets. We achieve this by studying the determinants of labour income and how these factors contribute to the income distribution. This paper proceeds as follows: Section 1 is the introduction, Section 2 reviews and summarizes the relevant literature about the studies of the distribution and determinants of income in the Labour Market. Section 3 includes descriptive analyses of the output received from Stata as well as a regression analysis and the interpretation thereof, and the final section will include concluding remarks and policy recommendations.
  • 3. 3 LITERATURE REVIEW Coming into power in 1994, the new government implemented various policies in order to redress inequality and discrimination that existed in the market at large. Instabilities caused by Apartheid are still haunting South Africa currently and the labour market is of no exception. Some jobs pay a lot more than others and are in greater demand than others. Generally the more skills, education and training one has the higher the earnings. According to a Cambridge University Lecturer, Rana Sinha (2010) no matter how highly one prices him/herself it all boils down to three factors: how others value what you do, how good you are at what you do and how difficult it is to replace you. As job market continually changes the job skills that are needed are also evolving. Many of jobs require specific skills and to acquire those one has to go through training and education. Those skills include hard and soft skills. Hard skills are for example: plumbing in a house, several months of training are needed to learn this skill. In addition to hard skills we also consider soft skills and these include leadership, teamwork and communication skills as well as ability to deal with difficult people and situations etc. To some people this comes naturally whilst to other people it doesn’t and they acquire it through education and practice in a similar way to hard skills. Having both good soft and good hard skills will help you improve thereby improving your potential earnings. The more education one attains the more probable he/she will earn a higher income. According to US Census Bureau (2000) people who have a bachelor’s degree or higher earn nearly twice as much as people with only a high school diploma with the difference amounting to almost $1 million over a lifetime. 
The productivity of workers depends on their ability and on their education: workers with higher innate ability are more productive and acquiring education allows access to the high-skilled sector, where the productivity per unit of ability is higher in general. Jobs that were popular 20 years ago are no longer as popular nowadays. As technology advances the job skills also evolve. For example typists were common in the nineties but nowadays they are not popular and jobs that require unique and new skills have emerged e.g. Website Designers.
  • 4. 4 Highly demanded jobs are normally highly paid, like any other Market. Since 1994 the Labour Market has since been fundamentally reformed with a range of laws primarily aimed at protecting worker’s rights and promoting workers from previously disadvantaged groups (blacks, disabled and women). The Basic Conditions of Employment Act of 1997 provides for the improvement of labour market standards through the setting of minimum wages by the minister of labour and bargaining councils. The Labour Relations Act of 1996 promotes wage determination through collective bargaining. Bargaining councils serve as industry specific intermediaries between employers and employees during wage negotiations. Minimum wages set are standardised across industries, regardless of the size and physical location of the firm. According to Barker (2009), in the absence of the bargaining councils large firms would be paying high whilst small firm are paying low. The introduction of the minimum wages law standardises wages in the market and eliminates wage differentials. Whether one is a member of a union or not affects the wage (income) levels as well. Since their dramatic rise in 1980s workers wage premiums have increased (Woolard and Burger, 2005). Hofmeyr (2001) also found a similar effect of unionisation. He mentioned that there is there is a substantial wage premium attached to union membership, and that this is rising over time, suggesting that the unionised part of the workforce is able to insulate their wage levels against changing labour market conditions. The geographical area where one chooses to work in has an impact on their wages (income) too. Following the neoclassical assumption of a clearing labour market, if workers are allowed mobility and have an attractive area to move to they will move. As increased amounts of workers move to these areas, this will in turn increase labour supply and therefore depress wages. 
The employers will set the wages high in order to deter their workers from moving to another area. Areas with high skilled labour tend to have high wage levels as well. Therefore, a sustained equilibrium with different wage rates across different areas can occur. Recent events such as the labour unrest in South Africa has shown that a growing number of South Africans are tired of the inequalities and becoming restless. Labour market problems and unrests affect the output produced. As less output is now produced this adversely affects the economy. As the economic growth rate declines it also has a downward pressure on wages. In his article, Sinha (2010) also pointed out that If an employer knows that it
  • 5. 5 would take him months to get someone as qualified and experienced as you to do your job as well as you do it, your employer has an incentive to pay you the wage that you demand. A person’s age affects the ‘reservation wage’ (i.e. the minimum wage one is willing to accept). This topic has recently gained attention, with researchers studying how age affects the reservation wage. Burger and Woolard (2005) are among the researchers who found a positive relationship. Nattrass and Walker (2005) reported a similar relationship, adding that it is found not only among the unemployed but also among the employed, with the employed demanding a relatively higher reservation wage than the unemployed. Older job seekers tend to be less willing to work for a reduced wage (Ahn and Garcia-Pérez 2002). Overall income inequality in South Africa has not improved over the past decade and remains extremely high, with a Gini coefficient of 0.65 (World Bank, 2011). The Gini coefficient measures the gap between the rich and the poor citizens of a country. It lies between 0 and 1; the higher the Gini coefficient, the wider the gap between the rich and the poor. Although inequality has remained high, its nature has changed in South Africa. Historically, as a result of policies implemented during apartheid, high levels of income inequality were driven by inequality between racial groups. More recent studies based on the 1999-2001 Census have seen the inter-racial distribution improving whilst income inequality within racial groups worsens (Bhorat, et al., 2009). According to Stiglitz (2013), income inequality in rich countries declined significantly around the 1990s and 2000s, owing to an improvement in gender inequality, which had for a long time contributed largely to the increase in their labour market inequality. This is not so in South Africa. 
Although the labour market after 1994 is becoming more feminised, with women making up as much as three-fifths of new labour market entrants, the decrease in gender inequality is not contributing strongly enough to prevent the labour market Gini coefficient from rising (Trade Economics, 2009). Inequality within races, especially among black people, contributes greatly to income inequality (Bhorat, 2009).
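The Gini coefficient described above can be illustrated with a short calculation. The sketch below uses the standard rank-based formula on two hypothetical income lists (the figures are invented for illustration and are not taken from the NIDS data); the function name `gini` is likewise an assumption of this example.

```python
def gini(incomes):
    """Gini coefficient of non-negative incomes (0 = perfect equality)."""
    values = sorted(incomes)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive income")
    # Rank-based formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # where x_i are the incomes sorted in ascending order and i = 1..n.
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical monthly incomes in rand (illustrative only).
equal = [5000, 5000, 5000, 5000]
unequal = [1000, 1000, 1000, 47000]

print(f"Gini, equal incomes:   {gini(equal):.2f}")
print(f"Gini, unequal incomes: {gini(unequal):.2f}")
```

When everyone earns the same amount the coefficient is 0, and it moves towards 1 as income concentrates in fewer hands, which is the sense in which a value of 0.65 signals a wide gap.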
  • 6. 6 DESCRIPTIVE ANALYSIS: PART 1 Figure 1: Income distribution in SA Labour Market [histogram; x-axis: income (0 to 50000), y-axis: density] Looking at the above histogram, it is clear that the data is skewed to the right and does not follow the normal distribution that we would expect. This means that most of the data points lie within the first income bracket; however, there also seems to be a spike between R40000 and R50000, which could be due to outliers. This is a clear representation of the unequal distribution of income in South Africa. Very few people earn very large salaries, whereas the majority of people living in South Africa, regardless of race group, earn a very low wage. The income gap between rich and poor is very large, and we know that South Africa faces high inequality in its income distribution.
  • 7. 7 Figure 2: An Example of What the Distribution of Income Should Look Like [histogram “Distribution of Income When Logging Income Variable”; x-axis: lincome (2 to 12), y-axis: density] This histogram is an example of what we expect the distribution of income to look like. We obtain it by taking the logarithm of the income variable, since raw income is heavily right-skewed. The logged variable is approximately normally distributed, and this would represent a fair distribution of income.
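The effect of logging a right-skewed variable can be checked numerically. The following is a minimal sketch on hypothetical incomes (not NIDS data); `skewness` is a helper written for this example and computes the usual moment-based sample skewness.

```python
import math
import statistics


def skewness(values):
    """Moment-based skewness: mean cubed deviation over cubed std. deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


# Hypothetical monthly incomes in rand: many low earners, a few high ones.
incomes = [800, 1200, 1500, 2000, 2500, 3000, 4000, 6000, 12000, 45000]
log_incomes = [math.log(v) for v in incomes]

print(f"skewness of income:      {skewness(incomes):.2f}")
print(f"skewness of log(income): {skewness(log_incomes):.2f}")
```

A skewness near zero indicates a roughly symmetric distribution, so the drop in skewness after taking logs is the numerical counterpart of the change from Figure 1 to Figure 2.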
  • 8. 8 Figure 3: Average Labour income by race in SA Labour Market [bar chart “Mean Income By Race”; panels: African, Coloured, Indian, White; axis: mean of income (0 to 20,000); graphs by RECODE of w3_best_race (Best population group)] This figure shows the average income across different race groups in South Africa. It is clear that, on average, White people earn the highest income from the labour market and African people earn the lowest. These differences can be traced to the apartheid era, when race groups were divided and received unequal opportunities. White people had more opportunities to prosper than African people, and the gap persists so many years after apartheid ended because the opportunities White people received during apartheid still allow them to reap the benefits now, whereas African people are still trying to catch up. From what we know and see above, it can be said that the lower the bar, the fewer opportunities that group received during apartheid.
  • 9. 9 Figure 4: Average income by Gender in SA Labour Market [bar chart “Mean Income By Gender”; panels: Male, Female; axis: mean of income (0 to 8,000); graphs by RECODE of w3_best_gen (Best gender)] On average, the income gap between men and women is fairly large: males tend to earn considerably more than females. Only in the last few decades have women increasingly been viewed as equal to men, and the above graph shows that there is still a lot of progress to be made.
  • 10. 10 Figure 5: Average income by Age in South African Labour Market [bar chart “Mean Income By Age Cohort”; panels: age intervals from 0-1 to 85+; axis: mean of income (0 to 20,000); graphs by RECODE of w3_age_intervals (Age Intervals)] This graph represents the average income across various age groups. On average, the age group “55-59” has the highest income and the age group “80-84” has the lowest. It makes sense for the age group “55-59” to be the highest on average, because at that stage in life one is assumed to have been working for many years and to have enough job tenure and experience to earn a good salary. What does not make sense is that the age groups “0-1”, “2-4”, “5-9”, “10-14” and “15-19” have higher average incomes than some of the older age groups, because people at those young ages are not expected to be working and earning any form of income from the labour market. Because these are averages across all age groups and the data has not been adjusted for such problems, anomalies of this kind can be expected.
  • 11. 11 Figure 6: Average income by location in South African Labour Market [bar chart “Mean Income By Geographic Area Type”; panels: Traditional, Urban, Farm; axis: mean of income (0 to 10,000); graphs by RECODE of w3_hhgeo2011 (GeoType (2011 Census))] This graph depicts the average income across different geographic area types. It is clear that people living in urban areas earn more, on average, than those living in traditional or farm areas. This is to be expected because urban areas tend to be around the central business district (CBD), where job opportunities are concentrated. Traditional and farm areas are far from the CBD and from job opportunities, so their residents tend to earn less than those living in urban areas.
  • 12. 12 Table 1: Descriptive statistics in Labour Market
    Variable        Obs      Mean       Std. Dev.   Min   Max
    best_age        41,970   27.45149   20.49551    0     105
    gender          42,050   1.543924   .4980729    1     2
    race            42,050   1.269631   .6680649    1     4
    best_educa~n    39,582   6.228942   4.646704    0     18
    final_econ~r    5,401    6.277356   3.51116     1     11
    final_occu~n    21,426   8.819378   1.941586    1     10
    Source: Wave 3 Data: Southern Africa Labour and Development Research Unit. National Income Dynamics Study 2012, Wave 3 [dataset]. Version 1.3. Cape Town: Southern Africa Labour and Development Research Unit [producer], 2015. Cape Town: DataFirst [distributor], 2015
    The table above shows the information needed to describe the data in the simplest way. For each variable we see the minimum and maximum value, the standard deviation (which describes the volatility of the variable), the mean and the total number of observations. These are the most important values required for data analysis, and they can be used to describe an individual’s age, education, race, gender, occupation and the sector of the economy in which they work. From the table we can conclude that the age variable is the most volatile, with the highest standard deviation (20.50), and the gender variable is the least volatile (0.50); however, we do see a large difference between the income earned by males and females when looking at the “mean income by gender” graph above.
  • 13. 13 Figure 7: Labour income across Location Type This pie chart represents the distribution of income across geographic areas in South Africa. These geographic areas are “Traditional”, “Urban” and “Farms”. The largest percentage of income is held by those who live in urban areas, which is to be expected. Those who live in urban areas are closer to the central business district (CBD), where the majority of job opportunities tend to be. Those who live on farms hold the smallest share of overall income because farms tend to lie beyond the outskirts of urban areas, where very few people live and livelihoods depend largely on farming; in most cases these individuals also farm for themselves, which is known as subsistence farming.
  • 14. 14 Figure 8: Income Distribution across Race Groups in South Africa [pie chart “Income Across Race Groups”: African 50.59%, Coloured 10.72%, Asian/Indian 6.522%, White 32.17%] This pie chart shows the distribution of income across the different race groups in South Africa: African, Coloured, Asian/Indian and White. The African race group holds the largest percentage of overall income. This is because the African race group is the majority race group in South Africa. They might not be the wealthiest race group, but because they are the majority they hold the largest percentage of income when the income of the entire country is divided by race group. The Asian/Indian race group holds the smallest percentage of overall income in South Africa. This could be due to their being a minority in South Africa. They could all be wealthy, but analysing shares by race group without taking the size of each group into account can give skewed results: the chart does not tell us whether the African race group holds the majority of income because they are the wealthiest or because they are the majority race group,
  • 15. 15 and that is what makes this pie chart generally useless when determining the distribution of income. It does not tell us about the equality of income distribution among race groups. Figure 9: Labour Income Distribution across Gender in South Africa [pie chart “Income Across Gender”: Male 70.58%, Female 29.42%] This pie chart tells us about the distribution of income between men and women living in South Africa. Males hold the larger percentage of income from the labour market and females hold the smaller percentage. It is clear from the pie chart that the income distribution between men and women is far from equal, although there have been some improvements in the last few decades and women are increasingly seen as equal to men. These are rough estimates of the income distribution between males and females, since this pie chart is again flawed: the distribution could simply reflect the fact that there are more men than women in the labour force, and not necessarily the real distribution of income among men and
  • 16. 16 women once the sizes of the working male and female populations are taken into account and these differences are adjusted for. Once that is done, the results will be more accurate. Figure 10: Labour Income Distribution across Age Cohorts in SA The pie chart above shows the income distribution across all age groups. It is to be expected that those in the “0-1”, “2-4”, “5-9” and “10-14” age groups hold the least income, because these are children who have not yet reached the legal age to start working and receiving income from the labour market; a certain number of them nevertheless receive a percentage of overall income, which could be due to a variety of reasons, including child labour. The largest percentage of income is held by the age group “30-34”. This is because at this stage in one’s life it is to be assumed that
  • 17. 17 one has received some sort of qualification and has been working for some time, while those who do not have a qualification are assumed to be working and well established in their jobs. Excluding the first five age groups, income seems to be fairly evenly distributed among the different age groups.
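The caveat raised about the pie charts, namely that a group’s share of total income depends on the size of the group as well as on what each member earns, can be made concrete with a small sketch. The group names, sizes and mean incomes below are hypothetical and chosen only to show the effect; they are not NIDS figures.

```python
# Hypothetical group sizes and mean incomes (illustrative only).
groups = {
    "Group A": {"people": 800, "mean_income": 3000},
    "Group B": {"people": 200, "mean_income": 9000},
}


def income_shares(groups):
    """Share of total income held by each group (what a pie chart displays)."""
    totals = {name: g["people"] * g["mean_income"] for name, g in groups.items()}
    grand_total = sum(totals.values())
    return {name: total / grand_total for name, total in totals.items()}


shares = income_shares(groups)
for name, g in groups.items():
    print(f"{name}: share of income = {shares[name]:.1%}, "
          f"mean income = R{g['mean_income']}")
```

In this example the larger group holds the bigger slice of the pie even though each of its members earns a third of what members of the smaller group earn, which is why the bar charts of mean income are the more informative comparison.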
  • 18. 18 DESCRIPTIVE ANALYSIS: PART 2 Table 2: Descriptive statistics in South African Labour Market
    i.race                   _Irace_1-4                    (naturally coded; _Irace_1 omitted)
    i.gender                 _Igender_1-2                  (naturally coded; _Igender_1 omitted)
    i.best_age               _Ibest_age_1-5                (naturally coded; _Ibest_age_1 omitted)
    i.best_education         _Ibest_education_1-2          (naturally coded; _Ibest_educ_1 omitted)
    i.final_economic_sector  _Ifinal_economic_sector_1-2   (naturally coded; _Ifinal_eco_1 omitted)
    i.final_occupation       _Ifinal_occupation_1-2        (naturally coded; _Ifinal_occ_1 omitted)
    i.geographic_area        _Igeographic_area_1-3         (naturally coded; _Igeographi_1 omitted)
    Table 3: Multiple Linear Regression (income)
    Variable         Coefficient   p-value
    _Irace_2         -.08176       0.287
    _Irace_3         .7814         0.000
    _Irace_4         .9395         0.000
    _Igender_2       -.0980        0.090
    _Ibest_age_2     -.1942        0.023
    _Ibest_age_3     .0549         0.500
    _Ibest_age_4     -.2264        0.110
    _Ibest_age_5     .0348         0.925
    _Ibest_educ_2    .2322         0.002
    _Ifinal_eco_2    .3767         0.000
    _Ifinal_occ_2    -.1863        0.131
    _Igeographi_2    .2210         0.002
    _Igeographi_3    -.0966        0.396
    _cons            7.7714        0.000
    Number of observations: 961 R-squared = 0.2069
    Source: Wave 3 Data: Southern Africa Labour and Development Research Unit. National Income Dynamics Study 2012, Wave 3 [dataset]. Version 1.3. Cape Town: Southern Africa Labour and Development Research Unit [producer], 2015. Cape Town: DataFirst [distributor], 2015
  • 19. 19 For this regression analysis we will assume a perfect market with perfect information. The above table represents a regression on income. We have chosen income as the dependent variable; it had to be logged because the relationship between raw income and the explanatory variables is not linear. The independent variables are race, gender, age, education, economic sector, occupation and geographic area. The independent variables will be used in the regression analysis to explain the variation in the dependent variable (Wooldridge 2014: 57). In order to do this analysis we will look at the R-squared, the coefficients, the standard error, the t-statistic and the p-values. This is a multiple regression equation and the general form is y = β0 + β1X1 + β2X2 + u, where y is the dependent variable, X1 and X2 are the independent variables, β0 is the intercept (a constant), β1 and β2 are the slope parameters of X1 and X2, and u is the error term, which captures all the variables that we have not included in the regression equation but that could have an effect on the variation in the dependent variable (Wooldridge 2014: 59-60). The resulting equation from running the regression is: log(income) = 7.7714 - 0.0818(_Irace_2) + 0.7814(_Irace_3) + 0.9395(_Irace_4) - 0.0980(_Igender_2) - 0.1942(_Ibest_age_2) + 0.0549(_Ibest_age_3) - 0.2264(_Ibest_age_4) + 0.0348(_Ibest_age_5) + 0.2322(_Ibest_educ_2) + 0.3767(_Ifinal_eco_2) - 0.1863(_Ifinal_occ_2) + 0.2210(_Igeographi_2) - 0.0966(_Igeographi_3). The R-squared is 0.2069 and the number of observations is 961. Each of the above independent variables is categorical, so each observation falls within a certain category of each variable; for example, for the education variable, each observation will fall into a certain education category. 
The observation could fall into the category of a grade 10 level of education, a grade 12 education, or a bachelor’s degree or diploma. Each of the selected independent variables has been divided into categories so that the regression output can be analysed more accurately. The regression omits the first group of each variable from the model and compares the other groups to this base group. It is also important to know the overall effect that each divided independent variable has on the dependent variable. The coefficient of each included category indicates the effect on the dependent variable of belonging to that category rather than the base category, holding all other independent variables constant. It is important to test the significance or the ‘necessity’ of each independent variable when it comes to
  • 20. 20 explaining the variation in the dependent variable. For this analysis we can make use of the t-test as well as the interpretation of the p-values given in the model. The t-statistic is calculated to test whether one x-variable has a significant effect on the y-variable, holding all the other x-variables in the regression equation constant (Wooldridge 2014: 98). We first state a null hypothesis, which is maintained unless the data provide sufficient evidence against it, and an alternative hypothesis. The null hypothesis is written as H0: βj = 0, where j corresponds to any of the independent variables, and the alternative hypothesis is written as H1: βj > 0 for a one-sided test and H1: βj ≠ 0 for a two-sided test. The t-statistic is defined as $t \equiv \hat{\beta}_j / \mathrm{se}(\hat{\beta}_j)$ (Wooldridge 2014: 98). At the end of this analysis we have included a regression output of raw data (Figure 1) so that the overall significance of each variable, and how each one affects the variation in income, can be shown. We will use the p-value approach: if the p-value is less than a specified significance level, we reject the null hypothesis and conclude that the variable is significant. Race variable: The race groups included in the analysis are Coloured, Indian and White. Only the White and Indian race groups are statistically significant, both having a p-value of 0.000. Race has a significant impact on income. As revealed by previous literature, income inequality between races has improved since the demise of apartheid, but what seems to enlarge the income gap is inequality within races, especially among non-Whites (Indians, Coloureds and Africans) (Bhorat, et al., 2009). This could be an unintended consequence of government policies such as Affirmative Action, which was implemented to redress past injustices but seems to benefit only a small portion of “qualified blacks”. 
Gender variable: The first gender category (male) is the base category. The second gender category, female, has been included in the model. This variable has a negative effect on income: being female is associated with approximately 9.80% lower income than being male. However, it is not statistically significant at the 5% level, since its p-value equals 0.090. Best age variable: This variable has been divided into 5 categories: category 1 is 0-20 years of age, category 2 is 21-40, category 3 is 41-60, category 4 is 61-80 and category 5 is 81-105. Categories 2-5 have been included in the model, but only category 2 is significant, with a p-value of 0.023. This variable will
  • 21. 21 affect income because this is the age group in which individuals are developing their careers. Belonging to _Ibest_age_2 rather than the base category is associated with approximately 19.42% lower income. Logically this does not make sense, because as people get older they gain more experience, are therefore seen as more competent and should receive a higher income. Best education variable: Education has been divided into 2 groups. Group 1 consists of education received from grade 1 to grade 12, and group 2 consists of post-school education, other and no schooling. Belonging to group 2 rather than group 1 is associated with approximately 23.22% higher income. This makes logical sense because more educated individuals are believed to be more competent and will receive a higher income. Final economic sector variable: This variable consists of 2 groups. Group 1: private households, agriculture, fishing, forestry, mining and quarrying, manufacturing (e.g. clothing, food), electricity, gas, water and construction. Group 2: wholesale/retail, transport, storage and communication, finance, real estate, business services, community, social and personal services, catering and accommodation, and other. Group 2 is in the model and is statistically significant, with a p-value of 0.000. If an individual moves from a small sector to a large sector, he/she will possibly earn more income and have better job opportunities. Final occupation variable: This variable also consists of 2 groups. Group 1: managers, professionals, technicians and associate professionals, clerical support workers, and service and sales workers. Group 2: skilled agricultural, forestry and fishing, craft and related trade workers, plant and machine operators and associate profession, elementary occupation and never worked. Group 2 of the final occupation variable is included in the model, but it is not statistically significant, as it has a p-value of 0.131. Geographic area variable: There are 3 groups within this variable. 
Group 1 is traditional, group 2 is urban and group 3 is farms. Only group 2 is statistically significant, with a p-value of 0.002. The assumption is that living in an urban area increases the prospect of finding a job with a better income compared with living in a traditional or farm area. For example, someone who moves from a traditional/rural area to an urban area, where more jobs tend to be, is able to increase their income.
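Because the dependent variable is log(income), multiplying a dummy coefficient by 100 gives only an approximate percentage effect; the exact figure is 100 × (exp(β) - 1), and the two diverge for large coefficients. The sketch below applies both readings to a few coefficients and p-values copied from Table 3. The 5% significance level and the helper name `exact_percent_effect` are assumptions of this example, since the text does not state the level it uses.

```python
import math

# (coefficient, p-value) pairs copied from Table 3; dependent variable is log(income).
table3 = {
    "_Irace_4": (0.9395, 0.000),
    "_Igender_2": (-0.0980, 0.090),
    "_Ibest_educ_2": (0.2322, 0.002),
    "_Ifinal_occ_2": (-0.1863, 0.131),
}

ALPHA = 0.05  # assumed significance level; not stated explicitly in the text


def exact_percent_effect(coefficient):
    """Exact % change in income for a dummy in a log-linear model."""
    return 100 * (math.exp(coefficient) - 1)


for name, (coef, p_value) in table3.items():
    approx = 100 * coef
    exact = exact_percent_effect(coef)
    verdict = "significant" if p_value < ALPHA else "not significant"
    print(f"{name}: approx {approx:+.2f}%, exact {exact:+.2f}%, "
          f"{verdict} at {ALPHA:.0%}")
```

For small coefficients such as the gender dummy the two readings are close, whereas for the White race dummy the exact effect is well above the 93.95% suggested by the approximation.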
  • 22. 22 Overall, each independent variable’s effect on income can be explained, whether it increases or decreases income. If we look at figure 1, the raw Stata regression output, the only variables that are statistically significant overall are race, education and the economic sector. The R-squared equals 0.2069. The R-squared measures how much of the variation in the dependent variable is explained by the independent variables, and is also referred to as a measure of goodness of fit. The R-squared of this model is low, which means that the independent variables explain very little (about 21%) of the variation in income. One of the major values to look at in this model is the number of observations. There is a drastic difference between the number of observations in the data and the number of observations in the model: it decreased from 52466 to 961. This indicates that there must be an error in the regression output. When doing any type of multiple regression analysis, certain assumptions are made and have to be satisfied in order for the regression model to be useful. These assumptions are as follows: 1. Residuals should be normally distributed 2. Linear in parameters 3. Random sampling 4. No perfect collinearity 5. Zero conditional mean 6. Homoskedasticity Each assumption will be tested using the first regression model.
  • 23. 23 1. Residuals should be normally distributed
    - “The population error u is independent of the explanatory variables…and is normally distributed with zero mean and variance…” (Wooldridge 2014: 94).
    Figure 11: Test for Normal distribution [two panels: “Histogram of Residuals” (x-axis: residuals, y-axis: density) and “Normal Probability Plot” (x-axis: Empirical P[i] = i/(N+1), y-axis: Normal F[(r-m)/s])]
    We use a histogram to represent the distribution of the residuals. The above histogram clearly shows us that the residuals are normally distributed. Also, the normal probability plot above gives no indication of non-normality. Therefore, the above regression model satisfies the first condition, that the residuals be normally distributed.
    2. Linear in parameters
    - “The model in the population can be written as: y = β0 + β1X1 + β2X2+…+ βkXk + u, where β0, β1,…, βk are the unknown parameters (constants) of interest and u is an unobserved random error or disturbance term” (Wooldridge 2014: 71).
  • 24. 24 Figure 12: Scatter plots of each independent variable plotted against residuals [seven panels, each with residuals on the y-axis: Race (RECODE of w3_best_race), Gender (RECODE of w3_best_gen), Best_Age (RECODE of w3_best_age_yrs), Best_Education (RECODE of w3_best_edu), Final_Economic_Sector, Final_Occupation and Geographic_Area (RECODE of w3_hhgeo2011)]
  • 25. 25
    - An augmented partial residual plot for each variable will give a clearer picture.
    - Looking at the augmented partial residual plots for race and best_age, there is no extreme deviation from linearity, so we can say that they are linear. Note: an augmented partial residual plot cannot be drawn for gender, since gender^2 is collinear with gender.
    - Best education and final economic sector also do not have extreme deviations from linearity, and we can say that they are linear.
    [augmented partial residual plots for the Race, Best_Age, Best_Education and Final_Economic_Sector variables; y-axis: augmented component plus residual]
  • 26. 26
    - Final occupation does not have large deviations from linearity, and geographic area has larger deviations, but not large enough to say that it is non-linear. [augmented partial residual plots for the Final_Occupation and Geographic_Area variables; y-axis: augmented component plus residual]
    - Overall, all variables satisfy the condition of linear in parameters except for the gender variable; therefore our model does not satisfy this condition unless we remove the gender variable from the model.
    3. Random Sampling
    - “We have random sample of n observations…following the population model…” (Wooldridge 2014: 72).
    - We will use the “runtest” command in STATA to test for random order, using the median as the threshold; the number of runs compared to the number of observations will tell us whether there is random sampling or not.
    a) Table 4: Test for Random Sampling Race Variable
    N(race <= 1) = 34374
    N(race > 1) = 7676
    obs = 42050
    N(runs) = 3600
    z = -146.26
    Prob>|z| = 0
    o There are 3600 runs in these 42050 observations. The large number of runs indicates that the residuals are serially independent, or that they are random.
  • 27. 27 Table 5: Test for Random Sampling in Gender variable
    N(gender <= 2) = 42050
    N(gender > 2) = 0
    obs = 42050
    N(runs) = 1
    z = .
    Prob>|z| = .
    o There is 1 run in 42050 observations. The number of runs is too low for the residuals to be serially independent; therefore, it violates the random sampling assumption.
    Table 6: Test for Random Sampling in Best_age variable
    N(best_age <= 23) = 21695
    N(best_age > 23) = 20275
    obs = 41970
    N(runs) = 18935
    z = -19.81
    Prob>|z| = 0
    o There are 18935 runs in 41970 observations. The large number of runs indicates randomness, or serial independence of the residuals.
    Table 7: Test for Random Sampling in education variable
    N(best_educa~n <= 7) = 21336
    N(best_educa~n > 7) = 18246
    obs = 39582
    N(runs) = 16562
    z = -31.45
    Prob>|z| = 0
  • 28. 28
    o There are 16562 runs in 39582 observations. The large number of runs means that there is randomness, or that the residuals are serially independent.
    Table 8: Test for Random Sampling in Final Economic Sector variable
    N(final_econ~r <= 7) = 2981
    N(final_econ~r > 7) = 2420
    obs = 5401
    N(runs) = 2426
    z = -6.78
    Prob>|z| = 0
    o There are 2426 runs in 5401 observations. The large number of runs indicates that the residuals are serially independent, which means that they are random.
    Table 9: Test for Random Sampling in Final occupation variable
    N(final_occu~n <= 10) = 21426
    N(final_occu~n > 10) = 0
    obs = 21426
    N(runs) = 1
    z = .
    Prob>|z| = .
    o Here we have only one run in 21426 observations. This low number of runs indicates that the residuals are not serially independent, and this violates the random sampling assumption.
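The statistic behind the “runtest” outputs can be sketched in a few lines. The version below is a plain Wald-Wolfowitz runs test with a normal approximation and no continuity correction; the function names are illustrative, and it is an assumption that this matches what STATA computes, although it does reproduce the z = -146.26 reported in Table 4 from the counts given there. Under the usual reading of the test, a z-value far from zero with Prob>|z| close to 0 counts as evidence against random order, so the number of runs is best judged against the number expected under randomness rather than on its own.

```python
import math


def runs_z(runs, n_below, n_above):
    """z-statistic of the runs test from the run count and the two group sizes."""
    n = n_below + n_above
    expected = 2 * n_below * n_above / n + 1
    variance = (2 * n_below * n_above * (2 * n_below * n_above - n)
                / (n ** 2 * (n - 1)))
    return (runs - expected) / math.sqrt(variance)


def runs_test(values, threshold):
    """Return (runs, z, two-sided p) for values split at the threshold."""
    above = [v > threshold for v in values]
    n_above = sum(above)
    n_below = len(above) - n_above
    runs = 1 + sum(1 for prev, cur in zip(above, above[1:]) if prev != cur)
    if n_above == 0 or n_below == 0:
        # All values fall on one side: a single run, and z is undefined.
        return runs, None, None
    z = runs_z(runs, n_below, n_above)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return runs, z, p_value


# Counts reported in Table 4 for the race variable.
print(round(runs_z(3600, 34374, 7676), 2))

# Hypothetical sequences: one mixed, one with every value at the threshold.
print(runs_test([1, 2, 1, 2, 2, 1, 1, 2], threshold=1))
print(runs_test([2, 2, 2, 2], threshold=2))
```

The last call mirrors the gender and final occupation outputs, where no observation lies above the threshold, so there is a single run and z is reported as missing.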
  • 29. 29 Table 10: Test for Random Sampling in Geographic area variable
    N(geographic~a <= 2) = 45214
    N(geographic~a > 2) = 4593
    obs = 49807
    N(runs) = 3137
    z = -139.25
    Prob>|z| = 0
    o Here we have 3137 runs in 49807 observations. The large number of runs means that the residuals are serially independent and the variable meets the requirements for random sampling.
    - The random sampling assumption has been violated, since final occupation and gender do not have serially independent residuals. These variables will have to be removed in order for the model to satisfy this assumption.
    4. No perfect collinearity
    - “In the sample (and therefore in the population), none of the independent variables is constant, and there are no exact linear relationships among the independent variables” (Wooldridge 2014: 72).
    Table 11: Collinearity Diagnostics
    Variable                VIF    SQRT VIF   Tolerance   R-squared
    Race                    1.13   1.06       0.8883      0.1117
    Gender                  1.00   1.00       0.9960      0.0040
    Best_Age                1.14   1.07       0.8776      0.1224
    Best_Education          1.16   1.08       0.8631      0.1369
    Final_Economic_Sector   1.10   1.05       0.9093      0.0907
    Final_Occupation        1.06   1.03       0.9470      0.0530
    Geographic_Area         1.11   1.05       0.8988      0.1012
    Mean VIF                1.10
    Condition Number        28.7607
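The columns of Table 11 are tied together by two identities: tolerance = 1 - R² and VIF = 1 / tolerance, where R² comes from regressing one independent variable on all the others. The following minimal sketch recomputes the tolerance and VIF columns from the R-squared column of the table; the function names are illustrative.

```python
def tolerance(r_squared):
    """Share of a regressor's variance not explained by the other regressors."""
    return 1.0 - r_squared


def vif(r_squared):
    """Variance inflation factor: the reciprocal of the tolerance."""
    return 1.0 / tolerance(r_squared)


# R-squared column copied from Table 11.
aux_r2 = {
    "Race": 0.1117,
    "Gender": 0.0040,
    "Best_Age": 0.1224,
    "Best_Education": 0.1369,
    "Final_Economic_Sector": 0.0907,
    "Final_Occupation": 0.0530,
    "Geographic_Area": 0.1012,
}

for name, r2 in aux_r2.items():
    print(f"{name}: tolerance = {tolerance(r2):.4f}, VIF = {vif(r2):.2f}")
```

A VIF of 10 corresponds to an R² of 0.9 in the auxiliary regression, so values close to 1, as in Table 11, mean that each regressor is only weakly explained by the others.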
  • 30. 30
    - We can use the VIF (variance inflation factor) command after running the regression in order to test for multicollinearity. If the VIF is greater than 10, collinearity could exist, and if the tolerance value is less than 0.1, the variable could be a linear combination of other independent variables (UCLA 2006).
    - From the above table we can see that the VIF value for each variable is less than 10 and the tolerance value for each variable is greater than 0.1. Therefore, there is no perfect collinearity in this model and the assumption has been satisfied.
    5. Zero conditional mean
    - “The error u has an expected value of zero given any values of the independent variables” (Wooldridge 2014: 74).
    Figure 13: Test for Zero Conditional Mean [scatter plot “Residual Versus Fitted Values”; x-axis: fitted values (7 to 9.5), y-axis: residual values]
    - There does not seem to be any clear pattern in the residuals plotted against the fitted values. Together with what we saw for the previous assumption of collinearity, and since conditional variances depend on conditional means, we can conclude that the “zero conditional mean” assumption has been met.
  • 31. 5. Homoscedasticity
    - “The error u has the same variance given any values of the explanatory variables” (Wooldridge 2014: 81).

    Figure 14: Test for Homoscedasticity
    [Figure: "Homoscedasticity of Residuals"; y-axis: Residuals (-4 to 2), x-axis: Fitted values (7 to 9.5)]

    - The data points appear to get closer to each other towards the right of the graph, which indicates heteroscedasticity, so the assumption fails.

    Overall, we have seen that the random sampling and homoscedasticity assumptions have been violated. Therefore this model is flawed and cannot be used for any type of analysis.
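    The homoscedasticity check above is visual. A formal test can complement it, although the analysis reported here did not run one. The sketch below is a minimal illustration of a Breusch-Pagan-type test in Koenker's n·R² form, using the fitted values as the only regressor in the auxiliary regression. The data are simulated so that the residual spread narrows as the fitted values grow, loosely imitating the pattern described for Figure 14; they are not the survey data. The function name koenker_bp_test is hypothetical.

```python
import math
import random


def koenker_bp_test(fitted, residuals):
    """Return (LM, p-value) from regressing squared residuals on fitted values."""
    n = len(fitted)
    u2 = [e * e for e in residuals]
    mean_f = sum(fitted) / n
    mean_u2 = sum(u2) / n
    sxx = sum((f - mean_f) ** 2 for f in fitted)
    sxy = sum((f - mean_f) * (v - mean_u2) for f, v in zip(fitted, u2))
    syy = sum((v - mean_u2) ** 2 for v in u2)
    # With one regressor plus an intercept, R^2 equals the squared correlation.
    r2 = (sxy * sxy) / (sxx * syy)
    lm = n * r2  # approximately chi-square with 1 degree of freedom under H0
    p_value = math.erfc(math.sqrt(lm / 2.0))  # upper tail of chi-square(1)
    return lm, p_value


# Simulated data only (assumption): error spread shrinks as fitted values rise.
random.seed(0)
fitted = [random.uniform(7.0, 9.5) for _ in range(5000)]
residuals = [random.gauss(0.0, (10.0 - f) / 2.0) for f in fitted]

lm, p_value = koenker_bp_test(fitted, residuals)
print(f"LM = {lm:.1f}, p-value = {p_value:.3g}")
```

    A small p-value rejects the null hypothesis of constant error variance. On the real data, the fitted values and residuals from the income regression would take the place of the simulated lists.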
  • 32. Concluding remarks and Recommendations
    The South African labour market presents opportunities for participants to earn income that determines their welfare. These earning opportunities differ by gender, economic sector, location, race, education and occupation. The main objective of this project was to identify and analyse the determinants and distribution of income in the South African labour market. We find that, among all the factors identified, race, education and economic sector are the most significant variables for explaining the variation of income in our model. From the empirical analysis, whites hold the largest share of income in the labour market. Age also affects the level of income: the literature suggests that older people are mostly skilled and would prefer to enjoy their retirement at home, so the salary packages offered to draw them back into the labour market must be very lucrative. Gender turned out to be an insignificant factor in our model. Although the average income distribution between men and women looks unequal, since the demise of Apartheid men and women have been given equal access to opportunities, the labour market is becoming more feminized, and income inequality based on gender is gradually decreasing. Income levels also differ by location: they are lower in rural areas than in urban areas. The outcome of the labour market analysis in this study shows that income inequality is more pronounced within races. The rich are getting richer and the poor even poorer, which contributes largely to South Africa's poor Gini coefficient. Education is another influential determinant of labour income. When we tested our model against the multiple linear regression assumptions, it failed, meaning that we cannot use the model to make any predictions. Income inequality can be effectively addressed, and the labour market is the primary area where this should happen.
    In conclusion, equitable policies directed at human capital development could be of utmost help in reducing the unequal distribution of income within race groups. As we have seen, the further one steps up the education ladder, the higher one's chances of being paid a higher wage. More equitable educational opportunities should therefore be created, which would eventually contribute to a more equitable distribution of labour income.
  • 33. REFERENCES
    Ahn, N. & Garcia-Pérez, J.I. 2002. Unemployment duration and workers' wage aspirations in Spain. [Online] Available at http://www.econ.upf.edu/docs/papers/downloads/426.pdf [Accessed 18 August 2015]
    Burger, P. 2015. Unpacking labour’s declining income share: manufacturing, mining and growing inequality. [Online] Available at http://www.econ3x3.org/article/unpacking-labour%E2%80%99s-declining-income-share-manufacturing-mining-and-growing-inequality#sthash.hY5pLzqE.dpuf [Accessed 15 August 2015]
    Bhorat, H. 2004. Labour Market Challenges in the Post-Apartheid South Africa. Working Paper 72 DPRU. Cape Town: Development Policy Research Unit, University of Cape Town. [Online] Available at http://www.sarpn.org/documents/d0000747/P832-Labour_Market_Challenges-Bhorat.pdf [Accessed 13 August 2015]
    Bhorat, H., Van der Westhuizen, C. & Jacobs, T. 2009. Income and Non-Income Inequality in Post-Apartheid South Africa: What are the Drivers and Possible Policy Interventions? DPRU Working Paper 09/138. Cape Town: Development Policy Research Unit, University of Cape Town. [Online] Available at http://www.dpru.uct.ac.za/sites/default/files/image_tool/images/36/DPRU%20WP09-138.In [Accessed 13 August 2015]
    Burger, R. & Woolard, I. 2005. The State of the Labour Market in South Africa after the first decade of democracy. Working paper 133 DPR. Cape Town: Development Policy Research Unit, University of Cape Town. [Online] Available at http://www.cssr.uct.ac.za/sites/cssr.uct.ac.za/files/pubs/wp133.pdf [Accessed 13 August 2015]
  • 34.
    Hofmeyr, J.F. 2001. Segmentation in the South African Labour Market in 1999. Paper presented at DPRU/FES Conference on Labour Markets and Poverty in South Africa, Johannesburg, 15-16 November. Quoted in Burger, R. & Woolard, I. 2005. The State of the Labour Market in South Africa after the first decade of democracy. Working paper 133 DPR. Cape Town: Development Policy Research Unit, University of Cape Town.
    Introduction to SAS. UCLA: Statistical Consulting Group. [Online] Available at http://www.ats.ucla.edu/stat/sas/notes2/ [Accessed 29 September 2015]
    Nattrass, N. & Walker, R. 2005. Unemployment and Reservation wages in Working-Class Cape Town. South African Journal of Economics, 73(3) pp. 503-505. Available from http://onlinelibrary.wiley.com/doi/10.1111/j.1813-6982.2005.00034.x/epdf [Accessed 13 August 2015]
    OECD. 2011. Record inequality between rich and poor. Available from https://youtu.be/ZaoGscbtPWU [Accessed 17 August 2015]
    Sinha, R. 2010. What Factors Affect Your Income? Knoji Consumer Knowledge, 22 April. Available from https://income-wages-earningpower.knoji.com/what-factors-affect-your-income/ [Accessed 07 August 2015]
  • 35.
    Stiglitz, J.E. 2013. Inequality is a choice. The New York Times, 13 October 2013. Available from http://opinionator.blogs.nytimes.com/2013/10/13/inequality-is-a-choice/ [Accessed 18 August 2015]
    Trading Economics. 2009. GINI index in South Africa. [Online] Available from http://www.tradingeconomics.com/south-africa/gini-index-wb-data.html [Accessed 10 August 2015]
    The World Bank. 2011. GINI index (World Bank estimate). [Online] Available from http://data.worldbank.org/indicator/SI.POV.GINI [Accessed 11 August 2015]
    US Census Bureau. 2000. Earnings by Opportunities and Education. Available from http://www.census.gov/hhes/www/income/data/earnings/call1usmale.html [Accessed 12 August 2015]
    Van Der Berg, S. 2010. Current poverty and income distribution in the context of South African history. Stellenbosch Economic Working Papers: 22/10.
    Wooldridge, J.M. 2014. Introduction to Econometrics. Europe, Middle East & Africa Edition. Cengage.