2017 World Happiness Report Data Analysis
Prepared by:
Achilleas Papatsimpas
Mathematician, M.Sc. Statistics and Operational Research
For this project we use the 2017_world_happiness.csv dataset from Kaggle. The
dataset covers 155 countries and records characteristics such as the Happiness
Score and the Economy GDP per capita. The aim of the project is to analyze the
factors that affect the world happiness score and to investigate possible
correlations using Python.
I. Variable Definitions (Helliwell et al., 2017)
Happiness score or subjective well-being: the national average response to the
question of life evaluations. The English wording of the question begins: “Please
imagine a ladder, with steps numbered from 0 at the bottom to 10 at the top.”
Economy..GDP.per.Capita.: GDP per capita in purchasing power parity (PPP) at
constant 2011 international dollar prices.
Healthy Life Expectancy (HLE): the time series of total life expectancy is adjusted
to healthy life expectancy by simple multiplication, assuming that the ratio of
healthy to total life expectancy remains constant within each country over the
sample period.
Family: the national average of the binary responses (either 0 or 1) to the Gallup
World Poll (GWP) question “If you were in trouble, do you have relatives or friends
you can count on to help you whenever you need them, or not?”
Freedom: Freedom to make life choices is the national average of responses to
the GWP question “Are you satisfied or dissatisfied with your freedom to choose
what you do with your life?”.
Generosity: the residual of regressing the national average of responses to the GWP
question “Have you donated money to a charity in the past month?” on GDP
per capita.
Trust..Government.Corruption.: The measure is the national average of the
survey responses to two questions in the GWP: “Is corruption widespread
throughout the government or not” and “Is corruption widespread within
businesses or not?” The overall perception is just the average of the two 0-or-1
responses.
Dystopia: an imaginary country that has the world’s least-happy people. The
purpose in establishing Dystopia is to have a benchmark against which all
countries can be favorably compared (no country performs more poorly than
Dystopia) in terms of each of the six key variables, thus allowing each sub-bar
to be of positive width.
First, we import the data in Python:
import pandas as pd # data processing
import seaborn as sns
import matplotlib.pyplot as plt
my_data=pd.read_csv("../input/2017_world_hapiness.csv")
my_data.head()
II. Descriptive Statistics for the Happiness Score
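print(my_data['Happiness.Score'].describe())
plt.figure(figsize=(9, 8))
# sns.distplot is deprecated; histplot with kde=True and stat='density'
# draws the same kind of histogram with a density curve
sns.histplot(my_data['Happiness.Score'], color='g', bins=10,
kde=True, stat='density', alpha=0.4);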
count 155.000000
mean 5.354019
std 1.131230
min 2.693000
25% 4.505500
50% 5.279000
75% 6.101500
max 7.537000
Name: Happiness.Score, dtype: float64
The mean Happiness Score across the 155 countries is 5.354, which indicates a
moderate level of satisfaction with life on average. Because this is an unweighted
mean of national averages, it describes the typical country rather than the global
population. The standard deviation is 1.131 and the median is 5.279. The mean and
median are close, and the histogram above looks roughly symmetric and bell-shaped,
so Happiness.Score appears to be approximately normally distributed.
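The visual impression of normality can be checked with a formal test. The sketch below applies the Shapiro-Wilk test from SciPy. The helper name check_normality, the 0.05 threshold and the synthetic sample are illustrative assumptions and not part of the original analysis; with the real data, my_data['Happiness.Score'] would be passed instead of the sample.

```python
import numpy as np
import pandas as pd
from scipy import stats


def check_normality(values, alpha=0.05):
    """Run a Shapiro-Wilk test; return (statistic, p_value, looks_normal)."""
    clean = pd.Series(values).dropna()
    statistic, p_value = stats.shapiro(clean)
    # Failing to reject the null hypothesis is not proof of normality.
    return statistic, p_value, bool(p_value > alpha)


# Hypothetical stand-in sample using the reported mean and standard deviation.
rng = np.random.default_rng(0)
sample = pd.Series(rng.normal(loc=5.354, scale=1.131, size=155))

statistic, p_value, looks_normal = check_normality(sample)
print(f"W = {statistic:.3f}, p = {p_value:.3f}, normal at 5%: {looks_normal}")
```

A p-value above the threshold only means that the data are compatible with a normal distribution, so the result is best read together with the histogram.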
III. Descriptive Statistics for all Variables
my_data.describe()
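The mean of Economy..GDP.per.Capita. is 0.984718 and the mean of Freedom is
0.408786. In the Kaggle dataset these columns are expressed as each factor's
estimated contribution to the Happiness Score rather than as raw survey averages,
so they are better compared with one another than read as proportions: on average,
freedom to make life choices contributes less than half as much as GDP per capita.
Read on its own, the low Freedom mean might suggest dissatisfaction with the freedom
to make life choices, but this reading should be treated with caution.
Additionally, the mean of Trust..Government.Corruption. is 0.123120, a low value
which suggests that corruption is widely perceived in government and business.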
IV. Boxplots for the variables
boxplot = my_data.boxplot(column=['Happiness.Score',
'Economy..GDP.per.Capita.', 'Family',
'Health..Life.Expectancy.', 'Freedom', 'Generosity',
'Dystopia.Residual', 'Trust..Government.Corruption.'],
figsize=(18,8))
The boxplots above show outliers for the variables Family, Generosity, Dystopia
Residual and Trust Government Corruption. Also, the variance of Happiness.Score is
higher than that of the other variables.
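The outliers seen in the boxplots can also be counted numerically. The sketch below uses the 1.5 x IQR rule, which is the default whisker rule of matplotlib boxplots. The function name count_iqr_outliers and the toy frame are hypothetical; with the real data, my_data would be passed instead.

```python
import pandas as pd


def count_iqr_outliers(frame, k=1.5):
    """Count values beyond k * IQR from the quartiles, per numeric column."""
    numeric = frame.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # Comparisons align the per-column bounds with the frame's columns.
    outside = (numeric < q1 - k * iqr) | (numeric > q3 + k * iqr)
    return outside.sum()


# Hypothetical toy frame: column 'a' has one extreme value, column 'b' has none.
toy = pd.DataFrame({"a": [1, 2, 3, 4, 100], "b": [1, 2, 3, 4, 5]})
outlier_counts = count_iqr_outliers(toy)
print(outlier_counts)
```

Columns with a nonzero count are the ones whose boxplots show points beyond the whiskers.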
V. Histograms for all the variables
my_data.hist(figsize=(16, 20), bins=20, xlabelsize=8,
ylabelsize=8);
From the above histograms it seems that:
Economy..GDP.per.Capita., Family, Freedom and Healthy Life Expectancy (HLE) are
left skewed.
Generosity and Trust..Government.Corruption. are right skewed.
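The direction of skew read from the histograms can be confirmed with the sample skewness, which is negative for a left-skewed variable and positive for a right-skewed one. The sketch below relies on pandas' DataFrame.skew; the helper name describe_skew, the 0.5 cut-off and the toy columns are assumptions made for illustration.

```python
import pandas as pd


def describe_skew(frame, threshold=0.5):
    """Label each numeric column as left skewed, right skewed or roughly symmetric."""
    skew = frame.select_dtypes(include="number").skew()

    def label(value):
        if value < -threshold:
            return "left skewed"
        if value > threshold:
            return "right skewed"
        return "roughly symmetric"

    return pd.DataFrame({"skewness": skew, "shape": skew.map(label)})


# Hypothetical toy data: a long left tail, a long right tail and a symmetric column.
toy = pd.DataFrame({
    "left_tail": [1, 7, 8, 9, 9, 10, 10, 10],
    "right_tail": [0, 0, 0, 1, 1, 2, 3, 9],
    "symmetric": [1, 2, 3, 4, 5, 6, 7, 8],
})
skew_table = describe_skew(toy)
print(skew_table)
```

With the real data, describe_skew(my_data) would give one row per numeric column.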
VI. Correlations with Happiness Score
Next, we will investigate which variables are strongly correlated with Happiness Score.
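# numeric_only=True skips non-numeric columns (requires pandas 1.5 or later)
my_data_corr = my_data.corr(numeric_only=True)['Happiness.Score'][:-1]
# keep only the variables whose absolute correlation exceeds 0.5
golden_features_list = my_data_corr[abs(my_data_corr) >
0.5].sort_values(ascending=False)
print("There is {} strongly correlated values with Happiness.Score:\n{}".format(
len(golden_features_list), golden_features_list))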
There is 8 strongly correlated values with Happiness.Score:
Happiness.Score 1.000000
Economy..GDP.per.Capita. 0.812469
Health..Life.Expectancy. 0.781951
Family 0.752737
Freedom 0.570137
Happiness.Rank -0.992774
Name: Happiness.Score, dtype: float64
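Economy..GDP.per.Capita., Health..Life.Expectancy., Family and Freedom are strongly
and positively correlated with Happiness Score (r between 0.57 and 0.81). With 155
countries, correlations of this size are statistically significant. Happiness.Rank
is almost perfectly negatively correlated with the score because the rank is
derived from it.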
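The correlation coefficients printed above do not come with p-values. A minimal sketch of how significance could be checked is given below, using scipy.stats.pearsonr. The function name correlation_table and the synthetic columns are hypothetical; with the real data, the call would be correlation_table(my_data, 'Happiness.Score').

```python
import numpy as np
import pandas as pd
from scipy import stats


def correlation_table(frame, target):
    """Pearson r and two-sided p-value of each numeric column against target."""
    numeric = frame.select_dtypes(include="number").dropna()
    rows = []
    for column in numeric.columns.drop(target):
        r, p_value = stats.pearsonr(numeric[column], numeric[target])
        rows.append({"variable": column, "r": r, "p_value": p_value})
    table = pd.DataFrame(rows).set_index("variable")
    return table.sort_values("r", ascending=False)


# Hypothetical synthetic data: 'driver' is built to track 'score'; 'noise' is not.
rng = np.random.default_rng(0)
score = rng.normal(5.354, 1.131, size=155)
toy = pd.DataFrame({
    "score": score,
    "driver": 0.3 * score + rng.normal(0, 0.2, size=155),
    "noise": rng.normal(0, 1, size=155),
})
table = correlation_table(toy, "score")
print(table)
```

A small p-value indicates that a correlation of that size would be unlikely if the two variables were unrelated; it says nothing about the direction of cause.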
A heat map of the pairwise correlations, with Happiness.Rank excluded and only
coefficients of at least 0.5 or at most -0.4 displayed, is presented below.
# numeric_only=True skips non-numeric columns (requires pandas 1.5 or later)
corr = my_data.drop('Happiness.Rank', axis=1).corr(numeric_only=True)
plt.figure(figsize=(12, 10))
sns.heatmap(corr[(corr >= 0.5) | (corr <= -0.4)],
cmap='viridis', vmax=1.0, vmin=-1.0,
linewidths=0.1,
annot=True, annot_kws={"size": 8},
square=True);
Higher GDP per capita, Family, Healthy life expectancy and Freedom levels are all
associated with a higher Happiness Score.
VII. Scatter plots
plt.scatter(my_data['Happiness.Score'],
my_data['Economy..GDP.per.Capita.'], edgecolors='r')
plt.xlabel('Happiness Score')
plt.ylabel('GDP per Capita')
plt.show()
We plotted Happiness Score against GDP per capita, Family, Healthy life expectancy
and Freedom; the code for the GDP per capita plot is shown above. Each plot shows a
positive correlation: as these variables increase, so does the happiness score. This
indicates which factors go together with higher happiness in certain countries,
although correlation alone does not establish causation.
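Only the GDP per capita plot is coded above. One possible way to draw all four plots in a single figure is sketched below. The helper name scatter_against_score and the synthetic stand-in data are assumptions; the column names are those used elsewhere in this analysis.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def scatter_against_score(frame, score_col, factor_cols):
    """Draw one scatter plot per factor, with the score on the x-axis."""
    fig, axes = plt.subplots(1, len(factor_cols),
                             figsize=(4 * len(factor_cols), 4),
                             sharex=True, squeeze=False)
    for ax, column in zip(axes[0], factor_cols):
        ax.scatter(frame[score_col], frame[column], edgecolors='r')
        ax.set_xlabel(score_col)
        ax.set_ylabel(column)
    fig.tight_layout()
    return fig


factors = ['Economy..GDP.per.Capita.', 'Family',
           'Health..Life.Expectancy.', 'Freedom']

# Hypothetical stand-in data; with the real file, pass my_data instead of toy.
rng = np.random.default_rng(0)
toy = pd.DataFrame({'Happiness.Score': rng.normal(5.354, 1.131, size=155)})
for name in factors:
    toy[name] = 0.2 * toy['Happiness.Score'] + rng.normal(0, 0.2, size=155)

fig = scatter_against_score(toy, 'Happiness.Score', factors)
```

Calling plt.show() afterwards displays the figure; sharing the x-axis keeps the four panels directly comparable.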
VIII. Conclusions
The mean Happiness Score across the 155 countries is 5.354, which indicates
moderate life satisfaction on average. Economy..GDP.per.Capita.,
Health..Life.Expectancy., Family and Freedom are strongly and significantly
correlated with Happiness Score, as shown by the heat map and the scatter plots.
IX. Bibliography
Helliwell, J. F., Huang, H. and Wang, S. (2017), "The social foundations of
world happiness", World Happiness Report 2017.