Business statistics


QUALITY OF LIFE IN G20 COUNTRIES

Matteo Biagini


Index:
Introduction
Correlation Matrix
Regression Model
Factor Analysis
Cluster Analysis
Conclusion


INTRODUCTION

The aim of this research is to investigate how the quality of life in G20 countries relates to a set of quality-of-life indicators.

By quality of life we mean the general well-being of individuals and societies. The term is used in a wide range of contexts, including international development, healthcare and politics. Standard indicators of quality of life include not only wealth and employment, but also the built environment, physical and mental health, education, recreation and leisure time, and social belonging.

From the many available indicators, we have chosen eight.

Life expectancy is a key indicator of the general health of the population. Improvements in overall life expectancy reflect improvements in social and economic conditions, lifestyle, access to health services and medical advances. This indicator uses estimated life expectancy at birth.

CO2 emissions and terrestrial protected areas are indicators of how the natural environment supports its people, economy and culture. As the population grows and economic activity increases, more demands are placed on the natural environment. Environmental issues also affect the economy and public health; for this reason, another indicator we have chosen is health expenditure per capita, which is closely related to the previous ones.

Urban population is included because population growth and change in cities affect the relationships people have with others and their sense of belonging to an area.

The concept of community is fundamental to people’s overall quality of life and sense of belonging. For this reason we have chosen subsidies and other transfers as an indicator of quality of life: they are an instrument through which governments redistribute wealth among the people of a country.

Public expenditure on education provides insight into the knowledge and skills of residents and how they can apply them to improve their quality of life. Educational achievement is essential for effective participation in society.

The last indicator is unemployment: a reduction in unemployment helps stimulate further opportunities for economic growth and development within a community or nation.


The countries considered (G20 countries, which are among the richest in the world) are: Canada, France, Germany, Japan, Italy, Russian Federation, United States, United Kingdom, Brazil, China, India, South Africa, Australia, Saudi Arabia, South Korea, Indonesia, Mexico, Turkey, Spain and the Netherlands.

The source of the data is the World Bank's World DataBank, in the World Development Indicators (WDI) section.

All data refer to the year 2008.

The software used in this project:

• Gretl (regression)

• R Project (factor and cluster analysis)

• Microsoft Excel (preparation of the data matrix, before and after using R)

The variables are labeled X1 to X8:

• X1 = CO2 emissions (kg per 2000 US$ of GDP)

• X2 = Urban population

• X3 = Health expenditure per capita (current US$)

• X4 = Life expectancy at birth, total (years)

• X5 = Unemployment, total (% of total labor force)

• X6 = Public spending on education, total (% of GDP)

• X7 = Subsidies and other transfers (% of expense)

• X8 = Terrestrial protected areas (% of total land area)


CORRELATION MATRIX

         X1       X2       X3       X4       X5       X6       X7       X8
X1   1.0000   0.4108  -0.6168  -0.7387   0.2370  -0.4123  -0.0290  -0.2151
X2            1.0000  -0.2571  -0.2300  -0.2166  -0.5982  -0.1159  -0.0277
X3                     1.0000   0.6361  -0.2003   0.4932   0.3154   0.1806
X4                              1.0000  -0.6507   0.2132   0.2230   0.2105
X5                                       1.0000   0.0424  -0.0984  -0.1525
X6                                                1.0000   0.0872   0.2719
X7                                                         1.0000   0.1855
X8                                                                  1.0000

The data show that the correlations are generally not very high, but there are enough of them to justify running a factor analysis. The values, computed with R, are Pearson correlation coefficients. We consider a correlation strong if |r| > 0.7 and moderate if 0.3 < |r| < 0.7, where the sign only indicates the direction. On this basis there is a strong (negative) correlation between X4 and X1, and there are moderate correlations between X1 and X2, X3 and X6; between X2 and X6; between X3 and X4, X6 and, marginally, X7 (0.3154); and finally between X4 and X5.
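As a minimal sketch of these two steps, assuming NumPy and a data matrix with one row per country and one column per variable (X1 to X8), `np.corrcoef` gives the Pearson matrix, and the hypothetical helpers `strength` and `notable_pairs` apply the 0.3 and 0.7 thresholds to |r|. The helper names are illustrative only; the reported matrix was produced with the software named above, not with this code.

```python
import numpy as np


def pearson_matrix(data):
    """Pearson correlation matrix of the columns of `data` (one row per country)."""
    return np.corrcoef(np.asarray(data, dtype=float), rowvar=False)


def strength(r, strong=0.7, moderate=0.3):
    """Label a coefficient with the thresholds used in the text, applied to |r|."""
    a = abs(r)
    if a > strong:
        return "strong"
    if a > moderate:
        return "moderate"
    return "weak"


def notable_pairs(corr, names):
    """(name_i, name_j, r, label) for every pair above the moderate threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            label = strength(corr[i, j])
            if label != "weak":
                pairs.append((names[i], names[j], round(float(corr[i, j]), 4), label))
    return pairs
```

For instance, `strength(-0.7387)` (X1 with X4) returns `"strong"`, while `strength(-0.6507)` (X4 with X5) returns `"moderate"`.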


REGRESSION MODEL

Model 1: OLS, using observations 1-20
Dependent variable: life expectancy at birth

                                          Coefficient   Std. error   t-ratio   p-value
const                                         88.4781      8.19707   10.7939  <0.00001  ***
CO2 emissions (kg per 2000 US$ of GDP)       -3.18062      1.18728   -2.6789   0.02008  **
Urban population                         -1.19832e-08  8.08775e-09   -1.4817   0.16421
Health expenditure per capita              0.00106495  0.000551237    1.9319   0.07732  *
Unemployment, total                         -0.903724     0.206679   -4.3726   0.00091  ***
Public spending on education                 -1.75829      1.13982   -1.5426   0.14888
Subsidies and other transfers               0.0396108    0.0953704    0.4153   0.68523
Terrestrial protected areas                  0.026664    0.0893965    0.2983   0.77060

R-squared            0.865092
Adjusted R-squared   0.786395
P-value(F)           0.000221

(*** significant at 1%, ** at 5%, * at 10%)

With Gretl we estimated the model by ordinary least squares (OLS). The dependent variable is life expectancy at birth and the other seven variables are the regressors. R-squared is 0.865: the regressors explain about 86.5% of the variance of life expectancy across the 20 countries (adjusted R-squared 0.786), so the model as a whole fits the data very well. The p-value of the F test is also very low (0.000221), so the model as a whole is significant at every conventional level (1%, 5% and 10%). The regressors with a significant p-value are CO2 emissions, health expenditure per capita and unemployment.

For CO2 emissions the p-value (0.020) is smaller than 0.05, so we reject the null hypothesis that the coefficient is zero and conclude that CO2 emissions have a significant impact on life expectancy at birth at the 5% level.

For health expenditure per capita the p-value (0.077) is smaller than 0.1, so we reject the null hypothesis and conclude that it has a significant impact on life expectancy at birth at the 10% level.

Finally, for unemployment the p-value (0.00091) is smaller than 0.01, so we reject the null hypothesis and conclude that total unemployment has a significant impact on life expectancy at birth at the 1% level.

Holding the other regressors constant, we can conclude that an increase in CO2 emissions of 1 kg per 2000 US$ of GDP is associated with a reduction in life expectancy at birth of 3.18 years.

An increase in health expenditure per capita of 1 current US$ is associated with an increase in life expectancy at birth of 0.00106495 years, that is, about 1.06 years for every 1,000 US$. Finally, an increase in total unemployment of 1 percentage point is associated with a reduction in life expectancy at birth of 0.90 years.
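The OLS quantities in the table can be reproduced in outline with NumPy. This is a sketch, not Gretl's implementation: `ols` and `adjusted_r2` are hypothetical helper names, the regressors are passed without the constant, and p-values are omitted because they require the Student t distribution (with n = 20 and k = 7 regressors there are 20 - 7 - 1 = 12 residual degrees of freedom).

```python
import numpy as np


def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations, k regressors and a constant."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def ols(y, X):
    """OLS with a constant: coefficients, std. errors, t-ratios, R^2, adjusted R^2."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape                          # k regressors, constant excluded
    Z = np.column_stack([np.ones(n), X])    # prepend the constant column
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    sigma2 = resid @ resid / (n - k - 1)    # residual variance, n - k - 1 d.o.f.
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Z.T @ Z)))
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, se, beta / se, r2, adjusted_r2(r2, n, k)
```

With the reported values, `adjusted_r2(0.865092, 20, 7)` gives 1 - 0.134908 × 19/12 ≈ 0.7864, consistent with the adjusted R-squared in the table. Each coefficient is a marginal effect, so the effect of a larger change is the coefficient times that change: 0.00106495 × 1000 ≈ 1.06 years per 1,000 US$ of health expenditure.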

FACTOR ANALYSIS

To run the factor analysis we applied the principal component method in R, which gives the following eigenvalues, proportions of total variance and cumulative proportions of total variance.

Component  Eigenvalue    Proportion of variance   Cumulative proportion
1          3.13602447    0.3920031                0.3920031
2          1.59218446    0.1990231                0.5910261
3          1.06125308    0.1326566                0.7236828
4          0.88797144    0.1109964                0.8346792
5          0.55766918    0.06970865               0.90438783
6          0.48900580    0.06112573               0.96551355
7          0.19844296    0.02480537               0.99031892
8          0.07744861    0.009681076              1.000000000


To choose how many factors to retain we applied the Kaiser criterion: we kept the components with eigenvalue > 1 and dropped all the others.

An eigenvalue is approximately the number of original variables whose variance the factor represents; since the eight variables are standardized, the eigenvalues sum to 8.

The table shows that the first three components explain 72.37% of the total original variability.
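A sketch of how the eigenvalue table and the Kaiser criterion can be computed, assuming NumPy and, as input, either the eigenvalues or the 8 × 8 correlation matrix of the standardized variables; `eigen_summary`, `eigenvalues_of` and `kaiser_count` are hypothetical names. Because the variables are standardized, the eigenvalues sum to 8, so each proportion is the eigenvalue divided by 8.

```python
import numpy as np


def eigen_summary(eigvals):
    """Eigenvalues sorted in decreasing order, with proportion and cumulative
    proportion of the total variance (the sum of the eigenvalues)."""
    ev = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    prop = ev / ev.sum()
    return ev, prop, np.cumsum(prop)


def eigenvalues_of(corr):
    """Eigenvalues of a symmetric correlation matrix."""
    return np.linalg.eigvalsh(np.asarray(corr, dtype=float))


def kaiser_count(eigvals):
    """Kaiser criterion: number of components with eigenvalue greater than 1."""
    return int(np.sum(np.asarray(eigvals, dtype=float) > 1.0))
```

Applied to the eight reported eigenvalues, `kaiser_count` returns 3 and the third cumulative proportion is 5.78946201 / 8 ≈ 0.7237, the 72.37% quoted above.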

SCREE PLOT

The scree plot shows the same result from another point of view: it puts the components on the x-axis and the corresponding eigenvalues on the y-axis.

The factor loading l_ij is the covariance between the j-th common factor and the i-th original variable. Since the variables are standardized, it coincides with the correlation between the j-th common factor and the i-th original variable, so its minimum value is -1 (perfect negative correlation) and its maximum value is 1 (perfect positive correlation).

[Figure: scree plot; x-axis: components Comp.1 to Comp.8, y-axis: variances (eigenvalues), 0.0 to 3.0]
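To illustrate that loadings of standardized variables are correlations, here is a sketch for the principal component case, assuming NumPy. Each loading is an eigenvector entry multiplied by the square root of its eigenvalue, and it equals the correlation between the variable and the component score. `pc_loadings` is a hypothetical helper; the loadings reported in this document come from the original R analysis, not from this code.

```python
import numpy as np


def pc_loadings(data):
    """Principal-component loadings and scores of the standardized columns of `data`.

    Loadings are eigenvector entries times sqrt(eigenvalue) of the correlation
    matrix, largest eigenvalue first.
    """
    X = np.asarray(data, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized variables
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]                   # decreasing eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(eigvals)               # column j scaled by sqrt(lambda_j)
    scores = Z @ eigvecs                                # component scores
    return loadings, scores
```

Because each loading is a correlation, every entry lies between -1 and 1. The squared loadings in column j sum to the j-th eigenvalue.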


VARIANCE EXPLAINED BY EACH FACTOR

Factor 1   Factor 2   Factor 3
30.11%     22.34%     8.9%

The portion of total variability explained by the first factor is 2.409/8 = 30.11% (SS loadings divided by the total variance, which is 8 for eight standardized variables). The second factor explains 1.787/8 = 22.34% and the third 0.712/8 = 8.9%. Together the three factors explain 61.35% of the total variance, which indicates that the model is quite good.
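This arithmetic (SS loadings divided by 8) can be checked with a short NumPy sketch. The matrix below transcribes the unrotated loadings reported in the factor loading matrix, with rows in the X1 to X8 order of the variable list. Cells printed blank in that matrix are set to 0, and their column positions are an assumption inferred from the reported SS loadings, so the sums match the reported values only to about two decimals. `variance_explained` is a hypothetical helper.

```python
import numpy as np

# Unrotated loadings, rows X1..X8 as in the variable list.
# Blank cells (|loading| < 0.1) are approximated by 0; their positions are inferred.
LOADINGS = np.array([
    [-0.596, -0.349, -0.460],  # X1 CO2 emissions
    [-0.106, -0.640, -0.274],  # X2 urban population
    [ 0.532,  0.430,  0.334],  # X3 health expenditure per capita
    [ 0.923,  0.000,  0.376],  # X4 life expectancy at birth
    [-0.869,  0.325,  0.365],  # X5 unemployment
    [ 0.246,  0.955, -0.148],  # X6 public spending on education
    [ 0.188,  0.000,  0.122],  # X7 subsidies and other transfers
    [ 0.237,  0.216,  0.000],  # X8 terrestrial protected areas
])


def variance_explained(loadings):
    """SS loadings (column sums of squared loadings), their share of the total
    variance p (number of standardized variables) and the cumulative share."""
    L = np.asarray(loadings, dtype=float)
    ss = np.sum(L ** 2, axis=0)
    prop = ss / L.shape[0]
    return ss, prop, np.cumsum(prop)
```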

FACTOR LOADING MATRIX (blank cells: |loading| < 0.1)

                                      Factor 1  Factor 2  Factor 3
CO2 emissions (X1)                      -0.596    -0.349    -0.460
Health expenditure per capita (X3)       0.532     0.430     0.334
Life expectancy at birth (X4)            0.923               0.376
Public spending on education (X6)        0.246     0.955    -0.148
Subsidies and other transfers (X7)       0.188               0.122
Terrestrial protected areas (X8)         0.237     0.216
Unemployment (X5)                       -0.869     0.325     0.365
Urban population (X2)                   -0.106    -0.640    -0.274

SS loadings                              2.409     1.787     0.712
Proportion Var                           0.301     0.223     0.089
Cumulative Var                           0.301     0.525     0.614


FINAL ESTIMATION OF THE COMMUNALITIES

                                       Communality   Specific variance
CO2 emissions (X1)                           0.689               0.311
Health expenditure per capita (X3)           0.580               0.420
Life expectancy at birth (X4)                0.995               0.005
Public spending on education (X6)            0.995               0.005
Subsidies and other transfers (X7)           0.054               0.946
Terrestrial protected areas (X8)             0.105               0.895
Unemployment (X5)                            0.995               0.005
Urban population (X2)                        0.496               0.504

Total                                        4.909

The final estimates of the communalities show that five variables have a communality above 50% and are therefore well explained by the model: CO2 emissions, health expenditure per capita, life expectancy at birth, public spending on education and unemployment. The other three variables (subsidies and other transfers, terrestrial protected areas and urban population) are not explained very well.

Variables with a high communality share more in common with the rest of the variables. Conversely, the specific variance of each observed variable is the portion of its variance that is not accounted for by the common factors.

For this reason, when naming the factors we do not consider subsidies and other transfers or terrestrial protected areas. Urban population, however, has a communality very close to 50% (0.496), so we still take it into account.
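Communalities and specific variances follow directly from the loadings. The communality of a variable is the sum of its squared loadings across the retained factors, and for standardized variables the specific variance is one minus the communality. Below is a NumPy sketch with hypothetical helper names, using three rows of the unrotated loading matrix (blank cells taken as 0) as example input:

```python
import numpy as np


def communalities(loadings):
    """Communality h2_i = sum_j l_ij^2 (row sums of squared loadings) and
    specific variance psi_i = 1 - h2_i for standardized variables."""
    L = np.asarray(loadings, dtype=float)
    h2 = np.sum(L ** 2, axis=1)
    return h2, 1.0 - h2


def well_represented(names, loadings, threshold=0.5):
    """Names of the variables whose communality exceeds the threshold (50% in the text)."""
    h2, _ = communalities(loadings)
    return [name for name, h in zip(names, h2) if h > threshold]


# Example rows from the unrotated loading matrix (blank cells set to 0)
names = ["CO2 emissions", "Urban population", "Subsidies and other transfers"]
rows = [[-0.596, -0.349, -0.460],
        [-0.106, -0.640, -0.274],
        [0.188, 0.0, 0.122]]
h2, psi = communalities(rows)
```

Here CO2 emissions gets about 0.689 and urban population about 0.496, just under the 50% threshold. Subsidies and other transfers gets about 0.05.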


We can now improve the interpretation of the factors by applying a rotation to the factor loading matrix.

ROTATED VARIANCE EXPLAINED BY EACH FACTOR (total = 61.36%)

Factor 1   Factor 2   Factor 3
26.02%     19.9%      15.44%

ROTATED FACTOR LOADING MATRIX (varimax; blank cells: |loading| < 0.1)

                                      Factor 1  Factor 2  Factor 3
CO2 emissions (X1)                      -0.772    -0.301
Health expenditure per capita (X3)       0.645     0.402
Life expectancy at birth (X4)            0.890     0.101    -0.439
Public spending on education (X6)        0.154     0.984
Subsidies and other transfers (X7)       0.221
Terrestrial protected areas (X8)         0.143     0.260    -0.129
Unemployment (X5)                       -0.260               0.962
Urban population (X2)                   -0.343    -0.537    -0.300

SS loadings                              2.082     1.592     1.235
Proportion Var                           0.260     0.199     0.154
Cumulative Var                           0.260     0.459     0.614


With the rotation, the variance explained is clearly more evenly distributed across the factors; most notably, the share of factor 3 rises from 8.9% to 15.44%.

We now want to assign a label to each factor, based on the variables that load most heavily on it.

In naming the latent variables we mainly considered the original variables with communality > 50%. The first factor is mainly explained by CO2 emissions, health expenditure per capita and life expectancy at birth. We have not considered subsidies and other transfers or terrestrial protected areas, because their communality is below 50%.

The second factor is mainly explained by public spending on education and urban population, but only the former has a communality above 50%.

The third factor is explained by unemployment.

In principal components, the first factor describes most of the variability.

After choosing the number of factors to retain, we want to spread the variability among the factors to improve their interpretation. We therefore consider rotated factors, which give a clearer distinction between the meanings of the factors.
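Here is a sketch of varimax, the rotation named in the rotated matrix above, written with NumPy using the common SVD-based iteration. It is illustrative only. It omits the Kaiser row normalization that R's `varimax` applies by default, so its output need not match the reported matrix exactly. The `varimax` function here is a hypothetical stand-alone helper. Because the rotation matrix is orthogonal, each variable's communality and the total explained variance stay the same; only their distribution across the factors changes.

```python
import numpy as np


def varimax(loadings, gamma=1.0, max_iter=500, tol=1e-8):
    """Orthogonal varimax rotation (gamma=1) without Kaiser row normalization.

    Returns (rotated_loadings, R) with rotated_loadings = loadings @ R.
    """
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        # Direction that increases the variance of the squared loadings per column
        target = Lr ** 3 - (gamma / p) * Lr @ np.diag(np.sum(Lr ** 2, axis=0))
        u, s, vt = np.linalg.svd(L.T @ target)
        R = u @ vt                      # nearest orthogonal matrix
        new_objective = float(np.sum(s))
        if objective != 0.0 and new_objective < objective * (1.0 + tol):
            break                       # no meaningful improvement any more
        objective = new_objective
    return L @ R, R
```

As a check, a loading matrix with simple structure that has been rotated by 30 degrees is turned back into one where each variable loads on essentially a single factor.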

NEW LATENT VARIABLES AND ORIGINAL VARIABLES

FACTOR 1: WELFARE AND WELL-BEING
  CO2 emissions (X1)
  Health expenditure per capita (X3)
  Life expectancy at birth (X4)
  Subsidies and other transfers (X7)

FACTOR 2: PUBLIC INTERVENTION ON POPULATION
  Public spending on education (X6)
  Terrestrial protected areas (X8)
  Urban population (X2)

FACTOR 3: UNEMPLOYMENT
  Unemployment (X5)


CLUSTER ANALYSIS

We now want to analyze how the countries can be clustered, using the observed values of the original variables, in order to obtain a few homogeneous groups.

We compared two clustering methods:

1. hierarchical method, using Euclidean distance and Ward's method;

2. hierarchical method, using Euclidean distance and the average linkage method.
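The two procedures can be sketched as a naive agglomerative algorithm in NumPy. This is an illustration under stated assumptions, not the R code used in the analysis. `hclust` here is a hypothetical helper, distances are updated with the standard Lance-Williams formulas, and the Ward update works on unsquared Euclidean distances (the variant R calls `ward.D2`; the source does not say which variant was used). Complete linkage is included as well. The algorithm runs in O(n³) time, which is fine for 20 countries.

```python
import numpy as np


def hclust(X, n_clusters, method="ward"):
    """Naive agglomerative clustering on Euclidean distances.

    method: "ward", "average" or "complete". Clusters are merged until
    n_clusters remain; returns one integer label per row of X.
    """
    if method not in ("ward", "average", "complete"):
        raise ValueError(f"unknown method: {method}")
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    members = {i: [i] for i in range(n)}            # cluster id -> row indices
    dist = {(a, b): D[a, b] for a in range(n) for b in range(a + 1, n)}

    def pair(u, v):
        return (u, v) if u < v else (v, u)

    while len(members) > n_clusters:
        (a, b), d_ab = min(dist.items(), key=lambda item: item[1])
        na, nb = len(members[a]), len(members[b])
        for c in members:
            if c in (a, b):
                continue
            d_ac, d_bc, nc = dist[pair(a, c)], dist[pair(b, c)], len(members[c])
            # Lance-Williams update: distance from c to the merged cluster a+b
            if method == "ward":
                num = (na + nc) * d_ac ** 2 + (nb + nc) * d_bc ** 2 - nc * d_ab ** 2
                new = np.sqrt(max(num / (na + nb + nc), 0.0))
            elif method == "average":
                new = (na * d_ac + nb * d_bc) / (na + nb)
            else:                                   # complete linkage
                new = max(d_ac, d_bc)
            dist[pair(a, c)] = new                  # id a now stands for a+b
        dist = {k: v for k, v in dist.items() if b not in k}
        members[a] = members[a] + members.pop(b)

    labels = np.empty(n, dtype=int)
    for label, rows in enumerate(members.values()):
        labels[rows] = label
    return labels
```

The data matrix is used as given. On unstandardized data, Euclidean distances are dominated by the variable with the largest scale (here urban population, measured in number of people). Standardizing the columns first is the usual alternative.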

This is the legend of the countries:

1. Canada
2. France
3. Germany
4. Japan
5. Italy
6. Russian Federation
7. United States
8. United Kingdom
9. Brazil
10. China
11. India
12. South Africa
13. Australia
14. Saudi Arabia
15. Korea, Rep.
16. Indonesia
17. Mexico
18. Turkey
19. Spain
20. Netherlands


Using R, we ran an analysis to choose the number of clusters based on the within-cluster sum of squares. The resulting graph suggests that four clusters could be appropriate.
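The quantity plotted in such a graph, the total within-cluster sum of squares, can be computed for any partition with the short NumPy sketch below (`within_ss` is a hypothetical helper). Evaluating it for k = 1, 2, 3, ... clusters and looking for the point where it stops falling sharply gives the usual "elbow" reading.

```python
import numpy as np


def within_ss(X, labels):
    """Total within-cluster sum of squares: squared Euclidean distance of each
    observation from its cluster mean, summed over all clusters."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for lab in np.unique(labels):
        group = X[labels == lab]
        total += float(np.sum((group - group.mean(axis=0)) ** 2))
    return total
```

For the one-dimensional points 0, 1, 10 and 11, `within_ss` gives 101.0 with a single cluster and 1.0 with the two natural clusters. The sharp drop is the kind of elbow the graph is read for.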


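In this cluster analysis we used Ward's method with the Euclidean distance. Ward's method is a hierarchical agglomerative method based on the ANOVA approach: at each step it merges the two clusters whose union produces the smallest increase in the within-cluster sum of squares. ANOVA stands for ANalysis Of VAriance.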
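Ward's criterion can be made concrete. Merging clusters A and B increases the within-cluster sum of squares by n_A n_B / (n_A + n_B) times the squared Euclidean distance between their means, and at each step Ward's method merges the pair with the smallest increase. Below is a NumPy sketch with hypothetical helper names:

```python
import numpy as np


def ward_merge_cost(A, B):
    """Increase in within-cluster sum of squares caused by merging clusters A and B
    (arrays of observations): n_A n_B / (n_A + n_B) * ||mean_A - mean_B||^2."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    na, nb = len(A), len(B)
    diff = A.mean(axis=0) - B.mean(axis=0)
    return na * nb / (na + nb) * float(diff @ diff)


def sse(group):
    """Sum of squared deviations of a group of observations from its mean."""
    g = np.asarray(group, dtype=float)
    return float(np.sum((g - g.mean(axis=0)) ** 2))
```

Take A = {0, 1} and B = {10, 11}. The merge cost is 2·2/4 × (0.5 - 10.5)² = 100. This equals the sum of squares of the merged group (101) minus those of A and B (0.5 each).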

The graph suggests that we can use three clusters, treating China as an isolated country because it has very little in common with the other clusters:
Cluster 1: U.S., India (7-11)
Cluster 2: Brazil, Mexico, Russia, Japan, Indonesia (9-17-6-4-16)
Cluster 3: Canada, France, Germany, Italy, United Kingdom, South Africa, Australia, Saudi Arabia, South Korea, Turkey, Spain, Netherlands (1-12-20-13-14-19-5-15-8-18-2-3)


These are the means of each variable by cluster:

                                                                Cluster 1       Cluster 2       Cluster 3
X1 = CO2 emissions (kg per 2000 US$ of GDP)                  7.584599e-01    1.401193e+00    1.765399e+00
X2 = Urban population                                        3.652584e+07    1.153590e+08    4.082957e+08
X3 = Health expenditure per capita (current US$)             3.652584e+07    1.036108e+03    2.639784e+03
X4 = Life expectancy at birth, total (years)                 7.691514e+01    7.343499e+01    7.173244e+01
X5 = Unemployment, total (% of total labor force)            7.783333e+00    5.860000e+00    4.833333e+00
X6 = Public spending on education, total (% of GDP)          4.815916e+00    4.147186e+00    3.538987e+00
X7 = Subsidies and other transfers (% of expense)            6.459847e+01    6.140823e+01    6.176835e+01
X8 = Terrestrial protected areas (% of total land area)      1.513201e+01    1.538366e+01    1.134538e+01

Cluster 1 has the highest mean for most of the variables. It is composed only of the U.S. and India. This cluster appears to have higher values of health expenditure, life expectancy, unemployment, public spending on education and subsidies.

The second cluster has the largest share of terrestrial protected areas.

Finally, the third cluster has the highest CO2 emissions and urban population, and it is also the cluster with the largest number of countries.


[Figure: Cluster Dendrogram for Solution HClust.10 (method = average, distance = Euclidean); x-axis: observation number in the data set, y-axis: height (0 to 5e+08); leaf order: 10, 1, 12, 20, 13, 14, 19, 5, 15, 3, 2, 8, 18, 9, 17, 6, 4, 16, 7, 11]

The cluster analysis with the average method and Euclidean distance gives a worse result than the previous one. China (10) is now an outlier, and the U.S. and India (7 and 11) are very different from the other two clusters.


After removing observations 7, 9, 10 and 11 (U.S., Brazil, China and India), we obtain a better cluster analysis without outliers.

We now have two clusters. The first is composed of Canada, France, Germany, Italy, United Kingdom, South Africa, Australia, Saudi Arabia, South Korea, Turkey, Spain and the Netherlands (1-12-20-13-14-19-5-15-8-18-2-3). The second is composed of Mexico, Russia, Japan and Indonesia (17-6-4-16).


CONCLUSION

The initial aim of this research was to find possible relationships among the countries belonging to the G20. After the factor and cluster analyses we can say that the results are quite interesting, since the factor analysis suggests three new latent variables that summarize the original ones.

We went from eight original variables to three.

The factor analysis produced a quite satisfactory result. We now have three groups: “welfare and well-being”, “public intervention” and “unemployment”.

The cluster analysis also produced a satisfactory result, and we can find some common characteristics within the clusters. Cluster 2 (Brazil, Mexico, Russia, Japan, Indonesia) is characterized by countries with a large population and, apart from Japan, they are all developing countries.

Cluster 3 (Canada, France, Germany, Italy, United Kingdom, South Africa, Australia, Saudi Arabia, South Korea, Turkey, Spain, Netherlands) contains all the Western European countries of the sample, which suggests that it is the cluster with the highest welfare and the most equality among its people. It also has the highest urban population, but also the highest CO2 emissions.
Cluster 1 is more difficult to discuss because it is formed by two very different countries. The U.S. is rich and developed, whereas India has a largely poor population and is a developing country. Still, they share some features, for example public spending on education, since both India and the U.S. have a good education system.
